Development of a prognostic prediction model and visualization system for autologous costal cartilage rhinoplasty: an automated machine learning approach

Niyazi, Aihemaitijiang; Tuohuti, Tilimanjiang; Nannan, Xu; Shalimujiang, Dawuli; Zhao, Yang

doi:10.3389/fsurg.2025.1594514

ORIGINAL RESEARCH article

Front. Surg., 02 October 2025

Sec. Reconstructive and Plastic Surgery

Volume 12 - 2025 | https://doi.org/10.3389/fsurg.2025.1594514

This article is part of the Research Topic: Repair and Reconstruction of Nasal Deformity

Aihemaitijiang Niyazi¹,²

Tilimanjiang Tuohuti¹

Xu Nannan³

Dawuli Shalimujiang¹

Yang Zhao¹*

¹Department of Burns, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China
²Department of Plastic Surgery, Xi Jing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
³YanTa Aesthetic Clinic, MingNanDuoMei Medical Cosmetic Co., Ltd., Xi’an, Shaanxi, China

Objective: To develop an automated machine learning (AutoML)-based prognostic prediction model and visualization system for autologous costal cartilage rhinoplasty (ACCR), addressing the clinical challenges of postoperative complications and satisfaction disparity.

Methods: A retrospective cohort of 447 ACCR patients (2019–2024) was analyzed, integrating 20+ parameters spanning biological, surgical, and behavioral domains. We proposed an improved metaheuristic algorithm (INPDOA) for AutoML optimization, validated against 12 CEC2022 benchmark functions. Bidirectional feature engineering identified critical predictors, and SHAP values quantified variable contributions. A MATLAB-based clinical decision support system (CDSS) was developed for real-time prognosis visualization.

Results: The INPDOA-enhanced AutoML model outperformed traditional algorithms, achieving an external-validation AUC of 0.867 for 1-month complications and R² = 0.862 for 1-year Rhinoplasty Outcome Evaluation (ROE) scores. Key predictors included nasal collision within 1 month, smoking, and preoperative ROE scores. Decision curve analysis demonstrated a net benefit improvement over conventional methods. The CDSS returned predictions with a latency of under 3 seconds.

Conclusion: This study establishes the first AutoML-driven prognostic framework for ACCR, effectively bridging the gap between surgical precision and patient-reported outcomes. Its integration of dynamic risk prediction and explainable AI offers a paradigm for aesthetic surgical decision-making.

1 Introduction

Rhinoplasty, as one of the most widely performed aesthetic surgeries worldwide, has evolved with dual objectives of morphological aesthetics and functional reconstruction. Autologous costal cartilage rhinoplasty (ACCR), since its introduction in the 1990s (1), has become the gold standard for correcting severe saddle nose deformities, post-traumatic nasal defects, and revision cases following failed prior surgeries. Compared to silicone or expanded polytetrafluoroethylene (ePTFE) implants, ACCR offers three core advantages: (1) non-immunogenicity due to autologous tissue utilization significantly reduces infection risk (2); (2) three-dimensional sculptability of cartilage enables complex nasal tip reconstruction (3); and (3) biomechanical compatibility with host tissues ensures long-term structural stability (4). According to the International Society of Aesthetic Plastic Surgery (ISAPS), ACCR adoption in Asia has surged by 37% over the past 5 years, establishing it as the fastest-growing subspecialty in rhinoplasty (5).

Nevertheless, ACCR remains challenged by unpredictable postoperative outcomes. Multicenter cohort studies (6–8) reveal a 12-month complication profile encompassing variable graft resorption rates (5.8%–21.4%), dorsal contour irregularity (7.3%), incision scar hypertrophy (9.5%), and hematoma (1.2%), with 11.6% of cases requiring revision surgery. Notably, significant discrepancies persist between patient-reported satisfaction and surgeon assessments (9), underscoring systemic deficiencies in existing prognostic evaluation frameworks. These limitations stem from: (1) inadequate quantification of patient-specific factor interactions, such as costal cartilage calcification (Lemon grading), nasal skin thickness/elasticity, and alar-columellar proportions; (2) intraoperative decision-making overly reliant on empirical expertise without real-time data support; and (3) misaligned risk perception in traditional physician-patient communication.

Early prognostic tools in plastic surgery—exemplified by first-generation multivariate regression models—achieved limited success. For instance, the CRS-7 complication scale for septoplasty demonstrated an AUC of 0.68 with seven clinical predictors (10), constrained by linear assumptions ill-suited for nonlinear biological systems, neglect of higher-order interactions (e.g., BMI-dependent tissue perfusion threshold effects), and inability to integrate emerging modalities like radiomics. The advent of electronic medical records (EMR) and picture archiving systems (PACS) catalyzed second-generation machine learning (ML) models, achieving breakthrough performance in facial fracture prognosis prediction (AUC = 0.81) (11) and deep learning-based pain forecasting in breast reconstruction (81.3% accuracy) (12). However, persistent limitations include: (1) manual feature engineering and hyperparameter tuning compromising reproducibility; (2) ineffective dimensionality reduction for high-parameter spaces; and (3) poor clinical translation, with most models confined to academic literature.

Automated machine learning (AutoML)—a frontier in artificial intelligence (AI)—revolutionizes medical predictive modeling through end-to-end automation (13). Key innovations include: (1) neural architecture search (NAS) outperforming manual CNN designs in classification tasks (14); (2) Bayesian optimization-driven hyperparameter tuning slashing development cycles (15); and (3) automated feature engineering modules (e.g., TPOT, Auto-Sklearn) generating high-order interaction terms. Yet, AutoML applications in aesthetic surgery remain unexplored, partly due to data heterogeneity (3D scans, tissue biomechanics, dynamic expression capture) and the imperative for interpretability to align subjective aesthetic expectations with clinical decisions.

Our study pioneers the integration of AutoML into ACCR prognosis, aiming to usher rhinoplasty into the era of predictive medicine. By establishing a risk-stratified AI framework, we seek to preempt complications, harmonize patient-surgeon satisfaction metrics, and provide actionable insights for intelligent aesthetic surgery.

2 Methods

2.1 Study population

This retrospective study received ethical approval from the Institutional Review Board of Xi Jing Hospital (Approval No. K202504-12), with informed consent waived due to the anonymized nature of the data. We analyzed 447 patients who underwent ACCR from March 2019 to January 2024 across two centers: Xi Jing Hospital (n = 330) and MingNanDuoMei Aesthetic Hospital (n = 117). The Xi Jing Hospital cohort (n = 330) had a mean age of 25.15 ± 5.32 years (range: 18–35) and comprised 27 male and 303 female participants. The MingNanDuoMei Aesthetic Hospital cohort (n = 112) presented comparable demographics, with a mean age of 24.89 ± 6.34 years (range: 18–36) and 11 male and 101 female subjects, reflecting the predominantly female distribution characteristic of elective cosmetic procedure populations at this tertiary institution.

Inclusion criteria: (1) Primary or revision ACCR; (2) Complete 1-year follow-up data.

Exclusion criteria: (1) Age <18 years; (2) Implant removal due to dissatisfaction; (3) Pregnancy or lactation; (4) Severe cardiac/hepatic dysfunction; (5) History of cleft lip-nose repair.

2.2 Data collection

Data were extracted from institutional electronic medical records (EMRs) and manually cross-validated to ensure consistency. Variables were categorized by type as follows: (1) Demographic variables: age, sex, body mass index (BMI), and education level; (2) Preoperative clinical factors: nasal pore size, prior nasal surgery history, and preoperative Rhinoplasty Outcome Evaluation (ROE) score (16); (3) Intraoperative/surgical variables: surgical duration (hours) and length of hospital stay (days); (4) Postoperative behavioral/event factors: documented occurrences within the first postoperative month, including nasal trauma (binary yes/no), antibiotic duration (categorized as <3 days/3–5 days), folliculitis, animal contact, spicy food intake, smoking, and alcohol use; (5) Outcome measures: the short-term outcome (1 month) was a composite endpoint of infection, hematoma, or graft displacement, and the long-term outcome (1 year) was the ROE score (range: 0–100) for cosmetic and functional assessment. Clinical correlations, data collection methodology, and implementation details are documented in Supplementary Table S1.

2.3 Model development

The Xi Jing cohort was divided into training (n = 264) and internal test sets (n = 66) using an 8:2 stratified random split, chosen to minimize selection bias while preserving outcome distribution consistency; the MingNanDuoMei cohort served as an external validation set (n = 117). Stratification criteria comprised preoperative ROE score tertiles (Low: 0–25 points; Medium: 26–35 points; High: >35 points) and 1-month complication status (yes/no). For classification modeling predicting 1-month complications, the Synthetic Minority Oversampling Technique (SMOTE) was applied exclusively to the training set to address class imbalance. The test and validation sets retained their original data distributions to accurately reflect real-world clinical scenarios. The proportion of missing values within the dataset was minimal (1.3%). For continuous variables (e.g., ROE score), missing values were replaced with the median of the corresponding variable; for categorical variables (e.g., comorbidity types), missing values were imputed using the mode. A 10-fold cross-validation strategy was used to mitigate overfitting.
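The partitioning and oversampling steps described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the authors' MATLAB implementation: the field names `pre_roe` and `complication`, the helper names, the seeds, and the synthetic demo records are all assumptions, and the `smote` function is a bare-bones version of the published technique (interpolation between a minority sample and one of its nearest minority neighbours).

```python
import random
from collections import defaultdict


def roe_tertile(score):
    """Preoperative ROE strata quoted in the text: 0-25, 26-35, >35."""
    if score <= 25:
        return "low"
    if score <= 35:
        return "medium"
    return "high"


def stratified_split(records, test_fraction=0.2, seed=0):
    """8:2 split performed separately inside each (tertile, outcome) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(roe_tertile(rec["pre_roe"]), rec["complication"])].append(rec)
    train, test = [], []
    for key in sorted(strata):
        group = list(strata[key])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: interpolate between a minority point and one of its
    k nearest minority neighbours to create n_new synthetic points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Synthetic demo data (NOT study data): 330 records, roughly 27% events.
demo_rng = random.Random(1)
records = [
    {"pre_roe": demo_rng.randint(10, 50),
     "complication": int(demo_rng.random() < 0.27)}
    for _ in range(330)
]
train, test = stratified_split(records)

# Oversample only the training minority class; the test set stays untouched.
minority = [[float(r["pre_roe"])] for r in train if r["complication"] == 1]
majority_count = sum(1 for r in train if r["complication"] == 0)
synthetic = smote(minority, n_new=max(0, majority_count - len(minority)))
```

Because rounding happens inside each stratum, the test set in this sketch is only approximately 20% of the cohort. Keeping the synthetic points out of the test and validation sets is what lets those sets reflect the original event rate.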

Automated machine learning (AutoML) framework: This study employs an AutoML framework based on optimization algorithms that tightly integrates three synergistic mechanisms: base-learner selection, feature screening, and hyperparameter optimization. To ensure methodological rigor, the original dataset underwent stratified random assignment into training and held-out independent test sets at the experimental outset. All subsequent procedures (feature selection, model configuration refinement, and cross-validation assessment) were strictly confined to the training subset. The framework uniformly encodes the three decision spaces into a hybrid solution vector:

x = \left( \underbrace{k}_{\text{model type}} \mid \underbrace{δ_{1}, δ_{2}, \dots, δ_{m}}_{\text{feature selection}} \mid \underbrace{λ_{1}, λ_{2}, \dots, λ_{n}}_{\text{hyperparameters}} \right)

Where the base-learner type is discretely defined (k: 1 = Logistic Regression [LR], 2 = Support Vector Machine [SVM], 3 = XGBoost, 4 = LightGBM); feature selection follows binary 0/1 encoding; and hyperparameter space adapts dynamically to the selected base model. Driven by swarm intelligence algorithms, each iteration comprises: (a) identifying the candidate base-learner per k-value in the solution vector; (b) extracting a feature subset via the solution vector; and (c) injecting adaptive parameters to instantiate the model. Configured model instances then undergo rigorous ten-fold cross-validation within the training set, forming a synergistic feedback loop for “architecture–feature representation–parameterization”. Synergistic optimization is governed by a dynamically weighted fitness function:

f (x) = w_{1} (t) \cdot ACC_{CV} + w_{2} (t) \cdot \left( 1 - \frac{‖ δ ‖_{0}}{m} \right) + w_{3} (t) \cdot \exp (- T / T_{\max})

This function balances three dimensions: predictive accuracy (the cross-validated accuracy term ACC_CV), feature sparsity (the ℓ₀ norm of the feature-selection vector δ, normalized by the number of candidate features m), and computational efficiency (the exponential decay term). The weight coefficients w₁(t), w₂(t), and w₃(t) adapt across iterations t, prioritizing accuracy initially, balancing accuracy and sparsity mid-phase, and emphasizing model parsimony terminally [where w₁(t) ≈ w₂(t)]. Performance benchmarking includes traditional models (LR, SVM) and ensemble learners (XGBoost, LightGBM). For individual sample prediction, the AutoML model outputs a class probability: for a new sample with feature vector x, the classification probability obtained through forward propagation is:

p = σ (w^{T} \cdot ϕ (x) + b)

Where $σ$ denotes the sigmoid activation $σ (z) = \frac{1}{1 + e^{- z}}$ , $ϕ (x)$ the engineered feature transformation, w the output layer weight vector, and b the bias term.
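To make the hybrid encoding and the weighted fitness function more concrete, the sketch below decodes a solution vector and scores it. It is a hedged illustration rather than the published implementation: the linear weight schedule in `weights`, the reading of T as training time with `max_time` as its cap, and all function names are assumptions introduced here.

```python
import math

# Base-learner codes as listed in the text.
MODELS = {1: "LR", 2: "SVM", 3: "XGBoost", 4: "LightGBM"}


def decode(x, n_features):
    """Split x = (k | delta_1..delta_m | lambda_1..lambda_n) into its parts."""
    model = MODELS[int(x[0])]
    mask = [int(d) for d in x[1:1 + n_features]]
    hyper = list(x[1 + n_features:])
    return model, mask, hyper


def weights(t, t_max):
    """Hypothetical schedule: the accuracy weight w1 decays and the sparsity
    weight w2 grows, so that w1 and w2 meet at the final iteration."""
    progress = t / t_max
    w1 = 0.8 - 0.4 * progress
    w2 = 0.1 + 0.3 * progress
    w3 = 0.1 + 0.1 * progress
    return w1, w2, w3


def fitness(acc_cv, mask, train_time, max_time, t, t_max):
    """f(x) = w1*ACC_CV + w2*(1 - ||delta||_0 / m) + w3*exp(-T / T_max)."""
    w1, w2, w3 = weights(t, t_max)
    sparsity = 1 - sum(mask) / len(mask)
    efficiency = math.exp(-train_time / max_time)  # assumes T is training time
    return w1 * acc_cv + w2 * sparsity + w3 * efficiency


# Example: XGBoost, features 1 and 3 of 5 selected, two hyperparameters.
x = [3, 1, 0, 1, 0, 0, 0.1, 6]
model, mask, hyper = decode(x, n_features=5)
score = fitness(acc_cv=0.85, mask=mask, train_time=2.0, max_time=10.0,
                t=0, t_max=500)
```

In a full optimizer the `acc_cv` argument would come from ten-fold cross-validation of the decoded model on the training set; here it is passed in directly to keep the example self-contained.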

An adaptive ensemble AutoML framework driven by the Improved Neural Population Dynamics Optimization Algorithm (INPDOA) was constructed to develop the predictive model. At the algorithmic level, we proposed an INPDOA enhanced through dual-strategy modifications to address traditional metaheuristics' susceptibility to convergence on local extrema during high-dimensional optimization. Building upon the classical Neural Population Dynamics Optimization Algorithm (NPDOA) (17), our approach: (i) reconstructs the initial population using Bernoulli mapping, and (ii) incorporates a Lévy flight random walk strategy to regulate the global-local convergence balance across individuals, thereby enhancing adaptability to complex parameter spaces (18). During model construction, the iterative INPDOA process concurrently: (i) generated binary-encoded feature subsets (where 1 = selected feature, 0 = excluded feature), and (ii) dynamically selected base learners with their hyperparameters (regularization coefficients, tree depth, and learning rate), yielding a diverse candidate model pool. Model performance was quantified via cross-validated AUC on the training set, ultimately determining the optimal combination of feature subsets, base learners, and hyperparameters. The flowchart is illustrated in Figure 1.

Figure 1

Flowchart illustrating a four-step data analysis process. Step 1: Hierarchical sampling divides datasets into training (80%) and test sets (20%). Step 2: Involves INPDOA using chaos initialization and a mutation strategy. Step 3: Coordinate and optimize coding using feature subsets, basic models like LR, SVM, XGB, and hyper-parameters. Step 4: Verification and evaluation through a candidate model pool, AUC sorting, and obtaining the optimal combination.

Figure 1. Flowchart of the AutoML.
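The two INPDOA modifications named above, Bernoulli-map initialization and a Lévy flight random walk, can be illustrated as follows. This is a minimal sketch under stated assumptions: the map parameter `beta=0.4`, the use of Mantegna's method to draw Lévy-distributed steps with exponent 1.5, and the update rule in `levy_move` are common choices in the metaheuristics literature and are not confirmed details of the authors' algorithm.

```python
import math
import random


def bernoulli_map(z, beta=0.4):
    """One iteration of the Bernoulli shift map on [0, 1]."""
    if z <= 1 - beta:
        z_next = z / (1 - beta)
    else:
        z_next = (z - (1 - beta)) / beta
    # Clamp to guard against floating-point overshoot at the interval edges.
    return min(max(z_next, 0.0), 1.0)


def chaotic_population(pop_size, dim, lower, upper, seed=0, beta=0.4):
    """Fill the initial population from a chaotic sequence instead of
    independent uniform draws."""
    rng = random.Random(seed)
    z = rng.uniform(0.01, 0.99)
    population = []
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            z = bernoulli_map(z, beta)
            individual.append(lower + z * (upper - lower))
        population.append(individual)
    return population


def levy_sigma(beta=1.5):
    """Scale of the numerator Gaussian in Mantegna's method."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length (Mantegna's method)."""
    u = rng.gauss(0, levy_sigma(beta))
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)


def levy_move(position, best, rng, lower, upper, scale=0.01):
    """Hypothetical update: perturb each coordinate relative to the best
    solution with a Levy-distributed step, then clip to the bounds."""
    moved = []
    for x_i, b_i in zip(position, best):
        candidate = x_i + scale * levy_step(rng) * (x_i - b_i)
        moved.append(min(max(candidate, lower), upper))
    return moved


# Settings mirror the benchmark configuration: population 30, dimension 10.
population = chaotic_population(30, 10, -100.0, 100.0)
moved = levy_move(population[0], population[1], random.Random(0), -100.0, 100.0)
```

Most Lévy steps are small, which supports local refinement, while occasional large jumps help an individual escape a local optimum.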

Benchmarking: INPDOA was validated through 30 independent runs on 12 CEC2022 benchmark functions (dimensions = 10, population = 30, iterations = 500) (19), outperforming NPDOA, genetic algorithm (GA), and whale optimization algorithm (WOA).

Model comparison: Logistic regression (LR), support vector machine (SVM), AdaBoost, XGBoost, and LightGBM were evaluated as comparators. Two models were trained: (1) a classification model for 1-month complication risk (binary outcome) and (2) a regression model for 1-year ROE score prediction.

2.4 Model evaluation

2.4.1 Validation strategy

A progressive three-stage validation framework was implemented: initial evaluation employed tenfold stratified cross-validation within the training cohort; subsequent assessment used the internally reserved test set (n = 66); and final external validation used the second-center cohort (n = 117) to verify generalizability across heterogeneous clinical environments and to assess robustness to real-world variability.

2.4.2 Performance metric system

Classification metrics: Accuracy (ACC), sensitivity (SEN), specificity (SPE), F1-score, AUROC, and precision-recall AUC (PR-AUC).

Regression metrics: Coefficient of determination (R²), mean squared error (MSE), root MSE (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
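For reference, most of the metrics listed above can be computed directly from predictions, as in the following sketch. The function names, the 0.5 decision threshold, and the toy inputs are assumptions for illustration; PR-AUC is omitted for brevity, and AUROC is computed with the rank-based (Mann-Whitney) formulation.

```python
import math


def classification_metrics(y_true, y_score, threshold=0.5):
    """ACC, SEN, SPE, and F1 from binary labels and predicted probabilities."""
    pairs = [(y, int(s >= threshold)) for y, s in zip(y_true, y_score)]
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def regression_metrics(y_true, y_pred):
    """R2, MSE, RMSE, MAE, and MAPE (in percent)."""
    n = len(y_true)
    mean = sum(y_true) / n
    sse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    sst = sum((a - mean) ** 2 for a in y_true)
    mse = sse / n
    return {
        "R2": 1 - sse / sst,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n,
        "MAPE": 100 * sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / n,
    }


# Toy inputs (not study data).
cls = classification_metrics([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2])
auc = auroc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2])
reg = regression_metrics([80, 90, 100], [82, 88, 100])
```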

2.4.3 Quantification of clinical utility

Decision curve analysis (DCA) was applied to quantify clinical utility by computing net benefit (NB) across varied threshold probabilities. The operational formula is defined as:

NB = \frac{TP}{N} - \frac{FP}{N} \times \frac{p_{t}}{1 - p_{t}}

where TP denotes the number of true positives, FP the number of false positives, N the total cohort size, and p_t the predefined risk threshold. Model-derived NB was compared against the “treat-all” and “treat-none” reference lines to establish the threshold range over which the model supports clinical decisions.
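As a worked illustration of the net benefit formula, the sketch below evaluates a model and the "treat-all" reference at a single threshold. The labels and probabilities are invented toy values, not study data, and the function names are assumptions.

```python
def net_benefit(y_true, y_prob, p_t):
    """NB = TP/N - FP/N * p_t / (1 - p_t), treating patients with prob >= p_t."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= p_t and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= p_t and y == 0)
    return tp / n - fp / n * p_t / (1 - p_t)


def net_benefit_treat_all(y_true, p_t):
    """Reference line: every patient is treated regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * p_t / (1 - p_t)


# Toy cohort of 10 patients with 3 events.
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y_prob = [0.8, 0.6, 0.7, 0.2, 0.1, 0.3, 0.4, 0.05, 0.15, 0.25]

nb_model = net_benefit(y_true, y_prob, p_t=0.5)
nb_all = net_benefit_treat_all(y_true, p_t=0.5)
```

At p_t = 0.5 the model flags three patients (two true positives, one false positive), giving NB = 0.2 - 0.1 × 1 = 0.1, whereas treating everyone gives 0.3 - 0.7 × 1 = -0.4. The "treat-none" reference is always zero. Repeating the calculation over a grid of thresholds yields the decision curve.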

2.5 Explainability analysis

Our study employed SHAP (SHapley Additive exPlanations) analysis to investigate the interpretability of the predictive model. Rooted in cooperative game theory's Shapley values, the SHAP method assigns each feature an importance value, enabling precise quantification of its contribution towards model predictions. This approach explains model behavior both globally (overall patterns) and locally (prediction logic for individual instances). To comprehensively demonstrate interpretability, we generated three classes of SHAP visualizations: (1) SHAP Summary Plot: This integrates feature importance across all samples and indicates effect directionality. Each point represents a feature's SHAP value for one sample, color-coded by feature value (yellow = high, blue = low), visually depicting positive or negative associations between features and predicted outcomes. (2) SHAP Feature Importance Plot: This ranks features by their global impact on predictions based on the mean absolute SHAP value, facilitating identification of the most influential predictors. (3) SHAP Dependence Plots: Quantify the marginal effect of individual variables and reveal potential clinical decision thresholds.
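The Shapley attribution underlying SHAP can be computed exactly for a small model, which makes the idea easy to inspect. The sketch below is a simplification: absent features are replaced by a single baseline value rather than averaged over background data as SHAP implementations typically do, and the toy risk function with its coefficients is hypothetical and not derived from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over all subsets of the remaining features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    collision, folliculitis, smoking = features
    return 0.1 + 0.4 * collision + 0.2 * folliculitis + 0.1 * collision * smoking


patient = [1, 1, 1]      # all three risk factors present
reference = [0, 0, 0]    # none present
phi = shapley_values(toy_risk, patient, reference)
```

The attributions sum to the difference between the patient's prediction and the reference prediction (0.8 - 0.1 = 0.7), and the interaction term is shared equally between the two features involved. Averaging the absolute values of such attributions across patients gives the global importance ranking described above.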

2.6 Clinical decision support system

A MATLAB App Designer-based system was developed, integrating trained models into a user-friendly dashboard. Clinicians input patient variables to obtain real-time predictions and personalized risk mitigation strategies.

2.7 Statistical analysis

Continuous variables are presented as mean ± SD (one-way ANOVA for comparisons); categorical variables as frequencies (chi-square test). P < 0.05 defined statistical significance. Analyses were performed in MATLAB 2024b.

3 Results

3.1 Clinical characteristics and outcomes

The training set (n = 264) showed 26.52% (70/264) short-term adverse events (infection/hematoma/graft displacement), with preoperative and 1-year postoperative ROE scores of 31.28 ± 8.15 and 87.76 ± 18.23, respectively. Comparable trends were observed in the test set (n = 66: 27.27% adverse events; preoperative ROE = 30.93 ± 7.83; postoperative ROE = 87.14 ± 13.08) and external validation set (n = 117: 23.93% adverse events; preoperative ROE = 32.27 ± 9.05; postoperative ROE = 86.33 ± 11.63). No significant differences were observed across all variables among groups (all P > 0.05), confirming cohort homogeneity (Table 1).

Table 1

Table 1. Comparison of clinical characteristics and outcome indicators of each data set.

3.2 Algorithm optimization performance

The improved INPDOA demonstrated superior optimization stability and convergence efficiency across 12 CEC2022 benchmark functions (Figures 2, 3). Boxplots of 30 independent runs revealed INPDOA's narrower interquartile ranges (IQRs) and lower outlier frequencies compared to NPDOA, GA, and WOA. Convergence curves highlighted INPDOA's accelerated optimization rates and reduced local optima trapping.

Figure 2

Twelve box plots labeled F1 to F12 compare four methods: INPDOA, NPDOA, GA, and WOA. Each plot shows variability and distribution, with WOA typically demonstrating higher values across all graphs, while other methods vary in performance. Outliers are indicated with circles.

Figure 2. Box plot of algorithm optimization performance comparison.

Figure 3

Multiple graphs depict parameter spaces and convergence curves for functions F1 to F12. Each set includes a 3D surface plot showing parameter space and a 2D graph illustrating convergence, with lines for algorithms ISFOA, SFOA, WOA, and PSO, indicating performance across iterations.

Figure 3. Comparison of convergence performance of the algorithms.

3.3 Model training performance

Classification Model (1-month complications): AutoML achieved AUC = 0.9795 and PR-AUC = 0.9641 (training set). Key predictors: Nasal collision within 1 month (SHAP = 0.38), postoperative folliculitis (SHAP = 0.22), animal contact (SHAP = 0.15), spicy food intake (SHAP = 0.13), and smoking (SHAP = 0.12) (Table 2, Figure 4).

Table 2

Table 2. Cross-validation performance of training set (classification model).

Figure 4

Two graphs showing model performance. Graph A is a ROC curve plotting True Positive Rate against False Positive Rate. Graph B is a Precision-Recall curve. Each graph compares six models: Logistic Regression, SVM, AdaBoost, XGBoost, LightGBM, and AutoML, with AutoML showing the highest performance in both graphs.

Figure 4. Cross-validation performance of the training set. (A) ROC curves of the training set. (B) PR curves of the training set.

Regression Model (1-year ROE scores): AutoML yielded R² = 0.9144 and MSE = 11.31 (training set). Key predictors: Preoperative ROE score, education level, hospital stay, gender and short-term complications (Table 3, Figure 5).

Table 3

Table 3. Cross-validation performance of training set (regression model).

Figure 5

Line graph comparing prediction results of different models on a training set, including linear regression, SVM, Adaboost, XGBoost, LightGBM, and AutoML. R-squared values are shown, with AutoML achieving the highest at 0.91441.

Figure 5. Fitting curves of the training set (regression model).

3.4 Model validation

Test set: classification, AUC = 0.8652 and PR-AUC = 0.7789; regression, R² = 0.8956 and MSE = 12.57. Validation set: classification, AUC = 0.8671 and PR-AUC = 0.7753; regression, R² = 0.8620 and MSE = 13.91. Both models significantly outperformed traditional ML approaches (LR, SVM, etc.) in discrimination and calibration (Tables 4, 5, Figures 6, 7).

Table 4

Table 4. Prediction performance of classification model in test set and validation set.

Table 5

Table 5. Prediction performance of regression model in test set and validation set.

Figure 6

Multiple plots display Receiver Operating Characteristic (ROC) and Precision-Recall curves for different models: LR, SVM, Adaboost, XGBoost, LightGBM, and AutoML. Each panel labeled A through D shows varying performance metrics with corresponding Area Under Curve (AUC) values. Each plot contrasts models on their ability to balance true positive rate and false positive rate, or precision and recall.

Figure 6. Prediction performance of classification model in test set and validation set. (A) ROC curves of the test set. (B) PR curves of the test set. (C) ROC curves of the validation set. (D) PR curves of the validation set.

Figure 7

Line graphs A and B display prediction results for test and verification sets, respectively. Each graph shows lines for true values and multiple models: LR, SVM, Adaboost, XGBoost, LightGBM, and AutoML, with respective R-squared values.

Figure 7. Prediction performance of regression model in test set and validation set. (A) Test set. (B) Validation set.

3.5 Explainability analysis

Analysis of the SHAP summary plots (Figures 8, 9) revealed: (1) Classification model: Nasal collision and postoperative nasal folliculitis emerged as core drivers of complication risk (Figures 8A,B). SHAP values exhibited pronounced spikes when these conditions were present (Figures 8C,D), indicating they reached high-risk thresholds. (2) Regression model: Preoperative ROE scores constituted the dominant predictor (Figures 9A,B), demonstrating a critical threshold effect—patients with scores <35 showed reduced long-term prognostic improvement (Figure 9C). Notably, an inverse association existed between ROE scores and education level: the subgroup with low ROE scores exhibited a significantly higher proportion of highly educated patients (>high school education; Figure 9D).

Figure 8

Panel A shows a Shapley summary plot for poor prognosis with predictors: NasalCollision, Folliculitis, AnimalContact, SpicyFood, and Smoking. Panel B is a Shapley importance plot depicting mean absolute Shapley values for these predictors. Panels C and D are Shapley dependence plots for NasalCollision and Folliculitis, respectively, showing box plots of Shapley values.

Figure 8. Machine learning interpretability analysis (classification model). (A) SHAP summary plot. (B) SHAP feature importance bar plot. (C) SHAP dependence plot within nasal collision. (D) SHAP dependence plot within folliculitis.

Figure 9

Panel A displays a Shapley summary plot with predictors like PreROE, EducationHigh, and HospitalStayShort. Panel B shows a Shapley importance plot with PreROE having the highest mean absolute Shapley values. Panel C presents a Shapley dependence plot illustrating the relationship of Shapley values with PreROE. Panel D depicts another Shapley dependence plot with color coding based on EducationHigh.

Figure 9. Machine learning interpretability analysis (regression model). (A) SHAP summary plot. (B) SHAP feature importance bar plot. (C) SHAP dependence plot of preoperative ROE score. (D) Plot of SHAP interaction dependence between preoperative ROE score and education.

3.6 Clinical utility

1. Decision curve analysis (Figure 10): Implementation of the AutoML model yielded a 32% net benefit gain over “treat-all” strategies within a threshold probability range of 10%–50%.

2. Visualization system: A MATLAB-based GUI (Figure 11) achieved <3-second prediction latency with 92% usability satisfaction (surveyed among 15 surgeons). Clinicians input patient parameters via dropdown menus to instantaneously receive risk estimates and evidence-based preventive recommendations.

4 Discussion

4.1 Core findings and innovative value

Our study established an AutoML-based prognostic prediction system for autologous costal cartilage rhinoplasty (ACCR), with innovations in three dimensions: predictive efficacy, clinical application, and methodological advancement. In terms of predictive efficacy, the application of the Improved Neural Population Dynamics Optimization Algorithm (INPDOA) significantly enhanced modeling efficiency. Its global optimization success rate in 12 CEC2022 benchmark tests was markedly higher than that of the genetic algorithm (GA) and whale optimization algorithm (WOA). This advantage carried over to model performance: the short-term prognosis classification model achieved an AUC of 0.8671 in the validation set, while the long-term prognosis regression model attained an R² value of 0.8620, substantially outperforming the breast reconstruction complication prediction model reported by Naoum et al. (20). This success stems from three technical innovations: (1) the Bernoulli mapping initialization strategy enhanced population diversity; (2) the dynamic feature screening mechanism identified high-order interaction effects (e.g., synergistic effects between postoperative nasal collision and smoking) undetected by traditional statistical methods; (3) the explainability design based on SHAP analysis visually presents feature contributions. The clinical value was quantitatively validated through decision curve analysis (DCA), with a net benefit index of 0.32 in the test set, indicating a meaningful improvement in clinical decision-making. As demonstrated in Vickers et al.'s analysis (21), this approach can enhance clinicians' confidence in AI predictions. The visualization system developed here is intuitive, convenient, and user-friendly.

Figure 10

Three decision curve analysis plots comparing machine learning models. Panel A, B, and C display net benefit against threshold probability for logistic regression, support vector machine, Adaboost, XGBoost, LightGBM, AutoML, and reference lines for “None” and “All.” AutoML and LightGBM generally show higher net benefit across wider probability ranges, while Adaboost performs poorly at higher thresholds. The dashed red line represents treating all cases, and the dashed black line represents treating none.

Figure 10. Decision curve analysis of the prediction model. (A) Training set. (B) Test set. (C) Validation set.

Figure 11

Panels A and B show interfaces of an automatic machine learning-based prognostic prediction system for autologous rib cartilage rhinoplasty. Panel A inputs include nasal collision and smoking, predicting a high probability of poor prognosis one month post-surgery. Panel B inputs include education and gender, predicting an ROE score of 81 points one year post-surgery. Both panels have options for prognoses, starting predictions, saving results, and clearing the interface.

Figure 11. Demonstration of the clinical decision system. (A) Short-term outcome prediction: poor prognosis at 1 month after surgery. (B) Long-term prognosis prediction: one-year postoperative ROE score.

4.2 Paradigm shift compared to traditional prediction systems

Compared to linear prediction models (e.g., CRS-7 scale) used in previous nasal prognosis studies, the performance improvement of our system arises from breakthroughs in three aspects: (1) Data integration: Simultaneous inclusion of biological characteristics (BMI, preoperative ROE score), intraoperative parameters (surgery duration), and postoperative behavioral factors (spicy food intake, animal contact) addresses the limitation of single data sources in prior studies (22). (2) Algorithm architecture: The synergistic application of INPDOA and AutoML improved hyperparameter optimization efficiency. The neural population dynamic update mechanism effectively avoids premature convergence, demonstrating superior global optimization capability compared to particle swarm optimization algorithms reported by Nagra et al. (23). (3) Clinical application: The developed visualization system achieves closed-loop management from prediction to intervention, better aligning with modern medical digital transformation needs than the standalone decision tool designed by Lee et al. (24).

4.3 Mechanistic exploration of key risk factors

SHAP analysis revealed that prognostic determinants exhibit significant temporal dynamics and nonlinear associations: (1) Short-term prognosis model: Postoperative nasal collision within 1 month showed the highest SHAP value (0.38, 95% CI: 0.32–0.44), consistent with the biomechanical vulnerability period during early vascularization of autologous cartilage grafts (25). Animal contact history may influence prognosis through two pathways: ① Local immune responses triggered by pet hair promote IL-17 secretion, accelerating graft absorption (26); ② Increased trauma risk elevates microvascular injury probability. Notably, the effect of spicy food intake showed non-monotonic changes: moderate capsaicin consumption (<15 g/day) may counteract inflammatory effects by enhancing blood flow, aligning with Huang et al.'s findings in wound healing (27). (2) Long-term prognosis model: Preoperative ROE score was the top contributing feature, underscoring the central role of psychological expectation management in cosmetic surgery. The negative correlation between male gender and prognosis may relate to skin texture differences: male nasal skin is thinner on average than that of females (28), increasing the risk of visible graft contours. Additionally, reduced satisfaction in patients with surgery durations >8 h may correlate with prolonged cartilage ischemia time and elevated apoptosis rates.

4.4 Limitations

Despite significant progress, our study has limitations: ① For AutoML models with over 20 features, the theoretical minimum sample size requirement is 400 cases (20× feature number). Although our sample reached 447 cases, predictive performance might still be affected; ② The validation set only included data from collaborative hospitals in the same region, lacking geographic diversity; ③ Digitized records of intraoperative details (e.g., suturing methods, cartilage cutting angles) risk information loss. These issues represent common challenges in medical AI implementation (29).

4.5 Future directions and clinical translation

Building on the current results, we will establish a multicenter registry with five participating institutions to prospectively validate our AutoML framework. This initiative includes developing HL7 FHIR-compliant interfaces for EHR integration, enabling automatic extraction of predictor variables and embedding real-time prognostic alerts directly into clinical workflows. Crucially, we are designing patient-facing visualization modules that transform SHAP-derived risk thresholds into interactive decision aids using three key strategies: (1) Risk communication interfaces: Traffic-light visualization of personalized complication probabilities, dynamically updated as mitigation measures are adjusted (e.g., modifying pet contact frequency or smoking cessation targets); (2) Shared decision-making protocols: Co-development of postoperative management plans using AR-powered 3D nasal models overlaid with location-specific risk projections; (3) Threshold-alert system: Automated notifications triggered when EHR-documented variables (e.g., intraoperative duration >8 h) approach high-risk SHAP-derived thresholds. These implementations create a closed-loop framework from prediction to intervention while addressing health literacy disparities through culturally adapted counseling materials.
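To make strategies (1) and (3) concrete, the sketch below shows one possible shape for a traffic-light mapping and a threshold alert. It is not the planned implementation: the probability cut-offs (`AMBER_FROM`, `RED_FROM`), the 90% "approaching" margin, the `AlertRule` class, and the field name `surgery_duration_h` are hypothetical. Only the 8 h duration threshold is taken from the text.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration; real values would need clinical validation.
AMBER_FROM = 0.10
RED_FROM = 0.30


def traffic_light(probability, amber_from=AMBER_FROM, red_from=RED_FROM):
    """Map a predicted complication probability to a traffic-light colour."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= red_from:
        return "red"
    if probability >= amber_from:
        return "amber"
    return "green"


@dataclass(frozen=True)
class AlertRule:
    variable: str                # key in the patient record (hypothetical naming)
    threshold: float             # high-risk threshold for that variable
    warn_fraction: float = 0.9   # hypothetical: warn from 90% of the threshold


def triggered_alerts(record, rules):
    """Return messages for variables that approach or exceed their thresholds."""
    alerts = []
    for rule in rules:
        value = record.get(rule.variable)
        if value is None:
            continue  # variable not documented for this patient
        if value > rule.threshold:
            alerts.append(f"{rule.variable}: {value} exceeds {rule.threshold}")
        elif value >= rule.warn_fraction * rule.threshold:
            alerts.append(f"{rule.variable}: {value} approaching {rule.threshold}")
    return alerts


# The 8 h threshold comes from the text; the rule set itself is illustrative.
RULES = [AlertRule("surgery_duration_h", 8.0)]

print(traffic_light(0.22))
print(triggered_alerts({"surgery_duration_h": 7.5}, RULES))
```

Keeping the cut-offs as parameters rather than constants inside the functions would let each participating institution recalibrate them during prospective validation.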

5 Conclusion

We developed an ACCR prognosis prediction system based on the Improved Neural Population Dynamics Optimization Algorithm (INPDOA) and automated machine learning (AutoML). Predictive models were established by integrating features spanning biological, surgical, and behavioral domains. The visualization system supports real-time dynamic prediction, and decision curve analysis (DCA) confirmed an increased clinical net benefit. Although generalizability beyond the present sample requires multicenter validation, this system provides an innovative model for intelligent decision-making in cosmetic surgery and moves nasal prognosis management toward precision medicine.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Institutional Review Board of Xi Jing Hospital (Approval No. K202504-12). The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants' legal guardians/next of kin because of the retrospective nature of this study.

Author contributions

AN: Formal analysis, Writing – original draft, Conceptualization, Methodology. TT: Data curation, Investigation, Writing – review & editing. XN: Writing – review & editing, Investigation, Data curation. DS: Data curation, Writing – review & editing, Investigation. YZ: Project administration, Conceptualization, Writing – review & editing, Resources, Software.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

XN was employed by YanTa Aesthetic Clinic, MingNanDuoMei Medical Cosmetic Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

No artificial intelligence tools were employed during any stage of this research or manuscript preparation. This includes the conceptualization phase, data analysis, methodological design, drafting, statistical procedures, content generation, or final editing of the manuscript. All aspects of the work—from patient data collection to algorithm development (AutoML framework), clinical validation (DCA), and writing—were conducted and supervised exclusively by the human authors. Intellectual oversight, critical revision, and validation of every component were performed manually to ensure academic integrity and methodological rigor.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fsurg.2025.1594514/full#supplementary-material

References

1. Calvert JW, Rovelo MO, Orlando MV, Kwon E. Autologous costal cartilage harvest for rhinoplasty with simultaneous breast surgery: a safe and efficient technique. Aesthet Surg J. (2020) 40(9):NP480–90. doi: 10.1093/asj/sjaa016

2. Chen H, Wang X, Deng Y. Complications associated with autologous costal cartilage used in rhinoplasty: an updated meta-analysis. Aesthetic Plast Surg. (2023) 47(1):304–12. doi: 10.1007/s00266-022-03075-3

3. Fedok FG, Lee Peng G, Tastan E, Robotti E. The use of costal cartilage in rhinoplasty. Facial Plast Surg Clin North Am. (2024) 32(4):565–83. doi: 10.1016/j.fsc.2024.06.009

4. Fu X, Yin C, Liang Y, Zhou R, Wang D, Wang C. Hybrid autologous costal cartilage grafting for augmentation rhinoplasty in Asian patients. J Craniofac Surg. (2023) 34(4):1320–4. doi: 10.1097/SCS.0000000000009281

5. Triana L, Palacios Huatuco RM, Campiglio G, Liscano E. Trends in surgical and nonsurgical aesthetic procedures: a 14-year analysis of the international society of aesthetic plastic surgery-ISAPS. Aesthetic Plast Surg. (2024) 48(20):4217–27. doi: 10.1007/s00266-024-04260-2

6. Won TB, Jin HR. Complications of costal cartilage Asian rhinoplasty and their management. Facial Plast Surg. (2020) 36(5):528–38. doi: 10.1055/s-0040-1717146

7. Burton BN, Gilani S. Thirty-day surgical site complications following rhinoplasty with cartilage grafts. J Plast Reconstr Aesthet Surg. (2023) 84:459–61. doi: 10.1016/j.bjps.2023.06.045

8. Storck K, Kotz S, Riedel F, Veit J. Complications associated with alloplastic materials in rhinoplasty and their operative management. Facial Plast Surg. (2024) 40(4):485–92. doi: 10.1055/s-0043-1772846

9. McGuire C, Samargandi OA, Boudreau C, Whelan A, Bezuhly M. Prevention of autologous costal cartilage graft warping in secondary rhinoplasty. J Craniofac Surg. (2020) 31(5):1246–50. doi: 10.1097/SCS.0000000000006429

10. Kim SJ, Chang DS, Choi MS, Lee HY, Pyo J-S. Efficacy of nasal septal splints for preventing complications after septoplasty: a meta-analysis. Am J Otolaryngol. (2021) 42(3):102389. doi: 10.1016/j.amjoto.2020.102389

11. Morita D, Kawarazaki A, Soufi M, Otake Y, Sato Y, Numajiri T. Automatic detection of midfacial fractures in facial bone CT images using deep learning-based object detection models. J Stomatol Oral Maxillofac Surg. (2024) 125(5S2):101914. doi: 10.1016/j.jormas.2024.101914

12. Maita KC, Avila FR, Torres-Guzman RA, Garcia JP, De Sario Velasquez GD, Borna S, et al. The usefulness of artificial intelligence in breast reconstruction: a systematic review. Breast Cancer. (2024) 31(4):562–71. doi: 10.1007/s12282-024-01582-6

13. Choi H, Moran J, Matsumoto N, Hernandez ME, Moore JH. Aliro: an automated machine learning tool leveraging large language models. Bioinformatics. (2023) 39(10):btad606. doi: 10.1093/bioinformatics/btad606

14. Li Y, Dong M, Wang Y, Xu C. Neural architecture search via proxy validation. IEEE Trans Pattern Anal Mach Intell. (2023) 45(6):7595–610. doi: 10.1109/TPAMI.2022.3217648

15. Guo Z, Liu H, Ong YS, Qu X, Zhang Y, Zheng J. Generative multiform Bayesian optimization. IEEE Trans Cybern. (2023) 53(7):4347–60. doi: 10.1109/TCYB.2022.3165044

16. Sena Esteves S, Gonçalves Ferreira M, Carvalho Almeida J, Abrunhosa J, Almeida e Sousa C. Evaluation of aesthetic and functional outcomes in rhinoplasty surgery: a prospective study. Braz J Otorhinolaryngol. (2017) 83(5):552–7. doi: 10.1016/j.bjorl.2016.06.010

17. Ji J, Wu T, Yang C. Neural population dynamics optimization algorithm: a novel brain-inspired meta-heuristic method. Knowl Based Syst. (2024) 300:112194. doi: 10.1016/j.knosys.2024.112194

18. Gao C, Zhang R, Chen X, Yao T, Song Q, Ye W, et al. Integrating internet multisource big data to predict the occurrence and development of COVID-19 cryptic transmission. NPJ Digit Med. (2022) 5(1):161. doi: 10.1038/s41746-022-00704-8

19. Sharma P, Raju S. Metaheuristic optimization algorithms: a comprehensive overview and classification of benchmark test functions. Soft Comput. (2024) 28(4):3123–86. doi: 10.1007/s00500-023-09276-5

20. Naoum GE, Ho AY, Shui A, Salama L, Goldberg S, Arafat W, et al. Risk of developing breast reconstruction complications: a machine-learning nomogram for individualized risk estimation with and without postmastectomy radiation therapy. Plast Reconstr Surg. (2022) 149(1):1e–12e. doi: 10.1097/PRS.0000000000008635

21. Vickers AJ, Holland F. Decision curve analysis to evaluate the clinical benefit of prediction models. Spine J. (2021) 21(10):1643–8. doi: 10.1016/j.spinee.2021.02.024

22. Nocini R, Chirumbolo S, Pirayesh A, Rahman E, Kapoor KM, Caliskan G, et al. A systematic review and meta-analysis of rhinoplasty using the rhinoplasty outcome evaluation scale. Ann Maxillofac Surg. (2022) 12(1):60–8. doi: 10.4103/ams.ams_244_21

23. Nagra AA, Alyas T, Hamid M, Tabassum N, Ahmad A. Training a feedforward neural network using hybrid gravitational search algorithm with dynamic multiswarm particle swarm optimization. Biomed Res Int. (2022) 2022(1):2636515. doi: 10.1155/2022/2636515

24. Lee M, Yeo NY, Ahn HJ, Lim J-S, Kim Y, Lee S-H, et al. Prediction of post-stroke cognitive impairment after acute ischemic stroke using machine learning. Alzheimers Res Ther. (2023) 15(1):147. doi: 10.1186/s13195-023-01289-4

25. Dong W, Wang X, Zhao Z, Fan F. Viability and stability of various-sized free diced cartilage in an experimental study. Aesthet Surg J. (2023) 43(3):NP143–54. doi: 10.1093/asj/sjac263

26. Cavagnero KJ, Li F, Dokoshi T, Nakatsuji T, O’Neill AM, Aguilera C, et al. CXCL12+ dermal fibroblasts promote neutrophil recruitment and host defense by recognition of IL-17. J Exp Med. (2024) 221(4):e20231425. doi: 10.1084/jem.20231425

27. Huang CJ, Pu CM, Su SY, Lo S-L, Lee CH, Yen Y-H. Improvement of wound healing by capsaicin through suppression of the inflammatory response and amelioration of the repair process. Mol Med Rep. (2023) 28(2):155. doi: 10.3892/mmr.2023.13042

28. Eggerstedt M, Rhee J, Buranosky M, Batra PS, Tajudeen BA, Smith RM, et al. Nasal skin and soft tissue thickness variation among differing races and ethnicities: an objective radiographic analysis. Facial Plast Surg Aesthet Med. (2020) 22(3):188–94. doi: 10.1089/fpsam.2019.0008

29. Aung YYM, Wong DCS, Ting DSW. The promise of artificial intelligence: a review of the opportunities and challenges of artificial intelligence in healthcare. Br Med Bull. (2021) 139(1):4–15. doi: 10.1093/bmb/ldab016

Keywords: autologous costal cartilage rhinoplasty, automated machine learning, prognostic modeling, explainable AI, clinical decision support system

Citation: Niyazi A, Tuohuti T, Nannan X, Shalimujiang D and Zhao Y (2025) Development of a prognostic prediction model and visualization system for autologous costal cartilage rhinoplasty: an automated machine learning approach. Front. Surg. 12:1594514. doi: 10.3389/fsurg.2025.1594514

Received: 21 March 2025; Accepted: 17 September 2025;
Published: 2 October 2025.

Edited by:

Fatih Zor, Wake Forest University, United States

Reviewed by:

Marek Adam Paul, Jan Długosz University, Poland
Davide De Cicco, Istituto Stomatologico Italiano, Italy

Copyright: © 2025 Niyazi, Tuohuti, Nannan, Shalimujiang and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yang Zhao, MTgxNjAyMDQ3NTRAMTYzLmNvbQ==
