Predictive model of positive surgical margins after radical prostatectomy based on Bayesian network analysis

Objective: This study aimed to identify the independent risk factors for positive surgical margins after radical prostatectomy and to evaluate the clinical value of a predictive model based on Bayesian network analysis. Methods: We retrospectively analyzed the clinical data of 238 patients who had undergone radical prostatectomy between June 2018 and May 2022. General clinical data, prostate-specific antigen (PSA)-derived indicators, biopsy factors, and magnetic resonance imaging (MRI) characteristics were included as predictive variables, and univariate and multivariate analyses were conducted. We established a nomogram model based on the independent predictors and used BayesiaLab software to generate tree-augmented naive (TAN) and naive Bayesian models based on 15 predictor variables. Results: Of the 238 patients included in the study, 103 exhibited positive surgical margins. In univariate analysis, PSA density (PSAD) (P = 0.02), biopsy Gleason score (P = 0.002), ratio of positive biopsy cores (P < 0.001), preoperative T stage (P < 0.001), location of the abnormal signal (P = 0.002), and side of the abnormal signal (P = 0.009) were all statistically significant. The area under the curve (AUC) of the nomogram model based on independent predictors was 73.80%, the AUC of the naive Bayesian model based on 15 predictors was 82.71%, and the AUC of the TAN Bayesian model was 80.80%. Conclusion: The Bayesian network-based predictive model of positive surgical margins after radical prostatectomy demonstrated high accuracy and usefulness.


Introduction
Radical prostatectomy is one of the most important treatments for localized and locally advanced prostate cancer. However, due to the size and location of the tumor and the anatomical characteristics of the prostate, incomplete resection of the tumor may occur, resulting in positive surgical margins in the pathological specimen (1). A positive surgical margin usually indicates a higher biochemical recurrence rate (2); studies have shown that patients with positive surgical margins are 2-4 times more likely to experience biochemical recurrence than those with negative surgical margins (3) and that they may also have shortened progression-free survival (4). Therefore, if the probability of a positive surgical margin can be effectively predicted before surgery, an appropriate treatment plan can be better formulated, and the positive surgical margin rate can be reduced to slow the progression of the disease and improve patient prognosis.
Bayesian statistics, a counterpart to classical statistics, applies sample information to make inferences about a given population. The structure of a Bayesian network is a directed acyclic graph that represents the joint probability distribution of high-dimensional variables. The TAN Bayesian network (tree-augmented naive Bayesian network) is an extension of the naive Bayesian model that can handle correlated variables and has favorable predictive ability for high-dimensional data. As a machine learning method combining probability theory and graph theory, Bayesian network analysis uses conditional probabilities to analyze a problem and is often used in disease prediction, treatment effect evaluation, and diagnosis and treatment decision making (5-7). On this basis, we established a predictive model for positive surgical margins after radical prostatectomy. We also calculated and analyzed the weight of each influencing factor to explore its significance for clinical guidance.

Patients and methods
We collected data from patients who underwent transperitoneal laparoscopic or robot-assisted radical prostatectomy at the Affiliated Hospital of Qingdao University from June 2018 to May 2022. All patients underwent systematic 12-core biopsy and cognitive magnetic resonance imaging (MRI)/ultrasound fusion-targeted biopsy. We defined a positive postoperative resection margin as one where tumor cells contacted the ink-stained edge of the surgical specimen; if the tumor cells were only close to the ink-stained edge, the surgical margin was regarded as negative. Each pathological section with a positive margin was evaluated by two independent pathologists, and any disagreement was resolved by review by a deputy director or more senior member of our department.
According to the postoperative pathological results, the patients were divided into a positive resection margin group and a negative resection margin group. The predictive variables of the two groups included general clinical factors (age, body mass index, prostate volume, and surgical type); PSA-derived indicators [total prostate-specific antigen (TPSA), the free (F)/TPSA ratio, and PSA density (PSAD)]; biopsy factors (positive needle ratio and Gleason score); and MRI-related factors [clinical stage, Prostate Imaging Reporting and Data System (PI-RADS) score, location of the abnormal signal, side of the abnormal signal, membranous urethral length (MUL), and maximum diameter of the abnormal signal]. The positive needle ratio was defined as the ratio of the number of pathologically positive biopsy cores to the total number of cores and was divided into three groups of <30%, 30%-60%, and ≥60%. Preoperative T staging was divided into T1-T2 and ≥T3 groups according to the 2017 AJCC tumor-staging criteria. Based on PI-RADS scoring of 3.0T multiparametric MRI, the patients were divided into four PI-RADS groups: 1, 3, 4, and 5. MUL was defined as the average distance from the apex of the prostate to the urethra at the level of the bulb of the penis in the mid-sagittal plane, and the maximum diameter of the abnormal signal was defined as its maximum diameter on the horizontal axis on T2-weighted MRI (T2WI).

Inclusion criteria
1. The postoperative pathological diagnosis was prostate cancer.
2. Radical prostatectomy was completed by laparoscopic or robot-assisted laparoscopic surgery, and the operators were all associate chief physicians or above who had successfully completed their advanced training.

Exclusion criteria
1. The patient had undergone neoadjuvant endocrine therapy before radical surgery.
2. The original data were incomplete.
3. Postoperative pathology indicated benign prostatic tissue or prostatic intraepithelial neoplasia.

Statistical methods
We used SPSS 26.0 statistical software to analyze our data. Measurement data with a normal distribution are presented as mean ± SD, and measurement data that did not follow a normal distribution are presented as median (interquartile range). Differences among groups were assessed using the Kruskal-Wallis test. Count data are expressed as frequencies and were compared between groups using the chi-squared test. The chi-squared test was used for univariate analysis of the above variables, and the statistically significant factors were then entered into multivariable logistic regression analysis. P < 0.05 was considered statistically significant throughout.
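As an illustration of the univariate step, the following minimal sketch computes a Pearson chi-squared test for a 2x2 table using only the Python standard library. It is not the SPSS procedure used in the study: the function name `chi_squared_2x2` and the example counts are hypothetical, and no continuity correction is applied.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Hypothetical layout:      margin positive   margin negative
        factor present              a                 b
        factor absent               c                 d
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Illustrative counts only; these are not data from the study.
stat, p_value = chi_squared_2x2(10, 20, 20, 10)
significant = p_value < 0.05
```

A factor would be carried forward to the multivariable model only when `significant` is true, mirroring the P < 0.05 threshold stated above.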

Development of predictive models
All data were initially divided into a training set and a validation set in an 8:2 ratio by random sampling; these were used to establish the predictive models and to perform internal validation, respectively. The TAN Bayesian model and the naive Bayesian model based on 15 clinical predictors were established using BayesiaLab software, and the R language was used to establish a nomogram model based on the independent predictors for the calculation of accuracy. Receiver operating characteristic (ROC) curves were constructed for the Bayesian models and the nomogram model, and the models were compared according to their areas under the ROC curve (AUCs). Finally, the BayesiaLab validation function was executed to perform a priori statistical analysis on the Bayesian model with high predictive efficiency; the positive margin was used as the target variable, and the remaining variables were used as predictor variables for a posteriori analysis. Based on the results of the a posteriori analysis, we calculated the Birnbaum importance measure and ranked the importance of each predictor variable (8).
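The random 8:2 split and the AUC comparison can be sketched as follows. This is a minimal standard-library illustration rather than the BayesiaLab or R workflow used in the study; the function names `split_80_20` and `roc_auc`, the seed, and the example scores are assumptions made for the example.

```python
import random

def split_80_20(records, seed=0):
    """Randomly split records into 80% and 20% subsets (first subset is used for model building)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random negative case.

    Ties count as one half (Mann-Whitney formulation of the area under the ROC curve).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Patient indices stand in for records; 238 matches the cohort size reported in the paper.
build_idx, validation_idx = split_80_20(range(238))

# Illustrative labels (1 = positive margin) and predicted probabilities, not study data.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based implementation would be preferable for much larger datasets.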

Correlation analysis of predictors

Positive results of univariate analysis
The differences between the positive and negative margin groups in TPSA, PSAD, biopsy Gleason score, positive needle ratio, T stage, abnormal signal location, and abnormal signal side were statistically significant (P < 0.05) (Table 1).

Results of logistic multivariable regression analysis of positive surgical margin
We conducted multivariable analysis on the indicators with statistically significant differences in the univariate analysis, and our results revealed that T stage and positive needle ratio were independent predictors of positive resection margin after prostate cancer surgery (Table 2).

Development of predictive models

Naive Bayesian network model and effectiveness evaluation
The 15 clinical predictors in Table 1 were included to establish a naive Bayesian prediction model (Figure 1A). In this figure, the red node represents the target variable and the blue nodes represent the predictor variables; the depth of the color indicates importance, with darker nodes being more important. The ROC curve was established using the model data (Figure 1B), and the AUC of the model was 81.43%. We applied the BayesiaLab validation function to calculate and rank the importance of the predictors. Clinical stage and positive needle ratio were in the first importance interval; PSAD was in the second interval, with an importance of 15-20; and location of the abnormal signal, Gleason score, TPSA, side of the abnormal signal, and PI-RADS score were in the third interval, with an importance of 10-15 (Figure 1C).
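To make the naive Bayesian calculation concrete, the sketch below implements a small categorical naive Bayes classifier with Laplace smoothing. It is an illustration under stated assumptions, not BayesiaLab's implementation: the class name `CategoricalNaiveBayes`, the smoothing constant, and the two-feature toy data (T stage group and positive needle ratio group) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Naive Bayes for categorical predictors with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # value_counts[(feature index, class)][value] -> number of occurrences
        self.value_counts = defaultdict(Counter)
        # levels[j] -> set of distinct values seen for feature j
        self.levels = [set() for _ in rows[0]]
        for row, y in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(j, y)][value] += 1
                self.levels[j].add(value)
        return self

    def predict_proba(self, row):
        total = sum(self.class_counts.values())
        log_post = {}
        for c in self.classes:
            # log prior plus the sum of log conditional probabilities
            lp = math.log(self.class_counts[c] / total)
            for j, value in enumerate(row):
                num = self.value_counts[(j, c)][value] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.levels[j])
                lp += math.log(num / den)
            log_post[c] = lp
        # Normalize in log space for numerical stability.
        peak = max(log_post.values())
        norm = sum(math.exp(lp - peak) for lp in log_post.values())
        return {c: math.exp(lp - peak) / norm for c, lp in log_post.items()}

# Hypothetical toy data: (T stage group, positive needle ratio group); 1 = positive margin.
toy_rows = [("T3", "high"), ("T3", "high"), ("T3", "low"),
            ("T2", "low"), ("T2", "low"), ("T2", "high")]
toy_labels = [1, 1, 1, 0, 0, 0]

model = CategoricalNaiveBayes().fit(toy_rows, toy_labels)
posterior = model.predict_proba(("T3", "high"))  # posterior[1] is 6/7 for this toy data
```

Each predictor contributes an independent factor to the posterior, which is why the graph in a naive Bayesian model has arrows only between the target node and each predictor node.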

TAN Bayesian network model and effectiveness evaluation
Figure 2A depicts the TAN Bayesian prediction model based on the 15 clinical predictors in Table 1, with the central node representing the target variable and the peripheral nodes representing the predictor variables. The ROC curve was established based on the model data (Figure 2B), and the AUC of this model was 80.80%. Among the variables in the TAN Bayesian model, PSAD was closely related to F/TPSA, TPSA, and positive needle ratio; the positive needle ratio was correlated with T stage; and the maximum transverse diameter of the tumor was correlated with abnormal signal location, abnormal signal side, and PI-RADS score.
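In the standard TAN construction, links between predictors are chosen by weighting each pair of predictors with their conditional mutual information given the target and keeping a maximum-weight spanning tree. The sketch below estimates that edge weight from categorical data. Whether BayesiaLab uses exactly this estimator is an assumption, and the function name and toy sequences are hypothetical.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from three equal-length sequences of categories."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x, y, c) * log[ p(x, y | c) / (p(x | c) * p(y | c)) ]
        total += (k / n) * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total

# Two predictors that move together within each class: weight is log(2).
dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1])

# Two predictors that are independent within the class: weight is 0.
independent = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```

Pairs with larger weights are the ones a TAN structure would tend to connect, which corresponds to the predictor-to-predictor relationships reported for Figure 2A.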

Nomogram model and efficiency evaluation
The nomogram model was constructed from the independent predictors identified in the multivariate analysis (Figure 3A); its ROC curve is shown in Figure 3B, with an AUC of 73.80%.

Discussion
Radical prostatectomy is one of the preferred treatment options for early localized and some locally advanced prostate cancer (9). Its goal is to eradicate the tumor, control disease progression, preserve urinary continence and sexual function as far as possible, and improve patients' quality of life. Most prostate cancer patients can achieve a clinical cure after radical prostatectomy, but postoperative complications and tumor recurrence can significantly affect quality of life and even the lifespan of patients (10, 11). A positive surgical margin is an important predictor of poor prognosis, and studies have revealed that patients with a positive surgical margin show a significantly increased risk of biochemical recurrence and even of tumor progression (12). In the present study, 35 of the 103 patients with positive surgical margins (34.0%) had biochemical recurrence within 1 year, compared with 8 of the 135 patients with negative surgical margins (5.9%). The probability of biochemical recurrence in our patients with positive surgical margins was thus 5.7 times that in patients with negative surgical margins, which is consistent with these reports. In addition, patients with positive surgical margins require further adjuvant therapy (13) (such as local radiotherapy and endocrine therapy) and carry an increased psychological burden (14). Local treatment may prolong the recovery of urinary control and cause complications such as radiation proctitis that affect patient quality of life (15). Therefore, if positive surgical margins can be effectively predicted before surgery, corresponding treatment strategies can be better formulated, and the positive surgical margin rate can be reduced, which will slow the progression of the disease and improve the overall quality of life of patients after surgery.
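The recurrence comparison above can be reproduced from the reported counts (35 of 103 and 8 of 135). The short sketch below is only a check of that arithmetic; the function name `risk_ratio` is hypothetical.

```python
def risk_ratio(events_a, n_a, events_b, n_b):
    """Return the risk in group A, the risk in group B, and their ratio."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    return risk_a, risk_b, risk_a / risk_b

# Counts reported in the text: biochemical recurrence within 1 year.
risk_pos, risk_neg, ratio = risk_ratio(35, 103, 8, 135)
summary = f"{risk_pos:.1%} vs {risk_neg:.1%}, ratio {ratio:.1f}"
# summary == "34.0% vs 5.9%, ratio 5.7"
```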
In view of the adverse effects of a positive surgical margin on prognosis, a model that clinicians can use to evaluate the risk of positive surgical margins and the benefits of surgery is particularly important. For patients with a low predicted probability of positive surgical margins, early radical prostatectomy can be expected to yield greater surgical benefit. For patients with a high probability of a positive surgical margin, preoperative neoadjuvant endocrine therapy can reduce the volume of the prostate tumor, lower the tumor stage, and allow surgery to be timed according to the patient's condition; this will, in turn, reduce the probability of a positive surgical margin and favor a good radical treatment effect. The nomogram, a commonly used clinical prediction model, transforms complex regression equations into simple visual graphs. Chinese researchers have previously generated a nomogram for the risk of positive surgical margins in prostate cancer based on preoperative factors, and it was subsequently confirmed to have acceptable predictive value (16). However, nomograms also have limitations. A nomogram model that contains too many predictors is prone to overfitting, and the underlying regression is subject to strict conditions. For example, the dependent variable in the logistic regression model should follow a distribution from the exponential family, and regression models are built primarily on statistically significant factors. Bayesian network analysis, as a machine learning method combining probability theory with graph theory, uses conditional probabilities to analyze the structure of a problem and is often used to build models for disease prediction (17), treatment effect evaluation (7), and diagnosis and treatment decision making (18); importantly, it has shown acceptable efficiency.
In this study, naive Bayesian network, TAN Bayesian network, and nomogram models were used to predict positive surgical margins after radical prostatectomy; ROC curves were constructed for each model, and the AUCs were calculated to compare the models. Our results showed that the AUC of the naive Bayesian model based on 15 predictors was 81.43%, which was higher than the 80.8% of the TAN Bayesian network model and the 69.2% of the nomogram model based on the same dataset, thus reflecting good predictive efficiency. We hypothesize that this high predictive performance has two explanations. First, the construction of the Bayesian network was not limited to independent predictors but also included nonindependent predictors that nevertheless influence the outcome; only by integrating as many influential factors as possible can a prediction come close to the actual situation. Second, Bayesian network analysis tolerates a small amount of missing data and uses inference algorithms to estimate the missing values, thereby avoiding the reductions in sample size and accuracy that missing data would otherwise cause.
In addition, TAN Bayesian network models show not only the relationships between each predictor variable and the target variable but also the relationships among the predictor variables themselves (19). This compensates for a shortcoming of classical statistical models, in which complex relationships among multiple variables are difficult to resolve. In this study, PSAD was closely related to F/TPSA, TPSA, and the positive needle ratio, and the maximum transverse diameter of the tumor was correlated with abnormal signal location, abnormal signal side, and PI-RADS score. In contrast to the TAN Bayesian network, the naive Bayesian model captures only the relationship between each predictor variable and the target variable and assumes by default that the predictor variables are not correlated with one another. The correlations between predictor variables in this study were small, which may explain why the predictive performance of the naive Bayesian model was higher than that of the TAN Bayesian model. In practice, different Bayesian models can be selected according to the correlations between predictor variables so as to achieve optimal predictive ability (20).
With continuing progress in science and technology, an increasing number of machine learning algorithms have been applied to research on clinical issues (15). Computer algorithms can mine the relationships within clinical data automatically; this can compensate for the shortcomings of relying on summaries of personal experience and thus provide a simpler and more efficient way to solve clinical problems (21-23).
This study was an attempt to address a difficult clinical problem with a computer algorithm, and the resulting model showed high accuracy. With this model, the risk of postoperative positive surgical margins can be predicted for individual patients. We recommend that early radical prostatectomy be performed in low-risk patients so as to improve surgical benefit. This study also has limitations: the sample size was relatively small, it was a single-center study, and without external validation the predictive performance of the model in other populations remains unclear. However, we posit that, with the addition of a large amount of multi-center data and the continuous improvement of models using machine learning, the Bayesian prediction model of positive surgical margins after radical prostatectomy can provide a robust basis for clinical decision making.
FIGURE 1 (A) Naive Bayesian model for predicting positive surgical margins after radical prostatectomy. BMI, body mass index; ASL, abnormal signal location; PV, prostatic volume; MD, maximum diameter of abnormal signal; Mul, length of the membranous urethra; ASS, abnormal signal side; ST, surgery type; GS, Gleason score. (B) Naive Bayes ROC curve. (C) Naive Bayesian significance analysis.

TABLE 1
Univariate analysis between positive margin group and negative margin group.

TABLE 2
Multivariable analysis of postoperative margin-positive and -negative groups.