Multiparametric MRI and Machine Learning Based Radiomic Models for Preoperative Prediction of Multiple Biological Characteristics in Prostate Cancer

Objectives This study aims to develop and evaluate multiparametric MRI (MP-MRI)-based radiomic models as a noninvasive diagnostic method to predict several biological characteristics of prostate cancer. Methods A total of 252 patients were retrospectively included who underwent radical prostatectomy and MP-MRI examinations. The prediction characteristics of this study were as follows: Ki67, S100, extracapsular extension (ECE), perineural invasion (PNI), and surgical margin (SM). Patients were divided into training cohorts and validation cohorts in the ratio of 4:1 for each group. After lesion segmentation manually, radiomic features were extracted from MP-MRI images and some clinical factors were also included. Max relevance min redundancy (mRMR) and recursive feature elimination (RFE) based on random forest (RF) were adopted to select features. Six classifiers were included (SVM, KNN, RF, decision tree, logistic regression, XGBOOST) to find the best diagnostic performance among them. The diagnostic efficiency of the construction models was evaluated by ROC curves and quantified by AUC. Results RF performed best among the six classifiers for the four groups according to AUC values (Ki67 = 0.87, S100 = 0.80, ECE = 0.85, PNI = 0.82). The performance of SVM was relatively the best for SM (AUC = 0.77). The number and importance of DCE features ranked first in the models of each group. The combined models of MP-MRI and clinical characteristics showed no significant difference compared with MP-MRI models according to Delong’s tests. Conclusions Radiomics models based on MP-MRI have the potential to predict biological characteristics and are expected to be a noninvasive method to evaluate the risk stratification of prostate cancer.


INTRODUCTION
Prostate cancer (PCa) is the highest incidence cancer and the second leading cause of death among men according to the latest statistics in 2021 (1). Early and precise detection of prostate cancer and subsequent appropriate treatment decisions play an essential role for patients (2,3).
Nowadays, some tumor biomarkers and biological characteristics have been proved to be useful for evaluating the malignant potential of prostate cancer and may influence the treatment decision-making (4)(5)(6). Ki67 as a biomarker of cell proliferation, which is expressed in all phases except resting (G0) phase of the cell cycle, has been demonstrated to be an independent prognostic factor in low volume and grade prostate cancer (7). According to the results of the Mayo model, when the expression of Ki67 increased by 1%, the cancer-specific mortality would increase by1 2% after radical prostatectomy (RP) accordingly (8). S100 is a family of acidic calcium-binding proteins and was found to be upregulated in various tumors (9). Aberg et al. revealed two subtypes of S100 were significantly correlated with short progression-free survival in prostate cancer patients with metastases (10). Extracapsular extension (ECE) could be used as an indication of local advanced prostate cancer (cT3). The positive ECE would increase the risk of death to 5 times than the negative ECE for patients after undergoing radical prostatectomy (11). Surgical margin (SM) is determined by pathological staining of RP specimens. Numerous studies have disclosed that positive SM increased cancer-specific mortality and the likelihood of biochemical recurrence of patients significantly (12)(13)(14). Prostate cancer tends to invade and grow along nerves, and it is also considered to be a potential metastatic route, which is called perineural invasion (PNI) (15,16). PNI has been documented to be associated with biochemical recurrence (BCR) and promoting tumor aggressiveness (16,17). Therefore, judging these biological characteristics before operation can better evaluate the invasiveness of tumor and may change the clinical decision-making patterns in the future.
Definitely, biopsy can solve some of the above problems to a certain extent and is still the mainstream method. It is reported that the combined technique of fusion targeted and systematic biology has been proved to be helpful to improve the diagnostic accuracy (18). However, the defects of sampling errors and a series of subsequent complications, such as pain and hematuria (19), limit the real-time monitoring and accurate evaluation of biological characteristics by biopsy. Multiparametric-magnetic resonance imaging (MP-MRI) is one of the most accurate noninvasive methods to evaluate local lesions, which contains T1 and T2 sequences that provide anatomical and disease information, as well as other sequences that provide functional information, such as diffusion-weighted imaging (DWI), dynamic contrast enhanced (DCE), and magnetic resonance spectroscopy (MRS) (20). As a routine screening method for prostate cancer, MP-MRI can reflect the phenotype and heterogeneity of prostate cancer by signal intensity and enhancement features (21,22). Furthermore, MP-MRI images may contain many clinically valuable information related to the different biological characteristics above, such as ECE (23), which may be hard for radiologists to dig out in clinical practice.
Currently, radiomics serves as a novel approach that extracts abundant quantitative features with high throughput, and through machine learning methods to establish prediction models, which were proved to effectively provide more potential useful information for the clinical practice in urology (24,25). Radiomics of prostate cancer has been widely used in tumor identification, staging, and prognosis evaluation (26,27). However, more comprehensive and accurate prediction models that can determine the risk stratification and provide references for clinical decision-making for prostate cancer still need to be explored.
Thus, in the present study, we attempted to establish and validate the radiomic predictive models for five biological characteristics related to aggressiveness (Ki67, S100, ECE, PNI, SM) of prostate cancer based on MP-MRI. In addition, some clinical information was also added to establish the corresponding combined models.

Patients, Pathological Evaluation, and MRI Acquisition
This work was approved by the Institutional Review Board (IRB) of Shanghai General Hospital (2021KY107), and the patient's informed consent was authorized to be waived according to the nature of the research. This retrospective study collected patients who underwent radical prostatectomy from May 2013 to January 2020. The exclusion criteria are as follows: (1) Preoperative DCE-MRI were unavailable (n = 149); (2) No mass lesion found on MRI image (n = 9); (3) Missing DWI (n = 3); (4) Poor imaging quality (n = 10); (5) Missing clinical information (n = 5); (6) Biopsy before MRI leading to unclear lesions (n = 32); and (7) Previous treatment before MRI examinations (n = 12). Finally, we recruited 252 patients as our subjects.
The clinical and pathological information we collected in this study is as follows: age, prostate serum antigen (PSA), white blood cell (WBC), red blood cell (RBC), hemoglobin, lymphocyte, platelet, albumin, alkaline phosphatase (ALP), platelet-to-lymphocyte ratio (PLR), fibrinogen, surgical Gleason score, immunohistochemistry (Ki67, S100, AR), SM, ECE, PNI, seminal vesicle invasion (SVI), and lymphatic vascular invasion (LVI). Details of the above indicators can be found in Supplementary Table S2. Considering the importance of biological characteristics of prostate cancer mentioned above, and the routine indexes of pathological examination, as well as data distribution (balanced or imbalanced), we selected five of them (Ki67, S100, ECE, PNI, SM) as our research indicators. Each of these indicators was classified as a group, and we divided each group into training cohort and validation cohort in the ratio of 4:1. All indicators were divided into positive and negative in the form of binary classification, except Ki67, which was divided into high expression and low expression with 10% as the threshold according to previous studies (7). The gold standard references of this research were based on the results of radical prostatectomy.
Three MRI sequences were included in this study: T2, DWI, DCE (arterial phase), and this combination also meets the PI-RADS v2 (Prostate Imaging-Reporting and Data System, Version 2) standard of MP-MRI (28). The protocols of MRI examinations are described in Supplement A.

Tumor Segmentation
The patient's images and clinical data were imported into the Darwin research platform (https://arxiv.org/abs/2009.00908) for subsequent tumor lesion delineation and model establishment. The work flow is shown in Figure 1. The boundary of the volume of interest (VOIs) on each axial-DWI picture was manually delineated by radiologist 1 (JC, 5 years of experience in urinary imaging). The ROIs on DWI were then copied to the sequences of T2 and DCE. If some of the copied results of the two sequences were not ideal, further modifications were made. Next, radiologist 2 (RC, 8 years of experience in urinary imaging) would review the segmentation results. If there was any objection to the results, the results would be discussed and resegmented until a consensus was reached.

Feature Extraction
After finishing segmentation, the feature extraction of lesions was carried out by PyRadiomics package. The original feature classes contain first-order, shape, and texture features. First-order features refer to the distribution of voxel intensities through general and basic metrics, such as range, mean, variance, and kurtosis. Texture features embody: Gray-Level Cooccurrence Matrix (GLCM); Gray-Level Run Length Matrix (GLRLM); Gray-Level Size Zone Matrix (GLSZM); Neighboring Gray-Tone Difference Matrix (NGTDM); and Gray-Level Dependence Matrix (GLDM). Furthermore, we use eight filters and the original images were derived into eight kinds of filtered transformed images: Laplacian of Gaussian (LoG), wavelet, square, square root, logarithm, exponential, gradient, and local binary pattern (LBP). Except shape features, the first-order and texture features mentioned above can also be extracted from the derived images. Due to a single MRI sequence containing 1,781 features, MP-MRI produced 5,343 features in total for this study. The detailed description of the image features mentioned above can be found in https://pyradiomics.readthedocs.io/en/latest/features.html.

Feature Selection
Firstly, the extracted feature data were subtracted by the mean and then divided by the variance to achieve data normalization for subsequent comparison. Next, in order to reduce the over fitting of data and find the optimal correlation features, max relevance min redundancy (mRMR) was adopted to find the top 20 features for ECE, PNI, and SM groups and 15 features for Ki67 and S100 groups in the training cohort. Due to some machine learning, classifiers themselves can evaluate the importance of features and find the best feature combination through multiple iterative calculations. Therefore, the recursive feature elimination (RFE) based on random forest (RF) was applied to find the best feature combination step by step based on accuracy.

Model Construction
In this study, six classifiers were included to find the best diagnostic performance among them: support vector machine (SVM), K-nearest neighbor (KNN), random forest, decision tree, logistic regression, and XGBOOST. Support vector machine was  based on polynomial kernel function, and the tolerance for misclassified samples was set by the specific penalty coefficient C (from 0.0001 to 1,000). The best k value (number of neighbors) for KNN was found by training in the range of 3-10. For random forest, decision tree, and XGBOOST, the maximum tree depth was constrained to avoid overfitting. To find out whether the clinical data improved the diagnostic performance, several clinical data (age, PSA, WBC, RBC, hemoglobin, lymphocyte, platelet, albumin, ALP, PLR, fibrinogen) were selected to build the clinical models. Meanwhile, they were integrated into the corresponding MP-MRI radiomic models to construct the combined models. The parameters used in the model construction are described in Supplementary Table S1.

Model Evaluation and Statistical Analysis
The diagnostic efficiency of the models was demonstrated by receiver operating characteristic (ROC) curves and quantified by the area under the curve (AUC). The calibration curve shows the consistency between the prediction model and the actuality.
What is more, the decision curve analysis (DCA) illustrated the clinical net benefits brought by the prediction model. The diagnostic ability of MP-MRI models and combined models was compared by DeLong's test. The overall comparison of PSA in each group was through Mann-Whitney U test. The case distribution between validation cohorts and training cohorts was compared by Chi-square test. All statistical analysis was performed by R (version 4.0.2). The statistically significant level was set at 0.05.

Demographics
In this study, a total of 252 PCa patients were included, and the flowchart of patients' recruitment is depicted in Supplementary   Figure S1. The baseline characteristics of PCa patients could be found in Supplementary Table S2. The mean age of the included patients was 68.4 years (50-84 years), and their surgical Gleason score is mainly distributed in 7 (64.7%). According to their pathological results, the included patients were divided into 5 groups: Ki67 (n = 140), S100 (n = 158), ECE (n = 232), PNI (n = 225), and SM (n = 248). As shown in Table 1, the expression of PSA was significantly different in ECE (p < 0.01), PNI (p = 0.03), and SM (p < 0.01) groups, and it was relatively not significant in Ki67 (p = 0.08) and S100 (p = 0.12) groups. Meanwhile, the case composition between the cohort training and validation cohort was roughly the same in each group (p > 0.05).

Feature Selection
After applying mRMR to the features extracted from MP-MRI, the top 20 features in ECE, PNI, and SM groups and 15 features in Ki67 and S100 groups were obtained in the training cohort. RFE-RF then selected the resulting features and achieved the best performing feature combination as shown in Figures Figure S2 for a better result demonstration. Next, principal component analysis (PCA) was performed to extract principal components of features in each group and reduce the dimensions, which made the division of cases in each group more intuitive according to their feature values. As displayed in Figures 2E-H, the selected features could satisfactorily distinguish the positive and negative cases on PCA (for Ki67, they were ≥10% and <10%), especially in the ECE and PNI groups, which successfully divided the cases with different labels into left sides and right sides. According to the heat maps in Figure 3, the color of positive cases or high Ki67 expression cases was generally darker, and the color in heatmaps referred to the values of the selected feature. This also proved the ability of the features themselves to distinguish the biological characteristics of patients. Supplementary Figure S3 showed that the correlation among radiomics features was weak, indicating there was low redundancy among selected features.

Comparison of Different Classifiers
The six classifiers (SVM, KNN, random forest, decision tree, logistic regression, and XGBOOST) and their AUC in each group are listed in Table 2. In general, the performance of random forest was the best according to AUC values, and we chose random forest as the prediction models for the four groups (Ki67 = 0.87, S100 = 0.80, ECE = 0.85, PNI = 0.82). As for SM, the performance of SVM was relatively the best (AUC = 0.77), so SVM with polynomial kernel function was selected as the optimal classifier for SM.

Performance of MP-MRI and Combined Prediction Models
The optimal MP-MRI models of the five groups performed satisfactorily both in the training cohorts and validation cohorts ( Figure 4; Table 3). The prediction model of Ki67 performed best among the five groups, whose AUC value reached 0.88 in the validation cohort. The second-best model was ECE with AUC value = 0.85. The AUC values of the three remaining models in the validation cohort were 0.80 for S100, 0.82 for PNI, and 0.77 for SM. As for clinical factors, after REF-RF selection, 2 characteristics were included for Ki67 (PSA, PLR), 1 for S100 (PSA), 4 for ECE (PSA, WBC, PLR, and ALP), 5 for PNI (PLR, age, fibrinogen, PSA, and albumin), and 4 for SM (PSA, fibrinogen, albumin, and lymphocyte). This displayed the level of PSA might be helpful to distinguish five biological characteristics to some extent. Clinical characteristics were then added to the MP-MRI model to form the combined models. As a result, in the training cohort, the combined model was significantly better than the MP-MRI models except the Ki67 group based on Delong's tests (p < 0.05). Nevertheless, in the validation cohort, there was no significant difference between the two groups (p > 0.05).
Furthermore, the importance of the features in the combined models is demonstrated in Figure 5. The number and importance of DCE features ranked first in models of each group, followed by DWI, and finally T2. This also revealed that DCE sequences could provide more information for predicting the malignant degree of prostate cancer. In addition, in Figure 6, calibration curves displayed the consistency between the prediction model and the actuality was favorable, and when the risk threshold is greater than about 0.1, the prediction model could bring more clinical net benefits according to the DCA. Finally, we provide the examples of VOI delineation on MP-MRI and the corresponding 3D constructions images in Figure 7.

DISCUSSION
In this study, we constructed the machine-learned radiomic models based on six classifiers for the five biological characteristics (Ki67, S100, ECE, PNI, and SM) related to the invasiveness of prostate cancer. ROC curves showed that the  diagnostic abilities of these models were ideal with AUC values all greater than 0.8 in the validation cohorts (except SM = 0.77). Meanwhile, we added several clinical characteristics to make the combined models, though they barely improved the accuracy of prediction. For traditional diagnostic methods, like serum PSA test, digital rectal examinations (DREs), and prostate biopsy, they inevitably have many major deficiencies (19,(29)(30)(31). The main deficiencies are that they may lead to overdiagnosis of prostate cancer and missed diagnosis of clinically significant cancer (32)(33)(34). As shown in Supplementary Table S2, 22.6% of patients showed Gleason score 6 or less, suggesting that a relevant number of cancers below the threshold which is currently considered clinically significant cancer and leading to overdiagnosis and overtreatment. MP-MRI as a noninvasive method has been recommended as a routine examination of prostate cancer and proved to be beneficial in the detection of clinically significant cancer (21,22). Recently, the NCCN guidelines clearly pointed out that MP-MRI was helpful to the staging and risk stratification of prostate cancer, and its combination with several biomarkers could reduce unnecessary biopsy (35). Moreover, MP-MRI contains much clinically valuable information, which has not attracted enough attention in clinical practice. Recently, artificial intelligence, such as radiomics, has shown great potential for evaluating the aggressiveness of urological tumors (36). Therefore, radiomics could be used as a novel and efficient way to dig out the information (24). Radiomics has been applied to predict many aspects of prostate cancer, such as cancer diagnosis, Gleason score, treatment response, and early biochemical recurrence (37).
Nowadays, using radiomics to predict multiple biological characteristics of tumors simultaneously has become a trend. In the research of Meng et al. (38), they proposed the radiomic models based on MP-MRI have the ability to predict multiple     (AUC = 0.85). He et al. (40) used MP-MRI radiomics to predict ECE (AUC = 0.728, also lower than ours) and SM (AUC = 0.76, similar to ours), yet they did not comprehensively evaluate the aggressiveness of prostate cancer as ours. Therefore, our comprehensive radiomic models made it possible to predict more critical biological characteristics of prostate cancer and improve the prediction accuracy of some biological characteristics compared with the other published AI models.
In the present study, we extracted as many features as the recent literature documented. We then adopted an efficient feature selection method-mRMR, which has been proved advanced in a majority of researches (38,41), to obtain the most relevant and least redundant features. In addition, the low redundancy of selected features could be testified by the correlation maps in Supplementary Figure S3. RFE-RF then ensured the best combinations of the included features. More and more studies use RFE-RF to select the best feature combinations, yet it needs a large amount of computation so that it is suitable for low-dimensional data after primary selection (42).
For the resulting radiomic features of each group, wavelet features account for larger proportions: 6/13 for Ki67, 5/13 for S100, 9/16 for ECE, 2/8 for PNI, and 7/20 for SM. Wavelet features are derived from the wavelet transform and represent The bold values represent the AUC of the classifiers that perform best in each subgroup. SVM, support vector machine; KNN, K-nearest neighbor.  high-dimensional features that cannot be easily deciphered by humans (43). Wavelet features could show the heterogeneity of tumors, and multiple studies have proved its strong prediction ability (38,44). In addition, DCE sequences occupied a large part of the models. The reason may be that prostate cancer has strong ability of neovascularization, and the morphology and density of neovascularization are closely related to metastasis and prognosis of patients (45,46). DCE-MRI is exactly a fairly adequate way to demonstrate neovascularization. Also, because of the increased vascular permeability of prostate cancer, we chose the arterial phases of DCE sequences to delineate the lesions (28). Classifiers play an essential role in machine learning. The six classifiers that were included in this study: SVM, KNN, random forest, decision tree, logistic regression, and XGBOOST. As a result, the classification performance of random forest was generally the best among them. Random forest is composed of a large number of decision trees, and its prediction result is averaged by all the tree predictions, so it can effectively avoid over fitting. It has been documented that random forest occupied a large part of the Kaggle Data Science Competitions and ranked first among 179 classifiers (47).
It is inevitable that there are some coexisting prostatic diseases in patients with prostate cancer. In our study, coexisting diseases contained benign prostatic hyperplasia, chronic prostatitis, prostatic cyst, etc. However, they did not have a great impact on our study and our models still achieved favorable distinguishing ability. The reason may be that the features selected by our screening methods have strong specificity for the corresponding biological characteristics, and many of them reflect the complexity of the lesions, such as texture features (48). For coexisting diseases like prostatic hyperplasia, the density of lesions was relatively more consistent on MP-MRI and they would not make a remarkable difference to the accuracy of our models. The established model should be more applicable to clinical reality. If they were only applied to target diseases and excluding coexisting diseases, the clinical application of the models would be seriously limited.
Although radiomics shows huge potential for the improvement of clinical diagnosis and risk stratification, its practical clinical application is still subject to many difficulties, and its real benefits are required to be further confirmed in prospective cohort studies (49). However, radiomics plays an increasingly important role in medical imaging, and it provides a unique basis for personalized precision treatment (50). In our study, we proved the applicability of radiomics in predicting the multiple biological characteristics of prostate cancer, and we also provided relatively detailed protocol for MP-MRI and key machine-learning parameters to offer a reference for the standardization work in the future (51). The next main steps of radiomics could be to take advantage of deep learning methods (for example, U-Net) to delineate the ROI automatically and to prove the robustness of the radiomic models through multicenter, prospective, randomized-controlled trials (52).
This study had the following limitations: Firstly, this study was a retrospective and single-center study, and this inevitably led to selection bias and lack of samples and external verification. Secondly, some valuable biological characteristics or biomarkers were not included in the model due to incomplete data, for example, gene mutation data, which had great guiding significance for clinical treatment. Thirdly, our models were not as intuitive as a nomogram due to the algorithm of random forest and SVM with polynomial kernel function. Fourthly, the delineation of lesions was performed manually instead of computer-aided, which may lead to inconsistencies in clinical practice. Therefore, our next research focus will be put on multicenter, prospective, more clinically feasible, large-scale, and valuable indicator-based studies.

CONCLUSION
The present work associated the radiomics features of MP-MRI with five biological characteristics related to the aggressiveness of prostate cancer. The established comprehensive models made it possible to predict more critical biological characteristics of prostate cancer and achieved favorable prediction abilities. Therefore, the models are expected to noninvasively evaluate the risk stratification of prostate cancer and provide valuable guidance for clinical decision-making.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board (IRB) of Shanghai General Hospital (2021KY107). The ethics committee waived the requirement of written informed consent for participation. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
XY, HW, and XF designed the study. NX, TL, RC, and HY acquired the data. MH, ZW, and HL analyzed the data. XF and NX wrote the report, which was edited by all authors. YW and XY supervised the project. All authors contributed to the article and approved the submitted version.

FUNDING ACKNOWLEDGMENTS
We would like to thank Yizhun Medical AI Technology Co. Ltd, who kindly provided the DARWIN platform and some technical support.