Multimodal data integration for predicting progression risk in castration-resistant prostate cancer using deep learning: a multicenter retrospective study

Purpose: Patients with advanced prostate cancer (PCa) often develop castration-resistant PCa (CRPC), which carries a poor prognosis. Prognostic information obtained from multiparametric magnetic resonance imaging (mpMRI) and histopathology specimens can be effectively utilized through artificial intelligence (AI) techniques. The objective of this study was to construct an AI-based CRPC progression prediction model by integrating multimodal data.
Methods and materials: Data from 399 patients diagnosed with PCa at three medical centers between January 2018 and January 2021 were collected retrospectively. We delineated regions of interest (ROIs) on three MRI sequences (T2WI, DWI, and ADC) and used a cropping tool to extract the largest section of each ROI. We selected representative pathological hematoxylin and eosin (H&E) slides for deep-learning model training. A nomogram of the combined model was constructed. ROC curves and calibration curves were plotted to assess the predictive performance and goodness of fit of the model. We generated decision curve analysis (DCA) curves and Kaplan–Meier (KM) survival curves to evaluate the clinical net benefit of the model and its association with progression-free survival (PFS).
Results: The AUC of the machine learning (ML) model was 0.755. The best deep learning (DL) model for radiomics and pathomics was the ResNet-50 model, with AUCs of 0.768 and 0.752, respectively. The nomogram showed that the DL model contributed the most, and the AUC of the combined model was 0.86. The calibration curves and DCA indicated that the combined model had good calibration and net clinical benefit. The KM curve indicated that the model integrating multimodal data can guide patient prognosis and management strategies.
Conclusion: The integration of multimodal data effectively improves the prediction of risk for the progression of PCa to CRPC.


Introduction
Prostate cancer (PCa) affects men worldwide and is a significant health concern, with a global incidence rate of 13.5% (1). Additionally, the mortality rate of 6.7% makes PCa the fifth leading cause of death among men (2). Androgen deprivation therapy (ADT) is considered the primary treatment modality for men diagnosed with advanced symptomatic PCa, also known as castration-sensitive PCa (CSPC) (3). However, after an initially favorable treatment response, patients with PCa frequently show a declining response and eventually progress to CRPC, which is characterized by a dismal prognosis (3). The median duration and mean survival period of patients until progression to CRPC range from 18 to 24 months and 24 to 30 months (4,5), respectively. Castrate serum testosterone levels (testosterone [TST] <50 ng/dL or 1.7 nmol/L) and subsequent disease progression (a sustained rise in prostate-specific antigen [PSA] and progression seen on imaging) are now the two most important criteria for detecting CRPC. However, tailored precision medicine is limited by the use of monomodal indicators such as PSA and serum testosterone (6,7). The early detection of CRPC can help physicians determine the optimal timing for administering second-line therapies, possibly increasing the survival rate among patients. Predicting the risk of CRPC is an important factor affecting prognosis in patients with severe PCa. There is an urgent need for early diagnosis and precise management of CRPC.
Despite advancements in technology, there are still persistent challenges in accurately detecting, characterizing, and monitoring cancers (8). The assessment of diseases through radiographic methods primarily relies on visual evaluations, which can be enhanced by advanced computational analyses. Notably, AI holds the potential to significantly improve the qualitative interpretation of cancer imaging by expert clinicians (9). This includes the ability to accurately delineate tumor volumes over time, infer the tumor's genotype and biological progression from its radiographic phenotype, and predict clinical outcomes (10). Radiomics and pathomics have rapidly emerged as cutting-edge techniques to aid and enhance the interpretation of vast medical imaging data, which may benefit clinical applications. These techniques can process images directly, giving rise to numerous subdomains for further research (11). Clinical outcomes, such as survival, response to treatment, and recurrence, may be accurately predicted using AI models that use multimodal data (12-14). Radiomics and pathomics show significant promise for improving clinical decision-making and, ultimately, patient outcomes via medical imaging techniques (15-17).
Hence, to effectively and precisely anticipate the likelihood of developing CRPC without invasive procedures, we constructed radiomics and pathomics prediction models based on deep-learning algorithms and investigated their application value in clinical decision-making and the prognosis of PCa. This may allow more accurate prediction of the risk of CRPC and provide a reference for accurate diagnosis and treatment of PCa.

Materials and methods
Clinicopathological data from patients with PCa were acquired retrospectively from the electronic medical record system of the three centers (center A; center B; center C) after receiving approval from the ethics committee of the local institution. This retrospective study was also approved by the Ethics Committee of the Gansu Provincial Geriatrics Association (2022-61), and the requirement for informed consent was waived. Our research program was designed based on the AI model of a local institution.

Participants
We conducted a retrospective study including patients with a pathologically confirmed diagnosis of PCa from the three centers between January 2018 and February 2021. The inclusion criteria were (a) first pathological diagnosis of PCa; (b) use of the same ADT treatment regimen; (c) availability of all MRI scans within 30
Clinical data from 399 patients with PCa were collected, including 254 from the Gansu Provincial Hospital (Center A), 112 from the 940 Hospital of Joint Logistics Support Force of Chinese PLA (Center B), and 33 from the Second People's Hospital of Gansu Province (Center C). Figure 1 shows the flowchart for patient recruitment.

Prostate tumor segmentation
A radiologist (R.W) with 5 years of experience in prostate MRI diagnosis and a urologist (FH.Z) with 30 years of experience in PCa MRI diagnosis delineated the regions of interest (ROIs). Disagreements regarding individual lesions were resolved after consultation with a third radiologist (LP.Z), and a consensus was attained. The readers were unaware of the patients' CRPC status and adhered to the guidelines outlined in the Prostate Imaging Reporting and Data System Version 2 (PI-RADS-V2). Once the delineation of the ROIs was finalized, 11 of the features extracted from the ADC sequences were randomly selected. Subsequently, Mann-Whitney U tests were conducted on both sets of features to check for potential bias between the delineations of the two experts (R.W and FH.Z). The main sequence parameters of mpMRI are listed in Supplementary Table 1. The ITK-SNAP software, version 4.0.0 (http://itk-snap.org), was used to annotate the ROIs for each patient on three sequences: T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and apparent diffusion coefficient (ADC). The volume of interest was created by overlapping the ROIs of each patient. To pretrain the DL model, 2-dimensional (2D) ROIs were extracted from the original images of the three sequences by using a clipping tool based on the tumor's 3D segmentation mask. The standard protocol of Digital Imaging and Communications in Medicine (DICOM) is commonly used for managing medical imaging information and related data. To ensure data quality, we resampled all images to a resolution of 1 cm × 1 cm × 1 cm and performed N4 bias correction on all images before delineation.
A pathologist (X.Z) selected a histopathological hematoxylin and eosin (H&E) slide (20×10 magnification) of a typical tumor area as the pathological image for the patient. To prevent data heterogeneity, we used Photoshop to adjust each histopathological slide to the same pixel size (640×480) for pretraining the DL model. Overall, 141 patients from Center A were included in the training group, while 60 patients from Center B and Center C were included in the external validation group for building ML and DL models.
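The extraction of the largest 2D section from a 3D segmentation mask can be illustrated with a minimal sketch. The clipping tool used in the study is not specified, so the function names (`largest_section_index`, `bounding_box`, `crop_largest_roi`), the nested-list volume layout, and the toy data below are assumptions made purely for illustration.

```python
from typing import List, Tuple

Volume = List[List[List[int]]]  # indexed as [slice][row][col]


def largest_section_index(mask: Volume) -> int:
    """Return the index of the slice whose mask covers the most pixels."""
    areas = [sum(sum(row) for row in sl) for sl in mask]
    return max(range(len(areas)), key=areas.__getitem__)


def bounding_box(mask2d: List[List[int]]) -> Tuple[int, int, int, int]:
    """Return (row_min, row_max, col_min, col_max) of nonzero pixels, inclusive."""
    rows = [r for r, row in enumerate(mask2d) if any(row)]
    if not rows:
        raise ValueError("mask slice is empty")
    cols = [c for c in range(len(mask2d[0])) if any(row[c] for row in mask2d)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop_largest_roi(image: Volume, mask: Volume) -> List[List[int]]:
    """Crop the image slice with the largest ROI area to the ROI bounding box."""
    k = largest_section_index(mask)
    r0, r1, c0, c1 = bounding_box(mask[k])
    return [row[c0:c1 + 1] for row in image[k][r0:r1 + 1]]


# Toy 3-slice volume (hypothetical values): the middle slice holds the largest ROI.
mask = [
    [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]],
]
image = [[[s * 100 + r * 10 + c for c in range(4)] for r in range(3)] for s in range(3)]
patch = crop_largest_roi(image, mask)
```

In practice the same logic would be applied per sequence (T2WI, DWI, ADC) to array data loaded from the image files; the sketch only shows the slice selection and cropping step.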

Radiomics signature construction
PyRadiomics (http://www.radiomics.io/pyradiomics.html) was used for extracting radiomics features. Features were standardized with the Z-score, i.e., by subtracting the mean of each feature and dividing by its standard deviation ((column − mean)/standard deviation). The Spearman correlation coefficient was used to evaluate the consistency among observers in feature extraction. Features with a correlation coefficient greater than 0.9 were considered reliable and formed a feature set for subsequent analysis. The least absolute shrinkage and selection operator (LASSO) algorithm was used for feature selection and construction, with multiple iterations to assess the importance of each feature. Lastly, ML classifiers, such as logistic regression (LR) and support vector machines (SVM), were utilized to build the predictive models.
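The standardization and inter-observer reliability steps can be sketched as follows. This is a minimal illustration and not the study's code: the helper names (`z_score`, `spearman`, `reliable_features`), the use of the population standard deviation, and the toy feature values are all assumptions.

```python
import math
from typing import Dict, List


def z_score(values: List[float]) -> List[float]:
    """Standardize one feature column: (value - mean) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population SD (assumed)
    if std == 0:
        return [0.0] * n  # a constant feature carries no information
    return [(v - mean) / std for v in values]


def ranks(values: List[float]) -> List[float]:
    """Average 1-based ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def reliable_features(reader_a: Dict[str, List[float]],
                      reader_b: Dict[str, List[float]],
                      threshold: float = 0.9) -> List[str]:
    """Keep features whose values agree between the two readers (rho > threshold)."""
    return [name for name in reader_a
            if spearman(reader_a[name], reader_b[name]) > threshold]


# Hypothetical feature values from two readers over five patients.
reader_a = {"feat_1": [1.0, 2.0, 3.0, 4.0, 5.0], "feat_2": [1.0, 2.0, 3.0, 4.0, 5.0]}
reader_b = {"feat_1": [1.1, 2.3, 2.9, 4.2, 5.5], "feat_2": [5.0, 1.0, 4.0, 2.0, 3.0]}
kept = reliable_features(reader_a, reader_b)
standardized = z_score(reader_a["feat_1"])
```

The retained, standardized features would then be passed to LASSO for selection and to a classifier such as SVM; those steps are omitted here.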

DL signature construction
In this study, ResNet-50, ResNet-34, ResNet-18, VGG19, and other deep transfer learning (DTL) models were used for model pretraining. The number of iterations (epochs) was set to 100, with a batch size of 32. ImageNet was employed as the regularization method. To enhance the interpretability of the model's decision-making process, we applied the Gradient-weighted Class Activation Mapping (Grad-CAM) method for visual analysis of the model. This method utilizes the gradient information from the last convolutional layer of the neural network to generate a weighted fusion of the class activation map. This class activation map highlights the important regions of the classified target image, thereby allowing us to better understand the decision-making principles of the model.
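The weighted fusion performed by Grad-CAM can be sketched in a few lines. The example assumes that the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been obtained from the network; the function name `grad_cam`, the final scaling to [0, 1], and the toy 2×2 maps are illustrative assumptions.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Combine K feature maps into one heatmap, Grad-CAM style.

    activations[k] is the k-th feature map of the last convolutional layer;
    gradients[k] is the gradient of the class score with respect to that map.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of the gradient over spatial positions.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, a in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * a[i][j]
    # ReLU keeps only the regions that push the class score up.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # scale to [0, 1] for display
    return cam


# Hypothetical 2-channel, 2x2 example.
acts = [[[1.0, 2.0], [0.0, 1.0]], [[3.0, 0.0], [1.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

Here the first channel receives weight 1 and the second weight −1, so only the top-right position, where the first channel dominates, remains active after the ReLU.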

Construction of nomogram
We integrated radiomics models, DL models, and pathomics models to construct a nomogram and investigated the contributions of various modalities in the joint model.
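As a rough illustration of how a logistic-regression nomogram turns modality scores into points and a predicted risk, consider the sketch below. The coefficients, the intercept, the assumption that each signature score lies in [0, 1], and the names `predicted_risk` and `nomogram_points` are hypothetical; they are not the fitted values or procedures of this study.

```python
import math
from typing import Dict

# Hypothetical coefficients for illustration only (not the study's fitted values).
COEFS = {"radiomics": 1.2, "dtl": 2.0, "pathomics": 1.5}
INTERCEPT = -2.3


def predicted_risk(scores: Dict[str, float]) -> float:
    """Logistic model: risk = sigmoid(intercept + sum of coefficient * score)."""
    lp = INTERCEPT + sum(COEFS[k] * scores[k] for k in COEFS)
    return 1.0 / (1.0 + math.exp(-lp))


def nomogram_points(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale each contribution so the strongest predictor spans 0-100 points.

    Assumes every signature score lies in [0, 1].
    """
    top = max(abs(b) for b in COEFS.values())
    return {k: 100.0 * abs(COEFS[k]) * scores[k] / top for k in COEFS}


patient = {"radiomics": 0.5, "dtl": 0.5, "pathomics": 0.5}
points = nomogram_points(patient)
risk = predicted_risk(patient)
```

The modality with the largest coefficient spans the longest point axis, which is how a nomogram conveys the relative contribution of each modality.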

Model evaluation
To evaluate the predictive performance of the models, we plotted ROC curves for each model and calculated the area under the curve (AUC) values. Decision curve analysis (DCA) curves and calibration curves were used to assess the net clinical benefit and goodness of fit of the joint model. Kaplan-Meier (KM) curves were used to evaluate its relationship with progression-free survival (PFS).
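The two evaluation quantities above can be written out explicitly. The sketch computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the DCA net benefit at a threshold probability pt using the standard definition TP/N − FP/N × pt/(1 − pt). The labels and scores are toy values, not study data.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels: List[int], scores: List[float], threshold: float) -> float:
    """Net benefit = TP/N - FP/N * pt / (1 - pt) at threshold probability pt."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy example: 1 = progressed to CRPC, 0 = did not progress (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_auc = auc(labels, scores)
example_nb = net_benefit(labels, scores, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.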

Statistical analysis
Statistical Package for Social Sciences (SPSS) 23.0 and R statistical software (version 3.6.1, https://www.r-project.org/) were used for statistical analysis. The Kolmogorov-Smirnov test was used to evaluate the normality of the measures, and those that conformed to a normal distribution were expressed as mean ± standard deviation (x̄ ± s). The measures that did not conform to a normal distribution were expressed as the median (upper and lower quartiles). An independent samples t-test (normally distributed with equal variance) or Mann-Whitney U-test (skewed distribution or unequal variance) was used to compare the measures. Multi-factor LR analysis was used to screen out the independent predictors to construct the prediction model and plot the nomogram. The AUC of the receiver operating characteristic (ROC) curve was calculated to evaluate the discriminative power of the model. A DCA curve was plotted to compare the clinical value of the model. A p-value of <0.05 indicated a statistically significant difference.

Clinical characteristics
The study flow is shown in Figure 2. A total of 198 patients were excluded for not meeting the inclusion criteria, and 201 patients were included; of these, 93 progressed to CRPC. Statistical analysis revealed no significant differences in clinical features between the training and validation groups (Table 1).

Feature selection and signature construction
We extracted 2553 radiomic features using PyRadiomics. According to the ROI results presented by the two experts, a random selection of 11 features derived from ADC sequences was subjected to a Mann-Whitney U test. The analysis revealed no statistically significant distinction between the two groups of features (Supplementary Table 2). Seven radiomic features were selected using the LASSO algorithm (Figures 3A-C). Three 2D ROIs with maximum cross-sections were chosen, and different deep-learning models were used for pretraining and external validation. Model evaluation (Table 2) demonstrated that ResNet-50 had better overall performance in the external validation set, with the lowest loss value. This indicates that ResNet-50 had fewer errors during the training process and converged faster than any other convolutional neural network (CNN) model (Figures 4A, B). In terms of model interpretability, each model had distinct attention regions in the samples. In comparison, ResNet-50 had clearer attention regions primarily focused on the internal regions of the tumor, while the tumor regions in the surrounding tissue were not activated (Figure 5). Furthermore, the ResNet-50 model performed better in the ADC sequence among the three sequences (Table 3).
Schematic outline of the study. SVM, support vector machine; ROI, region of interest.

Validation of radiomics and pathomics signature
The predictive performance of the models was evaluated using ROC analysis. The best ML model for radiomics was SVM, with an AUC of 0.755 (Figure 6A). For DTL and pathomics, the best model was ResNet-50, with AUC values of 0.768, 0.714, 0.684, and 0.752 (Figures 6B-E). The nomogram showed that DTL contributed the most in the combined model (Figure 7), and the AUC of the combined model was 0.86 (Figure 8). Calibration curve analysis showed that the joint model has a good fit and strong calibration capability (Figure 9). The DCA curve showed that all models had good clinical net benefit, with the combined model showing higher net benefit (Figure 10).

Prognosis
In the classification study of CRPC risks, a total of 87 patients experienced tumor progression-related events. KM curve analysis based on the joint model showed significantly lower PFS for patients at high risk of CRPC than for those at low risk (Figure 11).
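For reference, the Kaplan-Meier estimate behind such PFS curves multiplies, at each observed progression time, the fraction of at-risk patients who remain progression-free. The sketch below implements this product-limit estimator on hypothetical follow-up data; the function name `kaplan_meier` and the values are illustrative, not study data.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, progression-free probability) at each distinct event time.

    events[i] is 1 if progression was observed at times[i] and 0 if the
    patient was censored at that time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(u for u, e in zip(times, events) if e == 1)):
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - failed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months; 1 = progression observed, 0 = censored.
times = [6, 10, 10, 15, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Computing one curve per risk group (high versus low predicted risk) and comparing them, for example with a log-rank test, gives the kind of analysis summarized in Figure 11.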

Discussion
To our knowledge, this multicenter retrospective cohort study is the first to develop and validate a prediction model of this kind. The model integrated radiomics, DTL, and pathomics data to provide strong predictive capabilities for the progression of primary prostate cancer to CRPC following two years of ADT. The multiparametric radiological modeling employed in this investigation may aid urologists in evaluating the probability of CRPC progression and formulating personalized treatment strategies.
The prognosis of CRPC is notably unfavorable, and the challenges in its treatment are diverse among patients (18). The acquisition of reliable data from an initial diagnosis of localized PCa managed with ADT is constrained in clinical practice (19). Previous research has demonstrated a significant correlation between N-glycan score and adverse prognosis in CRPC (20). Additionally, the assessment of skeletal muscle index and skeletal muscle attenuation holds predictive value for the prognosis of metastatic CRPC (21). PSA nadir and Grade 5 were both associated with CRPC progression (22). It was also established that AR-V7 mRNA significantly predicted biochemical recurrences and CRPC progression (23). However, none of these findings provided specific and prospective indications regarding the likelihood of CRPC progression in patients with PCa. Our approach demonstrated significant predictive performance and provided therapeutic advantage. In addition, the calibration curve and KM survival curve were well-suited for the model and provided useful predictive information for patients with PCa. This finding could potentially be attributed to the multimodal data integration and the selection of suitable AI methodologies.

Multimodal data integration
Data fusion addresses inference problems by amalgamating data from various modalities that provide different viewpoints on a shared phenomenon (24,25). Consequently, the integration of multiple modalities may facilitate the resolution of such challenges with greater precision compared to the utilization of singular modalities (26). This is particularly important in medicine, as similar results from different measurement techniques might provide different conclusions (27,28). In recent years, the growing prevalence of original studies utilizing imaging and pathology images in the field of prostate cancer has created an opportunity for AI technology to demonstrate its potential (29,30). Additionally, DL approaches have direct applications for segmentation, multimodal data integration and model construction (31). We used late-stage fusion, also known as decision-level fusion, to train a separate model for each modality and then aggregate the predictions from each model to produce a final prediction. Aggregation can be done by averaging, majority voting, and Bayesian-based rules among other methods (32). During the data collection phase, we found that some of the data were missing and incomplete, while late fusion still maintained the predictive power. Since each model is trained individually, aggregation methods, such as majority voting, can be applied even if one mode is missing. In contrast, if the unimodal data do not complement one another or have weak interdependencies, late fusion may be preferred due to its simpler design and fewer parameters in comparison to other fusion procedures. This is also advantageous in instances with insufficient
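Decision-level aggregation with a missing modality can be sketched as follows. This is a generic illustration of averaging and majority voting, not the aggregation rule fitted in this study; the function names, the 0.5 decision threshold, the tie-breaking choice, and the probabilities are assumptions.

```python
from typing import Dict, Optional


def fuse_by_average(probabilities: Dict[str, Optional[float]]) -> float:
    """Average the per-modality probabilities, skipping missing modalities (None)."""
    available = [p for p in probabilities.values() if p is not None]
    if not available:
        raise ValueError("no modality available")
    return sum(available) / len(available)


def fuse_by_majority(probabilities: Dict[str, Optional[float]],
                     threshold: float = 0.5) -> int:
    """Majority vote over per-modality decisions; ties count as high risk (arbitrary choice)."""
    votes = [int(p >= threshold) for p in probabilities.values() if p is not None]
    if not votes:
        raise ValueError("no modality available")
    return int(2 * sum(votes) >= len(votes))


# Hypothetical per-modality predicted probabilities of progression to CRPC.
complete = {"radiomics": 0.7, "dtl": 0.8, "pathomics": 0.3}
incomplete = {"radiomics": 0.2, "dtl": None, "pathomics": 0.4}  # one modality missing
avg_complete = fuse_by_average(complete)
vote_complete = fuse_by_majority(complete)
avg_incomplete = fuse_by_average(incomplete)
vote_incomplete = fuse_by_majority(incomplete)
```

Because each unimodal model produces its own probability, a patient with a missing modality can still be scored from the remaining ones, which is the robustness property described above.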

Supervised method
In this study, we selected a supervised AI approach for training radiomics models, using radiology image annotations with patient outcomes to map input data to predefined labels (e.g., cancer/noncancer) (35). Since the feature extraction was not part of the learning process, the models typically had a simpler architecture and lower computation costs. An additional benefit was a high level of interpretability because the predictive features could be related to the data. On the other hand, the feature extraction was time-consuming and could transfer human bias to the models. Based on the sample size included in this study, the supervised method was sufficient due to its simplicity and ability to learn from our radiomics model.
Self-supervised techniques effectively leverage accessible unlabeled data to acquire superior image features, subsequently transferring this acquired knowledge to supervised models. Consequently, supervised methods like CNNs are employed to address diverse pretext tasks, wherein labels are automatically generated from the data (36). Notably, self-supervised methods are particularly well-suited for more robust computational systems and higher-resolution images (37,38).

Model selection for DL
DL is the current state-of-the-art ML algorithm, which simulates the connections between the neurons of the human brain. It learns and extracts complex high-level features from the input data through multi-layer neural networks, thus realizing automatic classification, recognition, and prediction of data. Traditional deep CNNs often encounter the issues of gradient vanishing or gradient explosion as the number of network layers increases, leading to challenging model training. ResNet addresses this problem by introducing the concept of residual connections. The structure promotes the flow of gradients and information transfer, thereby facilitating the training of deeper networks. In this study, we selected DL models including ResNet-50, ResNet-34, ResNet-18, and VGG19 for pre-training. Comparing these models revealed that ResNet-50 outperformed the others. The main advantage of ResNet-50 lies in its ability to effectively train very deep neural networks while avoiding issues such as gradient vanishing and gradient explosion. Consequently, it excels in image classification tasks and can manage large and complex datasets. Due to its versatile application and remarkable performance, ResNet-50 serves as a benchmark model in various computer vision tasks and is widely

Limitations
The study has limitations. First, this is a multicenter retrospective study, and potential biases, such as differences in MRI acquisition parameters, are inevitable. However, as mentioned previously, we completed the data alignment and pre-processed the images to minimize the impact of these differences on the results. Second,
Nomogram of the combined model.

Conclusions
In summary, we collected a multimodal dataset from patients who developed CRPC and used it to develop and integrate radiological and histopathological models to improve CRPC risk prediction. This result encourages further large-scale studies utilizing multimodal DL.
Calibration curve of the combined model indicates a better agreement between the predicted probabilities and the actual observed frequencies.
Decision curves showed that each model could achieve clinical benefit and that the net benefit of the combined model was better.
KM survival curve analysis demonstrates that multimodal data can serve as a reliable predictor of the risk of CRPC occurrence. CRPC, castration-resistant prostate cancer.

FIGURE 1 Flow chart of patient recruitment. Center (A) Gansu Provincial Hospital; Center (B) The 940 Hospital of Joint Logistics Support Force of Chinese PLA; Center (C) Second People's Hospital of Gansu Province.
FIGURE 3 (A) Coefficient profiles of the features in the LASSO model are shown. Each feature is represented by a different color line indicating its corresponding coefficient. (B) Tuning parameter (λ) selection in the LASSO model. (C) Weights for each feature in the model. LASSO, least absolute shrinkage and selection operator.

FIGURE 5 Regions of attention in prostate cancer MRI analysis with different DL models. MRI, magnetic resonance imaging.
utilized in target detection, image segmentation, and image generation. In Lei et al.'s training study of MRI DL involving 396 patients with PCa, training a DL model for PCa classification using pairs of ResNet-50 anti-paradigms improved the generalization and classification abilities of the model (39). In another pathomics study, texture features captured using the ResNet DL framework were able to better distinguish unique Gleason patterns (40).

FIGURE 8 ROC curve analysis for the combined model.

TABLE 1
Comparison of clinical data of patients with prostate cancer in the training set and validation set.

TABLE 2
Performance of different DL models.
In this study, MRI and H&E tissue sections were weakly complementary to each other, and hence our late-fusion model demonstrated good predictive ability. Examples of late fusion include the integration of imaging data with non-imaging inputs, such as the fusion of MRI scans and PSA blood tests for PCa diagnosis (33). Survival prediction using the fusion of genomics and histology profiles was also performed by Chen et al. (34).

TABLE 3
Performance of ResNet-50 in different sequences.