Multi-Phase CT-Based Radiomics Nomogram for Discrimination Between Pancreatic Serous Cystic Neoplasm From Mucinous Cystic Neoplasm

Purpose This study aimed to develop and verify a multi-phase (MP) computed tomography (CT)-based radiomics nomogram to differentiate pancreatic serous cystic neoplasms (SCNs) from mucinous cystic neoplasms (MCNs), and to compare the diagnostic efficacy of radiomics models for different phases of CT scans. Materials and Methods A total of 170 patients who underwent surgical resection between January 2011 and December 2018, with pathologically confirmed pancreatic cystic neoplasms (SCN=115, MCN=55) were included in this single-center retrospective study. Radiomics features were extracted from plain scan (PS), arterial phase (AP), and venous phase (VP) CT scans. Algorithms were performed to identify the optimal features to build a radiomics signature (Radscore) for each phase. All features from these three phases were analyzed to develop the MP-Radscore. A combined model comprised the MP-Radscore and imaging features from which a nomogram was developed. The accuracy of the nomogram was evaluated using receiver operating characteristic (ROC) curves, calibration tests, and decision curve analysis. Results For each scan phase, 1218 features were extracted, and the optimal ones were selected to construct the PS-Radscore (11 features), AP-Radscore (11 features), and VP-Radscore (12 features). The MP-Radscore (14 features) achieved better performance based on ROC curve analysis than any single phase did [area under the curve (AUC), training cohort: MP-Radscore 0.89, PS-Radscore 0.78, AP-Radscore 0.83, VP-Radscore 0.85; validation cohort: MP-Radscore 0.88, PS-Radscore 0.77, AP-Radscore 0.83, VP-Radscore 0.84]. The combination nomogram performance was excellent, surpassing those of all other nomograms in both the training cohort (AUC, 0.91) and validation cohort (AUC, 0.90). The nomogram also performed well in the calibration and decision curve analyses. Conclusions Radiomics for arterial and venous single-phase models outperformed the plain scan model. 
The combination nomogram that incorporated the MP-Radscore, tumor location, and cyst number had the best discriminatory performance and showed excellent accuracy for differentiating SCN from MCN.


INTRODUCTION
Pancreatic cystic neoplasms (PCNs) have been increasingly diagnosed in recent years as a direct result of the extensive use of abdominal cross-sectional imaging. The prevalence of incidentally discovered PCNs in the general population has been reported to range from 2.6 to 19.6% (1,2). Considerable attention has been focused on serous cystic neoplasms (SCNs) and mucinous cystic neoplasms (MCNs) because of the significant difference in the probability of malignant transformation between the two (3). SCNs have an extremely low incidence of malignancy (4). The current management strategy for SCN is conservative, based on regular surveillance, with rare interventions performed only because of symptoms (5,6). MCNs are diagnosed almost exclusively in middle-aged women and carry a definite potential for malignant transformation (7-9). In contrast to SCN, surgical resection has been advocated for many, if not most, MCN patients. Given the marked difference in the risk of malignancy and the consequent nearly opposite clinical management strategies for these cystic neoplasms, it is vital to correctly discriminate between the two.
Currently, even high-quality imaging modalities such as computed tomography (CT) and ultrasound do not provide adequate discrimination between SCN and MCN (10,11). Radiological imaging approaches, especially multidetector computed tomography (MDCT), nonetheless play a pivotal role in the preoperative diagnosis of PCNs. The reported discrimination efficacy of CT for SCNs ranges from 27 to 91% (12,13). Compared with CT, MRI/magnetic resonance cholangiopancreatography (MRCP) could further improve the diagnostic accuracy for PCNs, with an accuracy of 40-95% (14,15), by providing a better view of the pancreatic duct system and allowing detection of a solid component or mural nodule. Endoscopic ultrasound with fine-needle aspiration (EUS-FNA) has become a promising tool for classifying specific subtypes of PCNs. Adding EUS-FNA to CT and MRI has improved the diagnostic accuracy by 36% and 54%, respectively (16). Balanced against these potential benefits are the invasive nature of EUS and the variable risk of FNA-associated complications (17). These limitations and the significant cost curtail its application in the routine evaluation of PCNs (18).
Radiomics is an emerging and rapidly developing method for advanced image analysis. In PCNs, radiomics has been successfully applied to the entire spectrum of the disease process, including differential diagnosis, malignant assessment, and prognosis prediction (19-21). Most radiomics studies utilizing MDCT pancreatic scans have been limited to the venous phase (VP) for feature extraction. However, the different phases reflect unique vascular enhancement and texture information. Logically, a radiomics model that also draws on a plain scan (PS) or arterial phase (AP) scan could augment diagnostic efficiency. To the best of our knowledge, no prior studies have reported feature extraction from all three phases of contrast-enhanced CT to discriminate PCN subtypes.
Our aim was to compare the predictive efficacy of each single-phase radiomics model, and then to construct a combination nomogram, incorporating a multi-phase (MP) radiomics model with clinical imaging factors, that would noninvasively and accurately discriminate SCNs from MCNs.

Patient Population
The Institutional Review Board of Huashan Hospital of Fudan University approved this retrospective study, and the requirement for informed consent was waived. Patients who were diagnosed with SCNs or MCNs and underwent surgical resection in our hospital between January 2011 and December 2018 were enrolled in this study. The inclusion criteria were: (1) SCNs or MCNs with surgical pathologic confirmation; (2) contrast-enhanced CT scans (slice thickness: 1.5 mm) performed within one month prior to pancreatic surgery. The exclusion criteria were: (1) CT images with serious artifacts and (2) patients whose radiomics features could not be successfully extracted from the CT images. The details of patient enrollment are shown in Figure S1 in Supplementary Materials. The final study group comprised 115 patients with SCNs and 55 with MCNs. Patients were randomly grouped in a ratio of 7:3, with 120 and 50 patients in the training and validation cohorts, respectively. Patient demographic and clinical information was collected from the hospital medical record system.
Demographic information (age and sex) and eight imaging factors reported in previous studies to be valuable in distinguishing SCNs from MCNs were selected as the basis for constructing the clinical model (22,23). Two radiologists with considerable experience in abdominal imaging (13 and 6 years, respectively) evaluated the following features: (1) lesion size, (2) tumor location (head, neck, body, and tail), (3) cyst number (single or multiple), (4) calcification (absent or present), (5) septation (absent or present), (6) lesion shape (oval or irregular lobulation), (7) wall enhancement (absent or present), and (8) mural nodules (absent or present). Both radiologists were blinded to the correlative pathological details. Lesion size was outlined and decided jointly by the two radiologists in consensus, whereas the other features were assessed by each radiologist separately. If the two radiologists did not agree on a specific feature in the same patient, a third expert with 23 years' experience in abdominal radiology reviewed the features and helped establish the final decision. The inter-reader agreement of imaging factors was also assessed, as shown in Supplementary Materials (Table S1). The framework of the study is shown in Figure 1.
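As an illustration of how inter-reader agreement on a binary imaging factor can be quantified with the simple (unweighted) kappa coefficient, the following is a minimal Python sketch. The function name `cohen_kappa` and the example ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Simple (unweighted) Cohen's kappa for two readers' categorical ratings."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(reader_a)
    # Observed proportion of cases on which the two readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    count_a = Counter(reader_a)
    count_b = Counter(reader_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of one imaging factor (e.g., calcification) in 8 patients.
reader_1 = ["present", "absent", "present", "present",
            "absent", "absent", "present", "absent"]
reader_2 = ["present", "absent", "absent", "present",
            "absent", "absent", "present", "present"]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the readers agree on 6 of 8 cases while chance agreement is 0.5, so kappa corrects the raw 75% agreement downward.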

Image Acquisition
CT examinations of all patients were performed using the same 256-slice CT system (Brilliance iCT, Philips Medical Systems, The Netherlands). All pancreatic CT images were acquired using a standard dual-phase scanning protocol. The CT scan parameters were as follows: 120 kV; 150-200 mAs; rotation time, 0.5-0.75 s; collimation, 128×0.625 mm; matrix, 512×512; and slice thickness, 1.5 mm. A nonionic contrast agent (370 mgI/mL, Iopamidol-370, GE Healthcare, Princeton, NJ) was administered at a dose of 1.5 mL/kg and a rate of 3.0 mL/s. AP images were obtained 30 s after the injection of contrast agent, and VP images were obtained 45 s after the AP acquisition. All images were downloaded from the hospital archives.

Tumor Segmentation and Single-Phase Radiomics Feature Extraction
PS, AP, and VP CT images in each patient were used for feature extraction. The window width and window level were 300 and 40 HU, respectively. For each phase, one radiologist (13 years' experience in abdominal imaging) segmented the lesion contour on each slice using open-source software (3D Slicer version 4.11.0; Boston, MA). Using Python-based radiomics software (Pyradiomics version 3.0.0; https://github.com/Radiomics/pyradiomics) (24), radiomics features were extracted from the three-dimensional volume for each phase. The extracted features were classified into six categories: (1) shape features, (2) first-order statistics, (3) gray-level co-occurrence matrix features, (4) gray-level run length matrix features, (5) gray-level size zone matrix features, and (6) gray-level dependence matrix features. Details of the features are provided in Supplementary Materials I.
To estimate both intra- and inter-observer reproducibility of extracted features, 60 patients were randomly chosen for a repeat region of interest (ROI) segmentation at 30 days following the initial segmentation, performed by the same radiologist and an additional one (with 6 years' experience in abdominal imaging). The radiologists were blinded to the associated clinical and pathological information. The intra- and inter-class correlation coefficients (ICCs) were used to evaluate feature reliability (25).
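The reproducibility check can be sketched in Python as follows. The text does not state which ICC form was used, so this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function name `icc_2_1` is hypothetical. The small data set is the classic worked example of Shrout and Fleiss (six targets, four raters), used here only to check the arithmetic, not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one row per subject (lesion), one column per
    rater or segmentation session. Assumed ICC form; see lead-in.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters / sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Classic Shrout and Fleiss example: 6 targets rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_2_1(example)
print(f"ICC(2,1) = {example_icc:.2f}")
```

In a radiomics setting, each row would hold one feature's value for one lesion across the repeated segmentations, and the resulting ICC would be compared against the reproducibility threshold.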

Feature Selection, Single-Phase Radiomics Signature Construction, and Performance Comparison
In the training cohort, a three-step procedure was developed to select the radiomics features extracted in each phase. First, features with both intra- and inter-ICC less than 0.75 were excluded. Second, the minimum redundancy maximum relevance (mRMR) method was applied, and third, the least absolute shrinkage and selection operator (LASSO) algorithm was used, so that the most robust and optimal features were selected to construct the single-phase radiomics model. The selected optimal features were then weighted by their coefficients in the LASSO regression to construct the radiomics signatures, or Radscores (PS-Radscore, AP-Radscore, and VP-Radscore). The Mann-Whitney U test was used to evaluate the discrimination capability of the Radscore for each phase. We also used receiver operating characteristic (ROC) curve analysis and area under the curve (AUC) values to compare the performance of the single-phase radiomics signatures. The detailed performance of each of the radiomics signatures is shown in Figure 2 and Supplementary Materials II. We also constructed and evaluated the two-phase combined radiomics model (Supplementary Materials III).
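The final step, turning the selected features and their LASSO coefficients into a Radscore, amounts to a weighted linear combination. A minimal Python sketch is given below; the function name `radscore`, the feature names, the coefficients, and the intercept are all hypothetical placeholders (the actual formulas are reported in the Supplementary Materials, not here).

```python
def radscore(features, coefficients, intercept=0.0):
    """Weighted sum of the selected features using their LASSO coefficients.

    features:     dict mapping feature name -> value for one patient
    coefficients: dict mapping selected feature name -> LASSO coefficient
    """
    missing = [name for name in coefficients if name not in features]
    if missing:
        raise KeyError(f"missing feature values: {missing}")
    # Features not retained by LASSO have no entry in `coefficients`
    # and therefore do not contribute to the score.
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)


# Hypothetical coefficients for two retained features (illustration only).
lasso_coefficients = {"feature_a": 0.5, "feature_b": -0.25}
patient_features = {"feature_a": 2.0, "feature_b": 4.0, "feature_c": 9.0}

score = radscore(patient_features, lasso_coefficients, intercept=0.1)
print(f"Radscore = {score:.3f}")
```

Here `feature_c` is ignored because it was not among the selected features, which mirrors how LASSO shrinks the coefficients of discarded features to zero.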

Combined Model Building and Nomogram Development
An MP radiomics feature set was developed by integrating all 3654 (1218 × 3) features of the three phases. We then used the same three-step feature selection method to obtain an MP radiomics signature, the MP-Radscore. The discrimination capability of the MP radiomics model was also evaluated using the Mann-Whitney U test and ROC curve analysis. Univariate analysis was conducted to estimate the differences between SCN and MCN patients for each clinical and imaging feature. In the training cohort, variables with P < 0.100 in the univariate regression were then entered into a multivariable logistic regression. The clinical model was constructed by incorporating factors with P < 0.100 in the multivariate analysis (26). Finally, a combination multivariate logistic model was constructed using the MP-Radscore together with the selected clinical imaging factors. Variance inflation factor (VIF) analysis was performed on the combination model to further reduce the probability of overfitting. The nomogram was developed to visualize the optimal model, specifically to score each patient and quantify the degree of disease tendency.
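The combination model is a multivariable logistic regression, so the probability that a nomogram reads off for a patient is the logistic transform of a linear predictor. The Python sketch below illustrates this under stated assumptions: the coefficients and intercept are invented for illustration, tumor location is simplified to a single binary indicator (the study records four locations), and the modeled outcome is assumed to be MCN.

```python
import math


def predicted_probability(predictors, coefficients, intercept):
    """Logistic-regression probability for one patient.

    predictors:   dict mapping variable name -> value for the patient
    coefficients: dict mapping variable name -> regression coefficient
    """
    linear_predictor = intercept + sum(coefficients[name] * predictors[name]
                                       for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients; NOT the fitted values from the study.
coefficients = {
    "mp_radscore": 1.2,
    "location_body_or_tail": 0.8,  # simplified binary coding (assumption)
    "multiple_cysts": -0.6,
}
intercept = -0.8

patient = {"mp_radscore": 0.0, "location_body_or_tail": 1, "multiple_cysts": 0}
probability = predicted_probability(patient, coefficients, intercept)
print(f"predicted probability = {probability:.2f}")
```

A nomogram presents the same calculation graphically: each predictor's contribution to the linear predictor is rescaled to points, and the total points map to the predicted probability.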

Model Validation and Clinical Use Evaluation
The combination model was first evaluated in the training cohort (n = 120) and subsequently confirmed in the validation cohort (n = 50). ROC curves and AUC values were used to evaluate the discriminatory performance of the combined models. Calibration curves and the Hosmer-Lemeshow test were used to estimate the consistency between the predicted results of the combination model and the observed probabilities. We also used the DeLong test to compare the predictive efficiency of the combination model with that of the venous radiomics approach to confirm the advantage of our combination model. Decision curve analysis (DCA) was performed to determine the clinical value of the nomogram and calculate the net benefits of the models at different threshold probabilities (27).
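Two of the evaluation quantities above have compact definitions that a short Python sketch can make concrete: the AUC, computed here as the proportion of positive-negative pairs ranked correctly, and the net benefit used in decision curve analysis, computed as TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt. The function names and the six example patients are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted probability is at
    or above `threshold` (0 < threshold < 1)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels)
             if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels)
             if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical predicted probabilities and true labels for six patients.
probs = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
truth = [1, 1, 1, 0, 0, 0]

example_auc = auc(probs, truth)
example_nb = net_benefit(probs, truth, threshold=0.5)
print(f"AUC = {example_auc:.3f}, net benefit at 0.5 = {example_nb:.3f}")
```

A decision curve repeats the net-benefit calculation over a range of thresholds and compares the model with the "treat-all" and "treat-none" strategies; "treat-none" always has a net benefit of zero.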

Statistical Analysis
Continuous variables are presented as means and standard deviations. Student's t-test and the chi-square test were employed to evaluate the statistical differences in continuous and discrete variables, respectively. In the ROC analysis, accuracy, sensitivity, and specificity at the cutoff value were calculated to evaluate the efficiency of the radiomics model, clinical model, and the combination model. The inter-reader agreement of imaging factors was assessed using the kappa test, and the simple kappa coefficient was used as an assessment criterion for consistency. A two-tailed P value less than 0.05 was deemed statistically significant. All statistical analyses were performed using the R software (version 3.

In all three single-phase Radscores, there was a significant difference between SCN and MCN patients in the training cohort (P < 0.010), and importantly, this was confirmed in the independent validation cohort (P < 0.010). The PS, AP, and VP radiomics models yielded AUC values of 0.78, 0.83, and 0.85, respectively, for the training cohort, and 0.77, 0.83, and 0.84, respectively, for the validation cohort. The AUC values of the AP and VP radiomics models were similar and higher than those of the plain scan radiomics model. The performance of the single-phase radiomics model is shown in Figure 2.

Combined Model Building and Nomogram Development
Using the three-step selection process described above for single-phase radiomics model construction, 14 features (including 2 in PS, 4 in AP, and 8 in VP) were similarly selected from the MP radiomics feature set (Figure 3). The MP-Radscore was built to improve the discrimination efficacy of the MP radiomics model. ROC curves showed that the MP radiomics model performed better than the models based on a single CT phase (AUC: 0.89 and 0.88 in the training and validation cohorts, respectively). The performance of the MP radiomics model in the Mann-Whitney U test and ROC curves is shown in Figure 4. The detailed calculation formulas of the MP-Radscore and combined nomogram are included in Supplementary Materials IV.
In the univariate analysis for clinical model building, only tumor location and cyst number were significantly correlated with pathologic results (P < 0.100). Tumor location and cyst number remained statistically significant (P < 0.100) in the multivariate logistic regression analysis and therefore comprised the clinical model. The results of the univariate and multivariate logistic regression analyses are shown in Table 2. The combination model was constructed by incorporating the MP-Radscore, tumor location, and cyst number. A nomogram was established to visualize the combined model (Figure 5A).

Combination Model Validation and Clinical Use Evaluation
The combination nomogram exhibited the best predictive performance (AUC: 0.91 and 0.90 in the training and validation cohorts, respectively) for the discrimination between SCNs and MCNs (Figures 5B, C and Table 3). The DeLong test demonstrated statistical differences in AUC values between the combination nomogram and the clinical model (P < 0.010). Significant differences were also found in the ROC curves between the combination nomogram and the VP model (Z = 1.962, P = 0.0497 < 0.0500) in the validation cohort. Calibration curves (Figure S6) revealed good agreement between the predicted and observed probabilities of our combination nomogram (P = 0.480 and 0.582 for the training and validation cohorts, respectively). The decision curve analysis indicated that, at threshold probabilities over 10%, the combination nomogram provided a greater net benefit than the "treat-all" strategy, the "treat-none" strategy, and the clinical model (Figure 6). The combination nomogram demonstrated excellent clinical practicality.

DISCUSSION
In this retrospective study, we constructed and validated an MP CT-based radiomics nomogram to differentiate SCN from MCN.
The combination model, incorporating the MP radiomics model plus clinical imaging factors, exhibited better diagnostic performance than any of the single-phase radiomics models or a clinical model alone did. The decision curve analysis also confirmed that the combination model achieved better discriminatory accuracy than the clinical model did. Relating specifically to the single-phase performance comparison, the radiomics model of the AP and the VP performed better than the PS model in terms of AUC values.
The exact morphologic details of MDCT are crucial to exclude tumor invasion of PCNs. Key imaging morphologic factors (tumor size, location, lesion shape, calcification, septation, etc.) derived from pathologic characteristics form the basis for radiologic differentiation of PCN subtypes (28,29). Nevertheless, the diagnostic accuracy of cross-sectional imaging, such as MDCT, still falls short of ideal discrimination (30). Therefore, invasive methods, such as EUS-FNA, have been developed to add diagnostic precision for preoperative PCN subtyping. However, achieving this degree of accuracy requires highly skilled endoscopists and cytologists (31,32).
Apart from the invasive techniques described above, radiomics offers a promising noninvasive technology intended to achieve similar results. We successfully established a combination radiomics model and achieved superior capacity to differentiate SCNs from MCNs. Among numerous clinical imaging factors, only cyst number and tumor location were statistically significant and were therefore included in the combination model. Considering that the clinical and imaging features of PCNs pathologically diagnosed as SCN in the study were not consistent with typical SCN manifestations, we also analyzed and modeled these features. These results are consistent with a number of previous CT imaging studies. The differences in morphology that discriminate between SCNs and MCNs are limited to tumor location, lobular contour, and a large number of cysts (30,33). The prediction accuracy of the clinical model alone was poor, with AUC values of only 0.69 and 0.63 in the training and validation cohorts, respectively. Even with high-quality CT scans interpreted by skilled radiologists, the accuracy of PCN subclassification remains disappointing.
A recent study assessed the discriminatory efficacy of conventional CT imaging features in distinguishing SCN from MCN and presented it by building a nomogram based on multivariate logistic regression (34). In contrast, our study not only considered conventional clinico-radiological features, but also incorporated radiomics features that reflected the deeper dimensional information of the images to construct a comprehensive model. The results showed that the combined model demonstrated better predictive ability than the clinical model alone. Several previous studies have applied radiomics to the differentiation of PCNs and have achieved good results (35,36). However, those studies require further validation because of the limited amount of data (N < 80) and the high risk of overfitting. Moreover, they did not establish a nomogram to visualize the radiomics model. Finally, they performed feature extraction almost exclusively on VP CT images. In the present study, in addition to VP CT scans, we also investigated the radiomics signatures on plain and AP CT images. The VP-Radscore AUC value was the best among the three single-phase radiomics models; similarly, the radiomics models of both the VP and AP had superior AUC values compared to those of the PS model. These results require verification, because they rely heavily on the experience of the radiologist performing manual segmentation (22). Interestingly, several radiomics studies have constructed radiomics models from only PS and have achieved good results in disease prediction (37,38). The feature composition of the MP radiomics model included 8 features (57.1%) in the VP, 4 in the AP (28.6%), and only 2 (14.3%) in the PS. Our highest quantitative ranking of the VP is consistent with most previously published pancreatic radiomics research, while the PS was used less frequently. Therefore, this study is also significant in that it provides preliminary insight into the effect of contrast-enhanced CT scan phase on the predictive efficacy of radiomics models and establishes a more comprehensive model to summarize various types of risk factors for prediction.

Our study has some limitations. First, this was a retrospective study that was conducted in a single center with a relatively small sample size. Large-scale external validation is needed to further demonstrate the clinical efficacy of the nomogram constructed here. Second, while our method of manual segmentation set the basis for excellent results, this was possible because of our relatively small number of cases. For widespread application of this technique, more research employing automatic or semiautomatic image segmentation is likely to be necessary. Third, the patients included in this study all had SCN or MCN confirmed using surgical pathology, and there may have been a selection bias. IPMN and other pancreatic cystic diseases need to be further studied to broaden the clinical application of this algorithm.
In conclusion, our study has established a novel multi-phase CT-based radiomics nomogram for a noninvasive preoperative differentiation of SCNs from MCNs. The nomogram could provide a reference basis for an accurate diagnosis, thereby avoiding unnecessary surgical resection in clinical practice. We also preliminarily explored the influence of specific feature extraction phases on the predictive efficacy of the radiomics model; the results may be enlightening to subsequent radiomics studies.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Huashan Hospital, Fudan University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
JG and FH: contributed equally to the study, and designed and carried out the experiments. XW: collected and sorted the data. SD: provided technical support. JZ: guided and revised the manuscript. All authors contributed to the article and approved the submitted version.