Edited by: Frank Emmert-Streib, Tampere University, Finland
Reviewed by: Junjie Zhang, Royal Bank of Canada, Canada; Shailesh Tripathi, Tampere University of Technology, Finland
This article was submitted to Medicine and Public Health, a section of the journal Frontiers in Artificial Intelligence
†These authors share senior authorship
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Pancreatic Ductal Adenocarcinoma (PDAC) is one of the most aggressive malignancies with poor prognosis (Stark and Eibl,
As a rapidly developing field in medical imaging, radiomics is defined as the extraction and analysis of a large number of quantitative imaging features from medical images including CT and MRI (Kumar et al.,
Conventional radiomics analytics pipeline.
Despite recent progress, radiomics analytics solutions have a major limitation in terms of performance. The performance of radiomics models relies on the amount of information that radiomics features can capture from medical images (Kumar et al.,
A convolutional neural network (CNN) (Schmidhuber,
However, to train a CNN from scratch, millions of parameters need to be tuned. This requires a large sample size which is not feasible to collect in most medical imaging studies (Du et al.,
CNN-based transfer learning is defined as using images from a different domain, such as natural images (e.g., ImageNet), to build a pretrained model and then applying the pretrained model to target images (e.g., CT images of lung cancer) (Ravishankar et al.,
Transfer learning utilizes this property, training top layers using another large dataset while finetuning deeper layers using data from the target domain. For example, the ImageNet dataset contains more than 14 million images (Russakovsky et al.,
In medical imaging, the target dataset is often so small that it is impractical to properly finetune the deeper layers. Consequently, in practice, a pretrained CNN can be used as a feature extractor (Hertel et al.,
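To illustrate the feature-extractor idea, the following minimal sketch applies fixed (frozen) convolution kernels to an image, passes the responses through a ReLU, and global-average-pools each response map into a single feature. The kernels, the `extract_features` name, and the toy 4x4 image are hypothetical stand-ins for a real pretrained network; this is not the architecture used in this study, and no weights are updated at any point.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical "pretrained" 2x2 kernels. In practice these weights would come
# from a CNN trained on a large source dataset and would be kept frozen.
FROZEN_KERNELS: List[Matrix] = [
    [[1.0, -1.0], [1.0, -1.0]],   # responds to left-to-right intensity drops
    [[1.0, 1.0], [-1.0, -1.0]],   # responds to top-to-bottom intensity drops
]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding) 2D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def extract_features(image: Matrix,
                     kernels: List[Matrix] = FROZEN_KERNELS) -> List[float]:
    """Return one feature per kernel: conv -> ReLU -> global average pooling."""
    features = []
    for kernel in kernels:
        response = conv2d_valid(image, kernel)
        activated = [max(0.0, v) for row in response for v in row]  # ReLU
        features.append(sum(activated) / len(activated))            # global average
    return features


# Toy 4x4 "image" with a bright left half and a dark right half.
toy_image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
features = extract_features(toy_image)
```

In a setup of this kind, the resulting feature vector would be passed to a conventional classifier, while the convolutional weights stay untouched.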
In this study, using two independent, small resectable PDAC cohorts, we evaluated the prognostic performance of a transfer learning model and compared it to that of a traditional radiomics model. The goal of the prognostication was to dichotomize PDAC patients who were candidates for curative-intent surgery into high-risk and low-risk groups. We found that the transfer learning model provided better prognostic performance than the conventional radiomics model, suggesting the potential of transfer learning in a typical small sample size medical imaging study.
Two cohorts from two independent hospitals, consisting of 68 (Cohort 1) and 30 (Cohort 2) patients, were enrolled in this retrospective study. All patients underwent curative-intent surgical resection for PDAC, from 2007 to 2012 in Cohort 1 and from 2008 to 2013 in Cohort 2, and they did not receive other neoadjuvant treatment. Preoperative portal venous phase contrast-enhanced CT images were used. Overall survival (with survival time as the duration and death as the event) was collected as the primary outcome, calculated as the time from the date of the preoperative CT scan until death. To exclude the confounding effect of postoperative complications, patients who died within 90 days after surgery were excluded. Institutional review board approval was obtained for this study from both institutions (Khalvati et al.,
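The outcome definition and the 90-day exclusion rule can be sketched as follows. This is a minimal illustration rather than the study's actual code: the `Patient` record, its field names, and the example dates are hypothetical. Two details are assumptions of this sketch: the 90-day window is treated as inclusive, and patients without a recorded death are censored at last follow-up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Patient:
    patient_id: str
    ct_date: date                 # date of the preoperative CT scan
    surgery_date: date
    death_date: Optional[date]    # None if alive at last follow-up
    last_followup: date


def died_within_90_days_of_surgery(p: Patient) -> bool:
    """True if death occurred within 90 days after surgery (inclusive; assumed)."""
    return p.death_date is not None and (p.death_date - p.surgery_date).days <= 90


def overall_survival(p: Patient) -> Tuple[int, int]:
    """Return (duration in days, event flag), measured from the CT date."""
    if p.death_date is not None:
        return (p.death_date - p.ct_date).days, 1   # death observed
    # Assumption: patients alive at last follow-up are censored there.
    return (p.last_followup - p.ct_date).days, 0


# Hypothetical patients, for illustration only.
patients: List[Patient] = [
    Patient("A", date(2010, 1, 4), date(2010, 1, 20), date(2010, 3, 1), date(2010, 3, 1)),
    Patient("B", date(2010, 2, 1), date(2010, 2, 15), date(2011, 2, 15), date(2011, 2, 15)),
    Patient("C", date(2011, 5, 1), date(2011, 5, 10), None, date(2013, 5, 1)),
]

kept = [p for p in patients if not died_within_90_days_of_surgery(p)]
outcomes: Dict[str, Tuple[int, int]] = {p.patient_id: overall_survival(p) for p in kept}
```

Here patient A is dropped because death occurred 40 days after surgery, while B and C are retained with an observed and a censored outcome, respectively.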
An in-house developed Region of Interest (ROI) contouring tool (ProCanVAS Zhang et al.,
A manual contour of CT scan from a representative patient in cohort 2.
Radiomics features were extracted using the PyRadiomics library (Van Griethuysen et al.,
List of radiomic feature classes and filters.
Second-order texture features | Features extracted from Gray-Level |
Morphology features | Features based on the shape of the region of interest |
Filters | No filter, exponential, gradient, logarithm, square, square-root, local binary pattern, wavelet |
We developed a transfer learning model (LungTrans) pretrained on CT images from non-small-cell lung cancer (NSCLC) patients. The lung CT dataset was published on Kaggle for Lung Nodule Analysis (LUNA16) and contains CT images from 888 lung cancer patients together with the outcome (malignancy or not) (Armato et al.,
Architecture for pretrained CNN using LUNA16 data.
To have a proper and robust validation, training and test datasets were collected from two different institutions. In Cohort 1 (training cohort,
The prognostic values of these two models were evaluated in Cohort 2 (
Using features from the PyRadiomics feature bank, the Random Forest model yielded an AUC of 0.54 [95% Confidence Interval (CI): 0.32–0.76] in the test cohort (Cohort 2) (mtry: 2). In contrast, using LungTrans features, the Random Forest model reached an AUC of 0.81 (95% CI: 0.64–0.98) in the test cohort (mtry: 17). The performances of these two models for both training and test cohorts are listed in
Summary of models' performances in AUC.
Model | Training cohort (Cohort 1) AUC | Test cohort (Cohort 2) AUC |
PyRadiomics model | 0.57 (95% CI: 0.42–0.73) | 0.54 (95% CI: 0.32–0.76) |
Transfer learning model | 0.72 (95% CI: 0.58–0.86) | 0.81 (95% CI: 0.64–0.98) |
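For reference, the AUC can be computed directly from predicted risk scores as the fraction of (death, survival) pairs in which the patient who died received the higher score. The sketch below shows this rank-based definition on hypothetical labels and scores; the function name `auc` and the toy data are ours. It does not reproduce the confidence intervals reported here, since the interval method is not restated in this section.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one.

    A tie between a positive and a negative score counts as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = death, 0 = survival) and predicted risk scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc(toy_labels, toy_scores)  # 8 of the 9 pairs are ranked correctly
```

An AUC near 0.5, as for the PyRadiomics model in the test cohort, means the scores rank such pairs about as well as chance.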
Confusion matrix of the PyRadiomics model in the test cohort.
Predicted death | 12 | 10 |
Predicted survival | 3 | 5 |
Confusion matrix of the transfer learning model in the test cohort.
Predicted death | 13 | 4 |
Predicted survival | 2 | 11 |
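Summary metrics can be derived from the two confusion matrices as sketched below. The column headers are not shown in the tables as reproduced here, so this sketch assumes that the first column holds patients who actually died and the second holds patients who actually survived; the derived values depend on that assumption. The function name `confusion_metrics` is ours.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity with death as the positive class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts read from the two test-cohort tables, ASSUMING column 1 = actual death
# and column 2 = actual survival (headers are not shown in the tables).
pyradiomics = confusion_metrics(tp=12, fp=10, fn=3, tn=5)
lungtrans = confusion_metrics(tp=13, fp=4, fn=2, tn=11)
```

Under that assumption, accuracy is 17/30 (about 0.57) for the PyRadiomics model and 24/30 (0.80) for the transfer learning model, with most of the difference coming from specificity (5/15 vs. 11/15).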
To investigate the prognostic value of each PyRadiomics feature, variable importance indices were calculated using the caret package in R. The top ten features were first order entropy, first order uniformity, first order interquartile range, GLSZM gray level non-uniformity normalized, GLRLM run length non-uniformity normalized, GLCM cluster tendency, NGTDM busyness, GLSZM small area high gray level emphasis, GLSZM low gray level zone emphasis, and GLSZM large area high gray level emphasis. This is consistent with previous studies in this field, where similar radiomic features have been reported to be prognostic of PDAC (Eilaghi et al.,
Comparing the ROC curves using the DeLong test (DeLong et al.,
In univariate Cox Proportional Hazard analysis, the risk score from the PyRadiomics model was not associated with overall survival. In contrast, the risk score from the LungTrans model had significant prognostic value with a Hazard Ratio of 1.86 [95% Confidence Interval (CI): 1.15–3.53],
Performance of risk score models in Cox Proportional Hazard analysis.
Risk score | Hazard ratio (95% CI) | p-value |
PyRadiomics based risk score | 1.03 (95% CI: 0.60–1.76) | 0.91 |
Transfer learning based risk score | 1.86 (95% CI: 1.15–3.53) | 0.04 |
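For readers less familiar with the Cox model, the hazard ratios in this analysis can be read through the standard univariate form below, where $r$ denotes the risk score, $h_0(t)$ the baseline hazard, and $\beta$ the fitted coefficient. The symbols are ours and are introduced only for illustration.

```latex
h(t \mid r) = h_0(t)\,\exp(\beta r), \qquad
\mathrm{HR} = \frac{h(t \mid r + 1)}{h(t \mid r)} = \exp(\beta)
```

Assuming the reported hazard ratio is expressed per unit increase in the risk score, a value of 1.86 corresponds to $\beta = \ln(1.86) \approx 0.62$, while a hazard ratio of 1.03 corresponds to a coefficient close to zero.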
Using the risk scores, patients can be categorized into low-risk or high-risk groups based on the median values. As shown in Kaplan-Meier plots in
Kaplan-Meier plots for overall survival in Cohort 2.
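The median-based dichotomization and the survival curves behind a Kaplan-Meier plot can be sketched as follows. The function names, the rule that a score exactly equal to the median is labeled low-risk, and all data values are assumptions made for illustration; they are not taken from the study.

```python
from statistics import median
from typing import List, Sequence, Tuple


def split_by_median(risk_scores: Sequence[float]) -> List[str]:
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate as a list of (event time, survival probability)."""
    curve = []
    survival = 1.0
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical risk scores, follow-up times (days), and event flags (1 = death).
scores = [0.2, 0.9, 0.4, 0.7, 0.6, 0.1]
days = [900, 200, 700, 300, 400, 1000]
died = [0, 1, 1, 1, 0, 0]

groups = split_by_median(scores)
high = [i for i, g in enumerate(groups) if g == "high"]
km_high = kaplan_meier([days[i] for i in high], [died[i] for i in high])
```

Running the same estimator on the low-risk indices gives the second curve; the two curves are then typically compared with a log-rank test.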
In this study, we developed and compared two prognostic models of overall survival for resectable PDAC patients: one built on the PyRadiomics feature bank and one built on a transfer learning feature bank from a CNN pretrained on lung CT images (LungTrans). The LungTrans model achieved significantly better prognostic performance than the traditional radiomics approach (AUC of 0.81 vs. 0.54). This result suggests that the transfer learning approach has the potential to significantly improve prognostic performance in resectable PDAC cohorts using CT images.
Previous transfer learning studies in medical imaging research often utilized ImageNet pretrained models (Chuen-Kai et al.,
Although the proposed transfer learning model outperformed the conventional radiomics model, this was not an indication to discard radiomic features altogether. These hand-crafted features have been shown to be prognostic for survival and recurrence in different cancer sites (Kumar et al.,
Despite these promising results, we should also note that the differences between NSCLC and PDAC are substantial in terms of their biological profiles and prognoses, and thus they may not have similar appearances in CT images. This is a limitation of the present study. A larger PDAC dataset would allow us to address these differences and test different transfer learning approaches in the context of PDAC prognosis. For example, finetuning a few layers of the CNN pretrained on NSCLC CT images using PDAC CT images would allow the network to extract features that may further adapt to the PDAC images and lead to better performance.
In this study, we aimed to improve the accuracy of the survival model using the transfer learning approach. For diseases with poor prognosis, including PDAC, binary survival classification offers clinicians limited information for decision making, since survival rates are usually low. It would be more beneficial to provide time vs. risk information, e.g., to identify the high-risk time intervals for a resectable PDAC patient using CT images. Future studies may choose to combine transfer learning-based feature extraction methods with the recent work on deep learning-based survival models (e.g., DeepSurv Katzman et al.,
Deep transfer learning has the potential to improve the performance of prognostication for cancers with limited sample sizes such as PDAC. In this work, the proposed transfer learning model outperformed a predefined radiomics model for prognostication in resectable PDAC cohorts.
The datasets of Cohort 1 and Cohort 2 analyzed during the current study are available from the corresponding author on reasonable request pending the approval of the institution(s) and trial/study investigators who contributed to the dataset.
This study was reviewed and approved by the research ethics boards of University Health Network, Sinai Health System, and Sunnybrook Health Sciences Centre. For this retrospective study, informed consent was obtained for Cohort 1 and the need for informed consent was waived for Cohort 2.
YZ, MAH, and FK contributed to the design of the concept. EML, SG, PK, MAH, and FK contributed to collecting and reviewing the data. YZ and FK contributed to the design and implementation of quantitative imaging feature extraction and machine learning modules. All authors contributed to the writing and reviewing of the paper and read and approved the final manuscript.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer JZ declared a past co-authorship with one of the authors FK to the handling editor.
This manuscript has been released as a pre-print at arXiv (Zhang et al.,
ROC: Receiver operating characteristic
AUC: Area under the ROC curve
CT: Computed tomography
CI: Confidence interval
CNN: Convolutional neural network
GLCM: Gray-Level Co-occurrence matrix
NSCLC: Non-small-cell lung cancer
PDAC: Pancreatic ductal adenocarcinoma
ROI: Region of interest
SMOTE: Synthetic minority over-sampling technique.