Prediction of 5-year progression-free survival in advanced nasopharyngeal carcinoma with pretreatment PET/CT using multi-modality deep learning-based radiomics

Gu, Bingxin; Meng, Mingyuan; Bi, Lei; Kim, Jinman; Feng, David Dagan; Song, Shaoli

doi:10.3389/fonc.2022.899351

ORIGINAL RESEARCH article

Front. Oncol., 29 July 2022

Sec. Head and Neck Cancer

Volume 12 - 2022 | https://doi.org/10.3389/fonc.2022.899351

Prediction of 5-year progression-free survival in advanced nasopharyngeal carcinoma with pretreatment PET/CT using multi-modality deep learning-based radiomics

Bingxin Gu^1,2,3,4,5†

Mingyuan Meng^6†

Lei Bi⁶

Jinman Kim⁶

David Dagan Feng⁶

Shaoli Song^1,2,3,4,5,7*

¹Department of Nuclear Medicine, Fudan University Shanghai Cancer Center, Shanghai, China
²Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
³Center for Biomedical Imaging, Fudan University, Shanghai, China
⁴Shanghai Engineering Research Center of Molecular Imaging Probes, Shanghai, China
⁵Key Laboratory of Nuclear Physics and Ion-beam Application Ministry of Education (MOE), Fudan University, Shanghai, China
⁶School of Computer Science, The University of Sydney, Sydney, NSW, Australia
⁷Department of Nuclear Medicine, Shanghai Proton and Heavy Ion Center, Shanghai, China

Objective: Deep learning-based radiomics (DLR) has achieved great success in medical image analysis and has been considered a replacement for conventional radiomics that relies on handcrafted features. In this study, we aimed to explore the capability of DLR for the prediction of 5-year progression-free survival (PFS) in advanced nasopharyngeal carcinoma (NPC) using pretreatment PET/CT images.

Methods: A total of 257 patients (170/87 patients in internal/external cohorts) with advanced NPC (TNM stage III or IVa) were enrolled. We developed an end-to-end multi-modality DLR model, in which a 3D convolutional neural network was optimized to extract deep features from pretreatment PET/CT images and predict the probability of 5-year PFS. The TNM stage, as a high-level clinical feature, could be integrated into our DLR model to further improve the prognostic performance. For a comparison between conventional radiomics and DLR, 1,456 handcrafted features were extracted, and optimal conventional radiomics methods were selected from 54 cross-combinations of six feature selection methods and nine classification methods. In addition, risk group stratification was performed with clinical signature, conventional radiomics signature, and DLR signature.

Results: Our multi-modality DLR model using both PET and CT achieved higher prognostic performance (area under the receiver operating characteristic curve (AUC) = 0.842 ± 0.034 and 0.823 ± 0.012 for the internal and external cohorts) than the optimal conventional radiomics method (AUC = 0.796 ± 0.033 and 0.782 ± 0.012). Furthermore, the multi-modality DLR model outperformed single-modality DLR models using only PET (AUC = 0.818 ± 0.029 and 0.796 ± 0.009) or only CT (AUC = 0.657 ± 0.055 and 0.645 ± 0.021). For risk group stratification, the conventional radiomics signature and DLR signature enabled significant difference between the high- and low-risk patient groups in both the internal and external cohorts (p < 0.001), while the clinical signature failed in the external cohort (p = 0.177).

Conclusion: Our study identified potential prognostic tools for survival prediction in advanced NPC, which suggests that DLR could provide complementary values to the current TNM staging.

Introduction

Nasopharyngeal carcinoma (NPC) is a malignant epithelial cancer. The GLOBOCAN 2020 estimates that there are approximately 133,354 new NPC cases and 80,008 NPC-related deaths worldwide in 2020 (according to the International Agency for Research on Cancer (IARC)) (1). NPC mainly arises from the nasopharynx epithelium, especially the fossa of Rosenmuller (2), and can be pathologically divided into keratinizing differentiated tumor, non-keratinizing differentiated tumor, and non-keratinizing undifferentiated tumor. Due to its unique anatomical structure and radiosensitivity (3, 4), the primary therapeutic regimen for NPC is radiotherapy, with or without chemotherapy, targeted therapy, and immunotherapy. Survival prediction is a major concern in clinical cancer research, as it provides early prognostic information that is needed to guide the therapeutic regimen. In clinical practice, the tumor, node, metastasis (TNM) stage is widely used as an indicator for survival prediction according to the American Joint Committee on Cancer (AJCC)/Union for International Cancer Control (UICC) staging system. However, for patients classified into the same TNM stage, their prognoses may differ widely, and their 5-year survival rates range from 10% to 40% for advanced NPC (5–7). This may be attributed to the fact that the TNM stage only takes into account the anatomical information, e.g., size, number, border, and location. Therefore, the TNM stage can provide limited benefit for prognoses in patients with advanced NPC (8).

Many clinical biomarkers, such as age, serum lactate dehydrogenase (LDH), body mass index (BMI), and Epstein–Barr virus (EBV), have also been reported as individual prognostic indicators for survival prediction in advanced NPC (9–11). However, these indicators are not specifically relevant to the disease and can be influenced by other indicators, thus failing in repeatability and practicability (12, 13). Non-invasive image-derived biomarkers have also shown good prognostic performance for survival prediction in advanced NPC (14–16). However, conventional imaging modalities, such as computed tomography (CT) and magnetic resonance imaging (MRI), only provide the tumor’s anatomical information. Multi-modality imaging of positron emission tomography/CT (PET/CT) provides both anatomical (from CT) and metabolic (from PET) information about the tumor. However, conventional parameters derived from PET/CT, including standardized uptake value (SUV), metabolic tumor volume (MTV), and total lesion glycolysis (TLG), failed to represent intra-tumor information such as tumor texture, intensity, heterogeneity, and morphology (14, 15, 17, 18). Therefore, prognostic indicators that can better represent the tumor characteristics, especially in intra-tumor information, can more accurately predict survival.

Radiomics, as a widely recognized computational method for prognosis, exploits quantitative features (indicators) extracted from medical images to represent tumor characteristics (19). It has drawn much attention among clinical oncologists due to its ability to provide comprehensive representations of tumor characteristics, including intra-tumor information (20). Conventional radiomics refers to the extraction and analysis of high-dimensional handcrafted features from medical images. Through high-throughput feature extraction and statistical machine learning methods, radiomics can extract and analyze tumor characteristics and has been widely used in many clinical applications (20). Zhang et al. (21) performed radiomics-based prediction of local failure and distant failure in advanced NPC from MRI images. They experimented with 54 cross-combinations derived from six feature selection methods and nine classification methods, and they identified optimal combinations in terms of area under the receiver operating characteristic (ROC) curve (AUC) and testing error. Lv et al. (22) extracted shape, SUV/Hounsfield unit (HU), and texture features from PET and CT images separately and demonstrated that these features could improve prognostic prediction for NPC patients. Xu et al. (23) partitioned the primary tumor of NPC from PET/CT images into different subregions and then separately extracted handcrafted features from these subregions. Yang et al. (24) evaluated the robustness of the handcrafted features extracted from tumor volumes on PET/MR of NPC patients. Conventional radiomics has also been studied in other cancers, such as other head and neck cancers (25) and lung cancer (26). A research of Lv et al. (27) demonstrated that multi-level multi-modality fusion radiomics-based model derived from PET/CT outperformed single-modality models in prognostic prediction for head and neck cancer patients. These studies demonstrated the capabilities of radiomics for prognosis and identified the optimal conventional radiomics methods for their clinical targets through comprehensive comparisons. However, since conventional radiomics is heavily dependent on human prior knowledge, such as handcrafted feature extraction and manual tuning of many model parameters, its limitations in bringing a source of human bias and lacking the ability to understand high-level semantic information have been well recognized (28, 29).

Recently, deep learning, which leverages deep neural networks to learn deep representations (features) of patterns within images, has achieved great success in medical image analysis and inspired trends toward deep learning-based radiomics (DLR) (30). Compared to conventional radiomics, which normally consists of four separate steps (Figure 1A), DLR adopts a deep neural network to directly predict patient outcomes from the preprocessed image data (Figure 1B). DLR can take segmented regions of interest (ROIs) as input, but this is optional, with some pipelines that exclude ROI segmentation (30). DLR’s feature extraction and feature analysis are jointly learned, in an end-to-end manner, which thereby removes the reliance on time-consuming handcrafted feature extraction, and allows for automatic learning of relevant and robust features without human intervention (28). In other words, DLR can remove the human bias brought by handcrafted features and potentially discover high-level semantic features that may be overlooked by manually defined feature extraction.

FIGURE 1

Figure 1 An illustration of radiomics process. (A) Conventional radiomics consisting of four steps: image acquisition/reconstruction, ROI segmentation, feature extraction, and statistical analysis/modeling. (B) DLR, its feature extraction and analysis are jointly learned using a deep neural network. The dotted arrow indicates the optional ROI segmentation operation. ROI, region of interest; DLR, deep learning-based radiomics.

DLR has been widely used in the studies of many cancers, including glioma (31), lung cancer (32), breast cancer (33), renal tumor (34), and NPC (35–38). Peng et al. (35) proposed one of the earliest studies where DLR was introduced into the prognosis of NPC. They used pre-trained convolutional neural networks (CNNs) to extract deep features from PET and CT images separately and then fed the deep features, as well as conventional handcrafted features, into a Cox proportional hazards regression model to establish a prognostic nomogram. Their study suggested that deep features can serve as reliable and powerful indicators for prognosis, but their feature extraction and analysis were not jointly learned in an end-to-end manner. Existing end-to-end DLR studies (31–34, 36–38) are limited by the following: 1) they were mainly designed for a single imaging modality such as MRI and CT, so their DLR models cannot derive complementary features from multi-modality PET/CT images; 2) they had limited comparison to the conventional radiomics methods (e.g., only a few conventional radiomics methods were included for comparison), which undermines the reliability of their conclusions.

In this study, we aimed to develop an end-to-end multi-modality DLR model to directly predict 5-year progression-free survival (PFS) from pretreatment PET/CT images and perform a comprehensive comparison with conventional radiomics methods. Our DLR model is a 3D CNN that was purposely optimized for multi-modality PET/CT images and can simultaneously extract complementary deep features from both PET and CT images. Our DLR model can integrate the TNM stage as a high-level clinical feature, and this has been demonstrated to further improve prognostic performance.

Material and methods

Patients and PET/CT image data

From November 2009 to May 2019 and June 2014 to May 2019, the medical records of 281 NPC patients from Fudan University Shanghai Cancer Center (FUSCC) and 178 NPC patients from Shanghai Proton and Heavy Ion Center (SPHIC) undergoing PET/CT were retrospectively screened, respectively. The eligibility criteria were as follows: i) TNM stage III or IVa according to the 8th edition of AJCC guidelines, ii) the follow-up time of patients without progression of more than 5 years, and iii) available clinical data and PET/CT imaging data. Finally, 170 patients from FUSCC and 87 patients from SPHIC were enrolled in this study (Figure 2). All patients received therapeutic regimens at FUSCC or SPHIC according to the National Comprehensive Cancer Network (NCCN) guidelines. The detailed therapeutic regimen is presented in the Supplementary Material.

FIGURE 2

Figure 2 The flowchart of patient inclusion and exclusion. NPC, nasopharyngeal carcinoma; FUSCC, Fudan University Shanghai Cancer Center; SPHIC, Shanghai Proton and Heavy Ion Center.

After completion of initial treatment, physical examination, imaging examination, and nasopharyngoscopy were performed every 3 months in the first 2 years, then every 6 months in the third to fifth year, and once a year thereafter. Treatment responses were identified by imaging examination according to Response Evaluation Criteria in Solid Tumors (RECIST) 1.1 (39). The following endpoint was set as PFS, defined as the time from randomization to disease progression (locoregional or distant) or death (3). The median follow-up time was 89 months (range, 61–149 months) for FUSCC and 65 months (range, 29–92 months) for SPHIC. FUSCC and SPHIC Ethical Committee approved this study, and informed consent was obtained from all enrolled patients.

All patients underwent ¹⁸F-fluorodeoxyglucose (¹⁸F-FDG) PET/CT (Siemens Biograph 16HR, Knoxville, TN, USA) prior to treatment within 4 weeks. Detailed data acquisition and reconstruction were recorded in the Supplementary Material.

Problem definition

We aimed for long-time survival prediction in patients with advanced NPC using pretreatment PET/CT images. In this study, we mainly focused on 5-year PFS and regarded the survival prediction as a binary classification problem. Specifically, the patients with PFS ≤ 60 months (5 years) were labeled as 1, while other patients (PFS > 60 months) were labeled as 0. Then, the objective of this study was to classify patients into these two classes (0 or 1) using pretreatment PET/CT images.

Internal and external cohorts

The 170 patients acquired from FUSCC were assigned to an internal cohort, while the 87 patients acquired from SPHIC were assigned to an external cohort. The internal cohort was used for the establishment and internal validation of all prognostic methods. Specifically, each method was trained and validated using fivefold cross-validation within the internal cohort, and all hyper-parameters were decided based on the internal validation. For external validation, the five models of each method, established at the fivefold cross-validation, were validated on the external cohort. It should be noted that the external cohort is completely independent of the model establishment.

Data preprocessing

All PET/CT images were preprocessed through the following steps: 1) primary tumors were segmented on PET and CT images simultaneously using a semi-automatic segmentation algorithm, which is available with the ITK-SNAP software (version 3.8.0, http://www.itksnap.org). The semi-automatic segmentation results were then manually adjusted and refined by a senior nuclear medicine physician (SL-S) to ensure reliability. 2) PET images were normalized based on body mass. The derived body mass was applied to convert PET images into SUV maps. 3) PET/CT images and tumor segmentation masks were resampled into isotropic voxels of unit dimension to ensure comparability, where 1 voxel corresponds to 1 mm³. Specifically, PET/CT images and segmentation masks were resampled via linear interpolation and nearest-neighbor interpolation, respectively. 4) PET/CT images were multiplied with the binary masks of tumor segmentation to extract ROI and then were cropped into three-dimensional patches with a size of 80 × 80 × 64 voxels. The patch size was selected to ensure that the whole segmented tumor can be included and the tumor center aligned with the patch center. 5) The patches were then normalized to the range [0, 1] individually using the 99th percentile pixel value and 0 as upper and lower limits, respectively. The pixels whose values are higher/lower than the upper/lower limit were cut off to be 1/0.

Conventional radiomics analysis

For performing a comprehensive comparison between DLR and conventional radiomics, we first followed the analytic scheme proposed by Zhang et al. (21) to identify the optimal conventional radiomics methods for our research objective, and then the optimal methods were chosen as benchmarks and compared to our DLR model in the following analysis. Specifically, a total of six feature selection methods based on statistical approaches were used in the analysis: L¹-Logistic Regression (L¹-LOG), L¹-Support Vector Machine (L¹-SVM), Random Forest (RF), Distance Correlation (DC), Elastic Net Logistic Regression (EN-LOG), and Sure Independence Screening (SIS). For classification methods, we investigated nine machine learning classifiers: L²-Logistic Regression (L²-LOG), Kernel Support Vector Machines (KSVM), Linear-SVM (LSVM), Adaptive Boosting (AdaBoost), Random Forest (RF), Neural Network (Nnet), K-nearest neighborhood (KNN), linear discriminant analysis (LDA), and Naive Bayes (NB). A total of 54 cross-combinations can be derived from the six feature selection methods and nine classification methods. All conventional radiomics methods were implemented using the R package (version 3.6.3, http://www.R-project.org). The following R packages were used for feature selection methods: ‘SIS’ (SIS), ‘VSURF’ (RF), ‘LiblineaR’ (L¹-LOG and L¹-SVM), ‘Energy’ (DC), and ‘glmnet’ (EN-LOG). Additionally, the following packages were used for classification methods: ‘e1071’ (KSVM), ‘LibLineaR’ (LSVM and L²-LOG), ‘adabag’ (AdaBoost), ‘randomForest’ (RF), ‘nnet’ (Nnet), ‘knn’ (KNN), ‘MASS’ (LDA), and ‘e1071’ (NB).

We extracted handcrafted radiomics features for each patient, including 19 features from First Order Statistics (FOS), 24 features from Grey-Level Cooccurrence Matrix (GLCM), 16 features from Grey-Level Run Length Matrix (GLRLM), 16 features from Grey-Level Size Zone Matrix (GLSZM), 5 features from Neighboring Grey Tone Difference Matrix (NGTDM), and 16 features based on the 3D shape of tumors. The 16 shape-based features were extracted from the segmentation mask, while the remaining (19 + 24 + 16 + 16 + 5) = 80 FOS and textural features (GLCM, GLRLM, GLSZM, and NGTDM) were extracted from the preprocessed PET and CT patches, respectively. FOS and textural features also were recomputed after different wavelet decomposition in three directions (x, y, and z) of PET and CT. Performing low-pass or high-pass wavelet filters along x, y, or z directions resulted in eight decompositions of the original image (LLL, LLH, LHL, LHH, HHH, HLL, HHL, and HLH). Therefore, FOS and textural features were extracted from a total of (1 + 8) = 9 decompositions of PET and CT (including original ones). Consequently, the number of handcrafted radiomics features is 80 × 9 × 2 + 16 = 720 × 2 + 16 = 1,456. A total of 720 features were extracted from PET or CT, while 16 features were extracted from tumor segmentation masks.

Redundant features with Spearman’s correlation >0.7 were first eliminated. Then, the remaining ones were fed into a feature selection model to screen for robust and relevant features. The number of the selected features derived from each selection method was regarded as a hyper-parameter and was individually optimized through grid searching from 1 to 30, with 1 as a step. Specifically, for each selection–classification combination, we tried to retain 1–30 features from the selection model and fed them into the classification model. The remaining number resulting in the highest internal validation result was chosen. Therefore, the number of the selected features for each combination is different. The average number of selected features derived from each selection method is as follows: L¹-LOG (10.14), L¹-SVM (8.53), SIS (6.43), DC (6.85), EN-LOG (8.26), and RF (4.57). Moreover, we regarded the TNM stage as a clinical feature because it showed significant relevance to PFS in our univariate analysis (Supplementary Table S1). Both the selected radiomics features and clinical feature (TNM stage) were fed into a classification model for statistical analysis. The top three optimal combinations were identified based on the internal validation and then were further validated on the external cohort.

Deep learning-based radiomics analysis

We developed a multi-modality DLR model to directly predict 5-year PFS from pretreatment PET/CT images in an end-to-end manner. This DLR model is a 3D CNN, and its architecture is based on 3D Deep Multi-modality Collaborative Learning (3DMCL) (29), a deep multi-modality architecture for predicting the distant metastases of soft-tissue sarcoma from PET/CT images. It takes as input a pair of preprocessed PET/CT patches (segmented ROIs) and outputs normalized probabilities for both classes (0 or 1). Note that although segmentation is not compulsory for DLR, we used the segmented ROIs in our DLR models for a fair comparison with conventional radiomics methods. The DLR network architecture is shown in Figure 3, which consists of two separate branches processing the PET and CT images in a simultaneous manner. For each PET and CT branch, there are five convolutional layers (Conv1–5) of 16, 32, 64, 128, and 256 filters with kernel sizes of 5, 3, 3, 3, and 3, respectively. The middle four convolutional layers (Conv2–5) are with a stride of two to reduce the resolution of feature maps. To fuse the feature maps derived from both branches, the outputs of PET and CT branches are concatenated together and then fed into another convolutional layer (Conv6) of 768 filters with a kernel size of 3. Each convolutional layer is followed by a batch normalization layer and an Exponential Linear Unit (ELU) activation. The feature maps obtained from the last convolutional layer (Conv6) are considered deep features with high-level semantic information related to tumor characteristics. Finally, four fully connected layers (FC1–4), which have 1,024, 512, 256, and 2 nodes, respectively, are added to perform survival prediction based on deep features. The first three fully connected layers (FC1–3) are followed by a ReLU activation and a dropout layer, while the last layer (FC4) is followed by a softmax classifier layer, which outputs normalized probabilities of two classes (0 or 1). The TNM stage, as a high-level clinical feature, can be concatenated with the third fully connected layer (FC3).

FIGURE 3

Figure 3 An illustration of the CNN used in the DLR analysis. This network takes a pair of preprocessed PET/CT patches (segmented ROIs) as input, and the final layer (FC4) outputs normalized probabilities for both classes (0 or 1). The clinical feature (TNM stage) is concatenated with FC3. CNN, convolutional neural network; DLR, deep learning-based radiomics; ROIs, regions of interest.

We first developed a DLR model taking PET, CT, and TNM stages as input (named the PET + CT + TNM model). Then, we further established three degraded DLR models to evaluate the individual values of PET, CT, and TNM stage: 1) PET + CT model, a DLR model taking PET and CT as input. In this model, the TNM stage was not concatenated with the FC3 layer. 2) PET + TNM model: a DLR model taking PET and the TNM stage as input, in which the CT branch was truncated and only the PET branch was connected to the Conv6 layer. 3) CT + TNM model: a DLR model taking CT and the TNM stage as input, in which the PET branch was truncated and only the CT branch was connected to the Conv6 layer.

All DLR models were implemented using Keras with a Tensorflow backend on a 12GB TITAN V GPU. We used the Adam optimizer with a batch size of 32 and a learning rate of 0.0001 for training the model. Cross-entropy loss function was used as the loss and our training stops when there are no further drops in the total loss. All the hyper-parameters, including batch size, learning rate, dropout rate, and regularization terms, were chosen through fivefold cross-validation within the internal cohort. During the training stage, data augmentation was applied to the input images in real-time to avoid overfitting. The used data augmentation techniques included random translations up to 8 pixels, random rotations up to 15 degrees, and random flipping along three axes. We sampled an equal number of positive and negative samples during the data augmentation process to minimize the problem introduced by unbalanced classes.

Statistical analysis

For clinical and conventional PET parameters including age, gender, EBV status, histology, BMI, T stage, N stage, TNM stage, SUV_max, SUV_mean, MTV, and TLG, frequencies with percentages were used to describe categorical variables; medians or means with ranges were used to describe continuous characteristics. Differences in these parameters between the internal and external cohorts were assessed using the Mann–Whitney test (for continuous characteristics) and the X² test or Fisher’s exact test (for discrete characteristics). Univariate and multivariate analyses were performed using Cox proportional hazards regression. Factors with p < 0.1 in univariate analysis were taken into multivariate analysis. Cox proportional regression analyses were implemented using SPSS (version 22.0; IBM Inc., New York, NY, USA). p < 0.05 was considered statistically significant.

The performance of conventional radiomics methods and DLR models was evaluated using AUC and testing error. The statistical significance between AUCs was tested via DeLong’s method using R packages (version 3.6.3, http://www.R-project.org).

Furthermore, survival analyses using the Kaplan–Meier method were performed for risk group stratification. Specifically, patients with positive/negative prediction results were stratified into high/low-risk groups, and then a two-sided log-rank test was used to compare the two groups. The survival analyses were performed using SPSS.

Results

Patients and Cox regression analyses for clinical and conventional PET parameters

A total of 170 advanced NPC patients (female = 40, male = 130; median age = 46, range 16–78) were included in the internal cohort (Table 1). Among these patients, 121 patients (71.18%) were diagnosed with TNM stage III, and 49 patients (28.82%) were TNM stage IVa; 126 patients (74.12%) were non-keratinizing undifferentiated NPC; 80 patients (47.06%) were infected with EBV. Induction chemotherapy (IC) was given to 147 out of 170 (86.47%) patients, while concurrent chemoradiotherapy (CCRT) was used for 88 out of 170 (51.76%) patients. Furthermore, 77 out of 170 (45.29%) patients were treated with IC + CCRT. The mean SUV_max value for the primary tumor was 11.40 g/ml (range, 2.74-47.49 g/ml), the mean MTV value was 30.34 ml (range, 0.17–152.99 ml), and the mean TLG value was 155.92 g (range, 0.45–864.03 g). The detailed baseline data of the external cohort were also summarized in Table 1.

TABLE 1

Table 1 Patient characteristics in the internal and external cohorts.

Furthermore, 48 out of 170 (28.24%) patients (for the internal cohort) and 45 out of 87 (51.72%) patients (for the external cohort) suffered disease progression or death after treatment within the first 5 years. Among clinical and conventional PET parameters, only the TNM stage was significantly associated with PFS in univariate analysis (p = 0.047) in the internal cohort, while no factors showed a significant correlation with PFS in the external cohort (Supplementary Table S1). Furthermore, MTV and TLG showed significant association with PFS in univariate analysis (p = 0.002 and p = 0.002) on the internal cohort using a cutoff value of 39.80 ml and 198.68 g, respectively. However, none of these attributes showed a significant correlation with PFS in the multivariate analysis of both cohorts (Supplementary Table S2).

Prognostic performance of conventional radiomics

Figure 4A shows that the combinations with top five mean AUC are as follows: RF + RF (0.796 ± 0.033), RF + AdaBoost (0.783 ± 0.041), SIS + LSVM (0.778 ± 0.024), L¹-LOG + L²-LOG (0.772 ± 0.027), and L¹-LOG + KSVM (0.769 ± 0.038). As demonstrated in Figure 4B, the combinations with bottom five mean testing error are as follows: RF + RF (0.267 ± 0.037), RF + KSVM (0.283 ± 0.034), RF + AdaBoost (0.286 ± 0.029), L¹-LOG + KSVM (0.298 ± 0.026), and RF + KNN (0.301 ± 0.032).

FIGURE 4

Figure 4 A heatmap depicting the mean internal validation AUC (A) and testing error (B) of all 54 cross-combinations derived from six feature selection methods (in rows) and nine classification methods (in columns). AUC, area under the receiver operating characteristic curve.

Scatterplot was used to screen out the combinations with both high AUC and low testing error. Figure 5 illustrates that the optimal combinations are RF + RF (AUC = 0.796 ± 0.033, testing error = 0.267 ± 0.037), RF + AdaBoost (AUC = 0.783 ± 0.041, testing error = 0.286 ± 0.029), and L¹-LOG + KSVM (AUC = 0.769 ± 0.038, testing error = 0.298 ± 0.026), which show higher prognostic performance than other combinations.

FIGURE 5

Figure 5 A scatterplot depicting the mean internal validation AUC and testing error of all 54 cross-combinations derived from six feature selection methods and nine classification methods. Two red dotted lines distinguish the combinations that show top five performance in AUC and testing error. Three highly reliable and prognostic combinations that show top five performance in both AUC and testing error (RF + RF, RF + AdaBoost, and L¹-LOG + KSVM) are displayed in red points. AUC, area under the receiver operating characteristic curve; RF, Random Forest; AdaBoost, Adaptive Boosting; L¹-LOG, L1-Logistic Regression; KSVM, Kernel Support Vector Machines.

Comparison between deep learning-based radiomics and conventional radiomics

We first compared the performance of DLR models using ROC analysis. Figure 6 and Table 2 show that the PET + CT + TNM model has the highest prognostic performance in the internal validation (AUC = 0.842 ± 0.034, 95% CI: 0.801–0.889; testing error = 0.194 ± 0.029) and external validation (AUC = 0.823 ± 0.012, 95% CI: 0.787–0.862; testing error = 0.238 ± 0.008), while PET + CT model and PET + TNM model show lower but also good prognostic performance in the internal validation (AUC = 0.825 ± 0.041 and 0.818 ± 0.029, 95% CI: 0.775–0.870 and 0.762–0.862; testing error = 0.223 ± 0.035 and 0.218 ± 0.024) and external validation (AUC = 0.819 ± 0.017 and 0.796 ± 0.009, 95% CI: 0.778–0.856 and 0.747–0.829; testing error = 0.241 ± 0.009 and 0.262 ± 0.006). However, CT + TNM model shows much lower prognostic performance in the internal validation (AUC = 0.657 ± 0.055, 95% CI: 0.596–0.718; testing error = 0.375 ± 0.048) and external validation (AUC = 0.645 ± 0.021, 95% CI: 0.591–0.709; testing error = 0.403 ± 0.011).

FIGURE 6

Figure 6 ROC curves for comparison among four DLR models and comparison among optimal conventional radiomics methods and optimal DLR model on the internal cohort (A, C) and external cohort (B, D). ROC, receiver operating characteristic; DLR, deep learning-based radiomics.

TABLE 2

Table 2 Comparison among four DLR models.

Then, we further compared the optimal DLR model (PET + CT + TNM model) with the optimal conventional radiomics methods (RF + RF, RF + AdaBoost, and L¹-LOG + KSVM). Figure 6 and Table 3 show that the DLR model has significantly higher prognostic performance than the three conventional radiomics methods in the internal validation (AUC = 0.842 vs. 0.796, 0.783, and 0.769; p = 0.038, 0.046, and 0.027, respectively) and external validation (AUC = 0.823 vs. 0.782, 0.767, and 0.755; p = 0.043, 0.026, and 0.015, respectively).

TABLE 3

Table 3 Comparison of the optimal conventional radiomics methods and DLR model.

Survival analysis for risk group stratification

We used the optimal clinical signature (TNM stage), conventional radiomics signature (RF + RF), and DLR signature (PET + CT + TNM model) for risk group stratification. Figure 7 shows the Kaplan–Meier curves of the high- and low-risk patient groups stratified by the clinical signature (A, D), conventional radiomics signature (B, E), and DLR signature (C, F). The conventional radiomics signature and DLR signature enabled significant differences between the high- and low-risk patient groups in both internal and external cohorts (p < 0.001), while the clinical signature failed in the external cohort (p = 0.042 and 0.177 for the internal and external cohorts).

FIGURE 7

Figure 7 Kaplan–Meier curves of risk group stratification based on clinical signature (TNM stage), conventional radiomics signature (RF + RF), and DLR signature (PET + CT + TNM model) on the internal cohort (A–C) and external cohort (D–F). RF, Random Forest; DLR, deep learning-based radiomics.

Discussion

In this study, we proposed an end-to-end multi-modality DLR model to directly predict 5-year PFS from pretreatment PET/CT images in advanced NPC patients. The main finding of this study is that multi-modality image-derived DLR outperformed clinical indicators, conventional PET predictors, and conventional radiomics methods.

Pretreatment medical images contain much more information than diagnosis and the TNM stage. However, clinical oncologists usually only employ ‘visible’ information in routine clinical practice and treatment planning. Conventional PET parameters, such as MTV or TLG, have been demonstrated to serve as an independent predictor for survival prediction (14, 15, 17). However, in this study, the conventional PET parameters (SUV_max, SUV_mean, MTV, and TLG) showed no significant association with PFS (Supplementary Tables S1, S2), which is consistent with some studies where the conventional PET parameters also failed to show significant association for survival prediction (40–42). Therefore, we suggest that the conventional PET parameters have limited reproductivity among different studies and different centers.

In radiomics analysis, we found that both conventional radiomics and DLR showed consistent prognostic performance in the internal and external cohorts, despite that the distribution of some clinical characteristics and progression rate showed significant differences (e.g., N2 and N1) between the two cohorts, which indicates strong generalizability. This is possible because radiomics extracts feature from PET and CT images, which employ intra-tumor information such as tumor texture, intensity, heterogeneity, and morphology (43). Intra-tumor information can reflect intra-tumor heterogeneity that is caused by genetic instability and potentially leads to drug resistance and treatment failure (44). The uptake of ¹⁸F-FDG in tumor cells can reflect the intra-tumor heterogeneity by exhibiting variations in glucose metabolism of different tumor regions (45). However, the conventional parameters derived from PET/CT (e.g., MTV and TLG) can only represent the apparent metabolism information, which thereby fails to reflect the intra-tumor heterogeneity (46).

Conventional radiomics depends heavily on handcrafted feature extraction and manual tuning of statistical models. Therefore, in many studies (21, 25, 47–49), different feature extraction schemes, feature selection methods, and statistical models were investigated to identify optimal conventional radiomics methods for a specific clinical target. Our conventional radiomics analysis followed the scheme proposed by Zhang et al. (21), and we derived 54 cross-combinations from six feature selection methods and nine classification methods. Considering both AUC and testing error (Figure 4), the optimal combinations were RF + RF and RF + AdaBoost, which is consistent with a study by Zhang et al. (21). DLR is considered a replacement for conventional radiomics and has demonstrated superior prognostic performance in some studies (31–33). However, there are few studies where DLR was comprehensively compared with conventional radiomics. In most literature (31–34, 36–38), only a few conventional radiomics methods were included in the comparison, but there is no guarantee that the included methods are optimal for their clinical targets. In this study, the three optimal conventional radiomics methods chosen from 54 cross-combinations were chosen as the benchmarks for further comparison with our DLR model. Our DLR models showed the highest performance in prognosis (Table 3) and revealed the best risk group stratification for advanced NPC patients when compared to clinical and conventional radiomics signatures (Figure 7). Through the comprehensive comparison, we suggest that DLR is superior to conventional radiomics in the prediction of long-time PFS in advanced NPC. We attribute this superiority to three main reasons: first, DLR allows for automatic learning of robust and relevant features from multi-modality PET/CT images, which reduces the human bias (or even errors) caused by handcrafted feature extraction. Second, automatic feature learning discovered high-level semantic features that may be overlooked by the manually defined feature extraction. Third, it has been widely recognized that CNNs have the outstanding ability for pattern recognition (50), so DLR can inherit this ability by employing CNN architectures.

Although DLR has been applied for prognoses of NPC in several previous studies (35, 37, 38), there still exist the following limitations: Peng et al. (35) used 2.5D CNNs to extract deep features rather than to directly predict survival outcomes in an end-to-end manner, and the deep features were extracted from informative slices instead of from the entire tumor volume. Jing et al. (38) addressed the aforementioned limitations by employing a 3D end-to-end DLR model that directly predicts the risk of disease progression based on the entire tumor volume. Their study demonstrated that end-to-end DLR models are more effective to extract relevant features and showed higher prognostic performance, but their study was limited by only using single-modality MRI. Zhang et al. (37) also used an end-to-end DLR model to directly predict distant metastasis-free survival (DMFS), but their study suffered from the same limitations as a study by Peng et al. and Jing et al.: they also used a 2D CNN that makes decisions based on maximum three tumor slices, and their study was also limited by using single-modality MRI. In this study, we have overcome these limitations by developing a 3D multi-modality end-to-end DLR model that was purposely optimized for PET/CT images. In contrast to a study by Peng et al. where deep features were separately extracted from PET and CT images, our DLR model can simultaneously extract complementary deep features from both PET and CT images. In the experiments, since a study by Zhang et al. and our study both focused on similar clinical targets (DMFS and PFS) and used the same evaluation metrics (AUC), their result can be directly compared with ours. Our multi-modality DLR model (PET + CT + TNM model) showed higher prognostic performance than the method of Zhang et al. (AUC of 0.842 vs. 0.795), in spite of the fact that Zhang et al. employed additional information (e.g., EBV DNA and treatment regimens) for prediction. The result demonstrates that, by overcoming the aforementioned limitations with a 3D PET/CT-based end-to-end DLR model, our study indeed gained improvements over previous DLR studies in NPC.

The TNM staging system is widely used in the clinic for risk stratification and decision-making for treatment, but the TNM stage alone is limited for survival prediction (8). In the survival analysis (Figure 7), no significant difference between the high- and low-risk patient groups was identified in the external cohort (p = 0.177), which indicates that the TNM stage lacks accuracy in risk group stratification. Nevertheless, existing studies have shown that the TNM stage combined with other clinical signatures may improve the performance for risk discrimination (51–53). Our results show that the TNM stage was significantly associated with PFS in univariate analysis (Table 1). Therefore, we further combined the TNM stage into our DLR models (PET + CT + TNM, PET + TNM, and CT + TNM models) to evaluate its capability for further improving the prognostic performance. For evaluating the value of the TNM stage, we established a DLR model excluding the TNM stage (PET + CT model). The results showed that the PET + CT + TNM model has improved prognostic performance compared to the PET + CT model (Table 2), which suggests that the TNM stage can provide supplementary information and enhance the prognostic performance of survival prediction in NPC patients. We further compared the multi-modality DLR model (PET + CT + TNM model) with two single-modality DLR models (PET + TNM and CT + TNM model) for additional comparisons (Table 2). The results showed that the PET + CT + TNM models outperformed the PET + TNM and CT + TNM models, which suggests that integrating both metabolic (from PET) and anatomical (from CT) information helps achieve higher prognostic performance.

There are some limitations to the current study. First, EBV status was determined by testing plasma anti-EBV IgA antibodies rather than plasma EBV DNA levels in our center, and EBV status was missing for 35.02% of the patients, which might limit the accuracy of statistical analysis. This was because the technology was only recently adopted in our center. Second, our internal cohort is relatively small (170 patients). To alleviate this limitation, we performed fivefold cross-validation within the internal cohort for the establishment and internal validation of all prognostic methods. Then, all methods were further validated on the external cohort acquired from a different center. These settings help to avoid the sampling bias led by a single random training-validation split and gain generalizable results in a small dataset (54). Third, the progression rate of the external cohort is higher than that of the internal cohort (51.72% vs. 28.24%). This is attributed to the fact that patients in the external cohort were enrolled from SPHIC, a young center established in June 2014, with the enrollment time from June 2014 to May 2019. Since we focused on the survival analysis of 5-year PFS, many patients without disease progression were excluded, as they were not followed up for sufficient time (more than 5 years). In contrast, all patients with disease progression were enrolled, which inevitably increased the progression rate in the external cohort. Nevertheless, the variations between the internal and external cohorts did validate that radiomics methods have strong generalizability. Radiomics methods performed patient-wise prediction based on PET/CT of each patient, such that their prediction results were not affected by the progression rate in the cohorts. Fourth, DLR models are limited by the ‘black box’ problem: an algorithm operates in an obscured space that is inaccessible to humans (55). Despite this, existing clinical studies (30–38), even conducted at a very large scale (56), demonstrate the advantages and feasibility of using these models for clinical purposes. A recent study has attempted to make advances into this problem and is leading toward a solution that physicians and patients can better understand how DLR makes decisions from medical images (57). Finally, since we focused on the comparison of DLR and conventional radiomics in their capability for survival prediction in advanced NPC, we, therefore, optimized our DLR models based on the existing reliable and proven deep learning architectures [3DMCL (29)]. In our future work, we plan to evaluate other deep learning architectures and apply the proposed DLR model to other types of cancer diseases.

Conclusion

We introduced and evaluated an end-to-end multi-modality DLR model in predicting 5-year PFS of patients with advanced NPC using pretreatment PET/CT images. Our results demonstrated that our DLR models improved prognostic performance over conventional radiomics methods. Furthermore, our DLR model can facilitate risk group stratification for advanced NPC patients, where the DLR signature demonstrated greater accuracy for risk group stratification than the clinical signature and conventional radiomics signature. Our study suggested that multi-modality DLR derived from pretreatment PET/CT could provide complementary values to the current TNM staging and could guide the individual treatment practice in routine clinical care for NPC patients.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Ethics statement

All procedures involving human participants were reviewed and approved by Fudan University Shanghai Cancer Center (FUSCC) and Shanghai Proton and Heavy Ion Center (SPHIC) Ethic Committee and Institutional Review Boards. Informed consent was obtained from all individual participants included in the study.

Author contributions

SLS and DDF conceptualized and designed the study. BXG, MYM, LB, and JK performed the analysis. BXG and MYM interpreted the data and drafted the manuscript. LB, JK, and SLS revised the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was funded by the National Natural Science Foundation of China (grant numbers 81901778 and 81971648).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2022.899351/full#supplementary-material

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin (2021) 71(3):209–49. doi: 10.3322/caac.21660

PubMed Abstract | CrossRef Full Text | Google Scholar

2. Chua MLK, Wee JTS, Hui EP, Chan ATC. Nasopharyngeal carcinoma. Lancet (2016) 387(10022):1012–24. doi: 10.1016/s0140-6736(15)00055-0

PubMed Abstract | CrossRef Full Text | Google Scholar

3. Ribassin-Majed L, Marguet S, Lee AWM, Ng WT, Ma J, Chan ATC, et al. What is the best treatment of locally advanced nasopharyngeal carcinoma? an individual patient data network meta-analysis. J Clin Oncol (2017) 35(5):498–505. doi: 10.1200/JCO.2016.67.4119

PubMed Abstract | CrossRef Full Text | Google Scholar

4. Lee AW, Ma BB, Ng WT, Chan AT. Management of nasopharyngeal carcinoma: Current practice and future perspective. J Clin Oncol (2015) 33(29):3356–64. doi: 10.1200/JCO.2015.60.9347

PubMed Abstract | CrossRef Full Text | Google Scholar

5. Huang CI, Chen LF, Chang SL, Wu HC, Ting WC, Yang CC. Accuracy of a staging system for prognosis of 5-year survival of patients with nasopharyngeal carcinoma who underwent chemoradiotherapy. JAMA Otolaryngol Head Neck Surg (2017) 143(11):1086–91. doi: 10.1001/jamaoto.2017.1562

PubMed Abstract | CrossRef Full Text | Google Scholar

6. Tang X-R, Li Y-Q, Liang S-B, Jiang W, Liu F, Ge W-X, et al. Development and validation of a gene expression-based signature to predict distant metastasis in locoregionally advanced nasopharyngeal carcinoma: A retrospective, multicentre, cohort study. Lancet Oncol (2018) 19(3):382–93. doi: 10.1016/s1470-2045(18)30080-9

PubMed Abstract | CrossRef Full Text | Google Scholar

7. Lei Y, Li YQ, Jiang W, Hong XH, Ge WX, Zhang Y, et al. A gene-expression predictor for efficacy of induction chemotherapy in locoregionally advanced nasopharyngeal carcinoma. J Natl Cancer Inst (2020) 113(4):471–80. doi: 10.1093/jnci/djaa100

CrossRef Full Text | Google Scholar

8. Kirienko M, Cozzi L, Antunovic L, Lozza L, Fogliata A, Voulaz E, et al. Prediction of disease-free survival by the PET/CT radiomic signature in non-small cell lung cancer patients undergoing surgery. Eur J Nucl Med Mol Imaging (2018) 45(2):207–17. doi: 10.1007/s00259-017-3837-7

PubMed Abstract | CrossRef Full Text | Google Scholar

9. Cho JK, Lee GJ, Yi KI, Cho KS, Choi N, Kim JS, et al. Development and external validation of nomograms predictive of response to radiation therapy and overall survival in nasopharyngeal cancer patients. Eur J Cancer (2015) 51(10):1303–11. doi: 10.1016/j.ejca.2015.04.003

PubMed Abstract | CrossRef Full Text | Google Scholar

10. Tang LQ, Li CF, Li J, Chen WH, Chen QY, Yuan LX, et al. Establishment and validation of prognostic nomograms for endemic nasopharyngeal carcinoma. J Natl Cancer Inst (2016) 108(1):djv291. doi: 10.1093/jnci/djv291

PubMed Abstract | CrossRef Full Text | Google Scholar

11. Li J, Chen S, Peng S, Liu Y, Xing S, He X, et al. Prognostic nomogram for patients with nasopharyngeal carcinoma incorporating hematological biomarkers and clinical characteristics. Int J Biol Sci (2018) 14(5):549–56. doi: 10.7150/ijbs.24374

PubMed Abstract | CrossRef Full Text | Google Scholar

12. Gu B, Zhang J, Ma G, Song S, Shi L, Zhang Y, et al. Establishment and validation of a nomogram with intratumoral heterogeneity derived from (18)F-FDG PET/CT for predicting individual conditional risk of 5-year recurrence before initial treatment of nasopharyngeal carcinoma. BMC Cancer (2020) 20(1):37. doi: 10.1186/s12885-020-6520-5

PubMed Abstract | CrossRef Full Text | Google Scholar

13. Chen FP, Lin L, Qi ZY, Zhou GQ, Guo R, Hu J, et al. Pretreatment nomograms for local and regional recurrence after radical radiation therapy for primary nasopharyngeal carcinoma. J Cancer (2017) 8(13):2595–603. doi: 10.7150/jca.20255

PubMed Abstract | CrossRef Full Text | Google Scholar

14. Chan SC, Hsu CL, Yen TC, Ng SH, Liao CT, Wang HM. The role of 18F-FDG PET/CT metabolic tumour volume in predicting survival in patients with metastatic nasopharyngeal carcinoma. Oral Oncol (2013) 49(1):71–8. doi: 10.1016/j.oraloncology.2012.07.016

PubMed Abstract | CrossRef Full Text | Google Scholar

15. Chang KP, Tsang NM, Liao CT, Hsu CL, Chung MJ, Lo CW, et al. Prognostic significance of 18F-FDG PET parameters and plasma Epstein-Barr virus DNA load in patients with nasopharyngeal carcinoma. J Nucl Med (2012) 53(1):21–8. doi: 10.2967/jnumed.111.090696

PubMed Abstract | CrossRef Full Text | Google Scholar

16. Zhang L, Dong D, Li H, Tian J, Ouyang F, Mo X, et al. Development and validation of a magnetic resonance imaging-based model for the prediction of distant metastasis before initial treatment of nasopharyngeal carcinoma: A retrospective cohort study. EBioMedicine (2019) 40:327–35. doi: 10.1016/j.ebiom.2019.01.013

PubMed Abstract | CrossRef Full Text | Google Scholar

17. Chen YH, Chang KP, Chu SC, Yen TC, Wang LY, Chang JT, et al. Value of early evaluation of treatment response using (18)F-FDG PET/CT parameters and the Epstein-Barr virus DNA load for prediction of outcome in patients with primary nasopharyngeal carcinoma. Eur J Nucl Med Mol Imaging (2019) 46(3):650–60. doi: 10.1007/s00259-018-4172-3

PubMed Abstract | CrossRef Full Text | Google Scholar

18. Ma B, Hui EP, King A, Leung SF, Kam MK, Mo F, et al. Prospective evaluation of plasma Epstein-Barr virus DNA clearance and fluorodeoxyglucose positron emission scan in assessing early response to chemotherapy in patients with advanced or recurrent nasopharyngeal carcinoma. Br J Cancer (2018) 118(8):1051–5. doi: 10.1038/s41416-018-0026-9

PubMed Abstract | CrossRef Full Text | Google Scholar

19. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images are more than pictures, they are data. Radiology (2016) 278(2):563–77. doi: 10.1148/radiol.2015151169

PubMed Abstract | CrossRef Full Text | Google Scholar

20. Sollini M, Antunovic L, Chiti A, Kirienko M. Towards clinical application of image mining: A systematic review on artificial intelligence and radiomics. Eur J Nucl Med Mol Imaging (2019) 46(13):2656–72. doi: 10.1007/s00259-019-04372-x

PubMed Abstract | CrossRef Full Text | Google Scholar

21. Zhang B, He X, Ouyang F, Gu D, Dong Y, Zhang L, et al. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett (2017) 403:21–7. doi: 10.1016/j.canlet.2017.06.004

PubMed Abstract | CrossRef Full Text | Google Scholar

22. Lv W, Yuan Q, Wang Q, Ma J, Feng Q, Chen W, et al. Radiomics analysis of PET and CT components of PET/CT imaging integrated with clinical parameters: Application to prognosis for nasopharyngeal carcinoma. Mol Imaging Biol (2019) 21(5):954–64. doi: 10.1007/s11307-018-01304-3

PubMed Abstract | CrossRef Full Text | Google Scholar

23. Xu H, Lv W, Feng H, Du D, Yuan Q, Wang Q, et al. Subregional radiomics analysis of PET/CT imaging with intratumor partitioning: Application to prognosis for nasopharyngeal carcinoma. Mol Imaging Biol (2020) 22(5):1414–26. doi: 10.1007/s11307-019-01439-x

PubMed Abstract | CrossRef Full Text | Google Scholar

24. Yang P, Xu L, Cao Z, Wan Y, Xue Y, Jiang Y, et al. Extracting and selecting robust radiomic features from PET/MR images in nasopharyngeal carcinoma. Mol Imaging Biol (2020) 22(6):1581–91. doi: 10.1007/s11307-020-01507-7

PubMed Abstract | CrossRef Full Text | Google Scholar

25. Parmar C, Grossmann P, Rietveld D, Rietbergen MM, Lambin P, Aerts HJ. Radiomic machine-learning classifiers for prognostic biomarkers of head and neck cancer. Front Oncol (2015) 5:272. doi: 10.3389/fonc.2015.00272

PubMed Abstract | CrossRef Full Text | Google Scholar

26. Kirienko M, Cozzi L, Rossi A, Voulaz E, Antunovic L, Fogliata A, et al. Ability of FDG PET and CT radiomics features to differentiate between primary and metastatic lung lesions. Eur J Nucl Med Mol Imaging (2018) 45(10):1649–60. doi: 10.1007/s00259-018-3987-2

PubMed Abstract | CrossRef Full Text | Google Scholar

27. Lv W, Ashrafinia S, Ma J, Lu L, Rahmim A. Multi-level multi-modality fusion radiomics: Application to PET and CT imaging for prognostication of head and neck cancer. IEEE J BioMed Health Inform (2020) 24(8):2268–77. doi: 10.1109/jbhi.2019.2956354

PubMed Abstract | CrossRef Full Text | Google Scholar

28. Hosny A, Aerts HJ, Mak RH. Handcrafted versus deep learning radiomics for prediction of cancer therapy response. Lancet Digit Health (2019) 1(3):e106–e7. doi: 10.1016/s2589-7500(19)30062-7

PubMed Abstract | CrossRef Full Text | Google Scholar

29. Peng Y, Bi L, Guo Y, Feng D, Fulham M, Kim J. Deep multi-modality collaborative learning for distant metastases predication in PET-CT soft-tissue sarcoma studies. Annu Int Conf IEEE Eng Med Biol Soc (2019) 2019:3658–88. doi: 10.1109/embc.2019.8857666

PubMed Abstract | CrossRef Full Text | Google Scholar

30. Afshar P, Mohammadi A, Plataniotis KN, Oikonomou A, Benali H. From handcrafted to deep-Learning-Based cancer radiomics: Challenges and opportunities. IEEE Signal Process Mag (2019) 36(4):132–60. doi: 10.1109/MSP.2019.2900993

CrossRef Full Text | Google Scholar

31. Li Z, Wang Y, Yu J, Guo Y, Cao W. Deep learning based radiomics (DLR) and its usage in noninvasive IDH1 prediction for low grade glioma. Sci Rep (2017) 7(1):5467. doi: 10.1038/s41598-017-05848-2

PubMed Abstract | CrossRef Full Text | Google Scholar

32. Hosny A, Parmar C, Coroller TP, Grossmann P, Zeleznik R, Kumar A, et al. Deep learning for lung cancer prognostication: A retrospective multi-cohort radiomics study. PLoS Med (2018) 15(11):e1002711. doi: 10.1371/journal.pmed.1002711

PubMed Abstract | CrossRef Full Text | Google Scholar

33. Naik N, Madani A, Esteva A, Keskar NS, Press MF, Ruderman D, et al. Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains. Nat Commun (2020) 11(1):5727. doi: 10.1038/s41467-020-19334-3

PubMed Abstract | CrossRef Full Text | Google Scholar

34. Zhou L, Zhang Z, Chen YC, Zhao ZY, Yin XD, Jiang HB. A deep learning-based radiomics model for differentiating benign and malignant renal tumors. Transl Oncol (2019) 12(2):292–300. doi: 10.1016/j.tranon.2018.10.012

PubMed Abstract | CrossRef Full Text | Google Scholar

35. Peng H, Dong D, Fang MJ, Li L, Tang LL, Chen L, et al. Prognostic value of deep learning PET/CT-based radiomics: Potential role for future individual induction chemotherapy in advanced nasopharyngeal carcinoma. Clin Cancer Res (2019) 25(14):4271–9. doi: 10.1158/1078-0432.CCR-18-3065

PubMed Abstract | CrossRef Full Text | Google Scholar

36. Chuang WY, Chang SH, Yu WH, Yang CK, Yeh CJ, Ueng SH, et al. Successful identification of nasopharyngeal carcinoma in nasopharyngeal biopsies using deep learning. Cancers (Basel) (2020) 12(2):507. doi: 10.3390/cancers12020507

CrossRef Full Text | Google Scholar

37. Zhang L, Wu X, Liu J, Zhang B, Mo X, Chen Q, et al. MRI-Based deep-learning model for distant metastasis-free survival in locoregionally advanced nasopharyngeal carcinoma. J Magn Reson Imaging (2020) 53(1):167-78. doi: 10.1002/jmri.27308

CrossRef Full Text | Google Scholar

38. Jing B, Deng Y, Zhang T, Hou D, Li B, Qiang M, et al. Deep learning for risk prediction in patients with nasopharyngeal carcinoma using multi-parametric MRIs. Comput Methods Programs BioMed (2020) 197:105684. doi: 10.1016/j.cmpb.2020.105684

PubMed Abstract | CrossRef Full Text | Google Scholar

39. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: Revised RECIST guideline (Version 1.1). Eur J Cancer (2009) 45(2):228–47. doi: 10.1016/j.ejca.2008.10.026

PubMed Abstract | CrossRef Full Text | Google Scholar

40. Castello A, Russo C, Grizzi F, Qehajaj D, Lopci E. Prognostic impact of intratumoral heterogeneity based on fractal geometry analysis in operated NSCLC patients. Mol Imaging Biol (2019) 21(5):965–72. doi: 10.1007/s11307-018-1299-3

PubMed Abstract | CrossRef Full Text | Google Scholar

41. Hotta M, Minamimoto R, Gohda Y, Miwa K, Otani K, Kiyomatsu T, et al. Prognostic value of (18)F-FDG PET/CT with texture analysis in patients with rectal cancer treated by surgery. Ann Nucl Med (2021) 35(7):843–52. doi: 10.1007/s12149-021-01622-7

PubMed Abstract | CrossRef Full Text | Google Scholar

42. Liu G, Yin H, Cheng X, Wang Y, Hu Y, Liu T, et al. Intra-tumor metabolic heterogeneity of gastric cancer on (18)F-FDG PETCT indicates patient survival outcomes. Clin Exp Med (2021) 21(1):129–38. doi: 10.1007/s10238-020-00659-8

PubMed Abstract | CrossRef Full Text | Google Scholar

43. Cheng NM, Yao J, Cai J, Ye X, Zhao S, Zhao K, et al. Deep learning for fully automated prediction of overall survival in patients with oropharyngeal cancer using FDG-PET imaging. Clin Cancer Res (2021) 27(14):3948–59. doi: 10.1158/1078-0432.Ccr-20-4935

PubMed Abstract | CrossRef Full Text | Google Scholar

44. Dagogo-Jack I, Shaw AT. Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol (2018) 15(2):81–94. doi: 10.1038/nrclinonc.2017.166

PubMed Abstract | CrossRef Full Text | Google Scholar

45. Kim YI, Kim YJ, Paeng JC, Cheon GJ, Lee DS, Chung JK, et al. Heterogeneity index evaluated by slope of linear regression on (18)F-FDG PET/CT as a prognostic marker for predicting tumor recurrence in pancreatic ductal adenocarcinoma. Eur J Nucl Med Mol Imaging (2017) 44(12):1995–2003. doi: 10.1007/s00259-017-3755-8

PubMed Abstract | CrossRef Full Text | Google Scholar

46. Lee JW, Park JY, Lee HJ, Lee JJ, Moon SH, Kang SY, et al. Preoperative [(18)F]FDG PET/CT tumour heterogeneity index in patients with uterine leiomyosarcoma: A multicentre retrospective study. Eur J Nucl Med Mol Imaging (2018) 45(8):1309–16. doi: 10.1007/s00259-018-3975-6

PubMed Abstract | CrossRef Full Text | Google Scholar

47. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts H. Machine learning methods for quantitative radiomic biomarkers. Sci Rep (2015) 5:13087. doi: 10.1038/srep13087

PubMed Abstract | CrossRef Full Text | Google Scholar

48. Zhang Y, Oikonomou A, Wong A, Haider MA, Khalvati F. Radiomics-based prognosis analysis for non-small cell lung cancer. Sci Rep (2017) 7:46349. doi: 10.1038/srep46349

PubMed Abstract | CrossRef Full Text | Google Scholar

49. Du D, Feng H, Lv W, Ashrafinia S, Yuan Q, Wang Q, et al. Machine learning methods for optimal radiomics-based differentiation between recurrence and inflammation: Application to nasopharyngeal carcinoma post-therapy PET/CT images. Mol Imaging Biol (2020) 22(3):730–8. doi: 10.1007/s11307-019-01411-9

PubMed Abstract | CrossRef Full Text | Google Scholar

50. Mu W, Jiang L, Zhang J, Shi Y, Gray JE, Tunali I, et al. Non-invasive decision support for NSCLC treatment using PET/CT radiomics. Nat Commun (2020) 11(1):5228. doi: 10.1038/s41467-020-19116-x

PubMed Abstract | CrossRef Full Text | Google Scholar

51. Leung SF, Zee B, Ma BB, Hui EP, Mo F, Lai M, et al. Plasma Epstein-Barr viral deoxyribonucleic acid quantitation complements tumor-node-metastasis staging prognostication in nasopharyngeal carcinoma. J Clin Oncol (2006) 24(34):5414–8. doi: 10.1200/JCO.2006.07.7982

PubMed Abstract | CrossRef Full Text | Google Scholar

52. Pan JJ, Ng WT, Zong JF, Lee SW, Choi HC, Chan LL, et al. Prognostic nomogram for refining the prognostication of the proposed 8th edition of the AJCC/UICC staging system for nasopharyngeal cancer in the era of intensity-modulated radiotherapy. Cancer (2016) 122(21):3307–15. doi: 10.1002/cncr.30198

PubMed Abstract | CrossRef Full Text | Google Scholar

53. Xu C, Chen YP, Liu X, Li WF, Chen L, Mao YP, et al. Establishing and applying nomograms based on the 8th edition of the UICC/AJCC staging system to select patients with nasopharyngeal carcinoma who benefit from induction chemotherapy plus concurrent chemoradiotherapy. Oral Oncol (2017) 69:99–107. doi: 10.1016/j.oraloncology.2017.04.015

PubMed Abstract | CrossRef Full Text | Google Scholar

54. An C, Park YW, Ahn SS, Han K, Kim H, Lee SK. Radiomics machine learning study with a small sample size: Single random training-test set split may lead to unreliable results. PloS One (2021) 16(8):e0256152. doi: 10.1371/journal.pone.0256152

PubMed Abstract | CrossRef Full Text | Google Scholar

55. Reardon S. Rise of robot radiologists. Nature (2019) 576(7787):S54–s8. doi: 10.1038/d41586-019-03847-z

PubMed Abstract | CrossRef Full Text | Google Scholar

56. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: A systematic review and meta-analysis. Lancet Digit Health (2019) 1(6):e271–e97. doi: 10.1016/s2589-7500(19)30123-2

PubMed Abstract | CrossRef Full Text | Google Scholar

57. Dong F, She R, Cui C, Shi S, Hu X, Zeng J, et al. One step further into the blackbox: A pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound. Eur Radiol (2021) 31(7):4991–5000. doi: 10.1007/s00330-020-07561-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: radiomics, deep learning, nasopharyngeal carcinoma, PET/CT, progression-free survival

Citation: Gu B, Meng M, Bi L, Kim J, Feng DD and Song S (2022) Prediction of 5-year progression-free survival in advanced nasopharyngeal carcinoma with pretreatment PET/CT using multi-modality deep learning-based radiomics. Front. Oncol. 12:899351. doi: 10.3389/fonc.2022.899351

Received: 18 March 2022; Accepted: 28 June 2022;
Published: 29 July 2022.

Edited by:

Lizhi Liu, Sun Yat-sen University Cancer Center (SYSUCC), China

Reviewed by:

Lijun Lu, Southern Medical University, China
Xiaoping Lin, Sun Yat-sen University Cancer Center (SYSUCC), China

Copyright © 2022 Gu, Meng, Bi, Kim, Feng and Song. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shaoli Song, c2hhb2xpLXNvbmdAMTYzLmNvbQ==

^†These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.