ORIGINAL RESEARCH article
Uncontrolled Confounders May Lead to False or Overvalued Radiomics Signature: A Proof of Concept Using Survival Analysis in a Multicenter Cohort of Kidney Cancer
- 1Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States
- 2Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- 3Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, United States
Purpose: We aimed to explore potential confounders of prognostic radiomics signature predicting survival outcomes in clear cell renal cell carcinoma (ccRCC) patients and demonstrate how to control for them.
Materials and Methods: Preoperative contrast enhanced abdominal CT scan of ccRCC patients along with pathological grade/stage, gene mutation status, and survival outcomes were retrieved from The Cancer Imaging Archive (TCIA)/The Cancer Genome Atlas—Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) database, a publicly available dataset. A semi-automatic segmentation method was applied to segment ccRCC tumors, and 1,160 radiomics features were extracted from each segmented tumor on the CT images. Non-parametric principal component decomposition (PCD) and unsupervised hierarchical clustering were applied to build the radiomics signature models. The factors confounding the radiomics signature were investigated and controlled sequentially. Kaplan–Meier curves and Cox regression analyses were performed to test the association between radiomics signatures and survival outcomes.
Results: 183 patients of TCGA-KIRC cohort with available imaging, pathological, and clinical outcomes were included in this study. All 1,160 radiomics features were included in the first radiomics signature. Three additional radiomics signatures were then modelled in successive steps removing redundant radiomics features first, removing radiomics features biased by CT slice thickness second, and removing radiomics features dependent on tumor size third. The final radiomics signature model was the most parsimonious, unbiased by CT slice thickness, and independent of tumor size. This final radiomics signature stratified the cohort into radiomics phenotypes that are different by cancer-specific and recurrence-free survival; HR (95% CI) = 3.0 (1.5–5.7), p <0.05 and HR (95% CI) = 6.6 (3.1–14.1), p <0.05, respectively.
Conclusion: Radiomics signature can be confounded by multiple factors, including feature redundancy, image acquisition parameters like slice thickness, and tumor size. Attention to and proper control for these potential confounders are necessary for a reliable and clinically valuable radiomics signature.
Tumor radiomics is a rapidly evolving field aiming to link tumor imaging phenotypes to pathological and clinical outcomes in a quantitative and non-invasive way (1). Radiomics generally converts medical image data into a large-scale and mineable set of imaging features, termed radiomics features, that characterize tumor imaging phenotypes (2). Radiomics signatures, essentially constellations of radiomics features, have shown to be helpful in plenty of medical tasks (3), including predicting malignancy in lung nodules at lung cancer screening CT scans (4), predicting genomic alteration on lung cancer imaging (5), predicting tumor recurrence and patients’ survival (6), and assessing response to treatment (7, 8).
Radiomics signature models have been developed by cancer researchers but their usefulness is usually difficult to replicate at other institutions or cohorts. This is mostly due to challenges encountered in the construction of a radiomics signature models attributed to radiomics feature redundancy and image quality differences (resulting from differences in image acquisition/technical parameters or from scanner vender differences). Another challenge facing useful radiomics signature is the need to provide new information independent of already known and established prognosticators, especially tumor size which is retrieved from routine clinical imaging without the need to run radiomics image analysis (9). Feature redundancy is a challenge to replicate and consolidate radiomics signatures. Two research teams, Lu et al. (10) and Berenguer et al. (11), independently pointed out that radiomics feature sets, which usually contain several hundreds to a thousand radiomics features, could actually be summarized into dozens of representative features. The variations in image acquisition parameters, e.g. thin/thick slice thickness and sharp/smooth reconstruction kernels, etc., could produce images of different qualities (12), which might impede generalization of radiomics signatures. For instance, the performance of radiomics signature developed using CT images of thin slice thickness decreased when applied on CT images with thicker in the predicting the risk of malignancy of lung nodule (13) and cancer-related genomic mutation status (14). Finally, including tumor size measurement (unidimensional, bidimensional and three dimensional) within radiomics features creates confusion about the usefulness of the texture based radiomics; it raises the question whether the prognostic or predictive radiomics signature effect is mainly driven by tumor size which is readily available through routine medical imaging without the need for radiomic analysis. Association between radiomics signature and well-established clinical factors (e.g., tumor size or patient’s age), may lead to overvalued radiomics signatures; this is because the predictive value of radiomics signature may be exaggerated by radiomics’ association with these important clinical factors (15).
Several approaches were proposed for establishing reproduceable and generalizable radiomics studies including radiomics reporting guidelines, such as Radiomics Quality Score (RQS) (9), The Image Biomarker Standardization Initiative (IBSI) (16, 17), and recently harmonization algorithms (18), such as Combat. Although these studies have demonstrated that radiomics signature could be impacted by multiple clinical and technical factors, there is still suboptimal awareness of this confounding potential and lack of consensus on how to control for such confounding. For example, within the RQS, although imaging protocol was suggested to be reported, it does not provide a reliable statistical method to control the confounding effect from imaging protocol and does not alarm that confounding effect of imaging protocol could lead to fake result. In IBSI, its main focus is on standardizing implementation parameters for radiomics feature extraction instead of controlling confounding effect. For those harmonization algorithms, like Combat, although they showed promising potential on removing confounding effect, however, there is limitation on application on new data. For example, when new data were added, the new data have to be combined with original data and the harmonization has to be re-established on the entire combined database (19).
Therefore, in this study, we designed multiple radiomics signature models to show the effect of uncontrolled confounders which may lead to false/overvalued radiomics signature among patients with clear cell renal cell carcinoma (ccRCC). The reason for using radiomics analysis on ccRCC as an example is that, ccRCC is the predominant pathological subtype (85%) in renal adenocarcinomas which account for 90% of kidney cancers because of its variable course (20, 21). The prediction of survival outcomes for ccRCC patients still remain challenging (22–25), due to the variation in ccRCC’s growth pattern, with some tumor showing an indolent growth pattern while others exhibiting aggressive behaviors including local recurrence after resection and distant metastases (26, 27).
We aimed to conduct this study in The Cancer Genome Atlas—Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) cohort data (28) which is a publicly available dataset from multiple medical institutions in the US. The TCGA-KIRC project house the pathological, clinical, and imaging data for patients with clear cell renal cell carcinoma (ccRCC).
Compared to single-center data, the TCGA-KIRC was a multicenter data therefore was more heterogenous in terms of tumor’s pathological stage and grade as well as the image acquisition parameters. The Cancer Imaging Archive (TCIA) (29) represent a repository of clinical imaging for patients/tumors included in the TCGA cohort housing de-identified clinical imaging and provide a great resource for researchers to conduct and validate their imaging related studies.
The overview of our study design is presented in Figure 1. Our study design followed the basic radiomics phases, which included data collection, feature extraction, modeling, and outcome analysis (9, 17). The highlights of the study are the following. First, all the used data are publicly available in the TCIA, so that other researchers can easily and reliably replicate our results. Second, multiple factors that might affect radiomics analysis (9) were investigated, including feature redundancy (e.g. correlation among features), image acquisition parameters (slice thickness was the main CT parameter impacting radiomic signature), and signature’s dependency to tumor size (a previously validated prognostic factor). Four radiomics signatures were successively built throughout our study that included: 1) entire radiomics feature set, 2) radiomics feature set after dimension reduction (i.e. excluding redundant features), 3) radiomics feature set after further exclusion of radiomics features affected by CT scan slice thickness, and 4) radiomics feature set after further exclusion of tumor size related features. Third, to address the over-fitting problem, non-parametric principal component decomposition (PCD) for dimension reduction and unsupervised hierarchical clustering for pattern discovery were used. Fourth, the radiomics signatures associations with clinical outcomes (OS, PFS, and RFS) were tested using Kaplan–Meier’s analysis. Finally, supplementary analyses were conducted to illustrate how impacting factors can affect the radiomics signatures.
Figure 1 Overview of the study design. The study mainly consisted of four key parts: 1) patient data collection, 2) feature extraction and controlling, 3) Modeling, and 4) Outcome analysis. Specially, four experiments were designed to evaluate the effects on radiomics signatures built by radiomics feature sets under four different controlling levels. In addition, supplementary experiments were performed to explore the association between outcomes and the confounding factors, such as CT slice thickness and tumor size.
Patient Data Collection
The data we used were downloaded from the TCGA-KIRC project, which is publicly available in the TCIA dataset (https://wiki.cancerimagingarchive.net/display/Public/TCGA-KIRC, accessed in August 2016). It contained 267 ccRCC patients collected from multiple medical centers nationwide. The downloaded content contained both presurgical contrast-enhanced abdominal CT scans and clinical information such as pathological stage, grade, gene mutation status, and patient’s survival outcomes. For the survival outcome, patients were censored at their last follow-up date if: they were alive (overall survival-OS), alive or dead from non-ccRCC related illness (cancer-specific survival-CSS), alive without tumor recurrence (recurrence-free survival-RFS). Somatic gene mutation status of our patients was retrieved from TCGA official website and adjudicated at Memorial Sloan Kettering Cancer Center (MSKCC) based on collaborative clinical TCGA (cTCGA) consortium data (30). Genetic information included gene mutation status of VHL, PBRM1, SETD2, BAP1, and KDM5C genes. All patients’ informed consents and institutional review boards’ approvals were obtained as part of the TCGA/TCIA efforts, and all demographic and imaging data were de-identified to comply with the Health Insurance Portability and Accountability Act (HIPAA).
In our study, 183 out of 267 patients were included in the final segmentation and analysis (as shown in Figure 1) based solely on the availability of pre-surgical contrast-enhanced abdominal CT scans where ccRCC was depicted and segmented to generate radiomics features.
Tumor Segmentation and Feature Extraction
A fellowship trained abdominal radiologist performed ccRCC segmentations. The radiologist was blinded to the various study endpoints (pathology at surgery and patient outcomes). For image analysis we used a MatLab (MathWorks, Natick, Massachusetts) based dedicated software application to visualize and segment the tumor from each patient’s CT scan. This semi-automated algorithm, combining the region-based active contours and a level set approach, was used in a slice-by-slice fashion (i.e. the entire ccRCC tumor was segmented) (31). The initial step for tumor segmentation required the radiologist to manually select a region-of-interest (ROI) that roughly enclosed the tumor region on a single CT slice. Rough boundary localization of the tumor was then automatically generated by the software algorithm and propagated to consecutive slices, serving as an initial ROI for subsequent segmentations on the neighboring images. The final tumor segmentation boundaries were then verified and fine-tuned by the radiologist (32). The total volume of the tumor (created by adding all segmentations from all slices) was then utilized to generate the radiomics features of each individual tumor.
A total of 1,160 radiomics features, i.e., quantitative imaging features, were extracted from each segmented tumor via the Columbia Image Feature Extractor (CIFE) (33) which has been successfully applied in many radiomics studies (34–36). More details of the CIFE, as well as its comparison with two other open-source feature extractors, the IBEX (37) and Pyradiomics (38), can be founder at (33). Three preprocesses were performed before the feature extraction, 1) a modified soft tissue CT window was adopted with level of 50 HU and width of 175 HU, 2) voxel resolution was resampled to 0.5 × 0.5 × 0.5 mm³ and 3) image was discretized into 64 bins.
Principal Component Decomposition (PCD)
In this study, we introduced an unsupervised method, PCD (39), for feature dimension reduction. On contrary to its supervised counterpart, an unsupervised method focuses more on the intrinsic characteristics of features and is not easily affected by the overfitting problem. PCD belongs to a type of non-parametric transformation that is able to convert a set of possibly correlated features into a set of linearly uncorrelated variables. Such uncorrelated variables are called principal components and are ranked by their corresponding variance, which is their contribution to feature variability in the data. Hence, in the set of resulting principal components, the first principal component has the largest variance, and each succeeding component in turn has a smaller variance. We hypothesized that if a principal component had a larger variance, it would contain more information, so that the dimension reduction could be fulfilled by selecting a compact set of principal components that had the large variance while excluding a large number of principal components with small variances (i.e. excluding those with the least input to the data). In this study, Matlab version 9.5 was used. Principal components that summed up to 99% contribution to the total variance were selected as the new representative features.
Exclusion of Slice Thickness Related Features
The main CT scan parameter in this cohort that affected (was associated with) radiomics features values was the CT scan slice thickness (among other parameters including CT scan voltage (kVp), vender and reconstruction algorithm). Because this is a multi-institutional cohort with different imaging protocols, we aimed to remove the potentially confounding effect of slice thickness from the radiomics signature to be built. The identification of radiomics features dependent on slice thickness involved three steps: First, the patients were distributed into two groups: one with thin CT slices (i.e. ≤3 mm) and one with thick slices (i.e. >3 mm) (The selection of 3 mm as an cutoff is based on clinical practice (40)). Second, C-index (41) was calculated for each feature based on the slice thickness group labels. The C-index, in this model, provided a measure of how good a radiomics feature could fit a binary outcome (groups of slice thickness ≤3 mm and >3 mm). In other words, we attempted to measure how much of the radiomics feature was explained by the CT slice thickness. Generally, for C-index, values below 0.5 indicate poor fitting, values over 0.7 indicate good fitting, values over 0.8 indicate strong fitting, and a value of 1 means perfect fitting. In radiomics signature model #3, we excluded all radiomics features whose C-index was >0.8 in order to remove the radiomics features that are heavily influenced/biased by CT slice thickness.
Exclusion of Tumor Size Related Features
In this project, we aimed to build a radiomics signature that deliver new prognostic information, independent of tumor size which has long been known as an important prognosticator. The correlation between tumor size and the radiomics features were measured by Pearson’s linear correlation coefficients (also called Pearson’s R). In our study, tumor size was obtained by measuring the longest diameter across the tumor’s cross-sectional region, as shown in Figure 1. The features that have strong positive or negative correlation with tumor size (Pearson’s R >0.7 or <−0.7, p < 0.05) were excluded from radiomics signature model #4 as tumor size dependent features.
Unsupervised hierarchical clustering was used to identify the clusters of ccRCC tumors based on input radiomics features. We identified two major clusters of ccRCC tumors in this study. During the clustering, the distance between two clusters in the feature space was measured by ‘cosine’ distance. The unsupervised clustering method was intrinsically an iteration process based on the similarity among radiomics features. At each iteration of the clustering, two of the most similar clusters were combined into one cluster and then acted as one cluster for the next iteration. A cluster node in the clustering tree could be one individual radiomics features or several radiomics features. A detailed description of unsupervised hierarchical clustering can be found in our previous publication (10).
Association Between Confounding Factors and Survival Outcomes
In this study, a direct association between confounding factors and survival outcomes was also studied. The two confounding factors were CT slice thickness and tumor size. The information of CT slice thickness was retrieved from DICOM attributes tagged as (0018,0050). Patients were assigned to two subgroups with slice thickness ≤3 mm (74 patients) and >3 mm (109 patients). With respect to tumor size, patients were assigned to two subgroups with tumor size less than or equal to the median size value (60 patients) and greater than the median value (123 patients).
Unsupervised clustering and principle component analyses were used to stratify the cohort into two groups/phenotypes. The association of this radiomics clustering/phenotypic binary classification was tested primarily with survival outcomes (OS, CSS and RFS) using Kaplan–Meier curves and Cox-regression models. Secondarily, the radiomics cluster’s association with other patient’s and tumor’s characteristics (including demographic characteristics (age, gender and race), pathological characteristics (tumor grade), American Joint Committee on Cancer tumor, node, metastasis staging (AJCC TNM staging), and genetic characteristics (VHL, PBRM1, SETD2, BAP1, and KDM5C) using Chi-Square and T-test when appropriate. P-values smaller than 0.05 indicated statistical significance. All statistical analyses were performed using Matlab 2020a.
Patient Characteristics and CT Examination
A total of 183 patients were included in our study according to the inclusion and the exclusion criteria. The patient characteristics are presented in Table 1. Patients’ average age was 60 years (± standard deviation (std) of 12). Majority of patients were men (66%) and white (96%). The mean ± std of tumor size was 6.4 ± 3.2 cm. The minimum and maximum of tumor size were 1.5 and 15.5 cm, respectively. The cohort was close to be evenly split between early stage (52% had stage I) and advanced stage (48% has stages II–IV). The CT scan characteristics are presented in Table 2. Most of the patients were scanned by the same vender CT scanner (GE Medical System, 85%) but with different slice thicknesses; 60% had thin CT slices scans while 40% had thick CT slices scans. A more detailed CT characteristics were provided in Supplement S2.
Ablation Analysis Based on Survival Outcome
In experiment #1, all the features were used to create a radiomics signature without any exclusion. In this situation, the radiomics signature was not associated with any of the survival outcomes (OS, CSS and RFS, p-value >0.05). In experiment #2, the redundant radiomics features were excluded leaving in only the redundancy-controlled radiomics set yielding a radiomics signature that was significantly associated with OS (HR (95% CI) = 1.8 (1.0–3.3), p-value <0.05), CSS (HR (95% CI) = 2.0 (1.0–4.1), p-value <0.05), and RFS (HR (95% CI) = 2.6 (1.1–6.2), p-value <0.05). The radiomics signature in experiment #2 included striking fewer radiomics features (89/1160 = 7.6%). However, a correlation analysis showed that the second radiomics signature had a high correlation with CT scan slice thickness (p-value <0.001) which is an image acquisition parameter that should not be associated with clinical outcomes. This may be attributed to selection bias inherent in retrospective multi-institutional cohort studies when technical parameters are different between institutions, along with inter-institutional differences in tumor stage, grade, or aggressiveness tumors. Thus, in experiment #3, we used a radiomics feature set used in experiment #2 but after further exclusion of CT slice thickness dependent radiomics features. The third radiomics signature continued to be significantly associated with OS, CSS, and RFS (all p-value <0.05) with even higher magnitude of association; HR (95% CI) increasing to 2.6 (1.5–4.4), 13.7 (7.1–26.5), and 8.0 (3.8–17.0), respectively. The high HRs on predicting patients’ outcomes indicated that the third radiomics signature was a powerful prognostic signature, especially on predicting CSS. However, experiment #3 radiomics signature continued to be associated with tumor size which is an information readily available through routine clinical imaging without the need for complex radiomics analysis. In order to render this radiomics signature independent of ccRCC tumor size, in experiment #4 we further excluded radiomics features (from the set used in experiment#3 model) that are highly correlating with tumor size (C-Index = 0.877) to yield a tumor-size independent radiomics signature. Final results showed that the well-controlled radiomics signature from experiment #4 was significantly associated with CSS (HR (95% CI) = 3.0 (1.5–5.7), p-value <0.05) and RFS (HR (95% CI) = 6.6 (3.1–14.1), p-value <0.05) for ccRCC patients, but was not significantly associated with OS (p = 0.06).
In addition, the associations between confounding factors with survival outcomes were also studied. As shown in Supplement S1 Figure 1, there was a significant association between CT slice thickness and patient’s OS (HR (95% CI) = 2.0 (1.2–3.5), p-value <0.01), CSS (HR (95% CI) = 2.0 (1.0–4.0), p-value <0.01) and RFS (HR (95% CI) = 3.6 (1.6–8.0), p-value <0.01), respectively. In Supplement S1 Figure 2, there was expected significant association between tumor size and patient’s OS (HR (95% CI) = 2.9 (1.2–5.6), p-value <0.01), CSS (HR (95% CI) = 6.0 (3.0–12.2), p-value <0.01) and RFS (HR (95% CI) = 5.0 (2.3–11.4), p-value <0.01). These two association studies revealed that the patient data in the TCGA-KIRC project were indeed factor-biased data within which real imaging phenotypical signals were suppressed.
The radiomics feature set used in experiment #4 was our final set to be implemented in constructing final most parasomnias, scanning parameter-independent, and tumor size-independent radiomics signature model classifying the study cohort into two major phenotypes; referred hereafter as radiomics phenotype I (RAD1) and radiomics phenotype II (RAD2). Demographics, pathological characteristics, clinical parameters, and gene mutation status are presented in Table 4. There was no statistically significant difference between RAD1 and RAD2 clusters, except in regard the AJCC staging; almost three quarters of patients with RAD1 radiomic signature had stage I tumor while less than half of patients in RAD2 cluster had stage I disease (72% vs 40%, p-value <0.01). No significant difference between the two radiomics phenotypes in terms of gene mutation status as can be seen in Table 4.
Table 4 Demographic, clinical, pathological, and genetic characteristics of the final radiomics phenotypes.
RAD1 radiomics phenotype included 71 patients and RAD2 phenotype included 112 patients. RAD1 cluster was reflective of the less aggressive ccRCC, in comparison to RAD2 cluster, consistently associated with lower AJCC cancer stage and with better cancer-specific and recurrence-free survival as reflected in Figure 2. In terms of overall survival, RAD1 tended to have better survival also but the association was not statistically significant. The most striking divergence of survival is noticed in the recurrence-free survival; Cox-regression hazard ratio of RAD2 vs. RAD1 was HR (95% CI) = 6.6 (3.1–14.1), p-value <0.05.
Figure 2 (A–C) are Kaplan–Meier curves displaying the association between radiomics phenotypes and patients’ OS, CSS and RFS, respectively. RAD1 and RAD2 represent radiomics phenotypes I and II.
In this study, we demonstrated a proof of concept to remove redundant, CT slice thickness-dependent (biased), and tumor-size dependent radiomics features towards building a concise radiomics signature in patients with ccRCC. Furthermore, we demonstrated that the final most parsimonious radiomics signature model stratified this multi-institutional cohort into two major radiomics phenotypes that are significantly different by AJCC staging, CSS, and RFS. However, the radiomics signature model was not associated with genetic mutation status nor with any other available patient or tumor characteristic. In this study we have demonstrated how radiomics models can be negatively impacted by confounders (like CT slice thickness) and tumor size (a previously proven prognosticator) leading to false/overvalued signatures. As shown in Figure 1, the addition of the controlling procedure after the feature extraction is a supplement to the existing standard radiomics guideline (9), and should be helpful for medical image analysis whose data were usually relatively small and with high heterogeneity of imaging protocol.
Our analysis framework followed an ablation study paradigm, i.e., investigating factors sequentially, from feature redundancy to imaging parameters (CT slice thickness) to tumor size in order to evaluate the effect of each factor to the final radiomics signature. For the feature redundancy, we have shown that there was a large redundancy existing within the radiomics feature set. A raw feature set containing 1,160 features could be efficiently represented by only 89 dimensions of principal components, which represent 7.6% of the original radiomics features. The reason for the existence of redundancy is because feature extractors, including other widely used extractors (20) (e.g., PyRadiomics (38) and IBEX (37)), were based on a number of basic feature extraction algorithms (e.g., Wavelet features (42), Gray-Level Co-occurrence Matrix (GLCM) features (43), etc.), which contained multiple tunable parameters aiming to extract the features in multi-scales for the sake of not missing any valuable image patterns (33). Thus, it is highly recommended that the removal of feature redundancy be the first step when initializing a radiomics analysis. It’s also notable that unsupervised machine-learning methods, such as non-parametric principal component decomposition (39) and unsupervised hierarchical clustering (10), were recommended for redundancy removal and radiomics signature building. Compared to the supervised methods, the unsupervised machine-learning methods generally have a lower risk on overfitting the problem, because little or no prior knowledge is needed for the learning parameters.
For image acquisition parameters, our study showed that ccRCC patients imaged with different slice thicknesses were associated with significantly different survival outcomes, which is not biologically plausible and certainly is attributed to inherent bias in retrospective studies. The patients with thicker CT scan slices thickness were of more aggressive tumors when compared to patients with thinner CT scan slices (See Supplement S1 Figure 1(A), thick vs. thin slice thickness group was of HR of recurrence (95% CI) = 3.6 (1.6–8.0), p <0.01). We believe this apparent association is because institutions that contributed ccRCC to the TCGA and TCIA with thicker CT slices happened to be contributing ccRCC tumors with more aggressive behavior (i.e. larger tumors with higher stage of disease). If the effect of slice thickness on radiomics features is not attended to and controlled for, we would have committed an error by producing a radiomics signature that is dependent on the slice thickness of the CT scan and therefore completely false.
ccRCC tumor size has long been identified as an important prognosticator and it is easily measured on routine abdominal imaging without the need for advanced processing or radiomics. Our study demonstrated that tumor size-dependent radiomics features may exaggerate the clinical utility of radiomics and may mask the real/tumor size-independent radiomics clinical utility. Size independent radiomics features are reflective of tumor textural heterogeneity will ultimately provide additional prognostic information separate from tumor size measurement which is routinely implemented clinically (e.g., clinical staging for kidney cancer (44), RECIST 1.1 (45)). In this study, we introduced a method to remove the effect of tumor size from the radiomics signature models built to yield a size-independent radiomics signature with more valuable input into the tumor internal environment.
In summary, there were two main findings in our work: in retrospective multi-institutional imaging data with heterogenous techniques, image acquisition parameters could lead to false radiomics signatures while size-dependent radiomics may yield overvalued clinical utility of radiomics signature. Unfortunately, there is still suboptimal awareness of these two pitfalls in radiomics literature, although some researchers have tried to establish quality assurance criteria for radiomics study (46).
For image acquisition parameters, most of previous studies on the effect of image acquisition parameters focused on studying feature reproducibility and model generalization. Such studies could only result in conclusions that the heterogeneity in image acquisition parameters decreased the reproducibility and performance of radiomics signatures. Our study, for the first time, showed that the effect of image acquisition parameters could be severe enough to lead to false biologically unplausible association. Attention to and control for imaging acquisition parameters that are influential of radiomics is of crucial importance in retrospective studies, especially multi-institutional ones.
For tumor size, the risk of overvalued radiomics signature induced by tumor-size-effect was mainly caused by a large portion of radiomics features that were basically ‘mixture’ features, which characterized tumor size and intratumor imaging pattern simultaneously, such as Gray-Level Run Length Matrix (GLRLM) (47) and Gray-Level Size Zone Matrix (GLSZM) (48). The contribution weights between tumor size and image patterns to the final feature value were variable depending on the specific tumor phenotypes. Unfortunately, in most of radiomics feature extraction packages, such as PyRadiomics, IBEX, etc., the effect of size on ‘mixture’ features was not well studied (33). Thus, the size effect in ‘mixture’ feature may be easily overlooked and may lead to an over-valued size-dependent radiomics signature. The prognostic information from such a signature will overlap with the prognostic information already retrieved by measuring tumor size. For example, Mattea et al. (15) tested a radiomics signature previously shown to have predictive values on survival outcome among head and neck cancer patients, but eventually, this radiomic signature was found to be a surrogate for tumor size.
The limitations of our work include the following points. First, the number of patients in the TCGA-KIRC project was relatively small and there were no further external data to validate our final radiomics signature. Second, except for slice thickness, other image acquisition parameters, such as scanning mode and reconstruction kernel, were not studied because most of the CT scanning (85%, see Table 1) were performed on scanners manufactured by the same vender with similar smooth reconstruction kernels. Fortunately, since the TCIA is a rapidly developing open research community supported by National Cancer Institute, it is promising that more and more projects such as the TCGA-KIRC will be created/improved and available in near future. Third, using PCD for dimension reduction could lead to losing of features’ original physical quantification and make the modeling a black box which is difficult for interpret. Finally, in this study, we only used single-modal imaging and unsupervised machine learning algorithms for modeling, however, as the data increase in future, we could investigate multimodal imaging and supervised machine learning algorithms which have shown promising results in recent years (49–53).
In this paper, we demonstrated that a radiomics signature could be negatively impacted by multiple factors, including radiomics redundancy from large-scale feature extraction, biases from image acquisition parameters, and underlying dependency to established clinical prognosticator (tumor size). Proper attention to and control for these pitfalls are needed to guarantee a reliable, reproducible, and clinically relevant radiomics signature. Our work used the prediction of survival outcomes in ccRCC patients as an example. In our study, the final most concise, slice thickness independent, and tumor size-independent radiomics signature stratified multi-institutional retrospective cohort of ccRCC into two distinct phenotypes that are significantly different in tumor stage, CSS and RFS.
Data Availability Statement
The data we used were downloaded from the TCGA-KIRC project, which is publicly available in the TCIA dataset (https://wiki.cancerimagingarchive.net/display/Public/TCGA-KIRC).
The studies involving human participants were reviewed and approved by dataset downloaded from The Cancer Genome Atlas Program supported by National Cancer Institute. The patients/participants provided their written informed consent to participate in this study.
LLu: Conceptualization, methodology, formal analysis, and writing (original draft). FA: Conceptualization, methodology, data curation, and writing (extensive review and editing). OA: Resource, data curation and funding acquisition. LLuk: Data curation. XG: Software. HY: Data preparation. JY: Writing (review and editing). AH: Data curation. LS: Conceptualization, discussion, and funding acquisition. BZ: Conceptualization, discussion, and funding acquisition. All authors contributed to the article and approved the submitted version.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2021.638185/full#supplementary-material
1. Aerts HJ, Velazquez ER, Leijenaar RT, Parmar C, Grossmann P, Carvalho S, et al. Decoding Tumour Phenotype by Noninvasive Imaging Using a Quantitative Radiomics Approach. Nat Commun (2014) 5:4006. doi: 10.1038/ncomms5644
5. Rios Velazquez E, Parmar C, Liu Y, Coroller TP, Cruz G, Stringfield O, et al. Somatic Mutations Drive Distinct Imaging Phenotypes in Lung Cancer. Cancer Res (2017) 77(14):3922–30. doi: 10.1158/0008-5472.CAN-17-0122
6. Oikonomou A, Khalvati F, Tyrrell PN, Haider MA, Tarique U, Jimenez-Juan L, et al. Radiomics Analysis At PET/CT Contributes to Prognosis of Recurrence and Survival in Lung Cancer Treated With Stereotactic Body Radiotherapy. Sci Rep (2018) 8(1):4003. doi: 10.1038/s41598-018-22357-y
7. Sun R, Limkin EJ, Vakalopoulou M, Dercle L, Champiat S, Han SR, et al. A Radiomics Approach to Assess Tumour-Infiltrating CD8 Cells and Response to anti-PD-1 or anti-PD-L1 Immunotherapy: An Imaging Biomarker, Retrospective Multicohort Study. Lancet Oncol (2018) 19(9):1180–91. doi: 10.1016/S1470-2045(18)30413-3
8. He B, Dong D, She Y, Zhou C, Fang M, Zhu Y, et al. Predicting Response to Immunotherapy in Advanced non-Small-Cell Lung Cancer Using Tumor Mutational Burden Radiomic Biomarker. J Immunother Cancer (2020) 8(2):e000550. doi: 10.1136/jitc-2020-000550
9. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: The Bridge Between Medical Imaging and Personalized Medicine. Nat Rev Clin Oncol (2017) 14(12):749–62. doi: 10.1038/nrclinonc.2017.141
11. Berenguer R, Pastor-Juan MDR, Canales-Vazquez J, Castro-Garcia M, Villas MV, Mansilla Legorburo F, et al. Radiomics of CT Features May be Nonreproducible and Redundant: Influence of CT Acquisition Parameters. Radiology (2018) 288(2):407–15. doi: 10.1148/radiol.2018172361
13. He L, Huang Y, Ma Z, Liang C, Liang C, Liu Z. Effects of Contrast-Enhancement, Reconstruction Slice Thickness and Convolution Kernel on the Diagnostic Performance of Radiomics Signature in Solitary Pulmonary Nodule. Sci Rep (2016) 6:34921. doi: 10.1038/srep34921
14. Li Y, Lu L, Xiao M, Dercle L, Huang Y, Zhang Z, et al. Ct Slice Thickness and Convolution Kernel Affect Performance of a Radiomic Model for Predicting Egfr Status in Non-Small Cell Lung Cancer: A Preliminary Study. Sci Rep (2018) 8(1):17913. doi: 10.1038/s41598-018-36421-0
15. Welch ML, McIntosh C, Haibe-Kains B, Milosevic MF, Wee L, Dekker A, et al. Vulnerabilities of Radiomic Signature Development: The Need for Safeguards. Radiother Oncol (2019) 130:2–9. doi: 10.1016/j.radonc.2018.10.027
16. Zwanenburg A, Vallieres M, Abdalah MA, Aerts H, Andrearczyk V, Apte A, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-Based Phenotyping. Radiology (2020) 295(2):328–38. doi: 10.1148/radiol.2020191145
17. Vallieres M, Zwanenburg A, Badic B, Cheze Le Rest C, Visvikis D, Hatt M. Responsible Radiomics Research for Faster Clinical Translation. J Nucl Med (2018) 59(2):189–93. doi: 10.2967/jnumed.117.200501
22. Nazari M, Shiri I, Hajianfar G, Oveisi N, Abdollahi H, Deevband MR, et al. Noninvasive Fuhrman Grading of Clear Cell Renal Cell Carcinoma Using Computed Tomography Radiomic Features and Machine Learning. Radiol Med (2020) 125(8):754–62. doi: 10.1007/s11547-020-01169-z
23. Yang R, Wu J, Sun L, Lai S, Xu Y, Liu X, et al. Radiomics of Small Renal Masses on Multiphasic CT: Accuracy of Machine Learning-Based Classification Models for the Differentiation of Renal Cell Carcinoma and Angiomyolipoma Without Visible Fat. Eur Radiol (2020) 30(2):1254–63. doi: 10.1007/s00330-019-06384-5
24. Ursprung S, Beer L, Bruining A, Woitek R, Stewart GD, Gallagher FA, et al. Radiomics of Computed Tomography and Magnetic Resonance Imaging in Renal Cell Carcinoma-a Systematic Review and Meta-Analysis. Eur Radiol (2020) 30(6):3558–66. doi: 10.1007/s00330-020-06666-3
25. Nazari M, Shiri I, Zaidi H. Radiomics-Based Machine Learning Model to Predict Risk of Death Within 5-Years in Clear Cell Renal Cell Carcinoma Patients. Comput Biol Med (2021) 129:104135. doi: 10.1016/j.compbiomed.2020.104135
26. Gulati S, Martinez P, Joshi T, Birkbak NJ, Santos CR, Rowan AJ, et al. Systematic Evaluation of the Prognostic Impact and Intratumour Heterogeneity of Clear Cell Renal Cell Carcinoma Biomarkers. Eur Urol (2014) 66(5):936–48. doi: 10.1016/j.eururo.2014.06.053
27. Tang PA, Vickers MM, Heng DY. Clinical and Molecular Prognostic Factors in Renal Cell Carcinoma: What We Know So Far. Hematol Oncol Clin North Am (2011) 25(4):871–91. doi: 10.1016/j.hoc.2011.04.003
28. Akin O, Elnajjar P, Heller M, Jarosz R, Erickson B, Kirk S, et al. Radiology Data From the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma [TCGA-KIRC] Collection. The Cancer Imaging Archive (2016).
29. Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, et al. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. J Digit Imaging (2013) 26(6):1045–57. doi: 10.1007/s10278-013-9622-7
30. Hakimi AA, Ostrovnaya I, Reva B, Schultz N, Chen YB, Gonen M, et al. Adverse Outcomes in Clear Cell Renal Cell Carcinoma With Mutations of 3p21 Epigenetic Regulators BAP1 and SETD2: A Report by MSKCC and the KIRC TCGA Research Network. Clin Cancer Res (2013) 19(12):3259–67. doi: 10.1158/1078-0432.CCR-12-3886
32. Ahmed FS, Akin O, Shaish H, Luk L, Guo X, Yang H, et al. Nonenhancing Component of Clear Cell Renal Cell Carcinoma on Computed Tomography Correlates With Tumor Necrosis and Stage and Serves as a Size-Independent Prognostic Biomarker. J Comput Assist Tomogr (2019) 43(4):628–33. doi: 10.1097/RCT.0000000000000877
33. Lu L, Sun SH, Yang H, E L, Guo P, Schwartz LH, et al. Radiomics Prediction of EGFR Status in Lung Cancer-Our Experience in Using Multiple Feature Extractors and The Cancer Imaging Archive Data. Tomography (2020) 6(2):223–30. doi: 10.18383/j.tom.2020.00017
34. Dercle L, Lu L, Schwartz LH, Qian M, Tejpar S, Eggleton P, et al. Radiomics Response Signature for Identification of Metastatic Colorectal Cancer Sensitive to Therapies Targeting EGFR Pathway. J Natl Cancer Inst (2020) 112(9):902–12. doi: 10.1093/jnci/djaa017
35. Dercle L, Fronheiser M, Lu L, Du S, Hayes W, Leung DK, et al. Identification of Non–Small Cell Lung Cancer Sensitive to Systemic Cancer Therapies Using Radiomics. Clin Cancer Res (2020) 26(9):2151–62. doi: 10.1158/1078-0432.CCR-19-2942
36. Lu L, Wang D, Wang L, E L, Guo P, Li Z, et al. A Quantitative Imaging Biomarker for Predicting Disease-Free-Survival-Associated Histologic Subgroups in Lung Adenocarcinoma. Eur Radiol (2020) 30(7):3614–23. doi: 10.1007/s00330-020-06663-6
37. Zhang L, Fried DV, Fave XJ, Hunter LA, Yang J, Court LE. IBEX: An Open Infrastructure Software Platform to Facilitate Collaborative Work in Radiomics. Med Phys (2015) 42(3):1341–53. doi: 10.1118/1.4908210
38. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res (2017) 77(21):e104–7. doi: 10.1158/0008-5472.CAN-17-0339
41. Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei L. On the C-statistics for Evaluating Overall Adequacy of Risk Prediction Procedures With Censored Survival Data. Stat Med (2011) 30(10):1105–17. doi: 10.1002/sim.4154
45. Schwartz LH, Litiere S, de Vries E, Ford R, Gwyther S, Mandrekar S, et al. Recist 1.1-Update and Clarification: From the RECIST Committee. Eur J Cancer (2016) 62:132–7. doi: 10.1016/j.ejca.2016.03.081
46. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: Extracting More Information From Medical Images Using Advanced Feature Analysis. Eur J Cancer (2012) 48(4):441–6. doi: 10.1016/j.ejca.2011.11.036
48. Thibault G, Fertil B, Navarro C, Pereira S, Cau P, Levy N, et al. Shape and Texture Indexes Application to Cell Nuclei Classification. Int J Pattern Recognit Artif Intell (2013) 27(01):1357002. doi: 10.1142/S0218001413570024
49. Shiri I, Maleki H, Hajianfar G, Abdollahi H, Ashrafinia S, Hatt M, et al. Next-Generation Radiogenomics Sequencing for Prediction of EGFR and KRAS Mutation Status in NSCLC Patients Using Multimodal Imaging and Machine Learning Algorithms. Mol Imaging Biol (2020) 22(4):1132–48. doi: 10.1007/s11307-020-01487-8
50. Parmar C, Grossmann P, Rietveld D, Rietbergen MM, Lambin P, Aerts HJ. Radiomic Machine-Learning Classifiers for Prognostic Biomarkers of Head and Neck Cancer. Front Oncol (2015) 5:272. doi: 10.3389/fonc.2015.00272
51. Hajianfar G, Shiri I, Maleki H, Oveisi N, Haghparast A, Abdollahi H, et al. Noninvasive O6 Methylguanine-Dna Methyltransferase Status Prediction in Glioblastoma Multiforme Cancer Using Magnetic Resonance Imaging Radiomics Features: Univariate and Multivariate Radiogenomics Analysis. World Neurosurg (2019) 132:e140–61. doi: 10.1016/j.wneu.2019.08.232
52. Leger S, Zwanenburg A, Pilz K, Lohaus F, Linge A, Zophel K, et al. A Comparative Study of Machine Learning Methods for Time-to-Event Survival Data for Radiomics Risk Modelling. Sci Rep (2017) 7(1):13206. doi: 10.1038/s41598-017-13448-3
Keywords: radiomics, quality control, machine learning, TCGA, The Cancer Imaging Archive (TCIA), clear cell renal cell cancer
Citation: Lu L, Ahmed FS, Akin O, Luk L, Guo X, Yang H, Yoon J, Hakimi AA, Schwartz LH and Zhao B (2021) Uncontrolled Confounders May Lead to False or Overvalued Radiomics Signature: A Proof of Concept Using Survival Analysis in a Multicenter Cohort of Kidney Cancer. Front. Oncol. 11:638185. doi: 10.3389/fonc.2021.638185
Received: 05 December 2020; Accepted: 06 April 2021;
Published: 27 May 2021.
Edited by:Fatima-Zohra Mokrane, Université Toulouse III Paul Sabatier, France
Reviewed by:Isaac Shiri, Geneva University Hospitals (HUG), Switzerland
Cyril Jaudet, Centre François Baclesse, France
Copyright © 2021 Lu, Ahmed, Akin, Luk, Guo, Yang, Yoon, Hakimi, Schwartz and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Firas S. Ahmed, firstname.lastname@example.org
†These authors have contributed equally to this work