An Exploratory Study on the Stable Radiomics Features of Metastatic Small Pulmonary Nodules in Colorectal Cancer Patients

Objectives To identify radiomics features that remain relatively invariant, as essential characteristics, during the growth of metastatic pulmonary nodules 1 cm or smaller in diameter from colorectal cancer (CRC). Methods Three hundred and twenty lung nodules were enrolled in this study (200 CRC metastatic nodules in the training cohort, 60 benign nodules in verification cohort 1, and 60 CRC metastatic nodules in verification cohort 2). All nodules were divided into four groups according to maximum diameter: 0 to 0.25 cm, 0.26 to 0.50 cm, 0.51 to 0.75 cm, and 0.76 to 1.0 cm. The nodules were manually outlined on computed tomography (CT) images with ITK-SNAP software, and 1,724 radiomics features were extracted. The Kruskal-Wallis test was used to compare the four size groups, cross-validation was used to verify the results, and the Spearman rank correlation coefficient was calculated to evaluate the correlation between features. Results In the training cohort, 90 features remained stable during the growth of metastatic nodules. In verification cohort 1, 293 features remained stable during the growth of benign nodules. In verification cohort 2, 118 features remained stable during the growth of metastatic nodules. Twenty features remained stable in metastatic nodules (training cohort and verification cohort 2) but not in benign nodules (verification cohort 1). In cross-validation (n = 100), 11 features remained stable in more than 90 repetitions. Conclusions This study suggests that a small number of radiomics features of CRC metastatic pulmonary nodules remain relatively stable as the nodules grow from small to large, whereas they do not remain stable in benign nodules. These stable features may reflect essential characteristics of metastatic nodules and could help distinguish metastatic pulmonary nodules from benign nodules.


INTRODUCTION
Colorectal cancer (CRC) is the third most common malignant tumor in the world (1), and about 5% to 15% of CRC patients develop lung metastases (2). Surgical resection of lung metastases is an optimal treatment for long-term survival of CRC patients (3); according to previous studies, the 5-year survival rate after surgery is 21% to 68% (4). Thus, a definitive diagnosis of lung metastasis is essential for clinical decision making and for improving prognosis. Because chest computed tomography (CT) is recommended for detecting lung nodules during primary staging and postoperative surveillance (5), an increasing number of CRC patients are found to have indeterminate pulmonary nodules (IPNs), and approximately 30% of these are eventually proven to be metastatic during follow-up (6). Most of these nodules are <1 cm in diameter, one or two in number, and lack typical signs of malignancy. Further examinations such as positron emission tomography/CT (PET/CT) can reflect the glucose metabolism of nodules, but small nodules (<1 cm) sometimes show no elevation of the standardized uptake value (SUV) (7). Long-term follow-up is the most commonly used strategy (8). However, follow-up brings additional costs to the patient and may delay the optimal treatment window. Therefore, the diagnosis of IPNs is currently a challenging dilemma for radiologists, which complicates clinical staging as well as subsequent treatment.
Recently, "radiomics" has attracted the attention of doctors as a new medical imaging post-processing technology that is noninvasive and requires no additional examination. Radiomics refers to the high-throughput extraction of large numbers of quantitative features from medical images, followed by data analysis, model building, and disease prediction, to assist doctors in making accurate diagnoses (9). Texture features have been reported to be closely related to the pathological type and grade of tumors (10,11). Nevertheless, little data has been published so far on the radiomics-based diagnosis of IPNs in CRC patients. TingDan Hu et al. were the first to study IPNs in CRC patients with radiomics and achieved promising results by constructing prediction models (12,13).
However, to our knowledge, no study has extracted stable radiomics features from CRC metastatic pulmonary nodules. In this study, we hypothesize that some radiomics features reflect the essential characteristics of metastatic nodules and remain relatively unchanged during their growth and development, just as genes reflect the essential characteristics of an organism. Hence, this study aims to screen out the stable features of CRC metastatic pulmonary nodules with a diameter of ≤1 cm.

Patients
This retrospective study was approved by the ethics committee of the First Affiliated Hospital of Guangzhou Medical University, and the requirement for informed consent was waived. The training cohort and verification cohort 2 comprised 260 small pulmonary nodules from 34 CRC patients with lung metastasis (13 female/21 male; mean age 60.1 ± 12.5 years; range 35-82 years), retrieved from our database for the period January 2012 to December 2019. The inclusion criteria were: 1) metastatic pulmonary nodules confirmed by histopathology, or multiple metastases identified on thoracic CT; 2) nodule diameter ≤1 cm, including nodules detected at the same time as the primary tumor and nodules detected in subsequent examinations. The exclusion criteria were: 1) image quality too poor for quantitative analysis; 2) CRC patients with other malignant tumors.
In addition, verification cohort 1 comprised 60 small pulmonary nodules from 23 patients with benign pulmonary nodules (17 males/6 females; mean age 55.2 ± 13.7 years; range 26-76 years), retrieved from our database for the same period (January 2012 to December 2019). The inclusion criteria were: 1) benign pulmonary nodules confirmed by pathology or by clinical follow-up for 2 years; 2) maximum nodule diameter ≤1 cm. The exclusion criterion was image quality too poor for quantitative analysis.
A total of 320 nodules were selected for radiomics analysis: the training cohort (200 metastatic nodules), verification cohort 1 (60 benign nodules), and verification cohort 2 (60 metastatic nodules). Within each of the three cohorts, the nodules were divided into four groups according to maximum diameter: group A (0-0.25 cm), group B (0.26-0.50 cm), group C (0.51-0.75 cm), and group D (0.76-1.00 cm). Figure 1 shows an example of a nodule increasing in volume on CT.
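The size grouping amounts to a small lookup on the maximum diameter. This is a minimal sketch: treating the group boundaries as half-open intervals (so that, for example, a 0.255 cm nodule falls into group B) is our assumption, since the text gives boundaries only to two decimal places, and the function name `size_group` is hypothetical.

```python
def size_group(max_diameter_cm: float) -> str:
    """Map a nodule's maximum diameter (cm) to size groups A-D (hypothetical helper)."""
    if not 0 < max_diameter_cm <= 1.0:
        raise ValueError("only nodules with a maximum diameter of at most 1 cm are analysed")
    # Assumed half-open intervals: (0, 0.25], (0.25, 0.50], (0.50, 0.75], (0.75, 1.00]
    for upper, label in ((0.25, "A"), (0.50, "B"), (0.75, "C"), (1.00, "D")):
        if max_diameter_cm <= upper:
            return label
    raise AssertionError("unreachable: every diameter in (0, 1] has a group")
```

Applied to every nodule of a cohort, `collections.Counter(size_group(d) for d in diameters)` would give the number of nodules per group.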

CT Image Acquisition
All patients underwent contrast-enhanced CT on a multidetector-row CT scanner (Siemens SOMATOM Definition AS+, 128-slice, Germany) with the following parameters: tube voltage, 120 kV; tube current, 150-200 mA; matrix, 512 × 512; acquisition slice thickness, 0.60 mm; and reconstruction slice thickness, 2.0 mm. A high-pressure auto-injector was used to inject the non-ionic iodinated contrast agent (Omnipaque 300, GE Healthcare, Shanghai, China) through the antecubital vein at a dose of 1.0 to 1.5 ml/kg and a flow rate of 3 to 4 ml/s. Arterial and venous phase scans were routinely performed. The images and dose reports were transferred to the hospital's picture archiving and communication system (PACS). The venous phase images were used for analysis.
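As a worked example of the injection protocol, the sketch below turns the stated dose (1.0-1.5 ml/kg) and flow rate (3-4 ml/s) into a contrast volume range and an approximate injection duration for a given body weight. The 60 kg patient and the function name `contrast_plan` are hypothetical illustrations, not values from the study.

```python
def contrast_plan(weight_kg, dose_ml_per_kg=(1.0, 1.5), rate_ml_per_s=(3.0, 4.0)):
    """Return ((min, max) contrast volume in ml, (min, max) injection time in s)."""
    volume = (weight_kg * dose_ml_per_kg[0], weight_kg * dose_ml_per_kg[1])
    # Shortest time: smallest volume at the fastest rate; longest: largest volume at the slowest rate
    duration = (volume[0] / rate_ml_per_s[1], volume[1] / rate_ml_per_s[0])
    return volume, duration


volume, duration = contrast_plan(60)  # hypothetical 60 kg patient
print(volume, duration)  # (60.0, 90.0) (15.0, 30.0)
```

For this weight the protocol corresponds to 60-90 ml of contrast injected over roughly 15-30 s.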

Lesion Segmentation
Lesion segmentation was performed by two radiologists with ITK-SNAP (version 3.6.0). First, a radiologist with three years of experience in lung imaging diagnosis delineated the lesion preliminarily; then, a radiologist with 14 years of experience in lung imaging diagnosis adjusted and confirmed the delineation. Both were blinded to the pathological results and clinical information. Three-dimensional regions of interest (ROIs) were segmented, avoiding large blood vessels, bronchi, and pleura. Figure 2 shows an example of 3D segmentation of pulmonary nodules with ITK-SNAP.

Feature Extraction
All CT images and the corresponding segmented ROIs were imported into the AK software (GE Healthcare) for image preprocessing and feature extraction. Preprocessing consisted of image standardization and image conversion. For image standardization, the following parameters were selected in ImagePreprocessing: resampling with X, Y, and Z spacing all set to 1.000 mm; intensity standardization to a desired minimum of 0.0 and a desired maximum of 255.0; and gray-level discretization into 64 levels. Two image conversions were performed: Co-occurrence of Local Anisotropic Gradient Orientations (CoLIAGe) and a combination of discrete wavelet transform and local binary pattern (DWT + LBP).
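The standardization steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: nearest-neighbour resampling stands in for whatever interpolation the AK software applies, min-max rescaling is assumed for intensity standardization, gray levels are numbered 1 to 64, and the function names are hypothetical.

```python
import numpy as np


def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbour resampling of a 3D array to a new voxel spacing (mm)."""
    idx = []
    for n, s, ns in zip(volume.shape, spacing, new_spacing):
        m = max(1, int(round(n * s / ns)))          # output size along this axis
        # Centre of each output voxel, expressed as a fractional input index
        centres = (np.arange(m) + 0.5) * ns / s
        idx.append(np.minimum(centres.astype(int), n - 1))
    return volume[np.ix_(*idx)]


def rescale_and_discretize(volume, out_min=0.0, out_max=255.0, n_levels=64):
    """Min-max rescale to [out_min, out_max], then equal-width binning into levels 1..n_levels."""
    v = volume.astype(float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        scaled = np.full_like(v, out_min)
    else:
        scaled = (v - lo) / (hi - lo) * (out_max - out_min) + out_min
    width = (out_max - out_min) / n_levels
    levels = np.floor((scaled - out_min) / width).astype(int)
    return np.minimum(levels, n_levels - 1) + 1     # out_max itself falls into the top level
```

For a CT volume with 0.7 × 0.7 × 2.0 mm voxels, `rescale_and_discretize(resample_nearest(ct, (0.7, 0.7, 2.0)))` would yield a 1 mm isotropic array of gray levels 1-64.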
The original features included 18 first-order (Firstorder) features, 23 from the gray level co-occurrence matrix (GLCM), 16 from the gray level run length matrix (GLRLM), 16 from the gray level size zone matrix (GLSZM), five from the neighboring gray tone difference matrix (NGTDM), 13 from the gray level dependence matrix (GLDM), three shape features, 13 textural features, three normalized_radial_lengths features, one area ratio of macroscopic contour, and one roughness index of boundary.
CoLIAGe conversion: first, the gradient orientation is computed for each pixel within the lesion ROI; second, the dominant orientation within the neighborhood of each pixel is obtained by principal component analysis; third, second-order statistics of the dominant orientations are calculated.
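A simplified 2D version of the three CoLIAGe steps is sketched below with NumPy. It is an illustration under several assumptions, not the AK implementation: gradients come from `np.gradient`, the dominant orientation is taken as the first right-singular vector of the neighborhood's gradient vectors (the SVD form of the principal component step, without mean-centering), orientations are quantized into 8 bins, and entropy is the only second-order statistic computed. A window size of 5 is chosen to echo feature names such as `CoLIAGe2D_WindowSize5_...`; the function names are hypothetical.

```python
import numpy as np


def dominant_orientations(img, win=5):
    """Per-pixel dominant gradient orientation in [0, pi] from the local gradient field."""
    gy, gx = np.gradient(img.astype(float))          # derivatives along rows, then columns
    r = win // 2
    gx_p, gy_p = np.pad(gx, r, mode="edge"), np.pad(gy, r, mode="edge")
    theta = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            g = np.column_stack([gx_p[i:i + win, j:j + win].ravel(),
                                 gy_p[i:i + win, j:j + win].ravel()])
            # First right-singular vector = principal axis of the neighborhood's gradients
            _, _, vt = np.linalg.svd(g, full_matrices=False)
            theta[i, j] = np.arctan2(vt[0, 1], vt[0, 0]) % np.pi
    return theta


def collage_entropy(img, win=5, n_bins=8):
    """Per-pixel entropy of the co-occurrence matrix of quantized dominant orientations."""
    # Orientations 0 and pi are equivalent, so wrap the top bin back to 0
    q = (dominant_orientations(img, win) / np.pi * n_bins).astype(int) % n_bins
    r = win // 2
    q_p = np.pad(q, r, mode="edge")
    ent = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = q_p[i:i + win, j:j + win]
            com = np.zeros((n_bins, n_bins))
            # Count orientation-bin pairs of horizontally adjacent pixels in the window
            np.add.at(com, (patch[:, :-1].ravel(), patch[:, 1:].ravel()), 1)
            p = com[com > 0] / com.sum()
            ent[i, j] = -np.sum(p * np.log2(p))
    return ent
```

A first-order statistic of this per-pixel map within the ROI (for example its mean or root mean square) would then give an ROI-level feature. The feature names in this study suggest such a map-then-summarize structure, but the exact statistics AK computes are not described here.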
DWT + LBP conversion: this conversion combines the discrete wavelet transform (DWT) and the local binary pattern (LBP). The DWT decomposes the original image into four new sub-images, each one quarter the size of the original image, which replace it. The LBP encodes the local structure around each pixel: each pixel is compared with its eight neighbors in the 3 × 3 neighborhood by subtracting the central pixel value; strictly negative differences are encoded as 0 and the others as 1. The binary codes are concatenated clockwise into a binary number, and the pixel is labeled with the corresponding decimal value.
Finally, 1,724 radiomics features were extracted from each lesion ROI: 122 from the original image, 1,170 from the CoLIAGe conversion, and 432 from the DWT + LBP conversion.
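The two building blocks of the DWT + LBP conversion can be sketched with NumPy. Assumptions: a single-level 2D Haar wavelet stands in for the unspecified wavelet (feature names such as `wavelet.LLL_lbp.3D...` suggest the software actually uses a 3D variant), image dimensions are even, and the clockwise LBP code starts at the top-left neighbor, which is taken as the most significant bit. The function names are hypothetical.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT: one approximation and three detail sub-images."""
    x = img.astype(float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    approx = (a + b + c + d) / 2
    detail_cols = (a - b + c - d) / 2     # differences between columns
    detail_rows = (a + b - c - d) / 2     # differences between rows
    detail_diag = (a - b - c + d) / 2
    return approx, detail_cols, detail_rows, detail_diag


# The 8 neighbors in clockwise order, starting at the top-left pixel
CLOCKWISE = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(img):
    """8-neighbor LBP codes for all interior pixels (border pixels are dropped)."""
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    codes = np.zeros(centre.shape, dtype=int)
    for bit, (di, dj) in enumerate(CLOCKWISE):
        neighbour = img[1 + di:h - 1 + di, 1 + dj:w - 1 + dj]
        # neighbour - centre < 0 -> 0, otherwise 1; the first neighbor is the most significant bit
        codes |= (neighbour >= centre).astype(int) << (7 - bit)
    return codes
```

Chaining the two, `[lbp_3x3(band) for band in haar_dwt2(image)]` gives one LBP code map per wavelet sub-image, from which first-order features could then be computed. The comparison `neighbour >= centre` is used instead of an explicit subtraction so that unsigned pixel types cannot wrap around.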

Radiomics Analysis
Radiomics feature preprocessing was performed in the AK software. The 260 metastatic nodules were randomly divided into the training cohort and verification cohort 2 at a ratio of 3:1. For each of the training cohort, verification cohort 1, and verification cohort 2, we then identified the features that remained stable (no statistically significant difference among groups A to D) during the size progression 0.25 → 0.5 → 0.75 → 1 cm. By comparing these feature sets across the three cohorts, we selected the features that were stable in the training cohort, not stable in verification cohort 1 (benign nodules), and stable in verification cohort 2 (metastatic nodules). To verify the results, we performed cross-validation (n = 100), each time randomly re-dividing the metastatic nodules into the training cohort and verification cohort 2. Features that remained stable in more than 90 of the 100 repetitions were defined as "Stable features". The experimental flowchart is shown in Figure 3.
Calculation process of the Kruskal-Wallis test: for a given feature, the values of all samples in groups A, B, C, and D are pooled and sorted in ascending order; rank 1 is assigned to the smallest observation, rank 2 to the second smallest, and so on. The average rank of each group is then calculated, and the test examines whether the average ranks differ significantly among the groups. The statistic is calculated as
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{N_i} - 3(N+1)$$
where $k$ is the number of sample groups, $N$ is the total sample size, $N_i$ is the sample size of the $i$-th group, and $R_i$ is the sum of the ranks in the $i$-th group.
Calculation process of the Spearman rank correlation coefficient: the Spearman correlation coefficient was calculated between every two features in groups A, B, C, and D, respectively. The data of the two features are ranked separately in ascending order, with rank 1 assigned to the smallest observation, rank 2 to the second smallest, and so on.
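The Spearman correlation coefficient is denoted by $r$. For two sets of data $X$ and $Y$ of size $n$, the values are converted to ranks $x_i$, $y_i$ ($i = 1, \dots, n$), and $\bar{x}$, $\bar{y}$ denote the mean values of $x_i$ and $y_i$, respectively. The correlation coefficient can be expressed as
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$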
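The Kruskal-Wallis statistic and the Spearman coefficient defined above can be computed with the Python standard library alone, as in the sketch below. Assumptions to note: the significance level `alpha = 0.05` is not stated in the text and is illustrative only; $H$ is compared with a chi-square distribution with $k - 1 = 3$ degrees of freedom (the usual large-sample approximation) without tie correction; ties share their average rank; and all function names are hypothetical. SciPy's `scipy.stats.kruskal` and `scipy.stats.spearmanr` offer library versions, the former with a tie correction.

```python
import math
from itertools import chain, combinations


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1   # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N(N+1)) * sum(R_i^2 / N_i) - 3(N+1), without tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])    # rank sum R_i of the i-th group
        total += r_i ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def chi2_sf_df3(h):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    h = max(h, 0.0)                               # guard against tiny negative rounding errors
    return math.erfc(math.sqrt(h / 2)) + math.sqrt(2 * h / math.pi) * math.exp(-h / 2)


def is_stable(groups, alpha=0.05):
    """True if the four size groups show no significant difference (p >= alpha)."""
    if len(groups) != 4:
        raise ValueError("the df = 3 approximation used here assumes exactly four groups")
    return chi2_sf_df3(kruskal_wallis_h(groups)) >= alpha


def spearman_r(x, y):
    """Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


def correlated_pairs(features, threshold=0.75):
    """All feature pairs with |r| above the threshold; `features` maps name -> values."""
    pairs = []
    for a, b in combinations(features, 2):
        r = spearman_r(features[a], features[b])
        if abs(r) > threshold:
            pairs.append((a, b, r))
    return pairs
```

In the study's terms, `is_stable` would be applied to each feature's values in groups A to D of one cohort, and `correlated_pairs` within each size group with 0.75 as the |r| cut-off.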

Basic Information
The basic information of the 34 CRC patients is shown in Table 1, and that of the 21 patients with benign lung nodules is shown in Table 2.

Radiomics Analysis
In the initial experiment, 90 radiomics features remained relatively stable in the colorectal cancer metastases of the training cohort during the size progression 0.25 → 0.5 → 0.75 → 1 cm. Over the same progression, 293 features remained relatively stable in the benign nodules (verification cohort 1), and 118 features remained relatively stable in the colorectal cancer metastases of verification cohort 2. Based on these results, we found 20 features that remained stable in the metastatic nodules of the training cohort, were not stable in the benign nodules (verification cohort 1), and remained stable in the metastatic nodules of verification cohort 2. The 20 features are listed in Table 3.
The statistical data of the initial experiment are listed in Tables 4-6. In 100 repetitions of cross-validation, 11 features remained stable in more than 90 repetitions, and 19 features remained stable in more than 80 repetitions; these features are listed in Table 7. The definitions of these features are provided in the supplementary material.
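The screening rule and the cross-validation count can be written as set operations, as in the sketch below. It assumes a hypothetical callback `stable_in(ids)` that returns the features showing no significant difference across groups A to D within the given nodules; the training-cohort size is passed in explicitly (the text reports 200 and 60 metastatic nodules), and all names are illustrative only.

```python
import random
from collections import Counter


def screen(stable_train, stable_benign, stable_val2):
    """Features stable in both metastatic cohorts but not stable in the benign cohort."""
    return (set(stable_train) & set(stable_val2)) - set(stable_benign)


def repeated_split_counts(nodule_ids, stable_in, stable_benign, n_train, n_repeats=100, seed=0):
    """Count, per feature, in how many random splits it passes the screen.

    stable_in(ids) is a hypothetical callback returning the set of features that show
    no significant difference across size groups A-D within the given nodules.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_repeats):
        ids = list(nodule_ids)
        rng.shuffle(ids)                                   # new random training/verification split
        train, val2 = ids[:n_train], ids[n_train:]
        counts.update(screen(stable_in(train), stable_benign, stable_in(val2)))
    return counts


def stable_features(counts, min_count=90):
    """Features that passed the screen more than min_count times."""
    return sorted(f for f, c in counts.items() if c > min_count)
```

With `min_count=90` this corresponds to the "more than 90 times" criterion, and `min_count=80` to the looser list also reported above.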
Using the Spearman correlation with a threshold of |r| > 0.75, we identified 12 pairs of correlated features, involving 7 Stable features and 10 other extracted features. There is a high correlation (|r| > 0.90) … (Table 8).

DISCUSSION
Our results show that, as CRC metastases (200 nodules) grew from small to large, a total of 90 features remained relatively stable; 20 of these were not stable in the benign nodules of verification cohort 1 but remained stable in the metastases of verification cohort 2. Eleven features remained stable in more than 90 of the 100 cross-validation repetitions, and all 11 are among the 20 features identified in the initial experiment. This result supports our initial hypothesis that metastatic pulmonary nodules from colorectal cancer may have inherent radiomics features. These stable radiomics features may be the potential tools we are looking for to diagnose lung metastases of CRC and assist clinical treatment decision making. Whether IPNs are metastases determines the M stage of colorectal cancer as well as the treatment options. CT intuitively shows the morphological characteristics of pulmonary lesions. However, conventional CT can currently diagnose accurately only multiple metastases, large nodules (14), or ground-glass nodules (which rarely occur in metastatic cancer). Most small solid nodules (≤1 cm) are difficult to classify as metastatic or benign because both tend to be round or oval with a smooth contour (15,16). Contrast-enhanced CT and FDG PET/CT also have difficulty assessing nodules of this size. Biopsy is considered the "gold standard" for the diagnosis of lung metastases. Still, its clinical application is limited because it is an invasive procedure that may cause complications after sampling; moreover, it cannot evaluate the histopathological features of the nodule as a whole. Although most IPNs are benign, lung metastases still need to be identified as early as possible because they may progress rapidly.
Diagnosing lung metastases early has a positive impact on clinical practice, and their timely detection benefits patients in the long term. Only a few studies have specifically applied radiomics to indeterminate nodules smaller than 1 cm (13,17). Therefore, the management of indeterminate small pulmonary nodules in CRC patients is currently an important issue. Considering these factors, we chose to analyze the features of small nodules (≤1 cm) in this study.
It is well known that benign and malignant tumors differ fundamentally in their genetic characteristics, which determine different cell morphology and biological behavior and are reflected in different histopathological structures and imaging appearances (18). High-throughput feature extraction from radiological images can help obtain more useful information from lesion image data (19). A retrospective study reported that radiomics features discriminated primary lung cancer from granulomatous nodules with an area under the curve (AUC) of 0.90 (17). However, the use of radiomics features to predict the nature of IPNs in CRC has so far been investigated in only a few studies. TingDan Hu et al. recently developed nomograms combining radiomics features and clinical risk factors that showed good discrimination and accuracy for metastasis prediction (12,13), indicating that radiomics tools are feasible for diagnosing lung metastasis of colorectal cancer. This study was based on a new idea: that some radiomics features of nodules with a given pathology may remain unchanged, like genes. We sought relatively stable features of metastatic nodules as "radiomics markers" that may guide clinical diagnosis. The conventional process in most radiomics studies is to find the features that best distinguish metastasis from non-metastasis, incorporate them into a model such as logistic regression, an artificial neural network, or a random forest, and finally validate the model (20). By contrast, this exploratory study used hypothesis testing (the Kruskal-Wallis test). The Kruskal-Wallis test (21) is a nonparametric statistical test of whether two or more samples come from the same distribution. We therefore used it to seek the radiomics commonality of metastatic nodules, that is, their inherent radiomics features.
Our study finally screened out 11 stable features from three categories extracted by the AK software: "Original", "CoLIAGe", and "DWT + LBP". Original features (22) … phenotypes from similar morphological manifestations. They found that adenocarcinoma lesions have a greater density of CoLIAGe entropy than granulomas on computed tomography. Januar Adi Putra proposed new feature extraction and selection algorithms to improve classification accuracy: combining wavelet and LBP features (25), the scheme classified breast tissue as normal or abnormal with an accuracy of 92.71%. Most of the stable features obtained in our study are CoLIAGe features: of the 11 features that remained stable in more than 90 of the 100 cross-validation repetitions, eight belong to "CoLIAGe", one to "Original", and two to "DWT + LBP". Metastatic and benign nodules have different histopathological phenotypes. However, on CT, small metastatic nodules look very similar to small benign nodules in shape and margin, so original features such as "shape" can hardly distinguish metastatic from benign nodules. Texture features can capture subtle pathological differences to a certain extent, but they only analyze differences in the overall intensity pattern of a nodule. In contrast, CoLIAGe identifies differences in local entropy patterns, which reflect subtle local differences in microstructure, so it can distinguish subtle pathological differences between lesions with similar overall texture and imaging appearance.
We found 12 pairs of features with a correlation coefficient |r| > 0.75 in groups A, B, C, and D. In one pair, both features were Stable features. Seven features correlated with Stable features were only slightly less stable. Four features correlated with Stable features (original_firstorder_10Percentile, CoLIAGe2D_WindowSize5_Sum.Average_firstorder_RootMeanSquared, CoLIAGe2D_WindowSize11_Sum.Average_firstorder_90Percentile, and wavelet.LLL_lbp.3D.k_firstorder_Kurtosis) were clearly not stable, which is not surprising because their correlation with the Stable features is relatively low.
Our study has some limitations. First, this is a single-center study; using the same CT scanner and images from the same period avoided differences in image data but limits the generalizability of the results. Second, as a retrospective study, it is subject to sample bias, which may limit the accuracy of the results. Third, benign lesions without pathological confirmation were diagnosed using the criterion of 2-year stability on thoracic CT. Fourth, manual delineation of ROIs may be less repeatable than semiautomatic or automatic delineation. Last, this study is still at a preliminary exploratory stage, so the results need to be confirmed with much more data and further data mining.
Radiomics is developing in a promising direction as a non-invasive post-processing technology. However, current radiomics approaches may lack repeatability and reproducibility. Standardized image acquisition and reconstruction, multi-center data, data sharing, and larger sample sizes will be necessary in the future.

CONCLUSIONS
In summary, this study suggests that certain radiomics features of metastatic nodules remain stable as the nodules grow but do not remain stable in benign nodules, which provides another potential approach to the study of lung metastasis and the development of artificial intelligence-assisted diagnosis.
In Table 8, the first column lists the Stable features that remained stable in more than 90 repetitions of cross-validation (n = 100), and the second column lists the features related to them. The number in parentheses is the number of repetitions (out of 100) in which each feature remained stable.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the First Affiliated Hospital of Guangzhou Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
QM, QZ, HC, and TW designed the study. CL, YS, BL, RC, JH, and GL collected the data. CL and QM segmented the lesions.
TW performed the data extraction and analyzed the data. TW and YL carried out the statistical analysis. CL wrote the manuscript. QM and YL reviewed and modified the manuscript. HC made the final revision to the article. All authors contributed to the article and approved the submitted version.

FUNDING
The work of Huai Chen was supported by the Natural Science Foundation of Guangdong Province, China (2019A1515011382).