Combined Radiomic and Visual Assessment for Improved Detection of Lung Adenocarcinoma Invasiveness on Computed Tomography Scans: A Multi-Institutional Study

Objective: The timing and nature of surgical intervention for semisolid abnormalities depend on distinguishing between adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (INV). We sought to develop and evaluate a quantitative imaging method to determine the invasiveness of small ground-glass lesions on chest computed tomography (CT) scans. Methods: The study comprised 268 patients from 4 institutions with resected semisolid lesions (≤3 cm) and a confirmed histopathological diagnosis of MIA/AIS or INV. A total of 248 radiomic texture features were extracted from within the tumor nodule (intratumoral) and from the region adjacent to it (peritumoral) on manually annotated lung nodules in chest CT scans. The patients were randomly divided, with 40% used for training and 60% for testing the machine classifier (training set, DTrain, N = 106; test set, DTest, N = 162). Results: The top five stable radiomic features comprised four intratumoral features (Laws and Haralick feature families) and one peritumoral feature from the region 3 to 6 mm outside the nodule (CoLlAGe feature family); together they differentiated INV from MIA/AIS nodules with an AUC of 0.917 [0.867-0.967] on DTrain and 0.863 [0.79-0.931] on DTest. The radiomics model differentiated INV from MIA cases across nodule sizes (<1 cm, AUC: 0.76 [0.53-0.98]; 1-2 cm, AUC: 0.92 [0.85-0.98]; 2-3 cm, AUC: 0.95 [0.88-1]). The final integrated model combining the classifier with the radiologists’ scores gave the best AUC on DTest (AUC = 0.909, p < 0.001). Conclusions: Adding advanced image analysis via radiomics to the routine visual assessment of CT scans helps differentiate adenocarcinoma subtypes and can aid clinical decision making. Further prospective validation is warranted.


INTRODUCTION
Lung cancer is the leading cause of cancer-related deaths worldwide, and adenocarcinoma is the most common histologic type (1). With the increased use of diagnostic imaging such as low-dose chest CT screening, more lung cancers are being detected at earlier stages, often presenting as small solid or semisolid nodules or ground-glass opacities (GGOs) (2)(3)(4). The new IASLC guidelines (5) and the AJCC 8th edition staging guidelines (6), along with the WHO classification of adenocarcinomas (7), divide adenocarcinoma into three broad categories: preinvasive adenocarcinoma [including adenocarcinoma in situ (AIS)], minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (INV) (8). Histopathologically, lepidic growth (growth along the alveolar walls) is a hallmark of noninvasive lesions (8). In the new classification system, an invasive component is defined as either any histologic subtype other than lepidic or invasion of malignant cells into myofibroblastic stroma (9). Lepidic cancers are observed to follow an orderly progression from AIS to MIA before becoming INV (10).
Outcomes of adenocarcinoma following surgical resection depend on the initial stage. Resected stage IA non-small cell lung cancer (NSCLC) has a five-year overall survival rate of about 75% (11), whereas the five-year disease-specific survival rate for resected MIA is nearly 100% (12). The surgical approach and extent of lung resection for these nodules can be dictated by the adenocarcinoma histologic subtype (13). In patients with non- or minimally invasive adenocarcinoma, sublobar resection can produce results equivalent to lobectomy while preserving lung parenchyma and potentially allowing repeat resection if a subsequent primary tumor develops.
At present, there are no definitive radiographic biomarkers for identifying the extent of invasion before surgical resection. Although the invasive portion of the cancer is typically solid and the non-invasive (lepidic) portion has a ground-glass appearance on CT, there is substantial overlap in imaging findings between subcategories. Furthermore, traditional CT evaluation can be subjective, and interpretations vary widely depending on the experience of the reading radiologist (14). This, coupled with other variables such as scan parameters and slice thickness, limits reliable differentiation on routine radiologic assessment. Fine needle aspiration and imaging are also inaccurate in determining the degree of invasion (15). Hence, there is a critical need for an accurate model that non-invasively assesses the level of invasion on imaging in these early-stage adenocarcinomas before surgical resection.
Radiomic texture features are high-throughput quantitative imaging data extracted from radiographic scans to characterize subtle patterns within a region of interest (ROI) (16). Texture patterns extracted from inside and outside the nodule have shown diagnostic, prognostic, and predictive utility in lung cancer (17), and these features are thought to reflect the underlying tumor biology and tissue morphology (18,19). Previous attempts to identify the level of invasion with radiomic features have mostly focused on texture analysis solely within the tumor (20,21). The peritumoral microenvironment has emerged as a promising, though relatively unexplored, region for identifying the level of invasion (22).
In this study, we constructed a non-invasive radiographic biomarker based on radiomic analysis of baseline chest CT scans to distinguish MIA from INV in patients with stage I NSCLC and a tumor diameter of 3 cm or less. We evaluated these radiomic features with supervised and unsupervised approaches to identify specific patterns associated with INV and MIA nodules. We also divided patients into subgroups by nodule diameter and evaluated classifier performance within each size range. Finally, we compared our model with two radiologists and integrated the radiologists' scores with the classifier output to assess combined human and machine classification performance.

Study Population
We performed a retrospective, multi-cohort study of patients with resected MIA and stage IA INV. A total of 268 patients from four institutions were included, all of whom had baseline (pre-treatment) CT scans. Based on our inclusion criteria, we selected patients with tumors 3 cm or smaller, with a special focus on the subset of 1 to 2 cm nodules. The patients were randomly divided into training (DTrain, 40%) and validation (DTest, 60%) cohorts, with DTrain constructed to contain equal numbers of invasive and noninvasive lesions.

CT Segmentation and Radiomic Textural Feature Extraction
The index pulmonary lesions on these baseline CT scans were annotated by an expert radiologist using the freehand tool in 3D Slicer. Details of the CT acquisition parameters are listed in Appendix 1.
After annotation, the nodule area was calculated in MATLAB 2015 on the CT slice with the largest tumor cross-section. This area was used for the subgroup analysis and to build a combined radiomics and area-based model.
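As an illustration of this measurement, the sketch below finds the axial slice with the largest tumor cross-section in a 3D binary mask and converts its pixel count to mm². It is a Python/NumPy sketch, not the MATLAB code used in the study; the (slices, rows, cols) array layout and the `pixel_spacing_mm` argument (for example, taken from the DICOM PixelSpacing tag) are assumptions.

```python
import numpy as np


def largest_slice_area(mask, pixel_spacing_mm):
    """Return (slice_index, area_mm2) for the axial slice with the most tumor pixels.

    mask: 3D boolean array in an assumed (slices, rows, cols) layout.
    pixel_spacing_mm: (row_spacing, col_spacing) in millimetres.
    """
    mask = np.asarray(mask, dtype=bool)
    pixel_area_mm2 = float(pixel_spacing_mm[0]) * float(pixel_spacing_mm[1])
    # Count annotated tumor pixels on every axial slice.
    counts = mask.reshape(mask.shape[0], -1).sum(axis=1)
    idx = int(np.argmax(counts))
    return idx, float(counts[idx]) * pixel_area_mm2


# Example: a 3x4-pixel lesion on slice 2 with 0.7 mm in-plane spacing.
demo = np.zeros((5, 10, 10), dtype=bool)
demo[2, 3:6, 2:6] = True
print(largest_slice_area(demo, (0.7, 0.7)))
```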
These annotated nodules were used to extract the intra- and peritumoral texture features. The peritumoral compartment was defined via morphological dilation as a region extending radially from the nodule boundary up to 15 mm, since a resection margin larger than 15 mm for lung nodules is not considered to confer additional benefit for invasive lesions. The dilation procedure was modified to exclude skin, air, and fat from the extended mask. Peritumoral radiomic features were extracted from five annular rings in 3-mm increments, up to a maximum radius of 15 mm from the nodule periphery.
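The ring construction can be sketched with a Euclidean distance transform, which approximates the shells produced by repeated dilation while working in millimetres for anisotropic voxels. This Python/SciPy sketch is an illustration under assumptions, not the study's code; in particular, the optional `lung_mask` argument is a hypothetical stand-in for the step that removed skin, air, and fat.

```python
import numpy as np
from scipy import ndimage


def peritumoral_rings(nodule_mask, spacing_mm, lung_mask=None, step_mm=3.0, max_mm=15.0):
    """Concentric peritumoral shells (0-3, 3-6, ..., 12-15 mm by default) around a nodule mask.

    spacing_mm: voxel size per axis in mm, e.g. (slice_thickness, row, col).
    lung_mask: hypothetical optional mask; voxels outside it (e.g. chest wall) are dropped.
    """
    nodule_mask = np.asarray(nodule_mask, dtype=bool)
    # Distance in mm from every non-nodule voxel to the nearest nodule voxel (0 inside the nodule).
    dist = ndimage.distance_transform_edt(~nodule_mask, sampling=spacing_mm)
    rings = []
    for inner in np.arange(0.0, max_mm, step_mm):
        ring = (dist > inner) & (dist <= inner + step_mm)
        if lung_mask is not None:
            ring &= np.asarray(lung_mask, dtype=bool)
        rings.append(ring)
    return rings


# Example: a single-voxel "nodule" in an isotropic 1 mm volume.
nod = np.zeros((41, 41, 41), dtype=bool)
nod[20, 20, 20] = True
rings = peritumoral_rings(nod, (1.0, 1.0, 1.0))
print(len(rings), [int(r.sum()) for r in rings])
```

Texture features would then be computed separately inside each ring, so that each 3-mm shell contributes its own feature values.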
Details of the extracted radiomic features are provided in Appendix 1. Haralick features are computed from a gray-level co-occurrence matrix, and CoLlAGe features from the co-occurrence of local gradient orientations; both are intended to capture disorder in the microarchitecture of the annotated region of interest (23,24). The Laws and Laplace features capture the high-frequency content of the image and emphasize edges such as the boundary of the ROI (25). Gabor features are wavelet-based features (26).
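To make the co-occurrence idea concrete, the sketch below builds a symmetric, normalized gray-level co-occurrence matrix for one pixel offset and computes a few classic Haralick statistics from it. The quantization to 8 gray levels, the single horizontal offset, and the chosen statistics are illustrative assumptions; the study's exact Haralick and CoLlAGe settings are in its Appendix 1 and are not reproduced here.

```python
import numpy as np


def glcm(image, levels=8, offset=(0, 1)):
    """Symmetric, normalized GLCM of a 2D image for one non-negative (row, col) offset."""
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        q = np.zeros(img.shape, dtype=int)  # constant image: a single gray level
    else:
        # Quantize intensities into gray levels 0 .. levels-1.
        q = np.minimum(((img - lo) / (hi - lo) * levels).astype(int), levels - 1)
    dr, dc = offset
    rows, cols = q.shape
    a = q[: rows - dr, : cols - dc]  # reference pixels
    b = q[dr:, dc:]                  # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T  # count each pair in both directions
    return m / m.sum()


def haralick_subset(p):
    """A few Haralick statistics of a normalized GLCM p."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
        "entropy": float(-np.sum(nz * np.log2(nz))),
    }


checker = np.indices((8, 8)).sum(axis=0) % 2  # alternating 0/1 pattern
print(haralick_subset(glcm(checker, levels=2)))
```

A highly heterogeneous patch (like the checkerboard) yields high contrast and entropy, whereas a uniform patch yields zero contrast and maximal energy, which is the intuition behind using these statistics as markers of disorganized tissue.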

Classifier Construction
All patients were divided into two groups: a pre-invasive/minimally invasive group (AIS, MIA) and a frankly invasive group (invasive pulmonary adenocarcinoma [IPA]). These two groups served as the binary endpoint for the classification problem.
First, all radiomic features were analyzed with unsupervised clustering, blinded to pathology results and clinical outcome, to evaluate their ability to separate the two diagnostic categories. Principal component analysis (PCA) was applied to the entire feature pool, and the top three principal components were used for K-means clustering. In addition, hierarchical clustering was performed on the entire cohort.
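The hierarchical clustering step can be sketched with SciPy as follows. The paper does not state the linkage criterion, distance metric, or feature scaling, so Ward linkage on z-scored features is an assumption here; `cluster_composition` is a hypothetical helper that reports the fraction of INV cases per cluster, the kind of summary given in the Results.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore


def hierarchical_clusters(features, n_clusters=4):
    """Cut a Ward dendrogram of z-scored features into n_clusters groups (labels 1..n)."""
    X = zscore(np.asarray(features, dtype=float), axis=0)  # constant features would give NaN
    Z = linkage(X, method="ward")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def cluster_composition(labels, is_invasive):
    """Fraction of INV cases within each cluster."""
    is_invasive = np.asarray(is_invasive, dtype=bool)
    return {int(c): float(is_invasive[labels == c].mean()) for c in np.unique(labels)}


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
feats = rng.normal(size=(40, 6)) + 10.0 * y[:, None]  # two well-separated synthetic groups
labels = hierarchical_clusters(feats, n_clusters=2)
print(cluster_composition(labels, y))
```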
Next, a supervised logistic regression classifier, M_R, was constructed using the top selected features from the training cohort, DTrain, and then validated on the independent, blinded validation set DTest. DTest was further divided into three subsets by nodule size (<1 cm, 1-2 cm, and 2-3 cm), and the performance of the model was evaluated within each subgroup.
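A minimal scikit-learn sketch of this train-then-test workflow follows. Standardizing features before the logistic regression, the default L2 penalty, and the size-bin boundaries (the 2-3 cm bin including 3 cm) are assumptions; `size_cm` is a hypothetical per-nodule diameter array. AUC is computed only for subgroups that contain both classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_evaluate(X_train, y_train, X_test, y_test, size_cm):
    """Train on DTrain, then report overall and size-stratified AUC on DTest."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    p_test = model.predict_proba(X_test)[:, 1]  # predicted probability of INV
    results = {"overall": roc_auc_score(y_test, p_test)}
    groups = {
        "<1 cm": size_cm < 1.0,
        "1-2 cm": (size_cm >= 1.0) & (size_cm < 2.0),
        "2-3 cm": (size_cm >= 2.0) & (size_cm <= 3.0),
    }
    for name, sel in groups.items():
        if np.unique(y_test[sel]).size == 2:  # AUC is undefined with only one class
            results[name] = roc_auc_score(y_test[sel], p_test[sel])
    return model, results


# Synthetic example: 5 features, the first one informative; 40/60 train/test split.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
size_cm = rng.uniform(0.5, 3.0, size=400)
_, res = fit_and_evaluate(X[:160], y[:160], X[160:], y[160:], size_cm[160:])
print(res)
```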
Next, another supervised classifier, M_A, was constructed using tumor area, and tumor area was then integrated with the radiomic features to construct the combined tumor area and radiomics model (M_R+A).

Human Reader Experiment
The patients in DTest were individually assessed by two radiologists with 12 and 21 years of experience, respectively, who were blinded to the ground-truth pathologic diagnosis of the nodules. Each reader scored each tumor from 1 to 3: 1 suggesting MIA, 2 indeterminate, and 3 INV. We calculated the accuracy of the radiologists' scores (M_HR) and compared our radiomics model, M_R, with them. Finally, we integrated the probability obtained from M_R with the radiologists' scores (1 to 3) to obtain combined human and machine interpretations (M_R+HR).
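The study does not specify how the classifier probability and the 1-3 reader score were combined. One plausible scheme, shown here purely as a hypothetical sketch, is a two-input logistic regression whose combined probabilities are obtained by cross-validation, so that the fused model is not scored on the same cases it was fitted to. The ordinal reader score can be passed to `roc_auc_score` directly; tied scores count as half-concordant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def fuse_reader_and_model(p_model, reader_score, y, n_splits=5, seed=0):
    """Hypothetical fusion of a classifier probability and a 1-3 reader score."""
    X = np.column_stack([p_model, reader_score])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Out-of-fold probabilities: each case is scored by a model not fitted on it.
    p_comb = cross_val_predict(LogisticRegression(), X, y, cv=cv, method="predict_proba")[:, 1]
    return {
        "reader": roc_auc_score(y, reader_score),
        "model": roc_auc_score(y, p_model),
        "combined": roc_auc_score(y, p_comb),
    }


rng = np.random.default_rng(7)
y = rng.integers(0, 2, 200)
p_model = np.clip(0.25 + 0.5 * y + rng.normal(0, 0.2, 200), 0, 1)
reader = np.where(rng.random(200) < 0.7, 1 + 2 * y, 2)  # 1 = MIA, 2 = indeterminate, 3 = INV
res = fuse_reader_and_model(p_model, reader, y)
print(res)
```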

Statistical Analysis
Statistical analysis was performed using MATLAB 2015 and R version 3.5.3. A two-sided p-value < 0.05 was considered significant for all statistical analyses.
Radiomic feature stability and reproducibility were evaluated using the RIDER test-retest dataset (27), which contains 31 lung cancer patients, each scanned twice, 15 min apart. These scan pairs were used to calculate the intraclass correlation coefficient (ICC) for each feature, which measures the agreement between its test and retest values. Features with an ICC below the threshold of 0.85 were removed from the analysis.
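The reproducibility filter can be sketched as follows. The paper does not name the ICC form; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, after Shrout and Fleiss) computed across the test and retest scans, and keeps features with ICC ≥ 0.85. `test` and `retest` are hypothetical arrays of shape (patients, features).

```python
import numpy as np


def icc_2_1(x):
    """ICC(2,1) for an (n_subjects, k_measurements) array."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between sessions (test vs retest)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def stable_features(test, retest, threshold=0.85):
    """Indices of features whose test-retest ICC(2,1) is at least `threshold`, plus all ICCs."""
    iccs = np.array([icc_2_1(np.column_stack([test[:, f], retest[:, f]]))
                     for f in range(test.shape[1])])
    return np.flatnonzero(iccs >= threshold), iccs


rng = np.random.default_rng(2)
test = rng.normal(size=(31, 3))
retest = test.copy()
retest[:, 1] += 0.05 * rng.normal(size=31)  # small scan-to-scan noise
retest[:, 2] = rng.normal(size=31)          # not reproducible
keep, iccs = stable_features(test, retest)
print(keep, np.round(iccs, 3))
```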
For the unsupervised clustering analysis, hierarchical clustering and PCA combined with K-means clustering were performed on DTrain. The clustering results were compared against the ground truth to calculate clustering accuracy.
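The PCA plus K-means step and its comparison with ground truth might look like the scikit-learn sketch below. Standardizing features before PCA and the `n_init`/`random_state` settings are assumptions. Because K-means cluster ids are arbitrary, the sketch scores both possible mappings of the two clusters onto the two diagnostic classes and keeps the better one.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_kmeans_accuracy(features, y, n_components=3, seed=0):
    """Two-cluster K-means on the top principal components, scored against binary labels y."""
    X = StandardScaler().fit_transform(features)
    pcs = PCA(n_components=n_components).fit_transform(X)
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(pcs)
    y = np.asarray(y)
    accuracy = max(np.mean(labels == y), np.mean(labels != y))  # best of the two label mappings
    return labels, float(accuracy)


rng = np.random.default_rng(3)
y = np.repeat([0, 1], 30)
feats = rng.normal(size=(60, 20)) + 3.0 * y[:, None]  # synthetic, clearly separated classes
_, acc = pca_kmeans_accuracy(feats, y)
print(acc)
```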
For feature selection and classifier construction, 300 iterations of threefold cross-validation were performed within the training dataset, DTrain. The minimum redundancy maximum relevance (mRMR) feature selection algorithm (28) was applied within the cross-validation setting to select the top radiomic features discriminating INV from MIA/AIS. mRMR identifies a set of features that maximally distinguish the two classes while minimizing the correlation (redundancy) among the selected features. A maximum of five features was selected to prevent overfitting, given the large number of features relative to the sample size (the curse of dimensionality). mRMR was run in MATLAB using a C-based feature selection toolbox. The top radiomic feature set was further analyzed using box-and-whisker plots and qualitative feature maps comparing feature expression between MIA/AIS and invasive adenocarcinomas.
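A compact sketch of mRMR inside repeated cross-validation is shown below. It uses the mutual-information-difference (MID) form of mRMR on quantile-discretized features and counts how often each feature is chosen across training folds; the five-bin discretization, the MID criterion, and the frequency-voting rule for the final feature set are assumptions, and this is not the MATLAB/C toolbox used in the study.

```python
from collections import Counter

import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import StratifiedKFold


def _discretize(col, bins=5):
    edges = np.quantile(col, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(col, edges)


def mrmr(X, y, k=5, bins=5):
    """Greedy mRMR (MID): maximize relevance to y minus mean redundancy with chosen features."""
    Xd = np.column_stack([_discretize(X[:, j], bins) for j in range(X.shape[1])])
    relevance = np.array([mutual_info_score(y, Xd[:, j]) for j in range(X.shape[1])])
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, X.shape[1]):
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, s]) for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected


def cv_mrmr(X, y, n_iter=300, n_folds=3, k=5):
    """Run mRMR on each training fold of repeated stratified CV; return the k most frequent features."""
    counts = Counter()
    for it in range(n_iter):
        skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=it)
        for train_idx, _ in skf.split(X, y):
            counts.update(mrmr(X[train_idx], y[train_idx], k=k))
    return [f for f, _ in counts.most_common(k)]


rng = np.random.default_rng(4)
X = rng.normal(size=(200, 8))
X[:, 1] = X[:, 0]  # exact duplicate of feature 0: relevant but fully redundant
y = (X[:, 0] > 0).astype(int)
print(mrmr(X, y, k=3), cv_mrmr(X, y, n_iter=2, k=3))
```

The duplicated feature illustrates the design choice: it is as relevant as feature 0, but its redundancy penalty keeps it from being picked immediately after feature 0.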
To evaluate classifier performance, the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were calculated for the training and validation datasets. Whether adding nodule area significantly improved the radiomic model was assessed using DeLong's test (29). Figure 1 shows the overall pipeline.
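DeLong's test for two correlated AUCs (both models scored on the same patients) can be sketched in NumPy/SciPy as below, using the structural-component (placement) formulation of DeLong et al. The function and variable names are illustrative; `s1` and `s2` would be, for example, the DTest probabilities from M_R and M_R+A.

```python
import numpy as np
from scipy.stats import norm


def _structural_components(pos, neg):
    """AUC plus DeLong placement values for positives (V10) and negatives (V01)."""
    # psi = 1 if positive score > negative score, 0.5 on ties, 0 otherwise
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y, s1, s2):
    """Two-sided DeLong test comparing the AUCs of two scores on the same cases."""
    y = np.asarray(y).astype(bool)
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    auc1, v10_1, v01_1 = _structural_components(s1[y], s1[~y])
    auc2, v10_2, v01_2 = _structural_components(s2[y], s2[~y])
    m, n = y.sum(), (~y).sum()
    s10 = np.cov(np.vstack([v10_1, v10_2]))  # 2x2 covariance over positive cases
    s01 = np.cov(np.vstack([v01_1, v01_2]))  # 2x2 covariance over negative cases
    cov = s10 / m + s01 / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (auc1 - auc2) / np.sqrt(var_diff)
    return float(auc1), float(auc2), float(2 * norm.sf(abs(z)))


rng = np.random.default_rng(6)
y = rng.integers(0, 2, 400)
s1 = y + rng.normal(size=400)  # informative score
s2 = rng.normal(size=400)      # uninformative score
print(delong_test(y, s1, s2))
```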

Baseline Characteristics
Of the 268 nodules, 103 were pathologically confirmed as pre-invasive (AIS, n = 2) or minimally invasive (MIA, n = 101) lesions, whereas 165 were confirmed as invasive lesions (INV, n = 165). Figure 2 shows the datasets and patient inclusion criteria, along with the training and testing set distributions. Figure 3 shows example CT scans with INV and MIA lesions.

Unsupervised Clustering
The extracted radiomic feature pool, that is, the combination of intratumoral and peritumoral texture features, was used for unsupervised clustering with PCA and K-means. Using the first three principal components on DTrain, the optimal number of clusters was two. The resulting clusters had an accuracy of 73.1%. The compactness within the clusters, that is, how similar the members of the same group are, was 62.8%. The clusters were validated using the silhouette coefficient (silhouette width). The silhouette plot (30) suggests that two clusters were optimal, with no negative silhouette widths and most values > 0.5 (Appendix 1).
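Choosing the number of clusters by mean silhouette width, and checking for negative widths, can be sketched with scikit-learn as below. The candidate range k = 2 to 5 and the K-means settings are assumptions; `pcs` stands for the first three principal components of DTrain.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score


def choose_k_by_silhouette(pcs, k_values=(2, 3, 4, 5), seed=0):
    """Return the best k by mean silhouette, all scores, and the fraction of negative widths at that k."""
    scores, labelings = {}, {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(pcs)
        labelings[k] = labels
        scores[k] = float(silhouette_score(pcs, labels))
    best = max(scores, key=scores.get)
    widths = silhouette_samples(pcs, labelings[best])  # per-sample silhouette widths
    return best, scores, float(np.mean(widths < 0))


rng = np.random.default_rng(5)
pcs = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(8, 1, (40, 3))])
best_k, scores, neg_frac = choose_k_by_silhouette(pcs)
print(best_k, neg_frac)
```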
Hierarchical clustering of the entire extracted radiomic feature pool revealed four distinct clusters of patients. Clusters 1 and 3 were associated with INV cases (cluster 1, 100%; cluster 3, 62.5% INV cases), whereas clusters 2 and 4 were associated with MIA cases (cluster 2, 71.4%; cluster 4, 75% MIA cases). The results of the unsupervised clustering analysis are shown in Figure 4.
The unsupervised clustering analysis shows that most INV cases clustered together, as did most MIA/AIS cases. Collectively, these results suggest that the two patient groups have distinct radiomic signatures.

Supervised Analysis and Selecting the Top Differentiating Features
During feature discovery for the model M_R within DTrain, the top five features identified included one peritumoral feature (CoLlAGe feature family, 3 to 6 mm from the nodule) and four intratumoral features (Laws and Haralick feature families). Figure 5 shows the feature expression maps for the INV and MIA cases. The notation for the models constructed using these features is explained in Table 1. On the training cohort (DTrain, N = 106), the logistic regression AUC for M_R was 0.917 [0.87-0.97]. On the independent blinded test set (DTest, N = 162), the same classifier yielded an AUC of 0.88 (Table 2).
In the subgroup analysis, the radiomic model M_R consistently distinguished INV from MIA across nodule sizes, although its AUC was lower for nodules <1 cm (Table 2).
When nodule area was added to the radiomic features in the logistic regression classifier (M_R+A), there was no statistically significant improvement in AUC on the validation set compared with M_R alone.

Experiment 2: Comparing the Radiomics Analysis With Readers
We evaluated the individual radiologists (M_HR) as well as their combined performance with the classifier (M_R+HR). Reader 1 had an AUC of 0.815 and an accuracy of 0.748 for distinguishing MIA from INV cases, whereas Reader 2 had an AUC of 0.796 and an accuracy of 0.742 (Table 3).
Within nodules <1 cm in size, the classifier demonstrated an improvement over the radiologists' interpretations.
… (31)(32)(33). Lobectomy is considered the standard surgical treatment for INV (13). Prior studies using CT features such as air bronchograms and lesion borders have not been able to accurately distinguish invasive lesions (32). An accurate way to determine lesion invasiveness preoperatively on routine chest CT scans would help guide the timing, and potentially the extent, of resection (13). In this work, we developed a computerized model using textural patterns known as radiomics to differentiate MIA from INV cases on pre-treatment baseline CT scans from four institutions. We observed that radiomic features extracted from the intra- and peritumoral regions of these lung nodules carry information about tissue properties that is not apparent to the naked eye. Additionally, two radiologists examined and visually scored these scans in a blinded fashion, and integrating the radiologists' interpretations with the classifier output yielded the highest diagnostic accuracy on the test set (AUC = 0.909).
Although previous studies have successfully examined GGOs via radiomic analysis (20,21,34,35), most focus on textural patterns extracted from within the lesions to differentiate MIA from INV. Specifically, most employed gray-level co-occurrence matrix and wavelet-based feature families to identify INV cases (20,36). A few studies have further integrated clinical and morphological features into the radiomics model to improve accuracy (20,37).
Two of the top five features identified by our supervised radiomics approach belonged to the gray-level co-occurrence matrix (GLCM) feature families, in line with previously published results (20). We also found Laws and Laplace features extracted from within the nodule among the top discriminating features. These two feature families examine the high-frequency content of the region of interest (25). All intratumoral features showed higher expression in INV than in MIA nodules. The elevated expression of these radiomic features could reflect a more chaotic and haphazard microarchitecture within the comparatively high-risk invasive tumors (Figure 5).
We also interrogated the tumor microenvironment (TME) surrounding the nodule (i.e., the peritumoral region) to evaluate whether it provides complementary diagnostic information. During feature discovery, we defined the radiomic profile of these GGO nodules using a combination of intra- and peritumoral regions, and one of the top five features came from the peritumoral region, 3 to 6 mm outside the nodule. Recent studies have shed new light on the complex interaction between the tumor and host immune cells and immune responses. Altorki et al. (22) demonstrated the role of the TME in the progression from pre-invasive to invasive adenocarcinoma, observing a dominant regulatory T cell-mediated immune suppression that begins at the precursor stage and intensifies throughout malignant progression. A few studies also suggest that perinodular radiomic features may reflect changes in tumor microarchitecture or capture the presence of tumor-infiltrating lymphocytes (TILs) (18). We observed higher peritumoral CoLlAGe feature (24) expression in MIA cases. With respect to the perinodular region specifically, Wu G. et al. (36) did not observe an improvement in AUC when adding perinodular radiomic features to differentiate INV from MIA and AIS (p = 0.11); they found the most predictive features to come from the ground-glass and solid regions of the nodule. In contrast, Wu L. et al. (38) showed the utility of perinodular features for the same clinical problem. In our analysis, the peritumoral CoLlAGe radiomic features were statistically significant across the training and testing cohorts (Appendix 1; p < 0.01).
CoLlAGe captures higher-order co-occurrence patterns of local gradient tensors at the voxel level and has been shown to be diagnostic and prognostic in a variety of disease settings (17,18,24). Additionally, unlike the study by Wu et al. (36), our analysis included pure GGOs in addition to semisolid nodules.
We further compared our radiomic model with a model based on tumor size. Studies show the two-dimensional diameter of the nodule to be one of the strongest predictors for pulmonary nodule risk classification in quantitative CT image analysis. Xu et al. (34) found the diameter of GGOs to differ significantly between MIA and INV nodules, and a conventional model constructed from clinical and quantitative features (such as age, diameter, and density) yielded the best AUC (0.848; 95% CI = 0.750-0.946); adding radiomic features to the clinical and quantitative models did not improve the performance of the combined model (34). In contrast, multiple studies have reported an added benefit of radiomics over clinical and quantitative models (20,37). Weng et al. (20) constructed a nomogram using lesion shape, solid component, and radiomic features from the nodule to obtain an AUC of 0.88. Similarly, Luo et al. (37) used three CT features (pleural indentation, solid component size, and solid component proportion) and one radiomic feature to differentiate invasive pulmonary adenocarcinoma (IPA) from non-IPA, achieving a final AUC of 0.903. In our analysis, the radiomic model was superior to the model constructed with nodule area in both the training and testing sets, and adding nodule diameter to the radiomics model did not improve performance, particularly in the independent validation set. We further examined the subset of nodules with a diameter of less than 10 mm and found that our radiomics classifier remained discriminative within these smaller lesions, with an AUC of 0.76 [0.53-0.98]. Another distinctive aspect of our study was the integration of classifier output with expert radiologists' visual assessment of the tumors.
The classifier showed an overall improvement of approximately 4.5% over the radiologists' interpretations. The radiologists had high sensitivity but poor specificity. After combining the probabilities of the machine learning classifier with the radiologists' scores, the AUC improved to 0.909, from 0.867 for the classifier alone (p < 0.05) and 0.816 for the radiologists alone (p < 0.05).
Overall, our study makes three main novel contributions: its multi-institutional nature, the inclusion of a novel radiomic descriptor in the analysis, and the comparison and integration of human and machine assessments to create accurate consensus models.
Despite the progress made in this study, our work has some limitations. First, the model was developed entirely retrospectively; prospective evaluation will be required before it can be deployed clinically. Second, although the data came from multiple institutions, we did not perform truly independent validation, since all cases from the individual sites were pooled and then randomly divided into training and testing sets. Future work will entail prospective data as well as validation on data from sites independent of those used to develop the model.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
PV, DJ, MB, HP, RG, FJ, KL-CH, G-YL, and VV were involved in collecting data. PV and KB performed the analysis and wrote the first draft of the manuscript. AG and PR performed the radiologists' evaluation, which was integrated with the imaging model. AM and PL outlined the experimental design. All authors had access to the data and approved the manuscript. AM decided to submit the manuscript. All authors contributed to the article and approved the submitted version.
Copyright © 2022 Vaidya, Bera, Linden, Gupta, Rajiah, Jones, Bott, Pass, Gilkeson, Jacono, Hsieh, Lan, Velcheti and Madabhushi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.