CT radiomics to differentiate neuroendocrine neoplasm from adenocarcinoma in patients with a peripheral solid pulmonary nodule: a multicenter study

Purpose To construct and validate a computed tomography (CT) radiomics model for differentiating lung neuroendocrine neoplasm (LNEN) from lung adenocarcinoma (LADC) manifesting as a peripheral solid nodule (PSN) to aid in early clinical decision-making. Methods A total of 445 patients with pathologically confirmed LNEN or LADC from June 2016 to July 2023 were retrospectively included from five medical centers. The patients were split into a training set (n = 316; 158 LNEN) and an external test set (n = 129; 43 LNEN); the training set was further divided into a cross-validation (CV) training set and a CV test set using ten-fold CV. A support vector machine (SVM) classifier was used to develop the semantic, radiomics and merged models. Diagnostic performance was evaluated by the area under the receiver operating characteristic curve (AUC) and compared using the Delong test. Preoperative neuron-specific enolase (NSE) levels were collected as a clinical predictor. Results In the training set, the AUCs of the radiomics model (0.878 [95% CI: 0.836, 0.915]) and merged model (0.884 [95% CI: 0.844, 0.919]) were significantly higher than that of the semantic model (0.718 [95% CI: 0.663, 0.769]; both p < .001). In the external test set, the AUCs of the radiomics model (0.787 [95% CI: 0.696, 0.871]), merged model (0.807 [95% CI: 0.720, 0.889]) and semantic model (0.729 [95% CI: 0.631, 0.811]) did not differ significantly. The radiomics model outperformed NSE in sensitivity in the training set (85.3% vs 20.0%; p < .001) and external test set (88.9% vs 40.7%; p = .002). Conclusion The CT radiomics model could non-invasively, effectively and sensitively differentiate LNEN from LADC presenting as a PSN to assist in treatment strategy selection.


Introduction
Lung neuroendocrine neoplasm (LNEN) encompasses a spectrum of tumors that originate from pulmonary neuroendocrine cells, including small cell lung cancer (SCLC), large cell neuroendocrine carcinoma and carcinoid tumor. LNEN accounts for approximately 20% of pulmonary primary malignant tumors and its incidence is constantly increasing (1, 2). In contrast, lung adenocarcinoma (LADC), the predominant histological type, mainly arises from the alveolar epithelial cells of small bronchial mucosa and represents approximately 40% of pulmonary primary malignant tumors (3, 4). LADC is often treated with surgery, and early-stage cases can even be cured by lobectomy. Moreover, segmentectomy is recommended for LADC with a diameter ≤ 2 cm (5, 6). LNEN, however, particularly in poorly differentiated cases with rapid growth, often demonstrates heightened metastatic potential upon detection, leading to a more advanced stage of disease and less benefit from surgery or localized treatment (1, 7-9). For early-stage patients with LNEN detected on chest computed tomography (CT) scans, surgical resection is recommended after ruling out distant metastasis through positron emission tomography/computed tomography and brain magnetic resonance imaging and confirming negative mediastinal lymph nodes on pathology (10-14). Furthermore, lobectomy is preferred over sublobectomy (14). Consequently, the different biological behaviors of LNEN and LADC significantly affect treatment strategies and prognosis, and early diagnosis is crucial to guide treatment and improve prognosis.
CT, as the preferred imaging method for chest diseases, plays a crucial role in the non-invasive diagnosis of lung cancer. In the clinic, LNEN typically presents as a central mass with rapid growth, while LADC often manifests as a peripheral nodule with a varying ground-glass component. In contrast to these typical manifestations, LNEN appearing as a peripheral solid nodule (PSN) is exceedingly rare and shares similar radiological findings with LADC. Moreover, both LNEN and LADC manifesting as a PSN are primarily observed in the early stages and typically lack associated clinical symptoms or signs (15). Therefore, the preoperative differential diagnosis of LNEN and LADC appearing as a PSN is quite challenging. Although radiologists can distinguish LNEN from LADC to some extent by analyzing their CT radiological findings, the evaluation of these findings is subjective and prone to interobserver variation (16). Additionally, preoperative serum neuron-specific enolase (NSE) is a prevalent tumor marker for non-invasive clinical prediction of LNEN. However, its predictive power is limited by its relatively low sensitivity, ranging from 30% to 72.5% (17-20).
Radiomics, a non-invasive, quantitative and objective prediction method, can extract feature information from digital images to assist in clinical decision-making (21-23). Previous studies have demonstrated that radiomics can effectively differentiate between LNEN and other cancers (24-26). However, research concerning the differential diagnosis of peripheral LNEN and LADC is scarce, and the existing studies were conducted at a single center and lacked independent external validation (19, 27). Therefore, the objective of this study was to develop a radiomics model using preoperative chest thin-section non-contrast CT to discriminate LNEN from LADC presenting as a PSN. Independent external validation was then performed to further explore its robustness and generalizability.

Materials and methods
The institutional review boards of five participating centers (Zhongshan Hospital [center 1], Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine [center 2], Zhongshan-Xuhui Hospital of Fudan University [center 3], Fudan University Shanghai Cancer Center [center 4], Affiliated Hospital of North Sichuan Medical College [center 5]) approved this retrospective multicenter study. Written informed consent was waived because of the retrospective nature of this study.

Study patients
Patients from five medical centers who underwent needle biopsy or surgical resection for primary LNEN (between June 2016 and July 2023) were considered for this retrospective study. The inclusion criteria were as follows: (a) pathological confirmation of primary LNEN, (b) chest thin-slice (≤ 2 mm) non-contrast CT within eight weeks before needle biopsy or surgery, (c) lesions located below the lung segment bronchus, (d) lesions with a long axis of ≤ 3 cm in the maximum cross-section, (e) solid lesions. The exclusion criteria were as follows: (a) receiving other treatments before pathological confirmation, (b) multifocal cases, (c) poor-quality CT images. The detailed process of recruitment is presented in Figure 1.
Patients from centers 1-3 comprised the training set for model training and internal validation, while those from centers 4 and 5 served as the external test set for external validation. The inclusion and exclusion criteria for LADC were the same as those for LNEN, except for the pathological diagnosis. Because male cases predominated in LNEN, LADC patients from centers 1-3 included in the training set were matched 1:1 by sex and age to minimize differences between groups and to better train the models. To evaluate the generalization of the models under conditions closer to the real world, twice as many LADC cases as LNEN cases were collected chronologically for the external test set from centers 4 and 5. The collection process was stopped once the number of LADC cases reached twice that of LNEN. Data collection spanned from July 2018 to May 2023 at centers 4 and 5.

CT study protocols
Patients underwent chest thin-slice (slice thickness ranging from 1.00 to 2.00 mm) non-contrast CT within 8 weeks before needle biopsy or surgery. Detailed imaging protocols are explained in Supplementary Table S1.

Clinical characteristics and radiological signs assessment
Clinical data, including sex, age, and preoperative NSE levels (if available), were collected from the electronic medical record system. The NSE levels were dichotomized at a cutoff value of 16.3 ng/ml to serve as a clinical predictor for LNEN (NSE level ≥ 16.3 ng/ml). The volumes of interest (VOIs) were first segmented automatically by a deep learning network provided by the commercial software uAI Research Portal (United Imaging Intelligence Co., Ltd, China) (28). Subsequently, these VOIs were successively checked by a junior radiologist (XYL, with 2 years of experience in chest imaging) and a senior radiologist (FS, with 22 years of experience in chest imaging) and corrected if necessary. The radiological signs were initially evaluated by XYL and then reviewed by FS. The seven evaluated radiological signs (Figure 2) were as follows: (a) outer 1/3 lung zone, (b) upper lobe of right lung, (c) lobulation, (d) spiculation, (e) pleural indentation, (f) air bronchogram, (g) vascular convergence sign. The outer 1/3 lung zone was defined by dividing each lung into three equal parts using concentric circles starting from the hilus and selecting the outermost third of the lung, which is another method for differentiating central from peripheral types.

Radiomics feature extraction
To minimize noise interference and normalize the background information prior to feature extraction, we transformed the grayscale images using a window level of -600 HU and a window width of 1200 HU. The image voxel dimensions were resampled to 1×1×1 mm (x-, y-, z-axis). Features were extracted from the following image types: Original; Wavelet; LoG with sigma values of 1, 2, 3, 4, 5; Square; SquareRoot; Logarithm; Exponential; Gradient. From each image type, seven types of features were extracted: shape-based; first-order; second-order: grey level co-occurrence matrix (GLCM), grey level dependence matrix (GLDM), grey level size zone matrix (GLSZM), grey level run length matrix (GLRLM), neighborhood gray-tone difference matrix (NGTDM). In total, 1781 features were extracted for each patient.
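As an illustration of the windowing step, the following minimal sketch clips Hounsfield unit (HU) values to the window defined by a level of -600 HU and a width of 1200 HU (that is, -1200 to 0 HU) and rescales them to the range 0 to 1. The function name `apply_window` and the rescaling to 0-1 are assumptions made for this example; the source does not specify how the windowed intensities were encoded.

```python
def apply_window(hu_values, level=-600.0, width=1200.0):
    """Clip HU values to a display window and rescale them to [0, 1].

    Hypothetical helper: the 0-1 output range is an assumption, not a
    detail reported in the study.
    """
    lower = level - width / 2.0   # -1200 HU for the default window
    upper = level + width / 2.0   # 0 HU for the default window
    windowed = []
    for hu in hu_values:
        clipped = min(max(hu, lower), upper)          # clip to the window
        windowed.append((clipped - lower) / (upper - lower))
    return windowed


# Toy voxel values in HU (not study data).
example = apply_window([-2000, -1200, -600, 0, 400])
print(example)
```

Values below the window map to 0 and values above it map to 1, so dense structures outside the lung window no longer dominate the intensity range.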

Model development
The model construction process, illustrated in Figure 3, employed the open-source Python package scikit-learn (version 0.24.2; https://scikit-learn.org/stable/) for data processing and model construction. Firstly, each extracted feature underwent Z-score normalization to ensure comparability. Secondly, the recursive feature elimination (RFE) method was applied for feature selection. Subsequently, the training set was split into a cross-validation (CV) training set and a CV test set by the ten-fold CV method to train and internally validate the radiomics model based on the SVM classifier. The optimal model parameters obtained from ten-fold CV were then fitted to the whole training set to check for overfitting. Additionally, the radiomics model was externally validated in the external test set to evaluate its generalization.
In the training set, univariable and multivariable logistic regression analyses were employed to identify independent risk factors for distinguishing LNEN from LADC among the standardized radiological signs. These risk factors were then used to develop a semantic model. The merged model incorporated the radiological signs used in the semantic model and the radiomics scores from the radiomics model to investigate whether combining radiological signs and radiomics information could improve predictive performance. The SVM classifier and ten-fold CV were likewise used in the construction and internal validation of the semantic model and merged model. External validation of these two models was performed in the external test set.
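The radiomics workflow described above (Z-score normalization, RFE, SVM, ten-fold CV) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the linear kernel and the placement of RFE inside the cross-validated pipeline are choices made for the example, and the value of 14 retained features is borrowed from the number of selected texture features mentioned in the Discussion.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a radiomics feature table (NOT study data).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=6, n_redundant=4,
    class_sep=1.5, random_state=0,
)

pipeline = Pipeline([
    ("zscore", StandardScaler()),                 # Z-score normalization
    # RFE needs an estimator exposing coef_, hence the linear kernel (assumption).
    ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=14, step=2)),
    ("svm", SVC(kernel="linear")),                # final SVM classifier
])

# Ten-fold stratified CV; each fold refits scaling, selection and classifier.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_auc = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print(f"mean CV AUC: {cv_auc.mean():.3f}")
```

Keeping normalization and feature selection inside the pipeline means they are refitted on each CV training fold, which avoids leaking information from the held-out fold into the selected features.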

Statistical analyses
Continuous variables were presented as medians and interquartile ranges (IQR) and compared between groups using the Mann-Whitney U test. Categorical variables were presented as frequencies and percentages, and their group comparisons were conducted by Pearson's chi-squared test or McNemar test. Univariable and multivariable logistic regression analyses were conducted to identify risk factors, with odds ratios (OR) and 95% confidence intervals (CI). A nomogram was constructed for the merged model. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and compared using the Delong method. For the cases with NSE levels, the McNemar test was also used to compare the diagnostic performance metrics (e.g., accuracy, sensitivity, specificity) of the radiomics model and NSE in distinguishing LNEN from LADC. Calibration curves were plotted to compare the predicted values with the observed values. Decision curve analysis was used to assess clinical utility. Statistical analysis was performed with Python (version 3.9.12; https://www.python.org/), R software (version 4.2.2; https://www.r-project.org/) and SPSS software (version 25.0). A two-sided P value less than 0.05 was considered statistically significant.

Results
Compared with the LADC group, the LNEN group exhibited significantly lower occurrences of location in the outer 1/3 lung zone, lobulation, spiculation, and pleural indentation in both the training set and external test set (p < .05 for all) (Table 1). However, a statistical difference in air bronchogram was observed only in the training set (p < .001) and not in the external test set (p = .108). There was no statistical difference in the seven radiological signs between the training set and external test set (Supplementary Table S3).

Performance of models and NSE for differentiating LNEN from LADC
In the ten-fold analysis in the training set, the radiomics model and merged model had higher AUCs than the semantic model (Table 3). The semantic model, radiomics model and merged model recorded AUCs of 0.707 (95% CI: 0.648, 0.762), 0.879 (95% CI: 0.836, 0.919) and 0.887 (95% CI: 0.845, 0.925) in the CV training set, respectively. In the CV test set, AUCs were 0.708 (95% CI: 0.531, 0.863) for the semantic model, 0.852 (95% CI: 0.699, 0.972) for the radiomics model and 0.878 (95% CI: 0.738, 0.983) for the merged model. The optimal model parameters derived from the ten-fold CV were applied to the training set, with no overfitting observed for any of the three models.
In the training set, the AUCs of both the radiomics model (0.878 [95% CI: 0.836, 0.915]; p < .001) and merged model (0.884 [95% CI: 0.844, 0.919]; p < .001) were significantly higher than that of the semantic model (0.718 [95% CI: 0.663, 0.769]). However, the AUCs of both the radiomics model (0.787 [95% CI: 0.696, 0.871], p = .351) and merged model (0.807 [95% CI: 0.720, 0.889], p = .183) did not differ significantly from that of the semantic model (0.729 [95% CI: 0.631, 0.811]) in the external test set. The performance of all the models is shown in Table 4. The receiver operating characteristic curves, calibration curves and clinical decision curves are provided in Figures 5 and 6. The calibration curves showed that the radiomics model had the best agreement between the predicted probability and the actual probability. Decision curves showed that the three models could achieve net benefit within a reasonable range of threshold probabilities. Notably
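For readers less familiar with the metric, the AUC reported for each model equals the probability that a randomly chosen positive case (here, LNEN) receives a higher model score than a randomly chosen negative case (LADC), with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc_from_scores` and the toy labels and scores are illustrative assumptions, not study data.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = LNEN, 0 = LADC (made-up scores).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In this toy example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.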

Discussion
The limited existing studies primarily focus on cases presenting with masses and on the differential diagnosis of peripheral SCLC and LADC in single-center settings (19, 27). We developed and internally validated a radiomics model using preoperative chest thin-section non-contrast CT to discriminate LNEN from LADC manifesting as a PSN and performed external validation to assess the performance of the model. The AUCs of the radiomics model were 0.878 in the training set and 0.787 in the external test set. Furthermore, in the 254 patients with NSE examination, the radiomics model outperformed NSE in sensitivity in both the training set (85.3% vs 20.0%, p < .001) and external test set (88.9% vs 40.7%, p = .002). The satisfactory predictive performance of the CT radiomics model indicates its potential to non-invasively, quantitatively, objectively and sensitively discriminate between LNEN and LADC manifesting as a PSN, thereby aiding in treatment guidance.
Preoperative histological biopsy is a commonly used method for identifying the histological type of lung cancer when diagnosis is challenging. However, this method is invasive and highly dependent on the operator's experience for successful diagnosis. Compared with the localized sampling of biopsy, CT screening non-invasively offers comprehensive information about the lesion. In our study, LNEN presenting as a PSN less frequently showed lobulation, spiculation, pleural indentation and air bronchogram, which was consistent with previous studies on the differential diagnosis of peripheral SCLC and LADC (19, 20, 29). This consistency is possibly attributable to the fact that all LNENs originate from pulmonary neuroendocrine cells and that the LNENs included in our study were predominantly SCLC. The semantic model developed from radiological findings in this study achieved AUCs of 0.718 and 0.729 in the training set and external test set, respectively, which indicated that CT radiological findings could help differentiate LNEN from LADC appearing as a PSN to some extent. The differences in radiological findings between LNEN and LADC may be associated with the propensity of LADC to involve local regions and induce changes in surrounding pulmonary structures.

Selected features for the construction of the radiomics model and merged model. (A) Feature weight map of the radiomics model. (B) Nomogram of the merged model for differentiating neuroendocrine neoplasm from adenocarcinoma in patients with a peripheral solid pulmonary nodule. LNEN, lung neuroendocrine neoplasm.

Radiomics is considered a digital biopsy approach for predicting tumor biological characteristics (30-32). A previous study using a CT-based radiomics model successfully differentiated peripheral SCLC from LADC, yielding AUCs of 0.858 and 0.836 in the training set and validation set, respectively (19). Our radiomics model based on preoperative chest thin-slice non-contrast CT displayed satisfactory performance in distinguishing between LNEN and LADC presenting as a PSN, with AUCs of 0.879 and 0.852 for the CV training set and CV test set, respectively. Furthermore, this radiomics model still achieved an acceptable AUC of 0.787 in the external test set. The 14 selected second-order texture features (e.g., gradient-glcm-SumSquares, exponential-glrlm-RunLengthNonUniformity) of our radiomics model may reflect differences in the uniformity of lesion density (33), which might be related to the fact that SCLC exhibits greater homogeneity than LADC (20, 34). In addition, the radiomics method offered a quantitative and objective assessment approach, especially when combined with automatic three-dimensional segmentation rather than manual segmentation and two-dimensional segmentation (35-37). Therefore, the radiomics model could potentially mitigate misdiagnosis by inexperienced radiologists and enhance diagnostic reliability in comparison with the subjectivity and variability of the semantic model based on radiological signs evaluated by radiologists (38, 39). Additionally, the performance of the merged model improved on that of the radiomics model, suggesting that radiological signs may enhance diagnostic performance to some extent (40), but further validation with a larger sample size remains necessary. Besides, the inclusion of manually evaluated radiological signs in the merged model also made it less convenient and objective than the radiomics model.
NSE, a commonly used clinical predictor for LNEN, demonstrated a sensitivity of 72.5% in a cohort of 80 peripheral SCLC cases, half of which were in advanced stages (19). However, this sensitivity decreased to 52.4% in a smaller cohort of 21 SCLC cases presenting as a peripheral nodule (20). Moreover, the sensitivity was only 39.2% in resectable lung carcinoid tumor (41). This suggests that NSE expression may increase with more advanced stages and higher-grade LNEN.
Our study also had several limitations. Firstly, the retrospective nature of this study may have introduced selection bias, although efforts were made to match LNEN with LADC by sex and age in the training set to minimize differences between groups; this matching may itself affect the models' performance to some extent. Prospective studies are therefore necessary to validate the generalizability of our model. Secondly, the sample size in our study was relatively limited. Although we collected 201 cases of peripheral LNEN from five centers, a larger sample size is required for further validation and data-driven deep learning. Thirdly, enlargement of mediastinal or hilar lymph nodes was not included, as our study mainly focused on the characteristics of the nodule itself. Finally, the radiomics features in this study were extracted solely from unenhanced chest CT images. While chest non-contrast CT scans are straightforward and low-cost, further studies using contrast-enhanced chest CT images are needed to identify subtler, visually imperceptible variations in density uniformity, thereby improving diagnostic accuracy.
In conclusion, the CT radiomics model demonstrated effective performance in distinguishing between LNEN and LADC in patients with a PSN. Therefore, the radiomics model may serve as a non-invasive, quantitative, objective and sensitive approach for differentiating peripheral LNEN from LADC.

FIGURE 1 Flow diagram of the patient selection from five medical centers. LNEN, lung neuroendocrine neoplasm; LADC, lung adenocarcinoma; CV, cross-validation; SCLC, small cell lung cancer; LCNEC, large cell neuroendocrine carcinoma. Center 1 indicates Zhongshan Hospital, Center 2 indicates Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Center 3 indicates Zhongshan-Xuhui Hospital of Fudan University, Center 4 indicates Fudan University Shanghai Cancer Center, Center 5 indicates Affiliated Hospital of North Sichuan Medical College.

FIGURE 2 Radiological signs of four types of lung tumor. (A) A 74-year-old man with lung adenocarcinoma in the medial segment of the middle lobe of the right lung, exhibiting signs of lobulation, spiculation, air bronchogram, and pleural indentation. (B) A 58-year-old man with lung large cell neuroendocrine carcinoma in the apical segment of the upper lobe of the right lung, displaying lobulation, spiculation and pleural indentation. (C) A 58-year-old man with lung carcinoid tumor in the anterior basal segment of the lower lobe of the right lung, demonstrating lobulation, air bronchogram, and vascular convergence sign. (D) An 81-year-old man with small cell lung cancer in the posterior segment of the upper lobe of the right lung, presenting signs of lobulation, spiculation, and vascular convergence sign. Spiculation (red arrow), air bronchogram (yellow arrow), pleural indentation (blue arrow), vascular convergence sign (green arrow).
conducted by Pearson's chi-squared test or McNemar test.

FIGURE 5 Receiver operating characteristic curve analysis of models for differentiating lung neuroendocrine neoplasm from adenocarcinoma in the training set (A) and external test set (B). AUCs are reported with 95% CIs in parentheses. AUC, area under the receiver operating characteristic curve.
Among the 201 patients with primary LNEN, 122 cases were SCLC, 41 cases were large cell neuroendocrine carcinoma and 38 cases were carcinoid tumor. Additionally, 244 patients with primary LADC were included in this study. A total of 445 patients (median age, 64 years [IQR, 57-69 years]; 345 men) were included, with 316 (158 LNEN) in the training set and 129 (43 LNEN) in the external test set. Furthermore, among the 445 patients included in this study, 254 patients had NSE examinations (median age, 64 years [IQR, 58-69 years]; 189 men): 161 (75 LNEN) in the training set and 93 (27 LNEN) in the external test set. All baseline characteristics are detailed in Table 1 and Supplementary Table S2.

TABLE 1
Baseline patient characteristics in the training set and external test set.
Unless otherwise indicated, data are numbers of patients, and data in parentheses are percentages. LNEN, lung neuroendocrine neoplasm; LADC, lung adenocarcinoma; RU, upper lobe of right lung; NA, not applicable. † Data are medians, with interquartile ranges in parentheses. *P-values are statistically significant.

TABLE 2
Logistic regression analysis of variables for their association with LNEN and LADC in the training set.

TABLE 3
Mean AUCs and accuracies of models in CV training set and CV test set.
Unless otherwise indicated, data are the means derived from the 10-fold cross-validation. AUC, area under the receiver operating characteristic curve; CV, cross-validation.

TABLE 4
Diagnostic performance of models for differentiating LNEN from LADC.
Unless otherwise indicated, data are percentages, with proportions of patients (numerator/denominator) in parentheses. Ref, reference; LNEN, lung neuroendocrine neoplasm; LADC, lung adenocarcinoma; AUC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value. † Data in parentheses are 95% CIs. ‡ P value was calculated with the Delong test and indicates the significance level of the comparison of AUCs with the semantic model as the reference in the corresponding data set. *P-values are statistically significant.
grade LNEN. Regrettably, only 254 patients (102 LNEN) in our study had NSE levels available, possibly because the rarity of LNEN presenting as a PSN leads clinicians to overlook it and not order an NSE examination. In this study, the sensitivity of NSE was notably low, only 20.0% for the training set and 40.7% for the external test set, potentially due to the predominance of early-stage cases and the inclusion of lung carcinoid tumor cases. Compared with NSE, the radiomics model exhibited statistically significantly superior sensitivity of 85.3% and 88.9% for the training set and external test set, respectively, across the cohort of 254 patients undergoing NSE testing. These findings suggest that the radiomics model offered a substantial improvement over NSE in suggesting LNEN, positioning it as a promising non-invasive predictive tool. Consequently, this radiomics model could prompt subsequent positron emission tomography/computed tomography, brain magnetic resonance imaging and/or needle biopsy examination for clinical diagnosis and staging, guiding the selection of optimal treatment strategies.
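The paired comparison of sensitivities between the radiomics model and NSE relies on the McNemar test, which uses only the discordant cases among LNEN patients: those detected by one method but missed by the other. The sketch below implements the exact (binomial) form of the test; whether the study used the exact or the chi-squared form is not stated, and the counts in the example are made up for illustration rather than taken from the study.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases detected by method A only; c: cases detected by method B only.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0                      # no discordant pairs, no evidence
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)         # double the smaller tail, cap at 1


# Hypothetical counts: 12 LNEN cases flagged only by the radiomics model,
# 3 flagged only by NSE (illustrative values, NOT study data).
p_value = mcnemar_exact(12, 3)
print(p_value)
```

Cases on which both methods agree do not enter the calculation, which is why the test suits comparisons made on the same patients.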