Nomogram combining clinical and radiological characteristics for predicting the malignant probability of solitary pulmonary nodules measuring ≤ 2 cm

Background: At present, identifying whether small (≤ 2 cm) solitary pulmonary nodules (SPNs) are benign or malignant is an urgent clinical challenge. This retrospective study aimed to develop a clinical prediction model combining clinical and radiological characteristics for assessing the probability of malignancy in SPNs measuring ≤ 2 cm.
Methods: We included patients with SPNs measuring ≤ 2 cm who underwent pulmonary resection with definite pathology at Qilu Hospital of Shandong University from January 2020 to December 2021. Clinical features, preoperative biomarker results, and computed tomography characteristics were collected. The enrolled patients were randomly assigned at a ratio of 7:3 to a training cohort of 775 and a validation cohort of 331. The training cohort was used to construct the predictive model, while the validation cohort was used to test the model independently. Univariate and multivariate logistic regression analyses were performed to identify independent risk factors. The prediction model and nomogram were established based on the independent risk factors. The receiver operating characteristic (ROC) curve was used to evaluate the discriminative ability of the model. Calibration was evaluated using the Hosmer–Lemeshow test and calibration curves. The clinical utility of the nomogram was assessed by decision curve analysis (DCA).
Results: A total of 1,106 patients were included in this study. Among them, the malignancy rate of SPNs was 85.08% (941/1,106). Logistic regression identified six independent risk factors: age, carcinoembryonic antigen, nodule shape, calcification, maximum diameter, and consolidation-to-tumor ratio. The area under the ROC curve (AUC) for the training cohort was 0.764 (95% confidence interval [CI]: 0.714–0.814), and the AUC for the validation cohort was 0.729 (95% CI: 0.647–0.811), indicating that the prediction accuracy of the nomogram was relatively good. The calibration curve of the predictive model also demonstrated good calibration in both cohorts. DCA indicated that the clinical prediction model was useful in clinical practice.
Conclusion: We developed and validated a predictive model and nomogram for estimating the probability of malignancy in SPNs measuring ≤ 2 cm. With the application of predictive models, thoracic surgeons can make more rational clinical decisions while avoiding overtreatment and wasting medical resources.


Introduction
With the development and popularity of high-resolution computed tomography (HRCT) as a dominant approach for lung cancer screening, the detection rate of solitary pulmonary nodules (SPNs) has significantly increased in recent years (1-3). Large lung cancer screening trials have shown that the detection rate of SPNs ranges from 8 to 51%, with most reports clustering around 20% (4). An SPN is defined as a single, focal, round, hyperdense lung shadow ≤ 3 cm in diameter, surrounded by the lung parenchyma, without pulmonary atelectasis, lymph node enlargement, or pleural effusion (5,6). Among them, an SPN with a size ≤ 20 mm is defined as a small SPN (7). Although SPN size is an independent risk factor for malignancy (8-10), approximately 67.5-78% of small SPNs are malignant (7). The ability to accurately distinguish benign from malignant SPNs is critical to providing patients with more beneficial and personalized treatment, and it is currently both a research hotspot and a difficult area of clinical work (11).
Screening for lung cancer using low-dose computed tomography is an effective modality that can reduce mortality from lung cancer (12, 13). However, the qualitative diagnosis of SPNs measuring ≤ 2 cm remains a challenge for thoracic surgeons. The fact that the pathological results of small SPNs are usually confirmed by invasive or minimally invasive methods imposes a heavy burden on patients and healthcare systems (14). Therefore, a non-invasive method to identify benign and malignant SPNs is highly beneficial to clinical practice. At present, many factors have been identified to help determine the nature of SPNs before surgery. For example, previous studies have demonstrated the value of combined cytokine and tumor marker assays for the differential diagnosis of benign and malignant SPNs, which can improve the accuracy of early lung cancer diagnosis (15-23). Radiological characteristics, such as consolidation-to-tumor ratio (CTR), nodule diameter, presence of spiculation, and location in the lobe, are also increasingly used in the diagnosis of early lung cancer (9, 24-26). Among them, CTR has been a hotspot in lung cancer imaging research in recent years. Many studies have also confirmed that it can be used as the main reference index for judging the malignancy of early lung cancer and for sub-lobar resection, and it is also an independent correlate of recurrence and prognosis of early lung cancer (27-32).
To date, many predictive models for SPN diagnosis have been proposed, such as the classic Mayo model, the Brock University model, the Peking University People's Hospital (PKUPH) model, and the VA model. Most of these models have achieved more than 80% diagnostic accuracy. However, each model has its own shortcomings and needs to be further optimized.
The aim of this study was to establish a new predictive model and nomogram to assist in the identification of benign and malignant SPNs measuring ≤ 2 cm based on clinical characteristics, imaging features, and hematological biomarkers, which can help thoracic surgeons make more rational clinical decisions.

Patients and methods
This single-center study was approved by the Institutional Review Board of Qilu Hospital of Shandong University (registration number: KYLL-202008-023-1). Owing to the retrospective nature of the study, the need for written informed consent was waived. All methods were performed in accordance with the Declaration of Helsinki.

Patient selection
This was a retrospective study of patients with small SPNs who underwent minimally invasive pulmonary resection with definite pathological results from January 2020 to December 2021 at Qilu Hospital of Shandong University. The inclusion criteria were as follows: (1) patients with a single intra-pulmonary nodule suggested by chest computed tomography (CT) within 1 month before surgery, (2) SPN with a maximum diameter ≤ 2 cm, (3) absence of pulmonary atelectasis and of imaging signs of active pulmonary inflammation, and (4) clear pathological findings obtained by surgical resection. The exclusion criteria were as follows: (1) patients aged < 18 years, (2) patients undergoing thoracotomy, (3) incomplete perioperative data, and (4) patients with a history of other malignant disease within 5 years.
All enrolled patients were randomly assigned to the training cohort and validation cohort at a ratio of 7:3 using a random split sample method. The training cohort was used to develop the prediction nomogram, while the validation cohort was used to test the performance of the nomogram.
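A 7:3 random split of this kind can be sketched in a few lines of Python. This is only an illustration, not the procedure the authors ran: the function name `split_cohort`, the seed, and the use of integer patient indices are assumptions, and rounding the training size up is chosen here simply because it reproduces the reported 775/331 split for 1,106 patients.

```python
import math
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2021):
    """Shuffle patient identifiers and split them into training/validation lists.

    The seed and the ceil-based rounding are illustrative assumptions;
    the source only states that a random 7:3 split was used.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # local generator so the split is reproducible
    rng.shuffle(ids)
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..1105 standing in for the 1,106 enrolled patients.
train_ids, validation_ids = split_cohort(range(1106))
print(len(train_ids), len(validation_ids))
```

Fixing the seed keeps the split reproducible, which matters when the same cohorts must be reused for model fitting and validation.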
The timing of blood collection was standardized: all patients had their blood drawn in a fasting, resting state on the morning of the second day of hospitalization. All blood test results were obtained within one week before surgery.
All chest CT examinations, including the whole chest scan, were performed in the supine position. Each scan was acquired during a breath-hold after deep inspiration. Two radiologists with more than five years of experience in chest radiology independently measured each imaging feature, and any discrepancies were reevaluated by a third radiologist with more than 20 years of experience in chest radiography. Conflicts were resolved by consensus.
Central location was defined as an SPN measuring ≤ 2 cm located within the inner two-thirds of the lung parenchyma on axial CT images, while peripheral location was defined as a nodule located within the outer third. Spiculation was defined as strands that spread from the nodule margins into the lung parenchyma but did not contact the pleural surface. Cavitation signs were defined as gas-filled spaces seen as transparent or low-attenuation regions. Calcification was defined as having one of the following patterns on CT imaging: stratification, central nodule, diffusion, or popcorn pattern. The vascular penetration sign was defined as the presence of a vessel crossing the nodule on CT images. Pleural adhesion was defined as linear attenuation extending from the SPN toward the pleura or the primary or secondary fissure. The CT bronchus sign was defined as direct bronchial involvement of the nodule. Lobulation was defined as a wavy or scalloped portion of the lesion surface and strands extending from the nodule margins into the lung parenchyma. The lymph node enlargement sign was defined as mediastinal lymph nodes with a short axis > 1 cm on CT images. The pleural effusion sign was defined as blunting of the costophrenic angle visible on CT images. CTR was the ratio of the diameter of the solid component of the pulmonary nodule to the maximum diameter of the nodule.
PNI, NLR, dNLR, MLR, NLPR, SIRI, AISI, SII, and PIV were calculated using the following formulas:
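The formulas themselves are not reproduced in this text. As a rough guide only, the Python sketch below uses definitions that are commonly found in the literature for these indices; it is an assumption that the study used the same definitions and units (cell counts in 10^9/L, albumin in g/L), and the function and parameter names are illustrative.

```python
def inflammatory_indices(neutrophils, lymphocytes, monocytes, platelets,
                         white_cells, albumin):
    """Return commonly used blood-derived indices (assumed definitions).

    Cell counts are assumed to be in 10^9/L and albumin in g/L. These are
    literature conventions, not formulas confirmed by the source text.
    """
    n, l, m, p = neutrophils, lymphocytes, monocytes, platelets
    return {
        "PNI": albumin + 5 * l,        # prognostic nutritional index
        "NLR": n / l,                  # neutrophil-lymphocyte ratio
        "dNLR": n / (white_cells - n),  # derived NLR
        "MLR": m / l,                  # monocyte-lymphocyte ratio
        "NLPR": n / (l * p),           # neutrophil to lymphocyte and platelet ratio
        "SIRI": n * m / l,             # systemic inflammation response index
        "AISI": n * m * p / l,         # aggregate index of systemic inflammation
        "SII": n * p / l,              # systemic immune-inflammation index
        "PIV": n * p * m / l,          # pan-immune-inflammation value
    }


# Hypothetical blood counts, for illustration only.
indices = inflammatory_indices(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5,
                               platelets=250.0, white_cells=7.0, albumin=40.0)
print(indices["NLR"], indices["SII"])
```

Under these conventions AISI and PIV reduce to the same expression, which is one reason the exact definitions used in a given study should be checked before results are compared.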

Establishment of the predictive model
Data from the training cohort were analyzed using univariate analysis to assess all factors affecting the probability of SPN malignancy. All factors with P values less than 0.05 in the univariate analysis were then entered into a multivariate logistic regression analysis to identify independent predictors. R statistical software (Windows version 4.2.1, http://www.r-project.org/) was used to create the prediction model and nomogram, incorporating the independent risk factors identified in the multivariate analysis. A score for each variable was calculated using the regression model, and the predicted probability of malignancy could be derived by summing the scores of the individual variables.

Predictive model and nomogram performance
The performance of the predictive nomogram was assessed by discriminatory power, calibration, and clinical utility. Discriminative power is the ability of a model to correctly distinguish between events and non-events. We used receiver operating characteristic (ROC) curves to assess the discriminative efficiency of the predictive nomogram (33). Calibration measures how well the predicted probabilities agree with the actual results. The Hosmer-Lemeshow test was used to assess calibration, with a P value greater than 0.05 indicating satisfactory calibration (34). Then, a nomogram calibration curve was plotted to further evaluate the calibration. Internal validation was performed by using a bootstrapping method that was repeated 1,000 times (35). Decision curve analysis (DCA) was used to assess the clinical utility of the predictive nomogram based on the net benefits at different threshold probabilities (36). Based on the ROC curve analysis of the training cohort, the optimal cutoff value was determined when the Youden index (sensitivity + specificity − 1) reached its maximum value.
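The discrimination metrics described above can be illustrated with a small pure-Python sketch. It computes the AUC as the proportion of malignant/benign pairs ranked correctly (ties counted as one half) and picks the cutoff that maximizes the Youden index. The function names and the toy data are illustrative assumptions; the study itself performed these analyses in R.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, scores):
    """Return (cutoff, sensitivity, specificity) maximizing the Youden index.

    A case is called positive when its score is >= the cutoff. Candidate
    cutoffs are the observed scores, scanned from highest to lowest.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        sens = tp / n_pos
        spec = 1 - fp / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (cutoff, sens, spec)
    return best


# Toy data (1 = malignant, 0 = benign); not taken from the study.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(auc_by_ranking(labels, predicted))
print(youden_cutoff(labels, predicted))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for cohorts of around a thousand patients but would be replaced by a rank-based computation for larger data sets.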

Statistical analysis
All statistical analyses were performed using SPSS 26.0 (SPSS Inc., Chicago, Illinois, USA) and R statistical software (Windows version 4.2.1, http://www.r-project.org/). Normally distributed continuous variables were expressed as the mean ± standard deviation and compared using Student's t-test. Non-normally distributed continuous variables were expressed as median (interquartile range) and compared between the two groups using the Mann-Whitney U test. Categorical variables were compared using Pearson's chi-square test or Fisher's exact test. Two-sided P values < 0.05 were considered statistically significant.

Results

Patient characteristics
The procedure for identifying and selecting eligible patients is shown in Figure 1. Our study initially included 2,213 patients who underwent surgery from January 2020 to December 2021 at our hospital. These patients were consecutive and unselected. After screening, a total of 1,106 eligible patients were included in our study. Among them, the malignancy rate of SPNs measuring ≤ 2 cm was 85.08% (941/1,106). Enrolled patients were then randomly assigned to either the training cohort (n = 775) or validation cohort (n = 331) at a ratio of 7:3, and there were no significant differences in any variable between the two cohorts (Table 1). Patients were divided into malignant and benign groups according to the pathology of the SPNs. The characteristics of the two groups in the training and validation cohorts are shown in Table 2.

Nomogram establishment
All six independent risk factors for SPNs measuring ≤ 2 cm were included to build the logistic regression model. The predicted probability of malignancy (p) for small SPNs could be calculated by using the following formula: ln[p/(1 − p)] = −2.511 × CTR + 1.368 × maximum diameter − 2.997 × calcification (no = 0; yes = 1) + 0.025 × age + 0.235 × CEA + 0.455 × shape (regular = 0; irregular = 1) − 0.941. Based on the above formula, a malignancy probability prediction nomogram for SPNs measuring ≤ 2 cm was drawn using R statistical software (Figure 3). As shown in this nomogram, there are a total of nine axes, and axes 2-7 represent the six variables in the prediction model. The score for each risk factor can be read by drawing a vertical line from its value up to the points axis at the top, and the scores can then be summed to obtain the total score. The total points axis is then used to predict the probability of malignancy for SPNs before surgery, and the appropriate surgical procedure can then be further selected.
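To make the formula concrete, the following Python sketch evaluates the logit and converts it to a probability with the logistic function. The coefficients are those reported above; the units (maximum diameter in cm, age in years, CEA in ng/mL) and the example patient are assumptions made for illustration, since the text does not state them alongside the formula.

```python
import math

# Coefficients as reported in the regression formula above.
INTERCEPT = -0.941
COEFFICIENTS = {
    "ctr": -2.511,
    "max_diameter": 1.368,    # assumed to be in cm
    "calcification": -2.997,  # 0 = no, 1 = yes
    "age": 0.025,             # assumed to be in years
    "cea": 0.235,             # assumed to be in ng/mL
    "irregular_shape": 0.455,  # 0 = regular, 1 = irregular
}


def malignancy_logit(**features):
    """Linear predictor ln[p / (1 - p)] for one patient."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in features.items())


def malignancy_probability(**features):
    """Predicted probability of malignancy via the logistic function."""
    return 1.0 / (1.0 + math.exp(-malignancy_logit(**features)))


# Hypothetical patient, for illustration only.
example = dict(ctr=0.5, max_diameter=1.5, calcification=0,
               age=60, cea=2.0, irregular_shape=1)
print(round(malignancy_logit(**example), 4))
print(round(malignancy_probability(**example), 3))
```

The resulting probability can then be compared against a chosen cutoff, such as the Youden-based value reported for the training cohort.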

Predictive performance and validation of the nomogram
The discriminative power of the predictive model and nomogram was assessed by the ROC curve (Figure 4). The area under the curve (AUC) for the training cohort was 0.764 (95% CI: 0.714-0.814), and the AUC for the validation cohort was 0.729 (95% CI: 0.647-0.811), indicating a relatively good predictive accuracy of the nomogram. The cut-off value for the ROC curve of the training cohort was 0.819, and the sensitivity and specificity were 0.680 and 0.766, respectively (Table 4). Calibration was evaluated using the Hosmer-Lemeshow test and calibration plots. P values for the Hosmer-Lemeshow test were 0.4348 for the training cohort and 0.3175 for the validation cohort, indicating no significant difference between the predicted and observed probabilities. The calibration plots for the training (Figure 5A) and validation (Figure 5B) cohorts also demonstrate good calibration of the predictive nomogram.
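For readers unfamiliar with the Hosmer-Lemeshow test, a minimal pure-Python sketch is given below. It assumes the common formulation with ten groups of roughly equal size ordered by predicted probability and g − 2 degrees of freedom; the source does not state the grouping used, and the function names are illustrative. The chi-square tail probability uses the closed form that holds for an even number of degrees of freedom, so the sketch needs an even number of groups.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, probs, groups=10):
    """Return (statistic, p_value) comparing observed and expected events.

    Cases are sorted by predicted probability and cut into `groups` chunks.
    Degrees of freedom are assumed to be groups - 2.
    """
    pairs = sorted(zip(probs, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        if 0 < expected < size:  # skip degenerate groups
            statistic += (observed - expected) ** 2 / (
                expected * (1 - expected / size))
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Toy data: every case predicted at 0.5 and exactly half of each group
# positive, so observed and expected counts agree perfectly.
toy_probs = [0.5] * 100
toy_labels = [1, 0] * 50
print(hosmer_lemeshow(toy_labels, toy_probs))
```

A large P value from this test indicates no detectable lack of fit; it does not by itself prove that the model is well calibrated, which is why calibration plots are usually reported alongside it.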

Clinical utility of the predictive nomogram
DCA was used to assess the clinical utility of the predictive nomogram (Figures 6A, B). The results show that the nomogram provided a greater net benefit across a broad range of threshold probabilities for predicting the risk of malignancy in SPNs measuring ≤ 2 cm in both the training and validation cohorts, indicating that the nomogram is clinically useful.
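The net benefit plotted in a decision curve follows a standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The Python sketch below illustrates this for a single threshold, together with the "treat all" reference strategy; the function names and toy data are assumptions for illustration and are not taken from the study.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (1 = malignant, 0 = benign); not taken from the study.
toy_outcomes = [1, 1, 0, 0]
toy_risks = [0.9, 0.6, 0.4, 0.1]
print(net_benefit(toy_outcomes, toy_risks, 0.5))
print(net_benefit_treat_all(toy_outcomes, 0.5))
```

A decision curve repeats this calculation over a grid of thresholds and compares the model with the "treat all" and "treat none" (net benefit of zero) strategies.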

Discussion
At present, lung cancer is the most frequent cause of cancer-related death (37-39). Most lung cancers are at an advanced stage when detected and have a poor prognosis. Enhancing the diagnosis rate of early-stage lung cancer to provide proper and rational treatment is crucial to increasing the survival rate (40). Several recent institutional retrospective studies have suggested that survival and recurrence rates may be the same for lobectomy and sub-lobar resection in patients with small lung cancers measuring ≤ 2 cm. Therefore, the management of patients with growing SPNs of 2 cm or smaller is a high priority for clinicians. In this study, we developed a clinical prediction model and designed a nomogram with good predictive performance for assessing the malignancy of small SPNs. This predictive nomogram can be used to estimate the probability of nodule malignancy in patients with SPNs measuring ≤ 2 cm, and thoracic surgeons can make more rational clinical decisions while avoiding overtreatment and wasting medical resources.
Table abbreviations: COPD, chronic obstructive pulmonary disease; ASA, American Society of Anesthesiologists; PNI, prognostic nutritional index; NLR, neutrophil-lymphocyte ratio; PLR, platelet-lymphocyte ratio; MLR, monocyte-lymphocyte ratio; dNLR, derived neutrophil-to-lymphocyte ratio; NLPR, neutrophil to lymphocyte and platelet ratio; SIRI, systemic inflammation response index; AISI, aggregate index of systemic inflammation; SII, systemic inflammation index; PIV, pan-immune-inflammation value; LDH, lactate dehydrogenase; SA, serum amyloid; 5'-NT, 5'-nucleotidase; Pro-GRP, pro-gastrin-releasing peptide; SCC, squamous cell carcinoma; Cyfra21-1, cytokeratin 19-fragments; CEA, carcinoembryonic antigen; CA125, carcinoma antigen 125; NSE, neuron-specific enolase; BMI, body mass index; FEV1, forced expiratory volume in one second; MVV, maximal voluntary ventilation; CTR, consolidation-to-tumor ratio.
FIGURE 2 Forest plot of the multivariate logistic regression analysis. CEA, carcinoembryonic antigen; CTR, consolidation-to-tumor ratio.
FIGURE 3 A nomogram for predicting the probability of malignancy in SPNs measuring ≤ 2 cm. CEA, carcinoembryonic antigen; CTR, consolidation-to-tumor ratio. There are a total of 9 axes, and axes 2-7 represent the 6 variables in the prediction model. The score for each risk factor can be read by drawing a vertical line from its value up to the points axis at the top, and the scores can then be summed to obtain the total score. The total points axis is then used to predict the probability of malignancy for SPNs measuring ≤ 2 cm before surgery.
In this study, multivariate logistic regression analysis showed that age, CEA, shape, calcification, maximum tumor diameter, and CTR were independent predictors for estimating SPN malignancy. Based on these results, a clinical prediction model for SPNs measuring ≤ 2 cm was developed by incorporating one general clinical indicator (age), four imaging indicators (shape, calcification, maximum tumor diameter, and CTR), and one laboratory indicator (CEA). Although various independent risk factors in this model have been previously reported (41-50), no model has yet included CTR along with clinical and laboratory indicators to predict the malignancy of SPNs measuring ≤ 2 cm.
Some patients have clinical features that are considered risk factors for lung malignancy, such as advancing age, sex, smoking history, and chronic obstructive pulmonary disease (24, 41, 42, 51-55). Age has been shown to independently influence the malignancy of SPNs, and the risk of lung cancer incidence increases significantly with age (41,42,51,54). The results of the present study are consistent with the above findings. In addition, sex is a major risk factor for the development of lung cancer, with women being more likely to develop lung cancer (24,42,51). Smoking history and COPD are also risk factors and promote the development and progression of lung cancer (55). In the present study, SPN malignancy did not differ significantly by sex, smoking history, lung function, or history of comorbid diseases including COPD, but this does not mean that these clinical characteristics are not associated with malignant SPNs. In future studies, the epidemiological factors of SPNs can be further explored by expanding the sample size, enriching the potential risk factors, and conducting multicenter prospective studies.
Some patients have hematological indicators that are considered risk factors for lung malignancy, such as tumor markers (43, 56-58). In addition, in recent years, the direction of research has gradually shifted to inflammatory factors (59-61). Several articles have demonstrated that inflammatory factors are associated with lung cancer prognosis (62-65). However, few articles have demonstrated that inflammatory factors are associated with lung carcinogenesis. Therefore, in the present study, we included not only tumor markers but also other hematological correlates, various types of leukocytes, and several inflammatory indicators derived from them. Inflammatory cells are an important component of the tumor microenvironment, and the inflammatory response plays a critical role in cancer development and progression and may be associated with systemic inflammation (19). Unfortunately, the present study did not find a definite association between inflammatory indicators and malignancy. Studies to date have reported the association of inflammatory indicators with lung cancer prognosis and early recurrence (66-75), but their association with early lung carcinogenesis remains to be further investigated. Inflammatory indicators may be normal in early-stage lung cancer, which may explain why no association was found in the present study.
Among the various tumor markers, CEA is a polysaccharide protein complex involved in cell adhesion, which is usually absent or present in very small amounts in the blood of healthy adults and is thought to be associated with poor prognosis of tumors. Serum CEA levels are closely related to the pathological stage of lung cancer. Grunnet and Sorensen found that CEA was more significantly elevated in the serum of lung cancer patients than in patients with benign lesions (P < 0.05) (76). Our finding that CEA was an independent predictor of malignant SPNs is consistent with previous findings (43,44,57,76).
A number of additional imaging features also contribute to the risk stratification of patients with SPNs measuring ≤ 2 cm, including location, shape, spiculation, cavitation sign, calcification, vascular penetration sign, pleural adhesions, bronchus sign, lobulation, lymph node enlargement sign, pleural effusion sign, maximum tumor diameter, and CTR (77). We collected the above-mentioned imaging features of the patients, and after analysis, four independent predictors associated with the malignancy of SPNs measuring ≤ 2 cm were identified. Irregular nodules are a common finding in lung cancer screening (78,79). Malignant nodules are more likely to have irregular, lobulated, or needle-like margins because of the spread of cells within the lung mesenchyme and fibrosis within the tumor. Benign nodules are associated with smooth, rounded borders and exhibit a benign growth pattern. Calcification is a common CT feature of pulmonary tuberculosis and is usually considered a benign sign (79). Lung calcification results from deposition of calcium, mostly as a result of healing inflammation. Malignant tumors rarely contain calcified foci; instead, they tend to keep growing and invade adjacent healthy tissue. She et al. indicated that the risk of malignancy in SPNs increased 1.1-fold with a 1-mm increase in nodule diameter (80). However, Chen et al. did not find a diameter-related association with malignancy in small SPNs (81). Our study showed that the risk of malignancy positively correlated with the maximum diameter of SPNs measuring ≤ 2 cm.
CTR is currently the most commonly used indicator for the management of ground glass nodules (82). However, it is important to note that CTR applies only to malignant nodules and is generally used to predict their aggressiveness. It is generally accepted that a lower CTR corresponds to less aggressive behavior, while a higher CTR indicates a more aggressive tumor (83-88). A prospective radiological study for non-invasive prediction of pathological findings of clinical stage IA peripheral lung cancer by HRCT scan was conducted by the Japanese Clinical Oncology Study Group (JCOG0201) (27). The results of this study showed that pathological non-invasive carcinomas could be predicted by a maximum tumor diameter ≤ 2 cm and a CTR ≤ 0.25, with a specificity of 98.7% for lung cancer. The 7.1-year follow-up results of this study concluded that both a maximum tumor diameter ≤ 2 cm with CTR ≤ 0.25 and a maximum tumor diameter ≤ 3 cm with CTR ≤ 0.5 on HRCT scans were good predictors of non-invasive pathology, with a 5-year overall survival rate of approximately 97% in both groups (89). In the present study, the role of CTR was contrary to previous findings, which may be because of the high number and proportion of in situ carcinomas with purely ground glass traits in the collected data.
The Mayo model has been the most widely used model for predicting malignant SPNs, and the PKUPH model was reported to be superior to traditional models. The Brock model is a more accurate predictive tool based on CT and clinical information. However, these models did not involve clinical biomarkers, and prediction models developed in other populations may be less suitable for mainland Chinese populations. Some predictive models incorporate more advanced and quantitative imaging findings, such as CT attenuation and tumor diameter growth rates, in their assessments (5,6). However, these imaging data are rarely used by doctors because they are difficult to obtain, hard to measure, and difficult to standardize. Our predictive model has the following advantages over previously published predictive models. First, we collected a relatively large number of small SPN cases and randomly divided them into a training cohort and an internal validation cohort, which makes our conclusions more convincing. Second, we collected comprehensive clinical and imaging data and had a clear pathological diagnosis for each patient. Third, all important risk factors in the nomogram are readily available in clinical practice. Fourth, the ROC, calibration, and DCA curves of the model in the training cohort perform well, and the accuracy and reliability of the model are satisfactory. Thus, our model can aid clinicians and facilitate a more individualized risk prediction for each patient.
There are some limitations to this study that need to be considered. First, owing to the retrospective nature of the study, we could not avoid potential selection bias. For example, we only included patients who underwent surgical resection in our department; patients managed without surgery were excluded. Second, our data were obtained from a single center with a relatively small sample size. The predictive model was only validated internally, so the selection bias present in the training cohort may also be present in the validation cohort. These factors may limit the generalizability of our predictive nomogram and may also introduce some uncontrolled confounding. Therefore, further studies involving multiple centers and adequate samples are required to validate our results. Despite these limitations, the results of the internal validation suggest that the model may perform well when applied to other populations, although this remains to be confirmed. The independent risk factors identified in this study for preoperatively predicting the probability of malignancy of SPNs measuring ≤ 2 cm, together with the developed predictive nomogram, may inform clinical decision-making by thoracic surgeons and pave the way for future research in this area.

Conclusion
We developed a clinical nomogram for predicting the probability of malignancy of SPNs measuring ≤ 2 cm based on clinical and radiological characteristics, and the nomogram had good predictive performance. The nomogram could predict the probability of nodule malignancy in preoperative patients with SPNs measuring ≤ 2 cm, improving the diagnostic efficacy for lung malignancies and providing additional clinical reference information and diagnostic evidence to guide clinicians in the next step of intervention and subsequent treatment modalities.

FIGURE 1 Flow diagram of patient selection through the study. SPN, solitary pulmonary nodule.

FIGURE 4 ROC curves of the nomogram for predicting the malignancy of SPNs within 2 cm in the training and validation cohorts. ROC, receiver operating characteristic; AUC, area under the ROC curve; SPN, solitary pulmonary nodule.

FIGURE 5 Calibration curves of the prediction nomogram in the training cohort (A) and validation cohort (B). The X-axis represents the probability predicted by the nomogram and the Y-axis represents the actual probability of malignancy of SPNs within 2 cm. The black dashed line represents the ideal curve, the blue solid line represents the apparent curve (non-corrected), and the red solid line represents the bias-corrected curve by bootstrapping (B = 1000 repetitions). SPN, solitary pulmonary nodule.

TABLE 1
Patients' characteristics of the training cohort and validation cohort.

TABLE 2
Clinical characteristics of patients with benign and malignant SPNs measuring ≤ 2 cm in the training cohort and validation cohort.

TABLE 3
Univariate and multivariate logistic regression analysis of the training cohort.

TABLE 4
Results of the ROC curve analysis for the training cohort.