Clinical application of common inflammatory and nutritional indicators before treatment in prognosis evaluation of non-small cell lung cancer: a retrospective real-world study

Objective: To evaluate the prognostic value of common pretreatment clinical inflammatory and nutritional indicators in patients with non-small cell lung cancer (NSCLC) in the real world.
Method: A total of 5,239 patients with pathologically confirmed NSCLC treated at the Affiliated Cancer Hospital of Xinjiang Medical University from 2011 to 2018 were enrolled, and their pretreatment inflammatory and nutritional indicators (RDW, PDW, NLR, LMR, NMR, PLR, SII, PNI, TP, ALB, CYFRA21-1, CEA, CA125, NSE, α1-globulin, α2-globulin, β1-globulin, β2-globulin, and γ-globulin) were collected. Of these, 1,049 patients (18-20% of the patients from each year) were randomly sampled as the validation set, and the remaining 4,190 patients formed the training set. According to the eighth edition of the guidelines for the diagnosis, treatment, and stage-based risk stratification of lung cancer, the patients were divided into four groups: stage I/II operable, stage III operable, stage III inoperable, and stage IV. X-tile software was used to determine the optimal cut-off value of each index in the training set and to dichotomize it. Univariate and multivariate Cox proportional-hazards regression was used to screen independent risk factors for NSCLC prognosis and to build prognostic models for 1, 3, and 5 years, whose performance was verified in the validation set. Finally, survival was assessed with Kaplan-Meier curves, and corresponding nomograms were constructed for clinical use.
Results: After screening, no effective indicators were found in the stage I/II operable group. RDW and CA125 were effective indicators in the stage III operable group (cut-off values 14.1 and 9.21, respectively; compared with the low-value group, univariate HRs 2.145 and 1.612 and multivariate HRs 1.491 and 1.691, respectively). CYFRA21-1 and CA125 were effective prognostic indicators in the stage III inoperable group (cut-off values 10.62 and 44.10; compared with the low-value group, univariate HRs 1.744 and 1.342 and multivariate HRs 1.284 and 1.304, respectively). CYFRA21-1, CA125, NLR, and α1-globulin were effective prognostic indicators in stage IV (cut-off values 3.07, 69.60, 4.08, and 5.30; compared with the low-value group, univariate HRs 1.713, 1.339, 1.388, and 1.539 and multivariate HRs 1.407, 1.119, 1.191, and 1.110, respectively). The model showed the best validation performance in stage IV patients (C-index 0.733, 0.749, and 0.75 at 1, 3, and 5 years, respectively).
Conclusion: In patients with stage III and IV NSCLC, several inflammatory markers, serum tumor markers, and nutritional indicators are independent prognostic factors. Combined with patients' general clinical data, the constructed prognostic model performs best in patients with stage IV disease and can be widely used in clinical practice.

Lv et al.

KEYWORDS: NSCLC, prognosis, big data, inflammation, immunoglobulin

Background
Lung cancer ranks second in incidence and first in mortality among all cancers worldwide, with age-standardized incidence and mortality rates of 22.4 and 18.0 per 100,000, respectively. Lung cancer incidence and mortality are associated with factors such as the Human Development Index (HDI), gross domestic product (GDP), and smoking frequency (1). By pathological type, lung cancer can be broadly divided into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC); NSCLC consists mainly of squamous cell carcinoma and adenocarcinoma, and adenocarcinoma accounts for 30-40% of all lung cancers (2). Since 2010, advances in diagnosis and treatment, such as standardized pathological staging, improved thoracoscopic surgery, targeted drug therapy for driver mutations, and immune checkpoint inhibitors, have significantly improved the survival of patients with NSCLC (3)(4)(5)(6)(7)(8).
With ongoing research on the tumor microenvironment, a growing number of studies suggest that inflammation plays an essential role in the occurrence, development, and prognosis of malignant tumors (9,10). In particular, inflammatory indicators such as NLR, LMR, NMR, PLR, and SII have been shown to affect the prognosis of various malignant tumors (11-13). Because malignant tumors are consumptive diseases, nutritional indicators such as ALB and PNI are also important prognostic factors across tumor types (14,15). Classical serum NSCLC tumor markers such as CA125, CYFRA21-1, and CEA are also thought to influence the prognosis of malignant tumors (16,17).
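The ratio-based indices named above are not defined in this section. A minimal sketch of their commonly used definitions follows; the formulas (including the Onodera form of PNI) and the units (cell counts in 10^9/L, albumin in g/L) are assumptions here rather than details reported by the study, and the `BloodPanel` and `derived_indices` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BloodPanel:
    """Pretreatment values (assumed units: cell counts in 10^9/L, albumin in g/L)."""
    neutrophils: float
    lymphocytes: float
    monocytes: float
    platelets: float
    albumin_g_per_l: float


def derived_indices(p: BloodPanel) -> dict:
    """Commonly used ratio-based inflammatory indices and the prognostic nutritional index."""
    return {
        "NLR": p.neutrophils / p.lymphocytes,  # neutrophil-to-lymphocyte ratio
        "LMR": p.lymphocytes / p.monocytes,    # lymphocyte-to-monocyte ratio
        "NMR": p.neutrophils / p.monocytes,    # neutrophil-to-monocyte ratio
        "PLR": p.platelets / p.lymphocytes,    # platelet-to-lymphocyte ratio
        # systemic immune-inflammation index = platelets x neutrophils / lymphocytes
        "SII": p.platelets * p.neutrophils / p.lymphocytes,
        # PNI (Onodera form) = albumin (g/L) + 5 x lymphocyte count (10^9/L)
        "PNI": p.albumin_g_per_l + 5 * p.lymphocytes,
    }


if __name__ == "__main__":
    panel = BloodPanel(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5,
                       platelets=250.0, albumin_g_per_l=40.0)
    print(derived_indices(panel))
```

Each continuous index computed this way would then be dichotomized at its stage-specific cut-off before entering the Cox models.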
Inflammation and nutrition are important in cancer progression and prognosis, and peripheral blood leukocyte-related indicators and nutrition-related biochemical indicators are routinely measured in clinical patients, so large amounts of data are readily available. However, the prognostic value of inflammatory and nutritional indicators in real-world big data from clinical healthcare facilities has rarely been reported. In this study, we jointly investigated the prognostic impact of inflammatory indicators, nutritional indicators, and serum protein components across different stages of NSCLC, offering a new perspective.

Exclusion criteria
(1) Preoperative radiotherapy and chemotherapy; (2) less than one cycle of chemotherapy and radiotherapy; (3) death within 30 days after treatment; (4) a history of systemic inflammation associated with active infection; and (5) positive driver-gene status or use of targeted drugs for driver-gene alterations (mainly in stage IV patients).
All patients underwent pretreatment assessments, including medical history, treatment regimen, pathological diagnosis, routine hematology and serum immunology tests, chest radiography, electrocardiography, chest and upper abdominal CT, brain magnetic resonance imaging (MRI), and bronchoscopy. Whole-body bone scan or positron emission tomography-computed tomography (PET-CT) was performed when metastasis was suspected.
Demographic and clinical variables
Demographic
(1) Patients with stage I/II disease underwent radical surgery, including complete resection of the primary tumor (lobectomy), mediastinal lymph node dissection, and minimally invasive radical resection of lung cancer. Some patients received postoperative adjuvant chemotherapy, including platinum-based doublet chemotherapy, for more than one cycle. (2) Patients with stage III disease underwent surgery plus individualized radiotherapy and chemotherapy for more than one cycle. Hematological parameters were collected within 1 week before surgery. (3) Patients with stage III or IV disease who did not undergo surgery received individualized radiotherapy and chemotherapy for more than one cycle. Hematological parameters were collected within 1 week before treatment.

Follow-up
After the first follow-up, patients were followed up every 3 months for the first year, every 6 months for the next 2 to 3 years, and once a year thereafter. Follow-up examinations included hematological parameters, CT, MRI, PET-CT, and other tests. The last follow-up was in January 2022. Overall survival (OS) was calculated from the date of diagnosis to the date of death or last follow-up, and the mean OS was 23.9 months.
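The OS definition above pairs each patient's follow-up time with an event flag: patients alive at the last follow-up are right-censored rather than counted as deaths. A minimal sketch, assuming an average month length of 365.25/12 days (the study does not state its convention); `overall_survival` is a hypothetical helper.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # average month length (assumed convention)


def overall_survival(diagnosis: date, death: Optional[date],
                     last_follow_up: date) -> Tuple[float, int]:
    """OS in months from diagnosis, plus an event flag (1 = death, 0 = censored)."""
    if death is not None:
        end, event = death, 1
    else:
        # alive at last contact: right-censored at the last follow-up date
        end, event = last_follow_up, 0
    if end < diagnosis:
        raise ValueError("end date precedes diagnosis")
    return (end - diagnosis).days / DAYS_PER_MONTH, event


if __name__ == "__main__":
    # a patient diagnosed in 2018 and still alive at the January 2022 cut-off
    print(overall_survival(date(2018, 1, 1), None, date(2022, 1, 1)))
```

The (time, event) pairs produced this way are the inputs that the Kaplan-Meier, log-rank, and Cox analyses all share.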

Statistics
Quantitative data were described as means ± standard deviations and compared using analysis of variance; categorical variables were described as percentages and compared using the chi-square test or Fisher's exact test. Using the training set data, X-tile software was used to determine the optimal cut-off value of each quantitative variable, which was then dichotomized. Univariate and multivariate Cox proportional-hazards regression was performed in SPSS to screen independent risk factors for prognosis, and nomogram prognostic models were built in R. The models were assessed with calibration curves and time-dependent ROC curves, using the training set data for internal validation. Finally, survival curves were drawn with the Kaplan-Meier method, and survival differences were assessed with the log-rank test. A p value < 0.05 was considered statistically significant.
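The abstract reports model discrimination as a C-index. As an illustration only, the sketch below computes Harrell's overall concordance index from follow-up times, event flags, and model risk scores (for example, a Cox linear predictor). It is not the time-dependent analysis the authors ran in R, and skipping pairs with tied follow-up times is one of several conventions, assumed here.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data (higher score = higher predicted risk).

    A pair is usable when the patient with the shorter follow-up had an event;
    it is concordant when that patient also has the higher risk score.
    Tied risk scores count as half-concordant; tied follow-up times are skipped.
    """
    concordant, usable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a = shorter follow-up
        if not events[a]:
            continue  # a was censored first, so the order of deaths is unknown
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-index values of roughly 0.73-0.75 should be read.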

Clinical and demographic characteristics
A total of 5,239 NSCLC patients were included in the study (mean age 61.54 years): 4,190 in the training set and 1,049 in the validation set. In the counts below, the first value refers to the training set and the second to the validation set. As shown in Table 1, males (n = 2,599 and 634) and adenocarcinomas (n = 2,783 and 746) accounted for the majority in both sets. Among patients with operable stage I, II, and III (early-stage) disease, pT1 (n = 375 and 100) and pT2 (n = 379 and 81) accounted for the majority, and most of these patients received postoperative adjuvant chemotherapy (n = 454 and 95). Among patients with inoperable stage III and stage IV (advanced) disease, T3 (n = 565 and 138) and T4 (n = 1,187 and 303) accounted for the majority. In both sets, more patients received chemotherapy (n = 2,299 and 577) than radiotherapy (n = 617 and 155). Among the included indicators, the nutritional indicators PNI, TP, and ALB tended to decrease with advancing stage, whereas the other indicators tended to increase. In the comparison across stage groups, all indicators except the following differed significantly (p < 0.

Optimal cut-off values and clinical
X-tile software has been widely used in recent years to determine optimal cut-off values. The training set was used to determine the optimal cut-off value of each quantitative variable within each stage group and to dichotomize the variable accordingly. For example, the cut-off value of NLR in the stage I/II operable group was 1.95, so patients were divided into two groups (≤1.95 and >1.95). The cut-off values determined in this way for each indicator in each stage group are shown in Table 2. Categorical variables are expressed as percentages, as in the analysis of clinical and demographic characteristics (Table 1).
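X-tile's own algorithm and options are not reproduced here. The sketch below is a minimal analogue, assuming the selection criterion is the two-group log-rank chi-square: it tries every observed value as a threshold, splits patients into ≤cut and >cut, and keeps the threshold with the largest statistic. The `min_group` parameter (the minimum share of patients in the smaller group) is a hypothetical safeguard added because, in small samples, the maximum often falls on very unbalanced splits.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [k for k, tk in enumerate(times) if tk >= t]
        n = len(at_risk)
        n1 = sum(1 for k in at_risk if in_high[k])
        d = sum(1 for k in at_risk if times[k] == t and events[k])
        d1 = sum(1 for k in at_risk if times[k] == t and events[k] and in_high[k])
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths, high group
        if n > 1:  # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(values, times, events, min_group=0.1):
    """Return (cut, chi2) maximizing the log-rank statistic, or None if no split qualifies."""
    n = len(values)
    best = None
    for cut in sorted(set(values))[:-1]:  # the largest value would leave an empty high group
        high = [v > cut for v in values]
        n_high = sum(high)
        if min(n_high, n - n_high) < min_group * n:
            continue  # skip splits that leave too few patients in one group
        chi2 = logrank_chi2(times, events, high)
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best
```

Because the threshold is chosen to maximize the statistic on the same data, its unadjusted p value overstates significance; this is one reason to confirm cut-offs in a separate validation set.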
Based on the univariate Cox proportional-hazards regression results, multivariate Cox proportional-hazards regression analysis was performed (Table 3). Age and T stage were independent risk factors in the stage I/II operable group; gender, age, N stage, RDW, and CA125 in the stage III operable group; gender, age, stage, radiotherapy, chemotherapy, CYFRA21-1, and CA125 in the stage III inoperable group; and gender, age, T stage, N stage, chemotherapy, radiotherapy, NLR, CYFRA21-1, CA125, and α1-globulin in the stage IV group.
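The HRs in Table 3 come from Cox models fitted in SPSS. To show what a univariate hazard ratio represents, a one-covariate Cox model can be fitted by Newton-Raphson on the partial likelihood; the hazard ratio is exp(β). This is a teaching sketch under stated assumptions (Breslow handling of tied event times, no convergence safeguards for perfectly separated data), not the authors' software, and the multivariate case generalizes β to a vector.

```python
import math


def _score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial likelihood (Breslow ties)."""
    score, info = 0.0, 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only enter the risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk_set))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk_set))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of a one-covariate Cox model.

    Returns (beta, hazard ratio exp(beta), standard error of beta).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, times, events, x)
    return beta, math.exp(beta), 1.0 / math.sqrt(info)
```

With a dichotomized indicator coded 1 for values above its cut-off, exp(β) is the HR of the high-value group relative to the low-value group, which is how the HRs in the abstract are expressed.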

Establishing a nomogram prognostic model
A nomogram prognostic model (Figures 1A-D) was established based on multivariate Cox proportional-hazards regression analysis of the training set. Each level of each factor was assigned a score according to that factor's contribution to the outcome variable in the model (the magnitude of its regression coefficient). The scores were then summed to obtain the total score. Finally, the predicted probability of the outcome event for an individual patient was calculated from the functional relationship between the total score and the probability of the outcome event.
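The scoring steps described above can be sketched for 0/1-coded covariates, such as the dichotomized indicators used here. The sketch follows a common nomogram convention (the factor with the largest |β| spans 0-100 points, and each factor's lower-risk level scores 0) and the Cox relation S(t | x) = S0(t)^exp(lp). The coefficients and the baseline survival S0(t), taken at the all-zero covariate profile, are hypothetical placeholders, not values from this study.

```python
import math


def nomogram_points(coefs, x):
    """Points per factor for 0/1-coded covariates.

    The factor with the largest |beta| spans 0-100 points; every other factor is
    scaled by the same ratio, and each factor's lower-risk level scores 0.
    """
    scale = 100.0 / max(abs(b) for b in coefs.values())
    return {name: scale * (b * x[name] - min(0.0, b)) for name, b in coefs.items()}


def survival_from_total_points(total, coefs, baseline_survival):
    """Map total points back to the linear predictor, then to S(t) = S0(t) ** exp(lp)."""
    scale = 100.0 / max(abs(b) for b in coefs.values())
    lp = total / scale + sum(min(0.0, b) for b in coefs.values())
    return baseline_survival ** math.exp(lp)


if __name__ == "__main__":
    coefs = {"high_marker": 0.8, "chemotherapy": -0.4}  # hypothetical coefficients
    patient = {"high_marker": 1, "chemotherapy": 0}
    pts = nomogram_points(coefs, patient)
    total = sum(pts.values())
    print(pts, total, survival_from_total_points(total, coefs, baseline_survival=0.7))
```

Because points are a linear rescaling of the linear predictor, summing them and reading the survival axis gives the same answer as evaluating the Cox model directly.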

Prognostic model performance validation
The prognostic model was validated using the training set data (Figures 2A-I to 5A-I). In the stage I/II operable group (Figure 2), the calibration curves at 1, 3, and 5 years (Figures 2A-C) showed good agreement between the predicted and observed probabilities. The time-dependent ROC (Figures 2D-F)

K-M survival curves
Based on the independent risk factors screened by multivariate Cox proportional-hazards regression analysis, we drew survival curves with the K-M method and assessed survival differences with the log-rank test. In the stage I/II operable group (Figures 10A, B), overall survival differed significantly between the two groups defined by age (p < 0.001) and by T stage (p < 0.001). In the stage III operable group (Figures 11A-E), overall survival differed significantly between the two groups defined by gender (p = 0.001), age (p = 0.001), N stage (p = 0.029), RDW (p < 0.001), and CA125 (p = 0.015). In the stage III inoperable group (Figures 12A-G), overall survival differed significantly between the two groups defined by gender (p < 0.001), age (p < 0.001), stage (p < 0.001), chemotherapy (p < 0.001), radiotherapy (p < 0.001), CYFRA21-1 (p < 0.001), and CA125 (p < 0.001).
Frontiers in Medicine frontiersin.org
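The K-M curves above are product-limit estimates computed separately for each group. A minimal sketch of the estimator follows, with a helper that reads off the median survival time (the first time at which S(t) reaches 0.5 or below); the function names are illustrative, and a two-group comparison would apply `kaplan_meier` to each group and test the difference with the log-rank statistic.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, S(t)) at each distinct event time."""
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for tj in times if tj >= t)  # still under observation at t
        deaths = sum(1 for tj, ej in zip(times, events) if tj == t and ej)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # months of follow-up and event flags (1 = death, 0 = censored)
    curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
    print(curve, median_survival(curve))
```

Censored patients lower the number at risk at later times without producing a drop in the curve, which is what distinguishes this estimate from a simple proportion of survivors.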

Discussion
Lung cancer is a highly prevalent malignant tumor, second only to breast cancer in incidence and first in mortality worldwide. NSCLC accounts for 85% of all lung cancers, which makes its study particularly important. Early-stage lung cancer is mainly treated with surgery, and the reported median OS of patients with early-stage NSCLC who undergo surgery is 7.9 years. By the time patients present with clinical manifestations such as cough, chest pain, and hemoptysis, the disease is usually already at an intermediate or advanced stage. Although individualized, precise treatment has significantly increased survival, the median OS of these patients is only 4-34 months (18), substantially worse than the prognosis of early-stage lung cancer. It is therefore equally important to study each stage of lung cancer separately.
Previous studies have shown that the tumor microenvironment plays an essential role in the occurrence and development of malignant tumors, and as research deepens, the relationship between inflammation and malignant tumors is increasingly recognized. Inflammation may promote tumor growth through the following mechanism: inflammation releases cytokines and up-regulates transcription factors, leading to the generation and accumulation of large amounts of oxygen free radicals, which cause DNA damage and breakage in parenchymal cells, including stem cells. The resulting overexpression of proto-oncogenes, loss of tumor suppressor gene function, and up-regulation of cell-cycle-promoting genes lead to abnormal cell proliferation and thereby promote tumorigenesis. Blood-cell-derived inflammatory indicators such as NLR, LMR, NMR, PLR, PDW, RDW, and SII have been widely reported to predict survival in malignant tumors. Cytokines and inflammatory mediators produced by inflammatory cells can trigger a series of stress responses, induce the aggregation of inflammatory cells and proteins, and cause oxidative cell damage. These processes disturb the stability of the body's microenvironment, thereby accelerating tumor growth, invasion, metastasis, and other processes that affect prognosis (19)(20)(21)(22). Acute phase response proteins include α1-antitrypsin (α1-AT), α1-acid glycoprotein (α1-AG), C-reactive protein (CRP) (23), and others, and their increased levels could be used to predict the prognosis of malignant tumors.
However, few studies have examined the value of acute phase response proteins in lung cancer. In protein electrophoresis, the vast majority of acute phase response proteins fall within the α1-globulin and α2-globulin fractions, so increases in α1-globulin and α2-globulin can reflect the inflammatory status (24). We therefore included protein electrophoresis results to explore the association between acute phase reactive proteins and lung cancer. Moreover, because malignant tumors are consumptive diseases, nutritional indicators such as ALB and PNI are also considered to affect prognosis. Recent research has found that tumor markers can assist in diagnosis and have some value in predicting survival, treatment response, recurrence, and metastasis (25). It is therefore of great significance to include inflammatory indicators, nutritional indicators, and tumor markers as candidate factors in a large-sample retrospective study.
X-tile software is commonly used to determine optimal cut-off values for survival analysis. In this study, we determined the optimal cut-off value of each quantitative variable separately for each stage group and dichotomized it. Univariate and multivariate Cox proportional-hazards regression in the training set showed that only age and T stage were independent risk factors in the surgically treated stage I/II group. This may be because the disease course in these patients is short, so tumor-related chronic inflammation has not yet acted on the body for a long time. Adjuvant chemotherapy after surgery was also not an independent prognostic factor, consistent with the view of Xue et al. (26) that surgery remains the most significant influencing factor for early-stage lung cancer. In the stage III operable group, gender, age, N stage, RDW, and CA125 were independent risk factors. According to Qi-Fan et al. (18), as the disease progresses, N stage has a greater effect on prognosis than T stage. Similarly, we found that whether chemotherapy and radiotherapy were given after surgery was not an influencing factor. However, because relatively few stage III patients underwent surgery in this study, bias was more likely and statistical power was reduced, so these findings require further verification.
In the stage III inoperable group, gender, age, stage, radiotherapy, chemotherapy, CYFRA21-1, and CA125 were independent risk factors; for these patients, whether they received treatment (chemotherapy and radiotherapy) was a strong influencing factor. Zhi and Jun (27) also found that tumor markers predict prognosis more strongly in advanced than in early-stage lung cancer. Most lung cancers are diagnosed at an advanced stage, when the opportunity for surgery has been missed, so active and effective treatment is particularly important, and chemotherapy has the greatest impact. Based on the clinical and demographic characteristics, we believe that prolonged chronic inflammatory stimulation raises inflammatory markers and tumor markers and lowers nutritional indicators in these patients. In this large sample, gender, age, T stage, N stage, chemotherapy, radiotherapy, NLR, CYFRA21-1, CA125, and α1-globulin were independent risk factors in the stage IV group. Nutritional indicators were not significant in any stage stratum, which may reflect small differences in nutritional status among patients within each stratum; however, nutritional indicators did differ between stage groups, which requires further study.
While stratifying risk by stage, we also combined inflammatory indicators, nutritional indicators, and tumor markers to predict prognosis in NSCLC; to the best of our knowledge, this is the first such attempt. Although this study had a large sample, it has limitations. The sample of early-stage lung cancer was small, so bias may have limited statistical power. Because the data were collected retrospectively and some known prognostic parameters, such as tumor cell differentiation, vascular invasion, perineural invasion, and important molecular factors (e.g., EGFR mutation and EML4-ALK fusion), were not included, more rigorous prospective studies are needed to validate the model, and further work is required to improve it through wider geographic recruitment and integration of other factors.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent from the patients/participants or patients/participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Author contributions
BX, QZ, and SH had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Concept, design, and statistical analysis: XL, BX, QZ, and SH. Acquisition, analysis, and interpretation of data: XL. Drafting of the manuscript: XL, BX, and YF. Funding: XL and YF. Administrative: XL, BX, QZ, SH, and YF. Study supervision: XL, QZ, and YF. All authors contributed to the article and approved the submitted version.