A Support Vector Machine Based on Liquid Immune Profiling Predicts Major Pathological Response to Chemotherapy Plus Anti-PD-1/PD-L1 as a Neoadjuvant Treatment for Patients With Resectable Non-Small Cell Lung Cancer

The biomarkers for the pathological response of neoadjuvant chemotherapy plus anti-programmed cell death protein-1/programmed cell death-ligand 1 (PD-1/PD-L1) (CAPD) are unclear in non-small cell lung cancer (NSCLC). Two hundred and eleven patients with stage Ib-IIIa NSCLC undergoing CAPD prior to surgical resection were enrolled, and 11 immune cell subsets in peripheral blood were prospectively analyzed using multicolor flow cytometry. Immune cell subtypes were selected by recursive feature elimination and least absolute shrinkage and selection operator methods. The support vector machine (SVM) was used to build a model. Multivariate analysis for major pathological response (MPR) was also performed. Finally, five immune cell subtypes were identified and an SVM based on liquid immune profiling (LIP-SVM) was developed. The LIP-SVM model achieved high accuracies in discovery and validation sets (AUC = 0.886, 95% CI: 0.823–0.949, P < 0.001; AUC = 0.874, 95% CI: 0.791–0.958, P < 0.001, respectively). Multivariate analysis revealed that age, radiological response, and LIP-SVM were independent factors for MPR in the two sets (each P < 0.05). The integration of LIP-SVM, clinical factors, and radiological response showed significantly high accuracies for predicting MPR in discovery and validation sets (AUC = 0.951, 95% CI: 0.916–0.986, P < 0.001; AUC = 0.943, 95% CI: 0.912–0.993, P < 0.001, respectively). Based on immune cell profiling of peripheral blood, our study developed a predictive model for the MPR of patients with NSCLC undergoing CAPD treatment that can potentially guide clinical therapy.


INTRODUCTION
The combination of anti-programmed death receptor 1 (PD-1), or its ligand (PD-L1), and chemotherapy has recently gained attention in patients with advanced non-small cell lung cancer (NSCLC). In the KEYNOTE-407, KEYNOTE-189, and IMPOWER-130 studies (1)(2)(3), platinum-based doublet chemotherapy plus anti-PD-1/PD-L1 (such as atezolizumab and pembrolizumab) showed a significantly higher objective response rate and longer progression-free survival (PFS) and overall survival (OS) than chemotherapy alone in patients with metastatic NSCLC. The latest CHECKMATE-816 study showed that neoadjuvant nivolumab plus chemotherapy significantly improves the major pathological response (MPR), as reported at the American Society of Clinical Oncology 2021 meeting (abstract number: 8503). The tumor mutation burden and PD-L1 expression were not found to be related to MPR in patients with NSCLC treated with neoadjuvant chemotherapy plus anti-PD-1/PD-L1 (CAPD). A recent study (SAKK 16/14) reported that, among patients who underwent a combination of perioperative durvalumab and neoadjuvant chemotherapy, those who achieved an MPR showed longer event-free survival and OS than those who did not (4). However, only a subset of patients with NSCLC achieved an MPR when undergoing CAPD as a neoadjuvant treatment. Therefore, identifying novel and useful biomarkers to predict which patients are most likely to achieve MPR from CAPD before surgery is critical.
Tumor-infiltrating lymphocytes and peripheral blood immune cells were found to be related to the response of solid tumors to therapy (5)(6)(7)(8)(9). Circulating exhausted-phenotype CD8 + T cells are associated with poor immunological response to pembrolizumab in patients with stage IV melanoma (10). Furthermore, circulating PD-1 + CD8 + T cells, memory T cells, and elevated monocyte levels are strong predictors of response to immunotherapy (11)(12)(13). Therefore, we speculated that the immune cell subsets in peripheral blood may be associated with the MPR to CAPD as a neoadjuvant treatment of patients with NSCLC; if confirmed, an immune cell model can be built.
Machine learning has been gaining attention with respect to medical image recognition tasks and building prognostic prediction models from high-dimensional gene expression profiling data (14)(15)(16)(17)(18). A CHECKMATE-025 study developed a Bayesian network model for predicting immunotherapy prognosis in patients with metastatic renal cell carcinoma (19). Using radiomics biomarkers, radiology text reports, or somatic mutations, machine learning models could estimate the response and prognosis in patients with NSCLC treated with immunotherapy (20)(21)(22). To the best of our knowledge, the use of a support vector machine (SVM) based on immune cells to predict the MPR to CAPD treatment has never been reported.
Here, immune cell profiling was performed in patients with NSCLC before the start of neoadjuvant CAPD and subsequent surgery. Using machine learning, we constructed and validated a predictive immunological model that can help identify patients who would most likely achieve MPR while undergoing CAPD.

Patients
Each patient provided detailed informed consent to the investigator. PD-L1 expression and EGFR/ALK mutation status were not necessary conditions for enrollment. The key eligibility criteria were as follows: (i) patients were at least 18 years old and willing to provide routine peripheral blood (2 mL) for immune cell analysis; (ii) Eastern Cooperative Oncology Group performance status was 0-1; (iii) a tissue biopsy of the tumor was confirmed to be lung adenocarcinoma (LUAD) or lung squamous cell carcinoma (LUSC) before any treatment; and (iv) patients with resectable stage Ib-IIIa NSCLC were examined using whole-body computed tomography (CT) or positron emission tomography-CT. The exclusion criteria were: (i) patients who could not tolerate treatment, such as those allergic to albumin-bound paclitaxel; and (ii) patients undergoing CAPD whose NSCLC progressed rapidly or showed organ metastasis and who were therefore not suitable for resection. Between September 2019 and June 2021, 211 patients who were receiving neoadjuvant treatment before surgery and met the criteria were recruited at the Cancer Hospital of the University of Chinese Academy of Sciences (CHUCAS) (Supplementary Figure 1). The patients were randomly divided in a 3:2 ratio into a discovery set (n = 127) and a validation set (n = 84). The institutional review boards of the Second Affiliated Hospital of Guizhou Medical University and CHUCAS approved our clinical research design. The study was conducted in accordance with the World Medical Association's Declaration of Helsinki.
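The random 3:2 division described above can be illustrated with a minimal sketch. The randomization procedure actually used is not described here, so the seed, the integer patient identifiers, and the helper name split_cohort are assumptions made purely for illustration.

```python
import random

def split_cohort(patient_ids, n_discovery, seed=0):
    """Randomly partition patient IDs into a discovery and a validation set."""
    rng = random.Random(seed)        # fixed seed (assumed) keeps the split reproducible
    shuffled = list(patient_ids)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_discovery], shuffled[n_discovery:]

# 211 hypothetical patient identifiers; 127 go to discovery, the remaining 84 to validation
patient_ids = list(range(1, 212))
discovery, validation = split_cohort(patient_ids, n_discovery=127)
print(len(discovery), len(validation))
```

Fixing the seed is a design choice that makes the partition repeatable; any other reproducible randomization scheme would serve the same purpose.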

Study Design
A flowchart of the study design is shown in Figure 1. Before the initial treatment, whole blood samples were collected from patients and rapidly tested by multicolor flow cytometry. The detailed results of peripheral immune cells and clinical characteristics were recorded. Two hundred and eleven patients with NSCLC received 2-4 cycles of CAPD. According to the MPR status, the doctors then performed radical surgery of lung cancer. The pathological response of tumor tissues was estimated by a senior pathologist. We then analyzed the association between clinical factors and MPR using univariate analysis of both cohorts (discovery and validation sets). The immune cells in peripheral blood were chosen by recursive feature elimination (RFE) and least absolute shrinkage and selection operator (LASSO) methods. After integrating these two selection methods, the final immune cell subtypes were confirmed. An SVM model based on liquid immune profiling (LIP-SVM) was developed and validated in the two cohorts. Multivariate analysis for MPR was performed for patients with NSCLC using logistic regression. Models integrating LIP-SVM, clinical factors, or radiological response were evaluated in terms of predictive accuracy for MPR. Contrast-enhanced CT images were examined to clearly identify the primary tumor. The radiological response, including complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD), was estimated before radical surgery by a senior thoracic radiologist using Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1.
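As a rough illustration of the four radiological categories, the sketch below applies the RECIST 1.1 target-lesion thresholds (at least a 30% decrease from baseline for PR; at least a 20% and 5 mm increase from the nadir for PD). It is a simplification that considers only the sum of target-lesion diameters and ignores non-target lesions, new lesions, and lymph-node rules; the function name and the example measurements are hypothetical.

```python
def recist_target_response(baseline_sum_mm, nadir_sum_mm, current_sum_mm):
    """Classify response from sums of target-lesion diameters (simplified RECIST 1.1)."""
    if current_sum_mm == 0:
        return "CR"                                   # all target lesions have disappeared
    increase = current_sum_mm - nadir_sum_mm
    relative_ok = nadir_sum_mm == 0 or increase / nadir_sum_mm >= 0.20
    if relative_ok and increase >= 5:
        return "PD"                                   # >= 20% and >= 5 mm growth from nadir
    if (baseline_sum_mm - current_sum_mm) / baseline_sum_mm >= 0.30:
        return "PR"                                   # >= 30% shrinkage from baseline
    return "SD"                                       # neither PR nor PD criteria met

# Hypothetical measurements in millimeters
print(recist_target_response(50, 50, 30))
print(recist_target_response(50, 30, 40))
```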

Assessment of MPR for Patients Receiving CAPD
MPR is defined as the reduction of viable tumor to below a clinically significant threshold, which depends on the particular histological type of lung cancer and the specific treatment type. For all histological types of lung cancer, MPR was defined histologically as viable tumor amounting to 10% or less of the tumor bed. The percentage was calculated by dividing the size of the viable tumor by the size of the tumor bed. This threshold has been used in a number of clinical trials. The pathology report recorded the total number of masses in the tumor bed, including some uninvolved lung, even if these masses were not entirely composed of the tumor bed. MPR could also be assigned when the primary pulmonary tumor met this criterion and little or no viable metastatic carcinoma was found in the lymph nodes (ypT0, N1, 2, 3).
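The calculation described above (viable tumor divided by tumor bed, with a 10% cutoff) can be written as a short sketch. The function names and the example measurements are hypothetical, and the units only need to be consistent between the two arguments.

```python
def viable_tumor_percent(viable_size, tumor_bed_size):
    """Percentage of the tumor bed occupied by viable tumor."""
    if tumor_bed_size <= 0:
        raise ValueError("tumor bed size must be positive")
    return 100.0 * viable_size / tumor_bed_size

def is_mpr(viable_size, tumor_bed_size, threshold_percent=10.0):
    """MPR when viable tumor is less than or equal to the threshold (10% in the text)."""
    return viable_tumor_percent(viable_size, tumor_bed_size) <= threshold_percent

# Hypothetical measurements: 2 units of viable tumor in a 40-unit tumor bed is 5%
print(viable_tumor_percent(2, 40), is_mpr(2, 40))
```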

Defining and Profiling the Immune Cell Subtypes in Peripheral Blood
We evaluated four types of circulating immune cells: B, T, natural killer (NK), and natural killer T (NKT) cells. B and T cells were defined by CD19 expression (CD19 + B cells) and CD3 expression (CD3 + T cells), respectively. The presence of CD8 and CD4 was used to identify T-lymphocyte subsets (CD3 + CD8 + T cells and CD3 + CD4 + T cells). Memory (CD4 + CD45RO + ) T cells and CD4 + naïve (CD4 + CD45RA + ) T cells were identified by CD45RA and CD45RO expression. A combination of CD56 and CD3 was used to identify NKT (CD3 + CD56 + ) and NK (CD3 -CD56 + ) lymphocyte subsets. Activated CD8 + T cells (CD8 + CD38 + T cells) were recognized by CD38 expression. BD Biosciences (San Jose, CA, USA) provided the following antibodies in Supplementary Table 1 #5555776). For lymphocyte staining, 4 mL of peripheral blood was collected into a blood tube with ethylenediaminetetraacetic acid and an anticoagulant. Next, 20 µL of CD3-FITC/CD56-PE, CD19-FITC, CD4-FITC/CD45RO-APC/CD45RA-PE, CD8-FITC/CD38-PE, and FITC/PE/APC isotype controls was separately added to five flow cytometry tubes, and 100 µL of each blood sample was added to each tube. The tubes were thoroughly mixed with the corresponding antibodies in the dark and incubated for 30 min at room temperature (20°C). A hemolytic agent (up to 2 mL; #70-LSB3; BD Biosciences) was then added to each tube. The supernatant was then removed by centrifugation (6 min), and the cells were washed twice with phosphate-buffered saline (#SH300256; Hyclone, Logan, UT, USA) and resuspended in paraformaldehyde. Flow cytometry (FACSVia; BD Biosciences) was used to examine the cells. More than 2,000 cells were detected at the lymphocyte gate in every sample. CellQuest Pro software (BD Biosciences) was used to analyze the percentages of positively labeled lymphocytes. Staining and analysis were completed within 24 h after blood collection.
FIGURE 1 | The study flowchart from patient enrollment to machine learning. Whole blood samples from patients with NSCLC on an empty stomach were collected and analyzed by multicolor flow cytometry before treatment. Patients with NSCLC were subjected to neoadjuvant CAPD treatment followed by radical surgery of lung cancer and MPR evaluation of tumor tissues. The association between clinical factors, radiological response, and MPR was analyzed by univariate analysis. The immune cells were selected by RFE and LASSO methods. The LIP-SVM signature was built and tested in the discovery and validation sets. Then, multivariate analysis for MPR was performed by logistic regression. Finally, three models integrating LIP-SVM, clinical factors, or radiological response were evaluated for predicting MPR. NSCLC, non-small cell lung cancer; CAPD, chemotherapy plus anti-PD-1/PD-L1; MPR, major pathological response; RFE, recursive feature elimination; LASSO, least absolute shrinkage and selection operator; LIP-SVM, support vector machine model based on liquid immune profiling.

Feature Selection of RFE and LASSO Algorithms
Two feature selection methods (RFE and LASSO) were used in this study. RFE selects features by recursively reducing the size of the candidate feature set. A prediction model is trained on the original features and a weight is assigned to each feature. The features with the smallest absolute weights are then recursively removed until the desired number is reached. Random forest functions and 5-fold cross-validation (CV) sampling were used for RFE. In addition, we used LASSO to select the most important immune cells from the discovery set. LASSO minimizes the negative log partial likelihood subject to a constraint on the sum of the absolute values of the parameters. Here, the standardized constraint parameter was set to -1.434. RFE and LASSO were performed using the "caret" package in R version 3.5.1.
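The elimination loop described above can be sketched in a few lines. This is not the caret implementation: as a labeled stand-in for random-forest importance, the sketch scores each feature by the absolute gap between its class means, and the toy rows, labels, and feature names are hypothetical. With a model-based importance the scores would change after each removal, which is why they are recomputed in every round.

```python
from statistics import mean

def mean_gap_importance(rows, labels, feature):
    """Stand-in importance (assumption): absolute gap between the two class means."""
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    return abs(mean(pos) - mean(neg))

def recursive_feature_elimination(rows, labels, features, n_keep):
    """Drop the weakest feature repeatedly until n_keep features remain."""
    kept = list(features)
    while len(kept) > n_keep:
        scores = {f: mean_gap_importance(rows, labels, f) for f in kept}
        weakest = min(kept, key=scores.get)   # feature with the smallest weight
        kept.remove(weakest)
    return kept

# Hypothetical toy data: three features, four samples, binary labels
rows = [
    {"a": 1.0, "b": 5.0, "c": 2.0},
    {"a": 1.2, "b": 5.1, "c": 2.9},
    {"a": 3.0, "b": 5.0, "c": 2.1},
    {"a": 3.1, "b": 5.2, "c": 3.0},
]
labels = [0, 0, 1, 1]
selected = recursive_feature_elimination(rows, labels, ["a", "b", "c"], n_keep=2)
print(selected)
```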

SVM Building Model
The SVM is a classical machine learning model with important value in tumor classification, prognosis, and treatment response prediction (23). The radial basis function, the most popular SVM kernel for nonlinear classification, can significantly improve the classification ability of the SVM by mapping the original input space to a feature space. The original nonlinear input space is thereby transformed into a linearly separable space, and samples are classified linearly in the feature space. The equation we used was as follows:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
where γ is greater than 0 and is a parameter that needs to be tuned. The tuning parameters were set as sigma = 0.035, C = 100, and cross = 10. The model was implemented using the "kernlab" package in R.
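To make the kernel concrete, the following sketch evaluates a radial basis function kernel and a kernel SVM decision value in plain Python. It assumes that the reported sigma = 0.035 plays the role of the positive kernel parameter multiplying the squared distance (the convention of kernlab's Gaussian kernel); the support vectors, dual coefficients, and bias are hypothetical values, not those of the fitted LIP-SVM model.

```python
import math

def rbf_kernel(x, y, sigma=0.035):
    """Radial basis function kernel: exp(-sigma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)

def svm_decision(x, support_vectors, dual_coefs, bias, sigma=0.035):
    """Decision value of a kernel SVM: weighted kernel sum over support vectors plus bias."""
    total = sum(c * rbf_kernel(sv, x, sigma)
                for sv, c in zip(support_vectors, dual_coefs))
    return total + bias

# Hypothetical support vectors and signed dual coefficients (label times alpha)
support_vectors = [[0.0, 0.0], [10.0, 10.0]]
dual_coefs = [-1.0, 1.0]
score_near_positive = svm_decision([9.0, 9.0], support_vectors, dual_coefs, bias=0.0)
score_near_negative = svm_decision([1.0, 1.0], support_vectors, dual_coefs, bias=0.0)
print(score_near_positive > 0, score_near_negative < 0)
```

A sample close to the positively weighted support vector receives a positive decision value, and the sign of this value determines the predicted class.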

Statistical Analysis
R software and GraphPad Prism were used to perform statistical analysis. A correlation heatmap was generated using the "pheatmap" package to depict the relationships between immune cells in the discovery and validation sets. Correlation scatter plots were used to indicate associations between immune cells. The scatter dot and box plots were used to represent the median and 95% confidence intervals (CI). The differences between groups were analyzed using the Mann-Whitney U test. The receiver operating characteristic curves (ROC) were plotted using the "pROC" package. The frequencies of two groups were compared using the chi-square test. Univariate and multivariate logistic analyses for MPR were performed in the two sets. Statistical significance was defined as P-value < 0.05.
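The area under the ROC curve is closely tied to the Mann-Whitney U statistic: it equals the fraction of (positive, negative) pairs in which the positive sample scores higher, with ties counted as one half. The sketch below shows this relationship with hypothetical scores; it is a standard-library illustration, not the "pROC" implementation.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for MPR (positive) and no-MPR (negative) patients
mpr_scores = [0.9, 0.8, 0.4]
no_mpr_scores = [0.7, 0.3, 0.2]
auc = auc_mann_whitney(mpr_scores, no_mpr_scores)
print(round(auc, 3))
```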

Clinical Characteristics and MPR
The baseline characteristics of 211 patients from two independent cohorts are presented in By analyzing the association between clinical data and MPR in the discovery and validation sets, we found that old age (> 60 y) was significantly correlated with MPR (P = 0.023) (Supplementary Figure 2A), patients with squamous cancer presented with a higher MPR than patients with adenocarcinoma (P = 0.025) (Supplementary Figure 2B), and patients who underwent an anti-PD-L1 regimen showed a higher MPR than those who underwent an anti-PD-1 regimen (P = 0.023) (Supplementary Figure 2C). Moreover, patients with radiological CR and PR showed a higher MPR than patients with radiological SD or PD (both P < 0.001) (Supplementary Figure 2D). The detailed results of the two cohorts are presented in Supplementary Table 2.

Five Immune Cells Were Selected and Significantly Associated With MPR
To find suitable predictors for MPR, the immune cells before neoadjuvant therapy were identified, and then RFE and LASSO were used to perform feature selection. Based on 5-fold CV analysis, the root mean square error (RMSE) showed that a combination of six variables had the smallest error, and thus these immune cells were selected (Figure 3A). Then, LASSO coefficient analysis of 12 features of immune cells was performed, after which eight coefficients were chosen based on a 5-fold CV analysis of minimum criteria (Figure 3B). To find robust features, we selected the five immune cell types chosen by both methods for the SVM model (Figure 3C). CD3 + CD56 + NKT cells were found to be significantly associated with MPR and can thus be a predictor for MPR (P < 0.001, Figure 3D). In the discovery set, the percentage of CD3 -CD19 + B cells was higher in the no-MPR group (P = 0.011, Figure 3D), whereas that of CD3 -CD56 + NK cells was higher in the MPR group (P = 0.032). Moreover, patients in the MPR group had a higher percentage of CD4 + CD45RA -T cells than those in the no-MPR group (P = 0.017). The percentage of CD4 + CD45RA + T cells was higher in the no-MPR group than in the MPR group.
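The overlap step amounts to intersecting the two selected feature lists. In the sketch below, the feature names are placeholders (cell_a, cell_b, and so on) rather than the actual immune cell subsets, and the list contents are hypothetical; only the list sizes (six from RFE, eight from LASSO, five shared) follow the text.

```python
# Placeholder names; only the counts (6, 8, and 5 shared) mirror the description above
rfe_selected = ["cell_a", "cell_b", "cell_c", "cell_d", "cell_e", "cell_f"]
lasso_selected = ["cell_a", "cell_b", "cell_c", "cell_d", "cell_e",
                  "cell_g", "cell_h", "cell_i"]

# Keep the RFE ordering while retaining only features that LASSO also selected
lasso_set = set(lasso_selected)
overlap = [feature for feature in rfe_selected if feature in lasso_set]
print(len(overlap), overlap)
```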

LIP-SVM Was Built and Validated in Another Independent Cohort
To develop the SVM model, fine-tuning was performed during the training process. LIP-SVM was calculated as the output score of machine learning in the discovery and validation sets. We found that patients in the MPR group showed significantly higher LIP-SVM scores than patients in the no-MPR group in both cohorts (both P < 0.001; Figures 4A, B). To compare the accuracy of different models or predictors in our study, we first used three meaningful clinical features, including age (≤ 60 vs. > Figure 4E). The model is now available for free online testing (https://pengjie.shinyapps.io/lipsvm/).

LIP-SVM Was an Independent Indicator of MPR in Patients With NSCLC
Based on the multivariate logistic regression analysis in the discovery set, age, radiological response, and LIP-SVM signature were independent risk factors for MPR [P < 0.016,

DISCUSSION
In this prospective study, we selected five immune cell subtypes in peripheral blood and found that patients with three favorable immune cells (CD3 + CD56 + NKT, CD3 -CD56 + NK, and CD4 + CD45RA -T cells) had a high MPR. Based on the SVM algorithm, the LIP-SVM signature was developed and can be used to predict the MPR of CAPD treatment. Multivariate analysis for MPR in the discovery and validation sets revealed that old age, radiological response, and LIP-SVM signature were positive independent factors of MPR. By combining clinical factors, radiological response, and LIP-SVM, the LIP-SVMRC model exhibited the highest accuracy for predicting MPR compared with the other two models. These findings indicate that LIP-SVMRC can be used as a novel tool for the effective identification of patients that may achieve MPR from CAPD. Several studies have reported that the combination of chemotherapy and anti-PD-1/PD-L1 treatment for metastatic NSCLC as a first-line therapy significantly improves the treatment response, OS, and PFS of patients (24)(25)(26)(27). However, although some studies have reported on neoadjuvant combination immunotherapy, especially CAPD (4, 28), identifying significant biomarkers for predicting MPR in patients with NSCLC undergoing CAPD remains challenging. In our study, we found that old (> 60 years) patients with squamous cancer had a higher rate of response to CAPD, indicating that the combination of platinum-based doublet chemotherapy (nab-paclitaxel/pemetrexed) and immunotherapy was more suitable for these patients. Although the sample size of patients on anti-PD-L1 was small (13 and 9 patients in the discovery and validation sets, respectively), the combination anti-PD-L1 regimen showed a higher MPR than the combination anti-PD-1 regimen as a neoadjuvant treatment for patients with NSCLC. 
This may be because anti-PD-L1 treatment affects both the tumor microenvironment [e.g., T and B cells, dendritic cells (DCs), and macrophages] and the tumor itself, which frequently expresses PD-L1 (29,30). In the evaluation of treatment response, we found that there was a partial discrepancy between radiological and pathological methods; nevertheless, radiological evaluation of neoadjuvant treatment response can aid preoperative prediction of MPR. In addition to clinical factors and radiological response, detailed knowledge of the immune status of the patient's peripheral blood is needed to evaluate the efficacy of combination anti-PD-1/PD-L1 treatment, and the tumor immunogenicity score has been evaluated as a predictor of immune checkpoint inhibitor response (31,32). Previous studies have reported that specific PD-1 + CD56 + and CD8 + T cells frequently indicate a good prognosis in patients with melanoma treated with immunotherapy (33,34). However, the immunological biomarkers for predicting MPR with CAPD as a first-line neoadjuvant treatment remain unclear. The main reason for this is the paucity of reported data on neoadjuvant immunotherapy. In our prospective study, we revealed five immune cell subtypes based on liquid immune profiling for predicting the MPR of patients with stage Ib-IIIa NSCLC treated with CAPD. NKT cells have been reported to play a critical role in inducing cross-talk of plasmacytoid DCs with conventional DCs, which is associated with the generation of memory CD8 + T cells (35). A recent study reported that elevated peripheral NK cell numbers in patients with NSCLC are associated with responses (CR/PR) to immunotherapy (36). Our study revealed positive correlations between CD3 + CD56 + NKT or CD3 -CD56 + NK cells and the MPR to CAPD. NKT/NK cells may play important roles in antitumor immunity, such as reactivation of exhausted immune cells derived from the tumor microenvironment.
Although immunotherapy can unleash CD8 + T cells and specific mutation-associated neoantigens, some tumor microenvironment factors (such as hypoxia and toxic metabolites) inhibit T cell activation (37,38). A recent study showed that PD-1 + CD8 + T cell-positive tumors are significantly associated with poor response to anti-PD-1 therapy (39). An increase in circulating PD-1 + CD8 + T cell numbers in the early stage of immunotherapy is also an indicator of poor prognosis in advanced cancers (5). In our study, we found that active CD8 + CD38 + T cells of the peripheral blood were not associated with MPR in patients undergoing neoadjuvant CAPD, indicating that a subtype of CD8 + T cells was suppressed or exhausted. In a recent study, the change of abundance in CD4 + CD45RA + T cells was a predictive biomarker for PFS after chemoradiotherapy (40). Interestingly, CD4 + CD45RA -T cells were positively and CD4 + CD45RA + T cells were negatively correlated with MPR in our study. These results suggest that a subtype of CD4 + T cells plays a crucial role in determining immunotherapy response in the initial stage of CAPD treatment. Moreover, circulating CD3 -CD19 + B cell counts are increased in patients with oral squamous cell carcinoma after radical operation or chemotherapy (41), but their value in predicting treatment response and prognosis is unclear. In our study, we found that patients with neoadjuvant CAPD and acquired MPR exhibited a lower percentage of CD3 -CD19 + B cells than patients without MPR. This is the first report of an association between CD3 -CD19 + B cells and CAPD, revealing their potentially negative role in cancer treatment response or prognosis. 
According to RECIST V.1.1, most patients (68.25%) achieved an objective response (CR/PR) to combination treatment, but this radiological method had a high specificity and low sensitivity in the two sets (specificity = 96.67% and 93.55%; sensitivity = 58.21% and 45.28%, respectively), which is not accurate enough to preoperatively predict pathological response. To precisely screen the patients for MPR, a machine learning method (integrating RFE, LASSO, and SVM algorithms) based on immune cell profiling was applied in this study. After fine-tuning the parameters, the LIP-SVM model exhibited higher predictive accuracy than the clinical model and the evaluation of radiological response. Moreover, the LIP-SVM model provided a prediction of MPR before initial CAPD, earlier than radiological estimation. Multivariate analysis for MPR also revealed that the LIP-SVM signature was an independent factor in both cohorts. Several studies have integrated clinical factors and series models to improve accuracy and robustness (42)(43)(44). In our study, the AUC of the LIP-SVMRC model, which integrates three factors (immune cells, radiological evaluation, and clinical variables), was the highest among all models, indicating a greatly improved predictive accuracy. This preoperative prediction model of MPR may be helpful for guiding radical surgery and personalizing a treatment regimen for each patient. Our study had two limitations. First, because of the high cost of analysis, targeted next-generation sequencing or whole-exome sequencing results were not analyzed in all samples from patients with NSCLC, which led to a small percentage of patients with test results that could not be analyzed. Second, our sample size was not large and multi-center prospective cohorts are required.
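For readers less familiar with the two metrics quoted above, the sketch below computes sensitivity and specificity from binary labels and predictions. The example labels are hypothetical and are not the study data; which outcome is treated as the positive class is also an assumption, since it is not stated in the text.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 (positive) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy example: 4 positive and 4 negative cases
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```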
In conclusion, our study revealed the significant association between clinical variables (old age, squamous cancer, and anti-PD-L1 treatment), radiological response, and immune cells from peripheral blood and MPR in patients with NSCLC receiving 2-4 cycles of neoadjuvant CAPD. The classifications of the SVM model based on immune cell profiling and integration models provide a novel and noninvasive predictive method for identifying patients who may achieve MPR after CAPD.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Second Affiliated Hospital of Guizhou Medical University and the Cancer Hospital of the University of Chinese Academy of Sciences. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
Conception and design: JP. Administrative support: JP. Provision of study materials or patients: JP and DZ. Collection and assembly of data: JP. Data analysis and interpretation: JP. Manuscript writing: All authors. All authors contributed to the article and approved the submitted version.