A Novel Nomogram Based on Machine Learning-Pathomics Signature and Neutrophil to Lymphocyte Ratio for Survival Prediction of Bladder Cancer Patients

Traditional histopathology assessed by pathologists with the naked eye is insufficient for accurate survival prediction in bladder cancer (BCa). In addition, how the neutrophil to lymphocyte ratio (NLR) could be used for prognosis prediction of BCa patients has not been fully understood. In this study, we collected 508 whole slide images (WSIs) of hematoxylin–eosin stained BCa slices, together with NLR values, from the Shanghai General Hospital and The Cancer Genome Atlas (TCGA); the WSIs were further processed for nuclear segmentation. Cross-validated models for predicting clinical prognosis were constructed based on machine learning methods. Six WSI features were selected for the construction of a pathomics-based prognosis model, the machine learning-based pathomics signature (MLPS), which could automatically distinguish BCa patients with worse survival outcomes, with a hazard ratio (HR) of 2.19 in the TCGA cohort (95% confidence interval [CI]: 1.63–2.94, p <0.0001) and 3.20 in the General cohort (95% CI: 1.75–5.87, p = 0.0014). Patients in the TCGA cohort with high NLR exhibited significantly worse survival outcomes than patients with low NLR (HR = 2.06, 95% CI: 1.29–3.27, p <0.0001). External validation in the General cohort also revealed significantly poorer prognosis in BCa patients with high NLR (HR = 3.69, 95% CI: 1.83–7.44, p <0.0001). Univariate and multivariate Cox regression analyses showed that both the MLPS and the NLR could act as independent prognostic factors for overall survival of BCa patients. Finally, a novel nomogram based on the MLPS and NLR was constructed to improve their clinical practicability, and it showed excellent agreement with actual observation in 1-, 3- and 5-year overall survival prediction. Decision curve analyses in both the TCGA cohort and the General cohort revealed that the novel nomogram performed better than the tumor grade system in prognosis prediction.
Our novel nomogram based on the MLPS and NLR could act as an excellent survival predictor and provide a scalable and cost-effective method for clinicians to facilitate individualized therapy. Nevertheless, prospective studies are still needed for further verification.

Keywords: bladder cancer, pathomics, machine learning, neutrophil to lymphocyte ratio, prognosis

BACKGROUND
Bladder cancer (BCa) is one of the most common malignant tumors worldwide. It was estimated that there would be 83,730 new cases of BCa and 17,200 BCa-related deaths in the United States in 2021 (1). Although more than 75% of BCa patients present without muscle invasion, as many as 10 to 15% of them still progress to muscle-invasive disease after initial surgical treatment (2,3), which results in poor clinical outcomes. In addition, about 10-15% of initially diagnosed BCa patients have metastatic lesions and a five-year survival probability of less than 5% (4). Therefore, there is an urgent need for useful predictors of clinical outcomes in BCa patients.
As an emerging high-throughput approach to medical image analysis, pathomics combines artificial intelligence with digital pathology and shows promise for future pathological diagnosis (5,6). Digitized whole slide images (WSIs) are well suited to artificial intelligence-based pathological diagnosis because they provide specimen images that have not been manually handled (7). In addition, the neutrophil-to-lymphocyte ratio (NLR) has been reported to be a valid prognostic biomarker in multiple malignancies (8-10), including urothelial carcinoma (11). A systemic inflammatory marker score has also been shown to be an effective predictor of tumor recurrence and progression in BCa without muscular invasion (12). However, how pathomics and NLR could be conveniently used for prognosis prediction of BCa patients in clinical practice has not been fully understood.
In this study, we first applied machine learning methods to WSIs to investigate the prognostic value of a pathomics signature and NLR in BCa patients. Subsequently, we constructed and verified a novel nomogram based on the pathomics signature and NLR to explore convenient and effective ways of predicting prognosis for BCa patients in clinical practice.

Patient Cohorts and Data Resource
Our patient cohorts came from two independent data sources: Shanghai General Hospital and The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov). We recruited 102 BCa patients who received operative treatment from January 2009 to December 2016 in the Shanghai General Hospital (General cohort). All included patients met the following inclusion criteria: (i) underwent radical or partial cystectomy, without preoperative treatment or positive residual tumor margin; (ii) diagnosed with a single type of primary malignant bladder tumor with pathological evidence; (iii) with complete clinicopathologic data and clinical follow-up information; (iv) with access to total neutrophil count and total lymphocyte count of peripheral blood before surgery; and (v) with access to corresponding hematoxylin-eosin (H&E) stained tumor slides.
Another 406 BCa patients who met the first two criteria above and had open-access histopathology images in TCGA were also enrolled in this study. Detailed clinical information and RNA sequencing data were also acquired from the TCGA database. RNA sequencing data were normalized using the RSEM method (13). Genes with transcriptomic values in less than 70% of the total samples were excluded from further analysis. Clinical characteristics of the BCa patients recruited in this study are shown in Table 1.

Whole Slide Image Process and Analysis Pipeline
Raw H&E profiles of WSIs, without color deconvolution or any watershed processing, were segmented into tiles. We eliminated tiles with non-cell objects or excess whitespace. The eligible tiles were further scanned and detected via QuPath digital pathology software (14) to construct modules for nuclear segmentation. Nuclear segmentation was carried out for recognized objects through Watershed cell detection based on segmentation parameters (15). A series of tiles from the same H&E image was then reassembled to represent the original WSI. The detected image factors are shown in Table S1.
We built an analysis pipeline based on machine learning algorithms to analyze the detected H&E image features for different clinical applications. Least absolute shrinkage and selection operator (LASSO) regression via the glmnet package (16) was used to identify the optimal digital pathological features and to calculate the coefficient of each feature in the pathomics-based models. The workflow of the histopathology image processing and analysis pipeline is shown in Figure 1.
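The whitespace-based tile exclusion described above can be illustrated with a minimal Python sketch. It is not the authors' QuPath workflow: the grayscale tile representation, the function names, and both thresholds (`white_threshold`, `max_whitespace`) are assumptions chosen only for illustration, since the paper does not report the criteria used.

```python
from typing import List

# Hypothetical tile representation: rows of 8-bit grayscale intensities,
# from 0 (black) to 255 (white).
Tile = List[List[int]]


def whitespace_fraction(tile: Tile, white_threshold: int = 220) -> float:
    """Return the share of pixels at or above white_threshold (assumed value)."""
    pixels = [p for row in tile for p in row]
    if not pixels:
        raise ValueError("tile has no pixels")
    return sum(p >= white_threshold for p in pixels) / len(pixels)


def keep_tile(tile: Tile, max_whitespace: float = 0.5,
              white_threshold: int = 220) -> bool:
    """Keep a tile only if its whitespace share does not exceed max_whitespace."""
    return whitespace_fraction(tile, white_threshold) <= max_whitespace


# Toy 2x2 tiles: one mostly tissue, one mostly background.
tissue_tile = [[120, 130], [90, 240]]
blank_tile = [[250, 245], [255, 100]]
eligible = [t for t in (tissue_tile, blank_tile) if keep_tile(t)]
```

In practice the cut-offs would need tuning against the scanner and staining protocol, and detecting non-cell objects would require a separate step that this sketch does not attempt.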
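Once LASSO has selected features and assigned coefficients, a signature of this kind is typically scored as a weighted sum of the selected features. The Python sketch below shows that step under stated assumptions: the coefficient values are placeholders rather than the fitted values from this study, the function names are hypothetical, and the rule "score above cut-off means high risk" is an illustrative convention.

```python
from typing import Dict


def signature_score(features: Dict[str, float],
                    coefficients: Dict[str, float]) -> float:
    """Weighted sum of the selected features (linear predictor of the model)."""
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    # Features not selected by LASSO have no coefficient and are ignored.
    return sum(coefficients[name] * features[name] for name in coefficients)


def risk_group(score: float, cutoff: float) -> str:
    """Dichotomize a score at a cohort-specific cut-off (assumed convention)."""
    return "high" if score > cutoff else "low"


# Placeholder coefficients for two of the feature names listed in the paper;
# the numbers are invented for illustration only.
example_coefficients = {"Nucleus/Cell area ratio": 2.0, "Cell area": -0.5}
example_features = {
    "Nucleus/Cell area ratio": 0.4,
    "Cell area": 10.0,
    "Nucleus circularity": 0.9,  # present but not selected, so unused
}
example_score = signature_score(example_features, example_coefficients)
example_group = risk_group(example_score, cutoff=0.0)
```

Keeping the coefficients in a dictionary keyed by feature name makes it explicit which measurements enter the score and guards against silently scoring a patient with a missing feature.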

Neutrophil to Lymphocyte Ratio
NLR was defined as the total neutrophil count divided by the total lymphocyte count (9). For the patients from the Shanghai General Hospital, the total neutrophil count and the total lymphocyte count of peripheral blood were measured before surgery. For the patients in the TCGA cohort, the total neutrophil count and the total lymphocyte count were estimated from transcriptomic data by using CIBERSORT, based on the abundances of 22 types of immune cells (17).
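The definition above amounts to a single division, sketched here in Python. The function name and the input checks are assumptions; the paper does not say how zero or missing lymphocyte values were handled, nor which CIBERSORT cell types were summed as lymphocytes in the TCGA cohort.

```python
def neutrophil_lymphocyte_ratio(neutrophils: float, lymphocytes: float) -> float:
    """Return NLR = neutrophil count / lymphocyte count.

    Both inputs must be in the same unit (for example 10^9 cells/L, or
    estimated relative abundances), so the ratio is dimensionless.
    """
    if neutrophils < 0:
        raise ValueError("neutrophil count must be non-negative")
    if lymphocytes <= 0:
        # Assumed handling: refuse to compute rather than return infinity.
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


# Toy preoperative blood counts in 10^9 cells/L (invented values).
example_nlr = neutrophil_lymphocyte_ratio(6.0, 2.0)
```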

Statistical Analysis
Data analyses were conducted with Statistical Package for the Social Sciences 24.0 software (SPSS Inc., Chicago, IL, USA) and R 3.6.2. Kaplan-Meier (KM) curve analysis with hazard ratios (HRs) and 95% confidence intervals (CIs) was carried out to compare survival outcomes. The prognostic nomogram was established based on the machine learning-based pathomics signature (MLPS) and NLR via the rms and nomogramEx packages in R, and was evaluated via calibration and decision curve analyses. The cut-off value of each prognostic biomarker in each patient cohort was set as the optimal value defined through the survminer package in R.
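For readers unfamiliar with the Kaplan-Meier method named above, the following Python sketch implements the product-limit estimator on toy data. It is a teaching illustration and not the SPSS or R routines used in the study; the function name and the input encoding (1 for death, 0 for censoring) are assumptions.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Toy cohort: deaths at months 1, 3 and 4; one patient censored at month 2.
example_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The censored patient leaves the risk set without lowering the curve, which is why the estimate drops from 0.75 to 0.375 at month 3 rather than to 0.5.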

Development and Verification of the Machine Learning-Based Pathomics Signature for BCa
As shown in Figure 2A, the left vertical line, which marks the minimum tenfold cross-validated error, fell at 6, indicating that six image factors were selected as the most prognostic factors for patients with BCa. The selected image factors were Nucleus/Cell area ratio, Nucleus circularity, Cell hematoxylin OD std dev, Cell area, Cell mincaliper, and Cell eosin OD min.
The regression coefficients (β) of each selected image factor were also extracted from the LASSO analysis (Figure 2B). We further evaluated the performance of our pathomics-based prognosis model in BCa patients through KM curve survival analysis. As shown in Figure 2C, the model distinguished patients with worse survival outcomes in the TCGA cohort (HR = 2.19, 95% CI: 1.63-2.94, p <0.0001), and the same was observed in the General cohort (HR = 3.20, 95% CI: 1.75-5.87, p = 0.0014, Figure 2D), indicating the strong performance of the pathomics-based prognosis model for BCa patients.

Important Role of NLR in Clinical Prognosis of Patients With BCa
We next carried out KM curve survival analysis to assess the role of NLR in the clinical prognosis of BCa patients. As shown in Figure 3A, patients in the TCGA cohort with high NLR exhibited significantly worse survival outcomes than patients with low NLR (HR = 2.06, 95% CI: 1.29-3.27, p <0.0001). External validation in the General cohort also revealed significantly poorer prognosis in BCa patients with high NLR (HR = 3.69, 95% CI: 1.83-7.44, p <0.0001, Figure 3B).
To further evaluate the roles of the MLPS and NLR for BCa patients, we performed univariate and multivariate Cox regression analyses in the two patient cohorts. We found that both the MLPS and the NLR could act as independent prognostic factors for overall survival of patients with BCa (Table 2).

Construction and Evaluation of a Novel Nomogram Based on MLPS and NLR
Since the MLPS and NLR had been shown to be independent prognostic factors for BCa patients in the univariate and multivariate Cox regression analyses, we constructed a novel nomogram based on the MLPS and NLR to improve their clinical practicability (Figure 4A). The calibration plots revealed that the 1-, 3- and 5-year overall survival (OS) probabilities predicted by the integrated nomogram agreed closely with actual observation (Figure 4B), indicating that the nomogram can accurately predict OS for BCa patients. Decision curve analysis in the TCGA cohort revealed that when the threshold probability was larger than 0.42, using the novel nomogram for OS prediction added more benefit than the tumor grade system (Figure 4C). Furthermore, decision curve analysis in the General cohort indicated that the novel nomogram performed better than both the tumor grade and stage systems in prognosis prediction (Figure 4D), indicating that the nomogram was clinically useful.
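Decision curve analysis compares models by their net benefit at a given threshold probability, where net benefit is the true-positive rate minus the false-positive rate weighted by the odds of the threshold. The Python sketch below computes it for binary outcomes at a fixed time horizon. This is a simplifying assumption: the study analyzed censored survival data, which requires an adjusted estimator not shown here, and the function names and toy values are hypothetical.

```python
from typing import List


def net_benefit(predicted_risk: List[float], outcomes: List[int],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold.

    outcomes: 1 if the event occurred by the chosen horizon, else 0.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(predicted_risk) != len(outcomes):
        raise ValueError("inputs must have the same length")
    n = len(outcomes)
    flagged = [(p >= threshold, y) for p, y in zip(predicted_risk, outcomes)]
    true_pos = sum(1 for hit, y in flagged if hit and y == 1)
    false_pos = sum(1 for hit, y in flagged if hit and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes: List[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of predicted risk."""
    return net_benefit([1.0] * len(outcomes), outcomes, threshold)


# Toy data (invented): four patients, two events.
example_risk = [0.9, 0.8, 0.3, 0.6]
example_outcomes = [1, 0, 0, 1]
example_nb = net_benefit(example_risk, example_outcomes, threshold=0.5)
example_nb_all = net_benefit_treat_all(example_outcomes, threshold=0.5)
```

A model adds clinical value at a given threshold when its net benefit exceeds both the treat-all reference and zero (treat none), which is the comparison the decision curves in Figure 4C and Figure 4D display across thresholds.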

DISCUSSION
Artificial intelligence is revolutionizing the traditional healthcare system in various areas, including radiology and pathology (5,18). Pathomics and radiomics, both applications of artificial intelligence, belong to high-throughput omics and show promise in malignancy diagnosis and prediction (6,19). In addition, combined with machine learning methods, pathomics has achieved very high accuracy in the pathological diagnosis of lung cancers, breast cancers, nervous system cancers and skin cancers (20-23).
In this study, we first established and verified a pathomics-based prognosis model from WSIs for predicting the survival status of BCa patients. Our prognosis model distinguished BCa patients at high survival risk in both independent cohorts, indicating its potential for predicting the prognosis of BCa patients. Malignancies with pleomorphic nuclei, a high nucleus-to-cytoplasm ratio and hyperchromatic nuclei usually correspond to a higher pathological grade and worse prognosis (24). Hence, the prognostic value of the image features selected in this study may be grounded in classic pathological theory. With the rapid growth of research on the tumor immune system and tumor microenvironment, it has become clear that the tumor immune response may be shaped by tumor progression and may in turn affect tumor growth (25,26). In addition, focal tumor immune responses show a possible association with systemic immune responses in cancer patients (25,27). Studies show that markers of systemic inflammation, such as NLR, can be useful biomarkers for predicting the survival of cancer patients (25,28,29).
As reported previously, neutrophils can promote tumor progression by changing the tumor environment (28,30). In contrast, lymphocytes, especially CD8-positive T cells, are the main force in suppressing and removing tumor cells (28,31). The important roles of inflammatory markers in urothelial carcinoma have also been gradually recognized. The combination of preoperative NLR, C-reactive protein, and plasma fibrinogen could act as an effective predictor of prognosis in patients with upper tract urothelial carcinoma (32). In addition, pretreatment NLR was shown to be associated with advanced tumor stage and increased cancer-specific mortality in BCa patients receiving radical cystectomy (33).
Here, we constructed and evaluated a novel nomogram based on the MLPS and NLR for BCa patients. The MLPS based on WSIs contains various important pathologic features, including nucleus/cell area ratio, nucleus circularity, and cell area. The NLR value, measured from peripheral blood, reflects the systemic immune response background. Combining the MLPS and NLR may improve on prognostic analysis based on either factor alone and hence increase prognostic accuracy. Intriguingly, considering that the NLR can predict traditional chemotherapy outcomes in BCa patients (34), the integrated nomogram might also have potential for predicting drug resistance.
This study has limitations. First, only 43 pathological features were detected from the segmented tiles of each WSI, which reflects the need for more robust segmentation methods. Second, our study is retrospective and may be subject to inherent biases, although we have verified our major results in two independent patient cohorts. The machine learning-based models still need further verification in prospective studies.

CONCLUSION
In conclusion, we identified the important roles of the MLPS and NLR in the prognosis prediction of patients with BCa. The novel prognostic nomogram based on the MLPS and NLR could act as an excellent survival predictor and provide a scalable and cost-effective method for clinicians to facilitate individualized therapy. Nevertheless, prospective studies are still needed for further verification.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be available from the authors upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Research Ethics Committee of Shanghai General Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
JZ, XW and NZ designed the study. SC, TW, FG and EZ acquired the data. SC, LJ and SH analyzed the data. SC and NZ wrote the report, which was edited by all authors. JZ and XW supervised the project. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the National Natural Science Foundation of China (81972393 and 82002665).