Non-invasive Urine Test for Molecular Classification of Clinical Significance in Newly Diagnosed Prostate Cancer Patients

Objective: To avoid over-treatment of low-risk prostate cancer patients, it is important to identify clinically significant and insignificant cancer for treatment decision-making. However, no accurate test is currently available. Methods: To address this unmet medical need, we developed a novel gene classifier to distinguish clinically significant and insignificant cancer, which were classified based on the National Comprehensive Cancer Network risk stratification guidelines. A non-invasive urine test was developed using quantitative mRNA expression data of 24 genes in the classifier with an algorithm to stratify the clinical significance of the cancer. Two independent, multicenter, retrospective and prospective studies were conducted to assess the diagnostic performance of the 24-Gene Classifier and the current clinicopathological measures by univariate and multivariate logistic regression and discriminant analysis. In addition, assessments were performed in various Gleason grades/ISUP Grade Groups. Results: The results showed high diagnostic accuracy of the 24-Gene Classifier with an AUC of 0.917 (95% CI 0.892–0.942) in the retrospective cohort (n = 520), AUC of 0.959 (95% CI 0.935–0.983) in the prospective cohort (n = 207), and AUC of 0.930 (95% CI 0.912–0.947) in the combination cohort (n = 727). Univariate and multivariate analysis showed that the 24-Gene Classifier was more accurate than cancer stage, Gleason score, and PSA, especially in the low/intermediate-grade/ISUP Grade Group 1–3 cancer subgroups. Conclusions: The 24-Gene Classifier urine test is an accurate and non-invasive liquid biopsy method for identifying clinically significant prostate cancer in newly diagnosed cancer patients. It has the potential to improve prostate cancer treatment decisions and active surveillance.


INTRODUCTION
Prostate cancer (PCa) is a prevalent cancer in men and a leading cause of cancer-related deaths. With the widespread use of prostate-specific antigen (PSA) screening, a large number of PCa are diagnosed and treated, leading to over-treatment of many early stage cases without significant clinical symptoms or life risk. PCa is a slow-growing tumor, and many studies have shown that treating early stage PCa may not benefit the patient's quality of life or affect mortality (1)(2)(3). Thus, after cancer diagnosis by biopsy, it is crucial to determine which patients have clinically significant cancer who need immediate treatment and which patients have clinically insignificant cancer who can be placed on active surveillance.
However, accurate stratification of PCa clinical significance remains a significant challenge. Clinicopathological parameters (i.e., ISUP Gleason Grade Groups and cancer stage) are used in clinical practice; however, they rely on biopsy specimens, which are susceptible to sampling limitations and analysis errors (4)(5)(6). Magnetic resonance imaging (MRI) and multiparametric MRI are non-invasive imaging tools for PCa diagnosis with the ability to significantly reduce the number of unnecessary repeat prostate biopsies, but their accuracy to detect clinically significant PCa is limited by large false-negative rates (7)(8)(9)(10). Molecular stratification methods using RNA, peptide, or circulating miRNA biomarkers in prostate tissue, blood, or urine samples are being developed. However, most of them were not tested for stratification of clinically significant and insignificant cancer but discriminated cancer risk groups, and no biomarker or biomarker panels have shown high diagnostic accuracy (11)(12)(13). Therefore, the development of more accurate tests is urgently needed.
Urine is a non-invasive source of liquid biopsy samples, since prostate epithelial cells are released into the urine and can be used for PCa diagnosis and prognosis by detecting gene expression levels of prostate-specific biomarkers (14)(15)(16)(17)(18). In addition, urine-based tests are better suited than clinicopathological parameters to periodic monitoring of cancer progression during active surveillance. We previously developed a 25-Gene Panel urine test to distinguish PCa from benign prostate and found that it can also distinguish clinically significant from insignificant cancer. In this study, we aimed to develop a more accurate gene classifier and to test its diagnostic performance for identifying clinically significant and insignificant PCa in low/intermediate-grade/ISUP Grade Group 1-3 cancer patients, for whom determining clinical significance matters most when making treatment decisions. We showed that a novel 24-Gene Classifier urine test was robust, with high diagnostic accuracy in two independent, multicenter retrospective and prospective studies as well as in the low/intermediate-grade/ISUP Grade Group 1-3 cancer subgroups (Figure 1).

Retrospective and Prospective Urine Cohorts
A multicenter retrospective study was conducted at San Francisco General Hospital (San Francisco, USA) with Institutional Review Board (IRB) approval (IRB #:  to collect and test archived urine sediments to identify and validate urine biomarkers for PCa diagnosis and prognosis. The prospectively designed, retrospective study used pre-biopsy urine samples randomly chosen from sample archives at the Cooperative Human Tissue Network (CHTN) Southern Division (patients in the U.S.) and Indivumed GmbH (patients in Germany). This study followed the REMARK guidelines. With prior ethical approval and patient consent for future studies, urine samples were collected from 520 patients who had elevated PSA or symptoms and were diagnosed to have prostate cancer (PCa) by routine biopsy after the urine collection. The patients were recruited from July 2004 to November 2014 with follow-up through June, 2015.
During the follow-up period, all the patients who had radical prostatectomy or other treatments were assessed periodically for biochemical recurrence (BCR, defined as consecutive PSA rise above 0.2 ng/mL twice according to NCCN guidelines) and cancer metastasis (by imaging with CT, magnetic resonance or X-ray as well as bone scan).
A multicenter prospective study was conducted at Shenzhen People's Hospital (Shenzhen, China) with IRB approval (Study Number: P2014-006) to study urine biomarkers for PCa diagnosis and prognosis using pre-biopsy fresh urine samples from patients treated at the seven hospitals that collaborated in the study. The study was conducted according to the REMARK guidelines. Fresh urine samples were collected consecutively from patients with elevated PSA levels or symptoms and who were scheduled for biopsy in the participating hospitals. Two hundred seven urine samples from patients diagnosed to have PCa by routine biopsy were included to form a prospective cohort.
The same patient inclusion and exclusion criteria were used in the retrospective and prospective studies. The inclusion criteria were age 18-90, pathological diagnosis of PCa, and no prior treatment with PCa drugs or 5-alpha reductase inhibitors. The exclusion criteria were prior prostatectomy or prior treatment with PCa drugs or 5-alpha reductase inhibitors. The pathological diagnosis of PCa used standard needle biopsy with consistent procedures in both retrospective and prospective studies. Pre-operative PSA and Gleason score/ISUP Grade Groups (19) were provided. The pathological diagnosis of clinically significant or insignificant PCa was defined based on the National Comprehensive Cancer Network (NCCN) risk stratification guidelines. Clinically significant PCa was defined as having unfavorable intermediate, high, or very high risk, while clinically insignificant PCa was defined as having very low, low, or favorable intermediate risk. Specifically, patients were classified as having clinically significant cancer when meeting any of the following criteria: Gleason score >7 (ISUP Grade Group 4 and 5), Gleason score 4 + 3 = 7 (ISUP Grade Group 3), cancer stage ≥T3, PSA >20 ng/mL, or >50% of biopsy cores with cancer. Patients meeting none of these criteria were classified as having clinically insignificant cancer.
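The classification rule stated above can be sketched in a few lines of Python. This is only an illustration of the listed criteria, not the study's software: the `Patient` record, its field names, the numeric encoding of T stage, and the reading of ">50% of biopsy core with cancer" as the fraction of positive cores are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical; the study does not define a data schema.
    primary_gleason: int            # primary Gleason pattern, e.g. 3 or 4
    secondary_gleason: int          # secondary Gleason pattern
    t_stage: int                    # assumed numeric T stage (T1 -> 1, ..., T4 -> 4)
    psa_ng_ml: float                # PSA in ng/mL
    positive_core_fraction: float   # assumed: fraction of biopsy cores with cancer (0..1)


def is_clinically_significant(p: Patient) -> bool:
    """Return True when any one of the criteria listed in the text is met."""
    gleason_sum = p.primary_gleason + p.secondary_gleason
    return (
        gleason_sum > 7                                             # ISUP Grade Group 4 and 5
        or (p.primary_gleason == 4 and p.secondary_gleason == 3)    # 4 + 3 = 7, Grade Group 3
        or p.t_stage >= 3                                           # cancer stage >= T3
        or p.psa_ng_ml > 20                                         # PSA > 20 ng/mL
        or p.positive_core_fraction > 0.5                           # > 50% of cores positive
    )
```

Note that a Gleason 3 + 4 = 7 tumor is significant under this rule only if another criterion (stage, PSA, or core involvement) is also met, which is what separates favorable from unfavorable intermediate risk in the definition above.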

Urine Sample Processing and Gene Expression Quantification
The procedures for urine sample processing and gene expression quantification were performed as described (20). In the retrospective study, 10-15 ml urine samples were collected without prior digital rectal examination (DRE) and the urine pellet was flash-frozen and stored at −80 °C. In the prospective study, 15-45 ml urine was collected without prior DRE and the urine was stored with 5 ml DNA/RNA preservative AssayAssure (Thermo Fisher Scientific, Waltham, MA, USA) or U-Preserve (Hao Rui Jia Biotech Ltd., Beijing, China) at 4 °C and processed within a week. The urine was centrifuged at 1,000 × g for 10 min. The pellet was washed with phosphate-buffered saline and centrifuged again at 1,000 × g for 10 min. The cell pellet was then used for RNA purification or frozen immediately on dry ice followed by storage at −80 °C.
For the identification of clinically significant and insignificant PCa by the 24-Gene Classifier in the urine samples, the CtS values of the 24 genes were used in the Urine Clinically Significant Cancer Algorithm to calculate the Urine Clinically Significant Cancer D Score. The algorithm uses two sets of regression coefficients. The first set comprises the clinically significant PCa regression coefficients of gene 1 through gene 24 and the gene 1 and gene 1 cross through gene 24 and gene 24 cross clinically significant PCa regression coefficients. The second set comprises L1 through L24, the clinically insignificant PCa regression coefficients of gene 1 through gene 24, and L1*1 through L24*24, the gene 1 and gene 1 cross through gene 24 and gene 24 cross clinically insignificant PCa regression coefficients. The sample was diagnosed as clinically significant PCa when the D Score was >0 and as clinically insignificant PCa when the D Score was ≤0.
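The scoring rule described above can be illustrated with a short sketch. The exact formula is not reproduced in this text, so the code assumes the common quadratic discriminant form, in which each class has an intercept, per-gene coefficients, and gene-by-gene cross coefficients, and the D Score is the clinically significant class score minus the clinically insignificant one. The function names, the intercept terms, and the coefficient layout are assumptions made for the example; only the decision threshold (D Score > 0) comes from the text.

```python
from typing import Sequence, Tuple

# (intercept, per-gene coefficients, cross coefficients); this layout is hypothetical.
Coefficients = Tuple[float, Sequence[float], Sequence[Sequence[float]]]


def class_score(cts: Sequence[float], coefs: Coefficients) -> float:
    """Quadratic score: intercept + sum_i b_i*x_i + sum_{i<=j} c_ij*x_i*x_j."""
    intercept, linear, cross = coefs
    score = intercept
    n = len(cts)
    for i in range(n):
        score += linear[i] * cts[i]
        for j in range(i, n):  # upper triangle, including the i == j terms
            score += cross[i][j] * cts[i] * cts[j]
    return score


def d_score(cts: Sequence[float], significant: Coefficients,
            insignificant: Coefficients) -> float:
    """Assumed D Score: significant class score minus insignificant class score."""
    return class_score(cts, significant) - class_score(cts, insignificant)


def classify(cts: Sequence[float], significant: Coefficients,
             insignificant: Coefficients) -> str:
    """Apply the threshold stated in the text: D Score > 0 means clinically significant."""
    if d_score(cts, significant, insignificant) > 0:
        return "clinically significant"
    return "clinically insignificant"


# Toy two-gene example with made-up coefficients (the real classifier uses 24 genes).
toy_significant = (0.5, [1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
toy_insignificant = (0.0, [0.0, 0.5], [[0.0, 0.0], [0.0, 0.25]])
```

With the toy coefficients, `classify([3.0, 1.0], toy_significant, toy_insignificant)` yields a positive D Score and `classify([1.0, 2.0], ...)` a negative one; the real coefficients would come from the discriminant analysis fitted on the training cohort.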
The diagnostic method of clinically significant and insignificant PCa by the 24-Gene Classifier in the prostate tissue specimens is described in the Supplementary Data.

Statistical Analysis
To create an algorithm for diagnosing clinically significant or insignificant PCa (Urine Clinically Significant Cancer Algorithm or Tissue Clinically Significant Cancer Algorithm), the association between the pathological diagnosis of clinically significant or insignificant PCa using NCCN classification and the relative gene expression values of the 24 genes in the classifier was tested by discriminant analysis using the statistical software XLSTAT (Addinsoft, Paris, France). To measure diagnostic performance, the diagnosis of all the samples by the algorithm was compared with their pathological diagnosis to calculate sensitivity, specificity, positive predictive value, negative predictive value, odds ratio, and their respective 95% confidence intervals (CI). In addition, the receiver operating characteristic (ROC) curve was plotted, and the area under the ROC curve (AUC) was calculated along with its 95% CI. To check for overfitting, a leave-one-out cross-validation analysis was conducted for the 24-Gene Classifier in the combination cohort. The leave-one-out cross-validation was performed using XLSTAT to generate regression coefficients for the 24 genes and to classify each sample as clinically significant or insignificant cancer. This classification was then compared with the pathological diagnosis of the sample to calculate the cross-validated diagnostic performance of the 24-Gene Classifier.
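As a rough illustration of the performance measures named above, the following sketch computes them from a 2 × 2 comparison of algorithm calls against pathological diagnosis. It is not the XLSTAT procedure used in the study: the function names are hypothetical, the Wald interval is only one of several ways to obtain a 95% CI for a proportion (the text does not say which method was used), and the AUC is computed with the rank-based (Mann-Whitney) definition.

```python
import math
from typing import Dict, Sequence, Tuple


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard measures from true/false positive and negative counts.

    The odds ratio is undefined when fp or fn is zero; this sketch does not
    handle that case.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "odds_ratio": (tp * tn) / (fp * fn),
    }


def wald_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald 95% CI for a proportion (an assumed method, chosen for simplicity)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the probability that a positive sample outscores a negative one."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))
```

For instance, with 80 true positives and 20 false negatives, `wald_ci(80, 100)` gives an interval around a sensitivity of 0.80; in the study, the scores passed to `auc` would be the D Scores of the clinically significant and insignificant samples.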
Furthermore, to compare the diagnostic performance of the 24-Gene Classifier with pre-biopsy PSA, cancer stage, or Gleason score, univariate and multivariate logistic regression analyses were performed using XLSTAT.

Identification of a 24-Gene Classifier and Validation in Prostate Tissue Cohort
The National Comprehensive Cancer Network (NCCN) guidelines classify PCa into five risk groups and recommend that most patients in the very high, high, and unfavorable intermediate risk groups receive treatment, while most patients in the very low, low, and favorable intermediate risk groups are placed on active surveillance. Therefore, the very high, high, and unfavorable intermediate risk groups can be classified as clinically significant PCa, and the very low, low, and favorable intermediate risk groups are classified as clinically insignificant PCa. This classification is clinically meaningful and can guide treatment decisions. We used this classification as the standard for the development of a molecular classifier.
To validate the 24-Gene Classifier, we assessed its ability to identify clinically significant and insignificant PCa in a prostate tissue cohort, MSKCC (n = 149) (21) (Table 1), using an algorithm (Materials and Methods in Supplementary Data). The diagnosis by the 24-Gene Classifier was compared with the NCCN classification to calculate the diagnostic performance, and the result showed an AUC of 0.976 (95% CI 0.954-0.998; p < 0.0001; Table 2, Figure 2). In addition, removing any one or more genes from the classifier lowered its diagnostic accuracy; therefore, all genes in the classifier contributed significantly to the diagnostic algorithm and were included in the classifier.

Development and Validation of a 24-Gene Classifier Urine Test
We recently developed an improved method to detect mRNA expression of biomarker genes by cDNA pre-amplification before real-time qRT-PCR using urine samples collected without digital rectal examination (DRE). The method is robust and can be used for biomarker classifiers as non-invasive and convenient urine tests (20). We tested if the 24-Gene Classifier could detect clinically significant and insignificant cancer in cell pellets of the urine samples collected without DRE using the same method. We conducted two independent, multicenter retrospective and prospective studies to collect pre-biopsy urine samples. The patients in both cohorts were real patients from participating hospitals. The patient characteristics and clinicopathological parameters are shown in Table 1. The study endpoint was to measure the diagnostic performance of the 24-Gene Classifier urine test for the diagnosis of clinically significant and insignificant cancer after PCa diagnosis to determine if the patient needs treatment or active surveillance (Figure 1). We used a retrospective cohort (n = 520) as a training set to create the Urine Clinically Significant Cancer Algorithm to combine mRNA expression quantities of the 24 genes for classification of the urine sample as clinically significant or insignificant PCa. This diagnosis was compared with the NCCN classification to calculate diagnostic performance. The results showed that the 24-Gene Classifier was able to distinguish clinically significant and insignificant PCa with a sensitivity of 83.8% (95% CI 79.5-88.2%), specificity of 94.4% (95% CI 91.5-97.2%), and AUC of 0.916 (95% CI 0.891-0.941) (Table 3, Figure 3A; p < 0.0001).
The 24-Gene Classifier with the algorithm was validated in an independent prospective cohort (n = 207) and showed an AUC of 0.959 (95% CI 0.935-0.983) (Figure 3B; p < 0.0001).
The diagnostic performance of the 24-Gene Classifier was tested by combining the retrospective and prospective cohorts to form a large cohort of 727 patients. Such a combination is valid since both cohorts used the same inclusion and exclusion criteria for patient enrollment and the same urine collection method. In the combination cohort, the 24-Gene Classifier showed similar diagnostic performance as in the retrospective and prospective cohorts with a sensitivity of 84.6% (95% CI 81.2-88.0%), specificity of 94.9% (95% CI 92.4-97.4%), and AUC of 0.930 (95% CI 0.912-0.948) (Table 3, Figure 3C; p < 0.0001). Cross-validation was performed in the combination cohort and the result showed similar diagnostic performance, suggesting that there was no overfitting (Table 3, Figure 3D; p <0.0001).
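The leave-one-out procedure mentioned above can be sketched generically: each sample is held out in turn, a model is fitted on the remaining samples, and the held-out sample is classified. The study fitted its discriminant model in XLSTAT; the nearest-centroid classifier below is a deliberately simple stand-in, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label: 1 = significant, 0 = insignificant)


def fit_nearest_centroid(train: List[Sample]) -> Callable[[Sequence[float]], int]:
    """Stand-in classifier (NOT the study's discriminant analysis).

    Requires at least one training sample of each class.
    """
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Sequence[float]) -> int:
        # Squared Euclidean distance to each class centroid.
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
                for lab, c in centroids.items()}
        return min(dist, key=dist.get)

    return predict


def leave_one_out_accuracy(data: List[Sample],
                           fit: Callable = fit_nearest_centroid) -> float:
    """Hold out each sample once, refit on the rest, and score the held-out call."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]   # every sample except the i-th
        model = fit(train)
        correct += int(model(x) == y)
    return correct / len(data)


# Toy, well-separated data purely for illustration.
toy_data = [([1.0, 1.0], 0), ([1.2, 0.9], 0), ([0.8, 1.1], 0),
            ([3.0, 3.1], 1), ([3.2, 2.9], 1), ([2.9, 3.0], 1)]
```

Because the held-out sample never contributes to the fitted coefficients, cross-validated performance close to the resubstitution performance is evidence against, though not proof of the absence of, overfitting.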

Comparison of the 24-Gene Classifier Urine Test With Clinicopathological Measures
Clinicopathological parameters, including Gleason score, cancer stage, and preoperative PSA, are currently used for PCa risk stratification and treatment decision-making in clinical practice. We compared the diagnostic performance of the 24-Gene Classifier urine test with these parameters using univariate and multivariate logistic regression analyses. As shown in Table 4, the 24-Gene Classifier urine test had higher accuracy than Gleason score, cancer stage, and PSA, as shown by their respective AUC, sensitivity, specificity, and odds ratio in univariate logistic regression analyses (Table 4, Figures 3E-G,I-K).
In the multivariate logistic regression analyses, combining the 24-Gene Classifier urine test with Gleason score, cancer stage and PSA significantly improved its diagnostic performance with increased sensitivity, specificity, odds ratio, and AUC (combining the 24-Gene Classifier with cancer stage and Gleason score improved sensitivity to 94.7% (95% CI 94. 5
In clinical practice, separating the two cancer groups for treatment or surveillance in patients with low- and intermediate-grade/ISUP Grade Group 1-3 cancer (Gleason score 6 and 7) is clinically meaningful, as it is especially difficult but important to determine the clinical significance for making treatment decisions in these patients. Thus, we tested the 24-Gene Classifier with the algorithm in the ISUP Grade Group 1-3 patients (referred to as the Gleason Score 6-7/ISUP Grade Group 1-3 Cohort) (n = 612). As shown in Table 6, the 24-Gene Classifier had a sensitivity of 86.4% (95% CI 82.8-90.2%), specificity of 94.8% (95% CI 92.2-97.3%), and AUC of 0.860 (95% CI 0.831-0.889) (Figure 3M; p < 0.0001). In contrast, Gleason score had lower diagnostic accuracy [i.e., lower specificity of 37.1% (95% CI 31.5-42.7%) and AUC of 0.548 (95% CI 0.502-0.594)] (Table 6, Figure 3N). Furthermore, combining the 24-Gene Classifier with Gleason score improved diagnostic accuracy (Table 6, Figure 3O; p < 0.01). This suggests that the 24-Gene Classifier urine test is more accurate than Gleason score in identifying clinically significant and insignificant PCa in low- and intermediate-grade/ISUP Grade Group 1-3 cancer patients.
The 24-Gene Classifier was also tested in patients with intermediate-grade/ISUP Grade Group 2 and 3 cancer (Figure 3P; p < 0.0001). In contrast, the primary Gleason score (Gleason score 4 vs. 3) showed lower sensitivity and AUC (Table 6, Figure 3Q). However, when the 24-Gene Classifier urine test was combined with the primary Gleason score, the diagnostic accuracy improved, with an increased sensitivity of 96.8% (95% CI 94.8-98.9%) and AUC of 0.969 (95% CI 0.954-0.984) (Table 6, Figure 3R; p < 0.0001). The results showed that the 24-Gene Classifier urine test is more accurate at identifying clinically significant and insignificant PCa than the primary Gleason score in patients with intermediate-grade/ISUP Grade Group 2 and 3 cancer, and the two can be combined to provide more accurate stratification.

Ability to Predict Cancer Recurrence and Metastasis by the 24-Gene Classifier Urine Test
To further evaluate the clinical value of the 24-Gene Classifier for stratifying clinical significance, we assessed whether the clinically significant cancer identified by the 24-Gene Classifier included the patients with biochemical recurrence (BCR) or cancer metastasis during the average 8-year follow-up period in the retrospective cohort. We found that the 24-Gene Classifier could predict BCR or metastasis with 100% accuracy (Table 5), as all patients with BCR or metastasis were classified as having clinically significant cancer by the 24-Gene Classifier (Table 7). In contrast, most patients with BCR or metastatic cancer had low- or intermediate-grade Gleason scores (91.3 and 85.7%, respectively) (Table 7). Thus, using Gleason grade to stratify patients for treatment decisions may result in a large number of patients with recurrent or metastatic cancer missing treatment. This showed that the 24-Gene Classifier was able to accurately identify clinically significant cancer with the potential for recurrence and metastasis, supporting its clinical value.

DISCUSSION
In this study, we developed a novel 24-Gene Classifier urine test to identify patients with clinically significant PCa who need immediate treatment and patients with clinically insignificant PCa who can be placed on active surveillance. The 24-Gene Classifier urine test was validated in two independent, multicenter, retrospective and prospective studies, as well as in the low- and intermediate-grade/ISUP Grade Group 1-3 PCa subgroups. In addition, its ability to identify clinically significant cancer with cancer recurrence and metastasis potential was assessed. Our results demonstrated that the 24-Gene Classifier urine test was a highly accurate and non-invasive liquid biopsy tool using urine samples collected without DRE to classify PCa clinical significance with superior performance to Gleason score, cancer stage, and pre-operative PSA. Extensive research has been conducted to improve cancer risk stratification by developing numerous methods for detection of clinically significant PCa, including clinicopathological parameters (i.e., Gleason score), panels using multiple RNA, protein or circulating miRNA biomarkers, and imaging technologies (i.e., MRI). The diagnostic performance of these methods was measured in many studies. For example, a PHI-based predictive model predicted clinically significant cancer with an AUC of 0.75 (22) (9). The diagnostic performance of these methods showed that none of them had robust accuracy, none had high sensitivity and specificity with AUC > 0.9, none had a high hazard ratio (HR) or odds ratio, and none used urine samples collected without invasive DRE (5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32)(33)(34). In comparison, our 24-Gene Classifier urine test, validated in large independent retrospective and prospective cohorts as well as in various patient subgroups, showed uniformly high diagnostic accuracy and thus may serve as a better molecular classification for clinically significant and insignificant PCa.
In addition, combining the 24-Gene Classifier urine test with Gleason score, cancer stage, and PSA provided exceptionally high diagnostic accuracy; therefore, the combinations may be used in clinical practice.
The mRNA profile revealed that PCa-specific biomarkers such as KLK3 (the gene encoding PSA) and PCA3 were highly enriched in the urine samples, supporting the validity of our urine collection and purification method for detecting PCa-specific biomarkers. The patient groups with clinically significant and insignificant cancer had similar age in all urine cohorts and the prostate tissue cohort (Table 1), which reduced the potential for age bias in molecular classification between the two groups. The 24-Gene Classifier showed similar diagnostic performance despite the use of long-term frozen urine pellets (retrospective cohort) or freshly collected urine samples (prospective cohort), patients with different clinicopathological parameters (i.e., Gleason grade, preoperative PSA level, cancer stage), or patients with different ethnic backgrounds. This suggests that the 24-Gene Classifier urine test is robust and may be used in different patient populations regardless of clinicopathological parameters or race/ethnicity. Although clinicopathological information from the initial biopsy and preoperative PSA can be used to assess clinical significance, it is impractical to perform biopsy periodically to obtain information for cancer surveillance. The non-invasive and accurate 24-Gene Classifier urine test is more useful than biopsy or prostatectomy-based measurements (e.g., prostate tissue-based tests such as Decipher and Prolaris) for periodic monitoring of cancer progression during active surveillance.
Some of the 24 genes in the classifier have been studied previously as PCa diagnostic or prognostic biomarkers, or are involved in cell proliferation, cancer invasion and metastasis (35)(36)(37)(38)(39)(40)(41), but our combination of these genes in a classifier is novel. Although we have previously developed a 25-Gene Panel for PCa diagnosis from the same retrospective and prospective studies (20), the 24-Gene Classifier urine test was not intended to be used for cancer diagnosis but for identifying clinically significant cancer during treatment decision-making in newly diagnosed cancer patients. The 24-Gene Classifier urine test was accurate in the low- and intermediate-grade/ISUP Grade Group 1-3 PCa subgroups, and was able to identify clinically significant cancer with cancer recurrence and metastasis potential at 100% accuracy.
One of the limitations of the study is that the prospective study cohort was smaller than the retrospective cohort. In the future, more prospective studies and studies that combine the 24-Gene Classifier urine test with MRI and other parameters will be conducted.
In summary, we developed and validated a highly accurate and non-invasive 24-Gene Classifier urine test to identify clinically significant and insignificant PCa. This novel molecular classifier can potentially be used in clinical practice to improve cancer treatment decisions, avoid over-treatment, and manage active surveillance.

DATA AVAILABILITY STATEMENT
The data presented in this study are included in the main article and the Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Institutional Review Board (IRB) at San Francisco General Hospital (San Francisco, USA) (IRB #:  and IRB at Shenzhen People's Hospital (Shenzhen, China) (Study Number: P2014-006). The patients provided their written informed consent to participate in this study.