ORIGINAL RESEARCH article

Front. Immunol., 16 December 2021
Sec. Systems Immunology
This article is part of the Research Topic Re-Using Cytometry Datasets in Immunology: “Old Wine into New Wineskins”

Prostate Cancer: Early Detection and Assessing Clinical Risk Using Deep Machine Learning of High Dimensional Peripheral Blood Flow Cytometric Phenotyping Data

Georgina Cosma1*‡, Stéphanie E. McArdle2,3‡, Gemma A. Foulds2,3, Simon P. Hood2†, Stephen Reeder2,3, Catherine Johnson2,3, Masood A. Khan4, A. Graham Pockley2,3*‡
  • 1Department of Computer Science, Loughborough University, Loughborough, United Kingdom
  • 2John van Geest Cancer Research Centre, School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
  • 3Centre for Health, Ageing and Understanding Disease (CHAUD), School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
  • 4Department of Urology, University Hospitals of Leicester National Health Service (NHS) Trust, Leicester, United Kingdom

Detecting the presence of prostate cancer (PCa) and distinguishing low- or intermediate-risk disease from high-risk disease early, and without the need for potentially unnecessary invasive biopsies, remains a significant clinical challenge. The aim of this study is to determine whether the T and B cell phenotypic features which we have previously identified as being able to distinguish between benign prostate disease and PCa in asymptomatic men having Prostate-Specific Antigen (PSA) levels < 20 ng/ml can also be used to detect the presence and clinical risk of PCa in a larger cohort of patients whose PSA levels ranged between 3 and 2617 ng/ml. The peripheral blood of 130 asymptomatic men having elevated PSA levels was immune profiled using multiparametric whole blood flow cytometry. Of these men, 42 were subsequently diagnosed as having benign prostate disease and 88 as having PCa on biopsy-based evidence. We built a bidirectional Long Short-Term Memory Deep Neural Network (biLSTM) model for detecting the presence of PCa in men which combined the previously-identified phenotypic features (CD8+CD45RA-CD27-CD28- (CD8+ Effector Memory cells), CD4+CD45RA-CD27-CD28- (CD4+ Effector Memory cells), CD4+CD45RA+CD27-CD28- (CD4+ Terminally Differentiated Effector Memory Cells re-expressing CD45RA), CD3-CD19+ (B cells), CD3+CD56+CD8+CD4+ (NKT cells)) with Age. The performance of the PCa presence ‘detection’ model on the test set (which was not used during training and validation) was: Acc: 86.79% (± 0.10), Sensitivity: 82.78% (± 0.15); Specificity: 95.83% (± 0.11); AUC: 89.31% (± 0.07), ORP-FPR: 7.50% (± 0.20), ORP-TPR: 84.44% (± 0.14).
A second biLSTM ‘risk’ model, which combined the immunophenotypic features with PSA to predict whether a patient with PCa has high-risk disease (defined by the D’Amico Risk Classification), achieved the following: Acc: 94.90% (± 6.29), Sensitivity: 92% (± 21.39); Specificity: 96.11% (± 0.00); AUC: 94.06% (± 10.69), ORP-FPR: 3.89% (± 0.00), ORP-TPR: 92% (± 21.39). The ORP-FPR for predicting the presence of PCa when combining FC+PSA was lower than that of PSA alone. This study demonstrates that AI approaches based on peripheral blood phenotyping profiles can distinguish between benign prostate disease and PCa and predict clinical risk in asymptomatic men having elevated PSA levels.

Introduction

Currently, diagnosing prostate cancer (PCa) primarily relies on painful invasive biopsies which put ~5% of men at risk of developing life-threatening infections, such as urosepsis. As biopsy results are not definitive, there is a significant risk of misdiagnosis, over-treatment, and under-treatment. It is therefore imperative to avoid unnecessary biopsies and more accurately diagnose the presence of PCa and, if present, its clinical significance.

In a landmark study, Stamey et al. performed the first large-scale analysis of serum PSA as a prostate cancer biomarker in 1987, convincingly demonstrating that PSA was more sensitive than prostate specific acid phosphatase (PSAP)/prostatic acid phosphatase (PAP) for monitoring the disease (1). They showed that PSA levels increased with advancing clinical stage and were useful for detecting disease recurrence after curative therapy (1). In 1991, Catalona et al. demonstrated that the combination of a serum PSA measurement ≥4.0 ng/ml with other clinical findings, such as the results of a DRE, improved detection of prostate cancer in a prospective study of 1653 healthy men with no history of cancer (2).

Although the clinical introduction of the Prostate-Specific Antigen (PSA) test in 1986 increased the early diagnosis of localised PCa, elevated levels of PSA do not necessarily indicate the presence of disease, as PSA levels can be raised by prostatitis, other localised infections, benign hyperplasia and/or other factors such as physical stress. It is also the case that 15% of men with PSA levels in the normal range typically have PCa, with a further 15% of these cancers being high‐grade (https://prostatecanceruk.org/prostate-information/prostate-tests/psa-test).

Findings from a study involving 419,582 British men aged 50 to 69 years - the Cluster Randomized Trial of PSA Testing for Prostate Cancer (CAP), which was conducted at 573 primary care practices across the United Kingdom, do not support single PSA testing for population-based screening and suggest that asymptomatic men should not be routinely tested to avoid unnecessary anxiety and treatment (3). However, in contrast to the CAP study, the 16-year follow-up of the European Randomized Study of Screening for Prostate Cancer (ERSPC) which was launched in 1993 and was the world’s largest randomized controlled trial evaluating the effect of PSA screening on PCa mortality involving men aged between 50 and 69 has reported PSA screening to significantly reduce PCa-related mortality (4). Given its poor diagnostic specificity, PSA-based PCa screening is not currently supported by the UK National Health Service (NHS) or promoted in any other country.

So, how do we improve the diagnosis of PCa beyond the utilisation of PSA and digital rectal examination (DRE) alone given that measuring blood PSA levels lacks specificity and the DRE lacks both sensitivity and specificity? PSA and DRE measurements do not necessarily differentiate between clinically significant PCa, which requires treatment, and indolent cancer, for which the current recommendation is active surveillance. The challenge over the past two decades has therefore not only been to improve the diagnostic yield of PCa, but also to develop new approaches for more specifically distinguishing between benign prostate disease and PCa and, arguably more importantly, between low-risk disease which requires no treatment and clinically significant disease which requires treatment. As the diagnosis of PCa based on PSA levels and the DRE alone is not reliable, confirmation using other approaches such as invasive biopsies and/or MRI scans is required.

Traditionally, PCa has been diagnosed by performing transrectal ultrasound (TRUS) guided prostate biopsies. However, such a biopsy technique has a cancer detection rate of less than 30% in a benign feeling prostate. The major drawback in performing TRUS prostate biopsies is that it is only possible to accurately biopsy the posterior peripheral and transition zone due to limitations in mobility of the ultrasound probe. Currently, ~55% of transrectal ultrasound (TRUS) biopsies return negative results (5). A negative TRUS biopsy of the prostate does not therefore necessarily equate to a cancer-free prostate, as prostate cancer may be present in the anterior parts of the peripheral or transition zone that are inaccessible via such a route. As such, a negative TRUS biopsy could falsely be reassuring to the patient who then subsequently presents later with advanced/metastatic PCa. As the rectum is highly colonized with bacteria, approximately 3-5% of men who undergo TRUS guided prostate biopsies will experience potentially life threatening urosepsis (6) with many such patients requiring ITU care. Worryingly, the risk of developing urosepsis has increased over the past decade due to the development of multi-drug resistant fecal bacteria (7). Another issue is that the PCa detection rate significantly reduces when TRUS biopsies are repeated due to rising PSA (8).

The diagnostic strength of an alternative biopsy approach - the transperineal template prostate (TPTP) biopsy - which involves interrogating the entire prostate using a grid/template of needles inserted via the perineal skin has been shown to deliver a better rate of cancer detection than the TRUS biopsy (52%-68%) (9). Directly comparing TRUS against TPTP in biopsy naïve men has also revealed TPTP to significantly outperform TRUS with respect to the detection of PCa (60% versus 32%) (10). Although MRI-based diagnosis of PCa is continuing to develop, MRI cannot currently be used as a sole diagnostic to replace biopsies, as a positive MRI can be incorrect in ~25% of cases and a negative MRI incorrect in ~20% of cases. MRI can be used on patients with a PSA of 10-20 ng/ml and ~70% of these men are currently having ‘up front’ MRI which consumes vital healthcare resources. However, MRI does have clinical utility for staging and focusing of biopsies. It is therefore essential that misdiagnosis and unnecessary procedures are reduced by the development of non-invasive approaches such as blood tests/liquid biopsies that are more accurate at detecting and categorizing the clinical risk of PCa than the PSA test.

Given the reciprocal relationship between cancer and a patient’s immune system, we proposed, and have previously shown, that the presence of PCa is reflected by detectable changes in the peripheral blood immunome. We were the first to successfully use computational modelling of multi-parametric flow cytometry data of peripheral blood T and B cells to identify phenotypic profiles (‘signatures’) which, when input into a computational machine learning tool, reliably identify the presence of PCa in asymptomatic men with PSA levels <20 ng/ml (11). Managing this group of individuals presents a particularly significant clinical quandary, because although only 30%-40% of these men will have PCa, currently all must undergo potentially unnecessary invasive prostate biopsies. For this study (11) we devised a combinatorial feature selection method to identify a unique peripheral blood immune cell phenotypic profile (‘signature’) of five T and B cell phenotypic ‘features’ which was incorporated into an interpretable machine learning model. Our approach achieved 83% accuracy, versus 77.78% for the PSA test, and decreased false positives by 12.9% (11).

Using samples from the same cohort of asymptomatic men having PSA levels <20 ng/ml, we subsequently demonstrated that incorporating eight peripheral blood natural killer (NK) cell phenotypic features into an Ensemble machine learning prediction model could also distinguish between the presence of benign prostate disease and PCa. Furthermore, and very importantly, we could also demonstrate that the machine learning model, when adapted to incorporate 32 NK cell phenotypic features, could predict the D’Amico Risk Classification (clinical risk of PCa) in those patients identified as having PCa and was thereby able to accurately differentiate between the presence of low-/intermediate-risk disease and high-risk disease without the need for additional clinical data (12). These studies used Genetic Algorithms to identify combinations of phenotypic features which were used to develop prediction models based on the k-Nearest Neighbour (kNN) classification algorithm and Ensemble machine learning prediction models (11, 12).

The phenotypic datasets utilised in our previous studies were generated from asymptomatic men who had PSA levels <20 ng/ml and who had all undergone diagnosis using the more definitive TPTP biopsy (11, 12). The aim of the study presented herein is to extend the findings of our previous studies to determine whether the T and B cell phenotypic features which have previously been identified as being able to distinguish between benign prostate disease and PCa in asymptomatic men having PSA levels < 20 ng/ml (11) can also be used to detect the presence and clinical risk of PCa in a larger cohort of patients whose PSA levels ranged between 3 and 2617 ng/ml, the PCa disease status of whom had been determined using either the TPTP or TRUS biopsy. For this, we implemented two separate bidirectional Long Short-Term Memory Deep Neural Network (biLSTM) models, one for predicting the presence of PCa and another for predicting the clinical risk of any PCa present, as defined by the D’Amico Risk Classification. Given limited sample numbers, it was not possible to undertake a similar analysis using the NK cell phenotyping dataset from our previous study (12).

Materials and Methods

Sample Collection

Peripheral blood samples were obtained from asymptomatic men suspected of having PCa that attended the Urology Clinic at Leicester General Hospital (Leicester, UK) between 24 October 2012 and 15 August 2014. Samples were obtained from two cohorts of patients, termed the ‘TPTP’ and ‘TRUS’ cohorts (see below for more details). For both cohorts, patients were recruited and treated as described previously (10).

Data Collection

Phenotypic data were generated from a total of 130 males (42 diagnosed with benign disease and 88 diagnosed with cancer, as confirmed by TPTP or TRUS biopsy evidence) (Tables 1, 2). Of the 42 subjects diagnosed with benign disease, 2 were diagnosed with Atypical Small Acinar Proliferation (ASAP), 11 with Atypia, 13 with High Grade Prostatic Intraepithelial Neoplasia (PIN) and 16 with benign disease. Of the men diagnosed with PCa, 18 had low-risk, 44 had intermediate-risk, and 25 had high-risk cancer based on their D’Amico Risk Classification for Prostate Cancer (13). The D’Amico Risk for one patient was not available as no Gleason score values were provided. Further details regarding the TRUS (14) and TPTP (9, 10) biopsy techniques have been provided elsewhere.

Table 1 Clinical demographics of cohorts.

Table 2 Clinical demographics of TRUS and TPTP biopsy cohorts.

Some of the data used in the present study have been previously published (11). These data were derived from 72 males having PSA levels < 20 ng/ml who had a TRUS and then a TPTP biopsy (11). The mean age for this cohort was 66 years old (age range of 50–84 years old).

Flow Cytometric Analysis

Peripheral blood was collected from all individuals using standard clinical procedures, aliquots of which (30 ml) were transferred into sterile 50 ml polypropylene (Falcon) tubes containing 300 µl sterilised Lithium Heparin (1000 U/ml, Merck Millipore). Anti-coagulated samples were immediately transferred to the John van Geest Cancer Research Centre at Nottingham Trent University (Nottingham, UK) and processed immediately upon receipt (always within 3 hours of collection).

Absolute cell counts in whole blood samples were determined by the inclusion of BD Trucount™ beads (BD Biosciences; Mountain View, CA, USA), as per the manufacturer’s protocol. For the flow cytometric analysis, 100 μl of blood was mixed directly in the BD Trucount™ bead tube and T cell, B cell, and NK cell populations identified using the conjugated monoclonal antibodies (mAbs) detailed in Table 3. Samples were incubated for 15 min at room temperature, protected from the light, after which erythrocytes were lysed by incubating samples for 15 min at room temperature in BD Pharm Lyse™ (BD Biosciences). Once staining was complete, cells were washed in phosphate buffered saline (PBS) and resuspended in Coulter Isoton™ diluent. Data were acquired within 1 h using a 10-color/3-laser Beckman Coulter Gallios™ flow cytometer and analyzed using Kaluza™ v1.3 data acquisition and analysis software (Beckman Coulter). Controls used a “Fluorescence Minus One” (FMO) approach (15). A typical gating strategy for the analyses is presented in Figure 1.

Table 3 Monoclonal antibody (mAb) panel for B and T cell phenotyping.

Figure 1 Representative gating strategies for the flow cytometric analysis of single cells. The staining panel confirmed CD45 expression then determined cell populations as CD14+ monocytes, CD3-CD56+ NK cells (with CD56bright and CD56dim subsets), CD3+CD56+ NKT cell subpopulations, CD19+ B cells, CD3+CD4+ and CD3+CD8+ Naïve, Central Memory, Effector Memory, Terminally Differentiated Effector Memory Cells Expressing CD45RA T cell populations. The definition of monocytes based on CD45+CD4+ generated the same data as defining them based on CD3-CD14+ (data not shown).

Deep Neural Networks for Predicting the Presence of PCa and Its Clinical Risk (biLSTM)

The bidirectional Long Short-Term Memory Deep Neural Network (biLSTM) is also known as a bidirectional Recurrent Neural Network (RNN). LSTM is an artificial recurrent neural network architecture used in the field of deep learning. Unlike the standard feedforward neural network, the LSTM has feedback connections which enable it to process entire sequences of data. A biLSTM is a type of LSTM with a bidirectional layer and learns bidirectional long-term dependencies in sequence data. The architecture of the proposed biLSTM for detecting the presence of PCa is shown in Table 4. Although the biLSTM is widely applied to sequential data, it has also been successfully applied to non-sequential data.

Table 4 Parameter settings of the Deep Learning Models.

A biLSTM model learns the input sequence both forwards and backwards and concatenates both interpretations. The model duplicates the first recurrent layer in the network to create two side-by-side layers, then provides the input sequence ‘as-is’ to the first layer and a reversed copy of the input sequence to the second (16). The training data are shuffled before each training epoch, and the validation data are shuffled before each network validation. Given that the mini-batch size does not evenly divide the number of training samples, the network discards the training data that do not fit into the final complete mini-batch of each epoch. Shuffling the data as mentioned above avoids discarding the same data at every epoch.
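
The interaction between shuffling and the discarded partial mini-batch can be illustrated with a minimal sketch. The sample count and mini-batch size used here are hypothetical (the actual settings are those listed in Table 4), and the helper name `epoch_batches` is introduced only for illustration.

```python
import random

def epoch_batches(n_samples, batch_size, rng):
    """Shuffle sample indices and keep only the complete mini-batches.

    Indices that do not fill the final mini-batch are discarded for this epoch.
    """
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_complete = n_samples // batch_size
    batches = [indices[i * batch_size:(i + 1) * batch_size] for i in range(n_complete)]
    discarded = indices[n_complete * batch_size:]
    return batches, discarded

# Hypothetical values: 10 training samples and a mini-batch size of 4,
# so 2 samples are dropped in every epoch.
rng = random.Random(0)  # arbitrary seed
dropped_per_epoch = [epoch_batches(10, 4, rng)[1] for _ in range(5)]
```

Because the indices are reshuffled before every epoch, the two dropped samples generally differ from one epoch to the next, so no sample is systematically excluded from training.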

Two biLSTM models were implemented. The first biLSTM model takes as input immunophenotypic features and clinical data and is trained to detect the presence of PCa. The second model takes as input a set of biomarkers comprising immunophenotypic features and clinical data and is trained to predict the clinical risk of the PCa when PCa has been identified as being present.

The models were built using combinations of phenotypic features and clinical data to determine the best combinations for training each model. Figure 2 shows how prediction models for detecting the presence of PCa and its clinical risk can be utilised to assist clinical diagnosis.

Figure 2 Flow chart illustrating the process to detect the presence of PCa and its clinical significance. Stage 1 (Model 1): distinguishes between men with benign prostate disease and PCa; Stage 2 (Model 2): predicts risk (in terms of clinical significance) in men identified as having PCa in Stage 1. Note that Stage 1 can also detect PCa in men with PSA levels < 20 ng/ml.

Methodology for Evaluating the Deep Neural Network Models

The dataset was initially split into datasets derived from men with benign prostate disease and patients with PCa, and each of these datasets was randomly split into ‘train’, ‘validation’ and ‘test’ datasets with a split ratio of 60:20:20, respectively. This random split process was repeated 30 times to create 30 different train, validation and testing sets. This allowed for exhaustive evaluations to be carried out using different sub-populations of the dataset for train, validation, and test purposes. The biLSTM Deep Neural Network models utilised the train sets for training, and the validation sets were utilised during the training process to improve the models’ learning. The test sets are unseen during training, and therefore the test results can be considered to represent mini clinical trials. The results at the end of the 30 runs were collected and analyzed. The methodology for evaluating the Deep Learning models is illustrated in Figure 3.
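
The repeated per-class splitting described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the rounding rule used to turn the 60:20:20 ratio into whole sample counts, the random seed, and the function name `stratified_split` are all assumptions.

```python
import random

def stratified_split(labels, rng, ratios=(0.6, 0.2, 0.2)):
    """Split sample indices into train/validation/test sets within each class."""
    splits = {"train": [], "validation": [], "test": []}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(ratios[0] * len(idx))  # assumed rounding rule
        n_val = round(ratios[1] * len(idx))
        splits["train"] += idx[:n_train]
        splits["validation"] += idx[n_train:n_train + n_val]
        splits["test"] += idx[n_train + n_val:]  # remainder goes to the test set
    return splits

# 42 benign (0) and 88 PCa (1) labels, matching the cohort sizes in the text.
labels = [0] * 42 + [1] * 88
rng = random.Random(42)  # arbitrary seed
runs = [stratified_split(labels, rng) for _ in range(30)]
```

Splitting within each class keeps the benign-to-PCa proportion approximately constant across the train, validation and test sets of every run.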

Figure 3 Experimental methodology for evaluating the Deep Learning Models.

Performance Evaluation Measures

A set of relevant metrics was adopted for evaluating the performance of the proposed biLSTM models. These models were built using six different ‘feature’ sets: FC; PSA; FC+PSA; FC+Age; FC+Age+PSA; Age+PSA. ‘FC’ stands for flow cytometry features and comprises five phenotypic features, CD8+CD45RA-CD27-CD28- (CD8+ Effector Memory cells), CD4+CD45RA-CD27-CD28- (CD4+ Effector Memory cells), CD4+CD45RA+CD27-CD28- (CD4+ Terminally Differentiated Effector Memory Cells re-expressing CD45RA), CD3-CD19+ (B cells), CD3+CD56+CD8+CD4+ (NKT cells), identified previously as being able to discriminate between benign prostate disease and PCa (11).

Let model PCaPresence be a model for detecting the presence of PCa, and PCaRisk be a model for predicting whether a patient with PCa has D’Amico high-risk (H-risk) or low/intermediate risk (LI-risk) disease.

● |TP| stands for True Positive. |TP| in a PCaPresence model is the total number of patients diagnosed with PCa who were correctly classified with PCa. |TP| in a PCaRisk model is the total number of patients diagnosed with H-risk PCa who were correctly classified with H-risk PCa.

● |TN| stands for True Negative. |TN| in a PCaPresence model is the total number of patients with benign disease who were correctly classified with benign disease. |TN| in a PCaRisk model is the total number of LI-risk patients who were correctly classified as LI-risk.

● |FP| stands for False Positive. |FP| in a PCaPresence model is the total number of patients with benign disease who were incorrectly classified with PCa. |FP| in a PCaRisk model is the total number of LI-risk patients who were incorrectly classified as H-risk.

● |FN| stands for False Negative. |FN| in a PCaPresence model is the total number of patients with PCa who were incorrectly classified with benign disease. |FN| in a PCaRisk model is the total number of H-risk patients who were incorrectly classified as LI-risk.

● |P| stands for Positive. |P| in a PCaPresence model is the total number of patients with PCa that exist in the dataset. |P| in a PCaRisk model is the total number of H-risk patients that exist in the dataset. |P|=|TP|+|FN|.

● |N| stands for Negative. |N| in a PCaPresence model is the total number of patients with benign disease that exist in the dataset. |N| in a PCaRisk model is the total number of LI-risk patients that exist in the dataset. |N|=|FP|+|TN|.

The following commonly used evaluation measures can be defined.

$$\text{Accuracy} = \frac{|TP| + |TN|}{|TP| + |FP| + |FN| + |TN|} \in [0,1] \tag{1}$$

$$\text{Sensitivity} = \frac{|TP|}{|TP| + |FN|} \in [0,1] \tag{2}$$

Sensitivity is also known as the True Positive Rate (TPR).

$$\text{Specificity} = \frac{|TN|}{|TN| + |FP|} \in [0,1] \tag{3}$$

Specificity is also known as the True Negative Rate (TNR).

$$\text{FPR} = \frac{|FP|}{|FP| + |TN|} = 1 - \text{Specificity} \in [0,1] \tag{4}$$

FPR stands for False Positive Rate.

The closer the values of Accuracy, Sensitivity (TPR) and Specificity (TNR) are to 100%, the better the performance of a model.
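
As a worked illustration of the measures defined above, the following sketch computes them from confusion-matrix counts. The counts are hypothetical and are not taken from the study's results; the function name is introduced only for this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (TPR), specificity (TNR) and FPR from confusion-matrix counts."""
    p = tp + fn  # |P|: all truly positive cases
    n = fp + tn  # |N|: all truly negative cases
    return {
        "accuracy": (tp + tn) / (p + n),
        "sensitivity": tp / p,
        "specificity": tn / n,
        "fpr": fp / n,
    }

# Hypothetical counts for a single test split.
m = classification_metrics(tp=15, tn=8, fp=1, fn=2)
```

With these counts, sensitivity is 15/17, specificity is 8/9, and the FPR of 1/9 equals 1 minus the specificity, as expected from the definitions.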

The Receiver Operating Characteristic (ROC) evaluates the quality of a prediction model’s performance. The ROC curve has an optimal ROC point which comprises two values: the FPR and the TPR values. The optimal ROC point is found using the slope, S, given by Equation (5).

$$S = \frac{\text{Cost}(P|N) - \text{Cost}(N|N)}{\text{Cost}(N|P) - \text{Cost}(P|P)} \times \frac{N}{P}, \tag{5}$$

where, in a PCaPresence detection model, the positive class is the class containing patients with PCa and the negative class is the class containing men having benign prostate disease. In the PCaRisk prediction model, the positive class is the H-risk group and the negative class is the class containing the records of the patients belonging to the low- and intermediate-risk group (LI-risk). Cost(N|P) is the cost of misclassifying a positive class as a negative class, and Cost(P|N) is the cost of misclassifying a negative class as a positive class. N and P are the total numbers of negative and positive cases (|N| and |P| above).

The optimal ROC point is identified by moving the straight line with slope S from the upper left corner of the ROC plot (FPR=0%, TPR=100%) down and to the right until it intersects the ROC curve. The Area Under the ROC Curve (AUC) is another important performance evaluation metric which reflects the capacity of a model to discriminate between the data obtained from individuals with benign prostate disease and patients with PCa. The larger the AUC, the better the overall capacity of the classification system to correctly distinguish between benign disease and PCa.
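
Lowering a line of slope S from the upper left corner until it touches the ROC curve is equivalent to choosing the ROC point with the largest intercept, TPR - S × FPR. The sketch below illustrates this under stated assumptions: the ROC points are hypothetical, the misclassification costs are assumed to be equal (1 for an error, 0 for a correct decision), and the function names are introduced only for illustration.

```python
def roc_slope(cost_pn, cost_nn, cost_np, cost_pp, n_neg, n_pos):
    """Slope S of Equation (5): cost ratio scaled by the class ratio N/P."""
    return (cost_pn - cost_nn) / (cost_np - cost_pp) * n_neg / n_pos

def optimal_roc_point(roc_points, slope):
    """Return the (FPR, TPR) point first touched by a line of the given slope
    lowered from the upper-left corner, i.e. the point that maximises
    the intercept TPR - slope * FPR."""
    return max(roc_points, key=lambda pt: pt[1] - slope * pt[0])

# Hypothetical ROC curve as (FPR, TPR) pairs; class sizes follow the cohort (42 benign, 88 PCa).
roc = [(0.0, 0.0), (0.05, 0.60), (0.10, 0.85), (0.30, 0.90), (1.0, 1.0)]
s = roc_slope(cost_pn=1, cost_nn=0, cost_np=1, cost_pp=0, n_neg=42, n_pos=88)
orp_fpr, orp_tpr = optimal_roc_point(roc, s)
```

In this example the selected point supplies the two values reported in the results as ORP-FPR and ORP-TPR.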

Pre-Processing of Dataset

The dataset comprised 7 features, 5 of which were peripheral blood flow cytometric T and B cell phenotyping features identified in our previous study (11) and the remaining two of which were the clinical features PSA level and Age. The five phenotypic features were: CD8+CD45RA-CD27-CD28- (CD8+ Effector Memory cells), CD4+CD45RA-CD27-CD28- (CD4+ Effector Memory cells), CD4+CD45RA+CD27-CD28- (CD4+ Terminally Differentiated Effector Memory Cells re-expressing CD45RA), CD3-CD19+ (B cells), CD3+CD56+CD8+CD4+ (NKT cells). The data for each immune phenotyping feature were standardized using z-score transformation, which rescales the values of a feature to a common standard: a mean of zero and a standard deviation of 1. The PSA and Age values were not standardized (Table 5).
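
The z-score transformation can be sketched as follows. The input values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, as the text does not state which form was used.

```python
import statistics

def z_scores(values):
    """Standardise values so that they have mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption, see lead-in
    return [(v - mean) / sd for v in values]

# Hypothetical measurements for one phenotypic feature.
standardised = z_scores([120.0, 95.0, 210.0, 160.0, 140.0])
```

Each feature is transformed independently, so features measured on different scales contribute comparably to model training.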

Table 5 Dataset statistics.

The Kolmogorov-Smirnov and Shapiro-Wilk statistical tests demonstrated that the data are not normally distributed and non-parametric tests were therefore used for the analyses (Table 6).

Table 6 Tests for normal distribution in data.

Results

Differences in Measured Features Between Men With Benign Prostate Disease and Patients With PCa

The Mann-Whitney U test revealed no significant differences between the groups for the flow cytometry features at the α = 0.05 level, but that the age and PSA levels in men with benign disease and those with PCa were different (p<0.05, Table 7). Given that there are 7 comparisons, the Bonferroni correction was applied and the α value was set to α = 0.007 (0.05/7) to reduce Type I error. Using the adjusted α value revealed that there were no significant differences between the values of the features of the benign and PCa groups.
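
The arithmetic of the Bonferroni adjustment can be illustrated with a short sketch. The p-value below is hypothetical and is not read from Table 7; it is chosen only to show how a result can be significant at 0.05 but not at the adjusted threshold.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under the Bonferroni correction."""
    return alpha / n_comparisons

adjusted_alpha = bonferroni_alpha(0.05, 7)  # 0.05 / 7, approximately 0.007

# Hypothetical p-value for illustration.
p_value = 0.012
significant_unadjusted = p_value < 0.05
significant_adjusted = p_value < adjusted_alpha
```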

Table 7 Statistical tests for checking on significant differences between groups.

Table 7 also reports the effect size, which is the magnitude of the difference between groups and is computed using r = Z/√N, where Z is the output of the Mann-Whitney U Test, and N is the total number of samples. According to Cohen (17), the effect size is low if the value of r varies around 0.1, medium if r varies around 0.3, and large if r exceeds 0.5. This means that if the values of two groups do not differ by 0.2 standard deviations or more, then the difference is trivial, even if it is statistically significant. Hence, it can be concluded that there are no statistically significant differences between the features indicated in Table 7 in the benign prostate disease and PCa groups, and that any differences that do exist are small and trivial.
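
As a worked example of the effect size formula, the sketch below uses the Mann-Whitney Z value reported for Age later in the text (Z = -2.515) and assumes that N is the full cohort of 130 men. Taking the absolute value of Z is also an assumption about how the magnitude is reported.

```python
import math

def effect_size_r(z, n):
    """Effect size r = |Z| / sqrt(N) for a Mann-Whitney U test."""
    return abs(z) / math.sqrt(n)

# Z for Age as reported in the text; N = 130 is an assumption (all men included).
r_age = effect_size_r(-2.515, 130)
```

Under these assumptions r is approximately 0.22, which lies between Cohen's low (0.1) and medium (0.3) reference values.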

The nonparametric Spearman’s rank-order correlation shows there to be no strong positive or strong negative correlations amongst the features which will be utilised to build the machine learning classifier (Figure 4).

Figure 4 Heatmap of flow cytometry and other features. Each cell of the heatmap provides a Spearman rho correlation value between two features. There are no strong positive or strong negative correlations amongst the inputs.

TPTP vs TRUS: Differences in Patient Profiles

TPTP is significantly better at diagnosing PCa than TRUS biopsies in biopsy naïve men with an elevated PSA <20 ng/ml and a benign feeling prostate (10). Nafie et al. have therefore proposed that TPTP should be regarded as the biopsy technique of choice in such cases (10).

The Kruskal-Wallis H test is an extension of the Mann-Whitney U test, is the nonparametric equivalent of the one-way analysis of variance and detects differences in distribution location. The major difference between the Mann-Whitney U and the Kruskal-Wallis H is simply that the latter can accommodate more than two groups. Both tests require independent (between-subjects) designs and use summed rank scores to determine the results. Therefore, for the analysis in this subsection the Kruskal-Wallis H test was suitable. The Kruskal-Wallis H (also known as the ‘one-way ANOVA on ranks’) rank-based nonparametric test was used to determine whether there are any statistically significant differences between the immunophenotypic profiles of the patients when grouped based on biopsy methods and diagnosis. Therefore, a new variable was created, BiopsyDiagnosis, where the biopsy type (i.e. TPTP or TRUS) and diagnosis (i.e. benign prostate disease or PCa) were merged into four separate labels: TPTPBenign, TRUSBenign, TPTPCancer, TRUSCancer, and the Kruskal-Wallis test was applied to determine significant differences between the TPTPBenign and TRUSBenign groups, and between the TPTPCancer and TRUSCancer patient groups. Table 8 shows the characteristics of each group of subjects and Table 9 the results of the Kruskal-Wallis H test which was applied to determine differences between the ranks of the abovementioned groups.
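
The summed-rank calculation behind the Kruskal-Wallis H statistic can be sketched as follows. This is a simplified illustration, not the statistical software used in the study: it omits the correction for ties, the p-value helper is valid only for a two-group comparison (1 degree of freedom), and the feature values are hypothetical.

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction) for a list of groups."""
    pooled = sorted(v for g in groups for v in g)
    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

def p_value_two_groups(h):
    """Upper-tail chi-square probability with 1 degree of freedom (two groups only)."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2))

# Hypothetical standardised values of one feature in two biopsy groups.
tptp_benign = [0.12, -0.40, 0.85, 0.30, 1.10]
trus_benign = [0.05, -0.65, 0.90, 0.42, -0.95, 0.10]
h = kruskal_wallis_h([tptp_benign, trus_benign])
p = p_value_two_groups(h)
```

For these hypothetical values H is 1.2 and the p-value is well above any of the thresholds discussed here, which is the pattern that would support pooling the two groups.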

Table 8 Patients by biopsy group.

Table 9 Statistical tests for checking on significant differences between biopsy groups and diagnosis.

The α level for these tests was initially set to 0.05. Applying a Bonferroni correction to reduce the chance of a false positive (i.e. a Type I error) reduced the α value to 0.005, since there exist 10 possible comparisons. As shown in Table 9, the absence of any significant differences (Asymp. Sig) between any of the immunophenotyping features of the TPTPBenign and TRUSBenign patients is a good indicator that data collected during TPTP and TRUS biopsy can be combined when training a machine learning model.

Results of the Deep Learning Models for Identifying the Presence of PCa

The performance of various biLSTM Deep Neural Network models (whose architecture is described above) for predicting the presence of PCa when using six different subsets of features was assessed. Table 10 shows the training, validation, and test results of the models. Table 10 shows that the FC+Age was the best model, achieving an accuracy of 86.92% on the validation set, and 86.79% on the test set. More specifically, the model was able to detect the presence of PCa in the validation set with Acc: 86.92% (± 0.10), Sensitivity: 83.70% (± 0.16); Specificity: 94.17% (± 0.11); AUC: 88.94% (± 0.07), ORP-FPR: 9.17% (± 0.20), ORP-TPR: 85.74% (± 0.14) (Table 10). Results from the test set (set not used during training or validation) were Acc: 86.79% (± 0.10), Sensitivity: 82.78% (± 0.15); Specificity: 95.83% (± 0.11); AUC: 89.31% (± 0.07), ORP-FPR: 7.50% (± 0.20), ORP-TPR: 84.44% (± 0.14).

Table 10 Results of the biLSTM Deep Neural Network models for predicting the presence of PCa.

The validation results for predicting the presence of PCa using PSA revealed a 27.91% lower ORP-FPR when combining FC+PSA than when using PSA alone. For the test results, the ORP-FPR was 20.83% lower when combining FC+PSA than when using PSA alone. The standard deviation values of the FC+PSA were lower indicating a more stable model.
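To make the reported metrics concrete, the sketch below shows one way accuracy, sensitivity and specificity can be derived from binary predictions, and how a false positive rate and true positive rate can be read off a ROC curve at a chosen operating point. ORP-FPR and ORP-TPR are interpreted here as the rates at such an operating point. The selection rule used below (maximising TPR minus FPR) is an assumption made for illustration and is not necessarily the rule used in this study; all function names, labels and scores are hypothetical.

```python
def confusion_rates(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels coded 1 (PCa) and 0 (benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_points(y_true, scores):
    """(threshold, FPR, TPR) for every distinct score used as a cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def operating_point(points):
    """Assumed rule: pick the ROC point with the largest TPR - FPR."""
    return max(points, key=lambda point: point[2] - point[1])


# Hypothetical labels, hard predictions and model scores (illustrative only).
rates = confusion_rates([1, 1, 0, 0], [1, 0, 0, 0])
threshold, orp_fpr, orp_tpr = operating_point(
    roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
)
```

A lower FPR at the operating point, as reported for FC+PSA relative to PSA alone, means that fewer men with benign disease would be flagged at the chosen cut-off.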

The Role of Age and Its Impact on Predicting the Presence of PCa

Combining Age with the immunophenotypic features improved prediction accuracy, and age therefore appears to be a good predictor of the presence of PCa when combined with the flow cytometry phenotypic features. As the correlation chart in Figure 4 shows no strong positive or strong negative correlations between age and the rest of the features, including diagnosis, we can rule out the possibility that correlation is biasing the models’ predictions. However, a further statistical investigation was undertaken to determine whether age is biasing the performance of the prediction models. The two-sample Kolmogorov-Smirnov test, a nonparametric hypothesis test, was applied to test whether the variable age has identical distributions in the two populations (i.e. the benign prostate disease and PCa groups). Note that this differs from the results shown in Table 6, which check whether the variables are normally distributed, and not whether the two groups follow the same distribution. Figure 5 shows the distribution of Age values across the benign prostate disease and PCa groups.

Figure 5 Age of men with benign prostate disease and patients with prostate cancer (PCa).

The α value was set to 0.01 to minimize Type I error. The test returned p=0.033, Z=1.431; as p>0.01, the hypothesis that the samples from the benign prostate disease and PCa groups are from the same continuous distribution is not rejected at the 1% significance level. The next step was to determine whether there are any significant differences in the mean age ranks of these two groups that could be biasing the prediction. The α value was again set to 0.01 to prevent Type I errors and make it harder to declare a difference significant. As the Mann-Whitney test returned p=0.012, Z=-2.515, we can assume that there are no significant differences in the mean ranks of age at the 1% significance level.
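The relationship between the Z statistics and p-values quoted above can be checked with a short calculation. The sketch below assumes that the Kolmogorov-Smirnov Z is the scaled statistic D·sqrt(n1·n2/(n1+n2)) evaluated against the asymptotic Kolmogorov distribution, and that the Mann-Whitney Z is evaluated against a two-sided standard normal distribution. Both are assumptions about how the statistics were produced, and the function names are hypothetical.

```python
import math


def ks_asymptotic_p(z, terms=100):
    """Two-sided asymptotic p-value of the Kolmogorov distribution for a scaled statistic z."""
    total = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * z * z)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2 * total))


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


alpha = 0.01
ks_p = ks_asymptotic_p(1.431)      # about 0.033
mw_p = normal_two_sided_p(-2.515)  # about 0.012

# Neither p-value falls below alpha, so neither null hypothesis is rejected.
reject_same_distribution = ks_p < alpha
reject_equal_mean_ranks = mw_p < alpha
```

Under these assumptions both p-values agree with the values reported in the text to three decimal places, and both remain above the 1% threshold.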

However, to further ensure the correct conclusions are reached, the Moses Test of extreme reaction was carried out to recompute the differences between groups when the extreme outliers are not considered. The test is a distribution-free non-parametric test of the difference between two independent groups in the extremity scores (in both directions) that the groups contain. For the benign prostate disease and PCa groups, Moses tests whether extreme values are equally likely in both populations, or if they are more likely to occur in the population from which the sample with the larger range was drawn. The scores from the benign prostate disease and PCa groups are pooled and converted to ranks, and the test statistic is the span of scores (computed as the range plus 1) in one of the groups chosen arbitrarily. An exact probability is computed for the span and then recomputed after dropping a specified number of extreme scores from each end of its range. The exact one-tailed probability is calculated. After trimming the entire dataset, there were 5 patients ≤54 years old and 6 patients ≥84 years old. The information for the Moses extreme reaction test shows that the benign prostate disease and PCa groups have different age values with a Sig. = 0.006. However, when removing the extreme outliers, the Sig. value increases to Sig. = 0.082 hence the two groups have similar age values when the extreme outliers are removed.
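The span statistic described above can be sketched as follows. This minimal example pools the two groups, converts the values to ranks, and computes the span (range of ranks plus 1) of one group before and after dropping a specified number of extreme scores from each end. It does not compute the exact one-tailed probability, ties are broken arbitrarily (the first group is ranked first), and the ages shown are hypothetical rather than study data.

```python
def moses_span(control, experimental, trim=0):
    """Span of the control group's ranks in the pooled sample.

    trim is the number of extreme control scores dropped from each end
    before the span is computed.
    """
    if len(control) <= 2 * trim:
        raise ValueError("trim removes every control observation")
    # Tag each value with its group (0 = control, 1 = experimental) and sort.
    pooled = sorted([(v, 0) for v in control] + [(v, 1) for v in experimental])
    # Ordinal ranks of the control observations, already in ascending order.
    control_ranks = [
        rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0
    ]
    kept = control_ranks[trim:len(control_ranks) - trim]
    return kept[-1] - kept[0] + 1


# Hypothetical ages (illustrative only).
benign_ages = [50, 60, 61, 62, 90]
pca_ages = [55, 58, 63, 64, 65, 70]

full_span = moses_span(benign_ages, pca_ages)             # uses all control ranks
trimmed_span = moses_span(benign_ages, pca_ages, trim=1)  # one score dropped per end
```

In this toy example the span shrinks sharply once a single extreme value is removed from each end, which mirrors the way the reported significance changes when extreme outliers are excluded.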

Revisiting the results in Table 7, and based on the observations described in this section, we can conclude that the algorithms are not biased towards age, and that age in combination with the immunophenotypic features forms a good predictor for the presence of PCa. It is important to mention that if age was biasing the output of the prediction model, then other machine learning models (FC+Age+PSA, Age+PSA) would have consistently delivered high prediction results, because machine learning models excel at detecting patterns in data and would have found the association (pattern) between the Age and the output variable (diagnosis) if this had existed. Based on these observations, it can be concluded that age is not biasing the output of the prediction model.

Results of the Deep Learning Models for Predicting the Clinical Risk of PCa

Men diagnosed with low-risk or small volume intermediate-risk PCa will very rarely require treatment compared to men who have been diagnosed with high-risk PCa. It is therefore important to detect men in the H-risk group accurately to prioritise treatment for those men, and to prevent unnecessary invasive procedures.

Consequently, we determined whether biLSTM models can differentiate between the clinical risk of PCa using the same features as those which have been used for building the models for predicting the presence of PCa (Table 11). Given that there are 85 patients having low-risk (n=18), intermediate-risk (n=44) or high-risk (n=25) PCa, patients were grouped into L/I (low-intermediate) and H (high) risk groups. The biLSTM model that was designed for this task was then used to predict risk (L/I or H). The results in Table 11 show that the model which combined the flow cytometry features with PSA was able to predict clinical risk in the validation set with Acc: 94.51% (± 6.35), Sensitivity: 92% (± 21.09); Specificity: 95.56% (± 2.99); AUC: 93.78% (± 10.50), ORP-FPR: 4.44% (± 2.99), ORP-TPR: 92% (± 21.09). The results on the test set were Acc: 94.90% (± 6.29), Sensitivity: 92% (± 21.39); Specificity: 96.11% (± 0.00); AUC: 94.06% (± 10.69), ORP-FPR: 3.89% (± 0.00), ORP-TPR: 92% (± 21.39). These results are a positive indicator, and it is expected that with a larger dataset the model will be able to learn better and the standard deviation values will decrease. Compared with the model which uses PSA values alone, the FC+PSA model returned better validation and test results. It therefore appears that PSA is a good predictor of clinical risk when combined with FC values.

Table 11 Results of the biLSTM deep neural network models for predicting the D’Amico risk of PCa.

Experimental results described above using models for detecting the presence of PCa found that Age is a feature which, when combined with the immunophenotypic profiling, delivers a greater predictive accuracy than when it is used alone. Here, we follow a similar analysis for interrogating the impact of PSA. As indicated in the correlation chart (Figure 4), there are no strong positive or strong negative correlations between PSA and the other features.

The two-sample Kolmogorov-Smirnov test was applied to check for identical distributions in the two populations (i.e. benign prostate disease and PCa). The test returned p=0.033, Z=1.431; as p>0.01, the hypothesis that the samples from the benign prostate disease and the PCa groups are from the same continuous distribution is not rejected at the 1% significance level. Next, we determined whether significant differences in the mean ranks of the benign prostate disease and PCa groups could be biasing the prediction. The α value was set to 0.01 to prevent Type I errors and make it harder to declare a difference significant. The Mann-Whitney test returned p=0.007, Z=-2.717. Therefore, it can be assumed that there are significant differences in the mean ranks of PSA at the 1% significance level. These results show that PSA could be influencing the predicted risk of disease, which makes clinical sense given that high-risk patients often (but not always) have higher PSA values than low- and intermediate-risk patients (Table 11).

Discussion

It is essential that men with low-risk prostate abnormalities are not diagnosed with PCa, as those with low-grade disease do not require active treatment, yet they become ‘labelled’ as having PCa. This can have adverse psychological and financial consequences and assign these men to life-long surveillance. Inappropriate assignment of men to potentially life-threatening invasive procedures and lifelong surveillance for PCa has significant psychological, quality of life, financial and societal consequences. Although the diagnosis of PCa based on PSA levels alone is not reliable, combining PSA measurements with other approaches might strengthen their value for diagnosing PCa and identifying its clinical risk, and it is on this concept that the current study is based.

Given the established reciprocal relationship(s) between cancers and the immune system, we have previously demonstrated (11) that a set of five phenotypic features (CD8+CD45RA-CD27-CD28- (CD8+ Effector Memory cells), CD4+CD45RA-CD27-CD28- (CD4+ Effector Memory cells), CD4+CD45RA+CD27-CD28- (CD4+ Terminally Differentiated Effector Memory Cells re-expressing CD45RA), CD3-CD19+ (B cells), CD3+CD56+CD8+CD4+ (NKT cells)) could be used to identify the presence of PCa in a population of asymptomatic men with PSA levels that were elevated above the normal, but <20 ng/ml (‘normal’ is ~5 ng/ml), a population which presents a significant clinical challenge. In a subsequent study we identified an NK cell phenotypic signature which can be used to identify both the presence and clinical risk of PCa in the same cohort of asymptomatic men (12).

Herein, we explored whether this T and B cell phenotypic signature can be incorporated into models that can predict the presence and clinical risk of PCa in men having elevated PSA values of any level, and whose disease status had been defined using the TRUS and TPTP biopsy. Given limited sample numbers, it was not possible to undertake a similar analysis using the NK cell phenotyping dataset. For this, we built two prediction models: the first to detect the presence of PCa and the second to predict the clinical risk of any PCa present in asymptomatic men with raised PSA values, not just those < 20 ng/ml. Although this signature alone was not suitable for detecting the presence of PCa or its clinical risk in a population of men having PSA values <20 ng/ml, this T and B cell phenotypic signature can be used to build highly accurate machine learning models for predicting the presence of PCa (when combined with Age) and the clinical risk of any PCa which is present (when combined with PSA levels).

Using a set of immunophenotyping biomarkers combined with basic clinical data we have shown it to be possible to develop machine learning models which can predict the presence of PCa and its clinical significance, without the need for invasive biopsies. Inserting the data derived from the analysis of the peripheral blood from an individual into the proposed tool will return a prediction about that individual. The proposed models are based on machine learning methods which can be continually retrained as more patient data are collected to learn patterns from a larger population - this will further increase performance. We expect that the proposed approaches will spare men with benign prostate disease or low-risk PCa from unnecessary invasive diagnostic procedures such as TRUS or TPTP biopsy. We expect that these new approaches could avoid up to 70% of prostate biopsies, thereby sparing men with benign disease or low-risk PCa from unnecessary biopsy and significantly reduce under- and over-diagnosis.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: Mendeley Data: doi: 10.17632/wmgtzw2w8f.1 (18).

Ethics Statement

Research Protocols were registered and approved by the National Research Ethics Service Committee of East Midlands and by the Research and Development Department in the University Hospitals of Leicester NHS Trust. All participants were given information sheets explaining the nature of the study and all provided informed consent. Ethical approval for the collection and use of samples from the TPTP cohort (Project Title: Defining the role of Transperineal Template-guided prostate biopsy) was given by NRES Committee East Midlands – Derby 1 (NREC Reference number: 11/EM/3012; UHL11068). Ethical approval for the collection and use of samples from the TRUS cohort (Project title: A pilot study to identify gene fusions in Prostate Cancer) was given by NRES Committee East Midlands – Derby 2 (NREC Reference number: 09/H0401/92; UHL 10856). The patients/participants provided their written informed consent to participate in this study.

Author Contributions

GC computationally analyzed the flow cytometry data, prepared and tested the algorithms, analyzed the results, wrote the first draft, and made a significant contribution to the preparation of the manuscript. SM contributed to the preparation, staining and analysis of the flow cytometry data, and generated the multidimensional flow cytometry datasets on which the study has been based. SR, GF, CJ, and SH contributed to the preparation, staining, and analysis of the flow cytometry data, and generated the multidimensional flow cytometry datasets on which the study has been based. MK identified the clinical need, provided access to clinical samples and clinical data, and made a significant contribution to the preparation of the manuscript. AP conceived the study and made a significant contribution to the interpretation of the data and the preparation of the manuscript. All authors contributed to the article and approved the submitted version.

Funding

The authors acknowledge the financial support of the John and Lucille van Geest Foundation and the Healthcare and Bioscience iNet, an ERDF funded initiative managed by Medilink East Midlands. GC acknowledges the financial support of The Leverhulme Trust (Research Project Grant RPG-2016-252). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Stamey TA, Yang N, Hay AR, McNeal JE, Freiha FS, Redwine E. Prostate-Specific Antigen as a Serum Marker for Adenocarcinoma of the Prostate. N Engl J Med (1987) 317:909–16. doi: 10.1056/NEJM198710083171501

2. Catalona WJ, Smith DS, Ratliff TL, Dodds KM, Coplen DE, Yuan JJ, et al. Measurement of Prostate-Specific Antigen in Serum as a Screening Test for Prostate Cancer. N Engl J Med (1991) 324:1156–61. doi: 10.1056/NEJM199104253241702

3. Martin RM, Donovan JL, Turner EL, Metcalfe C, Young GJ, Walsh EI, et al. Effect of a Low-Intensity PSA-Based Screening Intervention on Prostate Cancer Mortality: The CAP Randomized Clinical Trial. JAMA (2018) 319:883–95. doi: 10.1001/jama.2018.0154

4. Hugosson J, Roobol MJ, Mansson M, Tammela TLJ, Zappa M, Nelen V, et al. A 16-Yr Follow-Up of the European Randomized Study of Screening for Prostate Cancer. Eur Urol (2019) 76:43–51. doi: 10.1016/j.eururo.2019.02.009

5. Serag H, Banerjee S, Saeb-Parsy K, Irving S, Wright K, Stearn S, et al. Risk Profiles of Prostate Cancers Identified From UK Primary Care Using National Referral Guidelines. Br J Cancer (2012) 106:436–9. doi: 10.1038/bjc.2011.596

6. Raaijmakers R, Kirkels WJ, Roobol MJ, Wildhagen MF, Schröder FH. Complication Rates and Risk Factors of 5802 Transrectal Ultrasound-Guided Sextant Biopsies of the Prostate Within a Population-Based Screening Program. Urology (2002) 60:826–30. doi: 10.1016/s0090-4295(02)01958-1

7. Carlson WH, Bell DG, Lawen JG, Rendon RA. Multi-Drug Resistant E.Coli Urosepsis in Physicians Following Transrectal Ultrasound Guided Prostate Biopsies - Three Cases Including One Death. Can J Urol (2010) 17:5135–7.

8. Djavan B, Ravery V, Zlotta A, Dobronski P, Dobrovits M, Fakhari M, et al. Prospective Evaluation of Prostate Cancer Detected on Biopsies 1, 2, 3 and 4: When Should We Stop? J Urol (2001) 166:1679–83. doi: 10.1016/S0022-5347(05)65652-2

9. Nafie S, Pal RP, Dormer JP, Khan MA. Transperineal Template Prostate Biopsies in Men With Raised PSA Despite Two Previous Sets of Negative TRUS-Guided Prostate Biopsies. World J Urol (2014) 32:971–5. doi: 10.1007/s00345-013-1225-x

10. Nafie S, Mellon JK, Dormer JP, Khan MA. The Role of Transperineal Template Prostate Biopsies in Prostate Cancer Diagnosis in Biopsy Naïve Men With PSA Less Than 20 Ng.Ml-1. Prostate Cancer Prostatic Dis (2014) 17:170–3. doi: 10.1038/pcan.2014.4

11. Cosma G, McArdle SE, Reeder S, Foulds GA, Hood S, Khan M, et al. Identifying the Presence of Prostate Cancer in Individuals With PSA Levels <20 Ng.Ml–1 Using Computational Data Extraction Analysis of High Dimensional Peripheral Blood Flow Cytometric Phenotyping Data. Front Immunol (2017) 8:1771. doi: 10.3389/fimmu.2017.01771

12. Hood SP, Cosma G, Foulds GA, Johnson C, Reeder S, McArdle SE, et al. Identifying Prostate Cancer and Its Clinical Risk in Asymptomatic Men Using Machine Learning of High Dimensional Peripheral Blood Flow Cytometric Natural Killer Cell Subset Phenotyping Data. Elife (2020) 9. doi: 10.7554/eLife.50936

13. D'Amico AV, Whittington R, Malkowicz SB, Schultz D, Blank K, Broderick GA, et al. Biochemical Outcome After Radical Prostatectomy, External Beam Radiation Therapy, or Interstitial Radiation Therapy for Clinically Localized Prostate Cancer. JAMA (1998) 280:969–74. doi: 10.1001/jama.280.11.969

14. Bjurlin MA, Taneja SS. Standards for Prostate Biopsy. Curr Opin Urol (2014) 24:155–61. doi: 10.1097/MOU.0000000000000031

15. Cossarizza A, Chang HD, Radbruch A, Acs A, Adam D, Adam-Klages S, et al. Guidelines for the Use of Flow Cytometry and Cell Sorting in Immunological Studies (Second Edition). Eur J Immunol (2019) 49:1457–973. doi: 10.1002/eji.201970107

16. Schuster M, Paliwal KK. Bidirectional Recurrent Neural Networks. IEEE Trans Signal Process (1997) 45:2673–81. doi: 10.1109/78.650093

17. Cohen J. Statistical Power Analysis. Curr Dir Psychol Sci (1992) 1:98–101. doi: 10.1111/1467-8721.ep10768783

18. Pockley AG, Cosma G, McArdle SEM, Foulds GA, Hood SP, Reeder S, et al. Deep Machine Learning of High Dimensional Peripheral Blood Flow Cytometric Phenotyping Data for Identifying Prostate Cancer and Its Clinical Risk in Asymptomatic Men, Mendeley Data. (2021). doi: 10.17632/wmgtzw2w8f.1

Keywords: prostate cancer, predictive modeling, immunophenotyping data, flow cytometry, PSA level, computational analysis, machine learning

Citation: Cosma G, McArdle SE, Foulds GA, Hood SP, Reeder S, Johnson C, Khan MA and Pockley AG (2021) Prostate Cancer: Early Detection and Assessing Clinical Risk Using Deep Machine Learning of High Dimensional Peripheral Blood Flow Cytometric Phenotyping Data. Front. Immunol. 12:786828. doi: 10.3389/fimmu.2021.786828

Received: 30 September 2021; Accepted: 11 November 2021;
Published: 16 December 2021.

Edited by:

Antonio Cosma, Luxembourg Institute of Health, Luxembourg

Reviewed by:

Palak Sekhri, George Washington University, United States
Abdullah Demirtas, Erciyes University, Turkey

Copyright © 2021 Cosma, McArdle, Foulds, Hood, Reeder, Johnson, Khan and Pockley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Georgina Cosma, g.cosma@lboro.ac.uk; A. Graham Pockley, graham.pockley@ntu.ac.uk

†Present address: Simon P. Hood, Cancer Research UK Manchester Institute, University of Manchester, Manchester, United Kingdom

‡These authors have contributed equally to this work
