ORIGINAL RESEARCH article

Front. Mol. Biosci., 19 January 2023

Sec. Biological Modeling and Simulation

Volume 10 - 2023 | https://doi.org/10.3389/fmolb.2023.1094321

A Boolean-based machine learning framework identifies predictive biomarkers of HSP90-targeted therapy response in prostate cancer

  • 1. Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia

  • 2. Cancer Program, Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia

  • 3. South Australian Immunogenomics Cancer Institute and Freemasons Foundation Centre for Men’s Health, University of Adelaide, Adelaide, SA, Australia

  • 4. South Australian Health and Medical Research Institute, Adelaide, SA, Australia

Abstract

Precision medicine has emerged as an important paradigm in oncology, driven by the significant heterogeneity of individual patients’ tumour. A key prerequisite for effective implementation of precision oncology is the development of companion biomarkers that can predict response to anti-cancer therapies and guide patient selection for clinical trials and/or treatment. However, reliable predictive biomarkers are currently lacking for many anti-cancer therapies, hampering their clinical application. Here, we developed a novel machine learning-based framework to derive predictive multi-gene biomarker panels and associated expression signatures that accurately predict cancer drug sensitivity. We demonstrated the power of the approach by applying it to identify response biomarker panels for an Hsp90-based therapy in prostate cancer, using proteomic data profiled from prostate cancer patient-derived explants. Our approach employs a rational feature section strategy to maximise model performance, and innovatively utilizes Boolean algebra methods to derive specific expression signatures of the marker proteins. Given suitable data for model training, the approach is also applicable to other cancer drug agents in different tumour settings.

Introduction

Precision treatment has become an important treatment modality in oncology, where the molecular makeup of patients’ tumour dictates therapeutic decisions. Identifying predictive biomarkers of treatment response that aid stratification of patients is critical for effective deployment of personalized oncology (; ; ). However, for most existing cancer drug agents including those that have been clinically approved (; Zhong et al., 2021), we currently lack companion predictive biomarkers that can reliably predict treatment response and inform patient selection. Thus, identification of predictive response biomarkers for cancer therapies represents a broad and unmet clinical need.

A major challenge that complicates the identification of response biomarkers is the multi-factorial determinant of cellular response to drug treatment, which is further accentuated by the extensive tumour heterogeneity between patients (Turajlic et al., 2019; Xi et al., 2019; ). Consequently, except for a few notable cases of clinical success (; ), single-gene biomarkers are insufficient for predicting treatment responses and unlikely to be clinically useful (). Instead, multi-gene biomarker panels are more likely to capture the complexity underpinning tumour drug response, and deliver better prediction (Zhu et al., 2011; ). Biomarker discovery approaches therefore should explicitly model combinations of relevant marker genes/proteins.

Computational methods have been key in the derivation of response biomarkers for cancer therapeutics (; ; ). A simple but commonly used approach is to identify genes (or proteins) that are differentially expressed between treatment-sensitive and -resistant groups using–omics data such as transcriptomic or proteomic data (; ). However, the degree of differential expression of a gene (based on fold-change and/or p-value) is not a good indicator of its predictive power towards treatment responsiveness. Moreover, the lists of DEGs are typically very long, and without further analysis to prioritize and narrow them down, the applicability of DEGs-based approaches remains limited. Recently, more sophisticated approaches such as machine learning (ML) have been applied to the biomarker discovery domain (; ; ; ).

Yet, the ‘curse of dimensionality’ widespread in pharmacogenomics data - where the number of molecular features often far exceeds the number of biological samples - necessitates the development of feature selection strategies for ML algorithms (; ; .). For example, Ballester and others have developed a scheme termed Optimal Model Complexity (OMC) aimed at identifying a smaller subset of informative features from the much larger original feature space, and integrated OMC with various ML algorithms (; ; ). OMC works by ranking the features using the p-values obtained from univariate statistical tests to correlate between each feature and the corresponding labels (e.g., IC50 values of treated drugs), thereby pinpointing the most relevant features prior to model training (). OMC-based XGBoost was employed to predict cancer drug response using pharmacogenomic data derived from either cancer cell lines (Yang et al., 2013; ), or cancer patient-derived xenografts (; ). In addition, Bomane et al. has also applied OMC to other ML algorithms, including Random Forest and LightGBM, to predict response to paclitaxel treatment in breast cancer (). On the other hand, selected potentially informative molecular genes for predicting cancer drug response by analysing the variance in gene expression profiles using cell lines based pharmacogenomic datasets. To predict cancer patient response to chemotherapeutic drugs, employed standard recursive feature elimination method to select for most relevant features (gene expression data) and applied it on top of support vector machine algorithm. Other studies utilise knowledge-based approaches to select likely relevant subsets of features: for example, by leveraging the protein-protein interaction network surrounding the drug targets (), or restricting those to genes in the cancer gene census set (). However, due to lack of relevant patient-derived pharmacogenomic data, most ML studies to date have been performed using panels of cancer cell lines (; ; ; ), which do not necessarily reflect the heterogeneity and drug sensitivity in human tumours (; ).

In this study, we have developed a generally applicable machine learning framework for identification of multi-gene predictive biomarker panel and associated expression signatures for anti-cancer drugs. The approach comprises two phases (Figure 1). The first is to identify optimal biomarker panels that predict drug response using ML coupled with a new feature selection strategy. The second is to derive expression signatures of the identified biomarkers for different response groups utilizing a new Boolean function minimization-based pipeline. We applied the new approach to identify predictive biomarker panels and expression signatures for 17-AAG, a small-molecule inhibitor targeting heat shock protein 90 (Hsp90), using pharmacoproteomic data obtained from prostate cancer patient-derived explants (PDEs) (). Blocking Hsp90 is considered as an attractive therapeutic strategy for prostate cancer. This is because Hsp90 is commonly overexpressed in prostate cancer compared to normal prostate cells (); prostate cancer cells are selectively sensitive to Hsp90-directed agents; and Hsp90 clients include the androgen receptor (Trepel et al., 2010), a major driver of prostate tumorigenesis. However, despite the anti-tumour activity of Hsp90 inhibitors (e.g., 17-AAG) in preclinical models (), the lack of companion predictive biomarkers for rational patient stratification have in part contributed to the poor response rates to these agents seen in clinical trials ().

FIGURE 1

We derived a 16-protein biomarker panel that achieved 92% response prediction accuracy to 17-AAG. To facilitate clinical translation, we further reduced this to a compact 5-protein panel having 80% prediction accuracy, and identified associated expression signatures. Interrogation of prostate cancer patient data identified almost half of the patients with matching expression signatures, who may benefit from 17-AAG treatment. Overall, this work presents a novel ML framework that aids the discovery of predictive biomarker panels for improved patient selection and treatment of cancer.

Materials and methods

Patient data analysis

Patient data from two prostate cancer patient cohorts were used to interrogate the utility of our derived 5-gene biomarker panel as a potential patient stratification tool. These include the TCGA (Pancancer Atlas, ()) cohort containing 494 patients; and another independent prostate cancer patient (PNAS 2019, ()) cohort containing 208 patients. Patient-specific transcriptomic data was downloaded from the cBioPortal for Cancer Genomics database (; )) for analysis (see Figure 5E, left panels).

Patient-derived explant (PDE)

Fresh prostate cancer specimens were obtained with written informed consent through the Australian Prostate Cancer BioResource from men undergoing robotic radical prostatectomy at the Royal Adelaide Hospital and St Andrew’s Hospital (Adelaide, South Australia). Tumors from two cohorts of patients were used for this study: a discovery cohort (n = 40, obtained from ()), and a test cohort (n = 7). A single 6 mm core of tissue was obtained per patient. A longitudinal section of the entire core was taken for hematoxylin and eosin (H&E) analysis of tumor content. The remaining tissue was dissected into 1 mm3 pieces and cultured in triplicate on a presoaked gelatin sponge (Johnson and Johnson, New Brunswick, NJ) in 24-well plates containing 500 L RPMI 1640 with 10% FBS, 1 antibiotic/antimycotic solution (Sigma, St Louis, MO), 0.01 mg/ml hydrocortisone, 0.01 mg/ml insulin (Sigma) and cultured for 48 h with 17-AAG (500 nM) or DMSO vehicle alone as previously described (). Mass spectrometry-based proteomic profiling were performed on the discovery cohort as described in (), and these data were used as inputs (features) for our ML models. Treatment response was quantified based on the relative expression of the proliferative marker Ki67, measured post drug treatment by immunohistochemical (IHC) assay ().

Immunohistochemical staining

Paraffin-embedded tissues were sectioned (2 mM) on Ultraplus slides prior to H&E staining and IHC detection of Ki67 (Agilent, M7240 antibody; 1:200 dilution, Santa Clara, CA). IHC staining was performed and tissues assessed for tumor content and Ki67 positivity in a blinded fashion as described in ().

qRT-PCR

Real-Time Quantitative Reverse Transcription PCR was used to measure the baseline expression of the five genes identified in our 5-gene biomarker panel: AQP1, SEPT8, RBM17, TRIM47, and VPS25 for the testing PDE cohort (Supplementary Table S5). qRT-PCR was also used to measure the baseline and post-treatment expression of MKi-67, the gene encoding of the proliferative marker Ki67. Cultured patient derived explants were placed in a Precellys Tissue Homogenizer (Bertin instruments) for 2 cycles at 6500rpm. RNA was extracted from tissue homogenate using miRNeasy mini kit (Qiagen) according to manufactures instructions. RNA (700 ng) was reversed transcribed to cDNA using IScript cDNA synthesis kit (Bio-Rad). QRT-PCR was performed with a 1:10 dilution of cDNA using SYBR green (Bio-Rad) on a CFX 384 real time system (Bio-Rad). Relative gene expression was calculated using the comparative ct method and normalized to internal control genes GAPDH & TUBA1B. Primer sequences used for PCR are given in Supplementary Table S5.

ML implementation

To classify drug response groups, we developed a multi-class Support Vector Machine (SVM) and an artificial neural network (ANN) classifier using the MATLAB function fitcecoc and patternnet, respectively. For the multi-class SVM model, we set the “standardized” option to “true,” which normalized the predictor data and used the option linear as the kernel function of mSVM. For the ANN model, the predictor data was also normalized, and the size of the hidden layers was set to 10. Protein expression data profiled from 40 prostate cancer PDE samples was used for model training and testing. For training and testing, the functions’ default settings were used (e.g., scaled conjugate gradient backpropagation algorithm (), implemented using MATLAB function trainscg), with 80%–20% data split ratio. For the implementation of K-Nearest Neighbor, Naïve Bayes, Random Forest, and AdaBoost we used Matlab functions fitcknn (Distance = ‘Euclidean’), fitcnb (Kernel = ‘Normal’), fitrensemble (Method = ‘Bag’), fitcensemble (Method = ‘AdaBoostM2’), respectively. For Deep Forest, we utilized the Matlab codes developed by (Zhou and Feng, 2017), available at Github at https://github.com/cnzakimuena/casForest.git.

These data were deposited onto the Mass spectrometry Interactive Virtual Environment (www.massive.ucsd.edu) with identifier: MSV000082244 (). Model validation was performed using the function predict for mSVM and sim for ANN, respectively. ROC curves and confusion table were generated using functions roc and confusionchart in MATLAB. All the relevant codes were deposited on Github at https://github.com/NguyenLabNetworkModeling/GFFS-Biomarker.

Importance score calculation

The importance score (IS) associated with a feature was calculated through performing a systematic ‘feature drop-out’ analysis. For this, each feature (e.g., DEP) was removed from the feature list, one at a time, and the effect on model prediction performance was assessed. IS measures the difference in prediction accuracy between the ‘drop-out’ and the original mSVM model, computed as follows:where PAO and PAi represent the prediction accuracy of the original mSVM and the “drop-out” model where input feature i is removed from the feature list. Thus, IS > 0, <0, = 0 indicate the dropped-out feature has a positive, negative, or no impact on the model predictive performance, respectively.

Feature selection

Our feature selection strategy GFFS was implemented based on the IS values, as described in the text. To compare different feature selection algorithms, we implemented ReliefF using the function relief and MRMR using the function fscmrmr in MATLAB. For LASSO regression, we used the function fitcecoc and lasso as a regularization method. To calculate the importance score of the Boosting and Bagging ensemble models, we used the functions predictorImportance and oobPermutedPredictorImportance in MATLAB. We also implemented RFE and FFE strategies on top of the SVM.

Explainable ML analyses

SHAP and LIME analyses were implemented using the shapley and lime functions in MATLAB, and AdaBoostM2 as a ‘black-box ensemble model’.

Boolean functions and minimization

A Boolean function is an algebraic expression consisting of n-binary variables, f (x1, x2, … , xn). Boolean functions can be formulated through Sum of Product (SOP) or Product of Sum (POS). In SOP, different product terms of inputs are summed together, where the products are logical AND the sum are OR operators. For example: x′+xy + yz’ where x, y and z are binary variables and prime (‘) represent complement of a variable, that is if x = 0 then x′ = 1. On the other hand, in POS products of different summation terms of inputs are taken, e.g. (x′)⋅(x + y)⋅(y + z′). Boolean functions can be simplified using Boolean laws and theorems (; Whitesitt, 2012). The process of simplifying the algebraic expression of a Boolean function is called ‘minimization’. To minimize the Boolean function, we employed the Quinine-MacCluskey algorithm () implemented in MATLAB (http://www.tu-harburg.de/∼rtsap/#Programs).

Statistical and bioinformatic analysis

Statistical t-tests were performed using GraphPad Prism 9 and Matlab R2022b. For the KEGG pathway and the GO function analysis we utilized Enrichr web application (), which can be accessed at https://maayanlab.cloud/Enrichr/.

Pharmacoproteomic data from prostate cancer PDEs for machine learning

To recapitulate the in vivo response of prostate cancer to therapies, we have previously developed an ex vivo culturing model of prostate cancer tissue that retains the structure and stromal-epithelial interactions of the tumor microenvironment and provides the level of disease heterogeneity seen in patients (). Using this system, we established in a previous study 40 prostate cancer patient-derived explants (PDEs) and subjected them to either vehicle (DMSO) or 17-AAG (500 nM) treatment for 48 h (Figure 2A, see also Materials and Methods) (). Treatment response was quantified based on the relative expression of the proliferative marker Ki67, measured post drug treatment by immunohistochemical assay (). In addition, we performed mass spectrometry-based proteomic profiling and HRM-DIA data analysis on the corresponding 40 PDEs, which identified the expression of 3,766 quantifiable proteins prior to 17-AAG treatment (). These datasets will be used in this study to develop companion biomarkers that accurately predict response to 17-AAG treatment.

FIGURE 2

). (B) Classification of the PDEs into three response groups based on Ki67 positivity post 17-AAG treatment: RD, PR, and NR with indicated fold-change in Ki67 positivity (in log2 scale). (C) Unsupervised hierarchical clustering of the PDEs using the proteomic data, which failed to reasonably predict response to 17-AAG.

To label the data, the PDE samples were classified into three distinct response groups based on changes in Ki67 positivity upon treatment with 17-AAG (). These are depicted in Figure 2B: (i) RD (responders) group containing PDEs having > two-fold decrease in Ki67 positivity; (ii) PR (poor responders) group containing PDEs with < two-fold increase in Ki67 positivity; and (iii) NR (non-responders) group with Ki67 positivity in between. As a result, 14 PDEs were classified as RD, 17 as NR and 9 as PR (Figure 2B; Supplementary Table S2). Of note, the PDE proteomic data has 0.16% missing (undetectable) values and they were imputed with random values generated from a uniform distribution between 0 and 1 (1 is the minimal machine-detectable protein amount) (Wei et al., 2018). Together, the PDE data consists of protein expression levels of 3,766 proteins serving as ‘input features’ and Ki67-based response classification serving as ‘labelled outputs’ for development of ML models.

Results

Supervised ML using differentially expressed proteins (DEPs) sub-optimally predicts 17-AAG response

Using expression levels of all the 3,766 proteins as inputs, we first tested whether unsupervised hierarchical clustering could predict the PDE response to 17-AAG treatment. While this identified three distinct clusters, they poorly reflected the labelled response groups (Figure 2C). Each of the three clusters comprises a good mixture of RD, NR and PR samples, suggesting that unsupervised clustering could not reasonably predict response to 17-AAG.

Next, to examine if supervised ML methods would improve the response prediction, we developed a multi-class support vector machine (mSVM) model using the protein expression as inputs and the labelled drug responses (RD, PR, NR) as outputs (Figure 3A). The dataset was randomly divided into a training (80%, 32 PDEs) and a test set (20%, 8 PDEs). To avoid biases in data splitting and mitigate model overfitting, we held out the test set and trained the model with the training set. This training and test process were repeated 50 times to obtain reliable and robust performance evaluation (). We found that the model displayed an average prediction accuracy of ∼39% (Supplementary Figure S1). This poor performance is somewhat expected because the number of input variables/features (3,766) greatly exceeds the number of samples (40), a phenomenon known as ‘curse of dimensionality’ in ML (). By this principle, the prediction power of a ML classifier typically improves as the number of the features gradually increases, but after an threshold (i.e., optimal) number of features, adding more starts to diminish the model performance (). This is because the high dimensionality of the input data causes every observation to appear equidistant from the others, preventing meaningful clustering (). Moreover, irrelevant or partially relevant features can negatively impact model performance ().

FIGURE 3

In order to circumvent the curse of dimensionality, we carried out a feature selection strategy with the goal to rationally reduce the number of non-relevant features (; ). As differentially expressed proteins (DEPs) often provide a good starting point for identifying potential biomarkers (; ), we first performed differential expression analysis between the three response groups using analysis of variance (ANOVA) tests, and obtained a total of 157 DEPs (p-value <0.05, Supplementary Table S2). Unsupervised hierarchical clustering using these DEPs still failed to appropriately cluster the PDE samples (Supplementary Table S2), confirming the suboptimal performance of this approach. Next, we retrained the mSVM model using the 157 DEPs as input features and found that it displayed an average prediction accuracy of 77% (Figure 3B). Examining the confusion matrix results further showed that while the precision and sensitivity for the RD and NR groups are around and above 80%, they are below 55% for the PR group (Figure 3C), demonstrating the mSVM model did not perform well against the PR group. Consistently, analysis of the receiver operating characteristic (ROC) curves confirms that model performance against the PR group was inferior compared to the other groups (Figure 3D).

In addition to mSVM, for comparison purposes we also performed similar analyses using an array of common ML methods, including artificial neural network (ANN), K-Nearest Neighbor (KNN) (; Zhang, 2016), Naive Bayes (Yousef et al., 2007), Decision Tree (), AdaBoost (), Random Forest and Deep Forest (; Zhou and Feng, 2018; ). The results show that mSVM was the best-performing algorithm, followed by ANN and KNN (Supplementary Figure S3). Like mSVM, the ANN model performed relatively poorly in predicting the PR group (Supplementary Figure S4). Together, these results suggest that although supervised ML approaches perform better than unsupervised hierarchical clustering, using all the DEPs as features may be inadequate for optimizing predictive power. This may be due to the noise exhibited by certain DEPs that bear no relevance in predicting response to 17-AAG, which interferes with the predictive signals from the relevant features, thereby lowering the model’s overall predictive performance ().

To interrogate how modulation of the input feature space may influence performance of the mSVM, we systematically increased the number of features by adding the DEPs one by one to the training set and re-evaluated the model prediction accuracy. Figure 3E shows an overall upward trajectory of prediction accuracy as the number of feature increases. However, there were specific DEPs whose addition to the feature space actually worsened the model’s predictive power, evidenced by drops in the trajectory (indicated by red arrows, Figure 3E). Specifically, 66 of the 157 DEPs contributed positively to the model performance while 63 contributed negatively, and some had negligible effects on performance (Figure 3F). These results support the idea that irrelevant features can negatively impact the model’s ability to predict drug response, and thus rational selection of informative features is key in improving predictive performance.

A novel ML framework maximises prediction accuracy through rational feature selection

To select the most relevant features from the DEPs, we first performed a systematic feature drop-out analysis. One at a time, each DEP was removed from the feature space and the effect on performance of the mSVM was assessed, as compared to the original model using all the 157 DEPs as features (workflow in Figure 4A). If removal of a protein attenuates/improves the model prediction accuracy, then the protein is deemed to have a positive/negative impact on drug response prediction. We quantified these effects by defining an ‘importance score’ (IS) as in Eq. 1 that computes the difference in prediction accuracy between the drop-out and original mSVM models. Thus, IS > 0, <0, and = 0 indicates proteins having positive-impact, negative-impact and no-impact on drug response prediction, respectively (Materials and Methods). Figure 4B displays a sorted list of the 157 DEPs according to the respective IS values. Interestingly, a large fraction (48%) of the DEPs had a negative impact on drug response prediction (Figure 4B), suggesting inclusion of these in the feature space may diminish the model performance. In contrast, more than half of the DEPs had a positive impact on the drug response prediction (Figure 4B), with the top 20 proteins shown in Figure 4C.

FIGURE 4

We reasoned that the positive-impact DEPs would represent good candidate features for maximizing model prediction. We next introduced a new algorithm, termed greedy forward feature selection (GFFS), which aimed to select the optimal combination of features from the pool of positive-impact DEPs. A schematic of the algorithm is given in Figure 4D. First, we trained the mSVM using the positive-impact DEP having the highest IS (i.e., SEPT8; Figure 4E) as the single input variable, employing a similar training/validation data splitting scheme as in Section 3.2. Unsurprisingly, this single-feature model achieved ∼50% accuracy (Figure 4E), much worse compared to the model using all 157 DEPs. Next, we retrained the model by adding the second most influential DEP (i.e. FDFT1 having the second highest IS) to the feature space and re-evaluated the model performance. Because the new model had a better overall prediction accuracy, FDFT1 was kept as an input feature (Figure 4E). This process was repeated by gradually adding the next most important DEP to the feature space: if the new DEP improves prediction accuracy then it is kept; however, if it attenuates (or does not affect) accuracy, the protein is skipped and we move to the next positive-impact DEP. This was done until all the positive-impact DEPs were considered and the model performance did not further increase (Figure 4D). As a result, we determined an optimal feature space containing 16 DEPs, depicted in Figure 4E. The corresponding mSVM model achieved an overall prediction accuracy of 92% (Figures 4E–G), which was significantly superior to the initial model using all the 157 DEPs (77%, Figure 3B). This was further confirmed by examining the confusion matrix (Figure 4H) and the ROC curves (Figure 4I), indicating significantly improved prediction of 17-AGG response within each of the response groups.

A key attribute of our ML-based algorithm is the rationalized selection of features guided by prior IS-based ranking. To determine if this was critical in enhancing prediction accuracy, we assessed the performance of mSVM models using randomly selected features instead, and replicated this 1,000 times. The result shows that the model with feature selection consistently and significantly outperformed the random-feature models (Figure 4J), suggesting our IS-based feature selection strategy was key in boosting predictive power.

Next, we comparatively evaluated the performance of our IS-based GFFS approach with a range of available feature selection algorithms, including filter (ReliefF; minimum redundancy maximum relevance (MRMR)) (; ), wrapper [recursive feature elimination (RFE); forward feature selection (FFS)] (; Tang et al., 2007; ; Zhang et al., 2013) and embedded (boosting; bagging; least absolute shrinkage and selection operator (LASSO)) methods (Vasquez et al., 2016; ) (Supplementary Figure S5A). GFFS showed significantly better predictive accuracy than all of the tested methods except for RFE, with which GFFS had comparable performance (Supplementary Figure S5A). Interestingly, the top three performers were GFFS, RFE, and FFS, highlighting the importance of rational feature selection in this context. We note that the maximal performance of FFS and RFE was achieved with 41 and 28 features, respectively (Supplementary Figures S5C, D), which were higher than GFFS, but at the cost of much larger number of features. Importantly, among of the top three methods, GFFS’s running time scales linearly and was significantly better than RFE and FFS (Supplementary Figure S5B). Thus overall, GFFS-based feature selection achieved a strong and balanced performance in terms of predictive accuracy and computational cost.

Identification of a compact biomarker panel for 17-AAG treatment response

There is a general trade-off between the size of a biomarker panel and its practical applicability. A panel having more relevant proteins tends to deliver enhanced prediction, but this comes at a cost of having to detect more readouts from patients–a non-trivial task for poorly characterized biomarkers. In order to facilitate translation of the predictive biomarkers for 17-AAG-based therapy, here we aim to derive a more compact-size panel from the 16 identified marker proteins while maintaining high predictive performance. To this end, we considered all possible ways to combine the marker proteins into panels with increasing size, ranging from 1 to 16 (Figure 4K). As such, there are 16 possible panels with size 1; 120 panels with size 2; 4368 panels with size 5; and so on. We then evaluated the predictive performance of the mSVM model using each panel as input features. The results, displayed in Figure 4L, show that for each panel size the prediction accuracy varied significantly depending on the specific composition of the feature proteins (Supplementary Table S3). For instance, among 4368 5-protein panels, the one comprising AQP1, SEPT8, RBM17, TRIM47, and VPS25 exhibits the highest prediction accuracy of 80% (Figure 4L–M). Interestingly, this panel significantly outperformed the 5-protein panel derived from ranked IS score (accuracy 60%, Figure 4M), and panels derived from random shuffling (accuracy 61%, Figure 4M). Moreover, this 5-protein panel also outclassed the model using all the 157 DEPs (accuracy 77%, Figure 3B). Taken together, given its small size yet excellent predictive power, we concluded [AQP1, SEPT8, RBM17, TRIM47, VPS25] as a novel, practical biomarker panel for predicting response to 17-AAG treatment in prostate cancer.

Machine learning models have traditionally been treated as “black boxes”. As ML applications become more widespread, it is important to better interpret ML-based predictions and decision-making processes. Largely, the model interpretability (or explainability) methods can be categorized in two types: (i) global and (ii) local approaches (; ; ). Global explainability approaches explain the model’s behavior as a whole (across whole samples). For example, which features in the model contribute to the model’s prediction performance and how important they are. In Figure 4C, we have analysed the importance of individual features through performing a systematic “feature drop-out” analysis, which exactly corresponds to a global explainability method (). On the other hands, local explainability approaches explain why and how the model make a particular decision for a particular sample (). Among these, LIME (Local Interpretable Model-agnostic Explanations; (; )] and SHAP (SHapley Additive exPlanations, (; )] have emerged as state-of-the-art approaches. For example, Gardiner et al. have recently applied SHAP to infer important features associated with drug responses (5-ASA, Prednisolone, BIRB796) for patients having inflammatory bowel diseases ().

Thus, to examine the relative contribution of each feature (protein) to the prediction of drug response (RD, NR, and PR classification), we implemented SHAP (; ) and LIME (; ) analyses. TRIM47, RBM17 and AQP1 were found to positively contribute to model prediction of the RD class; while the VPS25 and SEPT8 contribute negatively instead to the model prediction or not strong enough (Supplementary Figure S7A); The SHAP results were consistent with the importance score of LIME. On the other hand, RBM17 and AQP1 both have a positive impact on NR and PR classes but the contribution of TRIM47 is less significant for NR. The correlation analysis of features (proteins) with the Shapley values revealed that the AQP1 expression has a negative impact on the NR prediction but a positive impact on PR (Supplementary Table S4). VPS25 and SEPT8 showed a strong correlation with RD and NR although they did not contribute to the model prediction of target variables. Overall, these analyses helped enhance the interpretability of our ML model predictions.

A Boolean algebra-based pipeline to derive biomarker expression signatures

Once the biomarkers have been identified, it is important to define specific expression signatures of these markers that could then be utilized for patient stratification. For this, analyses including t-test and boxplot are often employed to deduce the differential expression patterns of the marker proteins across the response groups. For example, the 17-AAG responsive PDEs (RD group) displayed significantly higher VPS25 expression, while those in the PR group have significantly lower TRIM47 expression compared to the other groups (Figures 5A, B). While useful, these approaches do not consider the expression heterogeneity within each response group (evidenced in Figure 5A) and possible hidden interlinks between the markers. Thus, derivation of biomarker signatures that encapsulate the response group-specific heterogeneity and possible functional links between the markers is important.

FIGURE 5

Here, we propose a new pipeline to identify combinatorial expression signatures for biomarkers characterizing individual response group utilizing methods from Boolean algebra. The pipeline consists of 4 steps and is illustrated in Figure 5C for example proteins A, B, and C. Step 1 discretizes the continuous expression data into binary values where 1 and 0 indicate high and low expression, respectively. This is done by normalizing the protein expression data to its median value across the samples: normalized value >1 or <1 will be converted to 1 or 0, respectively.

In step 2, all combinatorial binary expression patterns of the proteins are identified and summarized in a ‘truth’ table, which are then converted into logical expressions of the proteins (Figure 5C). Then, the logical expression of the individual patterns are summed together in a Sum-of-Products (SOP) form using the Boolean operator (+) (Materials and Methods) (). In step 3, the summed logical expression is reduced to a minimal form without losing information using a Boolean function minimization algorithm, the Quine-McCluskey algorithm (). Finally, in step 4 the resulting reduced logical expression is converted back into binary expression patterns of the biomarkers. In the example in Figure 5C, we started with four expression patterns involving 3 proteins (A′B′C + A′BC + ABC’ + ABC) that were simplified into two patterns (A′C + AB) involving only 2 proteins (; Whitesitt, 2012) (see Figure 5C). Here, the prime (‘) sign indicates the respective protein should be low, and high otherwise.

Next, we applied the new pipeline to our previously identified 5-protein biomarker panel (Figure 5A). As a result, we identified six, five and four specific expression signatures of the biomarkers for the RD, NR and PR response groups, respectively (Figure 5D; Supplementary Figure S6). As an example, Figure 5D displays the six expression signatures for the RD group. Among these, signature ID 6, characterized by concomitant high expression of SEPT8, TRIM47, VPS25 while the expression of AQP1 and RBM17 could be either high or low, represents the most common signature among the RD-group PDEs (∼49.8%). The next most common signature, signature ID 4, is however characterized by high expression of AQP1, SEPT8, TRIM47 coupled with low expression of RBM17, while VPS25 expression could be high or low (Figure 5D). The biomarker signatures identified for the NR and PR groups are given in Supplementary Figure S6. In summary, our new Boolean logics-based pipeline has allowed us to identify specific expression signatures of the biomarkers that could be utilized to stratify patients for 17-AAG response.

Validation of the biomarker signatures using independent PDE and patient cohorts

To demonstrate the utility of our derived biomarker signatures as a tool for patient stratification, we interrogated whether there are patients with matching 17-AAG-responsive signatures using publicly available prostate cancer patient datasets. To this end, two prostate cancer patient cohorts were obtained from the cBioPortal for Cancer Genomics database for analysis (see Materials and Methods). Patient-specific gene expression data of the 5 proteins in our biomarker panel were binarized as in step 1 of our pipeline (Figure 5E, left panels). Comparing the expression patterns of these proteins in the patients with the six identified signatures for the RD group showed that in both cohorts, a substantial fraction of the patients displays matching expression signatures (Figure 5E, right). Consistent with our PDE-based prediction, signature ID 6, the most frequent signature of the RD group (Figure 5D), was actually found in more patients than any other RD-specific signatures. Together, these findings support the utility of the derived biomarker signatures in identifying subsets of patients with specific drug response behaviour.

To further validate the predictive power of our identified biomarker signatures, we generated an independent cohort of prostate cancer patient derived explants (n = 7). Tissues were collected, cultured and analysed as previously described for our discovery cohort (). These PDEs were treated with DMSO and 500 nM 17-AAG for 48 h. Treatment response to 17-AAG was assessed based on changes in Ki-67 positivity compared to vehicle treatment, detected using immunohistochemical staining and using similar cut-offs as done in the discovery cohort (Figure 6A, Materials and Methods). Baseline expression levels of the five biomarker genes AQP1, SEPT8, RBM17, TRIM47 and VPS25 were measured using qRT-PCR for each PDE under DMSO control (Figure 6B). Then, for each PDE we predicted the drug response using the expression signatures of the biomarkers identified using the Boolean optimization-based pipeline for the different response classes (Figure 5D; Supplementary Figure S6). Following the pipeline, for each PDE we first binarized the gene expression levels of the biomarkers into low or high expression, as shown in Figure 6C. The biomarker expression patterns for each PDE were then mapped to the identified signatures for the three response groups. For example, the expression pattern for PDE X34393R matches with Signature ID 5 of the RD group, which correctly predicted this PDE to be responsive to 17-AAG. On the other hand, PDE X34380R matches with Signature ID4 of the NR group, thus correctly predicting this PDE to be non-responsive to the drug. Overall, cross-validating model predictions with measured drug response, our identified signatures correctly predicted response classification for the responsive and non-responsive PDEs, but did not correctly predict the poor-responders, achieving an overall >71% accuracy on this independent dataset. Despite the small size of the validation cohort, this independent validation analysis has provided a proof-of-concept demonstrating the potential of our predictive pipeline. We envisage as more similar data become available in the future, further validation will be done to strengthen the validity of the identified biomarkers.

FIGURE 6

Discussion

Precision oncology embraces cancer treatment strategies that are based on the distinct molecular characteristics of a tumour. However, lack of predictive companion biomarkers that help forecast patient-specific treatment response remains a barrier to widespread adoption of this paradigm (; ). In this study, we have developed a novel computational framework that couples supervised machine learning-based biomarker discovery with Boolean algebra-based signature derivation, in order to identify predictive multi-gene biomarker signatures for cancer therapies (Figure 1). We demonstrated the utility of the approach by applying it to the HSP90 inhibitor 17-AAG in the context of prostate cancer. The approach is however broadly applicable, and given suitable data, can be deployed for different drugs in various tumour types.

The new framework possesses two salient distinguishing properties. First, it rationalizes the most predictive input features based on an importance score that measures how each feature influences the model’s predictive performance. Only features that contribute positively to the classification accuracy are retained in the feature space. These are then ranked by their IS values, and increasingly combined one-by-one to identify the optimal combinatorial panel of features that delivers the maximal predictive accuracy. Second, once the biomarkers have been identified, our framework innovatively utilizes Boolean algebra and function minimization [Quine-McCluskey algorithm (; )] techniques to deduce common expression patterns of the response-specific biomarkers. Because Quine-McCluskey algorithm enables the minimal form of a Boolean function to be reached, our framework helps derive the most compact response group-specific biomarker expression patterns. These easily-interpretable patterns thus constitute biomarker signatures that ultimately allows predictive selection of patient subgroups having a particular drug response, an ability invaluable for precision clinical trials and treatment.

Boolean function minimization algorithms aim to identify the core logics of the underlying phenomenon and are routinely used in engineering fields, such as to design digital logic circuits (; ). We have previously applied Boolean function minimization to identify core combinatorial feedback loop structures that generate switch-like behavior of E-cadherin (). To our best knowledge, the current study represents the first attempt to apply Boolean function minimization to the problem of biomarker signature derivation. Nevertheless, Boolean logics-based approaches have been used to predict drug response. For example, the LOBICO (Logic Optimization for Binary Input to Continuous Output) modelling framework was developed to explain drug response in cancer cell lines based on binary mutation data of 60 selected genes (). Using integer linear programming, LOBICO aims to identify the logic combinations of mutations that best explain the response of cancer cell lines to cancer drug agents. In a similar vein, MOCA (Multivariate Organization of Combinatorial Alterations) has been applied to predict drug response by inferring logic combinations of genomic input features (). Overall, our framework represents a novel effort in repurposing Boolean function minimization techniques for derivation of drug-response biomarker signatures.

The results in this study emphasize the importance of rational feature selection in optimizing drug response prediction accuracy by machine learning classifiers. While our GFFS approach is similar to FFS in the sense that it starts with no feature, it differs in two key aspects. Firstly, it pre-determines the relative importance of the input features by calculating the IS beforehand via comparison of model performance, and it does this only once (Figures 4A–C). This is opposed to the classical implementation of FFS, where the relative importance ranking of the remaining features is repeatedly evaluated at each iteration. Secondly, GFFS goes through the pre-ranked features, adds each feature and keeps the feature only if it improves the model performance; otherwise, the feature is dropped and the algorithm moves on to the next feature in the ranked list. Again, this differs from the classical FFS, where addition of a new best feature (n+1 features) at the current round may reduce the overall model performance as compared to the optimal model at the previous round (n features), as shown in Fig. S5C. This difference stems from the fact that in FFS-based implementation, a specific number of features is typically specified prior to model running whereas GFFS does not require such specification. Another consequential difference is that the model performance increases monotonically as more features are added with our approach (Figure 4E), while for FFS the model performance could exhibit a drop as new features are added (Supplementary Figure S5C). Importantly, because of the pre-determined feature ranking, GFFS is computationally much more efficient than FFS and RF. In Big-O notation, our algorithm has O (2n) complexity compared to O (n (n+1)/2) complexity displayed by RFE/FFS (see Supplementary Figure S5B). This superiority in computational cost makes GFFS highly scalable compared to other feature selection techniques, particularly when the number of input features to be assessed is in the range of thousands to tens of thousands.

It is worthy to note that in our two-phase framework, the machine learning coupled GFFS feature selection (phase 1) is integrated with but can work independently from the Boolean logics-based biomarker signature identification (phase 2). As such, in principle the Boolean logics-based biomarker signature identification part can be plugged into any other feature section ML approaches (e.g., using FFS or RFE) and serves as a downstream analysis. This plug-and-play flexibility provides another strength of the framework.

Due to its ability to stabilize client oncogenic proteins and thereby maintain the survival of cancer cells, HSP90 presents an attractive therapeutic target and has been explored in a variety of cancers including prostate, breast, and colon cancer (; Wang et al., 2016; ). Although limited, several studies have attempted to identify predictive markers for HSP90-based therapy. For example, Nguyen et al. () has identified PCBP3, an RNA binding protein important in post-transcriptional control of gene expression, as a potential predictive biomarker for 17-AAG response in prostate cancer. In colorectal cancer, high expression of the UDP glucuronosyltransferase 1A (UGT1A) gene was found to correlate with poor sensitivities to the HSP90 inhibitor ganetespib, and its related compound NVP-AUY922, suggesting UGT1A levels in tumour tissues may be a suitable predictive biomarker for ganetespib treatment (). Interestingly, gene expression levels of UGT1A did not show correlation with 17-AAG response, implying different classes of HSP90 inhibitors may have different predictive biomarkers (). In addition, in acute lymphoblastic leukemia (ALL), patients with high levels of phosphorylated Src were more sensitive to the Hsp90 inhibitor NVP-BEP800 compared to those with low phosphorylated Src (), suggesting Src phosphorylation may serve as a predictive biomarker. Moreover, since Hsp90 inhibition regulates Akt phosphorylation and Bcl-xL, expression levels of these effector proteins may be suitable predictive of response to Hsp90 inhibition in triple negative breast cancer (). Similarly, as Hsp90 inhibition downregulates c-Myc expression and upregulates the expression of tumour repressor proteins such as p53 and pRB, which inhibits the G1/S transition (Yamaki et al., 2011), expression levels of cell cycle regulatory proteins such as pRB, E2F, cyclin–cyclin-dependent kinase (CDK) complexes could inform predictive biomarkers in specific tumour contexts. However, there are several limitations associated with current studies of predictive biomarkers for Hsp90 inhibitors, including: (i) their derivation was largely based on correlation analyses; (ii) the biomarkers are mostly single-gene markers and so unlikely to be clinically robust; (iii) and lack of patient-derived data. Together, these factors may explain the fact that so far, no companion predictive biomarkers of Hsp90-based therapy are employed for clinical practice.

In this study, we have aimed to alleviate these limitations through utilization of patient-derived data from unique explant models; implementation of predictive ML modelling rather than association analyses; and derivation of multi-gene rather than single-gene biomarkers. As a result, we have identified a highly-predictive biomarker panel (92% accuracy) consisting of 16 proteins. Its superior performance to individual DEPs and to using all the 157 DEPs points to the importance of selectively combining the relevant input features in optimizing drug-response prediction. The result also highlights the need to venture beyond the contemporary single-marker paradigm. Reassuringly, the identified panel contains proteins that have been implicated in prostate tumorigenesis and drug resistance, including CDK2, IGFBP7, TRIM47, and RBM17. For example, CDK2 was identified as a therapeutic target in prostate cancer (Yin et al., 2018). Its activation is significantly associated with disease recurrence, and its inhibition reduces invasion of prostate cancer cell lines (Yin et al., 2018). Moreover, CDK2 mediates androgen-dependent inhibition of AR+, castration-resistant prostate cancer cell proliferation (). IGFBP7, a member of the insulin growth factor binding protein family, is involved in a variety of cancers including prostate cancer (; ). Aberrant promoter hypermethylation of IGFBP7 and consequential gene silencing were found in prostate cancer cell lines (). On the other hand, the tripartite motif (TRIM) protein TRIM47 is significantly increased in prostate cancer compared to normal tissues (). In addition, SPF45, a splicing factor, is overexpressed in select tumours including prostate cancer, and it confers resistance to multiple anti-cancer drugs (; ). Overall, these evidences support the validity of our predictive multi-protein biomarker panel.

Translation of predictive biomarkers into clinical usage depends strongly on the ability to develop assays for detection of these markers in patient samples. We therefore reasoned that compact biomarker panels displaying strong predictive power are optimal for clinical application. With this in mind, we reduced the panel from 16 to 5 proteins, which achieved excellent prediction accuracy (80%). The 5-protein panel includes VPS25 (Vacuolar Protein Sorting 25 Homolog), TRIM47 (Tripartite motif 47), RBM17 (RNA Binding Motif Protein 17), SEPT8 (Septin-8) and AQP1 (Aquaporin 1). In addition to TRIM47’s involvement in prostate cancer mentioned above, RBM17 is frequently overexpressed in a variety of carcinomas, including prostate cancer (). Importantly, RBM17 confers resistance to doxorubicin and vincristine, two chemotherapeutic drugs commonly used in cancer treatment (). Septins are GTP-binding proteins that are evolutionarily and structurally related to the RAS oncogenes (). Septin’s expression levels are altered in hormonally regulated cancers such as prostate, breast, ovarian and endometrial cancers (; ). AQP1 is known to be upregulated by hypoxia that leads to increased cell water permeability, motility, and migration in neuroblastoma, lung and prostate cancer cells (; ; Wei and Dong, 2015; ). Further, AQP1 is involved in microvascular alteration during prostate tumour angiogenesis (); and it promotes sensitivity of anthracycline chemotherapy in breast cancer (). Taken together, these studies provide evidence linking the identified marker proteins to prostate cancer, supporting to the validity of the simplified panel. Further understanding of the roles of these proteins in prostate cancer tumorigenesis, and how they mechanistically modulate 17-AAG sensitivity are important areas of future research.

In addition, we conducted KEGG pathway and GO function analysis using both the 5-protein and 16-protein biomarker panels. As shown in the Supplementary Figure S8, the 5-proteins panel is mainly related to the proximal tubule bicarbonate reclamation () and renin secretion () in the KEGG pathway analysis. Proximal tubule bicarbonate reclamation is a process by which the proximal tubules in the kidney reclaim bicarbonate ions from the filtrate in the renal tubules (). This process helps to maintain electrolyte balance in the body by reabsorbing bicarbonate ions and preventing their excretion in urine. Renin secretion is important in cancer development as it regulates the production of angiotensin II, which has been shown to stimulate cancer cell growth and proliferation (). The 16-protein panel was found to be mainly related to steroid biosynthesis, a process by which the body produces steroid hormones. Abnormal steroid hormone production, which can be influenced by abnormalities in steroid biosynthesis pathways, has been linked to prostate cancer (Wilding, 1992; ).

Our GO function analysis identified that both the 5- and 16-protein biomarker panels are mainly related to polyol transmembrane transporter activity (GO:0015166) and intracellular cGMP-activated cation channel activity (GO:0005223). Polyol transmembrane transporter activity involves the transport of small sugar molecules, such as glucose, across cell membranes. Dysregulation of this activity has been implicated in cancer development, as it can contribute to increased cellular proliferation and survival (). Intracellular cGMP-activated cation channels are proteins activated by the signaling molecule cGMP, which allow ions to enter cells (). Dysregulation of these channels has been linked to the development of various types of cancer, including breast, prostate, and ovarian cancer ().

Importantly, in an effort to validate the identified 5-protein biomarker signatures, we have generated an independent validation cohort of PDEs, and predicted their responses to 17-AAG treatment based on the PDE-specific expression levels of the five marker proteins. Overall, our framework correctly predicted the response for the responsive and non-responsive PDEs, but did not correctly predict the poor-responders, achieving >71% accuracy on this independent dataset. A limitation of the current validation is pertained to the small size of the validation cohort, due primarily to the challenge in accessing a large number of suitable patient samples and establishing the corresponding PDEs. This, however, is a general issue in biomarker studies utilizing pharmacogenomic data derived from cancer patients (; ; ). We envisage as additional PDEs are generated in the future, the data will provide a more robust validation of our pipeline.

In summary, we have developed a new computational framework based on machine learning that aids the identification of multi-gene predictive biomarkers for targeted cancer drugs. While we have demonstrated its power focusing on prostate cancer as a proof-of-concept, the framework has broad applicability and can be applied to other drugs and cancer types in future studies.

Statements

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: TCGA, cancer_study_identifier: prad_tcga_pan_can_atlas_2018, PNAS cancer_study_identifier: prad_su2c_2019.

Author contributions

Conceptualization, S-YS and LN; methodology, S-YS, MC, JH, EN, LB, RD, and LN; writing—original draft preparation and revision, S-YS and LN; supervision, S-YS and LN; funding acquisition, LN. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a Victorian Cancer Agency Mid-Career Research Fellowship (MCRF18026) and an Investigator Initiated Research Scheme grant from National Breast Cancer Foundation and Love Your Sister (IIRS-20–094) that were awarded to LN.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2023.1094321/full#supplementary-material

References

  • 1

    AbbeyM.GaestelM.MenonM. B. (2019). Septins: Active GTPases or just GTP-binding proteins?Cytoskelet. Hob.76, 5562. 10.1002/cm.21451

  • 2

    AbidaW.CyrtaJ.HellerG.PrandiD.ArmeniaJ.ColemanI.et al (2019). Genomic correlates of clinical outcome in advanced prostate cancer. Proc. Natl. Acad. Sci. U. S. A.116, 1142811436. 10.1073/pnas.1902651116

  • 3

    AhaD. W.BankertR. L. (1996). “A comparative evaluation of sequential feature selection algorithms,” in Learning from data: Artificial intelligence and statistics V. Editors FisherD.LenzH.-J. (New York, NY: Springer), 199206.

  • 4

    AlsahafA.PetkovN.ShenoyV.AzzopardiG. (2022). A framework for feature selection through boosting. Expert Syst. Appl.187, 115895. 10.1016/j.eswa.2021.115895

  • 5

    AngelisD.SpiliotisE. T. (2016). Septin mutations in human cancers. Front. Cell Dev. Biol.4, 122. 10.3389/fcell.2016.00122

  • 6

    ArmstrongH. K.KoayY. C.IraniS.DasR.NassarZ. D.SelthL. A.et al (2016). A novel class of Hsp90 C-terminal modulators have pre-clinical efficacy in prostate tumor cells without induction of a heat shock response. Prostate76, 15461559. 10.1002/pros.23239

  • 7

    BarretinaJ.CaponigroG.StranskyN.VenkatesanK.MargolinA. A.KimS.et al (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature483, 603607. 10.1038/nature11003

  • 8

    BielM.MichalakisS. (2009). Cyclic nucleotide-gated channels. Handb. Exp. Pharmacol.191, 111136. 10.1007/978-3-540-68964-5_7

  • 9

    BlumA. L.LangleyP. (1997). Selection of relevant features and examples in machine learning. Artif. Intell.97, 245271. 10.1016/s0004-3702(97)00063-5

  • 10

    BomaneA.GonçalvesA.BallesterP. J. (2019). Paclitaxel response can Be predicted with interpretable multi-variate classifiers exploiting DNA-methylation and miRNA data. Front. Genet.10, 1041. 10.3389/fgene.2019.01041

  • 11

    BorstP.WesselsL. (2010). Do predictive signatures really predict response to cancer chemotherapy?Cell Cycle9, 48364840. 10.4161/cc.9.24.14326

  • 12

    CaiJ.LuoJ.WangS.YangS. (2018). Feature selection in machine learning: A new perspective. Neurocomputing300, 7079. 10.1016/j.neucom.2017.11.077

  • 13

    Caldas-LopesE.CerchiettiL.AhnJ. H.ClementC. C.RoblesA. I.RodinaA.et al (2009). Hsp90 inhibitor PU-H71, a multimodal inhibitor of malignancy, induces complete responses in triple-negative breast cancer models. Proc. Natl. Acad. Sci.106, 83688373. 10.1073/pnas.0903392106

  • 14

    CardilloM. R.IppolitiF. (2006). IL-6, IL-10 and HSP-90 expression in tissue microarrays from human prostate cancer assessed by computer-assisted image analysis. Anticancer Res.26, 34093416.

  • 15

    CenteneraM. M.GillisJ. L.HansonA. R.JindalS.TaylorR. A.RisbridgerG. P.et al (2012). Australian Prostate CancerEvidence for efficacy of new Hsp90 inhibitors revealed by ex vivo culture of human prostate tumors. Clin. Cancer Res.18, 35623570. 10.1158/1078-0432.CCR-12-0782

  • 16

    CeramiE.GaoJ.DogrusozU.GrossB. E.SumerS. O.AksoyB. A.et al (2012). The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov.2, 401404. 10.1158/2159-8290.CD-12-0095

  • 17

    ChenC.ZhangL. G.LiuJ.HanH.ChenN.YaoA. L.et al (2016). Bioinformatics analysis of differentially expressed proteins in prostate cancer based on proteomics data. Onco Targets Ther.9, 15451557. 10.2147/OTT.S98807

  • 18

    ChongW.ZhangH.GuoZ.YangL.ShaoY.LiuX.et al (2021). Aquaporin 1 promotes sensitivity of anthracycline chemotherapy in breast cancer by inhibiting β-catenin degradation to enhance TopoIIα activity. Cell Death Differ.28, 382400. 10.1038/s41418-020-00607-9

  • 19

    Di IorioP.RonciM.GiulianiP.CaciagliF.CiccarelliR.CarusoV.et al (2021). Pros and cons of pharmacological manipulation of cGMP-PDEs in the prevention and treatment of breast cancer. Int. J. Mol. Sci.23, 262. 10.3390/ijms23010262

  • 20

    DieciM. V.MigliettaF.GriguoloG.GuarneriV. (2020). Biomarkers for HER2-positive metastatic breast cancer: Beyond hormone receptors. Cancer Treat. Rev.88, 102064. 10.1016/j.ctrv.2020.102064

  • 21

    DingC.PengH. (2005). Minimum redundancy feature selection from microarray gene expression data. J. Bioinform Comput. Biol.3, 185205. 10.1142/s0219720005001004

  • 22

    DolatL.HuQ.SpiliotisE. T. (2014). Septin functions in organ system physiology and pathology. Biol. Chem.395, 123141. 10.1515/hsz-2013-0233

  • 23

    DuboseT. D. (1990). Reclamation of filtered bicarbonate. Kidney Int.38, 584589. 10.1038/ki.1990.246

  • 24

    FengD.-C.LiuZ.-T.WangX.-D.ChenY.ChangJ.-Q.WeiD.-F.et al (2020). Machine learning-based compressive strength prediction for concrete: An adaptive boosting approach. Constr. Build. Mater.230, 117000. 10.1016/j.conbuildmat.2019.117000

  • 25

    FortinoV.WisgrillL.WernerP.SuomelaS.LinderN.JalonenE.et al (2020). Machine-learning–driven biomarker discovery for the discrimination between allergic and irritant contact dermatitis. Proc. Natl. Acad. Sci.117, 3347433485. 10.1073/pnas.2009192117

  • 26

    FujimuraT.InoueS.UranoT.TakayamaK.YamadaY.IkedaK.et al (2016). Increased expression of tripartite motif (TRIM) 47 is a negative prognostic predictor in human prostate cancer. Clin. Genitourin. Cancer14, 298303. 10.1016/j.clgc.2016.01.011

  • 27

    FutrealP. A.CoinL.MarshallM.DownT.HubbardT.WoosterR.et al (2004). A census of human cancer genes. Nat. Rev. Cancer4, 177183. 10.1038/nrc1299

  • 28

    GaoJ.AksoyB. A.DogrusozU.DresdnerG.GrossB.SumerS. O.et al (2013). Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal6, pl1. 10.1126/scisignal.2004088

  • 29

    GaoH.KornJ. M.FerrettiS.MonahanJ. E.WangY.SinghM.et al (2015). High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat. Med.21, 13181325. 10.1038/nm.3954

  • 30

    GardinerL. J.CarrieriA. P.BinghamK.MacluskieG.BuntonD.McneilM.et al (2022). Combining explainable machine learning, demographic and multi-omic data to inform precision medicine strategies for inflammatory bowel disease. PLoS One17, e0263248. 10.1371/journal.pone.0263248

  • 31

    GarnettM. J.EdelmanE. J.HeidornS. J.GreenmanC. D.DasturA.LauK. W.et al (2012). Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature483, 570575. 10.1038/nature11005

  • 32

    GeeleherP.CoxN. J.HuangR. S. (2014). Clinical drug response can be predicted using baseline gene expression levels and in vitrodrug sensitivity in cell lines. Genome Biol.15, R47. 10.1186/gb-2014-15-3-r47

  • 33

    GilletJ. P.VarmaS.GottesmanM. M. (2013). The clinical relevance of cancer cell lines. J. Natl. Cancer Inst.105, 452458. 10.1093/jnci/djt007

  • 34

    GopikaN.Meena KowshalyaA.M.E. (2018). “Correlation based feature selection algorithm for machine learning,” in 2018 3rd International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, October 15–16, 2018, 692695.

  • 35

    GuidottiR.MonrealeA.RuggieriS.TurinF.GiannottiF.PedreschiD. (2019). A survey of methods for explaining black box models. Acm Comput. Surv.51, 142. 10.1145/3236009

  • 36

    HanfW. (1975). The Boolean algebra of logic. Bull. Am. Math. Soc.81, 587589. 10.1090/s0002-9904-1975-13747-5

  • 37

    HeathE. I.HillmanD. W.VaishampayanU.ShengS.SarkarF.HarperF.et al (2008). A phase II trial of 17-allylamino-17-demethoxygeldanamycin in patients with hormone-refractory metastatic prostate cancer. Clin. Cancer Res.14, 138141. 10.3816/cgc.2005.n.024

  • 38

    HoadleyK. A.YauC.HinoueT.WolfD. M.LazarA. J.DrillE.et al (2018). Cell-of-Origin patterns dominate the molecular classification of 10, 000 tumors from 33 types of cancer. Cell173, 291304. 10.1016/j.cell.2018.03.022

  • 39

    HuangC.ClaytonE. A.MatyuninaL. V.McdonaldL. D.BenignoB. B.VannbergF.et al (2018). Machine learning predicts individual cancer patient responses to therapeutic drugs with high accuracy. Sci. Rep.8, 16444. 10.1038/s41598-018-34753-5

  • 40

    HughesG. (1968). On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory14, 5563. 10.1109/tit.1968.1054102

  • 41

    HuntingtonE. V. (1933). New sets of independent postulates for the algebra of logic, with special reference to Whitehead and Russell's principia mathematica. Trans. Am. Math. Soc.35, 274304. 10.1090/s0002-9947-1933-1501684-x

  • 42

    HuoZ.LomoraM.KymU.PalivanC.Holland-CunzS. G.GrosS. J. (2021). AQP1 is up-regulated by hypoxia and leads to increased cell water permeability, motility, and migration in neuroblastoma. Front. Cell Dev. Biol.9, 605272. 10.3389/fcell.2021.605272

  • 43

    HwangI.JungS.-I.HwangE.-C.SongS. H.LeeH.-S.KimS.-O.et al (2012). Expression and localization of aquaporins in benign prostate hyperplasia and prostate cancer. Chonnam Med. J.48, 174178. 10.4068/cmj.2012.48.3.174

  • 44

    IorioF.KnijnenburgT. A.VisD. J.BignellG. R.MendenM. P.SchubertM.et al (2016). A landscape of pharmacogenomic interactions in cancer. Cell166, 740754. 10.1016/j.cell.2016.06.017

  • 45

    JainT. K.KushwahaD. S.MisraA. K. (2008). “Optimization of the quine-McCluskey method for the minimization of the boolean expressions,” in Fourth International Conference on Autonomic and Autonomous Systems (ICAS'08), 165168.

  • 46

    JinL.ShenF.WeinfeldM.SergiC. (2020). Insulin growth factor binding protein 7 (IGFBP7)-Related cancer and IGFBP3 and IGFBP7 crosstalk. Front. Oncol.10, 727. 10.3389/fonc.2020.00727

  • 47

    JohnG. H.KohaviR.PflegerK. (1994). “Irrelevant features and the subset selection problem,” in Machine learning proceedings 1994. Editors CohenW. W.HirshH. (San Francisco (CA): Morgan Kaufmann), 121129.

  • 48

    JonesR. S.MorrisM. E. (2016). Monocarboxylate transporters: Therapeutic targets and prognostic factors in disease. Clin. Pharmacol. Ther.100, 454463. 10.1002/cpt.418

  • 49

    KnijnenburgT. A.KlauG. W.IorioF.GarnettM. J.McdermottU.ShmulevichI.et al (2016). Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci. Rep.6, 36812. 10.1038/srep36812

  • 50

    KokontisJ. M.LinH. P.JiangS. S.LinC. Y.FukuchiJ.HiipakkaR. A.et al (2014). Androgen suppresses the proliferation of androgen receptor-positive castration-resistant prostate cancer cells via inhibition of Cdk2, CyclinA, and Skp2. PLoS One9, e109170. 10.1371/journal.pone.0109170

  • 51

    KongJ.LeeH.KimD.HanS. K.HaD.ShinK.et al (2020). Network-based machine learning in colorectal and bladder organoid models predicts anti-cancer drug efficacy in patients. Nat. Commun.11, 5485. 10.1038/s41467-020-19313-8

  • 52

    KuleshovM. V.JonesM. R.RouillardA. D.FernandezN. F.DuanQ.WangZ.et al (2016). Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res.44, W90W97. 10.1093/nar/gkw377

  • 53

    KurtzA. (2012). Control of renin synthesis and secretion. Am. J. Hypertens.25, 839847. 10.1038/ajh.2011.246

  • 54

    LandmannH.ProiaD. A.HeS.OgawaL. S.KramerF.BeißbarthT.et al (2014). UDP glucuronosyltransferase 1A expression levels determine the response of colorectal cancer cells to the heat shock protein 90 inhibitor ganetespib. Cell Death Dis.5, e1411. 10.1038/cddis.2014.378

  • 55

    LeeD.ParkY.KimS. (2021). Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief. Bioinform22, bbaa188. 10.1093/bib/bbaa188

  • 56

    LimaA. R.PintoJ.AzevedoA. I.Barros-SilvaD.JerónimoC.HenriqueR.et al (2019). Identification of a biomarker panel for improvement of prostate cancer diagnosis by volatile metabolic profiling of urine. Br. J. Cancer121, 857868. 10.1038/s41416-019-0585-4

  • 57

    LinardatosP.PapastefanopoulosV.KotsiantisS. (2020). Explainable AI: A review of machine learning interpretability methods. Entropy (Basel)23, 18. 10.3390/e23010018

  • 58

    LinardatosP.PapastefanopoulosV.KotsiantisS. (2021). Explainable AI: A review of machine learning interpretability methods. Entropy23, 18. 10.3390/e23010018

  • 59

    LiuY.WangY.ZhangJ. (2012). “New machine learning algorithm: Random forest,” in Information computing and applications. Editors LiuB.MaM.ChangJ. (Springer Berlin Heidelberg), 246252.

  • 60

    LundbergS. M.LeeS.-I. (2017). “A unified approach to interpreting model predictions,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA: Curran Associates Inc).

  • 61

    Marcano-CedeñoA.Quintanilla-DomínguezJ.Cortina-JanuchsM. G.AndinaD. (2010). “Feature selection using sequential forward selection and classification applying artificial metaplasticity neural network,” in IECON 2010 - 36th Annual Conference on IEEE Industrial Electronics Society, 28452850.

  • 62

    MasicaD. L.KarchinR. (2013). Collections of simultaneously altered genes as biomarkers of cancer cell drug response. Cancer Res.73, 16991708. 10.1158/0008-5472.CAN-12-3122

  • 63

    MateoJ.SteutenL.AftimosP.AndréF.DaviesM.GarraldaE.et al (2022). Delivering precision oncology to patients with cancer. Nat. Med.28, 658665. 10.1038/s41591-022-01717-2

  • 64

    MccluskeyE. J. (1956). Minimization of boolean functions. Bell Syst. Tech. J.35, 14171444. 10.1002/j.1538-7305.1956.tb03835.x

  • 65

    MendenM. P.IorioF.GarnettM.McdermottU.BenesC. H.BallesterP. J.et al (2013). Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One8, e61318. 10.1371/journal.pone.0061318

  • 66

    Min-LingZ.Zhi-HuaZ. (2005). “A k-nearest neighbor based algorithm for multi-label classification,” in 2005 IEEE International Conference on Granular Computing, 718721.

  • 67

    MobasheriA.AirleyR.HewittS. M.MarplesD. (2005). Heterogeneous expression of the aquaporin 1 (AQP1) water channel in tumors of the prostate, breast, ovary, colon and lung: a study using high density multiple human tumor tissue microarrays. Int. J. Oncol.26, 11491158. 10.3892/ijo.26.5.1149

  • 68

    MøllerM. F. (1993). A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw.6, 525533. 10.1016/s0893-6080(05)80056-5

  • 69

    MostaghelE. A. (2013). Steroid hormone synthetic pathways in prostate cancer. Transl. Androl. Urol.2, 212227. 10.3978/j.issn.2223-4683.2013.09.16

  • 70

    MshaikR.SimonetJ.GeorgievskiA.JamalL.BechouaS.BalleriniP.et al (2021). HSP90 inhibitor NVP-BEP800 affects stability of SRC kinases and growth of T-cell and B-cell acute lymphoblastic leukemias. Blood Cancer J.11, 61. 10.1038/s41408-021-00450-2

  • 71

    NalejskaE.MaczynskaE.LewandowskaM. A. (2014). Prognostic and predictive biomarkers: tools in personalized oncology. Mol. Diagn Ther.18, 273284. 10.1007/s40291-013-0077-9

  • 72

    NaulaertsS.MendenM. P.BallesterP. J. (2020). Concise polygenic models for cancer-specific identification of drug-sensitive tumors from their multi-omics profiles. Biomolecules10, 963. 10.3390/biom10060963

  • 73

    NavadaA.AnsariA. N.PatilS.SonkambleB. A. (2011). “Overview of use of decision tree algorithms in machine learning,” in 2011 IEEE Control and System Graduate Research Colloquium, 3742.

  • 74

    NguyenL.DangC. C.BallesterP. J. (2016). Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data. F1000Res5. 10.12688/f1000research.10529.2

  • 75

    NguyenE. V.CenteneraM. M.MoldovanM.DasR.IraniS.VincentA. D.et al (2018). Identification of novel response and predictive biomarkers to Hsp90 inhibitors through proteomic profiling of patient-derived prostate tumor explants. Mol. Cell Proteomics17, 14701486. 10.1074/mcp.RA118.000633

  • 76

    NguyenL. C.NaulaertsS.BrunaA.GhislatG.BallesterP. J. (2021). Predicting cancer drug response in vivo by learning an optimal feature selection of tumour molecular profiles. Biomedicines9, 1319. 10.3390/biomedicines9101319

  • 77

    OgunleyeA. Z.PiyawajanusornC.GonçalvesA.GhislatG.BallesterP. J. (2022). Interpretable machine learning models to predict the resistance of breast cancer patients to doxorubicin from their microRNA profiles. Adv. Sci.9, e2201501. 10.1002/advs.202201501

  • 78

    ParcaL.PepeG.PietrosantoM.GalvanG.GalliL.PalmeriA.et al (2019). Modeling cancer drug response through drug-specific informative genes. Sci. Rep.9, 15222. 10.1038/s41598-019-50720-0

  • 79

    PernasS.TolaneyS. M.WinerE. P.GoelS. (2018). CDK4/6 inhibition in breast cancer: current practice and future directions. Ther. Adv. Med. Oncol.10, 1758835918786451. 10.1177/1758835918786451

  • 80

    PerryW. L.3rdShepardR. L.SampathJ.YadenB.ChinW. W.IversenP. W.et al (2005). Human splicing factor SPF45 (RBM17) confers broad multidrug resistance to anticancer drugs when overexpressed--a phenotype partially reversed by selective estrogen receptor modulators. Cancer Res.65, 65936600. 10.1158/0008-5472.CAN-03-3675

  • 81

    PichO.BaileyC.WatkinsT. B. K.ZaccariaS.Jamal-HanjaniM.SwantonC. (2022). The translational challenges of precision oncology. Cancer Cell40, 458478. 10.1016/j.ccell.2022.04.002

  • 82

    QuineW. V. (1955). A way to simplify truth functions. Am. Math. Mon.62, 627631. 10.1080/00029890.1955.11988710

  • 83

    Quintás-CardamaA.CortesJ. (2009). Molecular biology of bcr-abl1-positive chronic myeloid leukemia. Blood113, 16191630. 10.1182/blood-2008-03-144790

  • 84

    RectorF. C.CarterN. W.SeldinD. W. (1998). The mechanism of bicarbonate reabsorption in the proximal and distal tubules of the kidney. J. Am. Soc. Nephrol.9, 11341145; discussion 1134-1139. 10.1681/ASN.V961134

  • 85

    RibeiroM. T.SinghS.GuestrinC. (2016). “Why should I trust you? Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA: Association for Computing Machinery).

  • 86

    SampathJ.LongP. R.ShepardR. L.XiaX.DevanarayanV.SanduskyG. E.et al (2003). Human SPF45, a splicing factor, has limited expression in normal tissues, is overexpressed in many tumors, and can confer a multidrug-resistant phenotype to cells. Am. J. Pathol.163, 17811790. 10.1016/S0002-9440(10)63538-9

  • 87

    Seashore-LudlowB.ReesM. G.CheahJ. H.CokolM.PriceE. V.ColettiM. E.et al (2015). Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov.5, 12101223. 10.1158/2159-8290.CD-15-0235

  • 88

    ShinS. Y.RathO.ZebischA.ChooS. M.KolchW.ChoK. H. (2010). Functional roles of multiple feedback loops in extracellular signal-regulated kinase and Wnt signaling pathways that regulate epithelial-mesenchymal transition. Cancer Res.70, 67156724. 10.1158/0008-5472.CAN-10-1377

  • 89

    SobczukP.SzczylikC.PortaC.CzarneckaA. M. (2017). Renin angiotensin system deregulation as renal cancer risk factor. Oncol. Lett.14, 50595068. 10.3892/ol.2017.6826

  • 90

    SolitD. B.ZhengF. F.DrobnjakM.MünsterP. N.HigginsB.VerbelD.et al (2002). 17-Allylamino-17-demethoxygeldanamycin induces the degradation of androgen receptor and HER-2/neu and inhibits the growth of prostate cancer xenografts. Clin. Cancer Res.8, 986993.

  • 91

    StiefA.OttewillJ. R.BaranowskiJ. (2018). “Relief F-based feature ranking and feature selection for monitoring induction motors,” in 2018 23rd International Conference on Methods & Models in Automation & Robotics (MMAR), 171176.

  • 92

    SuR.LiuX.WeiL.ZouQ. (2019). Deep-resp-Forest: A deep forest model to predict anti-cancer drug response. Methods166, 91102. 10.1016/j.ymeth.2019.02.009

  • 93

    SullivanL.MurphyT. M.BarrettC.LoftusB.ThornhillJ.LawlerM.et al (2012). IGFBP7 promoter methylation and gene expression analysis in prostate cancer. J. Urol.188, 13541360. 10.1016/j.juro.2012.06.002

  • 94

    TablA. A.AlkhateebA.ElmaraghyW.RuedaL.NgomA. (2019). A machine learning approach for identifying gene biomarkers guiding the treatment of breast cancer. Front. Genet.10, 256. 10.3389/fgene.2019.00256

  • 95

    TangY.ZhangY. Q.HuangZ. (2007). Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis. IEEE/ACM Trans. Comput. Biol. Bioinform.4, 365381. 10.1109/TCBB.2007.70224

  • 96

    TrepelJ.MollapourM.GiacconeG.NeckersL. (2010). Targeting the dynamic HSP90 complex in cancer. Nat. Rev. Cancer10, 537549. 10.1038/nrc2887

  • 97

    TurajlicS.SottorivaA.GrahamT.SwantonC. (2019). Resolving genetic heterogeneity in cancer. Nat. Rev. Genet.20, 404416. 10.1038/s41576-019-0114-6

  • 98

    VasquezM. M.HuC.RoeD. J.ChenZ.HalonenM.GuerraS. (2016). Least absolute shrinkage and selection operator type methods for the identification of serum biomarkers of overweight and obesity: simulation and application. BMC Med. Res. Methodol.16, 154. 10.1186/s12874-016-0254-8

  • 99

    WangC. Y.GuoS. T.WangJ. Y.YanX. G.FarrellyM.ZhangY. Y.et al (2016). Reactivation of ERK and Akt confers resistance of mutant BRAF colon cancer cells to the HSP90 inhibitor AUY922. Oncotarget7, 4959749610. 10.18632/oncotarget.10414

  • 100

    WeiX.DongJ. (2015). Aquaporin 1 promotes the proliferation and migration of lung cancer cell in vitro. Oncol. Rep.34, 14401448. 10.3892/or.2015.4107

  • 101

    WeiR.WangJ.JiaE.ChenT.NiY.JiaW. (2018). GSimp: A gibbs sampler based left-censored missing value imputation approach for metabolomics studies. PLoS Comput. Biol.14, e1005973. 10.1371/journal.pcbi.1005973

  • 102

    WhitesittJ. E. (2012). Boolean algebra and its applications. Mineola, New York: Dover Publications.

  • 103

    WildingG. (1992). The importance of steroid hormones in prostate cancer. Cancer Surv.14, 113130.

  • 104

    XiJ.YuanX.WangM.LiA.LiX.HuangQ. (2019). Inferring subgroup-specific driver genes from heterogeneous cancer samples via subspace learning with subgroup indication. Bioinformatics36, 18551863. 10.1093/bioinformatics/btz793

  • 105

    YamakiH.NakajimaM.ShimotohnoK. W.TanakaN. (2011). Molecular basis for the actions of Hsp90 inhibitors and cancer therapy. J. Antibiot. (Tokyo)64, 635644. 10.1038/ja.2011.60

  • 106

    YangW.SoaresJ.GreningerP.EdelmanE. J.LightfootH.ForbesS.et al (2013). Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res.41, D955D961. 10.1093/nar/gks1111

  • 107

    YinX.YuJ.ZhouY.WangC.JiaoZ.QianZ.et al (2018). Identification of CDK2 as a novel target in treatment of prostate cancer. Future Oncol.14, 709718. 10.2217/fon-2017-0561

  • 108

    YousefM.JungS.KossenkovA. V.ShoweL. C.ShoweM. K. (2007). Naïve Bayes for microRNA target predictions—Machine learning for microRNA targets. Bioinformatics23, 29872992. 10.1093/bioinformatics/btm484

  • 109

    ZhangF.KaufmanH. L.DengY.DrabierR. (2013). Recursive SVM biomarker selection for early detection of breast cancer in peripheral blood. BMC Med. Genomics6, S4. 10.1186/1755-8794-6-S1-S4

  • 110

    ZhangZ. (2016). Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med.4, 218. 10.21037/atm.2016.03.37

  • 111

    ZhongL.LiY.XiongL.WangW.WuM.YuanT.et al (2021). Small molecules in targeted cancer therapy: advances, challenges, and future perspectives. Signal Transduct. Target Ther.6, 201. 10.1038/s41392-021-00572-w

  • 112

    ZhouZ. H.FengJ. (2017). “Deep forest: Towards an alternative to deep neural networks,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 35533559.

  • 113

    ZhouZ.-H.FengJ. (2018). Deep forest. Natl. Sci. Rev.6, 7486. 10.1093/nsr/nwy108

  • 114

    ZhuC. S.PinskyP. F.CramerD. W.RansohoffD. F.HartgeP.PfeifferR. M.et al (2011). A framework for evaluating biomarkers for early detection: validation of biomarker panels for ovarian cancer. Cancer Prev. Res. Phila. Pa.4, 375383. 10.1158/1940-6207.CAPR-10-0193

Summary

Keywords

precision oncology, predictive biomarker, machine learning, feature selection, Boolean function minimization, prostate cancer, Hsp90 inhibitor, 17-AAG

Citation

Shin S-Y, Centenera MM, Hodgson JT, Nguyen EV, Butler LM, Daly RJ and Nguyen LK (2023) A Boolean-based machine learning framework identifies predictive biomarkers of HSP90-targeted therapy response in prostate cancer. Front. Mol. Biosci. 10:1094321. doi: 10.3389/fmolb.2023.1094321

Received

10 November 2022

Accepted

06 January 2023

Published

19 January 2023

Volume

10 - 2023

Edited by

George Bebis, University of Nevada, United States

Reviewed by

Jianing Xi, Guangzhou Medical University, China

Chaowang Lan, Guilin University of Electronic Technology, China

Updates

Copyright

*Correspondence: Sung-Young Shin, ; Lan K. Nguyen,

This article was submitted to Biological Modeling and Simulation, a section of the journal Frontiers in Molecular Biosciences

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics