ORIGINAL RESEARCH article

Front. Chem., 16 July 2019

Sec. Medicinal and Pharmaceutical Chemistry

Volume 7 - 2019 | https://doi.org/10.3389/fchem.2019.00509

Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data

  • 1. CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France

  • 2. Department of Tumor Immunology, Institut de Duve, Bruxelles, Belgium

Abstract

Drug combinations are of great interest for cancer treatment. Unfortunately, the discovery of synergistic combinations by purely experimental means is only feasible on small sets of drugs. In silico modeling methods can substantially widen this search by providing tools able to predict which of all possible combinations in a large compound library are synergistic. Here we investigate to which extent drug combination synergy can be predicted by exploiting the largest available dataset to date (NCI-ALMANAC, with over 290,000 synergy determinations). Each cell line is modeled using primarily two machine learning techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), on the datasets provided by NCI-ALMANAC. This large-scale predictive modeling study comprises more than 5,000 pair-wise drug combinations, 60 cell lines, 4 types of models, and 5 types of chemical features. The application of a powerful, yet uncommonly used, RF-specific technique for reliability prediction is also investigated. The evaluation of these models shows that it is possible to predict the synergy of unseen drug combinations with high accuracy (Pearson correlations between 0.43 and 0.86 depending on the considered cell line, with XGBoost providing slightly better predictions than RF). We have also found that restricting to the most reliable synergy predictions results in at least 2-fold error decrease with respect to employing the best learning algorithm without any reliability estimation. Alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors are the drugs whose synergy with other partner drugs are better predicted by the models. Despite its leading size, NCI-ALMANAC comprises an extremely small part of all conceivable combinations. Given their accuracy and reliability estimation, the developed models should drastically reduce the number of required in vitro tests by predicting in silico which of the considered combinations are likely to be synergistic.

Introduction

Drug combinations are a well-established form of cancer treatment (Bayat Mokhtari et al., ). Administering more than one drug can provide many benefits: higher efficacy, lower toxicity, and at least delayed onset of acquired drug resistance (Sugahara et al., 2010; Holohan et al., ; Crystal et al., ). Serendipitous discovery in the clinic has been a traditional source of effective drug combinations (Zoli et al., 2001; Kurtz et al., ). Yet systematic large-scale efforts to identify them have only recently been pursued, with a growing number of preclinical experimental efforts to identify synergistic combinations (Zoli et al., 2001; Budman et al., ; Lieu et al., ; Kashif et al., ; Yu et al., 2015; Kischkel et al., ) being reported in literature. The sheer number of available and possible drug-like molecules (Polishchuk et al., ) and an exponential number of their combinations, however, make the process of finding new therapeutic combinations by purely experimental means highly inefficient.

An efficient way of discovering molecules with previously unknown activity on a given target is using in silico prediction methods. Quantitative Structure-Activity Relationship (QSAR) models establish a mathematical relationship between the chemical structure of a molecule, encoded as a set of structural and/or physico-chemical features (descriptors), and its biological activity on a target. Such methods have been successfully used in a wide variety of pharmacology and drug design projects (Cherkasov et al., ), including cancer research (Chen et al., ; Mullen et al., ; Ali and Aittokallio, ). QSAR models are traditionally built using simple linear models (Sabet et al., ; Pick et al., ; Speck-Planche et al., , ) to predict the activity of individual molecules against a molecular target. In the last 15 years, non-linear machine learning methods, such as Neural Network (NN) (González-Díaz et al., ), Support Vector Machine (SVM) (Doucet et al., ) or Random Forest (RF) (Singh et al., ), have also been employed to build QSAR models. More recently, QSAR modeling has also achieved accurate prediction of compound activity on non-molecular targets such as cancer cell lines (Kumar et al., ).

To extend QSAR modeling beyond individual molecules, the set of features from each molecule in the combination must be integrated. Various ways exist to encode two or more molecules as a feature vector, e.g., SIRMS descriptors (Kuz'min et al., ) for properties of combinations or the CGR approach for chemical reactions (de Luca et al., ). Rigorous validation strategies for the resulting models have been developed too (Muratov et al., ). The most common representation of a drug pair is, however, the concatenation of features from both molecules (Bulusu et al., ). On the other hand, modeling drug combinations requires the quantification of their synergy. Several metrics exist to quantify synergy (Foucquier and Guedj, ) (e.g., Bliss independence Bliss, , Loewe additivity Chou and Talalay, , Highest single agent approach Greco et al., or Chou-Talalay Method Chou, ). These are implemented in various commercial and publicly available software kits for the analysis of combination data, e.g., Combenefit (Di Veroli et al., ), CompuSyn (http://www.combosyn.com) or CalcuSyn (http://www.biosoft.com/w/calcusyn.htm).

One major roadblock in drug synergy modeling has been the lack of homogeneous data (i.e., datasets generated with the same assay, experimental conditions and synergy quantification). This has been, however, alleviated by the recent availability of large datasets from High-Throughput Screening (HTS) of drug combinations on cancer cell lines. For instance, Merck has released an HTS synergy dataset (O'Neil et al., ), covering combinations of 38 drugs and their activity against 39 cancer cell lines (more than 20,000 measured synergies). This dataset has been used to build predictive regression and classification models using multiple machine learning methods (Preuer et al., ). AstraZeneca carried out a screening study, spanning 910 drug combinations over 85 cancer cell lines (over 11,000 measured synergy scores), which was subsequently used for a DREAM challenge (Li et al., ; Menden et al., ). Very recently, the largest publicly available cancer drug combination dataset has been provided by the US National Cancer Institute (NCI). This NCI-ALMANAC (Holbeck et al., ) tested over 5,000 combinations of 104 investigational and approved drugs, with synergies measured against 60 cancer cell lines, leading to more than 290,000 synergy scores (ComboScores).

NCI-ALMANAC datasets have recently been modeled to predict the best growth inhibition of a given drug combination—cell line tuple (Xia et al., 2018). However, the question remains of how well ComboScores can be predicted on each NCI-60 cell line, which is important given that ComboScore-based screening has led to the identification of novel synergistic drug combinations in vivo (Holbeck et al., ). Here we present a large-scale study addressing this question. We build an individual model for each cell line using the popular RF algorithm (Breiman, ). We also build a second model per cell line using XGBoost (XGB for short) (Chen and Guestrin, ), a recent machine learning method that has helped to win numerous Kaggle competitions (Chen and Guestrin, ) as well as to generate highly predictive QSAR models (Sheridan et al., ). We validate these models for commonly-encountered prediction scenarios: e.g., unseen drug combination or unseen drug partner. We also introduce and validate reliability estimation techniques to further improve prediction of drug combination synergy. Lastly, we assess the suitability of NCI-ALMANAC datasets for predictive modeling depending on the screening center where they were generated.

Methods

Data

NCI-ALMANAC is the largest-to-date phenotypic drug combination HTS. It contains the synergy measurements of pairwise combinations of 104 FDA approved drugs on the 60 cancer cell lines forming the NCI-60 panel (Shoemaker, ). The drugs include a wide array of small organic compound families, as well as several inorganic molecules (cisplatin and related platinum-organic compounds, arsenic trioxide). A similarity clustering dendrogram (Figure 1) shows the high diversity of the drugs in NCI-ALMANAC. Indeed, only 3 clusters comprising 8 drugs are formed with a Tanimoto score threshold of 0.8 (Vinblastine with Vincristine, Sirolimus and Everolimus, and Daunorubicin-Doxorubicin-Idarubicin-Epirubicin clusters), while the remaining 96 drugs have smaller similarity among them.

Figure 1

NCI-ALMANAC aggregates synergy data from three screening centers: NCI's Frederick National laboratory for Cancer Research (screening center code 1A, 11,259 synergy determinations), SRI International (FF, 146,147 determinations), and University of Pittsburgh (FG, 136,129 determinations). The synergy of drug pairs is measured in these screening centers against the NCI-60 panel, which includes cell lines from nine cancer types: leukemia, melanoma, non-small-cell lung, colon, central nervous system, ovarian, renal, prostate, and breast. In total, synergy is measured for 293,565 drug combination—cell line tuples, which represents a matrix completeness of 91.35%. Each center follows its own protocol, and some drugs are absent from the combination pool depending on the screening center. Since there is no overlap between drug combination—cell line tuples between the three centers, it is not possible to estimate inter-center batch effects, and therefore we must use data from different screening centers separately.

The combination benefit is quantified in NCI-ALMANAC by the so-called ComboScore (a modified version of the Bliss independence model). From the entire dose-response matrix of the considered drug combination and cell line tuple, the gain (or loss) of the effect achieved by the combination over the theoretically expected value if the effect was additive is calculated. Positive values of ComboScore indicate a synergistic effect of the combination, whereas the negative correspond to an antagonistic effect (those purely additive obtain a zero ComboScore).

Further description of NCI-ALMANAC data is available at Supplementary Information.

Features

For the use in machine learning, the structures of compounds must be encoded as vectors of numerical features known in chemoinformatics as molecular descriptors (Todeschini and Consonni, 2000). Several types of chemical structure features have been considered in this work: (1) Morgan FingerPrints (MFP) are topological descriptors describing the connectivity of the molecular structure, which take values 0 or 1, depending on whether the pattern is present in the molecule or not (Rogers and Hahn, ). They have been calculated with RDKit library (Lamdrum, ) using the following parameters—length is 256 bits, radius is 2. (2) Morgan FingerPrint Counts (MFPC) are a non-binary version of MFP that takes integer values equal to the number of times the pattern is detected in the molecule (256 features per drug, also calculated with RDKit). (3) MACCS keys encode presence or absence of 166 predetermined substructural fragments as binary vectors (calculated with RDKit). (4) ISIDA fragments encode structure as a vector of numbers of occurrences of substructural fragments of given nature and topology in the molecule (Varnek et al., 2005), which are calculated with ISIDA/Fragmentor (Ruggiu et al., ). Only one type of fragments is considered here: sequences of atoms and bonds of length 2 to 6 (1,325 features per drug in total). (5) SIRMS fragments are the number of occurrences of 4-atom fragments of varying topology in a molecule, including bonded and non-bonded atoms (Kuz'min et al., ). Calculated with SiRMS python library (github.com/DrrDom/sirms), it led to 1,454 features per drug. In addition to these sets, 7 physico-chemical features are calculated by RDkit: total polar surface area (TPSA), molecular weight, logP, number of aliphatic and aromatic rings, H-bond donors and acceptors.

Machine Learning (ML) Workflow

Models are built using two ML algorithms: Random Forest (RF) (Svetnik et al., 2003) and Extreme Gradient Boosting (XGBoost; XGB for short) (Sheridan et al., ). The entire modeling workflow is sketched in Figure 1. Further details about how ML models were built are available at the Supplementary Information.

Predictive Performance Metrics

To evaluate the performance of a model, the following metrics are calculated from observed yobs and predicted ypred ComboScore values:

Root Mean Squared Error (RMSE)

Coefficient of determination (R2) (Leach and Gillet, )

Pearson's correlation coefficient (Rp)

Spearman's rank-order correlation coefficient (Rs)

We use Rp between observed and predicted values of ComboScore of a dataset not used to train the model as a primary metric of its accuracy. For proper estimation of the generalization error, these metrics are always calculated here on a test set not used to train or select the model.

Results

Exploratory Modeling of NCI-ALMANAC Data

First, we perform an exploratory modeling on the FG datasets in order to determine optimal settings for synergy prediction by assessing various types of features, data augmentation schemes and machine learning methods. The summary of performance improvements is shown on Figure 2. The best median Rp across cell lines for RF was obtained with 250 trees, a third of the features evaluated at each tree node, training data augmentation and MFPC fingerprints complemented by physico-chemical properties (256 and 7 features per drug, respectively). The gain of performance with RF is substantial: the median Rp increases from 0.530 (I) to 0.634 (VI).

Figure 2

XGB models are generated with the same features and data set partitions. Changing the machine learning algorithm from RF to XGB does not improve the median test set Rp, although both minimum and maximum Rp are higher with XGB (boxplots VI and VII in Figure 2, respectively). After tuning of XGB hyperparameters per cell line, a small gain in overall performance is obtained: the median Rp of tuned XGB rises to 0.641 (boxplot VIII). In comparison, Y-randomization (Tropsha et al., 2003) tests using the same learning algorithm did only obtain a median Rp of −0.016 (−0.024 when using RF). Figure 3 shows the degree of accuracy achieved by each algorithm for the best and the worst predicted cell line. The cell lines with the worst predictions (OVCAR-8 for RF and SF295 for XGB) have substantially smaller variance in observed ComboScore than those with the best predictions (SK-MEL-5 for both algorithms).

Figure 3

Estimating the Reliability of Drug Synergy Predictions

For prospective use of models, it is paramount to calculate not only predicted drug combination synergies, but also how reliable these predictions are (Mathea et al., ). With this purpose, we have applied a RF-specific reliability prediction approach, where the degree of agreement between the diverse trees in the forest serves as a reliability score. This is quantified here as the standard deviation (SD) of the RF tree predictions (250 per drug combination and cell line) and referred to as tree_SD. tree_SD has been pointed out as one of the most powerful metrics to assess the reliability of predictions in regression problems (Mathea et al., ). We thus assemble test subsets with the 25% most reliable ComboScore predictions per cell line (i.e., combinations with the 25% lowest tree_SD scores). Likewise, we assemble test subsets with 25% least reliable predictions per cell line.

Figure 4 presents the test set performances of each cell line model on the three scenarios: 25% most reliable predictions, all predictions regardless of estimated reliability and 25% least reliable predictions. The top and bottom 25% predictions in terms of reliability obtain the lowest and highest RMSE in every cell line, which demonstrates the accuracy and generality of tree_SD as a reliability score for drug synergy predictions. Test set RMSE varies greatly across cell lines, e.g., models built on leukemia cell lines obtain in general higher error. This, however, comes from the higher range of ComboScores observed in these cell lines. Indeed, the larger this range, the higher the range of predicted ComboScores is, which combined tend to make RMSE larger. Similar RMSE is only obtained on the K-562 leukemia cell line, which is consistent with the fact that it has the lowest range among leukemia cell lines and similar to that of other cancer types.

Figure 4

Reliability estimation is evaluated in terms of RMSE rather than Rp. While RMSE is not as intuitive as correlation, correlations may be misleading when comparing performances of models across test sets with distinct variances. Figure 5 illustrates this issue with the test performances of HL-60 models, which benefit the most from reliability estimation. The test set with the most reliable combinations is predicted with half the RMSE of the entire test set (RMSE of 41 vs. 80) and a third of the least reliable combinations (RMSE of 41 vs. 117). This more accurate prediction can be visually observed too, but the other metrics (R2, Rp, and Rs) do not capture this increase in accuracy due to substantially different ComboScore variance between the compared test sets. Importantly, RF with reliability prediction provides a much larger reduction in RMSE than that introduced by XGB (bottom right), both with respect to RF without reliability prediction (bottom left). These results strongly suggest that, in cases where it is not necessary to test all positive predictions (here synergistic drug combinations), selecting the most reliable predictions is more effective than using the most suitable ML algorithm.

Figure 5

Performance in Predicting Synergies With Drugs Not Included in NCI-ALMANAC

The random data splits that we have used so far may overestimate the model's performance in the case of drug combinations. This would be due to the presence of the two drugs in the combination in both training and test sets, albeit with other partners (Muratov et al., ). In order to assess to which extent this is the case, we also carry out Leave-One-Drug-Out (LODO) cross-validation experiments for each cell line. In LODO cross-validation, every combination containing the considered left-out drug is placed in its test set, and the model is built on the remaining combinations tested on that cell line. Thus, there are as many folds as drugs in the dataset. In this way, the LODO cross-validation simulates the model's behavior when presented with a new chemical entity outside of the model's scope, as if it was not included in the dataset.

Figure 6 shows the outcome of LODO cross-validation for XGB per cell line. We henceforth use XGB with the recommended values for hyperparameters, as tuning them for each LODO cross-validation fold and cell line is prohibitive and would only provide marginal gains (see Figure 2). LODO results show that combinations associated with 75% of the left-out drugs can be predicted with an accuracy of at least Rp = 0.3 against any cell line. This accuracy raises to at least Rp = 0.5 for 50% of the left-out drugs. The latter is not much worse than the median Rp across cell lines when using 90/10 data partitions (Rp = 0.641 as shown in Figure 2's boxplot VIII). k-fold cross-validation results are available for comparison in Supplementary Figure 3.

Figure 6

Figure 7 shows the analysis for LODO cross-validations in terms of RMSE. About 75% of models demonstrate at least moderate accuracy (RMSE < 50). The exceptions are mostly leukemia cell line models, which obtain higher RMSE due to having the highest variances in ComboScores among cancer types. An important result is that using RF models restricted to the most reliable predictions allows us to reduce the error of prediction further in every cell line (RMSE < 40), in full agreement with the findings from random 90/10 partitions (see Figure 4) and also outperforming the best models without reliability prediction.

Figure 7

Analyzing LODO results per left-out drug instead of per cell line reveals that synergy prediction is much worse for certain left-out drugs in each cancer type. LODO performance of each drug across cell lines is shown in Supplementary Figure 4. This figure shows that models for arsenic trioxide, highly dissimilar to other drugs, have the lowest performance across cell lines and partner drugs (median Rp of models concerning this drug is −0.28). Conversely, partners of tyrosine kinase inhibitors, well-represented in these datasets (e.g., Imatinib, Nilotinib or Lapatinib), are predicted with high accuracy (e.g., models for imatinib have median Rp = 0.82). Topoisomerase inhibitors (Teniposide and Etoposide) are also among the best-predicted left-out drugs. These in silico models could be used to anticipate how the synergies of a drug in combination with its partner drugs would vary across NCI-60 cell lines. However, since high accuracy is only obtained on those left-out drugs well-represented in NCI-ALMANAC, such selectivity predictions should only be accurate for drugs with similar chemical structure to those in NCI-ALMANAC. As models predicting drug-induced cell line response have been shown to improve by integrating drug features with multi-omics cell features (Menden et al., ; Xia et al., 2018), we expect that predicting drug synergy across cell lines will also improve by following such multi-task learning approach on this closely related problem.

Comparing Predictive Models Built With Data From Different Screening Centers

So far we have exclusively employed data from the FG screening center, which represents about half of NCI-ALMANAC data. Practically all the remaining ComboScores come from the FF screening center and are also determined with a 3 ×3 grid of non-zero concentrations. Thus, we evaluate here the predictive potential of FF datasets. We start by building RF models from FF data using the same 90/10 partitions as with FG. Surprisingly, FF-based models obtained worse performance in every cell line (Figure 8) and thus were objectively worse at predicting ComboScores.

Figure 8

In trying to understand this unexpected result, we started by investigating whether this was due to modeling differences, but this was not the case. First, FF training sets are slightly larger than FG datasets (see Supplementary Table 1), which theoretically favors better performance on FF. Furthermore, using tuned XGB models led to essentially the same result (median Rp of 0.641 for FG vs. 0.368 for FF) as shown in Figure 8 with RF. In addition to these non-linear methods, we also used Elastic Net (EN), but FF models were still substantially less predictive than FG models (median Rp of 0.37 for FG vs. 0.23 for FF). When we carried out LODO cross-validations instead of 90/10 partitions, the same trend was observed (Supplementary Figures 5, 6 also show worse performance of FF-based LODO than that of FG-based LODO in Figure 6).

To shed light into this issue, we looked at the only factor that we can compare between these screening centers: the relative growth inhibition (PERCENTGROWTH) induced by a given concentration of a drug tested individually. Interestingly, by counting the different test dates, we observed that FG had on average tested a non-combined drug 3.77 times per cell line, whereas FF almost doubled this number (7.13 times per cell line). A higher number of tests is not in itself worrisome if the growth inhibition of the drug-concentration-cell line tuple is similar between dates. However, if the measurements from these tests are substantially different, this is a problem because the set of ComboScores determined with variable measurements from the same tuple will be inconsistent as well. Consequently, synergy differences between such combinations will not only come from their intrinsic properties, but also from unrelated experimental variability.

To show that higher growth inhibition variability in FF data results in less predictive models, we analyzed five drugs (Thioguanine, Chlorambucil, Altretamine, Fluorouracil, and Melphalan) with a high number of different test dates in both centers. We first consider the drugs on a cell line were only FG models obtain high average accuracy in predicting synergy (NCI/ADR-RES) and subsequently on another where both FF and FG models are on average predictive (NCI-H322M). On each cell line, each drug has a set of growth inhibition replicates per concentration and screening center (i.e., 15 sets per screening center). The performance on NCI/ADR-RES using FF data is indeed poor (Rp = 0.14 in 90/10 partition by RF), but it is much better predicted using FG data (Rp = 0.65, using the same partition and method). Fourteen of the fifteen sets have higher standard deviation of growth inhibition with FF data (Figure 9), which is consistent with the lower accuracy in predicting synergy obtained with this dataset. Conversely, we repeated this operation with NCI-H322M where synergy is well-predicted by RF with both FF (Rp = 0.61 in 90/10 partition) and FG data (Rp = 0.66, on the same partition). The standard deviations from both screening centers are now similar (Figure 9). Taken together, these experiments suggest that the reason why FF data results in less predictive models is the noise introduced in ComboScore determination by larger variability of growth inhibition measurements.

Figure 9

Discussion

NCI-ALMANAC is an extremely valuable resource for the discovery of novel synergistic drug combinations on NCI-60 cell lines. First, it is by far the largest-to-date HTS of drug combinations, therefore allowing in silico models with much higher accuracy and broader domain of applicability in predicting the synergy of other combinations. Second, some of the synergistic drug combinations discovered in vitro by NCI-ALMANAC were subsequently tested on human tumor mouse xenografts of the same cell line. 48% of them were also synergistic in at least one of these in vivo models (Holbeck et al., ), which led to the launch of two clinical trials so far (NCT02211755 and NCT02379416).

In this study, we have found that it is possible to predict the synergy of unseen drug combinations against NCI-60 panel cell lines with high accuracy by exploiting NCI-ALMANAC data. We have established a general ML workflow (types of structural features, data preprocessing strategy, ML method) to generate such models. When trained on FG data, predicted synergies from these models match observable synergies with Rp correlations comprised between 0.43 and 0.86 depending on the considered cell line. Incidentally, these regression problems must be highly non-linear, as EN leads to substantially less predictive models than XGB or RF.

Some cell lines and drug combinations can be predicted with higher accuracy than others. For example, models for the SK-MEL-5 cell line perform best with any method (Figure 6). However, if we use RMSE instead of Rp to reduce the influence of the ComboScore range, models for the NCI-ARD-RES are now best (gray squares in Figure 7). Another explanatory factor for this variability is the adequacy of the employed ML technique to the problem instance to solve (each cell line constitutes here a different problem instance). Even if training set size, features and classifier are the same, the modeled relationship between drug synergy and features depend on training set composition and cell line properties (implicitly). It is well-established that the performance of supervised learning algorithms varies depending on the problem instance in ways that cannot be anticipated without doing the actual numerical experiments (Fernández-Delgado et al., ). LODO cross-validation also revealed both best and worst partner drugs. These differences are mainly due to the number of similar partner drugs. For example, it is difficult to predict synergy of combinations containing arsenic trioxide because its 103 partner drugs are highly dissimilar in terms of chemical structure and physico-chemical properties. Indeed, machine learning from dissimilar data instances tend to be less accurate, although here the dissimilarity can be partial as arsenic trioxide's partner can be similar to other NCI-ALMANAC drugs. On the other hand, combinations containing some other drugs are better represented in NCI-ALMANAC and hence tend to be predicted with higher accuracy. This is the case of various alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors (Supplementary Figure 4).

Recent QSAR and drug combination modeling studies have evaluated the application of the latest machine learning algorithms (e.g., XGBoost, Deep Neural Network). These studies have found that these algorithms provide better performance on average across targets than RF. However, these gains are small and hence do not always justify the much greater resources required for hyperparameter tuning (Sheridan et al., ; Preuer et al., ). Performance gains have also been found small here with NCI-ALMANAC data, as the average test set Rp of XGBoost across the 60 cell lines is just +0.007 larger than with RF. An important result is that restricting to the most reliable RF predictions provides much greater predictive accuracy than that introduced by a more suitable learning algorithm (e.g., XGBoost). It is surprising that this powerful technique is so uncommonly used, as has already been pointed out (Sheridan, ; Mathea et al., ). In fact, we are not aware of any other previous study applying reliability estimation to the prediction of drug synergy on cancer cell lines. Here reliability prediction permitted to reduce the RMSE by up to 50% depending on the cell line. This is particularly exciting for virtual screening problems, where only a small subset of the predictions can be tested in vitro. In this scenario, it is useful to identify those combinations that are not only predicted to be synergistic, but also reliable because this should provide higher hit rates. Lastly, highly synergistic combinations predicted with low reliability should also be tested, as the corresponding measurements would be those broadening the applicability domain of future models the most.

We have also found that using FG datasets leads to substantially more predictive models than FF datasets. This result is robust in that it is observed with various types of models (XGB, RF, EN). Moreover, it occurs in spite of the availability of slightly more training data. Further investigation revealed that there are many more measurements of growth inhibition and with greater variability in FF than in FG. This in turn introduces more noise into ComboScore determinations in FF, thus impairing its modeling. Inconsistencies between centers measuring the response of cancer cell lines to drugs have been observed before (Haibe-Kains et al., ). There has been intense controversy about the extent, sources and impact of these inconsistencies (Stransky et al., ; Geeleher et al., ; Safikhani et al., , ). In any case, it is clear that data permits the development of predictive models regardless of the screening center (Ammad-ud-din et al., ; Covell, ; Fang et al., ; Naulaerts et al., ), as it has also been the case here with NCI-ALMANAC. Owing to this controversy on datasets from multiple screening centers, a better understanding of their limitations and the identification of protocols to generate them with improved consistency has emerged (Haverty et al., ). These protocols will ultimately permit that merging datasets from different screening centers result in further predictive accuracy.

Conclusion

While NCI-ALMANAC measured the synergies of over 5,000 combinations per cell line, this still represents a minuscule part of all conceivable combinations. Even if we restricted ourselves to the set of 12,000 drug molecules estimated to have reached clinical development or undergone significant preclinical profiling (Janes et al., ), almost 72 million combinations per cell line would have to be tested in vitro to identify the most synergistic among them. Therefore, the developed in silico models are of great importance because these can drastically reduce the number of required in vitro tests by predicting which of the considered combinations are likely to be synergistic.

Statements

Author contributions

PB conceived the study and wrote the manuscript. PS implemented the software and carried out the numerical experiments with the assistance of SN. All authors commented and proposed improvements to the manuscript.

Funding

ANR Tremplin-ERC grant n° N° ANR-17-ERC2-0003-01 (PB).

Acknowledgments

The initially submitted version of this manuscript was previously released as a Pre-Print at https://www.biorxiv.org/content/10.1101/504076v1 (Sidorov et al., ).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2019.00509/full#supplementary-material

References

  • 1

    AliM.AittokallioT. (2018). Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys. Rev.11, 3139. 10.1007/s12551-018-0446-z

  • 2

    Ammad-ud-dinM.GeorgiiE.GönenM.LaitinenT.KallioniemiO.WennerbergK.et al. (2014). Integrative and personalized QSAR analysis in cancer by kernelized Bayesian matrix factorization. J. Chem. Inf. Model.54, 23472359. 10.1021/ci500152b

  • 3

    Bayat MokhtariR.HomayouniT. S.BaluchN.MorgatskayaE.KumarS.DasB.et al. (2017). Combination therapy in combating cancer. Oncotarget8, 3802238043. 10.18632/oncotarget.16723

  • 4

    BlissC. I. (1939). The toxicity of poisons applied jointly I. Ann. Appl. Biol.26, 585615. 10.1111/j.1744-7348.1939.tb06990.x

  • 5

    BreimanL. (2001). Random forests. Mach. Learn.45, 532. 10.1023/A:1010933404324

  • 6

    BudmanD. R.CalabroA.RosenL.LesserM. (2012). Identification of unique synergistic drug combinations associated with downexpression of survivin in a preclinical breast cancer model system. Anticancer Drugs23, 272279. 10.1097/cad.0b013e32834ebda4

  • 7

    BulusuK. C.GuhaR.MasonD. J.LewisR. P.MuratovE.Kalantar MotamediY.et al. (2016). Modelling of compound combination effects and applications to efficacy and toxicity: state-of-the-art, challenges and perspectives. Drug Discov. Today21, 225238. 10.1016/j.drudis.2015.09.003

  • 8

    ChenJ.ShenY.LiaoS.ChenL.ZhengK. (2007). DFT-based QSAR study and molecular design of AHMA derivatives as potent anticancer agents. Int. J. Quantum Chem.107, 14681478. 10.1002/qua.21285

  • 9

    ChenT.GuestrinC. (2016). XGBoost: a scalable tree boosting system, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD '16 (New York, NY: ACM Press), 785794. Available online at: https://dl.acm.org/citation.cfm?id=2939672&picked=prox

  • 10

    CherkasovA.MuratovE. N.FourchesD.VarnekA.BaskinI. I.CroninM.et al. (2014). QSAR modeling: where have you been? where are you going to?J. Med. Chem.57, 49775010. 10.1021/jm4004285

  • 11

    ChouT.-C. (2010). Drug combination studies and their synergy quantification using the chou-talalay method. Cancer Res.70, 440446. 10.1158/0008-5472.CAN-09-1947

  • 12

    ChouT.-C.TalalayP. (1984). Quantitative analysis of dose-effect relationships: the combined effects of multiple drugs or enzyme inhibitors. Adv. Enzyme Regul.22, 2755. 10.1016/0065-2571(84)90007-4

  • 13

    CovellD. G. (2015). Data mining approaches for genomic biomarker development: applications using drug screening data from the cancer genome project and the cancer cell line encyclopedia. PLoS ONE10:e0127433. 10.1371/journal.pone.0127433

  • 14

    CrystalA. S.ShawA. T.SequistL. V.FribouletL.NiederstM. J.LockermanE. L.et al. (2014). Patient-derived models of acquired resistance can identify effective drug combinations for cancer. Science346, 14801486. 10.1126/science.1254721

  • 15

    de LucaA.HorvathD.MarcouG.Solov'evV.VarnekA. (2012). Mining chemical reactions using neighborhood behavior and condensed graphs of reactions approaches. J. Chem.Inf. Model.52, 23252338. 10.1021/ci300149n

  • 16

    Di VeroliG. Y.FornariC.WangD.MollardS.BramhallJ. L.RichardsF. M.et al. (2016). Combenefit: an interactive platform for the analysis and visualization of drug combinations. Bioinformatics32, 28662868. 10.1093/bioinformatics/btw230

  • 17

    DoucetJ.-P.BarbaultF.XiaH.PanayeA.FanB. (2007). Nonlinear SVM approaches to QSPR/QSAR studies and drug design. Curr. Comput. Aided-Drug Des.3, 263289. 10.2174/157340907782799372

  • 18

    FangY.QinY.ZhangN.WangJ.WangH.ZhengX. (2015). DISIS: prediction of drug response through an iterative sure independence screening. PLoS ONE10:e0120408. 10.1371/journal.pone.0120408

  • 19

    Fernández-DelgadoM.CernadasE.BarroS.AmorimD. (2014). Do we need hundreds of classifiers to solve real world classification problems?J. Mach. Learn. Res.15, 31333181. Available online at: http://jmlr.org/papers/v15/delgado14a.html

  • 20

    FoucquierJ.GuedjM. (2015). Analysis of drug combinations: current methodological landscape. Pharmacol. Res. Perspect.3:e00149. 10.1002/prp2.149

  • 21

    GeeleherP.GamazonE. R.SeoigheC.CoxN. J.HuangR. S. (2016). Consistency in large pharmacogenomic studies. Nature540, E1E2. 10.1038/nature19838

  • 22

    González-DíazH.BonetI.TeránC.De ClercqE.BelloR.GarcíaM. M.et al. (2007). ANN-QSAR model for selection of anticancer leads from structurally heterogeneous series of compounds. Eur. J. Med. Chem.42, 580585. 10.1016/J.EJMECH.2006.11.016

  • 23

    GrecoW. R.BravoG.ParsonsJ. C. (1995). The search for synergy: a critical review from a response surface perspective. Pharmacol. Rev.47, 331385.

  • 24

    Haibe-KainsB.El-HachemN.BirkbakN. J.JinA. C.BeckA. H.AertsH. J.et al. (2013). Inconsistency in large pharmacogenomic studies. Nature504, 389393. 10.1038/nature12831

  • 25

    HavertyP. M.LinE.TanJ.YuY.LamB.LianoglouS.et al. (2016). Reproducible pharmacogenomic profiling of cancer cell line panels. Nature533, 333337. 10.1038/nature

  • 26

    HolbeckS. L.CamalierR.CrowellJ. A.GovindharajuluJ. P.HollingsheadM.AndersonL. W.et al. (2017). The National Cancer Institute ALMANAC: a comprehensive screening resource for the detection of anticancer drug pairs with enhanced therapeutic activity. Cancer Res.77, 35643576. 10.1158/0008-5472.can-17-0489

  • 27

    HolohanC.Van SchaeybroeckS.LongleyD. B.JohnstonP. G. (2013). Cancer drug resistance: an evolving paradigm. Nat. Rev. Cancer13, 714726. 10.1038/nrc3599

  • 28

    JanesJ.YoungM. E.ChenE.RogersN. H.Burgstaller-MuehlbacherS.HughesL. D.et al. (2018). The ReFRAME library as a comprehensive drug repurposing library and its application to the treatment of cryptosporidiosis. Proc. Natl. Acad. Sci. U.S.A.155, 1075010755. 10.1073/pnas.1810137115

  • 29

    KashifM.AnderssonC.HassanS.KarlssonH.SenkowskiW.FryknäsM.et al. (2015). In vitro discovery of promising anti-cancer drug combinations using iterative maximisation of a therapeutic index. Sci. Rep.5:14118. 10.1038/srep14118

  • 30

    KischkelF. C.EichJ.MeyerC. I.WeidemüllerP.KrapflJ.Yassin-KelepirR.et al. (2017). New in vitro system to predict chemotherapeutic efficacy of drug combinations in fresh tumor samples. PeerJ5:e3030. 10.7717/peerj.3030

  • 31

    KumarR.ChaudharyK.SinglaD.GautamA.RaghavaG. P. (2014). Designing of promiscuous inhibitors against pancreatic cancer cell lines. Sci. Rep.4:4668. 10.1038/srep04668

  • 32

    KurtzS. E.TraerE.MartinezJ.ParkA.WagnerJ.PandyaR.et al. (2015). Identification of effective targeted drug combinations using functional ex vivo screening of primary patient specimens. Blood 126:865. Available online at: http://www.bloodjournal.org/content/126/23/865

  • 33

    Kuz'minV. E.ArtemenkoA. G.MuratovE. N. (2008). Hierarchical QSAR technology based on the Simplex representation of molecular structure. J. Comput. Aided. Mol. Des.22, 403421. 10.1007/s10822-008-9179-6

  • 34

    LamdrumG. (2015). RDKit: Open-source Cheminformatics. Available online at: http://www.rdkit.org/ (accessed April 3, 2015).

  • 35

    LeachA. R.GilletV. J. (2007). An Introduction To Chemoinformatics. Dordrecht: Springer Netherlands.

  • 36

    LiH.LiT.QuangD.GuanY. (2018). Network Propagation Predicts Drug Synergy in Cancers. Cancer Res.78, 54465457. 10.1158/0008-5472.CAN-18-0740

  • 37

    LieuC. H.TanA.-C.LeongS.DiamondJ. R.EckhardtS. G. (2013). From bench to bedside: lessons learned in translating preclinical studies in cancer drug development. JNCI J. Natl. Cancer Inst.105, 14411456. 10.1093/jnci/djt209

  • 38

    MatheaM.KlingspohnW.BaumannK. (2016). Chemoinformatic classification methods and their applicability domain. Mol. Inform.35, 160180. 10.1002/minf.201501019

  • 39

    MendenM. P.IorioF.GarnettM.McDermottU.BenesC. H.BallesterP. J.et al. (2013). Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE8:e61318. 10.1371/journal.pone.0061318

  • 40

    MendenM. P.WangD.MasonM. J.SzalaiB.BulusuK. C.GuanY.et al. (2019). Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nat. Commun.10:2674. 10.1038/s41467-019-09799-2

  • 41

    MullenL. M. A.DuchowiczP. R.CastroE. A. (2011). QSAR treatment on a new class of triphenylmethyl-containing compounds as potent anticancer agents. Chemom. Intell. Lab. Syst.107, 269275. 10.1016/J.CHEMOLAB.2011.04.011

  • 42

    MuratovE. N.VarlamovaE. V.ArtemenkoA. G.PolishchukP. G.Kuz'minV. E. (2012). Existing and developing approaches for QSAR analysis of mixtures. Mol. Inform.31, 202221. 10.1002/minf.201100129

  • 43

    NaulaertsS.DangC. C.BallesterP. J. (2017). Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours. Oncotarget8:20923. 10.18632/oncotarget.20923

  • 44

    O'NeilJ.BenitaY.FeldmanI.ChenardM.RobertsB.LiuY.et al. (2016). An unbiased oncology compound screen to identify novel combination strategies. Mol. Cancer Ther.15, 11551162. 10.1158/1535-7163.MCT-15-0843

  • 45

    PickA.MüllerH.MayerR.HaenischB.PajevaI. K.WeigtM.et al. (2011). Structure–activity relationships of flavonoids as inhibitors of breast cancer resistance protein (BCRP). Bioorg. Med. Chem.19, 20902102. 10.1016/J.BMC.2010.12.043

  • 46

    PolishchukP. G.MadzhidovT. I.VarnekA. (2013). Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput. Aided. Mol. Des.27, 675679. 10.1007/s10822-013-9672-4

  • 47

    PreuerK.LewisR. P. I.HochreiterS.BenderA.BulusuK. C.KlambauerG. (2018). DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics34, 15381546. 10.1093/bioinformatics/btx806

  • 48

    RogersD.HahnM. (2010). Extended-connectivity fingerprints. J. Chem. Inf. Model.50, 742754. 10.1021/ci100050t

  • 49

    RuggiuF.MarcouG.Solov 'evV.HorvathD.VarnekA. (2015). ISIDA Fragmentor2015—User Manual.

  • 50

    SabetR.MohammadpourM.SadeghiA.FassihiA. (2010). QSAR study of isatin analogues as in vitro anti-cancer agents. Eur. J. Med. Chem.45, 11131118. 10.1016/J.EJMECH.2009.12.010

  • 51

    SafikhaniZ.El-HachemN.SmirnovP.FreemanM.GoldenbergA.BirkbakN. J.et al. (2016). Safikhani et al. reply. Nature540, E2E4. 10.1038/nature19839

  • 52

    SafikhaniZ.SmirnovP.FreemanM.El-HachemN.SheA.ReneQ.et al. (2017). Revisiting inconsistency in large pharmacogenomic studies. F1000Res.5:2333. 10.12688/f1000research.9611.3

  • 53

    SheridanR. P. (2013). Using random forest to model the domain applicability of another random forest model. J. Chem. Inf. Model.53, 28372850. 10.1021/ci400482e

  • 54

    SheridanR. P.WangW. M.LiawA.MaJ.GiffordE. M. (2016). Extreme gradient boosting as a method for quantitative structure–activity relationships. J. Chem. Inf. Model.56, 23532360. 10.1021/acs.jcim.6b00591

  • 55

    ShoemakerR. H. (2006). The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer6, 813823. 10.1038/nrc1951

  • 56

    SidorovP.NaulaertsS.Ariey-BonnetJ.PasquierE.BallesterP. (2018). Predicting synergism of cancer drug combinations using NCI-ALMANAC data. bioRxiv10:504076. 10.1101/504076

  • 57

    SinghH.SinghS.SinglaD.AgarwalS. M.RaghavaG. P. (2015). QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest. Biol. Direct10:10. 10.1186/s13062-015-0046-9

  • 58

    Speck-PlancheA.KleandrovaV. V.LuanF.CordeiroM. N. (2011). Multi-target drug discovery in anti-cancer therapy: fragment-based approach toward the design of potent and versatile anti-prostate cancer agents. Bioorg. Med. Chem.19, 62396244. 10.1016/J.BMC.2011.09.015

  • 59

    Speck-PlancheA.KleandrovaV. V.LuanF.CordeiroM. N. (2012). Chemoinformatics in anti-cancer chemotherapy: multi-target QSAR model for the in silico discovery of anti-breast cancer agents. Eur. J. Pharm. Sci.47, 273279. 10.1016/J.EJPS.2012.04.012

  • 60

    StranskyN.GhandiM.KryukovG. V.GarrawayL. A.LehárJ.LiuM.et al. (2015). Pharmacogenomic agreement between two cancer cell line data sets. Nature528, 8487. 10.1038/nature15736

  • 61

    SugaharaK. N.TeesaluT.KarmaliP. P.KotamrajuV. R.AgemyL.GreenwaldD. R.et al. (2010). Coadministration of a tumor-penetrating peptide enhances the efficacy of cancer drugs. Science328, 10311035. 10.1126/science.1183057

  • 62

    SvetnikV.LiawA.TongC.CulbersonJ. C.SheridanR. P.FeustonB. P. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci.43, 19471958. 10.1021/ci034160g

  • 63

    TodeschiniR.ConsonniV. (2000). Handbook of Molecular Descriptors. Weinheim: Wiley-VCH Verlag GmbH.

  • 64

    TropshaA.GramaticaP.GombarV. (2003). The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci.22, 6977. 10.1002/qsar.200390007

  • 65

    VarnekA.FourchesD.HoonakkerF.Solov'evV. P. (2005). Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J. Comput. Aided. Mol. Des.19, 693703. 10.1007/s10822-005-9008-0

  • 66

    XiaF.ShuklaM.BrettinT.Garcia-CardonaC.CohnJ.AllenJ. E.et al. (2018). Predicting tumor cell line response to drug pairs with deep learning. BMC Bioinformatics19:3. 10.1186/s12859-018-2509-3

  • 67

    YuD.KahenE.CubittC. L.McGuireJ.KreahlingJ.LeeJ.et al. (2015). Identification of synergistic, clinically achievable, combination therapies for osteosarcoma. Sci. Rep.5:16991. 10.1038/srep16991

  • 68

    ZoliW.RicottiL.TeseiA.BarzantiF.AmadoriD. (2001). In vitro preclinical models for a rational design of chemotherapy combinations in human tumors. Crit. Rev. Oncol. Hematol.37, 6982. 10.1016/S1040-8428(00)00110-4

Summary

Keywords

chemoinformatics, drug synergy, machine learning, QSAR (qualitative structure-activity relationships), predictive (QSPR) models

Citation

Sidorov P, Naulaerts S, Ariey-Bonnet J, Pasquier E and Ballester PJ (2019) Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data. Front. Chem. 7:509. doi: 10.3389/fchem.2019.00509

Received

17 May 2019

Accepted

02 July 2019

Published

16 July 2019

Volume

7 - 2019

Edited by

Jose L. Medina-Franco, National Autonomous University of Mexico, Mexico

Reviewed by

Oscar Mendez Lucio, Bayer, France; Monica Campillos, Helmholtz Center Munich, Germany

Updates

Copyright

*Correspondence: Pedro J. Ballester

This article was submitted to Medicinal and Pharmaceutical Chemistry, a section of the journal Frontiers in Chemistry

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics