ORIGINAL RESEARCH article
Pathway Based Analysis of Mutation Data Is Efficient for Scoring Target Cancer Drugs
- 1Oncobox Ltd., Moscow, Russia
- 2Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
- 3The Laboratory of Clinical Bioinformatics, IM Sechenov First Moscow State Medical University, Moscow, Russia
- 4Omicsway Corp., Walnut, CA, United States
- 5Science-Educational Center Department, M. M. Shemyakin and Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- 6Laboratory of the Swiss Hepato-Pancreato-Biliary, Department of Surgery, Transplantation Center, University Hospital Zurich, Zurich, Switzerland
Despite the significant achievements in chemotherapy, cancer remains one of the leading causes of death. Target therapy revolutionized this field, but efficiencies of target drugs show dramatic variation among individual patients. Personalization of target therapies remains, therefore, a challenge in oncology. Here, we proposed molecular pathway-based algorithm for scoring of target drugs using high throughput mutation data to personalize their clinical efficacies. This algorithm was validated on 3,800 exome mutation profiles from The Cancer Genome Atlas (TCGA) project for 128 target drugs. The output values termed Mutational Drug Scores (MDS) showed positive correlation with the published drug efficiencies in clinical trials. We also used MDS approach to simulate all known protein coding genes as the putative drug targets. The model used was built on the basis of 18,273 mutation profiles from COSMIC database for eight cancer types. We found that the MDS algorithm-predicted hits frequently coincide with those already used as targets of the existing cancer drugs, but several novel candidates can be considered promising for further developments. Our results evidence that the MDS is applicable to ranking of anticancer drugs and can be applied for the identification of novel molecular targets.
Globally, cancer is one of the major causes of death (Centers for Disease Control and Prevention, 2017). For several decades, chemotherapy remains a key treatment for many cancers, often with impressive success rates. For example, its use in testicular cancer turned near complete mortality to >90% disease-specific survival (Hanna and Einhorn, 2014; Oldenburg et al., 2015). However, most of the advanced cancers remain incurable and/or unresponsive using standard chemotherapy approaches, frequently develop resistance to treatments and relapse (Vasey, 2003; Housman et al., 2014). More recently, a new generation of drugs has been developed that specifically target functional tumor marker molecules. These medicines termed Target drugs have one or a few specific molecular targets in a cell (Druker et al., 2001a,b; Sawyers, 2004; Spirin et al., 2017). They have greater selectivity and generally lower toxicity than the conventional chemotherapy (Joo et al., 2013). Structurally, they can be either low molecular mass inhibitor molecules or monoclonal antibodies (Padma, 2015). The repertoire of their molecular targets is permanently growing and now includes receptor and intracellular tyrosine kinases (Baselga, 2006), vascular endothelial growth factor (Rini, 2009), immune checkpoint molecules such as PD1, PDL1, and CTLA4 (Azoury et al., 2015), poly(ADP-ribose) polymerase (Anders et al., 2010), mTOR inhibitors (Xie et al., 2016), hormone receptors (Ko and Balk, 2004), proteasomal components (Kisselev et al., 2012), ganglioside GD2 (Suzuki and Cheung, 2015), and cancer-specific fusion proteins (Giles et al., 2005). For many cancers, the emergence of target drugs was highly beneficial. For example, trastuzumab (anti-HER2 monoclonal antibody) and other related medications at least doubled median survival time in patients with metastatic HER2-positive breast cancer (Hudis, 2007; Nahta and Esteva, 2007). In melanoma, immune checkpoint inhibitors, and anti-BRAF target drugs like Vemurafenib and Dabrafenib dramatically increased the patient's chances to respond to treatment and to increase survival (Chapman et al., 2011; Prieto et al., 2012). Target drugs were also of a great advantage for inoperable kidney cancer, before almost uncurable (Ghidini et al., 2017).
The efficiencies of target drugs vary from patient to patient (Ma and Lu, 2011) and the results of clinical trials clearly evidence that the drugs considered inefficient for an overall cohort of a given cancer type, may be beneficial for a small fraction of the patients (Zappa and Mousa, 2016). For example, the anti-EGFR drugs gefitinib and erlotinib showed little advantage in the randomized trials on patients with non-small cell lung cancer. However, ~10-15% of the patients responded to the treatment and had longer survival characteristics. It was further understood that these patients had activating mutations of EGFR gene and that these mutations, therefore, can predict response to the EGFR-targeting therapies (Gridelli et al., 2011). Interestingly, the same approach was ineffective in colorectal cancer, where EGFR-mutated status had no predictive power for the anti-EGFR drugs cetuximab and panitumumab. In the latter case, it is the wild-type status of KRAS gene (~60% of all the cases) that is indicative of tumor response to these drugs (Grothey and Lenz, 2012).
The price for inefficient treatment is high as it is converted from decreased patient's survival characteristics and overall clinical expenses. There are currently more than 200 different anticancer target drugs approved in different countries, and this number grows every year (Law et al., 2014). However, the predictive molecular diagnostic tests are available for only a minor fraction of drugs, in a minor fraction of cancer types (Hornberger et al., 2005; Le Tourneau et al., 2014; Buzdin et al., 2018). This makes the clinician's decision on drug prescription a difficult task somewhat similar to finding needle in a haystack. The problem of choosing the right medication for the right patient is currently well understood, so US FDA(Food and Drug Administration) strongly recommends any new target drug emerging on the market to be supplied with the companion diagnostics test1. It is, therefore, of a great importance to identify robust predictive biomarkers of target drug efficacy, for as many cancer-drug combinations as possible. Recently, a new generation of molecular markers has been proposed involving gene combinations and even entire molecular pathways (Gu et al., 2011; Li et al., 2014; Toren and Zoubeidi, 2014). Here, the biomarkers used are not just a single gene or single locus-based mutation, expression or epigenetic features, but rather the aggregated combinations of those, crosslinking the physiologically relevant gene products (Diamandis, 2014; Sanchez-Vega et al., 2018; Zaim et al., 2018). The pathway-based approach has been better developed for the high throughput gene expression data (Khatri et al., 2012; Buzdin A. A. et al., 2017; Buzdin et al., 2018) where the Pathway Activation Strength (PAS) may be used as an aggregated biomarker (Buzdin et al., 2014). The formulas for PAS calculation may be different; they normally consider relative concentrations of gene products, internal molecular architecture of pathways and gene coexpression patterns (Ozerov et al., 2016; Aliper et al., 2017; Buzdin et al., 2018). PAS was shown to be more efficient as a biomarker than the individual gene expression data (Borisov et al., 2014, 2017), and PAS biomarkers were further generated for a plethora of normal and pathological conditions, including cancer response to treatments (Kurz et al., 2017; Petrov et al., 2017; Spirin et al., 2017; Wirsching et al., 2017; Sorokin et al., 2018).
Furthermore, a method for ranking of more than a 100 of target anticancer drugs has been recently published based on the PAS scoring and the pathway enrichments by the molecular targets of drugs (Artemov et al., 2015). This approach termed Drug Scoring was experimentally shown promising for drugs prescription to advanced solid tumor patients (Buzdin A. et al., 2017; Buzdin et al., 2018; Poddubskaya et al., 2018). However, good quality expression profiles required for PAS-based Drug Scoring frequently cannot be obtained due to apparent lack of biopsy biomaterials and RNA degradation. To our knowledge, so far there were no published reports on the application of gene mutation data for Drug Scoring.
In this study, for the first time we proposed and tested 10 alternative pathway-based Drug Scoring algorithms utilizing mutations data. These algorithms were used for the data from 3,800 published cancer mutation profiles representing eight tumor localizations and validated using the published clinical trials data. We showed that several mutation-based Drug Scoring methods can be used efficiently for predicting the effectiveness of target drugs. This has been evidenced by statistically significant positive correlations between Drug Score ratings of individual drugs and their therapeutic success reflected by the completed phases of clinical trials for the respective cancer types. We also used the best Drug Scoring algorithm to simulate all known protein coding genes as the potential drug targets. We found that the algorithm-predicted most efficient targets are highly congruent with the molecular targets already used by the real anticancer drugs.
Materials and Methods
The human mutation dataset was obtained from the Catalog Of Somatic Mutations In Cancer (COSMIC) (Forbes et al., 2017). COSMIC aggregates and annotates mutation data from various sources by providing lists of verified somatic mutations. We downloaded the data from COSMIC website, version 76. The complete dataset includes 6,651,236 somatic mutation records for 20,528 genes in 19,434 tumor samples of 37 primary localizations.
The Algorithm Validation Dataset
For the validation of drug scoring algorithms, we extracted mutation data only for the primary localizations containing at least 100 samples indexed in COSMIC and originally taken from The Cancer Genome Atlas (TCGA) project (Tomczak et al., 2015; Forbes et al., 2017) because of the uniform sequencing and data processing pipeline used there. For the algorithm validation dataset, we totally took 3,800 tumor mutation profiles from eight primary localizations: central nervous system, kidney, large intestine (including cecum, colon, and rectum), liver, lung, ovary, stomach, thyroid gland (Table 1).
The COSMIC data were processed with script written in R (version 3.4.3) to obtain mutation profile for each tumor1. The processed data is available as Supplementary Data Sheet 1.
The Dataset for Prediction of Potential Molecular Targets
We used the full COSMIC dataset to increase the statistical significance and to investigate the effectiveness of potential target drugs for a maximum range of cancer localizations. However, we excluded the samples related to cell cultures or tumor xenograft to standardize the analysis. We excluded records having the following marks in the “Sample source” field: organoid culture, short-term culture, cell-line, xenograft. Thus, the final dataset included 6,027,881 mutations records in 18,273 in tumor samples of 35 primary localizations. The COSMIC data were processed with script written in R (version 3.4.3) to return mutation rates for all genes2. The processed data is available as Supplementary Data Sheet 2.
Clinical Trials Data
We extracted clinical trials data from the web sites of NIH (the National Institutes of Health)3 and US FDA4. They were processed by manually curation of web data as of July 2017. The processed clinical trials data used for the correlation studies are shown on Supplementary Table 1.
Molecular Pathways Data
The gene contents data about 3,125 human molecular pathways used to calculate mutation drug scores were extracted from Reactome (Croft et al., 2014), NCI Pathway Interaction Database (Schaefer et al., 2009), Kyoto Encyclopedia of Genes and Genomes (Kanehisa and Goto, 2000), HumanCyc (Romero et al., 2004), Biocarta (Nishimura, 2001), Qiagen5. For drug scores calculation, we used only the 1,752 pathways including at least 10 gene products because of previously reported poor theoretical data aggregation effect for smaller pathways (Borisov et al., 2017). The information about molecular specificities of 128 anticancer target drugs were obtained from databases DrugBank (Law et al., 2014) and ConnectivityMap (Lamb et al., 2006).
The results were visualized using package ggplot2 (Wickham, 2009).
In this study, we developed a molecular pathway-based method of target drug scoring using high throughput mutation data.
Algorithms of Mutation Drug Scoring
The principle of Mutation Drug Scoring (MDS) methods proposed here deals with quantization of mutation enrichment for the molecular pathways having molecular targets of a drug under investigation. Overall, they are based on the rationale that the greater is the mutation level of the respective pathways, the higher will be the expected drug efficiency. The mutation enrichment of a molecular pathway called pathway instability (PI) is calculated based on the relative mutation rates (MR) of its member genes. Under mutations, we meant here the changes in protein coding sequence understood as such in the Catalog of Somatic Mutations in Cancer (COSMIC) v.76 database (Forbes et al., 2016). COSMIC is the world's largest database of somatic mutations relating to human cancers. We used only Genome-wide Screen Data to estimate MR correctly. This part of COSMIC consists of peer reviewed large-scale genome screening data and data from the validated sources such as The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC).
Mutation rate (MR) is calculated according to the formula:
where MRn, g is MR of a gene n in a group of samples g; N mut(n,g) is the total number of mutations for gene n in a group of samples g; N samples (g) is the number of samples in a group g. The MR values strongly positively correlated with the lengths of gene coding DNA sequence (CDS; data not shown). In order to remove bias linked with the CDS length, we took for further consideration a normalized value termed Normalized Mutation Rate (nMR) expressed by the formula:
where nMRn is the nMR of a gene n; MRn is the MR of a gene n; Length CDS(n) is the length of CDS of gene n in nucleotides. Indeed, normalization of this metric enabled to terminate any CDS-linked bias (data not shown).
To determine if gene n is included in pathway p, we introduced a Boolean flag pathway-gene indicator PGn, p expressed by the formula:
The Pathway Instability (PI) score is then calculated as follows:
where PIp is pathway instability score for a pathway p; nMRn is the normalized mutation rate of a gene n, PGn, p is pathway-gene indicator for gene n and pathway p. Pathway instability score characterizes the mutation enrichment of a pathway (Pathway instability is an effective new mutation-based type of cancer biomarkers, 2018, in preparation). To formalize if gene n is molecular target of drug d, we introduced another Boolean flag drug target index, DTId, n:
To complete DTI database for this study, we used the data about molecular specificities of 128 target drugs extracted from the databases DrugBank (Law et al., 2014) and Connectivity Map (Lamb et al., 2006).
To link PI scores and estimated drug efficiencies, the following basic formula was proposed for the calculation of Mutation Drug Score (MDS):
where d is drug name; n is gene name; p is pathway name; MDSd is MDS for drug d; DTId, n is drug target index for drug d and gene n; PIp is Pathway Instability of pathway p; PGn, p is pathway-gene indicator for gene n and pathway p.
The above basic formula (1) was modified to generate several alternative methods of drug scoring.
- Pathway size-normalized. Since molecular pathways include considerably different number of genes varying from dozens to hundreds, we proposed a modification of the calculation method (1) where normalization is performed for MDS on the respective number of genes for each PI member:
- where kp is number of genes in pathway p.
- Single count-normalized. Impact of each gene participating in pathways targeted by drug d is counted only once:
- where GIId, n – Boolean flag gene involvement index,
- Number of pathways-normalized. MDS for drug d is normalized on the number of its targeted molecular pathways.
- where md – number of pathways targeted by drug d.
- Number of pathways-normalized. MDS_N is additionally normalized on the number of pathways targeted by drug d (md).
- Number of target genes-normalized. MDS_bd is additionally normalized on the number of target genes for drug d, (bd).
- Number of target genes-normalized MDS_N. MDS_N, normalized on the number of target genes for drug d, (bd).
- Number of target genes-normalized MDS_gene. MDS_gene, normalized on the number of target genes for drug d, (bd).
- Target genes dependent only. MDS2 is calculated considering only mutation frequencies of target genes.
- Single count-normalized, target genes dependent only. MDS2_gene is calculated, considering each target gene for drug d only one time.
For these algorithms of mutation-based drug scoring, we next compared their congruences with the published clinical trials data.
Validation of Mutation Drug Scoring (MDS) Algorithms on Clinical Trials Data
We calculated different versions of MDS according to formulae (1–10) for 128 anticancer target drugs, for eight cancer types (Supplementary Data Sheet 3). We examined somatic mutation profiles for 3,800 samples of the following primary tumor localizations: large intestine (including cecum, colon and rectum), lung, kidney, stomach, ovarian, central nervous system, liver, thyroid (Table 1).
Mutation profiles were extracted from COSMIC v76 database (Forbes et al., 2016). To validate the MDS algorithms, we selected only tumor samples related to TCGA project because it was the largest source of biosamples profiled using a single deep sequencing and bionformatic pipeline (Tomczak et al., 2015). Molecular specificities of drugs were obtained from DrugBank (Law et al., 2014) and Connectivity Map (Lamb et al., 2006) databases. The information about clinical approval and the completion of phases of clinical trials for 128 target drugs for the above eight tumor localizations was taken from the web sites of NIH and US FDA. To measure completion of clinical investigations for a drug, we introduced the metric termed Clinical Status. These values are congruent with the apparent efficiencies of drugs for the given cancer types. The same drugs most frequently had different clinical statuses for the different cancer types.
The Clinical Status varied in a range from 0 to 1 proportional to the top phase of clinical trials passed by a drug for a given cancer type. The Clinical Status grows incrementally depending on the completion of the clinical trials phases 1–4, while the later phases have a greater specific weight, because they allow to more accurately determine clinical efficacy of a drug (Table 2).
The complete Clinical Status information for 128 drugs under investigation is shown on Supplementary Table 1. The major limitation of this approach is that only the drugs that had been already clinically investigated for the respective tumor type can be ranked in such a way.
To investigate the capacities of different versions of Mutation Drug Scores to successfully predicts clinical efficiencies of drugs, we analyzed how ranks of MDS values correlated with Clinical Status of drugs. We calculated correlations and compared distributions of the Spearman correlation coefficients. To calculate correlations, we took all cancer mutation profiles together without separation on cancer types (Figure 1).
Figure 1. Correlation between Clinical Status and MDS rank for 10 types of drug scoring in eight cancer types at once. (A) Distributions of Spearman correlation coefficients between Clinical Status and MDS rank for 128 target drugs in 3,800 tumor samples. MDS rank of a drug was calculated as the individual drug's position in the rating (from top to low) of all drugs under investigation. Ten violin plots distributed along X-axis, each represent a particular type of drug scoring. The Y-axis reflects density distributions of correlations between Clinical Status and MDS ranks. Boxes indicate the second and third quartiles of distribution, black dots indicate outliers. (B) The plot demonstrates the distributions of p-value for the correlation coefficients between Clinical Status and MDS rank for 128 target drugs in the same tumor samples. The horizontal green line corresponds to p = 0.05.
Overall, the markedly better correlations were seen for the MDS and MDS_N types of drug scoring (Figure 1). We next analyzed the cancer type-specific distributions (Figure 2). It was seen that both MDS and MDS_N scores positively correlated with the drugs clinical efficiencies in all the localizations investigated, thus confirming their top status among the drug scoring algorithms. Among those, MDS showed best overall functional characteristics and was, therefore, used in further analyses.
Figure 2. Correlation between Clinical Status and MDS rank for two best types of drug scoring in eight cancer types separately. (A) Distributions of Spearman correlation coefficients between Clinical Status and MDS rank for 128 target drugs in eight tumor types. MDS rank of a drug was calculated as the drug's position in the rating (from top to low) of all drugs under study. The drug scoring methods are shown in horizontal lines, and the cancer types are placed vertically. The violin plots distributed along X-axis, each represent a particular cancer type. The Y-axis reflects density distributions of correlations between Clinical Status and MDS ranks. Boxes indicate the second and third quartiles of distribution, black dots indicate outliers. (B) The plot shows the distributions of p-value for the correlation coefficients between Clinical Status and MDS rank for 128 target drugs in the same tumor types. The horizontal green line corresponds to p = 0.05.
Application of MDS for Identification of Possible Target Genes
We next tested the MDS algorithm for its capacity to identify potentially valuable drug targets. To this end, we modeled a situation when each gene specifically corresponds to one target drug. Those simulated, or virtual drugs, also were specific each to only one gene product. Using the database of 1,752 molecular pathways, we were able to calculate MDS for 8,736 virtual drugs specific to the same number of genes included in these pathways. For this analysis, we used 18,273 full-exome tumor mutation profiles from the COSMIC v76 database. Top 30 molecular targets with highest MDS values and already clinically approved cancer drugs specific for these molecular targets are listed on Table 3. The complete MDS calculation data are given in Supplementary Table 2.
Table 3. Top 30 molecular targets sorted by MDS and clinically approved drugs using these molecular targets.
We next ranked all the virtual drugs according to their MDS values and compared if the same molecular targets are already exploited by the existing 128 target cancer drugs (Figure 3).
Figure 3. Dependence of MDS and occurrence of molecular targets in approved cancer drugs. (A) Deciles of potential molecular targets sorted in ascending order according to MDS value. TC was calculated for each decile, shown on vertical axes. (B) Distribution of MDS values among the potential molecular drug targets. The color scale on the graph indicates densities of clinically approved cancer drugs exploiting the respective molecular targets.
To do this, we introduced an auxiliary value termed Target Conversion (TC). It reflects the percentage share of known molecular targets among predicted molecular targets.
For the overall (complete) list of potential molecular targets, TC was 2.17%. However, there was an clear-cut incremental TC growth trend when the potential molecular targets were sorted in the ascending order of MDS value (Figure 3A, shown for deciles of the potential targets). The greater TC value exceeding 10% was observed for the decile of molecular targets having the highest MDS values.
Molecular targets with the highest MDS are clearly enriched by the existing clinically approved drugs compared to those with low MDS scores (Figure 3A). On the other hand, target genes with higher MDS are covered by a bigger number of approved drugs per target, as many drugs have common molecular specificities (Figure 3B).
The present algorithm for scoring potential drug targets considers a cumulative mutation enrichment of molecular pathways. For the example shown on Figure 4 (Nectin adhesion pathway), most genes involved in a pathway are mutated in cancers, see the color scale. The mutation enrichment of a pathway may characterize its overall involvement in malignization. According to the present conception of drug scoring, the maximum efficiency of drug can be obtained by acting on the most strongly affected molecular pathways.
Figure 4. Mutation enrichment of Nectin adhesion pathway. The pathway is targeted by Idelalisib. The pathway structure is taken from the NCI database (Schaefer et al., 2009). The mutation burden was visualized using Oncobox pathway plot tool. The color scale reflects mutation levels of the corresponding nodes on the pathway graph. The green arrows indicate activation, red arrow—inhibition, bold black arrow indicates molecular target of Idelalisib.
In this study, we report a new bioinformatic instrument of ranking target anticancer drugs using high throughput gene mutation data. We proposed here 10 different versions of molecular pathway-based mutation drug scoring. At least two types of this scoring could provide output data positively correlated with the clinical trials data for 128 drugs in all eight tumor localizations tested. We hope that the pathway-based mutation drug scoring approach has a potential of helping clinical oncologists to implement personalized selection of target drugs based on the individual, the patient's tumor-specific high throughput mutation profile.
We showed that the same approach can be applied to identify potentially efficient molecular targets in experimental oncology. The educated choice of new drug targets is one of the main tasks in pharmacology (Schenone et al., 2013). Experimental search for new efficient drug targets is still time consuming, laborious, and expensive (Haggarty et al., 2003), so since recently a credit is frequently given to computational predictive algorithms (Rifaioglu et al., 2018).
The history of computational prediction of drug targets began with prediction of druggability based on the structure of targets and biomedical text mining (Cheng et al., 2007; Zhu et al., 2007). Several methods have been also proposed based on known links between drugs and genes (Luo et al., 2017). Further development of bioinformatic methods allowed to apply for this task a set of systems approaches based on networks of molecular interactions (Mani et al., 2008).
Our results provide principal evidence that the mutation drug scoring is applicable to ranking of anticancer drugs. On the other hand, our data suggest that these drug scoring algorithms can be applied for the identification of novel molecular targets for the prospective anticancer drugs. Although many genes with high MDS already serve as molecular targets of the approved cancer drugs, there is a number of top MDS genes that are not yet covered by the existing medications. This latter fraction of genes, therefore, can be considered a source of potential targets for new drug developments. For example, the following top 100 MDS genes can be mentioned that are not yet covered by approved or experimental cancer or antineoplastic drugs [according to DrugBank (Law et al., 2014), DGIdb (Cotto et al., 2018), FDA6, HMDB (Wishart et al., 2018), Tocris7, GeneCards (Safran et al., 2010) databases]: GRB2, SOS1,SOS2, SHC1, GNB1, CREB1, GNG2, GNAQ, GNB5, GNAI2. Three of them (GRB2, GNG2, CREB1) are the targets of approved non-oncological drugs (Pegademase bovine, Naloxone, Adenosine monophosphate, Citalopram, Halothane), thus illustrating MDS method potential in drug repurposing.
This study can be regarded as proof-of concept trial of MDS approach exemplified by bigger proportion of real cancer targets among the genes with higher MDS values. In this application, we assessed integral MDS for all cancer types. However, in further applications the same approach can be used for any specific tumor type or subtype to identify targets that may seem most promising for this particular disease. This could be valuable, for example, for drugs repurposing among the different tumor types and for more effectively identifying the patient cohorts in clinical trials
The present mutation drug scoring approach scores the molecular pathway instability caused by accumulation of mutations and ranks drugs according to a simple rationale—the higher is mutation burden of a pathway, the greater may be the efficiency of a drug targeting this pathway. We hope these findings will be interesting to those working in the fields of oncology, drug discovery, systems biomedicine, high throughput mutation data analysis, personalized medicine and molecular diagnostics.
Data Availability Statement
The datasets analyzed for this study can be found in the COSMIC repository (COSMICv76; CosmicGenomeScreensMutantExport. tsv.gz, https://cancer.sanger.ac.uk/cosmic/download).
MZ developed algorithms, did mutation drug scoring analyses, planned the research and wrote the manuscript. PB planned the research, extracted and filtered cancer mutation data. AG planned the research and developed algorithms. MS completed the molecular pathway database. DK completed and processed Clinical Status database. AE organized the information about molecular specificities of anticancer target drugs. NB developed algorithms, did statistical analyses, and planned the research. AB completed Clinical Status database, developed algorithms, planned the research, and wrote the manuscript.
This study was supported by the Oncobox research program in digital oncology, by the Russian Science Foundation grant no. 18-15-00061, by Amazon and Microsoft Azure grants for cloud-based computational facilities for this project.
Conflict of Interest Statement
MS, NB, AG were employed by company Omicsway Corp. MZ, AB were employed by company Oncobox Ltd.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2019.00001/full#supplementary-material
Supplementary Data Sheet 1. Mutation profiles for 3800 TCGA tumor samples obtained from COSMIC v76.
Supplementary Data Sheet 2. Normalized mutation rate for genes that mutated in 18273 tumor samples from COSMIC v76.
Supplementary Data Sheet 3. Mutation drug scores for 3800 TCGA tumor samples for 10 versions of drug scoring algorithm. The dataset contains 10 tables that match the 10 methods of drug scoring.
Supplementary Table 1. The complete clinical status information for 128 target drugs in eight cancer localizations.
Supplementary Table 2. Mutation Drug Scores for 8736 virtual drugs specific to the same number of genes.
2. ^Cosmic v76 processing Available at: https://gitlab.com/White_Knight/cosmic76_processing/tree/master [Accessed October 22, 2018].
5. ^QIAGEN - Sample to Insight Available at: https://www.qiagen.com/us/shop/genes-and-pathways/pathway-central/ [Accessed September 19, 2018].
CDS length, Coding DNA Sequence Length; COSMIC, Catalog Of Somatic Mutations In Cancer; FDA, Food and Drug Administration; ICGC, International Cancer Genome Consortium; MDS, Mutational Drug Scores; MR, Mutation rate; nMR, Normalized mutation rate; NIH, The National Institutes of Health; PAS, Pathway Activation Strength; PI, Pathway instability; TCGA, The Cancer Genome Atlas; TC, Target Conversion.
Aliper, A. M., Korzinkin, M. B., Kuzmina, N. B., Zenin, A. A., Venkova, L. S., Smirnov, P. Y., et al. (2017). Mathematical justification of expression-based pathway activation scoring (PAS). Methods Mol. Biol. 1613, 31–51. doi: 10.1007/978-1-4939-7027-8_3
Anders, C. K., Winer, E. P., Ford, J. M., Dent, R., Silver, D. P., Sledge, G. W., et al. (2010). Poly(ADP-Ribose) polymerase inhibition: andquot;targetedandquot; therapy for triple-negative breast cancer. Clin. Cancer Res. 16, 4702–4710. doi: 10.1158/1078-0432.CCR-10-0939
Artemov, A., Aliper, A., Korzinkin, M., Lezhnina, K., Jellen, L., Zhukov, N., et al. (2015). A method for predicting target drug efficiency in cancer based on the analysis of signaling pathway activation. Oncotarget 6, 29347–29356. doi: 10.18632/oncotarget.5119
Azoury, S. C., Straughan, D. M., and Shukla, V. (2015). Immune checkpoint inhibitors for cancer therapy: clinical efficacy and safety. Curr. Cancer Drug Targets 15, 452–462. doi: 10.2174/156800961506150805145120
Borisov, N., Suntsova, M., Sorokin, M., Garazha, A., Kovalchuk, O., Aliper, A., et al. (2017). Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data. Cell Cycle 16, 1810–1823. doi: 10.1080/15384101.2017.1361068
Borisov, N. M., Terekhanova, N. V., Aliper, A. M., Venkova, L. S., Smirnov, P. Y., Roumiantsev, S., et al. (2014). Signaling pathways activation profiles make better markers of cancer than expression of individual genes. Oncotarget 5, 10198–10205. doi: 10.18632/oncotarget.2548
Buzdin, A., Sorokin, M., Garazha, A., Sekacheva, M., Kim, E., Zhukov, N., et al. (2018). Molecular pathway activation - new type of biomarkers for tumor morphology and personalized selection of target drugs. Semin. Cancer Biol. 53, 110–124. doi: 10.1016/j.semcancer.2018.06.003
Buzdin, A., Sorokin, M., Glusker, A., Garazha, A., Poddubskaya, E., Shirokorad, V., et al. (2017). Activation of intracellular signaling pathways as a new type of biomarkers for selection of target anticancer drugs. J. Clin. Oncol. 35:e23142. doi: 10.1200/JCO.2017.35.15_suppl.e23142
Buzdin, A. A., Prassolov, V., Zhavoronkov, A. A., and Borisov, N. M. (2017). Bioinformatics meets biomedicine: oncofinder, a quantitative approach for interrogating molecular pathways using gene expression data. Methods Mol. Biol. 1613, 53–83. doi: 10.1007/978-1-4939-7027-8_4
Buzdin, A. A., Zhavoronkov, A. A., Korzinkin, M. B., Venkova, L. S., Zenin, A. A., Smirnov, P. Y., et al. (2014). Oncofinder, a new method for the analysis of intracellular signaling pathway activation using transcriptomic data. Front. Genet. 5:55. doi: 10.3389/fgene.2014.00055
Chapman, P. B., Hauschild, A., Robert, C., Haanen, J. B., Ascierto, P., Larkin, J., et al. (2011). Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med. 364, 2507–2516. doi: 10.1056/NEJMoa1103782
Cheng, A. C., Coleman, R. G., Smyth, K. T., Cao, Q., Soulard, P., Caffrey, D. R., et al. (2007). Structure-based maximal affinity model predicts small-molecule druggability. Nat. Biotechnol. 25, 71–75. doi: 10.1038/nbt1273
Cotto, K. C., Wagner, A. H., Feng, Y. Y., Kiwala, S., Coffman, A. C., Spies, G., et al. (2018). DGIdb 3.0: a redesign and expansion of the drug–gene interaction database. Nucleic Acids Res. 46, D1068–D1073. doi: 10.1093/nar/gkx1143
Druker, B. J., Sawyers, C. L., Kantarjian, H., Resta, D. J., Reese, S. F., Ford, J. M., et al. (2001a). Activity of a specific inhibitor of the BCR-ABL tyrosine kinase in the blast crisis of chronic myeloid leukemia and acute lymphoblastic leukemia with the Philadelphia chromosome. N. Engl. J. Med. 344, 1038–1042. doi: 10.1056/NEJM200104053441402
Druker, B. J., Talpaz, M., Resta, D. J., Peng, B., Buchdunger, E., Ford, J. M., et al. (2001b). Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N. Engl. J. Med. 344, 1031–1037. doi: 10.1056/NEJM200104053441401
Forbes, S. A., Beare, D., Bindal, N., Bamford, S., Ward, S., Cole, C. G., et al. (2016). COSMIC: high-resolution cancer genetics using the catalogue of somatic mutations in cancer. Curr. Protoc. Hum. Genet. 91, 10.11.1–10.11.37. doi: 10.1002/cphg.21
Forbes, S. A., Beare, D., Boutselakis, H., Bamford, S., Bindal, N., Tate, J., et al. (2017). COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783. doi: 10.1093/nar/gkw1121
Ghidini, M., Petrelli, F., Ghidini, A., Tomasello, G., Hahne, J. C., Passalacqua, R., et al. (2017). Clinical development of mTor inhibitors for renal cancer. Expert Opin. Investig. Drugs 26, 1229–1237. doi: 10.1080/13543784.2017.1384813
Giles, F. J., Cortes, J. E., and Kantarjian, H. M. (2005). Targeting the kinase activity of the BCR-ABL fusion protein in patients with chronic myeloid leukemia. Curr. Mol. Med. 5, 615–623. doi: 10.2174/156652405774641115
Gridelli, C., De Marinis, F., Di Maio, M., Cortinovis, D., Cappuzzo, F., and Mok, T. (2011). Gefitinib as first-line treatment for patients with advanced non-small-cell lung cancer with activating epidermal growth factor receptor mutation: review of the evidence. Lung Cancer 71, 249–257. doi: 10.1016/j.lungcan.2010.12.008
Gu, Y., Zhao, W., Xia, J., Zhang, Y., Wu, R., Wang, C., et al. (2011). Analysis of pathway mutation profiles highlights collaboration between cancer-associated superpathways. Hum. Mutat. 32, 1028–1035. doi: 10.1002/humu.21541
Haggarty, S. J., Koeller, K. M., Wong, J. C., Butcher, R. A., and Schreiber, S. L. (2003). Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. Chem. Biol. 10, 383–396. doi: 10.1016/S1074-5521(03)00095-4
Hornberger, J., Cosler, L. E., and Lyman, G. H. (2005). Economic analysis of targeting chemotherapy using a 21-gene RT-PCR assay in lymph-node-negative, estrogen-receptor-positive, early-stage breast cancer. Am. J. Manag. Care 11, 313–324.
Kurz, S., Thieme, R., Amberg, R., Groth, M., Jahnke, H. G., Pieroh, P., et al. (2017). The anti-tumorigenic activity of A2M-A lesson from the naked mole-rat. PLoS ONE 12:e0189514. doi: 10.1371/journal.pone.0189514
Lamb, J., Crawford, E. D., Peck, D., Modell, J. W., Blat, I. C., Wrobel, M. J., et al. (2006). The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935. doi: 10.1126/science.1132939
Le Tourneau, C., Paoletti, X., Servant, N., Bièche, I., Gentien, D., Rio Frio, T., et al. (2014). Randomised proof-of-concept phase II trial comparing targeted therapy based on tumour molecular profiling vs conventional therapy in patients with refractory cancer: results of the feasibility part of the SHIVA trial. Br. J. Cancer 111, 17–24. doi: 10.1038/bjc.2014.211
Luo, Y., Zhao, X., Zhou, J., Yang, J., Zhang, Y., Kuang, W., et al. (2017). A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat. Commun. 8:573. doi: 10.1038/s41467-017-00680-8
Mani, K. M., Lefebvre, C., Wang, K., Lim, W. K., Basso, K., Dalla-Favera, R., et al. (2008). A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol. Syst. Biol. 4:169. doi: 10.1038/msb.2008.2
Oldenburg, J., Aparicio, J., Beyer, J., Cohn-Cedermark, G., Cullen, M., Gilligan, T., et al. (2015). Personalizing, not patronizing: the case for patient autonomy by unbiased presentation of management options in stage I testicular cancer. Ann. Oncol. 26, 833–838. doi: 10.1093/annonc/mdu514
Ozerov, I. V., Lezhnina, K. V., Izumchenko, E., Artemov, A. V., Medintsev, S., Vanhaelen, Q., et al. (2016). In silico pathway activation network decomposition analysis (iPANDA) as a method for biomarker development. Nat. Commun. 7:13427. doi: 10.1038/ncomms13427
Petrov, I., Suntsova, M., Ilnitskaya, E., Roumiantsev, S., Sorokin, M., Garazha, A., et al. (2017). Gene expression and molecular pathway activation signatures of MYCN-amplified neuroblastomas. Oncotarget 8, 83768–83780. doi: 10.18632/oncotarget.19662
Poddubskaya, E. V., Baranova, M. P., Allina, D. O., Smirnov, P. Y., Albert, E. A., Kirilchev, A. P., et al. (2018). Personalized prescription of tyrosine kinase inhibitors in unresectable metastatic cholangiocarcinoma. Exp. Hematol. Oncol. 7:21. doi: 10.1186/s40164-018-0113-x
Prieto, P. A., Yang, J. C., Sherry, R. M., Hughes, M. S., Kammula, U. S., White, D. E., et al. (2012). CTLA-4 blockade with ipilimumab: long-term follow-up of 177 patients with metastatic melanoma. Clin. Cancer Res. 18, 2039–2047. doi: 10.1158/1078-0432.CCR-11-1823
Rifaioglu, A. S., Atas, H., Martin, M. J., Cetin-Atalay, R., Atalay, V., and Dogan, T. (2018). Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief. Bioinform. doi: 10.1093/bib/bby061 [Epub ahead of print].
Romero, P., Wagg, J., Green, M. L., Kaiser, D., Krummenacker, M., and Karp, P. D. (2004). Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 6:R2. doi: 10.1186/gb-2004-6-1-r2
Sanchez-Vega, F., Mina, M., Armenia, J., Chatila, W. K., Luna, A., La, K. C., et al. (2018). Oncogenic signaling pathways in the cancer genome atlas. Cell 173, 321–337.e10. doi: 10.1016/j.cell.2018.03.035
Schenone, M., Dancík, V., Wagner, B. K., and Clemons, P. A. (2013). Target identification and mechanism of action in chemical biology and drug discovery. Nat. Chem. Biol. 9, 232–240. doi: 10.1038/nchembio.1199
Sorokin, M., Kholodenko, R., Grekhova, A., Suntsova, M., Pustovalova, M., Vorobyeva, N., et al. (2018). Acquired resistance to tyrosine kinase inhibitors may be linked with the decreased sensitivity to X-ray irradiation. Oncotarget 9, 5111–5124. doi: 10.18632/oncotarget.23700
Spirin, P., Lebedev, T., Orlova, N., Morozov, A., Poymenova, N., Dmitriev, S. E., et al. (2017). Synergistic suppression of t(8;21)-positive leukemia cell growth by combining oridonin and MAPK1/ERK2 inhibitors. Oncotarget 8, 56991–57002. doi: 10.18632/oncotarget.18503
Wirsching, A., Melloul, E., Lezhnina, K., Buzdin, A. A., Ogunshola, O. O., Borger, P., et al. (2017). Temporary portal vein embolization is as efficient as permanent portal vein embolization in mice. Surgery 162, 68–81. doi: 10.1016/j.surg.2017.01.032
Wishart, D. S., Feunang, Y. D., Marcu, A., Guo, A. C., Liang, K., Vázquez-Fresno, R., et al. (2018). HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46, D608–D617. doi: 10.1093/nar/gkx1089
Zaim, S. R., Li, Q., Schissler, A. G., and Lussier, Y. A. (2018). Emergence of pathway-level composite biomarkers from converging gene set signals of heterogeneous transcriptomic responses. Pac. Symp. Biocomput. 23, 484–495. doi: 10.1142/9789813235533_0044
Keywords: cancer, DNA mutation, molecular pathways, biomarker, target drugs, tyrosine kinase inhibitors, nibs, mabs
Citation: Zolotovskaia MA, Sorokin MI, Emelianova AA, Borisov NM, Kuzmin DV, Borger P, Garazha AV and Buzdin AA (2019) Pathway Based Analysis of Mutation Data Is Efficient for Scoring Target Cancer Drugs. Front. Pharmacol. 10:1. doi: 10.3389/fphar.2019.00001
Received: 07 November 2018; Accepted: 03 January 2019;
Published: 23 January 2019.
Edited by:Zhe-Sheng Chen, St. John's University, United States
Reviewed by:Honglin Jiang, University of California, San Francisco, United States
Nelson Shu-Sang Yee, Penn State Milton S. Hershey Medical Center, United States
Copyright © 2019 Zolotovskaia, Sorokin, Emelianova, Borisov, Kuzmin, Borger, Garazha and Buzdin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Marianna A. Zolotovskaia, email@example.com