<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2019.00121</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>CaDrA: A Computational Framework for Performing Candidate Driver Analyses Using Genomic Features</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Kartha</surname> <given-names>Vinay K.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/491655/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Sebastiani</surname> <given-names>Paola</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/38742/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Kern</surname> <given-names>Joseph G.</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/631405/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname> <given-names>Liye</given-names></name>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/588672/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Varelas</surname> <given-names>Xaralabos</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/79535/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Monti</surname> <given-names>Stefano</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/61455/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Bioinformatics Program, Boston University</institution>, <addr-line>Boston, MA</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Section of Computational Biomedicine, Boston University School of Medicine</institution>, <addr-line>Boston, MA</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Biostatistics, Boston University School of Public Health</institution>, <addr-line>Boston, MA</addr-line>, <country>United States</country></aff>
<aff id="aff4"><sup>4</sup><institution>Department of Biochemistry, Boston University School of Medicine</institution>, <addr-line>Boston, MA</addr-line>, <country>United States</country></aff>
<aff id="aff5"><sup>5</sup><institution>School of Life Sciences and Technology, ShanghaiTech University</institution>, <addr-line>Shanghai</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Binhua Tang, Hohai University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Ao Li, University of Science and Technology of China, China; Samir B. Amin, The Jackson Laboratory for Genomic Medicine, United States</p></fn>
<corresp id="c001">&#x002A;Correspondence: Stefano Monti, <email>smonti@bu.edu</email></corresp>
<fn fn-type="other" id="fn002"><p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>19</day>
<month>02</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>121</elocation-id>
<history>
<date date-type="received">
<day>07</day>
<month>10</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>04</day>
<month>02</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2019 Kartha, Sebastiani, Kern, Zhang, Varelas and Monti.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Kartha, Sebastiani, Kern, Zhang, Varelas and Monti</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The identification of genetic alteration combinations as drivers of a given phenotypic outcome, such as drug sensitivity, gene or protein expression, and pathway activity, is a challenging task that is essential to gaining new biological insights and to discovering therapeutic targets. Existing methods designed to predict complementary drivers of such outcomes lack analytical flexibility, including the support for joint analyses of multiple genomic alteration types, such as somatic mutations and copy number alterations, multiple scoring functions, and rigorous significance and reproducibility testing procedures. To address these limitations, we developed Candidate Driver Analysis or CaDrA, an integrative framework that implements a step-wise heuristic search approach to identify functionally relevant subsets of genomic features that, together, are maximally associated with a specific outcome of interest. We show CaDrA&#x2019;s overall high sensitivity and specificity for typically sized multi-omic datasets using simulated data, and demonstrate CaDrA&#x2019;s ability to identify known mutations linked with sensitivity of cancer cells to drug treatment using data from the Cancer Cell Line Encyclopedia (CCLE). We further apply CaDrA to identify novel regulators of oncogenic activity mediated by Hippo signaling pathway effectors YAP and TAZ in primary breast cancer tumors using data from The Cancer Genome Atlas (TCGA), which we functionally validate <italic>in vitro</italic>. Finally, we use pan-cancer TCGA protein expression data to show the high reproducibility of CaDrA&#x2019;s search procedure. Collectively, this work demonstrates the utility of our framework for supporting the fast querying of large, publicly available multi-omics datasets, including but not limited to TCGA and CCLE, for potential drivers of a given target profile of interest.</p>
</abstract>
<kwd-group>
<kwd>oncogenic driver analysis</kwd>
<kwd>stepwise search</kwd>
<kwd>TCGA</kwd>
<kwd>CCLE</kwd>
<kwd>R package</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="67"/>
<page-count count="15"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec><title>Introduction</title>
<p>Advances in high-throughput sequencing technology have led to a rapid rise in the availability of large multi-omic datasets through compendia such as the CCLE, TCGA, the Genotype-Tissue Expression (GTEx) project, and others (<xref ref-type="bibr" rid="B2">Barretina et al., 2012</xref>; <xref ref-type="bibr" rid="B8">Chang et al., 2013</xref>; <xref ref-type="bibr" rid="B1">Ardlie et al., 2015</xref>). These data include genetic alterations, comprising somatic copy number alterations (SCNAs) and somatic mutations, epigenetic information, such as microRNA expression and DNA methylation, as well as gene expression profiling through microarray or RNA-sequencing (RNASeq) technology, across tens of thousands of samples representing varying biological contexts. Concomitantly, several computational methods have been developed and applied to effectively query and integrate different types of genome-wide datasets in order to make meaningful predictions about the biological processes driving the phenotypes of interest (<xref ref-type="bibr" rid="B19">Drier et al., 2013</xref>; <xref ref-type="bibr" rid="B35">Kristensen et al., 2014</xref>). An important application of such methods is the identification of recurrent genomic alterations, and their potential effects on downstream pathway activity or phenotypes associated with development and disease states. For example, in many cancers, samples exhibiting elevated activity of a given oncogenic signature may be enriched for, or driven by, functionally relevant somatic mutations or SCNAs. Identifying such associations may help elucidate underlying mechanisms contributing to abnormal pathway activity, further enabling disease subtyping and sample classification (<xref ref-type="bibr" rid="B3">Bea et al., 2005</xref>; <xref ref-type="bibr" rid="B52">Savage et al., 2003</xref>; <xref ref-type="bibr" rid="B42">Monti et al., 2012</xref>).
Alternatively, linking these genomic features with their close interactors through protein-protein interaction networks, gene function annotations or phenotypic readouts such as drug sensitivity may support the discovery of novel druggable targets and further guide precision medicine regimens (<xref ref-type="bibr" rid="B4">Bild et al., 2006</xref>; <xref ref-type="bibr" rid="B25">Heiser et al., 2011</xref>; <xref ref-type="bibr" rid="B15">Daemen et al., 2013</xref>; <xref ref-type="bibr" rid="B28">Hou and Ma, 2014</xref>; <xref ref-type="bibr" rid="B30">Jia and Zhao, 2014</xref>).</p>
<p>Recently, computational methods and models have been developed for performing driver gene analyses applied to high-dimensional &#x2018;omics&#x2019; data from cancer cell lines and patients. These are typically motivated either by frequency or exclusivity of alterations across samples (<xref ref-type="bibr" rid="B64">Youn and Simon, 2011</xref>; <xref ref-type="bibr" rid="B13">Ciriello et al., 2012</xref>; <xref ref-type="bibr" rid="B16">Dees et al., 2012</xref>; <xref ref-type="bibr" rid="B60">Vandin et al., 2012</xref>; <xref ref-type="bibr" rid="B36">Lawrence et al., 2013</xref>; <xref ref-type="bibr" rid="B37">Leiserson et al., 2013</xref>; <xref ref-type="bibr" rid="B34">Kim et al., 2016</xref>), or their functional interplay based on biological interaction networks and pathway ontology (<xref ref-type="bibr" rid="B46">Ng et al., 2012</xref>; <xref ref-type="bibr" rid="B14">Creixell et al., 2015</xref>; <xref ref-type="bibr" rid="B38">Leiserson et al., 2015</xref>; <xref ref-type="bibr" rid="B12">Cho et al., 2016</xref>). Indeed, certain approaches integrate interactome and functional information to further guide driver gene prioritization in cancer (<xref ref-type="bibr" rid="B11">Chen et al., 2014</xref>; <xref ref-type="bibr" rid="B62">Xi et al., 2017</xref>; <xref ref-type="bibr" rid="B51">Sanchez-Vega et al., 2018</xref>). Some of these tools have been proposed to specifically identify subsets or combinations of genomic features that are collectively associated with a given phenotypic response, explaining a larger fraction of the biological context than any individual feature alone (<xref ref-type="bibr" rid="B34">Kim et al., 2016</xref>). 
These methods, while useful, do not offer simultaneous support for: (i) the joint analyses of multi-type features, including SCNAs and somatic mutations, with possible extension to other genomic data, (ii) multiple feature scoring functions and, most importantly, (iii) rigorous assessment of the statistical significance of the discovered associations. Of equal relevance, a user-friendly and flexible programming package supporting the rapid screening for candidate drivers given a set of ranked genomic features is currently lacking, and would prove extremely useful for incorporation in analytical pipelines aimed at the generation of novel biological hypotheses.</p>
<p>Here, we present CaDrA, a methodology that searches for the set of genomic alterations, here denoted as <italic>features</italic> (mutations, SCNAs, translocations, etc.), associated with a user-provided ranking of samples within a dataset. Our method specifically employs a stepwise heuristic search to identify a subset of features whose union is maximally associated with the observed sample ranking, and carries out rigorous statistical significance testing based on sample permutation, thereby allowing for the identification of candidate genetic drivers associated with aberrant pathway activity or drug sensitivity, while still exploiting aspects of feature complementarity and sample heterogeneity. To highlight the method&#x2019;s overall performance, along with its relevance and ability to select sets of genomic features that indeed drive certain oncogenic phenotypes in cancer, we perform extensive evaluation of CaDrA based on simulated data, as well as real genomic data from cancer cell lines and primary human tumors. The results from simulations show that CaDrA has high sensitivity for mid- to large-sized datasets, and high specificity for all sample sizes considered. Using genomic data drawn from CCLE and TCGA, we demonstrate CaDrA&#x2019;s capacity to correctly identify well-characterized driver mutations in cancer cell lines and primary tumors spanning multiple cancer types, along with its ability to discover novel features associated with invasive phenotypes in human breast cancer samples, which we functionally validate <italic>in vitro</italic>. Our framework, which is publicly available as an R package, will allow for rapidly mining numerous multi-omics datasets for candidate drivers of user-specified molecular readouts, such as pathway activity, drug sensitivity, protein expression, or other quantitative measurements of interest, further enabling targeted queries and novel hypothesis generation.</p>
</sec>
<sec><title>Results</title>
<sec><title>CaDrA Overview</title>
<p>An overview of CaDrA&#x2019;s workflow is summarized in <xref ref-type="fig" rid="F1">Figure 1</xref>. CaDrA implements a step-wise heuristic approach that searches through a set of binary features [each represented as a 1/0-valued vector, indicating the presence/absence of an SCNA, somatic mutation, or other (epi)genetic alteration across samples, respectively], and returns a final subset of features whose union (logical OR) defines an alteration &#x2018;meta-feature&#x2019; that is maximally associated with the defined sample ranking provided as input (see section &#x201C;Methods&#x201D;). The strength of the association of a meta-feature with a sample ranking is a function of the agreement between the skewness of the alterations&#x2019; occurrences and the sample ranking. The input sample ranking is usually a function of a sample-specific measurement, e.g., the activity level of a pathway, the response to a targeted treatment, the expression level of a given transcript or protein, etc. Therefore, the meta-feature returned by the search is the set of features maximally predictive of that same sample-specific measurement variable. The logical OR operator used in the iterative search framework specifically takes advantage of heterogeneity seen across samples (i.e., samples harboring similar phenotypes but different drivers of the given outcome), thus enabling the potential identification of complementary drivers of target phenotypes (<xref ref-type="bibr" rid="B34">Kim et al., 2016</xref>). CaDrA allows for multiple modes to query ranked binary datasets with user-specified parameters defining search criteria, enables rigorous permutation-based significance testing of results, and reduces computation time by exploiting pre-computed score distributions and parallel computing, when available (see section &#x201C;Methods&#x201D;).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Overview of CaDrA workflow and implementation. CaDrA takes as input a sample-specific measurement to rank samples, and a matrix of binary features of the same samples. In Step 1 (blue box), CaDrA begins by choosing a starting feature, which is either the single feature having the best score based on its left-skewness, or a user-specified start feature. In the next step (Step 2; orange box), the union (logical OR) of this feature with each of the remaining features in the dataset is taken, yielding &#x2018;meta-features&#x2019; with their corresponding scores. If any meta-feature has a better score than the hit from the previous step (Step 3; green box), CaDrA uses this new meta-feature as a reference for the next iteration, repeating Steps 2 and 3 until no further improvement in scores can be obtained. The final output is a set of features (meta-feature) whose union has the (local) maximum score and its permutation-based <italic>p</italic>-value.</p></caption>
<graphic xlink:href="fgene-10-00121-g001.tif"/>
</fig>
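<p>As an illustrative sketch only, and not CaDrA&#x2019;s actual R implementation, the three steps summarized above can be written as a short greedy search in Python. The function names (rank_score, union, stepwise_search), the centered rank-sum score, and the toy data are all assumptions introduced for illustration; the scoring functions that CaDrA supports are described in the section &#x201C;Methods&#x201D;. Samples are assumed to be pre-sorted by the input ranking, so that index 0 is the top-ranked sample.</p>
<preformat>
```python
from typing import Dict, List, Optional, Sequence, Tuple


def rank_score(feature: Sequence[int]) -> int:
    """Toy score: centered rank sum of the altered samples (an assumption).

    Samples are assumed to be sorted by the input ranking (index 0 = top).
    Each altered sample contributes n - 1 - 2 * index, so alterations in the
    top half of the ranking raise the score and those in the bottom half
    lower it.
    """
    n = len(feature)
    return sum(n - 1 - 2 * i for i, value in enumerate(feature) if value == 1)


def union(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Logical OR of two 1/0-valued vectors."""
    return [max(x, y) for x, y in zip(a, b)]


def stepwise_search(
    features: Dict[str, List[int]],
    max_size: int = 7,
    start: Optional[str] = None,
) -> Tuple[List[str], int]:
    """Greedy forward search for the feature union with the best score."""
    # Step 1: seed with the best single feature unless a start is given.
    if start is None:
        start = max(features, key=lambda name: rank_score(features[name]))
    selected = [start]
    meta = list(features[start])
    best = rank_score(meta)

    for _ in range(max_size - 1):
        # Step 2: score the union of the current meta-feature with each
        # remaining feature.
        candidates = {
            name: rank_score(union(meta, vec))
            for name, vec in features.items()
            if name not in selected
        }
        if not candidates:
            break
        name = max(candidates, key=candidates.get)
        # Step 3: stop at a local maximum (no union improves the score).
        if best >= candidates[name]:
            break
        selected.append(name)
        meta = union(meta, features[name])
        best = candidates[name]
    return selected, best


# Hypothetical toy data: 8 samples, already sorted by the input ranking.
toy = {
    "A": [1, 1, 0, 0, 0, 0, 0, 0],
    "B": [0, 0, 1, 1, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 0, 1, 1, 1],
    "D": [1, 0, 0, 0, 1, 0, 0, 1],
}
result = stepwise_search(toy)
```
</preformat>
<p>On this toy input the search starts from feature A, adds feature B because their union scores higher than A alone, and then stops because neither remaining union improves the score, returning features A and B with a score of 16. Passing a start argument mimics the user-specified start feature described in Step 1.</p>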
</sec>
<sec><title>Analysis of Simulated Data to Evaluate CaDrA Performance</title>
<p>To assess the overall performance of CaDrA to recover (statistically) significantly associated meta-features, we simulated two types of datasets for a range of sample sizes: (i) the <italic>true-positive datasets</italic> consist of both left-skewed (i.e., true positive with skewness concordant with sample ranking) as well as uniformly distributed (i.e., null) features; and (ii) the <italic>null datasets</italic> consist of null features only (see section &#x201C;Methods&#x201D; and <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S1</xref>). This enabled us to estimate the overall sensitivity and specificity of CaDrA using the true positive and null datasets, respectively. By running CaDrA on multiple simulated datasets of different sample sizes (<italic>n</italic> = 500 true positive and null datasets for each sample size), we first evaluated the resulting meta-features based on the number of true positive features and the total number of features contained within each returned meta-feature (i.e., the meta-feature size; <xref ref-type="fig" rid="F2">Figure 2A,B</xref>). The true positive datasets had a maximum of five positive features to be detected, while the maximum number of features CaDrA was allowed to add was set to 7, to evaluate the ability of the search to recover all, but no more than, the positive features. With progressively higher sample sizes, we observed an increase in the fraction of CaDrA-identified meta-features that include all 5 true positive features (<xref ref-type="fig" rid="F2">Figure 2A</xref>). The true positive rate (TPR) and false positive rate (FPR) of CaDrA on the simulated positive and null data, respectively, for different sample sizes are shown in <xref ref-type="fig" rid="F2">Figure 2C,D</xref>, and were calculated as the fraction of searches returning meta-features with permutation <italic>p</italic>-value significant at &#x03B1; = 0.05 (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref>).
The TPR was estimated for different numbers of recovered true positive features (in the true positive datasets), while the FPR was estimated for different numbers of returned features (by definition, false positives) in the null datasets; both are summarized in <xref ref-type="table" rid="T1">Table 1</xref>. CaDrA returned all of the simulated true positive features with 100% TPR for sample sizes larger than <italic>N</italic> = 100. CaDrA also yielded a very high mean TPR of >95% at <italic>N</italic> = 100, with sensitivity declining sharply at smaller sample sizes (87.55% at <italic>N</italic> = 90, 30.72% at <italic>N</italic> = 80, and 7.69% at the smallest sample size of <italic>N</italic> = 50; <xref ref-type="table" rid="T1">Table 1</xref>). Further, when applied to the null datasets (<xref ref-type="fig" rid="F2">Figure 2B</xref>), the majority of meta-features returned by CaDrA were correctly deemed as non-significant at &#x03B1; = 0.05, with a maximum mean FPR of 7.2% for the lowest sample size analyzed (<xref ref-type="fig" rid="F2">Figure 2D</xref> and <xref ref-type="table" rid="T1">Table 1</xref>).</p>
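<p>The rates described above can be illustrated with a minimal Python sketch, under stated assumptions. The function names, the toy centered rank-sum score, and the default of 200 permutations are hypothetical choices made for illustration. For brevity, the sketch scores one fixed meta-feature under shuffled sample labels; a fuller version would re-run the whole search on each permuted dataset. The fraction of searches with a significant <italic>p</italic>-value estimates the TPR when computed on true positive datasets and the FPR when computed on null datasets.</p>
<preformat>
```python
import random
from typing import Callable, Sequence


def rank_score(feature: Sequence[int]) -> int:
    """Toy centered rank-sum score (an assumption, not CaDrA's own score)."""
    n = len(feature)
    return sum(n - 1 - 2 * i for i, value in enumerate(feature) if value == 1)


def permutation_p_value(
    meta_feature: Sequence[int],
    score_fn: Callable[[Sequence[int]], int] = rank_score,
    n_perm: int = 200,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for a fixed 1/0-valued meta-feature."""
    rng = random.Random(seed)
    observed = score_fn(meta_feature)
    shuffled = list(meta_feature)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between alterations and ranking
        if score_fn(shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (exceed + 1) / (n_perm + 1)


def significant_fraction(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Fraction of searches called significant at level alpha."""
    return sum(1 for p in p_values if alpha >= p) / len(p_values)


# Hypothetical inputs: a strongly top-skewed feature and an empty feature.
p_skewed = permutation_p_value([1] * 5 + [0] * 45)
p_empty = permutation_p_value([0] * 10, n_perm=20)
rate = significant_fraction([0.01, 0.2, 0.04, 0.5])
```
</preformat>
<p>In this sketch a feature with no alterations always receives a <italic>p</italic>-value of 1, and two of the four example <italic>p</italic>-values fall at or below 0.05, giving a rate of 0.5.</p>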
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>CaDrA performance on simulated data. CaDrA was run on 500 independent simulated datasets containing <bold>(A)</bold> both positive and null, and <bold>(B)</bold> only null features with sample sizes ranging between 50 and 500 samples (number in gray box above each sub-panel). In each case, the distribution of the number of features per meta-feature (i.e., the meta-feature size) returned by CaDrA is shown <bold>(A,B)</bold> as well as the number and fraction of searches that yielded significance for &#x03B1; = 0.05 <bold>(C,D)</bold>, corresponding to the true positive rate (TPR) and false positive rate (FPR), respectively.</p></caption>
<graphic xlink:href="fgene-10-00121-g002.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Overall true positive rate (TPR) and false positive rate (FPR) of CaDrA based on simulated data.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">Sample Size (<italic>N</italic>)</th>
<th valign="top" align="center">Mean TPR (%)</th>
<th valign="top" align="center">Mean FPR (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">50</td>
<td valign="top" align="center">7.69</td>
<td valign="top" align="center">7.2</td>
</tr>
<tr>
<td valign="top" align="left">60</td>
<td valign="top" align="center">5.76</td>
<td valign="top" align="center">2.8</td>
</tr>
<tr>
<td valign="top" align="left">70</td>
<td valign="top" align="center">11.53</td>
<td valign="top" align="center">3.8</td>
</tr>
<tr>
<td valign="top" align="left">80</td>
<td valign="top" align="center">30.72</td>
<td valign="top" align="center">4.6</td>
</tr>
<tr>
<td valign="top" align="left">90</td>
<td valign="top" align="center">87.55</td>
<td valign="top" align="center">5</td>
</tr>
<tr>
<td valign="top" align="left">100</td>
<td valign="top" align="center">96.51</td>
<td valign="top" align="center">4.6</td>
</tr>
<tr>
<td valign="top" align="left">250</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">4.6</td>
</tr>
<tr>
<td valign="top" align="left">500</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">4.2</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>Weight-averaged TPRs and FPRs were computed per sample size for true positive and null simulated datasets, respectively (n = 500 simulated datasets per sample size; see section &#x201C;Methods&#x201D;).</italic></attrib>
</table-wrap-foot>
</table-wrap>
<p>These results suggest that CaDrA requires mid- to large-sized datasets for sufficient sensitivity, while maintaining high specificity at all sample sizes assessed.</p>
</sec>
<sec><title>CaDrA Identifies Known Regulators of Ras/Raf/Mek/ERK Signaling Sensitivity in Cancer Cell Lines</title>
<p>The mitogen-activated protein kinase (MAPK) kinase (MEKK)/extra-cellular signal-regulated kinase (ERK) pathway is a well-conserved kinase cascade known to play a regulatory role in cell proliferation, differentiation, and survival in response to extracellular signaling (<xref ref-type="bibr" rid="B33">Kim and Choi, 2010</xref>; <xref ref-type="bibr" rid="B7">Cargnello and Roux, 2011</xref>; <xref ref-type="bibr" rid="B5">Burotto et al., 2014</xref>). Increased MAP/ERK kinase (MEK) activity is a feature of many cancers, and is often triggered by missense mutations in <italic>BRAF</italic> and <italic>NRAS</italic>, two upstream oncogenes and potent regulators of Ras/Raf/Mek/ERK signaling (<xref ref-type="bibr" rid="B6">Cantwell-Dorris et al., 2011</xref>; <xref ref-type="bibr" rid="B5">Burotto et al., 2014</xref>). Small molecules targeting these mutated proteins have been shown to be effective in treating these cancers via inactivation of Ras/Raf/Mek/ERK signaling (<xref ref-type="bibr" rid="B49">Roberts and Der, 2007</xref>; <xref ref-type="bibr" rid="B9">Chapman et al., 2011</xref>; <xref ref-type="bibr" rid="B2">Barretina et al., 2012</xref>; <xref ref-type="bibr" rid="B31">Johnson and Puzanov, 2015</xref>). To highlight CaDrA&#x2019;s ability to recover independent genomic features that may confer hypersensitivity of cancer cells to targeted small molecule treatment, we utilized drug sensitivity profiles for MEK inhibitor AZD6244 (<xref ref-type="bibr" rid="B63">Yeh et al., 2007</xref>), along with matched genomic data from CCLE. Specifically, we used per-sample estimates of &#x2018;ActArea&#x2019; or area under the fitted dose response curve, a metric that has been shown to accurately capture drug response behavior (<xref ref-type="bibr" rid="B29">Jang et al., 2014</xref>), to rank cell lines from high to low sensitivity, as well as data comprising somatic mutations and SCNAs as the binary feature matrix (see section &#x201C;Methods&#x201D;). 
CaDrA was then run to look for a subset of features associated with increased sensitivity to treatment with AZD6244 (i.e., increased ActArea scores).</p>
<p>The resulting feature set (i.e., meta-feature) is shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. Remarkably, CaDrA selected the BRAF<sup><italic>V</italic>600<italic>E</italic></sup> and <italic>NRAS</italic> somatic mutations in the first two iterations, respectively. Subsequent iterations identified mutations in <italic>APAF1</italic>, <italic>TGFBR2</italic>, and <italic>AMHR2</italic>, before terminating the search process (<italic>P</italic> &#x2264; 0.001). APAF1 is a pro-apoptotic factor and known regulator of cell survival and tumor development (<xref ref-type="bibr" rid="B21">Ferraro et al., 2003</xref>), the depleted expression of which has been observed in malignant melanoma cell lines and specimens (<xref ref-type="bibr" rid="B55">Soengas et al., 2006</xref>). TGFBR2 and AMHR2 are both type II receptors functioning as part of the transforming growth factor (TGF)/bone morphogenetic protein (BMP) superfamily, together serving as mediators of cellular differentiation, proliferation and survival, and play important roles in directing epithelial-mesenchymal transition (EMT) (<xref ref-type="bibr" rid="B50">Rojas et al., 2009</xref>; <xref ref-type="bibr" rid="B57">Stone et al., 2016</xref>). Notably, MAPK signaling activity can also be regulated by TGF/BMP stimulation (<xref ref-type="bibr" rid="B17">Derynck and Zhang, 2003</xref>; <xref ref-type="bibr" rid="B45">Moustakas and Heldin, 2005</xref>; <xref ref-type="bibr" rid="B10">Chapnick et al., 2011</xref>), suggesting that these mutations are potential independent drivers of increased MEK signaling, and hence, of increased sensitivity to treatment with AZD6244. We next extended our analysis of cancer cell line sensitivity profiles to alternative small molecules targeting MEK (PD-0325901), as well as RAF (PLX4720 and RAF265). 
The meta-features associated with increased sensitivity to each of the four drug treatments assessed are shown in <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S3</xref> and summarized in <xref ref-type="table" rid="T2">Table 2</xref>. Importantly, both BRAF<sup><italic>V</italic>600<italic>E</italic></sup> and NRAS mutations were identified as candidate drivers of sensitivity to MEK inhibition by AZD6244 and PD-0325901. Furthermore, the BRAF<sup><italic>V</italic>600<italic>E</italic></sup> mutation was returned by CaDrA for all four independent queries, highlighting its association with increased sensitivity to inhibitors targeting the same protein (BRAF) as well as its downstream effector (MEK).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>CaDrA identifies mutations in MAPK/ERK signaling genes that contribute to hyper-sensitivity to MEK inhibition <italic>in vitro</italic>. ActArea measurements reflecting sensitivity to MEK inhibitor AZD6244 were used to rank CCLE cell lines (<italic>n</italic> = 477). CaDrA was then run to identify sets of genomic features that were most associated with increasing ActArea (i.e., increasing sensitivity) scores. Through step-wise search iterations, CaDrA identified somatic mutations in known regulators upstream of MEK, including an activating mutation in <italic>BRAF</italic> (BRAF<sup><italic>V</italic>600<italic>E</italic></sup>) and <italic>NRAS</italic>, as well as those in <italic>APAF1</italic>, <italic>TGFBR2</italic>, and <italic>AMHR2</italic>, before terminating the search process. The resulting meta-feature (red track) and its corresponding enrichment score (ES) are shown.</p></caption>
<graphic xlink:href="fgene-10-00121-g003.tif"/>
</fig>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Summary of mutation subsets identified by CaDrA as associated with elevated sensitivity to MEK and RAF inhibition in cancer cell lines.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">Target</th>
<th valign="top" align="left">Treatment</th>
<th valign="top" align="left">CaDrA hits</th>
<th valign="top" align="center"><italic>P</italic>-value</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">MEK</td>
<td valign="top" align="left">AZD6244</td>
<td valign="top" align="left"><italic>BRAF.V600E, NRAS, APAF1, TGFBR2, AMHR2</italic></td>
<td valign="top" align="center">0.001</td>
</tr>
<tr>
<td valign="top" align="left">MEK</td>
<td valign="top" align="left">PD-0325901</td>
<td valign="top" align="left"><italic>BRAF.V600E, NRAS, TRIM33</italic></td>
<td valign="top" align="center">0.001</td>
</tr>
<tr>
<td valign="top" align="left">RAF</td>
<td valign="top" align="left">PLX4720</td>
<td valign="top" align="left"><italic>BRAF.V600E</italic></td>
<td valign="top" align="center">0.001</td>
</tr>
<tr>
<td valign="top" align="left">RAF</td>
<td valign="top" align="left">RAF265</td>
<td valign="top" align="left"><italic>TTK, BRAF.V600E, ZMYM2, IL21R, BCL11B, MAP3K5, TAF15</italic></td>
<td valign="top" align="center">0.005</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>Mutation meta-features identified as associated with increased sensitivity to inhibitors targeting MEK (AZD6244, PD-0325901) and RAF (PLX4720, RAF265) are shown, along with the corresponding permutation p-value of each search result.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<p>Collectively, these results confirm CaDrA&#x2019;s capability to accurately identify upstream drivers of cellular response to treatment that are both components of independently linked pathways, as well as part of the same signaling branch, which in turn suggests their role in driving the disease state of interest.</p>
</sec>
<sec><title>CaDrA Identifies Hallmark Drivers Associated With Protein Biomarkers in Human Cancers</title>
<p>Protein abundance levels have widely been utilized to histologically classify several human tumor subtypes, with relevant diagnostic and therapeutic implications. Epidermal Growth Factor Receptor (EGFR) expression, for instance, together with <italic>EGFR</italic> mutation status can be used to predict response to existing anti-EGFR treatments in patients with lung cancers (<xref ref-type="bibr" rid="B47">Pao et al., 2004</xref>; <xref ref-type="bibr" rid="B40">Mascaux et al., 2011</xref>). To demonstrate CaDrA&#x2019;s targeted search mode when identifying genomic alterations that track with a pre-defined starting feature, we ran CaDrA using phosphorylated EGFR (EGFR<sup>Tyr1068</sup>) protein expression levels to stratify TCGA lung adenocarcinomas (LUAD), and seeded the search process with EGFR mutations. Subsequent search iterations selected well-known regulators of EGFR activity in lung cancers, including mutations in epithelial-to-mesenchymal transition mediators <italic>SMAD4</italic> and <italic>LAMC2</italic>, as well as <italic>ERBB2</italic> (<xref ref-type="bibr" rid="B39">Liu et al., 2015</xref>; <xref ref-type="bibr" rid="B43">Moon et al., 2015</xref>), with the meta-feature being statistically significant based on the permuted null background obtained for the same search criterion (<italic>P</italic> &#x2264; 0.02; <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S4</xref>).</p>
<p>We then wished to more systematically determine whether CaDrA can identify known drivers of target profiles previously associated with oncogenic and tumor-suppressive markers in human cancers. To do so, we queried TCGA expression profiles of proteins encoded by a set of hallmark genes that are defined in the COSMIC database (<xref ref-type="bibr" rid="B22">Forbes et al., 2017</xref>), along with genomic data from nine different cancer types in TCGA (<xref ref-type="bibr" rid="B22">Forbes et al., 2017</xref>). Briefly, for each cancer type, a CaDrA query was performed with respect to each of the proteins corresponding to the COSMIC-defined oncogenes or tumor suppressor genes (<italic>n</italic> = 57). In particular, CaDrA was applied to search for sets of genomic features associated with elevated protein expression for each protein under consideration. The features selected by CaDrA were then pooled across all protein queries, and the resulting feature set was tested for enrichment against the reference COSMIC list of frequently mutated oncogenes and tumor suppressor genes (<italic>n</italic> = 554; see section &#x201C;Methods&#x201D;). We observed a significant enrichment of the reference cancer driver mutations among the CaDrA-identified features in all cancer types tested (Hyper-enrichment FDR &#x003C; 0.05; <xref ref-type="fig" rid="F4">Figure 4</xref> and <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S1</xref>). These results validate CaDrA&#x2019;s ability to identify independently cataloged, functionally relevant genomic drivers in primary human malignancies.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>CaDrA systematically identifies known drivers of onco-proteins and tumor suppressor proteins in human cancers. TCGA genomic data for nine different cancer types were queried using the expression of distinct proteins mapping to hallmark genes included in COSMIC (<italic>n</italic> = 57) for sample ranking. Resulting meta-features identified by CaDrA were then pooled across all protein queries and tested for enrichment against a reference COSMIC-defined gene list (<italic>n</italic> = 554). FDR-adjusted gene set enrichment <italic>p</italic>-values are shown, with cancer types sorted in decreasing order of FDR <italic>q</italic>-value. BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinomas; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PRAD, prostate adenocarcinoma. Points are plotted in -log<sub>10</sub> space.</p></caption>
<graphic xlink:href="fgene-10-00121-g004.tif"/>
</fig>
</sec>
<sec><title>CaDrA Reveals Novel Drivers of Oncogenic YAP/TAZ Activity in Human Breast Cancer</title>
<p>Next, we tested whether our framework can be applied to the discovery of novel drivers of oncogenic pathways in cancer. The Hippo signaling pathway is a highly conserved developmental pathway known to play an essential role in cell proliferation and survival (<xref ref-type="bibr" rid="B61">Varelas, 2014</xref>). YAP (<xref ref-type="bibr" rid="B58">Sudol, 1994</xref>) and TAZ (<xref ref-type="bibr" rid="B32">Kanai et al., 2000</xref>) serve as central downstream transcriptional effectors of the pathway. Aberrant nuclear YAP/TAZ localization and transcriptional activity are associated with a range of cancers, including BRCAs (<xref ref-type="bibr" rid="B27">Hiemer et al., 2015</xref>; <xref ref-type="bibr" rid="B44">Moroishi et al., 2015</xref>; <xref ref-type="bibr" rid="B67">Zanconato et al., 2015</xref>, <xref ref-type="bibr" rid="B66">2016</xref>). To identify alternative genetic events that can potentially explain the elevated YAP/TAZ activity exhibited in some human breast cancers, we applied CaDrA using genomic data from the TCGA BRCA sample cohort, along with corresponding per-sample estimates of YAP/TAZ activity derived using a gene expression signature of YAP/TAZ knockdown in MDA-MB-231 cells (see section &#x201C;Methods&#x201D;). Samples with available RNASeq, somatic mutation and SCNA profiles (<italic>n</italic> = 957) were first ranked in decreasing order of their overall YAP/TAZ activity estimates. The ranked binary matrix of mutation and SCNA features was then used as input to CaDrA. In the first iteration, CaDrA identified the top scoring genomic feature to be a deletion on chromosomal locus chr5q21.3 (<xref ref-type="fig" rid="F5">Figure 5A</xref>), harboring tyrosine kinase receptor-encoding gene <italic>EFNA5</italic>. 
<italic>EFNA5</italic>, a member of the Eph receptor family, has been hypothesized to function as a tumor suppressor, whose expression has been shown to be reduced in human BRCAs relative to normal epithelial tissue (<xref ref-type="bibr" rid="B23">Fu et al., 2010</xref>). Advancing to a second iteration, CaDrA then identified an additional deletion of chr20p13 as the next-best feature (<xref ref-type="fig" rid="F5">Figure 5A</xref>). The chr20p13 genomic deletion spans multiple genes (<xref ref-type="supplementary-material" rid="SM1">Supplementary Table S2</xref>), including <italic>RBCK1</italic>, whose reduced expression has been shown to be associated with increased tumor cell proliferation and survival, as well as with poor prognosis in breast cancer (<xref ref-type="bibr" rid="B18">Donley et al., 2014</xref>). CaDrA then proceeded to identify somatic mutations in the <italic>RELN</italic> gene, before terminating the search process (<italic>P</italic> &#x2264; 0.001; <xref ref-type="fig" rid="F5">Figure 5A</xref>). Loss of <italic>RELN</italic> expression has indeed been shown to induce cell migration in esophageal carcinoma, and to be associated with poor prognosis in breast cancer (<xref ref-type="bibr" rid="B56">Stein et al., 2010</xref>; <xref ref-type="bibr" rid="B65">Yuan et al., 2012</xref>). To ensure that the derived meta-feature association is not a spurious consequence of correlation with tumor subtype, we tested for the association of YAP/TAZ activity with the meta-feature while controlling for BRCA TN status using a linear regression model. The results confirmed that the positive association between YAP/TAZ activity and the occurrence of these genomic alterations is independent of BRCA patho-histology (linear regression meta-feature coefficient <italic>P</italic> &#x003C; 0.0001; <xref ref-type="fig" rid="F5">Figure 5B</xref>). 
Analysis of YAP/TAZ activity based on the same knockdown signature in CCLE BRCA cell lines (<italic>n</italic> = 59; <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S5A</xref>) shows that <italic>RBCK1</italic> and <italic>RELN</italic> display the highest anti-correlation between their gene expression and YAP/TAZ activity (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S5B</xref>). In order to assess whether these identified candidates indeed drive the elevated YAP/TAZ activity phenotype, we performed siRNA-mediated knockdown of <italic>RELN</italic> or <italic>RBCK1</italic> in HS578T breast cancer cells, followed by expression quantification of YAP/TAZ canonical targets, which serves as a read-out of nuclear YAP/TAZ activity (<xref ref-type="bibr" rid="B48">Piccolo et al., 2014</xref>). Similar to the MDA-MB-231 cells from which the gene signature was derived, HS578T cells are TN BRCA cells, but they display lower overall YAP/TAZ activity (rank 7/59) than the latter (rank 54/59). Importantly, knockdown of either of these candidate drivers in these cells yielded a significant increase in expression levels of YAP/TAZ targets CTGF and CYR61 (FDR &#x003C; 0.05; two-tailed Student&#x2019;s <italic>t</italic>-test), validating the association of their loss of function with increased YAP/TAZ transcriptional activity (<xref ref-type="fig" rid="F5">Figure 5C</xref>).</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>CaDrA identifies novel drivers of oncogenic YAP/TAZ activity in human breast carcinomas. <bold>(A)</bold> TCGA BRCA RNASeq data (<italic>n</italic> = 951) was projected onto the space of YAP/TAZ-activating genes (blue area plot; see section &#x201C;Methods&#x201D;). A CaDrA search for features associated with elevated YAP/TAZ activity identified two chromosomal deletions (<italic>Del5q21.3</italic>, <italic>Del20p13</italic>), and a somatic mutation in <italic>RELN</italic> (black tracks). The union of the three features (red track) and the corresponding running enrichment score (ES) is also shown. <bold>(B)</bold> Box plot of YAP/TAZ activity estimates for triple negative (TN) and non-TN TCGA BRCA samples. Sample groups are further stratified by the presence or absence of the union alteration status of the meta-feature identified by CaDrA (panel A, red track). Only samples with known TN status were considered. <bold>(C)</bold> siRNA-mediated knockdown of 20p13-harboring gene <italic>RBCK1</italic>, and <italic>RELN</italic> in HS578T cells resulted in a significant increase in the expression levels of canonical YAP/TAZ targets CTGF and CYR61, as indicated by their relative qRT-PCR expression, confirming the identified CaDrA hits as potential regulators of BRCA-associated YAP/TAZ activity. <bold>(D)</bold> Sub-sampling-based reproducibility assessment for candidate drivers of YAP/TAZ activity compared to a CaDrA query for a random profile ranking in TCGA BRCAs. Jaccard (<italic>J</italic>) indices of the returned meta-features obtained with and without sub-sampling (repeated for <italic>n</italic> = 100 independent sub-sampling iterations) were computed and compared for the two queries, yielding a significantly higher <italic>J</italic> index distribution for the original query relative to the permuted ranking query (Wilcox <italic>P</italic> &#x003C; 0.0001). 
Ctrl: Scrambled control; YT: YAP/TAZ; <sup>&#x2217;</sup> FDR &#x003C; 0.05; two-tailed Student&#x2019;s <italic>t</italic>-test.</p></caption>
<graphic xlink:href="fgene-10-00121-g005.tif"/>
</fig>
<p>Thus, application of CaDrA to the analysis of YAP/TAZ activity in primary BRCA samples identified multiple new candidate drivers, with <italic>in vitro</italic> validation confirming the causal role of the top two candidates, <italic>RBCK1</italic> and <italic>RELN</italic>, in driving this activity. These results highlight our tool&#x2019;s ability to discover novel oncogenic genomic drivers.</p>
</sec>
<sec><title>Evaluation of CaDrA Reproducibility</title>
<p>Next, we sought to determine CaDrA&#x2019;s reproducibility, and how this may be influenced by the statistical significance of the returned meta-feature (as determined by permutation <italic>p</italic>-value). To do so, we implemented a sub-sampling procedure and applied it to the search for YAP/TAZ activity drivers in TCGA BRCAs. Specifically, the original meta-feature returned by the search on the full dataset and the meta-feature returned when performing the same search on a random subset (80%) of samples were compared by the Jaccard (<italic>J</italic>) index (see section &#x201C;Methods&#x201D;). We performed this sub-sampling search procedure both with respect to the original sample ranking (<xref ref-type="fig" rid="F5">Figure 5A</xref>), and with respect to a permuted sample ranking (<italic>n</italic> = 100 iterations each). Comparison of the resulting <italic>J</italic> index distributions yielded a significantly higher reproducibility of results when sub-sampling from the original sample ranking than from the randomly permuted one (Wilcox <italic>P</italic> &#x003C; 0.0001; <xref ref-type="fig" rid="F5">Figure 5D</xref>). These results support the conclusion that the CaDrA-based significance testing is a strong predictor of a search result&#x2019;s reproducibility, and a rigorous criterion to discriminate between true and false positives.</p>
<p>To systematically validate this conclusion, we extended the sub-sampling analysis to CaDrA queries of protein expression profiles across the nine different cancer types previously described. Briefly, for each cancer type we assessed whether the meta-features corresponding to the top five most-significant CaDrA protein queries (CaDrA <italic>P</italic> &#x2264; 0.05) were more reproducible than those corresponding to a randomly selected subset of five non-significant protein queries (CaDrA <italic>P</italic> > 0.05). To this end, the <italic>J</italic> index distribution obtained upon sub-sampling from the significant queries (<italic>n</italic> = 100 iterations each) was compared to the equivalent distribution from the non-significant queries, and a significantly higher reproducibility of the former was observed in all nine cancer types tested (Wilcox FDR &#x003C; 0.001; <xref ref-type="fig" rid="F6">Figure 6</xref>).</p>
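<p>The sub-sampling comparison described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure as implemented: the <italic>search</italic> argument is a hypothetical callable that takes a ranked list of sample identifiers and returns the names of the features in the resulting meta-feature, and the convention of returning 1 for two empty results is an assumption.</p>
<preformat>
```python
import random


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention when both searches return nothing
    return len(a.intersection(b)) / len(a.union(b))


def subsample_reproducibility(samples, search, fraction=0.8, iterations=100, seed=0):
    """Compare the full-data meta-feature with meta-features from random subsets.

    samples: sample identifiers, already sorted by the ranking variable
    search:  hypothetical callable(list of samples) -> iterable of feature names
    Returns one Jaccard index per sub-sampling iteration.
    """
    rng = random.Random(seed)
    full_result = search(samples)
    size = max(1, round(fraction * len(samples)))
    indices = []
    for _ in range(iterations):
        keep = set(rng.sample(range(len(samples)), size))
        # Filtering by position preserves the original ranking order.
        subset = [s for i, s in enumerate(samples) if i in keep]
        indices.append(jaccard(full_result, search(subset)))
    return indices
```
</preformat>
<p>The two resulting distributions of <italic>J</italic> indices (original versus permuted ranking, or significant versus non-significant queries) would then be compared with a rank-sum test, as described in the text.</p>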
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Pan-cancer sub-sampling analysis confirms agreement between CaDrA search significance and reproducibility of identified meta-features. CaDrA was applied to search for genomic alterations associated with elevated protein expression for all proteins profiled using RPPAs, for nine different cancer types in TCGA. Reproducibility by sub-sampling was then assessed for the top 5 significant (CaDrA <italic>P</italic> &#x2264; 0.05), and 5 non-significant (CaDrA <italic>P</italic> > 0.05) protein queries (see text). Consistency of CaDrA results was computed by the Jaccard (<italic>J</italic>) index of the returned meta-feature obtained with and without sub-sampling for each iteration, with the <italic>J</italic> indices pooled for the 5 significant and non-significant results, respectively. Box plots highlight a significantly higher <italic>J</italic> index coefficient among the significant protein queries compared to the non-significant queries across all cancer types investigated (Wilcox FDR &#x003C; 0.001).</p></caption>
<graphic xlink:href="fgene-10-00121-g006.tif"/>
</fig>
<p>Taken together, these results show that CaDrA-based significance testing is a strong predictor of a search result&#x2019;s reproducibility. Most importantly, it provides a statistically rigorous decision rule, which would not be available based on the sub-sampling results alone.</p>
</sec>
</sec>
<sec><title>Discussion</title>
<p>Identifying (epi)genetic drivers of molecular readouts is of fundamental importance to determining alternative mechanisms influencing the phenotype in question. Existing methods attempting to extract functionally relevant sets of genomic alterations associated with a given context either do not support the analysis of data beyond somatic mutations, do not incorporate multiple feature scoring functions and search modes, or do not implement rigorous statistical significance testing of the obtained results. Importantly, a computational framework bundling all of these features does not yet exist, and such a package could significantly help identify novel drivers of signature activity.</p>
<p>Here, we presented CaDrA as a tool that determines the subset of queried binary features most associated with a phenotypic signature of interest by specifically exploiting a stepwise heuristic search method. CaDrA was applied to identify both known and novel genomic drivers of sample signature activity, comprising drug sensitivity, protein expression and gene set activity estimates, using publicly available multi-omics datasets from cancer cell lines and primary tumors. Querying CCLE data for features associated with increased sensitivity to Mek/Raf inhibitors, CaDrA recovered known driver mutations in oncogenes known to be gate-keepers of MEK pathway activity, including <italic>NRAS</italic> and <italic>BRAF</italic>. Importantly, BRAF<sup><italic>V</italic>600<italic>E</italic></sup> mutations account for >90% of BRAF mutations and are generally found to be mutually exclusive with <italic>NRAS</italic> mutations (<xref ref-type="bibr" rid="B53">Sensi et al., 2006</xref>; <xref ref-type="bibr" rid="B6">Cantwell-Dorris et al., 2011</xref>), as also observed in the CCLE, highlighting CaDrA&#x2019;s ability to identify features exhibiting mutual exclusivity. Further, the large-scale investigation of expression profiles of annotated hallmark proteins in tumors from nine different cancer types in TCGA confirmed CaDrA&#x2019;s ability to systematically identify known mutations of oncogenes and tumor suppressor genes in human cancers, as defined in the COSMIC database.</p>
<p>Through our extensive evaluation on simulated data, we were able to highlight CaDrA&#x2019;s high sensitivity for mid-to-large sized datasets (<italic>N</italic> > 90), and high specificity for all sample sizes considered. Importantly, multi-omics datasets produced by networks such as CCLE and TCGA, also presented in this study, are well above this sample size limit. CaDrA&#x2019;s specificity was further evident when querying genetic drivers of increased sensitivity to treatment with PLX4720, a potent and selective inhibitor designed to preferentially inhibit active B-Raf protein bearing the V600E allele (<xref ref-type="bibr" rid="B59">Tsai et al., 2008</xref>). In this scenario, the search process correctly identified the BRAF<sup><italic>V</italic>600<italic>E</italic></sup> mutation as the sole feature associated with elevated sensitivity to treatment, in agreement with the known specificity of the small molecule inhibitor, with the feature association being highly statistically significant. It is important to emphasize that the evaluation of CaDrA&#x2019;s sensitivity and specificity crucially relied on the statistical testing procedure we defined, a feature missing in most of the other existing methods.</p>
<p>We were also able to demonstrate the utility of our framework in the discovery of novel drivers in human breast cancers. Specifically, we asked whether there were genomic alterations associated with elevated activity of Hippo pathway co-activators YAP/TAZ, known to control pro-tumorigenic signals in multiple cancer types (<xref ref-type="bibr" rid="B27">Hiemer et al., 2015</xref>; <xref ref-type="bibr" rid="B44">Moroishi et al., 2015</xref>; <xref ref-type="bibr" rid="B66">Zanconato et al., 2016</xref>). The mechanisms contributing to dysregulated YAP/TAZ activity in cancer remain poorly understood. To date, very few genomic alterations have been associated with driving tumorigenic YAP/TAZ activity (<xref ref-type="bibr" rid="B24">Harvey et al., 2013</xref>). Our CaDrA search with respect to a sample ranking of decreasing YAP/TAZ activity, as measured by the coordinated expression of YAP/TAZ-activated genes, yielded a meta-feature consisting of chromosomal deletions of 5q21.3 and 20p13, and mutations in <italic>RELN</italic>. Subsequent functional validation by knockdown of select targets, namely RELN and RBCK1, in HS578T BRCA cells exhibiting low YAP/TAZ activity resulted in a significant increase in the expression of canonical YAP/TAZ targets CTGF and CYR61. These results confirmed the selected targets&#x2019; involvement in the regulation of YAP/TAZ-mediated activity, and the capability of CaDrA to identify new drivers of pathway activity. Importantly, this case study highlights the capability of the method to integrate information, and discover targets pertaining to multiple DNA alteration types.</p>
<p>A sub-sampling-based assessment of CaDrA&#x2019;s results shows that the ability to recover reproducible meta-features was higher for the true (significant) YAP/TAZ activity ranking, compared to a randomly permuted sample ranking. This sub-sampling procedure was independently assessed using a systematic pan-cancer comparison of reproducibility results from significant and non-significant protein queries, which revealed a significantly higher concordance of the former compared to the latter in all cases tested. Together, these results confirm the agreement between the estimated permutation <italic>p</italic>-values and the reproducibility of the meta-features identified by CaDrA, and emphasize the importance of our statistical testing procedure in supporting normative decision making.</p>
<p>Previously developed methods have indeed been shown to aid in the selection of functionally relevant genomic features in cancer (<xref ref-type="bibr" rid="B13">Ciriello et al., 2012</xref>; <xref ref-type="bibr" rid="B60">Vandin et al., 2012</xref>; <xref ref-type="bibr" rid="B37">Leiserson et al., 2013</xref>, <xref ref-type="bibr" rid="B38">2015</xref>; <xref ref-type="bibr" rid="B34">Kim et al., 2016</xref>). However, CaDrA is to our knowledge the only method performing <italic>rank-based</italic> prediction in this context, which we believe is well-suited to: (i) model the noisy relationship between (epi)genetic alterations and a functional readout, and (ii) privilege the accurate prediction of highly ranked samples over lowly ranked samples, a desirable feature when modeling oncogenic activity. Furthermore, the framework as defined is flexible enough that non-rank-based scoring functions can be easily incorporated. We emphasize that the use of rank-based scoring functions, while advantageous for the reasons mentioned, relies on accurate stratification of samples based on the dependent variable to yield concordant associations for a given biological question. Thus, the soundness of predictions is dependent on the quality of signatures used to query the target profile of interest.</p>
<p>The method that most resembles CaDrA in its approach is REVEALER (<xref ref-type="bibr" rid="B34">Kim et al., 2016</xref>), an iterative search algorithm that functions in a similar fashion to CaDrA, while specifically seeking only those features that are mutually exclusive given the sample context. We note that a direct and rigorous comparison between CaDrA and REVEALER was not possible given the lack of a formal procedure to estimate statistical significance of results in the latter. We further emphasize that our tool defines a flexible framework capable of incorporating additional feature scoring functions, including the mutual information criterion implemented in REVEALER. Indeed, the incorporation of such scoring functions would benefit from the statistical significance estimation module built into CaDrA.</p>
<p>Current implementations of CaDrA and other similar methods are limited to the use of summarized input genomic features that are treated as binary events, denoting the presence or absence of a given mutation or SCNA in a sample. As we have demonstrated, this summarization approach is indeed sufficient to identify genomic feature sets that may drive the target profile of interest. However, since different types of point mutations (missense, truncating, etc.) may impose differing functional impacts in oncogenes versus tumor suppressor genes, we surmise that these methods could be further improved by qualitatively differentiating between the different types of alterations being considered. One possibility would be to separate mutations by predicted gain or loss-of-function, as well as to distinguish between low (1) and high (&#x2265;2) DNA copy number gains or losses, although this may lead to excessive sparsity in the input matrix for low-frequency point mutations and SCNAs.</p>
<p>While our evaluations focused on somatic mutations and SCNAs, CaDrA&#x2019;s search functionality can be applied to additional sequencing readouts capturing regulatory features, including, but not limited to, DNA methylation and microRNA expression, albeit with proper discretization of these continuous features. A joint analysis of these additional data types might provide insight into epigenetic mechanisms that complement the assessed genetic features in driving phenotypic variation. Furthermore, we envision the adoption of CaDrA for the study of germ-line variation as well, thus helping to move beyond the &#x201C;one feature at a time&#x201D; paradigm typical of GWAS, although issues of computational efficiency in that problem space will likely become more challenging.</p>
</sec>
<sec><title>Conclusion</title>
<p>CaDrA enables the efficient identification of subsets of genomic features, including somatic mutations and SCNAs, as candidate drivers of a pre-defined phenotypic variable. Given the rapid rise in the availability of multi-omics datasets, as well as an increased need to interrogate targeted molecular readouts within these contexts, we believe that our methodology will accelerate feature prioritization for further follow-up and consideration, in turn aiding in the discovery of potential drivers of the phenotype of interest. Thus, we propose CaDrA as a tool for both targeted hypothesis testing and novel hypothesis generation.</p>
</sec>
<sec><title>Methods</title>
<sec><title>The CaDrA Algorithm</title>
<p>An overview of CaDrA&#x2019;s workflow is summarized in <xref ref-type="fig" rid="F1">Figure 1</xref>. CaDrA takes as input the sample ranking induced by a sample-specific measurement, a matrix of binary features (1/0 indicating the presence/absence of a given feature in a sample), and a scoring method specification to measure the significance of the concordance between the occurrence of alteration events and the defined sample ranking. The pre-defined sample ranking can be based on quantitative estimates of gene expression, signature or pathway activity, or other experimentally derived measurements. Each row in the matrix of binary features denotes the presence or absence of a somatic alteration (mutation, CNA, or other) in each of the samples in the ranked cohort. The score function is a measure of the <italic>left-skewness</italic> of a binary vector with respect to the sample ranking. The more the occurrences of an alteration are skewed toward higher rankings (i.e., the more the 1&#x2019;s in the feature vector are skewed toward the left), the higher the score. The scores currently implemented are the KS test (default), and the Wilcoxon rank-sum test, but additional scoring functions can easily be added.</p>
<p>Given the sample ranking, the matrix of binary features, and the score of choice (KS or Wilcoxon), CaDrA implements a step-wise greedy search: it begins by first selecting the single feature that maximizes the score (Step 1; <xref ref-type="fig" rid="F1">Figure 1</xref>). It then generates the union (logical OR) of this starting feature with every other remaining feature in the dataset and computes scores for the obtained &#x2018;meta-features&#x2019; (Step 2; <xref ref-type="fig" rid="F1">Figure 1</xref>); it selects a 2nd feature that, added to the first (as a union), maximally increases the score &#x2013; which will then serve as the new top reference hit (Step 3; <xref ref-type="fig" rid="F1">Figure 1</xref>). Repeating this process until no further improvement to the cumulative score can be attained, the search output is a set of features (i.e., a meta-feature) whose union has the (local) maximum skewness score with respect to the input sample ranking. The significance of a CaDrA search and its cumulative score are determined by generating an empirical null distribution of scores based on the exact same data and search parameters, but with randomly permuted sample rankings, providing a permutation <italic>p</italic>-value per search result. Since the CaDrA algorithm specifically returns feature-sets maximally left-skewed given the provided sample ranking variable, it can be applied to identify features that are either positively correlated or anti-correlated with the continuous variable of interest by ranking samples in decreasing or increasing order of that variable, respectively.</p>
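<p>The step-wise search and permutation test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the CaDrA R implementation: the score is a scaled one-sided two-sample KS statistic used as a stand-in for the KS test, features are held in a dictionary mapping hypothetical feature names to 0/1 lists whose columns are already sorted by the sample ranking, and the p-value uses the common add-one correction.</p>
<preformat>
```python
import math
import random


def ks_score(feature):
    """Scaled one-sided two-sample KS statistic; larger means more left-skewed.

    Assumption: this stands in for the KS test-based score named in the text.
    """
    n = len(feature)
    hits = sum(feature)
    misses = n - hits
    if hits == 0 or misses == 0:
        return 0.0
    running, best = 0.0, 0.0
    for value in feature:
        # Step up at each event, down at each non-event, left to right.
        running += 1.0 / hits if value else -1.0 / misses
        best = max(best, running)
    return best * math.sqrt(hits * misses / n)


def union(a, b):
    """Logical OR of two binary vectors."""
    return [int(x or y) for x, y in zip(a, b)]


def greedy_search(features, score=ks_score, start=None):
    """Return (selected feature names, cumulative score) of the meta-feature."""
    if start is None:
        start = max(features, key=lambda name: score(features[name]))  # Step 1
    selected = [start]
    meta = list(features[start])
    best = score(meta)
    while True:
        candidates = {
            name: score(union(meta, vec))  # Step 2: score every union
            for name, vec in features.items()
            if name not in selected
        }
        if not candidates:
            break
        name = max(candidates, key=candidates.get)
        if candidates[name] > best:  # Step 3: keep only strict improvements
            selected.append(name)
            meta = union(meta, features[name])
            best = candidates[name]
        else:
            break
    return selected, best


def permutation_p_value(features, observed, n_perm=500, seed=0, **search_kwargs):
    """Empirical p-value of `observed` against searches on permuted rankings."""
    rng = random.Random(seed)
    n = len(next(iter(features.values())))
    exceed = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # permuting the ranking is a column permutation
        permuted = {name: [vec[i] for i in order] for name, vec in features.items()}
        _, null_score = greedy_search(permuted, **search_kwargs)
        if null_score >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```
</preformat>
<p>To search for features anti-correlated with the variable of interest, the same functions would be applied after sorting the columns in increasing rather than decreasing order of that variable.</p>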
</sec>
<sec><title>CaDrA Features</title>
<sec><title>Search Modes</title>
<p>CaDrA supports multiple search modalities: it allows for the selection of a user-specified feature from which to start the search (rather than selecting the feature with highest score as depicted in Step 1 of <xref ref-type="fig" rid="F1">Figure 1</xref>); alternatively, since the greedy search is not guaranteed to find the global maximum, it also allows for a &#x201C;top-N&#x201D; search modality, whereby the search is started from each of the first N features (as measured by their individual skewness scores), and the result of the best search can be determined by selecting the set of features with the best cumulative score over the top-N runs.</p>
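<p>The top-N modality can be illustrated with the short Python sketch below. It is a minimal sketch with hypothetical names: <italic>score</italic> is any callable returning a left-skewness score for a binary vector, and <italic>search</italic> is any callable that runs a greedy search from a given starting feature and returns the selected feature names together with the cumulative score.</p>
<preformat>
```python
def top_n_search(features, score, search, top_n):
    """Run `search` from each of the `top_n` best single features; keep the best run.

    features: dict name -> 0/1 list, columns sorted by the sample ranking
    score:    hypothetical callable(list) -> float, larger is more left-skewed
    search:   hypothetical callable(features, start) -> (names, cumulative_score)
    Returns (best_run, all_runs) so that overlap across runs can be inspected.
    """
    # Rank features by their individual scores, best first.
    ranked = sorted(features, key=lambda name: score(features[name]), reverse=True)
    runs = [search(features, start) for start in ranked[:top_n]]
    best_run = max(runs, key=lambda run: run[1])
    return best_run, runs
```
</preformat>
<p>Returning all runs alongside the best one makes it possible to check how often the same features recur across different starting points.</p>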
</sec>
<sec><title>Visualization of Search Results</title>
<p>For a given search, CaDrA outputs a set of features (meta-feature), which can be visualized as a &#x2018;meta-plot&#x2019;. This includes (panels from top to bottom): an area plot of the sample-specific measurements used to obtain the sample ranks; a color-coded matrix of all features in the meta-feature (in the step-wise order that they were added), one feature per row, with the corresponding union of the meta-feature (red) last; and a corresponding enrichment score (ES) plot below. Additionally, top-N search results can be visualized for overlapping features to evaluate robustness across different search starting points.</p>
</sec>
<sec><title>Parallelization Support</title>
<p>The generation of the empirical null distribution for significance testing is typically done for &#x2265;500 iterations (i.e., permuted sample ranks). In order to speed up this potentially time-consuming task, CaDrA supports exploiting parallel computing with the help of the parallel R package functionality, should multiple compute cores be available to users.</p>
</sec>
<sec><title>Permutation Caching</title>
<p>Since the generation of the null distribution used for significance testing is a time-consuming step, and since the null distribution of scores depends solely on the feature dataset and the search parameters specified (scoring method, starting feature versus top-N search mode, etc.), and not on the input sample ranking, we can implement caching of the null distribution corresponding to each dataset and search parameters. When submitting multiple subsequent queries (each with its own sample ranking) that utilize the same dataset and search criteria, CaDrA can then fetch the corresponding cached null distribution to generate permutation <italic>p</italic>-values almost instantaneously, avoiding the need for repetitive computation, thus significantly reducing overall query run time.</p>
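<p>The caching idea can be sketched in Python as follows. This is an illustrative sketch only: the in-memory dictionary, the hashing scheme and all function names are assumptions, and <italic>search</italic> is a hypothetical callable that returns the cumulative score of a search on a column-permuted feature matrix. Because the cache key is built from the feature data and search parameters alone, any number of sample rankings can reuse the same null distribution.</p>
<preformat>
```python
import hashlib
import json
import random

_NULL_CACHE = {}  # assumed in-memory store; a file-based cache would also work


def cache_key(features, params):
    """Hash the feature matrix and search parameters (not the sample ranking)."""
    payload = json.dumps({"features": features, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def null_scores(features, params, search, n_perm=500, seed=0):
    """Return cached null scores, computing them only on the first request.

    features: dict name -> 0/1 list
    params:   JSON-serializable keyword arguments forwarded to `search`
    search:   hypothetical callable(features, **params) -> cumulative score
    """
    key = cache_key(features, {**params, "n_perm": n_perm, "seed": seed})
    if key not in _NULL_CACHE:
        rng = random.Random(seed)
        n = len(next(iter(features.values())))
        scores = []
        for _ in range(n_perm):
            order = list(range(n))
            rng.shuffle(order)  # one random sample ranking per iteration
            permuted = {name: [vec[i] for i in order] for name, vec in features.items()}
            scores.append(search(permuted, **params))
        _NULL_CACHE[key] = scores
    return _NULL_CACHE[key]


def permutation_p_value(observed, null):
    """Add-one empirical p-value of an observed score against the null scores."""
    exceed = sum(1 for s in null if s >= observed)
    return (exceed + 1) / (len(null) + 1)
```
</preformat>
<p>A second query on the same dataset and parameters then only requires a call to the p-value function with its own observed score, which is why repeated queries return almost immediately.</p>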
</sec>
</sec>
<sec><title>Data Availability and Processing</title>
<p>CaDrA is freely available for download and use as a documented R package under the git repository <ext-link ext-link-type="uri" xlink:href="https://github.com/montilab/CaDrA">https://github.com/montilab/CaDrA</ext-link>, and will further be deposited and maintained for future use under Bioconductor, including complete code and example use-cases.</p>
<p>DNA copy number (GISTIC2), mutation and RPPA data for TCGA analyses were obtained using Firehose v0.4.3 corresponding to the Jan 28th, 2016 (SCNA and somatic mutations) and Jul 15th, 2016 (RPPA) Firehose release. Somatic mutation data was processed at the gene level by assigning either 1 or 0 based on the presence or absence of any given mutation in that gene, respectively (excluding synonymous mutations). Annotated Level 3 RPPA data was used for all protein-related TCGA data queries. For pan-cancer analyses, these three data sets were obtained for nine cancer types, including bladder urothelial carcinoma (BLCA), breast invasive carcinomas (BRCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), and prostate adenocarcinoma (PRAD). RNASeq version 2 data processed as Level 3 RSEM-normalized gene expression values corresponding to the Feb 4th, 2015 Firehose release was used for the TCGA BRCA analysis. CCLE genomic data were downloaded from <ext-link ext-link-type="uri" xlink:href="https://portals.broadinstitute.org/ccle">https://portals.broadinstitute.org/ccle</ext-link> and processed as previously described (<xref ref-type="bibr" rid="B34">Kim et al., 2016</xref>). Somatic mutation binary calls per gene were used as is, and SCNA data was processed using GISTIC2 (<xref ref-type="bibr" rid="B41">Mermel et al., 2011</xref>) with all default parameters barring the confidence level, which was set to 99%. ActArea estimates pertaining to drug treatment sensitivity across CCLE samples were used as previously described (<xref ref-type="bibr" rid="B2">Barretina et al., 2012</xref>).</p>
<p>In all cases presented, SCNA and somatic mutation data were jointly analyzed as a single input dataset to CaDrA, thereby including only samples for which both data types were available. All input data to CaDrA were further pre-filtered to exclude features with alteration frequencies below 3% or above 60% across samples, in order to reduce feature sparsity and redundancy, respectively (CaDrA&#x2019;s default feature pre-filtering settings).</p>
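<p>The frequency-based pre-filtering step can be illustrated with a minimal Python sketch. This is not the CaDrA R implementation: the function name <italic>prefilter_features</italic>, the dictionary-of-lists data layout, and the toy feature names are assumptions made for illustration, and the bounds are treated as inclusive because only frequencies below 3% or above 60% are excluded.</p>
<preformat>
```python
def prefilter_features(features, min_freq=0.03, max_freq=0.60):
    """Keep binary features whose alteration frequency lies in [min_freq, max_freq].

    `features` maps a feature name to a list of 0/1 calls, one per sample.
    (Hypothetical layout; CaDrA itself operates on R objects.)
    """
    kept = {}
    for name, calls in features.items():
        freq = sum(calls) / len(calls)  # fraction of altered samples
        if max_freq >= freq >= min_freq:
            kept[name] = calls
    return kept


# Toy example with 10 samples and made-up feature names.
toy = {
    "GENE_A_MUT": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 10% altered: kept
    "GENE_B_AMP": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # 70% altered: too frequent
    "GENE_C_DEL": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 0% altered: too sparse
}
filtered = prefilter_features(toy)
```
</preformat>
<p>In this toy example only the first feature survives the filter; the other two fall outside the 3% to 60% window.</p>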
</sec>
<sec><title>Simulated Data Generation</title>
<p>To evaluate both the sensitivity and specificity of CaDrA, we generated simulated data to represent cases where there was a mix of left-skewed (&#x201C;true positive&#x201D;) and randomly distributed (&#x201C;null&#x201D;) features, as well as cases where there were only null features. The left-skewness of a feature is a measure of its association with the sample ranking, since samples are sorted from left (high rank) to right (low rank). The design and parameter specification of the simulated data matrix are shown in <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S1</xref>. Each feature/row is a binary (0/1) vector, with 1 (0) in the <italic>i</italic>th position denoting the occurrence (non-occurrence) of the genetic event (e.g., SCNA or mutation) in the <italic>i</italic>th sample. This simulation of binary features relies on the following parameters:</p>
<list list-type="simple" prefix-word="simple">
<list-item><p><italic>N</italic>: Dataset sample size (number of columns in the matrix).</p></list-item>
<list-item><p><italic>n</italic>: Total number of features in the dataset (number of rows in the matrix).</p></list-item>
<list-item><p><italic>p</italic>: Number of true positive features generated per dataset [a positive feature is a feature whose distribution of events (i.e., the number of 1&#x2019;s) is significantly associated with the sample ranking, i.e., left-skewed].</p></list-item>
<list-item><p><italic>f</italic>: Left-skew proportion. The proportion of samples that are <italic>cumulatively</italic> left-skewed in the sample ranking.</p></list-item>
<list-item><p>&#x03BB;: The mean (and variance) of the Poisson distribution from which the number of events in the null features is sampled. This is equal to the number of 1&#x2019;s per skewed positive feature. A Poisson distribution is used so that we can partially control (through the mean) the number of 1&#x2019;s in a null feature, which are then uniformly distributed across samples (see description of Null feature generation below).</p></list-item>
</list>
<p>The resulting simulated binary data matrix will consist of two main types of features:</p>
<list list-type="simple" prefix-word="simple">
<list-item><p><italic>True Positive (TP) Features:</italic> A total of <italic>p</italic> TP features are generated. Events (i.e., 1&#x2019;s) are assigned to the TP features in a mutually exclusive fashion: each feature has (<italic>f &#x00D7; N</italic>)<italic>/p</italic> entries set to 1, and the cumulative OR of all TP features yields an <italic>N</italic>-sized vector with the left-most <italic>f &#x00D7; N</italic> entries set to 1. For example, if we generate data for 100 samples and 5 positive features, with the left-skew proportion set to 0.5, each non-overlapping feature will have 10 among the 50 left-most entries (columns) set to 1, such that the union (logical OR) of the 5 features will have 1&#x2019;s in the first 50 entries.</p></list-item>
<list-item><p><italic>Null Features:</italic> A total of (<italic>n&#x2013;p</italic>) null features are generated. To generate these features, we sample the number of 1&#x2019;s per null feature from a Poisson distribution with mean parameter &#x03BB; = (<italic>f &#x00D7; N</italic>)/<italic>p</italic>. In this fashion, the number of 1&#x2019;s in the null features will have a distribution centered on the corresponding number for the TP features. For instance, if we generate data for 100 samples and 5 TP features out of <italic>n</italic> = 1000 total features, with left-skew proportion <italic>f</italic> = 0.5, then each of the TP features will have ten 1&#x2019;s, and each of the remaining 995 null features will have a number of 1&#x2019;s sampled from Poisson (&#x03BB; = 10), uniformly distributed over the <italic>N</italic> samples.</p></list-item>
</list>
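<p>The generation scheme described above can be sketched in Python as follows. This is an illustrative reconstruction, not the code used in the study: the function names, the random assignment of left-most columns to TP features, the capping of null event counts at <italic>N</italic>, and the use of Knuth&#x2019;s method for Poisson sampling are all assumptions.</p>
<preformat>
```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_matrix(N=100, n=1000, p=5, f=0.5, seed=0):
    """Return n binary rows of length N: p TP features, then n - p null features."""
    rng = random.Random(seed)
    n_left = round(f * N)  # number of left-most (top-ranked) columns
    if n_left % p != 0:
        raise ValueError("f * N must be divisible by p in this sketch")
    per_tp = n_left // p  # 1's per TP feature, (f x N) / p

    # Assumption: left-most columns are split at random among the TP features,
    # so the features are mutually exclusive and their OR covers all of them.
    left = list(range(n_left))
    rng.shuffle(left)
    rows = []
    for j in range(p):
        row = [0] * N
        for i in left[j * per_tp:(j + 1) * per_tp]:
            row[i] = 1
        rows.append(row)

    # Null features: Poisson-distributed event count, uniform positions.
    lam = per_tp
    for _ in range(n - p):
        k = min(sample_poisson(lam, rng), N)  # cap at N (assumption)
        row = [0] * N
        for i in rng.sample(range(N), k):
            row[i] = 1
        rows.append(row)
    return rows


matrix = simulate_matrix()
# Logical OR of the 5 TP features across the 100 samples.
tp_union = [int(any(row[i] for row in matrix[:5])) for i in range(100)]
```
</preformat>
<p>With the default arguments, the union of the five TP rows has 1&#x2019;s in exactly the first 50 columns, matching the worked example in the text.</p>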
<p>A schematic representation of these data, along with the parameters that define their composition, is shown in <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S1</xref>.</p>
</sec>
<sec><title>Evaluation of CaDrA Performance on Simulated Data</title>
<p>CaDrA performance was evaluated under two main scenarios: (a) True positive datasets: data containing both true positive and null features (where the sensitivity of CaDrA is tested); and (b) Null datasets: data containing only null features (where the specificity of CaDrA is tested). The following parameter specifications were used for data generation:</p>
<list list-type="simple" prefix-word="simple">
<list-item><p><italic>N</italic> = &#x007B;50, 60, 70, 80, 90, 100, 250, and 500&#x007D;</p></list-item>
<list-item><p><italic>n</italic> = 1000</p></list-item>
<list-item><p><italic>p</italic> = 5</p></list-item>
<list-item><p><italic>f</italic> = 0.5</p></list-item>
</list>
<p>CaDrA was run using default input parameters, returning the best-scoring meta-feature along with a permutation <italic>p</italic>-value based on the empirical null search distribution (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref>). These results were then used to determine performance estimates for different sample sizes, as well as the composition (i.e., distribution of TP versus null features per returned meta-feature), size (i.e., the number of features within the returned meta-feature) and statistical significance of the returned meta-features. Mean TPR percentages shown in <xref ref-type="table" rid="T1">Table 1</xref> are weighted averages of the TPRs corresponding to different numbers of true positive features per meta-feature, weighted by the number of searches returning such meta-features (gray circles, <xref ref-type="fig" rid="F2">Figure 2C</xref>). Mean FPR percentages shown in <xref ref-type="table" rid="T1">Table 1</xref> are weighted averages of the FPRs corresponding to different meta-feature sizes, weighted by the number of searches returning such meta-features (gray circles, <xref ref-type="fig" rid="F2">Figure 2D</xref>).</p>
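<p>The weighted averaging used for the mean TPR and FPR values amounts to a standard weighted mean, sketched below in Python. The helper name <italic>weighted_mean</italic> and the numbers in the example are hypothetical and are not taken from Table 1 or Figure 2.</p>
<preformat>
```python
def weighted_mean(values, weights):
    """Weighted mean: sum(v * w) / sum(w)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical rates (in percent) for three groups of meta-features, and the
# number of searches that returned a meta-feature in each group.
rate_by_group = [100.0, 80.0, 50.0]
searches_per_group = [60, 30, 10]
mean_rate = weighted_mean(rate_by_group, searches_per_group)
```
</preformat>
<p>Groups returned by many searches therefore dominate the reported mean, whereas rarely returned groups contribute little.</p>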
</sec>
<sec><title>COSMIC Enrichment Analyses</title>
<p>For enrichment analyses, RPPA protein data for the nine cancer types (see section &#x201C;Data Availability and Processing&#x201D;) were first restricted to those proteins representing hallmark oncogenes or tumor suppressor genes included in the COSMIC v84 database (<italic>n</italic> = 57)<sup><xref ref-type="fn" rid="fn01">1</xref></sup> (<xref ref-type="bibr" rid="B22">Forbes et al., 2017</xref>). For each cancer type, a CaDrA query was then performed with respect to the protein expression-induced sample ranking, using somatic mutation and copy number alteration data as input features, in order to search for features associated with elevated protein expression of each of the hallmark proteins queried. The selected features were then pooled across all queries, and the resulting gene list was tested for significant enrichment (based on the hypergeometric distribution) with respect to a set of annotated oncogenes and tumor suppressor genes in COSMIC (<italic>n</italic> = 554), compared to the pooled list of non-selected features.</p>
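<p>A hypergeometric enrichment test of this kind can be sketched in Python using only the standard library. The function name and argument names are assumptions, and the exact gene universe and counts used in the study are not reproduced here.</p>
<preformat>
```python
from math import comb


def hypergeom_upper_tail(k, M, K, n):
    """P(X >= k) when drawing n genes from a universe of M genes,
    K of which are annotated (e.g., COSMIC oncogenes/tumor suppressors).

    k is the observed number of annotated genes among the n selected ones.
    """
    upper = min(K, n)
    total = comb(M, n)
    favorable = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Toy universe: 4 genes, 2 annotated; both of the 2 selected genes are annotated.
p_toy = hypergeom_upper_tail(2, 4, 2, 2)
```
</preformat>
<p>In the toy case there is a single favorable draw out of six possible pairs, so the upper-tail probability is 1/6.</p>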
</sec>
<sec><title>Sub-Sampling Analyses</title>
<p>For all sub-sampling analyses presented, CaDrA was run after sub-sampling 80% of the original data, with consistency of CaDrA results computed as the Jaccard (<italic>J</italic>) index of the returned meta-features obtained with and without sub-sampling (repeated for <italic>n</italic> = 100 independent sub-sampling iterations). To assess reproducibility of drivers associated with YAP/TAZ activity, the search was repeated either by preserving the observed ranking (decreasing YAP/TAZ activity) or by using a permuted ranking. <italic>J</italic> indices were then compared between the original and permuted ranking cases using a Wilcoxon rank-sum test. For the pan-cancer protein query analysis, all available proteins profiled as part of the RPPA data were used, with <italic>J</italic> indices similarly computed for the top 5 protein queries that yielded significant meta-features (<italic>P</italic> &#x2264; 0.05), and 5 queries randomly selected from the non-significant list (<italic>P</italic> > 0.05) in each cancer type. <italic>J</italic> indices were then pooled for the five significant and the five non-significant results, respectively, and compared using a Wilcoxon rank-sum test. FDR correction was applied to all tests of significance in the pan-cancer analyses.</p>
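<p>The sub-sampling consistency measure can be sketched in Python as follows. The <italic>search</italic> argument stands in for a CaDrA run and is a hypothetical callable that takes a list of sample identifiers and returns the set of feature names in the resulting meta-feature; <italic>toy_search</italic> is a made-up placeholder, not a real search.</p>
<preformat>
```python
import random


def jaccard(a, b):
    """Jaccard index |A intersect B| / |A union B| of two collections."""
    a, b = set(a), set(b)
    union = a.union(b)
    if not union:
        return 1.0  # convention for two empty sets (assumption)
    return len(a.intersection(b)) / len(union)


def subsample_consistency(sample_ids, search, fraction=0.8, n_iter=100, seed=0):
    """Jaccard indices between the full-data result and each sub-sampled result."""
    rng = random.Random(seed)
    reference = search(sample_ids)
    size = round(fraction * len(sample_ids))
    scores = []
    for _ in range(n_iter):
        subset = rng.sample(sample_ids, size)  # sampling without replacement
        scores.append(jaccard(reference, search(subset)))
    return scores


def toy_search(sample_ids):
    """Placeholder search: feature F3 is only found when sample S0 is present."""
    features = {"F1", "F2"}
    if "S0" in sample_ids:
        features.add("F3")
    return features


samples = ["S%d" % i for i in range(50)]
scores = subsample_consistency(samples, toy_search)
```
</preformat>
<p>Each entry of <italic>scores</italic> is the <italic>J</italic> index for one iteration; the resulting distributions for two conditions could then be compared with a rank-based test, as described above.</p>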
</sec>
<sec><title>YAP/TAZ Signature Projection and Assessment in TCGA BRCAs</title>
<p>A signature comprising YAP/TAZ-activating genes (<italic>n</italic> = 717) in MDA-MB-231 cells was obtained based on a previous study (<xref ref-type="bibr" rid="B20">Enzo et al., 2015</xref>). The TCGA BRCA RNASeq data (<italic>n</italic> = 1,186 samples) were projected onto the signature genes and per-sample estimates of YAP/TAZ activity were derived using ASSIGN (<xref ref-type="bibr" rid="B54">Shen et al., 2015</xref>); these estimates were then used as a continuous ranking variable with CaDrA. The association of YAP/TAZ activity with the CaDrA-derived meta-feature, and with BRCA subtype (i.e., TN status), was determined using a linear regression model.</p>
</sec>
<sec><title>Cell Culture, siRNA Knockdown and qRT-PCR</title>
<p>HS578T BRCA cells were purchased from ATCC and cultured using media and conditions suggested by ATCC. For RNA interference, cells were transfected using RNAiMAX (Thermo Fisher) with control siRNA (Qiagen, 1027310) or an equimolar mixture of siRNA targeting RELN (Sigma), RBCK1 (Sigma), or TAZ and YAP (<xref ref-type="bibr" rid="B26">Hiemer et al., 2014</xref>). At 48 h post-transfection, RNA was extracted from cells using the RNeasy kit (Qiagen) and cDNA was synthesized as previously described (<xref ref-type="bibr" rid="B26">Hiemer et al., 2014</xref>). Quantitative real-time PCR (qRT-PCR) was performed using Taqman Universal master mix II (Thermo Fisher) and measured on a ViiA 7 real-time PCR system. Taqman probes used included those recognizing CTGF (Thermo Fisher Hs00170014_m1), CYR61 (Thermo Fisher Hs00155479_m1), RELN (Thermo Fisher Hs01022646_m1), RBCK1 (Thermo Fisher Hs00934608_m1), WWTR1 (Thermo Fisher Hs01086149_m1), YAP (Thermo Fisher Hs00902712_g1), and GAPDH (Thermo Fisher 4326317E). Expression levels of each gene were calculated using the &#x0394;&#x0394;Ct method and normalized to GAPDH. Knockdown efficiency of YAP, TAZ, RELN, and RBCK1 was verified for each experiment. Mean transcriptional knockdown of YAP, TAZ, and RBCK1 in HS578T cells was >80%. Basal RELN levels in HS578T cells were low, and relative knockdown in these cells was 28.3% (&#x00B1;14.1). Data from qRT-PCR experiments are shown as mean &#x00B1; S.D., with each knockdown compared with respect to the scrambled siRNA control (siCtl) using an unpaired, two-tailed Student&#x2019;s <italic>t</italic>-test.</p>
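<p>For reference, the standard &#x0394;&#x0394;Ct calculation can be written as a short Python sketch. It assumes roughly 100% amplification efficiency (a doubling per cycle); the function name and the Ct values are hypothetical and do not come from the experiments reported here.</p>
<preformat>
```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctl, ct_ref_ctl):
    """Relative expression 2 ** (-ddCt) of a target gene in knockdown vs control.

    dCt  = Ct(target) - Ct(reference, e.g., GAPDH), computed per condition
    ddCt = dCt(knockdown) - dCt(control)
    """
    delta_kd = ct_target_kd - ct_ref_kd
    delta_ctl = ct_target_ctl - ct_ref_ctl
    return 2.0 ** -(delta_kd - delta_ctl)


# Hypothetical Ct values: the target appears 3 cycles later after knockdown,
# while the reference gene is unchanged, so ddCt = 3.
rel = relative_expression(27.0, 18.0, 24.0, 18.0)
knockdown_pct = (1.0 - rel) * 100.0
```
</preformat>
<p>In this made-up example the relative expression is 0.125, which corresponds to a knockdown of 87.5%.</p>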
</sec>
<sec><title>CaDrA Search Parameters</title>
<p>For evaluation using genomic data, CaDrA was run in the top-N mode using the default of <italic>N</italic> = 7, choosing the best resulting meta-feature (see section &#x201C;Methods&#x201D;; CaDrA features: Search modes). For evaluation of simulated data, only the top-scoring feature was considered as a starting feature per search run (i.e., <italic>N</italic> = 1). The &#x201C;ks&#x201D; method was chosen for evaluating skewness of features at each step in all cases presented. All other default input search parameters were used for all cases presented.</p>
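<p>To give an intuition for how a Kolmogorov-Smirnov-type statistic can capture left-skewness of a binary feature, the following Python sketch computes a one-sided statistic comparing the positions of the 1&#x2019;s against a uniform spread over the ranked samples. It is a simplified illustration under stated assumptions and should not be read as the scoring function implemented by CaDrA&#x2019;s &#x201C;ks&#x201D; method; the function name is hypothetical.</p>
<preformat>
```python
def left_skew_ks(feature):
    """One-sided KS statistic D+ for a 0/1 vector ordered from high to low rank.

    Compares the empirical CDF of the positions of 1's with the CDF of a
    uniform distribution over the N ranked samples. Larger values indicate
    that events are concentrated toward the left (top-ranked samples).
    """
    N = len(feature)
    positions = [i + 1 for i, value in enumerate(feature) if value == 1]
    m = len(positions)
    if m == 0:
        return 0.0
    return max((j + 1) / m - pos / N for j, pos in enumerate(positions))


left_heavy = [1] * 10 + [0] * 90   # all events among the top-ranked samples
right_heavy = [0] * 90 + [1] * 10  # all events among the bottom-ranked samples
d_left = left_skew_ks(left_heavy)
d_right = left_skew_ks(right_heavy)
```
</preformat>
<p>The left-heavy vector yields a statistic of about 0.9, whereas the right-heavy vector yields 0, which illustrates why such a statistic is suited to scoring association with the sample ranking.</p>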
</sec>
</sec>
<sec><title>Availability of Data and Material</title>
<p>The datasets generated and/or analyzed during the current study are available in the TCGA repository (<ext-link ext-link-type="uri" xlink:href="https://tcga-data.nci.nih.gov/docs/publications/tcga">https://tcga-data.nci.nih.gov/docs/publications/tcga</ext-link>) and the CCLE repository (<ext-link ext-link-type="uri" xlink:href="https://portals.broadinstitute.org/ccle">https://portals.broadinstitute.org/ccle</ext-link>), and from the corresponding author on reasonable request.</p>
</sec>
<sec><title>Author Contributions</title>
<p>VK developed the R package and conducted the analyses. VK and SM wrote the manuscript, with input from PS and XV. JK performed the siRNA and qRT-PCR experiments. LZ assisted in obtaining the gene expression signature for TCGA data projection. PS assisted in the evaluation of CaDrA on simulated data. SM and VK designed the CaDrA framework and features, and interpreted the results. XV designed the experimental validation of novel candidate drivers, and interpreted the results thereof. All authors read and approved the final manuscript.</p>
</sec>
<sec><title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This work was supported by National Institutes of Health NIDCR fellowship F31 DE025536 (VK), CDMRP grant W81XWH-14-1-0336 (XV), the Dahod breast cancer research program at Boston University School of Medicine (XV and SM), as well as the Clinical and Translational Science Institute (supported by Clinical and Translational Research Award CTSA grant UL1-TR001430) at Boston University School of Medicine (SM). The funding sources played no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of this manuscript.</p>
</fn>
</fn-group>
<ack>
<p>We would like to thank Joshua Klein for making suggestions toward the implementation of specific package features. We further acknowledge dbGaP for granting access to the TCGA data (phs000178.v9.p8).</p>
</ack>
<sec sec-type="supplementary material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2019.00121/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2019.00121/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ardlie</surname> <given-names>K. G.</given-names></name> <name><surname>Deluca</surname> <given-names>D. S.</given-names></name> <name><surname>Segre</surname> <given-names>A. V.</given-names></name> <name><surname>Sullivan</surname> <given-names>T. J.</given-names></name> <name><surname>Young</surname> <given-names>T. R.</given-names></name> <name><surname>Gelfand</surname> <given-names>E. T.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans.</article-title> <source><italic>Science</italic></source> <volume>348</volume> <fpage>648</fpage>&#x2013;<lpage>660</lpage>. <pub-id pub-id-type="doi">10.1126/science.1262110</pub-id> <pub-id pub-id-type="pmid">25954001</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barretina</surname> <given-names>J.</given-names></name> <name><surname>Caponigro</surname> <given-names>G.</given-names></name> <name><surname>Stransky</surname> <given-names>N.</given-names></name> <name><surname>Venkatesan</surname> <given-names>K.</given-names></name> <name><surname>Margolin</surname> <given-names>A. A.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity.</article-title> <source><italic>Nature</italic></source> <volume>483</volume> <fpage>603</fpage>&#x2013;<lpage>607</lpage>. <pub-id pub-id-type="doi">10.1038/nature11003</pub-id> <pub-id pub-id-type="pmid">22460905</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bea</surname> <given-names>S.</given-names></name> <name><surname>Zettl</surname> <given-names>A.</given-names></name> <name><surname>Wright</surname> <given-names>G.</given-names></name> <name><surname>Salaverria</surname> <given-names>I.</given-names></name> <name><surname>Jehn</surname> <given-names>P.</given-names></name> <name><surname>Moreno</surname> <given-names>V.</given-names></name><etal/></person-group> (<year>2005</year>). <article-title>Diffuse large B-cell lymphoma subgroups have distinct genetic profiles that influence tumor biology and improve gene-expression&#x2013;based survival prediction.</article-title> <source><italic>Blood</italic></source> <volume>106</volume> <fpage>3183</fpage>&#x2013;<lpage>3190</lpage>. <pub-id pub-id-type="doi">10.1182/blood-2005-04-1399</pub-id> <pub-id pub-id-type="pmid">16046532</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bild</surname> <given-names>A. H.</given-names></name> <name><surname>Yao</surname> <given-names>G.</given-names></name> <name><surname>Chang</surname> <given-names>J. T.</given-names></name> <name><surname>Wang</surname> <given-names>Q.</given-names></name> <name><surname>Potti</surname> <given-names>A.</given-names></name> <name><surname>Chasse</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2006</year>). <article-title>Oncogenic pathway signatures in human cancers as a guide to targeted therapies.</article-title> <source><italic>Nature</italic></source> <volume>439</volume> <fpage>353</fpage>&#x2013;<lpage>357</lpage>. <pub-id pub-id-type="doi">10.1038/nature04296</pub-id> <pub-id pub-id-type="pmid">16273092</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burotto</surname> <given-names>M.</given-names></name> <name><surname>Chiou</surname> <given-names>V. L.</given-names></name> <name><surname>Lee</surname> <given-names>J. M.</given-names></name> <name><surname>Kohn</surname> <given-names>E. C.</given-names></name></person-group> (<year>2014</year>). <article-title>The MAPK pathway across different malignancies: a new perspective.</article-title> <source><italic>Cancer</italic></source> <volume>120</volume> <fpage>3446</fpage>&#x2013;<lpage>3456</lpage>. <pub-id pub-id-type="doi">10.1002/cncr.28864</pub-id> <pub-id pub-id-type="pmid">24948110</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cantwell-Dorris</surname> <given-names>E. R.</given-names></name> <name><surname>O&#x2019;Leary</surname> <given-names>J. J.</given-names></name> <name><surname>Sheils</surname> <given-names>O. M.</given-names></name></person-group> (<year>2011</year>). <article-title>BRAFV600E: implications for carcinogenesis and molecular therapy.</article-title> <source><italic>Mol. Cancer Ther.</italic></source> <volume>10</volume> <fpage>385</fpage>&#x2013;<lpage>394</lpage>. <pub-id pub-id-type="doi">10.1158/1535-7163.MCT-10-0799</pub-id> <pub-id pub-id-type="pmid">21388974</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cargnello</surname> <given-names>M.</given-names></name> <name><surname>Roux</surname> <given-names>P. P.</given-names></name></person-group> (<year>2011</year>). <article-title>Activation and function of the MAPKs and their substrates, the MAPK-activated protein kinases.</article-title> <source><italic>Microbiol. Mol. Biol. Rev.</italic></source> <volume>75</volume> <fpage>50</fpage>&#x2013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1128/MMBR.00031-10</pub-id> <pub-id pub-id-type="pmid">21372320</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>K.</given-names></name> <name><surname>Creighton</surname> <given-names>C. J.</given-names></name> <name><surname>Davis</surname> <given-names>C.</given-names></name> <name><surname>Donehower</surname> <given-names>L.</given-names></name> <name><surname>Drummond</surname> <given-names>J.</given-names></name> <name><surname>Wheeler</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>The cancer genome atlas pan-cancer analysis project.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>45</volume> <fpage>1113</fpage>&#x2013;<lpage>1120</lpage>. <pub-id pub-id-type="doi">10.1038/ng.2764</pub-id> <pub-id pub-id-type="pmid">24071849</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chapman</surname> <given-names>P. B.</given-names></name> <name><surname>Hauschild</surname> <given-names>A.</given-names></name> <name><surname>Robert</surname> <given-names>C.</given-names></name> <name><surname>Haanen</surname> <given-names>J. B.</given-names></name> <name><surname>Ascierto</surname> <given-names>P.</given-names></name> <name><surname>Larkin</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Improved survival with vemurafenib in melanoma with BRAF V600E mutation.</article-title> <source><italic>N. Engl. J. Med.</italic></source> <volume>364</volume> <fpage>2507</fpage>&#x2013;<lpage>2516</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMoa1103782</pub-id> <pub-id pub-id-type="pmid">21639808</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chapnick</surname> <given-names>D. A.</given-names></name> <name><surname>Warner</surname> <given-names>L.</given-names></name> <name><surname>Bernet</surname> <given-names>J.</given-names></name> <name><surname>Rao</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name></person-group> (<year>2011</year>). <article-title>Partners in crime: the TGF&#x03B2; and MAPK pathways in cancer progression.</article-title> <source><italic>Cell Biosci.</italic></source> <volume>1</volume>:<issue>42</issue>. <pub-id pub-id-type="doi">10.1186/2045-3701-1-42</pub-id> <pub-id pub-id-type="pmid">22204556</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J. C.</given-names></name> <name><surname>Alvarez</surname> <given-names>M. J.</given-names></name> <name><surname>Talos</surname> <given-names>F.</given-names></name> <name><surname>Dhruv</surname> <given-names>H.</given-names></name> <name><surname>Rieckhof</surname> <given-names>G. E.</given-names></name> <name><surname>Iyer</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Identification of causal genetic drivers of human disease through systems-level analysis of regulatory networks.</article-title> <source><italic>Cell</italic></source> <volume>159</volume> <fpage>402</fpage>&#x2013;<lpage>414</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2014.09.021</pub-id> <pub-id pub-id-type="pmid">25303533</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cho</surname> <given-names>A.</given-names></name> <name><surname>Shim</surname> <given-names>J. E.</given-names></name> <name><surname>Kim</surname> <given-names>E.</given-names></name> <name><surname>Supek</surname> <given-names>F.</given-names></name> <name><surname>Lehner</surname> <given-names>B.</given-names></name> <name><surname>Lee</surname> <given-names>I.</given-names></name></person-group> (<year>2016</year>). <article-title>MUFFINN: cancer gene discovery via network analysis of somatic mutation data.</article-title> <source><italic>Genome Biol.</italic></source> <volume>17</volume>:<issue>129</issue>. <pub-id pub-id-type="doi">10.1186/s13059-016-0989-x</pub-id> <pub-id pub-id-type="pmid">27333808</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ciriello</surname> <given-names>G.</given-names></name> <name><surname>Cerami</surname> <given-names>E.</given-names></name> <name><surname>Sander</surname> <given-names>C.</given-names></name> <name><surname>Schultz</surname> <given-names>N.</given-names></name></person-group> (<year>2012</year>). <article-title>Mutual exclusivity analysis identifies oncogenic network modules.</article-title> <source><italic>Genome Res.</italic></source> <volume>22</volume> <fpage>398</fpage>&#x2013;<lpage>406</lpage>. <pub-id pub-id-type="doi">10.1101/gr.125567.111</pub-id> <pub-id pub-id-type="pmid">21908773</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Creixell</surname> <given-names>P.</given-names></name> <name><surname>Reimand</surname> <given-names>J.</given-names></name> <name><surname>Haider</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>G.</given-names></name> <name><surname>Shibata</surname> <given-names>T.</given-names></name> <name><surname>Vazquez</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Pathway and network analysis of cancer genomes.</article-title> <source><italic>Nat. Methods</italic></source> <volume>12</volume> <fpage>615</fpage>&#x2013;<lpage>621</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.3440</pub-id> <pub-id pub-id-type="pmid">26125594</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daemen</surname> <given-names>A.</given-names></name> <name><surname>Griffith</surname> <given-names>O. L.</given-names></name> <name><surname>Heiser</surname> <given-names>L. M.</given-names></name> <name><surname>Wang</surname> <given-names>N. J.</given-names></name> <name><surname>Enache</surname> <given-names>O. M.</given-names></name> <name><surname>Sanborn</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>Modeling precision treatment of breast cancer.</article-title> <source><italic>Genome Biol.</italic></source> <volume>14</volume>:<issue>R110</issue>. <pub-id pub-id-type="doi">10.1186/gb-2013-14-10-r110</pub-id> <pub-id pub-id-type="pmid">24176112</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dees</surname> <given-names>N. D.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Kandoth</surname> <given-names>C.</given-names></name> <name><surname>Wendl</surname> <given-names>M. C.</given-names></name> <name><surname>Schierding</surname> <given-names>W.</given-names></name> <name><surname>Koboldt</surname> <given-names>D. C.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>MuSiC: identifying mutational significance in cancer genomes.</article-title> <source><italic>Genome Res.</italic></source> <volume>22</volume> <fpage>1589</fpage>&#x2013;<lpage>1598</lpage>. <pub-id pub-id-type="doi">10.1101/gr.134635.111</pub-id> <pub-id pub-id-type="pmid">22759861</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Derynck</surname> <given-names>R.</given-names></name> <name><surname>Zhang</surname> <given-names>Y. E.</given-names></name></person-group> (<year>2003</year>). <article-title>Smad-dependent and Smad-independent pathways in TGF-&#x03B2; family signalling.</article-title> <source><italic>Nature</italic></source> <volume>425</volume> <fpage>577</fpage>&#x2013;<lpage>584</lpage>. <pub-id pub-id-type="doi">10.1038/nature02006</pub-id> <pub-id pub-id-type="pmid">14534577</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Donley</surname> <given-names>C.</given-names></name> <name><surname>McClelland</surname> <given-names>K.</given-names></name> <name><surname>McKeen</surname> <given-names>H. D.</given-names></name> <name><surname>Nelson</surname> <given-names>L.</given-names></name> <name><surname>Yakkundi</surname> <given-names>A.</given-names></name> <name><surname>Jithesh</surname> <given-names>P. V.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Identification of RBCK1 as a novel regulator of FKBPL: implications for tumor growth and response to tamoxifen.</article-title> <source><italic>Oncogene</italic></source> <volume>33</volume> <fpage>3441</fpage>&#x2013;<lpage>3450</lpage>. <pub-id pub-id-type="doi">10.1038/onc.2013.306</pub-id> <pub-id pub-id-type="pmid">23912458</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Drier</surname> <given-names>Y.</given-names></name> <name><surname>Sheffer</surname> <given-names>M.</given-names></name> <name><surname>Domany</surname> <given-names>E.</given-names></name></person-group> (<year>2013</year>). <article-title>Pathway-based personalized analysis of cancer.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>110</volume> <fpage>6388</fpage>&#x2013;<lpage>6393</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1219651110</pub-id> <pub-id pub-id-type="pmid">23547110</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Enzo</surname> <given-names>E.</given-names></name> <name><surname>Santinon</surname> <given-names>G.</given-names></name> <name><surname>Pocaterra</surname> <given-names>A.</given-names></name> <name><surname>Aragona</surname> <given-names>M.</given-names></name> <name><surname>Bresolin</surname> <given-names>S.</given-names></name> <name><surname>Forcato</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Aerobic glycolysis tunes YAP/TAZ transcriptional activity.</article-title> <source><italic>EMBO J.</italic></source> <volume>34</volume> <fpage>1349</fpage>&#x2013;<lpage>1370</lpage>. <pub-id pub-id-type="doi">10.15252/embj.201490379</pub-id> <pub-id pub-id-type="pmid">25796446</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferraro</surname> <given-names>E.</given-names></name> <name><surname>Corvaro</surname> <given-names>M.</given-names></name> <name><surname>Cecconi</surname> <given-names>F.</given-names></name></person-group> (<year>2003</year>). <article-title>Physiological and pathological roles of Apaf1 and the apoptosome.</article-title> <source><italic>J. Cell. Mol. Med.</italic></source> <volume>7</volume> <fpage>21</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1111/j.1582-4934.2003.tb00199.x</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Forbes</surname> <given-names>S. A.</given-names></name> <name><surname>Beare</surname> <given-names>D.</given-names></name> <name><surname>Boutselakis</surname> <given-names>H.</given-names></name> <name><surname>Bamford</surname> <given-names>S.</given-names></name> <name><surname>Bindal</surname> <given-names>N.</given-names></name> <name><surname>Tate</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>COSMIC: somatic cancer genetics at high-resolution.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>45</volume> <fpage>D777</fpage>&#x2013;<lpage>D783</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw1121</pub-id> <pub-id pub-id-type="pmid">27899578</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fu</surname> <given-names>D.-Y.</given-names></name> <name><surname>Wang</surname> <given-names>Z.-M.</given-names></name> <name><surname>Wang</surname> <given-names>B.-L.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Yang</surname> <given-names>W.-T.</given-names></name> <name><surname>Shen</surname> <given-names>Z.-Z.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Frequent epigenetic inactivation of the receptor tyrosine kinase EphA5 by promoter methylation in human breast cancer.</article-title> <source><italic>Hum. Pathol.</italic></source> <volume>41</volume> <fpage>48</fpage>&#x2013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1016/j.humpath.2009.06.007</pub-id> <pub-id pub-id-type="pmid">19733895</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harvey</surname> <given-names>K. F.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Thomas</surname> <given-names>D. M.</given-names></name></person-group> (<year>2013</year>). <article-title>The Hippo pathway and human cancer.</article-title> <source><italic>Nat. Rev. Cancer</italic></source> <volume>13</volume> <fpage>246</fpage>&#x2013;<lpage>257</lpage>. <pub-id pub-id-type="doi">10.1038/nrc3458</pub-id> <pub-id pub-id-type="pmid">23467301</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heiser</surname> <given-names>L. M.</given-names></name> <name><surname>Sadanandam</surname> <given-names>A.</given-names></name> <name><surname>Kuo</surname> <given-names>W.</given-names></name> <name><surname>Benz</surname> <given-names>S. C.</given-names></name> <name><surname>Goldstein</surname> <given-names>T. C.</given-names></name> <name><surname>Ng</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Subtype and pathway specific responses to anticancer compounds in breast cancer.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>109</volume> <fpage>2724</fpage>&#x2013;<lpage>2729</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1018854108</pub-id> <pub-id pub-id-type="pmid">22003129</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hiemer</surname> <given-names>S. E.</given-names></name> <name><surname>Szymaniak</surname> <given-names>A. D.</given-names></name> <name><surname>Varelas</surname> <given-names>X.</given-names></name></person-group> (<year>2014</year>). <article-title>The transcriptional regulators TAZ and YAP direct transforming growth factor &#x03B2;-induced tumorigenic phenotypes in breast cancer cells.</article-title> <source><italic>J. Biol. Chem.</italic></source> <volume>289</volume> <fpage>13461</fpage>&#x2013;<lpage>13474</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M113.529115</pub-id> <pub-id pub-id-type="pmid">24648515</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hiemer</surname> <given-names>S. E.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Kartha</surname> <given-names>V. K.</given-names></name> <name><surname>Packer</surname> <given-names>T. S.</given-names></name> <name><surname>Almershed</surname> <given-names>M.</given-names></name> <name><surname>Noonan</surname> <given-names>V.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>A YAP/TAZ-regulated molecular signature is associated with oral squamous cell carcinoma.</article-title> <source><italic>Mol. Cancer Res.</italic></source> <volume>13</volume> <fpage>957</fpage>&#x2013;<lpage>968</lpage>. <pub-id pub-id-type="doi">10.1158/1541-7786.MCR-14-0580</pub-id> <pub-id pub-id-type="pmid">25794680</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hou</surname> <given-names>J. P.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>DawnRank: discovering personalized driver genes in cancer.</article-title> <source><italic>Genome Med.</italic></source> <volume>6</volume>:<issue>56</issue>. <pub-id pub-id-type="doi">10.1186/s13073-014-0056-8</pub-id> <pub-id pub-id-type="pmid">25177370</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jang</surname> <given-names>I. S.</given-names></name> <name><surname>Neto</surname> <given-names>E. C.</given-names></name> <name><surname>Guinney</surname> <given-names>J.</given-names></name> <name><surname>Friend</surname> <given-names>S. H.</given-names></name> <name><surname>Margolin</surname> <given-names>A. A.</given-names></name></person-group> (<year>2014</year>). <article-title>Systematic assessment of analytical methods for drug sensitivity prediction from cancer cell line data.</article-title> <source><italic>Pac. Symp. Biocomput.</italic></source> <volume>2014</volume> <fpage>63</fpage>&#x2013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1055/s-0029-1237430</pub-id> <pub-id pub-id-type="pmid">19711252</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname> <given-names>P.</given-names></name> <name><surname>Zhao</surname> <given-names>Z.</given-names></name></person-group> (<year>2014</year>). <article-title>VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>10</volume>:<issue>e1003460</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003460</pub-id> <pub-id pub-id-type="pmid">24516372</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>D. B.</given-names></name> <name><surname>Puzanov</surname> <given-names>I.</given-names></name></person-group> (<year>2015</year>). <article-title>Treatment of NRAS-mutant melanoma.</article-title> <source><italic>Curr. Treat. Options Oncol.</italic></source> <volume>16</volume>:<issue>15</issue>. <pub-id pub-id-type="doi">10.1007/s11864-015-0330-z</pub-id> <pub-id pub-id-type="pmid">25796376</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kanai</surname> <given-names>F.</given-names></name> <name><surname>Marignani</surname> <given-names>P. A.</given-names></name> <name><surname>Sarbassova</surname> <given-names>D.</given-names></name> <name><surname>Yagi</surname> <given-names>R.</given-names></name> <name><surname>Hall</surname> <given-names>R. A.</given-names></name> <name><surname>Donowitz</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2000</year>). <article-title>TAZ: a novel transcriptional co-activator regulated by interactions with 14-3-3 and PDZ domain proteins.</article-title> <source><italic>EMBO J.</italic></source> <volume>19</volume> <fpage>6778</fpage>&#x2013;<lpage>6791</lpage>. <pub-id pub-id-type="doi">10.1093/emboj/19.24.6778</pub-id> <pub-id pub-id-type="pmid">11118213</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>E. K.</given-names></name> <name><surname>Choi</surname> <given-names>E.-J.</given-names></name></person-group> (<year>2010</year>). <article-title>Pathological roles of MAPK signaling pathways in human diseases.</article-title> <source><italic>Biochim. Biophys. Acta</italic></source> <volume>1802</volume> <fpage>396</fpage>&#x2013;<lpage>405</lpage>. <pub-id pub-id-type="doi">10.1016/j.bbadis.2009.12.009</pub-id> <pub-id pub-id-type="pmid">20079433</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J. W.</given-names></name> <name><surname>Botvinnik</surname> <given-names>O. B.</given-names></name> <name><surname>Abudayyeh</surname> <given-names>O.</given-names></name> <name><surname>Birger</surname> <given-names>C.</given-names></name> <name><surname>Rosenbluh</surname> <given-names>J.</given-names></name> <name><surname>Shrestha</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Characterizing genomic alterations in cancer by complementary functional associations.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>34</volume> <fpage>539</fpage>&#x2013;<lpage>546</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3527</pub-id> <pub-id pub-id-type="pmid">27088724</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kristensen</surname> <given-names>V. N.</given-names></name> <name><surname>Lingj&#x00E6;rde</surname> <given-names>O. C.</given-names></name> <name><surname>Russnes</surname> <given-names>H. G.</given-names></name> <name><surname>Vollan</surname> <given-names>H. K. M.</given-names></name> <name><surname>Frigessi</surname> <given-names>A.</given-names></name> <name><surname>B&#x00F8;rresen-Dale</surname> <given-names>A.-L.</given-names></name></person-group> (<year>2014</year>). <article-title>Principles and methods of integrative genomic analyses in cancer.</article-title> <source><italic>Nat. Rev. Cancer</italic></source> <volume>14</volume> <fpage>299</fpage>&#x2013;<lpage>313</lpage>. <pub-id pub-id-type="doi">10.1038/nrc3721</pub-id> <pub-id pub-id-type="pmid">24759209</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lawrence</surname> <given-names>M. S.</given-names></name> <name><surname>Stojanov</surname> <given-names>P.</given-names></name> <name><surname>Polak</surname> <given-names>P.</given-names></name> <name><surname>Kryukov</surname> <given-names>G. V.</given-names></name> <name><surname>Cibulskis</surname> <given-names>K.</given-names></name> <name><surname>Sivachenko</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>Mutational heterogeneity in cancer and the search for new cancer-associated genes.</article-title> <source><italic>Nature</italic></source> <volume>499</volume> <fpage>214</fpage>&#x2013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1038/nature12213</pub-id> <pub-id pub-id-type="pmid">23770567</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leiserson</surname> <given-names>M. D. M.</given-names></name> <name><surname>Blokh</surname> <given-names>D.</given-names></name> <name><surname>Sharan</surname> <given-names>R.</given-names></name> <name><surname>Raphael</surname> <given-names>B. J.</given-names></name></person-group> (<year>2013</year>). <article-title>Simultaneous identification of multiple driver pathways in cancer.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>9</volume>:<issue>e1003054</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003054</pub-id> <pub-id pub-id-type="pmid">23717195</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leiserson</surname> <given-names>M. D. M.</given-names></name> <name><surname>Vandin</surname> <given-names>F.</given-names></name> <name><surname>Wu</surname> <given-names>H. T.</given-names></name> <name><surname>Dobson</surname> <given-names>J. R.</given-names></name> <name><surname>Eldridge</surname> <given-names>J. V.</given-names></name> <name><surname>Thomas</surname> <given-names>J. L.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>47</volume> <fpage>106</fpage>&#x2013;<lpage>114</lpage>. <pub-id pub-id-type="doi">10.1038/ng.3168</pub-id> <pub-id pub-id-type="pmid">25501392</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Cho</surname> <given-names>S. N.</given-names></name> <name><surname>Akkanti</surname> <given-names>B.</given-names></name> <name><surname>Jin</surname> <given-names>N.</given-names></name> <name><surname>Mao</surname> <given-names>J.</given-names></name> <name><surname>Long</surname> <given-names>W.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>ErbB2 pathway activation upon smad4 loss promotes lung tumor growth and metastasis.</article-title> <source><italic>Cell Rep.</italic></source> <volume>10</volume> <fpage>1599</fpage>&#x2013;<lpage>1613</lpage>. <pub-id pub-id-type="doi">10.1016/j.celrep.2015.02.014</pub-id> <pub-id pub-id-type="pmid">25753424</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mascaux</surname> <given-names>C.</given-names></name> <name><surname>Wynes</surname> <given-names>M. W.</given-names></name> <name><surname>Kato</surname> <given-names>Y.</given-names></name> <name><surname>Tran</surname> <given-names>C.</given-names></name> <name><surname>Asuncion</surname> <given-names>B. R.</given-names></name> <name><surname>Zhao</surname> <given-names>J. M.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>EGFR protein expression in non-small cell lung cancer predicts response to an EGFR tyrosine kinase inhibitor - a novel antibody for immunohistochemistry or AQUA technology.</article-title> <source><italic>Clin. Cancer Res.</italic></source> <volume>17</volume> <fpage>7796</fpage>&#x2013;<lpage>7807</lpage>. <pub-id pub-id-type="doi">10.1158/1078-0432.CCR-11-0209</pub-id> <pub-id pub-id-type="pmid">21994417</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mermel</surname> <given-names>C. H.</given-names></name> <name><surname>Schumacher</surname> <given-names>S. E.</given-names></name> <name><surname>Hill</surname> <given-names>B.</given-names></name> <name><surname>Meyerson</surname> <given-names>M. L.</given-names></name> <name><surname>Beroukhim</surname> <given-names>R.</given-names></name> <name><surname>Getz</surname> <given-names>G.</given-names></name></person-group> (<year>2011</year>). <article-title>GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers.</article-title> <source><italic>Genome Biol.</italic></source> <volume>12</volume>:<issue>R41</issue>. <pub-id pub-id-type="doi">10.1186/gb-2011-12-4-r41</pub-id> <pub-id pub-id-type="pmid">21527027</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Monti</surname> <given-names>S.</given-names></name> <name><surname>Chapuy</surname> <given-names>B.</given-names></name> <name><surname>Takeyama</surname> <given-names>K.</given-names></name> <name><surname>Rodig</surname> <given-names>S. J.</given-names></name> <name><surname>Hao</surname> <given-names>Y.</given-names></name> <name><surname>Yeda</surname> <given-names>K. T.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Integrative analysis reveals an outcome-associated and targetable pattern of p53 and cell cycle deregulation in diffuse large B cell lymphoma.</article-title> <source><italic>Cancer Cell</italic></source> <volume>22</volume> <fpage>359</fpage>&#x2013;<lpage>372</lpage>. <pub-id pub-id-type="doi">10.1016/j.ccr.2012.07.014</pub-id> <pub-id pub-id-type="pmid">22975378</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moon</surname> <given-names>Y. W.</given-names></name> <name><surname>Rao</surname> <given-names>G.</given-names></name> <name><surname>Kim</surname> <given-names>J. J.</given-names></name> <name><surname>Shim</surname> <given-names>H. S.</given-names></name> <name><surname>Park</surname> <given-names>K. S.</given-names></name> <name><surname>An</surname> <given-names>S. S.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>LAMC2 enhances the metastatic potential of lung adenocarcinoma.</article-title> <source><italic>Cell Death Differ.</italic></source> <volume>22</volume> <fpage>1341</fpage>&#x2013;<lpage>1352</lpage>. <pub-id pub-id-type="doi">10.1038/cdd.2014.228</pub-id> <pub-id pub-id-type="pmid">25591736</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moroishi</surname> <given-names>T.</given-names></name> <name><surname>Hansen</surname> <given-names>C. G.</given-names></name> <name><surname>Guan</surname> <given-names>K.-L.</given-names></name></person-group> (<year>2015</year>). <article-title>The emerging roles of YAP and TAZ in cancer.</article-title> <source><italic>Nat. Rev. Cancer</italic></source> <volume>15</volume> <fpage>73</fpage>&#x2013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1038/nrc3876</pub-id> <pub-id pub-id-type="pmid">25592648</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moustakas</surname> <given-names>A.</given-names></name> <name><surname>Heldin</surname> <given-names>C. H.</given-names></name></person-group> (<year>2005</year>). <article-title>Non-Smad TGF-beta signals.</article-title> <source><italic>J. Cell Sci.</italic></source> <volume>118</volume> <fpage>3573</fpage>&#x2013;<lpage>3584</lpage>. <pub-id pub-id-type="doi">10.1242/jcs.02554</pub-id> <pub-id pub-id-type="pmid">16105881</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ng</surname> <given-names>S.</given-names></name> <name><surname>Collisson</surname> <given-names>E. A.</given-names></name> <name><surname>Sokolov</surname> <given-names>A.</given-names></name> <name><surname>Goldstein</surname> <given-names>T.</given-names></name> <name><surname>Lopez-Bigas</surname> <given-names>N.</given-names></name> <name><surname>Benz</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis.</article-title> <source><italic>Bioinformatics</italic></source> <volume>28</volume> <fpage>640</fpage>&#x2013;<lpage>646</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts402</pub-id> <pub-id pub-id-type="pmid">22962493</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pao</surname> <given-names>W.</given-names></name> <name><surname>Miller</surname> <given-names>V.</given-names></name> <name><surname>Zakowski</surname> <given-names>M.</given-names></name> <name><surname>Doherty</surname> <given-names>J.</given-names></name> <name><surname>Politi</surname> <given-names>K.</given-names></name> <name><surname>Sarkaria</surname> <given-names>I.</given-names></name><etal/></person-group> (<year>2004</year>). <article-title>EGF receptor gene mutations are common in lung cancers from &#x201C;never smokers&#x201D; and are associated with sensitivity of tumors to gefitinib and erlotinib.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>101</volume> <fpage>13306</fpage>&#x2013;<lpage>13311</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0405220101</pub-id> <pub-id pub-id-type="pmid">15329413</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Piccolo</surname> <given-names>S.</given-names></name> <name><surname>Dupont</surname> <given-names>S.</given-names></name> <name><surname>Cordenonsi</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>The biology of YAP/TAZ: hippo signaling and beyond.</article-title> <source><italic>Physiol. Rev.</italic></source> <volume>94</volume> <fpage>1287</fpage>&#x2013;<lpage>1312</lpage>. <pub-id pub-id-type="doi">10.1152/physrev.00005.2014</pub-id> <pub-id pub-id-type="pmid">25287865</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roberts</surname> <given-names>P. J.</given-names></name> <name><surname>Der</surname> <given-names>C. J.</given-names></name></person-group> (<year>2007</year>). <article-title>Targeting the Raf-MEK-ERK mitogen-activated protein kinase cascade for the treatment of cancer.</article-title> <source><italic>Oncogene</italic></source> <volume>26</volume> <fpage>3291</fpage>&#x2013;<lpage>3310</lpage>. <pub-id pub-id-type="doi">10.1038/sj.onc.1210422</pub-id> <pub-id pub-id-type="pmid">17496923</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rojas</surname> <given-names>A.</given-names></name> <name><surname>Padidam</surname> <given-names>M.</given-names></name> <name><surname>Cress</surname> <given-names>D.</given-names></name> <name><surname>Grady</surname> <given-names>W. M.</given-names></name></person-group> (<year>2009</year>). <article-title>TGF-&#x03B2; receptor levels regulate the specificity of signaling pathway activation and biological effects of TGF-&#x03B2;.</article-title> <source><italic>Biochim. Biophys. Acta</italic></source> <volume>1793</volume> <fpage>1165</fpage>&#x2013;<lpage>1173</lpage>. <pub-id pub-id-type="doi">10.1016/j.bbamcr.2009.02.001</pub-id> <pub-id pub-id-type="pmid">19339207</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanchez-Vega</surname> <given-names>F.</given-names></name> <name><surname>Mina</surname> <given-names>M.</given-names></name> <name><surname>Armenia</surname> <given-names>J.</given-names></name> <name><surname>Chatila</surname> <given-names>W. K.</given-names></name> <name><surname>Luna</surname> <given-names>A.</given-names></name> <name><surname>La</surname> <given-names>K. C.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Oncogenic signaling pathways in The Cancer Genome Atlas.</article-title> <source><italic>Cell</italic></source> <volume>173</volume> <fpage>321</fpage>&#x2013;<lpage>337.e10</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2018.03.035</pub-id> <pub-id pub-id-type="pmid">29625050</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savage</surname> <given-names>K. J.</given-names></name> <name><surname>Monti</surname> <given-names>S.</given-names></name> <name><surname>Kutok</surname> <given-names>J. L.</given-names></name> <name><surname>Cattoretti</surname> <given-names>G.</given-names></name> <name><surname>Neuberg</surname> <given-names>D.</given-names></name> <name><surname>De Leval</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2003</year>). <article-title>The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical Hodgkin lymphoma.</article-title> <source><italic>Blood</italic></source> <volume>102</volume> <fpage>3871</fpage>&#x2013;<lpage>3879</lpage>. <pub-id pub-id-type="doi">10.1182/blood-2003-06-1841</pub-id> <pub-id pub-id-type="pmid">12933571</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sensi</surname> <given-names>M.</given-names></name> <name><surname>Nicolini</surname> <given-names>G.</given-names></name> <name><surname>Petti</surname> <given-names>C.</given-names></name> <name><surname>Bersani</surname> <given-names>I.</given-names></name> <name><surname>Lozupone</surname> <given-names>F.</given-names></name> <name><surname>Molla</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2006</year>). <article-title>Mutually exclusive NRASQ61R and BRAFV600E mutations at the single-cell level in the same human melanoma.</article-title> <source><italic>Oncogene</italic></source> <volume>25</volume> <fpage>3357</fpage>&#x2013;<lpage>3364</lpage>. <pub-id pub-id-type="doi">10.1038/sj.onc.1209379</pub-id> <pub-id pub-id-type="pmid">16462768</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>Y.</given-names></name> <name><surname>Rahman</surname> <given-names>M.</given-names></name> <name><surname>Piccolo</surname> <given-names>S. R.</given-names></name> <name><surname>Gusenleitner</surname> <given-names>D.</given-names></name> <name><surname>El-Chaar</surname> <given-names>N. N.</given-names></name> <name><surname>Cheng</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>ASSIGN: context-specific genomic profiling of multiple heterogeneous biological pathways.</article-title> <source><italic>Bioinformatics</italic></source> <volume>31</volume> <fpage>1745</fpage>&#x2013;<lpage>1753</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv031</pub-id> <pub-id pub-id-type="pmid">25617415</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soengas</surname> <given-names>M. S.</given-names></name> <name><surname>Gerald</surname> <given-names>W. L.</given-names></name> <name><surname>Cordon-Cardo</surname> <given-names>C.</given-names></name> <name><surname>Lazebnik</surname> <given-names>Y.</given-names></name> <name><surname>Lowe</surname> <given-names>S. W.</given-names></name></person-group> (<year>2006</year>). <article-title>Apaf-1 expression in malignant melanoma.</article-title> <source><italic>Cell Death Differ.</italic></source> <volume>13</volume> <fpage>352</fpage>&#x2013;<lpage>353</lpage>. <pub-id pub-id-type="doi">10.1038/sj.cdd.4401755</pub-id> <pub-id pub-id-type="pmid">16110320</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stein</surname> <given-names>T.</given-names></name> <name><surname>Cosimo</surname> <given-names>E.</given-names></name> <name><surname>Yu</surname> <given-names>X.</given-names></name> <name><surname>Smith</surname> <given-names>P. R.</given-names></name> <name><surname>Simon</surname> <given-names>R.</given-names></name> <name><surname>Cottrell</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Loss of reelin expression in breast cancer is epigenetically controlled and associated with poor prognosis.</article-title> <source><italic>Am. J. Pathol.</italic></source> <volume>177</volume> <fpage>2323</fpage>&#x2013;<lpage>2333</lpage>. <pub-id pub-id-type="doi">10.2353/ajpath.2010.100209</pub-id> <pub-id pub-id-type="pmid">20847288</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stone</surname> <given-names>A. V.</given-names></name> <name><surname>Vanderman</surname> <given-names>K. S.</given-names></name> <name><surname>Willey</surname> <given-names>J. S.</given-names></name> <name><surname>David</surname> <given-names>L.</given-names></name> <name><surname>Register</surname> <given-names>T. C.</given-names></name> <name><surname>Shively</surname> <given-names>C. A.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Anti-M&#x00FC;llerian hormone signaling regulates epithelial plasticity and chemoresistance in lung cancer.</article-title> <source><italic>Cell Rep.</italic></source> <volume>23</volume> <fpage>1780</fpage>&#x2013;<lpage>1789</lpage>. <pub-id pub-id-type="doi">10.1016/j.joca.2015.05.020</pub-id> <pub-id pub-id-type="pmid">26033163</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sudol</surname> <given-names>M.</given-names></name></person-group> (<year>1994</year>). <article-title>Yes-associated protein (YAP65) is a proline-rich phosphoprotein that binds to the SH3 domain of the Yes proto-oncogene product.</article-title> <source><italic>Oncogene</italic></source> <volume>9</volume> <fpage>2145</fpage>&#x2013;<lpage>2152</lpage>. <pub-id pub-id-type="pmid">8035999</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsai</surname> <given-names>J.</given-names></name> <name><surname>Lee</surname> <given-names>J. T.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Cho</surname> <given-names>H.</given-names></name> <name><surname>Mamo</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Discovery of a selective inhibitor of oncogenic B-Raf kinase with potent antimelanoma activity.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>105</volume> <fpage>3041</fpage>&#x2013;<lpage>3046</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0711741105</pub-id> <pub-id pub-id-type="pmid">18287029</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vandin</surname> <given-names>F.</given-names></name> <name><surname>Upfal</surname> <given-names>E.</given-names></name> <name><surname>Raphael</surname> <given-names>B. J.</given-names></name></person-group> (<year>2012</year>). <article-title>De novo discovery of mutated driver pathways in cancer.</article-title> <source><italic>Genome Res.</italic></source> <volume>22</volume> <fpage>375</fpage>&#x2013;<lpage>385</lpage>. <pub-id pub-id-type="doi">10.1101/gr.120477.111</pub-id> <pub-id pub-id-type="pmid">21653252</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Varelas</surname> <given-names>X.</given-names></name></person-group> (<year>2014</year>). <article-title>The Hippo pathway effectors TAZ and YAP in development, homeostasis and disease.</article-title> <source><italic>Development</italic></source> <volume>141</volume> <fpage>1614</fpage>&#x2013;<lpage>1626</lpage>. <pub-id pub-id-type="doi">10.1242/dev.102376</pub-id> <pub-id pub-id-type="pmid">24715453</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xi</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>Discovering potential driver genes through an integrated model of somatic mutation profiles and gene functional information.</article-title> <source><italic>Mol. Biosyst.</italic></source> <volume>13</volume> <fpage>2135</fpage>&#x2013;<lpage>2144</lpage>. <pub-id pub-id-type="doi">10.1039/c7mb00303j</pub-id> <pub-id pub-id-type="pmid">28825429</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yeh</surname> <given-names>T. C.</given-names></name> <name><surname>Marsh</surname> <given-names>V.</given-names></name> <name><surname>Bernat</surname> <given-names>B. A.</given-names></name> <name><surname>Ballard</surname> <given-names>J.</given-names></name> <name><surname>Colwell</surname> <given-names>H.</given-names></name> <name><surname>Evans</surname> <given-names>R. J.</given-names></name><etal/></person-group> (<year>2007</year>). <article-title>Biological characterization of ARRY-142886 (AZD6244), a potent, highly selective mitogen-activated protein kinase kinase 1/2 inhibitor.</article-title> <source><italic>Clin. Cancer Res.</italic></source> <volume>13</volume> <fpage>1576</fpage>&#x2013;<lpage>1583</lpage>. <pub-id pub-id-type="doi">10.1158/1078-0432.CCR-06-1150</pub-id> <pub-id pub-id-type="pmid">17332304</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Youn</surname> <given-names>A.</given-names></name> <name><surname>Simon</surname> <given-names>R.</given-names></name></person-group> (<year>2011</year>). <article-title>Identifying cancer driver genes in tumor genome sequencing studies.</article-title> <source><italic>Bioinformatics</italic></source> <volume>27</volume> <fpage>175</fpage>&#x2013;<lpage>181</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq630</pub-id> <pub-id pub-id-type="pmid">21169372</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Ma</surname> <given-names>G.</given-names></name> <name><surname>Cao</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name></person-group> (<year>2012</year>). <article-title>Reelin is involved in transforming growth factor-&#x03B2;1-induced cell migration in esophageal carcinoma cells.</article-title> <source><italic>PLoS One</italic></source> <volume>7</volume>:<issue>e31802</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0031802</pub-id> <pub-id pub-id-type="pmid">22393371</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zanconato</surname> <given-names>F.</given-names></name> <name><surname>Cordenonsi</surname> <given-names>M.</given-names></name> <name><surname>Piccolo</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>YAP/TAZ at the roots of cancer.</article-title> <source><italic>Cancer Cell</italic></source> <volume>29</volume> <fpage>783</fpage>&#x2013;<lpage>803</lpage>. <pub-id pub-id-type="doi">10.1016/j.ccell.2016.05.005</pub-id> <pub-id pub-id-type="pmid">27300434</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zanconato</surname> <given-names>F.</given-names></name> <name><surname>Forcato</surname> <given-names>M.</given-names></name> <name><surname>Battilana</surname> <given-names>G.</given-names></name> <name><surname>Azzolin</surname> <given-names>L.</given-names></name> <name><surname>Quaranta</surname> <given-names>E.</given-names></name> <name><surname>Bodega</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Genome-wide association between YAP/TAZ/TEAD and AP-1 at enhancers drives oncogenic growth.</article-title> <source><italic>Nat. Cell Biol.</italic></source> <volume>17</volume> <fpage>1218</fpage>&#x2013;<lpage>1227</lpage>. <pub-id pub-id-type="doi">10.1038/ncb3216</pub-id> <pub-id pub-id-type="pmid">26258633</pub-id></citation></ref>
</ref-list>
<glossary>
<title>Abbreviations</title>
<def-list id="DL1">
<def-item>
<term>BRCA</term>
<def>
<p>breast carcinomas</p>
</def>
</def-item>
<def-item>
<term>CaDrA</term>
<def>
<p>candidate driver analysis</p>
</def>
</def-item>
<def-item>
<term>CCLE</term>
<def>
<p>Cancer Cell Line Encyclopedia</p>
</def>
</def-item>
<def-item>
<term>COSMIC</term>
<def>
<p>Catalogue of Somatic Mutations in Cancer</p>
</def>
</def-item>
<def-item>
<term>FDR</term>
<def>
<p>false discovery rate</p>
</def>
</def-item>
<def-item>
<term>FPR</term>
<def>
<p>false positive rate</p>
</def>
</def-item>
<def-item>
<term>KS</term>
<def>
<p>Kolmogorov&#x2013;Smirnov</p>
</def>
</def-item>
<def-item>
<term>qRT-PCR</term>
<def>
<p>quantitative real-time polymerase chain reaction</p>
</def>
</def-item>
<def-item>
<term>RPPA</term>
<def>
<p>reverse phase protein array</p>
</def>
</def-item>
<def-item>
<term>SCNA</term>
<def>
<p>somatic copy number alteration</p>
</def>
</def-item>
<def-item>
<term>TCGA</term>
<def>
<p>The Cancer Genome Atlas</p>
</def>
</def-item>
<def-item>
<term>TN</term>
<def>
<p>triple-negative</p>
</def>
</def-item>
<def-item>
<term>TPR</term>
<def>
<p>true positive rate</p>
</def>
</def-item>
</def-list>
</glossary>
<fn-group>
<fn id="fn01"><label>1</label><p><ext-link ext-link-type="uri" xlink:href="https://cancer.sanger.ac.uk/census">https://cancer.sanger.ac.uk/census</ext-link></p></fn>
</fn-group>
</back>
</article>