In silico Prioritization of Transporter–Drug Relationships From Drug Sensitivity Screens

The interplay between drugs and cell metabolism is a key factor in determining both compound potency and toxicity. In particular, how and to what extent transmembrane transporters affect drug uptake and disposition is currently only partially understood. Most transporter proteins belong to two protein families: the ATP-Binding Cassette (ABC) transporter family, whose members are often involved in xenobiotic efflux and drug resistance, and the large and heterogeneous family of solute carriers (SLCs). We recently argued that SLCs are collectively a rather neglected gene group, with most of its members still poorly characterized, and thus likely to include many yet-to-be-discovered associations with drugs. We searched publicly available resources and literature to define the currently known set of drugs transported by ABCs or SLCs, which involved ∼500 drugs and more than 100 transporters. In order to extend this set, we then mined the largest publicly available pharmacogenomics dataset, which involves approximately 1,000 molecularly annotated cancer cell lines and their response to 265 anti-cancer compounds, and used regularized linear regression models (Elastic Net, LASSO) to predict drug responses based on SLC and ABC data (expression levels, SNVs, CNVs). The most predictive models included both known and previously unidentified associations between drugs and transporters. To our knowledge, this represents the first application of regularized linear regression to this set of genes, providing an extensive prioritization of potentially pharmacologically interesting interactions.


INTRODUCTION
The role of cellular metabolism in determining the potency and distribution of drugs is increasingly recognized (Zhao et al., 2013). Along with the enzymes involved in actual xenobiotic transformation, such as members of the cytochrome and transferases families, a critical role is played by transmembrane transporters, which directly affect both the uptake and the excretion of drugs and their metabolites (Zhou et al., 2017). Among transmembrane transporters, two large families have been described: the family of ATP-binding cassette (ABC) transporters and the family of solute carriers (SLCs) . ABC transporters are pumps powered by the hydrolysis of ATP and show a remarkable broad range of substrates, including lipids, secondary metabolites, and xenobiotics. Members of this family, such as the ABCB/MDR and ABCC/MRP proteins, have been associated with resistance to a large number of structurally diverse compounds in cancer cells (Fletcher et al., 2010). SLCs are secondary transporters involved in uptake or efflux of metabolites and other chemical matter (Cesar-Razquin et al., 2015). At more than 400 members and counting, SLCs represent the second largest family of membrane proteins and comprise uniporters, symporters and antiporters, further grouped into subfamilies based on sequence similarity (Hoglund et al., 2011). Among the reported SLC substrates are all major building blocks of the cell, such as nucleic acids, sugars, lipids, and amino acids as well as vitamins, metals, and other ions . Despite the critical processes mediated by these proteins, a large portion of SLCs is still poorly characterized and, in several cases, lacks any associations with a substrate (Cesar-Razquin et al., 2015). Importantly, several members of the SLCO (also known as Organic Anion Transporter Proteins or OATPs) and SLC22 families (including the group of organic cation transporters or OCTs, organic zwitterion/cation transporters or OCTNs and organic anion transporters or OATs) have been found to play prominent roles in the uptake and excretion of drugs, especially in the liver and kidneys (Hagenbuch and Stieger, 2013). Several other cases of SLCs mediating the uptake of drugs have been reported, such as in the case of methotrexate and related anti-folate drugs with the folate transporter SLC19A1 (Zhao et al., 2011) or the anti-cancer drug YM155/sepantronium bromide and the orphan transporter SLC35F2 (Winter et al., 2014). Indeed, whether carrier-mediated uptake is the rule or rather the exception is still a matter of discussion (Dobson and Kell, 2008;Sugano et al., 2010). Due to the understudied nature of transporters and SLCs in particular, we can nonetheless expect that several other associations between drugs and transporters, involving direct transport or indirect effects, remain to be discovered and could provide novel insights into the pharmacokinetics of drugs and drug-like compounds.
Analysis of basal gene expression and genomic features in combination with drug sensitivity data allows the identification of molecular markers that render cells both sensitive and resistant to specific drugs. Such a pharmacogenomic analysis represents a powerful method to prioritize in silico gene-compound associations. Different statistical and machine learning (ML) strategies have been used in the past to confirm known as well as to identify novel drug-gene associations, although generally in a genome-wide context (Iorio et al., 2016). For our study, we mined the "Genomics of Drug Sensitivity in Cancer" (GDSC) dataset (Iorio et al., 2016) which contains drug sensitivity data to a set of 265 anti-cancer compounds over ∼1,000 molecularly annotated cancer cell lines, in order to explore drug relationships exclusively involving transporters (SLCs and ABCs). To such end, we used regularized linear regression (Elastic Net, LASSO) to generate predictive models from which to extract cooperative sensitivity and resistance drug-transporter relationships, in what represents, to our knowledge, the first work applying this type of analysis to this group of genes.

Data
Solute carriers and ABC genes were considered as in (Cesar-Razquin et al., 2015). Known drug transport cases involving SLC and ABC proteins were obtained from four main repositories as of September 2017: DrugBank (Law et al., 2014), The IUPHAR/BPS Guide to PHARMACOLOGY (Alexander et al., 2015), KEGG: Kyoto Encyclopedia of Genes and Genomes (Kanehisa and Goto, 2000), and UCSF-FDA TransPortal (Morrissey et al., 2012). These data were complemented with various other cases found in the literature (Sprowl and Sparreboom, 2014;Winter et al., 2014;Nigam, 2015;Radic-Sarikas et al., 2017). Source files were parsed using custom python scripts, and all entries were manually curated, merged together and redundancies eliminated. The final compound list was searched against PubChem (Kim et al., 2016) in order to systematize names. A list of FDA-approved drugs was obtained from the organization's website. Network visualization was done using Cytoscape (Shannon et al., 2003).
All data corresponding to the GDSC dataset 1 (drug sensitivity, expression, copy number variations, single nucleotide variants, compounds, and cell lines) were obtained from the original website of the project as of September 2016. Drug sensitivity and transcriptomics data were used as provided. Genomics data were transformed into a binary matrix of genomic alterations vs. cell lines, where three different modifications for every gene were considered using the original source files: amplifications (ampSLCx), deletions (delSLCx), and variants (varSLCx). An amplification was annotated if there were more than two copies of at least one of the alleles for the gene of interest, and a deletion if at least one of the alleles was missing. Single nucleotide variants were filtered in order to exclude synonymous SNVs as well as nonsynonymous SNVs predicted not to be deleterious by either SIFT (Ng and Henikoff, 2001), Polyphen2 (Adzhubei et al., 2010), or FATHMM (Shihab et al., 2013).

LASSO Regression
LASSO regression analysis was performed using the "glmnet" R package (Friedman et al., 2010). Expression values for all genes in the dataset (17,419 genes in total) were used as input features. For each compound, the analysis was iterated 50 times over 10-fold cross validation. At each cross validation, features were ranked based on their frequency of appearance (number of times a feature has non zero coefficient for 100 default lambda possibilities). We then averaged the ranking across the 500 runs (50 iterations × 10 CV) in order to obtain a final list of genes associated to each compound. In this context, the most predictive gene for a certain drug does not necessarily have an average rank of one, even though its final rank is first.

Elastic Net Regression
Elastic net regression analysis was performed using the "glmnet" R package (Friedman et al., 2010). Genomic data (copy number variations and single nucleotide variants) and transcriptional profiles of SLC and ABC genes across the cell line panel were used as input variables, either alone or in combination. Drug area under the curve (AUC) values were used as response. Elastic net parameters were fixed as follows: (i) alpha, the mixing parameter that defines the penalty, was set to 0.5 in order to apply an intermediate penalty between Ridge and LASSO, and (ii) lambda, the tuning parameter that controls the overall strength of the penalty, was determined individually for every model (drug) by optimizing the mean squared cross-validated error.
For each compound, 500 Elastic Net models were generated by a 100× fivefold cross-validation procedure. In order to assess model performance, the Concordance Index (CI) (Harrell et al., 1996;Papillon-Cavanagh et al., 2013) between the predicted and observed AUC values was calculated for each run, and then averaged across all models. This index estimates the fraction of cell line pairs for which the model correctly predicts which of the two is the most and least sensitive; hence CI values of 0.5 and 1 would indicate random and perfect predictors, respectively. Feature weights were calculated by normalizing the fitted model coefficients to the absolute maximum at every cross-validation run. The final list of features associated with each compound was built by computing the frequency of appearance of each feature in all the 500 models as well as its average weight. Features with positive weights are associated with a resistance phenotype to the compound, and negative weights are indicative of sensitivity.

SLC and ABCs as Drug Transporters
We collected data from public repositories as well as relevant publications to define the current knowledge on transport of chemical compounds by members of the SLC and ABC protein classes. A total of 493 compounds linked to 107 transporters were retrieved, which altogether formed a single large network with a few other smaller components (Figure 1 and Supplementary  Table S1).
Within the largest network and in agreement with previous reports (Nigam, 2015), three families are significantly enriched (hypergeometric test, FDR ≤ 0.05): the SLCO/SLC21 family of organic anion transporters (9/12 members) (Hagenbuch and Stieger, 2013), the SLC22 family of organic anion, cation, and zwitterion transporters (13/23) (Koepsell, 2013;Nigam, 2018), and the ABCC family of multidrug resistance transporters (8/13) (Vasiliou et al., 2009). Not surprisingly, ABCB1 (P-glycoprotein; MDR1), the very well-studied efflux pump known for its broad substrate specificity and mediation of resistance to a large amount of drugs (Aller et al., 2009), is the most connected transporter in the network, linked to more than 200 compounds. In particular, 106 compounds are connected exclusively with ABCB1 and 25 other are exclusively shared with ABCG2 (BCRP), another well-known transporter and the one with the second highest degree in the network (Robey et al., 2007; Figure 1B). Other top-connected SLCs include members of the above mentioned SLCO and SLC22 families, which also show several common substrates (e.g., SLCO1B1 and SLCO1B3 share 36 compounds, and SLC22A8 and SLC22A6 share 20), as well as members of the SLC15 family (SLC15A1 and SLC15A2, which share 22 compounds), involved in the transport of beta-lactam antibiotics and peptide-mimetics (Smith et al., 2013). In contrast to these cases, other transporters appear related to one or only a few compounds. One such case is SLC35F2, whose only reported substrate to date is the anti-cancer drug YM155 (sepantronium bromide) (Winter et al., 2014). Finally, while most chemical compounds appear linked to one or two transporters, a few others show higher connectivities ( Figure 1C). A well-known example, methotrexate is transported by more than 20 different SLC and ABCs, including some belonging to families not commonly involved in drug transport, such as the folate carriers SLC19A1 and SLC46A1.

Transporter Expression Landscape in Cancer Cell Lines
The GDSC dataset contains expression data for 371 SLCs and 46 ABCs across a panel of ∼1,000 cell lines of different tissue origin. Each of these cell lines effectively express between 167 and 255 transporters, with a median value of 195 (Figure 2A and Supplementary Table S2). Although all together they cover almost the whole transporter repertoire (414/417), the distribution is clearly bimodal, with a common set of ∼130 transporters expressed in at least 900 cell lines and a more specific set of ∼140 expressed in less than 100 ( Figure 2B). Among the most commonly expressed transporters, we find several members of the SLC25 (mitochondrial carriers) and SLC35 (nucleosidesugars transporters) sub-families, the two largest among SLCs, as well as several members of the SLC39 family of zinc transporters. On the other end, many members of the SLC22 family, a large and well known group of proteins involved in the transport of drugs, as well as the SLC6 family, a well-studied family of neurotransmitter transporters, show a more specific expression pattern. As for ABCs, it is worth highlighting that subfamilies A and C present half of their members in the set of transporters of specific expression, while subfamily B has members in both sets.
When looking at actual expression values across the panel, some of the commonly expressed transporters coincide with those of highest expression ( Figure 2C). The most extreme cases are SLC25A5, SLC25A3, SLC25A6, and SLC38A1, which present very similar maximum and median values across the cell line panel. On the contrary, other transporters such as SLC26A3, SLC17A3, or SLC38A11 present a much wider range of expression, being amongst the highest expressed in some cell lines but completely absent from others.
Finally, substantial differences become apparent when considering transporter expression patterns according to the tissue of origin of the GDSC cancer cell lines ( Figure 2D). Cell lines belonging to the hematopoietic (blood) lineage, which includes leukemias, lymphomas, and myelomas, present the largest proportion of transporters with highest average expression values (28%), as indicated by Z-score, followed by cancer cell lines belonging to skin, kidney, and the digestive system. Interestingly, kidney cell lines also present the largest number of transporters with low expression values, pointing to a very wide range of expression and high specificity in those cells.

LASSO Regression Shows Importance of SLC Genes Across Whole Genome
We investigated the importance of SLC and ABC transporters for drug response by applying regularized linear regression on the GDSC dataset. To this end, we first built LASSO models of sensitivity to each compound based on genome-wide gene expression levels (17,419 genes in total) (Tibshirani, 1996), and then looked for cases where a transporter ranked as the top (first) predictor (see section "Materials and Methods"). The decision to focus exclusively on the top predictor is supported by a literature search. Indeed, the average number of PubMed publications containing both the drug and the gene name was over 40 in the case of top predictors, falling down to below 10 for the ones ranked second (Supplementary Figure S1). Consistent with their well-characterized role as drugtransporters, the multi-drug resistance pump ABCB1, as well as ABCG2, were the main predictors of resistance to a large number of drugs (Table 1). More interestingly, several compounds had an SLC as their best predictor ( Table 2). Among them, and in concordance with previous expression-sensitivity data (Rees et al., 2016), we find the sensitive association of sepantronium bromide (YM155) and SLC35F2, its main known importer (Winter et al., 2014). Another sensitive association involving SLC35F2 links this transporter to NSC-207895, a MDMX inhibitor (Wang et al., 2011). Dimethyloxalylglycin (DMOG), a synthetic analog of α-ketoglutarate that inhibits HIF prolyl hydroxylase (Zhdanov et al., 2015), showed association to two SLCs: monocarboxylate transporter SLC16A7 (MCT2) was the main predictor for sensitivity to this compound, while creatine transporter SLC6A8 (CT1) was associated with  (32) Vinorelbine (1) Thapsigargin (20) AT resistance. However, due to the high IC50 values of DMOG (in the millimolar range), this association is unlikely to be clinically relevant. Finally, cystine-glutamate transporter SLC7A11 (Blomen et al., 2015) is associated to resistance to the ROS-inducing drugs Shikonin, (5Z)-7-Oxozeaenol and Piperlongumine. This is in agreement with previous studies that highlighted a positive correlation of the expression of this transporter and resistance to several drugs via import of the cystine necessary for glutathione balance maintenance (Huang et al., 2005).

Elastic Net Regression Identifies Transporter-Drug Relationships
In order to further explore SLC and ABC involvement in drug response, we decided to build new predictive models using Elastic Net regression based on transporter molecular data only. Assessment of model performance was done by cross-validation using the CI (see section "Materials and Methods") We considered different predictors to build the models: genomics (copy number variations and single nucleotide variants), transcriptomics (gene expression) and a combination of both. Among these, gene expression resulted to be most predictive, in agreement with previous reports (Aydin et al., 2014; Figure 3A). A total of 139 (53%) of the 265 drugs included in the dataset had predictive models with a CI higher than 0.60, and 36 (14%) higher than 0.65 ( Figure 3B). For those drugs, we then ranked genes based on their frequency of appearance in the cross-validated models (indicative of the robustness of the association) and their average weight (indicative of the strength of the association as well as its direction). In this context, increased levels of transporter expression could therefore be associated with either sensitivity or resistance to the drug, for example, through its uptake or efflux, respectively ( Figure 3C). Among the top ranked transporterdrug associations, we identified several known cases of drug transport. For instance, the strongest sensitivity association with sepantronium bromide (YM155) corresponded again to SLC35F2. Similarly, the strongest resistance association for this drug was ABCB1, which includes YM155 among its many substrates (Lamers et al., 2012;Voges et al., 2016;Radic-Sarikas et al., 2017). Another example was methotrexate, for which  the folate transporter SLC19A1, known to mediate its import (Zhao et al., 2011), ranked second for sensitivity association (Supplementary Table S3). Two major patterns are apparent in the set of top-ranking associations: genes showing similar profiles of resistance or sensitivity across several different and unrelated compounds as well as groups of genes showing a similar profile in relation to a functionally related class of drugs ( Figure 3C).
A prototypical case of the first pattern is ABCB1, which is associated with resistance phenotypes to several compounds ( Figure 3D). Together with the aforementioned YM155, resistance relationships were predicted for known substrates vinblastine and docetaxel (Fletcher et al., 2010), 17-AAG/Tanespimycin (Huang et al., 2007) and AT-7519 (Cihalova et al., 2015) as well as other not previously associated compounds such as ZG-10 (a JNK1 inhibitor), the CDK2/5/7 inhibitor PHA-793887 and the broad kinase inhibitor WZ3105. Similar to ABCB1, other transporters showed multiple resistance and sensitivity associations to different compounds, particularly kinases and chromatin-related enzymes. Two of these "hubs" were SLC12A4/KCC1, a potassium-chloride cotransporter involved in cell volume homeostasis (Arroyo et al., 2013), and SLC35D2, an activated sugar transporter localized in the Golgi (Song, 2013).
As an example of the second class of associations, some of the best models were achieved for compounds belonging to the MEK inhibitor drug class (Trametinib, Selumetinib, Refametinib, CI-1040, PD-0325901, (5Z)-7-oxozeaenol), which showed very similar patterns, with sensitivity associated to SLC45A2, SLC27A1, SLC20A1, and SLC22A15 ( Figure 3E). SLC45A2 has been related to melanin synthesis and it is highly expressed in melanomas (Park et al., 2017), a cancer type sensitive to MEK inhibitors. Interestingly, SLC20A1/PiT1, a sodium-dependent phosphate transporter (Olah et al., 1994), was previously shown to regulate the ERK1/2 pathway independently of phosphate transport in skeletal cells (Bon et al., 2018). SLC27A1, a long-chain fatty acid transporter, and SLC22A15, an orphan member of the well-known family of cationic transporters involved in the transport of different compounds, were not previously associated with this drug class.
Finally, we also observed a strong sensitivity relationship between expression levels of the amino acid transporter SLC7A5/LAT1 and the Her2 and EGFR kinase inhibitors Afatinib, Gefitinib, and Bosutinib ( Figure 2C), consistent with previously published data (Timpe et al., 2015).

DISCUSSION
Transporters of the ABC and SLC superfamilies are increasingly recognized as key players in determining the distribution and metabolism of drugs and other xenobiotic compounds as they possess distinct and extremely variable expression patterns across cell lines and tissues (O'Hagan et al., 2018). Moreover, they have been implicated in the development of resistance to several chemotherapeutic drugs (Fletcher et al., 2010). A survey of currently known drug transport relationships revealed that only a fifth of the more than 500 SLCs and ABCs have been described to be involved in the transport of drugs. These transporters appear to be very unevenly distributed, with some genes and families considerably more represented and better connected than others (Figure 1). This is the case for several members of the ABCB, ABCC, SLCO, and SL22 sub-families. Similarly, while compounds such as methotrexate are linked to more than 20 transporters, most drugs are connected to only one.
To further expand this network, we took advantage of the expression and drug sensitivity data available within the GDSC project. We started by characterizing the expression patterns of SLCs and ABCs in the GDSC panel of ∼1,000 cancer cell lines, covering thirteen different tissues of origin (Figure 2). Roughly 80% of SLCs and 90% of ABCs were included in the datasets and we observed a bimodal distribution of their expression, with similarly sized sets of transporters either present in most cell lines or specific to a few. In particular, a broad spectrum of expressed transporters was detected in cell lines derived from the hematopoietic (blood) lineage as well as in cell lines derived from skin, kidney, and the digestive system. A large variability in the level of expression was also observed within the superfamilies, consistent with what recently reported by another recent study (O'Hagan et al., 2018).
We then implemented a linear regression-based approach to identify the set of transporters associated with sensitivity to each compound across all cell lines. Previous reports undertook a similar approach to identify associations of the ABC (Szakacs et al., 2004) and SLCO/SLC22 (Okabe et al., 2008) families with drug sensitivity within a limited set of about 60 cell lines. We now extended these results to a much more comprehensive set of cell lines while implementing regularized linear regression approaches. In a first step, LASSO regression was used to assess genome-wide importance of transporters as predictors for drug sensitivity. The choice of the LASSO method was motivated by its ability to shrink a large number of coefficients to zero, ideal for models that make use of thousands of predictors. Moreover, being a linear regression method, it can account for both positive and negative interactions (i.e., resistance and sensitivity, for example, by export and import in the case of a transporter), while at the same time being more interpretable than more complex models. Subsequently, we based our analysis on transporter data only. By removing the effect of other genes in the models, we could prioritize compounds that show a stronger dependency on transporters, as well as to analyze potential cooperative interactions among them. We used in this case Elastic Net regression, a generalization of the LASSO that overcomes some of its limitations and that has already been applied in similar contexts (Zou and Hastie, 2005;Barretina et al., 2012;Iorio et al., 2016).
We identify a large set of drug-transporter associations roughly split between sensitivity and resistance relationships (Figure 3 and Tables 1, 2). Known associations involving, for example, ABCB1 expression levels with increasing resistance to several unrelated compounds as well as known interactions such as the associations between antifolates and SLC19A1 or YM155 and SLC35F2 were clearly identified. Interestingly, we also observed cases were, similarly to ABCB1, a single gene was associated with several compounds, possibly as a result of an alteration of the general metabolic state of the cell. We also observed the opposite scenario, with several genes associated with a functionally related class of compounds as in the case of the MEK inhibitors and the genes SLC45A2, SLC27A1, SLC20A1, and SLC22A15. To our knowledge, no transporter has so far been identified for any member of this class of compounds, and while the association with the skin-specific SLC45A2 transporter is likely the result of the high sensitivity of melanoma cell lines to these drugs, other associations are more difficult to interpret.
We propose the gene list reported here as a means of prioritizing transporters that could explain the transport and pharmacodynamics of the associated compounds. While in many cases these associations could be due to indirect effects, such as a change in the metabolic state of the cells that renders them more sensitive or resistant to a compound, some might correspond to actual import or export processes. Further validation, for example, modulating the expression levels of the transporters or by transport assays, will be necessary in order to confirm and distinguish such different scenarios. Finally, the power of the analysis could also be increased by larger datasets, for instance including additional compounds, as well as by orthogonal or more accurate measurements. Availability of such pharmacogenomics datasets will be of critical importance for the further identification and characterization of transporter-drug associations.
In conclusion, we provide here an overview of the known ABC-and SLC-based drug transport relationships and expand this with an in silico-derived ranking of transporter-drug associations, identifying several novel and potential interesting cases. On the one hand, these new interactions offer new insights into the mode of drug transport across membranes, as well as provide initial structure activity relationships (SARs) for natural ligands, still unknown for many of these transporters. On the other hand, as many of the compounds involved in our analysis are clinically approved or candidates for oncological treatments, we hope that this study will provide novel hypotheses that could illuminate how transporters affect their pharmacodynamics and pharmacokinetics, as well as point to potential interactions with other compounds transported by the same proteins (e.g., in combination treatments), eventually leading to more specific and effective therapies.

AUTHOR CONTRIBUTIONS
AC-R and MY performed the data analysis. EG, MB, JS-R, and GS-F provided scientific insight and project supervision. AC-R, EG, MY, JS-R, and GS-F wrote the manuscript.

FUNDING
CeMM and the Superti-Furga laboratory were supported by the Austrian Academy of Sciences (GS-F and AC-R).

ACKNOWLEDGMENTS
We acknowledge receipt of third-party funds from the Austrian Science Fund (FWF I2192-B22 ERASE, AC-R and FWF P29250-B30 VITRA, EG), and from the JRC for Computational Biomedicine which was partially funded by Bayer AG.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar. 2018.01011/full#supplementary-material FIGURE S1 | PubMed search of drug gene associations.