Abstract
Gene expression profiles of tissues treated with drugs have recently been used to infer clinical outcomes. Although this approach is often successful from an application point of view, the gene expression altered by drugs is rarely analyzed in detail because of the extremely large number of genes involved. Here, we applied tensor decomposition (TD)-based unsupervised feature extraction (FE) to the gene expression profiles of 24 mouse tissues treated with 15 drugs. TD-based unsupervised FE enabled identification of the common effects of the 15 drugs, including an interesting universal feature: these drugs affect genes in a gene-group-wide manner, and their effects depend on three tissue types (neuronal, muscular, and gastroenterological). For each tissue group, TD-based unsupervised FE enabled identification of a few tens to a few hundreds of genes affected by the drug treatments. These genes are expressed distinctly between drug treatments and controls, as well as between the tissues in each tissue group and the other tissues. We also validated the assignment of genes to individual tissue groups using multiple enrichment analyses. We conclude that TD-based unsupervised FE is a promising method for the integrated, completely unsupervised analysis of gene expression profiles from multiple tissues treated with multiple drugs.
Background
Drug design is a time-consuming and expensive process. Multiple coordinated experimental efforts, involving large-scale trial-and-error methods, are required to investigate new compounds. In general, this is due to the inherent difficulties in identifying novel therapeutic targets such as genes that cause disease. Even where potential target genes are identified robustly, it is difficult to find drug candidate compounds that successfully bind to the proteins they encode.
Computer-based methods have been introduced to shorten drug development and reduce its cost. The two major computer-aided drug design strategies are ligand-based drug design (LBDD) and structure-based drug design (SBDD). LBDD has several advantages, including lower computational requirements and higher success rates. However, it has limited ability to find drug candidate compounds with low structural similarity to known drugs. SBDD compensates for this weakness: it is better at finding candidate compounds that are structurally dissimilar to known drugs, because it screens candidates by investigating whether they can bind to target proteins. The weak point of SBDD is that it requires massive computational resources, which prevents its application to large-scale screening, in which candidate drug compounds often number several million.
Considering the relatively low cost of obtaining gene expression profiles, a third computer-aided strategy has been proposed: gene expression profile-based drug design. In this strategy, the gene expression profiles of tissues/cell lines treated with candidate drug compounds are collected. The collected profiles are then compared with those of tissues/cell lines treated with known drug compounds. If the candidate drug compounds share a gene expression profile to some extent with known drug compounds, they are identified as having therapeutic potential against target diseases/proteins.
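Comparing a candidate's profile with those of known drugs can be as simple as correlating expression signatures. The sketch below is a minimal illustration under stated assumptions, not a method from the studies cited here: each profile is assumed to be reduced to a signature over the same genes (for example, log fold-changes against control), Pearson correlation is used as the similarity score, and the drug names and values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length expression signatures."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rank_known_drugs(candidate, known):
    """Rank known drugs by the similarity of their signatures to the candidate's."""
    scores = {name: pearson(candidate, sig) for name, sig in known.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical log fold-changes over the same four genes.
candidate = [1.2, -0.5, 0.8, -1.0]
known = {
    "drug_A": [1.0, -0.4, 0.9, -1.1],  # changes in the same direction
    "drug_B": [-1.1, 0.6, -0.7, 0.9],  # roughly reversed signature
}
print(rank_known_drugs(candidate, known))
```

A strongly positive score suggests a shared effect on expression; a strongly negative score marks a reversed signature.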
Some databases have been established to support the use of gene expression profiles in drug design. For example, the Chemical Checker (Duran-Frigola et al., 2020) incorporates gene expression into computer-aided drug design, whereas PharmacoDB (Smirnov et al., 2017) is designed to take into account the dose dependence of drug-treated cell lines. Many papers have been published on the use of gene expression profiles for computer-aided drug design (Chengalvala et al., 2007; Bates, 2011). For instance, Huang et al. (2019) applied combinatorial analysis of drug-induced gene expression to cancer drugs and confirmed the predictions experimentally in vitro. Lee et al. (2017) proposed DeSigN, a robust and useful method for identifying candidate drugs from an input gene signature obtained by gene expression analysis. Kim et al. (2019) performed computational drug repositioning for gastric cancer based on the reversal of gene expression profiles, and De Wolf et al. (2018) analyzed high-throughput gene expression profiles to identify similarities between drugs and to predict compound activity. Hodos et al. (2017) filled in missing gene expression observations in drug-treated cells by predicting cell-specific drug perturbation profiles from available expression data under related conditions. Pabon et al. (2018) predicted protein targets for drug-like compounds using transcriptomics. Liu et al. (2017) performed a comparative analysis of genes frequently regulated by drugs based on Connectivity Map transcriptome data.
Despite these successful applications of gene expression profiles to computer-aided drug design, it remains unclear how the expression of individual genes is affected by drug treatment. One reason is that the number of genes whose expression depends on dose is comparable to the total number of expressed genes. Thus, it is not easy to devise a method that integrates and interprets these dose-dependent genes across individual gene expression profiles. For example, Lukačišin and Bollenbach (2019) employed principal component analysis (PCA) to integrate the dose dependence of gene expression profiles under combinatorial drug treatment. They reported a convex (non-monotonic) dependence on dose and interpreted it as evidence of cooperative effects of dual drug treatments. However, a convex dependence on dose was also observed for single-drug treatment when tensor decomposition (TD) was employed to integrate multiple gene expression profiles of cell lines treated with a single drug (Taguchi, 2019). Thus, it is important first to identify an effective method for integrating numerous gene expression profiles of tissues/cell lines treated with drugs.
Recently, Kozawa et al. (2020) used the gene expression profiles of mouse tissues treated with drugs to predict human clinical outcomes. In this paper, we applied TD-based unsupervised feature extraction (FE) to the gene expression profiles used in their study and attempted to identify the changes in gene expression profiles of mouse tissues treated with individual drugs.
Methods and Materials
Figure 1 shows the flow chart of analysis.
Figure 1
Gene Expression Profiles
The gene expression profiles used in this study were downloaded from the Gene Expression Omnibus (GEO) under accession number GSE142068. Twenty-four profiles named “GSE142068_count_XXXXX.txt.gz” were downloaded, where “XXXXX” indicates one of the 24 tissues, i.e., AdrenalG, Aorta, BM (bone marrow), Brain, Colon, Eye, Heart, Ileum, Jejunum, Kidney, Liver, Lung, Pancreas, ParotidG, PituitaryG, SkMuscle, Skin, Skull, Spleen, Stomach, Testis, Thymus, ThyroidG, and WAT (white adipose tissue). These tissues were treated with 15 drugs (Alendronate, Acetaminophen, Aripiprazole, Asenapine, Cisplatin, Clozapine, Doxycycline, Empagliflozin, Lenalidomide, Lurasidone, Olanzapine, Evolocumab, Risedronate, Sofosbuvir, and Teriparatide), along with wild-type (WT) controls.
TD-Based Unsupervised FE
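To apply TD-based unsupervised FE (Taguchi, 2020) to the downloaded gene expression profiles, they must be formatted as a tensor. In this analysis, they were formatted as a tensor $x_{ijkm} \in \mathbb{R}^{N \times 24 \times 18 \times 2}$ for $N$ genes, 24 tissues, 18 drug treatments, and two replicates. Then, the higher-order singular value decomposition (HOSVD) algorithm (Taguchi, 2020) was applied to $x_{ijkm}$, yielding the TD

$$x_{ijkm} = \sum_{\ell_1=1}^{24} \sum_{\ell_2=1}^{18} \sum_{\ell_3=1}^{2} \sum_{\ell_4=1}^{N} G(\ell_1, \ell_2, \ell_3, \ell_4)\, u_{\ell_1 j}\, u_{\ell_2 k}\, u_{\ell_3 m}\, u_{\ell_4 i} \qquad (1)$$

where $G \in \mathbb{R}^{24 \times 18 \times 2 \times N}$ is the core tensor, and $u_{\ell_1 j} \in \mathbb{R}^{24 \times 24}$, $u_{\ell_2 k} \in \mathbb{R}^{18 \times 18}$, $u_{\ell_3 m} \in \mathbb{R}^{2 \times 2}$, and $u_{\ell_4 i} \in \mathbb{R}^{N \times N}$ are singular value matrices, which are also orthogonal matrices. $x_{ijkm}$ is standardized as $\sum_i x_{ijkm} = 0$ and $\sum_i x_{ijkm}^2 = N$.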
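Formatting the 24 per-tissue files into xijkm and standardizing it can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: the column order inside each file (treatment-major, replicate-minor) is a hypothetical assumption, and "standardized" is taken to mean zero mean and unit variance over genes for each combination of tissue, treatment, and replicate, so that the sum over i is 0 and the sum of squares is N. Random Poisson counts stand in for the real data.

```python
import numpy as np

N_TREATMENTS, N_REPLICATES = 18, 2

def build_tensor(per_tissue):
    """Stack per-tissue count matrices into x[i, j, k, m].

    per_tissue: list of arrays, one per tissue j, each of shape (N, 18 * 2).
    Hypothetical column order: column k * 2 + m holds treatment k, replicate m.
    """
    slices = []
    for counts in per_tissue:
        counts = np.asarray(counts, dtype=float)
        slices.append(counts.reshape(counts.shape[0], N_TREATMENTS, N_REPLICATES))
    return np.stack(slices, axis=1)  # shape (N, n_tissues, 18, 2)

def standardize(x):
    """Scale every (j, k, m) column over genes i to zero mean and unit variance,
    so that sum_i x[i, j, k, m] = 0 and sum_i x[i, j, k, m]**2 = N."""
    mean = x.mean(axis=0, keepdims=True)
    sd = x.std(axis=0, keepdims=True)  # population SD (ddof=0) gives sum of squares N
    return (x - mean) / sd

rng = np.random.default_rng(1)
per_tissue = [rng.poisson(5.0, size=(50, 36)) for _ in range(24)]  # toy counts
x = standardize(build_tensor(per_tissue))
print(x.shape)  # (50, 24, 18, 2)
```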
Mathematically, Equation (1) decomposes the dependence of xijkm on i, j, k, m into a sum of products of uℓ1j, uℓ2k, uℓ3m, and uℓ4i, each of which is supposed to represent the dependence on j, k, m, and i, respectively. Because a single product of uℓ1j, uℓ2k, uℓ3m, and uℓ4i is unlikely to reproduce xijkm, various combinations of uℓ1j, uℓ2k, uℓ3m, and uℓ4i must be considered, where those with distinct ℓ1, ℓ2, ℓ3, ℓ4 are supposed to represent distinct dependences on i, j, k, m. These products are then summed with the weights G to reproduce xijkm. Biologically, there is no guarantee that uℓ1j, uℓ2k, uℓ3m, and uℓ4i represent biological aspects, because Equation (1) is simply a mathematical hypothesis; therefore, their association with biological aspects must be validated after TD is performed.
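The decomposition in Equation (1) can be computed with a standard HOSVD: each factor matrix is taken from the SVD of the corresponding mode unfolding, and the core tensor is obtained by projecting x onto those factors. The NumPy sketch below is a generic implementation of that algorithm, not the authors' code, and runs on random toy data. Axes follow xijkm (gene, tissue, treatment, replicate), so with 1-based ℓs, G(ℓ1, ℓ2, ℓ3, ℓ4) is core[ℓ4 − 1, ℓ1 − 1, ℓ2 − 1, ℓ3 − 1] and uℓ1j is u_tissue[j, ℓ1 − 1]. With full_matrices=False, at most min(N, 24 × 18 × 2) gene vectors are kept, which still reproduces x exactly. Singular vectors are defined only up to sign.

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: axis `mode` becomes the rows, all other axes the columns."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def mode_product(x, matrix, mode):
    """Multiply tensor x along axis `mode` by `matrix` (shape: new_size x x.shape[mode])."""
    result = np.tensordot(matrix, np.moveaxis(x, mode, 0), axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(x):
    """Return (core, factors) such that x = core x_0 U_0 x_1 U_1 ... x_n U_n."""
    # Left singular vectors of every unfolding give the factor matrices.
    factors = [np.linalg.svd(unfold(x, mode), full_matrices=False)[0]
               for mode in range(x.ndim)]
    core = x
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto the singular vectors
    return core, factors

def reconstruct(core, factors):
    """Evaluate the sum in Equation (1) by multiplying the core by every factor."""
    x = core
    for mode, u in enumerate(factors):
        x = mode_product(x, u, mode)
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 24, 18, 2))  # toy stand-in for x[i, j, k, m]
core, factors = hosvd(x)
u_gene, u_tissue, u_drug, u_rep = factors
print(core.shape, np.allclose(reconstruct(core, factors), x))
```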
To understand how gene expression profiles are altered by drug treatment in a tissue-group-wide manner, we first need to investigate uℓ1j, uℓ2k, and uℓ3m. After identifying which ℓ1, ℓ2, and ℓ3 are biologically interesting, we select the ℓ4 for which G(ℓ1, ℓ2, ℓ3, ℓ4) has the largest absolute values with ℓ1, ℓ2, and ℓ3 fixed, because the uℓ4i associated with such ℓ4 is supposed to represent the weights of genes i whose expression follows the j, k, m dependence represented by the selected uℓ1j, uℓ2k, and uℓ3m.
Using the identified uℓ4i, the P-value Pi is attributed to gene i as

$$P_i = P_{\chi^2}\left[ > \left( \frac{u_{\ell_4 i}}{\sigma_{\ell_4}} \right)^2 \right]$$

where $P_{\chi^2}[>x]$ is the cumulative probability of the χ2 distribution (one degree of freedom) that the argument is larger than x, and σℓ4 is the standard deviation of uℓ4i. Here, we assume that uℓ4i obeys a Gaussian distribution with zero mean because $\sum_i u_{\ell_4 i} = 0$. Pi is corrected via the Benjamini-Hochberg (BH) criterion (Burgos et al., 2014), and I, the set of genes i associated with adjusted P-values less than 0.01, is selected. For a more detailed explanation of TD-based unsupervised FE, see the recently published monograph (Taguchi, 2020).
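The gene-selection step can be sketched in plain Python. Under the Gaussian assumption, (uℓ4i/σℓ4)² follows a χ2 distribution with one degree of freedom, whose upper tail equals erfc(√(x/2)), so no statistics library is needed; the Benjamini-Hochberg adjustment is written out explicitly. Computing σℓ4 as the sample standard deviation is our assumption, and the input vector is a toy example rather than a real uℓ4i.

```python
import math
import statistics

def chi2_sf_1dof(value):
    """P(X > value) for X ~ chi-squared with one degree of freedom."""
    return math.erfc(math.sqrt(value / 2.0))

def td_pvalues(u):
    """P_i = P_chi2[> (u_i / sigma)^2], assuming u_i ~ N(0, sigma^2)."""
    sigma = statistics.stdev(u)  # sample standard deviation (assumption)
    return [chi2_sf_1dof((v / sigma) ** 2) for v in u]

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def select_genes(u, threshold=0.01):
    """Indices i whose BH-adjusted P_i falls below the threshold (the set I)."""
    adjusted = bh_adjust(td_pvalues(u))
    return [i for i, q in enumerate(adjusted) if q < threshold]

# Toy singular value vector: 998 small weights plus two outlying genes.
u = [0.01 * (-1) ** i for i in range(998)] + [1.0, -1.0]
print(select_genes(u))  # [998, 999]
```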
t-Test and Wilcoxon Test Applied to Sets of Genes Classified Based on Tissue Groups and Drug Groups
To determine whether the selected set of genes I is expressed distinctly between an assigned tissue group J and the other tissues, we applied a two-sided t test and a Wilcoxon test to $\{x_{ijkm} \mid i \in I, j \in J\}$ and $\{x_{ijkm} \mid i \in I, j \notin J\}$ and computed the P-values. Similar analyses were performed for a drug group K, comparing $\{x_{ijkm} \mid i \in I, k \in K\}$ and $\{x_{ijkm} \mid i \in I, k \notin K\}$.
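A minimal SciPy sketch of these comparisons follows, assuming xijkm is held as a NumPy array of shape (N, 24, 18, 2). The Wilcoxon test between two independent sets of values is taken here to be the Wilcoxon rank-sum test, computed with scipy.stats.mannwhitneyu (an equivalent test), and scipy.stats.ttest_ind is used with its default equal-variance setting. These choices, the helper name group_tests, and the simulated data are assumptions rather than the authors' exact settings.

```python
import numpy as np
from scipy import stats

def group_tests(x, genes, members, axis):
    """Compare x restricted to the selected genes I between a group and the rest.

    x       : array of shape (N, 24, 18, 2) indexed as x[i, j, k, m]
    genes   : indices of the selected genes I
    members : group indices (J along axis=1 for tissues, K along axis=2 for drugs)
    Returns two-sided P-values of a t test and a Wilcoxon rank-sum test.
    """
    sub = x[np.asarray(genes)]  # keep only genes in I
    others = np.setdiff1d(np.arange(x.shape[axis]), members)
    inside = np.take(sub, members, axis=axis).ravel()
    outside = np.take(sub, others, axis=axis).ravel()
    t_p = stats.ttest_ind(inside, outside).pvalue
    w_p = stats.mannwhitneyu(inside, outside, alternative="two-sided").pvalue
    return t_p, w_p

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 24, 18, 2))
genes, tissues = list(range(10)), [3, 5, 14]  # hypothetical I and J
x[np.ix_(genes, tissues)] += 2.0              # simulated tissue-group-specific shift
t_p, w_p = group_tests(x, genes, tissues, axis=1)
print(t_p, w_p)
```

The same helper covers drug groups by passing treatment indices with axis=2.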
Enrichment Analysis
The selected genes (gene symbols) were uploaded to Enrichr (Kuleshov et al., 2016) and Metascape (Zhou et al., 2019) in order to validate the various biological functions of the selected genes.
Results
Figure 2 summarizes the results obtained in this study.
Figure 2
Drug Treatment Specificity
After obtaining the TD in Equation (1), we first investigated uℓ2k attributed to the kth drug treatment. Although 15 drugs were tested, the total number of drug treatments was 18 because three additional conditions were tested. Usually, the first singular value vectors represent uniform values (i.e., components that do not differ between samples) (Taguchi, 2020). In this case, u1k does not represent any dependence on drug treatment. This is reasonable because the expression of most genes is unlikely to be affected by drug treatment. We thus considered the second and third singular value vectors, u2k and u3k, attributed to drug treatments (Figure 3). Contrary to expectations, the effects of the drug treatments were quite universal. Most drug treatments were separated from treatments (2), (9), (15), and (17), which include the control treatments, along one direction (red arrow), whereas the diversity among the drug treatments spread perpendicular to that direction (blue arrow). This suggests that gene expression profiles are altered similarly, largely independently of the drug used.
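Why a first singular value vector tends to be uniform can be illustrated with simulated data. The sketch below uses synthetic numbers, not the GSE142068 data: it builds a genes × treatments matrix from one profile shared by all 18 treatments plus small noise. The first right singular vector (the analog of u1k) then has nearly equal entries of magnitude 1/√18, so it carries no treatment dependence, and its singular value dominates the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_treatments = 1000, 18

# Every treatment shares the same gene-wise profile; only small noise differs.
shared_profile = rng.normal(size=n_genes)
x = np.outer(shared_profile, np.ones(n_treatments))
x += 0.1 * rng.normal(size=(n_genes, n_treatments))

_, s, vt = np.linalg.svd(x, full_matrices=False)
v1 = vt[0]  # first singular value vector over treatments (analog of u_1k)

print(np.round(np.abs(v1), 3))  # entries close to 1/sqrt(18), about 0.236
print(s[0] / s[1])              # the first component dominates
```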
Figure 3
Tissue-Specificity
We further studied how these universal drug effects relate to individual tissues. For this, we investigated uℓ1j attributed to the 24 tissues and found that several uℓ1j show tissue-group-wide patterns (Figure 4). The tissue specificity represented by each singular value vector is as follows: u1j does not show any tissue specificity; u2j has larger absolute values for the brain, eye, pituitary, and testis, and is therefore likely to represent neuronal tissue specificity; u3j has larger absolute values only for the parotid, so we did not consider it further; u4j has larger absolute values for the heart and SkMuscle, so we considered it to represent muscle specificity; and u5j and u6j have larger absolute values for the pancreas and stomach, so we considered them to represent gastrointestinal tissue specificity. These groupings of tissues are biologically reasonable.
Figure 4
To identify the singular value vectors attributed to genes, uℓ4i, to be used for gene selection, we then checked for which ℓ4 G(ℓ1, 2, 1, ℓ4) and G(ℓ1, 3, 1, ℓ4) have larger absolute values (Table 1); ℓ3 was fixed to 1 because u1m always takes the same value for both replicates.
Table 1
| ℓ4 | G(2, 2, 1, ℓ4) | G(2, 3, 1, ℓ4) | G(4, 2, 1, ℓ4) | G(4, 3, 1, ℓ4) |
|---|---|---|---|---|
| 1 | 131.248442 | 19.7819438 | −98.4349019 | −13.498228 |
| 2 | **−173.243689** | **−23.9915660** | −4.8528076 | 1.113899 |
| 3 | −11.859736 | −3.2551088 | −0.1595594 | −1.116396 |
| 4 | 13.669561 | 2.4373120 | **−81.3734282** | **36.838277** |
| 5 | 26.610843 | −0.3136913 | −22.2440356 | 9.820737 |
| 6 | −1.275395 | 4.5339065 | −1.3753621 | −5.318282 |
| 7 | −18.306263 | 15.9791077 | 21.3673134 | −11.230437 |
| 8 | 20.891762 | 26.5918473 | 3.9733331 | −7.152480 |
| 9 | 21.836494 | 16.1461476 | 9.2972447 | 2.232529 |
| 10 | 11.717415 | −12.8960548 | 1.4137802 | −7.748038 |

| ℓ4 | G(5, 2, 1, ℓ4) | G(5, 3, 1, ℓ4) | G(6, 2, 1, ℓ4) | G(6, 3, 1, ℓ4) |
|---|---|---|---|---|
| 1 | 97.897860 | −42.9481806 | 72.181307 | 27.3218396 |
| 2 | 9.267391 | 4.5503920 | 3.780984 | 6.3881436 |
| 3 | −3.744432 | 0.2003586 | 2.340165 | −0.2130656 |
| 4 | 1.648558 | 3.4031386 | −9.812308 | 2.8751439 |
| 5 | **93.027741** | **−56.9322793** | 6.435061 | 8.5776220 |
| 6 | −57.463765 | 23.2247109 | **−19.332916** | **34.1868710** |
| 7 | 28.276681 | −26.9479131 | **30.604535** | **−18.8319412** |
| 8 | 12.884351 | −13.8270607 | 1.798188 | −10.6484624 |
| 9 | −5.865058 | 1.0216563 | 9.581512 | 0.3507831 |
| 10 | 15.683762 | 3.7893181 | −14.429706 | −4.7985105 |
G(ℓ1, 2, 1, ℓ4) and G(ℓ1, 3, 1, ℓ4) for ℓ1 = 2, 4, 5, 6.
Values in bold correspond to those of ℓ4s used for gene selection with uℓ4i.
For ℓ1 = 2, which is supposed to be attributed to neuron-specific tissues (u2j), Gs with ℓ4 = 2 have larger absolute values; thus, u2i was employed for neuron-specific gene selection. For ℓ1 = 4, which is supposed to be attributed to muscle-specific tissues (u4j), Gs with ℓ4 = 4 have larger absolute values; thus, u4i was employed for muscle-specific gene selection. For ℓ1 = 5, which is supposed to be attributed to gastrointestinal-specific tissues (u5j), Gs with ℓ4 = 5 have larger absolute values; thus, u5i was employed for gastrointestinal-specific gene selection. For ℓ1 = 6, which is also supposed to be attributed to gastrointestinal-specific tissues (u6j), Gs with ℓ4 = 6 and 7 have larger absolute values; thus, u6i and u7i were employed for gastrointestinal-specific gene selection.
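The choice of ℓ4 can be re-checked against the values in Table 1. The sketch below takes, for each ℓ4, the larger |G| of the two drug-treatment components (ℓ2 = 2 and 3) and ranks ℓ4 by it. Table 1 shows that ℓ4 = 1 has the largest |G| for ℓ1 = 4, 5, and 6, so the sketch sets ℓ4 = 1 aside; we assume this reflects the tendency of first singular value vectors to be uniform, which the text does not state explicitly for genes. Under that assumption, the sketch reproduces the ℓ4 choices described for each ℓ1.

```python
# Table 1: G(l1, l2, 1, l4) for l4 = 1..10, keyed by (l1, l2).
TABLE1 = {
    (2, 2): [131.248442, -173.243689, -11.859736, 13.669561, 26.610843,
             -1.275395, -18.306263, 20.891762, 21.836494, 11.717415],
    (2, 3): [19.7819438, -23.9915660, -3.2551088, 2.4373120, -0.3136913,
             4.5339065, 15.9791077, 26.5918473, 16.1461476, -12.8960548],
    (4, 2): [-98.4349019, -4.8528076, -0.1595594, -81.3734282, -22.2440356,
             -1.3753621, 21.3673134, 3.9733331, 9.2972447, 1.4137802],
    (4, 3): [-13.498228, 1.113899, -1.116396, 36.838277, 9.820737,
             -5.318282, -11.230437, -7.152480, 2.232529, -7.748038],
    (5, 2): [97.897860, 9.267391, -3.744432, 1.648558, 93.027741,
             -57.463765, 28.276681, 12.884351, -5.865058, 15.683762],
    (5, 3): [-42.9481806, 4.5503920, 0.2003586, 3.4031386, -56.9322793,
             23.2247109, -26.9479131, -13.8270607, 1.0216563, 3.7893181],
    (6, 2): [72.181307, 3.780984, 2.340165, -9.812308, 6.435061,
             -19.332916, 30.604535, 1.798188, 9.581512, -14.429706],
    (6, 3): [27.3218396, 6.3881436, -0.2130656, 2.8751439, 8.5776220,
             34.1868710, -18.8319412, -10.6484624, 0.3507831, -4.7985105],
}

def top_l4(l1, n_top=1, skip_first=True):
    """Rank l4 by the larger |G| over the drug components l2 = 2 and 3."""
    scores = {}
    for l4 in range(1, 11):
        if skip_first and l4 == 1:
            continue  # assumption: l4 = 1 is treated as a uniform, non-specific component
        scores[l4] = max(abs(TABLE1[(l1, l2)][l4 - 1]) for l2 in (2, 3))
    return sorted(scores, key=scores.get, reverse=True)[:n_top]

for l1, n_top in [(2, 1), (4, 1), (5, 1), (6, 2)]:
    print(l1, top_l4(l1, n_top))
# 2 [2]
# 4 [4]
# 5 [5]
# 6 [6, 7]
```

Without the exclusion, top_l4(4, skip_first=False) returns [1], which is why the assumption matters.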
After computing the adjusted P-values attributed to the genes (see Methods), we selected the genes with adjusted Pi < 0.01 (Table 2). The lists of selected genes can be found in the supporting information (Additional File 1). Figure 5 shows a Venn diagram of the selected genes. As expected, the two gastrointestinal-specific gene sets, Gas1 and Gas2, overlap considerably. Apart from these, the selected gene sets are quite distinct from one another. Thus, TD-based unsupervised FE successfully identified genes whose expression was affected by the drugs in a tissue-group-specific manner.
Table 2
| ℓ1 | Tissue specificity | # of genes | Specified tissues | P-value, tissues (t-test) | P-value, tissues (Wilcoxon test) | P-value, drug treatment (t-test) |
|---|---|---|---|---|---|---|
| 2 | Neuron | 18 | Brain, Eye, Pituitary, Testis | 2.14 × 10−24 | 9.65 × 10−49 | 0.22 |
| 4 | Muscle | 51 | Heart, SkMuscle | 1.99 × 10−55 | 2.67 × 10−77 | 0.04 |
| 5 | Gastrointestinal | 97 | Pancreas, Stomach | 8.48 × 10−11 | 2.73 × 10−40 | 8.13 × 10−22 |
| 6 | Gastrointestinal | 128 | Pancreas, Stomach | 6.67 × 10−8 | 8.69 × 10−90 | 8.69 × 10−90 |
Statistical tests for distinct expression between the specified tissues and other tissues, and between drug treatments and controls.
Figure 5
Confirmation of Differential Expression
To check whether the selected genes are expressed distinctly between the specified tissues and other tissues, as well as between drug treatments and controls, we first applied statistical tests to the selected genes (Table 2). Expression differed significantly between the specified tissues and other tissues in all cases, and between drug treatments and controls in all cases except the neuron-specific genes (P = 0.22 by t-test). Thus, TD-based unsupervised FE allowed us to select genes whose expression is consistent with uℓ2k in Figure 3 and uℓ1j in Figure 4.
Biological Evaluation
Next, we evaluated the selected genes biologically. For this purpose, we first uploaded the genes to Metascape (Figure 6). As expected, Gas1 and Gas2 largely shared enriched terms, even though these two gene sets were selected using distinct singular value vectors (u5i for Gas1, and u6i and u7i for Gas2). In particular, two KEGG terms, “mmu04971: Gastric acid secretion” and “mmu04972: Pancreatic secretion,” are shared by Gas1 and Gas2, which are supposed to be pancreas- and stomach-specific. In addition, various muscle-related terms are enriched in the muscle gene set as expected, and “GO:0002088: lens development in camera-type eye” is enriched in the neuronal gene set. All of these results suggest that TD-based unsupervised FE selected biologically reasonable genes.
Figure 6
Figure 7 shows the protein-protein interaction (PPI) networks provided by Metascape. The networks are densely connected, indicating that TD-based unsupervised FE identified gene sets enriched in PPIs. Moreover, Gas1 and Gas2 largely share a PPI network, whereas the neuronal and muscular gene sets each form their own PPI-enriched network. Thus, PPI analysis also indicated that TD-based unsupervised FE identified biologically reasonable genes.
Figure 7
To rule out the possibility that our results were specific to Metascape, we also uploaded the genes selected by TD-based unsupervised FE to Enrichr (Table 3). Enrichr detected at least one tissue-related disease for each of the four sets of tissue-specific genes, supporting the Metascape-based results.
Table 3
| Disease perturbations from GEO up | |||
|---|---|---|---|
| Term | Overlap | P-value | Adjusted P-value |
| Neuron-specific genes | |||
| Amyotrophic lateral sclerosis DOID-332 mouse GSE3343 sample 685 | 5/138 | 1.16 × 10−7 | 9.72 × 10−5 |
| Retinitis Pigmentosa C0035334 mouse GSE128 sample 33 | 5/338 | 9.57 × 10−6 | 4.01 × 10−2 |
| Muscle-specific genes | |||
| Polycystic Ovary Syndrome C0032460 human GSE6798 sample 292 | 21/306 | 2.87 × 10−25 | 2.41 × 10−22 |
| Polycystic ovary syndrome DOID-11612 human GSE8157 sample 880 | 21/325 | 1.03 × 10−24 | 4.32 × 10−22 |
| Insulin Resistance DOID-9352 human GSE36297 sample 581 | 20/290 | 4.52 × 10−24 | 1.26 × 10−21 |
| Neurogenic Muscular Atrophy C0270948 rat GSE2566 sample 396 | 18/208 | 1.98 × 10−23 | 4.15 × 10−21 |
| Nemaline myopathy DOID-3191 mouse GSE3384 sample 976 | 15/150 | 1.65 × 10−20 | 2.77 × 10−18 |
| Psoriasis DOID-8893 mouse GSE27628 sample 822 | 18/346 | 2.06 × 10−19 | 2.88 × 10−17 |
| Nemaline Myopathy C0206157 mouse GSE3384 sample 160 | 16/276 | 5.20 × 10−18 | 6.24 × 10−16 |
| Muscular Dystrophy C0026850 mouse GSE2507 sample 405 | 16/278 | 5.84 × 10−18 | 6.13 × 10−16 |
| Cystic fibrosis DOID-1485 mouse GSE3100 sample 1057 | 17/344 | 5.91 × 10−18 | 5.51 × 10−16 |
| COPD - Chronic obstructive pulmonary disease C0024117 human GSE475 sample 343 | 16/289 | 1.09 × 10−17 | 9.11 × 10−16 |
| Disease perturbations from GEO down | |||
| Term | Overlap | P-value | Adjusted P-value |
| Gas1 genes | |||
| Pancreatitis DOID-4989 mouse GSE3644 sample 513 | 36/238 | 9.21 × 10−45 | 7.73 × 10−42 |
| Skin squamous cell carcinoma DOID-3151 human GSE2503 sample 627 | 37/373 | 5.24 × 10−39 | 2.20 × 10−36 |
| Pancreatic ductal adenocarcinoma DOID-3498 mouse GSE53659 sample 699 | 26/101 | 1.24 × 10−38 | 3.48 × 10−36 |
| Pancreatic invasive intraductal papillary-mucinous carcinoma DOID-8150 human GSE19650 sample 610 | 31/248 | 1.21 × 10−35 | 2.54 × 10−33 |
| Cystic fibrosis DOID-1485 mouse GSE769 sample 1058 | 32/288 | 3.92 × 10−35 | 6.58 × 10−33 |
| Acute pancreatitis C0001339 mouse GSE3644 sample 376 | 28/188 | 2.33 × 10−34 | 3.26 × 10−32 |
| Cystic Fibrosis C0010674 mouse GSE769 sample 428 | 31/275 | 3.34 × 10−34 | 4.00 × 10−32 |
| Chronic phase chronic myelogenous leukemia DOID-8552 human GSE5550 sample 456 | 30/270 | 7.05 × 10−33 | 7.40 × 10−31 |
| Invasive ductal carcinoma DOID-3008 human GSE21422 sample 606 | 31/304 | 8.09 × 10−33 | 7.54 × 10−31 |
| Eczema C0013595 human GSE6012 sample 268 | 26/163 | 1.06 × 10−32 | 8.87 × 10−31 |
| Gas2 genes | |||
| Skin squamous cell carcinoma DOID-3151 human GSE2503 sample 627 | 51/373 | 9.37 × 10−55 | 7.86 × 10−52 |
| Pancreatitis DOID-4989 mouse GSE3644 sample 513 | 45/238 | 1.14 × 10−54 | 4.77 × 10−52 |
| Systemic lupus erythematosus DOID-9074 human GSE10325 sample 691 | 43/210 | 9.36 × 10−54 | 2.62 × 10−51 |
| Systemic lupus erythematosus (SLE) DOID-9074 human GSE36700 sample 512 | 47/294 | 1.50 × 10−53 | 3.15 × 10−51 |
| Invasive ductal carcinoma DOID-3008 human GSE21422 sample 606 | 44/304 | 5.90 × 10−48 | 9.90 × 10−46 |
| Eczema C0013595 human GSE6012 sample 268 | 37/163 | 7.47 × 10−48 | 1.04 × 10−45 |
| Malignant Melanoma C0025202 human GSE3189 sample 117 | 41/250 | 6.75 × 10−47 | 8.09 × 10−45 |
| Chronic phase chronic myelogenous leukemia DOID-8552 human GSE5550 sample 456 | 41/270 | 1.91 × 10−45 | 2.00 × 10−43 |
| Sickle Cell Anemia C0002895 human GSE9877 sample 109 | 37/197 | 1.61 × 10−44 | 1.50 × 10−42 |
| Actinic keratosis C0022602 human GSE2503 sample 350 | 46/429 | 4.69 × 10−44 | 3.94 × 10−42 |
Enrichment analysis for “Disease Perturbations from GEO up” and “Disease Perturbations from GEO down” by Enrichr.
Diseases in bold correspond to those related to specific tissues. Up to the top 10 ranked terms are shown.
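Overlap counts such as 5/138 in Table 3 can be turned into an enrichment P-value with a one-sided hypergeometric test, which is what Fisher's exact test computes for over-representation; Enrichr's P-values are based on Fisher's exact test. The sketch below is illustrative only: the background of 20,000 genes is a hypothetical assumption, so its result is not expected to match Table 3 exactly. The query size of 18 is the number of neuron-specific genes in Table 2.

```python
from math import comb

def enrichment_pvalue(k, n_query, n_term, n_background):
    """One-sided hypergeometric P(X >= k): the chance that n_query genes drawn
    from n_background overlap an n_term-gene library term in at least k genes."""
    upper = min(n_query, n_term)
    hits = sum(comb(n_term, x) * comb(n_background - n_term, n_query - x)
               for x in range(k, upper + 1))
    return hits / comb(n_background, n_query)

# Neuron-specific set (18 genes) vs. the ALS term (overlap 5/138),
# with a hypothetical background of 20,000 genes.
print(enrichment_pvalue(5, 18, 138, 20000))
```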
Discussion
Although it is unclear why the 15 drugs affected the expression of many common genes, previous reports offer some interpretation. Table 4 summarizes previously reported effects of the drugs on neuronal, muscular, and pancreatic or gastric tissues. These reports suggest that most of the drugs affect these three groups of tissues simultaneously.
Table 4
| Drugs | Neuron | Muscle | Pancreas or stomach |
|---|---|---|---|
| Alendronate | Brain calcification (Oliveira and Oliveira, 2016) | Muscle mass (Harada et al., 2015) | Pancreatitis (Hung, 2014) |
| Acetaminophen (APAP) | Brain (Ghanem et al., 2016) | Skeletal muscle (Trappe et al., 2011) | Pancreatitis (Chen et al., 2015) |
| Aripiprazole | Brain activation (Myrick et al., 2010) | Muscle spasms (*) | Pancreatitis (Kiraly and Gunning, 2008) |
| Asenapine | Cognitive and monoamine dysfunction (Elsworth et al., 2012) | Muscle rigidity (*) | — |
| Cisplatin | Prefrontal cortex (Huo et al., 2018) | Muscle atrophy (Sakai et al., 2014) | Pancreas (Yadav, 2019) |
| Clozapine | Brain (Li et al., 2014) | Myotoxicity (Reznik et al., 2000) | Pancreatitis (Bergemann et al., 1999) |
| Doxycycline | Brain (Lucchetti et al., 2018) | Smooth Muscle (Bendeck et al., 2002) | Acute pancreatitis (Rawla and Raj, 2017) |
| Empagliflozin | Neurovascular unit and neuroglia (Hayden et al., 2019) | Muscle sympathetic nerve activity (Jordan et al., 2017) | Pancreatitis (Kishimoto et al., 2019) |
| Lenalidomide | Memory loss (Rollin-Sillaire et al., 2013) | Muscle cramp (Reece et al., 2012) | Pancreatic cancer (Ullenhag et al., 2017) |
| Lurasidone | Acute schizophrenia (Yasui-Furukori, 2012) | Muscle (*) | — |
| Olanzapine | Brain stem (Anwar et al., 2016) | Acute muscle toxicity (Keyal et al., 2017) | Pancreatitis (Kerr et al., 2007) |
| Repatha (Evolocumab) | — | Muscle-related statin Intolerance (Nissen et al., 2016) | — |
| Risedronate (Actonel) | Ocular myasthenia (Raja et al., 2007) | Muscle weakness (Badayan and Cudkowicz, 2009) | Gastrointestinal cancer (Vinogradova et al., 2013) |
| Sofosbuvir | Ocular surface (Salman, 2016) | Myositis (Patel et al., 2015) | Pancreatitis (Margapuri and Jubbal, 2019) |
| Teriparatide | — | Muscle cramp (Kakaria et al., 2005) | Pancreatitis (*) |
Previously reported drug effects on neuron (brain and eye), muscle and pancreas tissues.
(*) Reported side effects.
Our approach contrasts with that of the study that inspired our work (Kozawa et al., 2020). Although Kozawa et al. (2020) also aimed to infer the therapeutic and side effects of drug treatments in humans from gene expression in drug-treated tissues, they employed a fully supervised approach that requires prior knowledge, which TD-based unsupervised FE does not need. In this sense, our approach offers possibilities that the original study could not achieve.
In addition to the above-mentioned biological advantage, TD-based unsupervised FE also has methodological advantages. First, we classified the 24 tissues into two groups based on the singular value vectors attributed to tissues, uℓ1j (Figure 4), before identifying differentially expressed genes. Other methods cannot feasibly do this, because there is no criterion for dividing 24 tissues into two groups before searching for differentially expressed genes, and analyzing all possible divisions is practically impossible because they number in the millions. The same advantage applies to grouping the 18 drug treatments into two. This may seem much easier than classifying tissues, because some of the drug treatments are obviously controls. Nevertheless, based on the second and third singular value vectors attributed to drug treatments, u2k and u3k (Figure 3), acetaminophen (APAP) and sofosbuvir are grouped together with two control treatments. Such a classification could hardly be proposed without TD. In this sense, there is no computationally feasible method that can compete with our method.
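The statement that the possible divisions number in the millions can be checked with a short calculation. The sketch counts unordered divisions of n labeled items into two non-empty groups of any size; this counting convention is our assumption, since the text does not fix the group sizes.

```python
def n_two_group_splits(n):
    """Number of ways to divide n labeled items into two non-empty, unlabeled groups."""
    # 2**n assignments, minus the 2 that leave one group empty, halved because
    # swapping the two group labels yields the same division.
    return (2 ** n - 2) // 2

print(n_two_group_splits(24))  # 8388607 divisions of the 24 tissues
print(n_two_group_splits(18))  # 131071 divisions of the 18 drug treatments
```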
The biological basis for the two groups of drugs seen in Figure 3 may be questioned. To clarify this point, we uploaded the two groups of drugs to DrugEnrichr (Kuleshov et al., 2020), which evaluates the overlap of genes targeted by the uploaded drugs (Additional File 2). Based on the “Geneshot Predicted from Co-expression” category in DrugEnrichr, we found 164 genes targeted by both drugs in group 1 (APAP and sofosbuvir), whereas 213 genes are targeted by at least two of the 13 drugs in group 2 (Alendronate, Aripiprazole, Asenapine, Cisplatin, Clozapine, Doxycycline (Dox), Empagliflozin (EMPA), Lenalidomide, Lurasidone, Olanzapine, Repatha, Risedronate, and Teriparatide). Because these 164 and 213 genes have no genes in common, the two groups of drugs appear to be quite distinct. Thus, the groups of drugs identified by TD-based unsupervised FE are likely based on the genes that the drugs target.
In view of the two above-mentioned advantages, TD-based unsupervised FE might yield completely distinct outcomes that other supervised methods cannot, and it therefore represents a worthwhile primary or supplementary approach to gene-expression-based investigation of drug effects.
One might wonder whether the results rest on a single experiment. Because Table 3 shows agreement between the present results and other studies, the results of this study do not depend on a single experiment but are consistent with numerous studies in public databases.
Moreover, TD-based unsupervised FE is a very useful strategy for repositioning known drugs. As shown in Figure 3, TD-based unsupervised FE can determine the effective tissue. Furthermore, as indicated in Table 3, the genes selected by TD-based unsupervised FE can indicate the diseases for which the drugs have potential effectiveness. Therefore, applying TD-based unsupervised FE to gene expression profiles altered by drug treatments can be a promising strategy to repurpose known drugs for new diseases.
Conclusions
In this paper, we applied TD-based unsupervised FE (Taguchi, 2020) to the gene expression profiles of 24 mouse tissues treated with 15 drugs. Integrated analysis allowed us to identify the universal nature of drug treatments in a tissue-group-wide manner, which is generally impossible to identify using any other supervised strategy that requires prior information.
Statements
Data availability statement
All datasets analyzed in this study were obtained from GEO: GSE142068.
Author contributions
Y-hT planned and performed the study. Y-hT and TT discussed the results and wrote the paper. All authors contributed to the article and approved the submitted version.
Funding
This study was supported by KAKENHI 19H05270, 20K12067, and 20H04848. This project was also funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. KEP-8-611-38. The authors, therefore, acknowledge DSR with thanks for providing technical and financial support.
Acknowledgments
The authors would like to thank the reviewers for very constructive comments and thoughtful suggestions. This manuscript has been released as a pre-print at BioRxiv (Taguchi and Turki, 2020).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.00695/full#supplementary-material
Additional File 1: List of genes selected by TD-based unsupervised FE, as summarized in Table 2 (xlsx).
Additional File 2: List of genes predicted by DrugEnrichr whose expression is likely affected by the drugs investigated in this study (xlsx).
References
1
AnwarI. J.MiyataK.ZsombokA. (2016). Brain stem as a target site for the metabolic side effects of olanzapine. J. Neurophysiol.115, 1389–1398. 10.1152/jn.00387.2015
2
BadayanI.CudkowiczM. E. (2009). Profound muscle weakness and pain after one dose of Actonel. Case Rep. Med.2009:693014. 10.1155/2009/693014
3
BatesS. (2011). The role of gene expression profiling in drug discovery. Curr. Opin. Pharmacol.11, 549–556. 10.1016/j.coph.2011.06.009
4
BendeckM. P.ConteM.ZhangM.NiliN.StraussB. H.FarwellS. M. (2002). Doxycycline modulates smooth muscle cell growth, migration, and matrix remodeling after arterial injury. Am. J. Pathol.160, 1089–1095. 10.1016/S0002-9440(10)64929-2
5
BergemannN.EhrigC.DieboldK.MundtC.EinsiedelR. (1999). Asymptomatic pancreatitis associated with clozapine. Pharmacopsychiatry32, 78–80. 10.1055/s-2007-979197
6
BurgosK.MalenicaI.MetpallyR.CourtrightA.RakelaB.BeachT.et al. (2014). Profiles of extracellular miRNA in cerebrospinal fluid and serum from patients with Alzheimer's and Parkinson's diseases correlate with disease status and features of pathology. PLoS ONE9:e94839. 10.1371/journal.pone.0094839
7
ChenS.-J.LinC.-S.HsuC.-W.LinC.-L.KaoC.-H. (2015). Acetaminophen poisoning and risk of acute pancreatitis. Medicine94:e1195. 10.1097/MD.0000000000001195
8
ChengalvalaM. V.ChennathukuzhiV. M.JohnstonD. S.StevisP. E.KopfG. S. (2007). Gene expression profiling and its practice in drug development. Curr. Genomics8, 262–270. 10.2174/138920207781386942
9. De Wolf, H., Cougnaud, L., Van Hoorde, K., De Bondt, A., Wegner, J. K., Ceulemans, H., et al. (2018). High-throughput gene expression profiles to define drug similarity and predict compound activity. ASSAY Drug Dev. Technol. 16, 162–176. doi: 10.1089/adt.2018.845
10. Duran-Frigola, M., Pauls, E., Guitart-Pla, O., Bertoni, M., Alcalde, V., Amat, D., et al. (2020). Extending the small molecule similarity principle to all levels of biology with the Chemical Checker. Nat. Biotechnol. 34:591. doi: 10.1038/s41587-020-0564-6
11. Elsworth, J. D., Groman, S. M., Jentsch, J. D., Valles, R., Shahid, M., Wong, E., et al. (2012). Asenapine effects on cognitive and monoamine dysfunction elicited by subchronic phencyclidine administration. Neuropharmacology 62, 1442–1452. doi: 10.1016/j.neuropharm.2011.08.026
12. Ghanem, C. I., Pérez, M. J., Manautou, J., and Mottino, A. D. (2016). Acetaminophen from liver to brain: new insights into drug pharmacological action and toxicity. Pharmacol. Res. 109, 119–131. doi: 10.1016/j.phrs.2016.02.020
13. Harada, A., Ito, S., Matsui, Y., Sakai, Y., Takemura, M., Tokuda, H., et al. (2015). Effect of alendronate on muscle mass: Investigation in patients with osteoporosis. Osteopor. Sarcopen. 1, 53–58. doi: 10.1016/j.afos.2015.07.005
14. Hayden, M. R., Grant, D. G., Aroor, A. R., and DeMarco, V. G. (2019). Empagliflozin ameliorates type 2 diabetes-induced ultrastructural remodeling of the neurovascular unit and neuroglia in the female db/db mouse. Brain Sci. 9:57. doi: 10.3390/brainsci9030057
15. Hodos, R., Zhang, P., Lee, H.-C., Duan, Q., Wang, Z., Clark, N. R., et al. (2017). Cell-specific prediction and application of drug-induced gene expression profiles, in Biocomputing 2018, eds R. B. Altman, A. K. Dunker, L. Hunter, M. D. Ritchie, T. Murray, and T. E. Klein (Kohala Coast, HI: World Scientific), 32–43.
16. Huang, C.-T., Hsieh, C.-H., Chung, Y.-H., Oyang, Y.-J., Huang, H.-C., and Juan, H.-F. (2019). Perturbational gene-expression signatures for combinatorial drug discovery. iScience 15, 291–306. doi: 10.1016/j.isci.2019.04.039
17. Hung, W. Y. (2014). Contemporary review of drug-induced pancreatitis: A different perspective. World J. Gastrointest. Pathophysiol. 5:405. doi: 10.4291/wjgp.v5.i4.405
18. Huo, X., Reyes, T. M., Heijnen, C. J., and Kavelaars, A. (2018). Cisplatin treatment induces attention deficits and impairs synaptic integrity in the prefrontal cortex in mice. Sci. Rep. 8:17400. doi: 10.1038/s41598-018-35919-x
19. Jordan, J., Tank, J., Heusser, K., Heise, T., Wanner, C., Heer, M., et al. (2017). The effect of empagliflozin on muscle sympathetic nerve activity in patients with type II diabetes mellitus. J. Am. Soc. Hypertens. 11, 604–612. doi: 10.1016/j.jash.2017.07.005
20. Kakaria, P. J., Nashel, D. J., and Nylen, E. S. (2005). Debilitating muscle cramps after teriparatide therapy. Ann. Intern. Med. 142:310. doi: 10.7326/0003-4819-142-4-200502150-00023
21. Kerr, T. A., Jonnalagadda, S., Prakash, C., and Azar, R. (2007). Pancreatitis following olanzapine therapy: a report of three cases. Case Rep. Gastroenterol. 1, 15–20. doi: 10.1159/000104222
22. Keyal, N., Shrestha, G., Pradhan, S., Maharjan, R., Acharya, S., and Marhatta, M. (2017). Olanzapine overdose presenting with acute muscle toxicity. Int. J. Crit. Illness Inj. Sci. 7, 69–71. doi: 10.4103/2229-5151.201962
23. Kim, I.-W., Jang, H., Kim, J. H., Kim, M. G., Kim, S., and Oh, J. M. (2019). Computational drug repositioning for gastric cancer using reversal gene expression profiles. Sci. Rep. 9:2660. doi: 10.1038/s41598-019-39228-9
24. Kiraly, B., and Gunning, K. (2008). A case of pancreatitis associated with aripiprazole in the absence of hyperglycemia. Prim. Care Compan. J. Clin. Psychiatry 10, 484–485. doi: 10.4088/PCC.v10n0612e
25. Kishimoto, M., Yamaoki, K., and Adachi, M. (2019). Combination therapy with empagliflozin and insulin results in successful glycemic control: A case report of uncontrolled diabetes caused by autoimmune pancreatitis and subsequent steroid treatment. Case Rep. Endocrinol. 2019:9415347. doi: 10.1155/2019/9415347
26. Kozawa, S., Sagawa, F., Endo, S., Almeida, G. M. D., Mitsuishi, Y., and Sato, T. N. (2020). Predicting human clinical outcomes using mouse multi-organ transcriptome. iScience 23:100791. doi: 10.1016/j.isci.2019.100791
27. Kuleshov, M., Kropiwnicki, E., and Ma'ayan, A. (2020). DrugEnrichr. Available online at: https://amp.pharm.mssm.edu/DrugEnrichr/
28. Kuleshov, M. V., Jones, M. R., Rouillard, A. D., Fernandez, N. F., Duan, Q., Wang, Z., et al. (2016). Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97. doi: 10.1093/nar/gkw377
29. Lee, B. K. B., Tiong, K. H., Chang, J. K., Liew, C. S., Rahman, Z. A. A., Tan, A. C., et al. (2017). DeSigN: connecting gene expression with therapeutics for drug repurposing and development. BMC Genomics 18:934. doi: 10.1186/s12864-016-3260-7
30. Li, C. H., Stratford, R. E., de Mendizabal, N. V., Cremers, T. I., Pollock, B. G., Mulsant, B. H., et al. (2014). Prediction of brain clozapine and norclozapine concentrations in humans from a scaled pharmacokinetic model for rat brain and plasma pharmacokinetics. J. Transl. Med. 12:203. doi: 10.1186/1479-5876-12-203
31. Liu, X., Zeng, P., Cui, Q., and Zhou, Y. (2017). Comparative analysis of genes frequently regulated by drugs based on connectivity map transcriptome data. PLoS ONE 12:e0179037. doi: 10.1371/journal.pone.0179037
32. Lucchetti, J., Fracasso, C., Balducci, C., Passoni, A., Forloni, G., Salmona, M., et al. (2018). Plasma and brain concentrations of doxycycline after single and repeated doses in wild-type and APP23 mice. J. Pharmacol. Exp. Therap. 368, 32–40. doi: 10.1124/jpet.118.252064
33. Lukačišin, M., and Bollenbach, T. (2019). Emergent gene expression responses to drug combinations predict higher-order drug interactions. Cell Syst. 9, 423–433.e3. doi: 10.1016/j.cels.2019.10.004
34. Margapuri, J., and Jubbal, S. (2019). 902: Acute pancreatitis following treatment with ledipasvir/sofosbuvir for hepatitis virus infection. Crit. Care Med. 47:430. doi: 10.1097/01.ccm.0000551651.45733.80
35. Myrick, H., Li, X., Randall, P. K., Henderson, S., Voronin, K., and Anton, R. F. (2010). The effect of aripiprazole on cue-induced brain activation and drinking parameters in alcoholics. J. Clin. Psychopharmacol. 30, 365–372. doi: 10.1097/JCP.0b013e3181e75cff
36. Nissen, S. E., Stroes, E., Dent-Acosta, R. E., Rosenson, R. S., Lehman, S. J., Sattar, N., et al. (2016). Efficacy and tolerability of evolocumab vs. ezetimibe in patients with muscle-related statin intolerance: the GAUSS-3 randomized clinical trial. JAMA 315, 1580–1590. doi: 10.1001/jama.2016.3608
37. Oliveira, J. R. M., and Oliveira, M. F. (2016). Primary brain calcification in patients undergoing treatment with the biphosphanate alendronate. Sci. Rep. 6:22961. doi: 10.1038/srep22961
38. Pabon, N. A., Xia, Y., Estabrooks, S. K., Ye, Z., Herbrand, A. K., Evelyn, S., et al. (2018). Predicting protein targets for drug-like compounds using transcriptomics. PLoS Comput. Biol. 14:e1006651. doi: 10.1371/journal.pcbi.1006651
39. Patel, S., Trakroo, S., Sanaka, S., and Qureshi, K. (2015). Severe myositis with the use of Sofosbuvir/Ledipasvir for Hepatitis C infection: a case of unexpected interactions. Am. J. Gastroenterol. 110:S333. doi: 10.14309/00000434-201510001-00761
40. Raja, V., Sandanshiv, P., and Neugebauer, M. (2007). Risedronate induced transient ocular myasthenia. J. Postgrad. Med. 53:274. doi: 10.4103/0022-3859.37525
41. Rawla, P., and Raj, J. P. (2017). Doxycycline-induced acute pancreatitis: a rare adverse event. Gastroenterol. Res. 10, 244–246. doi: 10.14740/gr838w
42. Reece, D., Kouroukis, C. T., LeBlanc, R., Sebag, M., Song, K., and Ashkenas, J. (2012). Practical approaches to the use of lenalidomide in multiple myeloma: a Canadian consensus. Adv. Hematol. 2012:621958. doi: 10.1155/2012/621958
43. Reznik, I., Volchek, L., Mester, R., Kotler, M., Sarova-Pinhas, I., Spivak, B., et al. (2000). Myotoxicity and neurotoxicity during clozapine treatment. Clin. Neuropharmacol. 23, 276–280. doi: 10.1097/00002826-200009000-00007
44. Rollin-Sillaire, A., Delbeuck, X., Pollet, M., Mackowiak, M.-A., Lenfant, P., Noel, M.-P., et al. (2013). Memory loss during lenalidomide treatment: a report on two cases. BMC Pharmacol. Toxicol. 14:41. doi: 10.1186/2050-6511-14-41
45. Sakai, H., Sagara, A., Arakawa, K., Sugiyama, R., Hirosaki, A., Takase, K., et al. (2014). Mechanisms of cisplatin-induced muscle atrophy. Toxicol. Appl. Pharmacol. 278, 190–199. doi: 10.1016/j.taap.2014.05.001
46. Salman, A. G. (2016). Ocular surface changes with sofosbuvir in Egyptian patients with Hepatitis C virus infection. Cornea 35, 323–328. doi: 10.1097/ICO.0000000000000736
47. Smirnov, P., Kofia, V., Maru, A., Freeman, M., Ho, C., El-Hachem, N., et al. (2017). PharmacoDB: an integrative database for mining in vitro anticancer drug screening studies. Nucleic Acids Res. 46, D994–D1002. doi: 10.1093/nar/gkx911
48. Taguchi, Y-h. (2019). Drug candidate identification based on gene expression of treated cells using tensor decomposition-based unsupervised feature extraction for large-scale data. BMC Bioinformatics 19:388. doi: 10.1186/s12859-018-2395-8
49. Taguchi, Y-h. (2020). Unsupervised Feature Extraction Applied to Bioinformatics: A PCA Based and TD Based Approach. Switzerland: Springer International Publishing.
50. Taguchi, Y-h., and Turki, T. (2020). Universal nature of drug treatment responses in drug-tissue-wide model-animal experiments using tensor decomposition-based unsupervised feature extraction. bioRxiv 2020.03.08.982405. doi: 10.1101/2020.03.08.982405
51. Trappe, T. A., Carroll, C. C., Dickinson, J. M., LeMoine, J. K., Haus, J. M., Sullivan, B. E., et al. (2011). Influence of acetaminophen and ibuprofen on skeletal muscle adaptations to resistance exercise in older adults. Am. J. Physiol. Regul. Integr. Compar. Physiol. 300, R655–R662. doi: 10.1152/ajpregu.00611.2010
52. Ullenhag, G. J., Mozaffari, F., Broberg, M., Mellstedt, H., and Liljefors, M. (2017). Clinical and immune effects of lenalidomide in combination with gemcitabine in patients with advanced pancreatic cancer. PLoS ONE 12:e0169736. doi: 10.1371/journal.pone.0169736
53. Vinogradova, Y., Coupland, C., and Hippisley-Cox, J. (2013). Exposure to bisphosphonates and risk of common non-gastrointestinal cancers: series of nested case–control studies using two primary-care databases. Br. J. Cancer 109, 795–806. doi: 10.1038/bjc.2013.383
54. Yadav, Y. C. (2019). Effect of cisplatin on pancreas and testies in Wistar rats: biochemical parameters and histology. Heliyon 5:e02247. doi: 10.1016/j.heliyon.2019.e02247
55. Yasui-Furukori, N. (2012). Update on the development of lurasidone as a treatment for patients with acute schizophrenia. Drug Design Dev. Ther. 6, 107–115. doi: 10.2147/DDDT.S11180
56. Zhou, Y., Zhou, B., Pache, L., Chang, M., Khodabakhshi, A. H., Tanaseichuk, O., et al. (2019). Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10:1523. doi: 10.1038/s41467-019-09234-6
Keywords
tensor decomposition, unsupervised learning, gene expression profiles, gene selection, drug treatment
Citation
Taguchi Y and Turki T (2020) Universal Nature of Drug Treatment Responses in Drug-Tissue-Wide Model-Animal Experiments Using Tensor Decomposition-Based Unsupervised Feature Extraction. Front. Genet. 11:695. doi: 10.3389/fgene.2020.00695
Received
20 March 2020
Accepted
05 June 2020
Published
20 August 2020
Volume
11 - 2020
Edited by
Yang Yang, Shanghai Jiao Tong University, China
Reviewed by
Mohit Kumar Jolly, Indian Institute of Science (IISc), India; Vishal Acharya, Institute of Himalayan Bioresource Technology (CSIR), India
Copyright
© 2020 Taguchi and Turki.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Y-h. Taguchi, tag@granular.com
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.