Universal Nature of Drug Treatment Responses in Drug-Tissue-Wide Model-Animal Experiments Using Tensor Decomposition-Based Unsupervised Feature Extraction

Gene expression profiles of tissues treated with drugs have recently been used to infer clinical outcomes. Although this method is often successful from the application point of view, gene expression altered by drugs is rarely analyzed in detail, because of the extremely large number of genes involved. Here, we applied tensor decomposition (TD)-based unsupervised feature extraction (FE) to the gene expression profiles of 24 mouse tissues treated with 15 drugs. TD-based unsupervised FE enabled identification of the common effects of 15 drugs including an interesting universal feature: these drugs affect genes in a gene-group-wide manner and were dependent on three tissue types (neuronal, muscular, and gastroenterological). For each tissue group, TD-based unsupervised FE enabled identification of a few tens to a few hundreds of genes affected by the drug treatment. These genes are distinctly expressed between drug treatments and controls as well as between tissues in individual tissue groups and other tissues. We also validated the assignment of genes to individual tissue groups using multiple enrichment analyses. We conclude that TD-based unsupervised FE is a promising method for integrated analysis of gene expression profiles from multiple tissues treated with multiple drugs in a completely unsupervised manner.

compounds with low structural similarity to known drugs. To compensate for the weaknesses of LBDD, SBDD shows a greater ability to find drug candidate compounds lacking in structural similarity with known drugs. This is because SBDD tries to screen drug candidate compounds by investigating whether these can bind to target proteins. The weak point of SBDD is that it requires massive computational resources, and this prevents its application to large-scale screening, in which candidate drug compounds often number several million.
Considering the relatively low cost of obtaining gene expression profiles, a third computer-aided strategy has been proposed: gene expression profile-based drug design. In this strategy, the gene expression profiles of tissues/cell lines treated with candidate drug compounds are collected. The collected profiles are then compared with those of tissues/cell lines treated with known drug compounds. If the candidate drug compounds share a gene expression profile to some extent with known drug compounds, they are identified as having therapeutic potential against target diseases/proteins. Some databases have been established to assist gene expression profiling for drug design. For example, chemical checker (Duran-Frigola et al., 2020) includes gene expression in computer-aided drug design, whereas PharmacoDB (Smirnov et al., 2017) is fully implemented to consider the dose dependence of drugtreated cell lines for drug design. Many papers have been published on the use of gene expression profiles for computeraided drug design (Chengalvala et al., 2007;Bates, 2011). For instance, Huang et al. (2019) used combinatorial analysis of drug-induced gene expression for cancer drugs, which were then experimentally confirmed in vitro. Lee et al. (2017) proposed DeSigN, a robust and useful method for identifying candidate drugs using an input gene signature obtained from gene expression analysis. Kim et al. (2019) performed computational drug repositioning for gastric cancer using reversal of gene expression profiles, and De Wolf et al. (2018) analyzed highthroughput gene expression profiles to identify similarities between drugs and to predict compound activity. Hodos et al. (2017) tried to fill in missing gene expression observations in cells treated with drugs by predicting cell-specific drug perturbation profiles using available expression data from related conditions. Pabon et al. (2018) predicted protein targets for drug-like compounds using transcriptomics. In contrast, Liu et al. (2017) performed comparative analysis of genes that are frequently regulated by drugs based on connectivity to map transcriptome data.
In contrast to these successful applications of gene expression profile analysis to computer-aided drug design, it is unclear how individual gene expression is affected by drug treatment. First, the number of genes expressed in a dose dependent-manner is as large as the number of genes expressed. Thus, it is not easy to invent a useful method to integrate and understand the dose dependent-genes pertaining to individual gene expression profiles. For example, Lukačišin and Bollenbach (2019) employed principal component analysis (PCA) to integrate the dose dependence of gene expression profiles upon combinatorial drug treatment. They reported a convex (not monotonic) dependence on dose density and identified this as evidence of the cooperative effects of dual drug treatments. Nevertheless, convex dependence on dose was reportedly observed in a single drug treatment if tensor decomposition (TD) was employed to integrate multiple gene expression profiles of cell lines treated with a single drug (Taguchi, 2019). Thus, it is primarily important to identify an effective method that can integrate numerous gene expression profiles of tissues/cell lines treated with drugs.
Recently, Kozawa et al. (2020) used the gene expression profiles of mouse tissues treated with drugs to predict human clinical outcomes. In this paper, we applied TD-based unsupervised feature extraction (FE) to the gene expression profiles used in their study and attempted to identify the changes in gene expression profiles of mouse tissues treated with individual drugs.

TD-Based Unsupervised FE
For applying TD-based unsupervised FE (Taguchi, 2020) to the downloaded gene expression profiles, they must be formatted as a tensor. In this analysis, they were formatted as tensor, x ijkm ∈ R N×24×18×2 , for N genes, 24 tissues, 18 drug treatments, and two replicates. Then, the HOSVD (Taguchi, 2020) algorithm was applied to x ijkm and we obtained TD where G ∈ R N×24×18×2 is the core tensor, u ℓ 1 j ∈ R 24×24 , u ℓ 2 k ∈ R 18×18 ,u ℓ 3 m ∈ R 2×2 , and u ℓ 4 i ∈ R N×N , represents singular value matrices that are also orthogonal matrices. x ijkm is considered to be standardized as i x ijkm = 0 and i x 2 ijkm = N. Mathematically, Equation (1) aims to decompose the dependence of x ijkm upon i, j, k, m into a series of products among u ℓ 1 j , u ℓ 2 k , u ℓ 3 m , and u ℓ 4 i , each of which is supposed to represent the dependence on i, j, k, m. As it is unlikely that a single product of u ℓ 1 j , u ℓ 2 k , u ℓ 3 m , and u ℓ 4 i can reproduce x ijkm , we need to consider various combinations of u ℓ 1 j , u ℓ 2 k , u ℓ 3 m , and u ℓ 4 i , where those associated with distinct ℓ 1 , ℓ 2 , ℓ 3 , ℓ 4 are supposed to be associated with distinct dependence on i, j, k, m. Then, the products of u ℓ 1 j , u ℓ 2 k , u ℓ 3 m , and u ℓ 4 i , must be summed up with the weight of G to reproduce x ijkm . Biologically, we cannot expect that u ℓ 1 j , u ℓ 2 k , u ℓ 3 m , and u ℓ 4 i can represent the biological aspect because Equation (1) is simply a mathematical hypothesis; therefore, their association with a biological aspect after performing TD needs to be validated.
To understand how gene expression profiles are altered by drug treatment in a tissue-group-wide manner, we first need to investigate u ℓ 1 j , u ℓ 2 k , and u ℓ 3 m . After identifying which ℓ 1 , ℓ 2 , and ℓ 3 are biologically interesting, we select ℓ 4 associated with G(ℓ 1 , ℓ 2 , ℓ 3 , ℓ 4 ) that have the largest absolute values with fixed ℓ 1 , ℓ 2 , and ℓ 3 , because u ℓ 4 i associated with such ℓ 4 is supposed to represent the weight of gene i that is expressed in association with j, k, m dependence represented by the selected u ℓ 1 j , u ℓ 2 k , u ℓ 3 m .
Using the identified u ℓ 4 i , the P-value, P i , is attributed to gene i as where P χ 2 [> x] is the cumulative probability of χ 2 distribution and σ ℓ 4 is the standard deviation. Here, we assume that u ℓ 4 i obeys a Gaussian distribution with zero mean because i x ijkm = 0. P i is corrected via the BH criterion (Burgos et al., 2014) and I, a set of genes i associated with adjusted P-values less than 0.01, is selected. For a more detailed explanation of TD-based unsupervised FE, see the recently published monograph (Taguchi, 2020).

t-Test and Wilcoxon Test Applied to Sets of Genes Classified Based on Tissue Groups and Drugs Groups
In order to determine whether the selected set of genes, I, are expressed distinctly between the two assigned tissue groups, J, {x ijkm |i ∈ I, j ∈ J}, and J, {x ijkm |i ∈ I, j ∈ J}, we applied a twoway t test and Wilcoxon test and computed the P-values. Similar analyses were done with two drug groups, K, {x ijkm |i ∈ I, k ∈ K}, and K, {x ijkm |i ∈ I, k ∈ K}.

Enrichment Analysis
The selected genes (gene symbols) were uploaded to Enrichr (Kuleshov et al., 2016) and Metascape (Zhou et al., 2019) in order to validate the various biological functions of the selected genes. Figure 2 summarizes the results obtained in this study.

Drug Treatment Specificity
After obtaining the TD, Equation (1), we first investigated u ℓ 2 k attributed to the kth drug. Although the number of drugs tested is as many as 15, the total number of drug treatments was considered to be 18 due to the testing of three additional conditions. Usually, the first singular value vectors represent uniform values (i.e., components that are not distinct between samples) (Taguchi, 2020). In this case, u 1k does not represent any dependence on drug treatment. This is reasonable because the expression of most genes is unlikely to be affected by drug treatment. We thus considered the second and third singular value vectors, u 2k and u 3k , attributed to drug treatments (Figure 3). In contrast to expectations, the drug treatments were quite universal. Most of the drug treatments [other than (2), (9), (15), and (17)] were separated from the control treatments [(2), (9), (15), and (17)] along one direction (red arrow) whereas the diversity among drug treatments was spread perpendicular (blue arrow) to that direction, only among drug treatments. This suggests that the gene expression profiles are altered similarly, independently of the drug treatment.

Tissue-Specificity
We further studied the relationship of universal drug treatments with individual tissues. For this, we next investigated u ℓ 1 j attributed to 24 tissues. We then found that several u ℓ 1 j are expressed in a tissue-group wide manner (Figure 4). The tissuewide expression pattern identified by singular value vectors is described as follows; As u 1j does not express any tissue specificities, it is unlikely to exhibit tissue specificity; as u 2j has larger absolute values for the brain, eye, pituitary, and testis, it is likely to represent neuronal tissue specificities; as u 3j has larger absolute values only for the parotid, we did not consider it further; as u 4j exhibits larger absolute values for the heart and SkMuscle, we considered that it exhibits muscle specificities; As u 5j and u 6j exhibit larger absolute values for the pancreas and stomach, we considered that it exhibits gastric tissue specificities. It is thus obvious that the combination of tissue specificity is quite reasonable biologically. Aiming to specify singular value vectors attributed to genes, u ℓ 4 i , for gene selection, we then checked which of G(ℓ 1 , 2, 1, ℓ 4 ) and G(ℓ 1 , 3, 1, ℓ 4 ) have larger absolute values, as u 1m always exhibits the same values between two replicates ( Table 1).
For ℓ 1 = 2, which is supposed to be attributed to neuronspecific tissues (u 2j ), Gs with ℓ 4 = 2 have larger absolute values. Thus, u 2i was employed for neuron-specific gene selection. For ℓ 1 = 4, which is supposed to be attributed to muscle-specific tissues (u 4j ), Gs with ℓ 4 = 4 have larger absolute values. Thus, u 4i was employed for muscle-specific gene selection. For ℓ 1 = 5, which is supposed to be attributed to gastrointestinal-specific tissues (u 5j ), Gs with ℓ 4 = 5 have larger absolute values. Thus, FIGURE 4 | Singular value vectors, u ℓ1 j , attributed to tissues. u 1j : no tissue specificity. u 2j : Brain Eye, Pituitary, and Testis, thus mostly neuron-specific. u 3j : Parotid-specific, u 4j : Heart and SkMuscle, thus muscle-specific, u 5j and u 6j : stomach and pancreas, thus, gastrointestinal-specific.
ℓ 1 2 4 ℓ 4 G(2, 2, 1, ℓ 4 ) G(2, 3, 1, ℓ 4 ) G(4, 2, 1, ℓ 4 ) G(4, 3, 1, ℓ 4 )  u 5i was employed for muscle-specific gene selection. For ℓ 1 = 6, which is also supposed to be attributed to gastrointestinalspecific tissues (u 6j ), Gs with ℓ 4 = 6, 7 have larger absolute values. Then, u 6i and u 7i were employed for muscle-specific gene selection. After computing the adjusted P-values, P i , attributed to the genes (see methods), the genes associated with adjusted P i <0.01 were selected ( Table 2). The lists of selected genes can be found in supporting information (Additional File 1). Figure 5 shows a Venn diagram of the selected genes. As expected, two sets of genes, Gas1 and Gas2, which are supposed to be gastrointestinal-specific, are quite common. Other than these, the selected genes are quite distinct from one another. Thus, TD-based unsupervised FE successfully identified the genes whose expression was affected by the drugs in a tissue groupspecific manner.

Confirmation of Differential Expression
In order to check whether the selected genes are expressed distinctly between the specified tissues and other tissues, as well as between drug treatments and controls, we first applied statistical FIGURE 5 | Venn diagram of genes selected by TD-based unsupervised FE. Neuron: genes associated with u 2j , which is supposed to be neuron-specific. Muscle: genes associated with u 4j , which is supposed to be muscle-specific. Gas1 and Gas2: genes associated with u 5j and u 6j , respectively, which are supposed to be gastrointestinal-specific.
tests to the selected genes ( Table 2). The data clearly showed that for all cases, gene expression was distinct between the specified tissues and other tissues as well as between drug treatments and   controls. Thus, TD-based unsupervised FE allowed us to select the genes whose expression is coincident with u ℓ 2 k s in Figure 3 and u ℓ 1 j s in Figure 4.

Biological Evaluation
Next, we evaluated the selected genes biologically. For this purpose, we first uploaded the genes to Metascape (Figure 6). Initially, we noticed that Gas1 and Gas2 largely shared the enriched terms as expected, even though these two gene sets were selected using distinct singular values (u 5i and u 6i , u 7i , respectively). In particular, it is important to note that two KEGG terms, "mmu04971: Gastric acid secretion" and "mmu04972: Pancreatic secretion" are shared by Gas1 and Gas2, which are supposed to be Pancreas-and Stomach-specific. In contrast, various muscle-related terms are enriched in the Muscle gene set as expected, whereas "GO:0002088: lens development in camera-type eye" is enriched in the neuronal gene set. All of these results suggest  (Oliveira and Oliveira, 2016) Muscle mass (Harada et al., 2015) Pancreatitis (Hung, 2014) Acetaminophen (APAP) Brain (Ghanem et al., 2016) Skeletal muscle (Trappe et al., 2011) Pancreatitis (Chen et al., 2015) Aripiprazole Brain activation (Myrick et al., 2010) Muscle spasms (*) Pancreatitis (Kiraly and Gunning, 2008) Asenapine Cognitive and monoamine dysfunction Elsworth et al. (2012) Muscle rigidity(*) -
that TD-based unsupervised FE selected the biologically reasonable genes. Figure 7 shows the protein-protein interaction (PPI) network provided by Metascape. A high degree of connectivity was obvious. Thus, TD-based unsupervised FE identified the sets of genes among which PPI is enriched. Moreover, Gas1 and Gas2 largely share the PPI network, whereas the neuronal and muscular gene sets form their own PPI network within which PPI is enriched. Thus, PPI analysis also indicated that TD-based unsupervised FE identified biologically reasonable genes.
To eliminate the possibility that our results were specific to the Metascape data set, we uploaded the genes selected by TD-based unsupervised FE to Enrichr (Table 3). With this data set, we observed clear detection of at least one tissuerelated disease for four sets of tissue-specific genes, validating the Metascape-based results.

DISCUSSION
Although it is unclear why the 15 drugs affected the expression of many common genes, a detailed investigation can allow further interpretation. Table 4 shows the drugs' effects on neuronal, muscular, and pancreatic tissues. These data suggest that most drugs simultaneously affect these three groups of tissues.
Our results are in contrast to the study that inspired our work (Kozawa et al., 2020), in which the authors employed a fully supervised approach requiring previous knowledge. Although Kozawa et al. (2020) also aimed to infer the therapeutic and side effects of drug treatments in humans based on gene expression in drug-treated tissues, their analysis required previous knowledge that is not needed for TD-based unsupervised FE. In this sense, our approach has distinct potential that the original study could not achieve.
In addition to the above-mentioned biological superiority of TD-based unsupervised FE, this approach also has some methodological advantages as follows. First, although we classified 24 tissues into two groups based on the observation of singular value vectors attributed to tissues, u ℓ 1 j (Figure 4) prior to the identification of differentially expressed genes, it is computationally infeasible for other methods to classify 24 tissues into two groups before starting to seek differentially expressed genes, as there are no criteria on how to divide 24 tissues into two groups. It is thus practically impossible to analyze all possible divisions, as they number in the millions. The same advantage is observed when grouping 18 drug treatments into two. This may be much easier than classifying tissues, because some of the drug treatments are obviously controls. Nevertheless, based upon the second and third singular value vectors attributed to drug treatments, u 2k and u 3k (Figure 3), acetaminophen (APAP) and sofosbuvir are grouped together with two control treatments. Such a classification can never be proposed without TD. In this sense, there is no computationally feasible method that can compete with our method.
The biological basis for the two groups of drugs seen in Figure 3 may be questioned. To clarify this point, we uploaded two groups of drugs to DrugEnrichr (Kuleshov et al., 2020), which evaluates the coincidence of genes targeted by the uploaded drugs (Additional File 2). Based on the "Geneshot Predicted from Co-expression" category in DrugEnrichr, we found that there are at least as many as 164 genes targeted by two drugs (APAP and Sofosbuvir) in group1 whereas 213 genes are targeted by at least two drugs among as many as 13 drugs included in group2 (Alendronate, Aripiprazole, Asenapine, Cisplatin, Clozapine, Dox, EMPA, Lenalidomide, Lurasidone, Olanzapine, Repatha, Risedronate, Teriparatide). This suggests that these two groups of drugs are quite distinct because there are no common targeted genes between these 164 and 213 genes. Thus, the groups of drugs identified by TD based unsupervised FE are likely based on the genes that the drugs target.
In view of the two above-mentioned advantages, TDbased unsupervised FE might yield completely distinct outcomes that other supervised methods cannot, and it therefore represents a worthwhile primary or supplementary approach to gene-expression-based investigation of drug effects.
One might wonder if the results were confirmed only by single experiments. As the results shown in Table 3 indicate coincidence between the present result and other studies, the results derived in this study are not dependent on a single study, but are coincident with numerous studies in the public domain database.
Moreover, TD-based unsupervised FE is a very useful strategy for repositioning known drugs. As shown in Figure 3, TDbased unsupervised FE can determine the effective tissue. Furthermore, as indicated in Table 3, the genes selected by TDbased unsupervised FE can indicate the diseases for which the drugs have potential effectiveness. Therefore, applying TD-based unsupervised FE to gene expression profiles altered by drug treatments can be a promising strategy to repurpose known drugs for new diseases.

CONCLUSIONS
In this paper, we applied TD-based unsupervised FE (Taguchi, 2020) to the gene expression profiles of 24 mouse tissues treated with 15 drugs. Integrated analysis allowed us to identify the universal nature of drug treatments in a tissue-group-wide manner, which is generally impossible to identify using any other supervised strategy that requires prior information.

DATA AVAILABILITY STATEMENT
All datasets analyzed in this study were obtained from GEO: GSE142068.

AUTHOR CONTRIBUTIONS
Y-hT planned and performed the study. Y-hT and TT discussed the results and wrote the paper. All authors contributed to the article and approved the submitted version.

FUNDING
This study was supported by KAKENHI 19H05270, 20K12067, and 20H04848. This project was also funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. KEP-8-611-38. The authors, therefore, acknowledge DSR with thanks for providing technical and financial support.