3′-UTR Shortening Contributes to Subtype-Specific Cancer Growth by Breaking Stable ceRNA Crosstalk of Housekeeping Genes

Shortening of 3′UTRs (3′US) through alternative polyadenylation is a post-transcriptional mechanism that regulates the expression of hundreds of genes in human cancers. In breast cancer, different subtypes of tumor samples, such as estrogen receptor positive and negative (ER+ and ER–), are characterized by distinct molecular mechanisms, suggesting possible differences in the post-transcriptional regulation between the subtype tumors. In this study, based on the profound tumorigenic role of 3′US interacting with competing-endogenous RNA (ceRNA) network (3′US-ceRNA effect), we hypothesize that the 3′US-ceRNA effect drives subtype-specific tumor growth. However, we found that the subtypes are available in different sample sizes, biasing the ceRNA network size and disabling the fair comparison of the 3′US-ceRNA effect. Using normalized Laplacian matrix eigenvalue distribution, we addressed this bias and built tumor ceRNA networks comparable between the subtypes. Based on the comparison, we identified a novel role of housekeeping (HK) genes as stable and strong miRNA sponges (sponge HK genes) that synchronize the ceRNA networks of normal samples (adjacent to ER+ and ER– tumor samples). We further found that distinct 3′US events in the ER- tumor break the stable sponge effect of HK genes in a subtype-specific fashion, especially in association with the aggressive and metastatic phenotypes. Knockdown of NUDT21 further suggested the role of 3′US-ceRNA effect in repressing HK genes for tumor growth. In this study, we identified 3′US-ceRNA effect on the sponge HK genes for subtype-specific growth of ER- tumors.


INTRODUCTION
Approximately, 70% of human genes contain multiple polyadenylation (polyA) sites in the 3 ′ -untranslated region (3 ′ -UTR) (Mayr and Bartel, 2009). Through alternative polyadenylation (APA) during transcription, messenger RNAs (mRNA) from the same gene can have various 3 ′ -UTR lengths. Since the 3 ′ -UTR contains regulatory regions including microRNA (miRNA) target sites, mRNAs with shortened or lengthened 3 ′ -UTRs may diversify the regulation landscape, for example, miRNA binding landscape. In human cancer, 3 ′ -UTR lengthening (3 ′ UL) has been associated with cell senescence (Chen M. et al., 2018) with implications for tumor-associated processes, such as cell cycle inhibition, DNA damage/repair process, and tumor suppression (Dimri et al., 1995;Busuttil et al., 2003;López-Otín et al., 2013;Muñoz-Espín and Serrano, 2014). Widespread 3 ′ -UTR shortening (3 ′ US) has been reported for diverse types of human cancer (Mayr and Bartel, 2009). Furthermore, 3 ′ US events add prognostic power beyond common clinical and molecular covariates in cancer patients  and are associated with drug sensitivity in cancer cell lines (Xiang et al., 2018). These results suggest that APA events, both 3 ′ -UTR shortening and lengthening, play important roles in cancer etiology and treatments.
The 3 ′ -UTR is also implicated in competing-endogenous RNA crosstalk (ceRNA) . ceRNAs co-regulate each other's RNAs through competing for binding miRNAs. In diverse types of cancer, ceRNA regulation involves established oncogenes and tumor-suppressor genes (Sumazin et al., 2011) and facilitates molecular pathway interactions for tumorigenesis (Park et al., 2018a). When 3 ′ -UTR shortening genes lose miRNA target sites on their 3 ′ -UTRs and do not sequester the miRNAs, the associated miRNAs bind to the 3 ′ -UTR of the ceRNA partners. As a result, 3 ′ -UTR shortening disrupts ceRNA crosstalk (3 ′ US-ceRNA effect) for growth in diverse types of cancer, including breast cancer (Park et al., 2018b). In a recent study, we showed that this 3 ′ US-ceRNA effect promotes tumor growth independent of potential confounding factors, such as somatic mutation status (SNPs and small INDELs), tumor purity, immune cell infiltration, cell proliferation, or miRNA biogenesis and expression .
Breast cancer can be classified into two major subtypes based on the presence or absence of estrogen receptor (ER) (Hammond et al., 2010). Estrogen receptor positive (ER+) breast tumors grow in the presence of the hormone estrogen. So, ER+ cancers can be treated with endocrine therapy which blocks ER activity or depletes estrogen levels. On the other hand, estrogen receptor negative (ER-) breast tumors have a unique growth mechanism due to absence of the estrogen receptor. The unique growth mechanism of ER-tumors makes it difficult to treat ER-breast cancer that has a worse prognosis than ER+ (Tsutsui et al., 2002) with a more aggressive phenotype (Sheikh et al., 1994;Perou et al., 2002). Based on the profound tumorigenic effect of 3 ′ US-ceRNA (Park et al., 2018b), we hypothesize that 3 ′ US-ceRNA effects specific to ER-breast tumors contribute to the unique growth mechanism. In this study, we tested this hypothesis by addressing a quantitative challenge due to the different sample sizes between ER+ and ER-breast tumor samples. As a result, we identified a novel subset of housekeeping (HK) genes (sponge HK) effectively sponging miRNAs to synchronize the ceRNA networks in normal samples (adjacent to the subtype tumor samples). Furthermore, we showed that the 3 ′ US-ceRNA effects repress the sponge HK genes, leading to subtype-specific tumor growth. In ER-breast tumor, this subtype-specific tumor growth is associated with aggressive and metastatic phenotypes of ERtumors, attributing its unique growth mechanism partially to subtype-specific 3 ′ US-ceRNA effects.

TCGA Breast Tumor RNA-seq Data and Identification of Breast Cancer Subtypes
Quantified gene expression files (RNASeqV1) for primary breast tumors (TCGA sample code 01) and their matching solid normal samples (TCGA sample code 11) were downloaded from the TCGA Data Portal (Goldman et al., 2013). We used 97 breast tumor samples that have matched normal tissues, which were further categorized into 77 estrogen receptor positive (ER+) and 20 estrogen receptor negative (ER-). For ER+ and ER-, we collected both normal (ER+ normal and ER-normal) and tumor (ER+ tumor and ER-tumor) samples. A total of 10,868 expressed RefSeq genes [fragments per kilobase of transcript per million mapped reads (FPM) ≥1 in >80% of all samples] were selected for downstream analyses.

Selection of miRNA Target Sites
The predicted miRNA target sites were obtained from TargetScanHuman version 6.2 (Lewis et al., 2005). Only those with a preferentially conserved targeting score of more than 0 were used . Experimentally validated miRNA target sites were obtained from TarBase version 5.0 (Papadopoulos et al., 2009), miRecords version 4 (Xiao et al., 2009), and miRTarBase version 4.5 (Hsu et al., 2014). The target sites found in indirect studies such as microarray experiments and high-throughput proteomics measurements were filtered out (Dvinge et al., 2013). Another source is the microRNA target atlas composed of public AGO-CLIP data (Hamilton et al., 2013) with significant target sites (q < 0.05). The predicted and validated target site information was then combined for use in this study. Among 1,261 miRNAs curated in the TCGA BRCA data, we used 713 expressed ones (average FPM >1) in our analyses (Supplementary Table 7).

Statistical Significance of Pearson Correlation Coefficient
The implementation of the Pearson r function is provided by a python package, SciPy, and available at https://scipy.org/, which returns the calculated correlation coefficient and a twotailed p-value for testing the non-correlation. The Pearson correlation coefficient measures the linear relationship between two variables (e.g., gene X and gene Y), and when the two covariates follow a binormal distribution, we can assume that their Pearson's correlation follows Student's t-distribution. The p-value is calculated by three steps: (1) calculating the value of the Pearson's correlation t, (2) defining the degree of freedom df (N-2, where N is the sample size), and (3) getting the probability of having t or more extreme than t from a Student's t-distribution with the degrees of freedom df. We used hypergeometric test in Scipy to estimate the significance of miRNA binding site overlap between genes.

Detection of APA Events
We used DaPars  to identify 3 ′ UTR shortening and lengthening in RNA-Seq data based on the same cutoff and parameter values optimized in the original paper. We checked that our prediction is 100% matched with that of the original DaPars result. The DaPars paper provided multiple lines of evidence to demonstrate that DaPars indeed identified APA events in the TCGA data. First, 51% of the DaPars predictions are within 50 bp of the annotated APAs compiled from Refseq, ENSEMBL, UCSC gene models, and polyA database [(polyA_DB Lee et al., 2007]. Second, in the upstream (−50 nt) of the predicted APA sites, MEME motif enrichment analysis (Bailey et al., 2009) successfully identified canonical polyA signal AATAAA.

Housekeeping, Transcription Factor, and Tumor-Associated Genes
Housekeeping genes are required for the maintenance of basic cellular functions that are essential for the existence of a cell, regardless of its specific role in the tissue or organism. Generally, housekeeping (HK) genes are expected to be expressed at relatively constant rates in most non-pathological situations (Eisenberg and Levanon, 2003). We used 3,804 HK genes defined in RNA-Seq data for 16 normal human tissue types: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells (Eisenberg and Levanon, 2013).
Transcription factors (TFs) play an important role in the gene regulatory network. We downloaded 2,020 TF genes defined in the TFcheckpoint database (Chawla et al., 2013), in which TF information is collected from nine different resources. Among them, we used 1,020 genes that are further supported by sequence-specific DNA-binding RNA polymerase II activity.
The tumor-suppressor genes and oncogenes were defined by the TUSON algorithm from genome sequencing of >8,200 tumor/normal pairs (Davoli et al., 2013), in particular, residuespecific activating mutations for oncogenes and discrete inactivating mutations for tumor-suppressor genes. TUSON computationally analyzes patterns of mutation in tumors and predicts the likelihood that any individual gene functions as a tumor-suppressor gene or oncogene. We used 466 oncogenes and 466 tumor-suppressor genes at the top 500 in each prediction (after subtracting 34 genes in common).

Building Subtype ceRNA Networks
For each of the breast cancer data (ER+ normal, ER+ tumor, ERnormal, and ER-tumor) that we defined above, we constructed a ceRNA network based on the microRNA (miRNA) target site share and expression correlation (Ala et al., 2013;Park et al., 2018b). The same miRNA target site information was determined regardless of the subtypes, resulting into the miRNA target site share network (based on FDR > 0.05 in hypergeometric test with miRNA target site information). Given the same miRNA target site share network, the expression correlation information for each subtype will select ceRNA network edges for each subtype.
We first constructed the ER+ normal reference ceRNA network by applying a traditional correlation cutoff (≥0.6) on the miRNA target site share network. Then, to identify ERnormal ceRNA network comparable to ER+ normal reference ceRNA network, we applied different correlation cutoff values (0-1, with a step size of 0.01) on the miRNA target site share network for ER-normal samples and selected the correlation cutoff values that make ER-normal ceRNA network most similar to the ER+ normal reference ceRNA network. To estimate topological similarity, we employed normalized Laplacian matrix eigenvalue distribution that discovers ensembles of Erdos-Rényi graphs better than other metrics such as Sequential Adjacency or Laplacian (Gera et al., 2018). After identifying the ER+ normal reference network and the corresponding ER-normal network, we used the same cutoffs (0.6 for ER+ subtypes and 0.68 for ER-subtypes) to construct the ER+ tumor network and the ERtumor network, respectively. An overall workflow is presented in Supplementary Figure 1.

Estimating Topological Similarity
To identify the structural equivalence between two networks, we employed spectral analysis not only to identify the structural similarities but also to track down the underlying dynamic behavior changes between them. Spectral clustering on networks uses the eigenvalues of several matrices, such as adjacency matrix, the Laplacian matrix, and the normalized Laplacian matrix. In this research, we used the normalized Laplacian matrix since it involves both the degree matrix and adjacency matrix, where the degree matrix can identify the node-related equivalence of networks and the adjacency matrix can capture the structural equivalence of networks. Another very important reason of using the normalized Laplacian eigenvalue matrix is that it is more sensitive to small changes because it considers more information (Perou et al., 2002).
For network G, the normalized Laplacian of G is the matrix: where L is the Laplacian matrix of G and D is the degree matrix. The Laplacian matrix L is defined as: In N, each of its entry elements is given by: where degree (vertex v) is the function that returns the degree of the vertex v.
To assess how close the two networks G 1 and G 2 are, we first built N1 and N2 based on the connection information of G1 and G2, respectively. Then, we defined dist 1 and dist 2 as the eigenvalue distribution of N1 and N2, respectively. We further used the Kolmogorov-Smirnov test (KS test), which is defined as: where sup x is the supremum of the set of distances. By using the normalized Laplacian matrix and KS test, ER+ normal reference network G ER+ ref is compared with ER-normal subnetwork with a particular correlation cutoff i G ER− i in the following three steps: (

Widespread 3 ′ -UTR Shortening and Lengthening Events for ER+ and ER-
To identify subtype-specific APA genes, we first identified 77 ERpositive (ER+) and 20 ER-negative (ER-) sample pairs (breast tumor and the adjacent normal samples) from 97 sample pairs available in TCGA (see "MATERIALS AND METHODS"). Then, we identified 3 ′ UTR shortened (3 ′ US) and 3 ′ UTR lengthened (3 ′ UL) genes (tumor vs. normal) using DaPars  in each subtype. We found that the ER+ and ER-sample pairs have similar numbers of total 3 ′ US genes and 3 ′ UL genes ( Figure 1A). However, the 3 ′ US genes are more recurrent [occurring in >20% of the tumor samples ] in both subtype tumors (Figures 1B,C; e.g., P = 5.0 × 10 −5 for ER+). Further analyses showed that 3 ′ US and 3 ′ UL play distinct roles in the subtypes. First, the recurrent 3 ′ US and 3 ′ UL genes show little overlap (one and 13 genes in common, P = 1.27e −6 and P = 3.97e −9 , respectively; Figures 1B,C). Second, the number of 3 ′ UL events is not correlated with that of 3 ′ US events across the tumor samples (P = 0.35 for ER+ and P = 0.61 for ER-; Figures 1D,E). Third, ingenuity pathway analysis (IPA) shows that the recurrent 3 ′ US and 3 ′ UL genes are enriched for distinct sets of molecular pathways (Supplementary Table 1,  Supplementary Figure 2). The IPA analysis further suggests that the 3 ′ UL or 3 ′ US genes themselves have limited roles for cancer overall since a small number of pathways are significantly (P < 10 −2 ) enriched for them (12 and 14 for 3 ′ UL in ER± and 29 and three for 3 ′ US in ER± samples) and at most a couple of them are "cancer" pathways (one for 3 ′ UL in ER+ and two for 3 ′ US in ER-with keyword "cancer"). Based on the profound tumorigenic role of 3 ′ US in its interaction with ceRNAs (3 ′ US-ceRNA effect) (Park et al., 2018b), we hypothesize that 3 ′ US-ceRNA effect, not 3 ′ US cis effect, promotes ER-specific tumor growth.

Two-Step Pairwise Normalization of ER+ and ER-ceRNA Network
We previously identified the 3 ′ US-ceRNA effect in the ceRNA network (Park et al., 2018b). To identify the 3 ′ US-ceRNA effect specific to ER-tumors, we aim to build ceRNA networks for ER-and ER+ tumors and compare them. Computationally, the ceRNA gene pairs in the networks are those that share a significant number of miRNA target sites and are co-expressed (Ala et al., 2013;Park et al., 2018b). However, using the common co-expression cutoff (e.g., Pearson's ρ > 0.6) will inflate the number of edges for ER-(160,687 in ER-normal vs. 88,275 in ER+ normal; Supplementary Figure 3A). To test if this inflation is attributable to the sample size difference, we built the ceRNA network 100 times from different numbers of (20, 40, 60, and 75) normal subsamples from ER+ tumors based on the same co-expression cutoff (Supplementary Figure 3B). In general, the number of edges in the ceRNA networks increases as the subsample size decreases. Especially when the same number of samples (20) to that of the ER-normal network is used, the number of edges in the subsampled networks becomes closer to the case of the ER-normal network.
Since the network size difference is attributable to the sample size difference, one might want to subsample ER+ normal samples to match the number of samples for ER-(n = 20). To assess this solution, we subsampled 20 ER+ normal samples 100 times, built a ceRNA network for each subsample, and collected all the edges (916,999) across the networks. Then, we checked how many times each edge occurs across the 100 subsampled networks. We found that the subsampled ceRNA networks do not keep topological consistency within them as 22.1% (202,997) of the edges are shared by <20 ceRNA networks (Supplementary Figure 3C). Then, one might want to build the ER-ceRNA network using the co-expression cutoff with the same statistical significance to ER+ (0.91, P ∼ = 10 −8.2 ; Supplementary Figure 3D). To achieve the same statistical significance of the traditional cutoff value (0.6) of ER+, the cutoff value of ER-would inflate to 0.91, resulting in a drastically deflated number of edges (Supplementary Figure 3D). We addressed this issue in the following way. First, we built the reference network from normal samples of larger size (ER+) using the common correlation cutoff (Pearson's ρ > 0.6). Since the normal samples should have similar molecular dynamics between ER+ and ER-, we sought to find the coexpression cutoff for the ER-normal network that yields the most topological similarity to the ER+ reference network. To estimate topological similarity, we employed a normalized Laplacian matrix eigenvalue distribution that discovers ensembles of Erdos-Rényi graphs better than other metrics, such as Sequential Adjacency or Laplacian (Gera et al., 2018) (see "MATERIALS AND METHODS"). While ER-normal network topology changes drastically if different correlation cutoff values are used (Supplementary Figures 3E,F), we found that the cutoff of 0.68 makes the ER-normal network most similar to the ER+ reference network (Supplementary Figure 3G). Using another measure for topological similarity, average clustering coefficient (Friedel and Zimmer, 2006), the cutoff of 0.68 is supported again since normal ER-network with correlation cutoff 0.68 makes the closest average clustering coefficient to the reference network (0.4; Supplementary Figure 3H). Since normal and tumor ceRNA networks within each subtype share the same number of samples and thus would not suffer from this bias (Altay et al., 2011;Park et al., 2018b;Dalgiç et al., 2019), we applied the subtype-specific cutoffs (0.68 for ER-and 0.6 for ER+) to build the tumor ceRNA networks in each subtype.

′ UTR Shortening Is Associated With the Aggressive Metastatic Phenotypes of ER-Tumors in ceRNA
In a normal ER-ceRNA network based on the subtype-specific co-expression cutoff, 1,783 genes are in the ceRNA relationship with 521 3 ′ US genes (3 ′ US ceRNA partners). Among 1,783 3 ′ US ceRNA partners, 498 (27.9%) are found only in ER-(ER-3 ′ US ceRNA partners), whereas the other 1,285 (72.1%) are also in ER+ as 3 ′ US ceRNA partners (common 3 ′ US ceRNA partners; Figure 2A). We found that 118 IPA canonical pathways significantly (P < 0.01) enriched for the ER-3 ′ US ceRNA partners (Supplementary Table 2) are linked with several aspects of ER-specific tumor phenotypes ( Figure 2B). The first set of the pathways are "cancer" pathways. For example, the "molecular mechanisms of cancer" pathway (P = 10 −5.25 ) includes a comprehensive set of genes, disruptions of which are known to promote tumor growth. Specific to breast cancer, the enrichment of the "breast cancer regulation by Stathmin1" (P = 10 −3.92 ) pathway is interesting since the overexpression of Stathmin1 correlates with loss of the ER (Curmi et al., 2000) and with aggressive breast tumor phenotypes (Obayashi et al., 2017). The second category of pathways underlies the aggressive metastasis of ER-tumors. For example, among eight pathways that were shown to play roles in breast tumor metastasis (Krishnan et al., 2013), we found that five of them are significantly enriched for ER-3 ′ US ceRNA partners with the exception of PI3K/AKT, the enriched p-value of which is just below the significance cutoff (P = 10 −1.95 ). Furthermore, previous studies have associated breast tumor malignancy and poor survival with the abnormal control of Ephrin A [reviewed in (Vaught et al., 2008)], which is strongly enriched for ER-3 ′ US ceRNA partners (P-value = 10 −5.05 ). In normal samples without 3 ′ -UTR shortening, 3 ′ US ceRNA partners should closely regulate these pathways. However, in ER-tumors characterized by widespread 3 ′ US events, most (81.7%) of the 3 ′ US ceRNA partners lost the ceRNA relationship ( Figure 2C), likely losing the normal control.

Housekeeping Genes Keep ER+ and ER-Normal ceRNA Networks to Similar Topology
Furthermore, we categorized genes that have a possible sponge effect (>5 miRNA binding sites in the 3 ′ UTR) into housekeeping genes, tumor-associated genes (tumor suppressors or oncogenes; TA), and transcription factors. Based on 3,804 HK (Eisenberg and Levanon, 2013), 932 TA (Davoli et al., 2013), and 1,020 TF genes (Chawla et al., 2013) curated in public databases (see "MATERIALS AND METHODS"), the ceRNA networks consist of threefold more HK genes than TA or TF genes (Figure 3A for normal and Supplementary Figure 4A for tumor). Due to their active roles in cell maintenance (Eisenberg and Levanon, 2013), HK genes are expected to maintain constant expression levels under most physiological conditions (Eisenberg and Levanon, 2013). Accordingly, the 958 HK ceRNA genes in ER-normal ( Figure 3A) express as highly as (Supplementary Figure 4B), but with less significant variation (P = 1.72e −54 ) across the normal samples (Figure 3B), the 1,906 non-HK ceRNA genes in the network. With our observation that the HK genes contain more miRNA binding sites than the other genes (P = 0.05, Figure 4C), they should function as stable sponges for miRNAs (Tay et al., 2014). Thus, with a significant number (P = 8.77e −771 ) of overlap in the HK ceRNA genes between ER-and ER+ normal samples (Figure 3D), we hypothesize that they keep ER-and ER+ normal ceRNA networks in similar topology. To test this hypothesis, we first selected edges involving the HK ceRNA genes from the ER+ and the ER-normal ceRNA networks to form subnetworks and compared the subnetworks using normalized Laplacian matrix eigenvalue distribution. Furthermore, we randomly subsampled the same number of edges not involving HK genes 200 times from the ER+ and ER-ceRNA networks and compared the networks in the same way ( Figure 3E). The HK ceRNA networks are significantly more similar between ER+ and ER-(P < 0.01) than the 200 non-HK ceRNA networks, suggesting that HK genes make normal ceRNA crosstalk consistent between the subtypes through the miRNA sponge effect.

′ US Disrupts ceRNA Crosstalk of Housekeeping Genes for ER-Specific Growth
We further examined the impact of 3 ′ US on the role of HK genes. First, 3 ′ US genes are highly connected to HK genes. Out of 958 HK genes, 727 HK genes (75.8%) are connected to 3 ′ US genes, which is in the same scale as the other classes of genes that are known to be regulated by 3 ′ US genes (Sumazin et al., 2011;Park et al., 2018b) [196 (61.8%) TA genes and 245 (90.2%) TF genes; Figure 4A]. Also, these HK genes are more highly connected in the network compared to 231 HK genes that are not connected to 3 ′ US genes ( Figure 4B). Previously, we showed that 3 ′ US represses the ceRNA partners in tumor (Park et al., 2018b). Consistently, these HK genes, ceRNA partners of 10.2 3 ′ US genes on average (Supplementary Table 5), are more repressed in tumor than the 231 HK genes not connected to 3 ′ US genes (P = 0.00035; Figure 4C). For example, transforming growth factor beta regulator 1 (TBRG1) is connected to four 3 ′ US genes (PPP6C, DICER1, H2AFV, and UBL3) in ER-normal samples. With 3 ′ US in ER-tumor samples, TBRG1 is significantly downregulated (log fold change = −0.15) considering the general upregulation of the other housekeeping genes (Figure 4C). TBRG1 and those four 3 ′ US genes are predicted to share binding sites of miR-874 (see "MATERIALS AND METHODS"). MiR-874 was experimentally shown to repress TBRG1 to promote nonfamilial breast cancer (Bastos et al., 2014). Although miR-874 was expressed (average FPM is 5.3 and 5.1 in ER-tumor and normal samples), they were not significantly (P = 0.58) up-regulated in ER-tumor samples to repress TBRG1. The 3'UTR shortening of the four genes may instead redirect miR-874 to bind more efficiently on TBRG1, leading to its repression. We checked that TBRG1 is not alternatively polyadenylated in ER-tumors (neither 3 ′ US nor 3 ′ UL). Globally, we checked that only 76 out of 958 HK ceRNA genes in ER-(7.9%) are either 3 ′ US or 3 ′ UL genes in tumors. This low overlap between our HK genes and 3 ′ US genes implies that HK genes may not be directly related to growthrelated functions (Curinha et al., 2014;Masamha et al., 2014) but contribute to tumorigenesis through 3 ′ US-ceRNA. To further understand the impact of the repression on the ceRNA network, we compared the number of the ceRNA partners of these HK genes between normal and tumor. Previously, we showed that 3 ′ US genes will break their relationship with the ceRNA partners (Park et al., 2018b). Since the ceRNA relationship changes, either loss or gain, could propagate to neighboring ceRNA relationships FIGURE 2 | 3 ′ UTR shortening is associated to ER-'s aggressive phenotypes in ceRNA. (A) Intersection of 3 ′ US ceRNA partners between ER-and ER+ normal ceRNA networks. (B) IPA canonical pathways significantly (P < 0.01) enriched for the ER-3 ′ US ceRNAs. The pathways are color-coded by keyword: "cancer" in blue, "signaling" in red, and those associated with aggressive phenotypes (Krishnan et al., 2013) in green. (C) Intersection of 3 ′ US ceRNA partners in ER-between normal and tumor ceRNA networks. (Park et al., 2018a), the repression of HK genes should break the ceRNA relationship not only with 3 ′ US genes but also with other ceRNA partners. Consistent to the expectation, 727 3 ′ US HK ceRNA partners lost higher ratios of the ceRNA partners in tumor (Figure 4D). We found a similar trend of HK gene repression in ER+ breast cancer when connected to 3 ′ US genes (Supplementary Table 6).
The loss of HK ceRNA partners naturally reduces the high overlap of HK genes between ER+ and ER- (Figure 5A), resulting into 505 and 144 HK genes that are ceRNA partners of 3 ′ US genes unique in ER-and ER+ tumor (ER-and ER+ HK ceRNA partners), respectively ( Figure 5B). While it is known that cell growth and cell cycle regulations are different in the subtypes (Abba et al., 2005;Alles et al., 2009;Zhou et al., 2013), we found that the 505 ER-HK ceRNA partners are enriched for cell growth-related and cell cycle-related IPA pathways ( Figure 5C, Supplementary Table 3). First, they are enriched for pathways associated to growth factor (with keyword "GF"). EGF (P =10 −2.99 ) especially activates cell cycle progression in ERtumors (Biswas et al., 2000), and the expression of VEGF (P = 10 −2.42 ) is associated to ER-tumors (Fuckar et al., 2006). Also, both EGF and VEGF are suspected to proliferate ER-tumors when estrogen cannot sustain them (Fuckar et al., 2006). Second, cell cycle pathways are enriched for ER+ specific HK ceRNA partners, suggesting that ER-regulated cell cycle (Paruthiyil et al., 2004;Javanmoghadam et al., 2016) differentiates ER+ and ERcancer partially at the ceRNA level. Since the regulation of the cell cycle, G1-and S-phase and their transition ratio, is especially crucial for ER+ tumor's proliferation [reviewed in (Foster et al., 2002)], it is interesting that the cell cycle regulation pathways for  various phases (G1/S or G2/M) of various mediators (estrogen or cyclins) are enriched with 144 ER+ HK ceRNA partners. Third, considering that the enrichment analysis was for the disjoint sets of genes (505 unique to ER-and 144 unique to ER+), it is interesting that these unique HK ceRNA partners are commonly significantly enriched for some "cancer" pathways, e.g., "molecular mechanisms of cancer, " showing that the HK ceRNAs are involved in cancer mechanisms equally significantly but in a subtype-specific fashion.

′ US Represses Housekeeping Genes to Promote Tumor Growth
To gain insights into the cause-and-effect relationship from 3 ′ US-mediated HK gene repression to tumorigenesis, we revisited a previous study (Masamha et al., 2014;Park et al., 2018b) in which 3 ′ US-ceRNA effect promotes tumorigenesis in NUDT21 knockdown (KD) in HeLa cells and glioblastoma [data available in GSE42420 (Hammond et al., 2010) and GSE78198 (Ala et al., 2013)]. First, we chose 11,431 genes that are expressed FIGURE 5 | 3 ′ US disrupts the ceRNA relationship of HK genes for ER-specific growth. Number of HK ceRNA partners unique and common to ER-and ER+ normal (A) and tumor (B) ceRNA networks. The numbers in parentheses are normalized to the number of genes shared between tumor and normal. (C) IPA canonical pathways significantly (P < 0.01) enriched for ER+ and ER-specific HK ceRNA partners. The pathways are color-coded by keyword: "cancer" in red, "GF" in brown, "estrogen" in green, and "cell cycle" in blue.
in the experiment data (average FPKM >1). Among them, we further chose 4,430 genes that would work as miRNA sponges (>5 miRNA binding sites). To identify ceRNA relationship with the genes, we will solely use the significance of miRNA binding site overlap (FDR < 0.05) since the other criteria for the ceRNA identification, co-expression, cannot be effectively estimated from two replicates of NUDT21 KD experiments. In this way, we identified 860 3 ′ US genes and 2,449 of their ceRNA partners. Among these 3 ′ US ceRNA partners, a significant portion of them (705, 28.8%) are HK genes, while 184 are TA and 163 are TF genes. It is especially interesting to note that HK genes in the network are only either 3 ′ US genes (n = 298) or 3 ′ US ceRNA partners (n = 705). On the other hand, almost half of the TA and TF genes in the network are not connected with 3 ′ US genes [149 of 333 (44.7%) and 147 of 310 (47.4%) for TA and TF, respectively], showing that HK genes can be a major target of 3 ′ US ceRNA effect. Based on our previous finding that 3 ′ US represses the ceRNA partners in tumor (Park et al., 2018b), we further checked the repression of HK genes in NUDT21 KD. A total of 705 HK genes that are 3 ′ US ceRNA partners are more repressed than TA and TF genes or than 298 HK 3 ′ US genes in the network (Figure 6A; P = 0.01 and 0.05, 0.002, respectively). These results confirm that HK genes are repressed in the tumorigenic process that 3 ′ US-ceRNA effect promotes (Park et al., 2018b).
To assess the impact of this repression on tumor growth, we further conducted IPA analysis on 705 HK 3 ′ US ceRNA partners in comparison to the other 2,410 HK genes not in the network. First, although there are much less HK 3 ′ US ceRNA partners than the other HK genes, they are enriched for more IPA diseases and functions terms (Supplementary Table 4). While the IPA analysis gives N/A for the terms that are so lowly enriched that cannot be estimated, HK 3 ′ US ceRNA partners have 581 terms with N/A value and HK genes not in the network have 693 terms with N/A value. Furthermore, we replaced the N/A values with the minimum value and compared the p-values in HK 3 ′ US ceRNA partners vs. the other HK genes. This comparison shows that more terms are significantly (P < 0.01) enriched for HK 3 ′ US ceRNA partners (254 terms with better p-values for HK 3 ′ US ceRNA partners and 141 for the other HK genes). This trend is more pronounced for the terms that are important for cancer. For example, IPA terms with keywords "cell, " "cancer (or tumor), " "apoptosis (death, or necro), " and "growth (proliferation, or progression)" are significantly (P < 2.2e −16 ) more enriched in the HK 3 ′ US ceRNA partners, while certain terms for general biological processes such as "RNA" are enriched in the other HK genes (Figure 6B). While this analysis does not support our hypothesis as a whole, it demonstrates a potential role of HK gene repression in a tumorigenesis process with HeLa as a model system. It follows that the ER-specific tumor progression is attributable to the repression of different HK genes.

DISCUSSION
To investigate the role of 3 ′ US-ceRNA effect (Park et al., 2018b) for ER-vs. ER+ breast tumors, we built the ceRNA networks that are comparable to each other subtype by addressing the bias due to the different numbers of samples (72 for ER+ and 20 for ER-in TCGA). A fair comparison of the networks suggests that 3 ′ US disrupts the ceRNA network for ER-tumors' aggressive phenotypes. Furthermore, we revealed a role of 3 ′ US-ceRNA effect on HK genes. In the cancer context, the potential for being ceRNA was identified for mRNAs [e.g., ] as well as long non-coding RNAs [e.g., (Tuersong et al., 2019)] and pseudogenes [e.g., (Wei et al., 2017)]. Among mRNAs, it has been shown that TA genes and TF genes heavily contribute to ceRNA regulation (Sumazin et al., 2011). While reaffirming the high contribution (and thus high potential of biological function) of the TA and TF genes to breast cancer ceRNA networks, we further found the high contribution of HK genes. HK genes were reported as stable "control" genes for miRNA sponge effect [e.g., (Bouhaddioui et al., 2014)], indirectly supporting our novel findings. By analyzing TCGA breast cancer and reanalyzing an experimental data, we found more direct supports for their roles to ceRNA.
Further analyses show that 3 ′ US disrupts the ceRNA crosstalk of HK genes in a subtype-specific fashion. First, we showed that a subset of HK genes is trans target of 3 ′ US-ceRNA effect (sponge HK genes) enriched in important pathways in association to ER-'s aggressive phenotype. Since they are much less than the other HK genes in number (e. g., 705 3 ′ US ceRNA HK genes vs. 2,401 HK genes in the NUDT21 KD experiment), our definition may shed novel insights into identifying another set of biomarkers indicating tumor progression.
In network analysis, a network of interest is often compared to a reference network. However, if the networks are built from different numbers of samples, the comparison will be biased due to the sample size difference (Supplementary Figure 3). With the assumption that normal samples should have similar molecular dynamics, we found the subtype-specific cutoff values for normal ceRNA networks. Then, we constructed ER+ and ER-tumor ceRNA networks (two-step pairwise normalization method). As the resulting ceRNA networks facilitate novel discoveries on the subtype-specific 3 ′ US-ceRNA effect, we expect that the two-step pairwise normalization method can further help normalize biological networks built with the different numbers of samples if the matched normal samples are available.
We note that this normalization method can help further identify the genes playing important roles in a subtype-specific fashion. For example, we used the KS test to compare the eigenvalue distribution of the Laplacian matrix of the two networks, ER+ and ER-ceRNA networks. The eigenvalue distribution is a set of eigenvalues, each representing a temporal snapshot of the network (Gera et al., 2018). Since K 1,2 in Equation 4 represents the snapshot point at which the topology of the two networks is most apart, the edges appearing at that time point strongly differentiate the two subtypes in the ceRNA level. In that sense, genes in the edges can be further investigated for their roles in each of the subtypes. Also, the resulting networks, the comparable ceRNA networks of ER+ and ER-breast tumors, can further help identify important genes for specific functions in the subtypes. Biological network analysis techniques were used to identify the genes playing important roles in the ceRNA network (Sumazin et al., 2011;Cheng et al., 2015). To identify such genes for ER-tumor, the samples need to be compared with ER+ in our context. In that sense, we can build a differential network (ER-vs. ER+) based on the comparable ceRNA networks. Then, since hub genes in the differential network would facilitate the ceRNA regulation of many genes only for a specific subtype, e.g., ERbreast tumor, they would be good candidates for important functions specific to the ER-tumors. We can further identify those for specific functions based on the gene sets defined for the functions, e.g., gene ontology (The Gene Ontology Consortium, 2018). Our study showed the distinct 3 ′ US-ceRNA dynamics between the ER+ and the ER-groups of tumor samples. Although ER status is an important clinical variable (Hammond et al., 2010), it is important to note that the two groups do not directly represent further clinical subtypes of breast cancers, such as HER2+ or triple-negative. Thus, to reveal a further clinical relevance of 3 ′ US-ceRNA dynamics, more study is warranted in the clinical subtypes within each group.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
HP conceived the project and designed the experiments. ZF performed the data analysis with the help from YB. SK conducted statistical tests. HP wrote the manuscript with input from BD and SK.

FUNDING
This work was supported partly by the CTSI Biomedical Modeling Awards and the Joan Gollin Gaines Cancer Research Fund at the University of Pittsburgh to HP. This project used the UPMC Hillman Cancer Center Biostatistics Shared Resource that was supported in part by award P30CA047904.
We thank Steffi Oesterreich, Department of Pharmacology and Chemical Biology, University of Pittsburgh for valuable discussion on the effect of the 3 ′ US-ceRNA effect on breast cancer. We thank Susanne M. Gollin, the Department of Human Genetics in the University of Pittsburgh for their valuable help in editing this article. Also, this research was supported in part by the University of Pittsburgh Center for Research Computing through the resources provided. This manuscript has been released as a Pre-Print at Zhenjiang et al. (2019) (doi: 10.1101/601526).