Original Research ARTICLE
Gene Co-expression Is Distance-Dependent in Breast Cancer
- 1Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- 2Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- 3Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
Breast carcinomas are characterized by anomalous gene regulatory programs. As is well-known, gene expression programs are able to shape phenotypes. Hence, the understanding of gene co-expression may shed light on the underlying mechanisms behind the transcriptional regulatory programs affecting tumor development and evolution. For instance, in breast cancer, there is a clear loss of inter-chromosomal (trans-) co-expression, compared with healthy tissue. At the same time cis- (intra-chromosomal) interactions are favored in breast tumors. In order to have a deeper understanding of regulatory phenomena in cancer, here, we constructed Gene Co-expression Networks by using TCGA-derived RNA-seq whole-genome samples corresponding to the four breast cancer molecular subtypes, as well as healthy tissue. We quantify the cis-/trans- co-expression imbalance in all phenotypes. Additionally, we measured the association between co-expression and physical distance between genes, and characterized the ratio of intra/inter-cytoband interactions per phenotype. We confirmed loss of trans- co-expression in all molecular subtypes. We also observed that gene cis- co-expression decays abruptly with distance in all tumors in contrast with healthy tissue. We observed co-expressed gene hotspots, that tend to be connected at cytoband regions, and coincide accurately with already known copy number altered regions, such as Chr17q12, or Chr8q24.3 for all subtypes. Our methodology recovered different alterations already reported for specific breast cancer subtypes, showing how co-expression network approaches might help to capture distinct events that modify the cell regulatory program.
A cellular context is determined by the coordinated expression of specific sets of genes, hence, gene transcription is a highly regulated process. In this sense, gene expression profiles and co-expression patterns might provide insight about shared transcriptional regulatory mechanisms (1, 2). Among the elements involved in those regulatory mechanisms there are proteins that constitute the transcriptional machinery such as the RNA polymerase II and its associated enzymes (3), transcription factors and their cofactors, sequences identified by those transcription factors such as promoters (4) and enhancers (5, 6), histone modifications, both repressing or promoting gene expression, and structural proteins associated with chromatin architecture (7) [for a comprehensive review see (8, 9)].
Interactions among those elements allow the emergence of long-range regulatory instances and functional relationships have been characterized between regulatory elements and distal target genes (10). Although these long-range instances are clearly important, transcription of a specific gene is also influenced by its physical location: gene order is not random and neighboring genes tend to be co-expressed in several eukaryotic cells (11–13). Moreover, there is evidence that the transcription process contributes to the formation of gene clusters and influence chromatin organization at a fine scale (14).
In cancer there is a dramatic disruption of the transcriptional process leading to altered gene expression and the promotion of tumor progression (15). Elements in nearly all levels of transcriptional control can suffer from modifications in cancer. A mutation can alter gene expression directly when it takes place in gene control sequences such as enhancers, promoters or insulators. On the other hand, transcriptional dysregulation can arise from genomic alterations indirectly when they modify the activity of signaling factors such as transcription factors, signaling proteins or when they cause a disruption in chromatin regulators or chromosome structuring proteins (16).
Heterogeneity in the aforementioned alterations is displayed in the diversity of cancer-associated gene expression profiles and their related phenotypes. In this sense, breast cancer and its molecular subtypes, Luminal A, Luminal B, HER2+, and Basal-like (17, 18), are an emblematic example because they have distinct patterns of gene co-expression profiles and lead to different breast cancer manifestations both at the molecular and clinical levels.
Breast cancer subtypes have been associated with different prognosis, survival rates and metastatic behavior (19, 20). The longest survival has been observed in patients with the Luminal A subtype, followed by those with Luminal B and HER2+. Finally, Basal-like subtype patients have been associated with the poorest survival (21). Median duration of survival from time of first distant metastasis follows the same trend (22). Hence, outcomes in breast cancer can be traced back to the transcriptomic level.
The altered gene expression patterns observed in breast cancer are usually studied using next generation sequencing (NGS) techniques such as RNA-Seq (23). Computational methods can be used to analyze the resulting data and to understand different features of the biological system. In particular, co-expression networks capture important aspects of the connectivity structure of the transcriptional profile (24, 25). Moreover, global and local properties of co-expression networks may provide some insights into the actual biological mechanisms associated to the gene co-expression program (26–28).
Previously, by comparing the gene co-expression network derived from RNA-Seq data from the unclassified breast cancer samples in TCGA and the co-expression network from the healthy samples, we reported a significant reduction in the number of interactions linking inter-chromosomal genes (trans-) and an increase on the strength of co-expression between intra-chromosomal (cis-) genes (29, 30). On the other hand, in the healthy breast tissue, gene co-expression was apparently independent of the chromosome location of genes. This phenomenon evidenced that in breast cancer, there is an influence of the physical distance between genes and their related co-expression values.
With the aim to provide a broader and deeper landscape of the transcriptional regulation changes in the context of breast cancer heterogeneity, in this work, we characterized the cis-/trans- co-expression imbalance for each breast cancer molecular subtype compared with the co-expression profile of the healthy phenotype. We also assessed the gene co-expression distance dependency between cis- genes. Additionally, to have a more comprehensive description of the distance level of cis- co-expression, we characterized the intra-cytoband/inter-cytoband co-expression ratio per subtype.
With our network approach, we confirmed the loss of trans- co-expression in all breast cancer molecular subtypes. The cis-/trans- co-expression imbalance is specific for each subtype. We observed that gene cis- co-expression decays with distance in all breast cancer subtypes, but not in the healthy phenotype. We observed highly co-expressed gene hotspots, that tend to be confined at cytoband regions, and coincide accurately with already known copy number altered regions, such as the case for Chr17q12, or Chr8q24.3 for all subtypes.
2. Materials and Methods
A collection of The Cancer Genome Atlas (TCGA) breast invasive carcinoma datasets were used in this work (26). Briefly, 780 tumor and 101 normal IlluminaHiSeq RNASeq samples were acquired and pre-processed to log2 normalized gene expression values were obtained as described in (29).
2.2. Data Processing
The tumor log2 normalized expression values were classified using PAM50 algorithm into the respective intrinsic breast cancer subtypes (Normal-like, Luminal A, Luminal B, Basal and HER2-Enriched) using the Permutation-Based Confidence for Molecular Classification (31) as implemented in the pbcmc R package (32). Tumor samples with a non-reliable breast cancer subtype call, were removed from the analysis. The number of reliable samples were 113, 217, 192, 105, and 221 for control, Luminal A, Luminal B, HER2+, and Basal subtypes, respectively.
Multidimensional Principal Component Analysis (PCA) over gene expression values showed a blurred overlapped pattern among the different breast cancer subtypes. Hence, multidimensional noise reduction using ARSyN R implementation was used (33). Finally, PCA visual exploration showed that the noisy pattern was removed, thus breast cancer subtypes clustered without overlap.
2.2.1. Statistical Analyses
Fisher's least significant difference analysis (FLSDA) was performed to test whether Mutual Information (MI) distributions in the different groups were significantly different or there is some overlapping. FLSDA is a two-stage test for (multiple) pairwise comparison. The first stage consists in performing a “global” test for the null hypothesis that the expected value of the mean of the different groups is equal. In the (trivial) case that this null hypothesis is accepted, no further analysis is needed. If the global null hypothesis is rejected at a pre-specified level of significance, then a second stage is performed on all pairwise comparisons at the same level of significance (34). Here we performed FLSDA with the LSD.test function in R using the default significance value (α = 0.05) and FDR correcting for multiple testing. LSD.test is an implementation of Fisher's Least Significant Difference analysis as implemented in the agricolae v1.3-2 R package.
2.2.2. Network Construction
Gene regulatory network deconvolution from experimental data has been extensively used to unveil co-regulatory interactions between genes by looking out for patterns in their experimentally-measured mRNA expression levels. A number of correlation measures have been used to deconvolute transcriptional interaction networks based on the inference of the corresponding statistical dependency structure in the associated gene expression patterns (35–39). It has long been known that the maximum likelihood estimator of statistical dependency is mutual information (MI) (38–41). ARACNE (42) is the flagship algorithm used to quantify the degree of statistical dependence between pairs of genes.
In a nutshell, the algorithm calculates the MI—a non-parametric measure that captures non-linear dependencies between variables—in a relatively fast implementation. The method associates a MI value to each significance value (p-value) based on permutation analysis, as a function of the sample size. To make comparable the five networks, the highest interactions in terms of their statistical significance (P ≤ 1e−8) were kept. In this way, we better assessed the differences among groups. The number of interactions in the five networks were reduced to 13,317 interactions, which is the lowest number of interactions in our networks with significant p-value. This was the case of HER2+ network.
3.1. cis-/trans- Imbalance Is a Common Feature in Breast Cancer Subtypes
To characterize the co-expression patterns for breast cancer molecular subtypes and the healthy phenotype, gene pair co-expression values were calculated (see section 2). Only the highest interactions in terms of their statistical significance (P ≤ 1e−8) were kept to better assess the differences among groups and the number of interactions for all datasets were reduced to match the lowest one (13,317 interactions, in the case of HER2+ co-expression network).
Figure 1 exhibits circos plot representations of the most significant interactions for healthy and Basal subtype. Purple links represent cis- interactions, meanwhile orange links account for trans- relationships. Heatmap displays the number of adjacent links for every chromosome. Lowest row displays cis- interactions in purple, while the rest of the rows in the triangle refer to inter-chromosomal (orange) interactions. The number of cis- and trans- interactions for each phenotype is indicated in Table 1 and circos plot and heatmaps for all groups are included in Supplementary Figure 1.
Figure 1. Abundance of cis-/trans- interactions in healthy and Basal networks. The circos plots and heatmaps show difference in cis-/trans- co-expression patterns between the healthy and Basal phenotypes. Plots contain the most significant interactions (P ≤ 1e−8). Orange links join genes from different chromosomes (trans-); meanwhile purple links represent cis- interactions. External arcs represent each chromosome. Heatmap shows the number of adjacent cis- and trans- links for both phenotypes in all chromosomes. Notice the difference in the purple/orange proportion. The Basal subtype is almost depleted of trans- (orange) interactions in all chromosomes.
The circos plot and the cis-/trans- ratio values in Table 1 show that among the most significant co-expression relationships in the healthy phenotype there are more interactions between genes in different chromosomes than between genes in the same chromosome. Meanwhile, each one of the breast cancer molecular subtypes present an inverted proportion than the healthy phenotype indicating that cis-/trans- imbalance is a common feature in breast cancer subtypes. In fact, the Basal breast cancer heatmap shows that this subtype is almost depleted of trans- interactions. Additionally, cis- interactions in Basal circos plot appear to be straight lines, since genes with a high co-expression values are physically close.
The cis-/trans- ratio values where compared with those obtained from a configuration null model that takes into account the connectivity patterns displayed by the sets of the most significant co-expression interactions. According to Z-scores (Table 1), cis-/trans- ratio values lie far from expected values in the configuration models, being the Basal suptype the set with the most extreme value. The fact that the healthy phenotype also displays a significant Z-score reveals that the cis-/trans- ratio in the healthy phenotype is greater even than what it is expected in a random model.
3.1.1. Gene Co-expression Networks Show Unique Structures
Gene co-expression networks were assembled to analyze the structure that emerges from the most significant interactions. Figure 2 displays the healthy network structure as well as the four network subtypes. In these visualizations, genes are colored according to the chromosome in which they are located. The five networks depicted show different structural features regarding size of components (set of nodes that are connected among them but have no interactions with other nodes) and gene connectivity.
Figure 2. Gene co-expression networks. In this representation we observe the most significant interactions per network. Genes are colored according to the chromosome in which they are located. Notice the evident difference between size and color combination in the largest component in the healthy network vs. the four subtypes. Cancer networks have almost the same color in the largest component. A particular case is the Luminal A subtype; despite the largest component (circled) has several genes from different chromosomes, they are grouped according to their corresponding chromosome.
In the healthy network, the largest component (LC) comprises almost all genes, contrary to any tumor subtype network, where the size of the LC is smaller (Table 1). Furthermore, the vast majority of genes in each component of any breast cancer subtype network belongs to a single chromosome. LC in the Luminal A subtype network displays a mixed behavior: while there are genes from different chromosomes linked inside the component, they are mostly assembled into clusters of cis- interactions. In terms of connectivity, genes in breast cancer networks are highly connected between them, i.e., network components in cancer networks present a higher clustering coefficient compared to the healthy network.
3.2. Gene Co-expression Is Distance-Dependent in Breast Cancer
3.2.1. Physically Close Genes Exhibit Higher Co-expression in Breast Cancer Subtypes
To evaluate the relationship between gene-pair physical distance and the strength of co-expression, the entire set of cis- interactions were analyzed by calculating gene-pair co-expression values measured by Mutual Information (MI) for every cis- gene pair in the five phenotypes. MI values were normalized per phenotype to allow for comparison. In Figure 3, gene pairs are grouped into 5 Mbp distance bins and the mean normalized-MI value for each bin is plotted. The area displays the standard deviation for each group.
Figure 3. Strongest interactions occur between physically close genes in breast cancer subtypes. Mean co-expression MI values (black line) and the standard deviation (colored shadows) for all cis- interactions grouped at 5 Mbp gene-pair distance. In the case of healthy phenotype, co-expression remains constant at higher distances, contrary to any case of breast cancer. Furthermore, the plateau reached by all cancer phenotypes is clearly lower compared to the healthy one.
In the healthy phenotype plot, co-expression values remain almost constant with slightly greater MI values for the closest distances. Meanwhile, breast cancer subtype plots display similar behavior: short gene-pair distances have the highest MI values and, as distance increases, co-expression is lower, reaching a plateau close to 100 Mbp. Notice that the plateau in cancer is lower than the plateau observed in the healthy plot.
3.2.2. Co-expression Decay Associated to Distance Is Specific for Each Chromosome and Subtype
Once observed that gene co-expression decays with distance in each breast cancer subtype, the decay was evaluated for every chromosome individually. Normalized-MI values from cis- interactions were grouped into 100-gene-pair-bins and plotted vs. gene-pair distance for every chromosome and for each subtype.
Figure 4A displays plots for chromosomes 8 and 17 for the non-cancerous and the different breast cancer subtypes, while Figure 4B displays values for all chromosomes in the healthy and the Basal phenotypes.
Figure 4. Strength of cis- insteractions decays with chromosomal distance in breast cancer subtypes. (A) Chromosomes 8 and 17 gene co-expression through distance. For both chromosomes, there is a remarkable difference between the strength of co-expression in healthy phenotype compared to any subtype. Each point represents the average of gene co-expression (MI) of 100 gene pairs according to the distance between them. Green and turquoise curves are the fitting to a 4th degree linear model. (B) Gene co-expression by distance in all chromosomes for healthy and Basal subtype. Color code of the chromosomes appears at the right of the figure. (C) Fisher's Least Significant Difference test result of MI mean values at 1 Mbp gene-gene distance. Test assigns a different group for each phenotype, which confirms they are significantly different.
Values in the MI-distance plots were adjusted using a linear model that accounts for an independent intercept for each chromosome (see section 2). The complete model has an adjusted R2 = 0.94. The ANOVA table results are presented in Supplementary Table 1, which confirms the distance-dependent decay in breast cancer subtypes (P < 1e-4). The shape of this decay in all cancer phenotypes is similar to that showed by our group for Basal breast cancer (43). There, co-expression strength diminished with distance in an exponential-like form. Supplementary Figure 2 contains the scatter plots for all chromosomes in the four breast cancer subtypes and the non-cancerous phenotype.
In Figure 4A for any subtype, MI follows a decay relative to gene-pair distance. On the other hand, Figure 4B shows that the strength of cis- interactions in the healthy phenotype is higher for extremely close gene pairs and then remains constant along the x-axis. Furthermore, the decay is different for each chromosome in breast cancer. This can also be observed in Figure 4A, in Chr17 the decay is more abrupt than the case of Chr8.
Between chromosomes, the same subtype has slightly different patterns of decay, but that is not the case for the healthy co-expression program (Figure 4). The inserts of Figure 4B are a zoom over the closest distances. Color distribution allows the observation that there is more variability of cis- interactions in the Basal co-expression values than in the healthy phenotype.
By means of a Fisher's Least Significant Difference test (see section 2) it was proved that co-expression strength decay is different for each breast cancer subtype as they are classified into individual groups displayed in Figure 4C.
3.3. High Density Intra-Cytoband Hotspots Emerge in Breast Cancer Subtypes
Patterns of highly connected individual chromosome regions emerge for each subtype when individual chromosomes are analyzed. Cytogenetic bands where used to define chromosome regions of physically close genes: cis- interactions in the co-expression networks are classified into intra-cytoband when they link genes in the same cytogenetic band, and inter-cytoband, when genes are on different cytogenetic regions.
In the four subtype networks, a highly interconnected region composed only by intra-cytoband links is observed in chr8q24.3 (Figure 5). This intra-cytoband hotspot is not recovered in the healthy network. This region is important because it is commonly amplified in breast cancer tumors associated with altered function of MYC (44). Furthermore, intra-cytoband genes in p-arm of chromosome 8 are also highly connected in the four cancer networks.
Figure 5. Chromosome 8 in breast cancer networks. This ideogram-like network layout shows the number of intra-cytoband (blue), inter-cytoband (purple), gene-gene interactions for all phenotypes. Notice the four hotspots observed in Chr8q24.3 region (right part) in the four subtypes, absent in the healthy network.
In Luminal A network, an important increment in cis- density in chromosome 17 is observed. A high number of inter-cytoband interactions in Chr17p can be appreciated (left part, Figure 6). As in Figure 5, hotspots of co-expression can also be appreciated. p-arm of Luminal A network contains highly dense intra-cytoband interactions, but also inter-cytoband ones. In the rest of subtypes, intra-cytoband hotspots appear in different regions. Finally, chr17q25.3 region contains a highly dense hotspot in all subtypes. Bottom part of the figure is a zoom-in of chr17q12 region, where ERBB2 gene is located. There, several intra-cytoband interactions may be observed. The ideogram-like figures for all chromosomes in each phenotype are presented in Supplementary Figures 3–7.
Figure 6. Chromosome 17 presents different intra/inter cytoband patterns for each subtype. As in the previous figure, inter/intra-cytoband interactions are depicted for genes in chromosome 17. Notice the inversion of proportion from the healthy case to any other subtype. The bottom part corresponds to a zoom-in of chr17q12 region. Inter-cytoband links does not appear since we depicted one cytoband only.
3.4. Strongest Interactions Are Most Commonly Intra-Cytoband in Breast Cancer
To further analyze the co-expression patterns of the five phenotypes, we used cytogenetic bands since they delimit chromosome regions of physically close genes.
When genes linked by the most significant interactions are classified according to their cytogenetic band, the healthy phenotype displays a greater number of inter-cytoband than intra-cytoband links (apart from the already mentioned difference between cis- and trans- interactions). On the contrary, for any breast cancer subtype there are much more intra-cytoband interactions than inter-cytoband interactions.
In upper panel of Figure 7 differences between intra-cytoband/inter-cytoband proportion between healthy and cancer networks are displayed. It is also remarkable the already mentioned difference between cis- and trans- interactions (orange bars). Boxplots in the bottom panel show the differences of co-expression values between intra/inter-cytobands, and also between trans- co-expression values.
Figure 7. Strongest interactions are preferably Intra-cytoband in breast cancer Upper panel shows the number of cis- intra-cytoband, inter-cytoband, and trans- interactions for the five phenotypes. The lower panel shows boxplots of MI co-expression values for the same groups. To notice the higher MI values in healthy phenotype compared with the cancer ones.
In this work we have described the co-expression patterns in breast cancer molecular subtypes by analyzing the structure established by the strongest interactions in co-expression networks, as well as the relationship between the entire set of co-expression values and physical distance between gene pairs. There are four main characteristics displayed by the co-expression patterns in breast cancer molecular subtypes that are not present in the healthy phenotype: (1) there is a cis-/trans- interactions proportion imbalance, (2) the strength of gene-pair co-expression depends on physical distance, (3) there is an emergence of high density co-expression hotspots and (4) strongest interactions as well as hotspots are preferably intra-cytoband in breast cancer.
The cis-/trans- ratio imbalance in the co-expression networks is a phenomenon that was already observed for the unclassified set of breast cancer samples (29). However, differences in transcriptional profile are widely accepted to recover clinical features such as prognosis and response to treatment (45). Transcriptional profiles are the basis of the classification into the main four molecular subtypes: Luminal A, Luminal B, Her2-enriched and Basal. Our group has shown that co-expression networks of breast cancer molecular subtypes are different in terms of their modularity (46) and network architecture (47). Here, it is shown that the cis-/trans- ratio imbalance is another feature that characterizes each network (Table 1, Figure 3).
When compared to a null model, the healthy co-expression network received a significant Z-score on its cis-/trans- ratio. This is an expected observation given the well-described influence from neighboring transcription in eukaryotic genome (12, 13). Although it has been shown that some clustered genes are not only co-expressed but also functionally related (48), the phenomenon has been mostly explained as a mechanism to control dose-sensitive genes, allowing the modification of the expression of multiple genes by the same epigenetic marks (13, 49, 50).
In agreement with this feature, MI values for the healthy phenotype in Figure 3 display high co-expression at a very close distance but they quickly reach a plateau and then remain unchanged. In contrast, for any breast cancer subtype, there is a decay in gene co-expression associated with physical gene pair distance. This result suggests that co-expression from neighboring genes gets exacerbated in breast cancer and presents different patters for each molecular subtype.
During the oncogenic process, there is a disruption of the regulatory elements that control gene transcription (15). The result is an altered expression profile associated with the promotion of a tumorigenic phenotype. Among these alterations, there are some examples that might favor the formation of clustered genes with similar expression. For instance, it has been shown that Topologically Associated Domains (TAD) boundaries are altered in cancer promoting the formation of additional TADs, smaller and within normal TAD-architecture (51, 52).
Furthermore, Estrogen receptor, usually upregulated in the luminal subtypes, is known to have an influence in the structural organization of chromatin in estrogen-regulated genes, particularly, by promoting long range interactions (53–56). This feature may have an influence in the co-expression profiles, for example, in the higher number of trans- interactions in Luminal subtypes (Figure 2). Additionally, Luminal A and Luminal B maximal co-expression values at closest distances are smaller than in the remaining breast cancer subtypes (Figure 3).
Alterations in the methylation patterns of contiguous gene regions have also been identified in cancer. Through hypermethylation and subsequent gene repression, regions of long-range epigenetic silencing (LRES) are found to be functionally inactivated in breast (57) and prostate cancer (58). Long-range epigenetic activation (LREA) of adjacent genes within a region have also been found in prostate cancer (59). These alterations might favor the formation of high density intra-cytoband hotspots, observed in the breast cancer co-expression profiles but not in the healthy one.
Among the most documented genomic modifications in breast cancer are gene copy number alterations (CNAs). CNAs are also known to affect gene expression over clusters of genes (60–62). Amplification and protein overexpression of HER2 actually define a specific molecular subtype (63, 64). According to (65), the HER2-amplicon was narrowed to an 85.92 kbp region including the TCAP, PNMT, PERLD1, HER2, C17orf37 and GRB7 genes. Four out of those six genes appear in the HER+ co-expression network, HER2 included.
There is another amplicon that has been associated with the BRCA1 mutated triple negative breast cancer in the region 17q25.3, that was also observed in HER2+ and Luminal B subtype (66) (Figure 6). We recovered the region as an intra-cytoband hotspot but it was more dense in the HER2+ subtype. Region 6p21.2-6p12 in the triple-negative breast cancer, associated with the Basal subtype has been also identified as an amplicon (67) and as a hotspot in our network (Supplementary Figure 7). We also identified the 17q21.3 amplicon in the HER2+ subtype (61).
Chr 16p13 hotspot was also found in our networks for all cancer phenotypes, but not for the healthy one. Genomic loss in this region has been associated with poor prognosis in colorectal cancer (68). However, a copy number gain in breast cancer has been observed (69). Here we report a co-expression hotspot in said region, which coincides with the observation of (69). Hence, the observed hotspots may serve as a proxy of other genomic and epigenomic alterations in cancer.
It is widely known that the DNA-binding protein CCCTC-binding factor (CTCF) regulates higher-order chromatin organization in mammalian cells, (70). It has been observed that CTCF binding to enhancer RNAs is enriched when breast cancer cells are stimulated with estrogen (71). Furthermore, cell line specific CTCF binding events have been encountered in breast cancer cell lines (72). This could alter the tridimensional structure, affecting gene expression, and thus modifying the co-expression landscape. The fact that CTCF binding sites could be modified in different phenotypes or conditions, may be behind the observed loss of trans- co-expression.
The latter acquire more relevance since apparently, network hotspots in the cancer subtypes may be generated precisely due to an altered chromatin modification. Further steps toward a proper answer for that most include a mapping of subtype-specific CTCF binding sites and intra-cytoband hotspots boundaries.
Alterations are not isolated. In addition to the already mentioned genomic modifications observed in cancer, it is known that the three-dimensional chromatin organization influences chromosomal alterations at the sequence level (73). Hence, there is an interplay between the different regulatory elements altered during the oncogenic process. However, all of these changes are reflected in the gene expression profile.
It is worth mentioning that mutual information values in control network are, in general, higher than those of breast cancer subtypes (Lower values of Figures 3, 4). This may be due to a more stable gene expression signature in healthy tissue. This could produce similar expression values that traduce in higher MI co-expression between gene couples.
Despite the lack of an accurate experimental setup to assess these results, our methodology recovered different instances of alterations that were already reported for specific breast cancer subtypes. This is an example of how a co-expression network approach might help to understand the different layers of transcriptional regulation that promote a tumorigenic phenotype.
The heterogenic nature of breast cancer makes it a highly complex disease. The only way toward a profound comprehension of its functioning must be addressed from all possible perspectives. In this paper, we have used a co-expression network approach to dissect with high accuracy, the dependency of physical distance on gene co-expression in cancer. Several chromosome regions displaying a subtype-specific co-expression pattern coincide with genomic and epigenomic alterations already observed in breast cancer using different experimental protocols, as well as distinct -omic approaches
From this work we highlight the following: (i) gene co-expression networks capture different instances of molecular alterations in breast cancer, which can be translated to any other tissue. (ii) This is a different approach to analyze alterations at the transcriptional level in cancer, and (iii) The loss of long-range co-expression, concomitant with the strong gain of physically close gene co-expression, may constitute a manifestation of a non-previously observed phenomenon in cancer, and it can be the starting point to conduct investigations on how the transcriptional process shapes the oncogenic phenotype.
Data Availability Statement
Publicly available datasets were analyzed in this study. Samples from The Cancer Genome Atlas Project can be found in the NCI Genomic Data Commons (GDC) portal (https://portal.gdc.cancer.gov/).
DG-C performed computational analyses, developed and implemented programming code, performed pre-processing and low-level data analysis, made the figures, and drafted the manuscript. GA-J developed and implemented programming code and contributed to the writing of the manuscript. CF performed pre-processing and low-level data analysis and contributed to the writing of the manuscript. EH-L devised the project's strategy and methodological approach, contributed to the theoretical and modeling analysis, co-supervised the project, and contributed and supervised the writing of the manuscript. JE-E conceived and designed the overall project, co-supervised the project, took the lead in the biological analyses, and drafted the manuscript. All authors read and approved the final version of the manuscript.
This work was supported by CONACYT (558985 student grant to DG-C, 285544/2016, and 2115/2018 to JE-E), as well as by federal funding from the National Institute of Genomic Medicine (Mexico). Additional support has been granted by the National Laboratory of Complexity Sciences (232647/2014 CONACYT). JE-E is recipient of the 2018 Miguel Alemán Fellowship in Health Sciences. EH-L is a recipient of the 2016 Marcos Moshinsky Fellowship in the Physical Sciences. DG-C is a doctoral student from Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México (UNAM). This work is part of her Ph.D. Thesis.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2020.01232/full#supplementary-material
Supplementary Figure 1. Circos plots contain the most significant interactions for each phenotype (P < 1e−8). Orange lines join genes from different chromosomes; meanwhile purple and blue lines take account for intra-chromosomal (cis-) interactions. External circles indicate the chromosome band. Bottom right panel is a heatmap representing the number of cis- and trans- interactions per chromosome for each subtype.
Supplementary Figure 2. Scatter plots showing the strength of cis- interactions with respect to physical distance in all chromosomes for the five phenotypes.
Supplementary Figure 3. cis- interactions for healthy network. Blue edges are intra cytoband, meanwhile inter cytoband interactions are colored in purple.
Supplementary Figure 4. cis- interactions for Luminal A network.
Supplementary Figure 5. cis- interactions for Luminal B network.
Supplementary Figure 6. cis- interactions for HER2+ network.
Supplementary Figure 7. cis- interactions for Basal network.
Supplementary Table 1. Type-III ANOVA table results for the Intra-chromosomal regulation decay model. The ANOVA table tested the dependence of Mutual Information (MI) values using a linear model including a fourth-order polynomial for the distance (D) between gene-pairs, accounting independent intercepts for each chromosome (Chr), breast cancer subtype and also for their corresponding polynomial terms. The model was fitted in the natural logarithm (Ln) scale for both MI and D. Significance codes: 0; “***”0.001; “**”0.01; “*”0.05; “.”0.1; “”1.
LC, Largest Component; MI, Mutual Information; TCGA, The Cancer Genome Atlas.
20. Desmedt C, Haibe-Kains B, Wirapati P, Buyse M, Larsimont D, Bontempi G, et al. Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res. (2008) 14:5158–65. doi: 10.1158/1078-0432.CCR-07-4756
21. Fallahpour S, Navaneelan T, De P, Borgo A. Breast cancer survival by molecular subtype: a population-based analysis of cancer registry data. CMAJ Open. (2017) 5:E734–9. doi: 10.9778/cmajo.20170030
27. Alcalá-Corona SA, Velázquez-Caldelas TE, Espinal-Enríquez J, Hernández-Lemus E. Community structure reveals biologically functional modules in mef2c transcriptional regulatory network. Front Physiol. (2016) 7:184. doi: 10.3389/fphys.2016.00184
28. de Anda-Jáuregui G, Alcalá-Corona SA, Espinal-Enríquez J, Hernández-Lemus E. Functional and transcriptional connectivity of communities in breast cancer co-expression networks. Appl Netw Sci. (2019) 4:22. doi: 10.1007/s41109-019-0129-0
29. Espinal-Enríquez J, Fresno C, Anda-Jáuregui G, Hernández-Lemus E. RNA-Seq based genome-wide analysis reveals loss of inter-chromosomal regulation in breast cancer. Sci Rep. (2017) 7:1760. doi: 10.1038/s41598-017-01314-1
30. de Anda-Jáuregui G, Espinal-Enriquez J, Hernández-Lemus E. Spatial organization of the gene regulatory program: an information theoretical approach to breast cancer transcriptomics. Entropy. (2019) 21:195. doi: 10.3390/e21020195
31. Fresno C, González GA, Merino GA, Flesia AG, Podhajcer OL, Llera AS, et al. A novel non-parametric method for uncertainty evaluation of correlation-based molecular signatures: its application on PAM50 algorithm. Bioinformatics. (2017) 33:693–700. doi: 10.1093/bioinformatics/btw704
33. Nueda MJ, Ferrer A, Conesa A. ARSyN: a method for the identification and removal of systematic noise in multifactorial time course microarray experiments. Biostatistics. (2012) 13:553–66. doi: 10.1093/biostatistics/kxr042
38. Emmert-Streib F, Glazko GV, Altay G, de Matos Simoes R. Statistical inference and reverse engineering of gene regulatory networks from observational expression data. Front Genet. (2012) 3:8. doi: 10.3389/fgene.2012.00008
39. Hernández-Lemus E, Rangel-Escareño C. The role of information theory in gene regulatory network inference. In: Deloumeaux P, Gorzalka JD, editors. Information theory: New research, Mathematics Research Developments Series. New York, NY: Nova Publishing (2011).
40. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinform. (2006) 7(Suppl 1):S7. doi: 10.1186/1471-2105-7-S1-S7
43. de Anda-Jáuregui G, Fresno C, García-Cortés D, Esinal-Enríquez J, Hernández-Lemus E. Intrachromosomal regulation decay in breast cancer. Appl Math Nonlinear Sci. (2019) 4:217–24. doi: 10.2478/AMNS.2019.1.00020
44. Ahmadiyeh N, Pomerantz MM, Grisanzio C, Herman P, Jia L, Almendro V, et al. 8q24 prostate, breast, colon cancer risk loci show tissue-specific long-range interaction with MYC. Proc Natl Acad Sci USA. (2010) 107:9742–6. doi: 10.1073/pnas.0910668107
47. de Anda-Jáuregui G, Velázquez-Caldelas TE, Espinal-Enríquez J, Hernández-Lemus E. Transcriptional network architecture of breast cancer molecular subtypes. Front Physiol. (2016) 7:568. doi: 10.3389/fphys.2016.00568
48. Aure MR, Steinfeld I, Baumbusch LO, Liestøl K, Lipson D, Nyberg S, et al. Identifying in-trans process associated genes in breast cancer by integrated analysis of copy number and expression data. PLoS ONE. (2013) 8:e53014. doi: 10.1371/journal.pone.0053014
50. Lian S, Liu T, Jing S, Yuan H, Zhang Z, Cheng L. Intrachromosomal colocalization strengthens co-expression, co-modification and evolutionary conservation of neighboring genes. BMC Genomics. (2018) 19:455. doi: 10.1186/s12864-018-4844-1
51. Taberlay PC, Achinger-Kawecka J, Lun AT, Buske FA, Sabir K, Gould CM, et al. Three-dimensional disorganization of the cancer genome occurs coincident with long-range genetic and epigenetic alterations. Genome Res. (2016) 26:719–31. doi: 10.1101/gr.201517.115
52. Barutcu AR, Lajoie BR, McCord RP, Tye CE, Hong D, Messier TL, et al. Chromatin interaction analysis reveals changes in small chromosome and telomere clustering between epithelial and breast cancer cells. Genome Biol. (2015) 16:1–14. doi: 10.1186/s13059-015-0768-0
54. Rafique S, Thomas JS, Sproul D, Bickmore WA. Estrogen-induced chromatin decondensation and nuclear re-organization linked to regional epigenetic regulation in breast cancer. Genome Biol. (2015) 16:1–19. doi: 10.1186/s13059-015-0719-9
55. Yang J, Wei X, Tufan T, Kuscu C, Unlu H, Farooq S, et al. Recurrent mutations at estrogen receptor binding sites alter chromatin topology and distal gene expression in breast cancer. Genome Biol. (2018) 19:1–15. doi: 10.1186/s13059-018-1572-4
56. Baxter JS, Leavy OC, Dryden NH, Maguire S, Johnson N, Fedele V, et al. Capture Hi-C identifies putative target genes at 33 breast cancer risk loci. Nat Commun. (2018) 9:1028. doi: 10.1038/s41467-018-03411-9
57. Novak P, Jensen T, Oshiro MM, Watts GS, Kim CJ, Futscher BW. Agglomerative epigenetic aberrations are a common event in human breast cancer. Cancer Res. (2008) 68:8616–8625. doi: 10.1158/0008-5472.CAN-08-1419
58. Coolen MW, Stirzaker C, Song JZ, Statham AL, Kassir Z, Moreno CS, et al. Consolidation of the cancer genome into domains of repressive chromatin by long-range epigenetic silencing (LRES) reduces transcriptional plasticity. Nat Cell Biol. (2010) 12:235–46. doi: 10.1038/ncb2023
59. Bert SA, Robinson MD, Strbenac D, Statham AL, Song JZ, Hulf T, et al. Regional activation of the cancer genome by long-range epigenetic remodeling. Cancer Cell. (2013) 23:9–22. doi: 10.1016/j.ccr.2012.11.006
60. Zhou Y, Luoh SM, Zhang Y, Watanabe C, Wu TD, Ostland M, et al. Genome-wide identification of chromosomal regions of increased tumor expression by transcriptome analysis. Cancer Res. (2003) 63:5781–4.
62. Menghi F, Inaki K, Woo X, Kumar PA, Grzeda KR, Malhotra A, et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc Natl Acad Sci USA. (2016) 113:E2373–82. doi: 10.1073/pnas.1520010113
63. Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL. Human breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science. (1987) 235:177–82. doi: 10.1126/science.3798106
64. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. (2001) 98:10869–74. doi: 10.1073/pnas.191367098
65. Staaf J, Jönsson G, Ringnér M, Vallon-Christersson J, Grabau D, Arason A, et al. High-resolution genomic and expression analyses of copy number alterations in HER2-amplified breast cancer. Breast Cancer Res. (2010) 12:R25. doi: 10.1186/bcr2568
66. Toffoli S, Bar I, Abdel-Sater F, Delrée P, Hilbert P, Cavallin F, et al. Identification by array comparative genomic hybridization of a new amplicon on chromosome 17q highly recurrent in BRCA1 mutated triple negative breast cancer. Breast Cancer Res. (2014) 16:466. doi: 10.1186/s13058-014-0466-y
67. Andre F, Job B, Dessen P, Tordai A, Michiels S, Liedtke C, et al. Molecular characterization of breast cancer with high-resolution oligonucleotide comparative genomic hybridization array. Clin Cancer Res. (2009) 15:441–51. doi: 10.1158/1078-0432.CCR-08-1791
68. Andersen CL, Lamy P, Thorsen K, Kjeldsen E, Wikman F, Villesen P, et al. Frequent genomic loss at chr16p13.2 is associated with poor prognosis in colorectal cancer. Int J Cancer. (2011) 129:1848–58. doi: 10.1002/ijc.25841
69. Tran LM, Zhang B, Zhang Z, Zhang C, Xie T, Lamb JR, et al. Inferring causal genomic alterations in breast cancer using gene expression data. BMC Syst Biol. (2011) 5:121. doi: 10.1186/1752-0509-5-121
70. Pugacheva EM, Kubo N, Loukinov D, Tajmul M, Kang S, Kovalchuk AL, et al. CTCF mediates chromatin looping via N-terminal domain-dependent cohesin retention. Proc Natl Acad Sci USA. (2020) 117:2020–31. doi: 10.1073/pnas.1911708117
71. Fiorito E, Sharma Y, Gilfillan S, Wang S, Singh SK, Satheesh SV, et al. CTCF modulates Estrogen Receptor function through specific chromatin and nuclear matrix interactions. Nucleic Acids Res. (2016) 44:10588–602. doi: 10.1093/nar/gkw785
Keywords: breast cancer molecular subtypes, loss of trans- co-expression, gene co-expression networks, distance-dependent gene co-expression, breast cancer
Citation: García-Cortés D, de Anda-Jáuregui G, Fresno C, Hernández-Lemus E and Espinal-Enríquez J (2020) Gene Co-expression Is Distance-Dependent in Breast Cancer. Front. Oncol. 10:1232. doi: 10.3389/fonc.2020.01232
Received: 27 February 2020; Accepted: 16 June 2020;
Published: 24 July 2020.
Edited by:Markus A. N. Hartl, University of Innsbruck, Austria
Reviewed by:Annalisa Astolfi, University of Ferrara, Italy
Yan Guo, Vanderbilt University, United States
Copyright © 2020 García-Cortés, de Anda-Jáuregui, Fresno, Hernández-Lemus and Espinal-Enríquez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.