Integrated Transcriptome and Pathway Analyses Revealed Multiple Activated Pathways in Breast Cancer

Breast cancer (BC) is the leading cause of cancer-related death in women. Therefore, a better understanding of BC biology and signaling pathways might lead to the development of novel biomarkers and targeted therapies. Although a number of transcriptomic studies have been performed on breast cancer patients from various geographic regions, there are almost no such comprehensive studies performed on breast cancer from patients in the gulf region. This study aimed to provide a better understanding of the altered molecular networks in BC from the gulf region. Herein, we compared the transcriptome of BC to adjacent normal tissue from six BC patients and identified 1,108 upregulated and 518 downregulated transcripts. A selected number of genes from the RNA-Seq analysis were subsequently validated using qRT-PCR. Differentially expressed (2.0-fold change, adj. p < 0.05) transcripts were subjected to ingenuity pathway analysis, which revealed a myriad of affected signaling pathways and functional categories. Activation of ERBB2, FOXM1, ESR1, and IGFBP2 mechanistic networks was most prominent in BC tissue. Additionally, BC tissue exhibited marked enrichment in genes promoting cellular proliferation, migration, survival, and DNA replication and repair. The presence of genes indicative of immune cell infiltration and activation was also observed in BC tissue. We observed high concordance [43.5% (upregulated) and 62.1% (downregulated)] between differentially expressed genes in our study group and those reported for the TCGA BC cohort. Our data provide novel insight on BC biology and suggest common altered molecular networks in BC in this geographic region. Our data suggest future development of therapeutic interventions targeting those common signaling pathways.

The age-standardized incidence rate is growing in many countries, particularly in the Arab countries where the reported BC incidence ranges from 9.5 to 50 cases per 100,000 women per year. In the gulf region, the incidence of BC in Bahrain, United Arab Emirates, Saudi Arabia, Qatar, and Kuwait were 53. 4, 22.8, 17.5, 48.2, and 46.6 cases per 100,000 women, respectively (8). Although the incidence of BC in this geographic region is lower than those reported in Europe and USA, the incidence of BC in Arab countries are on the rise (1,9). Interestingly, patients diagnosed with BC in the Arab world are approximately a decade younger and they are oftentimes presented with larger and more advanced stage tumors (8). A previous microarray-based study comparing the clinical and gene expression profile of breast cancer from the north (France) and south (Lebanon, Tunisia, and Morocco) Mediterranean patients revealed more aggressive tumor in the south Mediterranean patient group. Tumors from the south group were predominantly luminal B, while tumors from the north were mostly luminal A subtype (10).
Recent advances in transcriptome analysis have revolutionized our understanding of human disease (11). In the current study, we utilized next generation sequencing (NGS) and bioinformatics and characterized the transcriptional landscape of BC compared to adjacent normal tissue from the gulf region and identified multiple activated networks. Our data provides the first transcriptome and network analyses of BC in this geographic region setting the foundation for future development of novel BC biomarkers and therapeutic interventions.

Ethics Statement and Sample Collection
Tumor tissues (TT) and adjacent non-cancerous normal tissues (NT) were obtained from six breast cancer patients. All patients included in the study were treatment-naive prior to surgery and were provided with a written informed consent prior to sample collection. The study was performed under ethical approval from Qatar Biomedical Research Institute, Doha, Qatar (Protocol no. 2017-006). The characteristics of patients included in current study are provided in Table 1.

Tissue Preparation and RNA Isolation
RNA was isolated using the RNA/DNA/Protein Purification Plus Kit (Norgen Biotek Corp, Ontario, Canada) as per the manufacturer's instructions from TT and adjacent NT. Briefly, frozen tissues were transferred into a mortar containing adequate amount of liquid nitrogen and were grinded thoroughly using a pestle followed by resuspending the tissue in lysis buffer followed by RNA extraction. The concentration and purity of extracted RNA were measured using NanoDrop 2000c (Thermo scientific, MA, USA) and RNA were stored at −80 • C.

RNA Concentration and Quality Assessment
The quality and quantity of extracted RNA was measured using on-chip electrophoresis utilizing the Agilent RNA 6000 Nano Kit (Agilent Technologies, CA, USA) and Agilent 2100 Bioanalyzer (Agilent Technologies) as per the manufacturer's instructions. Samples exhibited an RNA Integrity Number (RIN) > 7 were used for library preparation.

Library Preparation
The RNA was quantified using Qubit instrument (Invitrogen, USA) and RNA BR assay kit (Invitrogen). Hundred nanogram of RNA was used as an input for library preparation using TruSeq RNA Access Library preparation kit (Illumina, CA, USA) as per the manufacturer's instructions. Briefly, the RNA was fragmented into small pieces under high temperature using divalent cations. The RNA fragments were immediately reverse transcribed to first strand cDNA using random hexamers. Following the first strand, second strand was synthesized by incorporating dUTP instead of dTTP. The sequencing adaptors were ligated to the double-stranded cDNA followed by a single "A" nucleotide adenylation at 3 ′ end of blunt fragments. The final library was created by capturing the coding regions of the transcriptome using sequence-specific probes. The yield of cDNA libraries was quantified using Qubit dsDNA HS assay kit (Invitrogen) and size distribution of the cDNA libraries were determined using the Agilent 2100 Bioanalyzer DNA1000 chip (Agilent Technologies). The clusters were generated on a cBot cluster generation system (Illumina) and sequencing was done on Hiseq 4000 with 300 bp paired-ends. One microgram of RNA from each sample was reverse transcribed into cDNA using QuantiTect Reverse Transcription Kit (Qiagen, Hilden, Germany). PCR reactions were performed on QuantStudio 7/6 Flex qPCR (Applied Biosystems, California, USA) using PowerUP SYBR Green Master Mix (Applied Biosystems). All data were normalized to β-actin. Non-specific amplifications were checked by the use of melting curve. The relative changes in target gene expression were analyzed using 2-CT method. Sequences of primers used in current study are listed in Table 2. The primers were designed using Primer3 (http://www.ncbi.nlm.nih.gov/tools/primer-blast/).

RNA-Seq Data Analysis
Pair end reads were aligned to the hg19 human reference genome in CLC Genomics Workbench-12 (QIAGEN, Germany). The abundance of the expression of transcripts was measured as the score of TPM (Transcripts Per Million) mapped reads in CLC Genomics Workbench 12. Abundance data were subsequently subjected to differential gene expression using 2.0-fold change and < 0.05 p-value cut-off in CLC Genomics Workbench 12.

Gene Set Enrichment and Modeling of Gene Interactions Networks
Upregulated genes were imported into the Ingenuity Pathways Analysis (IPA) software (Ingenuity Systems; www.ingenuity. com/) and were subjected to functional annotations and regulatory network analysis using upstream regulator analysis (URA), downstream effects analysis (DEA), mechanistic networks (MN) and causal network analysis (CNA) prediction algorithms. IPA uses precise to predict functional regulatory networks from gene expression data and provides a significance score for each network according to the fit of the network to the set of focus genes in the database. The p-value is the negative log of P and represents the possibility that focus genes in the network being found together by chance (12,13).

Statistical Analysis
Statistical analyses and graphing were performed using Microsoft excel 2016 and GraphPad Prism 8.0 software (GraphPad, San Diego, CA, USA). The Benjamini-Hochberg False Discovery Rate (FDR) method was used for multiple testing corrections. For comparative qRT-PCR analysis, p-values ≤ 0.05 (two-tailed t-test) were considered significant. For IPA analyses, a Z score (2.0 ≤ Z ≥ 2.0) was considered significant.

RNA-Seq Gene Expression Profiling in BC
The clinical information of the six patients included in the current study and their tumor characteristics are provided in Table 1. To characterize the transcriptional landscape alterations during malignant transformation, tumor, and adjacent normal breast tissues from six BC patients were subjected to whole transcriptome RNA-Seq analysis. As shown in Figure 1A, hierarchical clustering based on differentially expressed RNA transcripts revealed clear clustering of breast cancer from adjacent normal tissues. Using 2.0 FC and ≤ 0.05 FDR p-value cut off, 1108 upregulated and 518-downregulated transcripts were identified (Supplementary File 1). Selected number of upregulated (forkhead box A1 (FOXA1), mucin 1 (MUC1), and downregulated [hemoglobin alpha 2 (HBA2), myocilin (MYOC), hemoglobin subunit beta (HBB), and hemoglobin subunit alpha 1 (HBA1)] genes from the RNA-Seq data were subsequently validated using quantitative reverse transcriptase-PCR (qRT-PCR) (Figure 1B) and demonstrated concordant expression to those observed in the RNA-Seq data  File 2). Canonical pathway analysis on the upregulated gene transcripts using ingenuity pathway analysis (IPA) revealed most significant enrichment in pathways related to pyrimidine ribonucleotides biosynthesis, and estrogen-mediated S phase entry, while G-Protein inhibitory (Gai) and IL8 signaling were among the most under-presented canonical pathways ( Figure 1C).

Activation of Cancer Cell Proliferation, Invasion, and Metabolism of DNA Functional Categories in Breast Cancer Tissue
IPA downstream effector analysis provides a powerful tool to predict the increase or decrease in downstream biological activities and functions that are likely to be casually affected by the transcriptome data. Figure 2A presents a high-level tree map of affected downstream functional categories based on differentially expressed genes in breast cancer tissue. The major colored rectangles indicates a family of associated biological functions or diseases, blue (decreasing) and orange (increasing), and dimension (using FET P-value) of rectangles indicates where associated functions are predicted to up or down most significantly as a group, the color intensity specify higher absolute Z-scores. This analysis revealed remarkable enrichment in several functional categories, mainly those involved in cancer cell growth, and proliferation ( Figure 2B). Furthermore, functional categories associated with tumor cell movement and invasion were enriched, while those associated with myeloid and phagocyte cell chemotaxis were diminished ( Figure 2C). Notably, functional categories associated with DNA replication, recombination and repair were also upregulated in BC tissues, especially those involved in chromosome alignment and metabolism of DNA (Figure 2D), while those involved in cell death were under presented ( Figure 2E). Taken together, our data revealed a significant increase in cell proliferation, migration, DNA replication, while chemotaxis, and cell death-associated functional categories were suppressed.
ERBB2 (HER2), which was upregulated in BC tissues, is predicted to be directly activating NCOA3 and inhibiting CDKN1A and AR (inconsistent relationship) with more confidence. Similarly, activated ERBB2 is directly inhibiting EGFR (inconsistent relationship) and ERK with less confidence, and in a high confidence state PPARG (inconsistent relationship) has been found to be inactivated through the down regulation of ERK and activation of NCOA3. Furthermore, upregulation of ERBB2 was predicted to activate the NFkB complex and impeding the RELA and STAT3 through the downregulation of ERK. However, its inconsistent relationship, in a high confidence mode tumor suppressor TP53 function, was disabled through direct inhibition of CDKN1A and AR. Moreover, HIF1A, CTNNB1, and ESR1 are predicted to be activated by the upregulated ERBB2 through the intermediated EGFR and AR downregulations ( Figure 4A). Analogous to the ERBB2, FOXM1 also exert its inhibitory effect on TP53 via downregulation of CDKN1A (Figure 4B). While excavating further on the ESR1 function from the Figure 4A, though the effect of relationship was not predicted, RARA, NCOA2, NCOA3, HIF1A, JUN, and CTNNB1 are predicted in an active state whereas EGFR, SP3, and STAT3 are inactivated (Figure 4C). Similar occurrence has also been observed with IGFBP2 activation in BC tissues ( Figure 4D).

Breast Cancer Gene Signature Is Highly Enriched in Genes Indicative of Breast Cancer-Related Functional Categories With More Confidence and High Level of Predicted Relationship
Interestingly, we also observed ESR1 as a key hub gene in the cancer network. This interaction network is illustrated as genes (presented as nodes) and biological relationships between nodes (presented as edges) as mapped by IPA. The intensity of the node color correlates with the degree of gene upregulation. Nodes are displayed using different shapes representing the functional class of the gene (e.g., Enzyme, growth factors, transporters, etc.) that is illustrated in the corresponding legend ( Figure 5A).
The top enriched functional network generated by the regulator effect network analysis in IPA is the cell cycle progression, and cell proliferation of breast cancer cell lines network. The network combined differentially expressed potential regulators (12) and genes (34, including 31 increased and 3 decreased legends) in the middle of the hierarchy which are involved in the two major downstream effector function such as cell cycle progression and cell proliferation of breast cancer cell lines (Figure 5B). The other intriguing enriched network was that involved in the metabolism of DNA. This functional network consists of 3 activated (E2F, E2F3, and HELLS) and two inhibited (E2F6 and MAP2) upstream regulators, and 12 upregulated and one downregulated gene, in the middle hierarchy which are involved in metabolism of DNA (Figure 5C). Orange symbols at the top are the predicted upstream regulators. Colored symbols represent upregulated genes, with color intensity corresponding to the change in gene expression (Figure 5C).
We subsequently compared the list of differentially expressed genes from the current study to those reported in the TCGA invasive breast cancer dataset. Overall, we observed a large similarity between the upregulated (43.5%) and downregulated (62.1%) genes in the current study and those reported in the TCGA invasive breast cancer dataset (Figures 6A,B). The expression profile of select cancer-related genes, which were upregulated in current study, in the TCGA dataset is presented in Figure 6C, which was concordant with our data.

DISCUSSION
The pathobiology of breast cancer is orchestrated by complex regulatory networks involving many gene hubs and regulatory molecules (20)(21)(22). Deciphering such complex signaling and functional networks provides a foundation for future development of targeted therapeutic interventions and disease biomarker discovery. While a multitude of transcriptomic data are currently available for breast cancer from several parts of the world, there are almost no such studies performed on breast cancer from the gulf region (23)(24)(25). Our previous data have highlighted a number of common and novel transcriptome networks in colorectal cancer from patients in the gulf region, suggesting a possible role for environmental and genetic factors in shaping the transcriptome of colorectal cancer (26, 27). Therefore, our current study provides the first RNA-Seq transcriptome analysis of breast cancer from the gulf region. Herein, we integrated the power of next generation sequencing with the ingenuity pathway analysis platform to understand the biology of breast cancer and to highlight various signaling and functional perturbations during breast cancer development in this geographic region, highlighting a number of key signaling networks in breast cancer.
Our global analyses revealed the enrichment of gene signatures indicative of cell proliferation and movement (migration and invasion), DNA replication and recombination, and immune cell trafficking, while genes associated with cell death were underrepresented. Mechanistic network analyses revealed activation of several signaling cascades with ERBB2, FOXM1, and ESR1 on top of the hierarchy. Our data are concordant with other studies highlighting an important role for ERBB (HER2), FOXM1, and ESR1 in breast cancer from other geographic regions (28)(29)(30).
In agreement with our expression data, functional studies revealed exogenous expression of ERBB2 in ERBB2-negative breast cancer cells (MCF7 and T47D) to enhance the expression of FOXM1 and MMP2. Inhibition of FOXM1 by RNA interference prevented induction of invasion by ionizing radiation (IR), while overexpression of FOXM1 in MCF10A cells was sufficient to promote IR-induced invasion (31). On the other hand, silencing of IGFBP-2 suppressed MCF7 breast cancer cell proliferation and increased cell death, suggesting IGFBP-2 as promoter of breast cancer survival (32). Interestingly, silencing of ER-α/ESR1 reversed the ability of IGFBP-2 to confer cell survival, suggesting IGFBP-2 to modulate IGFs, to directly regulate PTEN, and to play a role in maintaining ER-α expression (33). Those data corroborate a functional role for the identified molecular networks in BC biology.
Interestingly, one of the patients used in the RNA-Seq experiments was classified as HER2 − based on the pathological report; however, the expression of HER2 mRNA transcript was elevated based on the RNA-Seq data, suggesting possible differences in the pathological and molecular assessment of HER2 expression in breast cancer. Our data also revealed upregulation of not only ERBB2, but also ERBB3 and ERBB4 in breast cancer.
Tumor-infiltrating immune cells play critical roles in breast cancer pathogenesis (34). Our data highlighted the presence of a gene signature indicative of altered immune infiltration. Interestingly, CXCL9 and CXL10 were upregulated in BC tissue and were shown before to enhance the mobilization of cytotoxic T cells form regional lymph nodes to tumor tissues and to promote CTL-mediated tumor immunity (16). In contrary, expression of CXCL17 by tumor cells was shown to recruit CD11b + Gr1 high F4/80 − immune cells and to promote tumor progression in mice (17). On the other hand, several other chemokines were downregulated in breast compared to normal tissue. For instance, previous studies had shown strong correlation between CCL2, which is downregulated in BC tissue, expression and TAM infiltration and tumor progression (18). Interestingly, both CCL3 and CCL8 binds to CCR5, which was shown to regulate breast cancer cell proliferation, through P53 activation (19). Our data implies loss of CCL3 and CCL8 in breast cancer could lead to enhanced cell proliferation and progression. Altered chemokine expression in the tumor microenvironment (TME) results into several consequences including leukocyte activation and trafficking, angiogenesis, metastasis, and proliferation of cancer cells (35,36). In ovarian cancer, it has been reported that monoclonal antibodies or pharmacological inhibitors targeting CCL11 may be beneficial for the treatment of the disease (37). Additionally, both in vivo and in vitro studies in breast cancer patients elucidated the importance of CXCL17-CXCR8 axis in promoting the proliferation and migration of cancer cells (38). Additional studies on primary colorectal tumor showed that the expression of CXCL17 on tumor cells promotes angiogenesis and tumor infiltration of immune cells (39). These data show that CXCL17 could be a promising target for cancer immunotherapy. On the other hand, other reports showed that chemokine ligands including CXCL9 and CXCL10 have potential angiostatic and anti-tumor activities (40,41). The interaction between CXCL9/CXCL10 with CXCR3 can recruit anti-tumoral dendritic cells, T lymphocytes and natural killer cells to the TME, which could be beneficial for tumor suppression (41). Our data suggest that in the breast TME, chemokines/receptors including CCL11, CXCL17, CXCL9, and CXCL10 were significantly upregulated, compared with normal tissues. Apart from chemokines, ITGA2 and EGF were also found to be upregulated ( Figure 3A). It has been reported that ITGA2 is expressed more significantly in glioblastoma compared with normal glial cells, and targeting ITGA2 through monoclonal antibody could impede the migration of glioblastoma cells, but not their proliferation (42). Moreover, some reports showed that EGF promotes epithelial-mesenchymal transition (EMT), which could contribute to the migration/metastasis of tumor cells, and resistance to chemotherapy or hormonal therapy (43,44). We also found that SERPINA1 and SERPINA3 to be upregulated in breast TME ( Figure 3A). Previous reports showed that SERPINA1 and SERPINA3 are potential prognostic markers and therapeutic targets for colorectal cancer and melanoma, respectively (45,46). Furthermore, another chemokine, CXCL17, which was found to be upregulated in breast tissue (Figure 3A), has been reported to be involved in angiogenesis, recruitment of immune suppressor cells and tumor metastasis (17,47). CXCL17 was preferentially expressed in the aggressive types of breast, lung and gastrointestinal cancer cells, resulted in the accumulation of immature CD11b + Gr1 + myeloid-derived suppressor cells at tumor sites (17,47). These data revealed that targeting migrationrelated genes including chemokines and their receptors in breast TME might be beneficial for tumor immunotherapy.
Current data revealed large similarity in the transcriptome of breast cancer from our study (43.5% upregulated), and (62.1% downregulated) when compared to differentially expressed genes from the TCGA BC dataset. In particular, we observed common altered breast cancer-driver genes in both datasets, suggesting common altered mechanism in breast cancer, regardless of the geographic region. Therefore, our data highlight a common molecular signature associated with key signaling networks in breast cancer, regardless of the ethnic background and geographical relation, which warrants further investigations using larger sample size and multicenter involvement.

DATA AVAILABILITY
This manuscript contains previously unpublished data. The name of the repository and accession number(s) are not available.