NFIX as a Master Regulator for Lung Cancer Progression

About 40% of lung cancer cases globally are diagnosed at an advanced stage. Lung cancer has high mortality, and overall survival in stage I disease is only 70%. This study aimed to identify a candidate transcriptional regulator that initiates the metastatic mechanism by integrating computational and functional studies. Genes involved in lung cancer were retrieved using in silico software, and the 10 kb upstream promoter sequences of these genes were scanned for the master regulator. NFIX shRNAs were transiently transfected into the A549 and NCI-H1299 cell lines. qRT-PCR and functional assays for cell proliferation, migration and invasion were carried out to validate the involvement of NFIX in metastasis. Genome-wide gene expression microarray analysis using a HumanHT-12 v4.0 Expression BeadChip Kit was performed to identify differentially expressed genes and construct a new regulatory network. The in silico analysis identified NFIX as a master regulator that is strongly associated with 17 genes involved in the migration and invasion pathways, including IL6ST, TIMP1 and ITGB1. Silencing of NFIX reduced the expression of IL6ST, TIMP1 and ITGB1 and decreased cellular proliferation, migration and invasion. These data were integrated with the in silico analyses to find the differentially expressed genes. Microarray analysis showed that 18 genes were differentially expressed in both cell lines after integrating t-test, LIMMA and ANOVA results with Benjamini-Hochberg adjustment at p-value < 0.05. A transcriptional regulatory network was constructed from all 18 genes and the existing regulated genes; it includes the new genes PTCH1, NFAT5 and GGCX, which were found to be highly associated with NFIX, the master regulator of metastasis. This study suggests that NFIX is a promising target for therapeutic intervention that may inhibit metastatic recurrence and improve survival.


INTRODUCTION
The 5-year survival rates of patients with stage IA and IB non-small cell lung cancer (NSCLC) are about 49% and 45%, respectively (American Cancer Society, 2015). The poor prognosis in lung cancer suggests that some cells in primary tumors are programmed to metastasize (Quail and Joyce, 2013; Seyfried and Huysentruyt, 2013). Lung cancer is usually asymptomatic in the early stage, and symptoms typically arise only after the tumor has metastasized (Slatore et al., 2011). Besides drug resistance, metastasis is a major cause of cancer recurrence and a major obstacle to effective treatment (Tang et al., 2013). It is therefore crucial to determine the regulatory network that controls metastatic genes in lung cancer in order to improve prognosis and increase patient survival. These genes are not well defined because gene transcription changes at every stage of cancer (Shibayama et al., 2011). However, identifying the transcription factors (TFs) that regulate metastatic mechanisms could be a novel approach for reducing metastasis in lung cancer.
Previous studies of colorectal cancer have indicated that distinct gene expression profiles are associated with disease progression and metastatic recurrence (Shibayama et al., 2011). Metastasis is a multi-step process in which tumor cells interact with their microenvironment (Chiang and Massagué, 2008). Expression of a metastatic signature has been directly associated with poor prognosis in most cancers, including colorectal and breast cancers (Hanahan and Weinberg, 2000). For example, expression of the ELAC1 gene in colorectal cancer is essential for inducing tumor development and cell growth and for disrupting apoptosis pathways (Kleivi et al., 2007). The MTA1 gene is involved in the progression and metastasis of breast cancer (Gururaj et al., 2006), and its overexpression also promotes the transcription of oncogenes (Gururaj et al., 2006; Pakala et al., 2013).
Patients whose tumors express metastatic genes have a poorer prognosis than patients without metastatic gene signatures (Ramaswamy et al., 2003). The authors suggested that rare cells within a primary tumor have the metastatic phenotype, which enables them to migrate and invade (Ramaswamy et al., 2003). Although more than 90% of metastasis signatures have been discovered, the metastatic mechanisms underlying these signatures are largely unknown (Sleeman and Steeg, 2010). Gene expression profiling to identify molecular signatures generates extensive data on differentially expressed genes (Rohrbeck et al., 2008), and these data provide insight into every step of lung carcinogenesis across different tumor morphologies (Borczuk et al., 2003). Using bioinformatics software, a molecular network of metastasis can be constructed and the TFs that control the metastatic process can be identified.
Predicting the specific binding of TFs to DNA sequences is key to constructing the transcriptional regulatory network that leads to metastasis. To understand the metastatic mechanisms, the TFs must first be identified so that the regulatory network can be constructed. These TFs could then be targeted to control lung cancer metastasis, which could improve the survival and prognosis of patients with lung cancer.

Identification of Candidate Metastatic Lung Cancer Genes
Nine lung cancer datasets were selected from the Oncomine database (www.oncomine.org/): Bhattacharjee, Beer, Bild, Hou, Lee, The Cancer Genome Atlas (TCGA), TCGA2, Tomida and Lung. These datasets derive from original microarray analyses of lung cancer published by various researchers and pooled in the Oncomine data-mining platform for easy discovery. TCGA2 differs from TCGA in that it contains data from new samples profiled since the first dataset was published. Genes related to primary non-small cell lung adenocarcinoma, the 5-year survival rate and stage I and II pathology subtypes were retrieved using the given filters. All datasets except outliers were compared, with the cut-off set at p-value < 0.01. FunDO (django.nubic.northwestern.edu/fundo/), a functional disease ontology annotation database, lists the genes involved in various cancers (Osborne et al., 2009). The genes common to cancers and lung cancer were classified by biological process and molecular function (cell-cell adhesion, cell migration, cell motion, cell death, apoptosis, programmed cell death, regulation of locomotion, cell cycle, cell division, localization of cell, cellular differentiation, cell viability, inflammation, regulation of transcription, angiogenesis and oncogenic signaling pathway) using DAVID Bioinformatics Database Version 6.7 (Huang et al., 2008). The integrated genes from the DAVID analysis were further filtered by Pathway Studio analysis with a cut-off of p-value < 0.01. Overlapping genes were selected using GeneVenn (Pirooznia et al., 2007).
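The filtering and overlap steps above amount to thresholding each gene list at p < 0.01 and intersecting what survives, which is what a Venn tool such as GeneVenn displays. The following is a minimal sketch, assuming each source can be exported as (gene, p-value) pairs; the helper names `filter_by_p` and `overlap` and all example values are hypothetical and are not part of Oncomine, DAVID, Pathway Studio or GeneVenn.

```python
from typing import Dict, Iterable, Set, Tuple

P_CUTOFF = 0.01  # p-value cut-off stated in the text


def filter_by_p(records: Iterable[Tuple[str, float]], cutoff: float = P_CUTOFF) -> Set[str]:
    """Return gene symbols whose p-value is strictly below `cutoff`."""
    return {gene for gene, p in records if p < cutoff}


def overlap(gene_sets: Dict[str, Set[str]]) -> Set[str]:
    """Return genes present in every list (the centre of a Venn diagram)."""
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets.values())


if __name__ == "__main__":
    # Hypothetical p-values for illustration only; not data from this study.
    oncomine = [("ITGB1", 0.001), ("IL6ST", 0.004), ("GAPDH", 0.40)]
    pathway_studio = [("ITGB1", 0.002), ("IL6ST", 0.03), ("VEGFA", 0.005)]
    lists = {"oncomine": filter_by_p(oncomine), "pathway_studio": filter_by_p(pathway_studio)}
    print(sorted(overlap(lists)))  # ['ITGB1']
```

A strict `<` comparison is used so that a gene with p exactly 0.01 is excluded, matching the wording "p-value < 0.01".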

Identification of the Master Regulator of Metastatic Lung Cancer
The 10 kb promoter sequences upstream of the transcription start site (TSS) of the final candidate genes were retrieved in FASTA format from the Eukaryotic Promoter Database (EPD) and the National Center for Biotechnology Information (NCBI) for TF recognition. MATCH, a program of the TRANSFAC® gene regulation database, was used to predict putative transcription factor binding sites (TFBS) in the DNA sequences. The TFBS library in the TRANSFAC® database was used to construct the specific binding-site weight matrices for TFBS prediction. A maximum core dissimilarity cut-off of 15% (85% core similarity) was chosen for each matrix; this parameter sets how similar a sequence must be to the weight matrix for the system to report it as a true binding site. PROMO version 3.0.2 scans and identifies TFBS in promoter regions by weight matrix search (Farré et al., 2003; Messeguer et al., 2012). Only factors predicted within a dissimilarity margin of 15% or less were considered.
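MATCH reports a binding site when both a core similarity score (over the most conserved positions of a matrix) and a whole-matrix similarity score exceed their cut-offs. The following is a minimal sketch of that style of scoring, following the published MATCH formulas as we understand them: each position is weighted by its information content, and the score is normalised between the worst and best achievable values. The toy matrix, the 0.80 matrix cut-off and all function names are hypothetical; only the 0.85 core cut-off comes from the text. Real TRANSFAC® matrices are licensed and not reproduced here, and this sketch scans only the forward strand.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
Matrix = List[Dict[str, float]]  # base counts per matrix position


def _freqs(col: Dict[str, float]) -> List[float]:
    total = sum(col.get(b, 0.0) for b in BASES)
    return [col.get(b, 0.0) / total for b in BASES]


def information_vector(matrix: Matrix) -> List[float]:
    """I(i) = sum_B f(i,B) * ln(4 f(i,B)): 0 for a uniform column, larger when conserved."""
    return [sum(f * math.log(4 * f) for f in _freqs(col) if f > 0) for col in matrix]


def similarity(matrix: Matrix, site: str, positions: Sequence[int]) -> float:
    """(Current - Min) / (Max - Min) over the given positions, in [0, 1]."""
    info = information_vector(matrix)
    cur = lo = hi = 0.0
    for i in positions:
        f = _freqs(matrix[i])
        cur += info[i] * f[BASES.index(site[i])]
        lo += info[i] * min(f)
        hi += info[i] * max(f)
    return (cur - lo) / (hi - lo) if hi > lo else 0.0


def core_positions(matrix: Matrix, width: int = 5) -> range:
    """The `width` consecutive positions with the highest summed information."""
    info = information_vector(matrix)
    best = max(range(len(matrix) - width + 1), key=lambda s: sum(info[s:s + width]))
    return range(best, best + width)


def scan(sequence: str, matrix: Matrix, core_cut: float = 0.85,
         matrix_cut: float = 0.80) -> List[Tuple[int, str, float, float]]:
    """Forward-strand scan returning (start, site, core score, matrix score) hits."""
    seq = sequence.upper()
    width = len(matrix)
    core = core_positions(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(b not in BASES for b in site):
            continue  # skip windows containing N or other ambiguity codes
        css = similarity(matrix, site, core)
        mss = similarity(matrix, site, range(width))
        if css >= core_cut and mss >= matrix_cut:
            hits.append((start, site, css, mss))
    return hits


# Hypothetical 6-position toy matrix loosely shaped like a TTGGC half-site.
TOY = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 5, "C": 5, "G": 5, "T": 5},
]

if __name__ == "__main__":
    for hit in scan("AATTGGCATTGGCT", TOY):
        print(hit)
```

With this toy matrix, a single mismatch in the five core positions lowers the core score to 0.8, so only exact core matches pass the 0.85 cut-off.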

Identification of Specific Genes and Their Master Regulator
Transcription factor binding sites predicted by both MATCH and PROMO were integrated to identify the TFs common to the regulated genes. To obtain novel findings, we constructed a network that includes all regulated genes and the predicted TFs using Pathway Studio and String 2.0. One of the direct networks was selected based on the in silico prediction results, as shown in Section "Results."
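The integration step above can be sketched as a per-gene intersection of the TF sets predicted by the two tools, followed by a count of how many candidate genes each shared TF is predicted to bind. This is a minimal sketch, assuming TF names from MATCH and PROMO have already been harmonised to common symbols and that "occurrence" means the number of genes with a predicted site; both are assumptions, and the example predictions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Predictions = Dict[str, Set[str]]  # gene symbol -> TF names predicted in its promoter


def integrate(match_hits: Predictions, promo_hits: Predictions) -> Predictions:
    """For each gene scanned by both tools, keep only TFs predicted by both."""
    genes = match_hits.keys() & promo_hits.keys()
    return {g: match_hits[g] & promo_hits[g] for g in genes}


def rank_tfs(common: Predictions) -> List[Tuple[str, int]]:
    """Rank TFs by the number of candidate genes they are predicted to bind (ties by name)."""
    counts = Counter(tf for tfs in common.values() for tf in tfs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical predictions for illustration only.
    match_hits = {"IL6ST": {"NFIX", "AP1", "SP1"}, "TIMP1": {"NFIX", "SP1"}, "ITGB1": {"NFIX", "AP1"}}
    promo_hits = {"IL6ST": {"NFIX", "AP1"}, "TIMP1": {"NFIX", "SP1"}, "ITGB1": {"NFIX"}}
    print(rank_tfs(integrate(match_hits, promo_hits)))  # [('NFIX', 3), ('AP1', 1), ('SP1', 1)]
```

Requiring agreement between two independent prediction tools trades sensitivity for specificity, which suits a search for a small number of master-regulator candidates.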

Lentiviral Vector and Cell Lines Used
The lentiviral vector carrying NFIX shRNA plasmids was purchased from Thermo Scientific Open Biosystems (BD Biosciences, United States). Plasmids were cultured in Luria broth with ampicillin (AMRESCO, United States), and DNA was extracted for transient transfection using a Qiagen purification kit (Germany). Two human lung cancer cell lines, A549 and NCI-H1299, were purchased from the American Type Culture Collection (United States). A549 cells, derived from a human lung carcinoma, were maintained in Kaighn's modification of Ham's F-12 (F-12K) medium (Thermo Fisher Scientific, United States). NCI-H1299 cells, also derived from a human lung carcinoma (metastatic site: lymph node), were maintained in Roswell Park Memorial Institute 1640 (RPMI 1640) medium (Thermo Fisher Scientific, United States). Both media were supplemented with 10% fetal bovine serum (Thermo Fisher Scientific, United States).

Transient Transfection of NFIX shRNA
A total of 4 × 10⁴ cells were cultured in serum-free, antibiotic-free medium in a 6-well culture plate until 60-70% confluent. One microgram of shRNA DNA was transiently transfected into the cells using TurboFECT (Thermo Fisher Scientific, United States). Three NFIX shRNAs, a GAPDH shRNA (endogenous control) and a non-silencing plasmid were used to study the effects of NFIX loss of function. Transfection efficiency was measured using the fluorescence imaging software NIS-Elements (Nikon Instruments Inc., United States) and was standardized across all experiments. The experiment was performed in triplicate.

Cell Migration and Cell Invasion Assays
The QCM™ Chemotaxis (3 µm) migration kit (Chemicon, United States) and the QCM™ 24-well Fluorimetric Cell Invasion assay kit (Millipore, United States) were used to measure the migration and invasion activities of the transfected cell lines. The ECMatrix™ membrane in the invasion kit allows only invasive cells to migrate through to the underside of the insert filter and into the bottom well. In total, 1 × 10⁶ cells were harvested in 1 ml serum-free medium, and 250 µl of the cell suspension was pipetted into each insert in triplicate for each treatment. About 400 and 500 µl of complete culture medium were added to the bottom wells before the inserts were placed in them. After 48 h incubation, the migrated and invaded cells were detached using cell detachment solution and stained with CyQUANT® GR dye diluted 1:75 in 4X Lysis Buffer. The migrated and invaded cells at the bottom of the wells were quantified using a Varioskan Flash (Thermo Fisher Scientific, United States) at 480/520 nm (excitation/emission). The experiment was conducted in triplicate.
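The text reports p-values for the reductions in migration and invasion but does not state how the fluorescence readings were normalised or which test was used. The following is a minimal sketch, assuming blank-corrected relative fluorescence units (RFU) are expressed as a percentage of the non-silencing control and compared with Welch's two-sample t-test. The RFU values and function names are hypothetical. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def percent_of_control(treated: Sequence[float], control: Sequence[float],
                       blank: float = 0.0) -> float:
    """Mean blank-corrected RFU of treated wells as a percentage of control wells."""
    t = mean(x - blank for x in treated)
    c = mean(x - blank for x in control)
    return 100.0 * t / c


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical triplicate RFU readings at 480/520 nm.
    non_silencing = [1000.0, 1100.0, 900.0]
    nfix_shrna = [400.0, 500.0, 600.0]
    print(percent_of_control(nfix_shrna, non_silencing))  # 50.0
    print(welch_t(nfix_shrna, non_silencing))
```

Welch's variant is chosen here because it does not assume equal variances between the silenced and control wells; with only three replicates per group, that assumption is hard to check.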

Gene Expression Microarray
Transfected cells were collected, and RNA was extracted using NucleoSpin® RNA II (Macherey-Nagel, Germany) and diluted with RNase-free water to a final concentration of 150 ng/µl. RNA concentration and purity were measured using a NanoDrop (Thermo Fisher Scientific, United States) and a Bioanalyzer 2100 with the RNA 6000 kit (Agilent Technologies, United States). An Illumina hybridisation assay system was used to process the BeadChip. Cy3-streptavidin (Thermo Fisher Scientific, United States) was applied to the hybridized BeadChip to bind the biotin label on the analytical probes. The iScan BeadChip imaging system was used to scan the BeadChip. GenomeStudio, Partek and R software were used to analyze the microarray data.

In Silico Analyses for Identifying Master Regulator and the Candidate Genes
Nine datasets were obtained from the Oncomine database, and 3561 genes were selected at p-value < 0.01 after filtering for lung adenocarcinoma, 5-year survival rate and stage I and II pathological subtypes. Early-stage rather than late-stage cancers were used to find commonly occurring genes, because we reasoned that these genes aid metastasis at the micrometastasis stage. The FunDO results were presented as two data sets: one listing genes involved in lung cancer and the other listing genes involved in various cancers, such as colorectal, breast, embryoma and prostate cancers. Tables 1 and 2 show the genes involved in lung cancer and in various cancers, respectively. Figure 1A shows the network involved in various diseases, derived from 549 up-regulated and 674 down-regulated genes. Based on the gene network generated automatically by FunDO, which uses the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database, 18 genes are involved in lung cancer regulation and 88 genes are shared by various cancers. All 106 genes were enriched and classified by the selected ontology terms using DAVID and Pathway Studio, and all genes involved in the migration pathway were selected as the final candidate genes. Figure 1B shows all candidate transcription regulators, and Table 3 lists the final candidate genes commonly involved in lung cancer, with their direction of regulation in lung cancer: ALOX15B, CALCA, CEACAM1, DLC1, ICAM1, IL6ST, ITGB1, RET, S100P, VEGFA, BCL2, IGF1R, IGFBP5, ITGA6, NRP2, PDGFB and PDPN. … and NF1A/NFIX. We selected nuclear factor I X (NFIX) as the candidate master regulator because it belongs to a family of closely related TFs that constitutively bind as dimers to specific DNA sequences with high affinity. We chose NFIX over other TFs because of its higher occurrence and because NFIX is less well studied, particularly in lung cancer.
Furthermore, NFIX binding sites occur frequently in the 10 kb upstream promoter sequences of the candidate genes. We constructed a network consisting of all known regulated genes and the predicted TFs using Pathway Studio and String 2.0; however, we selected only one direct network, based on the in silico prediction results, as shown in Figure 2. The signaling network of IL6ST was evaluated, and TIMP1 was identified as a gene co-expressed with it in the migration program. In addition, ITGB1 was selected as the final candidate gene to complete the network because it is involved in the migration and invasion pathways, as determined in the previous analyses.
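The DAVID enrichment step described in this subsection asks whether an ontology term is over-represented in the 106-gene list relative to a background gene population. The following is a minimal sketch of the one-sided hypergeometric (Fisher's exact) tail probability that underlies such tests. DAVID itself reports the EASE score, a more conservative modification of Fisher's exact test, so this sketch will not reproduce DAVID's values. The function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background population
    K: background genes annotated to the term
    n: genes in the submitted list
    k: list genes annotated to the term
    """
    upper = min(n, K)
    if k > upper:
        return 0.0
    # Exact integer arithmetic; the final division yields a float in [0, 1].
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical: 12 of 106 list genes fall in a term that covers 300 of 20,000 background genes.
    print(enrichment_p(k=12, n=106, K=300, N=20000))
```

Because many terms are tested at once, such p-values are normally corrected for multiple testing before terms are reported.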

Silencing of NFIX Decreased Cellular Migration and Invasion
Transfection of all NFIX shRNAs significantly reduced migration and invasion in both cell lines at 48 h post-transfection (Figure 5). NFIX_1 shRNA produced the most significant reduction in both cell lines after 48 h (A549: p-value = 7.9 × 10⁻³; NCI-H1299: p-value = 7.3 × 10⁻³). This observation suggests that NFIX may play an important role in lung cancer metastasis.

Microarray Analysis of Master Regulator and Regulatory Network in Lung Cancer
Gene expression microarray analyses were carried out to find genes differentially expressed after NFIX knockdown by comparing each cell line with its respective control at p-value < 0.05. Figure 6A depicts the heatmap of the top 50 genes, including NFIX, IL6ST, TIMP1 and ITGB1, for both cell lines at 48 h post-transfection. All differentially expressed genes from both cell lines were integrated before the statistical analyses, and Figure 6B shows the integrated results from all statistical software used. Table 5 lists the 18 genes identified via the integrated analyses, with their fold changes. All 18 genes were mapped onto pathways to determine their involvement in lung cancer metastasis and how they are regulated by NFIX.
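The integration of t-test, LIMMA and ANOVA results with Benjamini-Hochberg (BH) adjustment can be sketched as follows. The text does not spell out the combination rule, so this sketch assumes that a gene is kept only when its BH-adjusted p-value is below 0.05 in every test and in both cell lines. The analysis labels, p-values and function names are hypothetical. In R, `p.adjust(p, method = "BH")` and limma's `topTable(..., adjust.method = "BH")` perform the same adjustment.

```python
from typing import Dict, List, Set


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def consensus(results: Dict[str, Dict[str, float]], alpha: float = 0.05) -> Set[str]:
    """Genes with BH-adjusted p < alpha in every analysis.

    `results` maps a label such as "A549/limma" to {gene: raw p-value}.
    """
    sig_sets = []
    for gene_p in results.values():
        genes = list(gene_p)
        adj = benjamini_hochberg([gene_p[g] for g in genes])
        sig_sets.append({g for g, q in zip(genes, adj) if q < alpha})
    return set.intersection(*sig_sets) if sig_sets else set()


if __name__ == "__main__":
    # Hypothetical raw p-values for illustration only.
    results = {
        "A549/t-test": {"NFIX": 0.001, "IL6ST": 0.002, "GENE_X": 0.5},
        "NCI-H1299/t-test": {"NFIX": 0.001, "IL6ST": 0.2, "GENE_X": 0.6},
    }
    print(sorted(consensus(results)))  # ['NFIX']
```

Taking the intersection across analyses is a strict rule: it favours genes that are robust to the choice of test, at the cost of discarding genes that reach significance in only one cell line or method.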

Construction of a New Regulatory Pathway from Genes Identified in Microarray Analyses
The regulated genes ITGB1, TIMP1 and IL6ST were included in the pathway, together with 17 genes identified in the gene expression microarray, using GeneMANIA. The pathway containing the most regulated genes in lung cancer metastasis was selected as the new regulatory network, following the approach described for Figure 2. Figure 7A shows several new pathways created from genes identified in the meta-analysis and the microarray analyses. One direct network was selected, and the functions of its genes were then determined using GeneCards. NFIX, the master regulator of ITGB1, TIMP1 and IL6ST, and three new entities, PTCH1, NFAT5 and GGCX, were included in this new transcriptional network, as shown in Figure 7B.

DISCUSSION
Understanding tumor progression and the TFs involved is critical for identifying new biomarkers, developing novel therapeutics, predicting patient prognosis and designing advanced cancer detection tools (Gomez, 2016). Identifying gene signatures of metastasis is crucial for preventing tumor cells from metastasizing and thereby improving lung cancer prognosis and survival. Such target genes could be used in future therapies to inhibit metastasis in lung cancer. Identifying the TFs that regulate gene expression is essential for understanding the whole transcription process, particularly for genes involved in metastasis. In this study, we identified 17 genes involved in the migration pathway using DAVID and Pathway Studio. The 10 kb upstream promoter sequences of these 17 genes were then retrieved and scanned, and we postulated that NFIX is a master regulator of lung cancer metastasis. We provide evidence that NFIX regulates the IL6ST, TIMP1 and ITGB1 genes. Together, these genes promote cell migration and invasion, which lead to lung cancer aggressiveness, poor survival and high mortality. However, further studies are needed to understand the molecular mechanism by which NFIX regulates these genes and the functional pathways of metastasis. In addition to NFIX, AP1 binding sites also occurred frequently in the 10 kb upstream promoter sequences. AP1 is well studied in the migration pathway of breast cancer via the extracellular signal-regulated kinase pathway (Chen et al., 2009), and it plays an important role in regulating endothelin-1 (ET1) gene transcription in endothelial cells (Kawana et al., 1995). The TF SP1 was also identified among the TFs with the highest occurrences; it is highly expressed during tumor progression in gastric cancer. Blomquist et al. (1999) showed that NFIX binds specific DNA sequences as a dimer with high affinity; this property could increase the specificity of molecular recognition and aid targeted therapy in lung cancer.
[Figure caption: T-test, LIMMA and ANOVA were used to identify differentially expressed genes between NFIX-silenced and non-silenced A549 and NCI-H1299 cells. In total, 18 genes were identified as differentially expressed in both cell lines (* Benjamini-Hochberg-adjusted p-value < 0.05).]
Silencing of NFIX reduced the expression of its regulated genes IL6ST, TIMP1 and ITGB1. Necula et al. (2012) showed that increased levels of IL6ST in tissues (>758 pg/ml) and plasma (>38.21 pg/ml) are associated with shorter survival in gastric adenocarcinoma. IL6 regulates tumorigenesis by activating genes involved in differentiation, survival, apoptosis and proliferation (Lin et al., 2007). In addition, IL6 may promote inflammation by inducing antiapoptotic signals mediated by STAT3 (Necula et al., 2012). Lissat et al. (2015) showed that high IL6ST expression aids Ewing sarcoma progression, renders the tumor resistant to apoptosis and promotes metastasis, suggesting that IL6ST plays a role in cancer metastasis. TIMP1 promotes melanoma through activation of the AKT pathway independently of PI3K signaling (Toricelli et al., 2013). In human melanoma cell lines, TIMP1 interacts with ITGB1 and CD63, assisting melanomagenesis and resistance to cell death (Toricelli et al., 2013).
A previous in vitro study of lung adenocarcinoma demonstrated an association between ITGB1 and migration, invasion and wound healing. The authors reported that osteopontin, LAMB3 and ITGB1 are pro-metastatic genes in lung cancer. Osteopontin also increases vascular endothelial cell migration, proliferation, angiogenesis and tumor growth in lung cancer (Cui et al., 2007; Fong et al., 2009). ITGB1 levels are higher in lung cancer with lymphatic metastasis than in lung cancer without lymphatic metastasis. Our in vitro experiments showed similar results: cell proliferation, migration and invasion were significantly reduced after NFIX knockdown. These in vitro results agree with the in silico predictions that NFIX regulates IL6ST, TIMP1 and ITGB1 expression. Microarray analyses were carried out to discover genes differentially expressed in A549 and NCI-H1299 cells after NFIX silencing compared with non-silenced cells. In A549 cells, silencing with NFIX_3 shRNA did not differ from silencing with the other shRNAs. We suggest that the NFIX_1 shRNA sequence is the most effective at silencing the master regulator NFIX, especially in the aggressive NCI-H1299 cell line. NFAT5 was identified as a new entity in this network; it is a crucial component in tumor development and progression, particularly in regulating inflammation during carcinogenesis (Neuhofer, 2010; Yoon et al., 2011). PTCH1 serves as a tumor suppressor gene, as demonstrated in NSCLC cell lines (Shikata et al., 2011), and Li et al. (2012) showed that PTCH1 interacts with the Hedgehog (Hh) signaling pathway through the effect of miRNA-212 on cell proliferation. Nevertheless, our microarray data showed that PTCH1 is co-expressed with NFIX; this co-expression needs to be investigated in future studies. The GGCX gene, which encodes the enzyme responsible for the post-translational modification of vitamin K-dependent proteins in haemostasis (Stafford, 2005; Weston and Monahan, 2008), was also found in our newly constructed transcriptional regulatory network. Our in silico and in vitro findings suggest that NFIX may be a potential biomarker and a promising target for inhibiting the metastatic process in NSCLC.
[Figure 7 caption, panel (B): Successful construction of a new transcriptional network from both meta-analysis and microarray analyses. NFIX was identified as a master regulator that regulates the ITGB1, TIMP1 and IL6ST genes. Three new entities, PTCH1, NFAT5 and GGCX, were also included in this metastasis regulatory network for lung cancer. NFIX is the master regulator; IL6ST, TIMP1 and ITGB1 are the NFIX-regulated genes; and PTCH1, NFAT5 and GGCX are genes involved in the NFIX transcriptional regulatory network.]

CONCLUSION
In silico analyses identified NFIX as a predicted master regulator. Based on the in silico findings, functional assays and microarray analysis, a new transcriptional network involving the inflammation, migration and invasion pathways specific to metastatic lung cancer was created. We acknowledge that reliance on in silico results is a limitation of our study: the same analyses may give different results as databases incorporate new discoveries and as software and algorithms are updated. We used several in silico analyses to strengthen and support our findings. We also tested all NFIX lentiviral shRNAs, each with a different sequence, to identify the shRNA that most effectively reduced the expression of the master regulator and the gene functions that lead to the metastatic phenotype. We suggest that NFIX is a master regulator of metastasis in lung cancer that drives the transcription of genes activating pathways such as inflammation, proliferation, migration and invasion, which lead to micrometastasis and ultimately cause cancer cells to metastasize. Silencing NFIX reduced cellular proliferation, migration and invasion in vitro, especially in the aggressive cell line. Hence, NFIX together with IL6ST, TIMP1, ITGB1, PTCH1, GGCX and NFAT5 may regulate the migration and invasion process, and these genes could serve as potential therapeutic targets and survival predictors in patients with lung cancer, with the aim of improving prognosis. We could not carry out in vivo or protein-level experiments owing to a lack of funding. More detailed functional studies, in vivo experiments and clinical validation are therefore needed to confirm this concept and strengthen the reliability of these findings.

AUTHOR CONTRIBUTIONS
This study was planned by RH. All experiments described in the manuscript, including data interpretation and statistical analyses, were carried out by NIAR. MMM participated in the microarray data analysis. RH and NAAM supervised the student and assisted with the study and manuscript writing. The manuscript was written by NIAR with comments from all co-authors, mostly from NAAM, RH and RJ. All co-authors read and approved the final version of the manuscript.