Integrated Analysis of Distant Metastasis-Associated Genes and Potential Drugs in Colon Adenocarcinoma

Background: Most colon adenocarcinoma (COAD) patients die of distant metastasis, though there are some therapies for metastatic COAD. However, the genes exclusively expressed in metastatic COAD remain unclear. This study aims to identify prognosis-related genes associated with distant metastasis and develop therapeutic strategies for COAD patients. Methods: Transcriptomic data from The Cancer Genome Atlas (TCGA; n = 514) cohort were analyzed as a discovery dataset. Next, the data from the GEPIA database and PROGgeneV2 database were used to validate our analysis. Key genes were identified based on the differential miRNA and mRNA expression with respect to M stage. The potential drugs targeting candidate differentially expressed genes (DEGs) were also investigated. Results: A total of 127 significantly DEGs in patients with distant metastasis compared with patients without distant metastasis were identified. Then, four prognosis-related genes (LEP, DLX2, CLSTN2, and REG3A) were selected based on clustering analysis and survival analysis. Finally, three compounds targeting the candidate DEGs, including ajmaline, TTNPB, and dydrogesterone, were predicted to be potential drugs for COAD. Conclusions: This study revealed that distant metastasis in COAD is associated with a specific group of genes, and three existing drugs may suppress the distant metastasis of COAD.


INTRODUCTION
Distant metastasis accounts for a major proportion of cancer-related mortality, including colon carcinoma, which makes up to 10% of the global burden of cancer (1). Colon adenocarcinoma (COAD) is the most common histological subtype of colon carcinoma and has a poor prognosis (2). To date, metastatic colon carcinoma is uncurable in most cases, though there are some treatment options for metastatic disease, such as fluoropyrimidine-based chemotherapy, anti-VEGF agents, anti-EGFR drugs, and immunotherapy (3). Previous studies of COAD focused on the prognostic models and regulators of metastasis (4,5). However, the genes exclusively expressed in metastatic COAD have yet to provide novel biomarkers and drug targets for use in the clinic.
Differences in the gene expression spectrum between primary and metastatic COAD, which are important for finding cancer biomarkers, have not been thoroughly evaluated. Recent advances in colon carcinoma highlight the urgent need for molecular markers and therapeutic targets in the management of metastatic cancer (6,7). Here, we screened the differentially expressed gene (DEG) profiles of primary tumor from COAD patients with distant metastasis (M1) compared with those of patients without distant metastasis (M0) in The Cancer Genome Atlas (TCGA) database. Finally, we identified and validated four prognosis-related genes as well as three potential drugs for COAD based on bioinformatics analysis.

The TCGA Cohort
The RNA-Seq gene expression profiles (Counts format) of COAD patients were downloaded from the TCGA database (https:// portal.gdc.cancer.gov/) in March 2020. Clinical data, such as gender, age, tumor grade, clinical stage, and survival time, were also downloaded from the TCGA portal. Next, according to the American Joint Committee on Cancer (AJCC) staging system, we enrolled COAD samples that pathologic M Stage were M0 (n = 334) and M1 (n = 62), while COAD samples that pathologic M Stage were MX or missing were excluded. Then, R software was used for data extraction and sorting to obtain the gene expression matrices and clinical data.

Screening the Differentially Expressed miRNAs (DEmiRNAs) and DEGs
The miRNA differential expression analysis was conducted using the "edgeR" package in Bioconductor (https://bioconductor. org/). The related codes were run in R, and the DEmiRNAs and DEGs in the primary tumors of patients with distant metastasis (M1) compared with those of patients without distant metastasis (M0) were analyzed through the "edgeR" package. P < 0.05 and |fold change (FC)| > 1 were set as the thresholds for identifying DE-miRNAs and DEGs. The package "pheatmap" was used for clustering and heat map construction.

Predicting the Target Genes of DEmiRNAs
The target genes of the DEmiRNAs were identified using mirDIP v4.1 (http://ophid.utoronto.ca/mirDIP/index.jsp), an integrative database containing nearly 152 million human microRNA-target predictions, which were collected across 30 different resources (8). Then, the common DEGs identified by the above two methods were presented as Venn diagrams and identified as candidate DEGs.

Protein-Protein Interaction (PPI) Network and miRNA-Gene Network Construction
The PPI network and miRNA-gene network were successively constructed. The candidate DEGs were first mapped to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (http://string-db.org) to assess the functional associations among the target genes (9). Only interactions with a combined score of > 0.4 were considered significant. Next, to obtain the hub genes, the degree of connectivity in the PPI network was analyzed by Cytoscape software (version 3.6.0) (10), after which the top 30 hub genes were screened and their PPI subnetwork was established.
To establish the regulatory network of DEmiRNAs and hub genes, first, the upregulated DEmiRNAs and the downregulated hub genes were mapped based on the mirDIP database. In the same way, the downregulated DEmiRNAs and the upregulated hub genes were mapped. Based on this, two miRNA-hub gene regulatory networks were constructed using Cytoscape software (version 3.6.0).

Screening Prognosis-Related Genes
The Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia2.cancer-pku.cn/), an interactive web server for analyzing the RNA-Seq expression data from the TCGA and GTEx projects, was used to analyze the prognostic significance of the screened candidate DEGs in COAD (11). In addition, the PROGgeneV2 database (http://www.compbio. iupui.edu/proggene), a web application for analyzing the prognostic implications of mRNA biomarkers in 18 cancers that combines public repositories such as Gene Expression Omnibus (GEO), European Bioinformatics Institute (EBI) ArrayExpress, and TCGA (12), was used to study the prognostic relevance of the key genes associated with distant metastasis. Furthermore, the package "survival" package was used to plot the Kaplan-Meier survival curves of the TCGA-COAD cohort. In survival analysis, patients were divided into low-expression and high-expression groups based on the median value of the corresponding genes expression.
Next, the Comparative Toxicogenomics Database (CTD) (http://ctdbase.org/) was used to find integrated chemicalgene, chemical-disease, and gene-disease interactions to generate expanded networks and predict novel associations (16). Here, relationships between prognosis-related genes and diseases were investigated.

Survival Curve and Risk Score
After prognosis-related genes were detected, we used step multivariate Cox regression analysis in the "survival" package to build a gene signature panel. Patients were divided into highand low-risk groups using the median risk score, and Kaplan-Meier curves were used to explore overall survival (OS) of the two groups using the "survminer" package. The survival receiver operating characteristic curve (ROC) was drawn and the area under the curve (AUC) was calculated for evaluating prognostic value.

Prediction of Novel Drugs for COAD
The Connectivity Map (CMap) (https://portals.broadinstitute. org/cmap/) (17) is a platform for finding connections among small molecules that share the same mechanism of action and associations with chemicals and physiological processes, diseases, and drugs. In the present study, the candidate DEGs were divided into two groups: up-and downregulated genes. The upregulated and downregulated genes were subsequently uploaded to query CMap, and searches were made for small-molecule compounds. Scores ranged from −1 to 1, representing the relationship between the drugs and input genes, and drugs with a score of ≤−0.75 (p < 0.05, percent non-null > 50) were considered candidate drugs.

ONCOMINE Analysis
ONCOMINE gene expression array datasets (https://www. oncomine.org/), an online cancer microarray database containing 715 gene expression data sets and data from 36 86,733 cancerous and normal tissues, was used to analyze the transcription levels of the four identified genes in colorectal cancer. The mRNA expressions of LEP, DLX2, CLSTN2, and REG3A in colorectal cancer specimens were compared with that in normal controls, using a Student's t-test to generate a p-value.

Tumor Immune Estimation Resource (TIMER) Database Analysis
The Tumor Immune Estimation Resource (TIMER) (18) is an integrative web server for evaluating tumor-infiltrating immune cells across diverse cancer types (https://cistrome.shinyapps. io/timer/). The TIMER includes more than 10,000 samples across multiple cancer types of the TCGA. It applies a partial deconvolution linear least square regression method to calculate the abundance of immune infiltrates. We investigated the correlations between the identified genes expression and immune infiltrates in tumors, including CD8+ T cells, CD4+ T cells, dendritic cells, neutrophils, B cells, and macrophages. The Correlation module was used to draw the expression scatterplots comparing a pair of user-defined genes in COAD (n = 457), together with the Spearman's correlation and estimated statistical significance. The identified genes were represented on the yaxis with gene symbols, whereas the metastasis-related immune markers were placed on the x-axis. The gene expression levels were determined with log2 RSEM. Options used for partial correlation conditioned on tumor purity were also used.

RESULTS
To explore the prognostic markers specifically expressed in metastatic COAD and novel therapeutic strategies, the present study was designed and analyzed according to the flow chart shown in Figure 1. The patients' clinicopathological characteristics of are listed in Table 1. The specific analysis processes are shown in detail in the following figures and tables.

DEmiRNAs and Their Target Genes
To screen DEmiRNAs from the miRNA-Seq dataset of the TCGA cohort, we conducted a differential expression analysis using the "edgeR" software package. In this cohort, 62/379 (16.4%) of the DEmiRNAs were from patients with stage M1, and 317/379 (83.6%) were from patients with stage M0. Then, 14 DEmiRNAs with a critical point of |log2FC > 1| and p < 0.05, were obtained, including 8 upregulated DEmiRNAs and 6 downregulated DEmiRNAs ( Table 2). The heat map and volcano plot are shown in Figure 2. Additionally, 8,918 potential target genes for the 8 upregulated miRNAs and 5,354 potential target genes for the 6 downregulated miRNAs were determined by using the mirDIP database.

Identification of DEGs
To acquire DEGs from the mRNA-Seq dataset of the TCGA cohort, we conducted a differential expression analysis using the "edgeR" software package. In this cohort, 61/391 (15.6%) of the DEGs were from patients with stage M1, and 330/391 (84.4%) were from patients with stage M0. Then, 1,071 DEGs with a critical point of |log2FC > 1| and p < 0.05 were identified. Among these DEGs, 709 DEGs were upregulated and 362 DEGs were downregulated in the metastatic group (Supplementary Table 1). The heat map and volcano plot are displayed in Figure 3. The overlap between the targets of the DEmiRNAs and the DEGs was determined, revealing 127 shared genes ( Figure 4A). We focused our analysis on these 127 genes, which were defined as candidate DEGs (Supplementary Table 2).

Functional Annotation and Pathway Enrichment Analysis
To analyze the enriched pathways of the DEmiRNAs, we carried out KEGG pathway enrichment analysis using the DIANA bioinformatic tool. The 14 DEmiRNAs were significantly enriched in cancer and metastasis-related pathways, such as the cell cycle, adherens junction, and p53 signaling pathway ( Figure 4B).
Moreover, pathway and process enrichment analyses were conducted based on the 127 candidate DEGs. The top 12 involved clusters were "NABA matrisome associated, " "forebrain neuron differentiation, " "cell-cell adhesion via plasmamembrane adhesion molecules, " "skin development, " "blood circulation, " "peptide cross-linking, " "neuronal system, " "forelimb morphogenesis, " "acute-phase response, " "hormone metabolic process, " "glucuronidation, " and "bile secretion" (Figure 4C). Specifically, among these candidate DEGs, 10 genes participated in cell-cell adhesion via the plasma membrane adhesion molecule biological process, including DSG1, CDH9, SLITRK1, CLDN19, PCDHA9, PCDH11Y, PCDHGB7, CLSTN2, REG3A, and CLDN18, which were considered potential metastasisrelated DEGs. FIGURE 1 | Schematic diagram of the study design. The TCGA-COAD cohort was used to identify the differentially expressed miRNAs (DEmiRNAs) and genes (DEGs) between primary COAD patients and metastatic COAD patients. In brief, 14 miRNAs and 1,071 genes were identified as DEmiRNAs and DEGs, respectively. Subsequently, 14,272 genes were identified as targets of the 14 DEmiRNAs. The 14,272 targets and 1,071 DEGs were overlapped. Next, the common gene set consisting of 127 genes were analyzed. As a result, four candidate genes associated with prognosis were obtained by combining the top 30 hub genes with genes involved in metastasis-related processes. Additionally, three potential small-molecule drugs for COAD were predicted using the CMap database and further validated in CTD.

Determination of Key Genes Correlated With Prognosis in COAD
To screen the prognostic genes, Kaplan-Meier analysis was performed based on the expression levels of the hub genes in the TCGA-COAD cohort. Among the top 30 hub genes, we found that two genes (LEP and DLX2) were significantly associated with overall survival (OS), and these genes were considered risky genes, with hazard ratio (HR) > 1 (p < 0.05; Figures 6A,B). Furthermore, for patients with stage M0 disease, higher levels of LEP and DLX2 were markedly associated with worse OS (p < 0.05; Figures 6C,D). However, for patients with stage M1, the expression of LEP had no significant effect on OS (Figure 6E), while higher expression of DLX2 was significantly correlated with worse OS (p < 0.05; Figure 6F). We validated the prognostic values of LEP and DLX2 using the GEPIA database. The results showed that COAD patients with higher LEP expression had worse OS and disease-free survival (DFS) (p < 0.05; Figures 7A,B). In addition, COAD patients with higher DLX2 expression had worse OS (p < 0.05; Figure 7C), but DLX2 expression had no significant effect on DFS ( Figure 7D).
Subsequently, we further investigated the prognostic roles of the above 10 metastasis-related DEGs in colorectal cancer patients. Higher expression of CLSTN2 was markedly correlated with worse OS and metastasis-free survival (MFS) (p < 0.05; Figures 8A,B). In contrast, higher expression of REG3A was  significantly associated with better OS and MFS (p < 0.05; Figures 8C,D).

Risk Score and Survival Analysis
Next, we compared the performance of identified individual genes with the combined gene signature panel. First, we used step multivariate Cox regression analysis to establish a gene signature with prognostic value for COAD. According to the results, the risk scoring system for each patient was calculated as follows: risk score = LEP * 0.134582 + REG3A * (−0.05402) + DLX2 * 0.211469, after which the patients were classified into a low-risk (n = 178) or a high-risk (n = 178) group based on the median risk score. As shown in Supplementary Figure 1A, patients in high-risk group had significantly poorer survival compared with those in low-risk group (log-rank test, p < 0.05). Subsequently, to evaluate the prognostic value of risk score, we drew the ROC curve at 5 years of OS and calculated the AUC value. From Supplementary Figure 1B, the 5 year AUC was 0.672 (AUC > 0.65), which revealed superior predictive accuracy in OS (19).    Second, for the individual genes, the correlations between their expression levels and prognosis are shown in Figure 6. Similarly, the ROC curves at 5 years of OS were plotted based on the risk score of each sample. Additionally, the 5 year AUC of LEP, DLX2, CLSTN2, and REG3A were 0.638, 0.622, 0.5, and 0.62, respectively (AUC < 0.65), indicating that the combined gene signature panel have better performance on evaluating the prognosis of COAD patients. To learn more about the four genes, we analyzed their GO term enrichment related to biological processes, molecular functions, and cellular components (Supplementary Table 3). Notably, both LEP and REG3A were involved in the regulation of the extracellular space. DLX2 participated in modulating DNA binding-related processes, transcription, and cell differentiation. CLSTN2 played a role in homophilic cell adhesion via plasma membrane adhesion molecules. Subsequently, the CTD was used to investigate the diseases related to these genes, and the results showed that all four genes are relevant to colonic neoplasms (Supplementary Table 4).

The Four Prognosis-Related Genes Expressions Are Correlated With Immune Infiltration Level and Immune Markers in COAD
As tumor microenvironment, especially immune cells play an important role in metastasis (20,21), we explored whether the four genes expressions are correlated with immune cells and immune-related markers in COAD.
As regard to metastasis-related immune markers ( Table 3). After the correlation adjustment by purity, the results revealed that the LEP expression level was significantly positively correlated with 18 immune markers among 22 metastasisrelated markers. DLX2 and CLSTN2 expression levels were significantly positively correlated with 11 and 16 immune markers, respectively. However, among 22 metastasis-related immune markers, REG3A expression level was significantly positively correlated with two immune markers and significantly negatively correlated with three markers in COAD.

Expression of Four Prognosis-Related Genes in Colorectal Cancer
To better understand the potential of these prognosis-related genes as drug targets, we analyzed their expression levels in normal tissues and colorectal cancer tissues using ONCOMINE 4.5 database. The results revealed that mRNA expression levels of these four genes were significantly higher in colorectal cancer tissues than in normal tissues (p < 0.01). Although the fold differences of LEP, DLX2, and CLSTN2 were within 2, they ranked within the top 20% based on mRNA expression (Overexpression Gene Rank (LEP): in top 20%; Over-expression Gene Rank (DLX2): in top 11%; Over-expression Gene Rank (CLSTN2): in top 14%; (Supplementary Figures 2A-C). In addition, REG3A expression level was significantly higher in colorectal cancer tissues than in normal colon tissues (Supplementary Figure 2D).

Prediction of Potential Therapeutic Drugs for COAD
Given the strong association between distant metastasis and the survival of cancer patients, we postulated that drugs affecting the DEGs in metastatic patients may play an antitumor role in cancer treatment. Among the 127 candidate DEGs, 49 upregulated and 27 downregulated genes were selected as input files for the CMap pilot dataset. As a result, four compounds (Mean ≤−0.40, p < 0.05, percent non-null > 50) were considered candidate drugs: ajmaline, a class IA antiarrhythmic drug used diagnostically (not used therapeutically) to elicit ECG changes in patients suspected to have Brugada syndrome; TTNPB, a synthetic retinoid that acts as a selective agonist for retinoic acid receptors (RARs); dydrogesterone, a progestogen drug used to treat menstrual and premenstrual disorders, endometriosis, infertility and other conditions; and dicycloverine, also known as dicyclomine, an anticholinergic drug used for treating irritable bowel syndrome (IBS) ( Table 4). Then, the relationships between the four compounds and cancers were investigated using CTD, and the results revealed that ajmaline, TTNPB, and dydrogesterone could target colonic neoplasms (Supplementary Table 5).

DISCUSSION
In this study, we identified four novel distant metastasisrelated genes significantly associated with the survival of COAD patients. In brief, we screened 127 DEGs by comparing the gene expression profiles of primary tumors from metastatic patients with those from primary patients. Most previous studies focused on the differentially expressed regulators between tumor tissue and normal tissue (22)(23)(24). However, in the present study, we analyzed DEmiRNAs and DEGs between non-distant metastatic and distant metastatic COAD patients. The overlap between distant metastasis-related genes and prognostic factors represents an important group of potential biomarkers and therapeutic targets. Moreover, the expression levels of LEP, DLX2, CLSTN2, and REG3A were significantly higher in tumor tissues than normal tissues. Based on this, we determined four small-molecule drugs (ajmaline, TTNPB, dydrogesterone, and dicycloverine) that potentially target these genes according to the data from CMap. As the CMap results are based on small-molecule perturbations profiled across a large number of cell types (17), we further validated the relationships between the four candidate drugs and colon carcinoma in CTD. Among the four drugs, the results suggested that three drugs, including ajmaline, TTNPB, and dydrogesterone, could treat colon carcinoma.
Notably, compared to four individual prognosis-associated genes, the combined gene signature panel had better discrimination performance in COAD patients. LEP was reported to promote cancer cell migration and invasion (25,26). DLX2 was indicated to increase the risk of metastasis in prostate cancer patients with a high expression of Ki67 (27). In addition, CLSTN2 was demonstrated to be involved in both the tumorigenesis and pulmonary metastasis of osteosarcoma (28).  Interestingly, REG3A was found to promote cell proliferation in gastric and colorectal cancers (29,30). However, the present study found that REG3A correlates with favorable survival in colorectal cancer, and is down-regulated in primary tumors of patients with distant metastasis (M1) compared to patients without distant metastasis (M0). Consistent with this result, Qiu et al. reported that REG3A was a tumor suppressor in gastric cancer, while REG3A overexpression could inhibit the invasion and proliferation and promote the apoptosis of gastric cancer cells via the phosphatidylinositol 3-kinase (PI3K)/Akt-GSK3 signaling pathway axis (31). From this perspective, REG3A might promote tumor cells proliferation, but suppress metastatic cascade indirectly. The understanding of the roles of these four genes in the metastasis of colorectal cancer may be improved by analyzing lager cohorts and performing in vitro and in vivo experiments. In general, tumor development could be controlled by cytotoxic innate and adaptive immune cells. However, cancer cells evolve different mechanisms that mimic peripheral immune tolerance to avoid tumoricidal attack. Furthermore, increasing evidence reveals that tumor-infiltrating immune cells could promote the metastatic cascade thus affecting clinical outcome (32,33). In this study, we found that expression of LEP, DLX2, CLSTN2, and REG3A are correlated with immune infiltrating levels of CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and DCs in COAD, indicating that these four prognosisassociated genes expression levels could also reflect immune status, which partially explains that LEP, DLX2 and CLSTN2 are predictors of poor outcome while REG3A indicates better outcome and lower probability of distant metastasis in COAD.
To develop more treatment options for COAD, we found that three existing drugs may be used for treating COAD. First, ajmaline, an indole alkaloid commonly used for hypertension, was found to inhibit DNA synthesis by arresting cells at the G2 phase and promoting apoptosis in prostate cancer cells (34). As a voltage-gated Na+ channels (VGSCs) blocker, ajmaline preferentially binds to the open state of cardiac Na+ channel protein (Nav1.5) (35). In recent years, VGSCs have been recognized as the invasiveness-associated proteins, and they are exclusively highly expressed in aggressive tumor biopsies and metastatic cancer cells (36)(37)(38). Specifically, Nav1.5, a neonatal splice variant of VGSCs, has been found to promote the breast cancer cell invasion in vitro and metastasis in vivo (39). Thus, VGSCs are a class of promising anti-metastasis drug targets (40), especially Nav1.5, which is a neonatal isoform not commonly expressed in normal adult tissues (41,42). There are numerous clinically used drugs that target Nav1.5, but some undesirable effects such as local anesthetics, antiarrhythmics, and anticonvulsants restrict their application in treating metastatic cancers (43). More recently, researchers are focusing on exploring the proper dose of old drugs and developing novel tissue-specific drugs to treat metastatic cancers by inhibiting VGSCs. For example, Shilpa et al. designed and synthesized five small molecule compounds to inhibit the invasion of MDA-MB-231 cells (a highly aggressive human breast cancer cell line) through blocking Nav1.5-dependent inward currents (44). Moreover, Hatice et al. reported that low concentrations of naringenin (5 and 10 µM), a natural compound found in citrus fruits and tomatoes, could inhibit MAT-LyLu cells (a highly metastatic prostate cancer cell line) metastasis by blocking VGSCs (45). Also, Nelson et al. proposed that phenytoin, a sodium channel-blocking antiepileptic drug, at a dose equivalent to that used to treat epilepsy (60 mg/kg; daily), could significantly reduce breast tumor metastasis to the liver, lungs, and spleen in orthotopic tumor models (46). Besides, Baptista-Hon et al. showed that VGSCs expressed in colon cancer cells could promote invasion. Apart from this, ropivacaine, an inhibitor of Nav1.5, could inhibit invasion of SW620 cells (a metastatic colon cancer cell line), with a dose of inhibition of 3.8 µM (47). Second, TTNPB is a retinoic acid analog that acts as a selective RAR agonist. It has been reported that an analog of TTNPB, 4-hydroxybenzyl-modified compound (4HBTTNPB), could induce apoptosis in breast cancer cells (48). For dydrogesterone, only one case has been reported. In brief, one patient with recurrent endometrial stromal sarcoma was treated exclusively with dydrogesterone at a daily dose of 10 mg, and the tumor clinically disappeared after 4 years and 3 months (49). To date, no studies have focused on the three drugs in COAD. Therefore, future studies are needed to explore the key molecular targets and therapeutic effects of these three drugs in COAD.
In summary, we identified four novel prognosis-related genes and three potential drugs for COAD patients, which may serve to unravel the complex gene networks of distant metastasis with clinical relevance.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
MW participated in its design, analysis, interpretation, and manuscript drafting. WL assisted in making the study design and evaluating the results obtained. X-FY supervised the study, participated in its design, interpretation, analysis, and including drafting. All authors contributed to drafting, reviewing the manuscript, and approved the submission of the final version.