Identification of Down-Regulated ADH1C is Associated With Poor Prognosis in Colorectal Cancer Using Bioinformatics Analysis

Colorectal cancer (CRC) is the second most deadly cancer in the whole world, with the underlying mechanisms largely indistinct. Therefore, we aimed to identify significant pathways and genes involved in the initiation, formation and poor prognosis of CRC using bioinformatics methods. In this study, we compared gene expression profiles of CRC cases with those from normal colorectal tissues from three chip datasets (GSE33113, GSE23878 and GSE41328) to identify 105 differentially expressed genes (DEGs) that were common to the three datasets. Gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses showed that the highest proportion of up-regulated DEGs was involved in extracellular region and cytokine-cytokine receptor interaction pathways. Integral components of membrane and bile secretion pathways were identified as containing down-regulated DEGs. 13 hub DEGs were chosen and their expression were further validated by GEPIA. Only four DEGs (ADH1C, CLCA4, CXCL8 and GUCA2A) were associated with a significantly lower overall survival after the prognosis analysis. Lower ADH1C protein level and higher CXCL8 protein level were verified by immunohistochemical staining and western blot in clinical CRC and normal colorectal tissues. In conclusion, our study indicated that the extracellular tumor microenvironment and bile metabolism pathways play critical roles in the formation and progression of CRC. Furthermore, we confirmed ADH1C being down-regulated in CRC and reported ADH1C as a prognostic predictor for the first time.


INTRODUCTION
Colorectal cancer (CRC) causes approximately 10% of cancer-related deaths each year and is the second most common lethal cancers globally (Dekker et al., 2019;Sung et al., 2021). During the past few decades, the incidence of CRC has increased between two-and four-fold in many Asian countries, including China (Nielsen et al., 2005). Early CRCs are highly treatable and screening for the disease can greatly reduce cancer mortality (Schoen et al., 2012). However, CRC is often diagnosed at the advanced stage due to the limitations of the current screening methods and the high metastatic potential of CRC (Li et al., 2018a). Therefore, it is extremely important to identify more reliable biomarkers for the early diagnosis of CRC and to reveal the underlying pathogenic mechanism of CRC.
In recent years, high-throughput sequencing technology is particularly powerful for screening differentially expressed genes (DEGs) in biological samples (Vogelstein et al., 2013). This has led to an increase in the number of gene expression profiles researches and a huge amount of data have been accumulated in public databases, of which bioinformatic analysis is necessary to obtain valuable insights into disease mechanisms. This is also the case with CRC (Isella et al., 2015) and a lot of data on CRC-related DEGs have been accumulating in the database. Bioinformatics studies on CRC, based on public databases (Guo et al., 2017), have shown that using these methods will help to explain the formation of the disease and reveal the underlying molecular mechanisms.
In this study, we downloaded three original expression microarray datasets: GSE33113, GSE23878, and GSE41328, from the Gene Expression Omnibus (GEO) database (Available online: https://www.ncbi.nlm.nih.gov/geo). These datasets provided 127 CRC cases and 32 normal colorectal tissue samples. To obtain the common DEGs from the three datasets, we used the GEO2R online tool, which is linked with the GEO database and the Draw Venn diagram online software. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) software was then applied to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and gene ontology (GO) enrichment analysis of these common DEGs. The GO analysis included biological process (BP), cellular component (CC), and molecular function (MF) categories. We also built the protein-protein interaction (PPI) network for the common DEGs and determined the hub genes using Cytoscape plugin cytoHubba. The expression of these hub genes, compared between CRC and normal colorectal tissues, was validated using the Gene Expression Profiling Interactive Analysis (GEPIA) (P < 0.05). We imported the validated hub genes into the UALCAN and OncoLnc online resource to perform the prognostic analysis (P < 0.05). Four DEGs (ADH1C, CLCA4, CXCL8, and GUCA2A) were found to be associated with a significantly worse prognosis and lower overall survival. The down-regulated protein level of ADH1C (Class I Alcohol dehydrogenase 1C, gamma polypeptide) was verified in the CRC tissues and normal colorectal tissues by immunohistochemical staining and western blot. Therefore, we reported ADH1C as a prognostic predictor for CRC for the first time. In conclusion, our bioinformatics study provides some significant pathways and potential biomarkers that may be helpful in the interpretation of the molecular mechanism of CRC and could provide potential markers for CRC diagnosis.

Information From Three Microarray Datasets
We downloaded three original expression microarray datasets: GSE33113, GSE23878, and GSE41328, from the GEO database (Available online: https://www.ncbi.nlm.nih.gov/geo). These datasets provided 127 CRC cases and 32 normal colorectal tissue samples. The microarray data were all based on GPL570 Platforms (HG-U133_Plus_2 Affymetrix Human Genome U133 Plus 2.0 Array). The GSE33113 dataset included 90 CRC tissues and six normal colorectal tissues. The GSE23878 dataset included 36 CRC tissues and 24 normal colorectal tissues. The GSE41328 dataset included two CRC tissues and two normal colorectal tissues.

Identification of Common DEGs
The DEGs between CRC tissues and normal colorectal tissues were first processed through the GEO2R online tools (Davis and Meltzer, 2007), with an adjusted P-value < 0.05 and logFC < −2 or logFC >2 as the cut-off criterion. The integrated raw data stored in TXT files were analyzed by Draw Venn diagram online software (http:// bioinformatics.psb.ugent.be/webtools/Venn/) to obtain the common DEGs from the three databases. The DEGs with a log FC < 0 were deemed to be down-regulated genes, while the DEGs with a log FC > 0 were deemed to be up-regulated genes.

GO and KEGG Pathway Enrichment Analysis of the Common DEGs
GO analysis is a widely used approach for the identification of unique biological characteristics of genes and proteins from data obtained by high-throughput sequencing (Ashburner et al., 2000). The KEGG is a group of databases that are designed to systematically analyze gene functions and link genomic information with higherorder biological function pathways (Kanehisa and Goto, 2000). The DAVID software is an integrative online bioinformatics program that aims to analyze and interpret the biological properties of genes or proteins (Huang da et al., 2009). Therefore, the GO and KEGG pathways enrichment analyses of common DEGs were carried out using DAVID 6.8 (https://david.ncifcrf.gov/). A P value <0.05 was used as the cut-off criterion.

Protein-Protein Interaction Network Analysis
The PPI network of common DEGs was evaluated using STRING (Search Tool for the Retrieval of Interacting Genes), which is an open online tool (Szklarczyk et al., 2015). The PPI network complex of the common DEGs was then imported into Cytoscape, which is a free software for visualization of PPI networks to detect the hub DEGs (confidence score ≥0.4, maximum number of interactors = 0) (Shannon et al., 2003). The Cytoscape application, cytoHubba, was applied to determine the hub genes of the PPI network and the top 15 hub genes was ranking by the Maximal Clique Centrality (MCC) algorithm (Wang et al., 2021).

Validation and Survival Analysis of Hub Genes
The GEPIA website was chosen for validation of the expression level of hub genes that were compared between the CRC tissues and normal colorectal tissues. The GEPIA software was used to collect high-throughput sequencing data from many biological samples of The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) projects (Tang et al., 2017). The UALCAN website provided easy access to available cancer OMICS data (TCGA and MET500) (Chandrashekar et al., 2017) and OncoLnc was used as an online tool for the exploration of survival correlations based on TCGA (Anaya, 2016). We could find the P-value on the plot. A P value of <0.05 was used as the cut-off criterion.

Patients and Specimens
We collected 18 pairs of formalin-fixed paraffin-embedding CRC tissues and matched normal colorectal tissues (13 men and 5 women; age range 43-84 years; mean age ±standard deviation (SD), 67 ± 14 years) from the Affiliated Hospital of Hebei University, among which 8 patients (3 men and 5 women; age range 53-84 years; mean age ±SD, 70 ± 12 years) had fresh CRC and matched normal colorectal tissues frozen in liquid nitrogen within 30 min after resection. All the CRC patients involved in the study were diagnosed with pathological proof and has not been received chemotherapy or radiotherapy before the surgery. The study was authorized by the Ethics Committee of Affiliated Hospital of Hebei University (HDFY-LL-2021-013). Informed consent was obtained from the patients and their families who participated in the study.

Immunohistochemical Staining
The formalin-fixed paraffin-embedding CRC tissues were fixed in 10% formalin, embedded in paraffin, and sliced into 5 μm thick sections. The slides were deparaffinized by xylene and discontinuous concentration of alcohol. After incubating the slices at 95~100°C for 10 min to retrieve antigens, the samples were cooled to room temperature treated with 3% hydrogen peroxide to block the endogenous peroxidase. Then, the primary antibody of ADH1C (A8081, Abclonal, Wuhan, China, 1:100) and CXCL8 (A2541, Abclonal, Wuhan, China, 1:100) were added to the slices and incubated in a humidified chamber at 4°C overnight. For the next, the slices were incubated with the HRP*Polyclonal Goat Anti-Rabbit IgG (H + L) (PV-9001, Origene, Beijing, China) at room temperature for 20 min, followed by treatment with DAB substrate solution (ZLI-9019, Origene, Beijing, China). Finally, the slices were incubated with hematoxylin to visualize cell nuclei, dehydrated by alcohol, cleared in xylene, and sealed by using a mounting solution (neutral resin). The assay ensured no signal in the negative control. The result was observed by microscopy and analyzed by ImageJ 1.8.0 and IHC Toolbox plugin (Shu et al., 2016). Briefly, a proper threshold was stetted according to the positive DAB-staining specimen, which will be applied to all the pictures for comparison. The pixels of the area above the threshold, which is considered the positive area were measured automatically, and the % relative area which is the percentage of positive area of the whole picture was used to represent the immunoreactivity of the stained antibody. For each sample at least ten fields were measured.

Western Blot
After the evaporation of the liquid nitrogen, the frozen tissues were homogenized by the homogenizer (02642, PRO Scientific, CT, United States) in the lysis buffer (P0013b, Beyotime, Shanghai, China) with phenylmethanesulfonyl fluoride on ice. Then the suspensions were centrifuged at 4°C 14,000×g for 15 min. The concentration of the total protein was measured by the BCA protein quantification kit (PA101-01, Biomed, Beijing, China). SDS-PAGE electrophoresis was used to separate the equal amounts of protein samples, which were then transferred to a nitrocellulose membrane (HATF00010, Millipore, MA, United States). The nitrocellulose membrane was blocked by the 5% nonfat milk (LP0031B, Solarbio, Beijing, China) diluting in tris-buffered saline with Tween-20 (TBST) at room temperature for 1 h, and followed by incubating with the primary antibodies of ADH1C (A8081, Abclonal, Wuhan, China, 1:1000), CXCL8 (A2541, Abclonal, Wuhan, China, 1:1000), and β-actin (20536-1-AP, Proteintech, Wuhan, China, 1:3000) as the internal reference at 4°C overnight, which was also diluted in TBST. At last, the nitrocellulose membrane was incubated with the secondary antibody (SA00001-2, Proteintech, Wuhan, China, 1:5000) for 1 h at room temperature and visualized by the UItraECL Substrate chemiluminescence detection Kit (Biomed, Beijing, China). The absolute intensity of the blot signals was quantified using the ImageJ 1.8.0 software (National Institutes of Health, Boston, MA, United States).

Statistical Analysis
Paired t-test was used to compare the quantitative data of the IHC, real-time PCR and western blot that conformed to normal Frontiers in Molecular Biosciences | www.frontiersin.org March 2022 | Volume 9 | Article 791249 distribution by the Graphpad 6.0. The results were represented by mean ± SD. A P value of <0.05 was used as the cut-off criterion.

Identification of Common DEGs in CRC
There were 127 CRC tissues and 32 normal colorectal tissue samples used in this study. We obtained 990, 720, and 280 DEGs from GSE33113, GSE23878, and GSE41328, respectively, using the GEO2R online tool. After integrated analysis, a total of 105 DEGs were identified as common to the three databases. These included 22 up-regulated DEGs (logFC> 0) and 83 downregulated DEGs (logFC< 0) in the CRC tissues when compared with the normal colorectal tissues ( Table 1 and Figure 1).

GO and KEGG Pathway Enrichment Analyses of Common DEGs in CRC
All 22 up-regulated DEGs were analyzed using DAVID online resources. The GO analysis showed that the highest proportion of up-regulated DEGs were involved in 1) collagen catabolic process, cell-cell signaling, extracellular matrix disassembly, negative regulation of protein kinase activity, and positive regulation of protein oligomerization, found in the BP category; 2) proteinaceous extracellular matrix, extracellular region, extracellular space and extracellular matrix, found in the CC category; 3) metalloendopeptidase activity, protein kinase inhibitor activity, serine-type endopeptidase activity and transforming growth factor-beta receptor binding, found in the MF category ( Table 2).
All 83 down-regulated DEGs were also analyzed using the DAVID online database. The GO analysis indicated that the highest proportion of down-regulated DEGs are enriched in 1) bicarbonate transport, one-carbon metabolic process, regulation of intracellular pH, chloride transmembrane transport, and cellular glucuronidation, found in the BP category; 2) an integral component of the plasma membrane, apical plasma membrane, anchored component of the membrane and integral component of membrane, found in the CC category; 3) carbonate dehydratase activity, chloride channel activity, hormone activity, transporter activity and glucuronosyltransferase activity, found in the MF category ( Table 3).
We also performed a KEGG pathway analysis using the DAVID online database. This indicated that up-regulated DEGs were particularly enriched in rheumatoid arthritis, cytokine-cytokine receptor interaction, and bladder cancer pathways, whilst the nitrogen metabolism, bile secretion, retinol metabolism, proximal tubule bicarbonate reclamation, and drug metabolism pathways were identified as the most represented pathways for the down-regulated DEGs ( Table 4, P < 0.05).

Validation of Hub Genes by GEPIA
We used GEPIA to compare the expression levels of the 15 hub genes between samples from CRC patients and normal individuals. The results showed that 13 out of the 15 hub genes had significantly different expression levels in CRC samples when compared to normal colorectal mucosa tissues (P < 0.05, Table 6 and Figure 3). SLC26A3 and UGT2B15 did not show differential expression ( Table 6).

Prognostic Analysis of the Validated Hub Genes by UALCAN and OncoLnc
We performed a prognostic analysis of the 13 validated hub genes using the UALCAN (http://ualcan.path.uab.edu/) online resource database. The results showed that low expression of ADH1C,  CLCA4, and GUCA2A was associated with significantly lower survival rates. There was no available data for CXCL8 and the other nine genes showed no significant changes (P < 0.05, Table 7 and Figure 4A). However, down-regulated CXCL8 was also significantly related to a lower survival rate (P < 0.05, Figure 4A), as shown by an analysis on the OncoLnc (http:// www.oncolnc.org/) open online database. Furthermore, we collected CRC tissues and matched normal colorectal tissues from the Affiliated Hospital of Hebei University. The immunohistochemical analyses, real-time PCR and western blot confirmed that both the mRNA and protein level of ADH1C was down-regulated, well CXCL8 was up-regulated in the CRC tissues compared with normal colorectal tissues (P < 0. 05, Figures 4B-E).

DISCUSSION
Numerous studies have focused on the pathology and mechanism of CRC, but the exact mechanism remains largely unknown. To investigate the genes related to pathogenesis and to address the urgent need for early-stage diagnostic biomarkers for CRC, we integrated three gene profile datasets: GSE33113, GSE23878, and GSE41328, from GEO. 22 up-regulated and 83 down-regulated genes were identified when CRC samples were compared with colorectal tissues. We performed GO and KEGG pathway enrichment analyses using DAVID online resources. Most up-regulated genes had functions that are integral to the extracellular region and extracellular space, which suggested that the formation of CRC has a close relationship with the tumor microenvironment. The largest proportion of down-regulated genes was mainly involved in the integral component of the membrane, an integral component of the plasma membrane, and bicarbonate transport. This is consistent with the fact that the integral cell membrane is essential for the avoidance of pathogens and to maintain the acid-base balance (Than et al., 2016;Westman et al., 2019). In the KEGG pathway analysis, up-  regulated genes were particularly enriched in rheumatoid arthritis and the cytokine-cytokine receptor interaction. This is consistent with the GO analysis result of the up-regulated genes because the extracellular matrix and cytokine signal transduction play an important role in CRC invasion and metastasis (Cabrero-de Las Heras and Martinez-Balibrea, 2018;Yuzhalin et al., 2018). The highest proportion of down-regulated genes was significantly associated with nitrogen metabolism and bile secretion. Reactive nitrogen may function as a pro-inflammatory factor that promotes CRC (Janakiram and Rao, 2014). We also noted that many studies have shown that the gut microbiota that participates in the bile acid metabolism takes part in the formation of CRC (Feng et al., 2015;Wong et al., 2017), which is consistent with bile acid being associated with the occurrence of CRC .
FIGURE 3 | The expression levels of the hub genes were validated using the GEPIA website. Thirteen of the 15 hub genes showed significant differential expression in CRC samples, when compared with the normal samples (*P < 0.05). Red represents tumor tissues and grey represents normal tissues.

SLC26A3, UGT2B15
Frontiers in Molecular Biosciences | www.frontiersin.org March 2022 | Volume 9 | Article 791249 7 We found that 13 out of the 15 hub genes showed differential expression in CRC samples when compared to normal samples. This was validated using GEPIA (P < 0.05). However, only four (ADH1C, CLCA4, CXCL8, GUCA2A) out of the 13 genes were significantly associated with a lower survival rate. The lower protein level of ADH1C and higher protein level of CXCL8 were further verified in our CRC and matched normal colorectal samples.
ADH1C, also known as ADH3, is an alcohol dehydrogenase that can catalyze ethanol oxidation to metabolite acetaldehyde (Shen et al., 2019). According to THE HUMAN PROTEIN ATLAS database, ADH1C locates mainly to the cytosol and plasma membrane, which is consistent with our IHC result. Genetic polymorphism of the ADH1C gene (ADH1C*1 allele) is associated with various human cancers, such as gastric cancer (Hidaka et al., 2015), oral cancer (Anantharaman et al., 2014) and CRC (Offermans et al., 2018), for it encodes an alcohol dehydrogenase producing 2.5 times more acetaldehyde (Seitz and Stickel, 2010). Chiang et al. showed that ADH1C was the mainly expressed isozyme in colorectal tissues (Chiang et al., 2012). Mashkova et al. observed that lower mRNA level of ADH1C in the CRC tissues compared with the normal or only hyperplasia colorectal tissues (Kropotova et al., 2014). In addition, the latest study showed that ADH1C is down-regulated in familial adenomatous polyposis case adenomas (Thiruvengadam et al., 2019), but up-regulated in patients with ulcerative colitis (Low et al., 2019). Kumamoto et al. (2019) revealed that ADH1C could predict the recurrence of stage III CRC in patients who accept chemotherapy treatment. Our study is the first to show a correlation between the lower expression level of ADH1C and worse prognostic of CRC patients. What's more, we conducted immunohistochemical analyses and western blot to testify that ADH1C was down-regulated in CRC tissues in comparison with the normal tissues consisting with our bioinformatics analysis. However, there are few studies supporting our conclusion, further experiments are needed to verify our results.
Chloride channel accessory 4 (CLCA4) is well known as a tumor inhibitor and has been shown to suppress the development of various malignant tumors. In bladder cancer and hepatocellular carcinoma, tumor cell proliferation and migration are inhibited by CLCA4 through the PI3K/AKT signalling pathway (Hou et al., 2017;Liu et al., 2018). Low expression of CLCA4 has been reported in CRC patients (Zhao et al., 2019). Chen et al. (2019) reported that the epithelial-mesenchymal transition can also be suppressed by CLCA4 via the PI3K/AKT pathway in CRC, and also indicated that low CLCA4 is correlated with poor survival of CRC patients. This is consistent with our results.
The C-X-C motif chemokine ligand 8 (CXCL8) gene belongs to the CXC chemokine family whose main function is to recruit and activate neutrophils and granulocytes to sites of inflammation (Waugh and Wilson, 2008). Several previous studies have shown that CXCL8 which is a secreted protein functions with its receptors, CXCR1 and CXCR2, to promote the progression of some cancers, such as breast cancer (Yi et al., 2019), prostate cancer (Baci et al., 2019), lung cancer (He et al., 2019), and CRC (Liu et al., 2016). The CXCL8 gene is upregulated in CRC tissue and correlated with the development of CRC, which is physiologically hard to detect (Rubie et al., 2007). Xia et al. (2015) proved that high levels of expression of CXCL8 are significantly associated with poor overall survival, tumor stage, lymphatic and liver metastasis. This suggests that CXCL8 could be a potential indicator for both detection and prognosis by meta-analysis. Fisher et al. (2019) demonstrated that blocking the CXCL8-CXCR1 pathway can inhibit the tumorigenicity that originates in the CRC stem cells. However, we did observe elevated CXCL8 levels in CRC patients in our study, the bioinformatics analysis indicated that a high level of CXCL8 correlated with poor prognosis at the beginning and in related with longer survival time as the time lasted in the opposite. Therefore, more studies are needed to investigate the exact relationship between the expression of CXCL8 and the CRC.
Guanylate cyclase activator 2A (GUCA2A) is a peptide hormone that is secreted by gut epithelial cells to regulate the function of guanylate cyclase 2C (GUCY2C) signalling in the autocrine and paracrine systems (Pattison et al., 2016). Silencing of the GUCY2C receptor induces genomic instability, hyperproliferation, and transformation of tumor cells (Li et al., 2007;Lin et al., 2016). Bashir et al. (2019) showed that a low level of GUCA2A silences the tumor inhibitory receptor, GUCY2C, in pathophysiological conditions, and leads to microsatellite instability in tumors. Loss of GUCA2A has been observed in CRC and inflammatory bowel disease, in which it may be associated with the disruption of intestinal homeostasis (Brenna et al., 2015;Zhang et al., 2019). Zhang et al. (2019) used analysis of the TCGA database to show that GUCA2A is associated with poor overall survival, which is consistent with our results.

CONCLUSION
In conclusion, our bioinformatics study identified significantly enriched KEGG pathways and four common DEGs (ADH1C, CLCA4, CXCL8, and GUCA2A) that correlate with poor overall survival of CRC patients. Our GO and KEGG pathways analyses indicated that the tumor microenvironment and gut microbiota are involved in the progression of CRC. We confirmed the decreased level of ADH1C in our CRC tissues from our patients and we are also the first to reveal the relationship between ADH1C and the

Category Genes
Genes with significantly worse survival (P < 0.05) ADH1C, CLCA4, CXCL8, GUCA2A Genes without significantly worse survival (P > 0.05) ADH1B, GCG, GUCA2B, MMP1, MS4A12, PYY, SST, ZG16, CXCL5 Frontiers in Molecular Biosciences | www.frontiersin.org March 2022 | Volume 9 | Article 791249 lower overall survival rate of CRC patients. This study provides information on the pathogenesis of CRC and indicates that ADH1C may be a candidate prognostic molecular. More empirical and clinical verifications are needed to add to our analyses and further studies on the mechanism of the disease are also necessary. In short, our results have identified potential prognostic markers for CRC and are also shed light on the mechanism of CRC.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of Affiliated Hospital of Hebei University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ML conducted the Venn diagram, GO and KEGG analysis and also drafted the manuscript. YW and JG performed the immunohistochemical analyses. JS helped to build the PPI network, validate the expression of the hub genes by GEPIA and perform the prognostic analysis by UALCAN and OncoLnc. HW revised of the work and participated in interpreting the data. ZL and TW helped to finish the real-time PCR and western blot.