Bioinformatics and Experimental Analysis of the Prognostic and Predictive Value of the CHPF Gene on Breast Cancer

Background Recent studies in the United States have shown that breast cancer accounts for 30% of all new cancer diagnoses in women, and it has become the leading cause of cancer deaths in women worldwide. Chondroitin polymerizing factor (CHPF) is an enzyme involved in chondroitin sulfate (CS) elongation and a novel key molecule in the poor prognosis of many cancers. However, its role in the development and progression of breast cancer remains unclear.
Methods The transcript expression of CHPF in The Cancer Genome Atlas-Breast Cancer (TCGA-BRCA) and Gene Expression Omnibus (GEO) datasets was analyzed separately using the limma package of R, and the relationship between CHPF transcriptional expression and CHPF DNA methylation was investigated in TCGA-BRCA. Kaplan-Meier curves were plotted using the Survival package to further assess the prognostic impact of CHPF DNA methylation/expression. The association between CHPF transcript expression/DNA methylation and cancer immune infiltration and immune markers was investigated using the TIMER and TISIDB databases. We also performed gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis with the clusterProfiler package. Western blotting and RT-PCR were used to verify the protein level and mRNA level of CHPF in breast tissue and cell lines, respectively. Small interfering plasmids and lentiviral plasmids were constructed for transient and stable transfection of the breast cancer cell lines MCF-7 and SUM1315, respectively, followed by proliferation-related functional assays (CCK8, EDU, and colony formation assays) and migration- and invasion-related functional assays (wound healing and transwell assays). We also conducted a preliminary study of the mechanism.
Results CHPF was significantly upregulated in breast cancer tissues and correlated with poor prognosis. CHPF transcriptional expression and methylation were associated with immune infiltration and immune markers. CHPF promoted proliferation, migration, and invasion of the breast cancer cell lines MCF-7 and SUM1315, and was significantly enriched in pathways associated with ECM-receptor interaction and the PI3K-AKT pathway.
Conclusion CHPF transcriptional expression and DNA methylation correlate with immune infiltration and immune markers. Upregulation of CHPF in breast cancer promotes malignant behavior of cancer cells and is associated with poorer survival in breast cancer, possibly through ECM-receptor interactions and the PI3K-AKT pathway.


INTRODUCTION
Breast cancer has become the second most common cause of cancer death in women worldwide (1). Several obstacles remain. According to previous literature, no clear cause of breast cancer has been identified that would allow precise, cause-specific treatment; some patients are already at an advanced stage at detection because early diagnostic techniques are limited and not widely available; and some types of breast cancer progress rapidly and have limited treatment options.
Prognostic predictors are expected to improve the prognosis of breast cancer patients. Earlier clinical practice relied mostly on tumor size, lymph node status, and tumor grade, which were later found to be insufficient for personalized treatment. The search for new prognostic markers has therefore continued, from the RNA level to the protein level. However, none of these studies has achieved a revolutionary breakthrough, and new indicators are still urgently needed.
Experts from various disciplines have worked to improve the prognosis of breast cancer patients, for example through the introduction of new diagnostic techniques (2,3); genomic and metabolomic studies that refine the pathological classification of breast cancer (4)(5)(6); the exploration of molecular markers (7)(8)(9)(10); and the development of targeted therapeutic modalities (11)(12)(13)(14)(15)(16). However, studies of the emerging molecular marker CHPF in cancer remain scarce.
Chondroitin sulfate (CS) is a type of sulfated glycosaminoglycan (GAG) (17,18) and is involved in the biosynthesis of the skeleton (19). Multiple studies have found that CS is involved in tumor progression and metastasis (20)(21)(22)(23). CHPF has beta-1,3-glucuronic acid and beta-1,4-N-acetylgalactosamine transferase activity and is involved in CS chain elongation (24)(25)(26). CHPF is located in the 2q35-q36 region of human chromosome 2, spans four exons, and plays an important role in cellular function (27). Recent studies have reported that CHPF may act as an oncogene and a cancer-promoting factor in a variety of tumors. It is upregulated, and its high expression is positively correlated with poor prognosis, in breast cancer (28,29), lung cancer (30)(31)(32), malignant melanoma (33), and cholangiocarcinoma (34). However, studies in hepatocellular carcinoma are contradictory (35,36). The mechanism of action of CHPF in cancer is currently unclear. In addition, few studies have examined this gene in breast cancer, and additional evidence is needed to confirm its role in the disease.
In this paper, we investigated the role of CHPF in breast cancer prognosis prediction using a combined bioinformatics and basic experimental approach. The present study is the first to explore the relevance of CHPF expression and DNA methylation to immunity.

METHODS AND MATERIALS
UCSC Xena
UCSC Xena (http://xena.ucsc.edu/) is a cancer genomics data analysis platform containing integrated data from various TCGA tumors. Breast cancer expression data, survival data files, and pan-cancer data were obtained from this site.

CHPF DNA Methylation and Cancer Immune Infiltration Analysis
The relationship between CHPF DNA methylation and CHPF transcript expression was investigated in TCGA-BRCA. KM survival analysis was performed using the Survival package to evaluate the potential impact of CHPF DNA methylation/expression on clinical outcomes. The association between CHPF transcript expression/DNA methylation and cancer immune infiltration was analyzed using the GSCA database.
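The expression-methylation comparison described above can be illustrated with a minimal Python sketch that computes a correlation coefficient from paired per-sample values. The text does not state which correlation method was used, so Pearson correlation is an assumption here, and the methylation beta values and expression values are hypothetical, not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired values for five samples (illustration only):
# methylation beta values at one CpG site and log-scale expression.
methylation_beta = [0.1, 0.2, 0.3, 0.4, 0.5]
chpf_expression = [9.0, 8.5, 6.0, 5.5, 3.0]

r = pearson_r(methylation_beta, chpf_expression)
print(f"Pearson r = {r:.3f}")  # a negative r means expression falls as methylation rises
```

A rank-based coefficient such as Spearman's would be a reasonable alternative when the relationship is monotonic but not linear.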

Source of Human BRCA With Adjacent Non-Tumor Samples
All breast tissue samples were obtained from the Department of Pathology, Affiliated Hospital of Xuzhou Medical University. The patients' clinicopathological data were obtained from the hospital medical record system and informed consent was obtained from patients for all human samples. The specimens and data used for the study were approved by the hospital ethics committee.

RNA Extraction and RT-qPCR
Total RNA was extracted using the TRIzol reagent (Invitrogen) according to the manufacturer's instructions. After the RNA concentration was determined, reverse transcription was performed with HiScript II Q RT SuperMix for qPCR (Vazyme Biotech, Nanjing, China). RNA expression was determined using SYBR Green reagents (Vazyme Biotech, Nanjing, China). Reactions were run on a Bio-Rad QX100 Droplet Digital PCR system (USA), and relative RNA amounts were calculated and normalized to GAPDH using the 2^-ΔΔCt method. All primers were obtained from GENERAY Biotechnology (Shanghai, China) and are summarized in Additional File 1: Table S1.
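The 2^-ΔΔCt calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes approximately 100% amplification efficiency for both target and reference genes; the function name and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ΔΔCt method.

    ΔCt   = Ct(target) - Ct(reference gene, e.g. GAPDH)
    ΔΔCt  = ΔCt(sample) - ΔCt(control)
    fold  = 2 ** (-ΔΔCt)
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values (illustration only): the target crosses the
# threshold two cycles earlier in the sample than in the control,
# while the reference gene is unchanged.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold_change)  # 4.0
```

A fold change above 1 indicates higher relative expression in the sample than in the control, and a value below 1 indicates lower expression, as expected after knockdown.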

siRNA Transfections
CHPF-specific small interfering RNA (siRNA; siLCHPF) and non-specific control siRNA (siCtrl) were purchased from GenePharma (Shanghai, China) and transfected with siLentFect Lipid Reagent (Bio-Rad Laboratories, Inc.) according to the manufacturer's instructions when BRCA cells reached 20-50% confluence. Four to six hours after transfection, the medium containing the transfection reagent was replaced with medium containing 10% fetal bovine serum. The siRNA sequences are listed in Supplementary Table S2.

Stable Cell Line Generation
CHPF short hairpin RNA (shRNA) and interference control lentivirus were purchased from GenePharma. Cells were seeded in 24-well plates at 1×10^5/well. The next day, the original medium was replaced with 2 ml of fresh medium containing 6-8 μg/ml polybrene, followed by the addition of an appropriate amount of virus suspension and incubation at 37°C. After 4-6 h, the medium was changed. Incubation was continued for 24-48 h, and the cells were then selected with 2 ng/ml puromycin for 2 weeks, with the medium changed every 3 days, to obtain stably transfected cell lines. The infected and selected cells were passaged and cultured with puromycin for maintenance selection, and after 3 generations of continuous culture and passaging, the cells were lyophilized. The shRNA sequences are listed in Supplementary Table S3.

Cell Proliferation Assay and Colony Formation Assay
To assess the proliferative capacity of cells, Cell Counting Kit-8 (CCK-8, Dojindo Laboratories, Kumamoto, Japan) was used. Cells were seeded into 96-well plates at 2000 cells per well, with six replicate wells for each group. Then, 10 μl of CCK8 solution was added to each well and the samples were incubated for 2 h at 37°C. The absorbance of the samples at 450 nm was measured on five consecutive days. We performed three independent experiments and present the results as mean ± SD. For colony formation assays, 800 cells/well were seeded in 60 mm plates and cultured in medium containing 10% FBS for 14 days. The culture medium was then discarded, and the cells were fixed with methanol for 20 min, stained with crystal violet for 20 min, gently rinsed in running water, dried, and photographed for counting.

EDU Assays
EDU (5-Ethynyl-2'-deoxyuridine) assays were performed using the EDU assay kit (RiboBio, Guangzhou, China) according to the manufacturer's instructions. Cells in the logarithmic growth phase were seeded in 96-well plates at 4 × 10^3 cells/well. After 20 h of incubation, cells were treated with 50 μmol/L EDU medium and incubated at 37°C for 2 h. Cells were washed with PBS, fixed with 4% paraformaldehyde for 30 min, and incubated with 50 μl of 2 mg/mL glycine for 5 min. The cells were then incubated with 0.5% Triton X-100 permeabilization solution for 10 min and with 100 μL of 1× Apollo® staining reaction solution for 30 min at room temperature, protected from light, followed by permeabilization. Finally, the cells were stained with 100 μL of Hoechst 33342 (5 μg/mL) for 30 min, then observed and imaged with a fluorescence microscope (IX71; Olympus, Tokyo, Japan).

Wound Healing Assay
The cells were cultured to logarithmic phase, seeded in 6-well plates at 1×10^5 cells per well, and incubated at 37°C for 24 h. After the cells reached confluence, a 20.0 μL pipette tip was used to make a scratch perpendicular to a horizontal reference line. The wells were washed with PBS, medium without fetal bovine serum was added, and the position and width of the scratch were recorded under a 200× microscope at 0 h. After a further 24 h of incubation, the cells were photographed and recorded under the 200× microscope to determine the migration distance.

Transwell Assay
Transwell assays (BD Biosciences, San Jose, CA) were used to assess cell invasiveness. One day before the invasion assay, Matrigel was diluted in serum-free medium (serum-free medium:Matrigel = 9:1), and the diluted Matrigel was added to the upper chamber (40 μl/well) and placed in a 37°C incubator overnight. Cells were counted, and 40,000 cells per well were suspended in 200 μl of serum-free medium for the invasion assay. At the same time, 800 μl of serum-containing medium was added to the lower chamber. The 24-well plates were placed in a 37°C incubator for 48 hours, after which the cells were fixed, stained, swabbed, and photographed for counting. The number of invading cells was counted in five randomly selected fields.

Bioinformatical and Statistical Analysis
All bioinformatics statistical analyses were performed with RStudio (version 1.4.1717; http://www.rstudio.com/products/rstudio). First, differential expression analysis was performed using the limma package to explore whether CHPF is differentially expressed between breast cancer patients and normal cases. To explore the correlation between CHPF transcriptional expression/DNA methylation and the prognosis of breast cancer patients, Kaplan-Meier survival analysis was performed using the Survival and Survminer packages, and groups were compared with the log-rank test. In addition, univariate Cox regression analysis was used to assess the association between multiple variables and survival. To explore the possible mechanism of action of CHPF in breast cancer, we performed GO, KEGG, and GSEA analyses based on TCGA data. Statistical analyses of the basic experiments were performed with the SPSS 23.0 statistical package (SPSS, Inc., Chicago, IL). T-tests were used to evaluate differences between control and knockdown groups. Differences were deemed significant when P < 0.05.
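The Kaplan-Meier step can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' R code: it assumes patients are split into high and low groups at the median expression value (the cutoff used for the survival curves is not stated in this section), the helper names are hypothetical, the patient values are invented for illustration, and the log-rank comparison is omitted.

```python
from statistics import median

def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival probability)] at event times.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

# Hypothetical cohort of six patients (illustration only)
expression = [2.1, 5.3, 3.3, 7.8, 1.2, 6.4]
months = [60, 20, 55, 12, 70, 30]
died = [0, 1, 1, 1, 0, 1]

groups = split_by_median(expression)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([months[i] for i in idx],
                                 [died[i] for i in idx])
    print(label, curves[label])
```

In practice the two curves would be compared with a log-rank test, which established packages provide.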

RESULTS
CHPF Is Highly Expressed in Breast Cancer and Is Associated With Poor Prognosis
After preprocessing the data from 33 tumor types obtained from UCSC Xena, differential expression analysis was performed using the limma package to compare CHPF expression between tumor samples and the corresponding normal samples (only tumor types with >= 5 normal samples were included). Significant differences in CHPF gene expression were found in BLCA, BRCA, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PRAD, READ, STAD, and UCEC, and expression was significantly upregulated in breast cancer (Figure 1A). Because the numbers of TCGA-BRCA tumor samples and normal samples were unequal, a paired analysis was performed, which showed a significant increase in CHPF gene expression in breast tumor samples (Figure 1B). Subsequent KM survival analysis demonstrated a poorer prognosis in the high CHPF expression group (Figure 1C). In addition, we selected the GEO dataset GSE20685 for further validation and obtained the same results as in TCGA (Figure 1D). When clinicopathological factors and CHPF gene expression were included as variables in the univariate Cox regression analysis, CHPF gene expression was related to prognosis and functioned as a risk factor (Figure 1E). We then analyzed the expression of the CHPF gene in 14 pairs of tumor and normal tissues and found that it was significantly increased in breast cancer tissues (Figure 1F). In addition, we examined CHPF protein levels in six breast cell lines by western blotting. CHPF was abundant in the breast cancer cell lines MCF-7 and SUM1315 (Figure 1G), so these two cell lines were selected for follow-up studies.

CHPF DNA Methylation Was Negatively Correlated With CHPF Transcript Expression, Both of Which Were Associated With Immune Infiltrates
In TCGA-BRCA, we first analyzed the extent of methylation at different loci in the CHPF gene (Figure 2A). A subsequent correlation analysis revealed that CHPF transcript expression was negatively correlated with methylation at the cg03176520 site (Figure 2B). KM survival analysis showed that higher methylation at the CHPF cg03176520 site was correlated with better overall survival (OS) and progression-free survival (PFS) (Figures 2C, D). In addition, breast cancer patients with both cg03176520 hypermethylation and low CHPF gene expression had significantly longer overall survival than the group with hypermethylation combined with high CHPF gene expression (Figure 2E). We also analyzed immune cell infiltration in the tumor microenvironment. The degree of immune cell infiltration in the breast cancer tumor microenvironment correlated with prognosis (P=0.011) (Figure 2F). Immune cell content analysis based on the high and low CHPF gene expression groups showed that a total of 10 immune cell types differed between the two groups (P<0.05) (Figure 2G). A correlation test between immune cells and the CHPF gene showed that a total of 12 immune cell types were correlated with the target gene (P<0.05). A total of 10 immune cell types were obtained by taking the intersection of the differential analysis and correlation analysis results (Figure 2H). KM survival analysis of these 10 immune cell types showed that memory B cells, naive B cells, resting memory CD4 T cells, M0 macrophages, and M1 macrophages were significantly related to survival, with higher levels of M0 macrophages, M1 macrophages, and memory B cells associated with poor prognosis (Figure 2I).
In addition, the GSCA database analysis showed that CHPF DNA methylation was significantly negatively correlated with the infiltration levels of B cells, CD8+ T/CD4+ T cells, and cytotoxic and exhausted T cells (Figure 2J).
We then analyzed the relationship between clinicopathological parameters and CHPF gene expression and methylation in BRCA patients. CHPF expression was upregulated in patients over 35 years of age compared with patients <= 35 years of age (Figure 3A), and the level of CHPF expression was significantly higher in patients in stage M1 than in the M0 group (Figure 3B). Although there were no significant differences among the tumor stage, T-stage, and N-stage subgroups, CHPF expression appeared to be increased in stage IV, T4, and N3 compared with the other classifications in the same group (Figures 3C-F). Using the online database bc-GenExMiner (http://bcgenex.ico.unicancer.fr), we investigated the relationship between CHPF expression and ER, PR, and HER2 status. CHPF expression was significantly higher in the ER-, PR-, ER/PR-, and HER2+ groups, respectively (Figures 3G-J), and in clinical practice these groups tend to have a poor prognosis. Meanwhile, the relationship between CHPF cg03176520 site methylation and breast cancer tumor stage was analyzed online with smartapp (http://www.bioinfo-zs.com/smartapp/). Although the differences were not statistically significant, the CHPF cg03176520 site was less methylated in stage IV samples than in other stages (Figure 3K).

CHPF Promotes Breast Cancer Cell Proliferation, Migration, and Invasion In Vitro
Because CHPF expression was higher in MCF-7 and SUM1315 cells than in the normal mammary cell line MCF-10A in the cell line validation, these two cell lines were selected for interference and were transiently transfected with siRNA targeting CHPF (S1 and S2) or siCtrl (NC). Both western blot and RT-qPCR results showed that CHPF was significantly reduced in CHPF siRNA-transfected cells compared with control cells (Figures 5A, B). The colony formation assay showed that the number of colonies was dramatically reduced in the CHPF knockdown group compared with the control group (Figure 5C). In the CCK8 proliferation assay, the proliferation of MCF-7 and SUM1315 cells was markedly decreased after down-regulation of CHPF expression (Figure 5D). In addition, EDU incorporation analysis showed that the proportion of EDU-positive MCF-7 and SUM1315 cells was significantly reduced in the CHPF-interfered group compared with the corresponding control cells (Figure 5E). Wound healing and invasion assays showed that interference with CHPF reduced the migratory and invasive abilities of MCF-7 and SUM1315 cells (Figures 5F, G). Furthermore, the protein levels of the EMT-related genes N-cadherin, Snail, and Vimentin were down-regulated in CHPF knockdown MCF-7 and SUM1315 cells, while the protein levels of E-cadherin were upregulated (Figure 5H).

CHPF Can Alter the Expression of Genes Related to ECM-Receptor Interactions and PI3K-AKT Pathways
To further understand the molecular mechanism of CHPF-induced BRCA metastasis, we performed bioinformatics analysis using TCGA-BRCA data. The samples were divided into high and low CHPF gene expression groups, and all genes were analyzed for differential expression between the two groups, with |logFC|>1 and adjusted P value <0.05 as the screening conditions. A total of three differentially expressed genes were identified, namely MMP11, COMP, and COL6A2. GO enrichment analysis revealed that the differential genes were mostly related to extracellular matrix-associated terms, which are often associated with tumor aggressiveness (Figure 6A), while KEGG pathway enrichment analysis showed that the differential genes were mainly enriched in ECM-receptor interaction and the PI3K-AKT pathway (Figure 6B). The ECM-receptor interaction and PI3K-AKT pathways are cross-linked with each other and consist of many genes involved in cell motility and cancer metastasis, which is consistent with the metastasis-promoting role of CHPF. GSEA enrichment analysis was performed on ECM-receptor interaction, the most significantly enriched KEGG pathway, and the results showed a positive correlation between this pathway and CHPF gene expression. We used GEPIA to verify the correlation between the above three genes and the CHPF gene, and the results showed that all of them were significantly associated with CHPF (Figures 6C-F).
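The screening rule (|logFC|>1 and adjusted P value <0.05) can be sketched in Python as follows. This is an illustration under assumptions: the Benjamini-Hochberg procedure is used because it is limma's default adjustment, but the text does not state which adjustment the authors applied, and the gene names, logFC values, and P values below are hypothetical placeholders.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def screen_genes(results, lfc_cutoff=1.0, alpha=0.05):
    """Keep genes with |logFC| > lfc_cutoff and adjusted P < alpha."""
    adjusted = benjamini_hochberg([row["p"] for row in results])
    return [
        row["gene"]
        for row, q in zip(results, adjusted)
        if abs(row["logFC"]) > lfc_cutoff and q < alpha
    ]

# Hypothetical differential expression results (illustration only)
results = [
    {"gene": "GENE_A", "logFC": 1.8, "p": 0.0001},
    {"gene": "GENE_B", "logFC": 0.4, "p": 0.0002},   # fails the logFC cutoff
    {"gene": "GENE_C", "logFC": -1.5, "p": 0.01},
    {"gene": "GENE_D", "logFC": 2.2, "p": 0.4},      # fails the adjusted P cutoff
]

selected = screen_genes(results)
print(selected)  # ['GENE_A', 'GENE_C']
```

Applying both thresholds together keeps only genes whose change is both large in magnitude and statistically supported after multiple-testing correction.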
In addition, western blotting was performed to verify key proteins of the PI3K-AKT signaling pathway. The levels of P-PI3K and P-AKT in the CHPF knockdown group were lower than those in the control group (Figure 6G). Subsequently, we selected genes that were differentially expressed between the two groups and involved in the ECM-receptor interaction and PI3K-AKT pathways for quantitative real-time PCR validation (fold change > 1.5) (Figure 6H). We analyzed the changes in mRNA levels of a total of 13 genes, 10 of which changed when CHPF expression was altered. Among them, only COL6A2 and SDC1 were significantly increased in response to CHPF knockdown. However, COL6A2 was positively correlated with CHPF in the GEPIA database, which is inconsistent with this finding.

DISCUSSION
Little has been reported about CHPF as a novel tumor-associated gene. At present, only a few publications focus on the role of CHPF in non-small cell adenocarcinoma, hepatocellular carcinoma, and breast cancer. DNA methylation is an epigenetic modification (37)(38)(39)(40) that plays a crucial role in normal human growth and development and in cell biology (41,42). Emerging evidence suggests that tumors often hijack various epigenetic mechanisms to evade immune restriction (38,43). DNA methylation is precisely regulated in healthy human tissues, and changes in these patterns can be detected during cancer development and progression. Previous studies have reported that hypomethylation of oncogenes is one of the hallmarks of almost all types of cancer, including breast cancer (44). Currently, there are no studies on the relationship between methylation of this gene and immune infiltration.
The tumor microenvironment (TME) includes tumor cells, various immune cells as well as endothelial cells, and fibroblasts (45,46). Previous studies have reported that TME components, particularly immune cells, influence tumor development and the body's response to immune checkpoint blockade (ICB) (45).
In the current study, CHPF transcript expression was negatively correlated with DNA methylation in breast cancer. CHPF transcript expression was associated with poorer prognosis, while methylation of the CHPF cg03176520 locus was associated with better survival. Immune cell differential analysis and correlation analysis showed that CHPF transcript expression was associated with the infiltration of 10 immune cell types, including macrophages, CD4+ T cells, CD8+ T cells, and B cells. CHPF DNA methylation was significantly and negatively associated with B cells, CD8+ T/CD4+ T cells, and cytotoxic and exhausted T cells. In addition, CHPF transcript expression and DNA methylation correlated with various immunomodulators and with most chemokines and receptors listed in TISIDB. Our study provides a new research direction for the role of CHPF in breast cancer.
Our research also showed a significant increase in CHPF expression in breast cancer tissues. In vitro, CHPF promoted proliferation, migration, and invasion of breast cancer cells, and the EMT-related markers associated with migration and invasion changed accordingly after CHPF knockdown. In addition, possible mechanisms were further investigated by bioinformatics analysis. We divided TCGA-BRCA samples into high and low expression groups according to the median expression of CHPF and performed differential expression analysis between the two groups, which yielded three significantly different genes, namely MMP11, COMP, and COL6A2. GO, KEGG, and GSEA enrichment analyses based on these three differential genes and the target gene CHPF showed significant enrichment in the ECM-receptor interaction and PI3K-AKT signaling pathways. After CHPF knockdown, we confirmed changes in the expression of key proteins in the PI3K-AKT pathway, especially P-PI3K and P-AKT, by western blotting, and changes in differential genes in the ECM-receptor interaction and PI3K-AKT pathways by RT-qPCR. The results showed that BAD, COL1A2, COL6A1, COL6A2, COMP, ITGA11, MMP11, RELN, SDC1, and SV2B were significantly differentially expressed. Among them, COL6A2 and SDC1 were significantly increased after CHPF knockdown. In summary, we speculate that CHPF may function together with the above three genes, especially the more significantly altered MMP11, and subsequently promote breast cancer metastasis through the PI3K/AKT pathway. This hypothesis requires further experimental validation.
The present study also has shortcomings, and the following questions still need to be addressed: (1) How the CHPF gene promotes the proliferation, migration, and invasion of breast cancer cells through the PI3K-AKT pathway, and whether it must act through the relevant genes we have validated, need to be further investigated. (2) Support from animal experiments is lacking. In addition, no experiments such as cell cycle analysis were performed in this study to verify that CHPF promotes cell proliferation. (3) The CHPF gene has been linked to immunity at the levels of both transcriptional expression and DNA methylation, and the next step is to find the immune markers most relevant to the target gene and to investigate whether CHPF affects certain immunotherapy targets. (4) We have only explored the relationship between CHPF and immunity in a preliminary way using bioinformatics methods. Relevant experimental evidence, such as the use of immunohistochemistry or PCR to verify the association between CHPF and immune-inflammatory indicators, is still lacking. Addressing these shortcomings will require further work.

CONCLUSION
In conclusion, our study demonstrated that CHPF, a novel tumor-promoting gene in BRCA, can promote cell migration and invasion through the ECM and PI3K/AKT signaling pathways, ultimately altering the survival of BRCA patients.
Figure 5 (caption fragment) [...] and EDU assay all showed that knockdown of the CHPF gene significantly inhibited the proliferation of breast cancer cells. (F, G) Wound healing and invasion assays were performed to identify metastasis ability after CHPF knockdown in MCF-7 and SUM1315 cells. (H) Changes in the expression of the EMT biomarkers E-cadherin, N-cadherin, Snail, and Vimentin after CHPF knockdown were detected by western blot. All experiments were repeated three times. The data are shown as mean ± S.D. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.
Figure 6 (caption fragment) [...] (H) Results of quantitative real-time PCR of genes differentially expressed and involved in ECM-receptor interactions and the PI3K-AKT pathway. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. ns, not significant.