A possible genetic association between obesity and colon cancer in females

Objective: There is mounting clinical evidence that rising obesity is linked to increased cancer incidence and mortality. Although studies have shown a link between obesity and colon cancer, the mechanism underlying this link in females remains unknown. The goal of this work is to use bioinformatics to elucidate the genetic link between obesity and colon cancer in females and to investigate probable molecular mechanisms.
Methods: The GSE44076 and GSE199063 microarray datasets were obtained from the Gene Expression Omnibus (GEO) database. The online tool GEO2R was used to identify differentially expressed genes (DEGs) between patients and healthy controls in each dataset (colon cancer in GSE44076, obesity in GSE199063). The DEGs identified in the two datasets were then intersected. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed on the shared DEGs. The STRING database and Cytoscape software were then used to build protein-protein interaction (PPI) networks and to identify hub genes. NetworkAnalyst was also used to build networks linking the hub genes to their target microRNAs (miRNAs) and transcription factors.
Results: The two datasets shared 146 DEGs. According to GO and KEGG analysis, these DEGs are primarily enriched in inflammatory and immune-related pathways. Fourteen hub genes were identified from the PPI network using the MCODE and CytoNCA plug-ins of Cytoscape: TYROBP, CD44, BGN, FCGR3A, CD53, CXCR4, FN1, SPP1, IGF1, CCND1, MMP9, IL2RG, IL6 and CTGF. Key transcription factors for these hub genes include WRNIP1, ATF1, CBFB, and NR2F6. Key miRNAs for these hub genes include hsa-mir-1-3p, hsa-mir-26b-5p, hsa-mir-164a-5p and hsa-mir-9-5p.
Conclusion: Our research provides evidence that female patients with colon cancer and female patients with obesity share altered genes. Through pathways connected to inflammation and the immune system, these genes play significant roles in the emergence of both diseases. We constructed a network linking the hub genes with their target miRNAs and transcription factors, which may offer suggestions for future research in this area.


Introduction
In developed nations, a considerable fraction of the population is overweight or obese. Obesity is typically assessed using the body mass index (BMI), with ranges of 18.5-24.9 for normal weight, 25-29.9 for overweight, and ≥30 for obesity (1). Obesity prevalence has increased significantly over the previous ten to fifteen years and has already reached epidemic levels, affecting 2 billion individuals (2,3). The ongoing rise in the obese and overweight population is thought to result from increased caloric intake and decreased physical activity, and it is accompanied by a persistent rise in the incidence of related cancers (4). Obesity is more prevalent in women than in men (5), and up to 20% of cancer deaths in women are thought to be attributable to it (6).
In developed countries, colon cancer (CC) is the second most prevalent malignant disease (7). CC arises from colonic epithelial cells, which line the lumen of the organ and self-renew every five days from stem cells at the base of the colonic crypts (8). A variety of factors may affect cellular self-renewal and lead to the formation of CC. These factors include lifestyle, diet and genetic alterations (9); genetic alterations account for 15-30% of cases of CC and are brought on by oncogene overexpression and tumor suppressor gene inactivation (10). A sizable number of CCs are nevertheless sporadic and linked to environmental or lifestyle factors, and these patients lack a definite genetic susceptibility (11,12). According to research, CC is the most prevalent type of nutrition-related cancer in the general population (13). Hypertrophic expansion of adipose tissue brought on by overeating shares many characteristics with solid tumor growth. The adipose tissue of obese people contains adipocyte precursor cells that accelerate tumor growth by producing more endothelial precursor cells, pericytes and adipocytes, which results in angiogenesis and the growth of cancer cells in vivo (14).
Obese individuals have hyperinsulinemia, insulin resistance, hormonal dysregulation, a chronic inflammatory state and dysregulated energy metabolism. These changes put obese people at higher risk for cancer and a poor prognosis (15,16). Adipose tissue is abundant in the tumor mesenchyme, and adipocytes operate as endocrine cells that secrete signaling molecules such as adipokines, pro-inflammatory cytokines and proangiogenic proteins to aid in the development and growth of tumors (17). For instance, through activation of the prostaglandin and insulin-like growth factor pathways, long-term chronic inflammation and insulin resistance increase colon cell proliferation and block programmed cell death (apoptosis). The Wnt pathway is particularly rich in proteins involved in the control of cell polarity, mitotic activity and fertility, and it is closely associated with colon carcinogenesis (18). Additionally, visceral adipocytes release greater amounts of lactate dehydrogenase (19), interleukin (IL)-6, IL-8 and tumor necrosis factor alpha (TNF-α). These inflammatory substances activate signaling pathways associated with NF-κB, STAT3, phosphatidylinositol 3-kinase/serine-threonine kinase Akt (PI3K/Akt), mechanistic target of rapamycin/mitogen-activated protein kinase (mTOR/MAPK/p38) and other pro-inflammatory pathways. The development of metastases and the associated cancer cell proliferation, invasion, angiogenesis and cell survival are all mediated by the activation of these pathways (4, 20-22). However, while most clinical investigations have identified a significant link between weight and CC risk in males, there has been little to no evidence of this relationship in women (23-27). Therefore, it is crucial to research the relationship between female obesity and CC. Understanding the genetic relationships between CC and obesity may offer crucial mechanistic insights into CC in the setting of female obesity.
With the advancement of information technology in biology, large-scale gene screening in a genomic context has become possible in recent years. To identify differentially expressed genes (DEGs) shared by CC and obesity, this work used two original microarray datasets retrieved from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo). The detected DEGs were then examined using Gene Ontology (GO) analysis, KEGG pathway analysis and protein-protein interaction (PPI) analysis. We then predicted the probable transcription factors and miRNAs of the hub genes and used the Cytoscape program to create a network displaying the interactions between transcription factors, miRNAs and genes.

Data download
The two datasets mentioned above were downloaded from the GEO website (https://www.ncbi.nlm.nih.gov/geo/). From the GSE44076 dataset, we chose 27 female CC patients and 23 healthy female controls; from the GSE199063 dataset, we chose 50 female obese patients and 28 healthy female controls. Table 1 displays the details of the datasets. The data are publicly available and were downloaded from the official platform without the need for further authorization.

Screening of differential genes between the two datasets
Differentially expressed genes in the two datasets were analyzed using GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/), the online analysis tool available on the official GEO website. Volcano plots were downloaded, and the DEGs in each dataset were screened with thresholds of |log2 FC| ≥ 1 and P < 0.05. The DEGs shared by female obesity and female CC were identified and visualized with Venn plots using the Venn online platform (http://bioinformatics.psb.ugent.be/webtools/Venn/). The overlapping DEGs were then carried forward for further analysis.
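
The screening and intersection step can be illustrated with a minimal Python sketch. It assumes each GEO2R result has already been reduced to a mapping from gene symbol to (log2 FC, P value); the gene names and numbers below are made-up placeholders rather than values from GSE44076 or GSE199063, and the function name `screen_degs` is hypothetical.

```python
from typing import Dict, Set, Tuple

# gene symbol -> (log2 fold change, P value); toy values, NOT taken from the GEO datasets
Table = Dict[str, Tuple[float, float]]


def screen_degs(table: Table, lfc_cutoff: float = 1.0, p_cutoff: float = 0.05) -> Set[str]:
    """Return the genes with |log2 FC| >= lfc_cutoff and P < p_cutoff."""
    return {
        gene
        for gene, (log2_fc, p_value) in table.items()
        if abs(log2_fc) >= lfc_cutoff and p_value < p_cutoff
    }


# Placeholder result tables standing in for the obesity and colon cancer comparisons.
obesity_table: Table = {
    "GENE_A": (1.8, 0.001),
    "GENE_B": (-1.2, 0.030),
    "GENE_C": (0.4, 0.002),   # fails the fold-change cutoff
    "GENE_D": (2.1, 0.200),   # fails the P value cutoff
}
cancer_table: Table = {
    "GENE_A": (2.5, 0.0004),
    "GENE_B": (-0.3, 0.010),  # fails the fold-change cutoff
    "GENE_D": (-1.6, 0.001),
    "GENE_E": (1.1, 0.040),
}

obesity_degs = screen_degs(obesity_table)
cancer_degs = screen_degs(cancer_table)

# The overlapping DEGs are the set intersection of the two screened lists.
shared_degs = sorted(obesity_degs & cancer_degs)
print(shared_degs)
```

The absolute value keeps both up-regulated and down-regulated genes, so a gene that changes in opposite directions in the two datasets would still count as shared under this sketch.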

Functional enrichment analysis of DEGs
Using the clusterProfiler package in R, the overlapping DEGs were analyzed for GO enrichment in three categories: biological process (BP), cellular component (CC) and molecular function (MF). An adjusted P value < 0.05 was considered significant.

Pathway enrichment analysis of DEGs
KEGG pathway enrichment analysis of the overlapping DEGs was performed using the clusterProfiler package in R, with an adjusted P value < 0.05 considered significant.
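
For readers unfamiliar with how an adjusted P value arises in GO or KEGG enrichment, the sketch below shows a generic over-representation test: a one-sided hypergeometric test per term, followed by Benjamini-Hochberg adjustment. It is an illustration under stated assumptions, not the exact procedure run in this study; the background size, list size and term counts are placeholders, and the function names are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N background genes, K of which carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy inputs (assumptions for illustration only).
BACKGROUND_SIZE = 1000   # number of annotated background genes
DEG_LIST_SIZE = 50       # number of genes in the query list
# term -> (background genes annotated to the term, query genes annotated to the term)
toy_terms: Dict[str, Tuple[int, int]] = {
    "term_1": (40, 10),
    "term_2": (100, 6),
    "term_3": (20, 1),
}

names = list(toy_terms)
raw = [
    hypergeom_pvalue(hits, DEG_LIST_SIZE, size, BACKGROUND_SIZE)
    for size, hits in toy_terms.values()
]
adjusted = benjamini_hochberg(raw)
significant = [name for name, p in zip(names, adjusted) if p < 0.05]
print(significant)
```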

Protein interaction network analysis of DEGs
To further examine the relationships between DEGs, the list of overlapping DEGs was imported into the STRING database (http://string-db.org). Interactions with an interaction score of at least 0.4 were deemed significant, and the resulting list of protein interactions was downloaded and imported into Cytoscape software to display and analyze the PPI network. Key modules of the PPI network were screened using the Cytoscape plug-in Molecular Complex Detection (MCODE). The PPI network was also screened for hub genes using the degree centrality method in the CytoNCA plug-in. The 14 genes identified by both plug-ins were carried forward to the next step of the study.
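
The score filter and a centrality-based ranking can be sketched as follows. This is a simplified illustration that assumes node degree is the centrality measure; it does not reproduce MCODE clustering or the CytoNCA implementation. The protein names and scores are placeholders, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (protein A, protein B, interaction score between 0 and 1)


def filter_edges(edges: List[Edge], min_score: float = 0.4) -> List[Edge]:
    """Keep interactions whose score is at least min_score."""
    return [edge for edge in edges if edge[2] >= min_score]


def degree_centrality(edges: List[Edge]) -> Dict[str, int]:
    """Count the number of interaction partners of each protein."""
    degree: Dict[str, int] = defaultdict(int)
    for protein_a, protein_b, _score in edges:
        degree[protein_a] += 1
        degree[protein_b] += 1
    return dict(degree)


def top_hubs(degree: Dict[str, int], n: int) -> List[str]:
    """Return the n highest-degree proteins, breaking ties alphabetically."""
    return sorted(degree, key=lambda protein: (-degree[protein], protein))[:n]


# Placeholder interactions, NOT taken from STRING.
toy_edges: List[Edge] = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.70),
    ("P1", "P4", 0.45),
    ("P2", "P3", 0.50),
    ("P4", "P5", 0.20),  # dropped: below the 0.4 cutoff
]

kept_edges = filter_edges(toy_edges)
degree = degree_centrality(kept_edges)
print(top_hubs(degree, 2))
```

Requiring a gene to appear in both the module-based and the centrality-based result, as described above, would then be a set intersection of the two candidate lists.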

MiRNAs and transcription factors associated with hub genes
The miRNA-gene, transcription factor-gene, and miRNA-transcription factor-gene networks of the hub genes were constructed using NetworkAnalyst (version 3.0, available at https://www.networkanalyst.ca/). Cytoscape software was used to visualize and analyze the data.

Identification of DEGs in obesity and colon cancer
We retrieved the GSE199063 dataset on obesity and the GSE44076 dataset on CC from the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/). After applying the screening thresholds of P value < 0.05 and |log2 FC| > 1.0, 355 DEGs were found in the GSE199063 dataset and 3135 DEGs in the GSE44076 dataset. The results were visualized as volcano plots using the GEO2R web tool. Intersecting the two DEG lists yielded 146 overlapping genes. The results are shown in Figure 1.

GO analysis of overlapping DEGs
To understand the function of the overlapping DEGs, we performed GO enrichment analysis using the clusterProfiler package in R. GO enrichment covers three categories: molecular function (MF), biological process (BP) and cellular component (CC). After screening at an adjusted P < 0.05, the following results were obtained. BP terms mainly include positive regulation of cell adhesion, positive regulation of leukocyte activation, positive regulation of cell activation, positive regulation of lymphocyte activation, regulation of leukocyte cell-cell adhesion, positive regulation of leukocyte cell-cell adhesion, positive regulation of T cell activation, and positive regulation of T cell adhesion. CC terms mainly include collagen-containing extracellular matrix, external side of plasma membrane, endoplasmic reticulum lumen, secretory granule membrane, collagen trimer, basement membrane, MHC class II protein complex, and MHC protein complex. MF functions mainly

GSE44076: normal adjacent mucosa and tumor samples from 98 individuals and 50 healthy colon mucosae. However, we only selected female patients who met the criteria for our study. GSE199063: adipose tissue from a cohort of obese women. Adipose tissue biopsies were obtained from the control group (n = 28), before RYGB (n = 50), and then 2 (n = 49) and 5 years (n = 38) thereafter.

KEGG analysis of overlapping DEGs
To understand the pathways enriched among the overlapping DEGs, we performed KEGG enrichment analysis using the clusterProfiler package in R.

FIGURE 1
Volcano and Venn plots of differentially expressed genes. (A) Genes differentially expressed between obese patients and controls in the GSE199063 dataset (red dots represent up-regulated genes, blue dots represent down-regulated genes). (B) Genes differentially expressed between colon cancer patients and healthy controls in the GSE44076 dataset (red dots represent up-regulated genes, blue dots represent down-regulated genes). (C) Number of differentially expressed genes overlapping between the two datasets. CC, colon cancer.

FIGURE 2
Bubble plot of GO enrichment analysis results. The x-axis indicates the proportion of genes per functional term. The y-axis indicates the annotated terms of gene enrichment. The circle size represents the number of genes: the larger the circle, the higher the number of genes. The circle color represents the adjusted P value: the redder the color, the higher the degree of gene enrichment. MF, Molecular Function; BP, Biological Process; CC, Cellular Component.

Target transcription factors prediction
To predict the transcription factors targeting the hub genes, we used the NetworkAnalyst database. The results were imported into Cytoscape software for visual analysis. The predicted transcription factors of the hub genes are shown in Figure 7 and Table 2. The transcription factor WRNIP1 interacts with 5 hub genes: CD44, CD53, IL2RG, IGF1, and CXCR4. The transcription factors ATF1, CBFB and NR2F6 each interact with 4 hub genes: ATF1 interacts with FCGR3A, TYROBP, IGF1 and FN1; CBFB interacts with CD44, TYROBP, IGF1 and CXCR4; and NR2F6 interacts with IGF1, CCND1, CD53 and FN1. The transcription factor ZNF24 interacts with CD44, TYROBP and CD53. The transcription factor NR2C2 interacts with IL2RG, TYROBP and CD53. The transcription factor SMARCE1 interacts with IL2RG, IGF1 and FCGR3A. The transcription factor GABPA

Bubble plot of KEGG enrichment analysis results. The x-axis indicates the proportion of genes per functional term. The y-axis indicates the annotated terms of gene enrichment. The circle size represents the number of genes: the larger the circle, the higher the number of genes. The circle color represents the adjusted P value: the redder the color, the higher the degree of gene enrichment.

The protein interaction network obtained by analyzing the PPI network with the Cytoscape plug-in CytoNCA. Circles represent proteins and lines connect interacting proteins.

Target miRNAs prediction
To predict the miRNAs targeting the hub genes, we used the NetworkAnalyst database. The results were imported into Cytoscape software for visual analysis. The predicted miRNAs of the hub genes are shown in Figure 8 and Table 3.

FIGURE 7
Transcription factor-gene network of 12 hub genes. Red circles represent hub genes and blue diamonds represent transcription factors.

The protein interaction network obtained by analyzing the PPI network with the Cytoscape plug-in MCODE. (A-D) represent the four sub-modules obtained by the MCODE plug-in.

Transcription factors and miRNA network construction of hub genes
To build the miRNA-transcription factor-hub gene network, we used the NetworkAnalyst database. Cytoscape software was used to visualize and analyze the results (Figure 9).
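
As a small illustration of how such a regulatory network can be summarized, the Python sketch below encodes only the transcription factor-gene interactions that are listed in full in the text for seven transcription factors, then counts the connections per transcription factor and per hub gene. miRNA edges are omitted because they are not enumerated in the text, and the variable names are hypothetical. It is a sketch of the bookkeeping, not the NetworkAnalyst or Cytoscape workflow.

```python
from collections import Counter
from typing import Dict, List

# Transcription factor -> hub genes, as listed in the text for these seven factors.
tf_targets: Dict[str, List[str]] = {
    "WRNIP1": ["CD44", "CD53", "IL2RG", "IGF1", "CXCR4"],
    "ATF1": ["FCGR3A", "TYROBP", "IGF1", "FN1"],
    "CBFB": ["CD44", "TYROBP", "IGF1", "CXCR4"],
    "NR2F6": ["IGF1", "CCND1", "CD53", "FN1"],
    "ZNF24": ["CD44", "TYROBP", "CD53"],
    "NR2C2": ["IL2RG", "TYROBP", "CD53"],
    "SMARCE1": ["IL2RG", "IGF1", "FCGR3A"],
}

# Number of hub genes connected to each transcription factor.
tf_degree = {tf: len(genes) for tf, genes in tf_targets.items()}

# Number of the listed transcription factors connected to each hub gene.
gene_degree = Counter(gene for genes in tf_targets.values() for gene in genes)

# Flat edge list, the usual input format for network visualization tools.
edges = [(tf, gene) for tf, genes in tf_targets.items() for gene in genes]

for tf, count in sorted(tf_degree.items(), key=lambda item: (-item[1], item[0])):
    print(tf, count)
for gene, count in gene_degree.most_common():
    print(gene, count)
```

Within this subset of edges, counting connections in both directions shows which transcription factors and which hub genes are the most densely connected nodes.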

Discussion
In practically all developing and developed nations, the prevalence of overweight and obesity has increased substantially. In developed nations, it has reached 60-70% of the adult population and is particularly prevalent in cities and among women. Undoubtedly, this places a significant burden on the global economy (28). Obesity-related endocrine disturbance is significantly linked to the onset of cancer, and earlier research has found a connection between obesity and CC in males. However, how CC is related to obesity in females is still not entirely clear. Therefore, it is very important to investigate any potential biological processes that link female obesity to the development of CC. In the GSE44076 dataset used in this study, female CC patients had an average age of 69.63 ± 9.44 years. These female patients were diagnosed with CC during the menopausal stage. Endocrine disturbance and the chronic inflammatory state brought on by weight changes in these patients may be directly related to the onset of CC at this period.
In this study, we downloaded two microarray datasets from the GEO website and performed a series of bioinformatic analyses on obese women and women with CC. We identified 146 DEGs shared by the two datasets. GO and KEGG analysis showed that these genes were primarily enriched in inflammatory and immune-related pathways. These pathways were tightly linked to CC development and to the endocrine dysregulation brought on by obesity. We also discovered 14 hub genes, TYROBP, CD44, BGN, FCGR3A, CD53, CXCR4, FN1, SPP1, IGF1, CCND1, MMP9, IL2RG, IL6 and CTGF, which were crucial in the development of CC and obesity. Then, for these 14 hub genes, we created the miRNA-gene-transcription factor regulatory networks.

FIGURE 8
miRNA-gene network of 9 hub genes. Red circles represent hub genes and yellow diamonds represent miRNAs.

TABLE 3
The microRNAs of identified hub genes.

In conclusion, our research discovered a number of critical genes. The miRNA-gene-transcription factor regulatory networks play a role in the pathophysiology of female obesity and female CC, making them attractive diagnostic and therapeutic targets for CC in the setting of female obesity.

Conclusion
Our findings show that females with obesity and females with CC have shared genes, which suggests that obesity and CC may be genetically connected in female individuals. Between female patients with obesity and female patients with CC, we discovered 14 hub genes and built a transcription factor-miRNA-gene network comprising these hub genes. The transcription factors WRNIP1, ATF1, CBFB, and NR2F6 were identified in our study. Furthermore, hsa-mir-1-3p, hsa-mir-26b-5p, hsa-mir-164a-5p, and hsa-mir-9-5p are implicated in the development of female obesity and female colon cancer via inflammatory and immune-related pathways. Our findings could help with future mechanistic research and drug target prediction.
YT reviewed and edited the manuscript. All authors contributed to the article and approved the submitted version.

FIGURE 4
Network diagram of KEGG enrichment analysis results. The line segments connect genes and enrichment pathways, and different colors represent different enrichment pathways. The size of each circle represents the number of connected line segments: the larger the circle, the more genes and pathways are connected. Gray circles represent genes and yellow circles represent pathways.

TABLE 1
Details of the two dataset clusters.

TABLE 2
The transcription factors of identified hub genes.