PLCE1 Polymorphisms Are Associated With Gastric Cancer Risk: The Changes in Protein Spatial Structure May Play a Potential Role

Background Gastric cancer (GC) is one of the most significant health problems worldwide. Some studies have reported associations between phospholipase C epsilon 1 (PLCE1) single-nucleotide polymorphisms (SNPs) and GC susceptibility, but the relationship between PLCE1 and GC prognosis has rarely been explored, and the underlying mechanisms have not been fully elucidated. This study aimed to further explore the possible mechanism of the association between PLCE1 polymorphisms and GC.
Materials and Methods A case-control study of 588 GC patients and 703 healthy controls from the Chinese Han population was performed to investigate the association between PLCE1 SNPs and GC risk by logistic regression under multiple genetic models. The prognostic value of PLCE1 in GC was evaluated with the Kaplan-Meier plotter. To explore the potential functions of PLCE1, various bioinformatics analyses were conducted. Furthermore, we constructed the spatial structure of the PLCE1 protein by homology modeling to analyze its mutations.
Results Rs3765524 C > T, rs2274223 A > G, and rs3781264 T > C in PLCE1 were associated with an increased risk of GC. Overall survival and progression-free survival were significantly worse in patients with high PLCE1 expression than in those with low expression [HR (95% CI) = 1.38 (1.1–1.63), P < 0.01; HR (95% CI) = 1.4 (1.07–1.84), P = 0.01]. Bioinformatic analysis revealed that PLCE1 was associated with protein phosphorylation and played a crucial role in the calcium signaling pathway. Homology modeling of the PLCE1 protein identified two important functional domains, a catalytic binding pocket and a calcium ion binding pocket; the rs3765524 polymorphism could change the efficiency of the former, and the rs2274223 polymorphism affected the activity of the latter, which together may play a potentially significant role in the tumorigenesis and prognosis of GC.
Conclusion Patients with high expression of PLCE1 had a poor prognosis in GC, and SNPs in PLCE1 were associated with GC risk, which might be related to the changes in spatial structure of the protein, especially the variation of the efficiency of PLCE1 in the calcium signal pathway.


INTRODUCTION
Gastric cancer (GC) is a growing worldwide problem that severely endangers human life and health. It was estimated that over one million new GC cases occurred in 2018 and about 783,000 patients died of the disease, making GC the fifth most frequently diagnosed cancer and the third deadliest cancer worldwide (Bray et al., 2018). China has a large number of GC patients, with a 5-year overall survival (OS) of less than 25% (Chen et al., 2016;Zeng et al., 2018). The pathogenesis of GC remains unclear, but some risk factors have been reported, such as Helicobacter pylori infection (Shimizu et al., 2014;Plummer et al., 2015;Jukic et al., 2021), Epstein-Barr virus infection (Camargo et al., 2014), low consumption of vegetables and fruits, high intake of salt and pickled foods, smoking, and obesity (Lunet et al., 2007;Lin et al., 2014;Li et al., 2019). However, these findings are far from sufficient to explain the oncogenesis and susceptibility mechanisms of GC.
In recent years, genomic analysis of gastric tumors has highlighted the importance of their genetic heterogeneity, and distinguishing GC molecular subtypes may be the key to guiding early diagnosis strategies, identifying new therapeutic targets, and predicting patient prognosis. In the last decade, single nucleotide polymorphism (SNP) analysis has been extensively used to screen candidate genes in various complex human diseases, providing a way to identify genetic loci associated with the heterogeneity of cancers.
The phospholipase C epsilon 1 (PLCE1) gene is one of the large-scale candidate genes located at 10q23 and is a member of the human phosphoinositide-specific phospholipase C family (Song et al., 2001), which plays an important role in growth, differentiation, and oncogenesis (Citro et al., 2007;Bunney and Katan, 2010;Gresset et al., 2012). The most frequently reported SNPs in PLCE1 are rs2274223 and rs3765524, which have been associated with an increased risk of gastrointestinal tumors (Cui et al., 2014a;Mocellin et al., 2015;Mou et al., 2015;Xue et al., 2015;He et al., 2016;Gu et al., 2018). However, studies of the association between PLCE1 and GC susceptibility remain inconsistent, the prognostic value of PLCE1 in GC is unclear, and the specific mechanism linking the SNPs to GC risk is still elusive. Thus, further studies are necessary.
This study first analyzed the relationship between three SNPs (rs3765524, rs2274223, and rs3781264) in the PLCE1 gene and GC susceptibility in a case-control study in the Chinese Han population; we then explored the prognostic value of PLCE1 in GC using online databases; finally, we tried to explain the mechanism linking the SNPs in PLCE1 to the risk and prognosis of GC from the perspective of various bioinformatics analyses and protein spatial structure changes. We hope to contribute to further exploration of the possible mechanism of the association between PLCE1 polymorphisms and GC.

Study Population
A case-control study was conducted, including 588 patients with GC (392 males and 196 females) and 703 healthy control subjects (396 males and 307 females). All subjects were of Chinese Han descent. Patients with histologically confirmed GC at the Second Affiliated Hospital of Air Force Medical University from January 2015 to January 2019 were enrolled. The exclusion criteria for patients were: a family history (three generations) of tumors; radiotherapy or chemotherapy before blood sample collection; and any other digestive disease or GC caused by metastasis from another cancer. The healthy controls were randomly recruited from the physical examination center of the same hospital during the same period, when they visited for an annual health examination. When recruiting healthy participants, trained personnel collected demographic information through personal interviews with a structured questionnaire, including age, gender, residential region, ethnicity, and family history of cancer and other diseases. Healthy participants who had a family history of cancer were also excluded from the study. We then collected 5 mL of peripheral blood from each subject to detect the SNPs of the PLCE1 gene. All participants were voluntarily recruited and provided written informed consent before taking part in this study. All research analyses were performed following the approved guidelines and regulations. This study was approved by the Research Ethics Committee of the Second Affiliated Hospital of Air Force Medical University (K201501-05) and abided by the Declaration of Helsinki.

Genotyping
Agena MassARRAY Assay Design 4.0 software was used to design the multiplexed SNP MassEXTEND assay. The PLCE1 gene rs3765524, rs2274223, and rs3781264 polymorphisms were genotyped on the Agena MassARRAY RS1000 platform according to the standard protocol (Applied Biosystems, Foster City, CA, United States). Then, Agena Typer 4.0 software was used to analyze and manage the data.

Bioinformatics Analysis
The Prognostic Value of PLCE1 in GC
The Kaplan-Meier (K-M) plotter was used to evaluate the prognostic value of PLCE1 mRNA expression in GC patients. Patients were divided into high- and low-expression groups according to the median mRNA expression value and compared by K-M survival curves, with the hazard ratio (HR), 95% confidence intervals (CIs), and log-rank P-value.
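The median split and survival curve described above can be sketched in a few lines. This is a minimal illustration only, not the Kaplan-Meier plotter's own implementation; the function names and the toy inputs are assumptions made for the example.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy data (not from the study).
groups = split_by_median([1.0, 2.0, 3.0, 4.0])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In practice the two groups returned by `split_by_median` would each get their own curve, and the log-rank test would compare them.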

PLCE1 Associated Genes Screening and Enrichment Analysis
The STRING database (Szklarczyk et al., 2019) was applied to detect genes co-expressed with PLCE1 in GC, and Cytoscape software (Smoot et al., 2011)

Protein Homology-Modeling and Visualization
The amino acid (aa) sequence of the PLCE1 protein was obtained from NCBI. We used SWISS-MODEL to perform homology modeling of the PLCE1 protein from its primary sequence (Schwede et al., 2003;Waterhouse et al., 2018). The protein with the highest coverage of the primary sequence was selected as the most homologous protein. We downloaded the files of the constructed protein spatial structures from SWISS-MODEL and opened them in PyMOL version 2.4 for visualization, as the basis for the spatial structure analysis of the PLCE1 protein (Arroyuelo et al., 2016;Yuan et al., 2016).

Statistical Analysis
SPSS 26 (IBM SPSS Statistics, RRID:SCR_019096) software was used to analyze the general characteristics of the GC patients and healthy controls. Welch's t-test and the Pearson Chi-square test were applied to analyze differences in basic characteristics between the two groups. The Pearson Chi-square test was also used to assess deviation from Hardy-Weinberg equilibrium (HWE) by comparing the observed and expected genotype frequencies among the control subjects. Allele and genotype frequencies were compared between GC patients and healthy controls using the Pearson Chi-square test and Fisher's exact test. To evaluate the associations between PLCE1 SNPs and the risk of GC, we calculated odds ratios (ORs) and 95% confidence intervals (CIs) adjusted for gender and age. Three genetic models were applied (the codominant, dominant, and recessive models) using PLINK software (PLINK, version 2.0, RRID:SCR_001757). A P-value < 0.05 was considered statistically significant in all statistical tests in this study.
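The HWE check mentioned above compares observed genotype counts with those expected from the allele frequencies. The sketch below shows one common form of the Pearson Chi-square version with 1 degree of freedom; it is an illustration and not the authors' SPSS or PLINK procedure, and the function name and example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test for Hardy-Weinberg equilibrium (1 df).

    n_aa, n_ab, n_bb -- observed counts of the three genotypes.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    if p == 0 or q == 0:
        raise ValueError("monomorphic locus: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control-group counts (not from the study).
chi2, p_value = hwe_chi_square(25, 50, 25)
```

A P-value above 0.05 in the control group, as required in this study, means the genotype frequencies show no significant deviation from HWE.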

Demographic Characteristics
The primary characteristics of all subjects are shown in Table 1. A total of 1,291 participants, including 588 GC patients and 703 healthy controls, were enrolled in this study. The mean age was 58.12 ± 11.66 years in GC patients and 48.57 ± 9.43 years in healthy controls, which indicated that the patients were older than the healthy participants (P < 0.001). Besides, the proportion of males was larger than that of females in the GC group (66.67% vs. 33.33%), while the difference between males and females in the control group was minor (56.33% vs. 43.67%). The difference in these distributions between GC patients and healthy controls suggested that the ORs and P-values needed to be adjusted for age and gender in subsequent analysis. Additionally, most of the participants in the study had an adverse family cancer history (cases, 96.3%; controls, 98.0%). Moreover, nearly one-third (30.3%) of patients were at an early stage (the carcinoma was confined to the gastric mucosa and submucosa).
Table 1 note: P-values were calculated using Welch's t-test/Pearson Chi-square test. SD, standard deviation. *P < 0.05 indicates statistical significance, which was marked in bold.
Frontiers in Genetics | www.frontiersin.org

Genotyping Analysis
Detailed information on the three selected SNPs, including roles, minor allele frequency (MAF), and HWE P-values, is listed in Table 2. All three SNPs were genotyped successfully and included in further analysis. The MAF of every SNP was greater than 5%, and the observed genotype frequencies of all SNPs in the control group were in HWE (P > 0.05).
Table note: ORs and P-values were adjusted by age and gender. OR, odds ratio; CI, confidence interval. *P < 0.05 indicates statistical significance, which was marked in bold.
comparing patients with high-expression (red) and low-expression (black) of PLCE1 in gastric cancer by two probes (205112 and 214159) were plotted using the Kaplan-Meier plotter database according to the threshold of P-value of < 0.05.
Differences in the frequency distribution of SNP genotypes and alleles between GC patients and healthy controls were compared by the Pearson Chi-square test and odds ratios (ORs) to evaluate the associations with GC risk, as displayed in Supplementary Table 1. The minor allele of each SNP was treated as the risk factor and compared with the wild-type (major) allele. Notably, the allele frequency of rs2274223, located in an exon region, was significantly different between GC cases and healthy controls [OR (95% CI) = 1.20 (1.00-1.45), P = 0.048]. Moreover, the genotype distribution of rs3781264, located in an intron region, was also significantly different between the two groups [OR (95% CI) = 1.43 (1.16-1.76), P = 0.001].
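For readers unfamiliar with how an allelic OR and its 95% CI arise from a 2 × 2 table, the sketch below uses the standard Wald interval on the log odds ratio. It is an unadjusted illustration; the adjusted ORs in this study come from logistic regression with age and gender as covariates, which this sketch does not reproduce. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_wald(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Arguments are allele counts (all must be > 0):
      case_minor, case_major -- minor/major allele counts in cases
      ctrl_minor, ctrl_major -- minor/major allele counts in controls
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not from the study).
odds_ratio, lower, upper = odds_ratio_wald(30, 70, 20, 80)
```

An interval whose lower bound is at or just above 1, as reported for rs2274223, corresponds to a P-value close to the 0.05 threshold.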
(2.01-3.16), P < 0.001], which indicated that PLCE1 increased the risk of a poor prognosis in GC patients.

PLCE1 PPI Analysis
We investigated the PPI network of PLCE1 using the STRING website and obtained a core network of 11 nodes and 22 edges with an average node degree of 4 (P = 0.004; Figure 2). The proteins interacting with PLCE1 were PIP5K1A, PIP5K1B, PIP5K1C, PIP5KL1, IPMK, ITPKA, ITPKB, HRAS, RAP2B, and RRAS.
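The reported average node degree follows directly from the node and edge counts: in an undirected network every edge adds one to the degree of each of its two endpoints, so the mean degree is 2E/N. A short check, with a function name chosen here only for illustration:

```python
def average_node_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph: each edge touches two nodes."""
    return 2 * n_edges / n_nodes


# Counts reported for the PLCE1 core network: 11 nodes, 22 edges.
print(average_node_degree(11, 22))  # 4.0
```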

PLCE1 Protein Spatial Structure Changes
We modeled the primary PLCE1 protein by SWISS-MODEL. The original (wild-type) model of PLCE1 is shown in Figure 5A.
FIGURE 3 | The GO enrichment analysis of PLCE1 and its co-expression genes by DAVID database. BP (biological process) was marked in green; CC (cellular component) was in orange; and MF (molecular function) was in purple.
The protein was colored from blue to red, representing the coiled peptide chain from the N- to the C-terminal. We found that the PLCE1 protein had two crucial functional domains, namely the calcium ion binding pocket (related to activity), which is composed of aa sites 1873, 1897, 1926, 1928, and 1933 (red in Figure 5B), and the catalytic binding pocket (related to catalytic efficiency), consisting of aa sites 1391, 1392, 1421, 1423, 1436, 1470, 1637, 1639, 1743, 1770, and 1772 (orange in Figure 5B).
The rs2274223 (A > G) variant changed the aa at site 1927, which may affect the activity of the calcium-binding pocket (yellow in Figure 5C). Similarly, the rs3765524 (T > C) variant changed the aa at site 1771, influencing the catalytic efficiency of the catalytic binding pocket (green in Figure 5C).
Interestingly, in further analysis of the impact of each single aa mutation on the protein microenvironment, we found that Arg1927 in the wild type formed two ionic bonds, with Met1901 and Ser1903, respectively (Figure 5D), making the interaction between the two loops extremely tight. The rs2274223 (A > G) mutation resulted in Arg1927His in the PLCE1 protein (Figure 5E); although the mutant residue still formed ionic bonds with these two aa residues, one of them was located on the loop of site 1927 itself and formed a conjugate bond, making the attraction between the residues stronger than in the wild type. Consequently, the loop at site 1927 would be tighter than before, and the calcium-binding pocket would be more difficult to open after the mutation, leading to decreased PLCE1 protein activity.
Likewise, in the wild type, Ile1771 formed two ionic bonds, with Gln1687 and Val1689, respectively. The interaction with the left loop was tight, but no force "fixed" the loops on the right (Figure 5F), so local dissociation or conformational change in solution would be easier, allowing the substrate to enter the active center readily. However, rs3765524 (T > C) led to Ile1771Thr, which formed four ionic bonds with four surrounding aa residues (Gln1687, Val1689, Ser1772, and Leu1798), two located on the left loop and two on the right loop, making the local structure more stable, so conformational change of the catalytic pocket appeared more difficult (Figure 5G).
FIGURE 4 | The KEGG enrichment analysis of PLCE1 and its co-expression genes by DAVID database. The size of the circle represents the number of genes enriched; the larger the circle, the more genes were enriched. From orange to blue, -log10 (P-value) gradually decreased.
These variations, combined with the results of the bioinformatics analysis, indicated that SNPs in PLCE1 could change the catalytic activity of the protein in Ca2+-related pathways, so more substrates (such as Ca2+) might be required to perform normal functions, which will be verified in our future studies.

DISCUSSION
As a common type of genetic variation in the human genome, the SNP is useful for understanding the possible relationships between tumors and individuals' biological functions on a genomic scale. It provides a comprehensive tool for identifying candidate cancer genes, offering fundamental knowledge for clinical diagnosis and supporting drug discovery for relevant genetic diseases; therefore, SNPs are considered valuable biological markers in diverse tumors (Engle et al., 2006).
Protein is an indispensable carrier of various biological activities and plays a crucial role in diverse life processes. The primary structure of a protein is its aa sequence, which is derived from gene transcription and translation. It is the basis of the higher-order structure of a protein and determines its spatial structure and functional properties. When a SNP is present in a gene, the expressed aa sequence may change, resulting in a change in the spatial structure of the protein. Therefore, it is important to study the relationship between SNPs and GC risk from the perspective of protein spatial structure changes, which will contribute to research on the pathogenesis and prognosis of GC.
In this study, for the first time, we analyzed the correlation between SNPs and GC susceptibility and prognosis in terms of protein spatial structure changes. Firstly, we carried out a case-control study, and by detecting and analyzing the differences in PLCE1 SNPs between GC patients and healthy controls, we found that rs3765524 (C > T), rs2274223 (A > G), and rs3781264 (T > C) were related to the susceptibility of GC. Then, the K-M plotter demonstrated that high expression of PLCE1 was associated with poor survival in GC. To explore the potential function of PLCE1, we used a series of bioinformatics tools to investigate its PPI network and GO and KEGG enrichment, and found that it played a potential role in the calcium signaling pathway.
FIGURE 5 | (D) The wild-type protein microenvironment analysis of PLCE1 at aa site 1927. Arg1927 formed two ionic bonds with Met1901 and Ser1903, respectively, making the force between the two loops very tight. (E) The microenvironment analysis of the mutation (rs2274223 A > G) of PLCE1. The Arg1927His mutant of PLCE1 formed ionic bonds with these two aa residues; one of them was located on the loop of site 1927 itself and formed a conjugate bond, making the attraction between the residues stronger than in the wild type. (F) The wild-type protein microenvironment analysis of PLCE1 at aa site 1771. Ile1771 formed two ionic bonds with Gln1687 and Val1689 residues, respectively. (G) The microenvironment analysis of the mutation (rs3765524 T > C) of PLCE1. The Ile1771Thr mutant formed four ionic bonds with Gln1687, Val1689, Ser1772, and Leu1798, two of which were located on the left loop and the others on the right loop, making the local structure more stable.
Furthermore, we constructed the primary and mutant protein spatial structures of PLCE1 by homology modeling, and interestingly, we found that the changes in the protein spatial structure could reduce the catalytic activity, which might mainly influence its function in Ca2+-related pathways. Combined with the bioinformatic results for PLCE1, we speculated that PLCE1 polymorphisms increase GC susceptibility by changing the spatial structure of the PLCE1 protein, affecting its activity and catalytic efficiency in the calcium signaling pathway. This hypothesis will be verified in our future experiments.
PLCE1 encodes a member of the phospholipase C family of enzymes, which mediates the hydrolysis of phosphatidylinositol-4,5-bisphosphate to produce the Ca2+-mobilizing second messenger inositol 1,4,5-trisphosphate and the protein kinase C-activating second messenger diacylglycerol. It interacts with the proto-oncogene Ras, among other proteins (Bunney et al., 2009).
The expression of PLCE1 was significantly related to tumor differentiation degree, invasion depth, lymph node metastasis and distant metastasis (Cui et al., 2014b;Cheng et al., 2017;Yu et al., 2020).
We confirmed the significance of the two previously reported SNPs, rs3765524 and rs2274223, and showed through genotyping and logistic regression in this case-control study that another SNP in PLCE1, rs3781264, was associated with GC risk. Abnet et al. (2010) first used GWAS to identify that variants of PLCE1 were significantly correlated with GC in the Chinese Han population. Since then, an increasing number of studies have identified shared susceptibility loci in PLCE1, such as rs2274223 A > G and rs3765524 C > T, for gastrointestinal cancer (Abnet et al., 2010;Umar et al., 2013;Cui et al., 2014b;Liu et al., 2014;Malik et al., 2014;Mocellin et al., 2015;He et al., 2016;Gu et al., 2018;Li et al., 2018;Liang et al., 2019;Xie et al., 2020), with rs2274223 being the most frequently reported, but the conclusions lack consistency. A meta-analysis showed that the PLCE1 rs2274223 polymorphism conferred susceptibility to esophageal cancer and GC in Asians (Umar et al., 2013). However, another study suggested that, among Asian ethnic groups, an increased risk associated with rs2274223 could be observed only for esophageal cancer and not for GC (Xue et al., 2015). The discrepancy probably results from considerable heterogeneity among these studies as well as gene-gene and gene-environment interactions. A study (Liang et al., 2019) also supported our hypothesis at the protein level by immunohistochemistry (IHC), showing that PLCE1 protein expression was higher in the rs3765524 CT/TT group than in the rs3765524 CC group. Additionally, our study showed that rs3781264, located in an intron region, had a potential relationship with GC risk, which has scarcely been reported before. Hitherto, most previous studies have focused on the correlation between gene SNPs and cancer susceptibility or risk without exploring the mechanism further.
Currently, the diagnosis, treatment, and prognosis of GC are usually based on a risk stratification system. The most effective curative option for GC patients is timely and adequate surgical resection (Lutz et al., 2012). In addition, chemotherapy as a second-line treatment can improve overall survival (Kang et al., 2012). Although the carcinogenesis of GC is partly understood, early diagnosis and appropriate therapy for GC patients remain a major clinical challenge (Choi et al., 2003;Ang and Fock, 2014). It is essential to identify individuals at high risk of GC; thus, more precise gene loci associated with it should be explored. In this study, the K-M plotter analysis was performed in the online bioinformatics database, and both probes showed that patients with high mRNA expression of PLCE1 had a poorer prognosis. This suggests that PLCE1 might have the potential to be a biomarker for the prognosis of GC.
The function of a protein is largely determined by its spatial structure, which is an indispensable part of protein research. In this study, we analyzed the changes in PLCE1 protein spatial structure after mutations by homology modeling, and we found two important functional domains, a calcium-binding pocket related to protein activity and a catalytic binding pocket related to catalytic efficiency, which had not been reported before. Interestingly, the two SNP sites we focused on, rs2274223 and rs3765524, were located in these important domains. The mutation in rs2274223 affected the Ca2+ binding pocket, altering its Ca2+-related activity, and the T > C change in rs3765524 decreased catalytic efficiency. Together, these changes altered the structure, stability, and function of the PLCE1 protein. Therefore, we suppose that SNPs of PLCE1 may have potential significance in the tumorigenesis and progression of GC, perhaps mainly attributable to changes in protein activity, but further studies are needed to confirm this.
In summary, this study is the first to analyze the correlation between SNPs of PLCE1 and GC in terms of protein spatial structure changes, which may be of significance for the diagnosis and treatment of patients with GC. The more complex connections and subtle crosstalk will be examined in our future work, for which experiments are already under way.
There were some limitations in this study. Firstly, we selected only three SNPs of PLCE1, and other potentially significant loci were not included in this case-control study. Secondly, the prognostic value of PLCE1 was investigated in patients from the online database rather than in the subjects included in our study, which probably introduced background heterogeneity. Thirdly, the proposed mechanism in the tumorigenesis and progression of GC was based on the bioinformatic results and the protein homology modeling analysis but lacks experimental verification. Therefore, in vitro and in vivo studies are needed and will be performed in the future to confirm our results, and we hope to contribute to the era of precise diagnosis and individualized treatment of GC.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Research Ethics Committee of The Second Affiliated Hospital of Air Force Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XH, LYu, and GB designed the research. ZY, SC, JX, and SD performed the study. XH, JJ, SP, PY, LYu, and LYa analyzed the results. XH and JJ edited and commented on the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We would like to acknowledge all the participants involved in this study.