Rare variants confer shared susceptibility to gastrointestinal tract cancer risk

Background Cancers arising within the gastrointestinal tract are complex disorders involving genetic events that cause the conversion of normal tissue to premalignant lesions and malignancy. Shared genetic features are reported in epithelial-based gastrointestinal cancers which indicate common susceptibility among this group of malignancies. In addition, the contribution of rare variants may constitute parts of genetic susceptibility. Methods A cross-cancer analysis of 38,171 shared rare genetic variants from genome-wide association assays was conducted, which included data from 3,194 cases and 1,455 controls across three cancer sites (esophageal, gastric and colorectal). The SNP-level association was performed by multivariate logistic regression analyses for single cancer, followed by association analysis for SubSETs (ASSET) to adjust the bias of overlapping controls. Gene-level analyses were conducted by SKAT-O, with multiple comparison adjustments by false discovery rate (FDR). Based on the significant genes indicated by SKATO analysis, pathways analysis was conducted using Gene Ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases. Results Meta-analysis in three gastrointestinal (GI) cancers identified 13 novel susceptibility loci that reached genome-wide significance (P ASSET< 5×10-8). SKAT-O analysis revealed EXOC6, LRP5L and MIR1263/LINC01324 to be significant genes shared by GI cancers (P adj<0.05, P FDR<0.05). Furthermore, GO pathway analysis identified significant enrichment of synaptic transmission and neuron development pathways shared by all three cancer types. Conclusion Rare variants and the corresponding genes potentially contribute to shared susceptibility in different GI cancer types. The discovery of these novel variants and genes offers new insights for the carcinogenic mechanisms and missing heritability of GI cancers.


Introduction
Gastrointestinal (GI) cancers represent the most commonly diagnosed cancer types globally, a group of cancers exhibiting the highest incidence burden (1). According to GLOBOCAN 2020, GI cancers account for approximately 18.8% of cancer incidents and 22.6% of cancer deaths worldwide, and contributing to major public health burdens (2). The incidence of GI cancers also varies by country and region. China has far more new cases of GI cancers than the rest of the world, accounting for 29.7% and 32.0% of all cancer cases and deaths in the region in 2020 (2,3). The particular high incidents were seen in esophageal and gastric cancer; both accounted for about half of the new cases in the world in 2020 (esophageal cancer: 53.7%, gastric cancer: 44.0%) (2). Although the incidence of colorectal cancer in China is moderate comparing to high-incidence country, it is still higher than global average along with a rising trend over the past few years (4). It is established that genetic and lifestyle factors play vital roles in the risk of cancer. Family studies and population-based studies also revealed genetic components contributing to cancer susceptibility, possibly by modifying risk factors induced by environmental carcinogens or altering the functions of critical regulatory pathways (5).
It is estimated that approximately 5% of GI cancers incidents are attributed to inherited genetic mutations with familial aggregation characteristics (6), and an additional 20-25% are estimated to have hereditary components which yet to be established. Most of these cancers develop from sporadic events, with diverse arrays of genetic variations contributing to cancer susceptibility. Tremendous efforts have been done in the search for missing heritability; in particular genome-wide association studies (GWAS) and next-generation sequencing (NGS) technology have been performed to identify the effects of common and rare variants across various cancer types (6). GWASs have been instrumental in deciphering the effects of common variants (Minor Allele Frequency (MAF) > 1%) on the carcinogenesis of single cancer (7)(8)(9). However, the confirmed candidate causal variants so far only explain for a small proportion of the heritability. The effects of rare variants (MAF<1%) have been relatively less investigated. Based on the established heritability for GI cancers, more evidence suggests that rare variants exhibiting large effect sizes still remain to be discovered. These variants displaying high penetrance may have more profound clinical manifestations (10). The aggregation of rare variants in predisposition genes can lead to changes in gene expression or function (11), and altered functions in biological pathways may be pivotal in the carcinogenetic process of GI cancers.
Discerning the molecular mechanisms of sporadic GI cancers is complex. In addition to the effort in single cancer research, shared genetic risk factors have been reported in various cancer groups, and target genes with potential cross-cancer roles have been identified at specific genetic loci. For instance, shared mechanisms of carcinogenesis were reported in hormone-sensitive and obesityrelated cancers (12,13). Common variants in epigenetic and vitamin D metabolism genes were reported for the susceptibility of a subset of GI cancer (14,15). A pan-cancer study suggested common genes were shared between 12 cancer types, including gastric cancer (16). The 9p21 region have been associated with multiple tumors including esophageal squamous cell carcinoma (ESCC) and endometrial cancer (17,18). However, the effects of shared rare variants have been scarcely investigated in the literature. Based on this background, we conducted a cross-cancer study of rare variants by performing meta-analysis on the genotyping data from genome-wide association arrays on esophageal, gastric and colorectal cancers, and assess the aggregate effects of significant variants on the gene level. We hypothesize that these rare genetic variants may facilitate identification, at genome-wide significance, of risk loci shared among GI cancer, leading to the identification of novel susceptibility genes and pathways contributing to carcinogenesis, which may in turn explain part of the missing heritability.

Study population
The ESCC, GC, and CRC patients from Fudan University Shanghai Cancer Center (FUSCC) were derived from published studies (19), which were newly diagnosed and histopathologically confirmed between 2009 and 2011. The control population was enrolled at the same time from a cancer-free healthy Han individuals (n=1455) in Taizhou Longitudinal cohort (TZL) study from Eastern China (20). The inclusion criteria and exclusion criteria of patients and controls were described in detail previously (21)(22)(23), and patients of each cancer type were matched with the cancer-free controls by sex and age ( ± 5 years). After matching, 1,066 cases and 1,122 controls, 1,069 cases and 1,109 controls, and 1,065 cases and 1,096 controls were included in the ESCC, GC and CRC datasets, respectively ( Figure 1). Blood samples from ESCC, GC, CRC patients and cancer-free controls were provided by the tissue banks of FUSCC and TZL studies, respectively. Each participant donated approximately 10 mL of peripheral blood, from which genomic DNA was extracted. For validation, three independent datasets were recruited, which included Beijing ESCC GWAS dataset (cases/controls:2,031/ 2,044), Chinese GC meta-GWAS (cases/controls:10,254/10,914), Nanjing CRC GWAS (cases/controls: 1,316/2,207). The details of the external validation population have been described in previous studies (8,24,25). All participants provided written informed consent for scientific research use of their biological samples and demographic information, including age, sex and smoking. The study protocol was approved by the Institutional Ethics Review Board of FUSCC.

Genome-wide association scanning and quality control
Genomic DNA was extracted from the blood samples of patients and controls, and genome-wide genotyping was performed by the Infinium ® Global Screening Array-24 v1.0 Beadchip (Illumina, US). Quality control (QC) of the raw genotyping results was conducted in the case-control dataset of each cancer type. The exclusion criteria of SNPs were set as follows: 1) Duplicated variants; 2) Missing rate over 5% between cases and controls; 3) MAF ≥ 1%; 4) SNPs not mapped to autosomal chromosomes; 5) SNPs with genotyping frequency = 0; 6) Deviation from Hardy-Weinberg Equilibrium (HWE) test (P< 1x10 -4 ). The exclusion criteria of ineligible individuals are described as follows: 1) Inconsistent sex between determined genotypes and clinically derived information; 2) First-degree relatives (IBD< 0.5); 3) Missing genotype type > 5%. After QC, 2,158 individuals (1,061 cases/1,097 controls) and 112,413 variants in ESCC dataset, 2,172 individuals (1,068 cases/1,104 controls) and 109,771 variants in GC dataset, 2,114 individuals (1,065 cases/1,049 controls) and 113,867 variants in CRC dataset were retained and Schematic overview of cross-cancer analysis and quality control process.
subjected to further analysis. The details of the QC process are illustrated in the study flow chart (Figure 1).

Variant classification and annotation
After QC, 38,171 genetic variants (MAF< 0.01) were found to be shared among the 3 cancer types. ANNOVAR software was used to annotate these rare variants by their location and characteristics (26). Overall, 35,923 of 38,171 shared variants in three cancer types were successfully annotated and classified by ANNOVAR. We divided the variants into nine categories: missense mutation, nonsense/nonstop mutation, intergenic region mutation, intronic mutation, ribonucleic acid (RNA) mutation, 3' untranslated region (3'UTR) mutation, 5' untranslated region (5'UTR) mutation, splice site mutation, in-frame/ frame-shift insertion/deletion mutation. All variants were genotyped by GSA BeadChip array, which is comprised of SNPs or indels. The results of rare variant classification determined by ANNOVAR are presented in Supplementary Figure 1.

Single variant association analysis
First, logistic regression models were applied in 3 cancer datasets separately to assess the association between single variant and the risk of ESCC, GC and CRC. Odds ratio (OR) with 95% confidence interval (CI) and P-values were calculated by additive model adjusted for sex and age. Then meta-analysis of the individual cancer results was conducted to determine the pooled association across these three cancer types, with the significance criteria of genome-wide threshold (P<5×10 -8 ), consistency in risk alleles and odds ratio. The fixed effect model was applied if SNPs passed Cochran's Q test (Q > 0.1), otherwise random effect models were considered. The association results of individual cancer and meta-analysis were presented by Manhattan plot. Genomic inflation statistics (l) were calculated and visualized by quantile-quantile (Q-Q) plot to evaluate population stratification in each dataset. To account for the correlation between datasets due to overlapping controls, we applied Association analysis for SubSETs test (ASSET), which is a subset-based approach to determine the best subset of subjects with maximized test statistics for meta-analysis (27). In our study, all significant SNPs passed Cochran's Q test and showed consistent direction of association in each cancer; therefore, onesided ASSET analysis was conducted with a fixed-effect model (28). In order to identify the effects of gender, age, and smoking on the results of the single variant, we further stratified the population by men, women, age ≤60, age > 60, never smoking and ever smoking, and performed logistic regression analysis in each subgroup. The significant variants identified by meta-analysis were further validated in external datasets from previous reported GWAS.

Gene-based association analysis
Due to the fact that the MAF of rare variants in the population is very low, affected individuals may have different mutation sites or frequencies in a particular genetic region of interest. To account for the low efficiency that traditional single variants association analysis may incur, rare variants can be collapsed or compressed into a set and then examined for inter-set frequency differences between case and control groups (29). A few statistical tests have been developed, including the burden test and Sequence Kernel Association Test (SKAT) (30). The method selection depends on the number of rare mutations with the same direction of action. However, since the genetic model in complex diseases is unknown in practice, it is challenging to select the optimal application method. The optimal sequence kernel association test (SKAT-O) overcomes this problem by including the correlation matrix of the relationship structure of rare variants in the SKAT test (31), with the formula listed as follows: Q b and Q s are the score statistics of the burden test and SKAT test under the null model. The correlation matrix contains a parameter r, and mimics the burden test (when r= 1) or general SKAT (when r= 0). Different genetic structures of different traits correspond to different optimal r values. In practical application, r is obtained through the test procedure, and the weighted average value of SKAT and burden test statistics is calculated. In this study, the SKAT-O test was applied for gene-based association analysis by using the R package 'SKAT' (V2.0.1). For the loci shared by three cancer types, two or more loci located in the same genetic region were combined into a SET (genes) (32). Additionally, the obtained SKAT-O P-value was adjusted by FDR to control the type I error. Genes with FDR-adjusted P-values (P adj ) of less than 0.05 were considered significant.

Pathway enrichment analysis
In order to determine the underlying biological pathways affected by the significant genes identified by SKAT-O analysis, enrichment analysis was performed by Gene Ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases by R package 'clusterProfiler' (V4.4.4). For intergenic variants, both annotated genes were included in pathway analysis with subsequent FDR correction. The total gene lists contained 961, 959 and 1,011 genes in ESCC, GC and CRC dataset, respectively. Due to the presence of overlapping genes in common pathways, Pvalues were adjusted by FDR to control for a low proportion of false positives with the threshold of 0.05. In addition, we calculated Fold Enrichment (FE) by dividing Gene Ratio by Background Ratio to normalize the size of gene set. The shared pathways were identified after pooling the pathway enrichment results of these three GI cancers. Within the shared pathways, we discarded the 'integral component of postsynaptic density membrane' and 'intrinsic component of postsynaptic density membrane' because they represent protein topology, not a cellular component, and were already obsoleted by the newest vision of the database. For pathways which are subsets of larger pathways, only the larger pathways were retained in our results, the subset pathways were discussed when we explored potential mechanisms of association between shared pathways and GI cancer risk.

Statistical analysis
The comparison of demographic characteristics between the case and control groups was performed by Chi-square test or two independent sample Wilcoxon rank sum test. Variant filtering and quality control were conducted by PLINK (V1.9) and R software (V4.2.0). For single variant association analysis, multivariate logistic analyses were performed by PLINK 1.9 adjusted for sex and age, followed by one-sided ASSET using R package 'ASSET' (V2.14.0). Meta-analysis was performed using fixed-effect model followed by Cochran's Q test. Genome-wide statistical significance was set at Pvalue of 5x10 -8 . Manhattan plots and Q-Q plots of each dataset were carried out by R package 'qqman' (V0.1.8). Functional annotation of variants was carried out by ANNOVAR based on human genome hg19 coordinates.

Population characteristics of study subjects
After quality control of the recruited individuals and genetic data, three case-control datasets of esophageal squamous cell carcinoma (ESCC), gastric cancer (GC) and colorectal cancer (CRC) were included in the study (Figure 1). Table 1 summarizes the general demographic characteristics of the study samples included in the three cancer types. The age and sex distribution of cases and controls in the three cancer types were comparable (P > 0.05). In the ESCC and GC datasets, over half of the subjects were under 60 years of age, whereas over 50% of the subjects in CRC dataset were over 60 years old. All three datasets showed a higher percentage of male. The distribution of smoking status was significantly different between cases and controls in all three datasets (P< 0.001). More cases in the ESCC dataset were current smokers (57.7%) compared to the control group (36.6%). The majority of the GC and CRC samples were non-smokers, and 38.5% of the patients in the GC's case group were either smokers or ever-smokers compared to 36.6% in the control group. In the CRC dataset, only 16.2% of the patients were smokers, whereas 32.3% of control samples have smoking history.

Meta-analysis of single variant across GI cancers
Multivariate logistic regression analysis in ESCC, GC and CRC datasets was performed separately with adjustment for sex and age. A total of 29, 28 and 10 variants were significantly associated with ESCC, GC and CRC risk, respectively (P< 5x10 -4 ) (Supplementary Figure 2). The Q-Q plots of each cancer type showed modest deviations of observed P-value from expected P-value (Supplementary Figure 3). (l ESCC =1.029, l GC =1.068 and l CRC =1.069). In order to assess the shared susceptibility loci across GI cancers, we conducted a meta-analysis to combine the association results of three cancers. Based on one-sided Association analysis based on SubSET (ASSET) analysis (Table 2; Figure 2; Supplementary Table 1), 13 novel rare variants were found to reach the genome-wide significance threshold (P ASSET < 5×10 -8 ), with no pleiotropy or high homogeneity across three cancer types (same direction of effect and Q > 0.1). Of the 13 variants, 8 were likely to be risk alleles (OR >1), while 5 may confer protective effects (OR<1). The most significant variant was LRP5L/rs78345670 (A→G) (19), which was also the top hit variant in GC (OR (95% CI) = 11.13 (3.32-37.32), P = 9.50×10 -5 ) and CRC dataset [OR (95% CI) = 6.73 (2.62-17.30), P = 7.67×10 -5 ]. Exonic variant TMEM119/ rs112991728 (G→A) [OR (95% CI) = 0.09 (0.04-0.19), P ASSET = 3.29×10 -9 ) was identified as the most significant protective variant.  The table presents the meta-analysis and ASSET analysis results of ESCC, GC and CRC dataset for newly identified loci reaching genome-wide significance (P<5x10-8). P-values of meta-analysis were derived from fixed-effects models. One-sided ASSET analysis was used to correct the false positive rate caused by overlapping controls in three datasets. a Effect Allele; b MAF of cases in all datasets; c MAF of controls in all datasets; d Multivariate logistic regression in additive models adjusted for sex and age. Chr, chromosome; MAF, minor allele frequency; OR, odds ratio; ASSET, Association analysis based on SubSETs.

Stratified analysis of single variant
The results of stratified analysis showed consistent effects in most subgroups in each cancer type, although the P-values of some variants were not significant due to insufficient number of samples (Supplementary Tables 2-4). After stratifying the population by gender, age and smoking status, the effects of some GI cancerassociated variants seem to differ across different cancers. Age seems to have a specific effect on the ESCC and GC for two variants. For holders of risk allele LRP5L/rs78345670-G, people under age of 60 may have a higher risk for ESCC [OR (95% CI) =13.50 (1.77-103.10), P=0.012], whereas older people (>60 years old) may be more susceptible to GC [OR (95% CI) =18.54 (2.46-139.90), P=0.005]. The opposite effect of age is seen in EXOC6/ rs1339820-A holders and risk of ESCC and GC. For holders of the SLIT2/rs147972672-C allele, females have a 2.5-fold increased risk of ESCC compared to males, whereas males are at higher risk for GC. For bearers of RBMS3-AS3/rs73832011-A, the risk for all three GI cancers is only significant in non-smokers. The modifying effect of demographic or lifestyle factors found above is based on the assumption of holding a single variant, for people holding more than one risk variants, this modifying effect may change.

Gene-based rare variant analysis
Based on the principle of the Optimal Sequence Kernel Association Test (SKAT-O) analysis, 38,171 shared cross-cancer loci were merged into SETs (genes), whereas two or more loci located in the same gene region were merged into a set, and 6,741 sets spanning 28,152 loci were constructed. After FDR correction, 48, 31 and 25 genes were found to be significant in ESCC, GC and CRC, respectively, of which 20 were significant in at least two cancer types (Table 3). Among which three genes, EXOC6 (P adj|ESCC =7.30×10 -3 , P adj|GC =6.00×10 -4 , P adj| CRC =2.6×10 -2 ), LRP5L (P adj|ESCC =2.70×10 -3 , P adj|GC =5.70×10 -3 , P adj| CRC =1.65×10 -2 ), MIR1263/LINC01324(P adj|ESCC =3.22×10 -2 , P adj| GC =4.36×10 -2 , P adj|CRC =3.26×10 -2 ) were statistically significant in all three cancer types. In addition, two included SNPs EXOC6/rs1339820 Manhattan plot for cross-cancer meta-analysis. The meta-analysis revealed 13 shared variants passed the genome-wide threshold of 5×10 -8 in ESCC, GC and CRC. The x-axis denotes the chromosome location of the variants; the y-axis represents the -log 10 (P meta ) value of each variant. The rsID of the significant SNPs and the corresponding genes are annotated. and LRP5L/rs78345670 were also significant in the single variant metaanalysis. ABCC11 showed significant in GC and CRC dataset (P adj| GC =4.77×10 -2 , P adj|CRC =2.46×10 -2 ) while the variant rs75797074 located in ABCC11 nearly passed ASSET analysis with the P-value of 5.23×10 -8 (Supplementary Table 1). Detailed information of the SNPs included in these 20 genes is summarized in the Supplementary Table 6.

Pathway enrichment analysis
Significant genes in each cancer type identified by SKAT-O were subjected to pathway enrichment analysis based on GO, KEGG and Reactome databases. FDR test was conducted in pathway analysis rather than SKAT-O to avoid over-adjustment. As a result, GO enrichment analysis identified 180, 168, 115 pathways in ESCC, GC and CRC datasets, in which 24 were common across cancers (Supplementary Tables 7-10). In contrast, no significant pathways passed FDR tests in the KEGG and Reactome databases. After filtrating 7 subset pathways and 2 obsoleted pathways, we found 15 pathways which were shared in 3 cancers (Figure 3), 7 pathways were fallen under the categories of biological process (BP) and 8 pathways belonged to cellular components (CC). The majority of the pathways represent neurological functions and structures, indicating the potential association between neurogenesis and cancer risks. Among the BP pathways, 'glutamatergic synaptic transmission' showed the largest fold enrichment (FE) in all three cancers, while 'axonogenesis' had the most significant adjusted P-value and most enriched genes in ESCC and CRC datasets. According to the GO database, 'axonogenesis' is a subset of the pathway 'axon development', in that case, we included the larger pathways in the bubble diagram. Within the CC category, 'catenin complex' presented the largest FE in all three cancers (FE ERCC = 4.96, FE GC = 5.18, FE CRC = 8.48), which was also the most significant pathway in CRC dataset.

Discussion
Most of the GI cancers share similar multistep carcinogenic processes, therefore exploring shared genetic susceptibility factors can broaden our understanding of the underlying mechanisms in GI cancers. Previous genome-wide association studies focusing on common variants (MAF > 0.01) revealed a handful of loci which associated with individual types of cancer, leaving a considerable proportion of genetic susceptibility remain to be discovered. We hypothesized that parts of the susceptibility may be explained by rare variants with larger effect sizes, and the cross-cancer design may increase the statistical power to detect shared genetic features. This analytical strategy was implemented through meta-analysis along with ASSET method of single variants and supplemented with combined gene-based approach. The biological significance of the identified genes was further explored by pathway enrichment analysis. As a result, we identified 13 novel independent genetic variants associated with GI cancer susceptibility, 20 potential susceptibility genes shared by at least two GI cancers by SKAT-O analysis (3 genes shared by all three cancer types), and 15 pathways that were significantly enriched within three GI cancers.
In the single variant cross-cancer meta-analysis, the most significant SNP rs78345670 resides in the intronic region of LRP5L, which is also one of the most significant genes in SKAT-O analysis shared by all three cancers. Another variant rs1339820 located in the 3'-UTR region of EXOC6 which was also a member of shared associated genes in GI cancers. This variant was also significant in GC and CRC validation datasets, although only the OR orientation in GC was consistent with the original meta-analysis. ABCC11/ rs75797074, a significant SNP associated with risk of all three cancers (yet not passed the ASSET testing threshold), was successfully validated in the CRC and ESCC datasets (Supplementary Table 5). These three variants or SNPs in their corresponding LD blocks were not previously associated with GI cancer risk. EXOC6 has been shown to be a predictor of breast cancer in previous study (33), but its role in GI cancers has not been reported. Taken together, EXOC6/rs1339820 and ABCC11/ rs75797074 showed some cross-cancer susceptibility in CG and CRC, and LRP5L maybe a potential susceptibility gene in GI cancers. Furthermore, gene-based association analysis revealed 20 genes that were shared by at least two GI cancers by SKAT-O analysis. Among them three genes were implicated in all three cancer types, two of which correspond to the significant SNPs in the single variant analysis. LDL Receptor Related Protein 5 Like (LRP5L) is a key gene involved in the regulation of Wnt signaling pathway, which plays an important role in tumor progression (34). Abnormal Wnt signaling pathway has been identified in various types of cancer, and it has been reported to be involved in maintaining abnormal proliferation of gastric cancer (35). It is also reported that TMEM119 that encode membrane proteins may regulate Wnt signaling pathway according to previous studies (36,37). However the potential role of these genes in GI cancers were not thoroughly studied. We have conducted intersection search on Pubmed for all significant genes found in our study, and found LPR5L, TMEM119 and SLIT2 have been reported in other non-GI cancer types. TMEM119 was observed to be overexpressed in ovarian cancer (OV) tissues and associated with poor survival in OV patients. Overexpression of this gene also promoted proliferation, invasion, and migration in OV cells (38). Similarly in osteosarcoma, TMEM119 was connected with tumor size, clinical stage and overall survival time, and associated with cell cycle, metastasis, apoptosis as well as TGF-b signaling in osteosarcoma cell lines (39). SLIT2 was reported to regulate breast tumor growth and metastasis by blocking the expansion of tumor vasculature (40), and was also reported to be a predictive marker for thyroid cancer (41). Abnormal expression of LPR5L has been reported in breast cancer and pancreatic cancer (42,43). Therefore, LPR5L, TMEM119 and SLIT2 may confer some universal oncosusceptibility, while other identified genes were mainly related to GI cancer susceptibilities based on our results. Pathway enrichment analysis indicated major pathways related to synaptic transmission, axon development, catenin complex and regulation of ion transmembrane transport in the carcinogenesis of three GI cancers. Different groups of metabotropic glutamate receptors (mGluRs) modulate part of GI tract selectively (44), among which group III mGluRs was implicated in CRC risk and considered as a prognostic marker in CRC (45). Our findings may extend its involvement in the carcinogenic process of GI cancers other than CRC. GRIK2 (glutamate ionotropic receptor kainate type subunit 2)one of the susceptibility genes indicated by single variant analysis, acts as a member of glutamatergic synaptic transmission pathway which may contribute to GI cancers carcinogenesis by regulating glutamate receptor in synapse (46). Based on literature search, LRRN4 and NLGN1 in the nerve or glutamatergic synapses complex were found to promote CRC progression. The transmembrane protein Neuroligin 1 (NLGN1) is reported to interact at the synapse with the tumor suppressor Adenomatous Polyposis Coli (APC), which is intensively involved in the pathogenesis of CRC and is a key player in the WNT/bcatenin pathway (47). As for the potential correlation between synapse and carcinogenesis, several researchers found that tumor cells can form pseudo-tripartite synapses with neurons to increase tumor growth (48)(49)(50). Genetic alterations in non-neural/neural synapse systems were reported to contribute to the development of ESCC (51). The synaptic adhesion-like molecule (SALM) was found to be a potential prognostic biomarker in GC patients (52). Former study showed that the hypermethylation silencing of GRIK2 results in decreased colony formation and invasion, in gastric cancer cells (46). Their research indicated that ionotropic glutamate receptors were related to the development and progression of some GI cancers. In the pathway analysis, a total of 29 and 27 genes identified by SKAT-O were enriched in glutamatergic synapse pathway in ESCC and CRC (Supplementary Tables 7-9). Taken together it is plausible that these genes may contribute to GI cancers carcinogenesis by regulating glutamate receptor in synapse. According to the results of our pathway enrichment analysis, GRIK2 participate in the 'cation channel complex' and 'regulation of trans-synaptic signaling' pathway. Therefore, we speculate that glutamate receptors may contribute to GI cancer risks mediated by crosstalk of various pathways related to synaptic transmission, organization and trans-membrane transport. Taken together, our research may provide the evidence of the association between these signaling pathways and GI cancer risks.
Neuron density is generally regarded higher in tumor tissues than normal tissues (53), and neuron development is considered as a common characteristic in tumor microenvironment; hence cancer-specific neurogenesis may promote cancer growth (54). SLIT2 might mediate the process of axon guidance by attracting or repelling developing axons and migrating neurons (55). In our research, 'axon development', and its subgroups 'axonogenesis' and 'axon guidance' were significantly enriched in all three GI cancers. Consistent with our findings, a miRNAs-based study suggested that axon guidance, targeted by dysregulation of miRNAs, may mediate the effects of oncogenes in gastric, colorectal and liver cancers (56). Moreover, the interaction between axon guidance and the corresponding receptors plays a vital role in the formation of malignancies by regulating vascularization, cell survival, apoptosis and cell migration (57). To this end, we provide further evidence for the potential relationship between neuron development and the carcinogenic progress of GI cancers.
There are several limitations to our study. First, due to the retrospective nature of our patient cohort, missing information may affect the accuracy of our multivariate logistic regression analysis, which only adjusted for age and sex. Important lifestyle factors such as drinking cannot be assessed. Second, the sample size of each cancer type was relatively small, with the overlapping control population.
Although we tried to address this potential issue by using the ASSET test, this may still result in limited statistical power. However, the research interest of cross-cancer susceptibility has allowed us to focus only on the shared association through the meta-analysis. Last, our study population come from the eastern China, while the validation dataset included subjects from the northern and central China. No single SNP was validated in all three validation datasets suggests potential population stratification hence extrapolation of the result needs cautious implementation.

Conclusions
Based on a sequential investigation of single variant, gene-based and pathways enrichment analysis, we uncovered novel rare variants and genes that may contribute to the susceptibility of GI cancers. Further studies are warranted to look into the underlying mechanism of the association between these susceptibility markers and GI cancers.

Data availability statement
The summary data of the detailed results of the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Ethics statement
All patients provided an informed consent to participate in the study, and the study protocol was approved by the ethical committee of the Institutional Review Board of Fudan University Shanghai Cancer Center.

Author contributions
The hypothesis of analyzing shared rare variants associated with the risk of GI cancer was conceived by RZ. JZ conducted quality control and bioinformatic analysis. Data collection and the bioinformatic analysis was conducted by XW. JL helped with the visualization of results. QW provided suitable guidance to the study design. MYW and YW helped with the collection of blood DNA samples. JC, JX, MLW and TW assisted with validation of significant SNPs in external datasets. All authors have read and approved the final manuscript.