Identification of Claudin-6 as a Molecular Biomarker in Pan-Cancer Through Multiple Omics Integrative Analysis

Claudin-6 (CLDN6) is one of the 27 family members of claudins and majorly involved in the tight junction and cell-to-cell adhesion of epithelial cell sheets, playing a significant role in cancer initiation and progression. To provide a more systematic and comprehensive dimension of identifying the diverse significance of CLDN6 in a variety of malignant tumors, we explored CLDN6 through multiple omics data integrative analysis, including gene expression level in pan-cancer and comparison of CLDN6 expression in different molecular subtypes and immune subtypes of pan-cancer, targeted protein, biological functions, molecular signatures, diagnostic value, and prognostic value in pan-cancer. Furthermore, we focused on uterine corpus endometrial carcinoma (UCEC) and further investigated CLDN6 from the perspective of the correlations with clinical characteristics, prognosis in different clinical subgroups, co-expression genes, and differentially expressed genes (DEGs), basing on discussing the validation of its established monoclonal antibody by immunohistochemical staining and semi-quantification reported in the previous study. As a result, CLDN6 expression differs significantly not only in most cancers but also in different molecular and immune subtypes of cancers. Besides, high accuracy in predicting cancers and notable correlations with prognosis of certain cancers suggest that CLDN6 might be a potential diagnostic and prognostic biomarker of cancers. Additionally, CLDN6 is identified to be significantly correlated with age, stage, weight, histological type, histologic grade, and menopause status in UCEC. Moreover, CLDN6 high expression can lead to a worse overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI) in UCEC, especially in different clinical subgroups of UCEC. Taken together, CLDN6 may be a remarkable molecular biomarker for diagnosis and prognosis in pan-cancer and an independent prognostic risk factor of UCEC, presenting to be a promising molecular target for cancer therapy.


INTRODUCTION
Claudins consist of 27 members in mammals that are essential for the regulation of paracellular permeability and the maintenance of cell polarity, due to their significant roles of being key components in tight junctions, involving almost all vital physiological and pathological bioprocesses (Peter and Goodenough, 2004;Lal-Nag and Morin, 2009;Barmeyer et al., 2015;Tabaries and Siegel, 2017). In recent years, claudins have been recognized as crucial regulators in the initiation, progression, and metastasis of cancers, playing distinct roles in a variety of cancers according to their different patterns of tissue-dependent expression (Tabaries and Siegel, 2017).
Claudin-6 (CLDN6) is a member of the claudin family and serves as a tight junction molecule, which plays a vital role in cell-to-cell adhesion in epithelial or endothelial cell sheets. It encodes the tetraspan membrane protein, with the size of 220 amino acids and molecular mass of 23,292 Da. CLDN6 has been identified to be the origination of cell adhesion signaling taking part in the regulation of nuclear receptor activity through targeting molecules of the nuclear receptor superfamily and managing their gene expression (Sugimoto et al., 2019). In terms of its various characters in human cancers, CLDN6 may be a helpful positive marker for further identification of atypical teratoid/rhabdoid tumors (AT/RTs) for diagnosis and therapy (Birks et al., 2010). CLDN6 is also reported to be a possible single prognostic marker and promising therapeutic target for a subgroup of intestinal type gastric cancer (Kohmoto et al., 2020). However, over-expression of CLDN6 may suppress the progression of breast cancer, whereas DNA methylation of CLDN6 can downregulate its gene expression and promote migration and invasion (Liu Y. et al., 2016). Recently, CLDN6 has been identified to be an oncofetal cell surface antigen for chimeric antigen receptor (CAR)-T cell targeting, due to its aberrant activation and high protein expression for solid cancers, while silence for healthy tissues, respectively (Reinhard et al., 2020).
However, as the previous studies reported, CLDN6 may take diverse parts in different cancers, even in different subtypes of the same cancer, either for cancer promotion or for cancer suppression. The discrepancies between the dual roles of CLDN6 in human cancers may be due to the heterogeneity and complexity of tumors. In order to provide a more systematic and comprehensive insight of CLDN6, to the best of our knowledge, we are the first to explore the expression and biofunction of CLDN6 from the perspective of pan-cancer, focusing on its diagnostic and prognostic values, and find that CLDN6 is not only significantly upregulated in 20 types of human cancers but also differently expressed in different molecular subtypes of seven cancer types and different immune subtypes of nine cancer types. Additionally, CLDN6 has a high accuracy in predicting acute myeloid leukemia (LAML), testicular germ cell tumors (TGCT), ovarian serous cystadenocarcinoma (OV), and uterine carcinosarcoma (UCS) while having notable correlations with the overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI) of uterine corpus endometrial carcinoma (UCEC), adrenocortical carcinoma (ACC), bladder urothelial carcinoma (BLCA), and stomach adenocarcinoma (STAD). Next, we put emphasis on UCEC and identify CLDN6 as an independent risk factor for OS, DSS, and PFI in UCEC. Moreover, we further investigate the co-expression genes correlated with CLDN6 and the differentially expressed genes (DEGs) between CLDN6 high expression group and low expression group. Taken together, CLDN6 is a potential biomarker for diagnosis and prognosis in pan-cancer and a promising molecular target for UCEC.

Gene Expression Analysis
The RNA-seq data and relevant clinical data across 33 tumor types and normal tissues of 15,776 samples were downloaded from The Cancer Genome Atlas (TCGA) database and the Genotype-Tissue Expression (GTEx) database by UCSC XENA 1 . The data of tumor cell line were downloaded from the Cancer Cell Line Encyclopedia (CCLE) database 2 . R software v3.6.3 was used for statistical analysis, and the ggplot2 package was used for visualization. The Wilcoxon rank sum test detected two sets of data, and p < 0.05 was considered statistically significant (ns, p ≥ 0.05; * , p < 0.05; * * , p < 0.01; * * * , p < 0.001) (Vivian et al., 2017).

CLDN6 Expression in Molecular Subtypes and Immune Subtypes of Cancers
We explored the correlations between CLDN6 expression and molecular subtypes or immune subtypes in pan-cancer from the TISIDB database (Ru et al., 2019), which integrates multiple data types to assess tumor and immune system interaction. We also explored the correlations between CLDN6 expression and immunomodulators in pan-cancer from the TISIDB database. (C) CLDN6 expression in TCGA tumors and adjacent normal tissues; (D) CLDN6 expression in TCGA tumors and normal tissues with the data of the GTEx database as controls (*p < 0.05, **p < 0.01, ***p < 0.001).

Protein-Protein Interaction Network Building
A total of 50 CLDN6-binding proteins were acquired from the STRING web 3 by setting the following main parameters: minimum required interaction score ["medium confidence (0.400)"] and active interaction sources ("Experiments, Text mining, Databases"). Then, Cytoscape (version 3 https://string-db.org/ 3.7.2) was applied for visualization of protein-protein interaction (PPI) network.

Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Enrichment Analyses
The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted for 50 CLDN6-binding proteins using the ggplot2 package for visualization and the cluster Profiler package for statistical analysis (Subramanian et al., 2005;Yu et al., 2012).

Diagnostic Value Analysis
The receiver operating characteristic (ROC) curve was used to assess the diagnostic value of CLDN6 in pan-cancer. The area value under the ROC curve is between 0.5 and 1. The closer the area under the curve (AUC) is (1), the better the diagnostic effect is. AUC in 0.5-0.7 has a low accuracy, AUC in 0.7-0.9 has a certain accuracy, and AUC above 0.9 has a high accuracy.

Survival Prognosis Analysis
Kaplan-Meier plots were used to assess the relationship between CLDN6 expression and prognosis (OS, DSS, and PFI) of cancers. Moreover, we further investigated the associations between CLDN6 expression and prognosis (OS, DSS, and PFI) in different clinical subgroups of UCEC. The survminer package was used for visualization, and the survival package was used for statistical analysis. The Cox regression was used in the hypothesis test, and p < 0.05 is considered statistically significant.

Associations Between CLDN6 Expression and Different Clinical Characteristics in UCEC
The box plots and tables were presented for CLDN6 expression levels of patients with different clinical characteristics in UCEC. The RNA-seq data and related clinical data in level 3 HTSeq-fragments per kilobase per million (FPKM) format were downloaded from TCGA database, converted to transcripts per million reads (TPM) format, and then analyzed after log2 conversion. The Wilcoxon rank sum test was used to detect two groups of data, and p < 0.05 was considered statistically significant (ns, p ≥ 0.05; * , p < 0.05; * * , p < 0.01; * * * , p < 0.001).

Univariate and Multivariate Cox Regression Analyses in UCEC
Univariate and multivariate Cox regression analyses of CLDN6 and clinical characteristics were performed to identify their prognostic values in OS, DSS, and PFI of UCEC. The survival package was used for statistical analysis.

Co-expression Gene Analysis of CLDN6 in UCEC
We explored the top 50 co-expression genes positively and negatively correlated with CLDN6 expression in UCEC. The gene co-expression heatmaps were displayed using the stat package.
We also showed the correlations between CLDN6 expression and top 10 genes expression in the heatmap using Pearson correlation coefficient.

DEGs Between CLDN6 High Expression and Low Expression Groups in UCEC
We explored the DEGs between different CLDN6 expression groups (low expression group: 0-50%; high expression group: 50-100%) in UCEC using the deseq2 package. The volcano map was drawn by the ggplot2 package with the threshold values of |log2 fold-change (FC)| > 1.0 and adjusted p-value < 0.05. Then, we performed GO and KEGG enrichment analyses of DEGs using the ggplot2 package for visualization and the cluster Profiler package for statistical analysis. Furthermore, we built a PPI network of DEGs obtained with the threshold values of |log2 fold-change (FC)| > 2.0 using STRING web and analyzed the hub genes by the MCC algorithm of CytoHubba in Cytoscape (version 3.7.2).

CLDN6 Expression in Pan-Cancer
We displayed CLDN6 expression in normal tissues from the GTEx database and found that CLDN6 was less expressed across most normal tissues, and the highest expression tissue was the testis ( Figure 1A). In contrast, CLDN6 was expressed more in almost all cell lines ( Figure 1B). For TCGA tumors and adjacent normal tissues, CLDN6 expression was significantly upregulated in eight cancer types, including breast invasive carcinoma (BRCA), cholangiocarcinoma (CHOL), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), lung adenocarcinoma (LUAD), STAD, and UCEC, while it was downregulated in

Correlations Between CLDN6 and Molecular or Immune Subtypes of Cancers
We explored the correlations between CLDN6 differential expression and molecular subtypes in pan-cancer from the TISIDB database and found that CLDN6 was expressed differently in different molecular subtypes of seven cancer types, including UCEC, BRCA, ESCA, LUSC, HNSC, OV, and STAD. Moreover, for UCEC, CLDN6 was identified to express more in the molecular subtype of CN_HIGH than other molecular subtypes (Figure 2A). For BRCA, CLDN6 was expressed the highest in the molecular subtype of basal ( Figure 2B). For ESCA, CLDN6 was expressed the highest in the molecular subtype of CIN ( Figure 2C). For LUSC, CLDN6 was expressed the highest in the molecular subtype of secretory ( Figure 2D). For HNSC, CLDN6 was expressed the highest in the molecular subtype of classical ( Figure 2E). For OV, CLDN6 was expressed the highest in the molecular subtype of proliferative ( Figure 2F). For STAD, CLDN6 was expressed the highest in the molecular subtype of CIN ( Figure 2G). Meanwhile, we observed that CLDN6 expression was significantly correlated with different immune subtypes (

PPI Network and GO and KEGG Enrichment Analyses
We screened out 50 targeted binding proteins of CLDN6 using the STRING database and Cytoscape ( Figure 4A). Then, we conducted the GO enrichment analysis (Figure 4B) of 50 targeted binding proteins, revealing that the primary biological process (BP) contained calcium-independent cell-cell adhesion via plasma membrane cell adhesion molecules, cell-cell adhesion via plasma membrane adhesion molecules, tight junction organization, and cell-cell junction organization. The cellular component (CC) was mainly enriched in bicellular tight junction, apical junction complex, tight junction, and cell-cell junction. The molecular function (MF) was primarily involved in virus receptor activity, hijacked MF, miRNA binding, and regulatory RNA binding. The KEGG pathway enrichment (Figures 4C,D) was mainly related to leukocyte transendothelial migration, tight junction, cell adhesion molecules, hepatitis C, and pathogenic Escherichia coli infection.

Diagnostic Value of CLDN6 in Pan-Cancer
The ROC curve was performed to assess the diagnostic value of CLDN6 in pan-cancer. The results showed that CLDN6 had a certain accuracy (AUC > 0.7) in predicting

CLDN6 Is Correlated With Different Clinical Characteristics in UCEC
We further presented the associations between CLDN6 and different clinical characteristics in UCEC and found that CLDN6 expression was significantly related to age, stage, weight, histological type, histologic grade, and menopause status of UCEC (Table 1). Moreover, CLDN6 was expressed higher in patients with age > 60 (Figure 10A), stage III/IV (Figure 10D), histologic grade 3 (Figure 10E), and histological type of serous (Figure 10F), while it was expressed lower in patients with weight > 80 ( Figure 10B) and primary therapy outcome (CR) (Figure 10C), respectively.

Univariate and Multivariate Cox Regression Analyses in UCEC
We implemented univariate and multivariate Cox regression analyses of CLDN6 and clinical characteristics in UCEC. In the univariate and multivariate Cox regression analyses, clinical stage, primary therapy outcome, age, and CLDN6 were significantly associated with the OS (Table 2), while clinical stage, primary therapy outcome, histologic grade, and CLDN6 were significantly associated with DSS (Supplementary Table 1), and clinical stage, primary therapy outcome, and CLDN6 were significantly associated with PFI (Supplementary Table 2).

Co-expression Gene Analysis of CLDN6 in UCEC
We explored the top 50 co-expression genes positively or negatively correlated with CLDN6 expression in UCEC and displayed the correlations between CLDN6 expression and top 10 genes expression in the heatmap. In the heatmap of positive correlation (Figure 11A), we obtained the top 10 genes, including HIF3A (r = 0.71) (Figure 11B), GAL3ST3 (r = 0.66) (Figure 11C), PNOC (r = 0.64) (Figure 11D

DEGs Between CLDN6 High Expression Group and Low Expression Groups in UCEC
A total of 2,884 DEGs were acquired with the threshold values of |log2 fold-change (FC)| > 1.0 and adjusted p-value < 0.05, including 2,005 upregulated genes and 879 downregulated genes ( Figure 13A). Among them, 446 DEGs were obtained with the threshold values of |log2 fold-change (FC)| > 2.0 and adjusted p-value < 0.05, including 388 upregulated genes and 58 downregulated genes. Then, we conducted the GO and KEGG enrichment analyses (Figures 13B,C) of DEGs, revealing that the primary BP contained complement activation, classical pathway, humoral immune response mediated by circulating immunoglobulin, complement activation, proteinactivation cascade, and humoral immune response. The CC was mainly enriched in immunoglobulin complex, immunoglobulin complex, circulating, transmembrane transporter complex, transporter complex, and ion channel complex. The MF was primarily involved in antigen binding, immunoglobulin receptor binding, receptor ligand activity, gated channel activity, and passive transmembrane transporter activity. The KEGG pathway enrichment was mainly related to neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, cAMP signaling pathway, adrenergic signaling in cardiomyocytes, and protein digestion and absorption. Furthermore, we obtained the top 10 hub genes ( Figure 13D) of 446 DEGs, including CCK, NTS, AVP, ADRA1B, NPFFR2, CASR, GCGR, TACR3, UTS2R, and NTSR1. Among them, the top five hub genes ( Figure 13E) were CCK, NTS, AVP, ADRA1B, and NPFFR2. Additionally, the top three hub genes ( Figure 13F) were CCK, NTS, and AVP.
However, there is no existing study to our knowledge to evaluate the significance of CLDN6 in pan-cancer on the whole scale. In the present study, to assess the expression level of CLDN6 across pan-cancer, we examined CCLE database, TCGA database, and GTEx database and found that it was significantly upregulated in 20 types of human cancers, including ACC, BLCA, BRCA, CHOL, COAD, ESCA, HNSC, LIHC, LUAD, LUSC, OV, PAAD, PCPG, READ, STAD, TGCT, THCA, THYM, UCEC, and UCS, while it was downregulated in GBM, KICH, KIRC, LAML, and LGG. The finding reveals that CLDN6 serves as a cancer-promoting gene in the majority of malignant tumors and might be involved in tumor formation or cancer development. In addition, there are meaningful associations between CLDN6 expression level and molecular subtypes of seven cancer types, including UCEC, BRCA, ESCA, LUSC, HNSC, OV, and STAD. For instance, CLDN6 was observed to express the highest in the molecular subtype of CN_HIGH in UCEC, in the molecular subtype of basal in BRCA, and in the molecular subtype of CIN in STAD. Meanwhile, we observed that CLDN6 was significantly correlated with different immune subtypes in nine cancer types. It is worth pointing out that CLDN6 is closely correlated with both molecular subtype and immune subtype in six types of cancers, including UCEC, BRCA, STAD, OV, LUSC, and HNSC. As the previous study demonstrated, CLDN6 is confirmed to be a possible single prognostic marker and promising therapeutic target for a subgroup of intestinal type gastric cancer (Kohmoto et al., 2020). Hence, the research focused on a distinct molecular subtype or immune subtype of cancers may provide an appropriate entry point to explore the role of CLDN6. Then, to examine the diagnostic and prognostic value of CLDN6, we performed the ROC curve and the Kaplan-Meier survival curve in pan-cancer and found that CLDN6 had a certain accuracy (AUC > 0.7) in predicting 15 cancer types, especially had a high accuracy (AUC > 0.9) in predicting LAML, TGCT, OV, and UCS. Additionally, CLDN6 was significantly correlated with the OS, DSS, and PFI in UCEC, ACC, BLCA, and STAD. The results demonstrate here that CLDN6 presents great diagnostic and prognostic significance in the above cancers and may be a potential biomarker or therapeutic target for precision oncology. To deeply investigate the biofunction of CLDN6, we conducted the GO and KEGG pathway enrichment analyses of its 50 targeted binding proteins, revealing that the BP was major in cell-cell adhesion or bicellular tight junction, its MF was primarily involved in virus receptor activity, hijacked MF, miRNA binding, and regulatory RNA binding, and the main pathways were enriched in leukocyte transendothelial migration, tight junction, cell adhesion molecules, hepatitis C, and pathogenic E. coli infection. It should be emphasized that CLDN6 is not only essential for cell-cell adhesion and tight junction but also crucial for outside-in cell signaling.
Furthermore, we primarily analyzed the role of CLDN6 played in UCEC and identified the significant correlations between CLDN6 expression level and age, stage, weight, histological type, histologic grade, and menopause status. Subsequently, we discovered that CLDN6 high expression could cause a worse OS, DSS, or PFI in a variety of clinical subgroups of UCEC, yet cause a worse all of the OS, DSS, and PFI only in clinical subgroups of weight > 80, BMI > 30, postmenopause, or primary therapy outcome of CR. Since then, we confirmed clinical stage, primary therapy outcome, and CLDN6 expression level as independent risk factors in OS, DSS, and PFI of UCEC through univariate and multivariate Cox regression analyses. The results are further supported by the prior research involving prognostic significance of aberrant CLDN6 expression in UCEC (Kojima et al., 2020), which established a monoclonal antibody, then performed immunohistochemical staining and semi-quantification to assess the associations between CLDN6 and clinical characteristics in 173 cases, and finally identified stages III/IV, distant metastasis, and high CLDN6 expression as independent prognostic variables for the OS. It is of great importance that our findings provide a more comprehensive and further supplement and strengthen the role of CLDN6 played in UCEC, and that we precisely analyze the correlations between CLDN6 expression and different prognosis conditions (OS, DSS, or PFI) in various clinical subgroups of UCEC. In addition, we further obtained co-expression genes of CLDN6, and the top 10 co-expression genes positively correlated with CLDN6 contained HIF3A, GAL3ST3, PNOC, AC068987.3 (Lnc-SCN8A-2), L1CAM, AP000662.1 (Lnc-CLP1-3), HMGA2, CLDN9, CLDN19, and EPHB2, whereas the top 10 co-expression genes negatively correlated with CLDN6 contained ELAPOR1, PIGR, MLPH, SPDEF, FOXA2, TFF3, GLYATL2, IL20RA, PLPP2, and LNCTAM34A. Besides, we explored the DEGs between CLDN6 high expression group and low expression group, then performed the GO and KEGG pathway analyses of the DEGs, and found that the BP mainly linked to complement activation, classical pathway, humoral immune response mediated by circulating immunoglobulin, complement activation, protein activation cascade, and humoral immune response. The MF was primarily involved in antigen binding, immunoglobulin receptor binding, receptor ligand activity, gated channel activity, and passive transmembrane transporter activity. Finally, we screened out the hub genes of DEGs, including CCK, NTS, AVP, ADRA1B, NPFFR2, CASR, GCGR, TACR3, UTS2R, and NTSR1.
There are still several limitations in the present study. On the one hand, we set out to explore CLDN6 only using CCLE database, TCGA database, and GTEx database while lacking actual clinical data. On the other hand, the precise verification and high-quality evidence should be further performed and provided by biological experiments.
Many computational methods have been applied in bioinformatics research, such as lncRNA-miRNA interaction predictions (Liu et al., 2020;Zhang et al., 2021a,b), the identification of microRNA combinatorial biomarkers (Yang et al., 2017a,b;Liu and Yang, 2018), and so on (Wang et al., 2020;Zhang et al., 2020). In the next step, we will consider the use of the machine learning and deep learning methods in the research of CLDN6. In summary, the identification of CLDN6 in diagnostic and prognostic significance across pancancer, together with its further exploration in UCEC, could add a new dimension to the comprehensive understanding of its critical role in tumor promotion and suppression and provide an integrative analyzing basis for deep verification of molecular biology experiments, even for future clinical application of cancer therapies.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.