High Expression Levels of CDK1 and CDC20 in Patients With Lung Squamous Cell Carcinoma are Associated With Worse Prognosis

Purpose: Progress related to the early detection and molecular targeted therapy of lung squamous cell carcinoma (LUSC) remains limited. The goal of our study was to identify key candidate indicators of LUSC. Methods: Three microarray datasets (GSE33532, GSE30219 and GSE19188) were applied to find differentially expressed genes (DEGs). Functional enrichment analyses of DEGs were carried out, and their protein-protein interaction (PPI) network was established. Hub genes were chosen from the PPI network according to their degree scores. Then, overall survival (OS) analyses of hub genes were carried out using Kaplan-Meier plotter, and their GSEA analyses were performed. Public databases were used to verify the expression patterns of CDK1 and CDC20. Furthermore, basic experiments were performed to verify our findings. Results: A total of 1,366 DEGs were identified, containing 669 downregulated and 697 upregulated DEGs. These DEGs were primarily enriched in cell cycle, chromosome centromeric region and nuclear division. Seventeen hub genes were selected from PPI network. Survival analyses demonstrated that CDK1 and CDC20 were closely associated with OS. GSEA analyses revealed that cell cycle, DNA replication, and mismatch repair were associated with CDK1 expression, while spliceosome, RNA degradation and cell cycle were correlated with CDC20 expression. Based on The Cancer Genome Atlas (TCGA) and The Human Protein Atlas (THPA) databases, CDK1 and CDC20 were upregulated in LUSC at the mRNA and protein levels. Moreover, basic experiments also supported the obvious upregulation of CDK1 and CDC20 in LUSC. Conclusion: Our study suggests and validates that CDK1 and CDC20 are potential therapeutic targets and prognostic biomarkers of LUSC.

upregulated in LUSC at the mRNA and protein levels. Moreover, basic experiments also supported the obvious upregulation of CDK1 and CDC20 in LUSC.

Conclusion:
Our study suggests and validates that CDK1 and CDC20 are potential therapeutic targets and prognostic biomarkers of LUSC.
Keywords: lung squamous cell carcinoma, protein-protein interaction network, survival analysis, gene set enrichment analysis, experimental validation BACKGROUND Lung cancer is one of the most common cancer types with high cancer-related mortality and is estimated to lead to 228,820 cancer cases and 135,720 deaths in 2020 (Siegel et al., 2020). Non-small cell lung cancer (NSCLC) accounts for 85% of total lung cancer cases, and lung adenocarcinoma (LUAD) is the first prevalent histological type of NSCLC followed by lung squamous cell carcinoma (LUSC). Accumulating evidence indicated that overexpression and mutation of the genes are associated with carcinogenesis, proliferation, and/or worse prognosis of LUSC, including phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (PIK3CA), MET proto-oncogene (MET), and epidermal growth factor receptor (EGFR) (Sands et al., 2020). Numerous studies have demonstrated considerable progress of molecular targeted therapy for LUAD, including tyrosine kinase inhibitors targeting EGFR and anaplastic lymphoma kinase (ALK) (Li et al., 2016). However, the progress in the diagnosis and treatments of LUSC is extremely limited compared with that in LUAD. Thus, key biomarkers involved in the carcinogenesis and progression of LUSC need to be identified, and effective therapeutic measures need to be developed.
In the last 2 decades, gene chip technologies and bioinformatics analysis have been rapidly developed for the screening of genetic alterations at the genome level (Dawany et al., 2011). These technologies are used to identify differentially expressed genes (DEGs), which play important roles in the development of LUSC. However, false positive rates of independent microarray studies may influence the reliability of the outcomes (Manoli et al., 2006).
Therefore, in the current study, we selected three microarray datasets from Gene Expression Omnibus (GEO) database to identify DEGs in LUSC and normal lung tissues. Functional enrichment analyses and proteinprotein interaction (PPI) network analysis were carried out to determine the molecular mechanisms of initiation and invasion of LUSC. The top 17 genes with the highest degree scores were selected as hub genes of the PPI network. CDK1 and CDC20 were survival-related hub genes. GSEA analyses of CDK1 and CDC20 were performed. Finally, our findings are validated in public databases and basic experiments.

MATERIALS AND METHODS
The Obtainment of Microarray Datasets GEO (Barrett et al., 2013) is the public functional genomics data repository containing the results of high throughput gene expression, chips, and microarrays. All microarray datasets were selected according to the following selection criteria: 1) the inclusion of LUSC and normal lung tissue samples; 2) the platforms were GPL570 [HG-U133_Plus_2] Affymetrix human genome U133 plus 2.0 array; 3) the organisms were Homo sapiens; and 4) the size of the normal lung tissue samples was >3. Three gene expression profiles (GSE33532 (Meister et al., 2014), GSE30219 (Rousseaux et al., 2013), and GSE19188 (Hou et al., 2010)) were downloaded from GEO (Table 1).
Frontiers in Molecular Biosciences | www.frontiersin.org July 2021 | Volume 8 | Article 653805 2 FIGURE 1 | Volcano plot and Venn diagram of three expression profiles. The selection procession of downregulated and upregulated genes in GSE19188 (A), GSE30219 (B) and GSE33532 (C) with p-value<0.05 and |logFC| > 1, and upregulated genes are marked in red, and downregulated genes are marked in green. The three datasets display an overlap of 1366 DEG (D).

Identification of Differentially Expressed Genes
DEGs were selected between LUSC and normal lung tissues using the limma package of R software (Ritchie et al., 2015). To minimize the false discovery rate (FDR), the p-values were adjusted using the Benjamini-Hochberg method. Probe sets without corresponding gene symbols were omitted, while genes with >1 probe sets were averaged. The filtering criteria for DEGs were adjusted p < 0.05 and |logFC(fold change)|>1.

Functional Enrichment Analysis of DEGs
To analyze DEGs' biological functions, gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses were carried out through the clusterProfiler (Yu et al., 2012) and GOplot packages. GO is an important bioinformatics tool for gene annotation and the analysis of biological processes of genes (Ashburner et al., 2000). Moreover, Kyoto Encyclopedia of Genes and Genomes (KEGG) is used to identify high-level functions and biological systems based on the large-scale molecular datasets (Kanehisa et al., 2017). Adjusted p < 0.05 was regarded as statistically significant.

Construction of the PPI Network and Selection of Hub Genes
The PPI network of DEGs was constructed using Search Tool for the Retrieval of Interacting Genes (STRING) (Franceschini et al., 2013), and the interaction with a combined score>0.9 was regarded as statistically different. Cytoscape is a bioinformatics software that is used to establish visual networks of molecular interactions (Smoot et al., 2011). Significant gene modules were visualized using The Molecular Complex Detection (MCODE) plugin which was used to identify closely correlated modules from PPI networks (Bandettini et al., 2012). The criteria of selection were MCODE score>5, node score cutoff 0.2, degree cutoff 2, k-score 2, and max depth 100.
Hub genes were chosen based on the degree scores using the cytoHubba plugin (Chin et al., 2014). The GO enrichment and KEGG pathway analysis of the hub genes were carried out using the ClueGO plugin (Mlecnik et al., 2019). We still explored the correlation network of these hub genes. Based on The Cancer Genome Atlas (TCGA)-LUSC dataset, heatmap of these hub genes was constructed and visualized using pheatmap package. The Kaplan-Meier (KM) plotter was applied to explore the prognostic values of hub genes among LUSC patients (Gy} orffy et al., 2013). Log-rank p < 0.05 was considered statistically different.

GSEA Analysis of the Hub Genes
GSEA is a computational method which estimates whether a previously defined gene set has statistical differences and concordant differences in two biological states (Subramanian et al., 2005). We divided LUSC samples into high-expression and low-expression groups based on the median expression levels Frontiers in Molecular Biosciences | www.frontiersin.org July 2021 | Volume 8 | Article 653805 4 FIGURE 3 | The PPI network and the two most significant modules of DEGs. The PPI network of DEGs is constructed using Cytoscape (A). The most significant module is acquired from the PPI network with 37 nodes and 665 edges (B). 17 hub genes are shown (C). The correlation network of the hub gene is illustrated (D). Upregulated genes are marked in red, and downregulated genes are marked in blue.

The Verification Based on Public Databases
To compare the mRNA expression levels of CDK1 and CDC20 between LUSC and normal lung tissues, the expression patterns of the two genes were explored using the TCGA-LUSC dataset. Additionally, we assessed whether the expression levels of the two genes are associated with clinical information. To further explore their protein expression patterns, the immunohistochemistry (IHC) outcomes of the two genes were compared between the normal and LUSC tissue samples using The Human Protein Altas (THPA) database.  Table 1). These tissues were frozen in liquid nitrogen for further analysis, containing real timequantitative PCR (RT-qPCR), western blot and IHC. Besides, these experiments were independently repeated in triplicate. These experiments obtained the approval of the Medical Ethics Committee of Zhejiang Cancer Hospital (IRB-2020-817).

The External Validation of Basic Experiments
Firstly, total RNA extracted from these tissues using Trizol Reagent (TaKaRa, Japan) were reverse transcribed with RT reagent Kit gDNA Eraser (TaKaRa, Japan). SYBR-Green (TaKaRa, Japan) and RT-qPCR analysis were applied to detect cDNA expression levels, and GAPDH was applied as the internal reference. Primers of GAPDH, CDK1 and CDC20 in RT-qPCR were displayed (Table 2).
Frontiers in Molecular Biosciences | www.frontiersin.org July 2021 | Volume 8 | Article 653805 6 FIGURE 4 | Functional enrichment analysis and heat map of these hub genes. The biological processes (BP), cell component (CC), and molecular function (MF) analysis of hub genes are performed using ClueGO (A-C). The KEGG pathway analysis of hub genes is performed using ClueGO (D). The two terms are marked in the same colors if they are similar, and the size of nodes refers to the p-value of biological process analysis. The hub genes participating in these terms are shown, and they are connected with relevant terms by solid lines. Heat map of hub genes is constructed based on data from TCGA and visualized using pheatmap package (E). Upregulation is marked in red, while down-regulation is marked in green.
Frontiers in Molecular Biosciences | www.frontiersin.org July 2021 | Volume 8 | Article 653805 Thirdly, the sections of these tissues via deparaffinization and dehydration were incubated with anti-CDK1 (dilution: 1: 100) and anti-CDC20 antibodies (dilution: 1:300) overnight at 4°C after epitope retrieval, hydrogen peroxide treatment and non-specific antigens blocking. Then, these sections were incubated using secondary antibodies, followed by signal detection with the DAB staining kit (Vector Laboratories, United States).

Identification of DEGs in LUSC
The volcano plots reveal the downregulated and upregulated genes in GSE19188 ( Figure 1A), GSE30219 ( Figure 1B), and GSE33532 ( Figure 1C). After the normalization of the microarray outcomes, DEGs are selected in LUSC and normal lung tissue samples. The overlap among the three

Functional Enrichment Analysis
The results of the GO enrichment analysis of DEGs are illustrated as GO plots using the GOplot package. GO enrichment analyses of upregulated DEGs demonstrate that nuclear division, chromosome centromeric region, and DNA replication origin binding are primarily enriched in MF (Figure 2A), while KEGG pathway analysis indicates that cell cycle, DNA replication, and p53 signaling pathway are mainly enriched ( Figure 2B), suggesting that upregulated genes are highly associated with cell cycle. In addition, GO enrichment analyses of downregulated DEGs suggest that second-messenger-medicated signaling, apical part of cell, and positive regulation of vasculature development are significantly enriched in MF ( Figure 2C), whereas KEGG pathway analysis manifests that cell adhesion, complement and coagulation cascades, and protein digestion and absorption are primarily enriched ( Figure 2D).

Construction of the PPI Network and Selection of Hub Genes
The PPI network of the DEGs is displayed ( Figure 3A) and includes 585 nodes and 3,797 edges. The most significant modules are identified via the MCODE plug-in. The most significant module includes 37 nodes and 665 edges ( Figure 3B). Seventeen nodes are selected from the PPI network as the hub genes with degrees>60. 17 hub genes are all up-regulated genes, which are displayed ( Figure 3C). Moreover, Table 3 lists gene symbols, degrees, full names, and functions of 17 hub genes. The correlation network indicates that these hub genes are closely correlated with each other. ( Figure 3D). The GO enrichment analysis of 17 hub genes indicates that regulation of spindle checkpoint, mitotic spindle assembly checkpoint, condensed nuclear chromosome outer kinetochore, and cyclin-dependent protein serine/threonine kinase regulator activity are significantly enriched (Figures 4A-C). The KEGG pathway analysis of the hub genes indicates that cell cycle, p53 signaling pathway, and progesterone-mediated oocyte maturation are predominantly enriched ( Figure 4D), revealing that these hub genes are closely correlated with cell cycle. Based on TCGA database, the heatmap of 17 hub genes suggests that all hub genes are expressed at high levels in the LUSC samples and expressed at low levels in the normal lung samples ( Figure 4E).
Survival analyses indicated that CDK1, CDC20, CCNB1, CCNB2, BUB1, TOP2A, and CENPF are associated with worse OS in LUSC patients ( Figure 5). In detail, Supplementary Table 2 lists the median OS, hazard ratio (HR), 95% confidence interval (CI), and log-rank P-value of these hub genes. CDK1 and CDC20 have the highest node degree scores (123 and 106, respectively), and both of them are closely associated with cell cycle, indicating that the two genes may serve core roles in the occurrence and progression of LUSC. Our survival analyses suggest that high expression levels of CDK1 and CDC20 are correlated with poor OS (p 0.041 and p 0.0013, respectively).

Gene Set Enrichment Analysis of CDK1 and CDC20
The results of the GSEA analysis indicate that numerous pathways, such as cell cycle, DNA replication, mismatch repair, and base excision repair, are associated with CDK1 expression ( Figure 6A). The results of the GSEA analysis indicate that multiple pathways, containing DNA replication, cell cycle, mismatch repair, and RNA degradation are correlated with CDC20 expression ( Figure 6B). The detailed information of the GSEA analysis of CDK1 and CDC20 is listed in Table 4.

The Verification Based on Public Databases
The expression level of CDK1 is significantly increased in 502 LUSC tissue samples compared with that in 49 normal tissue samples ( Figure 7A). The CDK1 mRNA expression in LUSC is associated with the severity of the clinical stages, and the highest mRNA expression is observed in stage IV ( Figure 7B). CDC20 is expressed at high levels in LUSC tissue samples compared that in normal tissue samples ( Figure 7C). The CDC20 mRNA expression is correlated with the severity of the T stages ( Figure 7D). Moreover, IHC results from the THPA database confirm that the protein expression levels of CDK1 and CDC20 are consistent with the results of the mRNA expression; namely, CDK1 and CDC20 are significantly upregulated in LUSC (Figures 7E,F). The detailed results of IHC of CDK1 and CDC20 are illustrated in Supplementary Table 3.

The External Validation of Basic Experiments
The mRNA expression levels of CDK1 and CDC20 in seven pairs of randomly selected tissues are showed, and CDK1 and Abbreviations: FDR, false discovery rate; GSEA, gene set enrichment analysis; NES, normalized enrichment score; NOM, nominal.
Frontiers in Molecular Biosciences | www.frontiersin.org July 2021 | Volume 8 | Article 653805 FIGURE 7 | The expression levels of hub genes in LUSC based on TCGA and THPA databases. CDK1 (A) and CDC20 (C) are obviously upregulated in LUSC tissues. CDK1 is significantly associated with clinical stages (B), and CDC20 is significantly correlated with T stages (D). Immunohistochemistry (IHC) results reveal that CDK1 (E) and CDC20 (F) are overexpressed in LUSC tissues compared with normal tissues.
CDC20 are significantly higher compared with surrounding normal tissues (Figures 8A,B). Also, we investigate the distribution and expression of CDK1 and CDC20 proteins in the 7 pairs of randomly selected tissues. Western blot analysis is highly consistent with the outcomes of mRNA levels, namely, CDK1 and CDC20 are highly expressed in LUSC than those in normal lung tissues (p < 0.05; Figures  8C,D). The representative image shows that CDK1 and CDC20 protein are mainly distributed in the cytoplasm/ membrane, and both have high expression in LUSC ( Figures 8E,F).

DISCUSSION
Three gene expression profiles were analyzed in the present study, and 1,366 DEGs were acquired, including 669 downregulated and 697 upregulated genes. These DEGs are FIGURE 8 | The validation of basic experiments. Real time-qPCR manifests that CDK1 (A) and CDC20 (B) in LUSC tissues are higher than normal lung tissues at the mRNA expression level. Western blot reveals that CDK1 and CDC20 in LUSC tissues are obviously higher than normal lung tissues at the protein expression level (C,D). Immunohistochemistry (IHC) results reveal that CDK1 (E) and CDC20 (F) are overexpressed in LUSC tissues compared with normal tissues.
Frontiers in Molecular Biosciences | www.frontiersin.org July 2021 | Volume 8 | Article 653805 12 mainly enriched in several pathways correlated with cell cycle, including nuclear division, cell cycle, and DNA replication. Through further selection, 17 hub genes were selected. Heatmaps indicate these hub genes are upregulated in the LUSC tissues. CDK1 and CDC20 are found to be survival-related hub genes. Public databases and basic experiments confirm that CDK1 and CDC20 are upregulated in LUSC at mRNA and protein levels.
CDK1 promotes cell entry into mitosis by binding to cyclin B to form cyclin B-CDK1 (Hiraoka et al., 2019). CDK1 is upregulated in various cancers, including LUSC, LUAD, and liver cancer (Stauffer et al., 2017). Many studies reported the important role of CDK1 in the malignancy of lung cancer. Singh et al. (2007) demonstrated that HEI10 negatively regulates cell motility and invasion by inhibiting cyclin B-CDK1 in tumor cells (Singh et al., 2007). Long noncoding RNA (lncRNA) CASC11 contributes to carcinogenesis and progression of lung cancer by targeting the miRNA-302/CDK1 axis (Tong et al., 2019). Similarly, lncRNA OR3A4 is involved in apoptosis, cell cycle, and cisplatin resistance by upregulating CDK1 in NSCLC (Shang et al., 2019). Autocrine IL-10 regulates the Op18/ stathmin signaling via the IL-10-NF-κB-ERK/CDK1 axis to initiate carcinogenesis in NSCLC . Furthermore, some studies demonstrated that CDK1 may be a therapeutic target for NSCLC. Iron-dependent CDK1 contributes to tumorigenesis in lung cancer via the GP130/ STAT3 axis, and the inhibition of CDK1 reduces carcinogenicity of the tumor cells suggesting the therapeutic potential of CDK1 in lung cancer (Kuang et al., 2019). Chen et al. suggested that the purvalanol A (CDK1 inhibitor) can increase the cytotoxicity of paclitaxel via Op18/stathmin in NSCLC . Shi et al. demonstrated that miRNA-181a suppresses cell proliferation by targeting CDK1 in NSCLC (Shi et al., 2017). Moreover, CDK1 may be used to predict prognosis because high expression of CDK1 is associated with a worse prognosis in NSCLC (Liu et al., 2018). The results of survival analysis in this study are consistent with their findings.
CDC20 encodes an important spindle assembly checkpoint (SAC) protein that can activate the anaphasepromoting complex (APC/C) (Fujimitsu and Yamano, 2020). A mistake during the segregation of sister chromatids prevents mitotic arrest due to abnormal levels of CDC20 thus contributing to premature anaphase and causing aneuploidy, which is associated with malignant transformation of the tumor cells (Lara-Gonzalez et al., 2019). The elevated expression of CDC20 is often observed in cancer, including NSCLC and breast cancer . Numerous studies demonstrated that CDC20 plays an important role in the development of NSCLC. CDC20 participates in the pathogenesis of NSCLC, and elevated expression of CDC20 is associated with higher tumor grade and stage (Gayyed et al., 2016). Zhang Y et al. demonstrated that CDC20 is involved in the occurrence and progression of NSCLC (Zhang et al., 2015). Additionally, several studies have reported the associations between CDC20 and prognosis in NSCLC. The abnormal expression of CDC20 is closely associated with worse survival in NSCLC patients (Zhang et al., 2015). Similarly, Kato et al. (2012) demonstrated that elevated expression of CDC20 correlates with pleural invasion and inferior 5-year OS in 362 NSCLC cases (Kato et al., 2012). The results of the survival analysis in the present study confirm the associations of CDC20. CDC20 is a potential target for the treatment of cancer. Knockdown of CDC20 via small interfering RNA (siRNA) inhibits the growth of the tumor cells, leads to the accumulation of the cells in the G2/ M-phase, and improves the cytotoxicity of paclitaxel (Taniguchi et al., 2008). External validation suggests that CDK1 and CDC20 are highly expressed in LUSC through public databases and basic experiments. Thus, CDK1 and CDC20 may play important roles in carcinogenesis and development of LUSC and may be the candidate therapeutic targets and prognostic biomarkers in LUSC.
Several limitations in our study are listed as follows. First, only three datasets are included in our analysis. Although some datasets are similar, they fail to meet the selection criteria described above. To reduce the bias of our study, we excluded these datasets. Additional relevant investigations are needed to further elucidate the role of CDK1 and CDC20 in LUSC. Second, the included datasets fail to provide detailed survival information, and we had to perform the overall survival analysis of the hub genes using the K-M plotter.

CONCLUSION
Our study was performed to identify DEGs associated with the tumorigenesis and invasion of LUSC. Seventeen hub genes were selected from 1,366 DEGs, and CDK1 and CDC20 were shown to be correlated with the prognosis of LUSC. Furthermore, public databases and basic experiments validated the upregulated status of CDK1 and CDC20. CDK1 and CDC20 may be potential therapeutic and prognostic indicators of LUSC. Additional studies are demanded to investigate the biological associations of the two genes with LUSC.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Medical Ethics Committee of Zhejiang Cancer Hospital (IRB-2020-817). The patients/participants provided their written informed consent to participate in this study.