A Combined Prediction Model for Lymph Node Metastasis Based on a Molecular Panel and Clinicopathological Factors in Oral Squamous Cell Carcinoma

Objective Lymph node metastasis is the most important factor influencing the prognosis of oral squamous cell carcinoma (OSCC) patients. However, there is no proper method for predicting lymph node metastasis. This study aimed to construct and validate a preoperative prediction model for lymph node metastasis and guide personalized neck management based on the gene expression profile and clinicopathological parameters of OSCC. Methods Based on a previous study of related genes in OSCC, the mRNA expression of candidate genes was evaluated by real-time PCR in OSCC specimens. In this retrospective study, the gene expression profile and clinicopathological parameters of 112 OSCC patients were combined to construct the best prediction model for lymph node metastasis of OSCC. The model was validated with 95 OSCC samples in this study. Logistic regression analysis was used. The area under the curve (AUC) ultimately determined the diagnostic value of the prediction model. Results The two genes CDKN2A + PLAU were closely related to lymph node metastasis of oral squamous cell carcinoma. The model with the combination of CDKN2A, PLAU, T stage and pathological grade was the best in predicting lymph node metastasis (AUC = 0.807, 95% CI: 0.713-0.881, P=0.0001). The prediction model had a specificity of 96% and sensitivity of 72.73% for stage T1 and T2 OSCC (AUC = 0.855, 95% CI: 0.697-0.949, P=0.0001). Conclusions High expression of CDKN2A and PLAU was associated with lymph node metastasis in OSCC. The prediction model including CDKN2A, PLAU, T stage and pathological grade can be used as the best diagnostic model for lymph node metastasis in OSCC.


INTRODUCTION
Oral cancer is a common malignant tumor that occurs in oral epithelial tissue and among these tumors, more than 90% are OSCC (1). OSCC has a propensity for occult nodal metastasis in the early stage, which is the most important factor influencing patient prognosis (2)(3)(4)(5)(6). Statistics have shown that the 5-year survival rate of OSCC is 50% to 60%; unfortunately, the presence of just one metastatic lymph node designates patients to an advanced stage disease category and has been shown to confer a 50% decrease in long-term survival (7). Therefore, many studies have suggested that elective neck dissection should be performed for all early-stage cN0 OSCC (2,6). However, clinical practice clearly shows that approximately 70% of early-stage OSCC patients undergo needless neck dissections (8). To formulate individualized surgical treatment for different OSCC patients, an accurate method to judge lymph node metastasis needs to be urgently explored (9).
Many studies have found that OSCC is a polygenic disease, and gene expression profiling technology has made high-throughput gene analysis possible (10)(11)(12)(13). Researchers can obtain the gene expression characteristics of a certain type of tumor by analyzing the gene expression profiles of tumor samples (14). Our previous research detected the expression of 22 candidate genes and 1 housekeeping gene in 120 OSCC tissue samples and 120 normal tissue samples at the mRNA level using real-time PCR (15). Statistical methods were used to analyze and determine the differentially expressed genes related to lymph node metastasis in OSCC. Cyclin-dependent kinase inhibitor 2A (CDKN2A) and urokinase-type plasminogen activator (PLAU) were closely correlated with lymph node metastasis in OSCC.
In this study, we conducted a retrospective study and an independent prospective large-sample study of tumor tissues, with tumor classification data based on the latest AJCC 8th edition guidelines. The predictive value of candidate gene expression for lymph node metastasis was validated. Furthermore, the best diagnostic model for lymph node metastasis, which included CDKN2A, PLAU and other clinicopathologic parameters, was analyzed.

Comparison of CDKN2A and PLAU mRNA Levels Between Cancerous and Normal Tissues From the Online Oncomine and GEPIA Databases
The mRNA expression data of oral cavity SCC were downloaded from the online Oncomine database (https://www.oncomine.org/). Differences in CDKN2A and PLAU expression between tumor and normal tissues were analyzed using independent sample t tests. The mRNA expression of the CDKN2A and PLAU genes in HNSCC/normal tissues was also analyzed using the online GEPIA database (http://gepia.cancer-pku.cn/).

Prognostic Analyses of CDKN2A and PLAU Expression From the GEPIA and HUMAN PROTEIN ATLAS Databases
Data on the association between the two genes and disease-free survival were obtained from the online GEPIA database. The associations of CDKN2A/PLAU protein expression with 5-year overall survival for HNSCC were analyzed using the head and neck cancer interactive survival scatter plot and survival analysis tool from the Human Protein Atlas database (https://www.proteinatlas.org/).

Protein-Protein Interaction (PPI) Networks and Gene Set Enrichment Analysis (GSEA) of CDKN2A and PLAU From the STRING Database and GSEA Database
The PPI networks of CDKN2A and PLAU were analyzed using the STRING database (https://string-db.org/). The principal PPI networks between the two proteins were determined. GSEA of CDKN2A and PLAU was performed using the GSEA database (https://www.gsea-msigdb.org/gsea/index.jsp).

Patient Samples
In this retrospective study, 112 OSCC tissue specimens were selected from the Department of Oral In this study, the inclusion criteria for eligible patients were as follows: (1) a pathological diagnosis of squamous cell carcinoma; (2) a tumor located in the tongue, lower gingiva, upper gingiva, buccal mucosa, floor of the mouth, or hard palate; (3) a primary tumor without evidence of distant metastasis; (4) radical resection of the primary tumor with or without neck dissection; (5) no previous treatment such as neoadjuvant chemotherapy or prior radiotherapy; (6) complete clinicopathological data, follow-up data and available tissue specimens; and (7) provision of informed consent. The exclusion criteria were as follows: (1) malignancies in other organs; and (2) a request to withdraw from the study.

Real-Time PCR
After performing RNA extraction and reverse transcription on 95 fresh tissue samples of oral squamous cell carcinoma, the expression of the CDKN2A and PLAU genes was detected by real-time PCR. The β-actin housekeeping gene was used as an internal reference. All assays were carried out in triplicate.
According to the protocol provided by the manufacturer, predenaturation was first performed for 30 seconds (95°C), followed by denaturation for 5 seconds (95°C) and annealing and extension for 30 seconds (60°C), for a total of 40 cycles. The primer sequences are available in Supplemental Table 1.

Sample Size Calculation
The sample size was calculated as follows. According to the conclusions of our previous retrospective study, the rate of delayed neck metastasis after OSCC surgery was approximately 50.0%; therefore, the ratio of metastatic to nonmetastatic disease was approximately 1:1. The accuracy of conventional clinical and imaging examinations of the neck for diagnosing metastasis of OSCC is approximately 60%, while a previous retrospective study found that the accuracy of predicting neck metastasis can be increased by 15%, to approximately 75%, by the addition of molecular information. Applying a 1:1 ratio of OSCC with and without actual neck metastasis after surgery, PASS 15.0 software was used with the following settings: One ROC Curve Power Analysis (AUC0: 0.6; AUC1: 0.75), two-sided test, α = 0.05 (probability of type I error ≤5%), and β = 0.20 (power of test ≥80%). The minimum effective sample size to predict lymph node metastasis was 102 (51 patients in the metastatic group and 51 patients in the nonmetastatic group). Considering the possibility of participant loss during clinical trials due to factors such as noncompliance, loss to follow-up, and accidental death, which would reduce the effective number of observed subjects, the theoretical sample size was increased by 5%; thus, the sample size of this study was set to at least 108 cases.
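The logic of this calculation can be sketched in a few lines of Python. This is an illustration only: it assumes the Hanley and McNeil (1982) approximation for the variance of an AUC, which is not necessarily the method PASS 15.0 applies, so the per-group size it returns need not equal the 51 reported above. The function names are hypothetical. The last function reproduces the 5% inflation for participant loss, which does take 102 to 108.

```python
import math
from statistics import NormalDist


def hanley_mcneil_variance(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate variance of an AUC estimate (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    numerator = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    )
    return numerator / (n_pos * n_neg)


def per_group_size(auc0: float, auc1: float, alpha: float = 0.05,
                   power: float = 0.80, max_n: int = 10_000) -> int:
    """Smallest equal group size for a two-sided test of AUC0 against AUC1.

    Assumes a 1:1 ratio of metastatic to nonmetastatic cases.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    for n in range(2, max_n + 1):
        se0 = math.sqrt(hanley_mcneil_variance(auc0, n, n))
        se1 = math.sqrt(hanley_mcneil_variance(auc1, n, n))
        if z_alpha * se0 + z_beta * se1 <= abs(auc1 - auc0):
            return n
    raise ValueError("no group size up to max_n reaches the requested power")


def inflate_for_dropout(n_total: int, rate: float) -> int:
    """Increase a total sample size by an expected loss rate, rounding up."""
    return math.ceil(n_total * (1.0 + rate))


if __name__ == "__main__":
    print("per-group size (Hanley-McNeil approximation):",
          per_group_size(0.60, 0.75))
    print("total after 5% inflation of 102:", inflate_for_dropout(102, 0.05))
```

Under this approximation the per-group size lands close to the reported value; small differences are expected because different software packages use different variance formulas for the AUC.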

Statistical Analysis
The flowchart of the whole study is shown in Figure 1. All calculations and analyses were performed using SPSS 25.0 Statistical Package for Windows (IBM Corp., Armonk, NY). The PCR data of 95 OSCC samples were standardized by the ΔCt method. The expression level was defined as $2^{-\Delta Ct}$, where $\Delta Ct = Ct_{\text{target gene}} - Ct_{\text{housekeeping gene}}$. To analyze the ROC curve under logistic regression, a log transformation with base 2 was performed.
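As a minimal sketch of this normalization, the following Python functions compute ΔCt, the relative expression level, and its base-2 logarithm for one sample. The function names and the Ct values in the usage example are illustrative assumptions, not data from the study; averaging the triplicate wells before normalization is also an assumption about how the replicates were combined.

```python
import math
from statistics import mean


def mean_ct(replicates: list[float]) -> float:
    """Average the Ct values of replicate wells (assumed handling of triplicates)."""
    return mean(replicates)


def delta_ct(ct_target: float, ct_housekeeping: float) -> float:
    """Delta Ct = Ct(target gene) - Ct(housekeeping gene)."""
    return ct_target - ct_housekeeping


def relative_expression(ct_target: float, ct_housekeeping: float) -> float:
    """Expression level defined as 2 to the power of minus Delta Ct."""
    return 2.0 ** (-delta_ct(ct_target, ct_housekeeping))


def log2_expression(ct_target: float, ct_housekeeping: float) -> float:
    """Base-2 log of the expression level, which equals minus Delta Ct."""
    return math.log2(relative_expression(ct_target, ct_housekeeping))


if __name__ == "__main__":
    # Hypothetical Ct values for one sample.
    ct_plau = mean_ct([24.0, 25.0, 26.0])
    ct_actin = mean_ct([20.0, 20.0, 20.0])
    print(relative_expression(ct_plau, ct_actin))
    print(log2_expression(ct_plau, ct_actin))
```

Because the log transformation with base 2 simply returns the negative ΔCt, the logistic regression effectively operates on ΔCt values with the sign reversed.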
Lymph node metastasis was defined as positive cervical lymph nodes reported after neck dissection or delayed neck metastasis during the follow-up period of this study. All potential prognostic factors with P values < 0.05 from the univariate analysis were incorporated into the multivariate analyses. The hazard ratios with corresponding 95% confidence intervals (CIs) and P values are reported. Logistic regression and the area under the ROC curve were used to compare and analyze the different combinations of genes and clinicopathological parameters. The ROC curve was generated by plotting the sensitivity against the false-positive rate (100 - specificity), and the area under the curve (AUC) was calculated. The AUC ultimately determined the diagnostic value of the prediction model. For AUCs > 0.5, the closer the AUC is to 1, the higher the diagnostic efficiency.
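The ROC construction described above can be illustrated with a short, dependency-free Python sketch. It is not the SPSS procedure used in the study. The AUC is computed here through its rank interpretation (the probability that a metastatic case scores higher than a nonmetastatic one, with ties counted as one half), and the function names and toy data are assumptions made for illustration.

```python
def auc_pairwise(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for lymph node metastasis and 0 for no metastasis.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """Return (false-positive rate, sensitivity) pairs, one per cutoff.

    A sample is called positive when its score is at or above the cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Toy scores and outcomes, not study data.
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    toy_labels = [1, 1, 0, 1, 0, 0]
    print(auc_pairwise(toy_scores, toy_labels))
    print(roc_points(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to ranking pairs no better than chance, which is why values closer to 1 indicate higher diagnostic efficiency.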

mRNA Expression and Prognostic Value of CDKN2A and PLAU From Online Databases
The expression of both the CDKN2A and PLAU genes was upregulated in cancer tissues compared with that in normal tissues in both the Oncomine database and the GEPIA database. Specifically, an independent sample t test was performed on mRNA expression data of the CDKN2A and PLAU genes between OSCC (only oral cavity cancer was selected from HNSCC) and normal tissues. There were significant differences in the expression of the two genes between OSCC and normal tissues in the Oncomine database (P < 0.01) (Figure 2A). Moreover, mRNA expression of the CDKN2A and PLAU genes in HNSCC was upregulated compared with that in normal tissues in the GEPIA database (P < 0.01) (Figures 2B, C).
However, the results from the survival map showed that the prognostic significance of CDKN2A expression and that of PLAU expression in HNSCC were clearly different (Figure 2D). In the GEPIA database, high expression of CDKN2A was associated with good disease-free survival in HNSCC patients (P < 0.05) (Figure 2E). PLAU was highly expressed in HNSCC, and its high expression was associated with poor disease-free survival (P < 0.05) (Figure 2F).

PPI Networks Between CDKN2A and PLAU Based on the STRING Database and GSEA of CDKN2A and PLAU
According to the analysis of the STRING database, the PPI networks between CDKN2A and PLAU included cyclins, cell cycle regulation, extracellular matrix organization and the PI3K-AKT pathway (Supplemental Figure 1).
Analysis of the GSEA database identified the most important gene sets for CDKN2A (cell cycle pathway, Supplemental Figure 2A) and for PLAU (including the PI3K-AKT pathway, Supplemental Figure 2B).

Retrospective Data
First, logistic regression analysis was performed for PLAU and CDKN2A. The AUC of PLAU was 0.732 with a 95% CI of 0.640-0.811, sensitivity of 74.65% and specificity of 63.41% (Figure 3A). The AUC of CDKN2A was 0.602 with a 95% CI of 0.506-0.694, sensitivity of 53.52% and specificity of 68.29% (Figure 3B).
Second, logistic regression analysis was performed for traditional T stage and pathological grade. Similarly, the AUC of pathological T stage was 0.613 with a 95% CI of 0.516-0.704, sensitivity of 66.20% and specificity of 58.54% (Figure 3C). The AUC of pathological grade was 0.635 with a 95% CI of 0.539-0.724, sensitivity of 90.14% and specificity of 29.27% (Figure 3D).
mRNA expression data of the CDKN2A and PLAU genes in 112 OSCC samples and the clinicopathological parameters (pathological T stage and grade) of the patients were used to construct a prediction model for lymph node metastasis in OSCC. We performed receiver operating characteristic (ROC) curve analysis on the real-time PCR data and determined the AUC and the corresponding P values from a Wilcoxon signed rank test. Additionally, we employed logistic regression analysis to identify the best combination of multiple diagnostic factors. The AUC of the final combination was 0.802 with a 95% confidence interval (CI) of 0.716-0.871, sensitivity of 80.28% and specificity of 70.73%. The corresponding ROC curve is shown in Figure 3E. The logistic regression equation is shown in Table 3.
[...] (Figure 4A). The AUC of CDKN2A was 0.671 with a 95% CI of [...] (Figure 4B). The AUC of pathological T stage was 0.598 with a 95% CI of 0.493-0.698, sensitivity of 75.56% and specificity of 50.00% (Figure 4C). The AUC of pathological grade was 0.557 with a 95% CI of 0.451-0.659, sensitivity of 93.33% and specificity of 14.00% (Figure 4D). The AUC for the prediction model including CDKN2A and PLAU mRNA expression, T stage and pathological grade of OSCC was 0.807 with a 95% CI of 0.713-0.881, sensitivity of 68.89% and specificity of 80.00% (Figure 4E). [...] Table 4. [...] (Figure 4F). This result indicated that the diagnostic performance of the prediction model was optimal for early-stage OSCC patients.
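To make the combined model concrete, the sketch below shows how a logistic regression over the four predictors turns one patient's values into a predicted probability, and how sensitivity and specificity follow from a chosen cutoff. The coefficients are hypothetical placeholders, not the fitted values from the tables of this paper, and selecting the cutoff by the Youden index is an assumption, since the cutoff rule is not stated in the text.

```python
import math

# Hypothetical coefficients for illustration only; the fitted equation is
# reported in the paper's tables and is NOT reproduced here.
EXAMPLE_COEFFS = {
    "intercept": -2.0,
    "log2_cdkn2a": 0.3,
    "log2_plau": 0.6,
    "t_stage": 0.5,
    "grade": 0.7,
}


def predicted_probability(log2_cdkn2a: float, log2_plau: float,
                          t_stage: int, grade: int,
                          coeffs: dict = EXAMPLE_COEFFS) -> float:
    """Logistic model: probability of lymph node metastasis for one patient."""
    z = (coeffs["intercept"]
         + coeffs["log2_cdkn2a"] * log2_cdkn2a
         + coeffs["log2_plau"] * log2_plau
         + coeffs["t_stage"] * t_stage
         + coeffs["grade"] * grade)
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(probs: list[float], labels: list[int],
                            cutoff: float) -> tuple[float, float]:
    """Call a patient positive when the probability is at or above the cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(probs: list[float], labels: list[int]) -> float:
    """Cutoff maximizing sensitivity + specificity - 1 (assumed selection rule)."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probs)):
        sens, spec = sensitivity_specificity(probs, labels, cutoff)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff


if __name__ == "__main__":
    # Toy predicted probabilities and outcomes, not study data.
    toy_probs = [0.9, 0.6, 0.4, 0.2]
    toy_labels = [1, 1, 0, 0]
    cutoff = youden_cutoff(toy_probs, toy_labels)
    print(cutoff, sensitivity_specificity(toy_probs, toy_labels, cutoff))
```

Raising the cutoff trades sensitivity for specificity, which is one way a model can reach a high specificity in a subgroup while its sensitivity remains moderate.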

DISCUSSION
The main feature of OSCC metastasis is that it spreads readily along draining lymphatics, whereas metastasis to distant sites is relatively uncommon (16). However, elective neck dissection for all patients with cN0 OSCC remains controversial. Traditional T staging and pathological grading increasingly show their limitations in predicting metastasis. Many novel pathological factors have been considered as potential approaches to predict the risk of regional recurrence, in particular the number of positive lymph nodes and the lymph node ratio (17)(18)(19)(20). However, both variables can be obtained only after the patient has undergone neck dissection, so they can only be used to judge the risk of postoperative neck recurrence. They do not apply to the prospective decision of whether the patient should undergo neck dissection or a neck observation strategy. Accordingly, the influence of biological heterogeneity on OSCC metastasis is increasingly recognized (21). In recent years, many biomarkers have been developed to predict lymph node metastasis, but no single marker or panel of markers in the field of OSCC can be widely applied in clinical practice (22). Integrating biomarkers with clinicopathological variables for joint prediction is a more feasible strategy (23,24). With the update to the AJCC 8th edition guidelines in 2017, the novel T stage, which fully considers the added value of depth of invasion for the clinical staging of early tumors, was believed to improve predictive discrimination over that of the AJCC 7th edition guidelines (25,26). Pathological grade is an important part of routine pathology reports despite its controversial prognostic value (27). Our recent study found that pathological grade had independent prognostic value in early-stage OSCC but not in advanced-stage OSCC (28).
In the prospective dataset, the T stage (AJCC 8th edition guidelines) and pathological grade were included in the predictive model as the crucial traditional variables. Although the two variables partially reflect a predictive value for lymph node metastasis, neither variable alone nor the two in combination can achieve very good discrimination. Therefore, the inclusion of highly effective biomarkers is a key factor in establishing a metastasis prediction model. As expounded by Yalniz et al. (12), the metastasis of OSCC can be predicted, and multiple accurate prediction profiles can be obtained by using various predictive gene subsets. Our previous study explored a molecular diagnostic method based on real-time quantitative PCR technology to determine lymph node metastasis in OSCC. CDKN2A and PLAU were identified as genes closely related to metastasis of OSCC. The mRNA expression of the CDKN2A and PLAU genes in OSCC tumor tissues was higher than that in normal tissues. The CDKN2A gene encodes the tumor suppressor protein p16, which prevents phosphorylation of retinoblastoma protein and thus halts cell cycle progression from G1 to S phase (29). Downregulated expression, inactivation or copy number deletion of CDKN2A is a frequent event in the development of OSCC and is related to the occurrence, development and prognosis of OSCC (29)(30)(31)(32)(33). In addition, a CDKN2A/p16 (+) status in head and neck cancer was found to be strongly predictive of poorly differentiated tumors (34). The above findings showed that CDKN2A was associated with an increased clinical stage and histological differentiation of OSCC. PLAU, which belongs to the S1 serine peptidase family of clan PA, is a proteinase involved in the conversion of plasminogen to plasmin, and it can hydrolyze proteins related to extracellular matrix remodeling and activate growth factors (35).
Extracellular matrix organization and the PI3K-Akt signaling pathway may be involved in the possible mechanism of PLAU's function in OSCC (35). PLAU and its receptor were upregulated in tumor cells and were associated with tumor proliferation, migration and metastasis (36)(37)(38)(39)(40).
A recently published study based on the Gene Expression Omnibus (GEO) and TCGA databases identified and validated a set of robust prognostic signatures, including PLAU, CLDN8 and CDKN2A, that could predict overall survival in OSCC patients (41). Gene ontology (GO) enrichment analysis, ingenuity pathway analysis (IPA), PPI network analysis and survival analysis characterized their three-gene signature and identified several pathways that play important roles in regulating the initiation and development of OSCC. Because few genes overlap among the findings of different gene expression profiling studies with similar purposes (42), this key study also indirectly indicated that our predictive markers can be replicated between different studies. What differentiates our study from previous research is that it combined biomarkers with clinicopathological characteristics and used tissue samples that we collected to confirm that this model has a good ability to predict lymph node metastasis in early OSCC. The direct or indirect PPI networks between CDKN2A and PLAU include cyclins, cell cycle regulation, extracellular matrix organization and the PI3K-AKT pathway, which regulate proliferation, invasion and metastasis.
In this prospective study, T stage was defined according to the latest AJCC 8th edition guidelines. The AUC for the prediction model was 0.807 with a sensitivity of 68.89% and specificity of 80.00%. The specificity of the predictive model for lymph node metastasis in OSCC tumor tissues increased by nearly 10 percentage points compared with that in the retrospective study. Furthermore, the specificity of the model increased to 96% for T1 and T2 stage OSCC tumor tissues. In other words, the true-negative rate of the prediction model was 96%; thus, these low-risk patients did not need to undergo neck dissection. This avoids wasting health-care resources and improves quality of life.
The present study included a retrospective training set and a prospective validation set. The main limitation of the study was that the T stages in the training set were classified according to the AJCC 7th edition classification. DOI and extranodal extension data were also partially missing. However, the research was a two-center study, and in both independent samples of OSCC, the model achieved high predictive efficiency for lymph node metastasis, especially in early-stage disease. Thus, the study has good external validity. These limitations will be given further consideration in future studies.

CONCLUSIONS
Our study demonstrates that this prediction model has considerable clinical value for the accurate diagnosis of lymph node metastasis of OSCC. Before the model is applied in clinical practice, a randomized controlled trial is still needed.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Beijing Stomatological Hospital (CMUSH-IRB-KJ-PJ-2020-12). The patients/participants provided their written informed consent to participate in this study.