A 15-Gene Signature and Prognostic Nomogram for Predicting Overall Survival in Non-Distant Metastatic Oral Tongue Squamous Cell Carcinoma

Background Oral tongue squamous cell carcinoma (OTSCC) is a devastating tumor with poor prognosis. There is an urgent need for reliable biomarkers to help predict prognosis and guide treatment for OTSCC. In the current study, we aimed to develop a robust multi-gene signature and prognostic nomogram to predict the prognosis of patients with non-distant metastatic OTSCC. Methods OTSCC-related differentially-expressed genes were screened from The Cancer Genome Atlas (TCGA) database. Univariate Cox regression based on 1,000 bootstrap replicates, LASSO regression and stepwise multivariate Cox regression were utilized to develop a novel multi-mRNA signature for predicting overall survival in OTSCC. The concordance index, area under receiver operating characteristic (ROC AUC) and calibration curve were employed to assess the prediction capacity of the novel multi-gene model. In addition, a prognostic nomogram was constructed to facilitate the clinical use of the fitted model. The Kaplan-Meier with log-rank test was employed to assess differences in overall survival. Results We successfully established a novel 15-mRNA prognostic model for predicting overall survival of non-distant metastatic OTSCC, involving ADTRP, ITGA3, RFC4, CCDC96, CYP2J2, NELL2, SPHK1, SPAG16, HBEGF, S100A9, EGFL6, ADGRG6, PDE4D, ABCA4, and CTTN. The prediction ability of this 15-gene signature was independent of other clinicopathological factors, with an HR of 11.5 (95% CI: 4.70–28.3). Moreover, internal validation by bootstrap analysis yielded a C-index of 0.849, with a 3-year AUC of 0.907 and 5-year AUC of 0.944, which implied excellent prediction accuracy of the fitted model. In addition, external validation by using the GEO dataset (GSE41116) yielded a C-index of 0.804, with a 3-year AUC of 0.868 and 5-year AUC of 0.855, which also indicated good prediction ability of the 15-gene model. Finally, a prognostic nomogram integrating risk group, grade, T stage and N stage was established. Conclusion Our results demonstrate our 15-gene signature was independently associated with overall survival in non-distant metastatic OTSCC. Moreover, the prognostic nomogram integrating the 15-gene signature and clinicopathological factors has potential to be developed as a prognostic tool.


INTRODUCTION
Oral tongue squamous cell carcinoma (OTSCC) is the most common cancer in the oral cavity, comprising 90% of all cases of oral malignancies (1,2). Despite advances in treatment for OTSCC, the 5-year overall survival rate remains poor (3). Although the TNM staging system is of great value for risk stratification of OTSCC, it has imprecise prognostic predicting capability to guide patient management due to the heterogeneity of the disease characteristics. Thus, new prognostic models are essentially needed to identify the high-risk patients with OTSCC and consequently improve personalized treatment. Moreover, a precise and quantitative prognostic model is essential and valuable for patient counseling.
An increasing amount of evidence has demonstrated that aberrant expression of certain messenger RNAs (mRNA) is closely related to the prognosis of cancer patients and could be used as molecular biomarkers for the prognostic evaluation and identification of potential high-risk patients (4). In recent years, integration of an increasing number of molecular biomarkers into a single model is being established for prognostic evaluation (5)(6)(7)(8). However, a multi-mRNA prognostic signature and nomogram for OTSCC patients are still lacking.
The Cancer Genome Atlas (TCGA) is one of the prominent databases containing genomic, transcriptomic, epigenetic, proteomic, and clinical information for different types of cancer. In the current study, we identified significant cancer-related genes that are differentially-expressed between non-distant metastatic OTSCC samples and normal tongue tissue samples in TCGA. Then, we utilized univariate Cox regression based on 1,000 bootstrap replicates, the least absolute shrinkage and selection operator (LASSO) regression model and the multivariate Cox regression model to establish a reliable and precise multi-mRNA-based prognostic signature for predicting overall survival of non-metastatic OTSCC patients. Simultaneously, a prognostic nomogram was established to facilitate the clinical use of the multi-mRNA signature. In addition, the C-index, calibration plot and time-dependent ROC curve were used to evaluate the ability of the model to predict prognosis.

Data Source and Identification of Differentially Expressed Oncogenes
First, for the training set, clinical and gene expression data of 13 normal tongue tissues and 127 OTSCC tissues was downloaded from TCGA database (https://www.cancer.gov/about-nci/ organization/ccg/research/structural-genomics/tcga). For the external validating set, gene expression data of 28 OTSCC tissues was downloaded from the GEO database (GSE41116) (https://www.ncbi.nlm.nih.gov/gds/?term=GSE41116). Clinical data of these 28 OTSCC patients was downloaded from cBioPortal for Cancer Genomics (https://www.cbioportal.org). Next, differential expression analysis was employed to identify differentially-expressed genes based on mRNA expression differences having an absolute |logFC| >1 and p < 0.05. In addition, any differentially-expressed genes with an average expression value less than 100 were removed. Then, oncogenic signature enrichment analysis was performed using Metascape (http://metascape.org/gp/index.html#/main/step1) to identify the differentially-expressed cancer-related genes. These differentially-expressed cancer-related genes served as candidate genes for further regression analysis and prognostic model building.

Establishment and Validation of the 15-Gene Signature Prognostic Model and Nomogram
In the present study, the correlation between overall survival (OS) and differentially-expressed cancer-related gene expression levels were studied using univariate Cox regression based on 1,000 bootstrap replicates. Those differentially-expressed cancerrelated genes found to be statistically significant factors by univariate analysis were included in the LASSO-penalized Cox regression analysis to screen for independent prognostic genes and avoid over fitting. Then, stepwise multivariate Cox hazard ratio regression was performed to select the best model for predicting overall survival of non-distant metastatic OTSCC. Subsequently, an mRNA-based prognostic risk score was calculated by combining the expression values of prognostic genes and their regression coefficients from the multivariate Cox regression model. The risk-score calculation formula was as follows: Risk score = S n i=1 Expi * bi Here, n represents the number of prognostic genes, Expi represents the expression value of prognostic gene i, and bi represents the regression coefficient of gene i from the multivariate Cox regression analysis. By using the median risk score as a cutoff value, OTSCC patients were divided into highrisk and low-risk groups, and the association between the prognostic gene signature and overall survival was investigated based on the risk groups.
A time-dependent receiver operating characteristic (ROC) curve was constructed and the area under the curve (AUC) was calculated to assess the predictive power of the prognostic model. Internal validation and external validation for the prognostic gene signature were performed using the C-index, and a calibration curve was drawn based on the bootstrap method. Co-expression analysis was employed to assess the collinear relationship of the prognostic genes. Finally, a nomogram for predicting overall survival of a M0 OTSCC patient was constructed using the risk group based on the prognostic gene model and clinical prognostic factors such as histologic grade and clinical stage. The calibration curve was studied to assess the performance of the prognostic nomogram.

Statistical Analysis
All statistical analyses were performed with R software and packages from the Bioconductor project. Univariate Cox regression based on 1,000 bootstrap replicates was performed to identify single significant factors of survival outcomes. Differentially-expressed genes found to be statistically significant by univariate analysis were included in the LASSOpenalized Cox regression analysis to screen for independent prognostic genes. Then, stepwise multivariate Cox hazard ratio regression was performed to establish a prognostic gene signature for predicting overall survival of OTSCC. The area under the ROC curve (AUC) was used as an accuracy index to identify the best combination of multiple bio-markers. The predictive ability of the 15-gene signature and prognostic nomogram was estimated by using the calibration curve. Overall survival times were estimated by the Kaplan-Meier method. Kaplan-Meier survival curves were generated, and the statistical significance was evaluated by the log rank test. All significance tests were two-sided, and a P < 0.05 was considered significant.

Clinical Characteristics of Oral Tongue Squamous Cell Carcinoma Patient Cohorts in The Cancer Genome Atlas and GEO
For the training cohort, clinical and RNA sequencing data of 13 normal tongue tissues and 127 OTSCC tissues was downloaded from TCGA (https://www.cancer.gov/about-nci/organization/ ccg/research/structural-genomics/tcga). In this cohort, we included 83 male and 44 female patients with a median age of 57 years (range from 19 to 87 years). Fifteen patients were AJCC stage I, 23 patients were stage II, 28 patients were stage III, and 61 patients were stage IV. Moreover, 70 patients had pathologically-positive cervical lymph nodes. All those cases had no distant metastasis. For the external validating cohort, a gene expression matrix of 28 OTSCC tissues was downloaded from GEO (GSE41116) (https://www. ncbi.nlm.nih.gov/gds/?term=GSE41116) (9). Clinical data of these 28 OTSCC patients was downloaded from cBioPortal for Cancer Genomics (https://www.cbioportal.org). The gene expression matrix and clinicopathological characteristics of the validating set is shown in Supplementary Data 1. In this cohort, there were 18 male and 10 female patients with a median age of 60 years (range from 26 to 85 years). Six patients were AJCC stage II, 13 patients were stage III, and 9 patients were stage IV. Consistent with the training cohort, all the cases in the validating cohort had no distant metastasis. Clinicopathological characteristics of the training cohort and testing cohort are showed in Table 1.

Identification of a 15-Gene Signature for Predicting Overall Survival for Non-Distant Metastatic Oral Tongue Squamous Cell Carcinoma
Candidate Screening for Differentially Expressed Cancer-Related Gene In total, 6,111 differentially-expressed genes were identified using edgeR. Then genes with an average expression less than 100 were excluded, resulting in 2,579 differentially-expressed genes being retained. Next, oncogenic signature enrichment analysis by Metascape was performed and 901 differentially-expressed cancer-related genes were identified for further regression analysis.

Univariate Cox Regression Based on 1,000 Bootstrap Replicates
We employed univariate Cox regression based on a bootstrap resampling technique (1,000 replicates) to generate a hazard ratio (HR) with both 95% bias-corrected and accelerated (BCa) bootstrap confidence intervals, as well as the 95% percentage (Perc) bootstrap confidence interval. Both BCa and Perc 95% confidence intervals that did not include zero were considered statistically significant and were incorporated in the LASSO regression analysis. The bootstrap univariate regression results are listed in Supplementary Table 1. In the end, 41 out of 901 differentially-expressed cancer-related genes were identified as prognostic factors for OTSCC ( Table 2). Forty-one differentially-expressed cancer-related genes, selected by univariate Cox regression analysis, were incorporated in the LASSO regression analysis. The regularization parameter (lambda) was set to log.lambda.min. Then 25 genes were considered statistically significant and included into the LASSO regression model (Shown in Figure 1). These 25 genes were incorporated in the stepwise multivariate Cox hazard ratio regression to select the best model for predicting overall survival of OTSCC, and resulted in a 15-gene prognostic signature involving ADTRP, ITGA3, RFC4, CCDC96, CYP2J2, NELL2, SPHK1, SPAG16, HBEGF, S100A9, EGFL6, ADGRG6, PDE4D, ABCA4, and CTTN ( Figure 2A). The results of the stepwise multivariate Cox hazard ratio regression are summarized in Table 3. Co-expression analysis, employed to assess the collinear relationship of the 15-gene prognostic signature, indicated the absence of collinearity problems ( Figure 2B). Subsequently, by using the median risk score of the 15-gene signature as a cutoff value, 127 non-distant metastatic OTSCC patients were divided into a high-risk group (n = 63) and a lowrisk group (n = 64). Kaplan-Meier survival analysis indicated the high-risk group had significantly poorer overall survival than the low-risk group (p < 0.0001) (Figure 3).

Validation of the 15-Gene Signature for Prediction of Non-Distant Metastatic Oral Tongue Squamous Cell Carcinoma Overall Survival
For internal validation, the accuracy and reliability of the 15gene prognostic signature was validated by three different methods. First, the discrimination ability of the 15-gene signature prognostic model was assessed using a C-index. In the present study, the C-index was 0.849, indicating excellent prediction accuracy of the fitted model. Second, the calibration plot-predicted 3-year and 5-year overall survival of M0 OTSCC, built to internally validate the 15-gene signature, performed very well compared with the ideal model ( Figures  4A, B). Finally, the prediction performance of the 15-gene signature for M0 OTSCC patients, evaluated using the area under receiver operating characteristic curve (ROC AUC), showed that our prognostic model provided excellent prediction performance (3-year AUC = 0.907 and 5-year AUC = 0.944) ( Figures 4C, D). For external validation, GSE41116 was employed. Consistent with the training set, survival analysis showed that in the external validating set, the high-risk group had significantly poorer overall survival than the low-risk group (p = 0.0086) ( Figure 5A). Moreover, the external validating set yielded a C-index of 0.804, with a 3-year AUC of 0.868 and 5year AUC of 0.855, which implied excellent prediction accuracy of the fitted model ( Figures 5D, E). In addition, the calibration plot predicted 3-year and 5-year overall survival of non-distant metastatic OTSCC patients and also showed excellent performance when compared with the ideal model ( Figures  5B, C).

The 15-Gene Signature Serves as an Independent Predictor of Overall Survival in Non-Distant Metastatic Oral Tongue Squamous Cell Carcinoma Patients
To evaluate the independent effect of the 15-gene signature on overall survival for M0 OTSCC patients, we used a stepwise multivariate COX regression analysis to adjust clinicopathological parameters, including age, sex, T stage, N stage, and histologic grade. The results indicated that the predictive ability of the 15-gene signature was independent of other clinicopathological factors for overall survival of M0 OTSCC patients (HR = 11.5, 95% CI = 4.70-28.3, P < 0.001) ( Figure 6A).

Establishment and Internal Validation of the Prognostic Nomogram for Predicting Overall Survival in Non-Distant Metastatic Oral Tongue Squamous Cell Carcinoma Patients
To ensure the robustness and practicability of the 15-gene prognostic model, a prognostic nomogram for predicting overall survival in non-distant metastatic OTSCC patients was established ( Figure 6B). Independent prognostic factors, such as risk group, grade, T stage, and N stage, were included in   In recent years, more and more multiple-biomarker signatures and prognostic nomograms have been established and utilized for clinical decision-making and prognostic evaluation for different type of cancers (5,(10)(11)(12). However, few biomarker-based signatures and prognostic nomograms have been studied in relationship to OTSCC (13)(14)(15). After studying tumor characteristics of 1,617 patients with cancer of the oral cavity, Montero et al. constructed a prognostic nomogram for oral cavity cancer patients (16). Their nomogram included several important clinicopathological variables such as age, sex, race, alcohol and tobacco use, oral cavity subsite, invasion of other structures, tumor size, and clinical nodal status, but unfortunately, did not include biomarkers. The concordance index of their model was only 0.67. 15-mRNA signature. The possible reasons for this difference might be as follows. First, Qiu et al. used five GEO data sets as training set to identify differentially expressed genes. Different from Qiu et al., we used TCGA data set as training set. Second, genes with an average expression less than 100 were excluded in our study. Third, we conducted an oncogenic signature enrichment analysis before our prognostic model was established to ensure only cancer-related genes would be selected into our model. When compared to the previous studies, there are several advantages in our study. First, we conducted an oncogenic signature enrichment analysis before our prognostic model was established to ensure only differentially-expressed cancer-related genes would be selected into our model. Thus, we avoided accidental inclusion of differentially-expressed non-cancerrelated genes in order to make our prognostic model more reliable. Second, small sample size could magnify the error of analysis. In the current study, a bootstrap resampling technique was employed during univariate analysis to expand the sample size and make our results more convincing. Third, LASSO regression analysis was utilized in our study before stepwise multivariate Cox regression was used, which was helpful for minimizing genetic collinearity problems. Fourth, our prognostic model yielded a high C-index (C-index = 0.849) and AUC (3-year AUC of 0.907 and 5-year AUC of 0.944), indicating better prognostic prediction than previous models. Fifth, an independent GEO data set was used for external validation, which made our fitted model more reliable. Last but not least, a prognostic nomogram integrating a multi-mRNA signature and clinicopathological variables was established to facilitate the clinical use of our 15-mRNA signature. Several limitations should be considered while reviewing our results. First, there are only 13 normal tongue samples available in TCGA, and the small sample size of normal tongue tissue was underpowered to yield a credible result of differentiallyexpressed gene analysis. Second, only non-distant metastatic cases were included in our study. Hence, the results of this study may not be applicable to the OTSCC patients with distant metastasis. Third, the fitted model was constructed based on existing data from public database. Unfortunately, some clinicopathological risk factors, such as depth of invasion, perineural invasion, vascular invasion, and so on, are unavailable from the existing data. Fourth, three different methods had been utilized for internal validation and external validation of our 15-gene prognostic signature, but the external validating cohort in the present study contained only 28 samples. An external validating cohort with larger sample sizes will be more helpful to confirm our findings. Finally, the current study is a retrospective study. A prospective randomized clinical trial is needed before the fitted model could be used in clinical practice.

CONCLUSIONS
In conclusion, our results demonstrate our 15-gene signature was independently associated with overall survival in non-distant metastatic OTSCC. Moreover, the prognostic nomogram integrating the 15-gene signature and clinicopathological factors has potential to be developed as a prognostic tool.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.