HPV Positive Status Is a Favorable Prognostic Factor in Non-Nasopharyngeal Head and Neck Squamous Cell Carcinoma Patients: A Retrospective Study From the Surveillance, Epidemiology, and End Results Database

Objective: To investigate the impact of human papillomavirus (HPV) status on head and neck squamous cell carcinoma (HNSCC) arising from different anatomic subsites. Methods: HNSCC patients with known HPV status in the Surveillance, Epidemiology, and End Results (SEER) database between 2010 and 2015 were included in our analysis. Patients were classified into three categories of HNSCC according to Site recode ICD-O-3/WHO 2008 and Primary Site-labeled, namely, oropharynx, hypopharynx, and nasopharynx. A logistic regression model was used to evaluate the relationship between patient characteristics and HPV status. Kaplan-Meier methods and Cox regression analysis were used to analyze survival data. Results: A total of 9,943 HNSCC patients with known HPV status from the SEER database were enrolled, of whom 6,829 (68.7%) were HPV-positive. HPV-positive and HPV-negative HNSCC were distinct and had different clinical and socioeconomic features (all P < 0.001). Primary site, socioeconomic factors (age, sex, marital status, and race), and pathological features (TNM stage and grade) were closely related to HPV status (all P < 0.001). HPV-positive status was a favorable prognostic marker in patients with cancers of the oropharynx and hypopharynx (all P < 0.001), but not in nasopharyngeal carcinoma patients (P = 0.843). A total of 8,933 oropharyngeal carcinoma (OPC) and 558 hypopharyngeal carcinoma (HPC) patients were divided into training and validation cohorts at a ratio of 1:1. Significant prognostic factors for overall survival (OS) identified by multivariate Cox analysis in the training cohort were integrated to construct nomograms for OPC and HPC patients. The prognostic models showed good discrimination, with C-indexes of 0.79 ± 0.007 and 0.73 ± 0.023 in OPC and HPC, respectively. The calibration curves indicated favorable calibration. Additionally, corresponding risk classification systems for OPC and HPC patients were built from the nomograms and classified patients into low-risk, intermediate-risk, and high-risk groups. OS differed clearly among the three risk groups. Conclusion: HPV positivity was associated with improved survival in HNSCC patients with cancers of the oropharynx and hypopharynx. Nomograms and corresponding risk classification systems were constructed to assist clinicians in evaluating the survival of OPC and HPC patients.

Keywords: head and neck squamous cell carcinoma (HNSCC), human papillomavirus (HPV), SEER database, prognosis, nomogram

BACKGROUND
Head and neck squamous cell carcinomas (HNSCC) are an anatomically heterogeneous group of neoplasms arising from the nasopharynx, oral cavity, oropharynx, hypopharynx, and larynx. Each year, there are approximately 700,000 new cases of HNSCC and 380,000 deaths from it worldwide (1). Tobacco smoking, alcohol consumption, and betel quid chewing in Iran and some Southeast Asian countries are well-known classical etiological factors for HNSCC development (2,3). Viral infection is another important etiological cause of HNSCC. For instance, Epstein-Barr virus (EBV) is common and strongly associated with nasopharyngeal carcinoma (NPC) in Southern China and Southeast Asian countries (4). Over the last decade, it has also become clear that high-risk human papillomavirus (HPV) infection is an important etiological and prognostic factor for a subset of HNSCC (5-7). Moreover, solid epidemiological work shows that HPV-related HNSCC is on the rise in the Western world, with an increased incidence of HPV infection in HNSCC of approximately 50%, while the incidence of smoking-related HNSCC has decreased owing to effective smoking control (8,9).
In fact, HPV infection is well established as a cause of, and a risk stratification biomarker in, oropharyngeal carcinoma (OPC) (10). Guidelines recommend that all OPC patients be tested for HPV status, and HPV-positive OPC was specified as an independent entity in the eighth edition of the American Joint Committee on Cancer (AJCC) tumor node metastasis (TNM) staging system (11-13). OPC patients with HPV positivity showed an improved response to therapy and better survival (14,15). However, the role of HPV infection in non-oropharyngeal HNSCC of the nasopharynx, oral cavity, and hypopharynx has not been well defined, even though HPV infection is present in 7% to 25% of non-oropharyngeal HNSCC. Published results on this topic are inconsistent. For example, several retrospective studies showed that HPV positivity was associated with improved survival in patients with HNSCC of the oral cavity, hypopharynx, and nasopharynx (16-19), whereas other studies reported no survival difference between HPV-positive and HPV-negative non-oropharyngeal patients (20-22), or even a detrimental effect of HPV positivity (23-25). Large-sample research exploring the role of HPV status in non-oropharyngeal patients is therefore warranted. To the best of our knowledge, there is also no prognostic model for HNSCC patients that includes HPV status.
Therefore, in this study, we sought to investigate the prognostic role of HPV status in HNSCC from different subsites based on data from the Surveillance, Epidemiology, and End Results (SEER) database. We further established HPV-based nomograms to predict survival probability and provided a risk classification tool for OPC and hypopharyngeal squamous cell carcinoma (HPC) patients.

Cohort Population
We performed a retrospective study based on information from the SEER database, a publicly available cancer statistics database that comprises 18 cancer registries in the United States and covers about 28% of the total US population (https://seer.cancer.gov/data/). Informed consent was waived because publicly available SEER data were used.
Data on race, age at diagnosis, gender, marital status, primary site, pathology grade, HPV status, treatment (primary surgery, radiation, and chemotherapy), and survival time were collected. The endpoint of the current study was overall survival (OS), defined as the time from cancer diagnosis to death from any cause or to the last follow-up.
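To make the endpoint definition concrete, the following minimal Python sketch derives an OS time in whole months and an event indicator from three inputs. It is an illustration only: the function name `overall_survival` and its arguments are hypothetical, and SEER itself reports survival time directly, so this is not the way the study data were prepared.

```python
from datetime import date
from typing import Tuple


def overall_survival(diagnosis: date, last_contact: date, died: bool) -> Tuple[int, int]:
    """Return (OS in completed months, event indicator).

    last_contact is the date of death if the patient died, otherwise the
    date of last follow-up. The event indicator is 1 for death from any
    cause and 0 for a censored observation.
    """
    months = (last_contact.year - diagnosis.year) * 12 + (last_contact.month - diagnosis.month)
    # Do not count the final month unless it has been completed.
    if last_contact.day < diagnosis.day:
        months -= 1
    return max(months, 0), int(died)


# A patient diagnosed in January 2010 who died in February 2013.
print(overall_survival(date(2010, 1, 15), date(2013, 2, 14), died=True))
```

Patients alive at last follow-up contribute a censored time (indicator 0), which is what the Kaplan-Meier and Cox analyses in this paper rely on.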

Statistical Analysis
Descriptive statistics were used to compare the demographic and clinical characteristics of HPV-positive and HPV-negative HNSCC patients. Categorical variables, presented as frequencies and proportions, were compared with Chi-square tests. Logistic regression analysis was applied to analyze the associations between clinicopathologic factors and tumor HPV status. Kaplan-Meier analyses were performed to generate survival curves, and the log-rank test was applied to compare the curves. Risk factors for overall survival (OS) were identified by univariate and multivariate analysis using Cox regression models. Simple random sampling was performed with the sample() function in R, and a total of 8,933 OPC and 558 HPC patients were randomly assigned to the training and validation groups at a ratio of 1:1. The training cohort data were used to establish nomograms for 3- and 5-year OS with the "rms" package. The concordance index (C-index) and area under the curve (AUC) were calculated to evaluate the discrimination of the established nomograms. Calibration plots were used to evaluate calibration. C-index and AUC values typically range from 0.5 to 1.0, where 0.5 represents random chance and 1.0 indicates perfect discrimination. Typically, C-index and AUC values greater than 0.7 suggest a reasonable estimation. All statistical analyses were conducted using R version 3.6.2 (http://www.R-project.org, The R Foundation) and SPSS Statistics version 23.0 (IBM SPSS Statistics, New York, United States). All statistical tests were two-sided, and a P-value < 0.05 was considered statistically significant.
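The random 1:1 split and the C-index described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' R code: `split_half` and `harrell_c_index` are hypothetical helper names, the seed is arbitrary, and the C-index shown is the plain pairwise (Harrell-type) estimate in which pairs with tied survival times are skipped.

```python
import random
from typing import List, Sequence, Tuple


def split_half(ids: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split patient ids into training and validation halves."""
    rng = random.Random(seed)
    shuffled = rng.sample(list(ids), k=len(ids))  # sampling without replacement
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the higher-risk patient dies first.

    A pair (i, j) is comparable when patient i had an event and a strictly
    shorter time than patient j. Ties in risk score count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


train, valid = split_half(range(8933), seed=1)
print(len(train), len(valid))

# Toy data: higher risk score always goes with shorter survival.
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [4, 3, 2, 1]))
```

With an odd cohort size such as 8,933, an exact 1:1 split is impossible, so one group receives one more patient than the other. A C-index of 0.5 corresponds to risk scores that order patients no better than chance.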

The Association Between Patient Characteristics and Tumor Human Papillomavirus Status
A logistic regression model was used to evaluate the relationship between patient characteristics and tumor HPV status. As shown in Figure 1

Survival Analysis and Prognostic Factors
The median follow-up period for the entire study population was 37 months (95% CI: 36.30-37.70 months). Kaplan-Meier estimates demonstrated that HPV-positive HNSCC patients had better survival than HPV-negative HNSCC patients (P < 0.001) (Figure 2A). The estimated 3- and 5-year OS rates were 80.0% and 75.0%, respectively, for HPV-positive HNSCC patients, compared with 54.0% and 48.0% for HPV-negative patients. Multivariate Cox regression analysis showed that HPV status was an independent prognostic factor for OS in the overall HNSCC population (Supplementary Table 1). Compared with HPV-negative patients, patients with HPV-positive HNSCC had improved OS (HR = 0.51, 95% CI: 0.46-0.55, P < 0.001). Other factors associated with OS in the multivariate analysis included primary site, age, race, T stage, N stage, M stage, grade, marital status, primary site surgery, radiotherapy, and chemotherapy.
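For readers less familiar with how the 3- and 5-year OS rates above are obtained, the Kaplan-Meier (product-limit) estimator can be sketched as follows. This is a generic textbook version written for illustration, with a hypothetical function name and toy data; it is not the code used to produce the figures in this study.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, estimated survival probability) at each distinct event time.

    events[i] is 1 if patient i died at times[i] and 0 if censored there.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Toy cohort: follow-up in months; the second patient is censored.
for t, s in kaplan_meier([12, 24, 36, 60], [1, 0, 1, 1]):
    print(t, round(s, 3))
```

The survival rate at a given horizon, such as 36 or 60 months, is read off the curve as the last estimate at or before that time. Censored patients lower the number at risk without producing a step.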
Next, to clarify the prognostic effect of HPV infection in different HNSCC, we compared OS between HPV-positive and HPV-negative HNSCC patients by anatomical subsite (oropharynx, hypopharynx, and nasopharynx). The survival curves showed that, in the oropharynx and hypopharynx, patients with HPV-positive cancers had better OS than their counterparts with HPV-negative cancers (P < 0.001) (Figures 2B, C). While for [...] Table 3). All statistically and clinically significant prognostic indicators for OS were integrated to construct the prognostic models. Prognostic models for OPC and HPC patients were visually presented in the form of nomograms (Figures 3A, B) and were validated using the validation cohort. The nomogram for OPC was constructed based on 11 important prognostic factors including HPV status, race, age, marital status, grade, T stage, N stage, M stage, primary site surgery, chemotherapy, and radiotherapy, and 10 statistically and [...]
The bold values indicate that the P-value was less than 0.05 and the difference was statistically significant.
(Figures 4C, D), respectively, indicating that the established nomograms exhibited good predictive performance. The calibration curves showed good calibration, with close agreement between nomogram-predicted and actual OS at 3 years and 5 years (Figure 5). Applied to the validation cohort, the nomogram for OPC still showed good discrimination and good calibration, as shown in Supplementary Figures 1A, B and Supplementary Figures 2A, B. Similar results were observed for HPC patients in the validation cohort (Supplementary Figures 1C, D and Supplementary Figures 2C, D).

Risk Classification System
Additionally, the corresponding OS risk classification systems for OPC and HPC patients were constructed according to cutoff analyses of each patient's total points in the total cohort, performed with the X-tile program. All OPC patients were classified

DISCUSSION
In the present study, we aimed to determine the role of tumor HPV status in HNSCC from different subsites (oropharynx, nasopharynx, and hypopharynx), based on the customized SEER Head and Neck with HPV Status Database. We found that HPV positivity was related to superior survival in OPC and HPC patients, but not in patients with NPC. We also established nomograms including HPV status that predicted 3- and 5-year overall survival for OPC and HPC patients.
In this database, HPV infection was prevalent (68.7%) among HNSCC patients, even in non-oropharyngeal HNSCC: the HPV-positivity rates in cancers of the oropharynx, hypopharynx, and nasopharynx were 73.1% (6,530/8,933), 25.1% (140/558), and 35.2% (159/452), respectively. This prevalence was similar to what has been reported in the Western population (27). Notably, all patients in this database had been tested for HPV infection in their tumors, while those without HPV testing information were not included. This would inevitably lead to selection bias. Another factor that might add information bias was that we could not determine the exact detection methods used to establish HPV infection status or the HPV genotypes. Direct and specific HPV tests such as HPV DNA and RNA detection are sensitive but more complicated and more expensive (28). On the other hand, immunohistochemical staining of the p16 protein is widely used as a surrogate marker of HPV infection and has been recommended by the eighth edition of the TNM classification system for oropharyngeal carcinoma (29,30). A large international study of 3,680 samples showed that the prevalence of HPV infection in OPC, oral cavity carcinoma, and larynx carcinoma was 22.4%, 4.4%, and 3.5%, respectively, based on positivity for HPV DNA and for either HPV E6 mRNA or p16, and was 18.5%, 3.0%, and 1.5%, respectively, when simultaneous positivity for all three markers was required (31). Another study found that there might be considerable misclassification (up to 26.2%) when using p16 staining alone to determine HPV infection status (32). Ideally, there would be a standard HPV detection method that stratifies patient outcomes well while being clinically practical and inexpensive. Nevertheless, with the information currently obtainable from the SEER head and neck cancer with HPV status database, we could still find convincing clues about how HPV infection affects the survival of head and neck cancer patients.
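The subsite-specific positivity rates quoted in the discussion can be checked against the overall figure with simple arithmetic. The short Python sketch below uses only the counts reported in this paper; the variable and function names are illustrative.

```python
# (HPV-positive, total) counts per subsite, as reported in the text.
counts = {
    "oropharynx": (6530, 8933),
    "hypopharynx": (140, 558),
    "nasopharynx": (159, 452),
}


def percent(positive: int, total: int) -> float:
    """HPV-positivity rate as a percentage, rounded to one decimal place."""
    return round(100 * positive / total, 1)


for site, (positive, total) in counts.items():
    print(site, percent(positive, total))

all_positive = sum(p for p, _ in counts.values())
all_patients = sum(t for _, t in counts.values())
print("overall", all_positive, all_patients, percent(all_positive, all_patients))
```

The subsite counts sum to 6,829 HPV-positive patients out of 9,943, which reproduces the 68.7% overall prevalence given in the abstract.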
In the present study, we mainly focused on the prognostic role of HPV status in HNSCC from different subsites (oropharynx, nasopharynx, and hypopharynx). We found that HPV status was an important prognostic marker not only in patients with OPC but also in patients with HNSCC of the hypopharynx. In other words, HPV status was significantly associated with the prognosis of non-nasopharyngeal HNSCC. Because HPV-positive status was a crucial prognostic factor, we developed and validated prognostic nomograms that integrated HPV status for OPC and HPC patients, respectively. Our nomograms for OPC and HPC patients performed well in calibration and discrimination, showing good predictive value. Moreover, based on the total points produced by the nomograms, we developed a novel risk classification system for OPC and HPC patients, which classified patients into low-, intermediate-, and high-risk groups. A significant difference in OS was observed among the three prognostic groups in the three cohorts. Therefore, the nomograms can be used to predict individual survival probability at a given timepoint and to assign a risk class to OPC and HPC patients. However, the present nomograms were established and validated using data from the same database; thus, prospective validation in an independent dataset is warranted for a reliable evaluation.
HPV infection was not a clinically relevant prognostic marker for NPC patients in our study. Previous studies have shown that HPV infection is observed in NPC but is relatively rare compared with EBV infection, and that the prognostic role of HPV infection in NPC is controversial (33). Some studies suggested no statistically significant difference in survival between HPV-positive and EBV-positive NPC patients (34), whereas others showed that HPV-positive patients had worse outcomes than patients with EBV-positive NPC (35). It is well known that NPC is strongly associated with EBV infection, and plasma EBV DNA has been used for population screening, prognostication, predicting treatment response for therapeutic adaptation, and disease surveillance in NPC (36,37). However, the role of HPV in NPC and its interplay with EBV remain unclear despite increased awareness of HPV infection in NPC. In the current study, we could not further evaluate the role of EBV and its interplay with HPV because EBV information was unavailable in the SEER database. Therefore, future studies exploring the role of HPV in NPC and its interplay with EBV are needed.
In addition, we explored the association between patient characteristics and HPV status. The results showed that primary site, socioeconomic factors, and pathological features were closely associated with HPV status. Specifically, patients who were married, younger, male, and white were more likely to present with HPV-positive HNSCC. These results were consistent with the previous literature (38,39). This might imply that patients with these characteristics are more vulnerable to HPV infection and may gain a potential benefit from HPV vaccines. In fact, globally there are around 22,000 OPSCC cases annually caused by HPV infection, with 80%-90% being due to HPV 16 infection. These cases might be preventable by HPV vaccination (40). Importantly, prospective clinical research is exploring the implementation of HPV vaccination in HPV-associated HNSCC (41). Considerable efforts are needed to further advance HPV vaccination programs for HNSCC.
As a retrospective study using data from SEER, our study had several limitations. Importantly, information on the HPV test method was not available in the SEER database. Therefore, caution should be taken when interpreting our results on the prevalence of HPV-positive tumors. In addition, EBV data, a crucial factor for NPC, were incompletely captured in the SEER database, which may have led to a different result for NPC patients. Nonetheless, this study drew on a large sample to describe the effects of tumor HPV status on HNSCC arising from different anatomical subsites, including the nasopharynx, oropharynx, and hypopharynx, whereas prior studies have mainly focused on oropharyngeal cancer.
In conclusion, HPV infection was not uncommon in HNSCC patients, even in non-oropharyngeal HNSCC. HPV status was a crucial, clinically applicable prognostic marker in non-nasopharyngeal HNSCC, which suggests that HPV testing should be recommended for non-nasopharyngeal HNSCC patients. Prognostic nomograms including HPV status were useful for accurate prognostication in OPC and HPC patients, and risk classification systems were built that classified OPC and HPC patients into low-, intermediate-, and high-risk groups.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The ethics committee waived the requirement of written informed consent for participation.

AUTHOR CONTRIBUTIONS
MW: Data collection and data analysis, and writing of the manuscript. QW: Design and data analysis, writing of the manuscript and funding support. YQ, XH, YLiu, and YLi: Data analysis and data collection. XW: data analysis and writing of the