CDKN2A Copy Number Loss Is an Independent Prognostic Factor in HPV-Negative Head and Neck Squamous Cell Carcinoma

Background: HPV infection is associated with high p16 expression and good prognosis in head and neck squamous cell carcinomas (HNSCCs). Analysis of CDKN2A, the gene encoding p16, may further elucidate the association between p16 expression and prognosis. We sought to determine whether CDKN2A copy number loss was associated with poor survival in HPV-negative HNSCCs.
Methods: The Cancer Genome Atlas HNSCC clinical and genomic data were obtained and integrated. Patients <80 years old with a primary tumor in the oral cavity, oropharynx, hypopharynx, or larynx were included. Stratifying by copy number loss status, CDKN2A mRNA and p16 protein expression levels were examined and overall survival (OS) and disease-free survival (DFS) were evaluated.
Results: 401 patients with HPV-negative HNSCC were identified. 146 patients demonstrated CDKN2A copy number loss. The CDKN2A copy number loss group expressed significantly lower levels of CDKN2A mRNA and p16 protein than did the non-copy number loss group. Median OS for patients with and without CDKN2A copy number loss was 16.5 and 46.6 months, respectively (p = 0.007). Median DFS for patients with and without CDKN2A copy number loss was 11.6 and 19.2 months, respectively (p = 0.03). In both univariate and multivariable analyses, stage IV designation, receipt of chemotherapy, and CDKN2A copy number loss were predictive of OS.
Conclusion: CDKN2A copy number loss predicted poor survival independently of other patient and treatment factors and may be a clinically useful prognostic factor.
Keywords: CDKN2A, head and neck neoplasms, prognostic biomarkers, genomics and genetics, outcomes assessment

Introduction
It is well known that head and neck squamous cell carcinomas (HNSCCs) caused by human papillomavirus infection (HPV-positive) have considerably better prognosis than those not associated with HPV infection (HPV-negative) (1)(2)(3). To differentiate between HPV-positive and HPV-negative disease, p16 immunohistochemistry has historically been used; HPV viral protein E7 has been observed to downregulate pRb and subsequently increase p16 expression (4,5). While the gold standard of HPV detection in head and neck cancers is now RNA-based detection of viral proteins E6/E7 (6), clinical and molecular analysis of p16 tumor data suggests that p16 may play an important role in the pathogenesis of head and neck cancers (7)(8)(9)(10).
Given the current understanding that HPV-positive and HPV-negative HNSCCs are clinically and biologically distinct, analysis of HNSCCs should ideally be stratified by HPV status. Interestingly, recent studies suggest that p16 expression varies greatly even among only HPV-positive or only HPV-negative HNSCCs (11)(12)(13). This wide variability in gene expression among patients with the same HPV status suggests that differences in p16 expression cannot be explained solely by HPV infection. CDKN2A, the gene that encodes p16, is frequently inactivated via copy number loss among HPV-negative head and neck cancer patients (14). Given the high prevalence of CDKN2A copy number loss, it is possible that this genomic abnormality may largely explain the wide variability in p16 expression among HPV-negative tumors. Moreover, considering the role of p16 as a known tumor suppressor, it is possible that CDKN2A copy number loss may independently predict survival even when considering HPV-negative HNSCCs alone. We aim to investigate the emerging clinical significance of CDKN2A copy number loss in HPV-negative head and neck cancers using The Cancer Genome Atlas (TCGA).

Data Source and Study Population
The Cancer Genome Atlas is a joint effort by the National Cancer Institute and National Human Genome Research Institute that collected genomic and clinical patient data for 33 types of cancer. We analyzed TCGA head and neck cancer data, integrating various types of genomic measurements with clinical metadata. The TCGA data were analyzed as follows: previously published results of PCR-based detection of HPV E6/E7 RNA were used to identify HPV-negative cancers (15), and Affymetrix SNP6 copy number measurements were used to identify patients with CDKN2A copy number loss. CDKN2A mRNA expression (RNA-Seq v2) and p16 protein quantification (reverse phase protein array) were evaluated to characterize tumor CDKN2A expression. Since CDKN2A mRNA and p16 protein are downstream products of tumor CDKN2A DNA, mRNA and protein expression were expected to be relatively lower in individuals with CDKN2A copy number loss as long as the gene was transcriptionally and translationally active.
We included patients with a primary tumor of known HPV-negative status in the oral cavity, oropharynx, hypopharynx, or larynx. Patients 80 years of age or older were excluded from the overall and disease-free survival (DFS) analyses. Clinical data were obtained from the TCGA Genomic Data Commons (16). Raw copy number data were acquired from the Broad Institute's Genome Data Analysis Centers Firehose website (17). HPV status designations for this cohort were downloaded from the supplementary files of a recent publication by Nulton et al. (15). mRNA counts and protein expression data were obtained from the MSKCC Cancer Genomics Data Server through the "cgdsr" R package (18).

Statistical Analyses and Variable Definitions
Copy number loss is defined as the loss of a chromosomal segment (e.g., during DNA replication). Loss of one or both copies of a gene contained in the deleted chromosomal segment often results in functional deficit due to gene under-expression (19). The primary independent variable, CDKN2A copy number loss, was defined a priori as having a relative log2 copy number ratio <−1. Kolmogorov-Smirnov testing was performed to evaluate differences in mRNA and protein expression between groups, both to validate CDKN2A copy number loss status and to investigate the transcriptional and translational effects of copy number loss. mRNA read count data were preprocessed by library size normalization using the TMM method, followed by log-transformation and z-scoring of mRNA reads (20).
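To make the grouping and comparison steps concrete, the following is a minimal Python sketch, not the authors' code (their analyses were run in R). It applies the stated cutoff (a log2 ratio of −1 corresponds to a tumor-to-reference copy ratio of 2^−1 = 0.5), z-scores log-transformed expression values, and computes the two-sample Kolmogorov-Smirnov statistic as the largest gap between the two empirical distribution functions. The function names and the toy values are hypothetical, TMM library size normalization is assumed to have been applied upstream, and no p-value is computed.

```python
import math
from bisect import bisect_right

LOSS_THRESHOLD = -1.0  # log2 copy number ratio cutoff stated in the text


def has_copy_number_loss(log2_ratio):
    """Return True when the relative log2 copy number ratio is below -1."""
    return log2_ratio < LOSS_THRESHOLD


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a + b)):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical toy data: (log2 copy number ratio, log-transformed mRNA level).
# These values are invented for illustration and are not TCGA measurements.
samples = [(-1.8, 2.1), (-1.2, 2.6), (-1.5, 1.9),
           (-0.1, 5.2), (0.0, 4.8), (0.3, 5.9), (-0.4, 4.4)]

z = z_scores([expr for _, expr in samples])
loss_z = [zi for (ratio, _), zi in zip(samples, z) if has_copy_number_loss(ratio)]
no_loss_z = [zi for (ratio, _), zi in zip(samples, z) if not has_copy_number_loss(ratio)]

d_stat = ks_statistic(loss_z, no_loss_z)
print(f"loss n={len(loss_z)}, no-loss n={len(no_loss_z)}, KS D={d_stat:.2f}")
```

In this toy example the two groups do not overlap at all, so the statistic reaches its maximum of 1; real data would give a value between 0 and 1, which is then converted to a p-value by a statistics library.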
Overall survival (OS) and DFS were the outcome variables examined. Wilcoxon rank-sum and χ² tests were performed to assess the relationships between CDKN2A copy number loss and various demographic, clinicopathologic, and treatment variables. Survival analyses were conducted using the Kaplan-Meier method with log-rank testing for significance. Cox proportional hazards models with and without multiple imputation of missing values were also fit to identify demographic, clinicopathologic, and treatment factors associated with survival. Feature selection for multivariable analyses was performed by including clinicopathologic features previously reported to be prognostic in HNSCCs, patient and treatment factors found to be prognostic in our univariate Cox regressions, and clinicopathologic variables that differed significantly in prevalence between our copy number loss and non-copy number loss groups. All independence and hypothesis tests were performed using a two-sided significance level of 0.05. R version 3.4.1 and the following R packages were used to perform all data visualization and statistical analyses: "ggplot2," "survival," "survminer," "interval," and "mice" (21)(22)(23)(24)(25)(26).
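As an illustration of the survival methods named above, here is a minimal pure-Python sketch of the Kaplan-Meier estimator, the median survival time read off the estimated curve, and the two-group log-rank test. It is not the authors' implementation (they used the R "survival" and "survminer" packages); the function names are hypothetical and the follow-up times are invented toy values in months, with an event flag of 1 for an observed death and 0 for censoring.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


def median_survival(curve):
    """First time the estimated survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def log_rank(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic with 1 df, two-sided p-value)."""
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value


# Hypothetical toy follow-up data (months); not taken from TCGA.
loss_times, loss_events = [5, 8, 12, 16, 20], [1, 1, 1, 1, 0]
no_loss_times, no_loss_events = [10, 18, 30, 42, 50], [1, 0, 1, 0, 0]

median_loss = median_survival(kaplan_meier(loss_times, loss_events))
median_no_loss = median_survival(kaplan_meier(no_loss_times, no_loss_events))
chi2, p_value = log_rank(loss_times, loss_events, no_loss_times, no_loss_events)
print(median_loss, median_no_loss, round(chi2, 2), round(p_value, 3))
```

A median of None means the estimated survival curve never falls to 0.5 within follow-up, which is common in small or heavily censored groups. Covariate adjustment, as in the Cox proportional hazards models described above, is beyond this sketch.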

Demographic and Clinicopathologic Differences Between Genomic Groups
We identified 401 patients under age 80 with HPV-negative head and neck cancer. Of these 401 patients, 146 (36.4%) exhibited CDKN2A copy number loss. The median age of all HPV-negative patients was 61 and the range was 19-79. The cohort was mostly male (73.1%), with no significant difference in sex distribution between copy number groups. Anatomic site of lesion varied: 223 (56%) were cancers of the oral cavity, 66 (16%) were of the oropharynx, 105 (26%) were of the larynx, and 6 (1%) were of the hypopharynx. Clinical stage was used in place of pathologic stage for 44 patients for whom clinical stage but not pathologic stage information was available. The CDKN2A copy number loss group consisted of a slightly greater proportion of African Americans and Stage III/IV tumors and tended to have higher rates of smoking and heavy alcohol consumption (Table 1). The only statistically significant differences between groups at a 0.05 significance level were in smoking status and heavy alcohol consumption.

mRNA/Protein Expression Differences Between Test and Control Groups
The CDKN2A copy number loss group exhibited significantly lower CDKN2A mRNA expression than did the non-copy number loss group (median −1.10 vs. 0.64, p < 2.2 × 10⁻¹⁶; Figure 1A). Similarly, analysis of protein assay results for 174 samples for which RPPA data were available revealed a lower expression of p16 in the CDKN2A copy number loss group (median −0.94 vs. 0.19, p < 7.1 × 10⁻¹¹; Figure 1B). These results corroborate CDKN2A copy number loss status assignments. Furthermore, they demonstrate that CDKN2A copy number loss has a functional impact on gene transcription and translation.

Discussion
In our integrated genomic and clinical analysis of TCGA, CDKN2A copy number loss was associated with poor prognosis in HPV-negative head and neck cancer independently of other known prognostic factors including age, advanced tumor stage, and African American race (13,14,27). CDKN2A copy number loss was also strongly associated with decreased CDKN2A mRNA and protein expression, demonstrating significant impact on gene transcription and translation.
Univariate survival analysis found CDKN2A copy number loss to indicate worse prognosis in all HPV-negative disease. Stratifying by copy number and sub-stratifying by stage (early vs. late) showed that CDKN2A copy number loss indicated significantly poorer OS and DFS in advanced-stage but not early-stage disease. Given that our cohort consists primarily of advanced-stage tumors, the lack of an observed survival difference in the early-stage cohort may be due to low sample size. Follow-up studies with a greater number of early-stage tumors would be useful to validate this finding. Notably, copy number loss retained prognostic value on multivariable analysis. For this analysis, we included possible covariates identified by reviewing previous reports of patient factors predictive of survival and conducting univariate Cox regressions on our own data. Additionally, we included the variables of smoking and heavy alcohol consumption (which were both found to have significantly different prevalence between compared groups) in our multivariable Cox regression to identify possible confounders. The finding that high pathologic stage is an independent predictor of survival is consistent with previous findings, and the finding that receipt of chemotherapy also independently predicts poor survival is not surprising considering that patients who receive chemotherapy tend to have more advanced, systemic disease even among high-stage cancers. The finding that African American race was predictive of survival on univariate analysis has precedent (13), and its prognostic value was not maintained in our multivariable analysis. This too is consistent with previous reports suggesting that the difference in survival between racial groups is likely related to socioeconomic factors resulting in treatment disparities (28). Interestingly, CDKN2A copy number loss was found to be significant on multivariable analysis, suggesting it may be clinically useful as an independent prognostic factor.
Some missing or unavailable clinical data limit the conclusions that can be drawn from this study. For instance, data on receipt of chemotherapy and adjuvant radiotherapy were sparse (missing in 43% and 16% of patients, respectively). These variables are of particular clinical interest, as outcomes analysis of individuals who receive these treatments can reveal insights into efficacy and help shape best practice guidelines. We were able to incorporate variables with sparse data into our multivariable analyses through categorical representation of unknowns or through multiple imputation, but these approaches are not perfect substitutes for actual values. Thus, as highlighted in a recent editorial, we stress the importance of documenting such treatment information more completely in future data collection efforts (29). Another limitation encountered was the absence of comorbidity data. We had access to many major demographic and clinicopathologic variables, but full comorbidity histories were not available. Such data are helpful when performing retrospective cohort-based analyses to more comprehensively control for clinical confounders. Additionally, though we used OS and DFS as outcome measures, cancer-specific survival would have been ideal. To facilitate access to comorbidity and cancer-specific survival data, we suggest that future genomics data collection efforts like TCGA consider linking with Medicare to provide researchers with more complete longitudinal clinical data for patients age 65 and older.
Despite these limitations, this study highlights strengths of the dataset and our integrated approach to clinical and genomic analysis. To our knowledge, of all clinically oriented genomic studies of HNSCCs, this cohort includes the largest collection of HPV-negative HNSCCs to date. In genomics research, sample size is often limited because of the extensive costs and workflow required to acquire and sequence patient samples (30). With the large number of samples in TCGA, we believe that our cohort is more representative of the total population of HPV-negative HNSCCs than smaller HNSCC cohorts of previously published clinical analyses. A methodologic strength of this study is that it integrates data from different sequencing platforms to validate the functional significance of the genomic abnormality, prior to survival analysis. We emphasize the importance of including complementary mRNA and protein expression data when evaluating mutations and copy number alterations, as not all mutations or abnormalities of a given gene have the same (or any) effect. Transcriptional and translational analysis can provide insights into the biological relevance of mutations and other genomic abnormalities in the context of other disease influences.
In conclusion, we found that CDKN2A copy number loss was associated with low expression of CDKN2A mRNA and p16 protein and indicated poor clinical prognosis in terms of disease progression and OS. These survival differences remained significant on multivariable analysis, suggesting CDKN2A copy number loss may have clinical utility as an independent prognostic factor for advanced-stage HNSCC. Through this analysis, we demonstrate the power and limitations of the TCGA database in analyzing the clinical impact of a genomic abnormality. Future large-scale genomic data collection efforts should emphasize linking genomic data with robust, longitudinal treatment and outcomes data to accelerate clinical discovery.
Author Contributions
WC and JY conceived of the presented idea. WC, RB, ZH, JC, and JY contributed to experimental design, selection of outcomes measures, and variable selection. WC preprocessed and performed the initial data analysis. WC, RB, AM, TH, ZH, JC, and JY contributed to interpretation of the results. SG and JT verified the analytical methods. JY supervised the findings of this work. All authors discussed the results and contributed to writing the final manuscript.