Genomic Landscape of Head and Neck Squamous Cell Carcinoma Across Different Anatomic Sites in Chinese Population

Background The characteristics of head and neck squamous cell carcinoma (HNSCC) across different anatomic sites in the Chinese population have not been studied. To determine the genomic abnormalities underlying HNSCC across different anatomic sites, the alterations of selected cancer-related genes were evaluated. Methods Genomic DNA samples obtained from formalin-fixed, paraffin-embedded tissues were analyzed using targeted sequencing in a panel of 383 cancer-related genes to determine the genomic alterations. Results A total of 317 formalin-fixed, paraffin-embedded HNSCC specimens were collected, and a total of 2,156 protein-coding mutations, including 1,864 single nucleotide variants and 292 insertions and deletions, were identified across more than six different anatomic sites. Mutation loads were distinct across the anatomic sites. Larynx carcinoma was found with the highest mutation loads, whereas nasopharynx carcinoma showed the lowest mutation loads. A total of 1,110 gains and 775 losses were identified in the 317 specimens. Patients who had at least one clinically actionable alteration (levels 1–4 in OncoKB) were identified. One patient had an actionable alteration with level 1 evidence in OncoKB, TEX10-NTRK2 fusion, who may benefit from larotrectinib or entrectinib treatment. Conclusion The genomic profiling of HNSCC using targeted sequencing can identify rational therapeutic candidate genes suitable for the treatment of the HNSCCs.


INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC), the main type of malignancy in the head and neck, arises from the mucosal surfaces of the mouth, salivary glands, pharynx, larynx, nasal cavity, and paranasal sinuses, accounting for more than 90% of head and neck malignancies, and is the sixth most common malignant tumors in the world (Bray et al., 2018). The incidence of HNSCCs is relatively low in China: according to GLOBOCAN 2012, the estimated age-standardized incidence rate in China is 2.7 per 100,000 compared with the world rate of 8.0 per 100,000 (Li et al., 2015). However, the total case number is largely due to the large population base. There were reportedly ∼75,000 new cases in 2015, of which 37,000 died from HNSCC (Chen et al., 2016).
Phenotypic, etiological, biological, and clinical heterogeneities are characteristics of HNSCC. Alcohol and tobacco consumption, human papillomavirus, particularly human papillomavirus type 16 infection, and Epstein-Barr virus infection are the risk factors for different types of HNSCCs derived from various anatomic sites (Niedobitek, 2000;Hashibe et al., 2006;Boffetta et al., 2008;Gandini et al., 2008;Chaturvedi et al., 2011;Young and Dawson, 2014;Yoshizaki et al., 2018). HNSCCs are well recognized as a particularly challenging class of tumors to treat. Despite surgery, radiation, and chemotherapy, such treatments for HNSCC can result in cosmetic deformity and functional impairment of vital functions, and >50% of HNSCC patients die from this disease (Bray et al., 2018) with a 5-year survival rate of only approximately 50% (Leemans et al., 2011). Despite the approval of cetuximab (Vermorken et al., 2007(Vermorken et al., , 2008, a monoclonal antibody against epidermal growth factor receptor, the survival rates of HNSCC have improved very little over the past 40 years (Stransky et al., 2011;Du et al., 2019). Low survival outcomes in combination with significant toxicity of current treatment strategies emphasize the necessity for novel therapeutic modalities.
Similar to all solid tumors, HNSCC is thought to be initiated and progress through a series of genetic alterations. Comprehensive molecular profiling leads to the development of "personalized" or "precision" medicine. By promoting molecular diagnosis and targeted therapies, treatment of certain HNSCCs may soon be fundamentally transformed. Several studies have characterized alterations in a single anatomic site of head and neck in the Chinese population, including oral squamous cell carcinoma (Song et al., 2014;Izumchenko et al., 2015;Ma et al., 2018), nasopharyngeal carcinoma (NPC) Tu et al., 2018), and laryngeal squamous cell carcinoma (Zhang et al., 2014;Tao et al., 2018). These studies characterized the genomic alterations in these carcinomas. However, few studies describe and compare the characteristics of HNSCCs across different anatomic sites in the Chinese population.
To gain a comprehensive view of the genetic alterations underlying HNSCCs across different anatomic sites, we performed targeted deep sequencing on 317 samples from several anatomic sites in Chinese HNSCC patients. The genomic alterations of these HNSCCs were analyzed and compared.

Clinical Cancer Specimens
The HNSCC samples gathered for this study were approved by the Ethics Committee of the Third Affiliated Hospital of Kunming Medical University (No. QT201918; Kunming, Yunnan, China). Two pathologists reviewed a total of 317 formalin-fixed, paraffin-embedded (FFPE) tissues to make sure cancer cell contents were ≥20% before DNA extraction. The HNSCC samples were collected from September 2017 to March 2020 and sequenced in the Research and Development Institute of Precision Medicine, 3D Medicine Inc. (Shanghai, China). All patients provided written informed consent for their samples to be examined and their clinical data to be utilized.

Targeted Sequencing
For each clinical FFPE HNSCC specimen, genomic DNA was extracted using QIAamp DNA FFPE Tissue Kit (Qiagen) and then quantified by PicoGreen fluorescence assay (Invitrogen). Fifty to 200 ng of DNAs were fragmented to around ∼200 bp by sonication (Covaris) and constructed into sequencing libraries with KAPA Hyper Prep Kit (Kapa Biosystems) according to the protocol. Target regions covering 383 cancer-related genes were captured with baits for each library. The captured libraries were then amplified with polymerase chain reaction, followed by purification with 1.8 × SPRI, quantification by Qubit 3.0 (Life Technologies). Finally, libraries were sequenced on an Illumina Nextseq 500 platform.

Sequencing Data Processing
Raw sequencing reads were mapped to the human reference genome (hg19) using BWA-MEM v0.7.12 (Li and Durbin, 2010) with default settings. Mapped reads were then sorted and converted to BAM format using Picard (version 2.0.1) 1 , followed by the PCR duplicate removal process. For each BAM file, sequencing metric collections were summarized using an in-house script.

Base Substitution and Small INDEL Calling
A Bayesian methodology-based self-developed algorithm validated previously (Su et al., 2017) was used for base substitutions and small INDEL calling. A series of filtering models were applied to remove artifactual mutations in the calling algorithm, including corrections for background error, strand bias, base quality, mapping quality, and short tandem repeat regions. Somatic variant calls were then chosen with annotated mutations based on the following criteria: (1) for a given site, reads supported the alternative allele ≥ 8 and variant allele frequency ≥ 0.1; (2) only exonic or splicing sites were selected; (3) sites with strand bias ≥ 0.9 were removed; (4) INDELs longer than 40 bp were removed; and (5) sites with allele frequency larger than 0.015 in either of 1000 Genome Project or ESP6500 database were removed.

Copy Number Alteration Calling
A self-developed algorithm (in publishing) was used for the detection of copy number alterations. Briefly, we firstly built a mixed panel-of-normal (PON) using several normal samples. For each sample, including tumor and mixed PON, sequencing coverage depth was calculated by a fixed bin size across the targeted regions and normalized for GC content. A log2 ratio profile was obtained by dividing the normalized depth of each tumor sample by the mixed PON for all bins. A circular binary segmentation procedure was performed on the log2 ratio values to obtain copy number segments. To estimated tumor ploidy and purity, we used the B-allele frequency information of ∼5,000 selected single-nucleotide polymorphism loci from the human genome. We calculated the absolute copy number for each segment based on the log2 ratio values, tumor ploidy, and tumor purity.

DNA Rearrangement Analysis
Genomic rearrangements were identified using a self-developed algorithm (in publishing). Briefly, the tag information was extracted from BAM files from BWA-MEM for the clipped reads. Then, reads mapped to separate chromosomes or at a distance of more than 2 kb were selected for the detection of genomic rearrangements. The rearrangement results contained translocation, inversion, long deletion, etc.

Identification of Microsatellite Instability
To detect microsatellite instability (MSI) for each sample, 100 MS loci (repeat region from the human genome) were added into our targeted panel, and probes were specially designed for these loci. For each microsatellite locus, the distribution differences of repeat lengths were compared between tumor and paired normal samples. Each locus was classified as an MSI-high (MSI-H) or MSI-stable site, whether the differences were statistically significant or not. A sample was classified as MSI-H if there were not less than 75 MSI-H loci in the sample, otherwise MSI-stable.

Statistical Analysis
Wilcoxon rank-sum test was used to compare the tumor mutation load of HNSCCs across different anatomic sites. The comparison of alteration frequencies across anatomic sites was conducted using a proportion test with continuity correction. The differences in alteration frequency between this study and The Cancer Genome Atlas (TCGA) dataset were established using a proportion test with continuity correction. All of the statistical analyses were performed with R.

Clinical Specimens and Targeted Sequencing of Head and Neck Squamous Cell Carcinoma
In this study, a total of 317 HNSCC patients (Supplementary Table 1) were enrolled. FFPE samples and matched normal controls (peripheral white blood) were collected and subjected to targeted sequencing in a panel of 383 cancer-related genes (Supplementary Table 2) using Illumina Nextseq 500. The mean coverage of the sequencing depth was 542 × across the capture regions for all samples, whereas it was 730 × and 343 × for the examined tumor samples and the matched normal controls, respectively. According to the anatomic sites, the present cohort consisted of patients with paranasal sinus carcinoma (n = 4), NPC (n = 83), larynx carcinoma (LC, n = 60), oral cavity carcinoma (OCC, n = 111), hypopharynx carcinoma (HPC, n = 36), oropharynx carcinoma (n = 7), and 16 with unknown anatomic sites.

Characterizing the Somatic Single-Nucleotide Variants and INDELs
The most common mutations in HNSCC were C > T transitions (42.2%), followed was C > A transversions (22.4%), and the distribution of mutation spectrum in our cohort was similar to that in TCGA HNSCC cohort (Cancer Genome Atlas Network (CGAN), 2015) (proportion test P = 0.95; Supplementary  Figure 1). A total of 2,156 candidate somatic mutations [including 1,864 single-nucleotide variants (SNVs) and 292 INDELs involved in 273 genes] were identified in 317 samples with a median of five mutations (range 0-84). A mean of 5.5 non-synonymous mutations (including both SNVs and INDELs) per tumor was identified, resulting in a mean mutation rate of 4.2 mutations/Mb. This mutation rate was comparable with a previously published oral squamous cell carcinoma study (4.5 mutations/Mb) using a panel of 409 cancer-related genes (Nakagaki et al., 2017) and was more than twice as high as the rate seen in other HNSCC studies (Stransky et al., 2011; India Project Team of the International Cancer Genome Consortium (IPTICGC)., 2013; Vogelstein et al., 2013;Nakagaki et al., 2017). The results suggest that the cancer-related gene panel can successfully detect somatic mutations in FFPE samples. Of the top 23 highly mutated genes (mutated in ≥ 12 tumors), TP53 was the most frequently mutated gene (63.1%), followed by FAT1 (16.1%), KMT2D (13.9%), CDKN2A (12.9%), LRP1B (12.3%), NOTCH1 (11.0%), and CHD4 (10.4%) ( Figure 1A). The mutation frequencies of recurrent genes were highly consistent with those in the TCGA dataset (Cancer Genome Atlas Network (CGAN), 2015) (R 2 = 0.93; Supplementary Figure 2). However, four genes were significantly different. PIK3CA [6.9 vs. 18.4%; false discovery rate (FDR) = 1.7e-04, proportion test] and CDKN2A (12.9 vs. 21.9%; FDR = 0.011, proportion test) were observed with lower mutation frequency in the current Chinese cohort, whereas CHD4 (10.4 vs. 2.9%; FDR = 2.0e-04, proportion test) and BAP1 (4.1 vs. 0.6%; FDR = 0.0079, proportion test) were found with a higher mutation frequency in the current Chinese cohort. The mutational frequencies of recurrently mutated genes were also consistent between the NPC subgroup in the current Chinese cohort and NPC cohort from National University Hospital Singapore (Lin et al., 2014), and no significant statistical differences were found across the compared top mutated genes between the two datasets (Supplementary Figure 3).
The overall mutational loads across different anatomic sites of HNSCC were distinct. Due to the limited sample numbers, oropharynx carcinoma (n = 7) and paranasal sinus carcinoma (n = 4) were excluded from this analysis. LC had the highest mutation rate with a mean of 8.3 non-synonymous mutations per tumor, whereas NPC showed the lowest mutation rat with a mean of 3.6 non-synonymous mutations (Figure 1B and  Supplementary Table 3). The mutational loads of OCC (an average of 4.5 non-synonymous mutations per tumor) were lower than LC and HPC (7.3 non-synonymous mutations per tumor) but higher than NPC. The lowest somatic mutational loads of NPC are consistent with previous studies (Lin et al., 2014;Zheng et al., 2016;Li et al., 2017).
Interestingly, the CNV frequencies of recurrent genes across different anatomic sites were also different. Generally, most of the genes in HPC had the highest CNV frequencies ( Figure 2B). PIK3CA had the highest discrepancy across HPC, LC, NPC, and OCC, with CNV frequencies of 38.9, 31.7, 1.2, and 1.8% (FDR = 1.6e-11, proportion test), respectively.

Identification of Fusions
In total, there were 14 fusions in 14 patients detected in this study, among which 64.3% (9/14) of fusions were not found in the COSMIC database (Supplementary Table 4). Of note, one patient had TEX10-NTRK2 fusion, which can benefit from larotrectinib and entrectinib (Cocco et al., 2018). This fusion was annotated as a join of exon 9 of TEX10 to exon 15 of NTRK2 (Figure 3A), and the tyrosine kinase domain of NTRK2 was retained ( Figure 3B). Thus, this fusion can produce a functional fusion transcript. We also found two patients with an FGFR3-TACC3 fusion that could be a target for cancer therapy [29].

Deregulated Cancer-Related Pathways in the Chinese Head and Neck Squamous Cell Carcinoma Cohort
To further understand the function of the identified gene alterations in this Chinese cohort, SNVs/INDELs and CNVs were combined together to perform the pathway analysis, and a limited number of cancer-related pathways targeted by frequent genome alterations were frequently deregulated in HNSCC (Figure 4). Most of the HPC and LC samples had at least one gene alteration belonging to the receptor tyrosine kinase  (RTK)/RAS/phosphatidylinositol-3-OH kinase (PI3K) pathway. The alteration frequency of the RTK/RAS/PI3K pathway in this Chinese cohort was much higher than that in the TCGA HNSCC dataset (Cancer Genome Atlas Network (CGAN), 2015). Among the RTKs, FGFR1 alterations were the most frequent, followed by EGFR and ERBB2 alterations. For the downstream of the RTK/RAS/PI3K pathway, PIK3CA had the highest alteration frequency. Interestingly, the alteration frequencies of PIK3CA varied greatly across different anatomic sites. More than half of the HPC samples had PIK3CA alterations, whereas only 2% NPC and 7% OCC samples altered in this gene. For the downstream cell cycle pathway, a total of 97% HPC, 97% LC, 61% NPC, and 82% OCC samples were altered. Among components of the cell cycle pathway, TP53 alterations dominated, followed by CDKN2A. CCND1 and MYC were the two most frequently altered oncogenes of this pathway. Further alterations of NOTCH1 and FAT1 linked functionally to β-catenin (CTNNB1) are also detected. As tumor-suppressor genes, the inactivation of FAT1 and NOTCH1 may converge to inhibit the Wnt/β-catenin signaling pathway, which is implicated in the deregulation of cell polarity and differentiation. Compared with the patients from the other anatomic sites, a higher alteration frequency in the nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) signaling pathway was found for NPC patients, although the alteration frequencies in the other pathways were lower for them. CYLD alterations dominated in this pathway, followed by NFKBIA (Figure 4).

Identification and Characterization of the Actionable Alterations
To identify and characterize the actionable alterations in this Chinese cohort, different kinds of genetic alterations types, including SNVs/INDELs, CNVs, fusions, and MSI (Supplementary Table 5) status, were combined and annotated using the OncoKB knowledge database (Chakravarty et al., 2017). OncoKB is a database containing information about the effects and treatment implications of specific cancer gene alterations, which is curated from guidelines from the Food and Drug Administration, National Comprehensive Cancer Network, or American Society of Clinical Oncology, ClinicalTrials.gov, and the scientific literature. According to the annotation results (Supplementary Table 6), 85.5% (271/317) of patients had at least one oncogenic mutation, and 44.5% (141/317) had at least one actionable alteration. The evidence levels of most actionable alterations were levels 3 and 4. We summarized all actionable alterations to 37 kinds of alterations based on gene symbol and variant type (Supplementary Table 7). CDKN2A deletion was the most frequent actionable alteration affecting 15.1% of total cases, followed by CDKN2A SNVs/INDEs covering 11.4% of all cases. There were 5% of patients with actionable alterations on PIK3CA SNVs/INDEs of level 3B. Of note, there were three patients with level 1 alteration, including one patient with TEX10-NTRK2 fusion and two patients with MSI-H.

DISCUSSION
In this study, we delineated the genome alteration landscape of HNSCC in a Chinese cohort across different anatomic sites. Overall, the mutation loads and CNV frequency in NPC were lower than those in the other anatomic sites (Figures 1A, 2A), consistent with other studies (Lin et al., 2014;Zheng et al., 2016;Li et al., 2017). Thus, we confirmed the molecular characteristics of the Chinese population. Among the recurrently mutated genes, TP53 was the most highly mutated gene across different anatomic sites. The mutation frequency of TP53 in NPC was much lower than that in the other sites, indicating that the mechanisms of tumor tumorigenesis may be different between NPC and other HNSCC sites.
HPC and LC were found with higher CNV alteration frequency compared with NPC and OCC (Figure 2A). PIK3CA had the highest CNV discrepancy across the four anatomic sites ( Figure 2B). PIK3CA is a well-known oncogene that can play an important role in cancer development. PIK3CA gains also recurrently occurred in TCGA HNSCC dataset (Cancer Genome Atlas Network (CGAN), 2015). However, PIK3CA gain was much lower in OCC in the current cohort than in TCGA OCC (1.8 vs. 15.1%; P = 3.4e-04, proportion test). The co-amplification of FGF3, FGF4, FGF19, and CCND1 in HNSCC patients was most likely due to co-localization of the loci in the same chromosomal region (11q13) (Parish et al., 2015).
Most genes and pathways analyzed in NPC were relatively altered in a lower frequency; however, CYLD and NF-κB signaling pathways in NPC had a higher mutation frequency than the other sites (Figure 4). A previous study detected enrichment of genomic abnormalities of multiple negative regulators of the NF-κB pathway, including CYLD, TRAF3, NFKBIA, and NLRC5 (Li et al., 2017). In this study, such observation confirmed that genes altered in the NF-κB signaling pathway had a higher mutation frequency in NPC than in the other sites. Activation of NF-κB is associated with a poorer prognosis in NPC (Zhang et al., 2011;Sun et al., 2012), supporting the rationale for exploiting NF-κB inhibition as a potential treatment strategy for NPC. Several studies have demonstrated that targeting the NF-κB pathway is a promising treatment strategy in a variety of cancers (Sunwoo et al., 2001;Lun et al., 2005;Zhang et al., 2015). Thus, it is rational to apply the NF-κB activation status to stratify NPC patients who may be more likely to benefit from treatment with NF-κB inhibitors. CYLD is a tumor suppressor gene encoding a cytoplasmic protein that functions as a deubiquitinating enzyme that deubiquitinates TRAF2/5/6, which is an agonist of the NF-κB signaling pathway (Sun, 2010). A previous study also reported that CYLD could regulate p53 DNA damage response by removing K48-linked ubiquitin chains from p53 (Fernandez-Majada et al., 2016). Interestingly, the mutation of CYLD and TP53 was absolutely mutually exclusive in NPC patients, indicating different roles of CYLD and TP53 in NPC oncogenesis.
Most HNSCC samples harbor potentially actionable mutations. Using genomic profiling, we discovered the potentially actionable mutation (either directly targeted or a pathway component of a directly targeted gene by an approved or investigational drug), which could be targeted with either an offlabel or an investigational therapeutic available in a clinical trial. Nearly half of the HNSCC patients were identified with clinically actionable alterations in this cohort. Therefore, these patients may benefit from Food and Drug Administration-approved drugs or clinical trials. TEX10-NTRK2 fusion was identified in an LC patient, and the patient may benefit from larotrectinib or entrectinib with level 1 evidence, and two MSI-H patients may benefit from Keytruda. Although the majority of alterations were used as biomarkers in clinical trials, patients with these alterations may also benefit from clinical trials.
The limitations of this study are worth mentioning. First, the observed differences across different anatomic sites need verification. Second, the genes significantly different between this Chinese cohort and TCGA dataset also need verification.
In conclusion, we performed targeted sequencing in 317 Chinese HNSCC patients. The molecular characteristics and differences were analyzed and compared across different anatomic sites. Actionable mutations were identified, and patients who may benefit from the actionable alterations from clinical treatment were also identified.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: Figshare Database: https://figshare.com/projects/Genomic_landscape_of_ head_and_neck_squamous_cell_carcinoma_across_different_ anatomic_sites_in_Chinese_population/111842.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the Third Affiliated Hospital of Kunming Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SY, FL, HC, NX, and DZ: conception and design. SY and SZ: administrative support. YJ, XW, and QL: provision of study materials or patients. YJ, XW, HW, and QL: collection and assembly of data. YJ, XW, HW, BL, SY, and SZ: data analysis and interpretation. YJ, XW, HW, BL, QL, DZ, HC, NX, FL, SY, and SZ: manuscript writing and final approval of manuscript.