Identifying Actionable Variants Using Capture-Based Targeted Sequencing in 563 Patients With Non-Small Cell Lung Carcinoma

Although the NSCLC diagnostic standards recommend the detection of driver gene mutation, comprehensive genomic profiling has not been used widely in clinical practice. As to the different mutation spectrum characteristics between populations, the research based on Chinese NSCLC cohort is very important for clinical practice. Therefore, we collected 563 surgical specimens from patients with non-small cell lung carcinoma and applied capture-based sequencing using eight-gene panel. We identified 556 variants, with 416 potentially actionable variants in 54.88% (309/563) patients. These single nucleotide variants, insertions and deletions were most commonly found in EGFR (55%), followed by ERBB2 (12%), KRAS (11%), PIK3CA (9%), MET (8%), BRAF (7%), DDR2 (2%), NRAS (0.3%). By using ten protein function prediction algorithms, we also identified 30 novel potentially pathogenic variants. Ninety-eight patients harbored EFGR exon 21 p.L858R mutation and the catalytic domain of the protein tyrosine kinase (PTKc) in EGFR is largely mutated. In addition, there were nine frequent pathogenic variants found in five or more patients. This data provides the potential molecular basis for directing the treatment of lung cancer.

INTRODUCTION Non-small cell lung carcinoma (NSCLC) is the leading factor of cancer death rate worldwide (18.4% of the total cancer deaths) (1). With the emergence of targeted therapies for NSCLC, genetic testing has become a mandatory component for guiding the patient treatment (2). NSCLC diagnostic standards has included the detection of epidermal growth factor receptor (EGFR), B-Raf protooncogene (BRAF), MET proto-oncogene (MET), erb-b2 receptor tyrosine kinase 2 (ERBB2), and erb-b2 receptor tyrosine kinase 2 (KRAS) mutations (3). However, comprehensive genomic profiling has not been used widely in clinical practice.
For the time being, laboratory work about therapeutic targeted genes of NSCLC has been undertaken in many countries. A study sequenced whole-exome genome in 31 NSCLCs and identified common and unique mutation spectra (4). Hu et al. performed a genome-wide association scan in 2,331 lung cancer patients and found four related SNPs (5).Si et al. described low-frequency gene alterations by next-generation sequencing (NGS) (6). NGS sequencing maybe more sensitive to detect actionable genomic alterations (7,8).
Despite previous studies have identified and discovered many lung cancer driver genes, the mutation spectrum characteristics of NSCLC patients is different between populations. The research of genomic characteristics of a large-scale Chinese NSCLC cohort is very important for clinical practice. Therefore, we conducted a retrospective study to use a hybrid capture-based eight-gene panel NGS assays to detect driver genes in tumor surgical specimens from patients with NSCLC. Among the eight target genes, EGFR, BRAF, MET, ERBB2, and KRAS are lung cancer target therapy-associated genes recommended by NCCN Clinical Practice Guidelines in Oncology. Neuroblastoma RAS viral (v-ras) oncogene homolog (NRAS), phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA), and discoidin domain receptor tyrosine kinase 2 (DDR2) are supported by clinical trials, literature, and in vitro evidence. We intended to confirm the clinical feasibility and utility of the capture-based targeted sequencing in reflecting the genetic profiles and assisting clinicians in clinical decision-making. Moreover, 30 novel potentially pathogenic variants we identified emerged as a potential therapeutic target for NSCLC.

Samples Collection
We collected 563 tissue biopsy samples from 563 patients with NSCLC. Clinical diagnosis was verified by cytopathology. The Ethics Committee of the Affiliated Hospital of Qingdao University approved the study (approval no.QYFY WZLL 26620). All patients provided signed informed consent. We performed the experiments according to the guideline released by the National Health and Family Planning Commission of the PRC.

Biopsy DNA Extraction
According to the manufacturer's instructions, DNA was extracted from tissue biopsy samples using the QIAamp DNA formalin-fixed paraffin-embedded (FFPE) tissue kit (Qiagen). Cell-free DNA was isolated from plasma using the QIAamp Circulating Nucleic Acid kit (Qiagen). DNA integrity, purity, and concentration were assessed by agarose gel electrophoresis, the NanoDrop2000 spectrophotometer, and the Qubit 2.0 fluorimeter (Thermo Fisher Scientific). Qualified DNA samples were used for library construction.

Library Construction and Sequencing
Library construction was performed as previously described (9). DNA was fragmented randomly using ultrasound, followed by end repair, phosphorylation, and adaptor ligation. Fragments were hybridized with a custom sequence capture-probe (Nimblegen, USA), amplified through PCR, and sequenced on an Illumina Hiseq2500 platform with 2×101 bp paired-end reads (Illumina, San Diego, USA). The panel enables capture-based ultra-deep targeted sequencing for the following genes: EGFR, KRAS, MET, BRAF, NRAS, PIK3CA, DDR2, and ERBB2 (Supplementary Table 1).

Statistical Analysis
We presented frequency and percentage for descriptive statistics. We used pair-wise Fisher's exact test, which was performed with GraphPad Priem 8 (https://www.graphpad.com), to compare the difference between the rates of affected cases in The Cancer Genome Atlas (TCGA) cohort and this cohort. We also used maftools to perform Fisher's exact test to detected mutually exclusive or co-occurring set of genes. P < 0.05 was considered statistically significant.

Sequencing and Variant Detection
We conducted a retrospective nationwide study to survey the prevalence rate of driver gene mutations in advanced (stage IIIB to IV) Chinese NSCLC patients with various histological subtypes. We conducted capture-based targeted sequencing of 563 primary lung tumors samples. On average, we generated 8.4 Gb of sequence per sample to a mean average depth of~1,810×. To reduce false-positive rates and eliminate common germline mutations, we adopted comprehensive filtering criteria and removed variants with population frequencies higher than 1% in dbSNP, ExAC, 1000 Genomes, and ESP6500SI-V2 database. After filtering, we identified 556 variants in 354 (62.88%) samples (Supplementary Table 2), including 438 SNVs, 26 insertions, and 92 deletions (Figure 2A). Among them, 416 variants were loss of function (LOF) in 309 samples, i.e., 329 missense SNVs, three nonsense SNVs, 84 indels, and three splicing variants ( Figure 2B). Missense SNVs and frameshift indels generally lead to the inactivation of the protein products. So these variants may be clinically significant and were included in the subsequent analyses.

Clinical Implications of Mutations
We annotated the clinical significance of these mutations based on ClinVar, dbSNP COSMIC70, and OncoKB databases. We predicted the pathogenicity using ten algorithms following the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) guidelines. In total, 380, 439, and 319 variants were annotated to the ClinVar, dbSNP, and COSMIC70 databases ( Figure 4A). ClinVar and COSMIC70 database-registered variants (n=292) were considered functionally important mutations. After comparing with ClinVar, we classified 98 variants as pathogenic, 70 as benign, 173 as drug response, two as risk factors, and 213 as variants of uncertain significance (VUS; Supplementary Table 4). Among VUS, nine had conflicting evidence for pathogenic and benign criteria, while the others did not have enough evidence. In addition, 68 variants were oncogenic and 17 were likely oncogenic in the OncoKB database. Among the 104 novel variants, which were not registered in the dbSNP build 150, ClinVar, and COSMIC70, 30 SNVs were predicted as deleterious by at least three algorithms (Supplementary Tables 2,  5). The gene with the highest number of novel variants was MET (n = 9) followed by ERBB2 (n = 6), DDR2 (n = 4), EGFR (n = 4), PIK3CA (n = 3), BRAF (n = 3), and KRAS (n = 1; Figure 4B). Among them, ERBB2 p.R678W mutation was predicted as likely oncogenic according to the OncoKB database.

DISCUSSION
In this descriptive study, we retrospectively investigated a cohort of 563 NSCLC patients using the capture-based targeted sequencing and revealed a hotspot mutations spectrum in NSCLC. We identified 556 variants, with 416 potentially actionable variants in 54.88% (309/563) patients. The percentage of patients harboring actionable genetic alterations is slightly lower than the previous studies (62%) (31). This is due to the small target gene panel. We compared our list of mutated genes with the COSMIC, dbSNP, and ClinVar databases, then used ten protein function prediction algorithms to revealed that 30 new mutated genes found in our exome study have not yet been reported. Otherwise, our study accurately reproduced a number of some observations in previous studies. For example, our mutation spectrum is similar to the somatic mutation spectrum of NSCLC reported in other studies (4,32,33). In our cohort, EGFR (55%) is the most common driver mutations followed by ERBB2 (12%), KRAS (11%), PIK3CA (9%), MET (8%), BRAF (7%), DDR2 (2%), NRAS (0.3%). Our data further highlight the importance of these mutations in lung tumorigenesis. The EGFR exon 21 p.L858R mutation is the highest-frequency mutation and the catalytic domain of the protein tyrosine kinase (PTKc) in EGFR is largely mutated. L858R mutation is considered sensitive to EGFR-TKIs. As expected, C>T/G>A transversions were the most common substitution in patients with lung cancer, which was the tobacco exposure-related mutation signatures (34,35). EGFR/BRAF, EFRF/KRAS, KRAS/ERBB2 demonstrated a mutually exclusive relationship, consistent with the recent evidence from cancer genomic studies which demonstrated that driver genes are often mutated in a mutually exclusive manner (36).
We compared our cohort with TCGA Caucasians cohort to explore the mutation profile in Chinese NSCLC patients, the mutation frequency of EGFR, KRAS, DDR2, ERBB2, and MET in our cohort is different from the frequency in TCGA cohort. This underlines the importance of the extensive analytical investigations. This study could help develop targeted treatment strategy and design gene panel which are more suitable for Chinese NSCLC patients.
There are some weaknesses in the present study that must be recognized. Firstly, because surgical specimens (n = 563) were collected over a longer period, patient clinical information was missing, which placed a somewhat large range of limitations. Secondly, we focused on point mutations and small insertions and deletions in this study. As to the complex genomic alterations of lung tumors, gene fusion and copy number variant analysis should be included in the future study. Third, we did not sequence tumors' matched normal samples simultaneously, which resulted in the inability to distinguish between somatic and germline mutations.
In summary, we showed a clear genomic landscape of the mutation frequencies of oncogenic drivers and 30 novel potentially pathogenic variants from 563 patients with NSCLC. This data may assist clinicians in clinical decision-making and provides the potential molecular basis for directing the treatment of lung cancer.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://db.cngb.org/, CNP0002486.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Ethics Committee of the Affiliated Hospital of Qingdao University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XJ and ZZ participated in study conception and design. HX, WL, XY, HT, and CM enrolled and managed patients. HJ, HT, HX, and WL carried out collection and assembly of data. HJ, YW, and XW were involved in data analysis and interpretation. HJ and YW prepared the manuscript and manuscript figures. XJ and ZZ edited, critically read, and revised the manuscript. All authors contributed to the article and approved the submitted version.