Comprehensive Genomic Profiling of Rare Tumors: Routes to Targeted Therapies

Comprehensive genomic profiling may inform novel treatment strategies and improve outcomes for patients with rare tumors. This study aims to discover opportunities for the use of targeted therapies already approved for routine use in patients with rare tumors. Solid tumors with an incidence lower than 2.5/100,000 per year were defined as rare tumors in China after comprehensive analysis based on epidemiological data and the current availability of standardized treatment. Genomic data of rare tumors from the public database cBioPortal were compared with those of the Chinese population for targetable genomic alterations (TGAs). TGAs were defined as mutations of ALK, ATM, BRAF, BRCA1, BRCA2, CDKN2A, EGFR, ERBB2, FGFR1,2,3, KIT, MET, NF1, NTRK1,2,3, PIK3CA, PTEN, RET, and ROS1 with level 1 to 4 evidence according to the OncoKB knowledge database. Genomic data of 4,901 patients covering 63 subtypes of rare tumor from cBioPortal were used as the western cohort. The Chinese cohort comprised next-generation sequencing (NGS) data of 1,312 patients from across China covering 67 subtypes. Forty-one subtypes were common to the two cohorts. The cumulative prevalence of TGAs was 20.40% (1000/4901) in the cBioPortal cohort and 53.43% (701/1312) in the Chinese cohort (p < 0.001). Among the 41 overlapping subtypes, the prevalence was still significantly higher in the Chinese cohort than in the cBioPortal cohort (54.1% vs. 26.1%, p < 0.001). Generally, targetable mutations in BRAF, BRCA2, CDKN2A, EGFR, ERBB2, KIT, MET, NF1, and ROS1 were ≥3 times more frequent in the Chinese cohort than in the cBioPortal cohort. Cancer of unknown primary tumor type, gastrointestinal stromal tumor, gallbladder cancer, intrahepatic cholangiocarcinoma, and sarcomatoid carcinoma of the lung were the top 5 tumor types with the highest number of TGAs per tumor. The incidence of TGAs in rare tumors was substantial worldwide and was even higher in our Chinese rare tumor population. Comprehensive genomic profiling may offer novel treatment paradigms to address the limited options for patients with rare tumors.


INTRODUCTION
Molecular profiling to identify potential therapeutic targets has been widely applied in common tumors such as lung cancer (1,2), breast cancer (3,4), melanoma (5), and colorectal cancer (6,7). The use of targeted therapy in selected patients can significantly improve outcomes. Increasingly, clinical trials feature targeted therapeutic agents or require a specific biomarker for entry (8,9). However, limited information is available regarding the utility of targeted therapy for rare tumors (10,11). Moreover, although individually rare, rare tumors cumulatively account for over 20% of adult malignant neoplasms in the United States (12,13).
There is no universally applied definition for rare tumors (Table 1). The European Society for Medical Oncology (ESMO) defines a rare tumor as a tumor with an annual incidence of <6/100,000 (14) in Europe. The National Cancer Institute (NCI) (https://www.cancer.gov/publications/dictionaries/cancerterms/def/791790) and the Food and Drug Administration (FDA) (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2789814/) define it as a tumor with an annual incidence of <15/100,000 in the US. According to the NCI definition, lung cancer, colon cancer, breast cancer, prostate cancer, endometrial carcinoma, rectal cancer, ovarian cancer, kidney cancer, melanoma, non-Hodgkin lymphoma, and gastric cancer are common cancers.
There is some discordance between these definitions and data specific to China. While esophageal cancer and hepatocellular carcinoma are rare tumors according to the NCI definition, they are common in China based on annual incidence. On the other hand, skin tumors, especially basal cell carcinoma, are common in the United States, with an incidence of 255.6/100,000 (15), but are relatively rare in China (14) (2.4/100,000 for all skin tumors). This suggests that the definitions from the US and Europe may not be appropriate for China, given the different incidence and prevalence of tumors.
This study analyzed data from the National Cancer Registry office of the National Cancer Center (16) and integrated it with presently available treatment options to generate a definition of rare tumors specific to China. Subsequently, available data for targetable genomic alterations (TGAs) of two cohorts of rare tumors from the cBioPortal and Geneplus databases were collected and analyzed. Our work provides valuable knowledge to guide personalized, targeted therapy for rare tumors.

Definition of Rare Tumors in China
We consulted the National Cancer Registry of the National Cancer Center, China (16) and generated an estimate of tumor incidence in mainland China. Tumor types were classified according to the International Classification of Diseases (ICD), and we comprehensively synthesized the epidemiological data and the availability of standard treatment in China, as well as the opinions of experts from the National Cancer Center. We then defined rare tumors in China according to the following criteria (Table 2):
1. First, we eliminated tumors from systems or organs that have a consensus or guidelines for treatment in China; an incidence of 2.5/100,000 per year was selected as the cut-off value for "rare tumor" among tumors with unique ICD codes listed with systems or organs.
2. Second, we searched OncoTree (http://oncotree.mskcc.org/) to further investigate the subtypes of those common tumors that (1) have a distinct ICD code and (2) exhibit an incidence >2.5/100,000 per year in China. We included subtypes of those tumors after confirming, by searching the PubMed database (https://www.ncbi.nlm.nih.gov/pubmed/) and the China National Knowledge Infrastructure (CNKI) database, that their incidence was ≤2.5/100,000 per year in China.
3. Finally, we also included cancers of unknown primary (CUP), not only because the incidence of those tumors was ≤2.5/100,000 per year in China, but also because there was no consensus or guideline for the treatment of CUP in China.
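The three criteria above can be read as a simple decision rule. The following Python sketch is one simplified reading of that rule, not the authors' actual procedure: the record fields (`has_guideline`, `is_subtype_of_common`, `is_cup`) and all example entries are hypothetical placeholders, and only the 2.5/100,000 cut-off comes from the text.

```python
from dataclasses import dataclass

RARE_CUTOFF_PER_100K = 2.5  # annual incidence cut-off stated in the text


@dataclass
class TumorEntry:
    # All field names are hypothetical; the source does not define a schema.
    name: str
    incidence_per_100k: float           # estimated annual incidence in China
    has_guideline: bool = False         # consensus or guideline exists in China
    is_subtype_of_common: bool = False  # OncoTree subtype of a common tumor
    is_cup: bool = False                # cancer of unknown primary


def is_rare(t: TumorEntry) -> bool:
    # Criterion 3: cancers of unknown primary are always included.
    if t.is_cup:
        return True
    # Criteria 1 and 2 both require an incidence at or below the cut-off.
    if t.incidence_per_100k > RARE_CUTOFF_PER_100K:
        return False
    # Criterion 2: low-incidence subtypes of common tumors are included.
    if t.is_subtype_of_common:
        return True
    # Criterion 1: ICD-level tumors covered by a guideline are excluded.
    return not t.has_guideline


# Placeholder entries for illustration only; these are not registry data.
examples = [
    TumorEntry("ICD-level tumor A", 1.2),
    TumorEntry("ICD-level tumor B", 1.2, has_guideline=True),
    TumorEntry("common tumor C", 30.0, has_guideline=True),
    TumorEntry("subtype of C", 0.4, has_guideline=True, is_subtype_of_common=True),
    TumorEntry("CUP", 2.0, is_cup=True),
]
rare_names = [t.name for t in examples if is_rare(t)]
```

In practice the expert review described above would override any purely rule-based classification, so a function like this would serve only as a first-pass screen.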

Definition of Targetable Mutations
According to the OncoKB Framework

Estimation of Targetable Mutations
To estimate the prevalence of targetable mutations in rare tumors, we queried the cBioPortal database using the genes listed in Supplementary Table 1 in a manually curated set of 175 non-redundant studies, including TCGA and non-TCGA

Patient Recruitment
We retrospectively analyzed genomic profiling data of 1,312 patients with rare tumors from the Geneplus database. This database contained patients enrolled from multiple hospitals in China from September 2015 to October 2019 (18,19). All patients received next-generation sequencing (NGS) testing at the Geneplus-Beijing Institute after providing written informed consent. All patients were stratified into clinicopathological subgroups according to the OncoTree system (http://oncotree.mskcc.org/). All tissue samples included in this study underwent an onsite pathology review to confirm histologic classification and tumor tissue adequacy, which required a minimum of 20% tumor cells. Genomic profiling was performed in a College of American Pathologists-accredited laboratory (Geneplus-Beijing) using the Illumina Nextseq CN 500 or Gene+Seq 2000 instrument (20,21). Briefly, serial sections from formalin-fixed paraffin-embedded (FFPE) tumor tissues were used for genomic tumor DNA extraction using the QIAamp DNA mini kit (Qiagen, Valencia, CA). ctDNA was isolated from 4 to 5 mL of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Valencia, CA). DNA from leukocytes was extracted using the DNeasy Blood Kit (Qiagen, Valencia, CA). Sequencing libraries were prepared from ctDNA using KAPA DNA Library Sequencing data were analyzed using default parameters. Adaptor sequences and low-quality reads were removed. The clean reads were aligned to the reference human genome (hg19) using Burrows-Wheeler Aligner (BWA; version 0.7.12-r1039). Realignment and recalibration were performed using GATK (version 3.4-46-gbc02625). Single nucleotide variants (SNVs) were called using MuTect (version 1.1.4) and NChot, a software developed in-house to review hotspot variants (22). Small insertions and deletions (InDels) were determined by GATK. Somatic copy number alterations were identified with CONTRA (v2.0.8). The final candidate variants were all manually verified using Integrative Genomics Viewer.
Targeted capture sequencing required a minimal mean effective depth of coverage of 300× in tissues and 1,000× in plasma samples. For the 1,312 patients included in our study, the mean effective depth of coverage was 1,295× in tissues, 2,014× in plasma samples, and 299× in germline DNA samples (Supplementary Table 4).
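The depth requirement can be expressed as a small quality-control check. This is a minimal sketch assuming a hypothetical `passes_depth` helper and sample-type labels; only the 300× and 1,000× thresholds are taken from the text, and no threshold is stated for germline DNA, so that case is rejected rather than guessed.

```python
# Minimum mean effective depth of coverage stated in the text.
MIN_MEAN_DEPTH = {"tissue": 300, "plasma": 1000}


def passes_depth(sample_type: str, mean_effective_depth: float) -> bool:
    """Return True if a sample meets the stated minimum mean depth.

    The function name and the sample-type labels are illustrative
    assumptions, not part of the described pipeline.
    """
    if sample_type not in MIN_MEAN_DEPTH:
        # The source gives no threshold for other sample types (e.g. germline).
        raise ValueError(f"no depth threshold defined for {sample_type!r}")
    return mean_effective_depth >= MIN_MEAN_DEPTH[sample_type]


# Cohort-level means reported in the text, used here as example inputs.
tissue_ok = passes_depth("tissue", 1295)
plasma_ok = passes_depth("plasma", 2014)
```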
TGAs simultaneously detected by this assay included base substitutions, short insertions and deletions, focal gene amplifications and homozygous deletions (copy number alterations) and select gene fusions and rearrangements. Variants were filtered to exclude synonymous variants, known germline variants in dbSNP, and variants that occur at a population frequency of >1% in the Exome Sequencing Project. Germline variants were interpreted following ACMG guidelines, and the variants were classified as pathogenic, likely pathogenic, unknown significance, likely benign, and benign.
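The variant filtering described above can be sketched as a predicate applied to each candidate variant. The record layout (`consequence`, `known_germline_dbsnp`, `esp_frequency`) and the toy records are assumptions made for illustration; the source does not specify how variants were represented, and only the three exclusion rules and the 1% threshold come from the text.

```python
MAX_POPULATION_FREQ = 0.01  # variants above 1% in the Exome Sequencing Project are excluded


def keep_variant(variant: dict) -> bool:
    """Apply the three exclusion rules described in the text.

    The dictionary keys are hypothetical field names.
    """
    if variant["consequence"] == "synonymous":
        return False  # synonymous variants are excluded
    if variant["known_germline_dbsnp"]:
        return False  # known germline variants in dbSNP are excluded
    if variant.get("esp_frequency", 0.0) > MAX_POPULATION_FREQ:
        return False  # common population variants are excluded
    return True


# Toy records for illustration only; these are not study data.
candidates = [
    {"id": "v1", "consequence": "missense", "known_germline_dbsnp": False, "esp_frequency": 0.0},
    {"id": "v2", "consequence": "synonymous", "known_germline_dbsnp": False, "esp_frequency": 0.0},
    {"id": "v3", "consequence": "missense", "known_germline_dbsnp": True, "esp_frequency": 0.0},
    {"id": "v4", "consequence": "frameshift", "known_germline_dbsnp": False, "esp_frequency": 0.02},
]
kept_ids = [v["id"] for v in candidates if keep_variant(v)]
```

Keeping the rules in one predicate makes the order of exclusions explicit and easy to audit against the description.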

Statistics
The Chi-square test or Fisher's exact test was performed to compare the frequency of targetable mutations between groups. All statistical analyses were performed with SPSS (v.23.0; STATA, College Station, TX, USA) or GraphPad Prism (v. 6.0; GraphPad Software, La Jolla, CA, USA) software. Statistical significance was defined as a two-sided P-value of < 0.05.
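As an illustration of the comparison described above, the following sketch computes a Pearson chi-square statistic for a 2×2 table using only the Python standard library. It is not the SPSS or GraphPad procedure used in the study; it applies no continuity correction, and the helper name `chi_square_2x2` is an assumption. The counts are the overall TGA counts reported for the two cohorts (1000/4901 and 701/1312).

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns the statistic and the two-sided p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: cBioPortal cohort, Chinese cohort; columns: with TGA, without TGA.
chi2, p_value = chi_square_2x2(1000, 4901 - 1000, 701, 1312 - 701)
significant = p_value < 0.05  # two-sided threshold stated in the text
```

For tables with small expected counts, Fisher's exact test, which the text also mentions, would be the more appropriate choice.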

Mutation Profiling of Rare Tumors in cBioPortal Database
Rare

Mutation Profiling of Chinese Patients With Rare Tumors
We recruited a second, independent patient cohort from another pan-China database, Geneplus. A total of 1,312 patients with rare tumors were included in the study. The clinicopathological characteristics of all the patients are summarized in
We first compared the overall prevalence of TGAs in these two cohorts. The prevalence of targetable mutations was significantly higher in our cohort than in the cBioPortal data (53.4% vs. 20.4%, p < 0.001) (Table 6). Specifically, mutations or amplifications of BRAF, BRCA2, CDKN2A, EGFR, ERBB2, KIT, MET, NF1, and ROS1 were 3 or more times more frequent in our cohort than in the cBioPortal cohort. BRCA1 alterations and NTRK fusions were slightly more common in the cBioPortal cohort. When the analysis was restricted to the 41 overlapping subtypes, the difference in targetable mutations remained significant (54.1% vs. 26.1%, p < 0.001). We further focused on 4 rare tumors (gallbladder cancer, astrocytoma, gastrointestinal stromal tumor, and cancer of unknown primary) with more than 30 cases in both cohorts. The overall incidence of targetable mutations was higher in our cohort (Supplementary Table 8). For gallbladder cancer, ERBB2 and BRCA2 mutations were significantly more frequent in our cohort, while ATM mutation was enriched in the cBioPortal cohort (Figure 1A) (23). For astrocytoma, BRAF, ATM, CDKN2A, and EGFR mutations/amplifications were highly enriched in our cohort (Figure 1B). For gastrointestinal stromal tumor, the prevalence of KIT mutation was similar between the two groups, but our cohort had a significantly higher prevalence of CDKN2A and NF1 alterations (Figure 1C). For cancer of unknown primary, EGFR mutation and ALK fusion were highly enriched in our cohort, which suggests that those tumors might originate from the lung (Figure 1D).
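The "3 or more times more frequent" comparison is a ratio of prevalences. The sketch below shows that arithmetic using the overall cohort counts reported in the text; the helper names and the `FOLD_THRESHOLD` constant are assumptions, and gene-level counts are not reproduced here because they are not given in this section.

```python
FOLD_THRESHOLD = 3.0  # "3 or more times more frequent", as stated in the text


def prevalence(altered: int, total: int) -> float:
    """Fraction of patients carrying at least one alteration."""
    return altered / total


def fold_difference(prev_a: float, prev_b: float) -> float:
    """Ratio of prevalence in cohort A to prevalence in cohort B."""
    if prev_b == 0:
        raise ValueError("reference prevalence is zero; ratio undefined")
    return prev_a / prev_b


# Overall TGA counts reported for the two cohorts.
chinese = prevalence(701, 1312)      # Chinese cohort
cbioportal = prevalence(1000, 4901)  # cBioPortal cohort
overall_fold = fold_difference(chinese, cbioportal)
meets_threshold = overall_fold >= FOLD_THRESHOLD
```

Applied per gene, the same ratio would identify which alterations meet the threefold criterion; the overall ratio across all TGAs is lower than that threshold.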

DISCUSSION
This study focused on rare tumors in China and proposed a novel definition of rare tumors customized for China by jointly considering frequency and clinical characteristics to address the disparate requirements of clinical decision-making, clinical research, drug development, and health care services. Applying this new definition, a comprehensive list of rare tumors was explored for genetic biomarkers of response to targeted therapy, both in the worldwide cBioPortal database and in a mainland China-specific patient cohort, mainly to explore potential novel treatment indications for those rare tumors in China. The results show that targetable gene alterations are frequently present in rare tumors, and that these mutations are enriched in the Chinese population compared with the general global population. Most importantly, a definition of rare tumors in China was proposed for the first time based on the epidemiological data and the availability of standard treatment in China. An incidence of ≤2.5/100,000 per year as a cut-off value for rare tumors in China is novel, and it is stricter than the thresholds used in the USA and Europe, which are 15/100,000 and 6/100,000, respectively. The disparity should be attributed mainly to the facts that China has a larger population base and a different epidemiological distribution for most types of tumors compared with western countries. We believe that any threshold for rarity is artificial and should be considered merely indicative, and that an incidence threshold for rareness should be applied with flexibility.
Frontiers in Oncology | www.frontiersin.org
The main purpose of proposing this definition is to draw the attention of clinical practitioners and government personnel in China, as well as drug investigators worldwide, to promote the development of novel drugs and strategies for rare tumors that lack consensus or guidelines for effective treatment in China, and ultimately to improve the outcomes of patients with rare tumors. After applying our rare tumor criteria to patient data, we found that the overall prevalence of TGAs in the Chinese rare tumor cohort was much higher than that in the cBioPortal cohort. We restricted our analysis of TGAs to genes with Level 1-4 evidence according to the OncoKB knowledge database. Using this framework, we identified mutations of ALK, ATM, BRAF, BRCA1, BRCA2, CDKN2A, EGFR, ERBB2, FGFR1,2,3, KIT, MET, NF1, NTRK1,2,3, PIK3CA, PTEN, RET, and ROS1 within our cohort. The cumulative prevalence of TGAs was significantly higher in the Chinese cohort (53.43%) than in the cBioPortal cohort (20.40%). This indicates that these patients may be more likely to benefit from targeted therapies. The underlying causes of the disparity in mutation prevalence are complicated, as the two cohorts had significantly different compositions of tumor subtypes, as well as different numbers of patients in each subtype. Nevertheless, the difference between the two cohorts remained significant (54.1% vs. 26.1%, p < 0.001) when only the 41 shared subtypes of rare tumor were analyzed. This finding is consistent with data showing that the EGFR mutation rate in Asian NSCLC patients is higher than that in Caucasian patients. Our findings indicate that the classification of "rare tumor" is heterogeneous by ethnicity.
We also found that the most common TGAs in both cohorts are actionable with available drugs. The top 5 targetable mutations in the Chinese cohort were EGFR, KIT, CDKN2A, PIK3CA, and PTEN; those in the cBioPortal cohort were PIK3CA, PTEN, KIT, CDKN2A, and ATM. For each of the 4 shared targetable mutations, at least one targeted drug is currently available in China (imatinib for KIT, palbociclib for CDKN2A, temsirolimus and everolimus for PIK3CA and PTEN) (Table 6). This suggests that effective treatment options are already available for some patients with rare tumors.
Finally, our data indicate that samples for genetic profiling of rare tumors are still inadequate. Only 10.5% (4901/46566) of tumor samples in the cBioPortal database are from rare tumors. Moreover, 52 out of 141 (36.9%) subtypes of rare tumors did not have genetic data available in cBioPortal or in our cohort (Supplementary Table 9). Among subtypes with data, the median number of samples was 19 in cBioPortal and 5 in our cohort. Considering the high prevalence of TGAs in the rare tumor population and the largely unmet medical needs of those patients, more attention and effort should be devoted to this field in the near future.

CONCLUSIONS
We defined rare tumors in China as ICD-specified tumors with an incidence ≤2.5/100,000 per year in China, subtypes of non-rare ICD-specified tumors with an incidence ≤2.5/100,000 per year in China, and cancers of unknown primary. Genomic profiling of rare tumors matching this definition from cBioPortal and from a Chinese cohort drawn from the Geneplus database demonstrated a substantial prevalence of targetable genomic alterations in these tumors, which was even higher in the Chinese rare tumor patient population than in the general population. These findings may facilitate future drug investigations and treatment improvement for rare tumors.

DATA AVAILABILITY STATEMENT
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

ETHICS STATEMENT
This study was approved by the ethics committees of the National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (NCC2019C-222). All patients signed written informed consent for further scientific analysis of genetic data.

AUTHOR CONTRIBUTIONS
NL and XY conceived the study. SW and RC processed the data and performed the data analysis. YT, YY, YF, HH, DW, HF, YB, CS, AY, QF, and DG contributed to data collection, generation of the tumor list, and scientific insights. SW and RC wrote the manuscript. SW, NL, and XY revised the manuscript.

ACKNOWLEDGMENTS
We thank the patients for providing the valuable genetic data for scientific analysis.