Unraveling Gene Fusions for Drug Repositioning in High-Risk Neuroblastoma

High-risk neuroblastoma (NB) remains a significant therapeutic challenge facing current pediatric oncology patients. Structural variants such as gene fusions have shown an initial promise in enhancing mechanistic understanding of NB and improving survival rates. In this study, we performed a comprehensive in silico investigation on the translational ability of gene fusions for patient stratification and treatment development for high-risk NB patients. Specifically, three state-of-the-art gene fusion detection algorithms, including ChimeraScan, SOAPfuse, and TopHat-Fusion, were employed to identify the fusion transcripts in a RNA-seq data set of 498 neuroblastoma patients. Then, the 176 high-risk patients were further stratified into four different subgroups based on gene fusion profiles. Furthermore, Kaplan-Meier survival analysis was performed, and differentially expressed genes (DEGs) for the redefined high-risk group were extracted and functionally analyzed. Finally, repositioning candidates were enriched in each patient subgroup with drug transcriptomic profiles from the LINCS L1000 Connectivity Map. We found the number of identified gene fusions was increased from clinical the low-risk stage to the high-risk stage. Although the technical concordance of fusion detection algorithms was suboptimal, they have a similar biological relevance concerning perturbed pathways and regulated DEGs. The gene fusion profiles could be utilized to redefine high-risk patient subgroups with significant onset age of NB, which yielded the improved survival curves (Log-rank p value ≤ 0.05). Out of 48 enriched repositioning candidates, 45 (93.8%) have antitumor potency, and 24 (50%) were confirmed with either on-going clinical trials or literature reports. The gene fusion profiles have a discrimination power for redefining patient subgroups in high-risk NB and facilitate precision medicine-based drug repositioning implementation.


INTRODUCTION
Neuroblastoma (NB) is the most common and deadly pediatric malignancy, and the average age of patients is about 1-2°years at diagnosis (Matthay et al., 2016;Fletcher et al., 2018). Approximately 70% of NB patients have a metastatic disease with a less than 30% event-free survival rate (Moroz et al., 2011). Several therapy and treatment options, such as immunotherapeutic strategies, local irradiation, autologous stem cell transplantation (ASCT) combined with chemotherapy, have improved the survival rate of NB patients. However, a substantial number of NB patients, particularly in the high-risk group, still suffer profound treatment-related morbidity (Semeraro et al., 2015). Therefore, advanced treatment options are still urgently needed to improve the survival rate while eliminating adverse events.
The molecular understanding of high-risk NB is mostly elusive. It becomes significant hurdles to advance NB prognosis and therapy development (Almstedt et al., 2020). Beyond the frequently detected gene alternations such as ALK activations (Mossé et al., 2008), MYCN amplification (Zeid et al., 2018), and LMO1 expression (Wang et al., 2011), advancement in sequencing technologies provided a more in-depth and width view of the molecular basis of NB (Pugh et al., 2013;Bueno et al., 2016;Boeva et al., 2017;Ma et al., 2018). More and more complex genetic events such as gene fusions have been identified and could differential high-risk NB patients (Rorie et al., 2004;Mertens et al., 2015;Peifer et al., 2015). For example, (Peifer et al., 2015) discovered recurrent genomic rearrangements of the telomerase reverse transcriptase gene (TERT), occurring only in high-risk NB patients. Furthermore, TERT-related fusions could be used to define a new patient subgroup in high-risk NB with adverse clinical outcomes. These promising findings trigger deeper thinking on how to translate these genetic findings into therapy development.
Gene fusion is a result of structural variants (SVs), including insertion, deletion, inversion, and translocation that joins the two separate transcripts. Commonality and diversity of gene fusions in cancer were discussed elsewhere (Lee et al., 2017;Xu et al., 2018;Gao et al., 2018;Hu et al., 2018;Picco et al., 2019). A lot of computational tools have been developed to detect fusion transcripts based on DNA/RNA sequencing (Kim and Salzberg, 2011;Jia et al., 2013;Beccuti et al., 2014;Davidson et al., 2015;Heyer et al., 2019). Some comparative studies have also been conducted to prioritize the fusion detection tools based on statistical measures such as precision and F-scores (Wang et al., 2013;Liu et al., 2016;Kumar et al., 2016;Zhang et al., 2016). The conclusion drawn from these comparative studies mainly suggested combined top performance callers generate consensus results for further experimental verification (Liu et al., 2016;Kumar et al., 2016). Fusion detection algorithms with different mathematical equations and hypothesis behind, it is still an open question on how to anchor and apply these algorithms based on biological relevance. Furthermore, two tumors rarely shared the same gene fusions due to tumor heterogeneity, which limited the discrimination power for stratification of cancer into informative subtypes based on the individual fused transcript. The question has been raised on whether the patient gene fusion profiles could be used to regroup the patients, which are predictive of clinical outcomes such as patient survival, therapy response, and tumor pathology.
Moreover, oncogenic gene fusion not only expands our understanding of tumor biology but provides possible therapeutic targets for treatment development. For example, Imatinib and ponatinib were approved US FDA to treat chronic myeloid leukemia (CML), which targets BCR-ABL1 fusion (Druker, 2008). Moreover, crizotinib/ceritinib inhibiting ALK fusion was approved to treat non-small cell lung cancer (Mertens et al., 2015). However, these approved drugs are mainly tyrosine kinase inhibitor (TKI) inhibitors, which suffers some severe adverse drug reactions (ADRs) such as cardiotoxicity and drug-induced liver injury Xu et al., 2018). It is interesting to explore the probability of utilizing transcriptomic response, interplayed with gene fusions for a distinct patient subgroup for alternative treatment development while minimizing toxicity.
To explore the potential solutions for these unsolved questions, we conducted a comprehensive genomic analysis of 498 human NB cases ( Figure 1). First, A landscape of gene fusions was presented by using three state-of-art fusion calling algorithms, including ChimeraScan (Iyer et al., 2011), SOAPfuse , and TopHat-Fusion (Kim and Salzberg, 2011). Then, high-risk NB patients were regrouped based on detected gene fusion profiles and evaluated by survival analysis. Next, differentially expressed genes (DEGs) of the redefined high-risk NB patient subgroups were extracted and functionally analyzed. Finally, Repositioning candidates for the redefined high-risk subgroups were enriched with a large-scale of transcriptomic profiles in LINCS L1000 Connectivity Map (CMap). The proposed framework provides a promising approach to translate novel genetic findings into therapy development.

RNA-Seq Dataset of Neuroblastoma
The tumor samples of 498NB patients were enrolled from seven countries under the consent of respective clinical trials. The range of patients at diagnosis was from 0 to 295.5°months. The patients were classified based on the International Neuroblastoma Staging System (INSS, https://www.cancer.org/cancer/neuroblastoma/ detection-diagnosis-staging/staging.html) and the MYCNamplified (MNA) was measured. The ratio of MNA patients and total patients in each stage: stage 1 (3/121), stage 2 (5/78), stage 3 (15/63), stage 4 (65/183), and stage 4 (4/53). Furthermore, 176 patients were classified as high-risk ones based on the Revised International Neuroblastoma Response Criteria. The clinical characteristics of 498 NB patients were listed in Supplementary Table S1.
The detailed sample preparation was described elsewhere (Oberthuer et al., 2006). Briefly, the patients' tumor samples were manually checked by a pathologist and ensure the sample contains at least 60% of tumor content. The total RNA was then isolated from 30 to 60 mg of snap-frozen tissue obtained before cytotoxic treatment using the FastPrep FP120 cell disruptor (Qbiogene-Inc, Carlsbad, CA) and the TRIzol reagent (Invitrogen, Karlsruhe, Germany). Last, the RNA integrity was assessed, and the samples were selected with an RNA integrity number of more than 7.5. The RNA-seq data of 498 primary NB samples were generated in FDA Sequencing Quality Control (SEQC) phase I project. The raw RNA-seq data could be downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database with series accession number GSE62564 (Zhang et al., 2015). The purified mRNA was extracted from total RNA using Dynabeads ® mRNA Purification Kit (Invitrogen), and ERCC RNA spike-in was injected based on the user guide for RNA sequencing. Then, the non-stranded TruSeqs ™ protocol was used to conduct Library preparation. Next, clusters were generated based on the TruSeq PE Cluster Kit v3 reagent preparation guide. Last. High-throughput shotgun sequencing was performed on the Illumina HiSeq 2000 platform with the paired-end 100-bp reads. A total of 30,753,066,000 reads were produced, which enabled high coverage of the entire genome spectrum of NB. The average reads per sample was 6.1790e+07 ± 9.5474e+06, and the reads were distributed in a range of 39270986 to 10442490 (see Supplementary Figure S1). More detailed information on sequencing data generation was described elsewhere (Zhang et al., 2015).

Detection of Transcript Expression and Fusion Transcripts
There are more than 20 state-of-art fusion transcript detection tools (Liu et al., 2016). In this study, we applied the three most cited tools, including TopHat-Fusion (Kim and Salzberg, 2011), ChimeraScan (Iyer et al., 2011), and SOAPfuse , to detect fusion transcripts from the RNA-seq data of 498 primary NB samples. In this study, we used the human reference genome sequence (hg19, downloaded from the UCSC Genome Browser: http://hgdownload.soe.ucsc.edu/downloads.html#human) to detect the transcript-level expression and fusion transcripts.
To generate a transcript-level expression, we used TopHat2 v2.1.0 (Kim et al., 2013) to align the raw reads to the UCSC human genome and quantified the expression levels of all the transcripts with FPKM (Fragments Per Kilobase of transcript per FIGURE 1 | Flowchart of study: the workflow consists of three components including (1) fusion detection by three algorithms (i.e., ChimeraScan, SOAPfuse, and TopHat-Fusion) and fusion annotation by publicly available database and pathway analysis; (2) The high-risk patients were regrouped based on gene fusion profiles using non-negative matrix factorization (NMF) and hierarchical cluster analysis (HCA); (3) Survival analysis and drug repositioning for the redefined patient subgroup.
Frontiers in Pharmacology | www.frontiersin.org April 2021 | Volume 12 | Article 608778 Million fragments mapped) values using Cufflinks v2.2.1 (Trapnell et al., 2010). All the parameters in the pipelines of alignment and quantification were set as default. The expressions at transcription-level were further summarized to the annotated genes in the RefSeq database (https://www.ncbi.nlm.nih.gov/ refseq/) by using the highest FPKM value in those of transcripts as the expression of a gene when there were multiple transcripts corresponded to one gene (Beccuti et al., 2014). There are many parameters in different fusion transcript detection algorithms, which have a significant impact on calling performance. However, optimization of parameter setting for each algorithm with different datasets is far beyond the scope of our paper. Therefore, we mainly followed the default parameter setting in each algorithm.

TopHat-Fusion
For TopHat-Fusion, all raw reads in FASTQ files were first aligned to the reference genome by using the TopHat2 v2.1.0 (Kim et al., 2013). Then, the initially unmapped reads were split into small segments and remapped to the reference genome for identifying the initial fusion candidates by TopHat-Fusion. Lastly, the fusions, for which the number of fusion spanning reads was higher than five and the sum of the fusion spanning reads, and the supporting mate pairs were greater than 10, were kept as candidate fusion transcripts.

ChimeraScan
The ChimeraScan pipeline utilized Bowtie (version 1.1.2) to align the raw reads to the reference genome (Langmead and Salzberg, 2012). The subsequent procedures of nominating the candidate fusions, detecting the spanning reads, and filtering the false positives were conducted with the default settings.

SOAPfuse
For SOAPfuse (Li et al., 2009), the raw reads were mapped to the reference genome by using the SOAP2 (version 2.21) algorithm. The single-end and paired-end mapped reads were kept for identifying the candidate gene fusions. The unmapped reads were then aligned to the annotated transcripts (Ensemble release), and the mapped reads were retained. Finally, the unmapped reads in the second step were iteratively trimmed and realigned to the annotated transcripts until the length of the reads was less than 30 nucleotides. The reads still unmapped to the annotated transcripts were filtered out. All the aligned reads in the three steps were used to detect the gene fusions by seeking the spanreads. The maximum hits for each span-read, and the filter parameters for identifying the gene fusions, were set as default.

Gene Fusion Annotation
The detected fusion transcripts were annotated by the following strategies. First, the reported fusion transcripts were curated and combined based on three public resources, including the Catalogue of Somatic Mutations in Cancer (COSMIC, https:// cancer.sanger.ac.uk/cosmic/download) (Forbes et al., 2008), Tumor Fusion Gene Data Portal (http://tumorfusions.org/) (Xu et al., 2018), and ChimerDB 3.0 (https://academic.oup. com/nar/article/45/D1/D784/2605708) (Lee et al., 2017). Second, a list of genes associated with neuroblastoma risk was curated by literature survey by querying against PubMed and other databases using key words "genes" and "neuroblastoma". Third, a file of human protein kinases was downloaded from UniProt (https://www.uniprot.org/docs/pkinfam). All the annotation data sets were listed in Supplementary Table S2.

Fusion Transcripts-Based Stratification and Survival Analysis
The 176 high-risk NB patients were stratified into new subgroups based on the detected fusion transcript profiles. First, the patientfusion transcript profiles matrix was constructed based on fusion transcripts detected from each algorithm. Then, NMF was used to decompose the patient-fusion transcript profiles matrix (F: 176 patients × unique number of fusion transcripts) into two matrices. 1) patient subgroup assignment (W: 176 patients × k subgroups) 2) fusion transcript assignment (H: k subgroups × unique number of fusion transcripts). This procedure was repeated 500 times. Consequently, the patient-patient relationship matrix (176 patients × 176 patients) was generated, and each cell of the matrix represent the probability of any two patients assigned to the same patient subgroup. Subsequently, the Hierarchical clustering analysis was used to create the consensus assignment of patients into k subgroups. Finally, Kaplan-Meier survival analysis was conducted for the comparisons between the subgroups, and the p values were calculated using the Log-rank test. All the procedures were performed in R (version 3.4.1) with the packages NMF v0.21.0, ggplot2 v2.2.1, survminer v0.4.2, and survival v2.4.3.

Importance Fusions for Distigusing the Patients Subgroups
To further investigate the important fusions and classification performance of gene fusions for redefined patient groups, we employed the XGboost binary classifier (Chen and Guestrin, 2016). XGBoost (Extreme Gradient Boosting) is an ensemble machine learning algorithm for regression and classification problems based on the Gradient Boosting Decision Tree (GBDT), which has been widely applied in biomedical applications (Ji et al., 2019). Specifically, the 176 high-risk patients were stratified into patient subgroups, and we developed the XGboost classifier on the gene fusion profiles and their redefined patient groups. The important gene fusion profiles and performance metrics (i.e., The area under the receiving operating characteristic curve (AUC)) based on 100-run 5-fold cross-validation were calculated. The calculation was performed in R (version 3.4.1) with packages xgboost version 1.3.2. The detailed hyperparameters, including binary: logistic objective, max-depth 6, step size of each boosting step 50, were used.

Differentially Expressed Genes in Patient Subgroups
To identify the differentially expressed genes (DEGs), we separately compared the transcript profiles of the patients in k Frontiers in Pharmacology | www.frontiersin.org April 2021 | Volume 12 | Article 608778 subgroups with those of the patients in the control group, in which the survival days of the patients were longer than the median survival days among the stages 1 and 2. The DEGs were finally identified using the R packages limma and edgeR with an adjusted p value less than 0.05 as a cut-off value (Robinson and Smyth, 2008). The genes in the DEG list were ranked by their fold changes in descending order and the top/down 500 genes were extracted for further analysis. The functional analysis of extracted DEGs in each patient subgroup was conducted by using The Database for Annotation, Visualization, and Integrated Discovery (DAVID, https://david.ncifcrf.gov/) [44].

Enrichment of Repositioning Candidates
Drug-induced transcriptional profiles of NIH LINCS project (http://www.lincsproject.org/) (Corsello et al., 2017) were employed to enrich repositioning candidates specifically for each patient subgroup. The hypothesis behind genome-based repositioning is that if the drug signature is reversely correlated with the disease signature, the drug could be potentially used to treat the diseases. For LINCS data, LINCS L1000 characteristic direction signatures search engine (L1000CDS 2 ) was used to reversely compare the DEG in each patient subgroup to the drug transcriptional signatures in LINCS project (Duan et al., 2016). The L1000CDS 2 used data sets including LINCS L1000 level 3 normalized data and level 5 moderated Z-scores (MODZ), which were downloaded from lincscloud.org and GEO (GSE70138). There are a total of transcriptomic profiles generated from 98 cell lines.

Immune Cell Gene Signatures
Although neuroblastoma is typically considered to be an immunologically 'cold' tumor (Szanto et al., 2020), several studies have demonstrated the presence of tumor-infiltrating lymphocytes (e.g., T cells and NK cells), in human neuroblastoma tumors (Coughlin et al., 2006;Hishiki et al., 2018;Wienke et al., 2021). Therefore, we further investigated the immune-related gene expression in the refined patient subgroup. Immune cell gene expression data in mouse cell lines and tissues were extracted from the Immunological Genome Project (ImmGen) (Heng and Painter, 2008). The pre-processing and normalization of data were described previously (Painter et al., 2011;Kidd et al., 2015). Specifically, 304 differential state gene expression (e.g., fold change values) covering 11,153 mapped ortholog human Entrez gene ids were generated between two steady-state profiles from 221 unique immunological cell types (stored at https://github.com/ iguana128/Gene-fusion_NB). In this study, we ranked order gene expression profile from high to low based on fold change values for each of 304 immune-related states. Then, the top/down 500 genes in each immunological state were selected as DEGs for further analysis.

Code Availability
The scripts of RNA-seq analysis, data analysis, and data curation were listed in the GitHub (https://github.com/iguana128/Gene-fusion_NB). Furthermore, the scripts for generating all the figures and tables were provided as well.

Overview of Gene Fusion Transcripts
Three algorithms, including ChimeraScan (Iyer et al., 2011), SOAPfuse , and TopHat-Fusion (Kim and Salzberg, 2011), were employed to harbor gene fusion transcripts among 498 human NB cases. A total of 9,153 (i.e., 9,153/498 18.37 per patient), 2,004 (i.e., 2,004/498 4.02 per patient), and 1,057 (i.e., 1,057/498 2.12 per patient) unique gene fusions were detected by ChimeraScan, SOAPfuse, and TopHat-Fusion, respectively (Supplementary Table S3). We found 47.07% (4,308/9,153), 59.28% (1,188/2,004), and 43.61% (461/1,057) of total detected fusions only specific to the individual patient for ChimeraScan, SOAPfuse, and TopHat-Fusion, respectively. ChimeraScan is more sensitive to detect more gene fusions than another two algorithms. There were only 221 gene fusion transcripts identified by at least two algorithms, which represents the divergence of fusion detection algorithms ( Figure 2A). We further investigated the distribution of detected gene fusions across the patient subgroups defined by using the International Neuroblastoma Staging System (INSS). Specifically, the number of identified gene fusions were increased from the early stages (stage 1 ∼ stage 3) to late stages (stage 4 and high-risk group), indicating the higher stage has more complex tumor compositions. For ChimeraScan and TopHat-Fusion, more across-chromosomal fusions were detected than that of inter-chromosomal fusions. However, SOAPfuse harbored more inter-chromosomal fusion transcripts ( Figure 2B). Furthermore, the concordances among the three algorithms for individual patients also tended to be increased in patients in stage 4 and high-risk groups, although overall concordances were suboptimal (less than 5% overlapped ratio) (Supplementary Figure S2).
The detected gene fusions from the three algorithms were further annotated by the curated knowledge concerning reported cancer-related genes fusions, kinase protein family, and neuroblastoma related key genes. Similarly, patients from late stages (stage 4 and high-risk group) enriched more regarding reported cancer-related fusions, kinase-related fusions, and neuroblastoma key genes related fusions. The total number of annotated fusions was increased from TopHat-Fusion to SOAPfuse and ChimeraScan ( Figure 2C). Table 1 summarized the annotated fusions identified by the three algorithms. SOAPfuse detected more reported cancer-related fusions (90) than that of ChimeraScan (26) and TopHat-Fusion (10). Axon guidance pathway was enriched by the genes involving reported cancer-related gene fusions from both ChimeraScan and SOAPfuse. However, the enriched pathways (e.g., Hippo signaling pathway and Oxytocin signaling pathway) based on reported gene fusions identified by TopHat-Fusion were distinct. MAPK signaling pathway was enriched by kinase-related fusions from both ChimeraScan and TopHat-Fusion, although the involved gene fusions were entirely different. It indicated that fusion detection algorithms have a relatively higher similarity in the biological levels. It was interesting that some neuroblastoma related essential genes are more susceptible to form gene fusions. For example, ChimeraScan was more sensitive to ALK related Frontiers in Pharmacology | www.frontiersin.org April 2021 | Volume 12 | Article 608778 fusions, while SOAPfuse and TopHat-Fusion detected more MYCN and LOM1 related fusions. DDX1 associated fusions were identified by all three algorithms. We further extracted the top fifty high-frequent gene fusions across the patients in each stage for the three algorithms ( Figure 2D). There were 38, 26, and 31 high-frequency gene fusions that appeared in all clinical stages for ChimeraScan, SOAPfuse, and TopHat-Fusion, respectively (Supplementary  Table S4). However, as mentioned above, substantial lowfrequency gene fusions are more patient-specific.

Fusion Transcripts-Based Stratification
To further investigate the discrimination power of gene fusion profiles, we stratified 176 high-risk NB patients into new subgroups based on the detected fusion transcripts in each of the three algorithms. Here, only fusions appeared more than one time in the high-risk group were employed. Consequently, the high-risk patient and gene fusion profiles matrices were constructed with dimensions 176 patients × 2,509 fusions, 176 patients × 446 fusions, and 176 patients × 351 fusions for ChimeraScan, SOAPfuse, and TopHat-Fusion, respectively. Then, 176 patients were divided into a predefined number of subtypes (k 6, 6, and 3) by using average subtype assignment frequency from 500 runs of the NMF algorithm. The predefined subtype k was optimized by using cophenetic degree and sparseness parameters in NMF (Supplementary Figure S3). Then, patient subtype assignment matrices (176 patients × 176 patients) for each fusion detection method were further clustered by using hierarchical clustering analysis (HCA) ( Figure 3A). The 176 patients were clustered into four subgroups for each fusion algorithm. The overall survival distribution of each subtype was illustrated in Figure 3B. It is illustrated that there existed one subgroup (number of patients 99, 120, and 67) with significantly smaller overall survival time (median overall survival 798, 883, and 735°days) than another three groups (median overall survival 1,242, 1,241, and 1,229°days). It was found 39 patients were overlapped by the subgroups based on fusion profiles from ChimeraScan (39/99 39.4%), SOAPfuse (39/120 32.5%), and TopHat-Fusion (39/67 58.2%), respectively (Supplementary Figure S4). The Kaplan-Meier survival analysis showed significantly different survival times between the redefined patient subgroups with p values (0.014, 0.022, and 0.032) and hazard ratios (1.670, 1.638, and 1.617) for the three algorithms ( Figure 3C; Supplementary Table S5). It was indicated that the fusion transcript profiles could be used to further distinguish the patients from the high-risk group with improved survival rates. Table 2 listed the clinical characteristics of redefined high-risk patients. MYCN-amplified patients are more enriched in the refined high-risk patient group (73.7, 72.5, and 59.7% for ChimeraScan, SOAPfuse, and TopHat-Fusion, respectively). In contrast, the lower percentage of MYCN-amplified patients was classified into other subgroups (i.e., 24.7, 8.9, and 47.7% for ChimeraScan, SOAPfuse, and TopHat-Fusion, respectively). It  was indicated that the redefined patient group might be MYCNamplified, especially for ChimeraScan and SOAPfuse. Furthermore, the age of diagnosis was more than 18°months for over 92% of refined high-risk patients and 96% of patients in other groups, highlighting the early diagnosis of high-risk NB patients is very challenging. Additionally, a higher percentage of males were assigned into the redefined high-risk patient group, e.g., 57.5 and 62.7% for ChimeraScan and TopHat-Fusion, respectively. However, a higher rate of females was classified into other subgroups. The detail clinical information for redeinfed high-risk patients were listed in Supplementary  Table S5.
To further investigate the important gene fusion profiles that could distinguish the high-risk group patients, we developed an XGboost binary classifier. Figure 4 illustrated the top 10 important features derived from XGboost classifiers for the three fusion detection tools. Furthermore, the average AUC values of 100-run 5-fold cross-validations were ranked with the following order: SOAPfuse (0.862 ± 0.031) > ChimeraScan (0.849 ± 0.018) > TopHat-Fusion (0.799 ± 0.013). The high AUC and small stand deviation among 100 5-fold crossvalidation results indicated the reliable classification results could be obtained based on the gene fusion profiles derived from the three detection tools.

Differentially Expressed Genes Associated With Patient Subgroups
We next sought for the DEGs associated with the redefined high-risk subgroups and examined their underlying mechanism. The DEGs were generated by comparing the transcript profiles between the patients in the redefined highrisk subgroup, and patients with the overall survival days were longer than the median survival days in stages 1 and 2. Here, the FIGURE 3 | Gene fusion profile-based patient stratification and survival analysis: (A) hierarchical cluster analysis (HCA) for redefining patient subgroups in high-risk NB patient groups for each fusion detection algorithm. First, patient assignment matrices were generated from average results of 500 runs with nonnegative matrix factorization (NMF) with patient gene fusion profiles. Then, the HCA was performed on the patient assignment matrices to redefine the patient groups. The color denotes the redefined patient groups; (B) The overall survival time (Days) of patients in the redefined patient subgroups. The subgroup with lowest median survival time was considered as the redefined high-risk group; (C) Kaplan-Meier survival analysis was conducted between the redefined high-risk subgroups and the combination of other subgroups. Log-Rank p value and the hazard ratio was calculated.
Frontiers in Pharmacology | www.frontiersin.org April 2021 | Volume 12 | Article 608778 top 500 up-and down-regulated genes were extracted (Supplementary Table S6). The overlapped DEGs associated with refined high-risk subgroups from the three algorithms occupied 65.3% of 1,000 DEGs for each fusion detection method ( Figure 5A). The shared KEGG pathways enriched by using DEGs were illustrated in Figure 5B. Three KEGG pathways, including Ribosome, Cell cycle, and DNA replication, were enriched by all the three algorithms. The DEG from TopHat-Fusion enriched more immune-related pathways such as Primary immunodeficiency and an Intestinal immune network for IgA production (Supplementary Table S7). Figure 5C highlighted the top ten up-and down-regulated genes associated with refined high-risk subgroups. MYCN, MYCNOS, and SLC30A3 were the most up-regulated genes, while APOD, INSRR, and PIRT were the most down-regulated genes across the three algorithms. Those genes have been reported to play a different regulation role in NB development (Huang and Weiss, 2013;Olsson et al., 2016).
Notably, MYCN has been found in ∼25% of high-risk NB patients and correlated with poor diagnosis (Huang and Weiss, 2013). Another interesting finding here is the onset age of NB in the redefined high-risk NB patient group is significantly lower than another three redefined subgroups for ChimeraScan and SOAPfuse. It is indicated that the development of an adaptive immune system may play a role in neuroblastoma evolution ( Figure 6). We further compared the DEG in a redefined highrisk group with the highly expressed gene signatures in 304 immune-related states obtained from the ImmGen (https:// www.immgen.org/) (Heng and Painter, 2008). The similarities between DEG in the redefined high-risk subgroup and immunerelated cell types were calculated and rank-ordered (Supplementary Table S8). The percentage of overlapped ranked immune-related cell types among the three fusion detection algorithms was illustrated in Figure 7. The enriched immune-related cell types were very similar among the three  Frontiers in Pharmacology | www.frontiersin.org April 2021 | Volume 12 | Article 608778 10 algorithms. Furthermore, we listed the top ten enriched immunerelated cell types from each method and found T8 cell types dominated (Orentas et al., 2012).

Repositioning Candidates for Redefined High-Risk Subgroup
We further explore the potential repositioning candidates that could be used to treat the patients in the redefined high-risk subgroup. The L1000 drug signatures were employed to compare the DEGs of the redefined high-risk group. LINCS L1000 characteristic direction signatures search engine (L1000CDS 2 ) was applied to enrich Repositioning candidates. If the drug signatures were reversely correlated with DEG in a refined high-risk group, the drug was considered as potential repositioning candidates. Consequently, we got the top 50 repositioning candidates for patients in the redefined high-risk group from each fusion detection algorithms. Supplementary  Table S9 listed the detailed information on repurposing candidates, including compound/time/dose/cell line enriched from L1000CDS 2 . Interestingly, a total of 37 Repositioning candidates were overlapped by at least two refined high-risk groups from different fusion detection algorithms (Supplementary Table S9).
To verify the enriched repositioning candidates, we conducted a two-step analysis. Table 3 compiles the candidates for the redefined high-risk groups defined by the three fusion detection algorithms. First, we queried clinical trial studies through http:// clinicaltrial.gov (www.clinicaltrials.gov) to seek for the clinical evidence of repositioning candidates for the treatment of neuroblastoma. Notably, three repurposing candidates, including selumetinib, vemurafenib, and trametinib, were in clinical trials for potentially treating NB. Second, we further carried out a comprehensive literature survey through PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) using the keywords FIGURE 6 | Distribution of onset age in the redefined high-risk and low-risk NB patients: blue and yellow colors represent the redefined high-risk and low-risk patient subgroups, respectively. The student's t-test was used to generated p value.
FIGURE 7 | Percentage of overlapped immune-related cell types: the immune-related cell types were enriched and rank-ordered from high to low by comparing the DEG regulated by the redefined high-risk subgroup for each fusion detection algorithm and gene signatures of immune-related cell type from ImmGen (https://www.immgen.org/). Then, the percentage of overlapped enriched immune-related was calculated between any two rankordered immune-related cell type lists. The top-ten enriched immune-related cell types were illustrated in the sub-

DISCUSSION
One of the critical aspects of precision medicine is to translate the novel genetic findings into therapy development. Structural variants (SVs) such as gene fusions have been identified in various tumor types, indicating some potential discriminative power to improve the patient survival curves. However, it was still elusive how to translate these novel genetic elements into therapy development. Several challenges become significant hurdles toward the practical application of novel genetic findings in a clinical setting. First, the inconsistency of gene fusions among the gene fusion detection algorithms still exists. Technically, several comprehensive comparison studies have been reported and suggested the consensus approach as a solution to improve reproducibility. However, the inconsistency among algorithms may be explained by biological relevance, which has not thoroughly investigated. Second, the discrimination power of gene fusion has only been assessed individually. The patient gene fusion profiles may provide a more robust solution to redefine the patient groups toward better survival rates. Finally, it is still a gap to utilize novel genetic findings into therapy development, although the possibility was always discussed elsewhere (Cully, 2015). We therefore aimed to explore the opportunity to apply patients' gene fusion profiles to stratify the high-risk neuroblastoma patients into subgroups for improving the survival rate and implementing the precision medicine-based drug repositioning. Overall, three popular gene fusion detection algorithms were comprehensively assessed with RNA-seq data of 498 NB patients. The technical reproducibility among the three algorithms was suboptimal, which reflected the interior difference of mathematical proof behind the algorithms. Moreover, the sample heterogenicity is still a significant factor for fusion detection algorithm selection. The number of gene fusions was correlated with clinical stages from low-risk to high-risk. The sensitivity of fusion detection algorithms was increased from TopHat-Fusion to SOAPfuse and ChimeraScan. Notably, the biological relevance of detected gene fusions from the three algorithms shared substantial similarity regarding regulating pathways and DEGs.
More importantly, the gene fusion profiles derived from each algorithm have a discrimination power to redefine patient subgroups in the high-risk group with improved survival rates with Log-Rank p values less than 0.05. Furthermore, highly overlapped repositioning candidates (37 out of 48 candidates) could be enriched based on DEGs from different algorithms, and 50% of repositioning candidates (24 out of 48 repositioning candidates) could be verified by on-going clinical studies and literature reports.
The detected gene fusions were annotated by the current knowledge (Table 1). For example, the Axon guidance pathway was enriched by genes involved in reported cancerrelated gene fusions based on ChimeraScan and SOAPfuse. Axon guidance plays a central role in controlling neuronal migration and neuronal survival (Chédotal et al., 2005). The expression change of proteins such as slits, semaphorins, and netrins involved in the Axon guidance pathways induce the pathological changes in neural circuits which predisposed to neurological disorder in adult and NB in children (Van Battum et al., 2015). Another interesting finding here is some kinase-related fusions could perturb immune-related pathways such as the T cell receptor signaling pathway. It may help investigate the immune cell types regulated by gene fusion profiles in the redefined patient subgroups (Figures 5, 6). The abnormalities of immune systems in children with NB have been observed. However, the underlying mechanism is not fully understood. We found the onset age of NB patients in the redefined patient subgroup was significantly smaller than the others. Furthermore, some specific immune cell types could be enriched by all the three algorithms, which may provide more biological hints for better understanding the interplay between immune systems and pathogenesis of NB.
Several repurposing candidates hold a promise for further investigation on clinical usage for treating NB. Over 93% of enriched repositioning candidates were designed for anticancer purposes. For example, selumetinib and vemurafenib are in A Some literature reports also show some preclinical evidence of repositioning candidates for NB treatments. One example is MK-2206, an allosteric AKT inhibitor is in clinical Phase 2 designed for colorectal cancer, breast cancer, and other solid tumors. Preclinical human in vitro studies suggested a combination of an allosteric Akt Inhibitor MK-2206 with etoposide or rapamycin to enhances the antitumor growth effect in high-risk NB patients (Li et al., 2012). Another example is AZD8055, an mTOR inhibitor, which was designed for treating recurrent gliomas in clinical phase 2. It was reported that AZD8055 could induce cell cycle arrest, autophagy, and apoptosis and had strong antitumor activity on NB in both human in vitro and mouse in vivo models. It may be worth further investigating for clinical application for high-risk NB treatment (Xu et al., 2018).
We also enriched one repositioning candidate named SN-38, which was initially approved for small cell lung cancer. SN-38 is a topoisomerase I inhibitor and the active metabolite of irinotecan. It has a solubility issue that makes it hard for patient administration. However, the efficacy was much better than anti-neuroblastoma drug irinotecan. A novel nano carrier-based strategy for tumor-targeted delivery of a prodrug of SN-38 was developed and verified in mouse xenografts, which solve the poor blood-brain barrier (BBB) concentration of SN-38 for neuroblastoma treatment (Iyer et al., 2015).
More than 20 different gene fusion detection tools based on RNA-seq data have been developed (Kumar et al., 2016;Haas et al., 2019). The performances of fusion detection tools on sensitivity, specificity, and required computational resource (e.g., memory size and computational time) varies among different datasets. It was suggested the critical influence of calling performance also highly relies on the RNA-Seq read length, read number, and the quality of the reads. The optimization of parameters in each pipeline based on the properties of their RNA-Seq datasets along statistical methods may improve the calling performance (Dehghannasiri et al., 2019). Another comparative study also suggested the consensus calling results from different detection tools may decrease the false-positive rates. The standardization of calling results from various detection tools is the key to avoiding the specific tools and establishing a consensus calling result (Liu et al., 2016). However, a ground truth set verified from orthogonal technologies was needed to assess the calling performances from different detection tools objectively. As a proof-of-concept paper, the points described above were out of the scope of the current study. However, we highly recommend further assessing the reliable fusion detection results from multiple detection tools to establish the reproducible and high-confidence fusion calling to enable a real-world application. It is worthwhile to consider some additional work to further enhance and confirm the findings in our studies. First, the current studies were focused on RNA-seq data. As our known, DNA-seq has been wellestablished for SVs detection. Further investigation on DNAseq or combined with RNA-seq data could improve the precision of gene fusion detection and establish satisfactory technical reproducibility . The DNA-seq provides the unbiased characterization and most comprehensive of the potential gene fusions and tumor suppressor genes disrupted by genomic rearrangement. However, it requires in-depth coverage, ample storage, and long computational analysis time. The RNA-seq only sequences the genome regions that are transcribed and spliced into mature mRNA. Therefore, only relatively high expressed fusions can be detected. However, RNA-seq data requires less storage, space, and analysis time. Furthermore, the read length for RNA-seq data can be either short or long with different sequencing platforms (Schröder et al., 2019). Second, as a proof-of-concept study, we explored the discrimination power of gene fusion profiles and their potential for patient stratification and treatment development. However, an integrative approach by incorporating various genetic elements beyond mRNA could yield a better survival curve and more precise treatment development. Third, to reuse the drugs for pediatric tumor treatment is a challenging task. In this study, the enriched repositioning candidates are mainly anticancer candidates, which could serve as a good starting point. The further PK/PB property optimization, dosage adjustment, and safety profile prioritization should be taken into consideration for further investigation (Liu Z et al., 2016;Liu et al., 2017). Fourth, the unbalance number of collected neuroblastoma patient samples across different clinical predefined INSS cancer stages might influence the performance of gene-fusion-based patient stratification. The more consortium efforts with more balanced patients distribution in different cancer stages may further verify and improve the performance. Finally, in the current study, we only focused on three popular fusion detection algorithms to elaborate the points and establish the framework for translating genetic findings into therapy development. Other fusion detection algorithms and patient stratification strategies may also be considered to improve the results. In conclusion, we carried out an exploratory study to investigate how to apply genetic findings such as gene fusions for clinical applications. The framework developed it straightforward and may also serve as a strategy for treatment development for other diseases.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: ZL. Analyzed the data: ZL. The first version of the manuscript: ZL. Contributed reagents/ Frontiers in Pharmacology | www.frontiersin.org April 2021 | Volume 12 | Article 608778