Identification of a Five-Gene Signature Derived From MYCN Amplification and Establishment of a Nomogram for Predicting the Prognosis of Neuroblastoma

Background: Neuroblastoma (NB), the most common solid tumor in children, exhibits vastly different genomic abnormalities and clinical behaviors. While significant progress has been made on the research of relations between clinical manifestations and genetic abnormalities, it remains a major challenge to predict the prognosis of patients to facilitate personalized treatments. Materials and Methods: Six data sets of gene expression and related clinical data were downloaded from the Gene Expression Omnibus (GEO) database, ArrayExpress database, and Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database. According to the presence or absence of MYCN amplification, patients were divided into two groups. Differentially expressed genes (DEGs) were identified between the two groups. Enrichment analyses of these DEGs were performed to dig further into the molecular mechanism of NB. Stepwise Cox regression analyses were used to establish a five-gene prognostic signature whose predictive performance was further evaluated by external validation. Multivariate Cox regression analyses were used to explore independent prognostic factors for NB. The relevance of immunity was evaluated by using algorithms, and a nomogram was constructed. Results: A five-gene signature comprising CPLX3, GDPD5, SPAG6, NXPH1, and AHI1 was established. The five-gene signature had good performance in predicting survival and was demonstrated to be superior to International Neuroblastoma Staging System (INSS) staging and the MYCN amplification status. Finally, a nomogram based on the five-gene signature was established, and its clinical efficacy was demonstrated. Conclusion: Collectively, our study developed a novel five-gene signature and successfully built a prognostic nomogram that accurately predicted survival in NB. The findings presented here could help to stratify patients into subgroups and determine the optimal individualized therapy.


INTRODUCTION
Neuroblastoma (NB) is the most common extracranial solid tumor in children, with the highest incidence and mortality in infancy, accounting for approximately 8-10% of pediatric malignancies (Li et al., 2008;Marshall et al., 2014). It is a highly heterogeneous disease characterized by diverse clinical manifestations ranging from spontaneous regression to progression with therapy resistance. Based on clinical and molecular patterns, NB patients can be classified into lowrisk, intermediate-risk, and high-risk groups (Cohn et al., 2009). Although the prognosis of NB has significantly improved over the past decade with the development of multiple treatment methods and the application of immunotherapy targeting NB-specific antigens, such as GD2 (Casey and Cheung, 2020), less than 40% of high-risk NB patients achieve long-term survival (Pinto et al., 2015). Given the significantly divergent outcomes among these patients, an effective prediction model is needed so that effective management and therapy can be tailored to diverse subgroups of NB patients.
There is increasing evidence that several genomic alterations, such as gain of chromosome 17q, hemizygous deletions of 1p and 11q, and MYCN gene amplification, are powerful prognostic markers that are strongly associated with clinical outcomes (Janoueix-Lerosey et al., 2009;Maris, 2010;Schleiermacher et al., 2012). Among these factors, MYCN amplification is one of the earliest biomarkers discovered in NB and is still considered one of the most reliable predictors of aggressive clinical behavior (Park et al., 2013;Campbell et al., 2017). Augmented MYCN is detected in 20-30% of all NB cases, and the overall survival rate for this group remains less than 50% (Huang and Weiss, 2013). Although MYCN plays a pivotal role in NB development, its application as a therapeutic target remains challenging. Although much research has been done on targeting MYCNrelated factors, new target genes are still needed to effectively treat this aggressive and heterogeneous disease (Puissant et al., 2013;Gustafson et al., 2014;Huang et al., 2019). Due to the rapid development of high-throughput sequencing, multigene signatures have been shown in many studies to be more effective at predicting prognosis than conventional biomarkers. The genes discovered by bioinformatic approaches can also provide insights into cancer progression and identify pathway profiles involved in tumorigenesis. Several NB prognostic models have been made based on sequencing data (Vermeulen et al., 2009;De Preter et al., 2010;He et al., 2020;Wang et al., 2020). A deeper dive into public data sets may reveal other genes associated with prognosis and new potential therapeutic targets.
Herein, we aimed to build a prognostic signature that could predict the prognosis of NB patients more accurately than traditional clinical factors. We systematically analyzed multiple public NB data sets to identify differentially expressed genes (DEGs) based on MYCN amplification. Then, we successfully established a prognostic five-gene risk prediction model that can accurately predict the outcome of NB.
Simultaneously, we performed preliminary validation of the pivotal DEGs using clinical samples. A prognostic nomogram was finally designed based on the gene signature and clinical factors, and this nomogram will help physicians develop more customized treatment plans for different subgroups of NB patients. . The four GEO data sets were used for DEG analysis. The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) NB gene expression data sets and related clinical information were downloaded using cBioPortalData in the R client. The TARGET data sets were used for prognostic gene signature prediction. E-MTAB-8248 and GSE49710 with complete follow-up information were used for prognostic signature validation.

Differentially Expressed Gene Analysis
Patients were stratified into MYCN amplification and MYCN nonamplification groups according to the clinical information provided in the data sets. In the case of a gene with multiple probes matching it, its expression level was calculated by taking the highest expression value. DEGs were explored between two groups using the "limma" R package (Ritchie et al., 2015). The DEG cutoff was set as |log2 (fold-change) | > 1 and adjusted p-value < 0.05. The volcano plots were created using the R package Enhanced Volcano. The DEGs identified from the four data sets from GEO were further analyzed using the robust rank aggregation (RRA) method-based R package "RobustRankAggreg" (Kolde et al., 2012). p < 0.05 was considered statistically significant in this study.

Enrichment of Differentially Expressed Genes
DEG functional enrichment analysis, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, was carried out using the "clusterProfiler" R package .

Establishment of the Prognostic Gene Signature
The TARGET-NBL mRNA expression data were used for prognostic gene signature establishment. The DEGs identified from the GEO data sets were used for least absolute shrinkage and selection operator (LASSO)-penalized Cox regression analysis in the "glmnet" package in R. Subsequently, multivariate Cox regression analysis was used to further reduce the number of DEGs for model construction. Finally, a prognostic signature of NB was constructed. The risk score was calculated by multiplying the mRNA expression level by the regression coefficient for each gene. According to the median risk score, patients were divided into low-risk and high-risk groups. The predictive performance of the signature was evaluated according to Kaplan-Meier (KM) analysis, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and Harrell's concordance index (C-index). The GSE49710 and E-MTAB-8248 data sets were used for external data set validation.

Assessment of Tumor Immunity
The tumor microenvironment of each NB sample was evaluated via the Estimation of STromal and Immune cells in MAlignant Tumor tissues using the Expression data (ESTIMATE) algorithm. ESTIMATE is a sophisticated algorithm that estimates the degree of infiltration of tumor cells and determines the stromal score, immune score, and estimate score (Yoshihara et al., 2013). The degree of immune cell infiltration was estimated via Cell Type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT), which is a deconvolution algorithm for analyzing the expression matrix of immune cell subtypes (Newman et al., 2015).

Preparation of Tissue Samples
The study was permitted by the Ethics Committee of the Tianjin Medical University Cancer Institute and Hospital (Approval No. E20210027). Five MYCN-amplified NB samples and five MYCNnonamplified NB samples were harvested from NB patients at Tianjin Medical University Cancer Institute and Hospital. All patients were in International Neuroblastoma Staging System (INSS) stage 3 and stage 4. The tissue samples were stored at −80°C until use. All patients provided written informed consent.

Quantitative Real-Time PCR
Total RNA was extracted from NB tissues using the TRIzol reagent (Ambion, United States). Complementary DNA (cDNA) was synthesized using a quantitative real-time polymerase chain reaction (RT-PCR) kit (Takara, Japan). Quantitative RT-PCR was performed using an miScript PCR system (QIAGEN) according to the manufacturer's instructions.
The qRT-PCR data were assessed via the 2-ΔΔCt method with β-actin as the control. The primer sequences are summarized in Supplementary Table S1.

Independent Prognostic Parameters of Neuroblastoma Identification
To identify independent prognostic factors in the TARGET data set, univariate analyses were performed on the five-gene signature as well as clinical and pathological parameters, including sex, age, MYCN status, tumor histology, race, and tumor stage. Statistically significant parameters were further included in multivariate Cox regression analyses and nomogram construction.

Nomogram Construction
A prognostic nomogram including all independent prognostic factors was constructed to predict the overall survival of NB patients in the TARGET cohort. The predictive performance of the nomogram was evaluated according to KM analysis, the AUC of the ROC curve, and the C-index.

Statistical Analysis
We performed all the statistical analyses with RStudio (1.1.463). Mann-Whitney U-tests and Kruskal-Wallis tests were used to compare two groups. Survival curves were plotted and compared

Identification of Differentially Expressed Genes
Our study was performed based on the flow chart illustrated in Figure 1. Detailed information on the GEO, TARGET, and ArrayExpress data sets in this project is shown in Supplementary  Table S2. The DEG criteria were set as |logFC|>1 and adj. p < 0.05 in all data sets.  Table S3). Figure 2B depicts the top 20 genes that were identified as upregulated and downregulated by using the RRA method. In clustering analyses, a marked difference in DEG expression appeared between the MYCN amplification and nonamplification groups ( Figure 2C and Supplementary Figure S2A-C).

Enrichment Analysis of Differentially Expressed Genes
GO and KEGG analyses were performed to determine the potential functions of DEGs with the clusterProfiler R package (Supplementary Tables S4, S5). The DEGs were significantly enriched in biological processes, such as glial cell differentiation, the receptor recycling process, the catecholamine biosynthetic process, sympathetic nervous system development, and central nervous system neuron differentiation; cellular components, such as neuronal cell bodies and dendrites; and molecular functions, such as carbohydrate binding ( Figure 3A). In the KEGG pathway analysis, DEGs were shown to be involved in the amino acid metabolism and cell adhesion ( Figure 3B). To identify gene interactions, a PPI network containing 168 nodes and 312 interactions was constructed. Figure 3C shows the top 10 candidate hub genes identified in this network. Module analysis identified the top two highest-scoring clustering modules identified by using module analysis (Supplementary Figures S3A, C). According to functional enrichment analysis, module 1 had a strong association with synapse formation and the differentiation of glia (Supplementary Figure S3B). Module 2 showed a significant correlation with the amino acid metabolism process and DNA replication, indicating its involvement in the tumor-related metabolism and DNA replication (Supplementary Figure S3D). The above results of the PPI network indicated that the DEGs were involved in the NB cell differentiation and metabolism.

Construction of a Five-Gene Prognostic Signature
A total of 247 patients from the TARGET-NBL gene expression data sets were included for subsequent survival analyses. The detailed characteristics of the 247 patients are shown in Supplementary Table S6. LASSO-penalized Cox regression analysis identified 35 DEGs significantly correlated with survival (lambda 0.05, Supplementary Figure S4). After optimization of 35 DEGs by the multivariate Cox regression method, a prognostic signature consisting of five genes, including CPLX3, GDPD5, SPAG6, NXPH1, and AHI1, was constructed. The downregulated genes GDPD5, NXPH1, and AHI1 were identified as tumor suppressors, while the upregulated genes CPLX3 and SPAG6 were regarded as oncogenes ( Figure 4A).  The median value of the risk score was set as the cutoff value. Patients from the TARGET-NBL data set were categorized into two groups. The KM survival curve revealed significantly favorable outcomes in the low-risk-score group (log-rank p < 0.0001, Figure 4B). The high-risk group had poor prognosis ( Figures  4C-E). Evaluation of the five-gene prediction model was carried out using time-dependent ROC curves and the C-index. The AUCs for the 1-, 3-, and 5-year survival were 0.696, 0.784, and 0.744, respectively ( Figure 4F), and the C-index of the gene signature was 0.697 (95% CI: 0.656-0.738). Patients without MYCN amplification were further stratified into two groups based on the median risk score, and the KM survival curves showed a Frontiers in Molecular Biosciences | www.frontiersin.org December 2021 | Volume 8 | Article 769661 7 significant difference in prognosis between the high-and low-risk groups (log-rank p < 0.0001, Figure 4G). In general, the five-gene signature was highly accurate for predicting prognosis.

Validation of the Gene Signature
The E-MTAB-8248 and GSE49710 data sets, which contain overall survival data, were utilized to validate the performance of the prognostic model ( Figure 5; Supplementary Figure S5). Calculations of the risk score were made using the same equation for all patients. Based on the median risk score, the patients were divided into low-and high-risk groups. The high-risk group had markedly worse outcomes than the low-risk group in both validation cohorts (Figures 5A-D and Supplementary Figures S5A-D). Further evaluation of the predictive accuracy was conducted using time-dependent ROC curve analysis and the C-index. The five-gene signature performed well in both short-term and long-term survival prediction ( Figure 5E and Supplementary Figure S5E , between the age ≥18 months and age <18 months groups (B), between the favorable and unfavorable histology groups (C), between the MYCN-amplified and MYCN-nonamplified groups (D), between the hyperdiploid and diploid groups (E), and between different race groups (F) in the TARGET data set. ***p < 0.001, ****p < 0.0001, n. s. not significant, two-sided unpaired Wilcoxon test. (G-I) Box plots of the distribution of the stromal score (G), immune score (H), and estimate score (I) calculated by the ESTIMATE algorithm between the high-risk and low-risk groups. *p < 0.05, ***p < 0.001, n. s. not significant, two-sided unpaired Wilcoxon test. (J) Box plots of the distribution of the cell proportions calculated by the CIBERSORT algorithm of immune cells between the high-risk and low-risk groups. *p < 0.05, ***p < 0.001, n. s. not significant, two-sided unpaired Wilcoxon test.
Frontiers in Molecular Biosciences | www.frontiersin.org December 2021 | Volume 8 | Article 769661 9 and 0.560, respectively ( Figures 6A-D). Decision curve analyses (DCAs) for 1-, 3-, and 5-year overall survival predictions showed better efficiency for the five-gene signature than the INSS staging and MYCN status. The areas under the decision curve of the fivegene signature were 0.00975, 0.13147, and 0.17676 for 1-, 3-, and 5-year survival, respectively, while those of INSS staging and MYCN status were 0.00880, 0.09328, and 0.15217 and 0.00086, 0.00595, and 0.00478, respectively ( Figures 6E-G). Overall, the five-gene signature was more effective in predicting the survival of NB.

Correlations of Clinical Pathology and Tumor Immune Cell Infiltration With the Five-Gene Signature
Relationships between the five-gene signature risk score and the clinical characteristics of NB patients, including INSS stage, age, MYCN status, tumor histology, tumor ploidy, and race, were analyzed in TARGET data sets. In terms of INSS stage, a higher risk score was observed in patients in stage IV than in those in stage I in the TARGET data set ( Figure 7A). The five-gene risk score increased from stage 1 to stage 4, except for stage 4S in GSE49710 and E-MTAB-8248 (Supplementary Figures S6A, D). Patients aged ≥18 months or with unfavorable histology, MYCN amplification, and diploid tumors had higher five-gene risk scores ( Figures 7B-E). The risk scores showed no difference according to race ( Figure 7F). Two extra data sets also validated that the MYCN amplification group had higher five-gene risk scores (Supplementary Figures S6B, S6E). Analysis of the GSE49710 data set revealed that patients with higher five-gene risk scores tended to experience disease progression (Supplementary Figure S6F).
The ESTIMATE algorithm was used to investigate the correlation between the five-gene signature and immune cell infiltration. The stromal score, immune score, and estimate score were calculated for the TARGET data set. However, the stromal score showed no difference between the high-risk and lowrisk groups. The immune score and estimate scores were significantly higher in the low-risk group, indicating the enrichment of infiltrating immune cells in the low-risk group (p < 0.05; Figures 7G-I). Further analysis by the CIBERSORT algorithm showed that the five-gene low-risk group had higher levels of follicular helper T cell and activated NK cell infiltration ( Figure 7J). The E-MTAB-8248 and GSE49710 data sets also validated the significant difference between the two groups in immune score and estimate score (Supplementary Figures  S6F-J). However, the results of the CIBERSORT algorithm in both validation sets were different from those of the training set, where the levels of CD8 + T cells and M2-type macrophages were found to be much higher in the low-risk group (Supplementary Figure S7).

Validation of the Five Differentially Expressed Genes Used for the Gene Signature
The mRNA expression levels of the five pivotal genes were measured in five MYCN-amplified NB tissues and five MYCN-nonamplified tissues by qRT-PCR. This result was consistent with the above DEG analysis between the MYCN amplification and MYCN nonamplification groups in the four data sets (Supplementary Table S3). As shown in Supplementary Figure S8, compared with MYCNnonamplified tissues, the expression level of CPLX3 was elevated in MYCN-amplified tissues. The other four genes were found to be downregulated in the MYCN-amplified group.

Evaluation of Independent Prognostic Factors in Neuroblastoma
A total of 220 patients from the TARGET data set with sufficient clinical information were included in the analysis. Overall survival-related independent prognostic factors were identified through stepwise Cox regression analysis. The univariate analysis showed that the five-gene risk score (p < 0.0001), tumor histology (p < 0.0001), age (p 0.00065), and MYCN status (p 0.024) were associated with the prognosis of NB (Supplementary Table  S7). The five-gene risk score and age were independently associated with overall survival according to multivariate Cox regression analysis (Supplementary Table S8).

Establishment and Validation of a Nomogram
A prognostic nomogram was built for predicting overall survival based on the stepwise Cox regression analysis results. Risk score, age, tumor histology, and MYCN status were parameters included in the nomogram ( Figure 8A). The AUCs of the 1-, 3-, and 5-year overall survival predictions for the nomogram model were 0.754, 0.815, and 0.795, respectively ( Figures 8B-D). The C-index of the risk score was 0.717 (95% CI: 0.674-0.760). According to the scores of the nomogram model, the patients were divided into three equal groups. There was good discrimination between these groups in the KM plots (p < 0.0001) ( Figure 8E). Calibration plots at 1, 3, and 5 years showed that the nomogram correctly predicted the NB patient's overall survival ( Figures 8F-H).

DISCUSSION
In the current study, we developed a novel multigene signature that showed better predictive power than conventional biomarkers. At present, there is still an urgent need for new effective stratification systems and treatment targets for NB. Although it has been well established that MYCN amplification affects the prognosis of NB (Brodeur et al., 1984;Seeger et al., 1985), it remains challenging to target this oncogene in clinical treatment. The development of the sequencing technology allows us to deeply explore the molecular mechanism by which MYCN amplification affects prognosis at the whole-genome level. Relevant NB data were obtained from multiple data sets, and patients were grouped based on MYCN status. We identified 319 DEGs, including 51 upregulated and 268 downregulated genes. GO  these DEGs were performed to further investigate the molecular mechanism of NB. The DEGs were shown to be significantly enriched in biological processes linked to the occurrence and development of NB, such as glial cell differentiation, the catecholamine biosynthetic process, and sympathetic nervous system development. The disialoganglioside GD2, a unique carbohydrate antigen, is one of the most attractive immune surface targets for NB. GD2 is uniformly expressed by NB cells, which makes GD2 an ideal therapeutic target for NB (Suzuki and Cheung, 2015;Sait and Modak, 2017). According to GO analysis, the DEGs in our study were significantly enriched in molecular functions involved in carbohydrate binding. Among these DEGs, there may be potential regulatory genes associated with GD2 immunotherapy. KEGG pathway analysis showed that the DEGs were mainly enriched in cancer-related pathways, such as the amino acid metabolism and cell adhesion processes. Cancer cells require an abundant supply of amino acids to sustain their growth (Vettore et al., 2020), and an increased amino acid metabolism has been observed in chemoresistant NB cell lines (Gunda et al., 2020). The cancer metastatic cascade is dependent on the loss of adhesion between cells, resulting in the dissociation of the cell from the primary tumor (Martin and Jiang, 2009). The subsequent PPI network results further confirmed that these DEGs participated in the NB cell differentiation and metabolism. We further screened 35 DEGs that significantly affected survival. Afterward, a risk prediction model consisting of five genes (GDPD5, NXPH1, AHI1, CPLX3, and SPAG6) was established. The upregulated genes CPLX3 and SPAG6 appeared to be associated with poor survival, whereas the downregulated genes GDPD5, NXPH1, and AHI1 were identified as tumor suppressors. The outcome was significantly different between NB patients with high and low risk scores. Additionally, the correlation between NB clinical features and the five-gene signature was investigated. Many known risk factors for NB that affect prognosis were shown to be related to a high risk score. Furthermore, the risk score for patients in each INSS stage increased from stage 1 to stage 4, except for stage 4S, which often has a favorable outcome (Kawano et al., 2021). Collectively, these findings suggest that patients with high risk scores exhibit more aggressive features than those with low risk scores. Subsequent external validation further confirmed that the five-gene signature could accurately predict survival in NB patients. To assess its clinical efficacy, we compared the gene signature with INSS staging and MYCN status, which are commonly used in predicting outcome in the clinic. The AUC and DCA curves also confirmed that the five-gene signature had a better prognosis prediction than the INSS stage and MYCN status, both in the short term and in the long term. In the MYCN nonamplification group, the gene model was still able to provide a good indication of the prognosis, which may aid physicians in evaluating prognosis and optimizing treatment options for this cohort of patients.
Due to the complex biological and clinical characteristics of NB, conventional staging systems may be difficult to utilize for calculating tumor risk in an accurate manner. To determine the treatment plan and predict the overall survival rate, doctors and researchers need a comprehensive prognostic evaluation system. Nomograms are being increasingly assessed in medical studies and are often used in the prediction of prognosis (Balachandran et al., 2015;Semenkovich et al., 2021). We integrated our fivegene model and other well-known clinicopathological parameters to create a nomogram. The prognosis for the children diagnosed with NB before 18 months is generally better than those diagnosed later (Vo et al., 2014). In the nomogram, age remained a crucial factor and ranked second after the fivegene model. Patients were divided into low-, medium-, and high-risk groups based on their total nomogram scores. The KM curves showed that the nomogram had excellent discriminative ability. According to the calibration curves, the predictions made by the nomogram were consistent with the observed data. The nomogram offers a visualized scoring system to facilitate medical decision making, which gives high-risk patients a chance to obtain more accurate treatments. Using it, physicians could predict the overall survival for each patient and make an individualized treatment plan. Patients in the high-risk group should be given special attention and more intensive treatment, while excessive treatment could be avoided for patients in the low-medium-risk groups.
Immunotherapy is currently a key hotspot in the field of cancer research. However, due to the difference in immunogenicity from adult tumors, targeted drugs, including immune checkpoint inhibitors, have limited applications in pediatric tumors (Wienke et al., 2021). In recent years, although new targeted therapies such as anti-GD2 therapy have exhibited great promise in treating NB, the limited efficacy and considerable toxicity make it still urgent to expand therapeutic targets and identify groups with potential benefits for NB immunotherapy (Sait and Modak, 2017;Park and Cheung, 2020). In our study, tumor immunological characteristics were assessed by the ESTIMATE algorithm. Higher immune scores and ESTIMATE scores were observed in patients with low risk scores. In light of the difference in overall survival between these two groups, more immune components in the tumor microenvironment may contribute to the better outcome of low-risk patients. In addition, the results calculated by the CIBERSORT algorithm indicated a significantly higher presence of T follicular helper cells (TFh) and activated NK cells in low-risk patients, both of which have been demonstrated to have protective roles in cancer. TFh plays an important role in humoral immunity (Eivazi et al., 2016). A previous study in NB showed that γδ TFh cells might facilitate the maturation of B cells and the production of antibodies (Mou et al., 2017). In studies of colorectal and breast cancer, infiltrating TFh cells have also been found to be positively correlated with prognosis (Bindea et al., 2013;Gu-Trantien et al., 2013). Similarly, a reduced degree of NK cell infiltration in tumor tissues has been found to be associated with a worse outcome in some patient cohorts (Albertsson et al., 2003;Mandal et al., 2016). NK cells play a central role in cancer immunotherapy (Shimasaki et al., 2020). As mentioned above, the application of anti-GD2 immunotherapy in recent years has significantly improved the survival of high-risk NB patients. NK cellmediated antibody-dependent cellular cytotoxicity (ADCC) is a potent mechanism of anti-GD2 immunotherapy for NB. Anti-GD2 monoclonal antibodies coadministered with NK cell infusion have achieved promising results in several clinical trials. These studies demonstrated that NK cell infusion enhances ADCC directed by the antibody (Federico et al., 2017;Modak et al., 2018). Dinutuximab-mediated autologous killing of NB cells could be enhanced by plasmacytoid dendritic cell-induced NK cell activation (Belounis et al., 2020). Recent findings have shown that exosomes derived from human NK cells contain tumor suppressor miRNAs that are cytotoxic to NB cells and inhibit tumor cell immune escape, suggesting that NK cellderived exosomes could be used as a complement to NK cellbased immunotherapy (Neviani et al., 2019). Except for these two types of cells, no differences were found between the two groups for other important immune cells in the training set, such as tumor-associated macrophages and CD8 + T cells. Interestingly, we obtained different results in the other two validation sets, suggesting that the different data sets selected might have been confounding. Unlike adult cancers, most pediatric solid tumors are considered immunologically cold, characterized by low tumor mutation burden, limited T-cell infiltration, and a lack of responsiveness to immune checkpoint inhibitors (Grobner et al., 2018;Wienke et al., 2021). In light of these differences between childhood tumors and adult tumors, a better understanding of NB immunobiology is needed, which will improve future immunotherapy in NB.
Among the five hub genes, AHI1, NXPH1, and CPLX3 have not been reported in any previous studies on NB. The oncogenic role of AHI1 has been identified in human leukemia and Sezary syndrome (Ringrose et al., 2006;Chen et al., 2013). Mutations in the AHI1 gene could lead to Joubert syndrome, which is a rare Frontiers in Molecular Biosciences | www.frontiersin.org December 2021 | Volume 8 | Article 769661 genetic disorder characterized by the underdevelopment of cerebellar vermis (Alvarez Retuerto et al., 2008). Moreover, it is also involved in autism (Alvarez Retuerto et al., 2008) and was considered to be a susceptibility gene for schizophrenia (Torri et al., 2010). It seems that AHI1, which has been associated with these brain developmental disorders, has a crucial role in brain growth during early life. As a specific neuronal cell surface protein, NXPH1 is essential for trans-synaptic activation (Missler et al., 2003). By using a genome-wide methylation screen, Faryna et al. found that the methylation level of NXPH1 in normal tissues was significantly higher than that in breast cancer samples, indicating it as a potential useful biomarker for tumor diagnosis (Faryna et al., 2012). In a study of intraductal papillary mucinous neoplasms (IPMNs), NXPH1 was more likely to be methylated in low-grade dysplasia than in high-grade dysplasia (Hong et al., 2012). NXPH1 was incorporated in a 10-gene signature that predicted biochemical recurrence for prostate cancer, and its expression level was upregulated in patients with a Gleason score ≥7 (Wu et al., 2020). In contrast, there was a negative correlation between lymph node metastasis and NXPH1 in pancreatic cancer (Jin and Tsai, 2016). GDPD5 is located on chromosome 11q13, and the presence of an 11q deletion is usually associated with an advanced stage and worse outcome in NB (Jiang et al., 2011). It belongs to the glycerophosphodiester phosphodiesterase family, which is critical for the metabolism of glycerol. In our study, we discovered that overexpression of GDPD5 is associated with better survival, which is in accordance with the result of a previous report in which high GDPD5 expression was found to be strongly correlated with a favorable outcome in neuroblastoma, while low GDPD5 expression was associated with a poor outcome. In addition, the study further revealed that GDPD5 induced the differentiation and inhibited the motility of NB cells via multiple mechanisms (Matas-Rico et al., 2016). GDPD5 was found to be overexpressed in colorectal cancer (CRC) and to promote metastasis and chemoresistance. Additionally, silencing GDPD5 in 5-Furesistant CRC cells decreased epithelial-to-mesenchymal transition (EMT) and cell invasion, both of which are essential for CRC metastasis (Feng et al., 2018). Cao et al. silenced GDPD5 in breast cancer cell lines and found a significant decrease in tumor cell viability, migration, and invasion (Cao et al., 2016). SPAG6 is identified as a cancer-testis antigen that regulates multiple functions in various cell types (Zheng et al., 2019). A variety of human cancers have been shown to be associated with SPAG6. A study on the differential analysis of genome-wide methylation of NB cell lines reported that SPAG6 was a possible target of the CpG island methylator phenotype (CIMP), and CIMP was demonstrated to be negatively related to the survival of NB patients (Abe et al., 2005;Abe et al., 2008). However, the role of SPAG6 in NB progression requires further study. CPLX3 is specifically localized to retinal ribbon synapses. It is a key regulator of transmitter release at retinal ribbon synapses (McMahon et al., 1995;Mortensen et al., 2016). At present, there is no research on CPLX3 and its role in tumors. Our results suggest that it may promote the development of NB and affect prognosis. Therefore, the specific role of CPLX3 in NB needs to be further studied. Our research has discovered five genes closely related to MYCN. Some of them have been identified to be involved in development of NB, and some have been studied in other malignant tumors. Next, we will further investigate the relationship between these five pivotal genes and MYCN as well as their roles in the development of NB. However, there were several imitations to our study. First, since the TARGET-NBL gene expression data set used as the training set mainly contained NB cases in stage 1 and stage 4, we did not include INSS staging in the nomogram. This may limit the predictive power of the nomogram. Second, the bioinformatic analysis of this study was based on public data sets, and the tissues used for verification were retrospectively collected. Thus, an inherent bias in case selection may have influenced the findings.

CONCLUSION
Overall, we developed a novel five-gene signature and successfully built a prognostic nomogram that accurately predicts the survival of NB patients. It will simplify the stratification of NB patients and provide some guidance for individualized treatment. The five hub genes obtained in this study were based on integrated analyses of multiple data sets, which provides a very high level of reliability. Prior to this study, three of the identified prognostic genes had not been associated with NB. The mechanisms of these genes and their roles in targeted therapy for NB remain to be investigated.

DATA AVAILABILITY STATEMENT
The data sets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Ethics Committee of the Tianjin Medical University Cancer Institute and Hospital. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
YX and XT performed data analysis and original draft writing; XL performed the study concept and design, and QZ reviewed the data and manuscript; All authors read and approved the final manuscript.