Identification and Validation of a Novel Prognosis Prediction Model in Adrenocortical Carcinoma by Integrative Bioinformatics Analysis, Statistics, and Machine Learning

Adrenocortical carcinoma (ACC) is a rare malignancy with poor prognosis. Thus, we aimed to establish a potential gene model for prognosis prediction of patients with ACC. First, weighted gene co-expression network (WGCNA) was constructed to screen two key modules (blue: P = 5e-05, R^2 = 0.65; red: P = 4e-06, R^2 = −0.71). Second, 93 survival-associated genes were identified. Third, 11 potential prognosis models were constructed, and two models were further selected. Survival analysis, receiver operating characteristic curve (ROC), Cox regression analysis, and calibrate curve were performed to identify the best model with great prognostic value. Model 2 was further identified as the best model [training set: P < 0.0001; the area under curve (AUC) value was higher than in any other models showed]. We further explored the prognostic values of genes in the best model by analyzing their mutations and copy number variations (CNVs) and found that MKI67 altered the most (12%). CNVs of the 14 genes could significantly affect the relative mRNA expression levels and were associated with survival of ACC patients. Three independent analyses indicated that all the 14 genes were significantly associated with the prognosis of patients with ACC. Six hub genes were further analyzed by constructing a PPI network and validated by AUC and concordance index (C-index) calculation. In summary, we constructed and validated a prognostic multi-gene model and found six prognostic biomarkers, which may be useful for predicting the prognosis of ACC patients.


INTRODUCTION
Adrenal cortical carcinoma (ACC) is a relatively rare malignant tumor (Lam, 2021). Forty percent of the cases had distant metastasis at the time of diagnosis (Ayala-Ramirez et al., 2013;Guillaume et al., 2014). Worst of all, most patients with ACC have to face poor prognosis. The morbidity age of ACC showed bimodal distribution in kids and in 40-to 50-year-old people (Rodriguez-Galindo et al., 2021). As for the prognosis of ACC patients, the 5-year survival rate for ACC patients was as reported as 50% according to recent data from the American Society of Clinical Oncology (ASCO). At present, many studies identified molecular biomarkers associated with ACCs to solve this problem (He et al., 2017). It means that predicting survival and prognosis of patients with the help of biomarkers has great application prospects. Thus, we aimed to construct a multi-gene signature and some prognostic biomarkers, which perhaps can help in predicting the prognosis of patients with ACC.
Weighted gene co-expression network (Langfelder and Horvath, 2008) has been widely used for potential biomarker selection (Lv et al., 2021). In the present study, by using GSE76019 [downloaded from Gene Expression Omnibus (GEO) database], a co-expression network was constructed to select two key modules related to survival state. Then we screened out hub genes in a co-expression network and further identified survival-associated genes among them by using GEPIA (Gene Expression Profiling Interactive Analysis) (Tang et al., 2017). Considering that Ridge, Elastic Net, and Lasso (least absolute shrinkage and selection operator) methods were widely used data mining methods especially for regression model construction (Ye et al., 2021), based on the survival-associated genes, all the three methods were included for constructing multiplegene signatures by using the data from TCGA database. Two prognostic models were constructed, which could significantly predict the OS (overall survival) of patients with ACC. Moreover, we constructed two nomograms based on the two multiple-gene signatures, which were verified to predict the probability of the overall survival of patients with ACC well. The calibration curve indicated that nomogram of model 2 showed better prognostic value compared with model 1, which was regarded as the best prognostic model. Moreover, six genes were further screened out by protein-protein interaction (PPI) network construction, which were regarded as hub genes in this study. In conclusion, we constructed a multi-gene signature that might be an independent survival and prognostic biomarker for ACCs. The nomogram based on the best model for survival rate of ACC patient prediction might give more beneficial guidance for clinical practice, and six hub genes were also screened out, which might be novel prognostic biomarkers of patients with ACC.

Data Collection
Supplementary Figure 1 showed the flowchart for the present research. In total, four independent datasets with complete clinical information were retrieved from public databases in the present study. For the WGCNA analysis, dataset GSE76019 (Pinto et al., 2016) was retrieved from the GEO database 1 to conduct WGCNA. GSE76019, performed on Affymetrix HT HG-U133 + PM array plate, included 34 ACCs. To search for the best regression model by using the Ridge (α = 0), Elastic Net (with α varying from 0.1 to 0.9), and Lasso (α = 1) methods, we downloaded the mRNA-seq data and clinical information of 1 http://www.ncbi.nlm.nih.gov/geo/ ACCs from The Cancer Genome Atlas (TCGA) database. After weeding out incomplete samples, 79 ACCs were included in this study. We randomly split them into two groups with a ratio of 2:1. The training set, including 52 samples, was selected to construct the regression models. Meanwhile, the rest of the 27 samples were used to preliminarily screen the best model and further validate our results (as an internal verification set). In order to conduct external validation for our results, two independent datasets were further retrieved. GSE19750 (Demeure et al., 2013) that performed on Affymetrix Human Genome U133 Plus 2.0 array included 44 ACCs and four normal adrenal glands. Dataset GSE76021, performed on Affymetrix Human Genome U133A Array, included 29 ACC samples. Supplementary Table 1 shows a detailed information including gender, age, stage, grade, and survival data of these datasets. The clinical characteristics of the training dataset and the internal validation dataset are shown in Supplementary Table 2. Further correlation analysis indicated that there was no significant difference between the two datasets (P of age: 0.870, P of gender: 0.660, P of stage: 0.051, and P of status: 0.816), which demonstrated that our grouping method was appropriate and reasonable.
To explore mutations and copy number variations (CNVs) of genes in the best model, we also downloaded the CNV data of the ACC samples from TCGA database.

Preprocessing of Collected Data
For the microarray data of GSE19750, GSE76019, and GSE76021 from Affymetrix, we first downloaded the raw "CEL" data from the GEO database and further applied robust multiarray averaging (RMA) method for background adjustment and quantile normalization, by using the "affy" package in R software. For TCGA-ACC data, we first downloaded the count number expression profiles. The normalization and log2 transformation were next obtained via "DEseq.2" (Love et al., 2014) in R. Batch effects from non-biological technical biases were corrected using ComBat algorithm based on R package "sva" (Leek et al., 2012).
For WGCNA, 34 adrenocortical carcinoma samples from GSE76019 were included for WGCNA. Probe sets were filtered by their variances across all samples; only probes with variances ranked in the top 2,500 were selected for subsequent analyses. Microarray quality was assessed by sample clustering according to the distance between different samples in Pearson's correlation matrices and average linkage. The expression values from TCGA-ACC data of the genes we identified in WGCNA were extracted to construct the regression models by using the Ridge, Elastic Net, and Lasso methods.

Co-expression Network Constructing and Key Module Screening
First, gsg (goodSamplesGenes) and sample network methods were used to check the expression data profile of the top 2,500 genes from GSE76019. A standard of Z.Ku ≥ −2.5 was set to pick out qualified ACC samples for constructing a co-expression network. Based on "WGCNA" (Langfelder and Horvath, 2008) in R software, we constructed a co-expression network. We used three branch cutting methods used for classification of genes and construction of modules [manual (interactive) branch cutting approach, automatic single block analysis, and two block analysis]. Some essential parameters for branch splitting are shown here: minClusterSize = 30 and deepSplit = 2. Furthermore, a cutline for merging highassociated models was set through calculating the dissimilarity of module eigengenes (MEs). In order to identify key modules that correlated to survival state (the trait that interested us the most), we first calculated the gene significance (GS) with the aim of quantifying the correlation between genes and trait. Then we further defined the module significance (MS), which was the average GS of all the genes in a module. Based on the above analyses, the most positive correlation module and the most negative correlation module were regarded as key modules.

Exploration of Potential Functions of Genes
GO enrichment (Ashburner et al., 2000) and KEGG pathway analyses (Kanehisa and Goto, 2000) were conducted via "clusterProfiler" (Yu et al., 2012) in R software. For the GO part, we only identified the biological process (BP). A gene set or KEGG signaling pathway with P < 0.05 was thought to be strong and enriched.

Survival-Associated Gene Identification
In the present study, we chose genes that reached the standards of |cor.geneModuleMembership| > 0.8 and |cor.geneTraitSignificance| > 0.2 as hub genes. In order to find out genes associated with survival, we further performed overall survival (OS) analysis and diseases-free survival (DFS) analysis for these hub genes by using GEPIA (Tang et al., 2017). Genes were considered to be survival-associated genes when they showed significant P-value in both OS and DFS.

Regression Model Construction and the Best Model Screening
Based on the training set, the expression values of survivalassociated genes were extracted to construct models by using the Ridge, Elastic Net, and Lasso methods. Elastic Net regression utilizes a combination of Lasso (L1-norm) and Ridge (L2-norm) regularization penalties, and reduces overfitting by limiting coefficient magnitudes (Zou and Hastie, 2005). The regression equation can be seen in a previous study (Zou and Hastie, 2005). All of the three methods were performed by using R package "glmnet" (Friedman et al., 2010). A 10-fold cross-validation was performed for tuning parameter selection in the models. The 10fold cross-validation was based on 1-SE (standard error) criteria. Samples in the training set were randomly divided into 10 parts, nine of which were trained in turn, while the remaining one was tested, and the average of the results of the 10 tests was used to estimate the accuracy of the algorithm. A total of 11 models would be constructed. We further got a gene signature from each model that contained the most useful prognostic biomarkers that the model was thought to have. To identify the best model among the 11 models, we calculated the risk score (RS) of every sample in the internal validation set based on these models. In each model, we calculated the RS by using the formula as follows: In which Coef represents the relative regression coefficient and Exp represents the expression value of each prognostic gene identified by the Ridge, Elastic Net, or Lasso methods. In the internal validation set, we used the risk score of each model to represent the model and further worked out the AUC to distinguish live ACCs from dead ACCs by performing receiver operating characteristic curve (ROC) analysis using the R package "plotROC" (Sachs, 2015). The model with the highest AUC was regarded as the best model, which was included for further analysis.

Survival Analysis and Time-Dependent Receiver Operating Characteristic Analysis
We further worked out the risk score of each sample in all datasets based on the best model. ACCs in the training set, internal validation set, entire set, GSE19750, and GSE76021 were divided into high-and low-risk groups according to the median risk score we calculated. R package "survival" (Therneau, 2015) was used to perform survival analysis. In order to investigate the performance of the signature, R package "timeROC" (Blanche et al., 2013) was used to perform time-dependent ROC analysis. In this study, AUC values of 1, 3, and 5 years were explored, accurately, which could affect predictive accuracy.

Cox Proportional Hazards Regression Analysis
To verify the prognostic value of the gene signature (best model), we included the risk score of the best model and some essential clinicopathological features (gender, age, tumor-, T-, N-, and M-stages, and Weiss score) for univariable Cox analysis of OS via TCGA-ACC dataset. Features with P-value < 0.05 were immediately chosen for multivariate Cox analysis. It would be clear if the gene signature was irrelevant to other clinical features for predicting OS of ACCs after Cox regression analysis. Visualization was applied by using "forestplot" (Aut and Aut, 2016) in R software.

Gene Set Enrichment Analysis
In order to understand the lurking functions of the best multigene signature, we evaluated the median risk score based on GSE76019 (the dataset which was used for WGCNA analysis). After that, a total of 34 ACCs were split into two groups accurately (high-and low-risk groups). "c2.cp.kegg.v7.2.symbols.gmt, " "c5.all.v7.4.symbols.gmt, " and "c6.all.v7.4.symbols.gmt" were set as the reference gene sets, respectively. GSEA (Subramanian et al., 2005) was conducted between the two groups. In this study, KEGG signaling pathways reached the standards (nominal P < 0.05, | ES| > 0.6, gene size ≥ 100 and FDR < 25%) and were significantly enriched.

Nomogram Construction and Validation
Cross-validation could deal with the overfitting of the model. Thus, we performed cross-validation before nomogram construction. Based on TCGA-ACC data, "rms" in R was used to construct the nomogram (Yizhou et al., 2013). Calibration curve was plotted to test the nomogram. The calibration curve was also used for visualization and the 45 • line in the curve represents the best prediction. Besides, in order to evaluate the prediction effectiveness of the nomograms, the consistency index (C-index) (Michael and Ralph, 2010) between actual probability and predicted probability was measured. ROC curves were also plotted using R package "pROC" (Robin et al., 2011). In addition, in order to check the stability of the nomogram with and without ARG signature, we performed time-dependent (1-, 3-, and 5-year) ROC analysis. Beyond ROC, we also used R package "rmda" to perform decision curve analysis (DCA) (Vickers and Elkin, 2006) to make sure if the signature was of great value for predicting 1-, 3-, and 5-year survival probability by using TCGA-ACC data and GSE19750. Moreover, survival analysis was performed to explore the difference in survival between different nomogram point groups.

Mutations and Copy Number Variations of Genes in the Best Model
To explore mutations and CNVs of genes in the best model, we included all the ACC samples with CNV data from TCGA database for this analysis. A webtool named CBio Cancer Genomics Portal 2 was chosen for exploration of genetic alterations. Furthermore, we identified the association between genetic alterations and the clinical features of patients with ACC. We used chi-square or Mann-Whitney U-test to analyze the statistical significance of the result. By using the relative mRNA expression levels of genes in the best model, we also explored the relationship between CNVs and mRNA expression levels of these genes. ANOVA or Kruslal-Wallis test was chosen to test the result. Unpaired t-test was used to test the expression differences between shallow deletion and gain groups. Moreover, the association between CNVs of genes in the best model and survival of patients with ACC was also identified by performing survival analysis based on R package "survival."

Linear Discriminant Analysis, K-Nearest Neighbor, and Support Vector Machine
To verify the prognostic value of genes in the best model, we regarded genes as variables and relative mRNA expression values as variable values. Then we performed LDA, KNN, and SVM analyses. We used package "MASS" (Venables and Ripley, 2002) in R software for LDA. For KNN algorithm, we chose the best K parameter by adopting cross validation method based on R package "caret" (Kuhn, 2015). Then we performed KNN algorithm by using R packages "class" (Venables and Ripley, 2002) and "kknn" (Hechenbichler and Schliep, 2004). For SVM analysis, four kinds of algorithms were used in the present study based on "e1071" (Meyer, 2015) in R software [linear-SVM, polynomial-SVM, radial basis function (RBF) SVM, and sigmoidkernel SVM]. R package "kernlab" (Karatzoglou et al., 2004) was used to assist with SVM feature selection. We used the data from TCGA as training set to first construct classifier in each analysis and further verify the value of the classifier by calculating the classification rate based on TCGA-ACC data, GSE19750, GSE76019, and GSE76021. In this study, we regarded genes in the best model as meaningful prognostic biomarkers when the accuracy of classification ≥ 0.50.

Protein-Protein Interaction Network Construction to Screen Hub Genes in the Best Model
To find out some relative genes in the best model, we first constructed a PPI network of genes formed from this model by using the STRING (Szklarczyk et al., 2019). Then the PPI network was imported into the Cytoscape software 3 . We further calculated the degree of connectivity of each gene. In this study, genes with the highest degree were regarded as hub genes. We also analyzed and visualized the correlation among genes in the best model by performing Spearman analysis using R package "corrplot" (Wei and Simko, 2013) and "PerformanceAnalytics" (Peterson et al., 2014). We further performed ROC and calculated AUC and C-index to validate these hub genes.

Constructing a Co-expression Network to Identify Key Modules
We first included 33 ACC samples to perform WGCNA analysis after weeding out the outlier samples (Supplementary Figure 2). According to the result, the beta (β) = 5 (scale free R2 = 0.89) was further set as the soft-thresholding value for further adjacency calculation (Supplementary Figure 3). Eventually, as shown in Supplementary Figure 4, in total, eight modules were identified. Genes not included in any other significant modules were included in the gray module, which were removed from subsequent analysis. Among the eight modules, the blue module was positively associated with survival status the most (P = 5e−05, R2 = 0.65). Meanwhile, the red module was the most negatively associated module (P = 4e−06, R2 = −0.71) ( Figure 1A). As shown in Figure 1B, the MS of the two modules were also higher than the MS of any other module. The relationships between MM and GS in the blue module (cor = 0.64) and red module (cor = 0.74) were also significant as suggested by Figures 1C,D. Thus, we considered blue and red modules to be the key modules in this study. As shown in Supplementary Figures 5A,B, we also created a network heatmap and a classical MDS plot. migration, positive regulation of cell migration, regulation of G-protein-coupled receptor protein signaling pathway, and chemokine-mediated signaling pathway (Supplementary Figure 6A). As for genes in the blue module, in total, they were significantly correlated with 197 BPs (Supplementary Table 3). The top 10 enrichment BPs were organelle fission, nuclear division, chromosome segregation, nuclear chromosome segregation, mitotic nuclear division, sister chromatid segregation, mitotic sister chromatid segregation, microtubule cytoskeleton organization involved in mitosis, regulation of chromosome segregation, and mitotic spindle organization (Supplementary Figure 6B). When talking about KEGG pathway analysis, genes in the blue module were enriched in only three KEGG pathways including cell cycle, oocyte meiosis, and p53 signaling pathway (Supplementary Figure 6C).

Survival-Associated Gene Identification
With the cutoff criterion (| cor.geneModuleMembership| > 0.8 and | cor.geneTraitSignificance| > 0.2), 107 genes (12 genes in the red module and 95 genes in the blue module) were identified in the co-expression network in total. Then we performed OS and DFS analysis for these 107 genes by using GEPIA. According to the results, high expression levels of 95 genes were obviously associated with poor OS for ACCs. Meanwhile, 100 genes showed significant P-values in DFS analysis. Ninety-three genes were common among them, which were regarded as survivalassociated genes (Supplementary Table 4).

Establishment of Two Multi-Gene Signatures for Predicting Overall Survival
We identified 93 biomarkers significantly related to survival of ACCs by using WGCNA analysis. With the aim of setting up a multiple-gene signature for prognosis in ACC patient prediction, we calculated relative regression coefficient of each gene using the Ridge, Elastic Net, and Lasso methods. The most powerful prognostic markers of the 93 biomarkers were screened out in each model as shown in Supplementary Figure 7. After that, 11 prognostic models were constructed, and the relative regression coefficients of the most powerful prognostic markers are shown in Supplementary Table 5. Combining the relative expression levels of the mRNA in the classifier and corresponding Lasso coefficients, we first worked out the risk scores of samples in the internal verification set. The distribution of risk scores of Ridge (Supplementary Figure 8A), Elastic Net with α varying from 0.1 to 0.9 (Supplementary Figures 8E-J), and Lasso model (Supplementary Figure 8K) of ACC patients based on TCGA-ACC data are shown. In all the models, it was obvious that the number of patients who died in the high-risk group was high, when compared with the low-risk group (Supplementary Figure 8). Furthermore, we worked out the AUC based on the risk score (Table 1). Two models, namely, model 1 (AUC = 0.676; α = 0.1) and model 2 (AUC = 0.676; α = 0.6) were finally identified, which were considered as the best candidate models.
In model 1 and model 2, ACCs in the training set were further split into two groups (low-and high-risk) via a median-risk score (5.0406 in model 1 and 16.5329 in model 2). In both models, ACC patients in the high-risk group occupied worse OS (Model 1: Figure 2A; model 2: P < 0.0001, Figure 2D). We also performed time-dependent ROC analysis. The results suggested that the AUC values of model 1 in the training set were 0.95 at 1 year, 0.98 at 3 years, and 0.96 at 5 years (Figure 2A). Meanwhile the AUC values of model 2 in the training set were 0.92 at 1 year, 0.91 at 3 years, and 0.95 at 5 years ( Figure 2D).

Validation of Model 1 and Model 2
In the present study, we used three validation sets (intervalidation set, GSE19750, and GSE76021) to validate the results we got from the training set. With the same method we mentioned in the training set, we calculated the risk score for each sample in these three validation sets based on model 1 and model 2. Samples in internal validation set, GSE19750, and GSE76021 were split into two groups relying on the median-risk score in each set. In model 1, according to the survival analysis, the high-risk group had lower survival rate compared with the low-risk group in all the validation sets (internal validation set: P = 3E−04; GSE19750: P = 0.011; GSE76021: P = 0.012) as the training set suggested (Figures 2B, 3A,B). The same conclusion was reached in the entire TCGA-ACC dataset ( Figure 2C). As for the results of timedependent ROC analysis, the prognostic accuracy of model 1 in the internal validation set was 0.73 at 1 year, 0.82 at 3 years, and 0.77 at 5 years ( Figure 2B); the prognostic accuracy of model 1 in the entire TCGA-ACC dataset was 0.86 at 1 year, 0.90 at 3 years, and 0.96 at 5 years ( Figure 2C); the prognostic accuracy of model 1 in GSE19750 was 0.61 at 1 year, 0.88 at 3 years, and 0.91 at 5 years ( Figure 3A); the prognostic accuracy of model 1 in GSE76021 was 0.84 at 1 year, 0.78 at 3 years, and 0.76 at 5 years TABLE 1 | Area under the curve (AUC) of risk scores calculated by the Ridge (α = 0), Elastic Net (with α varying from 0.1 to 0.9), and least absolute shrinkage and selection operator (Lasso; α = 1) methods by using internal validation set.

Models
Alive  Figure 3B). As for model 2, the high-risk group was associated with poorer survival compared with the low-risk group (internal validation set: P = 0.018; entire TCGA-ACC dataset: P < 0.0001; GSE19750: P = 0.0035; GSE76021: P = 0.012; Figures 2E,F, 3C,D). Moreover, the prognostic accuracy of model 2 in the internal validation set was 0.78 at 1 year, 0.74 at 3 years, and 0.76 at 5 years ( Figure 2E); the prognostic accuracy of model 2 in the entire TCGA-ACC dataset was 0.92 at 1 year, 0.91 at 3 years, and 0.95 at 5 years ( Figure 2F); the prognostic accuracy of model 2 in GSE19750 was 0.60 at 1 year, 0.89 at 3 years, and 0.89 at 5 years ( Figure 3C); the prognostic accuracy of model 2 in GSE76021 was 0.90 at 1 year, 0.78 at 3 years, and 0.77 at 5 years ( Figure 3D). Both the two models showed great prognostic values of patients with ACC.

Prognostic Value of Model 1 and Model 2
Because we had identified the two best models before, we respectively included the risk score of model 1 (or model 2) and important factors we mentioned before for the Cox regression analysis. Risk score of model  (Figures 4A,C). In model 1, the results of multivariate Cox analysis suggested that even being adjusted by other features, risk scores of model 1 (hazard ratio = 4.663, 95% CI of ratio: 2.238-9.713, P < 0.001) were still relevant to OS among patients with ACC as shown in Figure 4B.
In model 2, the results of multivariate Cox analysis suggested that even being adjusted by other features, risk scores of model 2 (hazard ratio = 5.603, 95% CI of ratio: 2.410-13.024, P < 0.001) and tumor stage (hazard ratio = 3.803, 95%CI of ratio: 1.223-11.822, P = 0.021) were still relevant to OS among patients with ACC ( Figure 4D).

Identification of Model 1 and Model 2-Associated Biological Pathways
With the cutoff criteria we set before, only one risk score-related KEGG signaling pathways were enriched in model 1 including cell cycle (nominal P < 0.001, | ES| = 0.738, gene size = 123, and FDR = 0.903%) as shown in Supplementary Table 6. In model 2, two risk score-related KEGG signaling pathways were significantly enriched including cell cycle (nominal P < 0.001, | ES| = 0.722, gene size = 123, and FDR = 0.884%) and oocyte meiosis (nominal P < 0.001, | ES| = 0.604, gene size = 111, and FDR = 0.913%; Supplementary Table 6). Interestingly, these two pathways were consistent with the results of KEGG analysis of genes in the blue module. We also increased the biological pathways related to C5 (ontology gene sets). The results indicated that model 1 was significantly associated with DNR replicationrelated and cell cycle-related signaling pathways (Supplementary Table 6). The top three enriched biological pathways were mitotic sister chromatid segregation (nominal P < 0.001, | ES| = 0.829, gene size = 158, and FDR = 0%), condensed chromosome (oncogenic signature gene sets)-related biological pathways were also explored. We found that model 1 was significantly related to seven biological pathways. Meanwhile, model 2 was associated with six pathways (Supplementary Table 6). Both the two models

Model 2 Was Regarded as the Best Model With Better Clinical Utility
Both of the two candidate models showed great prognostic values of patients with ACC according to previous analyses. In order to distinguish the best prognostic model, we further constructed two nomograms separately according to risk scores of model 1 (Supplementary Figure 9A) and model 2 ( Figure 5A) and other significant features in multivariate Cox analysis. Both of the two models showed good potential for clinical application.
According to the result of the calibration curves (Figures 5B-D and Supplementary Figures 9B-D), the nomogram of model 2 showed better value compared with the ideal model and model 1.
Especially for nomogram's 3-year and 5-year OS, the nomogram showed great performance almost as the same as the ideal model did (Figures 5C,D). Thus, we regarded model 2 (α = 0.6) as the best prognostic model in the present study.

Validation of the Nomogram Assessed by the Best Model
In order to validate the predictive value of the nomogram based on the best model, we further performed DCA, C-index, and ROC analyses. DCA was performed to evaluate the clinical net benefit using the nomogram with and without risk score (calculated by the best model) for predicting 1-, 3-, and 5year survival probability. As Figure 5E showed, there was a trend that the nomogram based on the best model occupied higher net benefit than the nomogram without risk score when predicting 1-year survival probability, when Pt ranged between 0 and 0.15. For the evaluation of the clinical net benefit using the nomogram based on the best model for predicting 3year survival probability, the nomogram with the best model had higher net benefit compared with the nomogram without risk score, when Pt ranging from 0 to 0.50 ( Figure 5F). The nomogram with the best model also obviously improved the net benefit for 5-year survival prediction. In detail, the nomogram with the best model had a higher net benefit than the simple nomogram of Pt between 0 to 0.20 ( Figure 5G). Summarizing above, the nomogram assessed by the best model showed high potential for clinical application, especially for 1-and 3year survival prediction. ROC analysis using TCGA-ACC data indicated that the nomogram could predict OS of ACC patients effectively (AUC: 0.891; C-index: 0.872; Figure 5H). Timedependent ROC curves demonstrated that this nomogram with the best model showed excellent stability over a period of 5 years (1-year AUC: 0.838, 3-year AUC: 0.942, 5-year AUC: 0.961, Figure 5I). We directly used the nomogram to divide the ACC patients into high-and low-point groups and survival analysis demonstrated that ACC patients with higher nomogram points had better OS compared with patients with lower nomogram points (P < 0.0001, Figure 5J). We also validated the predication value of this nomogram by using GSE19750, we got the similar conclusion that the nomogram with the best model could predict OS of ACC patients effectively (AUC: 0.864; C-index: 0.882; Figure 5K). Time-dependent ROC curves demonstrated that this nomogram based on the best model showed good stability over a period of 5 years (1-year AUC: 0.897, 3-year AUC: 0.882, 5year AUC: 0.755, Figure 5L). Survival analysis also demonstrated that ACC patients in highnomogram point group had better OS compared with ACC patients in low-nomogram point group (P = 0.029, Figure 5M).

A Summary of Mutations and Copy Number Variations of Genes in the Best Model
The best model contained 14 most powerful prognostic markers [ASPM (abnormal spindle microtubule assembly), AURKA (aurora kinase A), CCNB2 (cyclin B2), CDC20 (cell division cycle 20), CENPA (centromere protein A), EXO1 (exonuclease 1), FBXO5 (F-box protein 5), HJURP (Holliday junction recognition protein), KIF2C (kinesin family member 2C), MKI67 (marker of proliferation Ki-67), NUF2 (NUF2 component of NDC80 kinetochore complex), PARPBP (PARP1 binding protein), TACC3 (transforming acidic coiled-coil containing protein 3), and TROAP (trophinin associated protein)], which were selected for further analysis. Among the 92 ACC patients with sequencing data, only in 10 independent samples contained mutations of genes as in the best model ( Table 2). As for genetical alteration of the 14 genes, MKI67 altered the most (12%), and the main type was mRNA High (Supplementary Figure 10). Combined with the relative mRNA expression values of these genes, these genes seemed to highly express when there exist alterations in them (Supplementary Figure 10). Among the 90 ACC samples with CNV data, CNVs of the 14 genes constantly existed ( Figure 6A and Table 3). Among the 14 genes, PARPBP was the most associated gene, which had the highest frequency of CNV events (77.78%, 70/90) ( Figure 6A and Table 3). Moreover, gain of copy number was the most common CNV event (62.32%, 344/552, Figure 6B).

Relationship Between Alterations in Genes in the Best Model and the Clinical Features
Based on the data from TCGA database, six clinical features (gender, age, pathologic stage, T stage, N stage, and M stage) were collected for this analysis. As the result suggested (Table 4), alterations in genes were significantly associated with pathologic stage (Chi-square = 10.644, P = 0.014), T stage (Chisquare = 11.008, P = 0.012), and M stage (Chi-square = 10.687, P = 0.001).

Validation of the 14 Genes in the Best Model
According to the result, all the classifiers constructed and verified by TCGA-ACC data showed good performance because the accuracy of classification was more than 0.70 (Table 5). Especially the polynomial-SVM-based classifier, its accuracy was 0.9870, and we validated all the classifiers by using GSE19750, GSE76019, and GSE76021. All the classifiers showed good performance in GSE76019 and GSE76021 (accuracy of classification ≥ 0.50, Table 5), but in GSE19750, KNN-based classifier and all the SVM-based classifiers did not play well as we expected (Table 5), perhaps because only 22 ACC samples from GSE19750 were included for validation. All in all, all the classifiers constructed based on the 14 genes showed great performance to distinguish dead ACC samples from live ACC samples, which meant all the 14 genes were significantly associated with prognosis of patients with ACC.

External Validation of the Best Model/the Nomogram and Comparison With Other Biomarkers Using Gene Expression Data
In order to assess the prognostic value of the six hub genes, and compare the prognostic value of the best model/the nomogram with the prognostic biomarkers given by other studies, we collected several biomarkers from previous studies. In total, eight biomarkers including CTNNB1 (Assié et al., 2014), IGF2 (Catalano et al., 2021), TP53 (Assié et al., 2014), MKI67 (Beuschlein et al., 2015), SF1 (Tian et al., 2021), IPRPs model (Tian et al., 2021), m6A-related signature (Jin et al., 2021), and m6A-based signature (Shen et al., 2021) were used in the present study. According to the result (Table 6)  C-index = 0.872) showed better ability for predicting prognosis than those biomarkers collected from previous studies. AUC and C-index of these biomarkers are shown in Table 6. To summarize the above, the results demonstrated that the best model and the nomogram based on the best model may act better than other biomarkers.

DISCUSSION
Adrenocortical carcinoma is a relatively rare malignancy in the urinary system associated with a poor prognosis (Lam, 2021). 5-year survival rate for patients with ACC was 50% as ASCO reported. Similarly, it is not optimistic that 40% of the cases have distant metastasis when diagnosed (Ayala-Ramirez et al., 2013;Guillaume et al., 2014), which means that they have missed the best opportunity for treatment. Thus, there is of great need to develop novel molecular biomarkers for diagnosis and prognosis of ACC patients. Over the past several years, the discoveries of novel and informative genes by using bioinformatics methods have provided valuable information in the diagnosis of malignancies (He et al., 2017;Meulekamp et al., 2017). At present, WGCNA has been widely used to screen out novel and effective molecular biomarkers in the bioinformatics field Li et al., 2019;Lv et al., 2021). In our previous study, we have made great efforts to use WGCNA as the main method to identify prognosis biomarkers for bladder cancer (Yan et al., 2019). Thus, we first identified genes associated with survival of ACC in the present study, and 93 biomarkers significantly relevant to survival of ACC were further identified combined with overall survival analysis and disease-free survival analysis. Interestingly, some studies thought that a single biomarker might not predict the prognosis of patients with ACC well . Nowadays, more and more studies have paid their attention on developing a multi-gene signature for predicting prognosis in malignancies instead of just a single biomarker (Qiao et al., 2019;Wu et al., 2019). Given that the Ridge, Elastic Net, and Lasso methods have been widely used to construct a Cox proportional hazard regression model (Lu et al., 2019), 11 multiple-gene signatures were constructed for predicting prognosis of ACC patients. Two candidate models were further screened out and validated by calculating the AUC by using internal validation set. To test whether the prognostic value of the two signatures were independent of other clinical features (gender, age, tumor stage, T stage, N stage, M stage, and Weiss score), we included these factors for the Cox regression analysis. The result indicated that tumor stage, risk score of model 1, and risk score of model 2 were significant prognostic factors for ACC patients. Moreover, risk score of model 1 and risk score of model 2 were independent of tumor stage. Tumor stage including TNM staging system reflects the internal characteristics of tumors, which is of great value for evaluation of the degree of tumor deterioration and prediction of prognosis according to previous findings (Veeratterapillay et al., 2012). In our study, model 1 and model 2 could not only discriminate between low-and high-risk groups like tumor stage did, but also showed more effective prognostic value compared with tumor stage, which made our results reliable. To better understand the biological role of the two signatures, we performed GSEA analysis. Two KEGG pathways associated with model 1 were identified including cell cycle and oocyte meiosis. Meanwhile, one KEGG pathway associated with model 2 was identified including cell cycle. Cell cycle was the basic process of cell proliferation. According to a study by Kaistha et al. (2015), the deregulation of cell cycle was obviously associated with tumor progression. As for oocyte meiosis, maturation and development of oocyte is a complex biologic process in mammals, which is the key event in the reproductive process (Jaffe and Egbert, 2017). Any errors in this process can lead to the oocyte developmental abnormalities or infertility (Jaffe and Egbert, 2017). Meiosis is specialized cell division with the reduction in genetic content, which guaranteed the stability of chromosome numbers of gamogenetic organisms (Jaffe and Egbert, 2017). Subsequent analysis demonstrated that model 2 showed better predictive value than model 1; thus, we regarded model 2 as the best model in the present study. Moreover, the best model was validated to show strong association with survival of ACC patients and high accuracy for OS probability prediction in patients with ACC.
By exploring the association between mutations and CNVs with genes in the best model, we found that high expression levels of genes in the best model was associated with gene alteration. Subsequent analysis indicated that shallow deletions in ASPM, CENPA, EXO1, HJURP, and NUF2 caused better OS of ACC patients. Meanwhile, shallow deletions in AURKA, MKI67, PARPBP, and TACC3 led to poorer OS. To summarize the above, we found that alterations in these biomarkers might have strong effects on the survival of ACC patients. A PPI network was constructed based on the 14 genes from the best model; six hub genes were further identified according to the degree of connectivity. All the hub genes could effectively predict the OS of ACC patients. In the past few years, some potential prognostic biomarkers for ACC were identified. Tian et al. constructed a robust prognostic model for ACC (Tian et al., 2021). Two m6A-related signatures, which might be used as independent prognostic factors in evaluating the prognosis of ACC patients, were also established (Jin et al., 2021;Shen et al., 2021). Among the six hub genes we identified, two, including ASPM (Yuan et al., 2018) and CCNB2 (Gao et al., 2019;Guo et al., 2020), were already regarded as prognostic biomarkers in previous studies. Yuan et al. screened out 12 hub genes including ASPM, which were associated with the progression and prognosis of ACC. Two recent studies proved that CCNB2 was associated with worse OS of ACC (Gao et al., 2019;Guo et al., 2020). As for the rest of the four genes, the relationship between them and prognosis of ACC has never been reported by previous studies, which might be novel prognostic biomarkers for ACC. Moreover, the risk score calculated by the best mode and the nomogram based on the best model showed better ability in predicting prognosis than these biomarkers collected from previous studies.
There still exist some limitations in our study. First, though our study was a relatively preliminary exploration of establishing a multi-gene signature by combing bioinformatics methods, statistics, and machine learning, there might be lack of novelty in methods. Therefore, we will explore more novel bioinformatics and machine learning methods to integrate prognostic gene signatures. Second, the best prognostic model was developed and validated by using data derived from TCGA and GEO databases. Although the signature could distinguish high-and low-risk groups well, it was not clear if it could show good performance as we expected in clinical trials. Thus, we will further apply and test this signature in clinical judgment for prognosis of ACCs. Third, according to the results of DCA, the best signature showed high potential for 5-year survival prediction when 0.00 < Pt < 0.20, but when Pt equaled other values, this signature did not play well as we expected, perhaps due to the small size of these datasets. Therefore, we will further validate the multi-gene signature in larger and richer datasets.
To sum up, 93 hub genes in the co-expression network were identified as survival associated genes. Moreover, we constructed an Elastic Net model including 14 genes, which were validated to show good performance by using three independent datasets. This signature could act as an effective prediction tool for the prognosis of ACC patients independently. The best model-based nomogram was established based on risk score assessed by this risk signature to provide clinical doctors with a visual tool. Moreover, six hub genes were identified among the 14 genes from the best model, which might be novel prognostic biomarkers of prognosis of patients with ACC. However, the best model and six hub genes needs to be verified by using more novel bioinformatics methods, clinical trials, and larger datasets.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
SL and T-ZL conceived and designed the study. XY, Z-XG, D-HY, and X-PL performed the analysis procedures. XY, Z-XG, CC, and X-PL analyzed the results. SL, Z-WY, and T-ZL contributed analysis tools. XY contributed to the writing of the manuscript. All authors reviewed the manuscript.