Development and validation of a copper-related gene prognostic signature in hepatocellular carcinoma

Introduction: Reliable biomarkers are in need to predict the prognosis of hepatocellular carcinoma (HCC). Whilst recent evidence has established the critical role of copper homeostasis in tumor growth and progression, no previous studies have dealt with the copper-related genes (CRGs) signature with prognostic potential in HCC. Methods: To develop and validate a CRGs prognostic signature for HCC, we retrospectively included 353 and 142 patients as the development and validation cohort, respectively. Copper-related Prognostic Signature (Copper-PSHC) was developed using differentially expressed CRGs with prognostic value. The hazard ratio (HR) and the area under the time-dependent receiver operating characteristic curve (AUC) during 3-year follow-up were utilized to evaluate the performance. Additionally, the Copper-PSHC was combined with age, sex, and cancer stage to construct a Copper-clinical-related Prognostic Signature (Copper-CPSHC), by multivariate Cox regression. We further explored the underlying mechanism of Copper-PSHC by analyzing the somatic mutation, functional enrichment, and tumor microenvironment. Potential drugs for the high-risk group were screened. Results: The Copper-PSHC was constructed with nine CRGs. Patients in the high-risk group demonstrated a significantly reduced overall survival (OS) (adjusted HR, 2.65 [95% CI, 1.83–3.84] and 3.30, [95% CI, 1.27–8.60] in the development and validation cohort, respectively). The Copper-PSHC achieved a 3-year AUC of 0.74 [95% CI, 0.67–0.82] and 0.71 [95% CI, 0.56–0.86] for OS in the development and validation cohort, respectively. Copper-CPSHC yield a 3-year AUC of 0.73 [95% CI, 0.66–0.80] and 0.72 [95% CI, 0.56–0.87] for OS in the development and validation cohort, respectively. Higher tumor mutation burden, downregulated metabolic processes, hypoxia status and infiltrated stroma cells were found for the high-risk group. Six small molecular drugs were screened for the treatment of the high-risk group. Conclusion: Copper-PSHC services as a promising tool to identify HCC with poor prognosis and to improve disease outcomes by providing potential clinical decision support in treatment.

To make full use of all available data with missing observations in the development cohort, we applied imputation by Chained Equations [1] under the assumption of "missing at random". We included Copper-PSHC risk score (as a continuous variable), age, sex, tumor grade, margin residual, vascular invasion, and stage in the imputation model, with a random seed of 5,020,027. The missing observations were 1.4%, 7.4%, 15.3%, and 6.8% for tumor grade, margin residual, vascular invasion, and stage, respectively. Only variables with less than 20% missing observations were imputed.

Differentially expressed genes (DEGs) selection
For the initial screening of copper-related genes, we searched PubMed for studies with the keywords of ("copper homeostasis" OR "copper binding" OR "cuproptosis") included in the title or abstract. Then, two researchers (HS and JH) read the title and abstract to determine whether the articles fit the purpose of our study, respectively. When there was a disagreement, a third researcher (XW) would judge it. Finally, ten articles that were most consistent with our study were included. We enumerated the copper-related genes mentioned in these articles and excluded duplicates. Genes related to copper included in the study and the corresponding references were presented in Supplementary Table 2.
The DEGs between tumor tissues and non-tumor tissues were identified by "DESeq2" [2] R package in the development cohort firstly, with the fold change > 1 and a false discovery rate < 0.05. To minimize the false discovery rate of the DEGs, we then used "limma" R package [3] to identify the DEGs between the two types of tissues. The intersection of the DEGs identified by "DESeq2" and "limma" and measured in both cohorts was treated as the DEGs for downstream analyses.

Development of Copper-PSHC
We used the least absolute shrinkage and selection operator (LASSO) Cox regression to select the most contributing prognostic genes among the rest of the 13 genes to establish Copper-PSHC for prediction of overall survival. The LASSO is a valuable method for selecting the predictors with the most contributing values and fitting a generalized linear model based on L1 regularization. [4] It is widely used in regression analysis with high-dimensional inputs to prevent overfitting. LASSO has been applied to the Cox regression model for survival analysis. [5] The penalty parameter, l, was determined by 10-fold cross-validation following the minimum criteria, i.e., the optimal l is where the value of the smallest partial likelihood deviance was observed. Copper-PSHC was then constructed as follows: where β is the corresponding Cox regression coefficient and the gene expression level was normalized using the Z-score method for each cohort.
Finally, nine genes were included in the Cox regression model. Our model did not violate the 10EPV principle [6] as the observed number of events (death) was 129 and 119 in the development and validation cohort. The analysis was performed with the "glmnet" R package [7].

Development of Copper-CPSHC
To develop a personalized prognostic model for HCC patients, we combined clinical factors, including age, sex, and stage with Copper-PSHC to construct a composite prognostic model, Copper-CPSHC, using multivariate cox regression in the development cohort. Age, sex, and stage were treated as continuous variables. Age < 60 years were coded as 0 and ≥ 60 years were coded as

Validation of Copper-PSHC
The risk score generated by Copper-PSHC was estimated for each individual in both cohorts. Then, we adopt the median score in the development cohort as a cut-off value to classify HCC patients into the high-(≥ median) and low-risk (< median) groups.
We conducted univariate analyses to evaluate the prognostic value of Copper-PSHC in both cohorts. Then, stratified analyses by age (< 60 vs. ≥ 60 years), sex (male vs. female), and stage (I-II vs III-IV) were then performed for both cohorts, and the hazard ratio (HR) was merged. Considering the small sample size of stage IV patients in both cohorts (1.4% and 7.0 %for the development and validation cohort, respectively), we combined stage III-IV patients as one group, while stage I-II was another.
The proportional hazards assumption was tested with Grambsch and Therneau method [8][9]. In the overall population of the two cohorts, the proportional hazards assumption was not violated (Supplementary Table S3). However, considering the statistical examination was not robust in subgroups due to the small population, where the violation of proportional hazards assumption may exist, we reported HR estimated either by cox model or log-rank test (annotated in Supplementary Figure 5). Further, the HR estimated with the log-rank test was merged by SAS (SAS Institute, Cary, NC) using a fixed model. Further, we included age (< 60 vs. ≥ 60 years), sex (male vs. female), and stage (I-II vs. III-IV) in multivariate analyses to justify whether Copper-PSHC was independently associated with survival outcomes.
Time-dependent ROC analyses for time-to-event outcomes, i.e., OS and DFS, were performed to test the predictive power of Copper-PSHC over time. Time-dependent ROC analysis was used to characterize the predictive accuracy of a predictor when the outcome was censored data [10]. Considering the median follow-up time in both cohorts was less than 3 years (28.0 months [IQR 23.9 to 33.8 months] and 28.5 months [IQR 24.6 to 30.5 months] for the development and validation cohort, respectively), we set 1, 2 and 3 years as time point to evaluate the predictive power by estimating the area under the curve.
Additionally, restricted mean survival times (RMSTs) were estimated for high-and low-risk groups determined by Copper-PSHC to quantify the long-term survival benefit [11][12]. We reported the overall RMSTs and the 36-month RMSTs. The concordance index (c-index) was estimated to quantify the prognostic accuracy. RMST was estimated by the "survRM2" package. [13] C-index was calculated by the R "survminer" package. [14]

Validation of Copper-CPSHC
The risk score generated by Copper-CPSHC was estimated for each individual in both cohorts. Then, the optimal cut-off value classifying the high-and low-risk group was determined according to the Youden index of the time-dependent ROC curve at 3 years.
Similar to the validation of Copper-PSHC, we performed univariate analysis and represented in Kaplan-Meier curves. Time-dependent ROC analyses were also performed at 1, 2, and 3 years. RMSTs and C-index were also estimated. We additionally evaluated the model calibration with calibration curves [15], which compared the predicted risk with the observed risk. Also, we conducted a decision curve analysis (DCA). [15][16] We compare the net benefits of Copper-CPSHC with TNM stage and a multivariate model including age, sex, and stage.

Hypoxia analysis
The HCC hypoxia score of the high-and low-risk groups were calculated as bellow [17]: Hypoxia score = 0.0376* expression of HAVCR1 + 0.0337* expression of PSRC1 + 0.1417* expression of CCNJL + 0.0530* expression of PDSS1 + 0.0316* expression of MEX3A + 0.2148* expression of EID3 + 0.0148 * expression of EPO + 0.0081* expression of PLOD2 + 0.0296* expression of KPNA2 + 0.0381* expression of CDCA8 + 0.2877* expression of ADAMTS5 + 0.0187* expression of SLC1A7 + 0.0065* expression of PIGZ A higher hypoxia score indicates a lower oxygen status, which associated with a poorer prognosis [18][19]. Previous studies found that hypoxia promotes proliferation, metastasis, angiogenesis, resistance to radiotherapy and chemotherapy of HCC [20-21]. Of note, the assessment of hypoxia was limited in the development cohort because some of the above genes were not measured in the validation cohort. ESTIMATE is a gene-signature-based algorithm to infer the fraction of stromal and immune cells. Stromal score and immune score were estimated to quantify the presence of stromal and immune cell, respectively. Then these two scores were combined to generate an ESTIMATE score, which representing the tumor purity. CIBERSORT is a deconvolution-based approach to infer the proportions of 22 immune cells based on linear support vector regression. ssGSEA is an approach which converts gene expression profiles of individual samples into gene set enrichment profiles to calculate immune cell infiltration scores. Similarly, xCell is a marker-gene-based approach to infer 64 immune and stromal cell types. This approach was based on reliable gene signature (marker) and ssGSEA algorithm. Also, an immune, stroma and microenvironment score were generated by xCell to describe the infiltration of immune and stroma cells, and to measure the abundance of microenvironment.

Tumor microenvironment analysis
In our analysis, significant differences (for proportion and expression) were only claimed if they were found in both the development and validation cohorts. For those cells estimated by more than one algorithm, the findings were considered robust only when all algorithms reach consensus.

Exploration of potential therapy for HCC
The online website CLUE (available at: https://clue.io/) was used to query the potential drugs. CLUE was based on the concept of CMap (Connectivity map), whereby genes, drugs and disease states are connected by virtue of common gene-expression signatures [26][27]. Hence, the potential therapy to reverse the disease status for high-risk group can be predicted by DEGs.
The upregulated and downregulated gene symbol between high-and low-risk were upload to the website. The parameters of query were select as "Gene expression (L1000), touchstone". Then, an enrichment score ranging from -1 to 1 will be generated, representing the similarity between drugs and current biological process or disease status. A positive score indicates that drugs could induce the biological phenomena, while a negative score represents that the drug could reverse the disease status and have potential therapeutic value. The potential drugs were selected with the criteria of P < .05 and enrichment score < -0.60.

Hu B, Yang XB, Sang XT. Development and Verification of the Hypoxia-Related and
Immune

Supplementary Tables
Supplementary Table S1. Characteristics of participants after imputation. ..

Notes:
The variables in the development cohort with missing observation less than 20% were imputed.