Clinical Diagnostic and Prognostic Potential of NDRG1 and NDRG2 in Hepatocellular Carcinoma Patients

Background Primary liver cancer is still the most common lethal malignancy. The N-myc downstream-regulated gene family (NDRG1–4) is a group of multifunctional proteins associated with carcinogenesis. However, systematic evaluation of the diagnostic and prognostic values of NDRG1 or NDRG2 expression in liver cancer is poorly investigated. Method The gene expression matrix of liver hepatocellular carcinoma (LIHC) was comprehensively analyzed by the “limma” and “Dseq2” R packages. The Gene Ontology (GO) and Gene Set Enrichment Analysis (GSEA) were used to identify the biological functional differences. A single-sample GSEA (ssGSEA) was conducted to quantify the extent of immune cell infiltration. Finally, the clinical and prognostic information of LIHC patients was systematically investigated using Kaplan–Meier analysis and logistic and Cox regression analysis. Results Compared with normal tissues, NDRG1 expression was higher, whereas NDRG2 expression was lower in tumor tissues (P <0.001). The area under the receiver operator characteristic curve (AUROC) of NDRG1 and NDRG2 for LIHC was 0.715 and 0.799, respectively. Kaplan–Meier analysis revealed that NDRG1 and NDRG2 were independent clinical prognostic biomarkers for the overall survival (OS, P = 0.001 and 2.9e−06), progression-free interval (PFI, P = 0.028 and 0.005) and disease-specific survival (DSS, P = 0.027 and P <0.001). The C-indexes and calibration plots of the nomogram suggest that NDRG1 and NDRG2 have an effective predictive performance for OS (C-index: 0.676), DSS (C-index: 0.741) and PFI (C-index: 0.630) of liver cancer patients. The mutation rate of NDRG1 in liver cancer reached up to 14%, and DNA methylation levels of NDRG1 and NDRG2 promoters correlated significantly with clinical prognosis. Conclusions The mRNA expression and DNA methylation of NDRG superfamily members have the potential for LIHC diagnosis and prognosis via integrative analysis from multiple cohorts.


INTRODUCTION
Primary liver cancer is the fourth leading cause of cancerrelated deaths worldwide, characterized by an insidious onset and a low rate of early diagnosis (1). In China, liver hepatocellular carcinoma (LIHC) is the most common type of primary liver cancer and typically results from chronic hepatitis B virus infection (2). Despite the recent advances in diagnostic and therapy, the 5-year relative survival rate of LIHC remains to be only 12% (3). Currently, LIHC contributes to over half of the new deaths, and it is predicted to rise continuously in the next decade (4). Therefore, identifying potential biomarkers that improve diagnostic accuracy and prognostic prediction is critical.
Increasing evidence indicates that disturbance of proto-oncogenes and tumor suppressor genes results in hepatocarcinogenesis, which is a complicated pathophysiological process (5). Viral infection and metabolic stress induce genetic and epigenetic alterations through cell cycle turnover and the inflammatory environment (6). The N-myc downstream-regulated gene (NDRG1-4) family, a hypoxia-associated protein, has been involved in cell proliferation and differentiation, stress responses, tumor progression, and metastasis (7)(8)(9). NDRG1 is involved in cellular skeleton modification and organ development and can be induced by hypoxia and DNA damage (7,10). NDRG2 is highly expressed in dendritic cells and maintains activated leukocyte adhesion (11). Moreover, NDRG3 could promote cell growth and angiogenesis and participate in the lactate-dependent hypoxia signaling pathway (12). NDRG4 is exclusively expressed in the embryonic stage and regulates the proliferation and growth of nerve cells and cardiomyocytes (13). Of note, abnormal expression of NDRG1 or NDRG2 has been found in different cancer types, such as in esophageal cancer, colorectal cancer, breast cancer, prostate cancer, and hepatocellular carcinoma, and is significantly associated with poor prognosis (10,(14)(15)(16). Existing data indicate that NDRG1 is upregulated and NDRG2 is downregulated in LIHC. However, a few studies focused on the diagnostic and prognostic values of NDRG1 or NDRG2 in LIHC, and their potential mechanisms in LIHC remain unknown.
Here, the diagnostic and prognostic significance of NDRG1 and NDRG2 were systematically identified in LIHC using RNAseq data from the TCGA database. We first comprehensively analyzed the gene expression matrix of LIHC and then applied bioinformatics methods (namely, Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and Gene Set Enrichment Analysis (GSEA)) to explore the underlying biological mechanism. Secondly, we analyzed DNA mutation and methylation in NDRG1 or NDRG2, and explored the relevance between immune cells and NDRG1 or NDRG2 expression by the single sample GSEA (ssGSEA). Moreover, clinical and follow-up information of LIHC patients were used for Kaplan-Meier analysis, and logistic and Cox regression analysis. Finally, we plotted a nomogram for the prognosis prediction of the patients. Taken together, this study revealed that NDRG1 or NDRG2 could be a potential diagnostic and prognostic biomarker for LIHC.

Data Source
RNA-seq data for pan-cancer analysis were retrieved from the UCSC XENA (https://xenabrowser.net/datapages/) (17, 18). The mRNA expression datasets (GSE14520, GSE25097, and GSE36376) were obtained from the GEO database (https:// www.ncbi.nlm.nih.gov/geo/) to further validate the results of pan-cancer analysis in LIHC patients. Meanwhile, we also collected the data for NDRG1 and NDRG2 protein expression from the Human Protein Atlas (HPA) (https://www.proteinatlas. org/). The mRNA expression matrix file of TCGA-LIHC and the corresponding clinical data were collected from the TCGA website (https://portal.gdc.cancer.gov/repository), and level-3 HTSeq-FPKM data were transformed into TPM (transcripts per million reads) for subsequent analyses. A total of 371 patient information was used, while unavailable or unknown clinical information was excluded.

Identification of the NDRG1 and NDRG2 Expression Profile
The raw expression profiles from GEO datasets were preprocessed by R software (version 4.0.5) and differentially analyzed by running the "limma" R package (version 3.46.0). Gene expression data of LIHC case samples were stratified into high-and low-expression groups based on the median expression of NDRG1 and NDRG2, respectively. The expression analysis between high-and low-expression groups was performed using the "DESeq2" R package (19) (version 3.18.1; http://www.Rproject.org). Genes with the threshold for | log2FoldChange | >0.5 and adjusted P <0.05 were regarded as statistically significant.

Functional Annotation and Enrichment Analysis
expression groups (20,21). Here, the curated gene sets (c2.cp.v7.2.symbols.gmt) from MSigDB Collections were selected as reference gene sets for GSEA. A permutation test with 1,000 iterations was used to identify pathways that had changed significantly. Significant pathway enrichment was identified by a false discovery rate (FDR) <0.25 and Pvalue <0.05.
Single-Sample GSEA (ssGSEA) for Immune Infiltration Analysis The ssGSEA was conducted using the GSVA R package (3.6.3) to quantify the tumor immune infiltration levels in 24 types of immune cells (22) and the marker of which was obtained from a previous study (23). Next, Spearman's correlation analysis was conducted to evaluate the associations of NDRG1 and NDRG2 expression with immune cell infiltration, and the Wilcoxon ranksum test was used to investigate the enrichment scores of immune infiltration levels between high-and low-NDRG1 and NDRG2 expression groups.

DNA Mutations and Methylation Analysis
The cBio Cancer Genomics Portal (http://cbioportal.org) was applied to analyze NDRG family alterations in the TCGA-LIHC sample, which is an interactive exploration website of multidimensional cancer genomic datasets (24). The DNA CpG methylation of NDRG1 and NDRG2 in TCGA was also analyzed by MethSurv (https://biit.cs.ut.ee/methsurv/) to explore the relevance between the CpG methylation level and prognostic values (25).

Statistical Analysis
The R software (version 4.0.5) was used for all statistical analysis and graphical plotting. Scatter plot analysis was performed using the Wilcoxon rank-sum test. Non-parametric survival analysis was performed by the Kaplan-Meier method and log-rank test. Correlation analysis was evaluated using the Spearman's coefficient. The Wilcoxon signed rank test, Kruskal-Wallis test, and Chi-Squared test were used to assess the clinicopathological features between high-and low-expression groups. The diagnostic performance of NDRG1 or NDRG2 expression was tested by the Area Under the Receiver Operator Characteristic Curve (AUROC). Univariate and multivariate analyses using Cox proportional hazard modeling were performed to estimate the death risk. Unless stated otherwise, P <0.05 (two-sided) was considered statistically significant.

NDRG1 Was Upregulated and NDRG2 Was Downregulated in LIHC
We used pan-cancer RNA-seq data from the UCSC XENA to evaluate the mRNA expression levels of NDRG1 and NDRG2 in diverse human cancers. As shown in Figure 1A, the mRNA expression of NDRG1 and NDRG2 was found to be significantly different (P <0.001) in almost all tumor types, including LIHC.
To further verify the above results, we performed common differential expression gene analysis among the three selected GEO datasets (GSE14520, GSE25097, and GSE3637), and NDRG1 and NDRG2 were found to be two of the 843 overlapping genes (|log2FoldChange| >0.5, Padj <0.05) ( Figure 1B). As shown in Figures 1C, D, the results from GSE14520 showed that NDRG1 mRNA was highly expressed in LIHC tissues compared to the adjacent normal tissues (p <0.001), and NDRG2 mRNA was markedly downregulated in LIHC tissues (P <0.001).
To further validate NDRG1 and NDRG2 protein expression in LIHC, we analyzed the NDRG1 and NDRG2 expression profiles in the HPA database ( Figure 1E). The immunohistochemical staining from HPA also showed that NDRG1 and NDRG2 proteins were expressed in a pattern consistent with the mRNAlevel changes in LIHC tissues. At the experimental level, we analyzed NDRG1 and NDRG2 mRNA expression by qPCR in 32 pairs of LIHC tissues ( Figures 1F, G). The qPCR results again indicated the high expression of NDRG1 (P <0.001) and low expression of NDRG2 (P <0.001) in LIHC tissues.

DNA Mutation and Methylation of NDRG1 or NDRG2 for LIHC Prognosis
Next, we analyzed the mutation frequencies of NDRG1 or NDRG2 in TCGA-LIHC using the cBioPortal online tool. As shown in Figure 2A, a high mutation rate of NDRG1 (up to 14%) was observed in LIHC patients, while other NDRG family genes (NDRG2, NDRG3, and NDRG4) were rarely mutated (less than 2%). Moreover, we analyzed the DNA methylation levels of NDRG1 and NDRG2 (Figures 2B, C) and the prognostic values ( Figure 2D) of single CpG in TCGA-LIHC via MethSurv © 2017. As a result, they showed that cg15393676 in NDRG1 ( Figure 2B), and cg16409562 and cg04359602 in NDRG2 ( Figure 2C) showed the highest methylation in their promoter regions. The DNA methylation level has been negatively correlated with the gene expression level. As shown in Figure 2D, LIHC patients with hypermethylation in NDRG1 and hypomethylation in NDRG2 could have a better clinical prognosis, which supported our mRNA expression results.
Furthermore, we explored the relationship between the expression of NDRG1 or NDRG2 and DNA repair and methyltransferase genes. As shown in Figure 2E, NDRG1 expression has a strong positive correlation with DNA repair genes and methyltransferase genes in LIHC. As shown in Figure 2F, we found that NDRG2 expression has a markable positive correlation with only a few RNA methyltransferase genes and DNA repair genes, but not with DNA methyltransferase in LIHC. This suggests that NDRG1 may indirectly affect the development and progression of LIHC by regulating epigenetic status.

NDRG1-or NDRG2-Related Differentially Expressed Genes
The enrolled 371 LIHC tumor samples from the TCGA dataset were stratified into high-and low-expression groups according to NDRG1 or NDRG2 median values (cut-off value of 50%), respectively. The identified differentially expressed genes (DEGs) (|log2FoldChange| >0.5, Padj <0.05) between different cohorts were illustrated by the volcano plot. A total of 3,547 genes (2,322 upregulated and 1,225 downregulated) were identified as DEGs in the high-NDRG1 group ( Figure 3A), and 3,216 genes (1,204 upregulated and 2,012 downregulated) in the high-NDRG2 group ( Figure 3B). As shown in Figure 3C, 1,345 overlapping DEGs were found between the NDRG1 and NDRG2 groups based on the TCGA-LIHC dataset.
To predict the function of NDRG1-and NDRG2-related DEGs, we conducted GO and KEGG pathway analysis using 1,345 overlapping DEGs by the "ClusterProfiler" R package (version 3.14.3). As shown in Figures 3D-G, we found that NDRG1-and NDRG2-related DEGs were involved in many biological processes, namely, retinol metabolism, bile secretion, fatty acid degradation, chemical carcinogenesis, amino acid metabolism, PPAR signaling pathway, and the cytochrome p450 pathway.

NDRG1-or NDRG2-Related Signaling Pathways
To identify NDRG1-and NDRG2-related signaling pathways, the GSEA was then conducted between low-and high-expression groups, based on significant differences (P-value <0.05, FDR <0.25) in the enrichment of the MSigDB Collection [c2.cp.v7.2.symbols.gmt (curated)]. Here, the most significant enrichment was selected according to the Normalized Enrichment Score (NES). As shown in Figure 4, GSEA results showed that the NDRG1-related signaling pathway is involved in the PLK1 pathway, complement cascade, complement and coagulation cascades, and fatty acid metabolism ( Figure 4A), while the NDRG2-related signaling pathway is involved in fatty acid metabolism, oxidation by cytochrome p450, retinol metabolism, and eukaryotic translation elongation ( Figure 4B).

Correlation Between NDRG1 or NDRG2 Expression and Immune Infiltration
The tissue microenvironment is vital for tumor cells. Therefore, we first assessed the infiltration of 24 types of immune cells in LIHC using the ssGSEA method from the R package "GSVA", and subsequently evaluated the relationship between NDRG1 or NDRG2 mRNA expression and immune cell infiltration by Spearman's analysis. As shown in

Correlation Between NDRG1 or NDRG2 Expression and Clinicopathological Characteristics
The clinicopathological characteristics of LIHC patients with differential NDRG1 and NDRG2 expression are listed in Table 1.
There was a significant difference in the distribution of BMI (P = 0.018), T stages (P = 0.031), pathologic stages (P = 0.016), residual tumor (P = 0.008), and histological grade (P = 0.027) between the high-and low-NDRG1 groups. We also further evaluated the correlation between NDRG1 or NDRG2 expression and clinicopathological characteristics by logistic regression analysis ( Table 2). As a result, we found markedly positive correlations of NDRG1 expression with clinical T stage (P = 0.004), histological grade (P = 0.005), and AFP concentration (P <0.001). Diagnosis and Prognosis Values of NDRG1 or NDRG2 for LIHC As shown in Figure 6A, the AUROC of NDRG1 and NDRG2 is 0.715 and 0.799, respectively. The results of AUROC indicated that the expression of NDRG1 or NDRG2 had high sensitivity and specificity for LIHC diagnosis. Next, Kaplan-Meier (K-M) analysis was performed to verify the prediction of NDRG1 or NDRG2 on clinical outcomes. As shown in Figures 6B-E Subsequently, we constructed a prognostic nomogram using multivariate Cox regression analysis and validated the efficiency of the nomogram by drawing a calibration curve. As shown in Figures 7A-C, tumor TNM stage and age, as well as the expression level of NDRG1 or NDRG2, were included in the nomogram to predict OS (C-index: 0.676, Figure 7A), DSS (Cindex: 0.741, Figure 7B) and PFI (C-index: 0.630, Figure 7C). The calibration curves showed the desired predictive performance of these nomograms of 1-, 3-, and 5-year clinical outcomes ( Figures 7D-F  regression analyses on each subgroup to assess the prognostic performance of NDRG1 (Table 3) or NDRG2 (Table 4) expression on OS, DSS, and PFI. As shown in Table 3, NDRG1 was a significant risk factor for OS in male patients (HR = 1.97, P = 0.003), patients aged above 60 (HR = 1.78, P = 0.014), patients at N0 stage (HR = 1.55, P = 0.048), patients at M0 stage (HR = 1.60, P = 0.034), patients with vascular invasion (HR = 2.19, P = 0.028), and patients with tumor (HR = 1.58, P = 0.048). NDRG1 was also a significant risk factor for DSS in patients with tumors (HR = 1.58, P = 0.048), and for PFI in females (HR = 2.55, P = 0.008), patients at T stage I-II (HR = 1.48, P = 0.034), and patients at N1 stage (HR = 1.44, P = 0.044). As shown in Table 4, NDRG2 was a significant favorable factor for OS in male patients (HR = 0.62, P = 0.034), patients aged below 60 (HR = 0.45, P = 0.004), and patients at M0 stage (HR = 0.60, P = 0.020). Similar observations occurred for DSS in male patients (HR = 0.53, P = 0.031), patients aged below 60 (HR = 0.46, P = 0.019), patients at the N0 stage (HR = 0.53, P = 0.029), and patients at the M0 stage (HR = 0.56, P = 0.043). All the results demonstrated significantly better clinical outcomes in the low-NDRG1 or high-NDRG2 expression groups.

DISCUSSION
To date, clinical outcomes of LIHC are far from satisfactory owing to the lack of efficient indicators and effective treatment (26). Therefore, it is crucial to find potential biomarkers for predicting clinical prognosis and guiding individualized treatment. Many cancers have been reported to have the Nmyc downstream-regulated gene (NDRG1-4) family. To our knowledge, the expression levels of NDRG family members and their potential prognostic values in LIHC have not been fully explored. Hence, we focused on the expression and prognostic values of NDRG1 and NDRG2 in LIHC by analyzing datasets from the TCGA and GEO.
Here, the data analysis from UCSC XENA indicated that the NDRG1 expression had a significant increase in LIHC tumor tissues compared to normal tissues (P <0.001). The same result can be found in the GEO database (GSE14520, GSE25097, and GSE36376), the HPA database, and our experiment validation (qPCR). NDRG1 is a member of the NDRG superfamily and has been found to be involved in embryonic development (7), cellular vesicle transportation (27), and membrane protein circulation (28). NDRG1 is strongly upregulated under hypoxic conditions, and this condition is prevalent in solid tumors (29). Previous studies suggested that NDRG1 could inhibit proliferation and induce apoptosis of cancer cells by regulating Bcl-2 and Ca 2+ -associated proteins (30,31) and epithelialmesenchymal transition (EMT) (32). Another study indicated that the encoded protein of NDRG1 is necessary for p53mediated caspase activation and apoptosis (33). However, NDRG1 protein is expressed at low levels in normal tissues while NDRG1 mRNA is ubiquitously over-expressed in various human cancers. In other words, the regulation mechanism involved in NDRG1 is somewhat complex, governed by hypoxia-inducible factor 1 alpha (HIF-1a)-and p53-dependent pathways (29), which makes the NDRG1 gene potentially an important biomarker for tumor progression. Furthermore, Kaplan-Meier analysis showed that high NDRG1 expression correlates significantly with poor clinical outcomes in LIHC patients, which was consistent with previous research (34). Our results suggest that NDRG1 is an independent risk factor for LIHC and could act as an oncogene to accelerate LIHC progression. However, previous studies revealed that the expression of NDRG1 decreased in esophageal (16), colorectal (35), and breast cancers (10), and was correlated with poor clinical outcomes. This contradictory result may be tumor type-specific, further emphasizing that NDRG1 is involved in complex biological processes. In contrast to NDRG1, NDRG2 expression was reduced significantly in LIHC, and the downregulation of NDRG2 was associated with poor prognosis, which was consistent with previous studies (9,14). Past research has also shown that the NDRG2 expression has a significant decrease in gastric (36), pancreatic (14), and breast cancers (15). Our results showed that NDRG2 might act as an antioncogene to suppress the LIHC progression.
Different types of mutations can greatly increase the risk of developing certain cancers. Interestingly, we found that the NDRG1 gene has a mutation frequency of up to 14% in LIHC patients, while the other NDRG family members have a mutation frequency of less than 2%. DNA methylation is a common epigenetic phenotype that exists in almost all types of human cancers (37), and its occurrence in the promoter region often results in gene silencing (38). In this study, the methylation of NDRG1 and NDRG2 was related to the clinical prognosis of LIHC patients, and patients with hypomethylated NDRG1 or hypermethylated NDRG2 had worse OS. Of note, their mRNA expression was consistent with DNA methylation levels for clinical prognosis. Therefore, the promoter methylation status and mRNA levels of NDRG1 and NDRG2 can be used as independent predictors of LIHC patients. The tumor microenvironment (TME), composed of diverse cell populations in a complex matrix, plays a crucial role in the occurrence and progression of tumors. Tumor-associated    fibroblasts in TME contribute to the progression of LIHC by secreting various growth factors and cytokines (39). Tumorassociated immune cells in TME could be divided into tumorantagonizing and tumor-promoting immune cells (40). In this study, we observed that NDRG1 and NDRG2 expression were negatively associated with most of the immune infiltration cells in LIHC, namely, DCs, macrophages, and neutrophils. As is known to all, DCs are the most effective antigen-presenting cells, which can activate CD8 + T cells and then initiate anti-tumor immunity (41). In the following immune response, neutrophils and macrophages work together against tumors (42). Therefore, the upregulation of NDRG1 or downregulation of NDRG2 seemed to suppress tumor immunity, assist cancer cells to escape from immune elimination, and finally promote tumorigenesis. Meanwhile, NDRG1 mRNA level was positively associated with Th2 cell infiltration level, and NDRG2 expression was positively correlated with Th17 and NK cell infiltration level.
These results indicate that NDRG1 and NDRG2 may play vital roles in tumor infiltration immunity. Here, we observed the correlations between immune cell infiltration and the expression of NDRG1 and NDRG2. However, there remains a gap to be filled between NDRG1 and NDRG2 in LIHC cancer and various types of immune cells.
The clinical significance of NDRG1 and NDRG2 is another concern. Our AUROC provided strong evidence that NDRG1 or NDRG2 could be biomarkers for LIHC diagnosis and prognosis. Further Cox regression analysis and nomograms further demonstrated that NDRG1 or NDRG2 had promising performance for evaluating clinical outcomes. Patients with higher NDRG1 or lower NDRG2 levels had worse OS, PFI, and DSS. In particular, the prognostic value of NDRG1 or NDRG2 was better when considering gender, age, and clinical TNM stages ( Figure 7). As a result, NDRG1 or NDRG2 could be a promising biomarker for poor prognosis prediction. Our findings are consistent with previous reports that NDRG1 plays a role in promoting tumorigenesis in liver, kidney, and esophageal cancers (43). In contrast, evidence has elucidated that NDRG1 is associated with anti-oncogenic and anti-metastatic effects in breast, prostate, colorectum, and pancreatic cancers (44). The inconsistent results might be that NDRG1 is potentially involved in many biological processes and the bi-directional crosstalk and may not play one single role. The pleiotropy of NDRG1 may reflect the heterogeneity of signal transduction in different tumor cell-types. Nowadays, the multi-omics approach (45) and the emerging single-cell studies (46) provide a new perspective on identifying biomarkers for clinical diagnosis and tumor typing. In short, the mRNA expression and DNA methylation of NDRG superfamily members (NDRG1 and NDRG2) show the potential for LIHC diagnosis and prognosis via integrative analysis from multiple cohorts. Considering the multiple mechanisms of the NDRG family, more experiments and a larger sample size will be needed to demonstrate and validate our bioinformatics results for future clinical application.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The hospital ethics committee, Zhongnan Hospital of Wuhan University. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
S-ML and SX conceived and designed the experiments. SX, YZho, YY, and RG analyzed data and drafted the original manuscript. YZha and RG performed the experiments. S-ML, CL and QL critically revised the manuscript. All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.