RNA N6-Methyladenosine-Related Gene Contribute to Clinical Prognostic Impact on Patients With Liver Cancer

Liver cancer (LC) is the fourth leading cause of cancer-related deaths worldwide. There is an urgent need to identify novel and reliable prognostic biomarkers for LC in order to improve patient outcomes. N6-methyladenosine (m6A) is the most common internal modification in eukaryotic mRNA and has been associated with various cancers, although its roles in the prognosis of LC remains to be elucidated. We analyzed the expression profiles of 15 m6A-related genes in the International Cancer Genome Consortium (ICGC) LIRI-JP dataset, and applied consensus clustering to stratify LC patients into two subgroups (Cluster 1 and Cluster 2). Cluster1 was significantly correlated to lower tumor stage and longer overall survival (OS). Gene set enrichment analysis showed that tumorigenic markers, including DNA repair, E2F targets, G2M checkpoint, and MYC targets V1, were enriched in Cluster2. We then constructed a prognostic risk model using three m6A-related genes that were identified as independent factors affecting OS. The nomogram based on the risk model score indicated good performance in predicting the 1-, 2- and 3-year survival of the LC patients. In conclusion, m6A-related genes are potential prognostic markers and therapeutic targets for LC.


INTRODUCTION
Liver cancer (LC) is the fourth leading cause of cancer-related deaths worldwide (Villanueva, 2019). The etiology of LC differs geographically due to differences in the prevalence of risk factors (Jiang et al., 2019). For instance, chronic viral hepatitis infection is the most important risk factor in Asian countries, whereas non-viral factors are the major causative agents of LC in the Western countries (Yau et al., 2019). In East Asia, hepatitis B virus (HBV)-and hepatitis C virus (HCV)-related LC accounts for more than 80% of the cases (Liu et al., 2019a).
Hepatocellular carcinoma (HCC) is classified into different subclasses based on pathological characteristics and/or transcriptomes (Hoshida et al., 2009;Calderaro et al., 2017), and no study has so far reported prognostic subclasses of LC based on the expression of m6A-related genes. Since the prognosis of LC patients depends on the etiology and the ethnicity and/or geographical region (Hashimoto et al., 2017;Villanueva, 2019), and as East Asia has the highest incidence of LC (Bray et al., 2018), we therefore analyzed the m6A profile in an East Asian LC cohort (LIRI-JP dataset) from the International Cancer Genome Consortium (ICGC) database. The aim of this study was to determine the prognostic value of the m6A-related gene signature in LC.

Datasets
The RNA sequencing data and corresponding clinicopathological information of LC patients were extracted from the ICGC (LIRI-JP dataset 1 ) and The Cancer Genome Atlas (TCGA, LIHC dataset 2 ) databases in May 2019. The gene expression data from TCGA was estimated as Transcripts Per Kilobase of the exon model per Million mapped reads (TPM). In the LIRI-JP dataset, the clinical stages of the patients were classified as per the Stage of Liver Cancer Study Group of Japan (LCSGJ) guidelines. The simple somatic mutation data was also retrieved for calculating the tumor mutation burden (TMB). The data of non-solid tissues and non-primary tumors, and of samples lacking sufficient clinical information were excluded. In case two or more samples were derived from the same patient, the mean value was used for analysis. Finally, 231 LC patients and 199 healthy controls from the LIRI-JP dataset, and 370 LC patients from the LIHC dataset were selected. The clinicopathological data of all patients are summarized in Supplementary Table S1.

Bioinformatics Analysis
Fifteen m6A-related genes were extracted from the LIRI-JP dataset (Supplementary Table S2). We analyzed the expression of 15 m6A-related genes in LC patients and normal tissue using the Limma package. LC patients were then clustered into different subgroups using the "Consensus Cluster Plus" package. In order to functionally annotate differentially expressed genes (DEGs) in different subgroups, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted using the "clusterProfiler" package (Yu et al., 2012), and the Gene set variation analysis (GSVA) R package was used to analyze significant differences between the subgroups (Hanzelmann et al., 2013). Gene set enrichment analysis (GSEA) was used to identify the hallmarks of tumor sets in different LC subgroups (Liberzon et al., 2015).
The prognostic values of the m6A-related gene were determined by univariate Cox regression analysis in the LIRI-JP dataset in terms of hazard ratio (HR) and 95% confidence interval (CI). Six prognostic relevant genes (P < 0.05) were then used for the multivariable Cox analysis by step-wise forward and backward selection approaches as well as the smallest Akaike information criterion (AIC). Finally, a risk model was constructed using three genes, and the risk score (designated as riskScore) was calculated for each patient in the LIRI-JP and LIHC dataset using the formula: riskScore = Coef gene1 × Exp gene1 + Coef gene2 × Exp gene2 + Coef gene3 × Exp gene3 , where Coef is the coefficient and Exp is the gene expression value. The clinicopathological factors and riskScore were used as variates in the univariate and multivariate Cox proportional hazards (PH) regression analyses to determine the independent predictive factors of overall survival (OS) in both datasets. A nomogram for 1-, 2-, and 3-year OS was then constructed based on the independent predictive factors, and its predictive performance was evaluated by C-index (Harrell et al., 1996). The calibration curve of the nomogram was used to assess the congruency between the predicted and actual survival. Bootstraps with 1,000 resamples were used to quantify model overfit, and a decision curve analysis (DCA) was made to evaluate the clinical efficacy (Vickers et al., 2008). The prediction power of the distinct parameters was determined using the area under receiver operating characteristic (ROC) curve (AUC) values.

Statistics
The expression level of 15 genes in the LC patients and controls was analyzed using the Wilcoxon rank sum test. The correlation between genes was determined by Pearson's analysis. Patients were divided into different groups by consensus analysis or riskScore (median value as the cutoff), and the distribution of clinical parameters between the subgroups was determined by Fisher's exact test. The OS of LC patients in the different subgroups was analyzed by the Kaplan-Meier method and compared with the log-rank test. All statistical analyses were performed by R v3.6.0 3 .

Three m6A-Related Genes Form a Prognostic Risk Signature in Liver Cancer
Six m6A-related genes significantly correlated with OS by Univariate Cox analysis (P < 0.05), of which METTL3, YTHDC2, and YTHDF2 were identified as independent predictors of OS and the coefficients were obtained by the multivariate analysis ( Table 1). A risk model was constructed using these genes, and the riskScore was calculated for LC patients. Using the median riskScore as the cutoff value, we classified the LC patients into the high and low risk groups and observed poorer OS in the former (P < 0.001; Figure 3A). In addition, the  risk subgroups differed significantly in terms of tumor stage (P < 0.01) and gene cluster (P < 0.001) (Figure 3B), but not age and gender in the LIRI-JP dataset (Supplementary Figure S3). The AUC values showed a better predictive ability of the riskScore for 3-year OS compared to the aforementioned parameters ( Figure 3C). The multivariate analysis confirmed that riskScore, gender and stage were independent prognostic factors for the OS (stage and riskScore, P < 0.001; gender, P < 0.01) (Figures 3D,E). Furthermore, the female and Stage 4 LC patients had poorer prognosis compared to the male patients and those at other tumor stages, respectively (Supplementary Figures S4A-C). Interestingly, patients in the low risk subgroup stratified further by gender (female, P < 0.001; male, P < 0.05) or age (≤65, P < 0.01; >65, P < 0.05) had relatively longer OS compared to those in the high risk subgroup (Figures 3F-I), whereas the tumor stage was not affected by the riskScore (Supplementary Figures S4D-G).

Validation of the Risk Signature in TCGA Cohort
In the TCGA dataset, METTL3, YTHDC2, and YTHDF2 were also significantly upregulated in the LC patients relative to the controls (Supplementary Table S2), and the riskScore was an independent prognostic factor for OS in this cohort ( Figure 4A). We also stratified these patients into the high and low riskScore groups as with the LIRI-JP cohort, and observed significantly poorer prognosis in the former (P < 0.01) ( Figure 4B). Furthermore, the low riskScore group had longer OS compared to the high riskScore group in the Asian cohort Frontiers in Genetics | www.frontiersin.org  (P < 0.01) ( Figure 4C). Although there was no significant difference, a trend of better survival in the low risk group was observed in the non-Asian cohort ( Figure 4D). The AUC values showed that the riskScore had moderate predictive ability for 1-, 2-, and 3-year OS in the TCGA dataset (Supplementary Figure S5A), and the AUC values in the Asian cohort were higher than those in the non-Asian cohort (Supplementary Figures S5B,C).

Construction and Validation of Nomogram
The 231 LC patients in the LIRI-JP dataset were arbitrarily separated into the training (n = 116) and validation cohort (n = 115) with a 5:5 split ratio (seeds = 100). In the training cohort, all independent prognostic factors were included in the predictive nomogram for OS (Figure 5A), and the points for each predictor are listed in Supplementary Table S3. The calibration curves indicated good congruency between the predicted and observed 3-year OS ( Figure 5B). The Harrell's concordanceindex (C-index) and 3-year AUC value of the nomogram were 0.797 and 0.822, respectively, which were higher compared to that of the riskScore, gender, or tumor stage ( Figure 5C and Table 2). Similar outcomes were obtained in the validation as well as the entire cohort (Figures 5D,E and Table 2). In addition, DCA curves showed a greater threshold of the nomogram compared to the riskScore or tumor stage (Figure 5F), indicating that the nomogram has greater discriminatory capacity and accuracy for predicting survival compared to the other factors.

The m6A-Related Nomogram Has High Predictive Power
Immune checkpoint proteins including the programmed cell death protein 1 (PD-1/PDCD1), programmed death-ligand-1 (PD-L1/CD274), and cytotoxic T-lymphocyte associated antigen 4 (CTLA-4) are established prognostic markers for LC patients (Cariani and Missale, 2019;El Dika et al., 2019;Johnston and Khakoo, 2019). Recent studies showed that tumor mutation burden (TMB) is also significantly associated with the susceptibility to anti-tumor immunotherapy, and a higher TMB indicates better prognosis in many cancer types (Peng et al., 2019;Wang and Li, 2019). The m6A-related gene YTHDF1 was also closely related to the prognosis of HCC in the TCGA dataset in a previous study (Zhao et al., 2018). We compared the AUC values of our established nomogram with these biomarkers, and found that the predictive power of the nomogram was superior for 1-, 2-, and 3-year OS in the LIRI-JP dataset (Figure 6). Finally, pathway enrichment analysis by Metascape 4 indicated that METTL3, YTHDC2, and YTHDF2 and their 100 most strongly correlated co-expressed genes were enriched for functions like mRNA processing, DNA repair, covalent chromatin modification, and regulation of the cell cycle, which are closely involved in tumorigenesis (Figure 7).

DISCUSSION
Although numerous genes and non-coding RNAs associated with LC progression have been identified in recent years (Tsuei et al., 2004;Yuan et al., 2014;Chua et al., 2015;Li et al., 2016;Mattu et al., 2016;Xiao et al., 2016;Zhang et al., 2016;Liu et al., 2017;Zhao et al., 2018;Zhou et al., 2019), the prognosis of the patients remains disappointing. Therefore, it is imperative to identify novel and reliable prognostic biomarkers or models in order to improve the clinical outcomes of LC patients. LC is a highly heterogenous cancer, and patient prognosis depends significantly on the geographical region and etiology. Chronic infection with the hepatitis virus is a major risk factor of LC in East Asia, whereas alcohol consumption and non-alcoholic fatty liver disease are the main causes in Western countries. We analyzed the gene expression data of East Asian LC patients from the ICGC LIRI-JP dataset in order to determine the prognostic potential of m6A-related genes in LC. We found that six m6Arelated genes were significantly associated with the malignant progression and prognosis of LC, and a risk signature consisting of three of these genes was predictive of the prognosis. We used consensus clustering to stratify the patients into two subgroups based on the expression of m6A-related genes, which showed significant differences in OS and the enriched pathways associated with tumor development and progression. The prognostic risk model also stratified the patients in the LIRI-JP cohort into two groups based on the 3-gene riskScore, which showed greater predictive performance compared to single clinical indicators. Multivariate Cox analysis revealed that the riskScore was an independent prognostic factor for LC in the LIRI-JP and LIHC datasets. The nomogram, constructed using the riskScore and clinicopathological features, further increased the predictive power for OS compared to the riskScore, immunotherapy-related genes, or TMB alone. Interestingly, in the TCGA dataset, the riskScore was able to make a distinction for the OS in the Asian cohort, but not in the non-Asian cohort. This difference may be due to the fact that risk factors for LC differ across ethnicities.
The three genes (YTHDC2, YTHDF2, and METTL3) incorporated in the prognosis risk model were upregulated in the LC patients in both LIRI-JP and LIHC datasets, which are similar to those of previous studies (Yuan et al., 2014;Chen et al., 2018). YTHDF2 and METTL3 have previously been reported as tumor suppressors in HCC, and as oncogenes in pancreatic cancer and acute myelocytic leukemia (Cui et al., 2017;Wang et al., 2017;   Zhong et al., 2019). Chen et al. (2018) demonstrated that overexpression of METTL3 in HCC patients have poor prognosis. Further, knockout of METTL3 suppresses HCC tumorigenicity and lung metastasis by modulation of cytokine signaling 2 through a YTHDF2-dependent mechanism . Zhong et al. (2019) demonstrated that YTHDF2 acts as an HCC suppressor via promoting the degradation of epidermal growth factor receptor mRNA. Hou et al. (2019) reported that a high expression of YTHDF2 gives rise to a better prognosis of HCC patients, and represses tumor growth and angiogenesis by degradation of interleukin 11 and serpin family E member 2 mRNAs. Tanabe et al. (2014) reported that YTHDC2 acts as a tumor suppressor in the LC cell line by perhaps recruiting c-Jun and activating transcription factor 2 to the YTHDC2 promoter. The above three m6A-related genes may affect HCC growth and metastasis by regulating the stability of multiple target genes.
Recent studies showed that m6A-related genes could be potential prognostic markers for predicting patient survival in a variety of cancers. These genes significantly correlated and interacted with each other which indicated that the cross-talk exists among the m6A-related genes . Because of a complex reciprocal regulatory relationship among the m6Arelated genes, it seems necessary to analyze prognostic and predictive values using a signature comprised of multiple m6Arelated genes in patients with distinct tumor types. Kandimalla et al. (2019) reported that a gene expression signature consisted of seven m6A-related regulators characterized as a robust prognostic and predictive signature in 13 human cancers including HCC (relapse-free survival). This study offered a landscape of the biological and clinical characteristics pertaining to m6A machinery in tumor patients (Kandimalla et al., 2019). However, the external validation cohort was applied to colorectal cancer, gastric cancer, breast cancer, and ovarian cancer, but not to HCC for survival analysis. In our study, we successfully established a prognostic signature comprised of three m6Arelated genes for predicting survival of HCC patient, using an additional RNA-seq dataset as external validation avoiding biased results to some extent.
There were some limitations in this study. First, an additional LC patient cohort for a prognostic study was needed to validate the predictive power of our prognostic signature in the future. Second, experimental studies that focus on the molecular mechanisms remain necessary to investigate the functions of these m6A-related genes in LC.
In summary, m6A-related genes have a prognostic value in LC, and the constructed riskScore can identify patients who are high risk and can enable individualized therapy. Our findings have to be validated in larger cohorts, and further studies are also needed to elucidate the mechanism of these m6A-related genes in LC.

DATA AVAILABILITY STATEMENT
The data used to support the results of this study are from the public databases ICGC (International Cancer Genome Consortium, https://icgc.org/) and TCGA (The Cancer Genome Atlas, https://cancergenome.nih.gov/).

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
CH designed the research study. WW, BS, YX, and SS analyzed the data and performed the data analysis. WW wrote the manuscript and interpreted the data. CH helped to revise the manuscript. All authors read and approved the final manuscript.