Comprehensive Pan-Cancer Genomic Analysis Reveals PHF19 as a Carcinogenic Indicator Related to Immune Infiltration and Prognosis of Hepatocellular Carcinoma

Background As a crucial constituent part of Polycomb repressive complex 2, PHD finger protein 19 (PHF19) plays a pivotal role in epigenetic regulation, and acts as a critical regulator of multiple pathophysiological processes. However, the exact roles of PHF19 in cancers remain enigmatic. The present research was primarily designed to provide the prognostic landscape visualizations of PHF19 in cancers, and study the correlations between PHF19 expression and immune infiltration characteristics in tumor microenvironment. Methods Raw data in regard to PHF19 expression were extracted from TCGA and GEO data portals. We examined the expression patterns, prognostic values, mutation landscapes, and protein-protein interaction network of PHF19 in pan-cancer utilizing multiple databases, and investigated the relationship of PHF19 expression with immune infiltrates across TCGA-sequenced cancers. The R language was used to conduct KEGG and GO enrichment analyses. Besides, we built a risk-score model of hepatocellular carcinoma (HCC) and validated its prognostic classification efficiency. Results On balance, PHF19 expression was significantly higher in cancers in comparison with that in noncancerous samples. Increased expression of PHF19 was detrimental to the clinical prognoses of cancer patients, especially HCC. There were significant correlations between PHF19 expression and TMB or MSI in several cancers. High PHF19 levels were critically associated with the infiltration of myeloid-derived suppressor cells (MDSCs) and Th2 subsets of CD4+ T cells in most cancers. Enrichment analyses revealed that PHF19 participated in regulating carcinogenic processes including cell cycle and DNA replication, and was correlated with the progression of HCC. Intriguingly, GSEA suggested that PHF19 was correlated with the cellular components including immunoglobulin complex and T cell receptor complex in HCC. Based on PHF19-associated functional gene sets, an eleven-gene prognostic signature was constructed to predict HCC prognosis. Finally, we validated pan-cancer PHF19 expression, and its impacts on immune infiltrates in HCC. Conclusion The epigenetic related regulator PHF19 participates in the carcinogenic progression of multiple cancers, and may contribute to the immune infiltration in tumor microenvironment. Our study suggests that PHF19 can serve as a carcinogenic indicator related to prognosis in pan-cancer, especially HCC, and shed new light on therapeutics of cancers for clinicians.


INTRODUCTION
Cancer is a major concern regarding public health and the primary cause of death worldwide, and the incidence and mortality are rapidly increasing globally (1). Although cancer treatment has improved substantially over the last decades and currently allows cures for many previously fatal cases, large quantities of patients still experienced therapeutic failure and succumbed to cancer (2). Accordingly, there is a dire need to clarify the molecular mechanisms elucidating patterns of cancer pathogenesis and to identify reliable biomarkers for the early detection, diagnosis and treatment of cancers (3). Since the first human genome sequencing in 2001, comprehensive genomic characterization of tumors has become a major goal in the field of cancer research, and recent advances in sequencing technologies and computational analytical methods have revolutionized cancer research studies (4). Large-scale genomics projects like The Cancer Genome Atlas (TCGA) database, and the public repository named the NCBI Gene Expression Omnibus (GEO), provide matched molecular and clinical data of various cancers, which helps systematically analyze the survival impact of single gene expression. Currently, the application of cancer biomarkers has aroused great interest among scientists, which encourages researchers to explore novel prognostic biomarkers and therapeutic targets.
Polycomb group (PcG) proteins, as a class of widely-studied epigenetic modifiers, form large multiprotein complexes that serve as chromatin-modifying or -remodeling enzymes and participate in maintaining cell identity and cell differentiation, by keeping the transcriptional repression of functional genes which regulate developmental processes or cell-cycle progression (5,6). Dysregulation of PcG proteins was reported to play pivotal roles in the anomalous activation of cellular differentiation, carcinogenesis, cancer development and progression (7). PcG proteins generally assemble in two functionally distinct Polycomb repressive complexes (PRCs) referred to as PRC1 (responsible for H2AK119 monoubiquitylation) and PRC2 (catalyzing H3K27 methylation) (8). The PRC2 core formed by enhancer of zeste homolog 1/2 (EZH1/2), suppressor of zeste 12 (SUZ12), the embryonic ectoderm development (EED) and retinoblastoma-binding protein 4/7 (RBBP4/7), can interact with several substoichiometrical accessory proteins that modulate its function, including Polycomb-like (PCL) proteins (9,10). The Pcl gene was initially identified in Drosophila melanogaster (11), and three mammalian homologs of Drosophila Pcl have been characterized to date, termed PCL1 [also named PHD finger protein 1 (PHF1)], PCL2 [also named Metal response element binding transcription factor 2 (MTF2)], and PCL3 [also named PHD finger protein 19 (PHF19)], respectively (12). These PCL proteins are PRC2-relevant factors that form sub-complexes with PRC2 core components, and regulate the enzyme activity of PRC2 or its recruitment to the target loci (13).
PHD finger protein 19 (PHF19), namely PCL3, is an critical component of PRC2 that acts as a transcriptional repressor of several developmentally regulated genes and functions as a pivotal regulator of various biological processes (14). PHF19 protein contains a single Tudor domain followed by two plant homeodomain (PHD) fingers and an extended homologous (EH) domain, and binds trimethylated histone H3 Lys36 (H3K36me3) through its Tudor domain (15,16). Direct recognition of H3K36me3 by PHF19 is a requisite for the complete enzyme activity of PRC2 complex and serves to recruit PRC2 and H3K36me3 demethylases NO66 or KDM2b to specific genomic loci to facilitate the removal of H3K36me3 active mark and deposition of histone H3 Lys27 trimethylation (H3K27me3) (16,17). Previous researches have elucidated that PHF19 is overexpressed in multiple cancerous tissues compared with the normal tissue counterparts. For instance, PHF19 expression is present in all subgroups of multiple myeloma (MM) and is preferentially upregulated in high-risk MM (18). Aberrant overexpression of PHF19 has also implicated in gastric cancer, associated with cancer cell differentiation and poor prognosis for patients (19). Significantly elevated in the advanced stages of Glioblastoma (GBM), PHF19 was reported to block the degradation of b-catenin via transcriptional repression of SIAH1 and promote the progression of GBM (20). Tissue microarray analysis of surgically resected paired colorectal cancer (CRC) samples showed that PHF19 protein was overexpressed in CRC tissues compared with paired adjacent normal tissues (21). Nevertheless, despite the efforts to understand the roles of PHF19 in multiple cancers, a comprehensive analysis that determines the genetic targets and mis-regulated pathways controlled by PHF19 has not been reported so far, and the molecular contributions of PHF19 remain elusive.
In the current research, we conducted a integrative pancancer analysis of tumor samples from public databases. We investigated the expression patterns of PHF19 in normal tissues, various cell lines and cancers, and estimated the prognostic values of PHF19 in pan-cancer based on multiple databases. Besides, we explored the links between PHF19 expression and tumor mutation burden (TMB), microsatellite instability (MSI), immune checkpoints and immune infiltration, and identified the specific genes and signaling pathways involved in the regulation of cancer development by PHF19. Finally, due to the fact that functional enrichment analysis of PHF19 was obviously correlated with hepatocellular carcinoma (HCC), we constructed a PHF19-related prognostic risk-score model for HCC patients and performed a validation of this model in an external dataset. These findings may have important implications in guiding basic research as well as clinical practice.

Data Acquisition and Processing
As a landmark cancer genomics project, TCGA molecularly characterized more than 20,000 primary cancer and corresponding normal tissues across 33 cancer types (22). In our analysis, TCGA transcriptome RNA-seq data and clinical information were downloaded using the UCSC Xena platform (23). Transcripts per million (TPM) and fragments per kilobase million (FPKM) were used for quantification and comparison. Besides, two liver hepatocellular carcinoma (LIHC) cohorts and matched clinical data used in our study were respectively obtained from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/) and the GSE14520 dataset from the GEO database (24).

Patients and Clinical Specimens
All the biospecimens are provided by Nanjing multicenter biobank, biobank of Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School. Written informed consents were obtained from all subjects, and normalized ethnic audit has been proceeded.

Analysis of PHF19 mRNA Expression Profiles
The mRNA expression profiles of PHF19 in major tissues and organs of human body were explored in the Human Protein Atlas (HPA), as well as the single cell transcriptomics analysis (25). Transcript levels of PHF19 in different cancers were analyzed using the ONCOMINE database (26), under the settings of P-value = 0.001 and fold change (FC) = 1.5, and in the "Gene_DE" module of TIMER2.0 database (27). Differential mRNA expression analysis of normal and tumor samples, and pathological stage analysis of PHF19, were performed in the "Single Gene Analysis" module of GEPIA (28). "Expression on Box Plots" module was used to depict box plots of expression differences between tumors and matched normal samples of the GTEx database, with the thresholds set as a P-value cutoff of 0.01 and log 2 FC cutoff of 1, and "Match TCGA normal and GTEx data" was set. The log 2 (TPM + 1) data was applied for log-scale.

Survival Analysis
Cox regression analysis for TCGA datasets was performed using RStudio software (version 1.2.5042) with the "survival" and "forestplot" package to investigate the correlation between PHF19 expression and cancer prognosis, including overall survival (OS) and disease-specific survival (DSS). We calculated the log-rank P-value and hazard ratio (HR) with 95% confidence intervals (95% CI) via the "survival" package and utilized the "forestplot" package to visualize the survival analysis. The Kaplan-Meier plotter, which is a web database aiming to evaluate the effect of 54,000 genes on survival in 21 tumor types (29), was used to determine PHF19 expressionassociated OS outcomes of patients. Additionally, the GEPIA2 database was also utilized to determine the correlation between PHF19 mRNA expression and OS and disease-free survival (DFS) of cancers (30).

Genomic Alterations and Mutation Profiles
Based on the cBioPortal tool (http://www.cbioportal.org/), PHF19 mutation frequency and general mutation count in cancer patients were calculated to analyze the genomic alterations of PHF19 in various TCGA cancer types (31). The genome alterations of PHF19 included copy number amplification, deep or shallow deletion, missense mutation with uncertain significance and mRNA upregulation. Tumor mutation burden (TMB) is calculated as total somatic nonsynonymous mutation counts in coding regions and emerging as a biomarker for predicting immunotherapy effect. Microsatellite instability (MSI) refers to the nucleotide insertions or deletions in the microsatellite loci. The TMB and MSI scores were obtained from TCGA database and analyses regarding association between PHF19 expression and TMB or MSI were conducted by R language.

Immune Infiltration Analysis
The ESTIMATE algorithm (32), which is a method that infers the fraction of immune and stromal cells in tumor samples via analysis of gene expression signatures, was applied to evaluate the immune cell infiltration levels (ImmuneScore) and the abundance of stromal components (StromalScore) for each TCGA sample in the RStudio software with the "estimate" package. The relationships of PHF19 expression with immune or stromal scores in several cancers were visualized as scatter plots. Higher ImmuneScore or StromalScore indicated larger proportion of immune or stromal components in tumor microenvironment (TME).

Enrichment Analysis
The protein-protein interaction (PPI) network was established applying the Search Tool for the Retrieval of Interacting Genes (STRING) with the following input parameters: "evidence", "experiments", and 0.200 confidence level (34). The protein interaction file from STRING database was imported into the Cytoscape software (version 3.8.2) for PPI network construction, visualization and analysis (35). Besides, we adjusted the parameter "minimum required interaction score" to conformity = 0.150 and set the parameter "max number of interactors to show" as "no more than 50 interactors", in order to get access to experimentally determined PHF19binding proteins.
"Similar Genes Detection" function of GEPIA2 database was utilized to acquire the first 100 PHF19-correlated genes based on the TCGA and GTEx datasets. "Correlation Analysis" module of GEPIA2 was applied to compute pair-wise gene expression correlations between PHF19 and selected genes, using the Pearson correlation method. We also used the "Gene_Corr" function of TIMER2.0 database to acquire the heatmap data of corresponding genes, containing the partial correlation coefficient (cor) and P-value calculated by the purity-adjusted Spearman's rank correlation test. Meanwhile, the Venn diagram was generated by a Venn diagram tool (http://bioinformatics. psb.ugent.be/webtools/Venn/) to perform the intersection analysis of the PHF19-binding and associated genes. These two sets of genes were combined to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis by Metascape portal, which is designed to offer a comprehensive gene list annotation and analysis resource for experimental biologists (36). The resulting enriched pathways were visualized using the "ggplot2" R package. Besides, we conducted the Gene Ontology (GO) analysis to access the molecular functions (MF) via the "clusterProfiler" R package and the result was visualized using the cnetplot function.
CancerSEA is a dedicated database that portrays single-cell functional status maps that involve fourteen functional states of more than 40,000 single cells across 25 cancer types, aiming at comprehensively decoding distinct functional states of cancer cells at single-cell resolution (37). In the present research, the CancerSEA database was applied for the functional analysis of PHF19. Further gene set enrichment analysis (GSEA) was performed to identify the significant pathways between low expression and high expression group of PHF19, and the top four terms of GO analysis and transcription factor targets were exhibited, using the "clusterProfiler" R package.

Construction and Evaluation of Prognostic Risk Model
Forty genes were extracted from the PHF19 functionally associated gene set obtained by GO-MF analysis as described above, and the "limma" R package was used to determine differentially expressed genes (DEGs) between TCGA HCC samples and normal controls. We conducted univariate and multivariate Cox regression analyses by the "survival" package, and performed LASSO regression using the "glmnet" package to acquire the most useful predictive genes. The risk assessment model was constructed based on the corresponding coefficients and then applied to patients to generate the risk score of each patient. Patients in TCGA LIHC cohort and GSE14520 cohort were respectively separated into high-and low-risk groups in accordance with the median value of risk scores. We evaluated the predictive capability of the risk model by survival analysis and Receiver Operating Characteristic (ROC) curves. Univariate and multivariate Cox regression analyses were then conducted to confirm the prognostic efficiency of the risk-score model, as well as other clinicopathological features. Nomograms were formulated by using the "rms" R package in RStudio.

Statistical Analysis
For experimental studies, at least three biological replicates were repeated. Data were shown as average values ± SEM. The P value was calculated using GraphPad Software.

PHF19 Expression Profiles in Human Normal Tissues and Cancers
To determine the expression profiles of PHF19 in human normal tissues, we investigated the mRNA expression patterns of PHF19 in various non-tumor tissues and single cell types based on publicly available genome-wide expression data. As shown in Figure 1A, among all detected tissues and cell types, the highest PHF19 expression was observed in the monocytes, followed by the bone marrow and tonsil, based on the Consensus dataset created by integrating the data from three transcriptomics datasets (HPA, GTEx and FANTOM5). Low RNA tissue specificity was indicated since PHF19 was expressed in all tissues tested, with the consensus normalized expression (NX) in the vast majority of tissues > 1. With regard to RNA blood cell type specificity, interestingly, the PHF19 mRNA expression was obviously enriched in non-classical monocytes, when analyzing in the HPA/Monaco/Schmiedel datasets ( Figure 1B). Non-classical monocytes express CD14 low CD16 + antigen and constitute about 10%-15% of blood monocytes (38). Our result indicated that PHF19 might be implicated in specific functions of these monocytes.
We next retrieved PHF19 mRNA expression levels over a cancer-wide range via the ONCOMINE database. The results determined that compared with that in the corresponding normal groups, PHF19 expression was higher in cancer tissues, such as brain and CNS cancer, breast cancer, colorectal cancer, gastric cancer, head and neck cancer, liver cancer, lung cancer, lymphoma, melanoma, pancreatic cancer, and sarcoma ( Figure 1C). Yet in certain studies, PHF19 expression was lower in bladder cancer, breast cancer, leukemia, ovarian cancer, and prostate cancer. To further evaluate the expression status of PHF19 spanning various cancer types, we analyzed the TCGA RNA sequencing data by applying the TIMER2.0 approach. As presented in Figure 1D, PHF19 expression was significantly elevated in multiple cancer types, including BRCA (breast invasive carcinoma), CESC (cervical and endocervical cancer), CHOL (cholangiocarcinoma), COAD (colon adenocarcinoma), ESCA (esophageal carcinoma), GBM (glioblastoma multiforme), HNSC (head and neck cancer), KIRC (kidney renal clear cell carcinoma), KIRP (kidney renal papillary cell carcinoma), LIHC (liver hepatocellular carcinoma), LUAD (lung adenocarcinoma), L U S C ( l u n g s q u a m o u s c e l l c a r c i n o m a ) , P C P G (pheochromocytoma and paraganglioma), READ (rectum adenocarcinoma), STAD (stomach adenocarcinoma), and THCA (thyroid carcinoma), compared with their corresponding adjacent non-cancerous tissues. Meanwhile, PHF19 expression was markedly decreased in KICH (kidney chromophobe), PRAD (prostate adenocarcinoma), and UCEC (uterine corpus) than in their respective normal samples. By integrating data from the GTEx database as normal controls, we further performed differential-expression analysis of PHF19 between tumor and normal samples of DLBC (diffuse large B-cell lymphoma), LAML (acute myeloid leukemia), LGG (lower grade glioma), OV (ovarian serous), SKCM (skin cutaneous melanoma), and THYM (thymoma) ( Figure 1E). Besides, we further evaluated the correlation of PHF19 expression with cancer pathological stages, including ACC (adrenocortical carcinoma), BLCA (bladder urothelial carcinoma), KICH, KIRC, LIHC, LUSC, and THCA ( Figure 1F). The results determined a positive relationship between PHF19 level and advanced tumor stages.

Multifaceted Prognostic Analysis of PHF19 in Cancers
To investigate the clinical significance of PHF19 in cancer patients, we downloaded the TCGA mRNA sequencing and clinical information of 33 cancer types from the UCSC Xena platform and calculated the correlations of PHF19 expression with overall survival (OS) and disease-specific survival (DSS) of patients using the univariate Cox survival analysis. As shown in Figure 2A, the forest plots suggested that elevated PHF19 expression was significantly associated with worse OS in ACC (HR = 4.18, P < 0. showed that high expression of PHF19 was strongly associated with poor patient outcomes in multiple cancer types, which suggested that PHF19 may serve as a potential prognostic biomarker in pan-cancer. Of note, and in contrast, increased PHF19 expression was implicated in prolonged OS in THYM (HR = 0.32, P = 0.011).
Kaplan-Meier (KM) survival curves comparing PHF19 high and low expressing patients were also constructed to further evaluate the prognostic potential of PHF19 via the Kaplan-Meier plotter database. The results revealed that high PHF19 expression predicted worse OS in BC (bladder carcinoma), KIRC, KIRP, LIHC, LUAD, LUSC, and UCEC, nevertheless, patients with higher PHF19 expression showed remarkably improved OS in HNSC, SARC, THCA, and THYM (all logrank P values < 0.05) ( Figure 2B). We next compared the survival contribution of PHF19 in multiple cancer types, estimated using Mantel-Cox test through the GEPIA2 database, and the survival maps accompanied with OS curves and disease-free survival (DFS) curves are presented in Figure 2C. High transcriptional levels of PHF19 were linked to unfavorable prognosis in OS of ACC, LIHC, MESO, and SKCM, and DFS analysis data showed that elevated PHF19 level was related to unfavorable prognosis for ACC, LGG, LIHC, and UVM (uveal melanoma) (all log-rank P values < 0.05). Overall, the above data indicated that PHF19 expression was significantly correlated with patient prognosis in various cancers, especially in LIHC, and the relevance of PHF19 to clinical outcomes may shed new light on the underlying pathogenesis of different tumors.

Mutation Landscape of PHF19 in Cancers
We inspected the genomic alterations and mutation profiles of PHF19 in the TCGA cancer cohorts by employing the cBioPortal database. As presented in Figure 3A, the highest alteration frequency of PHF19 appeared in UCEC patients with "mutation" as the predominant type, while the "amplification" type of copy number alteration (CNA) and copy number "deep deletion" were respectively the primary type in KICH and THCA. Besides, we detected altogether 72 mutation sites including 65 missense, 4 truncating, 2 inframe, and 1 fusion mutation between amino acids 0 and 580, and the types, sites and case number of PHF19 genomic alterations were shown in Figure 3B. We also analyzed the general mutation count of PHF19 in 10953 patients/10967 samples from TCGA datasets ( Figure 3C). In addition, we investigated the association between PHF19 alteration and the clinical outcomes of UCEC cases, and found that UCEC patients with altered PHF19 showed improved prognosis in terms of progression-free survival (PFS) (log-rank P = 0.035), but not OS (log-rank P = 0.077), DFS (log-rank P = 0.101), and DSS (log-rank P = 0.224), compared with those without PHF19 alteration ( Figure 3D). Meanwhile, since TMB and MSI are regarded as critical factors impacting on oncogenesis and progression of tumors, and affecting response to immunotherapy in cancers, we next performed association analyses between PHF19 expression and TMB/MSI spanning all

PHF19 Expression Correlates With Tumor Immune Infiltration
Tumor-infiltrating immune cells, as principal compositions of the TME, are frequently involved in tumor behaviors including cancer initiation, progression or metastasis, and are deemed as independent predictors of sentinel lymph node status and cancer prognosis (39). Given that PHF19 expression correlates with TMB and MSI which affect response to cancer  immunotherapy, we next explored the correlations between PHF19 expression level and the abundance of immune cell infiltrates. By adopting the ESTIMATE method, we first computed the immune and stromal scores of cancer tissues. As Figure 4A indicated, PHF19 was correlated with the immune and stromal scores in KIRC, PRAD and THCA (all data Pvalues < 0.001). Since immune checkpoint-associated genes participate in the immunosuppressive mechanism that allows tumor cells to escape anti-tumor immunity (40), we next investigated the correlations between PHF19 expression and immune checkpoint-related genes, including BTLA, CD27, CD274, CD276, CD28, CD40, CD70, CD80, CD86, CTLA4, HAVCR2, HHLA2, ICOS, ICOSLG, IDO1, IDO2, LAG3, PDCD1, TIGIT, TNFRSF9, and TNFSF9 across human cancers from the TCGA cohorts, as shown in Figure 4B. Our results suggested that PHF19 expression was closely associated with almost all immune checkpoint-associated genes in BLCA, BRCA, HNSC, LIHC, PRAD and THCA, implying that PHF19 might conduce to immune escape in these tumors. Further, we calculated the correlation coefficients of PHF19 expression and immune infiltration levels by employing the TIDE and XCELL algorithms, and depicted the landscape of PHF19 correlating with immune cell infiltrates in various TCGA cohorts. The heatmap exhibited that PHF19 expression was positively and statistically significantly correlated with the immune infiltration of myeloidderived suppressor cells (MDSCs) and Th2 subset of CD4+ T cells in the majority of cancers ( Figure 4C). Intriguingly, PHF19 expression was also positively relevant to the infiltration abundance of CD4+ Th1 cells in 18 cancer types, with all correlation coefficients < 0.45, and negatively relevant to that in PRAD, with the correlation coefficient = -0.29. Representative scatter plots of MDSC infiltration level related to PHF19 expression were presented in Figure 4D, using the TIDE algorithm (with the correlation coefficient > 0.5). The results indicated that PHF19 expression was obviously positively correlated with the infiltration abundance of MDSCs in ACC (Cor = 0.645, P = 7.57e-10), KICH (Cor = 0.542, P = 3.15e-06) and LIHC (Cor = 0.669, P = 3.43e-46). As shown in Figure 4E,

Enrichment Analysis of PHF19-Related Partners
To further decipher the underlying molecular mechanisms by which PHF19 contributes to carcinogenesis, we next investigated the available experimentally confirmed PHF19binding proteins and PHF19 expression-associated genes for pathway enrichment analyses. In total, 50 PHF19-interacted proteins were retrieved from the STRING database by experimental evidence, and the PPI network of proteins with confidence level > 0.200 was presented as Figure 5A. We next acquired the top 100 genes that associated with PHF19 expression based on TCGA and GTEx datasets by utilizing the GEPIA2 database. As seen in Figure 5B, The correlation heatmap showed that PHF19 was positively related to the above genes in the majority of TCGA cancers ( Figure 5C). Besides, we performed the Venn intersection analysis between the two datasets described above and identified a common member, namely EZH2 ( Figure 5D). Further, these two datasets were combined to perform KEGG pathway and GO molecular function (MF) enrichment analyses. As presented in Figure 5E, several pathways including "p53 signaling pathway", "microRNAs in cancer", "hepatocellular carcinoma", "apoptosis", "DNA replication" and "cell cycle" were revealed as the most significantly enriched KEGG pathways, indicating that PHF19 was crucially involved in the development and progression of cancers, especially HCC. Meanwhile, the GO-MF enrichment analysis confirmed that five terms were highly enriched, such as histone binding, catalytic activity acting on DNA, DNA-dependent ATPase activity, helicase activity and ATPase activity ( Figure 5F). We also performed single-cell analysis by using CancerSEA database, and determined that PHF19 clearly stimulated a multitude of carcinogenic processes, including promotion of the cell cycle, DNA damage, epithelial to mesenchymal transition (EMT), invasion, and proliferation in different cancer cell types ( Figure 5G). To gain insight into the potential effect of PHF19 on HCC progression, we then downloaded the LIHC RNA-seq data from TCGA portal and performed GSEA analysis based on PHF19 expression level to identify the relevant pathways and underlying mechanisms. Enrichment score (ES) was calculated to compare the enrichment of genes in a ranked list. Our results indicated that PHF19 was significantly enriched in humoral immune response mediated by circulating immunoglobulin, immunoglobulin production, and pathways related to immunoglobulin complex and T cell receptor complex ( Figure 5H). Moreover, we conducted the transcription factor target analysis, and found that DICER1 (dicer 1, ribonuclease III), GTF3A (general transcription factor IIIA), RUVBL2 (RuvB like AAA ATPase 2), and ZNF704 (zinc finger protein 704) were the main transcription factors participating in the PHF19regulated pathways in HCC.

Construction and Evaluation of Prognostic Risk-Score Model
For assessing the application of PHF19-associated functional gene sets in HCC prognosis, we entered the variables in Figure 5F into a univariable Cox proportional hazard regression to analyze the training set, namely the TCGA LIHC cohort. This strategy led to an optimal eleven-gene prognostic model in HCC, and the formula was applied to calculate the risk score of each patient, as follows: RiskScore = 0.081 * KIF2C + 0.026 * SMARCC1 -0.005 * ASF1B + 0.032 * RBBP4 + 0.031 * MCM6 -0.012 * KIF11 -0.089 * RAD54L + 0.069 * GINS1 + 0.123 * CBX8 + 0.001 * ANP32B -0.084 * SUZ12. To better validate the robustness of the model, GSE14520 cohort was used as the independent external validation dataset.
We first calculated and plotted the prognostic Kaplan-Meier survival curves predicted by this model in both internal and external datasets (Figures 6A, B). The results showed that patients with high risk scores had obviously less survival probability than low-risk patients, which meant the higher the score, the worse the prognosis. The distribution of risk scores, survival statuses, and signature gene expression patterns for HCC patients in training and validation sets were visualized in Figures 6C, D, respectively. In TCGA LIHC cohort, both univariate (HR = 4.01, p < 0.001) and multivariate (HR = 3.71, p < 0.001) Cox regression analyses determined that the prognostic signature was strongly associated with prognosis ( Figure 6E). Moreover, as shown in Figure 6F, the risk score was correlated with prognosis in univariate COX regression model (HR = 1.97, p = 0.004) in GSE14520 cohort, and the multivariate analysis suggested that the risk score was capable to independently predict the prognosis of HCC after adjusting for gender, age, AFP, ALT, tumor size, multinodular, BCLC staging, CLIP staging, and TNM staging (HR = 1.65, p = 0.038). These results suggested that the eleven-gene prognostic signature performed well in predicting the prognosis of HCC patients, and could function as a useful tool to supplement the gold standard for clinical diagnosis. Additionally, we further performed the time-dependent ROC curve analysis to validate the predictive classification efficiencies of risk-score model in HCC, and the area under the curve (AUC) values for 0.5-, 1-, 2-, 3-, and 5-years overall survival were presented in Figure 6G. Finally, as shown in Figures 6H, I, we formulated the prognostic nomograms to anticipate the individualized survival probability based on TCGA LIHC cohort and GSE14520 cohort, which might contribute to efficacy assessment and managing patients.

Validation of PHF19 Expression and Impacts of PHF19 on Immune Infiltrates in HCC
To ensure positive confirmation of pathophysiological roles of PHF19, we applied experimental validation to investigate its clinicopathological characteristics. We performed immunohistochemical (IHC) analyses in 78 cancer samples across BRCA, CESC, CHOL, COAD, ESCA, KIRC, KIRP, LIHC, LUAD, LUSC, PCPG, READ, and STAD, with three pairs of different surgical specimens analyzed per tumor type ( Figures 7A-M). Adjacent or distant noncancerous tissues from the surgical margin were used as the control tissues. We found that the PHF19 protein expressions were significantly higher in tumor tissues in comparison with the control tissues, and the results were quantitated in Figure 7N, which indicated the extensive carcinogenic effects of PHF19.
In view of the prognostic value of PHF19 in HCC, we further studied the impacts of PHF19 expression on immune infiltration, by performing flow cytometry analysis on 15 clinical specimens diagnosed as HCC. MDSCs are a heterogeneous population of cells which expand during cancer, inflammation and infection, with a remarkable ability to suppress T-cell responses, and were defined as CD11b + CD14 -CD33 + in most tumors (41). We first determined the PHF19 mRNA levels in all samples, and patients were ranked according to PHF19 expression, and divided into low-, median-and high-PHF19 expression groups, respectively ( Figure 8A). Significant differences were observed between these three groups. We found that the MDSC infiltration ratios were clearly higher in high-PHF19 expression group than in low-PHF19 group ( Figures 8B, C), and PHF19 expression level was closely related to the degree of immune infiltration of MDSCs ( Figure 8D). Similarly, we detected the infiltration of Th2 subsets of CD4+ T cells in specimens, which were defined as CD4 + IL4 + . The results showed that hardly Th2 subsets can be detected in low-PHF19 expression group ( Figure 8E). Collectively, these data indicated that PHF19 expression had noticeable effects on immune cell infiltration in HCC.

DISCUSSION
The physiological functions of PRC2 complex are subjected to intricate cellular regulation, which is correlated with the enormous complexity of PRC2 components (42). Previous studies have proved that H3K36me3-binding activity is harbored in the Tudor motif of PRC2-associated PCL protein named PHF19, and the Tudor function of PHF19 is also essential for H3K27me3 and repression of previously described 'poised' developmental genes (43). Although investigators have gained some understanding of the regulation of Polycomb activity by PHF19, little is known about whether and how it drives tumor initiation, progression, and metastasis (13). Collectively, as a critical epigenetic related gene, PHF19's potential roles in carcinogenesis and cancer development are worthwhile to be disclosed.
In the present study, we explored the pan-cancer expression profiles of PHF19, and the correlation between PHF19 aberrant expression and patient prognosis in different cancers. Compared with corresponding noncancerous tissues, PHF19 expression was significantly up-regulated across a range of cancers, which implied the extensively oncogenic characteristics of PHF19 in cancers and promising perspectives in the field of cancer research. This present result is consistent with findings of previous study in 2004 when PHF19 was first identified in human tissues and extends the work in important ways (44). COX regression analyses suggested that elevated PHF19   (18), and reduced PHF19 levels in chronic myeloid leukemia cells arrested the cell cycle and promoted differentiation toward erythroid fate (45).
Cancer is a complicated disease involving complex reciprocal networks between tumor cells and the immune system. TME is composed of a variety of cell types, including mesenchymal cells and resident and infiltrating immune cells (46). Our initial exploration demonstrated that aberrant PHF19 expression was correlated with increased immune cell infiltration of MDSCs and Th2 subset of CD4+ T cells in the majority of cancers, which implied potential value of clinical application for PHF19 in cancer treatment. Ample evidence has supported that MDSCs are critical in regulating immune responses under pathological conditions, and play an prominent role in tumor angiogenesis, drug resistance, and promotion of cancer metastases (47). Past literature pointed that the discovery of CD4+ T cell subsetdefining key transcription factors and framing of the Th1/Th2 paradigm ignited the CD4+ T cell field (48). CD4+ T cell subsets, such as Th1, Th2, Th17, and regulatory T (Treg) cells, serve pivotal functions in cancer immunity, among which the Th2 subset of CD4+ T cells secretes IL-4, IL-5, and IL-13, and activates B cells to become antibody-secreting plasma cells (49). It is worth noting that the balance between Th1 and Th2 differentiation is critical for immune homeostasis, and shift of Th1/Th2 balance towards Th2 cells is correlated with the immunosuppression and progression of cancer (50)(51)(52). Previously, researchers found that PHF19 restrained T cell senescence and sustained CD8+ T cell antitumor responses by orchestrating a transcriptional program extensively shared with miR-155 (53). Yet, more exact mechanisms underpinning the effects of PHF19 on tumor immunity remain to be elucidated. Cancer immunotherapies, especially immune checkpoint blockade therapy, have shifted the treatment of cancer by promoting complete and durable responses, and are now standard treatment for various malignant tumors (54). Unfortunately, only a small proportion of patients with certain cancer types respond to immunotherapy, probably due to inadequate immune activation to recognize tumor-specific antigens (55). Therefore, it is essential to identify additional potential therapeutic targets. Our current research showed that PHF19 levels demonstrated strong correlations with a variety of immune checkpoint molecules in BLCA, BRCA, HNSC, LIHC, PRAD and THCA. Moreover, in LIHC cohort, PHF19 was closely correlated with checkpoints including BTLA, CD27 ,  CD274, CD276, CD28, CD40, CD70, CD80, CD86, CTLA4,  HAVCR2, HHLA2, ICOS, IDO1, LAG3, PDCD1, TIGIT,  TNFRSF9, and TNFSF9, indicating that PHF19 serves as a potential immune-related therapeutic target for HCC patients.
In consequence, the present study points new directions for delineating the relationships between the epigenetic related PHF19 gene and immune cell infiltration within the TME, which may have important implications for exploring new strategies for cancer therapy.
As the fourth leading cause of cancer-related mortality globally, HCC imposes a huge health burden on society (56). To better explore new targets for early diagnosis and treatment, there is an urgent need to determine novel prognostic predictors and construct more reliable prognostic models of HCC. Our results provided evidence that elevated PHF19 expression indicated worse clinical outcomes in HCC patients. The GSEA results revealed that PHF19 was associated with the cellular components including immunoglobulin complex and T cell receptor complex in HCC, which provided new ideas for future research. Moreover, after the generation of eleven-gene prognostic signature, we performed a preliminary in silico validation using the external GEO dataset, which proved the effectiveness of the model. Taken together, the present study unveiled the complicated roles of PHF19 aberrant expression in the progression and prognoses of cancers, and summarized the pivotal signaling pathways associated with the pathophysiological functions of this epigenetic related gene. We also demonstrated that PHF19 played important roles in regulating tumor-infiltration of immune cells, and might exhibit beneficial therapeutic effects on cancer treatment. Enrichment Analysis highlighted the potential mechanistic basis of PHF19 in induction of HCC development, and the prognostic signatures derived from PHF19-related functional gene sets were validated to predict the overall survival of HCC independently. While these findings warrant further investigation, our research provides novel insights into the promising application prospects of PHF19 in the field of cancer research.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the human ethics committees of the Affiliated Drum Tower Hospital, Medical School of Nanjing University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
Z-yZ and N-T conceived and designed the study, collected and assembled data, performed data analysis and interpretation, and wrote the manuscript. M-fW and J-cZ collected and assembled data, and performed data analysis and interpretation. J-lW, H-zR and X-lS conceived and designed the study, provided financial support and study material, performed data analysis and interpretation, wrote and gave final approval of the manuscript. All authors read and approved the manuscript.