Genomics and Prognosis Analysis of N6-Methyladenosine Regulators in Lung Adenocarcinoma

Objective: N6-methyladenosine (m6A) modification is involved in modulating various biological processes in human cancers. But the implication of m6A modification in lung adenocarcinoma (LUAD) is still unclear. Hence, this study conducted a comprehensive analysis of the expression and clinical implication of m6A regulators in LUAD. Methods: Consensus clustering analysis of 502 LUAD samples in the TCGA dataset was presented based on the expression profiles of 20 m6A regulators using ConsensusClusterPlus package. Overall survival (OS), activation of signaling pathways and tumor immunity (immune/stromal score, tumor purity, expression of HLA and immune checkpoints, and immune cell infiltration) were compared between m6A modification patterns. The m6A-related genes between patterns were identified and prognostic m6A-related genes were imported into LASSO-cox regression analysis. The m6A risk score was developed and its prognostic implication was evaluated and externally verified in the GSE30219 and GSE72094 dataset. Furthermore, a nomogram that contained independent prognostic indicators was established, followed by external verification. Results: Two m6A modification patterns were clustered across LUAD based on the expression similarity of the m6A regulators via consensus clustering analysis, with distinct OS, activation of signaling pathways and tumor immunity. Totally, 213 m6A-related genes that were identified by comparing two patterns were significantly related to LUAD prognosis. By LASSO method, we constructed the m6A risk score that was a reliable and independent prognostic factor for LUAD. Patients with low m6A risk score displayed a prominent survival advantage. After incorporating independent clinical features, we developed the prognostic nomogram that exhibited high predictive accuracy and the best clinical net benefit for OS. Conclusion: Collectively, our study may provide a clinically useful tool for precise prognostic management and optimization of immunotherapeutic strategies for LUAD patients.


INTRODUCTION
Lung cancer has the high incidence and mortality globally, occupying almost 20% of cancer-related deaths in 2018 (Bray et al., 2018). It was estimated that there were 2.1 million new lung cancer cases and 1.8 million deaths in 2018 (Bray et al., 2018). This disease mainly includes two histological subtypes: non-small cell lung cancer (NSCLC; 85%) and small cell lung cancer (SCLC). NSCLC contains lung adenocarcinoma (LUAD) and lung squamous cell carcinoma . LUAD is the main histology, and its incidence is constantly on the rise. Conventional therapeutic options against NSCLC include surgery resection, chemotherapy, and radiotherapy. Despite the progress in combined and personalized therapies such as tyrosine kinase inhibitors and immunotherapies (PD1/PD-L1 inhibitors), the 5-year survival rate is only 16% (Zhang C. et al., 2020). Diagnosis of LUAD usually occurs at an advanced stage, and most patients experience badly toxic treatment and poor clinical benefit (Schmidt et al., 2019). Hence, it is of importance to explore specific prognostic models for predicting patients' survival, which can assist design appropriate therapeutic strategies and management choice for distinct LUAD subgroups. N 6 -methyladenosine (m 6 A) is the most abundant type of RNA post-transcriptional modification in eukaryotes, which plays a key role in a variety of biological processes by regulating the translation, splicing, stabilization, and degradation of mRNAs . Typically, m 6 A regulators contain three types: writers (including VIRMA, METTL14, METTL3, RBM15, RBM15B, RBMX, WTAP, and ZC3H13), erasers (including ALKBH5 and FTO) and readers (including HNRNPA2B1, HNRNPC, IGF2BP1, IGF2BP2, IGF2BP3, YTHDC1, YTHDC2, YTHDF1, YTHDF2, and YTHDF3) (Fu et al., 2014). Emerging evidence suggests that the 20 m 6 A regulators display tight relationships with LUAD Chao et al., 2020;. For instance, YTHDC2 suppresses LUAD carcinogenesis through inhibiting SLC7A11-dependent antioxidant function . Moreover, FTO accelerates LUAD progression through mRNA demethylation (Ding et al., 2020). ALKBH5 facilitates proliferation and invasion of LUAD cells following intermittent hypoxia (Chao et al., 2020). YTHDF1 is linked to hypoxia adaptation and LUAD progression (Shi et al., 2019). FTO triggers LUAD progression through activating cell migration via mRNA demethylation (Ding et al., 2020). These experimental evidences suggest that an in-depth understanding of m 6 A regulators may deepen our understanding on the role of m 6 A modification in the progression of LUAD. Here, we comprehensively analyzed the expression and clinical implication of m 6 A regulators in LUAD.

Consensus Clustering Analysis
Consensus clustering analysis was carried out utilizing ConsensusClusterPlus package (version 1.48.0) to assign LUAD patients in the TCGA dataset into different m 6 A modification patterns with 50 iterations and resample rate of 80% based on the expression matrix of the 20 m 6 A regulators (Wilkerson and Hayes, 2010). Kaplan-Meier curves of overall survival (OS) were conducted between two m 6 A modification patterns. The survival difference was compared with log-rank test. The t-distributed stochastic neighbor embedding (t-SNE) was presented to validate the accuracy of this classification.

Gene Set Variation Analysis
The activation of pathways was quantified in each LUAD sample from the TCGA dataset by single-sample gene set enrichment analysis (ssGSEA) method derived from GSVA package (version 1.32.0) in an unsupervised manner (Hänzelmann et al., 2013). The gene set of "c2. cp.kegg.v7.2. symbols" was obtained from the Molecular Signatures Database, which was used as the reference set (Liberzon et al., 2015).

Estimation of Tumor Immunity
According to the normalized expression matrix, stromal and immune scores across LUAD samples in the TCGA dataset were estimated via the Estimation of Stromal and Immune Cells in Malignant Tumors Using Expression Data (ESTIMATE) method (https://sourceforge.net/projects/ estimateproject/). (Yoshihara et al., 2013) that was applied for inferring the overall infiltrations of stromal and immune cells in LUAD tissues based on gene symbols. The tumor purity was calculated via ESTIMATE and consensus measurement of purity estimations methods. Tumor immune signatures were assessed in LUAD samples, including the mRNA expression of human leukocyte antigen (HLA) family genes and immune checkpoints. The infiltration levels of immune cells were quantified across LUAD samples based on the published gene signatures utilizing the ssGSEA algorithm (Charoentong et al., 2017;Jia et al., 2018).

Identification of m 6 A-Related Differentially Expressed Genes
The DEGs were screened between two m 6 A modification patterns in the TCGA dataset through limma package (version 3.40.6) (Ritchie et al., 2015). The cut-off was |log2 fold change (FC)|>1 and false discovery rate (FDR) < 0.001. FDR was calculated with Benjamin-Hochberg method. The m 6 A-related DEGs were visualized into volcano and heat maps via pheatmap package (version 1.0.12).

Construction of a Least Absolute Shrinkage and Selection Operator-Cox Regression Model
LASSO represents a regularization and descending dimension method that has been applied for prognostic Cox models. Univariate-cox regression analysis was utilized to assess the correlation between overall survival (OS) of LUAD patients in the TCGA dataset and the m 6 A-related DEGs. The genes with p < 0.05 were input into the LASSO-cox regression model through glmnet package (version 2.0.16) (Engebretsen and Bohlin, 2019). Variable selection was presented for penalizing the data fitting criteria, which reduced the complexity and made the model more interpretable. The coefficient of each variable was the average estimate of the coefficient obtained from 10-fold crossverification. The m 6 A risk score was developed following the formula: risk score n i 1 Coef (i)X (i), where n indicated the number of variables in this LASSO model, Coef 1) represented the regression coefficient, and X 1) meant the mRNA expression levels of variables in LUAD samples. To evaluate the prediction utility of the LASSO model, time-dependent receiver operating characteristic (ROC) curves were conducted by survivalROC package (version 1.0.3) in the TCGA, GSE30219 and GSE72094 datasets, followed by calculation of one, three and 5-year area under curve (AUC). In the two datasets, LUAD patients were separately split into two groups according to the median m 6 A risk score through survminer (version 0.4.9) and survival (version 3.2-13) packages. Kaplan-Meier curves of OS were depicted for two groups via survival package and OS difference was compared with log-rank test. The distribution of survival status in two groups was then visualized. By pheatmap package, heatmap was established to visualize the mRNA expression pattern of each variable in the LASSO model.

Estimation of the Prediction Independency of the m 6 A Risk Score
To estimate whether the m 6 A risk score independently predicted LUAD patients' OS, univariate-and multivariate-cox regression analysis was carried out following adjusting clinical features (gender, stage, T, N and M) in the TCGA, GSE30219 and GSE72094 datasets. Hazard ratio (HR) and p value were calculated for each variable.

Construction of a Nomogram Model
To better apply the m 6 A risk score in clinical practice, a nomogram that included independent prognostic indicators was conducted to predict LUAD patients' one, three and 5year OS in the TCGA, GSE30219 and GSE72094 datasets via rms package (version 6.2-0). Calibration plot was presented to evaluate predictive performance of the m 6 A risk score. Furthermore, decision curve analysis was carried out for calculating the clinical net benefit of every model in comparison to all or none strategies. The none plots indicated the assumption that no subjects had one, three or 5-year OS. Meanwhile, all plots indicated the assumption that each subject had one, three or 5-year OS at specific threshold probabilities. The best model was the one with the highest net benefit.

Statistical Analysis
All statistical analysis was implemented through the R software (version 3.6.3). Wilcoxon test was used for comparison between two groups. p < 0.05 was statistically significant.

Construction of m 6 A Regulators-Mediated m 6 A Modification Patterns in Lung Adenocarcinoma
A total of 502 LUAD samples were clustered based on the expression similarity of the m 6 A regulators via consensus clustering analysis. Our data found that when the number of groups (k) 2, there was an excellent clustering among LUAD samples in the consensus matrix ( Figure 2A). Consensus Red indicated that a specific regulator was a risk factor for a type of cancer and blue indicated that a specific regulator was a protective factor for a type of cancer. (C) Comparison of the expression of m 6 A regulators between LUAD and normal tissues in the TCGA dataset with Wilcoxon test. Ns: not significant; *p < 0.05; ***p < 0.001.
Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 746666 Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 746666 5 cumulative distribution function (CDF) diagram showed that when k value 2, CDF reached an approximate maximum ( Figure 2B). Delta area plot depicted the relative change in the area under CDF curve for k compared to k-1 ( Figure 2C). As shown in tracking plot, when k value 2, sample classification was stable ( Figure 2D). Hence, we clustered LUAD patients into two m 6 A modification patterns, named as C1 (N 318) and C2 (N 184). To further understand the characteristics of m 6 A modification patterns clustered by consensus clustering analysis in LUAD, we firstly analyzed the difference in OS. The data showed that C2 exhibited a more unfavorable OS in comparison to C1 (p 0.00054; Figure 2E). Furthermore, we visualized the expression patterns of the m 6 A regulators in two m 6 A modification patterns. As shown in Figure 2F, IGF2BP2, IGF2BP1, IGF2BP3 had distinctly higher expression in C2 compared to C1. The t-SNE was carried out for verifying whether the categories were appropriate. Our results showed that most of samples from C1 and C2 were separately gathered ( Figure 2G), indicating that the clustering of two m 6 A modification patterns was a relatively good choice. By applying GSVA algorithm, activation of several signaling pathways was quantified in each LUAD sample. We found that E2F targets, G2M checkpoint, MYC targets, mTORC1 signaling, DNA repair and unfolded protein response had higher activations in C2 than C1 ( Figure 2H).

Two m 6 A Modification Patterns Characterized by Distinct Tumor Immunity
The overall infiltration levels of immune and stromal cells were estimated in 502 LUAD samples from the TCGA dataset via ESTIMATE algorithm. Compared to C1, C2 pattern had a significantly decreased immune score (p 0.0025; Figure 3A). But no significant difference in stromal score was detected between m 6 A modification patterns ( Figure 3B). There was significantly increased tumor purity in C2 than C1 (p 0.049; Figure 3C). The mRNA expression of HLA genes and immune checkpoints was compared between m 6 A modification patterns. Most of HLAs were highly expressed in C1 compared to C2, including HLA-E, HLA-  Figure 3D). We also evaluated the differences in mRNA expression of common immune checkpoints between two m 6 A modification patterns. Our results showed that BTLA, CD200R1, CD40LG, CD48, HHLA2, IDO2, LGALS9, TNFRSF14, TNFSF14 and TNFSF15 displayed higher mRNA expression in C1 compared to C2 ( Figure 3E). Meanwhile, C2 pattern had increased mRNA expression of CD200, CD274, CD276, IDO1, LAG3, PDCD1, PDCD1LG2, TNFRSF25, TNFRSF8, TNFRSF9, TNFSF4 and VTCN1 in comparison to C1. The infiltration levels of immune cells were quantified in each LUAD specimen via ssGSEA algorithm. Compared to C2, there were increased infiltration levels of activated B cell, activated CD8 T cell, central memory CD4 T cell, effector memory CD8 T cell, immature B cell, T follicular helper cell, type 17 T helper cell, activated dendritic cell, eosinophil, immature dendritic cell, mast cell and monocyte in C1 ( Figure 3F). The higher infiltration levels of activated CD4 T cell, central memory CD8 T cell, memory B cell, type 2 T helper cell and plasmacytoid dendritic cell were found in C2 than C1.

Identification of DEGs Between Two m 6 A Modification Patterns
To explore the molecular mechanisms underlying two m 6 A modification patterns, we presented differential expression analysis. With the cutoff of |log2FC|>1 and adjusted p < 0.001, a total of 297 genes exhibited abnormal expression between two m 6 A modification patterns ( Figures 4A,B). Among them, 111 genes were down-regulated and 186 were up-regulated in C1 compared to C2 (Supplementary Table  S2). These genes could be affected by m 6 A methylation modification in LUAD.

Development of an m 6 A Risk Score for Lung Adenocarcinoma
Prognostic implications of the 297 m 6 A-related DEGs were assessed via univariate-cox regression analysis. As a result, 213 genes had significant correlations to LUAD prognosis in the TCGA dataset (Supplementary Table S3). Candidate prognostic m 6 A-related DEGs were further screened with LASSO-Cox regression analysis (Figures 5A,B). As a result, 12 candidate m 6 A-related DEGs were identified for constructing a LASSO-cox regression model. Non-zero coefficients and the expression of 12 m 6 A-related DEGs in this LASSO-cox regression model were calculated in the TCGA dataset. The m 6 A risk score formula was as follows: m 6 A risk score 0.0301860610758127 * mRNA expression of ANLN + 0.0263443393604996 * mRNA expression of PLK1 + 0.0576834829109261 * mRNA expression of IGF2BP1 + 0.0236368688862302 * mRNA expression of HMMR + 0.0356239951044486 * mRNA expression of NEIL3 + (-0.00157287764972551) * mRNA expression of SFTA3 + (-0.0250630515726943) * mRNA expression of CXCL17 + (-0.0244060686771379) * mRNA expression of IRX5 + 0.0277060677471147 * mRNA expression of PKP2 + 0.0281636442957098 * mRNA expression of LYPD3 + 0.00917559407956536 * mRNA expression of ABCC2 + 0.157371504648263 * mRNA expression of DKK1. ROC curves were conducted to evaluate whether the m 6 A risk score accurately and sensitively estimated the survival likelihood of LUAD patients in the TCGA dataset. The AUC values of one, three and 5-year OS were separately 0.751, 0.690 and 0.611 ( Figure 5C). These indicated the good predictive performance of the m 6 A risk score. Figure 5D depicted the distribution of the m 6 A risk score across LUAD patients. According to the median m 6 A risk score, patients were split into high-and low-m 6 A risk score groups. The OS difference was compared between groups. As shown in Figure 5E, low m 6 A risk score group displayed a potential survival advantage in comparison to high m 6 A risk score group (p < 0.0001). The distribution of survival status was visualized in Figure 5F. We found that high m 6 A risk score group had the relatively increased number of dead status than low m 6 A risk score group. Figure 5G showed the mRNA expression of 12 variables in the model between high-and low-m 6 A risk score groups. DKK1, PKP2, LYPD3, NEIL3, HMMR, ANLN, PLK1, IGF2BP1 and ABCC2 displayed higher mRNA expression in high-m 6 A risk score group compared to low-m 6 A risk score group.

External Verification of the m 6 A Risk Score in Lung Adenocarcinoma Prognosis
To externally verify the prognostic implication of the m 6 A risk score, we acquired the transcriptome data and follow-up information of 274 LUAD patients from the GSE30219 cohort. The AUC values under one, three and 5-year OS were separately 0.663, 0.677 and 0.694 ( Figure 6A). According to the median m 6 A risk score, we clustered these LUAD patients into two groups ( Figure 6B). High m 6 A risk score group had more patients with dead status than low m 6 A risk score group ( Figure 6C). Consistently, high m 6 A risk score was markedly associated with worse prognosis of LUAD in comparison to low m 6 A risk score (p < 0.0001; Figure 6D). Furthermore, DKK1, PKP2, LYPD3, NEIL3, HMMR, ANLN, PLK1, IGF2BP1, and ABCC2 were highly expressed in high m 6 A risk score group than low m 6 A risk score group ( Figure 6E). We also verified the prognostic significance of the m 6 A risk score in the GSE72094 dataset. The AUC values at one, three and 5-year OS were 0.699, 0.635, and 0.663 ( Figure 6F). As expected, high m 6 A risk score indicated poorer OS than low m 6 A risk score ( Figure 6G). Therefore, the m 6 A risk score possessed the potential in accurately predicting survival outcomes of LUAD patients.

Establishment and Verification of a Prognostic Nomogram for Lung Adenocarcinoma Patients
A nomogram was built for predicting one, three and 5-year OS likelihood of LUAD patients in the TCGA dataset, including the m 6 A risk score and stage that were obtained from multivariatecox regression analysis ( Figure 8A). Calibration diagram demonstrated that there was a high consistency in this nomogram-predicted and actual one, three and 5-year OS probabilities ( Figures 8B-D). Moreover, decision curves suggested that the nomogram showed the best clinical net benefit for one, three and 5-year OS ( Figures 8E-G). The nomogram was externally verified in the GSE30219 cohort (Supplementary Figure S1A-G) and GSE72094 cohort (Supplementary Figure S2A-G).

DISCUSSION
LUAD is usually diagnosed at an advanced stage, characterized by a high mortality . The development of more effective therapeutic strategies requires an in-depth understanding of factors that impact the initiation and progression of LUAD. Recently, several studies have reported the key implications of m 6 A regulators during LUAD development Chao et al., 2020;. For example, Zhuang et al. constructed a diagnostic score model and a prognostic model for LUAD based on m 6 A regulators (Zhuang et al., 2020). Zhou et al. characterized two molecular subtypes with diverse prognosis and tumor microenvironment in LUAD based on m 6 A RNA methylation modification . Wu et al.
developed a five-m 6 A regulatory gene signature as a prognostic biomarker in LUAD patients . Xu et al. proposed m 6 A-related lncRNAs as potential biomarkers for prediction of prognosis and immune response in patients with LUAD . However, more studies should be presented for investigating the biological significance of m 6 A regulators in LUAD progression and prognosis. Hence, this study systematically analyzed the abnormal expression and clinical implications of m 6 A regulators. By consensus clustering analysis, we established two m 6 A modification patterns with distinct survival outcomes based on the expression matrix of 20 m 6 A regulators for LUAD ( Figure 2). Immune evasion represents a "hallmark of cancer," which reflects that immune effector constitutes a key determinant in the tumor microenvironment . Exploring the interactions between tumors and corresponding immune cells may reveal powerful and novel treatment options against LUAD. The distribution of survival status in high-and low-m 6 A risk score groups. Red dots indicated dead status and blue dots indicated alive status in the GSE30219 dataset. (D) Kapan-Meier curves of OS for high-and low-m 6 A risk score groups, followed by log-rank test in the GSE30219 dataset. (E) Hierarchical clustering analysis for the mRNA expression patterns in LUAD samples from high-and low-m 6 A risk score groups in the GSE30219 dataset. (F) ROC curves for assessing the efficacy of the m 6 A risk score in prediction of one, three and 5-year OS in the GSE72094 dataset. (G) Kapan-Meier curves of OS for high-and low-m 6 A risk score groups, followed by log-rank test in the GSE72094 dataset.
Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 746666 Immunotherapies like PD1/PD-L1 inhibitors have become the standard-of-care therapeutic strategy against NSCLC (Cui et al., 2019). But, only 20-30% patients respond to such therapy (Rittmeyer et al., 2017). Limited data are available concerning the interactions between markers and immunotherapy response. Hence, it is necessary to explore and identify effective tumor immunity-related markers for LUAD, thereby reducing the mortality and developing innovative targeted therapeutic options. Compared to C2, C1 had an increased immune score and most of HLAs were highly expressed in C1, indicating that patients in C1 pattern exhibited higher tumor immunity ( Figure 3).
Much progress in genome-wide methods like RNA-seq and microarrays has accelerated the evolution of cancer biomarkerrelated research. Numerous genetic markers of LUAD have been discovered, which are significantly correlated to diagnosis, survival outcomes, and drug resistance. But most of studies are limited to a single marker or a small sample population, leading to the limited accuracy and availability of markers. Furthermore, due to tumor heterogeneity, conventional clinical parameters like TNM staging are difficult to meet the requirement of accuracy and individuation in predicting prognosis. Thus, combined multiple markers or large sample analysis are necessary. Here, we established an m 6 A risk score containing ANLN, PLK1, IGF2BP1, HMMR, NEIL3, SFTA3, CXCL17, IRX5, PKP2, LYPD3, ABCC2, and DKK1 for LUAD prognosis ( Figure 5). High m 6 A risk score indicated reduced OS duration of LUAD patients after external validation in two cohorts ( Figure 6). Following multivariate-cox regression analysis, m 6 A risk score was an independent prognostic indicator of LUAD (Figure 7).
ROC curves confirmed the excellent performance in predicting LUAD patients' outcomes. Previously, ANLN up-regulation is in relation to LUAD metastasis (Xu et al., 2019). ANLN promotes LUAD progression via activating RHOA and involving the PI3K/AKT signaling (Suzuki et al., 2005). PLK1/vimentin pathway promotes immune escape through recruitment of Smad2/3 to PD-L1 promoter in LUAD metastasis (Jang et al., 2021). PLK1 induces migration of LUAD epithelial cells through STAT3 (Yan et al., 2018). IGF2BP1 induces LUAD progression via interaction with circXPO1 . Up-regulation of IGF2BP1 contributes to an unfavorable prognosis of LUAD (Huang et al., 2019). HMMR acts as an oncogene of LUAD and induces tumor progression (Li W. et al., 2020). NEIL3 that is correlated to immune infiltrations serves as an independent indicator for prediction of LUAD survival (Zhao et al., 2021). CXCL17 is an important determinator for LUAD spine metastasis (Liu W. et al., 2020). IRX5 as an oncogene is in relation to LUAD outcomes (Zhang et al., 2018). PKP2 accelerates the development of LUAD through increasing focal adhesion and EMT . Elevated expression of LYPD3 contributes to LUAD carcinogenesis and unfavorable survival outcomes . ABCC2, a multidrug resistance-associated protein, displays an increased expression in LUAD (Maruhashi et al., 2018). DKK1 is an immune-associated prognostic marker in LUAD . Above findings revealed the critical biological implications of the variables in the m 6 A risk score in the progression of LUAD. Furthermore, our data indicated that in comparison to the nomogram established by a single prognostic indicator, the nomogram established by the m 6 A risk score and clinical features might become the best model in prediction of shortand long-term survival of LUAD patients, thereby possibly assisting clinical management and therapy ( Figure 8). However, there are some limitations in our study, as follows: firstly, more information should be provided for internal mechanisms of m 6 A modification; secondly, the prognostic value of the m 6 A risk score should be verified in prospective research.

CONCLUSION
Collectively, this study comprehensively characterized the expression and clinical implication of m 6 A regulators in LUAD. Two m 6 A modification patterns were conducted, with different OS and tumor immunity. Furthermore, we developed the m 6 A risk score, which had high accuracy in predicting LUAD prognosis. Thus, our data may provide a reliable tool for prediction of prognosis and optimization of immunotherapy for LUAD patients.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.