lncRNA Profiles Enable Prognosis Prediction and Subtyping for Esophageal Squamous Cell Carcinoma

Long non-coding RNAs (lncRNAs) have emerged as useful prognostic markers in many tumors. In this study, we investigated the potential application of lncRNA markers for the prognostic prediction of esophageal squamous cell carcinoma (ESCC). We identified ESCC-associated lncRNAs by comparing ESCC tissues with normal tissues. Subsequently, Kaplan–Meier (KM) method in combination with the univariate Cox proportional hazards regression (UniCox) method was used to screen prognostic lncRNAs. By combining the differential and prognostic lncRNAs, we developed a prognostic model using cox stepwise regression analysis. The obtained prognostic prediction model could effectively predict the 3- and 5-year prognosis and survival of ESCC patients by time-dependent receiver operating characteristic (ROC) curves (area under curve = 0.87 and 0.89, respectively). Besides, a lncRNA-based classification of ESCC was generated using k-mean clustering method and we obtained two clusters of ESCC patients with association with race and Barrett’s esophagus (BE) (both P < 0.001). Finally, we found that lncRNA AC007128.1 was upregulated in both ESCC cells and tissues and associated with poor prognosis of ESCC patients. Furthermore, AC007128.1 could promote epithelial-mesenchymal transition (EMT) of ESCC cells by increasing the activation of MAPK/ERK and MAPK/p38 signaling pathways. Collectively, our findings indicated the potentials of lncRNA markers in the prognosis, molecular subtyping, and EMT of ESCC.


INTRODUCTION
Esophageal cancer is one of the most common cancers worldwide, and it ranks seventh and sixth in terms of overall incidence and mortality, respectively (Bray et al., 2018). There are two histologic subtypes including esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (AC) (Short et al., 2017). ESCC occupies over 90% of all esophageal cancer patients and the longterm outcome of ESCC is still limited with 5-year survival rate of 20% (Pennathur et al., 2013;Siegel et al., 2020). In general, prognostic factors may be helpful for better individual risk stratification and clinical outcome prediction. Traditional prognostic factors mainly depend on clinical and pathologic features, such as the history of drinking and smoking, tumor stage, and lymph node metastasis and so on, which have an effect on the overall survival (OS) in ESCC. However, the clinical course of ESCC patients with the same characteristics might be significantly different (Liu et al., 2012). Discovering new prognostic factors that improve the clinical outcome prediction of patients has been an urgent clinical need in ESCC.
As endogenous RNA transcripts with more than 200 nucleotides, long non-coding RNAs (lncRNAs) lack proteincoding potentials (Djebali et al., 2012;Iyer et al., 2015;Quinn and Chang, 2016). They are now known to act as scaffolds to modulate interactions between proteins and genes, as decoys to bind proteins, and as enhancers to regulate transcription of target genes (Huarte, 2015;Schmitt and Chang, 2016;Yue et al., 2016;Zhang et al., 2018;Blank-Giwojna et al., 2019). The discrepant expressions of lncRNAs, which have been demonstrated in various types of malignancies, possess significant regulatory effects on oncogenesis and tumor progression, indicating their potential oncogenic and tumor-suppressive roles (Hu et al., 2014;Li et al., 2018b;Wang et al., 2018). Growing evidence has suggested that unique lncRNA expression profiles can provide important prognostic information for patients with cancer (Cheng, 2018). Till today, several differentially expressed lncRNAs have been proved to be relevant for ESCC prognosis (Xie et al., 2014;Chen et al., 2015;Hu et al., 2015;Wang et al., 2015). However, previously published studies mainly focused on single lncRNAs as prognostic biomarkers. Little or nothing is known about the multi-lncRNA-based signature associated with OS in ESCC.
Our previous study has shown that the serumal and/or urinary exosome lncRNA profiles can be applied to the diagnosis and prognostic prediction of bladder cancer, colorectal cancer, breast cancer, and lung cancer (Li et al., 2017(Li et al., , 2018aXie et al., 2018;Zhan et al., 2018;Zhang et al., 2019). In the present study, we used publicly available The Cancer Genome Atlas (TCGA) RNAseq expression data and evaluated the potential effectiveness of lncRNA biomarkers for prognosis and subtyping of ESCC. We constructed a prognostic model with selected lncRNA markers and further obtained a lncRNA-based subtype of ESCC by k-means method. Besides, we validated the biological role of lncRNA AC007128.1 by cellular assays in ESCC progression.

Study Design
In the present study, we were aimed at identifying lncRNAbased model for the prognostic prediction and subtyping of ESCC. Firstly, differential and prognostic lncRNA markers were identified from public TCGA ESCC RNA-seq datasets, and then the prognostic model was constructed by cox stepwise regression analysis. The model was evaluated with cross-validation and time-dependent receiver operating characteristic (ROC) methods and then compared with the clinical characteristics including stage, lymph node metastasis, and distant metastasis. With the lncRNA profiles, two lncRNA ESCC clusters were also constructed via the unsupervised clustering method, and the robustness of subtyping markers in two other datasets was verified. The potential function and mechanism of AC007128.1 were further investigated, which demonstrated high expression in ESCC cells and tissues and could promote cell migration and invasion by MAPK pathway.

Patients Source
Tissue RNA-seq data were obtained from TCGA and Gene Expression Omnibus (GEO) (TCGA-ESCA and GSE53625).
Complete clinical and histopathological information are available at TCGA and GEO websites as follows: https://tcga-data.nci.nih.gov/tcga/ and https: //www.ncbi.nlm.nih.gov/geo/. The characteristics of ESCC patients are summarized in Supplementary Table 1.

Unsupervised Discovery of lncRNA-Based Subtypes
The training dataset (n = 111) was used to discover ESCC subtypes. To narrow down markers and obtain meaningful clustering by lncRNA information, the list was firstly ranked by variance, and the top 1,000 lncRNAs with the larger variance were selected. Secondly, different clustering methods, including hierarchical clustering (HC) (single, complete, average, centroid, and ward) and non-hierarchical clustering (NHC), were compared. Finally, the k-means was chosen to cluster for ESCC. Differentially expressed lncRNA markers were obtained among the new clusters by using the limma package, and the top 50 differential lncRNAs were selected. These 50 marker sets were finally used as the signature to predict ESCC subtypes. Validation was performed on the two predefined validation datasets (n = 51 and 179).

Cell Viability Assay
Cell viability was examed using the Cell Counting Kit-8 (CCK-8) assay (Dojindo Labs). The optical density was assessed with a microplate reader (Bio-Rad, Hercules, CA, United States) at a wavelength of 450 nm.

Transwell Invasion and Migration Assays
The transwell invasion assay was done using the transwell (Corning, NY, United States) and matrigel (BD Biosciences, San Jose, CA, United States). Briefly, inserts with 8-µm pores were coated with 8 µL of matrigel. Cells (∼7.5 × 10 4 ) with 200 µL of serum-free medium were added into the upper compartment of the chamber. Subsequently, 600 µL of RPMI-1640 medium or DMEM containing 20% FBS was added into the lower chamber of the chamber as a chemo-attractant. After 36 h of incubation at 37 • C in a humidified atmosphere containing 5% CO 2 , cells that migrated through the matrigel were fixed with methanol, stained with 0.1% crystal violet, and photographed using a microscope (Zeiss, Axio Observer). The cell number in five randomly selected fields from the central and peripheral regions of the filter was counted. The migration assay was conducted similarly without matrigel coating.

Wound Healing Assay
Wound healing assay was conducted in six-well plates. When cells reached a confluence of 90-100%, a wound was made with a 200-µL pipette tip. Spreading cells between parallel lines were measured by an inverted microscope (Zeiss, Axio Observer) and photographed after 0 and 24 h.

RNA Isolation and Real-Time PCR
Total RNA was obtained from cultured cells with RNA fast 2000 Reagent (Fastagen, Shanghai, China) and quantified with a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, United States). Purified RNA was reversely transcribed into cDNA using oligo-dT and random primers with the PrimeScript TM RT reagent Kit (TaKara, Dalian, China). Real-time PCR was performed on a CFX-96 real-time PCR System (Bio-Rad, Shanghai, China) using TB Green TM Premix Ex Taq TM (TaKara, Dalian China). Glyceraldehyde phosphate dehydrogenase (GAPDH) was employed as a housekeeping gene. The relative expressions of the target genes were calculated using the 2 − CT method and subsequent log2 transformed. Primer sequences are displayed in Supplementary Table 3.

Western Blotting Analysis
Total cell lysates were obtained by using the Western/IP lysis buffer (Beyotime, Haimen, China). Equal amounts of proteins were subjected to sodium dodecyl sulfatepolyacrylamide gel electrophoresis (SDS-PAGE) and then transferred onto polyvinylidene fluoride (PVDF) membranes (Millipore, United States). Membranes were immunoblotted with primary antibodies (Supplementary Table 4), followed by incubation with peroxidase-conjugated affinipure goat anti-mouse IgG or peroxidase-conjugated affinipure goat anti-rabbit IgG (Supplementary Table 4). Immunoreactive bands were visualized using the enhanced chemiluminescence system (Bio-Rad Laboratories). GAPDH was adopted as a loading control.

RNA-seq Library Construction
For RNA-seq experiments with shCtrl and shAC007128.1 in KYSE150 cells, total RNA was isolated from cells with the RNeasy mini kit (Qiagen, Germany). Paired-end libraries were constructed using the TruSeq TM RNA Sample Preparation Kit (Illumina, United States) according to TruSeq TM RNA Sample Preparation Guide. Briefly, the mRNA containing poly-A were purified using poly-T oligo-attached magnetic beads.
After purification, the mRNA was cleaved into small fragments using divalent cations at 94 • C for 8 min. The cleaved RNA fragments were amplified into first-strand cDNA using reverse transcriptase and random primers. Second-strand cDNA then was synthesized by DNA polymerase I and RNase H. Next, these cDNA fragments went through an end-repair process, the addition of a single "A" base, and then ligation of the adapters. The products were purified and enriched with PCR to create the final cDNA library. Purified libraries were quantified by Qubit R 2.0 Fluorometer (Life Technologies, United States) and validated by Agilent 2100 bioanalyzer (Agilent Technologies, United States) to confirm the insert size and calculate the molar concentration. Clusters were generated by cBot with the library diluted to 10 pM and then sequenced on the Illumina NovaSeq 6000 (Illumina, United States).
The library was constructed and sequenced at Shanghai Sinomics Corporation.

RNA Sequencing Analysis
Raw reads were preprocessed by filtering out rRNA reads, sequencing adapters, short-fragment reads, and other low-quality reads. Tophat v2.0.9 (Trapnell et al., 2009) was used to map the cleaned reads to the human hg38 reference genome with two mismatches. After mapping, Cufflinks v2.1.1 (Trapnell et al., 2012) was used to generate FPKM values for known gene models. Differentially expressed genes (DEGs) were identified using Cuffdiff (Trapnell et al., 2012). The p-value significance threshold in multiple tests was set by the false discovery rate (FDR) (Storey and Tibshirani, 2003). The fold changes (FCs) were also estimated according to the FPKM in each sample. The DEGs were selected using the filter criteria as follows: FDR ≤ 0.05 and FC ≥ 2.

Statistical Analysis
A Cox regression model was applied to build the combined prognosis score (cp-score). Survival analysis was performed by KM curves and log-rank tests with a high-risk and lowrisk group assignment relative to the median of cp-score. The time-dependent ROC curve was adopted to compare the performance of cp-score, TNM stage, lymph node metastasis, distant metastasis, and the combination of all factors. The effects of potential risk factors upon the survival time were assessed by multivariate Cox regression analysis. A P-value < 0.05 was considered statistically significant. Analysis for functional data were presented as mean ± SEM. The comparison between two groups was conducted by the Student's t-test. All statistical analyses were performed using R software (version 3.6.0.), SPSS 17.0 (IBM, SPSS, Chicago, IL, United States) and GraphPad Prism 5.0 (GraphPad Software, LaJolla, CA, United States).

Identification of Differential and Prognostic LncRNAs for ESCC
Based on the RNA-seq data from TCGA database, we extracted lncRNA expression profiles and compared them between 111 ESCC tumor samples and 11 normal samples with DEseq2 package (Love et al., 2014), defined the best cutoff of 2.482 with the mean of absolute log2FC plus standard deviation of absolute log2FC, and found 422 lncRNAs with an FDR less than 0.05, including 257 up-regulated and 165 down-regulated lncRNAs ( Figure 1A). The heatmap and principal component analysis (PCA) disclosed that ESCC patients and controls could be differentiated by the lncRNA expression profile (Figures 1B,C). These selected differential lncRNAs were considered as candidate prognostic markers for ESCC patients.
To select the prognostic lncRNAs, we analyzed 111 ESCC patients from TCGA with a follow-up of more than 1 month. The KM and univariate Cox proportional hazards regression (UniCox) methods were implemented to reduce the dimensionality. We obtained 413 overlapping lncRNAs based on the two algorithms, which were significantly correlated with the ESCC patients' OS ( Figure 1D). Based on the differential and prognostic lncRNAs, 13 lncRNAs were considered as candidate prognostic biomarkers for ESCC patients ( Figure 1D).
Meanwhile, we examined the relationship between clinicopathological features and patients' survival time. Patients with late-stage tumors or lymph node metastasis or distant metastasis had a shorter OS than those with early-stage tumors or without lymph node metastasis or distant metastasis (P = 0.00098, 0.02, and 0.00093 respectively, by log-rank test) (Figures 1E-G). However, gender, age, race, alcohol, Barrett's esophagus (BE), and T stage were not correlated with the OS (Supplementary Figures 1A-E). Furthermore, UniCox showed the same results as the log-rank analysis (Table 1). Overall, these results suggested advanced stage, lymph node metastasis, and distant metastasis were related to poor prognosis of patients.

LncRNA-Based Prognostic Prediction Model for ESCC
To construct the prognostic model, the 13 candidate lncRNAs were subjected to cox stepwise regression analysis, and 11 out of 13 lncRNAs were selected to build an optimum prognostic model, with a c-index of 0.73 indicating good discrimination (Figure 2A). Supplementary Table 5 lists the characteristics of 11 prognostic lncRNAs including permutation P-values, hazard ratios, and coefficients and so on. Figure 3E and Supplementary  Figures 2A-J illustrate the KM curves for these lncRNAs. We introduced a cp-score for the prognostic prediction of ESCC according to Luo et al. (2020). The cp-score was obtained according to the coefficients from cox regression and separated the patients into high-risk and low-risk groups. KM curves were performed by using the dichotomized cp-score. The result showed that the low-risk group had significantly better survival time compared with the high-risk group (P < 0.0001) ( Figure 2B). Figures 2C-E show the distribution of the risk score, OS status and the corresponding expression profiles of 11 lncRNAs, which were ranked according to the cp-score value. The mortality of high-risk patients was higher than low-risk patients.
Furthermore, we used a time-dependent ROC curve (Heagerty et al., 2000) to characterize the prognostic potential of the cp-score, pathological stage, lymph node metastasis, distant metastasis, and the combination of all above factors. The prognostic prediction model could effectively predict the 3and 5-year prognosis and survival of ESCC patients by timedependent ROC curves with area under curve 0.87 and 0.89, respectively (Figures 2F-G). Compared with pathological stage, lymph node metastasis, and distant metastasis, the cp-score both showed a better ability to predict prognosis in the 3-and 5-year (Figures 2F-G). Besides, the combination of cp-score and clinicopathologic characteristics significantly improved the performance to predict the 5-year prognosis [AUC-ROC:1.00] ( Figure 2G) compared with the cp-score or clinicopathologic features alone, including TNM stage, lymph node metastasis, and distant metastasis.
Next, we assessed the association between cp-score and the TNM stage, T stage, the presence of lymph node metastasis, and distant metastasis of ESCC. Patients with stage I/II disease had lower cp-scores than those with stage III/IV disease (P = 0.03; Figure 2H). Similarly, the cp-scores of patients with lymph node and distant metastasis were significantly higher compared with those without lymph node and distant metastasis (P = 0.014 and 0.0036; Figures 2I,J). The cp-scores of patients with T3/4 tumors were moderately higher compared with those with T1/2 tumors (P = 0.18; Supplementary Figure 2K). These results demonstrated that the cp-score was correlated with the tumor stage and might be used to predict the TNM stage, lymph node, and distant metastasis. Multivariate Cox regression analysis indicated that the cp-score was an independent prognostic factor for ESCC (Table 1).

lncRNA-Based Subtyping of ESCC
We used unsupervised clustering to generate lncRNA-based subtypes of ESCC. Figure 4A illustrates the schematic diagram for sample clustering. Firstly, we compared different clustering methods, including HC (single, complete, average, centroid, and ward) and NHC methods (k-means). Ward showed better results among HC methods (Supplementary Figures 3A-E). The quality of the single clustering solutions could be assessed with average silhouette width as a measure for clustering quality, so we compare the clustering ability between ward and k-means methods by silhouette plot. The results showed ward method had a lower silhouette width (0.17) than k-means (0.2), which indicated that the clustering quality of ward method was inferior to that of k-means (Supplementary Figures 3G,H). We finally chose the k-means to cluster for ESCC. Using the 111 ESCC patients as a training dataset, we obtained two clusters with 50 lncRNAs that were different expression between the two clusters and clinical data were indicated by the annotation bars above the heatmap (Figure 4B). Silhouette analysis of the clusters showed a good clustering quality with average width 0.54 ( Figure 4C). We also validated the clusters from residual 57 TCGA ESCC tumor samples with a follow-up of less than 1 month, which showed a strong positive silhouette width (average = 0.61) (Figures 4D,E). Furthermore, we also observed the robustness of 50 lncRNAs in another independent dataset from GEO (GSE53625) (Figures 4F,G). Taken together, these results indicated that k-means fitted the respective cluster for ESCC. Among these lncRNAs, four were also found in the differential gene list, while none were found in the prognostic marker list (Supplementary Figure 3I).
To explore the clinical significance of the two subtypes, we systematically tested the associations between the subtype and clinical characteristics, including TNM stage, distant metastasis, lymph invasion, grade, BE, location, tobacco, alcohol, age, sex, race, and survival outcomes (Supplementary Table 6, Figures 4H,I, and Supplementary Figures 3J,K). There was no survival difference between cluster 1 and cluster 2 in all three datasets (Supplementary Figures 3J,K, both P > 0.05, logrank test). Cluster 1 were frequently observed to correlate with the white race and those with BE which was considered as a precancerous lesion (Figures 4H,I, both P < 0.05, chi-square test). These differences in race and BE with unsupervised lncRNA signatures might implicate the discrepancy of the intrinsic biological processes in each cluster group.

AC007128.1 Is Highly Expressed in ESCC With Poor Prognosis
To explore possible connections between lncRNAs and carcinogenesis and ESCC development, we evaluated the expressions of seven out of 11 prognostic lncRNA markers in ESCC cell lines and normal esophageal epithelial cell lines by RT-qPCR because the expressions of the other four lncRNAs were too low to detect. Of the seven lncRNAs, we found that AC007128.1 had a higher expression in three established ESCC cell lines (KYSE150, TE-1, and EC109) compared with normal cell lines (Het-1A and HEEC), and the expression of AC007128.1 was the highest among the seven lncRNAs ( Figure 3A). We also detected the level of AC007128.1 in the nuclear and cytoplasmic fractions of KYSE150 and EC109 cells and found that AC007128.1 is mainly enriched in the nucleus (Supplementary Figures 4A,B). We also found a higher expression of AC007128.1 in human ESCC samples from three independent datasets (Figures 3B-D). Meanwhile, increased AC007128.1 expression was related with a shorter OS in TCGA ESCC samples (Figure 3E), which was consistent with a previous study (Liu et al., 2019). These data demonstrated that the expression of AC007128.1 was upregulated in both ESCC tissues and cells, and associated with poor prognosis in ESCC. Therefore, we focused on AC007128.1 for further research.
We further analyzed the correlation between the AC007128.1 expression and the clinicopathologic characteristics of the ESCC patients. The expression of AC007128.1 was associated with the TNM stage (P = 0.039), gender (P = 0.01), and age (P = 0.046) in TCGA dataset (Supplementary Table 7). The AC007128.1 expression was associated with grade (P = 0.013) in GSE53625 (Supplementary Table 8). There was no statistical association between the AC007128.1 expression and other clinical characteristics in the two datasets. Based on these data, we postulated a role for AC007128.1 in ESCC pathology.

Identification of AC007128.1 as a Regulator of ESCC Cell Migration and Invasion
To explore the biological effects of AC007128.1 in the tumorigenesis and development of ESCC, three specific siRNAs targeted to AC007128.1 were used to reduce its expression in two ESCC cell lines, KYSE150 and EC109. We showed that si-AC007128.1-2 exhibited a better silencing efficiency ( Figure 5A). CCK-8 assay revealed that depletion of AC007128.1 did not affect the proliferation of KYSE150 and EC109 cells (P > 0.05; Supplementary Figures 4C,D). Next, we assessed the effect of AC007128.1 depletion on cell migration and invasion in ESCC. We found that AC007128.1 depletion dramatically inhibited cell migration, invasion, and wound closure in both KYSE150 and  EC109 cell lines (Figures 5B-H). We also observed the same inhibitory effects on the migration and invasion in TE-1 cells (Supplementary Figures 4E-G). Transwell assay (Figures 5I-M) and wound healing assay (Figures 5N-P) indicated that overexpression of AC007128.1 in EC109 and TE-1 cells dramatically promoted cell migration and invasion. However, it had no significant effect on cell growth (Supplementary Figures 4H,I).
Collectively, these results suggested that AC007128.1 promoted migration and invasion in ESCC cell lines.
Epithelial-mesenchymal transition (EMT) plays an important role in tumorous migration and invasion (Yang and Weinberg, 2008;De Craene and Berx, 2013;Lamouille et al., 2014; Data represent mean ± SEM from three independent experiments. *P < 0.05, **P < 0.01, and ***P < 0.001 by Student's t-test as compared with the corresponding control. Pang et al., 2018;Dongre and Weinberg, 2019). We further detected the expressions of EMT markers, including the epithelial marker E-cadherin and the mesenchymal markers, Vimentin, β-catenin, and Snail in AC007128.1-depleted cells. Figure 6A show that depletion of AC007128.1 resulted in a significantly increased expression of E-cadherin at the protein level, while it led to dramatically decreased expressions of Vimentin, β-catenin, and Snail at the protein level. These results indicated that AC007128.1 might promote EMT, thus affecting cell migration and invasion in ESCC.

AC007128.1 Activated MAPK/ERK and MAPK/p38 Signaling Pathways
To explore the molecular mechanism, through which AC007128.1 promoted migration and invasion of ESCC, we constructed an shRNA specific to human AC007128.1, which was used to successfully reduce the expression of AC007128.1 at the mRNA level (Figure 6B), and further performed RNA-sequencing in KYSE150 cells. Global reprogramming of the ESCC transcriptome was detected in AC007128.1-depleted cells (Figure 6C), in which more than 2,500 genes were differentially expressed compared with the shCtrl group (Figure 6D), including 856 upregulated genes and 1,688 downregulated genes ( Figure 6E). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the repressed genes showed that these down-regulated genes were mainly enriched in the following pathways including mitogen-activated protein kinase MAPK signaling pathway, cAMP signaling pathway, HIF-1α signaling pathway, Ras signaling pathway, and cytokinecytokine receptor interaction, which were known to be important in cancer (Figure 6F). Moreover, gene set enrichment analysis (GSEA) for all genes revealed that some signaling pathways were depressed, such as MAPK signaling pathway, cell adhesion molecules, IL-17 signaling pathway, TNF signaling pathway, and cytokine-cytokine receptor interaction (Figure 6G). The results of KEGG and GSEA indicated that AC007128.1 might contribute to EMT via these signaling pathways in ESCC.
MAPK signaling pathway was identified as inhibited pathway in both KEGG and GSEA analysis when AC007128.1 was depleted in KYSE150 cells (Figures 6F-H). Previous studies has been implicated that the MAPK signaling pathway take part in regulating EMT of different cancers, including ESCC (Huang et al., 2004;Hu et al., 2017;Hawsawi et al., 2018;Chen et al., 2019;Peng et al., 2019), Therefore, we hypothesized that AC007128.1 might affect cell EMT by deregulating the MAPK signaling pathway. To test this hypothesis, we explored the effect of AC007128.1 on the activity of MAPK signaling. As expected, we found that depletion of AC007128.1 decreased the expressions of phosphorylated ERK1/2 (p-ERK1/2) and phosphorylated p38 (p-p38) in KYSE150 ( Figure 6I). It has been shown that ERK1/2 (Wong et al., 2012;Wang et al., 2020;Zhang F. et al., 2020) and p38 (Wang et al., 2012;Ma et al., 2016) pathways are critical in EMT regulation for ESCC. Therefore, we inferred from the above-mentioned results that AC007128.1 affected cell EMT via deregulating MAPK signaling pathway.

DISCUSSION
In recent years, more and more novel lncRNAs have been identified and the roles of lncRNAs in cancer development have been increasingly studied (Yang et al., 2014;Bhan et al., 2017;Peng et al., 2017). In our previous study, we have shown the effectiveness of lncRNA signatures in diagnosis, prognostic prediction, and drug-resistance in four common cancers, namely breast cancer, colorectal cancer, bladder cancer, and lung cancer (Li et al., 2017(Li et al., , 2018aXie et al., 2018;Zhan et al., 2018;Zhang et al., 2019). Here, we identified an 11-lncRNA model that was associated with tumor OS in ESCC patients, constructed a lncRNA-based molecular subtype of ESCC, and found that lncRNA AC007128.1 might mediate ESCC EMT through mitogen-activated protein kinases/extracellular signal-regulated kinase (MAPK/ERK) and MAPK/p38 signaling pathways.
In the present study, we developed a prognostic prediction model using 11 selected lncRNA markers from TCGA ESCC RNA-seq dataset and found that this model could effectively distinguish ESCC patients with different prognoses. Moreover, multivariable analysis confirmed that this model was considered as an independent prognostic risk factor. The discrimination potential of the cp-score was superior to other prognostic risk factors (TNM stage, lymph node metastasis, and distant metastasis). We also compared the prognostic ability of this model with those of having been reported prognostic biomarkers for ESCC. The results demonstrated that the 11-lncRNA signature had a higher fidelity than other signatures Mao et al., 2018;Zhang L. et al., 2020). The combination of cp-score and clinical characteristics improved the 5-year survival prediction, which could identify patients needing more aggressive treatment and surveillance. We also found that the cp-score was correlated with the staging, as well as the lymphatic and distant metastases of ESCC. These results suggested that this model was also useful for detecting staging, lymphatic, and distant metastasis of ESCC. Although the 11-lncRNA model showed potential prognostic biomarkers for esophageal carcinoma, one limitation should be taken into consideration: the present study is short of other independent datasets to show its fidelity as prognostic biomarkers for esophageal carcinoma, and the distribution of clinical features in TCGA dataset might be distinct from those of other datasets, making it unsuitable for other subjects. Therefore, our results should be further validated in other clinical samples in the future.
Molecular subtyping is considered as a favorable source of disease stratification (Guinney et al., 2015;Sjodahl et al., 2017), while similar lncRNA-based subtyping is still lacking. With a k-means clustering analysis (Kakushadze and Yu, 2017), we could divide 111 ESCC patients from TCGA into two molecular subgroups based on 50 lncRNA markers, and its performance was again confirmed in internal and external datasets. The lncRNAbased subtyping showed good clustering capability and could be treated as a potential tool for ESCC molecular subtyping. We further studied the correlation between the lncRNA-based subtyping and clinical factors, including TNM stage, distant metastasis, lymph invasion, grade, BE, tumor site, alcohol, tobacco, age, sex, race, and survival outcomes. We found a relation between lncRNA-based clusters and race, alcohol, BE, and TNM stage in TCGA dataset, which might imply that these clinical variables partially affected the different transcriptions of lncRNAs. However, the study on subtyping was limited by clinical samples that all samples were collected from available public data. Further verification with multi-center clinical samples is still necessary to adequately assess the performance of the lncRNAbased subtype.
We presumed that lncRNA markers in the prognostic model might play important roles in the carcinogenesis and development of ESCC. Therefore, elucidating the underlying mechanism might provide potentially therapeutic targets of ESCC. In this study, we found lncRNA AC007128.1, which was significantly up-regulated in ESCC tissues from three independent datasets, and correlated with OS of ESCC in TCGA, However, there is one limitation to the validation of lncRNA expression: we didn't detect the expression level of AC007128.1 by RT-qPCR or other method in new ESCC tissue samples due to limited corresponding samples. In view of this limitation, we attempted to prove the reliability of AC007128.1 by detecting the expression in ESCC cell lines and performing functional experiment in vitro. The results showed that AC007128.1 was upregulated in ESCC cell lines compared with normal esophageal epithelial cell lines and its depletion could suppress migration and invasion of ESCC cells in vitro. EMT is an important biological event during human cancer progression (De Craene and Berx, 2013;Pastushenko et al., 2018;Pastushenko and Blanpain, 2019). Reduced AC007128.1 could increase the expressions of epithelial markers and decreased the expressions of mesenchymal markers. Taken together, these results indicated that AC007128.1 might be closely related to cancer progression and play a tumor progressive role in ESCC.
It is urgently necessary to elucidate the molecular mechanisms underlying ESCC progression. We found that the deletion of AC007128.1 decreased the activation of MAPK/ERK and MAPK/p38 signaling pathways, but the exact mechanism by which AC007128.1 acts on the MAPK signaling pathway requires further study. Several published studies have also shown that the MAPK signaling pathway is activated in the process of tumorigenesis, metastasis, and angiogenesis of multiple human malignancies, including ESCC (Burotto et al., 2014;Hu et al., 2017). We speculated that AC007128.1 might promote migration and invasion of ESCC by increasing the activation of MAPK/ERK and MAPK/p38 signaling pathways.
Collectively, we identified lncRNA markers for prognostic prediction and subtyping of ESCC using lncRNA expression profiles. We also demonstrated a significant up-regulation of AC007128.1 in ESCC tumor tissues and cell lines and found its association with poor survival in ESCC patients. The mechanistic analysis demonstrated that AC007128.1 might promote cancer cell EMT by aberrantly activating the MAPK signaling pathway. These findings provided the possibility that overexpressed AC007128.1 was an important molecule participating in the aberrant activation of the MAPK signaling pathway and could serve as a therapeutic target for ESCC.

DATA AVAILABILITY STATEMENT
The data generated in the manuscript can be found in GEO using accession GSE167345.

AUTHOR CONTRIBUTIONS
SZ and JL designed the experiments. SZ performed the experiments, drafted the figures, and wrote the manuscript. HG and YT helped to perform the experiments. PL and YW provided direction in the experimental design. LD and CW revised the manuscript. All authors contributed to the article and approved the submitted version.  Periostin mediates epithelial-mesenchymal transition through the MAPK_ERK