Identification and Validation of an Autophagy-Related lncRNA Signature for Patients With Breast Cancer

Background Autophagy is a “self-feeding” phenomenon of cells, which is crucial in mammalian development. Long non-coding RNA (lncRNA) is a new regulatory factor for cell autophagy, which can regulate the process of autophagy to affect tumor progression. However, poor attention has been paid to the roles of autophagy-related lncRNAs in breast cancer. Objective This study aimed to construct an autophagy-related lncRNA signature that can effectively predict the prognosis of breast cancer patients and explore the potential functions of these lncRNAs. Methods The RNA sequencing (RNA-Seq) data of breast cancer patients was collected from The Cancer Genome Atlas (TCGA) database and the GSE20685 database. Multivariate Cox analysis was implemented to produce an autophagy-related lncRNA signature in the TCGA cohort. The signature was then validated in the GSE20685 cohort. The receiver operator characteristic (ROC) curve was performed to evaluate the predictive ability of the signature. Gene set enrichment analysis (GSEA) was used to explore the potential functions based on the signature. Finally, the study developed a nomogram and internal verification based on the autophagy-related lncRNAs. Results A signature composed of 9 autophagy-related lncRNAs was determined as a prognostic model, and 1,109 breast cancer patients were divided into high-risk group and low-risk group based on median risk score of the signature. Further analysis demonstrated that the over survival (OS) of breast cancer patients in the high-risk group was poorer than that in the low-risk group based on the prognostic signature. The area under the curve (AUC) of ROC curve verified the sensitivity and specificity of this signature. Additionally, we confirmed the signature is an independent factor and found it may be correlated to the progression of breast cancer. GSEA showed gene sets were notably enriched in carcinogenic activation pathways and autophagy-related pathways. The qRT-PCR identified 5 lncRNAs with significantly differential expression in breast cancer cells based on the 9 lncRNAs of the prognostic model, and the results were consistent with the tissues. Conclusion In summary, our signature has potential predictive value in the prognosis of breast cancer and these autophagy-related lncRNAs may play significant roles in the diagnosis and treatment of breast cancer.


INTRODUCTION
Breast cancer is the most common malignant tumor and the leading cause of cancer-related deaths in women worldwide, with an incidence and mortality of 24.2% and 15.0%, respectively (1,2). Current treatment options for breast cancer usually combine surgery with a variety of adjuvant therapies, such as chemotherapy, radiotherapy, endocrine therapy, targeted therapy, and immunotherapy (3)(4)(5)(6)(7)(8). Most patients respond to initial treatment within a certain period of time, but there are still some breast cancers, especially triple negative breast cancer, that will develop into a more invasive tumor form, resulting in a poor prognosis (9)(10)(11). Because of the strong heterogeneity of breast cancer, multi-parameter signals are more valuable than single biomarkers in predicting the prognosis of breast cancer.
Autophagy is a physiological process that guides the degradation of damaged, denatured, or senescent proteins and damaged organelles in lysosomes. There are two states of autophagy activation: normal state and pathological state. Autophagy responds to various stresses by providing circulating metabolic substrates needed for survival under normal physiological conditions (12). Recent studies further indicate that abnormal autophagy is considered to be a potential cause of many diseases, including cardiovascular and cerebrovascular diseases, liver diseases, and cancer (13,14). Autophagy is closely related to the occurrence, development, and metastasis of tumor. Multiple studies have reported that autophagy may play especially important roles in supporting cell growth and drug resistance to targeted therapy in BRAF-mutant melanoma (15)(16)(17). Some researchers found that MIR106A-5p upregulation suppressed autophagy and accelerated malignant phenotype in nasopharyngeal carcinoma (18). Furthermore, recent findings have revealed the autophagy drugs such as sirolimus and arsenic trioxide can induce autophagy of tumor cells and improve the outcome of certain cancer patients. However, only a limited proportion of patients benefited from autophagy-based treatment. Therefore, establishing predictive biomarkers for autophagic drug therapy response is required to identify patients who can benefit from autophagybased treatment.
LncRNA is a small RNA with a length of more than 200bp and no protein coding function (19). Although they cannot be translated into proteins, they are involved in the process of protein translation (20). LncRNA plays a critical role in complex autophagy regulatory networks by regulating the biological effects of a variety of autophagy-related DNA, RNA, or proteins (21). Some studies have reported that lncRNA-HOTAIRM1 regulates autophagy and degrades tumor protein PML-RARA, during differentiation arrest of bone marrow cells, suggesting that lncRNA is a potential therapeutic target for leukemia (22). Other researchers have found the expression of lncRNA MEG3 is significantly down-regulated in glioma tissues and cells, and its overexpression can significantly inhibit cell proliferation and promote apoptosis and autophagy of glioma cells (23).
Considering the significance of lncRNA and autophagy in breast cancer biology, we determined an autophagy-related lncRNA signature to predict the prognosis of breast cancer patients and provide a theoretical basis for the diagnosis and treatment of breast cancer.

Datasets Preparation
The workflow of our analysis is illustrated in Figure 1. TCGA (https://cancergenome.nih.gov) RNA-Seq data of breast cancer patients was used as a training set to construct a signature composed of autophagy-related lncRNAs. The training cohort includes TGGA mRNA expression (FPKM) in 1,109 breast cancer patients and related clinical information. We collected an independent dataset GSE20685 (series matrix files), which was performed using the platform GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array (https://www. ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20685). In order to validate the predictive capability and the universal applicability of the signature, GSE20685 dataset including 327 breast cancer patients was employed as a testing set in our present study.
breast cancer by using Pearson method. LncRNAs with correlation coefficient R 2 > 0.3 and P < 0.001 are considered to be autophagy-related lncRNAs. Extract the shared lncRNAs from autophagy-related lncRNAs and GSE20685 for the following analysis. Then, the shared lncRNAs in TCGA were used as a training dataset (n=1,109), and the shared lncRNAs in GSE20658 were used as a testing dataset (n=327). All RNAsequencing data were normalized by log2 conversion.

Signature Development
Univariate Cox regression analysis was executed to obtain the prognosis-related lncRNAs from the training dataset. Then the lncRNAs with P < 0.05 were included in the multivariate Cox hazard model analysis to produce an autophagy-related lncRNA signature model. In order to avoid over-fitting, Akaike Information Criterion (AIC) was conducted to select the stepwise signature, and the signature with the lowest AIC value was considered to be the optimal signature. The risk score of each patient with breast cancer was calculated according to the following formula: Risk score= b lncRNA1 × expr lncRNA1 + b lncRNA2 × expr lncRNA2 + …. + b lncRNAn × expr lncRNAn . b is the regression coefficient of the corresponding lncRNA, expr is the expression level of lncRNA, and its unit is FPKM. Based on the median risk score, 1,109 patients with breast cancer in the training dataset were divided into two groups: high-risk group and low-risk group. In order to compare the difference of OS between high-risk group and low-risk group, Kaplan Meier-plotter was implemented. In addition, ROC curve was performed to evaluate the predictive ability of the signature by using the SurvivalROC package. Finally, 327 breast cancer patients with credible prognostic data from the GSE20685 set were regarded as a testing set to evaluate the predictive capability of the signature. All P<0.05 were regarded as a significant difference.

Gene Set Enrichment Analysis
GSEA was executed to discover changes in the expression of predefined gene sets rather than individual gene, so it can be performed to identify whether gene sets show statistically remarkable differences between the two biological states of samples. In our analysis, GSEA version 4.0.3 was used to verify whether the differentially expressed gene sets between the lowrisk group and high-risk group were enriched in cancer-related and autophagy-related processes.

Cell Culture
Human normal breast cells (HBL-100) and breast cancer cells (MCF-7 and T-47D) were purchased from the Cell Bank of the FIGURE 1 | Flowchart for identification and validation of an autophagy-related lncRNA signature in breast cancer patients. The study was carried out in TCGA and GSE20685 database. The TCGA training set was used to identify prognostic lncRNAs. The multivariate Cox hazard model analysis was performed to construct a prognostic signature based on the prognostic lncRNAs. The prognosis signature was validated in the GSE20685 testing set.

LncRNAs Validation Using qRT-PCR
After being isolated by TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and quantified by NanoDrop 2000 (Thermo Fisher Scientific Inc, Rockford, IL, USA), 1 µg of total RNA underwent reverse transcription by UEIris II RT-PCR System for First-Strand cDNA Synthesis (US EVERBRIGHT INC, Suzhou, China) according to the manufacturer's instruction. The product was then used for realtime PCR by Universal SYBR Green qPCR Supermix (US EVERBRIGHT INC, Suzhou, China). PCR was conducted as follows: predegeneration at 95°C for 5 min; denaturation at 95°C for 5 sec, annealing at 60°C for 30 sec, repeating the previous process 40 times. GRAPDH was used as internal references, and experiment of each group was repeated three times. Quantification of the relative expression levels of lncRNA by 2 -△△Ct method. Primers sequences are presented in Table 1.

Statistical Analysis
All statistical analysis and drawings were carried out using R software (version 3.6.3) and Bioconductor. The co-expression   network was constructed by Cytoscape to visualize candidate lncRNAs related to prognosis. The autophagy-related lncRNAs in patients with breast cancer were analyzed with R software package "ggalluvial," "ggplot2," and "pheatmap." Then Sankey diagram, risk score, survival status, and lncRNA heat map were drawn. Kaplan Meier-plotter was performed to evaluate the OS differences between high-risk group and low-risk group of the signature. Univariate and multivariate Cox regression were constructed to assess whether the prognostic signature was independent of clinical characteristics such as age, gender, clinical stage, and TNM stage. And the Cox regression also was used to confirm the clinical value of the signature in age, gender, clinical stage, and TNM stage. The above operations are implemented using the R software package "Survival." The nomogram was implemented using rms R package to predict the OS. All P<0.05 was regarded as a significant difference. The Limma package was used to identify the differently expressed lncRNAs in view of |log2 fold change (FC)|≥1 and false discovery rate (FDR)<0.05 based on 9 lncRNAs. All P<0.05 were considered statistically significant.

Construction of Autophagy-Associated lncRNAs Co-expression Network
We identified a total of 19,658 mRNAs and 14,142 lncRNAs, which were screened from the TCGA dataset. According to 222 autophagy genes in Human Autophagy Database, we developed a co-expression network to determine autophagy-related lncRNAs. The Pearson correlation analysis showed that 1,270 lncRNAs were associated with autophagy-related genes in breast cancer (R 2 > 0.3, P < 0.001). Finally, we obtained 180 shared lncRNAs from 1,270 autophagy-related lncRNAs and 1,146 lncRNAs of GSE20685 (Figures 2A, B).

Identification of an Autophagy-Related lncRNA Signature in Breast Cancer
By using univariate cox regression analysis, we identified 25 lncRNAs that were strongly with patients' OS from 180 autophagy-related lncRNAs in training dataset ( Figure 2C) (P<0.05). Stepwise multivariable cox proportional hazards regression analysis were carried out to identify the optimal prognostic lncRNAs among the 25 candidate lncRNAs. Subsequently, 9 candidate lncRNAs were identified to be independent prognostic factors for breast cancer patients ( Figure  2D). We then developed autophagy-related lncRNAs coexpression networks based on the 9 genes (Table 2, Figure 3A). Next, we constructed a signature of 9 autophagy-related lncRNAs using a risk score method. The prognostic risk score formula of the signature was as follows: prognostic score = (0.2520 × KMT2E-  Table 3). Based on the median  risk score, 1,109 patients with breast cancer in training dataset were divided into two groups: high-risk group and low-risk group. In Figure 3B,  patients in the high-risk group was poorer than the low-risk group based on the prognostic signature (HR: 1.794, 95% CI: 1.556-2.069; P<0.001) ( Figure 4A). Additionally, Figure 4B illustrated that risk score of the signature had an AUC of 0.806 under the ROC curve, indicating a credible diagnostic value. The risk score of the prognostic signature in the low-risk group and high-risk group, survival status of breast cancer patients, and the expression heat map are displayed in Figure 4C. To assess the robustness of the signature in OS prediction of breast cancer patients, we further examined it in the testing cohort from the   GSE20685 dataset using the same formula. The results revealed that the OS of patients with high-risk score was also worse than those with low-risk score in the GSE20685 cohort (HR: 1.215, 95% CI: 1.105-1.336; P<0.001) and the AUC of the ROC curve of the prognostic model was 0.788 ( Figures 5A-C). This result is consistent with the training set. Finally, Cox regression analyses were used to confirm the possibility that risk score of the signature can be regarded as an independent predictor of prognosis in breast cancer patients. The results indicated that whether in the training dataset or the testing dataset, the risk score of the signature could significantly help to forecast the prognosis of breast cancer patients, eliminating the influence of clinical features (gender, age, clinical stage, and TNM stage) (P<0.001) ( Table 4, Figures 6A-D).

Clinical Value of the Prognostic Signature in Patients With Breast Cancer
We further evaluated the clinical value of the signature in breast cancer patients by determining the correlation between lncRNA and the clinical characteristics (age, gender, clinical stage, and TNM stage) of breast cancer patients. The results showed that

Gene Set Enrichment Analysis
To further explore the potential biological behavior of this signature in breast cancer patients, we determined a total of 25 gene sets using GSEA, which were notably enriched with a nominal P value of < 0.05 and FDR q-value < 0.25 ( Table 6).
The results revealed that the differentially expressed genes between the high-risk and low-risk groups were enriched in the cancer-related and autophagy-related pathways. As shown in Table 5 and Figures 7A-F, stromal and carcinogenic activation pathways were markedly enriched in high risk group, such as cell adhesion, ECM receptor interaction, TGF beta signaling pathway, Renal cell carcinoma pathway, and Prostate cancer pathway. Surprisingly, we also found that glucose metabolic pathways were significant such as Glycolysis/Gluconeogenesis pathway, TCA cycle pathway, and starch and sucrose metabolism pathway enriched in high risk groups ( Figures  7G-I). It is worth noting that TGF-b signaling pathway and glucose metabolism pathway are closely related to autophagy. The results from GSEA have revealed these autophagy-related genes contribute to carcinogenic activation pathways and autophagy-related pathways, which may provide sufficient evidence for targeted therapy of breast cancer.

The Nomogram Establishing and the Prognostic Value Validating Base on 9 lncRNAs in TCGA Dataset
Based on these autophagy-related lncRNAs of the signature, a nomogram was constructed. Multivariate Cox analysis was performed to assign the points in the nomogram to each variable. By drawing a vertical line between the total point axis  and each prognostic axis, the estimated survival rate of breast cancer patients at 1, 3, and 5 years can be calculated, which may help professionals make clinical decisions for breast cancer patients ( Figure 8A). In order to further explore the prognostic value of the 9 lncRNAs, the Kaplan Meier curve was built to confirm the relationship between these lncRNAs and OS. In our analysis, a total of 6 of the 9 lncRNAs (USP30-AS1, ST7-AS1, VPS9D1-AS1, OTUB6DB-AS1, LINC01087, and MAPT-AS1) were identified. The results indicated that the 6 autophagy-related lncRNAs were correlated to the OS in breast cancer patients ( Figure 8B).

Validation the Expression of 9 lncRNAs in Breast Cancer Tissues and Cells
To further validate the expression profiles of the 9 lncRNAs, we explored the expression levels of the lncRNAs in breast cancer tissues based on TCGA database as well as in breast cancer lines by qRT-PCR. The results showed that USP30-AS1, TFAP2A-AS1, MAPT-AS1, and LINC01087 expression were significantly increased and HOXB-AS1 expression was significantly decreased in breast cancer tissues compared with normal breast tissue ( Figure 9A, P<0.001). qRT-PCR results showed that USP30-AS1, TFAP2A-AS1, MAPT-AS1, and LINC01087 were overexpressed and HOXB-AS1 was lowexpressed in MCF-7 and T-47D cell lines compared to HBL-100 cell line ( Figures  9B-F, P<0.05).

DISCUSSION
Breast cancer is characterized by the accumulation of abnormal cells, which may be attributed to the imbalance of cell proliferation, apoptosis, and autophagy regulation disorder. Autophagy plays a dual role in inhibiting and promoting the occurrence and development of cancer at different stages (24). Increasing evidence shows that lncRNAs and autophagy are closely associated with the occurrence, development, and prognosis of many different tumors. Recent evidence suggests that lncRNA NAMPT-AS promoted breast cancer progression and regulated autophagy through the mTOR pathway (25). Existing studies showed that there is a positive relationship between the expression of lncRNA LCPAT1 and LC3b in lung cancer (26). Other studies showed that lncRNA DICER1-AS1 expression was up-regulated in osteosarcoma cells, while knockdown of DICER1-AS1 could inhibit the proliferation, migration, and invasion of osteosarcoma cells, and reduce the protein expression levels of ATG5, LC3, and Beclin1, indicating that knocking down DICER1-AS1 can inhibit autophagy of osteosarcoma cells (27).
The specific mechanisms of lncRNAs regulating autophagy can be divided into three categories: lncRNAs act as competitive endogenous RNAs (ceRNAs) combined with miRNAs to regulate the expression of miRNAs, thus affecting the process of autophagy (28); lncRNAs can also affect the expression of ATG gene cis or trans (24); lncRNAs promote tumor progression by inhibiting autophagy-mediated apoptosis through AKT/ mTOR pathway (29). LncRNAs regulate the occurrence and development of autophagy in both directions, which provides new ideas and ways for further research on the prevention and treatment of autophagy-related diseases, such as cancer. At present, using high-throughput biotechnology to detect genetic changes to predict tumor recurrence and metastasis has become a research hotspot. However, a single lncRNA may not be sufficient to forecast the prognosis of the breast cancer patients. Therefore, it is urgent to construct an autophagyrelated lncRNA signature to predict the survival of breast cancer patients.
In our analysis, the TCGA dataset and GSE20685 dataset were downloaded to investigate the prognosis signature of autophagy-related lncRNAs for breast cancer patients. Firstly, we identified 1,270 lncRNAs associated with autophagy-related genes in breast cancer (R 2 > 0.3, P < 0.001). Because the breast cancer samples in TCGA and GEO databases are different, the extracted lncRNAs are not exactly the same. We analyzed the common lncRNAs in the two databases to ensure that these lncRNAs are general and universal in breast cancer samples, and narrow the scope of biological marker screening. Then we obtained 180 shared lncRNAs from TCGA dataset and GSE20685 dataset. We further identified a signature composed of 9 autophagy-related lncRNAs that could divide breast cancer patients into high-risk group or low-risk group. Among the 9 autophagy-related lncRNAs, USP30-AS1, ST7-AS1, VPS9D1-AS1, TFAP2A-AS1, HOXB-AS1, LINC01087, and MAPT-AS1 are protective factors, while KMT2E-AS1 and OTUB6DB-AS1 are risk-related factors. In addition, it was also found that the OS of breast cancer patients in the high-risk group was poorer than that in the low-risk group based on the prognostic signature. The validation in training set and testing set revealed that the prognostic signature has better diagnostic capability. Besides, we used Cox regression analysis that concluded that the signature is an independent factor of breast cancer. We also evaluated the clinical value of the signature in age, gender, clinical stage, and TNM stage and found that the signature may be associated with the progression of breast cancer. To further analyze these 9 lncRNAs of the signature, we developed a nomogram and validated their prognostic value and expression. We found USP30-AS1, ST7-AS1, VPS9D1-AS1, OTUB6DB-AS1, LINC01087, and MAPT-AS1 were correlated to the OS in breast cancer patients. And we further verified that USP30-AS1, TFAP2A-AS1, MAPT-AS1, and LINC01087 were significantly increased and HOXB-AS1 was significantly decreased in breast cancer tissues and cells. Moreover, in our analysis, we found that several pathways have been established in cancer such as ECM receptor interaction, TGF beta signaling pathway, cell adhesion, Renal cell carcinoma pathway, and Prostate cancer pathway, which were markedly enriched in high risk group. We also detect that the glucose metabolism pathways were enriched in high-risk patients, especially glycolysis, gluconeogenesis, and tricarboxylic acid cycle. Previous studies have shown that abnormal cell metabolism, especially glucose metabolism, is related to the occurrence and progression of tumors (30,31). Tumor cells absorb a large amount of glucose and tend to carry out glycolysis in the cytoplasm even under the condition of sufficient oxygen supply, thus promoting the rapid growth and proliferation of cells (31,32). Recent studies have found an unexpected link between glucose metabolism and autophagy. They found that increase of glycolysis in autophagic cells can promote Rasmediated adhesion-independent transformation, indicating autophagy may promote Ras-driven tumor growth in a specific metabolic environment (33). This is consistent with our present findings.
However, the molecular mechanism of these 9 autophagyrelated lncRNAs is still inadequately understood in breast cancer, and further investigation of the underlying mechanisms may be meaningful. So we established a nomogram to more intuitively predict the OS in 1, 3, and 5 years. Subsequently, we discover the prognostic value of these 9 autophagy-related lncRNAs using the Kaplan-Meier survival analysis and the results were basically consistent with the prognostic analysis results of TCGA dataset. Furthermore, targeted analysis on a specific histological group of breast cancer patients will increase the specificity and individuality of the autophagy-related lncRNAs. In our further study, we will analyze and verify these lncRNAs in breast cancer patients with the specific morphological and molecular features.
Taken together, our study defines an innovative autophagyrelated lncRNA signature in breast cancer. It is a comprehensive analysis of RNA sequencing data and clinical information available in TCGA database. The autophagy-related lncRNA signature may provide new insights into predicting the prognosis of patients with breast cancer. More importantly, the functions and approaches associated with our signature may contribute to the development of a new therapeutic strategy for breast cancer.