A Novel Prognostic Prediction Model for Colorectal Cancer Based on Nine Autophagy-Related Long Noncoding RNAs

Introduction Colorectal cancer (CRC) is the most common gastrointestinal cancer and has a low overall survival rate. Tumor–node–metastasis staging alone is insufficient to predict patient prognosis. Autophagy and long noncoding RNAs play important roles in regulating the biological behavior of CRC. Therefore, establishing an autophagy-related lncRNA (ARlncRNA)-based bioinformatics model is important for predicting survival and facilitating clinical treatment. Methods CRC data were retrieved from The Cancer Genome Atlas. The database was randomly divided into train set and validation set; then, univariate and multivariate Cox regression analyses were performed to screen prognosis-related ARlncRNAs for prediction model construction. Interactive network and Sankey diagrams of ARlncRNAs and messenger RNAs were plotted. We analyzed the survival rate of high- and low-risk patients and plotted survival curves and determined whether the risk score was an independent predictor of CRC. Receiver operating characteristic curves were used to evaluate model sensitivity and specificity. Then, the expression level of lncRNA was detected by quantitative real-time polymerase chain reaction, and the location of lncRNA was observed by fluorescence in situ hybridization. Additionally, the protein expression was detected by Western blot. Results A prognostic prediction model of CRC was built based on nine ARlncRNAs (NKILA, LINC00174, AC008760.1, LINC02041, PCAT6, AC156455.1, LINC01503, LINC00957, and CD27-AS1). The 5-year overall survival rate was significantly lower in the high-risk group than in the low-risk group among train set, validation set, and all patients (all p < 0.001). The model had high sensitivity and accuracy in predicting the 1-year overall survival rate (area under the curve = 0.717). The prediction model risk score was an independent predictor of CRC. LINC00174 and NKILA were expressed in the nucleus and cytoplasm of normal colonic epithelial cell line NCM460 and colorectal cancer cell lines HT29. Additionally, LINC00174 and NKILA were overexpressed in HT29 compared with NCM460. After autophagy activation, LINCC00174 expression was significantly downregulated both in NCM460 and HT29, while NKILA expression was significantly increased. Conclusion The new ARlncRNA-based model predicts CRC patient prognosis and provides new research ideas regarding potential mechanisms regulating the biological behavior of CRC. ARlncRNAs may play important roles in personalized cancer treatment.

Introduction: Colorectal cancer (CRC) is the most common gastrointestinal cancer and has a low overall survival rate. Tumor-node-metastasis staging alone is insufficient to predict patient prognosis. Autophagy and long noncoding RNAs play important roles in regulating the biological behavior of CRC. Therefore, establishing an autophagy-related lncRNA (ARlncRNA)-based bioinformatics model is important for predicting survival and facilitating clinical treatment.
Methods: CRC data were retrieved from The Cancer Genome Atlas. The database was randomly divided into train set and validation set; then, univariate and multivariate Cox regression analyses were performed to screen prognosis-related ARlncRNAs for prediction model construction. Interactive network and Sankey diagrams of ARlncRNAs and messenger RNAs were plotted. We analyzed the survival rate of high-and low-risk patients and plotted survival curves and determined whether the risk score was an independent predictor of CRC. Receiver operating characteristic curves were used to evaluate model sensitivity and specificity. Then, the expression level of lncRNA was detected by quantitative real-time polymerase chain reaction, and the location of lncRNA was observed by fluorescence in situ hybridization. Additionally, the protein expression was detected by Western blot.
Results: A prognostic prediction model of CRC was built based on nine ARlncRNAs (NKILA, LINC00174, AC008760.1, LINC02041, PCAT6, AC156455.1, LINC01503, LINC00957, and CD27-AS1). The 5-year overall survival rate was significantly lower in the high-risk group than in the low-risk group among train set, validation set, and all patients (all p < 0.001). The model had high sensitivity and accuracy in predicting the 1-year overall survival rate (area under the curve = 0.717). The prediction model risk score

INTRODUCTION
Colorectal cancer (CRC) is a common gastrointestinal cancer. According to the 2020 global cancer statistics, more than 1.9 million new cases of CRC were diagnosed, and 935,000 CRC patients died. CRC ranks third in morbidity and second in mortality for all tumors (1). With advancements in comprehensive therapy, including surgery, chemotherapy, biological immunotherapy, and radiotherapy, CRC outcomes are improving, but the overall survival rate is still low (2). Historically, the tumor-node-metastasis (TNM) staging system has been widely used to predict the prognosis of CRC patients. Generally, the prognosis is better with earlier staging. In recent years, however, some studies have shown that the combination of biomarkers and TNM staging is more accurate in predicting the prognosis of CRC patients (3,4).
Long noncoding RNAs (lncRNAs) are defined as RNAs that contain more than 200 nucleotides and do not encode a protein.
They play important roles in transcription, translation, and cell cycle regulation (5). A growing body of evidence indicates that lncRNAs play important roles in CRC occurrence, development, metastasis, and drug resistance (6)(7)(8)(9). Moreover, several lncRNA-based prediction models have been built to predict the survival of patients with CRC (10)(11)(12), indicating that lncRNAs can serve as CRC biomarkers.
Autophagy is a process of intracellular degradation that helps maintain homeostasis by promoting nutrient recycling during nutrient deficiency, hypoxia, DNA damage, and infection (13,14). It plays an important role in tumor development, maintenance, and progression (15). Studies have shown that autophagy is a double-edged sword, as it promotes CRC invasion and metastasis and CRC cell apoptosis (16,17). Current studies show that tumor autophagy can predict patient prognosis (18,19).
Overall, autophagy and lncRNAs both play important biological roles in CRC. To date, few studies have been conducted that investigate the role of autophagy-related lncRNAs (ARlncRNAs) in the survival of cancer patients. One study showed that the prognosis of patients with lung adenocarcinoma is related to the abnormal expression of 13 ARlncRNAs (20). We hypothesize that ARlncRNAs are closely related to the survival of CRC patients and that ARlncRNAs are potential biomarkers of and treatment targets for CRC. In this study, using bioinformatics technology, we identified and screened ARlncRNAs related to the prognosis of CRC patients and built a novel model that can be used to predict prognosis. This model is of great significance in predicting the prognosis of CRC patients in clinical practice.

Data Analysis
We employed Strawberry Perl software (v5.30.2.1) to compile the clinical data of CRC patients. After deleting "unknown" and incomplete survival data ("NULL"), we recorded the survival time, survival status, age at diagnosis, sex, clinical stage, T stage, M stage, and N stage of each patient. We used the Strawberry Perl program to organize the transcriptome data of CRC samples into an expression matrix and then used GRCh38.p12 to annotate the genes. Next, we obtained separate messenger RNA (mRNA) and lncRNA expression matrices for CRC samples and used the "limma" package in R (v4.0.0) for coexpression analysis of the mRNA expression matrix and autophagy-related gene profile to obtain the expression matrix for autophagy-related genes. Then, we performed coexpression analysis of the autophagy-related expression matrix and lncRNA expression matrix (selection criterion: absolute correlation coefficient > 0.3, p < 0.001) to obtain the lncRNA matrix coexpressed with autophagy-related genes, that is, ARlncRNAs. We utilized Strawberry Perl software to merge the clinical data of CRC patients and the information from the ARlncRNA expression matrix to obtain a matrix of the expression levels of ARlncRNAs and survival status. The flow chart of overall procedures is shown in Figure 1.

Model Establishment
We used the "caret" package in R to divide the database into train set and validation set. Then, we used the "survival" package in R to conduct univariate Cox regression analysis (p < 0.05) to screen ARlncRNAs related to the 5-year OS of CRC patients and then performed multivariate Cox regression analysis with optimization based on the optimal Akaike information criterion to screen ARlncRNAs for the prediction model. The risk score was the sum of the product of the expression level of each ARlncRNA and the corresponding multivariate Cox regression coefficient (21,22). The above model was applied to calculate the risk score for each patient. The patients were divided into high-and a low-risk groups with the median risk score as the cutoff value. Next, the "pheatmap" package in R was employed to plot the heat maps of nine ARlncRNAs in each group.

Survival Analysis
We used the "survival" package in R to plot Kaplan-Meier survival curves to analyze the 5-year OS associated with high or low expression of each ARlncRNA and then performed the log-rank sum test to analyze survival differences between the high and low expression groups. We also compared the 5-year OS between the high-risk group and the low-risk group obtained from this model. Next, the "pheatmap" package in R was used to plot the survival time and survival status.

Model Evaluation
We utilized the "survivalROC" package in R to plot the receiver operating characteristic (ROC) curve to evaluate the sensitivity and specificity of the ARlncRNA-based risk model via the area under the curve (AUC). Moreover, we analyzed the performance of the risk score from the ARlncRNA-based model versus TNM stage, age, sex, and clinical stage for predicting 5-year survival. In addition, we performed multivariate Cox regression analysis and stratified analysis to determine whether the risk score was independent of clinical variables.

Gene Ontology and Kyoto Encyclopedia of Genes and Genomes
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyzed the signal pathways of enrichment of genes that construct an interaction network with lncRNAs in the model.

ARlncRNA Interactive Networks
We used Perl software to screen autophagy-related genes coexpressed with prognostic ARlncRNAs and then used Cytoscape software (v3.8.0) to plot and visualize interactive networks of ARlncRNAs and mRNAs from the prediction model. Next, we utilized the "ggplot2," "gglluvial," and "dplayr" packages in R to plot a Sankey diagram to determine whether the ARlncRNAs included in the model were risk factors or protective factors.

Quantitative Real-Time Polymerase Chain Reaction
RNAs were extracted using the Trizol reagent (Invitrogen, Carlsbad, CA, USA), followed by removal of DNA with the TurboDNase kit (Ambion). Quantification of extracted RNA was performed using NanoDrop. Complementary DNA synthesis was performed using PrimeScript real-time (RT) reagent kit (Takara Bio, Japan) using 1,000 ng of total RNA. Quantitative real-time polymerase chain reaction (qRT-PCR) was performed using the SYBR Select Master Mix (Applied Biosystems, Waltham, MA, USA) on an ABI 7900 system (Applied Biosystems). Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was used as a control. The Ct value was calculated based on the DDCt method. Fold change of gene expression was expressed as 2 −DDCt . The primers used in this study were as follows: NKILA, sense strand 5′-CGGATACATCTTAGTTGTTATG-3′, antisense strand 5′-GTGCTGGAATCATCATTG-3′ and LINC00174, sense strand 5′-GCATTAGATTCTCATAGG-3′, antisense strand 5′-GGCATTAGATTCTCATAG-3′.

Western Blot
LC3B, p62, beclin1, and ATG7 were extracted from the indicated cells using radioimmunoprecipitation assay (RIPA) lysis buffer, and a BCA Protein Assay Kit (Thermo Scientific, USA) was used to measure the protein concentration. In total, 60 mg of protein was separated on 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gels by PAGE and transferred onto nitrocellulose membranes. The membranes were blocked using 5% non-fat dry milk and incubated with primary rabbit monoclonal antibody overnight at 4°C. The membranes were washed with Tris-buffered saline with Tween 20 (TBST) and then incubated with the appropriate secondary antibody.

Statistical Analysis
The Cox proportional hazard model was employed for univariate and multivariate analyses. The Kaplan-Meier method was used to plot survival curves, and the log-rank sum test was performed to analyze between-group differences. Due to homogeneity of variance, Student's t-test and Dunnett's multiple comparisons test in analysis of variance were used to analyze the expression levels of lncRNA in different cells. Statistical analyses were conducted with R (v4.0.0) and GraphPad Prism (v9.2.0) software. A two-tailed value of p < 0.05 was considered statistically significant, unless otherwise specified.

Identification of Nine Prognostic ARlncRNAs
Using univariate Cox regression analysis and the Kaplan-Meier method, we screened 32 prognostic ARlncRNAs (Supplementary Table 2). Further multivariate Cox regression analysis of these 32 ARlncRNAs with optimization at an Akaike information criterion of 365.59 yielded an interactive network diagram with 9 ARlncRNAs and 53 coexpressed mRNAs ( Figure 2). We then plotted Kaplan-Meier survival curves to analyze survival and plotted the corresponding survival curve associated with high or low expression of each of the nine ARlncRNAs ( Figure 3). The results showed that the 5-year survival rate was significantly lower in the high-than in the low-expression group among all ARlncRNAs (NKILA, LINC00174, AC008760.1, LINC02041, PCAT6, AC156455.1, LINC01503, LINC00957, and CD27-AS1, all p < 0.05).
FIGURE 2 | The interaction network of OS-associated lncRNAs and autophagy genes. The pink circle refers to ARlncRNA, and the cyan square refers to autophagy genes.

Risk Score Model Based on Nine ARlncRNAs
We constructed a proportional hazards model of these nine ARlncRNAs using multivariate Cox regression analysis: risk score =  Table 1).

Evaluation of the ARlncRNA-Based Prediction Model for CRC
We calculated the risk score for each patient based on the proportional hazard model for nine ARlncRNAs and then sorted the risk scores in ascending order. With the median as the cutoff value, we divided the patients into a high-and a low-risk group between train set and validation set. Figure 4 shows the risk score and survival status. Kaplan-Meier survival analysis indicated that the 5-year OS was significantly higher in the low-than in the high-risk group among train set ( Figure 5A), validation set ( Figure 5B), and all patients ( Figure 5C). Moreover, the model had high sensitivity and specificity in predicting the 1-year OS of CRC patients among train set ( Figure 5D), validation set ( Figure 5E), and all patients (AUC = 0.717, Figure 5F). In addition, multivariate analysis showed that risk value was better than clinicopathological factors ( Figure 5G). Sankey diagrams and heat maps were used to visualize the expression profiles for the nine ARlncRNAs in the low-and high-risk groups. The results showed that the expression level of all ARlncRNAs tended to be higher in the low-than in the high-risk group, and they were determined to be risk factors ( Figures 5H-K).   Figures 6A, B). Moreover, the risk score was a risk factor, as the 5-year survival rate was lower if the risk score was >0.943.

Correlation Between the Model Risk Score and Clinical Data
We performed stratified analysis based on age, sex, clinical stage, T stage, M stage, and N stage. The mean risk score for each factor was compared between the two groups of patients.

Signaling Pathways Enriched With the Nine ARlncRNAs
We performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) to identify the signal pathways of enrichment of genes that construct an interaction network with lncRNAs in the model. The results showed that those genes were most enrichment in autophagy signal pathways ( Figures 7A, B).

Validation In Vitro
To verify the expression of selected lncRNAs, i.e., NKILA and LINC00174, in colorectal cancer cell lines, qRT-PCR was performed. As shown in results, NKILA and LINC00174 were all higher   expressed in colorectal cancer cell line HT29 compared to normal colonic epithelial cell line NCM460 (Figure 8). FISH assay was used to detect the subcellular localization of LINC00174 and NKILA. The results showed that the target genes LINC00174 and NKILA were both expressed in the nucleus and cytoplasm of the NCM460 and HT29 cells (Figure 9). In addition, the protein expression of autophagy-related genes was detected by Western blot. The results showed that, compared to NCM460 cell, LC3B and ATG7 were lower expressed in RKO and HT29 cell ( Figure 10). To detect the relation of NKILA and LINC00174 with autophagy, we stimulated the cells with mTOR inhibitor to activate the autophagy. As shown in the results, significant upregulation of LC3B and ATG7 protein expression was observed between HT29 and NCM460, which suggested that they were related to the activation of autophagy ( Figure 11). In addition, compared with normal cultured cells, the expression of LINC00174 was significantly decreased in both NCM460 and HT29 cells after the addition of mTOR inhibitor, while the expression of NKILA was significantly increased (Figure 12), which suggested that NKILA and LINC00174 were related to autophagy.

DISCUSSION
Cell death occurs via necrosis, apoptosis, and autophagy. Cell apoptosis and autophagy are programmed cell death pathways, and current studies show that apoptosis and autophagy are both regulated by lncRNAs and that regulating the expression level of lncRNAs involved in apoptosis and autophagy regulates the biological behavior of tumor cells. Wei et al. (23) showed that, in CRC, lncRNA CA3-AS1 reduces miR-93 expression by directly binding to miR-93, thereby promoting CRC apoptosis and suppressing tumor proliferation and invasion. In another study, lncRNA FAL1 promoted CRC apoptosis, thereby reducing tumor proliferation (24). ARlncRNAs and apoptosis-related lncRNAs may have similar effects. In recent years, researchers have shown increased interest in the role of lncRNAs in tumor autophagy. lncRNA SLCO4A1-AS1 binds to miR-508-3p to upregulate PARD3, thereby promoting protective CRC autophagy and CRC proliferation (25). Zheng et al. (26) showed that the survival time of CRC patients is shorter for those with high lncRNA HAGLROS expression than for those with low expression. HAGLROS inhibits apoptosis and promotes autophagy to regulate tumor biological behavior mainly via the PI3K/AKT/mTOR and miR-100/ATG5 pathways. These data indicate that apoptosis and autophagy may have common regulatory pathways. We suspect that HAGLROS may be a marker for predicting the prognosis of patients with CRC. Another study showed that lncRNA Malat1 directly binds to miR-101 to regulate CRC autophagy, apoptosis, and proliferation. Further studies showed that the use of 3-methyladenine, an autophagy inhibitor, decreased Malat1-induced cell proliferation and promoted Malat1-induced apoptosis (27). Taken together, these studies show that lncRNAs play important roles in activating CRC autophagy, suggesting that lncRNAs regulate tumor biological behavior by activating CRC autophagy pathways, thereby enhancing tumor cell proliferation. This conclusion should be taken into consideration in clinical treatment.
In addition, ARlncRNAs play important roles in resistance to chemotherapy. Liu et al. (28) showed that lncRNA nuclear paraspeckle assembly transcript 1 (NEAT1) is highly expressed in CRC tissues and cell lines and is negatively correlated with miR-34a, which is involved in autophagy activation via targeted sites (HMGB1, ATG9A, and ATG4B). NEAT1 knockdown significantly inhibits CRC proliferation and enhances sensitivity to 5fluorouracil (5-FU). Wang et al. (29) showed that lncRNA H19 triggers autophagy and induces 5-FU resistance in CRC cells via the miR-194-5p/SIRT1 pathway. H19 expression is significantly increased in patients with recurrent CRC, and the recurrence-free survival time is significantly shorter in patients with high H19 expression than in patients with low H19 expression. Han et al. (30) showed that, in CRC, the expression levels of lncRNA SNHG14 and ATG14, an autophagy-related gene, are significantly increased and  Numerous studies have shown that lncRNA-based models can predict the prognosis of CRC patients (10)(11)(12). However, no ARlncRNA-based model has been established for predicting the prognosis of CRC patients. One study showed that an ARlncRNA-based model can predict the prognosis of patients with lung adenocarcinoma (20). In this study, we established an ARlncRNA-based model to predict the survival of CRC patients for the first time. We performed univariate analysis and obtained 32 prognostic ARlncRNAs from 612 tissue samples from TCGA and then performed multivariate Cox regression analysis to build a risk score model of nine significant prognostic ARlncRNAs (NKILA, LINC00174, AC008760.1, LINC02041, PCAT6, AC156455.1, LINC01503, LINC00957, and CD27-AS1). Next, we calculated the risk score for each CRC patient based on this model. Survival analysis showed that the 5-year OS was significantly higher in the low-risk group (risk score < 0.943) than in the high-risk group. A multivariate ROC curve showed that the model had high sensitivity and accuracy in predicting the 5-year OS rate (AUC = 0.717). Moreover, the risk score from this model, based on nine ARlncRNAs, was an independent predictor of CRC.
LINC01503 is an ARlncRNA that we screened that is negatively correlated with patient prognosis. Previous studies have shown that LINC01503 is highly expressed in CRC cell lines. LINC01503 overexpression promotes CRC proliferation and invasion, while LINC01503 silencing inhibits CRC proliferation In gastric cancer and brain glioma, LINC01503 promotes tumor proliferation and invasion via the Wnt signaling pathway (32,33). Using bioinformatics analysis, we concluded that LINC01503 is an autophagy-related prognostic predictor for CRC, but further research is needed to investigate the mechanism by which LINC01503 regulates tumor biological behaviors via autophagy pathways. Numerous studies have shown that PCAT6 also plays an important role in tumor proliferation and invasion, such as in non-small cell lung cancer (34)(35)(36), breast cancer (37), cervical cancer (38,39), and liver cancer (40,41). Furthermore, PCAT6, a lncRNA, is highly expressed in colon cancer and is closely related to tumor malignancy. High PCAT6 expression is associated with low survival. PCAT6 activates the expression of antiapoptotic ARC and inhibits colon cancer cell apoptosis by increasing EZH2 expression (42). This study showed that PCAT6 is negatively correlated with the prognosis of CRC patients. Moreover, this study showed that PCAT6 is associated with autophagy and that PCAT6 probably increases tumor malignancy via autophagy pathways. Therefore, PCAT6 may regulate tumor biological behaviors through different molecular mechanisms, but research on autophagy pathways is still lacking. Another study showed that PCAT6 is an adverse prognostic predictor of CRC. Moreover, PCAT6 inhibits miR-204 expression, thereby promoting activation of the HMGA2/PI3K pathway and enhancing 5-FU resistance in CRC (43). Further research is needed to investigate additional mechanisms through which PCAT6 affects the biological behaviors of CRC.
Colorectal cancer patients with overexpression of LINC00174 have a poor prognosis. Overexpression of LINC00174 promotes the proliferation, migration, and invasion of colorectal cancer cells by regulating Mir-1910-3p/TAZ and Mir-3127-5p/E2F7 signaling pathways (44,45). LINC00174, as an autophagyrelated lncRNA, secreted by vascular endothelial cells, inhibits autophagy by SRSF1/P53 signaling pathway, which can alleviate myocardial insulin-reperfusion injury (46). In addition, NKILA can enhance autophagy of HK2 cells through Mir-140-5p/ CLDN2/LPS pathway to induce acute kidney injury (47). In our research, we found that NKILA and LINC00174 were related to autophagy in colorectal cancer. However, at present, no study has found whether LINC00174 and NKILA can affect the occurrence and development of colorectal cancer through autophagy, which will also be the focus of our follow-up research.

CONCLUSION
We constructed a novel prediction model based on nine ARlncRNAs to predict the prognosis of CRC patients. Moreover, we verified the relationship among LINC00174, NKILA, and autophagy in colorectal cancer cells. Our results provide new ideas for further research on potential mechanisms involved in regulating the biological behaviors of CRC.

Limitations
The prediction model based on nine ARlncRNAs has some limitations. First, no prospective studies have been conducted to confirm its reliability. Second, while we plotted the complex interactive networks between ARlncRNAs and mRNAs, we did not perform in-depth pathway research. Last, we did not consider each patient's treatment plan in this study, which may affect the study results. In summary, few studies have been conducted to investigate the use of ARlncRNAs to predict the prognosis of patients with CRC and how ARlncRNAs regulate autophagy. In the future, well-designed, randomized, controlled trials are needed to validate the reliability of this model.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: The Cancer Genome Atlas (https://portal.gdc. cancer.gov/repository), the Ensembl database (https://asia. ensembl.org/index.html), and the Human Autophagy database (http://www.autophagy.lu/clustering/index.html).

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

ACKNOWLEDGMENTS
Thanks to the bioinformatics self-study network for providing part of the code and thanks to TGCA for providing the data.