Long non-coding RNA AC099850.4 correlates with advanced disease state and predicts worse prognosis in triple-negative breast cancer

Our understanding of the function of long non-coding RNAs (lncRNAs) in health and disease has evolved over the past decades owing to many advances in genome research. In the current study, we characterized the lncRNA transcriptome enriched in triple-negative breast cancer (TNBC, n = 42) and estrogen receptor-positive (ER+, n = 42) breast cancer compared to normal breast tissue (n = 56). Given the aggressive nature of TNBC, we focused on lncRNAs selectively enriched in TNBC and identified 57 such transcripts. Among these, AC099850.4 was selected for further investigation; its elevated expression was confirmed in a second TNBC cohort (n = 360), where it correlated with a worse prognosis. Network analysis of AC099850.4high TNBC highlighted enrichment in functional categories indicative of cell cycle activation and mitosis. Ingenuity pathway analysis of the differentially expressed genes in AC099850.4high TNBC revealed activation of the canonical kinetochore metaphase signaling pathway, the pyridoxal 5'-phosphate salvage pathway, and the salvage pathways of pyrimidine ribonucleotides. Additionally, upstream regulator analysis predicted activation of several upstream regulator networks, including CKAP2L, FOXM1, RABL6, PCLAF, and MITF, whereas the upstream regulator networks of TP53, NUPR1, TRPS1, and CDKN1A were suppressed. Elevated expression of AC099850.4 correlated with worse short-term relapse-free survival (log-rank p = 0.01). Taken together, our data are the first to reveal AC099850.4 as an unfavorable prognostic marker in TNBC, associated with more aggressive clinicopathological features, and suggest its potential utility as a prognostic biomarker and therapeutic target in TNBC.


Introduction
Breast cancers represent a diverse group of cancers with distinct underlying biological features, which translate into differences in clinical management, response to treatment, and clinical outcome (1). Recent advances in genomic research have led to the classification of breast cancer (BC) into defined molecular subtypes based on the expression of hormone receptors (HR), including the estrogen receptor (ER) and progesterone receptor (PR), as well as ERBB2 [also known as human epidermal growth factor receptor 2 (HER2)]  (7), thus corroborating a prognostic value for several lncRNAs in TNBC. lncRNAs represent a major class of ncRNAs that exceed 200 nucleotides in length and lack protein-coding capacity. lncRNAs can be divided into six groups based on their genomic position, subcellular localization, and function: (1) enhancer lncRNAs, (2) intronic lncRNAs, (3) antisense lncRNAs, (4) sense lncRNAs, (5) intergenic lncRNAs, and (6) bidirectional lncRNAs (8,9). Increasing evidence has implicated lncRNAs in the onset and progression of various human cancers through the regulation of key cellular processes, including proliferation, migration, invasion, and apoptosis, at the transcriptional and post-transcriptional levels (10). Phase II/III clinical trials have highlighted the potential of RNA-based therapeutics, including antisense oligonucleotides (ASOs) and small interfering RNAs (siRNAs), to treat various human diseases (11).
In the current study, we characterized the differentially expressed lncRNAs in TNBC and ER+ breast cancers compared to normal breast tissues. Given the aggressive nature and lack of targeted therapies for TNBC, we then sought to identify lncRNA transcripts expressed in TNBC but not in ER+ BC, which could potentially serve as prognostic biomarkers and therapeutic targets. We subsequently focused on AC099850.4 (alternatively named lnc-SKA2-1, AC099850.3, or ENSG00000265415), revealing AC099850.4 as a novel prognostic biomarker associated with unfavorable disease outcomes in TNBC. Comprehensive bioinformatics and network analyses suggested a plausible role for AC099850.4 in cell cycle regulation.

Results
To provide a global overview of the differentially expressed lncRNAs in different BC subtypes, transcriptomic data from 42 TNBC, 42 ER+HER2− (referred to as ER+ throughout the article), and 56 normal breast tissues (NT) were pseudoaligned to the GENCODE release 33 (V33) reference transcriptome using Kallisto. Hierarchical clustering revealed a distinct lncRNA expression profile for each breast cancer molecular subtype compared to NT (Figure 1A, Supplementary Table S1). Concordantly, principal component analysis (PCA) showed a similar segregation of TNBC from ER+ and NT (Figure 1B). We identified 226 lncRNAs that were upregulated in both TNBC vs. NT and ER+ vs. NT (Figure 1C). Interestingly, 57 lncRNAs were upregulated in TNBC vs. ER+ and in TNBC vs. NT, but not in ER+ vs. NT, suggesting TNBC-specific expression (Figure 1C).
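The subtype-specific selection described above reduces to set operations on the three lists of upregulated lncRNAs. The sketch below is a minimal illustration only; the function names and the toy lncRNA identifiers are hypothetical and are not the authors' code or data.

```python
from __future__ import annotations


def tnbc_specific(up_tnbc_vs_nt: set[str],
                  up_er_vs_nt: set[str],
                  up_tnbc_vs_er: set[str]) -> set[str]:
    """lncRNAs up in TNBC vs. NT and in TNBC vs. ER+, but not in ER+ vs. NT."""
    return (up_tnbc_vs_nt & up_tnbc_vs_er) - up_er_vs_nt


def shared_tumor_up(up_tnbc_vs_nt: set[str], up_er_vs_nt: set[str]) -> set[str]:
    """lncRNAs up in both tumor subtypes relative to normal tissue."""
    return up_tnbc_vs_nt & up_er_vs_nt


# Toy identifiers (hypothetical, not real study output)
up_tnbc_vs_nt = {"lncA", "lncB", "lncC", "lncD"}
up_er_vs_nt = {"lncB", "lncE"}
up_tnbc_vs_er = {"lncA", "lncB", "lncD", "lncF"}

print(sorted(tnbc_specific(up_tnbc_vs_nt, up_er_vs_nt, up_tnbc_vs_er)))  # ['lncA', 'lncD']
print(sorted(shared_tumor_up(up_tnbc_vs_nt, up_er_vs_nt)))               # ['lncB']
```

In the study, the first set operation corresponds to the 57 TNBC-specific lncRNAs and the second to the 226 lncRNAs shared by both tumor subtypes.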

AC099850.4 expression correlates with advanced tumor grade and worse prognosis
Among the identified TNBC-enriched lncRNAs, AC099850.4 was chosen for further analysis because its expression was enriched in TNBC and it had not previously been implicated in TNBC. The expression of AC099850.4 in TNBC, ER+, and NT is shown in Figure 2A. We subsequently confirmed the upregulation of AC099850.4 in a larger TNBC cohort (n = 360) compared to normal tissue (n = 88), with a fold change of 2.2 (adjusted p = 1.3 × 10^−30; Figure 2B). Interestingly, AC099850.4 expression was highest in TNBC with advanced tumor grade (Figure 2C) and in the BLIS TNBC subtype, which has the worst prognosis (18) (Figure 2D).

AC099850.4 is an unfavorable prognostic biomarker for short-term relapse-free survival in TNBC
We subsequently assessed the prognostic value of AC099850.4 with respect to relapse-free survival (RFS) in TNBC. To that end, we divided the 360-patient TNBC cohort into AC099850.4high and AC099850.4low groups based on median AC099850.4 expression and performed Kaplan-Meier survival analysis. AC099850.4 expression was not significantly associated with long-term RFS (log-rank p = 0.4, Figure 6A). However, when we assessed the ability of AC099850.4 to predict short-term RFS (24 months), high AC099850.4 expression correlated with a worse prognosis (log-rank p = 0.01, Figure 6B). These data support a role for AC099850.4 as an unfavorable prognostic biomarker for short-term RFS.
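The median-based grouping and the short-term analysis can be sketched as follows. This is a minimal illustration under two stated assumptions that the text does not specify: values equal to the median are assigned to the low group, and short-term RFS is modeled as administrative censoring of follow-up at 24 months (relapses occurring after 24 months are counted as censored at 24 months). The helper names are hypothetical.

```python
from __future__ import annotations

from statistics import median


def median_split(expr: dict[str, float]) -> dict[str, str]:
    """Label each patient 'high' or 'low' relative to the cohort median.

    Assumption: values equal to the median are assigned to 'low'.
    """
    cutoff = median(expr.values())
    return {pid: ("high" if value > cutoff else "low") for pid, value in expr.items()}


def truncate_follow_up(time_months: float, relapsed: bool,
                       horizon: float = 24.0) -> tuple[float, bool]:
    """Censor follow-up at `horizon` months (assumed definition of short-term RFS).

    Patients followed beyond the horizon are treated as relapse-free at the
    horizon, even if they relapsed later.
    """
    if time_months > horizon:
        return horizon, False
    return time_months, relapsed


# Toy expression values (hypothetical)
groups = median_split({"p1": 1.0, "p2": 3.0, "p3": 5.0, "p4": 7.0})
print(groups)                          # {'p1': 'low', 'p2': 'low', 'p3': 'high', 'p4': 'high'}
print(truncate_follow_up(30.0, True))  # (24.0, False)
print(truncate_follow_up(10.0, True))  # (10.0, True)
```

With an even-sized cohort and no ties, this split yields two equal groups, consistent with dividing 360 patients into two groups of 180.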

Discussion
Understanding the biological roles of lncRNAs has expanded our knowledge of the functions of this class of epigenetic regulators in cancer. In the current study, we characterized the lncRNA transcriptome of TNBC and ER+ breast cancers and identified 57 lncRNAs that were upregulated in TNBC vs. ER+ and in TNBC vs. NT, but not in ER+ vs. NT, suggesting expression restricted to TNBC. We then focused on the expression of AC099850.4 in TNBC. Its expression was highest in TNBC patients with advanced tumor grade and in the BLIS subtype, which has the worst prognosis among TNBC subtypes (18). In a larger TNBC cohort (n = 360), higher AC099850.4 expression was associated with enrichment of functional categories indicative of cellular proliferation and mitosis.
More in-depth computational analyses using IPA revealed activation of several functional categories in AC099850.4high TNBC, including the canonical kinetochore metaphase signaling pathway, the pyridoxal 5'-phosphate salvage pathway, and the salvage pathways of pyrimidine ribonucleotides. Additionally, upstream regulator analysis predicted activation of CKAP2L, FOXM1, RABL6, PCLAF, and MITF and suppression of TP53, NUPR1, TRPS1, and CDKN1A in AC099850.4high TNBC. Moreover, our data highlighted AC099850.4 as an unfavorable prognostic biomarker predicting short-term RFS in TNBC. In agreement with our data, AC099850.4 was recently identified as part of an eight-lncRNA biomarker panel in head and neck squamous cell carcinoma (19). Similarly, elevated expression of AC099850.4, an m6A-related lncRNA, was reported in patients with oral squamous cell carcinoma (20), and elevated AC099850.4 expression was also correlated with worse survival in lung cancer (21). Recently, AC099850.4 was reported to be highly expressed and correlated with a worse prognosis in non-small cell lung cancer (22). Similarly, a recent study on hepatocellular carcinoma (HCC), which included 374 HCC and 160 non-HCC samples, identified a prognostic panel of five immune-related lncRNAs, including AC099850.3. Silencing of AC099850.3 inhibited HCC cell proliferation and migration and led to significant inhibition of the cell cycle molecules PLK1, TTK, CDK1, and BUB1 and of the immune checkpoint molecules CD155 and PDL1 (23). Several recent studies have also described AC099850.4 as an immuno-autophagy-related lncRNA (24), an epithelial-mesenchymal transition-related lncRNA (25), and a cancer cell stemness-associated lncRNA (26) in HCC. These reports further support an oncogenic role for AC099850.4 in various human cancers, which remains to be validated in TNBC.

FIGURE Protein-protein interaction (PPI) network analysis of upregulated genes in AC099850.4high vs. AC099850.4low TNBC, based on STRING analysis.

While several studies have implicated AC099850.4 in various other cancer types, our data are the first to implicate this lncRNA in TNBC prognosis. Our data suggest the potential use of AC099850.4 as a prognostic biomarker and therapeutic target in TNBC, which warrants further investigation.

Conclusion
Our data are the first to identify AC099850.4 as a novel prognostic biomarker for TNBC, correlating with advanced tumor grade and worse short-term relapse-free survival.

Limitations of the study
Our data provide solid evidence implicating AC099850.4 as a prognostic biomarker in TNBC. One limitation of the current study is that the cohort we analyzed included only ER+ and TNBC patients and no HER2+ patients; hence, the expression of AC099850.4 in HER2+ BC remains to be assessed. Because our study was based on patients' transcriptomic data, the utility of this lncRNA for patient prognosis remains to be validated in multiple TNBC cohorts. The functional consequences of AC099850.4 depletion in TNBC cell models remain to be examined in vitro, and the potential use of RNA-based therapeutics to target AC099850.4 systemically also remains to be addressed in vivo. Our data highlighted multiple enriched GO categories and networks in AC099850.4high vs. AC099850.4low TNBC; however, the exact mechanism by which AC099850.4 exerts its biological functions, and its interacting protein partners, remain to be identified using biochemical approaches such as comprehensive identification of RNA-binding proteins by mass spectrometry (ChIRP-MS) (27).

RNA-Seq data analysis and bioinformatics
Raw RNA sequencing data were retrieved from the Sequence Read Archive (SRA) database under accession no. PRJNA251383, consisting of 42 TNBC, 42 ER+HER2−, and 56 normal breast tissue samples. The Kallisto index was constructed as a de Bruijn graph from the GENCODE release 33 (V33) reference transcriptome with a k-mer length of 31. FASTQ files were subsequently pseudoaligned to the generated index using KALLISTO 0.4.2.1, as previously described (3,28). Normalization (TPM, transcripts per million) was conducted using KALLISTO 0.4.2.1. A detailed description of the study subjects can be found in Ref. (29). Normalized expression data (TPM) were subsequently imported into AltAnalyze v.2.1.3 software for differential expression and principal component analysis (PCA) using a fold-change cut-off of 2.0 and an adjusted p-value cut-off of < 0.05 (30). Low-abundance transcripts (< 1.0 TPM raw expression value) were excluded from the analysis. The Benjamini-Hochberg method was used to control the false discovery rate (FDR). The marker finder prediction was carried out as previously described. PRJNA486023 (360 TNBC and 88 normal samples) was retrieved from the SRA database using the SRA toolkit v2.9.2 as previously described (31,32), mapped to GENCODE release 33 as described above, and used to confirm our findings. Detailed information on the study subjects in this validation cohort can be found in Jiang et al. (33).
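The normalization and filtering steps above can be illustrated with a short sketch. It assumes the standard TPM definition (estimated counts divided by effective transcript length, rescaled to sum to one million), computes fold change from log2(TPM + 1) group means, treats a transcript as low-abundance when its mean TPM is below 1.0 in both groups, and implements the Benjamini-Hochberg step-up adjustment. These choices are assumptions for illustration; kallisto and AltAnalyze may differ in their details, and all function names are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean


def tpm(est_counts: list[float], eff_lengths: list[float]) -> list[float]:
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(est_counts, eff_lengths)]
    total = sum(rates)
    return [1e6 * r / total for r in rates]


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(group_a: list[float], group_b: list[float]) -> float:
    """log2 fold change of group_a over group_b from mean log2(TPM + 1) (assumed pseudocount)."""
    la = mean(math.log2(x + 1) for x in group_a)
    lb = mean(math.log2(x + 1) for x in group_b)
    return la - lb


def is_upregulated(group_a, group_b, adj_p, fc_cutoff=2.0, alpha=0.05, min_tpm=1.0):
    """Apply abundance, fold-change, and FDR cut-offs (assumed filtering rule)."""
    if max(mean(group_a), mean(group_b)) < min_tpm:
        return False  # low-abundance transcript: excluded
    return log2_fold_change(group_a, group_b) >= math.log2(fc_cutoff) and adj_p < alpha


# Toy values (hypothetical)
print(tpm([100, 300], [100, 100]))                          # [250000.0, 750000.0]
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(is_upregulated([7.0, 7.0], [1.0, 1.0], adj_p=0.01))   # True
print(is_upregulated([0.2, 0.3], [0.1, 0.1], adj_p=0.001))  # False
```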

Protein-protein interaction and KEGG network analysis
Genes upregulated in AC099850.4high TNBC (n = 180) were subjected to PPI network analysis using the STRING database (v10.5) to illustrate interacting genes/proteins based on prior knowledge and prediction, as described before (34). KEGG pathway analysis was conducted using DAVID as described earlier (35).
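STRING summarizes a submitted network with statistics such as the number of nodes and edges, the average node degree, and the average local clustering coefficient. A minimal sketch of how the last two quantities are defined on an undirected interaction list follows; the function name and the toy network are hypothetical, and STRING's own computation (for example, its handling of isolated nodes) may differ.

```python
from __future__ import annotations

from itertools import combinations


def network_stats(edges: list[tuple[str, str]], nodes: tuple[str, ...] = ()):
    """Node count, edge count, average degree, and average local clustering coefficient.

    `nodes` lets isolated proteins (no edges) be counted; nodes with fewer than
    two neighbors contribute a clustering coefficient of 0 (an assumed convention).
    """
    adj: dict[str, set[str]] = {n: set() for n in nodes}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n_nodes = len(adj)
    if n_nodes == 0:
        return 0, 0, 0.0, 0.0
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    avg_degree = 2 * n_edges / n_nodes
    total_cc = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total_cc += 2 * links / (k * (k - 1))
    return n_nodes, n_edges, avg_degree, total_cc / n_nodes


# Hypothetical toy network: triangle A-B-C plus a pendant node D
print(network_stats([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]))
```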

Gene set enrichment and modeling of gene interactions networks
Genes upregulated in AC099850.4high TNBC were imported into the Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems; http://www.ingenuity.com/) and subjected to functional annotation and regulatory network analysis using the upstream regulator analysis (URA), downstream effects analysis (DEA), mechanistic network (MN), and causal network analysis (CNA) prediction algorithms. IPA predicts functional regulatory networks from gene expression data and assigns each network a significance score according to the fit of the network to the set of focus genes in its database. The score is the negative log of the p-value, which represents the probability that the focus genes in the network are found together by chance.
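The network score described above is a negative-log transformation of an overlap p-value. IPA's implementation is proprietary; the sketch below assumes a right-tailed Fisher's exact (hypergeometric) test for the overlap between focus genes and a network, and a base-10 logarithm. Both are assumptions for illustration, and the function names and toy numbers are hypothetical.

```python
import math


def overlap_p_value(k: int, n: int, big_k: int, big_n: int) -> float:
    """Right-tailed hypergeometric p-value P(X >= k).

    big_n: genes in the reference set; big_k: genes in the network;
    n: focus genes; k: focus genes that fall in the network.
    """
    upper = min(big_k, n)
    hits = sum(math.comb(big_k, i) * math.comb(big_n - big_k, n - i)
               for i in range(k, upper + 1))
    return hits / math.comb(big_n, n)


def network_score(p: float) -> float:
    """Score as the negative base-10 log of the p-value (assumed base)."""
    return -math.log10(p)


# Toy numbers (hypothetical): 10 reference genes, 5 in the network,
# 5 focus genes, all 5 of which fall in the network.
p = overlap_p_value(5, 5, 5, 10)
print(p, network_score(p))  # p = 1/252
```

A smaller chance-overlap probability gives a larger score, so higher-scoring networks are those least likely to have assembled their focus genes by chance.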

Survival and statistical analysis
Kaplan-Meier survival analysis and plotting were conducted using IBM SPSS version 26 software. For survival analysis, patients were grouped into high- or low-expression groups based on the median expression of the corresponding lncRNA. The log-rank test was used to compare outcomes between expression groups. GraphPad Prism 9.0 software (San Diego, CA, USA) was used to compare lncRNA expression as a function of tumor grade and lymph node (LN) status. An unpaired two-tailed t-test was used to compare two groups, while one-way ANOVA was used to compare multiple groups. The Benjamini-Hochberg method was used to control the false discovery rate (FDR). A p-value < 0.05 was considered statistically significant.
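SPSS was used for the actual survival analyses; for readers who want to see what the Kaplan-Meier estimate and the log-rank test compute, here is a minimal pure-Python sketch on toy data. It uses the standard product-limit estimator and the two-group log-rank statistic with its hypergeometric variance, converting the 1-degree-of-freedom chi-square statistic to a p-value via the complementary error function. Function names and data are hypothetical, and the sketch is not intended to reproduce SPSS output exactly (confidence intervals and alternative tie-handling options are omitted).

```python
import math


def kaplan_meier(times, events):
    """Product-limit survival estimate as [(time, S(time))] at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 df) and p-value."""
    event_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data: months to relapse, 1 = relapse, 0 = censored
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
print(log_rank([1, 2], [1, 1], [3, 4], [1, 1]))
```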

FIGURE 1 LncRNA transcriptional landscape in different breast cancer subtypes and normal breast tissue. (A) Hierarchical clustering of TNBC (n = 42), ER+ breast cancer (n = 42), and normal breast tissue (n = 56) based on differentially expressed lncRNAs. Each column represents one sample, and each row represents a single lncRNA. The expression level of each lncRNA (log-transformed) is depicted according to the color scale. (B) Principal component analysis (PCA) of the lncRNA transcriptome of TNBC, ER+ breast cancer, and normal breast tissue. (C) Venn diagram depicting the overlap between upregulated lncRNAs in TNBC vs. normal, ER+ vs. normal, and TNBC vs. ER+.

FIGURE Ingenuity pathway analysis of differentially expressed genes in AC099850.4high vs. AC099850.4low TNBC. (A) Tree map (hierarchical heatmap) depicting affected functional categories based on differentially expressed genes in AC099850.4high vs. AC099850.4low TNBC, where the major boxes represent a category of diseases and functions. Upstream regulator analysis depicting activated (B) and inhibited (C) networks in AC099850.4high vs. AC099850.4low TNBC.

FIGURE 6 Relapse-free survival (RFS) analysis according to AC099850.4 expression. (A) Long-term RFS analysis in a cohort of TNBC based on median AC099850.4 expression. (B) Short-term RFS analysis in a cohort of TNBC based on median AC099850.4 expression. The log-rank test was used to compare groups.