Identification of Prognostic miRNAs Associated With Immune Cell Tumor Infiltration Predictive of Clinical Outcomes in Patients With Non-Small Cell Lung Cancer

Background A detailed means of prognostic stratification in patients with non-small cell lung cancer (NSCLC) is urgently needed to support individualized treatment plans. Recently, microRNAs (miRNAs) have been used as biomarkers due to their previously reported prognostic roles in cancer. This study aimed to construct an immune-related miRNA signature that effectively predicts NSCLC patient prognosis. Methods The miRNAs and mRNA expression and mutation data of NSCLC was obtained from The Cancer Genome Atlas (TCGA). Immune-associated miRNAs were identified using immune scores calculated by the ESTIMATE algorithm. LASSO-penalized multivariate survival models were using for development of a tumor immune-related miRNA signature (TIM-Sig), which was evaluated in several public cohorts from the Gene Expression Omnibus (GEO) and the CellMiner database. The miRTarBase was used for constructing the miRNA-target interactions. Results The TIM-Sig, including 10 immune-related miRNAs, was constructed and successfully predicted overall survival (OS) in the validation cohorts. TIM-Sig score negatively correlated with CD8+ T cell infiltration, IFN-γ expression, CYT activity, and tumor mutation burden. The correlation between TIM-Sig score and genomic mutation and cancer chemotherapeutics was also evaluated. A miRNA-target network of 10 miRNAs in TIM-Sig was constructed. Further analysis revealed that these target genes showed prognostic value in both lung squamous cell carcinoma and adenocarcinoma. Conclusions We concluded that the immune-related miRNAs demonstrated a potential value in clinical prognosis.


INTRODUCTION
The most malignant and most commonly encountered lung cancer subtype is non-small cell lung cancer (NSCLC) (1). The NSCLC subtype can further be classified as either lung adenocarcinoma (LUAD) or lung squamous cell carcinoma (LUSC). Forty percent of all lung cancers are of the LUAD subtype, with LUSC reported as the second leading cause of lung malignancy-related death, resulting in an average of 400,000 deaths worldwide annually (2). Although several new treatment regimens including chemotherapeutic and biological agents have been introduced, the effectiveness of these protocols have been marred by the occurrence of drug resistance, leading to inevitably poor outcomes for patients with advanced NSCLC (3). Immune-directed therapy has, in recent years, shown better efficacy and lower toxicity rates over regular chemotherapy in NSCLC. Nevertheless, durable benefits from immunotherapy are reported in only 25-30% of patients (4). Therefore, more effective prognostic biomarkers for risk stratification in NSCLC are required.
MicroRNAs (miRNAs) represent long, non-coding RNAs of approximately 22 nucleotides in length. These molecules are central in posttranscriptional regulation (5). Both tumor initiation and metastasis have been reported to depend heavily on miRNA expression, with certain miRNAs shown to be associated with poor outcomes in NSCLC (6). A myriad of immune-related processes such as the development, activation, and effector functions of various innate and adaptive immune cells have been linked to miRNAs, which therefore appear to be directly responsible in regulating the infiltration of immune cells into tumors (7,8). Growing evidence has depicted the key function of the tumorinfiltrating immune cell (TIIC) in tumor progression and prognosis (9,10). Signatures associated with TIIC show promising predictive values in prognosis and responses to immunotherapy in patients with NSCLC (11,12). Previous research showed that these signatures may be obtained by exploring the expression of certain miRNAs. In cervical cancer, miR-1468-5p was found to upregulate lymphatic PD-L1 and augment lymphangiogenesis, both of which result in dysregulated T cell immunity (13). Reduced miR-4772-3p levels were inversely related to the concentration of Tregs in malignant pleural effusion (MPE) (8). Furthermore, the miR141-CXCL1-CXCR2 pathway was found to modulate Tregs migration into MPE (7). Nevertheless, these studies were on single miRNAs only. An integrated model comprising of a variety of biomarkers has been shown to offer higher predictive capabilities in comparison to models of single biomarkers (14). Construction of multiple biomarker models using conventional Cox regression models has been problematic and often suffers from high rates of model overfitting especially in the context of a large number of biomarkers. The least absolute shrinkage and selection operator (LASSO)-penalized Cox model has been introduced to implement variable selection and has been applied successfully for creating models of several biomarkers (15). This study uses the LASSO technique to construct a multi-miRNA-based signature to provide an immune infiltration score (TIM-Sig score) which is able to stratify NSCLS patients according to their prognosis. We further systematically correlated the TIM-Sig score with available genetic and clinical features of NSCLC patients.

Dataset Preprocessing
Transcriptional profiles and clinical information for lung cancer were obtained from the Gene Expression Omnibus (GEO, http:// www.ncbi.nlm.nih.gov/geo) and The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov). The miRNA expression profiles were obtained from the UCSC Xena browser (GDC hub: https:/gdc.xenahubs.net). A total of 1,884 miRNAs were obtained for the following analysis. Five of the following NSCLC cohorts were also processed by log2 transformation: the TCGA-LUAD/LUSC cohort, GSE16025 cohort, GSE27435 cohort, GSE31210 cohort, and GSE3141 cohort. A brief summary of the clinical and pathological characteristics is shown in Table 1. We also downloaded somatic mutation data from TCGA as calculated by the mutect2 workflow.

LASSO Mixture and Cox Regression Models for Predicting Survival
Tumor purity and tumor immune scores were derived using the ESTIMATE algorithm which is a novel algorithm by Yoshihara et al. It is a method using gene expression profiles to evaluate the fraction of stromal and immune cells in tumor samples. The ESTIMATE algorithm generates three scores: stromal score, immune score, and estimate score (16). The immune score was used to selected immune-related miRNAs. The Spearman correlation coefficient between differentially expressed miRNAs (DEMs) and the immune score was calculated with significance set at (|R| > 0.2, P < 0.01). A total of 35 miRNAs significantly correlated with the immune score were identified and analyzed in the LASSO regression model. The R package "glmnet" and "survival" were used to carry out LASSO and Cox regression analyses to assess the relationship between overall NSCLS patient survival and DEG expression levels. We identified a tumorinfiltrating immune-related miRNA signature score (TIM-Sig score) with the following formula: in which n represents the total number of prognostic miRNAs or genes, EXP I represents gene i profile expression and coef i represents an estimate of the gene i regression coefficient as identified using the multivariable Cox regression analysis or LASSO.

Assessment of the Clinical Significance of the miRNA Signature
To determine the value of the constructed model in the clinical management of patients with lung cancer, miRNA profiles and drug sensitivity IC50 values of the NCI-60 panel of human cancer cell lines were extracted from the CellMiner database (https://discover. nci.nih.gov/cellminer/) (30). The therapeutic effects of 161 Food and Drug Administration (FDA)-approved drugs in NSCLC patients were determined. The Wilcoxon test was used to analyze the significance between differences in the IC50 Z-score between the high-and low-risk cohorts. Results are depicted in terms of box drawings plotted using the ggplot2 function of R.

MiRNA-Target Interactions
The miRTarBase (http://mirtarbase.cuhk.edu.cn/php/index.php) is a database containing over 430,000 miRNA-target interactions (MTIs) (31). All documented MTIs have been verified using nextgeneration sequencing, microarray, western blot, and reporter assay experiments. We obtained the target information of 10 miRNA in TIM-Sig to construct the miRNA-target network.

Functional Analysis
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) tool. We obtained pathway function annotations of 1,862 target genes. The statistical threshold was set as: P < 0.05.

Statistical Analysis
The R software (version 3.6.1, http://www.R-project.org) was used to derive all statistical analyses. Differentially expressed miRNAs or genes were calculated using R packages "DESeq2." Spearman correlation between signatures were calculated by the corr. test () function in the R program. Overall survival (OS) was predicted using Kaplan-Meier survival plots. The R package "survival" function was used to assess for differences between the OS of high-and low-groups, with a p-value of less than 0.05 taken to indicate statistical significance. The Wilcoxon test allowed for inter-group comparisons. Euclidean distances and Ward's linkage methods were used to carry out hierarchical cluster analyses. Protein-protein interaction (PPI) networks were visualized on STRING tools.

MiRNA-Targets Network
The construction of miRNA-target networks and parameter settings were completed using Cytoscape tools (version 3.6.0). A summary and oncoplot of mutation data were calculated using R package "maftools." All statistical tests with a p-value of less than 0.05 were identified as having achieved statistical significance (*p-value < 0.05; **p-value < 0.005; ***p-value < 0.0005; ****p-value < 0.00005).

Screening of Candidate
Immune-Related miRNAs Figure S1 demonstrates the workflow of this study. In order to screen potential tumor immune-related miRNA biomarkers in NSCLC, a total of 1,884 miRNAs were obtained from the TCGA LUAD/LUSC cohort. Next, we filtered the miRNAs according to the following selection criteria: (1) positive expression in more  Figure S2A). The ESTIMATE algorithm was used to determine tumor purity and tumor immune scores. Spearman correlation analysis was then applied to assess the relationship between the 78 miRNAs and immune score, resulting in a total of 35 miRNAs which were significantly associated with the immune scores (|Spearman correlation| > 0.2 and P < 0.01) ( Figure 1C and Table S1). Interestingly, we found that 15 miRNAs were significantly positively correlated to the stromal score also correlated positively to the immune scores. All of these miRNAs were negatively associated with tumor purity ( Figure 1C and Figures  S2B, C).

Prognostic Value of TIM-Sig
The GSE27435 and GSE16025 cohorts were used to verify our constructed TIM-Sig. The TIM-Sig score was determined for all subjects in the cohort, with the median values used to stratify patients as being either high-or low-risk. Patients with higher risk scores were noted to have poorer OS in contrast to those with lower risk scores (GSE16025: P = 0.0033; GSE27435: P = 0.035), as depicted in Figures 2A, B. The clinical value of the constructed 10-miRNA signature in prognosticating patients with lung cancer was verified.

Potential of the TIM-Sig as an Indicator of Immune and Clinical Factors
We next investigated whether the TIM-Sig score was associated with tumor TNM classification or patient gender. We found a significant difference of TIM-Sig score with tumor size, distant metastasis as well as stage (Wilcoxon test, p-value <0.05; Figure 2C and Figure S3). The difference of some immune factors such as the CYT activity, APM score, TILs score, TIS score, chromosomal instability level, tumor mutation burden, IFN-g expression signature, and T cell infiltration score (TIS) between TIM-Sig high-and low-risk groups were also assessed. A higher immune score (Wilcoxon test, p-value = 3.9e-06; Figure 2D), CD8 T cell score (Wilcoxon test, p-value = 8.5e-06; Figure 2F), HRD (Wilcoxon test, p-value = 6.02e-06; Figure 2G), IFN-g expression signature (Wilcoxon test, p-value = 0.00098; Figure 2H), CYT activity (Wilcoxon test, p-value = 0.0035; Figure 2I), TILs score (Wilcoxon test, p-value = 2.7e-07; Figure 2K), TIS score (Wilcoxon test, p-value = 0.00049; Figure 2L) and TMB (Wilcoxon test, p-value = 0.00016; Figure 2M) were observed in the low-risk group of NSCLC patients. On the contrary, the higher value of tumor purity (Wilcoxon test, p-value = 6.6e-05; Figure 2E) and APM (Wilcoxon test, p-value = 5.4e-07; Figure 2J) were found in high risk group. Generally, these immune factors varied significantly between highand low-risk groups. Furthermore, we also investigated the correlation between the TIM-Sig score and above immune-related factors. We found that the majority of these immune factors were negatively correlated with the TIM-Sig score ( Figure S4). Based on these results, we conclude that there exists a close relationship between TIM-Sig and immune infiltration as well as the immune escape mechanism.
Relationship Between Mutation and TIM-Sig Figure 3A depicts identified somatic mutations in LUAD and LUSC patients. TP53 mutations were found in 58% of samples based on TCGA data, which depict the top 20 most frequently encountered gene mutations in lung cancer ( Figure 3B). In addition, we compared the TIM-Sig score between the mutant   Figure 3C). The frequency that each mutation was encountered also varied between high-and low-risk groups ( Figure S5).

TIM-Sig Could Predict Chemotherapeutics Response
We then investigated whether the TIM-Sig could predict chemosensitivity. To do this, we calculated the TIM-Sig score of NCI60 cell lines using the expression data available in a cellminer database (60 cell lines). The association between the TIM-Sig score and the inhibitory centration (IC 50 ) value of 161 FDA-approved drugs across 60 cell lines were calculated. The result showed that Eribulin mesylate, Olaparib, Brigatinib, Bleomycin, Fulvestrant, Gemcitabine, Dromostanolone Propionate, Imiquimod, and Digoxin appeared to correlate significantly with the risk model (|Spearman correlation| > 0.2 and p < 0.01, Figure 4A). A high immune score was linked to a lower half inhibitory centration (IC 50 ) of medications including Irinotecan (Wilcoxon test, p = 0.039, Figure 4B), Methotrexate (Wilcoxon test, p < 0.047, Figure 4C), Oxaliplatin (Wilcoxon test, p < 0.0034, Figure 4D), and Pemetrexed (Wilcoxon test, p < 0.008, Figure 4E). These findings suggest that the model was able to function as a chemosensitivity predictor.

Identification of TIM-Sig Regulated Targets
To further investigate the function of the 10 miRNA components in the TIM-Sig, a total of 1,862 experimentally validated targets of the 10-miRNAs signature were extracted from the miRTarBase database ( Figure 5A). KEGG pathway analysis revealed that the miRNAs were enriched in cancer, transcriptional dysregulation in cancer, the Hippo signaling pathway, cell cycle, the MAPK signaling pathway as well as other cancer signaling pathways (Top-20 results, Figure 5B). Moreover, we performed KEGG pathway analysis for the targets of each miRNA and revealed that 8 out of 10 were enriched in cancer and immune-related signaling pathways ( Figure 5C), suggesting that 10 miRNAs were associated with immune function and metastasis in cancer. Next, to further explore the relationship between NSCLC patient survival and miRNA targets, differentially expressed genes (DEGs) between normal and tumor samples derived from the TCGA dataset were identified. We obtained a total of 6,914 DEGs, of which there were 403 overlaps with 1,862 targets ( Figures 6A, B). We found that the TCGA cohort could be grouped into two clusters (C1 and C2) by hierarchical clustering using the 403 overlapped genes ( Figure 6C). Survival analysis showed that LUAD-C2 had a good prognosis (log-rank p = 0.0014; Figure 6D). There was marked variability in survival rates in the two groups in the TCGA-LUAD cohort, although none was discovered in

Construction of the TIM-Sig Targets-Based Prognostic Signature
A univariate Cox regression analysis was used to determine the relationship between the 403 overlapped target genes and overall NSCLC patient survival. A total of 49 genes were found to be related to OS in NSCLC patients ( Table S3). The expression of 49 genes between the C1 and C2 group is shown in Figure 7A. Figure 7C demonstrates the interaction of miRNAs and the 49 targets. Multivariate Cox regression analysis was then carried out to determine the relationship between genes with OS. Of these, six genes demonstrated significant ability to prognosticate NSCLC (HR > 1, P < 0.05; Figure 7B and Table S4). The protein-protein interaction (PPI) network also demonstrated close interactions between VEGFC, ALDOA, and PDGFB ( Figure 7H). The risk scores used for predicting prognostic values were derived as follows: RS (patient) = (0.1190*expression value of VEGFC) + (0.0339*expression value of BEST3) + (0.0351*expression value of A1CF) + (0.2608*expression value of ALDOA) + (0.0960* expression value of HOXC4) + (0.1713*expression value of PDGFB). All subjects were separated into low-or high-risk groups. Kaplan-Meier analysis identified a poorer OS in those of high-risk compared to those of low-risk groups (log-rank p < 0.0001; Figure 7D). Prognosis was good in those of LUAD-lowrisk and LUSC-low-risk groups (log-rank p = 0.00021; Figure 7E). In addition, application of these formulas in our verification cohorts also found that patients with low-risk were more likely to have better OS (GSE31210: log-rank p < 0.0001, GSE3141: log-rank p = 0.0023; Figures 7F, G). GO and KEGG pathway analysis revealed that these genes were enriched in cancer and immune-related functions, such as cell motility and Glycolysis/Gluconeogenesis ( Figure 7I). Finally, we compared the frequency of mutations in the six genes between the two groups ( Figures 7J-L). The most commonly encountered mutation was the VEGFC mutation (23%, Figure 7K). Additionally, we compared the differences of mutation among RS genes in low-and high-risk groups. An obvious difference of mutation location of 6 risk genes between high and low groups were observed. (Wilcoxon test, p < 0.05; Figure 7L). Taken as a whole, these findings highlight the significant value of a six-gene signature in prognosticating patients. individualized immune signature that can estimate prognosis in patients with early-stage non-squamous NSCLC (32). Hawazin et al. characterized the molecular subtypes of NSCLC, which demonstrated important differences in immune host response (33). Also, our previous study identified different molecular subtypes of NSCLC according to the immune landscape and constructed a prognostic model (18). However, the role of non-coding RNA, especially miRNAs in the NSCLC immune microenvironment have not been well elucidated. Some immune-related miRNAs were found to be fundamental in the regulation of innate and immune responses to tumor cells. But these studies were only focused on individual immune-related miRNAs in limited samples. In this study, we identified potential tumor immune-related miRNA biomarkers from miRNA-seq profiling data in TCGA. The constructed immune-related miRNA signature was tested and found to be able to function as a means stratify the risk of NSCLC patients. Of these 10 miRNAs included in the signature, a number have previously been explored in cancer research. One example is the central role of miR-146a in the melanoma immune microenvironment (34). Combined inhibition of PD-1 and miR-146a may be able to elicit an anti-tumor immune response (34). Furthermore, miR-27b has been characterized as a biomarker for recurrent ovarian cancer (35). Our novel miRNA profile also includes yet to be reported miRNAs which may hold significant prognostic values in NSCLC. The immuneassociated functions of these miRNAs were confirmed by stratifying the subjects into low-and high-risk cohorts. We found that the low-risk group had a markedly higher immune score and lower tumor purity. These observations were confirmed by the higher tumor cell aggregation in the low-risk group which was represented by TILs and TIS. This is consistent with previous reports that a high degree of immune cell infiltration was responsible for a significantly favorable prognosis in NSCLC (18). Our results also demonstrated that the lowrisk group possessed raised CYT and TMB expressions, with higher degrees of CD8+ T cell infiltration. Similar findings of better outcomes in those with higher CYT levels have also been reported in cancer patients (18,36). In patients with resected NSCLC, higher TMB scores were indicative of a more favorable prognosis (37). CD8+ T cell infiltration appeared to function as a superior predictive biomarker in response to anti-PD-1 immunotherapy (38). Our data indicated that the TIM-Sig score was significantly higher in TP53 and CSMD3 mutation samples. Other studies have reported a higher proportion of activated immune cell infiltration in patients with TP53 mutations, resulting in a significantly prolonged progression-free survival in the LUAD cohort (39). CSMD3 mutations have been characterized as tumors with high concentrations of T cells in patients with high-grade serous ovarian carcinoma (40). In addition, the CSMD3 mutation was related to improved response to anti-PD1/PD-L1 and higher survival rates solid tumors (41,42).

DISCUSSION
Previous studies have shown that immunogenomic-derived immune scores were indicative of chemotherapeutic benefits (43). We subsequently investigated whether the TIM-Sig could predict chemosensitivity in NSCLC. Our results suggested that the IC50 values were significantly higher in the low-risk group for some anti-cancer agents. Among these agents, irinotecan represents a widely used chemotherapeutic medication in treating solid tumors and its sensitivity has been reported to correlate with CD8+ T cell fraction in pancreatic cancer (44). CD8 effector cells have previously been reported to enhance the anti-tumor response of methotrexate, another anti-cancer agent, in breast cancer (45). However, we are unable to investigate TIM-Sig prognostic significance in regard to response to immunotherapy due to the lack of subjects who received treatment involving immune checkpoint inhibitors. This is one limitation in our study. Functional analysis of these 10 miRNAs in TIM-Sig may further our understanding of their individual roles in NSCLC. To this end, we obtained the validated targets of these miRNAs. KEGG pathway analysis uncovered that the target genes were most enriched in cancer-related pathways. Additionally, the targets of individual miRNAs were also enriched in cancer and immune-related pathways comprising of pathways in cancer, adherens junction, the chemokine signaling pathway, and the HIF-1 signaling pathway. Dysregulation of adherens junction function is critical in modulating efficient collective invasion and migration of carcinoma cells (46). The adherens junction was also an activated pathway in breast cancer cases with low immunity (29). The classification based on HTF 1 signaling pathway profile was able to determine subgroups of prostate cancer patients who were maximally responsive to chemo-and immunotherapy (47). The components of our constructed 10-miRNA signature were strongly involved in immune function and cancer metastasis. Next, we identified 403 differentially expressed target genes between normal lung and NSCLC samples. The expression profile of these genes revealed two distinct sample clusters with different outcomes. The two patient clusters in the TCGA-LUAD cohort had significantly different survival outcomes. However, no significant difference was observed in the TCGA-LUSC cohort. Although LUAD and LUSC are the most frequently encountered NSCLC subtypes, they vary from each other considerably (48,49). LUSC has been found to grow at a faster rate in contrast to LUAD. LUSC was also found to possess suppressed expressions of molecules involved in the activation of the immune response, such as chemokines and MHC molecules (50). These findings might explain the different results of survival analysis between LUAD and LUSC. In efforts to improve the prognostic performance of the target genes, we identified six differentially expressed target genes which were correlated with survival: VEGFC, ALDOA, BEST3, A1CF, HOXC4, and PDGFB. Subjects in the TCGA cohort were stratified into high-or low-risk groups using the gene-based RS. Low-risk groups of both LUAD and LUSC had significantly better survival than those in the high-risk groups. Among these six genes, VEGFC, ALDOA, and PDGFB closely interacted with each other in the PPI network. It has been reported that VEGFC knockdown results in reduced PDGFB levels in melanoma cell lines. Moreover, both of them were regulated by E2F1 in angiogenesis (51). Lung cancer metastasis and metabolic reprogramming appears to be strongly dependent on ALDOA (52). Samples of lung cancer have been noted to possess an overexpression of ALDOA, which enhances epithelialmesenchymal transition (53). Our data suggested that immunerelated miRNAs regulated immune cell infiltration in NSCLC both through themselves and their target genes. In summary, our study on the identification of tumor immune-associated miRNAs provides valuable functional insights and potential clinical guidance for personalized therapy for NSCLC patients.

CONCLUSIONS
In brief, this study aimed to construct an immune-related miRNA signature that effectively predicts NSCLC patient prognosis. An immune-related miRNA signature (TIM-Sig) was constructed using LASSO-penalized multivariate survival models and was evaluated in several public cohorts from the Gene Expression Omnibus (GEO) and the CellMiner database. Further analysis on the miRNA-target network of TIM-Sig revealed that these target genes had prognostic value in both lung squamous cell carcinoma and adenocarcinoma. Our study provides valuable functional insights and potential clinical guidance for personalized therapy for NSCLC patients.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
SJ and LY designed the study. YZ, ZL, LQ, and ML collected data. YZ and KM developed the computational model and analyzed the data. SJ and YZ wrote the article. All authors reviewed the manuscript. All authors contributed to the article and approved the submitted version.