Analysis of Immune–Stromal Score-Based Gene Signature and Molecular Subtypes in Osteosarcoma: Implications for Prognosis and Tumor Immune Microenvironment

Objective: Infiltrating immune and stromal cells are essential for osteosarcoma progression. This study set out to analyze immune–stromal score-based gene signature and molecular subtypes in osteosarcoma. Methods: The immune and stromal scores of osteosarcoma specimens from the TARGET cohort were determined by the ESTIMATE algorithm. Then, immune-stromal score-based differentially expressed genes (DEGs) were screened, followed by univariate Cox regression analysis. A LASSO regression analysis was applied for establishing a prognostic model. The predictive efficacy was verified in the GSE21257 dataset. Associations between the risk scores and chemotherapy drug sensitivity, immune/stromal scores, PD-1/PD-L1 expression, immune cell infiltrations were assessed in the TARGET cohort. NMF clustering analysis was employed for characterizing distinct molecular subtypes based on immune-stromal score-based DEGs. Results: High immune/stromal scores exhibited the prolonged survival duration of osteosarcoma patients. Based on 85 prognosis-related stromal–immune score-based DEGs, a nine-gene signature was established. High-risk scores indicated undesirable prognosis of osteosarcoma patients. The AUCs of overall survival were 0.881 and 0.849 in the TARGET cohort and GSE21257 dataset, confirming the well predictive performance of this signature. High-risk patients were more sensitive to doxorubicin and low-risk patients exhibited higher immune/stromal scores, PD-L1 expression, and immune cell infiltrations. Three molecular subtypes were characterized, with distinct clinical outcomes and tumor immune microenvironment. Conclusion: This study developed a robust prognostic gene signature as a risk stratification tool and characterized three distinct molecular subtypes for osteosarcoma patients based on immune–stromal score-based DEGs, which may assist decision-making concerning individualized therapy and follow-up project.


INTRODUCTION
Osteosarcoma represents the most frequent and primary bone sarcoma, which primarily affects children, adolescents, and young adults (Corre et al., 2020). Globally, the incidence of osteosarcoma is approximately 1-3 cases yearly per million persons (Itoh et al., 2018). Neo-adjuvant therapy followed by postoperative adjuvant therapy with a cocktail of chemotherapy is the first-line therapeutic strategy for locally resectable osteosarcoma, leading to increased 5-years survival rate that is up to 70% for patients with localized osteosarcoma (Deng et al., 2020). However, most of metastatic or relapsed patients do not benefit from this therapy and the 5-years survival rate is below 30% (Yang et al., 2020). How to improve osteosarcoma treatment and prolong clinical outcomes is a prime issue in current research.
Osteosarcoma is characterized by huge heterogeneity at the intra-tumoral and individual levels (Zhou et al., 2020b). Thus, it is of significance to identify shared gene sets driving osteosarcoma. Tumor microenvironment is composed of mesenchymal cells, immune cells, endothelial cells, stromal cells, inflammatory mediators, and the like . It participates in mediating biological properties of osteosarcoma like metastasis, immune escape, and drug resistance (Zheng et al., 2018). Evidence demonstrates that immune-and stroma cell-related signatures in the tumor microenvironment serve as critical regulators in the prognosis of osteosarcoma patients (Wen et al., 2020). Precise management and proper individualized therapy regarding osteosarcoma are required in conformity to risk stratification. Furthermore, it is of importance to understand the immune-stromal score-based signatures in the tumor immune microenvironment, which may assist clarify clinical implications in the tumor immune microenvironment as well as propose novel therapeutic strategies (Qi et al., 2020). Herein, this study set out to characterize immune-stromal score-based gene signature as a prognostic stratification tool and molecular subtypes with distinct prognosis and tumor immune microenvironment in osteosarcoma.

Data Preparation and Preprocessing
Clinical and transcriptome data of osteosarcoma patients were downloaded from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database (https://ocg. cancer.gov/programs/target). Eighty-four samples with both complete survival information and expression profiles were set as the training set. One external dataset from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) was selected as the validation set (accession: GSE21257). This dataset contained 53 osteosarcoma samples on the GPL10295 platform. The specific clinical information of 84 samples in the TARGET database and 53 samples in the GSE21257 dataset was separately listed in Supplementary Tables S1 and S2. The workflow of this study is shown in Figure 1.

Estimation of STromal and Immune Cells in MAlignant Tumor Tissues Using Expression Data
Based on the normalized expression matrix, stromal and immune scores across osteosarcoma specimens from the TARGET and GSE21257 datasets were estimated by applying the ESTIMATE algorithm (Yoshihara et al., 2013). This algorithm may infer the overall infiltration levels of stromal and immune cells in tumor tissues using gene expression signatures (https://sourceforge.net/ projects/estimateproject/). On the basis of the median values of stromal/immune scores, the patients were separated into two groups, respectively. The Kaplan-Meier overall survival (OS) curves were examined between groups and the prognosis was compared by log-rank test.

Differential Expression Analysis
The limma package was applied for differential expression analysis between high and low stromal/immune score groups of osteosarcoma samples from the TARGET database (Ritchie et al., 2015). |Fold change (FC)| > 1.5 and adjusted p < 0.05 were set as the criteria of differentially expressed gene (DEG) identification. The up or downregulated genes were visualized into volcano plots. Upregulated genes from immune/stromal high versus low groups and downregulated genes were separately intersected.

Functional Enrichment Analysis
The enrichment analysis of stromal-immune score-based DEGs was carried out via the clusterProfiler package, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Yu et al., 2012). GO categories contained biological process (BP), molecular function (MF), as well as cellular component (CC). Terms with adjusted p < 0.05 were significantly enriched. regression analysis. p < 0.05 was set as the significant cut-off for identifying candidate genes related to osteosarcoma prognosis. The least absolute shrinkage and selection operator (LASSO) was employed for feature selection and obtaining an optimal gene signature in the TARGET database. By applying 10-fold crossverification and penalty, a prognostic gene signature was constructed via the glmnet package. The risk score of each sample from the training and validation sets was determined based on expression profiles and coefficients of feature genes. The formula was as follows: risk score expression of gene 1 × coefficient of gene 1 + expression of gene 2 × coefficient of gene 2 +. . .+ expression of gene n × coefficient of gene n.

Evaluation of the Prognostic Efficacy of the Gene Signature
The osteosarcoma patients in the training and validation sets were classified into high-and low-risk groups on the basis of the median value of the risk scores. The OS time between groups was compared by Kaplan-Meier curves and log-rank test. Receiver operating characteristic (ROC) curve of OS time was conducted for assessing the predictive performance of this gene signature via survival ROC package. The area under the curve (AUC) value of ROC was calculated to obtain the optimal model. Also, the Akaike information criterion (AIC) value of each point on the AUC was determined to distinguish the optimal cut-off point to differentiate the high-or low-risk patients.

Assessment of Chemotherapy Drug Sensitivity
By employing the Genomics of Drug Sensitivity in Cancer (GDSC) database (www.cancerRxgene.org) (Yang et al., 2013), underlying chemotherapy drugs that were sensitive for osteosarcoma subjects harboring risk scores were predicted FIGURE 1 | Study design and workflow overview.

Connectivity Map
The cMap database contains over 7,000 expression profiling of human cells treated with small molecules (Lamb et al., 2006). DEGs between high-and low-risk groups were screened with the screening criteria of |FC| > 1.5 and adjusted p < 0.05. These up-and down-regulated genes were mapped onto the cMap database. The connectivity scores ranging from −1 to 1 indicated the correlations between small molecules and DEGs. The positive connectivity scores were indicative of stimulative effects of compounds on these DEGs and the negative connectivity scores were indicative of repressed effects of compounds on them. The candidate small molecules related to osteosarcoma were screened with | connectivity score|>0.9 and p < 0.05. Furthermore, shared mechanism of action (MOA) was predicted for these candidate drugs.

Single-Sample Gene Set Enrichment Analysis
The infiltrations and activations of immune cells that were retrieved from published gene signature across all cancer as well as normal specimens were quantified utilizing the ssGSEA algorithm (Charoentong et al., 2017;Jia et al., 2018). The immune cells contained activated B cells, activated CD4 T cells, activated CD8 T cells, central memory CD4 T cells, central memory CD8 T cells, effector memory CD4 T cells, effector memory CD8 T cells, gamma delta T cells, immature B cells, memory B cells, regulatory T cells, T follicular helper cells, type 1 T helper cells, type 17 T helper cells, type 2 T helper cells, activated dendritic cells, CD56bright natural killer cells, CD56dim natural killer cells, eosinophils, immature dendritic cells, macrophages, mast cells, MDSCs, monocytes, natural killer cells, natural killer T cells, neutrophils, and plasmacytoid dendritic cells. The ssGSEA scores of each immune cell type were normalized in all samples in the TARGET dataset. Previous two studies have characterized marker genes of three phenotypes of osteosarcoma cell lines: tumorigenic phenotype, invasive phenotype, and colony forming phenotype (Lauvrak et al., 2013;Sharma et al., 2017). This study curated the shared marker genes in above studies, which played important roles in modulating the three phenotypes of osteosarcoma cell lines. The ssGSEA score of each phenotype was quantified in the TARGET dataset.

Gene Set Enrichment Analysis
The KEGG pathways that were distinctly related to the risk score were analyzed by GSEA software (https://www.broadinstitute. org/gsea/index.jsp) based on the gene expression (Subramanian et al., 2005). The "c2.cp.kegg.v7.1.symbols.gmt" file was obtained as a reference gene set. Enrichment adjusted p-values were based on 1,000 permutations. Terms with |normalized enrichment score (NES)|>1 and adjusted p < 0.05 were recognized as significant enrichment.

Nomogram
To better apply this prognostic gene signature, a nomogram was established in the training set for prediction of 1-, 3-, and 5-years survival probability of osteosarcoma patients.

Unsupervised Clustering Analysis by Nonnegative Matrix Factorization
Consensus clustering analysis was performed with the NMF algorithm for identifying distinct molecular subtypes according to the 272 stromal-immune score-based DEGs. The expressions of these signatures (matrix A) were factorized into three nonnegative matrices. Repeated factorization of matrix A was carried out, and its outputs were aggregated for obtaining consensus clustering of osteosarcoma specimens in the TARGET dataset. The optimal number of clustering was chosen based on cophenetic, dispersion, as well as silhouette coefficients. The 200 nruns was utilized for performing the consensus clustering.

Statistical Analysis
All analysis was presented using R version 3.4.1 (http://www.Rproject.org) and its appropriate packages. Two groups were compared utilizing Wilcoxon test, while multiple comparisons were evaluated by applying Kruskal-Wallis test. Two-sided p < 0. 05 indicated statistical significance. The source code of this study is provided in Supplementary Table S3.

Stromal and Immune Scores Are Associated With Survival Outcomes of Osteosarcoma Patients
From the TARGET database, stromal and immune scores of each osteosarcoma sample were estimated by the ESTIMATE algorithm. All patients were classified into high and low stromal/immune score groups on the basis of the median values. The survival differences between groups were compared by Kaplan-Meier curves. Accordingly, patients with high immune scores (p 2.739e-03) or stromal scores (p 1.965e-02) displayed the advantages of survival duration (Figures 2A,B). To obtain stromal-or immune score-related genes, differential expression analyses were carried out between high and low stromal-or immune-score groups. As a result, 633 genes were upregulated and 933 genes were downregulated between high and low immune score groups ( Figure 2C; Supplementary Table S4). Furthermore, there were 576 upregulated genes and 232 downregulated genes between high and low stromal score groups ( Figure 2D; Supplementary Table  S5). Also, we depicted the distributions of stromal score and immune score across osteosarcoma samples in the TARGET and GSE21257 datasets ( Figures 2E,F). The correlation analyses showed that there were positive interactions between stromal score and immune score across osteosarcoma samples in the TARGET (R 0.52 and p 3.1e-07) and GSE21257 (R 0.68 and p 7.6e-08) datasets ( Figures 2G,H). Following intersection, we obtained 272 stromal-immune score-based DEGs that showed consistent trends between high/low immune score groups and high/low stromal score groups in osteosarcoma ( Figure 2I). Their biological functions were analyzed in depth. Accordingly, they were involved in regulating immune cell activation like lymphocytes and neutrophils ( Figure 2J). MHC protein complex, antigen binding, and immunoglobulin were significantly enriched by these genes. Also, immune pathways such as antigen processing and presentation, Th1, Th2, and Th17 Frontiers in Genetics | www.frontiersin.org September 2021 | Volume 12 | Article 699385 5 cell differentiation were distinctly enriched by these genes ( Figure 2K).

Development and Validation of a
Stromal-Immune Score-based Prognostic Gene Signature for Osteosarcoma By applying univariate Cox regression analysis, 85 genes were distinctly related to osteosarcoma prognosis among the 272 stromal-and immune score-related DEGs in the TARGET dataset (Table 1). Based on them, nine factors were included in this LASSO model in the training set, containing FPR1, GBP1, FUCA1, PDK1, BNIP3, EVI2B, APBB1IP, FOLR2, and COCH ( Figures 3A,B). The risk score of each subject was determined according to the expression of FPR1 * (−0.00618173524670456) + expression of GBP1 * (−0.00428251209641429) + expression of FUCA1 * (−0.00768782441651179) + expression of PDK1 * 0.0078776021549243 + expression of BNIP3 * 0.00311883 814864424 + expression of EVI2B * (−0.0014871562280857) + expression ofAPBB1IP * (−0.0258746909192034) + expression of FOLR2 * (−0.000566207076062648) + expression of COCH * 0.00731886943085428. All patients in the TARGET cohort were classified into high-and low-risk groups. In Figure 3C, patients in the high-risk group displayed the worse survival outcomes in comparison with those in the low-risk group ( 1.072e-06). Furthermore, there was higher proportion of dead patients in the high-risk group ( Figures 3D,E). To assess the predictive efficacy of this signature, ROC of OS time was  conducted. The AUC value was 0.881, demonstrating the well performance on predicting osteosarcoma patients' prognosis ( Figure 3F). The maximum inflection point was recognized as the cut-off point on the ROC curve by the AIC values. The GSE21257 dataset was used for independently validating this prognostic signature. Consistently, high-risk score indicated poor prognosis of osteosarcoma patients (p 5.074e-03; Figure 3G). Higher proportion of dead patients was found in the high-risk group ( Figures 3H,I). The AUC of OS was 0.849, confirming that this signature possessed high accuracy and sensitivity on predicting prognosis ( Figure 3J).

Prediction of Chemotherapy Drug Sensitivity and Potential Small Molecule Compounds Based on Risk Scores
The sensitivity to chemotherapy drugs including paclitaxel, methotrexate, doxorubicin, and cisplatin was estimated in each osteosarcoma sample from the TARGET database ( Figures  4A-D). Accordingly, there were significantly lowered IC50 values of doxorubicin in the high-risk group than the low-risk group (p 0.0062), indicating that the high-risk osteosarcoma patients were more sensitive to doxorubicin. There were 57 upregulated and 658 downregulated genes in the high-risk specimens than the low-risk specimens (Supplementary Table S6). Based on these DEGs, 61 small molecule compounds with |connectivity score| > 0.9 and p < 0.05 were predicted for treating osteosarcoma by cMap analysis ( Table 2). Also, we analyzed the shared MOA among these small molecule compounds. As a result, sotalol, bisoprolol, suloctidil, and nicergoline shared adrenergic receptor antagonist ( Figure 4E). Pirenzepine, tropicamide and lobeline had the shared acetylcholine receptor antagonist. Risperidone, molindone, and piperacetazine shared dopamine receptor antagonist. Protein synthesis inhibitor was shared by puromycin, emetine, and diloxanide.

Associations Between Risk Scores and Tumor Immune Microenvironment of Osteosarcoma
Stromal and immune scores were evaluated in the high-and low-risk osteosarcoma specimens with the ESTIMATE algorithm. Accordingly, higher stromal scores (p 4.47e-07) and immune scores (p 4.32e-11) were detected in the low-risk group compared to the high-risk group ( Figures 5A,B). Considering that programmed death-1 (PD-1) and programmed death-ligand 1 (PD-L1) are well-established markers for prediction of the responses to anti-PD-1/L1 therapies , the expression distributions of PD-1 and PD-L1 were assessed. No significant difference in PD-1 expression was found between the high-and low-risk groups ( Figure 5C). The low-risk osteosarcoma group displayed distinctly higher PD-L1 expression than the highrisk group (p 0.00054; Figure 5D), indicating that the low-risk samples could be likely to respond to anti-PD-L1 therapy. The  T cells, neutrophils, and plasmacytoid dendritic cells ( Figure 5E).  In TARGET cohort, we evaluated the associations between risk scores and clinicopathological characteristics of osteosarcoma patients. In Figure 7A, metastatic patients exhibited distinctly increased risk score than non-metastatic patients (p 0.0033). However, no significant difference in risk score was found between female and male ( Figure 7B) as well as between stage 1/2 and stage 3/4 ( Figure 7C). In the GSE21257 dataset, we observed that there was no significant difference in risk score between female and male ( Figure 7D) as well as between grades I-II and grades III-IV ( Figure 7E). Three phenotypes of osteosarcoma cell lines were scored in each osteosarcoma sample in the TARGET cohort based on the relevant marker genes using ssGSEA method. We observed that risk score was negatively correlated to tumorigenic phenotype, invasive phenotype, and colony forming phenotype across osteosarcoma ( Figures 7F,G).

Establishment of a Nomogram for Prediction of 1-, 3-, and 5-years Survival of Osteosarcoma
For better applying this risk signature, a nomogram was established for osteosarcoma prognosis in the TARGET cohort, which contained FPR1, GBP1, FUCA1, PDK1, BNIP3, EVI2B, APBB1IP, FOLR2, and COCH. Our data demonstrated that this nomogram could be predictive of 1-, 3-, and 5-years survival probability of osteosarcoma patients (Figure 8). In this model, the 1-, 3-, and 5-years survival probability of the patients was determined through the total points that were calculated through adding up the point of each gene.

Characterization of Osteosarcoma Subtypes With Distinct Prognostic Implications
Using the 272 stromal-immune score-based DEGs, consensus clustering analysis was applied with the NMF algorithm for stratifying osteosarcoma samples into distinct molecular subtypes. Accordingly, three molecular subtypes were characterized, including cluster1 (n 50), cluster2 (n 13) and cluster3 (n 21; Figures 9A,B). To understand the difference in the underlying biology of the three subtypes, we established a heatmap that showed the differences in expression of the 272 DEGs among three clusters ( Figure 9C). Among them, cluster1 had the worst survival outcomes (p 2.987e-02; Figure 9D). Also, we noticed distinct differences in immune and stromal scores between molecular patterns. The lowest immune scores (p 5e-09; Figure 9E) and stromal scores (p 0.00047; Figure 9F) were found in cluster1.

DISCUSSION
Osteosarcoma, a common bone malignancy, is prone to metastasis as well as undesirable survival outcomes (Yao et al., 2018). Stromal and immune cells in the tumor microenvironment are critical regulators of osteosarcoma progression, drug resistance and treatment response (Smeland et al., 2019). Based on immune-stromal score-based DEGs, we developed and verified a gene signature for the prediction of survival outcomes of osteosarcoma patients as well as characterized three distinct molecular subtypes. An immune-stromal score-based gene signature containing FPR1, GBP1, FUCA1, PDK1, BNIP3, EVI2B, APBB1IP, FOLR2, and COCH was established for predicting clinical outcomes of osteosarcoma subjects. Consistent with previous research, FUCA1 was distinctly correlated to osteosarcoma patients (Yiqi et al., 2020). PDK1 upregulation was detected in osteosarcoma, which elevated a proliferative capacity of osteosarcoma cells (Li et al., 2017). EVI2B that possessed the predictive potential on prognosis was associated with metastasis and immune cell infiltration in osteosarcoma (Yang et al., 2021). Immune-related APBB1IP was in relation to clinical outcomes of osteosarcoma patients (Cao et al., 2020). More assays should be carried out to verify their biological implications in osteosarcoma. Consistently, high stromal or immune scores were in relation to prolonged survival time of osteosarcoma (Alves et al., 2019). Patients with high risk were indicative of unfavorable prognosis. After external verification, this signature possessed the well performance on prediction of clinical outcomes of osteosarcoma patients.
At current, neoadjuvant chemotherapy and surgery are approved to remove primary or metastatic osteosarcoma, followed by adjuvant chemotherapy after surgery, which may increase OS time of osteosarcoma patients . Nevertheless, drug resistance leads to worse clinical outcomes (Xiao et al., 2018). Doxorubicin represents the most effective firstline drug regarding high-grade osteosarcoma (Buondonno et al., 2019). Here, high-risk patients were more sensitive to doxorubicin, which indicated that these subjects were more likely to benefit from this chemotherapy drug. Immunotherapy, especially immune checkpoint inhibitors, has hugely altered the therapeutic landscape of metastatic or recurrent osteosarcoma . However, only some patients can benefit from immunotherapy (Yu et al., 2019). Here, we found that low-risk patients had the increase in PD-L1 expression and higher infiltration levels of various immune cells, indicating that these patients were more likely to respond to anti-PD-L1 therapy. Our GSEA results also confirmed that immune pathways such as B cell receptor signaling pathway (Long et al., 2016), chemokine signaling pathway (Chao et al., 2020), cytokine-cytokine receptor interaction (Lu et al., 2018), and toll-like receptor signaling pathway (Zhou et al., 2020a) were significantly activated in the low-risk patients. Previous two studies have characterized the gene set of phenotypes of 22 osteosarcoma cell lines, including tumorigenic phenotype, invasive phenotype, and colony forming phenotype, which reflect the osteosarcoma heterogeneity (Lauvrak et al., 2013;Sharma et al., 2017). One has characterized osteosarcoma phenotypes at multiple levels. The other suggests that these phenotypically differentiated osteosarcoma cell lines play important roles in modulating tumor microenvironment by multitype network-guided target controllability analysis (Sharma et al., 2017). Here, we employed ssGSEA method to quantify tumorigenic phenotype, invasive phenotype, and colony-forming phenotype across osteosarcoma samples based on the expression profiles of shared marker genes between above two studies. We observed that immune-stromal score-based risk score was negatively associated with tumorigenic phenotype, invasive phenotype, and colony-forming phenotype, indicative of the important role of the risk score in the progression of osteosarcoma. Kaplan-Meier curves of OS for osteosarcoma patients in the TARGET cohort with three molecular subtypes. P for log-rank test. (E) The immune scores in three stromal-immune score-based clusters using the ESTIMATE algorithm. (F) The stromal scores of three gene clusters with the ESTIMATE algorithm. The differences between three subtypes were compared by Kruskal-Wallis test. ns, no statistical significance. ***p < 0.001; ****p < 0.0001.
By employing NMF algorithm, we established three immunestromal score-based gene patterns for osteosarcoma patients, with distinct survival outcomes. Among them, cluster3 had the highest PD-1/PD-L1 expression and the highest infiltration levels of activated B cells, activated CD4 T cells, activated CD8 T cells, central memory CD4 T cells, central memory CD8 T cells, effector memory CD4 T cells, effector memory CD8 T cells, gamma delta T cells, immature B cells, memory B cells, regulatory T cells, T follicular helper cells, type 1 T helper cells, type 2 T helper cells, activated dendritic cells, mast cells, MDSC, natural killer cells, natural killer T cells and plasmacytoid dendritic cells. These data indicated that osteosarcoma patients in cluster3 were more likely to benefit from immunotherapy. The limitations of this study should be pointed out. First, due to limited number of patients, their clinical implications will be validated in a larger osteosarcoma cohort in our future studies. Second, in the TARGET and GSE21257 datasets, there was lack of clinicopathological characteristics (such as stage and grade) of osteosarcoma patients. Thus, it was difficult to analyze the correlations between the risk score and clinicopathological characteristics. The differences between three subtypes were compared by Kruskal-Wallis test. (C) The fractions of tumor-infiltrating immune cells in three clusters with the ssGSEA algorithm. The statistical differences between clusters were compared with the Kruskal-Wallis test. ns, no statistical significance. *p < 0.05; **p < 0.01; ***p < 0.001.