International Prognostic Index-Based Immune Prognostic Model for Diffuse Large B-Cell Lymphoma

Background The International Prognostic Index (IPI) is widely used to discriminate the prognosis of patients with diffuse large B-cell lymphoma (DLBCL). However, there is a significant need to identify novel valuable biomarkers in the context of targeted therapy, such as immune checkpoint blockade (ICB). Methods Gene expression data and clinical DLBCL information were obtained from The Cancer Genome Atlas and Gene Expression Omnibus datasets. A total of 371 immune-related genes in DLBCL patients associated with different IPI risk groups were identified by weighted gene co-expression network analysis, and eight genes were selected to construct an IPI-based immune prognostic model (IPI-IPM). Subsequently, we analyzed the somatic mutation and transcription profiles of the IPI-IPM subgroups as well as the potential clinical response to immune checkpoint blockade (ICB) in IPI-IPM subgroups. Results The IPI-IPM was constructed based on the expression of CMBL, TLCD3B, SYNDIG1, ESM1, EPHA3, HUNK, PTX3, and IL12A, where high-risk patients had worse overall survival than low-risk patients, consistent with the results in the independent validation cohorts. The comprehensive results showed that high IPI-IPM risk scores were correlated with immune-related signaling pathways, high KMT2D and CD79B mutation rates, and upregulation of inhibitory immune checkpoints, including PD-L1, BTLA, and SIGLEC7, indicating a greater potential response to ICB therapy. Conclusion The IPI-IPM has independent prognostic significance for DLBCL patients, which provides an immunological perspective to elucidate the mechanisms of tumor progression and sheds light on the development of immunotherapy for DLBCL.


INTRODUCTION
Diffuse large B-cell lymphoma (DLBCL) accounts for approximately 40% of non-Hodgkin B-cell lymphoma, with an annual incidence rate of over 100,000 cases worldwide (1,2). Although the current frontline DLBCL therapy (the standard R-CHOP chemotherapy regimen) is associated with a high complete response rate of 70%-80%, 10%-15% of DLBCL patients are refractory, and almost 40% of patients experience relapse within 2-3 years after initial response (3,4). With the development of high-throughput technologies, germinal center B-cell-like and activated B-cell-like DLBCL subtypes were identified by gene expression profiling based on cell-of-origin classification (5)(6)(7). More recently, several key cytogenetic alterations including mutations, somatic copy number alterations, and structural variants have been shown to classify distinct genetic subtypes within the cell-of-origin subgroups, providing insights into heterogeneous disease pathogenesis and candidate treatment targets (1,3,(7)(8)(9). Several prognostic factors including cell-of-origin and the International Prognostic Index (IPI) have already been identified in the rituximab era, which still need further investigation in the context of targeted therapies (7,(9)(10)(11)(12). Therefore, there is an urgent need to explore potential molecular mechanisms and identify key biomarkers and therapeutic targets.
Accumulating evidence has shed light on the prognostic role of the tumor microenvironment (TME) in immune checkpoint blockade therapy (ICB), which is mostly composed of a variety of immune cells (T, NK, and B cells as well as macrophages) and stroma (blood vessels and extracellular matrix [ECM]) (13)(14)(15)(16). Kotlov et al. (11) characterized the DLBCL TME into four distinct microenvironment compositions including "germinal center-like" (GC), "mesenchymal" (MS), "inflammatory" (IN), and "depleted" (DP) form, which are associated with distinct clinical behavior and provide novel potential targets for innovative therapeutic interventions.
In this study, we identified immune-related hub genes in DLBCL patients at different IPI levels by weighted gene coexpression network analysis (WGCNA) and constructed an IPIbased immune-related prognostic model (IPI-IPM). We then characterized the somatic mutation and transcription profiles of the IPI-IPM subgroups, investigated the expression of several inhibitory immune checkpoints between low-and high-risk subgroups, and applied an unsupervised clustering algorithm to analyze the gene expression pattern of lymphoma microenvironment (LME) signatures. The results showed that IPI-IPM was a promising prognostic biomarker, which also has potential for use in patient management.

Identification of Immune-Related Genes Associated With IPI in DLBCL Patients
A flowchart is shown to demonstrate the procedure and results of our study (Figure 1). RNA-seq data of 570 DLBCL patients were obtained from The Cancer Genome Atlas (TCGA) (48 from TCGA-DLBC, 41 from CTSP-DLBCL1, and 481 from NCICCR-DLBCL). Among the 566 DLBCL patients with overall survival (OS) data, 321 (56.71%) were male and 245 (43.29%) were female. The age of the patients ranged from 14 to 92 years (median, 62 years) at initial diagnosis. Other clinical characteristics, including follow-up period, Ann Arbor stages, lactate dehydrogenase (LDH) ratio, Eastern Cooperative Oncology Group (ECOG) performance status, and number of extranodal sites, are documented in Table 1 and Supplementary Material 1. To remove the batch effect among these three projects, we utilized the ComBat-seq function to transform the raw count data using the sva R package. Then, principal component analysis was performed to show that there was no obvious batch effect among the samples (Figure 2A). After excluding samples with unrecorded IPI scores or with an IPI score crossing risk groups (such as 1-5 or 3-4), 458 DLBCL patients were divided into low-risk (n = 118), intermediate-risk (n = 221; 106 at low-intermediate risk, and 92 at highintermediate risk), and high-risk groups (n = 109) ( Table 1). Consistent with previous publications, patients in the low-IPI risk group had a much longer OS ( Figure 2B).
As shown by gene set variation analysis (GSVA), ECM receptor interaction, CD28-dependent PI3K/AKT signaling, IL-6-type cytokine receptor ligand interaction, and other immunologic signaling pathways were significantly enriched in the low-risk group ( Figure 2C and Supplementary Figures S1A, B). A total of 4,651 genes (633 upregulated and 4,018 downregulated) were significantly differentially expressed between the IPI high-and lowrisk groups (Supplementary Figures S1C, D and Supplementary Material 2). By intersecting with the immunologic signature gene sets (combining 20,837 genes from ImmuneSigDB and Immport, Supplementary Material 3), 1,927 immune differentially expressed genes (DEGs) (254 upregulated and 1,673 downregulated) were identified for further analysis ( Figure 2D and Supplementary Figures S1E, F).
We applied variance-stability-transformed (VST) expression data via DESeq2 as the input data for WGCNA, including 13,329 genes with the top 25% variance among all samples ( Figure 3A). All clinical characteristics were enrolled as trait variables, and the best b value in the co-expression network was calculated to be 9 (Supplementary Figure S2B). The distance threshold for merging modules was set to be 0.30, so as to construct a reasonable number of merged modules (Supplementary Figures S2A, D). As shown in the module-trait relationship, eight modules were significantly correlated with the IPI group ( Figure 3B), and a high correlation (p < 0.0001) between gene significance of IPI risk groups and gene module membership was found in the genes of three modules (brown, pink, dark red) ( Figure 3C and Supplementary Figures  S2E, G). By intersecting 4,106 genes from the top three IPIcorrelating modules with 1,927 immune DEGs, a total of 371 genes were identified as immune-related genes associated with IPI, which were used for prognostic risk model construction (Supplementary Figure S2H). Several gene sets, such as ECM organization, ECM receptor interaction, PI3K/AKT signaling, and integrin cell surface interaction were enriched in the top three IPI-correlating modules (Supplementary Figure S2I and Supplementary Material 4).

Construction and Validation of an IPI-Based Immune Prognostic Risk Model
In the training cohort, of which 563 patients with matched RNA-seq data and overall survival follow up data, 93 out of 371 IPI-related immune genes were significantly correlated with OS in the univariate Cox regression analysis ( Table 2 and Supplementary Material 5). Next, we applied Lasso-penalized Cox regression to identify the optimal number of genes (n = 10) for the risk score model ( Figure 4A and Supplementary Figure S3A). As a result of multivariate Cox regression analysis with variable selection by Akaike information criterion, eight genes were selected to construct the most optimal IPI-IPM ( Figure 4B Figure S3C), we defined a cutoff value of 0.982 and then divided all patients into high-and low-risk groups according to their risk scores. As shown in Figure 4D, Kaplan-Meier survival analysis showed worse OS of patients in the high-risk score group (log-rank p = 3.13e-14  Figure 4G. In addition, we conducted Spearman's rank-order analysis to examine the correlation between IPI risk group and the eight-gene expression. As shown in Supplementary Figure S3F, the expression of almost all genes except PTX3 was significantly correlated with IPI risk groups. Then, we tested the difference of the eight-gene expression between high-and low-IPI risk groups. There existed significantly difference of gene expression in seven genes except PTX3. Moreover, 335 patients with available clinicopathologic parameters were enrolled in the multivariate Cox regression analysis, presenting the risk scores and age as independent prognostic factors of OS ( Figure 4H). Risk scores, along with age, Ann Arbor clinical stage, LDH ratio, ECOG performance status, and number of extranodal sites, were integrated to construct a nomogram model ( Figure 5A). As shown in the decision curve analysis (DCA), the nomogram and the risk score from IPI-IPM showed a relatively high net benefit ( Figure 5B). Moreover, the bias-corrected lines for the nomogram were close to the ideal line in the 1-, 3-, and 5-year and median survival time periods ( Figure 5C). The C-index for the nomogram was 0.790 (95% CI: 0.736-0.843, p = 2.38e-26). Altogether, these results suggest that the nomogram has excellent capacity and consistency for OS prediction in the training cohort. Moreover, we collected patients with matched IPI-IPM risk score and available IPI information. A total of 445 patients with OS data and 386 patients with PFS data were enrolled to compare the difference of IPI and IPI-IPM in prognostic predictability. As shown in Figures 5D, E, the AUCs of time-dependent ROC for IPI-IPM were almost larger than those for the IPI risk group, indicating fairly equivalent prognostic predictability between IPI and IPI-IPM. In addition, the C-indices for OS of IPI-IPM and IPI were 0.749 (p = 3.78e-23) and 0.756 (p = 1.72e-13), and the C-indices for PFS were 0.739 (p = 6.36e-21) and 0.738 (p = 8.40e-11), respectively.
Three independent cohorts from different clinical centers (the cohort of Staudt et al., GSE10846, n = 414; the cohort of Dubois et al., GES87341, n = 221; the cohort of Sha et al., GSE117556, n = 928) were enrolled for further validation of IPI-IPM. The risk score for each patient was calculated, and all patients were divided into the high-and low-risk groups. As for the cohort of Staudt et al., the AUCs were 0.619, 0.603, and 0.601 for 1, 3, and 5 years, respectively ( Figure 5F). Kaplan-Meier survival analysis showed significantly shorter OS of patients in the high-risk group (log-rank p = 5.30e-05, Figure 5G and Supplementary Figures S3G, H). Moreover, patients in the high-risk group were shown to have a shorter OS, regardless of the treatment regimens they received. Similar results of large AUC, shorter OS, and PFS of high-risk patients were also shown in the other two cohorts (Figures 5H-K and Supplementary Figures S3F, G). Taking the results of the training and testing cohorts together, the IPI-IPM and the nomogram combined risk score with relevant clinical characteristics (age, Ann Arbor clinical stage, LDH ratio, ECOG performance status, and number of extranodal sites) was an excellent model for predicting short-term or long-term OS in DLBCL patients, which may guide therapeutic strategy decisions and long-term prognosis.

MOLECULAR CHARACTERISTICS OF IPI-IPM SUBGROUPS
Compared to the low-risk group, a total of 5,980 genes (690 upregulated and 5,290 downregulated), and 2,731 immunologic genes (400 upregulated and 2,331 downregulated) were significantly differentially expressed in the high-risk group (Supplementary Figure S4A, Figure 6A, and Supplementary Material 6). . If the IPI is not recorded or the range of the IPI spans among groups (such as 1-5 or 1-2), this feature is marked as non-applicable (NA). ③ Age of nine samples from NCICCR_DLBCL was not reported.
Pre-ranked gene set enrichment analysis (GSEA) was performed to show that several gene sets, including negative regulation of immune response, DNA repair, and response to IL-12, were enriched in the high-risk group, and several gene sets, including ECM receptor interaction, IL-10 synthesis, and regulation of humoral immune response, were enriched in the low-risk group. Details are documented in Figure 6B, Supplementary Figures S4B-E and Supplementary Material 6. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was applied to show obvious genetic diversity between the high-and low-risk groups ( Figure 6C). Based on the Pearson correlation analysis, there was either a positive or negative correlation between the eight genes from the IPI-IPM ( Figure 6D). Figure 6E presents the heatmap of differentially expressed genes correlating IPI-IPM risk scores between the high-and low-risk groups.
To gain further molecular insight into the molecular characteristics of IPI-IPM, 176 genes correlating with risk scores and the eight genes from the IPI-IPM were identified as IPI-IPM-associated immune genes (absolute Pearson correlation coefficient ≥ 0.3, FDR < 0.05). Overrepresentation analysis was applied to identify the enriched biological functions and pathways, such as ECM organization and T cell differentiation, activation, and mediated immunity ( Figures 7A, B). Detailed results are listed in Supplementary Material 6. As shown by the PPI network, the ECM organization, oncogenesis, and tumor immunity-associated regulatory genes (COL6A2, COL16A1, COL26A1, COL13A1, COL22A1, C3, ELN, MMP9, CLU, FOXP3, and ADGRL1) were closely correlated, and acted as hub genes ( Figure 7C). In addition, the TFTRUST database was used to explore the transcription factors (TFs) regulating the 184  IPI-IPM-associated immune genes; thus, MMP9, FOXP3, and PLAU were identified in the TF network ( Figure 7D). Based on the somatic mutational data of 37 samples from TCGA, 18 out of 20 patients in the high-risk group and 13 out of 17 patients in the low-risk group were found to have altered gene expression (Supplementary Figures S5A, B and Figure 8A). Although most mutations were missense mutations, more nonsense mutations and other mutations were identified in the high-risk group ( Figure 8A). In addition, the mutation frequency of the top 10 genes in the high-risk group was much higher than that in the low-risk group. Furthermore, we investigated specific mutation sites of key genes corresponding to their amino acid location, including KMT2D, MUC16, CARD11, LRP1B, BTG2, and PIM1 ( Figure 8B and Supplementary Figure S5C). As shown in the Oncodrive plot, MYD88, CD79B, KHL6, and MUC4 were identified as cancer driver genes in the high-risk group whereas only PEG3 and ZNF337 were identified in the low-risk group ( Figure 8C and Supplementary Material 7).

Immune Landscape of IPI-IPM Subgroups
CIBERSORT was applied to analyze the infiltrating abundance of various immune cell types in the different IPI-IPM subgroups ( Figure 9A and Supplementary Figure S6A). Activated memory CD4 + T cells and resting NK cells showed high infiltration in the high-risk group, whereas memory B cells, CD8 + T cells, follicular helper T cells, regulatory T cells (Tregs), and non-activated macrophages were more abundant in the low-risk group ( Figure 9B). In addition, the MCPCounter and xCell algorithms were applied to show that myeloid dendritic cells (mDCs) and common lymphoid progenitors showed high infiltration in the high-risk group, whereas hematopoietic stem cells, cancer-associated fibroblasts (CAFs), and T cells, especially CD8 + T cells, were more abundant in the low-risk group ( Figure 9C and Supplementary Figure S6B). In addition, DLBCL subtypes varied between the high-and low-risk groups. As shown in Figure 10A, more patients with higher IPI levels were classified into the IPI-IPM high-risk group, with similar results for age, Ann Arbor clinical stage, LDH ratio, ECOG performance status, and the number of extranodal sites (Supplementary Figure S6C). The high-risk group contained a higher proportion of activated B-cell-like DLBCLs, whereas the low-risk group was more enriched in germinal center B-cell-like DLBCLs (p = 1.34e-14). Schmitz et al. (4) identified four prominent genetic subtypes in DLBCL with different responses to immunochemotherapy: MCD (the cooccurrence of MYD88 L265P and CD79B mutations), BN2 (BCL6 fusions and NOTCH2 mutations), N1 (NOTCH1 mutations), and EZB (EZH2 mutations and BCL2 translocations), and uneven distributions of these four subtypes were found between the high-and low-risk groups. For example, the poor-prognostic MCD subtype represented 40% of the highrisk group and 12% of the low-risk group. In addition, the correlation analysis showed that IPI-IPM risk scores were positively correlated with the expression of BCL-2 and MYC and negatively correlated with BCL-6 expression ( Figure 10B). Taken together, these data indicate that IPI-IPM provides additional orthogonal information from previous lymphoma classifications.
To further explore the interaction between DLBCL cells and the microenvironment, Kotlov et al. (11) defined the LME into four major transcriptionally defined categories with distinct biological properties and clinical behavior, including GC, MS, IN, and DP forms. Similarly, we utilized an unsupervised clustering method to assign the samples into four groups by using expression data from the 22 functional gene expression signature (F GES ) sets ( Figure 10C). GSVA enrichment scores were calculated to demonstrate distinct TME characteristics among the four biological patterns ( Figure 10D). Consistent with the CIBERSORT results, the DP LME categories represented 35% of the high-risk group whereas the GC-like and MS LME categories were more enriched in the low-risk group ( Figures 10E, F and Supplementary Material 8).

Potential Therapeutic Value of IPI-IPM
To further understand the effects of the risk score on drug response, 184 IPI-IPM-associated immune genes were mapped into the connectivity map database (17). As shown in Figure 11A, 14 genes (ADORA1, ADRA2A, CACNA1C, DBH, DNM1, ELN, ENPP1, EPHA1, IL12A, MMP9, PLAU, S1PR2, SLC1A2, and SV2A) were associated with 122 inhibitors, The clinical development of cancer immunotherapy and the advances in genomic analysis have validated the important role of the TME in predicting cancer response to ICB therapy (13,18,19). We investigated the expression of several inhibitory immune checkpoints between the high-and low-risk groups by using the normalized gene expression data (VST transformed). As shown in Figure 11B, the expression of PD-L1, BTLA, and SIGLEC7 was significantly upregulated in the high-risk group. Expressions of other inhibitory immune checkpoints such as B7-H3 were found downregulated in the high-risk group (Supplementary Figure S6D).
Various biomarkers were reported to predict the response to immunotherapy, including tumor mutation burden and expression of immune checkpoints, such as PD-L1. We examined the value of the IPI-IPM to predict the response of patients to ICB therapy based on a publicly accessible dataset, the IMvigor210 Cohort (20). Using an optimal cutoff point calculated through the "surv_cutpoint" function (a minimal proportion of observations per group was set to 30%) of the survminer R package for group assignment, we observed that patients with high IPI-IPM risk scores had shorter OS after anti-PD-L1 treatment (log-rank p = 0.029, Figure 11C).

DISCUSSION
Recent groundbreaking insights into the pronounced genomic heterogeneity of DLBCL have identified potential biomarkers for patient diagnosis and prognosis, paving the way for a standardized application of precise medicine (3,5,13).
Multiple subtype classifications as well as IPI and enhanced NCCN-IPI were built to stratify prognostically relevant subgroups of DLBCL patients with R-CHOP therapy, whose robustness requires further investigation in the context of targeted therapies (7,11,13,21). In the current study, we used WGCNA to profile IPI correlating immune gene sets and constructed and validated an eight-gene IPI-IPM (CMBL, TLCD3B, SYNDIG1, ESM1, EPHA3, HUNK, PTX3, and IL12A) with shorter OS in the high-risk patients and longer OS in the low-risk patients in both TCGA and three independent cohorts. ESM1, also known as endocan, has been shown to regulate endothelial cell function in the initiation and progression of human cancers, including esophageal cancer, hepatocellular carcinoma, bladder cancer, and breast cancer (22). The aEphA3 receptor plays a critical role in cell adhesion and migration during development and homeostasis of many tissues as well as in cancer growth, progression, and angiogenesis (23). In addition, EphA3 is highly expressed in tumors, but not in normal tissues, which, together with antitumor properties of anti-EphA3mAb (chIIIA4), defined EphA3 as a potential target for antibody-based anticancer therapies (23). PTX3 is secreted by dendritic cells, macrophages, and fibroblasts and is actively involved in the regulation of inflammation, tissue remodeling, and cancer (24). PTX3 interacts with the PI3K/AKT/mTOR signaling pathway or the fibroblast growth factor-2/receptor system to regulate tumor cell proliferation, apoptosis, and metastasis in lung cancer, breast cancer, melanoma, prostate cancer, and multiple myeloma (24,25). IL12A is a potent immunosuppressive cytokine produced by regulatory B cells, Treg cells, macrophages, dendritic cells, and tumor cells, which suppresses the effector functions of CD4 + and CD8 + T cells but strongly favors Treg proliferation (26). Furthermore, Larousserie et al. found that high levels of IL12A were associated with poor survival in DLBCL patients (27).
High-throughput gene expression databases and bioinformatics analysis have enabled systematic profiling of prognostic signatures in DLBCL. For example, Zhou et al.  In the GSEA analysis of the high-and low-risk groups, several immune-related gene sets, including negative regulation of immune response, B cell and T cell proliferation, response to IL-12 and g-IFN, and NOD-like and TOLL-like receptor signaling, were enriched in the high-risk group whereas ECM receptor interaction, IL-10 synthesis, and regulation of humoral immune response were enriched in the low-risk group. Therefore, we speculated that the local immune signature conferred a weaker immune phenotype in the high-risk group,  but an intense immune phenotype in the low-risk group. Moreover, overrepresentation analysis identified several immune-related pathways, including ECM organization and T cell activation, which were enriched with IPI-IPM-associated immune genes. The ECM plays important roles in supporting cells and regulating intercellular interactions, thus contributing to the progression of several malignancies (11,14). Lenz et al. built a survival model with two stromal gene signatures for DLBCL patients who received CHOP or R-CHOP, where the prognostically favorable stromal-1 signature reflected ECM deposition and histiocytic infiltration, whereas the prognostically unfavorable stromal-2 signature reflected tumor blood vessel density (12).
To further explore the immunological nature of the IPI-IPM subgroups, we analyzed the somatic mutational profiles of 37 samples and found higher mutation counts in the high-risk  group with more nonsense mutations, although missense mutations were the most common type. The largest difference in mutations between high-and low-risk groups was in the KMT2D mutation (40% in high-risk samples vs. 17.65% in lowrisk samples) (29). KMT2D is a tumor-suppressor gene in DLBCL, and genetic ablation of KMT2D in a BCL2overexpression-driven model promotes higher DLBCL penetrance (30). Moreover, MYD88 and CD79B were identified as cancer driver genes in the high-risk group, which is in good agreement with the previous findings that MYD88 and CD79B mutations have been associated with tumor response and survival in DLBCL patients (31)(32)(33). Ngo et al. initially identified activating MYD88 mutations in DLBCL, where L265P was the most frequent and oncogenic form. MYD88 was shown to interact with IRAK1 and IRAK4 and activate the NF-kB and JAK-STAT3 pathways, promoting malignant cell proliferation and causing worse survival of DLBCL patients (34). CD79B mutations are frequently detected in the first tyrosine (Y196) of the immunoreceptor tyrosine-based activation motif. CD79B mutations were shown to cause chronic activation of BCR signaling and constitutive NF-kB activation, further promoting tumor cell growth within the immunosuppressive TME. Consistently, the composition of immune cells was different between the two IPI-IPM subgroups, where memory B cells, CD8 + T cells, Tregs, non-activated macrophages, and CAFs were more abundant in the low-risk group. It is generally accepted that cytotoxic CD8 + T cells, following successful priming, recognize tumor-specific (neoantigens) or tumor-associated antigens and exert antitumor function primarily via the release of cytotoxic molecules such as perforin and granzymes (35)(36)(37). However, Tregs suppress CD8 + T cells by direct cell contact and secretion of inhibitory cytokines including IL-10 and TGF-b (36,38). TAMs have been shown to mediate antibody-dependent cellular phagocytosis of rituximab in malignant B cells, limit CD8 + T cell activity through PD-L1 expression, and release IL-10 and TGF-b or inhibiting enzymes, thus regulating antitumor immunity and response to therapy (39). CAFs, the resident fibroblasts activated in a chronically inflamed TME, have been shown to promote recruitment and polarization of regulatory cells including Tregs, monocytes, and M2-macrophages by secreting IL-6, CXCL12, Chi3L1, MCP-1, and SDF-1, thus actively shaping immune infiltration in the TME (40). In addition, CAFs have been shown to impact the cytolytic activity of CD8 + T cells through different mechanisms, such as the production of prostaglandin E2 and nitric oxide to dampen CD8 + T cell proliferation, expression of PD-L2 and FasL to promote CD8 + T cell apoptosis, and induction of abnormal ECM deposition and remodeling to physically trap CD8 + T cells and prevent effective tumor access (41).
In addition, DP LME represented more of the high-risk group, whereas immune-rich GC LME was more enriched in the low-risk group. Kotlov et al. characterized the DLBCL microenvironment by analyzing gene expression profiles, developing TME-derived F GES , analyzing ECM composition by proteomics, and establishing patient-derived tumor xenograft models (11). Four basic categories of DLBCL LME with distinct clinical and biological connotations were identified to uncover the bidirectional interaction between DLBCL cells and the LME. Remarkably, immune-rich GC LME confers a better prognosis than DP LME, suggesting the fundamental role of LME in preventing lymphomagenesis. In turn, DLBCL cells develop genetic and epigenetic traits that contribute to immune evasion from LME.
Finally, we applied Spearman correlation analysis to estimate the potential therapeutic effects of IPI-IPM and explored the expression of several inhibitory immune checkpoints between IPI-IPM subgroups to predict the response to immunotherapy. The results showed that IPI-IPM-associated genes were correlated with sensitivity to drugs targeting the Aurora kinase, DNA methyltransferase, histone acetyltransferase, FLT3, EGFR, and VEGFR signaling pathways, indicating that high-risk DLBCL patients may benefit from novel inhibitors targeting these signaling pathways. As for the expression of inhibitory checkpoints, PD-L1, BTLA, and SIGLEC7 were significantly upregulated in the high-risk group. Upregulation of checkpoints (PD-1, CTLA-4, and TIM-3) and their ligands (PD-L1 and PD-L2) in the TME can mediate tumor cells to escape immune surveillance by modulating T-cell activity (13,18,42,43). Thus, ICB exerted significant antitumor effects in both solid tumors and hematologic malignancies. Although nivolumab monotherapy showed low efficacy in unselected DLBCL patients, pembrolizumab combined with R-CHOP was safe and associated with a high complete response rate and improved 2-year progression-free survival (19,44). BTLA has been reported to mark a high-checkpoint-expressing T-cell subset (PD-1, TIM-3, LIGHT, and LAG-3) with decreased cytolytic function and increased proliferation ability, thus correlating with poor prognosis in DLBCL (13,45).
Taken together, the IPI-IPM risk score was compatible with the ability of tumor-infiltrating immune cells to determine the expression of immune checkpoints, suggesting that the poor prognosis of the high-risk group may be due to the stronger immunosuppressive TME and that high-risk patients will benefit more from immune checkpoint inhibitors than low-risk patients, resulting in a better prognosis. Our research provides new insights into the TME and immune-related therapies for DLBCL. However, it is noteworthy that some limitations arose because the conclusions were drawn from data from retrospective studies, and prospective studies are warranted to further confirm our results. In addition, functional and mechanistic studies should be conducted to support their clinical application of the genes in our risk model.

Data Selection and Acquisition
Data acquisition of the present study is fully under the TCGA publication guidelines (https://www.cancer.gov/tcga). Gene expression data (RNA-seq) of 570 samples, masked somatic mutation data of 37 samples, and clinical follow-up data with clinicopathological characteristics of 566 patients of DLBCL projects (TCGA-DLBC, CTSP-DLBCL1, NCICCR-DLBCL) were obtained from the Cancer Genome Atlas (TCGA) by using the TCGAbiolinks (46) R package in R software (version 4.0.2, https://www.r-project.org). Clinical information, gene expression subtype, and genetic subtype of the DLBCL patients were supplemented by referring to open-access supplementary files of the GDC DLBCL publication (4). Matched gene expression data and survival follow-up data can be obtained in 563 samples. Matched gene expression data and survival followup data with available IPI data can be obtained in 445 samples (Supplementary Table 1). Gene expression data (Microarray) with matched clinical information of 414, 221, and 928 DLBCL patients were obtained from GSE10846, GSE87371, and GSE117556, respectively (12,(47)(48)(49), by accessing the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/ geo). The immunologic gene lists used in the present study were downloaded from the ImmPort (www.immport.org/shared/ home) and ImmuneSigDB (50) through MSigDB Collections (www.gsea-msigdb.org/gsea/msigdb). For the RNA-Seq data, HTSeq-count data were downloaded. The Combat-Seq function of the Sva (51) R package was used to remove batch effects among different projects. The principal component analysis was performed and visualized to examine the batch effect (52). For each gene, the effective gene length was extracted by using the EDASeq (53) R package and TPM (Transcripts Per Kilobase Million) gene expression data were calculated through the count2TPM function utilized in the IOBR (54) R package by using the corresponding effective gene length. A Homo sapiens GRCh38 annotation file (Ensembl 103) downloaded from Ensembl (55) was used for gene symbol annotation. The DESeq2 (56) R package was applied for the normalization of RNA-seq count data, and variance stabilizing transformation (VST) data were used for downstream analysis. DLBCL microenvironmental (LME) signatures and LME subtype categorizing method were referred to the publication of Kotlov et al. on Cancer Discovery (11).

Identification of Differentially Expressed Genes
Samples with available IPI information were categorized to high-, intermediate-, and low-risk groups according to criteria of IPI, and the DESeq2 (56) R package was applied to identify differentially expressed genes (DEGs) between high-and low-risk groups. The differential expression was defined with a fold change of threshold at 1.5 and a false discovery rate (FDR) value < 0.05.

Gene Functional Enrichment Analysis
The clusterProfiler (57) R package was used for both overrepresentation analysis and pre-ranked gene set enrichment analysis (GSEA). Analysis of Gene Ontology (GO) (58), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (59), and Reactome pathway (60) were contained in the present study. An adjusted p value<0.05 was considered statistically significance. The threshold for GSEA was set at the pvalue < 0.05, FDR < 0.05, and | normalized enrichment score (NES) | > 1.0. The non-parametric gene set variation analysis was further performed with the GSVA (61) package of R.

Weighted Gene Co-Expression Network Analysis
Weighted gene co-expression network analysis (WGCNA) is commonly used for analyzing high-throughput gene expression data with different characteristics, so as to mine gene coexpression networks and intramodular hub genes based on pairwise correlations in genomic applications. In the present study, we applied the WGCNA (62, 63) R package to analyze key gene clusters that were most relevant to IPI risk groups in DLBCL samples.

Construction and Validation of IPI-Based Immune-Related Prognostic Model
IPI-based immune-related genes were selected to construct the prognostic risk model. The training cohort (563 patients with matched normalized RNA-seq data and survival data from TCGA) was used for the construction of IPI-IPM, and three testing cohorts (GSE10846, GES87341, and GSE117556) were used for validation of the prognostic risk model. The Survival R package was used to analyze the correlation between the expression of objective gene sets and DLBCL patients' overall survival (OS). Univariate Cox regression analysis was performed to screen genes, of which the expression was associated with OS with a p value < 0.05. Lasso (least absolute shrinkage and selection operator) regression analysis was applied for variable selection and regularization to enhance the prediction accuracy and interpretability by using the glmnet (64) R package. Multivariate Cox regression analysis was then carried out to select the optimal genes, based on the method of the Akaike information criterion (AIC) (65). For each sample, the risk score equals the sum of the normalized expression of each gene multiplying its corresponding regression coefficient. Timedependent ROC curves were plotted by using the survivalROC (66) R package. Five hundred sixty-three DLBCL patients in the training cohort were divided to low-and high-risk score groups according to the optimal cutoff value with largest AUC in the receiver operating characteristic (ROC) curve of the median survival time. Then, Kaplan-Meier survival analysis and timedependent ROC curve analysis were performed to evaluate the prognostic significance and accuracy of IPI-IPM (67). Besides, Harrell's concordance index (C-index) was calculated by using the survcomp (68) R package. Univariate and multivariate Cox regression analyses were performed on the risk score and all available clinicopathologic parameters, including age, gender, Ann Arbor clinical stage, LDH ratio, ECOG performance status, and number of extranodal site. Then, we utilized the rms (69) R packages which set up a prognostic nomogram for OS probability assessment by enrolling all the independent prognostic factors. The discriminative efficacy, consistency, and clinical judgment utility of the nomogram score was evaluated by time-dependent calibration plots and decision curve analysis (DCA) (70) using the rms and rmda (71) R package.

Comprehensive Analysis of Molecular and Immune Characteristics in Different IPI-IPM Subgroups
DEGs between high-and low-risk score groups were analyzed following the fold change of a threshold at 1.5 and FDR value < 0.05. The gene expression of samples between IPI-IPM subgroups were analyzed with the t-distributed stochastic neighbor embedding (t-SNE) method by using the Rtsne (72) package of R and then visualized on the 3D map with the scatterplot3d (73) package of R. Somatic mutations of IPI-IPM subgroups were analyzed by using the Maftools (74) R package. The intersection of DEGs and immune-related gene set was used to construct a protein-protein interaction (PPI) network based on the STRING (75) database. Cytoscape (76), plugin MCODE (77), and cytoHubba (78) were utilized to identify top 10 degree genes in the network. The TRRUST (79) database was browsed to explore the curated transcriptional regulatory networks. Gene expression data (TPM) of 570 DLBCL samples were imported into CIBERSORT (80), MCPCounter (81), and xCell (82) to calculate the score to estimate the proportion of TME cells including immune and stromal cells.

Clustering Analysis of Expression Pattern of LME Signatures
An unsupervised clustering algorithm was applied to analyze the gene expression pattern of LME signatures in 563 DLBCL samples. By using the ConsensusClusterplus (83) R package, we performed the k-means clustering algorithm with 1,000 repetitions to ensure the stability. The Pearson distance matrix calculated from the clustering was then imported to Cytoscape for the visualization of distribution of samples corresponding to LME signature clustering.

Data Analysis
All statistical data were analyzed in the R software (version 4.0.3). A Wilcoxon test was applied to compare continuous variables between two groups of sample data. A Kruskal-Wallis test was applied to compare continuous variables among three or more groups of sample data. A test is considered with statistical significance at two-sided p < 0.05. Pearson correlation was used to test the correlation between two sets of continuous data, and an absolute Pearson correlation coefficient larger than 0.3 was considered to be correlated. A Pearson correlation is considered with statistical significance at FDR < 0.05. Spearman correlation was used to test the correlation between gene expression and IPI groups. We used ggplot2, ggstatsplot, and ggpubr R packages (84,85) for data analysis and visualization.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The patient cohorts we used were publicly available datasets that were collected with patients' informed consent.

AUTHOR CONTRIBUTIONS
SM, DS, CS, and YH conceived of and designed the study. SM and DS did the literature research, performed the study selection, data extraction, and statistical analysis, and wrote the draft. LA, FF, and FP participated in the extraction and analysis of data. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
The results here are based upon data generated by the TCGA Research Network. The study reported herein fully satisfies the TCGA publication requirements (https://www.cancer.gov/tcga). The authors would like to thank the TCGA and GEO developed by National Institutes of Health.