Analysis of Cellular Heterogeneity in Immune Microenvironment of Primary Central Nervous System Lymphoma by Single-Cell Sequencing

Background Primary central nervous system lymphoma (PCNSL) is characterized by a lack of specificity and poor prognosis. Further understanding of the tumor heterogeneity and molecular phenotype of PCNSL is of great significance for improving the diagnosis and treatment of this disease. Methods To explore the distinct phenotypic states of PCNSL, transcriptome-wide single-cell RNA sequencing was performed on 34,851 PCNSL cells from patients. The cell types, heterogeneity, and gene subset enrichment of PCNSL were identified. A comparison of the PCNSL cells with 21,250 normal human fetal brain (nHFB) cells was further analyzed to reveal the differences between PCNSL and normal sample. Results Six cell populations were mainly identified in the PCNSL tissue, including four types of immune cells—B cell, T cell, macrophage and dendritic cell—and two types of stromal cells: oligodendrocyte and meningeal cell. There are significant cellular interactions between B cells and several other cells. Three subpopulations of B cells indicating diffident functions were identified, as well as a small number of plasma cells. Different subtypes of T cells and dendritic cells also showed significant heterogeneity. It should be noted that, compared with normal, the gene expression and immune function of macrophages in PCNSL were significantly downregulated, which may be another important feature of PCNSL in addition to B cell lesions and may be a potential target for PCNSL therapy.

INTRODUCTION Primary central nervous system (CNS) lymphoma (PCNSL), defined as diffuse large B cell lymphoma (DLBCL) confined to the CNS, can occur in the setting of immunosuppression (HIV/ AIDS, post-transplant) or in immunocompetent individuals (1,2). The incidence of PCNSL accounting for 2% to 3% of all CNS neoplasms (3) has reportedly increased in the immunocompetent population (4). Incidence of PCNSL is recently increasing in the elderly (5). It is not possible to morphologically distinguish PCNSL and peripheral DLBCL. This non-Hodgkin aggressive Bcell lymphoma is distinguished from extra-cerebral DLBCL by its poorer prognosis. The tumor cells are mainly large, with numerous apoptotic cells or widespread necrosis, which commonly hinders diagnosis in small biopsies (6,7).
The molecular subtype of PCNSL has been studied by different methodological approaches with conflicting conclusions. On the basis of several IHC studies, an ABC-like immunophenotype is typical (8,9), but immunoglobulin heavychain gene mutational signatures also provide evidence for germinal center exposure, indicating that PCNSL develops from a B-cell that has been exposed to a germinal center influence outside the CNS (10)(11)(12). In addition, results of immunophenotyping studies suggest that tumor cells originate from a late germinal center to an early postgerminal center stage (13), while gene expression profiling studies indicate that PCNSLs are distributed among the spectrum of systemic DLBCL with roughly equal proportion of ABC and GC cases (14,15).
Standard chemotherapeutic regimens for systemic DLBCL show little efficacy in PCNSL, likely because of inefficient drug delivery across the blood-brain barrier (16). High-dose methotrexate-based chemotherapy is the standard therapy for PCNSL. Chemotherapy with whole-brain radiation therapy has produced response rates up to 80%-90% and median overall survival up to 5 years. While treatment response rates are high, relapses are frequent and prognosis after recurrence is poor with 5-year survival rates ranging from 22% to 40% (17,18). Up to 50% of patients with PCNSL will relapse, and 10%-15% demonstrate primary refractory disease, indicating a significant unmet therapeutic need (19). However, PCNSL has a poor prognosis compared with that of DLBCL (20) and the reason for the difference in prognosis between PCNSL and DLBCL has not been elucidated.
The rarity of the disease and the difficulty of obtaining intracranial specimens have hindered understanding of the pathophysiology of PCNSL. The observed overexpression of BCL6 and aberrant somatic hypermutation (aSHM) of many genes, together with expression of immunoglobulin M at the cell surface, have suggested that PCNSL cells may be arrested at the stage of terminal B-cell differentiation (21). The genomic alterations (GAs) underlying PCNSL have not been comprehensively studied. Single-nucleotide mutations in various genes, including MYD88, CD79b, PIM1, and BTG2, have been reported as the most prevalent genetic alterations in PCNSL (22)(23)(24).
Genome-wide gene expression in PCNSL compared with non-CNS DLBCL suggests that PCNSL has specific signatures to be distinguished from non-CNS DLBCL and greater molecular heterogeneity (14,25). Of the genetic changes that lead to PCNSL, very little is known and no characteristic genetic alterations have been defined thus far. However, few retrospective studies have examined the detailed molecular network and cell signaling based on diagnosis with gene mutations and CNVs or the prognosis of patients with PCNSL.
Some vague evidences suggest that CNS lymphoma and peripheral lymphoma are heterogeneous, which may be related to different immune environments and different origins of lymphoma cells. At present, there is no clinical significance to reveal the molecular characteristics of CNS lymphoma.
Here, we performed an analysis of scRNA-seq data of CNS tissue sequenced by 10x Genomics to identify the cell types of CNS cells; we next performed differential expression (DE) analysis on two PCNSL-associated cells (i.e., B cell, plasma cell) and gene set enrichment analysis (GSEA) of their DE genes on the cell type-specific marker genes to examine the cell-specific functionalities in PCNSL development.

Single-Cell RNA Sequencing
Human PCNSL tumor samples were obtained surgically from clinical patients confirmed by pathology from our hospital. After sample collection processing and suspensions, we performed scRNA-seq following the manufacturer's protocol. Briefly, the cells were washed with PBS and resuspended in 500 ml PBS. scRNA-seq libraries were prepared using a Chromium Single cell 3′ Reagent kit, version 2. Amplified cDNA and final libraries were evaluated using a High Sensitivity DNA Kit (Agilent Technologies). Sequencing was performed on NovaSeq 6000 (Illumina) at a depth of approximately 400M reads.

Single-Cell Data Processing
The Cell Ranger software pipeline (version 4.0, http://support. 10Xgenomics.com/single-cell-gene-expression/software/ overview/welcome) provided by 10x Genomics was used to demultiplex cellular barcodes. Unique molecular identifier (UMI) counts were obtained by mapping reads to the human reference genome (GRCH38 3.1.0) genome and align transcriptomes using the STAR aligner and down-sample reads as required to generate normalized aggregate data across samples. In the end, a matrix of gene counts by cells was produced. We processed the UMI count matrix using the Seurat R package (version 4.0.2), resulting in 34,851 cells with 36,601 genes for PCNSL samples.
We first removed the likely multiplet captures, which is a major concern in microdroplet-base experiments through DoubletFinder. We filtered cells at the cell and gene levels to obtain the reliability results of PCNSL scRNA-seq data, respectively. We removed the low-quality cells with the following criteria: (i) the number of expressed genes was <200 or >2,000; (ii) the number of total counts was > 20,000; and (iii) the percentage of mitochondrial counts > 10%. We only kept the genes detected in at least 20 cells. After applying these quality control criteria, we obtained 20,307 cells with 12,229 genes in total, which were used for downstream analysis.
To analyze the differences between PCNSL and normal brain cells, we referred to a published scRNA-seq data of normal human fetal brain (nHFB). The raw data are from the European Genome-Phenome Archive: EGAS00001004422 (https://ega-archive.org/studies/EGAS00001004422). For accessibility reasons, we directly used the cellranger matrices from HFA567_total.filtered_gene_matrices, HFA570_total. filtered_gene_matrices, and HFA571_total.filtered_gene_ matrices (https://datahub-262-c54.p.genap.ca/GBM_paper_ data/GBM_cellranger_matrix.tar.gz) (26). The following criteria were used for nHFB scRNA data processing: (i) the number of expressed genes was <100 or >3,000; (ii) the number of total counts was > 20,000; and (iii) the percentage of mitochondrial counts > 10%. We only kept the genes detected in at least 20 cells. After applying these quality control criteria, we obtained 20,015 cells with 16,608 genes from 21,255 cells with 33,694 genes in total, which were used for downstream analysis.

Cell Subpopulation Detection
scRNA-seq data were normalized using log transforms and scaled so that the mean expression across cells is 0 and the variance across cells is 1. Then, we performed dimension reduction and visualization on the single-cell data through the Seurat R package (version 4.0.2) (1,27). We used the VST method (28) to obtain the top 3,000 highly variable genes and used the PCA method to reduce dimensionality. The top 21 principal components (PCs) were selected for tSNE to visualize the cell clustering (29). We then clustered the cells using the Leiden algorithm (30) implemented in the Seurat package, with the resolution parameter set to be 0.7. There are two methods for annotation cell types: (i) the Wilcox test was used to identify significantly differentially expressed genes (DEGs) between clusters. To identify the marker genes of each cluster, we combined the rest of the cells and set them to control cells. For the process of identifying DEGs between PCNSL and normal samples, we set cells of one cluster in PCNSL samples as the treatment group and the cells of this cluster in normal samples as the control group. We considered the genes with adjusted p-value < 0.05 and |log2FC|> 0.58 for each scRNA-seq cell type as the DEGs. The top 30 genes are used for the annotation cell type manual. According to the detailed intracellular marker gene, we combined with the CellMarker (http://bio-bigdata.hrbmu.edu.cn/ CellMarker/) database. (ii) By comparing scRNA-seq with HumanPrimaryCellAtlasData and BlueprintEncodeData data, we utilized both the SingleR R package (version 1.4.1) and manual work to assign cell types to cluster automatic annotation. Finally, we compare the annotation results and determine the cell types.

Gene Function Enrichment Analysis
We then performed gene ontology (GO) enrichment analysis using the database for annotation, visualization, and integrated discovery database (DAVID: version 6.8, with the significant DEGs https:// david.ncifcrf.gov/summary.jsp). It is an essential tool for systematically extracting biological information from a set of genes. Adj. p-value < 0.05 was considered statistically significant.

Cell Communication Analysis
In order to concretely prove the direct interaction between cell subpopulations, we used CellPhoneDB (version 2.1.7) to conduct interactive analysis on cell subpopulations. CellPhoneDB is a publicly available repository of selected receptors, ligands, and their interactions (31). Here, we used it (version 2.1.7) to explore the cell-cell communication by immune-related proteins. Here, we focus on the four classes of immune cell subpopulations and immunerelated genes. There are eight classes of gene lists collected in our research: chemokines, T helper type 1 genes (Th1), T helper type 2 genes (Th2), T helper 17 genes (Th17), T regulators (Treg), and costimulatory, coinhibitory, and immune niche genes. The circle edge width is proportional to the number of cells in each cell cluster and the communication score between interacting cell clusters, respectively.

Identification of Cell Types in PCNSL and nHFB
After quality filtering, 40,322 cells with 20,452 genes were obtained from the PCNSL and nHFB samples (details in Materials and Methods). All cells were classified into 13 subpopulations by combining the automatic and manual annotation methods. The cell types were identified as B cell, T cell, macrophage, dendritic cell, oligodendrocyte, meningeal cell, excitatory neuron (EN), interneuron (IN), radial glia (RG), inhibitory neuronal progenitor cell (INP), excitatory neuronal progenitor (ENP), astrocyte, and oligodendrocyte progenitor cell (OPC) ( Figure 1A and Table 1). The combination comparison ( Figures 1A, B) indicated that PCNSL and nHFB had significant differences in cell heterogeneity, and there were certain similarities only in macrophages and meningeal cells. The B cell, T cell, macrophage, and dendritic cell population accounted for the highest proportion in PCNSL (Table 2, Figure 1C). By extracting the average value of top 30 gene expression in each group, and performing Pearson correlation coefficient (PCC) expression correlation analysis, the difference between groups is greater than the difference within the groups, which proves that our subpopulation annotation has high sensitivity and specificity ( Figure 1D). The result of the bubble plot reveals that typical marker genes are particularly expressed in the cell subpopulations ( Figure 1E). Overall, the annotation results show the complex cell composition in PCNSL tissue.

The Heterogeneity of B Cells in PCNSL and Their Communication Interaction With Other Immune Cells
Malignant proliferation and abnormal differentiation of B cells (toward plasma cells) are the main pathological characteristics of PCNSL. Molecular targeted therapy targeting B cells has also been considered as a promising way to treat PCNSL in recent years. Nevertheless, the heterogeneity of B cells in PCNSL remains unclear. Therefore, we first analyzed the heterogeneity of B-cell subtypes and genetic characteristics in PCNSL from single-cell resolution. CD79A served as a typical marker gene of the B cell population in total PCNSL cells ( Figure 2A). Then, the identical methods with cell population detection were used to re-cluster B cells into four distinct subclusters: B cell-1, B cell-2, B cell-3, and plasma cell ( Figure 2B). We identified the marker genes for each subcluster among the B cell population ( Figure 2C and Supplementary Table 1). The functional annotations of each subcluster were counted (Supplementary Table 2) and the three significant pathways are shown in Figure 2D. The B cell-1 cluster highly expressed marker genes (e.g., CD79A, CD79B, TCL1A) of the classical B cells. The B cell-2 cluster was significantly marked by, e.g., NCL, LTB, NEAT1, and enriched functional pathways of antigen processing and presentation of endogenous peptide antigen via MHC class I and positive regulation of T cell-mediated cytotoxicity, suggesting that these B cells have strong antigen-antibody immunity and synergistic disturbance with T cells. The B cell-3 cluster (e.g., CST7, CXCL13, MT-ND3) indicated functions of cell-cell adhesion regulation, neuroinflammation response, and hydrogen peroxide biosynthetic process. Plasma cells (e.g., IGHG1, IGHG3, IGHG4) played a significant role in complement activation.
Cell-cell communication is a fundamental process that shapes biological tissue. In particular, ligand-receptor pairs can infer        (32). Here, we focused on four classes of immune cell subpopulations and immune-related genes. Figure 2E shows the interactions among immune-related cell subpopulations: B cell, T cell, macrophage, and dendritic cell subpopulation. The circle edge width is proportional to the number of cells in each cell cluster and the communication score between interacting cell clusters, respectively. The circle plot ( Figure 2F) showed the interactions of B cell with T cell, macrophage, and dendritic cell, respectively. The bubble plots ( Figure 2G) show the ligand-receptor pairs contributing to the signaling from B cells to other three clusters. The circle plots show that the interaction between macrophage and T cell has the most count numbers among others. Interestingly, we identified that significant target CD74 with MIF, COPA, and APP might participate the process of B cell interacting with T cell, macrophage, and dendritic cell based on the CellPhoneDB results, and CD74-MIF interaction was the most significant among these.  Figure 3C). The proportions of these subtypes are shown in Figure 3D. The marker genes were identified for each subcluster among the T cell population (Supplementary Table 3), and the three typical marker genes for annotation are shown in a bubble plot ( Figure 3D). Three significant function pathways enriched by the marker genes of the four subclusters are shown in Figure 3E (Supplementary Table 4). DEGs of MPC were highly enriched in the G2/M transition of the mitotic cell cycle pathway, which was closely related to the proliferation and differentiation of various lymphocytes in PCNSL and might be the main source of tumor germination. The dendritic cell population was labeled by a marker gene CPVL ( Figure 3F). Then, the dendritic cells were re-clustered into three distinct clusters: conventional dendritic cell (cDC), myeloid dendritic cell (mDC), and plasmacytoid dendritic cell (pDC) ( Figure 3G). We identified the marker genes for each subcluster among the dendritic cell population (Supplementary   Table 5). The typical marker genes for annotation are shown in a bubble plot (cDC: e.g., STMN1, HMGB2, HIST1H4C; mDC: e.g., CST3, HLA-DPA1, HLA-DQB1; pDC: IGLC2, GAS5, IL32) ( Figure 3H and Supplementary Table 5). The proportions of the subclusters are shown in Figure 3I. Evaluating known pathway expression in every three subpopulations of dendritic cell group using GO enrichment analysis revealed strong enrichment of regulation of viral entry into host cell, adaptive immune response, and neutrophil degranulation for mDC subpopulation; SRP-dependent cotranslational protein targeting to membrane , interferon-gamma-mediated signaling pathway, and antigen processing and presentation of exogenous peptide antigen via MHC class II for pDC; and positive regulation of NIK/NF-kappaB signaling, regulation of G2/M transition of mitotic cell cycle, and viral process in cDC. The function pathways of the marker genes enriched by GO analysis are shown in Figure 3J and Supplementary Table 6.

Identification and Variation Analysis of Macrophage Between PCNSL and Normal Brain
It is noted that macrophage in PCNSL and HFB shows obvious difference in cell proportion and distribution. Then they may also have some differences in function. LYZ was used as a marker gene of the macrophage population ( Figure 4A). p-value = 7.35E-34), and TLR10 (log2FC = -0.54, adj. p-value = 3.69E-40) are selected for visualization of the differentially expressed between PCNSL and nHFB samples ( Figure 4B). There are 143 DEGs between cancer and normal cells. Fifty genes are upregulated in cancer cells compared to the normal cells, while 93 genes are downregulated in cancer cells compared to normal cells ( Figure 4C and Table 3). Moreover, the top 20 enriched functional pathways of these DEGs are shown in Figures 4D, E. Compared with PCNSL, macrophages in nHFB exhibit normal immune function, and the un-regulated DEGs are mainly enriched in pathways including immune response, neutrophil chemotaxis, and chemokine-mediated signaling pathway ( Figure 4D). However, this classic immune function of macrophage in PCNSL was significantly downregulated, and no significant pathways were found ( Figure 4E). These results suggest that the immune function of macrophages in PCNSL is significantly suppressed.

DISCUSSION
CNS tissue is the key site of PCNSL pathogenesis and progression which consists of diverse types of cells.
Understanding the molecular phenotype of each cell type to the PCNSL is an important step to reveal the pathogenesis of PCNSL. The availability of scRNA-seq data allows us to perform molecular and functional heterogeneity analysis for different cells in the PCNSL tissue. In this study, we investigated CNS tissues at a single-cell resolution using gene expression profiling. We mapped the immune microenvironment of PCNSL and identified the cellular markers and functional signals of immune cells such as B cells, macrophages, and dendritic cells in PCNSL, which provided a partial basis for guiding the precise treatment of PCNSL. Abnormal proliferation and differentiation of B cells are directly related to the pathological basis of PCNSL, and scRNA-seq enables us to directly parse the heterogeneity of B cells from single-cell resolution. We identified that B cells can be divided into three subgroups with completely different gene expression and function, including a small number of terminally differentiated plasma cells (IGHG1, IGHG3, IGHG4). We identified that B cells can be divided into three subgroups with completely different gene expression and function, including a small number of terminally differentiated plasma cells. In addition to the subsets of normal B cells (B cell-1 overexpressed CD79A and CD79B), B cell-2 showed an important role in antigen processing and regulating the cytotoxicity of T cells, and its function was similar to that of dendritic cells, which may be related to the abnormal differentiation of B cells in PCNSL. This also indicated from the molecular phenotype that B cells in PCNSL were in the pathological overactivation state.
Cell-cell communication between B cells and other immune cells is also an important mechanism for the progression of PCNSL disease (12,15,33). Surprisingly, we found that CD74 might be a key target to regulate the communication between B cells and the other three types of immune cells (T cell, macrophage, dendritic cell) and CD74-MIF was identified to be the most significant interaction. The macrophage migration inhibitory factor (MIF) receptor CD74 is overexpressed in various neoplasms, mainly in hematologic tumors, and         (34). CD74 is quickly internalized and recycles after antibody binding; therefore, it constitutes an attractive target for antibody-based treatment strategies. CD74 has been further described as one of the most upregulated molecules in human glioblastomas. They also pointed that CD74 expression in human gliomas is restricted to GAMs and positively associated with patient survival. Moreover, CD74 represents a positive prognostic marker most probably because of its association with an M1-polarized immune milieu in high-grade gliomas. Our study strongly suggests that CD74 of B cells in PCNSL may be an important cellular and molecular pathological basis for its coordination and disturbance with other immune cells to induce the disorder of the immune microenvironment. This may provide an important basis for targeted therapy of PCNSL. However, there are still limitations in the above conclusions, namely, that malignant and non-malignant B cells in PCNSL have not been identified and distinguished. These analyses are mainly based on the mixed population of B cells in PCNSL. In this regard, the separated malignant and non-malignant B cells from PCNSL tumors should be accurately analyzed for the tumor cell heterogeneity of PCNSL in the future.
In addition, we also demonstrated that immune cells would be affected by the tumor microenvironment. Heterogeneities were also observed in these cell types. We identified T cell, T helper cell, NKT cell, and MPC subtypes within T cell populations that possessed different ontologies. Moreover, we also identified three subtypes within the dendritic cells, including conventional dendritic cell, myeloid dendritic cell, and plasmacytoid dendritic cell. They also enriched into NF-kB signaling, antigen processing and presentation, and neutrophil degranulation pathways, respectively. Further studies on the immune regulation mechanism of these subtypes of T cells and dendritic cells might provide a new strategy for the treatment of PCNSL.
In conclusion, our single-cell analysis enabled us to study the cellular heterogeneity and gene expression characteristics of PCNSL at a single-cell resolution and whole-transcriptome scale. Our study is the first to analyze the cellular and molecular pathological map at single-cell resolution. Despite the small size of single-cell samples in our study, it reveals novel potential for targeted immunotherapy of PCNSL.

DATA AVAILABILITY STATEMENT
The raw and processed data supporting the findings in this study are available from the corresponding author, and has been submitted to the Gene Expression Omnibus (GEO) database under the accession code GSE181304 (https://www.ncbi.nlm.nih. gov/geo/query/acc.cgi?acc=GSE181304).

ETHICS STATEMENT
Written informed consent was not obtained from the individual (s) for the publication of any potentially identifiable images or data included in this article.