Two Distinct Subtypes Revealed in Blood Transcriptome of Breast Cancer Patients With an Unsupervised Analysis

Background: Breast cancer (BC) is a highly heterogeneous cancer. The interaction between immune system and BC is complex, widespread yet unclear. In this study, we aimed to reveal the heterogeneity of host systemic immune response to BC and understand the possible mechanisms that may drive the heterogeneity using transcriptomic data from peripheral blood mononuclear cells (PBMCs). Methods: Transcriptome-wide gene expressions of PBMCs in 33 BC patients were generated by RNA sequencing. An unsupervised clustering algorithm was employed to discover PBMC transcriptome subtypes among BC patients. Association analysis between PBMC subtypes and age, clinical stage, abundance of immune cells, and other clinical factors was performed to understand the underlying biological processes that may drive this heterogeneity. Immune gene signature identification and in silico survival analysis were performed to investigate the potential clinical implications of these PBMC subtypes. The findings were validated using the whole blood transcriptomes of an independent cohort. Results: We observed that established BC subtypes were not associated with PBMC gene expression profiles. Instead, we discovered and validated two new BC subtypes using PBMC transcriptome, which have distinct immune cell proportions, especially for lymphocytes (P = 5.22 × 10−12) and neutrophils (P = 1.13 × 10−14). Enrichment analysis of differentially expressed genes revealed that these two subtypes had distinct patterns of immune responses, including osteoclast differentiation and interleukin-10 signaling pathway. We developed two immune gene signatures that can differentiate these two BC PBMC subtypes. Further analysis suggested they had the ability to predict the clinical outcome of BC patients. Conclusions: PBMC transcriptome profiles can classify BC patients into two distinct subtypes. These two subtypes are mainly shaped by different immune cell abundance, which may have implications on clinical outcomes.


INTRODUCTION
Breast cancer (BC) is now the most frequently diagnosed cancer and the sixth leading cause of cancer-related death among Chinese women (1). To gain better outcomes, the early diagnosis, prognosis and treatment monitoring are critically important (1). However, BC is well-known as a highly heterogeneous malignant tumor, both molecularly and histologically. At present, BC has been classified into five intrinsic molecular subtypes, including luminal-A, luminal-B, HER2-enriched, basal-like, and normallike (2)(3)(4)(5). Each subtype has distinct gene expression profiles, which is associated with cancer prognosis, disease progression, cancer metastasis, and therapeutic resistance (2)(3)(4)(5). Based on several clinical and pathological factors, such as estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status, BC is routinely divided into several subtypes in clinical implementation (6,7). These clinical classifications are frequently used to guide the treatment of BC patients (6,7).
Although genetic and epigenetic changes are the key causes of BC, both the innate and adaptive immune system may play substantial roles in BC progression and metastasis as well (8). The presence of cancer cells can activate different immune cells to undergo various phenotypic and functional changes, and eventually kill cancer cells or promote the proliferation of cancer cells (9,10). Several studies have attempted to detect the presence of cancers by profiling the gene expression in peripheral blood mononuclear cells (PBMCs) from BC patients (11)(12)(13)(14) and some other malignant tumors (15,16). They have proposed several PBMC gene expression signatures that can significantly differentiate cancer patients from healthy controls (12,13,15,16). Furthermore, expression profiles of several immune-related genes in PMBCs from BC patients can predict the relapse of triple negative BC (11,14). These findings indicated that transcriptomic analysis of peripheral blood immune cells (PBMCs) might be a practical way to evaluate the host systemic immune responses against cancer cells. Notably, this is especially valuable, since the collection of blood samples is non-invasive and convenient as compared to the sampling of tumor tissues (11). However, the human immune system is substantially variable (17). A wide range of factors, such as age, sex, genetic background, and some environmental influences, can perturb and shape the blood transcriptome (17). The relationship between immune system and BC is intricate, and many unanswered questions remain (8,18). Among them, one of the most important issues is to explore the heterogeneity of blood transcriptome of BC patients and the clinical relevance of this heterogeneity.
In this study, we aimed to reveal the heterogeneity of host systemic immune response to BC and understand the possible mechanisms that drive the heterogeneity. First, we measured the transcriptome-wide gene expressions in PBMC samples from 33 BC patients using RNA sequencing (RNA-seq), and correlated the gene expression profiles with known clinical classifications. Next, we performed an unsupervised cluster analysis on PBMC expressions to reveal the heterogeneity among BC patients and de novo classified BC patients with distinct host response patterns.
Then, we validated the PBMC subtypes in an independent BC dataset. Furthermore, we investigated possible clinical factors that may be related to the PBMC subtypes of BC patients, including age, clinical stages and the abundance of immune cells. Finally, we explored the potential of using PBMC gene signatures to predict the clinical outcome of BC patients.

Overview of Patient Cohorts
In this study, we recruited 33 BC patients from the First Affiliated Hospital of Nanjing Medical University, between July and September 2017, as a discovery cohort. All patients participated anonymously in consideration of privacy and security concerns. The detailed baseline demographic information of the discovery cohort is listed in  (NuGEN Technologies, CA, USA) and sequenced on Illumina HiSeq X Ten platform (Illumina, CA, USA) using paired-end 150 bp runs.

RNA-Seq Data Analysis
RNA-seq reads were aligned to human genome 19 by HISAT2 (22), quantified by featureCounts (23) and assembled by StringTie (24). The expression level of genes was quantified in forms of both counts data and normalized FPKM (fragments per kilobase of exon per million reads mapped). In total, expression values of 57,773 unique genes in PBMC samples of BC patients in the discovery cohort were measured. Considering the different types of gene expression profiles in the discovery and validation cohorts, GLM in DESeq2 (25) was used to perform the differential gene expression analysis for RNA-seq data, while linear models in limma (26) was used for microarray data. Genes with a fold change in expression level of <0.25 or >4.0 and FDR-corrected P < 0.01 were identified as significant differentially expressed genes (DEGs). The annotation and enrichment visualization of DEGs were accomplished using Metascape (http://metascape. org) (27) and Reactome pathway database (https://reactome.org/) (28). The Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Reactome pathways with a P < 1 × 10 −5 in the enrichment analysis were retained.

Discovery and Validation of the PBMC Subtypes
We used unsupervised consensus clustering (29) to discover intrinsic PBMC subtypes in the discovery and validation cohorts, respectively. The consensus clustering is a resampling-based method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters (29). The method, which is robust and insensitive to the initial conditions, has been widely used to identify biologically meaningful clusters (29). In detail, we first selected the top 5,000 variable genes measured by median absolute deviation as the most informative genes for class detection. Then, we performed a bootstrap procedure with 80% item resampling and 80% gene resampling on the PBMC gene expression profiles 10,000 times using the agglomerative hierarchical clustering algorithm with the Spearman distance metric. We selected the optimal number of clusters that corresponds to the most stable consensus matrices and the most unambiguous cluster assignments across permuted clustering runs by varying the number of clusters from 2 to 10 (29). This process determined the optimal number of intrinsic unsupervised clusters defined by PBMC transcriptome in the discovery cohort. To validate the result, we implemented the same procedure on the validation cohort. In addition, we used ingroup proportion (IGP) statistical analysis (30) to demonstrate the existence of the clusters in the validation cohort and evaluate the reproducibility of the clusters derived from consensus clustering in the two independent cohorts. IGP provides a quantitative approach to measure the similarity between the clusters. IGP will be 100% if the clusters are identical between two datasets and will be 0% conversely. Due to the different types of expression values in the two datasets, we normalized the expression data by Z-score prior to the IGP statistical analysis. The consensus clustering and IGP analysis were performed in R (https://www.r-project.org/) (31).

Estimation of the Abundance of Major Immune Cells Using Gene Expression Profiles
We used CIBERSORT (https://cibersort.stanford.edu/) (32) with the LM22 signature gene matrix (32) to characterize the proportion of immune cells in the PBMC sample of each BC patient in both discovery and validation cohorts. CIBERSORT is able to accurately estimate cell composition of complex tissues from their gene expression profiles, including the immune cells in human PBMC samples (32). We obtained the proportion of seven major immune cell types, including lymphocytes (consisting of all types of B cells, T cells, and NK cells), monocytes, macrophages (consisting of M0, M1, M2 macrophages), dendritic cells (consisting of resting and activated dendritic cells), mast cells (consisting of resting and activated mast cells), eosinophils and neutrophils. All subsequent analysis of immune cell proportions in this study was based on the estimation of these seven major cell types.

Survival Analysis
We identified the immune-related gene signatures using their expression in PBMC samples. To explore the implication of the immune-related gene signatures on the patient's survival, we used Kaplan-Meier-plotter (http://www.kmplot.com/) (33) to perform in silico survival analysis. Kaplan-Meier-plotter is able to assess the effect of 54,000 genes on cancer survival in 21 cancer types, including BC, using their expression profiles in the tumor tissue (33).

Statistical Analysis
To compare the clinical characteristics, cell proportions and established subtypes between clusters in both cohorts, we performed the Fisher's exact test or Pearson's chi-squared test for categorical variables and the Student's t-test for continuous variables. All statistical analysis were performed in R (https:// www.r-project.org/) (31).

Established Clinical Classifications Cannot Explain PBMC Expression Heterogeneity Among BC Patients
First, we explored the heterogeneity of PBMC transcriptome among the BC patients. We observed that a substantial number of genes varied significantly in expression in PBMC samples of the BC patients in both cohorts (Additional File 2). To explain this variation, we projected the PBMC transcriptome differences among BC patient groups onto known clinical classification. In the discovery cohort, the status of three immunohistochemistry (IHC) markers was available for each patient. We classified BC patients using all three IHC markers' status and compared the gene expression of BC patients with different ER, PR, and HER2 status. No significant difference was found between BC patients with different IHC markers' status (Additional File 3).
FIGURE 1 | Unsupervised consensus clustering of PBMC transcriptome subtypes. Consensus matrix heatmaps for the chosen optimal cluster number (k = 2) for the discovery (A) and validation cohorts (C), respectively. Rows and columns are patient samples and consensus matrix values range from 0 (never clustered together) to 1 (always clustered together). The dendrogram above the heatmap illustrates the ordering of patient samples in 2 clusters. The relative change in area under the cumulative distribution function (CDF) curves when cluster number varying from k to k+1 for discovery (B) and validation data (D). The range of k changed from 2 to 10, and the optimal k = 2.
Frontiers in Oncology | www.frontiersin.org In the validation cohort, only the status of ER and HER2 was available. We tested the expression differences in patients with ER and HER2 status. Again, we found no significant difference (Additional File 4). In addition, gene expression profile of the matched tumor tissue is available for each patient in the validation cohort. With the expression data, we further classified the patients in the validation cohort into PAM50 subtypes (2) and investigated the PBMC transcriptome variations among these patient groups. The result indicated that PBMC gene expression in the BC patients with different PAM50 subtypes are statistically similar (Additional File 4). All these results suggested that the established known subtypes based on IHC marker and PAM50 were not associated with PBMC gene expression in BC patients.

Identification and Validation for PBMC Transcriptome-Based Subtypes for BC Patients
Next, we employed an unsupervised clustering algorithm to classify the BC patients into de novo groups based on their heterogeneity of systemic immune response to BC. We selected the top 5,000 genes with the highest median absolute deviation of expression values in the discovery cohort, and classified BC patients into two clusters, subtype_1 and subtype_2 (Figure 1A), using the consensus clustering algorithm (29). The 2-cluster solution corresponded to the largest cluster number that induced the least incremental change in the area under the cumulative distribution function (CDF) curves while keeping the maximal consensus within clusters and the minimal rate of ambiguity in cluster assignments ( Figure 1B). Finally, subtype_1 includes 19 patients (58%), while subtype_2 includes 14 patients (42%).
To confirm this de novo classification, we independently applied the same analysis procedure (29) on the validation dataset, which is whole blood transcriptome data. Interestingly, we observed that the samples in the validation cohort were also clustered into two optimal clusters, which is very similar to that identified in the discovery dataset (Figures 1C,D). We evaluated the reproducibility of the two PBMC subtypes across the discovery and validation cohorts using in-group proportion (IGP) statistic (30). The IGP values are 89.8 and 75.3% for subtype_1 and subtype_2, respectively, indicating that both subtypes had high consistency between the two cohorts. This suggested that these two PBMC transcriptome subtypes are robust across different BC cohorts.
To understand the underlying biological mechanisms that differ in these two PBMC subtypes, we performed differential gene expression analysis using DESeq2 (25). We observed 1,988 DEGs between these two subtypes in the discovery cohort.
FIGURE 2 | PBMC subtypes shared distinct molecular pathways and immune response patterns. GO terms, KEGG pathways, and Reactome pathways with a P < 1 × 10 −5 in the enrichment analysis are displayed. We observed distinct immune patterns between the two PBMC subtypes. These distinct patterns cover the whole process of host immune response to tumor, including the activation of immune cells, the regulation and response of innate and adaptive immune system, and the production of some specific antibodies.
Frontiers in Oncology | www.frontiersin.org In enrichment analysis for the DEGs, the top 20 significantly enriched GO terms are related to immune regulation (Figure 2). Among them, myeloid leukocyte activation was the most significant GO term. Similarly, the enriched KEGG pathways and Reactome pathways (Figure 2) include osteoclast differentiation and interleukin-10 signaling, which associate to host immune response. The results suggested that the major differences between these two subtypes may be explained by their different immune responses to BC.

PBMC Transcriptome Subtypes Are Distinct in Terms of Immune Cell Abundance
Then, we investigated possible clinical factors that relate to the two subtypes in the BC patients, including age, clinical stage, established BC subtype, blood immune cell abundance, and TILs. In the discovery cohort, there was no statistical difference between the two subtypes in terms of age, histological type or clinical stage ( Table 2), or age, menopausal status or weight in the validation cohort (Table 3). Moreover, we found that the known established BC subtypes, including IHC marker status, IHC-based subtypes, and PAM50 intrinsic molecular subtypes,  cannot account for the differences between PBMC transcriptome subtypes (Tables 2, 3), because both PBMC subtypes contained the BC patients with IHC marker status and PAM50 subtypes. Interestingly, we observed significant differences in proportion of lymphocytes (in the discovery cohort: P = 5.22 × 10 −12 ; in the validation cohort: P = 5.80 × 10 −18 ) and proportion of neutrophils (in the discovery cohort: P = 1.13 × 10 −14 ; in the validation cohort: P = 1.86 × 10 −24 ) between the two PBMC transcriptome-based subtypes ( Table 4). Furthermore, we calculated the neutrophil-to-lymphocyte ratio (NLR), a common and stable hematological indicator that can reflect the inflammatory state of the body (34,35). In comparing the NLR values between the two subtypes, we also observed a FIGURE 3 | The heatmap of TIL differences between patients with two PBMC subtypes. Each row represents an immune cell type identified by LM22, and each column represents an established subtype of BC patients. The value of matrix is the P-value of the TIL difference (Student's t-test) between patients with PBMC subtype_1 and subtype_2. significant difference (in the discovery cohort: P = 6.60 × 10 −6 ; in the validation cohort: P = 9.08 × 10 −21 ). Other immune cells, such as the monocytes in the discovery cohort and the macrophages in the validation cohort, do not show a significant difference (Table 4).
Furthermore, we assessed the TIL differences in tumor tissue samples of patients with different PBMC transcriptome subtypes. We estimated the proportion of immune cells in the tumor tissue sample of each BC patient in the validation cohort, using CIBERSORT with the LM22 signature (32). We found the tumor infiltration of memory B cells is statistically different in BC patients with two PBMC transcriptome subtypes (Figure 3), including all BC patients (P = 0.032), ER+ patients (P = 0.027), Luminal-B patients (P = 0.036) and HER2-patients (P = 0.0022). Additionally, memory resting CD4+ T cells is differentially infiltrated in cancer tissues of patients with different PBMC subtypes in HER2+ patients (P = 0.034) and HER2enriched patients (P = 0.037).
These results suggested that the composition of immune cells in PBMCs and TILs in tumor tissues, rather than age, clinical stage, and known BC subtypes, are related to the heterogeneity of PBMC transcriptome in BC patients.

PBMC Transcriptome Subtypes May Be Related to BC Survival
Finally, we tried to explore the implications of the PBMC transcriptome heterogeneity on BC management. In the previous results, we found no difference in several available clinical characteristics between the two subtypes ( Table 3). However, NLR, which is an indicator of the inflammation level, differed between the two subtypes. The inflammation level has important potential in predicting the clinical outcome of BC (36). We investigated if patients with different PBMC subtypes have different survival rate. Twenty-eight immune-related genes were identified in the pathway of osteoclast differentiation, which is the most enriched KEGG pathway ( Table 5). Expression values of all the 28 genes were significantly higher in subtype_2 than in subtype_1 (Figure 4). Using Kaplan-Meier-plotter (33), we observed that the tissue expression values of the 28-gene signature had the ability to predict the clinical outcomes of all subtypes of BC patients (Figure 5A), as well as ER positive patients (Figure 5B), basal-like patients ( Figure 5C) and clinical stage III patients (Figure 5D). The high expression of these 28 genes in tumor tissue, including IFNGR1, IFNGR2, IL1A, IL1B, TLR2, TLR4, FOSL1, and CSF1, associates with a lower  TYROBP, IFNGR1, GAB2, TNFRSF1A, PTGS2, NFKB2, NFKBIA,  SIRPB1, NFKBIB, RELB, IL1A, IL1R1, IL1B, TLR4, TLR2,  FCGR2A, IFNGR2, FCGR3B, JUNB, FOSL1, JUN, SOCS3,  SIRPA, CR1, LILRB3, LILRA2, LILRA6, CSF1 risk of cancer recurrence and better survival rate in BC patients ( Figure 5).
Furthermore, we repeated the analysis above and identified 16 immune-related genes in the most enriched Reactome pathway (Additional File 5). Similarly, 16 genes including IL1R2, CXCL1, CXCL8, PTGS2, IL1A, IL1RN, and CSF1 were highly expressed in the subtype_2 BC patients (Additional

DISCUSSION
In this study, we revealed substantial heterogeneity of PBMC transcriptome in BC patients (Additional File 2) and identified two subtypes based on the PBMC gene expression profiles (Figure 1). Our results indicated that these two subtypes had distinct molecular pathways in host immune response and regulation (Figure 2). We observed that the PBMCtranscriptome based subtyping was a novel and independent classification for BC patients. The essential molecular basis of the subtyping reflects the interaction between host immune system and BC. We found that the proportion of immune cells in peripheral blood, especially lymphocytes and neutrophils, shaped the significant differences between the two subtypes ( Table 4). Furthermore, two gene signatures that discriminates these two PBMC subtypes are able to predict the clinical outcomes of BC patients (Figure 5 and Additional File 7). Importantly, such subtyping is general and robust, since they were independently observed in both the discovery dataset and validation dataset. In the discovery dataset, we quantified PBMC transcriptome using RNA-seq technology, while the transcriptome data in the validation dataset was gene expression array (12). Although the quantification platform and source samples are different in these two datasets, the findings are consistent (Figure 1). However, a future study using a large prospective cohort will be highly helpful to validate these two PBMC subtypes in BC, since the sample size in the discovery cohort is relatively small.
Current clinical classifications did not reflect the heterogeneity of interactions between BC and host immune system (Tables 2, 3). This is consistent with several previous findings, suggesting that transcriptional fingerprint of BC subtypes is not the predominant signal in the patient's systemic immune response (14,21). Thus, it was difficult to classify BC patients into classical BC subtypes using the PBMC expression profiles. The classification of the established BC subtypes was based on the expression of several important makers in tumor tissue, including ER, PR, and HER2 (6,7). In contrast, PBMCs contains the major inflammatory or supportive cells, which are composed of the main stromal components of tumor microenvironment and govern the systemic inflammatory responses in human malignancies, including BC (37). Therefore, it was reasonable that PBMC transcriptome cannot mirror the different expression profiles in tissue samples among BC patients of different clinical subtypes. Instead, PBMC gene expression profiles might be useful for early diagnosis of human cancers, such as BC and colorectal cancer (11)(12)(13)38).
In order to explore the heterogeneity of host systemic immune response to BC, we employed an unsupervised clustering algorithm to cluster BC patients using PBMC gene expression data, and revealed two distinct subtypes (Figure 1). Functional annotation and enrichment analysis displayed distinguishing immune patterns between the two subtypes (Figure 2). These distinct patterns covered the whole process of host immune response to tumor, including the activation of immune cells, the regulation and response of innate and adaptive immune system, and the production of some specific antibodies. Considering KEGG categorizes genes into meaningful biological pathways, which makes the interpretation more straightforward (39), we focused on the enriched KEGG pathways below. In our results, osteoclast differentiation, cytokine-cytokine receptor interaction and TNF signaling pathway were the top three KEGG pathways that had distinct expression patterns between the two subtypes. Osteoclasts are multinucleated cells of monocyte/macrophage origin that degrade bone matrix. The differentiation of osteoclasts is dependent on a tumor necrosis factor (TNF) family cytokine, receptor activator of nuclear factor (NF)-κB ligand (RANKL), as well as macrophage colony-stimulating factor (M-CSF) (40). BC frequently metastasizes to the skeleton, interfering with the normal bone remodeling process and inducing bone degradation (41,42). Cytokines are highly inducible, secretory proteins that mediate intercellular communication in the immune system. Cytokine and cytokine receptor interaction are regarded as crucial aspects of inflammation and tumor immunology (43). Although the exact initiation process of BC is unknown, inflammation has been proposed as an important factor in tumor initiation, promotion, angiogenesis, and metastasis, in which cytokines are prominent players (44,45). Moreover, many studies suggested that cytokines play an important role in the regulation of both induction and protection in BC (46,47). TNF is a proinflammatory cytokine that plays a critical role in diverse cellular events, including cell proliferation, differentiation and apoptosis (48). TNF-α is an important inflammatory factor that acts as a master switch in establishing an intricate link between inflammation and cancer (48). A wide variety of evidence has pointed to a pivotal role of TNF-α in tumor proliferation, migration, invasion and angiogenesis, including BC (49,50). These enriched pathways hinted that the different status of inflammation may partly explain the differences between PBMC FIGURE 4 | The expression values of 28 immune-related signature genes are significantly different in two PBMC subtypes. The 28-gene signature are derived from immune-related genes in osteoclast differentiation pathway. X-axis and Y-axis are the two PBMC subtypes and gene expression level normalized by FPKM, respectively. P-value is the result of the Student's t-test. All these genes are significantly low-expressed in subtype_1.
transcriptome subtypes of BC patients, which may be related to BC metastasis.
The correlation of PBMC heterogeneity to BC metastasis is also confirmed by the differential analysis of immune cell proportions. Our results showed significant differences of the proportions of lymphocytes and neutrophils in the peripheral blood and the neutrophil-to-lymphocyte ratio (NLR) in the two subtypes ( Table 4). The proportion of lymphocytes in subtype_1 was higher than that in subtype_2, whereas neutrophils were merely the major component of PBMCs in subtype 2. Several previous studies suggested that peripheral blood lymphocytes expressed abundant information about the interactions between the tumors and the host immune system, which are useful biomarkers for predicting the risk of cancer occurrence and recurrence (51,52). Neutrophils, altering the local microenvironment by releasing inflammatory signals and promoting the formation of metastases, were considered as the main driving force of pulmonary metastatic colonization of BC cells (36,53,54). Neutrophils were also observed to be useful biomarkers for clinical BC diagnosis and prognosis assessment (36,53,54). Additionally, the pre-treatment NLR was a prognostic factor for BC (34,35,55,56). A higher NLR was associated with poorer recurrence-free survival in BC patients (34,35,55,56). In addition to immune cells that are circulating in the peripheral blood, BC patients with different PBMC transcriptome subtypes showed distinct TILs in tumor tissues (Figure 3). Although the precise role of tumorinfiltrating lymphocytes in cancer development and metastasis is not well-understood and remains controversial, accumulating evidences suggest that the adaptive immunity mediated by T and B lymphocytes provides a critical foundation for effective and sustained antitumor responses (57).
Above evidences hinted that patients with different PBMC transcriptome subtypes may have different clinical outcomes. However, due to the limitation of small sample size and insufficient clinical data, the direct association between the PBMC subtypes and disease recurrence or cancer survival remains unexplored in our analysis. To partially overcome this, we identified two immune-related gene signatures in PBMCs and examined their power of predicting clinical outcomes using in silico prognostic analysis on their expressions in BC tissue samples. Both gene signatures showed the ability to predict the survival of BC patients (Figure 5 and Additional File 7), which is similar to the findings observed by Foulds et al. (14). In their study, they measured PBMC expression values of 800 immune-related genes and investigated their implications on clinical outcomes. They reported that the expression of CD163, CXCR4, and THBS1 in PBMCs could predict the relapse-free survival for triple negative BC patients (14). In our results, the higher expression of signature genes in tumor tissue corresponded to a lower risk of cancer recurrence and better survival rate. Interestingly, the BC patients with subtype_1 might had smaller metastasis probability and better prognosis, because they had higher proportion of lymphocytes, smaller proportion of neutrophils and lower NLR. However, the expression values of the two sets of signature genes were down-regulated in subtype_1. Therefore, we proposed that the up-regulation expression of immune-related genes in peripheral blood is probably related to a down-regulated expression in tumor tissue. This is very similar to the findings in literatures that the regulation of immune-related gene expression is opposite in blood and tissue (58).

CONCLUSIONS
In conclusion, we identified two new subtypes of BC based on their PBMC expression profiles. The two PBMC transcriptome subtypes had distinct immune patterns, which was associated with different immune cell abundances. In silico prognostic analysis suggested that BC patients of the two subtypes may have different clinical outcomes. Although this classification is probably useful for personalized BC management, further investigation in a large prospective setting is required to ascertain their clinical values.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
The study was approved by the ethical committee of the First Affiliated Hospital of Nanjing Medical University. All samples were used according to the ethical guidelines of the 1975 Declaration of Helsinki and obtained with the patients' understanding that it might be published. The written informed consent was obtained from the participants of this study.

AUTHOR CONTRIBUTIONS
This study was conceptualized and designed by WM, XS, HL, YL, and WG. Samples were collected and provided by HX and YZ. RNA sequencing was completed by YB. ZH and YC contributed to the analysis of RNA-seq data and the estimation of the abundance of immune cells. WM completed the discovery and validation of new subtypes, and performed the survival analysis and statistical analysis. The draft manuscript was developed by WM and WG. All authors reviewed the draft and provided comments, contributing to the final version of the manuscript, read and approved the submission and publication.