Multicohort Analysis Identifies Monocyte Gene Signatures to Accurately Monitor Subset-Specific Changes in Human Diseases

Monocytes are crucial regulators of inflammation, and are characterized by three distinct subsets in humans, of which classical and non-classical are the most abundant. Different subsets carry out different functions and have been previously associated with multiple inflammatory conditions. Dissecting the contribution of different monocyte subsets to disease is currently limited by samples and cohorts, often resulting in underpowered studies and poor reproducibility. Publicly available transcriptome profiles provide an alternative source of data characterized by high statistical power and real-world heterogeneity. However, most transcriptome datasets profile bulk blood or tissue samples, requiring the use of in silico approaches to quantify changes in cell levels. Here, we integrated 853 publicly available microarray expression profiles of sorted human monocyte subsets from 45 independent studies to identify robust and parsimonious gene expression signatures, consisting of 10 genes specific to each subset. These signatures maintain their accuracy regardless of disease state in an independent cohort profiled by RNA-sequencing and are specific to their respective subset when compared to other immune cells from both myeloid and lymphoid lineages profiled across 6160 transcriptome profiles. Consequently, we show that these signatures can be used to quantify changes in monocyte subsets levels in expression profiles from patients in clinical trials. Finally, we show that proteins encoded by our signature genes can be used in cytometry-based assays to specifically sort monocyte subsets. Our results demonstrate the robustness, versatility, and utility of our computational approach and provide a framework for the discovery of new cellular markers.


INTRODUCTION
Monocytes, together with macrophages and dendritic cells (DCs), are part of the mononuclear phagocyte system. Monocytes and monocyte-derived cells play important roles in the regulation of inflammation, both as precursors as well as effector cells (1)(2)(3). Monocytes are a heterogeneous group of cells, and since Passlick and colleagues showed that the combined use of CD16 (FcgRIII) and the LPS co-receptor CD14 identified distinct subsets of monocytes in humans (4), three major subsets have been defined, as well as their murine counterparts. These subsets are: a classical (CD14 + CD16in humans and Ly6C hi in mice), a nonclassical (CD14 -CD16 + in humans and Ly6C lo in mice), and an intermediate subset (CD14 + CD16 + in humans and Ly6C + Treml4 + in mice) (5)(6)(7). Although plasticity is an important characteristic of monocytes (8,9), surface marker expression and functional studies have shown each monocyte subset to be functionally distinct. Consistent with this, transcriptome analyses of sorted monocyte subsets have revealed different gene expression profiles ascribed to each subset (7,10,11). Classical monocytes, around 90% of total monocytes in humans, are efficient phagocytic cells important for the initiation of inflammatory response, with high expression of scavenger and chemokine receptors, and elevated cytokine production (12,13). Earlier work on nonclassical monocytes, around 5% of total monocytes, emphasized their capacity to produce inflammatory cytokines, especially TNF (14). However, recent work has shown that nonclassical monocytes are also involved in immune surveillance of the vasculature and have pro-and antiinflammatory functions (9,15). Intermediate monocytes are considered efficient antigen presenting cells (12). Consequently, altered frequencies of different subsets have been associated with inflammatory conditions, such as infections and autoimmune disorders including lupus, rheumatoid arthritis, and inflammatory bowel disease (13,(16)(17)(18)(19), and more recently, COVID-19 (20,21).
Dissecting the contribution of different monocyte subsets to disease is currently limited by samples and cohorts that can be profiled experimentally using cytometry and cell-staining-based assays. These limitations often result in underpowered studies and, consequently, poor reproducibility (22). Public transcriptomes provide an alternative source of data characterized by high statistical power and real-world biological, clinical, and technical heterogeneity, resulting in increased reproducibility (23)(24)(25)(26)(27)(28)(29)(30). However, most transcriptome datasets profile bulk blood or tissue samples, requiring the use of in silico approaches to quantify changes in the levels of specific cell types (31)(32)(33)(34)(35)(36).
Here, we integrated 853 publicly available microarray expression profiles of sorted human monocyte subsets from 45 independent studies to identify robust and parsimonious gene expression signatures, consisting of 10 genes specific to each subset. These signatures, although derived using only datasets profiling healthy individuals, maintain their accuracy independent of the disease state in an independent cohort profiled by RNA-sequencing. Furthermore, we demonstrate that these signatures are specific to monocyte subsets compared to other immune cells such as B, T, dendritic cells (DCs) and natural killer (NK) cells. This increased specificity results in estimated monocyte subset levels that are strongly correlated with cytometry-based quantification of cellular subsets. Consequently, we show that these monocyte subsetspecific signatures can be used to quantify changes in monocyte subsets levels in expression profiles from patients in clinical trials. Finally, we show that proteins encoded by our signature genes can be used in cytometry-based assays to specifically sort monocyte subsets. Our results demonstrate the robustness, versatility, and utility of our computational approach and provide a framework for the discovery of new cellular markers.

Public Data Collection, Annotation, and Analysis
Unless otherwise noted, we obtained all gene expression data used in this study from the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo/) using the MetaIntegrator R package from CRAN (35) (Supplemental Table 1). All data was manually annotated using the available expression metadata. We normalized each expression dataset using quantile normalization and computed gene-level expression from probelevel data using the original probe annotation files available from GEO as described previously (31). We performed conormalization, effect size calculation, and gene ranking as previously described (31). We performed gene set selection to identify parsimonious gene signatures using the following criteria: (a) we ranked genes based on effect size (b) we filtered genes that were up-regulated in the cell subset of interest (c) we filtered genes with a mean expression difference of 32 expression units or above (d) We selected the top 10 genes for each subset. We chose these criteria to increase the likelihood of successful independent experimental validation of each marker gene. Signature scores were computed by calculating the geometric mean of expression levels of the signature genes in the dataset of interest, as described previously (17)(18)(19)(20)(21)(22)(23). All follow-up analyses were performed using R (v. 3.4.1). Analysis scripts are included as Supplemental Materials and available online here (https://www.biorxiv.org/ content/biorxiv/early/2020/12/22/2020.12.21.423397/DC1/ embed/media-1.gz?download=true).

Pathway Analysis
We performed gene set enrichment analysis (GSEA) as previously described (37). Briefly, for each monocyte subset we computed an effect size vector across all genes as described above. We then applied GSEA to the effect size vectors by comparing to MSigDB, a collection of molecular signatures derived from pathway analysis databases and published molecular data (http:// software.broadinstitute.org/gsea/msigdb/index.jsp). We corrected for multiple hypothesis testing across all pathways using Benjamini-Hochberg's FDR correction. We performed this analysis using the 'fgsea' and 'MSigDB' packages in R.

Samples
For the monocyte sorting and expression profiling, de-identified blood samples from 3 healthy adult donors were obtained from the Stanford Blood Center (SBC) and from 3 Rheumatoid Arthritis (RA) patients at UCSF (Supplemental Table 6). For the flow cytometry, 8 de-identified blood samples from healthy adults from the SBC, and 12 samples from patients with Systemic Juvenile Idiopathic Arthritis from Lucille Packard Children's Hospital were tested (Supplemental Table 7). The work was conducted with approval from the Administrative Panels on Human Subjects Research from Stanford University and UCSF.

Fluorescent-Activated Cell Sorting of Monocytes
Venous blood from healthy controls and RA patients was collected in heparin tubes (BD Vacutainer, BD, Franklin Lakes, NJ); peripheral blood mononuclear cells (PBMCs) were isolated by density gradient centrifugation using LSM Lymphocyte Separation Medium (MP Biomedicals, Santa Ana, CA). PBMCs were enriched for total monocytes using the Pan Monocyte Isolation Kit (Miltenyi Biotech, San Diego, CA). Enriched monocytes were stained for surface antigens as previously described (38). Briefly, cells were stained with LIVE/DEAD Fixable Aqua Dead Cell Stain (Life Technologies, Eugene, OR). Antibodies against CD3, CD19, CD56, and CD66b, all labeled with PercpCy5.5, were used to exclude T cells, B cells, NK cells, and neutrophils respectively ('dump'); antibodies against CD1c and CD141 were used to identify and exclude dendritic cells. Antibodies against HLA-DR-APC Cy7 (clone L243), CD14-Pacific Blue (clone M5E2), and CD16-PE Cy7 (clone 3G8) were used to identify monocytes and their three subsets. All antibodies are from Biolegend (San Diego, CA). Fluorescence minus one (FMO) were used as control for gating cell populations. Sterile flow cytometry sorting was performed using a BD FACSAria II (BD Biosciences) at the Stanford Shared FACS Facility (SSFF) using a 100uM nozzle, yielding monocyte subset purity of over 98% (verified using the classical subset). Sorted cells were collected into polypropylene tubes containing RPMI media (RPMI+10% Heat inactivated Fetal Bovine Serum +1% Penicillin Streptomycin), counted and spun down for total RNA extraction.

Total RNA Extraction and RNA-Seq
Total RNA extraction was performed using the Qiagen RNeasy Micro kit (Qiagen, Germantown, MD). Total RNA concentration and quality were determined using a NanoDrop 1000 Spectrophotometer (ThermoFisher Scientific, Waltham, MA) and a BioAnalyzer 2100 (Agilent Technologies, Santa Clara, CA) at the Stanford PAN Facility. RNA sequencing (RNA-seq) was performed by BGI Americas (Cambridge, MA). The total RNA was enriched for mRNA using oligo(dT), and the RNA was fragmented. cDNA synthesis was performed using random hexamers as primers. After purification and end reparation, the short fragments were connected with adapters. The suitable fragments were selected for PCR amplification. The library was then sequenced using Ilumina HiSeq 2000.

Gene Expression by Real Time RT-PCR
Total RNA from sorted monocytes was converted into cDNA using the iScript Reverse Transcription Supermix (Bio-Rad, Hercules, CA), and cDNA was amplified with SsoAdvanced ™ SYBR ® Green Supermix (Bio-Rad); reactions were performed in a CFX384 real time PCR instrument (Bio-Rad). Primers sequences were obtained from qPrimerDepot (http:// primerdepot.nci.nih.gov/), synthesized at the Stanford University Protein and Nucleic Acid Facility (PAN) and validated in our laboratory as previously described (39) using unsorted total monocytes. Primers are listed on Supplemental Table 8. Relative starting amounts of each gene of interest were determined using the delta delta Cq method.

Robust Subset-Specific Monocyte Signatures From Heterogeneous Gene Expression Datasets
We hypothesized that integrating heterogeneous transcriptome profiles of sorted human monocyte subsets across multiple cohorts would allow us to identify robust subset-specific gene expression signatures. To test this hypothesis, we collected and annotated 853 publicly available gene expression profiles of sorted human monocytes across 45 studies. These datasets spanned 22 microarray platforms for transcriptome profiling of samples acquired from healthy donors (Supplemental Table 1). After sample annotation, we co-normalized and integrated all expression data as previously described (31) (Figure 1). For each gene, we calculated effect sizes as Hedge's g between the samples from a cell subset of interest compared to all other samples. We then characterized the underlying biological functions represented within the transcriptional data for each monocyte subset by performing Gene Set Enrichment Analysis (GSEA) (see Methods). We identified 91 significantly enriched pathways in classical monocytes and 1737 for the nonclassical subset (FDR < 5%). Our analysis revealed pathways associated with known functions of classical monocytes, such as wound healing, cytoskeleton remodeling, and phagocytosis, positively enriched in the classical subset (Supplemental Table 2). In contrast, our analysis of the nonclassical subset revealed categories of diseaseassociated gene expression changes, cell cycle, and metabolism (Supplemental Table 3). Notably, our most significant enrichment consisted of a gene signature previously reported to be down-regulated in Alzheimer's Disease (pathway: 'BLALOCK_ALZHEIMERS_DISEASE_DN', p = 9.9e-6). This is in agreement with previous reports showing that nonclassical monocytes are found to be reduced in patients affected by Alzheimer's (40). These results suggest that our data integration strategy allowed us to preserve and capture previously described biological functions of monocyte subsets irrespective of technical and biological confounders within our collection of datasets. We then applied our multi-cohort analysis framework to identify robust cell-subset specific genes (see Methods) (23,31,41). We considered classical and nonclassical monocyte subsets for our analysis because of their functional importance and the number of available datasets for each subset that could be integrated into our analysis (n>=4). There were 30 genes significantly over-expressed in classical monocytes and 268 genes over-expressed in nonclassical monocytes. We created classical and nonclassical monocyte specific gene signatures using the top 10 genes for each monocyte subset that were consistently elevated within the subset of interest across all our discovery cohorts ( Figures  2A, B and Supplemental Table 4). Among the classical signature genes, five have been previously associated with the classical monocytes in a single-cell RNA-seq study of healthy FIGURE 1 | Generation of monocyte-specific signatures: Workflow depicting collection and annotation of publicly available discovery datasets from NCBI GEO profiling sorted human monocyte cell subsets (classical and non classical). Data was the combined and co-normalized to identify subset-specific signatures. Signatures were validated on a independent RNA-seq chohort, and on PBMCs expression profiles with paired flow data. After validation, signatures were applied on disease-affected cohorts and tested for their viability as phenotypic markers in cytometry-based assays. monocytes, and SIGLEC10 was identified in non-classical monocytes as well (42).

Monocyte Signatures Are Robust in an Independent Validation Cohort and Independent of Disease State
Although the gene signatures for classical and non-classical monocytes were derived by integrating independent datasets with substantial technical and biological heterogeneity, they only included healthy human subjects. We have previously shown that the accuracy of cell-type specific genes can be significantly affected by disease-induced changes in gene expression (31). Therefore, we investigated whether these monocyte subset-specific signatures are confounded by disease state in an independent cohort of healthy controls (n=3) and patients with rheumatoid arthritis (RA; n=3). We sorted monocyte subsets (classical, nonclassical, and intermediate) from peripheral blood samples and measured their transcriptomic profile using RNA-seq (see Materials and Methods).
Hierarchical clustering of the RNA-seq data using all genes in our monocyte subset-specific signatures accurately separated samples according to their cell subset identity, but not by their disease status ( Figure 3A). Importantly, the signature genes showed variable expression levels in the intermediate monocytes, suggesting that the intermediate monocytes may represent a transitional cellular state rather than a stable state. All genes except two from the non-classical monocyte signature, (IER2 and CTSA), were correctly over-expressed in their respective subset, and most (15 of 20) were independently confirmed by RT-PCR in sorted monocytes from healthy controls and patients with RA (Supplemental Figure 1). Next, we defined a classical monocyte subset score (cMSS) of a sample as a geometric mean of expression of genes in classical monocyte-specific signature, and nonclassical monocyte subset score (ncMSS) of a sample as a geometric mean of expression of genes in nonclassical monocyte-specific signature. We computed cMSS and ncMSS for each sample. In the independent cohort of healthy controls and patients with RA, we found that cMSS was  (Figures 3B, C).
Overall, our results provide further independent evidence that our monocyte subset-specific signatures are consistently accurate across healthy and disease-affected samples.

Monocyte Signatures Are Highly Specific Across All Immune Cells
Although our transcriptional signatures are accurate and specific within monocyte subsets, irrespective of disease state and gene expression platform, they were obtained using gene expression data solely derived from monocytes. Therefore, we asked whether our signatures maintained their specificity and accuracy when compared across all immune cell lineages, including B cells, T cells, NK cells. This is important, as their direct application to blood or biopsy-derived expression profiles, which contain multiple and diverse cell populations, would otherwise produce confounded results (43). To answer this question, we compared effect sizes for each gene in our monocyte subset signatures across 20 sorted human immune cell types using 6160 transcriptomes from across 42 different microarray platforms described before (31). Hierarchical clustering of Hedge's g effect sizes (see Methods) of the genes in monocyte subset-specific signatures distinguished myeloid and lymphoid lineages (Supplemental Figure 3). Further, within the myeloid cluster, both CD14+ and CD16+ subsets clustered separately from other myeloid cells (Supplemental Figure 3). Next, we calculated cMSS and ncMSS scores for each sample and evaluated their ability to accurately distinguish each subset among all immune cells. As expected, cMSS scores were significantly higher in classical monocytes compared to nonclassical monocytes (t-test p=1.2e-7) and other immune cell types (t-test p<2.2e-16, Figure 4A). Similarly, ncMSS were significantly higher in nonclassical monocytes compared to classical monocytes (t-test p=4.6e-9) and other immune cell types (t-test p<2.2e-16, Figure 4B).

Monocyte Signatures Reveal Changes Associated With Disease and Treatment
Next, we hypothesized that monocyte subset-specific signatures and their corresponding scores, cMSS and ncMSS, could be used to monitor changes in proportions of monocyte subset levels associated with disease. To test this hypothesis, we analyzed transcriptome profiles of whole blood samples (GSE93272) from healthy controls (n=43) and patients with RA (n=232) (44). We computed cMSS and ncMSS for each sample. We observed a significant increase in both cMSS (p = 4.2e-4) and ncMSS (p = 4.4e-8) in RA-patients compared to healthy controls, which is in line with increased monocyte proportion in patients with RA that has been previously observed ( Figure 5A) (45). Finally, we assessed whether our signatures could detect changes in cellular composition induced by treatment. To this end, we analyzed a longitudinal dataset, GSE80060, profiling whole blood samples of patients affected by sJIA before and after treatment with canakinumab, a monoclonal antibody against IL-1 beta.  Changes in levels of circulating monocytes in sJIA have been described, with higher levels during active disease (46,47). When comparing pre-and post-treatment samples, we measured a significant decrease in both classical (p = 2.9e-12) and nonclassical (p = 1.5e-6) signatures post-treatment irrespective of response ( Figure 5B). Our results indicate that our signatures can be used to specifically monitor changes in monocyte subsets occurring during disease and treatment.

Validation of Monocyte Signature Genes as Novel Cell Surface Markers for Subset Quantification Using Cytometry
Cell type-specific gene signatures have been shown to enable accurate in silico estimation of corresponding cell types from expression data of mixed-cell samples, such as whole blood or peripheral blood mononuclear cells (PBMCs) (31). We therefore tested whether these signatures could be used to accurately quantify monocyte subsets within human samples by using publicly available expression profiles from healthy human PBMCs with paired flow-cytometry [GSE65316 (48)]. We indeed found the cMSS to be strongly and significantly correlated with cytometry-measured monocyte proportions across all samples (r = 0.69, p = 6.7e-4; Figure 6). Next, we hypothesized that our large-scale transcriptome analysis would enable identification of cell surface markers to improve cellular phenotyping by cytometry using FACS or CyTOF. To test this hypothesis, we selected an extended set of genes that were significantly highly expressed in either classical or non-classical monocytes in both discovery and validation samples with an absolute effect size ≥ 1, had documented surface expression, and for which an antibody for follow-up protein quantification by cytometry was commercially available (Supplemental Table 5). We selected CD114 (gene name: CSF3R, ES=-1.28, p=2.84e-7), CD32 (gene name: FCGR2A, ES=-1.21, p=9.48e-7), CD36 (ES=-1.17, p=3.63e-6), and IL17RA (ES=-1.10, p=4.39e-6) as markers with higher expression in classical monocytes, and SIGLEC10 (ES=4.93, p=2.2e-70) as a marker with higher expression in nonclassical monocytes compared to classical monocytes ( Figures 7A, B).
We profiled cell surface proteins corresponding to these differentially expressed genes in PBMCs from healthy adult donors (n=8) and pediatric patients with sJIA (n=12). In both healthy adults and sJIA subjects, expression of the selected markers was higher in the corresponding monocyte subset, as predicted by transcriptome analysis (Figures 7C, D) and Supplemental Figure 5). All of our predicted proteins had significantly different levels between classical and nonclassical monocytes on the cell surface in both healthy controls and sJIA patients (p<0.05).
In summary, we have developed monocyte subset-specific robust and parsimonious gene expression signatures. Our results highlight their specificity and accuracy irrespective of technical and biological confounders and show their utility in translational applications. More importantly, our approach demonstrates that genes differentially expressed between two groups despite biological and technical heterogeneity across multiple independent datasets can be robust differentiators of the two groups at the protein level as well.

DISCUSSION
Here, we describe the generation and application of robust and parsimonious gene expression signatures to accurately and specifically quantify changes in monocyte subset levels from existing publicly available datasets. Our analysis presented here builds upon an existing framework that was previously applied to create a new and unbiased basis matrix for cell-mixture deconvolution of gene expression data. By applying this computational framework that integrates existing heterogeneous public expression data from sorted human monocytes, we identified gene signatures for the classical and nonclassical subsets, each consisting of ten over-expressed genes. We then validated our signatures using transcriptome profiles of 6661 sorted immune cell samples across 168 studies, including samples from patients with various diseases to demonstrate their generalizability despite biological, clinical, and technical heterogeneity. In addition, we profiled two independent validation cohorts by RNA-seq and flow-cytometry, respectively, to validate our signatures at the individual gene level. Our current work differs from previous efforts in several meaningful ways: First, our previous work on deconvolution aimed at building a basis matrix, immunoStates, that would account for 20 immune cell types. As a result, immunoStates is composed of more than 300 genes, and is applied in its entirety to deconvolve a sample of interest. Other studies, such as the one by Monaco et al. (36), established deconvolution approaches to quantify different cell types with high accuracy, as they rely on larger gene sets and statistical approaches tailored toward a specific data type or platform (e.g., RNA-seq).
In contrast, here we focused on creating cell-type specific signatures consisting of only a small set of genes to be used independent of any other signatures or deconvolution framework, while retaining high specificity and accuracy across multiple platforms. This strategy allows the researcher to specifically measure our parsimonious signature genes in a sample of interest using targeted assays such as qPCR or nanoString, which can be useful and cost-effective in pilot studies and clinical settings and is therefore complementary to tailored deconvolution approaches (36).
Second, our current gene selection strategy was chosen to prioritize genes that could be easily used as individual biomarkers for cytometry-based assays. Such strategies take into account the directionality of the markers and their expression difference, to increase the likelihood of validation by flow-cytometry. Indeed, a number of genes in our signatures correspond to surface markers with commercially available antibodies. Using this set of markers, we confirmed the subset specificity of the markers in both healthy and disease samples at the protein level. Among the markers identified and validated by flow cytometry, only CD16 and CD36, in addition to CD14 and HLA-DR, have been commonly used to identify monocyte subsets (49,50). The additional markers we identified could thus be potentially useful in further probing the heterogeneity of monocyte subsets, as revealed by recent studies utilizing the high dimensionality of mass cytometry and single cell sequencing to tease out the heterogeneity of the human monocyte population (42,43,51).
Finally, our analysis leveraged only samples profiled from healthy individuals, whereas our previous work included expression data from disease-affected samples as well. Our rationale for this decision was based on having on average 22 studies per targeted cell type in our discovery set, which triples both the statistical power and the amount of accounted heterogeneity in this study compared to our previous work [8 studies per cell type, (31)]. We hypothesized these increases would result in more robust signatures, and our validation cohort validated our signatures irrespective of disease state. Analysis of our cohort also revealed that the expression levels of our signature scores in intermediate monocytes were intermediate between the classical and non-classical subsets. It has long been debated whether intermediate monocytes exist as a stable subset or represent a transitional state between classical and non-classical monocytes (52). Another alternative, not necessarily mutually exclusive, is that intermediate monocytes may comprise a heterogeneous cell mix (42,53). Our results are consistent with evidence that human intermediate monocytes, like similar cells in mice (54) are an intermediate, transitory subset between classical and non-classical monocytes rather than a fixed, independent population (55). To test this hypothesis, we generated an additional expression signature from our discovery data, identifying genes that could specifically distinguish intermediate monocytes compared to all other subsets. Using the same criteria applied for the other signatures, we identified 10 genes that accurately distinguished intermediate monocytes from all other subsets (ATG2A, ATP50, DX39A, EVL, GPR183, LPCAT1, POU2F2, TSC22D4, ZNF14, ARHGAP27, Supplemental Figure 6). Unlike our previous signatures, we did not observe a good distinction of intermediate monocyte samples, neither in our own discovery set, nor in our validation cohort. Similarly, although gene expression analysis, both from microarray as well as from single cell RNA-seq analysis, generally support the concept of genetically separate three monocyte subsets, the exact nature of the intermediate subset, and its relationship to other monocyte subsets, could not be fully determined (56).
Our work has several limitations. First, our discovery data sets consist exclusively of microarray datasets, which can limit the number and type of cell-type specific markers that can be discovered compared to sequencing-based transcriptome profiling. This is particularly relevant in the context of intermediate monocytes. For example, HLA-DM has been identified as a strong discriminating marker that separates the intermediate subset from classical and non-classical (37). However, this marker is not usually profiled on microarrays, limiting our discovery potential. Secondly, we identified our signature genes by simply selecting the top 10 from a ranked list. While this approach is simple and intuitive, it prevents the consideration of other high-ranking genes as potential biomarkers. This potential can be explored in future work, where additional gene set selection strategies can be applied to this data.
Finally, to increase robustness and power of our signatures, our work leverages solely transcriptomic data without accounting for differences occurring post-transcriptionally that may affect final protein levels (57). This concern is especially relevant when translating our results into cytometry/staining based assays that leverage protein expression of surface markers. To this date, highthroughput proteomics data is limited by technical constraints on the number of protein markers that can be simultaneously profiled on a single sample. Advances in mass-cytometry based techniques can in principle extend our ability to profile multiple markers expressed in a single-cell (58), as well as proteomics (59), but at scales substantially lower than transcriptomics-based assays. In conclusion, we present a collection of robust and parsimonious gene expression signatures that can distinguish and quantify monocyte subsets across disease affected samples and can be used to identify cytometry biomarkers. Our work provides several applications and highlights the potential for our signatures and markers to be used in clinical and translational settings.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository and accession numbers can be found in the Supplemental Material.

AUTHOR CONTRIBUTIONS
FV, EM, and PK designed research, interpreted the results, and wrote the manuscript. FV and LZ performed bioinformatic analysis. CM contributed to experimental design, drafting of and revising manuscript. CM, SH, NR-a, and SM performed experiments and analyzed data. MN and JG performed patient sample collection and processing. EM and PK supervised the study. All authors contributed to the article and approved the submitted version.