Identification of novel clusters of co-expressing cytokines in a diagnostic cytokine multiplex test

Introduction Cytokines are mediators of the immune system that are essential for the maintenance, development and resolution of immune responses. Beneficial immune responses depend on complex, interdependent networks of signaling and regulatory events in which individual cytokines influence the production and release of others. Since disruptions in these signaling networks are associated with a wide spectrum of diseases, cytokines have gained considerable interest as diagnostic, prognostic and precision therapy-relevant biomarkers. However, currently individual cytokines testing has limited value because the wider immune response context is often overlooked. The aim of this study was to identify specific cytokine signaling patterns associated with different diseases. Methods Unbiased clustering analyses were performed on a clinical cytokine multiplex test using a cohort of human plasma specimens drawn from individuals with known or suspected diseases for which cytokine profiling was considered clinically indicated by the attending physician. Results and discussion Seven clusters of co-expressing cytokines were identified, representing common patterns of immune activation. Common expression profiles of the cytokine clusters and preliminary associations of these profiles with specific diseases or disease categories were also identified. These findings increase our understanding of the immune environments underlying the clinical presentations of patients of inflammatory, autoimmune and neoplastic diseases, which could then improve diagnoses and the identification of evidence-based treatment targets.


Introduction
Cytokines are protein mediators of the immune system that have major impacts on the maintenance, activation, progression, regulation and resolution of immune responses (1). Cytokines can be broadly classified as pro-inflammatory or anti-inflammatory factors, as chemokines which regulate cell migration, or as stimulating factors that drive the proliferation, survival and differentiation of hematopoietic cells and their progenitors. These broad classifications are complicated by evidence that most cytokines are highly pleiotropic and can regulate diverse biological processes. The function of cytokines is dependent on the tissue, immune or inflammatory context and microenvironment, as factors that are pro-inflammatory in one context can be anti-inflammatory in another. For example, interleukin (IL)-4 is a major driver of inflammation in allergic diseases such as asthma (2), but has anti-inflammatory effects in autoinflammatory conditions such as rheumatoid arthritis and psoriasis (3). Additionally, cytokines can be characterized based on the type or pathways of immune response that they contribute to, although as noted above, these characterizations are variable based on their highly pleiotropic nature.
Broadly considered, immune activation can be classified as innate or adaptive. Cytokines that are induced, activated, and/or released following the recognition of molecular patterns associated with pathogens (pathogen-associated molecular patterns or PAMPs) or to tissue damage (damage-associated molecular patterns or DAMPs), such as the IL-1 family (IL-1a, IL-1b, IL-18, IL-33), IL-6, tumor necrosis factor a (TNFa), and interferons (IFN), contribute to innate immunity (4). Adaptive immunity refers to inflammation driven by lymphocytes (T and B cells). Different programs of adaptive immunity have evolved to confront different types of pathogens: type 1 immunity (driven by T helper (Th) 1 cells) targets intracellular pathogens such as viruses; type 2 immunity (Th2-mediated) targets multicellular pathogens such as helminths; and type 3 immunity (Th17-mediated) targets small extracellular pathogens. The T cell responses then orchestrate the release of inflammatory mediators and the recruitment of appropriate leukocytes to the site of inflammation. Dysregulation of type 1 and type 3 immune responses tends to result in autoimmune and autoinflammatory conditions, while dysregulation of type 2 immune responses tends to result in conditions associated with allergy, such as asthma (5). The dysregulation of these immune pathways often involves an unusual expression or release of cytokines. Individual cytokines have therefore been identified as diagnostic and therapeutic targets in a number of conditions, such as autoimmune and autoinflammatory diseases (6,7), chronic inflammatory conditions (8), cancers (9), among others. As such, there is considerable interest in using circulating cytokine levels as complimentary diagnostic, prognostic or actionable biomarkers in inflammatory and autoimmune diseases (10).
An established methodology in the quantification of cytokines is the addressable laser bead (i.e., Luminex ™ ) technology, which consists of highly sensitive bead-based multiplex assays that allow the simultaneous detection of up to 100 analytes using a low volume of biological fluid samples (1). In our laboratory (Eve Technologies Corporation, Calgary, AB), this technology platform is used to provide multiplex diagnostic tests quantifying levels of up to 71 circulating cytokines, chemokines and growth factors. However, the interpretation of a test result consisting of a large panel of analytes can be challenging, particularly when the biomarkers can have diverse and opposing biological effects in different immune contexts, and the immune responses that they regulate involve complex cascades of signaling events.
Our goal was to enhance the interpretation of our testing by reflecting the interaction of multiple analytes in the immune environment in our clinical results, which we aim to achieve by identifying patterns of commonly co-expressing cytokines by clustering analysis. Clustering analysis is an unbiased data mining technique that separates a dataset into groups based on the similarity of the individual datapoints. This type of analysis has become a common approach to identify cytokine signatures as diagnostic or prognostic markers in diseases such as COVID-19 (11), systemic lupus erythematosus (12), chronic lymphocytic leukemia (13), small cell lung cancer (14), among many others. By performing clustering analyses in a cohort of clinical specimens submitted for cytokine testing, our aim was to identify patterns of cytokine expression reflecting common signatures of immune activation in a diverse array of inflammatory, autoimmune, and neoplastic conditions which could then be used to aid clinicians in making diagnoses and in identifying potential treatment targets.

Specimen collection
Plasma-EDTA specimens were collected for cytokine, chemokine and growth factor testing as requested by 88 health care providers and referral labs in eight Canadian provinces and territories and the USA, and submitted either to our laboratory (Eve Technologies, Calgary, AB) or to an affiliated laboratory (MitogenDx, Calgary, AB) between March 2021 and October 2022. All patient identities were anonymized and only demographic information (age, sex, reason for testing/diagnosis, location) was extracted from the test requisition forms. Information on disease severity or clinical outcomes associated with the specimens was not recorded. Specimens were received on dry ice and stored at -20°C until they were processed (typically within 3-5 days). Specimens that were provided with incomplete patient demographic information (age, sex, location), that did not meet pre-analytic quality criteria (e.g., grossly hemolyzed or lipemic) and/or that demonstrated possible test interferences (e.g., human anti-mouse antibody (HAMA)-induced false positive signals (15)) were not included in the analysis.

Statistical analysis
Since a relative absence (or low levels) of any particular analyte could be meaningful with respect to patterns of cytokine expression, datapoints with fluorescence values below the range of the standard curves (for which concentration values are unable to be interpolated) were assigned a value of 0.01 pg/ml and were included in the analysis. The concentration values derived from the standard curves for each analyte were transformed into percentile ranks to bring the entire dataset within a similar range for the clustering analyses and heatmap generation.

Cytokine clustering analysis
K-means clustering was performed using R (version 4.2.2) with the base 'kmeans' function, and the resulting clusters were visualized with 2-dimensional cluster plots using the factoextra R package. The optimal number of clusters was estimated with both silhouette analysis and the elbow method (within-cluster sum of squared errors) using the factoextra package. Hierarchical clustering using Spearman rank correlation coefficients as the distance metric was performed using the base 'hclust' R function, and the resulting dendrogram was visualized using the factoextra package. Cytokine clusters were identified by partitioning the dendrogram according to the height metric provided by the factoextra package. Additionally, to assess correlations between every analyte, a correlation matrix plot of Spearman rank coefficients (calculated with the hmisc R package) was generated using the corrplot R package with only significant (p < 0.01) correlations plotted.
The approach of comparing the results of two separate clustering analyses, K-means and hierarchical clustering, was taken to identify more strongly clustering analytes, since the algorithms associated with different clustering methods can yield variable results (16). As each strategy has its own advantages and disadvantages (17), a consensus between the two different clustering methods can be taken as an indication of the validity of the clusters, which can be corroborated with an assessment of the potential physiological significance of each cluster.

Patient clustering analysis
To assess the potential clinical or pathophysiological significance of the cytokine clusters, as well as associations between patterns of cytokine signaling and demographic characteristics (age and sex), the patient data were clustered and visualized as a heatmap using the ComplexHeatMap R package (18). The analytes were grouped into the consensus clusters, the patients were grouped by age, since circulating cytokine levels have been shown to be influenced by age (19), and the different patient age groups were clustered separately with hierarchical clustering using Spearman rank correlation coefficients as the distance metric. Groups of patients with similar cytokine profiles were identified by partitioning the dendrograms based on visual inspection of the heatmaps and dendrograms. To characterize the cytokine cluster expression profiles for each of the patient clusters, the median percentile rank values of all of the analytes within each cytokine cluster were calculated. To illustrate the relative expression levels of the cytokines within each cluster, the percentile distributions were partitioned into quintiles, with median percentile rank values of 0-20% designated to be 'low'; 20-40% 'moderate low'; 40-60% 'median'; 60-80% 'moderate high', and 80-100% 'high'.
To assess differences in demographic information of each patient cluster compared to their respective total age group cohorts, the following tests were performed: The Kruskal-Wallis test, with Dunn's test to correct for multiple comparisons, was used to compare the median age values using Graphpad Prism version 9.5.0; the binomial test was used to compare the sex ratios using the base 'binom.test' R function.
Additionally, to examine the association of patterns of cytokine expression with specific conditions, a heatmap was generated using only the patients that had diagnoses or reasons for testing available, and limited to conditions associated with at least two patients. Patients associated with the same condition, or with a similar category of conditions (e.g., lymphoproliferative diseases, or eosinophilic conditions, etc.), were manually grouped together, patients within each group were clustered using hierarchical clustering, and all clustered groups were combined into a single heatmap for comparison.

Patient demographics
254 specimens drawn from unique individuals were included in the analysis, including 66 children and 188 adults. The median age of the total cohort was 39 years, with a range from 6 months to 87 years old. The pediatric cohort had a median age of 8.5 years with a range from 6 months to 17 years old, and the adult cohort had a median age of 46 years with a range of 18-87 years old. The sex ratio of the individuals was approximately equal with 128 males and 126 females, with 36 males and 30 females in the pediatric cohort, and with 92 males and 96 females in the adult cohort. Patient demographic information in this cohort is summarized in Table 1.

Cytokine clustering
Silhouette and elbow method analyses of the percentile rank data yielded an estimated optimal number of five clusters for this dataset for K-means clustering ( Figure 1A). Seven clusters were identified in the hierarchical clustering analysis by partitioning the dendrogram ( Figure 1B). Comparisons of the clusters generated by each of the clustering methods yielded seven consensus groupings of consistently co-clustering cytokines, with twelve analytes (CCL5, CCL8, CCL7, CCL13, CCL22, FLT-3L, IFNg, IL-12p40, IL-13, IL-22, TNFb, TRAIL) that were placed in separate clusters by the different clustering methods (summarized in Table 2). Clusters 1-3 and 6 contain mostly pro-inflammatory factors and hematopoietic growth factors; cluster 4 contains mostly chemokines; cluster 5 contains several major growth factors; the functional link between the analytes in cluster 7 is unclear.
Significant correlations (p < 0.01) between each analyte are visualized in a correlation plot ( Figure 2) where positive correlations are coloured blue, negative correlations are coloured red, and non-significant correlations are left white in the plot. The analytes that were grouped into different clusters by the two clustering methods were placed in the cluster associated with the highest Spearman coefficients. Each cytokine cluster (separated in the plot by dashed lines) demonstrates strong intra-cluster correlations, visualized as blue squares along the diagonal axis of the plot.

Patient clustering
Patients were grouped based on age and the different age groups were clustered separately, yielding four clusters for the pediatric (0-17 years) age group (clusters A1-A4), five clusters for the 18-39 year age group (clusters B1-B5), six clusters for the 40-59 year age group (clusters C1-C6), and three clusters for the 60+ year age group (clusters D1-D3), each consisting of patient specimens with similar cytokine expression patterns ( Figure 3). The cytokine profiles of each patient cluster, characterized by the median percentile rank values of each cytokine cluster, along with the known diagnoses/ reasons for testing associated with the individuals in each patient cluster, are summarized in Table 3 (pediatric) and Table 4 (adult). Similar conditions were grouped into the same clusters in some instances, for example there were multiple diagnoses of arthritis and fever in cluster A1, hemophagocytic lymphohistiocytosis (HLH) in clusters B1 and C3, adult-onset Still's disease (AOSD) in clusters C3 and D3, and autoimmune/autoinflammatory conditions in cluster C1, etc. However, there were also several instances in which patients with the same condition were grouped into different clusters within each age group. For example, there were patients with AOSD that were grouped into clusters B1 and B5, C1 and C3, D2 and D3, etc.
This finding could indicate the presence of multiple immune profiles associated with the same condition.
Demographic data associated with each patient cluster are summarized in Table 5. There were no significant differences in the demographic data between the age group-specific clusters and their respective total cohorts, with the exception of the 40-59 years age group, in which cluster C1 had a significantly higher proportion of females (p= 0.0086), and cluster C5 had a significantly younger median age (p= 0.027) than the total age group cohort.
Since patients with the same condition were often grouped into different patient clusters, patterns of disease-associated cytokine expression were visualized in groups of patients manually selected based on the diagnoses/reasons for testing provided on the test requisitions ( Figure 4). Certain conditions appeared to be associated with consistent cytokine expression profiles based on a visual assessment of the heatmap: Lymphoproliferative syndromes (lymphocytic leukemia, autoimmune lymphoproliferative syndrome, HLH) tended to be associated with higher concentrations of the analytes in cytokine cluster 2 and moderate to low values in clusters 3 and 5; systemic inflammatory diseases in children (multisystem inflammatory syndrome in children (MIS-C), Kawasaki disease and juvenile arthritis) were associated with higher values in each of cytokine clusters 1, 2, 3, and 6; patients with Crohn's disease (but not early-onset IDB) tended to be associated with high values in cytokine cluster 3; morphea was consistently associated with higher levels of certain chemokines in cytokine cluster 2, including CCL1, CCL8, CXCL13, and the IFN-induced chemokines CXCL9 and CXCL10. As observed in the patient clusters, other conditions were each associated with multiple cytokine expression profiles, such as AOSD, hidradenitis suppurativa, fevers, and eosinophilic diseases.

Discussion
A cohort of plasma specimens provided for clinical diagnostic testing by multiple specialists from Canada and the USA, representing individuals with a wide spectrum of confirmed or suspected inflammatory, autoimmune, neoplastic and neurological conditions, clusters of co-expressing cytokines were identified utilizing machine learning methods. Seven distinct cytokine clusters associated with different adult and pediatric disease groups suggesting there are common patterns of immune activation that are readily identifiable by cytokine multiplex array testing. Through the implementation of an evidence-based approach to cytokine testing, clinicians can utilize these biomarkers to improve diagnostic precision, thereby promoting a deeper comprehension of the pathogenic mechanisms and prognosis for these diseases, and ultimately facilitating the development of more effective treatments. There are also several analytes in cluster 3 that are prominently derived from or that act primarily upon epithelial cells, such as TSLP (27), IL-33 (28), TGFa (29), IL-28A (30), and IL-20 (31), which suggests an underlying functional basis for this grouping. Cluster 4 consists mainly of chemokines, of which CCL21, CCL27, and CXCL12 are known to be constitutively expressed homeostatic chemokines (32). Cluster 5, in which growth factors are prominently represented, likely represents a wound healing and/or platelet activation response, as these factors are generally stored in and released by platelets (33).
with IL-2 and IFNa2, which both suppress Th17 differentiation and function (20, 35). These patterns of mixed T cell cytokine release could also reflect the heterogeneity and plasticity observed in T cells in vivo, since "hybrid" cells that co-express the signature cytokines of two different T cell subtypes (e.g., IL-4 with IFNg, IFNg with IL-17A, IL-4 with IL-17A, IL-13 expressed by both Th1 and Th17 cells) may be common (36)(37)(38).

Patient clusters
The pathological significance of these cytokine clusters in this cohort is limited due to the lack of detailed clinical information associated with the specimens, as well as low sample numbers associated with any one specific disease or condition. However, some potentially informative patterns were revealed with the patient clustering analyses. The largest category of known diagnoses or reasons for testing of the patients in this cohort consisted of conditions that are often associated with cytokine storm, such as adult-onset Still's disease (AOSD) (39), hemophagocytic lymphohistiocytosis (HLH) (40), macrophage activation syndrome (MAS) (41), COVID-19 (42), multisystem inflammatory syndrome in children (MIS-C) (43), and hematopoietic stem cell transplantation (HSTC) (44), as well as Kawasaki disease, which has been shown to involve many of the same cytokines as those described in cytokine storm (45). The majority of the patients with these conditions had high values of cluster 2 analytes (many found in patient clusters A3, B2, and C3), which could indicate that this cytokine cluster includes the major mediators of cytokine storm. Higher values of the analytes in cluster 2 are also associated with lymphoproliferative syndromes (acute and chronic lymphocytic leukemia, autoimmune lymphoproliferative syndrome (ALPS)) and juvenile arthritis in this cohort.
The abundance of the analytes in cytokine cluster 3 may also provide diagnostic or prognostic value. High values in this cytokine cluster are observed in patient clusters, A1, A3, B3, B4, C1, C2, and D1, and tend to be associated with conditions such as arthritis, Kawasaki disease and Crohn's disease (but not early-onset IBD), as well as subsets of the patients that were tested for AOSD, fever, autoimmune diseases, non-specified autoinflammatory disorders and eosinophilic conditions. These analytes were clustered in an analysis of cytokines released by peripheral blood mononuclear cells (PBMCs) isolated from COVID-19 patients (11). In that study high levels of these analytes were associated with greater disease severity, coagulopathy, and mortality, so it would be interesting to assess if similar outcomes apply to the other conditions with high values of the cluster 3 analytes. A correlation matrix of Spearman rank coefficients between the percentile rank data of each analyte. Only significant correlations (p < 0.01) are plotted, non-significant correlations are left blank. Analytes that were placed into separate clusters by the different clustering methods are included in the clusters in which they had the highest associated Spearman rank coefficients, and are highlighted with red text and asterisks (*).
Cytokine cluster 5, comprised of analytes that are released by activated platelets, may provide insight into the involvement of platelets in different conditions. High values in this cytokine cluster were observed in patient clusters A2, B3, C1, and D2, and associated with conditions that often involve thrombocytosis, such as AOSD (46), Kawasaki disease (47), juvenile arthritis (48), familial Mediterranean fever (FMF) (49), COVID-19 (50), and Crohn's disease (51), whereas conditions that are generally associated with thrombocytopenia, such as HLH (52), lymphocytic leukemia (53), and hematopoietic stem cell transplantation (54) tended to have lower values. Further validation studies could help to clarify whether this cluster of cytokines has prognostic value for conditions in which abnormalities in platelet production and function are associated with disease severity.
There are some conditions that presented with different immune activation patterns (such as AOSD, fever of unknown origin, autoinflammatory disease) in this cohort. As this study only includes single samples from unique individuals, the temporal dynamics of cytokine expression related to immune activity and disease progression were not considered. The distinct cytokine profiles associated with individual diseases could be representative of different stages of disease progression, severity, exacerbation, etc. Furthermore, there was limited information provided of the treatments that were prescribed for these patients, which could have effects on the observed immune activation profiles. Explorations of the relationships between cytokine profiles and clinical presentations, disease stage/severity, treatments and treatment responses, and outcomes would therefore be an important future direction, particularly as they pertain to treatment options and whether different treatments may be more or less effective for inflammation characterized by different immune activation patterns. Other sources of the variability of the cytokine profiles observed for specific conditions could be due to the influence of patient demographic status on immune responses, and more detailed studies on these factors would also be of value. For example, in this cohort only children tested for fever, and only Distinct patterns of cytokine signaling are revealed by performing hierarchical clustering of the patients using the percentile rank data for each analyte. Individuals were grouped by age and the different age groups were clustered separately. Four clusters in the 0-17 year age group, five clusters in the 18-39 year age group, six clusters in the 40-59 year age group, and 3 clusters in the 60+ year age group were identified by partitioning the dendrograms at the height of the dashed line, and cytokine signatures are identified by their associated patterns in the heatmap. Demographic information for each patient is presented in the annotations on the right side of the heatmap.  adults tested for autoinflammatory syndromes, had high values in cytokine cluster 3. It would be interesting to assess if these agerelated differences are consistently observed in larger cohorts. Similarly, the expression of inflammatory and anti-inflammatory mediators has been found to change in post-menopausal women (55), so assessing the influence of menopausal status on these immune signatures in women may be an interesting and important factor requiring additional in-depth study. Other characteristics known to influence immune responses that could not be studied in this dataset due to a lack of patient-specific information, such as body mass index (BMI) (56), could also be a focus of future studies into the clinical significance of cytokine clusters. Additionally, although there is a range of diverse conditions represented in this cohort, there are likely other immune activation patterns that were not included in the present study cohort. Additional immune activation patterns may be revealed as further data are collected.
The main strength of this study is that the cohorts studied were representative of the samples received by a clinical diagnostic laboratory. We did not control for age, sex, or clinical presentation  for this analysis, and as such the immune activation patterns that we identified are likely common in inflammatory and autoimmune conditions, and are likely be observed in future clinical test results. Hence, further exploration and validation of these results in separate highly characterized cohorts is required. The weakness of this study design is that the information that can be associated with the samples is limited to that provided on the test requisition forms, so we are unable to make strong inferences about the clinical relevance of the clusters with respect to diagnoses or clinical outcomes. Also, as the samples were collected remotely, there were possible variables in sample collection and transport (e.g., samples left thawed for varying periods of time before freezing) that we were unable to control but which reflects real world diagnostic serology. Further controlled studies on selected patient cohorts of specimens using strict standardized operating procedures would be useful to determine any potential effects of these potential variables.
In summary, we have identified novel clusters of co-expressing cytokines and common patterns of immune activation in a cohort of patient specimens provided for diagnostic cytokine testing. These clusters and immune signatures may provide clinicians with additional information regarding the inflammatory environment underlying the clinical presentation of their patients, and therefore could be a valuable resource in supporting diagnosis and identifying treatment options. The patients were grouped according to the diagnoses or reasons for testing provided on the test requisitions, and hierarchical clustering of the patients within each group was performed. Each row represents an individual patient with the associated diagnosis or reason for testing listed on the right side of the heatmap.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.
Author contributions DP conceived of the study, conducted the literature review and wrote the manuscript drafts; PL and KB provided study design advice and biostatistics oversight; MC, KB, MJF, and MLF edited the manuscript, through to the final version. All authors read and approved the final submission.

Conflict of interest
MJF is Director of Mitogen Diagnostics Corporation and has received honoraria and consulting fees from Werfen and Aesku; MC is the Associate Director of Mitogen Diagnostic corporation and has received consulting fees from MitogenDx, Janssen, AstraZeneca, GSK, Mallinckrodt Pharmaceuticals. DP, PL and MLF are employees of Eve Technologies, a company that provides fee for services cytokine testing.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.