Finding Gene Regulatory Networks in Psoriasis: Application of a Tree-Based Machine Learning Approach

Psoriasis is a chronic inflammatory skin disorder. Although it has been studied extensively, the molecular mechanisms driving the disease remain unclear. In this study, we utilized a tree-based machine learning approach to explore the gene regulatory networks underlying psoriasis. We then validated the regulators and their networks in an independent cohort. We identified some key regulators of psoriasis, which are candidates to serve as potential drug targets and disease severity biomarkers. According to the gene regulatory network that we identified, we suggest that interferon signaling represents a key pathway of psoriatic inflammation.


INTRODUCTION
Psoriasis is one of most common chronic inflammatory disorders of the skin, with an estimated 2-3% of adults affected worldwide (1 Psoriasis is characterized by altered keratinocyte differentiation, hyperproliferative epidermis and infiltrating inflammatory cells leading to extensive inflammation (2). Several studies have been performed to understand the molecular pathways underlying psoriasis. The IL-23/IL-17-axis is the most consistent molecular signature underlying psoriasis, resulting in activation of infiltrating immune cells such as myeloid or conventional dendritic cells (mDC or cDC), neutrophils and T-cells, leading to proliferation of keratinocytes (3). However, the exact mechanisms underlying psoriasis remain unclear (4).
One possibility to reveal those underlying mechanisms is based on gene expression data. Using differential gene expression analysis, common differentially expressed genes (DEGs) in psoriasis have been annotated (5). The latter large-scale study also has created a co-expression network of lesional and non-lesional skin, and integrated knowledge about factors such as druggability, genetic association and cell line-specific expression profiles. This analysis however is lacking an inference of regulators in psoriasis. To create such a regulator-target network, different computational approaches have been used. One reliable approach is based on a random forest algorithm, which splits the gene expression data in different learning samples and then finds the networks that most accurately predict the underlying data. The recently developed computational tool RegEnrich also supplies a list of regulators and has shown high accuracy in finding underlying gene regulatory networks (6).
Here, we analyzed large-scale public data to unravel the robust gene signatures in psoriasis using differential gene expression analysis and gene regulatory network analysis of six public cohorts and one independent validation cohort. We compared our results from individuals with psoriasis with those from individuals with atopic dermatitis (AD), to distinguish between specific regulators of psoriatic skin lesions and regulators involved in general skin inflammation ( Figure 1). We found that psoriasis and AD share many DEGs and pathways, but also identified regulators, such as FOXE1, FOXM1, STAT3 and SOX7, and discovered a gene regulatory network unique for psoriasis.

Data Acquisition
Four RNA-seq datasets on psoriasis and three on atopic dermatitis were included in this study (Supplementary Table 1) (7)(8)(9)(10)(11). Among these datasets, three datasets on psoriasis (GSE67785, GSE83645, GSE63979) and one on atopic dermatitis (GSE65832) were obtained from the manually curated and publicly available database by Federico et al. (12). These datasets were already pre-processed, harmonized and prepared according to the FAIR principles (13 The additional RNA-seq datasets GSE121212 (for both psoriasis and AD cohorts) and GSE140227 were retrieved from the gene expression omnibus (GEO) database. Genes with an average count lower of 1 were excluded from all datasets.

Validation Cohort
As a validation cohort, we used an independent dataset from our previous study (14). This included i) individuals with psoriasis with a dermatologist-confirmed diagnosis of psoriasis, in whom concomitant psoriatic arthritis (PsA) was clinically excluded by a rheumatologist, and ii) individuals with psoriasis and concomitant PsA [fulfilled classification of psoriatic arthritis criteria (CASPAR)] (15). Individuals with a clinical diagnosis of ankylosing spondylitis (AS) [fulfilled assessment of spondyloarthritis international society criteria (ASAS)] (16) were included as a non-psoriatic, inflammatory disease reference group, none of whom had a history of psoriasis (Supplementary Table 2). Ethical approval was obtained from the institutional review board before the recruitment of participants. All participants signed written informed consent before participation. As part of a larger overall study design, we performed transcriptome analysis as previously published (14): Briefly, alignment workflow is performed using a two-pass method with STAR (version 020201) (17) to align reads to the reference genome GRCh38 (build 99). The mRNA quantification analysis pipeline measures gene level expression in HT-Seq (version 0.9.0) (18) for raw read count. In differential expression analysis, likelihood ratio test (LRT) was applied for pair-wise comparison between lesion and non-lesion.

Evaluation of Psoriasis Severity at the Site of Biopsy
For the validation cohort, the psoriasis area severity index (PASI) was calculated by adding up the scores for redness, scaling and thickness of head, arms, trunk and legs and multiplying by the coverage of the plaques (19).

Differential Expression Analysis
To find genes that were significantly differentially expressed between lesional and non-lesional skin samples, DEA was carried out using the two-group comparison method Wald significance test of the DESeq2 R-package comparing lesional versus non-lesional samples (version 1.34.0) (21). To make sure that all setting were the same for all public datasets, design formula was simply set as "design =~lesional" and "coef=lesional_lesional_vs_non_lesional". To identify the biological function of the DEGs, pathway enrichment analysis was carried out using the function gseGO, enrichKEGG and enrichPathway of the R package clusterProfiler (version 4.2.0) (22).

Gene Regulatory Network Analysis
The results from DEA were used to infer a regulator-target network using a random forest and gene set enrichment analysis (GSEA) approach implemented in the RegEnrich R package (version 1.4.0), with the parameters: method = "Wald_DESeq2, networkConstruction = "GRN", enrichTest = "GSEA", and default transcription factor list (6). Using RegEnrich, we analyzed the regulator-target relationships between transcription factors and the DEGs obtained from the psoriatic lesional and non-lesional comparison, allowing us to rank the regulators based on their importance in each of the studied cohorts. We selected the regulators that were in the front median rank across all datasets and which correlated with the disease severity and biomarkers of psoriasis, and then visualized the regulatory network using Cytoscape (version 3.8.0) (23). Between Lane Normalization (BLN) and Z-score transformation were used for the heatmap visualization of the gene expression of regulatory network with pheatmap R package (version 1.0.12).

Statistical Analysis
Spearman rank correlations were calculated between gene expression and clinical parameters with correlation R package (version 0.8.0). For differential expression analysis, P values were automatically adjusted with the Benjamini-Hochberg (BH) method with the function DESeq; For the correlation analysis, BH adjustment was applied with function correlation and parameter "p_adjust = 'BH'"; For the hypergeometric tests in pathway enrichment analysis, BH adjustment was performed with parameter "pAdjustMethod = 'BH'" in function gseGO, enrichKEGG and enrichPathway. For pathway enrichment analysis, adjusted P value of 0.2 was used as the cutoff for significance. For other tests, adjusted P values < 0.05 were considered statistically significant. All statistical analyses were performed using R (version 4.0.3) (http://cran.r-project.org/).

Psoriasis Shares Most of the Immune-Related Pathways With AD
We first analyzed the DEGs between lesional and non-lesional skin across all included public datasets (four datasets for psoriasis and three for AD, Supplementary Table 1). Wesix first looked for similarities between the datasets for each disease, and found a considerable number of overlapping DEGs, i.e., 3391 and 1945 consensus genes for psoriasis and AD, respectively. We then studied the overlap between these consensus genes in psoriasis and AD and found 1076 genes to be shared by these two diseases (Figure 2A, Supplementary Table 4). This finding suggests a large similarity in gene expression profile between psoriasis and AD. We used these sets of commonly expressed genes and disease-specific genes to carry out a gene ontology (GO) enrichment analysis to find their functional similarities and differences. As expected, the shared genes display some common inflammatory pathways, mainly the activation of immune system pathways ( Figure 2B). The differential expression of genes in the cell cycle pathway is unique for psoriasis, and may relate to epithelial hyperplasia and hyper-keratinization. The dominance of IL-13 pathways in AD is presented in Reactome pathway enrichment, which is consistent with the recent study (11 In summary, psoriasis has discernable auto-immunological activations, while AD has increased presence of dermal response to stimulus.

Unique Regulators Highlight Differences of Psoriasis and AD
We further explored the public datasets to identify gene regulators consistent across multiple psoriasis and AD datasets. We used a random forest algorithm (as part of the RegEnrich package) to construct the gene-regulatory networks of each individual dataset. Using RegEnrich, we identified the top 30 regulators for each disease that were consistently ranking as the top regulators across multiple datasets corresponding to each disease ( Figure 3A). We found several regulators that have previously been reported to be increased in psoriasis, such as CRABP2 (a carrier protein for retinoic acid signaling pathways) and STAT3 (a central regulator of various immune responses) (24-26) ( Figure 3B). We also identified some regulators that were decreased in the psoriatic condition. For instance, GATA3, a transcription factor for the differentiation of Th2 cells, which is typically increased in AD (27), was down-regulated in psoriatic lesions. Among the top 30 regulators, we identified 18 diseasespecific regulators for each disease; the remaining 12 regulators were shared between psoriasis and AD and may thus be involved in general inflammation of the skin ( Figure 3C).

Independent Cohort to Validate Regulators of Psoriasis
To validate the identified top regulators and to connect them to clinical parameters of psoriasis, we used one of our own recently produced datasets comprising transcriptome profiles of 40 skin samples from a cohort of individuals with psoriasis, psoriatic arthritis or ankylosing spondylitis. All the top 30 regulators were differentially expressed between lesion and non-lesion samples ( Figure 4A, Supplementary Table 4) (14). We further analyzed the correlation between the expression of these regulators and some well-known gene biomarkers of psoriasis (i.e. IL17A, IL17C, IL23A) and measures of disease severity (i.e. the PASI and PSI). Remarkably, 22 out of the 30 identified regulators correlated significantly with at least one of the biomarkers of psoriasis in our cohort, showing the robustness of our method. Nine regulators were significantly positively correlated with disease severity ( Figure 4B). Of these nine regulators, four regulators (STAT3, FOXE1,PITX1,TFD) did not overlap with the top 30 regulators of AD, we consider these four regulators as key regulators for psoriasis. Thus, we visualized the gene regulatory network for psoriasis with these key regulators and their target genes, based on the representation of regulator-target connectivity across all discovery sets and the validation set ( Figure 5A). We found that these regulators and their target genes were highly expressed in lesions, compared to non-lesions or skin from individuals with AS. The regulators and their target genes showed a coordinated expression pattern across all samples ( Figure 5B).
In summary, we validated the gene regulatory networks in our independent cohort, in terms of differential gene expression, clinical phenotype association, and regulator-target gene co-expression.

DISCUSSION
Psoriasis is one of many inflammatory skin diseases. Here, we used a machine learning approach to reveal the underlying gene regulatory networks and to identify the gene regulators responsible for the activity of several other genes. To evaluate if the regulators that we identified are specific for psoriasis or are commonly found in inflammatory skin, we repeated the gene regulatory network inference with datasets of AD. Comparing our results of both diseases, we found a large overlap in the involved pathways. Nevertheless, we were able to highlight distinct regulators for psoriasis and AD. These regulators might help further studies to understand both diseases in more detail.
Our approach firstly generated a list of common and specific DEGs of psoriasis and AD for further investigation. 3391 and 1945 consensus DEGs for psoriasis and AD, respectively. Among these consensus DEGs, 1076 DEGs are shared by these two diseases. Many of the consensus DEGs are known to be relevant to psoriasis or AD. For example, IL17 families (IL17A, IL17C, IL17D, IL17F, IL17RC, IL17RD, IL17RE) were consistently differentially expressed in all psoriatic datasets but not in AD.
By contrast, IL-6 that mainly produced by Th2 cells, is a specific DEG for AD. Interestingly, keratinocyte is involved in the pathogenetic of both psoriasis and AD, but the profile of keratins is distinct between these two diseases. KRT15, KRT24, KRT31, KRT37, KRT78 are differentially expressed in psoriasis, while KRT5 and KRT14 are specific for AD. Further psoriasis and AD share the KRT6A, KRT6B, KRT6C, KRT16, KRT17 as their common DEGs. This pattern provides us more information about the role keratins play in modulation of the immune response in skin. In the pathway analysis, we found that most of the dysregulated pathways in psoriasis being shared with AD. This finding has been reported in some comparative transcriptomic studies of psoriasis and AD (11,28 Further, we identified the gene regulators in psoriasis and AD. After the regulator ranking, we utilized our independent cohort to validate the top 30 regulators of psoriasis. Most of these regulators were associated with the expression of IL17 and IL23. Four regulators were correlated with disease severity and are therefore considered to be key regulators of psoriasis. STAT3 plays an important role in inflammatory responses, and is implicated in the pathogenesis of psoriasis. It has been reported to mediate a majority of cytokines in the development of psoriasis, including the central IL-23/IL-17 axis (24, 29, 30, p. 3; 31, 32). The discovery of the central role of STAT3 signaling in psoriasis pathogenesis has promoted the development of new drugs such as JAK inhibitors (33). FOXE1 is a member of the forkhead box family. A previous study found FOXE1 to be upregulated after IL-17 or TNF simulation on keratinocytes (34). Notably, two additional forkhead box family members (FOXM1 and FOXP3) are also ranked within the top 30 regulators in psoriasis. This might indicate the involvement of this family in the development of psoriasis. TFDP1 is a member of E2F family that control the transcriptional activity of numerous genes involved in cell cycle progression from G1 to S phase. It was identified as central node with another systems biology approach, suggesting it might play a role in the keratinocyte hyperproliferation in psoriasis (35). PITX1 is a member of the RIEG/PITX homeobox family which is involved in organ development and is yet not associated with psoriasis.
We noticed that the four identified key regulators shared most of their target genes with each other. Some of these target genes have been reported as psoriasis-associated genes. For instance, PRKCQ and EPHB2 are reported to be overexpressed in psoriatic skin (36,37). Interestingly, interferon-induced gene families (IFIT3 and IFI16) are involved in this gene regulatory network, which suggests that inflammation is affected by these regulators.
APOL6 is indicated to be mediated by interferon-g in atherosclerotic cells (38). SHCBP1 is one of the host proteins interacting with the C protein which counteracts the host interferon response (39). These target genes are all involved in the interferon signaling pathway (40). This is of particular importance since interferon signaling represents a key feature of psoriatic inflammation. Previous studies demonstrated the interferon responses, especially IFITM1, IFITM2, and CCL5 were increased of in skin of patients with AD (41,42). In our study, we found a prioritized gene regulatory network of interferon signaling in psoriasis but not in AD, suggesting the interferon-mediated priming can contribute to the responses to different inflammatory factors in different conditions. In conclusion, the combination of differential gene expression, tree-based gene regulatory network inference and enrichment analysis on large-scale datasets reveals a set of key regulators in both psoriasis and AD. The identified genes, pathways, and regulators are consistent with the current knowledge about psoriasis and AD, reflecting the credibility of our applied methods. Moreover, we identified some key regulators of psoriasis, which are candidates to serve as disease severity biomarkers and potential drug targets.
However, our study only includes a relatively small number of public datasets. With increasing access to publicly available psoriasis and AD datasets, our approach serves a robust way to integrate large amount of data. Also, we found that the regulators ranked after top 30 were less associated with IL23/IL17 and disease severity (Supplementary Figure 3), so that we only included top 30 regulators for the validation and network visualization, which do not cover all potential regulators of interest. Furthermore, we could validate the network with our psoriatic cohort, but may not have been able to validate our findings with inhouse AD cohort. To assess the clinical relevance of these identified regulators, further analytical and laboratory studies will be necessary.

DATA AVAILABILITY STATEMENT
The skin transcriptome data for validation is available in the GEO database (GSE186063).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Institutional Review Board of University Medical Centre Utrecht. The patients/participants provided their written informed consent to participate in this study.

ACKNOWLEDGMENTS
We appreciated the bio-sample donations of participants of validation cohort in the study. We would like to express our gratitude to the contributors of GEO transcriptome datasets (GSE67785, GSE83645, GSE63979, GSE65832, GSE140227 and GSE121212).