Single-cell RNA-seq public data reveal the gene regulatory network landscape of respiratory epithelial and peripheral immune cells in COVID-19 patients

Introduction Infection with SARS-CoV-2 leads to coronavirus disease 2019 (COVID-19), which can result in acute respiratory distress syndrome and multiple organ failure. However, its comprehensive influence on pathological immune responses in the respiratory epithelium and peripheral immune cells is not yet fully understood. Methods In this study, we analyzed multiple public scRNA-seq datasets of nasopharyngeal swabs and peripheral blood to investigate the gene regulatory networks (GRNs) of healthy individuals and COVID-19 patients with mild/moderate and severe disease, respectively. Cell-cell communication networks among cell types were also inferred. Finally, validations were conducted using bulk RNA-seq and proteome data. Results Similar and dissimilar regulons were identified within or between epithelial and immune cells during COVID-19 severity progression. The relative transcription factors (TFs) and their targets were used to construct GRNs among different infection sites and conditions. Between respiratory epithelial and peripheral immune cells, different TFs tended to be used to regulate the activity of a cell between healthy individuals and COVID-19 patients, although they had some TFs in common. For example, XBP1, FOS, STAT1, and STAT2 were activated in both the epithelial and immune cells of virus-infected individuals. In contrast, severe COVID-19 cases exhibited activation of CEBPD in peripheral immune cells, while CEBPB was exclusively activated in respiratory epithelial cells. Moreover, in patients with severe COVID-19, although some inflammatory genes, such as S100A8/A9, were found to be upregulated in both respiratory epithelial and peripheral immune cells, their relative regulators can differ in terms of cell types. The cell-cell communication analysis suggested that epidermal growth factor receptor signaling among epithelia contributes to mild/moderate disease, and chemokine signaling among immune cells contributes to severe disease. Conclusion This study identified cell type- and condition-specific regulons in a wide range of cell types from the initial infection site to the peripheral blood, and clarified the diverse mechanisms of maladaptive responses to SARS-CoV-2 infection.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection causes a contagious disease known as coronavirus disease 2019 (COVID-19), which spread quickly across the globe, resulting in the COVID-19 pandemic.According to the World Health Organization (WHO), there had been more than 761 million cases worldwide (including over 6.8 million deaths) by the beginning of March 2023.Although the vast majority of infected individuals present asymptomatic, moderate, or mild symptoms, a proportion of cases require hospitalization and intensive care, or even progress to death (1)(2)(3)(4).SARS-CoV-2 enters epithelial cells, assembles its structures and nucleocapsids, is released, and subsequently stimulates immune cells (such as macrophages and dendritic cells) by inducing inflammatory factors.Finally, its antigen is presented via histocompatibility complexes I and II (MHC I and II) to activate humoral and cellular immunities, which are mediated by B and T cells, to induce the production of cytokines and antibodies (5)(6)(7)(8)(9)(10).The severity of inflammation can lead to cytokine storms in some COVID-19 patients (11)(12)(13)(14).
Multiple studies have been conducted to date to investigate alterations associated with immune responses, with the aim of providing deeper insights into the roles of the nasal, upper, and lower airway tissues and peripheral blood (21,(29)(30)(31)(32).A large-scale single-cell transcriptome atlas of the lungs and peripheral blood of COVID-19 patients has also been compiled (33).However, a detailed analysis of the gene regulatory changes in both respiratory epithelial and peripheral immune cells during progression to severe COVID-19 is required to completely understand aberrant and protective immune responses to SARS-CoV-2 infection.
Therefore, in this study, we analyzed single-cell RNA sequencing data from nasopharyngeal swabs and peripheral blood mononuclear cells (PBMCs) to capture the immune response at the site of infection (epithelial cells) and of the peripheral immune system.We found that epithelial cells (e.g., squamous and goblet cells) and immune cells (e.g., CD14 and CD16 monocyte cells) exhibit substantial phenotypic differences after SARS-CoV-2 infection.The transcription factor regulatory network construction underlies heterogeneous immune responses during progression to severe COVID-19 among different cell types from various infection sites.Furthermore, we demonstrated the important role of some inflammatory genes (such as S100A8 and S100A9) in the pathogenesis of COVID-19 and found that regulators of these critical genes can be unique to cell types and conditions.A cell-cell communication analysis suggested that epidermal growth factor receptor (EGFR) signaling in epithelial cells may contribute to mild/moderate COVID-19.Collectively, our work reveals and clarifies the mechanisms involved in maladaptive responses to SARS-CoV-2 infection and provides a rich resource for predicting, preventing, and treating SARS-CoV-2 infection in respiratory epithelial cells and peripheral immune cells.

Data collection
Single-cell RNA sequencing (scRNA-seq) data from nasopharyngeal swabs and PBMCs were collected (21,31).The nasal scRNA-seq data were publicly available for exploration and download via the single-cell portal (https://singlecell.broadinstitute.org/single_cell/study/SCP1289/), and the PBMCs data were available for viewing and downloading from the COVID-19 Cell Atlas (https:// www.covid19cellatlas.org/#wilk20)hosted by the Wellcome Sanger Institute.Biological samples of nasopharyngeal swabs were collected from the University of Mississippi Medical Center between April and September 2020.Eligible participants for blood samples were recruited into the Stanford University ICU Biobank study between March 2020 and April 2020.With respect to the nasal epithelial data, eight individuals were removed from our study based on the following criteria: (1) healthy individuals with a recent history of COVID-19 and (2) individuals who needed intensive care units but without a recent history of COVID-19.In addition, because of the small numbers of cells collected from mast cells (6 cells), plasmacytoid DCs (11 cells), and enteroendocrine cells (1), these cell types were excluded from our analyses.A total of 15 healthy participants and 35 patients diagnosed with COVID-19 were ultimately studied.According to the COVID-19 severity stratification of the World Health Organization (WHO) guidelines, these 35 patients were further divided into two groups: those with mild/moderate disease (14 patients) and those with severe disease (21 patients).WHO scoring system for healthy, mild/ moderate, and severe cells were represented by Control_WHO_0, COVID19_WHO_1-5, and COVID19_WHO_6-8.With respect to the PBMCs scRNA-seq data, six healthy and seven severely ill individuals were studied.Processed count matrices with embeddings were used only for the PBMCs.The cell identity labels, annotated by the authors in the original papers, were transferred to the scRNA-seq datasets in this study.All sample characteristics were given in Supplementary Table 1.

Single-cell RNA sequencing data processing
The Seurat package (version 4.0.4)(34) implemented in R (version 4.1.0)was used to explore the single-cell transcriptome data.The count matrices were normalized using Seurat NormalizeData.Specifically, the log-normalized method was used to normalize the total feature expressions per cell, multiply them by a scaling factor (10,000 by default), and further log-transform the results.Highly variable genes were then identified using the FindVariableFeatures function (3,000 top variable features were set).The percentage of mitochondrial genes was regressed out using the ScaleData function.The scaled data were passed to run a principal component analysis (PCA) dimensionality reduction algorithm.The FindNeighbors and FindClusters functions were then employed to cluster the cells, and a graph-based clustering algorithm that calculates the k-nearest neighbors and constructs a shared nearest neighbor graph, was applied to identify cell clusters.Nonlinear dimensionality reduction (RunUMAP function) and Uniform Manifold Approximation and Projection (UMAP) were then conducted to visualize the clustering results in two dimensions.To identify differentially expressed genes (DEGs) when comparing any two given groups, the FindMarkers function in Seurat was applied with the following configurations: test.use= "wilcox" (a Wilcoxon Rank Sum test), min.pct= 0.25, logfc.threshold= 0.25, and only.pos= FALSE.An additional adjusted P-value threshold of ≤ 0.05 was used for filtering DEGs.
Additionally, an additional scRNA-seq with accession number GSE158055 (33) was integrated with the aforementioned scRNAseq PBMCs dataset.The GSE158055 dataset included immune cells from PBMCs.Cell identities annotated by the authors in the original paper were used.Common cell types between GSE158055 and existing scRNA-seq PBMCs were selected for integration, which were seven immune cell types (CD14 Monocyte, CD16 Monocyte, B, CD4 T, CD8 T, NK, and DC).For integration, the Seurat integration workflow was performed to remove batch effects.The FindIntegrationAnchors function with reciprocal PCA was used to discover anchors in large datasets.These anchors were then passed to the IntegrateData function.Finally, downstream visualizations were conducted following the RunPCA and RunUMAP functions.

Gene regulatory network analysis and visualization
To explore the regulatory landscape across cell types between healthy and COVID-19 patients, the SCENIC (single-cell regulatory Network Inference and Clustering, version 1.2.4) (35) tool was used.SCENIC is a set of tools that can infer transcription factors (TFs) and construct gene regulatory networks (GRNs) from scRNA-seq data.The required human RcisTarget database was downloaded from https://resources.aertslab.org/cistarget/.GENIE3 and RcisTarget in SCENIC were used to identify potential direct binding targets (called regulons) based on co-expression modules and a TF motif analysis.Here, the regulon represented one TF and its targets.Utilizing the AUCell algorithm, the activity of regulons in each individual cell were analyzed and evaluated by calculating the area under the recovery curve (AUC) score.To identify specific regulators of cell type-specifics and conditions (healthy, mild/ moderate, and severe COVID-19), we calculated the average regulon activity by cell type in each condition and merged them to create an AUC score heatmap via the pheatmap package in R. The AUC score matrix of all regulons in each cell was submitted to the Seurat object to project the AUC (as well as TF expression) onto UMAP plots.In addition, in terms of each condition, the targets of the identified TFs in each cell type were filtered using the corresponding DEGs.Finally, GRNs for each cell type and condition, which comprised the observed TFs and their differentially expressed target genes, were constructed and displayed using Cytoscape software (version 3.9.1)(36).

Function and pathway enrichment analysis
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of gene sets of interest were performed using the clusterProfiler (version 4.0.5)(37) package in R. The GO enrichment analysis was conducted based on biological processes, and GO annotation data were provided by AnnotationHub.KEGG annotation data were available in the KEGG database (https://www.genome.jp/kegg/).An adjusted Pvalue ≤ 0.05 was considered significantly enriched.

Cell-cell communication analysis
The CellChat (version 1.5.0)(38) package in R was used to infer and analyze cell-cell communications (CCCs) among cell types."Secreted Signaling" was set to explore intercellular communication networks, and the communication probability was then computed to infer the cellular communication network.The aggregated CCC network was calculated by counting the number of links or summarizing the communication probability.Collectively, based on gene expressions and prior knowledge of the interactions, all significant communications (ligand-receptor interactions) associated with signaling pathways from one cell type to other cell types were determined.

Whole blood bulk transcriptomic and proteomic data analysis
Pre-processed whole-blood bulk transcriptomic data are publicly available for download at GEO (accession number GSE157103) (39).Transcript counts were normalized using the transcript per million (TPM) method.Samples were selected from a total of 126 samples according to the following criteria: (i) select COVID-19 infection samples and (ii) samples were removed if the sequential organ failure assessment (SOFA) scores were unknown.A final total of 56 samples were used in this study.The selected samples were then grouped based on the SOFA score and the Pearson's correlations calculated between the TPM and SOFA scores.The lung proteome data are available at the iProX database with accession number IPX0002393000 (40).The plasma proteome data (accession number GSE207015) are available at GEO (41).Sample characteristics of bulk transcriptome and proteome datasets were given in Supplementary Table 1.

Single-cell characterization of nasopharyngeal swabs and PBMCs
To better understand and compare the host response to SARS-CoV-2 infection at the initial infection site and peripheral immune cells, we obtained single-cell RNA sequencing (scRNA-seq) data from nasopharyngeal swabs (21) and peripheral blood mononuclear cells (PBMCs) (31) under three different conditions: healthy individuals and COVID-19 patients with mild/moderate and severe disease.Metadata, such as cell type annotation and embedding of PBMCs data, were mainly obtained from original studies.As mentioned in the Materials and Methods section, certain cells associated with nasal scRNA-seq data were excluded from this study based on the criteria described therein.Data from nasopharyngeal swabs and PBMCs were then processed using the same protocols, including those relating to data normalization, dimensionality reduction, and cell clustering, and the results were visualized on the UMAP plot.A total of 26,894 cells from the nasal mucosa and 44,721 cells from PBMCs were analyzed, comprising 15 and 13 cell types, respectively (Figure 1A; Supplementary Figure 1, Supplementary Table 2).Of the cell types, SARS-CoV-2 induced CD14 monocyte expansion and NK cell loss, while the B and T cell abundances remained similar between healthy and COVID-19 patients.In addition, ciliated and goblet cells from nasal epithelial cells and dendritic cells (DCs) from PBMCs exhibited the highest number of expressed genes.

Similarity and dissimilarity of regulons and pathways were identified in respiratory epithelial and peripheral immune cell types associated with COVID-19
To investigate the gene regulatory network (GRN) changes underlying COVID-19 manifestations, we conducted single-cell regulatory network inference and clustering (SCENIC) analyses (35).Regulons, including transcription factors (TFs) and their direct target genes, were detected in each cell type.We then calculated the average area under the recovery curve (AUC) scores per cell type to estimate regulon activities.Using a SCENIC analysis, we identified potential TFs in terms of the cell type and the three conditions (healthy individuals and COVID-19 infections with mild/ moderate and severe disease).By comparing nasopharyngeal swabs and PBMCs, we found that there were notable differences between the many identified regulons among conditions or infection sites (i.e., nasal or peripheral blood), while some were found to be shared, and XBP1, FOS, STAT1, and STAT2 were activated in both the epithelial and peripheral immune cells of virus-infected individuals (Figure 1B).In addition, when comparing the detected regulons across cell types, we found that some cell types were distinguished by different regulon combinations among the three conditions, but some were not (Supplementary Figure 2A).Specifically, epithelial cells, such as ciliated cells and mitotic basal cells, unexpectedly shared common TFs, whereas distinct TFs were identified in other epithelia, such as basal cells, squamous cells, and goblet cells.For example, RFX2 and RFX3 showed high activity in ciliated cells regardless of disease severity, while XBP1, NR2F6, SPDEF, and ELF3 were preferentially activated in goblet cells in patients with severe COVID-19, and KLF5 and STAT2 were coactivated in patients with mild/moderate COVID-19.Furthermore, NR2F6, SPDEF, and ELF3 were found to be activated in squamous cells in patients with severe COVID-19, and this activity was also shared with goblet cells.
It is of note that we mainly focused on epithelial cells from nasal scRNA-seq data due to their very small number of immune cells, but we then used PBMCs scRNA-seq data to analyze immune cells.The PBMCs data showed that most regulons were unique to cell types or patient condition (healthy and severe).For example, CEBPD and FOS were highly activated in both CD14 and CD16 monocyte cells in severe COVID-19 disease, whereas BACH1 exhibited particularly high activation in CD16 monocyte cells (Supplementary Figure 3A).
For each cell type, we further analyzed the differentially expressed genes (DEGs) between COVID-19 patients and healthy individuals, and we then performed GO (biological process) and KEGG enrichment analyses using these DEGs (Figures 1C, D; Supplementary Figures 2B, 3B, Supplementary Tables 3-8).In the case of nasal epithelial cells, many genes were upregulated in the ciliated cells of patients with mild, moderate, and severe COVID-19, and they showed enrichment in some functions and pathways, such as COVID-19, oxidative phosphorylation, protein targeting, and viral gene expression (Supplementary Tables 6, 7).Surprisingly, we found that squamous cells showed the highest number of downregulated genes (200 genes) during mild/moderate COVID-19, and goblet cells displayed the highest number of upregulated genes (1,033 genes) during severe COVID-19 (Figure 1C).The 200 deregulated genes were associated with regulating translation, the cellular amide metabolic process, and RNA splicing, while the 1,033 upregulated genes were associated with the ATP metabolic process, interleukin-1-mediated signaling pathway, Wnt signaling pathway, planar cell polarity pathway, response to decreased oxygen levels, and the viral life cycle (Figure 1C; Supplementary Figure 2B).Likewise, we found that compared to the number of upregulated genes, a larger number of genes tended to be downregulated in most cell types from PBMCs, and the highest differences between healthy and severe COVID-19 patients were noted in CD14 monocyte cells (Figure 1C).During severe COVID-19, upregulated genes in both CD14 and CD16 monocyte cells were associated with type I interferon signaling, response to the virus, positive regulation of cytokine production, and toll-like receptor (TLR) signaling pathways (Figure 1D).TLRs have been reported to play an important role in responses to certain infections, and their changes may lead to cytokine storms (14, 42).We also found that I-kappaB kinase/NF-kB signaling was enriched.A recent study suggested that NF-kB might be associated with a poor proinflammatory cytokine production mechanism in the monocytes of patients severe COVID-19 (32).Other findings based on the GO and KEGG analyses among cell types are given in Supplementary Table 8.These results may provide an important reference for understanding the mechanisms of cytokine storms in different cell types.By comparing respiratory epithelial and peripheral immune cells, the SCENIC analysis identified similar and dissimilar TFs between the conditions.Furthermore, different combinations of these TFs were common or unique to certain cell types under different conditions.These distinct cell type-or condition-specific TFs potentially contribute to transcriptional regulation among cell types during disease progression.The studies on DEGs indicated different response mechanisms to SARS-CoV-2 occur at different infection sites (based on nasal and peripheral blood data).Specifically, DEGs of certain cell types were enriched in pathways such as the regulation of cytokine production, response to decreased oxygen levels, and viral response.

The construction of a gene regulatory network from nasopharyngeal swabs and PBMCs evidences immune responses to SARS-CoV-2 infection in different cell types
With our identified cell type-and condition-specific regulons, further studies on TFs and their direct targets were conducted with the aim of exploring the detailed mechanisms of immune responses at different infection sites.Our findings suggested that goblet and squamous cells among epithelial cells, as well as CD14 and CD16 monocyte cells among immune cells, exhibited considerable differences in not only DEGs but also regulons.We constructed and visualized GRNs (i.e., regulons) per cell type using the Cytoscape tool (36).The target genes of each TF were further filtered using the corresponding DEGs in relation to cell types and conditions.We aimed to construct GRNs for all cell types and conditions; however, some cases failed because there were no remaining target genes of certain TFs after DEG filtering (i.e., the targets were not differentially expressed and thus the regulons were removed), or all of the regulons of these cell types or conditions showed very low activation.Note that we constructed the GRNs with differentially expressed targets (min.pct= 0.25, logfc.threshold= 0.25), while potentially existing regulons without having target genes to be differentially expressed were not examined in this study.As a result, the GRNs of goblet, squamous, CD14, and CD16 monocyte cells, were constructed (Figure 2).In goblet cells, STAT2 and KLF5 were highly activated and upregulated in many genes with mild/moderate COVID-19, such as ISGs (PARP14 and IFI44L), whereas ELF3, SBP1, NR2F6, and SPDEF were preferentially activated in severe COVID-19 to regulate the expression of their targets (Figure 2A).We found that related TFs were expressed, and regulons showed high AUC scores in the corresponding cell types (Figure 2B; Supplementary Figure 4).Furthermore, their target genes, including cytokines, interferonstimulated genes (ISGs), and S100/Calbindin genes, were significantly upregulated in severe COVID-19 patients (Figure 2C).For squamous cells, there were no activated regulons in SPDEF and ELF3 in mild/moderate COVID-19, but they were activated in severe COVID-19 (Figure 2A).In summary, SPDEF and ELF3 were shared between goblet and squamous cells in severe COVID-19, and S100A9 was co-upregulated.The regulators of S100A8 were not identified, although they were significantly upregulated in the two cell types.However, MAFF and GRHL1 downregulated S100A8 and S100A9, respectively, in healthy squamous cells (Supplementary Figure 5).In addition, ELF3 upregulated the expression of S100A11 in goblet cells (Figure 2).Similarly, for CD14 and CD16 monocyte cells in severe COVID-19, CEBPD and FOS were particularly activated in the two cell types, and CEBPD upregulated S100A8 and S100A9 (Figure 2).We also identified the substantial expression of SELL (an ISG) in most CD14 monocyte cells in severe COVID-19.BACH1 was uniquely regulated in CD16 monocytes.Unexpectedly, HLA-DRA, a major histocompatibility complex II (MHC-II) molecule, was considerably downregulated in CD14 monocyte cells, but upregulated in CD16 monocyte cells.MHC I molecules are considered to contribute to the SARS-CoV infection response (42).A recent study demonstrated that epithelial cells with SARS-CoV-2 RNA+ express only MHC-I and poorly express MHC-II family genes (21).However, a previous study discovered that the MHC class II transactivators, CIITA and CD74, can defend against many viruses, such as SARS-like coronaviruses; therefore, upregulation of MHC-II family genes may block the entry of viruses (43).
We also constructed GRNs in other respiratory epithelial and peripheral immune cells, such as ciliated, B, T, and NK cells (Supplementary Figures 5, 6).Notably, although certain TFs were shared among cell types and conditions, their target genes and regulations differed considerably.For example, RFX2 and RFX3 were activated in ciliated cells under all three conditions, but the majority of targets were downregulated in mild/moderate COVID-19 patients compared to healthy cells (Supplementary Figure 5).Furthermore, RFX3 upregulated most of its target genes in patients with severe COVID-19.For the NK cells, many ISGs were upregulated by STAT1 in severe COVID-19, such as EIF2AK2, PARP14, ISG15, PSMB9, MX1, SP110, DDX60, SAMD9L, ADAR, IFI44L, IFIT3, EPSTI1, SAMD9 (Supplementary Figure 6).In patients with severe disease, we also found that TCF4 was activated in B cells, whereas RUNX3, IKZF1, and EOMES were activated in CD8 cells (Supplementary Figure 6).
By analyzing different cell types from nasal or peripheral blood during progression to severe COVID-19, our findings demonstrated the existence of diverse GRNs.Intriguingly, we found that S100A8 and S100A9 were considerably upregulated by different TFs in a wide range of respiratory epithelial and peripheral immune cells in patients with severe COVID-19 (Supplementary Figure 4), which suggests that their upregulation tends to be independent of certain cell types and virusinfection sites but that different regulators can be used among cell types.Specifically, the systemic upregulation of S100A8 and S100A9 mainly occurred in goblet, squamous, B, CD14/CD16 monocytes, granulocytes, PB, and DC.However, their regulators can differ in terms of cell type at different infection sites.S100A8 and S100A9 have been reported to be markers of severe COVID-19 (18) and contribute to the recruitment of immune cells and cytokine storms in megakaryocytes and monocytes (21,33,44).

Robust DEGs were identified in T cells
To identify robustly expressed genes during the immune response against SARS-CoV-2 infection, we observed the overlap of DEGs in T cells from nasal and PBMCs scRNA-seq data.
Compared to one type of T cell in the nasal data, three types of T cells were annotated in the PBMCs data: CD4, CD8, and gd T cells.We extracted corresponding DEGs when comparing healthy individuals and severe patients to identify overlapping DEGs by comparing one T cell type from nasal data to three T cell types from PBMCs data, and then visualized their expressions in the two datasets (Figure 3).The majority of overlapping genes were robustly downregulated in patients with severe disease, while prothymosin alpha (PTMA) was the only one gene consistently upregulated.Interestingly, PTMA, the proprotein of thymosin alpha-1 (Ta1), has been reported to show increased expression in CD8 T memory cells in severe disease and slightly reduced activation of T cells in vitro (45), and the authors indicated that lymphopenia in COVID-19 patients could be relieved by Ta1 treatment.Among the overlapping downregulated genes, genetic defects in TCM6/8 may lead to lower intrinsic immunity to human b-papillomaviruses (b-HPVs) in epidermodysplasia verruciformis patients (46).SUN2 (Sad1 and UNC84 domain containing 2) is associated with mitosis, maintains a repressive chromatin state, and inhibits HIV-1 infection via association with Lamin A/C (47, 48).Despite previous studies showing quercetin and resveratrol, inhibitors of thioredoxin-interacting protein (TXNIP), as potential therapies for COVID-19 (49,50), our observation of robust TXNIP downregulation in severe patients suggested that these inhibitors may exhibit reduced performance in these treatments to a certain degree.

Alterations in enriched pathways and cell-cell communications between mild/moderate and severe COVID-19 were identified
As nasal epithelial cells mute the antiviral response in severe COVID-19 compared to mild/moderate patients, and this early failure may underlie and predict severe COVID-19 (21), we analyzed and compared these cells between the two patients groups.There was a greater upregulation in squamous and goblet cells in patients with severe disease than in those with mild/ moderate disease (Figure 4A; Supplementary Table 9).Upregulated genes in squamous cells were associated with pathways such as viral entry into the host cell, epidermis development, protein localization, and apoptotic signaling, while upregulated genes in goblet cells were related to protein targeting, response to hypoxia, and viral gene expression (Figure 4A).Although only a few downregulated genes were identified in squamous (19 genes) and goblet (50 genes) cells, they were still enriched in certain pathways.For example, CD74, TNFAIP3, and S100A4 in squamous cells were associated with I-kappaB kinase/ NF-kappaB signaling and interleukin-6 production, while MX1, IFIT1, SP100, and XAF1 in goblet cells were associated with the type I interferon signaling pathway (Supplementary Figure 7).
To investigate the ligand-receptor (L-R) interactions among epithelial cells, we performed a cell-cell communication (CCC) analysis using CellChat (38).Compared to patients with mild/   C).EGFR (also known as ErbB1) belongs to a family of receptor tyrosine kinases (ErbB), and ErbB contains four receptors: ErbB1, ErbB2, ErbB3, and ErbB4 (51).In this study, three of the receptors (not ErbB3) were identified in mild/moderate patients.The ligand of these receptors is betacellulin (BTC), and a previous study indicated that it might be useful in preventing an excessive fibrotic response to viral infections (such as SARS-CoV) by inhibiting EGFR signaling (52).Similarly, we observed an absence of EGFR signaling in healthy cells (Supplementary Figure 8A), and we therefore consider that EGFR inhibitors could be used as a potential treatment for mild/moderate COVID-19.In contrast, L-R interactions associated with the midkine (MDK) signaling pathway were observed in ciliated, secretory, and other cells in healthy, mild/ moderate, and severe COVID-19 patients (Figures 4B, C; Supplementary Figure 8A).Moreover, we visualized the enriched signaling pathways among epithelial cells in networks separately and compared them between mild/moderate and severe COVID-19 (Supplementary Figure 9).Some of the representative signaling pathways were found to be shared between the two conditions, but some were not.The common enriched pathways included MDK, VISFATIN, GRN, and GALECTIN signaling pathways.ANNEXIN and PERIOSTIN signaling pathways were specific to severe COVID-19.In contrast, EGF, GDF, and SEMA3 signaling pathways were unique to mild/moderate COVID-19.
3.6 Whole blood bulk transcriptomic data analysis showed genes correlated with the sequential organ failure assessment score To test whether our identified genes could contribute to immune responses during the progression of COVID-19 severity, we downloaded public whole blood bulk transcriptomic data of COVID-19 patients for validation (39) and analyzed alterations in gene expression levels in terms of the sequential organ failure assessment (SOFA) score.The transcripts per million (TPM) method was used to normalize the transcript counts.We then calculated and ranked the Pearson correlations between every single gene TPM value and the SOFA scores in COVID-19 patients (Supplementary Table 10).We found that 26 genes were highly positively correlated with SOFA scores (Pearson's r > 0.50) (Figure 5A), indicating that gene expression levels increased with the severity of clinical organ failure in critically ill patients.Intriguingly, among the top 26 correlated genes, two were from the S100 gene family: S100A8 and S100A12.According to our scRNA-seq data findings in this study, these two genes were differentially expressed in patients with severe COVID-19 (Figure 2C).We further calculated the average TPM value for each SOFA score, including several S100 and other genes, as shown in Figure 5A (Figure 5B).The S100A8/A12 expression increased with an increasing SOFA severity score.The expression of S100A9 showed a certain correlation with the SOFA score (r = 0.22), but S100A11 was negatively correlated.Notably, we found that matrix metalloproteinase family members (MMPs), MMP8 and MMP27, were highly correlated.It has been reported that MMPs contribute to the COVID-19 severity (53-55).Taken together, our findings indicate the critical roles of S100 family members and other genes (e.g., MMPs) in the progression of COVID-19.

Discussion
SARS-CoV-2 infection causes immune response alterations during the progression of COVID-19 severity, such as the enhanced expression of pro-inflammatory cytokines (i.e., cytokine storm) and considerably different expressions of genes associated with ISGs, MHC I, and II families (11-13, 21, 24, 43, 56).However, the GRN changes in terms of a wide range of cell types and COVID-19 severity (healthy, mild/moderate, and severe) require further description.In this study, therefore, we conducted a SCENIC analysis to capture similar and dissimilar regulons, and we then finely constructed a GRN landscape for both healthy individuals and COVID-19 patients across a wide range of cell types by comparing scRNA-seq data related to the various infection sites (respiratory epithelial and peripheral immune cells).Further analyses of the target genes of regulators and cell-cell communication revealed detailed intracellular and extracellular immune responses.Lastly, the comparison of findings using different data sources and the validation of certain key findings using additional scRNA-seq, bulk RNA-seq, and proteome data provided greater robustness to our results.
We identified cell type-and condition-specific activated regulons (including TFs and their targets) in a wide range of cell types, and we demonstrated that some cells, such as goblet, squamous, and monocyte cells, display a strong response against SARS-CoV-2 infection.For example, compared to activated STAT2 and KLF5 in mild/moderate goblet cells, SPDEF, ELF3, XBP1, and NR2F6 were found activated in patients with severe COVID-19.Similarly, SPDEF and ELF3 were found to be activated in squamous cells, and CEBPD and FOS were shared between CD14 and CD14 monocytes in severe COVID-19.We then constructed and c o m p a r e d G R N s .M o s t i m p o r t a n t l y , b y c o m p a r i n g nasopharyngeal swab and PBMCs data, we discovered the regulation of S100A8 and S100A9 expression, which could help clarify maladaptive responses to SARS-CoV-2 infection in a large range of cell types.Specifically, we found that SPDEF and ELF3 coupregulated S100A9 in goblet and squamous cells in severe patients, whereas CEBPD upregulated S100A8 and S100A9 in monocyte cells.In contrast, S100A8 and S100A9 were downregulated by MAFF and GRHL1 in healthy squamous cells.We also observed that during severe COVID-19, S100A8 and S100A9 were considerably upregulated in multiple other cell types, such as B cells, granulocytes, PB, and DC.The expression level of S100A8 was further found to have a high positive correlation with the SOFA score (Figure 5).A previous study reported the same regulation in T, B, NK, and DC cells (33).However, compared to healthy individuals, we did not observe upregulation of S100A8 and S100A9 in CD4, CD8, and gd T cells in severe patients.In addition, we identified regulators of many cytokines, ISGs, or MHC II family genes in severe patients, such as gene regulations of goblet (CLCL17, EIF2AK2, UBE2L6), monocyte (HLA-DRA, SELL, TRIM25, UBE2L6), CD8 T (SAMD9), NK (EIF2AK2, SAMD9, DDX60, PARP14, MX1, PARP9, ADAR, TXNIP, ISG15, IFIT3, EPSTI1, SP110, SAMD9L, IFI44L, TAP1), granulocyte (RSAD2, MX1, IFI44, EPSTI1, EIF2AK2, IFIT3, IFI44L, ISG15), and DC cells (MX1, RSAD2, IFI44L, IFI44, ISG15, IFIT3) (Figure 2; Supplementary Figures 5, 6).Notably, although ciliated cells play critical roles in viral entry, we did not detect distinct regulators in infected patients.Regulators RFX2 and RFX3 were shared between healthy individuals and COVID-19 patients, but the two regulators showed different target genes or target expression levels in the two sample types (Supplementary Figure 5).Further CCC analysis demonstrated that compared to severe patients, EGFR signaling has an important role when ciliated cells interact with other cells in mild/moderate patients.
We also utilized PBMCs data to explore cell-cell interactions among immune cells.The chemokine (C-C motif) signaling pathway was found to be unique to severe patients compared to healthy individuals (Supplementary Figures 8B, C).This pathway mainly contributes to pathways from CD8 T and NK cells to CD14 monocytes via the CCL5 and CCR1 ligand-receptor pairs.A recent study demonstrated that CCL5 contributes to the recruitment of inflammatory cells (mainly T cells and macrophages), and blockading CCR5 signaling using leronlimab (a monoclonal antibody) has been conducted to treat COVID-19-associated cytokine storms (57).
Furthermore, to validate our findings, such as the identified TFs and their differently expressed targets per cell type, we integrated the scRNA-seq PBMCs with additional scRNA-seq data (GSE158055) (Supplementary Figure 10).The integration was performed using common cell types between the datasets, consisting of seven peripheral immune cells (Supplementary Table 2).Then, the DEGs were identified per cell type by comparing severe to healthy cells (Supplementary Table 11).Consistent with our findings of GRNs from existing scRNA-seq PBMCs (Figure 2; Supplementary Figures 4,  6), we discovered that the associated TFs were also expressed in the corresponding cell types and upregulated their target genes such as S100A8/A9 in severe COVID-19 (Supplementary Figures 10B, C).Moreover, to make the findings more robust, we selected all identified TFs and their differentially expressed targets across cell types from the existing scRNA-seq PBMCs and then compared them to that of integrated data.We found that many targets per cell type were upregulated or downregulated as well in severe COVID-19 from integrated data (Supplementary Table 12).Apart from this, two proteome datasets were further used for validation analysis.On one hand, the DEGs of integrated data, comparing severe to healthy donors, were identified (Supplementary Table 13).Among these DEGs, some cytokines, ISGs, and S100A8/A9 were found to be upregulated in severe cases.Next, we validated similar upregulations in the lung and plasma proteomes, which the associated proteins exhibited higher levels in either the autopsy COVID-19 lung proteome or the severe/critical COVID-19 plasma proteome (Supplementary Figure 11) compared to those in healthy donors.On the other hand, some of the top 26 genes identified from bulk RNA-seq data (Figure 5) were also validated for their associated protein considerable abundances in the two proteomes, including the abundant S100A12 in either autopsy or severe/critical COVID-19 cases (Supplementary Figure 12).
In summary, this study integrated healthy individuals and COVID-19 patient data from independent scRNA-seq data sources (nasopharyngeal swabs, PBMCs, and GSE158055 scRNA-seq data), compared the findings of each dataset, and validated certain key findings using bulk RNA-seq and proteome data.By separately analyzing the data from each study, we revealed the GRN landscape and identified similar and dissimilar regulons and pathways under different conditions (i.e., healthy individuals and COVID-19 patients).Regulators of certain key genes (e.g., S100A8/A9) were found to differ among cell types and disease severity.However, when comparing and combining the independent studies, we found that virus-infected individuals had certain common and/or unique features at different infection sites.For example, (i) certain activated regulators (such as XBP1, FOS, STAT1, and STAT2) were shared in both respiratory epithelial and peripheral blood; (ii) certain genes (e.g., PTMA) were differentially expressed in T cells from independent studies; (iii) although S100A8/A9 were found to be upregulated in both respiratory epithelial and peripheral blood, their relative regulators can differ (e.g., SPDEF and ELF3 were found in goblet and squamous cells and CEBPD was seen in monocyte cells, which showed that regulators of a gene were specific to the infection site, cell type, and condition).Collectively, the results using our approach offer clues to comprehensively understanding the diverse disease mechanisms of SARS-CoV-2 infection.These findings can be used as a rich resource for predicting, preventing, and treating COVID-19 across a wide range of cell types, which may help control severe symptoms.
This study has several limitations.The numbers of epithelial cell types were not equal: some were obtained in large quantities and others in small quantities.It is acknowledged that a small number of cells may influence transcriptional landscapes.In the case of PBMCs data, we lacked those associated with mild and moderate COVID-19.Technical issues, such as those encountered with scRNA-seq techniques, have led to certain limitations that may be challenging to overcome.Additionally, the distribution of donor demographics (e.g., age and gender) might potentially influence the results to a degree due to the small group size, such as only 15 individuals in the control group of the nasal scRNA-seq dataset.Nonetheless, by integrating additional information and data from independent sources where available, some of these limitations may be resolved.Therefore, further studies using large-scale data and finer-grained descriptions of COVID-19 severity categories would help to fully understand GRNs, their gradual shift, and dynamics in terms of COVID-19 severity progression.Future characterization of the specific role of some genes (e.g., S100 family members) in the immune response to SARS-CoV-2 infection, particularly in relation to disease severity, is needed in vitro and in vivo experiments.

1
FIGURE 1 Characterization of nasopharyngeal swabs and PBMCs data.(A) Cell types in nasopharyngeal swabs and PBMCs data and their percentage proportion in patients with different COVID-19 severity.Middle: cell type visualization on UMAP plots.Right: cell proportion associated with disease severity (mild/moderate and severe) and healthy cells.(B) Heatmap of the area under the curve (AUC) scores of regulons estimated per cell type by SCENIC.Detected regulons are represented by their corresponding transcription factors in the right-hand columns.Columns represent AUC scores of cell types.For each condition, the column order is the same as the label order in (A).Asterisks behind the TFs represent commonly identified TFs between nasopharyngeal swabs and PBMCs.(C) Number of differentially expressed genes (DEGs) when comparing COVID-19 (mild/moderate, severe) and healthy cells.Yellow and light-blue colors represent upregulated and downregulated genes, respectively.The corresponding gene numbers are shown at the top and bottom of each bar.In the upper panel, bars without dots indicate identified DEGs by comparing mild/moderate disease to healthy cells, while bar with dots indicate DEGs when comparing severe disease and healthy cells.(D) GO (biological process) enrichment analysis to compare upregulated genes found in diseased and healthy cells.

2
FIGURE 2Gene regulatory networks of specific cell types.(A) The gene regulatory networks (GRNs) of specific cell types from COVID-19 patients were constructed using Cytoscape software.In the network, the red, yellow, and light blue colors represent transcription factors, upregulated, and downregulated genes, respectively.(B) UMAP plot visualization of TF expression (left) and AUC score (right) of each regulon for nasopharyngeal swabs and PBMCs data.(C) The dot plots represent target gene expressions of detected TFs in (B).The target genes include specific cytokines, interferon-stimulated genes (ISGs), and inflammatory genes (S100A8, S100A9).

3
FIGURE 3 Overlap of DEGs in nasal and PBMCs T cells when comparing healthy and severe COVID-19 cases.Violin plots represent expressions of overlapping DEGs in (A) nasopharyngeal swabs and (B) PBMCs data.The asterisk following each gene indicates a significant difference between healthy and severe COVID-19 cases.Significance: *P < 0.05, **P < 0.01, ***P < 0.001.

4
FIGURE 4 Comparisons between mild/moderate and severe COVID-19 from nasopharyngeal swabs.(A) Number of identified DEGs when comparing severe and mild/moderate COVID-19.Yellow and light-blue colors represent upregulated and downregulated genes, respectively.The right panel indicate the results of GO enrichment analysis using DEGs based on biological process.Cell-cell communication (CCC) in (B) mild/moderate and (C) severe COVID-19.The left-hand networks of (B) and (C) represent cell interaction numbers among cell types.Cell types are represented by circles with different colors.Circle size indicates the number of cells of a cell type, while the edge width corresponds to the numbers of ligand-receptor (L-R) interactions.The Bubble plots [right-hand side of (B) and (C)] show all the significant L-R pairs associated with signaling pathways from a given cell type to another one.The dark blue to red colors relate to the communication probability (from minimum to maximum).

5
FIGURE 5Relationships between gene expressions and SOFA scores in COVID-19 samples.(A) Pearson correlations between gene expression and sequential organ failure assessment (SOFA) score.Top 26 genes with over 0.5 correlations are shown.Transcripts per million (TPM) were used for RNA-seq normalization.(B) Average TPM values of selected genes for each SOFA score.The selected genes include some S100 family members and some genes from the top 26 gene list in (A).For each bar plot, the Pearson's "r" between the TPM values and SOFA scores of all related samples is shown.