Construction of a Suite of Computable Biological Network Models Focused on Mucociliary Clearance in the Respiratory Tract

Mucociliary clearance (MCC), considered as a collaboration of mucus secreted from goblet cells, the airway surface liquid layer, and the beating of cilia of ciliated cells, is the airways’ defense system against airborne contaminants. Because the process is well described at the molecular level, we gathered the available information into a suite of comprehensive causal biological network (CBN) models. The suite consists of three independent models that represent (1) cilium assembly, (2) ciliary beating, and (3) goblet cell hyperplasia/metaplasia and that were built in the Biological Expression Language, which is both human-readable and computable. The network analysis of highly connected nodes and pathways demonstrated that the relevant biology was captured in the MCC models. We also show the scoring of transcriptomic data onto these network models and demonstrate that the models capture the perturbation in each dataset accurately. This work is a continuation of our approach to use computational biological network models and mathematical algorithms that allow for the interpretation of high-throughput molecular datasets in the context of known biology. The MCC network model suite can be a valuable tool in personalized medicine to further understand heterogeneity and individual drug responses in complex respiratory diseases.


INTRODUCTION
The respiratory tract is under constant challenge to provide the body with oxygen while monitoring air quality for pollutants and microorganisms. The mucous membranes in the airways, which are lined with microtubule-based projections, the cilia, represent a powerful first-line defense. In response to irritants and infection, mucus is secreted by goblet cells, and cilia on the surface of ciliated cells move mucus upward in coordinated waving and beating motions. Eventually, particles are expelled through sneeze and cough (Wanner et al., 1996). This self-clearing mechanism, mucociliary clearance (MCC), ensures proper functioning of the respiratory tract.
Cilia have attracted increasing attention because of the growing number of diseases caused by mutations in genes that impact cilium assembly, function, and turnover (Fliegauf et al., 2007;Kempeneers and Chilvers, 2018). Traditionally, cilia are classified as primary or motile (Wheatley, 1995;Satir and Christensen, 2007). Primary cilia are present on almost all cell types and are involved in tissue homeostasis (Gerdes et al., 2009;Nozawa et al., 2013). Motile cilia often occur as clusters of several hundred protrusions covering cells and direct fluid flow (Choksi et al., 2014).
Cilia assembly and resorption often depend on the cell cycle (Kim and Tsiokas, 2011), with a neatly interwoven mode of regulation assuring timely and developmentally precise control of cilium biogenesis. The regulatory factor X (RFX) family of transcription factors is a key regulator of both primary and motile cilia assembly programs (reviewed in Thomas et al., 2010;Choksi et al., 2014). A master regulator of motile cilia assembly across the vertebrates is forkhead box J (FOXJ1), a member of the forkhead/winged-helix family of transcription factors (Murphy et al., 1997;Chen et al., 1998;Brody et al., 2000), which is under control of multiciliate differentiation and DNA synthesis-associated cell cycle protein (MCIDAS) and geminin coiled-coil domain-containing (GMNC) protein in the respiratory epithelium (Stubbs et al., 2012;Arbi et al., 2016). Mutations in MCIDAS and its downstream effector cyclin O are implicated in an MCC disorder known as reduced generation of multiple motile cilia (RGMC) (Boon et al., 2014). In RGMC patients, cilia numbers are reduced, resulting in impaired MCC, airway obstruction, and recurring respiratory infections.
An alternative mechanism of ciliary regulation is the disassembly of the organelle by aurora A kinase (AURKA), which also regulates the entry into mitosis (Pan et al., 2004;Pugacheva et al., 2007). AURKA phosphorylates histone deacetylase 6 (HDAC6), stimulating HDAC6-dependent deacetylation of axonemal microtubules (Hubbert et al., 2002), destabilization of the ciliary shaft, and subsequent collapse of the cilium.
While mucus secretion is a normal defense response, mucin synthesis in goblet cells and mucus secretion are amplified in respiratory diseases such as asthma or chronic obstructive pulmonary disease (COPD). In addition, the number of goblet cells can increase by proliferation (hyperplasia) and by airway epithelial cell transdifferentiation (metaplasia), further contributing to increased mucus production (Blyth et al., 1998;Rogers, 2007;Turner and Jones, 2009;Boucherat et al., 2013;Ramos et al., 2014). This airway epithelial remodeling decreases ciliated cell numbers and ciliary beating efficiency, reducing MCC and aggravating airway plugging (Nini et al., 2012;Yaghi et al., 2012;Yaghi and Dolovich, 2016).
There is overwhelming evidence that oxidative stress and oxidative damage play a pivotal role in the pathogenesis of COPD (Rahman and MacNee, 1999;Rahman and Adcock, 2006;Anderson and Macnee, 2009;Kim and Criner, 2015;Matera et al., 2016). Oxidative stress is a well-described trigger of the epidermal growth factor receptor (EGFR) signaling pathway that leads to mucus hypersecretion (Takeyama et al., 1999(Takeyama et al., , 2001Perrais et al., 2002;Hewson et al., 2004;Casalino-Matsuda et al., 2006;Hao et al., 2014). We recently published an adverse outcome pathway that describes the events that follow oxidative stressmediated EGFR activation to goblet cell hyperplasia/metaplasia and decreased lung function following mucus overproduction (Luettich et al., 2017).
Signaling downstream of interleukin (IL) 13 is involved in the pathogenesis of asthma (Wills-Karp, 2004). The IL13 receptor complex initiates several cascades of molecular events that result in goblet cell metaplasia/hyperplasia. One important downstream effector of IL13 is the sterile alpha motif pointed domain-containing ETS transcription factor (SPDEF), which is directly involved in mucin gene expression (Park et al., 2007;Chen et al., 2009).
The vast volume and diversity of biological data available on cilium assembly, CBF, and goblet cell hyperplasia/metaplasia require that the information be integrated for better visualization and understanding of the processes that underlie respiratory diseases. Biological network models offer a framework for understanding biological processes and diseases and aid in drawing new, often unpredicted conclusions. Over the years, we have built several causal biological network (CBN) models that capture biological processes that are impacted in COPD. These models, stored in the CBN database, are emerging as an innovative and powerful tool to quantify the impact of exposure or potentially affected biological processes in disease (Cho et al., 2012;Martin et al., 2014;Boue et al., 2015;Talikka et al., 2017). The major advantage of the CBN approach is that it transforms unstructured data into interconnected and organized knowledge that describes biological processes precisely and accurately Westra et al., 2011Westra et al., , 2013Cho et al., 2012;Gebel et al., 2013;De Leon et al., 2014;Martin et al., 2014;Szostak et al., 2016).
In this study, we present a suite of causal biological models that describe important molecular events involved in MCC, from cilium assembly to ciliary beating, goblet cell hyperplasia/metaplasia, and mucus hypersecretion. We also show how transcriptomic data are scored onto these network models and how the models can provide mechanistic understanding of gene expression changes.

Literature Curation
Biological Expression Language (BEL) 1 version 1.0 is used for scientific text curation. BEL is a computable language that converts causal and correlative biological observations to statements consisting of two biological entities connected by a relationship predicate 1 . Relevant original research articles for curation were identified from pertinent review articles in the field. The journal impact factor or any other means to rank the publications was not considered. If the statements in the original research articles were sufficiently supported by the results presented in figures, the information was considered reliable and captured. To retrieve causal relationships the result sections were extracted from these articles for curation. The introduction, discussion and conclusion sections were avoided because the evidences therein largely contain data from earlier studies, repetition of the results, hypotheses and assumptions. Although several evidences supporting an interaction would provide more confidence on the edge, we capture the interactions even when a single experiment is provided in the literature, in order to not omit the relevant information. Contradicting statements were captured without preferential treatment and with proper annotations (model organism, tissue, cell line, treatment/disease, experimental setup). The experimental information from the relevant peer-reviewed scientific articles is semi-automatically processed through the BEL Information Extraction workFlow (BELIEF) platform (Szostak et al., 2015(Szostak et al., , 2016. BELIEF contains a text-mining software that recognizes biological terms in the text and assembles them into BEL statements. The curation interface allows review, correction, and annotations (cell/tissue type, disease if applicable, species, and experimental design) of the statements that BELIEF proposes. The literature curation is an iterative process. After the curation of initial articles, a gap analysis is performed, and more literature is identified based on gaps in the network models.

Network Model Assembly and Visualization
BEL statements are then compiled to generate a cohesive knowledge assembly model using the OpenBEL framework 3.0.0, an open source compilation framework. The network model consists of nodes that are the biological entities in the network models connected by edges (i.e., the relationships between the biological entities). Any RNA nodes are removed from the model backbone and used in the downstream layer for model scoring as described in Martin et al. (2014). The Cytoscape web application 2 is used to visualize and analyze the network properties (Shannon et al., 2003). Cytoscape supports powerful visual mapping whereby biological entities are depicted as defined-shaped nodes connected by the relationship edges. The network visualization is used also during the curation process to identify the gaps and to trim the network models. The trimming here means that any nodes that are "hanging" and do not lead to a biological process described in the model are removed, or further curation is performed to add molecular relationships to connect such nodes to the biological process.
The network model suite is available in the CBN database. The NPA algorithm as well as some measurable "downstream" relationships (backbone node to mRNA) can be downloaded as R packages from the GitHub project pages https://github.com/pmpsa-hpc/NPA and https://github.com/pmpsa-hpc/NPAModels.

Network Model Scoring
The network perturbation amplitude (NPA) methodology is used to obtain a quantitative assessment of how each of the models interprets the transcriptomic changes in the datasets we selected (GSE22430, GSE37693, and GSE5264). This methodology allows for the translation of gene expression fold-changes to differential values for each network node as well as enabling a networklevel summary to provide a quantitation of the degree of network model perturbation (Hoeng et al., 2012;Martin et al., 2014;Sewer et al., 2015;Szostak et al., 2016). Raw data were obtained from Gene Expression Omnibus (GEO) repository and normalized following a standard pipeline based on robust multiarray normalization implemented in the R environment for statistical computing (Smyth, 2004). The differential expression values and statistics were calculated using the Bioconductor LIMMA package with appropriate experimental comparisons. "O" and "K" statistics was used to test the specificity of the network models (including the "downstream edges" that connect the network nodes to gene differential expression nodes according to the underlying reverse-causal concept; Catlett et al., 2013). They compare the actual NPA value to the distributions of alternative NPA values obtained by permuting the edges of the networks (the connections between nodes for "K" and the connection between nodes and gene differential expression nodes for "O"). If the actual NPA value is significantly different from these "background" non-biological values, then we consider it as significantly specific.
The leading node analysis allows to focus on a fewer number of nodes in the network by ranking the nodes based on their contribution (%). Using an empirical 80% collective contribution instead of the actual rank, does not limit the number of the nodes, when the contribution of several nodes is almost equal (Martin et al., 2014).

Cilium Assembly Model
The cilium assembly network model is a collection of intertwined biological entities and processes that are supported by 59 relevant peer-reviewed articles. The network contains 209 nodes and 319 edges that represent relationships between nodes (Figure 1). When the connections between the nodes in the network were analyzed, many poorly connected nodes and a few highly connected ones, "hubs, " were observed. The most connected node (63 indegree edges) was the biological process "cilium assembly, " and the transcription factor FOXJ1, which is downstream of MCIDAS and GMNC, had the most outdegree edges (Figure 1).
A number of pathways, including delta-like canonical notch ligand (DLL)/NOTCH (through MCIDAS/GMNC), smoothened/hedgehog, and grainyhead-like transcription factor 2, converge into a FOXJ1/RFX module that triggers cilium assembly in the network model. This shows a high level of cooperativity between FOXJ1 and RFX factors; FOXJ1 can induce RFX2 and RFX3 expression, FOXJ1 gene expression is partially dependent on RFX3 activity, and a subset of FOXJ1 FIGURE 1 | Causal biological network model for cilium assembly. The table shows the top 10 highly connected nodes and their degrees of distribution. The vocabulary for the BEL is provided in http://www.openbel.org/. The Cytoscape layout is the Yfiles hierarchical layout. The network model can be downloaded from causalbionet.com. and RFX target genes overlap (Figure 1). This assures timely and developmentally precise control of cilium biogenesis. In addition, numerous molecules and complexes necessary for structural integrity of cilia, such as the axoneme constituents, BBSome complex (structural components of the basal body), and exocyst complex (membrane transport to cilium), support the "cilium assembly" hub as immediate neighbor nodes. With regard to the "cilium disassembly" hub, as expected, the AURKA-HDAC6 axis and their upstream regulators emerged as a supporting subnetwork.

Ciliary Beating Model
The ciliary beating network model was computed from 52 articles and comprises 80 nodes and 137 edges. The network illustrates the path from various stimuli through intermediate signaling molecules converging into consecutive biological processes, with "mucociliary clearance" as the final node (Figure 2). "Epithelial cilium movement" has the most inward connections in the network, and adenosine triphosphate has the most outward connections. Calcium and "nitric oxide synthase family" are central hubs in the network, with several incoming and outgoing edges. The model shows the CBF increases as a result of cAMP-and cGMP-mediated activation of the respective protein kinases through Ca 2+ release or by a calcium-independent mechanism. The model also captures cystic fibrosis transmembrane conductance regulator, whose activation triggers the adenylate cyclase (ADCY)/cAMP pathway. Several other stimuli, such as serotonin or macrophagestimulating protein, via corresponding receptors (HTR and MST1R, respectively), lead to increased ciliary motion in the model. Another level of regulation is added through sex hormone-dependent modulation, such as progesterone-mediated decreases or estrogen-mediated increases in CBF.

Model Scoring With Transcriptomic Data
NPA Network scoring with transcriptomic data is based on the inference of activities of the molecular entities in the network from gene expression changes. This backward reasoning employs a downstream layer with information on gene expression changes known to be induced by the backbone entities (Martin et al., 2014). To test the ability of the MCC network models to provide a quantitative measure of MCC, we identified publicly available datasets in Gene Expression Omnibus. The first dataset selected for model scoring (GSE22430) was from lung epithelial cells treated with the redox-active toxin pyocyanin from Pseudomonas aeruginosa that stimulates EGFR (Rada et al., 2011). Dataset GSE5264 was derived from an in vitro experiment, in which airway epithelial cells were allowed to differentiate to a pseudostratified epithelium at the air-liquid interface (Ross et al., 2007). Finally, we used a transcriptomic dataset from IL13-treated human airway epithelial cells (GSE37693) (Alevy et al., 2012).
The cilium assembly network model responded strongly to the treatment of lung cells with pyocyanin and to the time-course of bronchial epithelial cell differentiation with increasing amplitude over time. There was no impact on the models in response to the IL13 treatment (Figure 4). When the same datasets were used to score the cilia beating network models, the largest amplitude of network perturbation was observed in response to pyocyanin treatment of lung cells (Figure 4). Similar to the cilium assembly model, the amplitude of cilia beating network perturbation increased with advanced mucociliary differentiation, and the model did not respond to the IL13 treatment. The scoring of the goblet cell hyperplasia/metaplasia network model again showed a very strong response to the pyocyanin treatment and, to a lesser extent, to mucociliary differentiation of airway cells. This model responded to IL13 treatment (Figure 4).

Leading Node Analysis
To investigate the mechanistic foundation underlying the perturbations of the network models from transcriptomic data and to further validate the biology in the models, we used the leading node analysis (Martin et al., 2014). Leading nodes are the entities in the network models upon which the impact contributes 80% of the observed effect on the network as a whole. Leading node analysis also allows for the assessment of the directionality (activation or inhibition) of the inferred effect on each node. All leading nodes for all contrasts and models are provided in Supplementary Data Sheets S1-S3. Figure 5 shows the leading node analysis of the cilium assembly network model scored with transcriptomic data from early, intermediate, and late time points of human airway epithelial cell mucociliary differentiation. At the early time point, bone morphogenic protein (BMP) signaling was inferred to be upregulated. The mechanistic target of rapamycin (mTOR), platelet-derived growth factor A (PDGFA), and protein kinase B (AKT) signaling were inferred to be downregulated, in contrast with the inferred upregulation of cilium assembly. At the same time, NudE neurodevelopment protein 1 like 1 (NDEL1) was inferred to be downregulated, resulting in downregulation of cyclin A2 (CCNA2) and cell cycle arrest. At the intermediate and late time points of mucociliary differentiation, BMP signaling was no longer inferred to be upregulated. Instead, DLL1/NOTCH1 signaling was inferred to be downregulated, resulting in an increase in MCIDAS and FOXJ1, the master transcription factors required for the formation of motile cilia. RFX3, also known to induce FOXJ1, was inferred to be upregulated at the intermediate  and late time points. PDGFA, mTOR, AKT, and NDEL1/CCNA2 continued to be downregulated in the leading node analysis.

Cilium assembly model
The leading node analysis of the pyocyanin treatment data scored on the cilium assembly network model indicated the upregulation of PDGFA and downregulation of BMP signaling (Supplementary Data Sheet S1). Figure 6 shows the leading node analysis of the ciliary beating network model scored with transcriptomic data from early, intermediate, and late time points of human airway epithelial cell mucociliary differentiation. The β2-adrenergic receptor/ADCY signaling pathway, leading to an increase in cAMP levels and subsequent Ca 2+ increase via the activation of the PRKA family, was inferred to be upregulated. The analysis also inferred the activation of cGMP-dependent protein kinase 1 (PRKG1). In addition, the leading node analysis of the pyocyanin treatment data scored on the ciliary beating network model indicated the upregulation of ADCY and calcium signaling (Supplementary Data Sheet S2). Figure 7 shows the leading-node analysis of the goblet cell hyperplasia/metaplasia model with the pyocyanin and IL13 datasets. The levels of reactive oxygen species (ROS) were inferred to increase with subsequent activation of EGFR and ERK1/2, followed by an inferred increase in mucin production. Three other branches of the network that were highlighted and led to increases in mucins included the AP-1, FOXA3/SPDEF, and IL13/SPDEF pathways. The inferred activation of the IL13 receptor complex mirrored the activation of the matrix metalloproteinase family. The inferred activation of the p38 MAPK family upstream of MUC5AC was unique to the pyocyanin dataset. While the NPA analysis showed a significant network perturbation in response to human airway epithelial cell mucociliary differentiation, a closer examination of the leading nodes clearly indicated inferred downregulation of ROS and the EGFR and MAPK ERK1/2 pathways (Supplementary Data Sheet S3).

DISCUSSION
MCC is an important defense mechanism that protects the respiratory tract, and thus the body, from infections and airborne pollutants. In this article, we presented a suite of CBN models that describe relevant molecular processes related to MCC. Derived from original articles, the BEL-scripted scientific statements FIGURE 7 | Goblet cell hyperplasia/metaplasia subnetwork based on leading nodes. The graphical representation shows an example of connected leading nodes for pyocyanin treatment (first lane) and IL13 treatment (second lane) compared with experimental controls. The top 10 leading nodes were prioritized and connected with other leading nodes from the analysis. Orange/red bars indicate inferred upregulation. Bold edges indicate "direct regulation." The dotted edge denotes a member of a protein family. The green asterisk indicates that the node is a leading node in a given contrast. The vocabulary for the BEL is provided in http://www.openbel.org/.
were assembled into three separate network models that capture molecular processes involved in cilium assembly, ciliary beating, and goblet cell hyperplasia/metaplasia accurately. The key factors involved in these processes are part of the backbone that interconnects various entities in the network models. As an example, the cilium assembly hub integrates the diversity of the cascades that are determined by the variety of cilia types, each requiring precise regulation (for review, see Choksi et al., 2014).
As part of network model validation, we conducted network scoring with gene expression data from experiments that were expected to trigger perturbation of the MCC models. The scoring also allowed us to look farther from the static network view into the key factors that impact the network and assess the behavior (activation or inhibition) of molecular entities in the model backbone based on differential gene expression in the selected datasets.
As expected, the biology in the redox-active pyocyanin treatment experiment was best reflected in the goblet cell hyperplasia/metaplasia model, with EGFR and downstream MAPK ERK1/2 factors predicted to be activated, leading to mucin production. This was in line with other experimental observations of increased numbers of goblet cells and increased mucin production in response to pyocyanin treatment (Rada et al., 2011).
Impact on the cilium-focused network models could be explained by the cell redox state and ROS levels affecting multiple cellular signaling pathways, some of which overlap with cilium biology. As an example, the activity of the nitric oxide synthase (NOS) family as well as the nitric oxide (NO) chemical node were inferred to be upregulated by pyocyanin treatment in the ciliary beating network model (Supplementary Data Sheet S2). NO is a redox molecule that regulates tissue oxidative balance through direct and indirect mechanisms of action and can lead to an increase in ciliary beat frequency through the activation of NOS family.
Network scoring with the airway epithelial cell differentiation dataset clearly showed time-dependent activation of pathways leading to cilium assembly and ciliary beating. At the early stage, BMP signaling was inferred to be upregulated, indicating the lack of cilium assembly, while at later time points, BMP released the inhibitory effect on cilium assembly, with the MCIDAS/FOXJ1/RFX3 pathway inferred to be activated to promote cilium assembly. Network scoring, however, indicated that mTOR and AKT were downregulated in the dataset, contradictory to the causal connection from mTOR and AKT to cilium assembly that was inferred to be upregulated. These relationships were derived from articles describing primary cilium assembly and may not appropriately reflect the biology in the respiratory tract, where the operating process is the motile cilia assembly program that culminates in the FOXJ1/RFX3 module (Wang et al., 2015;Suizu et al., 2016).
Downregulation of CCNA2 and cell cycle arrest in the cilium assembly network model could indicate a slowing down in cell proliferation to enforce cell differentiation. This was in accordance with the inferred inhibition of the mTOR/AKT pathway in the cilium assembly model. The results were further enforced by the scoring of the goblet cell network. The inferred reduction in EGFR signaling could indicate loss of proliferative potential in cultures differentiating to pseudostratified epithelium. This result also suggests that the network model discriminates between a physiological (i.e., differentiation) and pathological (i.e., COPD-related) increase in the number of goblet cells in airways. Finally, the cilia beating model appropriately captured the activation of cAMP/PRKA and cGMP/PRKG signaling that elevates cellular Ca 2+ levels, leading to increases in cilia beating.
Scoring the three network models with the datasets from IL13treated lung cells highlighted the specificities of the different networks: IL13-induced airway mucus production affected several hubs in the hyperplasia/metaplasia model, notably the SPDEF transcription factor, while impacts on the cilium assembly and ciliary beating models did not reach statistical significance.
In conclusion, the representation of cilium assembly, ciliary beating, and airway remodeling processes through CBN models is a potential powerful tool for systems medicine . MCC networks can be used as a substrate for scoring highthroughput data for mechanistic understanding of the differences between diseased and healthy tissue. The MCC network model suite presented here, along with gene expression data from wellcontrolled clinical studies, could be used in individuals with MCC disorders for subject classification, identification of mode of action of novel drug candidates, or prediction of treatment outcome. Ultimately, the MCC network model suite provides perspectives for tailored drug therapy and precision medicine.

AUTHOR CONTRIBUTIONS
MT, JH, and MP conceived and designed the experiments. HY, SV, and MT performed the study. HY and MT wrote the manuscript with the support of KL. FM, AS, and SG analyzed the data. All authors made critical revision and approved the final version of the manuscript.

FUNDING
Philip Morris International is the sole source of funding and sponsor of this research.

ACKNOWLEDGMENTS
We would like to thank Nicholas Karoglou and Elena Scotti for editorial assistance as well as Stephanie Boue for preparing and uploading the network models in the CBN database.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene. 2019.00087/full#supplementary-material DATA SHEET S1 | Leading node analysis for cilium assembly network model scored with transcriptomic data from pyocyanin treatment, IL-13 treatment, and mucociliary differentiation datasets. For each node, an * indicates it is a leading node and the number denotes its rank among the leading nodes. + or − in parenthesis indicates inferred up-or down-regulation, respectively. Value in % is the contribution of the node to the overall NPA.
DATA SHEET S2 | Leading node analysis for ciliary beating network model scored with transcriptomic data from pyocyanin treatment, IL-13 treatment, and mucociliary differentiation datasets. For each node, an * indicates it is a leading node and the number denotes its rank among the leading nodes. + or − in parenthesis indicates inferred up-or down-regulation, respectively. Value in % is the contribution of the node to the overall NPA.
DATA SHEET S3 | Leading node analysis for goblet cell hyperplasia/metaplasia network model scored with transcriptomic data from pyocyanin treatment, IL-13 treatment, and mucociliary differentiation datasets. For each node, an * indicates it is a leading node and the number denotes its rank among the leading nodes. + or − in parenthesis indicates inferred up-or down-regulation, respectively. Value in % is the contribution of the node to the overall NPA.