Boolean Feedforward Neural Network Modeling of Molecular Regulatory Networks for Cellular State Conversion

Choo, Sang-Mok; Almomani, Laith M.; Cho, Kwang-Hyun

doi:10.3389/fphys.2020.594151

ORIGINAL RESEARCH article

Front. Physiol., 01 December 2020

Sec. Systems Biology Archive

Volume 11 - 2020 | https://doi.org/10.3389/fphys.2020.594151


1. Department of Mathematics, University of Ulsan, Ulsan, South Korea
2. Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea

Abstract

The molecular regulatory network (MRN) within a cell determines cellular states and transitions between them. Thus, modeling of MRNs is crucial, but this usually requires extensive analysis of time-series measurements, which are extremely difficult to obtain from biological experiments. However, single-cell measurement data such as single-cell RNA-sequencing databases have recently provided new insight into resolving this problem by ordering thousands of cells in pseudo-time according to their differential gene expression. Neural network modeling can be employed by using such temporal data as learning data. In parallel, Boolean network modeling of MRNs is attracting growing interest, as it is a parameter-free logical modeling approach and is thereby robust to noisy data while still capturing the essential dynamics of biological networks. In this study, we propose Boolean feedforward neural network (FFN) modeling, which combines the neural network and Boolean network modeling approaches to reconstruct a practical and useful MRN model from large temporal data. Furthermore, analyzing the reconstructed MRN model can enable us to identify control targets for potential cellular state conversion. Here, we show the usefulness of Boolean FFN modeling by demonstrating its applicability through a toy model and biological networks.

Introduction

Cellular behavior is governed by intracellular molecular regulatory networks (MRNs), such as signaling and gene regulatory networks (Schmidt et al., 2005; Kim and Cho, 2006; Sreenath et al., 2008; Kim et al., 2011). Reconstruction and mathematical modeling of such MRNs based on biological experiments have been of great interest in the field of systems biology. Modeling MRNs has, however, been very challenging due to the limited availability of time-course measurements from biological experiments. This limitation can now be addressed by recent advances in experimental measurement technologies, and thus, there is a growing interest in developing a new paradigm of modeling MRNs based on large data sets.

Single-cell technologies have emerged in the fields of genomics (Ludwig et al., 2019; Tritschler et al., 2019; Baslan et al., 2020; Yofe et al., 2020), epigenomics (Berkel and Cacan, 2019; Chen et al., 2019; Verma and Kumar, 2019), transcriptomics (Cui et al., 2019; He et al., 2020; Huang et al., 2020), proteomics (Minakshi et al., 2019; Zhu et al., 2019; Labib and Kelley, 2020), and metabolomics (Duncan et al., 2019; Kawai et al., 2019; Kumar et al., 2020). We can now obtain omics information of hundreds to thousands of individual cells from a single experiment. For instance, single-cell RNA sequencing technologies can measure messenger RNA concentration of hundreds to thousands of genes expressed by single cells, and single cell proteomics by mass spectrometry can quantify over 1,000 proteins per single cell at once (Budnik et al., 2018; Lun and Bodenmiller, 2020). Such single-cell data can be used as pseudo-time-series measurements of distinct cellular states that can provide a new opportunity for modeling MRNs.

There have been attempts to develop dynamic models of MRNs based on ordinary differential equations, regression models, and Boolean networks. Boolean models are more appropriate for modeling MRNs from pseudo-time-series single-cell data, since high-throughput single-cell data are noisier than conventional bulk sequencing data and Boolean logical network models are relatively robust to noise. Constructing a Boolean network model usually requires two steps: generating pairs of Boolean input and output for each node in the MRN from states of pseudo-time-ordered single cells and then fitting the Boolean state update logic of each node to the data (Hamey et al., 2017). There are, however, a number of challenges in determining the backbone network structure and optimizing the regulatory logic to the measured data sets. To overcome such challenges, we propose an approach combining Boolean network modeling and the feedforward neural network (FFN) learning algorithm, which is particularly useful for inferring input–output relationships from large temporal data. For this purpose, we use only temporal data of network nodes and need neither to determine the network structure nor to optimize the regulatory logics. Of note, in our Boolean FFN model, each node of the MRN is represented by a single output node of an FFN that takes all MRN nodes as its input nodes, and the state transition dynamics of the MRN can then be simulated by executing the entire Boolean FFN model.
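As a minimal sketch of this representation, the following Python snippet treats each MRN node as the thresholded output of its own small FFN that reads the full network state, and updates all nodes synchronously. The two-node network, the single hidden unit per node, the sigmoid activation, and the hand-set weights are all hypothetical and chosen only so that the outputs reproduce simple logic (AND and NOT); in the proposed approach the weights would be learned from temporal data, and the actual architecture may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def node_ffn(state, params):
    """Boolean output of one node's FFN, given the full network state."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, state)) + bias)
        for weights, bias in zip(params["hidden_w"], params["hidden_b"])
    ]
    out = sigmoid(sum(w * h for w, h in zip(params["out_w"], hidden)) + params["out_b"])
    return 1 if out >= 0.5 else 0  # binarize the real-valued output


# Hypothetical hand-set weights (illustrative only, not learned from data).
NODE_PARAMS = [
    # Node 0 behaves like: x0 AND x1
    {"hidden_w": [[10.0, 10.0]], "hidden_b": [-15.0], "out_w": [10.0], "out_b": -5.0},
    # Node 1 behaves like: NOT x0
    {"hidden_w": [[-10.0, 0.0]], "hidden_b": [5.0], "out_w": [10.0], "out_b": -5.0},
]


def step(state):
    """One synchronous update: every node's FFN reads the same current state."""
    return tuple(node_ffn(state, params) for params in NODE_PARAMS)


def simulate(state, n_updates):
    """State trajectory obtained by executing the whole model repeatedly."""
    trajectory = [state]
    for _ in range(n_updates):
        state = step(state)
        trajectory.append(state)
    return trajectory


trajectory = simulate((1, 1), 3)
```

Here `simulate` plays the role of executing the entire Boolean FFN model: the per-node networks are independent of each other and are coupled only through the shared input state.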

Considering a cellular state transition process, we can partition the temporal data of such a process into three parts: ordered pairs of initial cellular states, ordered pairs of transitional cellular states, and ordered pairs of final cellular states. These three sets of ordered pairs can then be used for building FFNs for the initial, transitional, and final cellular states, which are referred to as iFFN, tFFN, and fFFN, respectively. Employing the trained iFFN, tFFN, and fFFN, we can generate trajectories from initial to final cellular states and use such state trajectories as new training data for building a cell fate transition FFN (cFFN) for each node.

The eventual goal of our study is to identify control targets that can induce desired cellular state conversion, and for this purpose, we propose to build cFFN using iFFN, tFFN, and fFFN based on temporal data measurements of network nodes. We demonstrate the effectiveness and possible application of the proposed Boolean FFN modeling of MRNs by applying it to a toy network model as well as real biological networks. In particular, we compare identified control targets for cellular state conversion between the Boolean FFN and its original Boolean network model in order to show the effectiveness of the proposed Boolean FFN modeling of MRNs.

Results

Overview of Constructing cFFN

The overall procedure of constructing a cFFN is summarized in Figure 1. We presume that the nodes playing a significant role in the cellular state transition of interest are known, whereas the regulatory relationships among the nodes are unknown (Figure 1A). Here, all the nodes are assumed to have binarized expression levels so that the MRN can be represented by a Boolean network model. We also assume that marker nodes, which define specific desired or undesired states, are known and can be used as a primary basis for evaluation after identifying control targets for cellular state conversion.

FIGURE 1

We consider three clusters of Boolean states over the transition from initial to final states through transitional states, resulting in three sets of ordered pairs of initial, transitional, and final cellular states as shown in Figure 1B. These will also be referred to as the first, second, and third clusters to emphasize the order of cellular state transition. As we consider a transition process from an initial normal state to a final abnormal state, the number of desired states tends to decrease from the first to the third cluster while the number of undesired states increases; this is referred to as the marker tendency. In each cluster, the first state of each ordered pair is assumed to be updated to the second state, which is represented by connecting arrows. However, there is no connection information between two clusters, resulting in no trajectory from initial to final states. We call these three consecutive clusters disconnected trajectories.

To construct connected trajectories, we build three FFNs, i.e., iFFN, tFFN, and fFFN, for each node using the corresponding cluster as training data (Figure 1C). The marker tendency is used as a constraint for training each FFN.

We take the first states in the pairs of initial cellular states to be the initial input. By applying iFFN, tFFN, and fFFN to the initial input, iFFN(initial input), and tFFN[iFFN(initial input)], respectively, we can construct connected trajectories from initial input to final output states (Figure 1D).

Using the set of states on each connected trajectory as new training data, we can construct cFFN for the node (Figure 1E). The entire MRN is then composed of cFFNs of the nodes within a network model, which is illustrated by a conceptual diagram in Figure 1F.
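The chaining of the three models and the reuse of the resulting trajectories can be sketched as follows. This is an illustrative outline only: the three stand-in models are hypothetical toy functions (each simply clears one bit), not trained networks, and the function names are assumptions introduced for this example.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]
Model = Callable[[State], State]


def connected_trajectory(initial_input: State, iffn: Model, tffn: Model, fffn: Model) -> List[State]:
    """Chain the three models: x -> iFFN(x) -> tFFN[iFFN(x)] -> fFFN{tFFN[iFFN(x)]}."""
    after_initial = iffn(initial_input)
    after_transitional = tffn(after_initial)
    after_final = fffn(after_transitional)
    return [initial_input, after_initial, after_transitional, after_final]


def training_pairs(trajectory: List[State]) -> List[Tuple[State, State]]:
    """Consecutive states on a trajectory become (input, target) pairs for cFFN."""
    return list(zip(trajectory[:-1], trajectory[1:]))


def clear_bit(index: int) -> Model:
    """Hypothetical stand-in for a trained model: sets one node to 0."""
    def model(state: State) -> State:
        return tuple(0 if i == index else v for i, v in enumerate(state))
    return model


trajectory = connected_trajectory((1, 1, 1), clear_bit(0), clear_bit(1), clear_bit(2))
pairs = training_pairs(trajectory)
```

Each initial input yields one connected trajectory of four states and therefore three (input, target) pairs; pooling the pairs over all initial inputs gives the training set from which a cFFN would be fitted.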

Toy Network for Illustrating FFNs

Construction of cFFN

We demonstrate an example of building iFFN, tFFN, fFFN, and cFFN using a toy network of six nodes with Boolean update logics to identify control targets in Figure 2. The graph in Figure 2A represents only the collective regulatory relationships between pairs of nodes in the network (without considering the regulatory logics). Node N6 is taken as the unique marker in this case: a state is desired if N6 is active (value 1) and undesired otherwise (value 0), as shown in Figure 2A. All possible states of the toy network except one converge to an undesired state, which is called an undesired attractor, and these states are partitioned into seven sets: D0 denotes the singleton set containing the undesired attractor, and Dj (1 ≤ j ≤ 6) denotes the set of states that reach the attractor after being updated j times.
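The partition into the sets Dj can be illustrated with a short sketch. The three-node update rule below is hypothetical and is not the six-node toy network of Figure 2 (whose logic is not reproduced here); it is chosen only because every state reaches the fixed point (0, 0, 0) within three synchronous updates. States that never reach the attractor are simply left out of the partition.

```python
from itertools import product


def update(state):
    """Hypothetical synchronous Boolean update rule for a three-node network."""
    a, b, c = state
    return (a & b, c, 0)


def basin_layers(update_fn, n_nodes, attractor):
    """Group states by the number of updates j needed to reach the attractor (the sets Dj)."""
    layers = {}
    max_steps = 2 ** n_nodes  # no simple path can be longer than the state space
    for state in product((0, 1), repeat=n_nodes):
        current, steps = state, 0
        while current != attractor and steps < max_steps:
            current = update_fn(current)
            steps += 1
        if current == attractor:  # states outside the basin are skipped
            layers.setdefault(steps, set()).add(state)
    return layers


layers = basin_layers(update, 3, (0, 0, 0))
for j in sorted(layers):
    print(f"D{j}: {sorted(layers[j])}")
```

For this hypothetical rule the layers D0 to D3 contain 1, 2, 4, and 1 states, respectively, which together cover all eight states.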

FIGURE 2

We use Dj to generate initial, transitional, and final cellular states as shown in Figure 2B. Fifteen states randomly chosen from D6, D5, and D4 and their one-time updated states are represented as the first and second states of ordered pairs of initial states, respectively. Fifteen states randomly chosen from D3^∗ and their one-time updated states are represented as the first and second states of ordered pairs of transitional states, respectively. Here, D3^∗ denotes the set of all states in D3 except those states that are updated from the second initial states. Fifteen states randomly chosen from D2^∗ and D1 and their updated states are represented as the first and second states of ordered pairs of final states, respectively, where D2^∗ denotes the set of all states in D2 except those states that are updated from the second transitional states (Supplementary Data 1).

The first and second states of ordered pairs of initial, transitional, and final states are used as the training input and target of iFFN, tFFN, and fFFN, respectively, as shown in Figure 2C. The constraint of marker tendency is also considered when training each FFN (see section “Materials and Methods” for details). A sequential application of iFFN, tFFN, and fFFN to the initial input, iFFN(initial input), and tFFN[iFFN(initial input)], respectively, produces 15 trajectories as shown in Figure 2D. Each pair of consecutive states on a trajectory is used as a training input and target for a Boolean FFN, which is the cFFN shown in Figure 2E.

Conversion of Undesired States With cFFN

We demonstrate that cFFN can be used in identifying control targets for converting undesired states to desired ones. Pinning the value of a single node or the values of two nodes during state update is referred to as single or double control, respectively. To validate whether the control candidates identified from cFFN can drive the undesired states to desired ones, we compare the control “candidates” to control “targets” found by extensive simulation analysis of the original Boolean network models of MRNs.

Single-control target

To evaluate control candidates, we search for all single-control targets by simulating the Boolean network model of this toy network. For this particular example, when pinning the value of a node to 0 and updating every state according to the regulatory logics of the Boolean network model, there exists a state that cannot be driven to a desired state. This shows that there is no single-control target of value 0 in this case. However, there is a unique single-control target of value 1. Pinning the value of N4 to 1 is the only way to drive all possible states to desired states. This shows that N4 is a unique single-control target of value 1. To examine whether cFFN can be used to identify N4, we consider each node Nj (1 ≤ j ≤ 6) in cFFN as a single-control candidate of value 1.

To test whether cFFN can identify the unique single-control target N4, we define the probability that each single-control candidate of value 1 is a single-control target of value 1. Here, the value of Nj is fixed to 1 in a given cFFN, and every possible state is updated using the cFFN. Then, the number of states driven to desired states is counted. After obtaining these counts for all single-control candidates, Nj receives a score of 1 if its count is one of the two highest counts among the candidates, and 0 otherwise. Here, the number 2 is a hyperparameter of the scoring scheme. We repeat this scoring process for each of 1,000 cFFNs and divide the total score of Nj by 1,000, which gives the probability of Nj shown in the left panel of Figure 2F. As a result, the single-control target N4 has the highest probability among all of the single-control candidates.
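The scoring procedure can be sketched as below. Several details are assumptions made for illustration: the two three-node stand-in models are hypothetical rules rather than trained cFFNs, the number of updates applied before checking the marker (`n_updates`) is not specified in the text, and ties are handled by giving a score of 1 to every candidate whose count reaches the k-th largest count.

```python
from itertools import product


def pin(state, node, value):
    """Force one node to a fixed value."""
    return tuple(value if i == node else v for i, v in enumerate(state))


def count_desired(model, n_nodes, node, value, marker, n_updates):
    """Count initial states whose marker is active after pinned updates (assumed stopping rule)."""
    count = 0
    for state in product((0, 1), repeat=n_nodes):
        current = pin(state, node, value)
        for _ in range(n_updates):
            current = pin(model(current), node, value)
        count += current[marker]  # marker value 1 means "desired"
    return count


def top_k_scores(counts, k):
    """Score 1 for candidates whose count is at least the k-th largest count, else 0."""
    threshold = sorted(counts, reverse=True)[k - 1]
    return [1 if c >= threshold else 0 for c in counts]


def control_probabilities(models, n_nodes, value, marker, n_updates, k):
    """Average the 0/1 scores of each candidate node over all models."""
    totals = [0] * n_nodes
    for model in models:
        counts = [count_desired(model, n_nodes, node, value, marker, n_updates)
                  for node in range(n_nodes)]
        for node, score in enumerate(top_k_scores(counts, k)):
            totals[node] += score
    return [total / len(models) for total in totals]


# Hypothetical stand-ins for two trained models (node index 2 is the marker).
def model_a(state):
    a, b, c = state
    return (b, a & c, b)


def model_b(state):
    a, b, c = state
    return (a, a | c, a & b)


probabilities = control_probabilities(
    [model_a, model_b], n_nodes=3, value=1, marker=2, n_updates=4, k=2
)
```

In the setting described above, `models` would hold the 1,000 cFFNs, `n_nodes` would be 6, and `k` would be 2, so that the returned list corresponds to the per-node probabilities plotted in Figure 2F.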

Double-control target

First, we perform a case study for double control by pinning the values of two nodes to (1, 1). We find that, if one of seven pairs, (N1, N3), (N1, N4), (N2, N4), (N3, N4), (N3, N6), (N4, N5), and (N4, N6), has its values pinned to (1, 1), every state eventually converges to a desired state. As a result, those seven pairs are identified as double-control targets of values (1, 1). To examine whether cFFN can be used for identifying such double-control targets of values (1, 1), we consider all 15 pairs of two nodes as double-control candidates of values (1, 1) and evaluate each of them. The probability that each double-control candidate of values (1, 1) is a target of values (1, 1) is defined in the same way as for single-control candidates, with single control and the two highest numbers replaced by double control and the eight highest numbers, respectively. We present the probability in the right panel of Figure 2F. We find that five of the seven double-control targets of values (1, 1) are in the list of five highest probabilities.

We perform the second case study for double control by pinning the values of two nodes to (0, 1), since there is no double-control target of values (0, 0). If one of five ordered pairs, (N1, N4), (N2, N4), (N3, N4), (N5, N4), and (N6, N4), has values (0, 1), then every state eventually converges to a desired state. As a result, those five pairs are identified as double-control targets of values (0, 1). To examine whether cFFN can be used for identifying these five double-control targets of values (0, 1), we consider all 30 ordered pairs of two nodes in cFFN as double-control candidates of values (0, 1) and evaluate each of them. The probability that each double-control candidate of values (0, 1) is a target of values (0, 1) is defined in the same way as for double-control candidates of values (1, 1), with the eight highest numbers replaced by the 10 highest numbers. We present the probability in Supplementary Figure 1, where the five double-control targets of values (0, 1) have the five highest probabilities.
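The candidate counts used in the two case studies follow from simple enumeration over the six nodes, as the short check below illustrates. The variable names are introduced here for illustration; the target lists are copied from the text. The last comparison records an observation rather than a result stated by the authors: the five (0, 1) targets are exactly the ordered pairs in which N4 is the node pinned to 1, which is consistent with N4 being the single-control target of value 1.

```python
from itertools import combinations, permutations

nodes = ["N1", "N2", "N3", "N4", "N5", "N6"]

# Values (1, 1) are symmetric, so candidates are unordered pairs: C(6, 2) = 15.
candidates_11 = list(combinations(nodes, 2))

# Values (0, 1) are not symmetric, so candidates are ordered pairs: 6 * 5 = 30.
candidates_01 = list(permutations(nodes, 2))

# Double-control targets reported in the text.
targets_11 = [("N1", "N3"), ("N1", "N4"), ("N2", "N4"), ("N3", "N4"),
              ("N3", "N6"), ("N4", "N5"), ("N4", "N6")]
targets_01 = [("N1", "N4"), ("N2", "N4"), ("N3", "N4"), ("N5", "N4"), ("N6", "N4")]

# Ordered pairs whose second node (the one pinned to 1) is N4.
pairs_with_n4_at_1 = [pair for pair in candidates_01 if pair[1] == "N4"]

print(len(candidates_11), len(candidates_01))
print(sorted(pairs_with_n4_at_1) == sorted(targets_01))
```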

Applications of FFN for Identifying Biomolecular Control Targets

To construct cFFN of an MRN and demonstrate its applicability for identifying control targets as in Figure 3A, we employ two biomolecular network models. One of the network models is composed of 21 nodes and has a large portion (81.73%) of states converging to an undesired state. In contrast, the other network model is composed of 33 nodes and has a unique undesired state with a very small portion (0.02%) of states converging to an undesired state.

FIGURE 3

Colitis-Associated Colon Cancer Network