- Laboratory of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan

In general, it is very difficult to measure the information flow in a cellular network directly. In this study, based on an information flow model and microarray data, we measured the information flow in cellular networks indirectly by using a systems biology method. First, we used a recursive least square parameter estimation algorithm to identify the system parameters of coupling signal transduction pathways and the cellular gene regulatory network (GRN). Then, based on the identified parameters and systems theory, we estimated the signal transductivities of the coupling signal transduction pathways from the extracellular signals to each downstream protein and the information transductivities of the GRN between transcription factors in response to environmental events. According to the proposed method, the information flow, which is characterized by signal transductivity in coupling signaling pathways and information transductivity in the GRN, can be estimated by microarray temporal data or microarray sample data. It can also be estimated by other high-throughput data such as next-generation sequencing or proteomic data. Finally, the information flows of the signal transduction pathways and the GRN in leukemia cancer cells and non-leukemia normal cells were also measured to analyze the systematic dysfunction in this cancer from microarray sample data. The results show that the signal transductivities of signal transduction pathways change substantially from normal cells to leukemia cancer cells.

## Introduction

No cell lives in isolation. Even eukaryotic microorganisms such as yeast, slime molds, and protozoans can secrete molecules called pheromones to coordinate the aggregation of free-living cells for sexual mating or differentiation under certain environmental conditions (Lipke and Kurjan, 1992; Figueiras et al., 2009; O'day and Keszei, 2012). The most important among these molecules are extracellular signaling molecules that function within plants and animals to control metabolic processes within cells, the growth and differentiation of tissues, the synthesis and secretion of proteins, and the composition of the intracellular and extracellular fluids. During intercellular communication or cellular stress responses, the cell senses extracellular signals. Different external changes or events may stimulate signaling. Typical signals are hormones, pheromones, heat, cold, light, osmotic pressure, and the appearance or changes in the concentration of substances such as glucose, potassium ions, calcium ions, or cyclic adenosine monophosphate (cAMP) (Dibner et al., 2010; Dodd et al., 2010; Kim and Choi, 2010; Kim et al., 2010; Leung and Sharp, 2010; Mosenden and Tasken, 2011). In the flow of the signal transduction pathway, the extracellular signals are perceived by a transmembrane receptor. The receptor changes its own state from inactive to active and then triggers subsequent cellular processes. The active receptor stimulates an internal signaling cascade. This cascade frequently includes a series of changes in protein phosphorylation states. The changes affect downstream proteins across the nuclear membrane. Eventually, a transcription factor (TF) is activated or deactivated, which changes its binding activity to target genes that encode the corresponding proteins in response to extracellular signals or stresses. Therefore, signal transduction pathways can also be viewed as information-processing and transferring systems to control the gene activities of cells in response to stimuli (Tay et al., 2010).

With regards to information-processing and transferring systems, many studies have investigated the properties of signal transduction pathways, such as amplification(Little et al., 2011), specification (Corada et al., 2010), adaptive ultrasensitivity (Srividhya et al., 2011), oscillation (Waters et al., 2014), and synchronization (Liu et al., 2014). However, owing to the complex nature of dynamic networks, knowledge of their components and interactions is often not sufficient to interpret their system behavior. Therefore, it remains challenging to analyze the information flow in signal transduction pathways efficiently.

Information transmission ability was first expressed as a mathematical formula in Koshland et al. (1982). This report focused on the sensitivity amplification of signals, which is defined as the ratio of percent change in output response to percent change in input signals, i.e., the relative change in transduction system output with respect to a specific input. Signal amplification is also defined as the signal gain of the signal transduction pathways (Chen and Lin, 2012; Chen and Wu, 2012). Information flow is necessarily interpreted on a case-by-case basis, i.e., the signal transductivity measured in a signal transduction pathway is affected not only by the structure of the signal transduction system but also by the input to the signal transduction system.

In this study, the information flows of both coupling signal transduction pathways and the downstream gene regulatory network (GRN) were estimated by system identification techniques and a systems biology method using microarray data. First, the dynamic information flow model of coupling transduction pathways was identified by microarray temporal data. Then, based on a system theory about the discrete-time state-space model, the signal transductivity was estimated for each of the coupling transduction pathways from receptors at the membrane to TFs in the nucleus from a system gain perspective. When the microarray data were not time-profile data, but rather represented different samples at a single time point, the information flow for the coupling signal transduction pathway was also estimated based on a linear regression model and the recursive least square parameter estimation method.

In general, it is challenging to calculate the information transductivity from one gene to another in a GRN from graph theory, especially with a large directed graph (digraph). In this study, instead of using the digraph method, the regulatory dynamic model was rearranged as a dynamic state-space system with a single gene as an input and another as an output. Then, based on the state-space dynamic system theory and the recursive parameter estimation method, we estimated the information transductivity of a GRN from one gene to another from the microarray temporal data. Similarly, when the microarray data represented a single time point but different samples, we estimated the information transductivity between genes in the GRN based on the steady-state model.

Finally, we used microarray sample data to estimate the information flow in cancer-related signal transduction pathways and a GRN. We compared the information flows of signal transduction pathways between leukemia cancer cells and non-leukemia normal cells. Based on the information flow analysis, we traced back the main cause of systematic dysfunction of the related proteins in the signal transduction pathways. Furthermore, we also identified the systematic dysfunction of information flow between genes related to leukemogenesis. The methods proposed here are very useful for estimating signal transductivity or information transductivity in cellular systems using microarray temporal or sample data. Since the information flow model can also be identified using high-throughput data such as next-generation sequencing (NGS) or proteomic data, the proposed method can also be used to estimate signal transductivity and information transductivity from NGS or proteomic data efficiently in future.

System identification technologies for discrete-time systems and linear matrix inequalities are standard methods, which, when combined with system gain theories and microarray data to estimate the information flow in signaling pathways in cellular systems, provide a new method for measuring information flow in cellular networks using microarray data. To the best of our knowledge, this is the first study to quantify the information flow between large coupling signal transduction pathways and a complex GRN using corresponding microarray data.

The proposed method has the following limitations. (1) We use a linear model to approximately measure the information flow. (2) Proteomic data that comprise differential expressions are not discussed in this study. Although the information flow can be measured using a non-linear model, the parameter estimation of the networks will be more difficult due to the non-linearity of the model, and more samples of microarray data are required for the identification of more parameters in a non-linear model. Furthermore, the information flow also becomes more difficult to measure in coupling signaling pathways and a GRN. Additionally, the results measured by the proposed methods will be biased by microarray data noise. In other words, the variance of parameter estimation error, i.e., the bias of the proposed least square parameter estimation, is proportional to the variance of microarray data noise (Johansson, 1993).

## The Information Flow in Signal Transduction Pathways

### Estimation of Signal Transductivity by Microarray Temporal Data

For the coupling signal transduction pathways in Figure 1 throughout intercellular communication or cellular stress response, the receptors in the cell membrane sensed extracellular signals. *y _{i}*(

*t*) denotes the expression level of the

*i*th protein in the coupling signal transduction pathways. They are commuted to intracellular signals and sequences of reactions.

*u*(

_{i}*t*) denotes extracellular signals. Different external changes or events outside the cell may stimulate signaling. Typical extracellular signals are hormones, pheromones, heat, cold, light, osmotic pressure, and appearance or concentration change of substance such as glucose, potassium ion, calcium ion, or cAMP (Klipp, 2005; Lin et al., 2007; Dibner et al., 2010; Dodd et al., 2010; Kim and Choi, 2010; Kim et al., 2010; Leung and Sharp, 2010; Li and Chen, 2010; Mosenden and Tasken, 2011). The extracellular signals

*u*(

_{i}*t*) for

*i*= 1,…,

*l*are perceived by a transmembrane receptor, as depicted in Figure 1. The receptor changes its own state from susceptible to active and then triggers subsequent processes within the cell. The active receptor stimulates an internal signaling cascade. This cascade frequently includes a series of changes in protein phosphorylation states. The changes affect downstream proteins across the nuclear membrane. Eventually, the TF is activated or deactivated to change its binding activity to target genes. In the information flow chart of simple coupling transduction pathways in Figure 1,

*y*

_{13}(

*t*),

*y*

_{14}(

*t*),

*y*

_{15}(

*t*), and

*y*

_{16}(

*t*) represent the expression levels of terminal TFs in the simple coupling signal transduction pathways.

**Figure 1. Information flow chart depicting coupling signal transduction pathways that control gene activities**. *u*_{1}, *u*_{2}, *u*_{3}, and *u*_{4} are extracellular signals; *y*_{13}, *y*_{14}, *y*_{15}, and *y*_{16} are expression levels of TFs.

For the purpose of system identification for the information flow of coupling signal transduction pathways in Figure 1, a simple regression model for the expression level of the *i*th protein at time *t* + 1 can be described as follows:

where *y _{i}*(

*t*) indicates the expression level of the

*i*th protein at time

*t*;

*c*denotes the interaction ability between protein

_{i, j}*i*and protein

*j*;

*h*denotes the basal level of the

_{i}*i*th protein; and

*b*denotes the binding ability of extracellular signal

_{ij}*j*to the

*i*th protein. In general, extracellular signals always bind to the receptor proteins on the membrane.

Let us denote the state vector and system matrix of coupling signal transduction pathways of Figure 1 in Equation (1) as follows:

Then, the network of coupling signal transduction pathways in Figure 1 is represented by:

To exploit the effect of extracellular signals *u*(*t*) on the coupling signal transduction pathways, the effect of basal level *H* should be extracted from the dynamic network in Equation (2). Without consideration of extracellular signal *u*(*t*), the network of coupling signal transduction pathways in Equation (2) can be represented by:

which is only the effect of basal level *H* and noise *w*(*t*).

Let us denote $\tilde{{y}}$(*t*) = *y*(*t*)−$\widehat{{y}}$(*t*) and subtract Equation (3) from Equation (2). We then get

where $\tilde{{y}}$(*t*) denotes the information flow in the coupling signal transduction pathways due to extracellular signals *u*(*t*) in Figure 1.

In this case, the initial condition is at zero, $\tilde{{y}}$(0) = 0. The solution of the recursive dynamic equation in (4) is given by the following information flow equation (Ogata, 1987):

Obviously, if the system matrices *C* and *B* of the coupling signal transduction pathways can be identified from the microarray data, the information flow in the input/output response equation in (5) can be calculated.

Let us denote the signal transduction level ρ of the coupling signal transduction pathways from the extracellular signals *u*(*t*) to $\tilde{{y}}$(*t*) in Equation (4):

where *t _{p}* denotes the present time;

*l*

_{2}[0,

*t*] denotes the set of all possible extracellular signals with bounded energy in [0,

_{p}*t*]; and ρ denotes the upper bound of the signal transductivity of the coupling signal transduction pathways.

_{p}**Proposition 1:**

The coupling signal transduction pathways in Equation (4) have a signal transduction level ρ in Equation (6) if the following linear matrix inequality (LMI) holds for some positive definite matrix *P* = *P ^{T}* > 0

Proof: see Appendix A.

Since ρ is the upper bound of signal transductivity ρ_{0} of the coupling signal transduction pathways from extracellular signals *u*(*t*) to $\tilde{{y}}$(*t*)in Equation (4), the signal transductivity of the signal transduction network in Figure 1 can be obtained by solving the following LMI-constrained optimization problem:

The constraint optimization problem in Equation (8) can be easily solved by decreasing the upper bound ρ in Equation (7) until no *P* > 0 exists in Equation (7) by using the MATLAB LMI toolbox.

**Remark 1**: The signal transductivity ρ_{0} in Equation (8) is equivalent to the system gain from *u*(*t*) to $\tilde{{y}}$(*t* + 1) in Equation (4) (Boyd, 1994; Chiu and Chen, 2011):

from a system theory perspective.

If we want to know the information flow from *u*(*t*) to any protein, then the coupling signal transduction dynamic equation in (4) should be represented by

where $\tilde{{y}}$* _{i}*(

*t*) denotes the expression of the

*i*th protein and

*D*is a row vector with all zeros except 1 at the

_{i}*i*th element.

**Remark 2**: (i) The information flow from *u*(*t*) to $\tilde{{y}}$* _{i}*(

*t*) in the signal transduction dynamic equation in (10) is given by ${\tilde{{y}}}_{{i}}{(}{t}{+}{1}{)}{\text{}}{=}{\text{}}{\displaystyle {{\sum}}_{{j}{=}{0}}^{{t}}{{D}}_{{i}}{{C}}^{{t}{-}{j}}{B}{u}{(}{j}{)}{,}{\text{for}}{i}{=}{1}{,}{\text{}}{\dots}{,}{\text{}}{M}{.}}$

In this situation, the signal transductivity ρ^{i}_{0} from *u*(*t*) to the *i*th protein in the coupling signal transductivity pathways is solved by the following LMI-constrained optimization problem:

(ii) Further, if we only want to discuss the signal transductivity from the *l* extracellular signals to the *i*th protein, and we let *B _{l}* = [

*b*

_{1}

*…*

_{l}*b*]

_{Ml}*, the binding vector of the*

^{T}*l*th extracellular signal to the

*M*proteins of the coupling signal transduction pathway, it can be calculated by solving the following LMI-constrained optimization problem:

(iii) If the NGS data and proteomic data are available, epigenetic regulations, such as DNA methylation and histone modification, can be also involved in our model. The regulations are considered as the extra inhibition term −*B _{m}m*(

*t*) in Equation (2).

*m*(

*t*) denotes the expressions of DNA methylation or histone. In this case, Equation (2) is modified as:

We then obtain

It does not influence signal transductivity ρ_{0} from *u*(*t*) to *y*(*t*) in Equation (9) but will influence output signal *y*(*t*).

(iv) The above LMI-constrained optimization problem can be easily solved with the help of LMI solver in the MATLAB LMI toolbox by decreasing ρ until no *P* > 0 exists in the LMI. The LMI solver works essentially in four steps, initial guess, elimination of equality constraints, elimination of variables, and optimization. The pipeline description of the procedure to solve the LMI-constrained optimization problem in Equation (12) is available (Elghaoui et al., 1995).

Based on the above analyses, if the system matrices *C, H*, and *B* can be identified from the microarray temporal data or proteomic temporal data, the signal flow in Equation (5) and the signal transductivity ρ_{0} in Equation (8), (11), or (12) can be easily estimated by solving the corresponding LMI-constrained optimization problem. Therefore, as follows, we will focus on how to estimate these system matrices of coupling signal transduction pathways in Equation (2) from microarray data. In general, we do not identify *C, H*, and *B* from Equation (2) directly owing to its complex computation with much more round off error in the parameter identification process. We want to estimate these parameters protein-by-protein from Equation (1). From Equation (1), they can be represented by the following regression form:

By the recursive least square parameter estimation algorithm (Johansson, 1993),

This recursive least square parameter estimation can use microarray temporal data to update parameters protein-by-protein. Therefore, it can be used for real-time parameter estimation. If the number of time-profile data *y _{i}*(

*t*) is small, we can repeat several rounds of the recursive parameter estimation algorithm in Equation (14) with the previous result as initial parameter estimate θ

_{i}(0) and initial

*P*(0) to achieve the optimal parameter estimate. After parameter estimate θ

_{i}*from the least square parameter estimation algorithm in Equation (14) for all proteins in the coupling signal transduction pathways, i.e.,*

_{i}*i*= 1,…,

*M*, we can estimate the system matrices

*C, H*, and

*B*of both coupling signal transduction pathways in Equation (2) from microarray temporal data. By substituting these system parameters into the constrained optimization problem in Equation (8), (11), or (12), we can estimate different kinds of signal transductivities from the extracellular signals to proteins in the coupling signal transduction pathways through microarray temporal data.

### Estimation of Signal Transductivity by Microarray Sample Data

If the microarray data for measuring signal transductivity is of one time-point microarray from different samples, the regression model for the protein expression level of the *i*th protein in the coupling signal transduction pathways in Figure 1 cannot be represented by the discrete-time dynamic equation in (2) but can be represented by the following linear static regression form:

where *y*(*k*) = [*y*_{1}(*k*) … *y _{M}*(

*k*)]

*;*

^{T}*u*(

*k*) = [

*u*

_{1}(

*k*) …

*u*(

_{l}*k*)]

^{T};

*w*(

*k*) = [

*w*

_{1}(

*k*) …

*w*(

_{M}*k*)]

*;*

^{T}*C, H*, and

*B*are defined as in Equation (2); and

*k*= 1,…,

*K*denote the samples of microarray data.

In this situation,

From Equation (16), it is seen that the transduction function *T* from *u* to *y* is

Then, the signal transductivity is obtained as Boyd (1994):

where σ_{max}(·) denotes the maximum singular value.

Therefore, if we want to estimate the signal transduction function *T* in Equation (17) or the signal transductivity in Equation (18), we need to estimate the system matrices *C* and *B* in Equation (15) from the microarray sample data. From the linear regression model in Equation (15), we get:

The recursive least square parameter identification for θ* _{i}* of the

*i*th protein in Equation (19) with

*K*sample microarray is given by Johansson (1993):

If the sample number of microarray data is small, we can repeat several rounds of the recursive parameter estimation algorithm in Equation (20) with previous results as initial conditions θ* _{i}*(0) and

*P*(0) to achieve the optimal parameter estimate. After the parameters θ

_{i}*for*

_{i}*i*= 1,…,

*M*are estimated by the recursive least square estimation algorithm in Equation (20) through microarray data with

*K*samples, we can estimate

*C, H*, and

*B*in Equation (15). Then, the signal transduction function

*T*in Equation (17) or the signal transductivity ρ

_{0}in Equation (18) can be calculated through microarray sample data. If we only want to estimate the signal transduction from all extracellular signals

*u*(

*k*) to the

*i*th protein (or TF)

*y*(

_{i}*k*), then

*T*in Equation (17) should be replaced by:

where *D _{i}* is defined in Equation (10).

The signal transductivity from all extracellular signals *u*(*k*) to the *i*th protein is given by:

where || · ||_{2} denotes 2-norm.

**Remark 3**: If we only want to estimate the information flow from the *l*th extracellular signal *u _{l}*(

*k*) to the

*i*th protein (TF)

*y*(

_{i}*k*), then

*T*in Equation (21) should be replaced by:

and the signal transductivity from *u _{l}*(

*k*) to the

*i*th protein is given by:

where | · | denotes the absolute value.

## The Information Flow in a GRN

### Estimation of Information Transductivity of a GRN by Microarray Temporal Data

Consider the information flow of the GRN in Figure 2. The regulatory dynamics of the *i*th gene can be represented by the following regressive equation

where *x _{i}*(

*t*) denotes the gene expression level of the

*i*th gene;

*v*(

_{i}*t*) denotes noise and residue; and

*a*denotes the regulatory ability of gene

_{i, j}*j*on gene

*i*.

**Figure 2. Information flow in a GRN**. *x _{i}*(

*t*) denotes the gene expression of the

*i*th gene. Since the gene regulation is directional, the graph of the GRN is a directed graph (digraph). The signal processing model of the GRN is given in Equation (26) by microarray temporal data; the signal flow between any two genes is described by Equation (28) and the information transductivity can be obtained by solving Equation (30). If data are available from only one temporal sample microarray, the static state-space model in Equation (32) is used, and the information flow between any two genes is given by Equation (34), with information transductivity in Equation (36).

**Remark 4**: (i) In general, a positive value of *a _{i, j}* means the

*j*th gene is an active regulator while a negative value of

*a*means the

_{i, j}*j*th gene is an inhibitive regulator. (ii) The regulatory parameter

*a*, for

_{i, j}*j*= 1,…,

*n*, in Equation (24) can be identified by the recursive least square parameter estimation algorithm in Equation (14) through the corresponding microarray temporal data

*x*(

_{i}*t*), for

*i*= 1,…,

*n*(Chen and Wang, 2006).

Therefore, the GRN in Figure 2 can be represented as follows:

Suppose we want to estimate the information transductivity from gene *j* to gene *i*. In general, it is difficult to solve the information flow problem for the GRN in Figure 2 from the graph theory perspective (Kreyszig, 1993), especially for a digraph (directed graph) like Figure 2. In this study, an input/output state-space method is proposed to solve this difficult information flow problem of the digraph as follows. First, the dynamic model of the GRN in Equation (25) is represented by the following input/output dynamic state-space equation (Ogata, 1987):

In the input/output dynamic state-space system (Equation 27), *x _{j}*(

*t*) is considered an input signal and

*x*(

_{i}*t*) an output signal. Therefore, Equation (27) is simply represented by:

which is similar to the signal transduction dynamic equation in (10) but with the gene expression level *x _{j}*(

*t*) replacing the extracellular signal

*u*(

*t*).

By the system theory (Ogata, 1987; Johansson, 1993), the information flow from gene *j* to gene *i* in the GRN in Figure 2 is given by

Similar to solving the LMI-constrained optimization problem in Equation (12) for the information transductivity from the *l*th extracellular signal to the *j*th protein, the information transductivity γ_{0}* ^{i, j}* from gene

*j*to gene

*i*, i.e., the system gain from

*x*(

_{j}*t*) to

*x*(

_{i}*t*) γ

^{i, j}_{0}= ${\mathrm{sup}}{\text{}}\frac{{{\Vert}{{x}}_{{i}}{(}{t}{)}{\Vert}}_{{2}}}{{{\Vert}{{x}}_{{j}}{(}{t}{)}{\Vert}}_{{2}}}$, can be estimated by solving the following LMI-constrained optimization problem:

Based on the above analysis, we can easily calculate the regulatory information flow in Equation (29) and solve the LMI-constrained optimization in Equation (30) for the information transductivity γ_{0}* ^{i, j}* between any two genes in a GRN if we identify the system parameter

*A*in Equation (26) from the microarray temporal data.

### Estimation of Information Transductivity of a GRN by Microarray Sample Data

If the microarray data for measuring the information flow of the GRN in Figure 2 are of one time-point microarray from different samples, then the regression model for gene regulation in Equation (24) is modified to the following:

where *x*_{1}(*k*),…, *x _{n}*(

*k*) denote the gene expression levels of the GRN in Figure 2 at the

*k*th sample microarray. Therefore, the whole GRN in Figure 2 can be represented by:

where $\begin{array}{l}{X}{(}{k}{)}{\text{}}{=}{\text{}}{\left[}\begin{array}{c}{{x}}_{{1}}{(}{k}{)}\\ {\vdots}\\ {{x}}_{{n}}{(}{k}{)}\end{array}{\right]}{,}{\text{}}{v}{(}{k}{)}{\text{}}{=}{\text{}}{\left[}\begin{array}{c}{{v}}_{{1}}{(}{k}{)}\\ {\vdots}\\ {{v}}_{{n}}{(}{k}{)}\end{array}{\right]}\end{array}$, and $\begin{array}{l}{A}{\text{}}{=}{\text{}}{\left[}\begin{array}{ccc}{{a}}_{{1}{,}{\text{}}{1}}& {\cdots}& {{a}}_{{1}{,}{\text{}}{n}}\\ {\vdots}& {\ddots}& {\vdots}\\ {{a}}_{{n}{,}{\text{}}{1}}& {\cdots}& {{a}}_{{n}{,}{\text{}}{n}}\end{array}{\right]}\end{array}$.

Suppose we want to calculate the information flow from gene *j* to gene *i* of the GRN in Figure 2. Then, Equation (32) needs to be re-arranged as

where *A _{j}, B_{j}*, and

*D*are defined in Equations (27) and (28).

_{j}Then, from Equation (33), we can get the information flow from gene *j, x _{j}*(

*k*), to gene

*i, x*(

_{i}*k*), as follows:

Hence, the regulatory information transduction equation from gene *j* to gene *i* is given by:

Then the information transductivity from gene *j* to gene *i* is given by

Therefore, if we can identify regulatory parameter *a*_{i, 1},…, *a _{i, n}* in Equation (31) by the recursive least square estimation algorithm in Equation (20) through microarray data with

*K*samples for all genes of a GRN, we can identify the regulatory matrix

*A*in Equation (32) and then

*A*, and

_{j}, B_{j}*D*. In this situation, we can estimate the information transduction equation

_{i}*T*in (35) from gene

_{i, j}*j*to gene

*i*and information transductivity γ

_{0}

*in Equation (36) for any*

^{i, j}*i*and

*j*.

**Remark 5**: (i) MicroRNA-mediated repressions can be also involved in our model if NGS data is available. The repressions are considered as the extra inhibition term −*B _{m}m*(

*t*) in Equation (26).

*m*(

*t*) denotes the expressions of microRNAs. In this case, Equation (26) is modified as:

Then, Equation (28) is modified as:

Therefore, we obtain

In this case, the information transductivity γ_{0}* ^{i, j}* = ${\mathrm{sup}}{\text{}}\frac{{{\Vert}{{x}}_{{i}}{(}{t}{)}{\Vert}}_{{2}}}{{{\Vert}{{x}}_{{j}}{(}{t}{)}{\Vert}}_{{2}}}$ from gene

*j*to gene

*i*is the same as found in Equation (30). However, the gene expression

*x*(

_{i}*t*) with microRNA-mediated repressions is different from that without the repressions due to the extra inhibition term −

*B*(

_{m}m*t*) in the above equation. (ii) For the system identification of Equation (13), (19), (26), or (32), the noise term,

*w*or

_{i}*v*, is also model residue. In other words, it represents the modeling error and environmental noise. In general, it cannot be measured beforehand, and is corrupted in microarray data. After the system parameter

*θ*was estimated by the system identification in Equation (14), the noise could be estimated by the modeling error $\widehat{{w}}$ =

*y*−ϕ $\widehat{{\theta}}$, where $\widehat{{\theta}}$ is the estimate of

*θ*, from Equation (13) and Equation (19).

## Example of Calculating Information Flow in Signal Transduction Pathways and Control of a GRN

Following on from the analyses of signal transductivity of coupling signal transduction pathways based on protein–protein interaction data from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa and Goto, 2000; Kanehisa et al., 2014) and the information transductivity of a GRN based on the GRN from the TRANSFAC gene-regulation database (Matys et al., 2006) in the above section, we applied the microarray data for estimation of the signal transductivities of the signal transduction pathways in normal cells and cancer cells (Figure 3), and compared the differences between these to determine the dysfunction in information flow due to carcinogenesis. Furthermore, we estimated the information transductivities between different genes in a GRN in normal and cancer cells (Figure 4). The proteins with dysfunctional information flow in signal transduction pathways and a GRN can be considered as therapeutic targets. Using the microarray sample raw data from the Gene Expression Omnibus (GEO) database (accession number: GSE 13159) (Haferlach et al., 2010) as *y*(*k*) and the data of the corresponding ligand for each receptor protein as *u*(*k*) in Equation (15), we used the recursive least square parameter estimation method to identify the system parameters *C, H*, and *B* in Equation (15) of the coupling transduction pathways shown in Figure 3. The estimated signal transductivities from extracellular signals to 28 TFs in acute myeloid leukemia (AML) cancer cells and non-leukemia normal cells are given in Table 1. Among them, the signal transductivity of each protein in the MAPK and PI3K-SKT coupling pathways in normal and cancer cells is shown in Figure 5. The dysfunction in signal transductivity in a protein is mainly due to genetic mutations involved in the carcinogenesis. The effects of these genetic changes on cellular function, via signal transductivity changes in TFs, are also shown in Figure 5. Similarly, the signal transductivity of each protein in the MAPK and JAK-STAT coupling pathways in normal and cancer cells is shown in Figure 6. The effects of signal transductivity changes in TFs on cellular functions are also shown in Figure 6. The information transductivity of the GRN related to leukemia is shown in Figure 4, based on the GRN from the TRANSFAC gene-regulation database (Matys et al., 2006). By using microarray sample raw data (accession number: GSE 13159) (Haferlach et al., 2010) as *X*(*k*) to identify the system matrix *A* of the GRN in Equation (32), we estimated the information flow γ_{0}* ^{i, j}* in Equation (36) between any two genes

*i*and

*j*in the GRN from Equation (36). For example, the information flow γ

_{0}

*from*

^{i, j}*STATs*to

*c-Jun*is 0.3104 in AML cancer cells and 3.3952 in normal cells; the information flow γ

_{0}

*in Equation (36) from*

^{i, j}*CREBs*to

*p53*is 0.6276 in AML cancer cells and 0.2273 in normal cells. This clearly shows that the information flow in a GRN can be affected by leukemia. Additionally, according to the results of parameter estimation methods in AML and non-leukemia cells, the correlation coefficients of noise

*v*(

*k*) and the identified parameter

*A*in Equation (32) in AML cells almost lie within −0.2 and +0.2 (96.71%,), while those in non-leukemia cells almost lie within −0.2 and +0.2 (98.58%), i.e., the estimation parameters are almost uncorrelated with noises.

**Figure 3. The coupling signal transduction pathways that control gene activities related to leukemogenesis**. The figure consists of 147 groups of proteins and 18 groups of TFs. The downstream network in the nucleus denotes the gene regulatory network as shown in Figure 4, while the pathways in plasma denote protein-protein interactions. Other notations are defined and shown at the top of this figure.

**Figure 4. The information flow in the gene regulatory network related to leukemogenesis in the downstream of coupling signal transduction pathways in Figure 3**. Each TF regulates other genes and acts as a gene of TF regulated by other TFs. The 18 leukemogenesis-related TFs and the regulations are determined by the TRANSFAC gene-regulation database and shown as follows: AP-2α, β-catenin, C/EBPα, p300, c-Myc, CREBs (ATF-4/CREB1/Luman/OASIS/CREB3L2/CREB-H/ AIBZIP/CREBPA), CSLs(RBP-JK/RBPJL), E2Fs (E2F1/E2F2/E2F3), Elk-1, c-Fos, HIF-1α, c-Jun, LEF-1, NF-κBs (NF-κB1/NF-κB1-p50/NF-κB2-p52, NF-κB2/ NF-κB2-p52/c-Rel/RelA-p65/RelB), Smad3, STATs (STAT1/STAT3/STAT5), TCFs (TCF-1/TCF-3/TCF-4), and p53.

**Table 1. The signal transductivities of signal transduction pathways in Figure 2 from extracellular signals to 28 TFs in non-leukemia normal cells and acute myeloid leukemia (AML) cancer cells**.

**Figure 5. Signal transductivities of the proteins in p38 MAPK and PI3K-AKT coupling pathways contributing to loss of transductivity of TF p53 at the acute myeloid leukemia (AML) subtype when compared with normal type**. The solid line represents the PPIs in the plasma membrane, while the dot-and-dash line represents the interactive contribution from the other pathways. The proteins underlined in red play an important role in dysfunction of the downstream TF. The other notations are shown at the top of the Figure. Stars ★ denote the locations at which the genetic mutations occur.

**Figure 6. Signal transductivity of the proteins in JAK-STAT and MAPK coupling pathways contributing to the gain of transductivity of STATs and the loss of transductivity of CEBPα at the acute myeloid leukemia (AML) subtype**. The notations are the same as those in Figure 5. The bold solid line with a pink arrow denotes the demonstration of the directional signal flow between two pathways.

In order to clarify the proposed method for a broad audience, we use a flowchart in Figure 7 to simplify the estimation procedures of signal transductivity of coupling signaling pathways and information transductivity in a GRN. We also clarified the estimation procedure of the method proposed in this study. If we want to measure signal transductivity of coupling signal transduction pathways and information transductivity in a GRN, some technical specifications are required. First, we need microarray sample data (or temporal data) or NGS sample data (or temporal data) for normal or cancer cells. Second, we need recursive parameter estimation algorithm in Equation (12) or (20), and singular value decomposition and basic matrix operation for Equations (18), (23), and (36). Finally, LMITOOL is required for solving the LMI-constrained optimization problem in Equation (30) (Elghaoui et al., 1995).

**Figure 7. The estimation flowchart of the signal transductivity for coupling signaling pathways and information transductivity for a GRN**. The figure summarizes parameter estimations and transductivity measurements of information flows in this study.

In order to estimate the signal transductivities of the coupling signal transduction pathways (Figure 3) obtained from the KEGG database and the information transductivities of the GRN (Figure 4) obtained from the TRANSFAC database in non-leukemia normal cells and AML cancer cells, we applied the microarray sample data from non-leukemia normal cells or AML cancer cells to the Equations (15) and (32). We first identified the system parameters in the models by using the recursive least square parameter estimation algorithm in Equation (20). Finally, we use Equations (22) and (36) to estimate the signal transductivities of coupling signaling pathways in Table 1 and Figures 5, 6 and information transductivities in the GRN, respectively.

## Conclusion

In this study, the signal transductivity and information transductivity in cellular networks were estimated using microarray temporal data and a state-space model of signal processing systems. If the microarray data are obtained from different samples with a single time point, the static state-space model can also be developed to measure the information flow of cellular systems from multi-input extracellular signals to multi-output TFs in the coupling signal transduction pathways. Furthermore, we also proposed an input/output state-space signal model to overcome the difficulties of the digraph theory method in efficiently estimating the regulatory one-gene-to-another in a complex digraph network of a GRN. Finally, the proposed signal transductivity and information transductivity methods were applied to measure the signal transductivity of coupling signal transduction pathways and the information transductivity of a GRN related to cancer via microarray sample data. By comparing signal transductivity and information transductivity between cancer and normal cells, we were able to determine the systematic dysfunctions of proteins and genes in the signal transduction pathways or a GRN easily, using the proposed method and microarray data. Since the proposed information flow models can also be identified using high-throughput data such as NGS, real-time PCR, or proteomic data, the proposed model has great potential for efficiently estimating the signal transductivity and information transductivity of cellular systems using such data. However, we do not discuss the use of differential expression protein data here.

## Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Acknowledgments

The work was supported by the Ministry of Science and Technology of Taiwan under grant No. MOST 103-2745-E-007-001-ASP.

## References

Boyd, S. P. (1994). *Linear Matrix Inequalities in System and Control Theory*. Philadelphia, PA: Society for Industrial and Applied Mathematics. doi: 10.1137/1.9781611970777

Chen, B. S., and Lin, Y. P. (2012). On the information transmission ability of nonlinear stochastic dynamic networks. *Entropy* 14, 1652–1670. doi: 10.3390/e14091652

Chen, B. S., and Wang, Y. C. (2006). On the attenuation and amplification of molecular noise in genetic regulatory networks. *BMC Bioinformatics* 7:52. doi: 10.1186/1471-2105-7-52

Chen, B. S., and Wu, C. C. (2012). On the calculation of signal transduction ability of signaling transduction pathways in intracellular communication: systematic approach. *Bioinformatics* 28, 1604–1611. doi: 10.1093/bioinformatics/bts159

Chiu, W. Y., and Chen, B. S. (2011). Multisource prediction under nonlinear dynamics in WSNs using a Robust fuzzy approach. *IEEE Trans. Circuits Syst. I Regular Papers* 58, 137–149. doi: 10.1109/TCSI.2010.2055331

Corada, M., Nyqvist, D., Orsenigo, F., Caprini, A., Giampietro, C., Taketo, M. M., et al. (2010). The Wnt/beta-catenin pathway modulates vascular remodeling and specification by upregulating DII4/notch signaling. *Dev. Cell* 18, 938–949. doi: 10.1016/j.devcel.2010.05.006

Dibner, C., Schibler, U., and Albrecht, U. (2010). The mammalian circadian timing system: organization and coordination of central and peripheral clocks. *Annu. Rev. Physiol*. 72, 517–549. doi: 10.1146/annurev-physiol-021909-135821

Dodd, A. N., Kudla, J., and Sanders, D. (2010). The language of calcium signaling. *Annu. Rev. Plant Biol*. 61, 593–620. doi: 10.1146/annurev-arplant-070109-104628

Elghaoui, L., Nikoukhah, R., and Delebecque, F. (1995). “LMITOOL: a package for LMI optimization,” in *Proceedings of the 34th IEEE Conference on Decision and Control*. Vol. 1–4 (New Orleans, LA), 3096–3101.

Figueiras, A. N. L., Girotti, J. R., Mijailovsky, S. J., and Juarez, M. P. (2009). Epicuticular lipids induce aggregation in Chagas disease vectors. *Parasites Vectors* 2:8. doi: 10.1186/1756-3305-2-8

Haferlach, T., Kohlmann, A., Wieczorek, L., Basso, G., Kronnie, G. T., Bene, M. C., et al. (2010). Clinical utility of microarray-based gene expression profiling in the diagnosis and subclassification of leukemia: report from the International Microarray Innovations in Leukemia Study Group. *J. Clin. Oncol*. 28, 2529–2537. doi: 10.1200/JCO.2009.23.4732

Kanehisa, M., and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. *Nucleic Acids Res*. 28, 27–30. doi: 10.1093/nar/28.1.27

Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2014). Data, information, knowledge and principle: back to metabolism in KEGG. *Nucleic Acids Res*. 42, D199–D205. doi: 10.1093/nar/gkt1076

Kim, E. K., and Choi, E. J. (2010). Pathological roles of MAPK signaling pathways in human diseases. *Biochim. Biophys. Acta* 1802, 396–405. doi: 10.1016/j.bbadis.2009.12.009

Kim, T. H., Bohmer, M., Hu, H. H., Nishimura, N., and Schroeder, J. I. (2010). Guard cell signal transduction network: advances in understanding abscisic acid, CO_{2}, and Ca^{2+} Signaling. *Annu. Rev. Plant Biol*. 61, 561–591. doi: 10.1146/annurev-arplant-042809-112226

Klipp, E. (2005). *Systems Biology in Practice: Concepts, Implementation and Application*. Weinheim: Wiley-VCH. doi: 10.1002/3527603603

Koshland, D. E., Goldbeter, A., and Stock, J. B. (1982). Amplification and adaptation in regulatory and sensory systems. *Science* 217, 220–225. doi: 10.1126/science.7089556

Leung, A. K. L., and Sharp, P. A. (2010). MicroRNA functions in stress responses. *Mol. Cell* 40, 205–215. doi: 10.1016/j.molcel.2010.09.027

Li, C.-W., and Chen, B.-S. (2010). Identifying functional mechanisms of gene and protein regulatory networks in response to a broader range of environmental stresses. *Comp. Funct. Genomics* 2010:408705. doi: 10.1155/2010/408705

Lin, L. H., Lee, H. C., Li, W. H., and Chen, B. S. (2007). A systematic approach to detecting transcription factors in response to environmental stresses. *BMC Bioinformatics* 8:473. doi: 10.1186/1471-2105-8-473

Lipke, P. N., and Kurjan, J. (1992). Sexual agglutination in budding yeasts - structure, function, and regulation of adhesion glycoproteins. *Microbiol. Rev*. 56, 180–194.

Little, A. S., Balmanno, K., Sale, M. J., Newman, S., Dry, J. R., Hampson, M., et al. (2011). Amplification of the driving oncogene, KRAS or BRAF, underpins acquired resistance to MEK1/2 inhibitors in colorectal cancer cells. *Sci. Signal*. 4:ra17. doi: 10.1126/scisignal.2001752

Liu, J. Y., Rice, J. H., Chen, N. N., Baum, T. J., and Hewezi, T. (2014). Synchronization of developmental processes and defense signaling by growth regulating transcription factors. *PLOS ONE* 9:e98477. doi: 10.1371/journal.pone.0098477

Matys, V., Kel-Margoulis, O. V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., et al. (2006). TRANSFAC (R) and its module TRANSCompel (R): transcriptional gene regulation in eukaryotes. *Nucleic Acids Res*. 34, D108–D110. doi: 10.1093/nar/gkj143

Mosenden, R., and Tasken, K. (2011). Cyclic AMP-mediated immune regulation—overview of mechanisms of action in T cells. *Cell. Signal*. 23, 1009–1016. doi: 10.1016/j.cellsig.2010.11.018

O'day, D. H., and Keszei, A. (2012). Signalling and sex in the social amoebozoans. *Biol. Rev*. 87, 313–329. doi: 10.1111/j.1469-185X.2011.00200.x

Srividhya, J., Li, Y. F., and Pomerening, J. R. (2011). Open cascades as simple solutions to providing ultrasensitivity and adaptation in cellular signaling. *Phys. Biol*. 8:046005. doi: 10.1088/1478-3975/8/4/046005

Tay, S., Hughey, J. J., Lee, T. K., Lipniacki, T., Quake, S. R., and Covert, M. W. (2010). Single-cell NF-kappa B dynamics reveal digital activation and analogue information processing. *Nature* 466, 267–271. doi: 10.1038/nature09145

Waters, K. M., Cummings, B. S., Shankaran, H., Scholpa, N. E., and Weber, T. J. (2014). ERK oscillation-dependent gene expression patterns and deregulation by stress response. *Chem. Res. Toxicol*. 27, 1496–1503. doi: 10.1021/tx500085u

## Appendix

### Appendix A

Let us denote the Lyapunov function of the coupling signal transduction pathways in Equation (4) as *V*($\tilde{{y}}$(*t*)) = $\tilde{{y}}$* ^{T}*(

*t*)

*P*$\tilde{{y}}$(

*t*), for some symmetric positive definite matrix

*P*=

*P*> 0. We have:

^{T}If we assume

Then, Equation (A1) becomes

Summing the above inequality from *t* = 0 to *t _{p}*, we get

By the factor $\tilde{{y}}$(0) = 0, we get

which in Equation (6).

The above inequality holds if the assumption of the inequality in Equation (A2) holds, i.e.,

i.e., if the LMI in Equation (7) holds, then the coupling signal transduction pathways in Equation (4) have a transduction level (Equation 6).

Keywords: system theory, information flow, signal transductivity, information transductivity, microarray data, signal transduction pathway, gene regulatory network

Citation: Chen B-S and Li C-W (2015) Measuring information flow in cellular networks by the systems biology method through microarray data. *Front. Plant Sci*. **6**:390. doi: 10.3389/fpls.2015.00390

Received: 27 November 2014; Accepted: 15 May 2015;

Published: 02 June 2015.

Edited by:

José Díaz, Universidad Autónoma del Estado de Morelos, MexicoReviewed by:

Elena R. Alvarez-Buylla, Universidad Nacional Autónoma de Mexico, MexicoArmando Hernandez-Mendoza, Universidad Autonoma del Estado de Morelos-Facultad de Ciencias, Mexico

Copyright © 2015 Chen and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bor-Sen Chen, Laboratory of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, EECS 619, No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan, bschen@ee.nthu.edu.tw