Determining the Maximum States of the Ensemble Distribution of Boolean Networks

Inferring gene regulation mechanisms from gene expression patterns has become increasingly popular in recent years with the advent of microarray technology. Knowing the states of genes and their regulatory relationships would greatly help scientists investigate and understand the mechanisms of diseases. However, determining these relationships among several thousand genes remains a major challenge. Here, we simplify this complex gene state determination problem to the inference of the state distribution of an ensemble of Boolean networks (BNs). To investigate and calculate the distribution of the BNs' states, we first compute the probabilities of the different BNs' states and obtain the number of states Ω. Then, we find the most probable distribution of the number of the BNs' states and calculate the fluctuation of the distribution. Finally, two representative experiments are conducted to verify the obtained results. The proposed algorithm is conceptually concise and easily applicable to many other realistic models; furthermore, it is highly extensible to various situations.

number of gene expression states at the same time, which makes it difficult to infer the states of gene expression at a given time stamp.
Recently, Cheng et al. [11] proposed the semi-tensor product (STP) of matrices, which can not only represent a logical equation as an algebraic equation but also convert the dynamics of a Boolean control network (BCN) into a linear discrete-time control system. Based on this reformulation, many interesting properties have been obtained for BCNs [12][13][14][15][16]. Optimal control is an interesting topic in system control theory. Besides the STP technique, statistical methods have also been developed for solving problems in BNs. A Mayer-type optimal control problem for BCNs with multi-input and single-input has been well studied in Refs. [17] and [18], respectively. The states of biological networks and electronic networks are often influenced by instantaneous disturbances. In addition, they may experience abrupt changes at certain points because of switching phenomena and sudden noise, that is, impulsive effects. Impulsive dynamical networks have attracted the interest of many researchers for their various applications in information science, bioinformatics, and automated control systems.
There are many cells with the same function in an organ. However, it is hard to obtain the state of every single cell. In this study, we find that the states of a proportion of cells share one particular distribution. This can help biologists determine whether an illness is caused by changes in the cell state distribution.
From a biological standpoint, inferring gene regulation mechanisms from expression patterns is becoming increasingly important with the advent of DNA microarray technology. Thus, we need to obtain the ensemble distribution of the BNs and determine the states of genes, which is the key to further exploration of the expression profiles of thousands of genes. Specifically, in this study, we propose an algorithm for inferring the state distribution of the BNs. First, we compute the probability of the different BNs' states and get the value of Ω. Second, we find the most probable distribution of the number of BNs' states, as well as the function of this distribution. Finally, two representative experiments are conducted to verify the obtained results. Although practical genetic networks are different from the BNs in this study, the theoretical and practical results can be extended easily to real-world scenarios. Moreover, the proposed algorithm is highly extensible to various scenarios because of its computational simplicity.

The States of Boolean Networks
This section provides the background for Section 2.1. The only hypothesis concerns Ω: we assume that every state is equally probable, an assumption that is used in the derivations that follow.
First, we suppose that there are many Boolean networks in one group, and that the probability of each of the BNs' states is P:

$$P = \frac{1}{\Omega}, \tag{1}$$

where Ω is the number of states of the BNs. We let $\psi_j$ denote the jth state, $M_j$ the number of BNs in state $\psi_j$, and $E_j$ the weight of $\psi_j$. The total number of BNs is M, which is calculated as follows:

$$M = \sum_j M_j, \tag{2}$$

and the value of the cells E is given as

$$E = \sum_j M_j E_j. \tag{3}$$

Although we know the number of cells, it is difficult to determine the specific state of each cell, even if a distribution $\{M_j\}$ is given. For example, if there are three cells in state 1 and five cells in state 2, we do not know which three cells are in state 1 and which five cells are in state 2. Theorem 1 is given as follows to address this problem.
Theorem 1 Suppose that the number of BNs in the ensemble is M and the value of the ensemble is E. Given a distribution $\{M_j\}$, the number of states Ω is

$$\Omega = \frac{M!}{\prod_j M_j!}.$$

Proof: The system consists of M identical BNs, which have M! permutations. Given that the total number of BNs does not change, if n BNs switch from state 1 to state 2 (denoted $M_1 \to M_2$), the number $M_2$ increases by n, while the number $M_1$ decreases by n. Permuting BNs that are in the same state does not produce a new configuration, and the number of such permutations is $\prod_j M_j!$. Therefore,

$$\Omega = \frac{M!}{\prod_j M_j!}.$$

Two specific examples are given to illustrate Theorem 1 for an ensemble with 5 BNs. Thus, M = 5.
(1) We assume that there are three Boolean networks in state $j_1$ and two Boolean networks in state $j_2$; then Ω is

$$\Omega = \frac{5!}{3!\,2!} = 10.$$

(2) We assume that there are four Boolean networks in state $j_1$ and one Boolean network in state $j_2$; then Ω is

$$\Omega = \frac{5!}{4!\,1!} = 5.$$

The conclusion still holds even if M is very large.
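The counting in Theorem 1 is the multinomial coefficient Ω = M!/∏ M_j!, and it can be checked numerically. The following is a minimal sketch, not the authors' code; the function name `count_states` is a hypothetical helper introduced only for illustration.

```python
from math import factorial


def count_states(occupations):
    """Return Omega = M! / prod_j(M_j!) for a list of occupation numbers M_j.

    `count_states` is a hypothetical helper name, not taken from the paper.
    """
    total = sum(occupations)        # M, the number of BNs in the ensemble
    omega = factorial(total)
    for m_j in occupations:
        # Integer division is exact here: every partial quotient is an integer.
        omega //= factorial(m_j)
    return omega


print(count_states([3, 2]))  # example (1): prints 10
print(count_states([4, 1]))  # example (2): prints 5
```

Because Python integers have arbitrary precision, the same function also works for large M, at the cost of very large intermediate values.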

The Maximum Probabilistic Distribution of Boolean Networks
In this section, we study and prove the maximum probabilistic distribution of the Boolean network. The maximum probabilistic distribution is a Gaussian distribution, from which the distribution of the cells' states can be determined, as shown in Figure 1.
Frontiers in Physics | www.frontiersin.org November 2021 | Volume 9 | Article 690748
Even when E, M, and the distribution $\{M_j\}$ are given, it is not easy to determine the particular states that the BNs occupy. The most probable distribution requires further calculation. From Eq. 1, the more states a distribution contains, the more probable it is: the probability of each distribution of the ensemble networks is proportional to its number of BN states Ω. Thus, determining the maximum probability amounts to finding the maximum Ω. Under the constraints of Eqs 2 and 3, we can use differentiation to calculate the maximum. Two Lagrange multipliers α and β are introduced, and the condition for the peak can be written as follows:

$$\delta \ln \Omega - \alpha\,\delta M - \beta\,\delta E = 0. \tag{7}$$

To determine the probability, we need to assume that M is relatively large. In contrast, the data of the BNs do not need to be large. When M goes to infinity, $M_j$ also goes to infinity. For M ≫ 1, we can use Stirling's approximation

$$\ln M! \approx M \ln M - M. \tag{8}$$

Using Eq. 8 (the specific calculation process is shown in the Appendix), we can get the following equation:

$$\ln \Omega \approx M \ln M - \sum_j M_j \ln M_j. \tag{9}$$

When we compute the partial derivative of ln Ω, there are two ways to solve this problem (Eq. 8): one fixes M, while the other does not. The difference between the two solutions is a constant. To simplify the computation, the second way is used for Eq. 9.
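The quality of Stirling's approximation ln M! ≈ M ln M − M for large M can be checked directly. The sketch below is an illustration only; it uses the standard-library `math.lgamma` (which returns ln Γ(x), so `lgamma(m + 1)` equals ln m!), and the helper names are hypothetical.

```python
from math import lgamma, log


def ln_factorial(m):
    """Exact ln(m!) up to floating-point precision, via ln Gamma(m + 1)."""
    return lgamma(m + 1)


def stirling(m):
    """Leading-order Stirling approximation: m ln m - m."""
    return m * log(m) - m


def relative_error(m):
    """Relative error of the leading-order approximation for ln(m!)."""
    exact = ln_factorial(m)
    return (exact - stirling(m)) / exact


for m in (10, 100, 1000, 10000):
    print(m, ln_factorial(m), stirling(m), relative_error(m))
```

The relative error shrinks as m grows, which is why the derivation requires M ≫ 1.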
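Substituting Eqs 10-12 into Eq. 7, we get

$$\sum_j \left( \ln \frac{M_j}{M} + \alpha + \beta E_j \right) \delta M_j = 0.$$

Since the variations $\delta M_j$ are arbitrary, each bracket must vanish, so

$$M_j = M e^{-\alpha - \beta E_j}.$$

Given the number of BNs M, which sets the scale, we can obtain the distribution $\{M_j\}$, provided that the parameters α and β are specified in advance. To prove Theorem 2, two definitions are given as follows.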
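The variational step can be written out explicitly. The derivation below is a sketch under stated assumptions, not a transcription of the article's equations: it assumes the peak condition has the standard Lagrange form δ ln Ω − α δM − β δE = 0, the constraints M = Σ_j M_j and E = Σ_j M_j E_j, and the Stirling form of ln Ω. It shows both ways of taking the variation (M held fixed, and M allowed to vary) and that they differ only by a constant that can be absorbed into α.

```latex
\begin{align*}
\ln\Omega &\approx M\ln M - \sum_j M_j \ln M_j, \qquad M = \sum_j M_j, \\
\text{(M fixed)}\quad \delta\ln\Omega &= -\sum_j \left(\ln M_j + 1\right)\delta M_j, \\
\text{(M not fixed)}\quad \delta\ln\Omega
  &= (\ln M + 1)\sum_j \delta M_j - \sum_j \left(\ln M_j + 1\right)\delta M_j
   = -\sum_j \ln\frac{M_j}{M}\,\delta M_j, \\
\delta M &= \sum_j \delta M_j, \qquad \delta E = \sum_j E_j\,\delta M_j, \\
0 &= \delta\ln\Omega - \alpha\,\delta M - \beta\,\delta E
   = -\sum_j \left(\ln\frac{M_j}{M} + \alpha + \beta E_j\right)\delta M_j
   \;\Longrightarrow\; M_j = M\,e^{-\alpha-\beta E_j}.
\end{align*}
```

The two variations differ by the term (ln M + 1) Σ_j δM_j, which has the same form as α δM; choosing one or the other therefore only shifts α by the constant ln M + 1.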
Definition II.1 When $P_j$ is the most probable distribution, the probability of the system being in state j is

$$P_j = \frac{M_j}{M} = \frac{e^{-\beta E_j}}{Q}.$$

Definition II.2 The partition function [19] is

$$Q = \sum_j e^{-\beta E_j}, \tag{17}$$

where Q is the sum of the unnormalized weights $e^{-\beta E_j}$ of all the states. The partition function of Eq. 17 plays an important role as a normalization factor. The mean value can be written succinctly as $\bar{E}$ or explicitly as $\frac{1}{Q}\sum_j E_j e^{-\beta E_j}$; the latter form is better for clarity and for the following computation.

After the computation of $P_j$, α can be eliminated, and β can be expressed by the mean value $\bar{E}$:

$$\bar{E} = \frac{E}{M}.$$

From Eq. 3, it can be rewritten as

$$\bar{E} = \sum_j \frac{M_j}{M} E_j = \sum_j P_j E_j. \tag{19}$$

Substituting $P_j$ and Eq. 17 into Eq. 19, we get

$$\bar{E} = \frac{\sum_j E_j e^{-\beta E_j}}{\sum_j e^{-\beta E_j}}.$$

This result has the same form as that of a canonical ensemble. When $\bar{E}$ is given and M tends to infinity, $P_j$ and β do not depend on M.

Theorem 2 When H and E are given and M tends to infinity, the most probable distribution $\{M_j\}$ is the true distribution. In other words, the fluctuation is equal to 0.
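The role of the partition function as a normalization, and the way β is fixed by the mean value, can be illustrated numerically. The following is a minimal sketch assuming the exponential form P_j = e^{−βE_j}/Q with Q = Σ_j e^{−βE_j}; the weights `[0, 1, 2, 3]`, the bisection bounds, and all function names are hypothetical choices for illustration, not values from the article.

```python
from math import exp


def partition_function(weights, beta):
    """Q = sum_j exp(-beta * E_j)."""
    return sum(exp(-beta * e) for e in weights)


def state_probabilities(weights, beta):
    """P_j = exp(-beta * E_j) / Q, which sums to 1 by construction."""
    q = partition_function(weights, beta)
    return [exp(-beta * e) / q for e in weights]


def mean_weight(weights, beta):
    """Mean value sum_j P_j * E_j."""
    probs = state_probabilities(weights, beta)
    return sum(p * e for p, e in zip(probs, weights))


def solve_beta(weights, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Find beta by bisection so that the mean weight equals target_mean.

    The mean is strictly decreasing in beta when the weights are not all
    equal, so bisection on [lo, hi] converges (bounds are an assumption).
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_weight(weights, mid) > target_mean:
            lo = mid   # mean too large: beta must increase
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


weights = [0, 1, 2, 3]              # hypothetical weights E_j
beta = solve_beta(weights, 1.0)     # mean below the uniform value 1.5
print(beta, state_probabilities(weights, beta))
```

Note that neither `state_probabilities` nor `solve_beta` takes M as an argument, which mirrors the statement that P_j and β do not depend on M once the mean value is given.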
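Proof We consider the function

$$f(\{M_j\}) = \ln \Omega - \alpha \sum_j M_j - \beta \sum_j M_j E_j. \tag{21}$$

Since the second and third terms of f are linear functions of $M_j$, their second derivatives with respect to $M_j$ equal zero, so the curvature of f comes from ln Ω alone. With Stirling's approximation and M held fixed, $\partial^2 f / \partial M_j^2 = -1/M_j < 0$, which means the peak is stable. Let $\tilde{M}_j$ denote the value of $M_j$ at the peak and $\Delta M_j = M_j - \tilde{M}_j$. Expanding f in a Taylor series around $\{\tilde{M}_j\}$, the following equation is obtained:

$$f(\{M_j\}) = f(\{\tilde{M}_j\}) + \sum_j \left.\frac{\partial f}{\partial M_j}\right|_{\tilde{M}_j} \Delta M_j + \frac{1}{2} \sum_j \left.\frac{\partial^2 f}{\partial M_j^2}\right|_{\tilde{M}_j} (\Delta M_j)^2 + \cdots. \tag{23}$$

The peak of f satisfies

$$\left.\frac{\partial f}{\partial M_j}\right|_{\tilde{M}_j} = 0. \tag{24}$$

Substituting Eqs 21 and 24 into Eq. 23, we get

$$f(\{M_j\}) = f(\{\tilde{M}_j\}) - \frac{1}{2} \sum_j \frac{(\Delta M_j)^2}{\tilde{M}_j} + O\!\left(\frac{\Delta M}{M}\right).$$

The linear terms of f take the same value for every distribution that satisfies the constraints of Eqs 2 and 3, so the difference in f equals the difference in ln Ω. Ignoring the term $O(\Delta M / M)$, we get

$$\Omega(\{M_j\}) = \Omega(\{\tilde{M}_j\}) \exp\left[-\frac{1}{2} \sum_j \frac{(\Delta M_j)^2}{\tilde{M}_j}\right]. \tag{27}$$

Thus, we complete the proof of this theorem.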

The Fluctuation of the Distribution
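This section aims to prove that the cells cannot all be in the same state when the number of cells goes to infinity. It is easy to see that Eq. 27 is a Gaussian distribution. We now need to prove that Eq. 27 tends to a δ function, that is, that the fluctuation is eliminated when M → ∞. Here, Theorem 3 is provided as follows.

Theorem 3 When M → ∞, the value of the fluctuation tends to 0, that is,

$$\frac{\sqrt{\overline{(\Delta M_j)^2}}}{\tilde{M}_j} \to 0, \tag{28}$$

where $\tilde{M}_j$ is the most probable value of $M_j$ and $\Delta M_j = M_j - \tilde{M}_j$.

Proof: A Gaussian distribution of a variable x with mean $\bar{x}$ and standard deviation σ has the form

$$W(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \bar{x})^2}{2\sigma^2}\right]. \tag{29}$$

With $x = M_j$ and $\bar{x} = \tilde{M}_j$, Eq. 29 can be rewritten as

$$W(M_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left[-\frac{(\Delta M_j)^2}{2\sigma_j^2}\right]. \tag{30}$$

Comparing Eq. 27 with Eq. 30, we get $\sigma_j^2 = \overline{(\Delta M_j)^2} = \tilde{M}_j$, and substituting it into Eq. 28, there is

$$\frac{\sqrt{\overline{(\Delta M_j)^2}}}{\tilde{M}_j} = \frac{1}{\sqrt{\tilde{M}_j}} \to 0,$$

where M → ∞, so that $\tilde{M}_j \to \infty$. Hence, the proof of Theorem 3 is completed.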
When H and E are fixed and M → ∞, the distribution with the maximum probability is the true distribution.
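The vanishing of the relative fluctuation can be illustrated with a simpler, swapped-in model rather than the proof above: assume each of the M BNs is independently in state j with a fixed probability P_j, so that M_j is binomially distributed with mean M·P_j and variance M·P_j·(1 − P_j). The function name and the value P_j = 0.25 below are hypothetical choices for illustration.

```python
from math import sqrt


def relative_fluctuation(p_j, m):
    """std(M_j) / mean(M_j) for a binomial occupation number.

    Assumes independent BNs, each in state j with probability p_j:
    mean = m * p_j, variance = m * p_j * (1 - p_j).
    """
    return sqrt((1.0 - p_j) / (m * p_j))


# The relative fluctuation decays like 1 / sqrt(M).
for m in (10**2, 10**4, 10**6):
    print(m, relative_fluctuation(0.25, m))
```

Multiplying M by 100 divides the relative fluctuation by 10, the same 1/√M scaling that makes the Gaussian peak arbitrarily sharp as M → ∞.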

EXPERIMENTS
In this section, we analyze the model of the cells' state distribution, that is, Eq 26. Two experiments are conducted to illustrate the distribution of the BNs' states and to verify our conclusions. Since no practical data are available for the state changes of cells of the same type, we can only simulate the transformation process of these cells through Boolean networks, and we then perform extensive analyses of the simulated state changes of these cells.

A Boolean Network with 100 Cells
In this example, we choose the state change function of Ref. [17]. While the number of cells is 100, the number of the same Boolean is 1,000. The state change rule of the Boolean network is illustrated as follows: where $(x_1, x_2)$ indicates the cell's state, with each $x_i$ equal to 1 or 0, and the function indicates the state change rule. Hence, in this example, there are four states in 100 cells, and the state change rule is shown in Figure 2B.
Assume that the numbers of the four initial states in the cells are as shown in Table 1.
From Theorem 1, we can obtain the k combinations. We generate the particular network relationship between cells in a random manner, where each node represents a cell, and the edge indicates a connection between two cells. The probability of connecting the two cells is initialized as 0.05. The indicators of the association network between the cells are shown in the following table.
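The random construction of the cell network can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes that every pair of cells is connected independently with the stated probability 0.05 (an Erdős-Rényi style random graph), and the function name `random_cell_network`, the node count, and the seed are hypothetical.

```python
import random
from itertools import combinations


def random_cell_network(n_cells, p_connect, seed=None):
    """Return a list of undirected edges (u, v) with u < v.

    Assumption: each pair of cells is connected independently
    with probability p_connect.
    """
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in combinations(range(n_cells), 2)
        if rng.random() < p_connect
    ]


n_cells = 100                         # hypothetical network size
edges = random_cell_network(n_cells, 0.05, seed=0)

# Basic indicators of the network: edge count and average degree.
degree = [0] * n_cells
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
print(len(edges), sum(degree) / n_cells)
```

Under this assumption the expected number of edges is p·n(n − 1)/2, so indicators such as the average degree follow directly from the node count and the connection probability.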
From Figure 3 and Table 2, we get the basic characteristics of this cellular network: there are 1,000 nodes, 2,781 edges, and so on. The visualization of the network is shown in Figure 2B. In this figure, different node colors represent different cell states.
When the cell states change, the cells are initialized with a random state, and the influence of the states of other cells is modeled as well. If the number of identical states among the connected cells is greater than 10, the other cells skip the changed state and switch directly to the next state. Thus, the function Ω can be obtained as follows: (32) where Ω indicates the distribution, $P_i$, i = 1, 2, 3, 4, is the probability of the cell's state, M is the number of cells, and $M_j$ is the number of cells in the jth state. The state change rule shown in Figure 2A demonstrates that the end state is (0, 0), meaning that the cells reach the state (0, 0) twice. In addition, the states of the cells are randomly assigned. In Figure 2B, it is easy to find that the distribution reaches its mode when x = 50, showing that the most probable configuration of the collection of cells is the one in which all states are equally populated.
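The observation that the mode occurs when all states are equally populated can be checked by brute force on a small ensemble. The sketch below is illustrative only: it maximizes the multinomial count Ω = M!/∏ M_j! over all occupation numbers, using a hypothetical ensemble of M = 12 cells and four states rather than the experiment's parameters; the function names are also hypothetical.

```python
from math import factorial


def count_states(occupations):
    """Omega = M! / prod_j(M_j!) for occupation numbers M_j."""
    omega = factorial(sum(occupations))
    for m_j in occupations:
        omega //= factorial(m_j)
    return omega


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def most_probable_occupation(total, parts):
    """Occupation numbers that maximize Omega, found by exhaustive search."""
    return max(compositions(total, parts), key=count_states)


best = most_probable_occupation(12, 4)
print(best, count_states(best))  # prints (3, 3, 3, 3) 369600
```

Exhaustive search is only practical for small M; for large ensembles the Lagrange-multiplier argument gives the maximum analytically.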

A Boolean Network with 150 Cells
In this example, we choose a state change function similar to the previously reported one [12]. Here, the number of cells is 500, meaning the number of the same Boolean is 500. The state change rule of the Boolean network can be formulated as follows: where $(x_1, x_2, x_3)$ indicates the cell's state, with each $x_i$ equal to 1 or 0, and the function is the state change rule. In this example, there are eight states in 450 cells, and the state change rule is shown in Figure 4B.
Assume that the number of four initial states in the cells is as shown in Table 3.
We generate a network of relationships among the cells in a random manner, where each node represents a cell and each edge indicates a connection between two cells; the probability of connecting two cells is 0.05. The indicators of the association network between the cells are shown in Table 4.
From Figure 4 and Table 3, we get the basic characteristics of this cellular network: there are 450 nodes, 1,890 edges, and so on. The visualization of the network is shown in Figure 5B.
Here, different node colors represent different cell states.
The state change rule is shown in Figure 5A; the end state is (0, 0, 1), meaning that the cells reach the state (0, 0, 1) twice, and the states of the cells are randomly assigned. In Figure 5B, it is easy to find that the distribution reaches its peak when x ≃ 18. This means that the most probable configuration of the collection of cells is the one in which all eight states are equally populated, with approximately 18 cells in each state.
From these two experiments, we verify that the distribution of these states is a Gaussian distribution and that these cells cannot all be in the same state when the number of cells approaches infinity. Thus, the experiments support the above theorems.

CONCLUSION
In this article, we study and calculate the distribution of the Boolean networks' states. First, we compute the probability of the different BNs' states and get the value of Ω; then we find the most probable distribution of the number of BNs' states. Furthermore, we calculate the fluctuation of the distribution. Finally, two representative experiments are conducted to verify the obtained results. Although real genetic networks are different from the BNs, the theoretical and practical results in this study may be extended to more realistic models. Since the proposed algorithm is conceptually concise and efficient, it is highly extensible to various situations.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
XC drafted the idea. ZL did the derivation, while BR drafted the manuscript. All authors have read through the manuscript.