Functional Relationships between Genes Associated with Differentiation Potential of Aged Myogenic Progenitors

Aging is accompanied by considerable heterogeneity with possible co-expression of differentiation pathways. The present study investigates the interplay between crucial myogenic, adipogenic, and Wnt-related genes orchestrating aged myogenic progenitor differentiation (AMPD) using clonal gene expression profiling in conjunction with Bayesian structure learning (BSL) techniques. The expression of three myogenic regulatory factor genes (Myogenin, Myf-5, MyoD1), four genes involved in regulating adipogenic potential (C/EBPα, DDIT3, FoxC2, PPARγ), and two genes in the Wnt signaling pathway (Lrp5, Wnt5a) known to influence both differentiation programs were determined across 34 clones by quantitative reverse transcriptase polymerase chain reaction (qRT-PCR). Three control genes were used for normalization of the clonal expression data (18S, GAPDH, and B2M). Constraint-based BSL techniques, namely (a) PC Algorithm, (b) Grow-shrink (GS) algorithm, and (c) Incremental Association Markov Blanket (IAMB) were used to model the functional relationships (FRs) in the form of acyclic networks from the clonal expression profiles. A novel resampling approach that obviates the need for a user-defined confidence threshold is proposed to identify statistically significant FRs at small sample sizes. Interestingly, the resulting acyclic network consisted of FRs corresponding to myogenic, adipogenic, Wnt-related genes and their interaction. A significant number of these FRs were robust to normalization across the three house-keeping genes and the choice of the BSL technique. The results presented elucidate the delicate balance between differentiation pathways (i.e., myogenic as well as adipogenic) and possible cross-talk between pathways in AMPD.

expressed simultaneously (Taylor-Jones et al., 2002), (v) microenvironment: expression across sub-populations of cells governed by their immediate surrounding (Allinen et al., 2004) and (vi) multiple pathways to achieve the same end phenotype (Madras et al., 2002). Probabilistic techniques such as Bayesian structure learning (BSL) accommodate the uncertainty and noisiness in gene expression (Friedman et al., 2000), hence a natural choice for modeling functional relationships (FRs) from heterogeneous populations. FRs represent regulatory connections between the genes inferred from their expression profiles (Gardner et al., 2003). Understanding FRs can provide insight into the concerted working of genes as a system as opposed to independent entities.
Transcriptional regulatory networks controlling both myogenic and adipogenic differentiation are continuously being expanded and refined. The myogenic regulatory factors, MyoD, myogenin, Myf-5 and MRF4 clearly are involved in controlling myogenic progenitor specification and subsequent differentiation both during development and during muscle regeneration postnatally (Chargé and Rudnicki, 2003). Although under different experimental settings, each factor can transactivate the expression of the other,

IntroductIon
Classical gene expression analysis on mass cultures implicitly assumes homogeneity in expression across a given population of cells. However, there has been accumulating evidence of considerable heterogeneity in expression (Loeffler and Roeder, 2002;Madras et al., 2002;Taylor-Jones et al., 2002;Beggs et al., 2004;Nagarajan et al., 2004;Bahar et al., 2006;Somel et al., 2006). Some of these studies have also emphasized the nexus between aging and heterogeneity in expression (Taylor-Jones et al., 2002;Beggs et al., 2004;Bahar et al., 2006;Somel et al., 2006). Several factors can contribute to heterogeneity in expression, these include (i) noise: stochastic mechanisms at the dynamical and measurement levels inherent in gene expression estimation (Kaern et al., 2005) (ii) plasticity: commitment toward a specific phenotype re-directed to another phenotype by the activation and/or repression of certain crucial genes (Hu et al., 1995). (iii) de-differentiation: differentiated cells revert to a less-differentiated (i.e., more proliferative) phase, resulting in considerable overlap between proliferation and differentiation (McGann et al., 2001), (iv) co-expression of differentiation pathways: gene programs corresponding to multiple differentiation pathways Nagarajan et al. Functional relationships in muscle differentiation

Myogenic progenitor data generation protocol
Myogenic progenitors were isolated from the tibialis anterior of four aged (23 months) DBA/2JNIA mice as described (11) The relative standard curve method was used to calculate the amplification difference between the samples for each primer set. This is performed by correcting for each signal concentration with the concentration of the controls.

BayesIan structure LearnIng
Constraint-based algorithms are all based on the seminal work by Verma and Pearl (1991). The present study uses three constraintbased BSL algorithms [PC, Grow-Shrink (GS), and Incremental they are expressed in unique patterns in muscle and do not have completely overlapping functions. Similarly, the C/EBP family of transcription factors and PPARγ cross-activate expression to control adipocyte gene expression (Rosen et al., 2000). Typically myogenic and adipogenic differentiation pathways are considered non-overlapping. However, we have shown that myogenic progenitors from aged mice co-express some aspects of both myogenic and adipogenic gene programs (Taylor-Jones et al., 2002), with the balance apparently regulated by Wnt signaling (Vertino et al., 2005). However, to our knowledge there has been limited effort in understanding their interaction. Wnt ligands have been shown to promote myogenic commitment in adult muscle progenitor cells (Polesskaya et al., 2003) and appear to be essential for differentiation of epaxial muscles during development, potentially through regulating MyoD and Myf5 gene expression (Tajbakhsh et al., 1998). However, different Wnts have distinct effects during limb myogenic differentiation (Anakwe et al., 2003) and may activate myogenesis via canonical, β-catenin-dependent and non-canonical pathways (Brunelli et al., 2007). By contrast, Wnt signaling appears to inhibit adipogenic differentiation, potentially by downregulating PPARγ activity (Ross et al., 2000). More recently Wnt signaling has also been implicated in regulating myogenic progenitor fate during aging that may contribute to fibrosis (Brack et al., 2007;Hidestrand et al., 2008). Bayesian structure learning (Pearl, 1988;Glymour and Cooper, 1999) represents the joint distribution of a given set of random variables as product of conditional probability distributions. Several articles have used BSL for modeling FRs (Friedman et al., 2000) from gene expression profiles. Recent studies (Sachs et al., 2005) also used BSL successfully to model FRs from translational and post-translational expression profiles of single cells. It is important to note that BSL implicitly models the FRs in the form acyclic network, hence does not accommodate feedback structure. Nevertheless, these abstractions have been useful in obtaining preliminary insights and system-level understanding. BSL can also be useful in capturing the cause and effect relationships between the variables of interest under certain implicit assumptions such as the causal-Markov assumption and faithfulness (Glymour and Cooper, 1999).

MaterIaLs and Methods cLonaL gene expressIon data
The clonal gene expression data were generated from RNA isolated from 34 clones of myogenic progenitors obtained from 24-monthold mice, cultured to confluence and allowed to differentiate for 24 h. Real-time RT-PCR was used to quantify expression of 12 genes (nine genes of interest and three controls) across the 34 clones. The 12 genes consisted of: Myogenic regulatory factors: MyoD1, Myogenin and Myf-5; Adipogenesis related genes: FoxC2 (Forkhead box C2), DDIT3 (DNA damage inducible transcript 3), C/EBP (CCAAT/enhancer binding protein alpha), PPAR (Peroxisome proliferator-activated receptor gamma); Wnt signaling related genes: Wnt5a (wingless-type mmtv integration site family member 5A) and Lrp5 (Low-density lipoprotein receptorrelated protein 5) and Control genes: GAPDH (Glyceraldehyde-3-phosphate dehydrogenase), 18S (Ribosomal RNA) and B2M (Beta-2 microglobulin). Nagarajan et al. Functional relationships in muscle differentiation

Algorithm
Given: Empirical sample X m×n , representing the expression of m independent clones (replicates) of n genes, Bayesian structure learning technique (B).
Step 1: GenerateX m n r × by row-wise resampling of X m × n with replacement (i.e., each time sample a row from X m × n with replacement). Learn the network from X m n r × using BSL technique (B). Determine the corresponding partially directed acyclic graph (PDAG)Π r (Chickering, 1995).
Step 2: Generate X m n p × by randomly permuting the values in each column in X m×n . Learn the network structure from X p using the BSL technique (B). Determine the corresponding partially directed acyclic graph (PDAG)Π p (Chickering, 1995).
Step 3: Repeat Steps 1 and 2 n s times resulting in partially directed acyclic graphs Π g r and Π g p , g = 1…n s .
Step 4: Determine the frequency of the edges (i.e., confidence, Friedman et al., 2000) in the resampled graphs in Step 3 from Π r , ( , , , and in the randomly permuted coun- Here, the distribution f ij p essentially represents the noise-floor.
Step 5: This has to be contrasted to the ad hoc threshold approach where The proposed algorithm is essentially a non-parametric bootstrap (Efron and Tibshirani, 1993) that estimates the joint empirical distribution of the edge frequencies from the given data and compares it to the null distribution of edge frequencies obtained from the randomly permuted counterpart. The correlation structure across the replicates is destroyed in the random permuted counterparts, hence the edge frequencies f gh p essentially represent the noise-floor (Step 4). It should be noted that the cumulative distribution function of f ij r is stochastically larger than that of f gh p (Lehmann and Romano, 2005) since the latter corresponds to the absence of FRs under the null hypothesis. Therefore, large values of f ij r should result in rejection of the null hypothesis and existence of a significant FR, i → j. The use of random permutations does not require additional assumptions since the gene expression measurement across the replicate clones is generated independently. In such a setting, inference is also exact conditionally on the observed sample, which is always a sufficient statistic for the underlying joint distribution (Pesarin, 2001). Alternatively, this conditioning on the empirical distribution of the observed data makes the tests invariant to underlying statistical distribution of the data which may be partially or even completely unknown and thus obviates the need of asymptotic or approximate solutions. Therefore, the performance of the tests can be solely judged by their statistical properties (i.e., power, consistency, unbiasedness etc.) and the number of permutations. No assumption on the distribution of the population is required.
The proposed approach is also faster compared to incorporating permutation tests directly in the learning algorithm (Edwards, 2000;Legendre, 2000), since in general it's not feasible to reuse the same permutations of the data across different data sets. Thus the latter approach requires on an average 0(n 2 ).n s permutations while the former requires only n s permutations overall.
Association Markov Blanket (IAMB)] to learn the structure from the given clonal expression data. The Inductive Causation (IC) algorithm (Verma and Pearl, 1991), which provides the theoretical framework for learning the structure of Bayesian networks as a form of causal model, can be summarized in three steps: (i) learn the undirected graphical structure underlying the Bayesian network; arcs correspond to direct probabilistic dependencies between the incident nodes, (ii) set the direction of the arcs that are part of a v-structure (a triplet of nodes incident on a converging connection, i.e., B → A ← C), which is identified by the fact that the parent nodes are never independent given their child (i.e., B and C are dependent conditional on every subset S including A), and (iii) set the direction of the other arcs as needed to obtain a directed acyclic graph. The PC algorithm (Spirtes et al., 2000) is a straightforward implementation of the IC algorithm. It performs independence tests on all pairs of variables conditional on all possible subsets of the remaining variables to complete the first step. Conditioning sets are considered in order of increasing dimension, so that in many cases a small number of low-order tests is sufficient to establish whether two variables are independent. The computational complexity of the algorithm still increases rapidly with the number of variables, to the point that PC can only be applied to small or very sparse networks. The GS algorithm (Margaritis, 2003) improves on PC by learning the Markov Blanket of each variable (which includes the parents, the children and the variables which share a child with that particular variable) first. This preliminary selection limits both the number and the size of the subsets considered in the conditional tests, and results in a lower computational complexity without compromising the accuracy of the resulting network. The IAMB algorithm (Tsamardinos et al., 2003) is similar to GS, but uses the heuristic of the same name which consists of a forward selection instead of a simple sequential selection to learn the Markov Blankets. This approach leads to fewer false positives being included in the Markov Blankets, thus improving the quality of the network structure and further reducing the number of conditional tests. The algorithms used in the present study are implemented in open-source R (R Development Core Team, 2009) and publicly available (Chavan et al., 2009;Kalisch and Maechler, 2009;Scutari, 2009). a resaMpLIng approach to deterMIne statIstIcaLLy sIgnIfIcant frs In Friedman et al. (1999) and Friedman et al. (2000), statistically significant FRs in the learned acyclic graph was chosen as those whose confidence is greater than a pre-defined threshold (0 ≤ θ ≤ 1). Confidence was defined as the frequency of a given FR across independent acyclic networks learned from samples generated from the given empirical sample by bootstrapping with replacement (Friedman et al., 2000). Related studies (Friedman et al., 1999;Husmeier, 2003) have also emphasized the impact of the pre-defined threshold on the conclusions and the network structure. The choice of a user-defined threshold becomes especially challenging at small sample sizes. In the present study, we propose a straightforward approach that obviates the need for a pre-defined threshold in identifying statistically significant FRs in the acyclic network.
respectively. Of interest is to note that the results obtained using the proposed algorithm are considerably better than the ones for several choices of threshold (θ = 0.50, 0.75, and 0.95). The performance of the threshold (θ = 0.05, 0.25) were comparable to that of the proposed algorithm with low FPR and high TPR. Thus it is possible a statistically motivated choice of threshold may perform well. However, given the infinite possibilities θ ∈ [0, 1], the chances of picking the correct threshold are slim. Subsequently, we repeated the analysis using a reduced sample size (N = 34) similar to that of the myogenic progenitor differentiation data (i.e., N = 34). It should be noted that this is a (∼147 = 5000/34) fold reduction in the sample size. The frequency (confidence, Friedman et al., 1999Friedman et al., , 2000 is likely to reduce considerably. Therefore, the ad hoc threshold chosen for (N = 5000) may not be appropriate for (N = 34). We also compared the (FPR, TPR) obtained using the proposed algorithm and the ad hoc threshold, Figures 1D-F. As expected, the overall TPR for (N = 34), Figures 1D-F was found to be lower than that for (N = 5000). Interestingly, the FPR for (N = 34) was also lesser than that of (N = 5000). This may be due to the small number of nodes in the network. More importantly, the proposed algorithm seemed to have relatively higher TPR than the ad hoc threshold (θ = 0.50, 0.75, and 0.95), Figures 1D-F. There was no appreciable change in the FPR between the various threshold choices.

osteoprogenItor dIfferentIatIon
A recent study (Madras et al., 2002), established the inherent probabilistic mechanism underlying osteoprogenitor differentiation by investigating the expression of eight genes including bone-related genes and cytokine receptors (COLL1, OCN, ALP, BSP, FGFR1,

resuLts synthetIc data
Prior to the analysis of the experimental gene expression data we tested the proposed algorithm on data sampled from the ASIA network (Lauritzen and Spiegelhalter, 1988). The choice of ASIA network can be attributed to its number of nodes (8 nodes), which is comparable to that of the myogenic progenitor differentiation data (i.e., nine genes). The false-positive and false-negative rates (FPR, FNR) were used as performance indicators. The (FPR, FNR) obtained using the proposed algorithm were compared to those obtained using ad hoc threshold, Figure 1. This was accomplished as follows: (i) Generate the PDAG (Chickering, 1995) of the true ASIA network (Σ o ). (ii) Identify statistically significant edges (Σ 1 ) from the given empirical sample using the proposed algorithm. (iii) Identify statistically significant edges (Σ 2 ) from the given empirical sample using a statistically motivated threshold θ, (0 ≤ θ ≤ 1). (iv) Compute (FPR, TPR) from (Σ o , Σ 1 ). Compute (FPR, TPR) from (Σ o , Σ 2 ).
Repeat steps (ii-v) separately using three BSL techniques (PC, GS, IAMB) with correlation metric across samples sizes (N = 5000, N = 34) and (n s = 1000). In the present study we used ad hoc thresholds ( techniques (PC, GS and IAMB) and Pearson correlation metric are shown in Figure 3. The results obtained by normalization across the three control genes (GAPDH, 18S, and B2M) are shown separately in Figures 3A-C respectively. Of interest, is to note the considerable similarity in the results obtained across GAPDH and B2M. FRs that was persistent across the three BSL techniques and across the three control genes are shown in Figure 3D and consists of genes related to myogenic as well as adipogenic pathway and their interaction. The undirected nature of the network, Figure 3, can be attributed to the small sample size (34 clones) and equivalent classes (Chickering, 1995), an inherent feature of BSL techniques. Nevertheless, the results presented un-equivocally supports the co-expression of differentiation pathways and their cross-talk in AMPD. On a related note, equivalence classes may also imply possible existence of multiple networks that may be biologically significant.

FRs corresponding to myogenic related genes
The FR MyoD1-Myogenin validates the approach as these two genes transactivate the expression of the other and promote musclespecific gene expression (reviewed in Chargé and Rudnicki, 2003). The FR Myogenin-C/EBPα is unexpected given that overexpression of C/EBPα in muscle cell lines results in down-regulation of Myogenin (Hu et al., 1995). The aged muscle cell environment appears to alter the regulation of these transcription factors resulting in co-expression of some aspects of both myogenic and adipogenic differentiation. This FR warrants further study.

FRs corresponding to Wnt-related genes
The FR Lrp5-Wnt5a may be indicative of activation of the canonical Wnt/β-catenin pathway through binding of Wnt5a to Frizzled 4 (Fzd4) receptor and Lrp5 co-receptors (Mikels and Nusse, 2006). Coordinate regulation of Wnt5a/Fzd4/Lrp5 has been demonstrated during the first phase of hematopoiesis suggesting that these genes are involved in early hematopoietic differentiation (Corrigan et al., 2009). Canonical Wnt signaling has clearly been shown to activate both Myf-5 and MyoD expression during development (Tajbakhsh et al., 1998) and to PTH1R, PTHrP, and PDGFRα) across 99 colonies. Subsequently, we had used BSL techniques in conjunction with ad hoc threshold to identify critical FRs from this data . There are two reasons why we chose to re-investigate this data (i) The experimental design of the myogenic progenitor differentiation is similar to that of osteoprogenitor differentiation (ii) we eliminate the need for ad hoc threshold using the proposed algorithm minimizing ambiguity in the conclusion and may possibly identify biologically relevant and novel FRs. Since the sample size of the myogenic progenitor differentiation data is 34, we present the results on the osteoprogenitor differentiation at samples sizes 99 as well as 34. Functional relationships deemed as statistically significant across (PC, GS, IAMB) with Pearson χ 2 test for conditional independence and α = 0.05 for sample sizes (N = 99) and (N = 34) are shown in Figures 2A,B respectively. The reduced sample size (34) was generated by row-wise resampling without replacement from the given 99 colonies. Of interest is to note that three of the FRs (COLL-ALP; BSP-ALP; ALP-OCN) reported in the earlier study using an ad hoc threshold  were deemed as significant using the proposed algorithm across (N = 99), Figure 2A, as well as (N = 34), Figure 2B. In addition, four new FRs (COLL1-FGFR1; FGFR1-BSP; FGFR1-PTH1R; PTH1R-PDGFRα) were also identified across (N = 99) and (N = 34), Figures 2A,B.

MyogenIc progenItor dIfferentIatIon
The expression of nine genes (n = 9) across 34 clones (m = 34) was generated during myogenic progenitor differentiation. These nine genes, Figure 3, consisted of myogenic related genes (Myogenin, MyoD1, Myf-5), adipogenic related genes (CEBPα, PPARγ, DDIT3, FoxC2), and Wnt-related genes (Wnt5a, LRP5). In order to assess reproducibility of the results, normalization was carried out independently across three control genes (GAPDH, 18S, and B2M). The proposed algorithm in conjunction with three BSL techniques (PC, GS, and IAMB) and linear correlation to test for conditional independence (α = 0.05) were used to model the network structure. Only those FRs that were deemed as significant across all the BSL FRs identified in an earlier study  using a statistically motivated threshold and new FRs identified using the proposed algorithm are shown in black and gray respectively. cell phenotype. Further, reduced expression of FoxC2 has been reported to be associated with reduced expression of brown adipogenic genes in humans (Yang et al., 2003) On the other hand, adipocyte-specific overexpression of FoxC2 in transgenic mice resulted in increased adipocyte expression of PPARγ (Kim et al., 2005). Thus, results in adipose tissue are equivocal and difficult to extrapolate to muscle tissue. FoxC2 is detectable in human muscle, but at low levels compared to adipose (Di Gregorio et al., 2005). The dependency of FoxC2-C/EBPα and DDIT3, a gene regulated by C/EBPα, in aged myogenic progenitors, suggests that regulation of adipocyte-specific gene expression may be altered at the molecular level as a function of age. FoxC2 may modulate the relative balance of adipogenic transcriptional activity such that, as proposed for adipocytes, some adipogenic genes are expressed in myogenic progenitors but differentiation does not progress to a fully mature adipocyte (Guan et al., 2002).

dIscussIon
The present study investigated possible co-expression of differentiation pathways in aged myogenic progenitor differentiation (AMPD) using clonal gene expression profiling and BSL techniques. FRs between critical myogenic, adipogenic, Wnt-related genes and their interaction were identified. While the FRs identified in the present study may not necessarily represent direct relationship, they clearly establish the orchestration of differentiation pathways in AMPD and their interaction. The study also demonstrates the promote myogenic differentiation during regeneration (Brack et al., 2007), which may justify the FRs MyoD1-LRP5 and Myf-5-LRP5. During aging, Wnt signaling also appears to regulate adipogenic gene expression in myogenic progenitors (Taylor-Jones et al., 2002;Vertino et al., 2005) and to mediate conversion of myogenic progenitors to a fibroblast phenotype (Brack et al., 2007;Hidestrand et al., 2008). We hypothesize that the distinct function of Wnt signaling in aged muscle may be mediated via Lrp5. Whereas Lrp5 function in muscle has been under-studied, it has been well-characterized in regulation of bone mass, where loss of Lrp5 results in low bone mineral density, presumably by decreasing the bone-forming activity of osteoblasts (reviewed in Krishnan et al., 2006).

FRs corresponding to adipogenic related genes
These included FoxC2-C/EBPα, FoxC2-DDIT3, and FoxC2-PPARγ, Figure 3D. Normalization with respect to the control genes GAPDH and B2M also revealed the widely documented FR across adipogenic transcription factors C/EBPα and PPARγ. FoxC2 is a member of the forkhead family of transcription factors that has been most extensively characterized in adipose tissue. It has been proposed that FoxC2 may control white versus brown adipose differentiation, promoting the latter through inhibition of PPARγ transcription (Davis et al., 2004). FoxC2 has also been shown to inhibit the expression of a subset of PPARγ downstream genes that may prevent the acquisition of the mature fat Edges that were identified as significant by the three BSL techniques as well as normalization with respect to the three control genes are shown in (D). These edges are also represented by dark arrows in the subplots (A-C).
error rate and false-discovery rate and comparing the network structure obtained on the aged myoblasts to those obtained on adult myoblasts.

acknowLedgMents
This work was supported by funds from American Federation for Aging Research and R03-LM008853 from the National Library of Medicine to RN and R01-AG20941 to CAP. The authors would also like to thank Professor Fortunato Pesarin, University of Padova, for his comments and suggestions.
feasibility of a focused small-throughput clonal gene expression profiling for obtaining preliminary insights into co-expression of distinct differentiation markers and their interaction. The proposed resampling approach obviates the need for ad hoc threshold, hence ideally suited from the perspective of pragmatic systems biology that can demand modeling FRs from small sample sizes. A more detailed study and validation is necessary in order to establish the cross-talk between the various pathways and their variation with age. This would include accommodating multiple testing corrections in the structure learning algorithm to control for family-wise