Longitudinal omics modeling and integration in clinical metabonomics research: challenges in childhood metabolic health research

Systems biology is an important approach for deciphering the complex processes in health maintenance and the etiology of metabolic diseases. Such integrative methodologies will help better understand the molecular mechanisms involved in growth and development throughout childhood, and consequently will result in new insights about metabolic and nutritional requirements of infants, children and adults. To achieve this, a better understanding of the physiological processes at anthropometric, cellular and molecular level for any given individual is needed. In this respect, novel omics technologies in combination with sophisticated data modeling techniques are key. Due to the highly complex network of influential factors determining individual trajectories, it becomes imperative to develop proper tools and solutions that will comprehensively model biological information related to growth and maturation of our body functions. The aim of this review and perspective is to evaluate, succinctly, promising data analysis approaches to enable data integration for clinical research, with an emphasis on the longitudinal component. Approaches based on empirical and mechanistic modeling of omics data are essential to leverage findings from high dimensional omics datasets and enable biological interpretation and clinical translation. On the one hand, empirical methods, which provide quantitative descriptions of patterns in the data, are mostly used for exploring and mining datasets. On the other hand, mechanistic models are based on an understanding of the behavior of a system's components and condense information about the known functions, allowing robust and reliable analyses to be performed by bioinformatics pipelines and similar tools. Herein, we will illustrate current examples, challenges and perspectives in the applications of empirical and mechanistic modeling in the context of childhood metabolic health research.


Introduction
The rise in chronic and progressive diseases worldwide leads to new challenges in the field of health economics (Nicholson, 2006). The biological complexity of multifactorial disorders such as diabetes, food intolerances, inflammatory diseases, obesity, amongst others, highlights the need to model the web of interactions between genetics, metabolism, environmental factors, lifestyle and nutrition (Nicholson et al., 2011). Furthermore, advances in clinical research pinpoint the critical importance of early diagnosis and treatment of disease progression to minimize its consequences, especially in the case of progressive diseases such as inflammatory bowel diseases or rheumatoid arthritis. Life-long health promotion and disease prevention by nutrition and lifestyle can prevent or delay the onset of chronic diseases. Identification of personal risk factors for chronic disorders together with a better understanding of individual lifestyle requirements may thus provide a roadmap for a healthier metabolic and clinical status. In such a context, there is a clear need to develop new approaches for enabling personalized therapeutic and nutraceutical management and monitoring solutions (Rezzi et al., 2013;Martin et al., 2013b).

Systems Biology Research in Health and Disease Across Lifetime
Metabolic syndrome encompasses multifactorial metabolic abnormalities including visceral obesity, glucose intolerance, hypertension, hyperuricaemia, dyslipidemia, and non-alcoholic fatty liver disease, all of which are associated with cardiovascular complications (Mottillo et al., 2010;Scherer et al., 2015). Although insulin resistance (IR) remains a key mechanism underlying the pathophysiology of metabolic syndrome, many studies further investigate the more complex etiology that seems to also depend on genetics, body composition, nutrition, and lifestyle. In particular, adiposity is subject to extensive research, since its quantitative and qualitative (e.g., subcutaneous, visceral) distribution in the body associates with different cardiometabolic, obesogenic, and diabetogenic risks (Wildman et al., 2008). More specifically, epicardial adipose tissue may play an important role in predicting metabolic health in overweight and obese children (Schusterova et al., 2013). Recent multivariate data analyses have associated specific metabolite and lipid profiles with body fat distribution (Wahl et al., 2012;Yamakado et al., 2012;Martin et al., 2013b;Scherer et al., 2015). These studies described the close relationships between region-specific fat distribution and the levels of amino acids, sphingomyelin, diacylglycerols, triacylglycerols, and phospholipid species in the blood. Such metabolic insights generate new mechanistic knowledge of complex underlying physiological processes. For instance, the inability of adipose tissue to expand or to store fat results in lipid overflow to other organs under conditions of excess caloric intake combined with a lack of physical activity (Scherer et al., 2015).
In parallel, new evidence has pointed toward the critical and long-term importance of early nutrition and lifestyle on later health and disease risk predisposition (Koletzko et al., 1998). The rising prevalence of type 2 diabetes and obesity in children is a growing and alarming problem, associated with several short-term and long-term metabolic and cardiovascular complications (Rosenbloom et al., 1999;Marcovecchio and Chiarelli, 2013;Cominetti et al., 2014). Consequently, early identification of people with high risk of becoming diabetic is important because the development of diabetes can be delayed or prevented by lifestyle or medical intervention (Hosking et al., 2014). However, evidence-based dietary guidelines and a more comprehensive characterization of the influence of environmental factors at the onset and during the evolution of type 2 diabetes and obesity are needed (Martin et al., 2013a). As a pre-requisite, reference information on how dietary and lifestyle habits influence metabolic functions must be further expanded. This will enable us to comprehensively document the biological processes associated with individual health at the different stages of the life cycle, including the critical pubertal physiological window, which may appear as a susceptibility period for several metabolic deregulations (Mantovani and Fucic, 2014).
Growth during childhood and adolescence occurs at different rates and is influenced by the interaction amongst genetic, nutritional, and environmental factors, which can lead to different susceptibility to childhood disease and disease risks later in life. This introduces a temporal dimension in the study designs and poses additional analytical challenges. Although little is known about the underlying genetics, growth variability during puberty correlates with a complex genetic architecture linking pubertal height growth, the timing of puberty and childhood obesity and provides new information about processes linking these traits (Cousminer et al., 2013). In the context of metabolic health, obesity in childhood and adolescence introduces a significant disturbance into normal growth and pubertal patterns (Sandhu et al., 2006;Marcovecchio and Chiarelli, 2013). There is evidence in both adults and children that glucose levels that are close to the upper limit of the normal range are indicative of future diabetes. One third of children showing transient hyperglycaemia in the absence of serious illness can be expected to develop diabetes within 1 year (Herskowitz-Dumont et al., 1993;Hosking et al., 2014). IR is associated with diabetes and is modulated by complex patterns of external factors throughout childhood that remain poorly understood. IR is higher during puberty in both males and females, with some studies showing the increase to be independent of changes in adiposity (Jeffery et al., 2012). Modeling of longitudinal data on IR, its relationship to pubertal onset, and interactions with age, sex, adiposity, and IGF-1 has recently been conducted (Jeffery et al., 2012). The study exemplified how IR starts to rise in mid-childhood, some years before puberty, with more than 60% of the variation in IR prior to puberty remaining unexplained.
In addition, conventional markers such as HbA1c, which are used to detect diabetes or to identify adults at risk of developing diabetes and metabolic disease, are not sensitive and specific enough for pediatric applications, suggesting that other factors influence the variance of these markers in youth (Hosking et al., 2014). One key factor currently being studied is excess body weight during childhood, which can also influence pubertal development through an effect on the timing of pubertal onset and on pubertal hormone levels (Marcovecchio and Chiarelli, 2013). Additionally, skeletal growth and changes in body composition during growth show important variability in both genders (Ballabriga, 2000). The link between fat and puberty is complex and gender-specific. Body fat of contemporary UK children, for example, does not appear to be deleterious to bone quality (Streeter et al., 2013). Moreover, in girls, higher IR limits further gain in body fat in the long term, an observation consistent with insulin desensitization as an adaptive response to weight gain (Hosking et al., 2011). The complex dynamics of growth and development also involve changes in biological processes that influence basal metabolic function (for instance, resting energy expenditure) and physical activity. The role of resting energy expenditure and weight gain in children is subject to controversy, with particular interest in studying whether low energy expenditure may be a predisposing factor for childhood obesity (Griffiths et al., 1990), and in better understanding energy requirements prior to and during puberty (Hosking et al., 2010). In recent years, advances in microbiota research have provided compelling evidence that the intestinal microbiota contributes to the overall health status of the host and therefore plays an important role in modulating the effect of nutrition on health and disease (Nicholson et al., 2012).
In particular, there is increasing evidence for the role that the gut microbiota plays in regulating fat storage and energy homeostasis in the host, hence acting as an important environmental factor for diabetes and obesity (Musso et al., 2010). We and others (Wikoff et al., 2009;Moco et al., 2012, 2014;Tremaroli and Bäckhed, 2012;Sommer and Bäckhed, 2013) have also demonstrated how specific metabolic activities of gut bacterial species can provide the host with new biochemical compounds in sufficient amounts to be detected in the systemic blood stream. These host-gut bacterial co-metabolites may subsequently impact human host metabolism, for instance through modulating quantitatively and qualitatively the nutrient and calories made available to the host throughout digestion (Jumpertz et al., 2011;Martin et al., 2013a).

Omics Modeling and Integration in Clinical Research
The rising prevalence of multifactorial disorders, the lack of understanding of the molecular processes at play, and the need for disease prediction in asymptomatic conditions are some of the many challenges that systems biology is well-suited to address. With its aim to connect the information flow between the different organizational levels of life such as the genome, epigenome, transcriptome, proteome, and metabolome, systems biology approaches are becoming highly relevant for assessing the connection between human physiology and nutrition (Mantovani and Fucic, 2014;Moco et al., 2014). Systems biology also aims at understanding the global dynamics of biological processes to gain a deep understanding of the system, which adds an additional layer of complexity to existing intracohort heterogeneities, inter-laboratory methodology differences and changes in the instrumentation (Moco et al., 2014).
Omics technologies are often employed to generate a snapshot of the system being studied, at multiple pathway levels, yet only considering cross-sectional information. Therefore, integrative solutions and resources are now becoming a prerequisite to clinically leverage the knowledge from large amounts of existing omics data collected from different compartments, and ultimately to provide a unified view and personalized therapeutic approaches to disease (Moco et al., 2014). In the context of childhood metabolic studies, major challenges lie in the high dynamics (e.g., metabolic requirements for growth and development), specificities (e.g., hormonal maturation) and amplitude of changes (e.g., acute growth, major switch in the distribution of body fat and lean mass) that affect the biological, physiological, clinical, and anthropometric parameters. Hence, there is a need to adapt methodologies and experimental designs to explore processes related to growth, development, maturation and pubertal stages over months and years of the childhood spectrum.
The aim of this current review and perspective is to evaluate, summarily, some promising data analysis approaches to enable data integration for clinical research, with an emphasis on the longitudinal component (Table 1). Approaches based on empirical (statistics) and mechanistic modeling of omics data are essential to leverage findings from high dimensional omics datasets and enable biological interpretation and clinical translation. Empirical methods are based on direct observations, measurements, and extensive data records. These methods provide quantitative descriptions of patterns in the data and do not attempt to describe underlying processes or the mechanisms involved. Therefore they are mostly used for exploring and mining datasets. In contrast to empirical approaches, mechanistic models aim at understanding the behavior of a system's components (Thakur, 1991). Mechanistic models are based on the most comprehensive set of available knowledge of the systems of interest (the knowledge base), which is more than just the data used to train them. They are rooted in two basic principles, namely (i) every observed phenomenon is based on multiple inter-connected processes; and (ii) when the most significant processes are represented mathematically, the simulated output resembles the actual observations. Mechanistic models may also lead to the discovery of emergent properties. These are properties that arise through interactions among smaller or simpler entities but cannot be observed within the isolated smaller entities. In biology the most prominent mechanistic models are the genome scale metabolic models.
They are built on current knowledge (biochemical, metabolic, transcriptional, translation, and signaling) and condense information about the known functions of protein-encoding genes, how these genes/proteins interact with other bioactive compounds and associated reactions, allowing robust and reliable analyses to be performed by bioinformatics pipelines and similar tools (Shen et al., 2010). They are also the base for multi-scale models. In the following sections, we will illustrate current examples, challenges and perspectives in the applications of empirical and mechanistic modeling in the context of childhood metabolic health research.

Integration of Longitudinal Omics Data: Methods and Challenges
Unlike for adult and elderly population studies, there is a lack of standards and thresholds used to characterize healthy status during childhood, as well as a lack of comprehensive human trials which could guide its study. Moreover, as previously discussed, the nature of growth and development occurring across childhood is linked with complex patterns of dynamics and amplitudes of changes not observed in adults and the elderly. Therefore, there is a need to include a wider number of data types including time resolved data and to have a more exploratory type of approach when analyzing the data. Similarly to other omics technologies, metabolic profiling (Nicholson et al., 1999;Fiehn, 2002;Smith et al., 2006) based on mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy produces data whose analysis brings a number of challenges, with some requiring special attention in clinical omics studies, namely (i) high-dimensional nature of omics data; (ii) longitudinal aspect of multivariate omics data; (iii) multiple omics datasets; and (iv) mechanistic interpretation. The different levels of complexity are depicted through a series of schematic pictures in Figure 1. In the case of childhood metabolic health research these challenges are clearly present and important to address.

High-dimensional Omics Data
The high number of variables in omics data involves working with a particular structure between the variables, often related to their analytical or biological relationships, which results in the need for complex frameworks for biomarker discovery (Montoliu, 2015). Furthermore, even if most omics data types are continuous, it is not uncommon to have to deal with discrete variables (clinical or experimental). To address such challenges, multivariate data analysis appears as a more appropriate alternative to the standard approach of univariate analysis plus multiplicity testing correction (Massart et al., 1997;Montoliu, 2015). From the set of techniques driven by a pure chemometric approach, Principal Component Analysis (PCA) (Jolliffe, 2002), Partial Least Squares regression (PLS) (Geladi and Kowalski, 1986; Wold et al., 2001) and their derivatives, such as Orthogonal Projection on Latent Structures (OPLS) (Trygg and Wold, 2002, 2003), are amongst the reference methodologies which perform well in low n (subjects), high p (variables) datasets (where n refers to the sample size and p to the number of dimensions, as in Figure 1A) through the projection of multivariate data onto a reduced subspace (Richards et al., 2010). PLS methods adapt well to linear and non-linear relationships, but require a validation process to assess whether they apply in a more general way and to minimize overfitting (Westerhuis et al., 2008). Moreover, since PLS assumes a given variable distribution and the linearity of the model, there is an additional need for a careful validation of these features (Montoliu, 2015). Variants of these methodologies are employed when the response variable is categorical (Westerhuis et al., 2010). These approaches are referred to as Linear Discriminant Analysis (LDA), e.g., Partial Least Squares Discriminant Analysis (PLS-DA) (Barker and Rayens, 2003) and Orthogonal Projection on Latent Structures Discriminant Analysis (OPLS-DA) (Bylesjö et al., 2006).
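To make the projection idea concrete, the following is a minimal sketch of how a first principal component can be extracted from a low n, high p matrix. It is an illustration only and not the procedure of any cited study: the function name, the simulated data and the use of power iteration on the n by n Gram matrix are assumptions chosen to keep the example dependency-free. In practice an established chemometrics or linear algebra library would normally be used.

```python
import math
import random


def first_principal_component(X, n_iter=200, seed=0):
    """Return (scores, loadings, explained_fraction) for the first PC.

    X is a list of n rows (samples) with p values each (variables).
    The eigenproblem is solved on the n x n Gram matrix, which stays
    small when n << p.
    """
    n, p = len(X), len(X[0])
    # Column-centre the data so that every variable has mean zero.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Gram matrix G = Xc Xc^T.
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)]
         for i in range(n)]
    # Power iteration for the dominant eigenvector u of G.
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        v = [sum(G[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    eigval = sum(u[i] * sum(G[i][k] * u[k] for k in range(n))
                 for i in range(n))
    # Loadings are Xc^T u scaled to unit length; scores are Xc w.
    w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]
    w_norm = math.sqrt(sum(x * x for x in w))
    w = [x / w_norm for x in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    total_variance = sum(G[i][i] for i in range(n))
    return t, w, eigval / total_variance


# Hypothetical data: 6 subjects, 50 variables driven by one latent factor.
_rng = random.Random(1)
latent = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
true_loadings = [_rng.uniform(0.5, 1.5) for _ in range(50)]
demo_X = [[z * a + _rng.gauss(0, 0.1) for a in true_loadings] for z in latent]

scores, loadings, explained = first_principal_component(demo_X)
```

The sign of the component is arbitrary, so scores should be compared with an external variable through the absolute value of their correlation. The sketch says nothing about validation, which remains necessary for supervised variants such as PLS.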
Several other classification algorithms focused on solving low "n to p" ratio issues (Figure 1A) have been developed, but objective criteria to assess performance and conditions of use remain undefined. It is unlikely that a universal classifier/regressor can satisfy all conditions, and therefore applications of the different methodologies are driven by the research question (Montoliu, 2015). As extensively discussed by Gomez-Cabrero et al., the analysis of large and heterogeneous data sets encourages researchers to develop novel data integration methodologies (Gomez-Cabrero et al., 2014). Amongst these methodologies, machine learning approaches or regularized statistical methods provide a wealth of tools that can learn from and make predictions on data (classification and regression), including Support Vector Machines (SVMs), Random Forests (RFs) and Multilayer perceptrons on the one side; and SPLS, Lasso or Elastic Nets (ENs) on the other side. However, the use of kernels and weight connection layers in Multilayer perceptrons removes any traceability of the role of the individual variables in the model (Montoliu, 2015).
In the context of childhood research, such approaches remain very relevant, allowing the comparison of groups of subjects, similarly to what is applied when studying obesity, overweight, diabetes, impaired insulin, and glucose control in adults, as exemplified by Wahl et al. (2012). Moreover, methodologies like RFs or PLS are extremely useful to predict the influence of early metabolic status on later outcomes. However, their application in the context of time resolved data remains more challenging, as discussed hereafter.

Longitudinal Multivariate Omics Data
When it comes to longitudinal omics data, i.e., one or more types of omics data measured over time (see also Figure 1B where the same matrix of measurements is repeated at different time points, depicted using an increase in intensity of color), the statistical analysis becomes even more challenging (Dean et al., 2009;Stanberry et al., 2013;Cominetti et al., 2014). However, longitudinal studies are key to understand the global evolution of biological processes. Such studies aim typically at following populations of subjects over time. Resulting time profiles can be clustered to identify subgroups or can be used for monitoring, forecasting and diagnostic purposes (Albert and Schisterman, 2012;Liquet et al., 2012). In addition, the time dimension is important and often specific to the type of data and clinical endpoints in human trials, ranging from minutes and hours, to months and even years. Indeed, the biological processes described by the omics data show specific time-dependent modulation, amplitude of change and regulatory mechanisms; for instance, gene expression and metabolites involved in gluconeogenesis show very different and specific time scales but contribute to the same biochemical processes. Moreover, repeated measurements are often unequally-spaced in time (in our pictorial representation of Figure 1B, this would mean that the colors of the cells of the matrices do not change in a linear manner) and it is important to account for this difference in the model (Albert and Schisterman, 2012) as well as for delays in time-to-event such as disease onset or phenotypic change. Additional challenges when dealing with longitudinal data are auto-correlation of repeated measurements of the same variables, random effects, missing data, and dropouts, which are discussed hereafter (Dean et al., 2009;Carin et al., 2012). Autocorrelation can be both a limitation and an advantage depending on the type of analysis.
For instance, it is a limitation when trying to use certain techniques such as projection-based methods (e.g., PCA, PLS) which are well suited to tackle high-dimensional datasets but that do not take into account subjects' trajectories. With respect to missing data and dropouts it is important to assess if the reason for the data missing is related to the process under observation or not (Albert and Schisterman, 2012). One approach to deal with missing data could be to impute it while avoiding biased results.
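As one simple illustration of imputation that respects unequal spacing of visits, the sketch below fills gaps in a single subject's series by linear interpolation over the actual measurement times. This is an assumption made for illustration, not a method recommended by the cited works: the function name is hypothetical, visit times are assumed to be sorted, and the approach is only defensible when missingness is unrelated to the process under observation. More principled alternatives such as multiple imputation exist.

```python
def interpolate_missing(times, values):
    """Fill None entries by linear interpolation over the real visit times.

    times  : ascending measurement times for one subject (e.g., age in years)
    values : measurements at those times, with None marking a missing value
    Gaps before the first or after the last observed value are left as None,
    because extrapolating a trajectory is far riskier than interpolating it.
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(to, vo) for to, vo in observed if to < t]
        after = [(to, vo) for to, vo in observed if to > t]
        if not before or not after:
            filled.append(None)  # no extrapolation
            continue
        (t0, v0), (t1, v1) = before[-1], after[0]
        # Weight by elapsed time, not by visit index.
        filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return filled


# Hypothetical series with unequally spaced visits.
demo_ages = [5.0, 6.0, 8.0, 9.0, 12.0]
demo_levels = [1.0, None, 4.0, None, 8.0]
demo_filled = interpolate_missing(demo_ages, demo_levels)
```

Because the weights depend on elapsed time rather than visit number, a value missing one year after a visit and two years before the next is pulled toward the nearer observation.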
Despite all the challenges mentioned above, the analysis of longitudinal omics datasets typically provides major advantages, not only in terms of gain in information, but also through (i) an increase in the statistical power of the studies (Zeger and Liang, 1992), (ii) a decrease in noise (if correlations of repeated measurements and inter individual variability are properly accounted for) (Liquet et al., 2012;Cominetti et al., 2014), as well as (iii) an increase in the robustness to model specification (Zeger and Liang, 1992).
A range of solutions have been proposed, including Generalized Linear Mixed Models (GLMM), Generalized Estimating Equations (GEE), Markov models, non-parametric or semi-parametric models or Bayesian models, factor analysis, dictionary learning, dynamical pathway analysis, latent growth curves, amongst others (Carin et al., 2012;Stanberry et al., 2013;Cominetti et al., 2014). Alternatively to those parametric methods, non-parametric or semi-parametric statistical models remain widely employed, being more flexible than parametric models, to model the complex curves of longitudinal trajectories (Dean et al., 2009), especially when the variations in the omics variables are large or are induced by major biological events (e.g., changes in metabolism and requirements during the growth of the child, puberty and the onset and/or remission of a disease). However, considering the vast array of techniques and their specific advantages and limitations, it will often depend on the overall objective of the study, and constraints imposed by the data, when choosing the best adapted modeling tools. Up to this point we were considering one data set generated over time. This can be further extended to multiple omics datasets, like the ones represented in Figure 1C.
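For orientation, one of the simplest members of the mixed-model family mentioned above is a linear model with a subject-specific random intercept and slope. The notation below is introduced here for illustration and is not taken from the cited works: $y_{ij}$ denotes one omics variable measured in child $i$ at visit $j$, and $t_{ij}$ the actual time or age at that visit, which may be unequally spaced.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},\\
(b_{0i},\, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \mathbf{D}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

Here $\beta_0$ and $\beta_1$ are population-level (fixed) effects, $b_{0i}$ and $b_{1i}$ are the deviations of child $i$ from the population trajectory, $\mathbf{D}$ is their covariance matrix and $\sigma^{2}$ the residual variance. The shared random effects induce the correlation between repeated measurements of the same subject. Non-linear trajectories, such as those expected around puberty, would require replacing the linear term in $t_{ij}$ with a more flexible function.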

Combined Analysis of Multiple Multivariate Datasets
In addition to modeling temporal omics data, combined analysis of different omics data sets is still in its infancy (Gomez-Cabrero et al., 2014). As clearly presented by Gomez-Cabrero et al., the term of data integration refers to the integrative study of different sources and types of data from a given system (Gomez-Cabrero et al., 2014). In this context, identifying shared or common information among two or more sets of data from a biological process under study can help us to better describe underlying molecular events. However, these large heterogeneous data sets result in some significant challenges (Gomez-Cabrero et al., 2014). First, the fundamental differences in the data types need to be considered, including the difference in their variance-covariance structure, the multi-scale nature of omics data and differences in sizes of omics datasets (see also Figure 1C showing two matrices with sizes n by p and n by q respectively, where p and q are the different number of analytes measured), which brings the issue of having to weight groups of variables differently. Richards et al. have previously summarized key approaches for intra- and inter-omic fusion strategies in a metabonomics-driven context (Richards et al., 2010). Their work highlighted some promising methods for inter-instrument, inter-sample type and inter-omics integration, namely multiblock hierarchical PCA, consensus PCA, Parallel Factor Analysis (PARAFAC), Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) and O2PLS techniques. MCR-ALS and PARAFAC are well adapted to assess functional relationships across matrices and to enable the characterization of compartment-specific metabolic signatures (Montoliu et al., 2009;Martin et al., 2010). Eventually, such approaches are also relevant for stepwise variable and data-block selection for further multivariate and longitudinal analysis.
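The weighting issue raised above can be illustrated with a common block-scaling heuristic: each variable is autoscaled, and each block is then divided by the square root of its number of variables so that an n by p block and an n by q block contribute the same total sum of squares to a joint analysis. This is a sketch under assumptions: the function names are hypothetical, the choice of weight is one convention among several, and it is not claimed to be the scheme used in the cited multiblock methods.

```python
import math


def autoscale(block):
    """Centre each column and scale it to unit sample standard deviation."""
    n, p = len(block), len(block[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in block]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        for i in range(n):
            # Constant columns carry no information and are set to zero.
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


def block_scale(blocks):
    """Autoscale every block, then weight it by 1 / sqrt(number of variables)."""
    scaled = []
    for block in blocks:
        weight = 1.0 / math.sqrt(len(block[0]))
        scaled.append([[weight * x for x in row] for row in autoscale(block)])
    return scaled


def concatenate(blocks):
    """Join blocks measured on the same n subjects into one wide matrix."""
    n = len(blocks[0])
    return [sum((b[i] for b in blocks), []) for i in range(n)]


def total_sum_of_squares(block):
    return sum(x * x for row in block for x in row)
```

After this scaling, every block with no constant columns has a total sum of squares of n - 1, regardless of how many analytes it contains, so a large lipidomics panel does not dominate a small clinical chemistry panel purely through its size.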
From other related fields, such as ecology and multi-species genomics, a variety of methodologies are being used to enable various data integration strategies, including Generalized Singular Value Decomposition (GSVD), Latent Variable Multivariate Regression (LVMR), Simultaneous Component Analysis (SCA), Canonical Correlation Analysis (CCA) (Hotelling, 1936), Co-Inertia Analysis (COIA), Integrative Bi-Clustering or Multiple Factor Analysis (MFA). These approaches may also offer novel opportunities in the field of clinical metabonomics. Moreover, with the aim of identifying common and data-specific information for a given omics data set, methods based on two-block latent variable regression with an integral OSC filter, such as O2PLS (Trygg and Wold, 2002, 2003), are being used (especially in the field of metabonomics), but Joint and Individual Variation Explained (JIVE) (Lock et al., 2013) and DIStinct COmmon SCA (DISCO-SCA) (Schouteden et al., 2013) may offer some advantages in terms of analytical strategies. JIVE represents an extension of PCA. It works by decomposing data into three elements: one captures the joint structure between data types, another captures the structure individual to each data type, and a third captures the residual noise (Lock et al., 2013). JIVE may offer advantages compared to CCA and PLS approaches and it could offer some promising capabilities for the integrated analysis of omics data (Lock et al., 2013). SCA methods are well adapted to study linked data and model a small number of simultaneous components that maximally account for the variations in the data sets (Schouteden et al., 2013).
While SCA reflects a mix of common and distinct information, the DISCO-SCA approach aims at solving this problem in multi-block data analysis, by enabling both the modeling of relationships across all the data types under consideration, but also to explore the relationships within a single or a few selected blocks at the same time (Schouteden et al., 2013). Schouteden et al. presented an example where children from different age groups are given the same personality questionnaire, which results in a set of child-by-item data blocks, with each data block pertaining to a specific age group and with the different data blocks having the questionnaire items in common. DISCO-SCA could enable the analysis of both general personality dimensions and dimensions that are specific for a certain developmental stage. In the context of metabolic phenotype in childhood, such a method could thus allow the study of molecular processes related to growth, and the simultaneous exploration of age-specific phenotype.
The integrative personal omics profile (iPOP) analysis tries to go one step further, namely combining multiple time-resolved multivariate datasets, such as genomics, transcriptomics, proteomics, and metabolomics profiles from a single individual measured over a 14-month period. The datasets were first transformed using a Lomb-Scargle [Fourier] transformation (Lomb, 1976;Scargle, 1982, 1989) in order to remove the effect of uneven data sampling in time. Based on this transformed data, the original time-series were reconstructed using an inverse Fourier transform and evenly resampling frequencies/times (for more details see also Chen et al., 2012 and references therein). This allowed the use of standard time-series analysis methods and the clustering of the combined datasets. As a proof-of-concept, the longitudinal iPOP study has shown the potential to interpret healthy and disease status by connecting genomic information with additional dynamic omics activity. These methodologies could offer unprecedented opportunities to further explore the functional relationships between omics biological data and growth and development, and subsequently to allow novel characterizations of factors contributing to a healthy or unhealthy childhood trajectory.
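The first step of such a workflow, estimating spectral content from unevenly sampled data, can be sketched with the classical Lomb-Scargle periodogram. The code below is an illustrative implementation of the textbook formula only. It does not reproduce the iPOP pipeline, it omits the reconstruction and resampling steps, and the function name, sampling times and frequency grid are assumptions.

```python
import math


def lomb_scargle(times, values, freqs, center=True):
    """Classical (unnormalised) Lomb-Scargle power at each frequency.

    times  : sampling times, which may be unevenly spaced
    values : measurements taken at those times
    freqs  : strictly positive frequencies in cycles per time unit
    center : subtract the sample mean before the analysis
    """
    if center:
        mean = sum(values) / len(values)
        y = [v - mean for v in values]
    else:
        y = list(values)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal
        # on the actual sampling times.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, cos_terms))
        ys = sum(a * b for a, b in zip(y, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power


# Hypothetical example: a noise-free rhythm with period 4 sampled unevenly.
demo_times = [0.0, 0.9, 2.3, 2.8, 4.1, 5.5, 6.0, 7.7, 8.2,
              9.9, 11.4, 12.0, 13.6, 14.1, 15.8, 17.2, 18.0, 19.5]
demo_values = [math.sin(2.0 * math.pi * 0.25 * t) for t in demo_times]
demo_freqs = [k / 20 for k in range(1, 21)]
demo_power = lomb_scargle(demo_times, demo_values, demo_freqs, center=False)
demo_peak = demo_freqs[max(range(len(demo_freqs)), key=demo_power.__getitem__)]
```

In this noise-free example centering is switched off because the simulated signal is a pure sinusoid. With real measurements, subtracting the mean (the default) is the usual choice, and the height of a peak would need to be judged against noise before interpreting it as a rhythm.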
However, these methods provide quantitative descriptions of patterns in the data and do not attempt to describe underlying processes or the mechanisms involved. In this respect mechanistic models are an important addition to empirical data analysis providing a framework to mathematically represent current biological knowledge as well as data interpretation and clinical translation of the multiple cellular processes captured by the omics approaches.

Mechanistic and Biological Interpretation of Models Based on Omics Data
In the last couple of years mechanistic modeling has become more and more popular as an approach to model phenotypes under different conditions and therefore to expand the understanding of complex biological systems. In contrast to the previously discussed methods that try to develop models based on the given data, mechanistic models are knowledge-based models and are therefore independent of the data. At the lower end of the modeling hierarchy, in terms of biological organization, we have the cell. Its phenotype is mainly controlled at the following three levels: (i) metabolism: enzyme-catalyzed chemical transformations taking place in a cell that either consume metabolites for energy production or generate small molecules that serve as building blocks, (ii) gene regulation: control of increase or decrease of transcripts (mRNA) and their translation into proteins, and (iii) signaling: complex communication system that combines proteins, lipids, and small molecules in various ways allowing cells to sense the environment and respond correctly. These three levels are linked through diverse types of interactions but with respect to modeling they are still mostly treated separately using mathematical formalisms that are specific to the level that is modeled and that reflect the molecules and the processes involved. Combined models are still rare, since there is currently no single modeling formalism that can deal with the different biological aspects.
Ordinary differential equations (ODEs) are frequently used to describe metabolic pathways. However, the challenge with ODEs is that it is often difficult to obtain the parameters required for the model. Consequently, ODE-based approaches quickly become infeasible for genome-scale models and simulations. Alternatively, for larger networks, genome-scale metabolic models (GEMs) or Boolean networks are widely used (Goncalves et al., 2013). In particular, constraint-based GEMs are well suited to handle the complexity of cellular metabolism, leading to a better understanding of the full cellular metabolism at the systems level (Figure 1D depicts metabolic pathways with the nodes representing metabolites and solid edges representing enzymatic reactions). Therefore, GEMs are very useful to study disorders that have a strong metabolic component.
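As a minimal sketch of the ODE formalism, consider a hypothetical two-step pathway (substrate to intermediate to product) with Michaelis-Menten kinetics. The pathway, the function name pathway_rates and all parameter values are assumptions chosen for illustration only. Even this toy system requires four kinetic parameters, which hints at why parameterization becomes prohibitive at genome scale.

```python
from scipy.integrate import solve_ivp


def pathway_rates(t, conc, vmax1, km1, vmax2, km2):
    """Right-hand side of the ODE system for substrate -> intermediate -> product."""
    substrate, intermediate, product = conc
    v1 = vmax1 * substrate / (km1 + substrate)        # first enzymatic step
    v2 = vmax2 * intermediate / (km2 + intermediate)  # second enzymatic step
    return [-v1, v1 - v2, v2]


# Hypothetical kinetic parameters (vmax1, km1, vmax2, km2), arbitrary units.
params = (1.0, 0.5, 0.8, 0.3)
initial = [5.0, 0.0, 0.0]  # all mass starts as substrate

solution = solve_ivp(
    pathway_rates, (0.0, 50.0), initial, args=params, rtol=1e-8, atol=1e-10
)
final_substrate, final_intermediate, final_product = solution.y[:, -1]
print(f"product at t=50: {final_product:.3f}")
```

In practice each Vmax and Km would have to be measured or fitted for every enzyme, whereas the constraint-based formalism needs only the reaction stoichiometry and flux bounds.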
Currently most GEMs are based on steady-state analysis. Only recently have different groups started with the construction of kinetic models (Chakrabarti et al., 2013; Stanford et al., 2013). However, in the absence of real data, estimation of the kinetic parameters remains a challenge. Consequently, kinetic models are not yet used for higher eukaryotes.
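The steady-state assumption can be stated compactly. In the standard constraint-based formulation, with notation introduced here for illustration (S is the m x n stoichiometric matrix, x the vector of m metabolite concentrations, v the vector of n reaction fluxes, and c a vector of objective weights), internal metabolite concentrations are assumed not to change over time:

```latex
\frac{d\mathbf{x}}{dt} = S\,\mathbf{v} = \mathbf{0},
\qquad
v_j^{\min} \le v_j \le v_j^{\max}, \quad j = 1, \dots, n,
\qquad
\max_{\mathbf{v}} \; \mathbf{c}^{\top}\mathbf{v}
```

No kinetic parameters appear in this formulation; only the stoichiometry and the flux bounds constrain the solution space, and the optional linear objective selects a flux distribution within it.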
The starting points for the generation of GEMs for human cells are essentially two generic literature-based GEMs, Recon 1 (Duarte et al., 2007) and the Edinburgh Human Metabolic Network (EHMN) (Ma et al., 2007), which were developed by different research groups with the aim of studying human metabolism. These generic GEMs were later merged into one database, the Human Metabolic Reaction (HMR) database (Agren et al., 2012), together with reactions related to human metabolism from KEGG (Kanehisa et al., 2010). Recently the HMR database has been updated with data from Reactome (Croft et al., 2011), HepatoNet1 (Gille et al., 2010), Lipidomics Gateway (Harkewicz and Dennis, 2011) and the HumanCyc database (Romero et al., 2005).
In the last couple of years the cost of large-scale omics data generation has decreased considerably, but the analysis and, more specifically, the interpretation of such data remain a challenge. This is mainly due to the complexity of the underlying cellular processes, which involve the regulation of multiple genes that are not fully understood in terms of function and interactions amongst them (Palsson and Zengler, 2010). In an attempt to overcome these challenges, several groups introduced the use of GEMs to place omics data in the context of the cellular metabolism (Palsson, 2009; Yizhak et al., 2010; Ideker and Krogan, 2012). GEMs can be reconstructed based on high-throughput omics data, but they also serve as a computational framework to analyze and interpret such data as a network where the nodes represent the substrates/products and the edges the reactions, like the schematic representation in Figure 1D. These networks are then transformed into stoichiometric matrices, which serve as the basis for constraint-based modeling (Famili et al., 2003) and into which numerical omics data can be effectively plugged. Moreover, personalized GEMs can be reconstructed in the same way as cell/tissue-specific models are generated. In this case one would start from omics data obtained from a single patient. Earlier studies on inborn errors of metabolism (Shlomi et al., 2009) and metastatic breast cancer show that GEMs are of potential use in the discovery of biomarkers.
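The way a stoichiometric matrix supports constraint-based modeling, and how omics-derived information might enter as flux bounds, can be sketched with a toy flux balance analysis solved as a linear program. The four-reaction network, the reaction names, the bound values and the idea of capping one reaction for a "patient" are all hypothetical; real GEMs contain thousands of reactions and are usually handled with dedicated toolboxes.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical): rows are metabolites, columns are reactions.
REACTIONS = ["uptake_A", "A_to_B", "export_B", "secrete_A"]
STOICHIOMETRY = np.array([
    [1.0, -1.0,  0.0, -1.0],  # metabolite A
    [0.0,  1.0, -1.0,  0.0],  # metabolite B
])


def maximize_flux(objective_reaction, upper_bounds):
    """Maximize one flux subject to the steady state S v = 0 and 0 <= v <= ub."""
    cost = np.zeros(len(REACTIONS))
    cost[REACTIONS.index(objective_reaction)] = -1.0  # linprog minimizes
    bounds = [(0.0, ub) for ub in upper_bounds]
    result = linprog(
        cost,
        A_eq=STOICHIOMETRY,
        b_eq=np.zeros(STOICHIOMETRY.shape[0]),
        bounds=bounds,
    )
    if not result.success:
        raise RuntimeError(result.message)
    return dict(zip(REACTIONS, result.x))


# Generic model: only the uptake of A is limiting.
generic = maximize_flux("export_B", [10.0, 1000.0, 1000.0, 1000.0])
# "Personalized" model: assume omics data suggest low capacity for A_to_B.
patient = maximize_flux("export_B", [10.0, 4.0, 1000.0, 1000.0])
print(generic["export_B"], patient["export_B"])
```

Tightening a single bound changes the predicted optimal export flux, which is the basic mechanism by which expression or proteomics data can turn a generic reconstruction into a context-specific one.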
A multi-cellular/multi-tissue type GEM was elegantly described by Bordbar et al. (2011). They developed a model that connects GEMs representing human adipocytes, hepatocytes and myocytes, and hence allows the metabolic pathways of the three cell types to be connected. They used the resulting model to study diabetes, and in this respect they simulated the behavior of known cross-cell metabolic cycles. In order to study differences in the metabolic activity between obese and obese type II diabetic patients who underwent gastric bypass surgery, high-throughput data were integrated with the multi-cellular/multi-tissue type GEM. This approach allowed the authors to link known physiological changes seen in these patients with a mechanistic understanding. These findings can be described as emergent properties, since they could only be observed using the multi-tissue modeling approach. It would not have been possible to make these observations from transcription data only. This example illustrates how such approaches could be used to obtain a mechanistic understanding of the phenotypic evolution during childhood by linking the phenotype with the underlying metabolism.
The multi-cellular/multi-tissue study described above essentially connects GEMs that represent different cell types. Such multiscale models are ideal to model biological systems, since biological systems are intrinsically complex, being composed of multiple functional networks that operate across different temporal and spatial levels to maintain growth, development, and reproduction. Multiscale models are combinations of continuous and discrete modeling strategies, either deterministic or stochastic. Such computational models are uniquely positioned to capture the connectivity between these divergent scales of biological function, as they can bridge the gap in understanding between isolated in vitro experiments and whole-organism in vivo models. Starting at the cell level, the next step would be to combine GEMs using an agent-based modeling approach to represent cell networks and tissues. These tissues would then need to be combined into larger, whole-organ models, typically using finite element approaches (Moreno et al., 2011). However, no single comprehensive gene-to-organism multiscale model has been developed so far, and this remains the subject of intensive research (Walpole et al., 2013).
Successful applications of GEMs may lead to the generation of testable hypotheses with strong mechanistic interpretations and to the identification of knowledge gaps. Moreover, GEMs may lead to the prediction of proteins and/or metabolites that are key in the evolution of a disease and provide a context-dependent framework for the analysis of disease-specific omics data. Consequently, GEMs can be used to better understand the relationship between genotype and phenotype and to generate new biological knowledge (Patil and Nielsen, 2005; Lewis et al., 2012), possibly leading to the discovery of biomarkers, drug targets and new therapeutic agents (Jerby and Ruppin, 2012; Mardinoglu and Nielsen, 2012).

Conclusion
Systems biology methodologies will help to better understand the molecular mechanisms involved in growth and development through childhood, and consequently will result in new insights about metabolic and nutritional requirements of infants, children and adults. To achieve this, a better deciphering of the physiological processes at an anthropometric, cellular and molecular level for any given individual is needed. In this respect, novel omics technologies in combination with sophisticated data modeling techniques are key, as summarized in Table 1. Amongst the major challenges when integrating longitudinal omics data are the high-dimensional nature of the omics data, the longitudinal aspect of multivariate omics data, the integration of multiple datasets, and the mechanistic interpretation of the omics data. Projection methodologies such as PCA and PLS work well for low n, high p datasets, but not for longitudinal data. Therefore, methodologies able to adapt to the complexity of individual trajectories are needed, such as non-parametric statistical models, GEE, Markov models, factor analysis and Bayesian models, which have emerged as good tools for modeling longitudinal data. Furthermore, the integration of different omics datasets could be achieved via techniques including CCA, COIA, multiple factor analysis and integrative biclustering. Some of these tools utilize a multi-block approach and/or study the covariance between the different matrices. In contrast to these empirical approaches, mechanistic modeling has become a key methodology to better understand biological systems. In the last couple of years, these methods have made considerable progress and can be used as a framework to interpret omics data. Such models also serve as knowledge bases that combine the current understanding in a mathematical form, allowing phenotypic predictions to be made under different conditions and knowledge gaps to be identified.
However, due to the high complexity of the network of influential factors determining individual trajectories, the field is still in its infancy and it becomes imperative to develop proper tools and solutions that will comprehensively model biological information related to growth and maturation of our body functions.