The Use of Natural Genetic Diversity in the Understanding of Metabolic Organization and Regulation

The study of metabolic regulation has traditionally focused on analysis of specific enzymes, emphasizing kinetic properties, and the influence of protein interactions and post-translational modifications. More recently, reverse genetic approaches permit researchers to directly determine the effects of a deficiency or a surplus of a given enzyme on the biochemistry and physiology of a plant. Furthermore, in many model species, gene expression atlases that give important spatial information concerning the quantitative expression level of metabolism-associated genes are being produced. In parallel, “top-down” approaches to understand metabolic regulation have recently been instigated whereby broad genetic diversity is screened for metabolic traits and the genetic basis of this diversity is defined thereafter. In this article we will review recent examples of this latter approach both in the model species Arabidopsis thaliana and the crop species tomato (Solanum lycopersicum). In addition to highlighting examples in which this genetic diversity approach has proven promising, we will discuss the challenges associated with this approach and provide a perspective for its future utility.


INTRODUCTION
Elucidation of metabolic regulation in plants has been an academic pursuit spanning many decades. Indeed, the question of how metabolism is regulated was first raised in the scientific era in which metabolism was first truly defined (Buchner, 1907; Krebs and Henseleit, 1932; Plaxton, 1996; Kornberg, 2000). Whilst the ability to regulate the rates of metabolic processes in response to cellular circumstance is a common feature of all organisms, it is particularly acute in sessile organisms such as plants (Plaxton, 1996). From a textbook perspective, metabolic regulation is classically divided into coarse and fine levels of control (Dennis et al., 1997; Fell, 1997). Coarse control refers to long-term mechanisms that are energetically expensive and lead to changes in the total cellular population of a protein. By contrast, fine control describes generally fast (and therefore energetically inexpensive) regulatory devices that modulate the activity of pre-existing enzyme molecules. Whilst this arbitrary division can be useful for descriptive purposes, recent reports suggest that several regulatory mechanisms cannot be so easily defined and hence such classification is commonly regarded as outdated. Unfortunately, despite the massive amount of data available regarding steady-state RNA levels afforded by microarrays (see for example the data stored in GENEVESTIGATOR; http://www.genevestigator.ethz.ch) and more recently by next generation sequence analyses (see for example Gonzalez-Ballester et al., 2010; Bräutigam et al., 2011), protein abundance data remain relatively scarce. That said, important recent advances have been made regarding both protein synthesis (Mustroph et al., 2009; Piques et al., 2009) and degradation (Araújo et al., 2010, 2011; Hua and Vierstra, 2011).
Several regulatory mechanisms act on already synthesized enzymes. Indeed our understanding of the regulation of central (primary) plant metabolism has been largely defined by the discovery of such features within the last 50-60 years, whereas understanding of specialized (secondary) metabolism has made similar strides within the last 30 years (for reviews see Pichersky and Gang, 2000; D'Auria and Gershenzon, 2005; Gachon et al., 2005; Yonekura-Sakakibara and Saito, 2009). In brief, such mechanisms include (i) alteration in substrate or co-substrate concentration, (ii) variation in pH, and (iii) allosteric effectors. The importance of all three of these mechanisms is illustrated by multiple examples. The first of these is the simplest and certainly the most rapid to affect metabolic systems, with the rate of an enzyme-catalyzed reaction increasing upon an increase in the concentration of a sub-saturating substrate, a case that is common in vivo (Dennis et al., 1997). All enzyme reactions are, to a greater or lesser extent, regulated in this manner. However, the situation is complicated by the fact that not all reactions display simple Michaelis-Menten-like kinetics and by the fact that many co-substrates are shared by multiple reactions. These factors alone render the systemic response to prevailing fluctuations in substrate conditions difficult to predict. Secondly, many enzymes are affected by pH. For example, enzymes of the Calvin cycle are well documented to be pH regulated; stromal pH is 8.0 in the light and 7.0 in the dark (see Dennis et al., 1997). Thirdly, allosteric effectors, be they activators or inhibitors, are immensely important in the regulation of plant metabolic networks.
Within the major pathways of carbohydrate metabolism, several examples of the importance of such metabolites exist, including the 3-phosphoglycerate (3PGA)/inorganic phosphate (Pi) ratio in activating ADP glucose pyrophosphorylase (AGPase; Preiss, 1982; Tiessen et al., 2002), the fructose 2,6-bisphosphatase (Fru2,6P2ase) system (Stitt, 1990; Fernie et al., 2001) and the effect of pyruvate on the alternative oxidase of mitochondrial respiration (Millar et al., 1993; Oliver et al., 2008).
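The first of these mechanisms, control by substrate concentration, can be illustrated numerically. The Python sketch below assumes simple one-substrate Michaelis-Menten kinetics, which, as noted above, not all reactions display; the function names and the Vmax and Km values are arbitrary illustrative choices, not measurements from any of the cited studies.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Rate v = Vmax * [S] / (Km + [S]) for a simple one-substrate enzyme."""
    if substrate < 0 or vmax < 0 or km <= 0:
        raise ValueError("substrate and vmax must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical kinetic parameters in arbitrary units (assumptions for illustration).
VMAX = 10.0
KM = 2.0


def fold_change_on_doubling(substrate):
    """Factor by which the rate rises when the substrate concentration doubles."""
    before = michaelis_menten_rate(substrate, VMAX, KM)
    after = michaelis_menten_rate(2 * substrate, VMAX, KM)
    return after / before


# Well below Km the enzyme is sub-saturated and the rate tracks the substrate.
sub_saturating = fold_change_on_doubling(0.1 * KM)
# Well above Km the enzyme is nearly saturated and the rate barely responds.
near_saturating = fold_change_on_doubling(10 * KM)

print(f"doubling [S] at 0.1 x Km raises the rate {sub_saturating:.2f}-fold")
print(f"doubling [S] at 10 x Km raises the rate {near_saturating:.2f}-fold")
```

Under these assumptions, doubling a sub-saturating substrate nearly doubles the rate, whereas the same doubling close to saturation has almost no effect. This is why substrate-level control matters most when enzymes operate below saturation in vivo.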
Understanding the function of a given enzyme within a biological process has until recently largely followed a set protocol by which novel genes associated with a specific process are identified by means of similar patterns of expression across a wide range of experiments and subsequently their function tested. This is initially carried out by analyzing the metabolite profiles of genotypes deficient in the expression of the gene. Confirmation of kinetic properties of the enzyme either in planta (in the case that the gene encodes the only isoform of an enzyme) or following expression of the gene in a heterologous system lacking the activity is subsequently required (for details see Tohge and Fernie, 2010). Whilst this approach has been tremendously successful (Hirai et al., 2007; Tohge et al., 2005; Okazaki et al., 2009) in terms of annotating the precise biochemical function of individual genes, it does not enable elucidation of the exact physiological function in vivo. In the last 25 years the roles of specific metabolic enzymes have been addressed via the use of transgenic plants (Stitt and Sonnewald, 1995; Lytovchenko et al., 2007). Such studies have greatly advanced our understanding of metabolic regulation. However, such directed approaches sometimes fail to uncover complex inter-pathway interactions and pathway regulators. Given that the chemical constituents of any life form determine the development and functioning of the organism (Sumner et al., 2003; Fernie et al., 2004a; D'Auria and Gershenzon, 2005), strategies to characterize the chemical complement of the cell are becoming increasingly important. The diversity of metabolites is controlled by a complex interaction involving many structural and regulatory genes as well as environmental influences (Harrigan et al., 2007a).
Although chemical profiles differ between and even within species, thousands of diverse metabolites are usually found in a single plant (De Luca and St Pierre, 2000; Sumner et al., 2003; D'Auria and Gershenzon, 2005; Fernie, 2007). These range from small and simple structures such as vitamins and amino acids to more complex compounds such as polycyclic antioxidants and protease inhibitors. Other compounds function as energy carriers that can store or release energy upon formation or degradation, respectively (Fernie et al., 2004b). For example, glucose synthesized during gluconeogenesis can be polymerized to form starch or be broken down during glycolysis. This regulated interconversion of compounds is perhaps the most important hallmark of plant metabolism, enabling the organism to respond to specific demands over the course of its life and on a minute-by-minute basis (Keurentjes and Fernie, 2011).
Metabolites are often classified as being either primary or secondary, although no strict discrimination can be made and interactions between the two classes are manifold. Primary metabolism includes essential metabolites such as those in central carbohydrate metabolism (Koch, 1996; Rontein et al., 2002) whereas secondary metabolism is often connected to interactions with environmental cues, including cell signaling, interspecies communication, and responses to biotic and abiotic stress (see for example Wink, 1988; Mitchell-Olds and Pedersen, 1998; Lehmann et al., 2009; Rubin et al., 2009). Although primary metabolic pathways are strongly conserved between species, quantitative variation is often observed, possibly related to the different growth characteristics of various species (Mitchell-Olds and Pedersen, 1998). Qualitative and quantitative variation in secondary metabolism is, however, much more extensive and it is widely accepted that secondary metabolism determines to a great extent the success of plant adaptation (Herms and Mattson, 1992; Pichersky and Gang, 2000). As stated above, thousands of different metabolites can be found in a single plant species (De Luca and St Pierre, 2000; Sumner et al., 2003; D'Auria and Gershenzon, 2005; Fernie, 2007). That said, we are only starting to explore the composition of the metabolome, let alone unravel all of the biosynthetic pathways leading to this diversity of chemical structures.
In the current article we concentrate on results from broad screening of the natural genetic diversity of metabolism in Arabidopsis rosettes and tomato fruit. Following intensive statistical analysis, clear patterns of metabolic regulation can be demarcated via these approaches and a sub-set of these patterns can be resolved at the genetic level. We conclude that the screening of diverse genetic populations by metabolic profiling adds significantly to our understanding of metabolic regulation.

METABOLIC VARIANCE IN ARABIDOPSIS
In recent years much research has exploited natural variance in the pre-eminent model species Arabidopsis thaliana (Kliebenstein et al., 2001; Koornneef et al., 2004; Weigel and Nordborg, 2005; Borevitz et al., 2007; Alonso-Blanco et al., 2009). Study of metabolic traits in Arabidopsis largely focuses on understanding the principles underlying metabolic regulation and the influence of metabolism on growth and development (Kliebenstein et al., 2001; Keurentjes et al., 2006; Meyer et al., 2007; Lisec et al., 2008, 2009; Rowe et al., 2008; Sulpice et al., 2009, 2010). As for all quantitative traits, those associated with metabolism are characterized by continuous variation. The genomic regions underlying quantitative traits are commonly referred to as quantitative trait loci (QTL); their identification has often been hampered by complex multigenic inheritance and strong interactions with the environment. The principle of QTL mapping in segregating populations is based on genotyping of progeny derived from a cross between genotypes that differ for the trait under study. Phenotypic values for the trait are then compared with molecular marker genotypes in the progeny to search for genomic regions showing statistically significant associations with the trait variation (Broman, 2001; Slate, 2005). Over the past few decades, the field has benefited enormously from the progress made in molecular marker technology. The ease with which such markers can be developed has facilitated QTL mapping studies of even the most complex traits (Borevitz and Nordborg, 2003).
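The mapping principle described above can be sketched as a single-marker scan, in which the phenotype is compared between the two genotype classes at each marker in turn. The Python example below is a deliberately simplified illustration: it uses a Welch t-statistic per marker rather than the interval mapping or permutation-based thresholds typically used in practice, and the marker names, genotype calls, and metabolite values are invented for demonstration only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic for the difference in means of two samples."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each genotype class needs at least two lines")
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    if standard_error == 0:
        raise ValueError("no phenotypic variation within genotype classes")
    return (mean(group_a) - mean(group_b)) / standard_error


def single_marker_scan(genotypes, phenotypes):
    """Return {marker: t-statistic} comparing 'A' and 'B' genotype classes.

    genotypes maps a marker name to one 'A'/'B' call per line;
    phenotypes holds the trait value of each line, in the same order.
    """
    scores = {}
    for marker, calls in genotypes.items():
        class_a = [p for call, p in zip(calls, phenotypes) if call == "A"]
        class_b = [p for call, p in zip(calls, phenotypes) if call == "B"]
        scores[marker] = welch_t(class_a, class_b)
    return scores


# Hypothetical data: eight homozygous lines, three markers, one metabolite trait.
phenotypes = [5.1, 4.9, 5.3, 5.0, 7.2, 6.8, 7.1, 6.9]
genotypes = {
    "m1": list("AAAABBBB"),  # co-segregates with the trait
    "m2": list("ABABABAB"),  # unlinked
    "m3": list("AABBAABB"),  # unlinked
}

scores = single_marker_scan(genotypes, phenotypes)
best_marker = max(scores, key=lambda m: abs(scores[m]))
print(best_marker, round(scores[best_marker], 1))
```

In this toy data set only marker m1 separates the low and high metabolite classes, so it yields by far the largest absolute test statistic. A real analysis would additionally correct for the number of markers tested and account for linkage between neighboring markers.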
Quantitative trait loci analysis makes use of the natural variation present within species (Alonso-Blanco and Koornneef, 2000; Maloof, 2003; Fernie et al., 2006) and has been successfully applied to various types of segregating populations. In plants, the use of "immortal" mapping populations consisting of homozygous individuals that, at least theoretically, can be propagated indefinitely is preferred because it permits replication and multiple analyses of the same population. Homozygous populations can be obtained by repeated selfing, as is the case for recombinant inbred lines (RILs), but also by induced chromosomal doubling of haploids, as for doubled haploids (DHs; Han et al., 1997; Rae et al., 1999; von Korff et al., 2004). RILs are likely advantageous over DHs since they are characterized by a higher frequency of recombination within the population, resulting from the multiple meiotic events that occur during repeated selfing (Jansen, 2003; Keurentjes and Fernie, 2011).
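Why repeated selfing yields essentially homozygous lines follows from simple Mendelian arithmetic: at any locus, selfing a heterozygote gives half heterozygous progeny, so the expected heterozygosity halves with each generation. The Python sketch below works through this under the textbook assumptions of strict selfing and no selection; the function name is our own, and eight generations is used as the example because that is the number quoted in the legend to Figure 1.

```python
def expected_heterozygosity(generations_of_selfing, initial=1.0):
    """Expected fraction of heterozygous loci after repeated selfing.

    Assumes strict selfing without selection, so heterozygosity
    halves every generation; an F1 between inbred parents starts at 1.0.
    """
    if generations_of_selfing < 0:
        raise ValueError("number of generations must be non-negative")
    return initial * 0.5 ** generations_of_selfing


# Residual heterozygosity for each generation of selfing an F1.
for generation in range(0, 9):
    remaining = expected_heterozygosity(generation)
    print(f"after {generation} selfings: {remaining:.4%} heterozygous")
```

Under these assumptions the expected residual heterozygosity after eight generations of selfing is 1/256, or roughly 0.4%, which is why such lines are treated as homozygous for practical purposes.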
Another type of immortal population consists of introgression lines (ILs; Eshed and Zamir, 1994), which are obtained through repeated backcrossing and extensive genotyping. These are also referred to as near isogenic lines (NILs; Monforte and Tanksley, 2000), or backcross inbred lines (BILs; Jeuken and Lindhout, 2004; Blanco et al., 2006). These lines contain a single or a small number of genomic introgression fragments from a donor parent in an otherwise homogeneous genetic background. In plants, RILs and NILs are the most common types of experimental populations used for the analysis of quantitative traits (for an illustration of these populations see Figure 1). In both cases the accuracy of QTL localization, referred to as mapping resolution, depends on population size. For RILs, the positions of the recombination events are fixed, and resolution can therefore only be increased within the population by adding more lines (i.e., more independent recombination events). Alternatively, recombination frequency can be increased by intercrossing lines before fixation as homozygous lines by inbreeding (Zou et al., 2005; Balasubramanian et al., 2009). In NIL populations resolution can be improved by minimizing the introgression size of each NIL. Consequently, to maintain genome-wide coverage either a larger number of lines or a high proportion of overlapping regions, or both, are needed. Despite the similarities between these two types of mapping populations, large differences exist in the genetic makeup of the respective individuals and the resulting mapping approach. In general, recombination frequency in RIL populations is higher than in equally sized NIL populations, allowing analysis of fewer individuals. Each RIL contains several introgressed fragments and, on average, each genomic region is represented by an equal number of both parental genotypes in the population.
Therefore, replication of individual lines is often not necessary because the effect of each genomic region on phenotypic traits is independently tested multiple times by comparing the two genotypic RIL classes. In addition, the multiple introgressions per RIL can potentially reveal epistasis between loci. However, this may negatively bias the power to detect QTL. Furthermore, the wide variation of morphological and developmental traits among individuals within most RIL populations may hamper analysis of traits requiring the same growth and developmental stage of the individual lines. When many traits segregate simultaneously, this often affects the expression of other traits due to genetic interactions. By contrast, NILs preferably contain only a single introgressed segment per line, increasing the power to detect small-effect QTL. However, the presence of a single introgressed segment limits testing for genetic interactions and thereby the detection of epistatic QTL. Since most of the genetic background is identical for all lines, NILs show more limited developmental and growth variation, increasing the homogeneity of growth stage within experiments.

FIGURE 1 | Comparison of introgression (A) and recombinant inbred lines (B).
Introgression lines are created by backcrossing an F1 of a cross between two parental lines to a recurrent parent several times. Homozygous individuals containing single introgressions are then selected from the progeny. Recombinant inbred lines are generated by selfing an F1 for at least eight generations, at which point full homozygosity is reached. Each individual of the population contains multiple introgressions. Recombinant inbred lines allow for the testing of epistasis. Because of their higher recombination frequency, they also often offer higher resolution than introgression lines. Introgression lines, however, often display greater statistical power in the detection of small-effect QTL.
In Arabidopsis, the ease of generating fertile RIL populations with complete genome coverage has led to their extensive use in QTL mapping (O'Neill et al., 2008). NILs have also been developed to confirm and fine map QTL previously identified in RILs (Alonso-Blanco et al., 1998, 2003; Swarup et al., 1999; Bentsink et al., 2003; Edwards et al., 2005; Juenger et al., 2005; Teng et al., 2005). Genome-wide sets of NILs and RILs, descending from identical intercrosses, that allow mapping to chromosomal sections have been described in Arabidopsis and empirical comparative studies have been performed between the two population types (Keurentjes et al., 2007a; Lisec et al., 2008, 2009). These studies illustrate the complementary benefits of both resources, facilitating the genetic dissection of various quantitative traits in Arabidopsis. RIL populations allow mapping at higher resolution, whilst NILs have the advantage of detecting small-effect QTL (Keurentjes and Fernie, 2011). Having extensively described the approaches, we now detail the biological significance of results obtained to date in Arabidopsis.
Although many simple metabolic traits affecting protein, oil, and starch content have been studied using targeted approaches (see Moose and Mumm, 2008; Fernie and Schauer, 2009), the adoption of methods able to detect the levels of multiple metabolites at once has greatly expanded our ability to pose questions about regulation at the pathway and network level (Sweetlove et al., 2008; Stitt et al., 2010). Such studies, largely reliant on gas chromatography-mass spectrometry (GC-MS) or liquid chromatography-mass spectrometry (LC-MS), have been carried out in Arabidopsis by three different research groups.
The first publication of note focused on a RIL population derived from a cross between the Landsberg erecta (Ler) and Cape Verde Islands (Cvi) accessions and evaluated the variance of some 2000 mass peaks across this population (Keurentjes et al., 2006). Interestingly, almost one-third of these peaks were not detected in either parent, implying a vast potential for manipulation of chemotypes via classical breeding. Another intriguing finding of this study was that co-location of QTL coincided with clusters of highly correlated mass peaks. For example, several glucosinolate QTL co-clustered with one another. Comparison of these data with those of previous targeted work showed that these co-locations were at positions of known regulators of glucosinolate metabolism, to which the methylthioalkylmalate synthase (MAM) locus and the locus controlling production of alkenyl or hydroxyalkyl glucosinolates (AOP) map (Kliebenstein et al., 2001; Kroymann et al., 2001). Further work from this group focused on parallel analysis of 15 enzyme activities in tandem with their corresponding transcript levels and a set of relevant metabolites. The results revealed that traits affecting primary metabolism are often correlated and that many activity QTL co-localize with expression QTL (although a fair number do not), suggesting that such multilevel approaches will be highly useful for distinguishing between transcriptional and metabolic control (Keurentjes et al., 2008). As an extension of this work they next performed a multiplexed transcriptome, proteome, and metabolome study on the same material, combining these data with publicly available data (Fu et al., 2009). The surprising finding of this study was that following mapping of over 40,000 molecular and 100 phenotypic traits, there were only six QTL hotspots.
The authors concluded that there are thus only six breakpoints in a system otherwise buffered against many of the half-million single nucleotide polymorphisms (SNPs) between the parental lines (Fu et al., 2009).
Following a similar approach to that mentioned above, groups at the Max-Planck-Institute in Golm focused on primary metabolism. Two different strategies were employed: the study of RILs and NILs resulting from a cross between Col-0 and C24 (Meyer et al., 2007; Lisec et al., 2008; Brotman et al., 2011) and analysis of the natural variance of metabolite accumulation inherent in ecotypes (Cross et al., 2006; Sulpice et al., 2007, 2009, 2010; Keurentjes et al., 2008). In the first approach, metabolites were not treated in isolation but rather evaluated with respect to the influence they exerted on plant growth, with a metabolic signature for high growth being defined from results obtained in the Col-0/C24 RILs (Meyer et al., 2007). A more detailed study examined both RILs and NILs derived from the same parents (Lisec et al., 2008), revealing a couple of hotspots in which yield QTL overlapped with a large number of metabolite accumulation QTL. In this study, metabolic pathway-derived candidate genes were found for 24-67% of all tested metabolite QTL in the database AraCyc 3.5, demonstrating the power of this approach to identify possible sites of metabolic regulation. It is, however, important to note that this is only the first step and considerable further experimentation is required to confirm the existence and physiological relevance of such regulation.
A recent paper describes the identification of a cytosolic isoform of fumarase as the causal gene underlying traits of decreased fumarate and increased malate (Brotman et al., 2011). This result represents an elegant proof-of-concept study and beautifully fits the cross-over theorem, which states that if an equilibrium reaction of a linear pathway is inhibited, a build-up of substrate and a depletion of product of that reaction will occur (Rolleston, 1972). An illustration of the power of this theorem is its application to starch synthesis in potato, which prompted a successful search for a novel post-translational modifier of the AGPase enzyme. Bearing this example in mind, analyses of metabolite ratios in advanced genetic populations will likely prove to be an important route to identify previously uncharacterized mechanisms of metabolic regulation in the future. The rapidly increasing availability of genome information for multiple Arabidopsis ecotypes (Weigel and Mott, 2009; Schneeberger and Weigel, 2011), alongside the increase in the number of groups performing metabolomic analyses, should accelerate gene discovery.
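The logic of the cross-over theorem lends itself to a simple screen of metabolite data. The Python sketch below scans consecutive substrate/product pairs of a linear pathway for the signature described above, namely a substrate that rises while its product falls. It is an illustrative assumption of how such a screen might look rather than the procedure used in the cited studies; the pathway, metabolite names, and log2 fold changes are hypothetical.

```python
def find_crossovers(pathway, log2_fold_change, threshold=0.0):
    """Return (substrate, product) pairs showing a cross-over signature.

    pathway lists metabolites in reaction order; log2_fold_change maps
    each metabolite to log2(variant / reference). A pair is reported when
    the substrate is increased and the product decreased by more than
    `threshold` (an arbitrary cut-off chosen by the user).
    """
    hits = []
    for substrate, product in zip(pathway, pathway[1:]):
        substrate_up = log2_fold_change[substrate] > threshold
        product_down = log2_fold_change[product] < -threshold
        if substrate_up and product_down:
            hits.append((substrate, product))
    return hits


# Hypothetical linear pathway A -> B -> C -> D and invented fold changes.
pathway = ["A", "B", "C", "D"]
log2_fold_change = {"A": 0.8, "B": 1.1, "C": -0.9, "D": -0.7}

crossovers = find_crossovers(pathway, log2_fold_change, threshold=0.5)
print(crossovers)
```

Here the only pair in which the substrate accumulates and the product is depleted is B/C, pointing to the B-to-C step as the candidate site of altered regulation. Real data would of course require replication and statistical testing before any such step is called.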
A second approach to identification of metabolic regulators is comparative analysis of various Arabidopsis ecotypes. This approach was initiated to assess the natural variance in enzyme activities; the first study examined the activities of seven enzymes across 24 Arabidopsis ecotypes (Cross et al., 2006) whilst the second evaluated the activity of the most abundant protein in photosynthetic tissues, ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), across the abovementioned Col-0/C24 RIL population (Sulpice et al., 2007). In the former study it was observed that enzyme activities largely vary en masse, but are not well correlated with the levels of the metabolites measured in the same sample (Cross et al., 2006). This observation is consistent with the concept that metabolism is highly regulated at multiple levels. The latter study described application of a novel Rubisco assay to describe the characteristics of this enzyme in 118 Arabidopsis accessions, defining two loci for Rubisco activity and two for Rubisco activation state (Sulpice et al., 2007). These analyses were subsequently massively expanded to encompass either 94 or 112 accessions as well as quantification of select metabolites in an attempt to establish the major integrator of growth within the species. The results of these analyses spotlighted starch and protein biosynthesis as the major controlling factors with respect to total biomass. Transcript profiling in 21 accessions further revealed coordinated changes in expression of more than 70 carbon-regulated genes, identifying two (myo-inositol-1-phosphate synthase and a Kelch-domain protein) whose transcripts correlate with biomass. The impact of allelic variation at these two loci was shown by association mapping, identifying them as candidate genes to increase biomass production.
Kliebenstein and co-workers have also taken both association mapping and QTL-based approaches in their studies, profiling both primary and secondary metabolites alongside studies of gene expression (Kliebenstein et al., 2006; Rowe et al., 2008; Kliebenstein, 2009; Chan et al., 2010). They have expended considerable effort to understand whole genome expression QTL, revealing a high incidence of both cis- and trans-acting QTL, including nonadditive variation such as epistasis and transgressive segregation as well as genetic variation affecting entire transcriptional networks (Kliebenstein et al., 2006; Kliebenstein, 2009). They additionally linked this variation to phenotypic alterations in secondary metabolite content (Wentzell et al., 2007; Hansen et al., 2008) and more recently profiled primary metabolites (Rowe et al., 2008) in a 210-member RIL population. Statistical analysis of the resultant dataset suggested that epistatic interactions control a majority of the variation in the network of plant primary metabolism. They also identified 11 metabolite QTL hotspots, two of which overlapped the AOP and MAM loci previously characterized as QTL for glucosinolate accumulation.
Taking this analysis one step further, the authors constructed two biochemical networks de novo; however, it is too early to judge the accuracy of such an approach. The group subsequently profiled some 327 metabolites against greater than 200 000 SNPs (Chan et al., 2010). However, comparison of the resultant data from this study and that described above on the RIL population revealed that the higher level of genetic variation in the accession population was not reflected by a higher variation in the metabolome. They suggest that evolutionary constraints limit metabolic variation. Another important finding of this study is the large environmental influence on metabolite levels. Clearly such studies must be conducted in a range of environmental conditions since not only do individual metabolites change with the environment but the entire network behavior changes as well.

METABOLIC VARIANCE IN CROP SPECIES
While Arabidopsis is the best characterized plant species with respect to natural variation in the metabolome, an increasing number of studies are being carried out in crop species. Unfortunately, only a limited number of these studies have addressed environmental influences across independent harvests and/or locations. Nonetheless, considerable information has been gleaned from single-harvest studies (see for example Fraser et al., 2007; Kusano et al., 2007). Despite these limitations, studies on rice, the staple food crop of almost half of the world's population, are particularly pertinent from an applied perspective. Kusano et al. (2007) recently profiled a total of 70 rice cultivars using a combination of two-dimensional GC-MS, yielding a highly accurate inventory of the nutritional value of these cultivars. Similar smaller scale studies have also been carried out in sesame, broccoli, and mustard (Magrath et al., 1993; Laurentin et al., 2008; Rochfort et al., 2008). Application of MALDI/TOF-MS to individual mutagenized plants and Solanum pennellii ILs has been used to screen for fruit containing high levels of nutraceutical compounds such as carotenoids (Fraser et al., 2007). These studies all reveal large diversity within populations at the genetic and metabolic levels and hint that such information is likely of high value for breeding programs (Fernie and Schauer, 2009). As we will discuss in the following paragraphs, such populations are also highly interesting material in which to study metabolic regulation.
There are examples of multi-harvest replication of metabolomic studies in crop species. To date the majority of these have been focused on tomato and these studies are the focus of this section. However, interesting examples in maize (Harrigan et al., 2007a,b; Zhang et al., 2010; Lisec et al., 2011), Lolium perenne (Koulman et al., 2009), and wheat (Hazehzarghani et al., 2008) will also be discussed. In maize, analysis of a range of compositional traits including protein, oil, fatty acid, amino acid, and organic acid content was carried out in two independent maize hybrids grown at three separate locations (Harrigan et al., 2007a,b). This important proof-of-concept study demonstrated the high non-genetic variability in crop composition, illustrating the need for replicated trials. More recently a comparative analysis of the root metabolome of six parental maize inbred lines and their corresponding 14 hybrids was performed. The metabolic profile of each hybrid when compared to its parents is distinct and even reciprocal hybrids are easily distinguished. Reconstructed metabolic networks display a higher network density in most hybrids as compared to the corresponding inbred lines, suggesting that metabolite levels are subject to tighter control. On a broader scope, a maize diversity panel was screened for 10 key enzyme activities and heritable variation was found in each one (Zhang et al., 2010). Association mapping subsequently identified a novel amino acid substitution associated with variation in isocitrate dehydrogenase activity, demonstrating that this approach can identify putative functional sites. A later study of the same 10 enzymes used a maize intermated B73 × Mo17 mapping population, which provides almost a four-fold increase in genetic map distance compared with conventional mapping populations (Zhang et al., 2011). In total, 73 significant QTL that influence the activity of these 10 enzymes as well as 8 QTL that influence biomass were identified.
While some QTL were shared by different enzymes or biomass, the authors critically evaluated the probability that this may be fortuitous. All enzyme activity QTL were in trans to the known genomic locations of structural genes (i.e., genes that encode enzymes operating within the pathways), except for single cis-QTL for nitrate reductase, glutamate dehydrogenase, and shikimate dehydrogenase; the low frequency and low additive magnitude compared with trans-QTL indicates that, at least in this population, cis-regulation is relatively unimportant versus trans-regulation.
Returning to metabolite content QTL, large datasets have been obtained for L. perenne populations (Rasmussen et al., 2008), as well as wheat infected with Fusarium head blight (Hazehzarghani et al., 2008). By far the best characterized crop system, however, is tomato (Klee, 2010; Keurentjes and Fernie, 2011). In this species, a broad profiling of fruit volatiles, which are extremely important flavor components, in a population consisting of 74 Solanum lycopersicum × S. pennellii ILs yielded 100 QTL that were conserved across harvests (Tieman et al., 2006b). Metabolic and flux profiling of one of these QTL was instrumental in defining the pathway for synthesis of important phenylalanine-derived aromatic compounds in the fruit (Tieman et al., 2006a). Thirty additional QTL that affect the volatile emissions of red-ripe fruit were identified in a second population of ILs derived from a cross between S. lycopersicum and S. habrochaites grown in multiple seasons and locations (Mathieu et al., 2010). The same population has also recently been characterized for QTL for ripening-associated ethylene release (Dal Cin et al., 2009) and used to define a novel pathway for sesquiterpene biosynthesis from Z,Z-farnesyl pyrophosphate (Sallaud et al., 2009). In other studies of note, the volatile metabolite composition of some 300 compounds was determined across a population of 94 elite cultivars of tomato (Tikunov et al., 2005), whilst the primary metabolite composition of five wild species of tomato was assessed in comparison to the cultivated tomato (Schauer et al., 2005). These studies provide important inventories of the metabolic differences between genotypes. That said, it will be some time before our knowledge is sufficiently advanced that we can readily use such information for predictive breeding (Keurentjes and Fernie, 2011). Similar, albeit not quite so extensive, studies were performed on intraspecific crosses of S. lycopersicum cultivars (Causse et al., 2002), and have subsequently been validated in replicated experiments (Zanor et al., 2009).
The same S. pennellii ILs described above were profiled using an established GC-MS method in replicated harvests, resulting in identification of 889 QTL covering 74 metabolites including important primary metabolites such as sugars and organic acids as well as essential amino acids, intermediate metabolites and vitamins (Schauer et al., 2006). Importantly, although metabolite content was elevated in many cases, the vast majority of these QTL were associated with a yield penalty. In a subsequent study the heritability of these traits was established (Schauer et al., 2008). For this purpose, the S. pennellii ILs were grown alongside lines heterozygous for the introgression (ILHs), allowing evaluation of both heritability and the mode of inheritance. These studies revealed that mean heritability of the metabolite QTL was generally relatively low (as in Arabidopsis; Rowe et al., 2008). However, a handful of the traits were nevertheless highly conserved and displayed reasonable heritability. Comparative study of the ILs and ILHs revealed that most of the metabolic QTL were dominant, with a considerable number displaying an additive or recessive mode of action and only a negligible number displaying overdominant phenotypes. Interestingly, the mode of inheritance was quantitatively different between diverse compound classes, and several metabolite pairs displayed a similar mode of inheritance at the same chromosomal loci, suggesting that the variation is likely to be mediated by enzymes involved in their interconversion (Schauer et al., 2008).
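To make the logic of comparing ILs with ILHs concrete, the following is a minimal sketch of how a mode of inheritance could be assigned from three trait means: the recurrent cultivated parent, the homozygous IL, and the heterozygous ILH. It uses the textbook dominance ratio d/a and is not the statistical procedure of Schauer et al. (2008); the function name, its arguments, and the tolerance `tol` are hypothetical choices made for illustration only.

```python
def classify_mode_of_inheritance(parent_mean, il_mean, ilh_mean, tol=0.25):
    """Classify a QTL from trait means using the dominance ratio d/a.

    parent_mean: mean metabolite level in the recurrent parent (no introgression)
    il_mean:     mean level in the line homozygous for the introgression (IL)
    ilh_mean:    mean level in the line heterozygous for the introgression (ILH)
    tol:         hypothetical tolerance around the ideal ratios 0, +1 and -1
    """
    # Additive effect: half the difference between the two homozygotes.
    a = (il_mean - parent_mean) / 2.0
    if a == 0:
        raise ValueError("IL and parent means are identical; no effect to classify")

    # Dominance deviation: distance of the heterozygote from the mid-parent value.
    midpoint = (il_mean + parent_mean) / 2.0
    d = ilh_mean - midpoint
    ratio = d / a

    if abs(ratio) <= tol:
        return "additive"        # heterozygote close to the midpoint
    if abs(ratio) > 1.0 + tol:
        return "overdominant"    # heterozygote lies outside both homozygotes
    # ratio near +1: ILH resembles the IL, so the introgressed allele is dominant;
    # ratio near -1: ILH resembles the recurrent parent, so it is recessive.
    return "dominant" if ratio > 0 else "recessive"


if __name__ == "__main__":
    # Illustrative, made-up trait means (arbitrary units), not data from any study.
    for ilh in (15.0, 20.0, 10.0, 30.0):
        print(ilh, classify_mode_of_inheritance(10.0, 20.0, ilh))
```

In practice each mean carries experimental error, so a real analysis would test the contrasts statistically rather than rely on a fixed tolerance.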
The S. pennellii ILs have also been characterized for trichome specialized metabolites including terpenes, flavonoids, and acyl sugars (Schilmiller et al., 2010). Metabolite profiling led to the discovery of ILs producing different acyl chain substitutions on acyl sugar metabolites as well as two regions quantitatively influencing acyl sugar content. A QTL that influenced the types of glycoalkaloids was also identified. These results illustrate the power of QTL mapping for identification of novel enzymatic functions and pathways. Following a similar approach, Tieman et al. (2006b) used ILs and reverse genetics in combination with volatile and isotope tracer analysis to establish the route of 2-phenylethanol and phenylacetaldehyde biosynthesis in tomato. A second flavor volatile QTL, responsible for synthesis of methylsalicylate, was shown to result from altered expression, i.e., an expression QTL (eQTL), of the gene encoding the biosynthetic enzyme salicylic acid methyltransferase.
Another important agronomic property in tomato, total soluble solids content, has been defined at the genetic level. An early study utilizing the S. pennellii ILs mapped the moderate QTL Brix 9-2-5 to a 484-bp region of the cell wall invertase gene LIN5 (Fridman et al., 2000), although there was no difference in expression or protein content of LIN5 in the IL harboring this QTL (Fridman et al., 2004). QTL analysis of five different tomato species delimited the functional polymorphism of Brix 9-2-5 to an amino acid at the fructosyl binding site near the catalytic site in the invertase crystal structure, with enzyme kinetic analysis of recombinant protein demonstrating that the S. pennellii allele was more efficient in degrading sucrose (Fridman et al., 2004). Subsequent experiments involving RNA interference of this isoform resulted in reduced Brix and sink strength, thus confirming the results obtained in the heterologous system (Zanor et al., 2009). Currently, optimized assays for a set of key enzymes of central metabolism are being used to profile the S. pennellii ILs. Once these analyses are finished, it will be highly informative to integrate them with those of the metabolites.
Further proof of the value of metabolic QTL analysis is illustrated by studies on branched chain amino acid (BCAA) metabolism, for which four co-ordinate QTL were identified (Schauer et al., 2006). As a first approach we focused on branched chain aminotransferases (BCATs), mapping all six members of the family. BCAT1 is an eQTL whereas BCAT4 is a protein quality QTL (i.e., a QTL that does not affect protein abundance but rather its relative efficiency within a given biological process; Maloney et al., 2010). We next mapped a further 22 putative gene functions associated with BCAA metabolism. This allowed us to define the map positions of 24 genes (with two of the putative gene functions being encoded by two independent genes). Eight colocalized with BCAA QTL, including those encoding ketol-acid reductoisomerase (KARI), dihydroxy-acid dehydratase (DHAD), and isopropylmalate dehydratase (IPMD; Kochevenko and Fernie, 2011). Quantitative evaluation of the expression of these genes revealed that the S. pennellii allele exhibited altered expression of IPMD, whereas expression of KARI and DHAD was invariant across the genotypes. Whilst the antisense inhibition of IPMD resulted in increased BCAA, the antisense inhibition of KARI or DHAD had no effect on fruit BCAA contents (Kochevenko and Fernie, 2011).
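The colocalization step described above, in which mapped candidate genes are compared with the positions of metabolite QTL, can be sketched as a simple interval lookup. This is a hedged illustration under stated assumptions rather than the procedure used in the cited studies: the `Interval` type, the function `colocalizing_genes`, and the gene and QTL names and coordinates in the example are all hypothetical placeholders.

```python
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A QTL region, e.g. the map interval covered by one introgression."""
    chromosome: int
    start_cm: float
    end_cm: float


def colocalizing_genes(
    genes: Dict[str, Tuple[int, float]],
    qtl_intervals: Dict[str, Interval],
) -> Dict[str, List[str]]:
    """Return, for each gene, the QTL whose interval contains its map position.

    genes:         gene name -> (chromosome, position in cM)
    qtl_intervals: QTL name  -> Interval
    Genes that fall inside no QTL interval are omitted from the result.
    """
    hits: Dict[str, List[str]] = {}
    for gene, (chrom, pos) in genes.items():
        matches = [
            qtl
            for qtl, iv in qtl_intervals.items()
            if iv.chromosome == chrom and iv.start_cm <= pos <= iv.end_cm
        ]
        if matches:
            hits[gene] = sorted(matches)
    return hits


if __name__ == "__main__":
    # Made-up coordinates for illustration only; not real map positions.
    example_genes = {"gene_A": (3, 42.0), "gene_B": (7, 10.5), "gene_C": (3, 90.0)}
    example_qtl = {"qtl_1": Interval(3, 35.0, 50.0), "qtl_2": Interval(7, 60.0, 80.0)}
    print(colocalizing_genes(example_genes, example_qtl))
```

Colocalization of this kind only nominates candidates; as the IPMD, KARI and DHAD results show, expression analysis and reverse genetics are still needed to establish which colocalizing gene underlies a QTL.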

CONCLUDING REMARKS
The above examples in both Arabidopsis and crop species illustrate how forward genetic approaches based on natural variation can produce high resolution information concerning metabolic regulation. This is the case at the levels of the individual enzyme, whole pathways, and even metabolic networks. They furthermore demonstrate that natural variation is an underexploited resource of biochemical diversity and one that will certainly be an important resource for breeding approaches toward metabolic engineering. Given the ever increasing number of crop species for which full-genome sequences are becoming available, it is highly likely that such studies will be greatly aided by the development of translational approaches both at the molecular (Mutwil et al., 2011) and phenotyping levels. One current drawback of this approach is that it is considerably slower than proceeding directly to reverse genetics. That said, the development of a wide array of genetic materials, including those resulting from TILLING (Till et al., 2006), and the adoption of ever more sophisticated rapid screening technologies such as those afforded by virus-induced gene silencing (Ruiz et al., 1998; Quadrana et al., 2011) will likely accelerate this process considerably. Regardless, the forward genetic approach has at least two advantages: (i) by taking a top-down approach it may uncover different levels of metabolic regulation within a single experiment and (ii) given that the whole genome is considered, it is not restricted to evaluation of previously known enzymes or regulators thereof. This approach is entirely complementary to the currently more commonly used reverse genetic approach. While the forward genetic approach is in its infancy, we are convinced that it will prove a highly powerful tool for the identification of novel mechanisms of metabolic regulation.

ACKNOWLEDGMENTS
We thank Dr. Takayuki Tohge for help with Figure 1.