The female gametophyte: an emerging model for cell type-specific systems biology in plant development

Systems biology, a holistic approach describing a system emerging from the interactions of its molecular components, critically depends on accurate qualitative determination and quantitative measurements of these components. The development and improvement of large-scale profiling methods (“omics”) now facilitate comprehensive measurements of many relevant molecules. For multicellular organisms, such as animals, fungi, algae, and plants, the complexity of the system is augmented by the presence of specialized cell types and organs, and a complex interplay within and between them. Cell type-specific analyses are therefore crucial for the understanding of developmental processes and environmental responses. This review first gives an overview of current methods used for large-scale profiling of specific cell types, exemplified by recent advances in plant biology. The focus then lies on suitable model systems to study plant development and cell type specification. We introduce the female gametophyte of flowering plants as an ideal model to study fundamental developmental processes. Moreover, the female reproductive lineage is of importance for the emergence of evolutionary novelties such as an unequal parental contribution to the tissue nurturing the embryo or the clonal production of seeds by asexual reproduction (apomixis). Understanding these processes is not only interesting from a developmental or evolutionary perspective, but also bears great potential for further crop improvement and the simplification of breeding efforts. We finally highlight novel methods that are already available or that will likely soon facilitate large-scale profiling of the specific cell types of the female gametophyte in both model and non-model species. We conclude that it may take only a few years until an evolutionary systems biology approach toward female gametogenesis deciphers some of its biologically most interesting and economically most valuable processes.


SYSTEMS BIOLOGY: AN INTEGRATED APPROACH TO MODEL BIOLOGICAL PROCESSES WITH LARGE-SCALE DATA
Since the foundation of the Institute for Systems Biology in the year 2000 and its formal definition at the beginning of the twenty-first century (Ideker et al., 2001; Kitano, 2002), systems biology has been a steadily growing field of research. As an integrative approach, systems biology is markedly different from the reductionistic approach generally used in molecular biology and genetics. Powered by the central dogma of biology, where a gene is transcribed to mRNA, which is then translated into proteins, molecular biology and genetics have successfully identified genes, their functions, and the processes they are involved in. However, the implicit link of a gene to a certain function or a phenotype is an oversimplification of the underlying process. It thus frequently misses important interactions with other cellular or environmental factors (e.g., responses to environmental conditions like a temperature-dependent phenotype of a mutant). In contrast, systems biology may be described as an attempt to quantitatively and/or qualitatively describe and understand the global behavior of a biological entity, emerging from the interactions between its molecular components. Such a comprehensive understanding would allow the prediction and modeling of the biological entity, its precise control, and ultimately the targeted manipulation of a complex biological system (reviewed in Kitano, 2002; Yuan et al., 2008; Fukushima et al., 2009; Chuang et al., 2010; Katari et al., 2010; Weckwerth, 2011).
Systems biology comprises and integrates experimental studies and large-scale data sets derived from high-throughput technologies (omics), such as transcriptomics (RNA profiling), proteomics (analysis of proteins), and metabolomics (profiling of metabolites). However, to achieve a full picture of the dynamic processes of a cell or organism, further levels also need to be taken into account: epigenetic regulatory processes based on the modification of chromatin components or DNA (epigenomics), the translation of mRNAs to proteins (translatomics), complex formation of proteins with proteins or nucleic acids (interactomics), protein modifications such as phosphorylation, which is important for the regulation of protein activity (phospho-proteomics), and the transport of ions or metabolites (fluxomics) (reviewed by Sheth and Thaker, 2014). One of the most crucial aspects for systems biology approaches is the comprehensiveness of the omics data (Kitano, 2002). For a given method this includes the number of items that can be measured at once (e.g., transcripts with transcriptomics). For the entire system, it is then important whether the relevant items (e.g., enzymes and metabolites) or processes (e.g., posttranslational modifications) can be accurately measured with a combination of certain methods. An additional level of complexity may be imposed by the requirement of a high spatial and/or temporal resolution. For a single, isolated cell this can refer to specific organelles, subcellular compartments, certain domains of the plasma membrane, and the stage of the cell cycle. For a unicellular organism like yeast, this may be augmented by studying the cell-to-cell variability within the population (Pelkmans, 2012). In multicellular organisms, each cell (type) has a specific function and position within an organ. Its role and differentiation status may be influenced by local signals as well as systemic signals originating from other organs (e.g., hormones). In addition, the temporal coordinate expands to developmental stages of the organs or the life span of the organism.
Consequently, a complete understanding at the systems level requires highly resolved, quantitative spatio-temporal data on the individual components and their interactions, and the integration of the data into models. On the one hand, integration of these data with computational methods can help to characterize previously unknown components (e.g., genes) of a system, as exemplified for yeast (Brown et al., 2006). On the other hand, the data may be used in a mathematical model that describes the system and allows the prediction of its behavior and the formulation of hypotheses (Süel et al., 2007). Ultimately, the integration of omics data, the formulation of mathematical models, the generation of hypotheses, and the experiments are interlinked and benefit from each other. A possible extension of systems biology is the use of interspecies comparisons to, for example, elucidate the extent to which genotypic variation translates into phenotypic differences (Konstantinidis et al., 2009). Even more broadly, evolutionary systems biology may be recognized as an approach to describe and understand how biological systems are shaped by evolution and steer it at the same time (reviewed in Soyer, 2012).
Prior to the understanding of a complex organism composed of many different cell and tissue types, investigations of distinct cell types can lead to an understanding of basic processes governing cellular specification, differentiation, and metabolism. To date, yeast (S. cerevisiae) is a widely used model system appreciated as the currently best understood cell (Boone, 2014). Although yeast is evolutionarily only distantly related, its pathways have been shown to have considerable similarities to those in plants, animals, and humans (Ideker et al., 2001). In addition, yeast serves for the production of food and pharmaceuticals. Due to its simplicity and its importance for biotechnology and biomedical research, yeast has shaped modern molecular biology to a great extent. Indeed, it has been a pioneering organism in systems biology (reviewed in Botstein and Fink, 2011; Österlund et al., 2012; Boone, 2014), starting from gene expression and regulatory networks discovered during early transcriptome studies and their integration with other genome-wide data, through genetic interaction networks obtained by crossing thousands of mutant strains (Costanzo et al., 2010) and modeling of gene expression as a Quantitative Trait Locus (eQTL, Brem et al., 2002), to genome-wide metabolic models. However, given the unicellularity of yeast, it can hardly serve as a developmental model for complex multicellular animals and even less so for plants. In plants, systems biology is less advanced for several reasons, including the higher complexity of most plant genomes, large gene families, the multitude of primary and secondary metabolites, and the lack of suitable in vitro systems or cell lines for most plant tissues. Most efforts in plant research thus require in vivo experiments, making the procedures generally more difficult and less suitable for high-throughput approaches. As a consequence, data generation can be a severely limiting factor for plant systems biology. On the other hand, results obtained in vivo are of high relevance for the process under investigation.
Despite the above-mentioned obstacles, substantial progress in the analysis of specific cell types in plants has been made over the last decade. Facilitated by advances in high-throughput profiling technologies and methods for the isolation of individual cell types, recent studies have focused on the analysis of specific cell types or even single cells (Figure 1). To investigate cell type-specific processes in higher plants, root hairs and trichomes have been used as models, both for their physiological importance and their accessibility at the epidermal surface (for details see below; Ishida et al., 2008; Brechenmacher et al., 2009, 2010; Dai et al., 2010; Libault et al., 2010a,b; Schilmiller et al., 2010; Nestler et al., 2011; Van Cutsem et al., 2011; Dai and Chen, 2012; Rogers et al., 2012; Tissier, 2012; Qiao and Libault, 2013). In addition, starting with only a few examples at the beginning of the twenty-first century (Kehr, 2001), cell type-specific transcriptional profiling has become a robust and frequently used method. In the model plant Arabidopsis thaliana, novel insights into plant development and cellular responses to environmental stimuli were, for example, gained through studies on individual cell types of the root, root hairs, trichomes, and guard cells, and by transcriptional profiling during male and female gametogenesis (reviewed in Taylor-Teeples et al., 2011; Schmidt et al., 2012; Wuest et al., 2013). These examples clearly illustrate the importance of cell type-specific investigations for a detailed understanding of differentiation processes and environmental responses of distinct cell types. However, depending on the cell type under investigation, the currently available methods for cell isolation may be challenging, time-consuming, or limited to a subset of omics approaches (e.g., Laser-Assisted Microdissection (LAM) of rare cell types, Wuest et al., 2013). While studies focusing on specific cell types that can be isolated in quantities high enough for the full set of omics approaches can serve as initial models for cell type-specific systems biology in plants (Libault et al., 2010a), the ultimate goal must be that the full set of methods can be applied to any cell type of interest.

METHODS FOR THE ACQUISITION OF LARGE-SCALE QUANTITATIVE DATA FROM SPECIFIC CELL TYPES
Large-scale profiling of distinct cell types critically depends on the possibility to isolate these cells in sufficient purity and quantity, as well as the sensitivity and accuracy of the profiling methods. Despite the rapid improvements of established and novel tools for systems biology, the demand for fast and easily applicable methodologies for cell type-specific analyses is not yet satisfied. Further challenges are associated with the requirement for normalization and integration of different data types, and the increasing demand for platforms allowing storage and sharing of the rapidly growing amount of large-scale datasets (reviewed by Chuang et al., 2010; Katari et al., 2010; Gomez-Cabrero et al., 2014; Sheth and Thaker, 2014). In brief, three steps are of great importance for cell type-specific systems biology: (i) isolation and purification of the specific cell type, (ii) profiling of the selected molecular compounds, and (iii) data analysis, integration, storage, and sharing. In the following sections, we will present current methods to acquire large-scale quantitative data required for systems biology. We will focus on methods allowing genome-wide cell type-specific analyses and present representative examples. For a discussion on the computational challenges in systems biology, the reader is referred to several recent reviews (Ahrens et al., 2007; Yuan et al., 2008; Fukushima et al., 2009; Chuang et al., 2010; Katari et al., 2010; Liberman et al., 2012; Fukushima et al., 2014; Gomez-Cabrero et al., 2014; Robinson et al., 2014).
FIGURE 1 | Cell and tissue types frequently used for cell type-specific systems biology and omics studies in plants. For the germlines, only the mature gametophytes are shown. sp, sperm cell; veg, vegetative cell; syn, synergids; cen, central cell; egg, egg cell.

Methods for the Isolation of Specific Cell Types
A few cell types in plants are exposed on the surfaces of tissues and can be collected by abrasion or mechanical detachment. Depending on the species, relatively simple mechanical isolation procedures for trichomes and root hairs have enabled a broad spectrum of analyses. Mechanical isolation of trichomes allowed transcriptomics and metabolomics in various species (for an integrated database see Dai et al., 2010) and proteomics (Schilmiller et al., 2010; Van Cutsem et al., 2011). Root hairs are another example of an exposed cell type, for which relatively simple isolation procedures facilitated transcriptomics (Libault et al., 2010b), proteomics (Brechenmacher et al., 2009; Nestler et al., 2011), and metabolomics (Brechenmacher et al., 2010). Certain other cell types can be isolated by tissue disruption, followed by centrifugation-based methods or manual isolation of the dissociated cells under a microscope using a micropipette (possibly aided by a marker for the cell type of interest). Examples include specific cell types from the male or female reproductive lineages, plant mesophyll cells, and guard cells (reviewed by Dai and Chen, 2012; Schmidt et al., 2012; Wuest et al., 2013). Proteomic profiling has, for example, been performed on Brassica napus guard cells and mesophyll cells that could be purified as protoplasts (Zhu et al., 2009).
However, for most cell types these methods are not applicable. Several methods for the isolation of specific cell types embedded in differentiated tissues have been established. Fluorescence-Activated Cell Sorting (FACS) can be used to sort fluorescent cells based on their light scattering characteristics and fluorescence (reviewed by Hu et al., 2011). This method allowed high resolution transcriptional profiling of different cell types in the Arabidopsis root, and, more recently, proteomics (Petricka et al., 2012) and metabolite mapping of selected root cell and tissue types (Brady et al., 2007; reviewed by Benfey, 2012; Moussaieff et al., 2013). Similarly, Fluorescence-Activated Nuclei Sorting (FANS) has been established and, for example, used to isolate endosperm nuclei for profiling of RNA activity or epigenetic modifications (Weinhofer et al., 2010; Weinhofer and Köhler, 2014). Despite the great potential of FACS/FANS for plant cell type-specific systems biology, both approaches have certain limitations: They can only be applied if transgenic lines carrying cell type-specific fluorescent markers can be established, and they are thus not suitable for most non-model species. In addition, depending on the tissue type, longer enzymatic incubations are required to digest the cell walls and to release the protoplasted cells prior to sorting (Evrard et al., 2012). Consequently, changes induced by the procedure itself, for example in the transcriptome or metabolome, cannot be fully excluded. Alternatively, the INTACT method (Isolation of Nuclei TAgged in specific Cell Types) allows the isolation of nuclei expressing a biotinylated nuclear envelope protein by affinity purification with streptavidin-coated beads (Deal and Henikoff, 2011). This method is suitable to study epigenetic modifications (DNA methylation or histone modifications) and to profile the RNA within the nucleus. To study actively translated mRNAs bound to ribosomes (translatome), small epitope tags can be fused to a ribosomal protein to allow immunopurification of the ribosomes containing the mRNAs with a method named TRAP (Translating Ribosome Affinity Purification; reviewed in Bailey-Serres, 2013). Alternatively, RNAs binding to RNA binding proteins involved in the formation of ribonucleoprotein (RNP) complexes can be profiled by immunoprecipitation of an epitope-tagged protein (RNP ImmunoPurification, RIP; Bailey-Serres, 2013). It has to be noted that the analyses of transcriptome and translatome abundance will not give the same results, because not all mRNAs present in a cell are actively translated at a given time point. In this respect, profiling of mRNAs bound to ribosomes gives complementary results to transcriptome profiling as the readouts are closer to the synthesis of proteins (Bailey-Serres, 2013). Like FACS and FANS, INTACT, TRAP, and cell type-specific RIP require the use of transgenic lines and pre-existing knowledge about cell type-specific promoters or markers.
An alternative method not requiring any molecular knowledge is LAM (Kerk et al., 2003). For LAM, plant tissues are typically fixed and embedded in paraffin wax (reviewed in Schmidt et al., 2012; Wuest et al., 2013) or resin (Tucker et al., 2012; Okada et al., 2013). Thin sections of the tissues (typically between 6 and 10 µm) are subsequently mounted on metal-framed plastic slides and used to isolate the cell types of interest after removing the wax or resin and drying the tissues on the slides (Okada et al., 2013; Wuest and Grossniklaus, 2014). Alternatively, the tissue may also be embedded in optimal cutting temperature compound for cryosectioning, followed by on-slide tissue dehydration and LAM Walbot, 2012, 2014). The main constraint of LAM is that harvesting sufficient material for downstream omics methods can be very time-consuming. Furthermore, the suitability for single cell isolation depends on the optical resolution in sectioned tissues and on how readily the cell type of interest can be recognized. In addition, the physical properties of the laser beam of the instrument used can impose limitations on which cell types can be isolated. Thus, the time required for collecting enough material for one sample is largely dependent on the cell type of interest. So far, the applications of LAM for cell type-specific omics have been restricted to transcriptional profiling, e.g., to study cell type-specification in the female reproductive lineage in Arabidopsis thaliana, Boechera gunnisoniana, and Hieracium praealtum (Wuest et al., 2010; Schmidt et al., 2011, 2014; Okada et al., 2013). However, other applications, such as genome-wide profiling of DNA methylation, are likely feasible (see below).

Transcriptomics
Transcriptome profiling encompasses the identification and quantification of all expressed RNA transcripts at a given time point (mRNA, tRNA, microRNA). However, due to the frequent use of oligo-dT priming during cDNA synthesis or the hybridization to microarrays covering only coding regions of the genome, many studies are restricted to mRNAs or a subset of mRNAs. Several types of microarrays were produced and extensively used for the analyses of gene expression in different plant species, including the model plant Arabidopsis thaliana and different important crop species like maize, rice, and barley (reviewed in Sheth and Thaker, 2014). The Affymetrix ATH1 GeneCHIP (www.affymetrix.com), the most popular microarray for Arabidopsis has for example been used to profile a large variety of different tissue types (e.g., Schmid et al., 2005), specific cell types of the root isolated through FACS (Birnbaum et al., 2003), and specific cell types of the male and female reproductive lineages (reviewed in Schmidt et al., 2012). In addition to well established tools for data analysis, the wealth of publicly available datasets generated on the same platform makes commonly used microarrays a very valuable tool for systems biology (Katari et al., 2010).
Apart from microarrays, several platforms for Next Generation Sequencing (NGS) have been developed in recent years and are now routinely used for transcriptional profiling (RNA-Seq; see Mardis, 2013, for a review on NGS platforms). RNA-Seq has several advantages as compared to the use of microarrays, including a higher dynamic range, higher sensitivity, and whole-genome coverage allowing the identification of previously unknown transcripts and splice variants (reviewed in Schmidt et al., 2012). A major advantage is the applicability to non-model species, either through de novo assembly of the short reads into transcripts or by the use of a reference transcriptome either produced separately or taken from a public database (e.g., the ongoing effort to sequence 1000 plant transcriptomes, www.onekp.com). Examples of such an approach are the central cells of Arabidopsis thaliana, and cells of the female reproductive lineage in Hieracium praealtum and Boechera gunnisoniana (Okada et al., 2013; Schmidt et al., 2014). Several tools for RNA-Seq data analysis are available (see Sheth and Thaker, 2014, for a selection of software tools, and Schmid and Grossniklaus, 2015, for Rcount, a count tool addressing the problem of reads aligning at multiple locations in the genome, or reads aligning at positions where two or more genes overlap). Current challenges are the increasing demand for standardized annotations of datasets and the development of computational methods allowing the integration of data from different studies using different methods and platforms. In the future, the integration of data from different species will be of great value for plant systems biology, allowing researchers to gain insights into conserved regulatory mechanisms, environmental adaptations, and evolutionary changes.
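The problem of reads aligning at multiple locations can be illustrated with a minimal Python sketch. It does not reproduce the Rcount algorithm; it only shows one simple scheme, assumed here for illustration, in which a read that aligns to several genes contributes an equal fraction to each of them. The function name `count_reads`, the read identifiers, and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def count_reads(alignments):
    """Count reads per gene, splitting multi-mapped reads equally.

    alignments: dict mapping a read ID to the list of genes it aligns to.
    Returns a dict mapping each gene to its (possibly fractional) count.
    """
    counts = defaultdict(float)
    for genes in alignments.values():
        unique_genes = set(genes)
        if not unique_genes:
            continue  # read did not align to any annotated gene
        # Each read carries a total weight of 1, shared among its genes.
        weight = 1.0 / len(unique_genes)
        for gene in unique_genes:
            counts[gene] += weight
    return dict(counts)


# Hypothetical toy input: read2 aligns equally well to two genes.
alignments = {
    "read1": ["geneA"],
    "read2": ["geneA", "geneB"],
    "read3": ["geneB"],
    "read4": [],
}
counts = count_reads(alignments)
```

With this scheme the total count equals the number of aligned reads, so no read is counted twice; more elaborate tools may instead weight multi-mapped reads by the evidence from uniquely mapped reads.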

Proteomics
In addition to the analysis of gene expression and actively translated mRNAs, the investigation of proteins and protein modifications (e.g., phospho-proteomics and glyco-proteomics) adds further levels of complexity. From a systems biology perspective the aim is the combination of cell type-specific proteomics with transcriptomics and metabolomics to elucidate and model regulatory networks (reviewed in Dai and Chen, 2012). In the early days of proteomics, 2D gel electrophoresis was frequently used to separate the proteins in a sample and to identify spots representing proteins differing in abundance between two samples (reviewed by Schulze and Usadel, 2010). However, the protein or protein mixture in one spot could only be identified by excising the spot and analyzing it by Mass Spectrometry (MS). To date, proteomics largely depends on the use of various MS methods in combination with different protein separation procedures. Typically, proteins are first digested with trypsin and subsequently either analyzed directly by MS or first separated by chromatography before MS. MS methods have greatly improved with the development of soft ionization methods like ElectroSpray Ionization (ESI) in solution (typically aqueous or organic solvents) or Matrix Assisted Laser Desorption Ionization (MALDI, Hollenbeck et al., 1999; Schulze and Usadel, 2010). Both methods generate intact gas phase ions that are introduced into mass analyzers and sorted depending on their mass-to-charge ratio, e.g., using their Time-Of-Flight (TOF, Hollenbeck et al., 1999; for a recent summary of mass analyzers see Lee et al., 2012; for a description of Orbitrap mass analyzers see Perry et al., 2008). However, detection based on peptide mass-to-charge ratios is largely qualitative and can only be used for quantification in two or more samples acquired under standardized conditions (Schulze and Usadel, 2010). Thus, stable isotope or chemical labeling is frequently applied for quantification in proteomic methods (reviewed in Schulze and Usadel, 2010). While software and algorithms for protein identification are well established, quantitative analysis remains more challenging (Schulze and Usadel, 2010; Sheth and Thaker, 2014; see Sakata and Komatsu, 2014, for a recent survey on proteomics repositories and databases).
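How time-of-flight sorting relates to the mass-to-charge ratio can be made explicit with a short textbook derivation. It is a simplified sketch for an idealized linear TOF analyzer and is not taken from the cited references; the symbols (ion mass $m$, charge number $z$, elementary charge $e$, acceleration potential $U$, drift length $L$, velocity $v$, flight time $t$) are assumptions introduced here.

```latex
% An ion accelerated through the potential U gains kinetic energy z e U:
\[
  z e U = \tfrac{1}{2} m v^{2}
  \quad\Longrightarrow\quad
  v = \sqrt{\frac{2 z e U}{m}} .
\]
% It then crosses a field-free drift region of length L in the time
\[
  t = \frac{L}{v} = L \sqrt{\frac{m}{2 z e U}}
  \quad\Longrightarrow\quad
  \frac{m}{z} = \frac{2 e U\, t^{2}}{L^{2}} .
\]
```

Under these assumptions the flight time scales with the square root of the mass-to-charge ratio, so ions with a lower $m/z$ reach the detector first.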
To date, only a restricted number of plant cell types have been profiled in a cell type-specific manner by proteomics, including guard cells, mesophyll cells, trichomes, root hair cells, leaf epidermal cells, lily and rice sperm cells, different stages of pollen development in tobacco, Arabidopsis, and tomato, and rice egg cells (Brechenmacher et al., 2009; Grobei et al., 2009; Abiko et al., 2013; Chaturvedi et al., 2013; Ischebeck et al., 2014, and reviewed by Dai and Chen, 2012; Wuest et al., 2013). As compared to transcriptomics approaches, a larger amount of starting material is required. For example, approximately 40 µg of protein were isolated to study the proteome during tobacco pollen development (Ischebeck et al., 2014). In addition, the number of proteins detected is typically in the range of 10-30% of the transcripts identified from the same cell or tissue type, as exemplified by a study on Arabidopsis pollen, in which 3599 proteins as compared to 11,150 expressed genes were reported (Grobei et al., 2009). This quantitative difference largely reflects the difference in the sensitivity of the methods and likely only to a smaller extent meaningful biological differences. Nevertheless, as only a few proteins had been identified in earlier studies, e.g., from maize egg cells (Okamoto et al., 2004), these data reflect a great improvement and a rapid advance since the term proteomics was coined in 1997 (James, 1997).

Protein-Protein Interactions
For studies of protein-protein interactions, the major methods used are Yeast Two-Hybrid (Y2H) screens, Affinity Purification Mass Spectrometry (AP-MS), or Bimolecular Fluorescence Complementation (BiFC) (reviewed in Zhang et al., 2010). Y2H assays take advantage of the bipartite structure of the yeast GAL4 transcriptional activator consisting of two functional domains, a transcription activation domain and a DNA-binding domain. In Y2H assays, the bait and the target protein are each fused to one of the two functional domains of GAL4. Upon interaction, they reconstitute a functional GAL4 protein that binds to its target promoter (UAS GAL4) to activate the expression of a downstream gene encoding a selectable marker. Apart from a high false-positive rate, the use of yeast itself is a major drawback of the method. While cell type-specific cDNA libraries can be used to profile pairwise protein interactions, the system does not truly reflect the in vivo state of a specific plant cell (e.g., cofactors of an interaction may be missing). Several systems similar to Y2H assays have been established to specifically study membrane proteins (e.g., split-ubiquitin system; Obrdlik et al., 2004; Chen et al., 2010). For AP-MS, a bait protein is fused to an affinity tag for expression in vivo. The tagged protein of interest is subsequently purified as a complex with interacting proteins or other molecules and assayed by MS. This method is also associated with a relatively high false-positive rate due to protein contaminants. While the method is well suited for cell type-specific studies if the expression of the tagged protein is driven by a cell type-specific promoter, true omics-scale profiling can hardly be achieved, as a precondition would be the cell type-specific tagging of all proteins represented in a cell. This also holds true for BiFC, where a fluorescent protein (YFP, RFP, or GFP) is split into two non-fluorescent halves that are reconstituted to a fluorescent protein upon interaction of the bait and target proteins they are fused to (reviewed by Zhang et al., 2010). While BiFC has the advantage that spatial and temporal interactions can be resolved, it is also associated with a high false-positive rate. Consequently, methods for true cell type-specific large-scale protein-protein interaction studies in plants are lacking to date. Nonetheless, the currently available data on protein-protein interactions, as for example the recently established membrane protein interactome, may help to resolve certain dependencies within regulatory networks (see Sheth and Thaker, 2014, for a summary of the available databases).

Protein-DNA Interactions
Interactions between proteins and DNA comprise several functional aspects, for example nucleosome occupancy, specific histone modifications, or transcription factor binding. These interactions may be studied using either Chromatin ImmunoPrecipitation (ChIP, Orlando and Paro, 1993), or DNA adenine methyltransferase IDentification (Dam-ID, van Steensel and Henikoff, 2000). In both cases, the interaction of one protein (variant) with the DNA is monitored genome-wide. During the ChIP procedure, the DNA is cross-linked by formaldehyde to bound proteins before fragmentation by sonication. Chromatin fragments are then isolated with antibodies against the protein (variant) of interest. After recovery of the co-purified DNA by reversing the cross-links, the DNA sequence can be identified using microarray hybridization or high-throughput sequencing (He et al., 2011). Protocols facilitating cell type-specific ChIP (Chromatin Affinity purification from Specific cell Types by ChIP; CAST-ChIP), without the need for purification of the cell type of interest or a protein-specific antibody, have been developed (Schauer et al., 2013). However, these protocols rely on transgenics and specific promoters. In addition, we are not aware of a report where this method has been applied in plants or used to study rare cell types.
For Dam-ID, the protein of interest is fused to an adenine methyltransferase of E. coli (Dam, Greil et al., 2006). Endogenous methylation of adenine is absent in most eukaryotes. Upon expression of the fusion protein, Dam is targeted to the native binding sites of the protein fused to it. This results in a localized methylation of adenines in the GATC sequence context. These regions can then be identified using methylation-sensitive restriction enzymes and microarray hybridization or high-throughput sequencing (Greil et al., 2006; Luo et al., 2011). Tissue or cell type-specific expression of the fusion protein can be used to overcome the need for cell isolation and has been shown to be highly specific (targeted DamID, "TaDa," Southall et al., 2013). The major disadvantages of the method are the requirement for transgenics and specific promoters, as well as the need for optimization of the expression level to avoid untargeted methylation and toxicity of the Dam fusion protein. Thus, both ChIP and Dam-ID are currently quite laborious and generally only applicable to model species. Nonetheless, data on transcription factor binding in particular are of great value for the study of transcriptional networks (Yuan et al., 2008). If cell type-specific data are not available, previously identified transcription factor binding motifs may still help to identify transcriptional modules (Diez et al., 2014).
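Because Dam methylates adenines only in the GATC context, the positions at which a Dam-ID signal can arise in a given region are determined by the sequence itself. The following minimal Python sketch, added here for illustration, lists these candidate sites. It does not model methylation, restriction digestion, or any published Dam-ID analysis pipeline; the function name `find_gatc_sites` and the toy sequence are hypothetical.

```python
import re


def find_gatc_sites(sequence):
    """Return the 0-based start positions of all GATC motifs in a DNA sequence.

    GATC is its own reverse complement, so scanning one strand is sufficient.
    """
    seq = sequence.upper()  # accept lower-case (e.g., soft-masked) input
    return [match.start() for match in re.finditer("GATC", seq)]


# Hypothetical toy sequence with three GATC motifs.
toy_sequence = "ttGATCaaagatcGGGATC"
sites = find_gatc_sites(toy_sequence)
```

The spacing between neighboring sites gives a rough idea of the best resolution that can be expected in a region, since binding events can only be reported at GATC motifs.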

Protein Microarrays
Protein microarrays are a promising tool for proteomics as well as for the analysis of interactions of proteins with other proteins, nucleic acids, cellular surface markers, or posttranslational protein modifications (Yang et al., 2011b; Uzoma and Zhu, 2013). Several different types of protein microarrays can be distinguished. On analytical microarrays, well characterized proteins (e.g., monoclonal antibodies) are spotted to identify a specific set of proteins. Alternatively, less well characterized proteins (e.g., lysates from whole cells) are spotted on functional microarrays to test for interaction partners. Finally, proteome microarrays hold the majority of encoded proteins for an organism (Yang et al., 2011b). While the first proteome microarray for budding yeast was established in 2001 (reviewed by Uzoma and Zhu, 2013), only a few applications have been reported in plants (Yang et al., 2011b; Uzoma and Zhu, 2013). Nevertheless, protein microarrays have, for example, successfully been used to study 802 transcription factors in Arabidopsis (almost half of all transcription factors annotated in Arabidopsis, Gong et al., 2008). While protein microarrays may have a high potential for applications in systems biology, they are currently still limited by high production costs and laborious production methods (e.g., large-scale cloning of open reading frames, protein purification, and production of high-affinity monoclonal antibodies, Yang et al., 2011b).

Metabolomics
Because plant metabolites derive from both primary and secondary metabolism, the plant metabolome is highly complex. Although far from comprehensively elucidated to date, about 200,000 different metabolites are estimated to be represented in plants (reviewed by Sheth and Thaker, 2014). While a variety of analysis platforms can in principle be applied for metabolite detection, Nuclear Magnetic Resonance (NMR) and MS are the most frequently used methods (Kueger et al., 2012; Sheth and Thaker, 2014). High resolution mapping of metabolites has recently been achieved in Arabidopsis roots by combining FACS with high resolution MS (Moussaieff et al., 2013). In addition, glandular trichomes have been used as model systems for large-scale metabolome analyses (Tissier, 2012). However, the major limitation of current metabolomics is the lack of a single method allowing comprehensive measurements in terms of qualitative detection, quantitation, and spatio-temporal resolution. This is the case because the metabolites differ significantly in their concentration, chemical properties, and analytical behavior. Two major strategies in metabolome profiling are the use of either targeted or untargeted MS (reviewed in Kueger et al., 2012). Targeted MS relies on previous knowledge about structures and chemical properties of the metabolites of interest and combines chromatographic separation techniques, e.g., High Pressure Liquid Chromatography (HPLC) or Gas Chromatography (GC), with MS techniques. In contrast, non-targeted analysis using MS without prior chromatographic separation is used to profile metabolites without prior knowledge about their abundance or structure. This method often only allows the determination of metabolic signatures, as the characterization of a specific metabolite, for example by NMR, is highly challenging. 
Therefore, a key problem is the availability of reference spectra and compounds for compound identification and annotation (Kueger et al., 2012). Thus, the need for comprehensive databases including relevant information on the compounds, e.g., spectra, and the requirement for integration of metabolome data with other large-scale omics data has been noted (Fukushima et al., 2009). Current online resources include the Golm Metabolome Database (gmd.mpimp-golm.mpg.de) and the MASSBANK Database (www.massbank.jp).
An alternative method to study, for example, metabolites at spatial resolution without the need for prior cell isolation is MALDI-MS Imaging (MSI, reviewed by Lee et al., 2012). For MSI, a suitable matrix is directly applied to thin tissue sections (e.g., 10-20 µm). The prepared tissue sections are then rasterized with a laser beam coupled to a mass spectrometer with high mass resolution (TOF-MS, reviewed in Kaspar et al., 2011). The spot size of the laser thereby determines the resolution. Only recently have technical improvements made it possible to reach the resolutions required for the analysis of single cells (<20 µm, reviewed in Kueger et al., 2012; Lee et al., 2012). MSI has rarely been used in plants for proteomics, and only a few studies have reported the imaging of metabolites (reviewed in Kaspar et al., 2011; Kueger et al., 2012). Examples of metabolite imaging with MSI include the measurement of wheat grain cell-wall polysaccharides (Veličković et al., 2014, 100 µm spot size) and lipid measurements in embryos of cotton (Horn et al., 2012, 35 µm spot size). While MSI has great potential for cell type-specific studies in plant systems biology, it needs to be noted that only thin surface layers of <1 µm are sampled by MALDI (Lee et al., 2012). However, further improvements in MSI are likely to be developed soon, and adaptation of these methods to plant tissues may eventually facilitate single-cell proteomics as well as metabolomics in a range of species.
In addition, to study the subcellular localization of specific ions or metabolites and their physiological relocation, e.g., by directed transport, a variety of molecular sensors has recently been developed. Such sensors usually depend on proteins changing their conformation upon binding of a specific substrate. Consequently, the distance between attached fluorescent proteins will change, leading to an alteration in Fluorescence Resonance Energy Transfer (FRET, reviewed by Okumoto, 2012; Okumoto et al., 2012). For spatially and temporally resolved measurements, FRET can be measured by, for instance, Fluorescence Lifetime Imaging Microscopy (FLIM, reviewed by De Los Santos et al., 2015). While these techniques are very valuable tools in plant research, they do not readily allow the high-throughput analysis of a large number of compounds in a plant cell and will thus not be discussed in detail in this review.

DNA Methylation
DNA (cytosine) methylation is a heritable epigenetic modification of the genome and is involved in various cellular and developmental processes in a wide range of species, including animals, fungi, and plants. Several methods for genome-wide profiling of the DNA cytosine methylation status have been established. These include the hybridization onto whole-genome DNA microarrays after digestion of genomic DNA with methylation-sensitive restriction enzymes, or the precipitation of methylated DNA with antibodies targeting methylated cytosines (Methylated DNA ImmunoPrecipitation, MeDIP), followed by either microarray hybridization (MeDIP-chip) or NGS (MeDIP-Seq, reviewed by Su et al., 2011;Ji et al., 2015). The current method of choice for methylome profiling is Whole-Genome Bisulfite Sequencing (WGBS). In brief, DNA is incubated with bisulfite, converting all unmethylated cytosines to uracils, which are identified as thymines during sequencing. In contrast, all methylated cytosines are protected from the conversion, remain unchanged, and are identified as cytosines during sequencing (Ji et al., 2015). Compared to the profiling of other epigenetic marks, such as histone modifications, WGBS has two major advantages. It does not require the use of transgenic plants or antibodies, and recently developed methods allow WGBS on as little as 125 pg of DNA (Post-Bisulfite Adaptor Tagging (PBAT), Miura et al. (2012); 20 pg diluted Arabidopsis DNA with a modified protocol, our unpublished data). WGBS is therefore a very promising method for the profiling of specific cell types in plants.
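The logic of bisulfite conversion described above can be sketched in a few lines of Python. The example is a simplified illustration under several assumptions that do not hold for real data: only a single strand is considered, conversion is complete, the read is already aligned to the reference without gaps, and sequencing is error-free. All function names and sequences are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment followed by sequencing.

    Unmethylated cytosines are read as T (via uracil), whereas
    methylated cytosines are protected and still read as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence.upper())
    )


def call_methylation(reference, read):
    """Infer the methylation state of every reference cytosine.

    Returns {position: True} for methylated (C observed) and
    {position: False} for unmethylated (T observed) cytosines.
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), read.upper())):
        if ref == "C" and obs in ("C", "T"):
            calls[i] = obs == "C"
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"   # hypothetical toy reference
    read = bisulfite_convert(reference, methylated_positions=[1, 5])
    print(read)
    print(call_methylation(reference, read))
```

In practice, each cytosine is covered by many reads, and the methylation level is estimated as the fraction of reads reporting a C at that position.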

SYSTEMS BIOLOGY APPROACH TOWARD PLANT DEVELOPMENT
As evident from the previous examples, plant cell type-specific systems biology is most advanced in cell types that can relatively easily be isolated in amounts large enough to be suitable for any type of omics approach. For the root hairs of soybean, for example, a promising method to isolate large quantities facilitating any omics analysis has recently been described and will likely be of great use (Qiao and Libault, 2013). The method uses an ultrasound aeroponic system to enhance root hair density, followed by fixation and separation of the root hairs in liquid nitrogen. In addition, for the different cell types of the Arabidopsis root, FACS yields sufficient material for most omics approaches. An advantage of these systems is that due to the use of only one isolation method, the variability imposed by it can be held constant over all experiments. The use of a single method is also cost-efficient as it requires less time and resources to optimize only one method as compared to several. Due to the relatively easy sample collection and their physiological roles, roots, root hairs, and trichomes are excellent models to study responses to environmental stimuli, host-pathogen/symbiont interactions, metabolic pathways, or the dynamics of cellular specification and cell-cell communication in complex tissues. However, even the root may not be an optimal model to address fundamental questions of developmental systems biology. Its main disadvantages are the long developmental time span, starting very early during embryogenesis, and the complex interplay within and between the different cell types of the root, but also with the above-ground tissues, and biotic and abiotic environmental factors. Ideally, a developmental model system should allow an experimental coverage of the entire life-span of the organism. 
It would be of advantage if the organism were short-lived and comprised only a limited number of developmental stages and specialized cell and tissue types to reduce complexity and increase the affordability of comprehensive studies. For comparative analyses and evolutionary systems biology approaches, it would be further advantageous if the phylogeny of the model system included a broad range of organisms with gradual phenotypic changes, or with gain, loss, and alternative usage of modular building blocks. Finally, an ideal model system is most beneficial if its understanding can lead to direct applications in, for example, production of food or pharmaceuticals.
An intuitive model for the development of an organism is the embryo. During plant embryogenesis, the basic body organization with an apical-basal and radial pattern is established starting from a single cell, the zygote. The mature embryo already contains the progenitors of the main organizers of plant growth, the primary Shoot and Root Apical Meristems (SAM and RAM), and the hypocotyl and cotyledons with their various tissue types (reviewed in Lau et al., 2012). However, it is thus already a relatively complex system composed of multiple cell and tissue types. Additional complexity is imposed by the different stages of embryo development, spanning the time between the single-celled zygote and the mature embryo. An in-depth systems biological description of embryogenesis would therefore require sampling of a large variety of cell types at many time points. Nevertheless, while most transcriptional studies published so far focussed on whole tissues or entire embryos (reviewed in Palovaara et al., 2013; Zhan et al., 2015), recently, high-quality cell type-specific transcriptomes of the proembryo and the suspensor of the early stages of the Arabidopsis embryo were described (Slane et al., 2014).
Alternative models for the development of organisms, which are far less complex than the embryo, are the gametophytes of flowering plants: the pollen (male) and the embryo sac (female). They are typically formed from one spore (meiotic product) and, at maturity, they consist of only a few cells and cell types, including the male and female gametes, the sperm cells and the egg and central cells, respectively (reviewed in Yang et al., 2010;Twell, 2011;Schmidt et al., 2015). Upon double fertilization, the egg cell and the central cell fuse with one sperm each to give rise to the embryo and endosperm, respectively. The latter nurtures the embryo and acts as storage organ for seed reserves in many species, including the cereals. The endosperm is thus the most important food and feed source.
Given the sheer amount of pollen produced by a single plant, and the relatively simple isolation procedures for some of the specific cell types of the male germline in developing pollen, multiple cell type-specific transcriptome data sets are available from different species, including Arabidopsis thaliana, Oryza sativa (rice), Zea mays (maize), Lilium longiflorum (lily), and Plumbago zeylanica (white leadwort) (Table 1; reviewed in Schmidt et al., 2012; Anderson et al., 2013; Dukowic-Schulze et al., 2014; Kelliher and Walbot, 2014), and several cell type-specific proteomes have recently been described for tobacco, Lilium davidii var. unicolor (Lanzhou lily), and tomato (Table 1; Abiko et al., 2013; Chaturvedi et al., 2013; Zhao et al., 2013; Ischebeck et al., 2014). Due to their characteristic tip-growth, pollen tubes also serve as an excellent model to study cell elongation and mechanical properties of the cell wall (Vogler et al., 2013). However, pollen development is strikingly uniform in angiosperms (Maheshwari, 1950), and inter-species comparisons would therefore likely be more fruitful in gymnosperms, which show a remarkable variation in terms of the number of cell divisions between meiosis and the subsequent specification of the sperm cells (Fernando et al., 2010). In contrast to pollen, a plant forms far fewer female gametophytes, which are deeply embedded in the maternal floral tissue (e.g., in Arabidopsis, each flower contains around 50 ovules, each of which harbors only one embryo sac). Nonetheless, several cell type-specific transcriptomes (Table 2; reviewed in Schmidt et al., 2012; Wuest et al., 2013, and more recent data in Anderson et al., 2013; Okada et al., 2013; Schmidt et al., 2014) as well as a proteome analysis for rice egg cells (Table 2; Abiko et al., 2013) are currently available. 
Even though it is more difficult to collect than the pollen, the embryo sac has certain developmental features rendering it a highly interesting model system for plant development: (i) high evolutionary diversity within angiosperms, (ii) syncytial development (i.e., the formation of a multinucleate cell), (iii) specification and differentiation of only three to four distinct cell types, and (iv) a process in which plants can reproduce asexually via seeds (gametophytic apomixis). The mature embryo sacs of angiosperms generally contain at least three distinct cell types: the synergids required for pollen tube attraction and reception, and the two gametes, the egg and the central cell. An exception is, for example, the Podostemaceae, where the central cell seems to degenerate before pollen tube arrival, resulting in a single fertilization event (Sehgal et al., 2014). In addition, antipodal cells are frequently present, but little is known about their function. It has been hypothesized that they might be involved in nutrient transfer from the surrounding tissues to the embryo sac (Raghavan, 1997). Despite the high functional similarity of mature embryo sacs, their formation is highly diverse across different plant taxa (Figure 2; Maheshwari, 1950; Huang and Russell, 1992; Baroux et al., 2002; Williams and Friedman, 2004). Reproductive development can be divided into two steps: megasporogenesis and megagametogenesis. 
Table legend: In brief, pollen formation starts with a microspore mother cell (or meiocyte), which undergoes meiosis to give rise to a tetrad of reduced spores. Each of these microspores undergoes pollen mitosis I to give rise to a generative and a vegetative cell. The subsequent mitotic division of the generative cell (pollen mitosis II) results in the formation of two sperm cells (Twell, 2011). UNM, uninucleate microspore; GC, generative cell; SC, sperm cell; LC, liquid chromatography.
FIGURE 2 | Schematic showing several basic types of female gametophyte development in angiosperms and the structural diversity of the mature embryo sacs (after Maheshwari, 1950). The development of the female gametophyte can be divided into two steps: megasporogenesis (orange shading) and megagametogenesis (green shading). During megasporogenesis, a selected sporophytic cell, the megaspore mother cell (MMC), undergoes meiosis to give rise to spores. In most angiosperms, a tetrad of four megaspores is formed, of which three subsequently abort, leaving only one functional megaspore (FMS) to participate in megagametogenesis (e.g., Polygonum-type). However, a high diversity of the developmental processes of megasporogenesis and megagametogenesis has been observed in different genera, with variations, for example, including bispory and tetraspory. During megagametogenesis, the mature female gametophyte is formed through mitotic divisions, nuclear migration, and cellularization. For the mature embryo sac, the colors indicate the cell types: egg (pink), synergids (yellow), central cell (blue), and antipodal/lateral cells (white). Cells structurally similar to egg cells or synergids are drawn accordingly, but are colored gray.
Megasporogenesis comprises the formation and maturation of the initial meiotic products (megaspores) from a single selected sporophytic cell, the Megaspore Mother Cell (MMC), and is under the control of the usually diploid sporophytic genome. Megagametogenesis describes the following mitotic divisions, cellularization, cell
specification, and maturation of the female gametophyte, which is under the control of the typically haploid genome. Both processes exhibit high diversity within angiosperms. Depending on the number of spores that survive and participate in megagametogenesis, megasporogenesis can be divided into monosporic (one spore), bisporic (two spores), and tetrasporic (all four spores). Further variation includes the location of the degenerating spores and the positioning of the spores in the tetrasporic types. Likewise, megagametogenesis can vary in the number of mitotic divisions, the arrangement of the nuclei/cells, and late divisions of individual cells after cellularization (e.g., in Amborella, Friedman, 2006). Comparative analysis of the structure of a wide range of embryo sacs and reconstruction of the ancestral state suggest that the embryo sacs of early angiosperms contained only four cells: two synergids, one egg cell and one central cell. It has been hypothesized that duplication of this four-celled module facilitated the emergence of the bi-nucleate central cell that, following fertilization, forms an endosperm with a maternal:paternal genome contribution ratio of 2:1 (Williams and Friedman, 2004;Friedman, 2006;Friedman and Ryerson, 2009). This unequal parental contribution to the endosperm has received a lot of attention over the last century. As a tissue protecting and nourishing the embryo, the endosperm may be subject to adaptive processes and parental conflicts (Haig and Westoby, 1989;Baroux et al., 2002). An interesting aspect of female gametophyte development (and tetrasporic megasporogenesis) is the formation of a syncytium during the divisions of the nuclei prior to cellularization. In angiosperms, gametogenesis and early stages of endosperm development are the two major examples for the formation of a syncytium. 
In contrast, the plasmodial tapetum, for example, is formed by degeneration of the cell walls and the fusion of the resulting protoplasts (Furness and Rudall, 1998). Unlike regular cell divisions, where the positions of cells are relatively fixed due to the rigid cell wall, a syncytium allows for nuclear migration and for differentiation according to gradients of positional information. Indeed, determination of cell fate in the embryo sac of Arabidopsis depends on the position of the nuclei as, for example, indicated by the Arabidopsis retinoblastoma-related1 (rbr1) mutant, which produces supernumerary nuclei differentiating according to their position within the FG (Johnston et al., 2008;Sprunck and Groß-Hardt, 2011). However, the nature of such information is still under debate. Appealing candidates may be gradients of plant hormones, such as cytokinin or auxin. For both, a role in establishing polarity during embryo sac development has been proposed (reviewed in Schmidt et al., 2015) but their role may be rather indirect (Lituiev et al., 2013). However, an alternative or complementary hypothesis can be formulated using the analogy to the syncytial embryogenesis in Drosophila, where around 70% of the genes expressed during early embryogenesis show a specific subcellular localization of their mRNA in the syncytium. Interestingly, specific subcellular mRNA localization peaks around the transition from syncytial to cellular development, potentially reflecting the high demand for localization mechanisms (Lécuyer et al., 2007). Thus, a fascinating possibility is that the specific subcellular localization of mRNAs in the syncytial stage of the developing embryo sac may play a role in determining cell fate. A possibility to test this hypothesis would be to separately isolate specific subcellular regions (e.g., the two opposing poles) of the developing syncytial female gametophyte and to compare the transcriptional profiles of these regions with each other.
Another interesting variation of reproductive development is gametophytic apomixis. It refers to the process of asexual reproduction through seeds in the absence of fertilization (reviewed in Koltunow and Grossniklaus, 2003). Apomixis occurs in more than 400 plant species from around 40 genera and is likely of polyphyletic origin (Asker and Jerling, 1992; Carman, 1997). Gametophytic apomixis involves the omission or abortion of meiosis (apomeiosis) and the formation of an embryo from an unfertilized egg (parthenogenesis), while the endosperm can be formed by autonomous development of the central cell or dependent on fertilization (pseudogamy). Depending on the mechanism of the formation of the unreduced megaspore, the resulting offspring can be genetically completely identical to the mother plant without any chromosomal rearrangements. It is thereby possible to fix complex genotypes over multiple generations without a loss in heterozygosity. While gametophytic apomixis is absent in major crop plants, engineered apomictic crops would promise great potential and economic value for plant breeding and agriculture (Koltunow et al., 1995; Vielle-Calzada et al., 1996; Grossniklaus et al., 1998). From a developmental perspective, apomixis can be seen as an alteration of the sexual pathway, where certain processes are initiated too early or in the wrong cell type (Koltunow, 1993; Grossniklaus, 2001). Detailed understanding of the molecular processes and pathways governing gametogenesis during sexual and apomictic reproduction is therefore a precondition to engineer apomixis in crop plants. In evolutionary terms, apomixis is a highly interesting trait. On the one hand, it allows the dispersal of seeds without the need for a sexual partner (Smith, 1978) and may therefore be advantageous for the colonization of new habitats (Tomlinson, 1966). On the other hand, the trade-off for this clonal reproduction appears to be very costly. 
Apomicts may accumulate deleterious mutations over many generations (Muller, 1964) and their populations are likely of low genetic variability, which reduces their potential to adapt to a changing environment. Recent proposals, however, suggest that epigenetic variation may also contribute to adaptive potential, which may explain the ecological success of many apomicts (Hirsch et al., 2012).
Given the natural variation in sexual and apomictic species, the female gametophyte of angiosperms can be seen as an excellent model system to study fundamental developmental processes and evolutionary aspects of plant development and biology that are of high importance to agriculture. Its simple organization and the relatively few developmental stages would allow for an in-depth analysis of various species enabling evolutionary comparisons at the whole-genome level. Given the high diversity, inter-species comparisons may identify genes and genetic networks involved in the emergence of evolutionary novelties, such as the unequal genetic contribution of the two parents to the endosperm or gametophytic apomixis. Deciphering the evolutionary mechanisms underlying these processes may also provide an answer to the long-standing question of how useful research on model organisms is for crop improvement. However, the small size and inaccessibility of the cell types of developing and mature embryo sacs make the isolation and subsequent application of omics methods very difficult. Aside from the challenges associated with data integration and analysis, data generation is hence a major limiting factor. In general, the main obstacle with most approaches is the number of cells required for in-depth profiling of a certain molecule (e.g., protein or metabolite). This may be overcome by either increased sensitivity of the profiling method, or through a simplified collection of a large number of cells. However, most high-throughput isolation methods (e.g., for FACS/FANS/INTACT) rely on the existence of a specific marker (i.e., a cell type-specific promoter) and the possibility to generate transgenic plants. In addition, typically a certain abundance of the cell type of interest in the sample is required for efficient sorting and purification. 
Given that these preconditions are generally not met by low-abundance cell types of non-model organisms, it is likely that plant systems biology will profit the most from an increase in sensitivity and the development of novel profiling methods. In the following sections, we will therefore focus on a subset of omics approaches, which are readily available or which bear great future potential for routine large-scale in vivo profiling of specific cell types. The examples given are restricted to studies on specific cell types of the female gametophytes of angiosperms.

Transcriptome
Transcriptomics is clearly the most frequently used and currently the most robust omics approach to study female gametophyte and plant reproductive development. Following the early transcriptional profiling with low-throughput technologies [early Expressed Sequence Tag (EST) sequencing projects, reviewed in Wuest et al. (2013)], cell type-specific transcriptomes were generated for the egg cell, the central cell, the synergids, and the MMC of Arabidopsis (Wuest et al., 2010; Schmidt et al., 2011), the egg cell and the synergids for rice (Ohnishi et al., 2011; Anderson et al., 2013), all cell types of the mature embryo sac and the Apomictic Initial Cell (AIC) of Boechera gunnisoniana (a close apomictic relative of Arabidopsis thaliana where an AIC is specified instead of a sexual MMC, Schmidt et al., 2014), and the AIC of Hieracium praealtum (hawkweed, where the AIC is formed by an additional sporophytic cell developing adjacent to the sexual reproductive lineage, Okada et al., 2013; Table 2). Given the requirement to establish a specific gene expression profile for cell specification and differentiation, transcriptomics is also especially suitable as a first approach toward an unknown species, because it provides a comprehensive snapshot of the cellular instruction machinery. It further enables the identification of cell type-specific markers and can thus provide a basis for other approaches, like detailed molecular and mechanistic studies. The advantage of transcriptional profiling as compared to proteomic studies is the possibility to amplify the material prior to detection. Several RNA-Seq protocols allow transcriptional profiling of single cells corresponding to as little as about 10 pg of total RNA (reviewed in Head et al., 2014). 
This low detection limit facilitates the use of relatively low-throughput isolation methods, such as LAM or manual microdissection, allowing the profiling of specific cell types of embryo sacs in model and non-model species (Okada et al., 2013; Wuest et al., 2013; Schmidt et al., 2014). A current drawback of the amplification strategy is the introduction of potential quantification biases. A possible solution may be Unique Molecular Identifiers (UMI). These are short sequences with random nucleotides (e.g., 1024 different UMIs with 5 random nucleotides), which are used to label initial cDNA molecules prior to amplification. An excess of UMIs compared to the number of identical cDNAs ensures that each combination of a given UMI with a certain cDNA is unique. After amplification and sequencing, this can be used to differentiate between individual molecules in the initial cDNA pool and duplicates originating from cDNA amplification (i.e., to count molecules instead of reads, Islam et al., 2014). An interesting approach for future studies may be Fluorescent In Situ RNA SEQuencing (FISSEQ), in which stably cross-linked cDNA amplicons are sequenced directly within a biological sample, thereby not only quantifying gene expression, but also detecting the subcellular localization of the transcripts (Lee et al., 2014). Improvement of this method and its adaptation to plant tissues would thus undoubtedly be a major advance in cell type-specific transcriptional profiling.
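The principle of counting molecules instead of reads can be illustrated with a minimal Python sketch. It assumes that each read has already been assigned to a gene and that its UMI has been extracted without sequencing errors; the gene names, UMI sequences, and function names are hypothetical. The helper `number_of_umis` merely reproduces the arithmetic behind the example in the text (4 possible nucleotides at each of 5 positions gives 1024 UMIs).

```python
from collections import defaultdict


def number_of_umis(length):
    """Number of distinct UMIs with `length` random nucleotides (A, C, G, T)."""
    return 4 ** length


def count_reads_and_molecules(reads):
    """Count reads and unique molecules per gene.

    `reads` is an iterable of (gene, umi) pairs. Reads sharing gene and
    UMI are treated as amplification duplicates of one initial cDNA.
    """
    read_counts = defaultdict(int)
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1
        umis_per_gene[gene].add(umi)
    molecule_counts = {gene: len(umis) for gene, umis in umis_per_gene.items()}
    return dict(read_counts), molecule_counts


if __name__ == "__main__":
    reads = [  # hypothetical toy data
        ("geneA", "AACGT"), ("geneA", "AACGT"), ("geneA", "AACGT"),
        ("geneA", "TTGCA"), ("geneB", "CCGTA"),
    ]
    print(number_of_umis(5))
    print(count_reads_and_molecules(reads))
```

In this toy example, geneA is supported by four reads but only two distinct UMIs, so amplification would have inflated its apparent abundance relative to geneB by a factor of two.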

Proteome and Metabolome
Proteomics and metabolomics on specific cell types are substantially more challenging than transcriptomics. A current limitation for cell type-specific proteomics is the large discrepancy between the number of detected proteins and the number of expressed genes, which is due to the low sensitivity of proteomics methods towards low-abundance proteins. Additional complexity arises from the presence of a wide range of post-translational modifications, such as phosphorylation or glycosylation. Apart from two early examples, identifying only the major proteins in the egg cells of maize and rice (6 and 4 proteins, Okamoto et al., 2004; Uchiumi et al., 2007), we are only aware of the recent description of the egg cell proteome in rice, where 2138 proteins were identified using around 500 egg cells (Abiko et al., 2013; Table 2). In the same study, 2179 proteins were identified starting from 30,000 isolated sperm cells (Table 1; Abiko et al., 2013). Given further improvements in the sensitivity of mass spectrometers, the example demonstrates that proteomics of purified cells of the female gametophyte should be possible for cases where enough material can be collected. Mechanical or manual isolation of female gametes was reported for a variety of species including barley, wheat, rapeseed, maize, tobacco, Torenia, Alstroemeria, and Arabidopsis (Kranz et al., 1991; Holm et al., 1994; Kovács et al., 1994; Katoh et al., 1997; Tian and Russell, 1997; Sprunck et al., 2005; Hoshino et al., 2006; Okuda et al., 2009; Jullien et al., 2012). In most of these species, we anticipate that the protocols would already allow the isolation of sufficient material for MS-based proteomics. Another promising approach for future experiments may be MSI, circumventing the need for (laborious) cell purification.

Methylome
DNA cytosine methylation (5mC) plays an important role in the epigenetic regulation of plant genomes. While WGBS has not yet been reported for isolated cells of the female gametophyte, bisulfite sequencing of specific sequences has already been applied for Arabidopsis central cells and synergids isolated by LAM (Wöhrmann et al., 2012; You et al., 2012). It would likely be possible to combine LAM or manual microdissection with WGBS. This would allow methylome profiling of gametes in model as well as non-model species. Importantly, this may provide novel insights into the molecular basis underlying heterosis (Groszmann et al., 2011), characterized by superior characteristics of F1 hybrid plants as compared to their parents. While epigenetic regulatory pathways are likely important for heterosis, their precise involvement remains elusive to date (Chen, 2013). Understanding of the regulatory mechanisms governing heterosis is of great interest for plant breeding and crop production. Notably, gametophytic development and early stages of embryogenesis are likely important for the establishment of heterosis.

CONCLUSION AND PERSPECTIVES
To date, cell type-specific systems biology in plants is frequently constrained by the difficulties associated with the isolation of the cell type of interest in large enough amounts. Robust and simple isolation methods exist only for a few cell types. Consequently, the comprehensive profiling of all cell types of an organism with different large-scale profiling methods, allowing the detailed understanding of all biological processes ongoing in the biological system, is still an unreached goal. While the in-depth understanding of complex organisms over their lifespan is a major aim for systems biology, the use of simple model organisms bears advantages, given the persisting technical limitations. We introduce the female gametophyte of angiosperms as an attractive model system for future systems biology approaches in plant development. Apart from its relatively simple organization, it is of great biological and agronomical importance, for example with respect to seed production and plant breeding.
Currently, most high-throughput isolation methods with broader application (e.g., FACS/FANS/INTACT) are limited to model organisms (e.g., Arabidopsis thaliana, Oryza sativa). However, a biological system may be best understood in the context of evolution. In addition, a detailed understanding of the cellular processes in major agriculturally important species is a precondition for targeted crop improvement. This includes wheat, where the genome size and its hexaploid nature pose an additional challenge. Such studies would thus not only be of potential applied value, but would also help to understand the common concepts and divergent mechanisms active in different species. Therefore, methods facilitating large-scale profiling of specific cell types in model as well as non-model organisms are of crucial importance. Parallel high-throughput profiling of several organisms covering a phenotypic gradient, or including gain, loss, and alternative usage of modular building blocks along the phylogeny, will enable evolutionary systems biology. This approach may ultimately help to reconstruct the emergence of evolutionary novelties and to find the underlying genetic and molecular networks. Such an understanding would in turn allow the control of the underlying processes with unprecedented resolution. In perspective, this can be an important precondition for targeted improvement of crop species, including the engineering of apomixis into crop plants.
Even though the isolation of individual cell types is currently still very challenging, the rapid technical advances observed over the past few years in, for example, transcriptional profiling, clearly indicate a tremendous improvement of large-scale profiling technologies. In this light, we emphasize methods for transcriptomics, proteomics, metabolomics, and methylomics, in which we see great future potential. However, cell type-specificity and single-cell resolution are just one step toward a more comprehensive view of developmental processes and environmental responses. Clearly, monitoring the subcellular localization of molecules and their interactions will be essential to understand certain patterning processes and specific cellular functions. In analogy to the hypothesized distribution of mRNA within the syncytial female gametophyte, subcellular localization of mRNA may also occur within the cell types of the mature female gametophyte to, for example, target the proteins they encode to a specific subcellular region. In this respect, technologies based on high resolution imaging that allow large-scale profiling without prior cell isolation, for example MSI or FISSEQ, are very promising for future applications. The growing amount of data and data types also points to the need for novel computational solutions addressing the problems of data storage, integration, and analysis (see Ahrens et al., 2007; Yuan et al., 2008; Fukushima et al., 2009; Chuang et al., 2010; Katari et al., 2010; Liberman et al., 2012; Fukushima et al., 2014; Gomez-Cabrero et al., 2014; Robinson et al., 2014). The current situation, in which data sometimes remain unpublished, are frequently poorly annotated, and are widely dispersed across specialized databases, may be taken as motivation to develop integrative computational platforms specifically focussing on future data.
Considering the almost exponential growth of biological data over recent years (Ideker et al., 2001; Chuang et al., 2010), these platforms may also ignore data from the past to allow for innovative solutions. In this context, standardized data formats and annotation, easily accessible databases, powerful data mining tools, user-friendly and freely available software, as well as scalable storage platforms are the current and future demands in systems biology (Chuang et al., 2010; Gomez-Cabrero et al., 2014).