Advanced phenotyping and phenotype data analysis for the study of plant growth and development

Due to an increase in the consumption of food, feed, and fuel, and to meet global food security needs for the rapidly growing human population, there is a necessity to breed high-yielding crops that can adapt to future climate changes, particularly in developing countries. To solve these global challenges, novel approaches are required to identify quantitative phenotypes and to explain the genetic basis of agriculturally important traits. These advances will facilitate the screening of germplasm with high performance characteristics in resource-limited environments. Recently, plant phenomics has offered and integrated a suite of new technologies, and we are on a path to improve the description of complex plant phenotypes. High-throughput phenotyping platforms have also been developed that capture phenotype data from plants in a non-destructive manner. In this review, we discuss recent developments of high-throughput plant phenotyping infrastructure, including imaging techniques and the corresponding principles for phenotype data analysis.


Introduction
Global agricultural demand is expanding rapidly, not least because of a growing world population but also because of indirect factors that render agricultural production suboptimal, such as unequal food distribution, competing claims for land use, and increased demand for meat and dairy due to a change in dietary habits in the G5 countries (emerging economies). Agriculture, in particular, faces tremendous challenges for crop production in the coming decades. According to a prediction by the United Nations Food and Agriculture Organization (www.fao.org), cereal production must be doubled before 2050 to satisfy the demand for food by the growing world population, as well as the increasing competition for crops as sources of bio-energy, fiber, and other industrial purposes. Additionally, the supply of the major crop rice, a staple food throughout the world, has become insufficient. Besides the many biotic and abiotic factors, predicted changes in temperature and rainfall patterns as a consequence of climate change may lead to further reduction in yields (Sticklen, 2007). In order to meet the global challenges represented by the rapidly growing human population and environmental changes, novel methods are required to improve the quality and productivity of cereal grains (Tester and Langridge, 2010). For these reasons, there is a demand for quantitative analyses of plant traits to accelerate the selection of crops that are better adapted to resource-limited environments and soil conditions, which are also a major constraint on global food production (Fiorani and Schurr, 2013).
Plant researchers have been trying to propose appropriate strategies for developing plants that are resistant to environmental stress, insects, and diseases, while still possessing high nutrient efficiency (Zhang, 2007; Ahmed et al., 2013). As a way to improve cereal crops, considerable effort has been put into functional genomics studies using high-throughput genomic tools (Ayliffe and Lagudah, 2004; Xing and Zhang, 2010; Huang et al., 2013; Pallotta et al., 2014; Valluru et al., 2014). However, more effort is required to map genotype-phenotype relationships for the global development of crop breeding (Tester and Langridge, 2010). Due to the rapid progression of functional genomics and genetic technologies, especially in the field of high-throughput sequencing technology, many plant genomes are now available. Their functional analysis has entered the high-throughput phase, providing available genetic information as well as enabling genomic analysis (Holtorf et al., 2002). However, the sequenced genome that is supposed to represent a crop species often constitutes only a single genotype.
Outdated phenotyping procedures (phenotyping being the technique that deals with plant characteristics), even in conjunction with the available genetic information, have not allowed a thorough functional analysis and have not led to a functional map between genotype and phenotype. A focus on overcoming these shortcomings has led to an emerging and increasingly important branch of biological sciences termed "phenomics" (Furbank, 2009; Furbank and Tester, 2011). Phenomics is a technology that enables high-throughput phenotyping for crop improvement in response to present and future demographic and climate scenarios.
To meet the needs of current research, reliable, automatic and high-throughput phenotyping platforms have been developed (Hartmann et al., 2011). Multiple studies in phenomics highlight findings, such as causal genes and background variation, relationships between traits, plant growth behavior as well as reproduction in various conditions (Furbank and Tester, 2011;Yang et al., 2013;Brown et al., 2014). In this way, the challenges of extracting multi-parametric phenotypic information along with the genetic variability can be adequately met. Current phenotyping platforms include a variety of imaging methodologies to obtain high-throughput non-destructive phenotype data for quantitative studies of complex traits, such as growth, tolerance, resistance, architecture, physiology, yield, and the basic measurement of individual quantitative parameters that form the basis for more complex traits (Chen et al., 2014b;Li et al., 2014).
High-throughput automated imaging is now the ideal tool for phenotyping, and is becoming more advanced and popular, with the capacity to measure multiple morphological and physiological traits for an individual plant. There is, however, a trade-off between speed and accuracy when using high-throughput imaging: manual measurement of several traits on one plant, if done properly, is currently more accurate than automated measurement, but much slower. Furthermore, through imaging techniques, plant phenomics could offer plant scientists a new way to discover the features and functionality of living plants via scanning temperature profiles, measuring photosynthetic rates, gauging growth rates, and gaining insight into root physiology (Finkel, 2009). These advances have also boosted plant phenotyping to a new level. Therefore, we describe phenotyping techniques based on various imaging systems. In addition, we highlight the importance of phenotype data analysis, analytical techniques, and methods for plant growth and developmental studies. We also highlight the major challenges of high-throughput phenotyping and phenotype data analysis for promising applications in plant phenomics.

Consequences of Environmental Factors for Plant Phenotyping: A Big Challenge for the Imminent Generation
Several studies have suggested that upcoming generations can be influenced by the environmental factors experienced by the earlier generation (Dawson et al., 2011). Recent studies indicate that, under rapid climate change, phenotypic plasticity rather than genetic diversity is more likely to play a crucial role in allowing plants to persist in their environments (Vitasse et al., 2010; Gratani, 2014). A plant exhibits phenotypic plasticity when the same genotype is grown under various environmental conditions, and this plasticity is particularly pronounced under extreme conditions such as frost, drought, and salinity.
Frost is one of the most important abiotic stresses for countries with severe winters, adversely affecting crop development and yield (Chinnusamy et al., 2007; Li et al., 2011). Moreover, drought is a complex stress that permanently affects the soil, elicits a wide variety of plant responses, and limits crop yield (Pennisi, 2008; Honsdorf et al., 2014). It is a worldwide threat to agricultural production, and improving the drought tolerance of crops is a principal target. Soil salinity is another major abiotic stress that threatens the sustainability of global crop production (Rengasamy, 2006). For instance, in Southern Asia and South East Asia about 48 million ha of potentially useful agricultural land is unusable due to saline soils (Hairmansis et al., 2014). Also, in those regions fertile land is often used to expand cities, and so crop production is decreasing significantly (Hedhly, 2011).
To assess the performance of plant species, it is crucial to increase the understanding about plant reactions to different stress environments and in which ways genotypes differ in such responses (Suter and Widmer, 2013). Moreover, for both economic and social importance, there is a need to know about the phenotype response to breed for increased yield and yield stability in the face of changing climate and environment (Brown et al., 2014). Hence, to improve crop production, it is necessary to link suitable phenotyping protocols in all stages, such as the screening of germplasm collections, mutant libraries, mapping populations, transgenic lines, breeding materials, and the design of "omics" and QTLs experiments (Salekdeh et al., 2009). Scientists are using advanced approaches to explain genetic mechanisms underlying the major plant phenotypic traits (Salekdeh et al., 2009). Furthermore, additional exploration and use of the novel plant development approaches are urgently required for the imminent challenging decade.

Importance of Advanced Phenotyping and Phenomics in Modern Agriculture
With the rapid development of sequencing technologies, whole genomes of many plant species are now available in online databases. The sequencing of the genome of the model plant Arabidopsis represented a landmark in plant genomics (Weigel and Mott, 2009). There are also many economically important crop varieties that have since been sequenced and annotated (Cannon et al., 2009;Weigel and Mott, 2009). However, making sense of, and exploiting genetic information for genomic analysis still requires considerable effort.
The selection of high yielding and stress-tolerant plants is necessary to ensure that crop production keeps pace with population growth. By establishing the connection between genotype and phenotype, it is possible to improve agricultural production to satisfy the requirement of the growing human population. Therefore, phenotyping is as important as genotyping in establishing the relationship between genes and traits. Indeed, phenotyping is rapidly becoming the major operational bottleneck in limiting the power of genetic analysis and genomic prediction.
Phenotyping tools in common use are labor-intensive, time-consuming, and costly, and require destruction of plants at fixed times or at particular phenological stages. The goal of current plant phenotyping is to raise the accuracy, precision, and throughput of phenotype inference at all levels of biological organization, while reducing costs and labor through mechanization, remote sensing, improved data integration, and experimental design. In addition, with technological advances in plant breeding, genetic progress is being pursued through "omics" approaches to reach the ideal phenotype, which will enable plants to have superior and stable yields under changes in climate and environment. These large-scale "omics" approaches are routinely used in various research disciplines of molecular plant biology to study cellular processes, their genetic control, and their interactions with environmental changes (Deshmukh et al., 2014).
The available components of "omics" approaches include genomics, proteomics, transcriptomics, epigenomics, and metabolomics (Chen et al., 2014a). Integrated "omics" approaches have more potential in aiding crop breeding, leading to a new approach, "phenomics," which involves high-throughput analysis of the physical and biochemical traits of an organism. The concept of phenomics has altered the strategy in crop development research; it is defined as the study of the phenome, the full set of phenotypes of an organism. In genomics, a sequenced genome is fully characterized, whereas in phenomics, we cannot characterize the entire phenome due to its highly dynamic and high-dimensional properties. However, we can carry out high-throughput and high-dimensional phenotyping of a set of particular traits. In plant phenotyping, throughput refers to the number of individual units at particular organizational levels within plants, and dimensionality refers to the diversity of phenotypic traits measured at various spatial and temporal resolutions and in different categories, such as plant structure, physiology, and performance. Dimensionality also includes the number of genotypes and the diversity of environmental conditions and treatments taken into account upon phenotyping (Dhondt et al., 2013).
Genotype-phenotype mapping, along with the significant rate of trait discovery, has enormously improved phenotypic prediction (Topp et al., 2013). Integrated data from phenotype and genome-wide approaches provide models of the biological processes over time and across various scales. Quantitative trait loci (QTL) mapping and genome-wide association studies (GWAS) have been a useful tool for genetic analysis, giving valuable information about genomes in various plant studies. They have been broadly adopted for gene mapping (Yin et al., 2004;Atwell et al., 2010;Huang et al., 2010;Wurschum et al., 2011;Ranc et al., 2012;Wang et al., 2012;Topp et al., 2013). Comprehensive phenome-wide data enable plant similarity or dissimilarity to be studied across the whole population. Consequently, phenomics studies increasingly characterize all possible phenotypes, establishing the structural, physiological, and performance related traits (biomass/ha, seed yield) under different environmental conditions for a given genotype.

Mechanism of Imaging Technologies: Meeting Challenges and Needs in Plant Phenomics
Imaging and image processing techniques with light sources from the visible to the near-infrared spectrum provide non-destructive plant phenotype image datasets. These approaches have improved the precision and speed of acquiring real-time, high-throughput, and high-dimensional phenotype data for modeling and prediction of plant growth and structural development (Tardieu and Tuberosa, 2010; Golzarian et al., 2011). The application of combined, novel image-based technologies in phenomics, together with dedicated high-throughput, dynamically controlled environment facilities, has resulted in increased performance and provides a new prospect for improving plant phenotypes.
Plant phenotyping, the recording of quantitative and qualitative plant traits, is not in itself a new area of research. It has been the backbone of most studies in ecology, agronomy, and eco-physiology to explore plant functional diversity, compare the performance of species, or study plant responses to the environment (Granier and Vile, 2014). Phenotyping has been progressing from the manual study, non-destructive or destructive, of a few different genotypes, which can only be done for a few replications. Non-destructive phenotyping is performed on intact plants, whereas destructive phenotyping is an invasive measurement after which the plants can no longer be used for further experiments. The developmental course (kinetics) of the same organ cannot be monitored destructively. Destructive measurements are generally more complicated and time-consuming, and demand high labor costs. However, when measurements are carried out manually, non-destructive phenotyping can be even more time-consuming and labor-intensive.
Advanced imaging-based phenotyping procedures are ideal for combining controlled irrigation and phenotyping protocols (Berger et al., 2010). They enable studies that establish potentially heritable traits and that help to understand the complex regulatory networks underlying adaptive phenotypic variation in a population with fully sequenced genomes in high-throughput quantitative studies (Cooper et al., 2009; Munns et al., 2010; Furbank and Tester, 2011). Imaging-based high-throughput plant phenotyping platforms have become popular tools for plant biology, underpinning the field of plant phenomics (Paproki et al., 2012). Various imaging methodologies, such as visible light imaging, infrared imaging, fluorescence imaging, and imaging spectroscopy, are being used to collect multi-level phenotype data from the macroscopic to the molecular scale over periods from a few seconds to weeks (Sozzani et al., 2014).
Since imaging methodologies are the key technologies in plant phenomics with increasing importance, the main goal is to measure quantitative phenotype through the interaction between plants and light, such as reflected photons, absorbed photons, or transmitted photons. The best phenotyping practice also requires standardized experimental protocols, including imaging sensor calibration and a precise definition of raw data processing routines.

Visible Light Imaging
In plant science, visible light imaging has been broadly adopted due to its low cost and simplicity. Using this imaging system, which perceives a wavelength range similar to that of the human eye (from 400 to 700 nm), two-dimensional (2D) images can be used to analyze numerous phenotypic characteristics and to record changes in a plant's biomass (Tackenberg, 2007; Bylesjo et al., 2008; Duan et al., 2011; Golzarian et al., 2011). To extend the spatial and volumetric information of phenotype images, three-dimensional (3D) imaging approaches have been developed, which could provide more accurate estimations of morphological features (Clark et al., 2011; Paproki et al., 2012).
Therefore, with the integration of 2D and 3D image analysis, visible light imaging techniques are popular components of integrated plant phenotyping platforms (Yang et al., 2013). Visible light imaging represents the raw data of a phenotype image as spatial matrices based on the intensity values relating to photon fluxes (red ∼600 nm, green ∼550 nm, blue ∼450 nm) of the visible light spectral band. Although it is the simplest method in plant phenotyping, its drawback is that visible images only provide physiological information, and a common problem during the segmentation process is caused by overlapping adjacent leaves and the soil background (Fiorani and Schurr, 2013; Li et al., 2014).
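The separation of plant pixels from the soil background in an RGB image can be illustrated with a short sketch. It uses the excess green index (ExG = 2g - r - b on normalized color coordinates), which is one common choice for this task and is not a method prescribed by the studies cited above. The function names, the synthetic pixel values, and the threshold of 0.1 are assumptions made purely for illustration.

```python
def excess_green(r, g, b):
    """Excess green index on chromatic coordinates: ExG = 2g - r - b."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: no color information
    return (2 * g - r - b) / total


def segment_plant(image, threshold=0.1):
    """Return a binary mask (1 = plant, 0 = background).

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    `threshold` is an illustrative value, not a calibrated one.
    """
    return [[1 if excess_green(*pixel) > threshold else 0 for pixel in row]
            for row in image]


# Synthetic 2 x 3 image: green leaf pixels next to brownish soil pixels.
image = [
    [(40, 160, 50), (45, 150, 40), (120, 90, 60)],
    [(110, 85, 55), (35, 170, 45), (115, 88, 58)],
]
mask = segment_plant(image)
projected_area = sum(map(sum, mask))  # number of plant pixels
```

A fixed threshold of this kind does not resolve overlapping leaves; it only addresses the plant versus background part of the segmentation problem.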

Infrared Imaging
Infrared imaging technologies are used for screening objects whose internal molecular movements emit infrared radiation (Kastberger and Stachl, 2003). Two popular types of infrared imaging device, near-infrared (NIR) and far-infrared (Far-IR, also called IR thermal), can be used to capture radiation images. Many studies have combined visible and NIR imaging to derive vegetation indices, because healthy plants reflect a large proportion of NIR light (800-1400 nm), whereas soil reflects little NIR light. Moreover, soil and unhealthy plants reflect considerably more red light than healthy plants (Yang et al., 2013).
The major advantage of visible light and NIR imaging is that they can assess plant health status in response to different stress conditions. Visible and NIR digital imaging techniques are more suitable for screening multiple traits and nitrogen status under stress conditions (Rajendran et al., 2009). For drought resistance, IR thermal imaging can be used to visualize temperature differences. Thermal infrared imaging has been introduced in both laboratories and fields, and can be used to characterize mutant screens, drought tolerance, salinity tolerance, osmotic tolerance, tissue tolerance, and Na+ exclusion. It can also be used to compare chlorophyll pigments, leaf color, and canopy temperature (Merlot et al., 2002; Jones et al., 2009; Munns et al., 2010). Infrared imaging has improved research on drought and/or salinity resistance by quantifying osmotic tolerance in response to drought or salinity stress (Munns et al., 2010).
The benefits of the infrared imaging technologies are that they provide spatial resolution and more precise measurement under changing environmental conditions, and in field trials a large number of plots can be imaged at the same time (Li et al., 2014). One limitation of thermal imaging in the field is that it needs to include correction of soil background, wind impact and effects of transient cloudiness (Jones et al., 2009;Munns et al., 2010;Fiorani and Schurr, 2013).

Fluorescence Imaging
Fluorescence imaging is used from the laboratory to the field. This imaging technique provides information about the metabolic status of the plant, obtained by artificial excitation of the plant photosystems and observation of the relevant responses (Li et al., 2014). It is based on charge-coupled device (CCD) cameras that are sensitive to fluorescence signals, where the signals arise from illuminating samples with visible or ultraviolet light. Ultraviolet illumination in the range of 340 to 360 nm generates two types of fluorescence (in the red to far-red region and in the blue to green region), which is the principle underlying multicolor fluorescence imaging. This technique offers the simultaneous capture of fluorescence emission, and provides a quick way to probe photosystem II status in vivo (Schreiber, 1986; Daley et al., 1989; Maxwell and Johnson, 2000; Baker, 2008).
Several uses of fluorescence imaging have been proposed for the early detection of stress responses to biotic and abiotic factors before a decline in growth can be measured (Baker, 2008; Jansen et al., 2009; Konishi et al., 2009; Munns et al., 2010; De Smet et al., 2012; Chen et al., 2014b). Portable fluorometers and fluorescence cameras are widely used to screen large mutant collections and to characterize mutants with different photosynthetic pigment compositions (Niyogi et al., 1998; Lu et al., 2011). Furthermore, fluorescence imaging provides a powerful diagnostic tool for resolving the heterogeneity of leaf photosynthetic performance, and is used in many areas of plant physiology (Baker, 2008). Most fluorescence imaging applications are limited to the seedling level or to single leaves of model crops. Therefore, it is necessary to develop more robust software and standard procedures for fluorescence image phenotyping, processing, and data analysis.
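As an illustration of how photosystem II status is commonly summarized from chlorophyll fluorescence images, the sketch below computes the maximum quantum yield Fv/Fm = (Fm - F0) / Fm per pixel from a dark-adapted minimal fluorescence image (F0) and a maximal fluorescence image (Fm). The pixel values are synthetic, and the cut-off of 0.75 used to flag low values is an assumed, illustrative threshold rather than a recommendation from the cited studies.

```python
def max_quantum_yield(f0, fm):
    """Fv/Fm = (Fm - F0) / Fm for one dark-adapted pixel."""
    if fm <= 0:
        raise ValueError("Fm must be positive")
    return (fm - f0) / fm


# Synthetic 2 x 2 fluorescence images (arbitrary camera units).
f0_image = [[200, 210], [300, 205]]
fm_image = [[1000, 1050], [600, 1025]]

# Pixel-wise Fv/Fm map.
fvfm_map = [
    [max_quantum_yield(f0, fm) for f0, fm in zip(f0_row, fm_row)]
    for f0_row, fm_row in zip(f0_image, fm_image)
]

# Flag pixels below an illustrative cut-off (assumption, not a calibrated value).
LOW_YIELD_CUTOFF = 0.75
low_yield_mask = [[value < LOW_YIELD_CUTOFF for value in row] for row in fvfm_map]
```

In this synthetic example only the lower-left pixel, with Fv/Fm = 0.5, is flagged; such spatial maps are what allow heterogeneity within a leaf to be seen.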

Spectroscopy Imaging
The use of spectroscopy imaging is very promising for plant phenotyping. It measures the interaction of solar radiation with plants, and originated from research on the remote sensing of vegetation (Kokaly et al., 2009; Li et al., 2014). Spectral measurements of the electromagnetic spectrum can be obtained through multispectral or hyperspectral cameras that are capable of scanning wavebands of interest at high resolution (Fiorani and Schurr, 2013). Multispectral and hyperspectral measurements of the absorption bands in the infrared range are used to describe water status and to estimate canopy water content. The most widely used example of spectral measurement is the derivation of reflectance vegetation indices, which range from simple differences between the reflectance values at two wavelengths to normalized reflectance values. The reflected spectra carry information about plant architecture and health condition, which can be used to evaluate growth characteristics.
Beyond visible and infrared imaging methods, hyperspectral imaging can divide images into many bands, thus covering a large portion of the electromagnetic spectrum (Yang et al., 2013). The high spectral resolution of hyperspectral technologies makes them an essential method for detecting the severity of damage caused by insects (Huang et al., 2012; Yang et al., 2013). Spectroscopy imaging is well suited for field phenotyping when combined with aerial platforms, but spectral cameras and their related infrastructure are relatively expensive.
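The two kinds of reflectance index mentioned above, a simple difference and a normalized ratio, can be sketched as follows. The normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), is used here as a familiar example of a normalized index; the reflectance values are synthetic and the function names are illustrative.

```python
def difference_index(nir, red):
    """Simple difference between two band reflectances."""
    return nir - red


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # no signal in either band
    return (nir - red) / denominator


# Synthetic reflectance values (fractions between 0 and 1).
healthy_leaf = {"nir": 0.50, "red": 0.05}
bare_soil = {"nir": 0.30, "red": 0.25}

ndvi_leaf = ndvi(healthy_leaf["nir"], healthy_leaf["red"])
ndvi_soil = ndvi(bare_soil["nir"], bare_soil["red"])
```

Normalizing by the sum of the two bands makes the index less sensitive to overall illumination than the simple difference, which is one reason normalized indices are widely used.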

Structural Tomography and Other Imaging
In recent times, modern optical 3D structural tomography and functional imaging techniques have been developed and extended to improve the visualization of living plants. Functional imaging techniques, such as chlorophyll fluorescence imaging and PET (positron emission tomography), are used to assess photosynthetic performance and stress, and focus on physiological changes (Baker, 2008). The combination of structural tomography and functional imaging allows the physiological activity of plants to be screened more precisely. Another novel imaging technique, MRI (magnetic resonance imaging), is used for imaging internal physiological processes occurring in vivo (Borisjuk et al., 2012). Screening the dynamic changes in plant functions and structures by combining MRI and PET provides a novel functional and structural imaging procedure (Jahnke et al., 2009).
The FRET (Förster resonance energy transfer) sensor is another non-invasive advanced imaging technology for high-resolution measurement of small molecules in living tissue, based on genetically encoded, ratiometric fluorescent sensors that bind to and report on levels of the target molecule (Jones et al., 2014). It is used for molecular phenotyping, and a single FRET sensor can lead to discoveries of multiple pathways and processes involved in the dynamics of the sensor target. The FRET sensor has to be expressed and properly characterized in the cellular/subcellular location of interest, and measurements can then be easily acquired with high temporal and spatial resolution (Okumoto et al., 2012). As application examples, FRET has been used in plant tissue to study calcium and zinc dynamics with subcellular spatial and real-time temporal resolution, to characterize sugar transport in roots of intact seedlings, and to identify novel sugar transporters (Jones et al., 2014). FRET could be an outstanding technology for advanced phenotyping, addressing many basic questions of plant growth and development.
Each of these digital photonics-based systems acquires phenotype image data from plant laboratories, greenhouses, or fields, and monitors the plants with a dedicated imaging sensor via a remote system. Table 1 summarizes the key optical photonics-based techniques and their applications in advanced phenotyping.

Experiment Setup and Large-Scale Phenotype Data Collection
High-throughput experimental samples are prepared in a control phenotyping station by selecting different genotypes under normal conditions and under various treatments (Figure 1). The onset and intensity of those conditions (biotic or abiotic) can be defined and controlled during the experiment. Since the acquired data must be analyzed with respect to the micro-climate and environmental conditions, it is very difficult to monitor and combine experimental materials in a dynamic process (Sadok et al., 2007; Parent et al., 2010). Advances in the automation of plant phenotyping and in robotic and sensor-based monitoring have enabled phenotype data to be acquired automatically at regular time intervals throughout the life cycle of the plant for a given experiment. High-throughput phenotyping facilities for these types of experiments are commercially available, but many laboratories are now developing their own systems (Granier et al., 2006; Walter et al., 2007; Jansen et al., 2009; Skirycz et al., 2011; Tisne et al., 2013; Yang et al., 2014). Currently, various research institutes, e.g., IPK Gatersleben, Germany; Crop Design, Gent, Belgium; The Plant Accelerator, Adelaide, Australia; and PhenoArch, Montpellier, France, are using these facilities. Another more advanced and dominant phenotyping platform, developed by LemnaTec, provides many software tools for plant phenotype screening and image analysis.
In modern phenotyping platforms, a fully automated control house enables plants to be delivered via conveyor belts to watering, weighing, and imaging stations, and several hundred individual plants can be imaged per day automatically. Such imaging platforms are designed either to move plants to a stationary camera or to move the camera robotically to a stationary plant. After suitable experiments have been designed, high-throughput phenotyping platforms non-destructively capture plant images in multiple categories [infrared (IR), fluorescence (FLUO), visible (VIS) spectra, etc.] for dissecting the phenotypic components (Brien et al., 2013; Dhondt et al., 2013; Klukas et al., 2014). In the case of Arabidopsis, the top view of the rosette image is sufficient for measuring rosette area (Walter et al., 2007; Skirycz et al., 2011; Tisne et al., 2013). However, monocot plant morphology is complicated and the top-view image alone is insufficient for morphological operations. Thus, a side-view image is also required (Sozzani et al., 2014). The automated phenotyping process acquires large numbers of side-view and top-view images from different angles at regular time intervals and stores them in an image data management server (Figure 1). Image processing is then one of the major tasks for acquiring accurate traits or features from these images (Hartmann et al., 2011; De Vylder et al., 2012; Klukas et al., 2014). Many general image processing software packages and tools are available for phenotype image processing and morphological operations on plants (Lobet et al., 2013). Phenotyping platforms and open source plant image processing and analysis software and tools are summarized in Tables 2 and 3, respectively.

Table 1. Summary of optical photonics-based key techniques and applications in advanced phenotyping.

Visible light
This imaging technique can be used to assess plant growth status, biomass accumulation, nutritional status, or health status (Golzarian et al., 2011; Camargo et al., 2014; Yang et al., 2014).

Thermal infrared
Thermal infrared imaging sensors include near-infrared and multispectral line scanning cameras. This imaging technique produces time series or single-time-point analysis based data.
Leaf area index, shoot or leaf temperature, surface temperature, insect infestation of grain, leaf and canopy water status, composition parameters for seeds, disease severity, etc.
This imaging technique is used to characterize plant temperature responses to water status and transpiration rate, and to detect differences in the stomatal conductance of the plant in adaptation to abiotic stress (Chen et al., 2014b).

Fluorescence
The fluorescence imaging technique detects signals from chlorophyll and other fluorophores using fluorescence cameras.
Photosynthetic performance, quantum yield, non-photochemical quenching, leaf disease severity assessments, leaf health status, etc.
It provides a fast way to probe photosystem status in vivo and to diagnose early stress responses before a decline in growth (Fiorani and Schurr, 2013), and is useful for disease detection in genetic disease resistance (Chen et al., 2014b), mapping QTLs for growth-related traits (El-Lithy et al., 2004), characterizing mutants with numerous photosynthetic pigment compositions (Niyogi et al., 1998), etc.

Hyperspectral
This imaging technique uses hyperspectral and thermal cameras that produce continuous or discrete spectra as raw data.
Water content, leaf growth and health status, panicle health status, grain quality, pigment composition, etc.
This imaging technique is used to measure spatiotemporal growth patterns during the experiment and to provide insight into the diversity of growth dynamics (Chen et al., 2014b).

CT
It is based on X-ray digital radiography/computed tomography.

PET
Positron emission tomography.
Water transport, flow velocity, etc.
This is used to visualize the distribution and transportation of radionuclide-labeled tracers involved in metabolism-related activities (Jahnke et al., 2009; Granier and Vile, 2014).

MRI
Magnetic resonance imaging.
Water content, morphometric parameters, etc.
The purpose of this imaging technique is to visualize metabolites, provide structural information, and monitor internal physiological processes occurring in vivo (Borisjuk et al., 2012; Granier and Vile, 2014).
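Because images are acquired at regular time intervals, each plant yields a time series of image-derived traits from which growth can be quantified. The sketch below computes the relative growth rate, RGR = (ln A2 - ln A1) / (t2 - t1), between consecutive imaging days from projected plant area. The use of projected area as a growth proxy, the variable names, and the numbers are assumptions made for illustration only.

```python
import math


def relative_growth_rates(days, areas):
    """RGR between consecutive time points: (ln A2 - ln A1) / (t2 - t1)."""
    if len(days) != len(areas):
        raise ValueError("days and areas must have the same length")
    points = list(zip(days, areas))
    return [
        (math.log(a2) - math.log(a1)) / (t2 - t1)
        for (t1, a1), (t2, a2) in zip(points, points[1:])
    ]


# Synthetic series: projected area (pixels) doubling every two days.
imaging_days = [0, 2, 4, 6]
projected_areas = [100, 200, 400, 800]

rgr = relative_growth_rates(imaging_days, projected_areas)
```

For this synthetic, purely exponential series the rate is constant at ln(2)/2 per day; in real data, changes in RGR over time or between treatments are the quantity of interest.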

Principles of Phenotype Data for Forecasting Plant Performance
High-throughput phenotyping provides multi-categorical phenotypic traits, and the corresponding trait analysis is essential for understanding (a) stress resistance, (b) insect and disease resistance, and (c) yield and quality improvement (Yang et al., 2013). The most often investigated phenotypic traits include leaf area index, biomass, canopy temperature, leaf number, seed yield, water content, leaf expansion rate, leaf shape, rate of photosynthesis, number of layers, tissue thickness, mesophyll conductance, cell size, cell division rate, and cell turgor (Tackenberg, 2007; Duan et al., 2011; Golzarian et al., 2011; Dhondt et al., 2013). Phenotype data obtained by the imaging system yield high-throughput phenotypic traits based on image color, shape, and texture (Aerts et al., 2014; Klukas et al., 2014). Color-related traits are derived from visible-light (RGB) camera images and are expressed as color intensities of pixels, whereas other traits are based on geometrical and mathematical measurements, e.g., area, compactness, circumference, roundness, plant height, plant width, and plant length (Klukas et al., 2014). These traits help to determine the similarity/dissimilarity among different genotypes and treatments, as well as different stress states and their effects on the phenotype (Chen et al., 2014b). Phenotypic features also depend on the camera used in the phenotype imaging system, for example, fluorescence-related features and tomography-related (CT) features (Konishi et al., 2009; Aerts et al., 2014; Klukas et al., 2014; Yang et al., 2014).
In phenotyping platforms for high-throughput phenotype imaging, plants are cultured under controlled environmental conditions in robotic control house systems for sample preparation. Each plant, subjected to a specific treatment such as stress and/or carrying a mutation, is placed in a container with a controlled nutrient supply, which is transported by a conveyor belt to the required position. The platform automatically screens germplasm resources and populations, and captures multiple top-view and side-view images. After image acquisition, the data are transferred to and managed by the data management system, together with the recorded environmental data and genotype information. Image processing methods are then used to calculate phenotypic traits/features from the image data. Data mining methods are used to acquire the values of the extracted features, or to statistically model and simulate the phenotype data in order to produce phenotype-genotype models in different environmental scenarios.
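The trait-calculation step can be illustrated with a minimal sketch, assuming the plant has already been segmented into a boolean mask (True for plant pixels). The function name `shape_traits`, the 4-connected perimeter estimate, and the compactness definition 4πA/P² are illustrative assumptions and do not reproduce the pipeline of any particular platform.

```python
import numpy as np


def shape_traits(mask):
    """Basic geometric traits from a boolean plant mask (True = plant pixel)."""
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("mask contains no plant pixels")

    area = int(mask.sum())                      # projected area in pixels
    height = int(rows[-1] - rows[0] + 1)        # bounding-box height
    width = int(cols[-1] - cols[0] + 1)         # bounding-box width

    # Perimeter estimate: count pixel edges where a plant pixel touches
    # background in one of the four axis directions.
    padded = np.pad(mask, 1, constant_values=False)
    perimeter = 0
    for axis in (0, 1):
        for shift in (1, -1):
            neighbour = np.roll(padded, shift, axis=axis)
            perimeter += int(np.sum(padded & ~neighbour))

    compactness = 4.0 * np.pi * area / perimeter**2
    return {
        "area": area,
        "height": height,
        "width": width,
        "perimeter": perimeter,
        "compactness": float(compactness),
    }


# Hypothetical 10 x 10 side-view mask containing a 4 x 6 pixel "plant".
mask = np.zeros((10, 10), dtype=bool)
mask[3:7, 2:8] = True
traits = shape_traits(mask)
```

In practice the pixel counts would be converted to physical units using the camera calibration, which is omitted here.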
Advanced mathematical and statistical methods are required to predict plant development performance using these multiple traits. For a better interpretation of results, the integration of experimental metadata within data schemas for the phenotypic, genomic, and environmental data is also required. A variety of methods and tools are widely used for phenotype data analysis. A range of univariate and multivariate statistical methods is used for hypothesis testing and for measuring interrelationships among the traits. Path analysis is used to control for covariation between variables and to test hypothetical causal graphs for an interpretative approach (Granier and Vile, 2014). Computer-vision based measurements and assorted data mining techniques provide a more useful infrastructure for phenotype data analysis. Such analytical approaches help to select robust genotypes and to describe variation in plant phenotypic characteristics, which has implications for crop development and food security (Camargo et al., 2014).
Phenotype data analysis and modeling provide a meaningful structure for plant studies. The results of phenotype data analysis explain trait-trait and trait-environment relationships, phenotypic variation, the features that are important for plant responses, and phenotype-genotype associations. Here, we describe high-throughput phenotype data analysis principles and methods (Figure 2) that can help plant researchers analyze large-scale phenotype image data for studying plant growth and development.

Name URL Description
This automated phenotyping platform is an integrated device, allowing simultaneous culture of 735 individual Arabidopsis plants and high-throughput acquisition, storage and analysis of quality phenotypes (Tisne et al., 2013).
TraitMill http://www.cropdesign.com High-throughput gene engineering platform developed by Crop Design. This is a highly versatile tool that enables large-scale transgenesis and automated high-resolution phenotypic plant evaluation (Reuzeau, 2007).
This is an automated high-resolution phenomic center which provides non-invasive analysis of plant structure, morphology and function by utilizing cutting edge information technology including high resolution cameras and 3D reconstruction software.
LemnaTec http://www.lemnatec.com Visualization and analysis of 2D/3D non-destructive high-throughput imaging; monitoring of plant growth and behavior under entirely controlled conditions in a robotic greenhouse system.
QubitPhenomics http://qubitphenomics.com Integrated conveyor and robotic high-throughput plant imaging system for laboratory, growth chamber and field phenotype screening and phenotyping.
HRPF N/A High-throughput rice phenotyping facility (HRPF) designed with two main sections: rice automatic phenotyping (RAP) and yield trait scorer (YTS). This high-throughput platform was developed for automatic screening of rice germplasm resources and populations throughout the growth period and after harvest (Yang et al., 2014).

Name URL Description
ImageJ http://imagej.nih.gov/ij A popular, powerful, and extensible application used to process and measure a large quantity of phenotypic traits captured by images.
IAP http://iap.ipk-gatersleben.de Large-scale plant phenotyping image analysis software for different species based on real-time imaging data obtained from various spectra (Klukas et al., 2014).
LAMINA http://lamina.sourceforge.net Automated leaf image analysis tool which measures a variety of characteristics related to leaf shape and size (Bylesjo et al., 2008).
Leaf Processor http://gips.group.shef.ac.uk/resources.html An application that semi-automatically stores a number of single-metric parameters and PCA analysis for leaf shape and size including contour bending energy (Backhaus et al., 2010).
FIGURE 2 | A general workflow for the high-throughput image data analysis. The workflow describes image data processing steps for the extraction of the quantitative traits (left) and the analytical methodology (right).

High-Throughput Phenotype Data Analysis
Image data are pre-processed to determine quantitative or qualitative values of the phenotypic traits of a plant/genotype in a given environment. Post-processing of the phenotypic traits by statistical analysis is then required to predict plant behavior. Different statistical methods and algorithms are used to analyze the processed phenotype image data set, so as to move from the raw data to final results.

Hypothesis
Before starting the image data analysis, a proper hypothesis is required that corresponds to the expectation of the experiment within an appropriate statistical framework (Vasseur et al., 2012; Vile et al., 2012; Aerts et al., 2014).

Data Quality
The selection of an inadequate part of a trait often affects the data quality in a negative manner. Noisy images strongly affect the dataset and can bias the results. Data normality tests and outlier detection are necessary to improve the data quality. Among the many normality and outlier test methods, the Shapiro normality test applied after an appropriate log transformation and the Bonferroni outlier test are commonly used (Camargo et al., 2014). Grubbs' test is another outlier detection method, which is suited to detecting a single outlier in a particular sample (Grubbs, 1950). These methods provide powerful statistics for testing data normality and controlling outliers in the data set. Phenotype data quality is also affected by throughput and image resolution (Dhondt et al., 2013).
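As a minimal sketch of these checks, the following example log-transforms a trait, applies the Shapiro-Wilk test via `scipy.stats.shapiro`, and runs a two-sided Grubbs' test for a single outlier. The helper `grubbs_single_outlier`, the significance level, and the shoot-area values are hypothetical and chosen only for illustration; the Bonferroni outlier test mentioned above is not shown.

```python
import numpy as np
from scipy import stats


def grubbs_single_outlier(values, alpha=0.05):
    """Two-sided Grubbs' test; returns (index, G, G_crit, is_outlier)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("Grubbs' test needs at least three observations")
    deviations = np.abs(x - x.mean())
    idx = int(np.argmax(deviations))            # most extreme observation
    g = deviations[idx] / x.std(ddof=1)
    # Critical value based on the t distribution with n - 2 degrees of freedom.
    t_crit = stats.t.ppf(1.0 - alpha / (2.0 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return idx, float(g), float(g_crit), bool(g > g_crit)


# Hypothetical projected shoot areas (pixels) for one genotype; the last
# value mimics a segmentation failure.
shoot_area = np.array([1510.0, 1620.0, 1580.0, 1495.0, 1705.0,
                       1660.0, 1540.0, 1600.0, 1575.0, 5200.0])
log_area = np.log(shoot_area)

# Shapiro-Wilk normality test on the log-transformed trait.
w_stat, p_value = stats.shapiro(log_area)

# Grubbs' test flags at most one outlier per call.
idx, g, g_crit, is_outlier = grubbs_single_outlier(log_area)
cleaned = np.delete(shoot_area, idx) if is_outlier else shoot_area
```

Because Grubbs' test assumes approximate normality and targets one outlier at a time, it is usually applied after the transformation and repeated with care if several suspicious values remain.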

Data Dimension
Phenotypic traits extracted from a high-throughput image dataset can be high-dimensional and highly correlated. Analyzing such a high-dimensional dataset can therefore be difficult due to the limitations of current analytical techniques.
To overcome these difficulties, the data size can be reduced with as little information loss as possible. Here, we mention two statistical methods that are popular for dimensionality reduction and projection of high-dimensional data sets.
(a) PCA (principal component analysis): Suppose that $X_{n \times p}$ is an adjusted phenotype data matrix of $n$ observations on $p$ traits. The basic equation of PCA is, in matrix notation, given by
$$Y = XW$$
where $Y$ is a matrix of new features, called principal components, each constructed as a weighted combination of the original traits/features, and $W$ is a matrix of coefficients determined by PCA.
(b) FA model (factor analysis model): FA is another data reduction tool, which removes redundancy or duplication from a set of correlated phenotypic traits. Under certain assumptions, the basic FA model is expressed as
$$X = \lambda F + e$$
where $X$ represents the observed features, $F$ represents the latent feature, $e$ is the measurement error, and $\lambda$ is the loading value for $X$.
These statistical methods make it easier to select important traits and to reduce the data size in order to explore the relationships between traits, their variation, and their relationships with environmental factors; they also show the contribution of a feature in a specific study (Vile et al., 2012). Analysis with these methods can describe phenotypic variation under different conditions and makes it possible to distinguish plants of different agronomic groups (Chen et al., 2014b). To evaluate the stability of the features selected by these methods, one can compute intra-class correlation coefficients for reliable analysis and inference (Aerts et al., 2014).
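The PCA projection can be sketched with a singular value decomposition of the column-centered trait matrix. This is a minimal illustration, assuming that "adjusted" means mean-centered; the function name `pca` and the simulated traits are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Return scores Y, coefficients W and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each trait column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                     # p x k coefficient matrix
    Y = Xc @ W                                  # n x k principal components
    explained = s**2 / np.sum(s**2)
    return Y, W, explained[:n_components]


# Hypothetical data: three traits driven by one latent "plant size" factor
# plus one unrelated trait.
rng = np.random.default_rng(0)
n_plants = 50
size = rng.normal(size=n_plants)
traits = np.column_stack([
    size + 0.05 * rng.normal(size=n_plants),        # projected area
    2.0 * size + 0.05 * rng.normal(size=n_plants),  # plant height
    -size + 0.05 * rng.normal(size=n_plants),       # a negatively related trait
    rng.normal(size=n_plants),                      # unrelated trait
])

Y, W, explained = pca(traits, n_components=2)
```

When traits are measured on very different scales, standardizing each column to unit variance before the decomposition is a common alternative to centering alone.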

Model Selection
An appropriate model is needed for phenotypic variance decomposition and biomass prediction. Linear mixed-effect models can be used for phenotypic variance decomposition. The decomposition shows the effects of genetic and environmental sources, and of their interaction, on the phenotypic traits. Through likelihood estimation of mixed-effect models, it is possible to test the significance of each effect with respect to phenotypic variance (Joosen et al., 2013; Chen et al., 2014b). Linear mixed-effect modeling approaches are also more appropriate when dealing with time series data, where observation variances are unequal or there is a degree of correlation between measurements (Camargo et al., 2014). Linear models and/or generalized linear models are widely used for biomass prediction (Golzarian et al., 2011; Camargo et al., 2014). To select effective predictors for biomass prediction, Akaike's information criterion can be used to compare all relevant regression models (Yang et al., 2014). During the selection, it is necessary to check for heterogeneity and to address autocorrelation in the phenotype data. Because phenotype datasets may contain redundant and reproducible features, stepwise variable selection methods can be used to select an optimal set of explanatory variables for an appropriate statistical model by removing multicollinearity (correlation among the independent variables of a regression model) among the features. Such a model provides more accurate biomass information.
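A minimal sketch of AIC-based predictor selection for a linear biomass model is given below. It assumes ordinary least squares with Gaussian errors, for which the AIC can be written up to a constant as n ln(RSS/n) + 2k. The trait names, coefficients, and simulated data are hypothetical, and an exhaustive search over subsets is used here instead of a stepwise procedure because only three candidate predictors are involved.

```python
import itertools

import numpy as np


def fit_ols_aic(X, y):
    """Least-squares fit with intercept; returns (coefficients, AIC)."""
    n = y.size
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1                     # coefficients + error variance
    aic = n * np.log(rss / n) + 2 * k
    return beta, aic


# Hypothetical image-derived predictors for 80 plants.
rng = np.random.default_rng(1)
n = 80
features = {
    "area": rng.uniform(50.0, 200.0, n),
    "height": rng.uniform(10.0, 60.0, n),
    "texture": rng.normal(0.0, 1.0, n),         # unrelated to biomass here
}
biomass = 0.8 * features["area"] + 1.5 * features["height"] + rng.normal(0.0, 5.0, n)

# Evaluate every non-empty subset of predictors and keep its AIC.
results = {}
for r in range(1, len(features) + 1):
    for subset in itertools.combinations(features, r):
        X = np.column_stack([features[name] for name in subset])
        results[subset] = fit_ols_aic(X, biomass)[1]

best = min(results, key=results.get)            # subset with the lowest AIC
```

For larger trait sets the number of subsets grows exponentially, which is one reason stepwise selection is often preferred in practice.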

Relationship Measurement
A bivariate relationship study is a powerful tool for describing trait-trait and trait-environment relationships for a given genotype. Such a study reveals the relationships between phenotypic traits and treatments, or other biotic and abiotic effects on the phenotype (Vasseur et al., 2012). Bivariate relationship studies include trait correlations, allometric relationships, and QTL relationships, which demonstrate strong genetic and phenotypic relations among traits of the same category (Chen et al., 2014b; Granier and Vile, 2014). These are also useful for measuring the genetic overlap and phenotypic similarity of different traits. A phylogenetic method within the data analysis can be used to infer the causal history of both a gene and its corresponding phenotype (Kaplan and Pigliucci, 2001; Fiorani and Schurr, 2013).
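Two of the bivariate measures mentioned above can be sketched in a few lines: a Pearson correlation between two traits, and an allometric exponent estimated as the slope of a log-log regression. The power-law form biomass = a * area^b, the parameter values, and the simulated data are assumptions made only for this illustration.

```python
import numpy as np

# Hypothetical traits for 60 plants: projected area and biomass related by
# a power law with multiplicative noise.
rng = np.random.default_rng(4)
area = rng.uniform(50.0, 500.0, 60)
biomass = 0.02 * area**1.3 * np.exp(rng.normal(0.0, 0.05, 60))

# Trait-trait correlation (Pearson's r).
r = float(np.corrcoef(area, biomass)[0, 1])

# Allometric relationship: log(biomass) = log(a) + b * log(area).
slope, intercept = np.polyfit(np.log(area), np.log(biomass), 1)
exponent = float(slope)                         # estimate of b
scale = float(np.exp(intercept))                # estimate of a
```

Fitting on the log scale treats the noise as multiplicative, which is a modelling choice rather than a requirement of allometric analysis.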

Networking
Network analysis is also essential for finding relationships among the significant traits. To describe the network relations among phenotypic traits, structural equation models and Bayesian networks are used for causal relationship and correlative network analyses, respectively. The objective of structural equation modeling is to quantify the relative contributions of correlated causal sources of variance once a certain network of interconnected features with biological significance has been accepted (Hershberger, 2001; Tisne et al., 2008). Bayesian networks can be used to visualize genetic and/or phenotypic structure using trait-trait genetic correlations and/or trait-trait phenotypic correlations (Chen et al., 2014b).

Growth Modeling
Another major part of phenotype data analysis is plant modeling (Kaitaniemi et al., 1999; Fournier and Andrieu, 2000; Buck-Sorlin, 2002; Evers et al., 2007; Buck-Sorlin et al., 2008; Xu et al., 2011). Visual 3D plant modeling and simulation provide a deeper understanding of plant growth and its relationship with the environment. Plant growth modeling helps to test hypotheses and carry out virtual experiments concerning plant growth processes (Fourcaud et al., 2008). Functional-structural (FS) plant growth models are extremely important for integrating biological processes with environmental conditions in 3D virtual plants (Vos et al., 2010). For more advanced research in plant sciences, time-lapse imaging-based phenotype data now provide an opportunity to fit models and predict plant growth under numerous conditions. To capture the dynamic behavior of plant growth, many models have been established for different patterns of growth (Paine et al., 2012). Among the available models, sigmoid models (logistic, Gompertz) have been reported to perform better for describing individual plant growth (Damgaard and Weiner, 2008; Karadavut et al., 2010; Chen et al., 2014b). Other population growth models, such as linear, exponential, power law, and monomolecular models, are also used for plant growth and pathological studies. In plant pathology, these models are often used for studying disease progression over time.
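Fitting a sigmoid growth curve to time-lapse trait data can be sketched with `scipy.optimize.curve_fit`. The example below assumes a three-parameter logistic form, K / (1 + exp(-r (t - t0))), and uses simulated leaf-area measurements; the parameter names, starting values, and noise level are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, K, r, t0):
    """Logistic growth: asymptote K, rate r, inflection time t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))


# Hypothetical time-lapse data: one measurement every two days.
days = np.arange(0.0, 40.0, 2.0)
rng = np.random.default_rng(2)
observed = logistic(days, 120.0, 0.25, 20.0) + rng.normal(0.0, 2.0, days.size)

# Rough starting values taken from the data themselves.
p0 = [observed.max(), 0.1, days.mean()]
params, covariance = curve_fit(logistic, days, observed, p0=p0, maxfev=10000)
K_hat, r_hat, t0_hat = (float(p) for p in params)

# Residual sum of squares, usable for comparing alternative growth models.
rss = float(np.sum((observed - logistic(days, *params)) ** 2))
```

A Gompertz or other growth function could be fitted by passing a different model function with the same call, and the residual sums of squares (or an information criterion) compared across models.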

Classification
Classification methods are useful for biological image analysis and have simplified numerous tasks (Kamber et al., 1995; Warfield et al., 2000; Cocosco et al., 2003; Li and Chen, 2009). For example, there is a need to control diseases and numerous stresses to maintain food quality worldwide and to reduce food-borne illness originating from infected plants (Schikora et al., 2008). A wide variety of plant stresses and diseases caused by environmental factors (such as light quantity, light quality, CO2, nutrients, air humidity, water, temperature, drought, salinity) or by other organisms (such as fungi, bacteria, and viruses) strongly reduce grain production and grain quality. Thus, it is important to detect and classify plant infestations (Granier and Vile, 2014). In most cases, symptoms of stress and disease in plants result in a change of plant color. Therefore, classification approaches can be used to classify the color-related traits obtained from plant phenotype image pixels under biotic and abiotic stress conditions (Schikora et al., 2012; Chen et al., 2014b). Many popular classification algorithms are helpful for plant research, such as the support vector machine (SVM), Bayesian classifiers, and neural networks (Schikora et al., 2010, 2012; Chen et al., 2014b).
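As an illustration of color-based classification, the sketch below implements a small Gaussian naive Bayes classifier (one simple kind of Bayesian classifier) for RGB pixel values. The class `GaussianNaiveBayes`, the two class labels, and the simulated pixel colors are hypothetical; an SVM or neural network from an established library could be substituted.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian per feature and class."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.log_priors_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.means_[None, :, :]
        # Log-likelihood of each sample under each class: (n_samples, n_classes).
        log_lik = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.vars_)[None] + diff**2 / self.vars_[None],
            axis=2,
        )
        return self.classes_[np.argmax(log_lik + self.log_priors_, axis=1)]


# Hypothetical pixel colors: green healthy tissue and yellowish stressed tissue.
rng = np.random.default_rng(3)
healthy_rgb = np.array([60.0, 140.0, 50.0])
stressed_rgb = np.array([150.0, 140.0, 60.0])


def sample_pixels(mean_rgb, n):
    return rng.normal(mean_rgb, 15.0, size=(n, 3))


train_X = np.vstack([sample_pixels(healthy_rgb, 200), sample_pixels(stressed_rgb, 200)])
train_y = np.array(["healthy"] * 200 + ["stressed"] * 200)
test_X = np.vstack([sample_pixels(healthy_rgb, 100), sample_pixels(stressed_rgb, 100)])
test_y = np.array(["healthy"] * 100 + ["stressed"] * 100)

model = GaussianNaiveBayes().fit(train_X, train_y)
predicted = model.predict(test_X)
accuracy = float(np.mean(predicted == test_y))
```

Real images rarely separate this cleanly, so held-out accuracy on labeled pixels from the actual experiment is the relevant check.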

Similarity/Dissimilarity Measurement
Clustering approaches provide important information regarding the similarity/dissimilarity among the significant features. For phenotype data analysis, they can be used to measure plant stress sensitivity between control and stressed plants, to assess the phenotypic trait similarity of different genotypes, to identify unknown groups of plant species, and to support the idea that phenotypic profiles correspond to similar genotypes (Chen et al., 2014b). K-means clustering, hierarchical clustering, and the self-organizing map (SOM) are very popular approaches for cluster analysis of various types of datasets. Furthermore, neighbor-joining trees and phylogenetic trees are useful for showing the phenotypic similarity and evolution of plants of various origins, revealing clusters of similar phenotypic patterns (Aerts et al., 2014). This type of analysis helps to distinguish the patterns of phenotypic traits, provides important trait information, and supports further evaluation of the defined traits. It is then possible to find significant associations between trait profiles, between pairs within the same group, or between groups of genotypes and phenotypes using correlation coefficients and test statistics (e.g., χ² test, one-sided Mann-Whitney U-test; Aerts et al., 2014; Chen et al., 2014b).
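A minimal k-means sketch shows how control and stressed plants might be grouped from two standardized traits without using their labels. The function `kmeans`, the trait names, and the simulated values are hypothetical, and the basic Lloyd iteration with random initial centers is used for brevity.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every sample to every center: (n_samples, k).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical traits: projected area and a greenness index for 30 control
# and 30 stressed plants.
rng = np.random.default_rng(5)
control = np.column_stack([rng.normal(100.0, 5.0, 30), rng.normal(0.8, 0.03, 30)])
stressed = np.column_stack([rng.normal(60.0, 5.0, 30), rng.normal(0.5, 0.03, 30)])
traits = np.vstack([control, stressed])
truth = np.array([0] * 30 + [1] * 30)

# Standardize so that both traits contribute on a comparable scale.
z = (traits - traits.mean(axis=0)) / traits.std(axis=0)
labels, centers = kmeans(z, k=2)

# Cluster numbering is arbitrary, so agreement is checked up to relabeling.
match = float(np.mean(labels == truth))
agreement = max(match, 1.0 - match)
```

Hierarchical clustering or a self-organizing map could be applied to the same standardized matrix when the number of groups is not known in advance.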

Conclusion and Future Indication
Research in plant biology has benefited, and continues to benefit, from the development of high-throughput trait measurement methodologies at different levels, including metabolomics, proteomics, and transcriptomics (Granier and Vile, 2014). Advanced phenotyping technologies combine molecular techniques and non-invasive sensors with computer vision approaches. These approaches contribute to substantial progress in high-throughput plant development research. This advanced research enables the observation of high-throughput phenotypic traits and of how these traits change depending on environment and genotype. These studies generate large-scale multidimensional data sets, requiring proper data management and analytical frameworks for their interpretation (Fiorani and Schurr, 2013; Klukas et al., 2014).
Most high-throughput phenotyping platforms accumulate huge amounts of image data, but these automated workflows may also increase the risk of data quality deterioration, and they might miss interesting phenotypes if proper checkpoints are not implemented at different stages of imaging and image processing (Arvidsson et al., 2011; Dhondt et al., 2013). Therefore, it is necessary to manage and process data efficiently. Although different techniques and analytical frameworks provide solutions for handling this big data problem, they are designed individually to address a few specific questions and types of trait information (Sozzani and Benfey, 2011). Hence, the major problem is the modeling and analysis of phenotype data. Existing statistical techniques and methods are often useful for dimension reduction, significant feature extraction, data pattern identification, and inference analysis (Granier et al., 2006; Karkee et al., 2009; Yang et al., 2009; Golzarian et al., 2011; Romer et al., 2011; Camargo et al., 2014).
There is an urgent need to develop, in the near future, more adaptable, less expensive, and more sophisticated data analysis infrastructures for analyzing high-dimensional phenotype datasets in the phenomics area. As more efficient statistical methods are developed, multidisciplinary simulation models might support proper experimental design and improved acquisition of phenotype data. These aspects will support the promotion and explanation of plant growth, development, and responses to adverse environments. In this review, we have discussed different imaging techniques, phenotyping platforms, image analysis pipelines, and phenotype data analysis methods for high-throughput plant studies. Based on our discussion, we suggest that scientists should address the remaining challenges to enable the development of optimal digital phenotyping platforms. These challenges include the reduction of phenotyping and other related laboratory costs, the development of efficient data storage and less expensive analytical tools, and the improvement of statistical methods for exploring dynamic plant phenotypic components and their properties.