Accurate Digitization of the Chlorophyll Distribution of Individual Rice Leaves Using Hyperspectral Imaging and an Integrated Image Analysis Pipeline

Pigments absorb light, transform it into energy, and provide reaction sites for photosynthesis; thus, the quantification of pigment distribution is vital to plant research. Traditional methods for the quantification of pigments are time-consuming and not suitable for the high-throughput digitization of rice pigment distribution. In this study, using a hyperspectral imaging system, we developed an integrated image analysis pipeline for automatically processing enormous amounts of hyperspectral data. We also built models for accurately quantifying 4 pigments (chlorophyll a, chlorophyll b, total chlorophyll and carotenoid) from rice leaves and determined the important bands (700-760 nm) associated with these pigments. At the tillering stage, the R2 values and mean absolute percentage errors of the models were 0.827–0.928 and 6.94–12.84%, respectively. The hyperspectral data and these models can be combined for digitizing the distribution of the chlorophyll with high resolution (0.11 mm/pixel). In summary, the integrated hyperspectral image analysis pipeline and selected models can be used to quantify the chlorophyll distribution in rice leaves. The use of this technique will benefit rice functional genomics and rice breeding.


INTRODUCTION
Rice is a staple food for a majority of the world population (Zhang, 2007). To meet the increasing demand due to natural disasters, human factors and the increasing world population on rice growth and yield, it is important to breed new rice varieties. In breeding research, the plant phenotype is essential for the evaluation of breeding results and gene functional analysis (Yang et al., 2013;Jasinski et al., 2016;Montagnoli et al., 2016;Negi et al., 2016). Plants contain pigments such as chlorophylls and carotenoids, which absorb light and provide energy for photosynthesis (Blackburn, 1998b). Chlorophyll is the major nitrogenous substance in higher plants and can be used for measuring plant growth (Kochubey and Kazantsev, 2007;Xue and Yang, 2009). The amount of chlorophyll present also determines a plant's photosynthetic capability, productivity and yield potential (Carter, 1998;Xue and Yang, 2009). Thus, quantification of these pigments is vital for rice phenomics and rice research.
Traditional methods for the quantification of plant pigments, including spectrophotometry (Ergun et al., 2004), paper chromatography (Sporer et al., 1954), thin-layer chromatography (Sievers and Hynninen, 1977), and high-performance liquid chromatography (Yuan et al., 1997), are time-consuming, destructive and not suitable for high-throughput phenotyping. Plant pigments have different absorption peaks under different wavelengths, which means that their spectral reflectance characteristics can be used for evaluating or distinguishing pigments (Benedict and Swidler, 1961;Gamon and Surfus, 1999). Using spectroscopy and a portable chlorophyll meter, several spectral indices have been identified, which can be used for predicting plant chlorophyll content non-destructively. Blackburn et al. reported that the amount of canopy chlorophyll a and b is related to the original reflectance at 676 and 810 nm (Blackburn, 1998a;Blackburn and Pitman, 1999). Because derivatization can reduce the noise caused by illumination, soil background, and atmosphere (Collins, 1978;Baret et al., 1992), derivative spectra have also been found to be more sensitive to the chlorophyll content and more effective than the original spectral index (Le Maire et al., 2004). Moreover, spectral indices calculated by the red edge can provide a more accurate estimation of pigment content (Miller et al., 1990;Zou et al., 2011). Researchers have also found that the ratio and normalized spectral indices are closely related to the pigment content (Moss and Rock, 1991;Chappelle et al., 1992). Yi et al. used partial least square regression and found that the reflectance at 515-550 nm, 715 and 750 nm regions had high sensitivity for detecting the carotenoid contents of cotton (Yi et al., 2014). A recent study has used hyperspectral imagery to estimate the spatial variability in the chlorophyll and nitrogen content of rice, with an R 2 of 0.69-0.82 (Moharana and Dutta, 2016). Researchers also used canopy reflectance to estimate the durum wheat nitrogen status, with an RMSECV of 19.3-36.3% (Thorp et al., 2017). Portable chlorophyll meters, such as  and , are widely used for measuring the chlorophyll content; however, manually operated portable chlorophyll meters are relatively subjective, and spectroscopy techniques cannot be used to digitize the chlorophyll distribution in rice leaves. Moreover, we summarized the recent studies on chlorophyll or nitrogen quantification that used spectral techniques (Supplementary Table 1). These studies showed that few efforts have been made to handle massive amounts of hyperspectral data and automatically digitalize the chlorophyll distribution in individual rice leaves with high-resolution.
In this study, we developed an integrated image analysis pipeline that can process extremely large amounts of hyperspectral data and built models to accurately measure 4 rice leaf pigments: chlorophyll a, chlorophyll b, total chlorophyll, and carotenoid. Moreover, by combining the hyperspectral data and the selected models, the distribution of these 4 pigments can be digitized with high resolution.

Hyperspectral Imaging System and Hyperspectral Indices Extraction
Three leaves were selected from the main stem of each rice plant and scanned using the hyperspectral imaging system, which consisted of 4 major parts ( Figure 1A): a halogen lamp, a translation stage, a hyperspectral camera (HyperspecTM VNIR, Headwall Photonics, USA), and a computer (OXPCO3, Dell, USA). To scan three leaves of one main stem simultaneously, the field of view was set at 115 × 180 mm. The major configurations of the hyperspectral imaging system are shown in Figure 1B, and the main parameters of the hyperspectral imaging system are shown in Table 1. The data were continuously stored as a binary data stream to acquire and store the original hyperspectral data as rapidly as possible. For each sample, the data size was 1.15 GBit.
After data acquisition, the binary data stream was reorganized to build 188 hyperspectral images under different wavelengths (Figures 2A-C). To process the massive number of images automatically, an integrated hyperspectral image analysis pipeline was developed (Figure 3). The detailed image analysis pipeline designed by LabVIEW is shown in Supplementary Figures 1-11, which included the following steps: (1) Open one binary data stream with the band interleaved by line format: The size of the hyperspectral data cube was 188 × 1,004 (W) × 1,637 (H). (2) The binary data stream was reorganized to build 188 hyperspectral images. (3) Image processing and ROI extracting: After image division, gray conversion, image binarization, horizontal open operation, removal of large areas, removal of noise, region growing, and extraction of the area of interest, a region of interest (ROI) was extracted for each leaf (Figures 2E-N). (4) ROI reflectance extracting: 188 original average reflectance indices

Manual Measurement
After hyperspectral imaging system acquisition, the ROI of each leaf was immersed in a 95% ethanol solution. When all of the pigments had been dissolved, a spectrophotometer (L3, INESA, China) was used to measure the absorbance values of the solution at different wavelengths (470, 649, and 665 nm, Figure 2O). Finally, the contents of 4 pigments, chlorophyll a, chlorophyll b, carotenoid, and total chlorophyll, were calculated according to Equations (1)-(4) (Arnon, 1949).
C a = 13.95A 665 − 6.88A 649 (1) C a is the chlorophyll a content, C b is the chlorophyll b content, C xc is the carotenoid content, and C is the total chlorophyll content. A 665 , A 649 , and A 470 represent the absorbance at 665, 649, and 470 nm, respectively. The distribution of the pigments at the two stages of plant growth is shown in Supplementary Table 3. For instance, at the tillering stage, the chlorophyll a content ranged from 61.24 to 573.63 mg/m 2 . The average value, the standard deviation, and the variable coefficient were 294.35 mg/m 2 , 92.19 mg/m 2 , and 31.32%, respectively. The correlation coefficients (r) between the pigments for the two stages were all above 0.88 (Supplementary Table 4), demonstrating that the concentrations of the various pigments were highly correlated.

Data Analysis and Modeling
To determine the specific bands that are highly correlated with chlorophyll a, we calculated all of the correlation coefficients between 634,615 spectral indices and 4 pigments. The calculation of correlation coefficients was programmed using LabVIEW 8.6 (National Instruments, Inc., USA). The hot bands were found using the heat maps of the correlation coefficients, which were drawn using HemI software (Deng et al., 2014). After all of the indices were obtained, the best index with the highest r was identified and used to build 5 models (linear, power, exponential, logarithmic, and quadratic models). The statistical analyses of the 5 models (linear, power, exponential, logarithm, and quadratic model) for 4 pigments and cross-validation were implemented with LabVIEW 8.6 (National Instruments, Inc., USA). To evaluate the model performance with primary indices or multiple variables, stepwise regression analysis (SRA) was conducted using SPSS software (Statistical Product and Service Solutions, Version 13.0, SPSS Inc., USA) ( Figure 2P). Finally, the digitization of pigment distribution was performed using LabVIEW 8.6 (National Instruments, Inc., USA).

The Relationship between Chlorophyll a Concentration and Hyperspectral Indices
The number of total indices was too large to handle (634,615 indices for each sample); thus, to decrease the number of redundant indices, we first determined the relationship between the chlorophyll content and all the hyperspectral indices. Because the pigments were highly correlated with each other (Supplementary Table 4), we used chlorophyll a as an example to define the relationship between the pigments and the hyperspectral indices. In the 500-700 nm region (Figure 4A), the reflectance R was inversely correlated with the chlorophyll a content, indicating that the higher the reflectance was, the lower the chlorophyll a content was. This occurred because leaves with high chlorophyll content absorbed more light, causing the reflectance to decrease ( Figure 2D). From Figures 4A-F, we found that compared with a logarithmic transformation, the use of derivative transformations such as dR, ddR, d(lg(1/R)), and dd(lg(1/R)) could provide more abundant hyperspectral information. Table 2 and chlorophyll a, and Figures 4J-L show the correlation between the normalized index (also defined in Table 2) and chlorophyll a. Each point on the heat map represents the correlation coefficient between a hyperspectral index and the chlorophyll a level. The correlation coefficients for other indices and the chlorophyll a level are shown in Supplementary Figures 12, 13. When R i and R j were both in the 500-750 nm region, the correlation coefficient was high, sometimes even close to 1. Thus, we can infer that useful information for estimating chlorophyll a can be obtained in the wavelength range 500-750 nm.

Figures 4G-I show the correlation between the ratio index as defined in
By comparing the data shown in Figures 4G-I, we found that for the ratio indices, the correlation between the derivative indices and chlorophyll a decreased, and the original hyperspectral index (average reflectance, R) showed better correlation with chlorophylla. As illustrated in Figures 4J-L, the same results could be obtained for the normalized indices. Thus, to decrease the redundant indices, primary indices, including the original average reflectance (R i ), first derivative index (d(R i )), second derivative index (dd(R i )), ratio index (R i /R j ), and normalized index ((R i -R j )/(R i +R j )), were used for the subsequent modeling and prediction of chlorophyll levels. A combined heat map obtained by adding together all of the heat maps of ratio and normalization coefficients is shown in Figure 5. From this, we found that the region of the highest correlation was located between 700 and 760 nm. If only the primary indices in the 700-760 nm region were used, the number of indices would decrease from 634, 615 to 483.

Linear Modeling with a Single Variable
After all of the indices were calculated, the hyperspectral indices with the highest correlation coefficients (r) of the pigments were selected for the modeling step, as shown in Table 3. The singlevariable model for 4 pigments at the tillering and heading stages is shown in Table 4, which show that R 2 ranged from 0.654 to 0.928, and the mean absolute percentage error (MAPE) was 6.94-12.84%. The scatter plots and the distribution of the relative error are shown in Figure 6 and Supplementary Figure 14, respectively, which show the points to be evenly distributed around the line y = x and that most of the relative error within the range ±10%. A 5-fold cross-validation of the single variable model for the 4 pigments at the two stages is shown in Table 4, which shows the ranges of R 2 and MAPE as 0.671-0.930 and 7.49-13.02%, respectively.
To evaluate the model's robustness, we evaluated the relationship between lg(R 715 )/lg(R 500 ) and the chlorophyll a level for different accessions grown under different nitrogen regimes at the tillering stage (Figure 7). The model was not sensitive to accession or the nitrogen application level. Figure 7B shows that the amount of chlorophyll an increased with increase in the nitrogen application level. Moreover, we also compared the best model for the 4 pigments in this study with the published indices, as shown in Table 3 and Supplementary Table 6. The correlation between the pigments and the indices selected in this study (0.81-0.96) was higher than the correlation between the pigments and the published index with the highest r (0.67-0.92). On the other hand, all of the published indices with high r values were based on at least one wavelength in the range of 700-760 nm, implying that this range (700-760 nm) is important for the quantification of leaf chlorophyll.

Comparison of Linear and Non-linear Models
To determine the best model for determination of chlorophyll a levels, 5 models, including the linear, power, exponential, logarithmic, and quadratic models, were compared. The results are shown in Table 5. We found that the linear model had the highest R 2 (0.928) and lowest MAPE (6.94%). Based on the relative robustness of the models, the linear model was selected as the final model for the quantification of chlorophyll. The results also indicate that the best relationship between the chlorophyll content and the index value was linear in our study.

Comparison of Models with All Indices and Models with Primary Indices
To compare the models that use all indices with those that use primary indices, we used 634,615 indices, and 483 primary indices for evaluating the model performance. The results (Table 3) showed that the highest r of the models that used primary indices (0.782-0.920) was similar to the highest r of the models that used all indices (0.809-0.963), indicating that the models that use primary indices are sufficiently accurate for the quantification of the 4 pigments. If only the primary indices were extracted and analyzed, the volume of hyperspectral data decreased from hundreds of thousands to hundreds, which dramatically reduced the workload of data acquisition and data analysis. The results of this comparison are shown in Table 3 and Supplementary Table 7. FIGURE 7 | Relationship between R 715 /R 500 and chlorophyll a content for different accessions (A) and for the same accessions under different nitrogen application levels (B) at the tillering stage.

Linear Modeling with Multi-Variables
We also evaluated the model performance using multi-variables.
To faciliate the evaluation, only some primary indices, including R, dR, and ddR, were used to build the model using a stepwise regression analysis. The results (Supplementary Table 8) showed that R 2 and R adj 2 increased slightly and that MAPE and RMSE decreased slightly as the number of independent variables increased. The distribution of the relative error of the model using a stepwise regression analysis and multivariables for chlorophyll a at the tillering stage is shown in Supplementary Figure 15, and 5-fold cross-validation of these models is shown in Supplementary Table 8.

Digitization of Leaf Chlorophyll Distribution
After the best single-variable model was built, it was used to digitize the leaf chlorophyll distribution at a high resolution (0.11 mm/pixel), as shown in Figure 8 (pseudo-color images). Figures 8A-C show the results obtained for one accession grown under different nitrogen application levels; with increasing nitrogen application, the chlorophyll a content increased dramatically. The chlorophyll a content of different accessions grown under the same nitrogen application level also varied (Figures 8D-F). Figures 8A-F show that for most samples, the chlorophyll concentration in the middle portion of the leaf was the highest, followed by the lower leaf and the upper leaf. Moreover, for the same leaf, the chlorophyll a content of the leaf vein was less than that of the leaf pulp, as shown in Figure 8G.

Modeling Nitrogen with Hyperspectral Imaging
A recent study showed that R 2 between the total chlorophyll content and leaf nitrogen content of Papaya plants (Castro et al., 2011) could reach 0.78, and hyperspectral reflectance measurements could reflect the canopy nitrogen content of winter wheat (Zhou et al., 2016). To test the correlation between the nitrogen and hyperspectral indices in rice, we measured 90 rice accessions, selected from 533 rice core germplasm resources, using an auto discrete analyzer (Smartchen 200, France), SPAD-502, and hyperspectral imaging. The correlation coefficient (r) between the SPAD value and the nitrogen content was 0.766 (Figure 9A), and r between the nitrogen content and hyperspectral measurements with 4 indices was 0.897 ( Figure 9B). Moreover, only using one index, the r between the nitrogen content and hyperspectral measurements was 0.773 ( Figure 9C). The results showed that nitrogen in rice plants could also be quantified using hyperspectral imaging.

Comparison of Recent Related Studies for Quantifying Chlorophyll or Nitrogen Distribution
We compared the present research with recent related studies and found that several key wavelengths that reflect chlorophyll, such as cotton at 715 and 750 nm (Yi et al., 2014), winter wheat at 705 nm and the red edge (Zhou et al., 2016), and grass at 690-750 (Tong and He, 2017), were co-determined. Moreover, the commonly adopted tools, such as ENVI and SAS, handled enormous amounts of hyperspectral data, particularly image analysis, with difficulty. To relieve the bottleneck, we developed an integrated image analysis pipeline in this study. With a single variable, the measuring accuracy of chlorophyll, R 2 , ranged from 0.654 to 0.928. Moreover, due to using hyperspectral imaging in a higher resolution (0.11 mm/pixel), the distribution of leaf chlorophyll could be clearly visualized. The goal of this article was to quantify the chlorophylls in individual rice leaves, which should be tested and verified in the field in future.
Combining the current field phenotyping tools, such as field phenotyping at the plot level (Andrade-Sanchez et al., 2014)  and movable imaging chambers in the field (Busemeyer et al., 2013), the integrated image analysis pipeline could be expanded to the field. Moreover, combined hyperspectral imaging with a novel sensor for structure imaging, such as a micro-CT (Mineyuki, 2014) and 3D laser scanning (Paulus et al., 2014), could also reconstruct the 3D distribution of chlorophyll in a high resolution.

CONCLUSIONS
In this study, we used a hyperspectral imaging system to develop an integrated image analysis pipeline to handle extremely large amounts of hyperspectral data automatically. We also built models that could be used to accurately quantify 4 rice leaf pigments and identify the important spectral bands (700-760 nm) associated with these pigments. Moreover, by combining the hyperspectral data and these models, the distribution of chlorophyll could be digitized with high resolution (0.11 mm/pixel). In the future, the pipeline and selected models can potentially be applied to quantify the chlorophyll distribution in individual plants non-destructively. Evidence from related works shows that the image analysis pipeline combined with hyperspectral imaging could also be extended for co-determining wavelengths for quantifying chlorophyll in other crops.

AUTHOR CONTRIBUTORS
HF and WY designed the research, performed the experiments, analyzed the data and wrote the manuscript. GC provided the rice samples and also performed experiments. LX and QL supervised the project, designed the research, and wrote the manuscript. Supplementary Figure 15 | Distribution of relative error of the one independent variable (A), two independent variables (B), three independent variables (C), four independent variables (D) models using stepwise regression analysis for chlorophyll a at the tillering stage.

ACKNOWLEDGMENTS
Supplementary