In-Field Detection and Quantification of Septoria Tritici Blotch in Diverse Wheat Germplasm Using Spectral–Temporal Features

Hyperspectral remote sensing holds the potential to detect and quantify crop diseases in a rapid and non-invasive manner. Such tools could greatly benefit resistance breeding, but their adoption is hampered by i) a lack of specificity to disease-related effects and ii) insufficient robustness to variation in reflectance caused by genotypic diversity and varying environmental conditions, which are fundamental elements of resistance breeding. We hypothesized that relying exclusively on temporal changes in canopy reflectance during pathogenesis may allow to specifically detect and quantify crop diseases while minimizing the confounding effects of genotype and environment. To test this hypothesis, we collected time-resolved canopy hyperspectral reflectance data for 18 diverse genotypes on infected and disease-free plots and engineered spectral–temporal features representing this hypothesis. Our results confirm the lack of specificity and robustness of disease assessments based on reflectance spectra at individual time points. We show that changes in spectral reflectance over time are indicative of the presence and severity of Septoria tritici blotch (STB) infections. Furthermore, the proposed time-integrated approach facilitated the delineation of disease from physiological senescence, which is pivotal for efficient selection of STB-resistant material under field conditions. A validation of models based on spectral–temporal features on a diverse panel of 330 wheat genotypes offered evidence for the robustness of the proposed method. This study demonstrates the potential of time-resolved canopy reflectance measurements for robust assessments of foliar diseases in the context of resistance breeding.


INTRODUCTION
Hyperspectral remote sensing has shown significant potential for the rapid, non-invasive assessment of crop diseases at different scales, ranging from single leaves (e.g., Mahlein et al., 2010;Ashourloo et al., 2014) to the canopy (e.g., Cao et al., 2013;Yu et al., 2018) to fields and regions (Wakie et al., 2016). Applications have been proposed primarily in the context of precision agriculture, but resistance breeding may equally benefit (Mahlein, 2016). The identification of novel sources of durable, quantitative disease resistance requires screening large and diverse germplasm collections under field conditions. Reflectance-based techniques hold the potential to reduce associated costs and allow for the screening of more genetic variation, if deployed on adequate phenotyping platforms (Kirchgessner et al., 2017;Aasen and Bolten, 2018). This may enable indirect selection in early breeding generations and facilitate the identification of novel sources of resistance.
However, to benefit crop breeding, new methods must accurately estimate phenotypes for large numbers of diverse genotypes under field conditions (Furbank and Tester, 2011;Araus and Cairns, 2014;Araus et al., 2018). This represents a significant challenge because genotypic diversity and contrasting environmental conditions are major sources of variation in spectral reflectance. This variation arises mostly from i) genotype morphology, canopy cover, and canopy 3-D structure (Haboudane et al., 2002;Zarco-Tejada et al., 2005;Jacquemoud et al., 2009;Gutierrez et al., 2015); ii) differences in genotype phenology and the timing of developmental transitions such as heading, flowering, and senescence (Pimstein et al., 2009;Stuckens et al., 2011;Kipp et al., 2014); and iii) reactions to other biotic or abiotic stresses, which may result in similar spectral responses as the disease of interest (Zhang et al., 2012). At present, effects of diseases on canopy reflectance are often investigated at specific points in time (see e.g., Yang, 2010;Cao et al., 2013;. Such investigations have provided valuable but highly context-specific insights (i.e., specific to the genotype, growth stage and/or site and environment under study; see e.g., Delalieux et al., 2007;Zhang et al., 2012;Zheng et al., 2019). Accordingly, identified spectral features and corresponding thresholds or calibration curves are not sufficiently robust (i.e., universally applicable) for use in resistance breeding. Largely due to such difficulties, high throughput phenotyping of disease resistance under field conditions using hyperspectral reflectance is still elusive (Araus et al., 2018).
Septoria tritici blotch (STB) caused by the fungal pathogen Zymoseptoria tritici is a serious threat to wheat production in major wheat growing areas around the world (Orton et al., 2011;Torriani et al., 2015). The development of cultivars with improved resistance to this disease has become a significant objective in wheat breeding and constitutes a key component of STB management strategies (O'Driscoll et al., 2014;Brown et al., 2015;McDonald and Mundt, 2016). Several major resistance genes conferring near-complete resistance to certain Z. tritici isolates have been identified and used in commercial cultivars (reviewed by Brown et al., 2015). However, these genes are frequently overcome within a few years of their introduction due to the high evolutionary potential of Z. tritici populations . Genetic loci conferring broadly effective partial resistance are thought to be more durable than major resistance genes (McDonald and Linde, 2002;McDonald and Mundt, 2016). However, sources of partial resistance are much more difficult to identify, as subtle differences in disease severity must be accurately quantified under field conditions, ideally over time.
Automated image analysis has shown great potential to accurately quantify STB resistance and characterize different components of resistance in genetically diverse breeding material (Stewart et al., 2016;Karisto et al., 2018). However, such measurements are more labor-intensive than visual scorings and do not provide the necessary throughput to routinely screen large breeding trials over time. Some recent work has investigated the potential of reflectance-based techniques to detect and quantify STB non-destructively at the leaf and canopy levels (Odilbekov et al., 2018;Yu et al., 2018). At the canopy level, the abovementioned challenges are particularly pronounced in the case of STB, because epidemics frequently reach damaging levels and affect crop performance most during the grain filling phase (Bancal et al., 2007). Consequently, detection and quantification of STB must be achieved in fully developed canopies with a complex architecture, and a clear delineation of STB and physiological senescence is essential for efficient selection.
Recently, efforts have been made to increase the specificity of reflectance-based methods. For example, new spectral vegetation indices (SVIs) with improved specificity to diseases have been developed by several authors for various patho-systems (e.g., Mahlein et al., 2013;Ashourloo et al., 2014). Other work has demonstrated that SVI combinations may allow to differentiate between diseases (Mahlein et al., 2010) and to delineate disease symptoms and nitrogen deficiency in wheat (Devadas et al., 2015). Yu et al. (2018) investigated the potential of different spectral features to robustly estimate STB severity at the canopy level in a large population of genetically diverse wheat genotypes. Other work has demonstrated that the sequence of temporal changes in hyperspectral reflectance signatures at the leaf level may be disease-specific, allowing to differentiate between sources of biological stress (Mahlein et al., 2010;Mahlein et al., 2012;Wahabzada et al., 2015).
Here, we aimed to achieve robust reflectance-based detection and quantification of STB under field conditions by exploiting changes in hyperspectral canopy reflectance over time. The basic rationale of the proposed approach is that pathogenesis consists in a specific and fixed sequence of events producing a constant outcome (i.e., disease symptoms). Accordingly, these events and outcomes should result in a specific and fixed sequence of changes in canopy spectral reflectance over time, irrespective of the genotype or environment under study. It seems highly likely that relying exclusively on this type of information increases the robustness of resulting estimations.
To test the feasibility of this approach, we engineered spectral-temporal features based on hyperspectral time series measurements. These features are designed to capture relevant changes in reflectance over time while minimizing the effect of the known confounding factors discussed above. We put forward the following hypotheses: (H 1 ) Confounding effects of contrasting morphology, canopy cover, and canopy 3-D structure are strongly reduced, if only relative changes in reflectance over time at the level of individual plots are analyzed. (H 2 ) Confounding effects of phenology can be reduced by using combinations of STBsensitive and STB-insensitive spectral features. Specifically, we hypothesize that several plant traits are relatively unaffected in their temporal dynamics by STB. Thus, related spectral features can be used as a baseline of changes in spectral reflectance over time, arising primarily from advancing crop phenology. This baseline can then be used to correct temporal patterns observed in STB-sensitive features for variation in phenology. Finally, we hypothesized (H 3 ) that the sequence and the dynamics of STBsensitive features are to a certain extent specific to this disease and not related to other biotic or abiotic stresses.
Thus, the objective of this study was i) to evaluate the potential of time-resolved hyperspectral reflectance measurements to detect and quantify STB infections, ii) to delineate STB and physiological senescence, and iii) to estimate the robustness of the proposed method and hence its potential for breeding applications.

MATERIALS AND METHODS
Plant Materials, Experimental Design, Phenology, and Meteorological Data A field experiment was carried out in the field phenotyping platform (FIP, Kirchgessner et al., 2017) at the ETH Research Station for Plant Sciences Lindau-Eschikon, Switzerland (47.449N, 8.682E, 520 m a.s.l.; soil type: eutric cambisol), in the wheat growing season of 2017-2018. A subset of 18 bread wheat (Triticum aestivum) genotypes was selected from the genom-analyse in biologischen system pflanze (GABI) wheat panel (Kollers et al., 2013; complemented with Swiss cultivars) for contrasting levels of resistance to STB and for contrasting stay-green properties based on previous experiments at the same location (Anderegg et al., 2019, submitted to this issue;Karisto et al., 2018). The set comprised morphologically diverse genotypes (e.g., awned and unawned), and there were obvious differences in canopy structural parameters (e.g., flag leaf angle) among the selected genotypes (Supplementary Figure 1). The study was conducted as a two-factorial experiment in a split-plot design with the presence/absence of artificial pathogen inoculation as a whole-plot factor and genotype as a split-plot factor.
Artificial inoculation with Z. tritici spore suspension was done on May 21, 2018. Z. tritici strains ST99CH_1A5, ST99CH_1E4, ST99CH_3D1, and ST99CH_3D7 were used (Zhan et al., 2002; see also http://www.septoria-tritici-blotch.net/isolate-collections. html). Spores were grown for 6 days in 200 ml of yeast sucrose broth liquid media (10 g yeast extract and 10 g sucrose in 1 L of water) in several flasks for each strain. The spore suspension was filtered and pooled together for each strain. Spore concentration was adjusted, and spore suspensions of each strain were mixed to achieve 150 ml of inoculum for each plot containing in total 10 6 spores/ml (2.5 x 10 5 sp/ml of each strain). Inoculum was sprayed in the evening into wet canopy of each plot.
There were two replications for the whole-plot factor. On the same site, the entire GABI panel was also grown in two replicates (two spatially separated lots). One replication of the inoculated plots each was located in a row adjacent to a lot of the GABI panel, separated by one row sown with the resistant cultivar CH NARA (DSP, Delley, Switzerland). One replication of the noninoculated control plots each was spatially randomized within a lot of the GABI panel. The experiments were sown with a sowing density of 400 plants m −2 on Oct 18, 2017. The plots sown with the GABI panel (and thus control plots within it) were treated with fungicides on three occasions: i) Input, Bayer (a mixture of sprioxamine at 300 g/L and prothioconazole at 150 g/L), was applied with a dose of 1.25 L/ha on 23 April 2018 (BBCH 31); ii) Aviator Xpro, Bayer (a mixture of bixafen at 75 g/L and metconazole at 41.25 g/L), was applied with a dose of 1.25 L/ha on 14 May 2018 (BBCH 51); and iii) Osiris, BASF (a mixture of epoxiconazole at 56.25 g/L and metconazole at 41.25 g/L), was applied with a dose of 2.5 L/ha on 28 May 2018 (BBCH 65). The inoculated control plots did not receive fungicide treatment. Temperature data was obtained from an on-site weather station. Rainfall data was obtained from a nearby weather station of the federal Swiss meteorological network Agrometeo (www. agrometeo.ch) located at ca. 250 m distance to the field trial. The temperature data was used to calculate growing degree-days (GDD) following  Lancashire et al., 1991). BBCH scores within the main growth stages were linearly interpolated between assessment dates. Stay-green was assessed visually as described previously (Anderegg et al., 2019, submitted), separately for the flag leaf and the whole canopy, following guidelines provided by Pask et al. (2012). Flag leaf stay-green was scored based on the portion of green leaf area on a scale from 0 (0% green leaf area) to 10 (100% green leaf area). An integer mean value was estimated for plants located in a central region of about 0.5 m × 0.5 m of each plot. Canopy stay-green was scored on the same scale by estimating the overall greenness of the plot when inspected at a view angle of approximately 45° considering the entire plot area. All scorings were done by the same person in 2-3 day intervals. An overview of measurements, scorings, and samplings is given in Table 1. Growth stages were recorded until physiological maturity following the BBCH scale (Lancashire et al., 1991).

Hyperspectral Reflectance Measurements
Canopy hyperspectral reflectance in the optical domain from 350 to 2500 nm was measured using a passive non-imaging spectroradiometer (ASD FieldSpec ® 4 spectroradiometer, ASD Inc., USA) equipped with an optic fiber with a field of view of 25°. Five spectra were recorded as the average of 15-25 separate spectral records while moving the fiber optic once along the diagonal of each plot at a height of approximately 0.4 m above the canopy. A Spectralon ® white reference panel was used for calibration before measuring canopy reflectance. The calibration was repeated after measuring one-half of a replicate (i.e., after nine plots, approximately every 3-5 min).
Measurements were carried out on 14 dates between heading and physiological maturity (i.e., between May 30 and July 12, 2018), resulting in an average of one measurement every 3 days. The maximum distance between two consecutive measurement dates was 6 days. In parallel, one lot of the GABI panel was measured on 13 dates. Here, the sensor calibration was repeated approximately every 10 min after completion of measurements on two rows.

STB Disease Assessment
The amount of STB in each plot was assessed on five dates (t1-t5) between 16 days after inoculation (dai, June 6, 2018) and 42 dai (July 2, 2018). STB was quantified by combined assessments of disease incidence (i.e., the proportion of leaves showing visible symptoms of STB) and conditional disease severity (i.e., the amount of disease on symptomatic plants). Disease incidence was assessed visually for 30 plants per plot by inspecting the leaves of one tiller per plant. Incidence scorings were obtained per leaf layer. Conditional disease severity was then measured using automated analysis of scanned leaves exhibiting obvious disease symptoms. To this end, eight infected leaves were sampled per plot, transported to the laboratory, and imaged on flatbed scanners following the method described by Stewart et al. (2016) and Karisto et al. (2018). However, to avoid interfering excessively with the development of the disease epidemic, leaf samples from inoculated plots were taken only if disease incidence was at least 1/3 (i.e., if at least 10 out of 30 examined leaves exhibited symptoms of STB infection). Thus, no leaf samples were taken at t1, while at t2, second leaves from the top were sampled from a subset of plots. Starting at t3, all plots were sampled at the flag leaf layer. In contrast, from non-inoculated control plots, eight leaves were sampled without reference to their disease status due to very low disease incidence. Automated image analysis was then used to extract the percent of leaf area covered by lesions (PLACL) from the generated leaf scans using thresholds in the HSV color space and functions of the Python application program interface of OpenCV V3.0.0 (https://opencv.org/). The precision of the automated image analysis method used here to assess STB has been demonstrated repeatedly in greenhouse and field studies (Lendenmann et al., 2014;Zhong et al., 2017;Krishnan et al., 2018;Meile et al., 2018;Stewart et al., 2018). The procedure was optimized to minimize the effect of insect damage, powdery mildew infections, and physiological senescence, particularly leaf-tip necrosis, on the derivation of PLACL. PLACL was extracted only from t2 and t3 scans, as leaves increasingly displayed physiological senescence at later time-points. Finally, overall disease severity was calculated by multiplying disease incidence with conditional disease severity for inoculated plots, whereas it was directly extracted from the leaf scans for control plots.

Data Analysis
All data analyses were done in the R environment for statistical computing (R version 3.5.2; R Core Team, 2018). Raw spectra were smoothed using the Savitzky-Golay smoothing filter (Savitzky and Golay, 1964) with a window size of 11 spectral bands and a third order polynomial, using the R package "prospectr" V0.1.3. (Ramirez-Lopez and Stevens, 2014). Spectral regions comprising the wavelengths from 1,350 to 1,475 nm, from 1,781 to 1,990 nm, and from 2,400 to 2,500 nm were removed because of the very low signal-to-noise ratio resulting from high atmospheric absorption. Spectra were averaged for each experimental plot. Pre-processed spectra, consisting of reflectance values at 1,709 wavelengths, were then used for time-point specific analysis as well as for time-integrated analyses, as described in the next sections. For ease of notation, the reflectance at a specific wavelength will be abbreviated by R followed by the wavelength in nm (e.g., R750).

Benchmark Time-Point Specific Analysis
The relationship between spectral reflectance and STB was studied on a diverse panel of wheat genotypes by Yu et al. (2018), but the analysis was limited to single time-points. We performed a comparable analysis for each measurement time-point as a benchmark and to estimate model transferability across time. Yu et al. (2018) reported improved prediction of STB severity metrics and classification accuracy when using the full spectrum rather than single SVIs. Therefore, our analysis focuses on these approaches. We tested two parametrically structured linear models [partial least squares (PLS) regression and ridge regression] and two tree-based ensemble models (random forest regression and cubist regression) for their capability to predict STB severity metrics. For classification, PLS discriminant analysis (PLSDA) was used (for details on these methods we refer to Kuhn and Johnson, 2013 and citations therein). Prior to modeling, spectral resolution was reduced to 6 nm by binning (i.e., by computing average values for six adjacent wavelengths) due to very high correlation of reflectance values at neighboring wavelengths. Following a standard procedure (Kuhn and Johnson, 2013), model hyperparameters were tuned using 10 times repeated 10-fold cross-validation. Thus, training and test datasets comprised 90 and 10% of the original dataset, respectively. The root mean square error of predictions (RMSE) and overall classification accuracy were used as performance metrics for regression and classification, respectively. The overall accuracy reflects the agreement between the predicted and the observed classes. This agreement can also be expressed in terms of sensitivity and specificity of the model, with Sensitivity plots correctly predicted as dise = # a ased diseased plots # and Specificity plots correctly predicted as heal = # t thy healthy plots # .
The simplest model with a performance within one standard error of the absolute best model was chosen as the final model. Variable importance for the projection (VIP) was computed for PLSDA models to estimate the importance of individual wavebands to predict the class (i.e., "healthy" or "diseased"). In the regression setting, two different training datasets were used for model fitting: the full dataset, including all control plots, and a dataset consisting of the inoculated plots only. When all control plots were used for model fitting, the RMSE and R 2 of the resulting models were adjusted by removing the predicted and observed values for the control plots again, in order to avoid overly optimistic performance estimates resulting from a good prediction of disease severity in control plots. The R packages "caret" V6.0.80 (Kuhn, 2008), "mixOmics" V6.3.2 (Rohart et al., 2017), "pls" V2.7.0 (Mevik et al., 2018), "Cubist" V0.2.2 (Kuhn et al., 2018), "ranger" V0.10.1 (Wright and Ziegler, 2017), and "elasticnet" V1.1.1 (Hastie, 2018) were used for the analysis.

Time-Integrated Analysis
Summarizing H 1 -H 3 , we hypothesized that the analysis of temporal dynamics in hyperspectral reflectance signatures may facilitate a robust detection and quantification of STB across diverse wheat genotypes under field conditions. To evaluate these hypotheses, we condensed the hyperspectral time series into time series of SVIs, similar to the procedure described previously (Anderegg et al., 2019, submitted). Thereby, we obtained a comprehensive summary representation of the hyperspectral dataset collected over time, interpretable in terms of plant physiology and canopy characteristics. The smoothness of SVI values over time was evaluated graphically, and only SVIs showing a clear and interpretable temporal trend were maintained for further analyses. Values of the selected SVIs were scaled to range from 0 to 10, representing the minimum and maximum values recorded during the assessment period for the corresponding experimental plot, respectively. To simplify subsequent steps in the analysis, the scale for SVIs with increasing values over time was inverted. Measurement dates were converted to thermal time after heading by subtracting the plot-specific accumulated thermal time at heading from the accumulated thermal time at each measurement date. The scaled SVI values were then fitted against thermal time after heading for each experimental plot and SVI using a Gompertz model with asymptotes constrained to 0 and 10 (eq. 1).
where S represents the scaled SVI value, t is the accumulated thermal time after heading, b is the rate of change at time M, and M is the accumulated thermal time after heading when the rate is at its maximum (Gooding et al., 2000). Eq. (1) was fitted using the R package "nls.multstart" V1.0.0 (Padfield and Matheson, 2018). Two types of dynamics parameters for each experimental plot and SVI were extracted from the resulting model fits: 1) "key time-points, " which are specific points in time when a certain criterion (e.g., a threshold) is met, and 2) "change parameters, " which represent the rate or duration of a process (Figure 1). We extracted two key time-points: the M parameter of the Gompertz model and the time when fitted values decreased to 8.5 (t85). As change parameters, we used the rate parameter b of the Gompertz model and the duration between t85 and M (dur). While the b and M parameters of the Gompertz model fully describe the fitted curve, the t85 and dur parameters are affected by both Gompertz model parameters, thus representing a mix of both. The threshold was set to 8.5 because i) this level efficiently captured observed variation during the late stay-green phase (Figure 2), ii) it was little affected by somewhat unstable values during the stay-green phase observed for some SVIs, and iii) for some SVIs, the initial highest values were not optimally represented by the Gompertz model. Next, SVIs least affected in their temporal dynamics by the presence or absence of STB infections were selected separately for each dynamics parameter as follows: for the key time-points (t85, M) by selecting SVIs with the smallest average difference between the parameter values of the inoculated and non-inoculated control plots and for change parameters (b, dur) by selecting SVIs with the smallest average deviance from 1 of the ratio of the change parameter values. For each dynamics parameter, a subset of eight SVIs with the smallest difference or ratio was selected. All other SVIs were considered to be significantly affected by STB infection. Figure 2 shows an example of an STB-sensitive and an STB-insensitive SVI.
We then performed unsupervised subset selection (i.e., without considering the response) on both sets of SVIs (i.e., the STB-sensitive and STB-insensitive SVIs) with the aim of removing redundant SVIs. For each dynamics parameter (t85, M, dur, b), pairwise Pearson correlation coefficients between the parameter values derived from all used SVIs were computed. For change parameters, the maximum linear correlation allowed was set to r = 0.9, whereas for the key time-points, it was set to r = 0.95, as these were generally highly collinear. In cases where pair-wise correlations were higher than these threshold values, only one of the two SVIs was retained, preferring narrow-band SVIs over broad-band SVIs, SVIs with a specific physiological interpretation, and SVIs developed specifically for use in wheat or barley canopies over more generic SVIs. Additionally, the goodness of the Gompertz model fit was evaluated qualitatively (i.e., graphically) and used as an additional selection criterion. The parameters were then combined by calculating differences between the key time-points derived from selected STBsensitive and STB-insensitive SVIs and the ratios of the change parameters derived from STB-sensitive and STB-insensitive SVIs (Figure 3). These differences and ratios were calculated for all possible pairs of STB-sensitive and STB-insensitive SVIs and were then used as features for 1) the classification of plots into non-inoculated healthy control plots and inoculated, diseased plots and 2) the prediction of STB severity in each plot. This final step was performed primarily to estimate whether a combination of features outperform single features in predicting STB severity (relevant with respect to H 3 ) and to identify the most predictive features (see section 2.4.3). Models and model fitting procedures were identical to the time-point specific analysis.

Selection of Spectral-Temporal Features
While tree-based models are considered naturally resistant to non-informative predictors and some perform feature selection intrinsically, the presence of highly correlated predictors makes the interpretation of resulting variable FIGURE 1 | Extraction of dynamics parameters for one spectral vegetation index (SVI; here scaled values of the plant senescence reflectance index) and one experimental plot (here a non-inoculated control plot). The t85 parameter is the time point when fitted scaled SVI values decrease to 8.5; M is a parameter of the Gompertz model, representing the time point when the rate of decrease is at its maximum; the dur parameter represents the duration in thermal time between t85 and M; and b is a parameter of the Gompertz model, representing the maximum rate of decrease. M and t85 are labeled "key points"; dur and b are labeled "change parameters." importance measures challenging (Strobl et al., 2007). Hence, we performed supervised feature selection by recursive feature elimination with cubist and random forest regression as baselearners, using a nested cross-validation approach. The dataset was resampled 30 times with an 80:20 split using stratified sampling. Samples were binned into eight classes based on percentiles of STB severity to ensure balanced evaluation datasets. Thus, for each resample, feature elimination was carried out on 80% of the data, and model performance was evaluated on the remaining 20% in 28 decreasing steps. Eliminated features were assigned a rank corresponding to the iteration after which they were excluded (i.e., those eliminated first had rank 28, whereas the feature retained as the last had rank 1). In each iteration, the base-learner hyperparameters were tuned using 10-fold cross-validation (see Ambroise and McLachlan, 2002;Guyon et al., 2002;Granitto et al., 2006 for a detailed discussion of the methodology).

Independent Model Validation
Due to the relatively small size of the experiment (n = 72 experimental plots), we did not rely exclusively on the cross-validated training performance estimates for model evaluation. The performance and robustness of the developed models were further evaluated using data of 360 wheat plots sown with 330 registered varieties obtained from one replication of the GABI panel. Low to intermediate disease incidence and very low conditional disease severity as well as late appearance of symptoms in all 36 control plots spatially randomized within the two replications of the panel suggested that STB disease should not have reached damaging levels in the vast majority of these plots. Therefore, these plots were considered as essentially disease-free. For all of these plots, spectral-temporal features were extracted from the 13 measurement time-points as described previously and were then used to generate a class label and class probabilities from the classification models as well as a prediction of disease severity from the regression models. To distinguish the performance measures obtained for held-out samples of the main experiment (i.e., the cross-validated training performance) from those obtained for the independent plots, these are referred to as the internal accuracy (acc int ) and the generalized accuracy (acc gen ), respectively. It is important to note that accuracy represents only model specificity in this case, as no independent plots with significant levels of STB were available. In a final validation step, the spatial distribution of class labels and severity predictions were examined by creating plots of the experimental design. Thus, we aimed to test the robustness of the models to heterogeneous field conditions. Field heterogeneity may affect plant physiology and thus hyperspectral reflectance over time (e.g., through the development of local drought stress). The presence or absence of spatial patterns in model predictions can therefore be interpreted as an additional measure of model robustness.

Validation of the Most Predictive Feature in a Contrasting Environment
Spectral-temporal features were engineered specifically to minimize effects of genotype and environment. However, it is still conceivable that relative changes in spectral reflectance over time are also affected by environmental conditions. This may result in the extraction of environment-specific relationships between spectral-temporal features and disease severity. We therefore evaluated the relationship between the most predictive spectral-temporal feature and STB severity using data from a separate year. This dataset enables a rigorous validation of the spectral-temporal features as predictors of STB for several reasons: i) it originates from a strongly contrasting environment with wet and cool weather conditions; ii) sowing parameters were different, likely affecting canopy structural parameters; iii) only natural variance in disease resistance was observed, as FIGURE 3 | Derivation of the final key time points based predictors for disease classification and quantification. Key time points extracted from disease-sensitive and disease-insensitive spectral vegetation indices (SVI) are combined to isolate the effect of the disease from other effects (e.g., contrasting stay-green) by calculating the differences (highlighted by red arrows). (A) Control, "healthy" plot. (B) Inoculated, "diseased" plot. For change parameters, the ratio, rather than the difference, was calculated. artificial inoculations were not performed; iv) a large number of morphologically, phenologically, and structurally diverse genotypes not contained in the training population were assessed; v) reflectance measurements were not carried out with a sufficient frequency to allow fitting of parametric models, and we had to use linear interpolations of individual measurement time points instead, losing the advantage of the smoothing effect; and finally, vi) sampling procedures to quantify STB severity were not optimal for our purpose. As parametric models could not be fitted, the time point when interpolated values decreased below 50% of their initial value (t 50 ) was extracted as an equivalent to the M parameter of the Gompertz model. For more details on datasets and experiments, we refer to Karisto et al. (2018) and Anderegg et al. (2019, submitted).

Development of STB Disease
Toward the end of the vegetation period, all inoculated plots had substantial levels of STB. In contrast, non-inoculated control plots were essentially disease-free until late in the vegetation period. Thus, artificial inoculations were effective in all plots, and the dataset was suitable for testing the feasibility of the classification of plots into healthy and diseased canopies based on reflectance spectra or spectral-temporal features. Furthermore, large variation in the levels of STB could be observed among the inoculated plots, probably attributable to different levels of resistance, with the largest variation observed during late stay-green (i.e., at t3, June 19, 2018, compare with Figure 4B). Thus, the dataset was also suitable for testing the feasibility of disease quantification using reflectance spectra or spectral-temporal features.
High levels of STB in inoculated plots were the result of both high incidence and high conditional severity, particularly at t4 and t5 ( Figure 5A, Table 2). Visual assessments of scanned leaves suggested a high conditional severity in all inoculated plots at these late stages. In contrast, the non-inoculated control plots displayed very low levels of STB even at late stages. STB incidence increased in some plots at t4 and t5, but visual assessments of the sampled leaves demonstrated very low conditional disease severity even at t5. Since the subset of genotypes used for the experiment also included some highly susceptible genotypes, this suggests that natural infections did not cause agronomically significant levels of STB in this experiment. This was likely the result of fungicide applications and the very low rainfall in the period from May to July. Rainfall in this period totaled 178 mm, which represents 52% of the average of 343 mm in the reference period 1981-2010 (MeteoSwiss, 2019). STB incidence was low at t1, both in inoculated and in control plots. Symptoms were apparent at significant levels only on lower leaf layers of inoculated plots, whereas flag leaves were essentially disease-free in both treatments. At t2, there was approximately a five-fold increase in STB incidence at the flag leaf and subtending leaf layer in many inoculated plots ( Figure 5A, Table 2). At t3, STB incidence on flag leaves reached very high levels in most inoculated plots, and PLACL reached an average of 17%, indicating a moderate conditional STB severity on average. Thus, the observed differences in STB severity among inoculated plots are primarily a result of differences in conditional severity, with PLACL ranging from 0 to 38% (Figure 5B). In control plots, almost no lesions were detected. There were some signs of physiological senescence on sampled flag leaves at t3, but these were mostly limited to yellowing of the entire leaves and/or leaf tip necrosis. As there was ample variation for disease severity at t3 among the inoculated plots and physiological senescence did not significantly affect extraction of PLACL from leaf scans, this time-point was chosen as a measure of overall disease severity and used as response variable in the time-integrated analysis.

Effects of STB and Phenology on Canopy Spectral Reflectance
Over the assessment period, observed changes in spectral reflectance were similar for diseased and healthy canopies ( Figure 4A), showing the typical pattern of senescing canopies. However, an obvious effect of STB infections consisted in an early marked decrease in reflectance in the near-infrared region (NIR) not observable in healthy canopies. This decrease preceded the appearance of physiological senescence and was observable in the pre-symptomatic phase of STB infections. Furthermore, an early increase in reflectance in the VIS, especially for wavelengths greater than 535 nm, was observed. An early increase in shortwave infrared region (SWIR) reflectance in diseased canopies compared to healthy canopies was also discernible. However, these differences were small compared to changes in reflectance over time.
Canopy spectral reflectance seemed to remain relatively constant throughout the stay-green phase in healthy canopies ( Figure 4A, upper panel). However, the examination of SVI values over time revealed significant changes in canopy reflectance during this period ( Figure 4C). Importantly, variation caused by advancing phenology (i.e., within-treatment variation FIGURE 5 | Development of Septoria tritici blotch (STB) disease. (A) STB incidence at five different assessment dates for non-inoculated "healthy" control plots (ctrl) and for artificially inoculated "diseased" plots (dis). In diseased plots, STB incidence on flag leaves was assessed at all time points, whereas for the control plots, it was assessed only from t3 onward. Filled boxes represent STB incidence on flag leaves; open boxes represent STB incidence on lower leaf layers. 1) Open boxes represent STB incidence on third leaves from the top; 2) open boxes represent STB incidence on second leaves from the top. (B) Conditional disease severity measured as percent leaf area covered by lesions (PLACL) at t3 (June 19, 2018) for eight flag leaves per plot for all 72 experimental plots.
TABLE 2 | Summary of Septoria tritici blotch (STB) assessments. STB incidence and conditional severity were assessed at the level of individual leaf layers, namely, the flag leaf (Fl0), the penultimate leaf (Fl1), and the ante-penultimate leaf (Fl2). Conditional severity was measured as percent leaf area covered by lesions (PLACL). Severity was calculated as the product of STB incidence and conditional severity. Values are reported separately for non-inoculated control plots and inoculated plots, separated by a slash. Mean values across all plots are reported, with minima and maxima in brackets. Disease assessments were carried out on five dates (t1-t5) covering the growth phases of 15 days after inoculation (dai) to 41 dai.  Figure 4C) was prominent with respect to STB-induced variation (i.e., between-treatment variation in Figure 4C). This is true even for the structure insensitive pigment index (SIPI), which has been proposed as a potential surrogate for crop disease under field conditions ; Figure 4C). For several other spectral indices, initial variation as well as variation over time was even larger (data not shown).

Time-Point Specific Full-Spectrum Analysis
Binary Classification Into Healthy and Diseased Canopies Using Reflectance Spectra PLSDA models correctly classified all held-out samples in most resampling iterations, resulting in classification accuracies acc int ≥ 0.96 for all time-points (Figure 6). The optimal number of components used by the PLSDA models (determined via repeated CV) was between 5 and 17, depending on the time-point. Correct class labels were obtained for all held-out samples even for the first time-point at 9 dai, when no visual symptoms of STB were present in most experimental plots. However, prediction accuracies for the independent GABI plots were distinctly lower, particularly for models calibrated with data from early and late measurement time-points (data not shown). A satisfactory performance on independent plots was observed for models calibrated with data from the late stay-green phase (i.e., between 2018-06-10 and 2018-06-22), which correctly predicted the independent plots as disease-free in most cases (acc gen ≥ 0.88 in all cases). VIP scores quantify the importance of wavebands to predict the response, i.e., to generate the class label ("healthy" or "diseased") or to predict STB disease severity here . VIP scores for the first three components showed some general patterns across time-points (Figure 6). The NIR (750-1,300 nm) and the SWIR (1,475-1,781 and 1,991-2,400 nm) had a relatively high importance (Figure 6). However, the FIGURE 6 | Variable importance for the projection (VIP) of the time point specific partial least squares discriminant analysis (PLSDA) models for the first three components (comp.1-comp.3). The total number of components used by the model (ncomp), the prediction accuracy for plots included in the experiment (acc int ), and prediction accuracy for the independent test set comprising >300 plots of an adjacent experiment measured on the same day or with a maximum delay of one day (acc gen ) are also reported for each model. The earliest and latest time points are not represented. The gray shaded area represents the spectral range between 680 and 750 nm, i.e., the red edge. The horizontal black line marks a commonly used threshold value for an important contribution (VIP = 0.8).
relative importance of the SWIR compared to the NIR drastically changes over time. The importance of the SWIR is comparably low during early stay-green, but its importance greatly increases and exceeds the importance of the NIR during late staygreen. Furthermore, the red edge (RE, 680-750 nm) had a low importance at the beginning but is increasingly important at later stages, as indicated by a gradual left-shift of the peak in VIP at the NIR for later time-points. Finally, at early time-points, there is a significant contribution of wavebands in the visible range (VIS, 400-700 nm). This feature is somewhat transformed over time, resulting in a narrow peak in VIP at wavelengths around 535 nm at intermediate time-points. Toward later time-points, this feature broadens again.
Given the common patterns but also significant differences across time-points, we aimed to evaluate the robustness of the developed models to temporal changes in reflectance induced by advancing crop phenology, as differences in crop phenology are typically present among genotypes in breeding programs. Model performance across time is shown in Figure  7. The performance of models calibrated with data from early and late time-points quickly deteriorates. In contrast, models calibrated with data of intermediate time-points show a higher stability over time, although the performance of some models still decreases rather fast. The models created using data from 2018-06-10 and 2018-06-19 were most robust over time and produce accurate class predictions over a period of about 10 days. It is essential to note that these performance estimates are derived from the same plots used to calibrate the models, although at different time-points. Given the lower acc gen compared to the acc int (see above), significantly decreased performance should be expected on entirely independent plots (different genotypes).

Regression Models to Quantify Disease Severity Using Reflectance Spectra
Cubist regression models performed best in predicting disease severity. The smallest RMSE was obtained for models trained on data from inoculated plots only (RMSE = 0.061, R 2 = 0.67; Figure 8). The underlying model was simple, building on only four variables (R748, R766, R892, and R1084) in a single model tree. Model performance was slightly decreased when all available data was used for model fitting (RMSE adj = 0.066, R 2 adj = 0.55). Here, significant improvements were achieved by increasing model complexity. Validation on the largely disease-free plots of the GABI panel suggested a high specificity of the model (i.e., disease levels on healthy plots were predicted to be virtually zero for almost all plots). This was true irrespective of whether the control plots were included in the training dataset or not. PLS and ridge regression performed similarly (RMSE = 0.063, R 2 = 0.64), whereas random forest regression performed comparably poorly (RMSE = 0.087, R 2 = 0.46). A strong systematic bias was observed in predictions of the random forest, with low values of disease severity overestimated and high values underestimated.

Engineering of Spectral-Temporal Features as New Predictors
Fifty-seven of the tested SVIs were deemed amenable for analysis of their temporal dynamics in the proposed framework (i.e., they displayed a clear and interpretable temporal trend, which could be represented using a Gompertz-type model). For seven of these, the last three measurements were excluded prior to FIGURE 7 | Overall prediction accuracy of PLSDA models across time. Binary classification models were calibrated for each measurement time point, using reflectance spectra collected on this date as predictors and the known class label (i.e., "diseased" or "healthy") as response. The performance of these models was evaluated on held out samples of the same date as well as on all plots of the subsequent measurement time points as the overall accuracy of classification. Colored lines track the performance of each date-specific model across time (e.g., the left-most red line represents the performance of the model calibrated with spectra obtained on May 30, 2018, when tested on the same date, and for all subsequent measurement dates). The broken black line indicates a performance of 0.5, i.e., the performance of a random guess of the class label.
modeling their temporal dynamics, as values increased again in later stages. After subset selection, 7 and 13 SVIs were retained as insensitive and sensitive SVIs, respectively, for the key timepoints. For change parameters, 4 and 12 SVIs were retained as insensitive and sensitive SVIs, respectively. In total, 24 distinct SVIs were retained, of which 10 sensitive and 14 insensitive SVIs. From their fitted dynamics, a total of 278 (i.e., 4 insensitive SVIs * 12 sensitive SVIs * 2 parameters + 7 insensitive SVIs * 13 sensitive SVIs * 2 parameters) spectral-temporal features were generated as pairwise combinations of dynamics parameters obtained from sensitive and insensitive SVIs. These features were then used for classification and regression, as described below.

Binary Classification Into Healthy and Diseased Canopies Using Spectral-Temporal Features
A PLSDA model using four components achieved a classification accuracy acc int = 1.00, suggesting correct classification of each experimental plot as healthy or diseased canopy based on spectral-temporal features. In the external validation, the model achieved acc gen = 0.86, thus correctly classifying 304/353 plots as healthy. This is slightly less accurate than the best time-point specific models (Figure 6).

Regression Models to Quantify Disease Severity Using Spectral-Temporal Features
Overall, disease severity predictions from spectral-temporal features were similarly accurate as those obtained from time-point specific models based on reflectance spectra. The lowest RMSE was achieved when no control plots were used as training data using the PLS algorithm (RMSE = 0.068, R 2 = 0.71). Differences in performance among algorithms were smaller than in time-point specific analyses, but the random forest performed relatively poorly also in this case (RMSE = 0.076, R 2 = 0.62). Both tree-based models, and particularly the random forest, produced systematically biased predictions, which was not observable for PLS and ridge regression. In contrast to the time-point specific analyses, validation on the largely disease-free plots of the GABI panel suggested the necessity to include the control plots in the training data in order to obtain accurate predictions for low levels of disease, except for ridge regression. When these were included, treebased models produced good estimates of the low diseaselevels, whereas PLS and ridge regression still predicted disease severity of >0.05 in a significant number of plots. The inclusion of the control plots resulted in a lower systematic bias of the tree-based predictions, while only marginally decreasing model performance (RMSE adj = 0.074 and RMSE adj = 0.076 for cubist and random forest, respectively; Figure 9A). Importantly, the inclusion of the control plots also strongly reduced or eliminated spatial patterns in predictions of the GABI panel, except for ridge regression ( Figure 9B). Thus, cubist regression seemed to perform best when taking all evaluated aspects of model performance into account.

Feature Selection and Validation
Feature selection was performed to identify the most important spectral-temporal features and to estimate the benefit of adding additional features. The difference between M derived from the modified chlorophyll absorption ratio index (MCARI2) and the SIPI was consistently the most informative spectral-temporal feature (Supplementary Table 1). The MCARI2 was designed to estimate green leaf area index in crop canopies, whereas the SIPI measures pigment concentrations and ratios in leaves. This feature was retained as the last in all 30 resamples by the random forest and in 29 resamples by cubist. Following features had much increased ranks. The most influential features were all based on the M parameter of the Gompertz model, while other parameters were clearly less important. In particular, differences in change parameters did not seem to be informative of disease severity. Most selected features used the SIPI, R780/R740, and PRInorm indices as STB-insensitive index, even though they seemed to be somewhat more affected by the presence of disease than the flowering intensity index (FII) on average ( Table 3). There was little evidence for the existence of complementary information among the spectral-temporal features, as model performance was affected little by the sequential removal of features (Figure 10). However, the small sample size resulted in very high variance of the performance estimates obtained from the test set and contrasting patterns between the crossvalidated training and the test performance estimates ( Figure  10). The top-selected spectral-temporal feature was found to be informative of STB severity in the separate experiment carried out under contrasting environmental conditions (Pearson r = 0.53, p < 0.001; Figure 11).

Limitations of Time-Point Specific Analyses
Current reflectance-based approaches to high throughput phenotyping of crop diseases under field conditions suffer from a lack of specificity and from insufficient robustness to genotypic diversity and environmental variability (i.e., context specificity). This problem has previously been described in detail with regard to growth stages of the crop, different phases in the pathogenesis, and the presence of other stresses (Zhang et al., 2012;Devadas et al., 2015;Zheng et al., 2019).
Our results prominently illustrate context-specificity of the relationship between spectral reflectance and disease. Firstly, variation on a specific date in potentially disease-sensitive spectral features, such as the SIPI (see Bajwa et al., 2017;Yu et al., 2018), is quickly overridden by variation caused by advancing phenology (Figure 4C), illustrating the difficulty in defining thresholds or calibration curves. Secondly, unstable VIP values of single wavelengths in PLSDA models, systematic shifts in VIP patterns (Figure 6), and limited model applicability over time, even for the plots contained in the training dataset (Figure 7), illustrate marked short-term changes in the relationship between STB and spectral reflectance. Thirdly, the decreased classification accuracy on independent test plots (Figure 6) indicates context-specificity related to the effect of genotypes and, possibly, field heterogeneity.
There was a short period during late stay-green when classification models were transferable between time-points to FIGURE 9 | (A) Predicted vs. observed Septoria tritici blotch (STB) severity levels of 36 artificially inoculated wheat plots. Mean and standard error of predictions are shown. Data from all experimental plots (n = 72) was used to tune/train the model, but reported performance estimates are based only on artificially inoculated plots (n = 36) in order to avoid overly optimistic performance estimates resulting from a good prediction of disease severity in control plots. STB severity was measured on flag leaves using a combination of visual incidence scorings and scans of flag leaves exhibiting disease symptoms. Predictions were obtained from a cubist regression model based on spectral-temporal features for the same 36 plots and 36 non-inoculated control plots sown with the same genotypes. The broken red line represents the 1:1 line, the blue line represents the least squares line of the linear regression of predicted vs. observed values, and the gray area represents the 95% confidence interval of the least squares line. (B) Spatial distribution of predicted STB severity levels of ~360 largely disease-free plots of the GABI wheat panel, grown next to the plots used as training dataset. White fields correspond to the plots contained in the training dataset. some extent (Figure 6). This can be explained by the relatively synchronized appearance of moderate to high levels of STB in front of the relatively stable background of a stay-green canopy. In this intermittent phase, the signal caused by STB is strong compared to the noise caused by genotypic diversity and in-field measurements (see also Figure 4A). Nonetheless, the regression model for STBseverity based on reflectance spectra is still context-specific, as reflectance in the NIR (used as predictors) gradually decreases during the stay-green and senescence phase irrespective of the presence of STB ( Figure 4A). NIR reflectance is also strongly affected by genotype morphology, canopy 3-D structure, and canopy cover (Jacquemoud et al., 2009;Gutierrez et al., 2015) and is therefore not specific to STB if analyzed on a particular point in time. In addition, time-point specific models highlight the potential of detecting STB in different phases, using different spectral features. This potential would be left unused if only a short period would be targeted.

Potential of Temporal Changes in Reflectance to Detect and Quantify STB
Due to the strong limitations of models based on reflectance spectra, we evaluated the potential of exploiting temporal changes in reflectance for disease detection and quantification instead. Models based on spectral-temporal features were characterized by a somewhat lower performance compared with models trained on reflectance spectra of a specific time-point (Figure 7,  Figure 8). Nevertheless, classification accuracies were similar to the time-point specific PLSDA models, and regression models suggested that spectral-temporal features were also informative of disease severity. This is encouraging, particularly given the strongly contrasting morphological, canopy structural, and staygreen properties of the genotypes comprised in the experiment.

Selected Spectral Indices and Resulting Spectral-Temporal Features
Even though high levels of STB developed during the stay-green phase in most artificially inoculated plots (Figure 4), several SVIs could be identified which displayed similar temporal patterns across treatments (Table 3). In particular, the FII (Stuckens et al.,FIGURE 11 | STB severity as measured by the percent leaf area covered by lesions (PLACL) plotted against the spectral-temporal feature identified as most predictive of STB severity (the difference between t 50 extracted from the MCARI2 and SIPI spectral vegetation indices, expressed in growing degree days). The Pearson product moment correlation coefficient and the p-value of the linear correlation are based on 592 experimental plots. Data was collected in a separate experiment conducted in 2016 without artificial inoculation. In this experiment, 330 genotypes were grown in two replicates, but some plots were excluded from the analysis due to heavy lodging. 2011), i.e., the normalized difference of R475 and R365, was found to be almost unaffected by STB (Figure 2A). In a previous study, we found that early physiological senescence of wheat canopies results in only (proportionally) small increases in reflectance at wavelengths shorter than 500 nm (unpublished data). Strong increases were observable only toward later stages of senescence. The observed insensitivity of the FII to STB likely results from the fact that STB affects only parts of the vegetation, initially mostly lower leaf layers, while significant amounts of healthy green tissue remain. Thus, FII values should change significantly only with the onset of rapid apical senescence, encompassing a generalized loss of green leaf area. It has been suggested that STB does not accelerate or anticipate apical senescence under a range of environmental conditions (Bancal et al., 2016). This is in line with the observed insensitivity of the FII to STB. Interestingly, the dynamic pattern of the SIPI (Penuelas et al., 1995) was also found to be highly insensitive to STB. In contrast, this SVI was previously suggested as a potential surrogate for crop disease under field conditions (Bajwa et al., 2017;Yu et al., 2018). This index was developed at the leaf scale to maximize sensitivity to the ratio between carotenoid and chlorophyll a concentrations (Car/Chl a ratio), while minimizing the effect of leaf surface and mesophyll structure (Penuelas et al., 1995). Provided the principles underlying the SIPI hold also for canopy level reflectance, a low sensitivity of the dynamic pattern to the presence of STB would be expected, as there seems to be no reason to expect a significant change at canopy level of the Car/Chl a ratio due to STB. STB causes the appearance of localized necrotic lesions; however, a general increase in the Car/Chl a ratio is not expected, unless STB accelerates or anticipates apical senescence, which does not seem to be the case (Bancal et al., 2016). The PRInorm was also among the most STB-insensitive SVIs. This SVI is based on the photochemical reflectance index (PRI), initially employed to measure changes in the relative levels of pigments in the xanthophyll cycle (Gamon et al., 1992). Over larger temporal scales, the PRI was shown to be strongly responsive to the Car/ Chl ratio (Sims and Gamon, 2002). Zarco-Tejada et al. (2013) modified this SVI to decrease the effect of reduced canopy leaf area resulting from water stress. Thus, its insensitivity to STB can probably be explained in an analogous manner as for the SIPI. The temporal patterns of water-sensitive SVIs such as the water index (WI; Peñuelas and Inoue, 1999) and the normalized difference water index (NDWI; Gao, 1996) were found to be highly sensitive to STB. Similarly, the disease water stress index (DSWI; Apan et al., 2004), which uses information from the water-sensitive SWIR and the NIR, was strongly affected in its temporal dynamics. In particular, water sensitive SVIs decreased much earlier for inoculated than for control plots, and the decrease occurred more gradually than in healthy plots (data not shown). This is in line with findings by Yu et al. (2018), who reported both the WI and NDWI to discriminate best between STB-diseased and healthy canopies in early stages of disease development. Several SVIs using reflectance in the RE and NIR also showed strongly contrasting temporal patterns [e.g., DSWI, normalized difference vegetation index, plant senescence reflectance index (PSRI), and VI700]. Similar to the SIPI, the PSRI (Merzlyak et al., 1999) is highly sensitive to the Car/Chl ratio at the leaf level.
However, NIR reflectance is used to normalize the difference between R677 and R500. It seems highly questionable whether the PSRI is particularly sensitive to the Car/Chl ratio in diverse germplasm at the canopy level. Variation in the PSRI seems to arise primarily from differences in canopy structure among genotypes and from canopy structural changes over time (Anderegg et al., 2019, submitted). The modified chlorophyll absorption ratio index (MCARI2; Haboudane et al., 2004), sensitive to green leaf area index, also showed strongly contrasting dynamic patterns ( Figure  2B). The healthy index (HI; Mahlein et al., 2013) was developed to separate healthy sugar beet leaf tissue from tissues affected by various foliar diseases. The prominent use of the RE by this index suggests that in our case, HI values are mostly driven by canopy structure and to a lesser extent chlorophyll absorption. Overall, our results thus suggest that SVIs sensitive to leaf internal structure and canopy structure are strongly affected by the presence of STB. This effect has been previously described for various patho-systems (e.g., Zhang et al., 2012;Yu et al., 2018;Zheng et al., 2019).
In summary, it seems that many of the derived spectral-temporal features can be interpreted as robust measures of STB-induced temporal changes in leaf internal structure, canopy structural parameters, and canopy water content. These are obtained by normalizing the temporal dynamics of corresponding SVIs via the estimation of temporal changes in pigment ratios and reflectance at short wavelengths centered around 465 nm, likely representing physiological apical senescence. Thus, spectral-temporal features seem to well represent our hypothesis H 2 , as STB-affected plant and canopy traits are expressed relative to phenology-related traits.

Robustness of Spectral-Temporal Features
As far as quantifiable in the framework of this experiment, models based on spectral-temporal features were robust to variation in genotype morphology, phenology, canopy cover, and canopy 3D structure as well as genotype-specific temporal changes thereof. Accurate predictions were obtained also for 330 diverse genotypes comprised in the GABI panel and grown in a large field experiment, suggesting robustness of the method to varying growing conditions arising from field heterogeneity ( Figure 9B). The most predictive feature was also among the best predictors of STB severity in a different experiment. The observed correlation is very similar to the correlations reported by Yu et al. (2018) for simple SVI measured at individual time points, however with the advantages discussed throughout the manuscript. The results of this final validation likely underestimate the power of the developed approach for several reasons as described in section 2.4.5 but offer evidence for the transferability of our results across sites, environments, genotypes, and agricultural practices.
Results from feature selection suggested that a single spectraltemporal feature (i.e., the difference between M derived from MCARI2 and SIPI), relating structural changes in leaves and canopies to senescence-induced changes in pigment composition, was sufficient to achieve the performance illustrated above. Other stresses occurring during grain filling such as terminal drought stress and nitrogen shortages are likely to result in a similar decrease in green leaf area index. However, these stresses are also known to accelerate physiological senescence (Martre et al., 2006;Bogard et al., 2011;Distelfeld et al., 2014). Therefore, we speculate that the developed models may be moderately robust against the effect of common other stresses despite their simplicity. Yet, we conclude that our hypothesis H 3 (i.e., that the combination of several spectral-temporal features representing the unique sequence and dynamics of separate events during pathogenesis could increase the specificity of the method) remains to be confirmed in larger experiments including other stress factors. In particular, other diseases causing similar symptoms and prevalent in the same developmental stage of the crop may have similar effects on the temporal evolution of hyperspectral reflectance. Large multifactor experiments will be required to judge the potential of the proposed approach to detect, quantify, and delineate individual necrotrophic foliar diseases. Finally, it should be noted that some effects of fungicide applications on canopy reflectance characteristics cannot be excluded in our experiment. However, a fungicide formulation without greening effect was used for the last treatment at BBCH 65. It seems unlikely that this or earlier fungicide applications significantly affected the temporal dynamics of canopy reflectance.

Multiple Spectral Vegetation Indices to Exploit Temporal Dynamics in Reflectance
A key component of the proposed approach consists in summarizing hyperspectral data in terms of multiple SVIs and modeling of their temporal dynamics. Though this may result in the loss of some relevant information contained in reflectance spectra (Pauli et al., 2016), the use of SVIs presented a number of advantages here: i) noise in temporal patterns was much reduced compared to reflectance values at single wavelengths, facilitating the fitting of parametric models; ii) the inevitable subset selection step preceding feature combination could be based on objective criteria related to the form and purpose of SVIs; iii) many of the used SVIs were designed specifically to maximize responsiveness to certain vegetation properties while minimizing the effect of common confounding factors, which is likely to also increase the robustness of derived spectral-temporal features (see e.g., Penuelas et al., 1995;Haboudane et al., 2004); and finally, iv) the procedure results in a summary of the hyperspectral dataset that is interpretable in terms of plant physiology and canopy characteristics, which also holds true for derived spectral-temporal features. Fitting parametric models to scaled SVI values may smooth out measurement errors related to single measurement dates, resulting for example from varying sun angle at measurement or short-term variation in illumination conditions. Thus, scaling SVI values and modeling their temporal dynamics reduces the effect of confounding factors on initial reflectance spectra and minimizes the effect of errors related to single measurements in the series.

Context and Scope
In this study, we used a non-imaging spectroradiometer and manual feature engineering for disease detection and quantification. A high spatial resolution of imaging sensors has been deemed critical for disease detection, identification, and quantification by others (Mahlein et al., 2010;Mahlein et al., 2012;Mahlein, 2016). The high potential of 2-D information in combination with deep learning methods for disease identification has been demonstrated recently (Mohanty et al., 2016;Fuentes et al., 2017). However, changes in spectral reflectance over time have also been shown to be highly informative at the leaf level (Mahlein et al., 2010;Wahabzada et al., 2015). To make use of the spatial and temporal dimensions under field conditions, individual lesions would arguably have to be tracked across time. Some solutions to this problem have been presented for close-range hyperspectral measurements . However, similar solutions at the canopy level may be technically extremely challenging to implement and require extensive studies due to problems in tracking individual pixels or organs over time and in obtaining a clean spectral signal from objects with varying orientation. Existing approaches to make use of spectral, spatial, and temporal information rely on automated and data-driven extraction of characteristic spectral features for diseased plants under controlled conditions (Wahabzada et al., 2015;Wahabzada et al., 2016;Thomas et al., 2018). Here, promising results were achieved using a non-imaging sensor and manual feature extraction. This highlights that an improved understanding of potential confounding factors arising under field conditions may equally boost the potential of remote sensing methods for applications in crop breeding.
We developed and validated the presented method to facilitate robust in-field detection and quantification of STB. However, the underlying concepts should be transferrable to different problems, such as the detection and quantification of other foliar diseases. Several features of the proposed approach (e.g., exploiting plot-based relative changes in reflectance over time, combining sensitive and insensitive features, or the SVI-based parameterization of temporal dynamics) may also be valuable in quantifying other breeding-relevant traits, such as the timing and dynamics of nitrogen remobilization.

CONCLUSION
Here, we tested the possibility to detect and quantify STB relying exclusively on relative changes in spectral reflectance over time, which is expected to minimize confounding effects on spectral reflectance arising from genotypic diversity and environmental conditions. Our results demonstrated the feasibility of the proposed approach and suggested that resulting models were robust against variation in several common nuisance factors.
Specifically, it appears that the temporal dynamics in green leaf area index when set in relation to the dynamics of physiological apical senescence is highly indicative of the presence of STB infections and of STB severity. Time-resolved measurements of the MCARI2 and the SIPI SVIs could allow to assess these traits at very high throughput, facilitating time-resolved large-scale screenings of breeding nurseries.
Larger calibration experiments will offer the opportunity to evaluate the inclusion of additional spectral-temporal features that better capture relevant information in different phases of pathogenesis. This is likely to improve sensitivity and specificity of resulting models, which should also be tested in more detail.
Furthermore, the evaluation of the scalability to unmanned aerial vehicles will represent a crucial step toward application of such methods in breeding programs.

AUTHOR CONTRIBUTIONS
All authors planned the experiment. AH created the experimental design. PK prepared and carried out pathogen inoculation. AM and PK designed the STB assessment strategy and supervised STB scorings, collections and scanning of leaves. JA performed reflectance measurements. JA performed leaf scan analysis, conceived the proposed data analysis approach, analyzed the data and drafted the manuscript. AM and AH supervised the project. All authors contributed to, read and approved the final version of the manuscript.

FUNDING
The project was partially funded by the Swiss Federal Office of Agriculture (FOAG) and by the Swiss National Science Foundation through Ambizione grant PZ00P3_161453.