Combining NDVI and Bacterial Blight Score to Predict Grain Yield in Field Pea

Field pea is the most commonly grown temperate pulse crop, with close to 15 million tons produced globally in 2020. Varieties improved through breeding are important to ensure ongoing improvements in yield and disease resistance. Genomic selection (GS) is a modern breeding approach that could substantially improve the rate of genetic gain for grain yield, and its deployment depends on the prediction accuracy (PA) that can be achieved. In our study, four yield trials representing breeding lines' advancement stages of the breeding program (S0, S1, S2, and S3) were assessed with grain yield, aerial high-throughput phenotyping (normalized difference vegetation index, NDVI), and bacterial blight disease scores (BBSC). Low-to-moderate broad-sense heritability (0.31–0.71) and narrow-sense heritability (0.13–0.71) were observed, as the estimated additive and non-additive genetic components for the three traits varied with the different models fitted. The genetic correlations among the three traits were high, particularly in the S0–S2 stages. NDVI and BBSC were combined to investigate the PA for grain yield by univariate and multivariate GS models, and multivariate models showed higher PA than univariate models in both cross-validation and forward prediction methods. A 6–50% improvement in PA was achieved when multivariate models were deployed. The highest PA was indicated in the forward prediction scenario when the training population consisted of early generation breeding stages with the multivariate models. Both NDVI and BBSC are commonly used traits that could be measured in the early growth stage; however, our study suggested that NDVI is a more useful trait to predict grain yield with high accuracy in the field pea breeding program, especially in diseased trials, through its incorporation into multivariate models.


INTRODUCTION
Field pea or dry pea (Pisum sativum L.) is an annual pulse crop widely grown in all major cropping regions, including North America, Russia, India, Australia, etc. (Zohary, 1999), with global production of 15 million tons in FAOSTAT (2020). It is an important staple legume with high nutritional value and high protein content, as well as being able to be harvested as hay or used for forage in poor seasons (Burstin et al., 2011;Amarakoon et al., 2012).
In low-input farming systems, field pea is grown as a significant rotational crop, providing benefits of improved soil fertility, a disease break, and giving better options for weed control (Pritchard, 2015;Powers and Thavarajah, 2019). More recently, field pea has been used as a protein source for meat substitution products in the food manufacturing industry due to its low price and lower allergenic nature (Bashi et al., 2019).
Field pea is a cool-season crop, and grain yield can vary dramatically, from 0.5-1 to 4-5 t/ha depending on environmental conditions (Smýkal et al., 2012). Conventional breeding has improved the field pea grain yield by improving desirable traits, such as early vigor, flowering time, and plant type, and by deploying diverse genetic resources (Lejeune-Henaut et al., 2008;Sadras et al., 2012;Singh and Srivastava, 2015;Coyne et al., 2020). Molecular markers have been identified in gene or QTL mapping and association studies for important agronomical traits, although such studies are limited compared with cereal crops (Cheng et al., 2015;Tayeh et al., 2015a;Kreplak et al., 2019). Marker-assisted breeding approaches that have the potential to improve productivity have been outlined (Tayeh et al., 2015b;Pandey et al., 2021).
Diseases are the major constraint to yield in all crop species, and breeding for resistance is a valuable aim. In field pea, grain yield has been shown to be heavily impacted by biotic stresses, such as rust and Ascochyta blight (Bretag et al., 2006;Rai et al., 2011). In recent years, bacterial blight (Pseudomonas syringae pv. pisi Sackett and Pseudomonas syringae pv. syringae van Hall ) has become one of the most important diseases in Australia's field pea production (Hollaway et al., 2007). If initiated in the early growth stages, it can cause up to 60% yield loss of susceptible varieties under conducive conditions (Hollaway et al., 2007).
Field-based assessments of disease and yield in variety trials are often challenging due to the subjective nature of measurements. Remote sensing technologies have the potential to objectively assess traits in the field. Sensors deployed in an unmanned aerial system (UAS) have the capacity to become highthroughput platforms (HTP) that can assess many thousands of lines in a short period of time (Huang et al., 2020). Normalized difference vegetation index (NDVI) generated from a highthroughput phenotyping platform has been used to study its association with disease resistance in chickpea , plant height in maize (Han et al., 2019), and phenology in rice . In field pea, NDVI has been used in the study of lodging (Quiros Vargas et al., 2019), and high correlation of around 0.83 was observed between NDVI and yield .
Genomic selection (GS) is a modern breeding method that combines genotypes and phenotypes of a training population to predict breeding values in genotyped but not phenotyped individuals by using appropriate statistical models. The breeding values can then be used for selection in the breeding process (Meuwissen et al., 2001). It is assumed that causal variants underlying a trait are in linkage disequilibrium (LD) with at least one marker when using high-density genome-wide molecular markers. Therefore, GS could account for all causal variants and, theoretically, is expected to outperform pedigree or markerassisted selection (Goddard and Hayes, 2007). GS has been widely studied in cereal crops Robertsen et al., 2019). It has been tested in a diverse pea collection, and high prediction accuracy (PA) was achieved for thousand seed weights (Burstin et al., 2015). Using the recombinant inbred line (RIL) populations, PA for grain yield ranged from 0.19 to 0.30 for cross-validation between populations and within populations (Annicchiarico et al., 2019).
The univariate or single-trait genomic prediction model is the most common method used in crop GS (de Los Campos et al., 2013;Wang et al., 2018;Zhao et al., 2021). However, additional information in genetically correlated traits can be exploited to improve prediction accuracies via multivariate models. Such approaches have been investigated in wheat, rice, and soybean (Jia and Jannink, 2012;Bao et al., 2015;Wang et al., 2017;Gill et al., 2021).
In Australia, the field pea breeding program started in the 1970s (Hawthorne et al., 2003), and its production currently averages about 250,000 tons per year, equating to 10-15% of the annual Australian pulse crops. The growing area of field pea covers Victoria, South Australia, Western Australia, and New South Wales (GRDC, 2018). As an important cash crop, incorporating GS to improve grain yield in the future field pea breeding program is critical. The aims of this study were to (1) estimate variance components and heritability for yield, NDVI, and bacterial blight scores (BBSC); (2) calculate the genetic correlation among the three traits; and (3) assess the genomic PA for grain yield by combining NDVI and BBSC with the multivariate models under cross-validation and forward independent prediction scenarios. This study was established to evaluate the potential to predict grain yield for field pea using phenotypic traits collected through HTP field phenotyping and disease scores, and to facilitate the implementation of GS in the field pea breeding program.

Plant Material and Field Trials
In 2018, the field pea breeding program had four breeding stages, represented as stages 0 (S0), 1 (S1), 2 (S2), and 3 (S3), which had decreasing numbers of entries due to the artificial selection conducted in the breeding cycle. Breeding lines in each cycle were selected from the respective previous staged breeding lines from 2017, i.e., S3 lines were selected from the 2017 S2 lines. The S0 stage consisted of 960 newly generated lines, with S1, S2, and S3 trials having 420, 240, and 144 breeding lines, respectively. Pedigree of all breeding lines are included in the two replicates, apart from the S0 trial which was partially replicated. The plots in all trials were arranged in 12 columns, with each plot being 5 m long and 1.25 m wide with five rows spaced at 0.25 m each. Visual scores of BBSC were recorded on a 1-10 scale from each plot when disease infection was at its peak (pod filling stage). Briefly, score 1 represented a plot that had 0-10% plants affected by the disease, 2 represented a plot with 10-20% plants affected by the disease, and so on. Plots were machine-harvested, and grain weights were recorded. The plot weights were converted to yield (tons/ha) using a plot width of 1.75 m. Weather data during the growing season were obtained from a weather station installed at the site (Supplementary Figure 1). Chemical applications were according to local farmer practice.
Aerial Image Acquisition, Processing, and Analysis The aerial images were acquired when disease infection was at maximum, at the early pod filling stage, using a multispectral Parrot Sequoia camera (Parrot S. A., Paris, France) attached to a UAS. The camera had four discrete channels, namely, red (660 nm), green (550 nm), red-edge (735 nm), and near-infra-red (790 nm), and an RGB sensor. The application "Tower" was used to automate the flight mission with a front and side overlap of 75% each, at a speed of 5 m/s, and conducted at 40 m above ground level to estimate NDVI. These flight parameters delivered high-resolution orthomosaics with a ground sampling distance of 2.69 cm. The acquired images were processed using the software Pix4D mapper Pro (Pix4D, Lausanne, Switzerland) to generate high-resolution orthomosaics and NDVI maps (Gebremedhin et al., 2020). In brief, the images were imported into the Pix4D software and radiometrically corrected using a known reflectance source, the Airinov calibration plate (Airinov, Paris, France), to generate orthomosaics and NDVI TIFF images. These TIFF files were imported into ArcGIS Pro version 2.1 (https://www. arcgis.com/) to estimate plot level NDVI values. The rectangular polygons with a unique number (plot number) were created and overlaid on the NDVI orthomosaics image to extract plot level average NDVI values for each trial.

Genotyping
Eight seeds from each line of field pea were germinated, leaf tissue samples of five seedlings were pooled, and total RNA was extracted using RNeasy R 96 Kit (Qiagen, Hilden, Germany). RNA-Seq libraries were prepared using SureSelect stranded RNA library preparation kit (Agilent Technologies, Santa Clara, CA) following the manufacturer's instructions (Malmberg et al., 2018), and sequencing was performed on either the HiSeq 3000 or Novaseq system (Illumina Inc., San Diego, USA) to a depth of 4-5 million paired reads per sample. The sequence reads were filtered and aligned to the field pea reference genome, Pisum_sativum_v1a.fa of cultivar Cameor (Kreplak et al., 2019). Variant calling was performed using GATK (Van der Auwera et al., 2013), and SNP filtering was performed with read depth (DP ≥ 5), missing rate (<60%), and base quality (Q30). Genotype imputation was performed using the Beagle package (Browning and Browning, 2007). A total of 1,453 lines with 47,352 SNPs (40% heterozygosity) were used for genomic prediction. The genomic relationship matrix (GRM) was calculated for all lines according to VanRaden (2008). Each breeding population GRM was extracted from the combined GRM. A heatmap sorted by hierarchical clustering based on the GRM in each breeding cycle was combined and plotted in R. The principal component analysis (PCA) based on combined GRM was performed to visualize cluster information across the breeding cycles.

Statistical Analysis
All statistical analyses were conducted in ASReml (Gilmour et al., 2015), and variance components were estimated with implemented restricted maximum likelihood (REML) method.

Univariate Models
The general form of the univariate linear mixed model was as follows: where y is a vector of phenotypic values for each line, and b is a vector of fixed effects, including the mean and replications. For the basic best linear unbiased prediction (BLUP) model, g is a vector of random genetic effects of lines ∼ N (0, Iσ 2 G ), I is an identity matrix, and σ 2 G is the genetic variance due to lines; for the additive genetic model with pedigree information, g is a vector of random additive genetic effects explained by pedigree, and I is replaced with A, a numerator relationship matrix based on pedigree, referred to as ABLUP; with G, a genomic relationship matrix calculated from SNP markers, referred as GBLUP; with fitting A and G simultaneously, referred as AGBLUP. σ 2 a is the additive genetic variance captured by pedigree; σ 2 g is the additive genetic variance explained by markers; r and c are the vectors of random field design effects for row and column; ε includes the independent measurement residual ∼ N (0, Iσ 2 e ), and the spatial dependent residual, which includes the row and column two-dimensional covariance, R = r (p r ) ⊗ c (p c ). X, Z g , Z r , and Z c are the incidence matrices associating phenotypes with fixed and random effects of b, g, r, and c, respectively. The nonadditive genetic component was also estimated by fitting line effects in addition to pedigree and marker effects in model 1. We assumed that the variance explained by pedigree and marker effects were associated with additive effects, and the remaining genetic variance explained by lines was due to non-additive genetic effects.
Best linear unbiased estimates (BLUE) were calculated using a univariate model similar to the BLUP model, but the breeding lines were fitted as fixed effects. BLUEs were used as adjusted phenotypic data to calculate the genomic PA in different models (Supplementary Table 2).

Multivariate Models
The multivariate model with three traits, also referred to as the BLUP multivariate model, was illustrated as follows: where the model terms for traits 1, 2, and 3 were similar to the previously described univariate model. In this model,   is the independent residual variance and covariance matrix for three traits as described for the univariate model. I is the incidence matrix. When I was replaced with G, the model was referred to GBLUP multivariate model.

Heritability and Genetic Correlation
Broad-sense heritability (H 2 ) for each trait was calculated as the proportion of phenotypic variance explained by genetic variance (σ 2 G ), which was estimated with univariate models; where σ 2 G is the genetic variance of lines, σ 2 e is the random independent residual error variance, and rep is the trial replicates number (Holland et al., 2002).
Narrow-sense heritability (h 2 ) was represented as h a 2 , narrowsense heritability captured by pedigree; h g 2 , narrow-sense heritability captured by markers; h a+g 2 , narrow-sense heritability captured by pedigree and markers, and they were estimated as follows: where σ 2 a is the additive genetic variance explained by pedigree (A), and σ 2 g is the additive genetic variance explained by makers (G). When non-additive genetic effects were decomposed, narrow-sense heritability could be estimated the same way but the variance of line effects was added to the total phenotypic variance.
Genetic correlation (r G ) between traits was calculated using the variance components estimated from BLUP multivariate models in four breeding populations, where σ 2 Gi and σ 2 Gj are the total genetic variance of traits i and j, and σ 2 Gij is the total genetic covariance between traits i and j.
In GBLUP multivariate model, the additive genetic correlation (r g ) between traits was calculated in the same way but with additive genetic variance, and covariance of traits was based on G.

Genomic Prediction and Validation
The genomic prediction was evaluated for grain yield with the GBLUP univariate model and GBLUP multivariate models, including fitting both the NDVI and yield together as a bivariate model and fitting NDVI, BBSC, and yield as a multivariate model. The PA was defined as Pearson's correlation coefficient between genomic estimated breeding values (GEBVs) and BLUEs. In the cross-validation method, all lines within each breeding cycle were randomly divided into five equal subsets. Each subset was, in turn, chosen as the validation set and was subsequently predicted by using the other four subsets as the training set. The crossvalidation process was repeated five times, and mean PA and FIGURE 1 | Box plots of grain yield, NDVI, and BBSC across the breeding stages (S0-S3).
Frontiers in Plant Science | www.frontiersin.org standard deviation (SD) were calculated across all 25 validation sets. In the forward prediction method, the training set consisted of one or multiple breeding stages previous to the validation breeding stage. A total of six scenarios were tested using GBLUP models, including using traits in the S0 to independently predict S1, S2, and S3; combining S0 and S1 to predict S2 and S3; and combining S0, S1, and S2 to predict S3.

Summary Statistics of the Traits
Summary statistics of grain yield, NDVI, and BBSC for the four breeding stages are shown in Figure 1. The average grain yield varied across the stages. S2 had the lowest average grain yield, while the other three stages had average grain yields of ≈2.5 t/ha. The average NDVI was consistent, and S3 showed the lowest NDVI. The average BBSC varied dramatically across stages. S1 and S3 showed lower BBSC than S0 and S2.

Population Structure
In total, 1,453 breeding lines with 47,352 SNPs were used to calculate the GRM. With all stages combined, three groups could be observed according to the dendrogram with different levels of relatedness among lines within and between the groups (Supplementary Figure 2). When the GRM was ordered according to breeding stages, highly related lines across stages were observed (Figure 2). S0 was the largest breeding stage and had 738 lines with genotypic data. Two main subgroups of closely related individuals within S0 had close relatives in S1, S2, and S3 (Figure 2, Supplementary Figure 3). There were 356 genotyped lines in S1, and some of them had close relationships with each other. One of the highly related S1 subgroups had close relatives in S2 and S3 but not in S0 (Figure 2,  Supplementary Figure 3). In S2 and S3, 223 and 136 lines were genotyped, respectively. The heatmaps for the subset of the GRM for each stage clearly showed more highly related lines within S3 than S1 (Supplementary Figure 3). According to PCA based on the GRM, the first three PCs explained nearly 70% of the total variance, and the plots for 1st, 2nd, and 3rd PCs further showed that all stages of the breeding lines generally overlapped (Supplementary Figure 4).

Variance Components and Heritability
Estimating variance components from different breeding stages will help us understand the breeding population, especially when selection is conducted at each breeding cycle. The genetic parameters estimated for each trait would help us to observe the traits' genetic architecture. Overall, the non-additive genetic variance was observed for all three traits, but its proportion varied with models fitted and varied across breeding stages. The estimated h 2 was reduced in most cases for all traits when including non-additive genetic components while the estimated error variances were not reduced. The proportion of total genetic variation explained by additive genetic effects, represented as the estimated h g 2 (by marker), was comparable to h a 2 (by pedigree) when the non-additive genetic variance was not included.
When combining all breeding stages together, the estimated broad-sense heritability (H 2 ) was 0.58, 0.47, and 0.38 for grain yield, NDVI, and BBSC, respectively ( Table 1). The estimated narrow-sense heritability (h 2 , represented as h a 2 , h g 2 , and h (a+g) 2 ) was lower than H 2 for combined stages and each breeding stage except for NDVI. The estimated h 2 for grain yield ranged from 0.33 to 0.52, and the estimates were reduced and ranged from 0.25 to 0.46 when line effects were included in FIGURE 2 | Heatmap of the genomic relationship matrix (GRM) for field pea breeding stages represented with differed colors (S0_pink, S1_red, S2_orange, and S3_yellow). The colors within the GRM indicate the degree of relatedness between breeding lines (high relationships are shown in green, and low relationships are shown in blue).
Frontiers in Plant Science | www.frontiersin.org   the model for stages 0-2. Stage 3 showed higher estimated h 2 when line effects were not included. The estimated h 2 value for NDVI was moderate, ranging from 0.33 to 0.50 without including line effects. Including line effects in the model reduced the h 2 dramatically in S1. S3 showed the highest estimated h 2 compared with other stages. The estimated h 2 for BBSC was not reduced much when including and excluding line effects for S0 to S2, ranging from 0.26 to 0.4; however, the h 2 estimated for S3 was lower than that in other stages.

Genetic Correlation
The BLUP and GBLUP multivariate models showed nearly identical correlation patterns between traits ( Table 2). In S0, S1, and S2, grain yield showed a strong positive correlation with NDVI both in the BLUP (0.75-0.89) and GBLUP (0.78-0.85), and a strong negative correlation with BBSC in the BLUP (−0.72 to −0.76) and in the GBLUP (−0.70 to −0.78). The correlation between NDVI and BBSC was also strong, ranging from −0.84 to −0.94 in the BLUP model and −0.76 to −0.88 in the GBLUP model. In S3, the correlation among traits was lower except for the strong correlation between yield and NDVI in the BLUP model (0.83). The standard errors associated with the estimated correlations were much higher in S3 compared with other stages.

Genomic Prediction Accuracy
Apart from the S3, the PA for grain yield with the cross-validation method was higher in the bivariate model, which combined yield and NDVI compared with univariate models ( Table 3).
The difference in PA was more dramatic between univariate and bivariate models compared with bivariate and multivariate models, which combined yield, NDVI, and BBSC. Using the bivariate model, in S0, the PA improved from 0.41 to 0.60, a nearly 50% increase, while in S2, the accuracy improved from 0.39 to 0.49, about a 25% increase. In S1, the PA increased from 0.5 to 0.53, about a 6% increase with the bivariate model. The PA was low in S3, and the highest PA of 0.28 was achieved with the univariate model. The PA was generally lower in the forward validation method compared with cross-validation except for S3 when S0-S2 were used as a training set (Figure 3). Using S0 as a training population to predict the yield in S1, S2, or S3, the bivariate model dramatically increased the PA, about 60% improvement was observed to predict S2. When combining breeding populations, using S0 and S1 to predict S2 or S3, PA was increased in bivariate and multivariate models. When S0, S1, and S2 were combined and used to predict S3, all models showed PA above 0.3, whereas the multivariate model had the highest PA (0.33).

DISCUSSION
Increasing the grain yield in field pea could increase the crop's economic return and attract farmers to grow the crop. GS is a modern breeding approach that could substantially improve the rate of genetic gain for grain yield. Its deployment depends on the PA that can be achieved. Traits that are correlated with grain yield have the potential to improve PA through the incorporation into the GS models. In this study, we investigate combining NDVI and BBSC in multivariate models to improve PA of grain yield in field pea.
The univariate models have been applied in pea GS studies, and PA for seed yield within the population reached 0.3 with FIGURE 3 | The PA for forward prediction scenarios where one or multiple breeding stages were used as the training set to predict the grain yield in the next breeding stage with different models (uni, univariate; bi, bivariate; multi, multivariate). the Bayesian Lasso model and 0.42 with the Reproducing kernel Hilbert space model (Annicchiarico et al., 2019;Bari et al., 2021). Using the GBLUP multivariate model, we achieved higher PA, around 0.5 within the population and above 0.4 when using the forward prediction method for grain yield, supporting that the multivariate model could improve PA in pea genome prediction. We observed close to a 50% PA increase when using bivariate models to predict grain yield, while a study in wheat reported about 70% increase in PA with the multivariate models (Rutkoski et al., 2016). In our study, NDVI and BBSC are both highly correlated with yield in S0-S2 stages. However, the correlation among traits in S3 was low. The reason was not very clear, but the small population size, close relationship between lines, and high inbreeding could impact the correlation pattern. Within S3, the PA of yield was lower in the multivariate model than the univariate model, suggesting that correlations between traits were important in multivariate models to improve PA. Similar conclusions were also observed with the multivariate studies in barley and wheat (Hayes et al., 2017;Bhatta et al., 2020). However, a sorghum GS study showed that high heritability with low correlations could also improve the PA when combined with multivariate models (Velazco et al., 2019). We observed that when we added the third highly correlated trait, BBSC, into the model, PA improvement was limited compared with the bivariate model. This is likely due to the high correlation between NDVI and BBSC and the low heritability of BBSC, resulting in less additional information to further improve the predictions.
The training population size is critical in achieving sound PA, and this can be an issue in smaller breeding programs that are not well resourced. Including the previous generation in the training population is an effective way to increase the PA, and this has been shown in several studies (Auinger et al., 2016;Gill et al., 2021). In the current pea breeding program, the breeding materials were descendants of more than 100 founder lines (Supplementary Table 1). The recycling of highly adapted parents in the crossing program induced the inbred nature of the current breeding population, although Burstin et al. (2015) highlighted the existence of large genetic diversity within the Pisum sativum genepool. All the breeding lines in different breeding stages have gone through different selection cycles. For example, breeding lines in advanced trials such as S2 and S3 were selected based on yield data and phenology, disease resistance, and abiotic stress tolerance from multi-location environment trials. In contrast, breeding lines in early-stage trials such as S0 and S1 were selected based on limited data and came through fewer cycles of selection. When combining early breeding stages to increase the reference population size, the relatedness between lines and diverse selection environments could be beneficial for predicting the later breeding cycle. We observed a higher PA in all the forward prediction scenarios, especially in S3, and was higher than the PA that cross-validation achieved when all the previous stages were combined and used as the training population. This supports the work of Bari et al. (2021), which showed that an increase in the size of the reference population could increase PA and reported similar PA (0.27-0.48) for pea yield.
The heritability of traits is important in GS and could affect the PA as genomic prediction depends on the precisely estimated SNP effects and the proportion of the variance explained by the SNPs (Daetwyler et al., 2008;Goddard, 2009). Narrow sense heritability (h 2 ) can be estimated using the pedigree information, the proportion of variance explained by numerator A matrix (Oakey et al., 2006), or using the molecular marker information, the proportion of variance explained by G matrix (Goddard et al., 2011). Our study estimated variance components and h 2 with ABLUP, GBLUP, and AGBLUP models. The additive genetic variance explained by markers was comparable with the pedigree, indicating that SNPs used in our study could reflect the similarity between lines. However, a proportion of the genetic variance could not be captured by G when comparing the GBLUP model with the AGBLUP model. This could be due to some causal mutations segregated within families or lines, which were not captured by the markers (Khansefid et al., 2014). In our study, we also fitted the non-additive genetic effects in the model, and we observed a reduction in the estimated h 2 . In contrast, the estimated error variance was not reduced with or without the line effects. This indicated that a proportion of variance explained by line could be absorbed as additive genetic variance in the ABLUP and GBLUP models, thereby inflating the estimation of h 2 . Early studies supported those observations (Kumar et al., 2015;Piaskowski et al., 2018). The impact of the non-additive genetic effects on PA needs to be further studied.
Pea grain yield has been associated with rainfalls (annual and growing season rainfall), minimum temperatures early in the season, and maximum temperatures during seed filling. In Australia, ≈70% of the field pea crop is grown in low rainfall (<230 mm) environments (Leonforte et al., 2013). In 2018, the season was relatively dry, receiving only 157 mm of rainfall during the entire growing season. Maximum and minimum temperatures were cooler early in the season, and a substantial number of frost hours (hours below freezing point) were recorded throughout the season (Supplementary Figure 1), encouraging the early season bacterial blight outbreak. Further, the lack of follow-up rains meant that crops did not recover, and substantial bacterial blight losses were observed in the farmer's field. High correlations between yield and BBSC within the breeding population reflected the impact of bacteria blight on grain yield in 2018. This severe bacterial blight epidemic was not seen in other breeding program trials. Although it is a limitation of this study with only 1 year of data in one location, the use of four trials with different population sizes allowed the investigation of traits' correlation and the development of useful prediction equations. Currently, phenotyping of yield and disease traits in the breeding program can be a challenging task due to the time and resource demands and the subjective nature of measurements. As an important auxiliary trait in predicting grain yield, NDVI has been used to predict yield in wheat (Sun et al., 2017;Hassan et al., 2019). Our study revealed a high correlation between NDVI and grain yield, consistent with the recent study in pea observing a similarly high correlation, 0.85, without biotic stresses . High correlations between NDVI and BBSC further support the effectiveness of using NDVI in predicting pea grain yield. Our study indicated that at an early stage, the prediction of grain yield with HTP in field pea is feasible.
In conclusion, this study estimated the heritability for grain yield, NDVI, and BBSC in a field pea breeding program, and it also evaluated the genomic PA for grain yield with univariate and multivariate models. The results showed that heritability for all traits was low to moderate. However, strong genetic correlations were observed among traits apart from S3. In both cross-validation and forward prediction methods, we found that the multivariate model outperformed the univariate models by substantially improving PA. We also confirmed that with a larger training set, a higher PA could be achieved when different breeding stages were included in the training population. Field phenotyping is costly and labor-intensive for large breeding trials. Both NDVI and BBSC are commonly used traits in breeding programs that could be measured in the early growth stage. Our study has shown the potential of adopting NDVI with multivariate models to improve grain yield PA. GS is becoming popular among crop breeders to predict phenotypes, especially yield. We showed that GS holds great potential for improving grain yield in field pea breeding.

DATA AVAILABILITY STATEMENT
The details of the data and results supporting the discussions and conclusions are included in the Supplementary Materials. The genotype data used for this study are deposited in the figshare repository, can be accessed from here: https://doi.org/10.6084/ m9.figshare.19817509.

AUTHOR CONTRIBUTIONS
HZ, BP, SKant, and GR conceived and designed the experiment. BP, SS, SJ, SKant, and Skaur performed and supervised the phenotyping and genotyping. HZ performed the statistical analysis and wrote the draft. BP and MK assisted in the data analysis and interpretation. SKaur and GR supported the study. All authors revised the manuscript.

FUNDING
This study was funded by Agriculture Victoria Research and Grains Research Development Corporation through a research project DAV00153.