Deciphering genetic factors contributing to enhanced resistance against Cercospora leaf blight in soybean (Glycine max L.) using GWAS analysis

Cercospora leaf blight (CLB), caused by Cercospora cf. flagellaris, C. kikuchii, and C. cf. sigesbeckiae, is a significant soybean [Glycine max (L.) Merr.] disease in regions with hot and humid conditions causing yield loss in the United States and Canada. There is limited information regarding resistant soybean cultivars, and there have been marginal efforts to identify the genomic regions underlying resistance to CLB. A Genome-Wide Association Study was conducted using a diverse panel of 460 soybean accessions from maturity groups III to VII to identify the genomic regions associated to the CLB disease. These accessions were evaluated for CLB in different regions of the southeastern United States over 3 years. In total, the study identified 99 Single Nucleotide Polymorphism (SNPs) associated with the disease severity and 85 SNPs associated with disease incidence. Across multiple environments, 47 disease severity SNPs and 23 incidence SNPs were common. Candidate genes within 10 kb of these SNPs were involved in biotic and abiotic stress pathways. This information will contribute to the development of resistant soybean germplasm. Further research is warranted to study the effect of pyramiding desirable genomic regions and investigate the role of identified genes in soybean CLB resistance.


Introduction
Soybean is a major oil-producing crop grown around the world with biotic stresses restricting global production (Hartman et al., 2015).Some major yield reducing diseases in soybean include frogeye leaf spot (caused by Cercospora sojina Hara), Phytophthora root and stem rot (caused by Phytophthora sansomeana E. M. Hansen and P. sojae Kaufm.& Gerd.),Cercospora leaf blight [caused by Cercospora cf.flagellaris Ellis & G. Martin, C. kikuchii (Tak.Matsumoto & Tomoy) M.W. Gardner, C. cf.nicotianae, and C. cf.sigesbeckiae Katsuki], soybean cyst nematode (caused by Heterodera glycines Ichinohe), and charcoal rot [caused by Macrophomina phaseolina (Tassi) Goid.] (Wrather et al., 2010;Allen et al., 2018;Bandara et al., 2020;Bradley et al., 2021).Cercospora leaf blight (CLB) typically develops in the upper canopy, and can be observed on leaves, petioles, stems, and pods, progressing from the upper canopy downward through the plant canopy (Walters, 1980).In addition to CLB producing symptoms on all plant parts, purple seed stain (PSS) can occur on soybean kernels and is caused by the same group of organisms (Alloatti et al., 2015;Albu et al., 2016;Turner et al., 2020).Favorable environmental conditions, such as high relative humidity, prolonged dew period, and warm temperatures, influence the development of CLB (Schuh, 1992, Schuh, 1993).In general, CLB symptoms begin during reproductive stages generally during the beginning seed stage (R5) as purplish bronze-colored necrotic lesions on leaves that vary in size and elongated reddish-purple lesions on petioles (Chanda et al., 2014).
The yield losses that result from CLB are generally related to the severity of disease in the field.Numerous lesions on the leaves can result in heavy premature defoliation, delay soybean plant senescence, and reduce the production, and size of the kernels produced (Walters, 1980).Based on a survey in 2006, global yield losses taken collectively for CLB as well as purple seed stain were approximately 1,912 thousand metric tons (Wrather et al., 2010).In the U.S. at that time, CLB was considered a minor disease due to generally low yield losses.However, more recently it has become more prevalent in the southern U.S. (Cai et al., 2009;Geisler, 2013).From 2015 to 2019, CLB caused estimated yield losses of approximately 969 metric tons from 28 growing states in the U.S. and Ontario, Canada (Bradley et al., 2021).
Multiple fungal species within the genus Cercospora have been associated with CLB (Soares et al., 2015;Albu et al., 2016;Sautua et al., 2020b).Phylogenetic studies have previously been conducted using different molecular markers and DNA sequencing to understand the genetic diversity, variation in pathogenicity, and the presence of different lineages (Imazaki et al., 2006;Cai and Schneider, 2008;Lurá et al., 2011;Soares et al., 2015).A genetic diversity study focusing on 164 C. kikuchii isolates from Louisiana grouped the species in two lineages and observed that isolates in lineage II (older lineage) were more aggressive than the dominating lineage I (newer lineage) (Cai and Schneider, 2008;Cai et al., 2009).Isolates collected from Argentina, Brazil, and the U.S. were observed to contain four lineages with recombination between the second and third lineage and manifesting cryptic speciation of the pathogens associated with CLB (Soares et al., 2015).Understanding the hostpathogen interactions and ecology of these cryptic species makes defining the causal organisms of the disease difficult and adds challenges to the potential development of new monitoring tools aimed at more efficient disease management strategies (Stergiopoulos and Gordon, 2014).
General disease management strategies for CLB include crop rotation, tillage, early planting dates, the use of resistant cultivars, and the application of fungicides during reproductive stages.Fungicide applications have been a primary approach to managing CLB.However, such applications have a chance of producing fungicide-resistant strains and may have negative environmental effects (Price et al., 2015;Sautua et al., 2019;Sautua et al., 2020a).Conversely, planting disease-resistant genotypes is an alternative approach to manage CLB as they reduce fungicide applications and are cost-effective.However, at present CLB-resistant cultivars are not available and as a result soybean producers rely heavily on fungicide products that come from multiple chemical classes to reduce the yield losses associated with CLB (Carmona et al., 2011;Sautua et al., 2020a).
Research has previously been conducted to identify soybean germplasm resistant to CLB and PSS (Orth and Schuh, 1994;Alloatti et al., 2015;Li et al., 2019;Kashiwa et al., 2021;Ward et al., 2021).However, minimal research efforts have been made to identify the genes or the genomic regions that confer resistance to the pathogen causing CLB identified on the vegetative structures of the plant, but

Chr a
Total SNP Chr length Avg.SNP distance b the inheritance of PSS resistance and QTLs have been identified for symptoms related to the seed (Jackson et al., 2008;Alloatti et al., 2015).Genome-Wide Association Studies (GWAS) have been used in many pathosystems to identify genomic regions and genes in the surrounding region of associated markers that can contribute to the trait of interest.GWAS leverages the power of natural diversity in germplasm and captures the historical recombinant event, thus having greater resolution than traditional linkage mapping using biparental population (Yu and Buckler, 2006).Research has been conducted to explore genomic regions that contribute to agronomically essential traits in soybean (www.soybase.org,accessed on 12/15/2023).This study reports the first GWAS study to identify genomic regions conferring resistance to CLB and putative genes that play an essential role in plant defense mechanisms.The study aims to identify genomic regions that can be used for marker-assisted selection in soybean breeding programs that focus on developing CLB resistance germplasm.
2 Materials and methods

Evaluating soybean accessions for CLB
A total of 568 soybean accessions (Supplementary Table S1) with maturity groups ranging from III to VII were obtained from the USDA germplasm collection and evaluated for CLB for 3 years (2016, 2017, and 2018) at different locations in the mid-southern U.S. Soybean accessions were planted in 21 environments: Five locations in 2016 (Fayetteville, Marianna, and Stuttgart, AR; Bossier City, LA; and Stoneville, MS) and eight locations in 2017 and 2018 (Fayetteville, Keiser, Rohwer, and Stuttgart, AR; Alexandria and Bossier City, LA; Portageville, MO; and Stoneville, MS).Planting was done during mid-June, but generally differed at each location to maximize infection and CLB symptom expression.Seed was planted in a plot represented by a single row of each accession measuring approximately 3 m in length, and the distance between each row was approximately 100 cm; however, this varied by location and ranged from 96.5 to 101.6 cm across the study locations.Recommended agronomic practices for irrigation and fertilization were conducted as well as weed and insect management.
We depend on natural inoculum because it was not practical to produce large quantities of conidia for this project.We carefully evaluated the accessions for their response to CLB during seed growth stages, especially between R5 and R6.We assessed each accession about three times during these critical growth stages.In 2016 and 2017, accessions were evaluated for CLB with different rating schemes involving observation of symptom expression from multiple specific plant parts (e.g., leaves and petioles were evaluated separately) and standard symptoms such as purpling/bronzing, blight, and petiole lesions.In 2018, a simplified severity rating scale of 0-6 to evaluate CLB severity on an additive scale based on the presentation of symptoms throughout the canopy whereby 0 for no disease symptoms, 1 for light purple/bronzing, few petiole lesions, no leaf blight, 2 for moderate purple/bronzing and/or petiole lesions, no or minimal leaf blight, 3 for heavy purple/bronzing and/ Linkage disequilibrium (LD) patterns on the soybean genome developed using 31,491 SNPs (MAF ≥0.05).The y-axis represents the average r 2 value, while the x-axis represents physical distance.
or petiole lesions, light blight, 4 for heavy purple/bronzing and/or petiole lesions, moderate blight, 5 for severe blight and less than 50% defoliation, and 6 for severe blight but more than 50% defoliation.In addition, percent incidence was presented on a 0 to 4 scale to encompass observational incidence as quartiles (Ward et al., 2021).
Natural infection was relied on, and several locations with low disease pressure were not used in the final GWAS analysis.Locations with moderate to heavy disease pressure were used for GWAS analysis and included: Bossier City, LA (BLA) and Stoneville, MS (SMS) in 2016; Alexandria, LA (ALA), Bossier City, LA, and Stoneville, MS in 2017; and Alexandria and Bossier City, LA, Stoneville, MS, and Fayetteville, AR (FAR) in 2018.

Accession and SNP array data for GWAS
As a result of seed scarcity there was variation in the number of accessions planted at each location and genotypic data for all 568 accessions was not available.Therefore, 460 accessions that were planted and evaluated in all the location-years as well as with available genotypic data were used for analyses.A total of 42,080 SNPs that can be assigned to 20 chromosomes within the soybean genome were obtained through the Illumina Infinium SoySNP50 K iSelect SNP Beadchip (Song et al., 2013).Markers with minor allele frequency (MAF) < 5% or had a missing rate greater than 10% were excluded, resulting in 31,491 SNPs for additional analysis.

Population structure and linkage disequilibrium analysis
Population structure analysis to determine the number of subpopulation and member of each subpopulation was done using STRUCTURE v.2.3.4 software (Pritchard et al., 2000).For structure analysis, the resulting 31,491 SNPs were used with admixture model with five iterations of 50,000 burn-in and 50,000 Monte Carlo Markov Chain (MCMC) replications for k = 1 to k = 10.The number contained within the subpopulation was determined by calculating DeltaK using Structure Harvester (Earl and Vonholdt, 2012).Furthermore, linkage disequilibrium (LD) decay or r2 between pairs of markers was calculated using TASSEL  (Continued).Population structure analysis of 460 soybean accessions evaluated for Cercospora leaf blight incidence and severity over multiple environments from 2016-2018: (A) Delta K (ΔK) calculated between K = 1 to K = 10 using information from STRUCTURE analysis indicates six subpopulations (K = 6) and (B) Bar plot illustrating Subpopulation K = 6 with each color representing a unique cluster.(C) Principal components analysis was performed with 31,491 SNPs selected for GWAS in R. The PCA result is consistent with the existence of six subpopulations.(Bradbury et al., 2007) using sliding windows of 50 SNPs, and visualization was done in R to determine the LD decay rate and distance when the r2 dropped to half of the maximum value (Remington et al., 2001).Linkage disequilibrium between pairs of markers was plotted against the physical distance relying on the r 2 values.The red fitting curve estimates the rate of LD decay when the power of maximum r 2 was reduced to half, and the blue fitting curve illustrates the model fit to LD decay.

Assessing phenotypes and statistical analysis
The combined data observations for the years 2016 and 2017 were converted into categorical data and analyzed using a linear model with the "lm" function.The "anova" function was then used to obtain an ANOVA table.For the 2018 data, ANOVA was performed using the "aov" function, considering the severity and incidence.ANOVA was calculated separately for the years 2016-17 and 2018, as different methods and scales were used to rate CLB disease.
The program TASSEL was used to remove markers with minor allele frequency (MAF) < 5% or with a missing rate larger than 10% (Bradbury et al., 2007).The filtered set of markers were used to conduct GWAS using FarmCPU (Fixed and random model Circulating Probability Unification) with one of the tools present in GAPIT version 3 package in R (Liu et al., 2016;Wang and Zhang, 2021).FarmCPU is more effective than the generalized linear model (GLM) and the mixed linear model (MLM) as it controlled with both false positives and false negatives and is a widely used method in soybean GWAS studies (Kaler et al., 2018;Chamarthi et al., 2021).Kinship and PCA analysis were conducted using GAPIT and incorporated in the GWAS analysis.A SNP was declared significantly associated with the trait if it crossed the threshold value, [−Log10 (P) ≥ 3.5] or had a p-value ≤ 0.0003, which is similar to previous soybean GWAS studies (Dhanapal et al., 2015;Kaler et al., 2018;Chamarthi et al., 2021).If a SNP had a significant association to a trait [−Log10 (P) ≥ 3.5] in at least one environment and passed a threshold value of p ≤ 0.05 in additional environments, it was considered to be a common significant associated SNP for multiple environments (Kaler et al., 2017;Kaler et al., 2018;Chamarthi et al., 2021).
Candidate genes were searched for SNPs that were associated with CLB in multiple environments by scanning a ~20 kb region of the significant SNP.The genome browser at Phytozome (https:// phytozome.jgi.doe.gov/jbrowse/) and SoyBase (https://soybase.org) was used to identify putative genes and their functions.The search was targeted to identify any genes that are involved in defense mechanisms against biotic and abiotic stresses.

Assessing CLB disease in soybean accessions
The ANOVA analysis indicated that in 2016 and 2017, the accessions significantly impacted CLB severity more than the locations.In 2018, the locations became the primary cause of variation in plant response to CLB, with the accessions contributing a lesser role (Supplementary Table S3, adapted from Ward et al., 2021).All accessions planted over the 3 years were ranked from most resistant to least resistant based on normalized disease evaluations (Supplementary Table S4, adapted from Ward et al., 2021).Among the 460 accessions evaluated, 17 consistently showed resistance to CLB for all 3 years and appeared in the top 10% of the population with the lowest disease severity (Supplementary Figure S1; Supplementary Table S4, adapted from Ward et al., 2021).Similarly, 17 accessions were identified consistently higher level of the disease severity across the years.These accessions will provide excellent genetic resources for cloning the genes that interact with the pathogen and understanding the interaction between the host and the pathogen.

Genomic insights: understanding marker distribution, LD decay, and population structure
The filtered SNPs amounted to 31,491 in total, spanning an estimated 948 megabases (Mb), which covers approximately 82% of the soybean genome (1.15 Gb).Across the chromosomes, the number of SNP markers per chromosome exhibited a range from 998 (chromosome 20) to 2,694 (chromosome 18), with an average of 1,575 SNP markers per chromosome (Table 1).The average distance between the two markers for the genome was approximately 30.1 Kb, while the range varied from 21.5 Kb (chromosome 18) to 48 Kb (chromosome 20).
The estimated LD decay rate, measured in the r2, dropped to half at approximately 324 Kb (Figure 1).According to population structure evaluation, optimum subpopulation (k) was determined to be 6, which means that the 460 genotypes can be divided into six populations (Figure 2).The average distance (expected heterozygosity) within a subpopulation was approximately 0.2 while the FST fixation index, which measures the genetic variance in the subpopulation, ranged from 0.34 to 0.98 (Supplementary Table S2).Principal component analysis (PCA) clustering also supported that the accessions group into six different subpopulation (Figure 2).

Identification of genomic regions linked to CLB
A total of 99 SNPs in 18 of 20 chromosomes were identified to be associated with CLB severity, while 47 SNPs that were present in more than two environments were distributed in 13 chromosomes (Figures 3-5; Supplementary Figures S1-S3; Supplementary Table S5).Chromosome 19 had the greatest number of associated SNPs to severity, while chromosome 6 had the greatest number of common SNPs (Table 2).Furthermore, the allele effect for disease severity ranged from −8.68 to 9.18 for these significantly associated SNPs.Here a negative value for allelic effect suggested that minor alleles were favorable, while a positive allele effect suggested that major alleles were favorable for disease resistance.For disease incidence we observed a total of 85 SNPs associated in all 20 chromosomes with chromosome 2 having the greatest number of SNPs (Figures 3-5; Supplementary Figures S2-S4; Supplementary Table S6).A total of 23 SNPs were determined to be associated to disease incidence in more than two environments (Table 3).The allele effect for disease incidence ranged from −5.96 to 10.98 for these significantly associated SNPs.

Candidate genes and ontologies for CLB resistance
A total of 47 and 23 SNPs (Tables 2, 3) showed a significant association and were observed to be associated in multiple  environments for disease severity and incidence rating, respectively.The search for candidate genes revealed that 36 out of 47 associated SNPs for disease severity and 19 out of 23 SNPs for disease incidence had genes in the vicinity of the 10 kb region surrounding them.A total of 72 and 57 genes (Supplementary Table S7), respectively, were identified in the vicinity of 36 (disease severity) and 19 (disease incidence) SNPs.GO enrichment analysis (https://www.soybase.org/) of these genes indicate involvement in several biotic and abiotic stress pathways such as regulating hydrogen peroxide; salicylic acid metabolic process; abscisic acid mediated signaling pathway; defense response to pathogens and insects; flavonoid biosynthetic pathway; response to cold and salt stress; as well as wound response signaling.

Discussion
To date, a limited number of studies have focused on identifying genomic regions associated with resistance to Cercospora kikuchii causal pathogen of CLB.The QTL resistance of C. kikuchii resistance for PSS was identified in a F 2 population with parents, "Agripro 350" and "PI 80837" (Jackson et al., 2008); however, this provided limited information due to the lack of recombination events and genomic composition in a single cross.In GWAS studies, natural variation among a large set of soybean germplasm is used providing greater resolution and increasing the chances of identifying additional genomic regions associated with the trait.This approach has previously been used in soybean to discover genomic regions related to essential attributes like canopy cover, lodging, leaflet area, oil and protein content, plant height, pod shattering, pod number, oil component, protein components, and abiotic and biotic stresses (Leamy et al., 2017;Kaler et al., 2018;Lee et al., 2019;Patel et al., 2023b).
A total of 99 and 85 SNPs were identified associated with disease severity and incidence, respectively, out of which 47 and 23 were confirmed over multiple environments.In total, 14 SNPs were less than 200 kb away from the QTLs that have been identified to bestow resistance to diseases such as soybean cyst nematode (SCN), sudden death syndrome (SDS), Soybean mosaic virus (SMV), and Sclerotinia stem rot (Table 4).
Recently, GWAS and RNA-Seq studies were conducted to identify genomic regions and genes linked to the soybean response to target spot, caused by Corynespora cassiicola (Patel et al., 2023a;Patel et al., 2023b).Two associated SNPs from the current study were located within 1 Mb from markers associated with target spot symptoms in soybean: One nestled on chromosome 11 (ss715608754), which demonstrated an association with the incidence of CLB and another residing on chromosome 16 (ss715624926), which was linked to the severity of CLB.The SNP ss715624926 on chromosome 16 is within a 1.86 Mb genomic region, which hosts markers that were linked not only to target spot but also to Sclerotinia stem rot, iron deficiency chlorosis, and SCN.This 1.86 Mb genomic region is home to 35 genes that are crucial in disease resistance including: Family protein/LRR family protein, TIR-NBS-LRR family, receptor-like proteins (RLP), cysteine-rich RLK protein kinase 25, cytochrome P450, LRR-RLK, and LRR transmembrane protein kinase (Bradbury et al., 2007).Further exploration of this region may open exciting possibilities for the cultivation of soybean cultivars with broadspectrum resistance against multiple pathogens, thereby fortifying the sustainability and resilience of soybean.

Conclusion
In summary, this study marks a pioneering effort in identifying genomic regions and genes influencing soybean plant response to CLB.The 17 lines identified in this effort coupled with the identification of key SNPs and candidate genes offer considerable value for developing novel breeding lines fortified with CLB resistance.Moreover, certain genomic regions revealed in our study consistently demonstrated associations with CLB across diverse environments and have previously been linked to additional diseases.These findings present an opportunity to Frontiers in Genetics frontiersin.orgleverage these genomic regions, facilitating the streamlined integration of multiple disease-resistance loci in future breeding programs.To better understand our findings, additional research in controlled environments using specific species of Cercospora is necessary.Frontiers in Genetics frontiersin.org11 Patel et al. 10.3389/fgene.2024.1377223

FIGURE 3
FIGURE 3 Manhattan plot presenting the association between SNPs, disease severity (SEV), and incidence (INC) of soybean accessions evaluated for Cercospora leaf blight in Bossier City, LA (BLA) and Stoneville, MS (SMS) during 2016.The gray horizontal line indicates the genome-wide significant threshold [−log10 (p-value) = 3.5].

TABLE 1
Distribution of SNP markers along the soybean genome.
a Environment.b Chromosome.c Common environment.d Minor allele frequency.e Bold allele is favorable for disease resistance.
a Environment.b Chromosome.c Common environment.d Minor allele frequency.e Bold allele is favorable for disease resistance.

TABLE 4
Soybean genome region associated for multiple disease resistance.
a Chromosome.b Bold allele is favorable for disease resistance.c Quantitative trait loci.