Genome-Wide Mapping of Loci Associated With Resistance to Clubroot in Brassica napus ssp. napobrassica (Rutabaga) Accessions From Nordic Countries

Rutabaga [Brassica napus ssp. napobrassica (L.) Hanelt] is reported to be an excellent source of clubroot (Plasmodiophora brassicae) resistance genes. In this study, 124 rutabaga accessions from the Nordic countries (Norway, Sweden, Finland, Denmark, and Iceland) were evaluated for their reaction to five single-spore isolates representing P. brassicae pathotypes 2F, 3H, 5I, 6M, and 8N and 12 field isolates representing pathotypes 2B, 3A, 3O, 5C, 5G, 5K, 5L, 5X (two isolates, L-G2 and L-G3), 8E, 8J, and 8P. The accessions were also genotyped using a 15K Brassica SNP array and 60 PCR-based primers linked to previously identified clubroot resistance genes. Six thousand eight hundred sixty-one SNP markers were retained after filtering with TASSEL 5.0, and used to evaluate four general linear models (GLM) and four mixed linear models (MLM). The PCA + K and Q + K MLM models gave the minimal deviance of the observed from the expected distribution in quantile-quantile plots, and hence were used for SNP-clubroot association analyses. In addition, 108 alleles derived from the PCR-based markers and the phenotypic data were analyzed with the PCA + K model. Forty-five SNPs and four PCR-based markers were identified to be associated strongly with resistance to isolates representing 13 pathotypes (2F, 3H, 5I, 6M, 8N, 2B, 3A, 3O, 5C, 5G, 5K, 5L, and 8P). These markers revealed the top and bottom segments of rutabaga chromosome A03 and the middle segment of chromosome A08 as genomic hotspots associated with resistance to the different P. brassicae pathotypes.


INTRODUCTION
Rutabaga (Brassica napus ssp. napobrassica) has been cultivated commercially as a vegetable crop in Canada since the 1950s and 1960s (Shattuck and Proudfoot, 1990;Spaner, 2002). This crop is affected by a number of diseases including clubroot of crucifers, caused by the obligate parasite Plasmodiophora brassicae. Clubroot is best managed by the use of host resistance . The European winter oilseed rape (syn. canola; B. napus) 'Mendel' has been the main source of clubroot resistance gene(s) in many canola breeding programs in Canada , but 'new' pathotypes of P. brassicae capable of overcoming this resistance in clubroot resistant (CR) canola cultivars have emerged recently in Alberta (Strelkov et al., 2016. Indeed, the CRa/CRb Kato resistance gene(s) on the A03 chromosome of 'Mendel' (Fredua-Agyeman and  were found to confer resistance to only about 50% of the new P. brassicae pathotypes identified in Alberta from 2013 to 2015 . As such, efforts have intensified to identify new clubroot resistance genes from other Brassica sources to develop the next generation of CR canola cultivars. Greenhouse resistance screening showed that rutabagas possess both qualitative and quantitative resistance to isolates representing the old and new pathotypes of P. brassicae (Hasan et al., 2012;Fredua-Agyeman et al., 2019). Therefore, rutabagas can be used in the breeding of CR canola cultivars. Examples of CR rutabaga cultivars developed in Canada include 'Chignecto, ' 'York, ' 'Fortune, ' 'Kingston, ' and 'Brookfield' (Shattuck and Proudfoot, 1990). The Canola Breeding Program at the University of Alberta mapped the Crr1 clubroot resistance gene in 'Brookfield' to the A08 chromosome of B. rapa and B. napus (Hasan and Rahman, 2016). While rutabaga is a root vegetable and canola is an oilseed, they are both B. napus and share the same ploidy level and genome (2n = 38, AACC). Therefore, the introgression of clubroot resistance genes from rutabaga to canola does not face many of the pre-and postzygotic challenges associated with crosses with more distant relatives. On the other hand, many cycles of backcrossing and breeding will be needed to achieve canola quality (spring growth characteristics, plant architecture, reductions in the high content of erucic acid in the oil and glucosinolate in the seed meal, as well as days to flower). In addition, molecular markers linked to the clubroot resistance genes in rutabagas need to be identified for marker-assisted selection (MAS) to track the introgressed genes.
In plants, linkage-based mapping is often carried out on populations developed from bi-parental crosses, and so polymorphism is restricted to the contrasting genetic variability for the trait of interest in the parents (Brachi et al., 2011;Gupta et al., 2014). As such, markers detected for quantitative traits are restricted to the two parents and may not work in populations developed from other parents. An advantage of linkage-based QTL mapping is the often very high power of QTL detection (Gupta et al., 2014). Linkage disequilibrium (LD)-based association mapping (AM), such as genome-wide association mapping (GWAS), is an alternate approach for capturing recombination events to the gene level in natural populations, and has several advantages over traditional linkagebased mapping (Gupta et al., 2014). GWAS can be applied to a diverse set of genotypes of any crop species, does not require ancestry or pedigree information for QTL mapping, captures higher allelic diversity, provides higher resolution, can be used to study many traits of interest, and is cheaper and faster to complete since it does not require the development of a mapping population. However, LD-based AM is often limited by the detection of false positive associations (Type I error) that may arise as the result of linkage, population stratification and relatedness among individuals (Gupta et al., 2014). In addition, SNPs with minor alleles (<5-10%) are filtered off during GWAS analyses, and hence GWAS lacks the power to detect these minor alleles (Brachi et al., 2011).
The objectives of this study were to phenotype and genotype rutabaga accessions collected from Norway, Sweden, Finland, Denmark, and Iceland, and to use GWAS to identify SNP and SSR markers significantly associated with clubroot resistance to a collection of P. brassicae isolates from Alberta, Canada, representing different pathotypes. In addition, the positions of the identified markers, as well as those from previous genetic mapping studies, were traced to the B. rapa, B. oleracea, and B. napus reference genomes. Lastly, candidate genes associated with the identified genomic regions were identified based on the proteins they encode.

Plant Material
The materials used in this study comprised all 124 B. napus ssp. napobrassica (rutabaga) accessions used for the genetic diversity studies of Yu (2020). Population-genetic differences were captured in the aforementioned study with the software STRUCTURE v2.3.4 (Pritchard et al., 2000). Seeds of the rutabaga were multiplied by planting 2-4 seeds of each accession in 13 × 13 × 15 cm pots filled with Sunshine Mix #4 Aggregate Plus growing mix (Sungro Horticulture, Seba Beach, Alberta, Canada), and placing the pots in a greenhouse at the Crop Diversification Centre North (CDCN), Alberta Agriculture and Forestry (AAF), Edmonton, Canada. After 1 week, the seedlings were thinned to two plants per pot and kept in the greenhouse for another 5-6 weeks at 20-25 o C/15-18 o C day/night temperatures and a 16 h light/8 h dark photoperiod. The plants were then vernalized for 12 weeks in a cold room maintained at 4 o C with a 12 h photoperiod, and subsequently returned to the greenhouse for flowering, maturation and seed harvest.
The five SSIs were classified as pathotypes 2, 3, 5, 6, and 8 on the differentials of Williams (1966) and as pathotypes 2F, 3H, 5I, 6M, and 8N, respectively on the CCD Set . The numbers in the CCD designations correspond to the classification according to the differentials of Williams, while the letters designate the unique virulence patterns on the hosts of the CCD. The 17 isolates (representing isolates of 16 pathotypes) used in this study were maintained as galls on the universally susceptible European Clubroot Differential (ECD) 05 (Buczacki et al., 1975) at −20 o C.

Clubroot Phenotyping
Greenhouse clubroot tests were conducted at the CDCN following Fredua-Agyeman et al. (2019). Briefly, 4-8 seeds of each rutabaga accession were placed on Whatman No. 1 filter paper wetted with distilled water in Petri dishes, and kept at room temperature (18-22 o C) and a 12 h light/12 h dark photoperiod provided by white fluorescent light. The filter paper was moistened daily with water for 7 days, at which point the seeds had germinated. The universally susceptible B. napus (canola) cv. 'Westar' was included in each experiment as a positive control.
The inoculated seedlings were transplanted into 13 cm × 13 cm × 15 cm pots filled with Sunshine Mix #4 Aggregate Plus soilless mix (Sungro Horticulture Canada Ltd., Seba Beach, Canada), followed by the addition of 1 mL of inoculum into the potting mix at the base of each plant (Lamers and Toxopeus, 1977 as cited by Voorrips and Visser, 1993). The inoculated rutabaga seedlings were planted at the periphery of each pot while one seedling of the susceptible check 'Westar' was placed at the center. Pots inoculated with the same isolate were placed randomly on one bench and re-randomized twice on the 3rd and 6th weeks after transplanting.
The experiments were repeated 3-4 times based on seed availability. Maintenance of the plants (watering, fertilizing and insect pest control) and growing conditions (photoperiod and temperature) in the greenhouse were as described by Fredua-Agyeman et al. (2019).

Clubroot Disease Rating and Phenotypic Variation
The severity of clubroot was assessed 8 weeks after inoculation on a 0-3 scale (Kuginuki et al., 1999) where: 0 = no galling, 1 = a few small galls on the lateral roots, 2 = moderate galling on the lateral roots but not on the main root, and 3 = severe galling on both the lateral and main root. Ratings for the rutabaga accessions in each pot were considered valid only if the disease rating of the susceptible check 'Westar' in the same pot was 2 or 3. An index of disease (ID, 0-100%) was calculated for each isolate/host genotype combination following Strelkov et al. (2006), with the mean IDs and standard deviations (SDs) of the repeated experiments determined for all 124 rutabaga accessions for each of the 17 P. brassicae isolates.
(2019) and the clubroot resistance screening guidelines of the Western Canada Canola/Rapeseed Recommending Committee (WCC/RCC). The grand mean (GM) IDs (means of the means of IDs with all 17 isolates) of each accession were also calculated and used as an indicator of broad-spectrum resistance. The percentage of the accessions that were resistant (R), moderately resistant (MR), or susceptible (S) were calculated. In addition, the frequency distribution of the resistance reaction of the rutabaga accessions to each pathotype was tested for normality. PROC TRANSREG (Box-Cox and Log transformation) in SAS v. 9.4 (SAS Institute, United States) and the rank based inverse normal transformation (INT) were carried out on the phenotypic data that did not yield normal curves.

DNA Extraction
DNA was extracted from ca. 0.25 g leaf tissue of each accession using the cetyltrimethyl ammonium bromide (CTAB) method (Sambrook and Russell, 2001). The DNA concentration was measured with a ND-2000C spectrophotometer (NanoDrop, Technologies, Inc., Wilmington, DE, United States) and the concentration diluted to 10-20 ng µL −1 for the working solution of each DNA sample.

SNP Genotyping
SNP genotyping was performed using the Brassica 15K SNP array at TraitGenetics GmbH, Gatersleben, Germany, according to the manufacturer's protocols (Clarke et al., 2016). After removing monomorphic, low coverage site markers, markers with MAF ≤ 0.05 and those missing data for >5% of the accessions, 6861 SNP markers, comprising 4390 A-genome and 2471 C-genome markers, were used for GWAS analyses.
PCR and Genotyping With PCR-Based Markers Linked to Known CR Genes PCR reactions, genotyping, amplified product detection and calling of allele sizes were performed as described by Fredua-Agyeman et al. (2014).
The genotyping was carried out with 60 PCR-based primers linked to seven previously identified clubroot resistance genes. Primers screened from the A03 chromosome of B. rapa consisted of the following: six SCAR markers linked to the CRa gene , three SSR, four SCAR and one CAP marker(s) linked to the CRb gene (Piao et al., 2004;Zhang et al., 2014), 32 SSR and three InDel markers linked to the CRb Kato gene (Kato et al., 2012, one STS marker linked to the Crr3 gene (Saito et al., 2006), one CAP marker linked to the Rcr1 gene (Chu et al., 2014), one SSR and one SCAR marker linked to the CRk gene Matsumoto et al., 2012) and two SSR markers linked to the CRd gene (Pang et al., 2018).
Primers screened from other chromosomes comprised: three SSR markers on the A08 chromosome of B. rapa linked to the Crr1 gene Hasan and Rahman, 2016;Hobson and Rahman, 2016), one SSR marker on the A01 chromosome linked to the Crr2 gene and one SSR marker on the A06 chromosome linked to the Crr4 gene (Suwabe et al., 2003. The primer names, sequences and chromosomal locations are listed in Supplementary Table S1. Each allele was coded as a separate site with each allele in turn recoded as one of two nucleotides (A or T; C or G) as specified in the TASSEL Manual (Bradbury et al., 2007). Genotyping was repeated for 10% of the individual samples to confirm the reproducibility of the molecular data (Fredua-Agyeman et al., 2014). In addition, alleles with a frequency of ≤5% were removed to reduce false positive associations (Nie et al., 2016). One hundred and eight alleles linked to known CR genes were used for the GWAS analyses.
Intra-chromosomal linkage disequilibrium in the rutabaga accessions was analyzed by calculating the squared value of the correlation coefficient of the allele frequencies (r 2 ) between all pairs of linked loci with TASSEL 5.0 (Bradbury et al., 2007). However, due to computational challenges, the inter chromosomal pairwise comparison of r 2 -values could not be obtained directly for the 4384 SNP markers (9607536 pairs) on chromosomes A 1 -A 10 , the 2367 SNP markers (2800161 pairs) on chromosomes C 1 -C 9 and all 6751 SNP markers (22784625 pairs) for the entire genome. Instead, the data of the pairwise comparison of linked SNP markers on each chromosome were used to obtain the mean values for the A-and C-genomes, as well as for the entire AC genome of rutabaga ( Table 1).
The significance of pairwise marker r 2 -values was determined by calculating the Chi-square (χ 2 ) statistic for each SNP pair according to Zhou et al. (2012), except that a threshold p-value < 0.001 was used to assess the level of significance. The PROC GPLOT procedure in SAS v. 9.4 (SAS Institute) was used to generate LD plots of the r 2 -values of pairs of markers with p < 0.001 vs. physical map distance (in Mb) for each chromosome. The data points were then fitted with a solid curve using the PROC TRANSREG function in SAS. Background linkage disequilibrium (BLD) was calculated as the r 2 -values for unlinked markers that exceeded 95% (95th percentile) of the data set, following Breseghello and Sorrells (2006). The average extent of LD of each chromosome was estimated by the projection of the intersection between the fitted curve and the r 2 threshold line onto the physical distance axis (Breseghello and Sorrells, 2006;Bellucci et al., 2015).

Marker-Clubroot Association
To identify loci in the rutabaga accessions associated with the response to each of the 17 P. brassicae isolates, genomewide association analyses were conducted with TASSEL 5.0 (Bradbury et al., 2007) using the 6861 SNP marker data and the mean ID values of each accession and pathotype. A separate analysis was performed using the 108 alleles derived from the 60 PCR-based markers and the mean ID data from the clubroot phenotyping experiments.
Marker-trait association analyses were carried out following Li et al. (2014) with slight modifications. Four models each were tested with the general linear models (GLM) and mixed linear models (MLM) procedures implemented in TASSEL 5.0 (Bradbury et al., 2007). The GLM tested comprised the N (Naïve), Q-only (population structure), K-only (Kinship) and PCA-only (Principal Component Analysis) models. The MLM tested comprised the Q + K and PCA + K models (Li et al., 2014) and two additional MLM association tests of Q and PCA using Distance matrices (D) (i.e., Q + D and PCA + D).
Quantile-Quantile (Q-Q) plots, which plot the -log 10 P-value of the test of association (observed) with that expected given the null hypothesis of no marker-trait associations were obtained for each model. The results for the best GWAS model(s) were presented as Manhattan plots. Markers with strong associations to clubroot resistance were identified from the peaks on Manhattan plots. The Bonferroni correction was used to set the significance cut-off (α/n, where α = level of significance, n = number of markers) for both the SNP and the PCR-based markers. The SNP markers were considered to be significantly associated with the traits if P ≤ 1 × 10 −4 (i.e., -log 10 P = 4.0; 1/n, n = number of markers) based on the adjusted Bonferroni correction method (Benjamini and Hochberg, 1995). In the case of the PCR-based markers, a slightly lower threshold of P ≤ 5 × 10 −4 (i.e., -log 10 P = 3.3) was used to indicate the significance of associations between the markers and the traits.
The physical positions of the SNP and the SSR markers with strong associations to clubroot resistance were then mapped to the B. rapa, B. oleracea, and B. napus genome sequences to identify their association with previously characterized clubroot resistance genes.

Candidate Gene Prediction
The sequences of the PCR-based and SNP markers found to be associated with clubroot resistance loci were used in BlastN searches of the B. rapa (AA), B. oleracea (CC), B. napus (AACC) and Arabidopsis thaliana genome sequences available in the NCBI 1 GenBank database to determine their possible functions. An E-value ≤ E-20 and a percentage identity of ≥ 95% were used as the threshold cut-offs to confirm the putative functions of the candidate genes.

Statistical Analyses
Statistical analyses of the phenotype data were conducted with SAS v. 9.4 (SAS Institute, United States). The PROC SORT function was used to select the R and MR (i.e., ID ± SD ≤ 50%) 0.047 1081.6 ± 649.9 * Chromosomes which had more distant SNP markers removed before determination of extent of linkage disequilibrium. φ The number and percentage of SNP pairs in significant LD were determined from Chi-squared tests at p < 0.001. ψ The extent of LD decay was estimated from the projection of the intersection between the fitted curve of the data points and the 95th percentile of unlinked r 2 threshold line (background LD) onto the physical distance axis.
accessions. Duncan's test (Steel and Torrie, 1960) was used to test (P ≤ 0.05) for differences among the ID mean values of all 17 isolates and to quantify these differences among the rutabaga accessions. The distribution of the IDs for each R and MR rutabaga accession were visualized with box-and-whisker plots.

Phenotypic Variation of Clubroot Resistance in Rutabaga Accessions
The frequency distribution of 124 rutabaga accessions evaluated for resistance to 17 P. brassicae isolates (representing 16 pathotypes) is shown in Figure 1. The phenotypic values for two of the 17 traits were bimodal distributed (Figures 1G,N), two (Figures 1H,O) were skewed to the right, while the remaining 13 distributions were skewed to the left (Figures 1A-F,I-M,P,Q). Unfortunately, transformation and rank-based inverse normal transformation of the data did not yield superior normal curves. The complexity of the disease, virulence of the isolates and the susceptibility of the host plants might have accounted for the skewness of the data even under transformation. The descriptive statistics for the reactions of the rutabaga accessions (Supplementary Table S2) to the different P. brassicae isolates suggested that, in spite of the largely skewed phenotypic data, sufficient resistant and susceptible resources existed to carry out this GWAS study.
Overall, 12.1% of the rutabaga accessions were resistant or moderately resistant to ≥8 of the isolates, while the vast majority (87.9%) exhibited very little resistance (Figures 1-3). The latter comprised 33 (26.6%) accessions susceptible to all 17 P. brassicae isolates, and 76 (61.3%) accessions that were resistant or moderately resistant to isolates representing 1-7 pathotypes (data not shown). Therefore, the rutabaga accessions showed a wide variation in their reaction to isolates representing all the P. brassicae pathotypes used in this study. Resistance in the accessions with broad spectrum resistance was of the order FGRA106 > FGRA037 > FGRA108, FGRA068, FGRA044, FGRA072 > FGRA036, FGRA112 > FGRA109.

Linkage Disequilibrium
The deletion of the most distant SNP markers reduced the coverage length of chromosomes A04, C01, C02, C03, C04, C05, C06, C07, C08, and C09. For example, the deletion of just one or two SNPs from chromosomes C05, C06, C07, and C08 significantly reduced the SNP length covered from 19.9, 17.4, 26.6, and 33.0 Mb (data not shown) to 2.1, 3.2, 4.9, and 4.4 Mb, respectively (Table 1, Figure 4, and Supplementary Figures S1, FIGURE 4 | Plots of correlation coefficient (r 2 ) as a function of physical distance (in Mb) between pairs of SNP markers on chromosomes A02, A03, and A08 of the A-genome (top) and chromosomes C02, C03, and C05 of the C-genome (bottom). Red curves represent the fit plots of the data points and the orange line represents the background linkage disequilibrium (BLD) or threshold line. The extent of LD decay was determined from projection of the intersection of the curves and the BLD line onto the physical distance. The SNP markers on the A-genome had a wider coverage (∼22-32 Mb) while a reduced set of SNP markers was available on the C-genome (∼2.0-3.2 Mb).

S2).
As a result, the 6751 SNP markers used for the estimation of LD decay covered 285.9 Mb (A-genome 259.1 Mb + Cgenome 26.7 Mb) instead of 478.0 Mb (A-genome 265.9 Mb + Cgenome 212.1 Mb, data not shown) if the 110 more distant SNP markers were included in the analysis. The marker density ranged from one marker/5.4-106.4 kb for the 19 chromosomes. The mean inter-marker distance were 63.4 ± 21.9 kb and 15.0 ± 8.4 kb for the A-and C-genomes, respectively. Overall, this translated to one SNP marker per 40.5 ± 29.8 kb in the rutabaga chromosomes ( Table 1).
The average of the squared allele correlation LD (r 2 ) for the A-genome, C-genome and the entire genome were calculated to be 0.041, 0.063, and 0.047, respectively ( Table 1). About 22.3 ± 4.4% (A-genome 20.3 ± 3.4% + C-genome 24.5 ± 4.4%) of the total intra-chromosomal SNP pairs were in the significant LD (p < 0.001). Based on the selected significant SNPs, the background LD (estimated as the 95th percentile of unlinked r 2values) for the entire genome was determined to be r 2 = 0.181. The average extent of LD decay (which was derived from the intersection between the fitted curve and background LD) for the 19 chromosomes ranged from 200 to 2300 kb (Figure 4 and Supplementary Figures S1, S2) with a mean of 1081.6 ± 649.9 kb ( Table 1). The respective ranges for the A-and C-genome chromosomes were 1100-2300 kb and 200-1500 kb while the means were 1560.0 ± 408.8 kb and 550.0 ± 398.1 kb, respectively ( Table 1). Based on the above data, LD in the C-genome decayed faster whereas LD persisted two to five times longer in the A-genome.

Association Mapping of Clubroot Resistance Loci
Analyses of the SNP marker genotype data with TASSEL 5.0 (Bradbury et al., 2007) detected population structure (Q) and kinship (K) among the 124 rutabaga accessions. Based on the Q-Q plots of the four GLM models, the deviation of the observed -log 10 P distribution from the expected distribution was of the order Naïve > K-only > PCA-only and Q-only for both the PCR-based (data not shown) and SNP markers ( Supplementary  Figures S3A-D). All four MLM (Q + K, PCA + K, Q + K, and PCA + D) models (Supplementary Figures S3E-H) gave a minimal deviance of the observed -log 10 P from the expected distribution compared with the aforementioned GLM models. In addition, the MLM models that utilized Kinship matrix (i.e., PCA + K and Q + K) departed the least from the expected distribution compared with the MLM models that utilized the Distance matrix (i.e., PCA + D and Q + D). Therefore, based on the Q-Q plots of the eight GWAS models, the PCA + K and Q + K MLM models were used for SNP-clubroot association analyses. Manhattan plots for the PCA + K and Q + K models used for the identification of clubroot resistance loci to the different isolates are presented in Figure 5.
FIGURE 5 | Manhattan plots of the PCA + K (a-k) and Q + K (l-p) models for identifying clubroot resistance loci in 124 rutabaga (Brassica napus ssp. napobrassica) accessions. The dashed horizontal lines indicate the Bonferroni-adjusted significance threshold (P ≤ 1 × 10 −4 ). The dots above the significance threshold indicate SNPs associated with resistance to each P. brassicae isolate.
Forty three (43) SNP markers were detected by the PCA + K model to be significantly associated with resistance to 11 P. brassicae pathotypes. This comprised 4, 13, 3, 4, 6, 2, 1, 3, 1, 5, and 1 SNPs which were associated with resistance to pathotypes 2F, 3H, 5I, 6M, 8N, 2B, 3A, 3O, 5C, 5K, and 8P, respectively ( Table 2). The SNP marker Bn_A01_p161237 was significantly associated with resistance to pathotypes 3H and 6M, while the remaining 42 were associated with resistance to only one pathotype each. Thirteen (13) SNP markers were found by the Q + K model to be significantly associated with resistance to six P. brassicae pathotypes ( Table 2). This comprised nine SNPs which were associated with resistance to pathotype 3H and one SNP marker each which was associated with resistance to pathotypes 3O, 5I, 5K, and 8N. Eleven of the 13 SNPs were detected both by the Q + K and PCA + K models, while two SNPs (Bn_A03_p21487106 and Bn_A05_p3191390) were detected only by the Q + K model ( Table 2).
Only the PCA + K model was used in the case of the GWAS with the PCR-based markers. Two SSR markers KB29N19 and B0903 (Kato et al., 2012) on the B. rapa chromosome A03 were significantly associated with resistance to eight (3H, 6M, 8N, 2B, 3A, 5C, 5G, and 5I) and 3 pathotypes (5C, 5L, and 5K), respectively. The InDel marker B1005 (Kato et al., 2012), also on the A03 chromosome, was associated with resistance to pathotype 5K. In addition, the SSR marker A08-5021 (Hobson and Rahman, 2016) on the A08 chromosome was significantly associated with resistance to pathotype 5L. Therefore, a total of 45 (32 by PCA + K only, 2 by Q + K only and 11 by both PCA + K and Q + K) SNP markers and 4 PCR-based markers were detected to be significantly associated with resistance loci to the five old and eight of the 12 new P. brassicae pathotypes used in this study. None of the molecular markers were found to be associated with resistance to pathotypes 5X (L-G2 and L-G3), 8E or 8J.
About 85% of the markers identified mapped to the A-genome of B. rapa and B. napus. The highest number of markers (22%) mapped to the A03 chromosome, where the Crr3 (Hirai et al., 2004;Saito et al., 2006), CRk , CRd (Pang et al., 2018), CRa (Matsumoto et al., 1998, CRb (Piao et al., 2004;Zhang et al., 2014), CRb Kato (Kato et al., 2012, Rcr1 (Chu et al., 2014;Yu et al., 2016) and BraA.CR.a (Hirani et al., 2018) genes have been characterized previously (Figure 6). This next highest number of markers mapped to the A08 chromosome (13%), where the Crr1 gene has been reported Hasan and Rahman, 2016;Hirani et al., 2018; Figure 7). About 5-8% of the markers were identified on the A01, A02, and A06 chromosomes, where the Crr2, CRc and the Crr4 genes have been identified, respectively. In addition, 11% of the markers mapped to each of the A04 and A05 chromosomes, while about 2-5% of the markers mapped to the A07, A09, and the A10 chromosomes, where no major clubroot resistance genes have been mapped thus far. About 8% of the markers mapped to the C-genome (C02, C03, and C05) of B. oleracea, while 7% mapped to scaffolds.  θ Mixed Linear Model (MLM) designations: PCA, principal component analysis; Q, population structure; K, Kinship. α SNP markers denoted with the same superscript letter mapped to multiple chromosomes on the reference genomes. The type of PCR-based markers showing trait association has been specified. β Linkage groups A 1 -A 10 = Brassica rapa and C 1 -C 9 = Brassica oleracea. Pathotype designations are as per the Canadian Clubroot Differential Set (letters) and the system of Williams (numbers). Putative functions are based on matching entries in the GenBank database of NCBI.
The three significant PCR-based markers (B1005, B0903, and KB2919) detected on the A03 chromosome were in LD with the closest SNP marker BnA03_P21487106 with r 2 -values in the range of 0.62-0.76. In the case of the significant SSR marker A08-5021 on the A08 chromosome, the r 2 -value with the closest SNP marker Bn_A08_p10123561 was 0.92. Thus, the four PCR-based markers and their closest SNP markers on both chromosomes were located on genomic regions tightly linked to CR genes. The physical positions and the chromosomal locations of the associated SNP and the PCR-based markers on the B. rapa and B. oleracea genome sequences are provided in Table 2.

Putative Functions of Proteins Encoded by Identified Sequences
The sequences identified in this study matched entries in GenBank corresponding to ATP binding proteins, kinases, hydrolases, transferases, transcription factors, DNA topoisomerases, chaperone proteins, and translocationrelated proteins, which are involved in cellular and biological processes as well as plant growth and development. Genes that encoded nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, WD-40 repeat family proteins, and syntaxin and histone deacetylases, which are associated with plant defense against pathogens, also matched some of the sequences identified in this study. Other matching genes encoded proteins including the RING/Ubox superfamily proteins and nodulin-like transporter family proteins, which have been reported to play a role in plant growth and organ size development. Overall, about 60% of the genomic regions associated with the identified clubroot loci were previously characterized. The remaining 40% encoded proteins of unknown molecular function ( Table 2). FIGURE 6 | Physical maps of the A03 chromosome of B. rapa constructed by the use of SSR and SNP markers identified to be associated with clubroot resistance in this study (a) and the PCR-based markers previously identified to be linked to the Crr3 (Hirai et al., 2004;Saito et al., 2006), CRk , and CRd (Pang et al., 2018) gene(s) located on the top half of chromosome A03 (b), as well as the CRa (Matsumoto et al., 1998, CRb (Piao et al., 2004;Zhang et al., 2014), CRb Kato (Kato et al., 2012, Rcr1 (Chu et al., 2014;Yu et al., 2016) and BraA.CR.a (Hirani et al., 2018) gene(s) located on the bottom half of chromosome A03 (c).
FIGURE 7 | Physical maps of chromosome A08 of Brassica rapa constructed by the use of SSR and SNP markers identified to be associated with clubroot resistance in this study (a), and the PCR-based markers previously identified to be linked to the Crr1 gene (Hirani et al., 2018) (b) and Hasan and Rahman (2016) (c). Fine mapping of the Crr1 genomic region was conducted with SSR markers developed by Hobson and Rahman (2016).

DISCUSSION
The complex genetic basis of clubroot disease and the emergence of new virulent isolates of P. brassicae make it imperative to identify molecular markers significantly associated with resistance to different P. brassicae pathotypes for use in markerassisted selection (MAS) and for the breeding of clubrootresistant Brassica crops. Genome wide association studies have without doubt proven to be one of the most useful methods for finding significant marker-trait associations (Rafalski, 2010). In this study, 124 rutabaga accessions collected from Norway, Sweden, Finland, Denmark and Iceland were evaluated for resistance to pathotypes of 17 P. brassicae identified in Canada from 2003 to 2018 (Strelkov et al., 2005(Strelkov et al., , 2016. This is the most comprehensive clubroot GWAS or genetic mapping study carried out to date, since the previous studies utilized from one to five pathotypes (Matsumoto et al., 1998Hirai et al., 2004;Piao et al., 2004;Saito et al., 2006;Kato et al., 2012Kato et al., , 2013Chu et al., 2014;Zhang et al., 2014;Fredua-Agyeman and Rahman, 2016;Hasan and Rahman, 2016;Li et al., 2016;Yu et al., 2016;Pang et al., 2018). Various studies have reported that transformation of data do not necessarily lead to the estimation of the correct Type 1 error rates (Beasley et al., 2009;Goh and Yap, 2009). Therefore, we used the untransformed data bearing in mind that Type 1 errors may be inflated. Similar, data analysis on the untransformed data was carried out by Zhou et al. (2018). However, comparing the power of different methods and determining the merits of whether or not to transform data to achieve normality may be beyond the scope of this study.
Linkage disequilibrium is important for determining the number and density of markers needed for GWAS and the experimental design needed to perform association analysis (Flint-Garcia and Thornsberry, 2003). The filtered set of 6751 SNP markers obtained from the Brassica 13.2K SNP array covered 285.9 Mb of the B. napus genome. About twice the coverage (∼645 Mb) has been reported in studies that have used the Brassica 60K array (Qian et al., 2014;Qu et al., 2017), as well as by the study by Zhou et al. (2017) that utilized Specific-Locus Amplified Fragment Sequencing (SLAF) technology. Despite using a reduced SNP array set (especially on the C-genome), the mean marker density ( Table 1) of 14 of the 19 chromosomes in this study was within the range of one SNP/14.5-72.0 kb obtained by Qian et al. (2014), one SNP/25.0-52.9 kb according to Li et al. (2014) and one SNP/11.6-54.7 kb obtained by Qu et al. (2017) during association studies of B. napus using the Brassica 60K SNP array. Like the aforementioned studies, the marker density varied greatly among chromosomes and also according to genome.
Furthermore, the extent of LD decay using high density SNP markers in B. napus was reported by Qian et al. (2014) to vary from 80 to 2000 kb for the A-genome and from 400 to 7500 kb for the C-genome. Wu et al. (2016) reported the LD levels in the A-genome varied from 106 to 1908 kb whereas that in the C-genome varied from 227 to 4089 kb. However, Qu et al. (2017) and Zhou et al. (2017) reported much smaller ranges of LD decay for the A-genome (50-100 kb and 9-141 kb) and C-genome (1250-1500 kb and 63-1993 kb). Therefore, the determined LD decay in this study (which varied from 1100 to 2300 kb for the A-genome and which ranged from 200 to 1500 kb for C-genomes) was comparable to previous decayed LD estimates obtained in the aforementioned studies.
In B. napus, one cM was estimated to correspond to 500 kb Ecke et al., 2010). Therefore, the total length of 285.9 Mb covered by the 6751 SNP markers correspond to approximately a quarter (571.7 cM) of the length of the genetic map of B. napus (∼2500 cM; Delourme et al., 2013). With the minimum LD decay of 200 kb (∼0.4 cM), a maximum of 1450 (few thousands) evenly spaced SNP markers would be necessary to perform the GWAS in the rutabaga accessions. Therefore, the more than 6000 SNP markers used for the GWAS should have sufficient power to perform a good association analysis. The variations in the LD decay among the 19 chromosomes were large and hence significant SNP-trait associations were extended to flanking regions 0.2 cM∼100 kb upstream and downstream of the identified genomic hotspots or candidate genes. Different GWAS models were tested to find the best fit model for identifying associations between the SNP and PCRbased markers and clubroot resistance. Clearly, the PCA + K model, which found 43 SNP markers to be strongly associated with clubroot resistance, overestimated the number of markers. In contrast the Q + K model, which detected 13 markers, underestimated the number of markers associated with clubroot resistance. Therefore, we combined the results from the two models to harness the strengths of the two methods. The 11 markers captured by the two MLM models were the most consistent, but we found that markers detected by either the PCA + K or the Q + K method only still had value. For example, the SNP marker Bn_A03_p21487106, which was detected only by the Q + K model, was associated with Leucine-rich repeat receptor kinases (LRR-RKs). The InDel marker B1005, which was detected only by the PCA + K model, was associated with the Toll and interleukin-1 receptor nucleotide binding site-Leucine rich repeat (TIR-NBS-LRR) protein. These disease resistance proteins play roles in host-specific and non-host-specific defense and wounding responses, as well as in the control of development, the stress response and senescence in plants (Torii, 2004;McHale et al., 2006;Passricha et al., 2018).
'Electronic PCR' (e-PCR) is a useful procedure to search for and position DNA sequences on reference genomes (Schuler, 1998;Rotmistrovsky et al., 2004;Fredua-Agyeman and Rahman, 2016). By positioning the markers identified in this study and markers from previous studies that co-segregated with clubroot resistance on the B. rapa reference genome sequences, we identified the top and bottom segments of the A03 chromosome and the middle segment of the A08 chromosome of rutabaga as genomic hotspots associated with resistance to the different P. brassicae pathotypes.
The SNP marker Bn_A03_p13610858 was located ∼786,000 nt upstream of the SCAR marker HC688-6, while the SNP marker Bn_Scaffold000164_p55747 was located ∼1,375,000 nt downstream of BrSTS61. The genomic region based on the SNP markers spanned ∼3,000,000 nt compared with that based on the PCR-based markers which spanned ∼765,000 nt and conferred resistance to the field isolates representing pathotypes 2B and 8P. The large physical distances between the SNP and the PCRbased markers located at the top half of the A03 chromosome, and the fact that the genes controlling clubroot resistance in this region conferred resistance to multiple P. brassicae pathotypes, suggest that the genes controlling clubroot resistance in this region are independent.
The second genomic hotspot was located at the bottom half of the A03 chromosome and conferred resistance to isolates representing 10 pathotypes. Fredua-Agyeman and Rahman (2016) identified the major resistance gene(s) in this genomic region to be the CRa and or the CRb Kato genes based on markers from Matsumoto et al. (1998Matsumoto et al. ( , 2012 and Kato et al. (2012Kato et al. ( , 2013. In the literature, three other clubroot resistance genes CRb (Piao et al., 2004;Zhang et al., 2014), Rcr1 (Chu et al., 2014;Yu et al., 2016), and BraA.CR.a (Hirani et al., 2018) were also mapped to the same genomic region. Marker TCR37 (23,826,564 nt)  and SC2930-2 (24,814,696 nt) (Matsumoto et al., 1998 flanked all five major clubroot resistance genes (CRa, CRb, CRb Kato , Rcr1, and BraA.CR.a) on the bottom half of the A03 chromosome and this genomic region spanned about 1 million (∼988,000 nt) nucleotides. In this study, a much smaller genomic region (∼260,000 nt) with flanking markers B1005 (24,376,816 nt) and KB29N19 (24637310 nt) conferred resistance to four SSIs (pathotypes 3H, 5I, 6M, and 8N) and six field isolates (pathotypes 2B, 3A, 5C, 5G, 5L, and 5K). In contrast to the top half, this study and the previous studies mapped the CRa, CRb, CRb Kato , Rcr1, and BraA.CR.a genes located at the bottom half of the A03 chromosome to <1 million nucleotides from each other. This strongly suggests that the aforementioned reported genes could be alleles. It also is possible, however, that they could be part of a gene cluster with each conferring resistance to the different pathotypes as reported by Zhang et al. (2014) and Yu et al. (2016).
The third genomic hotspot was located at the middle segment of the A08 chromosome with SSR marker A08-5021 (11,614,477 nt) being closest to the overlapping gene (Bra034629) and the SNP markers Bn_A08_p10123561 (10,122,198 nt) and Bn_scaff_16110_1_p2556157 (13,408,834) as the flanking markers. This genomic region conferred resistance to pathotypes and 3H, 5L and 5K in this study. Hirani et al. (2018) used linkage analysis to position the Crr1 gene between SSR markers S11R11 and S06R06, which by e-PCR was between 10,779598 and 10,970,334 nt. Hasan and Rahman (2016) reported that the Crr1 gene in the rutabaga cultivar 'Brookfield' was located in the genomic region (10,692,602-11,617,968 nt) and it conferred resistance to pathotypes 2F, 3H, 5I, 6M, and 8N. Unfortunately, e-PCR could not position any of the Crr1 markers used by Hatakeyama et al. (2013) on the B. rapa genome. However, markers for the amplification of the 2nd Intron and 3rd intron of the Crr1 gene were positioned between S11R11 (8,529,231 nt) and S27R27 (6,871,891 nt) on the B. napus genome. This study positioned the Crr1 gene in the same genomic region as was reported by the other three studies. Hatakeyama et al. (2013) reported two gene loci for the Crr1 gene: Crr1a which encodes a TIR-NB-LRR class of R proteins and has major effects, and Crr1b with minor effects but necessary to confer resistance to some P. brassicae isolates. The overlapping genes identified in this genomic region encoded a cyclase-associated protein, a DDB1-CUL4 associated factor and proteins of unknown molecular function. Therefore, functional analysis studies need to be carried out to get a full understanding of the role of the overlapping genes in this genomic region.

CONCLUSION
In conclusion, the rutabaga accessions identified in this study can be used as new donor resistant germplasm for the next generation of CR Brassica napus variety development. This is because markers linked to clubroot resistance in the rutabaga accessions mapped to genomic regions on the A03 and A08 chromosomes where almost all CR genes (CRk, Crr3, CRd, CRa, CRb Kato , Rcr1, CRb, and Crr1) on the A-genome are housed. In addition, the study identified novel clubroot resistance loci on the C-genome of rutabaga associated with resistance to different P. brassicae pathotypes from Alberta. Markers identified in this study need to be validated to find out whether they directly co-segregate with clubroot resistance or they are in linkage disequilibrium with a QTL that contributes to the resistance. The markers identified in this study will be valuable in MAS and the breeding of clubroot resistant cruciferous crops.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the text, figures, tables and Supplementary Material of the article.

AUTHOR CONTRIBUTIONS
RF-A: grant application, collection of germplasm, data analysis, writing of manuscript, and supervision of graduate student. ZY: seeding of plant materials, inoculum preparation, greenhouse inoculation experiments, washing of roots, disease rating, and writing of basic draft of manuscript. S-FH and SS: grant application, supervision and provision of technical support for all pathotyping work carried in the greenhouse, and revision of the manuscript.

FUNDING
Financial support for this work was provided by the (1) Alberta Crop Industry Development Fund Ltd. (ACIDF) under the project #2016C040R which was entitled "Screening for resistance to new pathotypes of Plasmodiophora brassicae for sustainable clubroot management." (2) Canola Council of Canada (CCC) under the project #ASC-02 which was entitled "The Canola Agriscience Cluster: Sustainable, Reliable Supply for a Changing World."

ACKNOWLEDGMENTS
We thank the Nordic Genetic Resource Center (NordGen, Sweden) for providing the rutabaga accessions used for this work. We are also grateful to Alberta Agriculture and Forestry for greenhouse space and logistic support. Lastly, we thank Alberta Canola, SaskCanola, the Manitoba Canola Growers Association, Agriculture and Agri-Food Canada, and the Canola Council of Canada for financial support over the years.
FIGURE S2 | Plots of correlation coefficient (r 2 ) vs. physical distance (Mb) for chromosomes C01, C04, C06, C07, C08, and C09. FIGURE S3 | Quantile-Quantile comparison of eight GWAS models for identifying clubroot resistance loci in 124 rutabaga (Brassica napus ssp. napobrassica) accessions tested with 17 field and single-spore isolates of Plasmodiophora brassicae representing 16 different pathotypes. The four GLM tested comprised the Naïve(N) (A), Kinship (K)-only (B), Population structure (Q)-only (C), and the Principal Coordinate Analysis (PCA)-only (D). The four MLM tested comprised Q + D (E), Q + K (F), PCA + D (G), PCA + K (H) models; where D is the Distance Matrix. The black line is the expected -log 10 P distribution while colored lines are the observed -log 10 P distribution for each of the 17 P. brassicae pathotypes.
TABLE S1 | List of 60 PCR-based primers linked to previously identified clubroot resistance genes used to genotype 124 rutabaga accessions.