Identification of Candidate Genes for Clubroot-Resistance in Brassica oleracea Using Quantitative Trait Loci-Sequencing

Clubroot caused by Plasmodiophora brassicae is a devastating disease of cabbage (Brassica oleracea). To identify quantitative trait loci (QTLs) for clubroot resistance (CR) in B. oleracea, genomic resequencing was carried out in two sets of extreme pools, group I and group II, which were constructed separately from 110 and 74 F2 cloned lines derived from the cross between clubroot-resistant (R) cabbage “GZ87” (against race 4) and susceptible (S) cabbage “263.” Based on the QTL-sequencing (QTL-Seq) analysis of group I and group II, three QTLs (i.e., qCRc7-2, qCRc7-3, and qCRc7-4) were determined on the C07 chromosome. RNA-Seq and qRT-PCR were conducted in the extreme pools of group II before and after inoculation, and two potential candidate genes (i.e., Bol037115 and Bol042270), which exhibiting upregulation after inoculation in the R pool but downregulation in the S pool, were identified from the three QTLs on C07. A functional marker “SWU-OA” was developed from qCRc7-4 on C07, exhibiting ∼95% accuracy in identifying CR in 56 F2 lines. Our study will provide valuable information on resistance genes against P. brassicae and may accelerate the breeding process of B. oleracea with CR.


INTRODUCTION
Clubroot disease caused by the obligate parasite Plasmodiophora brassicae is a devastating disease that affects brassica species worldwide including important crops such as Brassica oleracea, Brassica rapa, and Brassica napus (Dixon, 2009;Chu et al., 2014). The pathogen usually causes gall formation on plant roots, which may subsequently hamper the uptake of sufficient nutrients and water, leading to abnormal growth of plants, and finally resulting in severe yield losses and economic damage to the crop (Devos et al., 2005). It is hard to control this disease once a field is contaminated with the pathogen since the resting spores of the soil-borne pathogen can survive in the soil for as long as 20 years (Chu et al., 2014). Cultural management and chemical fungicides are currently the common approaches to control clubroot (Friberg et al., 2005;Ludwig-Müller et al., 2009;Peng et al., 2015), but the efficiency is unstable and the environmental damage from fungicides cannot be ignored. Therefore, developing resistant (R) cultivars is the most effective, economical, and environment-friendly way to control clubroot.
In a previous study , several QTLs for traits, such as root length and P. brassicae content in roots, were identified from a hybrid cabbage (B. oleracea L. var. capitata) variety "GZ87, " which was R to P. brassicae race 4. However, QTLs for CR in "GZ87" have not been identified yet. In this study, QTLsequencing (QTL-Seq) strategy  was used to identify QTLs using genome resequencing in two sets of extreme pools. Potential candidate genes were identified from important QTL regions, and a functional marker was further developed for molecular-assisted selection. Our study will provide important information for the breeding of clubroot-R cabbage.

Pathogen Isolates
Inoculum (resting spores) were obtained from clubbed roots of infected plants collected from Fuling, Chongqing, China (E107.6459, N29.5709), where P. brassicae race 4 is the dominant pathogen causing clubroot. The clubbed roots were washed and stored at −20 • C. Before inoculation, resting spores were extracted from clubbed roots by homogenizing the clubroot galls. The homogenate was filtered through a double gauze and then adjusted to a concentration of 4 × 10 7 spores/ml for inoculation.

Plant Materials and Resistance Test
Two cabbage lines, "GZ87" (with CR to P. brassicae race 4) and "263" [susceptible (S) to P. brassicae race 4], were selected as parents to develop an F2 segregating population comprising 184 cloned lines through tissue culture following the asexual reproduction technology (Luo et al., 2000). Vegetative plants were transplanted into 72-well plug trays after rooting and kept in a phytotron (16/8 h light/dark cycle under 25 ± 2 • C). The inoculation of P. brassicae was conducted 1 week after transplanting, in which each plant was inoculated with 5 ml P. brassicae resting spores suspension (4 × 10 7 spores/ml) at the stem base using an irrigation method (Nagaoka et al., 2010;Peng et al., 2018). Notably, 15 plants of each line with three replications were inoculated. Plants were rated at 6 weeks after inoculation according to Chen J. et al. (2013) on a 0-4 scale ( Figure 1C), where 0 = no clubs, 1 = less than three small clubs on the lateral roots, 2 = less than three large clubs on the lateral roots, or more small clubs, or small clubs on the taproot, 3 = large clubs on the taproot and lateral roots or many large clubs on the lateral roots, 4 = taproot rots, no lateral roots, or less. Disease index (DI) was calculated according to the following formula (Strelkov et al., 2006): DI = [(0n0 + 1n1 + . . . + 4n4) × 100]/(4 × NT), where n0-n4 are the numbers of different plants in different disease grades, and NT is the total number of plants tested. The group resistance grading standard described in the study by Liu et al. (2018) was followed: DI = 0, highly R; DI < 10, R; 10 ≤ DI < 20, moderately S; 20 ≤ DI ≤ 50, S; and DI > 50, highly S. Disease incidence (DIC) was recorded as the percentage of diseased plants in the total number of inoculated plants.

Whole-Genome Resequencing and Quantitative Trait Loci-Seq Analysis
To improve the reliability of QTL mapping, the F2 segregating population was randomly divided into two small groups, in which group I and group II contained 110 and 74 F2 lines, respectively. Each of 17/20 extreme R and S lines was selected from group I/II. Genomic DNA was extracted from young leaves of two parents and the extreme lines following the cetyltrimethylammonium bromide (CTAB) method (Doyle, 1991). An R pool and an S pool were constructed within each group by mixing an equal volume of DNA from the extreme R and S lines. The two pools in group I and two parents were subjected to whole-genome resequencing on an Illumina HiSeq X Ten platform by using a biomarker (Beijing, China), while the two pools in group II and parents were sequenced on a NovaSeq 6000 platform by using the same service provider. Reads with low-quality bases (Q ≤ 20) or with an N ratio over 10% in the pooled samples and over 1% in parental lines were removed. Then, the high-quality reads were mapped to the reference genome of B. oleracea 1 using BWA software (Li and Durbin, 2009). The reads filtering and single nucleotide polymorphisms/insertions or deletions (SNPs/indels) calling were conducted according to Zhou et al. (2015).
The QTL-Seq analysis was performed in accordance with the method of Takagi et al. (2013). The SNP-index is calculated for each SNP and relating SNP-index and chromosome positions are obtained for both the R and S pools separately. The two SNP-index are compared to identify the region with (SNPindex)? value that is specific to the disease resistance. In brief, the (SNP-index) was calculated for each locus (SNP or indel), and a sliding window analysis was applied to generate the (SNPindex) curve with a window size of 100 Kb and increment of 10 Kb (Fekih et al., 2013). Significant SNP index values (the 1% right tail) were identified as the empirical thresholds, where the threshold value was 0.34 for samples in group I and 0.35 for samples in group II.

Transcriptome Sequencing
The whole roots were collected from extreme lines in group II at 0, 4, 7, and 14 days after inoculation (DAI), with three biological replications. Roots from R and S lines were mixed at each time point, respectively, resulting in 24 samples (R0, R4, R7, R14, S0, S4, S7, and S14, three biological replications for each) for RNA extraction and transcriptome sequencing (RNA-seq). Total RNA was extracted from these pooled root samples using TRNzol-A + Reagent (TianGen, Beijing, China), and 24 library preparations were generated and sequenced on an Illumina Hiseq 2000 TM platform using a biomarker. Clean reads were aligned to the B. oleracea reference genome. Fragments per kilobase of transcript per million fragments mapped (FRKM) was used as the indicator of the gene expression level. DESeq2 was used for identifying differentially expressed genes (DEGs) (Love et al., 2014). Fold change (FC) ≥ 2.0 and false discovery rate (FDR) < 0.01 were used as the screening criteria of DEGs.

Functional Marker Development
According to the genome sequence of the interested region, a common polymorphic site between two parents and between extreme pools was developed for a molecular marker, of which the primers were 5 -TACACACGCTGATATACCAACA-3 (forward) and 5 -TACACACGCTGCCCTGGAAA-3 (reverse). The PCR amplification was conducted among B. oleracea lines in group II. The PCR products were separated into 1.5% agarose gels and visualized under UV light.

Clubroot Resistance of Parents and Extreme Pools
Clubroot resistance was investigated in two parental lines and the two F2 groups across 2 years (Supplementary Table 2 and Figure 1B). When tested together with group I (the 1st year), "GZ87" exhibited a high resistance level to P. brassicae (mean DI of 6.3), while "263" showed an average DI value of 40 ( Figure 1A). The two parental lines exhibited obvious differences in DI when tested together with group II (DI GZ87 = 0 and DI 263 = 56.3). Each of 17 and 20 lines was selected from the 110 and 74 F2 vegetative lines in group I and group II, respectively, presenting an average DI of 10.6 and 57.4 in the R and the S pools, respectively, in group I and 1.3 and 49.0 in the two pools in group II ( Table 1).

Identification of Quantitative Trait Loci
An average of 10.2 Gb clean data (over 20 × of the reference genome) was yielded for each parental line and extreme pool in group I, with a content: guanine-cytosine content (GC) content of 37.7% and Q30 > 92.2%. A range of 96.7-97.3% of the clean reads was aligned to the reference genome of B. oleracea. A total of 1,338,555 high-quality SNPs and 3,381,999 indels were detected between the parents and between two pools (Supplementary Table 3), while in group II, an average of 14.2 Gb clean data was yielded for each sample, with a GC content of 37.6% and Q30 > 91.26%. A range of 88.67-90.09% of the clean reads was aligned to the reference genome of B. oleracea, revealing 732,713 high-quality SNPs and 30,624,760 indels for the following QTL-Seq analysis (Supplementary Table 3).
After calculating (SNP-index) of each locus and visualizing the (SNP-index) trends by using the sliding window method within each group, 1/4 peaks were detected in group I/II with higher values than corresponding thresholds ( Table 2). One QTL  Table 2), were detected in group II. Comparing QTLs between two groups, the three QTLs detected in group II were all located within the locus qCRc7-1 identified in group I (Figure 2). Therefore, the three QTLs (i.e., qCRc7-2, qCRc7-3, and qCRc7-4) were focused on the following analysis.

Potential Candidate Genes for Clubroot Resistance
The RNA-seq was conducted in R and S pools of group II to investigate the gene expression in three QTLs (i.e., qCRc7-2, qCRc7-3, and qCRc7-4). Over six Gb clean data were obtained from each sample (Q30 > 93%), of which 72.6-74.2% were aligned to the reference genome of B. oleracea (Supplementary Table 4). Compared to 0 DAI, 540/500/1837 and 2656/1518/4182 DEGs were detected from the R and the S pool at 4/7/14 DAI, respectively. In total, 11/16/28/16 DEGs were detected from 123/79/134/99 genes with large amounts of SNPs and indels located within the CIs of qCRc4-1/qCRc7-2/qCRc7-3/qCRc7-4 ( Table 2 and Supplementary Table 5). Then, 45 DEGs with high expression levels (the selection criterion is the average expression of samples with three replicates > 5) among the 71 DEGs in the CIs were selected for further screening of candidate genes. According to the heatmap analysis of the 45 DEGs, the expression levels of most genes in the S pool were higher than those in the R pool at the 4, 7, and 14 DAI, indicating that the response to P. brassicae may be more dramatic in S pool (Figure 3). Within these 45 DEGs, eight DEGs exhibiting obvious differential expression patterns between the R and the S pool were found to be located in the overlap regions on chromosome C07. The qRT-PCR revealed consistent expression patterns with RNA-seq for six genes among these eight genes (Figure 4). Among the six genes, four genes (i.e., Bol017643, Bol017632, Bol024384, and Bol024340) were generally not induced by the pathogen in the R pool, and only two genes (i.e., Bol037115 and Bol042270) were upregulated after inoculation in the R pool but downregulated in the S pool. Of these, Bol037115 [annotated as an FCS-like zinc finger (FLZ) protein] was located in qCRc7-2, with seven SNPs and one indel between two pools, and Bol042270 [plant intracellular Rasgroup-related leucine-rich repeat sequences (LRR) protein 8] was located in qCRc7-4, with nine SNPs and six indels between the two pools. The two genes were considered as potential candidates for future certification.

Functional Marker for Clubroot Resistance
According to the deep comparison between two parents and between extreme pools, a PCR primer pair named "SWU-OA" was designed within the interval of qCRc7-4 (Figure 2). After the amplification by SWU-OA in 54 F2 lines in group II, 28 lines that exhibited the absence of the same bands (400 bp) as the R parent GZ87 were all S to the pathogen, with DI values   (Figure 5). It is indicated that the SWU-OA exhibited ∼95% accuracy in identifying CR in 56 F2 lines and possibly provided a way to accelerate the breeding process of B. oleracea with CR.

DISCUSSION
In the past, QTLs were mainly identified based on the genetic linkage map constructed by molecular markers, such as random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), and simple-sequence repeats (SSR) markers (Landry et al., 1992;Suwabe et al., 2006;Nagaoka et al., 2010), which is usually time-consuming (involves labor-intensive genotyping work). With the rapid development of sequencing technology in recent years, many researchers have combined bulked segregation analysis (BSA) with the whole genome and transcriptome to search for candidate genes (Dakouri et al., 2018;Zhou et al., 2020). With the complement of B. oleracea genome information, the QTL-Seq approach  combined with transcriptome sequencing made it easier to identify potential candidate genes for traits of interest.
FIGURE 3 | The heatmap analysis of the expression of 46 differentially expressed genes (DEGs) underlying the CIs of qCRc4-1, qCRc7-2, qCRc7-3, and qCRc7-4. The average expression values of genes in different inoculation times were denoted at the right with a color scale, in which green, black, and red color indicated the low, medium, and high level of expression, respectively.
As compared with that of B. rapa and B. napus, fewer CR loci were identified from B. oleracea possibly due to the lack of R resources. The reported CR loci of B. oleracea were detected from chromosomes C01 (Pb3, PbBo1), C02 [Pb-Bo(Anju)1, Pb-Bo(Anju)2, CRQTL-YC], C03 [Pb-Bo(Anju)3, Pb4], C07 [Rcr7, Pb-Bo(Anju)4], and C09 (CRQTL-GN_1, CRQTL-GN_2) (Voorrips et al., 1997;Moriguchi et al., 1999;Rocherieux et al., 2004;Nagaoka et al., 2010;Lee et al., 2015;Dakouri et al., 2018). In our study, we identified four QTLs for CR, of which one located on chromosome C04 and the other three located adjacently on C07. No CR-QTL has been detected before on C04 in B. oleracea, but two CR-QTLs (SCR-C4a and SCR-C4b) were found from chromosome C04 in B. napus (Li et al., 2016). By aligning sequences of the markers in SCR-C4a and SCR-C4b to the reference genome of B. oleracea, the two QTLs were aligned to 2.49-2.51 Mb and 8.06-8.10 Mb on C04 of B. oleracea, which were obviously distant from the interval of our qCRc4-1 (16.92-18.79 Mb). By using the same approach, CR-QTLs on chromosome C07 were compared among studies in both B. oleracea and B. napus. In B. oleracea, the CR loci PbBo(Anju)4 (Nagaoka et al., 2010) was found to be located in 37.93-39.25 Mb, which was partially overlapped with our qCRc7-2 (38.96-39.52 Mb). However, qCRc7-2 may have a limited effect on CR since PbBo(Anju)4 was reported to be with a very small effect (R 2 = 0.03), and we failed in finding the candidate resistance gene from this region. The other two QTLs for CR on C07,i.e., Mb), were found to be located nearby but not overlapped with  Mb), which is a major QTL in B. oleracea for resistance against P. brassicae pathotypes 3 and 5X (Dakouri et al., 2018). In addition, the most possible candidate gene for Rcr7 (Bo7g108760, a TIR-NBS-LRR disease resistance gene) (Dakouri et al., 2018) was not induced by P. brassicae race 4 in this study (FPKM of 0.5,0.6,0.4,and 0.4 in R0,R4,R7,and R14;1.6,0.5,0.8,and 0.6 in S0, S4, S7, and S14, respectively), whereas, an SNP association analysis in B. napus detected a significant locus from chromosome C07 at 42.02-42.22 Mb, which is located within the homologous region of our qCRc7-3. These suggest that FIGURE 4 | Comparison of expression patterns of nine DEGs located within C07 QTL regions in R and S pools before and after inoculation. The relative expression in qRT-PCR was indicated by bars with ordinate on the left; the FPKM in RNA-seq was presented by lines with ordinate on the right side. chromosome C07 of B. oleracea possibly carries multiple QTLs for CR, and qCRc7-4 is probably a novel CR locus.
In a previous study, 23 QTLs for three CR-associated traits were identified from the same segregating population between "GZ87" and "263, " including DIC, numbers of fibrous roots (NFR), and P. brassicae content in roots (PCR) . Of these, four QTLs (i.e., NFR.I-3, NFR.I-4, NFR.II-4, and PCR.II-3) were located on chromosome C04 and two QTLs (i.e., NFR.II-6 and NFR.II-7) were located on chromosome C07. However, after a comparison of the physical positions of these loci, none of them was overlapped with the QTLs for CR found in this study. The closest loci between the two studies were NFR.II-6 (47.77-48.30 Mb) and qCRc7-4, which showed a distance of 3.62 Mb on chromosome C07. This suggests that DIC, NFR, and P. brassicae content in roots may not be representative indicators for CR, which is usually determined by DI.
A total of 312 genes were found to be located in the three QTL regions on C07, including six R genes encoding TIR-NBS-LRR disease resistance proteins. However, none of these R genes presented an expression difference between R and S pools in RNA-seq (Supplementary Table 6). Although 61 DEGs were identified from the three regions, most of them presented similar expression patterns between the two pools excepting eight genes. Among these, an FLZ domain protein (Bol037115) and a plant intracellular Ras-group-related LRR (PIRL) protein (Bol042270) exhibited over threefold upregulation after inoculation in the R pool but with downregulation in the S pool. The FLZ domain proteins are implicated in the regulation of various biotic and abiotic stresses (Chen X. et al., 2013;Jamsheer and Laxmi, 2015). PIRLs encode a plant-specific class of leucine-rich repeat proteins related to Ras-interacting LRRs that take part in developmental signaling in animals and fungi (Forsthoefel et al., 2005). It seems that both the two candidate genes are potentially involved in response to hormones, which are tightly associated with host response to pathogens. Nevertheless, further studies are needed to validate the roles of these two genes in CR, as well as unravel their resistance mechanisms.
Resistant varieties are of great importance in the control of clubroot of Brassica crops, but breeding in B. oleracea has been largely unsuccessful due to the lack of highly R sources and the complexity of this quantitative trait (Laurens and Thomas, 1993;Nagaoka et al., 2010;Tomita et al., 2013). Although a few QTLs have been identified in B. oleracea, the effects of these QTLs were usually not as high as expected, and usable markers for marker-assisted selection (MAS) were quite limited recently. For example, Tomita et al. (2013) identified several QTLs for CR in B. oleracea, but the major QTL PbBo(Anju)1 was not enough to produce sufficient resistance against P. brassicae, while the genotype contained five CR-QTLs produced high resistance; Nomura et al. (2005) identified three CR-QTLs from the cross between a cabbage and a kale line and found that the cumulation of those three QTLs showed high resistance to three isolates of P. brassicae, whereas the mean DI in the plants carrying only single QTL was intermediate. Therefore, multigene pyramiding breeding centering on MAS is necessary for the breeding of R varieties in B. oleracea. In our study, several QTLs were identified, and a molecular marker (SWU-OA) that developed from the polymorphic region within qCRc7-4 was effective in distinguishing the R or S F2 lines (with an accuracy of 95%). This suggested a great potential of this marker to be applied in MAS of offspring with CR, as well as in the pyramiding of our QTL with other CR-QTLs to create new B. oleracea resources. This is of practical significance in breeding of high R B. oleracea varieties with multiple CR-genes (QTLs).

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI, BioProject ID: PRJNA735118.