Comparative genomic analysis of duplicated homoeologous regions involved in the resistance of Brassica napus to stem canker

All crop species are current or ancient polyploids. Following whole genome duplication, structural and functional modifications result in differential gene content or regulation in the duplicated regions, which can play a fundamental role in the diversification of genes underlying complex traits. We have investigated this issue in Brassica napus, a species with a highly duplicated genome, with the aim of studying the structural and functional organization of duplicated regions involved in quantitative resistance to stem canker, a disease caused by the fungal pathogen Leptosphaeria maculans. Genome-wide association analysis on two oilseed rape panels confirmed that duplicated regions of ancestral blocks E, J, R, U, and W were involved in resistance to stem canker. The structural analysis of the duplicated genomic regions showed a higher gene density on the A genome than on the C genome and a better collinearity between homoeologous regions than paralogous regions, as overall in the whole B. napus genome. The three ancestral sub-genomes were involved in the resistance to stem canker and the fractionation profile of the duplicated regions corresponded to what was expected from results on the B. napus progenitors. About 60% of the genes identified in these duplicated regions were single-copy genes while less than 5% were retained in all the duplicated copies of a given ancestral block. Genes retained in several copies were mainly involved in response to stress, signaling, or transcription regulation. Genes with resistance-associated markers were mainly retained in more than two copies. These results suggested that some genes underlying quantitative resistance to stem canker might be duplicated genes. Genes with a hydrolase activity that were retained in one copy or R-like genes might also account for resistance in some regions. Further analyses need to be conducted to indicate to what extent duplicated genes contribute to the expression of the resistance phenotype.


Introduction
Polyploidization is a common mode of evolution in flowering plants which occurs through genome merging (allopolyploidy) or genome doubling (autopolyploidy) and results in an increased gene set (Doyle et al., 2008;Wang et al., 2012). The evolution of duplicated genes after whole-genome duplication (WGD) has been studied extensively. In various plant species, a large proportion of duplicated genes are lost after a polyploidization event, a phenomenon called fractionation or diploidization, while a smaller proportion remains in a duplicated state Jiang et al., 2013). Several hypotheses have been put forward to describe the fate of duplicated genes after WGD, including loss of function, neo-functionalization, sub-functionalization or functional redundancy (Freeling, 2008;Edger and Pires, 2009;Freeling, 2009;Wang et al., 2012).
The loss/retention of duplicated genes is not a random process and, from studies in various plant species, it appears to depend on gene functional category. Thus for example, in Arabidopsis thaliana, over-retained genes are involved in basic cellular machinery, nucleotide-sugar metabolism, signal transduction or regulatory functions, while the diploidized genes are involved in DNA repair, tRNA ligation or defense (Blanc and Wolfe, 2004). The loss/retention of duplicated genes might also depend on their parental origin. In this case, one of the parental genomes is more likely to retain genes and has a higher gene density than the other(s) genome(s). This phenomenon, referred to as biased fractionation, has been demonstrated in several species including Zea mays (Schnable et al., 2011), Triticum aestivum (Pont et al., 2013), Arabidopsis thaliana (Thomas et al., 2006), and Brassica rapa . Furthermore, expression bias, as described by Wang et al. (2011), has been reported in allopolyploid species: when considering homoeologous gene pairs in allopolyploids, mRNA transcripts from the sub-genome that retains more genes tend to be more highly expressed and thus contribute more to the total transcriptome (Bardil et al., 2011;Schnable et al., 2011;Yoo et al., 2013;Woodhouse et al., 2014).
In several polyploid crop species such as wheat, soybean, cotton and strawberry, it has been shown that duplicated regions were involved in controlling complex agronomical traits (Axelsson et al., 2001;Rong et al., 2007;Liu et al., 2011;Combes et al., 2012;Lerceteau-Kohler et al., 2012). However, there is a lack of studies dissecting the genomic structure and function of these duplicated regions. A recent study in soybean showed that one major QTL (Quantitative Trait Loci) involved in the control of oil and protein seed content on chromosome 20 and its homoeologous region on chromosome 10, were divergent in gene content and gene density (Severin et al., 2011;Lestari et al., 2013). Both expression bias and expression equivalence were observed at the transcriptomic level in these homoeologous regions: out of the 89 homoeologous gene pairs expressed in both regions, 40% showed bias (higher transcript accumulation) toward the homoeologs on chromosome 10. To fully understand both the genetic control and the mechanisms underlying complex traits in polyploid plant species, it is worthwhile to take into account the evolutionary processes through the dissection and comparison of the genomic regions underlying homoeologous duplicated QTL.
Brassica napus is an outstanding model for investigating structural and/or functional comparisons of duplicated regions involved in the control of complex traits. Firstly, the B. napus genome is highly duplicated. Indeed, B. napus (2n = 4x = 38, genome AACC) is an allotetraploid species formed from the hybridization between Brassica rapa (2n = 2x = 20, A genome) and Brassica oleracea (2n = 2x = 18, C genome) (U, 1935). Brassica ancestors have undergone two duplication events (named α and β) and two triplication events of which the most recent is specific to the Brassica clade (Jenczewski et al., 2013). These WGD events, along with the merger of the two progenitor genomes, have resulted in a large number of duplicated regions in the B. napus genome. Secondly, the organization of duplications in the B. napus genome has been well characterized. The ancestral Brassicaceae genome was reconstructed in 24 genomic blocks (A-X) also called the ancestral karyotype blocks (AK blocks). Theses blocks have been identified in B. rapa Cheng et al., 2013), B. oleracea (Kaczmarek et al., 2009), and B. napus (Parkin, 2011;Delourme et al., 2013). Moreover, the structural organization of the hexaploid Brassica ancestor genome was determined from whole-genome sequence analyses of the B. rapa (Tang and Lyons, 2012;Cheng et al., 2013) and B. oleracea Parkin et al., 2014) genomes. The ancestral Brassica genome contains three sub-genomes showing different levels of fractionation: the LF sub-genome corresponds to the less fractionated genome with the highest gene density, the MF1 sub-genome is the moderately fractionated sub-genome with moderate gene density and sub-genome MF2 is the most fractionated genome with the lowest gene density (Cheng et al., 2013;Liu et al., 2014;Parkin et al., 2014). The organization of these sub-genomes has not yet been elucidated in B. napus. However, the recently released genomic sequence of B. napus showed good collinearity between the A/C genomic sequences of B. napus and the A and C genomes of B. rapa and B. oleracea (Chalhoub et al., 2014). The organization of LF, MF1 and MF2 can thus be extrapolated to the B. napus genome. Thirdly, B. napus is closely related to the model plant species Arabidopsis thaliana [diverged ∼20 million years ago (MYA)] thus allowing functional divergences within B. napus duplicated regions to be explored further based on the function of the A. thaliana orthologous genes. This relatedness between A. thaliana and B. napus has been successfully used in previous comparative genetic and genomic analyses to identify candidate genes underlying QTL or to find putative functions of targeted genes (Cai et al., 2012;Zou et al., 2012). In B. napus, homoeologous regions were shown to be involved in the control of important traits such as seed glucosinolate content (Howell et al., 2003), flowering time (Wang et al., 2009), yield-related components (Chen et al., 2007) and resistance to sclerotinia stem rot (Zhao et al., 2006;Wei et al., 2014). Comparative genomic analyses of homoeologous B. napus regions involved in polygenic traits would then give the opportunity to study the impact of genome duplications on the structural and functional organization of such regions in a highly duplicated genome.
Stem canker caused by the fungus Leptosphaeria maculans is one disease of oilseed rape causing serious losses in Europe, Australia and North America (West et al., 2001;Fitt et al., 2006). Both qualitative and quantitative resistances were identified in B. napus or in related species (Delourme et al., 2006;Hayward et al., 2012;Raman et al., 2012). Qualitative resistance is based on a gene-for-gene interaction, which is expressed from the seedling stage. More than 10 specific resistance genes (R genes) have been identified in B. napus and related Brassica species B. rapa, B. juncea, and B. nigra (Rlm1-11, RlmS, LepR1-4) (Yu et al., 2005(Yu et al., , 2008(Yu et al., , 2013Delourme et al., 2006;Van de Wouw et al., 2009;Balesdent et al., 2013), some of which are organized in clusters on B. napus chromosomes (Delourme et al., 2004(Delourme et al., , 2006. Since this resistance is total, it exerts strong selection pressure on the fungal populations that rapidly adapt and varieties whose resistance is based solely on such major genes lose their effectiveness only three-four years after their release (Brun et al., 2000;Li et al., 2003;Rouxel et al., 2003). Quantitative adult-plant resistance, which is a partial, polygenic resistance mediated by QTL is considered as non-race specific (Delourme et al., 2006) and more durable but its effectiveness varies between cropping seasons due to environmental conditions. Qualitative resistance protects the plant from development of leaf lesions at the young plant stage and subsequently prevents the development of stem canker at the adult plant stage. Thus, effective detection of quantitative resistance in field conditions is only possible in the absence of effective R genes. The use of quantitative resistance alone or in combination with qualitative resistance is a way to maximize the effectiveness and durability of the resistance (Brun et al., 2010). These findings highlight the interest and need to strengthen quantitative resistance studies in order to help breeding and improve disease management. Genetic studies of quantitative resistance showed that many QTL in different genetic backgrounds controlled this resistance (Pilet et al., 1998(Pilet et al., , 2001Kaur et al., 2009;Jestin et al., 2011Jestin et al., , 2012Jestin et al., , 2015Raman et al., 2012). Recently, using linkage and genomewide association (GWA) analyses, we showed that a large proportion of regions (at least 44%) involved in this quantitative resistance corresponded to duplicated homoeologous regions. These regions were located mainly in duplications of the five AK blocks E, J, R, U, and W (Fopa Fomeju et al., 2014). In the present study, we conducted a comparative genomic analysis of these duplicated homoeologous regions involved in the control of the quantitative resistance to stem canker in B. napus. We investigated the similarity/divergence of the duplicated regions in terms of gene content and putative functions of retained genes. We took advantage of the recently released B. napus genome sequence (Chalhoub et al., 2014) for the structural analysis of the duplicated regions. B. napus-A. thaliana relatedness was used to explore gene ontology (GO) categories and the function of genes located in the duplicated regions. According to the functions of the genes that were retained in all duplicated regions or were specific to some regions, we discussed the putative functions associated to the stem canker quantitative resistance.

Materials and Methods
Association Mapping Panels, Genotyping, and Phenotyping Data A panel of 170 winter B. napus varieties was used (Table S1). The panel comprises modern oilseed rape (OSR) varieties with low seed erucic acid and glucosinolate content (00 quality) and old varieties with high seed erucic acid and high glucosinolate content (++ quality). The varieties were chosen to represent a wide range of winter oilseed rape diversity and derived from different breeding programs. The absence of any effective specific resistance genes conferring total resistance in our field trial conditions was also checked in order to only evaluate the level of quantitative resistance. This was deduced from cotyledon tests using isolates carrying known AvrLm genes (Balesdent et al., pers. comm.). Thus, we excluded OSR varieties known as carrying Rlm7, a highly effective gene that was recently introduced in OSR varieties in France. This panel was genotyped with single nucleotide polymorphism (SNP) markers using a Brassica 60K SNP BeadChip Array  and evaluated for resistance to stem canker in a field trial in 2013. One hundred and nine OSR varieties were common to a panel of 116 OSR varieties phenotyped for resistance to stem canker in a field trial in 2006 (Table S1) (Fopa Fomeju et al., 2014).
DNA was isolated from young leaves and DNA extracted using the DNeasy 96 Plant Kit (Qiagen, Courtaboeuf, France).
DNA was quantified with the Quant-iT ™ PicoGreen R Assay (Invitrogen, Carlsbad, USA), using the Appliskan multiplate reader (Thermo Scientific, Courtaboeuf, France). Concentrations were adjusted to a minimum of 50 ng/µL and were submitted to a provider, where the Infinium R assay was performed following the manufacturer's protocol (Illumina Inc., San Diego, USA). The automatic allele calling for each locus was accomplished using the Genome Studio software (Illumina Inc., San Diego, USA). The clusters were manually edited when necessary. Technical replicates and signal intensities were controlled and only the most reliable calls were retained.
Stem canker was evaluated at one location in a α-design (3 columns and 5 rows) with three replicates (INRA Le Rheu) in France in 2012/13. In each replicate, three control varieties ("Eurol" and "Goeland" as moderately susceptible and "Jet Neuf " as resistant) were replicated in each row × column block in order to check the homogeneity of the trial. Contaminated residues collected from the previous season were scattered throughout the plots 1 month after sowing in order to ensure homogeneous disease infection. Leaf lesions were assessed on each plot at the autumn using a 0-4 scale according to the number of plants with leaf lesions and the number of leaf lesions per plant. Stem canker severity was assessed 2-3 weeks before harvest (mid-June). Forty plants per plot were uprooted and stem canker was assessed on a 0-9 disease index (DI) (Aubertot et al., 2004). DI increases with crown canker severity, starting from zero for healthy plants to nine for completely lodged plants. The 2006 phenotypic data were obtained from a similar experimental design (Jestin et al., 2011).

Association Mapping
Genome wide association (GWA) analyses were performed on two panels (i) panel 1: the sub-panel of 109 OSR varieties phenotyped in 2006 and (ii) panel 2: the whole panel of 170 OSR varieties phenotyped in 2013. A total of 13,973 informative SNPs that were mapped on our "Darmor-bzh" × "Yudal" doubled haploid population was selected for association analyses. GWA was carried out as described in Fopa Fomeju et al. (2014) using the GAPIT package (Lipka et al., 2012). Markers with a major allele frequency (MajAF) greater than 0.95 and genotypes or markers with more than 10% of missing genotyping data were eliminated to prevent analyses bias. We used a mixed linear model taking into account the structure within the panel as a fixed effect, and the relatedness between genotypes as a random effect. The calculation of the P matrix, which took into account the panel structure, was based on Principle Component Analysis (PCA) and was performed with EIGENSTRAT software (Price et al., 2006). The calculation of the kinship, or K, matrix was based on an identity by state matrix for all SNP markers and was directly performed by GAPIT (Lipka et al., 2012). The proportion of false positive associations was controlled by a False Discovery Rate (FDR) test at 10%.

Delineation of the A. thaliana and B. napus Genomic Sequences to Investigate
The flanking sequences (100-300 bp, median size: 120 bp) of SNPs were used as queries for BLASTn alignments against A. thaliana nucleic sequences in TAIR10 (https://www.arabidopsis. org/) with an E-value threshold of 10 −6 . Best gene hits and the corresponding AK block in A. thaliana were anchored on the genetic B. napus reference map. Thus for each genetic region with resistance-associated markers in B. napus, the corresponding A. thaliana gene interval was deduced. We took advantage of our present GWA studies and of the one of Fopa Fomeju et al. (2014) to delineate these intervals. They included all the resistance-associated markers identified in all corresponding B. napus duplicated regions.
The genomic regions thus delineated were extracted from the A. thaliana genome (https://www.arabidopsis.org/) and aligned against the whole B. napus genome sequence (Chalhoub et al., 2014) to find the corresponding orthologous duplicated regions in B. napus. This reciprocal alignment was conducted to make sure that (i) we investigated all the corresponding duplicated regions in B. napus and (ii) we searched for the same ancestral interval on all the duplicates in the B. napus genome. The software MumMer v3.0 (Kurtz et al., 2004) was used to conduct a nucleotide versus nucleotide alignment ("NUCmer" command) between each A. thaliana region and the B. napus genome. The default alignment parameters were used (http://mummer. sourceforge.net/manual/#nucmer).
For a visual overview of the synteny and the collinearity between the identified B. napus duplicated regions thus identified, we used the software MAUVE v2.3.1 (Darling et al., 2010).

Structural Analysis of the Duplicated Regions with Resistance-associated Markers
We took advantage of the close relatedness between B. napus and A. thaliana to identify the duplicated genes in the B. napus regions. The annotated B. napus genes (Chalhoub et al., 2014) were translated and aligned against the bank of A. thaliana proteins (https://www.arabidopsis.org/). When two different translated B. napus genes showed the best matches to the same A. thaliana protein then we considered that the two B. napus genes were likely to be homoeologous (if one gene was on the A genome and the other on the C genome) or paralogous (if the two genes were on the same genome).
We compared the duplicated regions in terms of interval length, number of genes and gene density. For each block, we also estimated the level of retention of A. thaliana genes in the duplicated regions. The results were examined in relation to the putative organization of the ancestral sub-genomes LF, MF1, and MF2. The organization of these sub-genomes was reported in B. napus from findings in B. rapa (Cheng et al., 2013) and B. oleracea Parkin et al., 2014) genomes. We then looked at the distribution of the retained gene copies in the different duplicated regions in B. napus. Indeed, for a given block, the A. thaliana genes can be present in one of the duplications in B. napus or in up to six duplicated regions. Furthermore, in a given B. napus region, genes can be present in one copy (single-copy) or in two or more copies (multicopies).

In silico Functional Analysis of the Duplicated Regions with Resistance-associated Markers
We deduced Gene Ontology (GO) categories from the A. thaliana best blast hits in order to functionally characterize and compare the duplicated B. napus regions between each other and with A. thaliana orthologous regions. The GO terms were obtained from the most recent release of predicted gene locus in the TAIR database (https://www.arabidopsis.org/).
For GO term enrichment analysis, we used the online analysis tool "Singular Enrichment Analysis" (SEA) from AgriGO (http:// bioinfo.cau.edu.cn/agriGO/analysis.php). A Fisher test with the Yekutielli multi-test correction method (significance threshold = 0.05) was used. The GO type chosen was the Plant GO slim. For comparison of GO terms of the genes present in 5-6 vs. 1-2 duplicated regions of our studied five blocks, we used a contingency Chi-2 test (α = 0.001).
Finally, nucleotide binding site-leucine rich repeat (NBS-LRR) genes identified in the B. napus genome (Chalhoub et al., 2014), genes predicted in the B. napus genome as being putative resistance gene analogs in A. thaliana, and known major stem canker resistance genes were searched for in our targeted regions. This last category included the major LepR3/Rlm2 gene which was cloned and identified as a receptor-like protein (Larkan et al., 2013(Larkan et al., , 2015 and other Rlm genes previously located in B. napus maps (Delourme et al., 2006). For the latter ones, the previously identified flanking markers were mapped on our SNP map in order to deduce their putative position.

Phenotypic Analysis of the B. napus Panel
The results obtained confirmed that we have mainly assessed quantitative resistance in our panels. Indeed, all oilseed rape varieties in the experiment showed typical leaf lesions in the autumn, indicating their lack of effective specific total resistance conferred by major genes that prevent leaf lesion development. As expected, the varieties showed variability in stem canker resistance with DI ranging from 0.3 (resistant) to 8.4 (susceptible). Analysis of variance showed significant (P < 0.001) phenotypic variation among lines and replicates. The proportion of total phenotypic variation which resulted from the genotypic variation within the oilseed rape collection was estimated at 0.94. Analysis of variance on the three control varieties did not show a significant genotype x replicate interaction (α = 0.05). The mean disease index of each variety (Table S1) was then used for GWA.

GWA and Identification of Duplicated Homoeologous Regions Associated to Stem Canker Resistance
Association analyses were conducted on the two OSR panels with 13,973 informative SNP markers that were well distributed on our reference genetic map with an average density of 1 SNP every 0.15 centimorgans (cM). The average physical distance between two SNPs on the B. napus reference sequence was  (Table S2). A set of 52 associated SNP markers was common to both panels. After false discovery rate (FDR) correction, no significant associated markers were detected in the 109 OSR panel, and one associated marker located on linkage group (LG) A08 in block U was observed in the 170 OSR panel. Out of the 686 and 1019 resistance-associated markers, 505 (73%) and 467 (45%) were clearly anchored on one AK block (Table  S3). The remaining markers were located in regions including several nested blocks or were putatively located on two different blocks.
We compared results obtained in the present two GWA analyses with those published by Fopa Fomeju et al. (2014), and we found 28 and 32 common regions containing resistance-associated markers between the three or two analyses, respectively (Table S3). Of these 60 regions, 27 (45%) corresponded to duplications of the AK blocks that were already identified by Fopa Fomeju et al. (2014) i.e., the blocks E (six regions), J (six regions), R (four regions), U (six regions), and W (four regions). In addition in the present study, resistanceassociated markers were identified on the six duplications of the block F. In further analyses we focused on the AK blocks E, J, R, U, and W, as these were consistent between our different studies.

Structural Organization of Homoeologous Duplicated B. napus Regions
The A. thaliana gene intervals corresponding to the duplicated B. napus regions that carry resistance associated markers were delineated ( Table 1) and their genomic sequence was extracted. The alignment of the sequence of all the studied AK block intervals to the B. napus sequence ( Figure S1) revealed the expected number (six duplicated regions) and the expected location (according to the genetic map) of the duplications

AK block
A. thaliana gene interval The interval definition is based on the location of resistance-associated markers. Each AK interval is defined by two A. thaliana genes.
in the B. napus genome. The dotplots indicated that there was a good collinearity between A. thaliana and B. napus nucleotide sequences. Few rearrangements, mainly inversions, were identified between A. thaliana and B. napus duplicates.
One small translocation appeared to be located on U block between chromosomes C3 and C8. Multiple alignments of the duplicated B. napus regions indicated that there was globally a better collinearity between homoeologous regions than between paralogous regions ( Figure S2). For each block, although the size of the duplicated regions was larger on the C genome than on the A genome, the gene density was higher in the A than in the C regions ( Table 2). Both on the A and C B. napus genomes, the regions showing the highest number of conserved genes with A. thaliana corresponded to regions located on the ancestral less fractionated sub-genome (LF subgenome) ( Table 2). The only exception was for the duplication of AK block J on C4, which was not assigned to the same subgenome in previous studies by Liu et al. (2014) and Parkin et al. (2014). Likewise, for the duplicated regions of blocks E, J, R, and U on the A genome and of block U on the C genome, the regions with an intermediate number and the lowest number of conserved genes with A. thaliana corresponded to the MF1 and MF2 sub-genomes, respectively. For the duplicated regions of block W on the A genome and blocks E, J, R, and W on the C genome, the differences between the number of conserved genes with A. thaliana was not significant enough to distinguish between the MF1 and MF2 sub-genomes in these regions. Except for the block E, there were more tested SNPs in the regions located on the A than on the C genome. There was no bias in the distribution of tested markers across the three sub-genomes. The proportion of resistance-associated markers identified in the duplicated regions was similar on the A or C genomes, and on the LF, MF1, and MF2 sub-genomes, except for a higher proportion observed on MF2 on A genome.
The analysis of the conservation of A. thaliana orthologous genes in the duplicated B. napus regions ( Table 3) showed that (i) the A. thaliana genes were retained in one to six duplicated regions of a block and (ii) in a given B. napus region, one A. thaliana orthologous gene was present in 0, 1 or several copies. Thus, duplicated genes can be retained in two ways: one gene copy can be retained on all the duplicated regions of an AK block and/or several copies of the gene can be located in at least one of the duplicated regions of the AK block. For all the studied blocks, most of the A. thaliana orthologs (∼60%) were  located in one or two duplicated regions of a given block. In contrast, only 4% of the genes were present in all the duplicated regions of a given block and 6% on five of the six duplicated regions ( Table 3). In most cases, when gene copies were located in two duplicated regions, one copy was on the A genome and the other was on the C genome (data not shown). Overall, for all AK blocks, a large majority of genes (85%) were maintained in one copy per duplicated region. In addition, the proportion of genes in multiple copies is lower for genes that are located in one region compared with genes that are retained in several duplicated regions. For example, 7% of the genes retained in only one duplicate of block J were found in multiple copies within this duplicated region whereas 36% of the genes that were retained on the six duplicates of block J were in multiple copies on at least one of the six duplicated regions.

Gene Ontology Term Analyses
We investigated gene categories in the duplicated regions involved in resistance to stem canker. B. napus genes with an ontology deduced from A. thaliana orthologs were used to perform GO slim enrichment analyses. No significant enrichment was identified when we compared GO terms of genes located in the duplicated B. napus regions with the GO terms of genes located in the corresponding A. thaliana orthologous regions. We then compared GO terms of genes located on five and six duplicated regions versus those located on one or two regions in B. napus. Among the genes located in five or six duplicated regions (Table S4), there were significantly more genes involved in "developmental processes, " "cell organization & biogenesis, " "response to stress, " "response to abiotic or biotic stimulus, " "transcription DNA-dependent, " "transport, " and "signal transduction" compared with genes located in one duplicated region (Figure 1). Furthermore, among the genes retained in five or six duplicated regions, more genes had molecular functions corresponding to "transcription factor activity, " "protein binding, " and "transferase activity." In contrast, there were more genes with a molecular function corresponding to "hydrolase activity" within genes located on one or two duplicated regions (Figure 1).

Structural and Functional Characterization of Genes with Resistance-associated Markers
From the present GWA studies and the one of Fopa Fomeju et al. (2014), we identified 294 resistance-associated markers in the duplicated regions corresponding to the five studied AK blocks. Of these, 148 markers were anchored in B. napus genes. They corresponded to 129 genes, of which 124 had a blast hit with A. thaliana and their putative function was thus investigated (Table S5). These 124 B. napus genes represented a large proportion of the resistance associated markers on the blocks W (65%) and E (51%) and about 30% of resistance-associated markers on blocks J, R, and U. Seven resistance-associated genes were retained in only one duplicated region and the 117 others were retained in two or more duplicated regions. Among these 117 genes, 18 (15%) were present in multiple copies in at least one of the duplicated regions. The 124 resistance-associated genes were mainly related to the biological processes "response to stress, " "cell organization & biogenesis, " and "protein metabolism" (Figure 2). The three main molecular functions found were "hydrolase activity, " "nucleotide binding, " and "transferase activity." These GO categories described genes located on five or six duplicated regions, except for "hydrolase activity" which was preferentially identified for genes located in one duplicated region.

Investigation of R-like Genes
Out of the 425 NBS-LRR genes identified in the B. napus genome (Chalhoub et al., 2014), 135 genes were located in our studied B. napus duplicated regions. In addition, we found that 89 B. napus genes had a significant hit to NBS-LRR genes or resistance gene analogs (RGA) in A. thaliana. These were mostly distributed on duplicated regions corresponding to the blocks E, R, and U. Out of the 224 resistance-like genes (NBS-LRR and RGA), 21 were tested in the association study and four were significantly associated with resistance to stem canker (Table S2). Two were located on block E on LG A7 (LF subgenome) and C2 and the two others on block U on LG C7. According to genetic linkage maps, the Rlm3-Rlm4-Rlm7-Rlm9 For each block, the number of orthologous A. thaliana genes has been identified in the duplicated regions (total genes). These genes can be retained in one of the duplicated regions of each block or up to the six duplicated regions. In each region, one gene can be represented in one copy (single copy) or in two or more copies (multi-copies).
cluster might be in the neighborhood of block E-LF subgenome on chromosome A7 and Rlm1 was assigned to the studied region of block E-MF2 sub-genome on chromosome A7. The approximate mapping position of Rlm1 spread in a region of 1300 kb according to bridge SNP between linkage map and genomic sequence (Table S2). In this region, 22 resistance-associated markers have been tested and 12 were significantly associated to resistance to stem canker. LepR3/Rlm2 sequence aligned to the B. napus gene GSBRNA2G00135758001 which was located outside our investigated region on the block R on chromosome A10 and was not tested in association mapping.

Discussion
In this study, we analyzed the structural and functional organization of duplicated regions involved in quantitative resistance of B. napus to stem canker. We took advantage of the recently released B. napus genome sequence (Chalhoub et al., 2014) for structural analyses and of the B. napus-A. thaliana relatedness for exploring GO categories and the putative function of genes located in the duplicated regions.

Duplicated Regions of At Least Five AK Blocks Are Involved in Quantitative Resistance to Stem Canker
Our present GWA analyses confirmed that duplicated regions of AK blocks E, J, R, U, and W are involved in quantitative resistance to stem canker as shown in our previous GWA study (Fopa Fomeju et al., 2014). Results from FDR corrected data indicated that there might be false positives among the resistanceassociated markers. However, the duplicated regions of the blocks E, J, R, U, and W were identified with three datasets, suggesting that these regions are likely to be "true" quantitative resistance regions. Moreover, at least 15 of these duplicated resistance associated regions (on LG A01, A02, A03, A05, A08, C01, C02, C03, C04, C06, and C07) were located within or in the vicinity of QTL that were previously identified in biparental segregating populations and in a connected multiparental population (Pilet et al., 2001;Jestin et al., 2012Jestin et al., , 2015. New potential duplicated regions were highlighted (such as duplications of block F) indicating that, as suggested in (Fopa Fomeju et al., 2014), more than 44% of duplicated genomic regions are involved in the control of this complex trait. This proportion is quite high compared to that observed in other polyploid plant species FIGURE 1 | Gene ontology terms associated to genes retained in 1-2 or 5-6 duplicated regions. Gene ontology terms for biological process (A) and molecular activity (B).
such as strawberry (Lerceteau-Kohler et al., 2012) or cotton (Rong et al., 2007) for other traits. This could be tightly linked to either the good structural conservation between the B. napus A and C genomes (Cheung et al., 2009;Chalhoub et al., 2014), the highly duplicated nature of the Brassicaceae genomes resulting from several WGD events, or the fact that loci involved in defense responses are often over-retained in plants after WGD (Blanc and Wolfe, 2004;Moghe et al., 2014).

Structural Organization of the Homoeologous Duplicated Regions Involved in Stem Canker Resistance
All the expected duplicated regions of our targeted blocks were found on the B. napus genomic sequence. This reflects the good coverage of the B. napus sequence, which was estimated at 79% (Chalhoub et al., 2014). Alignment of B. napus duplicated regions revealed better collinearity between homoeologous regions than between paralogous regions, homoeologous duplicated regions having more genes in common than paralogous duplicated regions, as generally observed in the genomes of B. napus and its progenitors (Cheung et al., 2009). The size of the extracted duplicated regions varied depending on their location on either the A or C genome. For a given block, the studied regions on the C genome were larger than their homoeologous regions on the A genome. It has been shown that the C genome is larger due to insertions of transposable elements; almost 40% of the C vs. 26% of the A genome corresponds to transposable elements in B. napus (Chalhoub et al., 2014). Similar results were observed for the A and C genomes of B. rapa and B. oleracea, respectively (Cheung et al., 2009;Cheng et al., 2012;Liu et al., 2014).
FIGURE 2 | Gene ontology terms associated to SNP/genes associated to resistance to stem canker. Gene ontology terms for biological process (A) and molecular activity (B).
By analyzing gene distribution in our studied regions, we observed that the B. napus duplicated regions with the highest number of retained genes compared with A. thaliana were orthologous to regions located on the less fractionated subgenome (LF) of B. rapa  and B. oleracea Parkin et al., 2014). This agrees with the hypothesis that the fractionation of the sub-genomes took place before the B. rapa-B. oleracea divergence  and suggests that, probably due to the recentness of B. napus, no significant new fractionation levels were observed in this allotetraploid. However, the occurrence of homoeologous exchanges, which may shape fractionation levels by modifying gene number, on the sub-genomes have been reported (Chalhoub et al., 2014), suggesting that fractionation might be ongoing in the B. napus genome. The proportions of resistance associated markers on the LF, MF1 and MF2 subgenomes were similar, which suggests that all the subgenomes were involved in the control of the resistance trait. This remains to be further studied since the number of associated markers identified does not provide information on the number of underlying causative genes, nor on their contribution to the phenotype expression.
When considering gene distribution across the duplicated regions, we found that a large majority of genes (60%) were only present in one or two duplicated regions of the AK blocks and that these were mostly present in a single copy per region (on average, 85% of the genes). In general, genes present in two duplicated regions had one copy on the A genome and the other copy on the C genome. In a recent study, De Smet et al. (2013) estimated that 66% of the B. rapa genes were single-copy genes. This is consistent with our findings suggesting that these single-copy genes have not resulted from the allotetraploidization event but rather from the long-term evolution of diploid ancestors.

Functional Analysis of the Genes Located in the Duplicated Regions Involved in Resistance to Stem Canker
Genes located on five or six of the duplicated regions of a given block represented about 10% of the total genes identified. The GO terms of these duplicated genes were related to response to stress, signaling and transcription regulation as identified in other plant species (Blanc and Wolfe, 2004;Paterson et al., 2006). Highly duplicated genes, i.e., genes present in several copies in a given region, were preferentially associated to transport, signal transduction and kinase activity functions. Similar results were reported in other plant species (De Smet et al., 2013;Han et al., 2014). Actually, genes involved in adaptive traits (such as resistance or tolerance to biotic and abiotic stresses) are preferentially over-retained after a WGD event allowing faster evolution and conferring an adaptive advantage to polyploid plants (Pont et al., 2013;Murat et al., 2014;Wu et al., 2014). This can lead to copy number or sequence variation between accessions in the homoeologous and paralogous regions and to phenotype differentiation as shown for flowering-related genes (Schiessl et al., 2014). Twenty-two of the genes retained in five or six duplicated regions were associated to resistance and were related to response to stress and transferase activity. However, at this point, we could not conclude on the involvement of duplicated genes in stem canker resistance due to the fact that only one of the duplicated copies was tested for 77% of the duplicated genes.
Genes maintained in only one duplicated region were enriched in "hydrolase activity" functions. An analysis conducted by Han et al. (2014) in 29 angiosperm genomes indicated that genes in single copy have binding and catalytic activities (including hydrolase activity). Previously,  highlighted that genes involved in hydrolase activity were retained as single-copy genes after the alpha duplication in the Brassicaceae lineage. All these results suggest that the bias in retention of genes in single copy is common across different species. Thus our results may reflect a global pattern across the whole genome and may not be specific to the studied regions. Twenty-four associated markers were located in genes present in only one duplicated region and had GO terms related to "hydrolase activity." Thus, we cannot exclude that genes encoding proteins with hydrolase activity are involved in response to stem canker in some regions. Indeed, the involvement of hydrolase activity has been highlighted in the response to tobacco virus infection (Guo et al., 1998). The HE-1 hydrolase protein may play a role in protection from oxidative damage associated with defense responses. It may also play a role in generating signals for activation of certain defense responses. The relationship between hydrolase activity and plant response to pathogens has been also demonstrated in A. thaliana (Kang et al., 2008) and potato (Reddy et al., 2000). R and R-like genes could also account for resistance in some regions. Indeed, four resistance-associated markers located in NBS-LRR or RGA genes were found on blocks E and U. In addition, the major gene Rlm1 conferring blackleg resistance in B. napus was located in the neighborhood of the resistanceassociated markers on the block E, chromosome A7, sub-genome MF2. The effect of this region on resistance could be due either to a residual effect of Rlm1 gene or to linked genes, as previously suggested for Rlm2 gene that was shown to co-localize with a resistance QTL to stem canker (Pilet et al., 2001).

Conclusion and Future Prospects
We identified a set of genes that were retained in duplicated regions and associated with quantitative resistance to stem canker and many were involved in stress responses, signaling and transcriptional regulation. It was shown in various other species that such genes were over-retained after WGD, indicating a common evolutionary pattern across species. We also found that almost 60% of the genes located in the studied duplicated regions were single-or two-copy genes, as expected in the whole B. napus genome, and that some of them e.g., genes with a hydrolase activity were associated with resistance. Thus, we cannot exclude that the genes underlying quantitative resistance to stem canker are not duplicated genes. Results also suggested that R-like genes might account for B. napus quantitative resistance to stem canker in some genomic regions. In future studies, the B. napus genes that are present in our studied regions but had no hit with A. thaliana genes, should be investigated to complete the results. The three B. napus sub-genomes (LF, MF1, and MF2) were found to contribute to quantitative stem canker resistance. In wheat, it was suggested that the most fractionated genome (B sub-genome) i.e., the sub-genome which is more sensitive to genome fractionation and more plastic, controls adaptive traits such as reaction to biotic and abiotic stresses (Pont et al., 2013). This has to be further analyzed in oilseed rape by investigating whether all duplicated copies are expressed and equally contribute to phenotype expression and if they have undergone sub-functionalization, which appears to be a major force in maintaining genes in a duplicated state.

Author Contributions
BF performed GWA studies, structural and functional sequence analyses. CF carried out genetic mapping and BLAST analyses for the construction of the integrated map used in this study and the positioning of duplicated blocks on this map. GL contributed to the structural and BLAST analyses. RD and MM helped analyse the results and coordinated the project. BF, RD, and MM wrote the manuscript.
in the RAPSODYN project. We acknowledge the Genetic Resource Center (BrACySol, UMR IGEPP Ploudaniel, France) for providing the seeds of some B. napus varieties. We thank the team of the INRA Experimental Unit at Le Rheu for performing the disease evaluation trials.