Candidate Effectors of Plasmodiophora brassicae Pathotype 5X During Infection of Two Brassica napus Genotypes

Clubroot, caused by Plasmodiophora brassicae, is one of the most important diseases of canola (Brassica napus) in Canada. Disease management relies heavily on planting clubroot resistant (CR) cultivars, but in recent years, new resistance-breaking pathotypes of P. brassicae have emerged. Current efforts against the disease are concentrated in developing host resistance using traditional genetic breeding, omics and molecular biology. However, because of its obligate biotrophic nature, limited resources have been dedicated to investigating molecular mechanisms of pathogenic infection. We previously performed a transcriptomic study with the cultivar resistance-breaking pathotype 5X on two B. napus hosts presenting contrasting resistance/susceptibility, where we evaluated the mechanisms of host response. Since cultivar-pathotype interactions are very specific, and pathotype 5X is one of the most relevant resistance-breaking pathotypes in Canada, in this study, we analyze the expression of genes encoding putative secreted proteins from this pathotype, predicted using a bioinformatics pipeline, protein modeling and orthologous comparisons with effectors from other pathosystems. While host responses were found to differ markedly in our previous study, many common effectors are found in the pathogen while infecting both hosts, and the gene response among biological pathogen replicates seems more consistent in the effectors associated with the susceptible interaction, especially at 21 days after inoculation. The predicted effectors indicate the predominance of proteins with interacting domains (e.g., ankyrin), and genes bearing kinase and NUDIX domains, but also proteins with protective action against reactive oxygen species from the host. Many of these genes confirm previous predictions from other clubroot studies. A benzoic acid/SA methyltransferase (BSMT), which methylates SA to render it inactive, showed high levels of expression in the interactions with both hosts. Interestingly, our data indicate that E3 ubiquitin proteasome elements are also potentially involved in pathogenesis. Finally, a gene with similarity to indole-3-acetaldehyde dehydrogenase is a promising candidate effector because of its involvement in indole acetic acid synthesis, since auxin is one of the major players in clubroot development.


INTRODUCTION
Clubroot disease, which results from infection by the soilborne obligate parasite Plasmodiophora brassicae Wor., causes significant damage to plants in the Brassicaceae family, and specifically to crops of agricultural importance like canola (Brassica napus L.) (Dixon, 2009). While strategies for disease management including soil amendments, fungicides, and the sanitization of equipment hold some promise, the best approach for controlling clubroot has been the planting of resistant cultivars (Donald and Porter, 2009;Hwang et al., 2014). However, short rotations with clubroot resistant (CR) cultivars can result in resistance breakdown and the emergence of more virulent pathotypes (Cao et al., 2020). The emergence of new pathotypes of P. brassicae (Strelkov et al., , 2018Hollman et al., 2021) is therefore related to crop management practices and selection pressure on the pathogen, reflecting the arms race between the molecular defense mechanisms of the host and the generation of pathogen effectors utilized to overcome resistance.
Plasmodiophora brassicae is a resilient pathogen that produces resting spores that can remain in the soil for periods of up to 20 years (Hwang et al., 2012). The resting spores germinate to produce primary zoospores, which invade host root hairs and epidermal cells to generate primary plasmodia. These cleave into zoosporangia, from which secondary zoospores are released back into the soil. The secondary zoospores infect the root cortex, establishing a secondary infection through secondary plasmodia that will eventually give rise to a new generation of resting spores (Kageyama and Asano, 2009;Liu et al., 2020). Secondary plasmodia are responsible for the hypertrophy and hyperplasia of the host tissues through host hormone regulation (Jahn et al., 2013;Schuller et al., 2014;Ludwig-Müller et al., 2017). Increased root growth aids in housing the enlarged secondary plasmodia and provides a sink for nutrients utilized by the pathogen, and it is likely that most of the molecular interactions between host and pathogen take place during this stage, making this phase relevant in effector secretion and pathogenesis (Pérez-López et al., 2020. Effectors from plant pathogens aid in the infection process either by directly affecting the immune response of the host, or by altering plant responses for their own benefit. These effectors are usually secreted proteins that can either act on the apoplast or enter plant cells, where they interact with different cellular components to cause disease and/or bind receptors (Rgenes) or mediator proteins associated with R proteins that trigger immunity (Dodds and Rathjen, 2010;Toruño et al., 2016). Effectors can have diverse functions, including protection of the pathogen, protection of other effectors, renaturation of effectors that have been damaged by host reactive oxygen species (ROS), interference and/or manipulation of host gene expression and metabolism, manipulation of vesicular traffic or inhibition of pathogen-associated molecular patterns (PAMPs) (Toruño et al., 2016).
Effector proteins usually possess specific domains or motifs that allow their secretion. For example, an N-terminal signal peptide is typical of proteins that are processed through the secretory pathway. Different types of pathogens have distinct signatures depending on their infection characteristics. While oomycete effectors seem to have some typical signals in their N-terminal regions, this feature seems to be more variable in fungi (Selin et al., 2016). For example, intracellular effectors need translocation motifs like RxLR. In the case of P. brassicae, it has been speculated that motifs like RxLR, DEER and PEXEL are potentially necessary (Pérez-López et al., 2018), at least for some of the secreted proteins, to fulfill their function. It has been suggested, however, that the conservation of pathogenic effectors might reside in their tertiary and quaternary protein structures (Selin et al., 2016), making specific folds and structure more predictive of function since effectors tend to evolve rapidly at their primary sequence level (Mukhi et al., 2020). Additionally, many effectors are said to be small proteins (less than 300 or 400 amino acids), and bear a high proportion of cysteines, but not all effectors are necessarily small or rich in cysteines (Gout et al., 2006;Sperschneider et al., 2015).
So far, in P. brassicae, putative effectors have been identified through seminal studies on the P. brassicae genomes (Schwelm et al., 2015;Rolfe et al., 2016). Schwelm et al. (2015) predicted 553 secreted proteins in P. brassicae from genomic data, showing that these putative proteins were rich in ankyrin domains, which are essential for protein-protein interactions, and RING-domains, which can modify ubiquitination processing in the host. Other features common to many plant pathogens, like high proportions of cysteines, chitin recognition and protease inhibition, were common in the small secreted proteins of P. brassicae, but a general enrichment of translocation motifs like RxLR was not evident. In the meantime, Rolfe et al. (2016) predicted 590 secreted proteins, 221 of which were classified as small-secreted proteins (<300 aa based on their classification). This study also confirmed a low prevalence of the RxLR translocation motif, with only 13 proteins showing the motif. Additionally, differential expression of the effectors, 6 weeks after inoculation, showed that genes corresponding to these proteins were likely interfering with the host SA-induced defenses, most likely through the action of benzoic acid/salicylic acid methyltransferase (Pb3-BSMT) (Rolfe et al., 2016). PbBSMT was shown to be one of the most highly expressed genes in galled tissue of Brassica oleracea (Ciaghi et al., 2019). Functional analysis to examine the putative effect of this P. brassicae effector on host plants indicated that it could interfere with host induced SA-triggered defenses (Ludwig-Müller et al., 2015), and localization analyses have shown the appearance of BSMT mRNA during early secondary infection and early sporogenesis (Badstöber et al., 2020). Furthermore, overexpression of PbBSMT in host Arabidopsis plants caused a reduction in SA production and higher susceptibility to clubroot (Bulman et al., 2019;Djavaheri et al., 2019).
A pipeline was proposed to discover effector proteins based on putative secretion and typical motifs needed for translocation into host cells (Pérez-López et al., 2018). This pipeline was then used to predict 32 small-secreted proteins (SSPs) of <400 aa that were rich in cysteine (Pérez-López et al., 2018). Few of the predicted proteins had the expected translocation motifs (RxLRR and PEXEL), with some proteins bearing ankyrin domains. Kinase functional activity was confirmed for one of the predicted SSPs.
Effectors were traditionally identified using genetic mapbased cloning (Selin et al., 2016), but with the advent of Next Generation Sequencing (NGS) technologies, studies in genomics and transcriptomics provide the raw data for prediction of putative effector candidates expressed during the infection cycle. Here we have implemented a pipeline for traditional and non-traditional effector prediction in a P. brassicae pathotype 5X infecting two B. napus genotypes that previously showed divergent molecular responses to this pathotype (Galindo-González et al., 2020). Pathotype 5X, as classified on the Canadian Clubroot Differential (CCD) set (Strelkov et al., 2018), was one of the first pathotypes to break first generation resistance in clubroot-resistant (CR) canola cultivars . Many of the proteins and domains previously predicted as P. brassicae effectors in other studies (Ludwig-Müller et al., 2015;Schwelm et al., 2015;Yahaya et al., 2015;Pérez-López et al., 2020) were identified in the current study, but some novel candidates are also outlined. Structural modeling of the proteins helped in predicting effector functions in some cases where comparison of the primary amino acid sequence did not result in a putative function.
While traditionally the management of plant pathogens has centered on developing host resistance, the study of pathogen infection mechanisms can help to pinpoint potential interactions that can be targeted to improve plant resistance or to interfere with pathogen compatibility. These interactions are being shown to be quite specific between pathotypes and host genotypes, and therefore, an understanding of both the specific mechanisms of virulence by each pathogen and the common mechanisms of virulence may help in the development of effective host resistance.

Experimental Set Up and Data Analysis
Data analyzed in the current study comes from previously published research where host molecular responses were assessed (Galindo-González et al., 2020). The materials and experiments performed previously are briefly described below for context, and novel analyses performed for the current study on the raw reads corresponding to the pathogen are described in detail.

Pathogen Material and Host Inoculation
Experiments corresponding to the current study have been previously performed and published (Galindo-González et al., 2020). Briefly, inoculum of P. brassicae pathotype 5X, as defined on the Canadian Clubroot Differential (CCD) set (Strelkov et al., 2018), was adjusted to a concentration of 1 × 10 7 resting spores/mL with sterile distilled water. Two B. napus genotypes, the oilseed rape 'Brutor' and rutabaga 'Laurentian, ' susceptible and partially resistant to pathotype 5X, respectively, were inoculated by the root-dip method and grown in parallel with non-inoculated controls; the roots were harvested at 7, 14, and 21 days after inoculation (dai) for each of the control vs. inoculated treatments and both host genotypes.

RNA Extraction and RNAseq
RNA is also from a previous published experiment ( Galindo-González et al., 2020), where only the host fraction was analyzed (Galindo-González et al., 2020). Briefly, RNA was extracted from frozen roots of both host genotypes, and RNAseq was performed from 3 µg of RNA per sample with RNA Integrity numbers >9, as previously described (Galindo-González et al., 2020). Sequencing was conducted at Oklahoma State Genomics (Stillwater, OK, United States).
For the current study, after mapping reads to the host B. napus reference genome (Genbank assembly: AST_PRJEB5043_v1) using TopHat, the unmapped bam files were sorted and transformed into fastq files using samtools v1.10 (Li et al., 2009) and bedtools v2.27.1 (Quinlan and Hall, 2010). Reads were checked for quality using fastqc v0.11.8 1 , and filtered for quality and adapter presence using a minimum per base phred quality value of 30, a minimum read length of 70 and polyG and polyX trimming with a minimum length of 10 bases; the rest of the parameters were left by default.
Coding sequences (CDS') from the e3 P. brassicae genome (Stjelja et al., 2019) were downloaded from the European Nucleotide Archive (WGS project OVEO01000000), and the filtered reads were mapped and quantified to all 9284 CDS' with Salmon v1.1.0 (Roberts and Pachter, 2013) using an indexed transcriptome, mapping validation, a mean read size of 75 bases, a mean read size standard deviation of 5, 50 bootstrap repetitions and GC bias correction. Differential expression analysis of mapped reads was performed using Deseq2 (Love et al., 2014) by comparing the P. brassicae quantified reads from both genotypes at each time-point. Significant differential gene expression was assessed by comparing the TPM (Transcripts Per kilobase Million) dispersion (Deseq2 dispersion based on mean and variance) of three replicates with adjusted p-values (after multiple comparisons) that were equal or below 0.05. Putative secreted proteins were selected with the pipeline described in the section below for genes having at least 10 mapped reads in at least one of the two samples being compared. Additionally, we performed an analysis of these genes for each time-point and sample (P. brassicae reads coming from infection of 'Laurentian' or 'Brutor') for genes which showed <20% covariance in their TPMs, and at least 10 mapped reads on average across three biological replicates.

Secretome Bioinformatics Pipeline
Coding sequences were first annotated by comparing them to the non-redundant (nr) protein Genbank database (downloaded 2020-04-22) and to the Uniprot database (downloaded 2020-08-31), using BLASTx (Altschul et al., 1990) with a minimum e-value of 1e-10. Gene annotation from the comparison against nr was performed by submitting the resulting xml file from BLASTx to BLAST2GO (Conesa et al., 2005;Conesa and Götz, 2008), which performs a consensus annotation from the top 20 hits of each CDS. Uniprot top hit IDs for each CDS were searched on ProSecKB (Powell et al., 2016), a database for protist secretomes, using P. brassicae to search subcellular localization. Pfam and Interpro domain predictions were performed with InterproScan (Quevillon et al., 2005;Hunter et al., 2009).
Putative translations of CDS' were used for secretome prediction. SignalP v4.1f (Nielsen, 2017) was used to predict a signal peptide (presence of signal peptide = Y) and potential cleavage sites, and SecretomeP v1.0h (Bendtsen et al., 2004) was used to predict non-traditional leaderless secreted proteins with a minimum score of 0.5. Proteins with a valid prediction from these two tools were filtered further using TargetP v2.0 (Armenteros et al., 2019), selecting proteins predicted to be targeted for the secretory pathway or unknown pathway "OTHER, " while the proteins carrying chloroplast or mitochondria transit peptides were discarded. The software TMHMM v2.0 (Sonnhammer et al., 1998) was then used for transmembrane helix prediction, where proteins with two or more helices or with one helix nonoverlapping a predicted signal peptide were filtered out. From the remaining sequences, GPI-anchored proteins were discarded using PredGPI (Pierleoni et al., 2008) and NetGPI v1.0 (Gíslason et al., 2021). Extracellular proteins were predicted with WolfPsort (Horton et al., 2007) and Deeploc v1.0 (Almagro Armenteros et al., 2017). These proteins were depicted as secreted and were further analyzed with EffectorP v2.0 (Sperschneider et al., 2018a) and ApoplastP (Sperschneider et al., 2018b). Motifs typically found on effectors (RxLR, Pexel and DEER) (Pérez-López et al., 2018) were searched with Fuzzpro 2 . The percentage of cysteines was calculated, and the protein was classified as cysteine-rich if this percentage was equal or larger than 2.5%. A protein was classified as a small-secreted protein (SSP) if its size was <400 amino acids (Pérez-López et al., 2018).
Our pipeline is similar to the one used by Pérez-López et al. (2018), but uses additional tools for prediction of non-conventional secretion (SecretomeP), membrane anchors (PredGPI and NetGPI), and a dual approach for determining extracellular secretion (WolfPsort + Deeploc). Additionally, we added EffectorP, a tool to try to distinguish between proteins that are secreted and those that exclusively characterized as effectors. Finally, our search for the RxLR, Pexel, and DEER motifs was done automatically with Fuzzpro, as opposed to the manual search performed in the previous studies (Pérez-López et al., 2018.

Orthologous Proteins With Virulence Changes
All non-redundant differentially expressed transcripts and low covariance transcripts were putatively translated and submitted for a PHIB-BLAST at the pathogen-host interactions database (PHI-base) (Urban et al., 2020) using the default parameters. This comparison enabled matching of the predicted P. brassicae secreted proteins with closely related proteins from other pathogens that have been functionally tested for their effects on hosts. Hits with e-values below 1e-5 and showing a change in virulence upon experimental gene alteration were selected for the analysis. 2 http://www.bioinformatics.nl/cgi-bin/emboss/fuzzpro

Protein Structural Prediction
Protein structural prediction was performed with Swiss-Model (Waterhouse et al., 2018) under automated settings. The list of potential templates was checked to confirm that the resulting model was generated from the best template according to sequence alignment and coverage, and to the generated model as assessed by the Global Model Quality Estimation (GMQE) score 3 . Protein structural alignment was performed with PyMol v2.4 (The PyMOL Molecular Graphics System, Version 2.4 Schrödinger, LLC.) with a one-to-one comparison, default alignment, outlier rejection for 5 cycles, a cut-off of 2.0 and mobile and target states of −1. When alignment presented high RMSDs, they were aligned with the super and cealign modes, which work when there is low primary sequence similarity but some structural conservation.

Validation of Expression Levels From Transcripts Corresponding to Secreted Proteins
Validation was performed for 28 target genes predicted as secreted and five housekeeping genes (Supplementary Table 1) using a nanostrings nCounter R Custom Codeset (Nanostrings, Seattle, WA, United States). Target and reporter probes were designed by Nanostrings using the sequences from the latest e3 P. brassicae genome (Stjelja et al., 2019). Probe design by Nanostrings is based on proprietary algorithms that interrogate target sequences in windows of 100 nucleotides shifting one nucleotide at a time. Probes were screened for hybridization efficiency, cross hybridization (no more than 85% sequence homology between two sequences and no more than 16 consecutive matches), GC content and secondary structure (Nanostrings Technologies, 2011). One microgram of RNA per sample was hybridized to probes using a standard nanostrings hybridization protocol (Nanostrings Technologies, 2021b) and run on a nCounter R SPRINT Profiler (Nanostrings, Seattle, WA, United States). Analysis of data was performed using the nSolver Analysis Software 4.0 (Nanostrings, Seattle, WA, United States). Normalized log 2 ratio data from differentially expressed genes and log 2 transformed normalized count values were used for comparison to differentially expressed genes and transcripts with low covariance among biological replicates in the RNAseq data.

BLAST Analysis of Candidate Effectors
The most relevant putative secreted transcript sequences were used to perform a BLASTn search against the genome assemblies of isolates Pb3 (Rolfe et al., 2016), eH (Daval et al., 2019), and ZJ-1 (Bi et al., 2019). Genomes from the three isolates were downloaded from Genbank 4 , and formatted as a blastable database using makeblastd. The query sequences were blasted against the three databases using a minimum e-value of 1e-10.

Differentially Expressed Secretome
There were no differentially expressed genes at 7 dai when comparing P. brassicae transcripts coming from infection of the resistant host genotype 'Laurentian' vs. the susceptible genotype 'Brutor.' This could be due to a lower number of pathogenic cells at the beginning of secondary infection, which was demonstrated by lack of symptoms in the both host genotypes at this time point (Galindo-González et al., 2020). At 14 dai, only two genes were differentially expressed and just one was predicted to be a secreted protein (SPQ95221.1) in our pipeline. This P. brassicae transcript was downregulated in the roots of the resistant genotype 'Laurentian' compared with the roots of 'Brutor' (log 2 FC = −0.36) and was annotated as a hypothetical protein with no recognized domains. The predicted protein did not have a signal peptide, but was classified by SecretomeP as a non-traditional secreted protein, which was supported by the ProSecKB classification. Interestingly, the transcript corresponding to this protein also showed the highest relative abundance (TPM) from the differentially expressed secreted genes (DESGs) at 21 dai in P. brassicae cells from the two host genotypes, but with an opposite trend with the gene being more abundant in P. brassicae infecting 'Laurentian.' The transcript levels of SPQ95221.1, as calculated by TPM, decreased in P. brassicae infecting 'Laurentian' from 29758.99 at 14 dai to 26366.57 at 21 dai. In 'Brutor' the decrease was more substantial, with a TPM of 37719.81 at 14 dai and of 11681.39 at 21 dai.
At 21 dai, 21 differentially expressed transcripts were identified as secreted proteins, with only nine being SSPs based on their size ( Table 1). These transcripts had higher TPMs in P. brassicae infecting 'Laurentian.' Pfam annotation of the 21 putatively secreted proteins showed that nine possessed ankyrin domains and two could be classified as SSPs (SPQ95835.1 and SPQ95830.1). These two proteins were also identified by Pérez-López et al. (2020) as small secreted proteins (SSPbP43 and SSPbP93), following challenge of Arabidopsis with P. brassicae. Transcripts corresponding to genes SPR01793.1 and SPR01853.1 (Table 1) in our study were also identified in the Arabidopsis study as SSPbP31 and SSPbP42, but did not have ankyrin domains. Ankyrin domain repeats mediate proteinprotein interactions, which are essential for the effector-receptor relationship and have been shown to be involved in cell cycle, cell proliferation of host cells and, potentially, P. brassicae life cycle transitions (Pérez-López et al., 2020). Interestingly, EffectorP and ApoplastP, which are machine-learning methods, did not predict any of the ankyrin bearing protein sequences as either being effectors or apoplastic. In fact, only three of the 21 proteins were classified as effectors by EffectorP (SPQ99629.1, SPQ98385.1, and SPR01853.1). Another three had both an RxLR and a Pexel motif (SPR00206.1, SPR00984.1, and SPQ99289.1); these domains are relevant in translocation and secretion (Wawra et al., 2013(Wawra et al., , 2017Pérez-López et al., 2020). SPR00984.1 also presented one of the highest levels of expression from our analysis.
Two DESGs (SPQ97184.1 and SPQ97185.1) encoded proteins with kinase domains, a domain which has been reported in studies of secreted proteins of P. brassicae (Schwelm et al., 2015;Pérez-López et al., 2020). SPQ97185.1 showed the highest differential expression of any transcript at 21 dai (Table 1) and can be classified as a SSP. Finally, another protein (SPR00192.1) had a leucine-rich repeat (LRR) domain; this domain was also reported as enriched in secreted proteins (Schwelm et al., 2015).
The second gene with the highest differential expression (SPQ99629.1) did not have any Pfam domains, and was annotated as a 30S ribosomal subunit 19 (Supplementary Table 2). Seven of eight tools predicted this gene for extracellular secretion. Although it might seem counterintuitive for a ribosomal subunit to be secreted, ribosomal proteins are part of the secretome of Trichoderma virens during its interaction with maize roots (Nogueira-Lopez et al., 2018). Furthermore, knocking down a 40S ribosomal subunit from Puccinia striiformis f. sp. tritici (Pst) resulted in a decreased pathogenicity on infected wheat . While the protein from Pst is not secreted, ribosomal proteins from pathogens could likely be involved in numerous plant-pathogen interactions.
The third gene with the highest differential expression (SPQ98385.1) did not have a blast match or a recognizable protein domain. However, most of the tools predict this gene as secreted and it is the only sequence in this list classified as both an effector by EffectorP and as secreted to the apoplast by ApoplastP ( Table 1). The predicted protein is also the right size for a SSP and has a high cysteine content (5.1% -Supplementary Table 2). This gene constitutes an important candidate for functional analysis, and therefore we built a 3D model using Swiss model to discover the potential function of the encoded protein. The predicted structure was similar to the E3 ubiquitin-protein ligase RING subunit (Figure 1A) of the ubiquitin ligase gp78, which forms a heterotrimeric complex with another protein ubiquitin ligase and an ubiquitin-conjugating E2 enzyme ( Figure 1B). The glycoprotein Gp78 is an E3 ubiquitin ligase that prevents accumulation of misfolded proteins (Joshi et al., 2017), but also seems to be involved in cell proliferation in different cancerous cells. E3 ubiquitin ligases are involved in protein modulation and more often in protein degradation via the proteasome. Characterization of E3 ubiquitin ligases from the P. brassicae genome indicated that 11 RING-type E3 ligases had a signal peptide and were likely to be secreted (Yu et al., 2019). While an effector function has not been validated for these proteins, it has been speculated they could aid in targeting host receptors for degradation, therefore favoring susceptibility of the host (Yu et al., 2019). Interestingly, the structural model of another one of our predicted P. brassicae secreted proteins (SPQ96771.1 - Figure 2) with similar characteristics to SPQ98385.1, but without positive predictions on EffectorP or ApoplastP, was similar to an S-phase kinase-associated protein 1 (SKP1 -Swiss model ID: 6w66.1, GMQE = 0.16, QMEAN, -2.66). This protein is an important component of the SCF complex (Skip1-Cullin1-F-box) E3 ubiquitin ligase complex that is formed for protein degradation through the 26S proteasome. The heterotrimeric complex used as template for modeling SPQ96771.1 is involved in ubiquitination and degradation of inactive heterodimers of BTB proteins (Mena et al., 2020). | Mutational analysis of the blast   rice fungus Skp1 has shown that this gene is essential for fungal development and pathogenicity (Prakash et al., 2016), indicating how proteins involved in protein degradation may be important candidates for pathogenicity in P. brassicae.

Consistently Expressed Genes
To identify genes that showed consistent high expression (low variability among biological replicates), and that can uncover effectors affecting both hosts, we looked at the P. brassicae transcripts within each genotype and time-point having <20% covariance in their TPM across the three biological replicates per treatment. We found no P. brassicae transcripts with low covariance among biological replicates at 7 dai in either genotype, while 17 and 29 transcripts corresponding to secreted proteins under our pipeline showed low covariance at 14 dai in P. brassicae cells from 'Laurentian' and 'Brutor, ' respectively. At 21 dai, these numbers increased to 23 P. brassicae transcripts from the interaction with 'Laurentian' and 68 with 'Brutor' (Figure 3).

Shared P. brassicae Transcripts at 14 and 21 Dai From Both Hosts
Five transcripts showed consistent expression in P. brassicae infecting both host genotypes at 14 and 21 dai (Figure 3 and

Supplementary Tables 3-5).
Only one of these five transcripts (SPQ99289.1) was differentially expressed in P. brassicae samples coming from the two hosts at 21 dai (Supplementary Table 5).
The transcript with the highest expression across both days and genotypes was a SAM (S-adenosyl-L-methionine)dependent carboxyl methyltransferase (SPQ93076.1 -Supplementary Table 5), also known as Benzoic acid/salicylic acid methyltransferase (BSMT). A higher TPM was evident for this transcript at 14 dai in P. brassicae infecting 'Brutor, ' but the relationship was inverted at 21 dai (Supplementary Table 5). This is consistent with the increased expression of this gene between 2 and 4 weeks when P. brassicae colonized B. napus (Rolfe et al., 2016), and its presence during secondary infection using localization assays (Badstöber et al., 2020). Salicylic acid is usually inactive in its methylated form, which impairs SA-mediated defense responses. In our previous study, where we characterized the host responses of genotypes infected by pathotype 5X from the current study, SA-mediated responses were the main defense mechanism from hosts, with the more resistant cultivar displaying a stronger SA-mediated molecular defense (Galindo-González et al., 2020). Multiple studies in B. napus, Brassica rapa and Arabidopsis showed that SA and SA-mediated molecular responses increased in resistant cultivars The region corresponding to the S-phase kinase-associated protein 1 was predicted using Swiss-model. A Global Model Quality Estimation (GMQE) score closer to 1 represents more accuracy in the models, and a QMEAN value above -4 is indicative of good quality of the structure (https://swissmodel.expasy.org/docs/help).
when challenged with P. brassicae, and that the application of SA decreased clubroot symptom severity (Lemarie et al., 2015;Chen et al., 2016;Prerostova et al., 2018). The benzoic acid/salicylic acid methyltransferase from P. brassicae was shown to methylate salicylic acid (SA) in in vitro assays and its highest transcription level matches a high production of SA by the host (Ludwig-Müller et al., 2015). This gene also showed high levels of expression in gall tissue of B. oleracea when compared with asymptomatic root tissue in the same plants (Ciaghi et al., 2019), and its overexpression in host plants causes increased susceptibility and decreases in active SA levels (Bulman et al., 2019;Djavaheri et al., 2019). Transcripts of PbBSMT increase as clubroot disease progresses in Arabidopsis, and reads related to this gene were overrepresented in a callus culture of B. rapa infected with P. brassicae (Ludwig-Müller et al., 2015). The authors argued that increased expression of PbBSMT during secondary infection of P. brassicae in Arabidopsis would potentially be necessary to decrease the SA-mediated response of the host. This gene is also one of 32 SSPs previously predicted in the Arabidopsis-clubroot interaction (Pérez-López et al., 2020). From our bioinformatics analysis, BSMT is a SSP with RxLR (aa 211-214) and  motifs. The protein sequence, however, was not predicted to be an effector or apoplastic by EffectorP or ApoplastP (Supplementary Table 5), and its percentage of cysteine (1.3%) was below our threshold for a cysteine-rich protein. Nevertheless, BSMT is probably one of the most important proteins from the P. brassicae secretome.
The second transcript (SPR00473.1) with highest expression among these five genes expressed from P. brassicae cells from both hosts at 14 and 21 dai was annotated as a luminalbinding protein 5/heat shock protein 70 (Hsp70) (Supplementary Table 5). Heat shock proteins aid in correct protein folding and are usually expressed under stress conditions where proteins tend to denature; although counterintuitive, the infection of the host by P. brassicae also exerts physiological stress upon the pathogen proteins which are directly secreted in the host cytoplasm. A heat shock protein 70 was one of 76 genes identified using subtractive hybridization in P. brassicae-infected Arabidopsis (Bulman et al., 2006). Heat shock protein transcript expression was detected in P. brassicae infecting Arabidopsis at 14, 21, and 28 dai (Siemens et al., 2009), and throughout a time-course from 6 to 41 dai in B. rapa (Wu et al., 2012). Interestingly, HSPs including HSP70 from Plasmodium falciparum have been implicated in inter-organelle trafficking and the export of proteins of parasitic origin (Shonhai, 2010). In fact, because of their nature, HSPs have been proposed as targets for antimalarial drugs that can inhibit HSP association with co-chaperons and their trafficking function (Shonhai, 2010;Kumar et al., 2018). Although a human parasite, there may be some similarities between pathogenesis involving P. falciparum and P. brassicae (Pérez-López et al., 2020). For example, the Pexel motif found in P. falciparum was suggested as a very likely motif directing translocation of effectors in P. brassicae (Marti et al., 2004;Pérez-López et al., 2020). The third gene with highest transcript level (SPQ96353.1) was annotated as a peptide methionine sulfoxide reductase (Msr). This enzyme catalyzes the reduction of oxidized methionine in proteins back to methionine (Weissbach et al., 2002). Oxidation of sulfur groups in sulfur peptides like methionine is a common occurrence under highly oxidative environments, and would be favored when host cells enhance reactive oxygen species (ROS) production as part of their defense response. In one of our recent studies, three respiratory burst oxidase (RBOH) genes were upregulated in a resistant B. napus host in response to P. brassicae pathotype 3A ; similar genes were activated in resistant wild cabbage when challenged with this pathogen . A peptide methionine sulfoxide reductase A (MsrA) from the bacterium Erwinia chrysanthemi was shown to be necessary for virulence on its plant hosts, since mutants of this gene resulted in lower virulence and no systemic invasion of the pathogen (El Hassouni et al., 1999). This demonstrates that host plants respond to the pathogen by activating ROS and that the pathogen activates antioxidant enzymes to survive in the oxidative environment. Furthermore, it is possible that antioxidant pathogen enzymes can protect the virulence factors by reversing their oxidation to allow disease progression (El Hassouni et al., 1999). The action of Msr proteins as important determinants of virulence is supported by studies of human pathogens, which have shown decreases in virulence when these genes are mutated (Alamuri and Maier, 2004;Denkel et al., 2011;Singh et al., 2018). Our characterization of SPQ96353.1 indicates that this gene encodes a cysteine-rich non-conventional SSP without a signal peptide. Additionally, the translated product was predicted to be an effector and apoplastic by EffectorP and ApoplastP, further suggesting a role in infection.
The last of these consistently expressed genes was a serine carboxypeptidase (SPR01261.1). A serine protease cloned from P. brassicae (PRO1) showed catalytic activity in a heterologous system and was upregulated in canola roots from 4 to 42 dai (Feng et al., 2010). This gene was proposed to be involved in resting spore germination. Peptidase/proteases were reported among predicted SSPs in seminal studies of the P. brassicae genome (Schwelm et al., 2015;Rolfe et al., 2016). In other plant hosts, serine carboxypeptidases have also been reported as likely virulence factors (Kim et al., 2004;Williams and Sternthal, 2010;Dalman et al., 2013).

Plasmodiophora brassicae Secretome Transcripts When Infecting 'Brutor' at 21 Dai
The majority of transcripts with low covariance came from P. brassicae infecting 'Brutor' at 21 dai (Supplementary Table 3 Table 3). From all transcripts analyzed, a larger proportion of pathogen genes had higher expression when infecting 'Brutor' at 14 dai, while at 21 dai, more pathogen transcripts interacting with 'Laurentian' had a higher average TPM. However, the level of expression of the P. brassicae transcripts from the 'Laurentian' interaction at 21 dai presented high covariance, which resulted in these genes being identified mainly in P. brassicae coming from 'Brutor' (Supplementary Table 6).
From the 36 transcripts with low covariance identified from P. brassicae in infected 'Brutor' plants at 21 dai (Bin 2, Supplementary Table 3), the most common domain according to the Pfam annotation corresponded to Ankyrin repeats, as described in our differential expression analysis above. In addition to this domain, other domains identified indicated the importance of pathogen protection, detoxification and neutralization of oxidative damage, including 2OG-Fe (II) oxygenase, glutathione peroxidase, aldehyde dehydrogenase, thioredoxin, and short chain dehydrogenase. The 2OG-Fe (II) oxygenase (SPQ94350.1 -Supplementary Table 6), presented low covariance at 21 dai but was also expressed at 14 dai. This exact same gene [corresponding to PBRA003835 in the first genome draft of P. brassicae (Schwelm et al., 2015)] has been recently identified as significant inhibitor of programmed cell death (PCD) in the clubroot interaction, and designated as PbPE15 . Another group FIGURE 4 | Predicted 3D structure of a Cystatin Protease Inhibitor. (A) The cystatin homo-tetramer was predicted using Stefin-B (Cystatin-B, STML ID:2oct.1.C) using Swiss-model. A Global Model Quality Estimation (GMQE) score closer to 1 represents more accuracy in the models, and a QMEAN value above -4 is indicative of good quality of the structure (https://swissmodel.expasy.org/docs/help). Alfa helices are depicted in purple and beta sheets and loop regions are in green; the four L1 and the four L2 loop regions are indicated. (B) Multiple sequence alignment of the four predicted monomers forming the tetramer with the template Stefin-B. Alpha helices are highlighted in purple and beta sheets are in green. The three conserved sites that form a complex with papain: the N-terminal Trunk (NT), loop 1 (L1), and loop 2 (2), are also shown.
of genes was annotated to processes of nutrient acquisition and transformation, including carbohydrates and nitrogen derivates metabolism (glucosidase II beta subunit, aldolase 1-epimerase, allantoicase, and amidohydrolase), which is in agreement with overall patterns of secretomes in biotrophic fungi (Kim et al., 2016).
Three previously reported SSPs (SSPbP31, SSPbP53, and SSPbP21) (Pérez-López et al., 2020) were among these 36 predicted proteins at 21 dai, and corresponded to genes SPR01793.1, SPR01202.1, and SPQ93225.1 in the current study. The three transcripts did not show similarity to any pfam domains in our study. Pérez-López et al. (2020) reported SSPbP53 as a cysteine protease inhibitor directed to the apoplast, consistent with the apoplastic prediction we obtained with ApoplastP (Supplementary Table 3). Furthermore, homology modeling of the protein (Figure 4) showed its similarity to a human Stefin B protein tetramer. Stefin B is a protease inhibitor from the cystatin family, which helps to protect the cells from protein misfolding and aggregation (Žerovnik, 2019), thereby reducing the impact of ROS. Protease inhibitors are important weapons in the virulence of fungi and oomycetes, by directly interacting with plant proteases and thus suppressing their activity (Wang et al., 2020). Further functional characterization of SSPbP53 has confirmed its function as a papain-like cysteine protease inhibitor in cruciferous plants . Also, a cystatin-like protease inhibitor from the oomycete Phytophthora palmivora inhibits the activity of the papaya cysteine protease papain (Gumtow et al., 2018). Each of the cystatin monomers of the cystatin tetramer has a typical structure with an alpha helix on top of an antiparallel beta sheet as can be seen from the predicted model ( Figure 4A). The monomers in the model contain two of the three predicted conserved sites (Nagata et al., 2000;Gumtow et al., 2018), a first binding loop (L1) made of a conserved QxVxG motif, and a second binding loop (L2), which usually contains a proline followed by an aromatic residue (H or W), although this latter loop seems interrupted by a Glycine (G) in the P. brassicae protein (Figure 4B). The third site could not be predicted on the 3D model, but corresponds to an N-terminal trunk (NT) that can be seen in the sequence from the multiple alignment ( Figure 4B). These conserved sites form a hydrophobic wedge that tightly binds to the active cleft of papain (Bode et al., 1988;Turk and Bode, 1991). Finally, the last of these three SSPs (SSPbP21) corresponding to SPQ93225.1 was reported as bearing a chromosome segregation protein domain, which could be involved in pathogen proliferation or lifecycle transitions (Pérez-López et al., 2020).
Genes that are expressed consistently among replicates at more than one time-point may be relevant to initiating and maintaining infection. From the 22 transcripts found to be shared in the P. brassicae interaction with 'Brutor' at 14 and 21 dai, five corresponded to the previously discussed genes that were shared at both time-points and genotypes, and eight of the remaining transcripts were differentially expressed at 21 dai. Among the remaining transcripts, there were two protein kinases, and a NUDIX domain protein. From the two genes characterized as protein kinases, SPQ95766.1 corresponded to a P. brassicae effector named SSPbP22 that has been previously validated for kinase activity (Pérez-López et al., 2020). On the other hand, nudix effectors are common in bacteria, fungi and oomycetes, highlighting their importance in pathogenesis (Dong and Wang, 2016). Ectopic expression of the NUDIX domain effector Avr3b from the oomycete Phytophthora sojae reduces ROS, and its similarity with the host NUDIX regulator potentially results in negative regulation of the host immune response (Dong et al., 2011;Dong and Wang, 2016). A NUDIX hydrolase gene was also overexpressed in a pathobiome study where clubroot symptoms were severe (Daval et al., 2020).

Orthologous Pathogen-Host Interactions
We compiled all non-redundant genes corresponding to secreted proteins from our differential and covariance analysis and submitted them to a database of pathogen-host interactions 5 , which provides information on genes proven to affect the interaction with the host (Urban et al., 2020). Among the 84 non-redundant secreted proteins determined from our pipeline (Supplementary Table 7), we selected those with a matching pathogen protein displaying a minimum e-value of 1e-5, and with a demonstrated effect on virulence upon its mutation ( Table 2).
Twenty-one P. brassicae sequences corresponding to secreted proteins showed similarity to proteins from other pathogens where disruption or alteration of gene expression resulted in loss of pathogenicity, or increased or reduced virulence ( Table 2). Some of these genes, which were discussed in the preceding sections, constitute important candidates for disruption in P. brassicae. Strategies including RNA interference (siRNA and dsRNA) have been used in plants infected by biotrophic and hemibiotrophic fungi to disrupt transcripts of important effectors (Nowara et al., 2010;Bharti et al., 2017;Zhu et al., 2017;Song and Thomma, 2018;Guo et al., 2019). The interfering RNA molecules transformed into the host are expected to be exchanged between host and pathogen through a vesicular system, resulting in degradation of the pathogen RNA target gene (Ghag, 2017;Rose et al., 2019). For example, disruption of CPK1 (protein kinase catalytic subunit) resulted in a loss of pathogenicity by Colletotrichum lagenaria on cucumber ( Table 2). This protein was also disrupted through interfering RNAs using host-induced silencing in P. striiformis f. sp. tritici, through transformation of wheat plants, resulting in increased host resistance (Qi et al., 2018). Therefore, we infer that the P. brassicae query genes (SPQ97184.1, SPQ97185.1) could be suitable candidates for decreasing clubroot disease symptoms if RNA interference technology was applied.
Transcripts SPQ97950.1 and SPQ97562.1 matched MoTup1 (Magnaporthe oryzae general transcriptional repressor Tup1), which upon mutagenesis presented loss of pathogenicity ( Table 2). The most significant hit from the analysis was for SPQ95942.1, a gene with domains for thioredoxin and glucosyltransferase. Its matching hit in PHI-base is a Beta-1,6 glucan synthesis gene from the hemibiotroph Colletotrichum graminicola, which showed reduced virulence on Zea mays when RNAi was used (Oliveira-Garcia and Deising, 2016). Another candidate of interest is SPQ99747.1, which was similar to an arginine methyltransferase from Fusarium graminearum. The arginine methyltransferase (AMT1) from F. graminearum is a key gene in the methylation of ribonucleoprotein complexes that are exported from nucleus to cytoplasm for mRNA maturation, and therefore, its disruption results in decreased pathogen development and infection (Wang et al., 2012).
Finally, one of the most interesting effector candidates may be SPQ96145.1, which showed similarity to an indole-3acetaldehyde dehydrogenase from Pseudomonas syringae. We built a model of the protein (Figure 5), using an alphaaminoadipic semialdehyde dehydrogenase (SMTL:6o4i.1) that resulted in a GMQE of 0.83 and a QMEAN of -1.33, showing high reliability for the model generated. Alignment of the generated model to the template (PDB accession 6o4i.1) resulted in a root-square mean deviation (RMSD) of 0.121 for a total of 10426 atoms aligned ( Figure 5A). We also built models for aldehyde dehydrogenases A and B (Figures 5B,C). Aldehyde dehydrogenase B from P. syringae, was the match found on Table 2 for SPQ96145.1, while Aldehyde dehydrogenase A is another closely related aldehyde dehydrogenase from P. syringae involved in indole acetic acid (IAA) synthesis (McClerklin et al., 2017). Structural alignment of the SPQ9645.1 model to the AldA model resulted in a RMSD of 2.302 for a total of 8465 atoms, while a good RMSD for the alignment to AldB could only be achieved by aligning the closest atoms (760) between the structures using the cealigm option from PyMol. These results indicate good structure conservation between the P. brassicae protein and the P. syringae proteins, especially with AldA. P. syringae uses AldA  The aldehyde dehydrogenase from P. brassicae (SPQ96145.1) was predicted using Alpha-aminoadipic semialdehyde dehydrogenase (STML ID:6o4i.1.A -second structure from left to right) using Swiss-model. The structural alignment of the two structures can be seen on the third column. (B,C) Models for aldehyde dehydrogenases A and B (AldA and AldB -second column) from P. syringae which are involved in IAA synthesis, were aligned with the P. brassicae model, to produce the structural alignments seen on column 3. RMSD, root mean square deviation of all aligned atoms between the two structures, where a value of 0 is a perfect alignment. The number of aligned atoms is also reported for the three structural alignments. A Global Model Quality Estimation (GMQE) score closer to 1 represents more accuracy in the models, and a QMEAN value above -4 is indicative of good quality of the structure (https://swissmodel.expasy.org/docs/help).
and AldB to synthesize the auxin IAA, and it has been shown that disruption of these genes decreases virulence on Arabidopsis thaliana (McClerklin et al., 2017), with a higher disruption of IAA synthesis in aldA mutants. Moreover, the production of IAA by the pathogen increases virulence by disrupting SA-mediated defenses, which is complementary to increased susceptibility when auxin is generated by the host (McClerklin et al., 2017). Since auxin is one of the main players in clubroot development, but has generally been studied as a plant response hormone, a potential homologous mechanism to P. syringae (where not only the host but also P. brassicae aids in IAA production to promote virulence) could be in place. While the level of expression for this gene seems low in our study (Supplementary Table 6), a more complete study of the metabolic pathways leading to IAA production in P. brassicae, along with expression and functional analysis, would demonstrate the relevance of these key genes in infection, and open avenues for disruption of the disease.

RNAseq Data Validation With Nanostrings
We used 28 predicted effector genes showing either significant differential expression at 21 dai between P. brassicae infecting the two B. napus genotypes, or showing low levels of covariance among biological replicates and consistent high expression at 14 and 21 dai for P. brassicae from either of the two genotypes (most of the genes selected are discussed throughout the manuscript). Nanostrings nCounter technology is based on single RNA molecule detection without the need for amplification, using unique fluorescent barcodes bound to capture probes that bind to specific gene target templates (Nanostrings Technologies, 2021a). Once the probe complex is bound to the target, they can be captured on a streptavidin surface using a biotin tag from the probe, where they are immobilized for imaging. A scanner counts the number of each class of barcode, reflecting the number of RNA molecules from each gene. FIGURE 6 | Validation of RNAseq data using nCounter Nanostrings technology. We evaluated 28 target genes which were either differentially expressed 21 dai or consistently expressed in both cultivars at 14 and 21 dai. As seen from the graphs, not all scatter plots have the same number of points, which reflects different amounts of genes evaluated under the different analyses according to the obtained results. (LA) P. brassicae genes coming from the interaction with 'Laurentian,' (BR) P. brassicae genes coming from the interaction with 'Brutor,' (DEGs) Differentially Expressed Genes.
After filtering a single gene with low Nanostring counts (<100 -SPQ94523), and one biological replicate from each genotype at 14 dai, flagged during mRNA positive control normalization, we compared average log 2 RNAseq values with the average log 2 Nanostring values. To establish the stability of five housekeeping genes, we plotted log 2 counts of the five genes per sample to one random sample from either 14 or 21 dai, which confirmed that all genes were stable and could be used for differential expression analyses (Supplementary Figure 1).
We found a Pearson correlation of 0.87 at 21 dai when comparing the log 2 fold changes between the P. brassicae RNAseq TPMs of the two genotypes with the log 2 fold changes determined from the ratio counts with Nanostrings ( Figure 6). The comparisons of the RNAseq results with the Nanostrings results for consistently expressed effectors at both 14 and 21 dai for both genotypes also showed correlations close to or >0.80 (Figure 6). This demonstrates the robustness of the RNAseq results, and shows that using Nanostrings as an alternative to qRT-PCR is a valid and less cumbersome approach to validate transcriptome results.

CONCLUSION AND FUTURE DIRECTIONS
Diverse prediction tools can sometimes provide different outcomes based on distinct algorithms. For example, most tools used in this study were predictive (e.g., HMMs or Neural Networks), but some other tools were annotated databases (e.g., ProSecKB). This will result in some gene annotation predictions not being confirmed by most tools, but allows the identification of more likely secreted effectors. While tools like SecretomeP were trained for mammalian sequences and have low reliability to predict leaderless secretory proteins in plants (Lonsdale et al., 2016), such tools have been used for the prediction of nonclassical secretion in plant pathogenic fungi (Verma et al., 2016). SecretomeP is an important resource, since it predicted proteins as secreted in agreement with the protist secretome database (ProSecKB), with many of them not carrying a predicted signal peptide. This highlights the importance of looking at alternate routes of secretion and testing an array of tools for prediction.
While we only found a few transcripts differentially expressed from P. brassicae infecting either the resistant host 'Laurentian' or the susceptible host 'Brutor, ' we observed a larger number of genes being consistently regulated among the three biological replicates coming from P. brassicae when infecting the susceptible cultivar. Likewise, we observed certain common genes coming from P. brassicae when infecting both genotypes (e.g., Supplementary Table 5). We believe that P. brassicae might use some of these core effectors as a general deployment strategy to promote infection, regardless of the host interaction.
Many of the predicted effectors (e.g., BSMT) have been shown as being key players in other studies (Ludwig-Müller et al., 2015;Pérez-López et al., 2020), and as such, represent good candidates for further study. We have selected the most relevant candidates from our study to summarize their importance in previous and future studies ( Table 3). The 17 predicted secreted proteins are present in three other genome isolates: isolate Pb3 from Canada (Rolfe et al., 2016), ZJ-1 from China (Bi et al., 2019) and eH from Europe (Daval et al., 2019). These three isolates represent broad evolutionary divergence and have draft genomes which can be used for comparison. We found that all candidates selected are present in the other three genomes and have high levels of similarity (Table 3). A few hits have short unaligned regions (SPQ99629.1, SPQ96353.1, and SPQ97184.1) that may represent regions of variability, resulting in no alignment by the BLAST algorithm, or regions which might be absent in the target genome.
The presence of these putative secreted proteins in a wide variety of genomes highlights the importance of validating their roles using techniques like RNA interference. The candidates can also be used for designing high-throughput yeast-two hybrid assays to discover potential host targets of virulence. Future disruption of effector-receptor associations holds promise for understanding the basis of clubroot disease.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://www.ncbi.nlm.nih.gov/sra/?term= PRJNA597078. Sequencing reads used in this study correspond to a previous study Galindo-González et al. (2020). The reads are deposited in the NCBI Sequence Read Archive (SRA) under accession number PRJNA597078.

AUTHOR CONTRIBUTIONS
LG-G designed and conducted all the experiments, performed all analyses, and wrote the manuscript. S-FH helped secure project funding and contributed to the original experimental context. SS provided project guidance and edited several versions of the manuscript. All authors contributed to the article and approved the submitted version.