Abstract
The steadily increasing number of sequenced fungal and oomycete genomes has enabled detailed studies of how these eukaryotic microbes infect plants and cause devastating losses in food crops. During infection, fungal and oomycete pathogens secrete effector molecules which manipulate host plant cell processes to the pathogen's advantage. Proteinaceous effectors are synthesized intracellularly and must be externalized to interact with host cells. Computational prediction of secreted proteins from genomic sequences is an important technique to narrow down the candidate effector repertoire for subsequent experimental validation. In this study, we benchmark secretion prediction tools on experimentally validated fungal and oomycete effectors. We observe that for a set of fungal SwissProt protein sequences, SignalP 4 and the neural network predictors of SignalP 3 (D-score) and SignalP 2 perform best. For effector prediction in particular, the use of a sensitive method can be desirable to obtain the most complete candidate effector set. We show that the neural network predictors of SignalP 2 and 3, as well as TargetP were the most sensitive tools for fungal effector secretion prediction, whereas the hidden Markov model predictors of SignalP 2 and 3 were the most sensitive tools for oomycete effectors. Thus, previous versions of SignalP retain value for oomycete effector prediction, as the current version, SignalP 4, was unable to reliably predict the signal peptide of the oomycete Crinkler effectors in the test set. Our assessment of subcellular localization predictors shows that cytoplasmic effectors are often predicted as not extracellular. This limits the reliability of secretion predictions that depend on these tools. We present our assessment with a view to informing future pathogenomics studies and suggest revised pipelines for secretion prediction to obtain optimal effector predictions in fungi and oomycetes.
Introduction
The growing number of sequenced fungal and oomycete plant pathogen genomes has enabled detailed reverse genetics studies into molecular pathogen-host interactions (Dean et al., 2012; Kamoun et al., 2014). Though fungi and oomycetes belong to phylogenetically distinct microbial taxa, they both use a diverse class of molecules, termed effectors, to promote pathogenicity through subversion of host defenses or impairment of normal host-cell function (Dodds and Rathjen, 2010; Lo Presti et al., 2015). Effector molecules may be the products of either secondary metabolite or protein synthesis, however the majority of effectors identified in fungi and oomycetes are the latter. Proteinaceous effectors are initially synthesized intracellularly and require relocation to the extracellular space (apoplastic effectors) or subsequent import into the host cell cytoplasm or specific organelles (cytoplasmic effectors). The classical endoplasmic reticulum (ER)/Golgi-dependent secretion pathway in eukaryotes is well-defined and involves recognition of an N-terminal signal peptide that is cleaved off as the protein is translocated across the membrane (Von Heijne, 1990). Classical signal peptides can be predicted computationally with high accuracy (Menne et al., 2000; Klee and Ellis, 2005; Choo et al., 2009; Min, 2010; Melhem et al., 2013), and the majority of experimentally verified fungal and oomycete effectors are predicted to be secreted in this manner. However, reports are emerging for yet unknown, non-classical secretion pathways to also play a role in fungal and oomycete effector externalization (Ridout et al., 2006; Liu et al., 2014). Numerous eukaryotic plant pathogen effectors have been found to be active inside the host cell cytoplasm; however the knowledge of how effectors are delivered into the plant cells after secretion is fragmentary. In oomycete effectors, conserved amino acid motifs such as RXLR, CHXC, or LFLAK are positioned in N-terminal domains and define oomycete effector superfamilies (Petre and Kamoun, 2014). Although mechanisms have been proposed as to how the RXLR motif may facilitate cell entry through the host cell membrane phospholipid bilayer, the results are still controversial (Tyler et al., 2013; Wawra et al., 2013). Conserved sequence motifs associated with translocation have thus far not been found for fungal effector proteins, which makes their computational prediction from secretomes challenging (Sperschneider et al., 2015). A conserved Y/F/WxC-motif has been identified in the N-terminus of effector candidates in the barley powdery mildew fungus (Godfrey et al., 2010), however the role of this motif in cell entry or pathogenicity remains undetermined.
Several studies have exploited proteomics to experimentally identify secreted proteins involved in pathogenicity. For example, an early proteomics study of extracellular proteins of the wheat-infecting fungus Fusarium graminearum identified 120 candidates secreted in planta, of which only 56% possessed a predicted signal peptide motif (Paper et al., 2007). A later study in the same species identified only 69 secreted proteins, following growth in barley or wheat flour-based liquid cultures to mimic host-pathogen interactions (Yang et al., 2012). Of these, 70% possessed a predicted signal peptide. A recent study in the oomycete potato pathogen Phytophthora infestans predicted 80% of its extracellular proteome to contain a signal peptide (Meijer et al., 2014). Thus, there appears to be wide variability (both between species and experiments) in the number of extracellular proteins identified through experimental proteomics that are also predicted to be secreted in silico. A high percentage of proteins lacking a classical signal peptide may be due to contamination of extracellular samples with intracellular proteins, due to rupture of the fungal cells during the protein extraction procedure. Furthermore, protein extraction may be complicated in species where there is a low or variable pathogen biomass relative to the host, or that selectively secrete different proteins when grown in different in vitro or infection-mimicking cultures. Computational limitations also have the potential to complicate proteomics experiments. This may come from variability between species in their use of non-classical secretion mechanisms, which cannot yet be accurately predicted. Gene annotation is also an important determining factor for the reliability of both experimental proteomics and computational prediction of secretion. Proteomic identification of genes is dependent on the completeness and accuracy of translated gene annotations that are used to generate a searchable database of predicted trypsin-digested proteins, to which peptide mass-spectra are matched (Bringans et al., 2009). Thus, missing or incorrect gene annotations may exclude or confuse identification of extracellular proteins. Prediction of secretion also relies strongly on the presence and accurate annotation of the 5′ exons of genes, which encode N-terminal signal peptides. Due to these technical difficulties, deriving accurate computational predictions of secreted proteins from whole genome sequences remains an important pursuit in plant pathology, with a view toward efficient identification of secreted proteins for subsequent effector prediction.
The apparent ease of secretion prediction has led to its common use in pathogenomic studies as a first pass filter in narrowing down a whole proteome dataset into a short-list of potential effector candidates (Kämper et al., 2006; Raffaele et al., 2010; Rouxel et al., 2011; Hane et al., 2014; Nemri et al., 2014). A variety of software tools exists for eukaryotes that can predict whether proteins are secreted into the extracellular environment (Emanuelsson et al., 2007). Typically, this involves recognition of the N-terminal secretory signal peptide motif that directs proteins through the classical ER/Golgi-dependent pathway using tools such as SignalP (Petersen et al., 2011). Whilst this is a robust approach for defining a set of potential effector candidates, typically far more candidates are predicted for experimental validation than is feasible. Furthermore, proteins that are predicted to be secreted via a classical pathway might be retained in the ER/Golgi or fulfill roles as part of the cell wall. Therefore, subcellular localization prediction is an important tool that can point toward the functional role or interaction partners of a protein based on its amino acid sequence and can be used to assess if a protein is indeed secreted into the extracellular space (Emanuelsson, 2002). Transmembrane proteins are also commonly predicted and removed from the secretome as these are likely to fulfill functions in the pathogen cell wall. Whilst in silico methods for secretome prediction are under active development and show robust performance, their reported predictive accuracy strongly depends on the selection of the test set and independent benchmarking studies are important for an unbiased tool evaluation. For example, a comprehensive benchmark of secretion prediction tools found that predictive accuracy was in many cases lower than those initially reported by the developers (Klee and Ellis, 2005). Although an evaluation on a large test set covering a wide taxonomic spectrum gives a good indication of a tool's performance, it provides limited insight into its expected performance on a specialized set of proteins, such as effector proteins of fungal and oomycete pathogens.
This study set out to reveal the strengths and weaknesses of existing protein secretion and subcellular localization prediction methods, as applied to the identification of effector proteins produced by fungi or oomycete plant pathogens. Prediction pipelines that have been used in previous studies for defining secretomes and subsequently effector candidates of eukaryotic plant pathogens are diverse and highly parameterized, as exemplified in Table 1. For example, SignalP (Nielsen et al., 1997; Nielsen and Krogh, 1998; Bendtsen et al., 2004b; Petersen et al., 2011) or Phobius (Käll et al., 2004) are utilized by the majority of pipelines to extract proteins that are likely to be secreted via a classical pathway. Despite the availability of the latest version of SignalP 4, which was designed to discriminate between signal peptides and N-terminal transmembrane (TM) regions, previous versions (2 and 3) are still frequently used due to their increased sensitivity. Phobius was designed to predict secretion and N-terminal TM domains separately, predicting both the presence of a signal peptide and the number and location of TM helices.
Table 1
| References | Species/taxon | Secretome size (total number of predicted genes) | Positive evidence | Negative evidence | Remarks |
|---|---|---|---|---|---|
| (A) FUNGAL PATHOGENS | |||||
| Kämper et al., 2006 | Ustilago maydis | 426 (6902) | SignalP 3.0, TargetP, ProtComp 6.0 | ||
| Ma et al., 2010 | Fusarium oxysporum f. sp. lycopersici | 1803 (17,735) | SignalP 3.0 | TMHMM, Phobius, TargetP | SignalP D-score, TM allowed in the first 50 amino acids, TargetP no mitochondrial targeting with RC = 1 or 2 |
| F. verticillioides | 1549 (14,179) | ||||
| Spanu et al., 2010 | Blumeria graminis f. sp. hordei | NA (5854) | SignalP, SecretomeP | – | S-prob > 0.9, SecretomeP > 0.5 |
| Rouxel et al., 2011 | Leptosphaeria maculans | NA (12,469) | SignalP 3.0, TargetP | TMHMM | Both SignalP-NN and SignalP-HMM predict SP, TargetP predicts “secreted,” 1 TM in SP allowed |
| Duplessis et al., 2011 | Melampsora larici-populina | NA (16,399) | SignalP | TargetP, TMHMM, big-PI | S-prob > 0.9, TargetP no mitochondrial targeting with RC = 1 or 2, one TM ≥ 18 in first 60 aas allowed |
| Puccinia graminis f. sp. tritici | NA (17,773) | ||||
| Klosterman et al., 2011 | Verticillium dahliae | 780 (10,535) | SignalP 3.0, WoLF PSORT | TMHMM, Phobius | D-score > 0.5, WoLF PSORT predicted to be extracellular, no TM predicted by either TMHMM or Phobius |
| V. albo-atrum | 759 (10,221) | ||||
| Cantu et al., 2011 | P. striiformis f. sp. tritici | 1088 (20,423) | SignalP 3.0 | TMHMM, TargetP | S-prob ≥ 0.9, average cleavage site position: 24 ± 9, TM length ≥ 18 aas and not in the first 60 aas, TargetP no mitochondrial proteins (RC = 1 or 2) |
| O'Connell et al., 2012 | Colletotrichum higginsianum | 2142 (16,172) | WoLF PSORT | – | WoLF PSORT predicted to be extracellular |
| C. graminicola | 1650 (12,006) | ||||
| Ohm et al., 2012 | Dothideomycetes | NA | SignalP 3.0 | TMHMM 2.0 | One TM allowed in first 40 aas |
| Brown et al., 2012 | Fusarium graminearum | 574 (13,937) | SignalP 3.0, TargetP 1.1, ProtComp 8.0, WoLF PSORT 0.2 | TMHMM 2.0, big-PI | SignalP D-score = Y, TargetP predicts “secreted,” ProtComp extracellular or unknown, no TM in mature protein, WoLF PSORT extracellular score > 17, no GPI-anchor |
| de Wit et al., 2012 | Cladosporium fulvum | 1200 (14,127) | SignalP 3.0, WoLF PSORT | Phobius, TMHMM 2.0, TargetP 1.1, PredGPI | D-score > 0.5, TargetP no mitochondrial targeting, no GPI anchor, no TM predicted by either TMHMM or Phobius |
| Dothistroma septosporum | 905 (12,580) | ||||
| Wiemann et al., 2013 | Fusarium fujikuroi | 1336 (14,813) | SignalP 4.0, SecretomeP, WoLF PSORT | TargetP, TMHMM | SecretomeP score > 0.6, TargetP no mitochondrial targeting, TargetP RC-score < 4, Signalp S-score > 0.5 |
| Manning et al., 2013 | Pyrenophora tritici-repentis | 1146 (12,141) | SignalP 3.0, WoLF PSORT | TMHMM | WoLF PSORT predicted to be extracellular, one TM allowed unless it starts in the first 10 aas |
| Hane et al., 2014 | Rhizoctonia solani AG8 | 1959 (13,964) | SignalP 4.1, Phobius 1.01, WoLF PSORT 0.2 | Phobius 1.01 | SignalP secreted or Phobius secreted or WoLF PSORT predicted to be extracellular, only one TM domain allowed |
| Nemri et al., 2014 | Melampsora lini | 1085 (26,443) | SignalP 2.1-HMM, SignalP 4.1 | TMHMM 2.0, TargetP 1.1 | No TM, TargetP no mitochondrial targeting, D-score > 0.36, S-prob > 0.88, cleavage site predicted between amino-acid 10 and 40 using SignalP 2 |
| Guyon et al., 2014 | Sclerotinia sclerotiorum | 745 (14,503) | SignalP 2, SignalP 4 | TMHMM, GPIsom | TM after SP removed, no GPI anchor |
| (B) OOMYCETE PATHOGENS | |||||
| Haas et al., 2009 | Phytophthora infestans | NA (17,797) | SignalP 3.0 | – | – |
| Raffaele et al., 2010 | P. infestans | 1415 (18,155) | SignalP 2.0, SignalP 3.0, TargetP, PSort | TMHMM | SignalP-HMM 2.0 score ≥ 0.9, SignalP-NN 3.0 Y-max score ≥ 0.5, SignalP- NN 3.0 D-score ≥ 0.5, SignalP- HMM 3.0 S-prob ≥ 0.9, TargetP predicted localization “Secreted” (S), most probable PSort location “extracellular” (extr.) and no TMHMM predicted TM domain after signal peptide cleavage site |
| Lévesque et al., 2010 | Pythium ultimum | 747 (15,297) | SignalP 2.0 | TMHMM, TargetP | SignalP-HMM predicts signal peptide, SignalP-NN predicts a cleavage site between amino acids 10 and 40 |
| Links et al., 2011 | Albugo candida | 939 (15,824) | SignalP 3.0 | – | Either SignalP-NN or SignalP-HMM predict SP, SignalP predicts a cleavage site between amino acids 10 and 30 |
| Kemen et al., 2011 | A. laibachii | 1636 (13,032) | SignalP 3.0 | MEMSAT3 | Both neural network and hidden Markov model predict signal peptide. Proteins were considered to be without a TM domain with pnon−TM> 0.0004 or, for high stringency, pnon−TM> 0.01. |
| (C) SECRETOME STUDIES | |||||
| Lowe and Howlett, 2012 | Fungi | – | SignalP | – | – |
| Sperschneider et al., 2015 | Fungi | – | SignalP 4.1 | – | – |
| Lo Presti et al., 2015 | Fungi | – | SignalP 4.0 | TMHMM | No TMs as predicted by TMHMM 2.0c (TMHMM score < 2) |
Examples for approaches used in eukaryotic plant pathogen genomic studies that predict secreted proteins.
If provided in the original paper, version numbers of prediction tools are given.
Furthermore, there are also discrepancies in how tools are used and how thresholds for secretion are set (Table 1). For example, some studies have used the neural network scores from SignalP 2 and 3 with custom thresholds, whereas others rely on the hidden Markov model probability for predicting the presence of a signal peptide. SignalP 2 and 3 employ predictions from both a neural network (SignalP-NN) and a hidden Markov model (SignalP-HMM), whilst the latest version SignalP 4 is purely based on neural networks. SignalP 2 returns three neural network scores for each position in the sequence: a raw cleavage site score (C-score), the signal peptide score (S-score), and the combined cleavage site score (Y-score). For each sequence, it reports the maximal C-, S-, and Y-scores as well as the mean S-score between the N-terminus and the predicted cleavage site that it used to assess whether a sequence contains a signal peptide. Furthermore, it returns two hidden Markov model scores, the C-score as well as the probability that the sequence contains a signal peptide (S-probability). SignalP 3 replaces the previously used mean S-score for classification with the D-score, which is calculated as the average of the mean S-score, and the maximal Y-score. It still uses both neural network scores and calculates the signal peptide probability with a hidden Markov model. SignalP 4 is a neural network based method designed to discriminate between signal peptides and transmembrane regions. Prediction of signal peptides is based entirely on the D-score. For all scores, Boolean flags are provided which are either “Y” for a signal peptide or “N” for no signal peptide.
Subcellular localization tools such as TargetP (Emanuelsson et al., 2000), WoLF PSORT (Horton et al., 2007), or ProtComp are frequently used to complement the predictions made by SignalP or Phobius, either through a union or intersection of predictions made by these methods (Table 1). This can serve to filter proteins that may be predicted to contain a signal peptide, yet that might not be fully secreted into the extracellular space due to being retained within the ER/Golgi. TargetP predicts if a protein is secreted or localized to the mitochondria, chloroplast, or another unknown location. It reports reliability class scores from 1 to 5, where 1 corresponds to the strongest prediction. Another tool WoLF PSORT, an updated version of PSORT II, has been trained separately on fungi, animal, and plant data. It reports predicted subcellular locations (nuclear, mitochondria, cytosol, cytoskeleton, endoplasmic reticulum, plasma membrane, extracellular, chloroplast, peroxisome, Golgi apparatus, lysosome, and vacuolar membrane) in terms of respective scores based on a weighted k-nearest neighbor classifier. The output format is similar to a sequence similarity search, with scores assigned for each predicted localization site based on the number of nearest neighbors to the query protein. In most studies that employ WoLF PSORT, proteins have been predicted as secreted where extracellular predictions score higher than other locations (Table 1). Less commonly, the prediction of non-classically secreted proteins has been reported using SecretomeP, which has been trained on a very small set of verified non-classically secreted proteins derived from mammalian and bacterial sequences (Bendtsen et al., 2004a). Consequently, the relevance of SecretomeP to fungal and oomycete proteins is questionable. Finally, ProtComp is a web-server based tool combining several methods for protein localization, ranging from neural networks to sequence homology searches. Its lack of a publicly distributed version for local installation precludes it from routine use for whole-genome analysis. Predicted transmembrane proteins are typically removed from the set of predicted extracellular proteins using programs such as TMHMM (Krogh et al., 2001) or Phobius. However, most pipelines allow for the presence of one transmembrane domain in the N-terminus, as this can correspond to the signal peptide as both are predicted based on the presence of hydrophobic residues. Additionally, TargetP is often employed to eliminate proteins predicted to be targeted to mitochondria or chloroplasts. In some fungal studies, predicted GPI-anchored proteins are also removed from the set of secreted effector candidates.
The diversity of prediction pipelines shown in Table 1 illustrates an overall lack of consensus used to predict extracellular pathogen proteins, in particular for effector candidates, and presents difficulties when comparing secretome sizes across different species. Herein we benchmark the performance of individual secretion prediction tools on experimentally verified fungal and oomycete effectors and use the best-performing tools to predict extracellular proteins across fungal and oomycete pathogens. In particular, we show that for cytoplasmic effector proteins that are first secreted into the extracellular space and subsequently translocated to the host cell, protein subcellular localization predictors suffer from poor accuracy. We highlight differences in performance for secretion prediction between fungal effectors and oomycete effectors and conclude by providing practical recommendations for the computational secretion prediction for effector candidate mining from eukaryotic pathogen genomes.
Materials and methods
Various datasets were chosen for the purpose of comparing the performance of secretion prediction software tools, in the context of plant pathogenomics. Experimentally validated fungal and oomycete effector protein sequences were collected from PHIbase version 3.6 (Urban et al., 2015) and from manual literature searches (Supplementary Data Sheet 1, 2, Supplementary Table 1). For further benchmarking, representative datasets for both extracellular and intracellular proteins of the fungi were obtained by searching SwissProt database records created between 2011 and 2015 for: (1) fungal proteins that have been manually annotated as secreted (taxonomy:“Fungi [4751]” locations:(location:“Secreted [SL-0243]” evidence:manual) created:[20110101 TO 20150101]) (Supplementary Data Sheet 3); and (2) fungal proteins that have been manually annotated as localized to the nucleus (taxonomy:“Fungi [4751]” locations:(location:“Nucleus [SL-0191]” evidence:manual) created:[20110101 TO 20150101]) (Supplementary Data Sheet 4). Sequences that did not start with “M” or were shorter than 30 aas were removed. Both sets only cover proteins for which entries were created after 2011, to avoid an overlap with the training sets used for secretion prediction tools. We could not extract an equivalent set for oomycete proteins from SwissProt due to the very low number of entries for manually curated secreted proteins (four entries). Secretion prediction tools were run on a local machine, or using web servers where indicated, as in Table 2 (Results given in Supplementary Data Sheet 5). Sensitivity was calculated as TP/(TP + FN) and specificity as TN/(TN + FP), where TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives. The Matthews correlation coefficient (MCC) was calculated as .
Table 2
| Software | References | Secretion or transmembrane prediction criteria |
|---|---|---|
| SignalP 2.0 | Nielsen et al., 1997 Nielsen and Krogh, 1998 | SignalP-HMM: labeled “Y” based on S-probability (default threshold 0.5). SignalP-NN: labeled “Y” based on S-score (default threshold 0.47). |
| SignalP 3.0 | Bendtsen et al., 2004b | SignalP-HMM: labeled “Y” based on S-probability (default threshold 0.5). SignalP-NN: labeled “Y” based on D-score (default threshold 0.43). |
| SignalP 4.1 | Petersen et al., 2011 | Labeled “Y” based on D-score (default thresholds 0.45 for SignalP-noTM networks and 0.5 for SignalP-TM networks). |
| Phobius | Käll et al., 2004 | Predicted as secreted if presence of a signal peptide (SP) labeled as “Y.” Transmembrane protein if the number of predicted transmembrane segments (TM) is ≥ 1. |
| TargetP 1.1 | Emanuelsson et al., 2000 | Labeled “S” for signal peptide (“secreted”), regardless of the reliability class score. |
| WoLF PSORT 0.2 | Horton et al., 2007 | Secreted if the best score in the ranked localization list is “extracellular.” |
| ProtComp 9.0* | http://www.softberry.com/berry.phtml | Secreted if the integral prediction of protein location contains “extracellular (secreted).” |
| TMHMM 2.0 | Krogh et al., 2001 | Transmembrane protein if one or more transmembrane helices beginning outside the first 60 aas. |
Software tested in this study and the parameters under which proteins were predicted to be secreted.
All tools were run with default parameters and settings.
Run using web-server.
Results and discussion
Signalp 2, 3 and 4 show the best performance for secretion prediction on a set of fungal protein sequences
Several independent benchmark analyses have been published that compare the accuracy of secretion prediction tools. For example, Klee and Ellis (2005) evaluated a range of secretion prediction methods (SignalP 3.0, SignalP 2.0, TargetP 1.01, PrediSi, Phobius, and ProtComp 6.0) on 372 proteins from five vertebrate organisms and found that TargetP, the SignalP 3 maximum S-score and SignalP 3 D-score were the most accurate single scores. Choo et al. (2009) found that most of the tested tools were capable of reliably distinguishing secreted from non-secreted proteins, as indicated by the high specificities that were achieved. SignalP 4 has been reported by the authors to outperform previous versions of SignalP for a test set spanning eukaryotic and bacterial sequences (Petersen et al., 2011).
Min (2010) evaluated eukaryotic secretion prediction using Phobius, SignalP 3.0, TargetP, and WoLF PSORT individually and in combination with TMHMM and PS-Scan and found that for fungi the most reliable individual predictor of secretion was WoLF PSORT, but a combination of tools produced the most accurate predictions. A follow-up study including SignalP 4.0 reported WoLF PSORT as the best individual tool for fungal data and also made the general recommendation of using SignalP 4.0 over SignalP 3.0 (Melhem et al., 2013). However, the authors assign a protein as predicted to be secreted by WoLF PSORT if it features “extracellular” in the ranked localization list whereas other studies (Table 1), including ours, have used this tool quite differently requiring more stringently that the “extracellular” score is higher than that of all other sub-cellular locations. Notably, WoLF PSORT stands out amongst the tools compared in that it has been trained on a relatively extensive set of fungal proteins. However, while it performs well for fungal secreted proteins overall, when restricted to known secreted effectors its performance is markedly poorer.
For the evaluation of secretion prediction performance we utilized two data sets from the SwissProt database: one that contained fungal proteins that were manually annotated as secreted (409 proteins) and the other that contained non-secreted fungal proteins that were manually annotated as nuclear (1113 proteins). We could not extract an equivalent set for oomycete proteins from SwissProt due to the very low number of entries for manually curated secreted or nuclear proteins. All tools tested achieved high specificity in the range of 97.2–99.8%, whereas sensitivity varied more dramatically (Table 3). All versions of SignalP, Phobius, and TargetP achieved high sensitivity of more than 94.9%. In contrast, the proportion of proteins that are predicted to be extracellular by WoLF PSORT and ProtComp showed lower sensitivity at 88 and 63.3%, respectively. In terms of the Matthews correlation coefficient (MCC), SignalP 4, SignalP-NN 3 (D-score), and SignalP-NN 2 perform best (MCC = 0.96), with SignalP 2 and 3 showing slightly more sensitivity than SignalP 4, which in turn achieves marginally higher specificity. These results confirm the strong predictive performance of SignalP for secreted fungal proteins.
Table 3
| Measure | SignalP 4.1 | SignalP-NN 3.0 (D-Score) | SignalP-HMM 3.0 | SignalP-NN 2.0 | Signal-HMM 2.0 | Phobius | TargetP 1.1 | WoLF PSORT | ProtComp 9.0 |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 94.9% | 95.6% | 95.8% | 95.6% | 95.8% | 95.4% | 95.1% | 88% | 63.3% |
| Specificity | 99.7% | 99.6% | 98.4% | 99.4% | 98.2% | 98.7% | 98.3% | 99.8% | 97.2% |
| MCC | 0.96 | 0.96 | 0.94 | 0.96 | 0.94 | 0.94 | 0.93 | 0.91 | 0.68 |
Performance of secretion prediction tools applied to secreted fungal proteins sourced from SwissProt.
Sensitivity, specificity and the Matthews correlation coefficient (MCC) are shown for evaluating the performance of secretion prediction tools. All tools were run with the settings and parameters given in Table 2. The best performance in terms of MCC is marked in bold.
Differences in sensitivity of secretion prediction tools for effectors from fungi and oomycetes
In line with previous studies (Menne et al., 2000; Klee and Ellis, 2005; Choo et al., 2009; Min, 2010), we found that all tools tested achieved high specificity in secretion prediction. For effector prediction in particular, the use of a sensitive method can be desirable to obtain the most complete candidate effector set. To test the sensitivity of secretion prediction tools for effector proteins from eukaryotic plant pathogens, we collected two sets of experimentally verified fungal and oomycete effectors from the literature. In total, the test set of fungal and oomycete effectors contain 69 and 53 proteins, respectively, (Supplementary Table 1). Interestingly, the sensitivity of secretion prediction tools varied between the fungal and oomycete effector sets (Figure 1). The neural network predictors of SignalP 3 and SignalP 2 (SignalP-NN 2, SignalP-NN 3) as well as TargetP (“S” for secreted with RC scores ranging from 1 to 5) were found to be the most sensitive for fungal effectors (95.7%). In contrast, the hidden Markov model predictors of SignalP 2 and SignalP 3 (SignalP-HMM 2, SignalP-HMM 3) achieved highest sensitivity for oomycete effectors (98.1%). In general, neural networks and hidden Markov models have different strengths in pattern recognition tasks. Whereas neural networks are powerful for correlating features over a longer range, hidden Markov models are advantageous for modeling sequential regions or patterns found in signal peptides (Nielsen et al., 1999). How this could relate to the prediction of signal peptides in fungal and oomycete effectors remains to be determined.
Figure 1
From the fungal effector set, all secretion predictors, including the best-performing tools SignalP-NN 2, SignalP-NN 3, and TargetP, were consistently unable to predict a signal peptide for only three effectors: Avra10, Avrk1, and Vdlsc1 (Table 4). Similarly for the oomycete effector set, all secretion predictors including the best-performing tools SignalP 2-HMM and SignalP-HMM 3 were unable to predict a signal peptide for only a single oomycete effector (Pslsc1). These four effector proteins have been demonstrated to be secreted via non-classical pathways (Ridout et al., 2006; Liu et al., 2014). This suggests that the most sensitive methods are only likely to fail to predict the secretion of non-classically secreted effectors and that using a union of multiple methods would not necessarily improve sensitivity for this test set. At this stage the computational identification of non-classically secreted effectors remains challenging and these types of effectors require experimental validation of their secretion. In the future, an increased understanding of non-classical secretion mechanisms of fungal and oomycete effectors might lead to improved computational prediction of these effectors. Protein tribe clustering with subsequent examination of high-priority effector candidate families (Saunders et al., 2012) or the presence of conserved protein domains has been effectively applied to identify related effector candidates lacking a predicted signal peptide. However, as the vast majority of fungal effectors share little sequence homology, the utility of this method is limited. Furthermore, orthologs of a secreted protein are not necessarily also secreted (Poppe et al., 2015). Therefore, secretomes predicted through the additional use of reciprocal BLASTs and/or tribe analysis are likely to include a high number of false positives.
Table 4
| SignalP 4.1 | SignalP-NN 3.0 (D-Score) | SignalP-HMM 3.0 | SignalP-NN 2.0 | Signal-HMM 2.0 | Phobius | TargetP 1.1 | |
|---|---|---|---|---|---|---|---|
| Fungal effectors | Avra10 Avrk1 Vdlsc1 AvrP4 SIX5 | Avra10 Avrk1 Vdlsc1 | Avra10 Avrk1 Vdlsc1 Avr-Pita2 | Avra10 Avrk1 Vdlsc1 | Avra10 Avrk1 Vdlsc1 Avr-Pita2 | Avra10 Avrk1 Vdlsc1 Avr-Pita2 AvrP123 | Avra10 Avrk1 Vdlsc1 |
| Oomycete effectors | Pslsc1 CRN1 CRN2 CRN15 CRN16 CRN63 CRN115 Avr1k ATR5 | Pslsc1 CRN1 CRN15 CRN16 ATR5 | Pslsc1 | Pslsc1 CRN16 | Pslsc1 | Pslsc1 CRN16 | Pslsc1 CRN16 Avr1k ATR5 |
Fungal and oomycete effectors that were not predicted to be secreted by the prediction tools tested.
TargetP predicted signal peptides with the highest reliability class (RC = 1) for only 63.8% of fungal effectors and for 56.6% of oomycete effectors (Figure 2). Without a restriction on the reliability class (RC from 1 to 5), TargetP predicted “secreted” as the localization for 95.6% of the fungal effectors (three effectors were predicted as “unknown”), whereas it returned “secreted” for 92.4% of the oomycete effectors (two effectors were predicted as “unknown” and two were predicted as “mitochondrial”). Therefore, a restriction on the predicted reliability class should not be used for predicting the secretion of effectors and the exclusion of proteins predicted to be localized to mitochondria has to be used with caution for oomycete effectors.
Figure 2
The relatively poor performance of SignalP 4 for oomycete effectors (Figure 1, sensitivity 83%) is surprising and suggests that previous versions of SignalP (SignalP 2, SignalP 3) should be used for effector mining in oomycete genomes instead. In particular, SignalP 4 does not predict a signal peptide for six out of seven Crinkler effectors in the test set (Table 4; CRN1, CRN2, CRN8, CRN15, CRN16, CRN63, CRN115). Crinkler effectors are a large family of modular proteins that are translocated into host cells, featuring a signal peptide followed by a LXLFLAK sequence motif and C-terminal domains (Haas et al., 2009; Schornack et al., 2010). On the set of seven Crinkler effectors, SignalP 4 achieves the lowest sensitivity, whereas the hidden Markov model predictors of SignalP 2 and SignalP 3 (SignalP-HMM 2, SignalP-HMM 3) correctly predict the signal peptide in all seven Crinklers (Table 4). This exemplifies the substantial benefits of using previous versions of SignalP (SignalP 2, SignalP 3) for oomycete effector mining.
Signal peptide prediction tools such as SignalP return the set of proteins that are likely to carry a signal peptide for the classical pathway, but do not necessarily imply that a protein will be extracellular. Many proteins with a signal peptide are retained in various cellular compartments and thus, signal peptide prediction is often combined with additional evidence for extracellular protein secretion, such as the absence of transmembrane domains, GPI anchors or retention signals (Table 1). We found that no transmembrane regions outside the signal peptide region (first 60 aas) were predicted for any of the 69 fungal effectors using TMHMM or Phobius. For the 53 oomycete effectors, TMHMM and Phobius both return one transmembrane helix outside the signal peptide region for the RXLR effector PITG_03192. This might be an indication that TMHMM and Phobius can be used as a preliminary filter to exclude proteins with multiple, non-N-terminal transmembrane domains for effector mining in fungi. However, these tools should be used with less stringent requirements for effector prediction in oomycetes.
Subcellular localization prediction tools should not be used for predicting effector secretion
Prediction of subcellular localization is important for inferring hints about a protein's function. In eukaryotes, a number of compartments exist to which proteins may be localized, e.g., the extracellular space, mitochondria, chloroplast, nucleus, peroxisome, cytosol or plasma membrane. Several plant pathogenomics studies have used the subcellular localization of “extracellular” as a criterion for predicting secretion, commonly using WoLF PSORT which has been trained separately on fungi, animal and plant data. However, we found that applying WoLF PSORT (fungi) to the sets of experimentally verified fungal and oomycete effectors returned 25 cytoplasmic effectors that are not predicted to be extracellular (34.2% of cytoplasmic effectors, Figure 3). This could be explained as follows. First, the estimated sensitivity and specificity of WoLF PSORT is fairly low at around 70% (Horton et al., 2007), which might lead to a high number of false predictions. However, we found that false predictions occurred in particular for non-apoplastic effectors (Figure 3). It is possible that WoLF PSORT may have predicted a signal for host cell localization in effectors rather than for the extracellular secretion of the effector from the pathogen cell. Thus, WoLF PSORT should be used with caution when predicting secretomes and its “extracellular” predictions should not be solely relied upon for effector prediction. An alternative approach is to impose a high level of stringency to WoLF PSORT predictions, as was the case for the F. graminearum secretome in which proteins were reported as secreted if the extracellular score was >17 (Brown et al., 2012). Whilst this practice is likely to drastically reduce the number of false positives in the secretome, it is prone to miss bona fide effectors that are not predicted to be extracellularly localized. In this study, of the oomycete Crinkler effectors CRN1, CRN2, CRN8, CRN15 and CRN16 which are known to localize to the host cell nucleus (Schornack et al., 2010), WoLF PSORT only predicted a nuclear localization for CRN16. Therefore, the predictions of subcellular localization tools may need to be used with caution in effector prediction studies.
Figure 3
Practical recommendations for prediction of extracellular proteins in fungi and oomycetes
In this study, we have assessed the performance of various secretion and subcellular localization prediction tools, when applied to datasets derived from known fungal and oomycete effectors, as well as extracellular and intracellular fungal proteins. Based on our benchmarking, we deduce recommendations for extracellular protein prediction in fungal and oomycete pathogen genomes.
We observe that previous versions of SignalP (2, 3) demonstrate increased sensitivity over the latest version (4.1) for predicting signal peptides of oomycete effectors, with the HMM-based methods outperforming the NN-based methods. Indeed, this has formed the basis for the pipeline PexFinder (Phytophthora Extracellular Proteins Finder), which automates identification of oomycete extracellular proteins from EST data (Torto et al., 2003). PexFinder uses SignalP 2.0 but applies an additional logical filter that predicts a protein to be secreted only if both the hidden Markov model predicts a signal peptide and the neural network predicts a cleavage site between amino acids 10 and 40. Whilst this pipeline was proposed over a decade ago, it still retains its value for mining effectors from oomycete genomes.
In contrast with oomycete effectors, the NN predictors of SignalP 2 and 3, as well as TargetP, were observed to be the most sensitive for predicting signal peptides of fungal effector proteins. Unlike oomycete effectors, no TM domains were predicted outside the N-terminal signal peptide region using TMHMM or Phobius. Therefore, we propose that for fungal effector mining the requirement of a predicted signal peptide using either SignalP-NN 2 or 3, a TargetP localization prediction of “secreted” or “unknown” (with no restriction on the RC score) and a lack of transmembrane domains outside the signal peptide region (TMHMM/Phobius) would be a robust method. Applying this proposed pipeline to publicly available fungal genomes (some with secretome predictions given in Table 1) highlights the wide variability in the number of predicted secreted proteins produced by the different techniques used in previously published studies (Figure 4). In line with previous reports, we observe a higher percentage of proteins that are predicted to be secreted in pathogens with a biotrophic phase, compared to necrotrophs and saprophytes (Lowe and Howlett, 2012; Lo Presti et al., 2015). By our method, similar numbers of predicted secreted proteins were predicted across multiple species of the same trophic class, whereas reported numbers were highly variable in genome survey publications for these species (Figure 4).
Figure 4
Conclusion
Prediction of effector proteins is of vital importance to the field of plant pathology, and relies heavily on the strengths or weaknesses of secretion prediction software. In this study, we assess the performance of popular software tools against known effectors of both the fungi and oomycetes and offer recommendations on which may be better suited to specialized applications. However, such performance evaluations inevitably vary based on the test data sets used, and therefore, we advise readers to carefully consider the suitability of these recommendations to their own data. Based on the results discussed herein, we recommend the use of the neural network predictors of SignalP 2 or 3, a TargetP localization prediction of “secreted” as well as transmembrane protein removal using either TMHMM or Phobius as a robust choice for predicting the secretion of fungal effectors. In comparison, the hidden Markov model predictors of SignalP 2 and 3 perform best for predicting the signal peptide of oomycete effectors and automated pipelines such as PexFinder retain their value (Torto et al., 2003). However, the secretome includes many proteins unrelated to pathogenicity, and a number of additional conditions must be subsequently assessed in order to arrive at a subset that represents a potential set of effectors. In oomycetes, this can be achieved using motif enrichment analysis based on RXLR or Crinkler effector families (Petre and Kamoun, 2014), whereas in fungi this process is not feasible and alternative criteria such as small size, an enrichment in cysteines, genomic location, or signatures of diversifying selection can be used (Sperschneider et al., 2015).
While the reliability of secretion prediction is highly relevant to effector prediction, one must not overlook the potential for errors to arise from prior steps involved in the generation of sequence resources. The annotation of gene structure in effector genes can be particularly error-prone for various reasons, stemming from idiosyncrasies related to their genomic context for example their tendency to be associated with repetitive regions of the genome (Raffaele and Kamoun, 2012). There is potential for errors to occur in the assembled genome sequence, especially for those assembled from short-read data only, and the subsequent use of automated annotation pipelines can contribute to inaccurate or fragmented gene predictions. If this occurs in the 5′ region it can lead to misprediction of N-terminal signal peptides. We also note that due to high gene density in fungi that transcript UTRs of adjacent gene loci frequently overlap (Guida et al., 2011; Wang et al., 2014), potentially resulting in gene annotations that are merged products of two or more adjacent loci. Therefore, the use of RNA-seq-based annotation methods specifically designed for fungi (Reid et al., 2014; Testa et al., 2015) can be beneficial to arrive at an optimal set of gene annotations for subsequent secretion prediction.
Our results showed that one of the areas that is currently suffering from poor accuracy is the prediction of subcellular localization for effector proteins that are first secreted from the fungus and then targeted to a host organelle. In particular, we recommend that the requirement of extracellular localization as predicted by WoLF PSORT should not be used for effector mining in secretomes. Re-training subcellular localization tools with updated data sets including experimentally validated effectors might help to improve accuracy. There are few well-studied fungal effectors with confirmed host-localization, one being the SP7 effector of the arbuscular mycorrhiza Glomus intraradices (Kloppholz et al., 2011). SP7 is initially secreted to the apoplast, then imported into the host cell, and then into its nucleus. This localization is determined by multiple motifs, including a signal peptide, nuclear localization domain and an array of imperfect tandem hydrophilic repeats possibly involved in membrane integration. Both TargetP and WoLF PSORT predicted that the complete version of SP7 was secreted, however after removal of the signal peptide based on SignalP analysis, the TargetP prediction changed to “other” and WoLF PSORT (plant mode) predicted nuclear localization. Intriguingly, this suggests that subcellular localization prediction has the potential to become a powerful tool for providing insight into potential modes of action for candidate effectors based on their organelle targets. Additionally, there are currently no tools designed to predict proteins secreted in a non-classical manner that have been specifically trained on either fungi or oomycetes sequences due to a lack of training data. Although tools like SecretomeP are able to predict some cases (Liu et al., 2014), in the future refined tools for non-classical secretion prediction could be a source of significant improvements in effector prediction.
In summary, whilst existing methods for signal peptide prediction achieve high accuracy, the main areas for improving eukaryotic effector secretion prediction will come from advances in subcellular localization prediction tools as well as from investigations of non-classical secretion pathways and improved gene prediction tools for pathogen genomes.
Funding
JS was partially supported by the Australian Grains Research and Development Corporation.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Statements
Author contributions
JS conceived the study and all authors contributed to the design of the study. JS, AW, and JH acquired, analyzed and interpreted the data. All authors drafted the manuscript and approved the final version.
Acknowledgments
We thank Dr. Louise Thatcher and Dr. Ian Dry for their constructive feedback on this work.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary material
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2015.01168
References
1
AmselemJ.CuomoC. A.Van KanJ. A.ViaudM.BenitoE. P.CoulouxA.et al. (2011). Genomic analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea. PLoS Genet.7:e1002230. 10.1371/journal.pgen.1002230
2
AndersenM. R.SalazarM. P.SchaapP. J.van de VondervoortP. J.CulleyD.ThykaerJ.et al. (2011). Comparative genomics of citric-acid-producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88. Genome Res.21, 885–897. 10.1101/gr.112169.110
3
BendtsenJ. D.JensenL. J.BlomN.Von HeijneG.BrunakS. (2004a). Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel.17, 349–356. 10.1093/protein/gzh037
4
BendtsenJ. D.NielsenH.Von HeijneG.BrunakS. (2004b). Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol.340, 783–795. 10.1016/j.jmb.2004.05.028
5
BringansS.HaneJ. K.CaseyT.TanK. C.LipscombeR.SolomonP. S.et al. (2009). Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum. BMC Bioinformatics10:301. 10.1186/1471-2105-10-301
6
BrownN. A.AntoniwJ.Hammond-KosackK. E. (2012). The predicted secretome of the plant pathogenic fungus Fusarium graminearum: a refined comparative analysis. PLoS ONE7:e33731. 10.1371/journal.pone.0033731
7
CantuD.GovindarajuluM.KozikA.WangM.ChenX.KojimaK. K.et al. (2011). Next generation sequencing provides rapid access to the genome of Puccinia striiformis f. sp. tritici, the causal agent of wheat stripe rust. PLoS ONE6:e24230. 10.1371/journal.pone.0024230
8
ChooK. H.TanT. W.RanganathanS. (2009). A comprehensive assessment of N-terminal signal peptides prediction methods. BMC Bioinformatics10(Suppl 15):S2. 10.1186/1471-2105-10-S15-S2
9
DeanR.Van KanJ. A.PretoriusZ. A.Hammond-KosackK. E.Di PietroA.SpanuP. D.et al. (2012). The Top 10 fungal pathogens in molecular plant pathology. Mol. Plant Pathol.13, 414–430. 10.1111/j.1364-3703.2011.00783.x
10
DeanR. A.TalbotN. J.EbboleD. J.FarmanM. L.MitchellT. K.OrbachM. J.et al. (2005). The genome sequence of the rice blast fungus Magnaporthe grisea. Nature434, 980–986. 10.1038/nature03449
11
de WitP. J.van Der BurgtA.ÖkmenB.StergiopoulosI.Abd-ElsalamK. A.AertsA. L.et al. (2012). The genomes of the fungal plant pathogens Cladosporium fulvum and Dothistroma septosporum reveal adaptation to different hosts and lifestyles but also signatures of common ancestry. PLoS Genet.8:e1003088. 10.1371/journal.pgen.1003088
12
DoddsP. N.RathjenJ. P. (2010). Plant immunity: towards an integrated view of plant-pathogen interactions. Nat. Rev. Genet.11, 539–548. 10.1038/nrg2812
13
DuplessisS.CuomoC. A.LinY. C.AertsA.TisserantE.Veneault-FourreyC.et al. (2011). Obligate biotrophy features unraveled by the genomic analysis of rust fungi. Proc. Natl. Acad. Sci. U.S.A.108, 9166–9171. 10.1073/pnas.1019315108
14
EmanuelssonO. (2002). Predicting protein subcellular localisation from amino acid sequence information. Brief. Bioinformatics3, 361–376. 10.1093/bib/3.4.361
15
EmanuelssonO.BrunakS.Von HeijneG.NielsenH. (2007). Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc.2, 953–971. 10.1038/nprot.2007.131
16
EmanuelssonO.NielsenH.BrunakS.Von HeijneG. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol.300, 1005–1016. 10.1006/jmbi.2000.3903
17
FloudasD.BinderM.RileyR.BarryK.BlanchetteR. A.HenrissatB.et al. (2012). The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science336, 1715–1719. 10.1126/science.1221748
18
GalaganJ. E.CalvoS. E.BorkovichK. A.SelkerE. U.ReadN. D.JaffeD.et al. (2003). The genome sequence of the filamentous fungus Neurospora crassa. Nature422, 859–868. 10.1038/nature01554
19
GalaganJ. E.CalvoS. E.CuomoC.MaL. J.WortmanJ. R.BatzoglouS.et al. (2005). Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature438, 1105–1115. 10.1038/nature04341
20
GodfreyD.BohleniusH.PedersenC.ZhangZ.EmmersenJ.Thordal-ChristensenH. (2010). Powdery mildew fungal effector candidates share N-terminal Y/F/WxC-motif. BMC Genomics11:317. 10.1186/1471-2164-11-317
21
GoffeauA.BarrellB. G.BusseyH.DavisR. W.DujonB.FeldmannH.et al. (1996). Life with 6000 genes. Science 274:546, 563–547. 10.1126/science.274.5287.546
22
GuidaA.LindstädtC.MaguireS. L.DingC.HigginsD. G.CortonN. J.et al. (2011). Using RNA-seq to determine the transcriptional landscape and the hypoxic response of the pathogenic yeast Candida parapsilosis. BMC Genomics12:628. 10.1186/1471-2164-12-628
23
GuyonK.BalaguéC.RobyD.RaffaeleS. (2014). Secretome analysis reveals effector candidates associated with broad host range necrotrophy in the fungal plant pathogen Sclerotinia sclerotiorum. BMC Genomics15:336. 10.1186/1471-2164-15-336
24
HaasB. J.KamounS.ZodyM. C.JiangR. H.HandsakerR. E.CanoL. M.et al. (2009). Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature461, 393–398. 10.1038/nature08358
25
HaneJ. K.AndersonJ. P.WilliamsA. H.SperschneiderJ.SinghK. B. (2014). Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet.10:e1004281. 10.1371/journal.pgen.1004281
26
HaneJ. K.LoweR. G.SolomonP. S.TanK. C.SchochC. L.SpataforaJ. W.et al. (2007). Dothideomycete plant interactions illuminated by genome sequencing and EST analysis of the wheat pathogen Stagonospora nodorum. Plant Cell19, 3347–3368. 10.1105/tpc.107.052829
27
HortonP.ParkK. J.ObayashiT.FujitaN.HaradaH.Adams-CollierC. J.et al. (2007). WoLF PSORT: protein localization predictor. Nucleic Acids Res.35, W585–W587. 10.1093/nar/gkm259
28
KällL.KroghA.SonnhammerE. L. (2004). A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol.338, 1027–1036. 10.1016/j.jmb.2004.03.016
29
KamounS.FurzerO.JonesJ. D.JudelsonH. S.AliG. S.DalioR. J.et al. (2014). The Top 10 oomycete pathogens in molecular plant pathology. Mol. Plant Pathol. 16, 413–434. 10.1111/mpp.12190
30
KämperJ.KahmannR.BolkerM.MaL. J.BrefortT.SavilleB. J.et al. (2006). Insights from the genome of the biotrophic fungal plant pathogen Ustilago maydis. Nature444, 97–101. 10.1038/nature05248
31
KemenE.GardinerA.Schultz-LarsenT.KemenA. C.BalmuthA. L.Robert-SeilaniantzA.et al. (2011). Gene gain and loss during evolution of obligate parasitism in the white rust pathogen of Arabidopsis thaliana. PLoS Biol.9:e1001094. 10.1371/journal.pbio.1001094
32
KleeE. W.EllisL. B. (2005). Evaluating eukaryotic secreted protein prediction. BMC Bioinformatics6:256. 10.1186/1471-2105-6-256
33
KloppholzS.KuhnH.RequenaN. (2011). A secreted fungal effector of Glomus intraradices promotes symbiotic biotrophy. Curr. Biol.21, 1204–1209. 10.1016/j.cub.2011.06.044
34
KlostermanS. J.SubbaraoK. V.KangS.VeroneseP.GoldS. E.ThommaB. P.et al. (2011). Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS Pathog.7:e1002137. 10.1371/journal.ppat.1002137
35
KroghA.LarssonB.von HeijneG.SonnhammerE. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol.305, 567–580. 10.1006/jmbi.2000.4315
36
LévesqueC. A.BrouwerH.CanoL.HamiltonJ. P.HoltC.HuitemaE.et al. (2010). Genome sequence of the necrotrophic plant pathogen Pythium ultimum reveals original pathogenicity mechanisms and effector repertoire. Genome Biol.11, R73. 10.1186/gb-2010-11-7-r73
37
LinksM. G.HolubE.JiangR. H.SharpeA. G.HegedusD.BeynonE.et al. (2011). De novo sequence assembly of Albugo candida reveals a small genome relative to other biotrophic oomycetes. BMC Genomics12:503. 10.1186/1471-2164-12-503
38
LiuT.SongT.ZhangX.YuanH.SuL.LiW.et al. (2014). Unconventionally secreted effectors of two filamentous pathogens target plant salicylate biosynthesis. Nat. Commun.5, 4686. 10.1038/ncomms5686
39
Lo PrestiL.LanverD.SchweizerG.TanakaS.LiangL.TollotM.et al. (2015). Fungal effectors and plant susceptibility. Annu. Rev. Plant Biol.66, 513–545. 10.1146/annurev-arplant-043014-114623
40
LoweR. G.HowlettB. J. (2012). Indifferent, affectionate, or deceitful: lifestyles and secretomes of fungi. PLoS Pathog.8:e1002515. 10.1371/journal.ppat.1002515
41
MaL. J.Van der DoesH. C.BorkovichK. A.ColemanJ. J.DaboussiM. J.Di PietroA.et al. (2010). Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature464, 367–373. 10.1038/nature08850
42
MachidaM.AsaiK.SanoM.TanakaT.KumagaiT.TeraiG.et al. (2005). Genome sequencing and analysis of Aspergillus oryzae. Nature438, 1157–1161. 10.1038/nature04300
43
ManningV. A.PandelovaI.DhillonB.WilhelmL. J.GoodwinS. B.BerlinA. M.et al. (2013). Comparative genomics of a plant-pathogenic fungus, Pyrenophora tritici-repentis, reveals transduplication and the impact of repeat elements on pathogenicity and population divergence. G3 (Bethesda).3, 41–63. 10.1534/g3.112.004044
44
MartinF.AertsA.AhrénD.BrunA.DanchinE. G.DuchaussoyF.et al. (2008). The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature452, 88–92. 10.1038/nature06556
45
MartinezD.BerkaR. M.HenrissatB.SaloheimoM.ArvasM.BakerS. E.et al. (2008). Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nat. Biotechnol.26, 553–560. 10.1038/nbt1403
46
MeijerH. J.MancusoF. M.EspadasG.SeidlM. F.ChivaC.GoversF.et al. (2014). Profiling the secretome and extracellular proteome of the potato late blight pathogen Phytophthora infestans. Mol. Cell. Proteomics13, 2101–2113. 10.1074/mcp.M113.035873
47
MelhemH.MinX. J.ButlerG. (2013). The impact of SignalP 4.0 on the prediction of secreted proteins, in Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2013 IEEE Symposium on: IEEE (Singapore), 16–22.
48
MenneK. M.HermjakobH.ApweilerR. (2000). A comparison of signal sequence prediction methods using a test set of signal peptides. Bioinformatics16, 741–742. 10.1093/bioinformatics/16.8.741
49
MinX. J. (2010). Evaluation of computational methods for secreted protein prediction in different eukaryotes. J. Proteomics Bioinform.3, 143–147. 10.4172/jpb.1000133
50
MorinE.KohlerA.BakerA. R.Foulongne-OriolM.LombardV.NagyL. G.et al. (2012). Genome sequence of the button mushroom Agaricus bisporus reveals mechanisms governing adaptation to a humic-rich ecological niche. Proc. Natl. Acad. Sci. U.S.A.109, 17501–17506. 10.1073/pnas.1206847109
51
NemriA.SaundersD. G. O.AndersonC.UpadhyayaN.WinJ.LawrenceG. J.et al. (2014). The genome sequence and effector complement of the flax rust pathogen Melampsora lini. Front. Plant Sci.5:98. 10.3389/fpls.2014.00098
52
NielsenH.BrunakS.von HeijneG. (1999). Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng.12, 3–9. 10.1093/protein/12.1.3
53
NielsenH.EngelbrechtJ.BrunakS.von HeijneG. (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng.10, 1–6. 10.1093/protein/10.1.1
54
NielsenH.KroghA. (1998). Prediction of signal peptides and signal anchors by a hidden Markov model. Proc. Int. Conf. Intell. Syst. Mol. Biol.6, 122–130.
55
O'ConnellR. J.ThonM. R.HacquardS.AmyotteS. G.KleemannJ.TorresM. F.et al. (2012). Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses. Nat. Genet.44, 1060–1065. 10.1038/ng.2372
56
OhmR. A.FeauN.HenrissatB.SchochC. L.HorwitzB. A.BarryK. W.et al. (2012). Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathog.8:e1003037. 10.1371/journal.ppat.1003037
57
OhmR. A.RileyR.SalamovA.MinB.ChoiI. G.GrigorievI. V. (2014). Genomics of wood-degrading fungi. Fungal Genet. Biol.72, 82–90. 10.1016/j.fgb.2014.05.001
58
PaperJ. M.Scott-CraigJ. S.AdhikariN. D.CuomoC. A.WaltonJ. D. (2007). Comparative proteomics of extracellular proteins in vitro and in planta from the pathogenic fungus Fusarium graminearum. Proteomics7, 3171–3183. 10.1002/pmic.200700184
59
PetersenT. N.BrunakS.von HeijneG.NielsenH. (2011). SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods8, 785–786. 10.1038/nmeth.1701
60
PetreB.KamounS. (2014). How do filamentous pathogens deliver effector proteins into plant cells?PLoS Biol.12:e1001801. 10.1371/journal.pbio.1001801
61
PoppeS.DorsheimerL.HappelP.StukenbrockE. H. (2015). Rapidly evolving genes are key players in host specialization and virulence of the fungal wheat pathogen Zymoseptoria tritici (Mycosphaerella graminicola). PLoS Pathog.11:e1005055. 10.1371/journal.ppat.1005055
62
RaffaeleS.KamounS. (2012). Genome evolution in filamentous plant pathogens: why bigger can be better. Nat. Rev. Microbiol.10, 417–430. 10.1038/nrmicro2790
63
RaffaeleS.WinJ.CanoL. M.KamounS. (2010). Analyses of genome architecture and gene expression reveal novel candidate virulence factors in the secretome of Phytophthora infestans. BMC Genomics11:637. 10.1186/1471-2164-11-637
64
ReidI.O'tooleN.ZabanehO.NourzadehR.DahdouliM.AbdellateefM.et al. (2014). SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models. BMC Bioinformatics15:229. 10.1186/1471-2105-15-229
65
RidoutC. J.SkamniotiP.PorrittO.SacristanS.JonesJ. D.BrownJ. K. (2006). Multiple avirulence paralogues in cereal powdery mildew fungi may contribute to parasite fitness and defeat of plant resistance. Plant Cell18, 2402–2414. 10.1105/tpc.106.043307
66
RouxelT.GrandaubertJ.HaneJ. K.HoedeC.van de WouwA. P.CoulouxA.et al. (2011). Effector diversification within compartments of the Leptosphaeria maculans genome affected by Repeat-Induced Point mutations. Nat. Commun.2, 202. 10.1038/ncomms1189
67
SaundersD. G.WinJ.CanoL. M.SzaboL. J.KamounS.RaffaeleS. (2012). Using hierarchical clustering of secreted protein families to classify and rank candidate effectors of rust fungi. PLoS ONE7:e29847. 10.1371/journal.pone.0029847
68
SchornackS.Van DammeM.BozkurtT. O.CanoL. M.SmokerM.ThinesM.et al. (2010). Ancient class of translocated oomycete effectors targets the host nucleus. Proc. Natl. Acad. Sci. U.S.A.107, 17421–17426. 10.1073/pnas.1008491107
69
SpanuP. D.AbbottJ. C.AmselemJ.BurgisT. A.SoanesD. M.StuberK.et al. (2010). Genome expansion and gene loss in powdery mildew fungi reveal tradeoffs in extreme parasitism. Science330, 1543–1546. 10.1126/science.1194573
70
SperschneiderJ.DoddsP. N.GardinerD. M.SinghK. B.MannersJ. M.TaylorJ. M. (2015). Advances and challenges in computational prediction of effectors from plant pathogenic fungi. PLoS Pathog.11:e1004806. 10.1371/journal.ppat.1004806
71
StajichJ. E.WilkeS. K.AhrénD.AuC. H.BirrenB. W.BorodovskyM.et al. (2010). Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus). Proc. Natl. Acad. Sci. U.S.A.107, 11889–11894. 10.1073/pnas.1003391107
72
TestaA. C.HaneJ. K.EllwoodS. R.OliverR. P. (2015). CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts. BMC Genomics16:170. 10.1186/s12864-015-1344-4
73
TortoT. A.LiS.StyerA.HuitemaE.TestaA.GowN. A.et al. (2003). EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen Phytophthora. Genome Res.13, 1675–1685. 10.1101/gr.910003
74
TylerB. M.KaleS. D.WangQ.TaoK.ClarkH. R.DrewsK.et al. (2013). Microbe-independent entry of oomycete RxLR effectors and fungal RxLR-like effectors into plant and animal cells is specific and reproducible. Mol. Plant Microbe Interact. 26, 611–616. 10.1094/MPMI-02-13-0051-IA
75
UrbanM.PantR.RaghunathA.IrvineA. G.PedroH.Hammond-KosackK. E. (2015). The Pathogen-Host Interactions database (PHI-base): additions and future developments. Nucleic Acids Res.43, D645–D655. 10.1093/nar/gku1165
76
Von HeijneG. (1990). The signal peptide. J. Membr. Biol.115, 195–201. 10.1007/BF01868635
77
WangL.JiangN.WangL.FangO.LeachL. J.HuX.et al. (2014). 3' Untranslated regions mediate transcriptional interference between convergent genes both locally and ectopically in Saccharomyces cerevisiae. PLoS Genet.10:e1004021. 10.1371/journal.pgen.1004021
78
WawraS.DjameiA.AlbertI.NürnbergerT.KahmannR.Van WestP. (2013). In vitro translocation experiments with RxLR-reporter fusion proteins of Avr1b from Phytophthora sojae and AVR3a from Phytophthora infestans fail to demonstrate specific autonomous uptake in plant and animal cells. Mol. Plant Microbe Interact.26, 528–536. 10.1094/MPMI-08-12-0200-R
79
WickerT.OberhaensliS.ParlangeF.BuchmannJ. P.ShatalinaM.RofflerS.et al. (2013). The wheat powdery mildew genome shows the unique evolution of an obligate biotroph. Nat. Genet.45, 1092–1096. 10.1038/ng.2704
80
WiemannP.SieberC. M.von BargenK. W.StudtL.NiehausE. M.EspinoJ. J.et al. (2013). Deciphering the cryptic genome: genome-wide analyses of the rice pathogen Fusarium fujikuroi reveal complex regulation of secondary metabolism and novel metabolites. PLoS Pathog.9:e1003475. 10.1371/journal.ppat.1003475
81
YangF.JensenJ. D.SvenssonB.JorgensenH. J.CollingeD. B.FinnieC. (2012). Secretomics identifies Fusarium graminearum proteins involved in the interaction with barley and wheat. Mol. Plant Pathol.13, 445–453. 10.1111/j.1364-3703.2011.00759.x
Summary
Keywords
signal peptide prediction, effectors, protein secretion, fungi, oomycetes, plant pathogens
Citation
Sperschneider J, Williams AH, Hane JK, Singh KB and Taylor JM (2015) Evaluation of Secretion Prediction Highlights Differing Approaches Needed for Oomycete and Fungal Effectors. Front. Plant Sci. 6:1168. doi: 10.3389/fpls.2015.01168
Received
05 June 2015
Accepted
07 December 2015
Published
23 December 2015
Volume
6 - 2015
Edited by
Marc-Henri Lebrun, Institut National de la Recherche Agronomique, France
Reviewed by
Guus Bakkeren, Agriculture & Agri-Food Canada, Canada; Gregor Langen, University of Cologne, Germany; Marc-Henri Lebrun, Institut National de la Recherche Agronomique, France
Updates
Copyright
© 2015 Sperschneider, Williams, Hane, Singh and Taylor.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jana Sperschneider jana.sperschneider@csiro.au
This article was submitted to Plant Biotic Interactions, a section of the journal Frontiers in Plant Science
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.