Prediction of Host-Specific Genes by Pan-Genome Analyses of the Korean Ralstonia solanacearum Species Complex

The soil-borne pathogenic Ralstonia solanacearum species complex (RSSC) is a group of plant pathogens that is economically destructive worldwide and has a broad host range, including various solanaceous plants, banana, ginger, sesame, and clove. Previously, Korean RSSC strains isolated from samples of potato bacterial wilt were grouped into four pathotypes based on virulence tests against potato, tomato, eggplant, and pepper. In this study, we sequenced the genomes of 25 Korean RSSC strains selected based on these pathotypes. The newly sequenced genomes were analyzed to determine the phylogenetic relationships between the strains using average nucleotide identity values, and were structurally compared via multiple genome alignment using Mauve software. To identify candidate genes responsible for the host specificity of the pathotypes, functional genome comparisons were conducted by analyzing pan-genome orthologous groups (POGs) and type III secretion system effectors (T3es). POG analyses revealed that 128 genes were shared only among tomato-non-pathogenic strains, 8 genes among tomato-pathogenic strains, 5 genes among eggplant-non-pathogenic strains, 7 genes among eggplant-pathogenic strains, 1 gene among pepper-non-pathogenic strains, and 34 genes among pepper-pathogenic strains. When we analyzed T3es, three host-specific effectors were predicted: RipS3 (SKWP3) and RipH3 (HLK3) were found only in tomato-pathogenic strains, and RipAC (PopC) was found only in eggplant-pathogenic strains. Overall, we identified in silico host-specific genes and effectors that may be responsible for virulence functions in RSSC. The predicted characteristics of these genes suggest that the host range of RSSC is determined by the comprehensive actions of various virulence factors, including effectors, secretion systems, and metabolic enzymes.


INTRODUCTION
The Ralstonia solanacearum species complex (RSSC) is a group of rod-shaped Gram-negative bacteria with polar flagella belonging to the Burkholderiaceae family of the Betaproteobacteria class. RSSC are soil-borne pathogens and can live for several years without a host. The bacteria invade host vascular tissues through injured roots or natural openings. Then, colonization and production of exopolysaccharide (EPS) in the stem block water transport in the xylem, resulting in wilting and death of the host plant (Denny, 2006).
Ralstonia solanacearum has an uncommonly broad host range, infecting more than 450 plant species that belong to more than 50 families, encompassing monocots and dicots and herbaceous and woody plants (Hayward, 1991; Wicker et al., 2007; Jiang et al., 2017). R. solanacearum is found in distinct geographical regions, which include tropical, subtropical, and warm and cool temperate areas across the six continents of Asia, Africa, Europe, North America, South America, and Oceania (Hayward, 1991; Denny, 2006). With their broad host range and wide geographical distribution, R. solanacearum strains have demonstrated great diversity in their genetic and phenotypic properties, and thus strains of this species have been designated as a species complex, the RSSC (Fegan and Prior, 2005; Genin and Denny, 2012). The RSSC group has been expanded to include Ralstonia syzygii and blood disease bacteria (BDB), which are closely related organisms (Taghavi et al., 1996).
The investigation of the genome of R. solanacearum began with the complete genome sequencing of strain GMI1000, which significantly advanced the study of pathogenicity by characterizing the molecular complexity of the organism (Salanoubat et al., 2002). Subsequently, the complete or draft genomes of many R. solanacearum strains with various host ranges became available. To date, there are 86 genomes of RSSC deposited in the National Center for Biotechnology Information database (NCBI, as of Aug 2018; https://www.ncbi.nlm.nih.gov). RSSC usually have a bipartite genome, with one chromosome and one megaplasmid; however, some strains (i.e., CMR15 and PSI07) carry an additional plasmid (Remenant et al., 2010). Whole-genome comparisons of sequenced genomes have confirmed the phylotype classification scheme of RSSC: phylotype I (GMI1000, FQY_4, EP1, and Y45), phylotype II (CFBP2957, IPO1609, K60, MolK2, Po82, and UW551), phylotype III (CMR15), and phylotype IV (PSI07, R. syzygii R24 and BDB R229). RSSC have been reclassified into three species, based on analyses of various biochemical properties and genomic comparisons, as follows: R. solanacearum (phylotype II strains), Ralstonia pseudosolanacearum (phylotype I and III strains), and R. syzygii (phylotype IV strains, including R. syzygii R24 and BDB R229) (Safni et al., 2014). RSSC is also divided into five biovars based on carbohydrate utilization, i.e., the ability to oxidize three disaccharides (lactose, maltose, and cellobiose) and three hexose alcohols (mannitol, dulcitol, and sorbitol) (Hayward, 1991).
A number of factors are responsible for the pathogenesis of RSSC: global regulatory transcription factors, EPS, plant hormones, host cell wall-degrading enzymes, adhesion/surface proteins, toxins, and oxidative stress resistance. Many of them are secreted by RSSC, and the bacterial pathogen mainly uses the type II (T2SS) and type III (T3SS) secretion systems. Three types of T2SS, one orthodox system and two unorthodox systems, have been found in seven reference genomes, including those of phylotypes I, II, III, and IV (Li et al., 2016). T3SS is also crucial for bacterial virulence as the plant-inducible secretion machinery, which is encoded by the highly conserved hrp gene cluster in strains of phylotypes I, II, III, and IV (Li et al., 2016).
The type III secretion system exports virulence factors directly into the host cells, and these injected proteins are called type III effector proteins (T3es) (Cunnac et al., 2004; Mukaihara et al., 2010). T3es of RSSC have been designated as Rips (Ralstonia-injected proteins), which include the Pop/AWR/GALA families (Peeters et al., 2013). These proteins have various functions in the invasion of host plants. For instance, the GALA family has F-box and leucine-rich repeat (LRR) domains that are required for full virulence (Angot et al., 2006; Kajava et al., 2008; Remigi et al., 2011). PopP family proteins work as avirulence proteins with acetyl transferase activity (PopP2) (Deslandes et al., 2003), and some AWR family effectors induce necrotic cell death in host plants (Sole et al., 2012). Likewise, T3es disrupt the homeostasis of host plants by disturbing signal transduction. The host defense system falters, leading to bacterial infection and death by wilting (Poueymiro and Genin, 2009). In a previous study, pan-genome analyses of 11 representative RSSC strains identified 94 Rips (Peeters et al., 2013). Individual RSSC strains possess around 60-75 effectors, and effector repertoire comparisons have revealed 32 core effectors in 10 strains. Most rip genes have a feature in their promoter region called a plant-inducible promoter (PIP) box, which is responsive to the T3SS transcriptional regulator HrpB (Cunnac et al., 2004).
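As an illustration of how a PIP box can be located in a promoter sequence, the following minimal Python sketch scans a DNA string for a PIP-box-like motif. It assumes the consensus TTCG-N16-TTCG, which is the form commonly reported in the literature but is not stated in this text; the function name find_pip_boxes and the example promoter fragment are hypothetical and were made up for illustration.

```python
import re

# Assumed PIP box consensus (TTCG, 16 arbitrary bases, TTCG); this pattern is
# an assumption taken from the general literature, not from this article.
# The lookahead lets overlapping candidate motifs be reported as well.
PIP_BOX = re.compile(r"(?=(TTCG[ACGT]{16}TTCG))")


def find_pip_boxes(promoter):
    """Return (0-based start, matched sequence) for each PIP-box-like motif."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in PIP_BOX.finditer(seq)]


# Hypothetical promoter fragment, invented for this example only.
example = "ggca" + "TTCG" + "accgtaacgtagctaa" + "TTCG" + "catg"
hits = find_pip_boxes(example)
print(hits)
```

A hit found this way is only a candidate; whether a promoter is actually HrpB-responsive would still need experimental confirmation.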
A number of studies have investigated the host specificity of this highly vulnerable pathogen; however, because of the indefinite host range and bacterial nomenclature related to virulence, few systematic studies have clearly defined host specificity (Guidot et al., 2007; Cellier et al., 2012; Peeters et al., 2013; Ailloud et al., 2015). Nonetheless, the rsa1 gene from strain SL2029 (phylotype IV biovar 2) has been described as a pepper-specific avirulence gene. The Rsa1 protein from strain SL2029 confers avirulence specifically on pepper: when the rsa1 gene is introduced into the pepper-pathogenic strain SL341 (phylotype I biovar 4), the resulting strain becomes avirulent on pepper (Jeong et al., 2011).
In our previous study of pathogenicity against potato, tomato, eggplant, and pepper, Korean RSSC strains were divided into four pathotypes. Because pathotypes reflect genetic traits of RSSC, we attempted to identify bacterial genes for host specificity by pan-genome analysis. In this study, we sequenced the whole genomes of 25 Korean RSSC strains and performed comparative genome analyses to identify candidate genes responsible for host specificity, including T3es and pathogenesis-related genes.

MATERIALS AND METHODS
Strain Selection
We previously analyzed the genetic and pathogenic diversity of Korean RSSC using 93 strains isolated from samples of potato bacterial wilt throughout the country. To conduct in-depth analyses of the genetic relationships between RSSC and hosts, 25 isolates were selected based on their host range on solanaceous crop plants, with typical representative characters for each of the individual phylotypes (Cho et al., 2018).

Genome Sequencing
High-molecular-weight genomic DNA was prepared as previously described (Cho et al., 2018). Each bacterial genome was sequenced using Pacific Biosciences Single Molecule Real Time (SMRT) sequencing technology with a 20 kb library, P6/C4 chemistry, and one SMRT cell (DNAlink, South Korea). De novo assembly was conducted using the hierarchical genome assembly process (HGAP ver. 2.3) workflow (Chin et al., 2013), including consensus polishing with Quiver, with the default parameters (Minimum Subread Length 500 bp, Minimum Polymerase Read Quality 0.8, and Minimum Polymerase Read Length 100 bp). After the longest seed reads were error-corrected using shorter reads, the error-corrected reads were assembled. Coding sequences (CDSs) were predicted using Glimmer3, and RNAmmer-1.2 and tRNAscan-SE were used to identify rRNA and tRNA sequences in the assembled genomes. Each CDS was annotated by homology search using Blastall ver. 2.2.26.

Genome Comparisons
For comparison with reference strains, the genome sequences of GMI1000, FQY_4, CMR15, Po82, and PSI07 were retrieved from the National Center for Biotechnology Information (NCBI) database and those of R. syzygii R24 and BDB R229 were obtained from the EMBL database (Table 1). To analyze the overall genome sequence similarity, the Orthologous Average Nucleotide Identity (OrthoANI) tool was used (Lee et al., 2016). Multiple genome alignments were performed using the Mauve software.

Analysis of Clusters of Orthologous Groups and Pan-Genome Orthologous Groups
For comparative genome analyses, clusters of orthologous groups (COG) and pan-genome orthologous group (POG) analyses were performed using the Chunlab pipeline (South Korea) (Chun et al., 2009). Gene prediction and annotation were performed using the Server for Rapid Annotations using Subsystem Technology (RAST) (Aziz et al., 2008). Each CDS was annotated by homology search against the Swiss-Prot, EggNOG 4.1, SEED (the database and infrastructure for comparative genomics), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases.

Effectors Prediction
The T3es of the Korean RSSC strains were identified using the RalstoT3E server (Peeters et al., 2013).

Genome Submission
The genome sequences of 25 Korean RSSC strains were deposited in the NCBI database and the accession numbers are listed in Table 1.

RESULTS
General Genomic Features of the RSSC Strains
High-quality genomes of the Korean RSSC strains were constructed using PacBio long read sequencing data. The general genomic features of the RSSC strains used in this study are summarized in Table 1. The newly sequenced 25 Korean RSSC genomes were complete and contained two contigs: one for the chromosome and another for a megaplasmid, except for T78 strain, which contained one chromosome, one megaplasmid, and one small plasmid.
All 25 strains analyzed in this study belonged to phylotype I or IV. The assembled genome sizes were 5.4-6.15 Mbp and the GC contents were 66.4-66.9%. The number of predicted genes was 5,071-6,811, that of rRNA genes was 9 or 12, and that of tRNA genes was 54-59. The genome comparison revealed several differences between the genomes of phylotype I and phylotype IV, as follows: the average genome sizes of phylotype I and IV were 5.8 and 5.5 Mbp, the average GC contents were 66.91 and 66.39%, the average numbers of predicted CDSs were 5,551 and 5,152, the numbers of rRNA genes were 12 and 9, and the average numbers of tRNA genes were 58 and 55, respectively. All genomic features (genome size, GC content, and number of CDSs, rRNAs, and tRNAs) of the phylotype I strains were larger than those of the phylotype IV strains, except for the T42 strain, which was similar in genome size and CDS number to the phylotype IV strains.
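Summary values such as genome size and GC content can be derived directly from the assembled replicon sequences. The following Python sketch shows one simple way to do this; it is an illustration under stated assumptions, not the procedure used for Table 1. The function names gc_content and genome_summary and the toy sequences are hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = gc + sum(seq.count(base) for base in "AT")
    return 100.0 * gc / acgt if acgt else 0.0


def genome_summary(replicons):
    """Total size and pooled GC content for a dict of replicon name -> sequence."""
    total_bp = sum(len(seq) for seq in replicons.values())
    pooled = "".join(replicons.values())
    return {"size_bp": total_bp, "gc_percent": round(gc_content(pooled), 2)}


# Toy sequences invented for illustration; real replicons are megabases long.
toy = {"chromosome": "GGCCGGCCAT", "megaplasmid": "GCGCAT"}
print(genome_summary(toy))
```

Pooling the replicons before computing GC content weights each replicon by its length, which matches how a whole-genome GC percentage is usually reported.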
To demonstrate the consistency or variation among the Korean RSSC genomes, multiple genome alignment was performed using the Mauve tool (Figure 2). This alignment revealed that eleven phylotype IV-biovar 2 (hereafter, IV-2) strains were co-linear along the chromosomes and the megaplasmids. For phylotype I strains, the phylotype I-biovar 3 (I-3) and phylotype I-biovar 4 (I-4) strains were generally similar to each other with respect to their genome organization; however, some strains had a genetic inversion, i.e., SL3300 and T117 had an inversion in the middle of the chromosome, and T25 and SL3822 had a large inversion in the megaplasmid. Between the phylotype I and IV strains, the gene organization revealed many rearrangements, particularly in the megaplasmid.

COG Distribution Between RSSC Strains
To compare the distribution of functional genes between the RSSC strains in relation to bacterial virulence and potential host specificity, the functional categories of COG were analyzed (Figure 3 and Supplementary Table S2). Among the predicted CDSs, about 70% of the genes were classified into one of the 22 COG categories and about 30% were of unknown function. Aside from the genes of unknown function (S), the largest functional group was the amino acid transport and metabolism group (E), which contained an average of 367 genes, followed by the transcription group (K), which contained an average of 330 genes, and the energy production and conversion group (C), which contained an average of 286 genes. COGs were distributed differently between the phylotype I and IV strains. The phylotype I strains had more genes than the phylotype IV strains in the categories of transcription (K); replication, recombination, and repair (L); intracellular trafficking, secretion, and vesicular transport (U); carbohydrate transport and metabolism (G); and secondary metabolite biosynthesis, transport, and catabolism (Q). The phylotype IV strains had more genes than the phylotype I strains in the categories of signal transduction mechanisms (T) and inorganic ion transport and metabolism (P).

Candidate Host-Specific Genes for Bacterial Virulence
To identify genes responsible for host specificity, comparative genome analyses were performed by comparing functional genes among the four pathotypes. Previously, Korean RSSC strains isolated from samples of potato bacterial wilt were divided into four different pathotypes based on tests against four solanaceous plants: potato, tomato, eggplant, and pepper. The pathotypes were: pathogenic only on potato (P); pathogenic on potato and tomato (PT); pathogenic on potato, tomato, and eggplant (PTE); and pathogenic on potato, tomato, eggplant, and pepper (PTEPe) (Cho et al., 2018). Using pan-genome analyses of the Korean isolates, we identified a number of candidate genes expected to contribute to host specificity (Table 2); the POGs are listed in Supplementary Tables S3-S5. A total of 128 genes were found only in the four tomato-non-pathogenic strains. Most of these genes encoded hypothetical proteins, and a few of the functionally designated genes were related to mobile elements, such as bacteriophage infection or insertional elements (Supplementary Table S3). Three genes showed homology with genes encoding the clustered regularly interspaced short palindromic repeat (CRISPR)-associated proteins Cas9, Cas1, and Cas2 (Supplementary Figure S1A). CRISPR is an adaptive immune system in prokaryotes that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). Four genes showed homology with genes encoding Mu-like prophage proteins (Supplementary Figure S1B), and one capsid protein and a Mu-like virus tape-measure protein were also identified. In addition, two copies of the insertional element IS476 were identified (Supplementary Figure S1C). The tomato-pathogenic strains shared 8 genes. Among them, three genes encoded components of the type II secretion system (T2SS) and an adjacent gene encoded RhsB (rearrangement hotspot), which is a probable deoxyribonuclease (Table 2, Supplementary Figure S2, and Supplementary Table S3).
Regarding eggplant, seven eggplant-non-pathogenic strains shared 5 genes (Supplementary Table S4). Four of them encoded hypothetical proteins and one was similar to a gene encoding a putative RipA, which is a transcriptional regulator for type III secretion with a helix-turn-helix DNA binding motif (Supplementary Figure S3). The eggplant-pathogenic strains were predicted to share 7 genes, none of which were similar to genes of known function.
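The idea behind identifying genes shared only within one group of strains can be sketched in a few lines of Python. The sketch below assumes one strict reading of "shared only": an orthologous group counts as group-specific when it is present in every strain of the group and absent from every other strain. The exact criterion of the pipeline used in this study is not described here, so this is an assumption, and the function name group_specific, the strain labels, and the POG identifiers are hypothetical toy values.

```python
def group_specific(presence, group):
    """POGs present in every strain of `group` and absent from all other strains.

    presence: dict mapping strain name -> set of POG identifiers it carries.
    group: iterable of strain names forming the group of interest.
    """
    group = set(group)
    if not group:
        return set()
    others = set(presence) - group
    shared = set.intersection(*(presence[strain] for strain in group))
    elsewhere = set().union(*(presence[strain] for strain in others))
    return shared - elsewhere


# Toy presence/absence data, invented for illustration only.
presence = {
    "strainA": {"pog1", "pog2", "pog3"},
    "strainB": {"pog1", "pog2"},
    "strainC": {"pog1", "pog4"},
    "strainD": {"pog1", "pog4", "pog5"},
}
non_pathogenic = ["strainA", "strainB"]
pathogenic = ["strainC", "strainD"]

print(group_specific(presence, non_pathogenic))
print(group_specific(presence, pathogenic))
```

With a strict criterion like this, a single atypical strain is enough to remove a gene from the candidate list, so the result depends strongly on how strains are assigned to groups.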

T3SS Effectors
FIGURE 3 | Graph of COG functional categories of 25 Korean strains. Orange color represents phylotype IV-biovar 2 strains, and olive and green colors represent the phylotype I-biovar 3 and phylotype I-biovar 4 strains. The red triangle and box indicate the category with more genes in phylotype IV than in phylotype I, and the green triangle and box indicate the category with more genes in phylotype I than in phylotype IV. Category S, genes of unknown function, is excluded from the graph.
The type III secretion system is deeply involved in pathogenicity, and RSSC carries abundant T3es that are secreted through the T3SS (Peeters et al., 2013). Therefore, we predicted the T3es of the Korean RSSC using the RalstoT3E annotation server and analyzed effectors related to host range on four solanaceous plants (Figure 4). Of the 94 T3es in the RSSC repertoire, 82 effectors were identified in full or partial form and 12 effectors were absent (Figure 4A). While a total of 30 effectors were present in all sequenced Korean strains, 8 were present only in phylotype I and 6 were found only in phylotype IV (Figure 4B). In terms of shared effectors, 68 effectors were shared among the four I-3 strains, 70 among the ten I-4 strains, and 70 among the eleven IV-2 strains. In general, phylotype I strains had more effectors than phylotype IV strains, and the distribution of each effector revealed a more conserved pattern in phylotype I strains than in phylotype IV strains. SL3175 and T98 of phylotype IV showed a distribution pattern different from that of the other phylotype IV strains, and this clustering was consistent with the genomic phylogenetic tree (Figure 1).
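The comparison of effector repertoires described above amounts to simple set operations. The following Python sketch computes the effectors present in every strain (the core set) and, for each phylotype, the effectors found in at least one of its strains but in no strain of another phylotype. This looser "at least one strain" reading of phylotype-specific is an assumption, as are the function name summarize_effectors and the toy strain and effector labels.

```python
def summarize_effectors(repertoires, phylotype_of):
    """Return (core effectors, dict of phylotype -> phylotype-exclusive effectors).

    repertoires: dict mapping strain name -> set of effector names.
    phylotype_of: dict mapping strain name -> phylotype label.
    """
    if not repertoires:
        return set(), {}
    # Core: effectors present in every strain.
    core = set.intersection(*repertoires.values())
    # Pool the effectors seen in each phylotype.
    pooled = {}
    for strain, effectors in repertoires.items():
        pooled.setdefault(phylotype_of[strain], set()).update(effectors)
    # Exclusive: seen in this phylotype but in no other phylotype.
    exclusive = {}
    for phylotype, effectors in pooled.items():
        rest = set().union(*(e for p, e in pooled.items() if p != phylotype))
        exclusive[phylotype] = effectors - rest
    return core, exclusive


# Toy repertoires, invented for illustration only.
toy_repertoires = {"s1": {"e1", "e2", "e3"}, "s2": {"e1", "e2"}, "s3": {"e1", "e4"}}
toy_phylotype = {"s1": "I", "s2": "I", "s3": "IV"}
core, exclusive = summarize_effectors(toy_repertoires, toy_phylotype)
print(core, exclusive)
```

Partial or truncated effector genes would need an explicit rule (counted as present or absent) before building the input sets, since that choice changes both the core and the exclusive sets.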
Next, we investigated the presence of effectors related to host specificity for tomato, eggplant, or pepper plant infections. Among them, we identified three host-specific effectors: two for tomato (RipH3 and RipS3) and one for eggplant (RipAC). The genetic loci of the genes encoding these proteins are depicted in Figure 5. The gene ripS3 (also known as SKWP3) was present in all 21 tomato-pathogenic strains, but was absent in the four tomato-non-pathogenic strains (SL2312, T12, T82, and T101). The function of RipS3 is unknown; however, it contains the nucleotidyl transferase (NT) domain of the RelA- and SpoT-like proteins (Supplementary Figure S5A). The gene ripH3 (HLK3) was present as a partial form with a truncated N-terminal or C-terminal region in the tomato-non-pathogenic strains, whereas it was present in a complete form in tomato-pathogenic strains. The ripH3 gene contained a PIP box in the promoter region, but did not exhibit any known functional domain. In the case of RipAC (PopC), the gene in eggplant-pathogenic strains differed from that of the non-pathogenic strains, which had another type of ripAC (Supplementary Figure S5B). It was difficult to find DNA sequence similarity between the ripAC genes of eggplant-pathogenic and non-pathogenic strains; however, there was 34% homology between the amino acid sequences, both of which contained LRR motifs (17 in the eggplant-pathogenic type and 10 in the non-pathogenic type).

DISCUSSION
Ralstonia solanacearum species complex is a sophisticated complex of plant pathogens with an unusually broad host range and wide geographical distribution. We identified genes that are relevant to host specificity by analyzing whole genomes of 25 Korean RSSC strains isolated from potato bacterial wilt and performing in silico genome-wide comparison of four different pathotypes of RSSC strains.
The isolated Korean potato bacterial wilt strains belonged to phylotypes I or IV, and phylotype I strains showed destructive pathogenicity, not only on potato, but also on tomato, eggplant, and pepper plants (Cho et al., 2018). Previously, it had been suggested that phylotype I strains may have evolved from phylotype IV strains (Wicker et al., 2012; Li et al., 2016). The genomes of phylotype I strains possess distinctive features compared to those of the other representative phylotype II (Po82, IPO1609), III (CMR15), and IV strains (PSI07, R24, and R229): large genome size, high GC content, and larger numbers of rRNA and tRNA genes. From these genomic features, we hypothesize that phylotype I strains have acquired foreign DNA fragments to adapt to various environments, resulting in increased GC content and genome size, and that the imported fragments might include various virulence factors that enable these strains to infect new hosts.
When we compared the genomes to identify genes responsible for host specificity, we found specific genes from the tomato-pathogenic strains: two T3es (ripS3 and ripH3), a set of T2SS, and an adjacent rhsB (Figure 6). The function of RipS3 has not been reported; however, RipS3 has the nucleotidyl transferase domain of RelA- and SpoT-like ppGpp synthetases and hydrolases, and belongs to the PRK09169 superfamily (Supplementary Figure S5A). The ppGpp works as an alarmone, which integrates general stress responses, such as starvation, heat shock, and oxidative stress in bacteria and plants (Van der Biezen et al., 2000; Braeken et al., 2006; Wang et al., 2016). We speculate that effector RipS3 is translocated into host plant cells and may disturb the host stress response mediated by the ppGpp. The other effector, RipH3, possesses a PIP box but no known functional motif. The ripH3 gene was partially present in the tomato-non-pathogenic strains (Figure 5), and this feature is consistent with a previous report that a triple deletion mutant of ripH1-H3 (HLK1-3) was significantly impaired with respect to tomato infection (Chen et al., 2014).
Type II secretion systems (T2SSs) have been identified in bacteria belonging to the alpha-, beta-, gamma-, and deltaproteobacteria, and are encoded by 12 to 16 genes, of which the core components are named gsp (general secretory proteins) (Abby et al., 2016). Substrates of T2SS are recruited and transported as fully folded and often oligomeric proteins (Gu et al., 2017). It has been reported that RSSC possess three types of T2SS: one is the orthodox system, constructed out of 12 components, and the others are the unorthodox systems, possessing 7 core genes (Li et al., 2016). In our study, the tomato-non-pathogenic strains lacked one set of unorthodox T2SS and the adjacent rhsB gene (Figure 6). Because T2SS mutants were impaired in colonization and proliferation in planta (Kang et al., 1994), the absence of one T2SS set appears to be related to the tomato-non-pathogenic trait. In addition, the rhsB gene, which was located next to the T2SS-encoding gene cluster, was also absent in the tomato-non-pathogenic strains. Rhs protein possesses YD-peptide repeats, which play a role in the bacterial-eukaryotic host cell interaction, and it also carries a nuclease domain to degrade target cell DNA (Koskiniemi et al., 2013). In Dickeya dadantii 3937, Rhs protein worked as a DNase toxin to inhibit neighboring cell growth in a contact-dependent manner, and its translocation to host cells provided evidence that Rhs may be exported through a type VI secretion system (T6SS) (Koskiniemi et al., 2013). Although the presence of this T2SS set and rhsB is a distinctive genetic feature of tomato-pathogenic strains, further experiments are required to define the mechanisms of T2SS and RhsB in tomato infection.
Our pan-genome analyses to identify host-specific genes for bacterial virulence revealed that a number of genes encoding hypothetical proteins of unknown function and mobile element- or phage-related genes are present in the four pathotypes of Korean RSSC (Supplementary Tables S3-S5). Functional studies of each gene identified by in silico analysis should be performed to define their involvement in host-specific infection and virulence. It would also be interesting to investigate whether mobile elements or prophages contribute to the host-specific virulence of each pathotype by disrupting certain avirulence- or virulence-related functions of RSSC strains.
In analyses using the RalstoT3E prediction system, we identified RipAC as an eggplant-specific effector. RipAC was previously called PopC and has LRR motifs, which are expected to interact with some host proteins (Peeters et al., 2013). The popC (ripAC) gene constitutes an operon with popA (ripX) and popB (ripAB) expressed by a promoter containing a PIP box (Guéneron et al., 2000). This popABC operon is located next to the hrp gene cluster, and the encoded PopA (harpin), PopB (NLS motif), and PopC (LRR motifs) proteins are secreted by T3SS. In the RalstoT3E database, the names of these effectors were assigned as RipX (PopA), RipAB (PopB), and RipAC (PopC) (Peeters et al., 2013). Interestingly, the RNA expression of these genes was upregulated in tomato plants (Ailloud et al., 2016), but Tn5-B20-inserted mutants of these genes were still pathogenic on tomato (Marenda et al., 1998). In general, the Korean RSSC strains had a gene organization similar to that of previously reported RSSC strains, with the popABC operon driven by a promoter with a PIP box and located adjacent to the hrp cluster. However, while popA (ripX) and popB (ripAB) were conserved in all strains, popC (ripAC) was not; the popC gene of most strains encoded a protein with 17 LRR motifs, like that of the GMI1000 strain, but 7 eggplant-non-pathogenic strains carried a smaller popC gene encoding 10 LRR motifs, like that of R24 and BDB R229 (Supplementary Figure S5B). These results suggest that popC (ripAC) may be involved in eggplant infection by RSSC.
The dhb gene cluster homologs were identified as pepper-specific candidate virulence genes. The metabolite 2,3-dihydroxybenzoate (DHB) works as an isochorismate-derived secondary metabolite in plants (Bartsch et al., 2010), or as a key intermediate of several siderophores: enterobactin (Salmonella enterica and Escherichia coli), anguibactin, and vanchrobactin (Vibrio anguillarum) (Raymond et al., 2003; Li and Ma, 2017). Siderophores are low molecular weight iron-chelating molecules that facilitate iron uptake in many gram-negative bacteria. Iron uptake systems are critical for the ability of some pathogens to infect host plants and cause disease (Wolf and Crosa, 1986). In the pepper-non-pathogenic RSSC strains, the dhb gene cluster homologs and other related genes were absent. This deficiency appeared to affect the ability of the bacteria to infect pepper.
A previous study identified a pepper-specific avirulence gene, rsa1, from the potato-pathogenic strain SL2029; this gene confers avirulence to a pepper-pathogenic strain (Jeong et al., 2011). In our study, the rsa1 gene was present in the genomes of all pepper-non-pathogenic phylotype IV strains of Korean RSSC, as expected. However, the rsa1 gene was not identified as a host-specific avirulence gene in the comparative analysis. This was because one phylotype I strain, SL3103, does not carry the rsa1 gene but is pepper-non-pathogenic, unlike the other phylotype I strains. It is likely that strain SL3103 is pepper-non-pathogenic due to the absence of the dhb gene cluster (Supplementary Figure S4 and Supplementary Table S5), whereas the other phylotype I strains carry the dhb gene cluster.
Our extensive comparative genomic analyses uncovered several genes associated with the pathogenicity of RSSC on different crops. It is likely that the host specificity of RSSC is a function of the comprehensive actions of various virulence factors, effectors, secretion systems, and metabolic enzymes. Although the biological functions of these genes remain to be determined, these data help expand our understanding of the host specificity of RSSC.

DATA AVAILABILITY
The whole genome sequences of 25 Korean RSSC strains can be found in the NCBI GenBank and the accession numbers are listed in Table 1.

AUTHOR CONTRIBUTIONS
HC, E-SS, DP, J-GK, and IH conceived and designed the study. YL, SL, and S-WL provided the RSSC isolates and related information. HC and E-SS carried out the experiments. JB and T-HL assembled and analyzed the genomes. HC and IH analyzed and interpreted the data. HC, SH, and S-WL prepared the manuscript.

FUNDING
This work was supported by a grant from the National Institute of Agricultural Sciences, Rural Development Administration (PJ01008502 and PJ01246601), South Korea.