Comprehensive genome sequence analysis of Ralstonia solanacearum gd-2, a phylotype I sequevar 15 strain collected from a tobacco bacterial phytopathogen

Introduction: Plant bacterial wilt, caused by the Ralstonia solanacearum species complex, is an important disease worldwide. Methods: In this study, we identified and sequenced the genome of R. solanacearum strain gd-2, isolated from tobacco. Results: Strain gd-2 was identified as R. solanacearum species complex (RSSC) phylotype I sequevar 15 and was strongly pathogenic to tobacco. The gd-2 genome is 5.93 Mb, comprising the chromosome (3.83 Mb) and the megaplasmid (2.10 Mb). Gene prediction identified 3,434 genes on the chromosome and 1,640 on the megaplasmid. Comparative genomic analysis showed that gd-2 is highly conserved relative to ten closely related genomes, with the differences concentrated in the GI12-GI14 region. We identified 72 type III effectors (T3Es), and RipAZ2 was specific to gd-2 compared with the other eight sequenced strains. Discussion: Our study provides a new basis and evidence for studying the pathogenic mechanisms of R. solanacearum.


Introduction
Plant bacterial wilt is an important disease worldwide caused by Ralstonia solanacearum, which can be transmitted through soil, irrigation water, plants, and seed potatoes (Genin and Denny, 2012). This pathogen has a wide host range and can infect more than 200 plant species in more than 50 families, both monocotyledons and dicotyledons, including potato, tomato, eggplant, peanut, tobacco, banana, and ginger (Paret et al., 2010; Qian et al., 2012; Yuliar et al., 2015; Sharma, 2021; Huang et al., 2023). Bacterial wilt is widely distributed in tropical, subtropical, and temperate regions (Elphinstone et al., 2005). Peanut wilt generally reduces yield by 10-20%, and in severe cases losses reach 50% or production is halted entirely (Chen et al., 2020). Ginger wilt caused by R. solanacearum remains the biggest obstacle to ginger production (Mao et al., 2017).
Xiao et al. 10.3389/fmicb.2024.1335081 Frontiers in Microbiology 02 frontiersin.org
Bacterial wilt of tobacco was first reported in Granville, USA, in 1880 and was subsequently classified as an important tobacco disease because of the enormous threat it poses to the tobacco industry (Hayward, 2003). The disease has since spread throughout the world, including the United States, Indonesia, Japan, South Korea, and Australia (Prokchorchik et al., 2020).
Ralstonia solanacearum has complex physiological and biochemical characteristics. It is a Gram-negative, rod-shaped bacterium with an optimum growth temperature of approximately 32°C and an optimum pH of 6.6. On TTC medium, R. solanacearum typically forms irregular, milky white colonies with a reddish center that appear highly fluid under strong light (Kang et al., 2008). Two traditional classification schemes for R. solanacearum are internationally recognized. One divides R. solanacearum into five races based on host range (Buddenhagen et al., 1962). The other divides it into five biovars based on the utilization of carbohydrates (lactose, maltose, cellobiose, mannitol, sorbitol, and xylitol) (He, 1983). To account for the diversity of R. solanacearum across hosts and geographical origins, Fegan and Prior proposed a new evolutionary classification framework based on sequence analysis of the 16S-23S rDNA internal transcribed spacer region, the endoglucanase (egl) gene, and the hypersensitive response and pathogenicity (hrpB) gene, which better reflects the genetic evolution and geographical origins of R. solanacearum. This framework comprises four taxonomic levels: species, phylotype, sequevar, and clone (Fegan and Prior, 2005). The phylotypes reflect different geographical origins: Asia (phylotype I), the Americas (phylotypes IIA and IIB), Africa (phylotype III), and Indonesia (phylotype IV) (Castillo and Greenberg, 2007). Based on egl gene sequence homology, each phylotype is further subdivided into sequence variants (sequevars), and strains within a sequevar often share similar pathogenicity or geographical origin; 55 sequevars have been identified to date (Liu et al., 2017; Greenrod et al., 2023).
The pathogenesis of R. solanacearum and its regulation are highly complex. The main virulence factors include the type I to VI secretion systems (T1SS, T2SS, T3SS, T4SS, T5SS, and T6SS), extracellular polysaccharides (EPSs), and extracellular proteins (EXPs). Among these, EPSs play a crucial role in bacterial pathogenicity (Kang et al., 2002; Valls et al., 2006; Tsai et al., 2019). R. solanacearum spreads through soil, enters plant roots, invades the vascular bundles, and rapidly moves to aboveground tissues through the vascular system. Typical symptoms of infection are xylem browning, leaf epinasty, and plant wilting. After entering the host, R. solanacearum secretes more than 30 proteins through the type II secretion system (T2SS), including various cell wall-degrading enzymes. The most studied of these are pectinolytic and cellulolytic enzymes, which play an important role in colonization (Liu et al., 2005; Tsujimoto et al., 2008; Sharma et al., 2020). The T3SS plays an important role in the interaction between R. solanacearum and its host (Alfano and Collmer, 2004; Coll and Valls, 2013). The T3SS is encoded by the hrp gene cluster, which is located on the megaplasmid and spans approximately 23-30 kb. When this region is mutated, the bacterium can neither elicit a hypersensitive response nor cause disease (Mukaihara et al., 2010; Ran et al., 2014). R. solanacearum effector genes show widespread horizontal transfer and substantial intraspecies genetic differentiation. Peeters et al. (2013a,b) catalogued the effector proteins of R. solanacearum and unified their nomenclature based on phylogenetic relationships, using the general term Rip for all T3E genes; they identified 94 Rip genes and 16 candidate T3E genes. Sabbagh et al. (2019) updated the database published by Peeters et al. (2013a,b) and generated a pan-genome catalogue of 102 T3Es and 16 hypothetical T3Es. The functions of R. solanacearum effector proteins include interfering with basal plant defense responses, interfering with host metabolic processes, promoting infection, and triggering host immune responses (Tasset et al., 2010; Landry et al., 2020; Cheng et al., 2021; Schachterle and Huang, 2021).
In recent years, whole-genome sequencing of R. solanacearum has laid the foundation for elucidating the molecular mechanisms of pathogenesis at the genomic level. The R. solanacearum genome is approximately 5.8 Mb and consists mainly of two circular replicons, a chromosome and a megaplasmid. Housekeeping genes and some virulence genes are located on the chromosome, whereas key virulence determinants such as the T3SS and EPS are located on the megaplasmid. Salanoubat et al. (2002) sequenced the genome of R. solanacearum strain GMI1000, isolated from tomato, enabling further research on its pathogenic mechanisms and on strategies for improving plant resistance. Currently, R. solanacearum genome data are available from three database platforms, NCBI, RalstoT3E, and R. solanacearum sp., which play an important role in analyzing the diversity and evolution of the R. solanacearum genome, studying genes affecting host range, and elucidating the regulatory mechanisms controlling bacterial virulence (Guidot et al., 2009; Peeters et al., 2013a,b; Tan et al., 2019). Genomic islands (GIs) are an important product of horizontal gene transfer (HGT) and carry genes with diverse biological functions that often confer selective advantages on bacteria. Depending on the genes they carry, GIs can generally be classified as virulence islands, drug resistance islands, metabolic islands, or symbiotic islands (Shrivastava et al., 2010). Besides GIs, secreted proteins are generally considered when identifying virulence factors, as they play a key role in enhancing pathogen virulence (Stritzler et al., 2018; Choi et al., 2020). The NCBI database contains genome sequences of 145 R. solanacearum strains. These strains were mainly isolated from tomato (GMI1000, FJAT-1458), eggplant (EP1, RS-09-161), pepper (RS-10-244, KACC10709), tobacco (CQPS-1, FQY-4), potato (UY031), sesame (SEPPX05), and plantain (UW163) (Salanoubat et al., 2002; Ahn et al., 2011; Cai et al., 2015; Asolkar and Ramesh, 2018). Genomes of tobacco isolates include Y45 (phylotype I, sequevar 17), FQY-4 (phylotype I, sequevar 17), CQPS-1 (phylotype I, sequevar 17), FJ1003 (phylotype I, sequevar 14), and SL1931 (race 1, biovar 3) (Cao et al., 2013; Liu et al., 2017). Using genomics to explore the regulatory mechanisms underlying virulence differentiation and host adaptation in R. solanacearum will provide an important theoretical basis for its targeted prevention and control.
In this study, we report the isolation of R. solanacearum strain gd-2 from tobacco plants in Fujian Province, China. This strain belongs to phylotype I sequevar 15 and is strongly pathogenic to tobacco. We performed whole-genome sequencing and assembly to obtain the genome sequence of gd-2. In addition, we functionally annotated the gd-2 genome and compared it with other published R. solanacearum genomes to explore whether gd-2 carries genome segments or genes related to host specificity, providing new evidence for understanding the pathogenic specificity of R. solanacearum and for the prevention and control of bacterial wilt.

Strain gd-2 classification and pathogenicity identification
Strain gd-2 was isolated in Fujian Province, China, and is maintained by the Tobacco Research Institute of the Chinese Academy of Agricultural Sciences. Genomic DNA was extracted using a bacterial genomic DNA extraction kit (TIANGEN, Beijing, China). The phylotype was determined by phylotype-specific multiplex PCR; the primers are listed in Table 1. Bands were visualized with a gel imager. The strain was identified as R. solanacearum by the presence of the 759/760 amplicon, and its phylotype was determined from the sizes of the remaining bands.
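The band-calling logic described above can be sketched as a simple lookup. This is a minimal illustration, not the authors' procedure: the amplicon sizes below (a 280 bp species band from the 759/760 pair, and 144, 372, 91, and 213 bp bands for phylotypes I to IV) are the values commonly cited for the Fegan and Prior (2005) multiplex and are assumptions here; Table 1 and the gel in Figure 1B are authoritative. The `tolerance` parameter is a hypothetical allowance for gel sizing error.

```python
# Assumed amplicon sizes (bp) for the RSSC phylotype-specific multiplex PCR.
# Check these against the primers in Table 1 before use.
SPECIES_BAND_BP = 280  # 759/760 primer pair: expected in all RSSC strains
PHYLOTYPE_BANDS_BP = {144: "I", 372: "II", 91: "III", 213: "IV"}


def has_band(observed_bp, target_bp, tolerance):
    """True if any observed band lies within `tolerance` bp of the target size."""
    return any(abs(band - target_bp) <= tolerance for band in observed_bp)


def classify_bands(observed_bp, tolerance=10):
    """Return the phylotype implied by one lane's band sizes.

    Returns None when the 759/760 species band is absent (not RSSC), and
    "undetermined" when zero or several phylotype bands are present.
    """
    if not has_band(observed_bp, SPECIES_BAND_BP, tolerance):
        return None
    calls = [phylotype for size, phylotype in PHYLOTYPE_BANDS_BP.items()
             if has_band(observed_bp, size, tolerance)]
    return calls[0] if len(calls) == 1 else "undetermined"


# The two bands reported for gd-2 in Figure 1B (lane 3).
print(classify_bands([144, 280]))
```

With the gd-2 bands (144 and 280 bp), the lookup returns phylotype I, matching the assignment reported in the Results.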
Three tobacco cultivars commonly used as susceptible and resistant controls, Hongda, CHB, and K326, were used to test the pathogenicity of gd-2 (Li et al., 2015; Cao et al., 2013; Pan et al., 2021). Hongda is highly susceptible (HS) to susceptible (S) to bacterial wilt, CHB is susceptible (S), and K326 is moderately resistant (MR) to resistant (R). The strain was activated by streaking on TTC agar plates and incubating for 36 h; single colonies were then picked with disposable inoculation loops and inoculated into NB liquid medium. The culture was shaken at 28°C and 220 rpm for 24 h, yielding a bacterial suspension of approximately 1.0 × 10⁹ CFU/mL (OD600 = 1.0), which was diluted with deionized water to 1.5 × 10⁸ CFU/mL. When the tobacco seedlings reached the five-leaf stage, 30 mL of the diluted suspension was applied to the base of each seedling, and the plants were kept at approximately 30°C and 80% relative humidity. Twenty seedlings were inoculated per cultivar, and the incidence of bacterial wilt was recorded at 3, 6, 9, 12, 15, and 18 days after inoculation.
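Inoculum preparation is a C1V1 = C2V2 dilution. The sketch below computes the culture and water volumes needed to go from the ~1.0 × 10⁹ CFU/mL culture to 1.5 × 10⁸ CFU/mL, assuming a single batch for 3 cultivars × 20 seedlings × 30 mL with no surplus for handling losses; the batch size is an illustrative assumption, as the text does not state it.

```python
def dilution_volumes(stock_cfu_per_ml, target_cfu_per_ml, final_volume_ml):
    """Solve C1*V1 = C2*V2 for V1; return (culture_ml, diluent_ml)."""
    if target_cfu_per_ml > stock_cfu_per_ml:
        raise ValueError("target concentration exceeds the stock concentration")
    culture_ml = target_cfu_per_ml * final_volume_ml / stock_cfu_per_ml
    return culture_ml, final_volume_ml - culture_ml


# 3 cultivars x 20 seedlings x 30 mL per seedling (no surplus assumed).
final_volume_ml = 3 * 20 * 30
culture_ml, water_ml = dilution_volumes(1.0e9, 1.5e8, final_volume_ml)
print(f"{culture_ml:.0f} mL culture + {water_ml:.0f} mL deionized water")
```

Under these assumptions the 1,800 mL batch splits into 270 mL of culture and 1,530 mL of water, an approximately 6.7-fold dilution; in practice a margin is usually added.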

Genomic sequencing and assembly
The genome was sequenced by Shanghai Winnerbio Technology Co., Ltd. (Shanghai, China) on the third-generation PacBio Sequel II platform (Pacific Biosciences) using single-molecule real-time (SMRT) sequencing, together with the second-generation Illumina NovaSeq 6000 platform. Large-fragment libraries (10-20 kb) for PacBio Sequel II and small-fragment libraries (~400 bp) for Illumina NovaSeq 6000 were constructed from DNA samples that passed quality control, and raw data were obtained from each platform. The sequencing data were checked for contamination from other species or large fragments of foreign origin by GC-depth analysis and k-mer frequency distribution analysis. The third-generation data were assembled independently with Canu v2.2, and the assembly was then polished with Pilon v1.24 using the second-generation data to obtain the final bacterial genome assembly.
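The k-mer check mentioned above relies on the k-mer spectrum: in a clean single-genome library, most distinct k-mers cluster around one depth peak, and the total number of k-mers divided by that peak depth approximates genome size. A secondary peak, or an estimate far from the expected size, can hint at contamination. The pure-Python sketch below illustrates the idea on toy data; it is not the pipeline used by the sequencing provider, real datasets would use a dedicated k-mer counter, and the choice of k = 21 and the error cutoff `min_multiplicity` are illustrative assumptions.

```python
from collections import Counter
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_spectrum(reads, k):
    """Return {multiplicity: number of distinct canonical k-mers}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(spectrum, min_multiplicity=2):
    """Genome size ~ total k-mer occurrences / depth of the main peak.

    K-mers seen fewer than `min_multiplicity` times are treated as
    sequencing errors and ignored.
    """
    usable = {m: n for m, n in spectrum.items() if m >= min_multiplicity}
    peak = max(usable, key=usable.get)
    total = sum(m * n for m, n in usable.items())
    return total / peak


# Toy check: a random 2 kb "genome" tiled by 100 bp reads every 10 bp.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
reads = [genome[i:i + 100] for i in range(0, len(genome) - 99, 10)]
print(round(estimate_genome_size(kmer_spectrum(reads, k=21))))
```

The toy estimate falls somewhat below 2,000 because k-mers at the sequence ends are covered by fewer reads; on real whole-genome data that edge effect is negligible.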

Gene prediction and analysis
Protein-coding sequences (CDSs) were predicted from the assembled genome with Glimmer3, transfer RNA (tRNA) genes with tRNAscan-SE, and ribosomal RNA (rRNA) genes with Barrnap. Genes were functionally annotated by aligning their protein sequences against the Nonredundant Protein Database (NR), Swiss-Prot, Pfam, Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, using DIAMOND for sequence alignment.
Circos v0.69-6 was used to draw genome circle diagrams of the chromosome and plasmid. By default, all scaffolds were sorted by length and concatenated into a single sequence for plotting, without regard to orientation. From the outermost to the innermost circle, the default tracks show genome size, genes on the positive and negative strands, noncoding RNA (ncRNA), GC content, and GC skew. In addition, genomic islands were predicted with IslandViewer 4 using IslandPath-DIMOB, Islander, and other methods. Prophage prediction was carried out using [...] chain-anchored membrane, and (e) GPI-anchored membrane. SignalP (http://www.cbs.dtu.dk/services/SignalP) was used to predict the signal peptide region of each protein, and the results were combined with the transmembrane domain predictions to select proteins that carry a signal peptide but lack a transmembrane domain as candidate secreted proteins. Secondary metabolite biosynthetic gene clusters were predicted using antiSMASH (bacterial version).
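The selection rule for candidate secreted proteins described above is a set operation: keep proteins with a predicted signal peptide and no predicted transmembrane domain. In the minimal sketch below, parsing of SignalP and transmembrane-prediction output is omitted, the protein IDs are hypothetical, and proteins absent from the transmembrane table are assumed to have no transmembrane helix. The rule is applied literally as stated in the text; some pipelines treat an N-terminal helix overlapping the signal peptide differently.

```python
def candidate_secreted(signal_peptide_ids, tm_helix_counts):
    """Select proteins with a signal peptide and no predicted TM domain.

    signal_peptide_ids: IDs predicted by SignalP to carry a signal peptide.
    tm_helix_counts: {protein ID: number of predicted transmembrane helices};
    IDs missing from this dict are assumed to have none.
    """
    return sorted(pid for pid in set(signal_peptide_ids)
                  if tm_helix_counts.get(pid, 0) == 0)


# Hypothetical protein IDs and predictions, for illustration only.
signalp_hits = ["gd2_0001", "gd2_0002", "gd2_0003"]
tm_counts = {"gd2_0001": 0, "gd2_0002": 3}
print(candidate_secreted(signalp_hits, tm_counts))
```

Here gd2_0002 is dropped because it has three predicted helices, leaving gd2_0001 and gd2_0003 as candidates.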

Comparative genomic analysis
To further investigate the structural characteristics and key genes of the plasmid, we performed comparative genomic analysis using two approaches: comparative genome circle diagrams to identify plasmid-specific regions and mutation hotspots, and gene cluster comparison to identify structural differences in local gene clusters. The large plasmid of gd-2 was compared with the 10 reference genome sequences using BRIG v0.9.5 (http://sourceforge.net/projects/brig/) to construct comparative genome circle diagrams. In addition, Easyfig v2.2.3 (http://mjsull.github.io/Easyfig/) was used for detailed gene cluster comparison of the target region.

Virulence genes analysis
In this study, we focused on the T3SS, T4SS, and T6SS, as these secretion systems and their effectors are often closely related to bacterial pathogenicity. T6SS components were annotated with T6SS_finder using thresholds of identity ≥50% and E-value ≤1e-5. T3Es were annotated using BLAST+ with identity >80% and alignment coverage >60%. Type VI effectors of gd-2 were annotated against the effector collection in the SecReT6 database using BLAST+ with identity >80% and alignment coverage >60%. Two-component signal transduction genes were identified with HMMER3 searches against the Pfam database, based on the domain characteristics of histidine kinases and response regulators, and classified into three categories: regulator, sensor, and hybrid. We also counted the chemotaxis genes and recorded their detailed annotations. Genes and pathways related to quorum sensing were identified by comparison with the KEGG database.
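The regulator/sensor/hybrid split for two-component system genes can be expressed as a rule over the Pfam domains found in each protein. The domain sets below are an assumption for illustration (the text does not list which Pfam models were used), and parsing of HMMER3 output is omitted: the input is simply a list of domain names per protein.

```python
# Assumed Pfam domain names; extend these sets as needed.
KINASE_DOMAINS = {"HisKA", "HisKA_3"}  # histidine kinase phosphoacceptor domains
RECEIVER_DOMAINS = {"Response_reg"}    # response regulator receiver domain


def classify_tcs(domains):
    """Classify a protein from its Pfam domain names (e.g., from hmmscan).

    Returns "hybrid" (kinase + receiver), "sensor" (kinase only),
    "regulator" (receiver only), or None for non-TCS proteins.
    """
    found = set(domains)
    kinase = bool(found & KINASE_DOMAINS)
    receiver = bool(found & RECEIVER_DOMAINS)
    if kinase and receiver:
        return "hybrid"
    if kinase:
        return "sensor"
    if receiver:
        return "regulator"
    return None


print(classify_tcs(["HisKA", "HATPase_c"]))                  # sensor
print(classify_tcs(["Response_reg", "Trans_reg_C"]))         # regulator
print(classify_tcs(["HisKA", "HATPase_c", "Response_reg"]))  # hybrid
```

Requiring a HisKA-family domain rather than HATPase_c alone avoids calling unrelated ATPases (such as gyrases) sensors, since HATPase_c also occurs outside histidine kinases.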

Comparative analysis of T3Es in different Ralstonia solanacearum strains
T3SS effector proteins are the main determinants of R. solanacearum pathogenicity. We characterized the distribution and variation of T3Es in gd-2 and eight other sequenced and published R. solanacearum genomes: phylotype I strain GMI1000 (BioProject: PRJNA13); phylotype IIA strain CFBP2957 (BioProject: PRJNA224116); phylotype III strain CMR15 (BioProject: PRJEA50681); phylotype IIB strain Po82 (BioProject: PRJEA50683); phylotype IV strain PSI07 (BioProject: PRJNA66837); and three phylotype I sequevars that infect tobacco, namely sequevar 13 strain CQPS-1 (BioProject: PRJNA331070), sequevar 17 strain FQY_4 (BioProject: PRJNA182081), and sequevar 54 strain Y45 (BioProject: PRJNA224116). A local T3E database was first constructed to identify T3Es in the nine genomes, using BLAST+ with an E-value of 1e-5, coverage >60%, and identity >80%. Shared and unique T3Es among the nine strains were compared using Venn diagrams. To determine the distribution and sequence variation of T3Es in gd-2, the candidate T3E genes of the other eight strains were aligned using BLAST+. We also analyzed the functional genes related to the hrp gene cluster in all nine strains, extracted their protein sequences, and compared the hrp clusters of the main R. solanacearum strains.
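The T3E screen and Venn comparison described above can be sketched in two steps: filter BLAST+ tabular hits by the stated thresholds, then compute shared and strain-specific effector sets. This sketch assumes the search was run with `-outfmt "6 qseqid sseqid pident length qlen slen evalue"` and that coverage means alignment length divided by reference (subject) length; both are assumptions, since the text does not define them. The hit lines and strain sets are toy data, not results from this study.

```python
def passing_t3es(blast_lines, min_identity=80.0, min_coverage=0.60, max_evalue=1e-5):
    """Return reference T3E IDs (sseqid) hit by at least one passing alignment.

    Expects 7 tab-separated columns: qseqid sseqid pident length qlen slen evalue.
    Coverage is taken as alignment length / subject length (an assumption).
    """
    kept = set()
    for line in blast_lines:
        if not line.strip():
            continue
        _q, sseqid, pident, length, _qlen, slen, evalue = line.rstrip("\n").split("\t")[:7]
        coverage = int(length) / int(slen)
        if (float(pident) > min_identity and coverage > min_coverage
                and float(evalue) <= max_evalue):
            kept.add(sseqid)
    return kept


def venn_summary(t3es_by_strain):
    """Return (T3Es shared by all strains, {strain: T3Es found only in it})."""
    shared = set.intersection(*t3es_by_strain.values())
    unique = {}
    for strain, t3es in t3es_by_strain.items():
        others = set().union(*(s for name, s in t3es_by_strain.items() if name != strain))
        unique[strain] = t3es - others
    return shared, unique


# Toy BLAST hits: the second line fails the identity threshold.
hits = [
    "gd2_0101\tRipAZ2\t92.5\t410\t415\t420\t1e-120",
    "gd2_0202\tRipB\t78.0\t300\t310\t320\t1e-60",
]
gd2 = passing_t3es(hits)
shared, unique = venn_summary({
    "gd-2": gd2 | {"RipA1"},
    "GMI1000": {"RipA1"},
    "CQPS-1": {"RipA1"},
})
print(shared, unique["gd-2"])
```

In this toy example RipA1 is shared by all three strains and RipAZ2 is specific to gd-2, mirroring the kind of result the Venn analysis summarizes.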

Identification and pathogenicity detection of Ralstonia solanacearum gd-2
Strain gd-2 formed irregular, milky white colonies with a reddish center on TTC medium, and its fluidity was visible under strong light (Figure 1A). Agarose gel electrophoresis showed that strain gd-2 produced 144 bp and 280 bp bands, indicating that it is R. solanacearum phylotype I (Figure 1B, lane 3). Amplification of the egl gene of gd-2 with the endoglucanase-specific primers Endo-F/Endo-R yielded an 800 bp band, and sequencing and alignment showed that the strain belongs to sequevar 15 (Supplementary Table S2). After inoculation of different tobacco cultivars with gd-2, typical bacterial wilt symptoms were observed: leaves gradually wilted, with symptoms spreading from the lower to the upper leaves, and stems gradually decayed until the entire plant died. Resistance differed among the tobacco cultivars (Figure 1C), and the disease indices and resistance responses were consistent with those reported for other strains (Li et al., 2015; Cao et al., 2013; Pan et al., 2021). These results indicate that gd-2 has the characteristics of typical R. solanacearum and is pathogenic to tobacco.
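The disease indices mentioned above are not accompanied by a formula in the text. A commonly used form for tobacco bacterial wilt grades each plant on a 0-9 scale (grades 0, 1, 3, 5, 7, 9) and computes DI = 100 × Σ(grade × number of plants) / (total plants × 9). The sketch below assumes that scale, which may differ from the one actually used in this study, and the grade counts are hypothetical.

```python
def disease_index(plants_per_grade, max_grade=9):
    """DI = 100 * sum(grade * n) / (N * max_grade), on an assumed 0-9 scale."""
    total = sum(plants_per_grade.values())
    if total == 0:
        raise ValueError("no plants scored")
    weighted = sum(grade * n for grade, n in plants_per_grade.items())
    return 100.0 * weighted / (total * max_grade)


def incidence(plants_per_grade):
    """Percentage of scored plants showing any symptoms (grade > 0)."""
    total = sum(plants_per_grade.values())
    if total == 0:
        raise ValueError("no plants scored")
    diseased = total - plants_per_grade.get(0, 0)
    return 100.0 * diseased / total


# Hypothetical scores for 20 inoculated seedlings of one cultivar on one survey day.
scores = {0: 4, 1: 2, 3: 2, 5: 4, 7: 3, 9: 5}
print(f"incidence {incidence(scores):.1f}%, disease index {disease_index(scores):.1f}")
```

Incidence counts any symptomatic plant equally, whereas the disease index weights plants by severity, which is why the two can rank cultivars differently.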

Sequencing, assembly and annotations of the gd-2 genome
In total, 7,666,395,879 bp of sequencing data were obtained. GC-depth and k-mer frequency distribution analyses showed no contamination by other bacteria. Assembly yielded a 3,828,519 bp chromosome and a 2,098,962 bp plasmid. Gene prediction identified 3,434 genes on the chromosome and 1,640 on the plasmid. The predicted noncoding RNAs included 59 tRNAs and 12 rRNAs (four each of 5S, 16S, and 23S rRNA). Depending on the database, 2,600-5,000 genes received functional annotations. The COG, GO, and KEGG annotation classifications are shown in Figure 2. COG annotation covered 24 categories; the five with the most genes were amino acid transport and metabolism (417), transcription (393), general function prediction only (335), signal transduction mechanisms (299), and cell wall/membrane/envelope biogenesis (299), and 172 genes had an unknown function. For GO annotation, the three biological process terms with the most genes were regulation of DNA-templated transcription, methylation, and phosphorylation; the three cellular component terms with the most genes were integral component of membrane, plasma membrane, and cytoplasm; and the three molecular function terms with the most genes were DNA binding, ATP binding, and metal ion binding. For KEGG annotation, global and overview maps, signal transduction, and membrane transport contained the most genes.
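The replicon sizes reported above can be checked against the 5.93 Mb genome size, and dividing the total sequenced bases by the assembly length gives a mean depth. The sketch assumes the 7,666,395,879 bp total pools PacBio and Illumina data, because the text does not split it by platform; the resulting figure is therefore a combined depth, not a per-platform one.

```python
# Values reported in the text.
chromosome_bp = 3_828_519
plasmid_bp = 2_098_962
sequenced_bp = 7_666_395_879  # total bases; assumed to pool PacBio and Illumina data

genome_bp = chromosome_bp + plasmid_bp
mean_depth = sequenced_bp / genome_bp

print(f"assembled genome: {genome_bp:,} bp ({genome_bp / 1e6:.2f} Mb)")
print(f"chromosome {chromosome_bp / 1e6:.2f} Mb, megaplasmid {plasmid_bp / 1e6:.2f} Mb")
print(f"mean sequencing depth: {mean_depth:.0f}x")
```

This reproduces the 5.93, 3.83, and 2.10 Mb figures (5,927,481 bp in total) and gives a pooled mean depth of roughly 1,293×.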
The genome circle diagram shows genome size, genes on the positive and negative strands, ncRNA, GC content, GC skew, and other information (Figure 3). Genes carried by GIs usually confer selective advantages on bacteria. Seventeen GIs were identified, 11 on the chromosome and 6 on the plasmid (Supplementary Table S3). Seven prophages were identified, six on the chromosome and one on the plasmid (Supplementary Table S4). Three CRISPR-Cas systems were identified, one on the chromosome and two on the plasmid (Supplementary Table S5). In addition, 130 genes were annotated as carbohydrate-active enzyme genes (Supplementary Table S6), 68 genes were predicted to be pathogen-host interaction-related genes (Supplementary Table S7), 1,172 genes were predicted to have transmembrane structures (Supplementary Table S8), 494 genes were predicted to encode transporters (Supplementary Table S9), 848 genes contained signal peptide domains (Supplementary Table S10), and 705 genes were predicted to encode secreted proteins (Supplementary Table S11).

Comparative genome analysis between strain gd-2 and the 10 most similar genomes
BLAST alignment of the gd-2 genome against the NCBI database identified the 10 most similar sequences: six plasmid sequences of R. solanacearum, two chromosome sequences of R. solanacearum, and two plasmid sequences of R. pseudosolanacearum (Supplementary Table S12). The comparative genome circle diagram is shown in Figure 4A. gd-2-PlasmidA is 2,098,962 bp long, with a GC content of 67.00%. By screening genes predicted with CARD and VFDB at identity >60% and coverage >90%, we identified nine possible antibiotic resistance genes and 18 candidate virulence factors. The antibiotic resistance genes are mainly associated with multidrug efflux pumps, including adeB (AdeABC), rosA and rosB (RosAB), acrB (AcrAB-TolC), adeF (AdeFGH), muxB (MuxABC-OpmB), mdtC (MdtABC-TolC), and baeR, which promotes the expression of the MdtABC and AcrD efflux pumps. The virulence factors include flagellar genes encoding the polar flagella required for motility and macrophage invasion; cya, encoding a bifunctional toxin with adenylate cyclase and haemolytic activity that also acts as an anti-inflammatory protein; and heat shock protein 60 (Hsp60), which mediates complement-independent attachment to mammalian and amoebic host cells. In addition, five GIs, one prophage, and two CRISPR elements were identified on this plasmid. The differences between gd-2-PlasmidA and the other genomes were mainly located in GI12-GI14 (612,748-700,708 bp). We performed a detailed gene cluster comparison of the GI12-GI14 region of gd-2-PlasmidA with the four most similar sequences: the strain B2 plasmid (GenBank: CP049788.1), the strain R24 megaplasmid (GenBank: CP076122.1), the strain 202 chromosome (GenBank: CP049789.1), and the strain CQPS-1 chromosome (Figure 4B). The GI12 sequence of gd-2-PlasmidA differed from all the other sequences except the B2 plasmid in the two putative proteins preceding the second IS5 element and in parts of IS5 itself. In contrast, the GI14 sequence was highly similar among the strains, although the six genes at the end of the chromosome were deleted in both strain 202 and strain CQPS-1. In addition, the prophage region of the R24 megaplasmid differed most from prophage Ph07 of gd-2-PlasmidA, with an internal deletion of three consecutive XerD genes; these genes were inverted in the chromosome sequences of strains 202 and CQPS-1. The strain 202 chromosome contained two copies of the same Ph07-like sequence, while the strain CQPS-1 chromosome contained an additional copy of the Ph07-like sequence but lacked many other genes. These results suggest that the mobile elements GI12, Ph07, and GI14 may have undergone further gene loss or duplication after integrating into gd-2-PlasmidA-like plasmids of R. solanacearum, resulting in substantial differences among the plasmids.
The functional genes related to the hrp gene cluster were analyzed in all nine strains. The number of such genes ranged from 26 to 30 among strains, with 30 identified in gd-2 (Supplementary Table S15), the same number as in GMI1000 and CQPS-1. The hrp gene clusters of four strains, GMI1000, CQPS-1, FQY-4, and gd-2, were compared (Figure 6C).
The R. solanacearum populations that cause bacterial wilt are highly complex and diverse. To date, 55 sequevars of R. solanacearum have been identified, and the NCBI database contains genome sequences of 145 R. solanacearum strains (Ahn et al., 2011; Cai et al., 2015; Liu et al., 2017; Asolkar and Ramesh, 2018; Greenrod et al., 2023). The R. solanacearum genome is approximately 5.8 Mb and consists mainly of two circular replicons, occasionally with additional small plasmids; for example, CMR15 carries a 35 kb plasmid and PSI07 a 13 kb plasmid (Liu et al., 2017). The gd-2 genome is 5.93 Mb, comprising the chromosome (3.83 Mb) and the megaplasmid (2.10 Mb), and is larger than those of phylotype I strain FJ1003 (5.90 Mb), phylotype I sequevar 17 strain CQPS-1 (5.89 Mb), and phylotype I strain GMI1000 (5.8 Mb) (Salanoubat et al., 2002; Liu et al., 2017; Chen et al., 2022). Gene prediction identified 3,434 genes on the chromosome and 1,640 on the megaplasmid of gd-2, similar to FJ1003 (3,446 chromosomal and 1,564 megaplasmid genes), CQPS-1 (3,573 chromosomal and 1,656 megaplasmid genes), and phylotype I sequevar 14M strain FJ1 (3,502 chromosomal and 1,596 megaplasmid genes) (Tan et al., 2022). The hrp gene cluster is an important component of the T3SS; it is necessary for the pathogenicity of R. solanacearum and for inducing hypersensitive responses in non-host plants (Lindgren, 1997). Thirty hrp genes were identified in both gd-2 and CQPS-1. However, gd-2 carries fewer GIs (17), an important class of horizontally transferred elements, than CQPS-1 (21), FJ1 (21), and FJ1003 (23), which may affect its environmental adaptability. The predicted noncoding RNAs of gd-2 included 59 tRNAs and 12 rRNAs (four each of 5S, 16S, and 23S rRNA). The number of tRNAs is similar to that of CQPS-1 (58), biovar 4 strain Bs715 (59), FQY-4 (62), and FJ1 (59), and considerably higher than that of FJ1003 (35), phylotype I strain YC45 (46), and race 4 biovar 4 strain SD54 (46) (Cao et al., 2013; Shan et al., 2013; She et al., 2015; Liu et al., 2017; Chen et al., 2022; Tan et al., 2022; Jeong et al., 2023). More rRNA may enhance protein synthesis capacity and thereby improve the adaptability of strains to the environment.
FIGURE: Statistical analysis of the structural composition of the secretion systems of strain gd-2. The genes of each secretion system are represented by their gene names in the KEGG database, and the red boxes indicate the corresponding genes in the strain.
Pathogenic bacteria rely heavily on effector molecules secreted into the extracellular milieu or directly into host cells to exert toxicity on the host or surrounding organisms. These functionally diverse macromolecules are exported through different secretion apparatuses (Li et al., 2012; Bai et al., 2018). Seven types of secretion systems have been identified to date, which differ not only in the effector molecules they secrete but also in the composition of their apparatuses. T1SS, T2SS, T3SS, T5SS, and T6SS are mainly found in Gram-negative bacteria, T7SS is mainly found in Gram-positive bacteria, and T4SS is found in both (Parizad et al., 2016; Cordsmeier et al., 2022). T1SS and T5SS have simple structures consisting of only two or three proteins (Zhou et al., 2019). T2SS, T3SS, T4SS, and T6SS are more complex, and their apparatuses span the entire cell envelope (Korotkov et al., 2012). Research on the T7SS is still in its infancy, and its apparatus and mechanism remain unclear (Bitter et al., 2009). T3SS, T4SS, and T6SS can inject effector molecules directly into eukaryotic cells; they are mostly encoded by clusters of contiguous genes, and in pathogenic bacteria these apparatus genes often reside in virulence islands (Liao et al., 2021). In this study, 66 secretion system structural genes were identified, including 5 in T1SS, 23 in T2SS, 10 in T3SS, 16 in T6SS, 11 in Sec-SRP, and 3 in Tat.
Genes encoding T3SS effectors typically carry an hrpII box (TTCG-N16-TTCG) in their promoters, through which they are transcriptionally activated by HrpB under the control of HrpG, and effector secretion is powered by the T3SS ATPase complex. Effectors pass through the cytoplasmic ring, basal body, inner-membrane export apparatus, and translocation pore into plant cells (Mcnally et al., 2011). The effector proteins of R. solanacearum comprise 94 orthologous families, of which 71 are translocated or secreted through the T3SS (Cunnac et al., 2004). A total of 72 T3Es were identified in the gd-2 genome, comparable to the numbers in GMI1000 (74), FQY-4 (70), and Y45 (69). RipAZ2 is a T3E unique to gd-2 compared with the other eight sequenced strains. In contrast, the phylotype I tobacco strain CQPS-1 (54) and the phylotype III strain CMR15 (61) carry considerably fewer T3Es. The phylotype framework of R. solanacearum lineages reflects the evolution and geographical origins of the species; we therefore speculate that the T3Es specific to each of the four lineages may have arisen during the long-term coevolution of the strains and their hosts, which also reflects the complexity of the R. solanacearum species complex from another perspective.
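A genome-wide scan for the hrpII box motif mentioned above can be sketched with a regular expression over both strands. This is a minimal illustration: it only finds sequence matches to the TTCG-N16-TTCG consensus, which alone over-predicts; real analyses usually restrict hits to promoter regions upstream of annotated genes. The toy promoter sequence is hypothetical.

```python
import re

# hrpII box consensus from the text: TTCG, any 16 nt, TTCG (24 nt in total).
# The lookahead lets overlapping occurrences be reported.
HRPII_BOX = re.compile(r"(?=(TTCG[ACGT]{16}TTCG))")
BOX_LEN = 24
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_hrpii_boxes(seq):
    """Return sorted (start, strand) pairs; starts are 0-based forward coordinates."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in HRPII_BOX.finditer(seq)]
    rc = reverse_complement(seq)
    for m in HRPII_BOX.finditer(rc):
        # rc[i:i+24] corresponds to seq[len(seq)-i-24 : len(seq)-i]
        hits.append((len(seq) - m.start() - BOX_LEN, "-"))
    return sorted(hits)


# Toy promoter fragment containing one forward-strand box at position 10.
promoter = "G" * 10 + "TTCG" + "ACGT" * 4 + "TTCG" + "G" * 10
print(find_hrpii_boxes(promoter))                      # [(10, '+')]
print(find_hrpii_boxes(reverse_complement(promoter)))  # [(10, '-')]
```

Because TTCG-N16-TTCG is not its own reverse complement, both strands must be searched; reverse-strand hits are mapped back to forward-strand coordinates so that all positions are comparable.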

FIGURE 2 COG, GO, and KEGG functional categories of protein-coding genes in the R. solanacearum gd-2 genome. (A) COG classification of gd-2. The horizontal axis shows the 25 COG categories, and the vertical axis shows the number of genes annotated to each category in the genome. (B) Clustering results of GO annotation. The horizontal axis shows the three GO categories, and the vertical axis shows the number of genes annotated to each term in the genome. (C) Histogram of KEGG analysis of gd-2. The horizontal axis shows the number of genes, and the vertical axis shows the KEGG categories.

FIGURE 3 Genomic circle diagram of gd-2 (chromosome and plasmid). (A) Genomic circle diagram of gd-2. (B) Plasmid circle diagram of gd-2. The outermost circle indicates genome size; the second and third circles show the CDSs on the positive and negative strands, with colors indicating their COG functional classes; the fourth circle shows rRNA and tRNA; the fifth circle shows GC content, where outward red peaks mark regions with GC content above the genome-wide average and inward blue peaks mark regions below it, and in both directions higher peaks indicate greater deviation from the average. The innermost circle shows GC skew, calculated as (G − C)/(G + C), which helps distinguish the leading and lagging strands: generally, GC skew is >0 on the leading strand and <0 on the lagging strand. GC skew can also help locate the replication origin (minimum of cumulative skew) and terminus (maximum of cumulative skew).
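The GC skew track and the cumulative-skew rule described in the Figure 3 caption can be illustrated with a short sketch. Window size, step, and the toy sequence are illustrative assumptions; this is not the Circos configuration used for the figure, and the function returns the start of the window at which the cumulative minimum or maximum is reached (the switch point lies at that window's end).

```python
def gc_skew_windows(seq, window=1000, step=None):
    """Return (window_start, (G - C) / (G + C)) for each full window."""
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((start, (g - c) / (g + c) if g + c else 0.0))
    return skews


def origin_terminus(skews):
    """Window starts at the minimum (origin) and maximum (terminus) of cumulative skew."""
    cumulative, running = [], 0.0
    for start, skew in skews:
        running += skew
        cumulative.append((running, start))
    return min(cumulative)[1], max(cumulative)[1]


# Toy replicon: a C-rich half (skew -0.5) followed by a G-rich half (skew +0.5).
toy = "GCCC" * 1000 + "GGGC" * 1000
windows = gc_skew_windows(toy)
print(origin_terminus(windows))  # (3000, 7000)
```

In the toy replicon the cumulative skew bottoms out at the end of the window starting at 3,000 bp, where the sequence switches from C-rich to G-rich, which is where this rule places the origin; for a real circular replicon the curve is computed over the whole circle.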

FIGURE 4 Comparative genome circle analysis and plasmid gene cluster comparison of gd-2. (A) Comparative genome circle diagram of gd-2, with the large plasmid of gd-2 as the reference. From inside to outside: first circle, GC content of the gd-2 plasmid; second circle, GC skew of the gd-2 plasmid; third circle, full-length sequence of the gd-2 plasmid; fourth circle, full-length sequence of the B2 genome plasmid; fifth circle, full-length sequence of the Phyl III-seqv23 genome chromosome III; sixth circle, full-length sequence of the pMAFF241647 plasmid; seventh circle, full-length sequence of the SL3822 genome plasmid; eighth circle, full-length sequence of the FJAT15244.F50 genome Plas1 plasmid; ninth circle, full-length sequence of the MP chromosome in the P824 genome; tenth circle, full-length sequence of the plasmid in the OE1-1 genome; eleventh circle, full-length sequence of the plasmid in the RS 476 genome; twelfth circle, full-length sequence of the HA4-1MP plasmid; thirteenth circle, mobile elements annotated on the gd-2 plasmid, including genomic islands, prophages, and CRISPR-Cas systems; fourteenth circle, suspected drug resistance genes annotated on the gd-2 plasmid; fifteenth circle, suspected virulence genes annotated on the gd-2 plasmid. (B) Comparative gene cluster analysis of gd-2-PlasmidA: sequence comparison of GI12-GI14 of gd-2-PlasmidA with B2, 202, and CQPS-1.

FIGURE 6 Venn diagrams and gene comparison of hrp clusters between gd-2 and other R. solanacearum strains. (A) Venn diagram showing the numbers of specific and shared gene families among gd-2 and five strains of different phylotypes. (B) Venn diagram showing the numbers of specific and shared gene families among gd-2 and three phylotype I strains. (C) Genetic organization and gene comparison of hrp clusters between CQPS-1, Y45, and FQY_4.