Directly Sequenced Genomes of Contemporary Strains of Syphilis Reveal Recombination-Driven Diversity in Genes Encoding Predicted Surface-Exposed Antigens

Syphilis, caused by Treponema pallidum subsp. pallidum (TPA), remains an important public health problem with an increasing worldwide prevalence. Despite recent advances in in vitro cultivation, genetic variability of this pathogen during infection is poorly understood. Here, we present contemporary and geographically diverse complete treponemal genome sequences isolated directly from patients using a methyl-directed enrichment prior to sequencing. This approach reveals that approximately 50% of the genetic diversity found in TPA is driven by inter- and/or intra-strain recombination events, particularly in strains belonging to one of the defined genetic groups of syphilis treponemes: Nichols-like strains. Recombinant loci were found to encode putative outer-membrane proteins and the recombination variability was almost exclusively found in regions predicted to be at the host-pathogen interface. Genetic recombination has been considered to be a rare event in treponemes, yet our study unexpectedly showed that it occurs at a significant level and may have important impacts on the biology of this pathogen, especially as these events occur primarily in the outer membrane proteins. This study reveals the existence of strains with different repertoires of surface-exposed antigens circulating in the current human population, which should be taken into account during syphilis vaccine development.


INTRODUCTION
Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis, a globally occurring disease. Although the worldwide number of syphilis cases dramatically decreased after the introduction of penicillin therapy in the 1940s, the estimated number of new syphilis cases per year remains over 5.6 million. Especially alarming is the number of congenital syphilis cases, which is approaching one million cases per year (Newman et al., 2012; Peeling et al., 2017). In developed countries, syphilis is often transmitted among men who have sex with men (MSM). Moreover, MSM patients with syphilis are often coinfected with HIV (42% in Western Europe) (Dubourg et al., 2015). It is believed that syphilis facilitates HIV infection, since syphilitic genital ulcers are infiltrated with lymphocytes (the primary target cells for HIV infection) and provide a portal of entry for HIV acquisition. The rising prevalence of syphilis among MSM patients has coincided with the introduction of highly active anti-retroviral drugs, leading to decreased HIV-associated mortality and the re-emergence of unsafe sexual behavior among MSM (Stolte et al., 2001). TPA infections are characterized by early and fast dissemination, immune evasion and long persistence in untreated patients. However, the underlying molecular mechanisms remain poorly understood (Radolf et al., 2016).
In spite of recent advances in in vitro cultivation of TPA (Edmondson et al., 2018), routine laboratory cultivation of this pathogen directly from patient samples is not yet possible. Therefore, most of the information on TPA genetics comes from genome sequencing studies, where DNA was isolated from bacteria propagated in experimentally infected rabbits (Fraser et al., 1998; Matějková et al., 2008; Giacani et al., 2010, 2014; Pětrošová et al., 2012, 2013; Zobaníková et al., 2012; Tong et al., 2017). Because clinical samples contain overwhelming levels of human DNA and very low amounts of TPA DNA (1000:1 ratio of human to TPA DNA), the research community uses culture-independent enrichment techniques prior to whole genome sequencing of TPA clinical samples. However, available enrichment techniques demonstrate low efficiency (e.g., anti-treponemal antibody enrichment, ATAE) (Grillová et al., 2018b) or are based on sequence-specific protocols (e.g., DNA-capture microarray and "in solution" capture techniques) (Arora et al., 2016; Pinto et al., 2016; Knauf et al., 2018; Marks et al., 2018), thus preventing the recovery of unique sequences not present in the reference genomes.
In this study, we performed direct whole genome sequencing of 25 TPA clinical samples isolated from different geographical areas using methyl-directed enrichment prior to next generation sequencing (NGS) (Barnes et al., 2014). Using this approach, we obtained 11 complete genome sequences, which represents the vast majority (92%) of complete TPA genomes sequenced directly from clinical samples. The subsequent detailed comparative genomic analyses revealed unexpected variability among Nichols-like genomes driven by inter-clade and/or intra-strain recombination events, which were accumulated mainly in the genes encoding predicted outer membrane proteins. This discovery, beyond being relevant to the understanding of basic biology of treponemes, highlights the presence of different repertoires of alleles coding for potential virulence factors, which circulate in the current human population.

Clinical Samples
We selected 25 TPA samples recently isolated from 24 patients diagnosed with syphilis for whole genome sequencing. The sample set was selected to contain (i) samples with the highest possible genetic diversity, (ii) samples from different geographical areas and (iii) samples representing contemporary TPA infections. The samples were collected in four countries on three different continents (Australia, Cuba, Czechia, and France), mostly from males (92%), of whom 74% were MSM (Table 1). Samples were taken as genital (n = 16), anal (n = 4), buccal (n = 3), or skin smears (n = 1), with one sample of lung tissue from a fatal case of congenital syphilis. As a result of the non-random sample selection, Nichols-like strains were overrepresented in our sample set (31%) compared to their prevalence in the infected population (5.9%) (Šmajs et al., 2018). Samples belonged to 9 different sequence types (STs) and most (72%) carried the A2058G mutation in both rrn operons, leading to resistance to macrolide antibiotics (Table 1).
The numbers of TPA DNA copies and human DNA copies were determined by qPCR in all examined samples. The number of TPA DNA copies varied from 1 to 10^5 per µl, with the TPA DNA/human DNA ratio ranging from 0.01 to 3.69 (Supplementary Table S1). Given that the human genome is approximately 3000 times larger than the TPA genome, the samples contained 10^3-10^5 times more human DNA than treponemal DNA, requiring TPA DNA enrichment prior to sequencing.
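The conversion from copy-number ratio to DNA excess can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that the mass ratio is simply the genome-size ratio divided by the copy ratio, and the function and constant names are hypothetical.

```python
# Genome-size ratio quoted in the text (human genome ~3000x larger than TPA).
GENOME_SIZE_RATIO = 3000


def human_to_tpa_mass_ratio(tpa_to_human_copy_ratio: float) -> float:
    """Convert a TPA/human copy-number ratio into a human/TPA DNA mass ratio.

    Assumption: DNA mass scales linearly with genome size times copy number.
    """
    if tpa_to_human_copy_ratio <= 0:
        raise ValueError("copy ratio must be positive")
    return GENOME_SIZE_RATIO / tpa_to_human_copy_ratio


# The two extremes of the copy ratio reported in Supplementary Table S1.
for copy_ratio in (0.01, 3.69):
    excess = human_to_tpa_mass_ratio(copy_ratio)
    print(f"TPA/human copy ratio {copy_ratio}: ~{excess:.0f}x more human DNA")
```

Under this assumption the two extremes give roughly 8 × 10^2 and 3 × 10^5, which agrees with the stated range of 10^3-10^5 to within an order of magnitude.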

Methyl-Directed Enrichment Using Restriction Endonuclease DpnI
The method presented in this paper is based on the binding activity of the DpnI endonuclease. DpnI is a restriction endonuclease that recognizes DNA methylated on the adenine residue within the GATC sequence. This methylated DNA motif occurs in all bacteria that have deoxyadenosine methyltransferase (DAM) and is not present in higher eukaryotes. DpnI proteins immobilized on magnetic beads were used for specific capture of prokaryotic DNA. In the absence of Mg2+ ions, DpnI binds its recognition sequence without cutting it. A previous study by Stamm et al. (1997) identified methylated adenine residues in GATC sequences in the TPA genome.
TABLE 1 footnotes | ... (Grillová et al., 2018b) prior to the DpnI enrichment. **Parallel samples taken from the same patient but isolated from different clinical material. MSM, men who have sex with men; MSW, men who have sex with women; WSM, women who have sex with men.
To ensure the appropriate binding of DpnI and the effectiveness of the DpnI enrichment, a pilot experiment on one sample (CW88; Table 1) was performed. Before enrichment, the sample was sequenced with 384,489,027 reads, of which 25,168 mapped to the treponemal genome. This resulted in 92% breadth of genome coverage, a median sequencing depth of 6.6×, and more than 100 sequencing gaps. After enrichment, the same sample was sequenced with a similar number of reads (332,564,761), yielding 296,978 treponemal reads, which resulted in a near-complete genome (98%) with a median sequencing depth of 68×. Except for the paralogous and repetitive regions (listed in Supplementary Table S2), which were excluded from the reference-guided approach, the draft genome of CW88 had even sequencing coverage depth without any detectable biases. This indicated that Gm6ATC DNA motifs are evenly distributed across the TPA genomes and that the DpnI method is suitable for TPA DNA enrichment.
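The effect of the enrichment in the CW88 pilot experiment can be expressed as a fold change in the fraction of reads mapping to the treponemal genome. The following sketch uses only the read counts and depths quoted above; the function name is hypothetical and the calculation is an illustration, not part of the published pipeline.

```python
def on_target_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of all sequenced reads that map to the treponemal genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


# Read counts reported for sample CW88 before and after DpnI enrichment.
before = on_target_fraction(25_168, 384_489_027)
after = on_target_fraction(296_978, 332_564_761)

fold_enrichment = after / before   # change in on-target read fraction
depth_gain = 68 / 6.6              # change in median sequencing depth

print(f"on-target fraction before: {before:.2e}")
print(f"on-target fraction after:  {after:.2e}")
print(f"fold enrichment: {fold_enrichment:.1f}x, depth gain: {depth_gain:.1f}x")
```

With these figures the on-target fraction rises about 13.6-fold, in line with the roughly 10-fold gain in median depth obtained from a slightly smaller number of reads.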

Direct Whole Genome Sequencing
Methyl-directed enrichment was applied to all examined clinical samples (n = 25) prior to NGS (details in section "Materials and Methods"). We used the preliminary sequencing to calculate the appropriate number of reads needed for the best possible coverage in the next sequencing runs. The NGS statistics are given in Supplementary Table S3. To determine whether the new sequencing approach was comparable to the traditional pooled segment genome sequencing (PSGS) approach, we sequenced isolates from the same organism (strains Phi-1 and Grady) with each method and did not find any discrepancies between the genomes (data not shown). According to the preliminary sequencing results, we selected samples containing treponemal DNA showing the highest breadth of coverage (>97%, n = 11) as candidates for complete genome sequencing and samples with lower coverage (69-97%, n = 8) as candidates for genome-wide analyses (with a depth of coverage of 3 or greater). The samples with the lowest coverage (<10%, n = 7) were excluded from further analyses (Supplementary Table S3). For whole genome determination, we amplified and Sanger sequenced regions with low coverage (fewer than 3 good-quality reads; on average 10 regions per genome) as well as regions that were excluded from the bioinformatic pipeline, including paralogous regions (tpr genes and rrn operons) and repetitive regions (e.g., arp and TP0470 genes) (details in section "Materials and Methods").
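The triage of samples by breadth of coverage can be sketched as follows. This is a minimal illustration under stated assumptions: breadth is taken to be the percentage of reference positions covered by at least 3 reads, the category labels and function names are hypothetical, and values between 10% and 69% are reported as unassigned because the text gives no rule for them.

```python
def breadth_of_coverage(depths, min_depth=3):
    """Percentage of reference positions covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def classify_sample(breadth_percent):
    """Assign a sample to an analysis tier using the thresholds quoted in the text."""
    if breadth_percent > 97:
        return "complete-genome candidate"
    if breadth_percent >= 69:
        return "genome-wide analysis"
    if breadth_percent < 10:
        return "excluded"
    return "unassigned (no rule stated for 10-69%)"


# Toy per-position depth vector; real input would come from a read alignment.
toy_depths = [0, 1, 3, 5, 10, 2, 3, 4, 0, 7]
print(breadth_of_coverage(toy_depths))
for breadth in (98.0, 80.0, 5.0):
    print(breadth, "->", classify_sample(breadth))
```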
All samples from SS14-like and Nichols-like clades revealed complete gene synteny. However, we identified significant differences between the genetics of these two TPA clades (Figure 1 and Supplementary Figure S1). The variability observed in Nichols-like strains was about one order of magnitude higher compared to that of the SS14-like strains. Moreover, in contrast to SS14-like strains, Nichols-like strains showed accumulation of a high number of single nucleotide variants (SNVs) in several genes. The manual inspection of the genes with a high SNV density (defined as more than 4 SNVs per gene) revealed that the genetic diversity was a result of inter-clade and/or intra-strain recombination events (altogether covering 147 SNVs, which represents 49.5% of variability found within Nichols-like strains). The intra-strain recombination events included rearrangements of genes coding for lipoproteins (TP0856 and TP0858), which possessed modular structures, rrn spacers and the predicted virulence factors tprG and tprJ (described below, Figure 2). The inter-clade recombination events were identified as sequences resembling both syphilis genetic groups (i.e., SS14-like and Nichols-like groups) in the TP0136 gene encoding an outer membrane protein. In addition, the inter-clade recombination included sequences resembling both syphilis and bejel sequences (in TP0117 coding for TprC; in TP0317 coding for TprG; in TP0462 encoding a probable lipoprotein; in TP0483 coding for a hypothetical protein; in TP0621 coding for TprJ; in TP0865 encoding a putative outer membrane protein) (Table 2 and Figure 3).
A detailed overview of the different genome dynamics of SS14-like and Nichols-like strains is given in Supplementary Text S1.
FIGURE 1 | Phylogeny of all TPA complete genome sequences determined to date. A maximum likelihood method with bootstrapping was used to generate the phylogenetic tree based on 2273 variable positions found exclusively in the whole genome sequences available to date (Table 2). The bootstrap values, when above 60, are given next to the branches in red. SS14-like strains are represented by red squares, Nichols-like strains by blue squares. The year of isolation is given next to the branches. Strains designated CW30-CW88 are the whole genome sequences determined in this study by DpnI enrichment, while Phi-1 and Grady were established in this study by PSGS. These represent 65% of all whole genome sequences of TPA to date. tpr genes, repetitive regions and inter-clade and intra-strain recombinant loci were excluded from this analysis (Supplementary Table S2 and Figures 2, 5). We used the genome of Samoa D (T. pallidum subsp. pertenue) as an outgroup.

Analyses of tpr Genes
When analyzing tpr genes among the SS14-like samples, we did not find any significant variability, with the exception of sample CW30, where we found a Nichols-like allele in the tprG gene (differing in 29 SNVs from the SS14 sequence), probably as the result of an inter-clade recombination event (Figure 3). Otherwise, the SS14-like samples differed only in a few SNVs in the tprC and tprI genes compared to the SS14 reference (1 and 3, respectively; Figure 4 and Supplementary Figure S2).
In Nichols-like samples, the analyses of tpr genes revealed higher sequence variability compared to the group of SS14-like samples. We identified gene conversion of a partial sequence of the tprJ gene into the tprG gene (intra-strain recombination, Figure 2). Moreover, in sample CW59, we identified inter-clade recombination in the tprJ allele (with a putative donor sequence from TEN, T. pallidum subsp. endemicum, or TPE, T. pallidum subsp. pertenue; Figure 3). Finally, we identified new alleles in the tprC locus (tprC3, tprC4, Figure 3) represented by different branches in the maximum likelihood phylogeny (Supplementary Figure S2). After manual inspection of the tprC3 and tprC4 alleles, we identified the putative donor sequences of these alleles as SS14-like, bejel (TEN), or yaws (TPE) strains. The remaining tpr genes among Nichols-like strains were quite uniform, except for a few SNVs found in alleles present in the tprB, tprI, tprJ, and tprL loci (Figure 4).

Recombination
When analyzing all published complete TPA genomes (including genomes from this study), we observed inter-clade or intra-strain recombination events frequently among Nichols-like strains, and only sporadically among SS14-like strains (Figure 1 and Table 2). More specifically, intra-strain recombination was found in all complete genomes of the Nichols-like clade. Interestingly, inter-clade recombination was found more frequently in Nichols-like strains than in SS14-like strains.
FIGURE 2 | (B) Rearrangement of the modular structure of TP0856 and TP0858 genes. We identified two variants of the modular structure of the TP0856 and TP0858 genes: r1r3r4r5/r7r4r5 and r1r3r4r5/r7r4r6. These structures differ in 30 nucleotide sites. The figure was modified according to Strouhal et al. (2018). (C) Intra-strain recombination between tprG and tprJ. tprG1 and tprG2 allele variants differ in 18 variable positions.

Analyses of the Strains Based on the Conserved Genomic Regions
To analyze the phylogeny of TPA strains, we excluded all inter-clade and intra-strain recombinant loci identified in this study and in previous studies (Figure 5), as well as variable genes such as tpr genes, and reconstructed a phylogenetic network of these two clades (generated with Network) based solely on the conserved genomic regions. Except for the Mexico A strain, SS14-like strains created a star-like topology as described previously (Arora et al., 2016) and the highest genetic distance (18 SNVs) was found between sample CW87 and the SS14 reference genome (Supplementary Figure S3). Nichols-like strains were found to be more genetically diverse than SS14-like strains, with the minimum genetic distance represented by 12 SNVs in the case of DAL-1 (1991) and Nichols (1912), and the highest genetic distance represented by 160 SNVs in the case of SEA81-4 (2014) and DAL-1 (1991).
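Genetic distances of this kind are counts of differing sites between aligned sequences. The sketch below shows one way to compute them; it assumes that the sequences are already aligned to equal length and that positions with gaps or ambiguous bases are skipped, and the strain names and sequences are toy values rather than data from this study.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snv_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_distances(alignment: dict) -> dict:
    """Return the SNV distance for every pair of strains in the alignment."""
    return {
        (name_a, name_b): snv_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment (not real treponemal sequence).
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTTCGTAC",
    "strain_C": "ACGAACGTNC",
}
for pair, distance in pairwise_distances(toy_alignment).items():
    print(pair, distance)
```

The minimum and maximum values of such a distance table correspond to the closest and most distant strain pairs within a clade.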
Interestingly, SNVs distinguishing Nichols-like and SS14-like strains include a significantly higher proportion of synonymous substitutions (45.3%) compared to the SNVs found within the SS14-like strains (17%) and within the Nichols-like strains (18%), suggesting that separate evolutionary forces operate inside and between each clade. Since we observed the same recombination events in different phylogenetic branches (e.g., inter-clade recombination of TP0865 and tprJ in SEA81-4 and CW59; Table 2 and Figure 1), these recombination events may have emerged several times independently.

Recombination Loci Encode Putative Surface-Exposed Antigens
Predicted protein structures encoded by recombinant loci provide important insights into structural and functional implications of recombination-driven variability. Of the 8 recombinant protein-coding regions identified in this study, 6 were predicted to code for outer membrane proteins (TP0136, TP0858, TP0865, tprC, tprG and tprJ) and all of them contained predicted antigenic peptides (15-29). The prediction of treponemal protein structures is quite limited due to the lack of protein homologues. However, a protein structure for tprC was predicted by Centurion-Lara et al. (2013) and more recently by Kumar et al. (2018). The recombinant regions identified in the new tprC3 and tprC4 alleles found
TABLE 2 | Inter-clade and intra-strain recombination in examined samples and in previously determined whole genome sequences of TPA.
FIGURE 5 | Inter-clade recombinations identified to date. The genes with stars represent loci, which were previously identified as recombinant (Pětrošová et al., 2012;Arora et al., 2016;Mikalová et al., 2017b;Grillová et al., 2018c). Five (out of 10) loci were identified in this study and were found among contemporary clinical samples enriched by a sequence-independent enrichment method.

DISCUSSION
Efforts to understand the pathogenesis of TPA have been hindered by the inability to routinely propagate the bacterium in vitro and the lack of an efficient method for obtaining genomes directly from clinical samples. TPA isolates form two separate clusters, i.e., SS14-like and Nichols-like clades (Pětrošová et al., 2013; Nechvátal et al., 2014; Arora et al., 2016; Šmajs et al., 2016, 2018; Figure 1 and Supplementary Figure S1). We have found striking genetic diversity of the contemporary Nichols-like strains when compared to SS14-like strains. Nichols-like strains represent only a minority (about 6%) of contemporary strains circulating in the syphilis-infected population (Woznicová et al., 2007; Flasarová et al., 2012; Grillová et al., 2014, 2018c; Arora et al., 2016; Gallo Vaulet et al., 2017; Mikalová et al., 2017a; Pospíšilová et al., 2018), as revealed by molecular typing studies of TPA isolates. Although there are several possible explanations available, the ultimate reasons for this contemporary worldwide predominance of SS14-like isolates in the human population remain unknown (Šmajs et al., 2016). However, the high prevalence of SS14-like strains circulating in the contemporary syphilis-infected population could be due to the recent expansion of these strains. Therefore, it would not be surprising that most of the SS14-like strains are in fact of more clonal character than the Nichols-like strains.
The observed genetic diversity of the contemporary Nichols-like strains could, therefore, be a result of sampling bias. This possibility is supported by the existence of the sequence-diverse SS14-like strain, TPA Mexico A, isolated in 1953 (Pětrošová et al., 2012). To further address this question, additional molecular typing studies, accompanied by whole genome sequencing of genetically diverse MLST types, would be needed. The methyl-directed enrichment used in this study allowed us to discover that about 50% of this genetic diversity was a result of inter-clade and/or intra-strain recombination events. Although the molecular mechanisms of inter-clade and intra-strain recombination events could differ, both processes can provide new alleles to TPA strains that are positively selected by the host immune system. In fact, most of the detected variability within both SS14- and Nichols-like clades led to non-synonymous amino acid changes, which is consistent with positive selection of the corresponding genetic loci.

Intra-Strain Recombinant Events
As described previously, the mechanisms resulting in intra-strain recombinant events include gene conversion in regions with modular character, e.g., tpr (T. pallidum repeat) genes (predicted to code for potential virulence factors) (Gray et al., 2006; Strouhal et al., 2018), duplication or deletion of repetitive sequences (arp, TP0470) (Harper et al., 2008; Šmajs et al., 2018) and reciprocal translocation (rrn operons, tprCD loci) (Čejková et al., 2013; Centurion-Lara et al., 2013).
FIGURE 6 | Homology models of TprC, TP0858, and TP0865. Homology models are shown in cartoon representation. The mutated residues are depicted with spheres. The models are colored based on a rainbow coloring scheme (with the N-terminus of the protein colored blue and the C-terminus colored red).
The tprD2 allele, but not tprD (differing in 328 nucleotide positions), was previously predicted to encode an outer membrane protein (Centurion-Lara et al., 2000), which suggests a different functional role for each allele during the course of infection. In our study, tprD2 alleles were found among all completely sequenced TPA isolates (belonging to both SS14- and Nichols-like clades) despite the fact that tprD2 was previously believed to occur exclusively among SS14-like strains and the tprD allele among Nichols-like strains (Fraser et al., 1998; Zobaníková et al., 2012; Centurion-Lara et al., 2013). This suggests that in the ancestor of these Nichols-like strains, the "tprD allele" arose by duplication (gene conversion) of the tprC gene. A similar finding was recently published by Kumar et al. (2018). Interestingly, although no such recombination was found among SS14-like strains, in the SS14 genome, a minor tprD allele has been found in the tprD locus (Pětrošová et al., 2013).
Similar recombination of predicted virulence factors identified among the clinical samples in this study was observed between the tprG and tprJ genes resulting in a new tprG2 allele in Nichols-like clinical samples. The same pattern of recombination was already predicted as possible in the work of Strouhal et al. (2018).
Another recombination found in this study resulted in new patterns in the TP0856 and TP0858 genes, following a pattern previously recognized among TPE strain Kampung-Dalan K363, TPA SEA81-4 and TPE strains from the Solomon Islands (Marks et al., 2018). Both these proteins showed structural similarity to FadL, a long-chain fatty acid transporter (Kumar et al., 2018) required for the specific binding and transport of exogenous long-chain fatty acids prior to metabolic utilization. Moreover, the predicted 3D structures of TP0858 revealed that the recombination-driven diversity found almost entirely corresponds to residues located at the host-pathogen interface (Figure 6).
Finally, as a result of reciprocal translocation, we identified an inverse rrn spacer pattern in one of the Nichols-like samples. However, these rrn spacer patterns (Ile/Ala or Ala/Ile) appeared to be distributed randomly across species/subspecies classification, time and the geographical source of the treponemal strains (Čejková et al., 2013) and the impact of this intra-strain recombination remains unknown.

Inter-Clade Recombinant Events
Treponema pallidum subsp. pallidum is not considered a competent bacterium and does not possess gene transfer mechanisms. In addition, no plasmids or phages have been described as yet. Despite this, several apparent recombinant loci appear to result from inter-clade genetic recombinations. Previous studies identified such recombinations in TP0136 (Arora et al., 2016; Grillová et al., 2018c), TP0548 (Mikalová et al., 2017b), TP0326, TP0488 (Pětrošová et al., 2012), and TP0865 (Arora et al., 2016). In this study, we have identified five new recombinant loci representing one half of all inter-clade recombinant genes identified to date. We have identified recombination events between Nichols-like and SS14-like strains (TP0136, TP0317 [tprG]); and between Nichols-like strains and bejel treponemes (or possibly in some cases also TPE) (TP0117 [tprC], TP0462, TP0483, TP0621 [tprJ], TP0865) (Figure 5). In the case of tprG, we have found both intra-strain recombination (recombination between tprG and tprJ genes resulting in a new tprG2 allele) and inter-clade recombination (in the Nichols-like strain CW30 with sequence originating from SS14-like strains, Figure 3). In addition, the recombinant regions identified in the new tprC3 and tprC4 alleles found in this study (Supplementary Figure S2) correspond to the identified extracellular loops (L3, L4, L5) of the β-barrel outer membrane protein, which were predicted to serve as B-cell epitopes (Kumar et al., 2018). Similarly, the newly predicted 3D structures of protein TP0865 in this study (Figure 6) showed the accumulation of recombination-driven diversity in the residues located at the host-pathogen interface.
While the intra-strain recombinant events are relatively easy to explain, the presence of inter-clade recombinations would require a DNA transfer to the recipient bacterium from the outside, likely during co-infection of patients with treponemes either belonging to two different syphilis clades (Nichols, SS14) or two different subspecies of Treponema (causing syphilis and bejel). Cross-immunity experiments (Turner and Hollander, 1957) revealed that there is no protective immunity between different TPA strains and between TPA and TEN strains enabling co-infections or overlapping infections of different treponemal strains. The subsequent homologous recombination of DNA taken up by recipient cells could provide alleles encoding protein sequences allowing persistence and escape from the immune response of the host.
In addition to inter-clade and intra-strain recombinations identified in this study, we have analyzed all publicly available TPA genome sequences (Supplementary Table S4) for the presence of such recombinant events. Among whole genome sequences (n = 20), we identified the SEA81-4 strain (Nichols-like strain; CP003679.1), which carries the tprG2 allele and the r3r4r6 modular structure of the TP0858 gene as results of intra-strain recombination events, and in which we identified TP0865 and tprJ as recombinant loci. Among draft genomes (n = 74), we have identified strain UW189B as inter-clade recombinant in both the TP0462 and TP0865 loci.
Given the adaptive evolution operating within both clades of syphilis treponemes, recombinant loci appear to be important in treponemal pathogenesis and bacterium-host interactions. Despite the fact that TPA contains a low abundance of surface-exposed antigens, most of the Tpr proteins and recombinant proteins, including TP0136, TP0326, TP0462, TP0483, TP0488, TP0548, and TP0865, are outer membrane proteins, which are either targets for interactions with the immune system or structures enabling binding to host tissues (Brinkman et al., 2008; Arora et al., 2016; Kumar et al., 2018). In general, these proteins should be important candidates for vaccine development. This is opportune, since several research teams are currently working on the development of a vaccine against syphilis. A comprehensive syphilis vaccine needs to react not only with antigens present in the reference strains but also with variants among contemporary TPA strains circulating in the human population. The current research identifying molecular types of TPA strains and their subsequent genomic analyses should be able to provide the required inventory of treponemal antigens and their variants.

Collection of Clinical Samples
Clinical samples were collected between 2013 and 2016 from several clinical departments in Czechia (Department of Dermatovenerology, University Hospital Brno, Czechia); France (Institut Cochin U1016 Equipe Batteux, Laboratoire de Dermatologie-CNR Syphilis, Faculté de Médecine, Université Sorbonne Paris Descartes, Paris, France); Cuba (Instituto de Medicina Tropical "Pedro Kourí", Havana, Cuba) and Australia (Melbourne Sexual Health Centre, Australia) (Table 1). Patients were considered syphilis-positive when clinical symptoms were combined with positive syphilis serology or with positive PCR detection of treponemal DNA. All clinical samples were obtained after patients had signed a written informed-consent form. The design of the study was approved by the ethics committee of the Faculty of Medicine, Masaryk University and the study was conducted in compliance with the Declaration of Helsinki.

Isolation of DNA, MLST, and Quantification of Treponemal and Human DNA Present in the Clinical Material
Swab extracts (prepared by submersion of swabs into 1.5 ml of PBS and agitation for 5 min at room temperature) and a tissue sample (25 mg) were used for isolation of DNA using a QIAamp DNA Blood Mini kit and a DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany) according to the manufacturer's recommendations. Multi-locus sequence typing was performed as described previously (Grillová et al., 2018a). The sequences were submitted to the BIGSdb of T. pallidum subsp. pallidum available at PubMLST, and the allelic profiles, STs and clonal complexes were determined (Table 1).

Methyl-Directed Enrichment Using Restriction Endonuclease DpnI
The endonuclease DpnI cleaves the tetramer GATC when methylated at the N6 position of adenine. When used under conditions which prevent digestion, DpnI binds the methylated tetramer, which is distributed approximately every 256 bases on average in DAM-positive bacteria such as TPA but is absent in mammalian genomes. This enables selective bacterial DNA enrichment from the excess of human DNA found in syphilis samples. To accomplish methyl-directed enrichment, clinical DNA samples (10-40 µl) were added to DpnI-coated beads in 1.7 mL Eppendorf tubes in a final volume of 50 µl, as described previously (Barnes et al., 2014). The beads were mixed by end-over-end rotation for 30 min and the DpnI-magnetic beads were separated using a magnetic stand. The beads were washed once with Wash Buffer (10 mM Tris pH 7.9, 500 mM NaCl, 10 mM CaCl2, 0.1% Tween 20) followed by a single Binding Buffer wash (10 mM Tris pH 7.9, 50 mM NaCl, 10 mM CaCl2, 0.01% Tween 20). DNA was eluted from beads by incubation with 20 µl of 5 M guanidinium thiocyanate at room temperature for 5 min and subsequently desalted via dialysis for 45 min using 20,000 MWCO Slide-A-Lyzer MINI dialysis cups (Thermo Scientific, Waltham, MA, United States).
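The figure of one GATC site per 256 bases follows from treating the four bases as equally frequent and independent, so that a given tetramer is expected once every 4^4 positions. The sketch below illustrates this expectation and a simple motif count; the genome length of about 1.14 Mb is an assumption introduced here for illustration, and real site density depends on base composition.

```python
def expected_site_spacing(motif: str) -> int:
    """Expected spacing of a motif in random sequence with equal base frequencies."""
    return 4 ** len(motif)


def count_motif(sequence: str, motif: str = "GATC") -> int:
    """Count occurrences of `motif` in `sequence`.

    GATC is its own reverse complement, so scanning one strand is sufficient.
    """
    sequence = sequence.upper()
    width = len(motif)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == motif
    )


ASSUMED_GENOME_LENGTH = 1_140_000  # assumption: approximate TPA genome size in bp

spacing = expected_site_spacing("GATC")
expected_sites = ASSUMED_GENOME_LENGTH / spacing
print(f"expected spacing: {spacing} bp")
print(f"expected GATC sites in the assumed genome: ~{expected_sites:.0f}")
print(count_motif("GATCGATCAAGATC"))  # toy sequence
```

Under these assumptions a genome of this size would carry several thousand potential DpnI binding sites, which is what allows capture of fragments from across the whole chromosome.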

Next Generation Sequencing
The Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA, United States) was used to produce barcoded libraries for all DpnI-enriched fractions. Library products were amplified using 19 cycles and DNA was purified with AMPure XP beads (New England BioLabs, Ipswich, MA, United States) with an elution volume of 20 µl. Library quality and size distributions were determined with the Lab Chip GX Touch-HT (Perkin Elmer, Waltham, MA, United States) and High Sensitivity DNA Analysis Kit (Perkin Elmer, Waltham, MA, United States). Libraries were diluted for sequencing and pooled as appropriate for the targeted sequencing depth on a single flow cell. NextSeq runs were prepared using NextSeq 500/550 300 Cycle High Output v2 (FC-404-2004), loaded at approximately 3.9 pM. All runs were configured to obtain 149 nucleotide paired-end read lengths.

Completion of Whole Genomes
In the samples with the highest NGS breadth of coverage (>97%, n = 11), Sanger sequencing of regions with low coverage was performed (approximately 10 regions for every genome). Moreover, paralogous tpr genes (tprC, tprD, tprE, tprF, tprG, tprI, tprJ) were amplified with long-range PCR (Supplementary Table S2) under conditions described in the Quality of treponemal DNA paragraph and Sanger sequenced using sequencing primers presented in Supplementary Table S5. The number of 60 bp-long repetitions in the arp gene and the number of 24 bp-long repetitions in the TP0470 gene were determined by Sanger sequencing using primers listed in Supplementary Table S2. The number of repetitions was verified by gel electrophoresis. To determine the intergenic spacers between rRNA encoding rrn loci, we designed primers for nested PCR using one unique primer in the outer step for distinguishing Ala/Ile and Ile/Ala patterns (Supplementary Table S6). The Sanger sequencing reads were combined with the Illumina sequencing reads using Lasergene software (DNASTAR v. 7.1.0.; DNASTAR, Madison, WI, United States). The workflow to obtain the whole genome sequences of TPA is given in Supplementary Figure S4.
Sequences of treponemal rRNA operons (rrn1, rrn2) were searched against the de novo assemblies using BLAST (v2.2.31+, Camacho et al., 2009), and all hits with more than 90% identity and an alignment length of 100 bp were extracted in FASTA and BED format using Bedtools (v2.27.0, Quinlan and Hall, 2010). BLAST was used to discover all potential sequences of rRNA regions or their fragments. We manually inspected all the BLAST hits and selected those that most likely represented the real rRNA regions.
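The hit-filtering step can be illustrated with a minimal Python sketch. It assumes that BLAST was run with the standard 12-column tabular output (-outfmt 6) and that the 100 bp alignment length acts as a minimum; both points, as well as the function name blast_hits_to_bed, are assumptions for illustration and are not taken from the original workflow.

```python
def blast_hits_to_bed(lines, min_identity=90.0, min_length=100):
    """Filter BLAST tabular (-outfmt 6) hits and convert them to BED-like tuples.

    Assumed columns: qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore
    Keeps hits with identity > min_identity and alignment length >= min_length
    (treating the length threshold as a minimum is an assumption).
    """
    bed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        sstart, send = int(fields[8]), int(fields[9])
        if pident <= min_identity or length < min_length:
            continue
        # BLAST reports minus-strand hits with sstart > send
        strand = "+" if sstart <= send else "-"
        # BED uses 0-based, half-open coordinates
        start = min(sstart, send) - 1
        end = max(sstart, send)
        bed.append((sseqid, start, end, qseqid, pident, strand))
    return bed


# Hypothetical example hits (contig names and coordinates are invented)
example_hits = [
    "rrn1\tcontig_1\t99.5\t1500\t7\t0\t1\t1500\t2001\t3500\t0.0\t2700",
    "rrn1\tcontig_2\t85.0\t400\t60\t0\t1\t400\t10\t409\t1e-90\t300",
    "rrn2\tcontig_3\t97.0\t120\t3\t0\t1\t120\t900\t781\t1e-50\t200",
    "rrn2\tcontig_4\t99.0\t60\t1\t0\t1\t60\t5\t64\t1e-20\t100",
]
filtered = blast_hits_to_bed(example_hits)
```

The resulting intervals could then be passed to a tool such as Bedtools to extract the corresponding sequences; the manual inspection of hits described above is not replaced by this filter.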

Phylogenetic Analyses
A maximum likelihood phylogeny based on whole genomes was inferred in MEGA (v6.0, Tamura et al., 2011) using the Tamura-Nei model and 1000 bootstrap replications. The phylogenetic tree was visualized using iTOL (v4, Letunic and Bork, 2007). Median-joining (MJ) networks were generated with Network version 4 (Bandelt et al., 1999).

Detection of Recombination Events
Recombination patterns were identified by manual inspection of gene sequences that had a high number of SNVs (identified by SNV calling) and displayed a phylogeny incongruent with the one derived from whole genome sequences. A high number of SNVs was defined as the presence of at least 4 SNVs per gene, which is about 10 times the number of polymorphic sites expected between TPA clades (Pětrošová et al., 2013;Šmajs et al., 2016); this threshold was calculated from previously published recombination events found in other treponemal genomes (Pětrošová et al., 2012;Štaudová et al., 2014). The gene tree topology was tested against the tree topology derived from the whole genome sequences.
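The first screening criterion (at least 4 SNVs per gene) can be sketched in Python as follows. This is only an illustration: the input format (1-based SNV positions and gene coordinates), the function name flag_snv_rich_genes, and the example gene names are hypothetical. The topology test and the manual inspection described above are not covered by this sketch.

```python
from bisect import bisect_left, bisect_right


def flag_snv_rich_genes(snv_positions, genes, min_snvs=4):
    """Return names of genes that contain at least `min_snvs` SNV positions.

    snv_positions: iterable of 1-based genome coordinates of called SNVs
    genes: list of (name, start, end) tuples, 1-based inclusive coordinates
    """
    positions = sorted(set(snv_positions))
    flagged = []
    for name, start, end in genes:
        # number of SNV positions falling within [start, end]
        count = bisect_right(positions, end) - bisect_left(positions, start)
        if count >= min_snvs:
            flagged.append(name)
    return flagged


# Hypothetical genes and SNV positions (not real TPA coordinates)
example_genes = [("geneA", 100, 200), ("geneB", 300, 400)]
example_snvs = [100, 120, 130, 200, 350, 360]
candidates = flagged = flag_snv_rich_genes(example_snvs, example_genes)
```

Genes flagged in this way would only be candidates; under the procedure described above, they would still need to show an incongruent gene tree before being considered recombinant.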

Pooled Segment Genome Sequencing (PSGS) of TPA Phi-1 and Grady Strains
The Philadelphia 1 (Phi-1) strain was isolated in Philadelphia, United States, in 1988 (Harper et al., 2008), and the Grady strain was isolated in Atlanta, United States, in the 1980s. Both strains were provided by David L. Cox (Centers for Disease Control and Prevention, Atlanta, GA, United States) as rabbit testicular tissue containing treponemal cells. Whole genomic DNA was amplified from the rabbit testicular tissue using the QIAGEN REPLI-g kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. The amplified DNA served as a template for the amplification of T. pallidum intervals (TP intervals) during the PSGS phase, as described previously (Weinstock et al., 2000;Strouhal et al., 2017).
The amplified TP intervals of the Phi-1 and Grady samples (n = 279 and 272, respectively) were sequenced using the Illumina platform (NextSeq 500) at CEITEC (Brno, Czechia). To separate paralogous regions, the amplified TP intervals were labeled with multiplex identifier adapters and sequenced as four different samples (Nextera XT DNA Sample Preparation Kit, Illumina Inc., Madison, WI, United States). The sequencing reads were trimmed with Trimmomatic (v0.32, Bolger et al., 2014), and low-quality bases were removed with a sliding window (window length of 4 nt; average quality of at least Phred 17). Sequencing reads shorter than 50 bp were omitted from the analyses. Reads were analyzed with respect to the four distinct pools and were de novo assembled using SeqMan NGen software (v4.1.0, DNASTAR, Madison, WI, United States) as well as mapped to the TPA reference genome (GenBank Acc. No. CP004011.1).
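The sliding-window quality filter can be illustrated with a simplified Python sketch using the parameters given above (window of 4 nt, mean Phred quality of at least 17, minimum read length of 50 bp). The function sliding_window_trim is hypothetical and only approximates the idea; Trimmomatic's own implementation may differ in details such as how the exact cut position is chosen.

```python
def sliding_window_trim(seq, quals, window=4, min_avg_q=17, min_len=50):
    """Cut a read at the first window whose mean Phred quality is below min_avg_q.

    seq: read sequence (str); quals: list of per-base Phred scores (ints).
    Returns (trimmed_seq, trimmed_quals), or None if the trimmed read is
    shorter than min_len. Simplified illustration, not Trimmomatic itself.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    keep = len(seq)
    # scan windows from the 5' end; compare sums to avoid float division
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) < min_avg_q * window:
            keep = i  # cut at the start of the first failing window
            break
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]


# Hypothetical reads: 60 nt with a low-quality 3' tail of 5 or 20 bases
kept = sliding_window_trim("A" * 60, [30] * 55 + [2] * 5)
dropped = sliding_window_trim("A" * 60, [30] * 40 + [2] * 20)
```

In the first example the read is cut where the window average first drops below 17 and is retained; in the second, the trimmed read falls under 50 bp and is discarded.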

3D Structure Prediction
The 3D structures of TP0858 and TP0865 were generated with the SWISS-MODEL server (Waterhouse et al., 2018), using the Protein Data Bank entries 3DWO and 3BS0, respectively, as templates. HHblits was used to find suitable template models (Remmert et al., 2011). The orientation of proteins with respect to the outer membrane corresponded to that predicted in the OPM database (Lomize et al., 2012). The TprC model was built using the TMBpro server (Randall et al., 2008) according to Kumar et al. (2018). The antigenic peptides were predicted with the "Predicted Antigenic Peptides" tool. Predictions were based on a table that reflects the occurrence of amino acid residues in experimentally known segmental epitopes. Segments were only reported if they had a minimum size of 8 residues.
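The minimum-size rule for reported segments can be illustrated with a short Python sketch. It assumes that each residue already has a numeric antigenicity score and that a segment is a contiguous run of residues scoring at or above a threshold; the scoring scheme, the threshold, and the function name report_segments are assumptions for illustration and do not reproduce the prediction tool itself.

```python
def report_segments(scores, threshold, min_size=8):
    """Return 1-based inclusive (start, end) ranges of contiguous residues whose
    score is >= threshold, keeping only runs of at least min_size residues."""
    segments = []
    run_start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_size:
                segments.append((run_start + 1, i))
            run_start = None
    # close a run that extends to the end of the sequence
    if run_start is not None and len(scores) - run_start >= min_size:
        segments.append((run_start + 1, len(scores)))
    return segments


# Hypothetical per-residue scores: runs of 8, 5 and 10 residues above threshold
example_scores = [1.1] * 8 + [0.9] + [1.1] * 5 + [0.9] + [1.1] * 10
segments = report_segments(example_scores, threshold=1.0)
```

Here the run of 5 residues is dropped, while the runs of 8 and 10 residues are reported.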

ETHICS STATEMENT
All patients provided signed informed consent. The study protocol was approved by the Ethics Committees of all institutions involved in this study and was conducted in compliance with the Declaration of Helsinki.

AUTHOR CONTRIBUTIONS
LGr, RF, and DŠ designed the experiments.
LGr, LM, MN, AN, PP, and CW performed the experiments.
LGr, MP, and DŠ wrote the manuscript. All authors provided critical feedback.

FUNDING
This research was supported by funds from the Faculty of Medicine, Masaryk University to junior researchers (LGr, MN, and PP), by the Grant Agency of the Czech Republic (GA17-25455S), and by the Ministry of Health of the Czech Republic (17-31333A) to DŠ. The Core Facility Bioinformatics of CEITEC Masaryk University is gratefully acknowledged for obtaining the scientific data presented in this manuscript. Computational resources were provided by CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the program "Projects of Large Research, Development, and Innovations Infrastructures".