Genomic diversity and taxonomic marker for Arcobacter species

Arcobacter was recognized as an emerging enteropathogen and controversies regarding its classification persisted. This study aimed to reevaluate the taxonomy of Arcobacter utilizing the 16S rRNA gene, 23S rRNA gene, single-copy orthologous genes, as well as genomic indices such as Average Nucleotide Identity (ANI) and in silico DNA–DNA hybridization (isDDH). The taxonomy of this genus was reevaluated in this study using multiple indices with a dataset of 371 genomes comprising 34 known species and 14 potentially new species. Good discrimination could be achieved only in some species but not for the species with higher sequence similarity using the comparisons of the 16S rRNA gene and 23S rRNA gene sequences. A high-accuracy phylogenomic approach for Arcobacter was established using 84 single-copy orthologous genes obtained through various bioinformatics methods. One marker gene (gene711), which was found to possess the same distinguishing ability as ANI, isDDH, and single-copy orthologous methods, was identified as a reliable locus for inferring the phylogeny of the genus. The effective species classification was achieved by employing gene711 with a sequence similarity exceeding 96%, even for species like A. cloacae, A. lanthieri, and A. skirrowii, which exhibited ambiguous classification using ANI and isDDH. Additionally, excellent subspecies categorizing among A. cryaerophilus could be distinguished using gene711. In conclusion, this framework strategy had the potential advantage of developing rapid species identification, particularly for highly variable species, providing a novel insight into the behavior and characteristics of Arcobacter.


Introduction
Arcobacter has gained increasing significance in recent years, as its members are now recognized as emerging enteropathogens and potential zoonotic agents (Ho et al., 2006).The Arcobacter genus belongs to the Campylobacteraceae family, which includes other genera: Campylobacter, Helicobacter, Sulfurospirillum, and others (On, 2001).Initially classified within the Campylobacter genus, it was in 1991 that the Arcobacter genus was recognized as distinct and designated as a separate genus within the Campylobacteraceae family (Vandamme et al., 1991).Arcobacter was generally described as possessing differentiated abilities from Campylobacter, namely the ability to grow in aerobic conditions and at temperatures between 15 and 30°C (Vandamme et al., 1992); however, this principle has been changed by the increased number of new species.Nowadays, Arcobacter species inhabit a wide range of ecological niches, encompassing diverse environments such as marine environments, wastewater and drinking methods, the number of Arcobacter strains is increasing, leading to the gradual identification of new Arcobacter species.Consequently, this progress poses a challenge in effectively classifying these species, thereby introducing increased difficulties in taxonomy.Incorporating genomics into taxonomy appears to be a promising development, enhancing credibility by offering reproducible, reliable, and highly informative methods to infer phylogenetic relationships among prokaryotes while avoiding unreliable approaches and subjective, difficult-to-replicate data.Within this modern taxonomy context, the objective of this study was to reassess the taxonomy of both known and newly identified Arcobacter species by using 16S rRNA gene, 23S rRNA gene, the whole genome sequences, and the derived genomic analysis, providing valuable insights into the taxonomic investigation of Arcobacter.We also evaluated the efficacy of various genome-based phylogenetic tools in discriminating between different Arcobacter species.

Downloading of publicly available genomes
All 34 valid species included in the Arcobacter genus have been studied.They were represented by 199 genomes and 17 potentially new species genomes (Supplementary Table S1).All genome sequences identified as Arcobacter were downloaded from the National Center for Biotechnology Information (NCBI) and Bacterial and Viral Bioinformatics Resource Center (BV-BRC) public database on January 2023.All publicly available assemblies were subjected to quality control by Quast software (Gurevich et al., 2013).Firstly, genomic sequences identified as "poor" were excluded from the analysis based on the sequencing quality.Secondly, genomes that did not meet the criteria for genome size and GC content were filtered out according to the genomic characteristics of Arcobacter.Additionally, only genomes with a scaffold count of less than 200 were included to ensure the reliability of the analysis results.Finally, the obtained genomes underwent species identification using the GTDB v2.3.2 software (Chaumeil et al., 2019), and only the genomes identified as Arcobacter were included in the analysis.A total of 199 Arcobacter genomes were included in the study, comprising 34 named Arcobacter species and 14 unclassified Arcobacter species, as shown in Supplementary Table S1.

Analysis of ribosomal genes
The 16S rRNA gene and 23S rRNA gene sequences were extracted from the genome assemblies using barrnap v0.9, producing a gff file of rRNA gene locations in the genome assemblies.The gff files were combined with the bedtools (Quinlan and Hall, 2010), fastaFromBed, to extract the 16S rRNA and 23S rRNA gene sequences from the genome assemblies.Genes sequences were aligned using MAFFT v7.490 software (Katoh and Standley, 2013).The genomes containing the complete 16S rRNA gene and 23S rRNA gene were selected, and the corresponding sequences were extracted and aligned to construct a Neighbor-Joining (NJ) phylogenetic tree with a bootstrap value of 1,000.Additionally, pairwise sequence comparisons were performed using MAFFT v7.490 software (Katoh and Standley, 2013) to determine sequence alignments and assess the similarity between pairs of sequences.

Analysis of ANI and isDDH
Pairwise ANI values were calculated for all genomes using pyani v0.2 software (module ANIb), accessible at https://github.com/widdowquinn/pyani.The Genome-to-Genome Distance Calculator (GGDC) web service was used to report isDDH for the accurate delineation of prokaryotic subspecies and to calculate differences in G + C genomic content. 3Analysis was performed using "Formula 2, " as recommended by the GGDC authors, which allows for isDDH estimation independent of genome lengths, making it suitable for incomplete genomes.A matrix with ANI values across all genomes was visualized using the pheatmap package, and an in-house script was used to generate a clustering dendrogram based on the ANI matrix.

Identification of single-copy orthologous genes and marker gene
The OrthoFinder v2.5.4 software (Emms and Kelly, 2019) was employed to perform a homology analysis on the 371 Arcobacter genomes, identifying single-copy orthologous genes.The software parameters used were -S blast, −M msa, −T raxml.The EasyTree.py 3 Available at http://ggdc.dsmz.de/ggdc.php.script4 was used to extract all single-copy orthologous genes from each genome.The genes were aligned using the MAFFT v7.490 software, and an ML tree (data not shown) was constructed by concatenating and coalescing these genes using the raxmlHPC v8.2.12 software (Stamatakis, 2014) and MEGA 7 (Kumar et al., 2016) software, with a bootstrap value of 1,000.The resulting tree was annotated using the table2itol package and visualized in iTOL. 5

Genomic characteristics of the Arcobacter
A total of 371 high-quality sequenced and assembled genomes of Arcobacter were obtained through genome quality control, and a comprehensive analysis was conducted on 371 genomes.All 34 species currently included in the Arcobacter and 14 candidate species have been investigated in the present study.The scaffolds obtained and the N50 values complied with the proposed minimal standards for using genomes in taxonomic studies (Chun et al., 2018).Genome assemblies had 1 to 166 contigs.The genome sizes and GC contents displayed significant variations across different Arcobacter species.The genome size ranged from 1.68 Mb for A. skirrowii CNAS13 to 3.57 Mb for A. lekithochrous CP054052.The genome size of A. skirrowii was generally smaller than that of other Arcobacter species.In comparison, the genome size of A. lekitochrous was generally larger than that of other Arcobacter species.The G + C content ranged from 26.08% in A. molluscorum NXFY00000000 to 31.00% in Arcobacter spp.JAIFNA000000000, as shown in Supplementary Table S1.

Phylogenetic of ribosomal genes
The size of the 16S rRNA gene in 34 type strains of Arcobacter species ranged from 1,512 to 1,516 bp, with sequence similarities ranging from 91.97% (between A. cryaerophilus and A. bivalviorum) to 99.93% (between A. butzleri and A. lacus).Similarly, the size of the 23S rRNA gene varied from 2,873 to 3,026 bp, with sequence similarities ranging from 86.72% (between A. vandammei and A. pacificus) to 99.72% (between A. butzleri and A. lacus).Detailed results can be found in Tables 1, 2 and Supplementary Table S2.The phylogenetic trees constructed based on the 16S rRNA gene and 23S rRNA gene of the type strains were presented in Figure 1.It was noteworthy that there were certain variations observed in the phylogenetic trees constructed using different sequence datasets.Of the 371 Arcobacter genomes analyzed, 281 were selected for analysis due to the near-full length of the 16S rRNA gene and 23S rRNA gene.The size of the 16S rRNA gene ranged from 1,306 to 1,517 bp, almost all of which were around 1,514 bp, except VBUD00000000, VBUC00000000, NXGJ00000000, SZACF0142G, SZACF1311G, and SZACF1324G.Similarly, the size of the 23S rRNA gene ranged from 2,607 to 3,030 bp, most of which were around 2,907 bp.The similarities in the 16S rRNA gene sequences among different Arcobacter species

Species classification and genetic population
The results of the ANI and the isDDH calculations among the studied genomes were given in Table 2, Supplementary Table S3, and Figure 3. Significant differences in ANI were observed among different species of Arcobacter.The ANI values among some strains within A. cloacae, A. lanthieri, A. marinus, A. skirrowii, and A. cryaerophilus species were < 96%, and the isDDH values were < 70%.Among them, the most significant differences in ANI and isDDH were observed between subspecies of A. cryaerophilus, with ANI and isDDH values of 92.32 and 48.10%, respectively.However, the ANI or isDDH values within the species were significantly higher than those with the closest related species.In addition to the known species of Arcobacter, 17 genomes potentially represented 14 new species that were identified.The ANI values between these new species and the known genomes of Arcobacter exhibited significant differences.The ANI and isDDH values compared to known Arcobacter species were below 96 and 70%, respectively, which were the cut-off values proposed for delineating new species.Only the ANI between A. spp._PDJV00000000 and A. nitrofigilis_CP001999 > 90%, while for the remaining genomes <90%.

Phylogenetic reconstruction using the marker gene
The analysis of 371 genomes revealed a total of 835,009 genes 10,652 orthogroups, and 3,395 unassigned genes.Among these orthogroups, 216 were found to be present in all analyzed genomes, with 84 of them being single-copy orthologous genes.To elucidate the taxonomic relationships among members of the Arcobacter genus, we constructed a high-quality NJ phylogenomic tree based on the concatenation of these 84 conserved single-copy orthologous genes (Figure 4).The phylogenetic tree, derived from 84 single-copy homologous genes, demonstrated excellent resolution in identifying Arcobacter species.Notably, even A. butzleri and A. lacus, The NJ tree was constructed based on the 16S rRNA gene and 23S rRNA gene sequences of 34 Arcobacter species type strains, with a bootstrap value of 1,000.(A) The phylogenetic tree was constructed using the 16S rRNA gene, and (B) was constructed using the 23S rRNA gene.Bar indicated 5 substitutions per 1,000 bp.characterized by remarkably high ANI values, can be clearly differentiated.Remarkably, the species classification results derived from the phylogenetic tree using the 84 single-copy homologous genes closely aligned with the ANI results, which meant that Arcobacter can be accurately classified using single-copy concatenation genes.Phylogenetic trees for each single-copy orthologous gene were also constructed using nucleotide and amino acid sequences.When comparing the phylogenetic trees constructed based on nucleotide and amino acid sequences of each gene with ANI results, it was found that the topology of the phylogenetic tree built using gene711 was nearly identical to the phylogenetic tree constructed using the concatenation of 84 single-copy homologous genes (Figure 5; Supplementary Figure S2).During the sequence alignment analysis of each gene, gene711 effectively differentiated all species within the Arcobacter genus.Furthermore, the sequence similarities within species were found to be >96% (except for A. cryaerophilus and A. marinus), while the maximum sequence similarity between different species was <94%.Consequently, gene711 could be considered a reliable signature gene for identifying Arcobacter species, with a sequence similarity threshold of greater than 95-96% defining the same species (Table 2; Supplementary Tables S2, S3).The gene711 exhibited sequence similarity above 96% in A. cloacae, A. lanthieri, and A. skirrowii, while within these species, their ANI and isDDH values were below the classification thresholds of 96 and 70%, respectively.In A. marinus, A.marinus_CP042812, A. marinus_ NWVW00000000, and A. marinus_PTIW00000000 showed gene711 sequence similarities ranging between 95 and 96% with other genomes, which was consistent with the ANI and isDDH results.For A. cryaerophilus, except for CNAC091 and A. cryaerophilus_ NERP00000000, gene711 effectively divided A. cryaerophilus into four distinct subspecies, as shown in Figures 3B, 6 and Supplementary Table S3.The sequence similarity of gene711 was >96% within each subspecies, while the sequence similarity between subspecies was <96%, similar to the results based on ANI and isDDH.

Discussion
Arcobacter is recognized as a globally emerging foodborne and zoonotic pathogen with a wide range of sources and regions (Collado and Figueras, 2011;Ferreira et al., 2016).Understanding its genomic and classification characteristics is crucial for further investigations of this pathogen.In this study, a total of 371 genomes, comprising 34 named Arcobacter species and 14 unclassified Arcobacter species, were selected to elucidate the taxonomic characteristics of Arcobacter.The quality of the genome sequences generally met the minimal standards established for using genome data for taxonomical purposes (Chun et al., 2018).Globally, the genome size ranged from 1.68 Mb to 3.57 Mb.The G + C content ranged from 26.08 to 31.00%.Significant variations in genome size and GC content were observed in Arcobacter, suggesting considerable genomic diversity and divergence.This aspect could be one of the reasons contributing to the current challenges in the taxonomic classification of Arcobacter.
Like other bacterial genera, the taxonomic classification of Arcobacter has traditionally been based on the analysis of the 16S rRNA gene (Wesley et al., 1995).In fact, several potential new Arcobacter species could be inferred from the sequences available in public databases, similar to the 17 genomes downloaded in this study, which included 14 potentially new Arcobacter species.In previous studies, the similarity of the 16S rRNA gene has been considered a decisive characteristic for taxonomic classification at the genus or species level (Stackebrandt, 2006).Specifically, the sequence similarity of >98.7% in the 16S rRNA gene has been found to show good consistency with an isDDH > 70% (Stackebrandt, 2006).The sequence similarity of the 16S rRNA gene in 34 type strains of Arcobacter among multiple species was observed to be >98.7%.Moreover, expanding the number of 16S rRNA gene sequences to 281 revealed that more species displayed 16S rRNA gene sequence similarities >98.7%.However, it was necessary to note that phylogenetic trees constructed solely based on the 16S rRNA gene could cluster individuals of the same species together; however, relying solely on the 98.7% similarity threshold for species classification might lead to biased results.In other words, the discriminatory power of the 16S rRNA gene was limited when dealing with species that possessed highly similar 16S rRNA gene sequences.The 23S rRNA gene sequences were also attempted to assess Arcobacter interspecies differences, as published data indicated 16S rRNA gene sequences did not contain sufficient information to effectively discriminate between strains (Deshpande et al., 2013).However, our findings indicated that the 23S rRNA gene sequences were also insufficient for effective discrimination, likely due to the increased burden of additional sequences.Despite our efforts, the results obtained using the 23S rRNA gene were similar to those obtained using the 16S rRNA gene, further underscoring the limited discriminatory power for species with high sequence similarity.
Nowadays, genomic data such as the ANI and the isDDH are being increasingly used to define bacterial species, although their full potential for delineating genera has yet to be explored (Perez-Cataluna et al., 2018b;On et al., 2021).As discussed in other studies, the ANI and isDDH indices have been proven to provide reliable information for the delineation of Arcobacter species and have also been included in the minimal guidelines for defining species using genomes (Chun et al., 2018;Perez-Cataluna et al., 2018b;On et al., 2021).For Arcobacter, ANI values >96% were the ones that better correlated with isDDH results >70% in previous studies (Perez-Cataluna et al., 2018a;Zhou et al., 2022), which was further confirmed in this study.The ANI values between genomes of most Arcobacter species were consistent at >96%, except for certain genomes in A. cloacae, A. lanthieri, A. marinus, A. skirrowii, and A. cryaerophilus that did not meet the 96% classification threshold.Additionally, isDDH analysis was performed on species with ANI values <96%, and the results were consistent with the ANI result.Specifically, for genomes with ANI values<96%, their isDDH values The NJ tree was constructed based on the 84 single-copy homologous genes, with a bootstrap value of 1,000.(A) was the phylogenetic tree constructed using nucleotide sequence, and (B) was the phylogenetic tree constructed using amino acid sequence.Different colors or shapes indicated different Arcobacter species.Bar indicated 1 substitution per 10 bp.

FIGURE 5
The NJ tree was constructed based on gene711, with a bootstrap value of 1,000.(A) was the phylogenetic tree constructed using the nucleotide sequence of 34 Arcobacter species type strain, (B) was the phylogenetic tree constructed using amino acid sequence of 34 Arcobacter species type strain, (C) was the phylogenetic tree constructed using the nucleotide sequence of 371 Arcobacter genomes, (D) was the phylogenetic tree constructed using amino acid sequence of 371 Arcobacter genomes.Different colors or shapes indicated different Arcobacter species.
were found to be <70%.For ANIm, intraspecies pairs generally have >96% identity, while interspecies pairs generally have <93%, with an intermediate range of 93-96% where species circumspection cannot be assured (Rossello-Mora and Amann, 2015).These findings suggested substantial genomic differences within Arcobacter species, even though they could be classified into different subspecies.Previous studies have proposed that A. cryaerophilus should be divided into four subspecies according to the species classification criteria of ANI values >96% and isDDH values >70% (Zhou et al., 2022), which was further confirmed in this study.Within the Arcobacter genus, 17 genomes potentially represented 14 new potentially species.The ANI values between these new species and the known genomes of Arcobacter exhibited significant differences.Only the ANI between A. spp._PDJV00000000 and A. nitrofigilis_ CP001999 > 90% and reached 91.64%, while the ANI for the remaining genomes <90%.These findings further emphasized the substantial genomic diversity within the Arcobacter genus, which posed challenges for population classification.
This study established a method based on the construction of phylogenetic trees using single-copy orthologous genes for the rapid and simplified classification of Arcobacter species.A robust means of species identification within Arcobacter was provided by utilizing 84 single-copy orthologous genes.However, this method was not widely endorsed due to its reliance on a considerable number of genes.Fortunately, we have discovered that gene711 effectively differentiated various species within Arcobacter.The gene711, which encoded a 186-218 amino acid in Arcobacter, was a FlgO family outer membrane protein and was capable of reproducing a tree with a similar topology to our genome-based phylogeny.The gene711 sequences demonstrated high nucleotide diversity and yielded a tree that accurately separates strains into phylogenetic groups defined by ANI-based analysis.The gene711 exhibited sequence similarity >96% within the same species, while the similarity between different species was significantly <96%.The neighboring genes upstream and downstream of gene711 also displayed relatively conserved characteristics, making them potential targets for developing sequence-based analysis or real-time PCR assays to detect Arcobacter species.The discriminatory power of the gene711 locus made it possible to improve the accuracy of species identification within the Arcobacter genus.As mentioned earlier, certain genomes within A. cloacae, A. lanthieri, A. marinus, A. skirrowii, and A. cryaerophilus did not meet the species classification criteria of ANI values >96% and isDDH values >70% within the same species.Among these species, we used gene711 to verify and found that except for A. marinus and A. cryaerophilus, the remaining species met the requirement of gene711 > 96% within the species and gene711 < 96% between species.Previous studies (Zhou et al., 2022) have identified four subspecies within A. cryaerophilus, and our study using gene711 for A. cryaerophilus subspecies classification further supported this conclusion.However, there were also instances of gene711 anomalies in certain strains within A. cryaerophilus, such as CNAC091.
To our knowledge, this is the first time that gene711 has been used as a phylogenetic marker within a bacterial genus.As highlighted in the review by Collado and Figueras (2011), numerous uncultured or as-yet-undescribed species of Arcobacter have been identified based on nearly full-length 16S rRNA gene sequences, potentially surpassing the number of already known species at that time.The emergence of new species can be anticipated in the near future, further validating the significance of gene711 proposed in this study.The phylogenetic tree was generated based on the sequences of gene711.The neighbor-joining method was used to generate the phylogenetic tree, which was performed using MEGA 7.0 with 1,000 bootstrap replications.Bars of different colors represented different subclades.Bar indicated 5 substitutions per 1,000 bp.
Frontiers in Microbiology frontiersin.org

Conclusion
In this study, we evaluated the efficacy of various genome-based phylogenetic tools in discriminating between different Arcobacter species.Novel approaches for the classification of the Arcobacter were employed in this study.Finally, a maker gene (gene711) that demonstrated greater discriminatory power and robustness than other commonly used markers was identified, making it a valuable tool for future molecular identification of Arcobacter species.In summary, our study offers valuable insights into the evolution, genetic diversity, and species classification of Arcobacter, thereby shedding new light on the behavior and characteristics of this genus.

FIGURE 2
FIGURE 2The NJ tree was constructed based on the 16S rRNA gene and 23S rRNA gene sequences of 281 Arcobacter genomes, with a bootstrap value of 1,000.(A) The phylogenetic tree was constructed using the 16S rRNA gene, and (B) was constructed using the 23S rRNA gene.Different colors or shapes indicated different Arcobacter species.Bar indicated 1 substitution per 100 bp.

FIGURE 3 Arcobacter
FIGURE 3Arcobacter ANIb heatmap using the pheatmap package.(A) was the ANIb heatmap of 34 known Arcobacter species and 14 unknown Arcobacter species, and (B) was the heatmap of A. cryaerophilus.The depth of the color indicated the size of the ANI value, which increased sequentially from blue to orange.

TABLE 1
16S rRNA gene and 23S rRNA gene sizes and the start and end positions of gene711 in the genome of 34 Arcobacter species type strains.Intraspecies and interspecies similarity of 16S rRNA gene, 23S rRNA gene, ANI, and gene711 of Arcobacter.

TABLE 2 (
Continued) differentiate between A. butzleri and A. lacus.For most species of Arcobacter, phylogenetic trees based on 16S rRNA and 23S rRNA genes have better resolution.