Involvement of aph(3′)-IIa in the formation of mosaic aminoglycoside resistance genes in natural environments

Intragenic recombination leading to mosaic gene formation is known to alter resistance profiles for particular genes and bacterial species. Few studies have examined to what extent aminoglycoside resistance genes undergo intragenic recombination. We screened the GenBank database for mosaic gene formation in homologs of the aph(3′)-IIa (nptII) gene. APH(3′)-IIa inactivates important aminoglycoside antibiotics. The gene is widely used as a selectable marker in biotechnology and enters the environment via laboratory discharges and the release of transgenic organisms. Such releases may provide opportunities for recombination in competent environmental bacteria. The retrieved GenBank sequences were grouped in three datasets comprising river water samples, duck pathogens and full-length variants from various bacterial genomes and plasmids. Analysis for recombination in these datasets was performed with the Recombination Detection Program (RDP4), and the Genetic Algorithm for Recombination Detection (GARD). From a total of 89 homologous sequences, 83% showed 99–100% sequence identity with aph(3′)-IIa originally described as part of transposon Tn5. Fifty one were unique sequence variants eligible for recombination analysis. Only a single recombination event was identified with high confidence and indicated the involvement of aph(3′)-IIa in the formation of a mosaic gene located on a plasmid of environmental origin in the multi-resistant isolate Pseudomonas aeruginosa PA96. The available data suggest that aph(3′)-IIa is not an archetypical mosaic gene as the divergence between the described sequence variants and the number of detectable recombination events is low. This is in contrast to the numerous mosaic alleles reported for certain penicillin or tetracycline resistance determinants.


Introduction
Mosaic genes are genetic units consisting of DNA segments of different phylogenetic origin, leading to sequence patterns which may confer novel phenotypic properties (Smith, 1992; Dowson et al., 1997; Boc and Makarenkov, 2011). The within-gene (i.e., intragenic) recombination of DNA fragments increases the genetic plasticity of bacterial genomes and contributes to evolution and adaptability to new environmental conditions (Hanage et al., 2006). The process of mosaic gene formation primarily relies on the uptake of free DNA from the environment by competent bacteria via natural genetic transformation and subsequent integration of the incoming DNA fragment into the bacterial genome through homologous recombination (Smith et al., 1991). The efficiency of DNA segment integration depends on sequence similarity between the involved DNA strands. The frequency of homologous recombination decreases in a log-linear relationship with increasing sequence divergence between donor and recipient DNA to the point where it falls below the limit of detection, which is usually the case when pairwise sequence identity drops below 70% (Dowson et al., 1997; Fraser et al., 2007). This stringent similarity requirement may be circumvented by homology-directed illegitimate recombination, a mechanism where the integration of non-homologous DNA fragments is facilitated by the presence of a short homologous anchor sequence in the donor molecule and a region of microhomology on the opposite terminus of the incoming DNA with the target sequence (de Vries and Wackernagel, 2002; Prudhomme et al., 2002); or by double-illegitimate recombination, which is independent of any homology (Hulter and Wackernagel, 2008).
Genetic recombination inducing mosaic patterns in antibiotic resistance genes in bacterial pathogens results in therapy failure in clinical settings (Spratt, 1994;Heinemann and Traavik, 2004). Bacteria capable of lateral transfer of resistance gene fragments have the opportunity to evade selection pressure in response to alternating antibiotic therapy by acquiring new or modifying existing housekeeping genes and/or resistance determinants (Spratt, 1994). A prominent example is the mosaic pattern formation occurring in penicillin binding protein genes in Streptococcus pneumoniae (e.g., pbp2b) and Neisseria spp. (e.g., penA) and in tetracycline resistance determinants [e.g., tet(M), tet(O), tet(W)] in various animal and human pathogens (Spratt et al., 1989;Dowson et al., 1994;Patterson et al., 2007). These mosaic genes confer increased antibiotic resistance to the host bacterium and impact human health by increasing the morbidity and mortality rates of infectious diseases and by amplifying the financial burden of public health systems (Doern et al., 2001;Heinemann and Traavik, 2004;Bush et al., 2011).
An analysis of a potential contribution of the aminoglycoside resistance gene aph(3 ′ )-IIa to the mosaic gene formation and the variability of aph(3 ′ )-II-homologs is of relevance because this resistance gene is one of the most frequently applied selectable markers in genetic engineering and plant gene technology (Miki and McHugh, 2004;Shakya et al., 2011). Due to such technology applications this resistance gene is shed into the environment. Corresponding DNA fragments may additionally undergo chemical modifications when present as free extracellular DNA in the environment (Pontiroli et al., 2007;Chen et al., 2012). A recombination of anthropogenically released aph(3 ′ )-IIa fragments with endogenous aph(3 ′ )-IIa homologs present in competent environmental bacteria may lead to the formation of mosaic phosphotransferases with an altered antibiotic inactivation spectrum.
The enzyme APH(3′)-IIa inactivates the critically important aminoglycoside antibiotics neomycin and kanamycin as well as paromomycin, butirosin, gentamicin B, and ribostamycin (Shaw et al., 1993; WHO, 2012). Amikacin, a crucial second-line antibiotic used exclusively in humans, was shown to be phosphorylated to some extent only under in vitro conditions (Perlin and Lerner, 1986).
There is currently no experimental evidence available to support or disprove the hypothesis that antibiotic marker genes like aph(3 ′ )-IIa may be involved in the formation of mosaic resistance genes. But powerful bioinformatic tools have now become available that allow in silico analysis of lateral intragenic gene transfer events (Boc et al., 2010;Martin et al., 2010;Boc and Makarenkov, 2011;Le et al., 2014).
To determine whether the genetic variability of aph(3′)-IIa-like alleles available in GenBank has arisen from mosaic formation, we performed a detailed in silico screening for intragenic recombination events in aph(3′)-IIa sequences utilizing phylogeny- and non-phylogeny-based algorithms of the Recombination Detection Program (RDP4) software package and the Genetic Algorithm for Recombination Detection (GARD) (Kosakovsky Pond et al., 2006a; Martin, 2010).

Sequence Alignments
Sequences producing BLAST matches were downloaded from GenBank, spanning the complete open reading frame when available. Multiple sequence alignments were prepared using the ClustalW algorithm implemented in BioEdit (http://www.mbio.ncsu.edu/bioedit/bioedit.html) (Hall, 2007). The sequence identity matrix option of BioEdit was used to determine the pairwise sequence identity between each sequence and the reference sequence aph(3′)-IIa (EcoAph3IIa). The sequence difference count matrix option of BioEdit was used to determine pairwise nucleotide differences among all aligned sequences. Sequences sharing less than 60% sequence identity with the reference sequence were considered as non-homologous. This distinction was based on the observation that aph(3′)-IIb (X90856) and aph(3′)-IIc (HQ424460), the closest described relatives of aph(3′)-IIa among aminoglycoside 3′-O-phosphotransferases (Ramirez and Tolmasky, 2010), share nearly 60% sequence identity with aph(3′)-IIa.
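The identity screening described above can be illustrated with a minimal Python sketch. It is not the BioEdit implementation: the function names are hypothetical, and the choice to skip alignment columns containing a gap is an assumption made here for illustration, since the gap handling of the identity matrix option is not stated in the text.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def is_homolog(seq: str, reference: str, threshold: float = 60.0) -> bool:
    """Apply the cut-off from the text: below 60% identity is non-homologous."""
    return percent_identity(seq, reference) >= threshold
```

With such a helper, each aligned BLAST hit would be compared against the aligned reference (EcoAph3IIa) and retained only if `is_homolog` returns true.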
From the bulk of homologs collected from GenBank (Table 1), three sequence datasets were selected for recombination analysis:
Dataset 1: 36 partial sequences from the Riemerella anatipestifer isolate collection, representing the intra-species variation of aph(3′)-IIa homologs in a pathogen species residing in ducks (yellow bars, Figure S1).
Dataset 2: 11 partial sequences from river water, representing the variation of aph(3′)-IIa homologs occurring in bacterial species recovered from a defined natural aquatic environment (green bars, Figure S1).
Dataset 3: 34 full length aph(3′)-IIa homologs comprising the reference gene EcoAph3IIa and 33 sequences from various bacterial genomes and plasmids. This dataset represented the entire variation of aph(3′)-IIa genes known to date, i.e., as officially deposited in GenBank as of September 22nd, 2014 (red, dark blue and light blue bars, Figure S1).
Each dataset was separately aligned with ClustalW and dereplicated to retain one representative sequence per variant. Pairwise differences among all variants were determined to allow selection of sequence subsets for improved recombination detection according to the recommendations of the RDP4 instruction manual (Martin, 2010). The manual indicates that RDP4 is unlikely to detect recombination between extremely similar sequences. The presence of multiple nearly identical sequences in a dataset unnecessarily increases the number of pairwise comparisons and the severity of multiple comparison correction and, thus, reduces sensitivity. The following formula relates the number of sequences (X) and the sequence length (L) to the minimum pairwise distance (Y) required for sequences in the dataset to remain eligible for recombination analysis by RDP4: Y = (2 × ln 4X) / L (Martin, 2010). On the other hand, highly divergent sequences increase the risk of false positives as they may cause misalignments and introduce an excess of variable sites into the alignment. Therefore, sequences sharing less than 70% sequence identity have to be handled with caution (Martin, 2010).
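The subset-selection formula can be evaluated with a short Python sketch. Two points are assumptions made here and not confirmed by the text: "ln 4X" is read as ln(4 × X), and Y is treated as a per-site proportion, so that Y × L gives an approximate minimum number of differing nucleotides. The function names are hypothetical.

```python
import math


def min_pairwise_distance(num_sequences: int, alignment_length: int) -> float:
    """Y = (2 * ln(4X)) / L, reading 'ln 4X' as ln(4 * X) (assumption)."""
    if num_sequences < 1 or alignment_length < 1:
        raise ValueError("X and L must be positive")
    return 2.0 * math.log(4 * num_sequences) / alignment_length


def min_nucleotide_differences(num_sequences: int, alignment_length: int) -> float:
    """Y scaled by L, assuming Y is a per-site proportion (assumption)."""
    return min_pairwise_distance(num_sequences, alignment_length) * alignment_length
```

Under this reading the threshold grows only logarithmically with the number of sequences, which is consistent with the stated rationale that many near-identical sequences add comparisons without adding detectable signal.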

Detection of Recombination Events in Aligned Sequence Datasets
Recombination events in multiple sequence alignments were determined using the Recombination Detection Program Beta 4.36 package (RDP4). Seven of the recombination signal detection algorithms available as modules in RDP4 were employed: RDP (Martin, 2010), BootScan (Martin et al., 2005), MaxChi (Smith, 1992), Chimera (Posada and Crandall, 2001), GeneConv (Padidam et al., 1999), SiScan (Gibbs et al., 2000), and 3Seq (Boni et al., 2007). In the general settings for the RDP4 recombination detection procedure, the highest acceptable p-value was set to 0.05, the Bonferroni method was selected to correct for multiple comparisons and the entire process was run in permutational mode with 100 permutations. For the remaining parameters in the general RDP4 options defaults were retained. These defaults involved running PhylPro (Weiller, 1998) and LARD (Holmes et al., 1999) as secondary detection methods. Default settings were also retained for the options in the individual detection modules, except for MaxChi, where the specific window size was set to "variable." These settings and analysis modules were chosen in accordance with common practice in literature (Keymer and Boehm, 2011;Smith et al., 2012;Thomas et al., 2012;Alvarez-Perez et al., 2013;Freel et al., 2013;Hester et al., 2013;Altamia et al., 2014;Duron, 2014).
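The multiple-comparison setting can be illustrated with a minimal sketch of a Bonferroni adjustment. This is not RDP4's internal code: the function names are hypothetical, and how RDP4 counts the number of comparisons is not described in the text, so `n_comparisons` is left as an explicit input.

```python
from typing import List, Optional


def bonferroni_adjust(p_values: List[float], n_comparisons: Optional[int] = None) -> List[float]:
    """Multiply each raw p-value by the number of comparisons, capped at 1.0.

    Assumption: if n_comparisons is not given, the number of supplied
    p-values is used as the number of comparisons.
    """
    n = len(p_values) if n_comparisons is None else n_comparisons
    if n < 1:
        raise ValueError("number of comparisons must be positive")
    return [min(1.0, p * n) for p in p_values]


def passes_threshold(adjusted_p: float, highest_acceptable_p: float = 0.05) -> bool:
    """Apply the 'highest acceptable p-value' setting of 0.05 from the text."""
    return adjusted_p <= highest_acceptable_p
```

The sketch shows why redundant sequences reduce sensitivity: every additional comparison inflates each adjusted p-value by the same factor.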

Genetic Diversity of Aph(3 ′ )-IIa Homolog Sequences in GenBank
The GenBank database was BLAST-searched for sequences similar to the aph(3 ′ )-IIa gene from the transposon Tn5 of E. coli (EcoAph3IIa). In total 227 hits were obtained. Table 1 summarizes the 94 highest scoring hits, and Figure S1 shows the regions of aph(3 ′ )-IIa matched by these hits.
Eighty nine sequences showed sequence identities of 63-100% with EcoAph3IIa and were considered as aph(3 ′ )-IIa homologs. Their bacterial carriers were of animal (40 isolates), human (28 isolates) and genuine environmental origin (21 isolates) (Figure 1). The large majority originated from avian hosts. Non-vertebrate samples were retrieved from such diverse environments as river water, soil, pig manure, activated sludge, marine sediments, and household installations (Figure 1). Most of the animal bacteria were pathogens (43%) but only a minimal fraction of the environmental isolates could be identified as causative agents for diseases (1%) (Figure 2). The aph(3 ′ )-IIa gene sequence variant carriers comprised the following bacterial taxonomic classes: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidia, Betaproteobacteria, Clostridia, Flavobacteria, and Gammaproteobacteria ( Figure S2). Complete aph(3 ′ )-IIa homologs had a length of 792-795 nts and discontiguous megablast produced alignment matches of 627-795 bp with the reference gene. Of these 89 homologs 26 were perfect 100% matches and 48 showed over 99% sequence identity with the reference sequence. The 99-100% BLAST matches included two sets of partial sequences originating from bacterial population surveys specifically targeting aph(3 ′ )-IIa diversity: 36 sequences from isolates of the avian pathogen R. anatipestifer collected from diseased ducks (Yang et al., 2012), and 11 sequences from a cultivation independent monitoring of aph(3 ′ )-IIa in Canadian river water samples (Zhu, 2007). As these sequences had been produced by PCR amplification with primers binding within the aph(3 ′ )-IIa gene, sequence information was missing at their ends. Fifteen perfect (100%) and six nearly perfect (>99%) matches over the full gene length were detected in plasmid and genome sequences of bacteria phylogenetically as divergent as E. coli, Bacteroides dorei, and Clostridium nexile.  
Search was performed against the non-redundant nucleotide collection and the database of genomic reference sequences. Bold sequences are unique variants.
Frontiers in Microbiology | www.frontiersin.org
The remaining six 99-100% matches represented gene fragments (66-754 nts; Table 1). Fifteen sequences were found to share 63-99% sequence identity with the reference sequence. These included two short sequence fragments from PCR-based studies on antibiotic resistance genes in water (JQ937279) and activated sludge (GU721005) and 13 complete genes from genomes and plasmids of Pseudomonas aeruginosa, Enterobacter cloacae, Citrobacter freundii, Klebsiella pneumoniae, Klebsiella oxytoca, Saccharomonospora xinjiangensis, Pseudomonas stutzeri, and Burkholderia spp. isolates (Table 1). The remaining 136 hits shared only 44-59% sequence identity with EcoAph3IIa and, thus, were not considered as aph(3′)-IIa homologs. They included the aph(3′)-IIc gene of Stenotrophomonas maltophilia (HQ424460) and the aph(3′)-IIb (X90856) gene of P. aeruginosa (data not shown). The last sequence match presented in Table 1 and Figure S1 is an open reading frame of a S. maltophilia strain (CP001111) with 97% sequence identity to aph(3′)-IIc. These different aph genes varied in open reading frame length between 783 and 813 nts and produced discontiguous megablast matches spanning 50-370 bp between positions 360 and 720 of aph(3′)-IIa. The region between positions 360 and 720 of the aph(3′)-IIa gene contains two functional domains, known as motif1 and motif2, that are conserved across different clades of the aph gene family (Shaw et al., 1993).

Sequence Variation and Recombination Analysis in Aph(3 ′ )-IIa Homologs from Riemerella Anatipestifer Isolates (Dataset 1)
FIGURE 2 | Relative abundance and origin of bacterial isolates carrying aph(3′)-IIa variants. Only isolates explicitly classified as "pathogen" in the GenBank entry or in one of its associated publications or showing a clear history as causative agents for disease as described in Murray et al. (1999), were considered as pathogens. All other isolates were identified as "non-pathogens" (including species characterized as opportunistic pathogens causing rare disease only in immunocompromised patients and "uncultured bacteria" without any additional information available). Data were calculated for a total of 89 isolates (=100%).

Of the 36 sequences from R. anatipestifer isolates, 25 were unique variants. One unique representative was selected from each group of identical sequences. The most frequent variant (RiemerGN19) was identical with the aph(3′)-IIa reference gene from the E. coli transposon Tn5. The sequences contained parts of the PCR primers used by the survey authors (Yang et al., 2012). After removal of the uninformative primer regions, a 686 nts gene segment, spanning aph(3′)-IIa between position 85 and 770, remained for recombination analysis. In total there were 45 polymorphic sites in the sequence alignment. Pairwise nucleotide differences ranged between 1 and 9 nucleotides. RDP4 analysis of the entire 25 sequence set did not reveal recombination signals. The analysis was repeated with a subset comprising the four most divergent sequences (RiemerX234, RiemerX211, RiemerFX02, RiemerC006). This subset corresponded to the recommendations of the RDP4 developers (Martin, 2010) regarding the relation between number, length and minimum divergence of the sequences. However, no recombination event was detected in this subset.
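The descriptive steps used for each dataset (dereplication, counting polymorphic sites, and tabulating pairwise nucleotide differences) can be sketched in Python as follows. This is an illustration only, not the BioEdit workflow used in the study: the function names are hypothetical, sequences are assumed to be rows of one alignment of equal length, and gap characters are counted like any other character.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def dereplicate(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sequence names by identical sequence.

    The first name seen for each variant becomes its representative (key).
    """
    representative_for: Dict[str, str] = {}
    groups: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        key = seq.upper()
        if key in representative_for:
            groups[representative_for[key]].append(name)
        else:
            representative_for[key] = name
            groups[name] = [name]
    return groups


def polymorphic_sites(aligned: List[str]) -> List[int]:
    """Return 0-based alignment columns containing more than one character."""
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


def pairwise_differences(aligned: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count differing positions for every pair of aligned sequences."""
    return {
        (a, b): sum(x != y for x, y in zip(aligned[a], aligned[b]))
        for a, b in combinations(aligned, 2)
    }
```

The keys returned by `dereplicate` correspond to the unique variants carried forward, and the maximum values in `pairwise_differences` identify the most divergent sequences for a reduced subset.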

Sequence Variation and Recombination Analysis in Aph(3 ′ )-IIa Homologs from River Water (Dataset 2)
All of the 11 aph(3′)-IIa sequences extracted from river water were unique variants. Sequence UncultK40 was identical with the aph(3′)-IIa reference gene (EcoAph3IIa). After removal of PCR primer binding sites, a 688 nts gene segment, spanning aph(3′)-IIa between position 27 and 714, remained for recombination analysis. Pairwise nucleotide differences ranged between 1 and 9 nucleotides. RDP4 detected recombination events neither in the complete set of 11 sequences nor in the alignment of the three most divergent sequences (Uncultk56, UncultK009, UncultK025).

Sequence Variation and Recombination Analysis in Full Length Aph(3 ′ )-IIa Homologs from Various Bacterial Genomes and Plasmids (Dataset 3)
Of the 34 available full length homologs originating from various bacterial chromosomes and plasmids, 15 were unique variants. The original aph(3′)-IIa gene (EcoAph3IIa) was representative for 15 sequences producing perfect BLAST matches. The three sequences from isolate P. aeruginosa U2504 were identical, and one was retained as representative (Pseudomo14). The aph(3′)-IIa homologs detected in plasmids of E. cloacae, K. oxytoca, K. pneumoniae and C. freundii were identical, and the sequence from Citrobacter was retained as representative for further analysis (Citrobac01). The 15 unique sequences comprised 795 nts, except for Sacharo01, which was one nucleotide triplet shorter. Pairwise sequence differences varied between 1 and 324 nucleotides. Seven recombination detection methods in RDP4 detected a single recombination event in this dataset (Table 2). The results suggested that Pseudomo02 was a mosaic of Pseudomo14 and a sequence highly similar to the reference sequence EcoAph3IIa (Figure 3). The seven methods congruently identified the exchange of a fragment in the region between alignment positions 100 and 500. Figure 3 visualizes the recombination event and highlights the recombination breakpoints at positions 224 and 484, which were proposed congruently by three different methods (Table 2). Analysis of a five-sequence subset including only sequences with the recommended level of pairwise nucleotide differences (8-239 nts, for explanations see Materials and Methods) confirmed the results obtained with the complete 15-sequence dataset (Table 2). For further confirmation, the 15-sequence set was analyzed with GARD. GARD analysis detected a single significant recombination breakpoint signal at position 198 (Table 3). Upon analysis of the five-sequence subset, GARD produced several statistically non-significant breakpoint signals, including one at position 482.

Discussion
Sequence analysis of antibiotic resistance genes coding for penicillin binding proteins or for tetracycline resistance determinants has revealed horizontal gene transfer events leading to mosaic gene formation (Dowson et al., 1994;Patterson et al., 2007). The aim of this work was to elucidate whether intragenic recombination also occurs in natural homologs of aph(3 ′ )-IIa aminoglycoside resistance genes. To determine the natural variability of aph(3 ′ )-IIa the GenBank database was screened for aph(3 ′ )-IIa variants. The hits were subsequently analyzed for intragenic recombination signals with the RDP4 software package and the web-based tool GARD.
The analysis of the recombination potential of aph(3 ′ )-IIa is of biological relevance because this resistance determinant is inactivating important aminoglycoside antibiotics like kanamycin and neomycin which are vital antimicrobial agents for veterinary purposes and in special cases for human therapeutic applications (WHO, 2012). Additionally, Aph(3 ′ )-IIa was shown in vitro to be capable of extending its antibiotic inactivation spectrum to amikacin-an essential agent for the treatment of severe systemic infections caused by Gram negative bacteria and a crucial second-line antibiotic for combatting multidrug-resistant tuberculosis (Durante-Mangoni et al., 2009;WHO, 2011)-due to an exchange of a single amino acid (Kocabiyik and Perlin, 1992). Although a high-level aph(3 ′ )-IIa-induced amikacin resistant phenotype was only demonstrated so far for an E. coli mutant laboratory strain that showed a reduced aminoglycoside uptake combined with a resistance gene amplification (Perlin and Lerner, 1986) these observations are indicative for a significant effect of aph(3 ′ )-IIa sequence variability on the antibiotic resistance profile of this aminoglycoside phosphotransferase. Nevertheless, we are only aware of two studies dealing explicitly with aph(3 ′ )-IIa sequence variations, both failing to provide a connection between genotype and antibiotic resistance phenotype or induced minimum inhibitory concentrations (MIC) (Zhu, 2007;Yang et al., 2012).
There are only a few studies available on the prevalence of aph(3 ′ )-IIa. Shaw et al. reported 2.5% of all isolates resistant to kanamycin as carriers of aph(3 ′ )-IIa (Shaw et al., 1993). Most of the remaining papers suggested a low abundance of this resistance determinant in natural habitats: aph(3 ′ )-IIa was only rarely detected in bacterial isolates of human (Peirano et al., 2006;Woegerbauer et al., 2014) or environmental origin or in total soil DNA preparations (Leff et al., 1993;Smalla et al., 1993;Ma et al., 2011) or there was evidence of large seasonal fluctuations especially in river waters (Zhu, 2007). These findings indicate that i) bacterial aph(3 ′ )-IIa carrier strains are available providing recombination partners for this resistance determinant and that ii) an artificial exposure of bacterial populations with aph(3 ′ )-IIa copies from anthropogenic sources like laboratory waste discharges or antibiotic resistance marker gene carrying transgenic organisms-eventually in combination with aminoglycoside containing effluents or manure -may increase the likelihood for genetic recombination (Chee-Sanford et al., 2009;Chen et al., 2012). BLAST search of GenBank revealed only a limited number of aph(3 ′ )-IIa variants with sequence identities between 60 and 99%. This is in contrast to the many mosaic genes coding for penicillin binding proteins or tetracycline resistance determinants for which homologs with a continuous spectrum of sequence identity between 80 and 99% have been identified (Spratt, 1994;Oggioni et al., 1996;Hakenbeck, 2000;Hollingshead et al., 2000;Johansen et al., 2001;Prudhomme et al., 2002;Nakamura et al., 2012).
The retrieved aph(3 ′ )-IIa sequence homologs comprised a wide range of variant sequences originating from a broad variety of environmental sources including soil, water, marine sediments, manure, sewage sludge, and diverse human (gut, skin, urinary tract, lung, brain) and animal habitats (birds, pigs, cows).
For recombination analysis, the aph(3′)-IIa homologs were grouped into three datasets. Datasets 1 (duck pathogens) and 2 (river water) represent sequences from bacteria living in a common habitat with the obvious physical opportunity to exchange gene fragments. The remaining unique full length aph(3′)-IIa homologs were from bacteria of diverse animal, human or genuine environmental origins which could not be allocated to a common biotope (dataset 3). A combined analysis of sequences from dataset 3 comprising such different ecosystems is valid since lateral transfer of fragments in the evolution of a gene of interest can be assessed by sequence comparison without the prerequisite that the source organisms are of the same species or have been isolated from a common habitat. For example, Oggioni et al. discovered mosaic patterns in tetracycline resistance genes by comparing previously published sequences of tetracycline resistant Enterococcus faecalis, S. pneumoniae, Staphylococcus aureus, Ureaplasma urealyticum, and Neisseria spp. isolates (Oggioni et al., 1996). Similarly, Boc and Makarenkov detected numerous recombination events in the evolution of the rubisco gene rbcL by comparison of amino acid sequences from various photosynthetic bacteria and algae (Boc and Makarenkov, 2011). The theoretical lower limit for most of the RDP4 algorithms applied for the identification of a mosaic gene (i.e., a gene affected by intragenic recombination) is three sequences (Martin, 2010). Many publications consider approximately 8-12 sequences sufficient for a reliable identification of mosaic genes (Oggioni et al., 1996; Dowson et al., 1997; Filipe et al., 2000; King et al., 2005): Oggioni et al. used a total of eight sequences to identify tet(M) as a mosaic gene in silico with high significance (Oggioni et al., 1996). Filipe et al. tested 12 murM alleles (Filipe et al., 2000), Dowson et al. 8 pbp2b alleles (Dowson et al., 1997), and King et al. used 12 novel 5′ and 10 novel 3′ nanA alleles to establish gene mosaicism (King et al., 2005). Our datasets far exceed the data collections used so far for the detection of mosaic genes in a single approach.
In the analysis of our third dataset, seven sequence comparison algorithms of the RDP4 suite provided evidence for a recombination event. The risk of identifying false positives, i.e., of mistaking mutation for recombination events, is inherent to any in silico recombination detection strategy (Martin, 2010; Boc and Makarenkov, 2011). Therefore, it is current practice to confirm calculated recombination events with several methods, including phylogeny-based and substitution distribution-based algorithms (Bay and Bielawski, 2011; Boc and Makarenkov, 2011). The described recombination event in the aph(3′)-IIa gene dataset is supported by three phylogeny-based methods (RDP, BootScan, SiScan), four substitution distribution-based methods (MaxChi, GeneConv, Chimera, 3Seq) and to some extent also by the phylogeny-based genetic algorithm (GARD). In bacterial multi-locus sequence typing (MLST), a major application area of the RDP4 software, many authors have agreed to accept a software-reported recombination event if it is detected by at least three methods with a Bonferroni-corrected p-value < 0.05 (Keymer and Boehm, 2011; Smith et al., 2012; Alvarez-Perez et al., 2013). This criterion is met by the recombination event described here. The different methods agreed on the exchanged gene region and on the recombination partners involved in this event, but proposed different positions as recombination breakpoints. This reflects the different aspects of information each algorithm is targeting in a sequence alignment (Martin, 2010).
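The acceptance criterion cited from the MLST literature (at least three methods, each with a Bonferroni-corrected p-value < 0.05) can be written as a small decision rule. The sketch below is illustrative only; the function name is hypothetical, and the p-values in the usage example are invented placeholders, not the values reported in Table 2.

```python
from typing import Dict


def accept_event(corrected_p: Dict[str, float],
                 alpha: float = 0.05,
                 min_methods: int = 3) -> bool:
    """Accept a recombination event if enough methods support it.

    corrected_p maps a detection method name to its Bonferroni-corrected
    p-value for the event; methods that did not detect the event can be
    omitted or given a value of 1.0.
    """
    supporting = [method for method, p in corrected_p.items() if p < alpha]
    return len(supporting) >= min_methods


# Hypothetical placeholder values for illustration only.
example = {"RDP": 1e-4, "BootScan": 2e-3, "MaxChi": 4e-2, "GeneConv": 0.3}
event_accepted = accept_event(example)
```

Requiring agreement among methods that rely on different signals (tree topology versus substitution distribution) is what guards against mistaking clustered mutations for recombination.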
The mosaic aph(3 ′ )-IIa gene identified in our third dataset is located in a Tn5 similar cassette on pOZ176, an incP-2 plasmid from the multidrug-resistant isolate P. aeruginosa PA96 (Xiong et al., 2013). Plasmid pOZ176 is of environmental origin showing homologies with a vector from the plant pathogen P. fluorescens and to genomic islands present in the environmental bacteria Ralstonia solanacearum and Azotobacter vinelandii. Codon usage analysis indicated that most of the resistance genes of pOZ176 were not originally from P. aeruginosa but acquired by horizontal gene transfer from other species indicating a long history of DNA rearrangements most probably driven by antibiotic selection (Xiong et al., 2013).
PA96 is reported to be phenotypically resistant to at least 13 antibiotics (including amikacin and gentamicin) from three different substance classes [β-lactams (penicillins, cephalosporins, carbapenems), fluoroquinolones, and aminoglycosides] (Xiong et al., 2013). Whole genome sequencing revealed that PA96 carries aph(3′)-IIb (Deraspe et al., 2014), which mediates resistance to kanamycin, neomycin, and butirosin (Zeng and Jin, 2003), potentially masking an antibiotic activity of the newly discovered aph(3′)-IIa mosaic gene. At present there is no information available on whether this novel mosaic gene on pOZ176 is functionally active and expresses any antibiotic resistance phenotype.
Antibiotic resistance marker genes used in transgenic plants are in several cases plant-codon optimized versions of their bacterial counterparts (Roa-Rodriguez and Nottenburg, 2003). The plant-derived aph(3 ′ )-IIa variant of the transgenic potato line EH92-527-1 (Amflora) contains a characteristic mutation. Alignment of the recombinant aph(3 ′ )-IIa gene of pOZ176 with the plant-derived transgenic variant of aph(3 ′ )-IIa from EH92-527-1 revealed an absence of the plant allele-specific mutation in pOZ176 and vice versa an absence of the mutations distinctive for the recombinant aph(3 ′ )-IIa allele in the transgenic counterpart (data not shown due to confidential business information restrictions). These observations indicate that an involvement of this transgenic allele in the evolution of aph(3 ′ )-IIa of pOZ176 is unlikely.
Compared to the complex recombination history of known mosaic genes such as pbp2b (Dowson et al., 1997), murM (Filipe et al., 2000), or tet(M) (Oggioni et al., 1996), the observed recombination frequency among aph(3′)-IIa homologs was low. Although intragenic recombination is thought to be a frequent process during bacterial evolution (Didelot and Maiden, 2010), our report presents the first evidence of mosaic formation among aph(3′)-IIa homologs, and for a single event only. To verify the sensitivity of our approach, we analyzed sequence collections of pbp2b, murM, and tet(M) with RDP4 using the same settings, and detected a multitude of recombination breakpoints and corresponding p-values several orders of magnitude lower than those obtained with the aph(3′)-IIa datasets (data not shown). According to the currently available sequence information in GenBank and compared to typical mosaic genes, aph(3′)-IIa appears to be less prone to intragenic recombination. However, it is important to realize that novel aph(3′)-IIa sequence variants that become available in the future may change the outcome of the in silico recombination analysis.
We conclude that a recombination event has occurred during the evolution of an aph(3′)-IIa homolog present on a plasmid of environmental origin in a pathogenic multi-resistant strain of P. aeruginosa. The observed number of variant aph(3′)-IIa sequences is low and their diversity appears not to be primarily driven by intragenic recombination.