Genomics of microbial plasmids: classification and identification based on replication and transfer systems and host taxonomy

Plasmids are important “vehicles” for the communication of genetic information between bacteria. The exchange of plasmids transmits pathogenically and environmentally relevant traits to the host bacteria, promoting their rapid evolution and adaptation to various environments. Over the past six decades, a large number of plasmids have been identified and isolated from different microbes. With the revolution of sequencing technology, more than 4600 complete sequences of plasmids found in bacteria, archaea, and eukaryotes have been determined. The classification of a wide variety of plasmids is not only important to understand their features, host ranges, and microbial evolution but is also necessary to effectively use them as genetic tools for microbial engineering. This review summarizes the current state of the classification of fully sequenced plasmids based on their host taxonomy and their features of replication and conjugative transfer. The majority of the fully sequenced plasmids are found in the phyla Proteobacteria, Firmicutes, Spirochaetes, Actinobacteria, Cyanobacteria, and Euryarchaeota, and key features of each phylum are included. Recent advances in the identification of novel types of plasmids and plasmid transfer by culture-independent methods using samples from natural environments are also discussed.


Introduction
Plasmids are circular or linear extrachromosomal replicons that are found in many microorganisms in the domains Bacteria, Archaea, and Eukaryota (Funnell and Phillips, 2004). Plasmids are transmissible by conjugation (Frost et al., 2005; Sota and Top, 2008; Frost and Koraimann, 2010). Smillie et al. (2010) reported that about 14% of the fully sequenced plasmids were predicted to be conjugative. Conjugation is one of the most effective mechanisms to spread genetic elements among bacteria (Guglielmini et al., 2011). Plasmids are therefore among the most important "vehicles" for bacterial communication of genetic information, facilitating the rapid evolution and adaptation abilities seen in bacteria (Aminov, 2011).
Plasmids are also important genetic tools used to manipulate and analyze microorganisms through the introduction, modification or removal of target genes (Frost et al., 2005; Sota and Top, 2008). Many new plasmids have been reported since the recent revolution in nucleotide sequencing technology. Currently, there are 4602 complete sequences of plasmids in the NCBI Plasmid Genome database: 4418 are from bacteria, 137 are from archaea, and the remaining 47 are from eukaryotes (Figure 1A; based on the NCBI database, http://ftp.ncbi.nih.gov/genomes/Plasmids, Aug. 2014).
Knowledge of the relationships between plasmid features and host taxonomy is important in order to understand how the plasmids have been spread among microbes. The classification of known plasmids is necessary to identify newly isolated plasmids in the future. Information about the host range of these plasmids is critical to effectively use them as genetic tools for microbial engineering. This information is also important to detect, isolate and identify novel types of plasmids in environmental samples. In this review, known plasmids for which the complete sequence was available were classified by their host and their (putative) replication or transfer systems. The classification may help to predict which types of plasmids could be replicated in or be transferred to the host microbes. Several insights into the mechanisms of replication and conjugative transfer used by plasmids are summarized first, followed by a description of the features of representative plasmids in several phyla. Recent advances in the identification of novel types of plasmids and plasmid transfers by culture-independent methods using samples from natural environments are also described.
Ravin recently reviewed the replication mechanisms of a prophage of Escherichia coli, N15, which was the first linear plasmid identified with covalently closed ends (Ravin, 2011). Most linear plasmids have conserved "telomeres" containing inverted repeat sequences (Ravin, 2011). The 5′ telomeric ends are blocked by covalently attached telomere terminal proteins (Ravin, 2011). Linear plasmids have sets of conserved telomere replication genes known as tpg and tap (Bao and Cohen, 2001, 2003), or tpc and tac (Huang et al., 2007). Many linear plasmids have been found in Actinobacteria, especially in the genera Mycobacterium, Rhodococcus, and Streptomyces (Ventura et al., 2007).
Conjugative transfer is another important mechanism by which plasmids spread DNA among different bacteria. Self-transmissible plasmids in Gram-negative bacteria generally carry the complete set of elements required for transfer: the origin of transfer (oriT), the relaxase protein, the type IV coupling protein (T4CP), and the type IV secretion system (T4SS). Garcillán-Barcia et al. (2009) and Smillie et al. (2010) classified the conjugative, or mobilizable, plasmids in the GenBank database into six mobility (MOB) types (MOBC, MOBF, MOBH, MOBP, MOBQ, and MOBV) according to the amino acid sequences of their relaxase proteins. An additional classification was performed based on the plasmids' T4SS involved in mating pair formation (MPF) during conjugation. Smillie et al. proposed four classes of MPF (MPFF, MPFG, MPFI, and MPFT) according to the T4SS amino acid sequences (Smillie et al., 2010). They also investigated the presence of two key elements of plasmid mobility, the type IV coupling protein (T4CP) and the ATPase VirB4 (Smillie et al., 2010). During conjugation, one strand of the double-stranded plasmid DNA is cleaved at the oriT site by a relaxase protein, which then covalently binds to the cleaved DNA. The resultant DNA-protein complex is transported to the recipient cell by T4SS. This single-stranded DNA is transferred into the recipient cell by the T4CP. The mobilizable plasmids only have oriT, relaxase, and sometimes T4CP (Garcillán-Barcia et al., 2009). Gram-positive bacteria transfer plasmids by two methods, although the detailed mechanisms are not well understood. First, a single strand of plasmid DNA is transported via a T4SS, which seems to be widely used as a means for transferring plasmids in Gram-positive bacteria (Goessweiner-Mohr et al., 2013). Second, several plasmids of the order Actinomycetales have conjugative systems that function in a manner similar to the segregation of chromosomal DNA during bacterial cell division and sporulation. In this case, the translocation of double-stranded DNA to the recipient cell is mediated by an FtsK-homologous protein (Goessweiner-Mohr et al., 2013). As for archaeal plasmids, only the plasmids in Sulfolobales are known to be transferred (Greve et al., 2004); however, the mechanisms are still not well understood.

Distribution and Classification of Plasmids
Wide variations in both size and GC content were observed among the 4602 plasmids found in the GenBank database. The average size was 80 kb (range: 744 bp-2.58 Mb), and the average GC content was 44.1% (range: 19.3-75.6%; Table S1). More than 90% of the plasmids in the database were identified in eight phyla: Proteobacteria (2142 sequences, 47%), Firmicutes (1129 sequences, 25%), Spirochaetes (423 sequences, 9.2%), Actinobacteria (269 sequences, 5.8%), Cyanobacteria (136 sequences, 3.0%), Bacteroidetes (74 sequences, 1.6%), Chlamydiae (67 sequences, 1.5%), and Tenericutes (62 sequences, 1.3%; Figure 1A). The relationships between the phyla of plasmid hosts and plasmid size or GC content are shown in Figures 1B,C. The distribution of plasmid sizes was bimodal, as previously shown (Smillie et al., 2010; Garcillán-Barcia et al., 2011), with peaks at approximately 4-8 kb and 32 kb, even though the number of sequenced plasmids had increased from 1730 to 4602. This indicates that the distribution of plasmid sizes is conserved as the number of sequenced plasmids increases. The average size of the 4602 plasmids (79.8 kb) was larger than that of the 1730 plasmids (63.5 kb), because the set of 4602 plasmids contains many more plasmids larger than 1 Mb (45 plasmids) than the set of 1730 plasmids (9 plasmids). The average sizes of plasmids in the phyla Actinobacteria, Bacteroidetes, Cyanobacteria, Proteobacteria, and Euryarchaeota were similar (65-120 kb), while those in Chlamydiae, Firmicutes, Spirochaetes, and Tenericutes were rather small (6-40 kb; Figure 1B). However, the average GC content of plasmids was different in each host phylum (26.3-64.5%; Figure 1C). Nishida reported that the GC contents of the majority of plasmids were lower than those of their host chromosomes, although the difference was less than 10% (Nishida, 2012). Thus, it was reasonable that plasmids found in Actinobacteria had the highest GC content (64.5%; Figure 1C), because their host genomes ranged from 51% to greater than 70% GC (Ventura et al., 2007). It was suggested that the lower GC contents of plasmids relative to those of the host chromosome might be due to the higher energy cost of maintaining G and C than A and T (Rocha and Danchin, 2002). On the other hand, plasmids whose replication depends on the host cell's replication machinery might be expected to have the same GC content as the host (Rocha and Danchin, 2002). Therefore, the GC content of a plasmid is likely to be important for determining its host range. It was suggested that the nucleotide composition of a plasmid might be progressively altered toward the average nucleotide composition of the host genome (Lawrence and Ochman, 1998; Rocha and Danchin, 2002). Based on this hypothesis, the relationship between the GC contents of the plasmid and the host genome might also be important for predicting when the plasmid was introduced into the host species. Indeed, even within a single host species, the range of GC contents of plasmids was wider than that of the host genome sequences. For example, the GC contents of the whole nucleotide sequences of 10 plasmids in Pseudomonas aeruginosa (Table S1) ranged from 45.8 to 63.8% (average GC content = 58.7%). According to the Pseudomonas genome database (http://www.pseudomonas.com/), the GC contents of Pseudomonas aeruginosa genomes ranged from 66.1 to 66.6% (average GC content = 66.4%). The plasmid with 63.8% GC content might have been introduced into P. aeruginosa strains earlier than the one with 45.8%.
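The GC comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sequences would come from the user's own FASTA files, and the gap between host and plasmid GC content is a rough indicator under the amelioration hypothesis, not a dating method.

```python
def gc_content(seq: str) -> float:
    """Return the GC content (%) of a nucleotide sequence.

    Only A, C, G, and T are counted; ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / total


def gc_gap(plasmid_gc: float, host_gc: float) -> float:
    """Host GC content minus plasmid GC content, in percentage points."""
    return host_gc - plasmid_gc


if __name__ == "__main__":
    # Values quoted in the text for P. aeruginosa and two of its plasmids.
    host_average = 66.4
    for plasmid_gc in (63.8, 45.8):
        gap = gc_gap(plasmid_gc, host_average)
        print(f"plasmid GC {plasmid_gc:.1f}%: gap to host = {gap:.1f} points")
```

Under the hypothesis cited above, the plasmid with the smaller gap would be the candidate for the earlier introduction, although other explanations (for example, acquisition from a donor with a similar GC content) cannot be excluded by this calculation alone.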
Plasmids have been classified based on incompatibility since the 1970s. Incompatibility (Inc) is defined as the inability of plasmids sharing similar replication and partition systems to be propagated stably in the same host cell line. Inc groups have been independently classified in three different bacterial groups; there are 27 Inc groups in Enterobacteriaceae, 14 Inc groups in Pseudomonas, and approximately 18 Inc groups in Staphylococcus (Udo and Grubb, 1991; Lawley et al., 2004; Taylor et al., 2004; Thomas and Haines, 2004; Sota and Top, 2008; Carattoli, 2009). Several Inc groups of Pseudomonas are identical to those in enterobacteria, such as IncP-1 (equivalent to IncP), IncP-3 (equivalent to IncA/C), IncP-4 (equivalent to IncQ), and IncP-6 (equivalent to IncG/U). Classification into an Inc group is now often based on the amino acid sequence of the replication initiation (Rep) protein (replicon typing), and it is not necessarily confirmed by conventional methods whether the plasmid shows incompatibility with a plasmid of the same Inc group in the same host cell line. Classification based on replicon typing is also useful for grouping plasmids that belong to unidentified Inc groups.
As described above, classification of plasmids based on their MOB types and MPF classes has previously been reported (Garcillán-Barcia et al., 2009; Smillie et al., 2010). Plasmid classification by replicon typing is based on the molecular characteristics of the replicons and has been quite successful (Carattoli et al., 2005). However, there are difficulties with this method: (i) plasmids frequently carry multiple replicons, and it is therefore difficult to classify such a plasmid into a single replicon group; and (ii) detailed information about Inc groups or Rep types is limited to a few microbial taxa, especially enterobacteria, and it is difficult to identify replication regions for the other types of plasmids. The classification of plasmids based on their mobility (Garcillán-Barcia et al., 2009; Smillie et al., 2010; Garcillán-Barcia et al., 2011) can overcome these problems because (i) classification by MOB types can cover plasmids from all microbes, and (ii) plasmids rarely carry more than one relaxase gene (Garcillán-Barcia et al., 2009). However, this classification is not appropriate for non-transmissible plasmids.
In this review, 4602 plasmids with complete sequences listed in the GenBank database were classified based on the genes encoding (putative) Rep proteins, MOB classes, and MPF types. Amino acid sequences of previously identified Rep proteins were used; their accession numbers in the DDBJ/EMBL/GenBank database are listed in Table S2-1. The local TBLASTN program was used for the classification of Rep proteins with the following parameters: e-value <10⁻⁵, >50% identity, and >0.5 query coverage. The parameters for identity and coverage were chosen based on variations in the amino acid sequences of the replication initiation protein TrfA from the IncP-1 plasmid (data not shown). The IncP-1 plasmid is one of the best-studied plasmids distributed across many bacterial classes (Adamczyk and Jagura-Burdzy, 2003), and thus, its TrfA protein sequence was used to set the criteria for analysis. For the ColE1 family plasmids, nucleotide sequences of RNA II were used as queries for the local BLASTN program using an e-value <10⁻⁵. Notably, the Rep proteins of the IncB, IncFII (RepA1), IncI, and IncK plasmids showed greater than 87% identity with one another. Similarly, the Rep proteins of Inc4, Inc9, Inc10, and Inc14 plasmids are 311-314 amino acids long and share a high degree of identity (>75%) with each other. These findings suggest that plasmids from these Inc groups are closely related to one another. When one open reading frame simultaneously showed identity with the Rep proteins of several of these Inc groups, the hit with the smallest e-value was used for the classification. Among the 4602 plasmids analyzed, 1845 plasmids (40.0%) in Proteobacteria, Firmicutes, Actinobacteria, Cyanobacteria, Bacteroidetes, Tenericutes, Euryarchaeota and other phyla were classified into previously known Inc groups or unidentified Inc groups with other known Rep types (Figures 2A,B), although several plasmids were classified into multiple Inc groups (Table S1).
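The filtering and tie-breaking rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used for this review: the `Hit` record, its field names, and the example hits are hypothetical, and parsing of TBLASTN output into such records is assumed to happen elsewhere. Only the three thresholds and the "smallest e-value wins" rule are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit of a Rep query against a plasmid (hypothetical record)."""
    plasmid: str      # plasmid identifier
    orf: str          # label of the open reading frame that was hit
    inc_group: str    # Inc group or Rep type of the query protein
    evalue: float
    identity: float   # percent identity
    coverage: float   # fraction of the query covered (0-1)


# Thresholds stated in the text for Rep protein classification.
MAX_EVALUE = 1e-5
MIN_IDENTITY = 50.0
MIN_COVERAGE = 0.5


def passes(hit: Hit) -> bool:
    """True if the hit satisfies all three thresholds."""
    return (hit.evalue < MAX_EVALUE
            and hit.identity > MIN_IDENTITY
            and hit.coverage > MIN_COVERAGE)


def classify(hits):
    """Map (plasmid, orf) to the Inc group of its best passing hit.

    When one ORF matches Rep proteins of several Inc groups,
    the hit with the smallest e-value is kept.
    """
    best = {}
    for hit in hits:
        if not passes(hit):
            continue
        key = (hit.plasmid, hit.orf)
        if key not in best or hit.evalue < best[key].evalue:
            best[key] = hit
    return {key: hit.inc_group for key, hit in best.items()}


if __name__ == "__main__":
    # Invented example hits, for illustration only.
    example = [
        Hit("pA", "orf1", "IncB", 1e-80, 88.0, 0.95),
        Hit("pA", "orf1", "IncK", 1e-95, 91.0, 0.97),
        Hit("pB", "orf3", "IncP-1", 1e-3, 60.0, 0.80),  # fails the e-value cutoff
    ]
    print(classify(example))
```

Because the result is keyed by open reading frame, a plasmid that carries two distinct replicons still receives two assignments, which matches the observation that several plasmids were classified into multiple Inc groups.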
The classification of plasmids into MOB classes and MPF types using T4CPs and VirB4 homologs was performed as proposed by de la Cruz's group (Garcillán-Barcia et al., 2009; Smillie et al., 2010). Instead of using the local PSI-BLAST program (ver. 2.2.24, http://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) as described previously (Garcillán-Barcia et al., 2009), we used the local TBLASTN program with >50% identity and >0.7 query coverage for the alignment of MOB and MPF gene sequences (lists of queries are shown in Table S2-2). The MPF type of a plasmid was determined by the presence of more than two MPF genes on the plasmid. There are five mobility categories of plasmids, as defined by de la Cruz's group: (i) non-transmissible plasmids that do not code for a relaxase ("non-mob"); (ii) plasmids that do not contain a relaxase but contain T4CP, VirB4, and MPF, or any two of these three elements ("non-mob, protein export"); (iii) mobilizable plasmids that contain a relaxase gene but lack VirB4 and MPF ("mob"); (iv) conjugative plasmids that contain a known type of T4SS (MPFF, MPFT, MPFI or MPFG), plus relaxase and T4CP ("determined conjugative"); and (v) conjugative plasmids that contain genes for relaxase, T4CP, and VirB4, but whose T4SS does not belong to a specific MPF ("undetermined conjugative"). Because the relationship between the plasmid and each MOB class or MPF type was previously reviewed in detail (Garcillán-Barcia et al., 2009; Smillie et al., 2010), only the distribution of "mob" (putative mobilizable) plasmids, "determined conjugative" (putative transmissible) plasmids, and the other plasmids is discussed in this review. As shown in Figure 2C, putative transmissible plasmids were primarily found in the phylum Proteobacteria, with smaller numbers found in Firmicutes, Bacteroidetes, Tenericutes, and others.
Many putative mobilizable plasmids were found in the phyla Proteobacteria (401/2141 plasmids, 18.7%), Firmicutes (285/1130 plasmids, 25.2%), Actinobacteria (64/269 plasmids, 23.8%), and Cyanobacteria (41/136 plasmids, 30.1%) (Figure 2C). The mobilizable plasmids had a main peak at a smaller size (around 5 kb) than that of the transmissible plasmids (around 100 kb; Figure 2D, left). The two major peaks of mobilizable and transmissible plasmids were similar to those in a previous report (Smillie et al., 2010). This indicates that the size distribution of mobilizable and transmissible plasmids is conserved as the number of fully sequenced plasmids increases. As for the distribution of the GC content of the plasmids, both types showed a wide range of values (23-74%; Figure 2D, right), probably because various hosts harbor both mobilizable and conjugative plasmids.
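The five mobility categories listed above amount to a small decision rule on the presence of a relaxase, a T4CP, VirB4, and a typed MPF system. The following Python sketch encodes that rule as read from the text; the function name and arguments are hypothetical, and the "unassigned" fallback for combinations the definitions do not mention (for example, a relaxase with VirB4 but no T4CP) is an assumption of this sketch, not part of the published scheme.

```python
from typing import Optional

KNOWN_MPF_TYPES = {"F", "T", "I", "G"}


def mobility_category(relaxase: bool, t4cp: bool, virb4: bool,
                      mpf_type: Optional[str]) -> str:
    """Assign one of the five mobility categories described in the text.

    mpf_type is "F", "T", "I", or "G" when the T4SS belongs to a known
    MPF class, and None otherwise.
    """
    if mpf_type is not None and mpf_type not in KNOWN_MPF_TYPES:
        raise ValueError(f"unknown MPF type: {mpf_type!r}")
    has_mpf = mpf_type is not None

    if not relaxase:
        # (ii) needs at least two of T4CP, VirB4, and MPF; otherwise (i).
        elements = sum([t4cp, virb4, has_mpf])
        return "non-mob, protein export" if elements >= 2 else "non-mob"
    if not virb4 and not has_mpf:
        return "mob"                        # (iii)
    if has_mpf and t4cp:
        return "determined conjugative"     # (iv)
    if t4cp and virb4:
        return "undetermined conjugative"   # (v)
    return "unassigned"  # combination not covered by the definitions (assumption)


if __name__ == "__main__":
    print(mobility_category(True, True, True, "F"))
    print(mobility_category(True, False, False, None))
    print(mobility_category(False, True, True, None))
```

The order of the checks matters: the relaxase test comes first because categories (i) and (ii) are defined by its absence, and the "mob" test precedes the conjugative tests so that a relaxase with only a T4CP is treated as mobilizable.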

Plasmids in Proteobacteria
Plasmids within Proteobacteria are one of the most extensively studied groups because they include those harboring the pathogenic genes of bacteria known to infect both animals and plants. Indeed, about half of the plasmids in the database were found in Proteobacteria (Figure 1A). Although many catabolic plasmids have also been identified in the phylum Proteobacteria, they are not described in detail here, as they have been reviewed elsewhere (Shintani et al., 2010; Shintani and Nojiri, 2013). The majority of the plasmids within this phylum were found in the classes Gammaproteobacteria (1389; 63%), Alphaproteobacteria (451; 22%), and Betaproteobacteria (187; 8.7%). The size distribution was bimodal, with one group at around 5 kb and the other at 65-131 kb (Figure 3A, left), although the average size of plasmids in each major class was different: 58.7 kb in Gammaproteobacteria, 218 kb in Alphaproteobacteria, and 259 kb in Betaproteobacteria. The average GC contents of plasmids in these groups were 48.0% in Gammaproteobacteria, 56.5% in Alphaproteobacteria, and 58.5% in Betaproteobacteria (Figure 3A, right). Within this phylum, 1077 plasmids were classified using previously known Rep proteins.

Plasmids in Gammaproteobacteria
Among the 1396 plasmids in Gammaproteobacteria, 830 were classified into Inc groups (IncA to IncZ, IncP-1 to IncP-9, and Inc13; 485 plasmids), the ColE1-type plasmid group (210 plasmids), the ColE2/E3-type plasmid group (38 plasmids), and other groups with known Rep (119 plasmids), while several plasmids were classified into multiple Inc or Rep type groups (Table S1). The average sizes of plasmids classified into Inc13, ColE1-type, and ColE2/E3-type plasmids (6053 bp) were smaller than those of other classified plasmids (98,574 bp) (Figure 3B, left), although the average GC contents were similar between the smaller and larger groups (48.7% and 49.6%, respectively; Figure 3B, right). This was probably because the sizes of the plasmids are closely related to their replication systems.
The majority of plasmids in Gammaproteobacteria were from Enterobacteriales (927) and Pseudomonadales (183). The plasmids from Enterobacteriales include antibiotic resistance plasmids, which cause the rapid development of resistance to antibiotics in bacterial populations. Thus, more than 75% of the plasmids in Enterobacteriales (697/927) were classified into Inc groups or Rep types, including IncF, IncA/C (equivalent to IncP-3), IncL/M, IncI, IncHI2/S, and IncN, containing plasmid families already known to carry resistance genes (Carattoli, 2009; Wang et al., 2013). Carattoli and Partridge have separately reviewed the emergence and spread of antibiotic resistance via plasmids among Enterobacteriaceae (Carattoli, 2009; Partridge, 2011). The spread of plasmids carrying genes for extended-spectrum β-lactamases (ESBLs) has been a particular problem in the treatment of Enterobacteriaceae infections (Paterson, 2006). Plasmid groups found in Enterobacteriales, such as Enterobacter, Escherichia, Salmonella, Shigella, and Yersinia, include a large number of virulence plasmids. It should be noted that virulence-associated traits of E. coli are almost exclusively found on IncF-family plasmids (Johnson and Nolan, 2009). In virulent Yersinia species, extensive research has been carried out on "low calcium ion (Ca²⁺) response plasmids," which regulate the growth and expression of several virulence-associated properties in response to Ca²⁺ and temperature. This group includes the pCD1, pYVe8081, and pYVe227 series, which encode a set of secreted anti-host proteins and a type III secretion system (Straley et al., 1993). Notably, proteins responsible for replication initiation encoded on pCD-type plasmids showed high similarity with those encoded on IncZ plasmids, and 186 of the 976 plasmids of Enterobacteriales carried genes encoding these types of Rep proteins (Table S1).
Although the IncP-1 to IncP-14 groups have been found in Pseudomonas, only 21 plasmids in the order Pseudomonadales were classified into these groups; the other 93 plasmids classified into these Inc groups were in other orders (including Enterobacteriales and Thiotrichales), other classes (including Alphaproteobacteria and Betaproteobacteria), or even other phyla (including Actinobacteria and Thermotogae; Table S1). Recently, Xiong et al. determined the complete sequence of pOZ176, an IncP-2 plasmid which confers resistance to heavy metals (Xiong et al., 2013). Ten additional plasmids have been identified which carry a gene encoding a putative Rep protein with >50% identity and >0.5 query coverage, suggesting that these plasmids may have IncP-2-like replication systems (Table S1). The IncP-2 group contains degradative plasmids whose sizes are typically around 500 kb (Shintani et al., 2010; Shintani and Nojiri, 2013); the average size of the ten plasmids was 425 kb (Table S1). Two plasmids were found to share >83% identity with the sequence of the Rep protein from pOZ176: pPNAP01, a plasmid of Polaromonas naphthalenivorans CJ2, encodes proteins involved in the degradation of naphthalene (Jeon et al., 2003, 2006; Yagi et al., 2009), and pBB1 (GenBank/EMBL/DDBJ accession no. CP002879), a plasmid of Cupriavidus necator N-1, encodes proteins putatively involved in the degradation of catechols (Poehlein et al., 2011). Taken together, these findings suggest that the nucleotide sequences of the replication systems used by previously known Inc group plasmids could be important tools for classifying other plasmids. It will therefore be helpful for classification to determine the nucleotide sequences of the remaining unsequenced Inc group plasmids, including IncP-5, IncP-8, IncP-10, IncP-11, IncP-12, IncP-13, and IncP-14 plasmids (Table S2-1).
Figure caption: Histograms of plasmid size (left) and GC content (right) in Alphaproteobacteria. The distributions of classified plasmids are shown in blue (RepABC-type), light blue (pUT-type), dark gray (Rep20-type), light gray (other Rep types), and white (unclassified). The average sizes of Rep20-type, pUT-type, and RepABC-type plasmids are represented by dotted lines.
Acinetobacter baumannii is known to be an important pathogen, and the plasmids in the genus Acinetobacter are considered to be key genetic factors in the spread of multi-drug resistance (Dijkshoorn et al., 2007; Evans and Amyes, 2014). Bertini et al. proposed PCR-based replicon typing of 19 groups (GRs) of resistance plasmids in A. baumannii on the basis of nucleotide sequence similarities between the genes encoding their replication initiation proteins (Bertini et al., 2010). Using the amino acid sequences of Rep from plasmids in the GR1 to GR19 groups, we were able to classify 78 of the 4602 plasmids found in the database, including 60 of the 90 plasmids in the genus Acinetobacter (Table S1). The remaining 18 plasmids were also found in Gammaproteobacteria, and, notably, these plasmids were all classified into GR3 (Table S1). The average GC content of plasmids classified into GR1-19 (38.4%) was lower than that of other plasmids in Gammaproteobacteria (Figure 3B). These findings suggest that the plasmids with GR1- to GR19-type Rep proteins were primarily distributed in Acinetobacter and that GR3 group plasmids may have a broader host range than plasmids in other GR groups.

Plasmids in Alphaproteobacteria
The majority of the 451 plasmids in Alphaproteobacteria were found in Rhizobiales (143 plasmids), Rhodospirillales (122 plasmids), Rhodobacterales (94 plasmids), and Sphingomonadales (65 plasmids). Many plasmids in Alphaproteobacteria carry genes encoding RepABC proteins (Cevallos et al., 2008). The RepAB proteins are involved in the partitioning of the plasmid, and RepC is associated with replication initiation (Cevallos et al., 2008; Pinto et al., 2012). Indeed, 123 plasmids in Alphaproteobacteria had RepABC-type genes (Table S1). Although plasmids containing these types of Rep genes have an average size of 484 kb, their sizes vary widely from 30 to 2430 kb (Figure 3C).
Hosts in the Rhizobiales are known to be symbiotic bacteria, and they can usually fix nitrogen only upon establishing a mutualistic interaction with plants, particularly those of the Leguminosae family (Cevallos et al., 2008; Pinto et al., 2012). Their 143 plasmids are characterized by a highly variable plasmid number per strain (from 0 to 11) and size (from 150 to 1683 kb). Of these, 101 plasmids were classified into the RepABC-type group (Table S1). Notably, most of the genes for nodulation and nitrogen fixation are located on a single plasmid, called the symbiotic plasmid (pSym). pSym contains the structural genes for nitrogenase (nifHDK) and/or genes essential to the production of the nodulation factor (nodABC). Introduction of pSym into a plasmid-free Agrobacterium tumefaciens strain leads to nodulation of a specific host and, sometimes, to nitrogen fixation, albeit at modest levels. Conversely, elimination of pSym impairs both the nodulation and nitrogen fixation capacities of the original bacterial strain (Torres et al., 2011). The presence of pSym is common to members of the fast-growing Rhizobium and Sinorhizobium species, including symbionts of many plants of agronomic interest. RepABC plasmids were also found in the orders Rhodospirillales and Rhodobacterales (Table S1), as previously reported (Petersen et al., 2011).
Approximately half of the plasmids in the order Rhodospirillales were found in Acetobacter (60 of 122 plasmids; Table S1). The acetic acid bacteria are important for the industrial oxidation of various compounds, and thus, the construction of shuttle vector(s) to facilitate functional gene studies of these bacteria was reported 30 years ago (Fukaya et al., 1985; Okumura et al., 1985). Nevertheless, a well-defined method for the classification of Acetobacter plasmids has not been developed. Two groups have recently reported the characterization of Rep proteins, Rep7 and Rep20, from the small plasmids pGR7 (Grones and Grones, 2012) and pAG20 (Babic et al., 2014), respectively, found in Acetobacter (order Rhodospirillales). The Rep20 gene was also found in plasmids in Rhodospirillales and Rhodobacterales (Figure 3C). The average size of these plasmids was 4.6 kb, while that of RepABC-type plasmids was 483 kb (Figure 3C), suggesting that the newly characterized Rep proteins may be appropriate for the replication of small plasmids.
Several plasmids in Sphingomonadales have been found in bacteria which degrade biphenyl (pNL1 and pNL2 in Novosphingobium aromaticivorans DSM 12444; Stillwell et al., 1995; Romine et al., 1999), gamma-hexachlorocyclohexane (pCHQ1, pUT1, and pUT2 in Sphingobium japonicum UT26S; Nagata et al., 2010, 2011), and dibenzo-p-dioxin (pSWIT01 and pSWIT02 in Sphingomonas wittichii RW1; Miller et al., 2010). The (putative) Rep proteins of these plasmids (pCHQ1, pUT1, pUT2, pSWIT01, pNL1, and pNL2) were found in 59 other plasmids in Alphaproteobacteria, and 24 of these plasmids were found in Sphingomonadales (Table S1). Notably, 42 plasmids in Alphaproteobacteria were classified as pUT1- or pUT2-type plasmids, and 36 of these were the pUT1 type (Figure 3C). The average size of the pUT-type plasmids was 68.4 kb, which was larger than that of Rep20-type plasmids and smaller than that of RepABC-type plasmids (Figure 3C). Most of the plasmids classified into these three Rep types were in Alphaproteobacteria (Table S1), whereas two pUT1-type plasmids and six RepABC-type plasmids were found in other classes or phyla (Table S1). Thus, they are important vehicles among Alphaproteobacteria, although the plasmids with smaller sizes and lower GC contents still could not be classified (Figure 3C).

Plasmids in Firmicutes
Plasmids found in Firmicutes were smaller (1.3-627 kb, 39 kb average) and had lower GC contents (23-63%, 34.4% average) than those in Proteobacteria (0.74-2580 kb, 103 kb average; 22-76%, 50.2% average; Figures 1B,C). Among the 1129 plasmids in Firmicutes, 1021 plasmids, or 90%, were found in the class Bacilli (Figure 4A). Based on comparisons with the amino acid sequences of (putative) Rep proteins of the Inc groups of staphylococci, 475 plasmids in Firmicutes were shown to have putative Rep genes of Inc1 to Inc18 plasmids (Figure 4B), and 311 plasmids were from Bacillales, including Bacillus (57 plasmids) and Staphylococcus (220 plasmids; Table S1). A subset of 158 plasmids classified into Inc1 to Inc18 groups was found in Lactobacillales, including Enterococcus (36 plasmids), Lactobacillus (56 plasmids), Lactococcus (15 plasmids), and Streptococcus (34 plasmids; Table S1). The Inc4, Inc8, Inc9, Inc10, Inc11, Inc13 and Inc14 group plasmids are known as rolling-circle replication plasmids (Smith and Thomas, 2004; Khan, 2005; Guglielmetti et al., 2007), while those of Inc1, Inc7, and Inc18 are theta-type replication plasmids (Bruand et al., 1991; Jensen et al., 2010b; Liu et al., 2013). Another classification system based on the nucleotide sequences of plasmids was proposed for enterococci and staphylococci, which included 26 Rep families and 10 unique families (Jensen et al., 2010a; Lozano et al., 2012). Of the 4602 plasmids included in our analysis, 388 plasmids could be classified using the 26 Rep families, and 383 of them were found in Firmicutes (Table S1). Several plasmids were classified into both a Rep family and an Inc group, but 91 plasmids were only classified into a Rep family (Figure 4B). The distribution of plasmid size was bimodal: the 269 putative rolling-circle replication plasmids had a smaller average size (5713 bp), while the average size of the 200 theta-type replication plasmids was larger (45,498 bp; Figure 4B, left). The GC contents of both groups were similar, 32.3% and 34.1%, respectively (Figure 4B, right).
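The grouping of staphylococcal Inc groups by replication mode can be written as a simple lookup, which also makes it easy to compare average sizes per mode. The Python sketch below uses only the Inc-group assignments stated above; the function names and the example sizes are hypothetical, and Inc groups whose replication mode is not stated in the text are reported as "unknown" rather than guessed.

```python
# Replication mode by Inc group, as listed in the text.
ROLLING_CIRCLE = {"Inc4", "Inc8", "Inc9", "Inc10", "Inc11", "Inc13", "Inc14"}
THETA = {"Inc1", "Inc7", "Inc18"}


def replication_mode(inc_group: str) -> str:
    """Return 'rolling-circle', 'theta', or 'unknown' for an Inc group."""
    if inc_group in ROLLING_CIRCLE:
        return "rolling-circle"
    if inc_group in THETA:
        return "theta"
    return "unknown"


def mean_size_by_mode(plasmids):
    """Average plasmid size (bp) per replication mode.

    plasmids is an iterable of (inc_group, size_bp) pairs.
    """
    totals = {}
    for inc_group, size_bp in plasmids:
        mode = replication_mode(inc_group)
        total, count = totals.get(mode, (0, 0))
        totals[mode] = (total + size_bp, count + 1)
    return {mode: total / count for mode, (total, count) in totals.items()}


if __name__ == "__main__":
    # Invented sizes, for illustration only.
    example = [("Inc4", 4000), ("Inc13", 6000), ("Inc18", 50000)]
    print(mean_size_by_mode(example))
```

Applied to the classified Firmicutes plasmids in Table S1, a calculation of this kind would be expected to reproduce the contrast reported above between small rolling-circle plasmids and larger theta-type plasmids.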
Bacilli such as Enterococcus (Lactobacillales) and Staphylococcus (Bacillales), which are causative agents of hospital infection outbreaks, carry plasmids conferring resistance and virulence. The transmission of antibiotic resistance to Enterococcus is mediated by pheromone-responsive plasmids and broad-host-range Inc18 group plasmids (Palmer et al., 2010). The pheromone-responsive plasmids have mainly been described in Enterococcus faecalis. The representative examples of these plasmids are pCF10 and pAD1 (Clewell, 2007; Dunny, 2007). Their transfer is induced by pheromones generated from lipoprotein signal peptides encoded on the host chromosome (Clewell, 2007; Dunny, 2007). Pheromone receptors are encoded on the plasmid, and binding of the pheromone to its receptor causes effective pair formation between donor and recipient cells. Rosvoll et al. have reported an in-depth investigation of Inc18 plasmids, which are found mostly in E. faecium (Rosvoll et al., 2010). Of these, only pAMβ1 and pIP501 have been well characterized; they were originally identified in E. faecalis and Streptococcus agalactiae, respectively. Within Staphylococcus, the pSK1 family (Inc1, Rep 20 family), which includes multidrug-resistance plasmids such as pSK4 and pSK7, and the pSK41 family (Inc7, Rep 15 family), which includes conjugative multidrug-resistance plasmids such as pGO1, have been investigated along with the previously discussed rolling-circle replicating plasmids (Jensen et al., 2010b; Liu et al., 2013).
The majority of the lactic acid bacteria (LAB) that belong to Bacilli, including Lactococcus, Lactobacillus, Leuconostoc, Pediococcus, and Streptococcus, as well as Bifidobacterium (phylum Actinobacteria), are important in food fermentation. Recently, the plasmids from Lactococcus lactis were reviewed in detail regarding plasmid-encoded traits of biotechnological significance (Ainsworth et al., 2014). Genetic engineering technology, including the development of cloning vectors for the LAB, is critical for the effective application of these bacteria in the food industry (Shareck et al., 2004). Versatile vectors have been developed for high-level, inducible gene expression in Lactobacillus sakei and Lactobacillus plantarum (Sorvig et al., 2005). In the GenBank database, 166 plasmids were found in Lactobacillus, and notably, multiple (five to ten) plasmids were found in individual strains, such as Lactobacillus fermentum MTCC 8711 (Jayashree et al., 2013), Lactobacillus plantarum 16 (Crowley et al., 2013), and Lactobacillus reuteri I5007 (Hou et al., 2014). Fukao et al. reported that one of the LAB, L. brevis KB290, has nine plasmids with RepABC systems (Fukao et al., 2013). Understanding how these plasmids could be maintained simultaneously in a single host cell will be important to the development of genetic tools for LAB.

Plasmids in Spirochaetes
Although 423 plasmids were found in Spirochaetes (Figure 1A), the majority of them were identified in genus Borrelia (416 out of 423), which is known to be an important pathogenic bacterium. The average GC contents of these plasmids were comparatively low (27.7%; Figure 1C). A unique feature of Borrelia is its segmented genome consisting of multiple circular and linear plasmids in addition to its linear chromosome within a single cell (Casjens et al., 2000). The features of Borrelia have been studied in detail (Chaconas and Kobryn, 2010;Chaconas and Norris, 2013). B. burgdorferi type strain B31, the causative agent of Lyme disease, possesses twelve linear plasmids and nine circular plasmids (Fraser et al., 1997;Casjens et al., 2000). Tilly et al. recently characterized the indispensable elements involved in the maintenance of cp26, a 26-kb circular plasmid found in B. burgdorferi (Tilly et al., 2012). Most plasmids can be lost without affecting the host growth; however, cp26 remains in all isolates of B. burgdorferi (Tilly et al., 2012). This plasmid carries genes required for survival of its host, including the resT gene, which encodes a telomere resolvase involved in the resolution of the replicated telomeres of the linear chromosome and plasmids in B. burgdorferi (Tilly et al., 2012).
Notably, several circular plasmids in B31 are homologous throughout almost their entire lengths (cp32s) and carry many genes encoding lipoproteins located on cell surfaces, although very few of these are metabolic or housekeeping genes (Casjens et al., 2000). Bunikis et al. proposed an efficient method to identify each plasmid based on multiplex PCR (Bunikis et al., 2011). The mechanisms used by the host strain to replicate and maintain similar plasmids have been studied in one of the cp32 plasmids (Eggers et al., 2002). Many interesting characteristics of plasmids in Spirochaetes still require further investigation, and comparative genomic analyses of Borrelia in particular will reveal important information. Casjens et al. compared the genomic structures of plasmids identified in four different Borrelia strains (Casjens et al., 2012). They suggested that there are at least 28 plasmid compatibility types among the four strains and that several inter-plasmid genomic rearrangements may have occurred. More genomic information will be necessary for understanding the genetic evolution of these plasmids and their host chromosome.

Plasmids in Actinobacteria
All of the plasmids in the phylum Actinobacteria were found in class Actinobacteria, and, as shown in Figures 5A,B, they were found in nine suborders and one order (Bifidobacteriales). The plasmids in Actinobacteria were large and had high GC contents (Figures 1B,C). The phylum Actinobacteria includes a wide variety of bacteria with different morphologies, physiologies, and metabolic properties, and they have various kinds of plasmids (Ventura et al., 2007). In a review of the genomics of Actinobacteria, Ventura et al. showed the phylogenetic relationships between Rep proteins from actinobacterial plasmids (Ventura et al., 2007). In this review, sequences from 42 different types of Rep proteins from actinobacterial plasmids were used as queries, and of these proteins, 27 were found in more than two plasmids (Table S2-1). Of the 269 plasmids in Actinobacteria, 110 were found in Corynebacterineae, 69 in Streptomycineae, 32 in Micrococcineae, and 29 in Bifidobacteriales. The average sizes and GC contents of these plasmids varied widely (4631-200,330 bp; 56.1-70.3%; Figures 5A,B).
The majority of the plasmids from Corynebacterineae were found in Corynebacterium (38 plasmids), Mycobacterium (29 plasmids), and Rhodococcus (31 plasmids). Because C. glutamicum is widely used as an industrial producer of amino acids, plasmids of the amino acid-producing isolates were classified to develop efficient recombinant DNA techniques (Tauch et al., 2003a). Several rolling-circle replication plasmids were found in C. glutamicum, and the pBL1 (including pAG3 and pCG2) and pCG1 (including pCG4 and pGA2) families were proposed (Tauch et al., 2003a). A plasmid found in the human pathogen C. diphtheriae S601, pNG2, has a Rep protein similar to those of the pCG1 family (Tauch et al., 2003a,b).
There were six pBL1-family plasmids and 22 pCG1/pNG2 plasmids in the GenBank database (Table S1). While plasmids in the pBL1 family had small sizes (4.4-6.8 kb), those in the pCG1/pNG2 family had a wide range of sizes (4.1-85 kb, Figure 5C). Tauch et al. also reported a theta-type replication plasmid family in Corynebacterium, the pCRY4 family, but among the 4602 plasmids analyzed, only one plasmid was classified into this family (Table S1). Of the plasmids found in Mycobacterium, only the low copy number plasmid pAL5000 has been widely used (Labidi et al., 1985). Plasmids in the pMSC262 family, which includes pVT2, pMUM001, and the linear plasmid pCLP (Le Dantec et al., 2001), are compatible with pAL5000 and had a wide range of sizes (24-615 kb, Figure 5C). Among the 29 plasmids of Mycobacterium, 16 plasmids were classified into the pMSC262 family (Table S1). Plasmids in Rhodococcus were found in the biphenyl degrader, R. jostii RHA1 (Seto et al., 1995). This strain has three plasmids, pRHL1, pRHL2, and pRHL3 (Masai et al., 1997;Shimizu et al., 2001;McLeod et al., 2006). The R. erythropolis PR4 strain is known to utilize n-alkanes and alkylbenzene, and it has three plasmids, pREL1, pREC1, and pREC2 (Sekine et al., 2006). Only a small number of plasmids in Rhodococcus were predicted to have Rep proteins similar to those of pRHL1, pRHL3, pREC1, or pREC2 (Table S1).
Plasmids in Streptomycineae are found in Streptomyces species, and several of them are large linear plasmids. These linear plasmids have conserved "telomeres" containing inverted repeat sequences (Chen et al., 1993;Pandza et al., 1998). The 5′ telomeric ends are blocked by covalently attached telomere terminal proteins (Bao and Cohen, 2001). They have sets of conserved telomere replication genes known as tpgR1 and tapR1 or tpc and tac (Bao and Cohen, 2001, 2003). In Streptomyces, 13 plasmids, all linear, carry these telomere replication gene sets (Table S1). Their sizes were relatively large (54-1797 kb, Figure 5C) and their GC contents were high (68.3-71.9%, Figure 5C). A rolling-circle replication plasmid, pIJ101, from S. lividans has been used as a cloning vector in Streptomyces (Kendall and Cohen, 1988;Ventura et al., 2007), but only two other plasmids have been found to carry a similar Rep gene (Table S1).
Plasmids in Bifidobacteriales were small, with an average size of 4.6 kb (Figure 5A). Among the 29 plasmids in Bifidobacteriales, 28 were classified into the known Rep types pKJ50 (Park et al., 1999), pNAC2, pNAC3 (Corneau et al., 2004), and pDOJH10S (Lee and O'Sullivan, 2006). Rep proteins of pKJ50, pNAC2, and pNAC3 were found only in Bifidobacterium; however, those of pDOJH10S were found in plasmids of other suborders of Actinobacteria, including Corynebacterineae, Micrococcineae, and Propionibacterineae (Table S1). It is likely that plasmids with the pDOJH10S Rep type have a broad host range in the phylum Actinobacteria.

Plasmids in Cyanobacteria
Cyanobacteria are known for the ability to undergo oxygenic photosynthesis and are a promising platform for the production of renewable chemicals and fuels. Thus, plasmid vectors to introduce exogenous genes into cyanobacterial cells have been developed. Taton et al. have developed a broad-host-range vector system using previously known plasmid origins of Cyanobacteria (http://golden.ucsd.edu/CyanoVECTOR/) (Taton et al., 2014). The origins of plasmids were from pDU1 in Nostoc sp. PCC 7524 (Anabaena sp. PCC 7120; Walton et al., 1992), pANS (pUH24) in Synechococcus elongatus PCC 7942 (Golden and Sherman, 1983), pDC1 in Nostoc sp. PCC 8009 (Nostoc sp. ATCC 29133; Lambert and Carr, 1983), and pFDA in Fremyella diplosiphon PCC 7601 (Cobley et al., 1993). Although these origins could be useful modules for constructing new vectors for Cyanobacteria, homologous rep genes were not found in other plasmids in our database. This finding indicates that Cyanobacteria may contain a wider variety of plasmids than other phyla. Smillie et al. (2010) proposed that cyanobacteria may use an as-yet-uncharacterized system to conjugate because, within the phylum, the plasmids encoding relaxase and/or T4CP did not possess known T4SS or VirB4 (Table S1). Indeed, plasmid transfer from Proteobacteria to Cyanobacteria has been previously reported (Encinas et al., 2014). Further in-depth analyses and classification of Cyanobacteria plasmids are necessary to understand the spread of plasmids in this phylum.

Plasmids in Archaea
Among the 4602 plasmids in the database, 137 plasmids were found in Archaea: 112 plasmids were in Euryarchaeota (Figure 1A) and 23 plasmids were in Crenarchaeota. There were large differences in the average sizes and GC contents between the two phyla. The plasmids in Euryarchaeota had an average size of 119 kb and GC content of 52.1%, while those in Crenarchaeota were 20 kb and 39.2%, respectively. Euryarchaeota includes all methanogens, haloarchaea, and some hyperthermophilic genera such as Pyrococcus and Thermococcus. Most known archaeal plasmids have cryptic phenotypes. The best characterized plasmids among methanogens are pME2001, pURB500, and pC2A (Greve et al., 2004). The putative Rep genes have been reported for pME2001 (Luo et al., 2012) and pURB500 (Tumbula et al., 1997), although multiple regions (or genes) were required for replication of pURB500. The putative gene involved in plasmid replication identified in pC2A showed limited homology with Rep proteins encoded by rolling-circle plasmids (del Solar et al., 1993;Metcalf et al., 1997). In Thermococcales, there are two well-characterized plasmids, pGT5 from Pyrococcus abyssi and pTN1 from Thermococcus nautilus, and their respective replication initiation proteins have been identified. Both pGT5 and pTN1 have been shown to use the rolling-circle replication system (Arnold et al., 1999;Soler et al., 2007). The majority of plasmids in Crenarchaeota were found in Sulfolobales (21/23 plasmids), and they were classified into pRN-type and pNOB8-type groups (Greve et al., 2004). Joshua et al. have reported the functional characterization of pRN1 (Joshua et al., 2013). A pRN-type plasmid, pSSVx, is known to be a virus-plasmid hybrid that coexists intracellularly with the fusellovirus SSV1 and can be packaged into viral particles (Arnold et al., 1999). The pNOB8-type plasmids are larger (around 30 kb) than pRN-type plasmids (less than 10 kb) and are known to be conjugative (She et al., 1998). The putative Rep gene for the pNOB8-type group has not been identified (Greve et al., 2004). Based on the TBLASTN analysis, the Rep genes of pURB500, pME2001, pC2A, pGT5, pTN1, and pRN1 were found in only 15 plasmids of Archaea (Tables S1 and S2), suggesting that a wider variety of plasmids may exist in this domain than in others.
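A similarity-based assignment of this kind can be sketched as follows, assuming the TBLASTN results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score). The identity and E-value cutoffs, the function name, and the sequence identifiers in the usage note are hypothetical and are not the criteria used in this review.

```python
def best_rep_hits(blast_lines, min_identity=30.0, max_evalue=1e-5):
    """Assign each subject plasmid the Rep query of its best-scoring hit.

    blast_lines: iterable of tab-separated lines in BLAST tabular (outfmt 6) layout.
    min_identity, max_evalue: illustrative cutoffs (assumptions, not from the review).
    Returns {subject_id: query_id} for subjects with at least one passing hit.
    """
    best = {}  # subject_id -> (query_id, bitscore)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if identity < min_identity or evalue > max_evalue:
            continue  # hit is too weak to support an assignment
        if subject not in best or bitscore > best[subject][1]:
            best[subject] = (query, bitscore)
    return {subject: query for subject, (query, _) in best.items()}
```

For example, if a hypothetical subject `plasmidA` has passing hits from two Rep queries, it is assigned to the query with the higher bit score, while a plasmid with only sub-threshold hits remains unclassified, as was the case for most archaeal plasmids here.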
Genetic tools used in the analysis of archaea are currently being developed. Many archaeal species are resistant to conventional antibiotics, a factor that limits the use of genetic manipulation in archaea (Atomi et al., 2012). Several shuttle vectors are available for certain genera within Euryarchaeota, such as halophiles (Haloarcula), methanogens (Methanococcus and Methanosarcina), and Thermococcales (Pyrococcus). However, those specific to Crenarchaeota are only useful for the order Sulfolobales. Methanogens are known to generate methane and were the first microorganisms identified as Archaea. Halophiles are the Archaea living in the most saline environments on the earth, while Thermococcales are found in high temperature environments, mostly above 80 °C (Heuer and Smalla, 2012). Sulfolobales are the only members of Crenarchaeota whose genetic manipulation methods have been established thus far (Leigh et al., 2011;Atomi et al., 2012). The plasmids of Archaea are typically introduced by transformation; conjugative transfer has not been reported with the exception of a small number of plasmids in Sulfolobus (Greve et al., 2004;Smillie et al., 2010).

Recent Work to Identify other Environmental Plasmids
A large number of plasmids have been identified by chance during the analysis of the host bacteria based on specific phenotypes. Indeed, in our recent report of the whole genome sequence of a biphenyl-degrading bacterium, Geobacillus sp. JF8, we showed that it carried a plasmid, pBt40, with biphenyl-degradative genes (Shintani et al., 2014b). The putative Rep gene of pBt40 was identified, and its homologs were recently found in plasmids isolated from other strains within the genus Geobacillus, although detailed characterizations of these plasmids have not been performed (Table S1). Considering that most environmental bacteria cannot be cultivated, culture-dependent methods to detect transconjugants of plasmids are known to be highly biased. Smalla and Sobecky (2002) proposed multiphasic approaches for the characterization of plasmids and other mobile genetic elements (Smalla and Sobecky, 2002). Culture-independent methods have recently been developed to isolate and identify novel types of plasmids and to detect plasmid transfers (Heuer and Smalla, 2012).

Methods for Plasmid Isolation and Identification
Notably, 28 plasmids in the database have been identified by culture-independent methods from unidentified host strains or uncultivated microbes (Table S1). PCR using plasmid-specific sequences is one of the most common methods used to detect plasmids in environmental samples. Various primers and PCR methods to identify plasmids have been designed, such as those for Enterobacteriaceae (Carattoli et al., 2005), enterococci (Rosvoll et al., 2010), and other Gram-positive bacteria, mostly staphylococci (Clewell, 2007;Jensen et al., 2010a;Lozano et al., 2012). Methods of direct detection by PCR using degenerate primers have been reported for IncP-1, IncP-7, and IncP-9 plasmids from environmental samples (Dealtry et al., 2014) and for plasmids of different MOB families (Alvarado et al., 2012). Many plasmids have been identified from environmental samples, including the human gut, by metagenomic analyses (Elsaied et al., 2011;Kristiansson et al., 2011;Zhang et al., 2011;Brolund et al., 2013;Song et al., 2013). Brown Kav et al. characterized the overall plasmid population in the bovine rumen (termed the rumen plasmidome) by sequencing the extracted circular plasmids (Brown Kav et al., 2012). They developed a bioinformatics pipeline that could successfully detect low-abundance plasmids and remove contaminating chromosomal DNA (Brown Kav et al., 2012). Sentchilo et al. compared the plasmid metagenomes from two separate activated sludge systems and found that the plasmids from the two environments differed markedly (Sentchilo et al., 2013). Jorgensen et al. reported an in silico procedure for identifying small plasmids in metagenomic datasets of the rat cecum (Jorgensen et al., 2014). They successfully identified 160 circular sequences carrying a gene with a plasmid replication domain, and the majority of these sequences were novel (Jorgensen et al., 2014).
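One common heuristic for flagging circular sequences in assembled metagenomic data is to look for an identical stretch at both ends of a contig, which arises when an assembler reads around a circular molecule. The Python sketch below illustrates that idea only; it is not the published procedure of Jorgensen et al., and the function names and the minimum overlap length are assumptions.

```python
from typing import Optional


def terminal_overlap(contig: str, min_overlap: int = 20) -> int:
    """Return the length of the longest prefix that exactly matches the suffix.

    Only overlaps of at least min_overlap and at most half the contig length
    are considered; 0 is returned when no such overlap exists.
    """
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def circularize(contig: str, min_overlap: int = 20) -> Optional[str]:
    """Trim the duplicated end of a putatively circular contig.

    Returns the contig without the repeated terminal segment, or None when
    the contig shows no terminal overlap and is therefore treated as linear.
    """
    k = terminal_overlap(contig, min_overlap)
    return contig[:-k] if k else None
```

Candidate circular sequences found in this way would still need to be screened for a replication gene, for instance with a protein domain search, before being considered putative plasmids.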
Exogenous plasmid isolation is a culture-independent method used to capture plasmids directly from microbial communities. This method relies on the ability of a plasmid to transfer into and replicate in a recipient strain. Transconjugants are selected by their phenotypes, such as antibiotic resistances, conferred by plasmid carriage. This method enables researchers to search for plasmids in a wider microcosm containing the uncultivated microbial fraction. Therefore, the potential to obtain novel types of plasmids is higher. Indeed, this method has been used to isolate a number of novel plasmids (Sobecky and Hazen, 2009;Sen et al., 2011;Eikmeyer et al., 2012;Oliveira et al., 2013;Norberg et al., 2014). A transposon-aided capture (TRACA) method has been developed to study and isolate plasmids independent of plasmid-encoded traits in various bacterial habitats (Jones and Marchesi, 2007). This method has also facilitated the successful identification of several novel types of plasmids (Jones and Marchesi, 2007;Zhang et al., 2011;Burmolle et al., 2012).

Methods to Detect Plasmid Transfer
There are multiple reviews of the conventional methods to detect plasmid transfer in various environments based on culture-dependent approaches (Smalla and Sobecky, 2002;Smalla, 2007, 2012;Shintani et al., 2010). Systems able to directly detect plasmid transfers by culture-independent methods have been developed using fluorescent protein markers and other reporters, such as GFP, DsRed, mCherry, GusA, and LuxAB, together with antibiotic resistance genes (Amann and Fuchs, 2008). Fluorescence in situ hybridization (FISH) using specific probes targeting 16S rRNA sequences and in situ PCR in transconjugant cells are able to detect transconjugants at the single-cell level (Amann and Fuchs, 2008;Cenciarini-Borde et al., 2009;Wagner and Haider, 2012). Fluorescent cells are also detectable and separable at the single-cell level with the use of flow cytometry or a micromanipulator; both methods have a strong potential to identify actual host ranges of plasmids (Musovic et al., 2010;Shintani et al., 2014a).
In silico analyses have been developed to identify horizontal gene transfer phenomena including plasmid transfer. Yamashita et al. carried out a plasmidome network analysis of all available complete bacterial plasmids to identify and characterize the most recent horizontal gene transfer or plasmid transfer (Yamashita et al., 2014). de Been et al. reported a novel approach for the reconstruction of mobile genetic elements from whole-genome sequence data, with which they discovered specific plasmid lineages shared between farm animals and humans (de Been et al., 2014).
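The general idea behind a plasmid network analysis can be shown with a toy Python sketch in which plasmids are nodes and two plasmids are linked when they share gene families; connected groups then suggest possible gene or plasmid exchange. This is only an illustration of the concept, not the method of Yamashita et al.; the input format, the function names, and the `min_shared` threshold are assumptions.

```python
from itertools import combinations


def shared_gene_network(gene_content, min_shared=1):
    """Build an undirected network of plasmids linked by shared gene families.

    gene_content: {plasmid_id: set of gene family ids} (hypothetical input).
    min_shared: minimum number of shared families required to draw an edge.
    Returns an adjacency dict {plasmid_id: set of neighbouring plasmid ids}.
    """
    adjacency = {plasmid: set() for plasmid in gene_content}
    for a, b in combinations(gene_content, 2):
        if len(gene_content[a] & gene_content[b]) >= min_shared:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the network as a list of sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components
```

In a real analysis, edges would more plausibly be weighted by sequence identity so that recent transfers (nearly identical shared genes) can be distinguished from ancient homology.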
Over the next decade, the number of fully sequenced plasmids will greatly increase through the use of deep sequencing methods and in silico analyses of datasets. The classification methods using Rep genes and mobility types based on their nucleotide sequences are useful to identify newly isolated plasmids. This review discusses genes for replication and conjugative transfer, but not the other genes on plasmids, including genes for resistance to antibiotics, metabolism of natural and synthetic compounds, pathogenicity, and host symbiosis. The classification lists in this review will be helpful to understand how these "other" genes on plasmids could be spread among microbes. The classification will also help to predict host candidates of plasmids found by metagenomic approaches, rather than serving merely as a collection of results.
Considering that less than half of the plasmids in the database could be classified, there may still be unknown or novel types of plasmid replication or transfer systems in microbes. Therefore, characterizations of plasmid features involved in replication, maintenance, transfer, and host range based on molecular biological and biochemical methods are still necessary. Culture-independent methods to detect plasmids and their transfers are currently available for analyzing non-culturable or uncultivated microbes. These studies will not only enable us to understand the evolution of microbes mediated by plasmid transfers but will also provide us with useful tools for genetic analyses of both culturable and non-culturable microbes.