Cellular and Infection Microbiology Comparative Genomics Evidence That Only Protein Toxins Are Tagging Bad Bugs

The term toxin was introduced by Roux and Yersin and describes macromolecular substances that, when produced during infection or when introduced parenterally or orally, cause an impairment of physiological functions that leads to disease or to the death of the infected organism. Long after the discovery of toxins, early genetic studies on bacterial virulence demonstrated that removing a certain number of genes from pathogenic bacteria decreases their capacity to infect hosts. Each of the removed factors was therefore referred to as a "virulence factor," and it was speculated that non-pathogenic bacteria lack such supplementary factors. However, many recent comparative studies demonstrate that the specialization of bacteria to eukaryotic hosts is associated with massive gene loss. We recently demonstrated that the only features that seem to characterize 12 epidemic bacteria are toxin–antitoxin (TA) modules, which are addiction molecules in host bacteria. In this study, we investigated whether protein toxins are indeed the only molecules specific to pathogenic bacteria by comparing 14 epidemic bacterial killers ("bad bugs") with their 14 closest non-epidemic relatives ("controls"). We found significantly higher numbers of protein toxins in all of the "bad bugs." For the first time, a principal components analysis including genome size, GC%, TA modules, restriction enzymes, and toxins revealed that toxins are the only proteins other than TA modules that are correlated with the pathogenic character of bacteria. Moreover, intracellular toxins appear to be more correlated with the pathogenic character of bacteria than secreted toxins.
In conclusion, we hypothesize that the only truly identifiable phenomena witnessing the convergent evolution of the most pathogenic bacteria for humans are the loss of metabolic activities, i.e., the outcome of the loss of regulatory and transcription factors, and the presence of protein toxins, alone or coupled as TA modules.


INTRODUCTION
The term toxin was introduced 123 years ago by Roux and Yersin to describe macromolecular substances that, when produced during infection or when introduced parenterally or orally, cause an impairment of physiological functions that leads to disease or to the death of the infected organism (Alouf, 2000). Since 1884, 323 toxins have been identified (Figure 1). Toxins are mainly classified as bacterial protein toxins/exotoxins or toxic lipopolysaccharide complexes (LPSs)/endotoxins. Exotoxins are usually secreted by living bacteria during exponential growth. The production of the toxin is generally specific to a particular bacterial species that produces the disease associated with the toxin. For example, only Clostridium tetani produces Tetanus toxin, and only Corynebacterium diphtheriae produces Diphtheria toxin. Toxins have a chemical, quantifiable action that establishes an aggressive frontal assault strategy during microbial pathogenesis. In contrast, LPSs establish a non-specific stealth assault that subverts the host's immune response (Merrell and Falkow, 2004).
Long after the discovery of toxins, early genetic studies on bacterial virulence demonstrated that removing a certain number of genes from pathogenic bacteria decreases their capacity to infect hosts. Each of the removed factors was therefore referred to as a "virulence factor" (Lawrence, 1999a; Dobrindt et al., 2004; Ochman et al., 2005), and it was speculated that nonpathogenic bacteria lack such supplementary factors (Lawrence, 1999a; Ochman et al., 2005). However, many recent comparative studies demonstrate that specialization of bacteria to eukaryotic hosts is associated with massive gene loss (Nierman et al., 2004; Merhej et al., 2009) and even loss of identified "virulence factors" (Georgiades and Raoult, 2011a,b). Rickettsiae constitute the most representative example of this observation. Genomic analysis of rickettsial species reveals that a shift to pathogenicity does not require acquisition of new genes. Rather, gene loss is implicated in the emergence of their virulence (Andersson and Kurland, 1998; Andersson and Andersson, 1999; Moran, 2002; Blanc et al., 2007; Darby et al., 2007; Fournier et al., 2009; Merhej and Raoult, 2010). Furthermore, in a recent study in our laboratory comparing the 12 most dangerous epidemic bacteria for humans with their closest sequenced non-epidemic relatives, we demonstrated that the epidemic bacteria ("bad bugs") have significantly smaller genomes, resulting from degraded recombination and repair systems, while no significant differences were observed concerning characteristics that were considered to play a role in pathogenesis (Georgiades and Raoult, 2011b). In the latter study, the only features that appear to characterize epidemic bacteria are, surprisingly, toxin-antitoxin (TA) modules (Figure 1; Georgiades and Raoult, 2011b). Other than their roles as addiction molecules in host bacteria, TA modules (Jensen and Gerdes, 1995; Fozo et al., 2010) seem to play a role in bacterial virulence, given that pathogenicity is initiated after attempts to limit their translation in bacteria (Kristoffersen et al., 2000; Yamamoto et al., 2002; Picardeau et al., 2003). Other than toxins and TA modules, some restriction enzymes have toxic activities (Arber, 1978). The actual role of restriction enzymes is bacterial defense against bacteriophage DNA by cleaving the DNA at specific sites. They are also considered to act as selfish elements (Bourniquel and Bickle, 2002). The toxic effects of restriction enzymes are mediated in the form of mutagenic effects on host cellular DNA (Arber, 1978; Kinashi et al., 1993; Price et al., 1995).

FIGURE 1 | Discovery of toxins. The first toxin was identified in 1884, while toxin-antitoxin modules were considered as virulence factors for the first time in 2011. Chronological data were retrieved from Alouf (2000).

In this study, we investigated whether toxins are indeed the only molecules specific to pathogenic bacteria. We supplemented the 12 "bad bugs" from our previous study with two additional "bad bugs" (Neisseria meningitidis and Staphylococcus aureus) and compared all 14 with their 14 closest non-epidemic relatives ("controls"). These 14 "bad bugs" are the most dangerous epidemic bacteria of all time, and we chose to limit our study to them in order to keep the approach neutral and unbiased.

GENOMIC CHARACTERISTICS
All genomic characteristics used in this study [i.e., genome size, GC%, and number of open reading frames (ORFs)] were retrieved from the NCBI database. A graph was plotted and a χ² statistical test was performed for each feature to determine if there were statistically significant differences between the "bad bugs" and the "controls."
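
The text does not specify which form of the χ² test was applied. As one possible reading only, the sketch below runs a goodness-of-fit test on the number of species pairs in which the "bad bug" has the higher value, against an even split. The function name and the input counts are hypothetical and serve only to illustrate the calculation; this is not presented as the authors' procedure.

```python
import math


def chi_square_two_outcomes(n_first, n_second):
    """Chi-square goodness-of-fit test of two observed counts against an
    even split (1 degree of freedom). Returns (statistic, p_value).

    Assumption: this pairwise formulation is illustrative; the source
    only states that a chi-square test was performed.
    """
    total = n_first + n_second
    if total == 0:
        raise ValueError("at least one observation is required")
    expected = total / 2.0
    statistic = ((n_first - expected) ** 2 + (n_second - expected) ** 2) / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Illustrative input: 5 of 14 pairs in which the "bad bug" has the higher count.
statistic, p_value = chi_square_two_outcomes(5, 9)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

With only 14 pairs the expected counts are small, so an exact test could be preferable in practice; the closed form above is kept for simplicity.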

RESTRICTION ENZYMES
Restriction enzymes for each of the 28 bacterial species were retrieved from the REBASE database (Roberts et al., 2010). A graph was plotted, and a χ² statistical test was performed to determine whether there were statistically significant differences between the "bad bugs" and the "controls."

TA MODULES
Text-mining searches were conducted in the GenBank protein database for the following seven type II TA families: VapB/C, RelE/B, ParE/D, MazE/F, phd/doc, ccdA/B, and higA/B. Each protein was used in a tBLASTN query, and hits were defined based on an e-value threshold of 10e-5 with >30% identity and at least 70% coverage. A graph was plotted, and a χ² statistical test was performed to determine whether there were statistically significant differences between the "bad bugs" and the "controls."
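
The hit definition above can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `Hit` record and its field names are hypothetical, coverage is taken as aligned length divided by query length, and the e-value threshold written as 10e-5 is read here as 10^-5 (the source notation is ambiguous, so the cutoff is a parameter).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one tBLASTN hit (field names are assumptions)."""
    subject: str
    evalue: float
    percent_identity: float
    aligned_length: int
    query_length: int


def passes_thresholds(hit, max_evalue=1e-5, min_identity=30.0, min_coverage=0.70):
    """Return True when the hit meets the e-value, identity, and coverage cutoffs.

    Identity must be strictly greater than min_identity (">30%"), while
    coverage must be at least min_coverage ("at least 70%").
    """
    coverage = hit.aligned_length / hit.query_length
    return (
        hit.evalue <= max_evalue
        and hit.percent_identity > min_identity
        and coverage >= min_coverage
    )


# Made-up hits used only to exercise the filter.
example_hits = [
    Hit("genome_A", 1e-20, 45.0, 90, 100),   # passes all three cutoffs
    Hit("genome_B", 1e-20, 30.0, 90, 100),   # identity not above 30%
    Hit("genome_C", 1e-20, 45.0, 60, 100),   # coverage below 70%
    Hit("genome_D", 1e-3, 45.0, 90, 100),    # e-value too high
]
kept = [h.subject for h in example_hits if passes_thresholds(h)]
print(kept)
```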

PROTEIN TOXINS
Protein toxins for each of the 28 bacterial species were retrieved from the MvirDB database (Zhou et al., 2007). A graph was plotted, and a χ² statistical test was performed to determine whether there were statistically significant differences between the "bad bugs" and the "controls."

PHYLOGENIES
Phylogenomic trees were constructed for all restriction enzymes, protein toxins, and all toxins and TA modules by generating a matrix of binary discrete characters ("0" and "1" for absence and presence, respectively) and implementing the neighbor-joining (NJ) method in the Phylogeny Inference Package (PHYLIP; Felsenstein, 1993). Phylogenetic trees were also separately constructed for each restriction enzyme family gene using three methods: NJ, maximum-parsimony (MP), and maximum-likelihood (ML). Alignments were performed with ClustalX2 (Larkin et al., 2007), and trees were constructed using Mega4 (Tamura et al., 2007).
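
To make the presence/absence approach concrete, the sketch below turns binary character profiles into a pairwise distance matrix of the kind a neighbor-joining program takes as input. The distance used here (proportion of mismatched characters) is an assumption, since the text does not state how distances were derived from the binary matrix, and the species labels and profiles are hypothetical toy data.

```python
def mismatch_distance(profile_a, profile_b):
    """Proportion of characters that differ between two 0/1 profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of characters")
    if not profile_a:
        raise ValueError("profiles must not be empty")
    differing = sum(a != b for a, b in zip(profile_a, profile_b))
    return differing / len(profile_a)


def distance_matrix(profiles):
    """Return (names, matrix) with matrix[i][j] the distance between
    names[i] and names[j]; names are sorted for a stable order."""
    names = sorted(profiles)
    matrix = [
        [mismatch_distance(profiles[p], profiles[q]) for q in names]
        for p in names
    ]
    return names, matrix


# Toy presence (1) / absence (0) profiles for four hypothetical species.
toy_profiles = {
    "species_A": [1, 1, 1, 0],
    "species_B": [1, 1, 0, 0],
    "species_C": [0, 0, 0, 0],
    "species_D": [0, 0, 0, 1],
}
names, matrix = distance_matrix(toy_profiles)
for name, row in zip(names, matrix):
    print(name, row)
```

Species that share the same pattern of present and absent genes end up close together regardless of their 16S rRNA relatedness, which is the property the phylogenomic trees rely on.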

ORIGINS OF THE TOXINS AND TA MODULES
A literature analysis allowed us to determine which of the toxins were transported to the bacterium by various mobile elements. Additionally, using phylogenies, we were able to determine whether toxins and TA modules were acquired by horizontal gene transfer (HGT) for each of the bacteria in our study. A tBLASTN query was performed for each toxin, and hits were defined based on an e-value threshold of 10e-5 with >30% identity and at least 70% coverage. Alignments were performed with ClustalX2 (Larkin et al., 2007), and trees were constructed using Mega4 (Tamura et al., 2007) via three methods: NJ, MP, and ML.

PRINCIPAL COMPONENTS ANALYSES
A first principal components analysis (PCA) was performed, including genome size, GC%, number of ORFs, protein toxins, restriction enzymes, and TA modules. The second PCA included protein toxins and TA modules, taking into consideration the lifestyle of the toxins (intracellular or extracellular). The PCAs were performed using R software, version 2.11.0 (http://www.r-project.org/).
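
As an illustration of what a PCA on such features computes, the sketch below standardizes each variable, builds the correlation matrix, and extracts the first principal component by power iteration. It is written in plain Python rather than R, is not the authors' script, and uses made-up numbers; whether the original analysis scaled the variables is not stated, so the standardization step is an assumption.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(rows)
    scaled_columns = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        if sd == 0:
            raise ValueError("a constant column cannot be standardized")
        scaled_columns.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, m = len(z), len(z[0])
    return [
        [sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def first_component(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix,
    obtained by power iteration from an uneven starting vector."""
    m = len(matrix)
    vec = [1.0 / (i + 1) for i in range(m)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("power iteration collapsed to the zero vector")
        vec = [x / norm for x in nxt]
    product = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


# Made-up data: five samples (rows) by three features (columns).
toy_rows = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1], [5, 10, 2]]
z = standardize(toy_rows)
eigenvalue, loadings = first_component(correlation_matrix(z))
scores = [sum(v * w for v, w in zip(row, loadings)) for row in z]
print("share of variance on PC1:", eigenvalue / len(loadings))
print("loadings:", loadings)
print("PC1 scores:", scores)
```

Features with large loadings of the same sign on a component vary together across the samples, which is how a plot of the first components can separate two groups of genomes.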

RESULTS
Confirming our previous study (Georgiades and Raoult, 2011b), our current results reveal significant differences in genome size and the number of ORFs between the "bad bugs" and "controls." Indeed, "bad bugs" have smaller genomes and fewer ORFs than "controls" (Table 1; Figures A1A,B in Appendix). However, while most "bad bugs" have a lower GC%, the differences are not statistically significant (Table 1; Figure A1C in Appendix). Furthermore, only five bad bugs have more restriction enzymes than their controls (M. tuberculosis, C. diphtheriae, S. pneumoniae, S. typhi, and N. meningitidis), and the differences are not statistically significant (p = 0.2857; Table 1; Figure A1D and Table A1 in Appendix). The phylogenomic tree for restriction enzymes displays a small cluster containing six species that do not encode restriction enzymes, of which four are "bad bugs" (Y. pestis, B. pertussis, T. pallidum, and R. prowazekii; Figure A2 in Appendix). The phylogenetic tree for type I restriction enzymes displays a cluster containing five "controls" (S. suis, V. parahaemolyticus, M. smegmatis, E. coli HS, and T. denticola) and only one "bad bug" (S. pneumoniae). The rest of the trees generally present topologies that are different from the topologies given by a 16S rRNA phylogenetic tree, with closely related species not clustering together. This result holds true for all of the methods used in this study (NJ, MP, and ML; Figure A3 in Appendix).
Concerning the protein toxins, all of the "bad bugs" contain more toxins than their controls, and the difference is statistically significant (p = 0.0002; Table 1; Figure A1E and Table A2 in Appendix). The phylogenomic tree of all toxins includes a cluster containing 12 species, of which nine are "controls" missing toxins (T. denticola, S. haemolyticus, V. parahaemolyticus, E. coli HS, S. schwarzengrund, C. glutamicum, R. africae, M. smegmatis, and M. avium). Moreover, a cluster of seven "bad bugs" containing toxins is also present. "Bad bugs" or "controls" tend to cluster together and not with their phylogenetically closest relatives (Figure A4 in Appendix). Finally, we searched the 28 genomes for members of the seven known TA modules and found that eight "bad bugs" (M. tuberculosis, Y. pestis, S. pneumoniae, S. pyogenes, S. typhi, S. dysenteriae, V. cholerae, and N. meningitidis) contain significantly more TA modules than their controls (p = 0.0445; Tables 1 and 2, Figure A1F in Appendix). Our results agree with the findings of previous studies (Pandey and Gerdes, 2005; Goulard et al., 2010; Georgiades and Raoult, 2011b) and the data in the toxin-antitoxin database (TADB; Shao et al., 2011).
We constructed a phylogenomic tree based on the presence/absence of toxins and TA modules that presents three clusters. One of these clusters contains seven "controls" with no toxins or TA modules (M. avium, R. africae, C. glutamicum, S. schwarzengrund, E. coli, V. parahaemolyticus, and S. haemolyticus), whereas "bad bugs" that encode many toxic elements are grouped together (M. tuberculosis, Y. pestis, S. pneumoniae, S. pyogenes, S. typhi, S. dysenteriae, V. cholerae, S. aureus, and N. meningitidis; Figure 2).

Frontiers in Cellular and Infection Microbiology
www.frontiersin.org

Table 1 | Genomic characteristics studied for each of the 28 bacterial species. "Bad bugs" are in red and "controls" in blue.

We also searched for possible HGT events by examining the genomic origins of the toxin and TA module sequences present in the genomes of the bacteria of interest. We did not find strong evidence for such events, except in three cases: cholera toxin seems to have been gained by V. cholerae from E. coli, and the Zeta toxin of N. cinerea and Shiga toxin also grouped with E. coli strains (the commensal strain O8 IAI1 and strain O103, respectively). As for TA modules, they appear to have different origins; HigA/B were likely transferred by Firmicutes, MazE/F likely came from different proteobacteria, RelB/E may have been transferred by Gammaproteobacteria, and VapB/C were most likely transferred from Betaproteobacteria (Figure A5 in Appendix).

Our initial PCA revealed that other than TA modules, only toxins appear to be correlated with pathogenicity (Figure 3). "Bad bugs" and "controls" are separated on the analysis plot when these two features are taken into consideration. Furthermore, as discovered in the second PCA, when we considered the lifestyles of the toxins (i.e., whether they are intracellular or extracellular), the intracellular toxins, including TA modules, are most related to the pathogenic capacity of the bacteria studied and not the extracellular toxins, as would be expected (Figure 4). Other than TA modules, which are intracellular toxins, the toxins of Salmonella, Shigella, V. cholerae, Yersinia, and B. pertussis are also intracellular. This is why their pathogenic capacities were so difficult to understand and why the TA modules were never considered important for bacterial pathogenesis.

DISCUSSION
In this study, we investigated whether restriction enzymes and protein toxins are the only features, other than TA modules (Georgiades and Raoult, 2011b), that can be considered natural virulence factors for bacteria. To maintain an unbiased analysis, we assessed the 14 most dangerous epidemic bacteria for humans and compared them to their closest non-epidemic related species.
Table 2 | Number of TA modules in each of the TA families. "Bad bugs" are in red and "controls" in blue. Asterisk shows bad bugs with more TA than their controls.

In a previous study, we demonstrated that 12 epidemic bacteria have reduced-size genomes and more selfish elements, i.e., TA modules, which have evolved such that the host cells become addicted to them (Lawrence, 1999b; Makarova et al., 2009; Georgiades and Raoult, 2011b). Other studies also report a high number of TA modules in pathogenic bacteria, such as Y. pestis (Goulard et al., 2010), but their role in pathogenesis is not evident and was never considered previously. TA modules are mostly encoded on plasmids or within prophages and were initially identified as plasmid stabilization factors. When they were also found in multiple copies on chromosomes, it was hypothesized that they might also have a role in the stabilization of integrons in bacterial chromosomes (Szekeres et al., 2007). Other systems that possess properties of selfish elements are restriction enzymes,
which may also constitute a toxic danger for bacterial host cells. It has been demonstrated that restriction enzymes can be toxic for mammalian cells by promoting DNA mutations (Kinashi et al., 1993; Price et al., 1995). However, our study did not reveal a significant difference in the numbers of restriction enzymes in "bad bugs."
Since Diphtheria toxin was isolated by Roux and Yersin in 1888, bacterial toxins have been recognized as the primary virulence factors for pathogenic bacteria. These toxins are defined as "soluble substances that alter the normal metabolism of host cells with deleterious effects on the host," are considered to be among the most powerful human poisons known, and can retain high activity when very dilute. Indeed, the major symptoms associated with diseases caused by C. diphtheriae, B. pertussis, V. cholerae, Bacillus anthracis, Clostridium botulinum, and enterohemorrhagic E. coli are all related to the activities of the toxins produced by these organisms (Alouf, 2000; Merrell and Falkow, 2004). Thus, we hypothesize that toxins have a direct measurable effect on cells. Injecting toxins into an animal or a cell leads to its death, whereas administration of antibodies against the toxin in question offers protection against future infections. This is what constitutes the principle of vaccinations against Diphtheria and Tetanus and the principle of preventive passive immunization (Stiehm, 1998). Recent experimental and cellular models testing different individual genes from bacterial genomic repertoires demonstrate that some glycoproteins have similar toxic roles (Merrell and Falkow, 2004). These endotoxins may be released in a soluble form by young cultures grown in the laboratory; however, they act in a way that does not reflect bacterial virulence in experimental models (Wesselink et al., 1978). For the most part, endotoxins remain associated with the cell wall until disintegration of the organisms.
In vivo, this is the result of autolysis, external lysis mediated by complement and lysozyme, and phagocytic digestion of bacterial cells. Both the toxic component of LPS and the immunogenic portion of LPS act as determinants of pathogenicity for bacteria (Merrell and Falkow, 2004). Furthermore, bacterial pathogenicity in humans is also associated with the epidemic capacity of the bacteria, about which there are few hypotheses, except in the case of vector-borne diseases (Gubler, 1997), whose multiplication in the blood and levels of bacterial load are critical. Such a considerable multiplication causes deleterious effects to the host that may lead to its death, and events such as high multiplication result in a de-regulation associated with the disappearance of transcription regulators observed in pathogenic species (Merhej et al., 2009; Georgiades and Raoult, 2011b). This likely causes the persistence of multiplication in the blood, even if nutrients decrease, and the appearance of a less favorable condition that breaks the balance with the host.
In our study, we found significantly elevated numbers of protein toxins in all of the "bad bugs," while most of the "controls" only possess two or fewer toxins. The statistical PCA, including genome size, GC%, TA modules, restriction enzymes, and toxins, revealed that other than TA modules, toxins are the only proteins linked to the pathogenicity of bacteria (Figure 3). Moreover, intracellular toxins appear to be more related to the pathogenicity of bacteria than secreted toxins (Figure 4). Of course, this hypothesis needs to be confirmed by experiments, and possible mechanisms should be proposed. Phylogenomic trees, constructed based on the presence/absence of restriction enzyme genes, toxins, and TA modules, display different topologies compared to 16S rRNA phylogenetic trees, meaning that closely related species are not found in sister taxa. "Bad bugs" or "controls" tend to cluster together rather than with their phylogenetic neighbors. This demonstrates that specialized epidemic bacteria have congruent evolutionary histories, resulting in a virulent gene repertoire defined by both present

FIGURE 2 | Phylogenomic tree based on presence/absence of toxin-antitoxin modules and toxins. Closely related species do not cluster together while totally unrelated species do, meaning that highly divergent species may have common evolutionary histories. "Bad bugs" are in red, and "controls" are in blue.

FIGURE 4 | Principal components analysis. Intracellular toxins and TA modules characterize epidemic bacteria. "Bad bugs" are in red, and "controls" are in blue.
FIGURE 5 | The "bad bug" creation scenario. Initially, bacteria contain metabolic functions and recombination machinery. TA modules are gained by gene transfer, and toxin genes arrive in bacterial genomes by various mobile elements, such as plasmids. After specialization, the bacterial recombination systems are degraded, and metabolic functions are lost. "Bad bugs" are characterized by toxins and TA modules that stabilize neighboring genes and limit massive gene loss.

Appendix). Furthermore, most toxins arrive in bacterial genomes by various mobile elements, especially bacteriophages. This is the case for the streptococcal and staphylococcal toxins, Cholera toxin, Shiga toxin, and E. coli Enterotoxins. Modifications of bacterial virulence by bacteriophages were initially brought to light in 1951 (Freeman, 1951). This first demonstration showed that avirulent strains of C. diphtheriae infected with a bacteriophage yielded virulent lysogens producing Diphtheria toxin (Freeman, 1951). The outbreak of bloody diarrhea and hemolytic uremic syndrome (HUS) due to an E. coli O104:H4 strain in Germany in May and June 2011 illustrates the capacity of bacterial species to produce new combinations of genes, leading to the emergence of highly aggressive strains. The O104:H4 strain acquired the phage-mediated Shiga toxin and resistance to numerous antibiotics. However, antibiotic selective pressure has nothing to do with specialization (Werner et al., 2004; Furuya and Lowy, 2006). As in the HUS case, antibiotics are not always indicated in the treatment of the human disease because they may worsen the symptoms when the toxin is released (Denamur, 2011).
In conclusion, the only truly identifiable phenomena witnessing a convergent evolution in the most pathogenic bacteria for humans are the loss of metabolic activities (Georgiades and Raoult, 2011b), which occurs via the loss of regulatory and transcription factors, and the acquisition of protein toxins, alone or coupled as TA modules (Figure 5). For example, in the case of Shigella, the loss of the lysine decarboxylase activity is a major event in the acquisition of its virulence capacity (Maurelli et al., 1998). TA modules stabilize and protect neighboring genes, which explains why massive gene loss seems to be reduced in some specialized bacteria, while in others that do not contain TA modules (e.g., R. prowazekii), gene loss is much more extreme and ongoing.