Comparative and Functional Genomics of Legionella Identified Eukaryotic Like Proteins as Key Players in Host–Pathogen Interactions

Although best known for their ability to cause severe pneumonia in people whose immune defenses are weakened, Legionella pneumophila and Legionella longbeachae are two species of a large genus of bacteria that are ubiquitous in nature, where they parasitize protozoa. Adaptation to the host environment and exploitation of host cell functions are critical for the success of these intracellular pathogens. The establishment and publication of the complete genome sequences of L. pneumophila and L. longbeachae isolates paved the way for major breakthroughs in understanding the biology of these organisms. In this review we present the knowledge gained from the analysis and comparison of the complete genome sequences of different L. pneumophila and L. longbeachae strains. Emphasis is placed on putative virulence and Legionella life cycle related functions, such as the identification of an extended array of eukaryotic like proteins, many of which have been shown to modulate host cell functions to the pathogen’s advantage. Surprisingly, many of the eukaryotic domain proteins identified in L. pneumophila, as well as many substrates of the Dot/Icm type IV secretion system essential for intracellular replication, differ between these two species, although they cause the same disease. Finally, evolutionary aspects regarding the eukaryotic like proteins in Legionella are discussed.


INTRODUCTION
Genomics has the potential to provide an in-depth understanding of the genetics, biochemistry, physiology, and pathogenesis of a microorganism. Furthermore, comparative genomics, functional genomics, and related technologies are helping to unravel the molecular basis of the pathogenesis, evolution, and phenotypic differences among different species, strains, or clones and to uncover potential virulence genes. Knowledge of the genomes provides the basis for the application of powerful new approaches to understanding the biology of the organisms studied.
Although Legionella are mainly environmental bacteria, several species are pathogenic to humans, in particular Legionella pneumophila (Mcdade et al., 1977) and Legionella longbeachae (Mckinney et al., 1981). Legionnaires' disease emerged in the second half of the twentieth century, partly due to human alterations of the environment. The development in recent decades of artificial water systems such as air conditioning systems, cooling towers, showers, and other aerosolizing devices has allowed Legionella to gain access to the human respiratory system. When inhaled in contaminated aerosols, pathogenic Legionella can reach the alveoli of the lung, where they are subsequently engulfed by macrophages. In contrast to most bacteria, which are destroyed, some Legionella species can multiply within the phagosome and eventually kill the macrophage, resulting in a severe, often fatal pneumonia called legionellosis or Legionnaires' disease (mortality rate of 5-20%; up to 50% in nosocomial infections; Steinert et al., 2002; Marrie, 2008; Whiley and Bentham, 2011). To replicate intracellularly, L. pneumophila manipulates host cellular processes using bacterial proteins that are delivered into the cytosolic compartment of the host cell by a specialized type IV secretion system called Dot/Icm. The proteins delivered by the Dot/Icm system target host factors implicated in controlling membrane transport in eukaryotic cells, which enables L. pneumophila to create an endoplasmic reticulum-like vacuole that supports intracellular replication in both protozoan and mammalian host cells (for a review see Hubber and Roy, 2010).
An interesting epidemiological observation is that, among the over 50 Legionella species described today, strains belonging to the species L. pneumophila are responsible for over 90% of the legionellosis cases worldwide and strains belonging to the species L. longbeachae are responsible for about 5% (Yu et al., 2002). Surprisingly, this distribution is very different in Australia and New Zealand, where L. pneumophila accounts for "only" 45.7% of the cases but L. longbeachae is implicated in 30.4% of the human cases. Furthermore, among the strains causing Legionnaires' disease, L. pneumophila serogroup 1 (Sg1) alone is responsible for about 84% of cases (Yu et al., 2002; Doleans et al., 2004) despite the description of 15 different serogroups (Sg) within this species. In addition, the characterization of over 400 different L. pneumophila Sg1 strains has shown that only a minority among these is responsible for causing most of the human disease (Edelstein and Metlay, 2009). Some of these clones are distributed worldwide, like L. pneumophila strain Paris (Cazalet et al., 2008); others have a more restricted geographical distribution, like the recently described endemic clone prevalent in Ontario, Canada (Tijet et al., 2010). For the species L. longbeachae two serogroups are described to date (Bibb et al., 1981; Mckinney et al., 1981). L. longbeachae Sg1 is predominant in human disease as it causes up to 95% of the L. longbeachae cases of legionellosis worldwide and most outbreaks and sporadic cases in Australia (Anonymous, 1997; Montanaro-Punzengruber et al., 1999). The two main human pathogenic Legionella species, L. pneumophila and L. longbeachae, cause the same disease and symptoms in humans (Amodeo et al., 2009); however, there are major differences between the two species in niche adaptation and host susceptibility.
(i) They are found in different environmental niches: L. pneumophila is mainly found in natural and artificial water circuits, whereas L. longbeachae is principally found in soil and is therefore associated with gardening and use of potting compost (O'Connor et al., 2007). However, although less common, the isolation of L. pneumophila from potting soil in Europe has also been reported (Casati et al., 2009; Velonakis et al., 2009). Human infection due to L. longbeachae is particularly common in Australia, but cases have also been documented in other countries such as the USA, Japan, Spain, England, and Germany (MMWR, 2000; Garcia et al., 2004; Kubota et al., 2007; Kumpers et al., 2008; Pravinkumar et al., 2010). (ii) As described for other Legionella species, person to person transmission of L. longbeachae has not been documented; the primary transmission mode seems to be inhalation of dust from contaminated compost or soil that contains the organism (Steele et al., 1990; MMWR, 2000; O'Connor et al., 2007). (iii) Furthermore, for L. pneumophila a biphasic life cycle was observed in vitro and in vivo, as exponential phase bacteria do not express virulence factors and are unable to replicate intracellularly. The ability of L. pneumophila to replicate intracellularly is triggered at the post-exponential phase by a complex regulatory cascade (Molofsky and Swanson, 2004; Sahr et al., 2009). In contrast, less is known about the L. longbeachae intracellular life cycle and its virulence factors. It was recently shown that, unlike for L. pneumophila, the ability of L. longbeachae to replicate intracellularly is independent of the bacterial growth phase and that phagosome biogenesis is different. Like that of L. pneumophila, the L. longbeachae phagosome is surrounded by endoplasmic reticulum and does not mature to a phagolysosome; however, it acquires early and late endosomal markers. (iv) Another interesting difference between these two species is their ability to colonize the lungs of mice. While only A/J mice are permissive for replication of L. pneumophila, A/J, C57BL/6, and BALB/c mice are all permissive for replication of L. longbeachae (Gobin et al., 2009). Resistance of C57BL/6 and BALB/c mice to L. pneumophila has been attributed to polymorphisms in the Nod-like receptor apoptosis inhibitory protein 5 (naip5) allele, whose product recognizes the C-terminus of flagellin (Wright et al., 2003; Molofsky et al., 2006; Ren et al., 2006; Lightfield et al., 2008). The current model is that L. pneumophila replication is restricted due to flagellin dependent caspase-1 activation through Naip5-Ipaf and early macrophage cell death by pyroptosis. However, although depletion or inhibition of caspase-1 activity leads to decreased targeting of bacteria to lysosomes, the mechanism of caspase-1-dependent restriction of L. pneumophila replication in macrophages and in vivo is not fully understood (Schuelein et al., 2011).
In recent years, the genomes of six different L. pneumophila strains (Paris, Lens, Philadelphia, Corby, Alcoy, and 130b) have been published (Cazalet et al., 2004; Chien et al., 2004; Steinert et al., 2007; D'Auria et al., 2010; Schroeder et al., 2010). The genome sequences of all but strain 130b were completely finished. Furthermore, the sequencing and analysis of four genomes of L. longbeachae have been carried out recently (Cazalet et al., 2010). L. longbeachae strain NSW150 of Sg1, isolated in Australia from a patient, was sequenced completely, and for the remaining three strains (ATCC33462, Sg1 isolated from a human lung; C-4E7 and 98072, both of Sg2 isolated from patients) a draft genome sequence was reported. A fifth L. longbeachae strain (D-4968 of Sg1, isolated in the US from a patient) was recently sequenced, and the analysis of the genome sequence, assembled into 89 contigs, was reported (Kozak et al., 2010).
Here we will describe what we learned from the analysis and comparison of the sequenced Legionella strains. We will discuss their general characteristics and then highlight the specific features or common traits with respect to the different ecological niches and the differences in host susceptibility of these two Legionella species. Emphasis will be put on putative virulence and Legionella life cycle related functions. In the last part we will analyze and discuss the possible evolution of the identified virulence factors. Finally, future perspectives in Legionella genomics are presented.

GENERAL FEATURES OF THE L. PNEUMOPHILA AND L. LONGBEACHAE GENOMES
Legionella pneumophila and L. longbeachae each have a single, circular chromosome with a size of 3.3-3.5 megabases (Mb) for L. pneumophila and 3.9-4.1 Mb for L. longbeachae. For both, the average G + C content is 38% (Tables 1A,B). The L. pneumophila strains Paris and Lens each contain a different plasmid, 131.9 kb and 59.8 kb in size, respectively. In strains Philadelphia-1, 130b, Alcoy, and Corby no plasmid was identified. The L. longbeachae strains NSW150 and D-4968 carry highly similar plasmids of about 70 kb with a DNA identity of 99%; strains C-4E7 and 98072 each also contain a highly similar plasmid of 133.8 kb in size. Thus similar plasmids circulate among L. longbeachae strains, but they seem to be different from those found in L. pneumophila.
A total of ∼3000 and ∼3500 protein-encoding genes are predicted in the L. pneumophila and L. longbeachae genomes, respectively. No function could be predicted for about 40% of these genes, and about 20% are unique to the genus Legionella. Comparative analysis of the genome structure of the L. pneumophila genomes showed high colinearity, with only a few translocations, duplications, deletions, or inversions (Figures 1A,B), and identified between 6 and 11% of genes as specific to each L. pneumophila strain. Principally, the genomes contain three large plasticity zones where the synteny is disrupted: a 260-kb inversion in strain Lens with respect to strains Paris and Philadelphia-1, a 130-kb fragment that is inserted in a different genomic location in strains Paris and Philadelphia-1, and the chromosomal region of about 50 kb carrying the Lvh type IV secretion system, previously described in strain Philadelphia-1 (Segal et al., 1999). Furthermore, deletions and insertions of several smaller regions were identified in each strain, as well as regions with variable gene content. In contrast, comparison of the completed chromosome sequences of L. pneumophila and L. longbeachae shows that the two Legionella species have a significantly different genome organization (Figure 1C). Moreover, only about 65% of the L. longbeachae genes are orthologous to L. pneumophila genes, whereas about 34% of all genes are specific to L. longbeachae with respect to L. pneumophila Paris, Lens, Philadelphia, and Corby (defined by less than 30% amino acid identity over 80% of the length of the smallest protein). Analysis of single nucleotide polymorphisms (SNP) revealed a very low SNP number of less than 0.4% among the four L. longbeachae genomes, which is significantly lower than the polymorphism of about 2% between L. pneumophila Sg1 strains Paris and Philadelphia (Table 1B). Comparison of the two L. longbeachae Sg1 genomes (NSW150, ATCC33462) identified 1611 SNPs, of which 1426 are located in only seven chromosomal regions mainly encoding putative mobile elements, whereas the remaining 185 SNPs were evenly distributed around the chromosome. A similar number of about 1900 SNPs was identified when comparing strain NSW150 to strain D-4968 (Table 1B). In contrast, the SNP number between two strains of different Sg was higher, with about 16000 SNPs present between Sg1 and Sg2 strains (Table 1B). This low SNP number and the relatively homogeneous distribution of the SNPs around the chromosome suggest recent expansion of the species L. longbeachae (Cazalet et al., 2010). The sequences and their analysis are accessible at http://genolist.pasteur.fr/LegioList/.
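The species-specific gene criterion quoted above (less than 30% amino acid identity over 80% of the length of the smallest protein) can be sketched as a simple filter on best pairwise hits. This is only an illustration of how such a cut-off might be applied; the `BestHit` record, the function names, and the example values are assumptions made here and are not taken from the original analyses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestHit:
    """Best protein-level alignment of one gene against another genome.

    Hypothetical record type; the fields mirror what a generic
    pairwise alignment report would provide.
    """
    identity_pct: float   # percent amino acid identity in the aligned region
    aligned_len: int      # number of aligned residues
    query_len: int        # length of the query protein (aa)
    subject_len: int      # length of the matched protein (aa)


def passes_threshold(hit: BestHit,
                     min_identity: float = 30.0,
                     min_coverage: float = 0.80) -> bool:
    """True if the hit reaches both the identity and the coverage cut-off.

    Coverage is measured against the shorter of the two proteins.
    """
    shortest = min(hit.query_len, hit.subject_len)
    coverage = hit.aligned_len / shortest
    return hit.identity_pct >= min_identity and coverage >= min_coverage


def is_species_specific(best_hit: Optional[BestHit]) -> bool:
    """A gene counts as specific when no hit passes the threshold."""
    return best_hit is None or not passes_threshold(best_hit)


# Illustrative values only, not taken from the genomes discussed here.
print(is_species_specific(BestHit(45.0, 280, 300, 320)))  # well-conserved gene
print(is_species_specific(BestHit(25.0, 290, 300, 300)))  # identity too low
print(is_species_specific(None))                          # no hit at all
```

Measuring coverage against the shorter protein is one reading of "the length of the smallest protein"; other conventions would shift the resulting percentages slightly.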
To investigate the phylogenetic relationship among the L. pneumophila and L. longbeachae strains, we used the nucleotide sequence of recN (recombination and repair protein-encoding gene), aligned based on the protein alignment. Based on an analysis of 32 protein-encoding genes widely distributed among bacterial genomes, recN was described as the gene with the greatest potential for predicting genome relatedness at the genus or subgenus level (Zeigler, 2003). As depicted in Figure 2, the four L. pneumophila strains are very closely related, and L. longbeachae is clearly more distant.
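As a rough illustration of how relatedness can be read from a single-gene alignment such as recN, the sketch below computes the uncorrected pairwise distance (the proportion of differing sites) between aligned nucleotide sequences. It is not the method used to build Figure 2, which is not specified in the text; the function names and the toy sequences are assumptions for illustration only.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances for every pair of named aligned sequences."""
    names = sorted(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x in names for y in names}


# Made-up toy alignment; these are NOT real recN sequences.
toy = {
    "strain_A": "ATGACCGATTTA",
    "strain_B": "ATGACCGATCTA",
    "outgroup": "ATGTCAGACCTG",
}
for pair, dist in sorted(distance_matrix(toy).items()):
    print(pair, round(dist, 3))
```

Closely related strains give distances near zero, whereas a more distant species gives a visibly larger value; a tree-building method would normally be applied on top of such a matrix.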
Figure 1. Inversions between the genomic sequences are represented in blue. Genome-wide synteny is disrupted by a 260 kb inversion (blue) and a 130 kb plasticity zone between L. pneumophila strains Paris and Lens. In contrast, synteny between L. pneumophila and L. longbeachae is not conserved.

DIVERSITY IN SECRETION SYSTEMS AND THEIR SUBSTRATES MAY CONTRIBUTE TO DIFFERENCES IN INTRACELLULAR TRAFFICKING AND NICHE ADAPTATION
The capacity of pathogens like Legionella to infect eukaryotic cells is intimately linked to their ability to manipulate host cell functions to establish an intracellular niche for replication. Essential for the ability of Legionella to subvert host functions are its different secretion systems. The two major ones known to be involved in virulence of L. pneumophila are the Dot/Icm type IV secretion system (T4BSS) and the Lsp type II secretion system (T2SS; Marra et al., 1992; Berger and Isberg, 1993; Rossier and Cianciotto, 2001).
For L. pneumophila, type II protein secretion is critical for infection of amebae, macrophages, and mice. Analyses of the L. longbeachae genome sequences showed that it contains all genes needed to encode a functional Lsp type II secretion machinery (Cazalet et al., 2010; Kozak et al., 2010). Several studies, including the analysis of the L. pneumophila type II secretome, indicated that L. pneumophila encodes at least 25 type II secreted substrates (Debroy et al., 2006; Cianciotto, 2009). Although this experimentally defined repertoire of type II secretion-dependent proteins is the largest known in bacteria, it may contain even more than 60 proteins, as 35 additional proteins with a signal sequence were identified by in silico analyses (Cianciotto, 2009). A search for homologs of these substrates in the L. longbeachae genome sequences revealed that 9 (36%) of the 25 type II secretion system substrates described for L. pneumophila are absent from L. longbeachae (Table 2). For example, the phospholipase C encoded by plcA and the chiA-encoded chitinase, which was shown to promote L. pneumophila persistence in the lungs of A/J mice, are not present in L. longbeachae (Debroy et al., 2006). Thus over a third of the T2SS substrates seem to differ between L. pneumophila and L. longbeachae, a feature probably related to the different ecological niches occupied, but also to different virulence properties in the hosts.
Indispensable for replication of L. pneumophila in eukaryotic host cells is the Dot/Icm T4SS (Nagai and Kubori, 2011), which translocates a large repertoire of bacterial effectors into the host cell. These effectors modulate multiple host cell processes and, in particular, redirect trafficking of the L. pneumophila phagosome and mediate its conversion into an ER-derived organelle competent for intracellular bacterial replication (Shin and Roy, 2008; Cianciotto, 2009). The Dot/Icm system is conserved in L. longbeachae with a similar gene organization and protein identities of 47-92% with respect to L. pneumophila (Figure 3). This is similar to what has been reported previously for other Legionella species. The only major differences identified are that in L. longbeachae the icmR gene is replaced by the ligB gene, although the encoded proteins have been shown to perform similar functions (Feldman and Segal, 2004; Feldman et al., 2005), and that the DotG/IcmE protein of L. longbeachae (1525 aa) is 477 amino acids larger than that of L. pneumophila (1048 aa; Cazalet et al., 2010). DotG of L. pneumophila is part of the core transmembrane complex of the secretion system and is composed of three domains: a transmembrane N-terminal domain, a central region composed of 42 repeats of 10 amino acids, and a C-terminal region homologous to VirB10. In contrast, the central region of L. longbeachae DotG is composed of approximately 90 repeats. Among the many VirB10 homologs present in bacteria, the Coxiella DotG and the Helicobacter pylori Cag7 are the only ones that also have multiple repeats of 10 aa. It will be challenging to understand the impact of this modification on the function of the type IV secretion system. An L. longbeachae T4SS mutant obtained by deleting the dotA gene is strongly attenuated for intracellular growth in Acanthamoeba castellanii and human macrophages (Cazalet et al., 2010, and unpublished data), is outcompeted by the wild type strain 24 and 72 h after infection of lungs of A/J mice, and is also dramatically attenuated for replication in lungs of A/J mice upon single infections (Cazalet et al., 2010). Thus, similar to what is seen for L. pneumophila, the Dot/Icm T4SS of L. longbeachae is also central for its pathogenesis and its capacity to replicate in eukaryotic host cells.
This T4SS is crucial for intracellular replication of Legionella as it secretes an exceptionally large number of proteins into the host cell. Using different methods, 275 substrates have been shown to be translocated into the host cell in a Dot/Icm T4SS dependent manner (Campodonico et al., 2005; De Felipe et al., 2005, 2008; Shohdy et al., 2005; Burstein et al., 2009; Heidtman et al., 2009; Zhu et al., 2011). Table 3 lists the 275 Dot/Icm substrates identified in L. pneumophila strain Philadelphia and their distribution in the six L. pneumophila and five L. longbeachae genomes sequenced. Their conservation among different L. pneumophila strains is very high, as over 80% of the substrates are present in all L. pneumophila strains analyzed here. In contrast, the search for homologs of these L. pneumophila Dot/Icm substrates in L. longbeachae showed even more pronounced differences than in the repertoire of type II secreted substrates. Only 98 of these 275 L. pneumophila Dot/Icm substrates have homologs in the L. longbeachae genomes (Table 3). However, the repertoire of L. longbeachae substrates also seems to be quite large, as a search for proteins that encode eukaryotic like domains and contain the secretion signal described by Nagai et al. (2005) and the additional criteria defined by Kubori et al. (2008) predicted 51 putative Dot/Icm substrates specific for L. longbeachae NSW150 (Cazalet et al., 2010), indicating that over 140 proteins might be secreted by the Dot/Icm T4SS of L. longbeachae. A similar number of L. longbeachae specific putative eukaryotic like proteins and effectors was predicted for strain D-4968 (Kozak et al., 2010).
Examples of effector proteins conserved between the two species are RalF, VipA, VipF, SidC, SidE, SidJ, YlfA, LepA, and LepB, which contribute to trafficking or recruitment and retention of vesicles to L. pneumophila (Nagai et al., 2002; Chen et al., 2004; Luo and Isberg, 2004; Campodonico et al., 2005; Shohdy et al., 2005; Liu and Luo, 2007). It is interesting to note that homologs of SidM/DrrA and SidD are absent from L. longbeachae but a homolog of LepB is present. For L. pneumophila it was shown that SidM/DrrA, SidD, and LepB act in cooperation to manipulate Rab1 activity in the host cell. DrrA/SidM possesses three domains: an N-terminal AMP-transfer domain (AT), a guanine nucleotide exchange factor (GEF) domain in the central part, and a phosphatidylinositol-4-phosphate binding domain (P4M) in its C-terminal part. After association of DrrA/SidM with the membrane of the Legionella-containing vacuole (LCV) via P4M (Brombacher et al., 2009), it recruits Rab1 via the GEF domain and catalyzes the GDP-GTP exchange (Ingmundson et al., 2007; Machner and Isberg, 2007). Rab1 is then adenylylated (AMPylated) by the AT domain, leading to inhibition of GAP-catalyzed Rab1 deactivation (Müller et al., 2010). LepB cannot bind AMPylated Rab1 (Ingmundson et al., 2007). Recently it was shown that SidD deAMPylates Rab1 and enables LepB to bind Rab1 to promote its GTP-GDP exchange (Neunuebel et al., 2011; Tan and Luo, 2011). One might assume that other, not yet identified proteins of L. longbeachae perform the functions of DrrA/SidM and SidD. Another interesting observation is that all except four of the effector proteins of L. pneumophila that are conserved in L. longbeachae are also conserved in all sequenced L. pneumophila genomes (Table 3).
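The order of events in the Rab1 cycle described above can be summarized as a small state model. The sketch below is a deliberately simplified toy representation of the steps named in the text (GEF-mediated GDP-GTP exchange, AMPylation by the AT domain, deAMPylation by SidD, and LepB-mediated return to the GDP-bound state); the class and function names are assumptions introduced here, and the model ignores membrane recruitment, kinetics, and all other host factors.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rab1:
    """Minimal Rab1 state: bound nucleotide and AMPylation status."""
    nucleotide: str = "GDP"   # "GDP" (inactive) or "GTP" (active)
    ampylated: bool = False


def drra_gef(rab1: Rab1) -> Rab1:
    """GEF domain of DrrA/SidM: exchange GDP for GTP."""
    return replace(rab1, nucleotide="GTP")


def drra_at(rab1: Rab1) -> Rab1:
    """AT domain of DrrA/SidM: AMPylate Rab1.

    Assumption of this toy model: only GTP-bound Rab1 is modified.
    """
    if rab1.nucleotide == "GTP":
        return replace(rab1, ampylated=True)
    return rab1


def sidd(rab1: Rab1) -> Rab1:
    """SidD: remove the AMP moiety from Rab1."""
    return replace(rab1, ampylated=False)


def lepb(rab1: Rab1) -> Rab1:
    """LepB: return Rab1 to the GDP-bound state.

    LepB cannot bind AMPylated Rab1, so that state is left unchanged.
    """
    if rab1.ampylated or rab1.nucleotide != "GTP":
        return rab1
    return replace(rab1, nucleotide="GDP")


state = Rab1()
for step in (drra_gef, drra_at, lepb, sidd, lepb):
    state = step(state)
    print(f"{step.__name__:9s} -> {state}")
```

In this model the first call to `lepb` leaves Rab1 untouched because it is still AMPylated; only after `sidd` has acted can `lepb` return Rab1 to the GDP-bound form, which mirrors the cooperation between the three effectors described for L. pneumophila.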
Taken together, the T2SS Lsp and the T4SS Dot/Icm machineries are highly conserved between L. pneumophila and L. longbeachae. However, more than a third of the known L. pneumophila type II-dependent substrates and nearly two thirds (177 of 275) of the type IV-dependent substrates differ between the two species. These species specific, secreted effectors might be implicated in the different niche adaptations and host susceptibilities. Most interestingly, of the 98 L. pneumophila substrates conserved in L. longbeachae, 87 are also present in all L. pneumophila strains sequenced to date. Thus, these 87 Dot/Icm substrates might be essential for intracellular replication of Legionella and represent a minimal toolkit for intracellular replication that was acquired before the divergence of the two species.
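Taking the counts quoted in this section at face value, the proportions of L. pneumophila substrates without a homolog in L. longbeachae can be recomputed in a few lines. The sketch below is plain arithmetic on the numbers given in the text (25 type II substrates of which 9 are absent; 275 Dot/Icm substrates of which 98 are conserved; 51 predicted L. longbeachae specific substrates); the variable and function names are chosen here for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# Counts as quoted in the text.
t2ss_total = 25          # type II substrates described for L. pneumophila
t2ss_absent = 9          # of these, absent from L. longbeachae
t4ss_total = 275         # Dot/Icm substrates of L. pneumophila
t4ss_conserved = 98      # of these, with homologs in L. longbeachae
llo_predicted = 51       # predicted L. longbeachae specific substrates

t4ss_absent = t4ss_total - t4ss_conserved
llo_minimum_repertoire = t4ss_conserved + llo_predicted

print(f"T2SS substrates absent: {t2ss_absent}/{t2ss_total} "
      f"= {percent(t2ss_absent, t2ss_total):.1f}%")
print(f"T4SS substrates absent: {t4ss_absent}/{t4ss_total} "
      f"= {percent(t4ss_absent, t4ss_total):.1f}%")
print(f"Putative L. longbeachae Dot/Icm repertoire: "
      f"{llo_minimum_repertoire} proteins")
```

The last figure is simply the sum of conserved and predicted specific substrates, which is the basis for the estimate of over 140 secreted proteins given above.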

MOLECULAR MIMICRY IS A MAJOR VIRULENCE STRATEGY OF L. PNEUMOPHILA AND L. LONGBEACHAE
The L. pneumophila genome sequence analysis has revealed that many of the predicted or experimentally verified Dot/Icm secreted substrates are proteins similar to eukaryotic proteins or contain motifs mainly or only found in eukaryotic proteins (Cazalet et al., 2004; De Felipe et al., 2005). Thus comparative genomics suggested that L. pneumophila encodes specific virulence factors that have evolved during its co-evolution with eukaryotic host cells such as fresh-water amebae (Cazalet et al., 2004). The protein motifs predominantly found in eukaryotes that were identified in the L. pneumophila genomes are ankyrin repeats, SEL1 (TPR), SET domain, Sec7, serine/threonine kinase domains (STPK), U-box, and F-box motifs. Examples of eukaryotic like proteins of L. pneumophila are two secreted apyrases, a sphingosine-1-phosphate lyase and sphingosine kinase, eukaryotic like glycoamylase, cytokinin oxidase, zinc metalloprotease, or an RNA binding precursor (Cazalet et al., 2004; De Felipe et al., 2005; Bruggemann et al., 2006). Function prediction based on similarity searches suggested that many of these proteins are implicated in modulating host cell functions to the pathogen's advantage (Cazalet et al., 2004). Recent functional studies confirm these predictions. As a first example, it was shown that L. pneumophila is able to interfere with the host ubiquitination pathway. The L. pneumophila U-box containing protein LubX was shown to be a secreted effector of the Dot/Icm secretion system that mediates polyubiquitination of the host kinase Clk1 (Kubori et al., 2008). Recently, LubX was described as the first example of an effector protein that targets and regulates another effector within host cells, as it functions as an E3 ubiquitin ligase that hijacks the host proteasome to specifically target the bacterial effector protein SidH for degradation. Delayed delivery of LubX to the host cytoplasm leads to the shutdown of SidH within the host cells at later stages of infection. This demonstrates a sophisticated level of co-evolution between eukaryotic cells and L. pneumophila, involving an effector that functions as a key regulator to temporally coordinate the function of a cognate effector protein (Kubori et al., 2010; Luo, 2011). Furthermore, AnkB/Lpp2082, one of the three F-box proteins of L. pneumophila, was shown to be a T4SS effector that is implicated in virulence of L. pneumophila and in recruiting ubiquitinated proteins to the LCV (Al-Khodor et al., 2008; Price et al., 2009; Habyarimana et al., 2010; Lomma et al., 2010).

Frontiers in Microbiology | Cellular and Infection Microbiology
List of substrates is based on Isberg et al. (2009), De Felipe et al. (2008), and Ninio et al. (2009).
A second example is the apyrases (Lpg1905 and Lpg0971) encoded in the L. pneumophila genomes. Indeed, both are secreted enzymes important for intracellular replication of L. pneumophila. Lpg1905 is a novel prokaryotic ecto-NTPDase, similar to CD39/NTPDase1, which is characterized by the presence of five apyrase-conserved regions and enhances the replication of L. pneumophila in eukaryotic cells. Apart from ATP and ADP, Lpg1905 also cleaves GTP and GDP, with efficiencies similar to those for ATP and ADP, respectively (Sansom et al., 2008). A third example is an L. pneumophila homolog of the highly conserved eukaryotic enzyme sphingosine-1-phosphate lyase (Spl). In eukaryotes, SPL is an enzyme that catalyzes the irreversible cleavage of sphingosine-1-phosphate (S1P). S1P is implicated in various physiological processes such as cell survival, apoptosis, proliferation, migration, differentiation, platelet aggregation, angiogenesis, lymphocyte trafficking, and development. Although the function of the L. pneumophila Spl remains unknown, the hypothesis is that it plays a role in autophagy and/or apoptosis (Cazalet et al., 2004; Bruggemann et al., 2006). Recently it has been shown that the L. pneumophila Spl is a secreted effector of the Dot/Icm T4SS and that it is able to complement the sphingosine-sensitive phenotype of Saccharomyces cerevisiae. Moreover, L. pneumophila Spl co-localizes to the host cell mitochondria. Taken together, the many functional studies undertaken on the basis of the genome sequence analyses to decipher the roles of the eukaryotic like proteins have clearly established that these are secreted virulence factors involved in host cell adhesion, formation of the LCV, modulation of host cell functions, induction of apoptosis, and egress of Legionella (Nora et al., 2009; Hubber and Roy, 2010). Most of these effector proteins are expressed at different stages of the intracellular life cycle of L. pneumophila (Bruggemann et al., 2006) and are delivered to the host cell by the Dot/Icm T4SS. Thus molecular mimicry of eukaryotic proteins is a major virulence strategy of L. pneumophila.
As expected, eukaryotic like proteins and proteins encoding domains mainly found in eukaryotic proteins are also present in the L. longbeachae genomes. However, between the two species a considerable diversity in the repertoire of these proteins exists. For example Spl, LubX, the three L. pneumophila F-box proteins, and the homolog of one (Lpg1905) of the two apyrases are missing in all sequenced L. longbeachae genomes. In contrast, a glycoamylase (Herrmann et al., 2011) and a uridine kinase homolog are also present in L. longbeachae (Cazalet et al., 2010; Kozak et al., 2010; Table 3). However, other proteins encoded by the L. longbeachae genome contain U-box and F-box domains and might therefore fulfill similar functions. Thus, although the specific proteins may not be conserved, the eukaryotic like protein-protein interaction domains found in L. pneumophila are also present in L. longbeachae.
The differences in trafficking between L. longbeachae and L. pneumophila mentioned above might be related to specific effectors encoded by L. longbeachae. A search for such specific putative effectors of L. longbeachae identified several proteins that might contribute to these differences, like a family of Ras-related small GTPases (Cazalet et al., 2010; Kozak et al., 2010). These proteins may be involved in vesicular trafficking and thus may account at least partly for the specificities of the L. longbeachae life cycle. L. pneumophila is also known to exploit monophosphorylated host phosphoinositides (PI) to anchor the effector proteins SidC, SidM/DrrA, LpnE, and LidA to the membrane of the replication vacuole (Machner and Isberg, 2006; Murata et al., 2006; Weber et al., 2006, 2009; Newton et al., 2007; Brombacher et al., 2009). L. longbeachae may employ an additional strategy to interfere with the host PI, as a homolog of the mammalian PI metabolizing enzyme phosphatidylinositol-4-phosphate 5-kinase was identified in its genome. One could speculate that this protein allows direct modulation of the host cell PI levels.
Interestingly, although 23 of the 29 ankyrin proteins identified in the L. pneumophila strains are absent from the L. longbeachae genome, L. longbeachae encodes a total of 23 specific ankyrin repeat proteins (Table 3). For example, L. pneumophila AnkX/AnkN, which was shown to interfere with microtubule-dependent vesicular transport, is missing in L. longbeachae (Pan et al., 2008). However, L. longbeachae encodes a putative tubulin-tyrosine ligase (TTL). TTL catalyzes the ATP-dependent posttranslational addition of a tyrosine to the carboxy terminal end of detyrosinated alpha-tubulin. Although the exact physiological function of alpha-tubulin tyrosination has so far not been established, it has been linked to altered microtubule structure and function (Eiserich et al., 1999). Thus this protein might take over the function of AnkX/AnkN in L. longbeachae.
Legionella longbeachae is the first bacterial genome found to encode a protein containing a Src Homology 2 (SH2) domain. In eukaryotes, SH2 domains have regulatory functions in various intracellular signaling cascades. Furthermore, L. longbeachae encodes two proteins with pentatricopeptide repeat (PPR) domains. This family seems to be greatly expanded in plants, where PPR proteins appear to play essential roles in organellar RNA metabolism (Lurin et al., 2004; Nakamura et al., 2004; Schmitz-Linneweber and Small, 2008). Only 12 bacterial PPR domain proteins have been identified to date, all encoded by two species, the plant pathogen Ralstonia solanacearum and the facultative photosynthetic bacterium Rhodobacter sphaeroides. Thus, genome analysis revealed a particular feature of the Legionella genomes: the presence of many eukaryotic like proteins and protein domains, some of which are common to the two Legionella species, while others are specific and may thus account for the species specific features in intracellular trafficking and niche adaptation in the environment.

SURFACE STRUCTURES - A CLUE TO MOUSE SUSCEPTIBILITY TO INFECTION WITH LEGIONELLA
Despite the presence of many different species of Legionella in aquatic reservoirs, the vast majority of human disease is caused by a single serogroup (Sg) of a single species, namely L. pneumophila Sg1, which is responsible for about 84% of all cases worldwide (Yu et al., 2002). A similar pattern is seen for L. longbeachae: two serogroups have been described, but L. longbeachae Sg1 is predominant in human disease. Lipopolysaccharide (LPS) is the basis for the classification of serogroups, but it is also a major immunodominant antigen of L. pneumophila and L. longbeachae. Interestingly, it has also been shown that LPS-containing membrane vesicles shed by virulent L. pneumophila are sufficient to inhibit phagosome-lysosome fusion (Fernandez-Moreira et al., 2006). Results obtained from large-scale genome comparisons of L. pneumophila suggested that the LPS of Sg1 itself might be implicated in the predominance of Sg1 strains in human disease compared to other serogroups of L. pneumophila and other Legionella species (Cazalet et al., 2008). A comparative search for LPS coding regions in the genome of L. longbeachae NSW 150 identified two gene clusters encoding proteins that could be involved in the production of LPS and/or capsule. Neither shared homology with the L. pneumophila LPS biosynthesis gene cluster, suggesting considerable differences in this major immunodominant antigen between the two Legionella species. However, homologs of the L. pneumophila lipid A biosynthesis genes (LpxA, LpxB, LpxD, and WaaM) are present. Electron microscopy also demonstrated that, in contrast to L. pneumophila, L. longbeachae produces a capsule-like structure, suggesting that one of the aforementioned gene clusters encodes LPS and the other the capsule (Cazalet et al., 2010).
As mentioned in the introduction, only A/J mice are permissive for replication of L. pneumophila; in contrast, A/J, C57BL/6, and BALB/c mice are all permissive for replication of L. longbeachae. In C57BL/6 mice, cytosolic flagellin of L. pneumophila triggers Naip5-dependent caspase-1 activation and subsequent proinflammatory cell death by pyroptosis, rendering them resistant to infection (Diez et al., 2003; Wright et al., 2003; Molofsky et al., 2006; Ren et al., 2006; Zamboni et al., 2006; Lamkanfi et al., 2007; Lightfield et al., 2008). Genome analysis shed light on the reasons for these differences. L. longbeachae does not carry any flagellar biosynthesis genes except those encoding the sigma factor FliA, the regulator FleN, the two-component system FleR/FleS, and the flagellar basal body rod modification protein FlgD (Cazalet et al., 2010; Kozak et al., 2010). Analysis of the genome sequences of L. longbeachae strains D-4968, ATCC33642, 98072, and C-4E7, as well as a PCR-based screening of 50 L. longbeachae isolates belonging to both serogroups by Kozak et al. (2010) and of 15 additional isolates by Cazalet et al. (2010), did not detect flagellar genes in any isolate, confirming that L. longbeachae, in contrast to L. pneumophila, does not synthesize flagella. Interestingly, all genes bordering flagellar gene clusters are conserved between L. longbeachae and L. pneumophila, suggesting deletion of these regions from the L. longbeachae genome. This result suggests that L. longbeachae fails to activate caspase-1 due to the lack of flagellin, which may also partly explain the differences in mouse susceptibility to L. pneumophila and L. longbeachae infection. The putative L. longbeachae capsule may also contribute to this difference.

Frontiers in Microbiology | Cellular and Infection Microbiology
Quite interestingly, although L. longbeachae does not encode flagella, it encodes a putative chemotaxis system. Chemotaxis enables bacteria to find favorable conditions by migrating toward higher concentrations of attractants. In many bacteria, the chemotactic response is mediated by a two-component signal transduction pathway comprising a histidine kinase, CheA, and a response regulator, CheY. Homologs of this regulatory system are present in the L. longbeachae genomes sequenced (Cazalet et al., 2010; Kozak et al., 2010). Furthermore, two homologs of the "adaptor" protein CheW, which associates with CheA or cytoplasmic chemosensory receptors, are present. Ligand binding to receptors regulates the autophosphorylation activity of CheA in these complexes. The CheA phosphoryl group is subsequently transferred to CheY, which then diffuses away to the flagellum, where it modulates motor rotation. Adaptation to continuous stimulation is mediated by a methyltransferase, CheR. Together, these proteins represent an evolutionarily conserved core of the chemotaxis pathway, common to many bacteria and archaea (Kentner and Sourjik, 2006; Hazelbauer et al., 2008). Homologs of all these proteins are present in the L. longbeachae genomes (Cazalet et al., 2010; Kozak et al., 2010), and a similar chemotaxis system is present in Legionella drancourtii LLAP12 (La Scola et al., 2004), but it is absent from L. pneumophila. The flanking genomic regions are highly conserved among L. longbeachae and all L. pneumophila strains sequenced, suggesting that L. pneumophila, although it encodes flagella, has lost the genes encoding the chemotaxis system through deletion events.
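The reasoning used here, and for the flagellar regions above, is that a gene cluster present in one genome and absent from the other, with the same flanking genes on both sides, is most simply explained by a deletion. The following Python sketch illustrates that logic on toy data only. The function name, the gene order, and the flank labels are hypothetical placeholders introduced for illustration; they do not represent the actual genome organization or the method used in the cited studies.

```python
def looks_like_deletion(ref_order, other_order, cluster):
    """Return True if `cluster` is one contiguous block in `ref_order`,
    is entirely absent from `other_order`, and the two genes flanking it
    in `ref_order` sit directly next to each other in `other_order`."""
    cluster = set(cluster)
    idx = [i for i, gene in enumerate(ref_order) if gene in cluster]
    # Every cluster gene must be found, and the block must be contiguous.
    if len(idx) != len(cluster) or idx[-1] - idx[0] + 1 != len(idx):
        return False
    # A flanking gene is required on each side of the block.
    if idx[0] == 0 or idx[-1] == len(ref_order) - 1:
        return False
    # No cluster gene may be present in the other genome.
    if cluster & set(other_order):
        return False
    left, right = ref_order[idx[0] - 1], ref_order[idx[-1] + 1]
    if left not in other_order or right not in other_order:
        return False
    # Conserved flanks that are now adjacent suggest a clean deletion.
    return abs(other_order.index(left) - other_order.index(right)) == 1


# Hypothetical gene orders (illustrative only, not real annotation).
genome_with_cluster = ["flank_left", "cheA", "cheY", "cheW", "cheR", "flank_right"]
genome_without_cluster = ["flank_left", "flank_right"]

result = looks_like_deletion(
    genome_with_cluster,
    genome_without_cluster,
    ["cheA", "cheY", "cheW", "cheR"],
)
print("Pattern consistent with deletion:", result)
```

A real analysis would start from ortholog assignments and would have to tolerate rearrangements and small insertions between the flanks, which this sketch deliberately ignores.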
Thus these two species differ markedly in their surface structures. L. longbeachae produces a capsule-like structure, synthesizes a very different LPS, and does not synthesize flagella but encodes a chemotaxis system. These differences in surface structures seem to be due to deletion events that led to the loss of flagella in L. longbeachae and the loss of chemotaxis in L. pneumophila, contributing in part to the adaptation of the two species to their different main niches, soil and water.

EVOLUTION OF EUKARYOTIC EFFECTORS - ACQUISITION BY HORIZONTAL GENE TRANSFER FROM EUKARYOTES?
Human-to-human transmission of Legionella has never been reported. Thus humans have been inconsequential in the evolution of these bacteria. However, Legionella have co-evolved with freshwater protozoa, allowing adaptation to eukaryotic cells. The idea that protozoa are training grounds for intracellular pathogens was born with the finding by Rowbotham (1980) that Legionella has the ability to multiply intracellularly. This led to a new concept in microbiology: bacteria parasitize protozoa and can utilize the same process to infect humans. Indeed, the long co-evolution of Legionella with protozoa is reflected in its genome by the presence of eukaryotic like genes, many of which are clearly virulence factors used by L. pneumophila to subvert host functions. These genes may have been acquired through horizontal gene transfer (HGT), either from the host cells (e.g., aquatic protozoa) or from other bacteria, or may have arisen by convergent evolution. Recently it has been reported that L. drancourtii, a relative of L. pneumophila, has acquired a sterol reductase gene from the genome of Acanthamoeba polyphaga Mimivirus, a virus that grows in ameba (Moliner et al., 2009). Thus, the acquisition of some of the eukaryotic like genes of L. pneumophila by HGT from protozoa is plausible. ralF was the first gene suggested to have been acquired by L. pneumophila from eukaryotes by HGT, as RalF carries a eukaryotic Sec7 domain (Nagai et al., 2002). In order to study the evolutionary origin of eukaryotic L. pneumophila genes, we have undertaken a phylogenetic analysis of the eukaryote-like sphingosine-1-phosphate lyase (SPL) of L. pneumophila, which is encoded by lpp2128 and was described earlier. The phylogenetic analyses shown in Figure 4 revealed that it was most likely acquired from a eukaryotic organism early during Legionella evolution (Nora et al., 2009), as the Lpp2128 protein sequence of L. pneumophila clearly falls into the eukaryotic clade of SPL sequences.
We then tested the hypothesis that L. longbeachae might also have acquired genes from plants, which is conceivable as it is found in soil. We thus undertook a phylogenetic analysis, similar to that described above, for the L. longbeachae protein Llo2643, which contains PPR repeats, a protein family typically present in plants. A Blast search in the database revealed that homologs of Llo2643 are almost exclusively found in eukaryotes, in particular in plants and algae. The only other prokaryotes encoding this protein are the cyanobacteria Microcoleus vaginatus and Cylindrospermopsis raciborskii. This rare presence in bacteria is suggestive of a horizontal transfer event from eukaryotes to these bacteria. Figure 5 shows the phylogenetic tree we obtained. The fact that the bacterial proteins group together may be due to a phenomenon of long branch attraction. Thus, the Llo2643 protein of L. longbeachae appears closer to plant proteins than to prokaryotic ones. Once more plant proteins, perhaps from algae, are available in the database, it might become possible to evaluate whether L. longbeachae indeed acquired genes from plants.
Legionella is not the only prokaryote whose genome shows an enrichment of proteins with eukaryotic domains. Another example is "Ca. Amoebophilus asiaticus," a Gram-negative, obligate intracellular ameba symbiont belonging to the Bacteroidetes, which was discovered within an ameba isolated from lake sediment (Schmitz-Esser et al., 2008) and whose genome sequence has been reported (Schmitz-Esser et al., 2010). In this recent report, Schmitz-Esser et al. (2010) show that the genome of this organism also encodes an arsenal of proteins with eukaryotic domains. To further investigate the distribution of these protein domains in other bacteria, the authors undertook an enrichment analysis comparing the fraction of all functional protein domains among 514 bacterial proteomes (Schmitz-Esser et al., 2010). This showed that the genomes of bacteria for which replication in ameba has been demonstrated were enriched in protein domains that are predominantly found in eukaryotic proteins. Interestingly, the domains potentially involved in host cell interaction described above, such as ANK repeats, LRR, SEL1 repeats, and F- and U-box domains, are among the most highly enriched domains in proteomes of ameba-associated bacteria. Bacteria that can exploit amebae as hosts thus share a set of eukaryotic domains important for host cell interaction despite their different lifestyles and their large phylogenetic diversity. This suggests that bacteria thriving within ameba use similar mechanisms for host cell interaction to facilitate survival in the host cell. Given the phylogenetic diversity of these bacteria, it is most likely that these traits were acquired independently during evolutionarily early interactions with ancient protozoa.
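As an illustration of what a domain enrichment test involves, the Python sketch below computes a one-sided hypergeometric p-value for the over-representation of a domain in one proteome relative to a background set. This is a standard textbook test chosen here for illustration; it is an assumption and not necessarily the statistic used by Schmitz-Esser et al. (2010). The function name and all counts in the example are hypothetical and are not taken from any dataset.

```python
from math import comb


def hypergeom_enrichment_p(total, total_with_domain, group, group_with_domain):
    """One-sided hypergeometric p-value P(X >= group_with_domain).

    total             -- number of proteins in the background set
    total_with_domain -- background proteins carrying the domain
    group             -- number of proteins in the proteome of interest
    group_with_domain -- proteins in that proteome carrying the domain
    """
    upper = min(group, total_with_domain)
    denominator = comb(total, group)
    # Sum the probabilities of observing k or more domain-carrying proteins.
    tail = sum(
        comb(total_with_domain, k) * comb(total - total_with_domain, group - k)
        for k in range(group_with_domain, upper + 1)
    )
    return tail / denominator


# Hypothetical counts, for illustration only: a background of 10,000
# proteins of which 50 carry an ANK repeat, and one proteome of 1,500
# proteins of which 30 carry an ANK repeat.
p_value = hypergeom_enrichment_p(10_000, 50, 1_500, 30)
print(f"Enrichment p-value (toy data): {p_value:.3e}")
```

In a genome-scale analysis the test would be repeated for every domain, so a multiple-testing correction would also be needed; that step is omitted here.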

CONCLUSION
Legionella pneumophila and L. longbeachae are two human pathogens that are able to modulate, manipulate, and subvert many eukaryotic host cell functions to their advantage, in order to enter, replicate within, and evade protozoa or human alveolar macrophages during disease. In recent years, genome analyses as well as comparative and functional genomics have demonstrated that genome plasticity plays a major role in differences in host cell exploitation and niche adaptation of Legionella. The genomes of these environmental pathogens are shaped by HGT between eukaryotes and prokaryotes, allowing them to mimic host cell functions and to exploit host cell pathways. Genome plasticity and HGT lead, in each strain and species, to a different repertoire of secreted effectors that may allow subtle adaptations to, e.g., different protozoan hosts. Plasmids and phages can be exchanged among strains, and deletions of genes encoding surface structures such as flagella or chemotaxis systems have taken place. Thus genome plasticity is a major mechanism by which Legionella may adapt to different niches and hosts.
Access to genomic data has revealed many potential virulence factors of L. pneumophila and L. longbeachae as well as metabolic capacities of these bacteria. The increasing information in genomic databases will allow a better identification of the origin and similarity of eukaryotic like proteins, eukaryotic protein domains, and other virulence factors. Sequencing of new eukaryotic genomes, such as that of A. castellanii, a natural host of Legionella, is in progress. These additional data will allow possible transfer events of genes from the eukaryotic host to Legionella to be studied in more depth. Taken together, the progressive increase of information on Legionella as well as on protozoa will allow more complete comparative and phylogenetic studies to shed light on the evolution of virulence in Legionella. However, much work remains to be done to translate the basic findings from genomics research into an improved understanding of the biology of this organism. As data accumulate, new fields of investigation will emerge. Without doubt, the investigation and characterization of regulatory ncRNAs will be one such field. The manipulation of host epigenetic information and the investigation of host susceptibility to disease will be others. In particular, the development of high-throughput techniques for comparative and functional genomics, as well as increasingly powerful imaging techniques, will accelerate the pace of knowledge acquisition.