Global Proteomic Analysis of Two Tick-Borne Emerging Zoonotic Agents: Anaplasma phagocytophilum and Ehrlichia chaffeensis

Anaplasma phagocytophilum and Ehrlichia chaffeensis are obligatory intracellular α-proteobacteria that infect human leukocytes and cause potentially fatal emerging zoonoses. In the present study, we determined the global protein expression profiles of these bacteria cultured in the human promyelocytic leukemia cell line HL-60. Mass spectrometric (MS) analyses identified a total of 1,212 A. phagocytophilum and 1,021 E. chaffeensis proteins, representing 89.3 and 92.3% of the predicted bacterial proteomes, respectively. Nearly all bacterial proteins (≥99%) with known functions were expressed, whereas only approximately 80% of “hypothetical” proteins were detected in infected human cells. Quantitative MS/MS analyses indicated that highly expressed proteins in both bacteria included chaperones, enzymes involved in biosynthesis and metabolism, and outer membrane proteins, such as A. phagocytophilum P44 and E. chaffeensis P28/OMP-1. Of the 113 A. phagocytophilum p44 paralogous genes, 110 were expressed, including 88 encoded by pseudogenes. In addition, bacterial infection of HL-60 cells up-regulated human proteins involved mostly in cytoskeleton components, vesicular trafficking, cell signaling, and energy metabolism, but down-regulated some pattern recognition receptors involved in innate immunity. Our proteomics data represent a comprehensive analysis of the A. phagocytophilum and E. chaffeensis proteomes and provide a quantitative view of human host protein expression profiles regulated by bacterial infection. These proteomic data will provide new insights into the biology and pathogenesis of these obligatory intracellular pathogens.

Introduction
Anaplasma phagocytophilum and Ehrlichia chaffeensis are small (ca. 0.4 by 1.5 μm), pleomorphic gram-negative bacteria that belong to the family Anaplasmataceae in the order Rickettsiales, class α-proteobacteria (Dumler et al., 2001; Rikihisa, 2010b). Infection of humans by A. phagocytophilum and E. chaffeensis causes human granulocytic anaplasmosis [HGA, first reported in 1994, formerly known as human granulocytic ehrlichiosis (HGE)] and human monocytic ehrlichiosis (HME, first reported in 1987), respectively (Maeda et al., 1987; Chen et al., 1994). HGA and HME are similar systemic febrile diseases characterized by fever, headache, myalgia, anorexia, and chills, and are frequently accompanied by leukopenia, thrombocytopenia, anemia, and elevated serum hepatic aminotransferases (Paddock and Childs, 2003; Bakken and Dumler, 2008; Thomas et al., 2009). Neurological signs are reported more frequently with HME than with HGA (Paddock and Childs, 2003). Although doxycycline is generally effective in treating human ehrlichioses, delayed therapy, underlying allergies or poor health, and immunosuppression often lead to severe complications or death. As important life-threatening tick-borne emerging zoonoses, HGA and HME were designated nationally notifiable diseases by the US Centers for Disease Control and Prevention in 1998. Since then, reported cases have increased every year. During 2008, cases attributed to A. phagocytophilum and E. chaffeensis increased by 21 and 16% from 2007, respectively (Hall-Baker et al., 2010).
Anaplasma phagocytophilum and E. chaffeensis are obligatory intracellular bacteria with a life cycle that requires repeated transmission between mammalian hosts and tick vectors (Rikihisa, 1991, 2010b; Dumler et al., 2001). Once transmitted to mammals, these bacteria replicate in membrane-bound compartments inside the primary host immune defense cells: granulocytes (A. phagocytophilum) or monocytes/macrophages (E. chaffeensis). Since these organisms were first isolated in culture in the 1990s (Dawson et al., 1991; Goodman et al., 1996), the unique strategies that A. phagocytophilum and E. chaffeensis employ to survive in a hostile environment have begun to be unraveled, including hijacking host cell signaling pathways, altering vesicular trafficking, usurping nutritional and cytoskeletal components, and subverting several host innate immune responses (Carlyon and Fikrig, 2003, 2006; Carlyon et al., 2004; Sukumaran et al., 2005; Huang et al., 2010a; Rikihisa, 2010a,b; Sultana et al., 2010; Wakeel et al., 2010). The complete genome sequences of A. phagocytophilum (1,471,282 bp) and E. chaffeensis (1,175,764 bp), and detailed analyses of their protein-coding genes, have proven a great resource for studying these bacteria and the diseases they cause (Dunning Hotopp et al., 2006). These two species share approximately 500 genes, most of which encode proteins homologous to proteins of known function. However, approximately 470-580 genes are unique to each species (Dunning Hotopp et al., 2006), and approximately 45% of the predicted open reading frames (ORFs) in the two genomes were annotated as uncharacterized "hypothetical proteins" or proteins without any functional assignment (Table 1). Whether these ORFs actually encode proteins that are expressed in living organisms remains largely unknown.
Owing to recent technical advances in transcriptome and proteome analyses, a holistic view of the genes and proteins expressed by an organism has become available. Whole-genome transcriptome analysis of A. phagocytophilum in human HL-60 cells detected transcripts from approximately 70% of the bacterial genes (Nelson et al., 2008). Proteomic studies based on 1-D and 2-D gel analyses of E. chaffeensis identified the products of about one-fourth of the total ORFs in bacteria cultured in human and tick cells (Singu et al., 2005; Seo et al., 2008). However, proteomic studies of obligatory intracellular bacteria face major difficulties: high-purity bacterial samples are not easily obtained, and the large amount of host protein present reduces the sensitivity and lowers the identification scores of bacterial proteins (Li and Lostumbo, 2010). The development of more sensitive nano-liquid chromatography coupled with tandem mass spectrometry (nano-LC-MS/MS) has improved global protein analysis of obligatory intracellular bacteria, because low-abundance proteins can be identified in samples containing a large excess of host proteins (Zimmer et al., 2006). Furthermore, label-free protein quantitation based on LC-MS peptide peak intensities has become possible owing to the reproducibility and sensitivity of intensity measurements, so that multiple samples from different conditions can be compared directly without stable isotope labeling (Old et al., 2005; Zimmer et al., 2006; Shi et al., 2009).
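The label-free comparison described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the pipeline used in this study (which matched FTICR features to an AMT tag database): the data layout, the hypothetical function names `protein_abundances` and `log2_ratios`, the three-observation filter, summing peptide intensities per protein, and normalizing to each sample's total intensity are all illustrative choices.

```python
"""Sketch of intensity-based, label-free comparison of two samples.

Hypothetical data layout (not the authors' format): each sample maps a peptide
sequence to (protein_id, replicate_intensities), where None marks a replicate
run in which the peptide was not observed.
"""
from math import log2
from statistics import mean
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, Tuple[str, List[Optional[float]]]]


def protein_abundances(sample: Sample, min_obs: int = 3) -> Dict[str, float]:
    """Return each protein's share of the total peptide intensity in one sample."""
    summed: Dict[str, float] = {}
    for protein_id, replicates in sample.values():
        observed = [x for x in replicates if x is not None]
        # Assumed filter: keep only peptides seen in at least min_obs replicate runs.
        if len(observed) < min_obs:
            continue
        # Average the peptide over replicates, then add it to its protein's total.
        summed[protein_id] = summed.get(protein_id, 0.0) + mean(observed)
    total = sum(summed.values())
    # Normalizing to the sample total lets samples be compared without isotope labels.
    return {p: v / total for p, v in summed.items()}


def log2_ratios(reference: Sample, test: Sample, min_obs: int = 3) -> Dict[str, float]:
    """log2(test share / reference share) for proteins quantified in both samples."""
    ref = protein_abundances(reference, min_obs)
    tst = protein_abundances(test, min_obs)
    return {p: log2(tst[p] / ref[p]) for p in sorted(ref.keys() & tst.keys())}


if __name__ == "__main__":
    # Toy data (hypothetical): five replicate runs per sample.
    uninfected: Sample = {
        "PEPTIDEA": ("P1", [100.0, 100.0, 100.0, None, 100.0]),
        "PEPTIDEB": ("P2", [300.0] * 5),
        "PEPTIDEC": ("P3", [50.0, None, None, None, 40.0]),  # seen twice: filtered out
    }
    infected: Sample = {
        "PEPTIDEA": ("P1", [300.0] * 5),
        "PEPTIDEB": ("P2", [300.0] * 5),
    }
    print(log2_ratios(uninfected, infected))
```

With the toy data, P1's share of total intensity doubles (0.25 to 0.5), giving a log2 ratio of 1.0; P2's share falls, giving a negative ratio; and P3 is dropped because it was observed in only two of five runs. Normalizing to total intensity is one simple choice; other normalizations would change the absolute ratios.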
Here, we present the first comprehensive proteomes of two human pathogens, A. phagocytophilum and E. chaffeensis, their relative protein expression abundances, and the influence of infection with these two pathogens on human host protein expression, using multidimensional nano-LC-MS/MS approaches developed at Pacific Northwest National Laboratory¹ (Zimmer et al., 2006; Mottaz-Brewer et al., 2008). Determining the protein expression profiles of A. phagocytophilum and E. chaffeensis in human leukocytes will help advance understanding of the cell biology and physiology of these bacteria and of the complex interplay between the bacteria and their host, and will enhance opportunities to investigate novel targets for antimicrobial therapy or for blocking pathogenic pathways.

Bacteria culture and purification
Anaplasma phagocytophilum HZ (type strain; Rikihisa et al., 1997) and E. chaffeensis Arkansas (type strain; Dawson et al., 1991) were cultured in HL-60 cells, which are undifferentiated human promyelocytic leukemia cells obtained from ATCC (#CCL-240, Manassas, VA, USA). Cells were maintained in RPMI 1640 medium supplemented with 5% fetal bovine serum and 2 mM l-glutamine and incubated at 37°C in a humidified 5% CO2-95% air atmosphere. No antibiotics were used throughout the study. When infectivity reached greater than 95%, as assessed by Diff-Quik staining (Baxter Scientific Products, Obetz, OH, USA) of cytocentrifuged preparations, infected cells were harvested and extensively washed to remove serum proteins, and host cell-free bacteria were released by sonication for 10 s at an output setting of 2 with an ultrasonic processor W-380 (Heat Systems, Farmington, NY, USA). After low-speed centrifugation at 700 × g to remove nuclei and unbroken cells, the supernatant was filtered through a 5-μm and then a 0.8-μm filter (Millipore, Billerica, MA, USA) to remove cellular debris. The filtrate was then centrifuged at 10,000 × g for 10 min, and the pellet enriched with host cell-free bacteria was collected.

Sample preparation for proteomics analysis: protein partitioning, digestion, and clean-up
To obtain comprehensive coverage of protein expression profiles, including hydrophilic and hydrophobic proteins, proteins with very high or low pIs, and proteins with different cellular distributions, three optimized protein extraction protocols (global, soluble, and insoluble protein extracts) were applied to purified host cell-free bacteria and to uninfected or infected HL-60 cells as described previously (Mottaz-Brewer et al., 2008). For tryptic digestion of global protein extracts, pellets containing purified bacteria or host cells were suspended in 100 mM NH4HCO3 buffer (pH 8.4). The resulting suspension was transferred to a 2.0-mL cryovial tube with an O-ring in the cap and lysed by beating with 0.1-mm zirconia/silica disruption beads (BioSpec Products, Bartlesville, OK, USA). Protein samples were denatured and reduced by adding urea, thiourea, and dithiothreitol (DTT) to final concentrations of 7 M, 2 M, and 5 mM, respectively. Following incubation at 60°C for 30 min, the samples were diluted 10-fold with NH4HCO3 buffer. Global digestion was performed by adding trypsin at a 1:50 (w:w) enzyme:protein ratio and CaCl2 to a final concentration of 1 mM. The samples were incubated at 37°C for 3 h, snap frozen in liquid N2 to stop the digestion, and stored at −80°C until further analysis. Clean-up was performed using a Discovery C-18 solid phase extraction (SPE) column (Supelco, Bellefonte, PA, USA) to prepare the samples for MS analysis. Peptides were then concentrated in a Savant SpeedVac manifold (Thermo Fisher, Milford, MA, USA), and a BCA protein assay (Thermo Fisher/Pierce, Rockford, IL, USA) was performed to determine the final sample concentration.
For digestion of soluble and insoluble protein extracts, purified bacterial or host cell pellets were resuspended in 50 mM NH4HCO3 buffer and centrifuged at 355,000 × g at 4°C for 10 min to separate the protein lysates into soluble and insoluble fractions. The supernatant was tryptically digested and cleaned up in the same way as the global digest and designated the soluble digest sample. The pellet after ultracentrifugation, containing the insoluble protein fraction, was washed and resuspended in a denaturing solution (7 M urea, 2 M thiourea, 1% CHAPS, 10 mM DTT, 50 mM NH4HCO3, pH 7.8). Insoluble protein samples were digested as described above. Salts and detergent were removed using a Discovery strong cation exchange (SCX) SPE column (Supelco). Peptides were concentrated and their concentration measured as described above. All trypsin-digested peptides were snap frozen in liquid N2 and stored at −80°C until proteomic analysis.

Mass spectrometry and data analysis
To enhance proteome coverage, all peptide samples were further separated by SCX chromatography coupled offline with nano-LC-MS/MS analyses. Peptide mixtures from each proteome sample were fractionated into 35-70 fractions as previously described (Qian et al., 2005). The instrumentation, the high-performance liquid chromatography-tandem MS (HPLC-MS/MS) and HPLC-MS arrangements, and the associated methods for each biological system have been described previously and were consistent for all experiments (Mottaz-Brewer et al., 2008). In brief, samples were loaded onto an in-house developed chromatography system that uses a 20-cm × 75-μm C18 reverse-phase column, and peptides were ionized by electrospray ionization as they eluted from the column into the mass spectrometer. The liquid chromatography gradient ran linearly from aqueous to organic over 100 min under acidic conditions. MS was typically performed on a linear trap quadrupole (LTQ; Thermo Fisher Scientific) ion trap mass spectrometer. Tandem MS (MS/MS) spectra were collected using data-dependent settings on the top 10 ions from each precursor scan.
Tandem MS spectra were matched to protein sequence files using the SEQUEST program (Eng et al., 1994) and filtered with a combination of scores provided in the output files, including the minimum threshold filter scores defined by Washburn et al. (2001) and an additional minimum discriminant score of 0.5 to reduce false-positive identifications (Strittmatter et al., 2004). Only peptides passing these filters were populated into the initial accurate mass and time (AMT) tag database. The searches were performed using the annotated protein databases of A. phagocytophilum HZ (1,357 protein entries, GenBank accession number NC_007797), including the 113 newly annotated A. phagocytophilum P44 proteins; E. chaffeensis Arkansas (1,106 protein entries, GenBank accession number NC_007799); and the Homo sapiens IPI protein database (61,225 protein entries, IPI 2006, v3.36). Each bacterial and human peptide from infected host cells was identified and populated into the same AMT tag database.

Quantitative mass spectrometric analysis
Before injection into the mass spectrometer, the total peptide mass was measured, and each sample was diluted to 1 μg/μL. After the initial AMT tag database was built, all samples were analyzed with a 9.4-T Fourier transform ion cyclotron resonance (FTICR) mass spectrometer (Bruker Daltonics, Billerica, MA, USA) following separation of peptides by reverse-phase capillary HPLC under identical conditions as described (Shi et al., 2006, 2009). Standard proteins were added before digestion and used to track instrument performance. Five technical replicates of each sample were injected into the FTICR mass spectrometer. For each feature observed in the FTICR, the elution time from the capillary LC column, the signal abundance (integrated area under the elution profile), and the monoisotopic mass (determined from the charge state and the high-accuracy m/z measurement) were used to match the feature to peptide identifications in the initial AMT tag database. These peptides, now identified and quantified, were used to infer the protein composition of the samples. Only peptides observed in at least three of the five technical replicates were used in data analysis, and proteins were required to have at least three observed peptides to be included in the confident results. In addition, the number of peptides observed for each protein in a biological sample was divided by the total number of peptides determined from the same sample to estimate the relative abundance of each identified protein in that sample. The abundance of each peptide was averaged across replicate runs before ratio calculations.

Data analysis and proteomic database online access
Bacterial proteins were classified into functional role categories using the JCVI Annotation Engine and Comprehensive Microbial Resource² as described previously (Dunning Hotopp et al., 2006; Lin et al., 2009). Human proteins were classified based on gene ontology (GO) as annotated by the GO Consortium³. All peptides and proteins identified from A. phagocytophilum, E. chaffeensis, and human HL-60 cells, together with detailed analyses of protein expression profiles and quantitation results, can be accessed and downloaded from the website⁴.

Results and discussion
Overview of A. phagocytophilum and E. chaffeensis proteins identified by proteomics
To identify and quantitate the comprehensive protein expression profiles of A. phagocytophilum and E. chaffeensis, 18 A. phagocytophilum and 14 E. chaffeensis protein samples were prepared from purified host cell-free bacteria and infected HL-60 cells using the three protein extraction protocols described above; each contained approximately 1 mg of peptides after tryptic digestion and column clean-up. Approximately 250 MS runs were performed for each bacterium from these samples, until no new peptides were detected (Figure A1 in Appendix). Positively identified peptides were populated into the AMT tag database, and two databases were constructed. The A. phagocytophilum database contained search results from 60 datasets using purified bacteria from infected cells and 189 datasets from A. phagocytophilum-infected cells. The E. chaffeensis database contained 49 datasets from purified bacteria and 192 datasets associated with E. chaffeensis-infected host cells. In protein samples from both A. phagocytophilum- and E. chaffeensis-infected HL-60 cells, more than 126,000 peptides were identified (Figure A1 in Appendix). Among these peptides, 44,080 matched 1,212 A. phagocytophilum proteins and 40,004 matched 1,021 E. chaffeensis proteins, representing 89.3 and 92.3% of the predicted bacterial proteomes, respectively (Table 1). More than 96% of these detected proteins had more than one peptide match. Nearly all proteins with assigned functional categories (99.0% from A. phagocytophilum and 99.7% from E. chaffeensis) were expressed in HL-60 cells, including enzymes required for metabolism and proteins involved in pathogenesis and regulatory functions, such as outer membrane proteins, the type IV secretion system (T4SS), and two-component regulatory systems. This suggests that nearly all proteins with known functions are likely essential for the replication and survival of these two pathogens inside human host cells. These expression profiles in the mammalian host also suggest that, although gene loss occurred in the family Anaplasmataceae as a result of reductive genome evolution (Blanc et al., 2007; Darby et al., 2007), the retained genes cannot be dispensed with.

Expression of A. phagocytophilum and E. chaffeensis proteins in biosynthesis pathways and phage components
Anaplasma phagocytophilum and E. chaffeensis devote significantly higher percentages of their genomes to nucleotide biosynthesis, cofactor and vitamin biosynthesis, and protein synthesis than their closely related free-living α-proteobacterium Caulobacter crescentus (Dunning Hotopp et al., 2006). Expression of enzymes involved in nucleotide, vitamin, and cofactor biosynthetic pathways by A. phagocytophilum and E. chaffeensis suggests that these bacteria do not need to compete with human leukocytes for essential vitamins and nucleotides, and may even supply them to host cells. Such provisioning has been proposed for the obligatory intracellular bacterium Wigglesworthia glossinidia and its insect host, the tsetse fly (Zientz et al., 2004). The protein synthesis category includes many essential genes, such as those encoding ribosomal proteins, tRNA synthetases, RNA modification enzymes, and translation factors. Almost all of these proteins were expressed in the mammalian host cells, except for A. phagocytophilum ribosomal protein L36 (Table S1 in Supplementary Material). Previous studies have shown that ribosomal protein L36 is dispensable for Escherichia coli growth and protein synthesis (Ikegami et al., 2005), and the gene encoding ribosomal protein L36 was not identified in the closely related Neorickettsia spp. (Lin et al., 2009), suggesting that L36 might not be necessary for members of the family Anaplasmataceae.
Anaplasma phagocytophilum and E. chaffeensis have a lower coding percentage for transporters than the free-living C. crescentus (Dunning Hotopp et al., 2006). Although nearly 100% of the proteins with known functions were expressed in HL-60 cells, a few transport-related proteins, such as the twin-arginine translocation protein TatA/E of A. phagocytophilum and the monovalent cation/proton antiporter MnhG/PhaG subunit family protein of E. chaffeensis, were not detected in bacteria cultured in HL-60 cells (Tables S1 and S2 in Supplementary Material). Interestingly, although A. phagocytophilum and E. chaffeensis do not encode an intact prophage or transposable/mobile elements, a few phage core components (HK97-like portal, major capsid, and prohead protease) were identified scattered throughout their genomes, and their expression was also confirmed by proteomics. The functions of these remnant phage components during bacterial infection of human hosts are unclear; however, it has been suggested that such components might be involved in lateral gene transfer, bacterial chromosome inversion, evolution, and expression of virulence factors (Canchaya et al., 2003; Brussow et al., 2004).

Expression profiling of A. phagocytophilum and E. chaffeensis "hypothetical" proteins
Approximately 45% of the predicted ORFs in the genomes encode conserved or uncharacterized "hypothetical" proteins (Table 1; Dunning Hotopp et al., 2006), and whether they truly encode proteins, and whether these proteins are expressed in living organisms, remains largely unknown. Analysis of the expression profiles of these hypothetical proteins, or proteins without assigned functions, showed that only 77.9 and 83.7% of them were expressed in A. phagocytophilum and E. chaffeensis, respectively (Table 1). The much lower expression ratio of these "hypothetical" genes compared with proteins with assigned functions (nearly 100%) suggests that the expression of certain "hypothetical" proteins might be regulated in different host environments, such as the arthropod vectors, where they may play critical roles in host adaptation.
As suggested by Ochman (2002) and Skovgaard et al. (2001), a substantial fraction of hypothetical ORFs in bacterial genomes are short (under 300 nucleotides in length); many of them might therefore be random stretches of DNA that do not actually encode proteins. Analysis of the expressed proteins with unknown functions showed that 50.7% of them in A. phagocytophilum and 61.8% in E. chaffeensis were longer than 100 amino acids (AA) (Tables S3 and S4 in Supplementary Material). In contrast, of the "hypothetical" proteins undetectable by proteomic analysis, 97.8% in A. phagocytophilum and 100% in E. chaffeensis were shorter than 100 AA (Table 2). Because functional assignment to an ORF during genome annotation is based on homology or domain-structure matches to known proteins or domains, proteins with assigned functions are likely biased toward long proteins (Skovgaard et al., 2001). This is supported by the observation that, among the "hypothetical" proteins, 60.2% in A. phagocytophilum and 48.2% in E. chaffeensis are shorter than 100 AA, whereas fewer than 8% of proteins with known functions are shorter than 100 AA (Table 1). Shorter length also reduces the likelihood that a protein is detected by proteomic analysis, because it yields fewer peptides after trypsin treatment. Nevertheless, our study showed that more than 63% of the "hypothetical" proteins shorter than 100 AA could be detected in both bacteria (Table 1). Therefore, further bioinformatics analyses of these expressed genes, combined with comprehensive protein expression profiles under different culture or host environmental conditions, would help in predicting which "hypothetical" proteins are genuine.

Expression of overlapping ORFs in A. phagocytophilum and E. chaffeensis
Overlapping genes are detected primarily in parasitic or symbiotic bacteria and are believed to be a consequence of the reduction of originally larger genomes (Fukuda et al., 2003; Blanc et al., 2007). Analyses of the A. phagocytophilum and E. chaffeensis genomes identified overlaps among protein-coding ORFs and between RNA- and protein-coding ORFs, occurring either in different reading frames on the same strand or on complementary strands. Proteomic data showed that many overlapping genes were indeed expressed by A. phagocytophilum and E. chaffeensis in infected human host cells (Figures A2 and A3 in Appendix). These ORFs include one pair each of completely overlapping protein-coding ORFs in A. phagocytophilum (APH_0143/APH_0144) and E. chaffeensis (ECH_0506/ECH_0507), one pair of overlapping protein-coding (ECH_0472) and 6SRNA1 (ECH_1158) genes, and 10 of 21 (A. phagocytophilum) or 4 of 26 (E. chaffeensis) partially overlapping protein-coding ORFs (Figures A2 and A3 in Appendix). These data suggest that overlapping ORFs can be transcribed and translated into proteins in these organisms with reduced genomes, which may increase their coding capacity.

A. phagocytophilum and E. chaffeensis
Despite the reduction in their genome sizes and their significantly lower coding capacity for metabolism, transport, and regulatory functions, A. phagocytophilum and E. chaffeensis have not only retained but expanded a pool of genes encoding outer membrane proteins (Dunning Hotopp et al., 2006). Most of these outer membrane proteins are members of Pfam PF01617 and constitute the OMP-1/MSP2/P44 family (Dunning Hotopp et al., 2006; Finn et al., 2010). Because A. phagocytophilum and E. chaffeensis cannot be transmitted transovarially in their arthropod vectors, ticks must acquire these organisms by feeding on an infected vertebrate reservoir animal; it was therefore proposed that the expansion of this gene family might allow persistence in the vertebrate reservoir by providing antigenic variation, thus allowing effective transmission from mammals to ticks (Rikihisa, 2010a). The A. phagocytophilum genome has the largest expansion of genes belonging to the OMP-1/MSP2/P44 family among members of the family Anaplasmataceae, most of them encoding P44 outer membrane proteins (Dunning Hotopp et al., 2006). A total of 113 annotated p44 loci longer than 60 bp, plus some smaller DNA fragments homologous to the p44 gene family, can be identified throughout the genome, together making up more than 5% of the total genome content (Dunning Hotopp et al., 2006). The full-length p44s consist of a central hypervariable region of approximately 280 bp encoding a signature of four conserved AA regions (C, C, WP, A) and conserved flanking sequences (Table 3; Lin and Rikihisa, 2005; Dunning Hotopp et al., 2006). By comparing the length and domains of the identified p44s with those of the full-length p44s, all p44 genes were annotated and classified as full-length, truncated, fragmented, or degenerated genes, as defined in Table 3 (Dunning Hotopp et al., 2006).
Because they lack start/stop codons, silent p44 gene fragments are unlikely to be expressed at their own genomic loci and must be recombined into, and expressed from, the expression locus APH_1221 (p44-18ES) by RecF-dependent recombination, as suggested by Lin et al. (2006). To assist proteomic detection of all possible P44 peptides, in-frame AA sequences were deduced from all p44 genes (including silent pseudogenes without start/stop codons and degenerated p44 fragments containing nonsense mutations) and used in the SEQUEST search. Expression of P44-59 from these pseudogenes in the A. phagocytophilum outer membrane had been confirmed previously (Ge and Rikihisa, 2007a). Results showed that, in addition to 22 full-length P44s, peptides identified by proteomic analysis matched protein sequences deduced from nearly all p44 genes (97.3%), including 86 silent p44 gene fragments and 2 degenerated p44 genes (Table 3). Because the N- and C-terminal regions flanking the hypervariable domains are highly conserved among P44 proteins, a single peptide identified by proteomic analysis might match several P44s. We therefore further analyzed all peptide matches to P44 proteins and confirmed that 84 P44 proteins (74.3%) were expressed with at least one unique peptide match (Table 3; Table S5 in Supplementary Material). These results show that the reserve of silent p44 genes distributed throughout the A. phagocytophilum genome can be recombined into and expressed from the p44-18ES expression locus (Figure 1). In addition, the region near this expression locus showed greater numbers of identified peptides matching P44 proteins, encoded either by full-length p44 genes that can be expressed at their own loci or by silent p44 genes that must be recombined into the p44-18ES locus for protein expression (Figure 1); this could be due to higher transcriptional activity in this region and/or higher recombination activity with the p44-18ES locus. Expression of nearly the entire P44 repertoire by populations of this bacterium in human leukocytes would ensure rapid adaptation to changing host environments and successful parasitism of new host cells, as well as escape from host immune surveillance. These results confirm our previous findings from mRNA data that diverse P44s can be expressed at the p44 expression locus by gene conversion from over 100 p44 donor loci (Lin et al., 2003; Wang et al., 2004; Lin and Rikihisa, 2005).
Figure 1 | Anaplasma phagocytophilum P44 expression maps as detected by proteomic analysis. All genes encoding P44 outer membrane proteins are plotted on the first circle. The bar heights on the second circle represent the number of P44-matching peptides detected, with higher bars indicating greater numbers of matching unique peptides. The third circle shows P44 proteins that had matched peptides but no unique peptide matches, and the fourth circle shows P44 proteins that had no peptide matches by proteomic analysis (APH_1122/P44-75, APH_1124/P44-C, and APH_1399/P44-C). The origin of the A. phagocytophilum genome is marked as (1), and the expression locus p44-18ES is highlighted by the green box. Color codes in circles 1, 2, and 4: red, full-length p44s; blue, truncated p44s; green, N-terminal p44 fragments; brown, C-terminal p44 fragments; gray, degenerated P44 fragments.
Ehrlichia chaffeensis has 22 paralogous, tandemly arranged p28/omp-1 genes encoding immunodominant major outer membrane proteins (Ohashi et al., 1998a,b; Dunning Hotopp et al., 2006). Proteomic analyses showed that all of these proteins and 27 other cell envelope proteins are expressed by E. chaffeensis in HL-60 cells (Table 1). Nineteen of the 22 P28/OMP-1 proteins have also been confirmed by proteomic identification of surface-exposed proteins of E. chaffeensis cultured in the human acute leukemia cell line THP-1 (Ge and Rikihisa, 2007b). Temporal transcript analyses detected mRNA of 16 of the 22 p28/omp-1 genes in the blood of acutely to chronically infected dogs (over 56 days of infection; Unver et al., 2002). When tested with 22 synthetic antigenic peptides, each unique to one P28/OMP-1 protein, sera from persistently infected dogs reacted with all P28/OMP-1 family proteins (Zhang et al., 2004). These data suggest that P28/OMP-1 family proteins are not involved in immune evasion at the population level (Unver et al., 2002; Zhang et al., 2004).
Surface expression of porins that function as passive diffusion channels is required for small hydrophilic compounds to pass through the outer membranes of gram-negative bacteria (Nikaido and Vaara, 1985; Nikaido, 2003). Our previous studies have shown that both P44 and P28/OMP-1 have porin activity, as measured by liposome swelling assay, allowing the diffusion of l-glutamine, the monosaccharides arabinose and glucose, the disaccharide sucrose, and even a tetrasaccharide (Huang et al., 2007; Kumagai et al., 2008). Since the tricarboxylic acid (TCA) cycle in A. phagocytophilum and E. chaffeensis is incomplete (Dunning Hotopp et al., 2006), the porin activity of P44 and P28/OMP-1 likely feeds the TCA cycle, and differential expression of P44 or P28/OMP-1s might influence the physiological activity of individual bacteria (Huang et al., 2007; Kumagai et al., 2008).

A. phagocytophilum and E. chaffeensis
Following the determination of the global expression profiles of these intracellular bacteria, we further determined the relative abundance of A. phagocytophilum and E. chaffeensis proteins expressed in human host cells. Protein expression was quantified by averaging the abundances of the individual peptides matching each protein within the entire pool of identified peptides. Because different proteins do not yield the same peptides, the abundances of different proteins are not directly comparable; nevertheless, abundance relative to the total remains informative, especially for proteins that differ by at least threefold (Old et al., 2005). Quantitative analyses identified 130 proteins from A. phagocytophilum and 116 from E. chaffeensis as having relative abundances greater than 1 (see Supplementary Material). Classification by functional role categories showed that A. phagocytophilum and E. chaffeensis have similar numbers of abundant proteins in all but three functional categories (Table 4, in bold font). Owing to the expansion of the P44 outer membrane protein family, A. phagocytophilum abundantly expresses more proteins in the "Cell envelope" category. In contrast, E. chaffeensis abundantly expresses more proteins in the categories "Protein synthesis," such as ribosomal proteins, and "Energy metabolism," such as electron transport chain proteins, probably because E. chaffeensis has the additional ability to synthesize arginine and lysine, whereas A. phagocytophilum does not (Dunning Hotopp et al., 2006). Interestingly, more than 12% of these abundantly expressed proteins are hypothetical proteins or proteins with unknown functions (Table 4), suggesting that these proteins might be required for infecting human host cells and could be novel targets for studying pathogenic mechanisms in human infection.
A. phagocytophilum and E. chaffeensis up-regulated the expression of host proteins involved mostly in vesicular trafficking and cytoskeleton components, protein tyrosine kinases, pro-survival proteins, and enzymes involved in metabolism and oxidative respiration (Table 5; Tables S10 and S11 in Supplementary Material). However, some proteins involved in host immune responses were down-regulated, including pattern recognition receptors such as TLR1 and mannose receptor 2 (Table 6; Tables S12 and S13 in Supplementary Material). Several human genes that were up- or down-regulated by infection with A. phagocytophilum or E. chaffeensis have been reported previously. Up-regulated genes in human neutrophils at an early stage of A. phagocytophilum infection included those that promote actin polymerization. Up-regulation of genes involved in iron metabolism, such as the transferrin receptor, was detected in A. phagocytophilum-infected NB4 cells, a human promyelocytic leukemia cell line (Pedra et al., 2005), and in E. chaffeensis-infected THP-1 cells, a human monocytic leukemia cell line (Barnewall et al., 1999). Expression of histone deacetylase (HDAC) 1/2 was increased in A. phagocytophilum-infected THP-1 cells (Garcia-Garcia et al., 2009). Down-regulation of TLR2/4 mRNA and protein expression was reported in E. chaffeensis-infected human monocytes. In addition, several reports have demonstrated interactions between these up-regulated human proteins and bacterial proteins, or activation of human proteins by bacterial infection. For example, the protein tyrosine kinase Fyn was shown to interact with the E. chaffeensis TRP47 protein in THP-1 cells, whereas A. phagocytophilum induced actin phosphorylation by p21-activated kinase (PAK1) in Ixodes ticks (Sultana et al., 2010). A. phagocytophilum-containing morulae were colocalized with several

QuantItatIve analyses of up-or down-regulated huMan proteIns In A. phAgocytophilum and E. chAffEEnsis-Infected hl-60 cells vs. unInfected cells
As obligatory intracellular bacteria, the life cycles of A. phagocytophilum and E. chaffeensis are dependent on their mammalian hosts and are known to regulate or hijack host components for their survival (Rikihisa, 2010a,b). We, therefore determined the relative abundance of human proteins by comparing the LC-MS peptide peak intensity information of the same peptides from infected HL-60 cells to that from uninfected cells. A total of 48,054 human proteins were identified from HL-60 cells ( Table S9 in Supplementary Material). Quantitative analyses of human proteins in infected vs. uninfected HL-60 cells showed that infection by Table 5 | up-regulated human proteins in infected vs. uninfected HL-60 cells by quantitative proteomics analysis 1 .

Rab/Rho gTPases)
ADP-ribosylation factor (ARF) 1/3/4/5; ARF GTPase-activating protein GIT2; Rab 5/7/11/27; Rap1; Rho/Rac GEF 2; cell division cycle 42 (CDC42); transferrin-receptor protein 1; clathrin heavy chain; diaphanous homolog (mDia) 1 ADP-ribosylation factor (ARF) 1/3/4/5; ARF GTPase-activating protein GIT2; Rab 1/5/7/8/10/11/35; Rho-associated protein kinase 2; Rap1; Rho/Rac GEF 2; cell division cycle 42 (CDC42); STE20-like kinase; citron (Rho-interacting, ser/thr kinase 21); integrin-linked kinase; Rab GTPases, including Rab11 (Huang et al., 2010a), and E. chaffeensis-containing morulae were colocalized with Rab5 (Mott et al., 1999). Both A. phagocytophilum-and E. chaffeensis-containing morulae were colocalized with major histocompatibility complex (MHC) class I and II antigens (Mott et al., 1999). Several isoforms of sarcoplasmic/endoplasmic reticulum calcium ATPase (SERCA) were up-regulated in A. phagocytophilum-and E. chaffeensis-infected HL-60 cells, suggesting proteins involved in the intracellular Ca 2+ regulation like phospholipase C and transglutaminase shown in previous studies are critical in bacterial infection de la Fuente et al., 2005). There are several studies using microarray analyses to identify genes differentially regulated in response to A. phagocytophilum infection in human neutrophils and the promyelocytic leukemia cell lines NB4 and HL-60 cells at different infection stages (Borjesson et al., 2005;de la Fuente et al., 2005;Pedra et al., 2005;Sukumaran et al., 2005;Lee and Goodman, 2006;Galindo et al., 2008;Lee et al., 2008). These studies identified similar sets of differentially regulated genes involved in vesicular transport, cytoskeletal remodeling, signaling and communication events, cell-cycle and apoptosis regulation, and innate immunity. 
However, because of differences in host cell types, infection efficiency, post-infection time points, experimental designs, array platforms, databases, and statistical analyses, results for a large portion of the genes are difficult to compare across these studies (Pedra et al., 2005; Lee et al., 2008). Since most cellular functions are carried out by proteins, proteomic data should more accurately reflect the state of cellular physiology and pathology. Nevertheless, combining microarray and quantitative proteomic data would allow a more comprehensive understanding of the host cellular changes induced by infection with these pathogens. Our proteomic analyses suggest that infection with A. phagocytophilum or E. chaffeensis may modulate the human host cell machinery to produce more energy, enhance vesicular transport, and activate cell signaling events involved in bacterial entry and proliferation. Further analyses of these up- and down-regulated human proteins will provide more information about the global regulation of host cells by infection with these intracellular pathogens.
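This study uses two quantitation steps. The first ranks bacterial proteins by the average abundance of the peptides matching each protein. The second compares LC-MS peak intensities of the same peptides between infected and uninfected HL-60 cells. The short Python sketch below illustrates both. It is not the authors' software, and several details are assumptions: the input format (tuples and dictionaries keyed by protein and peptide), the function names, the normalization of relative abundance so that the mean protein equals 1, and the use of a geometric mean of per-peptide ratios as the fold change. The threefold cut-off mirrors the comparability criterion cited from Old et al. (2005).

```python
from collections import defaultdict
from math import log2
from statistics import mean


def protein_abundance(peptides):
    """Average the abundances of all peptides that match each protein.

    `peptides` is a hypothetical iterable of (protein_id, peptide_sequence, abundance).
    """
    per_protein = defaultdict(list)
    for protein_id, _sequence, abundance in peptides:
        per_protein[protein_id].append(abundance)
    return {pid: mean(values) for pid, values in per_protein.items()}


def relative_abundance(abundances):
    """Scale protein abundances so that their mean is 1 (assumed normalization)."""
    overall = mean(abundances.values())
    return {pid: value / overall for pid, value in abundances.items()}


def clearly_different(a, b, min_fold=3.0):
    """True if two protein abundances differ by at least `min_fold` (default threefold)."""
    low, high = sorted((a, b))
    return low > 0 and high / low >= min_fold


def infected_vs_uninfected(infected, uninfected):
    """Fold change per protein, using only peptides observed in both samples.

    Both arguments map (protein_id, peptide_sequence) -> LC-MS peak intensity.
    The fold change is the geometric mean of the per-peptide intensity ratios.
    """
    log_ratios = defaultdict(list)
    for key, intensity in infected.items():
        reference = uninfected.get(key)
        if reference and intensity > 0:  # same peptide, nonzero in both samples
            log_ratios[key[0]].append(log2(intensity / reference))
    return {pid: 2 ** mean(values) for pid, values in log_ratios.items()}
```

Restricting the comparison to peptides detected in both samples follows the text, which compares "the same peptides" between conditions. The geometric mean is one common way to summarize per-peptide ratios and is used here only for illustration. A median of log ratios would be an equally reasonable choice.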

Conclusion
The determination of bacterial proteomes is an important step in converting genetic information into protein function and cell biology. This study provides the first comprehensive proteomes of obligatory intracellular pathogens. A total of 1,212 A. phagocytophilum and 1,021 E. chaffeensis proteins were identified, representing 89.3 and 92.3% of the predicted bacterial proteomes, respectively. Nearly all proteins with assigned functions were expressed in infected human host cells, including those involved in metabolism, pathogenesis, and regulation. Bacterial infection up-regulated the expression of human proteins involved mostly in cytoskeleton components, vesicular trafficking, cell signaling, and energy metabolism. It also down-regulated some pattern recognition receptors involved in innate immunity. The availability of these proteomic data will provide a wealth of information on the molecular mechanisms of bacterial pathogenesis. It will therefore greatly facilitate understanding of the biology of these ehrlichiosis agents and of the signaling events between intracellular bacteria and their host cells.
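The coverage figures above give only the detected counts and rounded percentages. The sizes of the predicted proteomes can be back-calculated from them. The Python sketch below does this under one assumption: each percentage was rounded to one decimal place. It returns every integer proteome size N for which 100 × detected / N rounds to the reported value. The function name is hypothetical. The resulting sizes are inferred from the rounded numbers, not taken from the genome annotations.

```python
import math


def implied_proteome_sizes(detected, percent, decimals=1):
    """Integer proteome sizes N for which 100 * detected / N rounds to `percent`.

    Assumes `percent` was rounded to `decimals` decimal places; values that fall
    exactly on a rounding boundary are not treated specially.
    """
    half_step = 0.5 * 10 ** -decimals
    smallest = detected * 100 / (percent + half_step)  # highest possible true percentage
    largest = detected * 100 / (percent - half_step)   # lowest possible true percentage
    return list(range(math.ceil(smallest), math.floor(largest) + 1))


print(implied_proteome_sizes(1212, 89.3))  # [1357]
print(implied_proteome_sizes(1021, 92.3))  # [1106]
```

For these inputs, each rounded percentage is consistent with exactly one proteome size. With coarser rounding or smaller counts, the function can return several candidates.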

Supplementary Material
Tables S1-S14 can be found online at http://www.frontiersin.org/Cellular_and_Infection_Microbiology/10.3389/fmicb.2011.00024/
Detection of peptides by proteomics: ECH_0086 (51 aa