Clinical and Genomic Analysis of Liver Abscess-Causing Klebsiella pneumoniae Identifies New Liver Abscess-Associated Virulence Genes

Hypervirulent variants of Klebsiella pneumoniae (hvKp) that cause invasive community-acquired pyogenic liver abscess (PLA) have emerged globally. Little is known about the virulence determinants associated with hvKp, apart from the virulence genes rmpA/A2 and the siderophore genes (iroBCD/iucABCD) carried by the pK2044-like large virulence plasmid. Here, we collected recent clinical hvKp isolates from PLA samples in China and performed clinical, molecular, and genome sequencing analyses. We found that 90.9% (40/44) of the pathogens causing PLA were K. pneumoniae. Among the 40 LA-Kp isolates, K1 (62.5%) and K2 (17.5%) were the dominant serotypes, and ST23 (47.5%) was the major sequence type. S1-PFGE analyses showed that although 77.5% (31/40) of the LA-Kp isolates harbored a single large virulence plasmid of variable size, 5 (12.5%) isolates had no plasmid and 4 (10%) had two or three plasmids. Whole genome sequencing and comparative analysis of 3 LA-Kp and 3 non-LA-Kp identified 133 genes present only in LA-Kp. Large-scale screening of these 133 genes in 45 LA-Kp and 103 non-LA-Kp genome sequences from public databases then identified 30 genes that were highly associated with LA-Kp, comprising iroBCD, iucABCD, rmpA/A2, and 21 new genes. PCR analysis of these 21 new genes in 40 LA-Kp and 86 non-LA-Kp clinical isolates collected in this study showed that they were present in 80–100% of LA-Kp isolates but in only 2–11% of K. pneumoniae isolates from sputum and urine. Several of the 21 genes have been proposed as virulence factors in other bacteria, such as the gene encoding a SAM-dependent methyltransferase and pagO, which protects bacteria from phagocytosis. Taken together, these genes are likely new virulence factors contributing to the hypervirulent phenotype of hvKp, and they may deepen our understanding of the virulence mechanisms of hvKp.


INTRODUCTION
K. pneumoniae is a common opportunistic pathogen responsible for nosocomial infections such as pneumonia and urinary tract infections (Podschun and Ullmann, 1998). In the mid-1980s and 1990s, a new hypervirulent variant of K. pneumoniae (hvKp) causing invasive pyogenic liver abscess (PLA) was first described in Taiwan and was subsequently found worldwide, especially in Asia (Cheng and Lin, 1986; Siu et al., 2012; Shon et al., 2013). Unlike "classic" K. pneumoniae (cKp), these new variants exhibit enhanced virulence. In addition to PLA, hvKp also causes other invasive diseases, including abscesses at other sites (e.g., eyes, brain, prostate, and kidney), necrotizing fasciitis, and severe pneumonia with bacteremia (Hu et al., 1999; Saccente, 1999; Hyun et al., 2014; Kim et al., 2014). Although hvKp infection often occurs in diabetic patients, a particularly disconcerting problem is its ability to cause community-acquired, life-threatening infections in young and healthy individuals (Pomakova et al., 2012).
hvKp strains often form colonies with a hypermucoviscous phenotype, which can be defined semi-quantitatively by a positive "string test," a method widely used to identify hvKp (Siu et al., 2012). However, not all hvKp strains are hypermucoviscous; non-hypermucoviscous hvKp have been reported (Tan et al., 2014; Qu et al., 2015). Serotyping and sequence typing of hvKp have also been widely reported. Most hvKp have K1 or K2 capsular serotypes, and the capsule with its mucoid phenotype protects bacteria from phagocytosis and the bactericidal effect of serum (Lin et al., 2004; Yeh et al., 2010). However, hvKp strains of non-K1/K2 serotypes have also been reported, and some K1/K2 strains are cKp (Mizuta et al., 1983; Yu et al., 2008; Brisse et al., 2009). Several sequence types (STs) have been shown to be associated with hvKp. For example, ST23 is most strongly associated with the K1 serotype, whereas ST86 and ST65 are often associated with the K2 serotype (Chung et al., 2008; Siu et al., 2011; Lin et al., 2014). hvKp strains of other STs have also been reported (Merlet et al., 2012; Luo et al., 2014). Overall, it is difficult to define an hvKp strain solely on the basis of colony phenotype, serotype, or sequence type.
Several virulence factors have been identified as associated with hvKp, including the iron acquisition systems salmochelin (iroBCDN) and aerobactin (iucABCDiutA), and the regulator of mucoid phenotype A genes (rmpA/A2) (Hsieh et al., 2008; Hsu et al., 2011). Both the salmochelin/aerobactin systems and rmpA/A2 are found in almost all reported hvKp strains, located on a large plasmid (Struve et al., 2015). Salmochelin/aerobactin was shown to enhance the virulence of K. pneumoniae in a mouse model (Nassif and Sansonetti, 1986). The important role of RmpA in virulence has been demonstrated in an animal model, in which deletion of rmpA decreased virulence 1000-fold (Nassif et al., 1989). However, introducing these factors into a plasmid-cured hvKp strain did not fully restore the hypervirulent phenotype (Nassif et al., 1989), indicating that additional virulence factors remain to be identified.
Abbreviations: LA, liver abscess; PLA, pyogenic liver abscess; KLA, Klebsiella pneumoniae-caused pyogenic liver abscess; LA-Kp, liver abscess-causing K. pneumoniae; hvKp, hypervirulent variants of K. pneumoniae; cKp, "classic" K. pneumoniae.
The first complete genome sequence of hvKp was reported for NTUH-K2044 (ST23, K1 serotype), the most studied liver abscess-causing K. pneumoniae (LA-Kp), isolated in Taiwan (Wu et al., 2009). NTUH-K2044 harbors a 224-kb plasmid (pK2044) carrying the virulence genes iroBCDN, iucABCDiutA, and rmpA/A2. pK2044 is similar to pLVPK, a plasmid found in the hvKp strain CG43 that is required for K. pneumoniae-caused pyogenic liver abscess (KLA) (Chen et al., 2004). Recently, Holt et al. analyzed genome sequences of invasive and non-invasive K. pneumoniae strains and found that the presence of rmpA/A2 and the iron acquisition systems (iroBCDN, iucABCDiutA) is significantly associated with K. pneumoniae causing invasive human infection (Holt et al., 2015). Another recent comparative genomic analysis of clonal complex 23 (CC23) K. pneumoniae isolates (a group of hvKp strains) and non-CC23 K. pneumoniae strains found that all CC23 strains harbored a pK2044-like plasmid encoding iroBCDN, iucABCDiutA, and rmpA (Struve et al., 2015). Overall, these genomic analyses support the notion that hvKp strains carry a large plasmid and that the plasmid-encoded iroBCDN, iucABCDiutA, and rmpA/A2 virulence genes are tightly associated with hvKp. In addition, several gene clusters have been proposed to be associated with hvKp (Struve et al., 2015). Nevertheless, whether all hvKp strains contain a pK2044-like plasmid, and whether other genes on these plasmids are conserved among hvKp strains, remain unclear.
Because the string test, serotyping, and sequence typing cannot definitively identify a K. pneumoniae isolate as hvKp, we took clinical features into account in addition to microbiological phenotypes, as recently proposed by Siu et al. (2012) and Shon et al. (2013). Given that liver abscess is the representative clinical syndrome of hvKp infection, in this study we focused solely on hvKp isolates cultured from recent LA subcutaneous drainage samples, which provided clearly defined hvKp strains. We then performed clinical, molecular, and genomic studies, including genome sequencing of two representative LA-Kp clinical isolates. To the best of our knowledge, this is the first report of LA-Kp genome sequences from mainland China. We found that LA-Kp carry a diverse range of plasmids, and we identified 21 new genes associated with LA-Kp, which may deepen our understanding of the virulence mechanisms of hvKp in general.

Ethics Statement
The CT-guided LA subcutaneous drainage and sample collection were performed with written informed consent from LA patients between January 2014 and January 2016, and were approved by the Ethics Committees of the Fifth Affiliated Hospital of Wenzhou Medical University and Nanjing First Hospital. The operating procedures were conducted in accordance with the national guidelines in China (Li and He, 2001). Isolates from blood, sputum, and urine samples were collected as part of the routine clinical management of patients, according to the national guidelines in China (Shang et al., 2015); therefore, informed consent was not sought for these samples.

Clinical Bacterial Isolation
In addition to LA-causing bacteria, clinical K. pneumoniae isolates from blood, sputum, and urine samples were included for comparative PCR screening of LA-associated genes. Blood samples were taken from patients who had bloodstream infections but no LA. Sputum and urine samples were taken from patients who had pneumonia or urinary tract infections but no bacteremia. LA drainage, sputum, and urine samples were plated on Columbia blood agar plates and incubated at 37 °C for bacterial isolation. Blood samples were first incubated in blood culture bottles, and bacterial growth was detected with the BacT/ALERT system. Positive blood cultures were then plated on Columbia blood agar plates and incubated at 37 °C for bacterial isolation. Isolates from positive cultures were identified with the VITEK 2 Compact System (bioMérieux, France). All isolates were stored in 25% (v/v) glycerol broth at −80 °C until use.

Antimicrobial Susceptibility Testing
Minimum inhibitory concentrations (MICs) of antimicrobial agents were determined by the agar dilution method and interpreted following the recommendations of the Clinical and Laboratory Standards Institute (CLSI, Wayne, PA, USA). Escherichia coli ATCC 25922 was used as the quality control strain.
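As a rough illustration of how agar dilution plates are read, the sketch below takes hypothetical growth observations on a twofold dilution series, reports the MIC as the lowest concentration with no visible growth, and assigns S/I/R categories against breakpoints supplied by the caller. The function names and the breakpoint values in the example are placeholders for illustration only, not CLSI values; actual interpretation must use the current CLSI tables for each drug and organism. The sketch also assumes growth is monotonic across the series (no "skipped" plates).

```python
from typing import Dict, Optional


def agar_dilution_mic(growth_by_conc: Dict[float, bool]) -> Optional[float]:
    """Lowest concentration (mg/L) at which no visible growth occurred.

    Assumes growth is monotonic across the dilution series. Returns None when
    the isolate grew on every plate (MIC above the highest concentration tested).
    """
    no_growth = [conc for conc, grew in growth_by_conc.items() if not grew]
    return min(no_growth) if no_growth else None


def interpret_mic(mic: Optional[float], s_max: float, r_min: float) -> str:
    """Map an MIC to S/I/R using caller-supplied breakpoints (S <= s_max, R >= r_min)."""
    if mic is None:
        # MIC above the tested range; treated as R here, which assumes the
        # highest concentration tested is at least r_min.
        return "R"
    if mic <= s_max:
        return "S"
    if mic >= r_min:
        return "R"
    return "I"


# Hypothetical plate readings for one isolate and one drug (True = growth).
plates = {0.25: True, 0.5: True, 1.0: False, 2.0: False, 4.0: False}
mic = agar_dilution_mic(plates)
print(mic, interpret_mic(mic, s_max=2.0, r_min=8.0))  # placeholder breakpoints
```

Keeping the breakpoints as parameters rather than hard-coding them reflects that they differ by drug and are revised periodically.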

String Test
The hypermucoviscous phenotype of K. pneumoniae was identified by a positive string test, defined as the formation of a viscous string >5 mm in length when a colony grown overnight on a sheep blood agar plate at 37 °C is stretched with a bacteriology inoculation loop (Siu et al., 2012).

Multilocus Sequence Typing (MLST) and Serotyping of K. pneumoniae
MLST was performed by PCR amplification and subsequent sequencing of seven housekeeping genes of K. pneumoniae according to the protocols provided on the MLST website for K. pneumoniae (http://bigsdb.pasteur.fr/klebsiella/klebsiella.html). The capsular serotype of each K. pneumoniae isolate was determined by PCR of serotype-specific genes for the K1, K2, K5, K20, K54, and K57 serotypes, as previously reported (Turton et al., 2010).
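Once the seven alleles of an isolate have been called, its sequence type is the ST whose allele profile matches exactly; an unmatched profile indicates a new ST. The sketch below shows that lookup. The seven loci are those of the Pasteur K. pneumoniae scheme (gapA, infB, mdh, pgi, phoE, rpoB, tonB); the profile order used here is an assumption, and the profile table and ST labels are hypothetical placeholders rather than real database entries.

```python
from typing import Dict, Optional, Tuple

# Loci of the K. pneumoniae MLST scheme; the order is assumed to match the profile table.
LOCI = ("gapA", "infB", "mdh", "pgi", "phoE", "rpoB", "tonB")


def assign_st(alleles: Dict[str, int], profiles: Dict[Tuple[int, ...], str]) -> Optional[str]:
    """Return the ST whose allele profile matches exactly, or None for a new profile."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"missing allele calls for: {', '.join(missing)}")
    profile = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(profile)


# Hypothetical profile table; real profiles come from the scheme's database.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST_example_A",
    (2, 1, 3, 1, 1, 4, 2): "ST_example_B",
}

isolate = dict(zip(LOCI, (2, 1, 3, 1, 1, 4, 2)))
print(assign_st(isolate, PROFILES))
```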

PCR Detection of the Targeted Genes
Genomic DNA was extracted from all K. pneumoniae isolates using a bacterial genomic DNA extraction kit (TIANGEN Biotech, Beijing, China). Target genes were amplified by polymerase chain reaction (PCR). PCR primers for each target gene are listed in Table S1.

S1-Pulsed-Field Gel Electrophoresis (S1-PFGE) for Detecting and Sizing Plasmids of LA-Kp
S1-PFGE was carried out to detect and size the endogenous plasmids in all clinical LA-Kp isolates as described previously (Barton et al., 1995). Briefly, total DNA of LA-Kp was embedded in agarose gel plugs. The plugs were digested with S1 nuclease (TaKaRa), and the DNA was separated by PFGE. Genomic DNA of Salmonella enterica serovar Braenderup H9812 digested with XbaI was used as a molecular size standard.
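Plasmid sizes on a PFGE gel are commonly estimated by interpolating between marker bands on a semi-logarithmic scale, i.e., log(size) is treated as linear in migration distance between adjacent bands. The sketch below assumes that approach; the function name and the marker distances and sizes in the example are hypothetical and are not the actual H9812/XbaI band values.

```python
import math
from typing import List, Tuple


def estimate_size_kb(distance_mm: float, ladder: List[Tuple[float, float]]) -> float:
    """Estimate fragment size by log-linear interpolation between marker bands.

    ladder holds (migration distance in mm, size in kb) for each marker band.
    Larger fragments migrate less, so sorting by distance orders sizes descending.
    """
    bands = sorted(ladder)
    for (d1, s1), (d2, s2) in zip(bands, bands[1:]):
        if d1 <= distance_mm <= d2 and d2 > d1:
            frac = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the marker range; not extrapolating")


# Hypothetical marker bands (mm, kb); not the real H9812 values.
ladder = [(10.0, 400.0), (20.0, 200.0), (30.0, 100.0)]
print(round(estimate_size_kb(25.0, ladder), 1))
```

Bands outside the marker range are rejected rather than extrapolated, since the log-linear relationship is only a local approximation.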

Genome Sequencing
Genomic DNA of clinical K. pneumoniae isolates GN-2, GN-3, and XL-1 was extracted using a bacterial genomic DNA extraction kit and sequenced on the Illumina MiSeq platform as described previously (Etienne et al., 2012). Fragment libraries were constructed using the Nextera kit (Illumina, San Diego, CA), followed by 300-bp paired-end sequencing on a MiSeq sequencer (Illumina) according to the manufacturer's instructions. Sequencing reads were assembled using SPAdes v3.5 with default parameters, and only contigs longer than 500 nucleotides were retained (Bankevich et al., 2012). Genes were predicted and annotated using the PATRIC online tool (Wattam et al., 2014).
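The 500-nt contig cut-off can be applied to the assembled contigs as a simple post-processing step. The sketch below is a hypothetical illustration (it does not assume any particular SPAdes option): it parses FASTA lines, keeps contigs strictly longer than 500 nt, and reports the contig count and total assembly size, the two figures given per genome in the Results. The N50 helper is an added, commonly used assembly metric that the text does not report.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(records: Iterable[Tuple[str, str]], min_len: int = 500) -> Dict[str, str]:
    """Keep contigs strictly longer than min_len nucleotides."""
    return {name: seq for name, seq in records if len(seq) > min_len}


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    half, running = sum(lengths) / 2, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Toy assembly: c2 (400 nt) is dropped; c3 spans two sequence lines (1000 nt).
fasta = [">c1 cov=30", "A" * 600, ">c2", "C" * 400, ">c3", "G" * 500, "T" * 500]
contigs = filter_contigs(read_fasta(fasta))
lengths = [len(s) for s in contigs.values()]
print(len(contigs), sum(lengths), n50(lengths))
```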

Genome Comparative Analysis
To identify highly conserved genes specifically present in LA-Kp, three LA-Kp genomes (GN-2, GN-3, and NTUH-K2044) and three non-LA-Kp genomes (XL-1, HS11286, and MGH78578) were compared by whole-genome orthologous gene analysis. To reduce bias introduced by different annotation platforms or parameters, the genomes of NTUH-K2044, HS11286, and MGH78578 were re-annotated using the PATRIC online tool. The protein sequences of the six K. pneumoniae strains were clustered into orthology clusters using orthoMCL (v1.4) with default parameters (Li et al., 2003). Genes in orthology clusters containing only genes from LA-Kp were considered LA-associated genes; these were identified using in-house scripts. BLASTN atlases of the chromosomes and plasmids of GN-2, GN-3, CG43, and NTUH-K2044 were constructed using the BLAST Ring Image Generator (BRIG v0.95) with default parameters (Alikhan et al., 2011). Genome sequences of K. pneumoniae isolates from liver abscess (n = 45), blood (n = 44), and sputum and urine (n = 59) (Table S2) were downloaded from the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) (Bialek-Davenet et al., 2014; Holt et al., 2015; Struve et al., 2015). These genome sequences were screened for LA-specific genes by read mapping with SRST2 (Inouye et al., 2014).
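The in-house scripts are not described, so the following is only a minimal sketch of the selection rule, assuming the orthoMCL clusters have already been parsed into a dictionary that maps each cluster ID to its (strain, gene) members; the parsing step and the function name are hypothetical. A cluster is kept when it contains no gene from a non-LA-Kp genome. The optional stricter mode also requires the cluster to include all three LA-Kp genomes, which is our reading of the Results' description of genes "shared by 3 LA-Kp isolates".

```python
from typing import Dict, List, Tuple

LA_KP = {"GN-2", "GN-3", "NTUH-K2044"}
NON_LA_KP = {"XL-1", "HS11286", "MGH78578"}


def la_associated_clusters(
    clusters: Dict[str, List[Tuple[str, str]]], require_all_la: bool = True
) -> List[str]:
    """Return IDs of orthology clusters made up only of LA-Kp genes."""
    selected = []
    for cluster_id, members in clusters.items():
        strains = {strain for strain, _gene in members}
        if not strains or not strains <= LA_KP:
            continue  # empty, or contains a gene from a non-LA-Kp (or unknown) genome
        if require_all_la and strains != LA_KP:
            continue  # not present in every LA-Kp genome
        selected.append(cluster_id)
    return sorted(selected)


clusters = {
    "c1": [("GN-2", "g1"), ("GN-3", "g2"), ("NTUH-K2044", "g3")],
    "c2": [("GN-2", "g4"), ("XL-1", "g5")],
    "c3": [("GN-2", "g6")],
}
print(la_associated_clusters(clusters))                        # strict mode
print(la_associated_clusters(clusters, require_all_la=False))  # LA-only, any subset
```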

Functional Comparative Analysis
Protein sequences of the three LA-Kp (GN-2, GN-3, and NTUH-K2044) and three non-LA-Kp (XL-1, HS11286, and MGH78578) strains were aligned to the latest version of the Clusters of Orthologous Groups (COG) database using BLAST (v2.2.26; E-value < 1e−5), and the best hits were selected as the COG annotations (Ye et al., 2006; Galperin et al., 2015). The COG clusters were then assigned to functional COG categories. Virulence factors were annotated with the PATRIC online tool using the PATRIC VF database. The presence of antibiotic resistance genes in these six K. pneumoniae strains was investigated using the web-based tool ResFinder (http://cge.cbs.dtu.dk/services/ResFinder) (Zankari et al., 2012).
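The best-hit step can be sketched as follows, assuming the alignments were written in BLAST's 12-column tabular format (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+), where column 11 is the E-value and column 12 the bit score. The function name is hypothetical, and mapping each COG protein ID to its category requires a separate lookup table that is not shown.

```python
from typing import Dict, Iterable, Tuple


def best_cog_hits(blast_lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query protein, the subject with the highest bit score below max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # fails the E < 1e-5 threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


hits = [
    "p1\tcogA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "p1\tcogB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150",
    "p2\tcogC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t40",
]
print(best_cog_hits(hits))
```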

Nucleotide Sequence Accession Numbers
The genome sequences of the three K. pneumoniae isolates sequenced in this study are available at DDBJ/ENA/GenBank under BioProject PRJNA349219.
As shown in Table 2, all 40 patients with K. pneumoniae-caused liver abscess (KLA) had community-acquired infections, and 55% (22/40) had underlying diabetes. Most abscesses (67.5%) occurred in the right lobe of the liver. Eight patients (20.0%) had metastatic infections, including 4 with lung abscess, 3 with eye abscess, and 1 with endocarditis. Blood leukocyte counts and C-reactive protein were elevated in 32 (80.0%) and 39 (97.5%) LA patients, respectively. Blood tests showed abnormal liver function in 35 KLA patients. No patients died after appropriate treatment.

Microbiological Characteristics of LA-Kp
All 40 LA-Kp isolates were susceptible to 10 of the antimicrobials tested: cefazolin, cefepime, cefotetan, aztreonam, levofloxacin, ciprofloxacin, amikacin, imipenem, piperacillin-tazobactam, and ampicillin-sulbactam. All LA-Kp isolates were resistant to ampicillin, and two isolates were resistant to SMZ-TMP.

Genome Sequencing and Analysis of hvKp Associated Genes on Chromosome
To further investigate the molecular and genomic features of LA-Kp, whole genome sequencing and analysis were conducted for two LA-Kp isolates, GN-2 and GN-3, and, for comparison, one multidrug-resistant (MDR) cKp isolate from blood, XL-1. GN-3 is a typical hvKp strain of sequence type ST23 and serotype K1, similar to the well-studied LA-Kp strain NTUH-K2044. GN-2 (ST485, K5) was selected because over 35% of the LA-Kp isolates in our study are atypical, and ST485 has not previously been reported as an hvKp sequence type. About 1.9, 1.6, and 1.7 million pairs of 300-bp paired-end reads were generated and assembled into 54, 52, and 83 contigs, with final genome sizes of 5.61, 5.74, and 5.53 Mbp for GN-2, GN-3, and XL-1, respectively. Detailed information on the sequenced genomes is listed in Table 3. Several chromosomal genes or clusters have previously been reported to be associated with hvKp. We mapped these genes/clusters to the genomes of GN-2 and GN-3, as well as to two other LA-causing K. pneumoniae strains, NTUH-K2044 and CG43, whose complete genome sequences are available. The kfu region, the allS region, and the 56-kb putative pathogenicity island kpc were reported to be associated with virulence in NTUH-K2044 (Chou et al., 2004; Ma et al., 2005; Wu et al., 2010). All three regions were found in GN-3 but were not detected in GN-2 or CG43 (Figure 2A). An additional 10 genomic regions (R1–R10 in Figure 2A), comprising 72 genes, were previously reported to be associated with CC23 (Struve et al., 2015), a group of hvKp strains most of which are of sequence type ST23. In our analysis, all 72 genes were present in GN-3 and NTUH-K2044 (both ST23, K1). However, only 3 of these genes were detected in GN-2 (ST485, K5) and CG43 (ST86, K2): two genes encoding CRISPR-associated proteins and one encoding a hypothetical protein.
Frontiers in Cellular and Infection Microbiology | www.frontiersin.org
FIGURE 1 | Microbiological characteristics of 40 K. pneumoniae isolates from liver abscess drainage. The phylogenetic tree was derived from MLST analysis of 7 housekeeping genes of K. pneumoniae. K. pneumoniae NTUH-K2044 was included in the analysis and is marked by an asterisk. NT, non-typeable. The straight line below the figure represents the phylogenetic distance among isolates.

LA-Kp Isolates Harbor Plasmid(s) Varying in Size and Number
Sequence analysis showed that GN-2 and GN-3 each harbored a plasmid (designated pGN-2 and pGN-3, respectively) (Wu et al., 2009). pGN-2 and pGN-3 were then compared with pK2044 and pLVPK. As shown in Figure 2B, aerobactin (iucABCDiutA), salmochelin (iroBCDN), rmpA, and rmpA2, previously reported to be associated with LA-Kp, were all detected on pGN-2 and pGN-3. pGN-3 (purple) aligned well with pK2044 (red) and pLVPK (pink). However, the sequence of pGN-2 (blue) differed by nearly 40% from the other three plasmids. This variability of pGN-2 implies that only a subset of the genes on pK2044 and pK2044-like virulence plasmids is required for the hypervirulent phenotype of LA-Kp.

Functional Comparative Analyses Revealed Differences between LA-Kp and Non-LA-Kp
Functional analysis was conducted by clustering annotated genes into 25 Clusters of Orthologous Groups (COG) categories; more than 80% of genes were assigned to COGs. As shown in Figure 4, seven COG categories differed significantly between LA-Kp and non-LA-Kp isolates. The numbers of genes in the categories "mobilome: prophages, transposons" and "cell cycle control, cell division, chromosome partitioning" were significantly lower in LA-Kp than in non-LA-Kp. In contrast, LA-Kp isolates had significantly more genes than non-LA-Kp strains in the other five categories: "carbohydrate transport and metabolism," "amino acid transport and metabolism," "coenzyme transport and metabolism," "secondary metabolites biosynthesis, transport and catabolism," and "general function prediction only." We also compared the distribution of antimicrobial resistance (AMR) genes between the 3 LA-Kp and 3 non-LA-Kp strains. The LA-Kp strains carried only one chromosomal antibiotic resistance gene (Table S3), encoding the SHV beta-lactamase responsible for ampicillin resistance (GN-2 additionally carried the fosfomycin resistance gene fosA). In addition to the chromosomal SHV beta-lactamase gene, the three non-LA-Kp isolates carried several plasmid-encoded drug resistance genes, such as blaKPC-2, blaTEM-1B, and blaCTX-M-14 (Table S3).
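The per-category comparison starts from a simple tally of genes per COG category in each genome. The sketch below shows only that tabulation and the group means, assuming each gene has already been assigned a single COG category letter (genes with multi-letter assignments would need a rule not described here). The statistical test behind "significantly" is not specified in the text, so none is implemented; the function names and toy annotations are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Iterable


def category_counts(gene_to_category: Dict[str, str]) -> Counter:
    """Number of genes assigned to each COG category letter in one genome."""
    return Counter(gene_to_category.values())


def group_mean(genomes: Dict[str, Dict[str, str]], members: Iterable[str], category: str) -> float:
    """Mean number of genes in `category` across the genomes listed in `members`."""
    return mean(category_counts(genomes[g])[category] for g in members)


# Toy annotations: X = mobilome, G = carbohydrate transport and metabolism.
genomes = {
    "LA_1": {"g1": "G", "g2": "G", "g3": "X"},
    "LA_2": {"g1": "G", "g2": "G", "g3": "G"},
    "nonLA_1": {"g1": "X", "g2": "X", "g3": "G"},
    "nonLA_2": {"g1": "X", "g2": "X", "g3": "X"},
}
la, non_la = ["LA_1", "LA_2"], ["nonLA_1", "nonLA_2"]
for cat in ("G", "X"):
    print(cat, group_mean(genomes, la, cat), group_mean(genomes, non_la, cat))
```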

Whole Genome Orthologous Gene Comparative Analysis Revealed New LA-Associated Virulence Genes
We first identified the virulence factors annotated in the PATRIC_VF database and compared them between 3 LA-Kp strains (GN-2, GN-3, NTUH-K2044) and 3 non-LA-Kp strains (XL-1, HS11286, and MGH78578) (Table 3). This comparison identified 10 virulence genes present only in LA-Kp, including the known virulence factors aerobactin (iucABCDiutA) and salmochelin (iroBCDN), and pagO, a gene not previously shown to be associated with LA (Table S4).
Because the virulence factors recorded in existing databases are limited, whole-genome orthologous gene comparative analysis was conducted for the 3 LA-Kp and 3 non-LA-Kp isolates (Table 3). Of the 4649 orthologous genes shared by the 3 LA-Kp isolates, 133 were present only in LA-Kp genomes (Table S5). Strikingly, 84.9% of these genes (113 of 133) were located on the large virulence plasmid. To further examine the specificity of these 133 LA-associated genes, they were screened in 148 K. pneumoniae genome sequences obtained from public databases: 45 from LA, 44 from bloodstream, and 59 from sputum and urine (general information on these strains is listed in Table S2). A gene present in more than 75% of LA-Kp genomes was considered highly prevalent. Of the 133 genes, 129 were highly prevalent in LA-Kp isolates. However, 63 of these 129 genes were present in more than 25% of isolates from blood (Table S6), indicating that not all of the 133 genes are specific to LA-Kp.
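The two prevalence thresholds can be expressed as a small screening function. The sketch below assumes each gene's screening result is available as a list of presence/absence calls per isolate (for example, derived from SRST2 reports); the function names and the example counts are hypothetical.

```python
from typing import Dict, List


def prevalence(presence: List[bool]) -> float:
    """Fraction of isolates in which the gene was detected."""
    return sum(presence) / len(presence)


def screen_gene(
    la: List[bool], blood: List[bool], la_min: float = 0.75, blood_max: float = 0.25
) -> Dict[str, bool]:
    """Apply the >75% (LA-Kp) and >25% (blood) occurrence thresholds to one gene."""
    return {
        "high_prevalence_in_la": prevalence(la) > la_min,
        "frequent_in_blood": prevalence(blood) > blood_max,
    }


# Hypothetical gene: detected in 40 of 45 LA-Kp genomes and 12 of 44 blood genomes.
result = screen_gene([True] * 40 + [False] * 5, [True] * 12 + [False] * 32)
print(result)
```

A gene like this one would count among the 129 highly prevalent genes but also among the 63 that are frequent in blood isolates, which is why prevalence alone was not used to define specificity.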
To identify genes specific to LA-Kp, we used the accuracy rate as an indicator. The accuracy rate was calculated as the sum of true positives (isolates in the LA-Kp group carrying the gene) and true negatives (isolates in the non-LA-Kp group lacking the gene), divided by the total number of isolates examined. A gene with an accuracy rate greater than 85% was designated a highly associated gene. Based on accuracy rates calculated for 45 LA-Kp and 44 blood K. pneumoniae isolates, 30 of the 133 genes were identified as highly LA-associated genes (Table 4). Of these 30 genes, 9 were iroBCD, iucABCD, and rmpA/A2. In addition to pagO identified above, 20 newly identified LA-associated genes showed distributions in LA-Kp and non-LA-Kp blood isolates closely similar to those of iroBCD, iucABCD, and rmpA/A2 (Figure 5; genes of unknown function are not shown). Of the 30 genes, 27 are located on the plasmid and 3 on the chromosome (Table 4). Furthermore, these 30 genes showed low prevalence (6–10%) among 59 genome sequences of K. pneumoniae isolates from sputum and urine, which are generally considered cKp (Figure 5). Their accuracy rates in distinguishing LA-Kp from K. pneumoniae isolated from sputum and urine were high (88–94%) (Table S7). Thus, the 21 newly identified genes are highly associated with LA-Kp.
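In formula form, accuracy = (TP + TN) / (N_LA + N_non-LA), where gene presence is treated as a prediction of "LA-Kp". For a hypothetical gene detected in 44 of 45 LA-Kp isolates and in 5 of 44 blood isolates, TP = 44 and TN = 39, giving 83/89 ≈ 0.933, above the 85% cut-off. A minimal sketch of that calculation follows; the function name and counts are illustrative assumptions.

```python
from typing import List


def accuracy_rate(la_presence: List[bool], non_la_presence: List[bool]) -> float:
    """(TP + TN) / total, treating gene presence as a prediction of 'LA-Kp'."""
    true_pos = sum(la_presence)                     # LA-Kp isolates carrying the gene
    true_neg = sum(not p for p in non_la_presence)  # non-LA-Kp isolates lacking the gene
    total = len(la_presence) + len(non_la_presence)
    return (true_pos + true_neg) / total


# Hypothetical gene: present in 44/45 LA-Kp and 5/44 blood isolates.
acc = accuracy_rate([True] * 44 + [False], [True] * 5 + [False] * 39)
print(f"{acc:.3f}", acc > 0.85)
```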
Among the 21 predicted LA-associated genes (Table 4), pagO encodes a PhoPQ-activated integral membrane protein and is required for virulence in Salmonella spp. (Gunn et al., 1998). A genetic screen suggested that pagO in LA-Kp may be required for liver abscess induction in a mouse model (Tu et al., 2009). A SAM-dependent methyltransferase has been reported to play an important role in the adaptation of Mycobacterium tuberculosis to the hostile environment of the macrophage and to acid stress (Healy et al., 2016). In E. coli, shiF has been proposed to be an auxiliary gene that promotes the transport of lysine, the precursor of aerobactin (synthesized by iucABCD), and its expression is upregulated upon exposure to chicken serum or an iron-deficient environment (Lemaitre et al., 2012). In K. pneumoniae, shiF is located adjacent to iucABCD, which further supports its potential role in virulence. LuxR is a key quorum-sensing regulator that controls virulence gene expression in many bacteria, such as Vibrio alginolyticus and Xanthomonas oryzae (Xu et al., 2015; Gu et al., 2016). Several other genes, including those related to hemin and the fecIRA gene cluster, are associated with iron acquisition, suggesting that they may play a key role in hvKp pathogenicity similar to aerobactin and salmochelin. The remaining genes encode mobile elements and phage-related proteins, which may contribute to integration of the plasmid into the chromosome.

Distribution of Newly Identified LA-Associated Virulence Genes among Clinical K. pneumoniae Isolates
To confirm the specificity of the LA-associated virulence genes identified by bioinformatic analysis, PCR analysis was performed on a total of 126 clinical K. pneumoniae isolates: 40 from liver abscesses, 50 from blood, and 36 from sputum or urine of patients without bloodstream infection. As shown in Figure 6, in addition to iroBCD, iucABCD, and rmpA/A2, four newly identified LA-associated genes (the gene encoding SAM-dependent methyltransferase, pagO, luxR, and shiF) were present in 100% of LA-Kp isolates and in only 2–11% of K. pneumoniae isolates from sputum or urine. The prevalence of the other newly predicted LA-associated genes, including ibrB, the fecIRA cluster, wcaJ, and genes encoding an Fe3+-citrate ABC transporter, lysozyme, CP4-like integrase, and alginate lyase, was also significantly higher in K. pneumoniae isolates from liver abscesses (80–97%) than in isolates from sputum or urine (2–11%).

DISCUSSION
We have only begun to understand the nature of hvKp, and many questions about the hypervirulence of this pathogen remain to be answered. In this study, we isolated 40 LA-Kp strains from recent PLA cases in mainland China. Over 90% of the pathogens isolated from surgical drainage of PLA were K. pneumoniae. The majority of patients had underlying diabetes. The vast majority of LA-Kp isolates were susceptible to the major antimicrobial agents tested. K1/K2 serotypes accounted for most LA-Kp isolates, and ST23 was the dominant sequence type. One important finding of the sequencing and S1-PFGE analyses in this study is that plasmid profiles are diverse among hvKp strains. It has been thought that all hvKp strains harbor a pK2044-like plasmid carrying genes encoding siderophores (aerobactin/salmochelin) and rmpA/A2, which are critical for the virulence of LA-Kp (Struve et al., 2015). We first identified a plasmid in hvKp, pGN-2, that differs dramatically from pK2044. We then examined the plasmid profiles of all 40 LA-Kp clinical isolates by S1-PFGE and found that 77.5% of the LA-Kp isolates harbor a single pK2044-like large virulence plasmid, 12.5% have no plasmid, and 10% contain two or three plasmids. To the best of our knowledge, this phenomenon has not previously been reported in LA-Kp or any other hvKp strains. This is important because it has been postulated that hvKp strains resist acquiring plasmids, which may account for the observation that hvKp tend to be highly susceptible to antimicrobials (Alcántar-Curiel and Giron, 2015). Our finding that the large virulence plasmid can coexist with other plasmids suggests that hvKp can acquire additional plasmids. A recent study reported that blaCTX-M-carrying plasmids can be acquired by hvKp. This raises concern about multidrug-resistant hvKp infections.
For the 5 LA-Kp isolates that contain no plasmid, we hypothesize that the virulence plasmid has integrated into the chromosome, because all of these isolates carry aerobactin, salmochelin, and rmpA/A2. Tang et al. also reported several plasmidless K1 and K54 hvKp isolates in Taiwan and demonstrated that pLVPK (a plasmid similar to pK2044) was integrated into the chromosome of these isolates (designated the chromosome-integrated form) (Tang et al., 2010). Among our isolates with a putatively chromosome-integrated plasmid, 3 belonged to serotype K5 and 2 to K1. Given that a large plasmid is likely difficult for the cell to maintain, integration of the virulence plasmid into the chromosome may help hvKp retain its virulence factors and invasive nature.
Besides aerobactin, salmochelin, and rmpA/A2, little is known about other virulence genes in hvKp. Previous studies suggested that the chromosomally encoded allS gene cluster, kfu gene cluster, and kpc fimbrial gene cluster are associated with virulence in hvKp strains of the K1 serotype (Chou et al., 2004; Ma et al., 2005; Wu et al., 2010). A recent comparative genomic study focusing on K. pneumoniae CC23 isolates suggests that these clusters are mainly associated with CC23 or the K1 serotype. CC23 and K1 serotype strains also include some non-hvKp (Liu et al., 2014). In our analysis, the allS, kfu, and kpc fimbrial clusters and the 10 genomic regions were not detected in GN-2 (ST485, K5) or K. pneumoniae CG43 (ST86, K2). Thus, these clusters may contribute to virulence in CC23 or K1 serotype isolates, or they may be conserved in these lineages without being associated with virulence.
FIGURE 6 | PCR analysis of the distribution of LA-associated genes among 126 clinical K. pneumoniae isolates. Genes/gene products in the GRAY background area are located on the large virulence plasmid, and genes in the SILVER background area are located on the chromosome. Genes/gene products within the RED box are the newly identified LA-associated genes.
In our study, over 40% of LA-Kp clinical isolates are non-CC23. Therefore, we performed a comparative genomic analysis between LA-Kp (including non-CC23 isolates) and non-LA-Kp to identify virulence factors of LA-Kp. First, we compared the known virulence genes in the PATRIC virulence database between 3 LA-Kp and 3 non-LA-Kp strains and identified a new LA-Kp-associated virulence factor, pagO. Because the virulence genes in existing databases are limited, we then compared all genes in the 3 LA-Kp strains with those in the 3 non-LA-Kp strains based on their orthologous relationships, and identified 133 genes present only in LA-Kp, including iucABCD, iroBCD, and pagO. We further tested the specificity of the 133 genes for LA-Kp by their prevalence in 45 LA-Kp and 103 non-LA-Kp genome sequences, which led to the identification of 21 new genes highly associated with LA-Kp. Finally, we tested the prevalence of the newly predicted genes in 40 LA-Kp and 86 non-LA-Kp clinical isolates by PCR. The genes encoding SAM-dependent methyltransferases, pagO, luxR, and shiF were present in 100% of LA-Kp isolates but in only 2–11% of isolates from sputum or urine.
The other genes tested were present in 80–97% of LA-Kp isolates and in only 2–11% of isolates from sputum or urine. Thus, the specificity results in clinical isolates support the bioinformatic predictions.
Several of the 21 newly predicted LA-associated genes identified in this study have been shown to be involved in virulence in other bacterial species, supporting the hypothesis that these genes are potential virulence genes in LA-Kp. A genetic screening study showed that one of these genes, pagO, is involved in liver abscess formation by LA-Kp (Tu et al., 2009). In that study, an oral infection model was established for liver abscess formation by LA-Kp, and screening of a signature-tagged transposon mutant library identified 28 genes whose mutation reduced the ability to cause liver abscess. However, no complementation experiment was performed to confirm the mutant phenotypes. One caveat of our study is that the roles of these predicted virulence factors in LA formation have not been tested genetically. We are currently performing genetic studies to elucidate the functions of these genes in LA-Kp virulence.

AUTHOR CONTRIBUTIONS
MY, JT, MW, and JH designed the study. MY and JT drafted the manuscript. JT, YZ, ZC, ZY, CS, JY, and XZ collected clinical isolates and clinical data. JJ conducted the bioinformatic analyses. MY, YB, WY, JR, TZ, ZS, and BD carried out experiments. QG and XX raised several useful suggestions. All authors read and approved the final manuscript.