Allelic Variation in Outer Membrane Protein A and Its Influence on Attachment of Escherichia coli to Corn Stover

Understanding the genetic factors that govern microbe-sediment interactions in aquatic environments is important for water quality management and reduction of waterborne disease outbreaks. Although chemical properties of bacteria have been identified that contribute to initiation of attachment, the outer membrane proteins that contribute to these chemical properties still remain unclear. In this study we explored the attachment of 78 Escherichia coli environmental isolates to corn stover, a representative agricultural residue. Outer membrane proteome analysis led to the observation of amino acid variations, some of which had not been previously described, in outer membrane protein A (OmpA) at 10 distinct locations, including each of the four extracellular loops, three of the eight transmembrane segments, the proline-rich linker and the dimerization domain. Some of the polymorphisms within loops 1, 2, and 3 were found to significantly co-occur. Grouping of sequences according to the outer loop polymorphisms revealed five distinct patterns that each occur in at least 5% of our isolates. The two most common patterns, I and II, are encoded by 33.3 and 20.5% of these isolates and differ at each of the four loops. Statistically significant differences in attachment to corn stover were observed among isolates expressing different versions of OmpA and when different versions of OmpA were expressed in the same genetic background. Most notable was the increased corn stover attachment associated with a loop 3 sequence of SNFDGKN relative to the standard SNVYGKN sequence. These results provide further insight into the allelic variation of OmpA and implicate OmpA in contributing to attachment to corn stover.


INTRODUCTION
Waterborne diseases such as gastroenteritis, diarrhea, typhoid fever, and dermatitis can develop when people consume or come into contact with water contaminated by pathogenic microorganisms (Rosen, 2001;Pond, 2005). In the United States, more than 562,000 miles of rivers and streams and 12.1 million acres of lakes, reservoirs and ponds are classified as impaired due to pathogens (EPA) 1 . Worldwide, approximately 1.2 million people, about one quarter of whom were under 6 years of age, died from diarrheal diseases in 2013 (Murray et al., 2014). These diarrheal deaths can be attributed to a variety of microbial causes, including viruses, bacteria and amoeba. Escherichia coli, which is the focus of the research described here, was the known causative agent in more than 10% of the cases in which the etiology was determined (Murray et al., 2014). Bacteria in aquatic environments can attach to sediment particles, contributing to their persistence and transport (Schillinger and Gannon, 1985;Song et al., 1994;Davies and Bavor, 2000;Bai and Lung, 2005;Liao et al., 2015). To guide planning for water quality management and reduction of waterborne disease outbreaks, improved understanding of microbe-particle interactions is essential.
Bacterial attachment to a surface requires overcoming the repulsive forces between the cell and the surface. This surface attachment is the initial step of the highly ordered process of biofilm formation (Donlan, 2002;Van Houdt and Michiels, 2005). A variety of genes that encode or regulate cell surface structures have been demonstrated to be involved in biofilm formation, including, for example, motA, crsA, fimA, ag43, csgA, ompX, and pgaABCD (Pratt and Kolter, 1998;Vidal et al., 1998;Hare and McDonough, 1999;Danese et al., 2000;Jackson et al., 2002;Beloin et al., 2004;Otto and Hermansson, 2004). Such genes may also affect the initiation of attachment, given the fact that increased attachment leads to more frequent establishment of biofilms (Agladze et al., 2005). The Derjaguin-Landau-Verwey-Overbeek (DLVO) theory has been reported to efficiently predict bacteria-mineral interactions, in which the interaction force is a function of both cell surface net charge and hydrophobicity (Rijnaarts et al., 1999;Hori and Matsumoto, 2010;Zhao et al., 2014). In this context, cell surface net charge and cell surface hydrophobicity are regarded as two chemical properties that contribute to microbial attachment to solids, though some studies suggest that the contribution of hydrophobicity and surface charge may differ depending on the surface in question (Rivas et al., 2007;Bolster et al., 2009;Liao et al., 2015). Our previous characterization of 78 environmental E. coli isolates shows a range of cell surface charge from −39.9 to −6.8 mV in terms of zeta potential and hydrophobicity from 0.01 to 0.9 (Liang et al., 2016). These studies also showed that while cell surface hydrophobicity can be attributed to variations in the production of extracellular polymeric substances, the variations in cell surface charge did not correlate with any of the other measured properties (Liang et al., 2016). This lack of understanding of the basis of variations in cell surface charge highlights the need for further characterization of genetic factors contributing to cell surface properties.
At the interface between the bacteria and particles, cell surface structures, such as outer membrane proteins, flagella, fimbria, and extracellular polysaccharides affect the initial attachment step by contributing to the chemical properties of the cell surface. Although biofilm development is wellcharacterized, little is known about preexisting surface structures that mediate the initial interaction between bacteria and particles in the aquatic habitats. Moreover, while there are many studies about attachment of bacteria to plant tissues and quartz (sand) in lab conditions, little is known about the genetic factors that drive the attachment of E. coli to representative stream particles under environmentally relevant conditions. Proteomic analysis has proven to be an effective method to explore the surface structures which play key roles in attachment (Otto et al., 2001;Orme et al., 2006;Rivas et al., 2008).
More than 300 million acres of US farmland were planted with corn in 2015 (USDA, 2016). Production of the corn kernel is accompanied by the production of an approximately equal mass of corn stover, which includes such components as the stalks, cobs and husks (Adler et al., 2015). Corn stover consists of roughly 34 wt% glucan, 22 wt% xylan, and 18 wt% lignin and has a crystallinity index of 50 (Kumar et al., 2009). There are a variety of uses for the stover, including but not limited to, use as animal feed or for the production of biorenewable fuels and chemicals (Golecha and Gan, 2016). Here we have selected corn stover as a representative agricultural residue likely to be present in streams and sediment.
In this study, E. coli strains isolated from a local waterway were used to characterize the outer membrane proteins (OMPs) which contribute to the attachment of E. coli to corn stover under environmentally relevant conditions. This process was guided by the use of proteomic analysis and development of a competitive attachment assay, modified from previous work (Liu et al., 2011). This analysis includes consideration of flagella and OMPs, such as Outer Membrane Protein A (OmpA). Our work demonstrates the association of sequence variations within the OmpA protein with attachment to corn stover.

Bacterial Strains, Growth Condition, and Corn Stover Particles
Seventy-eight E. coli strains were selected from a pool of 400 E. coli strains isolated from Squaw Creek in Ames, Iowa, as previously described (Liang et al., 2016). These E. coli isolates were of different genotype according to our previous repetitive sequence based-PCR (rep-PCR) and fingerprint profile analysis, using a maximum similarity cut-off of 90% (Liang et al., 2016). The E. coli lab strain MG1655 was used as a K12 control. Cells were grown in M9 medium (pH = 7.4) with 0.4% (wt/vol) glucose at 37 • C, 250 rpm to early stationary phase (OD 600 = 1.0-1.5), and then harvested by centrifugation at 4 • C, 4,000 × g for 15 min. This growth condition was selected with the intention of approximating the conditions of the mammalian hosts that are the typical source of bacteria in agricultural water systems.
This growth and harvesting condition was used in preparing cells for the attachment assay and outer membrane protein extraction, unless otherwise indicated. Cells for molecular manipulation were grown in Luria-Bertani (LB) medium (pH = 7.0), at 37 • C or 30 • C as needed. When required, the medium was supplemented with 25 µg ml −1 chloramphenicol and 100 µg ml −1 ampicillin.
Corn stover was prepared by grinding dry corn leaf. The resulting powder was sieved using U.S. standard no. 270. The resulting particles with diameter less than 53 µm were collected and stored at room temperature. This corn stover had a carbon content of 38 wt% and surface area of 3 m 2 /g (Liang et al, in preparation).

Transmission Electron Microscopy
Negative staining and TEM imaging were conducted in the Microscopy and NanoImaging Facility at Iowa State University. Briefly, 3 µl of cells in water were placed onto a carbon film coated copper grid for 1 min. The supernatant was removed by wicking from the side with a piece of filter paper. Then 3 µl of 2% (wt/vol) aqueous uranyl acetate was placed onto the grid for 30 s. Excess stain was removed by wicking, and the grid was allowed to dry. Images were made using a JEOL 2,100 scanning transmission electron microscope at an accelerating voltage of 200 kV (Japan Electron Optics Laboratory, USA, Peabody, MA).

Motility Agar Stab Test
A motility agar stab test was performed as described previously (Greene et al., 1951). Briefly, E. coli strains were stabbed into glass tubes with LB containing 0.4% (wt/vol) agar. The tubes were incubated at 37 • C for 18 h or until growth was evident. A diffuse cloud of growth beyond the stab track indicates motility, while growth constrained to the stab track indicates a lack of motility.

Attachment Assays
Single-strain attachment assays were performed according to previously described methods (Liu et al., 2011;Zwonitzer et al., 2016) with modifications that are briefly described here. Fifteen µl of cell suspension were added to a 15-ml centrifuge tube (120 mm in length) containing 15 ml of saturated CaCO 3 solution (pH ∼ 8) and 1.06 mg l −1 corn stover. The target final E. coli concentration was 10 3 CFU ml −1 . These concentrations were selected based on previous reports (Terrio, 1996;Bai and Lung, 2005). After shaking at 80 rpm for 10 min, samples were allowed to settle for 1 min. The top 7.5 ml of liquid was transferred to another tube without disturbing the bottom layer. This top layer contained unattached bacteria. One drop of Tween 85 was added to the remaining liquid to separate the settled bacteria-particle aggregates. After vortexing for 10 s, each sample was diluted 10fold with phosphate buffer. One ml of the diluted samples was then used for enumeration of colony forming units (CFU) on LB agar plates. Note that the Tween treatment and vortexing have no significant impact on cell viability (data not shown). The attached fraction was calculated as: where C t is the number of CFU collected from the top half of the settled liquid and C b is the number of CFU collected from an equal volume of the bottom half of the settled liquid after addition of Tween and vortexting. The (C b -C t ) term represents the enrichment of attached bacteria in the bottom half of the liquid relative to the unattached bacteria assumed to dispersed uniformly throughout the top and bottom halves of the liquid. The (C t + C b ) term represents the total number of CFU queried. Up to six distinct tubes were assayed for each strain. An attached fraction was calculated for each tube, and then the average of all replicates was calculated. CFU counts ranged from 7 to 118, with an average value of 23.
When directly comparing two strains, such as a wild-type (WT) and mutant, we used a competitive attachment assay developed here. This assay is very similar to the single-strain assay described above, except that both the mutant and WT strains were incubated with particles for attachment in the same tube. Moreover, instead of enumerating the CFU of both the attached and unattached fraction, only the attached (bottom) fraction was enumerated. Specifically, after mixing and settling of the bacteria and particles, 1 ml of Tween-treated phosphate suspension from the bottom half of the settled liquid was enumerated for both total CFU attached (grown on LB agar plates with no antibiotics) and mutant CFU attached (grown on LB agar plate with corresponding antibiotics). The attachment ratio for mutant: wild-type was calculated as: The same experiment was conducted with no particle added as a control (ratio control ).
This direct comparison of the bottom fraction of the settled liquid is possible only because the WT and mutant strains can be distinguished via antibiotic resistance markers.

Cell Shearing
Approximately 10 10 cells were harvested at early stationary phase as described above, and washed twice with 10 ml CaCO 3 solution. The washed cells were resuspended in 10 ml CaCO 3 solution and subjected to blending (8 × 45 s) at 20,000 rpm on ice using a Homogenizer (Omni International., Kennesaw GA). After blending, cells were washed once using CaCO 3 solution and then diluted to a concentration of approximately 10 6 cells ml −1 .

Proteomic Analysis
OMPs were extracted using the ReadyPrep Protein Extraction Kit (Membrane II) (Bio-Rad, Hercules, CA, USA) according to the manufacturer's instructions. The RC DC Protein assay kit (Bio-Rad, Hercules, CA, USA) was used to measure the protein concentration. Either 200 or 100 µg of protein dissolved in 125 µl of rehydration/sample buffer, as provided in the protein extraction kit, was used for the first dimension isoelectric focusing (IEF) on immobilized-pH-gradient (IPG) strips (pH 3 to 10, 7, cm, Bio-Rad, Hercules, CA, USA) at 20 • C, approximately 40,000 V·h for 22 h. This first dimension separation was performed at the Protein Facility of Iowa State University.
The IPG strips were then loaded to 7 cm Any kD precast polyacrylamide gels, 8.6 × 6.7 cm (W × L, Bio-Rad, Hercules, CA, USA) for the second dimension separation. Electrophoresis was performed at 180 V until the dye front reached the bottom of the gel. Precision Plus protein standard plugs (10-250 kD, Bio-Rad, Hercules, CA, USA) were used as molecular weight standards. The gels were stained overnight using Coomassie blue, and then destained overnight.
Images of the 2D gels were acquired using Amersham Pharmacia Biotech image scanner flatbed scanner (Amersham Pharmacia Biotech Inc, Piscataway, NJ, USA). The images were analyzed using the SameSpots (TotalLab, UK) software. This analysis included quantification of both spot size and intensity. All spot size intensity and size values were normalized relative to a reference gel, which was selected on the basis of image quality. Proteins associated with specific spots were identified by excision of the spots from the gels, in-gel trypsin digestion and mass spectrometry. Specifically, proteins were identified by matrixassisted laser desorption ionization-time of flight tandem mass spectrometry (MALDI-TOF MS/MS) and database searching using Mascot (Matrix Science Inc, Boston, MA, USA). Image and protein analysis were performed at the Protein Facility of Iowa State University.

OmpA Sequencing and Characterization
Open reading frames (ORF) were PCR amplified using Q5 R High-Fidelity DNA Polymerase (New England Biolabs Inc, Ipswich, MA, USA), purified using QIAquick Gel Extraction Kit (QIAGEN, Germantown, MA, USA), and sequenced at the Iowa State University DNA Facility. Primers used in this study are listed in Table S1. Each ORF nucleotide sequence was converted to an amino acid sequence using the translate tool (http://web.expasy.org/translate/). All of the amino acid sequences are provided in Table S2.

Co-occurrence Analysis
To detect the potential co-occurrence of OmpA polymorphisms, the amino acid sequences at the locations of interest of each strain were combined and converted to a binary matrix. For example, the amino acid sequence of location 25 in OmpA is N in strain 3 and P in strain 6. After the binary conversion, strain 3's OmpA amino acid sequence at location 25 would now be 1 for N and 0 for P; while strain 6's OmpA amino acid sequence at location 25 would be 0 for N and 1 for P. The Spearman's rank correlation coefficients (ρ) were calculated for all possible combinations of sequences. The significance of each sequence combination was adjusted for false discovery rate using method developed by Benjamini & Hochberg (Benjamini and Hochberg, 1995). Sequence combinations with adjusted P ≥ 0.05 and −0.75 < ρ < 0.75 were eliminated. All calculations were performed using R 3.2.4 and package Hmisc.

Gene Removal and Plasmid Construction
Recombineering via the Lambda Red-mediated gene replacement system was used to delete genes from the bacterial genome. Briefly, the DNA fragment of chloramphenicol resistance gene with homologous arms upstream and downstream of the target genes was introduced into E. coli cells via electroporation. With the help of the recombinase enzymes encoded by plasmid pKD46, the target gene was then replaced by the antibiotic resistance gene, as described previously (Datsenko and Wanner, 2000). PCR was conducted to confirm the success of replacement.
To construct the plasmid expressing a selected OmpA pattern, a 1.3 Kb fragment containing MG1655's ompA promoter and the coding region of the selected ompA gene was obtained by overlapping PCR and subsequently cloned into plasmid pGEN-MCS using NotI and HindIII sites.

Statistical Analysis
One-way ANOVA and two-tailed student's t-test were used to analyze attachment data. Box and whisker plots were generated using BoxPlotR (shiny.chemgrid.org).

RESULTS
As previously described (Liang et al., 2016), a library of 400 E. coli strains was isolated from two locations within the Squaw Creek watershed in Ames, IA during the fall of 2012 and summer of 2013 and subjected to Rep-PCR fingerprint analysis. This analysis generated genome similarity scores for each pair of isolates and a set of 78 isolates with a maximum genome similarity score of 90% was selected for further study

Flagella Do Not Impact Corn Stover Attachment
The flagellum is a well-characterized filamentous appendage on the microbial surface which not only grants mobility in aqueous environments but also was reported to be a key genetic factor involved in biofilm formation (Pratt and Kolter, 1998;Van Houdt and Michiels, 2005). The roles of the flagella in initiating bacteriasurface attachment are (1) overcoming the repulsive force at the solid-liquid interface, and (2) providing opportunities for bacteria-surface contact (Pratt and Kolter, 1998;Gutman et al., 2013). The fliC gene encodes the flagellin protein, a primary subunit of the flagella filament. Each of the fliC deletion mutants constructed here was confirmed to be non-motile by the motility agar stab test ( Figure S1).
Using the competitive attachment assay designed in this study (Figure 1), we found that none of these fliC deletion mutants showed a statistically significant (P < 0.05) change in attachment to corn stover ( Table 1). It should be noted that this competitive attachment assay can only be used to directly compare strains that can be distinguished in the settled liquid. For example, here we used antibiotic resistance markers to distinguish the wild-type strain and the corresponding fliC deletion mutant.
These results demonstrate that while flagella may contribute to binding of cells to some types of particles, they are not involved in binding to corn stover. Thus, we sought to identify other proteins that might be involved in mediating attachment to corn stover.

Shearing Treatment Impacts Attachment of Some Isolates
A homogenizer was used to shear and remove the flagella from the surface of a set of 18 of our 78 isolates, using a method similar to previous reports (Rosenbaum and Child, FIGURE 1 | Schematic of the competitive attachment assay developed here, using attachment of wild type MG1655 and mutant MG1655: ∆fliC to corn stover as an example. Mutant and wild type ratio was calculated as mutant: (total-mutant). Flagellin component fliC was deleted and the corresponding mutant and wild-type were subjected to competitive attachment assay. Changes in attachment ability were obtained from the competitive attachment assay (n = 3) between the indicated ∆fliC mutant and corresponding wild type strain, with the mean value and standard error shown. If the ratio is greater than 1.0, the mutant has higher attachment ability than wild type. P-values were determined by comparing the ratios from the assays performed with and without corn stover via the Student's t-test.
1967). These 18 isolates were selected randomly from our pool of genomically distinct isolates. Note that this shearing process is also expected to remove other surface proteins. Images from the TEM examination show that this shearing procedure was effective in removing at least some flagella from the cell surface ( Figure S2). Specifically, flagella can be observed in images collected before blending, but not in the images collected after blending.
We characterized the attachment behavior of each of the 18 isolates in both the unblended and blended state (Figure 2). Note that this attachment assay sometimes produces negative values, meaning that there were more CFU in the top half of the settled liquid than in the bottom half, possibly due to cell buoyancy (Wan et al., 1995). Five isolates (57, 108, 117, 217, and 331) had at least a 5% decrease in attachment after shearing. This decreased attachment could possibly be attributed to removal of a surface protein or feature involved in promoting attachment. Three isolates (44, 69, and 308) had an increase of at least 5%. This increased attachment could indicate that a surface protein or feature that inhibited attachment was removed by the shearing process. The other 10 isolates had less than a 5% difference in attachment before and after shearing. This lack of change in attachment indicates that either the flagella, or other surface proteins, controlling the attachment behavior were not thoroughly removed by the blending process or that the attachment behavior of these strains is controlled by something that was not affected by the blending treatment.
The finding that the removal of surface proteins actually increased attachment in some cases was unexpected. This could possibly be due to the removal of a surface protein that had a repulsive interaction with the corn stover particles. The goal here was to identify genetic factors involved in promoting attachment to corn stover, so strains whose attachment decreased after shearing treatment were selected for further analysis.

Proteomic Analysis Identifies OmpA as Contributing to Attachment
To investigate the surface structures that contribute to bacteria-particle attachment, we did two-dimensional (2-D) gel electrophoresis analysis for OMPs. OMP profiles of three randomly-selected strains with decreased attachment ability to corn stover after blending (57, 117, and 217) were compared to three randomly-selected strains with no change in attachment (62, 100, and 157). This comparison identified 4 spots with potentially different expression levels between these two groups of strains ( Table 2). The most promising spot, #5352, was present on the gels of strains 62, 100, and 157, but was not detected on the gels of strains 57, 117, and 217 (Figure 3). In-gel digestion and MALDI-TOF MS/MS analysis identified this spot as outer membrane protein OmpA. Similar characterization revealed that spot #5405 is also OmpA. Thus, the spots associated with OmpA were in sufficiently different locations on the gels of strains 57, 117, and 217 vs. strains 62, 100, and 157 to be identified by the image analysis software as two distinct spots. These results suggest that OmpA may contribute to the patterns of attachment observed for our strains and this shifting of the OmpA-associated spot indicates that there are variations at the molecular level among the OmpAs. This led us to sequence the ompA gene for each of our 78 isolates.

Allelic Variation of OmpA
Consistent with the shifting OmpA spots in the proteome analysis, we found that there is substantial variation in the translated sequences (Figure 4, Table 3A). Specifically, polymorphisms were observed in each of the four extracellular loops, within three of the eight transmembrane segments, within the proline-rich linker and at two positions within the dimerization domain. As described below, some of these variations have been previously noted. The polymorphisms are described according to the amino acid numbering convention used by Pautsch and Schulz (Pautsch and Schulz, 1998). Note that the N-terminal domain of OmpA is generally considered to consist of residues 1-172 and the C-terminal domain of residues 189-325, with residues 173-188 being the proline-rich linker. Residues 189-276 comprise the proposed dimerization domain (Marcoux et al., 2014).
The polymorphism within loop 1 falls nearly in the middle of the loop, within the region designated by Pautsch and Schulz as "highly mobile" (Pautsch and Schulz, 1998). This residue is asparagine in nearly half (46%) of our isolates, and aspartic acid or proline in the other 15 and 39% of the isolates, respectively. Asparagine is uncharged, aspartic acid is negatively charged and proline is hydrophobic and contains a secondary amine group. Thus, the variation within loop 1 includes structurally and functionally variable amino acids.
The polymorphism within loop 2 consists of three amino acids and is also within a loop segment previously designated as "highly mobile." Slightly more than half of our isolates (51%) encode serine-valine-glutamate, 46% encode aspartateasparagine-isoleucine and two others encode variations of this SVE pattern: one encodes serine-valine-alanine and the other encodes serine-valine-valine. Thus, 97% of our isolates present one residue with a polar uncharged side chain (S or N), one hydrophobic residue (V or I), and one negatively charged acidic residue (E or D) in this variable region of loop 2.
The third extracellular loop, which was previously concluded to not have a "highly mobile" region (Pautsch and Schulz, 1998), consists not just of amino acid changes relative to the standard OmpA sequence, but also consists of the insertion of up to five additional amino acids. The polymorphism includes the first  amino acid of the subsequent transmembrane segment, with each isolate encoding either an asparagine or aspartic acid at position 114. Eight variations were observed in this position (Figure 4, Table 3A). The standard SNVYG was observed in 39.7% of our isolates, with another 9.0% encoding the two-residue variation of SNFDG, with bold font indicating the variant residues. Another 39.7% of the population encodes a sequence which is substantially different from the standard SNVYG. Specifically these isolates encode ANVPGGASF, with two other strains differing from this pattern at only one residue each (ANVPGGASY and ANVPGVASF) and two others differing at two or three residues (SNVPGGAST and SNVPGGVST). There is also a substantial representation (6.4%) of the sequence AHNNVTGESE. As with loops 1 and 2, the variation within loop 4 falls within the "highly mobile" section. This amino acid varies between histidine, positively charged, for 78% of our isolates and asparagine, uncharged, for the other 22% of the isolates.
The four remaining N-terminal domain polymorphisms are within transmembrane segments (Figure 4). Two are within the transmembrane segments flanking loop 3, both near the periplasmic end. At position 93, 50% of isolates encode isoleucine FIGURE 3 | Outer membrane proteomic analysis showed shifting of the OmpA-associated spot between groups of strains responding differently to blending. The photo on the left is a 2D gel selected as standard gel for SameSpots analysis. Spots were identified and selected for excision via the SameSpots software, but the images shown here are the raw images, manually labeled for clarity. The figures on the right show spots #5352, #5405, and #5737 ( Table 2) from each of the six strains. Excised spots were subjected to protein identification through in-gel digestion, MALDI-QUAD-TOF and MS/MS. and 50% encode valine, both of which are branched chain and hydrophobic. At position 129, 94% of isolates encode tyrosine and 6% encode tryptophan, both of which are hydrophobic and contain aromatic groups. The other two polymorphisms within the N-terminal domain are directly adjacent to extracellular loops. Position 114 encodes asparagine, polar and uncharged, in 55% of isolates and aspartate, negatively charged, in 45% of isolates. At position 161, 63% of isolates encode methionine and 37% encode leucine, both of which are hydrophobic.
Within the proline-rich linker and C-terminal domain, there are five residues at which variations were observed. Within the linker segment at position 175, three isolates encode a valine and the rest encode alanine. Valine and alanine both have hydrophobic side chains. At position 180, one isolate (strain 18) encodes a threonine, the rest encode an alanine (not shown). At position 203, 18 isolates encode a threonine, while the rest encode an asparagine. Asparagine and threonine both have polar, uncharged side chains, though asparagine is neutral and threonine is polar. At position 251, eight strains encode an alanine, while the rest of the isolates encode glycine. Glycine and alanine belong to different functional categories. At position 270, a single isolate (strain 385) encodes a serine instead of the alanine presented by the rest of the isolates (not shown). The C-terminal domain of OmpA is not as well characterized as the N-terminal domain, but these polymorphisms do fall within the dimerization region proposed between residues 189 and 276 (Marcoux et al., 2014).
Many of these variations have been previously described, mostly notably in two studies described below (Gophna et al., 2004;Power et al., 2006). To the best of our knowledge, we are the first to report the D variant at position 25 and the polymorphism at position 129. Additionally, these two previous studies only observed the two most common variants within loop 3: ANVPGGASF and SNVYG. The SNVPGGVST, AHNNVTGESE, SNFDG, ANVPGGASF and SNVPGGAST variants have not been previously described.

OmpA Outer Loop Sequence Patterns
While some of these polymorphisms have been previously described (Gophna et al., 2004;Power et al., 2006), both of these previous reports only considered a few E. coli strains or only considered two possible ompA alleles. Here we have expanded the list of possible patterns (Table 3A) and considered the cooccurrence of the various polymorphisms (Table 4).
We have named the patterns of the outer loop polymorphisms relative to their abundance in our isolate collection. Pattern I (N, SVE, SNVYG, H) is encoded by 33.3% of our isolates and is the standard sequence previously characterized for structural analysis, such as (Pautsch and Schulz, 1998). Pattern II (P, DNI, ANVPGGASF, N) is encoded by 20.5% of our isolate collection FIGURE 4 | Polymorphisms within OmpA among our 78 E. coli isolates. Residues are numbered and classified according to the description by Pautsch and Schulz (1998) and Marcoux et al. (2014). Residues 22-28, 65-68, and 147-157 were previously classified by Pautsch and Schulz (1998) as "highly mobile" regions of loops 1, 2, and 4, respectively and residues 188-276 were identified by Marcoux et al. (2014) as the dimerization domain. Two polymorphisms in the C-terminal domain were observed only in one isolate each and are not shown here, but are described in the text. Sequence logos were generated with Weblogo 3.5.0, with letter height representing probability. Colors indicate amino acid chemistry, with polar residues shown in green, neutral residues in purple, basic residues in blue, acidic residues in red and hydrophobic residues in black. and is the only pattern that includes an asparagine at loop 4 instead of a histidine. It should be noted that these two most abundant patterns differ from each other at each of the four loops. Pattern III (P, DNI, ANVPGGASF, H) differs from pattern II only at loop 4 and is encoded by 14.1% of our strains. Pattern IV (N, SVE, SNFDG, H) differs from pattern I only at two residues within loop 3 and is expressed by 7.7% of our isolates. Pattern V (D, SVE, AHNNVTGESE, H) is encoded by 6.4% of our isolates and differs from all of the other patterns at loops 1 and 3 and also includes all of the strains that encode a tryptophan at transmembrane position 129 instead of a tyrosine (not shown).
Two other sequences are designated in Table 3A as patterns, even though they were only observed in one isolate each. These were selected for comparative purposes. Pattern VI (D, DNI, SNVYG, H) differs from the most abundant pattern (I) only at loops 1 and 2. Pattern VII (D, SVE, SNFDG, H) differs from pattern I only at loops 1 and 3. Gophna et al.'s (2004) analysis identified the outer loop polymorphisms that we have designated pattern I in E. coli strains MG1655, O157:H7 EDL933, and K1 and pattern III in septicemic O78 strain 789 and uropathogenic strain CFT073 (Gophna et al., 2004). Power et al. compared the OmpA sequence of three E. coli isolates to the standard MG1655 sequence (Power et al., 2006). The human and Tasmanian devil isolates H474 and TA024 matched the MG1655 sequence, which we call pattern I. Their bird isolate, strain B194, encoded the sequence we have designated pattern II and that they designated ompA2. Power et al then used a PCR-based assay to screen 524 E. coli isolates and concluded that the ompA2 allele was present in 43% of strains (Power et al., 2006). This PCR assay utilized a forward primer that binds within the gene segment encoding the ANVPGGA portion of loop 3 and a reverse primer that binds to a nonvariable gene segment within the C-terminal domain. This assay would have given positive results for strains presenting the ANVPGGASF or ANVPGGASY polymorphisms in loop 3 and thus would have given positive results for 41% of our isolate collection, consistent with the previous PCR results. However, the sequence data that we present from our 78 isolates indicates that Power's ompA2 allele frequency calculation is incomplete, due to its lack of sensitivity to the polymorphisms in loops 1, 2, and 4.

OmpA Allelic Variation Is Associated with Attachment to Corn Stover
Our sequencing of OmpA was motivated by the differential response of some strains to shearing treatment. The three strains selected for proteomic analysis due to their decreased attachment to corn stover after blending (57, 117, and 217) all have the same polymorphisms within loops 1 (P), 2 (DNI), and 3 (ANVPGGASFKD). In contrast, the two strains that showed increased attachment to corn stover after blending (44 and 69) both encode the SNVYGKN variant at loop 3 and the H polymorphism within loop 4. Of the nine strains that showed less than 5% change in corn stover attachment after shearing (48, 62, 100, 105, 110, 127, 130, 147, and 157), only one (100) presents the P variation at loop 1, only two (100 and 110) present the DNI variation at loop 2, and only one (110) presents the ANVPGGASFKD variant at loop 3 associated with decreased attachment after shearing. None of these nine strains present the (P, DNI, ANVPGGASFKD) combination observed in the isolates that showed decreased attachment after shearing. Four of these nine strains (105,130,147,157) do encode the SNVYGKN at loop 3 and H at loop 4 associated with increased attachment after shearing. Thus, attachment to corn stover, especially decreased attachment after shearing treatment, appears to be related to the OmpA outer loop sequence. No such trends were apparent for the linker and dimerization domain polymorphisms.
In order to quantify the association of the OmpA polymorphisms and attachment of the isolates to corn stover, we performed the corn stover attachment assay for each of the isolates not already characterized (Table S4). Values ranged from −17 to 29%, with an average of 6.4%. As noted above, negative values for the attachment assay stem from higher cell counts in the top half of the settled liquid relative to the bottom half, possibly due to cell buoyancy (Wan et al., 1995).
Statistical analysis was performed for each sequence variation or pattern that was observed in at least five isolates (Figure 5). At locations where only two variants were present in at least five isolates, the Student's t-test was used. At locations where more than two variants were observed in at least five isolates, one-way ANOVA was used to compare the mean values, and then the Student's t-test was used to directly compare groups. Patterns VI and VII were not included in this analysis since they are each present in only one isolate each. Of the individual polymorphisms, significant trends were observed only for loop 3, with a one-way ANOVA P-value of 0.019. Direct comparison showed that strains encoding SNFDGKN had a significantly lower attachment than strains encoding ANVPGGASFKD (P = 0.0032) and the standard loop 3 sequence SNVYGKN (P = 0.011) (Figure 5).
When isolates were analyzed according to the outer loop patterns defined in Table 3A, significant differences were observed, with a P-value of 0.045 by one-way ANOVA (Figure 5). Isolates encoding pattern I significantly differed from pattern IV (P = 0.030), though these two patterns differ only by two residues within loop 3 (SNVYGKN vs. SNFDGKN). Also, pattern III differed from pattern IV (P = 0.0087), these patterns differ at loops 1, 2, and 3, but not loop 4. It is interesting to note that the two most abundant patterns, I and II, do not significantly differ (P = 0.66), despite encoding different sequence variants at each of the four extracellular loops.
When the linker and dimerization domain polymorphisms were included with the outer loop patterns, the one-way ANOVA P-value was 0.012. Both of the significant differences attributed to outer loop polymorphisms were still observed ( Figure 5). Specifically, strains encoding the (I, α) pattern significantly differed from those encoding (IV, β) (P = 0.0097) and strains encoding pattern (III, α) significantly differed from those encoding (IV, β) (P = 0.0032). This analysis also identified significant differences between patterns (II, α) and (IV, β) (P = 0.026) and (III, α) and (V, α) (P = 0.025). Thus, naturallyoccurring sequence patterns in OmpA are associated with significant differences in attachment to corn stover.

OmpA Allelic Variations Impact Attachment in a Single Genetic Background
Our results thus far suggest that allelic variation in the sequence of OmpA contributes to differences in attachment to corn stover. However, these results simply show an association and not a direct cause. This associative effect could involve a variety of other cell features. Therefore, we further explored the role of OmpA and variations in its sequence within a single genetic background. We selected the standard K-12 E. coli FIGURE 5 | Polymorphisms within OmpA are significantly associated with attachment of E. coli isolates to corn stover. Box plots were generated using BoxPlotR. The whiskers indicate the minimum and maximum values. The bottom of the box is the 1st quartile and the top of the box indicates the 3rd quartile. The dark line within the box is the median value. Box width is scaled according to the number of strains in each group. Shading of the boxes is for improved readability and does not convey meaning. Group-wide P-values were determined using one-way ANOVA, P-values between individual sequences and patterns were determined using the Student's t-test. strain MG1655, which encodes the (I, α) version, for these studies. Specifically, the effect of these sequence variations on the attachment behavior of MG1655 was explored by expressing seven distinct OmpA versions from a plasmid in the MG1655 ∆ompA strain (Table 5).
These results confirm some of the significant associations observed in the isolate studies. For example, isolates encoding pattern (I, α) had a significantly higher attachment to corn stover than strains encoding pattern (IV, β), P = 0.0097 (Figure 5). When these two versions of OmpA were expressed in the same genetic background, the attachment values were significantly different (P = 0.0058), and showed same trend. This shows that the difference in corn stover attachment observed in the isolate strains can be at least partially attributed to the variation in OmpA sequence.
However, not all of the significant associations observed for the isolates were observed in the MG1655 studies. For example, pattern (II, α) had a significantly higher mean attachment value than pattern (IV, β) in the isolates (P = 0.026), but no significant difference was observed when these two sequences were expressed in MG1655 (P = 0.1).
Other significant differences were detected in this comparison that were not detected in the isolate studies. Most notably, patterns (V, α), (VI, α), and (VII, β) each had a significantly higher attachment than patterns (I, α), (III, γ), and (IV, β) ( Table 5). Patterns V, VI, and VII each encode an aspartic acid in loop 1, while patterns I and IV encode an asparagine and pattern III encodes a proline. Note that patterns VI and VII were not analyzed in the isolate studies, since they were observed in less than five isolates.
To summarize, these results demonstrate that when expressed in the same genetic background, naturally-occurring sequence variations within OmpA result in significant differences in attachment to corn stover. Note that previous work has shown that overexpression (Ma and Wood, 2009) and deletion of ompA (Wang, 2002) can both lead to low viability. Consistent with these previous reports, we were unable to characterize attachment for ompA deletion mutants due to low survival of the attachment assay (data not shown).

DISCUSSION
The goal of this project was to increase understanding of the genetic factors that drive attachment of E. coli to a representative biotic environmental particle under environmentally relevant conditions. Here we have demonstrated that while flagella do not play a clear role in this attachment process, variations in the amino acid sequence of OmpA appear to play a significant role in attachment of E. coli to corn stover.
OmpA is one of the most abundant outer membrane proteins for Gram-negative bacteria, with expression of 2-3 × 10 5 molecules per cell (Movva et al., 1980). As a transmembrane protein embedded in the outer membrane, it crosses the membrane eight times, resulting in four large and hydrophilic extracellular loops and four small periplasmic turns (Vogel and Jähnig, 1986;Schulz, 1998, 2000; OmpA was cloned from the indicated template strain and expressed from the pGEN-MCS expression vector in MG1655 ompA. The one-way ANOVA P-value was 3.9 × 10 −5 . Amino acids are color-coded according to the convention described in the legend of Figure 4. P values are shown only when the value was less than 0.05.

2001
; Smith et al., 2007). OmpA can act as a porin for uptake of small molecules (Pautsch and Schulz, 1998;Arora et al., 2000), as a receptor for bacteriophage and colicin (Foulds and Barrett, 1973;Morona et al., 1984), and also as a structural protein to maintain the integrity of the outer membrane (Sonntag et al., 1978;Koebnik, 1995). It has also been reported to be an important target for immune response, a contributing factor to the pathogenicity of E. coli (Weiser and Gotschlich, 1991;Prasadarao et al., 1996;Belaaouaj et al., 2000;Wang, 2002;Godefroy et al., 2003;Jeannin et al., 2003), and a critical component of biofilm formation (Danese et al., 2000;Orme et al., 2006;Rivas et al., 2008). Sequence variation for OmpA has been previously noted among bacterial genera. Wang's (2002) comparison of E. coli K1, Salmonella typhimurium, Shigella flexneri, Enterobacter aerogenes, and Klebsiella pneumonia concluded that the amino acid sequence of the extracellular loops is greatly variable among genera (Wang, 2002). Weiss et al. (2008) comparison of OmpA across nine species also noted the variability among the extracellular loops. Some of these variations were shown to impact survival in acid, high salt and serum (Wang, 2002) and pathogenesis in tsetse fly (Weiss et al., 2008) but their role in attachment was not described.
OmpA sequence variation has also been reported within E. coli. Gophna et al. (2004) and Power et al. (2006) each observed many of the individual polymorphisms reported here. These previous studies linked OmpA polymorphisms to resistance to neutrophil elastase (Gophna et al., 2004) and bacteriophage (Gophna et al., 2004), but did not consider attachment. Our consideration of 78 strains enabled us to identify low-frequency sequence variations that have not been previously reported. Our larger dataset and consideration of the entire ompA sequence also allowed us to identify patterns and significant co-occurrences among these polymorphisms. The fact that 30% of the isolates in our collection encode outer loop sequences distinct from those described by these two previous studies emphasizes the contribution of this dataset to capturing the variability within E. coli OmpA.
When the polymorphisms in the transmembrane segments are not considered, we observed 22 distinct OmpA sequences (Table 3C). Fifty percent of our 78 isolates encode either pattern (I, α) or (II, α). When only the outer loop polymorphisms are considered, there are five patterns (I, II, III, IV, and V) that are observed in at least five isolates each, with the two most abundant patterns differing at each of the four extracellular loops. Significant co-occurrences were observed for polymorphisms in loops 1, 2, and 3, but not loop 4 ( Table 4). This variation in OmpA is consistent with a genome-wide analysis of Salmonella enterica that concluded that surface-exposed molecules display a high degree of allelic variation, possibly contributing to pathoadaptation (Yue et al., 2015).
The OmpA outer loop sequence variations showed a significant association with attachment to corn stover among our isolate collection (Figure 5). Statistically significant differences were detected for the loop 3 sequences SNVYKGN and SNFDGKN at each level of analysis: when grouping only by the loop 3 sequence (P = 0.011), when grouping only by the outer loop pattern (I vs. IV) (P = 0.0087), when grouping by the combined outer loop/linker/dimerization domain pattern (I, α vs. IV, β) (P = 0.0097) and when expressed in the same genetic background (P = 0.0058). However, OmpA sequence variation obviously does not explain all of the observed variation in corn stover attachment. Other outer membrane proteins and structures are likely to contribute to the variation that we observed between isolates expressing the same OmpA sequence. Krishnan et al. (2014) previously demonstrated through targeted mutagenesis that specific interactions can occur between loops 2 and 3 of OmpA to glycan groups on human cell receptors (Krishnan et al., 2014). Thus, it is possible that the attachment of E. coli cells to corn stover is also mediated by chemical bond formation between OmpA's extracellular loops and the glycan groups on the surface of the corn stover particle. Therefore, the changes in attachment associated with expression of different versions of OmpA can possibly be attributed to the formation of stronger or additional chemical bonds. A 2015 study of E. coli O157:H7 EDL933 concluded that deletion of ompA significantly decreases attachment to glass and to spinach (Nagy et al., 2015), though it was proposed that the decreased attachment may be due to problems with membrane structure and cell physiology.
It is possible that some of the differences observed in the MG1655 studies, particularly those that were not observed in the isolates (Table 5), are due to interactions of OmpA with other membrane components. For example, a 1980 publication cited difficulty in functional expression of ompA from clinical isolates in E. coli K12 (Beher et al., 1980). The authors concluded that successful cell surface expression of OmpA requires interaction between OmpA and lipopolysaccharides, where these lipopolysaccharides can vary between strains (Beher et al., 1980), consistent with the conclusions of other researchers (Schweizer et al., 1978). The lack of consistency between the associative studies and expression in MG1655 could also indicate that the differences in attachment behavior are due to features other than OmpA.
Previous reports of the role of flagella in bacterial attachment to surfaces have been mixed (Rivas et al., 2006;Gutman et al., 2013), consistent with our own mixed results. Beyond functioning as the appendage generating motility, flagella also contribute to surface hydrophobicity and net surface charge and associate with surfaces through both steric and electrostatic interactions (De Kerchove and Elimelech, 2008). Therefore, our observation that different E. coli isolates have a differential response to blending in terms of corn stover attachment (Figure 2) can possibly be attributed to other structures revealed on the cell surface after blending. Alternatively, the loss of flagella-mediated tumbling may have allowed increased interaction time between the cells and particle surfaces so that attachment was more likely. EPS have also been previously reported to play a role in bacterial attachment (Matthysse et al., 2008;Macarisin et al., 2012). However, our characterization of MG1655 deletion mutants of wcaD, bcsA, and pgaC suggested that these three EPS-associated genes play either no role or only a minor role in attachment to corn stover (data not shown).
A variety of methods have previously been used to test the attachment of bacteria to abiotic surfaces, such as deposition rate during flow through a packed column (Foppen and Schijven, 2005;Levy et al., 2007), flow cytometry (Liang et al., 2014), and settling, centrifugation, and filtration (Schillinger and Gannon, 1985;Soupir et al., 2008;Krometis et al., 2009;Liu et al., 2011). Here we developed a method of directly comparing attachment behavior of two strains (Figure 1). This method is a modification of the previously-described settling method to investigate the role of a specific genetic factor on attachment and should be of use in a variety of studies in which strains can be distinguished, such as through the use of antibiotic resistance markers.

AUTHOR CONTRIBUTIONS
CL, XL, and MS collected strains. CL performed rep-PCR; generated the strains and performed the sequence analysis and experiments for the data presented in Tables 1-3, 5, and Figures 2-5, Figures S1, S2 and wrote the manuscript draft. XL generated the isolate attachment data and developed the protocols for the attachment assays. FY performed the co-occurrence analysis. MS, AH, and MT contributed to experimental design, data analysis and manuscript writing. LJ contributed to experimental design, data analysis and finalized the manuscript.

FUNDING
Funding for this work was provided by the National Science Foundation grant CBET-1236510. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.