Genetic and Phenotypic Features to Screen for Putative Adherent-Invasive Escherichia coli

To date, no molecular tools are available to identify the adherent-invasive Escherichia coli (AIEC) pathotype, which has been associated with Crohn's disease and colonizes the intestine of different hosts. Current techniques based on phenotypic screening of isolates are extremely time-consuming. The aim of this work was to search for signature traits to assist in rapid AIEC identification. The occurrence of at least 54 virulence genes (VGs), the resistance to 30 antibiotics and the distribution of FimH and ChiA amino acid substitutions were studied in a collection of 48 AIEC and 56 non-AIEC strains isolated from the intestine of humans and animals. The χ² test was used to find frequency differences according to origin of isolation, AIEC phenotype and phylogroup. The Mann-Whitney test was applied to test association with adhesion and invasion indices. Binary logistic regression was performed to search for variables of predictive value. Animal strains (N = 45) were enriched in 12 VGs, while 7 VGs were more frequent in human strains (N = 59). The prevalence of 15 VGs was higher in AIEC (N = 49) than in non-AIEC (N = 56) strains, but only the pic gene was still differentially distributed when human and animal strains were analyzed separately. Among human strains, three additional VGs were more frequent in AIEC strains (papGII/III, iss and vat; N = 22) than in non-AIEC strains (N = 37). No differences between AIEC and non-AIEC strains were found in FimH variants. In contrast, the ChiA sequence of LF82 was shared by 35.5% of the AIEC strains studied (N = 31) and by only 7.4% of the non-AIEC strains (N = 27; p = 0.027). Binary logistic regression analysis, using all the VGs and antibiotic resistances tested as input variables, revealed that typing E. coli isolates by the pic gene and ampicillin resistance correctly classified strains according to phenotype with 75.5% accuracy.
Although no molecular signature is fully specific and sensitive for identifying the AIEC pathotype, we propose two easily tested features that could assist in AIEC screening. Future work using additional strain collections will be required to assess the applicability of this method.

The AIEC pathotype is defined as E. coli strains able to adhere to and invade intestinal epithelial cells (Boudeau et al., 1999), as well as to survive and replicate inside macrophages without inducing apoptosis while promoting the release of high levels of tumor necrosis factor alpha (Glasser et al., 2001). They lack the common virulence factors of intestinal pathogenic E. coli; instead, they present virulence traits similar to those of extraintestinal pathogenic E. coli (ExPEC) (Baumgart et al., 2007; Martinez-Medina et al., 2009b; Miquel et al., 2010; Nash et al., 2010).
The AIEC pathotype was described about 20 years ago in a patient with ileal Crohn's disease (CD) (Boudeau et al., 1999), and since then substantial research has been conducted to elucidate the molecular mechanisms of AIEC virulence and their relation to disease pathogenesis. Indeed, it has been demonstrated that AIEC take advantage of the CEACAM6 and CHI3L1 receptors, which are overexpressed in CD, to promote adhesion to and invasion of intestinal epithelial cells (IECs): in the ileum via the FimH adhesin of type 1 pili (Barnich et al., 2007; Carvalho et al., 2009), and in the colon via the chitinase ChiA (Low et al., 2013), respectively. AIEC translocation may also occur through the M cells present in Peyer's patches by means of FimH and long polar fimbriae (LpfA; Chassaing et al., 2011). Translocated AIEC cells may survive macrophage engulfment and multiply inside mature phagolysosomes (Bringer et al., 2005, 2007), which implies continuous secretion of cytokines and chronic macrophage activation (Glasser et al., 2001). The link between AIEC and the disease is reinforced by their ability to stimulate granuloma formation in vitro, a common histopathological feature of CD (Meconi et al., 2007). Genetic defects (Lapaquette et al., 2012) and elevated expression of some microRNAs (Nguyen et al., 2014) related to impaired autophagy in CD patients contribute to unrestrained AIEC intracellular replication and persistent infection.
Since AIEC identification still relies on phenotypic assays based on infected cell cultures, which are extremely time-consuming and hard to standardize, molecular tools or rapid tests to easily identify the AIEC pathotype would be of clear interest both to scientists studying the epidemiology of the pathotype and to clinicians aiming to detect which patients are colonized by AIEC in order to apply personalized treatments.
In this study, the prevalence of a number of virulence genes (VGs) and/or the sequence variants of their gene products, as well as the antimicrobial resistance profile, have been compared between AIEC and non-AIEC strains in order to look for signature traits that could assist in rapid AIEC identification.

E. coli Collection
The E. coli collection used in this study was composed of three groups of strains: (I) strains previously isolated from the intestinal mucosa of CD patients and controls (C) under the approval of the Ethics Committee of Clinical Investigation of the Hospital Josep Trueta of Girona on May 22, 2006 (Martinez-Medina et al., 2009a), (II) strains previously isolated from animals suffering from enteritis under routine microbiological diagnostic procedures (Martinez-Medina et al., 2011), and (III) strains newly isolated from colorectal cancer (CRC) and ulcerative colitis (UC) patients (Supplementary Table 1). Biopsies from CRC and UC patients were taken from the ileum and/or colon with sterile forceps, immediately placed in sterile tubes without any buffer, and maintained at 4 °C for E. coli isolation. The study protocols for CRC and UC strains were approved by the local Ethics Committees (CEIC-Institut d'Assistència Sanitària, in April 2009 and January 2012; and CEIC-Hospital Universitari de Girona Doctor Josep Trueta, in May 2006). All subjects gave written informed consent in accordance with the Declaration of Helsinki. Additionally, the AIEC reference strain LF82, which was a kind gift from Prof. Darfeuille-Michaud (Université d'Auvergne, France), was included. Information about the strains examined in each section (VGs prevalence, FimH and ChiA sequence variants and antibiotic resistance) can be found in Table 1 and Supplementary

Virulence Genotyping by PCR
Fifty-four VGs from different groups, including adhesins, toxins, invasins, genes involved in iron scavenging and genes involved in capsule formation and stress resistance, were amplified by PCR as described previously (Martinez-Medina et al., 2011). In addition, lpfA genes were studied in human-isolated strains. PCR primers for the lpfA141 and lpfA154 genes were taken from Chassaing et al. (2011), and PCR conditions were applied as explained therein. All genetic elements studied (either genes or alleles) are referred to as VGs in this work.

Gene Sequencing and Sequence Analysis
For the fimH gene, PCR primers and program conditions were applied as described elsewhere (Iebba et al., 2012). To sequence the chiA gene, a set of four primers was designed in the present study. Two independent PCRs were performed in order to amplify the whole gene (2694 bp). The first PCR was carried out with ChiA-84F (5′-TCATATTGAAGGGTTCTCG-3′) and ChiA1711R (5′-TCCAGTCAACAAAAACACGC-3′), leading to an amplicon of 1795 bp. The second PCR was carried out with ChiA897F (5′-TAATAATGGCGGTGCTGTGA-3′) and ChiA+12R (5′-TCGCCAACACATTTATTGC-3′), which resulted in an amplicon of 1818 bp. Primers ChiA897F and ChiA1711R were used to sequence a fragment of approximately 550 bp in the middle of the gene in which previously described mutations were located. PCR products were purified by ExoSap (Thermo Fisher Scientific) following the manufacturer's instructions and sequenced by the Sanger method (Macrogen, Netherlands). Sequences were cleaned and aligned with BioEdit software (Hall, 1999) using the K-12 gene sequence as a reference (fimH gene ID: 948847; chiA gene ID: 947837) and deposited in GenBank (MH730201-MH730304). Nucleotide sequences were translated using EMBOSS Transeq (Rice et al., 2000). Reticulate trees were constructed with PopART software (Leigh and Bryant, 2015) using the median joining algorithm, considering only the variable DNA positions that caused non-synonymous amino acid changes.
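The translation and substitution-calling step can be illustrated with a short sketch. This is not the EMBOSS Transeq or BioEdit workflow used in the study; it is a minimal stand-in that assumes two already aligned, gap-free, in-frame coding sequences, and the function names `translate` and `substitutions` are hypothetical. Positions are counted from the first codon supplied, which may not match the residue numbering used for FimH or ChiA in the literature.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def substitutions(ref_dna: str, query_dna: str) -> list[str]:
    """List non-synonymous changes as <ref aa><position><query aa>.

    Assumption: both sequences are aligned, gap-free and in the same frame.
    Synonymous nucleotide changes produce no entry.
    """
    ref_protein = translate(ref_dna)
    query_protein = translate(query_dna)
    return [
        f"{ref_aa}{pos}{query_aa}"
        for pos, (ref_aa, query_aa) in enumerate(
            zip(ref_protein, query_protein), start=1
        )
        if ref_aa != query_aa
    ]


# Toy sequences (hypothetical, not taken from fimH or chiA).
print(substitutions("ATGAATGCT", "ATGAGTGTT"))
```

Only the positions returned by such a comparison would feed a variant network, mirroring the choice to consider non-synonymous changes alone.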

Adhesion and Invasion Assays
Adhesion and invasion assays were performed for isolates obtained from CRC and UC patients, whereas isolates from C, CD and animals had been assessed previously (Martinez-Medina et al., 2009a; Camprubí-Font et al., 2018). Briefly, the Intestine-407 epithelial cell line (ATCC CCL-6) was used for the adhesion and invasion assays. Both assays were performed in triplicate as described previously (Boudeau et al., 1999). LF82 and K-12 strains were used as positive and negative controls, respectively. Adhesion values are given as the number of bacteria per I-407 cell (bacteria/I-407 cell). Invasive ability was expressed as the percentage of the initial inoculum that became intracellular: $\mathrm{I\_INV}\ (\%) = (\text{intracellular bacteria} / 4 \times 10^{6}\ \text{bacteria inoculated}) \times 100$.

Survival and Replication Within Macrophages
The replication capacity of AIEC strains isolated from CRC and UC patients, as well as of non-AIEC strains isolated from CD and C subjects, was assessed in this study. The capacity of AIEC strains isolated from CD, C or animals had been assessed previously (Martinez-Medina et al., 2009a). For survival and replication assays, the murine macrophage-like J774A.1 cell line (ATCC TIB-67) was used and assays were performed as described previously (Bringer et al., 2006). LF82 and K-12 strains were used as positive and negative controls, respectively. The results are expressed as the mean percentage of intracellular bacteria recovered at 24 h relative to 1 h post-infection: $\mathrm{I\_REPL}\ (\%) = (\text{CFU ml}^{-1}\ \text{at 24 h} / \text{CFU ml}^{-1}\ \text{at 1 h}) \times 100$.
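The invasion and replication indices defined in these two sections are simple ratios; the sketch below shows the arithmetic. The function names and the example counts are hypothetical and chosen only for illustration; the default inoculum of 4 × 10^6 bacteria is the value given in the invasion formula.

```python
def invasion_index(intracellular_bacteria: float,
                   inoculum: float = 4e6) -> float:
    """I_INV (%): share of the inoculum recovered as intracellular bacteria."""
    if inoculum <= 0:
        raise ValueError("inoculum must be positive")
    return intracellular_bacteria / inoculum * 100.0


def replication_index(cfu_per_ml_24h: float, cfu_per_ml_1h: float) -> float:
    """I_REPL (%): intracellular CFU/ml at 24 h relative to 1 h."""
    if cfu_per_ml_1h <= 0:
        raise ValueError("CFU/ml at 1 h must be positive")
    return cfu_per_ml_24h / cfu_per_ml_1h * 100.0


# Hypothetical plate counts, not data from the study.
print(invasion_index(8_000))        # percentage of the inoculum
print(replication_index(3e5, 2e5))  # above 100 means net replication
```

A replication index above 100% indicates that more bacteria were recovered at 24 h than at 1 h post-infection.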

Antibiotic Resistance
The collection of strains isolated from human was screened against 30 antimicrobial agents using the Vitek

Statistical Analysis
The significance of differences in the prevalence of VGs according to phenotype and phylogroup was assessed by Pearson's χ² test using SPSS 23.0 software. For differences in the frequency of particular mutations in the FimH or ChiA protein sequence, Pearson's χ² test was used only for those variable positions harbored by more than three strains. For quantitative variables (adhesion and invasion indices), the Mann-Whitney non-parametric test was applied. Binary logistic regression was employed to build a predictive model to classify AIEC strains. All data on VG prevalence, amino acid variants and antibiotic resistance were included in the model. In all cases, a p-value ≤ 0.05 was considered statistically significant.
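As an illustration of the frequency comparison, the following sketch computes an uncorrected Pearson χ² test for a 2 × 2 table in plain Python. It is not the SPSS procedure used in the study. The counts are an assumption: they are back-calculated from the pic frequencies quoted in the Discussion for human strains (41% of 22 AIEC and 16% of 37 non-AIEC), and the function name is hypothetical.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p). With one degree of freedom the upper tail of the
    chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed counts: 9/22 AIEC and 6/37 non-AIEC human strains carrying pic.
chi2, p = pearson_chi2_2x2(9, 13, 6, 31)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

With small expected counts, a continuity correction or Fisher's exact test may give a somewhat different p-value, so this sketch should not be expected to reproduce every value reported in the text.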

Virulence Gene Repertoires
Animal vs. Human E. coli Strains
The prevalence of 54 VGs was assessed in a collection of E. coli strains, including AIEC and non-AIEC, isolated from the intestine of both animals and humans (N = 104). Of those, 19 presented a differential distribution according to host origin (Figure 1A, Supplementary Figure 1A and Supplementary Table 4). Twelve genes (malX, hlyA, pks, hra, iroN, pic, sfa/foc, eaI, cnf, focG, ireA and papGII/III) were more frequent in animal-isolated strains (29-84%) than in human-isolated strains (7-42%; p ≤ 0.025), and 7 genes (traT, iucD, iutA, iha, sat, papGII and neuC) were more prevalent in strains isolated from humans (present in 15-68% of total human strains) than in those from animals (0-40%; p ≤ 0.010). Considering only strains of the AIEC pathotype, 17 VGs were still associated with origin of isolation. Of those, 11 were more frequent in strains isolated from animals and 6 in human strains (p ≤ 0.046) (Figure 1B and Supplementary Table 4). In non-AIEC strains, the prevalence of VGs was more similar between origins of isolation. In this case, 10 out of 54 genes were differentially distributed; six were overrepresented in animal strains and four in human strains (p ≤ 0.038) (Figure 1C and Supplementary Table 4).
The distribution of virulence-associated genes was examined according to phylogroup. Considering the whole collection of strains, 55.6% (30 genes) of the VGs studied were associated with the phylogenetic origin of the strains (Supplementary Table 5).
Most of the studied genes (29/30) were mainly related to the B2 and/or D phylogroups, except for the csgA gene, which was more frequent in the A and B1 phylogroups. Of note, 17 of the 19 genes associated with either human or animal hosts were differentially distributed depending on the phylogenetic origin (Supplementary Table 5).
Considering that the distribution of phylogroups was different between animal and human strains (p < 0.001) (Supplementary Table 3), we selected the most abundant phylogroup (B2) to perform the comparisons, in order to avoid differences due to phylogenetic origin. Interestingly, genes previously associated with origin of isolation in the whole collection maintained their significance after selecting B2 strains only (Supplementary Table 4). Concerning AIEC and non-AIEC strains, 15/17 and 5/10 VGs, respectively, were still differentially distributed according to origin of isolation when only B2 phylogroup strains were analyzed (Supplementary Table 4).
All the genes found to be differentially represented according to pathotype were also associated with phylogroup, with the exception of sfaS (Supplementary Table 5). The phylogroup distribution differed between the AIEC and non-AIEC strains studied in this section (p = 0.002), as non-AIEC strains were more predominant in the A, B1 and D phylogroups while AIEC strains mainly belonged to the B2 phylogroup (Supplementary Table 3). Therefore, to prevent phylogroup from acting as a confounding factor, the analyses were repeated with B2 strains only. Apart from three genes (papGII/III, sfaS and pic) that maintained their differential distribution between AIEC and non-AIEC strains, the others were not associated with pathotype (Supplementary Table 6).
To unveil possible differences in gene prevalence due to isolation origin, we further evaluated the 54 VGs in each group of strains (45 from animals and 59 from humans) (Figures 2B,C and Supplementary Table 6). Indeed, 13 out of the 20 genes found significant when analyzing the whole strain collection maintained their significance in strains isolated from animals but not in human strains. Only the pic gene was more prevalent in AIEC strains irrespective of the host of origin.
Of note, genes previously found to be more frequent in AIEC than in non-AIEC strains from humans (lpfA154 and chuA) (Dogan et al., 2014; Céspedes et al., 2017) showed similar percentages of PCR-positive AIEC and non-AIEC strains.

FimH and ChiA Amino Acid Substitutions
Since it has been suggested that differences regarding phenotype may rely on variations in the protein sequence, in this study alterations in FimH and ChiA have been explored. For this analysis, strains isolated from C, CD, UC and CRC were considered (N = 58; 31 AIEC and 27 non-AIEC strains).
Fifty-four strains carried the fimH gene, representing 93.9% of AIEC and 92.6% of non-AIEC strains. As shown in Figure 3A and Supplementary Table 7, a total of 19 FimH amino acid substitutions were found in the strain collection, which grouped the strains into 21 variants. No variant comprised uniquely or mainly AIEC strains. When comparing the sequences of AIEC and non-AIEC strains globally, both groups presented on average two substitutions throughout the FimH sequence (AIEC: 2 ± 1; non-AIEC: 2 ± 1; p = 0.915). Individually, none of the amino acid substitutions was associated with the disease of isolation or with the AIEC phenotype. Only N70S and S78N were related to phylogenetic origin, as they were only found in C- or CD-isolated strains from the B2 (69.2 and 73.1% of strains, respectively) and D (25.0 and 25.0% of strains, respectively) phylogroups (p < 0.001) (Table 2). Nevertheless, the invasion index differed significantly depending on the amino acid present at position 119, whereas no difference was found in adhesion capacity. Strains presenting the amino acid A (as in K-12) had lower invasion values (0.223 ± 0.402% of intracellular bacteria/inoculum; N = 47) than strains with V (0.401 ± 0.477% of intracellular bacteria/inoculum; N = 7; p = 0.048).
Table 2 footnotes: Only amino acid substitutions present in more than three strains were examined. E. coli K-12 commensal strain was used as reference. a Only strains isolated from Crohn's disease or controls were considered. Atypical strain was discarded. NS: not significant.
Regarding the chiA gene, 86.2% of AIEC strains (N = 31) and 63% of non-AIEC strains (N = 27) carried this gene (p = 0.044). Twenty-four variable amino acid positions were found, grouping the strains into a total of 16 variants (Figure 3B and Supplementary Table 8). Again, similar protein sequence variants were found among strains isolated from diverse groups of subjects (Figure 3). None of the mutations identified was associated with AIEC, including the five mutations previously described (K362Q, K370E, A378V, E388V, V548E) (Low et al., 2013; Table 2). However, a subcluster of strains with a chiA sequence identical to that of LF82 included a higher proportion of AIEC strains (85%) than non-AIEC strains (15%; p = 0.027). Nevertheless, this variant represented only 35.5% of all AIEC strains and 7.4% of all non-AIEC strains. In addition, the number of variable positions differed between pathotypes, being slightly higher in AIEC strains (10 ± 5) than in non-AIEC strains (8 ± 6; p = 0.038). Of note, most of the strains harboring an amino acid different from that of the K-12 strain belonged to the B2 phylogroup, the V335G amino acid change being an exception, as it was only found in A-phylogroup strains (Table 2).

Test for Rapid AIEC Identification
To establish a strategy that allows rapid identification of AIEC strains, we combined all the data on VG carriage, amino acid variants and antibiotic resistance and performed binary logistic regression to search for predictive features for AIEC screening (Table 3). In the present work, the combination of ampicillin resistance (odds ratio = 5.244; 95% CI = 1.325-20.757) with the presence of the pic gene (odds ratio = 4.854; 95% CI = 1.140-20.638) emerged as a possible means to identify AIEC strains, as it classifies strains according to phenotype with 75.5% overall accuracy (P(AIEC) = -1.974 + 1.657 × ampicillin resistance + 1.579 × pic gene). For a given E. coli strain already isolated from the human intestine that is ampicillin resistant and harbors the pic gene, the probability of being AIEC would be 87.81%. This probability is reduced to 59.76 and 57.87% if the strain has only ampicillin resistance or only the pic gene, respectively, and falls to 22.07% if the strain is susceptible to ampicillin and does not carry the pic gene. Another combination was also significant (ampicillin resistance with presence of the vat gene). However, low sensitivity was achieved in this case (sensitivity 50%, specificity 77.8% and accuracy 65.3%).
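How a logistic model turns the two binary traits into a probability can be sketched as follows. The parameterization below is an assumption inferred from the four reported percentages rather than something stated in the text: they are reproduced to within rounding when the log-odds start at 1.974 for an ampicillin-resistant, pic-positive strain and the coefficients 1.657 and 1.579 are subtracted when ampicillin resistance or the pic gene, respectively, is absent. The function and constant names are hypothetical.

```python
import math

# Values taken from the regression equation in the text; the sign
# convention (penalties for ABSENT traits) is an assumption chosen
# because it reproduces the reported probabilities.
BASELINE_LOG_ODDS = 1.974   # strain that is AmpR and pic-positive
AMPICILLIN_COEF = 1.657     # ln(5.244), the reported odds ratio
PIC_COEF = 1.579            # approximately ln(4.854)


def aiec_probability(ampicillin_resistant: bool, pic_positive: bool) -> float:
    """Predicted probability (0-1) that a human isolate is AIEC."""
    log_odds = BASELINE_LOG_ODDS
    if not ampicillin_resistant:
        log_odds -= AMPICILLIN_COEF
    if not pic_positive:
        log_odds -= PIC_COEF
    # Logistic (inverse logit) transformation.
    return 1.0 / (1.0 + math.exp(-log_odds))


for amp_r in (True, False):
    for pic in (True, False):
        prob = aiec_probability(amp_r, pic) * 100
        print(f"AmpR={amp_r!s:5} pic+={pic!s:5} P(AIEC) = {prob:.2f}%")
```

The exponential of each coefficient gives the corresponding odds ratio, which is why the coefficients and the odds ratios quoted above carry the same information.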

DISCUSSION
The AIEC pathotype has been implicated in CD, but our knowledge about its distribution in other intestinal or extraintestinal diseases, as well as about its reservoirs and transmission paths, is scarce. One reason is that AIEC identification is based on phenotypic traits assessed by cell-culture infection assays, which are extremely time-consuming and hard to standardize. In this work we have characterized in depth, both genetically and phenotypically, a collection of AIEC and non-AIEC strains isolated from the intestinal mucosa of humans and animals, with the aim of better defining the characteristics of the AIEC pathotype and finding putative genetic/phenotypic markers for its rapid identification. In our collection, a higher number of VGs was associated with animal than with human strains and, although the phylogenetic origin determined VG profiles, differences between human and animal strains were still evident when only B2 strains were considered for comparison. This observation must be taken into account in further searches for genetic traits associated with the AIEC pathotype. The inclusion of animal strains in the study helped us to detect that the host origin of isolation needs to be carefully considered when drawing conclusions.
In the present work we have focused on strains isolated from humans. Four genes (vat, pic, iss and papG) differentially distributed between AIEC and non-AIEC strains have been identified. So far, few studies have investigated the prevalence of these genes in AIEC strains. Among these four genes, the vacuolating autotransporter toxin (vat gene) has been implicated in LF82 AIEC pathogenesis (Gibold et al., 2015). It encodes an autotransporter toxin involved in the degradation of gut mucus. We found a higher frequency of vat-positive strains among AIEC, in agreement with two previous studies: Desilets et al. (2015) (9/13 AIEC and 0/6 non-AIEC) and Gibold et al. (2015) (32/75 AIEC and 10/70 non-AIEC). On the other hand, in our work, no differences in the prevalence of vat according to pathotype were found once only B2 strains were considered, as occurred in O'Brien et al. (2016). Nonetheless, higher adhesion and invasion values were found for strains harboring the vat gene, as previously reported (Gibold et al., 2015). The pic gene also encodes a protease with toxin autotransporter activity, so it could also be involved in AIEC pathogenesis. However, so far it has only been studied in Shigella flexneri (Henderson et al., 1999) and in strains from the uropathogenic E. coli, enteroaggregative E. coli and enteroinvasive E. coli pathotypes (Boisen et al., 2009). To the best of our knowledge, no previous study has analyzed its presence in an AIEC collection. Herein, we found this gene in a subset of AIEC strains (41%), while it was less frequent in non-AIEC strains (16%). Moreover, higher adhesion values were found for pic-positive strains. This observation, together with the fact that pic may contribute to intestinal colonization in mouse models for enteroaggregative E. coli (Boisen et al., 2009), suggests that the presence of pic might confer some virulence advantage on the bacteria. Isogenic mutants are required to confirm its implication in AIEC virulence. However, no difference between AIEC pic+ (60%) and non-AIEC pic+ (29%) strains was found once only B2-phylogroup strains were considered, a fact that may be attributable to the number of strains analyzed. The iss (increased serum survival) gene encodes a protein responsible for serum resistance in ExPEC, such as avian pathogenic E. coli strains (Johnson et al., 2008). In this case, Dogan et al. (2014) did not describe an association with AIEC, but differences in the phylogenetic origin of the strain collections probably influence these results. Finally, the combination of alleles papGII/III, encoding adhesins of the E. coli P pilus, was found in a low percentage of human strains (12%). Nonetheless, this gene is involved in adhesion processes and has been suggested to contribute to the pathogenesis of urosepsis (Féria et al., 2001). The prevalence of papGII has only been reported in AIEC strains isolated from pediatric CD patients, yet at a very low frequency compared with AIEC isolated from C (Conte et al., 2014).
Previous studies have reported differences in the prevalence of some genes according to pathotype (pduC, chuA, lpfA, lpfA+gipA and vat) (Dogan et al., 2014; Gibold et al., 2015; Vazeille et al., 2016; Céspedes et al., 2017). However, in our strain collection, similar lpfA154 and chuA gene prevalence values were found in AIEC and non-AIEC isolates. Bearing in mind that VG carriage is strongly associated with phylogenetic origin (Kotlowski et al., 2007), we suspect that these discrepancies may be explained by the diversity of the strain collection used in each study. Therefore, our results confirm the high genetic variability of AIEC strains and suggest that many of the genetic features described to date are in fact related to the phylogroup origin of the strains rather than to the AIEC phenotype.
Results obtained on FimH, one of the most studied virulence factors in the AIEC pathotype, are in line with previous data (Desilets et al., 2015; O'Brien et al., 2016; Céspedes et al., 2017), since no pathoadaptive mutations were specifically associated with the AIEC pathotype.
In addition, although Dreux et al. (2013) and Iebba et al. (2012) indicated that the N70S and S78N FimH variants could confer on the strain an increased capacity to adhere to the human receptor CEACAM6, no increased adhesion was observed for strains harboring these variants in our collection. Nonetheless, strains with the A119V mutation, a substitution previously reported to confer an advantage in adhesion (Iebba et al., 2012), presented higher invasion indices. In fact, N70S and S78N were associated with B2 and D-phylogroup strains, as has been determined in other groups of strains (Hommais et al., 2003; Miquel et al., 2010; Iebba et al., 2012; Dreux et al., 2013; Desilets et al., 2015). Finally, while the G66S and V27A variants have been associated with CD origin of the strains and the T74I, V163A and A242V variants with UC strains in a previous study (Iebba et al., 2012), no particular variants were associated with disease origin in the present work.
To our knowledge, this is the first study examining the sequence of ChiA in a large strain collection (N = 58). Until now, differences in ChiA sequence had only been sought between LF82 and K-12, and five mutations (Q362K, E370K, V378A, V388E, and E548V) were described as required for the proper interaction between bacteria and epithelial cells (Low et al., 2013). Although we found these mutations equally distributed between AIEC and non-AIEC strains, and no significant differences in either adhesiveness or invasiveness between variants, it is of note that the AIEC LF82 sequence variant was mainly shared among AIEC strains (85% of the strains with this sequence were AIEC). Unfortunately, this variant is not highly frequent in the whole AIEC collection, as only 35.5% of AIEC strains had this gene sequence variant, so we suggest that this gene is not suitable for AIEC screening. Additional studies on the expression of the chiA gene would be needed in order to decipher whether strains harboring the same sequence express this gene differentially according to pathotype.
Few studies have evaluated the capacity of AIEC strains to resist the action of antibiotics (Subramanian et al., 2008; Dogan et al., 2013, 2018; Brown et al., 2015; Oberc et al., 2018), and none has compared antibiotic resistance between AIEC and non-AIEC strains. In this work, we have combined this feature with VG prevalence. Although no specific and widely distributed AIEC characteristic has been found, we show that the presence of the pic gene and ampicillin resistance are two traits that could assist in AIEC screening, since AmpR pic+ E. coli strains have a probability of 82% of being AIEC. This could be of use as an initial method for screening human E. coli isolates. The major problem is false positives, so strains predicted to be AIEC by this method should be further tested phenotypically. It is also necessary to test the specificity of the method using genetically close pathotypes such as extraintestinal pathogenic E. coli and to test its applicability in external strain collections isolated from different geographical locations.
To sum up, these data provide deeper knowledge about AIEC VG sets, which has revealed four VGs that could be of relevance in AIEC pathogenicity. We reinforce the idea that no particular VG is related to the AIEC phenotype. Although diverse virulence factors could lead to the same phenotype, the existence of an AIEC-specific marker cannot be ruled out. Differences in gene expression or point mutations in core genes may explain the genetic basis of the AIEC pathotype. Notably, a novel strategy to assist in AIEC identification is proposed, yet further work confirming our results in additional strain collections is necessary.
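The concern about false positives can be made concrete by recalling how sensitivity, specificity and accuracy derive from a confusion matrix. The sketch below is illustrative only: the counts are hypothetical, chosen so that the output matches the figures reported in the Results for the ampicillin resistance plus vat model (sensitivity 50%, specificity 77.8%, accuracy 65.3%); the actual confusion matrix is not given in the text, and the function name is an assumption.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity and accuracy (in %) from confusion counts.

    tp/fn: AIEC strains predicted AIEC / predicted non-AIEC.
    tn/fp: non-AIEC strains predicted non-AIEC / predicted AIEC.
    """
    total = tp + fn + tn + fp
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must contain at least one strain")
    return {
        "sensitivity": tp / (tp + fn) * 100.0,
        "specificity": tn / (tn + fp) * 100.0,
        "accuracy": (tp + tn) / total * 100.0,
    }


# Hypothetical counts consistent with the reported percentages.
metrics = screening_metrics(tp=11, fn=11, tn=21, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

In a screening setting, every false positive is a strain that would still need phenotypic confirmation, so specificity directly determines the extra laboratory workload.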

AUTHOR CONTRIBUTIONS
MM-M designed the study and obtained funding. CC-F, MM-M, ML-S, and CE obtained the data. CC-F and MM-M performed the statistical analysis. CC-F drafted the manuscript. MM-M and ML-S revised the manuscript.

FUNDING
This work was funded by the Universitat de Girona projects MPCUdG2016-009 and GdRCompetUdG2017, and by the Spanish Ministry of Education and Science through projects SAF2010-15896 and SAF2013-43284-P, the latter being co-funded by the European Regional Development Fund. CC-F was the recipient of an IF grant from the Universitat de Girona (IFUDG2015/12).

ACKNOWLEDGMENTS
We thank Ms. Nerea Maeso Sánchez and Ms. Ariadna Segú Roig for their assistance with the gene sequencing part of the study.
TABLE S1 | Information of the patients from whom the UC and CRC strains were isolated.
TABLE S2 | Information about the strains analyzed. Host and disease of isolation, phylogroup origin, adhesion (ADH), invasion (INV) and intramacrophage replication (REPL) indices as well as country origin are indicated. VG prevalence, FimH and ChiA amino acid (aa) substitutions and AB resistance is depicted.