Taxon Appearance From Extraction and Amplification Steps Demonstrates the Value of Multiple Controls in Tick Microbiota Analysis

Background The development of high-throughput sequencing technologies has substantially improved analysis of bacterial community diversity, composition, and functions. Over the last decade, high-throughput sequencing has been used extensively to identify the diversity and composition of tick microbial communities. However, a growing number of studies are warning about the impact of contamination brought along the different steps of the analytical process, from DNA extraction to amplification. In low biomass samples, e.g., individual tick samples, these contaminants may represent a large part of the obtained sequences, and thus generate considerable errors in downstream analyses and in the interpretation of results. Most studies of tick microbiota either do not mention the inclusion of controls during the DNA extraction or amplification steps, or consider the lack of an electrophoresis signal as an absence of contamination. In this context, we aimed to assess the proportion of contaminant sequences resulting from these steps. We analyzed the microbiota of individual Ixodes ricinus ticks by including several categories of controls throughout the analytical process: homogenization, DNA extraction, and DNA amplification. Results Controls yielded a significant number of sequences (1,126–13,198 mean sequences, depending on the control category). Some operational taxonomic units (OTUs) detected in these controls belong to genera reported in previous tick microbiota studies. In this study, these OTUs accounted for 50.9% of the total number of sequences in our samples, and were considered contaminants. Contamination levels (i.e., the percentage of sequences belonging to OTUs identified as contaminants) varied with tick instar and sex: 76.3% of nymphs and 75% of males demonstrated contamination over 50%, while most females (65.7%) had rates lower than 20%. Contamination mainly corresponded to OTUs detected in homogenization and extraction reagent controls, highlighting the importance of carefully controlling these steps. Conclusion Here, we showed that contaminant OTUs from sample laboratory processing steps can represent more than half the total sequence yield in sequencing runs, and lead to unreliable results when characterizing tick microbial communities. We thus strongly advise the routine use of negative controls in tick microbiota studies, and more generally in studies involving low biomass samples.

Background: The development of high-throughput sequencing technologies has substantially improved analysis of bacterial community diversity, composition, and functions. Over the last decade, high-throughput sequencing has been used extensively to identify the diversity and composition of tick microbial communities. However, a growing number of studies are warning about the impact of contamination brought along the different steps of the analytical process, from DNA extraction to amplification. In low biomass samples, e.g., individual tick samples, these contaminants may represent a large part of the obtained sequences, and thus generate considerable errors in downstream analyses and in the interpretation of results. Most studies of tick microbiota either do not mention the inclusion of controls during the DNA extraction or amplification steps, or consider the lack of an electrophoresis signal as an absence of contamination. In this context, we aimed to assess the proportion of contaminant sequences resulting from these steps. We analyzed the microbiota of individual Ixodes ricinus ticks by including several categories of controls throughout the analytical process: homogenization, DNA extraction, and DNA amplification.
Results: Controls yielded a significant number of sequences (1,198 mean sequences, depending on the control category). Some operational taxonomic units (OTUs) detected in these controls belong to genera reported in previous tick microbiota studies. In this study, these OTUs accounted for 50.9% of the total number of sequences in our samples, and were considered contaminants. Contamination levels (i.e., the percentage of sequences belonging to OTUs identified as contaminants) varied with tick instar and sex: 76.3% of nymphs and 75% of males demonstrated contamination over 50%, while most females (65.7%) had rates lower than 20%. Contamination mainly corresponded to OTUs detected in homogenization and extraction reagent controls, highlighting the importance of carefully controlling these steps.

INTRODUCTION
Analysis of microbial community composition by 16S metabarcoding represents a powerful tool commonly used to evaluate the prokaryotic diversity in many ecosystems. Highthroughput sequencing approaches allow to identify microbiota with high coverage and reveal the exceptional and complex microbial diversity of environments ranging from the wide ocean (Sogin et al., 2006;Hamdan et al., 2013) to small arthropods (Degli Esposti and Martinez Romero, 2017;Estrada-Peña et al., 2018). With the constant improvement of sensitivity associated with its decreasing cost, 16S metabarcoding has spread to many areas, including those dealing with low biomass communities such as tick microbiota.
Several potential biases, inherent to the PCR amplification (chimeric sequences) or sequencing steps (random sequencing errors), are now well known (Galan et al., 2016), and pipelines have been developed to detect and remove these errors during the sequence analysis. However, several studies have begun to warn scientists about the potential distortion of results due to contamination from the extraction/amplification steps or from the environment during sample processing (Salter et al., 2014;Strong et al., 2014;Weiss et al., 2014;Narasimhan and Fikrig, 2015;Galan et al., 2016;Glassing et al., 2016;Greay et al., 2018;Knight et al., 2018). This potential contamination is able to significantly skew results concerning microbe diversity and composition, particularly in the case of low biomass samples, and can have serious consequences on the interpretation of results. For example, while investigating 16S rRNA gene profiles from nasopharyngeal swabs regularly sampled from a cohort of children from 1 to 24 months, Salter et al. (2014) demonstrated that the apparent clustering of samples by age, observed by principal coordinate analysis, in fact corresponded to the commercial kit used to perform the DNA extraction of these samples. While studying the influence of laboratory processing on diluted samples, these authors also observed that the lower the initial biomass of samples, the higher the proportion of contaminants associated with the kits or the environment. In the context of tick microbiota studies, performing negative controls would therefore be essential to detect potential contamination, and efficiently distinguish this contamination from genuine tick microbial communities. On the basis of a large review of the scientific literature, it is clear that routine multiple control analyses (extraction and PCR negative controls) are not considered at their true value, and are only rarely implemented (i.e., sequenced and used to filter contaminants from the dataset) in tick microbial community analyses (Hawlena et al., 2013;Gofton et al., 2015a,b;Rynkiewicz et al., 2015;René-Martellet et al., 2017;Sperling et al., 2017;Estrada-Peña et al., 2018;Thapa et al., 2019). In this context, we performed a 16S rRNA gene sequencing analysis on individual ticks (females, males, and nymphs) considering concomitantly several negative controls at each step of the process, prior to the sequencing run (tick homogenization, DNA extraction, and amplification).

Tick Collection and Sample Selection
Questing Ixodes ricinus nymphs and adults were collected for 3 years by dragging (from April 2014 to May 2017) in the Sénart forest in the south of Paris. More details on the sampling location and design, and tick collection, are available in Lejal et al. (2019). After morphological identification, ticks were stored at −80 • C until further analysis.

Tick Washing, Homogenization, and DNA Extraction
Ticks were first washed once in ethanol 70% for 5 min and rinsed twice in sterile MilliQ water for 5 min each time. Ticks were then individually homogenized in 375 µL of Dulbecco's Modified Eagle Medium with decomplemented fetal calf serum (10%) and six steel beads using the homogenizer Precellys R 24 Dual (Bertin, France) at 5,500 rpm for 20 s. DNA extraction was performed on 100 µL of tick homogenate, using the NucleoSpin R Tissue DNA extraction kit (Macherey-Nagel, Germany). For these previous steps, sterile tubes and filter tips were used on a dedicated area for DNA extraction. Bench and materials were decontaminated with DNA remover (Molecular BioProducts, San Diego, CA, United States) before and after each use. Each time a tick homogenization or DNA extraction was performed, a homogenization control (HC) or extraction control (EC) was added for the corresponding step. Three positives controls, corresponding to a microbial community standard (D6300 ZymoBIOMICS Microbial Community Standard -Zymo Research, United States), were also extracted in the same way.

DNA Amplification and Multiplexing
In total, 32 males, 35 females, and 557 nymphs were subjected to sequencing analysis. DNA amplifications were performed on the V4 region of the 16S rRNA gene using the primer pair used by Galan et al. (2016) (16S-V4F: 5 -GTGCCAGCMGCCGCGGTAA-3 and 16S-V4R: 5 -GGACTACHVGGGTWTCTAATCC-3 ), producing a 251 bp amplicon. This primer pair was chosen for its ability to specifically amplify prokaryotic DNA (86 or 95% of referenced bacterial DNA in the Silva database, and 52.3 or 93% of Archaeal DNA, when considering 0 or 1 mismatch, respectively). This characteristic is crucial in the context of tick microbiota analysis, due to the high amount of tick DNA compared to prokaryotic DNA. An 8 bp-index was added to the forward and reverse primers. The individual amplification of each sample was performed using a combination of 22 and 32 index-tagged forward and reverse primers, allowing for amplification and multiplexing of a maximum of 704 samples that can be loaded together on an Illumina MiSeq flow cell. Amplification and multiplexing steps were performed under a sterile hood, decontaminated before and after use with DNA remover (Molecular BioProducts, San Diego, CA, United States) and UVs, using tube strips with individual caps to prevent crosscontamination. All the PCR amplifications were carried out using the Phusion High-Fidelity R DNA Polymerase amplification kit (Thermo Fisher Scientific, Lithuania). For each sample, 5 µL of DNA extract were amplified in a 50 µL final reaction volume, containing 1× Phusion HF buffer, 0.2 µM of dNTPs, 0.2 U/mL of Phusion DNA polymerase, and 0.35 µM of forward and reverse primer. The following thermal cycling procedure was used: initial denaturation at 98 • C for 30 s, 35 cycles of denaturation at 98 • C for 10 s, annealing at 55 • C for 30 s, followed by extension at 72 • C for 30 s. The final extension was carried out at 72 • C for 10 min. Amplification step was performed on a peqSTAR 2× thermocycler (PEQLAB, Germany). For each PCR run, negative controls were performed by using the reaction mixture without template, but using ultrapure water (Invitrogen, United Kingdom). PCR products were checked on 1.5% agarose gels.

DNA Purification, Quantification, and Pooling
Amplicons were purified and quantified using a SequalPrep normalization plate kit (Invitrogen, United States), and a Qubit dsDNA High Sensitivity Assay kit (Invitrogen, United States), respectively. Amplicons and controls were pooled at equimolar concentrations and sent to the sequencing platform (GenoScreen, France).

Negative Controls
In total, 45 negative controls were included and analyzed using the same process as for tick samples (Figure 1).
Twelve Homogenization controls (HCs): Tube containing homogenization solution, remained open in the room while putting ticks inside other tubes between the cleaning and homogenization steps. DNA extraction and amplification were then performed on the homogenization control in the same conditions as for any other sample.
Twelve Extraction reagent controls (ECs): For these controls, the different DNA extraction steps were performed under the same conditions as for any other samples, but without adding sample. DNA amplification was then performed on the extraction control in the same conditions as for any other sample.
Twenty-one DNA amplification controls (ACs): The DNA amplification step was performed on ultrapure water (Invitrogen, United Kingdom), in the same conditions as for any other sample.

Sequencing and Data Processing
The equimolar mix was concentrated and sequenced by GenoScreen (Lille, France) using MiSeq Illumina 2 × 250 bp chemistry. Quality control checks on raw data were performed using FastQC v0.11.3 (Andrews, 2010), and no problems were detected. First, pairs of amplicons were merged using Pear v0.9.11 (Zhang et al., 2014) with a minimum overlap of 20 bp, a minimum assembly length of 100 bp, a maximum assembly length of 480 bp, and an e-value of 0.01. Then, the sequencing orientation of extended sequences was checked by using cutadapt v1.12 (Martin, 2011), with primers in forward and reverse strands (without removing them). Sequences with a reversed orientation were reverse-complemented to obtain a set of sequences with the same orientation (5 to 3 ). The sequence analysis was performed using FROGS (Escudié et al., 2018). Primers were removed using cutadapt and resulting sequences were dereplicated. Sequences smaller or longer than 250 bp ± 20 bp, or containing N-nucleotides were removed from the dataset. Operational taxonomic units (OTUs) were built using Swarm v2.1.12 (Mahé et al., 2015) with parameter d = 3, and chimeras were filtered using vsearch v1.4 (Rognes et al., 2016) in a de novo mode. Finally, OTUs with low abundance (<0.005% of the total sequence count) were filtered out. OTUs were affiliated by blasting cluster seed FIGURE 1 | Analyzing process for samples and added controls at each steps: Tick homogenization, DNA extraction, DNA amplification, and 16s rRNA gene sequencing. For each step, information stated in black correspond to the initial sample and potential controls added at a previous step, and blue stated information correspond to the controls added at the present step.
sequences against the Silva database v132 (Quast et al., 2013). OTUs corresponding to mitochondrial/chloroplast DNA or not affiliated to any domain were removed.

Data Processing to Distinguish Contaminant Versus Tick Microbiota OTUs
The control composition was used to distinguish tick microbial OTUs from contaminants. All the OTUs corresponding to more than 1% of the total number of sequences obtained in each category of control were identified as main contaminants.
However, due to possible errors in sequence assignment occurring during sequencing due to the formation of mixed clusters on the flow cell (Kircher et al., 2012), negative controls can also contain a few sequences of tick microbial OTUs, complicating contaminant identification under this 1% threshold. Therefore, for each OTU, the number of sequences expected to spread to other samples due to false assignment (i.e., expected number) was determined by calculating 0.02% of the total number of each OTU in the dataset, corresponding to a false assignment rate estimated previously by Galan et al. (2016). This expected number was compared to the maximum number of obtained sequences in controls (i.e., maximum number).
As this rate was not estimated from our dataset, we decided to allow variability around this threshold, while calculating a 99% confidence interval around the expected number of 0.02%. OTUs whose maximum sequence number was above the upper threshold were considered contaminants. Please note that cross-contaminations from samples to controls were considered negligible. Indeed, no negative controls presented a higher maximum sequence number than the expected number for an OTU belonging to the most abundant genus of tick microbiota in our dataset (Ca. Midichloria), which is also the 3rd most abundant OTU in the dataset (266,454 sequences).

Statistical Analyses
The percentage of samples presenting a contamination rate higher than 50% was compared between samples categories (nymphs, males, and females) according to a Chi-square statistical test.
Alpha diversity was estimated before and after contaminating OTUs filtering. Samples containing fewer than 500 sequences after contaminant filtering were removed from the analysis, even if they contained more than 500 sequences before the filtering. Several measures of alpha diversity were calculated. Basically, the observed number of OTUs (species richness) was used as well as the Faith phylogenetic diversity (PD) index, corresponding to the total branch length of the subtree spanned by the community, and also two indexes taking into account the number of sequences: the Shannon index (more influenced by rare OTUs) and Simpson index (more influenced by dominant OTUs). Phylogenetic diversity was estimated using the Picante package (Kembel et al., 2010), while other measures were assessed thanks to Easy 16S, 1 based on the phyloseq package (McMurdie and Holmes, 2013). Alpha diversity measure comparison was performed between contaminant filtered and not filtered groups, as well as between instars, using non-parametric statistical tests (Wilcoxon and Kruskal-Wallis, respectively). Statistical computations were performed in R 3.6.0 (R Core Team, 2018).

RESULTS
After sequencing, demultiplexing and merging amplicons, we obtained a total of 4,767,043 sequences. The impact of the different steps of the data processing: (1) merging filters, (2) FROGS pre-process, (3) Swarm clustering, (4) Chimera removing, and (5) filtering of rare OTUs are described for each sample in the Supplementary Table S1. After performing all these steps, we obtained a dataset composed of 608 OTUs for 2,804,545 sequences. After the affiliation step using the SILVA data base, OTUs corresponding to mitochondrial/chloroplast DNA or not affiliated to any domain, have been removed, and we finally obtained a dataset composed of 513 OTUs and 2,695,360 sequences.
Positive controls (microbial community standards) presented a mean number of sequences of 5,702 (±634) and a mean number of OTUs of 60 (±9). All the expected micro-organisms were 1 http://genome.jouy.inra.fr/shiny/easy16S/ detected and 98.1% (±1.1) of the sequences were attributed to the microbial community standard.
The most abundant OTUs, with relative abundance higher than 1% in at least one category of samples or negative controls, are represented in Figure 2. All other OTUs, i.e., those with relative abundance lower than 1% in all categories were grouped together in the "Others" category.
The examination of the at-least-1% OTU composition of the control (HCs, ECs, and ACs) allowed to identify OTUs belonging to the genera: Pseudomonas, Acinetobacter, Halomonas, Shewanella, Tepidimonas, Escherichia-Shigella, Photobacterium, Polaromonas, Deinococcus, and Reyranella, as well as two OTUs belonging to the Enterobacteriaceae family but multi-affiliated at the genus level, and one OTU only affiliated to Gammaproteobacteria at the Class level (Figure 2). OTU composition was contrasted across the control types. While the same OTUs were identified in HCs and ECs and belonged to Pseudomonas (70.7 and 69.3%, respectively), multiaffiliated Gammaproteobacteria (12.2 and 16.7%, respectively), Acinetobacter (3.1 and 2.4%, respectively), and multi-affiliated Enterobacteriaceae (2.7 and 3.6%, respectively), the microbial community identified in AC controls was dominated by OTUs belonging to Halomonas (58.8%), Shewanella (13.6%), Tepidimonas (5.1%), Escherichia-Shigella (2.3%), Photobacterium (1.5%), Polaromonas (1.5%), Deinococcus (1.5%), Reyranella (1.2%), and multi-affiliated Enterobacteriaceae (1.2%). These OTUs represented 38.9% of the total number of sequences coming from the tick samples. Focusing on the sample category, these OTUs correspond to 11.9, 40.4, and 41.3% of sequences detected in females, males and nymphs, respectively (Figure 2). These OTUs were therefore considered to be the main contaminants of the dataset. It is interesting to note that most of these OTUs correspond to those detected in both EC and HC controls. In fact, only 2.8% of sequences detected in the different categories of samples (nymphs, females, and males) correspond to OTUs detected in the AC controls and identified as main contaminants.
A more in-depth identification of contaminants, including those corresponding to less than 1% of the total number of sequences was performed, allowing us to identify 160 OTUs as contaminants among the 513 initially identified OTUs in the dataset. This identification increased the proportions of identified contaminants in the samples, which reach 50.9% of the total sequence count from the tick samples, and 20.1, 51.2, and 53.7% of sequences detected in females, males, and nymphs, respectively. Furthermore, after these contaminant filtering steps, 201 samples contained less than 500 sequences compared to 8 before filtering. For 76.6% of nymph and 75% of male samples, contaminant OTUs represented more than 50% of the total FIGURE 2 | Comparison of the 16S rRNA gene sequencing taxonomic assignments between negative controls and samples. Percentages are represented for the OTUs with proportions corresponding to at least 1% of the total number of sequences assigned to a category of samples or controls. Those corresponding to less than 1% are assigned as "Others." OTUs belonging to the same genus or family (if multi-affiliated at the scale of genus) were assigned under the same name. One OTU was only identified until the class level (multi-affiliated Gammaproteobacteria). Most OTUs detected in homogenization and DNA extraction controls seems to represent a high proportion of reads detected in samples, particularly in nymph and male samples. sequence count (Figure 3). These sample contamination rates are significantly higher than those obtained for females (p < 0.05 according to a Chi-square test), for which most of the samples (65.7%) exhibited a contamination rate below 20%. The impact of this contamination on diversity, according to the sample category was estimated (Figure 4). As expected, for all the studied diversity measures, contaminant filtering led to a decrease in diversity. Concerning nymphs and females, this decrease was significant for all the tested diversity indexes, while for males, it was only significant for the observed and phylogenetic diversity measures, suggesting that for males, contamination concerned mostly low-abundance taxa. The impact of this contaminant filtering on alpha diversity comparison between tick instars was been studied. Without filtering, significant differences in alpha diversity were found between tick instars for observed, Shannon and Simpson measures. Concerning filtered data, significant differences were still observed between instars when using Shannon and Simpson indexes, but disappeared when using the observed diversity measure.

DISCUSSION
In this study, we sequenced the microbiota of 624 individual I. ricinus ticks including several categories of negative controls to identify and quantify the contaminating sequences, in particular those produced during the extraction or amplification steps. The performance of multiple negative controls allowed us to perform in-depth identification of contaminant OTUs. These OTUs represented 50.9% of the total sequence yield from tick samples, and had a significant impact on microbial diversity estimation. This shows the importance of contamination potentially arising from processing steps, and demonstrates that its consequences on data analyses can be significant and should not be neglected.
A high number of sequences in each category of controls was detected, particularly in HCs and ECs, in which the mean number of sequences was even higher than the mean sequence number detected in samples. Importantly, this higher number might be partly overestimated, first due to the presence of tick DNA during the PCR that may have inhibited the reaction on the 16S rRNA gene in samples while not in controls, and second during the measure of DNA concentrations that may have led to a higher estimation of concentrations in samples compared to controls, before the pooling step. Despite this, the high number of sequences in controls raises the issue of contaminant presence, while performing each step of the sample processing.
Each processing step leads to a different contamination pattern. Most of the main contaminants do not seem to originate from the DNA amplification step. Furthermore, as the composition of homogenization and extraction controls was the same, while the reagents used were different and homogenization control also underwent the extraction step, the contaminant sequences most probably came from the DNA extraction step, rather than from the homogenization step. The key role of the FIGURE 3 | Impact of contamination on the final number of sequence per sample according to the tick instar/sex. A high proportion of sequences have been identified as contaminants and therefore removed from the dataset. Nymph and male samples seem to be more impacted, with almost 75% of samples containing more than 50% of contaminating sequences. extraction step in the contamination of low biomass samples has previously been demonstrated by Salter et al. (2014). Furthermore, three of the four main contaminants identified in our analysis, corresponding to Pseudomonas, Acinetobacter, and multi-affiliated Enterobacteriaceae, belong to families and genera identified by these authors as contaminants coming from DNA extraction kits. Our observations differed from those obtain by Galan et al. (2016), who mainly detected contaminants coming from the DNA amplification step, while this step represented only 2% of the main contaminants detected in our samples. In their study of kit contaminants, Salter et al. (2014) observed that the extent of contamination arising from DNA extraction was different depending on the kit brand used to perform this step. While using the less contaminating kit, they extracted DNA from several 10-fold cascade dilutions of a pure suspension of Salmonella bongori, and subsequently assessed 16S rRNA gene concentrations via q-PCR. They demonstrated a linear relationship between the 16S rRNA gene copy number and the dilution fold only for the first dilution points, after which levels started to stabilize and to reach a threshold of 10 5 mean copy number/mL. Consequently, the higher rate of extraction, rather than amplification contaminants in samples, could be due either to variations of contaminating load between our extraction and amplification kits, that are not sourced from the same company, or to the presence of a basal level of extraction contaminant bacterial DNA, potentially limiting the contamination arising from the next step (amplification). These types of contamination seems to be the main in our dataset. However, some other sources of external contamination, such as those potentially coming from the dragged material or inherent to 16s metabarcoding (i.e., cross talk during the sequencing run, leading to false assignment of sequences between samples) could be considered. False assignment rate can be assessed through the use of an alien sequence (Galan et al., 2016). Such a sequence was not used in our dataset. Nonetheless, if false assignments would have highly affected our data set, we probably would have detected high proportion of Ca. Midichloria (the most abundant known tick microbial member) sequences in our negative controls, that would have led to its identification as a contaminant. Even if we are aware that dragged material or false assignments might be another source of contamination, we hypothesize, based on our results, that they would be probably much weaker than those linked to the extraction steps and would represent a low percentage of the total number of sequences.
Interestingly, we showed that in terms of contamination rates, the influence of contaminant presence was significantly higher in nymph and male samples compared to female ones. Because low biomass samples were shown to be particularly sensitive to contamination (Salter et al., 2014), we might suppose that the amount of bacterial DNA would be lower in males and nymphs, explaining the higher presence of contaminants in these samples. It would be useful to develop epifluorescence or flow cytometry protocols adapted to tick samples, to assess whether the amount of bacteria composing the female microbiota is effectively higher than in the male and nymph. Additionally, previous studies dealing with tick microbes have identified higher microbiota diversity in males and nymphs compared to females, while investigating richness and PD (Van Treuren et al., 2015;Zolnik et al., 2016) as well as Shannon and Simpson indexes (Zolnik et al., 2016). However, no negative controls and data filtering from contaminants were reported in these studies, and they could therefore be impacted by contamination. In the present study, we determined that the presence of contaminants can lead to an overestimation of microbial diversity in all the tick instars, and that this is the case for several diversity indexes, especially richness and PD estimations, which are the measures most affected by the filtering process. Furthermore, the impact of removing contaminants on diversity comparisons between instars was estimated and showed that this filtering step can influence the final results, and particularly in our case, concerning the observed diversity estimation. Our results therefore call for caution in attempts to draw conclusion about tick instar diversity without considering negative controls.
As already mentioned by Narasimhan and Fikrig (2015), the understanding of arthropod microbial communities in the context of arthropod survival and pathogen transmission may open new routes in arthropod-borne pathogen control strategies. In this context, many studies have assessed tick microbiota FIGURE 4 | Impact of contamination on alpha diversity estimations according to the tick instars. Alpha diversity has been estimated according to the observed number of OTUs, the Shannon and Simpson indexes and the Faith's phylogenetic diversity (PD), for each instar on filtered (turquoise) and not filtered (orange-red) datasets. Results of the statistical test comparing alpha diversity measurement between instars, as well as between the filtered and not filtered groups, are represented on colored bands at both the top and the bottom of the frame, respectively. NS, not significant, ***p < 0.001, **p < 0.01. composition in individual ticks or even in tick organs (Andreotti et al., 2011;Lalzar et al., 2012;Hawlena et al., 2013;Budachetri et al., 2014;Narasimhan et al., 2014;Qiu et al., 2014;Clayton et al., 2015;Rynkiewicz et al., 2015;Van Treuren et al., 2015;Gall et al., 2016;Khoo et al., 2016;Zolnik et al., 2016;Abraham et al., 2017;Kwan et al., 2017;Swei and Kwan, 2017;Estrada-Peña et al., 2018). However, very few reported the use of negative controls or mentioned the identification and removal of subsequently identified contaminants (Hawlena et al., 2013;Gofton et al., 2015a,b;Rynkiewicz et al., 2015;René-Martellet et al., 2017;Sperling et al., 2017;Estrada-Peña et al., 2018;Thapa et al., 2019), thus obscuring the risk that identified micro-organisms as members of the microbiota could correspond to contaminants. Nonetheless, the bias generated by the presence of contaminants can have a considerable influence on the characterization of tick microbiota composition and structure. Here, we showed that contaminants represented almost 50.9% of the total sequence yield of tick samples. Furthermore, several contaminants we identified belong to genera (especially Pseudomonas and Acinetobacter) previously reported in several tick microbiota studies (reviewed by Narasimhan and Fikrig, 2015). This does not necessarily mean that all OTUs belonging to these genera are contaminants, as several OTUs belonging to the same genera may correspond either to tick microbiota members or to contaminants. However, this points out the lack of reliability concerning this crucial point in the understanding of tick microbiota composition, and the need to perform negative controls to filter contaminants from the data set and to rule out the hypothesis of potential contamination.

CONCLUSION
The use of commercial DNA extraction kits has become standard, particularly in the context of studies dealing with the identification of tick microbial communities, notably because of their ease of use and the fact that they employ safer chemicals than those in conventional methods (i.e., phenol chloroform extraction). However, the issue of contaminants coming from DNA extraction kits has recently been raised for microbiota studies dealing with low biomass samples, such as blood, intestinal tissue, mucosal tissue, or nasopharyngeal samples. Our study adds further evidence supporting these important concerns in the field of tick microbiota. Here, our results suggest that many negative controls should be performed at each step of sample processing prior to performing microbiota analyses, allowing us to efficiently identify a potentially serious bias in the study of the actual tick microbial community. While more and more studies try to identify tick microbiota to in fine potentially propose hypothesis for the development of new tick-borne pathogen control strategies, we strongly advise the routine use of negative controls in microbiota studies, particularly in the context of low biomass samples like ticks or other small arthropods. These procedures should be used routinely in order to obtain the most reliable results possible.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the European Nucleotide Archive. Project accession number: PRJEB36903 (ERP120162) -Sample accession numbers: ERS4353953-ERS4354625 -Run accession numbers: ERR3956669-ERR3957340.

AUTHOR CONTRIBUTIONS
TP, MV-T, J-FC, and EL conceived and designed the experiments. EL performed the experiments. EL, AE-P, TP, MMs, MMi, CM, and OR analyzed the data. EL, AE-P, MMs, J-FC, OR, MMi, CM, MV-T, and TP wrote the manuscript. All the authors read and approved the final manuscript.

FUNDING
This work was supported by the Métaprogramme "Metaomics and Microbial Ecosystems" (MEM) and the Métaprogramme "Adaptation of Agriculture and Forests to Climate Change" (ACCAF) granted by the French National Institute for Agricultural Research (France). The salary of EL, the Ph.D. student working on this project was funded by the Ile-de-France region.