Relevance of DNA barcodes for biomonitoring of freshwater animals

The COI gene, colloquially named the DNA barcode, is a universal marker for species identification in the animal kingdom. Nevertheless, because of the taxonomic impediment, and because high-throughput sequencers can generate millions of sequences in one run, various approaches based on molecular operational taxonomic units (MOTUs) have been proposed. In the case of freshwater systems, it is possible to analyze whole communities through their DNA using only water or sediment as a sample. Using DNA barcodes with these technologies is known as metabarcoding. More than 90% of studies based on eDNA work with MOTUs without previous knowledge of the biodiversity in the habitat. Despite this problem, metabarcoding has been proposed as the future of biomonitoring. These studies are biased toward the Global North and focused on freshwater macrofauna. Few studies include other regions of the world or other communities, such as zooplankton and phytoplankton. The future of biomonitoring should be based on a standardized gene, for example, COI, the most studied gene in animals, or another secondary consensual gene. Here, we analyzed some proposals with 28S or 12S. Studies on eDNA can focus on the whole community or on a particular species, which can be endangered or exotic. Any eDNA study focused on a community should have a well-documented DNA baseline linked to vouchered specimens. Otherwise, it will be difficult to discriminate between false positives and false negatives. Biomonitoring routines based on eDNA can detect a change in a community due to any perturbation of the aquatic ecosystem. They can also track changes through the history of an epicontinental environment by analyzing sediments. However, their implementation will be complex in most megadiverse Neotropical countries due to the lack of these baselines. It has been demonstrated that a rapid functional construction of a DNA baseline is possible, although the curation of the species can take more time. However, there is a lack of governmental interest in this kind of research and subsequent biomonitoring.


Introduction
Since the proposal by Hebert et al. (2003), DNA barcodes have become a hot topic, with many controversies of a philosophical nature (Ebach and de Carvalho, 2010). Their failure to discriminate species has also been reported, particularly in plants and fungi, where the proposed markers (Hollingsworth et al., 2009; Schoch et al., 2012) have many limitations. Some proposals with other markers have also been made (Heeger et al., 2019; Liu et al., 2022).
Nevertheless, the number of studies incorporating DNA barcodes has increased steadily since their inception, to nearly 1400 in the year 2020, despite the predictions of some people about their end (Taylor and Harris, 2012). However, Elías-Gutiérrez et al. (2021) noted that progress has not been the same for aquatic organisms, and it has been slower in freshwaters, with no more than 90 publications in the same year. Most probably, these limited results are due to problems amplifying the gene proposed as the standard DNA barcode, the first half of cytochrome c oxidase I (COI or COX1), mainly in invertebrates. This methodological problem led to the proposal of alternative genes, reviewed in the following paragraphs. Today, we can say that there are no severe limitations to amplifying the COI gene in almost any freshwater specimen of any group if the protocols proposed by Elías-Gutiérrez et al. (2018) and the zooplankton (Zplk) primers developed by Prosser et al. (2013), or other more specific primers, are applied correctly. This result is reflected in the 871 diaptomids, 1381 cyclopoids, 2077 anomopods, and 211 ctenopods, among other freshwater zooplankters (in a broad sense), such as mites or ostracods, already barcoded from Mexico (see the Taxonomy Browser available on BOLD: boldsystems.org). Currently, for the zooplankton studies that produced these results, we are working on full descriptions of the unknown species highlighted by DNA barcoding. Several researchers consider the construction of these public databases an example to follow (Makino et al., 2017).
Moreover, three good recent reviews show some of the significant trends in eDNA and metabarcoding studies (Pawlowski et al., 2022; Schenekar, 2022; Yao et al., 2022). They focused only on benthos, macroinvertebrates, and fish, and we will discuss them later.
This review aims to evaluate the DNA barcoding of aquatic life and current trends in metabarcoding with remarks on some limitations we have seen in developing, implementing, and applying these methods.
We also want to remark on its relevance in megadiverse countries, where the funds for science are limited.

Methods
We consulted the Web of Science (https://www.webofscience.com) on different dates in September 2022 using the search strings "eDNA" AND "metabarcoding" AND "freshwater." These combinations were used to construct Figure 1.
Each hit was analyzed, and the most relevant are cited in the following paragraphs. Our criteria are summarized in the following sections of this review.
However, we do not claim to provide an exhaustive assessment of the available literature.
For a better understanding, the review is divided into four sections and a conclusion, covering the main topics that correspond to the objectives of this work.

Metabarcoding, biomonitoring, and eDNA
After DNA barcoding, metabarcoding of environmental DNA (eDNA) has been one of the most common applications developed. The word metabarcoding was first proposed by Pompanon et al. (2011). It refers to using DNA to identify many taxa within a sample, revealing its species composition. This term can be associated with biomonitoring, which is understood as measuring the diversity or presence of live organisms, with the primary goal of detecting changes or differences in any ecosystem (Yu et al., 2012). These changes can be natural in origin, such as seasonal or long-term changes in the environment, or they can result from a perturbation or stress, such as pollution or the presence of exotic species; biomonitoring can also be used to compare two localities. Ogram et al. (1987) proposed the term environmental DNA for the first time when working with microbial DNA from sediments collected near Pensacola, Florida, and Knoxville, Tennessee. Its first uses were in microbiology studies. More recently, the term was revived by Dejean et al. (2011) and Taberlet et al. (2012). It refers to DNA obtained from environmental samples, such as water or sediments. However, it is not restricted to aquatic ecosystems because it can be obtained from the air (eDNAir) (Clare et al., 2021), soil, or any other substrate where the flora or fauna can leave traces of their DNA (Kyle et al., 2022). It is worth mentioning that in May 2019, a new journal devoted to this field of research was launched: Environmental DNA (ISSN: 2637-4943). It is not yet indexed in Clarivate.
These terms can be combined in eDNA metabarcoding, a recent proposal for biomonitoring any epicontinental, marine, or terrestrial ecosystem. For aquatic environments, one of the first uses of this term was by Thomsen et al. (2012), who detected the diversity of the marine fish fauna in the Sound of Elsinore, Denmark, using a small fragment (<100 bp) of the cytb gene. The authors used cytb because, at that time, it had the best coverage of the local fish fauna.
Today, the most sequenced gene for all aquatic life is the first half of the mitochondrial cytochrome c oxidase I gene, totaling 14,525,551 animal specimens (Barcode of Life Data System, BOLD). Due to difficulties amplifying it in aquatic life, mainly crustaceans, some authors proposed other markers as DNA barcodes, such as the 28S (Hirai et al., 2013). However, their use lowers the accuracy of species identifications compared with COI (Elías- . COI itself is not perfect, and some young aquatic species are not discriminated by it, as occurs with the Characidae fish from Mexico (Valdez-Moreno et al., 2009).
A simple comparison of the development of libraries can be made in GenBank: the search words "cytb Actinopterygii" return 154,517 hits, whereas "COI Actinopterygii" returns 200,117 hits, and the BOLD database provides 293,659 public records, with a total (including those not yet public) of 399,462 records. For predominantly freshwater animals, such as the Anomopoda, cytb returns 75 hits vs. 4611 for COI in GenBank.
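The comparison above can be reduced to a few ratios. The following minimal Python sketch uses only the hit counts quoted in this paragraph. It assumes that the 4611 Anomopoda hits refer to COI and that the BOLD figures refer to Actinopterygii; the function names are illustrative.

```python
# Hit counts quoted in the text (GenBank searches and BOLD records).
# Assumption: the 4611 Anomopoda hits correspond to COI.
HITS = {
    "Actinopterygii": {"cytb": 154_517, "COI": 200_117},
    "Anomopoda": {"cytb": 75, "COI": 4_611},
}
# Assumption: these BOLD figures refer to Actinopterygii records.
BOLD_PUBLIC = 293_659
BOLD_TOTAL = 399_462


def coi_to_cytb_ratio(taxon):
    """Return the number of COI hits per cytb hit for a taxon."""
    counts = HITS[taxon]
    return counts["COI"] / counts["cytb"]


def public_fraction(public, total):
    """Return the share of records that are publicly released."""
    return public / total


if __name__ == "__main__":
    for taxon in HITS:
        print(f"{taxon}: {coi_to_cytb_ratio(taxon):.2f} COI hits per cytb hit")
    print(f"Public share in BOLD: {public_fraction(BOLD_PUBLIC, BOLD_TOTAL):.1%}")
```

Under these assumptions, the imbalance between the two markers is modest for ray-finned fishes but very large for the Anomopoda.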
Nevertheless, in the case of some groups such as fish, 12S outperforms COI for eDNA. This author concluded that it was a question of primers. There are some recommendations to work more with 12S (Weigand et al., 2019) because COI covers 87.9% of the freshwater fish fauna with at least one sequence, while 12S covers only 36.4%. Another recent effort for Neotropical fish included sequencing 12S for 67 species from Brazil (Milan et al., 2020). Moreover, Shogren et al. (2018) found that longer fragments of DNA degrade more rapidly than shorter ones in the environment. Some of these problems will be overcome once more standardized protocols arrive.
A problem with using ribosomal mtDNA genes such as 12S is the failure to discriminate pseudogenes (known as NUMTs). This issue has been little studied, but in humans, the recovery of undiagnosable NUMTs has been demonstrated (Olson and Yoder, 2002). There is no study comparing the performance of COI vs. 12S on a broad scale.
Although some libraries are being developed, they host material, in this case, fish, from regions of limited biodiversity. By comparison, in the Neotropics, for example in the middle Amazon Basin near Leticia (Colombia), Galvis et al. (2006) registered 344 fish species in just 40 km².
An additional advantage of using COI as the primary marker for metabarcoding is that a small fragment of up to 109 bp provides a reliable identification in most species (Hajibabaei et al., 2006). However, the accuracy will depend on which part of the 650 bp region is amplified. With these ideas, many proposals arose to obtain faster sequencing results, from Sanger sequencing to new developments, such as the latest generation of MinION cells, involving thousands of specimens and providing up to 658 bp (Srivathsan et al., 2021).
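To illustrate why the position of a short fragment matters, the following sketch counts mismatches between two aligned barcodes in sliding windows of 109 bp. The sequences are synthetic, and the function name, the step size, and the assumption of a gap-free alignment are illustrative choices, not part of any published protocol.

```python
def window_mismatches(seq_a, seq_b, width=109, step=50):
    """Count mismatches between two aligned sequences in sliding windows.

    Returns a list of (start_position, mismatch_count) tuples.
    Assumes both sequences are aligned, gap-free, and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - width + 1, step):
        end = start + width
        diffs = sum(x != y for x, y in zip(seq_a[start:end], seq_b[start:end]))
        results.append((start, diffs))
    return results


if __name__ == "__main__":
    # Synthetic 650 bp "barcodes" that differ only in the last 50 positions.
    barcode_a = "A" * 650
    barcode_b = "A" * 600 + "C" * 50
    for start, diffs in window_mismatches(barcode_a, barcode_b):
        print(f"window starting at {start}: {diffs} mismatches")
```

In this toy case, only the last window overlaps the variable positions, so a mini-barcode placed anywhere else would not separate the two sequences.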

Taxonomic impediment
Based on the previous paragraphs, we can say that, currently, metabarcoding-based biomonitoring should be centered mainly on the COI gene. An alternative marker will sometimes be needed, yet there is no consensus on a second universal marker. Moreover, although it is now technologically easy to obtain thousands of sequences, the taxonomic impediment remains the major problem, and it is more marked in invertebrates (Coleman, 2015). In other words, technological developments are surpassing our ability to identify species.
There have been many proposals to "speed" up species discovery to overcome this problem. Among them, Sharkey et al. (2021) and Meierotto et al. (2019) proposed some minimalist approaches, although they are not exempt from controversy (Zamani et al., 2022). These discussions have focused on insects. In the case of aquatic life, it is not possible to use these "modern" minimalist proxies because many species are cryptic (García-Morales and Elías-Gutiérrez, 2013; Elías-Gutiérrez et al., 2019). Their description requires a more integrative approach, as proposed by Andrade-Sossa et al. (2020) or García-Morales et al. (2021).
Another way to overcome the taxonomic impediment has been to develop different mathematical algorithms to delimit molecular operational taxonomic units (MOTUs) that could correspond to species. There are many ways to calculate these MOTUs; one of the most used is the Barcode Index Number (BIN) system, proposed by Ratnasingham and Hebert (2013). However, these clusters always require additional evidence to be supported, and they can change as this knowledge grows. Others proposed taxonomy-free indexes (Apotheloz-Perret-Gentil et al., 2017). However, little congruence has been observed when these methods are compared with morphology-based methods at the species level in tropical environments (Kutty et al., 2022).
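As a rough illustration of how sequences can be grouped into MOTUs, the sketch below applies single-linkage clustering with a fixed distance threshold to toy aligned sequences. It is not the algorithm behind BINs; the 3% threshold, the sequence names, and the function names are assumptions chosen only for the example.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_motus(seqs, threshold=0.03):
    """Single-linkage clustering: link sequences whose p-distance is <= threshold.

    `seqs` maps a specimen name to its aligned sequence.
    The threshold is an illustrative value, not a recommended cutoff.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


if __name__ == "__main__":
    # Toy 40 bp sequences; real barcodes are about 650 bp.
    toy = {
        "spec1": "ACGT" * 10,
        "spec2": "ACGT" * 9 + "ACGA",  # one difference from spec1
        "spec3": "TTGA" * 10,          # very different from both
    }
    print(cluster_motus(toy))
```

Changing the threshold changes the number of clusters, which is one reason why such units need additional evidence before they are treated as species.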
In our group's studies on zooplankton, we faced this problem with new non-conventional collection methods, such as using light traps (Montes-Ortiz and Elías-Gutiérrez, 2018). The number of zooplanktonic species recorded increased dramatically, including many non-traditional zooplankters, such as Acari, chironomids, chaoborids, or ostracods. As a result, we are facing a fascinating new world of species that we consider "zooplankton in a broad sense." All these animals interact and have a role within this community, as demonstrated by a mite preying on Bosmina tubicen, a strict  (Fisher et al., 2017). Our proposal is the construction of "rapid baselines," recovering specimens, when possible, for later description (Montes-Ortiz et al., 2022). If the DNA extraction destroys the whole specimen, we deposit parallel vouchers in a biorepository. All material should be uploaded to a public database like BOLD. Later, these baselines will allow biomonitoring, as proposed by Valdez-Moreno et al. (2021). They compared their eDNA data from tropical oligotrophic Lake Bacalar against a dataset of 3534 specimens representing 519 species of fish from Mexico. However, some doubtful records (false positives) appeared, which we will discuss later.
Accordingly, we should know the species dwelling in each freshwater system, allowing eDNA metabarcoding and comparing with the baseline for biomonitoring.
Finally, the only answer to speed up the process of species description in aquatic environments is to train more specialists to understand aquatic biodiversity, mostly devoted to invertebrates, and convince society about the importance of this job.
Nevertheless, Figure 1 shows that interest in metabarcoding is increasing much more rapidly than that in barcoding studies. We can say that most barcoding studies are the basis for working with metabarcoding, which will be discussed in the next section.

False positives and/or false negatives
If the biodiversity of a freshwater system is unknown, we cannot determine whether the sequences obtained using eDNA techniques are false positives or whether false negatives exist. False negatives arise from species that are present in the ecosystem but are not detected by these methods. A simple way to gauge this lack of knowledge of biodiversity is a glance at the BOLD home page, which shows 807,000 BINs but only 244,000 animal species (in addition to 72,000 plants and 24,000 fungi). Far fewer than half of the species have a scientific name.
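The proportion mentioned above follows directly from the quoted figures. The short sketch below computes it, treating each BIN as a putative species, which is a simplifying assumption because BINs and species do not always correspond one to one.

```python
# Figures quoted in the text from the BOLD home page.
BINS = 807_000
NAMED_SPECIES = {"animals": 244_000, "plants": 72_000, "fungi": 24_000}


def named_share(named_count, bin_count):
    """Return named species as a fraction of BINs (BINs treated as putative species)."""
    return named_count / bin_count


if __name__ == "__main__":
    animals_only = named_share(NAMED_SPECIES["animals"], BINS)
    all_groups = named_share(sum(NAMED_SPECIES.values()), BINS)
    print(f"Named animal species per BIN: {animals_only:.1%}")
    print(f"Named species (all groups) per BIN: {all_groups:.1%}")
```

Both ratios stay below one half, whether only animals or all three groups are counted.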
DNA of false positives can sometimes be physically present in the aquatic environment due to different factors. For example, Valdez-Moreno et al. (2019) found a marine fish, Lachnolaimus maximus, near Lake Bacalar, far away from its typical habitat, the Mesoamerican Reef. A field survey explained its presence: remains of this fish were thrown from restaurants into the water. In many cases, the false positives are not as evident as the presence of a strict marine species in a freshwater ecosystem.
More problematic is the finding of false negatives because the species involved can be rare or occasional in the surveyed environments. This case requires a significant field effort, replicate samples, and larger water volumes. It is easy to mention these points. However, the implementation can be challenging. For example, depending on suspended sediments, filters can clog rapidly. The choice of primers can also be challenging (Polanco-Fernandez et al., 2021). In particular, it is essential to consider primer bias, also called amplification bias, which is mostly related to universal primers in community studies. The result is a reduced ability to make quantitative inferences, such as counting taxa (Bruce et al., 2021).
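When a vouchered baseline exists, a first screening for both kinds of error can be sketched with simple set operations. In the example below, the baseline species are hypothetical placeholders, Lachnolaimus maximus is the marine fish mentioned above, and the function name is illustrative. The output only flags candidates that still need field verification.

```python
def compare_with_baseline(detected, baseline):
    """Compare eDNA detections with a baseline species list.

    Returns three sets: detections confirmed by the baseline, detections absent
    from the baseline (candidate false positives or new records), and baseline
    species that were not detected (candidate false negatives).
    """
    detected = set(detected)
    baseline = set(baseline)
    return {
        "confirmed": detected & baseline,
        "candidate_false_positives": detected - baseline,
        "candidate_false_negatives": baseline - detected,
    }


if __name__ == "__main__":
    # Hypothetical baseline; the placeholder names do not refer to real records.
    baseline = {"Species A", "Species B", "Species C"}
    detected = {"Species A", "Species B", "Lachnolaimus maximus"}
    for label, names in compare_with_baseline(detected, baseline).items():
        print(f"{label}: {sorted(names)}")
```

Without the baseline, neither of the last two sets can be computed, which is the core of the argument in this section.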

Current trends
Regardless of their focus, eDNA studies in aquatic environments for biomonitoring follow two main routes: studies based on analyses of the whole community, involving so-called metabarcoding, and studies that search for a particular species, which could be endangered, an introduced exotic, or commercially valuable. The latter studies can be based on quantitative PCR or digital droplet PCR. Among the applications, these methods can be used to follow an invasion by exotic species (Takahara et al., 2013) or detect some aspects related to the biology of a species, such as the spawning season (Bylemans et al., 2017). In these cases, any specific marker can be used instead of the DNA barcodes.
Although most studies focus on method development, two recent reviews are devoted to analyzing all published information about eDNA in aquatic environments.
Schenekar's (2022) assessment of 381 eDNA-focused studies in freshwaters was limited to macro-organisms. It showed an increase in biomonitoring (64.8% of the total) and a decrease in purely ecological (19.9%) works. The growth of the studies was exponential, and most of them (88.5%) were conducted in the so-called Global North (North America, Europe, and Asia). However, metabarcoding studies made up only 36.5% of the total; most studies were based on qPCR (55.1%). In these works, the authors rely only upon MOTUs and mainly target fishes, which can be identified in this way if their marker is already known.
Fish are the aquatic group with the most DNA barcoding studies. Yao et al. (2022) analyzed all publications involving fish and eDNA. A total of 416 studies were found (from marine to freshwater), ranging from biomonitoring to ecological interactions.
A novel development has been the analysis of lake sediments, where the colonization patterns of some species can be followed through time (Olajos et al., 2018). However, false positives/negatives are still an issue. Nevertheless, biomonitoring change through sediments holds promise for paleoecology (Capo et al., 2021). A recent review of methods, protocols, and recommendations for standardization was made by Pawlowski et al. (2022).
There are only a few works on other critical freshwater communities, such as the zooplankton, with no more than 10 hits on the Web of Science (using search strings "eDNA" AND "freshwater" AND "zooplankton"). The first study published was an analysis of spatial and temporal dynamics with eDNA, based on the 18S rRNA gene, in Harsha Lake (Ohio, USA) (Banerji et al., 2018). Although the authors found 1,314 unique MOTUs, it is impossible to elucidate the presence of false positives/negatives due to the lack of a baseline. The same situation was faced by Qiu et al. (2022) in Poyang Lake (China). Xie et al. (2021), working in Daqing River Basin (China), named 15 zooplankton species using the BLAST algorithm in GenBank. Several names were misidentifications. Yang and Zhang (2020) used the zooplankton to assess Thai Lake and its surroundings, and MOTUs were assigned using GenBank and its database (Yang et al., 2017). Although the authors used a Bayesian tool to assign the taxonomic groups found (Munch et al., 2008), the taxonomic impediment is present (see Figure 2 in Yang et al., 2017).
As a workflow, we summarize our proposal and previous analyses about metabarcoding and DNA barcoding in Figure 2. As seen, we consider it crucial to have a baseline in the case of studies based on a community such as zooplankton, nekton, or benthos. It is also essential to consider the type of freshwater system to be studied (Bruce et al., 2021).

Frontiers in Environmental Science
frontiersin.org

The situation in megadiverse countries
The problem in megadiverse countries is not only the complexity of developing methods and baselines for any group of aquatic life under different environmental conditions and more complex biotic interactions. The development of science in general is also compromised by political factors and budget cuts. For example, the two leading countries in Latin America, which also rank among the most biodiverse in the world, Brazil (first place) and Mexico (fifth place), recently suffered severe cuts to science funding (Elías-Gutiérrez et al., 2017; Lazcano, 2019; Thomaz et al., 2020; Kowaltowski, 2021; Quiroga-Garza et al., 2022), compromising not only research but also its communication in open-access journals (Smith et al., 2022). Our research on this topic (Valdez-Moreno et al., 2019) stopped because of the lack of funds and the government's prohibition on purchasing any equipment with resources obtained from foundations other than the governmental Mexican Council of Science and Technology (CONACYT). These problems and the loss of biodiversity should be a priority. Instead, unsustainable policies (Overbeck et al., 2018; Ortega and Jaber, 2022) and a lack of governmental interest have caused a significant tragedy and an irreparable loss in aquatic and terrestrial environments (Pelicice et al., 2017; Rico-Sanchez et al., 2020), with global consequences (Overbeck et al., 2018; Thomaz et al., 2020). Nevertheless, some of these countries have a solid (but small) scientific community (Aguado-Lopez and Becerril-García, 2021) with the ability to work on these topics. We urge international pressure to overcome this situation, because these problems respect no physical frontiers or barriers and affect the entire world.
An example can be the formation of the Atlantic Sargassum belt due to the discharges of nutrients in recent years of the Amazon River, among other factors (Wang et al., 2019), which seriously affects all countries with an Atlantic coast, such as Mexico (Rodríguez-Martínez et al., 2022).

Conclusion
We can conclude that eDNA metabarcoding is a promising technique for biomonitoring all kinds of epicontinental waters.

FIGURE 2
Workflow for metabarcoding and eDNA studies on freshwater ecosystems. We consider it essential to construct a baseline in the case of community biomonitoring.

However, the development of baselines does not keep pace with metabarcoding techniques, and there is still no standardization of methods for the latter (Mauvisseau et al., 2019). This lack of standardization arises mainly because the primers, filtering methods, and knowledge of the persistence of DNA in the environment, among other aspects, are still in development (Schenekar, 2022). As has been seen, the persistence of eDNA varies significantly among environments, depending on factors such as water temperature and salinity. Because of these problems, we propose mock experiments. However, the effects of many key freshwater variables, such as ultraviolet light, pH, dissolved oxygen, and biotic effects, remain unknown (Lamb et al., 2022). We believe all these methods need to be developed for each type of system, and they should be compared within it, not among systems (see Figure 2). At least in the current state of knowledge and technical development, we consider this approach the most feasible.

Author contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.