A primer on pollen assignment by nanopore-based DNA sequencing

The ability to identify plants based on the taxonomic information contained in their pollen grains has many applications across various biological disciplines. In the past, and depending on the application or research question, pollen origin was analyzed by microscopy, usually preceded by chemical treatment. This identification procedure is time-consuming and requires expert knowledge of morphological features. Additionally, these microscopically recognizable features usually offer low resolution at the species level. For a few decades now, DNA has been used to identify pollen taxa, as sequencing technologies have become easier to handle and more affordable. We discuss advantages and challenges of pollen DNA analyses compared to traditional methods. With readers new to this field in mind, we present a hands-on primer for genetic pollen analysis by nanopore sequencing. As our lab mainly works with pollen collected within agroecological research projects, we focus on pollen collected by pollinating insects. We briefly consider sample collection, storage and processing in the laboratory as well as bioinformatic aspects. Currently, pollen metabarcoding is mostly conducted with next-generation sequencing methods that generate short sequence reads (<1 kb). Increasingly, however, pollen DNA analysis is carried out using the long-read (several kb), low-budget and mobile MinION nanopore sequencing platform by Oxford Nanopore Technologies. We therefore focus on aspects of palynology with the MinION DNA sequencing device.


Potential of pollen analysis
Species declines are becoming increasingly serious. Agricultural intensification is considered a major driver of biodiversity decline that also affects functionally relevant species, including pollinators (Díaz et al., 2019; Krehenwinkel et al., 2019; Raven and Wagner, 2021). Land use intensification additionally causes biotic homogenization of plant and animal communities in agricultural landscapes (Parreño et al., 2022). Besides, deforestation, industrialization and urbanization contribute to the elimination of nesting places and habitats for many species, leading to a loss of overall biodiversity (Sánchez-Bayo and Wyckhuys, 2019). To counteract this development, we need as much information as possible about how these pressures affect existing communities and ecosystems. Biomonitoring methods aim to identify species and conditions in order to measure changes in ecosystems (Hajibabaei et al., 2011).
Biomonitoring methods are especially in demand for the analysis of plant-pollinator networks, not only in natural and agricultural landscapes, including forests (Carneiro de Melo Moura et al., 2022), but also in urban ecosystems (Udy et al., 2020). Insect pollinators in particular are indispensable due to their pollination services (Porto et al., 2020; Baylis et al., 2021). Detailed knowledge of existing plant-pollinator networks and the foraging behavior of pollinators in different landscapes can help to maintain future pollination services and support management strategies (Leidenfrost et al., 2020; Bell et al., 2022; Namin et al., 2022). Both plant-pollinator networks and foraging behavior can be reconstructed by analyzing pollen grains collected by pollinators. This information may be used to guide, for example, urban planting projects or ecological landscaping (Potter et al., 2019). Identifying the plants used for honey production can also provide valuable information to beekeepers and consumers; indeed, the marketing and validation of specialty honey, such as Manuka honey, requires information about the floral source (Galimberti et al., 2014). Furthermore, identification of the pollen source supports the quality control of other bee products such as royal jelly or propolis, whose composition is also influenced by pollen diversity (Danner et al., 2017; Kegode et al., 2022). Finally, since pollen contains carbohydrates, lipids, vitamins, minerals and all the basic amino acids, its composition is of great importance for pollinator health (Di Pasquale et al., 2013; Frias et al., 2016).
Palynology is highly interdisciplinary and has a broad reach (Figure 1). Besides the agricultural sciences, it also plays a major role in, e.g., aerobiology, a discipline that investigates the passive transport of bioaerosols through the air. Here, pollen is mostly studied in the context of allergen monitoring (Fragola et al., 2022; Khan et al., 2022; Polling et al., 2022). In forensic palynology, pollen provides information about the potential timing and location of a crime (Alotaibi et al., 2020), because it easily attaches to many surfaces such as skin and clothes, resists chemical reactions and is extremely durable. Pollen is also used in paleoecological and paleoclimatological research: fossil pollen from sediment or ice cores has enabled climate reconstructions for the Quaternary period (which began 2.6 million years ago) and older periods (Chevalier et al., 2020).

Advantages and challenges of genetic pollen studies
Microscopic identification of pollen grains requires expert knowledge and plenty of time. In contrast, genetic processing of pollen does not require years of experience in palynology and can be carried out by virtually any experienced molecular biologist (Bell et al., 2022). Furthermore, the taxonomic resolution based on morphological traits is limited, as species cannot be determined for all plant families. For example, pollen of the Rosaceae, to which many important fruit crops belong, shows a very similar morphology (Lechowicz et al., 2020). This also restricts the success of computer-assisted analysis of micrographs (Polling et al., 2022). With DNA analysis such as DNA metabarcoding, however, pollen can be identified in more detail (Potter et al., 2019; Ruppert et al., 2019). Additionally, not only single pollen grains but also mixed bulk samples can be processed, which makes DNA metabarcoding an important tool for understanding and monitoring ecosystems (Vamosi et al., 2017). Furthermore, more taxa can be detected than in classical observation trials (Bell et al., 2016; Pornon et al., 2017).
The ability to read DNA opened entirely new perspectives on biodiversity, since genetic information paved the way for rapid taxon identification, even of previously unknown taxa (Hebert and Gregory, 2005). In addition, high-throughput methods made it possible to process larger data volumes than ever before and thereby enabled large-scale metagenomic surveys (Fišer Pečnikar and Buzan, 2014; Reuter et al., 2015; Thomsen and Willerslev, 2015). A single pollen sample, e.g., from a pollinating insect, allows the efficient analysis of multiple interactions that would otherwise have required several years of observation. For example, both plant-pollinator interactions and the microbiome composition can be detected from a single intestinal DNA sample. Thus, molecular palynology enables high-throughput biodiversity monitoring.
Of course, there are many possible sources of error during genetic pollen analysis. We will come to these in the "How to" section. Moreover, unlike standard laboratory organisms or sample material such as bacteria or blood, there are no well-established methods for DNA isolation from pollen of different plant taxa (Bell et al., 2016). Furthermore, read accuracy may differ depending on the sequencing method used (van Dijk et al., 2018). Currently, once all steps are added up (pollen sample collection, DNA isolation, DNA sequencing and sequence data analysis), the DNA-based approach may initially take even more time than microscopic pollen examination.

Figure 1. The analysis of pollen grains is applied in many research fields.
Frontiers in Ecology and Evolution 03 frontiersin.org

How to: Pollen identification by DNA sequencing
Numerous workflows are available for molecular palynology. The most common is DNA metabarcoding, in which only a short section of the genome is sequenced rather than the entire genome (Taberlet et al., 2018). Pollen metabarcoding consists of five steps: pollen collection, DNA isolation, barcode amplification, sequencing and downstream bioinformatic data analysis (Figure 2). Different methods may be applied in each step, depending on the source of the pollen, the available laboratory equipment or the data to be generated. To achieve maximum success and meaningful results, each step must yield a high-quality intermediate product, e.g., in terms of DNA purity, amplicon purity, read length, quality score or the completeness of reference databases. It is therefore important to work in a clean environment and to disinfect all equipment.

Pollen sampling and storage
Depending on the source, pollen tells different stories. To create a plant-specific pollen image database, pollen usually has to be collected directly from its origin, the flower (Shivanna and Rangaswamy, 1992). Collection directly from the flower is also necessary when the success or efficiency of DNA extraction methods, the level of polyploidy, or the presence of plant organelles is of particular interest. To establish plant barcode databases, however, DNA can be obtained from any DNA-containing part of the plant. To infer plant-pollinator networks by molecular palynology, by contrast, pollen is collected from pollinators or their nests.

Sampling pollen from flowers
For many plants, pollen can be sampled non-destructively from the flower with a sterilized spatula. In some cases, the plant must be shaken or lightly rubbed over a 0.5 mm sieve. However, not every plant is suitable for this, as not all plant species release much free pollen. In such cases, the anthers must be collected from the flowers and dried. After drying, they release pollen from their interior, and the sieve method can again be used. If the flowers are subjected to vibration (e.g., with an electric toothbrush), the released pollen can be collected directly in a container (Knäbe et al., 2014).

Sampling pollen from pollinators
Pollen collected by pollinators might either be loosely attached to their body or mixed with plant nectar or insect saliva. The latter is usually deposited in the nest. Thus, the pollen might either be sampled directly from the insect or its nest. Pollen sampling from individuals can be used to study the foraging activities of bees.
Honey bees and bumble bees transport the collected pollen grains from the flower to their hive in the form of pollen loads and store them as an energy and protein resource to feed their colony. For honey bee pollen, so-called pollen traps can be installed in front of the beehive. The honey bees have to pass through a perforated grid, where they lose their pollen loads. These fall into a drawer and can be collected (Bänsch et al., 2020). Pollen traps are also available for bumble bee nests (Judd et al., 2020).
In contrast, wild solitary bees collect pollen on their abdomen and store it as a clump for their offspring in their nest. This pollen must be sampled with a sterilized spatula. In some studies, insect pollinators are caught and the pollen is removed from them with tweezers, leaving the individual alive (Biella et al., 2019; Leidenfrost et al., 2020; Rivers-Moore et al., 2020).

Extracting pollen from honey
Besides biomonitoring, tracing the origin and composition of honey is also of interest (Wirta et al., 2021; Liu et al., 2022). However, honey usually contains much less than 1% (w/w) pollen. A large amount of source material, about 3-10 g, is therefore needed to accumulate enough pollen for DNA extraction. The honey is mixed with 30 mL of sterile water and the suspension is incubated at 65°C for 30 min. The dissolved honey sample is then centrifuged (30 min, 15,000 rpm) to pellet the pollen. The resulting pellet can be used for DNA isolation (de Vere et al., 2017).
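A quick calculation shows why several grams of honey are needed. The following is a minimal sketch under stated assumptions. The pollen fraction is a user-supplied w/w value: 1% serves as a generous ceiling and 0.1% as a hypothetical lower value. The 0.015 g reference is the lower end of the pellet mass suggested for DNA isolation from pollen loads in the storage section below, and it is used here only for comparison.

```python
def pollen_mass_g(honey_g: float, pollen_fraction_ww: float) -> float:
    """Pollen mass (g) in a honey sample for an assumed w/w pollen fraction."""
    if not 0.0 <= pollen_fraction_ww <= 1.0:
        raise ValueError("pollen fraction must be between 0 and 1")
    return honey_g * pollen_fraction_ww


# Reference point only: lower end of the 0.015-0.025 g pellet mass given for pollen loads.
REFERENCE_PELLET_G = 0.015

# Assumed pollen fractions: 1% (w/w) as an upper bound, 0.1% as a hypothetical lower value.
for fraction in (0.01, 0.001):
    for honey_g in (3.0, 10.0):
        mass = pollen_mass_g(honey_g, fraction)
        verdict = "reaches" if mass >= REFERENCE_PELLET_G else "stays below"
        print(f"{fraction:.1%} pollen, {honey_g:g} g honey: {mass:.3f} g, "
              f"{verdict} the reference mass")
```

Even at the 1% ceiling, 3 g of honey contains only about 0.03 g of pollen, so the pellet obtained after centrifugation is small.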

Long-term storage of pollen
When the pollen pellet is resuspended 1:4 (pollen:ethanol) in 70% (v/v) undenatured ethanol, an aliquot can be taken as a randomized sample (Leidenfrost et al., 2020). At the same time, the pollen grains are washed free of nectar and contaminants.

Figure 2. DNA metabarcoding consists of five steps. These steps vary in their execution depending on the sample material and the ecological question. First, the starting material must be prepared in different ways to obtain an appropriate concentration of DNA. Depending on the downstream application, the barcode is amplified and the DNA is read with a selected sequencing method so that the data can later be analyzed accordingly.

Hands-on …
No matter how or from which source pollen grains are collected for biological analysis, proper storage is important to prevent DNA degradation.
Consequently, freshly sampled pollen should be stored either refrigerated at 4°C or in 70% (v/v) ethanol. For biodiversity analyses, it is in any case advisable to create a homogeneous mixture with ethanol, from which representative random aliquots can be drawn. Undenatured ethanol should be used for this purpose, as some additives in denatured ethanol can interfere with downstream applications.
Immediately after resuspension in ethanol, it is advisable to take aliquots of 100-400 μL in order to create identical replicates. The pollen:ethanol suspension must be mixed thoroughly to prevent the pipette tip from clogging. After a subsequent centrifugation step (10 min, 14,000×g), the supernatant is discarded, leaving a washed pollen pellet. After drying in a clean bench for 24-72 h, the pellet can be used for DNA isolation. It should have a mass of about 0.015-0.025 g (Bänsch et al., 2020).

Pollen disruption and DNA isolation
Pollen samples might originate from plants, the air, bee foragers or bee nests. Depending on its source, a pollen sample therefore consists either of only a few grains or of a bulk sample representing one or more plant species. Pollen collected from pollinators usually constitutes a mixed sample, as pollinators often visit different flowers (Bell et al., 2017a).
As pollen of different species varies in morphological structure and size, isolating DNA from pollen grains is challenging (Bell et al., 2016; Halbritter et al., 2018). The pollen wall of seed plants, called the sporoderm, is composed of two layers: the inner intine and the outer exine. The exine mainly consists of the polymer sporopollenin, which is very robust because it resists acetolysis and decay. These morphological traits preserve the pollen nutrients (Halbritter et al., 2018). Thus, an effective cell disruption method is required to release the DNA (Yang et al., 2019).

Pollen disruption
Bead beating is a practical and time-efficient method for pollen disruption (Leontidou et al., 2021; James et al., 2022; Polling et al., 2022). Ball mills can be used when available. However, a standard vortex device, present in virtually every biological laboratory, is usually sufficient (Kamo et al., 2018). Ceramic beads are hard enough and have a rough surface that helps to break the pollen wall. Because pollen grains differ in their morphological traits, it is recommended to use two bead sizes simultaneously rather than one. Diameters of 2.8 mm and 1.4 mm generally yield good results (Bänsch et al., 2020; Leidenfrost et al., 2020). DNA extraction can then be performed on the disrupted pollen suspension.

Hands-on …
Optimal results in terms of purity are achieved by incubating the pollen sample with ~400 μL lysis buffer (buffer AP1 from the DNeasy Plant Mini Kit, Qiagen), 4 μL proteinase K (20 mg/mL) and 1 μL RNase A for 1 h at 65°C before pollen disruption. After this treatment, beads can be added directly to the tubes and the pollen disrupted by vortexing for 3 min or with a tissue lyser. The resulting suspension can then be processed according to the kit's instructions.
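When many samples are processed at once, the per-sample volumes above can be scaled into a master mix. The following Python sketch assumes a 10% pipetting overage. This is an illustrative choice of ours and not part of the protocol. The text does not specify the concentration of the RNase A stock, so only volumes are handled.

```python
# Per-sample volumes (uL) from the pre-incubation step described above.
PER_SAMPLE_UL = {
    "buffer AP1": 400.0,
    "proteinase K (20 mg/mL)": 4.0,
    "RNase A": 1.0,
}


def master_mix_ul(n_samples: int, overage: float = 0.10) -> dict[str, float]:
    """Scale per-sample volumes to n samples plus an assumed fractional overage."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    factor = n_samples * (1.0 + overage)
    return {reagent: round(volume * factor, 1) for reagent, volume in PER_SAMPLE_UL.items()}


for reagent, volume in master_mix_ul(12).items():
    print(f"{reagent}: {volume} uL")
```

For twelve samples, this gives 5,280 μL buffer AP1, 52.8 μL proteinase K and 13.2 μL RNase A. The overage argument can be set to 0 to obtain the exact sum.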
During DNA isolation, pollen may pellet poorly and form an upper phase during the first centrifugation step, which is intended to pellet impurities and cell debris.
In this case, care must be taken not to aspirate this pollen layer when removing the supernatant, as it could later clog the DNA extraction column.
DNA extraction results can vary depending on the storage, disruption and isolation method. For the DNeasy Plant Mini Kit, Qiagen specifies an expected DNA yield of 38-40 ng/μL. With pollen, however, we usually obtain a much lower DNA yield of 3-20 ng/μL. A Qubit fluorometer (Thermo Fisher Scientific Inc.) should be used for accurate DNA quantification.
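To judge whether an extract is sufficient for the 400 ng input required by the ONT rapid kits (see library preparation), the measured concentration has to be combined with the eluate volume. The sketch below assumes an elution volume of 100 μL as a placeholder. Replace it with the volume actually used.

```python
REQUIRED_INPUT_NG = 400.0  # DNA input stated for the ONT rapid (barcoding) kits
ELUTION_VOLUME_UL = 100.0  # assumption: replace with the eluate volume actually used


def total_yield_ng(conc_ng_per_ul: float, volume_ul: float = ELUTION_VOLUME_UL) -> float:
    """Total DNA mass in the eluate."""
    return conc_ng_per_ul * volume_ul


def volume_for_input_ul(conc_ng_per_ul: float, required_ng: float = REQUIRED_INPUT_NG) -> float:
    """Eluate volume that contains the required DNA mass."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return required_ng / conc_ng_per_ul


# Lower and upper end of the pollen DNA concentrations reported above.
for conc in (3.0, 20.0):
    total = total_yield_ng(conc)
    print(f"{conc:g} ng/uL: {total:g} ng in total, "
          f"{volume_for_input_ul(conc):.1f} uL needed for {REQUIRED_INPUT_NG:g} ng")
```

At 3 ng/μL, the entire assumed 100 μL eluate holds only 300 ng, which is less than the 400 ng input. At 20 ng/μL, 20 μL would suffice.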

DNA metabarcoding
DNA barcoding describes the identification of taxa based on standardized barcode sequences (Hebert et al., 2003; Kress et al., 2015). A barcode is a short, standardized DNA region that can easily be amplified by PCR and sequenced, e.g., the mitochondrial cytochrome c oxidase I gene used for animals. In metabarcoding, the same method is applied to a mixed sample that is analyzed by high-throughput sequencing (Taberlet et al., 2012; Lowe et al., 2022). This way, taxonomic identification can be performed without time-consuming observation efforts or morphological expert knowledge (Lamb et al., 2019; Ruppert et al., 2019).

Barcode selection
For the identification of plant taxa in pollen samples, a short, standardized barcode section is usually used rather than the complete genomic DNA. This barcode section has to be (a) short enough to be amplifiable by PCR, (b) variable enough to distinguish species, and (c) flanked by two regions conserved across species that serve as primer binding sites (Taberlet et al., 2018). Table 1 lists frequently selected DNA barcodes with their expected amplicon lengths. In the past, plant pollen was predominantly classified with organelle rDNA, nuclear rDNA, or internal transcribed spacer (ITS) sequences (Danner et al., 2017; Maestri et al., 2019; Suchan et al., 2019). Several plant barcodes have been established for pollen, namely rbcL, matK, psbA-trnH and trnL. The plastid barcodes rbcL and matK are no longer recommended, as plastid DNA is not present in all pollen grains (Galimberti et al., 2014; Bell et al., 2016; Richardson et al., 2019). A very popular plant barcode in metabarcoding studies is the ITS region (Danner et al., 2017; Nürnberger et al., 2019; Vaudo et al., 2020; Leontidou et al., 2021). It comprises ITS1 and ITS2, which are separated by the 5.8S rRNA gene (Figure 3). ITS1 was found to have a higher discriminatory power and species identification success rate than ITS2. Still, ITS2 is more popular (Table 1). Long-read DNA sequencing methods from Oxford Nanopore Technologies and PacBio allow analysis of the complete ITS region. The discriminatory power of barcodes depends not only on the sequence length but also on the availability of plant barcodes in sequence databases (Namin et al., 2022). Thus, it is advisable to analyze several barcodes in parallel (see below). However, even if plant barcode reads from pollen cannot be assigned to taxa, their sequence variability can still be used to infer pollen diversity.

PCR amplification of barcode(s)
Before sequencing, all barcodes are amplified by either standard or multiplex PCR. However, this step may lead to disproportionate, source-dependent amplification, a phenomenon called PCR bias (Liu et al., 2022). For that reason, and to ensure a high taxonomic resolution, it is important to use plant barcodes with a high degree of universality across taxonomic groups (Bell et al., 2016; Kamo et al., 2018). Additionally, it has been observed that the analysis of a single barcode may lead to ambiguous results. A multi-locus approach with more than one barcode usually increases the discriminatory power (Kamo et al., 2018; Ruppert et al., 2019). In principle, if enough sample material is available, plant barcode sequencing can also be performed with raw, unamplified DNA. Several samples can then still be sequenced in parallel, as multiplexing barcodes can be added to individual samples, e.g., by transposase-assisted tagmentation without PCR (Adey et al., 2010).
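The benefit of a multi-locus approach can be illustrated with a deliberately simplified sketch. Each barcode yields a set of candidate taxa, and only taxa supported by every informative locus are retained. The taxon lists below are hypothetical. Real classifiers typically weigh similarity scores rather than intersecting hard candidate sets. Loci without any database hit are skipped, so that gaps in reference databases do not erase the result.

```python
from functools import reduce


def combine_loci(candidates_per_locus: dict[str, set[str]]) -> set[str]:
    """Return taxa supported by every locus that produced at least one candidate."""
    informative = [taxa for taxa in candidates_per_locus.values() if taxa]
    if not informative:
        return set()
    return reduce(set.intersection, informative)


# Hypothetical candidate lists for one sample; not real assignment results.
candidates = {
    "ITS2": {"Prunus avium", "Prunus cerasus", "Prunus domestica"},
    "trnL": {"Prunus avium", "Prunus cerasus", "Malus domestica"},
    "psbA-trnH": {"Prunus avium", "Prunus domestica"},
    "ITS1": set(),  # e.g., no matching reference sequence in the database
}
print(combine_loci(candidates))  # {'Prunus avium'}
```

No single locus resolves the species, but together they narrow the candidates to one taxon.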

Hands-on …
When choosing a plant barcode for pollen metabarcoding, its length should be a decisive criterion. For next-generation sequencing approaches, short barcodes such as ITS2 or trnL are appropriate. With long-read sequencing platforms from Oxford Nanopore Technologies and PacBio, longer barcodes can be analyzed.

Plant barcode sequencing
Metabarcoding studies are usually performed with high-throughput, short-read next-generation sequencing (NGS) platforms. However, due to high costs and the dependence on external service providers (only few labs have their own sequencing device), the cheap, handy and flexible MinION long-read platform from Oxford Nanopore Technologies has become an attractive alternative (Feng et al., 2015; Peel et al., 2019; Srivathsan et al., 2021).

Short-read NGS platforms
Nowadays, next-generation sequencing (NGS) methods are mostly used for pollen metabarcoding (Figure 4). One popular NGS method, Illumina sequencing, largely dominates the market (van Dijk et al., Lennartz et al., 2021; Leontidou et al., 2021; Tommasi et al., 2022). This technique relies on bridge PCR to amplify DNA fragments into clusters, followed by synthesis of a complementary strand (sequencing by synthesis). A drawback of Illumina and other NGS methods is that they produce relatively short reads of one hundred to one thousand base pairs, which may cause gaps or incorrect assemblies (Rang et al., 2018; van Dijk et al., 2018). Additionally, it is debated whether such short reads (<250 base pairs) suffice to distinguish between species (Maestri et al., 2019).

Long-read MinION platform
Currently, for read lengths of more than one thousand base pairs, long-read sequencing platforms from either Oxford Nanopore Technologies (ONT) or Pacific Biosciences (PacBio) are available. They can generate read lengths between ten thousand and two million base pairs (Maestri et al., 2019). Here we focus on the portable MinION sequencing device from ONT (Figure 5). ONT devices allow cost-effective, real-time, single-molecule sequencing, in principle even without any intervening amplification step (Krehenwinkel et al., 2019). Different read lengths can be achieved depending on the flow cell used. The nanopore-based sequencing technology allows rapid analyses of DNA samples anywhere and avoids dependency on distant laboratories. For sequencing, extracted DNA fragments are linked to adapters carrying a motor protein, which controls the passage of a single DNA strand through the nanopore. The nanopore is embedded in a polymer membrane across which a voltage is applied (van Dijk et al., 2018). As the strand passes through, the nucleotides inside the pore partially block it in a sequence-dependent manner. This changes the ion flow through the pore, which is measured amperometrically.

Table 1 notes: […] (Baldwin et al., 1995; Álvarez and Wendel, 2003; Wang et al., 2015), matK (Hilu and Liang, 1997; Kress and Erickson, 2007; Bell et al., 2016), rbcL (Newmaster et al., 2006; Bell et al., 2017b), trnH-psbA (Pang et al., 2012; Bell et al., 2016), trnL (Taberlet et al., 2007; Bell et al., 2016). Queried in the PubMed database with: "((ITS2) OR (internal transcribed spacer 2)) AND (pollen)" or "((ITS1) OR (internal transcribed spacer 1)) AND (pollen)" or "((matK) OR (mat-K) OR (maturase K)) AND (pollen)" or "((rbcL) OR (rbc-L) OR (rubisco)) AND (pollen)" or "((trnH) OR (trn-H) OR (trnH-psbA) OR (psbA-trnH)) AND (pollen)" or "((trnL) OR (trn-L) OR (trnL-trnF)) AND (pollen)".
Instead of the fluorogram obtained with Illumina NGS methods, nanopore sequencing yields a so-called squiggle plot for each DNA molecule, which is then used for base calling (see below). Current MinION technology produces an output of at least five billion bases per run. With the R9.4 flow cell, up to twenty billion bases of sequence data can be produced.
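The stated run output can be translated into an idealized number of reads per sample. The sketch makes three assumptions: all sequenced bases belong to full-length amplicons of a hypothetical 700 bp barcode, the reads are spread evenly over twelve multiplexed samples, and no data are lost. Real runs fall short of this because of pore loss, shorter fragments and unclassified reads.

```python
def reads_per_sample(run_output_bases: float, amplicon_bp: int, n_samples: int) -> float:
    """Idealized reads per sample: total bases / amplicon length / number of samples."""
    if amplicon_bp <= 0 or n_samples <= 0:
        raise ValueError("amplicon length and sample number must be positive")
    return run_output_bases / amplicon_bp / n_samples


AMPLICON_BP = 700  # hypothetical barcode amplicon length
N_SAMPLES = 12     # e.g., one barcoded run with twelve samples

for output in (5e9, 20e9):  # lower and upper run outputs mentioned above
    estimate = reads_per_sample(output, AMPLICON_BP, N_SAMPLES)
    print(f"{output:.0e} bases -> ~{estimate:,.0f} reads per sample")
```

Even a small fraction of these idealized numbers leaves many reads per sample for barcode identification.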

Portability
In the future, the MinION could be used for sequencing in the field or in areas without laboratory infrastructure (Krehenwinkel et al., 2019), not least because it can be powered via USB (van Dijk et al., 2018). Its stand-alone counterpart, the MinION Mk1C, needs no separate computer for sequencing, as the device also performs base calling (Figure 5). As environmental DNA studies have become increasingly popular, miniaturized portable laboratory equipment such as thermocyclers and battery-powered gel electrophoresis devices has become available. ONT offers a customized, portable lab-on-a-chip called VolTRAX for automated library preparation. Thus, ONT devices make DNA metabarcoding studies possible with minimal lab equipment under field conditions (Johnson et al., 2017; Krehenwinkel et al., 2019; Maestri et al., 2019; Raymond-Bouchard et al., 2022) and even in space (Castro-Wallace et al., 2017).

Figure 4. Illumina sequencing creates high-quality sequence reads of up to 251 bp and currently dominates the market. In contrast, both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) provide platforms for the generation of long (>800 bp) sequence reads, using DNA polymerases or protein nanopores, respectively. QDN, quantum dot nanoparticle; ZMW, zero-mode waveguide. (Residual panel labels: "Single-molecule real-time DNA sequencing"; "Life Technologies".)

Error rate
Despite advantages such as long reads and portability, MinION-based nanopore sequencing still shows a comparatively high error rate. The quality scores of typical NGS techniques and PacBio are usually above 30 (99.9% base call accuracy). ONT reads currently show quality scores of around 15-20 (96.8% and 99% accuracy, respectively). When the MinION was first introduced in 2014, however, read accuracy was below 60% (Rang et al., 2018), which is why the technology still has a bad reputation. Together with a possible PCR bias, this low accuracy limited the applicability of nanopore sequencing to metabarcoding of mixed samples (Rang et al., 2018; Maestri et al., 2019). However, the MinION is well suited for metabarcoding (Krehenwinkel et al., 2019; Leidenfrost et al., 2020; Baloğlu et al., 2021) if a specific reference database is applied and the MinION-specific error model (Krishnakumar et al., 2018) is considered during bioinformatic data processing (see below). Furthermore, read quality continues to improve with every new ONT library preparation kit and nanopore design.
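The quality scores quoted here are Phred scores, Q = -10 log10(P), where P is the probability of an incorrect base call. A small Python sketch converts between the two scales and reproduces the accuracies given above.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy for a Phred score: 1 - 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score for a given base-call accuracy (0 < accuracy < 1)."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return -10 * math.log10(1.0 - accuracy)


for q in (15, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.2%} accuracy")
# Q15: 96.84% accuracy
# Q20: 99.00% accuracy
# Q30: 99.90% accuracy
```

By the same formula, the sub-60% accuracy of early MinION reads corresponds to a Phred score of only about 4 (accuracy_to_phred(0.6) is roughly 3.98).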

Library preparation
The main objectives of library preparation are fragmentation of the sample DNA and attachment of the motor protein. With the ONT Rapid Sequencing Kit (SQK-RAD004), both happen in one step; library preparation requires 10 min and 400 ng of DNA. The price per sample is around 575 US$. Multiplexing allows several DNA samples to be sequenced simultaneously on one flow cell. The ONT Rapid Barcoding Kit (SQK-RBK004) attaches multiplexing barcodes to up to twelve individual samples, which reduces the price per sample to 54 US$. This kit also requires 400 ng of genomic DNA as starting material. Accordingly, the sequencing depth per sample is reduced by a factor of twelve. This suffices for plant barcode sequencing from pollen samples (Leidenfrost et al., 2020). Depending on how many samples are processed at the same time (and how experienced the laboratory technician is), the laboratory work for sequencing library preparation takes approximately three to six hours. During library preparation, molarity calculations are needed to proceed with the appropriate amount of DNA; the NEBioCalculator is a convenient free online tool for this (NEBioCalculator, 2021). As mentioned above, a Qubit fluorometer (Thermo Fisher Scientific Inc.) should be used for accurate DNA quantification.
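The molarity calculations mentioned above convert DNA mass into molar amounts. Below is a minimal sketch, assuming an average molar mass of 660 g/mol per base pair of double-stranded DNA. Dedicated tools such as the NEBioCalculator may use a slightly different constant, so results can differ marginally. The 700 bp amplicon and the 200 fmol target are hypothetical example values.

```python
AVG_BP_MOLAR_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_to_fmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass (ng) into an amount (fmol)."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    # mol = g / (g/mol); ng -> g is a factor of 1e-9 and mol -> fmol is 1e15, i.e. net 1e6
    return mass_ng * 1e6 / (length_bp * AVG_BP_MOLAR_MASS)


def fmol_to_ng(amount_fmol: float, length_bp: int) -> float:
    """Inverse conversion: dsDNA amount (fmol) into mass (ng)."""
    return amount_fmol * length_bp * AVG_BP_MOLAR_MASS / 1e6


print(f"400 ng of a 700 bp amplicon = {ng_to_fmol(400, 700):.0f} fmol")
print(f"200 fmol of a 700 bp amplicon = {fmol_to_ng(200, 700):.1f} ng")
```

The same mass corresponds to fewer molecules for longer fragments. This is why protocols specify input amounts in fmol rather than ng when fragment lengths differ.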
Note that ONT offers two sequencing strategies. With the 1D approach, only one strand of the template DNA is sequenced. With the 1D² library preparation chemistry, in contrast, both complementary strands are sequenced and their squiggles are combined into a higher-quality consensus read. This slightly increases read accuracy at the cost of sequencing depth (Cornelis et al., 2019).
The resulting library can then be pipetted into a flow cell to start the sequencing process. Typically, the first one thousand reads are available for downstream data analysis after around 10 min, and a usable amount of data has been produced after just a few hours. Pore activity in the flow cell and other parameters, such as temperature, number of sequenced reads or average quality score, can be monitored in real time during sequencing.

Hands-on …
MinION DNA sequencing still carries the stigma of poor read quality. Combining metabarcoding with nanopore sequencing is therefore often discouraged. However, the technology is improving rapidly, and ONT only recently released a new Q20+ chemistry for read accuracies around 99%. Furthermore, even with the older chemistry, we were able to show that the main pollen resources of bumble bees can be identified by MinION nanopore sequencing to a largely similar extent as with Illumina sequencing (Leidenfrost et al., 2020). For sequencing amplicons, ONT provides a protocol called Amplicons by Ligation (SQK-LSK109) that can be used for metabarcoding (Knot et al., 2020; Seth and Barik, 2021).

Bioinformatics and taxonomic assignment
After the work in the field and in the lab, the final steps of molecular palynology are carried out on the computer (Figure 6). Many up-to-date tools lack a graphical user interface (GUI). Both data handling and program execution are therefore preferably performed in a UNIX-like command line interface. Options include the macOS Terminal, PowerShell with the Windows Subsystem for Linux (WSL) on Windows 10 or higher, or a Linux system. Acquiring the appropriate skills is strongly recommended (Wünschiers, 2013).
ONT sequencing platforms provide all sequence run data as binary-encoded FAST5 files. FAST5 is a proprietary format developed by ONT and derived from the Hierarchical Data Format 5 (HDF5) (The HDF Group, 2010). Most importantly, it encodes the squiggle data, i.e., the changes in ionic current through the nanopore over time as the DNA molecule passes through. During base calling, these data are converted into a sequence of nucleotides.

Hands-on …
Running the MinION does not require powerful computing resources; a modern notebook with a solid-state drive (SSD) is sufficient. ONT provides the MinKNOW software package, which controls the MinION, allows sequencing parameters to be set and transfers the data from the device to the computer. This software is available for MS Windows and macOS. Depending on the available computer hardware, it may be advisable to run base calling after sequencing. However, MinKNOW also allows real-time base calling and generation of FASTQ files. By default, one thousand reads are stored together in a single FAST5 file.

Base calling
Base calling for nanopore data differs considerably from base calling in other sequencing technologies. The main difference is that the electric current through the nanopore is determined not by a single nucleotide but usually by a pentamer. Accordingly, not four but 1,024 states have to be distinguished (Wick et al., 2019). Base calling is a very active field of development, with contributions from ONT and independent research groups. ONT has developed eight base caller software packages, of which Guppy is the most prominent (Wick et al., 2019; Kahlke, 2021; Wang et al., 2021).
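The number of states follows directly from the four bases and the k-mer length: 4^5 = 1,024 for pentamers. The Python sketch enumerates them. It is purely illustrative, because modern base callers such as Guppy use neural networks rather than an explicit lookup table of k-mer states.

```python
from itertools import product


def kmer_states(k: int, alphabet: str = "ACGT") -> list[str]:
    """All possible k-mers over the given alphabet, in lexicographic order."""
    return ["".join(bases) for bases in product(alphabet, repeat=k)]


pentamers = kmer_states(5)
print(len(pentamers))   # 1024
print(pentamers[:3])    # ['AAAAA', 'AAAAC', 'AAAAG']
```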
Guppy not only translates the squiggles into nucleotide reads but can simultaneously remove the multiplexing barcodes and adapter sequences added during library preparation. Guppy is integrated into the MinKNOW software; for Linux operating systems, however, only the standalone version is available. Base calling with Guppy can be accelerated considerably by using a graphics processing unit (GPU).

Demultiplexing
When several samples are sequenced at the same time, the sequence data have to be demultiplexed, i.e., each read is assigned to the sample it originates from. Again, this can be carried out in parallel with sequencing in MinKNOW or afterwards with third-party software such as Porechop (Wick, 2018) or DeepBinner. Unlike Porechop, which requires base-called FASTQ files, DeepBinner identifies barcodes from the raw squiggle signal in the FAST5 file, which gives it greater sensitivity. When base calling is performed with Guppy, it can simultaneously be instructed to demultiplex the reads.
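To illustrate what demultiplexing does on base-called reads, the Python sketch below assigns each read to a sample by searching for that sample's barcode near the read start with an approximate (edit-distance) match and leaves ambiguous or poorly matching reads unclassified. The barcode sequences, the 60-nt search window and the tolerance of two edits are hypothetical choices. Guppy, Porechop and DeepBinner use their own barcode sets, scoring schemes and thresholds, and DeepBinner works on the raw signal rather than on sequences.

```python
from typing import Dict, List, Optional


def best_match_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`
    (semi-global alignment: leading/trailing bases of `text` are free)."""
    previous = [0] * (len(text) + 1)  # empty pattern matches anywhere at cost 0
    for i, p_char in enumerate(pattern, start=1):
        current = [i] + [0] * len(text)
        for j, t_char in enumerate(text, start=1):
            current[j] = min(
                previous[j - 1] + (p_char != t_char),  # match / mismatch
                previous[j] + 1,                       # base missing in text
                current[j - 1] + 1,                    # extra base in text
            )
        previous = current
    return min(previous)


def assign_barcode(read: str, barcodes: Dict[str, str],
                   window: int = 60, max_dist: int = 2) -> Optional[str]:
    """Sample whose barcode best matches the read start, or None if no barcode
    is close enough or the best hit is tied."""
    head = read[:window]
    scored = sorted((best_match_distance(seq, head), name)
                    for name, seq in barcodes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two barcodes fit equally well: leave unclassified
    return best_name


def demultiplex(reads: List[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort reads into one bin per sample plus an 'unclassified' bin."""
    bins: Dict[str, List[str]] = {name: [] for name in barcodes}
    bins["unclassified"] = []
    for read in reads:
        sample = assign_barcode(read, barcodes)
        bins[sample if sample is not None else "unclassified"].append(read)
    return bins


if __name__ == "__main__":
    # Hypothetical 8-nt barcodes; real kits use longer, carefully designed sequences.
    barcodes = {"sample_A": "AAGGTTAA", "sample_B": "CCTTGGCC"}
    reads = ["AAGGTTAAGCGTACGATCGA", "TGCCTTGGCCATGCATGCAT", "TTTTTTTTTTTT"]
    for name, assigned in demultiplex(reads, barcodes).items():
        print(name, len(assigned))
```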

Error correction and quality filtering
If no high-quality short reads from NGS are available for error correction, nanopore reads can still be improved on the basis of their known error model: they predominantly suffer from insertions and deletions (indels) in homopolymers (Delahaye and Nicolas, 2021). Several algorithmic approaches have therefore been implemented for standalone computational error correction (Salmela et al., 2016; Koren et al., 2017; Xiao et al., 2017; Sahlin and Medvedev, 2021).
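Because nanopore errors concentrate in homopolymer runs, one widely used idea is to compare sequences in homopolymer-compressed form, in which every run of identical bases is collapsed to a single base. The Python sketch below illustrates this idea only; it is not the algorithm of any of the error-correction tools cited above. Two hypothetical sequences that differ solely in homopolymer length become identical after compression, while the run lengths are kept separately.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_compress(sequence: str) -> Tuple[str, List[int]]:
    """Collapse runs of identical bases and return the original run lengths."""
    bases, run_lengths = [], []
    for base, run in groupby(sequence):
        bases.append(base)
        run_lengths.append(len(list(run)))
    return "".join(bases), run_lengths


if __name__ == "__main__":
    reference = "GATTTTACCCG"
    noisy_read = "GATTTACCCCG"  # one T missing, one extra C (homopolymer indels)
    ref_hpc, ref_runs = homopolymer_compress(reference)
    read_hpc, read_runs = homopolymer_compress(noisy_read)
    print(ref_hpc == read_hpc)  # prints True: only the run lengths differ
    print(ref_runs, read_runs)
```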
The error rate can also be reduced by using multiple reads of one plant barcode to build a consensus sequence, e.g., with the tool SINGLe (Espada et al., 2022). This consensus calling strategy improves read quality at the cost of sequencing depth, which is reduced by a factor of 30-100.
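The principle of consensus calling can be sketched as a per-position majority vote over several reads of the same barcode. The Python example below assumes the reads have already been aligned to equal length, with gaps written as "-"; it is a deliberately simple stand-in and not the statistical model used by SINGLe. It also makes the cost in depth tangible: five hypothetical input reads collapse into a single consensus sequence.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str], gap: str = "-") -> str:
    """Per-column majority vote over pre-aligned reads of equal length.
    Columns whose most frequent symbol is a gap are dropped; ties go to the
    symbol seen first in the column."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical aligned reads of one barcode, each carrying a different error.
    reads = ["ACGT-ACGT", "ACGTTACGT", "ACGT-ACGT", "ACCT-ACGT", "ACGT-ACGA"]
    print(majority_consensus(reads))  # prints ACGTACGT
```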
After the optional error correction, reads can be filtered by their quality scores. For quality filtering, we provide a simple script that allows thresholds to be set for different criteria, such as read length and individual-nucleotide or average read quality (Wünschiers, 2022). Primer sequences from the plant barcode amplification step are trimmed afterwards, again commonly with Porechop or Cutadapt (Martin, 2011).
Figure 6. Example bioinformatics pipeline starting with the FAST5 data file as provided by the MinION sequencer. On the left, the processing of eight plant barcode reads from two pooled, multiplexed samples is shown schematically. On the right, abbreviated file contents and software functions are shown.
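A read-level quality filter of the kind described above can be sketched in a few lines of Python. This is not the script cited above but a minimal illustration: it parses plain four-line FASTQ records with Phred+33 encoded qualities and keeps reads that fall within a length window and exceed a mean-quality threshold; the default thresholds are arbitrary placeholders. The mean quality is derived from the average error probability rather than by averaging Phred values directly, because Phred scores are logarithmic.

```python
import math
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield records from plain four-line FASTQ (no wrapped sequence lines)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality derived from the mean per-base error probability."""
    if not qual:
        return 0.0
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(record: Record, min_len: int = 300, max_len: int = 1000,
                  min_mean_q: float = 10.0) -> bool:
    """Keep reads within the length window and above the mean-quality cut-off."""
    _header, seq, qual = record
    return min_len <= len(seq) <= max_len and mean_quality(qual) >= min_mean_q


if __name__ == "__main__":
    fastq = ["@read1\n", "ACGT\n", "+\n", "IIII\n",   # Phred 40 throughout
             "@read2\n", "ACGT\n", "+\n", "++++\n"]   # Phred 10 throughout
    kept = [r[0] for r in read_fastq(fastq) if passes_filter(r, min_len=4, min_mean_q=20)]
    print(kept)  # prints ['@read1']
```

For a read with one Phred-40 and one Phred-10 base, this yields a mean quality of about 13 rather than 25, which better reflects the expected number of errors.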

Hands-on …
Starting with a FAST5 file as provided by the ONT nanopore sequencing platform, and with minimal computational effort, the next steps toward taxon identification may be performed in the Linux command line as follows:
• Base calling, demultiplexing, and multiplex barcode trimming with Guppy:

Assigning reads to taxa
Finally, pollen sequence reads are assigned to plant barcodes (Figure 7). This is usually done either with a local aligner, as implemented in BLAST+ (Camacho et al., 2009), or with a global aligner, e.g., the freely available VSEARCH software (Rognes et al., 2016). A prerequisite is an appropriate reference database (Bell et al., 2016). For ITS2, the online database provided by the University of Würzburg, Germany, may be used (Ankenbrand et al., 2015). Alternatively, a customized local database can be created that contains all relevant barcode sequences, ideally filtered to include only locally occurring plants in order to reduce noise. The required barcode sequences can be downloaded, e.g., from NCBI GenBank. Additionally, the assigned plant species can be filtered and grouped according to their flowering period, which increases the reliability of the results. The barcode sequence reads can also be deconvoluted by aligning them to a custom reference with the minimap2 aligner (Li, 2021), which is optimized for mapping noisy sequence reads to a reference database.
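The logic of best-hit assignment against a reference database, followed by a check against flowering periods, can be sketched as follows. This is a toy Python stand-in for tools such as VSEARCH or BLAST+, not their actual scoring: identity is simplified to one minus the edit distance divided by the length of the longer sequence, the best hit is kept only if it reaches a minimum identity, and a separate helper tests whether the assigned species flowers in the sampling month. All reference sequences, species names, flowering months and the 0.9 cut-off are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (global alignment with unit costs)."""
    previous = list(range(len(b) + 1))
    for i, a_char in enumerate(a, start=1):
        current = [i] + [0] * len(b)
        for j, b_char in enumerate(b, start=1):
            current[j] = min(previous[j - 1] + (a_char != b_char),
                             previous[j] + 1, current[j - 1] + 1)
        previous = current
    return previous[-1]


def identity(a: str, b: str) -> float:
    """Simplified global identity: 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def assign(read: str, references: Dict[str, str],
           min_identity: float = 0.9) -> Optional[Tuple[str, float]]:
    """Best hit as (species, identity); None if no reference is close enough."""
    best = max(((identity(read, seq), name) for name, seq in references.items()),
               default=None)
    if best is None or best[0] < min_identity:
        return None
    return best[1], best[0]


def plausible(species: str, month: int, flowering: Dict[str, Set[int]]) -> bool:
    """True if the species is recorded as flowering in the sampling month."""
    return month in flowering.get(species, set())


if __name__ == "__main__":
    # Hypothetical reference barcodes and flowering months (1 = January).
    references = {"Species_A": "ACGTACGTACGTACGTACGT",
                  "Species_B": "TTGCATTGCATTGCATTGCA"}
    flowering = {"Species_A": {4, 5, 6}, "Species_B": {8, 9}}
    hit = assign("ACGTACGTACCTACGTACGT", references)
    if hit is not None:
        species, ident = hit
        print(species, round(ident, 2), plausible(species, 5, flowering))
        # prints Species_A 0.95 True
```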

Outlook
What can be expected in the future? On the one hand, we see a trend towards long-read DNA sequencing technologies, which will certainly enhance the usability of currently used barcodes. They also open up the possibility of using longer barcodes, which will help to increase resolution at the species level. This development will be facilitated by the ever-increasing accuracy of long reads obtained with affordable and portable devices. On the other hand, we see a trend toward the application of "whole-genome barcodes" by an approach called genome skimming (Dodsworth, 2015; Bell et al., 2021). In contrast to the targeted sequencing approach of metabarcoding, shotgun metagenomics involves randomly sequencing short stretches of genomic DNA from mixed samples. These can then be used to query genome databases. Currently, however, the number of plant species with sequenced genomes, which this approach requires for pollen identification, is limited. Nevertheless, Peel et al. (2019) showed the feasibility of a reverse metagenomics approach in which they sequenced locally growing plant species at low coverage. These species are represented as so-called genome skims. From these genome-wide sequence reads, they created a customized sequence database, which they queried with shotgun-sequenced pollen DNA. They demonstrated that this reverse metagenomics approach could classify plant species present in mixed-species samples at proportions of 1% DNA or higher.
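The reverse metagenomics idea, building a reference from low-coverage genome skims of local plants and querying it with shotgun pollen reads, can be illustrated with a toy k-mer classifier. The Python sketch below is not the pipeline used by Peel et al. (2019): it indexes the k-mers of each species' skim reads, assigns each pollen read to the species sharing the most k-mers with it, and reports species whose share of classified reads reaches a minimum proportion (set to 1% to echo the detection level mentioned above, although read proportions are only a proxy for DNA proportions). Species names, sequences and the k-mer length are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(skims: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """One k-mer set per species, pooled from its genome-skim reads."""
    return {species: set().union(*(kmers(r, k) for r in reads))
            for species, reads in skims.items()}


def classify(read: str, index: Dict[str, Set[str]], k: int,
             min_shared: int = 1) -> Optional[str]:
    """Species sharing the most k-mers with the read; None if too few or tied."""
    read_kmers = kmers(read, k)
    scores = sorted(((len(read_kmers & ks), sp) for sp, ks in index.items()),
                    reverse=True)
    if not scores or scores[0][0] < min_shared:
        return None
    if len(scores) > 1 and scores[1][0] == scores[0][0]:
        return None
    return scores[0][1]


def species_proportions(reads: Iterable[str], index: Dict[str, Set[str]], k: int,
                        min_prop: float = 0.01) -> Dict[str, float]:
    """Share of classified reads per species, keeping only those >= min_prop."""
    hits = (classify(r, index, k) for r in reads)
    counts = Counter(sp for sp in hits if sp is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sp: n / total for sp, n in counts.items() if n / total >= min_prop}


if __name__ == "__main__":
    # Hypothetical genome-skim reads for two local species.
    skims = {"Species_A": ["ACGTTGCAACGT", "GGCATCGA"],
             "Species_B": ["TTTTCCCCGGGGAAAA"]}
    index = build_index(skims, k=4)
    pollen_reads = ["ACGTTGCA"] * 95 + ["CCCCGGGG"] * 5 + ["CACACACA"]
    print(species_proportions(pollen_reads, index, k=4))
```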

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions
LP, BP, and RW: conceptualization and reviewing and editing. LP: writing original draft. RW: supervision. All authors contributed to the article and approved the submitted version.

Funding
This work was funded by the Saxon State Ministry of Science, Culture and Tourism.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Figure 7. DNA barcode amplicon sequences are queried against a sequence database. Optimally, this database has been filtered to only include locally occurring species.