- 1Moscow Center for Advanced Studies, Moscow, Russia
- 2Sechenov First Moscow State Medical University, Moscow, Russia
- 3Institute of Environmental Engineering, RUDN University, Moscow, Russia
DNA sequencing technologies play a key role in modern soil microbiome research, providing a deep understanding of its structure and functional role in ecosystems. Sequencing of the 16S rRNA gene and of the 18S-ITS-28S region, together with shotgun sequencing on modern platforms (Illumina, Pacific Biosciences (PacBio), Oxford Nanopore Technologies (ONT)), makes it possible to characterize the diversity and dynamics of microbial communities with high accuracy and resolution, which significantly expands our knowledge of biological processes and interactions between microorganisms in the soil. Soil microbiome analysis using sequencing contributes to the development of innovative methods for sustainable agricultural land management, improved fertility, plant disease management, and increased crop yields. Despite this potential, each sequencing technology has its own advantages and limitations related to accuracy, depth of coverage, cost, and data analysis complexity. Understanding these characteristics is crucial for selecting the optimal methods depending on the research objectives and available resources. This review systematizes modern sequencing methods, their technical capabilities and limitations, and bioinformatics tools for sequencing data analysis; considers examples of successful applications in the study of the soil microbiome in various ecosystems; and highlights new trends in metagenomics. In-depth study and development of soil microbiome sequencing technologies contribute to more sustainable and resource-efficient agriculture, underscoring the need for a comprehensive and informed approach to the analysis of microbial communities.
1 Introduction
Soil is the largest repository of biodiversity. It participates in regulating climatic processes and is a direct participant in the carbon cycle, contributing to the purification of water and the environment. Currently, soil is one of the key factors directly influencing economic well-being. Soil is the main source of food production. Thus, soil research is important both for agriculture and for maintaining ecosystem stability. In agriculture, soil research contributes to increased fertility and crop yields (1). The use of modern technologies accelerates the selection of agricultural methods, the detection of pathogens and treatment methods, contributes to the timely detection of nutrient deficiencies and prevents soil degradation (2, 3). Careful soil analysis helps to use resources rationally, minimize environmental damage, and ensure healthy plant growth.
The soil microbiome performs key functions in agricultural systems, determining soil fertility, crop yields, and stress resistance (4). It is highly diverse, encompassing not only bacterial species but also fungi, archaea, and other microorganisms (5). Many factors can influence the soil microbiome, so a preliminary analysis of the microbiome can help prevent irreparable consequences of certain agricultural methods (5–7).
For a long time, analyzing the soil microbiome was a time-consuming and laborious task that required the cultivation of bacteria and the use of PCR and Sanger sequencing methods. However, most soil microorganisms cannot be cultivated in laboratory conditions, which is why the world of the soil microbiome remained unexplored for a long time (8). With the advent of high-throughput sequencing methods, it became possible to analyze the microbial community of the soil without resorting to cultivation (9, 10). This has allowed researchers to learn about the taxonomic composition of soils, which previously remained a mystery. Moreover, it has become possible to analyze low-represented genera/species.
Illumina and Ion Torrent technologies are second-generation sequencing technologies characterized by short reads (11). Third-generation sequencing technologies such as Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are characterized by long reads, which allow the sequencing of entire genes rather than short variable regions (12).
The technological transition from short to long reads and hybrid sequencing is based on the strong advantages of long and hybrid approaches, which provide more complete and accurate genome reconstruction and variant detection. Long-read sequencing allows sequences up to 25–30 thousand nucleotides to be obtained, whereas short reads are limited to 1000 bp (in 454 pyrosequencing) (13). This provides more complete gene coverage and minimizes amplification errors characteristic of short reads. Long reads are better at handling repetitive regions, complex structural variations, and genome heterogeneity, which often limit short reads (13, 14). However, high read error rates and high computational resource requirements remain challenges for long-read platforms, which has led to the use of hybrid sequencing approaches. Hybrid methods combine the high accuracy of short reads with the ability of long reads to cover long sequences for more accurate and complete genome assemblies (15).
The most popular approaches currently used to analyze soil microbiomes are 16S rRNA gene sequencing and shotgun metagenomic sequencing. 16S rRNA gene sequencing allows for the identification of taxa down to the genus level, while shotgun sequencing allows for higher-resolution profiling, often down to the strain level. The 16S rRNA gene sequencing approach is further subdivided into amplicon sequencing and full-length 16S rRNA gene sequencing (16). Previously, due to technical limitations, the 16S rRNA gene was only partially sequenced, typically targeting specific variable regions (most commonly V3–V4) through amplicon-based approaches. Now, with the advent of long-read technologies, it is possible to sequence the entire 16S rRNA gene (V1-V9), which offers advantages in identifying closely related taxa (17, 18). The analogue of 16S rRNA gene sequencing in eukaryotes is the sequencing of the ITS (Internal Transcribed Spacer) region located between the coding regions of the 18S rRNA, 5.8S rRNA, and 28S rRNA genes (19). Shotgun sequencing, in turn, involves sequencing the entire metagenome within a sample, without being limited to bacteria, to fungi (in the case of ITS), or to amplicons of specific genes. In addition, shotgun sequencing provides the opportunity to perform genomic and functional analysis of soil based on genes found in the genomes of all microorganisms in the sample (20, 21).
Due to the rapid development of sequencing platforms, new, constantly updated bioinformatics tools are needed to analyze sequencing data. With the rapid development of artificial intelligence, more and more tools based on machine learning models and neural networks are emerging. Currently, artificial intelligence methods are actively used to analyze soil microbiomes, allowing complex patterns in metagenomic data to be detected, microbiome functions to be predicted, and soil management to be optimized. Machine learning models have found their most widespread application in the field of binning for metagenome-assembled genomes (MAGs) recovery (22). Machine learning is also used in taxonomic identification and functional annotation (23–25).
Each sequencing technology has its advantages and disadvantages. In this review, we explore the application of NGS (next generation sequencing) technologies in soil microbiome analysis, providing a detailed overview of each sequencing approach, the bioinformatics tools used for analysis, and a comparative assessment of their respective strengths and weaknesses.
2 Soil characteristics, its microbiome, and metagenomic sequencing
Soil quality and fertility are important factors that directly influence the efficiency of agricultural production. By observing soil characteristics, it is possible to monitor its fertility and, if necessary, take measures to improve the soil by applying fertilizers in a timely manner. Soil characteristics are divided into physical, chemical, and biological (26). Physical characteristics include texture and saturation percentage. Chemical characteristics include pH, electrical conductivity, and concentrations of calcium carbonate, potassium, and phosphorus. Biological characteristics include soil organic matter, that is, all substances in the soil containing organic carbon formed by plants or animals at various stages of decomposition and decay (27).
Microorganisms play an important role in soil quality. For example, Azotobacter spp. and Azospirillum spp. bacteria are capable of fixing atmospheric nitrogen and converting it into a form accessible to other organisms. In addition, they produce vitamins and siderophores that stimulate plant growth (28). Azospirillum spp. can act as bioprotectors, protecting plants from pathogens, producing hormones, and improving nutrient uptake (29, 30). Another group of bacteria important for soil fertility are Actinobacteria, which participate in the decomposition of plant and animal residues, including hard-to-decompose compounds. Actinobacteria are capable of synthesizing enzymes and antibiotics and participating in the formation of humus (31, 32). Actinobacteria also produce secondary metabolites that improve fertilizer availability and promote plant growth (31). In addition to bacteria, fungi also participate in the formation of fertile soils. They are involved in various basic physiological processes of plants and, for the most part, are supporting factors under stressful conditions (drought, salinization, contamination with xenobiotics) (33). Soil enzymes synthesized by microorganisms also determine the quality, composition, and changes in the soil; they participate in the synthesis and decomposition of humus, nitrogen fixation, xenobiotic detoxification, nitrification, and denitrification (34).
Soil microorganisms can be considered accurate and sensitive indicators of soil fertility and quality, as they react very quickly to changes in environmental conditions (35–39). For example, Nunes et al. (40) demonstrated a relationship between organic carbon loss and the biomass and activity of soil microorganisms. Thus, microorganisms play a key role in the formation and maintenance of soil fertility (41–43), as well as in the synthesis of secondary metabolites, antibiotics, and enzymes, making them a relevant subject of modern scientific research.
Metagenomic sequencing of soil samples and subsequent annotation of data for gene identification allows us to establish a close relationship between sequencing data, microbiome diversity, functional genes, and, as a result, soil properties. Jiyu Jia et al. (44) found in their study that 17 functional genes of the soil microbiome are directly related to ecological processes and measurable indicators of soil health. These genes are involved in the carbon cycle (cbbL, GH31), nitrogen cycle (nifH, ureC, chiA, A-amoA, B-amoA, narG, nirK, nirS, norB, and nosZ), and phosphorus cycle (gltA, bpp, phoD, phoC, pqqC). The authors consider the functional genes found to be valuable indicators for assessing soil condition and adjusting agricultural practices. Metagenomic sequencing allows for quick and accurate assessment of the presence of these genes in specific areas of land, simplifying the process of agricultural control. In addition, metagenomic sequencing data can be used in soil screening to search for potentially useful bacteria to increase soil carbon stocks (45). Using metagenomic sequencing, Weijie Jin et al. (46) determined how different types of land use affect the soil microbiome and functional genes involved in carbon and phosphorus cycling in the desert steppe zone of the Loess Plateau. In forest soils, a greater number of genes responsible for carbon fixation and methanogenesis (ppdK, rpiB, glpX, epi, purS, mttB, acs) were identified, indicating their increased ability to process organic matter. In contrast, lily farmland had the highest number of 2AEP transporter genes and more intense microbial interactions associated with carbon and phosphorus cycles. In addition, the forest ecosystem was characterized by a more specialized but less complex microbial network. These data are important for predicting the biogeochemical consequences of anthropogenic impacts and developing sustainable land use strategies in arid regions. 
In addition, a novel approach combining metagenomics and metabolomics enabled Viviana Freire-Zapata et al. (47) to investigate the relationships influencing greenhouse gas dynamics under permafrost thawing conditions. Their trait-level analysis established links between microbial taxa, metabolites, and observed changes in CO2 and CH4 in pore water. On the technical side, metabarcoding (sequencing of metabarcode genes, 16S rRNA gene, ITS regions) is currently quite popular due to its low cost compared to metagenomic analysis, but establishing the link between the data obtained and metabolic functions remains a challenge. Therefore, to simplify the task of mapping metabarcoding data to the networks of metabolic functions that convert organic and inorganic compounds present in the environment, Arnaud Belcour et al. (48) have developed a pipeline called Tabigecy, which uses taxonomic affiliation to predict the metabolic functions that make up biogeochemical cycles, making it a powerful tool for analyzing soil processes through sequencing.
Sequencing the soil microbiome is becoming a powerful tool influencing decision-making in agriculture and ecosystem restoration. For example, a study by Mishra AK et al. (49) in India showed that soils treated with organic fertilizers have a higher diversity of bacteria and fungi compared to soils treated with chemical fertilizers, according to 16S and ITS sequencing data. Thus, this study using NGS methods contributes to the development of recommendations for agricultural management. In addition to standard approaches, methods using machine learning are also used. Mo, Y. et al. (50) applied machine learning and SHAP analysis to identify specific microbial biomarkers indicative of different agricultural practices (fertilizer sources, tillage, cover crops). The results obtained contribute to the creation of new targeted strategies for improving agricultural practices. Peddle SD et al. (51) emphasize in their review that soil microbiome sequencing data can improve the planning, intervention, and monitoring of ecosystem restoration. Understanding the composition, diversity, and functions of the microbiome allows interventions to be tailored to overcome biotic and abiotic barriers to restoration. One such intervention could be the inoculation of environmentally suitable soil microbiomes into degraded environments. Also, incorporating well-planned experiments into restoration projects using microbiome sequencing data provides a more rigorous assessment of the effectiveness of various restoration methods. The recent popularity of NGS has contributed to the study of soil microbiomes at the national level, as Daisy Neale et al. (52) recommend conducting studies to identify soil microbiomes across the UK, continuously monitoring microbiome composition, and creating regulations to help farmers practice agriculture while maintaining a healthy soil microbiome.
3 Sequencing as a new stage in soil microbiome research
The soil microbiome is a very complex and highly diverse subject of research. Detecting low-abundance genera is particularly challenging; moreover, only a small fraction of microorganisms can be cultured in laboratory conditions. More than 99% of bacteria and archaea have not been isolated in pure culture (8), but their usefulness and functional properties have been repeatedly confirmed in molecular studies (53). A breakthrough in sequencing was the creation of second-generation sequencers, Illumina and Ion Torrent technologies, which significantly reduced the cost of sequencing and increased speed and throughput, allowing multiple reactions to be performed at the same time (13). However, these platforms also have their drawbacks: the first is the short read length (100–400 base pairs), and the second is the inability to control and manage the reaction in real time (54). A breakthrough in the development of long-read technologies was the creation of the Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms, which achieve read lengths of over 10,000 base pairs (12, 55). Sequencing on third-generation platforms allows real-time data acquisition and total control over the reaction (56, 28). Such advances in sequencing technologies have contributed to the development of microbiome analysis and made it possible to detect low-abundance genera in samples with diverse taxonomy. Microbiome analysis has become accessible to a wide range of scientific research (21, 57–61). Soil microbiome research includes not only the study of bacteria or archaea, but also all other microorganisms inhabiting the soil such as fungi, viruses, protozoa, and microscopic algae (62–65). The primary methods for microbiome analysis include 16S rRNA gene sequencing (or 18S rRNA, ITS, and 28S rRNA gene sequencing for eukaryotic organisms) and shotgun sequencing (Figure 1).
In 16S rRNA gene sequencing, PCR is used to generate target amplicons, and the resulting sequences are clustered into operational taxonomic units (OTUs) for subsequent analysis to determine taxonomic diversity (66, 67). In shotgun sequencing, the entire genomic content is sequenced, rather than targeting specific amplicons (68).
Figure 1. The main approaches of soil microbiome analysis. The soil icon was reproduced from Bioicons by Frédéric Bouché, licensed under CC BY 4.0 Unported.
3.1 16S rRNA gene sequencing
16S rRNA gene sequencing is a type of targeted sequencing. After DNA is extracted from samples, the target gene or gene region is amplified and then serves as a marker for taxonomic identification or metabolic processes (69, 70). This approach facilitates the characterization of microbial community composition within a given environment and enables the identification of interaction processes both among microorganisms and between microorganisms and their environment. 16S rRNA gene sequencing is widespread due to its convenience and simplicity. Certain conventions have been adopted for identification: sequences with homology above 95% are classified as belonging to the same genus, and those above 97% are classified as belonging to the same species (71). Biodiversity within a sample is described by “alpha diversity,” which assesses the diversity of species and the evenness of their distribution. Common measures of alpha diversity are the number of different species and the Shannon index, which, in addition to the number of different species, also takes into account the evenness of their distribution (72).
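The two alpha diversity measures mentioned above can be illustrated with a minimal sketch. The function name `alpha_diversity` and the two count vectors are hypothetical and chosen only for illustration; the Shannon index is computed here with the natural logarithm, H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i.

```python
import math

def alpha_diversity(counts):
    """Return (richness, Shannon index) for a list of per-taxon read counts."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        return 0, 0.0
    # Shannon index: minus the sum of p * ln(p) over observed taxa.
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    return len(present), shannon

# Two hypothetical communities with the same number of taxa
# but a different evenness of distribution.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

rich_even, h_even = alpha_diversity(even)
rich_skewed, h_skewed = alpha_diversity(skewed)
print(rich_even, round(h_even, 3))
print(rich_skewed, round(h_skewed, 3))
```

Both communities contain four taxa, yet the evenly distributed one yields the higher Shannon index, which is why the index is reported alongside the simple count of taxa.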
The complete 16S rRNA gene consists of about 1,500 base pairs and includes 9 variable regions (73) (Figure 2). Due to the active use of Illumina technology, which is characterized by short reads, 16S rRNA gene sequencing often refers to the sequencing of several variable regions, such as V4 or V6, V1-V3, or V3-V5 (74). Currently, full 16S rRNA gene sequencing is becoming more popular due to the emergence of third-generation sequencing technologies.
3.1.1 16S rRNA gene amplicon sequencing
Amplicon sequencing of 16S rRNA gene often uses sequencing technologies such as Illumina and Ion Torrent, but long-read methods can also be used. For example, ONT technology has been applied in the study and development of methods to combat diseases in horticultural crops using the soil microbiome. Xiaoping Li et al. (17) used the Nanopore MinION platform to study soil DNA, specifically its 16S rRNA gene amplicons, from orchards with a reduced population of the boxwood blight pathogen Calonectria pseudonaviculata. The authors identified 66 bacterial taxa that contain strains with biological control activity against plant pathogens.
Illumina technology is the most common method in 16S rRNA gene amplicon sequencing. Illumina sequencing technology involves several steps: first, DNA is fragmented and adapters are added to the ends of the fragments, then the sample is denatured and immobilized on a substrate, where amplification occurs by bridge PCR with the formation of clusters of identical copies of fragments. After that, the actual sequencing begins—a primer is attached to the chains, and DNA polymerase adds modified nucleotides with reversible terminators and fluorescent labels that cause synthesis to stop after each addition. A laser excites the labels, a color signal is recorded, then the labels and terminators are removed, and the cycle is repeated for the next nucleotide. As a result, the DNA sequence is obtained by reading the fluorescent signals across multiple clusters simultaneously (11).
Using 16S rRNA gene amplicon sequencing on Illumina MiSeq, Sukanta Kumar Pradhan et al. (75) assessed the impact of chromite mining on the soil microbiome and found a change in the composition of the microbiota during the transition from contaminated sites to control sites at the phylum level (Actinobacteria, Proteobacteria, and Acidobacteria). The authors suggest that these groups of organisms are the most resistant to metals. In terms of species diversity, Shigella sp., Bacteroides sp., Propionibacterium acnes, Pantoea sp., Aciditerrimonas sp., Reyranella sp., Alphaproteobacterium sp., and representatives of the Burkholderiaceae family were widely represented at the in situ sites. In addition, using the Illumina NovaSeq platform, the variable fragment V1-V3 of the 16S rRNA gene was sequenced in soil samples collected in the root zone of healthy ironwood trees and trees with signs of “ironwood tree decline,” and the relationship between the bacterial community in the soil and “ironwood tree decline” was assessed. The study found that the soil microbiome was not associated with “ironwood tree decline” (76). Amplicon sequencing is used to study and compare the taxonomic representation of microorganisms in cold and hot deserts (77). 16S rRNA gene sequencing plays an important role in the selection of agricultural methods, methods of processing and applying nitrogen fertilizers (78), as well as in the search for new ways to combat soil waterlogging (6).
In addition, Illumina MiSeq is used to conduct fundamental research on the use of various variable fragments of the 16S rRNA gene. Ana Soriano-Lerma et al. (79) tested the hypothesis that the choice of the variable region of the 16S rRNA gene influences the results of soil microbiome research. Regions V1-V3, V3-V4, V4-V5, and V6-V8 were sequenced. As a result, differences were observed in alpha diversity and the detection of certain taxa, but this did not affect classification at the genus level. However, the choice of variable region did influence the identification of certain microorganisms, with the V1-V3 region showing the best results.
Ion Torrent technology is essentially similar to pyrosequencing technology. The system consists of a chip with wells containing DNA polymerase and beads, on which multiple copies of a single DNA fragment are immobilized in each well. dNTPs pass through the chip one by one, and as a result of complementary interaction and covalent binding with the growing chain, pyrophosphate and protons are released, and the pH of the reaction mixture decreases. The semiconductor chip detects the change in voltage (80). This technology is used less frequently due to more frequent errors and lower taxonomic resolution compared to Illumina technology (81, 82). Nevertheless, using this technology, Mangse et al. (83) demonstrated the influence of volatile petroleum hydrocarbons on the structure of the soil microbial community: Rhodococcus, Desulfosporosinus, Polaromonas, Mesorhizobium, and Methylibium had the highest relative abundance in soil treated with straight-chain alkanes, while Pseudomonas was more abundant in soil contaminated with cyclic alkanes. Using Ion Torrent, Koryachenko et al. (84) assessed the biological diversity in sedimentary Marl mudstone soil and revealed differences in the composition of the microbiome between Marl with surface green mat and the bare Marl soil layer, despite the similarity in pH. Ion Torrent was also used to assess the impact of various phosphate sources on the microbiota of the rhizosphere and endorhiza of barley. Depending on the phosphate source, the alpha and beta diversity of the active microbiota changed, especially in the rhizosphere, and there was also an impact on the relative abundance of some taxa (85). Salah Eddine Azaroual et al. (86) studied bacterial communities in Moroccan phosphate mines using six variable regions of the 16S rRNA gene. Proteobacteria, Actinobacteria, Bacteroidetes, and Firmicutes were found in the rhizosphere, with variable regions V3, V4, V6-V7 showing the best results in taxonomic classification.
Johnson et al. (16) showed that different variable regions differ in the level of accuracy of the results obtained; for example, according to the sequencing data for the V4 region, it was not possible to determine the taxonomic identity of more than half of the amplicons. In addition, different variable regions have different affinities for the identification of certain bacterial taxa. The same article showed that the V1-V2 region is poorly suited for the taxonomic identification of Proteobacteria, and the V3-V5 region is poorly suited for the identification of Actinobacteria. Meanwhile, some variable regions are well suited for detecting sequences of specific genera, such as Escherichia/Shigella for the V1-V3 region and Klebsiella for the V3-V5 region. However, despite the fact that some variable regions, such as V1-V3, provide a sufficient representation of the diversity of microorganisms, their sequences are still insufficient to separate closely related taxa from each other. In addition, it is impossible to distinguish closely related taxa if the frequency of differences between their sequences is less than the error limit of the sequencing technology used. Also, in some bacteria, different 16S rRNA gene sequences can be found in different operons of the same genome, which also complicates the obtaining of high-quality results on taxonomic composition in amplicon sequencing (87, 88).
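The point about the error limit can be made concrete with a small worked example. The sketch below compares the fraction of positions at which two taxa truly differ with the per-base error rate of a platform; the function names and all numbers (three differing positions in a 1,500 bp gene, error rates of 1% and 0.1%) are hypothetical values chosen for illustration, not measurements from the cited studies.

```python
def divergence(n_differences, region_length):
    """Fraction of positions at which two sequences truly differ."""
    return n_differences / region_length

def expected_errors_per_read(region_length, per_base_error):
    """Average number of erroneous bases expected in one read."""
    return region_length * per_base_error

def is_resolvable(n_differences, region_length, per_base_error):
    """Crude rule of thumb: true divergence must exceed the error rate."""
    return divergence(n_differences, region_length) > per_base_error

# Hypothetical pair of taxa differing at 3 of 1,500 positions (0.2%).
print(divergence(3, 1500))
print(expected_errors_per_read(1500, 0.01))   # about 15 errors per read at 1% error
print(is_resolvable(3, 1500, 0.01))           # error rate above divergence
print(is_resolvable(3, 1500, 0.001))          # error rate below divergence
```

With a 1% error rate a single read is expected to carry roughly five times more errors than there are true differences, so the two taxa cannot be separated from raw reads alone; this is only a rule of thumb and ignores denoising and consensus steps.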
3.1.2 Full-length 16S rRNA gene sequencing
Sequencing the full-length 16S rRNA gene provides more comprehensive results on microbial diversity. Sequencing the entire gene allows species to be distinguished, while sequencing certain variable regions only allows taxonomy to be determined down to the genus level (16).
3.1.2.1 PacBio full-length 16S rRNA gene sequencing
A new stage in the development of sequencing was marked by the emergence of long-read technologies, known as third-generation technologies. Long-read technologies overcome the limitations of first- and second-generation sequencing methods because they can generate reads longer than 10 kilobases and do not require the preparation of a DNA template library for sequencing, which eliminates the need for template manipulation and reduces sequencing errors (89). Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are currently the leaders in long-read sequencing and have made significant contributions to the study of the soil microbiome (3, 6, 17, 18, 90–96).
PacBio is based on SMRT (single-molecule real-time) sequencing technology, a parallel approach to sequencing single DNA molecules. SMRT uses zero-mode waveguides (ZMWs). A ZMW consists of a silicon substrate with a layer of aluminum coating and a circular hole with a radius of about 70 nm. The template, sequencing primer, and DNA polymerase make up the polymer unit of each ZMW (97). Each of the four nucleotides carries its own fluorescent dye. The incorporation of a nucleotide by the polymerase causes a change in the fluorescence of the dye corresponding to the base, and this change is detected.
The PacBio platform, thanks to its ability to generate long reads, allows for full-length sequencing of the 16S rRNA gene. This property, combined with multiple passes over the same DNA molecule (the circular consensus sequencing mode), provides taxonomic determination and identification at the genus level with high resolution and exceptional accuracy (16, 98).
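Why repeated passes over the same molecule improve accuracy can be shown with a deliberately simplified sketch. It takes a per-position majority vote over several reads of one molecule; real consensus callers align the passes and use probabilistic models, so the function `consensus` and the toy reads below are illustrative assumptions rather than the PacBio algorithm.

```python
from collections import Counter

def consensus(passes):
    """Per-position majority vote over pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this toy version assumes equal-length passes")
    # zip(*passes) yields one column of bases per position.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*passes))

truth = "ACGTACGTAC"
passes = [
    "ACGTACGTAC",
    "ACGAACGTAC",  # error at position 3
    "ACGTACGTAC",
    "ACGTACCTAC",  # error at position 6
    "TCGTACGTAC",  # error at position 0
]
result = consensus(passes)
print(result == truth)
```

Each individual error occurs in only one of the five passes, so it is outvoted at its position and the consensus matches the original sequence even though three of the five passes contain a mistake.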
All these advantages of PacBio explain the widespread use of the platform in the field of soil and agricultural microbiology.
Seong-Jun Chun et al. (91) studied the impact of two naturalized sunflower varieties with different seed dispersal densities on soil microbial communities. Using full-length 16S rRNA gene sequencing with long reads on the PacBio Sequel II platform, the authors identified 9,885 amplicon sequence variants, including key indicator species significantly correlated with seed density. Although sunflower demonstrated limited dominance and did not have a significant impact on the overall community structure, full-length 16S rRNA gene sequencing allowed the identification of some microbial taxa (Phycicoccus ginsenosidimutans and Flavisolibacter ginsengiterrae) that were closely associated with sunflower growth.
The high resolution of PacBio technologies allows for the analysis of microbial communities at different soil depths and the identification of differences between them. Thus, Richard Estrada et al. (93) used PacBio-HiFi (High Fidelity reads or highly accurate long reads) to show differences in alpha and beta diversity indices depending on plant type and soil depth when studying microbial communities at three depths (3 cm, 12 cm, and 30 cm) in Annona cherimola (cherimoya) and Pouteria lucuma (lucuma) fruit trees.
PacBio technologies are also used in combination with non-targeted metabolomics. In a study by Zhumei Du et al. (90), the use of PacBio SMRT sequencing technology and non-targeted metabolomics revealed the interaction between the microbiota and metabolome during silage fermentation. By combining microbiome profiling and metabolomics, the authors showed that changes in the structure of the microbial community can affect metabolic pathways, and that end metabolites can inhibit the growth of microorganisms that do not contribute to silage fermentation.
3.1.2.2 ONT full-length 16S rRNA gene sequencing
ONT involves sequencing individual molecules. During nanopore sequencing, the ionic current through a protein channel (nanopore) changes as single-stranded DNA (ssDNA) passes through it. A motor enzyme controls the passage of ssDNA through the nanopore, and the sensitivity is sufficient to distinguish nucleotide bases (99). ONT’s compact, palm-sized MinION sequencer allows long sequences to be obtained in real time. Initially, the device generated reads 6–8 thousand base pairs long, but laboratory improvements have significantly increased its capabilities. Today, MinION can be used to obtain reads longer than 100,000 base pairs (100).
Existing data on the use of ONT for 16S rRNA gene sequencing indicate a higher error rate compared to the other long-read technology, PacBio (101). However, every year more advanced methods for correcting ONT sequencing errors appear, and reagent kits and flow cells are improved, which increases the accuracy of the technology and reduces the number of false positive and false negative results (102, 103).
The compactness, cost-effectiveness, and rapid development of ONT have led to its wide adoption for sequencing applications in agriculture.
ONT sequencing of the full-length 16S rRNA gene served as a tool for assessing changes in the composition of the soil microorganism community when treating soil with silver, titanium dioxide, and zinc oxide nanoparticles (94). Sangeeta Chavan et al. identified changes in the abundance and diversity of soil bacteria not at the phylum level but at lower taxonomic levels (orders and genera). They also found that of the three types of nanoparticles, silver was the most effective at suppressing bacteria in the soil, especially beneficial genera such as the nitrogen-fixing bacteria Clostridium and the plant growth-promoting Vibrio.
ONT technologies also helped identify hidden, uncultivable microbial communities around the active ectomycorrhizal zone of Astraeus (ectomycorrhizal symbiosis is a form of mutually beneficial interaction between fungi and the roots of woody plants) in the dry deciduous sal forest of Jharkhand, India (18). Sequencing on the MinION Oxford Nanopore platform revealed a high diversity of operational taxonomic units (1905 OTUs) and 25 different bacterial phyla. The most abundant phylum was Proteobacteria (36%), followed by Firmicutes (28%), Actinobacteria (10%), and Bacteroides (6%), and the most abundant bacterial class in the active ectomycorrhizal zone was Gammaproteobacteria.
ONT technologies also allow near full-length targeted amplicon sequencing of the rRNA locus (104, 105). Using ONT sequencing in combination with shotgun metagenomics, Nicholas LeBlanc et al. (96) investigated the effect of green manure from broccoli, marigolds, and sudangrass on the taxonomic and functional characteristics of soil bacterial communities. Nanopore sequencing of 16S rRNA gene and metagenomic libraries was used to characterize bacterial communities in the bulk soil and rhizosphere, and all green manures were shown to reduce the abundance of the soil pathogen Verticillium dahliae.
3.1.3 Bioinformatic processing of 16S rRNA gene sequencing data
16S rRNA gene sequencing remains the gold standard in microbiome research due to its cost-effectiveness and simplicity. The general workflow for analyzing 16S rRNA gene sequencing data consists of demultiplexing (separating reads by sample based on barcodes), quality control and filtering, trimming of adapters and primers, pairing of reads, denoising or clustering (formation of ASVs, Amplicon Sequence Variants, or OTUs), removal of chimeras (artificial sequences), taxonomic classification, phylogenetic analysis, and diversity calculation.
A popular tool that covers all of the above steps is QIIME 2 (Quantitative Insights Into Microbial Ecology). Two denoising tools are available within QIIME 2. DADA2 (Divisive Amplicon Denoising Algorithm) trains an error model on the data, works with both paired and single reads, and is used when the priority is to identify rare taxa. Deblur uses a pre-trained error model, works only with forward reads (paired-read data must be merged in advance), and is used when specificity is a priority (106–108). DADA2 can also be used alone or in combination with other pipelines. It infers true biological sequences (ASVs) instead of clustering reads into OTUs, and it models amplification and sequencing errors, using quality scores to estimate the probability of error at each nucleotide (107). mothur is a classic pipeline for analyzing 16S rRNA gene sequencing data that creates OTUs by default rather than ASVs (although DADA2 can be integrated) (109). USEARCH was originally a closed-source platform that includes several algorithms, among them UPARSE for OTU clustering and UNOISE3 for ASV creation (110, 111); its code has since been released as open-source software (112). A comparison of the pipelines described above for processing 16S rRNA gene sequencing data is presented in Table 1. Regarding the differences between OTUs and ASVs, the OTU approach groups sequences into clusters at 97% identity (sometimes 99%), a threshold conventionally taken to approximate the species level of classification (113). The ASV approach is more recent and resolves sequences down to a single nucleotide difference. Compared to OTUs, ASVs are characterized by high resolution, detection of rare taxa, a low risk of over-merging, a moderate risk of over-splitting, and a more accurate assessment of alpha diversity (114). Thus, based on this combination of indicators, the ASV approach is currently preferable.
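The difference between OTU clustering and exact sequence variants can be illustrated with a minimal Python sketch. It is not the algorithm of UPARSE, mothur, or DADA2: identity is computed position by position on equal-length, pre-aligned toy sequences, no denoising is performed, and the function names `identity`, `exact_variants`, and `greedy_otus` are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy metric for equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_variants(reads: list[str]) -> Counter:
    """ASV-like view without denoising: every distinct sequence is its own unit."""
    return Counter(reads)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, int]:
    """Greedy centroid clustering: abundant sequences seed clusters first."""
    otus: dict[str, int] = {}
    for seq, count in exact_variants(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count  # absorbed by an existing OTU
                break
        else:
            otus[seq] = count  # no centroid is close enough: new OTU
    return otus


# Toy data: a 100-nt sequence, a variant with 1 mismatch, a variant with 5.
base = "ACGT" * 25
one_mismatch = "T" + base[1:]
five_mismatches = "CATGC" + base[5:]
reads = [base] * 10 + [one_mismatch] * 3 + [five_mismatches] * 2

print(len(exact_variants(reads)), "exact variants")
print(len(greedy_otus(reads)), "OTUs at 97% identity")
```

In this toy data set the single-mismatch variant is merged into the dominant OTU at the 97% threshold, while the exact-variant view keeps it separate; this is the resolution difference described above.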
As for existing taxonomic databases for 16S rRNA amplicon analysis, SILVA v138+ and GSR-DB (consisting of Greengenes+SILVA+RDP) are currently the most widely used for the highest accuracy at the species level (115, 116). In addition to the classifiers built into the QIIME 2 and mothur pipelines, which are based on k-mer frequencies and do not require sequence alignment, alignment-based methods can also be used: BLAST+ and VSEARCH. The former is more accurate but slower, while the latter is an open-source alternative to USEARCH (117, 118). Both of these classifiers are available in QIIME 2. In addition, the standard taxonomic classifiers for shotgun metagenomic data, Kraken and Bracken, also support 16S rRNA gene databases and have been reported to be faster and more accurate (119, 120). Methods using machine learning are actively being developed: 16S Classifier uses Random Forest to classify hypervariable regions (121), and Cliffy is a new method based on neural networks for robust 16S rRNA gene classification (122).
With the development of third-generation long-read sequencing technologies, the question arises of how to minimize the errors that occur in long reads. EMU is a specialized tool for taxonomic classification of full-length 16S rRNA gene sequences obtained from third-generation sequencing, designed specifically to overcome the high error rate of long reads. EMU first aligns reads to a reference database using minimap2 and then applies an expectation-maximization (EM) algorithm that uses information from the entire community to iteratively refine the assignments and correct for errors. EMU is capable of distinguishing between species with high genetic similarity and also reduces false positives. The limitations of this tool are that it only classifies organisms that are present in the database and that it requires significant computational resources in terms of both memory and time (103).
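The idea of using community-wide information in an EM loop can be sketched in a few lines of Python. This is a minimal illustration and not EMU's implementation: the input matrix `likelihoods` (probability of each read given each taxon) is assumed to be already derived from alignments, and the function name `em_abundances` is hypothetical.

```python
def em_abundances(likelihoods: list[list[float]], n_iter: int = 50) -> list[float]:
    """Estimate relative abundances from per-read taxon likelihoods.

    likelihoods[r][t] is an assumed P(read r | taxon t); in a real tool it
    would be computed from the alignment of read r to the reference of taxon t.
    """
    n_taxa = len(likelihoods[0])
    abundance = [1.0 / n_taxa] * n_taxa  # start from a uniform community
    for _ in range(n_iter):
        totals = [0.0] * n_taxa
        for row in likelihoods:
            # E-step: share each read among taxa, weighted by current abundance.
            weights = [a * p for a, p in zip(abundance, row)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # read is compatible with no taxon in the database
            for t, w in enumerate(weights):
                totals[t] += w / norm
        assigned = sum(totals)
        if assigned == 0.0:
            return abundance
        # M-step: abundances are the normalized expected read counts.
        abundance = [x / assigned for x in totals]
    return abundance


# Toy community: 6 reads match only taxon 0, 2 match only taxon 1,
# and 4 reads are equally compatible with both closely related taxa.
toy = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[0.5, 0.5]] * 4
print(em_abundances(toy))
```

The ambiguous reads are not split evenly: because the unambiguous reads indicate that the first taxon is more abundant, the EM iterations assign most of the ambiguous reads to it.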
DeepConsensus is a sequence error correction tool developed by Google in collaboration with Pacific Biosciences that uses deep learning (a transformer–encoder) to improve consensus sequences in PacBio CCS (Circular Consensus Sequencing) and create HiFi reads. DeepConsensus takes as input a multiple sequence alignment of the PacBio subread bases and a draft consensus from the current production method (pbccs). The gap-aware transformer–encoder is trained with an alignment-based loss, so that insertion and deletion errors are represented more accurately during training. DeepConsensus is capable of correcting mismatch errors, homopolymer INDEL (insertion-deletion) errors, and non-homopolymer INDEL errors (123).
3.2 ITS sequencing
Fungi play an important role in soil health, and differences in decomposability characteristics among guilds of mycorrhizal fungi are a critical factor in soil carbon cycling (124). Mycorrhizal fungi play a fundamental role in soil formation and nutrient cycling. For example, hyphal networks bind soil particles together, improving soil structure, aeration, and water retention capacity (125). In terms of plant nutrition, arbuscular mycorrhizal fungi not only facilitate access to phosphorus and nitrogen, but can also directly mobilize nitrogen from organic matter, accelerating its decomposition (126). In forest ecosystems, this function is extended by ectomycorrhizal fungi, which decompose complex organic materials, ensuring nutrient recycling across the entire ecosystem (127). Seasonal dynamics of guilds highlight the role of fungi in nutrient cycling, with saprotrophs causing litter decomposition during colder months: saprotrophic fungi dominated in autumn/winter (66.49–76.01%), while symbiotic fungi peaked in spring (up to 7.27%) in Robinia pseudoacacia L. plantations (128). Thus, the use of high-throughput sequencing in the study of soil fungi is important for understanding biogeochemical cycles, spatial and temporal patterns of fungal dynamics, tracking long-term and short-term changes associated with anthropogenic impacts or climate, and adjusting soil management practices.
The internal transcribed spacers (ITS) are non-coding regions located between the 18S and 28S rRNA genes (Figure 3). ITS sequencing in microbiome analysis is analogous to 16S rRNA gene sequencing; the difference is that ITS sequencing is used for eukaryotes (in soil microbiome studies, most often to investigate the taxonomic composition of fungi), while 16S rRNA gene sequencing is applicable to prokaryotes. The ITS region consists of two fragments, ITS1 and ITS2. ITS1 is located between the 18S and 5.8S coding regions, while ITS2 is located between 5.8S and 28S. The total length of this region is about 550 base pairs, which is why short-read amplicon sequencing of either ITS1 or ITS2 was often used (19, 129). The development of third-generation sequencing technologies has made it possible to sequence the entire region. For more accurate taxonomic analysis, regions of the small and/or large ribosomal subunit rRNA genes are now often sequenced in addition to ITS, so that the entire 18S-ITS-28S region is covered (3, 130–135).
By sequencing the ITS1 region using Ion Torrent technology, Feng Xue et al. (136) established that the most abundant fungal phyla in the rhizosphere soil of grape from Shihezi in Xinjiang are Ascomycota and Basidiomycota, and the dominant fungal classes are Sordariomycetes and Dothideomycetes. In addition, the maximum abundance and diversity of fungi were observed in the middle soil layer (20–35 cm) of rhizosphere soil where grapes had been cultivated for 12 years, while the lowest fungal abundance was observed in the lower layer (35–50 cm) of the rhizosphere of grapes grown for 5 years.
In addition, pure cultures of soil fungi from the biocrust and rhizosphere of arid grassland in Utah, USA, were identified by ITS and large ribosomal subunit sequencing using Illumina technology. The most abundant phyla were Ascomycota (88%), Mucoromycota (8.6%), and Basidiomycota (3.4%). Thirty percent of the isolated fungal cultures were keratinolytic but were not dermatophytes (131).
ITS2 sequencing contributed to the study of the soil microbiome of diseased potato plants in New Brunswick, Canada. Verticillium dahliae was found in all soil samples, with significantly higher levels in the stems of diseased plants. The stems also contained the fungi Plectosphaerella cucumerina, Colletotrichum coccodes, Botrytis sp. and Alternaria alternata, which were also the cause of the potato early dying disease complex (3). Sequencing of the ITS region using PacBio showed the effect of titanium on crop yields and efficiency. Titanium ions changed the relative abundance of conservative taxa, increased the interaction between soil bacteria and fungi, and increased the stability of the soil microbial community (137).
ITS2 sequencing also contributes to the creation of sustainable agrosystems (78), research into the impact of environmental disturbances on soil microbiome taxonomy (92) and the structure of fungal communities (132).
3.2.1 Bioinformatic processing of ITS sequencing data
The same tools are used to process ITS data as for 16S rRNA gene sequencing data (QIIME 2, mothur, and DADA2), but with minor modifications. Since the length of variable ITS fragments is inconsistent and varies between species, variable-length reads must be processed (106, 107, 109). Optionally, ITSx (or the ITSxpress plugin for QIIME 2) can be used for automatic ITS1/ITS2 extraction (138, 139). The choice between OTUs and ASVs is less clear-cut for eukaryotes. For fungal ITS analysis, 97% OTU clustering is often considered preferable to the DADA2 ASV approach due to high intraspecies variation, but some researchers support the ASV approach (140). LotuS2, PipeCraft2, and FROGS exist as separate pipelines for analyzing ITS regions and already include ITSx (141–143). The UNITE database is most often used to classify fungi and other eukaryotes based on ITS sequences (144). Chimeric (artificially merged) sequences, which are more common in ITS data due to high length variation, are removed using UCHIME (145). FUNGuild and FungalTraits are used for functional annotation of fungi (146, 147). A study by Benjawan Tanunchai et al. (148) demonstrated that FungalTraits outperforms FUNGuild in the number and quality of assigned ASVs, as well as in the frequency of significant correlations with environmental factors. The QIIME 2 and mothur pipelines are also used to analyze 18S-ITS-28S full operon sequencing data (106, 109). For taxonomic classification in this case, the SILVA databases, consisting of SILVA SSU (16S/18S) and SILVA LSU (23S/28S), are used, as well as the Ribosomal Operon Database (ROD), a specialized database for the complete ribosomal operon (115, 149).
3.3 Shotgun metagenomic sequencing
One of the most effective and widely used sequencing approaches is shotgun metagenomic sequencing (150). This approach yields millions of overlapping DNA sequence reads and relies on computational algorithms that use the overlaps between reads to assemble them into contigs and, ultimately, into complete genomes (68, 151). Researchers can analyze the resulting data using taxonomic classifiers, which provides a complete picture of complex microbiomes. One of the significant advantages of shotgun sequencing over 16S rRNA gene sequencing is that it is not limited to bacteria. Shotgun metagenomics allows the composition, diversity, and functions of soil organisms to be determined. The method recovers entire genes, giving researchers the opportunity to determine their role in the metagenome. In addition, shotgun sequencing allows for strain-level classification, which is finer than the genus-level classification typically achieved with amplicon sequencing (152, 153). However, shotgun metagenomic sequencing of soil requires higher coverage (data output) per sample than metagenomes of other biocenoses, such as the animal or human microbiome, because of the greater taxonomic diversity of soil (154–158).
Shotgun metagenomics can reveal the diversity of soil bacteria, their composition and functions, and the microbial food cycle.
Using shotgun metagenomics on the Illumina NovaSeq platform, Beat Frey et al. (159) studied the functional genetic potential of soil microorganisms in Swiss forests. The aim of the study was to identify the influence of tree species and soil depth on the genetic repertoire and to gain insight into the carbon and nitrogen cycles in microorganisms. The relative abundance of taxa at the domain level showed that there are 5–10 times more Archaea in deep soil layers, while the abundance of Bacteria does not change depending on soil depth. An increased content of carbohydrate-active enzyme genes was also found in deep soil layers.
One of the important advantages of shotgun technology is the possibility of functional analysis. For example, a study of natural agricultural lands in Ethiopia using shotgun sequencing on the Illumina NovaSeq 6000 platform not only characterized the taxonomic diversity of two separately selected sites but also detected protein domains, families, and pathways associated with the biosynthesis of secondary metabolites (21). The analysis also revealed new putative bacterial genes, such as those encoding polyketide synthases and non-ribosomal peptide synthetases, which may be involved in the production of new secondary metabolites, such as antibiotics.
Shotgun sequencing has generated a large amount of data on the microbiome composition of very different soils from around the world, providing a basis for further research, including comparative studies. Thus, data have been obtained on the taxonomic diversity and functional profile of soils from the Guanica dry forest (58), the tropical forest in the Peruvian Amazon (20), saline soils in the Odiel Saltmarshes Natural Area of Southwest Spain (59), and termite mounds in the North West Province of South Africa (60). Research into the rhizosphere is also actively underway; for example, studies have been conducted on the rhizosphere of corn (160), carrots (161), and the honeybush Cyclopia intermedia (162).
3.3.1 Bioinformatic processing of shotgun metagenomic sequencing data
A complete shotgun data analysis includes the following critical steps: quality control, trimming and filtering, mapping/alignment, assembly, binning, and taxonomic classification.
There are various tools for quality control depending on the type of data (Table 2). For example, the most popular tool for analyzing short reads is FastQC, which provides HTML reports with graphs showing quality distribution by position, GC content, adapter presence, and other metrics applicable for quality assessment (163, n.d.). NanoPack is specialized for analyzing long reads (ONT and PacBio), and the collection includes NanoPlot for visualization, NanoStat for statistics, NanoQC for read quality analysis, and NanoComp for dataset comparison (164). PycoQC is designed only for ONT data, providing interactive graphs with the ability to zoom, filter, and explore data in detail (165, n.d.). LongQC is a comprehensive tool for assessing the quality of long reads, analyzing quality distribution, read length, and coverage completeness, using k-mer overlap to detect non-sense reads, contaminants, and artifacts without the need for a reference genome (166). LongReadSum is the most advanced tool implemented in multithreaded C++ for maximum performance. It is the only tool with comprehensive support for all major formats, including POD5, the new ONT format. LongReadSum provides base signal analysis from FAST5/POD5 files and supports base modification (methylation) detection (167).
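The kind of summary statistics that such quality control tools report can be sketched in plain Python. This is an illustrative calculation only, not the code of FastQC, NanoStat, or LongReadSum; it assumes Phred+33 encoded FASTQ quality strings, and the function names are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a read."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_error_probability(quality: str, offset: int = 33) -> float:
    """Average per-base error probability from a Phred-encoded quality string.

    Each character encodes Q = ord(char) - offset, and P(error) = 10 ** (-Q / 10).
    """
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return sum(probs) / len(probs)


# Toy example: five read lengths and one short read with mixed quality.
print(n50([2, 3, 4, 5, 6]))            # half of the 20 bases lie in reads >= 5
print(gc_content("GGCCAT"))
print(mean_error_probability("II55"))  # two Q40 bases and two Q20 bases
```

Averaging error probabilities rather than raw Phred scores is a design choice made here because Phred scores are logarithmic; a few low-quality bases then weigh appropriately in the read-level estimate.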
The next step is trimming (removal of adapters and low-quality bases). Several tools are popular. Cutadapt is a universal adapter trimmer that supports quality trimming of the 3’ and 5’ ends, including a special mode for NextSeq/NovaSeq two-color chemistry, and can also trim poly-A/poly-T tails (168). Trim Galore is a Perl wrapper around Cutadapt and FastQC that automates adapter removal together with quality control (169, n.d.). Trimmomatic processes Illumina data (adapter removal, trimming, filtering) and supports both single and paired reads (170). BBDuk (Decontamination Using Kmers) uses k-mer matching to detect adapters and contaminants (171, n.d.). Porechop_ABI (an algorithm based on approximate k-mers and capable of detecting adapter sequences based solely on their frequency) and Dorado specialize in removing adapters from Oxford Nanopore data (172, 173, n.d.). Dorado is a modern tool for basecalling ONT data with built-in quality assessment. It converts raw Nanopore signals (POD5/FAST5) into base sequences and performs demultiplexing, alignment, modification detection, and variant calling (173, n.d.). Unlike Illumina and ONT data, PacBio data do not require manual adapter trimming, as SMRTbell adapters are removed by built-in algorithms in PacBio’s proprietary software (174, n.d.).
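The two basic operations behind trimming can be sketched as follows. This is a simplified illustration, not the algorithm of Cutadapt or Trimmomatic: adapter matching is exact (real trimmers tolerate mismatches), the quality trim simply removes trailing low-quality bases, and the function names and the adapter string in the example are used for illustration only.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter: a full copy anywhere, or an adapter prefix at the read end."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # full adapter found: drop it and everything after
    # Otherwise look for the longest adapter prefix that ends the read.
    max_k = min(len(adapter) - 1, len(read))
    for k in range(max_k, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_low_quality_tail(read: str, quality: str, min_q: int = 20,
                          offset: int = 33) -> tuple[str, str]:
    """Drop trailing bases whose Phred score (Phred+33 by default) is below min_q."""
    end = len(read)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return read[:end], quality[:end]


ADAPTER = "AGATCGGAAGAGC"  # example adapter string, assumed for this sketch
print(trim_adapter("ACGTACGTAC" + ADAPTER + "TTT", ADAPTER))
print(trim_adapter("ACGTACGTAC" + "AGATC", ADAPTER))
print(trim_low_quality_tail("ACGTAC", "IIII##"))
```

The `min_overlap` parameter reflects a trade-off: a short required overlap removes more partial adapters but also trims reads that end in a few adapter-like bases by chance.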
The next step is data filtering, that is, the removal of host DNA. Alignment-based methods such as Bowtie2, BWA, and KneadData (which combines Trimmomatic and Bowtie2) align each read against a host genome index and remove the reads that match. These methods are highly accurate (>99.9%) but require large amounts of memory and time for indexing (175–177, n.d.). Hostile is a modern hybrid tool that uses Bowtie2 for short reads and Minimap2 for long reads. Its key advantage is that optional masking of the index against bacterial and viral genomes allows more than 99.99% of microbial reads to be retained while removing >99.6% of host reads (178). HoCoRT is a tool that provides a unified interface for a variety of methods (Bowtie2, HISAT2, Kraken2, BioBloom, BBMap, Minimap2). It includes recommendations for selecting the optimal algorithm for different types of data: BioBloom or Bowtie2 is recommended for the gut microbiome, Bowtie2 for the oral microbiome, and Kraken2 + Minimap2 for long reads (179).
Assembly tools fall into several groups: methods based on the de Bruijn graph, methods based on the overlap graph, and hybrid assembly methods. The first group includes MEGAHIT, metaSPAdes, SKESA, and SPAdes, which work with short reads (180–182). MEGAHIT uses a succinct de Bruijn graph, which significantly reduces memory requirements, whereas metaSPAdes produces higher-quality contigs at the cost of more memory. MEGAHIT outperformed metaSPAdes on deeply sequenced datasets (>100x), while metaSPAdes obtained better results than MEGAHIT on low-complexity datasets (183). Methods using overlap graphs (metaFlye, Canu, Flye, Lathe) are designed for long PacBio and Nanopore reads (184–186). The metaMDBG tool was recently released for assembling PacBio HiFi reads (187). metaFlye demonstrates higher performance in reconstructing complete microbial genomes. In a study by Zhang et al. (183), Canu, metaFlye, and Lathe performed significantly better than other assemblers, and metaFlye generated assemblies with the highest genome coverage and total assembly length. Hybrid methods (Unicycler, metaSPAdes in hybrid mode) combine short and long reads for maximum assembly quality. According to a study by Catarina Inês Mendes et al. (188), MEGAHIT, metaSPAdes, SKESA, SPAdes, and Unicycler showed the best assembly quality on various mock communities.
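The de Bruijn graph idea used by short-read assemblers can be shown with a minimal Python sketch. It is a toy, not the data structure of MEGAHIT or metaSPAdes: there is no error correction, no reverse-complement handling, and the contig walk only checks out-degree; the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds one edge."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def walk_unambiguous(graph: dict[str, set[str]], start: str) -> str:
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles (repeats) instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Two overlapping toy reads are merged into one contig through shared k-mers.
graph = de_bruijn(["ACGTTG", "GTTGCA"], k=4)
print(walk_unambiguous(graph, "ACG"))
```

Because reads are broken into k-mers, overlaps between reads never have to be computed pairwise, which is why this approach scales to very large short-read datasets; the cost is that repeats longer than k create branches that the walk cannot resolve.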
To reconstruct metagenome-assembled genomes (MAGs) after assembly, the metagenomic contigs must be binned. Hybrid binning programs, which combine sequence composition (k-mer frequencies and GC content) with coverage, are currently very common. These include MetaBAT2 (an updated version of MetaBAT), which is based on a graph-oriented approach using a modified label propagation algorithm (LPA) (189); CONCOCT, which uses a statistical approach (190); and MaxBin/MaxBin 2.0, which uses marker genes (191). A radically different approach to metagenomic binning is the use of neural networks and deep learning to learn better representations of contigs. Programs of this generation include SemiBin (using deep Siamese neural networks), VAMB, COMEBin, GenomeFace, and McDevol (192–196). According to a study by Yazhini Arangasamy et al. (196), on real-world data MetaBAT2 and GenomeFace were the fastest tools, while COMEBin was the slowest and caused memory shortage issues; however, COMEBin may be more suitable for large-scale metagenomic studies. McDevol proved to be faster than COMEBin and performs well at high sample coverage, but this advantage is not observed at low coverage. Newer approaches to binning include programs that use information about contig connectivity from assembly graphs: MetaCoAG, GraphMB, RepBin, CCVAE, UnitigBIN, and GraphBin-Tk (an assembly graph-based metagenomic binning tool that combines the capabilities of MetaCoAG, GraphBin, and GraphBin2) (197–203). In addition to the tools described above, there are ensemble methods that combine the results of several binners to improve quality and reduce bin contamination: DAS Tool, MetaBinner, BASALT, and Binette (204–207). Also worthy of special attention are pipelines that start from raw reads and output MAGs together with their analysis, such as MetaWRAP and MetaflowX (208, 209).
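The composition and coverage features that hybrid binners start from can be sketched as follows. This is an illustrative feature vector only, not the representation used by MetaBAT2, CONCOCT, or any other tool named above: real binners typically merge reverse-complement k-mers and normalize coverage across samples, and the function names are hypothetical.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, so every contig gets the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(contig: str) -> list[float]:
    """Relative frequency of each of the 256 tetranucleotides in a contig."""
    counts = dict.fromkeys(TETRAMERS, 0)
    contig = contig.upper()
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        if kmer in counts:  # skips k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


def contig_features(contig: str, mean_coverage: float) -> list[float]:
    """Composition (256 tetramer frequencies + GC fraction) followed by coverage."""
    seq = contig.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return tetranucleotide_profile(seq) + [gc, mean_coverage]


features = contig_features("ACGTACGTAC", mean_coverage=12.5)
print(len(features))  # 256 tetramer frequencies + GC + coverage
```

Contigs from the same genome tend to have similar composition and similar coverage, so a clustering or neural network model applied to vectors of this kind can group them into bins.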
Taxonomic classification is an important step in shotgun metagenomic data processing, allowing us to determine which microorganisms are present in a sample and in what quantities. Many tools are available for taxonomic classification of metagenomic data (Table 3). K-mer-based methods (Kraken2, KMCP, Kaiju) search directly for exact k-mer matches in a pre-built database (119, 210, 211). Kraken2, one of the most widely used classifiers, relies on the LCA (Lowest Common Ancestor) algorithm: each k-mer is assigned to the lowest common ancestor of the reference genomes that contain it, and a read receives the most specific taxon supported by its k-mers (119). Kraken2 with a confidence threshold of 0 detects more species but with lower accuracy, while Kraken2 with a confidence threshold of 1 is very accurate but detects very few species, especially in environmental samples (212). Alignment-based methods (MEGAN, Woltka) align reads against reference genomes and assign taxonomy based on the quality of the alignment (213–215). The marker gene method MetaPhlAn4 uses species-specific marker genes (usually conserved protein regions), which avoids the need for complete alignment. MetaPhlAn4 combines marker genes with MAGs (metagenome-assembled genomes) for better detection of new species through unknown species-level genome bins (216). Alignment-based and LCA-based classifiers (MEGAN, Woltka) often require pre-assembly and binning, whereas k-mer-based methods (Kraken2, KMCP, Kaiju) and MetaPhlAn4 work directly with reads without assembly.
4 Comparison of sequencing technologies
Scientists have repeatedly attempted to compare sequencing technologies for soil, gut, and other microbiomes. Comparisons are made based on both technical (principle of sequencing, duration of run, sequencing depth, quality) and economic characteristics (equipment and sequencing cost) (217). Of particular interest to scientists studying the soil microbiome is the comparison of sequencing platforms in terms of the detection of bacterial taxa and species, especially the ability to detect low-represented microbial communities. This task is complex, as each sequencing method has its limitations. When comparing different technologies, researchers pay particular attention to the availability and correctness of specialized databases (218).
In our review, we have attempted to compile studies that provide comparative characteristics of various sequencing technologies for detecting microbiome diversity (including Alpha, Beta Diversity, and technical characteristics of sequencing technologies) (Table 4).
The comparative analysis of sequencing platforms across key technical parameters (read length, accuracy, cost, data yield) provides nuanced insights into the advantages and limitations of each technology in microbiome studies. Read length varies substantially between platforms: Illumina 16S rRNA amplicon sequencing produces short reads typically ranging from 150 to 300 bp, sufficient for targeting specific hypervariable regions of the 16S rRNA gene (221). This limitation constrains detailed taxonomic resolution but allows for massive throughput with billions of reads per run (16). Hybrid metagenomics combines short reads (~ 150 bp) with long HiFi reads from PacBio (~6,000–7,000 bp), enhancing genome coverage and functional insight (98, 224, 236). PacBio technologies offer long reads predominantly between 10–25 kb, with some extending beyond 30 kb, which facilitates more robust genome assembly and strain-level resolution (227, 228). Oxford Nanopore Technologies (ONT) deliver ultra-long reads with averages from 4,000 to 7,600 bp and maximum reads surpassing 200 kb, enabling comprehensive genome resolution including repetitive and complex regions (232–234, n.d.; 235).
The cost differential aligns broadly with technical complexity: Illumina offers the most cost-effective platform, especially for short reads and high throughput projects (219). Shotgun metagenomics is more expensive, requiring deeper coverage and complex bioinformatics (237–239). PacBio long reads entail higher equipment and per-sample costs but provide unparalleled read length and accuracy (228, 240). ONT stands out as a flexible and accessible option, with lower capital expenditure and scalable throughput, suitable especially for field-based applications (241, 242). However, the choice of sequencing method and platform should be carefully considered, taking into account the nature of the research object, the scientific question, and the capabilities of the laboratory. Hence, we aim to focus our comparative analysis on recent studies that offer insights into microbiome diversity across different platforms. This approach will also highlight potential challenges researchers might encounter.
Amplicon sequencing technology (targeted sequencing of 16S rRNA gene regions in archaea and bacteria and the ITS region in fungi) is widely used to study the soil microbiome (3, 76, 77, 136). However, the technique has a number of limitations.
One of the disadvantages of amplicon sequencing is the distortion of microbial profiles due to primer bias (243, 244). Intragenomic variability also contributes to profile distortion (245), as do taxonomic differences in the number of copies of the 16S rRNA gene (246, 247). These sources of variability can misrepresent the abundance of microbial taxa, making it difficult to accurately assess diversity.
Metagenomics using the shotgun approach allows the identification of microbial species and strains, functional gene profiling, and the discovery of new genomes using metagenome-assembled genomes. These methods are important for studying soil microbial diversity. However, despite its broader coverage, the shotgun method is expensive compared to amplicon sequencing and requires significant computing power. Shotgun data analysis is also prone to false species identifications, depending on the bioinformatics tools used (218, 248, 249).
Studies comparing the sequencing results of these technologies reveal the following patterns. Dominant bacterial phyla are reliably identified by both amplicon and shotgun sequencing. However, shotgun sequencing reveals a higher percentage of new domains compared to amplicon sequencing (154, 250).
16S rRNA gene amplicon sequencing usually provides conservative estimates of Shannon diversity. Amplification of specific variable regions of the 16S rRNA gene provides good taxonomic resolution, but PCR biases and limited region selection (often V3-V4) can underestimate actual diversity and prevent the detection of rare genera. Shotgun metagenomic sequencing allows for a more complete assessment of alpha diversity because it captures all genes rather than just rRNA genes, including those of rare species. This leads to higher Shannon and beta diversity estimates and better detection of low-abundance microorganisms. Ravi Ranjan et al. (238) demonstrate that shotgun metagenomics reveals richer communities with a more even distribution. That is, compared to 16S rRNA gene sequencing, shotgun metagenomic studies allow a greater diversity of soil microorganisms to be identified, especially when the study involves low-abundance species, while amplicon sequencing remains more cost-effective and its data analysis pipelines are simpler.
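For reference, the Shannon index for a sample with taxon proportions p_i is H = -Σ p_i ln p_i, and Bray-Curtis dissimilarity between two samples with counts a_i and b_i is Σ|a_i - b_i| / Σ(a_i + b_i). A minimal Python sketch of both metrics follows; the count vectors are invented, and established packages would normally be used in practice.

```python
import math


def shannon(counts: list[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: list[int], b: list[int]) -> float:
    """Bray-Curtis dissimilarity between two samples with the same taxon order."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Invented counts: the same four taxa profiled by two hypothetical methods.
amplicon = [90, 10, 0, 0]   # two rare taxa are missed
shotgun = [70, 10, 10, 10]  # rare taxa detected, distribution more even
print(shannon(amplicon), shannon(shotgun))
print(bray_curtis(amplicon, shotgun))
```

In this invented example the profile that detects the two rare taxa has the higher Shannon index, which mirrors the pattern described above: missing low-abundance taxa lowers both richness and evenness.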
A comparison of Illumina technologies with long-read technologies also showed that Shannon diversity according to Illumina is often lower compared to PacBio or ONT methods (220, 226).
Szoboszlay et al. (230) showed that differences between Nanopore and Illumina sequencing in the inferred structure of the fecal bacterial community were significant but small compared to the variation between samples. The results show that Nanopore is preferable for sequencing 16S rRNA gene amplicons when accurate taxonomic identification at the genus level, study of rare taxa, or detailed assessment of diversity is required. At the same time, Illumina 16S rRNA gene sequencing is suitable for communities with a large number of unknown genera and for studies where it is important to determine amplicon sequence variants.
Thanks to their ability to read the entire 16S rRNA gene, PacBio and ONT long-read technologies provide increased accuracy in taxon identification (16). PacBio and ONT typically demonstrate higher Shannon diversity values than Illumina, which is associated with better detection of closely related and rare genera, but the two long-read technologies also differ from each other.
The PacBio platform, using a circular consensus sequencing (CCS) model, is capable of providing high-resolution genus-level identification with an accuracy exceeding 99.9% by repeatedly reading the same DNA molecule (16, 98). In turn, ONT has advantages such as portability and cost-effectiveness, but for the detection of low-abundance genera ONT requires higher coverage, which brings the cost of ONT close to that of PacBio technology (226). ONT sensitivity and accuracy are currently inferior to those of PacBio (251, 252), but recent improvements to ONT, including the latest reagent kits, R10.4.1 flow cells with a dual reader head, new basecalling algorithms, and error correction tools based on machine learning and deep learning, improve read accuracy and taxonomic resolution (101–103, 235).
In studies comparing long-read sequencing, the authors also emphasize the importance of selecting the optimal number of reads to fully capture microbial diversity. Comparisons of technologies show that at low sequencing depths, significant differences in diversity metrics can be observed between ONT and PacBio technologies, and increasing the sequencing depth leads to more comparable results when using different technologies (226, 253–255).
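The effect of sequencing depth on diversity estimates can be illustrated by rarefaction, that is, randomly subsampling reads to a fixed depth. The sketch below is a simplified illustration under stated assumptions: the community counts are hypothetical, the helper names are invented for this example, and real analyses would normally rely on established tools rather than this code.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement; return per-taxon counts."""
    pool = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the total number of reads")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    subsample = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        subsample[taxon] += 1
    return subsample

def observed_richness(counts):
    """Number of taxa represented by at least one read."""
    return sum(1 for c in counts if c > 0)

# Hypothetical community with a few dominant and several rare genera.
community = [5000, 3000, 1500, 400, 80, 15, 4, 1]

for depth in (100, 1000, 9000):
    print(depth, observed_richness(rarefy(community, depth)))
```

At shallow depths the rare genera are likely to be missed, so richness and related metrics are underestimated; as depth approaches the full read count, the subsample converges on the complete profile.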
5 New trends in metagenomics
Current trends in metagenomics include genome-resolved metagenomics, an approach in which individual genomes are reconstructed from metagenomic data as metagenome-assembled genomes (MAGs). This allows the structure and functions of communities to be studied at the level of individual organisms without isolating them in culture, which is particularly relevant given the small percentage of soil microorganisms that can be cultured. Álvaro Rodríguez del Río et al. (256) recovered 742 mostly unknown bacterial and 1865 viral MAGs in their study and found that multiple factors (warming, drought, nitrogen deposition, salinity, etc.) exert selective pressure on soil prokaryotes and viruses that is not observed at the level of individual factors. Thanks to the ability to study MAGs, many uncultivable microorganisms have been identified, and it has become possible to link these microorganisms to global biogeochemical cycles occurring in the soil: the nitrogen, carbon, and sulfur cycles (257–260). MAG analysis also makes it possible to examine how the composition of the soil microbiome depends on precipitation (261) and to discover new bacterial genera in specific soils (262).
Hybrid metagenomics is a separate area that is gaining popularity. The use of hybrid metagenomics allows for more accurate MAG assembly, binning, and annotation. The use of short reads and HiFi long reads (HiFi-LR) contributes to an increase in the number and quality of recovered MAGs. High-depth short-read data improve the binning of HiFi-LR contigs (15, 263). Long-read sequencing combined with metatranscriptomic analysis provides direct insight into the functional dynamics of microbial biosynthetic gene clusters in ecological processes (264).
Metagenomics allows the creation of functional “maps” of soil microbiomes, revealing the relationships between taxonomy, genes, and ecosystem functions. Through metagenomic analysis, Zichen Huang et al. (265) identified functional genes in the microbiome that correlate positively or negatively with soil factors. Functional genes associated with carbon fixation and carbon degradation correlated positively with soil organic carbon (SOC) and total nitrogen content, while methane metabolism genes showed an inverse correlation. Genes associated with nitrogen degradation and denitrification correlated positively with SOC content and negatively with nitrate nitrogen levels, while nitrification genes showed the opposite trend. Metagenomic analysis also allows the detection of changes in genes involved in soil nitrogen and phosphorus cycling and their interaction in response to different fertilizers (266). Analysis of the soil microbiome under three different deciduous tree species showed that, among the three, Castanea dentata forms a distinct soil microbiome, both functionally and taxonomically, with overall suppression of functional genes in the nitrification, denitrification, and nitrate reduction pathways. Overall, tree species can mediate the abundance of key microbial genes involved in nitrogen and (to a lesser extent) carbon metabolism pathways in soil (267). Thus, metagenomics helps to establish links between the soil microbiome and its functional genes that respond to environmental stimuli and conditions.
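Gene–soil factor relationships of this kind are typically summarized with a correlation coefficient. The sketch below computes Spearman's rank correlation in plain Python; the choice of Spearman is an assumption made for illustration (the statistic used in the cited study is not specified here), and the SOC and gene-abundance values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values for five soil samples (not data from any cited study).
soc = [8.1, 12.4, 15.0, 21.3, 26.7]           # soil organic carbon, g/kg
carbon_fixation = [110, 150, 145, 210, 260]   # normalized gene abundance
methane_metabolism = [95, 80, 70, 40, 35]     # normalized gene abundance

rho_fix = spearman_rho(soc, carbon_fixation)
rho_ch4 = spearman_rho(soc, methane_metabolism)
print(f"carbon fixation vs SOC: {rho_fix:.2f}")
print(f"methane metabolism vs SOC: {rho_ch4:.2f}")
```

With these toy values, the carbon fixation genes track SOC positively and the methane metabolism genes negatively, mirroring the direction of the relationships described above.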
The use of multi-omics approaches to study soil ecosystems is also gaining popularity. It is a cutting-edge approach for a comprehensive understanding of the functioning of soil microbiomes. A total of 880 samples from the global Earth Microbiome Project were sequenced using amplicon (16S, 18S, ITS) and shotgun methods and subjected to metabolomic analysis (liquid chromatography-tandem mass spectrometry and gas chromatography-mass spectrometry). This resulted in a global database of microbial and metabolite associations and established correlations between specific metabolites and taxonomic groups in the soil (268). The use of metagenomic sequencing and metabolomic research also contributes to the creation of recommendations for the cultivation of medicinal herbs. Chengcheng Liu et al. (269) found that the relative abundance of beneficial microorganisms decreased by the third year of Fritillaria unibracteata cultivation, while the relative abundance of harmful microorganisms showed an upward trend. Metabolomics results showed significant changes in the composition of metabolites in the rhizosphere soil, and the relative abundance of some beneficial metabolites (e.g., myristic acid and oleic acid) showed a downward trend. The combined use of metagenomic and metatranscriptomic studies allows us to study the relationship between biogeochemical cycles, in particular between the nitrogen and carbon cycles. Peter F. Chuckran et al. (270) found that the addition of glucose rapidly increased the transcription of genes encoding ammonium and nitrate transporters, enzymes responsible for nitrogen assimilation, and genes associated with the nitrogen regulatory network. Soil microbial communities can therefore respond rapidly to changes in carbon availability by radically altering the transcription of genes that control the nitrogen cycle.
Multi-omics approaches also facilitate the study of interactions between microorganisms and metabolites to better understand the metabolic processes that lead to greenhouse gas emissions during ecosystem changes or predict biogeochemical changes (47, 271).
6 Conclusions
This review underscores the crucial role of sequencing technologies in soil microbiome research and their significance for unraveling intricate biological processes within soil ecosystems. The article provides a detailed overview of widely used sequencing methods, including 16S rRNA gene sequencing, ITS sequencing and shotgun sequencing on modern platforms (Illumina, PacBio, ONT), as well as their successful application for soil microbiome analysis and the study of other ecosystems. In addition to sequencing technologies, this review discusses various bioinformatics tools for analyzing sequencing data, including artificial intelligence-based tools, and current trends in metagenomics. The review also includes comparative studies that identify the main advantages and limitations of each sequencing technology, allowing researchers to make more informed decisions about which method to use based on the goals and resources of the project. Recognizing the advantages and disadvantages of various sequencing technologies enhances our ability to study microbial communities and their influence on agroecology. Comparative data are essential for evaluating the sensitivity and accuracy of genus- and species-level identification, as well as functional relevance. Such an informed approach can help harness the potential of the soil microbiome and lead to innovative solutions.
Author contributions
DR: Conceptualization, Visualization, Writing – original draft, Writing – review & editing. MB: Visualization, Writing – original draft, Writing – review & editing. LS: Writing – original draft, Writing – review & editing. NM: Conceptualization, Writing – original draft, Writing – review & editing. AV: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing.
Funding
The author(s) declared financial support was received for this work and/or its publication. This publication has been supported by the RUDN University Scientific Projects Grant System, Project No. 202760-2-000.
Acknowledgments
During the preparation of this manuscript, the authors used the Arabidopsis_whole_plant_hydroponics_versus_soil icon (https://bioicons.com/?query=soil) by Frédéric Bouché (https://figshare.com/authors/Plant_Illustrations/3773596), licensed under CC-BY 4.0 Unported (https://creativecommons.org/licenses/by/4.0/), for the purposes of visualization. The authors have reviewed and edited the output and take full responsibility for the content of this publication. The authors express gratitude to the Association of Specialists in the Field of Molecular, Cellular and Synthetic Biology (Russia) for their efforts in uniting specialists involved in this study.
Conflict of interest
The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Anikwe M and Ife K. The role of soil ecosystem services in the circular bioeconomy. Front Soil Sci. (2023) 3:1209100. doi: 10.3389/fsoil.2023.1209100
2. Hima Parvathy A, Santhoshkumar R, and Soniya EV. Next-generation sequencing-based comparative mapping and culture-based screening of bacterial rhizobiome in phytophthora capsici-resistant and susceptible piper species. Front Microbiol. (2024) 15:1458454. doi: 10.3389/fmicb.2024.1458454
3. Borza T, Lumactud RA, Shim SY, Al-Mughrabi K, and Prithiviraj B. Microbial community composition associated with potato plants displaying early dying syndrome. Microorganisms. (2025) 13:1482. doi: 10.3390/microorganisms13071482
4. Hartmann M and Six J. Soil structure and microbiome functions in agroecosystems. Nat Rev Earth Environ. (2023) 4:4–18. doi: 10.1038/s43017-022-00366-w
5. Banerjee S and van der Heijden MGA. Soil microbiomes and one health. Nat Rev Microbiol. (2023) 21:6–20. doi: 10.1038/s41579-022-00779-w
6. Yu T, Cheng L, Liu Q, Wang S, Zhou Y, Zhong H, et al. Effects of waterlogging on soybean rhizosphere bacterial community using V4, loopSeq, and pacBio 16S rRNA sequence. Microbiol Spectr. (2022) 10:e0201121. doi: 10.1128/spectrum.02011-21
7. Salam LB, Obayori OS, Ilori MO, and Amund OO. Chromium contamination accentuates changes in the microbiome and heavy metal resistome of a tropical agricultural soil. World J Microbiol Biotechnol. (2023) 39:228. doi: 10.1007/s11274-023-03681-6
8. Jiao J-Y, Liu L, Hua Z-S, Fang B-Z, Zhou E-M, Salam N, et al. Microbial dark matter coming to light: challenges and opportunities. Natl Sci Rev. (2021) 8:nwaa280. doi: 10.1093/nsr/nwaa280
9. Omotayo OP, Igiehon ON, and Babalola OO. Microbial genes of agricultural importance in maize rhizosphere unveiled through shotgun metagenomics. Spanish J Soil Sci. (2022) 12:10427. doi: 10.3389/sjss.2022.10427
10. Prosser JI. Dispersing misconceptions and identifying opportunities for the use of “omics” in soil microbial ecology. Nat Rev Microbiol. (2015) 13:439–46. doi: 10.1038/nrmicro3468
11. Shendure J and Ji H. Next-generation DNA sequencing. Nat Biotechnol. (2008) 26:1135–45. doi: 10.1038/nbt1486
12. Cao Y, Fanning S, Proos S, Jordan K, and Srikumar S. A review on the applications of next generation sequencing technologies as applied to food-related microbiome studies. Front Microbiol. (2017) 8:1829. doi: 10.3389/fmicb.2017.01829
13. Satam H, Joshi K, Mangrolia U, Waghoo S, Zaidi G, Rawool S, et al. Next-generation sequencing technology: current trends and advancements. Biology. (2023) 12:997. doi: 10.3390/biology12070997
14. Tafazoli A, Hemmati M, Rafigh M, Alimardani M, Khaghani F, Korostyński M, et al. Leveraging long-read sequencing technologies for pharmacogenomic testing: applications, analytical strategies, challenges, and future perspectives. Front Genet. (2025) 16:1435416. doi: 10.3389/fgene.2025.1435416
15. Belliardo C, Maurice N, Pere A, Mondy S, Franc A, Bailly-Bechet M, et al. Accurate MAG reconstruction from complex soil microbiome through combined short- and hiFi long-reads metagenomics. bioRxiv. (2025). doi: 10.1101/2025.09.12.675765. Preprint.
16. Johnson JS, Spakowicz DJ, Hong B-Y, Petersen LM, Demkowicz P, Chen L, et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat Commun. (2019) 10:5029. doi: 10.1038/s41467-019-13036-1
17. Li X, Kong P, Daughtrey M, Kosta K, Schirmer S, Howle M, et al. Characterization of the soil bacterial community from selected boxwood gardens across the United States. Microorganisms. (2022) 10:1514. doi: 10.3390/microorganisms10081514
18. Vishal V, Munda SS, Singh G, and Lal S. Cataloguing the bacterial diversity in the active ectomycorrhizal zone of astraeus from a dry deciduous forest of shorea. Biodivers Data J. (2021) 9:e63086. doi: 10.3897/BDJ.9.e63086
19. Kauserud H. ITS alchemy: on the use of ITS as a DNA marker in fungal ecology. Fungal Ecol. (2023) 65:101274. doi: 10.1016/j.funeco.2023.101274
20. Cobos M, Estela SL, Rodríguez HN, Castro CG, Grandez M, and Castro JC. Soil microbial diversity and functional profiling of a tropical rainforest of a highly dissected low hill from the upper itaya river basin revealed by analysis of shotgun metagenomics sequencing data. Data Brief. (2022) 42:108205. doi: 10.1016/j.dib.2022.108205
21. Kifle BA, Sime AM, Gemeda MT, and Woldesemayat AA. Shotgun metagenomic insights into secondary metabolite biosynthetic gene clusters reveal taxonomic and functional profiles of microbiomes in natural farmland soil. Sci Rep. (2024) 14:15096. doi: 10.1038/s41598-024-63254-x
22. Herazo-Álvarez J, Mora M, Cuadros-Orellana S, Vilches-Ponce K, and Hernández-García R. A review of neural networks for metagenomic binning. Briefings Bioinf. (2025) 26:bbaf065. doi: 10.1093/bib/bbaf065
23. Liang Q, Bible PW, Liu Y, Zou B, and Wei L. DeepMicrobes: taxonomic classification for metagenomics with deep learning. NAR Genomics Bioinf. (2020) 2:lqaa009. doi: 10.1093/nargab/lqaa009
24. Antonielli L, Pfeiffer S, and Sessitsch A. Microbial community prediction in plant-soil systems using machine learning. (2024). Available online at: https://ai4soilhealth.eu/event/international-conference-artificial-intelligence-for-soil-health/
25. Liu M, Li Y, and Li H. Deep learning to predict the biosynthetic gene clusters in bacterial genomes. J Mol Biol. (2022) 434:167597. doi: 10.1016/j.jmb.2022.167597
26. Villafuerte AB, Soria R, Rodríguez-Berbel N, Zema DA, Lucas-Borja ME, Ortega R, et al. Short-term evaluation of soil physical, chemical and biochemical properties in an abandoned cropland treated with different soil organic amendments under semiarid conditions. J Environ Manage. (2024) 349:119372. doi: 10.1016/j.jenvman.2023.119372
27. Malone Z, Berhe AA, and Ryals R. Impacts of organic matter amendments on urban soil carbon and soil quality: A meta-analysis. J Cleaner Product. (2023) 419:138148. doi: 10.1016/j.jclepro.2023.138148
28. Aasfar A, Bargaz A, Yaakoubi K, Hilali A, Bennis I, Zeroual Y, et al. Nitrogen fixing azotobacter species as potential soil biological enhancers for crop nutrition and yield stability. Front Microbiol. (2021) 12:628379. doi: 10.3389/fmicb.2021.628379
29. Mariotti L, Scartazza A, Curadi M, Picciarelli P, and Toffanin A. Azospirillum baldaniorum sp245 induces physiological responses to alleviate the adverse effects of drought stress in purple basil. Plants (Basel Switzerland). (2021) 10:1141. doi: 10.3390/plants10061141
30. Souza MST, de Baura VA, Santos SA, Fernandes-Júnior PI, Reis Junior FB, Marques MR, et al. Azospirillum spp. from native forage grasses in Brazilian pantanal floodplain: biodiversity and plant growth promotion potential. World J Microbiol Biotechnol. (2017) 33:81. doi: 10.1007/s11274-017-2251-4
31. Bhatti AA, Haq S, and Bhat RA. Actinomycetes benefaction role in soil and plant health. Microbial Pathogene. (2017) 111:458–67. doi: 10.1016/j.micpath.2017.09.036
32. Adenan NH and Ting AS-Y. Actinobacteria from soils and their applications in environmental bioremediation. In Microbial Biotechnology eds Chowdhary P, Mani S, and Chaturvedi P (2022). doi: 10.1002/9781119834489.ch16
33. Yuvaraj M and Ramasamy M. “Role of Fungi in Agriculture,” in Biostimulants in Plant Science, eds. Mirmajlessi SM and Radhakrishnan R (London: IntechOpen). (2020). doi: 10.5772/intechopen.89718
34. Daunoras J, Kačergius A, and Gudiukaitė R. Role of soil microbiota enzymes in soil health and activity changes depending on climate change and the type of soil ecosystem. Biology. (2024) 13:85. doi: 10.3390/biology13020085
35. Richter A, Huallacháin DÓ, Doyle E, Clipson N, Van Leeuwen JP, Heuvelink GB, et al. Linking diagnostic features to soil microbial biomass and respiration in agricultural grassland soil: A large-scale study in Ireland. Eur J Soil Sci. (2018) 69:414–28. doi: 10.1111/ejss.12551
36. Ghosh A, Singh AB, Kumar RV, Manna MC, Bhattacharyya R, Rahman MM, et al. Soil enzymes and microbial elemental stoichiometry as bio-indicators of soil quality in diverse cropping systems and nutrient management practices of Indian vertisols. Appl Soil Ecol. (2020) 145:103304. doi: 10.1016/j.apsoil.2019.06.007
37. Urra J, Alkorta I, Lanzén A, Mijangos I, and Garbisu C. The application of fresh and composted horse and chicken manure affects soil quality, microbial composition and antibiotic resistance. Appl Soil Ecol. (2019) 135:73–84. doi: 10.1016/j.apsoil.2018.11.005
38. Kabiri V, Raiesi F, and Ghazavi MA. Tillage effects on soil microbial biomass, SOM mineralization and enzyme activity in a semi-arid calcixerepts. Agricult Ecosyst Environ. (2016) 232:73–84. doi: 10.1016/j.agee.2016.07.022
39. Michael S, Nannipieri P, Sørensen SJ, and van Elsas JD. Microbial indicators for soil quality. Biol Fertil Soils. (2018) 54:1–10. doi: 10.1007/s00374-017-1248-3
40. Nunes JS, Araujo ASF, Nunes LAPL, Lima LM, Carneiro RFV, Salviano AAC, et al. Impact of land degradation on soil microbial biomass and activity in northeast Brazil. Pedosphere. (2012) 22:88–95. doi: 10.1016/S1002-0160(11)60194-X
41. Kumar A, Maurya BR, and Raghuwanshi R. Isolation and characterization of PGPR and their effect on growth, yield and nutrient content in wheat (Triticum aestivum L.). Biocatalysis Agric Biotechnol. (2014) 3:121–28. doi: 10.1016/j.bcab.2014.08.003
42. Singh JS, Kumar A, Rai AN, and Singh DP. Cyanobacteria: A precious bio-resource in agriculture, ecosystem, and environmental sustainability. Front Microbiol. (2016) 7:529. doi: 10.3389/fmicb.2016.00529
43. Vimal SR, Singh JS, Arora NK, and Singh S. Soil-plant-microbe interactions in stressed agriculture management: A review. Pedosphere. (2017) 27:177–92. doi: 10.1016/S1002-0160(17)60309-6
44. Jia J, de Goede R, Li Y, Zhang J, Wang G, Zhang J, et al. Unlocking soil health: are microbial functional genes effective indicators? Soil Biol Biochem. (2025) 204:109768. doi: 10.1016/j.soilbio.2025.109768
45. Beattie GA, Edlund A, Esiobu N, Gilbert J, Nicolaisen MH, Jansson JK, et al. Soil microbiome interventions for carbon sequestration and climate mitigation. mSystems. (2025) 10:e0112924. doi: 10.1128/msystems.01129-24
46. Jin W, Zhang Y, Su X, Xie Z, Wang R, Wang Y, et al. Effects of different land use on functional genes of soil microbial carbon and phosphorus cycles in the desert steppe zone of the loess plateau. BMC Microbiol. (2025) 25:607. doi: 10.1186/s12866-025-04305-9
47. Freire-Zapata V, Holland-Moritz H, Cronin DR, Aroney S, Smith DA, Wilson RM, et al. Microbiome–metabolite linkages drive greenhouse gas dynamics over a permafrost thaw gradient. Nat Microbiol. (2024) 9:2892–908. doi: 10.1038/s41564-024-01800-z
48. Belcour A, Megy L, Stephant S, Michel C, Rad S, Bombach P, et al. Predicting coarse-grained representations of biogeochemical cycles from metabarcoding data. Bioinformatics. (2025) 41:i49–57. doi: 10.1093/bioinformatics/btaf230
49. Mishra AK, Yadav P, Sharma S, and Maurya P. Comparison of microbial diversity and community structure in soils managed with organic and chemical fertilization strategies using amplicon sequencing of 16 s and ITS regions. Front Microbiol. (2025) 15:1444903. doi: 10.3389/fmicb.2024.1444903
50. Mo Y, Bier R, Li X, Daniels M, Smith A, Yu L, et al. Agricultural practices influence soil microbiome assembly and interactions at different depths identified by machine learning. Commun Biol. (2024) 7:1349. doi: 10.1038/s42003-024-07059-8
51. Peddle SD, Hodgson RJ, Borrett RJ, Brachmann S, Davies TC, Erickson TE, et al. Practical applications of soil microbiota to improve ecosystem restoration: current knowledge and future directions. Biol Rev Cambridge Philos Soc. (2025) 100:1–18. doi: 10.1111/brv.13124
52. Neale D, Cullen L, and Ranout AS. Improving soil health in the UK: why a microbial approach is indispensable in attaining sustainable soils. Sustain Microbiol. (2024) 1:qvae026. doi: 10.1093/sumbio/qvae026
53. Romillac N and Santorufo L. Transferring concepts from plant to microbial ecology: A framework proposal to identify relevant bacterial functional traits. Soil Biol Biochem. (2021) 162:108415. doi: 10.1016/j.soilbio.2021.108415
54. Gehrig JL, Portik DM, Driscoll MD, Jackson E, Chakraborty S, Gratalo D, et al. Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data. Microbial Genomics. (2022) 8:794. doi: 10.1099/mgen.0.000794
55. Gunjal A, Gupta S, Nweze JE, and Nweze JA. Chapter 4 - metagenomics in bioremediation: recent advances, challenges, and perspectives. In: Kumar V, Bilal M, Shahi SK, and Garg VK, editors. Metagenomics to bioremediation. Academic Press (2023). Developments in Applied Microbiology and Biotechnology. doi: 10.1016/B978-0-323-96113-4.00018-4
56. Vasudeva K, Kaur P, and Munshi A. Chapter 28 - high-throughput sequencing technologies in metagenomics. In: Kumar V, Bilal M, Shahi SK, and Garg VK, editors. Metagenomics to bioremediation. Academic Press (2023). Developments in Applied Microbiology and Biotechnology. doi: 10.1016/B978-0-323-96113-4.00005-6
57. Maljkovic Berry I, Melendrez MC, Bishop-Lilly KA, Rutvisuttinunt W, Pollett S, Talundzic E, et al. Next generation sequencing and bioinformatics methodologies for infectious disease research and public health: approaches, applications, and considerations for development of laboratory capacity. J Infect Dis. (2020) 221:S292–307. doi: 10.1093/infdis/jiz286
58. Sotomayor-Mena RG and Rios-Velazquez C. Soil microbiome dataset from guanica dry forest in Puerto Rico generated by shotgun sequencing. Data Brief. (2020) 28:104919. doi: 10.1016/j.dib.2019.104919
59. Galisteo C, Puente-Sánchez F, de la Haba RR, Bertilsson S, Sánchez-Porro C, and Ventosa A. Metagenomic insights into the prokaryotic communities of heavy metal-contaminated hypersaline soils. Sci Total Environ. (2024) 951:175497. doi: 10.1016/j.scitotenv.2024.175497
60. Enagbonma BJ, Amoo AE, and Babalola OO. Deciphering the microbiota data from termite mound soil in South Africa using shotgun metagenomics. Data Brief. (2020) 28:104802. doi: 10.1016/j.dib.2019.104802
61. Galanova OO, Mitkin NA, Danilova AA, Pavshintsev VV, Tsybizov DA, Zakharenko AM, et al. Assessment of soil health through metagenomic analysis of bacterial diversity in Russian black soil. Microorganisms. (2025) 13:854. doi: 10.3390/microorganisms13040854
62. Muscatt G, Cook R, Millard A, Bending GD, and Jameson E. Viral metagenomics reveals diverse virus-host interactions throughout the soil depth profile. mBio. (2023) 14:e02246-23. doi: 10.1128/mbio.02246-23
63. Wu Y, Sun X-R, Pritchard HW, Shen Y-B, Wu X-Q, and Peng C-Y. The metagenomics of soil bacteria and fungi and the release of mechanical dormancy in hard seeds. Front Plant Sci. (2023) 14:1187614. doi: 10.3389/fpls.2023.1187614
64. Xu R, Zhang M, Lin H, Gao P, Yang Z, Wang D, et al. Response of soil protozoa to acid mine drainage in a contaminated terrace. J Hazardous Mater. (2022) 421:126790. doi: 10.1016/j.jhazmat.2021.126790
65. Patova E, Novakovskaya I, Gusev E, and Martynenko N. Diversity of cyanobacteria and algae in biological soil crusts of the northern ural mountain region assessed through morphological and metabarcoding approaches. Diversity. (2023) 15:1080. doi: 10.3390/d15101080
66. Wei Z-G, Zhang X-D, Cao M, Liu F, Qian Y, and Zhang S-W. Comparison of methods for picking the operational taxonomic units from amplicon sequences. Front Microbiol. (2021) 12:644012. doi: 10.3389/fmicb.2021.644012
67. Tkacz A, Hortala M, and Poole PS. Absolute quantitation of microbiota abundance in environmental samples. Microbiome. (2018) 6:110. doi: 10.1186/s40168-018-0491-7
68. Quince C, Walker AW, Simpson JT, Loman NJ, and Segata N. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. (2017) 35:833–44. doi: 10.1038/nbt.3935
69. Yun J, Crombie AT, Ul Haque MF, Cai Y, Zheng X, Wang J, et al. Revealing the community and metabolic potential of active methanotrophs by targeted metagenomics in the zoige wetland of the tibetan plateau. Environ Microbiol. (2021) 23:6520–35. doi: 10.1111/1462-2920.15697
70. Alsammar HF, Naseeb S, Brancia LB, Tucker Gilman R, Wang P, and Delneri D. Targeted metagenomics approach to capture the biodiversity of saccharomyces genus in wild environments. Environ Microbiol Rep. (2019) 11:206–14. doi: 10.1111/1758-2229.12724
71. Schloss PD and Handelsman J. Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol. (2005) 71:1501–6. doi: 10.1128/AEM.71.3.1501-1506.2005
72. Walters KE and Martiny JBH. Alpha-, beta-, and gamma-diversity of bacteria varies across habitats. PloS One. (2020) 15:e0233872. doi: 10.1371/journal.pone.0233872
73. Fukuda K, Ogawa M, Taniguchi H, and Saito M. Molecular approaches to studying microbial communities: targeting the 16S ribosomal RNA gene. J UOEH. (2016) 38:223–32. doi: 10.7888/juoeh.38.223
74. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. (2012) 486:207–14. doi: 10.1038/nature11234
75. Pradhan SK, Singh NR, Kumar U, Mishra SR, Perumal RC, Benny J, et al. Illumina miSeq based assessment of bacterial community structure and diversity along the heavy metal concentration gradient in sukinda chromite mine area soils, India. Ecol Genet Genomics. (2020) 15:100054. doi: 10.1016/j.egg.2020.100054
76. Setia G, Schlub R, and Husseneder C. Next-generation sequencing dataset of bacterial communities of microcerotermes crassus workers associated with ironwood trees (Casuarina equisetifolia) in Guam. Data Brief. (2023) 48:109286. doi: 10.1016/j.dib.2023.109286
77. Fierer N, Leff JW, Adams BJ, Nielsen UN, Bates ST, Lauber CL, et al. Cross-biome metagenomic analyses of soil microbial communities and their functional attributes. Proc Natl Acad Sci United States America. (2012) 109:21390–95. doi: 10.1073/pnas.1215210110
78. Behr JH, Kuhl-Nagel T, Sommermann L, Moradtalab N, Chowdhury SP, Schloter M, et al. Long-term conservation tillage with reduced nitrogen fertilization intensity can improve winter wheat health via positive plant-microorganism feedback in the rhizosphere. FEMS Microbiol Ecol. (2024) 100:fiae003. doi: 10.1093/femsec/fiae003
79. Soriano-Lerma A, Pérez-Carrasco V, Sánchez-Marañón M, Ortiz-González M, Sánchez-Martín V, Gijón J, et al. Influence of 16S rRNA target region on the outcome of microbiome studies in soil and saliva samples. Sci Rep. (2020) 10:13637. doi: 10.1038/s41598-020-70141-8
80. Meera Krishna B, Khan MA, and Khan ST. Next-generation sequencing (NGS) platforms: an exciting era of genome sequence analysis. In: Tripathi V, Kumar P, Tripathi P, Kishore A, and Kamle M, editors. Microbial Genomics in Sustainable Agroecosystems: Volume 2. Singapore: Springer Singapore (2019). doi: 10.1007/978-981-32-9860-6_6
81. Nkongolo KK and Narendrula-Kotha R. Advances in monitoring soil microbial community dynamic and function. J Appl Genet. (2020) 61:249–63. doi: 10.1007/s13353-020-00549-5
82. de Melo Pereira GV, de Carvalho Neto DP, Maske BL, De Dea Lindner J, Vale AS, Favero GR, et al. An Updated Review on Bacterial Community Composition of Traditional Fermented Milk Products: What next-Generation Sequencing Has Revealed so Far? Crit Rev Food Sci Nutr. (2022) 62:1870–89. doi: 10.1080/10408398.2020.1848787
83. Mangse G, Werner D, Meynet P, and Ogbaga CC. Microbial community responses to different volatile petroleum hydrocarbon class mixtures in an aerobic sandy soil. Environ pollut. (2020) 264:114738. doi: 10.1016/j.envpol.2020.114738
84. Koryachenko O, Girsowicz R, Dekel Y, Doniger T, and Steinberger Y. Sedimentary marl mudstone as a substrate in a xeric environment revealed by microbiome analysis. Extremophiles: Life Under Extreme Conditions. (2019) 23:337–46. doi: 10.1007/s00792-019-01087-7
85. Cardinale M, Suarez C, Steffens D, Ratering S, and Schnell S. Effect of different soil phosphate sources on the active bacterial microbiota is greater in the rhizosphere than in the endorhiza of barley (Hordeum vulgare L.). Microbial Ecol. (2019) 77:689–700. doi: 10.1007/s00248-018-1264-3
86. Azaroual SE, Kasmi Y, Aasfar A, El Arroussi H, Zeroual Y, El Kadiri Y, et al. Investigation of bacterial diversity using 16S rRNA sequencing and prediction of its functionalities in moroccan phosphate mine ecosystem. Sci Rep. (2022) 12:3741. doi: 10.1038/s41598-022-07765-5
87. Stoddard SF, Smith BJ, Hein R, Roller BRK, and Schmidt TM. rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Res. (2015) 43:D593–598. doi: 10.1093/nar/gku1201
88. Pei AY, Oberdorf WE, Nossa CW, Agarwal A, Chokshi P, Gerz EA, et al. Diversity of 16S rRNA Genes within Individual Prokaryotic Genomes. Appl Environ Microbiol. (2010) 76:3886–97. doi: 10.1128/AEM.02953-09
89. Bleidorn C. Third generation sequencing: technology and its potential impact on evolutionary biodiversity research. Syst Biodivers. (2016) 14:1–8. doi: 10.1080/14772000.2015.1099575
90. Du Z, Sun L, Lin Y, Yang F, and Cai Y. Using pacBio SMRT sequencing technology and metabolomics to explore the microbiota-metabolome interaction related to silage fermentation of woody plant. Front Microbiol. (2022) 13:857431. doi: 10.3389/fmicb.2022.857431
91. Chun S-J, Cui Y, Kim J, Lee J-W, Han SM, and Nam K-H. Ecological impacts of sunflowers on soil microbial communities: insights from full-length 16S rRNA sequencing. Curr Microbiol. (2025) 82:297. doi: 10.1007/s00284-025-04273-3
92. Byers AK, Waipara N, Condron L, and Black A. The impacts of ecological disturbances on the diversity of biosynthetic gene clusters in kauri (Agathis australis) soil. Environ Microbiome. (2024) 19:69. doi: 10.1186/s40793-024-00613-1
93. Estrada R, Porras T, Romero Y, Pérez WE, Vilcara EA, Cruz J, et al. Soil depth and physicochemical properties influence microbial dynamics in the rhizosphere of two Peruvian superfood trees, cherimoya and lucuma, as shown by pacBio-hiFi sequencing. Sci Rep. (2024) 14:19508. doi: 10.1038/s41598-024-69945-9
94. Chavan S, Sarangdhar V, and Vigneshwaran N. Nanopore-based metagenomic analysis of the impact of nanoparticles on soil microbial communities. Heliyon. (2022) 8:e09693. doi: 10.1016/j.heliyon.2022.e09693
95. Cruz-Silva A, Laureano G, Pereira M, Dias R, Silva JM da, Oliveira N, et al. A new perspective for vineyard terroir identity: looking for microbial indicator species by long read nanopore sequencing. Microorganisms. (2023) 11:672. doi: 10.3390/microorganisms11030672
96. LeBlanc N. Green manures alter taxonomic and functional characteristics of soil bacterial communities. Microbial Ecol. (2023) 85:684–97. doi: 10.1007/s00248-022-01975-0
97. Ardui S, Ameur A, Vermeesch JR, and Hestand MS. Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res. (2018) 46:2159–68. doi: 10.1093/nar/gky066
98. Garg D, Patel N, Rawat A, and Rosado AS. Cutting edge tools in the field of soil microbiology. Curr Res Microbial Sci. (2024) 6:100226. doi: 10.1016/j.crmicr.2024.100226
99. Brinkerhoff H, Kang ASW, Liu J, Aksimentiev A, and Dekker C. Multiple rereads of single proteins at single-amino acid resolution using nanopores. Science. (2021) 374:1509–13. doi: 10.1126/science.abl4381
100. Tyler AD, Mataseje L, Urfano CJ, Schmidt L, Antonation KS, Mulvey MR, et al. Evaluation of Oxford Nanopore’s MinION sequencing device for microbial whole genome sequencing applications. Sci Rep. (2018) 8:10931. doi: 10.1038/s41598-018-29334-5
101. Zhang T, Li H, Ma S, Cao J, Liao H, Huang Q, et al. The newest Oxford Nanopore R10.4.1 full-length 16S rRNA sequencing enables the accurate resolution of species-level microbial community profiling. Appl Environ Microbiol. (2023) 89:e0060523. doi: 10.1128/aem.00605-23
102. Liu Y, Li Y, Chen E, Xu J, Zhang W, Zeng X, et al. Repeat and haplotype aware error correction in nanopore sequencing reads with deChat. Commun Biol. (2024) 7:1678. doi: 10.1038/s42003-024-07376-y
103. Curry KD, Wang Q, Nute MG, Tyshaieva A, Reeves E, Soriano S, et al. Emu: species-level microbial community profiling of full-length 16S rRNA Oxford Nanopore sequencing data. Nat Methods. (2022) 19:845–53. doi: 10.1038/s41592-022-01520-4
104. Nygaard AB, Tunsjø HS, Meisal R, and Charnock C. A preliminary study on the potential of Nanopore MinION and Illumina MiSeq 16S rRNA gene sequencing to characterize building-dust microbiomes. Sci Rep. (2020) 10:3209. doi: 10.1038/s41598-020-59771-0
105. Kerkhof LJ, Dillon KP, Häggblom MM, and McGuinness LR. Profiling bacterial communities by MinION sequencing of ribosomal operons. Microbiome. (2017) 5:116. doi: 10.1186/s40168-017-0336-9
106. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. (2019) 37:852–57. doi: 10.1038/s41587-019-0209-9
107. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, and Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. (2016) 13:581–83. doi: 10.1038/nmeth.3869
108. Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Xu Zech Z, et al. Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems. (2017) 2:e00191–16. doi: 10.1128/mSystems.00191-16
109. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. (2009) 75:7537–41. doi: 10.1128/AEM.01541-09
110. Edgar RC. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods. (2013) 10:996–98. doi: 10.1038/nmeth.2604
111. Edgar RC. UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv. (2016). doi: 10.1101/081257. Preprint.
112. Zhou Y, Liu Y-X, and Li X. USEARCH 12: open-source software for sequencing analysis in bioinformatics and microbiome. iMeta. (2024) 3:e236. doi: 10.1002/imt2.236
113. Nearing J, Douglas G, Comeau A, and Langille M. Denoising the denoisers: an independent evaluation of microbiome sequence error-correction approaches. PeerJ. (2018) 6:e5364. doi: 10.7717/peerj.5364
114. Chiarello M, McCauley M, Villéger S, and Jackson CR. Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold. PloS One. (2022) 17:e0264443. doi: 10.1371/journal.pone.0264443
115. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. (2013) 41:D590–596. doi: 10.1093/nar/gks1219
116. Molano L-AG, Vega-Abellaneda S, and Manichanh C. GSR-DB: A manually curated and optimized taxonomical database for 16S rRNA amplicon analysis. mSystems. (2024) 9:e0095023. doi: 10.1128/msystems.00950-23
117. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinf. (2009) 10:421. doi: 10.1186/1471-2105-10-421
118. Rognes T, Flouri T, Nichols B, Quince C, and Mahé F. VSEARCH: A versatile open source tool for metagenomics. PeerJ. (2016) 4:e2584. doi: 10.7717/peerj.2584
119. Wood DE, Lu J, and Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. (2019) 20:257. doi: 10.1186/s13059-019-1891-0
120. Lu J, Breitwieser FP, Thielen P, and Salzberg SL. Bracken: estimating species abundance in metagenomics data. PeerJ Comput Sci. (2017) 3:e104. doi: 10.7717/peerj-cs.104
121. Chaudhary N, Sharma AK, Agarwal P, Gupta A, and Sharma VK. 16S classifier: A tool for fast and accurate taxonomic classification of 16S rRNA hypervariable regions in metagenomic datasets. PloS One. (2015) 10:e0116106. doi: 10.1371/journal.pone.0116106
122. Ahmed O, Boucher C, and Langmead B. Cliffy: robust 16S rRNA classification based on a compressed LCA index. bioRxiv. (2024). doi: 10.1101/2024.05.25.595899. Preprint.
123. Baid G, Cook DE, Shafin K, Yun T, Llinares-López F, Berthet Q, et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nat Biotechnol. (2023) 41:232–38. doi: 10.1038/s41587-022-01435-7
124. Huang W, van Bodegom PM, Declerck S, Heinonsalo J, Cosme M, Viskari T, et al. Mycelium chemistry differs markedly between ectomycorrhizal and arbuscular mycorrhizal fungi. Commun Biol. (2022) 5:398. doi: 10.1038/s42003-022-03341-9
125. Guo Y, Guigue J, Bauke SL, Hempel S, and Rillig MC. Soil depth and fertilizer shape fungal community composition in a long-term fertilizer agricultural field. Appl Soil Ecol. (2025) 207:105943. doi: 10.1016/j.apsoil.2025.105943
126. Matazimov MT, Sidametova ZE, Olimov NK, Abdullaeva MU, Rakhimova DO, Rustamov IX, et al. Ecological roles of fungal networks: integrating nutrient cycling, soil health, and climate resilience-A systematic review. Microbial Bioactives. (2025) 8:1–8. doi: 10.25163/microbbioacts.8110420
127. Tunlid A, Floudas D, Op De Beeck M, Wang T, and Persson P. Decomposition of soil organic matter by ectomycorrhizal fungi: mechanisms and consequences for organic nitrogen uptake and soil carbon stabilization. Front Forests Global Change. (2022) 5:934409. doi: 10.3389/ffgc.2022.934409
128. Tian H, Li L, Zhu Y, Wang C, Wu M, Shen W, et al. Soil fungal community and co-occurrence network patterns at different successional stages of black locust coppice stands. Front Microbiol. (2025) 16:1528028. doi: 10.3389/fmicb.2025.1528028
129. Nilsson RH, Tedersoo L, Ryberg M, Kristiansson E, Hartmann M, Unterseher M, et al. A comprehensive, automatically updated fungal ITS sequence dataset for reference-based chimera control in environmental sequencing efforts. Microbes Environ. (2015) 30:145–50. doi: 10.1264/jsme2.ME14121
130. Jones B, Goodall T, George PBL, Gweon HS, Puissant J, Read DS, et al. Beyond taxonomic identification: integration of ecological responses to a soil bacterial 16S rRNA gene database. Front Microbiol. (2021) 12:682886. doi: 10.3389/fmicb.2021.682886
131. Hamm PS, Mueller RC, Kuske CR, and Porras-Alfaro A. Keratinophilic fungi: specialized fungal communities in a desert ecosystem identified using cultured-based and Illumina sequencing approaches. Microbiol Res. (2020) 239:126530. doi: 10.1016/j.micres.2020.126530
132. Furneaux B, Bahram M, Rosling A, Yorou NS, and Ryberg M. Long- and short-read metabarcoding technologies reveal similar spatiotemporal structures in fungal communities. Mol Ecol Resour. (2021) 21:1833–49. doi: 10.1111/1755-0998.13387
133. Tedersoo L, Tooming-Klunderud A, and Anslan S. PacBio metabarcoding of fungi and other eukaryotes: errors, biases and perspectives. New Phytol. (2018) 217:1370–85. doi: 10.1111/nph.14776
134. Lee FCH and Muthu V. From 18S to 28S rRNA gene: an improved targeted Sarcocystidae PCR amplification, species identification with long DNA sequences. Am J Trop Med Hygiene. (2021) 104:1388–93. doi: 10.4269/ajtmh.20-0767
135. Hatfield RG, Batista FM, Bean TP, Fonseca VG, Santos A, Turner AD, et al. The application of nanopore sequencing technology to the study of dinoflagellates: A proof of concept study for rapid sequence-based discrimination of potentially harmful algae. Front Microbiol. (2020) 11:844. doi: 10.3389/fmicb.2020.00844
136. Xue F and Liu T. DNA sequence and community structure diversity of multi-year soil fungi in grape of Xinjiang. Sci Rep. (2021) 11:16367. doi: 10.1038/s41598-021-95854-2
137. He Y, Hou X-Y, Li C-X, Wang Y, and Ma X-R. Soil microbial communities altered by titanium ions in different agroecosystems of pitaya and grape. Microbiol Spectr. (2022) 10:e0090721. doi: 10.1128/spectrum.00907-21
138. Bengtsson-Palme J, Ryberg M, Hartmann M, Branco S, Wang Z, Godhe A, et al. Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods Ecol Evol. (2013) 4:914–19. doi: 10.1111/2041-210X.12073
139. Einarsson SV and Rivers AR. ITSxpress version 2: software to rapidly trim internal transcribed spacer sequences with quality scores for amplicon sequencing. Microbiol Spectr. (2024) 12:e0060124. doi: 10.1128/spectrum.00601-24
140. Rzehak T, Praeg N, Galla G, Seeber J, Hauffe HC, and Illmer P. Comparison of commonly used software pipelines for analyzing fungal metabarcoding data. BMC Genomics. (2024) 25:1085. doi: 10.1186/s12864-024-11001-x
141. Özkurt E, Fritscher J, Soranzo N, Ng DYK, Davey RP, Bahram M, et al. LotuS2: an ultrafast and highly accurate tool for amplicon sequencing analysis. Microbiome. (2022) 10:176. doi: 10.1186/s40168-022-01365-1
142. Anslan S, Bahram M, Hiiesalu I, and Tedersoo L. PipeCraft: flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data. Mol Ecol Resour. (2017) 17:e234–40. doi: 10.1111/1755-0998.12692
143. Escudié F, Auer L, Bernard M, Mariadassou M, Cauquil L, Vidal K, et al. FROGS: find, rapidly, OTUs with Galaxy solution. Bioinf (Oxford England). (2018) 34:1287–94. doi: 10.1093/bioinformatics/btx791
144. Abarenkov K, Nilsson RH, Larsson K-H, Taylor AFS, May TW, Frøslev TG, et al. The UNITE database for molecular identification and taxonomic communication of fungi and other eukaryotes: sequences, taxa and classifications reconsidered. Nucleic Acids Res. (2024) 52:D791–97. doi: 10.1093/nar/gkad1039
145. Edgar RC, Haas BJ, Clemente JC, Quince C, and Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. (2011) 27:2194–200. doi: 10.1093/bioinformatics/btr381
146. Nguyen NH, Song Z, Bates ST, Branco S, Tedersoo L, Menke J, et al. FUNGuild: an open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecol. (2016) 20:241–48. doi: 10.1016/j.funeco.2015.06.006
147. Põlme S, Abarenkov K, Nilsson RH, Lindahl BD, Clemmensen KE, Kauserud H, et al. FungalTraits: A user-friendly traits database of fungi and fungus-like stramenopiles. Fungal Diversity. (2020) 105:1–16. doi: 10.1007/s13225-020-00466-2
148. Tanunchai B, Ji L, Schroeter SA, Wahdan SFM, Hossen S, Delelegn Y, et al. FungalTraits vs. FUNGuild: comparison of ecological functional assignments of leaf- and needle-associated fungi across 12 temperate tree species. Microbial Ecol. (2023) 85:411–28. doi: 10.1007/s00248-022-01973-2
149. Krabberød AK, Stokke E, Thoen E, Skrede I, and Kauserud H. The ribosomal operon database: A full-length rDNA operon database derived from genome assemblies. Mol Ecol Resour. (2025) 25:e14031. doi: 10.1111/1755-0998.14031
150. Hillmann B, Al-Ghalith GA, Shields-Cutler RR, Zhu Q, Gohl DM, Beckman KB, et al. Evaluating the information content of shallow shotgun metagenomics. mSystems. (2018) 3:e00069–18. doi: 10.1128/mSystems.00069-18
151. Ye L, Dong N, Xiong W, Li J, Li R, Heng H, et al. High-resolution metagenomics of human gut microbiota generated by nanopore and illumina hybrid metagenome assembly. Front Microbiol. (2022) 13:801587. doi: 10.3389/fmicb.2022.801587
152. Logares R, Sunagawa S, Salazar G, Cornejo-Castillo FM, Ferrera I, Sarmento H, et al. Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities. Environ Microbiol. (2014) 16:2659–71. doi: 10.1111/1462-2920.12250
153. Poretsky R, Rodriguez-R LM, Luo C, Tsementzi D, and Konstantinidis KT. Strengths and limitations of 16S rRNA gene amplicon sequencing in revealing temporal microbial community dynamics. PloS One. (2014) 9:e93827. doi: 10.1371/journal.pone.0093827
154. Edwin NR, Duff A, Deveautour C, Brennan F, Abram F, and O’Sullivan O. Consistent microbial insights across sequencing methods in soil studies: the role of reference taxonomies. mSystems. (2025) 10:e0105924. doi: 10.1128/msystems.01059-24
155. Yang J-X, Peng Y, Yang J-J, Zhang Y-H, Dong Q, Li Q-S, et al. Nitrogen addition alters arbuscular mycorrhizal fungi and soil bacteria networks without promoting phosphorus mineralization in a semiarid grassland. Commun Biol. (2025) 8:1229. doi: 10.1038/s42003-025-08681-w
156. Wang L, Chen X, Pollock NR, Villafuerte Gálvez JA, Alonso CD, Wang D, et al. Metagenomic analysis reveals distinct patterns of gut microbiota features with diversified functions in C. difficile infection (CDI), asymptomatic carriage and non-CDI diarrhea. Gut Microbes. (2025) 17:2505269. doi: 10.1080/19490976.2025.2505269
157. Zöggeler T, Kavallar AM, Pollio AR, Aldrian D, Decristoforo C, Scholl-Bürgi S, et al. Meta-analysis of shotgun sequencing of gut microbiota in obese children with MASLD or MASH. Gut Microbes. (2025) 17:2508951. doi: 10.1080/19490976.2025.2508951
158. Star-Shirko B, Pangga GM, McKenna A, Corcionivoschi N, Richmond A, Ijaz UZ, et al. Investigating microbial population structure and function in the chicken caeca and large intestine over time using metagenomics. BMC Res Notes. (2025) 18:355. doi: 10.1186/s13104-025-07441-7
159. Frey B, Varliero G, Qi W, Stierli B, Walthert L, and Brunner I. Shotgun metagenomics of deep forest soil layers show evidence of altered microbial genetic potential for biogeochemical cycling. Front Microbiol. (2022) 13:828977. doi: 10.3389/fmicb.2022.828977
160. Babalola OO, Molefe RR, and Amoo AE. Revealing the active microbiome connected with the rhizosphere soil of maize plants in Ventersdorp, South Africa. Biodivers Data J. (2021) 9:e60245. doi: 10.3897/BDJ.9.e60245
161. Babalola OO, Adebayo AA, and Enagbonma BJ. Shotgun metagenomics dataset of the core rhizo-microbiome of monoculture and soybean-precedent carrot. BMC Genomic Data. (2025) 26:26. doi: 10.1186/s12863-025-01320-7
162. Hassen AI, Pierneef R, Swanevelder ZH, and Bopape FL. Microbial and functional diversity of Cyclopia intermedia rhizosphere microbiome revealed by analysis of shotgun metagenomics sequence data. Data Brief. (2020) 32:106288. doi: 10.1016/j.dib.2020.106288
163. ‘FastQC A Quality Control Tool for High Throughput Sequence Data’. Available online at: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (Accessed November 2, 2025).
164. De Coster W, D’Hert S, Schultz DT, Cruts M, and Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinf (Oxford England). (2018) 34:2666–69. doi: 10.1093/bioinformatics/bty149
165. ‘pycoQC’. Available online at: https://a-slide.github.io/pycoQC/ (Accessed November 2, 2025).
166. Fukasawa Y, Ermini L, Wang H, Carty K, and Cheung M-S. LongQC: A quality control tool for third generation sequencing long read data. G3: Genes|Genomes|Genet. (2020) 10:1193–96. doi: 10.1534/g3.119.400864
167. Perdomo JE, Ahsan MU, Liu Q, Fang L, and Wang K. LongReadSum: A fast and flexible quality control and signal summarization tool for long-read sequencing data. Comput Struct Biotechnol J. (2025) 27:556–63. doi: 10.1016/j.csbj.2025.01.019
168. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.J. (2011) 17:10–2. doi: 10.14806/ej.17.1.200
169. ‘Trim Galore’. Available online at: https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ (Accessed November 2, 2025).
170. Bolger AM, Lohse M, and Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. (2014) 30:2114–20. doi: 10.1093/bioinformatics/btu170
171. ‘BBDuk Guide’. Archive. Available online at: https://archive.jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/ (Accessed November 2, 2025).
172. Bonenfant Q, Noé L, and Touzet H. Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinf Adv. (2023) 3:vbac085. doi: 10.1093/bioadv/vbac085
173. ‘Dorado Documentation’. Available online at: https://software-docs.nanoporetech.com/dorado/latest/ (Accessed November 2, 2025).
174. PacBio. PacBio - sequence with confidence. Available online at: https://www.pacb.com/ (Accessed November 2, 2025).
175. Langmead B and Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. (2012) 9:357–59. doi: 10.1038/nmeth.1923
176. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997. (2013). doi: 10.48550/arXiv.1303.3997. Preprint, arXiv.
177. KneadData. Available online at: https://huttenhower.sph.harvard.edu/kneaddata/ (Accessed November 2, 2025).
178. Constantinides B, Hunt M, and Crook DW. Hostile: accurate decontamination of microbial host sequences. Bioinf (Oxford England). (2023) 39:btad728. doi: 10.1093/bioinformatics/btad728
179. Rumbavicius I, Rounge TB, and Rognes T. HoCoRT: host contamination removal tool. BMC Bioinf. (2023) 24:371. doi: 10.1186/s12859-023-05492-w
180. Li D, Liu C-M, Luo R, Sadakane K, and Lam T-W. MEGAHIT: An Ultra-Fast Single-Node Solution for Large and Complex Metagenomics Assembly via Succinct de Bruijn Graph. Bioinf (Oxford England). (2015) 31:1674–76. doi: 10.1093/bioinformatics/btv033
181. Nurk S, Meleshko D, Korobeynikov A, and Pevzner PA. metaSPAdes: A new versatile metagenomic assembler. Genome Res. (2017) 27:824–34. doi: 10.1101/gr.213959.116
182. Souvorov A, Agarwala R, and Lipman DJ. SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biol. (2018) 19:153. doi: 10.1186/s13059-018-1540-z
183. Zhang Z, Yang C, Veldsman WP, Fang X, and Zhang L. Benchmarking genome assembly methods on metagenomic sequencing data. Briefings in Bioinformatics. (2023) 24:bbad087. doi: 10.1093/bib/bbad087
184. Kolmogorov M, Bickhart DM, Behsaz B, Gurevich A, Rayko M, Shin SB, et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods. (2020) 17:1103–10. doi: 10.1038/s41592-020-00971-x
185. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, and Phillippy AM. Canu: Scalable and Accurate Long-Read Assembly via Adaptive k-Mer Weighting and Repeat Separation. Genome Res. (2017) 27:722–36. doi: 10.1101/gr.215087.116
186. Moss EL, Maghini DG, and Bhatt AS. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat Biotechnol. (2020) 38:701–7. doi: 10.1038/s41587-020-0422-6
187. Benoit G, Raguideau S, James R, Phillippy AM, Chikhi R, and Quince C. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol. (2024) 42:1378–83. doi: 10.1038/s41587-023-01983-6
188. Mendes CI, Vila-Cerqueira P, Motro Y, Moran-Gilad J, Carriço JA, and Ramirez M. LMAS: Evaluating Metagenomic Short de Novo Assembly Methods through Defined Communities. GigaScience. (2023) 12:giac122. doi: 10.1093/gigascience/giac122
189. Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. (2019) 7:e7359. doi: 10.7717/peerj.7359
190. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, et al. Binning metagenomic contigs by coverage and composition. Nat Methods. (2014) 11:1144–46. doi: 10.1038/nmeth.3103
191. Wu Y-W, Simmons BA, and Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinf (Oxford England). (2016) 32:605–7. doi: 10.1093/bioinformatics/btv638
192. Pan S, Zhu C, Zhao X-M, and Coelho LP. A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments. Nat Commun. (2022) 13:2326. doi: 10.1038/s41467-022-29843-y
193. Nissen JN, Johansen J, Allesøe RL, Sønderby CK, Armenteros JJA, Grønbech CH, et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat Biotechnol. (2021) 39:555–60. doi: 10.1038/s41587-020-00777-4
194. Wang Z, You R, Han H, Liu W, Sun F, and Zhu S. Effective binning of metagenomic contigs using contrastive multi-view representation learning. Nat Commun. (2024) 15:585. doi: 10.1038/s41467-023-44290-z
195. Lettich R, Egan R, Riley R, Wang Z, Tritt A, Oliker L, et al. GenomeFace: A deep learning-based metagenome binner trained on 43,000 microbial genomes. bioRxiv. (2024). doi: 10.1101/2024.02.07.579326. Preprint.
196. Arangasamy Y, Morice É, Jochheim A, Lieser B, and Söding J. Evaluation of metagenome binning: advances and challenges. bioRxiv. (2025). doi: 10.1101/2025.02.21.639465. Preprint.
197. Mallawaarachchi V and Lin Y. MetaCoAG: binning metagenomic contigs via composition, coverage and assembly graphs. In: Pe’er I, editor. Research in Computational Molecular Biology. Cham: Springer International Publishing (2022).
198. Lamurias A, Sereika M, Albertsen M, Hose K, and Nielsen TD. Metagenomic binning with assembly graph embeddings. Bioinf (Oxford England). (2022) 38:4481–87. doi: 10.1093/bioinformatics/btac557
199. Xue H, Mallawaarachchi V, Zhang Y, Rajan V, and Lin Y. RepBin: constraint-based graph representation learning for metagenomic binning. arXiv:2112.11696. (2021). doi: 10.48550/arXiv.2112.11696. Preprint, arXiv.
200. Lamurias A, Tibo A, Hose K, Albertsen M, and Nielsen TD. Metagenomic binning using connectivity-constrained variational autoencoders. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, and Scarlett J, editors. Proceedings of the 40th International Conference on Machine Learning, vol. 202. Proceedings of Machine Learning Research. PMLR (2023). Available online at: https://proceedings.mlr.press/v202/lamurias23a.html (Accessed December 14, 2025).
201. Xue H, Mallawaarachchi V, Xie L, and Rajan V. (2023). Encoding unitig-level assembly graphs with heterophilous constraints for metagenomic contigs binning, in: The Twelfth International Conference on Learning Representations, October 13. Available online at: https://openreview.net/forum?id=vBw8JGBJWj&referrer=%5Bthe%20profile%20of%20Vaibhav%20Rajan%5D(%2Fprofile%3Fid%3D~Vaibhav_Rajan2) (Accessed November 3, 2025).
202. Feng X and Li H. Evaluating and improving the representation of bacterial contents in long-read metagenome assemblies. Genome Biol. (2024) 25:92. doi: 10.1186/s13059-024-03234-6
203. Mallawaarachchi V, Wickramarachchi A, McArthur R, Lang Y, Caley K, and Huttley G. GraphBin-tk: assembly graph-based metagenomic binning toolkit. J Open Source Softw. (2025) 10:7713. doi: 10.21105/joss.07713
204. Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. (2018) 3:836–43. doi: 10.1038/s41564-018-0171-1
205. Wang Z, Huang P, You R, Sun F, and Zhu S. MetaBinner: A high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities. Genome Biol. (2023) 24:1. doi: 10.1186/s13059-022-02832-6
206. Qiu Z, Yuan L, Lian C-A, Lin B, Chen J, Mu R, et al. BASALT refines binning from metagenomic data and increases resolution of genome-resolved metagenomic analysis. Nat Commun. (2024) 15:2179. doi: 10.1038/s41467-024-46539-7
207. Mainguy J and Hoede C. Binette: A fast and accurate bin refinement tool to construct high quality metagenome assembled genomes. J Open Source Softw. (2024) 9:6782. doi: 10.21105/joss.06782
208. Uritskiy GV, DiRuggiero J, and Taylor J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. (2018) 6:158. doi: 10.1186/s40168-018-0541-1
209. Xia Y, Liang L, Wang X, Chen Z, Liu J, Yang Y, et al. MetaflowX: A scalable and resource-efficient workflow for multi-strategy metagenomic analysis. Nucleic Acids Res. (2025) 53:gkaf954. doi: 10.1093/nar/gkaf954
210. Shen W, Xiang H, Huang T, Tang H, Peng M, Cai D, et al. KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping. Bioinf (Oxford England). (2023) 39:btac845. doi: 10.1093/bioinformatics/btac845
211. Menzel P, Ng KL, and Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. (2016) 7:11257. doi: 10.1038/ncomms11257
212. Wright RJ, Comeau AM, and Langille MGI. From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools. Microbial Genomics. (2023) 9:000949. doi: 10.1099/mgen.0.000949
213. Huson DH, Auch AF, Qi J, and Schuster SC. MEGAN analysis of metagenomic data. Genome Res. (2007) 17:377–86. doi: 10.1101/gr.5969107
214. Huson DH, Albrecht B, Bağcı C, Bessarab I, Górska A, Jolic D, et al. MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs. Biol Direct. (2018) 13:6. doi: 10.1186/s13062-018-0208-7
215. Zhu Q, Huang S, Gonzalez A, McGrath I, McDonald D, Haiminen N, et al. Phylogeny-aware analysis of metagenome community ecology based on matched reference genomes while bypassing taxonomy. mSystems. (2022) 7:e0016722. doi: 10.1128/msystems.00167-22
216. Blanco-Míguez A, Beghini F, Cumbo F, McIver LJ, Thompson KN, Zolfo M, et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol. (2023) 41:1633–44. doi: 10.1038/s41587-023-01688-w
217. Akaçin İ, Ersoy Ş, Doluca O, and Güngörmüşler M. Comparing the significance of the utilization of next generation and third generation sequencing technologies in microbial metagenomics. Microbiol Res. (2022) 264:127154. doi: 10.1016/j.micres.2022.127154
218. Edwin NR, Fitzpatrick AH, Brennan F, Abram F, and O’Sullivan O. An in-depth evaluation of metagenomic classifiers for soil microbiomes. Environ Microbiome. (2024) 19:19. doi: 10.1186/s40793-024-00561-w
219. de Muinck EJ, Trosvik P, Gilfillan GD, Hov JR, and Sundaram AYM. A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform. Microbiome. (2017) 5:68. doi: 10.1186/s40168-017-0279-1
220. Biada I, Santacreu MA, González-Recio O, and Ibáñez-Escriche N. Comparative analysis of Illumina, PacBio, and Nanopore for 16S rRNA gene sequencing of rabbit’s gut microbiota. Front Microbiomes. (2025) 4:1587712. doi: 10.3389/frmbi.2025.1587712
221. Stevens BM, Creed TB, Reardon CL, and Manter DK. Comparison of Oxford Nanopore Technologies and Illumina MiSeq sequencing with mock communities and agricultural soil. Sci Rep. (2023) 13:9323. doi: 10.1038/s41598-023-36101-8
222. Pan P, Gu Y, Sun D-L, Wu QL, and Zhou N-Y. Microbial diversity biased estimation caused by intragenomic heterogeneity and interspecific conservation of 16S rRNA genes. Appl Environ Microbiol. (2023) 89:e0210822. doi: 10.1128/aem.02108-22
223. Allali I, Arnold JW, Roach J, Cadenas MB, Butz N, Hassan HM, et al. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome. BMC Microbiol. (2017) 17:194. doi: 10.1186/s12866-017-1101-8
224. Hon T, Mars K, Young G, Tsai Y-C, Karalius JW, Landolin JM, et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. (2020) 7:399. doi: 10.1038/s41597-020-00743-4
225. Strokach A, Zoruk P, Boldyreva D, Morozov M, Olekhnovich E, Veselovsky V, et al. Comparative evaluation of sequencing technologies and primer sets for mouse gut microbiota profiling. Front Microbiol. (2025) 16:1584359. doi: 10.3389/fmicb.2025.1584359
226. Veselovsky V, Romanov M, Zoruk P, et al. Comparative evaluation of sequencing platforms: Pacific Biosciences, Oxford Nanopore Technologies, and Illumina for 16S rRNA-based soil microbiome profiling. Front Microbiol. (2025) 16:1633360. doi: 10.3389/fmicb.2025.1633360
227. Tvedte ES, Gasser M, Sparklin BC, Michalski J, Hjelmen CE, Johnston JS, et al. Comparison of long-read sequencing technologies in interrogating bacteria and fly genomes. G3 (Bethesda Md.). (2021) 11:jkab083. doi: 10.1093/g3journal/jkab083
228. Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. (2019) 37:1155–62. doi: 10.1038/s41587-019-0217-9
229. Volden R, Palmer T, Byrne A, Cole C, Schmitz RJ, Green RE, et al. Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA. Proc Natl Acad Sci. (2018) 115:9726–31. doi: 10.1073/pnas.1806447115
230. Szoboszlay M, Schramm L, Pinzauti D, Scerri J, Sandionigi A, and Biazzo M. Nanopore is preferable over Illumina for 16S amplicon sequencing of the gut microbiota when species-level taxonomic classification, accurate estimation of richness, or focus on rare taxa is required. Microorganisms. (2023) 11:804. doi: 10.3390/microorganisms11030804
231. Yeo K, Connell J, Bouras G, Smith E, Murphy W, Hodge J-C, et al. A Comparison between Full-Length 16S rRNA Oxford Nanopore Sequencing and Illumina V3-V4 16S rRNA Sequencing in Head and Neck Cancer Tissues. Arch Microbiol. (2024) 206:248. doi: 10.1007/s00203-024-03985-7
232. Schiffer AM, Rahman A, Sutton W, Putnam ML, and Weisberg AJ. A comparison of short- and long-read whole genome sequencing for microbial pathogen epidemiology. bioRxiv. (2025). doi: 10.1101/2025.02.17.638699. Preprint.
233. Bejaoui S, Nielsen SH, Rasmussen A, Coia JE, Andersen DT, Pedersen TB, et al. Comparison of Illumina and Oxford Nanopore sequencing data quality for Clostridioides difficile genome analysis and their application for epidemiological surveillance. BMC Genomics. (2025) 26:92. doi: 10.1186/s12864-025-11267-9
234. ‘Nanopore Sequencing Accuracy | Oxford Nanopore Technologies’. Available online at: https://nanoporetech.com/platform/accuracy (Accessed November 4, 2025).
235. Ermini L and Driguez P. The application of long-read sequencing to cancer. Cancers. (2024) 16:1275. doi: 10.3390/cancers16071275
236. Eisenhofer R, Nesme J, Santos-Bay L, Koziol A, Sørensen SJ, Alberdi A, et al. A comparison of short-read, HiFi long-read, and hybrid strategies for genome-resolved metagenomics. Microbiol Spectr. (2024) 12:e0359023. doi: 10.1128/spectrum.03590-23
237. Jovel J, Patterson J, Wang W, Hotte N, O’Keefe S, Mitchel T, et al. Characterization of the gut microbiome using 16S or shotgun metagenomics. Front Microbiol. (2016) 7:459. doi: 10.3389/fmicb.2016.00459
238. Ranjan R, Rani A, Metwally A, McGee HS, and Perkins DL. Analysis of the microbiome: advantages of whole genome shotgun versus 16S amplicon sequencing. Biochem Biophys Res Commun. (2016) 469:967–77. doi: 10.1016/j.bbrc.2015.12.083
239. Vatta P and Cacciò SM. Detection of parasites in food and water matrices by shotgun metagenomics: A narrative review. Food Waterborne Parasitol. (2025) 39:e00265. doi: 10.1016/j.fawpar.2025.e00265
240. Rhoads A and Au KF. PacBio sequencing and its applications. Genom Proteomics Bioinf. (2015) 13:278–89. doi: 10.1016/j.gpb.2015.08.002
241. Wang Y, Zhao Y, Bollas A, Wang Y, and Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol. (2021) 39:1348–65. doi: 10.1038/s41587-021-01108-x
242. Sereika M, Kirkegaard RH, Karst SM, Michaelsen TY, Sørensen EA, Wollenberg RD, et al. Oxford Nanopore R10.4 Long-Read Sequencing Enables the Generation of near-Finished Bacterial Genomes from Pure Cultures and Metagenomes without Short-Read or Reference Polishing. Nat Methods. (2022) 19:823–26. doi: 10.1038/s41592-022-01539-7
243. Jeong J, Yun K, Mun S, Chung W-H, Choi S-Y, Nam Y, et al. Publisher correction: the effect of taxonomic classification by full-length 16S rRNA sequencing with a synthetic long-read technology. Sci Rep. (2021) 11:10861. doi: 10.1038/s41598-021-90067-z
244. Campanaro S, Treu L, Kougias PG, Zhu X, and Angelidaki I. Taxonomy of anaerobic digestion microbiome reveals biases associated with the applied high throughput sequencing strategies. Sci Rep. (2018) 8:1926. doi: 10.1038/s41598-018-20414-0
245. Sun D-L, Jiang X, Wu QL, and Zhou N-Y. Intragenomic heterogeneity of 16S rRNA genes causes overestimation of prokaryotic diversity. Appl Environ Microbiol. (2013) 79:5962–69. doi: 10.1128/AEM.01282-13
246. Větrovský T and Baldrian P. The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses. PloS One. (2013) 8:e57923. doi: 10.1371/journal.pone.0057923
247. Gao Y and Wu M. Accounting for 16S rRNA copy number prediction uncertainty and its implications in bacterial diversity analyses. ISME Commun. (2023) 3:59. doi: 10.1038/s43705-023-00266-0
248. Peabody MA, Van Rossum T, Lo R, and Brinkman FSL. Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities. BMC Bioinf. (2015) 16:363. doi: 10.1186/s12859-015-0788-5
249. Walsh AM, Crispie F, O’Sullivan O, Finnegan L, Claesson MJ, and Cotter PD. Species classifier choice is a key consideration when analysing low-complexity food microbiome data. Microbiome. (2018) 6:50. doi: 10.1186/s40168-018-0437-0
250. Mantri SS, Negri T, Sales-Ortells H, Angelov A, Peter S, Neidhardt H, et al. Metagenomic sequencing of multiple soil horizons and sites in close vicinity revealed novel secondary metabolite diversity. mSystems. (2021) 6:e0101821. doi: 10.1128/mSystems.01018-21
251. Straub D, Blackwell N, Langarica-Fuentes A, Peltzer A, Nahnsen S, and Kleindienst S. Interpretations of environmental microbial community studies are biased by the selected 16S rRNA (Gene) amplicon sequencing pipeline. Front Microbiol. (2020) 11:550420. doi: 10.3389/fmicb.2020.550420
252. Calus ST, Ijaz UZ, and Pinto AJ. NanoAmpli-seq: A workflow for amplicon sequencing for mixed microbial communities on the nanopore sequencing platform. GigaScience. (2018) 7:giy140. doi: 10.1093/gigascience/giy140
253. Hussein EI, Jacob JH, Shakhatreh MAK, Abd Al-Razaq MA, Juhmani A-SF, and Cornelison CT. Exploring the microbial diversity in Jordanian hot springs by comparative metagenomic analysis. MicrobiologyOpen. (2017) 6:e00521. doi: 10.1002/mbo3.521
254. Schloss PD. Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere. (2024) 9:e0035423. doi: 10.1128/msphere.00354-23
255. Mysara M, Njima M, Leys N, Raes J, and Monsieurs P. From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data. GigaScience. (2017) 6:1–10. doi: 10.1093/gigascience/giw017
256. Rodríguez del Río Á, Scheu S, and Rillig MC. Soil microbial responses to multiple global change factors as assessed by metagenomics. Nat Commun. (2025) 16:5058. doi: 10.1038/s41467-025-60390-4
257. Zhang Z-F, Liu L-R, Pan Y-P, Pan J, and Li M. Long-read assembled metagenomic approaches improve our understanding on metabolic potentials of microbial community in mangrove sediments. Microbiome. (2023) 11:188. doi: 10.1186/s40168-023-01630-x
258. Rasmussen AN and Francis CA. Genome-resolved metagenomic insights into massive seasonal ammonia-oxidizing archaea blooms in San Francisco Bay. mSystems. (2022) 7:e0127021. doi: 10.1128/msystems.01270-21
259. Trivedi CB, Stamps BW, Lau GE, Grasby SE, Templeton AS, and Spear JR. Microbial metabolic redundancy is a key mechanism in a sulfur-rich glacial ecosystem. mSystems. (2020) 5:e00504–20. doi: 10.1128/mSystems.00504-20
260. Zhang Z, Liu T, Li X, Ye Q, Bangash HI, Zheng J, et al. Metagenome-assembled genomes reveal carbohydrate degradation and element metabolism of microorganisms inhabiting Tengchong hot springs, China. Environ Res. (2023) 238:117144. doi: 10.1016/j.envres.2023.117144
261. Kazarina A, Wiechman H, Sarkar S, Richie T, and Lee STM. Recovery of 679 metagenome-assembled genomes from different soil depths along a precipitation gradient. Sci Data. (2025) 12:521. doi: 10.1038/s41597-025-04884-2
262. Venturini AM, Gontijo JB, Mandro JA, Paula FS, Yoshiura CA, da França AG, et al. Genome-resolved metagenomics reveals novel archaeal and bacterial genomes from Amazonian forest and pasture soils. Microbial Genomics. (2022) 8:mgen000853. doi: 10.1099/mgen.0.000853
263. Tao Y, Xun F, Zhao C, Mao Z, Li B, Xing P, et al. Improved assembly of metagenome-assembled genomes and viruses in Tibetan saline lake sediment by HiFi metagenomic sequencing. Microbiol Spectr. (2023) 11:e0332822. doi: 10.1128/spectrum.03328-22
264. Van Goethem MW, Osborn AR, Bowen BP, Andeer PF, Swenson TL, Clum A, et al. Long-read metagenomics of soil communities reveals phylum-specific secondary metabolite dynamics. Commun Biol. (2021) 4:1302. doi: 10.1038/s42003-021-02809-4
265. Huang Z, Wang J, He X, Zhang M, Ren X, Yu W, et al. Divergent profiles of rhizosphere soil carbon and nitrogen cycling in Pinus massoniana provenances with different types of carbon storage. Front Microbiol. (2025) 16:1537173. doi: 10.3389/fmicb.2025.1537173
266. Bai X, Wu J, Zhang B, Zhao H, Tian F, and Wang B. Metagenomics reveals functional profiles of soil nitrogen and phosphorus cycling under different amendments in saline-alkali soil. Environ Res. (2025) 267:120686. doi: 10.1016/j.envres.2024.120686
267. Kelly CN, Schwaner GW, Cumming JR, and Driscoll TP. Metagenomic reconstruction of nitrogen and carbon cycling pathways in forest soil: influence of different hardwood tree species. Soil Biol Biochem. (2021) 156:108226. doi: 10.1016/j.soilbio.2021.108226
268. Shaffer JP, Nothias L-F, Thompson LR, Sanders JG, Salido RA, Couvillion SP, et al. Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nat Microbiol. (2022) 7:2128–50. doi: 10.1038/s41564-022-01266-x
269. Liu C, Yu J, Ying J, Zhang K, Hu Z, Liu Z, et al. Integrated metagenomics and metabolomics analysis reveals changes in the microbiome and metabolites in the rhizosphere soil of Fritillaria unibracteata. Front Plant Sci. (2023) 14:1223720. doi: 10.3389/fpls.2023.1223720
270. Chuckran PF, Fofanov V, Hungate BA, Morrissey EM, Schwartz E, Walkup J, et al. Rapid response of nitrogen cycling gene transcription to labile carbon amendments in a soil microbial community. mSystems. (2021) 6:e00161–21. doi: 10.1128/mSystems.00161-21
Keywords: soil microbiome, 16S rRNA gene sequencing, ITS sequencing, shotgun metagenomics, Illumina, PacBio, Oxford Nanopore
Citation: Reznikova DA, Barannikova MV, Shnakhova LM, Mitkin NA and Vatlin AA (2026) Next-generation sequencing approaches for soil microbiome research. Front. Soil Sci. 5:1706999. doi: 10.3389/fsoil.2025.1706999
Received: 16 September 2025; Revised: 05 November 2025; Accepted: 05 December 2025; Published: 12 January 2026.
Edited by:
Simone Raposo Cotta, University of São Paulo, Brazil
Reviewed by:
Suleiman Aminu, Mohammed VI Polytechnic University, Morocco
Diksha Garg, DAV University, India
Copyright © 2026 Reznikova, Barannikova, Shnakhova, Mitkin and Vatlin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Diana A. Reznikova, cmV6bmlrb3ZhLmRhQHBoeXN0ZWNoLmVkdQ==