ORIGINAL RESEARCH article

Front. Environ. Sci., 03 June 2022

Sec. Water and Wastewater Management

Volume 10 - 2022 | https://doi.org/10.3389/fenvs.2022.901917

HT-ARGfinder: A Comprehensive Pipeline for Identifying Horizontally Transferred Antibiotic Resistance Genes and Directionality in Metagenomic Sequencing Data

  • 1. Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States

  • 2. Interdisciplinary PhD Program in Genetics, Bioinformatics and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States

  • 3. Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States

Abstract

Antibiotic resistance is a continually rising threat to global health. A primary driver of the evolution of new strains of resistant pathogens is the horizontal gene transfer (HGT) of antibiotic resistance genes (ARGs). However, identifying and quantifying ARGs subject to HGT remains a significant challenge. Here, we introduce HT-ARGfinder (horizontally transferred ARG finder), a pipeline that detects and enumerates horizontally transferred ARGs in metagenomic data while also estimating the directionality of transfer. To demonstrate the pipeline, we applied it to an array of publicly-available wastewater metagenomes, including hospital sewage. We compare the horizontally transferred ARGs detected across various sample types and estimate their directionality of transfer among donors and recipients. This study introduces a comprehensive tool to track mobile ARGs in wastewater and other aquatic environments.

1 Introduction

The ability of pathogens to resist antibiotic treatments significantly increases both morbidity and mortality (Bush et al., 2011). As hosts and pathogens have coevolved over millions of years, bacterial pathogens have adjusted their virulence to adapt to host defensive mechanisms (Beceiro et al., 2013). Antimicrobial-resistant infections kill at least 700,000 people worldwide each year, and, within 30 years, resistant infections are expected to kill 10 million people annually, far outnumbering cancer mortality (O’Neill, 2014). There is growing recognition that problematic pathogens encountered in the clinic originally acquired their ability to resist antibiotics from the broader pool of bacteria inhabiting natural environments. Once a resistant pathogen becomes endemic in a hospital, it can create a source of nosocomial infection and elevate the overall risk of standard medical procedures (Davies and Davies, 2010; Cheong et al., 2017).

Horizontal gene transfer (HGT) is a natural process that contributes to the evolution of resistant bacterial strains through the physical passage of DNA from one cell to another (Khan and Rao, 2019). HGT occurs through three fundamental mechanisms: transformation (uptake of naked DNA by competent cells), transduction (uptake of foreign DNA mediated by a phage), and conjugation (bacterial mating) (Amarasiri et al., 2020). This transfer of genetic material can have both beneficial and adverse consequences from a human standpoint (Kunhikannan et al., 2021). HGT is commonly harnessed for industrial biotechnological purposes and generally serves to expand genetic diversity (Le Roux and Blokesch, 2018). However, HGT is also primarily responsible for mobilizing multidrug resistance (MDR) among strains. In particular, multi-antibiotic resistant superbugs, which can be resistant to all available antibiotic therapies, have arisen mainly as a result of HGT (Mathers et al., 2015; Wang and Sun, 2015; Malhotra-Kumar et al., 2016). The transfer of plasmids carrying multiple ARGs is one primary driver of the superbug phenomenon (Sun et al., 2019).

A growing body of research has brought to light the importance of aquatic environments as a hub for the dissemination of ARGs. Aquatic environments typically receive multiple waste streams, treated and untreated, containing various mixtures of contaminants. As such, aquatic environments have been identified as ideal settings for HGT of ARGs, and human exposure to antibiotic-resistant bacteria and ARGs in aquatic environments may pose an additional health risk (Amarasiri et al., 2020). The evolution and spread of ARGs in aquatic environments have been documented across several studies conducted in various locales across the globe. Hundreds of different ARGs have been discovered in bacteria found not just in hospitals, livestock, and meatpacking wastewater but also in sewage, effluent treatment facilities, surface water, groundwater, and even drinking water (Zhang et al., 2009).

Convenient, efficient, and accurate means of identifying ARGs subject to HGT would be highly valuable to the research community and to fledgling efforts to advance environmental monitoring of antibiotic resistance. Such means would make it possible to focus on the ARGs of greatest concern in terms of their ability to mobilize to pathogens, rather than on intrinsic, non-mobile counterparts. Advances in shotgun metagenomic sequencing of environmental samples hold substantial promise for this purpose. The ability to recover millions of DNA sequences from a given sample affords the opportunity to correspondingly search for ARGs and mobile genetic elements (MGEs), including plasmids, transposons, and integrons. Recently, MetaCHIP was introduced as a tool to specifically identify genes subject to HGT across a microbial community (Song et al., 2019). MetaCHIP identifies HGT in assembled metagenomic data through a combination of best hit and phylogenetic approaches, which further serves to inform estimates of the directionality of the horizontally transferred genes (i.e., the donor and recipient of the HGT). LEMON (Li et al., 2019) is a similar tool that was also recently introduced and, like MetaCHIP, works best with nearly complete metagenome-assembled genomes. However, these existing tools are not explicitly tailored to identify and track the directionality of horizontally transferred ARGs.

We propose a new pipeline, HT-ARGfinder, which detects horizontally transferred ARGs from metagenomic data derived from complex environmental microbial communities. The pipeline was applied to metagenomes obtained from a range of wastewater samples, representing presumed hotspots for ARG evolution and mobilization. Two ARG databases were applied and compared: CARD, the most well-curated and up-to-date database of known ARGs, and DeepARG-DB, which comprises 14,933 genes drawn from three databases (CARD, ARDB, and UniProt) and includes putative new ARGs discovered based on deep learning (Arango-Argoty et al., 2018). The specific objectives of this study were to: 1) develop HT-ARGfinder as a comprehensive and user-friendly pipeline for identifying horizontally transferred ARGs in metagenomic data sets and inferring directionality of their transfer, 2) apply the pipeline for identifying and comparing horizontally transferred ARGs detected across a range of representative wastewater environments, and 3) infer the directionality of ARG transfer among various bacterial classes in several metagenomic data sets.

2 Materials and Methods

2.1 Workflow

HT-ARGfinder combines a series of bioinformatics tools to process metagenomic data, starting with quality control and ending with ARG detection and directionality inference. The workflow of HT-ARGfinder is shown in Figure 1.

FIGURE 1

2.1.1 Quality Control of Reads

Fastp (Chen et al., 2018), an ultra-fast FASTQ preprocessor with quality control, was incorporated for quality control and preprocessing of the paired-end reads. This tool is developed in C++ and has multi-threading support. Fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic (Bolger et al., 2014), Cutadapt (Martin, 2011), FASTQC (Brown et al., 2017), and AfterQC (Chen et al., 2017).

2.1.2 Metagenomic Assembly

MegaHIT (Li et al., 2015), which we previously found to provide the most accurate and extensive assemblies of short-read metagenomic data from complex environmental samples, was used for assembling metagenomes. MegaHIT is a de novo assembler that uses a succinct de Bruijn graph to assemble short reads (Bowe et al., 2012).
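As a rough illustration of how the read quality control and assembly steps can be chained, the sketch below builds (but does not run) command lines for fastp and MEGAHIT. The function name, file names, and thread count are hypothetical, and the options shown are only the basic input, output, and thread flags of the two tools; the exact settings used in HT-ARGfinder are not stated in this article.

```python
from pathlib import Path


def build_qc_and_assembly_commands(r1, r2, out_dir, threads=8):
    """Return argv lists for fastp (read QC) and MEGAHIT (assembly).

    Hypothetical helper for illustration only; nothing is executed here.
    """
    out = Path(out_dir)
    clean_r1 = out / "clean_R1.fastq.gz"
    clean_r2 = out / "clean_R2.fastq.gz"

    fastp_cmd = [
        "fastp",
        "-i", str(r1), "-I", str(r2),              # raw paired-end reads
        "-o", str(clean_r1), "-O", str(clean_r2),  # quality-filtered reads
        "-w", str(threads),                        # worker threads
    ]
    megahit_cmd = [
        "megahit",
        "-1", str(clean_r1), "-2", str(clean_r2),  # assemble the filtered reads
        "-o", str(out / "assembly"),               # output directory
        "-t", str(threads),                        # CPU threads
    ]
    return fastp_cmd, megahit_cmd


fastp_cmd, megahit_cmd = build_qc_and_assembly_commands(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz", "results"
)
print(" ".join(fastp_cmd))
print(" ".join(megahit_cmd))
```

Each list could be passed to subprocess.run; keeping the commands as data makes it straightforward to log them or to swap in different parameters.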

2.1.3 Binning

MetaBAT 2 (Kang et al., 2019) is incorporated for the purpose of reconstructing single genomes from microbial communities. In comparison to other binning tools such as MaxBin2 (Wu et al., 2016), CONCOCT (Alneberg et al., 2013), and MyCC (Lin and Liao, 2016), MetaBAT 2 shows better performance.

2.1.4 Quality Control of Bins

For quality control of the bins, CheckM (v1.1.2) (Parks et al., 2015) was employed. CheckM is an automated method for evaluating the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. The bins were filtered based on completeness and contamination reported by CheckM. Bins having completeness ≥ 40% and contamination ≤ 5% were selected for the next step. According to Bowers et al. (2017), “medium-quality draft” SAGs and MAGs are those genomes with completeness estimates of ≥ 50% and with contamination of < 10%. All other SAGs and MAGs should be reported as “low-quality drafts.” Song et al. (2019) chose 0% contamination for the MetaCHIP pipeline. We used contamination ≤ 5% because it is below the 10% recommended for medium-quality drafts but slightly less stringent than MetaCHIP.
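The bin selection rule above (completeness ≥ 40% and contamination ≤ 5%) can be sketched as a simple filter. This is a minimal illustration that assumes the CheckM estimates have already been read into records; the BinQuality type, the select_bins function, and the example values are hypothetical and are not taken from the HT-ARGfinder source code.

```python
from typing import NamedTuple


class BinQuality(NamedTuple):
    name: str
    completeness: float   # percent, as estimated by CheckM
    contamination: float  # percent, as estimated by CheckM


def select_bins(bins, min_completeness=40.0, max_contamination=5.0):
    """Keep bins that meet both thresholds (both bounds inclusive)."""
    return [
        b for b in bins
        if b.completeness >= min_completeness
        and b.contamination <= max_contamination
    ]


# Made-up example values
example_bins = [
    BinQuality("bin.1", 92.3, 1.2),  # passes
    BinQuality("bin.2", 38.0, 0.0),  # fails completeness
    BinQuality("bin.3", 55.0, 7.5),  # fails contamination
    BinQuality("bin.4", 40.0, 5.0),  # passes, thresholds are inclusive
]
kept = [b.name for b in select_bins(example_bins)]
print(kept)  # ['bin.1', 'bin.4']
```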

2.1.5 Taxonomic Classification

After binning, the genomes are taxonomically classified using GTDB-Tk (v1.6.0), which provides objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) (Chaumeil et al., 2020).

2.1.6 HGT Detection

MetaCHIP, a pipeline for reference-independent HGT identification in metagenomic data (Song et al., 2019), is used for detecting HGT at the community level. MetaCHIP implements a combination of both best match and phylogenetic approaches. First, it uses sequence similarity comparison to identify putative horizontally transferred open reading frames (ORFs) for user-defined taxonomy ranks (e.g., class, order, family, or genus). Then phylogenetic trees are constructed for the putative HGT gene candidates and compared to the species trees determined by single-copy genes. Finally, reconciliation of the gene trees and species trees is performed using Ranger-DTL v2.0 (Bansal et al., 2018) to identify HGTs and donors and recipients. MetaCHIP allows users to specify the taxonomy ranks for which HGTs are identified. The default is “class,” which HT-ARGfinder retains.
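The intuition behind the best-match step can be conveyed with a deliberately simplified sketch: a gene becomes a candidate when its most similar gene in another genome belongs to a different taxonomic group at the chosen rank. This is not MetaCHIP's implementation, which applies further criteria and confirms candidates phylogenetically; the function name, the input structures, and the 90% identity cutoff are all assumptions made for illustration.

```python
def candidate_hgts(best_hits, gene_to_genome, genome_to_class, min_identity=90.0):
    """Flag genes whose best hit lies in a genome of a different class.

    best_hits: iterable of (query_gene, subject_gene, percent_identity).
    min_identity is an illustrative cutoff, not a value from the article.
    """
    candidates = []
    for query, subject, identity in best_hits:
        query_class = genome_to_class[gene_to_genome[query]]
        subject_class = genome_to_class[gene_to_genome[subject]]
        if query_class != subject_class and identity >= min_identity:
            candidates.append((query, subject, query_class, subject_class))
    return candidates


# Made-up example data
hits = [("g1", "g7", 99.2), ("g2", "g8", 97.5), ("g3", "g9", 72.0)]
gene_to_genome = {"g1": "bin.1", "g2": "bin.1", "g3": "bin.1",
                  "g7": "bin.2", "g8": "bin.3", "g9": "bin.2"}
genome_to_class = {"bin.1": "Gammaproteobacteria",
                   "bin.2": "Bacilli",
                   "bin.3": "Gammaproteobacteria"}

found = candidate_hgts(hits, gene_to_genome, genome_to_class)
print(found)  # [('g1', 'g7', 'Gammaproteobacteria', 'Bacilli')]
```

Only g1 is flagged: g2 matches a gene from the same class, and g3 falls below the identity cutoff.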

2.1.7 Databases and Sequence Alignment

The detected horizontally transferred genes are finally aligned against two reference databases, DeepARG-DB (Arango-Argoty et al., 2018) and CARD (Alcock et al., 2020) using DIAMOND (Buchfink et al., 2015) to determine ARGs. CARD represents a highly-curated database of previously described ARGs that is updated periodically. In contrast, DeepARG-DB is a larger database that encompasses the ARGs included in CARD and additional ARGs detected from a deep learning algorithm. There is an option to use new databases in our pipeline. Users can easily execute the command and include new databases to check ARGs against those databases.
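To make the annotation step concrete, the sketch below parses DIAMOND tabular output (the default 12-column BLAST-style format) and keeps the best-scoring database hit for each horizontally transferred gene. The identity and e-value cutoffs are placeholders chosen for illustration, since the article does not state the thresholds used, and the function name and example records are hypothetical.

```python
import csv
import io

# Column order of DIAMOND's default tabular output (BLAST outfmt 6)
DIAMOND_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_arg_hits(handle, min_identity=50.0, max_evalue=1e-10):
    """Map each query gene to its best database hit by bit score.

    The two cutoffs are illustrative assumptions, not values from the article.
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=DIAMOND_COLUMNS, delimiter="\t")
    for row in reader:
        identity = float(row["pident"])
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if identity < min_identity or evalue > max_evalue:
            continue  # hit too weak to call an ARG
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[1]:
            best[row["qseqid"]] = (row["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Made-up alignment records
records = [
    ["gene_1", "ARG_A", "95.0", "200", "10", "0", "1", "200", "1", "200", "1e-50", "400"],
    ["gene_1", "ARG_B", "80.0", "200", "40", "0", "1", "200", "1", "200", "1e-30", "250"],
    ["gene_2", "ARG_C", "30.0", "100", "70", "0", "1", "100", "1", "100", "1e-3", "40"],
]
example_tsv = "\n".join("\t".join(r) for r in records) + "\n"

annotations = best_arg_hits(io.StringIO(example_tsv))
print(annotations)  # {'gene_1': 'ARG_A'}
```

Running the same function once per database (for example, once on the CARD alignments and once on the DeepARG-DB alignments) would give the two ARG lists that are compared later in the article.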

2.2 Wastewater Metagenomes Included in This Study

We selected publicly available metagenomic data from a range of wastewaters for inclusion in this study (Table 1). We were particularly interested in data sets, such as hospital sewage and urban conventional activated sludge, that are likely to contain a diverse array of ARGs that had been subject to HGT. All data used in this study are open source and are available at the NCBI Sequence Read Archive (SRA) under the accession numbers reported in Table 1. The data sets comprised hospital sewage, urban conventional activated sludge, and urban sewage samples.

TABLE 1

Environment¹ | Accession no. (NCBI SRA) | Classification according to original study
Hospital sewage | ERX3538875, ERX3538874, ERX3538873, ERX3538872, ERX3538871, ERX3538870, ERX3538869, ERX3538868 | Hospital wastewater
Hospital sewage | SRX7901963, SRX7901962, SRX7901961, SRX7901960, SRX7901959, SRX7901958 | Hospital sewage
Urban conventional activated sludge | SRX3720490, SRX3720489, SRX3720488, SRX3720487, SRX3720486, SRX3720481, SRX3720480, SRX3720479 | Activated sludge
Urban sewage | ERX1795927, ERX1783568, ERX1795930, ERX1783571, ERX1795935, ERX1783576, ERX2608526, ERX2608605, ERX1783583, ERX1783584 | Urban sewage

Summary of data sets used to demonstrate HT-ARGfinder.

Table 1 provides a summary of these samples, including the environments analyzed as named by the original researchers and generalized categories based on common terminology used in this paper (hospital sewage, urban conventional activated sludge, and urban sewage). These datasets included Illumina HiSeq 4000 paired-end sequencing metagenomic data, MiSeq paired-end sequencing metagenomic data, Illumina HiSeq 2500, and Illumina HiSeq 3000 paired-end metagenomic data.

2.3 Demonstrating HT-ARGfinder

We demonstrated HT-ARGfinder using the data summarized in Table 1. The HT-ARGfinder output consisted of two lists of horizontally transferred ARGs, one according to CARD and one according to DeepARG-DB, together with their corresponding estimated directionalities. The numbers of ARGs detected with the two databases were compared. We also analyzed the abundances of different bacterial classes associated with the HGT events in the samples of different environments according to CARD and DeepARG-DB. MetaCHIP was applied to identify the bacterial classes corresponding to the detected horizontally transferred genes. From that list, we extracted the ARGs detected using CARD and DeepARG-DB. Finally, we mapped the output lists of HT-ARGfinder to directed graphs where the nodes represent different bacterial classes involved in HGT, and the directed edges represent the direction of the HGT. This graph provides a visualization of the whole HGT scenario of a metagenomic sample.

3 Results

3.1 ARG Detection

HT-ARGfinder was found to be a robust tool for the detection of horizontally transferred ARGs in complex environmental metagenomic data sets. Table 2 presents the total number of horizontally transferred genes and the total number of ARGs according to the CARD and DeepARG-DB databases. The table shows that raw hospital wastewater and urban conventional activated sludge have many HGT events, whereas fewer horizontally transferred genes were detected in the urban sewage samples. Based on these trends, the raw hospital wastewater and urban conventional activated sludge samples were the focus of the remainder of the HT-ARGfinder demonstration.

TABLE 2

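Environment | Data set | HGTs | ARGs (CARD) | ARGs (DeepARG-DB)
Hospital sewage | ERX3538875 | 98 | 6 | 7
Hospital sewage | ERX3538874 | 61 | 2 | 3
Hospital sewage | ERX3538873 | 264 | 21 | 20
Hospital sewage | ERX3538872 | 4 | 1 | 1
Hospital sewage | ERX3538871 | 9 | 0 | 0
Hospital sewage | ERX3538870 | 4 | 0 | 0
Hospital sewage | ERX3538869 | 20 | 2 | 3
Hospital sewage | ERX3538868 | 9 | 3 | 3
Hospital sewage | SRX7901963 | 30 | 2 | 3
Hospital sewage | SRX7901962 | 14 | 3 | 3
Hospital sewage | SRX7901961 | 31 | 2 | 2
Hospital sewage | SRX7901960 | 9 | 0 | 0
Hospital sewage | SRX7901959 | 0 | |
Hospital sewage | SRX7901958 | 3 | 0 | 0
Urban conventional activated sludge | SRX3720490 | 20 | 1 | 1
Urban conventional activated sludge | SRX3720489 | 11 | 1 | 1
Urban conventional activated sludge | SRX3720488 | 19 | 3 | 3
Urban conventional activated sludge | SRX3720487 | 44 | 2 | 2
Urban conventional activated sludge | SRX3720486 | 8 | 2 | 2
Urban conventional activated sludge | SRX3720481 | 14 | 2 | 2
Urban conventional activated sludge | SRX3720480 | 11 | 1 | 2
Urban conventional activated sludge | SRX3720479 | 24 | 0 | 1
Urban sewage | ERX1795927 | 12 | 3 | 3
Urban sewage | ERX1783568 | 8 | 0 | 0
Urban sewage | ERX1795930 | 1 | 1 | 1
Urban sewage | ERX1783571 | 4 | 1 | 1
Urban sewage | ERX1795935 | 11 | 1 | 1
Urban sewage | ERX1783576 | 18 | 1 | 1
Urban sewage | ERX2608526 | 11 | 0 | 0
Urban sewage | ERX2608605 | 11 | 0 | 0
Urban sewage | ERX1783583 | 12 | 1 | 1
Urban sewage | ERX1783584 | 9 | 0 | 0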

Total number of horizontally transferred genes (HGTs), ARGs according to CARD database, and ARGs according to DeepARG-DB.

As can be observed from Table 2, ARGs represent a remarkably consistent portion of the total horizontally transferred genes detected (approximately 10% for both types of environments). As expected, DeepARG-DB tended to detect a greater number of horizontally transferred ARGs than CARD. Still, the percentages of horizontally transferred genes identified as ARGs were similar for the two databases (Figure 2). This suggests that the relative proportion of mobile ARGs does not vary substantially across the environments tested, although the overall transfer rates may vary.
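As a worked check of this proportion, the sketch below pools the eight urban conventional activated sludge data sets listed in Table 2 and computes the percentage of horizontally transferred genes annotated as ARGs by each database. Pooling all samples before dividing is an assumption made here for simplicity; the article does not state how the values in Figure 2 were aggregated, so the result is indicative only.

```python
# (HGTs, ARGs by CARD, ARGs by DeepARG-DB) for the urban conventional
# activated sludge data sets, as read from Table 2
activated_sludge = {
    "SRX3720490": (20, 1, 1),
    "SRX3720489": (11, 1, 1),
    "SRX3720488": (19, 3, 3),
    "SRX3720487": (44, 2, 2),
    "SRX3720486": (8, 2, 2),
    "SRX3720481": (14, 2, 2),
    "SRX3720480": (11, 1, 2),
    "SRX3720479": (24, 0, 1),
}


def pooled_arg_percentages(rows):
    """Return (CARD %, DeepARG-DB %) of all HGTs pooled across samples."""
    rows = list(rows)
    total_hgts = sum(hgts for hgts, _, _ in rows)
    card = sum(c for _, c, _ in rows)
    deeparg = sum(d for _, _, d in rows)
    return 100 * card / total_hgts, 100 * deeparg / total_hgts


card_pct, deeparg_pct = pooled_arg_percentages(activated_sludge.values())
print(f"CARD: {card_pct:.1f}%  DeepARG-DB: {deeparg_pct:.1f}%")
# CARD: 7.9%  DeepARG-DB: 9.3%
```

The pooled totals are 151 horizontally transferred genes, 12 CARD ARGs, and 14 DeepARG-DB ARGs, which is in line with the roughly 10% reported in the text.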

FIGURE 2

3.2 Identification of Donor and Recipient Bacterial Classes

MetaCHIP provides a prediction of the donor and recipient bacteria involved in each HGT event. HT-ARGfinder improves upon this feature by filtering events related explicitly to ARG transfers. This yielded valuable information concerning the classes of bacteria actively engaged in the horizontal transfer of ARGs. Further, the abundance of these classes and ARGs can be estimated in each environment. Heatmaps were generated to illustrate the average number of bacterial classes estimated to be engaged in ARG transfer across the samples analyzed in this study (Figure 3).

FIGURE 3

The hospital sewage samples were found to be enriched with Actinomycetia, Negativicutes, Clostridia_A, Clostridia, Bacilli, Acidimicrobiia, Bacteroidia, Gammaproteobacteria, Phycisphaerae, Planctomycetes, and Alphaproteobacteria. Figure 3 shows their average abundance (Supplementary Tables show the number of detected ARG classes in each sample). In urban conventional activated sludge samples, we found UBA2214, Alphaproteobacteria, RBG-16-71-46, Acidimicrobiia, Polyangia, Gammaproteobacteria, Calditrichia, and Bacteroidia (Figure 3). In urban sewage samples, Bacilli, Gammaproteobacteria, Fusobacteriia, Clostridia, Negativicutes, Campylobacteria, Desulfobacteria, and Polyangia were found (Figure 3). Figure 4 shows all the bacterial classes and their abundances as donor and recipient in these metagenomic samples according to both the DeepARG-DB and CARD databases. Clostridia and Gammaproteobacteria had the highest counts as donors, whereas Gammaproteobacteria, Bacteroidia, and Clostridia had the highest counts as recipients.
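Donor and recipient tallies of the kind summarized in Figure 4 can be obtained by counting how often each class appears on either side of a transfer event. The sketch below assumes the events are available as (donor class, recipient class) pairs; the function name and the events themselves are invented for illustration and are not results from this study.

```python
from collections import Counter


def count_roles(events):
    """Count how often each class acts as donor and as recipient.

    events: iterable of (donor_class, recipient_class) pairs.
    """
    events = list(events)
    donors = Counter(donor for donor, _ in events)
    recipients = Counter(recipient for _, recipient in events)
    return donors, recipients


# Made-up transfer events
example_events = [
    ("Clostridia", "Gammaproteobacteria"),
    ("Clostridia", "Bacteroidia"),
    ("Gammaproteobacteria", "Clostridia"),
]
donors, recipients = count_roles(example_events)
print(donors.most_common())
print(recipients.most_common())
```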

FIGURE 4

3.3 Comparison Between CARD and DeepARG-DB

Implementation of CARD versus DeepARG-DB in the pipeline resulted in different numbers of horizontally transferred ARGs detected in some samples (Table 2; Figures 2–4). For the cases in which the discrepancy was the greatest, DeepARG-DB consistently yielded the higher count. This is consistent with DeepARG-DB being a larger database than CARD, encompassing the ARGs included in CARD and additional ARGs identified via the deep learning module.

3.4 Relationship Among Bacterial Classes in Samples

HT-ARGfinder provides additional value in inferring the directionality of ARG transfer among bacterial classes. This can be visualized in directed graphs that illustrate the various transfer relationships among the bacterial classes represented in a sample. For a given metagenomic sample, let G = (V, E) be a directed graph where each vertex represents a bacterial class, each directed edge between two vertices indicates the directionality of the ARG transfer between the two bacterial classes, and the edge weight indicates the number of ARGs transferred. A hospital sewage sample (ERX3538875) was used to visualize the relationship among the bacterial classes (Figure 5). Four bacterial classes involved in HGT of ARGs were identified in this sample: Bacilli, Gammaproteobacteria, Bacteroidia, and Acidimicrobiia. The possible ARGs that are transferred between Gammaproteobacteria and Alphaproteobacteria encoded resistance to macrolides (macB), peptide antibiotics (bcrA), and polymyxins (arnA).
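The graph G = (V, E) defined above can be assembled from a list of transfer events with a few lines of standard-library code. In this sketch each event is a (donor class, recipient class, ARG) triple and the edge weight is the number of distinct ARGs on that edge; the function name and the events, including the placeholder ARG names, are invented for illustration and do not reproduce Figure 5.

```python
from collections import defaultdict


def build_transfer_graph(events):
    """Build a weighted directed graph from ARG transfer events.

    events: iterable of (donor_class, recipient_class, arg_name).
    Returns (vertices, edges), where edges maps (donor, recipient) to the
    number of distinct ARGs transferred in that direction.
    """
    args_per_edge = defaultdict(set)
    vertices = set()
    for donor, recipient, arg in events:
        vertices.update((donor, recipient))
        args_per_edge[(donor, recipient)].add(arg)
    edges = {edge: len(args) for edge, args in args_per_edge.items()}
    return vertices, edges


# Made-up events with placeholder ARG names
example_transfers = [
    ("Gammaproteobacteria", "Bacilli", "arg_1"),
    ("Gammaproteobacteria", "Bacilli", "arg_2"),
    ("Gammaproteobacteria", "Bacilli", "arg_1"),  # duplicate ARG, counted once
    ("Bacteroidia", "Gammaproteobacteria", "arg_3"),
]
vertices, edges = build_transfer_graph(example_transfers)
for (donor, recipient), weight in sorted(edges.items()):
    print(f"{donor} -> {recipient} (weight {weight})")
```

Because the edge keys are ordered pairs, a transfer from class A to class B is kept separate from one in the opposite direction, which is what preserves directionality.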

FIGURE 5

4 Discussion and Conclusion

HT-ARGfinder provides a comprehensive and user-friendly tool specifically directed toward identifying mobile ARGs in metagenomic data sets and estimating the directionality of their transfer among bacterial classes. This pipeline can provide substantial value in targeting environmental monitoring efforts towards tracking mobile ARGs and can help to better inform and direct efforts to stop the spread of antibiotic resistance. Here we demonstrated the tool’s utility for a spectrum of wastewater samples. We used CARD and DeepARG-DB as the ARG databases because they are well known, but the pipeline can easily be modified by replacing or supplementing them with more recent databases, for example, HMD-ARG-DB (Li et al., 2021). HMD-ARG-DB is composed of 17,282 sequences, gathered and cleaned from 7 published ARG databases: CARD, AMRFinder (Feldgarden et al., 2019), ResFinder (Zankari et al., 2012), ARG-ANNOT (Gupta et al., 2014), DeepARG, MEGARes (Doster et al., 2020) and Resfams (Gibson et al., 2015).

Our tool detects ARGs that are transferred horizontally, irrespective of the vehicle of transfer. Such ARGs are carried by mobile genetic elements such as plasmids, transposons, integrons, and genomic islands. In contrast, the approach of Khezri et al. (2020) detects only plasmid-mediated ARGs by first identifying plasmids. Because our tool targets all ARGs transferred through HGT, it can capture ARGs mediated by any of these mobile genetic elements.

The analysis was consistent with the expectation of high abundances of mobile ARGs in hospital wastewater and urban conventional activated sludge samples. While the focus was on wastewater samples, the pipeline can readily be applied to various other complex metagenomes, including different aquatic and wastewater samples and gastrointestinal microbiota, that are broadly of interest for ARG surveillance and research. Based on this study, it was apparent that ARGs represent a substantial fraction (∼10%) of genes actively transferred in a given wastewater microbiome. It would be interesting to examine further the remaining genes subject to HGT, which might include novel, yet-to-be-discovered ARGs. This is apparent in recognizing that many horizontally transferred ARGs were found when the pipeline queried DeepARG-DB. Additionally, such genes could encode other relevant functions, such as metal resistance or processes related to gene mobility. Further analysis of this kind, examining the context of the ARGs, can help resolve the drivers of their mobility. The directionality analysis is also of significant value in tracking the movement of ARGs across bacterial hosts. Such information could provide value in identifying which bacteria should be targeted in mitigation efforts aimed at stemming the spread of antimicrobial resistance.

Statements

Data availability statement

The source code of HT-ARGfinder can be found at https://github.com/Badhan023/HTARGfinder. All the data sets that have been used are publicly available in the NCBI Sequence Read Archive (SRA).

Author contributions

LZ conceived the original idea for the pipeline. BD, ME, and NM accordingly planned and executed the study. AP provided consultation in the application of the pipeline and interpretation of results. BD designed and implemented the pipeline and wrote the initial draft of the manuscript. All authors read, modified, and approved the final manuscript.

Funding

This work was supported in part by the U.S. National Science Foundation Awards 1545756, 2004751, and 2125798.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fenvs.2022.901917/full#supplementary-material

Footnotes

1. Environments were re-classified based on information reported by the authors for consistency across this study.

Keywords

antibiotic resistance gene, horizontal gene transfer, metagenomics, urban conventional activated sludge, hospital sewage, urban sewage

Citation

Das B, Emon MI, Moumi NA, Sein J, Pruden A, Heath LS and Zhang L (2022) HT-ARGfinder: A Comprehensive Pipeline for Identifying Horizontally Transferred Antibiotic Resistance Genes and Directionality in Metagenomic Sequencing Data. Front. Environ. Sci. 10:901917. doi: 10.3389/fenvs.2022.901917

Received

22 March 2022

Accepted

19 May 2022

Published

03 June 2022

Volume

10 - 2022

Edited by

Zhi Wang, Innovation Academy for Precision Measurement Science and Technology (CAS), China

Reviewed by

Bing Li, Tsinghua University, China

Zeyou Chen, Nankai University, China

Yuyi Yang, Wuhan Botanical Garden (CAS), China

*Correspondence: Liqing Zhang,

This article was submitted to Water and Wastewater Management, a section of the journal Frontiers in Environmental Science
