<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2022.946070</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Technology and Code</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Daily Reports on Phage-Host Interactions</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Albrycht</surname> <given-names>Kamil</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1821024/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Rynkiewicz</surname> <given-names>Adam A.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Harasymczuk</surname> <given-names>Michal</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1888941/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Barylski</surname> <given-names>Jakub</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1888155/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Zielezinski</surname> <given-names>Andrzej</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1042014/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University</institution>, <addr-line>Pozna&#x0144;</addr-line>, <country>Poland</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Traumatology, Orthopaedics and Hand Surgery, University of Medical Sciences</institution>, <addr-line>Poznan</addr-line>, <country>Poland</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Molecular Virology, Faculty of Biology, Adam Mickiewicz University</institution>, <addr-line>Pozna&#x0144;</addr-line>, <country>Poland</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Maria Dzunkova, University of Valencia, Spain</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Witold Kot, University of Copenhagen, Denmark; Pierre Chaumeil, The University of Queensland, Australia; Lubos Klucar, Institute of Molecular Biology (SAS), Slovakia</p></fn>
<corresp id="c001">&#x002A;Correspondence: Andrzej Zielezinski, <email>andrzejz@amu.edu.pl</email></corresp>
<fn fn-type="other" id="fn002"><p><sup>&#x2020;</sup>ORCID: Jakub Barylski, <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-6630-6932">orcid.org/0000-0001-6630-6932</ext-link>; Andrzej Zielezinski, <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-8096-3776">orcid.org/0000-0002-8096-3776</ext-link></p></fn>
<fn fn-type="other" id="fn004"><p>This article was submitted to Phage Biology, a section of the journal Frontiers in Microbiology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>946070</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>05</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 Albrycht, Rynkiewicz, Harasymczuk, Barylski and Zielezinski.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Albrycht, Rynkiewicz, Harasymczuk, Barylski and Zielezinski</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Understanding phage-host relationships is crucial for the study of virus biology and the application of phages in biotechnology and medicine. However, information concerning the range of hosts for bacterial and archaeal viruses is scattered across numerous databases and is difficult to obtain. Therefore, here we present PHD (<underline>P</underline>hage &#x0026; <underline>H</underline>ost <underline>D</underline>aily), a web application that offers a comprehensive, up-to-date catalog of known phage-host associations that allows users to select viruses targeting specific bacterial and archaeal taxa of interest. Our service combines the latest information on virus-host interactions from seven source databases with current taxonomic classification retrieved directly from the groups and institutions responsible for its maintenance. The web application also provides summary statistics on host and virus diversity, their pairwise interactions, and the host range of deposited phages. PHD is updated daily and available at <ext-link ext-link-type="uri" xlink:href="http://phdaily.info">http://phdaily.info</ext-link> or <ext-link ext-link-type="uri" xlink:href="http://combio.pl/phdaily">http://combio.pl/phdaily</ext-link>.</p>
</abstract>
<kwd-group>
<kwd>phage</kwd>
<kwd>host</kwd>
<kwd>bacteria</kwd>
<kwd>archaea</kwd>
<kwd>phage-host interactions</kwd>
<kwd>database</kwd>
<kwd>web application</kwd>
</kwd-group>
<contract-sponsor id="cn001">Narodowe Centrum Nauki<named-content content-type="fundref-id">10.13039/501100004281</named-content></contract-sponsor><contract-sponsor id="cn002">Narodowe Centrum Bada&#x0144; i Rozwoju<named-content content-type="fundref-id">10.13039/501100005632</named-content></contract-sponsor>
<counts>
<fig-count count="1"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="40"/>
<page-count count="6"/>
<word-count count="4961"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>Phages play a pivotal role in many ecosystems by shaping the structure of bacterial communities (<xref ref-type="bibr" rid="B12">Dion et al., 2020</xref>). They are also the main drivers of horizontal gene transfer and bacterial evolution (<xref ref-type="bibr" rid="B4">Breitbart et al., 2018</xref>). As most viruses have narrow host ranges that span no more than a species or genus (<xref ref-type="bibr" rid="B28">Paez-Espino et al., 2016</xref>), they can be used to control the population of certain bacterial species with minimal risk of disturbing the entire microbiota. Thus, phages have been used in diagnostics (<xref ref-type="bibr" rid="B33">Schofield et al., 2012</xref>), drug design (<xref ref-type="bibr" rid="B25">Nixon et al., 2014</xref>), the treatment of human and animal infections (<xref ref-type="bibr" rid="B10">Dedrick et al., 2019</xref>; <xref ref-type="bibr" rid="B14">Eskenazi et al., 2022</xref>), agriculture (<xref ref-type="bibr" rid="B5">Buttimer et al., 2017</xref>), food preservation (<xref ref-type="bibr" rid="B36">Sulakvelidze, 2013</xref>), and wastewater treatment (<xref ref-type="bibr" rid="B19">Jassim et al., 2016</xref>).</p>
<p>Paradoxically, although information on host specificity is a crucial part of phage biology and a prerequisite for its practical application, it is not readily accessible. In principle, the databases of the National Center for Biotechnology Information (NCBI) such as RefSeq (<xref ref-type="bibr" rid="B26">O&#x2019;Leary et al., 2016</xref>) or GenBank (<xref ref-type="bibr" rid="B31">Sayers et al., 2019</xref>) provide host information for most viral genomic sequences. Unfortunately, this information is stored in error-prone textual form with no direct links to the valid taxonomic classification of the host. Thus, the information is often ambiguous (e.g., simply &#x201C;endosymbiont&#x201D;), too generic (e.g., &#x201C;Bacteria&#x201D;, &#x201C;Proteobacteria&#x201D;), taxonomically outdated (e.g., &#x201C;<italic>Bacillus megaterium</italic>&#x201D; instead of <italic>Priestia megaterium</italic>), or misspelled (e.g., &#x201C;<italic>Bacilluls</italic>&#x201D; instead of <italic>Bacillus</italic>). These issues have been addressed in two excellent databases, Virus-Host DB (<xref ref-type="bibr" rid="B23">Mihara et al., 2016</xref>) and NCBI Virus (<xref ref-type="bibr" rid="B18">Hatcher et al., 2017</xref>), both of which provide access to host taxonomy based on the curation of plain-text host descriptors in GenBank and RefSeq. However, these databases only partially overlap in assignments between viral and prokaryotic species due to different genome selection criteria and host information-extraction methods (e.g., Virus-Host DB contains only viruses with complete genomes and provides host information based on a manual literature survey). Host information is also sporadically available in virus protein records from UniProt-SwissProt (<xref ref-type="bibr" rid="B3">Bateman et al., 2021</xref>) and annotations of protein-protein interactions from IntAct Molecular Interaction Database (<xref ref-type="bibr" rid="B27">Orchard et al., 2014</xref>). 
The MVP database (Microbe Versus Phage) provides phage&#x2013;host interactions from RefSeq and GenBank with the addition of prophage sequence predictions from assembled metagenomic sequences (<xref ref-type="bibr" rid="B16">Gao et al., 2018</xref>). Consequently, information regarding known phage-host interactions is scattered across multiple databases, each with different content, data access, and update times. This situation is inconvenient for researchers and hinders attempts at systematic, statistical analyses of phage-host interactions.</p>
<p>To address this problem, we have developed PHD (<underline>P</underline>hage &#x0026; <underline>H</underline>ost <underline>D</underline>aily), a daily updated web application that combines information on phage-host interactions from seven sources &#x2014; NCBI Virus, Virus-Host DB, MVP, RefSeq, GenBank, UniProt, and IntAct. PHD provides information on hosts for prokaryotic viruses at the species level using two alternative taxonomic classification systems, NCBI Taxonomy (<xref ref-type="bibr" rid="B32">Schoch et al., 2020</xref>) and Genome Taxonomy Database (GTDB) (<xref ref-type="bibr" rid="B29">Parks et al., 2020</xref>, <xref ref-type="bibr" rid="B30">2022</xref>). Virus species are classified according to NCBI Taxonomy and the International Committee on Taxonomy of Viruses (ICTV) (<xref ref-type="bibr" rid="B17">Gorbalenya et al., 2020</xref>; <xref ref-type="bibr" rid="B22">Krupovic et al., 2021</xref>). PHD also points to genome assemblies available for each virus species by keeping track of the NCBI Assembly resource (<xref ref-type="bibr" rid="B21">Kitts et al., 2016</xref>) and the INPHARED database of complete phage genomes (<xref ref-type="bibr" rid="B7">Cook et al., 2021</xref>). PHD also publishes daily reports on the current catalog of phage-host interactions. Finally, the web application offers easy access to data by providing user-friendly search, browse, and filter utilities not included in earlier phage-host databases.</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<p>The workflow of data collection related to virus genomic sequences, host information, and taxonomic classification is shown in <xref ref-type="supplementary-material" rid="FS1">Supplementary Figure 1</xref>.</p>
<sec id="S2.SS1">
<title>Virus Sequence Data</title>
<p>Virus genome assemblies from GenBank and RefSeq are downloaded from NCBI (<xref ref-type="bibr" rid="B21">Kitts et al., 2016</xref>) using genome_updater v. 0.5.1 software<sup><xref ref-type="fn" rid="footnote1">1</xref></sup>. The assembly level of each genome (Complete Genome/Chromosome, Scaffold, Contig) is extracted from assembly report files. Nucleotide sequences of viruses present in GenBank or RefSeq but not in the Assembly database are retrieved in FASTA and flat-file formats from NCBI Virus (<xref ref-type="bibr" rid="B18">Hatcher et al., 2017</xref>) and the RefSeq FTP server. These sequences are labeled &#x201C;Complete Genome&#x201D; if they are included in the monthly update of complete phage genomes in INPHARED (<xref ref-type="bibr" rid="B7">Cook et al., 2021</xref>).</p>
</sec>
<sec id="S2.SS2">
<title>Taxonomic Classification</title>
<p>National Center for Biotechnology Information (NCBI) taxonomy tables are downloaded from the NCBI FTP server. The ICTV taxonomy of viruses is retrieved from the Virus Metadata Resource at the ICTV website, and species are mapped to the corresponding NCBI taxonomy identifiers based on the RefSeq/GenBank genome accessions provided by ICTV. The GTDB taxonomy of Bacteria and Archaea is obtained from metadata files provided in the latest GTDB release. The bacterial and archaeal lineages are mapped between NCBI and GTDB taxonomies based on the NCBI taxonomy identifiers provided in the GTDB archaeal and bacterial metadata files.</p>
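<p>The mapping between NCBI and GTDB lineages described above can be illustrated with a minimal sketch. It is not the PHD implementation: the column names <monospace>gtdb_taxonomy</monospace> and <monospace>ncbi_species_taxid</monospace> are assumed to match the GTDB metadata files, the accessions are placeholders, and the lineage strings are abbreviated for illustration.</p>
<preformat>
```python
import csv
import io
from collections import defaultdict


def map_ncbi_to_gtdb(metadata_tsv):
    """Map each NCBI species taxid to the set of GTDB species names seen for it.

    `metadata_tsv` is any iterable of tab-separated lines with a header row.
    The column names used below are assumptions about the metadata layout.
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(metadata_tsv, delimiter="\t"):
        # GTDB lineages are semicolon-separated and end with "s__Species name".
        gtdb_species = row["gtdb_taxonomy"].split(";")[-1]
        if gtdb_species.startswith("s__"):
            gtdb_species = gtdb_species[3:]
        mapping[row["ncbi_species_taxid"]].add(gtdb_species)
    return dict(mapping)


# Hypothetical, abbreviated rows; accessions are placeholders, not real records.
EXAMPLE_ROWS = [
    ["accession", "gtdb_taxonomy", "ncbi_species_taxid"],
    ["GENOME_A", "d__Bacteria;g__Priestia;s__Priestia megaterium", "1404"],
    ["GENOME_B", "d__Bacteria;g__Priestia;s__Priestia megaterium", "1404"],
    ["GENOME_C", "d__Bacteria;g__Escherichia;s__Escherichia coli", "562"],
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)

mapping = map_ncbi_to_gtdb(io.StringIO(EXAMPLE_TSV))
```
</preformat>
<p>Values are kept as sets because one NCBI species may correspond to more than one GTDB species; a real pipeline would need an explicit rule for resolving such cases.</p>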
</sec>
<sec id="S2.SS3">
<title>Host Information</title>
<p>Virus-host assignments are retrieved from: (i) the NCBI Virus website, (ii) the TSV file provided by Virus-Host DB, (iii) the text files from MVP, (iv) the GenBank flat files (in the &#x201C;/isolated_host=&#x201D; or &#x201C;/host=&#x201D; qualifiers), (v) the protein-protein interactions from the IntAct FTP server, and (vi) the protein sequence entries in UniProt-SwissProt (&#x201C;OH&#x201D; line in UniProt entry). The extracted names and taxonomy identifiers of hosts are queried against NCBI Taxonomy using TaxonKit v. 0.10.1 (<xref ref-type="bibr" rid="B34">Shen and Ren, 2021</xref>) to retrieve complete host lineages. Only bacterial and archaeal hosts specified at the species level are included.</p>
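<p>As an illustration of step (iv), the sketch below pulls host names out of the host qualifiers of a GenBank flat file using only the Python standard library. The regular expression and the helper name <monospace>extract_hosts</monospace> are assumptions made for this example, not the code used by PHD, and the sample record is a hand-written fragment.</p>
<preformat>
```python
import re

# Matches the two qualifiers named in the text, e.g. /host="Escherichia coli".
# Other qualifiers such as /lab_host are deliberately not matched.
HOST_QUALIFIER = re.compile(r'/(?:isolated_host|host)\s*=\s*"([^"]*)"')


def extract_hosts(flat_file_text):
    """Return the unique host descriptors found in a flat-file string, in order."""
    hosts = []
    for match in HOST_QUALIFIER.finditer(flat_file_text):
        # Long qualifier values wrap across lines; collapse the whitespace.
        name = " ".join(match.group(1).split())
        if name and name not in hosts:
            hosts.append(name)
    return hosts


# Hand-written fragment of a feature table, for illustration only.
SAMPLE_RECORD = '''
     source          1..5386
                     /organism="Escherichia phage phiX174"
                     /host="Escherichia coli"
                     /lab_host="Escherichia coli C"
'''

hosts = extract_hosts(SAMPLE_RECORD)
```
</preformat>
<p>The extracted strings are free text, so they would still need to be resolved against NCBI Taxonomy (for example with TaxonKit, as described above) before any species-level filtering.</p>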
</sec>
<sec id="S2.SS4">
<title>Host Range</title>
<p>For a prokaryotic virus infecting only one host species, the host range is set to this species. For a virus infecting multiple host species, the host range is defined as the taxonomic rank of the last common ancestor of all its hosts in the NCBI Taxonomy database.</p>
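<p>The host-range rule can be sketched as follows. This is a simplified illustration under stated assumptions: lineages are given as dictionaries keyed by a fixed list of ranks, and the example lineages are hand-written rather than retrieved from NCBI Taxonomy. The function name <monospace>host_range</monospace> is hypothetical.</p>
<preformat>
```python
# Ranks ordered from most specific to most general (a simplified hierarchy).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "superkingdom"]


def host_range(host_lineages):
    """Return (rank, taxon) of the most specific taxon shared by all hosts.

    Each lineage is a dict mapping a rank name to a taxon name.
    Returns None when the hosts share no taxon at any listed rank.
    """
    if not host_lineages:
        raise ValueError("at least one host lineage is required")
    for rank in RANKS:
        taxa = {lineage.get(rank) for lineage in host_lineages}
        # A single, non-missing name at this rank means all hosts agree here.
        if len(taxa) == 1 and None not in taxa:
            return rank, taxa.pop()
    return None


# Hand-written example lineages, for illustration only.
E_COLI = {
    "species": "Escherichia coli", "genus": "Escherichia",
    "family": "Enterobacteriaceae", "order": "Enterobacterales",
    "class": "Gammaproteobacteria", "phylum": "Proteobacteria",
    "superkingdom": "Bacteria",
}
S_ENTERICA = {
    "species": "Salmonella enterica", "genus": "Salmonella",
    "family": "Enterobacteriaceae", "order": "Enterobacterales",
    "class": "Gammaproteobacteria", "phylum": "Proteobacteria",
    "superkingdom": "Bacteria",
}
B_SUBTILIS = {
    "species": "Bacillus subtilis", "genus": "Bacillus",
    "family": "Bacillaceae", "order": "Bacillales",
    "class": "Bacilli", "phylum": "Firmicutes",
    "superkingdom": "Bacteria",
}

single_host = host_range([E_COLI])
same_family = host_range([E_COLI, S_ENTERICA])
different_phyla = host_range([E_COLI, B_SUBTILIS])
```
</preformat>
<p>Scanning the ranks from species upward and stopping at the first rank on which all hosts agree yields the last common ancestor, because hosts that share a taxon at one rank also share all of its parent taxa.</p>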
</sec>
<sec id="S2.SS5">
<title>Application Development</title>
<p>The PHD web interface was developed in React.js (v. 17.0.2), Next.js (v. 11.1.3), and Highcharts.js (v. 10.0.0). The database querying system was developed in Django (v. 4.0.0), Django REST framework (v. 3.13.1), and Python (v. 3.9.5), using SQLite as the database management system.</p>
</sec>
</sec>
<sec id="S3" sec-type="results">
<title>Results</title>
<sec id="S3.SS1">
<title>Taxonomic and Genomic Diversity of Viruses</title>
<p>As of May 1, 2022, 12,123 virus species have prokaryotic hosts reported at the species level. Only about one-quarter of these viruses (24%) have been classified by ICTV, indicating a significant delay between NCBI submissions and classification by the committee. However, the number of taxa at higher ranks, from genus to phylum, is similar between NCBI and ICTV taxonomies (<xref ref-type="fig" rid="F1">Figure 1A</xref>). Both systems classify prokaryotic viruses into 47 families. More than three-quarters of virus species remain in the morphotype-based <italic>Siphoviridae</italic>, <italic>Myoviridae</italic>, and <italic>Podoviridae</italic> families (<xref ref-type="fig" rid="F1">Figure 1B</xref>). These historically important umbrella groups gather phages that lack a properly resolved phylogeny-based taxonomy, and they are scheduled for dissolution (<xref ref-type="bibr" rid="B1">Adriaenssens, 2021</xref>; <xref ref-type="bibr" rid="B37">Turner et al., 2021</xref>). Aside from these, the largest family is <italic>Autographiviridae</italic>, which represents 6% of the total viral species.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Genomic and taxonomic diversity of prokaryotic viruses and their hosts (as of May 1, 2022). <bold>(A)</bold> Number of different virus taxonomic units across six taxonomic ranks (from species to phylum) according to National Center for Biotechnology Information (NCBI) Taxonomy and the International Committee on Taxonomy of Viruses (ICTV). <bold>(B)</bold> Ten most abundant virus families represented by the highest number of virus species. <bold>(C,D)</bold> The number of representative viral genomes stratified by genome composition and assembly level. <bold>(E)</bold> Size distribution of completely sequenced virus genomes. The red vertical line indicates the median genome size, and the light gray background represents the range between the 5<sup>th</sup> and 95<sup>th</sup> percentiles. <bold>(F)</bold> Proportion of viruses isolated on the top 15 most abundant host genera (i.e., host genera infected by the highest number of viruses). <bold>(G)</bold> Number of different taxonomic units of bacterial and archaeal hosts across seven taxonomic ranks compared to the number of all bacterial and archaeal taxa present in NCBI Assembly. <bold>(H)</bold> Ten most abundant host classes represented by the highest number of known host species. <bold>(I)</bold> Number of virus species isolated on a different number of host species. <bold>(J)</bold> Comparison of the number of pairwise interactions between virus and host species in different databases. <bold>(K)</bold> Unique and shared virus-host interactions among four databases. The bar chart indicates the intersection size of virus-host interactions. Connected black dots on the bottom panel indicate which combination of the databases was considered for each intersection. Single, unconnected black dots represent virus-host interactions unique to each database. <bold>(L)</bold> Number of genomes and virus species reported daily in the last 2 months (from March 1 to May 1, 2022). 
Virus genomes were assigned to species based on the then-most-recent NCBI Taxonomy.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmicb-13-946070-g001.tif"/>
</fig>
<p>Consequently, most sequences in the database come from double-stranded DNA (dsDNA) viruses (<xref ref-type="fig" rid="F1">Figure 1C</xref>). Single-stranded DNA (ssDNA) viruses account for only 1% of viral genomes and belong to three orders, <italic>Tubulavirales</italic> (<italic>n</italic> = 71), <italic>Petitvirales</italic> (<italic>n</italic> = 67), and <italic>Haloruvirales</italic> (<italic>n</italic> = 8; two betapleolipovirus species in <italic>Haloruvirales</italic> have circular dsDNA genomes containing single-stranded discontinuities), and to lower taxonomic units not classified at the order level (<italic>n</italic> = 3). RNA viruses account for less than 1% of virus species (<italic>n</italic> = 27) and belong to five families: <italic>Cystoviridae</italic> (<italic>n</italic> = 16), <italic>Leviviridae</italic> (<italic>n</italic> = 5), <italic>Fiersviridae</italic> (<italic>n</italic> = 4), <italic>Steitzviridae</italic> (<italic>n</italic> = 1), and <italic>Duinviridae</italic> (<italic>n</italic> = 1).</p>
<p>Most virus species (92%) are represented by a single genome assembly. The remaining species mainly have two (4%) or three (1%) genomes assigned. The highest numbers of genomes have been reported for two closely related species, <italic>Escherichia</italic> virus G4/<italic>Gequatrovirus G4</italic> (<italic>n</italic> = 343) and <italic>Escherichia</italic> virus phiX174/<italic>Sinsheimervirus phiX174</italic> (<italic>n</italic> = 105). In both cases, the majority of retrieved sequences represent strains obtained during <italic>in vitro</italic> evolution experiments (<xref ref-type="bibr" rid="B9">Cuevas et al., 2009</xref>; <xref ref-type="bibr" rid="B13">Domingo-Calap et al., 2009</xref>). Over 92% of virus species (<italic>n</italic> = 11,195) have complete genomes, and the remaining viruses are represented by genomic fragments (7%; <italic>n</italic> = 884) or partial genomes at the contig and scaffold levels (1%; <italic>n</italic> = 44) (<xref ref-type="fig" rid="F1">Figure 1D</xref>). Most virus species are represented only by assemblies from GenBank, but 34% are also covered by the RefSeq database.</p>
<p>The size of complete genomes varies between 1.4 and 551.6 kb and is not uniformly distributed (<xref ref-type="fig" rid="F1">Figure 1E</xref>), which may be due to a bias linked to isolation techniques, sparse sampling of different virus taxa, or natural constraints on the size of viral genomes. Although <italic>Campylobacter</italic> phage C10 is the shortest phage genome sequence (1,417 bp) submitted to NCBI, the record itself (accession: <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="MG065651">MG065651</ext-link>) is flagged by the GenBank staff as &#x201C;unverified&#x201D;. The second smallest phage genome (2,435 bp) belongs to <italic>Leuconostoc</italic> phage L5, which is often cited as the phage with the smallest known genome (<xref ref-type="bibr" rid="B12">Dion et al., 2020</xref>). At the other end of the distribution (<xref ref-type="fig" rid="F1">Figure 1E</xref>), there are 267 phage species (2%) with genomes of more than 200 kb, often referred to as &#x201C;jumbo&#x201D; or &#x201C;giant&#x201D; phages (<xref ref-type="bibr" rid="B40">Yuan and Gao, 2017</xref>; <xref ref-type="bibr" rid="B2">Al-Shayeb et al., 2020</xref>). Such phages have been isolated on only 73 bacterial species from 38 genera, mostly from <italic>Erwinia</italic>, <italic>Vibrio</italic>, <italic>Aeromonas</italic>, <italic>Pseudomonas</italic>, and <italic>Klebsiella</italic>. Phages with genomes &#x003E; 500 kb (<italic>n</italic> = 12; <xref ref-type="bibr" rid="B11">Devoto et al., 2019</xref>) have been isolated from <italic>Prevotella</italic> species (e.g., <italic>Prevotella</italic> phage Lak-B8 has the largest genome, at 551,627 bp).</p>
</sec>
<sec id="S3.SS2">
<title>Virus-Host Interactions</title>
<p>Sequenced viruses appear to represent only a small fraction of the actual phage diversity, as half of the virus species infect only eight host genera (<italic>Mycolicibacterium</italic>, <italic>Escherichia</italic>, <italic>Streptococcus</italic>, <italic>Vibrio</italic>, <italic>Pseudomonas</italic>, <italic>Salmonella</italic>, <italic>Klebsiella</italic>, and <italic>Staphylococcus</italic>) (<xref ref-type="fig" rid="F1">Figure 1F</xref>). One reason for this disproportion may be a bias toward culturable host taxa in isolation efforts, e.g., the SEA-PHAGES program (Science Education Alliance&#x2013;Phage Hunters Advancing Genomics and Evolutionary Science), which focuses mainly on phages infecting <italic>Mycolicibacterium smegmatis</italic> (<xref ref-type="bibr" rid="B20">Jordan et al., 2014</xref>). In total, the viruses were isolated on 944 prokaryotic species, including 875 bacteria and 69 archaea, accounting for 1.5% of all bacterial species (<italic>n</italic> = 60,136) and 2.7% of archaeal species (<italic>n</italic> = 2,597) reported in NCBI Assembly (<xref ref-type="fig" rid="F1">Figure 1G</xref>). Relative to NCBI Taxonomy, the fraction of bacterial and archaeal species with known viruses is even smaller and corresponds to only 0.2% of bacterial (<italic>n</italic> = 471,815) and 0.5% of archaeal species (<italic>n</italic> = 12,718), respectively. Although host species collectively represent 34 classes, three-quarters of them fall into five classes (Gammaproteobacteria, Bacilli, Alphaproteobacteria, Actinomycetia, and Betaproteobacteria) (<xref ref-type="fig" rid="F1">Figure 1H</xref>). Given that all cellular organisms are most likely prey to viral attack (<xref ref-type="bibr" rid="B15">Fuhrman, 1999</xref>), these gaps in host diversity indicate that phage genomic diversity and the scope of virus-host interactions remain largely uncharacterized.</p>
<p>To date, there are 12,725 pairwise linkages between 12,123 viral and 944 prokaryotic species. Most viruses (96.1%; <italic>n</italic> = 11,640) were isolated on a single host species, followed by viruses infecting two host species (3.4%; <italic>n</italic> = 419) (<xref ref-type="fig" rid="F1">Figure 1I</xref>). These two hosts mostly belong to the same genus or family and only sporadically span a broader range (e.g., <italic>Pseudomonas</italic> virus PB1 was reported on hosts from two different phyla, represented by the genera <italic>Pseudomonas</italic> and <italic>Chryseobacterium</italic>). The remaining virus species (0.5%; <italic>n</italic> = 64) were reported to infect more than two host species (<xref ref-type="fig" rid="F1">Figure 1I</xref>). The record-holder is <italic>Pseudomonas</italic> virus PRD1, known to infect nine bacterial species from the Proteobacteria phylum carrying the IncN plasmid.</p>
<p>Most assignments between viral and host species were retrieved from NCBI Virus (93%) and GenBank (87%), followed by Virus-Host DB (35%) and RefSeq (31%) (<xref ref-type="fig" rid="F1">Figure 1J</xref>), indicating that RefSeq lags behind the submission of new virus genomes (because sequence records in RefSeq additionally undergo NCBI curation). Over a quarter (29%) of the assignments were covered by all source databases (<xref ref-type="fig" rid="F1">Figure 1K</xref>). Despite this overlap, these databases differ in the content of virus-host assignments. NCBI Virus provides 1,069 virus-host assignments (8%) that were not present in the other source databases. Similarly, GenBank and Virus-Host DB also have specific assignments that correspond to 3% and 2% of all interactions, respectively. The remaining source databases &#x2013; RefSeq, MVP, UniProt, and IntAct &#x2013; do not contain unique virus-host assignments, but provide support for 33% of the existing interactions.</p>
</sec>
<sec id="S3.SS3">
<title>Web Interface and Data Access</title>
<p><underline>P</underline>hage &#x0026; <underline>H</underline>ost <underline>D</underline>aily (PHD) offers two ways to access information on interactions between virus and host species: searching for a particular virus or host taxon, or browsing taxonomic trees.</p>
<p>The Search view allows users to look for viruses targeting bacterial or archaeal taxa of interest or prokaryotic taxa that are infected by phages from a given viral taxon. The view allows for searches corresponding to the names and identifiers used in NCBI, GTDB, and ICTV taxonomies. For convenience, the search box features an autocomplete functionality that suggests terms matching the user query. The Browse view provides a hierarchical exploration of virus-host interactions through virus or host taxonomies based on NCBI or GTDB Taxonomy. The interactive interface allows users to expand branches of virus or host trees and view the number of virus-host interactions associated with each node.</p>
<p>Once the query taxon is selected from either the Search or Browse view, PHD presents a table of pairwise interactions between viral and host species belonging to the query virus or host taxon. For each virus-host interaction, PHD lists the source database(s), the taxonomic affiliations of both virus and host, the genome composition of the virus, and the assembly completeness of its representative genome. This table is the central component of PHD and can be filtered using multiple combinations of parameters (e.g., all virus-host interactions within <italic>Enterobacterales</italic> that are supported by RefSeq and Virus-Host DB and contain viruses with complete genomes).</p>
<p>Each virus species has an associated web page indicating host range, genomic sequences, taxonomy, and nomenclature. The available sequence data for a given virus species are organized into genome assemblies, each listed with its assembly level, sequence length, an indication of whether it is the representative genome, and links to the NCBI Assembly and NCBI Nucleotide resources.</p>
<p>All virus-host interaction data and viral sequences available through the web interface can be downloaded as JSON, GenBank, and FASTA files.</p>
</sec>
</sec>
<sec id="S4" sec-type="discussion">
<title>Discussion</title>
<p>Recent advances in metagenomics have enabled the assembly of nearly complete phage and microbial genomes from environmental samples. This has provided a unique opportunity to study the natural viral diversity and complex dynamics of phage-host interactions (<xref ref-type="bibr" rid="B28">Paez-Espino et al., 2016</xref>; <xref ref-type="bibr" rid="B24">Nayfach et al., 2021</xref>). However, metagenomically derived phages are generally not associated with a host. This gap is slowly being filled by new laboratory methods for high-throughput identification of virus-host interactions (including proximity ligation, viral tagging, phageFISH, and XRM-Seq), but these methods still require careful interpretation by an expert, and thus the pace of discovery lags behind the deluge of metagenomic data (<xref ref-type="bibr" rid="B6">Coclet and Roux, 2021</xref>; <xref ref-type="bibr" rid="B35">Smith et al., 2022</xref>). These issues have prompted the development of bioinformatics tools that predict the potential host(s) from the virus genome sequence and can select candidates for experimental verification of the interaction (<xref ref-type="bibr" rid="B38">Versoza and Pfeifer, 2022</xref>). Some of the most promising approaches to phage-host prediction are based on machine learning (ML) algorithms (<xref ref-type="bibr" rid="B39">Wang et al., 2020</xref>; <xref ref-type="bibr" rid="B8">Coutinho et al., 2021</xref>). As has been recently highlighted (<xref ref-type="bibr" rid="B6">Coclet and Roux, 2021</xref>; <xref ref-type="bibr" rid="B38">Versoza and Pfeifer, 2022</xref>), there is a pressing need to establish robust, comprehensive, and balanced data sets suitable for training and testing ML algorithms. PHD can aid developers in constructing custom sets that meet specific criteria such as the taxonomic affiliations of viruses and hosts, the quality of the genome assemblies, and the source databases.</p>
<p>The continuous mode of the PHD updates may prove useful during the current period of taxonomic upheaval. With ICTV rearranging major phage taxa to reflect their phylogenetic relations (<xref ref-type="bibr" rid="B1">Adriaenssens, 2021</xref>; <xref ref-type="bibr" rid="B37">Turner et al., 2021</xref>) and NCBI rapidly clustering sequences within the 95% identity threshold delineating species (<xref ref-type="fig" rid="F1">Figure 1L</xref>), each day brings us closer to a comprehensive and well-organized classification scheme that facilitates research in all phage-related fields.</p>
</sec>
<sec id="S5" sec-type="conclusion">
<title>Conclusion</title>
<p><underline>P</underline>hage &#x0026; <underline>H</underline>ost <underline>D</underline>aily (PHD) provides a single, convenient interface for rapid access to an exhaustive set of experimentally verified phage-host interactions, together with up-to-date taxonomic classifications for all phages and hosts. We hope that our service will become a one-stop shop for biologists and bioinformaticians interested in finding novel, alternative hosts of known phages, spotting bacterial taxa that may have been neglected in earlier studies, and interpreting ecological relations observed in the environment. It can also be used by developers of bioinformatic tools to compile well-annotated phage and host datasets for their tools. Finally, our data can help to uncover links between the genomics and phylogeny of prokaryotic viruses and their host range.</p>
</sec>
<sec id="S6" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The original contributions presented in this study are included in the article/<xref ref-type="supplementary-material" rid="FS1">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>AZ conceived and supervised the project. AZ and AR implemented the database and methods for data collection. KA designed and implemented the user interface. AZ and MH prepared the figure. AZ, MH, and JB analyzed the data and wrote the manuscript. All authors reviewed and approved the manuscript.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="pudiscl1" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="S8" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by the National Science Center (NCN, Poland) grant 2018/31/D/NZ2/00108 to AZ and the National Center for Research and Development (NCBR, Poland) grant LIDER/5/0023/L-10/18/NCBR/2019 to JB.</p>
</sec>
<ack><p>We thank Igor Tolstoy and J. Rodney Brister for explaining current and forthcoming changes to virus taxonomy in NCBI.</p>
</ack>
<sec id="S10" sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fmicb.2022.946070/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fmicb.2022.946070/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="FS1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Figure 1</label>
<caption><p>Overview of the methods implemented in the PHD web application to collect information regarding interactions between viruses and prokaryotic host species. <bold>(1)</bold> Names and/or NCBI taxonomy identifiers (taxIds) of hosts are extracted from nucleotide/protein sequence records of viruses available in six source databases (NCBI Virus, RefSeq, Virus-Host DB, MVP, UniProt-SwissProt, and IntAct). <bold>(2)</bold> The extracted host names/taxIds are queried in TaxonKit against NCBI Taxonomy to retrieve full taxonomic lineages of hosts including their names, ranks, and taxIds. Only prokaryotic host species from Bacteria or Archaea are included in further steps. <bold>(3)</bold> Additional taxonomic information (if available) for each prokaryotic host species is retrieved from the Genome Taxonomy Database (GTDB). <bold>(4)</bold> Interaction assignments between virus sequence records and the prokaryotic host species are collected from the source databases. <bold>(5)</bold> Virus taxIds provided in sequence records are used to retrieve virus taxonomic lineages from NCBI Taxonomy. The obtained virus species taxIds or sequence accessions are used to retrieve virus taxonomic lineages (if available) in the International Committee on Taxonomy of Viruses (ICTV). Sequence accessions are then assigned to the appropriate virus species. For example, three genomic sequences (<ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="MN125599">MN125599</ext-link>, <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="MN125600">MN125600</ext-link>, and <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="NC_049813">NC_049813</ext-link>) belong to the <italic>Veterinaerplatzvirus vv12210I</italic> species. <bold>(6)</bold> Sequence accessions within virus species are grouped into genome assemblies based on metadata provided in the NCBI Assembly database. 
For example, two sequence accessions (<ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="MN125599">MN125599</ext-link> and <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="MN125600">MN125600</ext-link>) are part of one genome assembly from GenBank (assembly accession: <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="GCA_009903655">GCA_009903655</ext-link>), while the third sequence, <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="NC_049813">NC_049813</ext-link>, is a separate genome assembly from RefSeq (assembly accession: <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="GCF_009671745">GCF_009671745</ext-link>). An assembly level category (i.e., Complete, Scaffold, Contig, or unknown) is assigned to each virus assembly based on information provided by the NCBI Assembly and INPHARED databases. <bold>(7)</bold> Source databases are assigned to each interaction between virus and host species. For example, the interaction between <italic>Veterinaerplatzvirus vv12210I</italic> and <italic>E. coli</italic> was covered by three source databases (i.e., NCBI Virus, Virus-Host DB, and RefSeq).</p></caption>
</supplementary-material>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adriaenssens</surname> <given-names>E. M.</given-names></name></person-group> (<year>2021</year>). <article-title>Phage diversity in the human gut microbiome: a taxonomist&#x2019;s perspective.</article-title> <source><italic>mSystems</italic></source> <volume>6</volume>:<issue>e0079921</issue>. <pub-id pub-id-type="doi">10.1128/mSystems.00799-21</pub-id> <pub-id pub-id-type="pmid">34402650</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Al-Shayeb</surname> <given-names>B.</given-names></name> <name><surname>Sachdeva</surname> <given-names>R.</given-names></name> <name><surname>Chen</surname> <given-names>L.-X.</given-names></name> <name><surname>Ward</surname> <given-names>F.</given-names></name> <name><surname>Munk</surname> <given-names>P.</given-names></name> <name><surname>Devoto</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Clades of huge phages from across Earth&#x2019;s ecosystems.</article-title> <source><italic>Nature</italic></source> <volume>578</volume> <fpage>425</fpage>&#x2013;<lpage>431</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-020-2007-4</pub-id> <pub-id pub-id-type="pmid">32051592</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bateman</surname> <given-names>A.</given-names></name> <name><surname>Martin</surname> <given-names>M.-J.</given-names></name> <name><surname>Orchard</surname> <given-names>S.</given-names></name> <name><surname>Magrane</surname> <given-names>M.</given-names></name> <name><surname>Agivetova</surname> <given-names>R.</given-names></name> <name><surname>Ahmad</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>UniProt: the universal protein knowledgebase in 2021.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>49</volume> <fpage>D480</fpage>&#x2013;<lpage>D489</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa1100</pub-id> <pub-id pub-id-type="pmid">33237286</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breitbart</surname> <given-names>M.</given-names></name> <name><surname>Bonnain</surname> <given-names>C.</given-names></name> <name><surname>Malki</surname> <given-names>K.</given-names></name> <name><surname>Sawaya</surname> <given-names>N. A.</given-names></name></person-group> (<year>2018</year>). <article-title>Phage puppet masters of the marine microbial realm.</article-title> <source><italic>Nat. Microbiol.</italic></source> <volume>3</volume> <fpage>754</fpage>&#x2013;<lpage>766</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-018-0166-y</pub-id> <pub-id pub-id-type="pmid">29867096</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buttimer</surname> <given-names>C.</given-names></name> <name><surname>McAuliffe</surname> <given-names>O.</given-names></name> <name><surname>Ross</surname> <given-names>R. P.</given-names></name> <name><surname>Hill</surname> <given-names>C.</given-names></name> <name><surname>O&#x2019;Mahony</surname> <given-names>J.</given-names></name> <name><surname>Coffey</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>Bacteriophages and Bacterial Plant Diseases.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>8</volume>:<issue>34</issue>. <pub-id pub-id-type="doi">10.3389/fmicb.2017.00034</pub-id> <pub-id pub-id-type="pmid">28163700</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coclet</surname> <given-names>C.</given-names></name> <name><surname>Roux</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Global overview and major challenges of host prediction methods for uncultivated phages.</article-title> <source><italic>Curr. Opin. Virol.</italic></source> <volume>49</volume> <fpage>117</fpage>&#x2013;<lpage>126</lpage>. <pub-id pub-id-type="doi">10.1016/j.coviro.2021.05.003</pub-id> <pub-id pub-id-type="pmid">34126465</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cook</surname> <given-names>R.</given-names></name> <name><surname>Brown</surname> <given-names>N.</given-names></name> <name><surname>Redgwell</surname> <given-names>T.</given-names></name> <name><surname>Rihtman</surname> <given-names>B.</given-names></name> <name><surname>Barnes</surname> <given-names>M.</given-names></name> <name><surname>Clokie</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>INfrastructure for a phage reference database: identification of large-scale biases in the current collection of cultured phage genomes.</article-title> <source><italic>PHAGE</italic></source> <volume>2</volume> <fpage>214</fpage>&#x2013;<lpage>223</lpage>. <pub-id pub-id-type="doi">10.1089/phage.2021.0007</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coutinho</surname> <given-names>F. H.</given-names></name> <name><surname>Zaragoza-Solas</surname> <given-names>A.</given-names></name> <name><surname>L&#x00F3;pez-P&#x00E9;rez</surname> <given-names>M.</given-names></name> <name><surname>Barylski</surname> <given-names>J.</given-names></name> <name><surname>Zielezinski</surname> <given-names>A.</given-names></name> <name><surname>Dutilh</surname> <given-names>B. E.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>RaFAH: host prediction for viruses of Bacteria and Archaea based on protein content.</article-title> <source><italic>Patterns</italic></source> <volume>2</volume>:<issue>100274</issue>. <pub-id pub-id-type="doi">10.1016/j.patter.2021.100274</pub-id> <pub-id pub-id-type="pmid">34286299</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cuevas</surname> <given-names>J. M.</given-names></name> <name><surname>Duffy</surname> <given-names>S.</given-names></name> <name><surname>Sanju&#x00E1;n</surname> <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>Point Mutation Rate of Bacteriophage &#x03A6;X174.</article-title> <source><italic>Genetics</italic></source> <volume>183</volume> <fpage>747</fpage>&#x2013;<lpage>749</lpage>. <pub-id pub-id-type="doi">10.1534/genetics.109.106005</pub-id> <pub-id pub-id-type="pmid">19652180</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dedrick</surname> <given-names>R. M.</given-names></name> <name><surname>Guerrero-Bustamante</surname> <given-names>C. A.</given-names></name> <name><surname>Garlena</surname> <given-names>R. A.</given-names></name> <name><surname>Russell</surname> <given-names>D. A.</given-names></name> <name><surname>Ford</surname> <given-names>K.</given-names></name> <name><surname>Harris</surname> <given-names>K.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Engineered bacteriophages for treatment of a patient with a disseminated drug-resistant Mycobacterium abscessus.</article-title> <source><italic>Nat. Med.</italic></source> <volume>25</volume> <fpage>730</fpage>&#x2013;<lpage>733</lpage>. <pub-id pub-id-type="doi">10.1038/s41591-019-0437-z</pub-id> <pub-id pub-id-type="pmid">31068712</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Devoto</surname> <given-names>A. E.</given-names></name> <name><surname>Santini</surname> <given-names>J. M.</given-names></name> <name><surname>Olm</surname> <given-names>M. R.</given-names></name> <name><surname>Anantharaman</surname> <given-names>K.</given-names></name> <name><surname>Munk</surname> <given-names>P.</given-names></name> <name><surname>Tung</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Megaphages infect <italic>Prevotella</italic> and variants are widespread in gut microbiomes.</article-title> <source><italic>Nat. Microbiol.</italic></source> <volume>4</volume> <fpage>693</fpage>&#x2013;<lpage>700</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-018-0338-9</pub-id> <pub-id pub-id-type="pmid">30692672</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dion</surname> <given-names>M. B.</given-names></name> <name><surname>Oechslin</surname> <given-names>F.</given-names></name> <name><surname>Moineau</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Phage diversity, genomics and phylogeny.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>18</volume> <fpage>125</fpage>&#x2013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1038/s41579-019-0311-5</pub-id> <pub-id pub-id-type="pmid">32015529</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Domingo-Calap</surname> <given-names>P.</given-names></name> <name><surname>Cuevas</surname> <given-names>J. M.</given-names></name> <name><surname>Sanju&#x00E1;n</surname> <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>The Fitness Effects of Random Mutations in Single-Stranded DNA and RNA Bacteriophages.</article-title> <source><italic>PLoS Genetics</italic></source> <volume>5</volume>:<issue>e1000742</issue>. <pub-id pub-id-type="doi">10.1371/journal.pgen.1000742</pub-id> <pub-id pub-id-type="pmid">19956760</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eskenazi</surname> <given-names>A.</given-names></name> <name><surname>Lood</surname> <given-names>C.</given-names></name> <name><surname>Wubbolts</surname> <given-names>J.</given-names></name> <name><surname>Hites</surname> <given-names>M.</given-names></name> <name><surname>Balarjishvili</surname> <given-names>N.</given-names></name> <name><surname>Leshkasheli</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2022</year>). <article-title>Combination of pre-adapted bacteriophage therapy and antibiotics for treatment of fracture-related infection due to pandrug-resistant <italic>Klebsiella pneumoniae</italic>.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>13</volume>:<issue>302</issue>. <pub-id pub-id-type="doi">10.1038/s41467-021-27656-z</pub-id> <pub-id pub-id-type="pmid">35042848</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fuhrman</surname> <given-names>J. A.</given-names></name></person-group> (<year>1999</year>). <article-title>Marine viruses and their biogeochemical and ecological effects.</article-title> <source><italic>Nature</italic></source> <volume>399</volume> <fpage>541</fpage>&#x2013;<lpage>548</lpage>.</citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>N. L.</given-names></name> <name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Hu</surname> <given-names>S.</given-names></name> <name><surname>Lercher</surname> <given-names>M. J.</given-names></name> <name><surname>Zhao</surname> <given-names>X. M.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>MVP: a microbe-phage interaction database.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>46</volume> <fpage>D700</fpage>&#x2013;<lpage>D707</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx1124</pub-id> <pub-id pub-id-type="pmid">29177508</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gorbalenya</surname> <given-names>A. E.</given-names></name> <name><surname>Krupovic</surname> <given-names>M.</given-names></name> <name><surname>Mushegian</surname> <given-names>A.</given-names></name> <name><surname>Kropinski</surname> <given-names>A. M.</given-names></name> <name><surname>Siddell</surname> <given-names>S. G.</given-names></name> <name><surname>Varsani</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>The new scope of virus taxonomy: partitioning the virosphere into 15 hierarchical ranks.</article-title> <source><italic>Nat. Microbiol.</italic></source> <volume>5</volume> <fpage>668</fpage>&#x2013;<lpage>674</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-020-0709-x</pub-id> <pub-id pub-id-type="pmid">32341570</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hatcher</surname> <given-names>E. L.</given-names></name> <name><surname>Zhdanov</surname> <given-names>S. A.</given-names></name> <name><surname>Bao</surname> <given-names>Y.</given-names></name> <name><surname>Blinkova</surname> <given-names>O.</given-names></name> <name><surname>Nawrocki</surname> <given-names>E. P.</given-names></name> <name><surname>Ostapchuck</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Virus Variation Resource &#x2013; improved response to emergent viral outbreaks.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>45</volume> <fpage>D482</fpage>&#x2013;<lpage>D490</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw1065</pub-id> <pub-id pub-id-type="pmid">27899678</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jassim</surname> <given-names>S. A. A.</given-names></name> <name><surname>Limoges</surname> <given-names>R. G.</given-names></name> <name><surname>El-Cheikh</surname> <given-names>H.</given-names></name></person-group> (<year>2016</year>). <article-title>Bacteriophage biocontrol in wastewater treatment.</article-title> <source><italic>World J. Microbiol. Biotechnol.</italic></source> <volume>32</volume>:<issue>70</issue>. <pub-id pub-id-type="doi">10.1007/s11274-016-2028-1</pub-id> <pub-id pub-id-type="pmid">26941243</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jordan</surname> <given-names>T. C.</given-names></name> <name><surname>Burnett</surname> <given-names>S. H.</given-names></name> <name><surname>Carson</surname> <given-names>S.</given-names></name> <name><surname>Caruso</surname> <given-names>S. M.</given-names></name> <name><surname>Clase</surname> <given-names>K.</given-names></name> <name><surname>DeJong</surname> <given-names>R. J.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>A broadly implementable research course in phage discovery and genomics for first-year undergraduate students.</article-title> <source><italic>mBio</italic></source> <volume>5</volume>:<issue>e01051-13</issue>. <pub-id pub-id-type="doi">10.1128/mBio.01051-13</pub-id> <pub-id pub-id-type="pmid">24496795</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kitts</surname> <given-names>P. A.</given-names></name> <name><surname>Church</surname> <given-names>D. M.</given-names></name> <name><surname>Thibaud-Nissen</surname> <given-names>F.</given-names></name> <name><surname>Choi</surname> <given-names>J.</given-names></name> <name><surname>Hem</surname> <given-names>V.</given-names></name> <name><surname>Sapojnikov</surname> <given-names>V.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Assembly: a resource for assembled genomes at NCBI.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>44</volume> <fpage>D73</fpage>&#x2013;<lpage>D80</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1226</pub-id> <pub-id pub-id-type="pmid">26578580</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krupovic</surname> <given-names>M.</given-names></name> <name><surname>Turner</surname> <given-names>D.</given-names></name> <name><surname>Morozova</surname> <given-names>V.</given-names></name> <name><surname>Dyall-Smith</surname> <given-names>M.</given-names></name> <name><surname>Oksanen</surname> <given-names>H. M.</given-names></name> <name><surname>Edwards</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Bacterial Viruses Subcommittee and Archaeal Viruses Subcommittee of the ICTV: update of taxonomy changes in 2021.</article-title> <source><italic>Arch. Virol.</italic></source> <volume>166</volume> <fpage>3239</fpage>&#x2013;<lpage>3244</lpage>. <pub-id pub-id-type="doi">10.1007/s00705-021-05205-9</pub-id> <pub-id pub-id-type="pmid">34417873</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mihara</surname> <given-names>T.</given-names></name> <name><surname>Nishimura</surname> <given-names>Y.</given-names></name> <name><surname>Shimizu</surname> <given-names>Y.</given-names></name> <name><surname>Nishiyama</surname> <given-names>H.</given-names></name> <name><surname>Yoshikawa</surname> <given-names>G.</given-names></name> <name><surname>Uehara</surname> <given-names>H.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Linking Virus Genomes with Host Taxonomy.</article-title> <source><italic>Viruses</italic></source> <volume>8</volume>:<issue>66</issue>. <pub-id pub-id-type="doi">10.3390/v8030066</pub-id> <pub-id pub-id-type="pmid">26938550</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nayfach</surname> <given-names>S.</given-names></name> <name><surname>P&#x00E1;ez-Espino</surname> <given-names>D.</given-names></name> <name><surname>Call</surname> <given-names>L.</given-names></name> <name><surname>Low</surname> <given-names>S. J.</given-names></name> <name><surname>Sberro</surname> <given-names>H.</given-names></name> <name><surname>Ivanova</surname> <given-names>N. N.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome.</article-title> <source><italic>Nat. Microbiol.</italic></source> <volume>6</volume> <fpage>960</fpage>&#x2013;<lpage>970</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-021-00928-6</pub-id> <pub-id pub-id-type="pmid">34168315</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nixon</surname> <given-names>A. E.</given-names></name> <name><surname>Sexton</surname> <given-names>D. J.</given-names></name> <name><surname>Ladner</surname> <given-names>R. C.</given-names></name></person-group> (<year>2014</year>). <article-title>Drugs derived from phage display.</article-title> <source><italic>MAbs</italic></source> <volume>6</volume> <fpage>73</fpage>&#x2013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.4161/mabs.27240</pub-id> <pub-id pub-id-type="pmid">24262785</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x2019;Leary</surname> <given-names>N. A.</given-names></name> <name><surname>Wright</surname> <given-names>M. W.</given-names></name> <name><surname>Brister</surname> <given-names>J. R.</given-names></name> <name><surname>Ciufo</surname> <given-names>S.</given-names></name> <name><surname>Haddad</surname> <given-names>D.</given-names></name> <name><surname>McVeigh</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>44</volume> <fpage>D733</fpage>&#x2013;<lpage>D745</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1189</pub-id> <pub-id pub-id-type="pmid">26553804</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Orchard</surname> <given-names>S.</given-names></name> <name><surname>Ammari</surname> <given-names>M.</given-names></name> <name><surname>Aranda</surname> <given-names>B.</given-names></name> <name><surname>Breuza</surname> <given-names>L.</given-names></name> <name><surname>Briganti</surname> <given-names>L.</given-names></name> <name><surname>Broackes-Carter</surname> <given-names>F.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>The MIntAct project&#x2014;IntAct as a common curation platform for 11 molecular interaction databases.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>42</volume> <fpage>D358</fpage>&#x2013;<lpage>D363</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt1115</pub-id> <pub-id pub-id-type="pmid">24234451</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paez-Espino</surname> <given-names>D.</given-names></name> <name><surname>Eloe-Fadrosh</surname> <given-names>E. A.</given-names></name> <name><surname>Pavlopoulos</surname> <given-names>G. A.</given-names></name> <name><surname>Thomas</surname> <given-names>A. D.</given-names></name> <name><surname>Huntemann</surname> <given-names>M.</given-names></name> <name><surname>Mikhailova</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Uncovering Earth&#x2019;s virome.</article-title> <source><italic>Nature</italic></source> <volume>536</volume> <fpage>425</fpage>&#x2013;<lpage>430</lpage>. <pub-id pub-id-type="doi">10.1038/nature19094</pub-id> <pub-id pub-id-type="pmid">27533034</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parks</surname> <given-names>D. H.</given-names></name> <name><surname>Chuvochina</surname> <given-names>M.</given-names></name> <name><surname>Chaumeil</surname> <given-names>P.-A.</given-names></name> <name><surname>Rinke</surname> <given-names>C.</given-names></name> <name><surname>Mussig</surname> <given-names>A. J.</given-names></name> <name><surname>Hugenholtz</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>A complete domain-to-species taxonomy for Bacteria and Archaea.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>38</volume> <fpage>1079</fpage>&#x2013;<lpage>1086</lpage>. <pub-id pub-id-type="doi">10.1038/s41587-020-0501-8</pub-id> <pub-id pub-id-type="pmid">32341564</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parks</surname> <given-names>D. H.</given-names></name> <name><surname>Chuvochina</surname> <given-names>M.</given-names></name> <name><surname>Rinke</surname> <given-names>C.</given-names></name> <name><surname>Mussig</surname> <given-names>A. J.</given-names></name> <name><surname>Chaumeil</surname> <given-names>P.-A.</given-names></name> <name><surname>Hugenholtz</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>50</volume> <fpage>D785</fpage>&#x2013;<lpage>D794</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkab776</pub-id> <pub-id pub-id-type="pmid">34520557</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sayers</surname> <given-names>E. W.</given-names></name> <name><surname>Cavanaugh</surname> <given-names>M.</given-names></name> <name><surname>Clark</surname> <given-names>K.</given-names></name> <name><surname>Ostell</surname> <given-names>J.</given-names></name> <name><surname>Pruitt</surname> <given-names>K. D.</given-names></name> <name><surname>Karsch-Mizrachi</surname> <given-names>I.</given-names></name></person-group> (<year>2019</year>). <article-title>GenBank.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>48</volume> <fpage>D84</fpage>&#x2013;<lpage>D86</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkz956</pub-id> <pub-id pub-id-type="pmid">31665464</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schoch</surname> <given-names>C. L.</given-names></name> <name><surname>Ciufo</surname> <given-names>S.</given-names></name> <name><surname>Domrachev</surname> <given-names>M.</given-names></name> <name><surname>Hotton</surname> <given-names>C. L.</given-names></name> <name><surname>Kannan</surname> <given-names>S.</given-names></name> <name><surname>Khovanskaya</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>NCBI Taxonomy: a comprehensive update on curation, resources and tools.</article-title> <source><italic>Database</italic></source> <volume>2020</volume>:<issue>baaa062</issue>. <pub-id pub-id-type="doi">10.1093/database/baaa062</pub-id> <pub-id pub-id-type="pmid">32761142</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schofield</surname> <given-names>D.</given-names></name> <name><surname>Sharp</surname> <given-names>N. J.</given-names></name> <name><surname>Westwater</surname> <given-names>C.</given-names></name></person-group> (<year>2012</year>). <article-title>Phage-based platforms for the clinical detection of human bacterial pathogens.</article-title> <source><italic>Bacteriophage</italic></source> <volume>2</volume> <fpage>105</fpage>&#x2013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.4161/bact.19274</pub-id> <pub-id pub-id-type="pmid">23050221</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>W.</given-names></name> <name><surname>Ren</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>TaxonKit: a practical and efficient NCBI taxonomy toolkit.</article-title> <source><italic>J. Genet. Genomics</italic></source> <volume>48</volume> <fpage>844</fpage>&#x2013;<lpage>850</lpage>. <pub-id pub-id-type="doi">10.1016/j.jgg.2021.03.006</pub-id> <pub-id pub-id-type="pmid">34001434</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>S. E.</given-names></name> <name><surname>Huang</surname> <given-names>W.</given-names></name> <name><surname>Tiamani</surname> <given-names>K.</given-names></name> <name><surname>Unterer</surname> <given-names>M.</given-names></name> <name><surname>Khan Mirzaei</surname> <given-names>M.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name></person-group> (<year>2022</year>). <article-title>Emerging technologies in the study of the virome.</article-title> <source><italic>Curr. Opin. Virol.</italic></source> <volume>54</volume>:<issue>101231</issue>. <pub-id pub-id-type="doi">10.1016/j.coviro.2022.101231</pub-id> <pub-id pub-id-type="pmid">35643020</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sulakvelidze</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Using lytic bacteriophages to eliminate or significantly reduce contamination of food by foodborne bacterial pathogens.</article-title> <source><italic>J. Sci. Food Agric.</italic></source> <volume>93</volume> <fpage>3137</fpage>&#x2013;<lpage>3146</lpage>. <pub-id pub-id-type="doi">10.1002/jsfa.6222</pub-id> <pub-id pub-id-type="pmid">23670852</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turner</surname> <given-names>D.</given-names></name> <name><surname>Kropinski</surname> <given-names>A. M.</given-names></name> <name><surname>Adriaenssens</surname> <given-names>E. M.</given-names></name></person-group> (<year>2021</year>). <article-title>A Roadmap for Genome-Based Phage Taxonomy.</article-title> <source><italic>Viruses</italic></source> <volume>13</volume>:<issue>506</issue>. <pub-id pub-id-type="doi">10.3390/v13030506</pub-id> <pub-id pub-id-type="pmid">33803862</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Versoza</surname> <given-names>C. J.</given-names></name> <name><surname>Pfeifer</surname> <given-names>S. P.</given-names></name></person-group> (<year>2022</year>). <article-title>Computational Prediction of Bacteriophage Host Ranges.</article-title> <source><italic>Microorganisms</italic></source> <volume>10</volume>:<issue>149</issue>. <pub-id pub-id-type="doi">10.3390/microorganisms10010149</pub-id> <pub-id pub-id-type="pmid">35056598</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Ren</surname> <given-names>J.</given-names></name> <name><surname>Tang</surname> <given-names>K.</given-names></name> <name><surname>Dart</surname> <given-names>E.</given-names></name> <name><surname>Ignacio-Espinoza</surname> <given-names>J. C.</given-names></name> <name><surname>Fuhrman</surname> <given-names>J. A.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>A network-based integrated framework for predicting virus&#x2013;prokaryote interactions.</article-title> <source><italic>NAR Genom. Bioinform.</italic></source> <volume>2</volume>:<issue>lqaa044</issue>. <pub-id pub-id-type="doi">10.1093/nargab/lqaa044</pub-id> <pub-id pub-id-type="pmid">32626849</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>Y.</given-names></name> <name><surname>Gao</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Jumbo Bacteriophages: an Overview.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>8</volume>:<issue>403</issue>. <pub-id pub-id-type="doi">10.3389/fmicb.2017.00403</pub-id> <pub-id pub-id-type="pmid">28352259</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="footnote1">
<label>1</label>
<p><ext-link ext-link-type="uri" xlink:href="https://github.com/pirovc/genome_updater">https://github.com/pirovc/genome_updater</ext-link></p></fn>
</fn-group>
</back>
</article>