This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Fine-tuned regulation of the cellular nucleotide pools is indispensable for faithful replication of deoxyribonucleic acid (DNA). The genetic information is also safeguarded by DNA damage recognition and repair processes. Uracil is one of the most frequently occurring erroneous bases in DNA; it can arise from cytosine deamination or thymine-replacing incorporation. Two enzyme activities are primarily involved in keeping DNA uracil-free: dUTPase (dUTP pyrophosphatase) activity, which prevents thymine-replacing incorporation, and uracil-DNA glycosylase activity, which excises uracil from DNA and initiates uracil-excision repair. Both dUTPase and the most efficient uracil-DNA glycosylase (UNG) are thought to be ubiquitous in free-living organisms. In the present work, we have systematically investigated the genotype of deposited, fully sequenced bacterial and archaeal genomes. We have performed bioinformatic searches in these genomes using the already well-described dUTPase and UNG gene sequences. For dUTPases, we have included the trimeric all-beta and the dimeric all-alpha families as well as the bifunctional dCTP (deoxycytidine triphosphate) deaminase-dUTPase sequences. Surprisingly, we have found that, in contrast to the generally held opinion, a large number of bacterial and archaeal species lack all of the previously described dUTPase gene(s). The
The inherent chemical reactivity of DNA and the presence of reactive metabolites and other molecular species within the cell lead to numerous chemical modifications within the DNA even under normal, physiological conditions (
The DNA repair pathways (
In a dUTPase knock-out background, viability can still be restored in some cases by a simultaneous UNG knock-out (
The importance of dUTPase is underlined by its reported ubiquity. However, our recent observations in several
This intriguing situation in
For dUTPases, two protein families have been described to date, the all-β trimeric and the all-α dimeric dUTPases (11), hence we used representative sequences of these families in our search (dUTPases from
In our studies, we investigated those prokaryote genomes that are fully sequenced and deposited in the NCBI Genome database, that is, 2261 bacterial and 151 archaeal genomic sequence sets. The results of screening the bacterial and archaeal genomes for the presence/absence of dUTPase genes are shown in
Among the three different genotypes identified in the prokaryote genomes in our study, the
The
Our data clearly demonstrate that, contrary to common textbook knowledge, the dUTPase gene is far from being ubiquitous in prokaryotes. It was of immediate further interest to understand how the different organisms may cope with this unexpected situation. We emphasize that our analysis could only involve the dUTPase genes that have already been described in the literature. Proteins encoded by other genes may also possess dUTPase activity, and we address this possibility in our discussion under section “Novel protein set for uracil-DNA metabolism.”
Since the
For uracil-DNA glycosylase, the sequence of the UNG enzyme from
Based on these results, the organisms lacking a dUTPase gene were further divided into two groups depending on the simultaneous absence or presence of a UNG gene (cf. blue and pink segments on
A more detailed analysis of the evolutionary distribution of species that do not have dUTPase genes is shown in
Distribution of
Staphylococcaceae | Oscillatoriophycideae |
Flavobacteriaceae | Thermoanaerobacterales |
Bacillaceae | Oceanospirillales |
Enterococcaceae | Mycoplasmataceae |
Vibrionaceae | Thermotogaceae |
Spirochaetaceae | Methanomicrobia |
Mycoplasmataceae |
Inhibitory proteins of UNG may modify the physiological scenario, hence we investigated whether any of the UNG inhibitory proteins are encoded in those bacterial and archaeal genomes that showed up as
For UNG, three different proteins have been identified with significant inhibitory efficiency. Two of these (UGI and p56) are encoded by different bacteriophages [phages PBS1/PBS2 and phi29 of
We found that none of the phage-related UGI or p56 protein genes could be located on the genomes investigated. The gene for SaUGI, the
Uracil N-Glycosylase inhibitors were previously only identified in
Interestingly, several bacteriophages carry genes that modify the uracil-DNA metabolism. For example in
As mentioned above, the UNG inhibitor UGI was also discovered as a phage protein. The very first finding that led to the identification of UGI was that
In summary, several prophages carry genes that encode proteins involved in the uracil metabolism. The products of these genes may modify the scenario predicted based on the genomic sequence of the bacteria.
Another strategy to survive
For Thermotoga and Methanomicrobia, data from the literature indicate that the
As mentioned earlier, a new UNG inhibitor, SaUGI, was also recently described (
We have shown that the genes for the common dUTPase enzyme families are far from being ubiquitous in prokaryotes. This unexpected genotype is observed in evolutionarily well-separated branches, suggesting that loss of the
Horizontal gene transfer is of key importance in spreading virulence elements. In the present study, we observe that elements involved in uracil-DNA metabolism are also found within mobile genetic elements. Because these U-DNA factors can spread in parallel with virulence elements, they may also act as key regulators of genome integrity and mutation rates. The biomedical significance of these findings is especially relevant for microbes that currently pose a high therapeutic challenge. Among these, we suggest that depending on the expression pattern of the proteins involved in uracil DNA metabolism,
Phages and mobile genetic elements also have an important role in lateral gene transfer. For example, the mentioned
Here we describe the workflow that generated the list of bacterial and archaeal genomes without dUTPase and, from these genomes, those with and without UNG, UGI, SaUGI, and p56. The list, the tables, and the source of the in-house programs referred to below are available at the website
The bacterial and archaeal genome sequences were downloaded from the NCBI FTP site:
Searches for the dUTPase sequences, the UNG sequence, and the UNG inhibitor UGI-SAUGI-P56 sequences were run by the run-blast.pl script, which calls the program tblastn; the fasta query files searched against the database were:
dUTPase-tri-di1-di2-arch.fasta, UNG.fasta, UGI-SAUGI-P56.fasta, all downloadable from
The dUTPase fasta file contains one trimeric (
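As an illustration of the search step, the following minimal Python sketch assembles one tblastn command line per query file. It is not the authors' run-blast.pl script. The database name `genomes_db`, the output file naming, the tabular output format, and the E-value passed on the command line are all assumptions made for this example; only the three query file names and the use of tblastn come from the text. The commands are built but not executed.

```python
from pathlib import Path

# Query files named in the text.
QUERY_FILES = [
    "dUTPase-tri-di1-di2-arch.fasta",
    "UNG.fasta",
    "UGI-SAUGI-P56.fasta",
]


def tblastn_command(query_fasta, database, evalue=0.01):
    """Build a tblastn argument list (protein query vs. translated nucleotide db).

    The database name, the output naming scheme and the tabular output format
    are assumptions of this sketch, not details taken from the paper.
    """
    out_name = f"{Path(query_fasta).stem}_vs_{Path(database).name}.tsv"
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tab-separated hit table
        "-out", out_name,
    ]


# "genomes_db" is a hypothetical BLAST database built from the genome set.
commands = [tblastn_command(q, "genomes_db") for q in QUERY_FILES]
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.run` on a machine where BLAST+ is installed and the database has been built.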
The evaluation of the tblastn results was performed by the script find-nohits.pl, which returned a table of the bacterial/archaeal genomes without dUTPase genes, that is, genomes in which no alignment with an E-value smaller than 0.01 was found for any of the three dUTPases we searched for. The genomes without dUTPase hits were also partitioned into classes according to (i) the presence of UNG genes with an E-value better than 0.01, and (ii) the presence of any UNG inhibitor with sequence similarity to the sequences in the fasta file UGI-SAUGI-P56.fasta at an E-value of 0.01 or less. The genomes without dUTPase and with UNG are listed in
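The filtering logic described above can be sketched in a few lines of Python. This is a hedged illustration, not the find-nohits.pl script itself: the function names, the `(query, genome, evalue)` row layout, and the genome identifiers are hypothetical, and for simplicity the sketch applies a strict "E-value < 0.01" test to all three searches.

```python
E_CUTOFF = 0.01  # threshold stated in the text


def genomes_with_hit(rows, cutoff=E_CUTOFF):
    """Return the genomes that have at least one alignment below the cutoff.

    rows: iterable of (query_id, genome_id, evalue) tuples (assumed layout).
    """
    return {genome for _query, genome, evalue in rows if evalue < cutoff}


def classify(all_genomes, dutpase_rows, ung_rows, inhibitor_rows):
    """Keep genomes without a dUTPase hit and flag UNG / UNG-inhibitor presence."""
    has_dutpase = genomes_with_hit(dutpase_rows)
    has_ung = genomes_with_hit(ung_rows)
    has_inhibitor = genomes_with_hit(inhibitor_rows)
    table = {}
    for genome in sorted(all_genomes):
        if genome in has_dutpase:
            continue  # a dUTPase gene was found, so the genome is not listed
        table[genome] = {
            "UNG": genome in has_ung,
            "UNG_inhibitor": genome in has_inhibitor,
        }
    return table


# Hypothetical toy data, for illustration only.
genomes = {"genome_A", "genome_B", "genome_C", "genome_D"}
dutpase_hits = [("tri", "genome_A", 1e-40), ("di1", "genome_B", 0.5)]
ung_hits = [("UNG", "genome_B", 1e-30), ("UNG", "genome_C", 0.2)]
inhibitor_hits = [("SAUGI", "genome_D", 1e-12)]

result = classify(genomes, dutpase_hits, ung_hits, inhibitor_hits)
for genome, flags in result.items():
    print(genome, flags)
```

In the toy data, `genome_B` has only a weak dUTPase alignment (E-value 0.5), so it counts as lacking dUTPase while carrying UNG.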
The interested reader can easily reproduce the results in each row of
We have used the MEGAN5 (
First, the file that maps the gi values to the Taxonomy IDs was downloaded from the NCBI FTP site:
Next, our gen-megan.pl script was applied to produce the file life_wo_di1-di2-tri-arch_dUTPase_E001.megan, which was then opened in the MEGAN5 software
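The mapping step can be illustrated with a short Python sketch that reads a gi-to-taxonomy table and counts genomes per Taxonomy ID. It assumes a two-column, tab-separated layout (gi, then Taxonomy ID); the function names and all numeric identifiers below are made up for the example. The sketch does not attempt to reproduce the .megan file format written by gen-megan.pl.

```python
import io
from collections import Counter


def load_gi_to_taxid(handle):
    """Read 'gi<TAB>taxid' lines into a dict (assumed two-column layout)."""
    mapping = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        gi, taxid = line.split("\t")
        mapping[int(gi)] = int(taxid)
    return mapping


def count_by_taxid(gis, mapping):
    """Count sequences per Taxonomy ID; collect gi values that have no mapping."""
    counts = Counter()
    unmapped = []
    for gi in gis:
        if gi in mapping:
            counts[mapping[gi]] += 1
        else:
            unmapped.append(gi)
    return counts, unmapped


# Made-up identifiers, for illustration only.
example_table = io.StringIO("11\t9001\n12\t9001\n13\t9002\n")
gi_to_taxid = load_gi_to_taxid(example_table)

# gi values of genomes without a dUTPase hit (hypothetical).
counts, unmapped = count_by_taxid([11, 12, 13, 14], gi_to_taxid)
print(dict(counts), unmapped)
```

Reporting unmapped gi values separately, rather than dropping them silently, makes it easier to spot genomes that would otherwise be missing from the taxonomic summary.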
The
Initiated the study: BV, contributed analysis tools: CK, VP-K, OD, DS, developed software: CK, designed phylogenetic visualization: CK, analyzed results: JS, BV, VG, CK, VP-K, OD, and DS, wrote the paper: BV, VG, JS.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at:
deoxycytidine triphosphate
deoxyinosine triphosphate
Deoxyribonucleic Acid
deoxynucleoside triphosphate
deoxynucleoside triphosphate hydrolase
deoxyuridine triphosphate
dUTP pyrophosphatase
the name of a nucleotide hydrolase enzyme family
name of a UDG enzyme family
mismatch-specific DNA glycosylase
protein 56, name of a UNG inhibitor protein
Ribonucleic Acid
Staphylococcus aureus uracil DNA glycosylase inhibitor
single-strand-selective monofunctional uracil-DNA glycosylase
name of a S. aureus pathogenicity island repressor protein
Thymine DNA glycosylase
uracil DNA glycosylase
uracil DNA glycosylase inhibitor
N-Glycosylase
uracil N-Glycosylase, the name of the most common uracil DNA glycosylase.