Characterization of winged helix domain fusion endonucleases as N6-methyladenine-dependent type IV restriction systems

Winged helix (wH) domains, also termed winged helix-turn-helix (wHTH) domains, are widespread in all kingdoms of life and have diverse roles. In the context of DNA binding and DNA modification sensing, some eukaryotic wH domains are known as sensors of non-methylated CpG. In contrast, the prokaryotic wH domains in DpnI and HhiV4I act as sensors of adenine methylation in the 6mApT (N6-methyladenine, 6mA, or N6mA) context. DNA-binding modes and interactions with the probed dinucleotide are vastly different in the two cases. Here, we show that the role of the wH domain as a sensor of adenine methylation is widespread in prokaryotes. We present previously uncharacterized examples of PD-(D/E)XK—wH (FcyTI, Psp4BI), PUA—wH—HNH (HtuIII), wH—GIY-YIG (Ahi29725I, Apa233I), and PLD—wH (Aba4572I, CbaI) fusion endonucleases that sense adenine methylation in the Dam+ Gm6ATC sequence contexts. Representatives of the wH domain endonuclease fusion families with the exception of the PLD—wH family could be purified, and an in vitro preference for adenine methylation in the Dam context could be demonstrated. Like most other modification-dependent restriction endonucleases (MDREs, also called type IV restriction systems), the new fusion endonucleases except those in the PD-(D/E)XK—wH family cleave close to but outside the recognition sequence. Taken together, our data illustrate the widespread combinatorial use of prokaryotic wH domains as adenine methylation readers. Other potential 6mA sensors in modified DNA are also discussed.


Introduction
In most restriction-modification (R-M) scenarios, nucleobase modification serves as a mark of self and provides protection against endonuclease digestion. In some cases, however, phages have learned to exploit this principle by modifying their own DNA, either by incorporation of non-standard nucleoside triphosphates or by post-replicative modifications catalyzed either by host or phage enzymes. Modification-dependent restriction endonucleases (MDREs) specifically target such modified DNA (modified base or backbone). The MDREs come in two main groups, distinguished by the presence or absence of nucleoside triphosphate (NTP)-consuming motor proteins. The NTP-independent proteins are typically modular, with separate modification sensing and DNA cleavage domains. Because of this architecture, DNA cleavage typically takes place at a distance from the site of modification. For some enzymes, a single site is sufficient, but typically, cleavage is most efficient when it is directed by appropriately spaced modifications, which cooperate to position an endonuclease dimer for a double-strand (ds) cut in the DNA.
The catalytic domains present in restriction endonucleases can be grouped into the almost universally used hydrolases (Orlowski and Bujnicki, 2008) and the very rarely used lyases (Miyazono et al., 2014). The hydrolases, in turn, can be grouped into a surprisingly small set of phylogenetically unrelated enzyme groups. PD-(D/E)XK enzymes are named for characteristic amino acids (aa) built around a central β-sheet, which harbors one or two catalytic Mg2+ ions (Pingoud et al., 2005). The metal ions are held in place in part by the D and D or E (abbreviated as D/E) interacting residues, which, together with a K residue, activate a water molecule for direct inline attack on the scissile phosphate (Bujnicki and Rychlewski, 2001; Kosinski et al., 2005). HNH enzymes, also called ββα-Me enzymes or His-Me finger enzymes (Jablonska et al., 2017; Wu et al., 2020), harbor a single metal cation in their active site. Metal identity requirements are less strict than for PD-(D/E)XK enzymes: many divalent transition metal ions are acceptable (Pommer et al., 2001). Like PD-(D/E)XK enzymes, the HNH enzymes are believed to catalyze attack on the scissile phosphate by a water molecule. However, the water is activated not by a lysine residue but by the first histidine of the HNH motif (Flick et al., 1998; Sokolowska et al., 2009). GIY-YIG enzymes (Dunin-Horkawicz et al., 2006; Kaminska et al., 2008) also bind a single metal cation in the active site. These enzymes activate the water molecule with a tyrosine residue, most likely from the GIY motif (Sokolowska et al., 2011). Finally, there are also completely metal-independent endonuclease domains. They resemble phospholipase D; therefore, the enzymes containing them are known as PLD endonucleases (Grazulis et al., 2005; Chan et al., 2007). The PLD enzymes are believed to catalyze phosphodiester cleavage via a covalent intermediate (Sasnauskas et al., 2010).
Apart from the PUA superfamily, other modification sensor domains may also be involved in restriction, such as the NEco domain in EcoKMcrA with affinity for 5mC and 5hmC (Czapinska et al., 2018). Unlike the PUA superfamily domains, the NEco domain senses 5mC or 5hmC without nucleotide flipping in the context of dsDNA (Slyvka et al., 2019). Finally, a winged helix (wH) domain has been described as a 6mA sensor in DpnI. Like the NEco domain, the wH domain senses nucleobase modifications in the context of dsDNA without flipping (Mierzejewska et al., 2014). However, in contrast to the NEco domain, it has specificity for 6mA rather than 5mC. Also, in contrast to NEco, which recognizes methyl groups of fully methylated CpG in two separate pockets, the wH domain recognizes methyl groups of fully methylated ApT in a single pocket, exploiting their proximity in space. The wH domain in DpnI is unusual in being fused to a nuclease domain, which has a separate sequence (GATC) and modification (6mA) specificity (Siwek et al., 2012). Therefore, it acts more like an effector domain in type IIE enzymes (Senesac and Allen, 1995; Roberts et al., 2003), except that both the nuclease and sensor/effector domain are specific for methylated rather than non-methylated DNA.
Winged helix (wH) domains are a group of DNA-binding domains that belong to the superfamily of helix-turn-helix (HTH) proteins (Brennan, 1993; Lai et al., 1993; Gajiwala and Burley, 2000). Structurally, canonical winged helix domains consist of an N-terminal α-helix and β-strand, the HTH motif, and a β-hairpin. The "wings" of the motif are the loops connecting the strands of the β-hairpin and immediately downstream of it (Iyer et al., 2016; Figure 1A). Winged helix motifs were first found in transcription factors, but it is now clear that they also have roles in transcription initiation complexes (Teichmann et al., 2012), in the binding of left-handed Z-DNA (Schwartz et al., 2001) or RNA (Tang et al., 2021), or in protein-protein interactions (Wah et al., 1997). In transcription factors, wH domains tend to interact with DNA just as would be expected for the HTH motif that is embedded within them. In other words, they insert the second helix of the HTH motif, which is the third helix of the wH domain, into the major groove of DNA (Gajiwala and Burley, 2000). However, other DNA-binding modes are also possible in special cases (Gajiwala et al., 2000; Wolberger and Campbell, 2000). Recent examples of such alternative binding modes are the complexes of eukaryotic winged helix domains with dsDNA containing non-methylated CpG (Stielow et al., 2021; Becht et al., 2023; Weber et al., 2023). A winged helix motif in a restriction endonuclease (REase) was first noticed in the DNA-binding domain of FokI (Wah et al., 1997), but this particular wH domain does not appear to be involved in interactions with DNA.
The role of the winged helix domain in adenine methylation sensing was first noticed in DpnI. DpnI is a G6mATC-specific endonuclease that cleaves within the recognition sequence and has a strong preference for DNA that is adenine-methylated in both strands (Siwek et al., 2012). In DpnI, the winged helix domain plays the role of an effector domain that senses 6mA separately from and with slightly relaxed sequence specificity compared to the PD-(D/E)XK nuclease domain (Mierzejewska et al., 2014; Figures 1B,C). More recently, a winged helix domain was also implicated in the sensing of adenine methylation, also in the G6mATC context. HHPV4I (also called HhiV4I; Lu et al., 2023) is a three-domain enzyme, with a PUA (SRA)-like domain at the N-terminus, a winged helix domain in the middle, and an HNH endonuclease domain at the C-terminus. The PUA superfamily domain, described as an SRA domain by Lu et al., appears not to be involved in DNA modification sensing (Lu et al., 2023). By contrast, the winged helix domain directed preferential cleavage of Dam+ over Dam− DNA, and it has much higher affinity to Dam+ than to Dam− DNA in gel shift experiments. In contrast to DpnI, HHPV4I (HhiV4I) cleaves at a distance from the site of adenine methylation, suggesting that the endonuclease domain is directed by the winged helix domain and does not sense adenine methylation on its own (Lu et al., 2023).
In this study, we show that the winged helix domain is widely used as an adenine methylation sensor in MDREs (Figure 1D). We present additional examples of proteins that share the PD-(D/E)XK-wH architecture with DpnI or the PUA-wH-HNH architecture with HHPV4I (HhiV4I). Additionally, we show that the wH domain can [...]
Figure caption (fragment): [...] (Kabsch and Sander, 1983). DpnI residues that are involved in the formation of the pocket for the methyl groups (of DNA methylated in both strands) are marked by an "m," and those that are involved in hydrogen bonding with the nucleobases of the GATC target sequence of DpnI are marked by an asterisk ("*"). Their identities and residue numbers in the case of DpnI are indicated below the alignment (with reference to the DpnI structure with PDB accession 4kyw; Mierzejewska et al., 2014). Note the strong overlap between methyl pocket-forming residues and residues that are involved in target sequence selection.

Protein expression and purification
C2566 (Dam+) competent cells (cloning grade) were provided by NEB. ER2948 (Dam−) competent cells were prepared by a modified rubidium chloride method. E. coli cells were cultured at 37°C to mid-log phase, and IPTG was added to the culture at 0.5 mM final concentration for protein production (at 18°C overnight). Cells were lysed by sonication in chitin column buffer or Ni-agarose column buffer. Clarified cell lysates with overexpressed proteins (target protein-intein-CBD) or C-terminal 6× His-tagged proteins were loaded onto chitin or Ni-agarose columns, respectively, for affinity purification. The protein purification protocols were used as recommended by the manufacturers. In some cases, the partially purified proteins were further purified by chromatography through DEAE (flow-through to remove nucleic acids at 0.3 M NaCl concentration) and Hi-Trap Heparin (5 mL). BigDye® Terminator v3.1 Cycle Sequencing Kit was purchased from Thermo-Fisher (Applied Biosystems). Restriction gene inserts in plasmids were sequenced to verify the correct sequences. Dam+ pBR322 DNA fragments after restriction digestion were sequenced to determine the cut sites. DNA sequence edits were carried out using DNAStar or Geneious software packages. BlastP searches in the GenBank and UniProtKB databases were performed using the respective web servers. NCBI Pfam and conserved domains were used to visualize protein domains of REase homologs.

Plasmid preparation and transformation
Plasmid mini-prep kits and competent cells were provided by NEB. Plasmid mini-preparation and bacterial transformation were done according to the manufacturer's recommendations.

Clustering locus-specific annotations analysis
For clustering locus-specific annotations (CLANS) analysis, the homologs of the five groups of wH-containing enzymes (sequences of the reference proteins are listed in Supplementary material) were obtained by BLAST (BlastP, default settings on the UniProtKB website) using the UniProtKB reference proteomes and Swiss-Prot database. A total of 1,258 wH fusion endonuclease homolog proteins were analyzed. The resulting homolog protein fasta files were combined and subjected to CLANS analysis using the MPI Bioinformatics Toolkit. The result of the CLANS analysis was visualized with the CLANS Java application (Frickey and Lupas, 2004). CLANS analyzes all-against-all pairwise sequence similarities to establish relationships within a protein family. In the CLANS network figure, each node (a small colored dot) represents a full-length protein (or wH domain in the lower box), and each line connects two proteins/domains that share sequence similarity. The length of a line represents the degree of sequence similarity, with short lines representing close similarity and long lines representing distant similarity. Each node is the same size, but the size of the color "blob" is related to the number of nodes clustered together.
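The all-against-all idea behind this kind of clustering can be illustrated with a toy sketch. It is not the CLANS algorithm: CLANS works from BLAST-derived pairwise scores, whereas the k-mer Jaccard similarity, the threshold of 0.3, and the three example sequences below are assumptions chosen only to keep the example self-contained.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Toy similarity: shared k-mers divided by all k-mers (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def all_against_all(seqs, threshold=0.3, k=3):
    """Compare every pair once; keep edges at or above the threshold."""
    edges = []
    for (n1, s1), (n2, s2) in combinations(sorted(seqs.items()), 2):
        sim = jaccard(s1, s2, k)
        if sim >= threshold:
            edges.append((n1, n2, sim))
    return edges


def clusters(names, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, _ in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


# Hypothetical sequences, not taken from the study.
toy = {"A1": "MKTAYIAKQR", "A2": "MKTAYIAKQK", "B1": "GSHWLPEDVC"}
toy_edges = all_against_all(toy)
toy_clusters = clusters(toy, toy_edges)
```

With these toy inputs, A1 and A2 end up in one cluster and B1 stays alone, which mirrors how nodes joined by short lines form a "blob" in the network figure.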

Phylogenetic analysis
To perform phylogenetic analysis, we first built a hidden Markov model (HMM) profile with the wH domains of the six representative wH-containing enzymes (note: wH domains only, not full-length proteins). The resulting profile was used to search for homologs using hmmsearch (HMMER v3.1b2) on the combined fasta file (see CLANS analysis). We extracted the wH domain in each homolog protein and performed multiple alignments with the five representative wH domains using MAFFT (v7.508). The maximum likelihood tree was constructed using RAxML (v8.2.12) with the option -f a to enable rapid bootstrap analysis with 100 replicates. Other options used were -p 1237, -x 1237, and -m PROTGAMMAAUTO. The best-scoring maximum likelihood tree with bootstrap values was visualized with iTOL (see footnote 1).
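The order of the tool invocations described above can be sketched as command lines. This is a minimal sketch, not the authors' script: all file names and the `wH` run prefix are hypothetical, the domain-extraction step between hmmsearch and MAFFT is omitted, and only the RAxML options named in the text (plus the standard `-#`, `-s`, and `-n` arguments) are used.

```python
import shlex


def phylogeny_commands(seed_alignment, database_fasta, domains_fasta, prefix="wH"):
    """Build argument lists for the profile search, alignment, and tree steps.

    All file names are placeholders. Nothing is executed here.
    """
    hmm = f"{prefix}.hmm"
    alignment = f"{prefix}.aln.fasta"
    return [
        # Build an HMM profile from an alignment of the representative wH domains.
        ["hmmbuild", hmm, seed_alignment],
        # Search the combined homolog file; per-domain hits go to a table.
        ["hmmsearch", "--domtblout", f"{prefix}.domtbl", hmm, database_fasta],
        # Align the extracted wH domains (MAFFT writes to stdout, so the
        # output must be redirected into the alignment file).
        ["mafft", "--auto", domains_fasta],
        # Rapid bootstrapping (100 replicates) plus best-scoring ML tree search.
        ["raxmlHPC", "-f", "a", "-p", "1237", "-x", "1237", "-#", "100",
         "-m", "PROTGAMMAAUTO", "-s", alignment, "-n", prefix],
    ]


def as_shell(commands):
    """Render argument lists as shell-quoted strings for inspection."""
    return [shlex.join(cmd) for cmd in commands]


cmds = phylogeny_commands("wh_seed.sto", "combined.fasta", "wh_domains.fasta")
```

Calling `as_shell(cmds)` returns the four command strings, which can be reviewed before being run by hand or passed to `subprocess.run`.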

Proteome analysis
The four samples (10 μg each) were digested with trypsin using S-Trap micro-columns as recommended by the manufacturer (PROTIFI). LC-MS/MS of the digests was conducted on the Thermo Orbitrap Exploris, with two injections made per sample. Data were searched using Proteome Discoverer v2.5 against a combined FASTA database containing the Uniprotkb_E.coli-B strain proteins and the four winged helix (wH) endonuclease protein sequences (FcyTI, HhiV4I, Ahi29725I, and Apa233I). Results are presented in 8 Excel spreadsheets, one file for each injection (see raw data on the proteome of the purified proteins). The most confident protein ID is determined by Score Sequest HT (the last column in each table; the higher the score, the more confident the identification). Proteins identified are reported with >1 unique peptide per protein and a 1% false discovery rate.
1 https://itol.embl.de/

Bioinformatic screen of wH domain endonucleases
Some known wH domains are adenine methylation (6mA)-specific. This notion was originally suggested by the demonstration of 6mA specificity of the DpnI wH domain and further strengthened by the observation of adenine specificity of the wH domain of HHPV4I (Lu et al., 2023), which was reported when this study was being finalized. In the hope of finding new adenine methylation-specific endonucleases, we searched for fusions of wH domains with endonuclease domains known to play roles in R-M (Pingoud and Jeltsch, 2001). Apart from additional PD-(D/E)XK-wH and PUA-wH-HNH endonucleases, we also identified fusion proteins with a wH-HNH, wH-GIY-YIG, and PLD-wH architecture. In addition, we found PD-(D/E)XK-wH-NTPase cases that can be considered as fusions of a DpnI-like protein with an McrB-like NTPase (GTPase/ATPase) domain and NTD-wH-NTPase cases, apparently without a nuclease domain and with an N-terminal domain of unknown function. One additional example is the wH-Mrr catalytic domain (PD-QXK)-NTPase fusion endonuclease. For this study, we concentrated on the NTP-independent enzymes with wH and endonuclease domains. To better understand the sequence relationships between the new wH fusion proteins, we carried out a CLANS (Frickey and Lupas, 2004) analysis. CLANS determines all-against-all pairwise sequence similarities to establish relationships within a protein family. It is not intended to find sequence motifs within protein sequences, which are better detected using other software. In CLANS analysis of the full-length proteins [with BamHI (GGATCC) and related enzymes as a control], the wH domain-containing endonucleases segregated clearly into separate groups, driven by the sequence similarity between endonuclease domains of the same group (Figure 2). However, when we limited the CLANS analysis to the wH domains alone, the segregation into groups according to the endonuclease domain was not so clear. In particular, wH domains from PD-(D/E)XK and GIY-YIG endonucleases were fully intermingled, possibly suggesting multiple separate fusion events (Figure 2, box). The finding suggests that fusions of the same type may have arisen independently several times in evolution. Nevertheless, much of the diversity of the new fusion proteins is clearly due to divergent evolution. This is also supported by the phylogenetic tree of the wH domains (Supplementary Figure S1). Note, however, that bootstrap values for most branches of the tree are very low, making the tree very tentative overall.

Rare occurrence of methyltransferase genes in the immediate genetic neighborhood of the new fusion proteins
wH proteins frequently bind DNA, but only some of them are methylation-dependent (e.g., DpnI), whereas others are not (e.g., [...]). Typically, such a methyltransferase would be located in the immediate genomic neighborhood of the endonuclease, so that the entire system could work as a defense island (Makarova et al., 2011). To test for possible association with DNA methyltransferases, we inspected the genomic neighborhoods of over 1,000 wH fusion proteins. In 87% of cases, none of the three genes adjacent upstream or downstream to the wH fusion gene encoded a DNA methyltransferase, suggesting that most of the new wH fusion proteins acted as stand-alone endonucleases, possibly as type IV restriction systems.
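The neighborhood test described above (three genes upstream and three downstream) can be sketched as follows. This is an illustrative sketch only: the gene annotations are hypothetical, and matching on the keyword "dna methyltransferase" in a product description is an assumption, not the annotation procedure used in the study.

```python
def has_neighbor_mtase(products, index, flank=3, keyword="dna methyltransferase"):
    """Check the `flank` genes on each side of gene `index` for a keyword.

    `products` is a list of product descriptions in genomic order.
    The gene at `index` itself is excluded from the check.
    """
    start = max(0, index - flank)
    neighbors = products[start:index] + products[index + 1:index + 1 + flank]
    return any(keyword in p.lower() for p in neighbors)


def fraction_stand_alone(loci, flank=3):
    """Fraction of loci with no DNA methyltransferase among the neighbors."""
    if not loci:
        raise ValueError("no loci given")
    alone = sum(1 for products, idx in loci
                if not has_neighbor_mtase(products, idx, flank))
    return alone / len(loci)


# Hypothetical loci: (ordered product list, position of the wH fusion gene).
locus_a = (["transposase", "hypothetical protein", "wH fusion endonuclease",
            "ABC transporter", "tRNA ligase"], 2)
locus_b = (["DNA methyltransferase", "wH fusion endonuclease",
            "hypothetical protein"], 1)
stand_alone_fraction = fraction_stand_alone([locus_a, locus_b])
```

In this toy input, one of the two loci has an adjacent methyltransferase, so half of the loci count as stand-alone.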
Likely specificity of at least some of the wH fusion proteins for 6mA in the GATC context
Adenine methylation in bacteria occurs frequently in the GATC context, i.e., the target sequence of the Dam methyltransferase (MTase) (Marinus and Casadesus, 2009), which is widely distributed in bacteria because of its diverse house-keeping roles, including DNA replication (Boye and Lobner-Olesen, 1990) and mismatch repair (Au et al., 1992; Josephs et al., 2015). Hence, it was likely that the putative MDREs with the wH domain might detect adenine methylation in this sequence context. This idea was further supported by the precedent of the wH domain in DpnI, which is known to be specific for adenine methylation in the Dam context (with some leeway for the outer bases: S6mATS, where S is G or C). Inspection of the crystal structure of the DpnI domain in complex with target DNA revealed that the same residues contribute to both the methyl binding pocket and the sequence specificity, suggesting that methyl sensing and detection of the G6mATC target sequence are intricately linked (Mierzejewska et al., 2014). Large-scale analysis of the wH domain fusion proteins showed that the motif for 6mA and GATC recognition (see arrows in Figure 1D), or closely related motifs, were present in approximately 10% of the new fusion proteins. With the exception of SruGXI as a representative of the wH-HNH endonucleases, we selected for further characterization the fusion proteins that had the motif for 6mA and GATC specificity. Such fusion proteins are very likely to recognize m6A in the GATC context. The remaining stand-alone wH fusion proteins are also likely to recognize modified DNA (otherwise they would be toxic to the host). However, it is currently unclear whether the modification is m6A, and, if so, whether the sequence context is GATC.

Avoidance of genomic conflict
Inspection of a 10 kb interval around genes encoding the new fusion proteins revealed association with methyltransferases in some cases. PD-(D/E)XK-wH domains co-occurred with predicted C5 methyltransferases in 40 cases. In some cases, they also co-occurred with a directly adjacent predicted 4mC (N4mC) or 6mA MTase. In these cases (e.g., Bacteroidota bacterium isolate CP064983.1, Moraxella ovis strain CP011158.1), the putative DNA MTase has been inactivated by a frame shift. The PUA-wH-HNH and wH-HNH co-occurred in 47 cases with Eco57I-like MTases (of type IIG R-M-S fusion enzymes). These MTases are predicted to be 6mA MTases, with CTGAAG (site of methylation underlined) as the target sequence. In the case of the wH-GIY-YIG endonucleases, we found four instances of an EcoRI-like adenine MTase nearby. These MTases are expected to methylate GAATTC. Finally, for the PLD-wH endonucleases, we detected 17 cases of proximity to EcoEI-like (GAGN7ATGC) or EcoR124I-like (GAAN6RTCG) type I methyltransferases, also causing no conflict. Genetic conflict would not be expected in any of these cases if the new wH fusion proteins had specificity for 6mA in the GATC context. Next, we looked for possible genetic conflicts on a genome-wide scale, assuming that the new wH fusion proteins were specific for m6A in a GATC context. Three types of such conflict are conceivable. First, a frequent adenine methyltransferase, such as M.EcoGII, may methylate adenine to m6A in GATC, among many other contexts. Second, a Dam-like methyltransferase may specifically modify GATC sequences. Finally, there are also methyltransferases that methylate target sequences that are longer than GATC but include the GATC site in the recognition sequence. We searched for cases of such potential conflict, scanning the entire genome, not just the genomic neighborhoods. Overall, fewer than 100 cases of potential conflict were identified (Supplementary Tables S1-S3) for over 1,000 wH fusion proteins. Most of the wH fusion proteins in potential conflict are stand-alone enzymes without an associated methyltransferase. Genomic conflict could be avoided if these proteins recognized modified DNA containing a mark other than m6A in the GATC context (i.e., either another methylation type or m6A in a different sequence context). Alternatively, conflict may be avoided or mitigated by tight expression control of the endonuclease or the methyltransferase.
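The third type of conflict, a longer recognition sequence that includes GATC, can be checked mechanically. The sketch below is an illustration under a stated assumption: a site counts as including GATC only if four consecutive positions are fixed as G, A, T, and C, so degenerate positions such as N do not count. It says nothing about which base within a site is methylated, and it is not the screening procedure used in the study.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def always_contains(site, core="GATC"):
    """True if some window of `site` is fixed to exactly the bases of `core`.

    GATC is palindromic, so checking one strand is sufficient for this core.
    """
    site = site.upper()
    width = len(core)
    for start in range(len(site) - width + 1):
        window = site[start:start + width]
        if all(IUPAC[code] == base for code, base in zip(window, core)):
            return True
    return False


# Recognition sequences mentioned in the text, with N6/N7 written out.
sites = {
    "BamHI": "GGATCC",
    "Eco57I-like": "CTGAAG",
    "EcoRI-like": "GAATTC",
    "EcoEI-like": "GAGNNNNNNNATGC",
    "EcoR124I-like": "GAANNNNNNRTCG",
}
includes_gatc = {name: always_contains(seq) for name, seq in sites.items()}
```

Of the sites listed, only the BamHI sequence (used in the text as a CLANS control) contains a fixed GATC, which is consistent with the statement that the neighboring methyltransferases cause no conflict.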

Selection of wH fusion proteins for experimental characterization
Four types of endonuclease domains are noted in wH domain fusions: (1) DpnI-like PD-(D/E)XK endonuclease domain; (2) HNH endonuclease domain; (3) GIY-YIG endonuclease domain; and (4) PLD family endonuclease domain. An NTD-wH-NTPase fusion is usually paired with another endonuclease subunit, such as an McrC-like catalytic subunit, which is not discussed in detail here. We have not studied evolutionary relationships within each endonuclease family since the endonuclease families have been the subject of numerous review articles and research papers (Mehta et al., 2004; Grazulis et al., 2005; Pingoud et al., 2005; Dunin-Horkawicz et al., 2006). For experimental characterization, we chose representatives of the PD-(D/E)XK-wH (FcyTI), PUA-wH-HNH (HhiV4I), wH-HNH (SruGXI), wH-GIY-YIG (Ahi29725I, Apa233I), and PLD-wH (Aba4572I, CbaI) fusions. In the case of the PUA-wH-HNH architecture, many additional candidate MDREs were tested in E. coli cells only. An attempt to purify an NTD-wH-NTPase/McrC-like subunit fusion protein was unsuccessful. Therefore, we focused this study exclusively on single-chain proteins.

The wH fusion endonucleases exhibit Dam + -dependent toxicity in E. coli cells
Endonuclease toxicity is a good proxy for restriction in bacterial cells (Heitman and Model, 1990; Fomenkov et al., 1994). If the putative MDREs were specific for Dam-methylated DNA, they should be more toxic to Dam+ (C2566) than to Dam− (ER2948) E. coli cells. We tested this prediction with our IPTG-inducible expression constructs, both under basal (no IPTG) conditions and under induction conditions (0.5 mM IPTG). Since Dam− competent cells were roughly an order of magnitude less competent than Dam+ cells, we avoided direct comparisons of transformation efficiency between Dam+ and Dam− cells. Instead, we quantified the reduction in colony counts for transformations with expression plasmids compared to colony counts with an empty vector. With the exception of the Aba4572I expression construct, the plasmids for all endonuclease-containing clones caused a reduction in colony count by two to three orders of magnitude compared to the empty vector control in Dam+ cells under induced conditions (Figure 3). Aba4572I endonuclease may have strong non-specific endonuclease activity since it is toxic in a Dam-deficient strain under IPTG induction. The toxicity appeared to be less severe in Dam+ cells. This result is not well understood. If the Aba4572I outlier is disregarded, the experimental results indicate that the wH fusion endonucleases display typical "restriction" of Dam+ but not Dam− host cells. Whether this in vivo "restriction" was caused by tight binding to modified sites to inhibit replication or by endonuclease cleavage of modified DNA remains to be investigated.
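The normalization described above, in which each strain is compared to its own empty-vector control rather than to the other strain, amounts to a simple ratio. The sketch below uses made-up colony counts; treating zero colonies as one colony (so that the result is a lower bound) is an assumption added for the example.

```python
import math


def fold_reduction(colonies_construct, colonies_empty_vector):
    """Fold reduction in colony count relative to the empty vector.

    Both counts must come from the same host strain, so differences in
    competence between strains cancel out. Zero colonies are treated as
    one colony, which makes the returned value a lower bound.
    """
    if colonies_empty_vector <= 0:
        raise ValueError("empty-vector control must yield colonies")
    return colonies_empty_vector / max(colonies_construct, 1)


def orders_of_magnitude(colonies_construct, colonies_empty_vector):
    """Same comparison expressed on a log10 scale."""
    return math.log10(fold_reduction(colonies_construct, colonies_empty_vector))


# Hypothetical counts for one construct in the two hosts.
dam_plus = orders_of_magnitude(colonies_construct=2, colonies_empty_vector=2000)
dam_minus = orders_of_magnitude(colonies_construct=150, colonies_empty_vector=200)
```

With these hypothetical counts, the construct reduces colony numbers by three orders of magnitude in the Dam+ host and by well under one order of magnitude in the Dam− host.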

PD-(D/E)XK-wH endonucleases
As representatives of the PD-(D/E)XK-wH family, we selected Psp4BI and FcyTI (GenBank accession numbers WP_102090895 and WP_094411979). The two enzymes have 58.4% and 58.7% amino acid (aa) sequence identity to DpnI, respectively. Psp4BI was chosen because the source organism is psychrophilic, suggesting that the enzyme might be susceptible to heat inactivation, which would be desirable for biotechnological applications. The synthetic genes with E. coli optimized codons were cloned into pTXB1 in fusion with intein and CBD (chitin-binding domain) and expressed in the Dam-deficient T7 expression strain. The two enzymes were affinity purified on a chitin column and released from the column by DTT-triggered cleavage. The yield of Psp4BI was low due to poor expression of the fusion protein (Psp4BI-intein-CBD) (not shown); partially purified Psp4BI gave rise to a partial digestion pattern that was retained after 4 h at 25-37°C. The low activity could be caused by a low enzyme concentration or inhibition by some impurities in the preparation. By contrast, purified FcyTI was active on Dam+ pBR322, pUC19 (HindIII-linearized), and phage λ DNA (Supplementary Figures S2A,B). The specific activity of FcyTI was determined to be approximately 32,000 U/mg protein in buffer 2.1. FcyTI could be inactivated by heating at 65°C for 30 min, which is a useful enzyme property (Supplementary Figure S3). FcyTI endonuclease was originally found in the genome of Flavobacterium cyanobacteriorum, which grows at 20-30°C. Due to its better biochemical properties, FcyTI was used for the in vivo toxicity study (Figure 3) and for the digestion of modified oligos (see Figure 4). The FcyTI expression plasmid showed over a 1,000-fold reduction in transformation efficiency in the Dam+ T7 strain compared to a Dam− host (Figure 3). FcyTI could be over-expressed only in the Dam− T7 expression strain. Run-off sequencing demonstrated that the enzyme was able to cleave within the Gm6A↓TC recognition sequence (Supplementary Figure S4). The purified 6× His-tagged FcyTI shown in Supplementary Figure S5 was used for the digestion of modified or hemi-modified oligoduplexes.
To compare the activity of the enzyme toward fully methylated, hemi-methylated, and non-methylated DNA, we digested synthetic DNA oligoduplexes and quantified substrate and product amounts after restriction digestion by capillary electrophoresis (CE) (Figure 4). The results showed that FcyTI was most active on fully methylated DNA but also had partial activity on hemi-methylated DNA, similar to DpnI. No digestion product was detected for non-methylated DNA. MboI was used as a control and digested only unmodified GATC oligos. Fully and hemi-modified substrates were resistant to MboI restriction (Figure 4). The original CE digestion results are shown in raw data (PeakScan analysis of CE peaks). Consistent with the duplex oligo digestion, unmodified pUC19 (HindIII-linearized) or phage DNA was poorly cleaved by FcyTI (Supplementary Figure S2), although weak activity was observed on Dam− pUC19, probably due to the high enzyme concentration.
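Quantifying substrate and product from CE traces reduces to a ratio of peak areas. The sketch below is a generic illustration, not the PeakScan workflow: the peak areas are invented, and it assumes that the summed areas of the labeled product peaks and the remaining substrate peak are directly comparable.

```python
def percent_cleaved(substrate_area, product_areas):
    """Percentage of labeled DNA found in product peaks.

    `substrate_area` is the area of the uncut substrate peak and
    `product_areas` is a list of areas for the cleavage product peaks.
    """
    product_total = sum(product_areas)
    total = substrate_area + product_total
    if total <= 0:
        raise ValueError("no signal in the trace")
    return 100.0 * product_total / total


# Hypothetical peak areas for three oligoduplex substrates.
example = {
    "fully methylated": percent_cleaved(10.0, [90.0]),
    "hemi-methylated": percent_cleaved(60.0, [40.0]),
    "non-methylated": percent_cleaved(100.0, []),
}
```

The invented values follow the qualitative pattern reported in the text (fully methylated highest, hemi-methylated partial, non-methylated none) but are not measurements.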

PUA-wH-HNH fusion endonuclease HhiV4I
6× His-tagged HhiV4I (see Supplementary Figure S5) was subjected to three-step chromatography (Ni-agarose column, DEAE column, and Heparin agarose column). Compared to the recently published paper on the same enzyme (Lu et al., 2023), two additional chromatography steps were used (DEAE and Heparin columns). Unfortunately, the Heparin agarose chromatography step was less efficient for purification than is typical for other DNA-binding proteins because HhiV4I was in the flow-through and did not bind to the Heparin column, as would be expected for a typical nucleic acid-binding protein. As a result, HhiV4I was not purified to homogeneity (Supplementary Figure S5). Mass spectrometry analysis of the contaminants identified, among other proteins, E. coli exonuclease VII (Exo VII) as a minor contaminant (see raw data for the HhiV4I mass spectrometry study). Exo VII cleaves single-stranded DNA (ssDNA) in both the 5′ → 3′ and 3′ → 5′ directions. This enzyme is not active on linear or circular dsDNA. The contaminating exonuclease is unlikely to interfere with major cut site determination, but it may interfere with the determination of minor cut site(s) by removing a few nucleotides from the ends cleaved by HhiV4I if the overhang is single-stranded.
The partially purified 6× His-tagged HhiV4I was used for HhiV4I characterization. Consistent with the findings of Lu et al. (2023), we observed that HhiV4I was much more active in the presence of Mn2+ ions than in the presence of other divalent metal cations (Figure 5A). HhiV4I showed weak DNA-nicking activity in Mg2+ buffer.
In agreement with the toxicity experiments (Figure 3) and the results of Lu et al. (2023), we found that the enzyme had higher activity against Dam+ than Dam− pBR322, pUC19, λ DNA, and synthetic duplex oligos. If HhiV4I cleaved at or near Dam+ sites, its cleavage products should be of similar size to those of DpnI digestion, and discrete bands (as opposed to a smear on the gel) should be observed.
In our experiments, we saw only a partial match of fragment sizes, likely due to incomplete digestion (Figure 6A) (see below for the two-site requirement for efficient cleavage). Dam+ phage λ DNA was also only partially digested (λ DNA is only partially methylated by the host Dam methylase during rapid phage replication in E. coli; unpublished observation), while Dam− λ DNA was not cut at all (Figure 6B). When Dam− λ DNA was methylated in vitro by Dam methylase or M.EcoGII, the DNA substrates became cleavable by HhiV4I (Figure 6C), further demonstrating that GATC methylation is required for restriction. M.EcoGII-modified λ DNA appeared to be a slightly better substrate for HhiV4I restriction than Dam-modified λ DNA, indicating that the wH domain might not be strictly limited to the detection of 6mA in the GATC context.
We could digest non-methylated DNA with excess HhiV4I, suggesting that the dependence of the enzyme on adenine methylation was not absolute (most restriction enzymes display star activity at high enzyme concentration, high glycerol concentration, or low salt). This conclusion was confirmed with the digestion of synthetic DNA with a defined adenine methylation status. As expected, HhiV4I was most active on fully methylated DNA but had some activity on hemi- and non-methylated DNA (Figure 4). In agreement with the findings by Lu et al. (2023), we did not detect activity of HhiV4I toward PCR products that contained 5mC or 5hmC instead of C, in conditions conducive to digestion of m6A-containing DNA (Figure 5B). Since the PCR products contain 5mC and 5hmC in many different contexts, this result suggests that the enzyme has no activity against methylated or hydroxymethylated DNA, despite the presence of the PUA (SRA-like) domain. This was surprising because it had been shown previously that PUA superfamily REases VcaM4I, SRA-like domain-containing endonuclease TagI, and PvuRts1I restricted DNA containing modified cytosines (Janosi et al., 1994; Pastor et al., 2021). Possible activity against WT T4 [glucosylated(g)-5hmC] modified DNAs remains to be tested. HhiV4I showed no activity on dZ (modified adenine, 2-aminoadenine, or 2,6-diaminopurine)-modified PCR DNA (Figure 5B).
HhiV4I prefers to cut between two G6mATC sites with optimal spacers of 13-27 bp in Dam+ pBR322. Shorter spacers of 8-11 bp or longer spacers of >42 bp were cleaved more slowly. Run-off sequencing of HhiV4I-digested Dam+ DNA confirmed that the enzyme cleaved in the vicinity of but not within the G6mATC sequence, as previously reported (Supplementary Figure S6).
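The spacer dependence described above can be illustrated with a minimal sketch that lists the distances between consecutive GATC sites in a sequence and labels them using the ranges quoted in the text. Several points are assumptions made for illustration only: the spacer is taken as the number of base pairs between the end of one GATC and the start of the next, the sequence is treated as linear (pBR322 is circular), and the function names `gatc_spacers` and `classify_spacer` are hypothetical. Spacer lengths for which the text gives no rate (for example 12 bp or 28-42 bp) are reported as "not stated" rather than guessed.

```python
import re

def gatc_spacers(seq):
    """Return the spacer lengths (bp) between consecutive GATC sites.

    Assumption: spacer = bases between the end of one GATC and the start
    of the next, on a linear sequence. GATC is palindromic, so scanning
    one strand finds the sites on both strands.
    """
    starts = [m.start() for m in re.finditer("GATC", seq.upper())]
    return [b - (a + 4) for a, b in zip(starts, starts[1:])]

def classify_spacer(n):
    """Label a spacer using only the ranges quoted in the text."""
    if 13 <= n <= 27:
        return "optimal"
    if 8 <= n <= 11 or n > 42:
        return "slow"
    return "not stated"  # the text gives no rate for these lengths

# Toy sequence (hypothetical, not taken from pBR322)
toy = "GATC" + "A" * 14 + "GATC" + "T" * 9 + "GATC"
for n in gatc_spacers(toy):
    print(n, classify_spacer(n))
```

On the toy sequence the two spacers are 14 bp and 9 bp, which fall in the optimal and slowly cleaved ranges, respectively. Whether methylation is present at each site is not modeled; the sketch only reports the site geometry.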

PUA-wH-HNH and wH-HNH endonucleases
In contrast to the prophage-encoded HhiV4I, most PUA-wH-HNH enzymes (375-496 aa long) and wH-HNH endonucleases (224-283 aa long) are bacterial/archaeal enzymes. For 15 of these enzymes and HhiV4I as a positive control, we attempted expression in Dam− E. coli cells. Moreover, we analyzed the transformation efficiency into Dam+ (C2566) and Dam− (ER2948) cells compared to the empty vector. Restriction activity was examined in the presence of IPTG induction to elevate the genome conflict (Table 1). Some restriction genes, such as HhiV4I and SruGXI, had a strong toxic effect, as detected by a 100-1,000-fold reduction in transformation efficiency in the Dam+ host. Other ORFs caused an approximately 10-fold reduction in transformation efficiency, consistent with partial restriction (+/−). The transformation of the HhaN23I gene caused the formation of very small colonies in the presence or absence of IPTG, indicating partial restriction. A few ORF constructs showed no difference in transformation efficiency in the Dam+ host, presumably as a result of poor expression or lack of activity (e.g., HboP9I). As a control, the pTXB1 empty vector could be readily transferred into C2566 (Dam+) or ER2948 (Dam−) cells in the presence of IPTG (Table 1; Figure 7).
Selected enzymes that appeared to be promising as Dam+-dependent MDREs were partially purified, and their activity was tested on Dam+ pBR322 or λ DNA. The partially purified HtuIII enzyme (GenBank accession number NC_013743, PUA-wH-HNH fusion) showed low nicking activity in Mn2+ or Co2+ buffer (Supplementary Figure S7). DNA run-off sequencing of the partially nicked pBR322 indicated that the nick occurred upstream of the Gm6ATC site (top strand nicking only; ↓NGm6ATC-N14-Gm6ATC).
The results of wH-HNH and PUA-wH-HNH endonuclease activities and in vivo toxicity are summarized in Table 1. Analogous to HhiV4I, HtuIII also preferred Mn2+ or Co2+ for catalytic activity, suggesting that both enzymes have a unique metal ion binding site that is different from the typical HNH ββα-metal catalytic domain found in type II REases, homing endonucleases, Cas9, and non-specific endonucleases utilizing Mg2+ as a cofactor. The cofactor preferences are similar to the preferences of E. coli EcoKMcrA endonuclease and ScoMcrA (Liu et al., 2010).

wH-GIY-YIG endonucleases
Two wH-GIY-YIG fusion proteins, Ahi29725I (WP_035368356) and Apa233I (WP_026653965), were selected for characterization. The proteins occur naturally in Acholeplasma hippikon (ATCC29725 strain) and Acholeplasma palma (J233 strain), respectively. Acholeplasma are bacteria without cell walls in the Mollicutes class with small genomes (1.5-1.65 Mbp). Acholeplasma species are found in animals, insects, and some plants in the environment. Some Acholeplasma species are pathogenic and can contaminate mammalian cell cultures. We expressed both proteins in Dam− E. coli and purified the proteins by chromatography through chitin, DEAE, and heparin columns. The analysis of the purified proteins on SDS-PAGE is shown in Supplementary Figure S5. Protein mass spectrometry analysis of the purified enzymes showed minimal exonuclease contamination (see raw data for protein composition analysis).
The purified Ahi29725I enzyme was assayed on Dam+ and Dam− λ DNA to test modification dependence (Figure 8). Dam− λ DNA was also methylated by Dam methylase (M.Dam) or EcoGII frequent adenine methylase (M.EcoGII) in the test tube and used for Ahi29725I digestions. The Ahi29725I endonuclease generated a partial digestion pattern on Dam+ λ DNA. It showed no cleavage activity on Dam− λ DNA, indicating restriction activity dependent on Dam modification. When Dam− λ DNA was methylated in vitro by Dam methylase or M.EcoGII, the modified substrates became cleavable by Ahi29725I (Figure 8). In control digestions, MboI, DpnII, and Sau3AI were able to cleave Dam− λ DNA, but DpnI was not. Similarly, Ahi29725I and Apa233I endonucleases are also active on Dam+ pBR322 and inactive on Dam− pBR322 (Figure 9). However, high enzyme concentrations of Apa233I resulted in non-specific digestion (smearing) of Dam− DNA. The finding was attributed to non-specific activity on unmodified DNA, since most restriction enzymes display star activity at high enzyme or glycerol concentrations or at low salt. The Ahi29725I- and Apa233I-digested pBR322 (Dam+) DNA was subjected to run-off sequencing with primers annealing near the Gm6ATC sites. Cleavages occurred outside the recognition sequence, at a variable distance from the site of methylation (i.e., Ahi29725I cleaves G6mATC at N1-23) (Supplementary Figures S9, S10). Ahi29725I and Apa233I have limited, if any, preference for cleavage at NN/RN and NN/GN sites, respectively (Supplementary Figure S11), which would have to be attributed to endonuclease sequence preferences.
To test whether Ahi29725I-catalyzed DNA cleavage could be directed by adenine methylation in addition to the G6mATC sequence context, we digested M.EcoGII-modified pBR322 DNA (Dam−) to see any enhancement of activity due to frequent adenine methylation. M.EcoGII is capable of methylating all adenines in DNA substrates except in polyA tracts (Murray et al., 2018). Ahi29725I activity was enhanced on M.EcoGII-methylated DNA substrate (Supplementary Figure S12). Three large fragments of Dam+ DNA were further digested into smaller fragments after M.EcoGII methylation. However, it was not clear whether the enhanced activity was due to 6mA-dependent relaxed sequence recognition (e.g., cleavage near the Cm6ATC star site or Sm6ATS sites, S = G or C). The enhanced activity on M.EcoGII-modified DNA remains to be characterized in the future by using defined modified oligos or restriction digestion/NGS sequencing mapping of M.EcoGII-modified λ DNA.
In vivo toxicity study: plasmid transfer into C2566 (Dam+) and ER2948 (Dam−) competent cells by transformation (~50 ng plasmid DNA). Three types of restriction phenotypes were observed: a strong reduction in transformation efficiency due to gene conflict (e.g., HhiV4I, SruGXI, and HspR68I); small colony formation in Dam+ hosts, presumably due to mild toxicity of the restriction gene (e.g., HhaN23I); and no noticeable change in transformation efficiency (e.g., HboP9I) compared to the empty vector control. The assay was done semi-qualitatively based on visualization of the transformation plates and not quantitatively, since the number of colonies was not counted. Toxicity was more apparent with IPTG induction (0.5 mM IPTG in Amp plates). The overall transformation efficiency is lower in the Dam-deficient host.
10.3389/fmicb.2024.1286822 Frontiers in Microbiology 14 frontiersin.org
In digestion of methylated duplex oligos with a single G6mATC site, it was noted that Ahi29725I preferentially cleaved fully methylated oligos (M+/M+) over hemi-modified substrates (M+/M− or M−/M+). However, Apa233I endonuclease was able to cut both fully modified and hemi-modified oligos (Figure 4). This discrepancy in methylation dependence between the two enzymes is unexplained.
If the in vitro divalent cation requirement of Ahi29725I were relevant in cells, invading 6mA-modified DNA would be digested by Mg2+-bound Ahi29725I. By contrast, activation of the non-specific endonuclease activity of Ahi29725I with Mn2+, Co2+, or Ni2+ in the active site could lead to cleavage of both cellular and invading DNA regardless of modifications, triggering cell death and preventing phage release.

PLD-wH endonucleases
We identified 27 predicted PLD-wH fusion endonucleases in bacterial genomes. Two putative restriction genes from Anaerolineaceae bacterium (Aba4572I) and Chloroflexi bacterium (CbaI) were cloned into the pTXB1 expression vector. However, upon IPTG induction, no over-expressed proteins were detected. In the gene neighborhood analysis, the Aba4572I ORF resides in a genomic region encoding (1) a DNA MTase (predicted specificity CGATCG, amino-MTase), (2) the PLD-wH endonuclease, and (3, 4) two hypothetical proteins. If the CGATCG site is methylated to become CGm6ATCG, it would be a substrate for the PLD-wH endonuclease, which could potentially result in self-restriction. The CbaI enzyme is located in a region with (1) Leu-tRNA ligase, (2) restriction endonuclease, (3) PLD-wH endonuclease, (4) hypothetical protein, and (5) dimethyl-menaquinone MTase. Since the transformation of Aba4572I and CbaI was less toxic in Dam− cells in a non-induced condition (see Figure 3), the lack of expression in Dam− cells is surprising and requires further investigation. Expression of two more PLD-wH fusion proteins containing the conserved catalytic residues HxDx(4)K and HxEx(4)K in the PLD endonuclease domain in E. coli Dam− cells was not successful due to toxicity. More work is necessary to explain the reasons for the poor expression of PLD-wH fusion endonucleases in E. coli.

NTD-wH-NTPase fusions
The N-terminal domain-wH-NTPase fusions are usually paired with another catalytic subunit, such as an McrC (Ross et al., 1989) or McrC-like protein with a PD-(D/E)XK endonuclease motif (Pieper and Pingoud, 2002). This arrangement is reminiscent of McrBC, a type IV restriction system acting on modified cytosines (Stewart et al., 2000). We made only one attempt, which was unsuccessful, to purify a putative heterodimeric NTD-wH-NTPase/McrC complex. Therefore, we have not studied the possible activity of these enzymes toward 6mA-containing DNA or their more general activity in the restriction of modified DNA. More work is needed to characterize this group of putative type IV restriction systems with wH-NTPase fusion.

Discussion
wH domain as a sensor of fully methylated ApT in a dsDNA context
The wH domain was first associated with adenine methylation because of its presence in the C-terminal region of E. coli and phage T4 Dam methyltransferases (Teichmann et al., 2012), and later because of its presence in the adenine methylation-dependent DpnI restriction endonuclease. A subsequent study on DpnI showed that the wH domain binds dsDNA at fully methylated ApT sites without base flipping. The two methyl groups, which are in close proximity, are bound in a single pocket of the wH domain of DpnI (Mierzejewska et al., 2014). The study on DpnI also showed that the specificity of the domain for the flanking sequence was somewhat relaxed with respect to the Dam Gm6ATC consensus and the Sm6ATS sites (where S stands for G or C) (Siwek et al., 2012). In this study, we show that the properties of an adenine methylation reader carry over to many fusions with HNH, GIY-YIG, and likely also PLD endonuclease domains. If methylation is seen in the ApT context, there is a preference for fully methylated sites except for Apa233I (see Figures 3, 4). Our study shows that in all tested fusion proteins, the wH domain can operate as an adenine methylation reader for the Gm6ATC context. A study on the wH-GIY-YIG endonucleases further indicates that additional cleavage sites are likely created when DNA is hypermethylated by M.EcoGII (see Supplementary Figure S12). Hence, the Gm6ATC preference of the wH domains of the wH-GIY-YIG endonucleases may also be relaxed, but the star binding sites remain to be characterized (star sites are usually defined as DNA sequences with one base off from the cognate site; if there are two bases off, these sequences are usually called non-cognate sites; Pingoud et al., 2016).
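The star-site definition quoted above (one base off from the cognate site) can be made concrete with a short sketch. It enumerates all single-mismatch variants of GATC and classifies a candidate site by its Hamming distance to the cognate sequence. The function names are hypothetical, and treating every site with two or more mismatches as non-cognate is an assumption that extends the quoted definition, which mentions only two mismatches. The sketch says nothing about which of these sites the enzymes actually bind; as stated in the text, that remains to be characterized.

```python
BASES = "ACGT"

def hamming(a, b):
    """Number of mismatched positions between two equal-length sites."""
    if len(a) != len(b):
        raise ValueError("sites must have the same length")
    return sum(x != y for x, y in zip(a, b))

def classify_site(site, cognate="GATC"):
    """Classify a site as cognate, star (one base off), or non-cognate.

    Assumption: two or more mismatches are all labeled non-cognate.
    """
    d = hamming(site.upper(), cognate)
    if d == 0:
        return "cognate"
    if d == 1:
        return "star"
    return "non-cognate"

def star_sites(cognate="GATC"):
    """All sequences that differ from the cognate site at exactly one base."""
    return sorted(
        cognate[:i] + b + cognate[i + 1:]
        for i in range(len(cognate))
        for b in BASES
        if b != cognate[i]
    )

sites = star_sites()
print(len(sites), "star sites:", sites)
# Star sites that keep the central ApT step needed for a 6mApT reader
print([s for s in sites if s[1:3] == "AT"])
for s in ("GATC", "CATC", "GATG", "CATG"):  # the four SATS sequences
    print(s, classify_site(s))
```

Under this definition, CATC and GATG are star sites of GATC, whereas CATG differs at two positions and would count as non-cognate even though it matches the SATS pattern.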
The identification of prokaryotic winged helix domains as sensors of adenine methylation contrasts with the role of some eukaryotic winged helix domains as sensors of non-methylated CpG (Stielow et al., 2021; Becht et al., 2023; Weber et al., 2023). The superposition of the winged helix domains of prokaryotic DpnI (Mierzejewska et al., 2014) and eukaryotic KAT6A (also called histone acetyltransferase KAT6A, lysine acetyltransferase 6A, zinc finger protein 220, and MYST-3) (Weber et al., 2023) shows that the dsDNA molecules are bound to opposite faces of the wH domain (Figure 10), indicating that the two DNA-binding modes have likely evolved independently for needs that are characteristic of prokaryotes (sensing of Dam methylation) and eukaryotes (sensing of the absence of CpG methylation).

Cooperation with endonuclease domains
For most NTP-independent MDREs, there is a clear division of labor between the modification reader and endonuclease domains. The former recruits the enzyme to modification sites, and the latter cleaves the DNA at a distance from the recognition site, which is likely defined by the length of the linker that connects the two domains. The nuclease domain generally has low or only very relaxed sequence specificity, and it is likely not modification-specific. How the modification sensor domain keeps the activity of the nuclease domain in check is not well understood. In some cases, it can be shown that the linker has an inhibitory role for the endonuclease that is only relieved once modified DNA is bound to the reader domain and the complex reorganizes structurally (Pastor et al., 2021). The results for the PUA-wH-HNH (HhiV4I) and wH-GIY-YIG (Ahi29725I and Apa233I) enzymes that were tested by run-off sequencing are consistent with this expectation. As for the cytosine modification-specific MDREs, cleavage occurred mostly at a distance from the recognition sequence, except for the BisI family REases (e.g., Eco15I and NhoI) that cut within the recognition sequence GCNGC with 2-4 modified cytosines (Xu et al., 2016). Among the wH fusion endonucleases, DpnI and its isoschizomers are the exception to the rule that cleavage always occurs outside of and not within the recognition sequence. Mechanistically, DpnI DNA cleavage within the recognition sequence is a consequence of the fact that the endonuclease domain has separate sequence and modification specificities (Siwek et al., 2012). In this scenario, the role of the wH domain is similar to the role of the extra specificity domain in type IIE restriction endonucleases (Roberts et al., 2003), except that the target sequence contains a modified base. Type IIE restriction endonucleases are one of the subfamilies that require a pair of target sequences in order to show activity (Senesac and Allen, 1995; Colandene and Topal, 1998). Run-off sequencing shows that FcyTI and Psp4BI cleave inside the recognition sequence, like DpnI, pointing to the separate sequence and modification specificity of the catalytic PD-(D/E)XK domain. Given the high sequence conservation of the PD-(D/E)XK-wH endonuclease family, it is likely that this property is general for the entire family.

Sensors/readers for N6-methyladenine in DNA
Despite the many roles of DNA adenine methylation in prokaryotes (and eukaryotic organelles), the repertoire of reader domains for m6A in DNA is still surprisingly limited (Iyer et al., 2016). Perhaps the best-known adenine methylation sensors are the YTH (Liao et al., 2018) domains, which belong to (or are related to) the PUA superfamily domains. The PUA superfamily domains are believed to flip the modified 2′-deoxynucleotide out of duplex DNA (Shao et al., 2014) or to bind a single nucleotide in RNA in the reader pocket (Li et al., 2014; Xu et al., 2014). Therefore, at least when acting in isolation, they can be considered as sensors of a single modified adenine. Consistent with this role, most YTH domains sense adenine methylation in RNA rather than DNA. However, some YTH and ASCH domains in prokaryotes are considered as DNA adenine methylation sensors (Iyer et al., 2016). For the ASCH domains, this remains to be experimentally shown, since currently only a 4mC reader role is experimentally supported (Stanislauskiene et al., 2020). Apart from the YTH and ASCH domains, the HARE-HTH and RAMA (Restriction enzyme and Adenine Methylation Associated) domains have also been suggested to serve as readers of adenine methylation in DNA (Teichmann et al., 2012). The HARE-HTH domains are related to winged helix domains, which would be consistent with a role as adenine methylation sensors. However, they have an extra helix inserted into the HTH motif of the winged helix domain, and recent analysis suggests that they are more likely to sense cytosine modifications (Aravind and Iyer, 2012). Unlike the HARE-HTH domains, the RAMA domains are unrelated to the wH domain in fold (Yang et al., 2023). For the RAMA domain-containing MPND protein, there is some biochemical evidence for adenine methylation sensing (Kweon et al., 2019). However, a preference for adenine-methylated DNA could not be experimentally confirmed (Yang et al., 2023). We noticed the occurrence of RAMA-Mrr catalytic domain (PD-QXK)-NTPase (three-domain fusion) and GIY-YIG-RAMA (two-domain fusion) in prokaryotes, which might indicate that the RAMA domain is utilized similarly to the wH domain in these fusions. Future studies will be focused on the characterization of YTH-NTPase, YTH-HNH, and RAMA fusion endonucleases as 6mA readers/sensors in type IV restriction systems.
Structural comparison of the wH domains of DpnI (G6mATC), which is of bacterial origin, and the mammalian KAT6A protein, which recognizes unmodified CpG sites. The docking of wH domains on dsDNA shows a major difference in recognition. The wH domain-containing protein KAT6A is a histone lysine acetyltransferase that acetylates lysine residues in histones H3 and H4 (in vitro) (Weber et al., 2023).

FIGURE 1
Winged helix (wH) domain fold and role as a methylation sensor. (A) Canonical wH fold. (B) Cartoon representation of the DpnI wH domain bound to fully methylated DNA, based on the crystal structure (Siwek et al., 2012). (C) Methyl binding region of the DpnI wH domain. (D) Alignment of representative winged helix domains in endonuclease or NTPase fusions. The secondary structure annotation is based on the DpnI experimental structure, analyzed for secondary structure elements using DSSP (Kabsch and Sander, 1983). DpnI residues that are involved in the formation of the pocket for the methyl groups (of DNA methylated in both strands) are marked by an "m," and those that are involved in hydrogen bonding with the nucleobases of the GATC target sequence of DpnI are marked by an asterisk ("*"). Their identities and residue numbers in the case of DpnI are indicated below the alignment (with reference to the DpnI structure with PDB accession 4kyw; Mierzejewska et al., 2014). Note the strong overlap between methyl pocket-forming residues and residues that are involved in target sequence selection.
FokI). If the new wH fusion proteins were modification-specific, they should occur as stand-alone enzymes. Otherwise, they should be associated with a host genome-protecting DNA methyltransferase of any type (N6mA, N4mC, C5)

FIGURE 2
CLANS analysis of the full-length wH fusion proteins with PD-(D/E)XK (marked as red), PUA and HNH (dark blue), HNH (light blue), GIY-YIG endonuclease domains (yellow), and NTD (N-terminal domain)-NTPase (green) found in this study, with BamHI and related isoschizomers as a control group (purple). BOX: CLANS analysis of the wH domains alone, color-coded as in full-length proteins.

FIGURE 3
Toxicity of selected wH domain fusion endonucleases to a Dam+ but not a Dam− host. Expression vectors containing open reading frames for putative MDREs or empty plasmid (50 ng) were transformed into Dam-positive C2566 (+) or Dam-negative ER2948 (−) E. coli cells, with (+) or without (−) IPTG induction. The reduction in colony counts for expression plasmid compared to the empty vector, plotted on the ordinate, is a measure of toxicity.

TABLE 1
In vivo toxicity and in vitro activity of PUA-wH-HNH and wH-HNH endonucleases. The in vivo toxic effect of the restriction gene was measured by transformation into Dam+ and Dam− E. coli competent cells under IPTG induction. +, strong restriction (100-1,000-fold reduction in Dam+ cells); +/−, mild restriction (~10-fold reduction in Dam+ cells or formation of sick small colonies); −, no restriction. Protein expression level: +++, 2-10 mg protein per liter of IPTG-induced cells; ++, 1-1.5 mg/L; −, target protein not detected after chitin column purification or in IPTG-induced cell extract. Proteins with 375-496 aa residues are PUA-wH-HNH fusions; proteins below 300 aa residues are wH-HNH fusions. DauS27I is found in the G+ bacterium Dictyobacter aurantiacus S27. The other enzymes are found in Archaea (Halobacteria). ND, not determined. Dam+ pBR322 was used for in vitro cleavage assays.
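The scoring scheme in this legend can be sketched as a small function that converts a fold reduction in transformation efficiency into the +, +/−, and − symbols. This is only an illustration under stated assumptions: the legend gives 100-1,000-fold for strong restriction and roughly 10-fold for mild restriction but no exact boundaries, so the cutoff separating mild from no restriction (`mild_cutoff=5.0`) is hypothetical, as are the function names and the colony counts in the example. Small-colony phenotypes, which the legend also scores as +/−, are not captured by a count-based rule.

```python
import math

def fold_reduction(empty_vector_colonies, construct_colonies):
    """Fold reduction in colony number relative to the empty vector."""
    if construct_colonies == 0:
        return math.inf  # no transformants: treat as maximal reduction
    return empty_vector_colonies / construct_colonies

def restriction_score(fold, strong_cutoff=100.0, mild_cutoff=5.0):
    """Map a fold reduction to the symbols used in the legend.

    strong_cutoff follows the 100-1,000-fold range given for '+';
    mild_cutoff is a hypothetical boundary for '~10-fold'.
    """
    if fold >= strong_cutoff:
        return "+"
    if fold >= mild_cutoff:
        return "+/-"
    return "-"

# Hypothetical colony counts (empty vector, construct) in a Dam+ host
examples = {"strong": (2000, 4), "mild": (300, 30), "none": (200, 180)}
for label, (empty, construct) in examples.items():
    fold = fold_reduction(empty, construct)
    print(label, round(fold, 1), restriction_score(fold))
```

Keeping the cutoffs as parameters makes it explicit that they are choices of the person scoring the plates rather than values defined by the study.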