The Electronic Behavior of Zinc-Finger Protein Binding Sites in the Context of the DNA Extended Ladder Model

Oiwa, Nestor N.; Cordeiro, Claudette E.; Heermann, Dieter W.

doi:10.3389/fphy.2016.00013

ORIGINAL RESEARCH article

Front. Phys., 11 May 2016
Sec. Biophysics
Volume 4 - 2016 | https://doi.org/10.3389/fphy.2016.00013

Nestor N. Oiwa¹,²*

Claudette E. Cordeiro³

Dieter W. Heermann²

¹Department of Basic Science, Universidade Federal Fluminense, Nova Friburgo, Brazil
²Institute for Theoretical Physics, Heidelberg University, Heidelberg, Germany
³Department of Physics, Universidade Federal Fluminense, Niterói, Brazil

Instead of the ATCG letter alignments typically used in bioinformatics, we propose a new alignment method based on the probability distribution functions of the bottom of the occupied molecular orbital (BOMO), the highest occupied molecular orbital (HOMO), and the lowest unoccupied molecular orbital (LUMO). We apply the technique to transcription factors with Cys₂His₂ zinc fingers. These transcription factors search for binding sites by probing the electronic patterns at the minor and major DNA grooves. The eukaryotic Cys₂His₂ zinc finger proteins bind to DNA ubiquitously at highly conserved domains. They are responsible for gene regulation and the spatial organization of DNA. To study these zinc finger DNA-protein interactions, we use the extended ladder in the DNA model proposed by Zhu et al. [1]. Considering one single spinless electron in each nucleotide π-orbital along a double DNA chain (dDNA), we find a typical pattern for the BOMO, HOMO, and LUMO along the binding sites. We specifically look at two members of the zinc finger protein family: the specificity protein 1 (SP1) and early growth response 1 (EGR1) transcription factors. When the valence band is filled, we find electrons in the purines along the nucleotide sequence, compatible with the electric charges of the binding amino acids in the SP1 and EGR1 zinc fingers.

Introduction

Nucleotide alignments are the standard method for spotting DNA-protein binding sites along the genome. However, transcription factors do not identify nucleotides; they probe the dDNA surface, searching for π-orbital electronic patterns. In this work, we develop the concept of electronic alignment using one of the major eukaryotic DNA-protein binding motifs, namely those related to zinc fingers (ZF). ZFs form a key protein family for chromatin condensation as well as gene regulation. There are around one thousand ZF encoding genes [2] and tens of thousands of highly conserved putative ZF binding sites along the human genome [3, 4]. The majority of ZF proteins assist transcription factors, acting as repressors, activators, and regulators [2, 5]. They are also responsible for the spatial structure of the genome in DNA loops, exposing or hiding genes, and work as insulators, preventing the spread of heterochromatin [6]. These ZF proteins could mediate long-range chromosomal interactions in eukaryotic cells, >100 thousand base pairs (bp) [7–10]. However, the exact relation between the long-range correlations in genomic-scale nucleotide sequences (>20 thousand bp) and the three-dimensional chromosomal organization is still unclear [10–15] and is the subject of intense research. Furthermore, since transcription factors spot specific sequences without opening the double helix, we expect some biological mechanism for probing nucleotides based on local properties [16]. There are two basic approaches to understanding this: a polymeric description and a description by electric charges. The most common polymeric description considers DNA as a single one-dimensional strand, explaining DNA denaturation semi-analytically [17–19]. The literature also reports the mechanical properties of chromosomal fibers and the long-range nucleotide interaction due to DNA loops, histones, and zinc finger proteins, using Monte Carlo simulation [10, 20–22].

Since electrons play a crucial role in the DNA-protein interaction, we must also consider DNA from the point of view of its electric charge distribution. The electronic nature of DNA is still under debate. The double helices behave as insulators or conductors under silver deposition [23], material contaminants [24, 25], and other environmental conditions [26]. However, when the conductivity is measured in atmosphere, low vacuum, or Tris-HCl buffers, DNA has semiconductor features, with the typical gap between the valence and conduction bands in the electronic density of states (DOS) [26–31]. In order to describe this behavior, ionization models (also known as ballistic, polaron, or wire-like charge transport) have been proposed [32–36]. The parameters in the ionization models are easily measured, since one just needs to evaluate the energy lost when one electron is removed from a neutral molecule. The lost electron is usually in the highest occupied molecular orbital (HOMO) of the valence band, and it may easily jump to the lowest unoccupied molecular orbital (LUMO) in the conduction band. The literature also suggests electronic affinity models, where the energy is described by the gain of electrons in neutral molecules [37–41]. These theoretical results usually combine density functional theory and molecular dynamics simulation.

In 2007, Zhu et al. combined the molecular ionization and affinity approaches [1]. This adaptation of the Peyrard-Bishop DNA melting model [17] describes the nucleotide sequence through its semiconductor features, avoiding the heavy computational cost of ab initio molecular dynamics simulations. Their work made it possible to spot the electronic local density of states (LDOS) in one viral P5 promoter sequence, connecting the LDOS with one specific biological function [1, 16]. Unfortunately, they did not search methodically for patterns in many sequences. Nor did they look for the gap between HOMO and LUMO in (C)_n that one expects from the experimental data [30, 42].

Our work begins at this point. We fix the problem of the HOMO-LUMO gap in the model of Zhu et al. by introducing the extended ladder into the model, as suggested by Senthilkumar et al. [36, 43–45]. We consider the π-orbital of the nucleotide in our model instead of the interstrand hydrogen bond between base pairs used by Zhu et al. [1]. We also systematically analyze the DNA-protein binding sites for two transcription factor proteins: the human specificity protein transcription factor 1 (SP1) [46] and the early growth response factor 1 (EGR1, also known as Zif268) [47], both localized in the promoter of a great variety of genes and characterized by a molecular structure called the Cys₂His₂ zinc finger (ZF). The descriptions of ZFs as well as SP1 and EGR1 are in the Appendix of Supplementary Material. Finally, we report an electronic distribution pattern for SP1 and EGR1 binding sites. The reported motifs do not use scores that weight the nucleotide sequence similarity, as in bioinformatics [48]; instead, they present the resemblance of the electronic cloud positions between the nucleotide sequences.

The paper is organized as follows. First, we discuss the selection criteria for the GenBank files and the procedure for nucleotide alignments in the Materials section. Then, we describe the extended ladder model as a method for computing the electronic clouds associated with nucleotides. We test the model by studying the electronic behavior of (C)_n and (T)_n sequences. After this, we systematically analyze the SP1 and EGR1 binding sequences and report strand dependence and independence (see Section Results and Discussion).

Materials

We use the DNA sequence from the human reference map, annotation release 106 (build GRCh38/hg38) [49]. The criteria for selecting the binding sites in this work are as follows. The binding sites must have experimental confirmation in vitro. We remark that we usually observe many single nucleotide polymorphisms (SNPs) between the reference map and the reported experimental samples, because the reference map is basically a consensus sequence from nine individuals [50], while the samples in experimental binding site data belong to one individual. Nested binding sites are a common occurrence, but we try to avoid overlapping binding sites in order to simplify the search for an electronic motif. The binding site of the transcription factors is in the promoter, a region between 500 and 2000 bp distant from the beginning of the gene. We spot similar SP1 and EGR1 binding sites, the TATA box, and other structures reported in individual samples in the GenBank reference map as well as in databanks such as the Eukaryotic Promoter Database for SP1 and EGR1 [51–53]. Finally, we use the nucleotide sequences in FASTA and GenBank flat file format, since nucleotides are just nucleosides with the phosphate group.

We select 16 binding sites in 10 different genes, see Table 2. Details about the selected files are in the Appendix of Supplementary Material.

The Method: The Extended Ladder DNA Model

In this paper, we consider one double DNA chain with n base pairs, totaling 2n nucleotides, Figure 1D. In reality, our model does not consider nucleotides but nucleosides, i.e., nucleotides without the phosphate group. However, we call nucleosides nucleotides in this work in order to simplify the nomenclature. The electronic behavior of the spinless free electron of the π-orbital of the nucleotide is given by the same Hamiltonian as in Zhu et al. [1],

\begin{matrix} H = H_{e} + H_{e b} + H_{b} . & (1) \end{matrix}

This Hamiltonian combines elements from the Peyrard-Bishop DNA melting model [17] and charge transport models [32–36]. The first term in Equation (1) is given by,

\begin{matrix} \begin{array}{l} H_{e} = \sum_{i = 1}^{2 n} ϵ_{i} C_{i}^{†} C_{i} + (\sum_{i = 1}^{n - 1} t_{2 i - 1, 2 i + 1} C_{2 i - 1}^{†} C_{2 i + 1} + \\ \sum_{i = 1}^{n - 1} t_{2 i, 2 i + 2} C_{2 i}^{†} C_{2 i + 2} + \sum_{i = 1}^{n - 1} t_{2 i - 1, 2 i} C_{2 i - 1}^{†} C_{2 i} \\ + \sum_{i = 1}^{n - 1} t_{2 i - 2, 2 i + 1} C_{2 i - 2}^{†} C_{2 i + 1}) + H . c . \end{array} & (2) \end{matrix}

where $C_{i}^{†}$ and C_i are the electron creation and annihilation operators at site i, ϵ_i is the on-site ionization energy, n is the number of base pairs, and t_ij is the electron hopping rate between nucleotides i and j. Here, we use the extended ladder, where we duplicate the one-dimensional lattice of Zhu et al. [1] and include the interstrand hopping, Table 1 [36, 43, 45]. The structure of the ladder accounts for the long-distance charge and hole transport along dDNA [43, 54–56]. The second term in Equation (1) represents the coupling between the free electron and the nucleotide displacement field,

\begin{matrix} H_{e b} = α_{v} \sum_{i = 1}^{2 n} y_{i} C_{i}^{†} C_{i} & (3) \end{matrix}

where y_i is the displacement (dark dotted line) of the electronic cloud from the equilibrium in the nucleotide (light dotted line), Figure 1D. The last term H_b represents the interaction of the electron with the nucleotide:

\begin{matrix} H_{b} = \sum_{i = 1}^{2 n} [D_{i} {(e^{- a_{i} y_{i}} - 1)}^{2} + \frac{k_{v}}{2} (1 + ρ e^{- α (y_{i} + y_{i + 1})}) {(y_{i} - y_{i - 1})}^{2}], & (4) \end{matrix}

where D_i and a_i are the parameters of the Morse potential, and k_v is the spring constant of the anharmonic interaction between two contiguous base pairs. ρ and α are the parameters that modify k_v in order to evaluate long-range cooperative electronic behavior [1].
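As a rough illustration of Equation (4), the lattice energy can be evaluated for a given set of displacements {y_i}. The sketch below is not the authors' code. The parameter values are placeholders (the actual choices are in the Appendix of Supplementary Material), the function name is hypothetical, and the treatment of the chain ends, where y_{i−1} or y_{i+1} is undefined, is an assumption.

```python
import math

def lattice_energy(y, D, a, k_v, rho, alpha):
    """Evaluate H_b of Equation (4) for displacements y (one entry per site).

    D and a hold the per-site Morse parameters D_i and a_i.
    Assumption: a stacking term is skipped when it would need a
    neighbor beyond either end of the open chain.
    """
    total = 0.0
    last = len(y) - 1
    for i, y_i in enumerate(y):
        # Morse term: D_i (exp(-a_i y_i) - 1)^2
        total += D[i] * (math.exp(-a[i] * y_i) - 1.0) ** 2
        # Stacking term uses y_{i-1} and y_{i+1}, so only interior sites
        if 0 < i < last:
            stiffness = 1.0 + rho * math.exp(-alpha * (y_i + y[i + 1]))
            total += 0.5 * k_v * stiffness * (y_i - y[i - 1]) ** 2
    return total

# Placeholder parameters, chosen only to exercise the function.
sites = 6
D = [0.04] * sites        # eV (hypothetical)
a = [4.0] * sites         # 1/Angstrom (hypothetical)
flat = [0.0] * sites      # every site at equilibrium
shifted = [-0.1] * sites  # uniform shift: only the Morse term contributes

print(lattice_energy(flat, D, a, k_v=0.03, rho=10.0, alpha=0.35))
print(lattice_energy(shifted, D, a, k_v=0.03, rho=10.0, alpha=0.35))
```

With all displacements at zero both terms vanish, and a uniform shift leaves the stacking term at zero because neighboring displacements are equal.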

Table 1. Hopping rates in eV for the extended ladder model reported in Sarmento et al. [36], Senthilkumar et al. [43], Zilly et al. [45].

Figure 1. (A) Spatial structure of the three zinc fingers of EGR1 (blue) embracing the DNA major groove (orange). The zinc ions are in black and the DNA-protein bindings of the second zinc finger are in red. (B,C) are the DNA binding sites and amino acid sequences for the three zinc fingers of EGR1 [67] and SP1 [5] (1ZF to 3ZF). Solid red lines indicate the binding between one particular nucleotide and its corresponding amino acid. The dotted black lines in (A–C) are hydrogen bonds that stabilize the first G-R or G-K bonds in each zinc finger. When the valence band is filled, n_e = n, the nucleotides in yellow are those with 100% probability of electronic presence, while the holes are in gray. The negatively charged amino acids with weak (threonine, T) or strong acid properties (glutamic acid, E) are indicated in yellow too. The positively charged basic arginine (R), histidine (H), and lysine (K) as well as the protein-binding cytosines are indicated in gray. (D) The diagram for the DNA extended ladder model. The light dotted line is the electronic equilibrium radius for the Morse potential. The dark dotted line is the field displacement y_i. The dashed-dotted lines are the electronic clouds of the purines (adenine and guanine) with n_i = 1.0 and n_e = n. The dashed lines are the interstrand electronic hopping. The solid lines are the sugar phosphate backbones.

We study the electronic parts H_e and H_eb of the Hamiltonian in Equation (1) by computing the eigenvalues E_k and eigenvectors $ϕ_{i}^{k}$ , i, k = 1, …, 2n, of the matrix

\begin{matrix} H_{e + e b} = | \begin{matrix} ϵ_{1} + α_{v} y_{1} & t_{1, 2} & t_{1, 3} & t_{1, 4} & 0 & 0 & 0 & \dots \\ t_{2, 1} & ϵ_{2} + α_{v} y_{2} & t_{2, 3} & t_{2, 4} & 0 & 0 & 0 & \dots \\ t_{3, 1} & t_{3, 2} & ϵ_{3} + α_{v} y_{3} & t_{3, 4} & t_{3, 5} & t_{3, 6} & 0 & \dots \\ t_{4, 1} & t_{4, 2} & t_{4, 3} & ϵ_{4} + α_{v} y_{4} & t_{4, 5} & t_{4, 6} & 0 & \dots \\ 0 & 0 & t_{5, 3} & t_{5, 4} & ϵ_{5} + α_{v} y_{5} & t_{5, 6} & \dots & \dots \\ 0 & 0 & t_{6, 3} & t_{6, 4} & t_{6, 5} & ϵ_{6} + α_{v} y_{6} & \dots & \dots \\ 0 & 0 & 0 & 0 & ⋮ & ⋮ & ⋱ & ⋱ \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋱ & ⋱ \end{matrix} | . & (5) \end{matrix}

This matrix is similar to the one suggested in Sarmento et al. [36, 45], except for the electron base component H_eb.
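The band structure of the matrix in Equation (5) can be made concrete with a short sketch. It assumes, reading off the nonzero entries of the matrix, that sites 2i−1 and 2i form base pair i and that each base pair couples to the next one through two intrastrand and two diagonal interstrand hoppings. For brevity it uses one value per hopping type, whereas the model takes sequence-dependent rates from Table 1. The function name and all numbers are hypothetical.

```python
def ladder_matrix(eps, y, alpha_v, t_pair, t_strand1, t_strand2, t_diag):
    """Fill the 2n x 2n matrix of Equation (5) as nested lists.

    Indexing is 0-based here, so base pair b occupies rows 2b and 2b+1
    (sites 2b+1 and 2b+2 in the 1-based notation of the text).
    """
    size = len(eps)
    n = size // 2
    H = [[0.0] * size for _ in range(size)]

    def couple(i, j, t):
        H[i][j] = t
        H[j][i] = t  # Hermitian partner; real hopping values assumed

    for i in range(size):
        H[i][i] = eps[i] + alpha_v * y[i]   # on-site energy shifted by the displacement
    for b in range(n):
        couple(2 * b, 2 * b + 1, t_pair)            # within a base pair
        if b < n - 1:
            couple(2 * b, 2 * b + 2, t_strand1)     # along the first strand
            couple(2 * b + 1, 2 * b + 3, t_strand2) # along the second strand
            couple(2 * b, 2 * b + 3, t_diag)        # diagonal interstrand
            couple(2 * b + 1, 2 * b + 2, t_diag)    # diagonal interstrand
    return H

# Three base pairs with placeholder values in eV.
H = ladder_matrix(eps=[8.5] * 6, y=[0.0] * 6, alpha_v=1.0,
                  t_pair=-0.05, t_strand1=0.10, t_strand2=0.08, t_diag=0.02)
for row in H:
    print(row)
```

The first row then reads ϵ₁, t₁₂, t₁₃, t₁₄ followed by zeros, which matches the layout of Equation (5).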

In order to estimate y_i, we consider the self-consistency condition, given by

\begin{matrix} < \frac{\partial H_{b}}{\partial y_{i}} + \frac{\partial H_{e b}}{\partial y_{i}} > = 0, & (6) \end{matrix}

where < … > represents the average over the free electrons in the system. The iteration method for solving Equations (5) and (6) is described in Zhu et al. [1], and it consists of the following procedure. Given an initial condition for {y_i}, we diagonalize the matrix in Equation (5) in order to compute the electronic occupation in each site < n_i >, where $n_{i} = \sum_{k = 1}^{n_{e}} | ϕ_{i}^{k} |^{2}$ and n_e is the number of electrons in the system. This set of < n_i > is used in the Langevin equation calculated from Equation (6). We update the values of {y_i} using the fourth-order Runge-Kutta method for the Langevin equation. The new {y_i} set is inserted again in the matrix of Equation (5). We repeat the iteration until we achieve the local minimum of the adiabatic electronic and structural configuration. The computations were done using R with the package deSolve for the Runge-Kutta algorithm [57]. The choices of the model parameters are in the Appendix of Supplementary Material.
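For orientation, the stationarity condition in Equation (6) can be written out explicitly from Equations (3) and (4). The form below is a sketch under two assumptions that are not stated in the text: the ρ-dependent part of the stacking term is dropped (ρ = 0), and site i is an interior site, so that both neighbors exist.

```latex
\begin{aligned}
\left\langle \frac{\partial H_{eb}}{\partial y_i} \right\rangle
  &= \alpha_v \left\langle C_i^{\dagger} C_i \right\rangle
   = \alpha_v \langle n_i \rangle, \\
\left.\frac{\partial H_b}{\partial y_i}\right|_{\rho = 0}
  &= -2 a_i D_i\, e^{-a_i y_i}\left(e^{-a_i y_i} - 1\right)
     + k_v \left(2 y_i - y_{i-1} - y_{i+1}\right), \\
0 &= -2 a_i D_i\, e^{-a_i y_i}\left(e^{-a_i y_i} - 1\right)
     + k_v \left(2 y_i - y_{i-1} - y_{i+1}\right)
     + \alpha_v \langle n_i \rangle .
\end{aligned}
```

If the neighbor coupling is also switched off (k_v = 0), the last line becomes a quadratic equation in $e^{-a_i y_i}$ with the positive root $e^{-a_i y_i} = \tfrac{1}{2}\left[1 + \sqrt{1 + 2 α_v \langle n_i \rangle / (a_i D_i)}\right]$. For positive α_v, a_i, and D_i, an occupied site therefore relaxes to a negative displacement y_i.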

In this work, we estimate the spatial distribution of electrons, the energy levels, and the displacement field only for n_e = n. Thus, the valence band is always filled with electrons and the conduction band is empty. Our model does not have periodic boundary conditions. Hence, the selected regions for our analysis must be large in order to avoid boundary effects. We analyze only nucleotide sequences that lie at least 10 bp away from both the beginning and the end of the sample.
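To make the filling rule concrete, the sketch below diagonalizes a small symmetric matrix and sums |ϕ_i^k|² over the n_e lowest states, following the definition of n_i. It is not the authors' R code. The 4×4 matrix is a made-up two-base-pair example with placeholder energies in eV, and the Jacobi routine is a generic textbook method used here only to keep the example free of external dependencies.

```python
import math

def jacobi_eigh(A, tol=1e-22, max_sweeps=100):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns the eigenvalues in ascending order and the matching
    eigenvectors (one list per eigenvector).
    """
    n = len(A)
    a = [row[:] for row in A]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q of a and v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda k: a[k][k])
    return [a[k][k] for k in order], [[v[i][k] for i in range(n)] for k in order]

def occupations(H, n_e):
    """n_i = sum over the n_e lowest states of |phi_i^k|^2."""
    evals, evecs = jacobi_eigh(H)
    n_i = [sum(evecs[k][i] ** 2 for k in range(n_e)) for i in range(len(H))]
    return evals, n_i

# Hypothetical two-base-pair ladder: sites 0 and 3 have the lower on-site energy.
H = [
    [8.20, -0.05, 0.10, 0.02],
    [-0.05, 9.10, 0.02, 0.08],
    [0.10, 0.02, 9.10, -0.05],
    [0.02, 0.08, -0.05, 8.20],
]
evals, n_i = occupations(H, n_e=2)  # n_e = n = 2 fills the lower half of the spectrum
print(evals)
print(n_i, sum(n_i))
```

In this toy case the two electrons sit almost entirely on the two low-energy sites, and the occupations add up to n_e.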

We apply the proposed model in poly(C)-poly(G) and poly(T)-poly(A) sequences with 63 base pairs in order to understand the behavior of the electrons dispersed along the DNA chain.

According to Mehrez and Anantram [36, 44, 45, 58], we expect a gap in the energy band of the test sequences (C)₆₃ and (T)₆₃, as can be seen in Figure 2A. Although we do not show it in this work, our reproduction of the Zhu et al. computation confirms the gap for the (T)₆₃ sequence [1]. However, we do not find the gap in (C)₆₃ when applying their approach. The gap in (C)₆₃ is absent in their model because they consider the interstrand hydrogen bonds instead of π-orbitals. Returning to our results, the gap between the valence and conduction bands in (T)₆₃ is narrower than in (C)₆₃. Furthermore, the gap of the pure (C)₆₃ sequence can be modulated by introducing one single T at position 32. One HOMO and one LUMO level appear in the gap of the energy band, marked as H and L in Figure 2D. Moreover, the HOMO and LUMO electronic clouds, which are dispersed in pure (C)₆₃ (black lines in Figures 2B,E), become localized at the introduced T (red lines in Figures 2B,E). Thus, thymines and adenines are related to LUMOs, while cytosine and guanine are linked with the localization of the bottom of the occupied molecular orbital (BOMO).

Figure 2. (A) The electronic density of states (DOS) for (C)₆₃ in black lines and (T)₆₃ in red lines. (D) Same as in (A), except that the sequence has one C or T in i = 32. In (D), the BOMO for (T)₆₃ with one replaced C in position 32 is marked as G, and the HOMO and LUMO energy levels for (C)₆₃ with T in i = 32 are indicated by H and L, respectively. The electronic cloud for HOMO $| ϕ_{i}^{HOMO} |^{2}$ (B) and LUMO $| ϕ_{i}^{LUMO} |^{2}$ (E) for (C)₆₃ (black lines) and the same sequence with T in i = 32 (red lines). The electronic cloud for HOMO $| ϕ_{i}^{HOMO} |^{2}$ (C) and BOMO $| ϕ_{i}^{0} |^{2}$ (F) for (T)₆₃ (black lines) and the same sequence with C in position 32 (red lines).

On the other hand, when we substitute one thymine with a cytosine in (T)₆₃, the HOMO electronic cloud is localized at the replaced nucleotide, Figure 2C. Moreover, the eigenvalue related to this electronic state remains in the valence band with a value of 8.05 ± 0.01 eV, G in Figure 2D. Furthermore, the electronic distribution of the BOMO is positioned over the cytosine too, Figure 2F. The CG-rich domains are usually related to BOMO. We do not observe any alterations in the conduction band for (T)₆₃ with and without the replacement.
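The test sequences used above are simple to write down explicitly. The sketch below builds the 63 bp homopolymers with a single substitution at position 32 and interleaves both strands into the 2n sites of the ladder. The site ordering (site 2i−1 on the direct strand, site 2i its partner) is an assumption read off the layout of Equation (5), and the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def homopolymer_with_defect(base, length, defect=None, position=None):
    """Return the direct strand; position is 1-based, as in the text."""
    seq = [base] * length
    if defect is not None:
        seq[position - 1] = defect
    return "".join(seq)

def ladder_sites(direct):
    """Interleave the direct strand with its base-pairing partners.

    Assumed ordering: for base pair i, the direct-strand base comes
    first, followed by its complement on the other strand.
    """
    sites = []
    for base in direct:
        sites.append(base)
        sites.append(COMPLEMENT[base])
    return sites

c63_t32 = homopolymer_with_defect("C", 63, defect="T", position=32)
t63_c32 = homopolymer_with_defect("T", 63, defect="C", position=32)

sites = ladder_sites(c63_t32)
print(len(sites))            # 126 sites for 63 base pairs
print(sites[62], sites[63])  # T A, the substituted pair at position 32
```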

Results and Discussions

The Electronic Density of State in SP1 and EGR1 Transcription Factor

We apply the procedure described in the previous section and we estimate the eigenvalues E_k and eigenvectors $ϕ_{i}^{k}$ of the total Hamiltonian in Equation (1) for the sequences in Table 2. The criteria for the sequence selection as well as the method for nucleotide alignments are described in the Appendix of Supplementary Material. The alignments in Table 2 are in agreement with the consensus sequence in the literature: 5′-ggggcgggg-3′ [5, 59–64] and 5′-gcgggggcg-3′ [59, 60, 65–67] for SP1 and EGR1, respectively.
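For comparison with the electronic alignment, a letter-based search for the two consensus sequences quoted above can be sketched as follows. The function names, the mismatch parameter, and the toy sequence are illustrative and do not come from the paper. Real binding sites, such as those in Table 2, need not match the consensus exactly.

```python
SP1_CONSENSUS = "ggggcgggg"   # 5'-ggggcgggg-3'
EGR1_CONSENSUS = "gcgggggcg"  # 5'-gcgggggcg-3'
_COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_sites(seq, motif, max_mismatch=0):
    """Scan both strands for the motif.

    Returns (1-based start on the given strand, strand, matched window).
    A '-' hit means the reverse complement of the motif was found.
    """
    seq = seq.lower()
    hits = []
    for strand, target in (("+", motif), ("-", reverse_complement(motif))):
        for start in range(len(seq) - len(target) + 1):
            window = seq[start:start + len(target)]
            mismatches = sum(1 for x, y in zip(window, target) if x != y)
            if mismatches <= max_mismatch:
                hits.append((start + 1, strand, window))
    return hits

# Toy sequence, not taken from any gene.
toy = "ttaggggcggggatcgcccccgcta"
print(find_sites(toy, SP1_CONSENSUS))   # [(4, '+', 'ggggcgggg')]
print(find_sites(toy, EGR1_CONSENSUS))  # [(15, '-', 'cgcccccgc')]
```

A search of this kind compares letters only, whereas the alignment proposed in this work compares the positions of the electronic clouds.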

Table 2. Nucleotide alignment for SP1 and EGR1.

Figure 3 shows a typical set of results for the SP1 binding site of the gene MAOB and the EGR1 binding site of the gene EGR1. The nucleotide sequence of MAOB SP1 is in the reverse complementary reading direction and that of EGR1 is in the complementary strand, Figure 3G.

Figure 3. The results for the SP1d and EGR1 binding sites of the MAOB [61] and EGR1 genes [47] are in the left and right columns, respectively. We remark that the EGR1 transcription factor can bind in the promoter of its own gene [47]. (A,B) are the density of states (DOS), where the BOMO, HOMO, and LUMO energy levels of the zinc finger binding site are indicated by G, H, and L. (C,D) are the probability $| ϕ_{i}^{k} |^{2}$ of BOMO (dashed and solid black lines) and HOMO electrons (orange line). The BOMO E₀ in the valence band is degenerate, with values of 8.00 ± 0.01 eV and 7.98 ± 0.02 eV for SP1d and EGR1. E_HOMO is 8.52 ± 0.01 eV in both (C,D). The electronic clouds $| ϕ_{i}^{k} |^{2}$ of the LUMO in the conduction band are in (E) with E_LUMO = 9.28 ± 0.01 eV and (F) with E_LUMO = 9.45 ± 0.01 eV. The nucleotide sequences are given in (G,H), where we underline the 1-3ZF binding consensus sequence in light and dark green lines [5, 47, 59–67]. We remark that the MAOB SP1d is in the reverse complementary direction, while the EGR1 reading sequence is in the complementary strand. The nucleotides with at least 10% probability of finding BOMO electrons are in gray and yellow. The HOMO and LUMO nucleotides with $| ϕ_{i}^{k} |^{2} \geq 0.1$ are in orange and marked with red bordered boxes, respectively. (I,J) are the probability of electronic presence in the direct strand (black) and the complementary strand (red) when the valence band is completely filled, n_e = n. (K,L) are the field displacements y_i in the Morse potential with n_e = n for the direct strand (black) and for the complementary strand (red).

Although we have 2n eigenvalues and eigenvectors, each one related with one of 2n nucleotides of the system, the relevant electrons for the binding sites are those linked with BOMO, HOMO, and LUMO, respectively noted as G, H, and L in the density of states Figures 3A,B.

We start with the analysis of the position of BOMOs, looking at the values of $| ϕ_{i}^{k} |^{2}$ with k close to 1. When we consider n = 50, as in MAOB and EGR1, the analysis of the first eight eigenvectors is usually sufficient to identify the relevant ones. The electronic cloud n_i, 0 ≤ n_i ≤ 1, Equation (1), is strand dependent, but we do not observe any strand-related pattern for individual electrons. Thus, we sum the probabilities of the direct and the complementary strands to find the local electronic cloud. BOMOs can be degenerate over many electrons along the nucleotide sequence, but we focus only on those around the binding sites, yellow and black lines in the valence band $| ϕ_{i}^{k} |^{2}$ , Figures 3C,D. Note that the sum of these two degenerate BOMOs $\sum_{k} | ϕ_{i}^{k} |^{2} δ (E_{0} - E_{k})$ results in the LDOS of the binding site, which is proportional to the differential tunneling conductance [1]. At low temperature, this quantity could be measured by a scanning tunneling microscope (STM) [42]. The zinc fingers of the SP1 and EGR1 transcription factors act as the tips of an STM, scanning binding sites along the DNA chain. Finally, we mark the nucleotides with at least 10% probability of electronic presence in gray and yellow in Figures 3G,H.
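The bookkeeping described above (summing both strands, adding degenerate states, and applying the 10% threshold) can be sketched in a few lines. The eigenvectors below are invented toy vectors for four base pairs, the site ordering (entries 2b and 2b+1 belong to base pair b) is an assumption, and the function names are hypothetical.

```python
def base_pair_probability(phi):
    """Sum |phi_i|^2 over the two strands of each base pair."""
    return [abs(phi[2 * b]) ** 2 + abs(phi[2 * b + 1]) ** 2
            for b in range(len(phi) // 2)]

def degenerate_sum(states):
    """Add the base-pair probabilities of several states at the same energy."""
    per_state = [base_pair_probability(phi) for phi in states]
    return [sum(values) for values in zip(*per_state)]

def marked_positions(prob, threshold=0.1):
    """1-based base-pair positions with probability at or above the threshold."""
    return [b + 1 for b, p in enumerate(prob) if p >= threshold]

# Two normalized toy eigenvectors on 4 base pairs (8 sites).
phi_a = [0.0, 0.0, 0.6, 0.0, 0.0, 0.8, 0.0, 0.0]
phi_b = [0.8, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0]

prob_a = base_pair_probability(phi_a)
print(prob_a, marked_positions(prob_a))   # cloud over base pairs 2 and 3
ldos = degenerate_sum([phi_a, phi_b])
print(ldos, marked_positions(ldos))       # cloud over base pairs 1 to 3
```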

The procedure for localizing the electronic cloud associated with HOMO and LUMO is very similar to the one used for identifying the BOMO probability distributions, except that k for HOMO and LUMO is close to n. In order to find the electronic clouds, we need to consider k from 46 to 50 for HOMO and from 51 to 54 for LUMO when n = 50. The electronic cloud associated with the LUMO is always close to the HOMO, within a maximum distance of ±6 bp. Since the probabilities of finding one HOMO or LUMO electron are strand independent, we add the $| ϕ_{i}^{k} |^{2}$ of the direct and complementary strands in Figures 3C–F. The orange lines in Figures 3C,D and the red lines in Figures 3E,F are HOMO and LUMO, respectively. We can also measure the LDOS of HOMO and LUMO with STM, using the same approach as for BOMOs. The nucleotides with at least 10% probability of electronic presence are denoted by orange and red boxes in Figures 3G,H.

Now we return to Table 2, where all BOMOs are marked in gray and yellow, and the HOMO and LUMO electrons are in orange and red boxes, respectively. Looking at Table 2, the electronic distribution patterns for the binding sites of the SP1 and EGR1 transcription factors are clear.

In the case of SP1, the BOMO clouds are over the first (5′-ggg-3′) and third (5′-ggg-3′) triplets of the consensus sequence, light green in Table 2. These triplets identify the first and third ZF binding positions of SP1. Moreover, the first BOMO electronic cloud extends over 4 to 5 bp, while the second ranges from 2 to 4 bp. The eigenvalues of these BOMOs are 7.98 ± 0.05 eV. The energy level of the HOMO electrons is fixed at 8.52 ± 0.02 eV, and the electronic cloud size spans between 1 and 2 bp. We observe some fluctuation in the LUMO eigenvalue for SP1, which is 9.3 ± 0.1 eV. The LUMO electrons envelop 2–5 base pairs. The HOMO and LUMO associated electrons are always positioned before the first BOMO electron, at positions from −12 to 1.

For EGR1, the first BOMO spans from position 3 to 7 over the second triplet (5′-ggg-3′), and the probability of finding this particular electron spans over 2 or 4 bp. The second triplet is the binding site of the second ZF of the early growth response protein 1. The second BOMO electron is after the second triplet and is dispersed between nucleotide positions 7–15, covering the third triplet. The electronic cloud size ranges from 2 to 4 bp. All BOMO energies in Table 2 are 7.99 ± 0.03 eV. The HOMO and LUMO electronic clouds lie over the second BOMO electron. All E_HOMO values in Table 2 are 8.52 ± 0.01 eV, and the HOMO-related electronic clouds have a length of 1 or 2 base pairs. The LUMO energies fluctuate with an average value of 9.4 ± 0.1 eV. The LUMO electronic cloud size varies from 1 to 6 bp, and the clouds are located at positions from 10 to 20.
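The energies quoted above imply the following level separations. This is plain arithmetic on the reported values. Treating the quoted spreads as independent uncertainties and combining them in quadrature is an assumption made here for illustration, since the spreads describe variation across binding sites rather than measurement errors.

```latex
\begin{aligned}
\text{SP1:}\quad  & E_{LUMO} - E_{HOMO} = 9.3 - 8.52 \approx 0.8\ \text{eV},
  & \sqrt{0.1^{2} + 0.02^{2}} &\approx 0.10\ \text{eV}, \\
                  & E_{HOMO} - E_{BOMO} = 8.52 - 7.98 = 0.54\ \text{eV},
  & \sqrt{0.02^{2} + 0.05^{2}} &\approx 0.05\ \text{eV}, \\
\text{EGR1:}\quad & E_{LUMO} - E_{HOMO} = 9.4 - 8.52 \approx 0.9\ \text{eV},
  & \sqrt{0.1^{2} + 0.01^{2}} &\approx 0.10\ \text{eV}, \\
                  & E_{HOMO} - E_{BOMO} = 8.52 - 7.99 = 0.53\ \text{eV},
  & \sqrt{0.01^{2} + 0.03^{2}} &\approx 0.03\ \text{eV}.
\end{aligned}
```

Under these assumptions, the HOMO-LUMO separations of the two transcription factors agree within the quoted spread, and the BOMO-HOMO separations are nearly identical.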

Considering the HOMO and LUMO distributions, we believe that they may play some role in SP1 and EGR1 binding. These proteins bind DNA by embracing the major groove of the double helix as a guide. In the case of SP1, the head may interact with nucleotides between positions −11 and −1. The behavior for EGR1 is more elusive, because HOMO and LUMO are completely dispersed over nucleotides 5–20. Although the literature emphasizes the similarity between the ZF and nucleotide interactions over the consensus nucleotides [5], the mechanisms of protein attachment in EGR1 and SP1 are not the same [67, 68].

The HOMO and LUMO electronic clouds frequently overlap. Furthermore, the electrons of HOMO and LUMO are always in adenine and thymine rich sequences. The main reason is as follows. When the system is disturbed, the electrons from the HOMO in the valence band should move to the nearby lowest unoccupied molecular orbital in the conduction band. The easiest way for this movement to occur is to place the electron in regions with higher excitability, i.e., AT-rich domains. We may conjecture that this jump of the electron from the HOMO to the LUMO has an as yet unknown role in the transcription factors SP1 and EGR1.

On the other hand, the less mobile electrons are those in the CG-rich domains, since they are at the bottom of the DOS. So, we expect to identify BOMOs in CG-rich sequences instead of AT-rich regions, as we see for (T)₆₃ with one C in position 32, described previously. Furthermore, these BOMOs are degenerate, i.e., all electrons are at the same energy level. Thus, these cytosine and guanine rich regions, typical in promoters, are ideal landmarks for SP1 and EGR1 binding sites. The absence of excitation in the lowest states is vital for ZF transcription factors, because nucleotides with mobile electrons may change the position where the gene reading begins, altering the gene expression. For the eigenvalues between BOMO and HOMO, we do not find any obvious pattern associated with SP1 and EGR1.

We never observe the BOMO electronic cloud and the overlapped LUMO-HOMOs together in the samples in Table 2, using the criterion of a minimum 10% localization probability for one particular electron.

Unfortunately, the findings in this section cannot be compared with accurate quantum chemistry calculations, because the literature reports results for just a few nucleotides [37–41]. Systematic analysis of at least 26 nucleotides is usually avoided, because including hydrogen bonds and sugar residues in the description of DNA-protein interactions demands huge processing time. On the other hand, the ionization model literature also does not methodically study particular nucleotide sequences, indicating only the viability of the computation in artificial sequences and a few contiguous sequences (contigs) [1, 32–36].

The Collective Electronic Behavior

The electronic probabilities $| ϕ_{i}^{k} |^{2}$ of individual electrons, discussed in the previous section, are strand independent. However, the collective electronic probabilities n_i and the field displacement y_i depend on the strand.

When we have n_e = n electrons, we fill only the valence band. In all analyzed sequences, Table 2, we usually observe 100% probability of electronic presence in purines (adenine or guanine) and the absence of an electron (hole) in pyrimidines (thymine or cytosine), in agreement with the DFT analysis [41]. Figures 3I,J show the probabilities n_i of finding one electron in one nucleotide for the MAOB SP1d and EGR1 binding site sequences. The electronic presence in each purine gives us a new biological interpretation of the contributions of Peng et al. [11, 12]. Using exon and intron rich segments of the eukaryotic genome, they construct a DNA walk using purine and pyrimidine as the criteria for steps. They then report a self-affine fractal in the walk, showing long-range correlation in the purine and pyrimidine distribution. When we look at Figures 3I,J, purines and pyrimidines reflect the electronic distribution along the DNA chain. This electronic distribution is related to the BOMO, HOMO, and LUMO distributions, which work as ZF binding sites, for example. It is important to stress that they report a self-affine fractal, not a self-similar fractal. Self-similarity is related to palindromic sequences, connected with DNA-loop structures such as tRNA and rRNA [69, 70], while self-affinity is related to introns [11]. Furthermore, the evidence of polynomial long-range nucleotide interaction is also supported by Oiwa and Goldman [13, 14]. In those works, the long contiguous sequences are represented by a sequence of 0 and 1 for noncoding and coding nucleotides, where noncoding nucleotides are intergenic regions and introns, and coding nucleotides are genes and regions for metabolic control. They perform an auto-correlation analysis over the binary sequence and report correlation between two coding nucleotides at least 20 thousand bp apart.
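The link between the purine/pyrimidine walk and the filled-band occupation can be illustrated with a minimal sketch. The step convention (down for a purine, up for a pyrimidine) is an assumption about the walk of Peng et al., and the occupation function encodes only the idealized pattern described above (n_i = 1 on purines, 0 on pyrimidines), not the output of the model. Function names are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")

def dna_walk(seq, purine_step=-1, pyrimidine_step=1):
    """Cumulative purine/pyrimidine walk along one strand."""
    position = 0
    walk = []
    for base in seq.upper():
        if base in PURINES:
            position += purine_step
        elif base in PYRIMIDINES:
            position += pyrimidine_step
        else:
            raise ValueError(f"unexpected symbol: {base!r}")
        walk.append(position)
    return walk

def ideal_occupation(seq):
    """Idealized filled-valence-band pattern: 1 on purines, 0 on pyrimidines."""
    return [1 if base in PURINES else 0 for base in seq.upper()]

seq = "GGCT"
print(dna_walk(seq))          # [-1, -2, -1, 0]
print(ideal_occupation(seq))  # [1, 1, 0, 0]
```

With this convention each step of the walk equals 1 − 2 n_i for the idealized occupation, so the walk is a running sum over the holes and electrons along the strand.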

The existence of long-range correlations has another consequence for the model. The second term in Equation (4), the stacking interaction between adjacent base pairs, mimics the bending of DNA as a polymer. However, the short-range term $ρ e^{- α (y_{i} + y_{i + 1})}$ in Equation (4) does not contribute to the electronic behavior. This term has an energy of the order of 10⁻⁴ eV when we consider typical values for the parameters: y_i ~ −0.1 Å, ρ ~ 10, and α ~ 0.35 Å⁻¹. On the other hand, the Morse potential is of the order of 10⁻² eV. The stacking interaction will be relevant only if we consider ρ > 100, but such high experimental values for ρ are unlikely. This short-range element of the model comes from the DNA melting problem, where the interstrand binding of the double helix may open [17]. In that case, the short-range element is important, since it is easier to open the dDNA when the neighboring bp is already open. Moreover, in the DNA melting model y_i represents the displacement field of the electronic cloud of the hydrogen bonds between nucleotides, whereas here we change the concept of y_i to the π-orbital of the nucleotide. So, the short-range part in Equation (4) is no longer relevant. In order to simplify the model, one might eliminate the harmonic oscillator in Equation (4) too. However, the harmonic oscillator is important for describing the stacking interaction in the Langevin equation, Equation (6). On the other hand, the bending and the torsion of the double chain have an influence over the DNA chain [71], but this behavior cannot be explained by Equation (4), because we have just short-range exponential interactions and a harmonic oscillator between two neighboring base pairs. The missing long-range element in Equation (4) is the subject of further research. Finally, we do not observe the presence of the electron in purine sequences with one replaced pyrimidine or vice versa: (T)₆₃ with one C in i = 32 or (C)₆₃ with one T in i = 32. The presence or absence of electrons depends on the neighboring base pairs.

The presence of electrons in purines has a profound impact on the ZF binding. We show the consensus nucleotide sequence and the core zinc finger binding amino acids in Figures 1B,C for EGR1 and SP1, respectively.

The EGR1 amino acid sequence is available in the Universal Protein Resource databank (UniProt) under the accession code P18146 [72]. The three zinc fingers of human EGR1 are located at positions 338–362, 368–390, and 396–418 of the 543 amino acid long sequence [72]. The dotted lines in Figure 1B are the hydrogen bonds between aspartic acid (Figure 1D) and adenine or cytosine, which stabilize the first guanine-arginine (R) hydrogen bond of the ZF. The positively charged basic amino acids arginine (R), histidine (H), and lysine (K), which are responsible for the DNA-protein binding, are highlighted in gray, while the negatively charged weak acid threonine (T) and strong acid glutamic acid (E) are in yellow. Each red line in Figure 1B is the binding of one particular nucleotide to its oppositely charged amino acid in the core zinc finger segment of EGR1 [67].

The 785 amino acid long SP1 transcription factor, UniProt accession code P08047, has three tandem ZFs at positions 626–650, 656–680, and 686–708 [46, 72]. The dotted lines are the hydrogen bonds that stabilize the first ZF guanine-arginine (R) or guanine-lysine (K) bonds. The bonds between the core ZF amino acids and the corresponding nucleotides are indicated by red lines in Figure 1C [5]. Again, each nucleotide is connected to an oppositely charged amino acid.

Concerning the electrical charges of the SP1 zinc finger tips, i.e., the middle amino acids that bond with the middle nucleotide of each triplet, there is one motif associated with the positions of the BOMOs described in the previous section. The pattern of positive and negative charges along the nucleotide sequence coincides with the positions of the three ZF tips. Since the BOMOs are the most stable electrons, they guide the fingers and act as holders that fix SP1 to the dDNA. We observe the same phenomenon for EGR1.

When we compare the strand-independent analysis of the previous section with the strand-dependent electronic behavior, the presence of a BOMO in the complementary strand 3′-ccc-5′ at EGR1 in Figure 3H may seem to be a contradiction. In fact, the BOMO in this case is on the direct strand 5′-ggg-3′. It only appears to be on 3′-ccc-5′ because, in the previous section, we summed the electronic clouds of the direct and complementary strands when seeking the electronic motif of the BOMO.

The collective probabilities n_i are not the only strand-dependent variable in SP1 and EGR1. The field displacement y_i of the Morse potential is also strand dependent. The electronic cloud y_i, given by the Morse potential in Equation (4), usually contracts when n_i = 1, i.e., in the presence of purines. The contraction of the electronic cloud is more intense in adenine (y_i = −0.125 ± 0.001) than in guanine (y_i = −0.114 ± 0.001). A simultaneous measurement of the size of the electronic cloud y_i on the direct and complementary strands mirrors the nucleotide order and may lead to a new sequencing method.
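
The idea of reading the nucleotide order from the two strand-resolved displacements can be illustrated with a small sketch. It is not a procedure from this work: the function names are hypothetical, and it assumes that at every base pair the strand carrying the purine shows the more negative y_i, and that the adenine and guanine values quoted above (−0.125 and −0.114) are the only reference levels needed. The pyrimidine values in the usage example are made-up placeholders, and the caveat about single substituted nucleotides noted elsewhere in the text is ignored.

```python
Y_ADENINE = -0.125  # contraction quoted in the text for adenine
Y_GUANINE = -0.114  # contraction quoted in the text for guanine
PARTNER = {"A": "T", "G": "C"}  # Watson-Crick partner of each purine


def call_base(y_direct, y_complementary):
    """Infer the direct-strand nucleotide at one base pair from the two displacements.

    Assumption: the strand with the more negative displacement hosts the purine.
    Ties are resolved arbitrarily in favor of the complementary strand.
    """
    if y_direct < y_complementary:
        purine_y, purine_on_direct = y_direct, True
    else:
        purine_y, purine_on_direct = y_complementary, False
    # Pick the purine whose reference contraction is closest to the measured one.
    if abs(purine_y - Y_ADENINE) < abs(purine_y - Y_GUANINE):
        purine = "A"
    else:
        purine = "G"
    return purine if purine_on_direct else PARTNER[purine]


def call_sequence(y_direct, y_complementary):
    """Infer the direct-strand sequence from two equally long displacement profiles."""
    if len(y_direct) != len(y_complementary):
        raise ValueError("both strands must have the same number of sites")
    return "".join(call_base(d, c) for d, c in zip(y_direct, y_complementary))


if __name__ == "__main__":
    # Placeholder profiles: purine sites use the quoted values, pyrimidine sites are invented.
    direct = [-0.114, -0.02, -0.125, -0.01]
    complementary = [-0.03, -0.114, -0.02, -0.125]
    print(call_sequence(direct, complementary))  # GCAT
```

The nearest-reference rule is a design choice made for brevity; with the quoted uncertainties of ±0.001, the two purine levels are well separated, but real measurements would need a proper error model.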

The consensus sequences, the light and dark green lines in Figures 3G,H, are reflected in y_i and n_i (Figures 3I–L). We usually observe the absence of the electronic cloud at the middle cytosine of the direct strand of the SP1 and EGR1 binding sites (black lines with circles in Figures 3I,K) and the opposite behavior in the complementary strand (red lines with plus signs in Figures 3I,K). However, we should be cautious: as mentioned before for n_i, this does not hold for purine sequences with one replaced pyrimidine or vice versa.

Conclusion

Since zinc fingers (ZFs) interact at the π-orbitals of nucleotides, we do not expect that hydrogen bonds have a relevant role in ZF-DNA binding as proposed by Zhu et al. [1]. We extend their model by allowing electronic movement along the nucleotides, as in charge transport theory [32–36], and by introducing the gap in the (C)₆₃ sequence, as expected from the experiments [30, 42]. Furthermore, we identified a typical motif for the probability distribution function of BOMO, HOMO, and LUMO for the nucleotide π-orbital along a dDNA at the binding sites of SP1 and EGR1. BOMO, HOMO, and LUMO show an electronic motif for SP1 and EGR1 binding sites that is compatible with the consensus multiple alignments. Thus, the extended ladder model may replace the nucleotide alignment methods based on scores [48]. In the case of SP1, there is one BOMO in the first and another in the third zinc finger binding site, and the HOMO and LUMO positions are before the consensus sequence. For EGR1, the first BOMO is distributed over the second zinc finger binding position and the second BOMO lies after the consensus sequence. The HOMO and LUMO are over the second BOMO. The BOMOs are degenerate, with energies of 7.98±0.05 and 7.99±0.03 eV for SP1 and EGR1, respectively. The HOMO eigenvalues are 8.52±0.02 eV (SP1) and 8.52±0.01 eV (EGR1). The LUMO energy levels are 9.3±0.1 eV (SP1) and 9.4±0.1 eV (EGR1).

When the valence band is filled, we observe a 100% probability of electronic presence in purines (adenine and guanine) and its absence in pyrimidines (thymine and cytosine). Furthermore, the sequence of electrons and holes coincides with the basicity and acidity of the DNA-protein binding amino acids in the zinc fingers. In particular, the sequence of positive and negative charges of the tips of SP1 and EGR1 coincides with the BOMO cloud distribution. The collective electronic behavior of the DNA chain with a filled valence band results in a sequence of electronic clouds around the purine π-orbitals (dashed-dotted lines in Figure 1D).

The Morse potential is the key component of the electronic behavior in the double helix DNA chain in the extended ladder model proposed here. The stacking interaction between adjacent base pairs in the model of Zhu et al. [1], however, has limited influence on the results, since this interaction is short-ranged.

The method described in this paper simplifies the search for DNA-protein binding sites, because it does not require a score weighting system as in the traditional bioinformatics approach to nucleotide alignment. We do not need to worry about gaps in the nucleotide alignment, since we are not looking for the letters A, T, C, and G, but for the presence of electrons or holes, as in charge transport. In this paper we apply the method to SP1 and EGR1, but the technique is suitable for any nucleotide sequence, e.g., TATA boxes, the CCCTC transcription factor, etc.
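
A deliberately simplified sketch can show what searching for electrons and holes instead of letters looks like in practice. It is not the method of this paper: it uses only the filled-valence-band rule (electron at a purine, hole at a pyrimidine) and does not compute the BOMO, HOMO, or LUMO distributions, which require solving the extended ladder model. The function names and the example strings are hypothetical.

```python
def occupancy_pattern(seq):
    """Translate a nucleotide string into 'e' (electron, purine) and 'h' (hole, pyrimidine)."""
    table = {"A": "e", "G": "e", "C": "h", "T": "h"}
    try:
        return "".join(table[base] for base in seq.upper())
    except KeyError as err:
        raise ValueError(f"unexpected symbol: {err.args[0]}") from None


def find_pattern_sites(target, motif):
    """Return every start index where the target's electron/hole pattern equals the motif's.

    No scoring matrix or gap penalty is involved: two windows match when
    their electron/hole strings are identical.
    """
    wanted = occupancy_pattern(motif)
    pattern = occupancy_pattern(target)
    width = len(wanted)
    return [
        start
        for start in range(len(pattern) - width + 1)
        if pattern[start:start + width] == wanted
    ]


if __name__ == "__main__":
    motif = "GGGGCGGGG"         # illustrative motif string
    target = "TTAGGGGCGGGGCTT"  # illustrative target string
    print(find_pattern_sites(target, motif))       # [3]
    print(find_pattern_sites("AGAATAAGA", motif))  # [0]
```

The second call shows the limitation of this reduced picture: different letter sequences share the same electron/hole string, so the occupancy alone cannot distinguish them, and the orbital distributions discussed in the text would still be needed.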

Author Contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors thank Lei Liu for the discussions about zinc fingers and for kindly providing us with Figure 1A. This work is supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), process number 248589/2013, Brazil.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/article/10.3389/fphy.2016.00013

References

1. Zhu J-X, Rasmussen KØ, Balatsky AV, Bishop AR. Local electronic structure in the Peyrard-Bishop-Holstein model. J Phys Condens Matter (2007) 19:136203. doi: 10.1088/0953-8984/19/13/136203

2. Razin SV, Borunova VV, Maksimenko OG, Kantidze OL. Cys₂His₂ zinc finger protein family: classification, functions, and major members. Biochemistry (2012) 77:217–26. doi: 10.1134/s0006297912030017

3. Kim TH, Abdullaev ZK, Smith AD, Ching KA, Loukinov DI, Green RD, et al. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell (2007) 128:1231–45. doi: 10.1016/j.cell.2006.12.048

4. Chen H, Tian Y, Shu W, Bo X, Wang S. Comprehensive identification and annotation of Cell Type-Specific and Ubiquitous CTCF-binding sites in the human genome. PLoS One (2012) 7:e41374. doi: 10.1371/journal.pone.0041374

5. Klug A. The discovery of zinc fingers and their applications in gene regulation and genome manipulation. Ann Rev Biochem. (2010) 79:213–31. doi: 10.1146/annurev-biochem-010909-095056

6. Ong C-T, Corces VG. CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet. (2014) 15:234–46. doi: 10.1038/nrg3663

7. Ling JQ, Li T, Hu JF, Vu TH, Chen HL, Qiu XW, et al. CTCF mediates interchromosomal colocalization between Igf2/H19 and Wsb1/Nf1. Science (2006) 312:269–72. doi: 10.1126/science.1123191
