The 1-Particle-per-k-Nucleotides (1PkN) Elastic Network Model of DNA Dynamics with Sequence-Dependent Geometry

Coarse-grained models of DNA have made important contributions to the determination of the physical properties of genomic DNA, working as a molecular machine for gene regulation. In this study, to analyze the global dynamics of long DNA sequences with consideration of sequence-dependent geometry, we propose elastic network models of DNA where each particle represents k nucleotides (1-particle-per-k-nucleotides, 1PkN). The models were adjusted according to profiles of the anisotropic fluctuations obtained from our previous 1-particle-per-1-nucleotide (1P1N) model, which was proven to reproduce such profiles of all-atom models. We confirmed that the 1P3N and 1P4N models are suitable for the analysis of detailed dynamics such as local twisting motion. The models are intended for the analysis of large structures, e.g., 10-nm fibers in the nucleus, and nucleoids of mitochondrial or phage DNA at low computational costs. As an example, we surveyed the physical characteristics of the whole mitochondrial human and Plasmodium falciparum genomes.


INTRODUCTION
Genomic DNA plays many functional roles such as a medium for the storage of genetic information and as a molecular machine involved in gene expression regulation. A recent genome-wide analysis of the intra-nuclear chromosomes of eukaryotes suggested that not only chemical but also sequence-dependent physical properties of DNA predominantly contribute to formation of the genomic structure and gene expression regulation (Shrader and Crothers, 1989;Cacchione et al., 1997;Lowary and Widom, 1998;Widlund et al., 1999;Geggier and Vologodskii, 2010). Specifically, sequence-dependent structures play a key role in the determination of functional properties such as nucleosome positioning in DNA (Freeman et al., 2014b;Awazu, 2017).
Genomic DNA in prokaryotes, viruses, and mitochondria forms compact structures called nucleoids. The structures are shaped through the bending and twisting of DNA, mediated by histone-like proteins such as mitochondrial transcription factor A (TFAM) (Alam et al., 2003;Pohjoismäki et al., 2006;Kaufman et al., 2007;Ngo et al., 2014) and HU proteins (Broyles and Pettijohn, 1986;Kobryn et al., 1999;Guo and Adhya, 2007), with the help of topoisomerases (Brown et al., 1979;Wasserman and Cozzarelli, 1991;Nenortas et al., 1998;Wang, 2002). Each nucleoid generally contains negative supercoils of DNA, which notably enhance the transcription and replication of genetic loci (Bogenhagen et al., 2014;Farge et al., 2014;Kukat et al., 2015). This observation indicates that the physical processes involving DNA, such as the formation and deformation of nucleoids, perform important functions in the regulation of gene expression in prokaryotes.
Coarse-grained (CG) models of DNA have facilitated investigations of the dynamics of relatively large DNA molecules for long time scales at reduced computational costs. Recently proposed CG models can accurately reproduce some of the properties observed in experiments (Freeman et al., 2014a;Hinckley and de Pablo, 2015;Markegard et al., 2015;Snodin et al., 2015;Singh and Granek, 2016) or all-atom models (Savelyev and Papoian, 2010;Setny and Zacharias, 2013;Naômé et al., 2014), which also depend on solution conditions. However, due to the large degrees of freedom, the computational cost of these models is still not low enough to be applicable to long DNA sequences containing more than 10 4 base pairs, corresponding to a 10-nm fiber with 100 nucleosomes or a mitochondrial nucleoid.
Recently, we devised a simple CG elastic network model, in which each nucleotide is described by one particle, named the 1-particle-per-1-nucleotide (1P1N) model. This model incorporates a sequence-dependent basic structure, and can reproduce the dynamic behavior around the basic structure observed in all-atom models (Isami et al., 2015). In this paper, to further reduce the computational cost, we propose a 1-particleper-k-nucleotides (1PkN) model, i.e., a CG elastic network model where one particle represents k nucleotides. To construct feasible 1PkN models for k > 1, we designed interaction networks to reproduce the behavior of the 1P1N model, and determined appropriate k values. We found that the 1PkN models with k < 13 could sufficiently describe the global fluctuation of DNA, whereas the 1P3N and 1P4N models are also applicable to detailed analyses of local fluctuations.
The 1PkN models can deal with sequence-dependent behavior at a more than 10 times reduced computational cost than required for the previous CG models. As a demonstration, we constructed 1P3N models of the entire Plasmodium falciparum and human mitochondrial genomes (5,967 and 16,569 bp, respectively), and analyzed their physical properties affecting local bending and twisting, which are important for the formation of nucleoid structures.

The 1P1N Coarse-Grained (CG) Model
First, we briefly review our previous CG elastic network model of double-stranded DNA, where each nucleotide is represented by one particle, namely the 1P1N model (Isami et al., 2015).
Similar to other elastic network models, this model depends on a basic structure. To consider sequence-dependent geometry, we constructed a basic structure according to a set of helical parameters (base-step and base-pair parameters) depending on the local nucleotide sequences (typically two consecutive basepairs). These parameters have been determined by experiments and molecular dynamics simulations (Olson et al., 1998(Olson et al., , 2006Lankaš et al., 2003;Perez et al., 2008;Morozov et al., 2009;Lavery et al., 2010;Dans et al., 2012;Hospital et al., 2013). Throughout this study, we employed the helical parameters obtained in in vitro experiments and from X-ray crystal structure analysis (Table  S1) (Olson et al., 1998(Olson et al., , 2006Lankaš et al., 2003;Morozov et al., 2009;Freeman et al., 2014a), which were also used in other recent CG DNA models (Freeman et al., 2014a;Isami et al., 2015). Qualitatively similar results were obtained even if different helical parameter sets were applied (Perez et al., 2008;Lavery et al., 2010;Dans et al., 2012;Hospital et al., 2013) for all analyses described in this paper (data not shown). We used 3DNA (Lu and Olson, 2003) to obtain the coordinate of each atom in the DNA, on the basis of the helical parameters (for detailed methods on the generation of the atom coordinates, see Lu and Olson, 2003).
In the 1P1N model, x i = (x i , y i , z i ), the position of nucleotide i (i = 0, 1, 2, ...), is given as the coordinate of the C1' carbon of that nucleotide, and m i , the mass of nucleotide i, is considered to be equal (m i = m). The potential V of the interaction between nucleotides is given as where x 0 i is the position of nucleotide i in the basic structure. We assumed that nucleotides i and i c form a base pair, i.e., the superscript c indicates the corresponding nucleotide in the complementary strand. B 1 ij = 1 for j = i ± 2, i ± 1, (i ± 3) c , (i ± 2) c , (i ± 1) c , and i c ; and B ij = 0 otherwise. C 1 is 7.7 kJ/(Å 2 mol) (by rescaling m = 10 −3 /N A [kg]). Using normal-mode analysis (NMA), we confirmed that this model can accurately reproduce the dynamic parameters of each nucleotide obtained by means of the all-atom model, for the several types of helical parameters mentioned above (Isami et al., 2015).

The 1PkN Coarse-Grained (CG) Model
In this study, we aimed to construct further CG models intended for the analysis of long DNA. Here, we present 1PkN models, where k nucleotides are represented by a single particle. If we were to directly compare such models with the corresponding all-atom models, the computational cost of the analysis would be huge (e.g., a 450-bp model, considered below for the 1P9N case, involves 18,450 atoms), as we would need to assess a statistically sufficient number of sequences. Therefore, we used the 1P1N model as a reference to design and validate these new models.
To construct the 1PkN model, each k particle (i.e., at the center of consecutive k bases) in the 1P1N model was extracted. Figure 1 shows the cases for k = 4 and k = 5. More specifically, we assumed a 1PkN model of L bp, consisting of 2 × [ L k ] particles (where [ ] indicates the Gauss symbol). The three-dimensional coordinates q n , q n c and mass m n , m n c of the n-th (n = 0, 1, 2, · · · ) and n c -th particles in the 1PkN model were expressed as those of the i-th and i c -th particles (i.e., nucleotides) in the 1P1N model (i.e., x i , x i c , m i , and m i c ), respectively, where i = kn + k 0 and i c = (kn + k 0 ) c (the n-th and n c -th particles form a base pair). Here, k 0 = k−2 2 (k: even) or k−1 2 (k: odd) (Figure 1). Then, we define the interaction potential V between the particles as where q 0 n is the position of particle n in the basic structure, B k nl indicates the weight of the connection between particles n and l (Figure 2), and C k is the connection strength. We determined the appropriate k, C k , and B k nl values as shown below. As mentioned above, the basic structure was constructed according to a set of sequence-dependent helical parameters. Both the 1P1N model and 1PkN (k > 1) models incorporate sequence-dependent physical properties through the interaction potential V that depends on the basic structure (x 0 i or their subset q 0 n ).

Normal-Mode Analysis (NMA)
To optimize and evaluate the 1PkN model, we compared the structural fluctuations in the 1PkN model with those of the 1P1N model. For this purpose, NMA was used to estimate the mobility of each particle in the 1PkN model. NMA has been applied to elastic network models and is demonstrated to well reproduce the structural properties of biomolecules such as proteins (Tirion, 1996;Atilgan et al., 2001;Bahar and Rader, 2005;Yang et al., 2009;Bahar et al., 2009Bahar et al., , 2010Dykeman and Sankey, 2010). Here, we provide a brief overview of NMA. We defined q(t) (q = (q 0 , q 1 , · · · , q N−1 ), q n = (x n , y n , z n )) as a 3N-dimensional position vector, and q 0 as the position vector of the basic structure. Using linear approximation, small deviations δq(t) = q − q 0 around the basic structure obey the following equation: where , and i |v k i | 2 = 1) represent the k-th largest eigenvalue and its eigenvector of the 3N × 3N Hessian matrix H, where We presumed that the system is in thermodynamic equilibrium at temperature T. Then, the amplitude A k is expressed as with the Boltzmann constant k B . Using this solution, the mean square fluctuation (MSF; i.e., the strength of thermal fluctuation) of the n-th particle and the correlation of motion between particles n and l were obtained as F a n =< |δq n | 2 >= and respectively, where < ··· > represents the temporal average. Similarly, for the 1P1N model, the MSF G a i =< |δq i | 2 > and the correlation of motion G c ij =< δq i · δq j > /( G a i G a j ) were calculated, where the i-th and j-th particles in the 1P1N model correspond to the n-th and l-th particles in the 1PkN model, respectively.
Note that the particles at the ends of the 1PkN model do not correspond to those at the ends of the 1P1N model. Furthermore, the behavior of the particles at the ends of these models often become specific because there are fewer interacting particles at the ends than in the middle. Therefore, in the estimation of fluctuations, we omitted δq n for four particles (i.e., two particle pairs) at each end.

Determination of the Parameters in the 1PkN Model
The 1PkN model above includes the free parameters B k nl and C k . We optimized these parameters to fit the overall fluctuation of each particle with that observed in the 1P1N model.
To compare the fluctuations between the 1P1N and 1PkN models, we employed randomly chosen DNA sequences with 50×k base pairs for each k; i.e., the number of particle pairs in the 1PkN model was always 50. Pearson's correlation coefficients ρ a between the MSF F a n in the 1PkN and G a i in the 1P1N models was used as the performance index. A set of B k nl that maximizes the average of ρ a for 500 randomly chosen sequences was determined by a systematic survey ( Table 1). For the sake of simplicity, we assumed B k nl for l = n±4, n±3, n±2, n±1, (n±4) c , (n±3) c , (n± 2) c , (n ± 1) c , and n c is equal to 0 or 1, and B k nl = 0 otherwise (Figure 2). We confirmed that the results were qualitatively the same even when B k nl could take intermediate values. C k values for k > 1 were chosen manually, which alter only the absolute level of the fluctuations and do not affect the correlations we consider below.

Comparison of Anisotropic Fluctuations in the 1P1N and 1PkN Models
As previously demonstrated, the 1P1N model can effectively reproduce the fluctuations of an all-atom model in several typical geometric vector directions (Isami et al., 2015). To evaluate the performance of the 1PkN models, we again adopted anisotropic fluctuations as indices, and compared them between the 1P1N and 1PkN models for randomly chosen DNA sequences. Since the total numbers of modes are different between the 1P1N and 1PkN models for k > 2, we cannot compare each of their eigenvectors directly. Thus, we focused on anisotropic MSFs in several representative directions for each particle in the 1P1N and 1PkN models.
The fluctuations in each direction were evaluated as follows. In the 1PkN model, the n-th and n c -th particles form a base pair.   Table S2.
We defined the unit vectors and to the directions shown in Figure 3. Although b n and s n are not orthogonal in general, we confirmed that the angles of these vectors were always sufficiently close to π 2 rad for each n. Using these vectors, we decomposed the MSFs of the n-th and n c -th particles of the 1PkN model (i.e., F a n and F a n c ) into these three directions to obtain F b n (F b n c ), F s n (F s n c ), and F t n (F t n c ). We also calculated the MSF of relative particle positions DF a n of the n-th particle pair, and its projection DF b n , DF s n , and DF t n to these three directions. Furthermore, we estimated the local flexibility toward the directions of bending and twisting around the n-th particle pair, as denoted by F bend n and F twist n , respectively. The definitions of these indices are provided in Table 2.
Moreover, to compare the behavior between the n-th (n c -th) particle in the 1PkN model and the corresponding i-th (i c -th) particle in the 1P1N model, their motions must be projected onto the common vectors of the 1PkN model. For this purpose, we adopted b 1 i = b n , s 1 i = s n , and t 1 i = t n for i = kn + k 0 (b n , s n , and t n were obtained from the 1PkN model). Using these vectors, we projected the MSFs of the i-th (i c -th) particles in the 1P1N model (i.e., G a i (G a i c )) onto the same directions as those of the corresponding n-th (n c -th) particles in the 1PkN model, to obtain Similarly, the MSFs of the relative particle positions of the i-th particle pair (DG a i , DG b i , DG s i , and DG t i ) and the local flexibility values (G bend i and G twist i ) in the same directions were obtained ( Table 2).
Similarities between the fluctuations in the 1PkN model (F • n and DF • n ) and those in the 1P1N model (G • i and DG • i ) for each sequence were evaluated by means of the Pearson's correlation coefficients ρ • listed in Table 2. The correlations of motion, F c nl and G c ij , were also compared using Pearson's correlation coefficient ρ c .
To compare the absolute magnitude of fluctuations, we define F • n n , DF • n n , G • i i , and DG • i i as the average of F • n , DF • n , G • i , and DG • i , respectively, over the entire sequence except for two particle pairs (2 × k base pairs) at each end in the 1PkN (1P1N) model.
We assessed the local bending and twisting ability F bend n and F twist n , respectively, of the n-th particle pair (i = 3n + 1-th base pair). The moving average was taken over seven particle pairs: MF bend n = n+6 m=n F bend m and MF twist n = n+6 m=n F twist m . The averaging window (21 bp) largely corresponds to the length of the histone-like protein TFAM-binding sequence (22 bp Alam et al., 2003;Pohjoismäki et al., 2006;Kaufman et al., 2007;Ngo et al., 2014).

Validation of the 1PkN Model by Means of Anisotropic Fluctuations
As described in the previous section, we evaluated the 1PkN model using the indices listed in Table 2. We calculated these indices for each DNA sequence. For most k values up to k = 15, the profiles of MSFs of the n-th particle (F a n , F b n , F s n , and F t n ) and the correlations of fluctuations between the n-th and l-th particles (F c nl ) in the 1PkN model were similar to the corresponding values (G a i , G b i , G s i , G t i , and G c ij ) obtained in the 1P1N model (Figures 4-7; Figures S1-S16). In particular, good agreement was observed between F a n and G a i (the overall fluctuation profiles are independent of the value of C k because C k influences only the absolute values of fluctuations). For the 500 randomly chosen sequences, ρ a , ρ b , ρ s , ρ t , and ρ c showed high average values with low standard deviations for all k values considered (Table 3; details of ρ • values and the profiles of G Tables S3-S5, Figures S1-S16).
For all k values considered, the absolute values of ρ Db were always much lower ( Table 3, Tables S3-S5). In fact, the amplitude of DF b n was shown to be negligible, i.e., much smaller than those of the fluctuations in other directions ( Figures S2, S6, S10, S14). From these results, DF s n and DF t n were presumed to be essential for characterization of the inter-strand motions of double-stranded DNA. Focusing directly on the local bending and twisting flexibility of DNA in detail, F bend n and F twist n were thought to be even more important than DF s n and DF t n . Therefore, we mainly focused on ρ bend and ρ twist for the comparison between the 1PkN and 1P1N models.
For k = 3, ρ bend and ρ twist showed particularly high values (Table 3 and Figure 5). Therefore, the 1P3N model with an appropriate B k nl can effectively reproduce the mechanical properties, including local bending and twisting ability, of the 1P1N model; thus, although indirectly, the 1P3N model was proven to well reproduce such properties of the all-atom model.

Meaning
Definition Correlation MSF (mean F a n =< |δq n | 2 >, (F a n c =< |δq n c | 2 >) ρ a square fluctuation) We defined δQn = δq n + δq n c 2 In addition, we confirmed that the result was almost identical for longer (450 bp) sequences ( Table 3, Figures S13-S16), suggesting that the high correlations between the 1P3N and 1P1N models were independent of the sequence length.
For k = 4, ρ bend and ρ twist were sufficiently high (greater than 0.95), although a non-negligible difference exists between the absolute values of F twist n and G twist i (Figure 6). Therefore, if we analyze only the relative patterns of the local bending and twisting flexibility of DNA, the 1P4N model seems to be sufficiently useful.
The local twisting flexibility was not well reproduced for k > 4. This reflects the fact that the pitch of B-DNA is 10-11 base pairs, and hence the right-handed double-stranded structure was not retained for k = 5 (i.e., 2 particle pairs per pitch) or larger.
These 1P3N and 1P4N models contain 1/3 and 1/4 of the particles, respectively, as compared to the 1P1N model. Although the computational cost of the NMA of these models is much lower (by a factor of 1/10 to 1/30 for k = 3) than that of the 1P1N model, the accuracy of the obtained statistical results is almost identical (the computational cost to obtain all the eigenvalues and eigenvectors of an N × N matrix is generally between O(N 2 ) and O(N 3 ), which is the most time-consuming step; hence the reduction is expected to be k −2 to k −3 ).    Therefore, these models can be used for exhaustive analyses of the dynamic features of large DNA molecules. It should also be noted that ρ a , ρ b , ρ s , ρ t , ρ bend , and ρ c were higher than 0.9 for all k < 13, suggesting that 1PkN models with larger k values (e.g., 1P9N in Figure 7) may be useful for analyses of only the crude sequence-dependent behavior of very long DNA molecules. The MSF exhibited approximately the same magnitude in the 1P3N model and 1P1N model for a GC-content of around 0.5, although in the case of lower GC-contents, F a n n of the 1P3N model was slightly larger than G a i i of the 1P1N model (specifically, reaching up to 10% larger) (Figure 8). Most of the profiles of F • n n and DF • n n were close to those of G • i i and DG • i i ; however, some cases resulted in considerably different magnitude profiles (Figure 8, Figure S17). Nevertheless, the 1P3N model consistently showed good agreement with the behavior of the 1P1N model.

Application Example: Mitochondrial-Genome Dynamics
As an example of the possible application of the 1P3N model, we analyzed the whole-genome dynamics of mitochondrial DNA. Using the NMA of the 1P3N model constructed for these sequences, the potential for bending and superhelix formation in each local part of these genomes was inferred.
Consistently, there were non-uniform profiles of MF bend n and MF twist n detected in P. falciparum mitochondrial DNA (Figure 9) and human mitochondrial DNA (Figure 10 and Figures S18-S20). Similar results were obtained for S. pombe (19,431 bp) and D. melanogaster (19,524 bp) mitochondrial DNA molecules, which are longer than human mitochondrial DNA (Figures S21,  S22).

SUMMARY AND CONCLUSIONS
In this study, we developed the 1PkN model, which is more coarse-grained than recent DNA models and is intended for the analysis of the dynamic properties of long DNA molecules. In particular, we showed that the 1P3N model, where each particle represents three nucleotides, could accurately reproduce the distributions of the anisotropic fluctuations of nucleotides and the flexibility of local regions observed in the 1P1N model. Considering our previous result showing that the 1P1N model well reproduced the fluctuations of all-atom models of DNA (Isami et al., 2015), although indirectly, the 1P3N model was here shown to be in good agreement with all-atom models.
The 1P3N model seems to be useful for detailed analyses of the mechanical properties of long DNA molecules. The 1P4N model appears to be helpful for assessment of the fluctuations and flexibility of local regions as well. Using the 1P3N model, we analyzed the physical properties of whole mitochondrial genomes, even longer than 10 4 bp. From the macroscopic perspective, our models should be connected to further coarse-grained polymer-like models (e.g., Brackley et al., 2013), where the sequence-dependent properties that emerged in our model could be incorporated as the inhomogeneity   of mechanical parameters such as persistence length. Such multiscale approaches would be beneficial for analyses of huge structures such as whole bacterial genome DNA.
In this study, our analysis depended on the NMA. Moreover, we can perform molecular dynamics simulations of the 1P3N model, as well as the other proposed models, for the analysis of large deformations and interactions with a variety of proteins. These simulations would be useful to elucidate the formation and dynamics of compact nucleoid structures and ∼30-nm chromatin structures (Gall, 1966;Adolph, 1980;Langmore and Paulson, 1983;Paulson and Langmore, 1983;Eltsov et al., 2008;Joti et al., 2012) or even larger structures such as the topologically associated domains in the nucleus (Zhu et al., 2007;Grosberg, 2012). Characteristics of DNA also depend on the conditions of the solution, such as temperature and ionic strength (Hamelberg et al., 2000;Mantz et al., 2007;Middleton et al., 2007). Therefore, adjustment of the model to reflect different conditions of solution would be warranted. We are planning to modify the proposed CG model to include the chemical properties of DNA and to analyze chromatin structures using not only NMA but also molecular dynamics simulations.

ACKNOWLEDGMENTS
The authors are grateful to S. Tate for fruitful discussions.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fphys. 2017.00103/full#supplementary-material