Can We Assess Early DNA Damage at the Molecular Scale by Radiation Track Structure Simulations? A Tetranucleosome Scenario in Geant4-DNA

Advanced track structure codes excel as state-of-the-art tools to low-scale dosimetric models: the rational evolution for a cell-like scenario is detailed within a microsecond of an ion collision, that is the standard timescale for critical DNA modifications. The in vitro DNA double-strand breaks (DSB) yield is matched indirectly by nanodosimetric track structure assessments; however, the score to specific DSB motifs (i.e., the yield associated to each DSB distance between DNA cuts) is mostly overlooked. In this work, we extend the PDB4DNA example of the Geant4-DNA toolkit, to briefly assess the hit and DSB scores over a nucleosome tetramer framework (Protein Data Bank entry: 1zbb). We describe a critical scenario that biases the statistical significance for an event-by-event track structure assessment at the nanometric scale, based on a Shannon's entropy estimate of the volumetric hit score; finally, we draw a tentative correlation between the mean DSB quality and a proton track, and conclude that short-distanced DSBs by direct effect are favored within a Bragg peak-relevant energy range.


INTRODUCTION
Mechanistic estimates of cell radiosensitivity benefit from the sharply detailed information on the local density of radiation-induced, DNA-lethal events in cell nuclei, down to a nanometric scale [1,2].
The description of a rational frame for the overall outcome associated to a specific radiation quality (which leads to the early cell reaction) involves several research fields. Particularly, this is critical to ion beam irradiation, which is associated with a highly complex energy deposition cascade. The event-by-event reconstruction of the cell-like scenario, however, forces a choice in scale and time; indeed, advanced Monte Carlo radiation track structure toolkits depict the microsecond timescale of radiolytic radical diffusion and reactivity [3][4][5][6], while molecular dynamics (MD) may achieve a few microseconds of non-reactive, all-atom DNA dynamics [7,8]. Beside actual technical limitations, however, there is a multitude of choices as to which chromatin scenarios are worth the simulation effort, to what extent, and what to look for.
As nuclear DNA suffers all carriers of molecular toxicity, cells enable substrate-selective recovery mechanisms [9]; each mechanism is associated to one activation factor, that senses a defined DNA aberration class (i.e., base excisions, DNA strands crosslinks, strand breaks, etc.)-still, we lack a rational scenario to describe the subsequent scale of mechanistic evolution for the DNA lesion [10]. All aforementioned DNA modification examples relate, in fact, to a wide variety of biochemical scenarios; it was remarked that the early dynamics for a short double-stranded DNA chain, as it suffers a double-strand break (DSB) event-that is, the close cuts in the covalent DNA backbone over the two strands-is defined markedly by the distance between strand cuts [11]. Furthermore, broken DNA chains are resilient to thermal fluctuations in the nucleosome, where the core histone tails hold the DNA ends, within a few microseconds [12].
Track structure toolkits benefit from coarse criteria that qualify as a DSB any closely associated, local energy deposition event over the DNA backbone (either beyond or within an arbitrary dose threshold) to a certain likelihood [6,13,14]. We assume, however, that different DSB motifs are distinctly harmful, and short-distanced DSBs alone would unfold nucleosomal DNA within a few microseconds of an ion collision, while it seems unlikely that a DSB frame distanced by 10 bases would crack by thermal fluctuations. The mechanistic assessment of DSB distances meets a biochemical intuition; in fact, the explicit scenario that unfolds after a DSB event, and the kinetic implications distinct DSBs have on cell activation factors, are not fully known. Furthermore, while track structure mechanistic assessments are meant to focus the overall radiation field effect on a local scale, to indirectly match absolute strand break outcomes, the score to specific DSB distances is mostly overlooked.
We hereby show the results of a brief assessment for a proton track structure within a Bragg peak-related energy range, at a nanometric scale. As a fit environment, we exploited and extended the basic PDB4DNA example classes [15] of the Geant4-DNA toolkit [16][17][18][19], to keep track of the hit and DSB distance scores (by early direct effect), over a nanometric water simulation frame. The natural framework to such analysis are the atomic crystal coordinates of a nucleosome tetramer [20] (hereafter tetranucleosome, Protein Data Bank entry 1ZBB [21]). Nucleosomes define the elemental units of chromatin in the eukaryotic cell [22,23], where a 147 bp double-stranded DNA helix wraps a core histone octamer over 1.67 turns [24]; furthermore, nucleosomes are threaded as "beads" into 30nm-wide chromatin fibers [25], thus the tetrameric framework involves linear, linker DNA chains and wrapped core DNA, and fairly depicts a chromatin-like scenario.
Initially, our first aim was to track the likelihood of each nucleotide over the nucleosome tetramer framework to be involved by a DSB event. However, our assessment revealed a critical scenario, where the hit collection over the tetranucleosome DNA backbone is statistically biased by a "truncated" track structure artifact; we thus involve a statistical estimator based on Shannon's entropy to assess the level of bias for a collection of hit events, within the nanometric volume. As the hit artifact is trivially fixed by an expansion of the water box, we finally draw a tentative correlation between the mean DSB quality and a proton track, and conclude that short-distanced DSBs by direct effect are favored within a Bragg peak-related energy range.

MATERIALS AND METHODS
The PDB4DNA example of the Geant4-DNA toolkit includes a set of C++ libraries that let users create nanometric simulation volumes, tailored over the atomic coordinates of DNA biomolecular structures (available within the RCSB Protein Data Bank, RRID:SCR_012820 [26]). Protein Data Bank (PDB) files collect exhaustive information of a biomolecule's framework as, for instance, a detailed list of atom coordinates in a readable ASCII format; DNA nucleotides are further classified and labeled by a serial index that codifies their location over the DNA chain (1-694 for the overall nucleosome tetramer framework referred to in this work).
The 1ZBB PDB entry describes the elemental symmetrical unit for the nucleosome tetramer (i.e., a dinucleosome). This was further transformed via a dedicated PDB file editor (VMD, RRID:SCR_001820 [27]) and added with its complementary element, to achieve a full tetrameric framework (694 bp) that was associated to the PDB4DNA example; the edited PDB file records were fixed to comply with the PDBlib reader format criteria (these are detailed in [15]).
We will hereafter refer to the tetranucleosome-tailored water box extracted by PDB4DNA (13.0 × 15.2 × 25.4 nm) as the reference volume; all framework-related information, that is, the DNA atomic coordinates and nucleotides' center of mass, is implicitly extracted and cached within the Geant4 environment. Such default reference volume (a G4Box instance) is made of G4_WATER (a NIST database material) and lies within a void environment of "Galactic, " vacuum material.
The basic PDB4DNA analysis tools keep track of the overall energy deposition, single-and DSB scores within the water box. We extended the default classes and involved a further set of ROOT [28] histograms to keep track of 1 (i) the energy deposition events (hits) score over the reference volume coordinates; (ii) the hits and DNA strand breaks score over the DNA backbone; (iii) the DNA DSBs distance score, as the absolute distance between individual strand breaks over complementary DNA strands, within a 10 bp threshold. A DNA strand break is scored as an overall 8.22-eV dose deposition (lower threshold) to a nucleotide backbone moiety, that is, the ribose-phosphate residue. Several criteria that yield reliable estimates of DNA strand breaks by early direct effect are detailed in the literature [29]; however, we remark that our analysis, in this concern, is merely qualitative.
The overall dataset was collected within the Geant4 (version 10.02-P03) toolkit environment [30][31][32]; "raw" data were analyzed via ROOT scripts as detailed in the "Results" section. Each of several runs covered 10 7 tracks and diverse volume choices; however, the default G4EmDNAPhysics list constructor 2 to low-energy electron cross-sections 3 was set. The default PDB4DNA layout involves an isotropic, outer spherical source, that is defined over the vertex coordinates of the reference volume; thus, particles (500 keV to 5 MeV protons, in this work) are randomly shot by the edges toward the water box. The source is bound to the active box, therefore it stretches as the water volume is expanded (vide infra). Such scenario let us factor varied, random nucleosome layouts in the overall assessment, that is representative of a chromatin fiber scenario.

Hit Score Analysis Within a Nanometric-Sized Volume: The Hit Artifact
We started by the default PDB4DNA extension layout, where particles (500 keV protons) randomly strike the reference volume by a perfectly isotropic, spherical source. Figure 1A shows how the hit counter (the score to each energy deposition event over the DNA backbone) is explicitly non-homogenous, as well as the strand break and DSB counters (not shown here). We expected an explicit dependence for the hit score on local DNA morphology, hence the likelihood for a nucleotide to be involved by a DSB event would vary between the linker and core DNA chains; however, no such correlation seems to arise, as the hit spikes in Figure 1A involve nucleotides over linker and wrapped DNA likewise. Remarkably, such spikes describe a cluster of DNA nucleotides over the "core" of the tetranucleosome framework (highlighted in red in Figure 1D), which we show to be an artifact (vide infra); indeed, the hit counter to all energy deposition events within the reference volume (Figure 2A) shows that the central core is oversampled effectively over the z-axis (as defined in the nucleosome tetramer PDB atom coordinates file). The same holds true for the x-and y-axes (not shown here).
We will hereafter refer to such effect as a hit artifact. As particles leak off the water box (to a vacuum environment), their tracks are cut off. Hence, no further collisions/events are detailed by the reference volume outer shell, and we lack track information at the system boundary, where there is a local unbalance. As a consequence, a minor dose fraction is deposited over outlying DNA nucleotides, and the dosimetric information we draw out of the tetranucleosome framework is an effective oversample of the reference volume central core (that is incidentally taken up by linker DNA chains), while we overlook the outlying nucleosome compartments. Such hit artifact remarks how we likely misestimate the early effect of an ion traversal over a small DNA framework, as we lack either boundary conditions or track information at its outer solvation shell.
To avoid the hit artifact, we expanded the water box further off the default size of the reference volume, i.e., we symmetrically applied a multiplicative linear expansion factor to each box size (1-5-fold). Figures 1A-C show the hit counter spikes effectively vanish off the DNA backbone, as well as the DNA strand break counter (not shown here), where the water volume is expanded; hence, such trivial symmetrical expansion ensures all nucleotides are sampled evenly over the tetranucleosome framework. So, while we expect the hit artifact to apply to all nanometric systems likewise, a substantially thick "solvation shell" makes sure the track is detailed over the reference volume (and the DNA backbone), which now lies within a wider water box. This raises a further issue, that is, to achieve a convenient tradeoff in track structure details and the least effort, i.e., to establish where the water box is overexpanded.

Shannon's Entropy as a Bias Estimator to Achieve a Convenient Expansion Tradeoff
We will hereafter refer to the volume hit score (VHS) as the overall score of energy deposition events within the reference volume frame, while the DNA hit score (DHS) is the subset of all VHS events that fall over the DNA backbone; such scorers effectively estimate the overall amount of information we collect (and lose) over the reference volume frame, while the water box is expanded.
The VHS increases with the volume expansion, as shown in Figure 3A; this is expected, as a thick solvation shell ensures an increase in the overall information we collect over the reference (and its outer) volume, as shown explicitly in Figures 2A-C. However, Figure 3B shows that the DHS coincidently decreases, thus we scored fewer hits over the DNA backbone, while the hit counter had increased within the reference volume.
The latter outcome looks counterintuitive; however, nucleotides effectively take up a minor fraction of a nucleosome volume. By the PDB4DNA default scenario (1-fold expansion factor), we overstrike the tetranucleosome "crowded" core, FIGURE 2 | Hit score over the z-axis [as defined in the nucleosome tetramer Protein Data Bank (PDB) atom coordinates file] to all dose deposition events within the reference volume frame, for a set of 500 keV-protons PDB4DNA runs at a (A) 1-fold, (B) 2.5-fold, and (C) 5-fold linear expansion of the reference volume sizes; the reference volume is oversampled at its core, whereas the disparity to the outlying compartments is explicit to smaller systems (hit artifact). where tracks are extremely effective; they turn less effective over the DNA backbone, however, as we shoot over a wider water system (lower DHS). This likely reflects an uneven nucleotide framework, where a major hit fraction strikes an outlying volume associated to a low nucleotide concentration.
To establish and quantify the level of flatness for the DNA hit counter (and estimate a VHS-DHS tradeoff), we referred to a normalized Shannon's entropy formula, defined as: where the index i runs over the N = 694 nucleotide pairs, and p i is defined as the hit score over the i-th nucleotide pair, divided by the overall DNA hit score; therefore, S varies between 0 (maximally biased distribution) and 1 (unbiased distribution). Figure 3C shows that Shannon's entropy increases steeply with the linear expansion factor; however, no significant increase in S is achieved beyond a 2.5-fold expansion of the default reference volume sizes, which we therefore established as a minimum threshold to achieve an unbiased sample of the tetranucleosome DNA backbone.
Such a threshold is, however, strictly bound to the 500 keV scenario. We thus extended the assessment to a 5-MeV particle case; Figures 3D,E show the volume and DNA hit score to behave exactly alike at 500 keV and 5 MeV, within an order of magnitude decrease for all values in the latter case, as expected by an effective difference in LET, i.e., with fewer energy deposition events overall. Remarkably, the 500-keV steep increase in Shannon's entropy ( Figure 3C) is matched at 5 MeV ( Figure 3F)-likewise, S is maximum at a 2.5-fold linear expansion factor of the default reference volume sizes. We therefore established a 2.5 expansion factor to be a minimum threshold to achieve an unbiased and statistically significant sample of the tetranucleosome framework, within the 500 keV−5 MeV energy range.

The Distance Mean Score
In view of the latter outcome, we extended our assessment of the 500-keV to 5-MeV energy range scenario at fixed 2.5-fold volume expansion and involved a further estimator we will hereafter refer to as double strand break distance mean score (DMS).
It is widely shared that the definition of the DNA double strand break is based over a threshold distance criterion between individual strand breaks over complementary DNA strands [33]. Such criterion is rational to a microdosimetry level of theory; each DSB motif (that is associated to a strand breaks distance), however, implies a local unique chemical aberration and mechanical behavior, where the (virtual) timescales for a broken DNA framework to crack by thermal fluctuations vary in each DSB scenario [11]. We thus added a ROOT histogram that kept track of each DSB distance. The DSB distance scores at 500 keV, 1.5 MeV, and 5 MeV seemingly share a Poisson fit, as shown by Figures 4A-C, and are biased towards short-distanced DSBs; Figure 4D further shows how the DSB DMS decreases slightly with the particle energy, within the 500 keV to 5 MeV range. While we lack a yet significant dataset (that extends over a wider energy range), we would speculate the existence of an effective correlation between a particle track structure and a "mean DSB event" quality, over a unique DNA framework. Rather remarkably, short-distanced DSB events by direct effect (one-to five-nucleotide distance) would thus look favored by a proton source within a Bragg peak-related energy range.
We further realized that the DSB DMS value slightly fluctuates as the volume is expanded- Figures 4E,F show the DMS within a 1-to 5-fold linear size expansion, at 0.5 and 5 MeV. As we overexpand the water box sizes beyond a 2.5-fold factor (that is, the threshold where we achieve a substantially thick solvation shell and moderate the DHS loss), we shall collect no further track structure information and collaterally oversample the water volume far off the tetranucleosome framework; thus, we expect the DSB DMS to converge eventually.

DISCUSSION
We detailed a brief assessment of the early, direct DNA lesions associated to the energy deposition track by a proton beam isotropic source over a Bragg peak-related energy range. We extended the default PDB4DNA (a Geant4-DNA example) C++ classes and involved a set of ROOT histograms to keep track of the hit and DNA strand break scores over a nanometric-sized water box, tailored over the atomic coordinates of a nucleosome tetramer (Protein Data Bank entry: 1ZBB).
By the default PDB4DNA extension layout, we achieved a non-homogeneous, spiked hit score over the nucleosome tetramer DNA backbone, which describes a nucleotides cluster over the reference volume central core and where the outlying nucleosome compartments are undersampled; this leads to a local unbalance in the hit counts. We thus symmetrically expanded each of the reference volume box sizes up to 5-fold and allowed a thick solvation shell about the nucleosome tetramer, where an ion track is not broken off; we eventually established (via a normalized Shannon's entropy formula) that a 2.5 linear size expansion threshold achieves an unbiased sample of the nucleosome tetramer DNA backbone, within a 500 keV to 5 MeV energy range.
Clinical treatments (10 8 -10 9 ions/cm 2 fluence) strike cell nuclei by a few hundred projectiles; in silico track structure assessments of a nucleosome (that is a frame size smaller by a factor 10 8 than a cell nucleus) shall thus infer mean dosimetric information at the nanometric level, which we achieved by an exhaustive and unbiased collection of events over the DNA backbone. A bias estimation by a Shannon's entropy algorithm is, however, strictly framework dependent: in fact, we shall expect it to be not as effective where DNA is highly symmetrical over the volume.
In conclusion, we remarked that a DSB coarse nanodosimetric description based over a distance threshold (i.e., that is inclusive of all double strand break motifs, within an arbitrary distance) is weak by molecular dynamics (MD) criteria, where we are not allowed chemical ambivalence. To such aim, we noticed that the DSB distance scores share a Poisson fit and are biased towards short-distanced DSBs (one-to five-nucleotide distance), within a 500 keV to 5 MeV proton energy range. As a further biophysical estimator, the DSB DMS slightly fluctuates as the volume is expanded and decreases with the particle energy. While we lack a yet significant dataset and a careful assessment of DSB criteria, we speculated a correlation between a particle quality and energy and a "mean break event" assumption (by direct effect), whereby particles are associated with a DSB distance likelihood based over a track structure description. We acknowledge, however, that a further, updated analysis, where indirect effects are taken into account will be needed.
As done initially in Landuzzi et al. [11] and Cleri et al. [12], MD shall be exploited to further assess the early evolution of chromatin-like DNA frameworks, despite within limited timescales, as a clear scenario of early DNA lesions is collected. To create a cross framework, where to meet a nanodosimetric and biochemical intuition, further structural feedback shall be collected, whereas, to date, we lack atomistic datasets to irradiated, chromatin-like frameworks. An isolated attempt that unifies DNA lesions by ion irradiation and classic MD was carried out in the context of a multiscale approach [34], although explicitly focused on the channels of shockwave induction by local heat spikes in high-LET regimes. As exhaustive datasets on the local features of clustered DNA lesions by different radiation sources will become accessible, multiscale approaches shall become straightforward; however, such issues are yet a matter for further debate.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
LP carried out the numerical simulations and analysis and wrote the draft. All authors conceived the current ideas, discussed the assessment, reviewed, and contributed to the draft.