Development and Benchmarking of a Monte Carlo Dose Engine for Proton Radiation Therapy

Dose calculation algorithms based on Monte Carlo (MC) simulations play a crucial role in radiotherapy. Here, the development and benchmarking of a novel MC dose engine, MonteRay, is presented for proton therapy aiming to support clinical activity at the Heidelberg Ion Beam Therapy center (HIT) and the development of MRI (magnetic resonance imaging)-guided particle therapy. Comparisons against dosimetric data and gold standard MC FLUKA calculations at different levels of complexity, ranging from single pencil beams in water to patient plans, showed high levels of agreement, validating the physical approach implemented in the dose engine. Additionally, MonteRay has been found to match satisfactorily to FLUKA dose predictions in magnetic fields both in homogeneous and heterogeneous scenarios advocating its use for future MRI-guided proton therapy applications. Benchmarked on 150 MeV protons transported on a 2 × 2 × 2 mm3 grid, MonteRay achieved a high computational throughput and was able to simulate the histories of more than 30,000 primary protons per second on a single CPU core.


INTRODUCTION
Image guided radiotherapy is at the forefront of innovative treatment delivery techniques. It has the potential to improve treatment efficacy via on-board imaging procedures such as adaptive planning and/or live monitoring, for instance via magnetic resonance (MR)guided radiation therapy (MRgRT) [1,2]. Over the last decade, clinical prototypes have combined low-field-strength MR and radioactive cobalt-60 sources for photon treatment, followed by linear accelerators and higher field-strength MR fields for improved image resolution [3][4][5].
Particle therapy (PT), a cancer treatment modality achieving superior dose conformity to solid tumours compared to conventional photon techniques [6,7], would greatly benefit from on-board MR-guided treatment delivery [8]. For instance, at the Heidelberg Ion Beam Therapy Center (HIT) over 5,000 patients have been treated with proton and carbon ions since 2009 [9]. While 16 O ions have so far only been used for research purposes, HIT has treated the first patient with raster scanning 4 He ion beams in July 2021.
For all clinically administered ion beams, on-board MRguided treatment delivery is currently not feasible. However, system developments for treatment planning and delivery of MR-guided particle therapy are underway at HIT. Here, we begin with considerations in dose calculation for MR-guided particle therapy. During MRgRT using photons, for example, the MR field (due to Lorentz forces) can impact the dose deposition of ionized electrons/delta-rays, with severity depending on patient anatomy and MR field strength [10,11]. Hence, dose calculation corrections are introduced in clinical practice for improving accuracy [12]. Similarly, trajectories of fast charged particles like protons are altered by the MR field [13][14][15] and consequently, proper consideration must be given for accurate dose calculation.
With the aim of providing dose computations at various levels of accuracy and speed for current and future treatment in particle therapy with light and heavy ions, various systems have been introduced at HIT to support clinical deployment of PT. Initially, as a gold standard, a MC environment based on the MC code FLUKA [16,17] has been developed and extensively benchmarked [18] for allowing database generation for clinical analytical treatment planning system (TPS) and patient recalculations. This framework required long computation times (hours to days depending on the number of CPUs available) which limited its usage in the analysis of large patient cohorts and for any adaptive/on-line planning.
In order to overcome these limitations, FRoG (Fast dose* Recalculation on GPU) has been introduced, an advanced analytical code capable of calculating dose, LET d (dose-weighted Linear Energy Transfer) and biological dose for the four particle beams available at HIT [19][20][21] and which is in use at other PT facilities in Europe (Centro Nazionale di Adroterapia Oncologica [21], Danish Centre for Particle Therapy [22]. High levels of agreement within 1-2% [19,21,23] were found comparing FLUKA and FRoG recalculated dose-volume-histograms (DVH) of proton and other light ion patient plans even for complex cases such as lung irradiation [23]. However, analytical codes are usually designed for a specific task, making the introduction of new features such as MR-guidance [14,24], positron emission tomography [25,26] and prompt gammas [27] require large development effort and substantial changes in the physics engine. Fast MC engines have been introduced for proton beams [28][29][30][31][32][33] and helped streamline the development while reaching various levels of agreement when compared against gold standard MC codes such as FLUKA and TOPAS/Geant [34].
Several recent works have investigated the impact of MRguidance on particle beam physics and modelling distortion due to the Lorentz force [13,[35][36][37][38]. Despite these characterizations however, no fast MC engine has been presented in literature which is able to perform clinically relevant particle therapy calculations in magnetic fields. In this work, a CPU-based fast MC dose engine for proton beams (MonteRay) was developed and benchmarked for supporting ongoing clinical activity and introducing novel treatment modalities, particularly within the MRI-guided particle therapy program at HIT.

Programming Languages and Libraries
With performance and extensibility to GPUs in mind, the MonteRay MC engine was written in C++. Several external libraries were used either during development or execution of the MC code. The frameworks GoogleTest [39] and Benchmark [40] are used for testing and microbenchmarking the source code. The Boost library [41] is used for filesystem operations and parsing of configuration files. RapidXml [42] is used for reading of irradiation plans in XML format. ITK [43] and DCMTK [44] are used for reading CT images. FLUKA simulations were performed using FLUKA version 2020 0.6.

Geometry and Materials
Voxelized water phantom and patient geometries are implemented from computed tomography (CT) scans using the approach described in [45,46], i.e. the Hounsfield Unit (HU) of each voxel is converted to a water equivalent path length, density and elemental composition. In total, 36 different materials, covering an HU range between -1000 HU and 3070 HU are used. HU values larger than 3070 are assumed to be metallic implants made from titanium. Each material is modeled as a combination of up to ten elements. Additionally, five extra materials (water, RW3, PMMA, air and carbon fiber) can be defined by the user for dosimetric studies. For the calculation of nuclear interactions, only the most abundant isotope of each element is considered: 1 H, 12 C, 14 N, 16 O, 23 Na, 24 Mg, 31 P, 32 S, 35 Cl, 40 Ar, 39 K, 40 Ca and 48 Ti. However, just H, C, O and Ca already constitute more than 90% of a human's weight [47]. Including more materials in MonteRay is trivial if they consist only of the ten base elements already defined. Adding additional elements requires the generation of additional inelastic nuclear interaction databases (Inelastic Nuclear Interactions).

Handling the HIT-specific Beamline
The HIT beamline consists of various layers of different materials, including tungsten [48], with which the particle beam interacts before reaching the patient, resulting in a unique phase-space of particles. To avoid modelling and simulating the whole beamline in MonteRay, the approach described in [49] was used, i.e., sampling from a phase-space for each of the 255 quasimonoenergetic proton beams available at the HIT facility. Each file contains the location, direction and energy of 10 million particles sampled on a plane perpendicular to the beam's direction before the patient's entrance. The phase space was generated using FLUKA and besides primary protons, secondary protons generated due to the primary particle's interactions with the beamline are also considered. For now, however, all other secondary particles (deuterons, tritons, 3 He, 4 He and neutrons) are neglected. During simulation, our MC code randomly samples individual particles from these phase space files.

Transport
For the simulation of proton beams, MonteRay performs the transport of protons, deuterons, tritons, 3-Helium and 4-Helium. Of these particles, only protons undergo elastic and Inelastic Nuclear Interactions as described in Nuclear Interactions. All transported particles experience energy loss and scattering through electromagnetic interactions as described in Electromagnetic Interactions.
Energy is deposited either on a Cartesian or a cylindrical grid. Energy depositions from heavy nuclear recoils are recorded locally while energy lost through electromagnetic interactions are deposited along a track via the method described in [50]: given the particle's location at the beginning of the transport step x i → and its position at the end of the transport step x f → , the point of energy deposition is chosen randomly via where U is a random number uniformly distributed on the interval [0, 1). This is an efficient method of avoiding aliasing effects due to floating-point inaccuracies at grid boundaries and mismatches between the CT and the scoring grid. To avoid discontinuities in the deposited dose, the particles are transported on a grid with spacing equal to or less than the requested scoring grids spacing. If a CT is loaded, this will be the CT grid. All simulations shown here, unless otherwise noted, were performed on a 1 × 1 × 1 mm 3 Cartesian grid. At the beginning of each step, the distance to the next voxel's boundary dx vox is calculated and the distance to the next nuclear interaction dx nuc is sampled based on the total nuclear cross section introduced in Nuclear Interactions. The smaller of these two values is chosen as the current iteration's step length dx, i.e.
The energy loss over the distance dx is calculated and the scattering angle is sampled after the approaches described in Electromagnetic Interactions. In the presence of a magnetic field B, an additional deflection Δu m → due to the Lorentz force is calculated after [51] using where m is the particle's rest mass, z is the particle's charge in units of the elementary charge, β is its velocity relative to the speed of light c, u is its normalized direction and γ is the Lorentz factor. After updating the particle's position, energy and direction, if a nuclear interaction occurred, the type of nuclear interaction is determined, and the nuclear interaction performed as will be described in Nuclear Interactions. The transport step is repeated until the particle's energy falls below a threshold of 1 MeV. The remaining energy is deposited in a single step. During transport, only protons undergo nuclear interactions. For all other particles, only electromagnetic energy losses are considered.
Angular deflections due to nuclear or electromagnetic interactions, expressed through a polar angle θ and an azimuthal angle ϕ, are applied to the particle's initial direction u to obtain the particle's final direction u′ → via: where the vectors v and w are chosen such that together with u, they form an orthonormal basis. Since all physical interactions considered in the simulation are independent of ϕ, any orthonormal basis can be used for this purpose. To find v, a run-time efficient algorithm described in [52] is used. The last constituent of the orthonormal basis is then computed using the cross product w u × v.

Electromagnetic Interactions
Interactions with electrons cause charged particles to continuously lose energy while travelling through matter. The mean energy loss per unit distance due to this process is called the stopping power S, which is a function of energy and dependent on the projectile's mass and charge [53,54]. FLUKA was used to tabulate the energy loss of the transported particles in water from 0.1 MeV/ u to 1,000 MeV/ u with 2000 linearly spaced intervals.
To obtain the stopping power in materials other than water, the stopping power table for water was multiplied by a factor dependent on the materials HU value [45]. Since the step size dx is fixed at the beginning of each transport step, the mean energy loss dE that the particle experiences during the step must be calculated. This problem is equivalent to solving the following equation: where E(x) is the particles energy after having travelled a distance x and E 0 is the particle's energy at the beginning of the step. While this equation is in principle solvable under the assumption that S is linear along the step, a numerical approximation was used instead. This approximation is based on the following recurrence relation: This recurrence relation is evaluated up to a depth of n 3 to arrive at an accurate estimate of dE.
Scattering is a statistical process, so the stopping power only describes the mean energy loss per unit distance traveled. Theoretical treatments of this process have for example been done in [55,56]. The distributions derived therein are complex and their sampling costly. But in the limit were dx is large, the energy loss is approximately distributed normally around dE. If dx is small on the other hand, the Gaussian approximation is inadequate and the distribution is skewed towards high energy losses [54]. Whether a Gaussian approximation is appropriate can be judged through the parameter κ given by where ξ is given by and T max is given by where N a is Avogadro's number, m e is the electron's mass, r e is the classical electrons radius, q is the particles charge in Coulomb, Z is the target's atomic charge and A is the target's atomic number. Following [57], the energy loss distribution is approximated through a normal distribution if κ ≥ 10 and a log-normal distribution if κ < 10. The normal distribution has mean dE and standard deviation σ as given for example in [53]: The log-normal distribution's parameters are determined through a fit, matching the first four moments of the Vavilov distribution [57]. For very small κ < 0.3 [57], propose the use of a different distribution, but with step sizes of 1 mm it was found that adequate agreement can be achieved by sampling from a lognormal distribution even when κ < 0.3. During the simulations, care had to be taken here since occasionally, especially for very low-density materials like air, the energy loss sampled according to the log-normal approximation could become negative. In this case the approximation dE dE was used.
Besides inelastic collisions with atomic electrons, charged particles also undergo elastic collisions with atomic nuclei. These interactions do not contribute to the particle's energy loss but deflect the particle. This too, is a statistical process. Commonly, MC simulations base their scattering model on Moliere's theoretical treatment [58]. The formula derived by Moliere is a series of functions where the reduced angle ϑ is related to the polar scattering angle θ used in Eq. 1 via and where χ c and B are constants dependent on the target material, the incoming particle's energy and charge. These constants are defined in [58] together with an integral representation of the functions f n . To clear up possible confusions, we note that in literature, frequently not the scattering angle ϑ is considered but instead the projected angle φ is used. This angle arises when one considers the projection of ϑ onto an axis perpendicular to the beam's direction. For a rigorous definition of ϑ and φ, we refer to [58,59] and here we will only work with the angle ϑ.
Sampling from higher order terms of the Moliere distribution is computationally expensive, but approximations can be made. Perhaps the simplest is dropping higher order terms, i.e. terms where n ≥ 2. Since B is a measure for the average number of single scattering events occurring along a step, when the step size dx is large the weight of the higher order terms decreases and the distribution can be approximated through a Rayleigh distribution in ϑ Single Gaussian approximations of the scattering angle have for example been introduced by Rossi [60] or Highland [61]. For the width σ of this Gaussian [60], provide the following empirical formula: where z is the particle's charge in units of the elementary charge, ρ is the target's density, χ 0 is the target's radiation length and p is the particle's momentum in MeV/c. Originally, the value of E s was given as 21 MeV but with the mixed Rayleigh-Rutherford approach that will be presented here, a value of 11.6 MeV was found to be better and was used throughout all simulations presented in Results.
For small dx, the single Gaussian approximation does not adequately reproduce the large angle tails of Moliere's distribution. As a result, authors have proposed different modifications to the pure Gaussian probability distribution such as double or triple Gaussian parametrizations [62,63] or parametrizations that use a Rutherford distribution to model the tail [31,53,[64][65][66]. Generally, even when using fits of Gaussian mixture models, the large-angle tails of the Moliere distribution are not reproduced adequately. In this work, a parametrization similar to [64] was used, combining a Rutherford-like tail with a Rayleigh distribution at the center For the simulations, the value k 3.5σ was used, the constant α was determined such that P(θ) is continuous at the boundary θ k and N was determined such that the probability density function is normalized, i.e.
Here, N G and N R are the integral of the Rayleigh and the Rutherford-like part respectively, given by Sampling of an angle θ is then done via inverse transform sampling using only a single uniformly distributed random number U after Nuclear Interactions

Elastic Nuclear Interactions
The kinematics involved in elastic nuclear interactions are implemented fully relativistically. The total elastic cross section σ el is calculated starting from the work of [67] and was tabulated in 500 evenly spaced bins ranging from 0.1 to 500.1 MeV for all 10 nuclei listed in Geometry and Materials. The scattering angle in the center of mass frame is sampled according to a parametrization proposed in [68]. First, the momentum transfer t is sampled after where, A is the target nucleus atomic number. Then, the center of mass scattering angle is calculated via With p CM being the center of mass momentum. From θ CM , the laboratory frame polar scattering angles are computed and, together with a uniformly distributed azimuthal angle, applied to the resulting scattered particles.

Inelastic Nuclear Interactions
In particle therapy, inelastic nuclear scattering events generate the mixed radiation field, i.e. photons, protons, neutrons, deuterons, tritons, 3 He, 4 He and heavier fragments (nuclear recoils). In MonteRay, similarly to other works in literature [31,32], photons and neutrons are assumed to be dosimetrically irrelevant and they are neither transported nor produced. The total inelastic cross section σ ine for protons was calculated starting from the work of [69,70]. To model the production of secondary particles, a database of nuclear event probabilities was generated based on nuclear models used internally by FLUKA. The database covers a primary proton energy T in ranging from 10 to 300 MeV, in steps of 10 MeV. Tables were generated for each of the 10 elements defined in and for each of the five possible product particles considered: protons, tritons, deuterons, 3-Helium and 4-Helium. Each table (of the 30 · 10 · 5 1500 tables) is divided into 100 bins in the kinetic energy T sec and 100 bins in the solid angle Ω sec of the secondaries. The 100 energy bins divide the range 0 to T in into evenly spaced intervals and the 100 angular bins evenly divide the interval 0 to 4 π. If an inelastic nuclear event occurs during simulation, all possible products for the current target nucleus are created but assigned weights corresponding to their relative multiplicity ( Figure 1A). Secondary particle energy ( Figure 1B) and direction are chosen via a binary search on a cumulative probability distribution, generated at the beginning of the simulation by summing up the tables values. Additionally, the mean kinetic recoil energy is stored for each table and deposited on the spot following a nuclear event.

Benchmarking of the Developed Dose Engine
To benchmark MonteRay, its predictions were compared against experimental data acquired at HIT over the last years, published in [71,72]. For scenarios where experimental data was not available, e.g. in presence of magnetic fields and for patient calculations, FLUKA predictions were used as a reference.

Comparison Metrics
To judge MonteRay's agreement with measurements or against other TPS, several common radiotherapy metrics were used. The relative error was used to quantify the relative disagreement between two dose profiles, d 1 and d 2 . Measured and calculated beam ranges were compared in terms of their R 80 value which is defined as the depth distal to the Bragg peak (BP) where the dose falls to 80% of the BP value. The difference in range for two dose distributions was quantified through ΔR 80 |R 1 80 − R 2 80 |. Agreement between lateral profiles was judged using the full width at half maximum (FWHM) value and the full width at 10% of the maximum (FW10%M) value. For the comparison of 3D dose distributions, the 3D local gamma pass rate was calculated. For this the python package pymedphys version 0.37.1 was used. For the calculation, similar to previous proton MC engines [28,31,73,74], the dose percentage threshold was set to 2%, the distance threshold to 2 mm and the dose cutoff to 5% of the maximum dose. During the calculation of the gamma pass rate, dose outside the patient was not considered as it is clinically irrelevant. Another metric used to evaluate patient plans is the D x value. For a given region of interest (ROI), it is defined as the minimum dose that x percent of the ROIs volume is exposed to.
To judge the deflection of a single beam in a magnetic field, we introduce the center of mass (COM) of the beam. Given a lateral profile scored in N bins at locations x i with corresponding scored doses d(x i ), we define it as Frontiers in Physics | www.frontiersin.org November 2021 | Volume 9 | Article 741453 5

Dosimetric Data
Various experimental data that was previously recorded at HIT was used to evaluate MonteRay's performance in terms of dosimetric accuracy. This data, included pencil-beam depthdose distributions [71], lateral profiles of vertically scanned line profiles [75] and Spread-Out Bragg Peak (SOBP) plans [72]. Details on the measurement procedures were given in the mentioned references so only a quick overview will be given here. Pencil beam depth-dose distributions in water were recorded using a PeakFinder water column (PTW, Freiburg) with a diameter of 8.16 cm. In total, 17 Bragg curves with beam energies spanning the entire energy range available at HIT (from 48.5 to 222.6 MeV) have been measured. The measurements took place in a clinical room at HIT. The resolution was 0.05 mm in the region of the BP.
Measurements of lateral profiles of vertically scanned irradiation lines in a water phantom were obtained at three energies (81.5, 158.5 and 222.6 MeV) using an array of 24 motorized pinpoint chambers (PTW, 0.03 cm 3 ) arranged in a block of six rows and four columns. The profiles were recorded perpendicularly to the direction of the vertically scanned line. Each scanned line consisted of 101 pencil beams ranging from -50 mm to +50 mm with a 1 mm spacing. The horizontal profiles were recorded starting from about 16 mm in water to 30 mm after the BP. For each energy, profiles at 42 depths were recorded. The distance between consecutive profiles was between 0.5 and 10 mm.
Three SOBP plans centered around 5 cm, 12.5 and 20 cm in water were created using a FLUKA-based treatment planning tool. The planned dose was 1 Gy within the 3 × 3 × 3 cm target region. Delivery of the plans happened in the experimental room at HIT with measurements being done with the same block of pinpoint ionization chambers used for acquiring the lateral profiles described earlier. The profiles were recorded starting at a depth of 16 mm to approximately 20 mm after the end of the SOBP. The step size between measurements in regions of high gradient and in regions of high dose was 1 mm.

FLUKA Calculations
Due to the lack of dosimetric data in magnetic fields, the transport in magnetic fields was benchmarked by comparing MonteRay against FLUKA. For this the effect of homogenous magnetic fields, applied perpendicular to the beam's direction, was studied for field strengths of 0.5, 1.0 and 2 T. In FLUKA, magnetic fields were enabled using the MGNFIELD card with default settings. The DEFAULTS card with value PRECISIO was enabled during FLUKA simulations to ensure high precision simulations.

Patient Planning
Patient planning was performed in the clinical TPS RayStation 10 A (RaySearch Laboraries, Stockholm, Sweden) on an anonymized DICOM patient data set representative of a meningioma treatment. A proton treatment plan using a single beam at 90°was optimized for evaluation of dose calculation accuracies in a patient anatomy. The initial spot positioning (hexagonal grid with spot spacing of 3.6 mm, energy spacing of 2.1 mm) and minimum number of particles (580.000 particles) settings follow clinical practice at HIT. Optimization was made on the planning target volume (PTV, ∼112 cm³) for 49.1 Gy/ 54 GyRBE in 30 fractions using a constant radiobiological effectiveness of 1.1. The resultant energy range spanned from ∼78 to 151 MeV. The dose grid was set to 2 × 2 × 2 mm³ in RayStation with a dose uncertainty of 0.5%. The treatment plan was exported in FLUKA and MonteRay for forward calculation with and without a magnetic field. The statistical uncertainty of the MonteRay and FLUKA runs was 1%. The dose uncertainty was estimated using the batch method. Dose cubes stemming

Pristine Bragg Peaks in Water
To evaluate the accuracy of MonteRay, we first compare the simulated dose in water d MR to the dose measured at HIT d HIT for 17 quasi-monoenergetic beams. The beam energies ranged from 71.5 to 222.6 MeV. Figure 2 shows a comparison of the dose obtained with MonteRay with measured values for three exemplary energies of 71.5, 158.5 and 222.6 MeV. Due to the high resolution of the measured data (up to 0.05 mm in the BP region), the transport was performed on a Cartesian grid with 0.1 × 0.1 × 0.1 mm 3 resolution. Scoring likewise was done in 0.1 mm thick slices. To match the physical dimension of the detector, scoring was performed in a cylindrical volume with a radius of 4.08 cm. Both measurements and simulations were normalized to one at the BP. Across all the energies, the maximum, minimum and mean ΔR 80 values were 0.16, 0.06 and 0.10 mm, respectively. Once the MC calculations were shifted by ΔR 80 , d MC and d HIT were quantitively compared using the relative error ε rel . The dose threshold for calculating ε rel was set to 20% of maximum. The mean absolute ε rel over all the investigated energies was 0.56 %.
For the verification of the lateral parametrization in water, measurements of vertically scanned proton beam lines, as described in Dosimetric Data, were compared against MonteRay simulations. Lateral relative dose profiles at three energies, 81.5, 158.5 and 222.6 MeV, and at 40 different depths were compared. In Figure 3 and for each energy, lateral profiles at three depths are visualized: at the entrance (top row), in the BP region (bottom row) and in the middle of these two (middle row). The depths are reported in each panel of Figure 3. The corresponding energy is given at the top of each column. After correcting for the error in FWHM already present at the entrance due to daily variations in the beam's shape, on average, the simulated FWHM matched the experimental data's FWHM within 0.1, 0.3 and 0.5 mm for the three energies, respectively. Likewise, the FW10%M values matched to within 0.1, 0.3 and 0.9 mm.

Spread Out Bragg Peaks in Water
Next, MonteRay's simulated dose was compared with dosimetric data from SOBP plans. The measurement process was described in Dosimetric Data. The resulting depth-dose distributions are displayed in Figure 4, together with the measured values. The mean absolute relative error between measurements and predictions (excluding data in regions of high dose gradients, as performed in clinical routine) was (0.69%, 0.74%, 1.0%) with a standard deviation of (0.7%, 0.6%, 1.0%). The ΔR 80 values were 0.5, 0.3 and 0.3 mm, respectively. In the lower panels of Figure 4, lateral profiles at the entrance and at in the middle of the SOBP are shown. Here, the simulated SOBP widths matched the experimental ones on to within about 1 mm.

Magnetic Field Deflection in Homogenous Fields
To judge the accuracy of MonteRay when dealing with homogenous magnetic fields, MonteRay's simulations were first compared to FLUKA's for monoenergetic proton beams incident on water. The magnetic field was applied perpendicular to the beam's direction and four field strengths of 0 T, 0.5 T, 1 T and 2 T were compared. Planar profiles were scored with a resolution of 1 × 1 × 1 mm 3 but were afterwards integrated along 1 cm in the direction of the magnetic field axis to provide higher statistics. In Figures 5A,B, 2D  . For all tested field strengths, the gamma passing rate (as defined in Comparison Metrics) was above 99.8% In Figure 5C, lateral profiles at the BP position for the four field strengths are shown. From lateral profiles, COM, FWHM and FW%10M were computed at each depth up to the BP. The maximum differences in COMs (ΔCOM), FWHM (ΔFW 50 ) and FW10%M (ΔFW 10 ) between MonteRay and FLUKA are summarized in Table 1. For all tested field strengths and at all depths, the maximum distances between the COMs stayed below 0.15 mm, the maximum disagreements in the FWHM reached 0.21 mm while the maximum disagreements in the FW10%M reached 0.31 mm. Comparing integrated depth-dose profiles, the R 80 values between MonteRay and FLUKA were found to agree to within 0.14, 0.18, 0.10 and 0.07 mm. The maximum relative errors in dose, after correcting for these shifts, was 1.2%.

Patient Case
In Figure 6   In Figure 7A, DVHs calculated for several regions of interest (ROI) are displayed: the CTV, the brain, the brainstem and the right optical nerve. The D 2 , D 50 and D 98 values were computed for the CTV and D 2 values were computed for the organs at risk (OAR). To judge the quality of MonteRay, the relative difference in D x values between MonteRay and FLUKA is compared to those between RayStation and FLUKA. Overall, the agreement  between MonteRay and FLUKA was of the same magnitude as the agreement between RayStation and FLUKA. For the CTV, good agreement in the D 2 value of 0.25%, the D 50 value of 0.38% and the D 98 value to within 0.58% was found. For the considered OARs the computed D 2 values matched within 0.50% for the brain, within 0.44% for the brainstem and to within 0.49% for the right optical nerve.

Patient Case With a Magnetic Field
To benchmark our magnetic field implementation, the previous patient plan was reused but for the dose calculation in MonteRay and FLUKA, a homogenous magnetic field of 1 T was applied throughout the CT volume. In Figure 8, the calculated doses in FLUKA (Panel (A)) and MonteRay (Panel (B)) are displayed. With the magnetic field enabled, the gamma passing rate between MonteRay and FLUKA was found to be 98.8%.
In In Figure 7B, DVHs calculated on the same ROIs as in the previous section are shown. D x values were computed for FLUKA and MonteRay. For the CTV we found agreement in the D 2 value of 2%, in the D 50 value of 0.53% and in the D 98 value of 1.2%. For the OARs, the computed D 2 values matched within 0.76% for the brain, within 2.1% for the brainstem and within 2.3% for the right optical nerve.

Runtime Benchmarks
The performance of MonteRay was evaluated for various test cases. All tests were performed on a six-core AMD Ryzen 5,3600 processor. The transport grid's resolution was set to 2 × 2 × 2 mm 3 . This resolution is used clinically at HIT and other fast MC codes have used this resolution for benchmarking [31]. For 150 MeV monoenergetic Protons in water with a FWHM of 1 cm, a throughput of 31 k primaries per second on a single core and 180 k primaries per second when using all six cores of the CPU, was measured. Under parallel load, the throughput therefore was 30 k primaries per second per core. In comparison, the computational throughput of FLUKA on the same problem on the same hardware was 1.1 k primaries per second.
For the patient plan, benchmarks were run on a 2 × 2 × 2 mm 3 grid with 5,000 particles per pencil beam per core. In total, the plan consisted of 8313 pencil beams. On a single core, a throughput of 33 k particles per second was observed while the throughput on six core was measured to be 193 k primaries per second which corresponds to 32 k primaries per seconds per core.

DISCUSSION
The comparison of MonteRay predictions against dosimetric data and FLUKA simulations confirms that the implemented  electromagnetic and nuclear models correctly reproduce the underlying physics. In terms of depth-dose distributions for pencil beams in water (Pristine Bragg Peaks in Water), the mean absolute relative error over all 17 compared energies was 0.56%, ranging from 0.33 to 0.60% for 102.6 and 222.6 MeV protons, respectively. The depth-dependent maximum absolute relative error varied from 0.95% (48.5 MeV) to 3.4% (222.6 MeV). The latter is located at the entrance channel of the highest energy (222.6 MeV) which is typically not used for clinical purpose. This underestimation could in part be explained through the fact that the current approach for sampling the initial particles neglects secondary d, t, 3 He and 4 He particles produced in the beamline. Our predictions are in line with other fast MC engines available in literature, for example [73], using FRED have found relative differences of up to about 3% for 200 MeV protons in water.
In terms of lateral evolution as function of depth, MonteRay matched satisfactorily the experimental data in terms of FWHM/FW10%M within on average 0.1, 0.3, and 0.9 mm for low, medium and high energies. The largest difference has been found in the Bragg peak region for 222.6 MeV protons with a maximum variation of the FW10%M of 2 mm. To evaluate possible shortcomings in the scattering model, we have compared FLUKA and MonteRay predictions for 200 MeV proton beams in water without the HIT beamline. The maximum FWHM(FW10%M) variation found was   Evaluation of MonteRay on a patient plan showed good agreement against simulations performed with FLUKA. In terms of D2, D50, and D98 we achieved similar agreement to FLUKA as RayStation did. The 3D gamma pass rate was calculated to be 99.8% showing that the implemented models and approximations for electromagnetic and nuclear interactions approximate the underlying physics well, also in a clinical setting. Computed 3D gamma pass rates were in line with those obtained by other fast MC engines [28,33,73].
Similarly, we evaluated the quality of our simulation when an additional magnetic field was applied to an irradiation plan. Compared to FLUKA, we found adequate agreement in terms of D2, D50, and D98 between 0.5 and 2.3%. The 3D gamma pass rate was 98.9%, showing that a simple approximation of the Lorentz force is adequate at describing the transport of charged particles in homogenous magnetic fields.
In terms of computational throughput, MonteRay was able to simulate 31 k primaries per second for a 150 MeV proton beam incident on water, transported on a 2 × 2 × 2 mm 3 grid. Parallel execution on six cores was found to scale linearly, achieving a throughput of 180 k primaries per second. When benchmarked on a patient plan containing ∼8300 pencil beams with energies ranging from ∼78 to ∼150 MeV, we measured a throughput of 33 k particles per seconds on a single core and 193 k particles on six cores. Again, linear scaling was observed which demonstrates that reading the phase space from disk is not a bottleneck, even when multiple cores are competing for random read access.

CONCLUSION AND OUTLOOK
In this work we have presented a novel MC engine, specialized for proton therapy calculations, currently under development at HIT. Good agreement with measured data and a full-fledged MC engine (FLUKA) has been found. MonteRay achieved fast tracking rates of more than 30 k proton primaries per second at 150 MeV on a 2 × 2 × 2 mm 3 grid. In a next step, work will begin on porting our fast CPU engine onto GPUs. Following a heterogenous approach, i.e. using both CPUs and GPUs, we hope to achieve sub-minute runtimes even for large irradiation plans. A custom Monte Carlo engine will also allow us to easily implement custom features such as computing the linear energy transfer or to add imaging capabilities by producing positrons or prompt gammas.
With helium beam treatment commencing at HIT, inclusion of helium beams in MonteRay is underway with inelastic nuclear databases having already been generated.
With the aim of MR guided ion therapy, we are the first fast MC engine to include magnetic field support. In the future we will expand our evaluation to inhomogeneous fields with a focus on simulating MRIs which are being installed at HIT for the purpose of MR guided ion therapy.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
AM, AF, and PL were responsible for the conceptual design of the Monte Carlo engine. PL, BK, and AM developed the source code of the Monte Carlo engine. AM generated the materials database, including elastic and inelastic nuclear cross sections. AF developed the code for the generation of inelastic nuclear databases from FLUKA. JB generated the inelastic nuclear databases. PL, TT, and AM worked on the analysis of data. TT, AM, and SM collected and provided experimental data used in this work. TT computed and provided the patient plan and SOBP plans. JD and TH provided clinical direction during project development, manuscript writing and project administration and funding acquisition. All authors read and contributed to the preparation of the manuscript.

FUNDING
The authors acknowledge financial support through the German Federal Ministry of Education and Research (BMBF) within the project (Grant number: 13GW0436A).