Challenges in Monte Carlo Simulations as Clinical and Research Tool in Particle Therapy: A Review

The use and interest in Monte Carlo (MC) techniques in the field of medical physics have been rapidly increasing in the past years. This is the case especially in particle therapy, where accurate simulations of different physics processes in complex patient geometries are crucial for a successful patient treatment and for many related research and development activities. Thanks to the detailed implementation of physics processes in any type of material, to the capability of tracking particles in 3D, and to the possibility of including the most important radiobiological effects, MC simulations have become an essential calculation tool not only for dose calculations but also for many other purposes, like the design and commissioning of novel clinical facilities, shielding and radiation protection, the commissioning of treatment planning systems, and prediction and interpretation of data for range monitoring strategies. MC simulations are starting to be more frequently used in clinical practice, especially in the form of specialized codes oriented to dose calculations that can be performed in short time. The use of general purpose MC codes is instead more devoted to research. Despite the increased use of MC simulations for patient treatments, the existing literature suggests that there are still a number of challenges to be faced in order to increase the accuracy of MC calculations for patient treatments. The goal of this review is to discuss some of these remaining challenges. Undoubtedly, it is a work for which a multidisciplinary approach is required. Here, we try to identify some of the aspects where the community involved in applied nuclear physics, radiation biophysics, and computing development can contribute to find solutions. We have selected four specific challenges: i) the development of models in MC to describe nuclear physics interactions, ii) modeling of radiobiological processes in MC simulations, iii) developments of MC-based treatment planning tools, and iv) developments of fast MC codes. For each of them, we describe the underlying problems, present selected examples of proposed solutions, and try to give recommendations for future research.


INTRODUCTION 2 DEVELOPMENT OF NUCLEAR INTERACTION MODELS IN MC CODES
Charged hadrons of energies relevant in particle therapy (up to a few hundred MeV/nucleon) interact in tissue by electromagnetic and nuclear interactions [11]. Concerning the former, inelastic collisions with atomic electrons, resulting in ionization and atomic excitation, cause the particle to continuously lose energy along its path. Energy losses can be well described by the Bethe-Bloch formalism, predicting an increasing energy loss with decreasing particle energy. This is the main process governing the shape of the Bragg peak for charged particle beams in material. Also, discrete inelastic energy losses can occur in the form of delta-rays. Moreover, charged particles undergo numerous elastic Coulomb scatterings from the nuclei themselves (multiple Coulomb scattering, MCS), causing lateral broadening of the incoming particle beam. Modeling electromagnetic interactions is highly complicated in MC codes, but usually considered sufficiently accurate. On the contrary, hadronic physics models are still not considered completely satisfactory [12].
In this section, we try to point out the most important difficulties and challenges in nuclear physics model building, quoting the features of some of the most widely used MC codes used for particle therapy. Then, we shall summarize the possible impact of nuclear interaction modeling in dose calculation and in the evaluation of additional quantities useful for particle therapy. In particular, we shall focus the discussion on secondary particle production for range monitoring purposes. Finally, we shall present a summary of existing experimental data which are useful for tuning and benchmarking calculation models.

Main Problematics in the Development of Nuclear Interaction Models
There are many aspects of hadronic interactions that are relevant for correctly describing interactions of protons or heavier ions in the energy range relevant to particle therapy. Among them, we point out nuclear reaction cross section, elastic cross section, secondary particles, and fragment production, considering at the same time multiplicity, angle, and energy distributions. Secondary nucleons, particles, and fragments produced in nuclear reactions can considerably affect the spatial pattern of energy deposition and must be carefully taken into account. In particular, for the case of ion projectiles, nuclear fragmentation reactions are responsible for the deterioration of the physical selectivity in the longitudinal and transverse dimension, especially around the Bragg peak region. The amount of fragments produced generally increases with the mass and charge of the primary particle.
The commonly used general purpose modern MC codes make use of phenomenological models. Although a common approach is to fit existing data to predict certain quantities, in our opinion, they should be built upon reliable physical bases to have full predictive capability. In practice, this means that these models have to be built according to a "microscopic" approach, that is, starting from the fundamental properties of the nucleus and of its constituents. All relevant conservation laws have to be fulfilled, and correlations within each single interaction must be preserved. The treatment of nuclear environment and all phases following the primary fast interaction (pre-equilibrium, evaporation, fission, and de-excitation) is to be taken into account.
Models of this kind have necessarily a number of parameters that must be tuned by means of experimental data at single interaction level. In order to achieve a reliable level of predictive power, the number of these parameters should be kept minimal and their dependence on projectiles, targets, and energies should be predetermined without adapting them to the specific situation. On the one hand, this approach allows in principle to achieve a high degree of reliability. On the other hand, this might result in complex algorithms which can be demanding in terms of computing power with respect to simpler solutions, such as the parametric interpolation (or extrapolation) of existing data. It must also be pointed out that is not always possible, within a given model, to achieve the same level of accuracy at all energies or in the whole accessible phase space. Furthermore, in the therapeutic energy range, it is not always possible to rely upon a single model. Great care has to be taken in order to ensure the proper continuity in the transition from one model to another. Different MC codes have found different solutions to the above problems. In the following, we shall summarize the nuclear models adopted for a few of the most relevant MC codes presently used in particle therapy.

GEANT4
GEANT4 is used by a large number of experiments and projects in a variety of application domains, including medical physics and radiation protection. In the case of GEANT4 [13][14][15], the user has the possibility of selecting different models by specifying the so-called physics list. Reference physics lists are available in Ref. 16. In the context of particle therapy, an often recommended choice is the one called QGSP_BIC_EMY, which includes, beyond a Quark Gluon String model for the multi-GeV energy range, a Binary Cascade model. The GEANT4 Binary Cascade is a hybrid between a classical intranuclear cascade and a quantum molecular dynamic model [17][18][19] for the simulation of inelastic scattering of hadrons and light ions of intermediate energies. It is considered valid down to 200 MeV. New developments are under investigation to improve models for energies below this value, an example of which was published recently [20]. However, this study also highlighted the need for many more improvements in nuclear interaction modeling and calculation speed.
There are user-friendly interfaces to GEANT4 which are used by different groups working in particle therapy. For instance, the TOPAS (TOol for PArticle Simulation) toolkit [21,22] wraps and extends the GEANT4 Simulation Toolkit to facilitate the use of MC simulations in radiotherapy environments. It allows the user to configure pre-built components (nozzles, patient geometry, dosimetry, and imaging components) to simulate a wide variety of treatments with no required knowledge of any programming language. Another example is GATE (GEANT4 Application for Tomographic Emission) [23]. It has been developed by the OpenGate collaboration and encapsulates the GEANT4 libraries, providing a toolkit mostly oriented to nuclear medicine by easily including detailed geometry of many imaging devices. It has been tested in the context of particle therapy, especially for range monitoring applications [24][25][26].
In order to respond to the need of benchmarking the code against reference data, the GEANT4 community has recently developed a testing system, denominated G4-Med [27], which is specifically oriented to medical physics. Among the different options, it offers the possibility of benchmarking both electromagnetic and hadronic physics processes and models available in the pre-built, physics lists. A whole chapter is dedicated to the topic of hadronic models in the specific context of particle therapy. Total cross section of hadron-nucleus and nucleus--nucleus collisions have been compared to the data publicly available in the EXFOR database. Different subjects have been considered: yields, charge changing, and double differential cross sections. Tests concerning comparisons with Bragg peaks have been also reported. A detailed discussion on the quality of these benchmarks can be found in the report.
In addition to this recent work, in the context of proton therapy, we point out the work of Hall et al. [28], where GEANT4 results were compared to accurate measurement of longitudinal absolute dose profiles for 177 MeV protons at different radial distances from the beam axis, up to a radius of 10 cm. The test is sensitive to the dose envelope originating from nuclear interactions. Excellent agreement is reported over five orders of magnitude in the dose scale.

FLUKA
In the case of FLUKA [29,30], the number of available physics models is limited. The code automatically selects the appropriate model for each interaction according to the energy of projectile.
The nuclear environment, down to a few MeV, is provided by the interaction model called PEANUT (Pre-Equilibrium Approach to NUclear Thermalization) [31][32][33]. Interactions proceed along the steps of a generalized intranuclear cascade, followed by preequilibrium particle emission and by an equilibrium phase. For residual having A < 16, a Fermi breakup model is implemented [34]. In the emission of nucleons, a coalescence model is considered. The excitation energy still remaining after nuclear evaporation is dissipated by emission of γ rays. Competition of gamma ray emission with particle evaporation is considered. In the case of nucleus-nucleus collisions, for incident energies below ∼ 130 MeV/u, FLUKA makes use of the BME (Boltzmann master equation) model [35], which simulates the thermalization of a composite nucleus created in the complete or incomplete fusion of two ions. For increasing energy, FLUKA relies on the RQMD (Relativistic Quantum Molecular Dynamic) model [17,36], which can also be run in intranuclear cascade mode.

PHITS
A QMD approach has been chosen also in the PHITS (Particle and Heavy Ion Transport code System) code [37,38] for the treatment of hadron-nucleus and nucleus-nucleus collisions. Here, the JQMD (JAERI Quantum Molecular Dynamics) code is used [18]. JQMD is combined with the JAM (Jet AA Microscopic Transport Model) code [39], which implements a hadronic cascade approach, capable of dealing with hadron-hadron collisions up to center-of-mass energy s √ ∼ 100 GeV. At the end of the dynamical stage of the interaction, excited residual nuclei are treated by the GEM (Generalized Evaporation Model) model [40] to generate light particle evaporation and fission processes. Benchmarks of PHITS are reported in Ref. 41.

MCNP
MCNP (Monte Carlo NParticle), now in the version MCNP6 [42], developed at Los Alamos National Laboratories, is one of the most important general purpose three-dimensional MC codes. It is well known in nuclear physics and used for studies including criticality, shielding, and detector response, but also dosimetry and many other applications, such as medical ones. As far as hadron-nucleus and nucleus-nucleus interactions are concerned, also MCNP makes use of different models depending on energy. For incident energy below 1 GeV, of relevance for particle therapy, interactions are mostly treated by means of the INC (intranuclear cascade) approach. The MCNP6 default option for reactions induced by d, t, 3 He, 4 4He is the ISABEL INC generator, described in Ref. 43, and can be used also for nucleons and heavy ions with E < 1 GeV. It can optionally include pre-equilibrium reactions described by the MPM (multistage pre-equilibrium model) [44]. Evaporation reactions are treated with EVAP [45], while for fission, RAL [46] or HETFIS [47,48] can be chosen. A newer and improved model is CEM03.03 [49][50][51] which has its own treatment of pre-equilibrium, evaporation, and fission reactions. It considers also coalescence of nucleons into complex particles up to 4 He and Fermi breakup of excited or unstable nuclei with mass numbers up to A 12. Another recent alternative for the INC approach is INCL [52] for nucleons, d, t,

SHIELD-HIT
SHIELD-HIT, in its last version SHIELD-HIT12A [55,56], is a MC particle transport program optimized for proton and ion particle therapy. Nuclear reactions are treated within the MSDM (multistage dynamical model) generator [57]. It is composed of a fast cascade stage of nuclear reaction, which, according to projectile energy, is treated by the DCM (Dubna Cascade Model) [58] or by the QGSM (Quark-Gluon String Model) [59,60]. At the end of the cascade stage, nucleons which are close to each other in phase space can coalesce to form a complex particle. Pre-compound emission of nuclei is handled by a cascade-exciton model [61]. Subsequent equilibrium deexcitation is handled by Fermi breakup, according to the implementation described in Ref. 62. Then, evaporation/fission competition and multi-fragmentation of highly excited nuclei follow.

Impact of Nuclear Reaction Models on Dose Calculations
Nuclear interactions cause the attenuation of primary ions and build up of secondary ions, aspects that are crucial for accurate dose delivery and dosimetry [12]. Inelastic interactions are responsible for beam attenuation of the primary beam with penetration depth, and elastic interactions contribute to beam broadening. In the case of carbon ions, it was estimated with MC simulations that up to about 40% of the dose in the region between the entrance channel and the Bragg peak is delivered by fragments [63]. Thus, wrongly modeled cross sections would clearly lead to discrepancies in longitudinal and lateral dose distributions between measurements and MC simulations. A vast amount of works exist, mostly in the context of proton and carbon therapy, showing excellent agreements between MC codes and dosimetric measurements, both in terms of lateral and longitudinal dose (Bragg peak measurements). An example of a measured Bragg peak together with a GEANT4 MC simulation is shown in Figure 1. Results with analogous quality of agreement have been obtained also for other codes, including FLUKA comparisons with HIT [1], CNAO [64], and GSI [65].
From these nice agreements in dose distributions, one may conclude that we have reached a satisfactory level of accuracy of the description of nuclear reactions in MC codes for physical dose calculations. However, this is only partially true, and several important improvements in the context of dose calculations remain to be done: • For particles other than protons and carbon ions, the agreement in physical dose between MC codes and measurements is still not fully satisfactory. For instance, some significant discrepancies were found between simulations and data of spread-out Bragg peaks of Helium atoms in water [67,68]. These differences were Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 attributed to an underestimation of dose contributions from secondary particles produced at large angles. • Nuclear interactions strongly affect biological dose [12].
Secondary fragments produced during nuclear interactions lead to an altered spatial dose distribution, because different fragments have different ranges and angular distributions. Some of these fragments may hardly contribute to the physical dose, but they can lead to a modification of the LET spectra, which results in a difference of RBE for the same delivered dose. Thus, even when longitudinal and lateral physical dose measurements show excellent agreements with a MC code, this does not automatically imply that the biological dose in the patient is correctly calculated. In Section 4, we discuss the importance of biological aspects in more detail.
• Other aspects different from nuclear interactions can affect the prediction of dose distribution. For instance, in proton therapy, δ-ray electron build-up effects are important in the dose build-up region [69], and should be correctly accounted for in physical and biological dose calculations.
Summarizing, physical dose is indeed an important measure to validate the description of nuclear reaction models in MC codes, and newly developed models should always be validated with dose measurements. However, evaluating physical dose alone is not enough, as will become clear in the next paragraph.

Impact on the Calculation of Secondary Particle Production
Although dose calculation is the primary role of MC application in particle therapy, there are other aspects in which this kind of tool turns out to be essential. These are related to the capability of providing a reliable prediction of secondary particles produced in nuclear interactions. We have mentioned above that nuclear reactions influence the spectrum of fragments and how this can be crucial for the correct evaluation of both physical and biological doses. Moreover, although of scarce numerical relevance for dosimetry, production of secondary particles is relevant for various other issues, including the implementation of range monitoring techniques, neutron production, shielding, treatment facility design, or full treatment head simulations. In the following, we discuss the issues that we consider most complex: range monitoring and neutron production.

Calculation of Secondary Products for Range Monitoring Purposes
The nuclear processes that can yield secondary radiation suited for range monitoring application are three: the production of β + emitting nuclei, the de-excitation of nuclei by means of prompt photon emission, and the fragmentation of ion projectiles into fast hadrons capable of escaping out of the patient. In all cases, a MC prediction of the measured distributions is essential for their effective use for range monitoring purposes. Here, we summarize the main issues which are of interest for these processes as far as MC codes are concerned.
• Production of β + emitting nuclides. Nuclear β + decays produce positrons that annihilate with electrons, resulting in an almost back-to-back 511 keV photon pair that can be detected. The most likely β + emitting isotopes that can be formed are 10 C, 11 C, 15 O, and 13 N. Much research has been dedicated to range monitoring with PET, summarized, for instance, in various reviews [8,[70][71][72]. The capability of MC codes to provide a reliable prediction of the production of these isotopes is strongly correlated to the quality that the physics models achieve in managing the fragmentation process in general, and all the other stages occurring in the nuclei following the fast interaction. An alternative approach to full modeling, whenever validated experimental measurement of production cross sections Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 exist (as, for instance, those available in EXFOR database), is to fit them and interface them to the simulated track length distribution. In range monitoring, both approaches are being used. The general situation is that the most important general purpose codes are able to predict the production of the most abundant nuclides (like 11 C or 15 O) with a sufficient quality. The main challenge remains in the capability of predicting nuclides with short lifetimes, where also the available experimental measurements are scarce. Data that can be used to improve the accuracy of MC models for the production of short-lived isotopes for range monitoring are discussed in Section 2.5. All stages concerning transport of positrons and their decay can be considered more standard and manageable without particular difficulty. • Prompt gamma from nuclear de-excitation. Prompt gamma imaging methods are based on the detection of prompt photons that are emitted in the de-excitation phase of a nuclear interaction. The distribution of prompt gammas is correlated with the beam range. A recent review is written by Krimmer et. al. [73]. In the past years, MC developers started to pay particular attention to correctly predicting prompt gamma distributions. Also, in this case, the complexity of nonelastic nuclear reactions makes it difficult to accurately reproduce the level and shape of prompt γ emission. Of course the quality in the prediction of residual nuclei and of their excitation, as for the case of β + emitters, is important. Whenever possible, photon energies and branching ratios should be sampled according to existing databases of known nuclear levels and transitions. For example, they can be derived from data provided by the most recent release of the RIPL [74] data provided by IAEA. However, not only the level energies, but also spin and parity have an influence on photon emission. Furthermore, the shape of emission lines is subject to Doppler broadened due to nuclear recoil, and this also deserves attention in the development of the interaction model. • Production of fast charged hadrons. The principle of range monitoring in ion therapy by means of hadrons escaping the patient (mainly protons) was proposed about a decade ago [75] and since then investigated by different research groups, see Section 2.5. Fast hadron production is clearly part of nuclear fragmentation, and therefore we are again dealing with the physics models which are relevant to dose calculations. However, as discussed in Ref. 76, the most accurate result in the measurement of a proton emission distribution correlated with a longitudinal dose profile, is achieved by detecting particles emitted at large angles with respect to the beam direction. This involves a limited region of the available phase space and it is not trivial for available calculation models to reproduce with the same quality both fragmentation at small angles (the most important as far as dose is concerned) and the emission at large angles. As discussed later, there is a lack of experimental data useful for model tuning and benchmarking, and available data are limited in angle and primary energy [77,78]. Double differential cross sections are eagerly needed. More available data will be discussed in Section 2.5.

Calculation of Neutron Production and Interaction
MC codes allow to evaluate neutron production during a particle therapy treatment. The simulation of neutrons with energy exceeding ∼ 10 MeV is provided by the same models discussed so far for charged particles. The situation is different for lower energies, since neutron cross sections have a complex structure. They cannot be calculated and usually one has to rely upon evaluated nuclear data files. Different libraries are accessible (ENDF/B, JEFF, JENDL, CENDL, ROSFOND, and BROND) [79,80] and are periodically updated. By means of dedicated software codes, it is possible to obtain cross sections to be used within MC codes.
Simulations have been used to evaluate the risk of secondary cancer due to neutron exposure, especially in the comparison between passive and active scanning (see for instance [81][82][83][84][85]). At present, there is some consensus that risks deriving from neutrons are negligible with respect to those associated to the primary beam, at least as far proton therapy is concerned. The contribution of neutrons to dose is therefore normally neglected in treatment planning. However, there remain other cases in which the calculation of neutron interactions is important: • for prompt γ detection for range monitoring purposes, the evaluation of background deriving from neutron interactions is relevant. • The long-term effects of neutron production should be done with MC models that include radiobiological models [83,84]. • In Boron neutron capture therapy (BNCT) [86,87], which is based on the direct use of neutrons. It is one of the cases in which treatment planning and dosimetry are strongly based on MC. One of the main reference codes for BNCT is MCNP [42,88], which was used for treatment planning [89]. More recently also PHITS [37,38] has entered in use for this purpose [90]. A comparison of results obtained with GEANT4 for different physics list, and with FLUKA, is reported in Ref. 91. MC simulations are also widely used in BNCT to design new beams for the specific application [92,93].
For reliable neutron production models, benchmarking nuclear interaction models with neutron data are important. Fortunately, a large amount of neutron production data at an enormous energy range is already available from reactor experiments and radiation protection in space, as we will see in Section 2.5. Finally, even though neutrons might not seem very interesting in the context of charged particle therapy, we point out that neutron data are very powerful to constrain nuclear interaction models.

Data for Benchmarking and Tuning Nuclear Physics Codes
To determine the effects of secondary particles and to exploit them in range monitoring, MC models can be used to predict the yields and characteristics of the secondary particles. The accuracy Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 of the predictions ultimately relies on the ability to correctly model the relevant production cross sections. In our opinion, experimentally measured cross sections are the most valuable data that are needed to tune the models in the relevant range of ion species, energies, and target mass. In particular, single and double differential cross sections are the most interesting measurements.
A collection of measured cross sections in the energy range from 100 MeV/u up to 10 GeV/u can be found in reviews by Norbury [94], Sihver [95], and Bauhoff [96], focusing mostly on radiation protection in space. Moreover, the handbook by Heilbronn and Nakamura [97] contains lots of data for neutron production, mostly from design studies of heavy ion accelerator facilities and radiation protection in space. In the context of radiotherapy, much effort has been done in the last decades to improve the accuracy of MC codes in the energy range up to 400 MeV/u for tissue-like targets. However, although a large amount of valuable data are available for the purpose of benchmarking and tuning calculation models, at present, there still remain significant gaps. In the following, we summarize some of the most relevant measurements that have to be considered. We divide them into three categories: i) measurements of cross sections on thin targets, ii) measurements performed on thick targets, and iii) measurements specifically oriented for range monitoring purposes.

Measurements of Cross Sections on Thin Targets
These measurement concern total cross sections, partial cross sections, and single and double differential cross sections of specific processes. While total cross sections are valuable to predict primary beam attenuation, partial cross sections, and single and double differential cross sections are important to predict yields, angles, and energies of secondary particles. The usage of thin targets is most appropriate for tuning MC models, because the energy of the beam does not decrease, and the model parameters can be isolated from transport issues. A non-exhaustive selection of cross section measurements that have frequently been used for tuning nuclear models in MC simulations in the particle therapy energy range is reported in Table 1.
The majority of the cross section measurements in Table 1 are for carbon projectiles; however, the growing amount of interest in particle therapy with other projectiles has led to recent new initiatives. For instance, valuable new cross section measurements for 4 He [98,99] are displayed in Figure 2. This figure illustrates a general problem with many cross sections: there are differences between data sets, and for analytical or MC model parameterizations, it is ambiguous with which measurements the tuning should be done. New data are thus useful to resolve ambiguities between data sets.
As reported in 2010 by Böhlen [63], a tuning of the hadronic models in FLUKA and GEANT4 was based on some of the measurements in Table 1, revealing several shortcomings in both codes, in particular at lower energies. Further developments in the improvement of the nuclear models in GEANT4 were reported afterward [104,106,107], and new efforts are currently in progress [20]. PHITS, FLUKA, and MCNP6 were recently benchmarked with experimental data for neutron production cross sections [108].
Concerning protons, the work by Braunn et. al. [109], aimed at benchmarking the TALYS nuclear reaction code, contains a large number of references of total proton-nucleus cross section measurements as a function of energy in the range up to 250 MeV with tissue-like targets.
Despite the progress made over the years, Table 1 shows that double differential cross section measurements for charged fragment production are still scarce, while such measurements are the most essential for tuning nuclear reaction models. Measurements that are specifically aimed at improving the knowledge for particle therapy are planned in 2021 by the FOOT collaboration [110]. The ultimate goal of this experiment is to provide measurements of energy differential cross sections for the production of charged fragments with an accuracy of 5-10%. This would provide reference data sets for 12 C 62 C Double differential cross sections and angular distributions of secondary charged fragments up to 25°D e Napoli et al. [104] 12 C 9 5 C , C H 2 , Al, Al 2 O 3 , and Ti Double differential cross sections for secondary charged fragment production ranging from protons to carbon isotopes Dudouet et al. [77] 12 C 5 0 C , C H 2 , Al, Al 2 O 3 , Ti, and PMMA Double differential cross section for secondary charged fragment production ranging from protons to carbon isotopes Divay et al. [78] 12 C 115, 153, 221, 281, and 353 C, plastic scintillator, and PMMA Energy differential cross section at 60°and 90°of fragments with Z 1 Mattei et al. [105] Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 model benchmarking with a high level of accuracy. However, for processes or phase space regions where there is a complete lack of data, even less accurate measurement would be valuable.

Measurements on Thick Targets
These measurements include cross sections, primary beam attenuation studies, Faraday cup measurements, fragmentation yields, and emission angles. Rather than being directly useful for model tuning, these measurements allow us to assess the overall accuracy of MC codes, that is, transport, nuclear, and electromagnetic interactions together. A vast amount of measurements on thick targets have been done in the last decades; however, only a handful are useful for MC benchmarking. First, they should include a clear description of the experimental setup. Second, the physical quantities should be reported in absolute units. In Table 2, a selection of valuable measurements is presented. Of these measurements, the cross section measurements by Schall [111], Heattner [112,113], and Golvschenko [114,115] are most suitable for benchmarking MC codes at particle therapy energies, and these data were used for benchmarking FLUKA [63], GEANT4 [63], PHITS [116], and SHIELD-HIT [55,56]. Examples of studies aimed at studying emission angles and fragment yields were performed for PHITS [117], FLUKA [118], and GEANT4 [119], allowing for additional improvements in these codes. A recent experimental work of Aricò et al. [120] points out that there are differences in secondary fragment production between water and PMMA targets. Furthermore, simulations were performed using FLUKA, and some differences were found between experimental measurements and calculations. All these differences should be taken into account when dosimetric measurements are performed using PMMA instead of water phantoms. A different approach is represented by charge measurements performed by means of a multilayer Faraday Cup. For example, in the work of Rinaldi et al. [121], a FLUKA simulation of protons at 160 MeV was compared to existing experimental data. This is an integral test which allows to estimate the accuracy of MC models in reproducing the overall range of nuclear secondaries produced in target fragmentation. However, as remarked by the authors, it cannot provide any specific check of a particular reaction channel. Besides FLUKA, also other MC codes have been previously compared to the same kind of measurements. For instance, SHIELD-HIT is considered in Henkner et al. [122], MCNPX in Mascia et al. [123], and GEANT4 in Zacharatou et al. [124] and in Hall et al. [28].

Measurements That Were Performed in the Context of Range Monitoring
These measurements are those that specifically concern the production of secondaries which can be exploited for range monitoring, a topic where the reliability of nuclear models in   MC is of particular relevance. These secondaries include β + emitting nuclei, prompt gammas, and energetic secondary charged fragments. We summarize a selection of these measurements in Table 3. Again the list is non-exhaustive, and we selected only those measurements that were reported in absolute physics quantities on homogeneous targets, since we consider these as the most suitable for the benchmarking of MC models.
As far as β + emitting nuclei are concerned, a large amount of measurements is available for offline, in-room, and online monitoring. Most of them are measurements of the detection of the activity spatial distribution. However, the data which are particularly useful for MC model tuning are cross section measurements. Actually there are not yet enough cross section data available. Existing data have large uncertainties, and to achieve a range accuracy below 1 mm, more accurate cross section measurements are needed [132]. The case for which more data are available is that of 11 C production in p-12 C collisions. Figure 3, taken from the work of Matsushita et al. [133], illustrates a situation which points out that considerable systematic differences exist between different data sets. Clearly, the uncertainty of the predictions of MC models can be, at best, of the same order as the experimental uncertainty. FLUKA was compared with these cross section data, starting from the work of Sommerer [134], and the quality of results, as summarized in Figure 7 of the article of Battistoni et al. [135], can be considered satisfactory. The comparison with data for 11 C and 15 O production is shown. These are the two most relevant radionuclides in case of PET monitoring in proton therapy, but other isotopes can be important, especially in in-beam PET. Less data are available in case of 10 C production. Recently, new cross section data for 10 C, 11 C, and O 15 , very useful for future benchmark studies, were published by different groups [133,136], where the work of Horst et al [136] concerns also data relative to C-C and C-O collisions. Since recently, the use of PET monitoring technique is under consideration also in the case of ion therapy [137]. The prediction capability for shortlived β + emitters remains subject to uncertainties. A recent comparison [138] between GEANT4 for short-lived β + emitting nuclei with data [139] clearly demonstrated the need for improvement.
Regarding prompt gammas, soon after the first proposal to use prompt gamma detection for particle therapy [140], it was realized that existing MC models were not reliable, see, for instance [141]. This has led to many new developments, among which the Envision project [142], in the context of European FP7 program, dedicated to imaging in ion therapy, which also stimulated new efforts for the improvement of existing codes and the development of new ones. For instance, valuable measurements in the framework of Envision were reported by Smeets et al. [143] and Pinto et al. [144], as well as several studies aimed at improving the accuracy of MC codes like MCNPX [143], GEANT4 [145], and TOPAS [146]. The study by Dedes et al. [145] revealed that prompt gamma yields were strongly overestimated by GEANT4 [145]. Improvements in GEANT4 were reported in 2016 [147], and good results for the FLUKA code concerning the prediction of both yields and energies of prompt gammas are obtained [135].
Finally, charged fast hadrons were considered more recently for range monitoring purposes [148][149][150] in the context of ion therapy. Table 3 summarizes various measurements that are useful for MC benchmarking. From the point of view of fundamental cross sections of interest in this case, also data reported in Table 1 are very useful. However, the approach aiming at achieving more spatial precision [149] requires the use of fragments emitted at large angles (mostly protons). Simulating such processes is challenging, because only a few of the total number of charged hadron secondaries produced are emitted at large angles. In fact, the standard available biasing techniques (importance biasing and biasing of inelastic cross section) do not apply in this context, especially when a comparison with experimental data on the basis of event-byevent reconstruction is required. It would be possible to develop dedicated variance reduction techniques for this purpose. In addition there is lack of data at large production angles to benchmark production models. A recent attempt to perform this kind of measurements for 12 C interactions on different elements is reported in Ref. 105.

Future Directions in Model Development
In the last decade, many important developments of interest for particle therapy concerned the use of MC. So far, the attention has been mainly focused on the treatments with proton and 12 C beams. Therefore, also the efforts in the development of nuclear models were necessarily concentrated on the interactions of these projectiles. More recently, the attention is focusing on 4 He interactions, since this appears to be most probable application of new ion beams for therapy [168]. The case of 16 O projectiles is also a next study case. Consequently, future work to improve MC models should be oriented in the same direction.
It turns out that adopted hadronic models are in general adequate for physical dose calculations. However, it is clear that none of the models considered in the above-discussed MC codes, or others, is capable, alone, to provide reliable predictions in all clinical and research applications. Due to their intrinsic phenomenological nature, experimental data are needed to drive model development and to allow their benchmark. In this respect, our claim is that the most valuable data for this purpose are single and double differential cross sections, measured on thin target experiments. Unfortunately, in the whole range of interest for particle therapy, there is still a significant lack of measurements, mainly concerning nucleus-nucleus interactions. In the next years, new data are expected from the FOOT experiment [110]. Of course, also indirect approaches and measurements on thick targets, often involving multiple interactions and different energies, are very important for model validation.
Finally, we would like to point out that a judgment of the level of accuracy of certain model may strongly depend on the scope of use. The capability of correctly reproducing in detail the cross sections may not be necessary in all cases. For instance, a model can be accurate in reproducing the physical dose but not enough for range verification techniques. In any case, MC models require a continuous work of upgrade and development. This concerns both the improvement of physics modeling and the optimization of their algorithmic implementation. Actually, the complexity of the description of interactions can result in elevated consumption of computing time. As we shall discuss also in the next sections, this aspect remains one of the major limitations in the use of general purpose MC codes in clinical practice.

Rationale
The capability of performing a reliable calculation of physical dose is a necessary condition for an MC to be used in particle therapy. However, this is not sufficient, and the evaluation of effective biological dose is eventually required. This is a fundamental aspect which is particularly important in ion therapy. In fact, one of the most important reasons for the use of carbon or heavier ions in particle therapy is their increased relative biological effectiveness (RBE) in the Bragg peak region. A primary ion beam will give rise to a mixed radiation field with particles and nuclear fragments of different charge, energy, and LET. Each of them will have a different biological impact, even for the same imparted physical dose in the same kind of tissue. This is related to the different ionization density which the different particles will produce on length scales comparable to the size of DNA structure and cell nuclei. Many complex biological aspects are involved that may give rise to significant uncertainties in effective biological dose calculations. Thus, the study of radiobiological effects cannot be limited just to the physics of the interactions of radiation with matter, but should include the modeling of biological factors. Below, we discuss strategies that have recently been developed to take into account some biological effects in MC dose calculations in clinical and research context.

The Track Structure Approach
This review mainly concerns MC codes adopting the condensed history approach, while radiobiological calculations are most properly performed by means of track structure codes. These are codes designed to track the passage of electrons and ions simulating each individual basic interaction and recording positions and energy depositions of all produced particles. Besides the description of physical processes, radiochemistry models must be implemented in these codes to take into account the many body processes relevant for radiobiological purposes. Track structure codes are in general able to perform calculations on microscopic (nanometric) volume scales in liquid water, making their application in simulations of actual treatments highly unpractical from the point of view of computing power. However, they remain fundamental, together with mathematical models of the cell structure, for the investigation of all basic mechanisms related to biological effects of radiation. Among the examples of codes belonging to this class, we can quote PARTRAC [169], able to perform calculations on microscopic scales in liquid water, and TRAX [170,171], which can deal with different materials. Results obtainable by these codes can in principle be coupled with the radiation field simulation achievable with general purpose MC codes.
A recent approach aiming to merge the track structure approach into the framework of a general purpose MC code is the GEANT4-DNA project [172]. It was started in the context of the studies for radiation protection in space missions. The code currently includes the interactions of light particles (electrons) FIGURE 4 | Projected 2D pattern generated by a single 1 keV electron in liquid water using the GEANT4-DNA physics processes. The primary particle originates at the (0,0) position. Different colors represent different physics processes [172].
Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 and ions including hydrogen and helium isotopes down to the eV scale in liquid water. An example of the tracking capabilities of the code is shown in Figure 4, where the 2D pattern generated in liquid water by a single 1 keV electron is shown. Thanks to the geometrical modeling capabilities of GEANT4, it allows to implement the geometry of biological targets at submicrometric scales. In particular, it can use either a voxelized or an atomistic approach. The latter allows to model targets at nanometric scales, such as the DNA molecule, using the combination of standard mathematical volumes, as shown in Figure 5. A chemistry model can be coupled to simulate indirect effects of radiation due to the generation of molecular radical species.
Another attempt is being made in the context of the TOPAS [21,22] project, where an extension was developed called TOPAS-nBio [22,173], which is aimed at the modeling of detailed biological effects at the nanometer scale, facilitating and extending the use of GEANT4-DNA models for subcellular geometries, physics, and chemistry processes.

Coupling of General Purpose MC Codes With Radiobiological Models
A common approach to evaluate the RBE-weighted dose (D RBE ) in a MC simulation is to obtain the RBE by exploiting the survival probability S of cells as function of physical dose D as predicted by the linear quadratic dependence, inspired by the dual radiation model of [174] formulated as S e −αD−βD 2 . [1] Here, α and β are parameters, which depend on several variables including those of biological nature, such as the tissue type, and of physical nature, like particle type, energy, dose, and LET. The RBE factor, for a given survival level S, is then given by where α X and β X are the coefficients for photons, while α i and β i are the coefficients for the ions of interest (at a given LET). When general purpose condensed history MC codes are considered, the most common approach is to rely on a precomputed database of the coefficients α and β. They can be estimated by numerical radiobiological models, such as the local effect model (LEM) [175][176][177][178] or the microdosimetric kinetic model (MKM) [179][180][181][182][183][184][185]. Alternatively, they can be obtained from experimental data or from track structure simulations. One of the first examples of this approach within a MC can be found in the work of Kase et al. [186], using GEANT4 in the context of ion therapy activity at NIRS. In the case of FLUKA [135], a general interface is available to the user in order to provide a database in terms of α and β for different tissue types and for the different components of the radiation field as a function of energy per nucleon. As implemented also in other codes, in order to compute the biological effect, FLUKA performs the calculation of doseweighted averages α j and β j : where Δd i,j is the dose from the ith charged particle (composing the mixed radiation field) with associated α i,j and β i,j in voxel j and i runs over all particles depositing dose in voxel j. Eventually, RBE and RBE-weighted dose values can be determined for each voxel of the irradiated target knowing the absorbed dose and the doseweighted averages α j and β j As an example, we report in Figure 6 α and β (left panel) and the absorbed dose and D RBE (right panel) for a carbon ion biologically optimized spread-out Bragg peak, as available at CNAO. The SOBP was calculated for a homogeneous dose distribution of 3 Gy-RBE in a cubic shaped target of side 6 cm centered at 9 cm depth in water. The FLUKA weighting was achieved for a cell line having (α/β) X 2 Gy, as obtained from photon irradiation measurements.
Another example where this approach was applied is a newly developed radiobiological model denominated BIANCA [187,188], implemented in the form of MC simulation as well, which takes into account the development of complex DNA lesions, chromosomic aberrations, and their capability of inducing clonogenic cell death. In Ref. 188, the coupling of BIANCA with the FLUKA MC is reported. This work shows a comparison of survival data of CHO cells after an irradiation  Figure 7. Different RBE models can produce different results, and this can result in different dose prescriptions. This aspect has been studied in detail for carbon ion therapy at CNAO with the help of MC calculation. The starting point is that it was considered to take as reference the clinical protocols assessed in the past in the NIRS Japanese center. At NIRS the MKM radiobiological model was used, while, for several years, the treatment planning system of CNAO, as in the rest of Europe, was based on the LEM-I model. Since 2012, Fossati et al. [190] showed that for the same D RBE , significant variations of the physical dose D in the target volume, up to 15-20%, were found when comparing the two approaches. In order to minimize target physical dose variations, and possible consequent risks of undercoverage of target, prescription dose conversion factors, as suggested in Ref. 190 in a study with water phantoms, were validated for a series of patient cases by Molinelli et al. [191] by means of simulations performed with the FLUKA MC, where the alpha, beta radiobiological parameters from LEM-I were used according to the procedure described in this section. Further studies using MC simulations are reported in the work Magro et al. [192]. A Matlab-based tool was developed to generate a biological database, that is, a set of input tables of some model-specific parameters for a variety of particles, based on the MKM mode. This database was benchmarked with published ICRU energy loss tables. Then, using this database together with the information about the mixed-radiation field (particle type, energy, etc), FLUKA can calculate the RBE-weighted dose of the mixed-radiation field in each voxel. To clinically benchmark the coupling of FLUKA with the NIRS approach, a few real patient treatments were simulated, corresponding to different prescripted dose levels. The simulation results (physical dose, effective dose, and RBE) were compared to the results obtained by means of the TPS adopted at NIRS. Some discrepancies were found, but the general level of agreement was considered satisfactory. A similar investigation has been more recently performed for 4 He ions, as reported in the work of Mein et al. [193] using both MC and an analytical calculation platform.

Semi-Analytical Approaches
The dose average linear energy transfer (LET D ) is frequently used as physical quantity to describe the biological effectiveness of the mixed radiation field. LET D is the dose-weighted mean value of the particle LET distribution at depth z in the radiation field consisting of dose contribution D i from all particle species i and is defined as The quantities LET i (E) and ϕ i (E, z) can be evaluated by means of MC codes so that LET D can be calculated by means of Eq. 4. This can be exploited to evaluate RBE in the case of proton therapy, where it is possible to make use of phenomenological models which predict proton RBE as a function of LET, dose, and the (α, β) tissue-specific parameters for photons. The most common models of this type, derived from the analysis of proton RBE experimental data, are those of Wedenberg [194], Wilkens [195], McNamara [196], and Carabe [197]. Dependence of RBE on LET can be fitted in different ways. For example, in the Wedenberg model [194], RBE is given by where q is a parameter obtained by experimental data, different for each cell line. Instead, the RBE expressions of McNamara [196] is given by where p 0 0.999064, p 1 0.35605 Gy keV −1 μm, p 2 1.1012, and p 3 0.0038703 Gy −0.5 keV μm. These kind of approaches can be useful in proton therapy to take into account variations in RBE, as alternative to the common assumption in clinical treatments of a constant RBE value of 1.1. In principle, it is possible to introduce these parameterizations also in ion therapy; however, it is usually preferred to make use of more sophisticated phenomenological or mechanistic models.

Future Directions and Developments
Considering the present state of the art, the possibility of making use of track structure simulation within a MC calculation of particle therapy treatments seems to be confined, for the moment, to research activities. In fact, the exceptionally high computing time which is required by such an approach can discourage to adopt this solution for a systematic study of clinical cases, and treatment planning in particular, even when accelerating techniques or large clusters are employed. It can be considered sometimes for retrospective analyses of clinical data. In general, the use of an interface to pre-calculated tables from external radiobiological models will remain the most practical solution for MC simulations.
Future developments, useful for biologically oriented treatment planning, need to consider not only RBE but also other important radiobiological quantities. One of the most important parameters is the oxygen enhancement ratio (OER) and its dependence on physical variables. This parameter is particularly interesting in view of the use of high-LET radiation for the treatment of hypoxic tumors, as discussed, for instance, in the works of Scifoni and Sokol et al. [198][199][200].
The development of new generation treatment planning is also considering the radiobiological effect of target fragmentation in proton therapy. As proposed in Ref. 201, this phenomenon may contribute to an increase of RBE especially in the entrance channel, and neglecting this contribution can bring to an underestimation of damage to the healthy tissues. The investigation of this particular aspect, usually not considered so far, requires both experimental activity and MC simulations. This is also a theme where the importance of modeling nuclear interactions, as discussed in Section 2, is emerging [202,203].
Other topics in which MC simulation can be important in the context of radiobiology is the comprehension of possible mechanisms leading to the enhancement of radiobiological effectiveness. An attempt in this direction is the recent suggestion concerning the additions of specific radioisotopes (like 11 B or 19 F) to exploit nuclear reactions triggered by protons on these nuclei. These reactions can generate shortrange high-LET alpha particles inside the tumors, thereby allowing a highly localized DNA-damaging action [204]. Another interesting challenge is coming from the possible extension of the FLASH Radiotherapy approach [205,206] to particle therapy, that is, the delivery of very high dose rates ( ≥ 100 Gy/s). It has been suggested that such high dose rates can enhance differential effects between normal tissue behavior with respect to the tumor. From the point of view of MC codes, no changes are expected for the physics of interactions of radiation in matter, while specific radiobiological models are still under investigation and development.

Introduction
Over the past years, the superior accuracy of MC calculations with respect to analytical calculations has been confirmed in many studies. This is thanks to the more accurate implementation of physical (and possibly radiobiological) processes, the fact that the material composition is included in calculations, and the precise tracking algorithms. At present, MC dose calculations are therefore successfully used for various purposes in treatment planning and related research. Although the applications of MC in treatment planning are mostly in forward dose calculations (to compute the dose distribution in a tissue given a treatment plan), new developments are ongoing to include them in inverse dose calculations (to find a treatment plan whose execution will achieve a desired dose distribution).
In the next paragraphs, we shall first summarize some considerations on the role of MC in the present use of MC in treatment planning systems. Then, we shall present an example of developments in MC-based treatment planning systems, and we finish with some consideration about the underlying problematics and developments.

Present Usage of MC in Treatment Planning
MC simulations play an important role in the development and commissioning of treatment planning systems. Different treatment planning systems exist for particle therapy. A discussion of these tools goes beyond the purpose of the present review, but a few general issues can be mentioned. For dose delivery with spot spanning, commercial treatment planning systems are usually based on fast analytic dose engines using pencil beam algorithms. Such algorithms are often developed with the help of MC dose calculations, since it is not possible to rely completely upon analytic expressions or available tabulations. Many treatment planning systems require that the user provides specific measurements and MC simulations of pencil beams in water as a part of the TPS commissioning process for a given facility.
Moreover, MC simulations are currently often used for verifying the treatment plans of commercial treatment planning systems for complex patient dose calculations. Such verifications are often desired in situations which are characterized by large density heterogeneities (see, e.g., [1][2][3]), at least for a limited number of patient cases. The TPS prescription is used as input to the MC, which performs a forward dose calculation, that can be compared with the analytical result.
The clinical interest in high-accuracy MC tools is demonstrated by the fact that vendors of commercial treatment planning systems have since recently started to provide the user with MC-based dose calculation kernels for proton therapy, albeit only for forward calculations: • The RayStation ® [207] TPS provides the user the possibility to perform forward MC dose calculations for protons, based on a condensed history MC for primary and secondary protons, while heavier secondaries are considered only using a continuous slowing down approximation [208]. This MC dose engine has been validated also in clinical context [209]. Here, the authors demonstrated the superior accuracy of the MC dose engine with respect to the analytical algorithm in specific situations: in the presence of range shifters, large air gaps, or when beam directions are tangential to the patient surface. This may also depend on patient anatomy [2,3]. RayStation also includes a tool allowing for an easy comparison between the MC and the analytical result, so discrepancies can be easily identified.
• The Eclipse treatment planning system provides users a MC dose algorithm for proton therapy, AcurosPT [210]. Here, computational run time is minimized by simplifying and eliminating less significant physics processes. The algorithm was benchmarked with TOPAS.

Recent New Developments in MC-Based TPS
There are several promising new developments in the usage of MC calculations in treatment planning. For instance, in the context of ion therapy, new developments are ongoing in the TRiP98 (treatment planning for particles) [211,212] tool. This TPS was created as analytical dose calculation and optimization tool and used clinically for carbon ion radiotherapy at GSI until 2008, taking advantage from the development of the LEM radiobiological model carried on at GSI [175-178, 213, 214].
In the clinical applications of TRiP98, the LEM-I version was adopted. It also served as prototype for the clinical TPS Syngo® RT Planning Software [215]. The original TRiP98 used a 1dimensional deterministic transport algorithm, and the broadening of the ion beam as a function of depth was accounted for by means of a double Gaussian function. In order to overcome this kind of limitations, a fast MC algorithm was developed within TRiP98, as reported in the article of Iancu et al. [216]. This dose kernel could potentially also be used for plan optimization. Also, in the development of ion treatment planning at HIMAC, MC has played a role. As reported in a dedicated textbook [217], GEANT4 was used to build the prediction of ion species and their kinetic energies of the therapeutic carbonion beam and to predict the dose mean-specific energy (z*) which is necessary for their microdosimetric radiobiological model.
A recent example of MC dose calculation tool for both proton and heavy ion therapy is the work of Russo et al. [218,219]. It describes a TPS computing kernel for different ion types, called PlanKIT (Planning Kernel for Ion Therapy), developed within a collaboration between IBA and INFN, currently only used for research. This TPS is based on the use of a "Beamlet Superposition (BS) model." The concept, similar to what is adopted in other projects, is that an ion beam can be thought of as composed of subunits, here called Beamlet, that are obtained splitting the beam phase space in smaller phase spaces. The total irradiation result is obtained by summing the interactions of the different beamlets, computed with FLUKA MC simulations. In this way, it is possible to produce universal Look-Up tables of physical and radiobiological quantities, like dose, track-averaged, and dose-averaged LET and, by coupling to the MKM radiobiological model, α and β coefficients to derive RBE according to the linear quadratic model. The main advantages deriving from this approach were two: i) the possibility to simulate the irradiation with different ions species and ii) the Considering the recognized quality of results in dose calculation which can be obtained by means of simulation, it is natural to consider also the possibility of coupling MC to an optimization code for inverse dose calculations, so that MC accuracy would be assured for all stages in treatment planning, from development to clinical use. This would have a few important advantages: • It would allow to include the most accurate description of physics, geometry, beam models, and materials during all stages of treatment planning, including plan optimization, that is, the calculation process to obtain all beam weights by optimizing the dose in terms of target coverage and OAR dose. • It would allow to consider the actual composition of tissues.
In fact, TPS calculations are usually performed just considering a water equivalent path length environment. For electromagnetic processes, this is generally not a problem, but for processes involving nuclear interactions, it may lead to errors, especially in calculations for high density regions in the patient (bone and implants), or in modeling treatment heads and beam shape devices. • It would be possible to simultaneously predict dose and secondary particle production, useful for treatment verification techniques.
One of the first attempts to proceed in this direction has been reported in the work of Mairani et al. [220], which describes the development of a MC treatment planning (MCTP tool) for proton therapy based on the FLUKA code. The workflow of this solution is illustrated in Figure 8.
The procedure requires as input a set of pre-optimized pencil beams P 1 (N 1 ). Here, N 1 is the initial pre-optimized guess for the fluences that is obtained with a commercial TPS or fast MC, starting from all the possible beams P 0 available in a given therapy facility in terms of energy values and positions. Then, the MC-optimized solution, P 2 (N 2 ), is obtained iteratively by an optimizer algorithm which runs over a MC calculated dose kernel, that is, 2-dimensional matrix of d i,j values (dose in voxel j of patient geometry deposited by pencil beam i). Hereby, an RBE-weighed dose kernel is achieved by either using a fixed RBE value of 1.1 or using pre-computed tables of radiobiological coefficients, as described in Section 3. An optimizer code, taking as input the MC dose kernel, produces the treatment plan by minimizing a cost function which takes into account the prescribed dose to the planned target volume (PTV) and a set of dose limits for organs at risk (OAR). This approach has been initially validated at CNAO by comparing the results to those achieved by the standard TPS adopted in the facility. Figure 9 reports the results of one of those comparisons.
The project was soon extended to ion therapy, as described in the work by Böhlen et al. [221]. Since then, it has been used mainly for research purposes. For instance, we can quote two examples: i) the investigation of the robustness of ion beam therapy treatment plans with respect to uncertainties in biological treatment [222] and ii) the study and validation of treatments with different ion species, including 4 He and 16 O (also protons and C), by comparing results with dosimetric measurements, as performed at the Heidelberg Ion Therapy Center [223].

Developments and Perspectives
As mentioned in the conclusions of Section 3, work is ongoing in view of a new generation of TPS considering new features, as other radiobiological parameters and target fragmentation, and the possible use of new ions other than 12 C. In all these cases, the use of MC will undoubtedly remain important both for development and check. A new case study where simulation can play a fundamental role (provided that the coupling with radiobiological modeling is properly implemented) is the optimization calculations for multiple ion treatments, as described by Sokol et al. [200]. Here, the combination of 4 He and 16 O beams has been considered as advantageous to achieve the "kill painting" approach [198,224]. The first studies have been performed by means of the multi-ion biological optimization (MIBO) version of TRiP98 where specific algorithms to consider OER were added.
The MC approach discussed in Section 4.3 to treatment planning is surely of great interest and is potentially an innovative breakthrough. A few issues are still hampering the application in clinical practice. The most obvious difficulty is the large computing power that is required by full MC calculations, several hours for a multi-field proton therapy treatment plan [220]. From a clinical point of view, treatment planning with full MC can become attractive only if the required time is comparable to that employed using the standard approach, even if the quality of physics can be superior. Of course it is possible to reduce computation time by running parallel independent histories on a cluster of CPUs. However, the use of a very large cluster is not straightforward for all therapy centers. Instead, a more promising road is the development of fast MC techniques. This will be reviewed in Section 5. As we shall see, new attempts to develop MC treatment planning environments are under way [225,226].
Finally, higher computation speed is not the unique requirement. In order to facilitate the use of MC in treatment planning in a clinical environment, it is important to develop the necessary software interfaces for the integration with TPS and image processing.

DEVELOPMENT OF FAST MC CODES
As mentioned previously, a major issue hindering widespread clinical implementation of MC simulations is computational efficiency. The complexity in the implementation of physics models and transport algorithms is very demanding in terms of computing power. Considering that a normal MC simulation Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 of a treatment often requires transporting a large number of particles, usually large computer clusters have to be used. For this reason, in the last years, the interest for the development of fast simulation techniques has grown considerably. These attempts make either use of new technologies in computing hardware, or they adopt dedicated algorithmic solutions. Since it is often sufficient to have a code fully dedicated to dose calculation, a significant contribution to the speedup is also obtained by simplifying the structure of the code with respect to a general purpose MC, focusing only on the relevant processes. In this Section, we highlight a few examples of fast MC code development.

Usage of Graphics Processing Units (GPU)
The progress in the use of graphics processing units (GPU) allowed a new effective direction in the search for higher computation speed. This approach, pushed by computed graphics, has allowed for the development of techniques for general purpose computing exploiting the high degree of parallel operation which characterize these hardware units. This has brought to the approach denominated "General Purpose computing with Graphics Processing Units" (GPGPU). MC is one of those computing cases which can profit from the high degree of parallelism allowed by this technology, since events may be processed in many different cores at the same time. A comprehensive review about advantages and challenges about GPU proton dose calculations (analytical and MC) was written by Jia et al. [10]. The GPGPU architectures have evolved in time, and since about 2007 commercial solutions are available with full possibility of programming. Probably, the most diffused and interesting hardware units are those produced by NVIDIA® through the CUDA® (Compute Unified Device Architecture) platform [227]. CUDA® is a development environment which allows the writing of applications by means of the extension of diffused programming languages, like C/C++, overcoming the difficulties of assembly languages. CUDA® compilers for other languages are also commercially available. Programs exploiting the GPGPU architecture can be also written using the OpenCL (Open Computing Language) software libraries. MC algorithms  Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 must in general be rewritten in order to efficiently exploit GPU features, taking also into account possible limitations and bottlenecks, as those coming from the interactions between CPU and GPU memories. Limits such as the size of global and shared memory, maximum number of threads per block, and number of stream multiprocessors are GPU dependent. In the case of MC simulations, there exist some limitations to the effective number of parallel threads in a GPU. The large number of cores (typically thousands) cannot, in practice, be totally exploited at all times. The random structure in MC algorithms generates the so-called thread divergence, which limits the actual number of cores which effectively run in parallel. However, the achievable gain factor remains in any case significant.
A recent example of a MC simulation framework exploiting the GPU approach for pencil beam dose calculations is the FRED (Fast paRticle thErapy Dose evaluator) [225] code, developed in the context of treatment planning for proton therapy. Using effective models for particle-medium electromagnetic and nuclear interactions, it allows for tracking and scoring of energy deposition of primary and secondary particles, also in a voxel geometry imported from CT scans. Single pencil-beam dose simulations have been validated. A merit of this framework is that, since the beginning of its development, it already contained all the optimization tools for full treatment plan MC-based calculations. Moreover, not only the use of the water path length approximation is possible, but also simulating realistic materials. The tracing kernel can achieve an event processing rate of 10 million primary/s on a single GPU card. This performance allows to recalculate a treatment plan at 1% of the total particles in a few minutes.
As an example of the quality achievable with this tool, in Figure 10, we show the dose comparison between measurements, FRED, and the commercial TPS used at CNAO, for a spread-out Bragg Peak (SOBP) corresponding to a uniform dose irradiation in a 6-cm sided cube volume at 15 cm of depth in water. Panels (a) and (c) show the longitudinal dose profiles, while panels (b) and (d) show the transversal profiles.
Preclinical application of FRED is presently being studied in Kracow, also in conjunction with the development of plasticscintillator-based PET detectors for particle therapy delivery monitoring [228]. Further development of FRED for carbon ion treatments is in progress.
Moving to an example in a more clinical context, complete 4-D patient dose calculations in proton therapy including respiratory motion can be ran on GPUs, as reported in Ref. [229]. The GPU calculations include the distribution of the treatment plan according to breathing phase, calculations, and dose accumulation from the various breathing phases. Proton transport simulations were done according to the track-repeating algorithm (described below). Simplified approaches for electromagnetic and nuclear physics processes were used. Validation of the GPU dose calculations was done against GEANT4/TOPAS in terms of secondary particle yields, energy and angular distributions, and for treatment plans [230]. The calculations are fast enough to have allowed for the development of a GPU-based TPS [231]. A GPU-based proton MC tool (GPUMCD) has been developed by Maneval et al. [232]. They introduced a rigorous formalism for energy loss calculation at the typical proton energies used in therapy. The purpose was to accelerate the MC dose calculations by allowing larger steps while preserving accuracy. A lookup table linking the fractional energy loss per step length, ϵ, to the equivalent restricted stopping power L eq was created. The mean energy loss for any step length was simply defined as the product of the step length with L eq . The proton CSDA (continuous slowing-down approximation) was modeled with the L eq formalism and added to GPUMCD. GEANT4 had been used as reference for the validation. This formalism was found to lead to an intrinsic efficiency gain factor ranging between 30-630, increasing with the prescribed accuracy of simulations. It can be considered a promising variance reduction technique for computing proton dose distributions. Combined with GPU acceleration, the total acceleration provided by the L eq formalism with respect to CPU-based GEANT4 simulations was found to be the order of 10 5 .
Still in proton therapy, a calculation package for proton dose calculations, gPMC (GPU Proton Monte Carlo), was developed [233]. This framework aims at improving speed without applying simplified approaches. Protons are transported according to a class II condensed history scheme and continuous slowing down. Production of delta-electrons is considered, but their kinetic energy is deposited locally. Regarding nuclear interactions, the empirical strategy developed in Ref. 234 is adopted (described in more detail below). Only proton-proton elastic reactions, proton-oxygen elastic reactions, and inelastic interactions were included. The framework was validated using TOPAS/GEANT4 and performing the gamma passing rate analysis. It can be noted that the authors of this work explicitly mention some technical problems, typically encountered in the interaction between GPUs and CPU, and their solution. Since then, developers of GPUbased MC codes have progressively learned how to manage and overcome these difficulties.
For carbon therapy, recently a framework called goCMC (GPU OpenCL Carbon Monte Carlo) was developed, simulating particle transport in voxelized geometry [235]. The package operates using the OpenCL framework. Electromagnetic processes are modeled by the standard class II condensed history scheme with a continuous slowing down approach. Regarding nuclear interactions, only 4 (inelastic) reactions are included, and secondary particle production is simulated using pre-computed tables, containing particle yields, energy, and scattering angle probabilities. The disadvantage of this method is that energy and momentum conservation is not assured event-by-event, as in full MC simulations. The method was validated for mono-energetic mono-directional beams against GEANT4 and data, using the gamma analysis and dosimetric quantities. A new development, oriented to provide a real MC-based treatment planning is presented in Ref. [226]. An example of results is given in Figure 11, for three patient cases (prostate, pancreas, and brain) treated with 12 C ions. The total time needed, including MC simulation, optimization, and final dose calculation, ranged from about 800 to 6,400 s, depending also on the adopted hardware.
Recently, a GPU simulation framework called FRoG (Fast Recalculation on GPU) was developed allowing to simulate protons, carbon, helium, and oxygen ions [236]. Although Frog is reported to have MC accuracy, the framework used an analytical dose calculator.

Usage of Phase-Space Files
Rather than fully modeling the beam line and treatment head, a fixed machine output for a given installation can be used for treatment simulations. In that case, "phase-space files" can be produced for a given treatment head, containing the physical properties (energy, direction) of the primary particles that exit a given treatment head. This technique is common in conventional radiotherapy. In particle therapy with passive scattering techniques, usage of phase space files is typically not possible because the treatment head varies. However, even for active scanning techniques applying this method is not straightforward, because of facility dependence of the characteristics of outgoing particles, for instance, due to differences in the beam line devices or differences in operational beam parameters like beam size. In that case, using "optimized" phase space files could be an option [237][238][239], in which the parameters are somehow tuned until the simulations match the measurements (typically longitudinal and lateral dose profiles) at a certain treatment site. This method was recently applied at the Shanghai Proton and Heavy Ion Center [239].

Track-Repeating Algorithms
The track-repeating approach, originally proposed by Li et al. [240] for protons, is based on the idea to use pre-generated events during simulation to accelerate dose simulations. First, a full MC simulation is performed to generate events of protons interacting in water, with various initial beam energies. These events, including particle trajectories with path length, scattering angles, energy losses, and deposits in water for each step, are stored in a large database. All the physics interactions (in water), including secondary particles, are thus calculated by a general purpose MC code. Then, during particle transport in a given medium, the particle track length in a given material is adjusted to that of water, using the local density. For materials such as bone, corrections can be included by considering the different stopping power. Angles are adjusted according to the direction of the incident particle. At each step, appropriate tracks are selected from this pre-generated database, which are tracked further. This so-called track-repeating approach is different from the normal MC simulation, where physics interactions are sampled on the fly at each step. The track-repeating approach was clinically validated [241] against data and GEANT4, using the gamma index analysis and dosimetric quantities (DVHs) and is in use at the Shanghai particle therapy facility [238]. Figure 12 shows a comparison between the fast dose calculator (CDC) described by Yepes et al. [241].
A GPU implementation was also performed by the authors [242]. The usage of pre-generated tracks from a database was also adopted by [243], reusing ideas from available codes from conventional radiotherapy. The advantage of their approach is that they use pre-generated events in many different materials, rather than only in water. They used MCNPX for the generation of the pre-calculated tracks, as well as for the validation of the code. The latter was based on depth-dose curves along the central axis and lateral dose distributions.
A possible shortcoming of the track repeating approach, in the context of particle therapy, concerns nuclear interactions. While there are assessed methods to perform scaling of stopping power for different materials and densities with respect to water, there may be a substantial dependence of nuclear cross sections on the actual elemental composition. It is therefore questionable whether using events based on the physics interactions in water is sufficiently accurate. Regarding the calculation of dose, which is dominated by electromagnetic interactions, the accuracy is probably sufficient. In those cases where an accurate description of nuclear interaction is desired, as those we have discussed in Sections 2.3 and 2.4, this could not be the case. This problem could probably be overcome by pre-generating full events also in other materials.

Voxel Monte Carlo Algorithms
Based on ideas from traditional radiotherapy algorithms, a Voxel MC code for proton (VMCpro) therapy was developed [234]. Here, protons are step-by-step traced through a voxelized volume (typically a CT scan). One step is equal to the distance between the voxel boundaries, unless a discrete interaction took place within the voxel. The geometrical step length is calculated from the geometrical step length in water according to the ratio of densities and the ratio of the stopping power in a given voxel density to that in water, using an empirical fit formula. A multiple scattering angle is sampled at the new position, and the proton is rotated accordingly. Nuclear interactions are modeled as corrections to electromagnetic processes, treating all soft tissues as water. In the quoted article, the authors claim that they succeeded to achieve, at that time, a computing speed higher FIGURE 11 | Example of TPS calculation with the GPU framework described in Ref. 226. First and second rows are biological dose and physical dose, overlaid on CT images, in three patient cases. From left to right: prostate, pancreatic, and brain cancers. The color bar indicates the dose levels relative to the prescription dose. Arrows indicate beam directions. The bottom row is the corresponding dose-volume histograms (DVHs). Solid lines and dashed lines denote biological dose and physical dose, respectively. The OTV represents the optimization target volume, a similar concept to CTV.
Frontiers in Physics | www.frontiersin.org November 2020 | Volume 8 | Article 567800 by a factor of 13 with respect to FLUKA and of 35 with respect to GEANT4.

Macro Monte Carlo Simulations
In the work of Fix e al. [244], a method inherited from electron dose calculations is adapted to proton therapy: Macro Monte Carlo simulations (MMC). In this method, full MC simulations (in this case with GEANT4) of protons in different spheres of different materials with many energies are performed, and exiting particles with all their characteristics are stored (macro-steps). The dose distribution in a voxelized grid can be obtained by a sequence of macro steps. The method was validated by comparing depth-dose curves and lateral dose curves with GEANT4.

Other Approaches
An example of a dedicated in-house proton MC code developed at PSI for assessing possible deficiencies of analytical dose calculations in inhomogeneous patient tissue is described in Ref. [245]. Here, a slab-by-slab approach is adopted for proton transport, calculating energy loss and taking into account multiple Coulomb scattering in the voxel. However, nuclear interactions are simulated analytically, with no tracking of secondary particles. The code was tested on a variety of patient plans. Yet another approach of speeding up MC simulations for protons is with simplified Monte Carlo (SMC) simulations [246]. The approach is based on an effective model to take into account all physics processes relevant to dose calculations, using measured depth-dose curves in water. At each voxel, two effects are modeled. First, the residual range of protons is decreased according to the local material properties and corresponding energy loss. Second, MCS is modeled according to a Gaussian distribution. Thus, rather than sampling detailed interaction processes step by step, they capture all net effects (including nuclear interactions) in terms of dose deposition in each voxel. The code was also implemented on a GPU platform.

GPU-Based MC for Radiobiological Calculations
The use of GPU allowed to develop efficient MC applications specifically oriented to radiobiological purposes. As discussed in Section 3, the quantitative evaluation of radiation damage in biological cells, such as DNA strand breaks and base damage, in general take a considerably long execution time. This is particularly relevant for track structure computations. We report here three interesting examples. In the work of Kalantzis et al. [247], the authors present an implementation of a MC code for microdosimetric calculations of low energy electrons and protons tracks on a GPU under the above quoted CUDA® platform developed by NVIDIA®. Performance and accuracy have been tested on a commercially available general purpose GPU. They also developed a hybrid implementation employing OpenMP (an application program interface (API) used to explicitly direct multi-threaded, shared memory parallelism among CPU cores) and CUDA® in order to demonstrate the potential of utilizing simultaneously a multi-core CPU and a GPU for further acceleration of the MC simulations.
Another interesting example is reported by Okada et al. [248], where a new MC simulator named MPEXS-DNA is described, allowing high computing performance by using a GPU. The code has been developed for track structure and radiolysis simulations at the subcellular scale. Physics and chemical processes are based on GEANT4-DNA processes available in GEANT4 version 10.02 p03. The validation was performed by comparison with existing experimental data and simulation results obtained by other simulation codes, including PARTRAC [169]. By using NVIDIA® GPU devices adopting the Volta architecture, MPEXS-DNA has achieved speedup factors up to 2,900 against GEANT4-DNA simulations performed with a single CPU core.
Finally, Tsai et al. [249] present a GPU-based fast microscopic MC simulation package, gMicroMC. They also built a human lymphocyte nucleus DNA model and implemented a DNA damage calculation method to compute single-and doublestrand breaks of different complexities. The comparison of the simulation results with those generated by GEANT4-DNA gives good agreement. They achieved speedup factors of ∼ 540 times for the entire simulation process, as compared to the computations by GEANT4-DNA executed on a CPU using a single thread.

Role of Full MC Frameworks
From the amount of literature written about fast simulation frameworks, it is evident that there is a large interest in speeding up calculation codes, as this brings us a step closer to application in a clinical environment. At the same time, we have seen that all fast codes have validated their calculations also by means of comparison with general purpose MC codes. Thus, we can conclude that codes like MCNPX, GEANT, or FLUKA remain important role also for the development of new calculation methods to be adopted in particle therapy. Summarizing, we can conclude that in the context of fast MC developing, there are two main tasks for full general purpose codes: • To generate realistic pre-generated data bases for trackrepeating approaches. • To serve as validation for accelerated simulation codes.
In the case of phase space file and track-repeating algorithms, attention has to be paid to possible artifacts. The size of precalculated samples need to be carefully considered. In fact, when using standard sampling techniques, the very long period in the cycling of random number generation algorithms allows a totally safe margin against the possible repetition of the same events. Instead, when sampling from a file of pre-generated events, which is necessarily limited in size, more attention has to be paid if the same configuration has to be used more than once. In this case, there is the risk of generating fake structures and correlations in the distribution of scored variables.
In addition it has to be remarked that for specific applications going beyond dose calculations, like, for instance, the case of range monitoring, general purpose MC codes still remain the most appropriate tool.

DISCUSSION
In this review, we have discussed some of the ongoing research and challenges that remain to be faced in order to improve MC simulations for particle therapy so as to facilitate their use in clinics and in research. One of the challenges is the improvement of nuclear interaction models. In particular, there is still the need of more new experimental data sets of double differential cross section measurements for radiotherapeutic beams and tissue like targets, since this is essential for benchmarking and tuning nuclear interaction models. Although the calculation of physical dose has already reached in most cases a satisfactory level of accuracy, improved models can lead to more accurate biological dose calculations and improved range monitoring methods, not only for carbon and protons but also for new ions like helium and oxygen. New experimental activities designed for this purpose are at present in progress [110].
Another important challenge for MC development concerns the reliability of modeling radiobiological factors in simulations. The coupling with radiobiological models, or databases, may allow to evaluate the onset of initial DNA damages, but the task of treating all later processes governed by radiochemical processes is out of scope, at least in general purpose MC codes. The uncertainties in this context are particularly difficult to mitigate. The experimental radiobiological measurements, which provide results that are the most useful references for RBE evaluation, still exhibit important uncertainties, as demonstrated in the data base of the particle irradiation data ensemble (PIDE) project [250].
Regarding MC-based treatment planning, many interesting developments are ongoing, and MC-based biological dose calculations are entering more and more in the clinical workflow for all patients. Fast MC codes based on GPU-based calculations or special algorithms are very valuable to come a step closer toward MC-based biological treatment planning. However, such fast MC calculations should always be carefully validated with full MC codes, which then maintain their important role.
There are certainly other topics which are still relevant to the development of MC simulations in particle therapy that we have not considered in this review. For instance, we have not discussed the possible benefits related to improving the description of patient composition in MC simulations, through new imaging techniques. Currently, MC simulations of all processes in patients are based on CT scans. When hadronic interactions are relevant, the actual nuclear composition of tissues has a relevance. Parameterization as a function of CT Hounsfield numbers, like the one by Schneider et al. [251], provides mean values. Improvements could derive by the adoption of imaging techniques with a higher resolving power in distinguishing material differences, like Dual Energy computed tomography (DECT) or multi-energy computed tomography (MECT). Research and development on new approaches like proton-CT can be also an alternative solution. Also, much research is ongoing to how to simulate time-dependent geometries (4D-MC simulations) to include motion effects in dose calculations, which was not treated in this review. There are also some arguments which are related to quality control, dosimetry, and microdosimetry which surely benefit from the use of MC. New developments in this direction are ongoing [252]. Last, but not least, we have not entered in the discussion about the importance of easy user interfaces for MC codes. Several interesting developments in this context are ongoing [21,22,253]. Examples on this subject, in view of clinical application of MC in proton therapy, have been recently published [254][255][256]. The spread of MC tools in clinical practice can be also facilitated providing nonexpert users with clear tutorials and easy installation procedures.
Despite the important progresses achieved so far in the application of MC simulations in particle therapy, we are still far away from efficient MC simulation codes that can model the physical interactions and biological response of tissue at multiscale level, that is, from the level of DNA segments (nm) to large populations of cells (cm). Such MC codes should include simulation of physical, physicochemical, and chemical processes in human tissue, within short time scales and without the need of enormous computing resources. It is also clear that MC tools, for their nature, need to be continuously improved and maintained, also from the point of view of computing technology. Frequent code upgrades, close collaboration, and resources are also necessary to further improve the usage of MC calculations. Not only developers but also users of MC simulations in particle therapy can contribute by validating their simulation framework with measurements, whereby it is essential that the beam model and dose delivery system are accurately simulated. Finally, the diversity of the literature studied in this review highlights the need for a balanced multidisciplinary multi-scale approach to realize such codes, combining physics, biology, chemistry, medicine, and computing. Bridging the gaps between the various research fields is therefore of prime importance for the development of efficient MC tools in particle therapy.

CONCLUSION
Having access to accurate MC simulation codes is considered crucial in particle therapy, not only because such codes provide more accurate calculations of dose in patients but also in several research fields including range verification techniques, neutron dose calculations, modeling of biological effects, and research to new treatment techniques. In this review, we have discussed a few of the challenges that are to be faced before MC simulations can be widely and efficiently used by the particle therapy community. Improving nuclear interaction physics models, radiobiological modeling, MC-based treatment planning, and development in fast MC codes are the topics selected for this review, but many other topics remain to be addressed.
We have tried to emphasize how the successful development of these codes is a task that need a multidisciplinary approach, as in fact is true for the whole subject of particle therapy. Applied nuclear physicists can have an important role in bringing the particle physics community a step closer toward an efficient use of MC simulations. In particular, the expertise and data from nuclear physics experiments are very important to improve the radiobiological modeling of tissue response to particle treatments in MC simulations.

AUTHOR CONTRIBUTIONS
SM promoted this work; GB and AK performed editing and reviewing. AK coordinated the work.