Commissioning of GPU–Accelerated Monte Carlo Code FRED for Clinical Applications in Proton Therapy

We present commissioning and validation of Fred, a graphical processing unit (GPU)–accelerated Monte Carlo code, for two proton beam therapy facilities of different beam line design: CCB (Krakow, IBA) and EMORY (Atlanta, Varian). We followed clinical acceptance tests required to approve the certified treatment planning system for clinical use. We implemented an automated and efficient procedure to build a parameter library characterizing the clinical proton pencil beam. Beam energy, energy spread, lateral propagation model, and a dosimetric calibration factor were parametrized based on measurements performed during the facility start-up. The Fred beam model was validated against commissioning and supplementary measurements performed with and without range shifter. We obtained 1) submillimeter agreement of Bragg peak shapes in water and lateral beam profiles in air and slab phantoms, 2) < 2 % dose agreement for spread out Bragg peaks of different ranges, 3) average gamma index (2%/2 mm) passing rate of > 95 % for > 1000 patient verification measurements using a two-dimensional array of ionization chambers, and 4) gamma index passing rate of > 99 % for three-dimensional dose distributions computed with Fred and measured with an array of ionization chambers behind an anthropomorphic phantom. The results of example treatment planning study on > 100 patients demonstrated that Fred simulations in computed tomography enable an accurate prediction of dose distribution in patient and application of Fred as second patient quality assurance tool. Computation of a patient treatment in a CT using 10 4 protons per pencil beam took on average 2′30 min with a tracking rate of 2.9 × 10 5 p + / s . Fred was successfully commissioned and validated against the clinical beam model, showing that it could potentially be used in clinical routine. Thanks to high computational performance due to GPU acceleration and an automated beam model implementation method, the application of Fred is now possible for research or quality assurance purposes in most of the proton facilities.


INTRODUCTION
In proton radiation therapy, Monte Carlo (MC) methods offer more accurate modeling of proton interactions with heterogeneous media and improve dose calculation accuracy in complex geometries with respect to analytical pencil beam algorithms [1][2][3][4]. The application of MC algorithms in treatment planning can eventually lead to a reduction in the target volume safety margins by about 2% and more accurate prediction of the treatment outcomes [5]. The state-of-the-art commercial proton beam therapy (PBT) treatment planning systems (TPS) employ MC methods for treatment plan optimization and dose calculation [6,7], but they are still not the standard treatment planning tools in all clinically operating PBT facilities. Many proton facilities still use analytical pencil beam algorithms of limited accuracy in heterogeneous media. Also, the time performance of the MC-based TPS remains to be an issue, especially when applying robust optimization algorithms that require computing several dose distributions for one computed tomography (CT) image or in treatments of moving targets where 4D-CT consisting of a series of CT images of several motion phases of one patient are employed in treatment plan optimization [8]. In addition, proton radiation therapy quality assurance (QA) procedures are time consuming and require manpower for experimental measurements of dose distributions in phantoms, typically performed at a few depths in water for each treatment field. In fact, time needed for patient QA could be dedicated for the actual patient treatment. Therefore, reduction in the number of measurements is widely discussed among medical physicists [9][10][11][12][13][14]. Supplementing or replacing patient QA measurements with dose distribution recalculation using a second, independent, dose-calculation engines can be beneficial for PBT facilities.
In several PBT facilities, general purpose MC simulation toolkits, such as: FLUKA [15], Geant4 [16,17], or Shield-HIT [18] as well as more user-friendly environments built on Geant4 like GATE/GATE-RTion [19][20][21] and TOPAS [22,23], are used to support research activities and/or simulations for patient QA. The clinical application of general purpose MC tools is limited, mainly due to the time required to recalculate a complete plan ranging from tens of minutes to even a few hours. For this reason, the parallelization of the particle tracking on several central processing units (CPU) or general purpose graphical processing units (GPU) is of interest for radiotherapy. The PBT-dedicated GPU-based MC code gPMC implemented by Jia et al. [24] was further developed [25] and validated using clinical patient data [26]. Following the gPMC development, Wan Chen Tseung and colleagues presented a high-performance GPU-accelerated MC code, which is used for routine clinical QA and as the dose calculation engine in a clinical MC-based Intensity Modulated Proton Therapy (IMPT) treatment planning system [27]. Recently, an analytical pencil beam algorithm, the FRoG platform, was implemented on GPU for clinical investigations with different ion types [28,29].
The commissioning and validation of the independent, MCbased dose calculation engine for research or patient QA purposes is a time-consuming process that requires knowledgeable and experienced manpower. Only recently, standards for beam modeling and beam model commissioning for MC dose calculation-based radiation therapy treatment planning were proposed [30]. The experimental characterization of the proton beam properties (longitudinal and lateral profiles as well as dosimetric calibration) as a function of primary beam energy is facility dependent because different PBT centers use different accelerators, measurement methods, and TPS. The complete implementation of passive and active beam delivery nozzle geometry was described by Paganetti et al. [1] for cyclotronbased facilities and by Parodi et al. [31] for synchrotron-based facilities. However, it was suggested later that for MC dose calculation purposes, defining the beam model following the clinical commissioning procedure and avoiding detailed simulations of the beam nozzle geometry is possible with a precision that is sufficient for clinical application [10,32,33].
This article reports on commissioning of the GPUaccelerated MC code FRED [34] and its validation at two cyclotron-based proton beam therapy facilities of different beam line design: Varian ProBeam in Atlanta, GA (United States), and IBA Protheus C-235 in Krakow (Poland). The software toolkit FRED (Fast paRticle thErapy Dose evaluator) [34] was developed at the University of Rome for parallelized proton beam transport simulations in heterogeneous geometry defined by the patient CT. We describe in detail FRED commissioning steps, that is, automated characterization of the beam model that describes the proton beam used for patient treatment and follows the clinical QA procedures. Finally, we validated our commissioning procedure using the optimized beam models. We simulated dose distributions in FRED and compared the results with verification measurements performed in homogeneous and heterogeneous phantoms with and without range shifters as suggested by Winterhalter et al. [35]. Such extensive experimental validation of FRED accuracy and time performance has been never reported before. To increase the confidence of the reader about the accuracy of FRED simulations, selected results were also compared with clinical TPS simulations. Eventually, we evaluated clinical cases of patient treatment plans to demonstrate the clinical applicability of FRED.

GPU-Accelerated Monte Carlo Code FRED
The great benefit of FRED with respect to general purpose MC codes is its computation performance achievable on a variety of different hardware without compromising the dose computation accuracy. The typical tracking rates range from 10-100 thousand protons per second using a single CPU to about million particles per second using GPU cards. FRED is equipped with an interface to convert phantom/patient geometries stored in DICOM CT images to a voxelized geometry of the patient containing the atomic tissue composition using a conversion table based on stoichiometric calibration [36]. In addition to patient geometry, user-defined geometries of specific material composition can be included enabling simulations of proton transport in passive elements like range shifter.
The physical interaction models implemented in FRED are trimmed down with respect to general purpose MC codes, such as Geant4/FLUKA within the regime that is relevant for particle therapy, in order to speed up the execution time without compromising the accuracy of dose-deposition calculations. In particular, the physics processes contributing to the dose deposited by protons in patient tissue, that is, mean energy loss, energy fluctuations, nuclear elastic and inelastic interactions with target nuclei as well as the trajectory deflection via a multiple Coulomb scattering, are implemented in FRED [34]. Moreover, FRED offers linear energy transfer (LET) and relative biological effectiveness (RBE)-weighted dose calculations by means of different RBE models, providing further information, which is not available in the state-of-theart commercial TPS. The LET and RBE computations in FRED are out of the scope of this study.

Commissioning Measurements and FRED Simulations
FRED commissioning was performed for one gantry room of two PBT facilities of different beam line design equipped with scanned proton beams that are in clinical operation since 2016 and 2018, respectively. Krakow facility is an IBA design based on Proteus C-235 cyclotron equipped with two rotational gantries, an eye treatment room and an experimental hall. The TPS Eclipse from Varian, version 13.6, is used for treatment planning in CCB. It uses analytical proton convolution superposition (PCS) algorithm for the dose calculation and optimization [37]. EMORY PBT center in Atlanta is a ProBeam system designed by Varian and equipped with three rotational gantries and two horizontal beam lines. The TPS RayStation from RaySearch laboratories, version 8A, equipped with MC dose algorithm is used for treatment planning in EMORY [7]. The properties of proton beams and the measurement methods used for the acquisition of clinical beam model commissioning data at both facilities are listed in Table 1.
The commissioning measurements that include depth dose distribution measurements in water phantom, measurements of the lateral profiles (without range shifter) in air, and absolute dose measurements in a water phantom were used to build parameter libraries characterizing the FRED beam model for Krakow and Atlanta facilities. The water phantom and in-air setup used for commissioning measurements are schematically illustrated in Figures 1 A  and B respectively. The figure indicates how the proton beam is transported from the nozzle toward the detector/phantom. During irradiation, the beam is deflected vertically and horizontally by scanning magnets and crosses a position sensitive ionization chamber (IC23), which is used for beam lateral position and size measurement. The procedure of the commissioning data acquisition is not described here in detail as it is out of the scope of this article.
The FRED simulation setup mimics the commissioning measurements setup shown in Figures 1 A and B. The virtual beam source is located at the position of the scanning magnet located closer to the isocenter because at this position, the deflection of the beam in both X and Y directions is defined. The different position of the X and Y scanning magnets is taken into account, while calculating the direction of a single pencil beam. The beam propagation in the IC23 is omitted in the simulations and is taken into account by adjusting beam source parameters, in such a way that the beam size fits the results of beam size measurements in air performed with scintillating screen (Lynx). The proton beam was propagated without and with range shifter. FRED simulations in water were performed in 40 × 40 × 40 cm 3 virtual phantoms of 1 × 1 × 1 mm 3 voxel size ( Figure 1A). The ionization potential of water was set to 80 eV [38]. FRED simulations of the in-air setup used for beam model validation were performed in a virtual air phantom. The total time of FRED MC simulations includes tracking time, time needed for memory allocation, and the file writing. The tracking rate of simulation is given as the number of protons tracked per second (p + /s).

Beam Model Parameters
The beam model parameters characterize longitudinal and lateral pencil beam profiles as well as dosimetric calibration. Two parameters, energy (E) and energy spread (E σ ), characterize proton pencil beam depth dose distribution (longitudinal) profile. One further parameter, monitor units (MU) to the number of particles conversion factor (SF MU ), characterizes integral dose distribution (IDD) dosimetrically, by means of dose measurement at 2 cm depth, following TPS commissioning protocol and other references [37,39,40]. The lateral propagation of the proton pencil beam can be characterized by a quadratic model by means of modeling beam emittance or bilinear model by defining virtual point source. In fact, the bilinear model is an approximation of a quadratic model in a limited range. The virtual point source approach can be applied when the waist of the quadratic function of emittance model is far enough from the isocenter to approximate lateral beam propagation behind the nozzle exit by a bilinear function. FRED is capable of handling lateral beam propagation using both virtual point source or emittance approaches.  For characterizing the lateral propagation, the lateral beam profiles measured during facility commissioning in air at different Z positions (cf. Figure 1B) were fitted using the Gaussian fit, and its σ(z) was calculated. Additionally, the σ(z) measured with the beam monitor chambers in the nozzle can be used [41]. This improves the quality of the lateral beam propagation modeling, especially in cases where the waist of the beam is located between the nozzle and the first measured point in air. Fitting σ(z) to commissioning data from both facilities at different distances from the isocenter using bilinear and quadratic functions indicated that the emittance model is appropriate for Krakow facility, whereas the virtual point source model can be used for EMORY.
For characterizing the beam lateral propagation in Krakow, six emittance model parameters (ϵ, α, β), three in X direction and three in Y direction, were used. The Twiss parameters ϵ, α, and β were obtained according to the following formula [42]: where the emittance ϵ corresponds to the area in the X/Y position-velocity phase space and is assumed to be constant over the beam propagation in air. The Twiss parameter α is related to the focusing/defocusing of the beam, whereas β characterizes the length over which the beam changes its transverse shape. For characterizing the beam lateral propagation in Atlanta, four parameters, two in X direction and two in Y direction, specific for a bilinear approximation were used. The parameters were obtained according to the following formula: where the S is the function slope and corresponds to the rate of the spot size variation and VSD stands for virtual source distance and corresponds to the distance from the virtual source to the isocenter. Note that for both approaches, virtual point source and emittance model of lateral beam propagation, particles are transported starting from the position of the scanning magnets regardless of the position the emittance waist and VSD.
For TPS exploiting analytical pencil beam algorithm, the emittance model is defined for configurations with and without range shifter, whereas in MC-based TPS and in FRED, only the configuration without range shifter is defined, and proton transport in range shifter is simulated according to its model parameters (material composition, density, physical thickness).

Generation of Beam Model Parameter Library
We implemented a set of software tools that calculate beam model parameters in three automated steps (see Figure 2). The beam model parameter libraries were generated in the entire proton beam energy range in 10 MeV steps ( Table 1) for both facilities. Figure 2 schematically illustrates how the FRED MC commissioning procedure uses the facility commissioning measurements as the input to obtain beam model parameters per nominal energy, that is, beam energy E, energy spread E σ , MU scaling factor SF MU , and six emittance or four virtual point source parameters. The procedure is automated and does not require any interaction with the user, except preparation of the measurement data. FRED simulations of single pencil beams were performed using 10 8 primary protons.
Step 1. In the first step ( Figure 2: Step 1), the emittance or virtual point source model (Eqs 1 and 2) was fitted to the measured beam spot size (σ x/y ) as a function of the position along the beam (see Section 2.3). For Krakow beam model, in addition to the beam size measurements performed with Lynx (pixel size 0.5 × 0.5 mm 2 ), the beam size measurements performed during irradiation with IC23 (resolution 5 mm in X/Y directions) installed close to the nozzle exit were used to fit the emittance model (see Section 2.3). In this way, emittance model parameters (ϵ, α, β) or virtual point source parameters (S, VSD) were obtained for X and Y directions and each energy.
Step 2. In the second step ( Figure 2: Step 2), beam energy (E) and energy spread (E σ ) were obtained. The measured and simulated IDD profiles were fitted using a formalism proposed by Bortfeld [43,44]. Using the fit and semiempirical relations proposed by Bortfeld [43], the initial energy and energy spread of protons producing an IDD distribution were computed. The Bragg peak range (R 80% ) defined as 80% of the maximal value at the distal falloff and the Bragg peak full width at half maximum (FWHM) were numerically calculated from the fitted curve. The E, E σ , R 80% , and FWHM parameters were calculated for experimental data and each FRED simulation. An automated iterative optimization procedure was developed to find such E and E σ values in FRED, which minimize the absolute difference of Bragg peak range ( ΔR 80% ) and FWHM (|ΔFWHM|) between simulation and measurement. The dependence of ΔR 80% and |ΔFWHM| on E and E σ is a continuous function with a single global minimum. The optimization procedure was implemented in Python exploiting the Nelder-Mead simplex algorithm [45]. The initial guess of energy and energy spread was estimated from the Bortfeld curve fitted to measured data. Each consecutive step of the optimization algorithm included the following: 1) new simulation of a depth dose distribution in water with energy and energy spread computed by the optimization algorithm, 2) Bortfeld curve fit and estimation of R 80% and FWHM for the simulated curve, and 3) estimation of ΔR 80% and |ΔFWHM| comparing measurement and new simulation. The FRED beam energy (E) and energy spread (E σ ) are considered optimal when ΔR 80% and |ΔFWHM| are less than or equal to 0.05 mm.
Step 3. In the third step ( Figure 2: Step 3), the dosimetric calibration from TPS MU to the number of particles (SFSF MU ) was obtained for each nominal energy, mimicking the measurement setup. For this purpose, a monoenergetic 10 × 10 cm 2 field in water was simulated with spot spacing 2.5 mm, 1 MU per spot and unitary MU scaling factor. The dose in the uniform field center at 2 cm depth in water, D 2 cm , was derived from the simulation. The MU scaling factor (SF MU ) was obtained as the ratio between D 2 cm obtained from commissioning measurement and FRED MC simulation.
The output of the characterization procedure is a list of beam model parameters per nominal energy and is stored in a text file.
We developed a software tool that converts clinical TPS treatment plan into FRED input files using the beam model library (cf. Figure 2: Conversion and calculation of treatment plans). The parameters in between nominal energies are linearly interpolated, mimicking the procedures applied by TPS and beam line control system.

Validation in Homogeneous Media
This section describes how the beam model library was validated by comparing FRED simulations with measurements performed at each facility. We compared 1) lateral propagation of proton pencil beams, 2) treatment plans of dose cubes, and 3) patient QA treatment plans. The beam model validation steps are schematically illustrated in Figure 2 (lower row). The treatment plans were exported from TPS and converted from DICOM to FRED input file format. The QA treatment plans were simulated in FRED using 10 5 protons per pencil beam. After simulation, the dose from each spot was scaled to the actual number of particles optimized in the treatment plan using dosimetric calibration (SF MU ). This approach warranties the same statistical precision of calculation of dose delivered by each pencil beam, regardless of its weight in the treatment plan.
Lateral propagation of proton pencil beams. The measurements of lateral profiles of proton pencil beams at 100, 150, and 200 MeV were performed using the Lynx scintillating screen (IBA Dosimetry) in air for CCB and EMORY [46] at five positions behind the range shifter. The beam lateral profiles in solid phantoms were measured with Lynx in RW3 slab phantom for beam energies 100, 150, and 200 MeV at CCB and in PMMA slab phantom for beam energies 130, 180, and 240 MeV at EMORY.
FRED simulations for pencil beams were performed at the corresponding positions behind the range shifter in air and in solid phantoms. The transverse shape of the beam in X and Y directions was fitted with a single Gaussian fit, and the σ obtained from measurements and simulations were compared.
Spread Out Bragg Peak (SOBP). The longitudinal profiles of dose cubes (SOBPs) were measured 1) at CCB using a dosimetrically calibrated plane-parallel Markus chamber placed in a water phantom (sensitive volume 0.055 cm 3 ) with variable 0.1-1 cm step length and 2) at EMORY using the Zebra detector (IBA Dosimetry) without dosimetric calibration. The QA treatment plans of dose cubes were optimized in clinical TPS aiming at achieving homogeneous biological dose of 1 Gy (RBE) and 4 Gy (RBE) at CCB and EMORY, respectively. All cubes had a lateral size of 10 × 10 cm 2 . At CCB, dose cubes of 5 cm length (modulation) and variable range of 10, 15, 20, 25, and 30 cm without range shifter were optimized and evaluated. At EMORY, dose cubes of 10 cm length (modulation) and constant range of 15 cm without and with three range shifters of different thickness were investigated. For each measurement, the isocenter position in water was in the middle of SOBP, causing that the measurements were performed with air gaps ranging from 5 to 32 cm for EMORY and from 11.2 to 29.4 cm for CCB. Simulations of the SOBP plans were performed in a virtual water phantom. The measured SOBP dose profiles were compared with the profile extracted from threedimensional (3D) dose calculation obtained from FRED MC simulations. Absolute dose comparison was performed for Markus chamber measurements conducted at CCB, whereas relative dose comparison was performed for Zebra measurements conducted at EMORY.
Patient QA. To evaluate the accuracy of FRED simulations, patient QA treatment plans were simulated in a virtual water phantom and compared with patient QA measurements routinely performed in the clinic. The comparison of TPS vs. measurement is also shown.
In CCB and EMORY, the MatriXX PT (IBA Dosimetry) is currently in use for patient QA [47]. MatriXX is a two- Frontiers in Physics | www.frontiersin.org January 2021 | Volume 8 | Article 567300 6 dimensional (2D) array of 1020 plane-parallel ionization chambers of 4 mm diameter arranged in a 32 × 32 grid with the distance between chambers of 7.62 mm. In both facilities, the MatriXX detector was calibrated to dose in water according to protocol proposed by the manufacturer. Patient QA measurements are typically performed at 3-5 depths at CCB and at 1-3 depths at EMORY. The measurement depths are selected by a medical physicist during the QA preparation process for each patient individually, to cover the entire treatment field. For EMORY, the air gap ranges from 5 to 22 cm, whereas for CCB, it ranges from 21.7 to 27.7 cm. The patient QA treatment plans of 74 patients (1077 measured layers, 967 without and 110 with range shifter) treated in Krakow and 13 patients (56 measured layers) treated in EMORY were evaluated. The dose distributions obtained from TPS and FRED calculations were compared to measured data by means of dose profile and gamma index (GI) analysis [48]. GI calculation tools implemented in PyMedPhys Python package [49] were used for evaluation. The 3D GI test (2 mm distance-to-agreement and 2% of local dose difference criteria, with the dose cutoff at 2% of the maximum dose) was used to compare 2D slice of dose field measurement (reference) with 3D FRED dose distribution calculation (evaluation).

Validation in Heterogeneous Media
The end-to-end experimental validation of FRED physics models, beam model, and CT calibration using a heterogeneous CIRS head-and-neck phantom (model 731-HN) [50] was performed in Krakow. The experimental setup is shown in Figure 3. The CIRS phantom consists of five materials equivalent to the following tissues/organs: brain, bone, larynx, trachea, sinus, teeth, and nasal cavities. One half of the phantom consists of single piece, and the other is sliced into three segments as shown in Figure 3A. The CIRS phantom was positioned in the treatment room using orthogonal X-ray imaging system and the phantom CT scan, following the clinical patient positioning procedure applied in Krakow. The irradiation plans of 10 × 10 cm 2 monoenergetic fields at nominal energies 100, 150, and 200 MeV were prepared in clinical TPS with and without range shifter. The dose distribution downstream from the CIRS phantom was measured using the MatriXX detector placed in the DigiPhant water phantom (IBA Dosimetry, see Section 2.5). Data were acquired in 5 mm water-equivalent steps yielding 3D dose distribution with lateral resolution of 7.62 mm and longitudinal resolution of 5 mm. Dose distributions were measured behind half CIRS head in water for nominal energies 150 and 200 MeV (cf. Figure 3B). The dose distribution was measured behind 1/6 slice of CIRS head in water-equivalent RW3 slab phantom using 100 MeV proton beam (IBA Dosimetry; cf. Figure 3C) because 100 MeV protons have insufficient range to traverse the half-head phantom to acquire dose distribution in water using MatriXX (with and without range shifter).
The measurements were compared to FRED simulations of the experimental setup performed in the CT image of the CIRS and water phantoms. The CT image of CIRS phantom was acquired using the CT scanner (Siemens SOMATOM) calibrated for treatment planning in Krakow. The comparison of measured and simulated 3D dose distributions was performed using a 3D GI method.

Patient Data
A retrospective patient study was performed to investigate time performance of FRED as an independent, MC-based, proton dose computation tool and demonstrate its applicability for patient QA in the clinic. For this purpose, we referred our results to the TPS computations.
The 122 treatment plans (including boost plans) of 90 head and neck as well as brain patients treated at CCB from 2016 to 2018 and an example treatment plan of a patient treated in EMORY in 2019 [7] were simulated in FRED on CT geometries. The clinical CT images were sampled down to 1.5 × 1.5 × 1.5 mm 3 voxel size. The facility-specific clinical CT calibration curve obtained from stoichiometric calibration [36] was implemented in FRED. The CT calibration curve used in FRED contains information on the composition, relative stopping power (RSP) of protons, radiation length, and density of 93 materials. The density and RSP of CT numbers between 93 predefined points are linearly interpolated. The CT images of the patient anatomy and delineated contours were used for the optimization of plans in clinical TPS using an analytical intensity modulated proton therapy (IMPT) optimization algorithm. Depending on the target size and the number of fields, the number of pencil beams in a treatment plan varied from 1,378 to 32,290 with the median value of 10,989. 10 4 protons per pencil beam were simulated for each patient treatment plan recalculated in FRED, and the obtained dose was scaled to the actual number of particles optimized in the treatment plan. In order to investigate the impact of PTV volume on the FRED dose calculation accuracy, we divided treatment plans of patients treated in CCB into three subgroups distinguishing 12 plans with small PTV volume (V PTV < 50 ml), 60 plans with medium PTV volume (50 ml ≤ V PTV < 200 ml), and 50 plans with large PTV volume (V PTV ≥ 200 ml). PTV volumes ranged from 28.5 ml to 1,010 ml An example treatment planning study on 122 plans included a comparison of dose distributions obtained from FRED and from clinical TPS. We evaluated four parameters based on dose volume histogram (DVH) that characterize the quality of dose distribution. 1) The mean dose (D mean ) is related to the prescribed dose (D p ). 2) The homogeneity index (HI) characterizes the slope of the DVH; hence, the uniformity of the dose distribution in the PTV. The HI is defined as HI (D 2% − D 98% )/D p , where D 2% and D 98% are the doses received by 2% and 98% of the PTV, respectively [51]. 3) The conformity index (CI) describes how much dose prescribed to the planning target volume (PTV) is delivered outside the PTV, possibly to organs at risk. The CI is defined as CI V body 95% /V PTV 95% , where V body 95% and V PTV 95% are the volumes of the body and PTV, which receive at least 95% of the prescribed dose D p [52]. 4) The relative mean square error (RMSE) characterizes the deviation of a DVH from the prescribed dose D p . It was calculated at the slope of a DVH, in a range between D 5% and D 95% , and it is defined as RMSE 95 5 (D x% − D p ) 2 /90 .

Generation of the Beam Model Parameter Library
The beam model parameter libraries characterizing the proton beam model for CCB and EMORY facilities were generated using an automated procedure (cf. Section 2.3) and are illustrated as a function of nominal proton beam energy in Figure 4. Using the beam model library, the nominal primary proton beam energy for each pencil beam from the treatment plan is used to define the initial parameters of the pencil beams used by FRED simulations. Figure 4   The IDD profiles of single proton beams in water for three nominal energies: 100, 150, and 200 MeV are given in Figures 5 A  and B for the Krakow and Atlanta facilities. The profiles are in agreement with the commissioning measurements: the range (R 80% ) of the pencil beams agrees within 0.02 mm, the relative dose difference along the pencil beam profile is below 4%, the FWHM of the Bragg peak agrees within 0.05 mm, the distal falloff width between 80% and 20% Bragg peak dose agrees within 0.04 mm, and the peak-to-plateau ratio agrees within 0.11.
The fitted single beam sizes in air obtained in commissioning measurements, described by σ x/y of lateral pencil beam profiles is shown in the Figures 5 C and D for three nominal energies: 100, 150, and 200 MeV for the Krakow and Atlanta facilities, respectively. The maximum absolute difference between fitted and measured beam sizes ranging from −20 to 20 cm (CCB) and −30 to 5 cm (EMORY) in Z direction with respect to the isocenter is smaller than 0.05 mm. We deem this sufficiently accurate to model lateral beam propagation in clinical applications. The quadratic and linear shape of the fit justifies the use of the emittance ( Figure 5C) and VPS ( Figure 5D) model for the Krakow and Atlanta facilities, respectively.
Dose computation time for a single pencil beam at 100, 150, and 200 MeV simulated with 10 8 primary protons was 36, 44, and 53 s, respectively. The corresponding tracking rate is 10.1 × 10 6 , 5.7 × 10 6 , and 3.6 × 10 6 p + /s. The tracking rate decreases with the beam range as more interactions must be processed.
The total computation time needed to determine the beam model parameters for all reference energies following the automated procedure described in Section 2.

Validation in Homogeneous Media
Lateral propagation of proton pencil beams. The lateral propagation of pencil beams in air behind range shifter of different thickness ( Figure 6) and in slab phantoms (Figure 7) was simulated in FRED and compared with the beam size σ x/y of lateral pencil beam profiles obtained experimentally. Note that the comparison was performed at different positions/depths and for different primary proton beam energies at CCB and EMORY facilities.
The lateral propagation of the beam in range shifter and in slab phantom is accurately modeled in FRED. The values of σ x/y Frontiers in Physics | www.frontiersin.org January 2021 | Volume 8 | Article 567300 8 obtained from measurements agree with simulated values mostly within 100 µm, as indicated by error bars in Figures 6 and 7. The results in air and in slab phantoms are within the spot size QA acceptance criterion of ± 0.6 mm used by CCB therapy center.
Spread Out Bragg Peak (SOBP). Depth dose distribution profiles of cubic volumes obtained from measurements and FRED simulations are shown in Figure 8 for CCB in the top panels and for EMORY in the bottom panels. The results obtained for CCB are absolute dose, whereas they are relatively normalized to the dose value in the middle of the SOBP for EMORY. Because the treatment plans were optimized in clinical TPS, the obtained physical dose differs from the prescribed biological dose by the RBE factor of 10%.
Good agreement between FRED MC simulations and dose measurements along the SOBP profiles was obtained. The maximum relative dose difference is 2% for most of the measurement points. The largest relative dose differences are observed at the distal falloff, that is, a high-dose gradient region, and result from the detector positioning uncertainties, estimated to be about ± 0.3 mm. Small variations between the measurements and simulations are present at the beginning of the plateau and in the SOBP of cubes between the range of 25 and 30 cm. They are potentially related to the implementation of the nuclear interaction model in FRED for the highest beam energies. This accuracy is acceptable for the scope of the presented clinical application.
The tracking rate of the dose cube simulation ranged from 4.5 × 10 6 to 2.0 × 10 6 p + /s and the complete dose computation time for a single dose cube was up to 10 min, with the statistics 10 5 primaries per pencil beam.
Patient QA. 2D transversal dose maps obtained from measurements performed with the MatriXX detector in water phantom were compared with FRED and TPS simulations of patient treatment plans using the GI analysis. Data from 1077 measurements performed at CCB and 52 measurements performed at EMORY were investigated, and the results of the comparison are summarized in Figure 9. The average GI passing rate obtained comparing all simulated and measured layers was 97.83% (4.94) (1σ) for CCB and 95.51% (3.88) (1σ) for EMORY. Of 1,077 layers evaluated for CCB, 1,022 fulfilled the requirement for the GI passing rate (%GP) to be greater than 90%. For EMORY, 47 of 52 investigated layers fulfilled this requirement. Frontiers in Physics | www.frontiersin.org January 2021 | Volume 8 | Article 567300 9 Figure 10 shows an example of a transversal dose field layer extracted from FRED MC simulation and the corresponding dose distribution measured with MatriXX at the same depth in water, as well as in the GI map.

Validation in Heterogeneous Media
The experimental validation of FRED accuracy was performed by comparing 3D dose distributions behind the heterogeneous phantom obtained experimentally and from FRED simulations (cf. Section 2.6). An example of the comparison of FRED simulation against the experimentally acquired data is shown in Figure 11. Two 3D dose measurements, one with and other without range shifter, were performed for each of the investigated energies (100, 150, 200 MeV). An excellent agreement between FRED simulations and measurements was achieved. For all the investigated cases, the 3D GI (2%/2 mm) is greater than 99%. Comparing the clinical (analytical) TPS simulation and the measurements, the GI passing rate is 93.

Example Clinical Application of FRED
As an example, dose distributions, dose profiles, and DVHs recalculated with FRED and clinical TPS, for one patient case from CCB and one from EMORY, are shown in Figure 12. For CCB patient case (Figure 12 top panels), dose distributions computed with FRED are less uniform compared to the analytical TPS calculations. This is also observed analyzing the dose profiles and the DVH for PTV and results in the reduction of the mean dose in PTV and organ at risk. For EMORY patient case ( Figure 12 bottom panels), the differences in dose distributions are less visible as MC-based TPS was used for the dose Frontiers in Physics | www.frontiersin.org January 2021 | Volume 8 | Article 567300 optimization and calculation. The observed differences between FRED and RayStation MC-based TPS are similar to the results obtained comparing RayStation with ECLIPSE MC algorithm reported by Chang et al. [7]. Analysis of 122 treatment plans of patients treated at CCB was performed to quantify the time performance and demonstrate the clinical applicability of FRED dose computations for patient QA. Comparing dose distributions in PTV, we observed that the ratio D mean /D p obtained with FRED is more dispersed than the one obtained with analytical TPS, while the effect is more pronounced for small targets. The average relative difference in median value ranges from 3% for small targets, through 1.5% for medium size target volumes, to 1% for large target volumes, as shown in Figure 13 (left panel). The analysis of HI in PTV is shown in Figure 13 (middle-left panel). On average, the median HI is 0.11 and 0.16 for clinical TPS and FRED, respectively. Independently on FIGURE 6 | Spot sizes in air in X (blue) and Y (red) directions for CCB and EMORY without range shifter and behind the range shifters used at facility (single range shifter (RS) of thickness 36.7 mm for CCB and RS2, RS3, and RS5 of thickness 20, 30, and 50 mm, respectively, for EMORY). The measured spot sizes are shown as points with error bars (± 0.1 mm), and the solid lines show the simulation results.
Frontiers in Physics | www.frontiersin.org January 2021 | Volume 8 | Article 567300 11 the target volume, the HI in PTV calculated with FRED is higher, that is, dose distribution is less homogeneous than the HI calculated with analytical TPS. Figure 13 (middle-right panel) shows the CI distributions, which present no substantial difference between both, FRED and TPS calculations (median CI is 1.26 and 1.23 for TPS and FRED, respectively). In general, for both, FRED and TPS calculations, dose distributions of small PTV are less conformal with respect to dose distributions for large PTV. The comparison of DVH in PTV by means of RMSE analysis confirms the conclusions from D mean /D p ratio and HI analysis. The histogram of RMSE for TPS distribution is narrower with smaller mean value, whereas for FRED, the RMSE distribution is wider with slightly greater mean value. This is because the dose distributions calculated with FRED are less uniform in PTV, as indicated by HI analysis, and the mean dose in PTV differs from the dose in PTV calculated with TPS, as indicated by D mean /D p ratio analysis.
For a treatment plan, the total simulation time varied depending on the complexity of the plan, that is, the total number of pencil beams and the presence of range shifter in the plan. For the simulations in CT geometry rescaled to 1.5 × 1.5 × 1.5 mm 3 voxels, the computation

DISCUSSION
We have built a proton beam model libraries for FRED MC code according to the QA protocols, and we accomplished acceptance tests required for beam model validation in a commercial TPS at proton therapy facilities. We performed MC commissioning avoiding the nozzle geometry modeling, similar to the work presented by other groups [10,32,33]. The beam model library parameters containing the information on initial proton energy and energy spread, lateral beam propagation, and dosimetric calibration were identified in 10 MeV energy steps in the therapeutic energy range to best fit the commissioning measurements of proton pencil  beams (cf. Section 3.1). A submillimeter agreement between simulated and measured Bragg peaks shape and range in water and lateral beam sizes in air and in solid phantoms was obtained with and without range shifter for beam model of two facilities of different beam line design.
In the study, we assumed the uncertainty of single pencil beam and SOBP depth dose profile measurements to be ± 3%. The uncertainty of positioning of the ionization chamber in the water phantom is about 0.3 mm. The uncertainty of the lateral pencil beam size measurement performed with scintillating screen (Lynx detector) in air and in the RW3/PMMA slab phantom is ± 0.1 mm, whereas the measurement with IC23 has 0.5 mm uncertainty [32]. We estimate the uncertainty of the slab phantom positioning at 1 mm, but it has negligible impact on the beam lateral profile measurements. We performed beam model commissioning and validation using the proton per pencil beam statistics that it is required to assure no impact of the statistical uncertainty on these results. For single pencil beams, 10 8 protons per beam offer statistical uncertainty below 1% in 3σ distance from the beam core, when simulations are performed in 1 × 1 × 1 mm 3 grid. Lower statistics can be used for recalculation of treatment plans in water in the same 1 × 1 × 1 mm 3 grid because dose distribution is obtained from superposition of hundreds of pencil beams. We found that for treatment plan recalculation in water, the statistical uncertainty below 1% can be achieved using 10 5 protons per beam for small fields. For clinical application of FRED, the limiting factor is the time of simulations. We found that for resampling the patient geometry in CT to 1.5 × 1.5 × 1.5 mm 3 , 10 4 primaries per pencil beams can be used, achieving statistical uncertainty of about 2%. We consider this setting as a good compromise between simulation time and simulation accuracy, allowing treatment plan recalculation in CT scan within a few minutes. No statistical uncertainty of the dose calculated with analytical TPS used as CCB was considered, whereas the dose was calculated with the statistical uncertainty of 0.5% in MCbased RayStation TPS on 2 × 2 × 2 mm 3 grid in water phantom and 3×3×3 mm 3 grid in patient CT.
The comparison of FRED simulations to QA measurements in water presented in Section 3.2 indicates that, on average, FRED dose distributions agree better with measurements than the prediction made by TPS pencil beam algorithm used in CCB (Figure 9, left panel); however, FRED dose distributions are comparable to predictions of commercial MC-based TPS used in EMORY (Figure 9, middle panel). Analysis of CCB patient QA data shows that for small, medium, and large PTV volumes, on average, the dose distributions computed by FRED agree better with measurements when compared with dose distributions computed with pencil beam algorithm ( Figure 9, right panel). We have not observed substantial differences in FRED dose calculation accuracy between different PTV volume categories. Note that small PTV volumes ranging from 28.5 to 50 ml were investigated for CCB. In Section 3.3, we presented the results of end-to-end FRED validation of FRED simulations. For various beam energies, large air gaps, and setups with and without range shifter, we compared FRED simulations with measurements of 3D dose distributions behind anthropomorphic CIRS head phantom containing high-density gradients on the boundary between head bones and nasal cavities. The high accuracy of the FRED dose calculations was confirmed in the results of GI tests better than 99% for all of the investigated cases.
Comparison of experimental results in homogeneous media and anthropomorphic phantom with FRED simulations (cf. Sections 3.2 and 3.3 and Supplementary Materials) indicates that fast dose recalculations in patient CT performed with FRED (cf. Section 3.4) is a very accurate simulation of proton treatment. A retrospective treatment planning study and the statistical evaluation of DVH parameters are example of routine clinical application of FRED for patient QA. The dose nonuniformities in PTV shown in an example CCB patient case recalculated with FRED ( Figure 12) are also observed in the analysis of D mean /D p and HI for 122 patient cases summarized in Figure 13. The differences of the mean dose delivered to PTV structures, calculated by FRED and predicted by TPS are more pronounced for small PTV volumes ( Figure 13, left panel). FRED calculations predict dose nonuniformity for small, medium, and large PTV volumes, which cannot be calculated with analytical TPS used in Krakow. In general, dose distributions are less conformal in small targets than in large targets because it is predicted both by FRED simulations and by TPS pencil beam algorithm calculations ( Figure 13, right panel). Note that these clinical results, both from TPS and FRED, include uncertainties related to acquisition of commissioning data, beam model implementation, CT calibration, and the like. On the other hand, the distribution of D mean /D p , HI, and CI indicate that overall, the dose distribution calculations performed with both clinical TPS and FRED are within the clinically relevant acceptance.
In clinical practice, additional information about dose, LET and RBE-weighted dose distributions calculated with FRED can be an indication for medical physicists to revise the treatment plan optimization or to perform additional experimental validation, when the results deviate from the predictions of TPS exceeding acceptance criteria. The time performance of FRED enables to obtain this information within about 2.5 min. FRED is currently adapted to be executed as a stand-alone library, which will enable its easy integration with commercial TPS (eg, Eclipse or RayStation) and dedicated software tools for patient QA (eg, MyQAion).
Schiavi et al. [34] reported that simulation of dose deposition in a water phantom induced by 10 6 primary protons can be reduced from 22 min required by FLUKA MC code to 0.5 s when employing FRED running on two GPU modules [34]. Regarding dose distribution simulation in patients, Grassberger, Anthony Lomax, and Paganetti [33] reported that the patient simulation for the head and neck took 371 min (10 6 primaries simulated) on single CPU using TOPAS (Geant4), which corresponds to a tracking rate of 45 p + /s, whereas the average tracking rate obtained with FRED is 2.9 × 10 5 p + /s in patient CT rescaled to 1.5 × 1.5 × 1.5 mm 3 using two GPUs. The time performance results presented in this article can be linearly scaled as a function of the number of GPU cards applied [34]. Note that the simulation time depends on the number of primaries simulated per pencil beam, tumor depth (i.e. the beam energy), and scoring resolution used for the simulation. The most accurate dose calculations in tissue heterogeneities can be obtained performing the simulation in original CT grid. In order to achieve the statistical uncertainty below 1% on CT grid used at CCB 0.7 × 0.7 × 1.2 mm 3 , 10 5 primaries per pencil beam should be simulated. The average simulation time for the patient group investigated in Section 3.4 in original CT resolution is 31.8 161. 8 3.5 (σ 23.8) min. The clinical application of proton therapy and development of new treatment protocols, for example, studies on the reduction of safety margins accounting for treatment plan robustness, require treatment planning studies that can only be performed analyzing several treatment planning approaches. The total simulation time of all 122 patient cases shown in Section 3.4 was about 5 h. An example study of 10 possible treatment planning approaches on our patient group could be performed using FRED within about two days of simulation. Another application is robust optimization of treatment plans, that is, particularly relevant for treatment planning of moving targets, when several dose distributions must be computed on 4D CT. Performing such studies without the time performance offered by FRED would not be possible with any general purpose MC code in reasonable time.
In addition to its clinical applications, the time performance of FRED enables preparation of the proton beam model faster with respect to a general purpose MC codes. This is particularly useful when a new beam model must be implemented in the clinical routine due to technical modifications or maintenance at accelerator. When the facility beam commissioning measurements are available, the GPU acceleration offered by FRED allows to parametrize the beam model within about 12 h, requiring minimal manual interventions. This potentially enables easy and quick use of FRED for research and patient QA purposes in most of the proton facilities with little experimental efforts.

CONCLUSION
In this article, we share our experience on commissioning and validation of GPU-accelerated MC code FRED based on commissioning measurements of two proton beam therapy facilities of different beam line design: CCB (Krakow) from IBA and EMORY (Atlanta) from Varian. FRED passed acceptance tests required to approve TPS for clinical use. The approach we used combines the application of a new GPU-accelerated MC code, implementation of two proton beam lateral beam propagation models, automated beam model optimization method, experimental validation of beam model parameters in an anthropomorphic phantom with and without range shifter, and comparison of patient treatment plans computed with FRED and clinical TPS in patient CT. Our commissioning and validation results demonstrate the universal and accurate implementation of the physics models in FRED, allowing its flexible applications for medical physics and research purposes. The application of FRED as a secondary MC engine for patient QA in clinical routine is foreseen in Krakow proton facility. FRED is currently used for treatment planning studies evaluating radiobiologically effective dose using variable RBE.

DATA AVAILABILITY STATEMENT
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

AUTHOR CONTRIBUTIONS
JG, MG, AS, ASc, and AR developed the beam model for CCB. JG developed automated beam model library implementation method and performed data analysis to validate the beam model. JG, MG and ASc developed the emittance and virtual point source models for FRED. JG, NM and AR designed, while JG, MG, NM, AR, MR performed validation experiments with proton beams at CCB. JG performed data analysis of experiments. KC, NK and MPN supported data analysis. RK provided access to beam model commissioning and validation data from CCB. JG participated in commissioning measurements in CCB. CC and LL provided commissioning, validation, and patient data from EMORY. JG implemented beam model for EMORY and performed analysis of validation and patient data. RK and EP provided access to patient data from CCB. KK and MR exported the patient data from clinical TPS. MG and JG performed simulations and analysis of patient data. ASc and VP developed and made substantial improvements in FRED source code required to enable presented studies. MD, IR, ES, and FT provided expertise in beam modeling and medical physics. JG prepared all figures. JG and AR designed the project and drafted the manuscript. AR acquired funding. All the authors reviewed and approved the manuscript.