NTCP Models for Severe Radiation Induced Dermatitis After IMRT or Proton Therapy for Thoracic Cancer Patients

Radiation therapy (RT) of thoracic cancers may cause severe radiation dermatitis (RD), which impacts on the quality of a patient's life. Aim of this study was to analyze the incidence of acute RD and develop normal tissue complication probability (NTCP) models for severe RD in thoracic cancer patients treated with Intensity-Modulated RT (IMRT) or Passive Scattering Proton Therapy (PSPT). We analyzed 166 Non-Small-Cell Lung Cancer (NSCLC) patients prospectively treated at a single institution with IMRT (103 patients) or PSPT (63 patients). All patients were treated to a prescribed dose of 60 to 74 Gy in conventional daily fractionation with concurrent chemotherapy. RD was scored according to CTCAE v3 scoring system. For each patient, the epidermis structure (skin) was automatically defined by an in house developed segmentation algorithm. The absolute dose-surface histogram (DSH) of the skin were extracted and normalized using the Body Surface Area (BSA) index as scaling factor. Patient and treatment-related characteristics were analyzed. The Lyman-Kutcher-Burman (LKB) NTCP model recast for DSH and the multivariable logistic model were adopted. Models were internally validated by Leave-One-Out method. Model performance was evaluated by the area under the receiver operator characteristic curve, and calibration plot parameters. Fifteen of 166 (9%) patients developed severe dermatitis (grade 3). RT technique did not impact RD incidence. Total gross tumor volume (GTV) size was the only non dosimetric variable significantly correlated with severe RD (p = 0.027). Multivariable logistic modeling resulted in a single variable model including S20Gy, the relative skin surface receiving more than 20 Gy (OR = 31.4). The cut off for S20Gy was 1.1% of the BSA. LKB model parameters were TD50 = 9.5 Gy, m = 0.24, n = 0.62. Both NTCP models showed comparably high prediction and calibration performances. Despite skin toxicity has long been considered a potential limiting factor in the clinical use of PSPT, no significant differences in RD incidence was found between RT modalities. Once externally validated, the availability of NTCP models for prediction of severe RD may advance treatment planning optimization.


INTRODUCTION
The development of acute and chronic radiation-induced skin injuries is a common side effect of radiation therapy (RT). Acute radiation dermatitis (RD), with reactions evident one to four weeks after the beginning of RT, may limit the duration of treatment and the dose delivered (1,2). The severity of adverse dermatologic events ranges from mild erythema to moist desquamation and ulceration, impacting on the quality of a patient's life (3). Acute RD occurs most frequently after RT of breast, pelvic (e.g., anal cancer, vulvar cancer) and head and neck malignancies, while lower incidence is reported for deeper tumors as lung cancers (4).
Thanks to the advent of high-energy photon RT, which provide more skin sparing treatments compared to older ones with lower energy treatment machines, a general reduction in RD incidence and severity has been achieved in the past decades. Still, RD remains one of the significant adverse effect of RT.
The introduction of most modern treatment modalities, such as intensity modulated RT (IMRT) or proton beam therapy, has nowadays changed the dose distribution patterns in the normal tissues surrounding the tumors (5,6). Accordingly, advanced RT techniques have generally reduced the burden of radiation related risks, included skin toxicity (7,8). The substantial sparing of organs-at-risk from proton beams compared to IMRT is expected to theoretically further reduce radiation-induced morbidity (9). However, the risk of a potential increase of skin toxicity has long been considered a peculiar drawback in the clinical use of protons. The higher beam entry dose of the spread-out Bragg peak represents a disadvantage for the skin; thus causing concern over a possible increase in skin adverse effects (10,11).
The skin response to radiation has been studied since the discovery of X-rays (2,12). Multiple patient-specific and dosimetric features have been identified as risk factors for acute skin toxicity after RT for diverse tumor locations, in particular breast (7,13,14), head and neck (15) or brain tumors (16). Notwithstanding this, normal tissue complication probability (NTCP) modeling of skin toxicity is still not fully explored. In addition, the available NTCP models are mostly designed for dose-volume histogram (DVH) from a target volume (e.g., breast) (17)(18)(19)(20) or are based on DVH from a pseudo-skin structure defined as a layer of 2-5 mm inward from the body contour (15,21,22). A different approach could directly consider the surface phenomena connected to the actual organ at risk, i.e., the skin (23).
In the present study, we analyzed the incidence of acute RD in thoracic cancer patients treated with Intensity-Modulated RT (IMRT) or Passive Scattering Proton Therapy (PSPT) on a completed prospective randomized trial (24), and we developed NTCP models for severe acute RD. The model procedure was based on the introduction of a fully automated method for skin definition as a critical organ. Both the Lyman-Kutcher-Burman (LKB) and multivariable logistic regression modeling strategies were adopted.

METHODS AND MATERIAL
The study involved 225 patients with locally advanced Non-Small-Cell Lung Cancer (NSCLC) enrolled in the trial NCT00915005. One hundred sixty-six patients were eligible for the present analysis. The eligibility criteria included acute RD follow-up data and availability of dose maps. All patients were treated according to an IRB approved protocol (NCT00915005) with image-guided IMRT (103 patients) or PSPT (63 patients) to a prescribed dose of 66 or 74 Gy (RBE) in 33 or 37 conventional daily fractions delivered with concurrent chemotherapy (CHT). The typical three-field arrangement was used for all PSPT plans (24). Typically, a posterior and lateral beams plus an oblique beam that avoids lung parenchyma in its exit dose (25). In the IMRT plans, six to nine equidistant, coplanar, axial 6-MV beams were usually used (26).
Details of the protocol, patient and treatment characteristics are reported elsewhere (27,28). All dose maps were obtained with a dose grid size of 2.0 × 2.0 × 2.5 mm 3 .
For each patient, acute RD was assessed as the maximum score recorded during the treatment and within 90 days after RT. The RD was graded according to the National Cancer Institute's Common Toxicity Criteria for Adverse Events (CTCAE) version 3 into the following groups:

Dosimetric Analysis
For each patient, individual DICOM RT plans (computed tomography (CT) scans, doses, and contoured organ structures) were converted into Matlab-readable format (MathWorks, Natick, MA, USA) using the CERR (Computational Environment for Radiotherapy Research) software (29). The epidermis (skin) was automatically defined by an inhouse segmentation algorithm developed on purpose. In detail, the body contour was first corrected applying a Hounsfield unit thresholding over a moving window to exclude possible contribution from treatment bed. The resulting structure was then eroded by 3 mm [i.e., approximately the mean skin thickness (30)]; the skin was then obtained subtracting from its erosion (Figure 1) according to the following equation where B[r] is a spherical structuring element of radius r, \ represents the set difference, and ⊖ stands for morphological erosion (31).
The absolute dose-surface histograms (DSHs) of the skin thus extracted were computed by an in-house developed library for Where W and H are patient's weight and height respectively (32).
The following DSH metrics were extracted: the relative skin surface receiving more than X Gy (S x ) in step of 1 Gy, the minimum dose given to the hottest x% skin surface in step of 5% (D x ), the skin near maximum dose (D 2% ) and the mean dose (D mean ).

Statistical Analysis
Acute RD was analyzed according to its severity, i.e., grade 3 (G3) RD vs. G0-G2 RD. All the extracted skin dose parameters along with patient-specific and treatment-related factors were analyzed by univariate statistical methods for the above defined grouping. Categorical variables were tested by Pearson's χ 2 -test or Fisher's exact test when appropriate; continuous variables were tested by Mann-Whitney U-test.
Average relative DSHs stratified by treatment modality and toxicity endpoints were compared at each dose point by twotailed t-test. A significance α-level of 0.05 corrected according to the Holm-Šidák method for multiple comparison was applied (33).

Normal Tissue Complication Probability Modeling
For the defined endpoint, two different NTCP modeling approaches were applied: the LKB model, built on generalized equivalent uniform dose (gEUD) (34,35) and recast for DSHs (23), and the multivariable logistic model. The LKB model parameters (TD 50 , m and n) and their 95% confidence intervals (CIs) were fitted as described in (36). TD 50 is the value of the uniform dose given to the entire organ surface corresponding to the 50% probability to induce toxicity; m is inversely proportional to the slope of the dose-response curve; and n accounts, in this specific case, for the surface effect (n close to 0 meaning weak surface effect, n close to 1 strong surface effect). Briefly, the Maximum Likelihood method was used to find the bestfit values of the LKB parameters by maximizing the logarithm of the likelihood (LLH). The LLH function was numerically maximized by the Nelder-Mead Simplex Method using an in-house developed library for Matlab. Ninety-five percent confidence intervals for parameters estimates were obtained using the profile likelihood method.
In order to evaluate the possible impact of dosimetric and non-dosimetric factors, the multivariable stepwise logistic regression method for NTCP modeling was also applied (37,38). In the multivariable analysis were included only the variables highly correlated with RD (p < 0.1 at the univariable analysis) that were not collinear (correlation |Rs|<0.75) with variables more correlated with RD.
The Leave-One-Out (LOO) method was applied to the whole statistical pipelines to cross validate the models.
Model performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) and by balanced accuracy (39). Cut-off values on the ROC curve were determined by Youden's J statistic (40). Calibration plots were also generated for graphical assessment of the agreement between observed outcome and LOO prediction.

RESULTS
Of the 166 patients, 118 (71%) developed acute RD of any grade; fifteen of 166 (9%) patients developed G3 RD. In particular, 71 (69%) of IMRT patients developed a RD of any grade compared to 47 (75%) of PSPT patients; G3 RD occurred in 8 IMRT (8%) and 7 (11%) PSPT patients, respectively. The distribution of RD grades for each treatment modality is reported in Figure 2. There were no cases of grade 4 toxicity.
No significant differences were found in the distribution of clinical and disease factors between patients classified according to the treatment modality ( Table 1). In addition, the univariate analysis did not show significant correlations between treatment modality and the incidence of RD categorized for any grade threshold (grade ≥ 1: The analysis of average skin DSH in patients stratified by treatment modality (Figures 3A,C) showed that PSPT, compared to IMRT, significantly reduced the skin surface receiving low doses (<12 Gy). An opposite behavior can be observed in the range from 25 to 55 Gy. Average skin DSHs of patients with and without G3-RD showed instead a significant separation between the two curves starting from the dose value of 5 Gy (Figures 3B,D).
At univariate analysis for patients stratified according to G3 RD ( Table 2), all the S x metrics for doses greater than 5 Gy were significantly correlated with G3-RD; among the clinical variables, total gross tumor volume (GTV) size was the only non dosimetric factor significantly correlated with severe RD (p = 0.027).
From NTCP model training, LKB model resulted in the following parameters: TD 50 Table 3.
Regarding the logistic modeling, after the variable selection procedure, multivariable modeling resulted in a single variable model including S 20Gy (OR = 31.4, 95% CI: [7.5, 131.7], constant= −6.34± 1.03). The ROC analysis identified that the optimal cut-off for S 20Gy was 1.1% of the BSA.
Similarly, to the LKB model, the logistic model achieved high prediction performances as shown by the AUC values reported in Table 3. LOO cross validation confirmed good prediction and calibration performances ( Table 3 and Figure 4). Notably, the balanced accuracy demonstrated a good generalization score and a robust prediction capability despite data imbalance.

DISCUSSION
The treatment of choice for many thoracic cancers, such as NSCLC, consists in RT given with either concurrent or sequential CHT (10,41). Radiation induced morbidity to main organs at risk (heart, lungs, esophagus etc.) represents a major concern for radiation treatment. Advanced technologies may potentially reduce the risk of damaging normal tissue, and in particular the favorable physical characteristics of energy deposition in Hadron therapy make it a promising strategy for normal tissue dose sparing and for reducing the side effects of RT. The skin, however, raises unique issues that deserve a separate discussion. Indeed, the initial dose build-up typical of photons is advantageous for skin sparing, compared to the higher entrance dose deriving from the pile-up of Bragg curves in the production of spread-out Bragg peaks. This effect may lead to an increase in incidence or severity of skin toxicity with a potential detrimental impact on both the RT course and the patient's quality of life. In addition, different amounts of dose may be delivered to the skin depending on the particular technology adopted to give proton therapy, which can rely on either passive scattering or active scanning techniques (42).
The domain of radiation-related skin side effects following proton beam therapy were recently investigated for brain tumor patients (16,43). Erythema of grade 1-2 was found to be significantly correlated to skin (defined at 3 mm depth) dose volume parameters in the high dose region (V 35Gy ) from both passive or active scanning proton beams. In a different study on severe RD following PSPT for breast cancer, the authors identified as prognostic factors the V 52.5Gy or the D 10cc of the skin structure defined as a layer of 5 mm inward from the body contour (21).
Few studies have performed a direct comparison on RD incidence following proton versus photon treatments. Acute side effects were compared in a retrospective study on a small cohort of patients after proton beam therapy (18 patients) and IMRT (23 patients) for head and neck cancer (44). Interestingly, in their study, the authors found a greater rate of G2 RD in the proton therapy group, but no difference in the rate of G3 RD between proton and IMRT. Recently, De Cesaris et al. (11) reported on RD after treatment of 86 breast cancer patients undergoing adjuvant proton or photon RT. They observed an increase in moderate (G2) toxicity associated to proton therapy; again, no significant difference between treatment modalities was found for severe RD.
In the present study, we analyzed the data from a randomized trial on PSPT vs. IMRT treatment for inoperable NSCLC patients, and we addressed different aspects related to radiation-induced skin reactions. This study has the unique characteristic of directly comparing acute skin toxicity in a quite large cohort of patients treated at the same institution with proton or photon RT. First, we analyzed the differences of acute skin toxicity between patients treated with IMRT and PSPT. Both the depth of the lung tumor location within the body and the passive proton technique-used in the trial patients-were expected to increase the skin toxicity of the treatment. Despite this, a key finding of our investigation was that the RT technique did not impact neither incidence nor severity of acute RD (Figure 2).
Then, we evaluated the dose to the skin taking advantage of the DSHs expressly extracted for the epidermis. The DSHs were obtained by a fully automated algorithm that guarantees a high level of standardization. To account for the different patients' sizes, the absolute DSHs were normalized using the BSA (32) as scaling factor. The DSH differences between RT modalities showed that PSPT succeeded in lowering the skin surface receiving low dose (namely, <12 Gy), while the expected increase in entrance dose was evident for intermediate to high dose regime (i.e., higher than 25 Gy). Noteworthy, the switch in dose sparing effectiveness between PSPT and IMRT happens at a dose level in the range from 20 to 30 Gy. This range of doses is known to be strongly related to the probability of radiationinduced dermatological effects (12,13,16), as also confirmed in the current study by the comparison of average skin DSHs for patients grouped according to the development of severe RD ( Figure 3B).
Since the treatment modality did not correlate with the considered outcome, the NTCP models for severe RD were derived from the whole cohort of patients. We focused on G3 toxicity due to its high clinical relevance. Two different approaches were applied: the traditional purely dosimetric LKB model and the multivariable logistic regression modeling scheme. Both models indeed are important and can find their application in clinical practice. The multivariate logistic model is more flexible when non-dosimetric variables needs to be considered and in order to build predictive tools for improving personalized patient follow-up care. On the other hand, the LKB scheme is more robust for treatment planning optimization (gEUD is a superior evaluator than multiple DSH cut-off points), since it controls the dose distribution over all dose range.
The LKB approach highlighted a relevant surface effect (n = 0.62) of the dose on RD development. While the LKB n and m parameter estimates were comparable with those obtained in previous published models on acute skin toxicity (1, 13), a TD 50 of 10 Gy was a relatively low dose when compared to those previous studies. However, a direct comparison was hampered by the different modeling strategy (LKB recast on DSH) or the different normalization procedure (the BSA as scaling factor) adopted in the present analysis.
On the other hand, the multivariable logistic regression model highlighted that the most and only significantly independent toxicity predictor was the skin surface receiving more than 20 Gy. The robustness of those radiobiological hints is supported by the good performances of both predictive models, which showed cross-validated ROC-AUCs close to 0.8.
The current interest in the investigation on the patterns of dose-RD response is enhanced by the increasing attention to the quality of life of patients undergoing RT, in turn triggered by the substantially improved therapeutic ratio of the modern treatment techniques. Precise knowledge of the radiobiology of acute skin radiation effects constitutes the essential basis for the development of biology-based treatment strategies. In addition, severe acute skin reactions may be prodromal of consequential skin late effects (45), thus making their prediction and, possibly, prevention even more important.
The newest proton facilities have moved toward pencil beam scanning technology. A phantom dosimetric study investigating skin dose differences between spot scanning and passively scattered proton therapy beams indicated that, on average, a lower skin dose of about 12% was delivered when active spot scanning proton beams were used (42). Thanks to the higher flexibility with an enhanced modulation capability, the combined use of active scanning beams and the inclusion of skin specific model parameters in the planning strategies may result in further skin dose sparing to minimize the occurrence of cutaneous toxicity. In this respect, we focused on two classes of NTCP models that could be easily ported on the most common treatment planning systems used in the clinical practice. Indeed, the DSH formalism can be implemented following the procedure suggested in (23), thus directly allowing for the application of the dose constraints (e.g., S 20Gy ) derived by the logistic approach. On the other side, the estimation of the n parameter of the LKB strategy can be exploited for treatment plan optimization by constraining the gEUD, which is a widespread empirical model available in several commercial systems.
In order to improve our understanding of the mechanisms underlying radiation-induced skin damage, future direction of the research is the inclusion of spatial information of dose distributions within the analysis of skin toxicity, as already performed for different toxicity endpoints after RT (46)(47)(48)(49)(50). The extraction of organ Dose-Surface Maps (51, 52) may allow for an enhanced prediction of RT toxicity based on the knowledge of the most radiosensitive skin areas.
Additional issues to be considered when modeling RD should be the impact of CHT treatments and of different RT dose fractionation schemes. Radiation-related skin side effects have been associated to different patient-related factors such as the use of radiosensitizing CHT and/or biologics (1). In particular, both incidence and severity of RD may be increased by concomitant CHT, although conflicting results are reported in the available literature. For example, a randomized comparison of patients treated for anal cancer by RT alone or combined with CHT found overall RD in 76% for radiation alone versus 93% for combined modality therapy (53). In contrast, a three-arm randomized trial in advanced larynx cancer found similar Grade 3-4 acute skin toxicities for patients receiving RT alone (9%), concurrent RT-CHT (10%), and sequential CHT-RT (7%) (54). Rates of acute and late skin toxicity were not significantly different also in a retrospective analysis of breast cancer patients undergoing lumpectomy with or without adjuvant CHT followed by hypofractionated RT (55). Recently, a multivariable NTCP analysis did not highlighted any effect of CHT on severe RD in breast cancer patients (13).
As regards to dose fractionation, greater dose per fraction are generally of concern to normal tissue toxicities. However, data on adverse skin reactions on patients who underwent Stereotactic Body Radiation Therapy (SBRT) is still limited (1). Suggested skin SBRT dose constraints (for toxicity grade ≥ 3) were D 10cc < 23 Gy, for one single fraction of 34 Gy, and D 10cc <30-33 Gy for a total dose of 40-60 Gy in 4-5 fractions (56). Interestingly, these dose constraints are in the range of doses strongly related to the probability of RD (Figure 3B).
In the cohort analyzed in the current study, all patients received concurrent CHT and standard fractionation regimens. Future studies on large cohorts of patients undergoing RT with and without the use of CHT treatments and with different fractionation size are warranted in order to shed light on the possible CHT enhancement factor and fractionation effects.
A potential limitation of the study is related to the dose calculation uncertainties in the first few millimeters from body surface, which may be relatively large. However, in order to quantify their impact on the modeling results, Mori et al. (15) performed a sensitivity analysis showing that dose uncertainty has negligible impact on logistic regressions coefficients. Furthermore, the percentage differences between the measured dose to the skin and the estimate of the treatment planning system with passively scattered proton beams was evaluated in (42). The average measured doses resulted to be only 2% lower than the average calculated doses.
In conclusion, despite skin toxicity has long been considered a potential limiting factor in the clinical use of proton beam therapy, no significant differences in RD incidence was found between IMRT and PSPT in the analyzed trial. The developed NTCP models for the prediction of severe RD, once externally validated, may advance treatment planning optimization for the implementation of skin sparing techniques.

DATA AVAILABILITY STATEMENT
The datasets for this article are not publicly available because: Data Sharing Agreement does not include this option. Requests to access the datasets should be directed to Dr. Zhongxing Liao (zliao@mdanderson.org).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by MD Anderson Cancer Center IRB (protocol NCT00915005). The patients/participants provided their written informed consent to participate in this study.