Quantitative Comparison of Knowledge-Based and Manual Intensity Modulated Radiation Therapy Planning for Nasopharyngeal Carcinoma

Background and purpose: To validate the feasibility and efficiency of a fully automatic knowledge-based planning (KBP) method for nasopharyngeal carcinoma (NPC) cases, with special attention to possible ways of improving the success rate of auto-planning. Methods and materials: A knowledge-based dose-volume histogram (DVH) prediction model was developed from 99 previously treated NPC patients and used to automatically generate the optimization objectives and corresponding priorities for intensity-modulated radiation therapy (IMRT) planning for each head and neck organ at risk (OAR). The automatic KBP method was then evaluated in 17 new NPC cases and compared with manual plans (MP) and expert plans (EXP) in terms of target dose coverage, conformity index (CI), homogeneity index (HI), and normal tissue protection. A quality metric was applied to quantify plan quality. The variation in plan quality and time consumption among planners was also investigated. Results: With comparable target dose distributions, the KBP method achieved a significant dose reduction in critical organs such as the optic chiasm (p<0.001), optic nerve (p=0.021), and temporal lobe (p<0.001), but failed to spare the spinal cord (p<0.001) compared with MPs and EXPs. The overall plan quality evaluation gave mean scores of 144.59±11.48, 142.71±15.18, and 144.82±15.17 for KBPs, MPs, and EXPs, respectively (p=0.259). A total of 15 out of 17 KBPs (i.e., 88.24%) were approved by our physician as clinically acceptable. Conclusion: The automatic KBP method using the DVH prediction model provided a possible way to generate clinically acceptable plans in a short time for NPC patients.


INTRODUCTION
Intensity-modulated radiation therapy (IMRT) has become a major treatment modality for nasopharyngeal carcinoma (NPC). Compared with traditional two-dimensional radiotherapy and three-dimensional conformal radiotherapy, IMRT uses inverse planning algorithms to generate fields of varied beam intensity, which allows a higher radiation dose to be delivered to the tumor while minimizing exposure of the surrounding healthy organs (1,2). Recent reports have shown better 5-year overall survival and local tumor control, and fewer late toxicities, for NPC patients treated with IMRT (3,4).
Although the clinical benefits of IMRT for NPC treatment have been confirmed, concern has recently been renewed about the quality of IMRT planning. Currently, IMRT planning is still a trial-and-error procedure, in which dosimetrists must predetermine all the starting optimization objectives for tumor targets and organs at risk (OARs) and manually adjust them during the optimization process until the desired dose distribution is achieved. This is a challenging process because the achievable optimization objectives are usually unknown before planning and anatomical geometry varies among patients. It has already been demonstrated that plan quality relies heavily on the experience of the dosimetrist and the time spent on a given plan (5). What is worse, the recommended IMRT quality assurance protocols can only check whether the planning parameters are correct; they cannot verify whether the plan has an optimal dose distribution. Therefore, it is essential to explore new methods to guide planners of varied skill levels to generate high quality plans more efficiently.
Many efforts have been made to give IMRT planning a clearer direction by utilizing both patient anatomical information and past planning experience. Early exploration was conducted by Wu et al. (6,7), who proposed an information retrieval method that used an overlap volume histogram to find similar plans of previous patients in a database and took them as initial planning goals to guide the new planning procedure. Moore et al. (8) formulated the correlation between the principal OAR mean dose and the percentage of that OAR overlapping the planning target volume (PTV) to yield a simple dose prediction model, aiming to provide a quality control tool for clinical IMRT planning. Recently, more sophisticated frameworks such as machine learning were introduced to create refined dose-volume histogram (DVH) estimation algorithms (9,10), and preliminary results demonstrated that such knowledge-based planning (KBP) methods helped improve plan quality and planning efficiency by integrating prior information into the planning process (11,12).
While the KBP method has been found to be useful in many treatment sites (12-14), a recently published work revealed that less than half of fully automatic KBP plans for NPC cases satisfied the clinical acceptance criteria (15). This is mainly due to the proximity of neighboring critical structures to the tumor target, so that any slight improvement in target dose coverage may cause those structures to exceed their primary dose constraints. Thus, the purpose of this study is to validate the suitability and efficiency of fully automatic KBP for NPC cases, with special attention to possible ways of improving the success rate of auto-planning. To quantitatively evaluate plan quality, a quality assessment tool with built-in scoring criteria was introduced. The potential benefits of combining this quality metric with estimated DVHs for quick plan quality checks are discussed.

MATERIALS AND METHODS
Prior Plan Selection
To generate the DVH prediction model, 99 prior IMRT plans for NPC patients were retrospectively selected from our institutional database. The TNM staging information is shown in Table 1. All patients were immobilized in the supine position with head-neck-shoulder thermoplastic masks. A nine-beam coplanar IMRT plan with the collimator angle fixed at 0° was designed for each case by a senior physicist using the Eclipse treatment planning system (version 11.0, Varian Medical Systems, Palo Alto, CA). The dose prescription was set to 70 Gy in 30 fractions to the planning gross target volume (PGTV), 60 Gy in 30 fractions to the planning target volume PTV1, and 54 Gy in 30 fractions to the planning target volume PTV2. For NPC, the planning target volumes (PTV1 and PTV2) were constructed automatically by expanding the corresponding clinical target volumes (CTV1 and CTV2) in three dimensions by 3 mm to allow for setup uncertainties. Specifically, CTV1 is defined as the primary gross target volume (GTV) plus a 5-10 mm margin, including the entire nasopharyngeal mucosa, and covers the high-risk regions of microscopic infiltration surrounding the GTV. CTV2 is defined as CTV1 plus a 5-10 mm margin to encompass the low-risk anatomic sites of microscopic extension. In addition, the neck levels containing involved lymph nodes and the elective neck irradiation levels are also included in CTV2. The planning goals for tumor targets and dose constraints for the OARs were chosen according to our department protocols and national and international recommendations (16,17). Recent follow-up indicated that all patients had favorable prognoses, with neither severe late toxicity nor treatment failure (local recurrence or distant metastasis).

Generating a KBP Plan
In this study, a mathematical framework similar to that of Zhu et al. (9) was used to derive DVH estimation models for head and neck OARs from high quality prior plans. The model incorporated two major groups of anatomical features, volumetric information and spatial information, the latter characterized by the minimum distance from a voxel to the PTV surface (distance-to-target histogram, DTH). The DTH and DVH curves were parameterized using principal component analysis so that the main anatomical and dosimetric features were quantified by 1 to 4 principal components with eigenvalue contributions over 97%. For each individual OAR, multivariate regression analysis was carried out to select the statistically significant variables, and a mathematical model was then built using support vector regression (SVR). It was reported that SVR with an ε-insensitive loss function can avoid overfitting and has fewer fitting errors than multivariable nonlinear regression (9). As the quality of the plan database may determine the accuracy that a prediction model can offer, a refinement process was applied to the primary model to improve its predictive accuracy (18,19). The primary model was used as a self-checking tool, and relatively suboptimal database plans were identified by comparing the estimated DVHs with the planned DVHs. Unlike in previous studies, these suboptimal plans were not excluded from the database; instead, they were reoptimized by a group of experts under the guidance of the estimated DVHs to further spare the OARs and then returned to the training dataset.
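To make the DTH feature described above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes OAR voxels and sampled PTV surface points are given as 3-D coordinates in mm, that all voxels have equal volume, and that unsigned Euclidean distance is adequate (a fuller implementation would probably use signed distances for voxels inside the PTV). All function and variable names are hypothetical.

```python
import math

def min_distance_to_target(voxel, ptv_surface):
    """Smallest Euclidean distance (mm) from one OAR voxel to any PTV surface point."""
    return min(math.dist(voxel, p) for p in ptv_surface)

def distance_to_target_histogram(oar_voxels, ptv_surface, bin_edges):
    """Cumulative DTH: fraction of OAR voxels lying within each distance of the PTV.

    Assumes equal-volume voxels; bin_edges are distances in mm.
    """
    distances = [min_distance_to_target(v, ptv_surface) for v in oar_voxels]
    n = len(distances)
    return [sum(d <= edge for d in distances) / n for edge in bin_edges]

# Toy geometry (hypothetical): two PTV surface samples and four OAR voxels.
ptv_surface = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
oar_voxels = [(0.0, 3.0, 0.0), (0.0, 8.0, 0.0), (10.0, 12.0, 0.0), (10.0, 20.0, 0.0)]
dth = distance_to_target_histogram(oar_voxels, ptv_surface, bin_edges=[5, 10, 15, 25])
print(dth)
```

In the workflow described in the text, curves of this kind would then be reduced to a few principal components and passed to the regression step; that part is not sketched here.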
The refined model was then used for automatic IMRT planning, in which the achievable DVHs were predicted with a 95% confidence interval for each OAR. It is known that the commercial planning system RapidPlan takes the lower bound of the DVH estimate range as the optimization objective in an attempt to maximize OAR sparing (20). Based on our experience and the previous study (15), we selected the predicted mean value instead of the lower limit of the DVH estimation range as the starting optimization objective for some adjacent OARs, such as the optic chiasm, optic nerve, pituitary, and inner ear, in advanced T3-T4 cases to better balance target dose coverage and normal tissue protection.
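The objective-selection rule described above can be sketched as a small function. This is an illustration only: the names are hypothetical, the dose values are made up, and it assumes that the lower bound of the estimate range is used for every case the text does not explicitly mention.

```python
# OARs named in the text as adjacent to the target (assumed to be the full list).
ADJACENT_OARS = {"optic chiasm", "optic nerve", "pituitary", "inner ear"}
ADVANCED_STAGES = {"T3", "T4"}

def starting_objective(oar, t_stage, predicted_mean, lower_bound):
    """Pick the starting optimization objective for one OAR.

    Adjacent OARs in advanced (T3-T4) cases use the predicted mean DVH value;
    everything else is assumed to use the lower bound of the estimate range.
    """
    if oar in ADJACENT_OARS and t_stage in ADVANCED_STAGES:
        return predicted_mean
    return lower_bound

# Hypothetical dose values in Gy, for illustration only.
print(starting_objective("optic chiasm", "T4", predicted_mean=40.0, lower_bound=32.0))
print(starting_objective("optic chiasm", "T1", predicted_mean=20.0, lower_bound=14.0))
print(starting_objective("parotid", "T4", predicted_mean=28.0, lower_bound=24.0))
```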

Clinical Evaluation
The clinical test was conducted in 17 new NPC cases of various clinical stages (T1: 2 cases, T2: 1 case, T3: 10 cases, and T4: 4 cases). For each patient, three different IMRT plans were generated:
1) A manual plan (MP): this plan was designed independently by a dosimetrist in the traditional trial-and-error way.
2) A knowledge-based plan (KBP): this plan was automatically generated based on the estimated DVHs by a single click of the 'optimization' button with no other human intervention, which differs from the previous study (15).
3) An expert plan (EXP): the MP was adjusted repeatedly by an expert panel with reference to the estimated DVHs until a consensus on the dose distributions was reached. The EXP was regarded as the reference standard in our plan comparison.
In addition, the plan quality variation among planners was investigated by selecting 5 NPC cases of different difficulty (T2: 1 case, T3: 3 cases, and T4: 1 case). For each case, an MP was generated independently by three planners with different levels of experience (A: trainee, nearly one year of experience; B: young dosimetrist, three years of experience; and C: senior dosimetrist, more than five years of experience). The resulting plan quality and time consumption were compared.

Dosimetric Analysis Indices
For a tumor target, a plan comparison was conducted in terms of dose coverage, conformity index (CI), and homogeneity index (HI).
The CI (21) was calculated using the following equation:
$$\mathrm{CI} = \frac{V_{T,\mathrm{ref}}}{V_{T}} \times \frac{V_{T,\mathrm{ref}}}{V_{\mathrm{ref}}}$$
where $V_{T,\mathrm{ref}}$ is the volume of the target covered by the reference isodose, $V_{T}$ is the target volume, and $V_{\mathrm{ref}}$ is the volume of the reference isodose. The HI (22) was defined in terms of $D_{x\%}$, the absorbed dose received by x% of the target volume. In this study, 14 kinds of head and neck OARs for NPC treatment were evaluated, as shown in Table 2. The maximum dose ($D_{\max}$ or $D_{1cc}$) and the mean dose ($D_{\mathrm{mean}}$) were chosen for the dosimetric evaluation of serial and parallel organs, respectively. $D_{1\%}$ was applied specifically to the optic organs because their volumes were too small. Other dosimetric indices used are detailed in Table 2.
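As a worked illustration of the two target indices, the sketch below computes CI and HI from simple inputs. It rests on two assumptions that should be checked against references (21) and (22): CI is taken as the product (V_Tref / V_T) × (V_Tref / V_ref) implied by the three volumes defined above, and HI is taken in the widely used ICRU Report 83 form (D2% − D98%) / D50%. The helper names and sample numbers are hypothetical.

```python
import math

def conformity_index(v_tref, v_t, v_ref):
    """CI = (V_Tref / V_T) * (V_Tref / V_ref); 1.0 means perfect conformity."""
    return (v_tref / v_t) * (v_tref / v_ref)

def dose_at_volume(voxel_doses, x_percent):
    """D_x%: minimum dose received by the hottest x% of equal-volume voxels."""
    ranked = sorted(voxel_doses, reverse=True)
    k = max(1, math.ceil(len(ranked) * x_percent / 100))
    return ranked[k - 1]

def homogeneity_index(voxel_doses):
    """HI = (D2% - D98%) / D50% (assumed ICRU 83 form); 0 means uniform dose."""
    d2 = dose_at_volume(voxel_doses, 2)
    d98 = dose_at_volume(voxel_doses, 98)
    d50 = dose_at_volume(voxel_doses, 50)
    return (d2 - d98) / d50

# Hypothetical example: volumes in cc and ten target voxel doses in Gy.
ci = conformity_index(v_tref=95.0, v_t=100.0, v_ref=110.0)
target_doses = [70, 71, 69, 72, 68, 73, 70, 70, 71, 69]
hi = homogeneity_index(target_doses)
print(round(ci, 3), round(hi, 3))
```

With so few voxels the percentile estimate is coarse; a clinical DVH would be interpolated from many more voxels.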
To quantify the plan quality, an assessment tool, the plan quality metric (PQM), was introduced (23). The scoring criteria were established based on our institutional protocols, with reference to the RTOG-0225 and RTOG-0615 guidelines (16,17) and the work of Ng et al. (24). The total score was 200 points, divided among four categories: targets (100 points), critical organs (60 points), sub-critical organs (25 points), and other normal organs (15 points). The organ classification and scoring details are listed in Table 2.
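The structure of such a score can be sketched as follows. Only the four category maxima (100, 60, 25, and 15 points) come from the text; the per-metric point values, the linear interpolation between a "full score" value and a "zero score" value, and all names are assumptions made for illustration and do not reproduce the criteria in Table 2.

```python
# Category maxima as stated in the text (200 points in total).
CATEGORY_MAX = {"targets": 100, "critical": 60, "sub-critical": 25, "other": 15}

def metric_score(value, full_at, zero_at, max_points):
    """Linear score: max_points at full_at, 0 at zero_at, clamped outside.

    Works for 'lower is better' (full_at < zero_at) and
    'higher is better' (full_at > zero_at) metrics alike.
    """
    fraction = (zero_at - value) / (zero_at - full_at)
    return max_points * min(1.0, max(0.0, fraction))

def plan_quality_score(metrics):
    """Sum per-metric scores by category and return (total, per-category totals)."""
    totals = {name: 0.0 for name in CATEGORY_MAX}
    for m in metrics:
        totals[m["category"]] += metric_score(
            m["value"], m["full_at"], m["zero_at"], m["max_points"]
        )
    for name, total in totals.items():
        if total > CATEGORY_MAX[name] + 1e-9:
            raise ValueError(f"category {name!r} exceeds its maximum")
    return sum(totals.values()), totals

# Hypothetical metrics: a coverage percentage and a maximum dose in Gy.
metrics = [
    {"category": "targets", "value": 99.5, "full_at": 99.0, "zero_at": 95.0, "max_points": 40},
    {"category": "critical", "value": 49.5, "full_at": 45.0, "zero_at": 54.0, "max_points": 20},
]
total, by_category = plan_quality_score(metrics)
print(total, by_category)
```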
For statistical analysis, a Kolmogorov-Smirnov test and a homogeneity of variance test were used to check the normality and variance homogeneity of the data. For data fulfilling both conditions, an F-test was performed; otherwise, a Friedman test was applied for plan comparison. A Bonferroni test was further used for pairwise multiple comparisons. All statistical analyses were performed using the SPSS software (version 22, SPSS Inc., Chicago, IL).

RESULTS
Table 3 shows the target dose distribution for the three kinds of plans. All three groups achieved a dose coverage of $V_{98\%}$ higher than 99% for PGTV and PTV1. The hot spot was better controlled in the EXPs (p=0.013), but all three kinds of plans had a $V_{110\%}$ lower than 3%. Compared with MPs and EXPs, KBPs achieved increased conformity in PGTV (p<0.001) at the expense of HI in PGTV, PTV1, and PTV2 (p<0.001). $V_{98\%}$ in PTV2 was significantly lower in KBPs than in MPs and EXPs (p=0.041).

OAR Dose Analysis
While the radiation doses to OARs were all kept within the tolerance limits in the three kinds of plans, the KBPs had a slight advantage in OAR sparing over MPs and even EXPs (Table 4). Significant dose reduction was achieved in KBPs for critical organs such as the optic chiasm (p<0.001), optic nerve (p=0.021), and temporal lobe (p<0.001), but the KBPs failed to spare the spinal cord compared with MPs and EXPs (p<0.001). As for sub-critical and other normal organs, the KBPs also provided comparable or better protection than MPs and EXPs, except for the pituitary (p=0.002).

Overall Plan Quality Evaluation
The plan quality scores are given in Table 5.

Plan Quality Variation Among Planners
The mean PQM scores of the five tested cases were 136.60±18.68, 141.40±18.99, and 143.80±20.35 for dosimetrists A, B, and C, respectively (Table 6). Plan quality improved with increasing experience.
As shown in Figure 1, the average time required to achieve clinically acceptable dose distributions decreased with the increase of work experience. However, it was observed that

DISCUSSION
Previously published studies have revealed that quite a few clinical plans may have sub-optimal dose distributions, leading to excessive irradiation to normal tissues (11,25). KBP methods may provide a possible solution by incorporating prior information into the planning process. In this study, we validated the feasibility and efficiency of a KBP method based on estimated DVHs with special efforts to improve the success rate of auto-planning for NPC treatment. As the database quality might have a direct impact on the prediction results (26), only high quality prior plans with definite curative effects were enrolled. Also, a refinement process was applied here for the primary model to enhance its predictive ability as recommended by several authors (18,19,27).
By introducing estimated DVHs, patient-specific optimization objectives rather than general templates were generated for each individual patient in the KBP method, based on the patient anatomy and prior knowledge. This gave the planner a clearer direction for refining the optimization objectives and achieving a high quality plan, which would be particularly useful for complicated disease sites such as cancer of the head and neck (28). Our results showed that the EXP method provided the best trade-off between target dose coverage and normal tissue protection, acquiring the highest quality assessment scores among the three kinds of IMRT plans. For T1-T2 and most T3 cases, the KBP method showed its capability in sparing normal tissues, so the plan quality score of a fully automatic KBP was better than that of the MP and close to or equal to that of the EXP. For advanced T4 cases, due to the proximity of neighboring critical structures to the tumor target, some minor improvements in OAR sparing may lead to insufficient target dose coverage, giving the KBP a slightly lower score than the MP and EXP. However, no statistically significant difference was found among the three kinds of plans, indicating that the KBP method can produce comparable or even better plans than the traditional manual way. This observation was consistent with previously published studies (15,28,29).
It should be noted that we applied predicted mean DVH values as the starting optimization objectives for some adjacent OARs, such as the optic chiasm, optic nerve, pituitary, and inner ear, in advanced T3-T4 cases. This may be the reason why we obtained a higher success rate in auto-planning (about 88%) than the previous study (about 45%) (15). Chang et al. (15) conducted their investigation using a similar estimation module, but took the lower bound of the DVH estimate range as the optimization objective in an attempt to maximize OAR sparing, even though the predicted mean usually represents the best estimate from a statistical point of view. For early T1-T2 cases, there is enough distance between tumor targets and the surrounding normal tissues to allow for high dose fall-off, so relatively "tighter" objectives help achieve better results. However, for advanced T3-T4 cases, the lower limit of the estimated DVH seems too hard to achieve as an objective for almost all the OARs, especially for the optic chiasm, optic nerve, pituitary, and inner ear, which are adjacent to or overlap the target area. These "hard" objectives cause a suboptimal trade-off, resulting in insufficient target coverage by the prescribed dose. In fact, even when the predicted mean was selected as the objective, our results demonstrated that the automatic KBP still spared the surrounding critical organs well.
A previously published study applied a scoring system, together with KBP models, as a teaching aid for training IMRT planning skills for lung cancer (30). However, it has been pointed out that, although a plan scoring system can measure the overall quality of a plan, it will always have an ad hoc nature because the preferences of physicians vary (30). In this study, a similar quality assessment tool was introduced to quantify the plan quality of NPC cases. The built-in dosimetric indices were taken from the relevant national and international guidelines, while the scores were assigned based on our clinical evaluation practice, ensuring that the derived score was in good agreement with the clinical comments. For T1 and T2 cases, a high quality plan usually obtained a score above 150 points, whereas for T3 and T4 cases the plan acceptance criterion should be lowered to about 140 points. This suggests that if a plan quality score falls below these thresholds, for example, if a T2 case scores less than 150 points, the planner should be cautious and a systematic quality review would be required to keep the plan standard high. In our experience, the quality metric can be calculated within seconds, providing an efficient tool for quick plan quality checks.
However, as shown by our results and the previous study (15), the KBP method failed to spare the spinal cord compared with MPs and EXPs. This may be because only the primary lesion of the nasopharynx was involved in the DVH prediction model, and the influence of a positive cervical lymph node target was not considered. Recently, Zhang et al. (31) proposed an improved model building method utilizing a so-called generalized distance-to-target histogram to capture the geometric relationships of an OAR with multiple PTVs. This may provide a potential solution for generating a more accurate DVH prediction model for NPC. More research is warranted.
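The stage-dependent score thresholds discussed earlier in this section (about 150 points for T1-T2 and about 140 points for T3-T4) lend themselves to a simple automated check. The sketch below is hypothetical: the function name and the exact cut-offs are illustrative, and in practice the thresholds would need tuning to local scoring criteria.

```python
# Approximate acceptance thresholds taken from the discussion (PQM points out of 200).
REVIEW_THRESHOLD = {"T1": 150, "T2": 150, "T3": 140, "T4": 140}

def needs_quality_review(t_stage, pqm_score):
    """Return True when a plan's PQM score falls below the threshold for its T stage."""
    return pqm_score < REVIEW_THRESHOLD[t_stage]

# Hypothetical scores for illustration.
print(needs_quality_review("T2", 146.0))  # below 150, so flagged for review
print(needs_quality_review("T4", 146.0))  # above 140, so not flagged
```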
Our results confirmed that traditional manual planning was operator- and experience-dependent. Compared with the junior planner, the experienced dosimetrist was able to produce a high quality plan in a shorter period of time. The KBP method makes full use of prior knowledge and can generate a plan with quality comparable to that of a senior dosimetrist. However, as commented by Chang et al. (15), the KBP method cannot fully replace experienced planners, but works more as an aid to guide planners of varied skill levels, especially junior planners, to obtain a qualified plan more efficiently. By using KBP, the plan quality variation among planners was minimized, thus improving the overall plan quality in a systematic way.

CONCLUSIONS
This study provided evidence that the automatic KBP method can produce clinically acceptable IMRT plans with quality comparable to manual plans for NPC cases. The quality metric helped to quantify the plan quality for a more intuitive evaluation of the planned dose distribution, providing a potential tool for quick plan quality checks.

DATA AVAILABILITY STATEMENT
The datasets of this research are backed up on the Research Data Deposit (RDD, https://www.researchdata.org.cn, approval number: RDDA2020001752) and are available on reasonable request.

ETHICS STATEMENT
Our study was reviewed and approved by the IRB committee of Sun Yat-sen University Cancer Center, with the approval number of B2019-131-01. Written informed consent was obtained from the participants of this study.

AUTHOR CONTRIBUTIONS
ZQ designed and supervised the study. ZQ, BL, and JH developed the automatic KBP method. JH, BL, and WX collected and analyzed the data. JZ, XY, HG, MW, and YW provided technical assistance for the study. ZQ, JH, BL and WX wrote the manuscript. All authors contributed to the article and approved the submitted version.