Retrospective Clinical Evaluation of a Decision-Support Software for Adaptive Radiotherapy of Head and Neck Cancer Patients

Purpose: This study aimed to evaluate the clinical need for an automated decision-support software platform for adaptive radiation therapy (ART) of head and neck cancer (HNC) patients.
Methods: We tested RTapp (SegAna), a new ART software platform for deciding when a treatment replan is needed, by retrospectively investigating data from a set of 27 HNC patients. For each fraction, the software estimated key components of ART, such as the daily dose distribution and the cumulative doses received by targets and organs at risk (OARs), from daily 3D imaging in real time. RTapp also included a prediction algorithm that analyzed dosimetric parameter (DP) trends against user-specified thresholds to proactively trigger adaptive re-planning up to four fractions ahead. The DPs evaluated for ART were based on treatment planning dose constraints. Warning (V95<95%) and adaptation (V95<93%) thresholds were set for PTVs, while OAR adaptation dosimetric endpoints of +10% (DE10) were set for all Dmax and Dmean DPs. Any threshold violation at end of treatment (EOT) triggered a review of the DP trends to determine the threshold-crossing fraction Fx at which the violation occurred. The prediction model accuracy was determined as the difference between calculated and predicted DP values with 95% confidence intervals (CI95).
Results: RTapp was able to address the needs of treatment adaptation. Specifically, we identified 18/27 studies (67%) as violating PTV coverage or parotid Dmean constraints at EOT. Twelve PTVs had V95<95% (mean coverage decrease of −6.8 ± 2.9%), including six flagged for adaptation at median Fx = 6 (range, 1–16). Seventeen parotids were flagged for exceeding Dmean dose constraints, with a median increase of +2.60 Gy (range, 0.99–6.31 Gy) at EOT, including nine with DP>DE10. The differences between predicted and calculated PTV V95 and parotid Dmean were up to 7.6% (mean ± CI95, −2.7 ± 4.1%) and 5 Gy (mean ± CI95, 0.3 ± 1.6 Gy), respectively. The most accurate predictions were obtained closest to the threshold-crossing fraction. For parotids, Fx ranged between fractions 1 and 23, and the lack of a specific trend indicates that the need for treatment adaptation may need to be verified at every fraction.
Conclusion: Integrated into an ART clinical workflow, RTapp aids in predicting whether a specific treatment will require adaptation up to four fractions ahead of time.


INTRODUCTION
The use of intensity-modulated radiation therapy (IMRT) and volumetric-modulated arc therapy (VMAT) techniques to treat head and neck cancer (HNC) enables the delivery of highly conformal radiotherapy (RT) treatments with complex dose distributions. Initial issues in RT delivery, such as patient setup and localization errors, were addressed by the International Commission on Radiation Units and Measurements (ICRU) in the 1980s and 1990s with the recommendations for the delineation of gross tumor volume (GTV), clinical target volume (CTV), and planning target volume (PTV) structures (1)(2)(3) and minimized by the continuous improvement of imaging modalities for patient setup verification. The latest image-guided radiation therapy (IGRT) solutions have pushed the limits of reducing PTV-to-CTV margins to only a few millimeters (4)(5)(6), generating a greater sparing of organs at risk (OARs). However, these smaller margins leave very little room for errors, as an inadequate PTV coverage could lead to treatment failure. The proximity of critical structures to GTVs, the change in volume, the displacement of targets and OARs, and weight loss during treatment are now the new challenges faced by radiation oncologists, as they all constitute risks for target underdosage, leading to possible local failure or radiation toxicities (7)(8)(9)(10)(11). Successful strategies to improve HNC patients' quality of life after RT include sparing the parotid glands (PGs) and the mandible to decrease the risks of xerostomia (12) and osteoradionecrosis (13).
Adaptive radiotherapy (ART) involves all methods that aim to adapt RT treatments and delivered dose distributions to any specific patient anatomical changes (14). The main potential benefits of ART are to ensure the adequate dosimetric coverage of targets and to limit OAR doses throughout the treatment, assuming that it will increase the therapeutic ratio and provide better outcomes for cancer patients (15). ART encompasses offline, online, and real-time strategies to mitigate the effects of anatomic shifts and setup errors (16). While both offline and online ART involve imaging to review the current anatomy and to assess the need for a new plan, the most common ART methods allowed by current technologies for HNC patients are variants of offline strategies (17)(18)(19)(20)(21)(22), as these are well suited for the slowly progressing nature of anatomic changes observed during HNC RT treatments (8,20,23). Online methods are more appropriate to address the effects of stochastic patient setup errors (24,25), such as those caused by the inter-fractional variation in shoulder position (26,27) or the loose fitting of the immobilization mask due to weight loss. They are, however, only commercially available for HNC treatment on dedicated adaptive RT systems such as the Halcyon with Ethos (28) (Varian, Palo Alto, CA), Radixact (29) (Accuray, Sunnyvale, USA), or the MRIdian (Viewray, Cleveland, USA) and Unity (Elekta, Stockholm, Sweden) combined MRI-linac platforms (30). MR-guided online ART is a promising avenue for HNC treatment, as it would allow for daily online adaptation (important for fast-responding HN tumors) and may allow for the online monitoring of tumor response with functional MRI protocols without the additional dose delivered with nuclear imaging (30). Real-time ART introduces additional sophisticated patient monitoring to correct intra-fractional anatomic shifts in real time during treatment delivery (31) and seems excessive for HNC treatments. In contrast, offline ART can readily be implemented with the current basic clinical RT treatment resources.
Clinical offline ART workflows follow four key steps: imaging, assessment, re-planning, and quality assurance (QA) (16). It is recommended that HNC patients be monitored with CT or CBCT imaging acquired frequently (daily or weekly) throughout the treatment (32,33). These images are reviewed by a radiation oncologist, who then decides whether the treatment plan is to be adapted. The re-planning decision is based on a review strategy, which typically consists of assessing anatomic variations and their impact on the dosimetry of targets and OARs. After the registration of the periodic CBCT images to the initial plan CT, an initial qualitative evaluation visually compares the structure contours from the periodic CBCT images against pretreatment volumes. Subsequent quantitative assessments require specialized software tools to compute similarity and distance metrics between the initial volumes and the new structures and to estimate the treatment dose from the most current patient anatomy. The determination of patient- and plan-specific thresholds based on treatment site, fractionation, and outcome is key to optimizing the ART workflow and to providing an individualized approach well suited for HNC patients. However, the ART tasks performed after periodic patient imaging require several hours of work from expert physicians, physicists, and dosimetrists, consuming resources that most radiation oncology facilities cannot afford. As no commercial automated offline ART workflow is yet available with gantry-mounted linacs, quantitative changes for targets and OAR structures of interest cannot be estimated in a feasible time for the majority of HNC patients. The time-consuming nature of offline ART workflows leads to delays until the new plan is available for treatment. Consequently, clinicians might continue to administer RT according to the original treatment plan, which may reduce the treatment efficacy that the plan adaptation was meant to restore.
Therefore, there is a current need for an automated and quantitative framework that will process the daily imaging, generate the contoured structures, compute the dose to be received by the structures of interest, and predict if a re-planning is required to maintain the current plan quality. Such automated workflow would ideally include the implementation of predictive models to allow for the instantiation of clinical adaptive re-planning ahead of time.
This manuscript reports on our experience with a newly developed commercial decision-support software platform for ART, RTapp ™ (SegAna, Orlando, FL), which automatically tracks and analyzes daily anatomical changes throughout an entire course of RT and predicts when treatment plans will exceed dose constraints. Most software tasks are optimized to run on a graphics processing unit (GPU) and allow the presentation of the results in near real time (19). A feasibility study of the ART workflow introduced by RTapp was conducted retrospectively with HNC patient data to assess if RTapp could help determine the need for adaptive re-planning during treatment, based on a set of hypothetical PTV coverage and OAR dose thresholds.

Enrollment Criteria
An initial set of 81 HNC patients treated with external beam radiation therapy (EBRT) between January and December 2019 with VMAT was surveyed for the retrospective analysis, under Institutional Review Board (IRB) protocol (# LU213253). Exclusion criteria included patients who received prior HNC RT (n=2) or EBRT with sequential boost (n=4) or treatment adaptation (n=4), as these require a new CT scan; patients not imaged with daily kV-CBCT (n=4); and any patient for whom complete PTV and PG volumes were not included in the CBCT field of view (n=24) for all treatment fractions, due to the need to track dose constraints based on full volume coverage. Additional exclusion criteria were applied to sinus and nasal cavity sites (n=3) and lips (n=3), as these target locations were initially of limited interest for adaptation. The above selection criteria were fulfilled by 37 patients, out of which 27 were randomly selected for analysis. The diversity of HNC sites (Table 1) was representative of the HNC patient population treated at our institution. Table 2 summarizes the distribution of treatment prescriptions and targets included in this study.

Treatment Planning, Delivery, and Imaging
The treatment planning images were acquired on a 32-slice Siemens SOMATOM CT Open AS scanner (Siemens Healthineers, Erlangen, Germany) with a reconstructed slice thickness of 3 mm and metal artifact reduction (MAR) enabled by default. All HNC patients were immobilized with a Q-fix Fiberplast® Portrait S-frame Head and Shoulder thermoplastic immobilization mask (Qfix, Avondale, PA). The GTV and nodal CTV targets were contoured by the treating physician prior to applying 2-3 mm PTV margins, defined as follows: high-risk (HR) PTV, intermediate-risk (IR) PTV, and low-risk (LR) PTV. A dosimetrist contoured all normal structures and OARs. The plan optimization followed the list of dose constraints required by the treating physician for each individual plan. All patients were treated on a Varian TrueBeam linear accelerator (Varian Medical Systems, Palo Alto, CA) with 6 MV VMAT in 30-35 fractions. Daily patient setup and verification were performed with the On-Board kV Imaging (OBI) system. Each CBCT image set comprised 93 frames with 2 mm separation. Table 3 lists the planning dose constraints for our patient cohort. Each patient dataset, composed of treatment plan CT, structure set, 3D dose, and daily 3D kV-CBCTs, was anonymized prior to being exported as DICOM RT objects for processing by RTapp.

Overview of the Adaptive Software Platform
RTapp is a stand-alone and vendor agnostic application. As such, it can be employed with any treatment delivery platform, as long as 3D imaging is available for analysis.

Software Front-End
The front-end of the application (Figure 1) displays treatment-specific data, anatomic visualization windows (Figures 1A-F), and panels with graphical data (Figures 1G-I) to guide the ART decision-making process for the selected patient study.

RTapp Workflow
The main purpose of RTapp is to estimate the dose received by each structure at any treatment time point. A predictive algorithm analyzes the trend of user-defined, structure-specific dosimetric parameters (DPs) against predetermined dosimetric endpoint (DE) values to forecast if, and if so when, any dose constraint would be violated. The automated RTapp workflow can be divided into three steps, as outlined in Figure 2.
Step 1. For each treatment fraction, the initial treatment plan CT images and structures are deformed to match the daily setup CBCT images. An optical flow-based deformable image registration (DIR) algorithm (34) automatically registers the initial and daily 3D image sets and generates a deformation vector field (DVF).
Step 2. The DVF is then employed to deform the initial plan structures and dose into daily deformed structures and dose, as described in Qi et al. (19). The deformation results can be verified via DIR confidence metrics.
Step 3. The daily dose distribution within any structure is calculated from the deformed structures and dose. The dose volume histograms (DVHs) for the day of treatment (DVH day ) and up to the current treatment time are generated from the estimated dose distribution ( Figure 1G). The day of treatment DVH represents the daily dose scaled to the whole course of treatment, assuming that the daily anatomy would be maintained for the whole course of treatment. The sum DVH (DVH sum )  summarizes structure doses accumulated up to the current fraction, scaling the latest fraction dose to the remaining course of treatment, assuming that the most current structure anatomy and dose distribution would hold for the remaining treatment fractions. Specific dosimetric parameters DP day and DP sum are calculated from the DVH day and DVH sum to populate dose trend graphs displayed on the software front-end ( Figure 1H). A linear predictive model analyzes the trends of DP sum to forecast their values over the next four fractions, hence providing quantitative data to guide the decision to adapt proactively.
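The scaling and forecasting logic of Step 3 can be illustrated with a minimal Python sketch. It is not RTapp's implementation: it assumes an additive DP such as D mean, whose per-fraction contributions can simply be summed, and it assumes that the linear model is fitted by least squares to all fractions processed so far. The function names `dp_sum_series` and `forecast_linear` are hypothetical; the four-fraction horizon and the five-fraction minimum are the values reported in the text.

```python
import numpy as np

def dp_sum_series(daily_contributions, total_fractions):
    """Projected end-of-treatment value of an additive DP after each fraction.

    daily_contributions[k] is the per-fraction contribution (e.g. mean dose in Gy)
    estimated on the deformed anatomy of fraction k + 1 (assumed input format).
    """
    daily = np.asarray(daily_contributions, dtype=float)
    delivered = np.cumsum(daily)  # dose accumulated up to each fraction
    remaining = total_fractions - np.arange(1, daily.size + 1)
    # The latest fraction is assumed to repeat for all remaining fractions.
    return delivered + daily * remaining

def forecast_linear(dp_sum, horizon=4, min_fractions=5):
    """Extrapolate a DP sum trend `horizon` fractions ahead with a straight-line fit."""
    dp = np.asarray(dp_sum, dtype=float)
    if dp.size < min_fractions:
        return None  # too few treated fractions to generate a prediction
    fractions = np.arange(1, dp.size + 1)
    slope, intercept = np.polyfit(fractions, dp, deg=1)
    future = np.arange(dp.size + 1, dp.size + horizon + 1)
    return slope * future + intercept

# Usage: projected parotid D mean for a 30-fraction course, then a 4-fraction forecast.
trend = dp_sum_series([0.60, 0.61, 0.63, 0.64, 0.66], total_fractions=30)
print(forecast_linear(trend))
```

Non-additive parameters such as V 95 would have to be read from the accumulated dose distribution rather than summed, so only the forecasting step of this sketch would carry over to them.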

Implementation Environment
RTapp was tested as a standalone application installed on a Microsoft Windows 10 workstation equipped with a 2.3-GHz Intel Core i9 CPU, 32 GB RAM, and an Nvidia GeForce RTX 2070 (8 GB RAM). The processing of a single fraction data set (93 CBCT images and ~30 structures) typically took <1 min.

Evaluation of DIR Quality
The quality of the deformation was first assessed by visually comparing overlaid initial and deformed structures contours on the CBCT viewing panels ( Figures 1D-F). A quantitative evaluation was then performed with two DIR confidence metrics provided by RTapp, following the recommendations from the AAPM TG-132 report (35). Structures with normalized cross-correlation (NCC) values <0.85, or for which the displacement vector of the secondary image voxels of a structure exceeded a 7-mm "large" displacement threshold after deformation, were automatically flagged for review. While the NCC threshold was hard coded into RTapp, the pixel displacement threshold of 7 mm was selected, as it qualitatively provided the best trade-off between too many flags (<5 mm) and missing registration errors (>9 mm) due to positioning errors on an initial test HNC patient data set. The DIR algorithm parameters were adjusted before reprocessing a fraction when the flagged structures deformations were assessed as inaccurate after user review.
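The two flagging rules can be written down as a short Python sketch. It is an illustration under assumptions rather than RTapp's code: the NCC is computed here as the zero-mean normalized cross-correlation over the voxels of one structure, and the displacement rule is applied to the largest displacement magnitude within the structure (the text does not state whether the maximum or another statistic is used). The names `ncc` and `flag_structure` are hypothetical; the 0.85 and 7-mm thresholds are those reported above.

```python
import numpy as np

NCC_MIN = 0.85              # NCC threshold reported in the text
MAX_DISPLACEMENT_MM = 7.0   # "large" displacement threshold reported in the text

def ncc(reference, deformed):
    """Zero-mean normalized cross-correlation between two voxel intensity arrays."""
    a = np.asarray(reference, dtype=float).ravel()
    b = np.asarray(deformed, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def flag_structure(reference, deformed, dvf_mm):
    """Return True when a structure should be sent for user review.

    dvf_mm: (N, 3) array of displacement vectors in mm for the structure voxels
    (assumed layout).
    """
    score = ncc(reference, deformed)
    max_disp = float(np.linalg.norm(np.asarray(dvf_mm, dtype=float), axis=1).max())
    return score < NCC_MIN or max_disp > MAX_DISPLACEMENT_MM
```

Keeping the displacement threshold as a named constant mirrors the fact that, unlike the hard-coded NCC limit, it was tuned on an initial test data set.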

Dosimetric Metrics and Thresholds for Target Coverage and OARs
The structures monitored for the retrospective study were all PTV, CTV, GTV, and nodal targets, and parotid glands (PGs), spinal cord, cochleae, brainstem, mandible, esophagus, and larynx when applicable. The DPs evaluated against the need for adaptation were based on the dose constraints required for the treatment planning of HNC at our institution ( Table 3). All cases had identical requirements for the PTVs (V 95 > 95% of prescription dose; maximum dose D max < 110%) and for the PGs (D mean < 20 Gy). Additional PG DPs were defined specifically for this study: D mean < 21 Gy for cases where the dosimetrist could not keep the mean ipsilateral PG dose below 20 Gy due to the overlap with IR PTVs, and D mean < 26 Gy for spared contralateral PGs. Other OAR constraints varied per plan.
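For readers less familiar with these parameters, the sketch below shows how V 95, D mean, and D max could be computed from the doses of the voxels inside one structure. It assumes equally sized voxels and a point maximum for D max; the function names are hypothetical and the code does not reproduce how RTapp derives these values from its DVHs.

```python
import numpy as np

def v95_percent(voxel_dose_gy, prescription_gy):
    """Percentage of the structure volume receiving at least 95% of the prescription."""
    dose = np.asarray(voxel_dose_gy, dtype=float)
    return 100.0 * float(np.mean(dose >= 0.95 * prescription_gy))

def d_mean_gy(voxel_dose_gy):
    """Mean structure dose in Gy (equal voxel volumes assumed)."""
    return float(np.mean(np.asarray(voxel_dose_gy, dtype=float)))

def d_max_percent(voxel_dose_gy, prescription_gy):
    """Maximum voxel dose expressed as a percentage of the prescription dose."""
    return 100.0 * float(np.max(np.asarray(voxel_dose_gy, dtype=float))) / prescription_gy
```

With these definitions, the PTV planning requirements quoted above read `v95_percent(...) > 95` and `d_max_percent(...) < 110`, and the default PG requirement reads `d_mean_gy(...) < 20`.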

Adaptive Review Strategy
Each patient data set was fully processed with RTapp. The DVH day and DVH sum generated by RTapp for the final dose (Figure 1G) were compared to the initial plan DVH (DVH plan). Any violation of dose constraints at end of treatment (EOT) was tallied as a potential case for adaptation. For every dose constraint violation, the DP sum trend (Figure 1H) was reviewed to determine the fraction at which the violation occurred. For this work, a hypothetical "warning" threshold (V 95 < 95%) was chosen to investigate the impact of daily setup variation on PTV coverage, and an "adaptation" threshold 2% lower (V 95 < 93%) was set to trigger a review of the patient's anatomy for replanning. A hypothetical OAR "adaptation" dosimetric endpoint of 10% (DE 10) was set uniformly for all D max and D mean DPs. A set of HNC structure-specific endpoints, which lists the above structures and associated adaptation thresholds, was saved in RTapp to conduct this retrospective work.
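The review thresholds above translate into a few simple checks, sketched here in Python. This is a minimal illustration, not the software's logic. In particular, it assumes that DE 10 means 10% above the planning dose constraint (the text does not state the reference value explicitly), and the function names are hypothetical.

```python
def ptv_status(v95_percent):
    """Classify PTV coverage against the warning (95%) and adaptation (93%) thresholds."""
    if v95_percent < 93.0:
        return "adaptation"
    if v95_percent < 95.0:
        return "warning"
    return "ok"

def oar_exceeds_de10(dp_value_gy, constraint_gy):
    """True when an OAR D max or D mean exceeds its constraint by more than 10%.

    Assumption: the 10% deviation is taken relative to the planning constraint.
    """
    return dp_value_gy > 1.10 * constraint_gy

def first_crossing_fraction(v95_trend, threshold=93.0):
    """Return the first fraction (1-based) at which V95 falls below the threshold."""
    for fraction, value in enumerate(v95_trend, start=1):
        if value < threshold:
            return fraction
    return None  # threshold never crossed

# Usage: an end-of-treatment V95 of 92.86% falls in the adaptation range.
print(ptv_status(92.86))
```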

Prediction Model Accuracy
The accuracy of the prediction model was evaluated by calculating the difference between the DP sum value at the fraction when it violated the adaptation threshold and the predicted pDP sum values from the four fractions preceding the violation time point. The difference in DP sum was averaged over all patient studies flagged for adaptation to calculate the 95% confidence interval, as summarized by Equation 1. For a particular DP sum value crossing a threshold at fraction Fx, the difference in DP sum from a predicted pDP sum value based on processed fraction [Fx − i] with i = [4,3,2,1], averaged over all n flagged studies, is given by:
Finally, this retrospective work was devised to help provide estimates of the proportion of HNC patients expected to need ART and to gather information on the magnitude of the differences observed between planned and actual DP values, in order to help better plan future prospective analyses with this software. As such, a potential clinical ART workflow integrating RTapp in the treatment of HNC patients was proposed based on our experience.
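As a sketch of the accuracy measure defined under Prediction Model Accuracy, the mean difference and its interval can be written as follows. This is one plausible form consistent with the verbal definition, not necessarily the authors' exact expression: the study index k, the sample standard deviation s_i, and the use of a Student t quantile for the 95% interval are assumptions introduced here.

```latex
\overline{\Delta DP_{sum}}[F_x - i]
  = \frac{1}{n}\sum_{k=1}^{n}
    \left( DP_{sum,k}[F_x] - pDP_{sum,k}[F_x - i] \right),
  \qquad i \in \{4, 3, 2, 1\}

CI_{95}[F_x - i]
  = \overline{\Delta DP_{sum}}[F_x - i]
    \pm t_{0.975,\,n-1}\,\frac{s_i}{\sqrt{n}}
```

With this sign convention, a negative mean difference means the prediction was higher than the value calculated at the threshold-crossing fraction.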

Adaptive review
The retrospective analysis with RTapp reported 18/27 patient studies (67%) that failed to meet at least one dose constraint at EOT. The flagged structures were 12 PTV targets and 17 PGs. Other structure DPs remained below their warning threshold values throughout the treatments. Box plots summarizing the differences between the DVH plan and the DVH sum derived PTV D max and V 95 , and PG D mean and spinal cord D max are shown in Figure 3. Overall, PTV dosimetry and coverage decreased while PG and spinal cord doses increased during RT, with the latter remaining below thresholds.

Targets Coverage
The difference in V 95 between planned and EOT DP sum values ranged from 0 to −4.9% for HR PTVs, +0.4 to −13.7% for IR PTVs, and −0.1 to −7.1% for LR PTVs. The difference in PTV D max ranged from +0.74 to −3.7 Gy. While all PTVs met the minimum V 95 > 95% coverage requirement after treatment planning, 12 PTVs belonging to nine patients were flagged for under-coverage (V 95 < 95%) at EOT. Table 4 summarizes the changes in target coverage. The mean PTV coverage decrease was ΔV 95 = −6.8 ± 2.9%, resulting in a mean final V 95 = 91.8 ± 2.9%. Six flagged targets were IR PTVs, accounting for the initial GTVs, involved nodal basin, and areas of microscopic spread (studies 94, 118, 146, 46, 19, and 18). Four flagged PTVs were covering postoperative beds (studies 15, 8, and 99), while the last three flagged PTVs were covering low-risk nodal basin (studies 8, 18, and 19). The example shown in Figure 1, from study 118, illustrates the effect of internal anatomical changes and weight loss on the last fraction CBCT (Figures 1D-F). The inward shift and the regression of the deformed PTV contour are responsible for the loss of coverage (dotted line DVH sum on Figure 1G) for IR PTV63, with an EOT V 95 value of 92.86%.

Trend Analysis of Target Dosimetric Parameters
The threshold-crossing fraction for the V 95 < 95% threshold was extracted from the automated fraction processing reports (Table 4, "Fraction for V 95 < 95%"). The results hinted at a clustering of PTV coverage constraint violations occurring either during the first three fractions (early) or after the second quarter of the treatment (late), with median final V 95 values of 91.6% and 92.5%, respectively. However, an independent samples two-tailed t-test for equality of the mean final V 95 values of both groups indicated no significant difference (p = 0.799). The offline review of daily setup CBCT images revealed that local setup errors were responsible for the lack of V 95 coverage at the first fraction, due to head and mandible tilts (studies 18 and 19) and shoulder misalignment (studies 15, 99, and 146). The fractions at which a specific PTV crossed the V 95 < 93% "adaptation" threshold were determined from their respective V 95 DP sum trend graphs. Figure 4A shows an example dose trend from study 118, where the PTV coverage decreased during treatment and crossed the 93% "adaptation" threshold (indicated by the red background) at fraction 26, as pointed out by the arrow. Six PTVs from studies 8 (PTV60 and PTV54), 15 (PTV60), 18 (IR PTV63), 19 (LR PTV56), and 118 (IR PTV63) were flagged for adaptation before EOT with V 95 < 93% at a median fraction of 6 (range, 1-16).

Parotid Mean Dose
The difference between the planned PG D mean and the DP sum value estimated at EOT by RTapp ranged from −5.6 to +6.3 Gy, with 74% of the parotids showing an increase in overall PG dose. RTapp reported 13 patients (17 flagged PGs) with at least one PG exceeding its dose constraint. Eleven studies had at least one PG crossing the 20 Gy D mean threshold at EOT. Three cases with an initial PG D mean between 20 and 21 Gy were reported for exceeding a 21-Gy D mean by EOT. Three additional contralateral PGs for which the initial plan kept the PG D mean < 26 Gy were also flagged. The median increase in PG mean dose at end of treatment was +2.60 Gy (range, 0.99-6.31 Gy) for all 17 flagged PGs. The mean differences between start and EOT for PGs violating the 20, 21, and 26 Gy D mean constraints were +2.92, +3.49, and +2.51 Gy, respectively.
FIGURE 4 | Panels (C, D) illustrate the trend prediction in DP sum for the right PG. The data to the left of the current fraction represent calculated values from daily deformed anatomy. The data to the right predict the variations of the DP day and DP sum for the next four fractions.
The fraction at which the coverage crossed the 95% warning threshold and the corresponding cumulative dose were identified from the V 95 trend.

Trend Analysis of Parotid Dosimetric Parameters
The fraction at which PG mean doses exceeded their respective 10% deviation thresholds was obtained from the automated fraction processing reports (Table 5, "fraction for D mean > DE"). The flagged studies can be divided into two independent groups. (1) Eight patient studies had a PG DP day failure occurring within the first two treatment fractions, with an average PG mean dose difference ΔD mean = 4.05 ± 1.46 Gy. These occurred too early to result from radiation treatment and were most likely due to patients relaxing in their immobilization mask, leading to inconsistent patient setup throughout the course of treatment.
(2) The second group comprises nine studies with endpoint failures occurring later during treatment (median threshold-crossing fraction Fx=20; range, ), with an average PG mean dose difference ΔD mean = 1.98 ± 0.78 Gy, likely to result from gradual body weight loss and internal anatomical shifts induced by radiation treatment response. Figure 4B presents the D mean trend from the right PG (study 81), indicating that the 26 Gy DE 0 was exceeded at Fx = 8. An independent samples two-tailed t-test for equality of the mean PG dose difference between the early and late groups showed significance (p = 0.005). Nine PGs were flagged for adaptation with PG D mean > DE 10 before EOT, with average PG mean dose differences ΔD mean = 4.68 ± 1.11 Gy and ΔD mean = 2.54 ± 0.78 Gy for the early (N=6) and late (N=3) groups, respectively.

Predictive Model
The predicted trend at fraction 5 (extended dotted line to the right of the white vertical line on Figure 4C) indicated that the right PG D mean would cross 26 Gy at fraction Fx=8. The accuracy of the prediction model was estimated for all flagged studies except those with identified Fx < 5 (in Tables 4, 5), as the model requires a minimum of five treated fractions to generate predictions. The overall difference between measured and predicted PTV V 95 ranged from −7.6% to 0.0% (mean ± CI 95, −2.7% ± 4.1%), with the largest differences observed for predictions made four fractions ahead (mean, −3.2%; CI 95, −7.2%, 1.1%) and the most accurate predictions (mean, −2.2%; CI 95, −5.2%, 0.8%) obtained closest to the threshold-crossing fraction. Figure 5A summarizes the mean V 95 difference results as a function of temporal proximity, defined as the time interval between the threshold-crossing fraction (Fx) and the last processed fraction (Fx − i) providing the prediction V 95 [Fx − i], with i progressing from 4 to 1. Uncertainties are reported as 95% confidence intervals (CI 95). All model predictions overestimated V 95 coverage values. The variation in prediction accuracy for the PG D mean is presented in Figure 5B. The overall difference between the measured and predicted PG D mean values ranged from 0.0 to +5.0 Gy (mean ± CI 95, 0.2 ± 1.6 Gy). The largest differences were calculated for predictions four fractions ahead (mean, 0.65 Gy; CI 95, −1.83 Gy, 3.13 Gy), while the highest accuracy was obtained within two (mean, 0.17 Gy; CI 95, −1.09 Gy, 1.43 Gy) to one fraction ahead (mean, 0.14 Gy; CI 95, −0.93 Gy, 1.22 Gy).

Performance of DIR Algorithm
The automatically calculated NCC and distance confidence metrics quantitatively identified the structures with questionable deformations and helped to confirm our observations from the qualitative visual review. After several iterations, a single DIR algorithm configuration was found to optimally process all patient data sets without user adjustment. Seventeen patients from our cohort had dental implants that created streak and beam hardening artifacts on daily CBCT images. The registrations appeared robust against imaging artifacts for the PTV and mandible contour deformations of all 17 patients. Figure 6 illustrates the deformation accuracy of PTV, mandible, and parotid contours in the presence of severe artifacts. The only issues encountered when reviewing DIR results were misidentifications of the immobilization mask as the external body contour. These mainly occurred during the second half of treatments, after patients' weight loss created air gaps between the skin surface and the shell of the mask.

DISCUSSION
This manuscript is the first to report on an automated platform-agnostic commercial software (RTapp) designed to provide quantitative data to support the adaptive re-planning decision process. The retrospective analysis of 27 HNC patients with the first version of RTapp demonstrated that PTVs and parotid structures would most likely require daily monitoring for adaptation throughout the whole course of RT. The review of daily alignments between CBCTs and the planning CT revealed clear evidence of gradual body weight loss and internal anatomical changes throughout treatment for the 18 flagged patients. Our observations agree with published studies on ART for HNC, which reported on the reduction in target coverage (9,25,26) and on the increase in PG dose (18,36) and spinal cord dose (9,26) without adaptation. It is highly likely that these 18 patients would have dosimetrically benefited from ART if daily information on the dosimetric impact of anatomical changes had been readily available during treatment.

Impact of Sub-Optimal Immobilization
Several patients were flagged for adaptation during the first few treatment fractions. Reviewing daily patient set-up images revealed that their shoulders' position differed significantly from that seen on their planning CT. The immobilization device clearly failed to provide appropriate and reproducible support for the shoulders, with a direct impact on the dose coverage of the inferior cervical lymph nodes included in IR PTVs and on the PG dosimetry due to the large elasticity of the head and neck tissue, which resulted in a high variability observed in the DP day trends for these patients. In the example from Figure 4A, pretreatment setup images revealed large daily variations in head rotation and shoulder positions, indicating that the patient was able to gradually move within her immobilization mask. This effect of gradual patient weight loss led to the continuous decrease in PTV V 95 demonstrated by the DP sum trend and crossing the adaptation threshold at fraction 26. Adapting this patient's RT treatment with a new immobilization would have improved setup reproducibility and raised the PTV coverage.
FIGURE 5 | Differences between measured and predicted values at threshold crossing for flagged studies for (A) PTV V 95 and (B) PG D mean. Fx identifies the threshold-crossing fraction. Fx-i represents the last fraction that was processed to obtain the model predicted value. In this case, Fx-4 represents the first fraction for which a predicted pDP sum value was available, while Fx-1 identifies the fraction that directly precedes the threshold-crossing fraction. The confidence bands and error bars indicate the 95% confidence interval.

Predictive Model
The  (37,38), tumor response (37,39), and OAR dose accumulation (38,40). The approach introduced by McCulloch et al. (40) was the closest to that implemented in RTapp. It predicts specific dose metric values at EOT based on accumulated doses calculated from daily CBCT anatomy. Their model achieved >95% sensitivity and specificity to detect a need for adaptation with predictions based on a minimum of 10 and 15 treated fractions. However, their method involved time-consuming manual steps to generate the predictions and only provided the deviation between planned and received dose at a single time point. In contrast, RTapp relies solely on individual patient data to generate predictions on a per-fraction basis, in real time, without user intervention. The current prediction model accuracy could be improved with the implementation of a multiple regression model accounting for parameters easily available within RTapp, some of which have been shown to correlate with change in OAR dosimetry (41), or with the occurrence of locoregional control for oropharyngeal cancers (36,42,43) and incidence of xerostomia (44). Ultimately, machine-learning-based prediction methods might provide the most accurate trend of OAR and target DPs (45,46).

Selection of Appropriate Dose Metrics and Deviation Thresholds
The "warning" and "adaptation" threshold for specific DPs are at the core of the automated adaptive decision-making process.
There is no consensus on which DPs and deviation thresholds are the most appropriate for triggering adaptation. Therefore, the hypothetical limits to initiate a plan review or adaptive replanning defined in this work were based on physicians'

ART Clinical Workflow With RTapp
A proposed clinical ART workflow integrating RTapp to monitor patients for potential adaptive re-planning can be divided into three successive steps prior to treatment delivery, as described in Figure 7. First, the accuracy of the deformation is evaluated by DIR metrics, with the option to adjust the DIR parameters and reprocess daily setup images instantly. Second, the treating therapy staff at the control console compare DP day values to warning and action thresholds and decide whether to adjust the daily patient setup. Third, the treatment team can be alerted to review trends and predictions of any DP sum if a warning or adaptation threshold is reached. Such information can help with the decision to trigger adaptive re-planning or to carry on with the current treatment plan. Patients' alignment could potentially be improved daily, or in some cases, urgent adaptive re-planning may take place while minimizing treatment postponement. The proposed workflow is not limited to HNC sites and could be applied to pelvic cancer sites, for which differences in daily bladder and rectal filling may impact the dosimetry of targets and OARs, or to non-small cell lung cancers of stage II and above, to help limit the irradiation of healthy lung tissue caused by shrinking tumors.
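The threshold comparison in the second and third steps can be sketched as follows. This is an illustration under stated assumptions rather than RTapp's implementation: the PTV thresholds are those used in this study (warning at V95 < 95%, adaptation at V95 < 93%), while the OAR logic assumes that a warning corresponds to exceeding the planning dose constraint and adaptation to exceeding it by more than 10% (DE10). The names `Flag`, `classify_ptv_v95`, and `classify_oar_dose` are hypothetical.

```python
from enum import Enum


class Flag(Enum):
    OK = "ok"
    WARNING = "warning"
    ADAPTATION = "adaptation"


def classify_ptv_v95(
    v95_percent: float, warning: float = 95.0, adaptation: float = 93.0
) -> Flag:
    """Flag PTV coverage: lower V95 is worse."""
    if v95_percent < adaptation:
        return Flag.ADAPTATION
    if v95_percent < warning:
        return Flag.WARNING
    return Flag.OK


def classify_oar_dose(
    dose_gy: float, constraint_gy: float, tolerance: float = 0.10
) -> Flag:
    """Flag an OAR Dmax or Dmean: higher dose is worse.

    ASSUMPTION: warning = dose above the planning constraint,
    adaptation = dose above constraint * (1 + tolerance), i.e. DE10.
    """
    if dose_gy > constraint_gy * (1.0 + tolerance):
        return Flag.ADAPTATION
    if dose_gy > constraint_gy:
        return Flag.WARNING
    return Flag.OK
```

For example, with a hypothetical parotid Dmean constraint of 26 Gy, an accumulated dose of 27 Gy would raise a warning, while 29 Gy (above 28.6 Gy) would be flagged for adaptation.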

Clinical Impact
Online ART is commercially unavailable for gimbal mounted linacs, which constitute the majority of medical linear accelerators installed worldwide (50). In a typical offline ART workflow, the successive steps of processing new 3D patient images (DIR between planning CT and latest CT images), re-contouring structures, re-planning, evaluating dosimetric and volumetric changes, and QA usually take 2-5 days. RTapp could potentially optimize clinical resources for ART, saving hours of dosimetrist, physicist, and physician work by processing setup images and reporting dosimetric and volumetric changes in less than a minute. Such quantitative information is currently only available on dedicated online ART treatment platforms, which can perform dose recalculation within times ranging from 15 to 60 min (30,51). Implementing the predictive model would further optimize the use of resources for adaptive re-planning by allowing a new treatment plan to be generated before a patient meets the requirements to trigger adaptation. In addition, vendor-agnostic ART decision support software applications provide several advantages compared to fully dedicated online ART systems: they are readily deployable with any treatment platform equipped with 3D imaging capabilities and make use of resources already available clinically. Finally, their cost effectiveness makes them particularly attractive for bringing ART to patients in remote rural regions hours away from large academic centers and in low- and middle-income countries.

Limitations
The small sample size resulted in a low number of studies flagged for adaptation and in a large CI95 for the estimation of the prediction model accuracy. This is in part due to the exclusion of patients with only partial PTV or PG volumes in the CBCT field of view (FOV) from our original cohort. The limited FOV along the superior-inferior axis of current CBCT imaging systems is an important technological limitation for the proposed method and would require the choice of adaptive DE independent of total structure volumes. However, full structure volumes could be recovered by acquiring two CBCTs, to be merged prior to processing by RTapp at the cost of increased imaging dose, or with new machine learning-based image processing methods that estimate the position of structures outside the FOV of a single CBCT based on the visible anatomy. The first iteration of the RTapp software employed for this study did not have the capability to export the deformed CT and structures data set. Therefore, we could not compare RTapp's estimated dosimetric parameters to those that could have been obtained from an actual dose recalculation with the original treatment plan.

Future Work
Once the capability to export deformed data sets is functional, the software performance will be established by evaluating sensitivity, specificity, positive and negative predictive values, and accuracy in determining the need for adaptation for HNC patients based on the DE chosen for this work. Such a step is a prerequisite to conducting an observational clinical study aimed at comparing the traditional offline ART workflow, which involves physician-identified cases, to a hybrid RTapp-based workflow such as the one proposed in Section 4.4.
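For reference, the performance measures listed above follow from a standard two-by-two confusion matrix in which the software's adaptation flag is compared against a reference determination (for instance, an actual dose recalculation). The sketch below uses the textbook definitions; the function name `diagnostic_metrics` is hypothetical and the counts in the usage note are illustrative, not study results.

```python
from typing import Dict, Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Standard diagnostic accuracy measures.

    tp: flagged by the software and truly needing adaptation
    fp: flagged by the software but not needing adaptation
    tn: not flagged and not needing adaptation
    fn: not flagged but truly needing adaptation
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }
```

With made-up counts of 40 true positives, 10 false positives, 45 true negatives, and 5 false negatives, the positive predictive value is 40/50 = 0.80 and the accuracy is 85/100 = 0.85.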

CONCLUSION
A novel automated decision support software platform for ART was tested retrospectively with 27 HNC patients' data. Eighteen patients were flagged for adaptation at end of treatment. The trend of PTV coverage and parotid mean doses against specific dose metrics and deviation thresholds on a per-fraction basis demonstrated that RTapp could help identify when to trigger plan adaptation and potentially pro-actively predict when a physician might consider the need for treatment plan adaptation. The tools offered by RTapp have the potential to benefit any clinic equipped with a daily 3D imaging capability without adequate resources to provide ART for their HNC patients. The software platform evaluated provides all the tools and information necessary to design prospective studies aiming to test whether ART will improve outcome both for TCP and NTCP in a diverse range of cancer sites and fractionations.

(2) The treatment team at the console reviews the DP day values against warning thresholds and might reposition the patient. (3) The current and predicted DP sum trends are reviewed prior to treatment to assess whether treatment adaptation is needed within the next four fractions.

DATA AVAILABILITY STATEMENT
The de-identified raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Loyola University Internal Review Board. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
Conception and design of study: SG, AS, AB, and BE. Provision of study material and patients: AB and BE. Data collection: SG and BL. Data analysis and interpretation: SG, AB, BE, BL, and CJ. SG wrote the first draft of the manuscript. All authors contributed to the subsequent manuscript revisions and approved the submitted version.