Comparison of Automated Atlas-Based Segmentation Software for Postoperative Prostate Cancer Radiotherapy

Automated atlas-based segmentation (ABS) algorithms have the potential to reduce variability in volume delineation. Several vendors offer software that is mainly used for cranial, head and neck, and prostate cases. The present study compared the contours produced by a radiation oncologist with those computed by different automated ABS algorithms for prostate bed cases, including femoral heads, bladder, and rectum. Contours were compared using several metrics: volume ratio, Dice coefficient, and Hausdorff distance. Results depended on the volume of interest and showed some discrepancies between the different software packages. Automatic contours could be a good starting point for organ delineation, since efficient editing tools are provided by the different vendors. Automated segmentation should become an important aid for organ-at-risk delineation in the next few years.

Prostate bed radiotherapy after radical prostatectomy may offer clinical benefits in terms of outcome (1,2). Although intraoperative irradiation is a possible treatment modality (3), irradiation is mainly delivered by external beam radiotherapy. Advances in radiation oncology led to intensity-modulated radiotherapy (IMRT) and image-guided radiotherapy (IGRT). These advances make it possible either to increase the dose to target tissues or to spare surrounding healthy structures. The development of state-of-the-art technologies, including imaging modalities, treatment planning systems, and linacs, has enabled radiotherapy treatments to be highly specific (4). In this context, the delineation of target volumes and normal organs is a prerequisite input to the planning process. Consequently, the implementation of modern radiotherapy treatment plans highlights the need for contouring guidelines (5). A recent development in radiotherapy is the use of atlas-based auto-segmentation algorithms to aid in organ delineation (6). The aim of the study was to compare the different atlas-based auto-segmentation software available when used for the prostate bed and organs at risk. The study was limited to a single radiation oncologist to avoid inter-rater variations.
Indeed, significant levels of interobserver variability in target volume delineation have been demonstrated in prostate cancer radiotherapy (7-10). This variability is the most important source of uncertainty in radiotherapy (11,12). However, it is outside the scope of our study: at least four consensus guidelines originating from four scientific groups have been validated (13), so no single ground truth can be defined. The aim of our study was therefore to assess how well segmentation software can learn the habits of a single radiation oncologist and reproduce them on new patients.

Population and Treatment
Twenty consecutive patients treated in a clinical center for pT3aR0-R1N0M0 prostate cancer after surgery were included in this study from January to September 2015. They were treated by postoperative salvage IMRT. Treatment aimed at delivering 66 Gy to the prostatic bed, defined as the clinical target volume (CTV) (1). Computed tomography (CT) scans were contoured by a single physician according to the Radiation Therapy Oncology Group (RTOG) guidelines for target volumes (5). The following organs at risk were also delineated: bladder, rectum, and femoral heads (14).

Ethics
As French law (data, data-collection, and freedom law of January 6, 1978) allows single-center retrospective studies, no specific written informed consent was needed. All patients were orally informed that their already recorded data might be used for research.

Atlas-Based Auto-Segmentation Software
Five software packages were compared: WorkFlow Box (WFB; Mirada Medical), MIM Maestro (MIM Software), SPICE (Philips), ABAS (Elekta), and the atlas-based segmentation module included in RayStation (RaySearch Laboratories).
WFB is a black-box server that performs atlas-based contouring automatically and integrates into the existing clinical workflow via standard DICOM protocols. It uses a deformable registration algorithm to automatically apply contours to planning CTs based on multiple expert atlases. Alternatively, clinicians can define their own atlases. In the current study, atlases were based on patient contours delineated by the expert physician.
Auto-contouring is a feature of the MIM Maestro software. Automatic contours may be based on either user-defined atlas libraries or automatic atlas subject selection. The software includes features to sort atlases by TNM status, lesion laterality, or physician. When several atlases are chosen to start the auto-segmentation, one structure set is generated per atlas, and the results are combined to create simultaneous truth and performance level estimation (STAPLE) contours for each organ. STAPLE is an expectation-maximization algorithm that computes a probabilistic estimate of the true segmentation by weighting each segmentation according to its estimated performance level (15). MIM Maestro also provides tools to correct auto-contours and a scripting platform.
ABAS (Elekta) approximates the anatomical contours by scanning a library of reference images, applying elements of those references to a new patient image, and creating a structure set that fits the patient's anatomy. The user may either choose one atlas from the library or use the STAPLE algorithm. In this study, the STAPLE algorithm was used. The operator cannot see or edit the contours within ABAS, but contours may be imported into any contouring solution, such as Focal or Monaco among Elekta software.
SPICE (Smart Probabilistic Image Contouring Engine; Philips) is an option of Pinnacle, a treatment planning system. It computes contours from a probabilistic segmentation based on its own expert atlases, and users cannot import their own datasets to create another expert library. Consequently, only a limited number of treatment sites and organs are available. The transformation is based on a dense deformable registration method (Enhanced Demons), which then initializes organ-specific deformable models. The method is based on adaptation and probabilistic refinement (16).
In addition to plan design and optimization features, the RayStation treatment planning system (RS) provides an auto-segmentation solution based on the ANAtomically CONstrained Deformation Algorithm (ANACONDA). ANACONDA is a hybrid algorithm that combines image information (i.e., intensities) with anatomical information provided by contoured image sets (17). Model-based segmentation (MBS) and atlas-based segmentation (ABS) are both available. MBS includes models with adjustable shape, size, and property parameters provided by RayStation for the different organs at risk, including femoral heads and bladder. ABS requires user-defined atlases with image sets and contours. In this study, only ABS was used, even for femoral heads and bladder.
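The STAPLE fusion step mentioned above for MIM Maestro and ABAS can be illustrated with a minimal sketch of the binary expectation-maximization scheme. This is not the vendors' implementation, which is not described in this study. The function name `staple_binary`, the initial value of 0.9, the global prior, and the clamping constant are all assumptions chosen for illustration, and each candidate segmentation is assumed to be flattened into a list of 0/1 voxel labels on a common grid.

```python
from typing import List, Tuple


def staple_binary(
    decisions: List[List[int]],
    iterations: int = 30,
    init: float = 0.9,
    eps: float = 1e-6,
) -> Tuple[List[float], List[float], List[float]]:
    """Toy binary STAPLE (illustrative sketch, not a vendor algorithm).

    decisions[j][i] is 1 if segmentation j labels voxel i as inside the organ.
    Returns (w, p, q): the per-voxel probability of being inside, and the
    estimated sensitivity p[j] and specificity q[j] of each segmentation.
    """
    n_seg = len(decisions)
    n_vox = len(decisions[0])

    def clamp(x: float) -> float:
        # Keep probabilities strictly inside (0, 1) to avoid degenerate products.
        return min(max(x, eps), 1.0 - eps)

    # Assumption: the prior probability of "inside" is the overall labelled fraction.
    prior = clamp(sum(sum(row) for row in decisions) / (n_seg * n_vox))
    p = [init] * n_seg  # sensitivities
    q = [init] * n_seg  # specificities
    w = [prior] * n_vox

    for _ in range(iterations):
        # E-step: probability that each voxel is truly inside, given p and q.
        for i in range(n_vox):
            inside, outside = prior, 1.0 - prior
            for j in range(n_seg):
                if decisions[j][i]:
                    inside *= p[j]
                    outside *= 1.0 - q[j]
                else:
                    inside *= 1.0 - p[j]
                    outside *= q[j]
            w[i] = inside / (inside + outside)
        # M-step: re-estimate each segmentation's performance from the weights.
        sum_in = max(sum(w), eps)
        sum_out = max(n_vox - sum(w), eps)
        for j in range(n_seg):
            true_pos = sum(w[i] for i in range(n_vox) if decisions[j][i])
            true_neg = sum(1.0 - w[i] for i in range(n_vox) if not decisions[j][i])
            p[j] = clamp(true_pos / sum_in)
            q[j] = clamp(true_neg / sum_out)
    return w, p, q
```

Thresholding the returned weights at 0.5 gives a fused binary contour. Segmentations that agree with the consensus receive higher estimated sensitivity and specificity and therefore contribute more to the fused result than an outlying atlas.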

Atlas and Evaluation Databases
The first 10 patients were selected to build the atlas database, except for SPICE, which works differently and uses its own atlas database. The following 10 patients constituted the evaluation database. The aim of the study was to compare the contours produced by the different automatic tools against the physician contours. For each patient of the evaluation database, each atlas-based auto-segmentation software produced a DICOM Structure Set using the provided atlas database. Automatic contours were then exported in DICOM format, without any modification, for the comparison.

Contour Comparison
CTV, bladder, rectum, and femoral heads delineated by the physician and computed by the automatic tools were imported in DICOM format into the open-source Slicer software (http://www.slicer.org). Automatic and expert contours defined on the different CT slices constituted volumes. The additional DICOM RT module was used to compare those volumes. Physician contours were used as reference contours. Different metrics were calculated to quantify the similarity between the automatic and the expert volumes.
The simple ratio R of the automatic volume to the expert volume (both in cubic centimeters) was calculated. The Dice similarity coefficient (DSC) was used to quantify the overlap between the expert and the automatic contours (18). DSC is twice the intersection of two volumes divided by the sum of the two volumes (Eq 1):
$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{1}$$
where A and B are the two volumes to be compared. The 95% Hausdorff distance (H95%) was used to quantify the magnitude of gross deviations between contour surfaces (19). The Hausdorff distance computation uses a maximum-minimum function, as defined by Eq 2:
$$h(A, B) = \max_{a \in A} \, \min_{b \in B} \, d(a, b) \tag{2}$$
where a and b are points of contour sets A and B, and d(a,b) is the distance between a and b.
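As an illustration of the three metrics defined above, the following minimal sketch computes the volume ratio, the DSC, and a percentile Hausdorff distance. It is not the Slicer implementation used in this study. The sketch assumes that each structure is given as a set of integer voxel indices on a common grid; the function names, the `spacing` parameter (voxel size in mm), and the nearest-rank percentile over pooled directed distances are illustrative assumptions, and the exact H95% convention of the comparison module may differ.

```python
import math
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


def volume_ratio(auto: Set[Voxel], expert: Set[Voxel]) -> float:
    """Ratio R of automatic to expert volume (voxel counts on the same grid)."""
    return len(auto) / len(expert)


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2 |A n B| / (|A| + |B|)."""
    return 2.0 * len(a & b) / (len(a) + len(b))


def directed_distances(
    a: Set[Voxel], b: Set[Voxel], spacing: Tuple[float, float, float]
) -> List[float]:
    """For each point of a, the distance (mm) to the closest point of b."""
    def to_mm(v: Voxel) -> Tuple[float, ...]:
        return tuple(c * s for c, s in zip(v, spacing))

    b_mm = [to_mm(v) for v in b]
    return [min(math.dist(to_mm(v), w) for w in b_mm) for v in a]


def hausdorff(
    a: Set[Voxel],
    b: Set[Voxel],
    spacing: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    percentile: float = 100.0,
) -> float:
    """Symmetric Hausdorff distance; percentile=95 gives an H95%-style value.

    Assumption: the percentile is taken (nearest-rank) over the pooled
    directed distances in both directions. Brute force, so only suitable
    for small point sets such as contour surface points.
    """
    pooled = sorted(
        directed_distances(a, b, spacing) + directed_distances(b, a, spacing)
    )
    rank = max(1, math.ceil(percentile / 100.0 * len(pooled)))
    return pooled[rank - 1]
```

For example, calling `hausdorff(auto_points, expert_points, spacing, percentile=95.0)` on the surface points of two structures discards the largest 5% of point-to-contour distances, which makes the value less sensitive to a few outlying points than the maximum defined by Eq 2. Here `auto_points`, `expert_points`, and `spacing` are hypothetical inputs prepared by the caller.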

Results
For the 10 patients in the evaluation dataset, results are presented for each volume of interest in turn. For femoral heads, results were similar for the left and right sides (Table 1). R values were higher than 0.93, except for SPICE, which delineated the femoral heads on too many slices: the lowest slice on which a SPICE contour was defined differed from that of the expert. These results were confirmed by the DSC analysis and were consistent from one patient to another (Figure 1). Except for SPICE, DSC was about 0.90 and H95% was less than 10 mm for both femoral heads, with small discrepancies between patients. Femoral head contours were acceptable, and only slight corrections would have been necessary to validate the automatic segmentation.
Bladder R values were larger than those obtained for femoral heads, and differences were observed between patients and software (Table 2). The SD was very large for every automatic solution, although lower values were obtained with WFB and SPICE. Results would probably have been better if the CT scans had been acquired with a contrast agent. Nevertheless, DSC values were satisfactory for most algorithms, with an average higher than 0.75. For most algorithms, results were degraded by one or two cases. For example, the SPICE median DSC was higher than 0.90, but the average was only 0.76 because of a very poor contour for Patient 10 (Figure 2). Similarly, ABAS and MIM failed for Patients 2 and 3. H95% was about 15 mm, except for RS. RS results were disappointing, but the MBS option was not used in this study. Overall, automatic contours were satisfactory for most algorithms, but results depended strongly on the patient case, and verification and corrections were required.
Rectum R values were lower than those obtained for bladder, but SDs were still high, about 30% (Table 3). Automatic rectum contours were larger than expert contours, except for WFB (Figure 3). Despite the lower R values, mean DSC values were slightly lower than for bladder. However, fewer discrepancies were observed between patients: average and median DSC were approximately equal. Overall, DSC results were similar for the different algorithms, except RS (Figure 4). H95% was of the same order of magnitude, less than 15 mm, except for RS. Atlas-based contours showed discrepancies with the expert contours, and manual corrections were necessary.
Automatic prostate bed contours were less satisfactory, with large volume variations (Table 4). R values varied from 0.49 for SPICE to 1.37 for MIM. DSC was lower than 0.70 for all solutions, indicating that the prostate bed could not be reliably defined automatically (Figure 5). Many corrections would be required to adapt the automatic contours. ABAS had the best average DSC (Figure 5). Overall, automatic prostate bed contours were insufficient, and manual segmentation should be preferred for this target volume whatever the algorithm.

Conclusion
To the best of our knowledge, no other study has compared automatic delineation software for prostate cancer in the postoperative setting. The comparison of five atlas-based segmentation software packages used for the prostate bed and nearby organs showed that these algorithms were very efficient for high-contrast organs such as femoral heads. For other organs at risk, results were more nuanced: automatic contours were quite close to the expert contours, but corrections were required, and for some cases, depending on the algorithm, the computed contours were poor. Prostate bed contours were insufficient, but automatic segmentation aims essentially to delineate organs at risk. The postoperative CTV can be considered a virtual volume, with no difference in contrast or gray level over a large part of its volume. This difference compared with automatic delineation of the prostate itself may explain the poor outcomes in the postoperative situation.
A shortcoming of the study was the limited number of patients used to create the reference database. However, the objective was mainly to compare the different software with the same settings, except for SPICE, which uses its own reference datasets. In this context, a single physician defined the reference contours, and 10 patients were chosen arbitrarily. For each automatic delineation software, an optimization study might lead to a different number of patients to build the reference database. Such studies may improve the agreement between automatic and physician contours (20). For example, RayStation recommends the use of up to 20 cases for atlas creation. However, results were consistent with the study published by Hwee et al. (6), which focused on the MIM solution.
Although the proposed contours differed from one algorithm to another, the present study cannot establish a ranking of the software. Indeed, only 10 cases delineated by a single physician were selected to create the expert database, and 10 other cases were used for evaluation. In addition, this study did not consider the extra features offered by some tools to modify the computed segmentation. Nevertheless, it showed that atlas-based automatic segmentation has reached a useful level of accuracy, especially for high-contrast organs. Automatic contours could be a good starting point for organ delineation, since efficient editing tools are provided by the different vendors. Automated segmentation should become an important aid for organ-at-risk delineation in the next few years.

Author Contributions
AE, SS, and DP selected the patients and delineated the volumes of interest. GD, TR, JD, and TL generated the automatic contours. GD, JF, and CN analyzed the data. All authors contributed to the writing of the manuscript.

References