Deep Learning-Aided Automatic Contouring of Clinical Target Volumes for Radiotherapy in Breast Cancer After Modified Radical Mastectomy

Purpose: The aim of this study is to develop a practicable automatic clinical target volume (CTV) delineation method for radiotherapy of breast cancer after modified radical mastectomy. Methods: Unlike breast conserving surgery, the radiotherapy CTV for modified radical mastectomy involves several regions, including CTV in the chest wall (CTV cw ), supra- and infra-clavicular region (CTV sc ), and internal mammary lymphatic region (CTV im ). For accurate and efficient segmentation of the CTVs in radiotherapy of breast cancer after modified radical mastectomy, a multi-scale convolutional neural network with an orientation attention mechanism is proposed to capture the corresponding features in different perception fields. A channel-specific local Dice loss, alongside several data augmentation methods, is also designed specifically to stabilize the model training and improve the generalization performance of the model. The segmentation performance is quantitatively evaluated by statistical metrics and qualitatively evaluated by clinicians in terms of consistency and time efficiency. Results: The proposed method is trained and evaluated on the self-collected dataset, which contains 110 computed tomography scans from patients with breast cancer who underwent modified mastectomy. The experimental results show that the proposed segmentation method achieved superior performance in terms of Dice similarity coefficient (DSC), Hausdorff distance (HD) and Average symmetric surface distance (ASSD) compared with baseline approaches. Conclusion: Both quantitative and qualitative evaluation results demonstrated that the specifically designed method is practical and effective in automatic contouring of CTVs for radiotherapy of breast cancer after modified radical mastectomy. Clinicians can significantly save time on manual delineation while obtaining contouring results with high consistency by employing this method.


INTRODUCTION
According to a report from the World Health Organization, breast cancer has overtaken lung cancer as the most prevalent cancer worldwide [1]. Different stages of tumor progression require different types of surgical treatment, including breast-conserving surgery (BCS) and radical mastectomy (RM). Modified radical mastectomy (MRM) is widely used in clinical practice for the treatment of breast cancer to ensure surgical efficacy while reducing surgical damage and improving the patient's quality of life [2]. Specifically, MRM has become a cornerstone of breast cancer treatment in China. It involves excising only the mammary gland and clearing the axillary lymph nodes, while preserving the pectoralis major and minor muscles, thereby ensuring postoperative mobility and appearance.
Although MRM is beneficial to patients, it presents a challenge to clinicians in contouring the clinical target volume (CTV) for postoperative radiotherapy because the corresponding CTVs involve several target areas with relatively complex anatomic structures compared with their counterparts in BCS and HS. There are three targets in the CTV delineation for radiotherapy of breast cancer after MRM: CTV in the chest wall (CTV cw ), supraclavicular region (CTV sc ), and internal mammary lymphatic region (CTV im ), among which the position and volume vary significantly. The significant variation between patients and the inter- and intra-observer variability [3,4] also result in highly demanding and time-consuming work for clinicians. Moreover, research has demonstrated that incidental doses to regions such as the contralateral breast and thyroid, caused by contouring errors, can affect patients' quality of life [5][6][7]. Therefore, there is an urgent need to develop an automatic CTV delineation method for radiotherapy of breast cancer after MRM to reduce the burden on clinicians while improving work efficiency and accuracy.
Currently, most automatic contouring methods are developed for radiotherapy after breast-conserving surgery because they only segment the breast with the mammary gland. For example, atlas-based methods are successful in breast [8] segmentation under the condition that the amount of data and the inter-data variation are small. As the volume of data grows, deep-learning-based approaches have achieved significant development toward remedying the cases with large deformation and other considerable variations and have been adopted by an increasing number of institutes and clinicians.
To the best of our knowledge, this is the first study whose aim is to develop a deep learning-based automatic CTV delineation algorithm for radiotherapy of breast cancer after MRM. In this study, we propose a specifically designed multi-objective segmentation method for automatic CTV delineation for radiotherapy of breast cancer after MRM. An orientation attention mechanism is proposed to tackle the misrecognition of a similar structure between the breast and back sides caused by modified radical surgery. To enable the model to segment the targets correctly with significantly different volumes, an inception block-based multi-scale convolution architecture is constructed to obtain different perception fields and capture the corresponding features. In addition, the model is trained by local dice loss to handle the imbalance between segmentation categories and stabilize the training. Furthermore, three particular data augmentation strategies, namely, attention position variance, deformation simulation, and breast implant simulation, are designed to cope with the problem of data scarcity and differentiation.
The remainder of this paper is organized as follows. Section 2 introduces related research on automatic breast CTV delineation. Section 3 describes the materials and the specifically designed methods. Section 4 presents the quantitative and qualitative experimental results. Sections 5 and 6 present the discussion and the conclusion with future work, respectively.

RELATED WORKS
For the past few decades, traditional methods, particularly atlas-based methods, have been the preferred solution for automatic CTV delineation. Atlas-based approaches perform deformable image registration to match the target and ground truth. Patients are segmented based on an atlas library, and the most anatomically similar will be selected as the target to be transformed into the same coordinate space as the input data. Anders et al. [9] and Velker et al. [10] collected 9 and 124 cases to build a library for breast cancer. The method proposed by Velker achieved good performance on structured CTVs, such as breast and chest wall, with Dice similarity coefficient (DSC) values of 0.87 and 0.89 for left- and right-side breast, respectively.
Atlas-based solutions have been widely utilized in cancer sites, such as the head and neck [11], breast [12], and lungs [13]. However, the performance of these approaches is limited by the degree of deformation, image registration quality, and additional corrections. For instance, for highly variable structures, such as internal mammary nodes, Velker's method achieved poor performance with a DSC of 0.3. To address these limitations, several deep-learning-based approaches have been proposed and have made significant progress in terms of accuracy and consistency [14].
Deep learning methods have demonstrated excellent performance in several fields. Convolutional neural networks (CNNs) have become increasingly irreplaceable in the field of image processing and analysis, producing results by extracting and learning the features from well-organized training data. Deep learning-based semantic segmentation is a suitable solution for automatic CTV delineation. Min et al. [15] proposed a deep learning-based breast segmentation algorithm (a 3D fully convolutional DenseNet) and compared its performance with the aforementioned atlas-based segmentation methods. The comparison results demonstrated that the deep learning method performed more consistently and robustly on the majority of structures. In addition to the segmentation accuracy, clinicians are concerned with the inference speed of the algorithms because the produced segmentation results still require manual correction. To this end, Jan et al. [16] proposed BibNet, a novel neural network built on U-Net [17] with a multiresolution level processing structure and residual connections, alongside a full-image processing strategy to increase the inference speed while improving the segmentation quality. Kuo et al. [18] proposed a deep dilated residual network (DD-ResNet) for auto-segmentation of the clinical target volume for breast cancer radiotherapy, which outperformed the deep dilated convolutional neural network (DDCNN) and the deep deconvolutional neural network (DDNN). In contrast to these studies, we use an optimized U-Net to help doctors contour the target regions for breast cancer.

Data Acquisition
The data supporting this study comprised 110 CT scans of patients who underwent modified mastectomy surgery collected from Tianjin Medical University Cancer Institute and Hospital. These patients received adjuvant radiotherapy on the chest wall, supra- and infra-clavicular, and internal mammary lymphatic regions after lumpectomy. Therefore, the CTVs delineated for radiotherapy by an experienced clinician according to the RTOG criteria were set as the ground truth for model training [19]. The CTVs on both the left and right sides were delineated to stabilize model training. Patients with breast implants were also included in our dataset, and their number was extended using the breast implant simulation data augmentation method. The in-plane size and slice thickness of the reconstructed CT images were 512 × 512 and 5 mm, respectively. The dataset was randomly split into a training set of 82 cases and a testing set of 28 cases. This training-to-test ratio of about 3:1 allocates a slightly larger share to testing than the more commonly used 4:1; it was adopted to accommodate the limited overall sample size and obtain an adequately sized test set.

Architecture and Strategies
The architecture of the proposed network is illustrated in Figure 1. The input images are preprocessed using a specific orientation attention method before being fed into the network.
Each convolution block in the network comprises an inception module, followed by an activation layer and a batch normalization layer. The red arrows symbolize max pooling, whereas the green arrows symbolize transpose convolution. Black arrows indicate the inputs and outputs of the model. A sigmoid activation function generates the output mask, and the local Dice loss is employed to train the model for multi-objective segmentation. In this study, we focused on the specific characteristics of CTVs after MRM and designed corresponding solutions to accomplish an automatic contouring task.
The breast on the affected side is excised in MRM with only the pectoralis major and minor muscles preserved, resulting in a flat structure that is similar to the back. In addition, the collected data contained patients with left breast cancer and right breast cancer, and even on both sides; therefore, the model should be encouraged to focus more on the affected side and perform delicate segmentation. To this end, an orientation attention mechanism was designed for preprocessing. Specifically, a direction attention map is calculated based on the formulas $AP_i = 1 - i/H$ and $LR_i = 1 - i/W$, where $i$ is the row or column index and $H$ and $W$ are the image resolutions along the anterior-posterior (AP) and left-right (LR) directions, respectively. The input of the model is the product of the AP and LR direction attention map and the normalized CT image with a range of [−1, 1]. The values on the breast and affected sides in the attention map were set to near 1, whereas the opposite side was set to near 0, thereby assigning higher importance to the breast and affected sides. This can be observed in Figure 1; the input attention image has a gray gradient along the vertical and horizontal directions. The darker side is emphasized, thus implicitly promoting breast segmentation.
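As a minimal sketch of the preprocessing described above, the following pure-Python snippet builds the AP and LR ramps, multiplies them into a single attention map, and applies the map to a normalized slice. The function names, the tiny 4 × 4 synthetic slice, and the assumption that row 0 is anterior and column 0 lies on the affected side are illustrative choices that the text does not confirm; a real pipeline would likely flip the ramps according to the patient orientation and use an array library.

```python
def direction_ramp(n):
    """Return [1 - i/n for i in 0..n-1]: equal to 1 at index 0, decaying toward 0."""
    return [1.0 - i / n for i in range(n)]


def orientation_attention_map(height, width):
    """Product of the AP ramp (over rows) and the LR ramp (over columns)."""
    ap = direction_ramp(height)  # AP_i = 1 - i/H (assumes row 0 is anterior)
    lr = direction_ramp(width)   # LR_i = 1 - i/W (assumes column 0 is the affected side)
    return [[a * l for l in lr] for a in ap]


def apply_attention(image, attention):
    """Element-wise product of a normalized slice (values in [-1, 1]) and the map."""
    return [[p * w for p, w in zip(img_row, att_row)]
            for img_row, att_row in zip(image, attention)]


# Synthetic 4 x 4 slice, already normalized to [-1, 1]; not real CT data.
slice_2d = [[1.0] * 4 for _ in range(4)]
att = orientation_attention_map(4, 4)
weighted = apply_attention(slice_2d, att)
```

With these assumptions, the weight is 1 in the anterior corner on the affected side and falls off toward the posterior and contralateral corner, which matches the gradient described for the input attention image.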
The segmentation targets of the model contained CTV in the chest wall (CTV cw ), supra-clavicular region (CTV sc ), and internal mammary lymphatic region (CTV im ), which vary greatly in volume. CTV cw and CTV sc have thin and long shapes, whereas CTV im only occupies a small region. This imbalance may confuse the model and reduce segmentation performance, especially for small targets. Therefore, to enable the model to extract features with different perception fields, thereby performing delicate segmentation of targets with different scales, a network with a multi-scale convolution structure is constructed. This is done by utilizing a refined inception block [20] as a basic convolution element, which can improve the perception field while maintaining minimal pooling operations. Specifically, the input to each convolution block is fed into 1 × 1, 3 × 3, and 5 × 5 convolution layers and a max pooling layer to obtain different perception fields, and the extracted multi-scale features are then fused to model higher-level semantic information. In addition, to overcome the problem of incomplete labels, a novel local loss is introduced for network optimization, where a local mask is calculated based on the label. If parts of the targets are not annotated, the local mask will be initialized with zeros, thereby avoiding optimization of the model with the segmentation error outside the local regions. Benefiting from the larger variation in the breast cancer dataset, this local loss performed excellently in this study. Moreover, the sigmoid activation function is employed in the output layer to produce the probability of each category for each pixel, which allows for overlap among labels.
Frontiers in Physics | www.frontiersin.org January 2022 | Volume 9 | Article 754248
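The local loss idea can be illustrated with a short sketch. The text does not specify how the local mask is derived from the label, so the snippet below makes a simplifying assumption: each output channel has a binary mask over pixels, a channel whose target is not annotated has an all-zero mask, and such a channel contributes nothing to the loss. The function names, the flattened per-channel pixel lists, and the smoothing constant `eps` are hypothetical, and this is not claimed to be the authors' exact formulation.

```python
def soft_dice(pred, target, mask, eps=1e-6):
    """Soft Dice restricted to pixels where mask is 1.

    pred holds sigmoid probabilities in [0, 1]; target and mask hold 0 or 1.
    """
    inter = sum(p * t * m for p, t, m in zip(pred, target, mask))
    denom = sum((p + t) * m for p, t, m in zip(pred, target, mask))
    return (2.0 * inter + eps) / (denom + eps)


def local_dice_loss(preds, targets, masks):
    """Mean of (1 - Dice) over the channels whose local mask is not all zeros."""
    losses = []
    for pred, target, mask in zip(preds, targets, masks):
        if not any(mask):  # unannotated channel: skipped, so no gradient signal
            continue
        losses.append(1.0 - soft_dice(pred, target, mask))
    return sum(losses) / len(losses) if losses else 0.0


# Synthetic two-channel example with four pixels per channel.
preds = [[1.0, 1.0, 0.0, 0.0], [0.9, 0.9, 0.9, 0.9]]
targets = [[1, 1, 0, 0], [0, 0, 0, 0]]

# Channel 1 is treated as unannotated: its confident predictions are not penalized.
loss_local = local_dice_loss(preds, targets, [[1, 1, 1, 1], [0, 0, 0, 0]])

# Treating the same empty label as a true negative penalizes channel 1 heavily.
loss_full = local_dice_loss(preds, targets, [[1, 1, 1, 1], [1, 1, 1, 1]])
```

The contrast between the two calls shows the design motivation: without the mask, a missing annotation is indistinguishable from an absent target and would push the model to suppress a structure that is actually present.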
To cope with individual variations, such as various deformations and cases with breast implants, we designed several targeted data augmentation methods. Three specific data augmentation approaches are exploited to improve data diversity: attention position variance, deformation simulation, and breast implant simulation. The CT scan center may vary significantly for different patients. Furthermore, the attention map is calculated based on the body center, which may be affected by the couch and other similar materials in the image. Thus, we shifted the body center within a limited range and generated the corresponding input image for training. The breast is a deformable organ, and small deformation is common in breast cancer radiotherapy. Thus, a random elastic deformation vector field was applied to the CT images for deformation augmentation. In particular, a breast implant simulation method was designed for data augmentation. Patients who have undergone breast reconstruction have completely different anatomical structures compared with other patients, which may confuse the model in the training process. In this case, we simulated breast implants in the breast region via morphological processing and density simulations. In this study, we collected CT images from 110 patients with breast cancer for model training and testing. They received radiotherapy from June 6, 2016 to January 31, 2020, at Tianjin Medical University Cancer Hospital. The contouring of target areas has been examined and modified by senior radiotherapy doctors. To reduce the influence of individual differences, these CT images were processed by the above data augmentation methods. From Figure 2, it can be seen that the simulated images have a relatively similar appearance to the real data. These approaches increase the amount of data, reduce overfitting, and improve the generalization performance of the model.

Evaluation Metrics
To evaluate this method, the DSC was employed as the quantitative metric. It is defined as the overlap between the segmented mask and the mask manually labeled by experienced radiologists. The DSC formula is shown in Eq. 1, where A denotes the ground truth, and B denotes the predicted results. Therefore, a higher DSC indicates a more precise segmentation performance.
$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (1)$$
In some cases, more attention should be paid to segmentation boundaries. Therefore, the Hausdorff distance (HD) and average symmetric surface distance (ASSD) were calculated to evaluate the segmentation performance on boundaries. HD measures the surface distance between two point sets X and Y, as defined by Eq. 2. ASSD is the average of all the distances from points on the boundary of the predicted results to the boundary of the ground truth, which is calculated by Eq. 3.
$$\mathrm{HD}(X, Y) = \max\left\{ \max_{x \in X} \min_{y \in Y} d(x, y),\ \max_{y \in Y} \min_{x \in X} d(x, y) \right\} \qquad (2)$$
$$\mathrm{ASSD}(X, Y) = \frac{\sum_{x \in X} \min_{y \in Y} d(x, y) + \sum_{y \in Y} \min_{x \in X} d(y, x)}{\mathrm{len}(X) + \mathrm{len}(Y)} \qquad (3)$$
where len(X) and len(Y) represent the total number of pixels in the boundary X and boundary Y, respectively. Although the above metrics could provide a scientific assessment of the proposed segmentation method, they are not reliable enough to evaluate the significance of clinical practice [21]. To this end, we conducted a user study to obtain a practical assessment by three experienced radiologists.
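The overlap and boundary metrics used in this section can be sketched in a few lines of standard-library Python. The snippet assumes that masks are given as sets of pixel coordinates, that boundaries are given as non-empty lists of coordinates, and that d is the Euclidean distance; the function names and the toy inputs are illustrative only. The brute-force nearest-point search is quadratic in the number of boundary points, so a practical implementation would more likely rely on distance transforms or a spatial index.

```python
import math


def dice(a, b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty masks
    return 2.0 * len(a & b) / (len(a) + len(b))


def _min_dists(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff(x, y):
    """HD: the larger of the two directed max-min distances."""
    return max(max(_min_dists(x, y)), max(_min_dists(y, x)))


def assd(x, y):
    """ASSD: nearest-boundary distances summed in both directions,
    divided by len(X) + len(Y)."""
    return (sum(_min_dists(x, y)) + sum(_min_dists(y, x))) / (len(x) + len(y))


# Synthetic masks: two 2 x 2 squares that share one column of pixels.
mask_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
mask_b = {(0, 1), (1, 1), (0, 2), (1, 2)}
dsc_value = dice(mask_a, mask_b)

# Synthetic boundaries on a line.
boundary_x = [(0, 0), (0, 1)]
boundary_y = [(0, 0), (0, 3)]
hd_value = hausdorff(boundary_x, boundary_y)
assd_value = assd(boundary_x, boundary_y)
```

In the toy boundaries, the point (0, 3) is two pixels from the nearest point of X, which drives the HD, whereas the ASSD averages that outlier with the three smaller distances and is therefore less sensitive to a single stray boundary point.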

Statistical Analysis
A paired t-test was conducted to verify the statistical difference between the quantitative evaluation results of the proposed method and other approaches. The test was also performed on the clinicians' scores. A p value of less than 0.05 was regarded as indicating a significant difference between the proposed method and baseline approaches.

Table 1 presents the quantitative evaluation results of the proposed method and the baseline (U-Net) in terms of DSC, HD, and ASSD. It is observed that the proposed method achieved a mean DSC of 0.92 with standard deviation of 0.04 for CTV cw , a mean DSC of 0.74 with standard deviation of 0.09 for CTV im , and a mean DSC of 0.76 with standard deviation of 0.10 for CTV sc . The average DSC over all categories of the proposed method is 0.81, which outperformed the baseline significantly. The p value of 0.0001 also demonstrated the significant difference between the two methods. Figures 3A,B show the proposed method has larger inter-subject variations in the left CTVs. The HD and ASSD evaluations illustrated that the proposed method produced smaller surface discrepancies compared with U-Net in all the CTVs. Figures 3B,C,E,F revealed that the proposed method tends to generate segmentation results with quite small inter-subject diversity compared with U-Net, thereby demonstrating the inference quality and the robustness of the proposed method. Specifically, our method can produce significantly better results with small inter-subject diversity compared with U-Net on CTV cw and CTV sc , because the multi-scale convolution module enables the model to extract sufficient features to segment targets with complex structures, such as CTV cw and CTV sc . As for targets with small volumes, such as CTV im , the proposed method can also produce precise results by utilizing receptive fields of different scales.

TABLE 1 | Quantitative evaluation of the proposed method and U-Net on CTV cw , CTV im , and CTV sc in terms of DSC, HD and ASSD. A p value smaller than 0.05 indicates a significant difference between the two approaches.

Figures 4 and 5 compare the segmentation results of U-Net and the proposed method with the manual segmentation on the cancer-affected side and the contralateral side. The CTV in the chest wall (CTV cw ) has an anatomically different structure on the affected side and the contralateral side because the mammary gland is removed. The results produced by U-Net suffer from a moderate degree of under-segmentation and holes in targets, which is not acceptable clinically. It can be seen that our proposed method achieved results closer to the gold standard in terms of shape, location, and volume than those of U-Net.

Ablation Study
In this section, we explored the importance and effectiveness of the orientation attention mechanism and breast implant simulation.

Importance of Orientation Attention
The input orientation attention strategy is expected to encourage the model to distinguish the breast region from the back region in the transverse CT slices and perform segmentation. To verify the effectiveness of this strategy, we conducted an ablation experiment by removing the input orientation attention mechanism and compared the segmentation performance. Figure 6 shows the segmentation results for a test case generated by models with and without input orientation attention preprocessing. The model trained without the orientation attention mechanism incorrectly performs segmentation on the back region, whereas the targets are correctly segmented by the model trained with the orientation attention strategy.

Importance of Breast Implants Simulation
Only six patients with breast implants were included in the training data, which was extremely imbalanced for training. The different anatomical structures between patients with and without breast implants can confuse the model during the training process. Thus, we expect that the proposed breast implant simulation can handle this problem by increasing the amount of data with breast implants. We investigated the importance of breast implant simulation by training the model with only the original data. Figure 7 presents the segmentation results for the case of breast implants. It was found that the model trained without the specific data augmentation was confused when processing cases with breast implants, resulting in poor segmentation results. The proposed method is well suited to cases with breast implants, whereas U-Net performs poorly.

FIGURE 4 | Examples of segmentation results of U-Net and the proposed method against the gold standard for the affected side. The first row shows the results of U-Net, the second row shows the results of our method, and the third row shows the ground truth. Different colors represent different segmentation targets: blue indicates the supra-clavicular region (CTV sc), yellow indicates the internal mammary lymphatic region (CTV im), and the remaining color indicates the CTV in the chest wall (CTV cw).

Timing Performance
The time required to train the proposed model on two GTX 1080 GPUs was approximately 24 h. By utilizing the automatic segmentation method, the time required to delineate a breast CTV of a patient is drastically reduced from approximately 40 min (manual delineation) to several seconds. Even if some special cases require doctors to correct the delineation results manually, the completion of a breast CTV contouring can be

DISCUSSION
In this study, we proposed a specifically designed deep learning-based framework for automatic contouring of 10 targets in CT scans for modified mastectomy RT. The experimental results indicate that our method performed well, exhibiting excellent agreement with the CTVs that were manually delineated by clinicians. In detail, both quantitative and qualitative evaluations demonstrated the feasibility of the proposed methods in contouring CTVs for modified mastectomy RT. The orientation attention provides reliable supervision for the model to recognize the breast and affected sides in CT images. Rather than simply applying a deep learning-based segmentation network for automatic CTV contouring, we conducted a statistical analysis of the CTVs in modified mastectomy surgery-based radiotherapy and designed the network according to the statistical characteristics. The multi-scale convolutional structure constructed from the refined inception module increases both the width of the network and the adaptability of the network to scales, thereby producing delicate segmentation results for targets with different volumes. In addition, the local loss drives the optimization for all of the targets even in cases with missing labels.
Considering the scarcity of data and the variability among data, we designed three data augmentation methods for data expansion to improve the generalization performance of the model while avoiding overfitting. Data augmentation is particularly essential for medical research because collecting medical data is time-consuming and costly. Apart from the attention position variance and general deformation simulation, we specifically designed the breast implant simulation method to increase the number of cases with breast implants. The breast anatomical structure of patients with breast implants is completely different from that of patients without. Thus, a small amount of data with breast implants can disturb the model training and cause the model to fail to converge. Through the breast implant simulation, the problem of category imbalance is alleviated, and the model is able to generate more accurate segmentation results for patients with breast implants.
Although deep learning solutions perform well in producing contouring results for RT (RT is a file that stores the coordinates of the region of interest), the nature of deep learning makes them somewhat disputable [22] because the model learns how to segment based only on the ground truth delineated by one clinician. Radiotherapy requires clinical input and creativity in terms of science and art [19]. The delineation results of the same case can vary between clinicians, and it is sometimes difficult to determine which one is optimal. Therefore, the ground truth used for training the deep learning model should also have diversity. Reinforcement learning provides a potential way to enable the DL model to learn how to optimally segment targets. Manual delineation of OARs and CTVs for RT is a laborious task for clinicians, which requires not only experience but also physical exertion. Repetitive work for long periods can lead to reduced productivity and even errors on the part of clinicians [2]. In this case, automatic segmentation algorithms serve as a useful tool for reducing the workload of clinicians and producing highly consistent results. A previous study illustrated that atlas-based automatic segmentation (ABAS) for loco-regional RT of breast cancer reduced the time needed for manual delineation by 93% (before correction) and 32% (after correction) [23]. Our method reduced the time required for contouring from 40 min (manual) to 10 min (automatic) on average. With the assistance of deep learning-based auto-segmentation, radiation oncologists can work more efficiently.
To evaluate the segmentation results more carefully and efficiently, and to explore the detailed gap between the deep learning-based automatic contouring algorithm and manual contouring, we used both HD and ASSD to evaluate the performance of the contouring results on the edges and surfaces. In this way, we further demonstrated the advantage of the proposed method at the 3D level rather than at the 2D level only. Table 1 and Figure 3 illustrate that the proposed method can produce segmentation results with better agreement with the manually delineated structures in terms of region and surface.
This study has several limitations. First, we conducted this research in a single center with limited sample size and diversity, which challenges the generalization power of the proposed model. The well-performing model may produce unacceptable segmentation results when applied to other centers owing to the variance between the data. Therefore, we plan to validate the proposed method using data from other institutions. Second, the accuracy and pattern of the segmentation results depend heavily on the manual annotations used for training, which can be both advantageous and disadvantageous. As mentioned above, the model can be trained using a homogeneous gold standard created by a single clinician. However, there is no 100% gold standard in clinical settings, as inter- and intra-observer variations always exist. Thus, further studies should be conducted to evaluate the generalization of the gold standard created by multiple clinicians. Additionally, it may be more favorable if the OARs are segmented simultaneously. By extracting corresponding features and segmenting related organs and tissues, the model can obtain a better perception of the target region. Specifically, the OARs that are most helpful for segmenting target CTVs in the breast region still need to be considered. For instance, the importance of coronary vessels has been increasingly acknowledged.

CONCLUSION
Auto-contouring of the CTVs can relieve clinicians from tedious contouring work while improving the consistency and reliability of radiotherapy. In this study, a specifically designed deep learning-based segmentation method was developed to delineate CTVs for modified mastectomy radiotherapy. Qualitative and quantitative evaluations demonstrated the outstanding performance of the proposed method. The method can also handle cases with breast implants and large shape variability. The user study also suggests that the proposed method is practical and beneficial to clinical work by significantly saving time and improving the consistency of decisions.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because of the data security requirements of our hospital. Requests to access the datasets should be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Medical Ethics Committee of Tianjin Medical University Cancer Institute and Hospital. The ethics committee waived the requirement of written informed consent for participation. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.