Incorporating the synthetic CT image for improving the performance of deformable image registration between planning CT and cone-beam CT

Objective To develop a contrast learning-based generative (CLG) model that generates high-quality synthetic computed tomography (sCT) from low-quality cone-beam CT (CBCT) and thereby improves the performance of deformable image registration (DIR). Methods This study included 100 patients treated after breast-conserving surgery, each with planning CT (pCT) images, CBCT images, and target contours delineated by physicians. The sCT images were generated from the CBCT images via the proposed CLG model. We used the sCT images instead of the CBCT images as the fixed images to make the multi-modality image registration more accurate. The deformation vector field is then applied to propagate the target contours from the pCT to the CBCT, realizing automatic target segmentation on CBCT images. To evaluate the proposed method, we calculated the Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), and average surface distance (ASD) between the predicted and reference segmentations. Results The DSC, HD95, and ASD of the target contours with the proposed method were 0.87 ± 0.04, 4.55 ± 2.18, and 1.41 ± 0.56, respectively. Compared with the traditional method without sCT assistance (0.86 ± 0.05, 5.17 ± 2.60, and 1.55 ± 0.72), the proposed method performed better, especially for soft-tissue targets such as the tumor bed region. Conclusion The CLG model proposed in this study can create high-quality sCT from low-quality CBCT and improve the performance of DIR between the CBCT and the pCT. The resulting target segmentation accuracy is better than that of traditional DIR.


Introduction
In image-guided radiotherapy, cone-beam computed tomography (CBCT) has been incorporated into contemporary linear accelerators (1)(2)(3). However, CBCT has low image quality due to the small number of x-ray projections and the long acquisition time, which impedes the subsequent deformable image registration (DIR) procedure (4). In adaptive radiation treatment (ART), DIR between the planning computed tomography (pCT) and the daily CBCT is required (5,6). The deformable vector field (DVF) generated by the DIR can support patient setup, contour propagation, target definition, and online dose computation (7). Many traditional DIR methods use the sum of squared differences (SSD) or the mean absolute error (MAE) to measure the similarity between the fixed and moving images (8). However, these measures presume that the intensities of the moving and fixed images are consistent. Because of the intensity discrepancy between the two modalities, SSD and MAE cannot be used directly for pCT-CBCT DIR (9)(10)(11)(12)(13)(14)(15)(16).
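The intensity-consistency assumption behind SSD and MAE can be illustrated with a minimal sketch. The intensity profiles and the linear scaling and offset below are hypothetical values chosen for illustration, not data from this study; they only show that both measures report a large mismatch for perfectly aligned anatomy once the two modalities map tissue to different intensities.

```python
def ssd(fixed, moving):
    """Sum of squared differences between two equally sized intensity lists."""
    return sum((f - m) ** 2 for f, m in zip(fixed, moving))


def mae(fixed, moving):
    """Mean absolute error between two equally sized intensity lists."""
    return sum(abs(f - m) for f, m in zip(fixed, moving)) / len(fixed)


# Hypothetical 1-D intensity profile through perfectly aligned anatomy.
pct_profile = [-1000.0, -100.0, 40.0, 60.0, 300.0]

# Same anatomy with an illustrative (assumed) CBCT-like scaling and offset.
cbct_profile = [0.8 * v + 50.0 for v in pct_profile]

ssd_same_modality = ssd(pct_profile, pct_profile)    # identical intensities
ssd_cross_modality = ssd(pct_profile, cbct_profile)  # aligned, but mismatched
mae_cross_modality = mae(pct_profile, cbct_profile)
```

An optimizer driven by these measures would try to reduce the cross-modality mismatch by deforming the image, even though no deformation is needed here.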
Many studies have focused on deep learning (DL) based DIR (17-21). Kearney et al. developed an unsupervised learning technique to register CT to CBCT (22). For multimodal CT-CBCT image registration, Fu et al. presented an unsupervised DL registration network that used directional local structural similarity and the original images as input (23). Han et al. proposed a DL registration model to predict organ-at-risk (OAR) segmentations on the CBCT based on planning CT segmentations (24). However, the severe artifacts of CBCT greatly limit the accuracy of DIR between CBCT and planning CT. Therefore, many researchers have proposed converting CBCT into high-quality synthetic CT before registration. Fu et al. proposed synthesizing a high-quality CT from CBCT to reduce image artifacts and perform intensity correction before image registration (3), using CycleGAN for the high-quality CT generation. Although CycleGAN can improve the quality of CBCT images, this does not necessarily improve registration accuracy. One of its most serious drawbacks is that the anatomical geometry may change after the image quality improvement. These changes include the movement, deformation, or disappearance of anatomical structures, which introduce large errors into the subsequent DIR. Therefore, there is an urgent need for a synthetic CT generation model that preserves anatomical geometric consistency.
To address these issues, this study proposes a contrast learning-based generative (CLG) model for synthetic CT (sCT) generation. The proposed method maintains the consistency of the anatomy after image synthesis, so the synthesized images are more trustworthy than those produced by CycleGAN. In addition, the high-quality sCT improves the performance of DIR between CBCT and breast pCT.

Data acquisition and processing
The study retrospectively included 100 patients who underwent radiotherapy after breast-conserving surgery. The patients were treated using a standard treatment planning process, with CT images and at least one set of CBCT images acquired during treatment. The CT images were acquired using a Siemens Medical System scanner with a voxel size of 0.977 × 0.977 × 5 mm³ and a data size of 512 × 512 × 80. The CBCT images were acquired using the Varian Edge (Varian Medical Systems, Palo Alto, CA) scanner with a voxel size of 0.977 × 0.977 × 5 mm³. Because the scanning range and voxel size differ between CT and CBCT, we first rigidly aligned the CT to the CBCT and resampled it to match the CBCT voxel size.
In our study, a DL network generated sCT images from the CBCT images. The pCT was then registered to the CBCT and to the sCT images, respectively, using the DIR method, and the contours on the pCT were propagated to the CBCT (sCT) images. Physicians had first manually delineated the contours on the pCT and CBCT data (the tumor bed clinical target volume (CTV) 1, CTV 2, and the heart). The final contours propagated via the sCT images follow the anatomy of the original CBCT more closely, with the most significant effect in soft tissues. The model was trained, validated, and tested using 52/7/41 patients, corresponding to 4160/560/3280 slices.
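The slice counts follow directly from the patient split if each patient contributes one 80-slice volume, which is an assumption consistent with the stated 512 × 512 × 80 data size. A short check of that arithmetic:

```python
# Assumption: one 80-slice volume per patient, matching the stated data size.
SLICES_PER_VOLUME = 80

patients = {"train": 52, "validation": 7, "test": 41}
slices = {name: count * SLICES_PER_VOLUME for name, count in patients.items()}

total_patients = sum(patients.values())
total_slices = sum(slices.values())
```

Under this assumption the three subsets add up to the 100 patients and 8000 slices reported for the study.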

Synthetic CT generation
Image translation is a disentanglement problem: the content must be preserved across the image modalities, while the appearance must be modified (25,26). In most cases, the adversarial loss produces the target appearance, whereas the cycle-consistency loss is used to retain the content (27)(28)(29). However, the cycle-consistency loss assumes a bijection between the two domains, which is usually too restrictive.
In this study, to maintain consistency in content effectively, we used the CLG model to generate the sCT images (30). The CLG model network architecture is schematically shown in Figure 1.
The CLG model learns the mapping in one direction only. $I_{CBCT}$ is the input image and $I_{sCT}$ is the output image. To obtain the output image, the generator G is split into an encoder $G_{enc}$ and a decoder $G_{dec}$, which are applied in sequence. Specifically, we employed a ResNet-based generator (31), referring to the first half of the generator as the encoder and the second half as the decoder. The input and output images should share an identical structure; therefore, the learning objective is defined on multi-layer image patches. The feature layers are produced by the encoder $G_{enc}$, where different layers and different spatial locations represent different image patches. To generate a stack of features, we select L layers of feature maps and pass each through a two-layer multi-layer perceptron (MLP) network $H_l$, giving $\{z_l\}_L = \{H_l(G_{enc}^l(x))\}_L$, where x is the input image. The number of channels in layer l is $C_l$. The output image $\hat{y}$ is encoded in the same way into $\{\hat{z}_l\}_L = \{H_l(G_{enc}^l(\hat{y}))\}_L$. After obtaining the features from the MLP network, we introduce contrast learning. The features of the output image serve as the query samples, the input features at the corresponding location serve as the positive samples, and the input features at other locations serve as the negative samples. The purpose of contrast learning is to associate each query with its positive sample, in contrast to the negative samples.
The query, the positive sample, and the N negative samples are mapped to K-dimensional vectors $v, v^{+} \in \mathbb{R}^{K}$ and $v^{-} \in \mathbb{R}^{N \times K}$, respectively. The n-th negative sample is denoted by $v^{-}_{n} \in \mathbb{R}^{K}$. Our objective is to link the output image with the input image: the query refers to the output, and the corresponding and noncorresponding input patches are the positive and the negatives, respectively. We normalize the vectors onto the unit sphere to prevent the feature space from collapsing or expanding. The problem is set up as an (N+1)-way classification, with the distances between the query and the other samples scaled by a temperature of 0.07 (32,33). The cross-entropy loss then represents the probability of selecting the positive example over the negatives.
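The (N+1)-way classification described above can be sketched in plain Python. This is an illustrative sketch, not the implementation used in the study: the function names, the toy feature vectors, and the choice to average over locations are assumptions made here, while the unit-sphere normalization and the temperature of 0.07 come from the text.

```python
import math


def normalize(v):
    """Scale a vector to unit length (projection onto the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def patch_contrastive_loss(query, positive, negatives, temperature=0.07):
    """(N+1)-way cross-entropy with the positive sample as the correct class."""
    q = normalize(query)
    logits = [dot(q, normalize(positive)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the N+1 logits.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(value - top) for value in logits))
    return log_sum - logits[0]


def location_wise_loss(output_feats, input_feats, temperature=0.07):
    """Average loss over locations: the input feature at the same location is
    the positive, the input features at all other locations are negatives."""
    total = 0.0
    for s, query in enumerate(output_feats):
        negatives = [f for i, f in enumerate(input_feats) if i != s]
        total += patch_contrastive_loss(query, input_feats[s], negatives, temperature)
    return total / len(output_feats)


# Hypothetical K=3 features at three patch locations of one layer.
input_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = location_wise_loss(input_feats, input_feats)
shuffled = location_wise_loss(
    [input_feats[1], input_feats[2], input_feats[0]], input_feats
)
```

When the output features match the input features at the same locations, the loss is close to zero; when the content is moved to other locations, the loss becomes large, which is how this objective penalizes changes in anatomical layout.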
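Finally, we obtain the multi-layer patch contrast learning loss, which is computed over the selected layers and spatial locations. Here $z_l^{s}$ and $z_l^{S \setminus s}$ denote the feature at location s and the features at the other locations of the input's l-th layer feature map, and $\hat{z}_l^{s}$ denotes the feature at location s of the output's l-th layer feature map. An identity contrast loss is obtained in the same way to regularize the output image. The adversarial loss encourages the output to resemble the images in the target domain in terms of appearance (26).
In summary, the total loss function of the CUT network, equation (4), combines the adversarial loss with the patch contrast loss and the identity contrast loss, weighted by $\lambda_X$ and $\lambda_Y$, respectively. During training, we set $\lambda_X = \lambda_Y = 1$.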
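For reference, the losses described in this section can be written as follows, assuming they follow the standard CUT formulation that the text names. The temperature symbol $\tau$ (0.07 in this study), the number of locations $S_l$ in layer l, and the domain symbols X (CBCT) and Y (CT) are notation assumed here; in the identity term, the features are computed from y and G(y) instead of x and G(x).

```latex
% Per-patch (N+1)-way cross-entropy loss
\ell(v, v^{+}, v^{-}) =
  -\log \frac{\exp(v \cdot v^{+} / \tau)}
             {\exp(v \cdot v^{+} / \tau) + \sum_{n=1}^{N} \exp(v \cdot v^{-}_{n} / \tau)}

% Multi-layer patch contrast loss on the input domain X
\mathcal{L}_{\mathrm{PatchNCE}}(G, H, X) =
  \mathbb{E}_{x \sim X} \sum_{l=1}^{L} \sum_{s=1}^{S_l}
  \ell\left(\hat{z}_l^{s},\, z_l^{s},\, z_l^{S \setminus s}\right)

% Identity contrast loss on the target domain Y
\mathcal{L}_{\mathrm{PatchNCE}}(G, H, Y) =
  \mathbb{E}_{y \sim Y} \sum_{l=1}^{L} \sum_{s=1}^{S_l}
  \ell\left(\hat{z}_l^{s},\, z_l^{s},\, z_l^{S \setminus s}\right)

% Adversarial loss
\mathcal{L}_{\mathrm{GAN}}(G, D, X, Y) =
  \mathbb{E}_{y \sim Y} \log D(y) + \mathbb{E}_{x \sim X} \log\bigl(1 - D(G(x))\bigr)

% Total loss, with \lambda_X = \lambda_Y = 1
\mathcal{L}_{\mathrm{total}} =
  \mathcal{L}_{\mathrm{GAN}}(G, D, X, Y)
  + \lambda_X \mathcal{L}_{\mathrm{PatchNCE}}(G, H, X)
  + \lambda_Y \mathcal{L}_{\mathrm{PatchNCE}}(G, H, Y)
```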
The learning rate was set to 2 × 10⁻⁴, and the Adam optimizer was used. The model was trained for 200 epochs: the learning rate was held constant for the first 50 epochs and decayed to 0 over the remaining 150 epochs. The model was implemented in the PyTorch framework and was trained and tested on an NVIDIA 1080 GPU with 8 GB of memory, with a batch size of one.
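The learning-rate schedule can be sketched as a simple function of the epoch index. The text does not state the shape of the decay, so the linear ramp below is an assumption, as are the function and constant names; only the base rate and the 50/150 epoch split come from the text.

```python
BASE_LR = 2e-4
CONSTANT_EPOCHS = 50
DECAY_EPOCHS = 150


def learning_rate(epoch):
    """Learning rate for a 0-based epoch index.

    Constant for epochs 0-49, then an assumed linear ramp that starts at
    BASE_LR at epoch 50 and would reach 0 at epoch 200, just after the
    last training epoch (199).
    """
    if epoch < CONSTANT_EPOCHS:
        return BASE_LR
    remaining = CONSTANT_EPOCHS + DECAY_EPOCHS - epoch
    return BASE_LR * (max(remaining, 0) / DECAY_EPOCHS)


schedule = [learning_rate(e) for e in range(CONSTANT_EPOCHS + DECAY_EPOCHS)]
```

In a PyTorch training loop, the same multiplier could be supplied to a lambda-based scheduler; that wiring is left out here to keep the sketch free of dependencies.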

Deformable image registration
We used the DiffeoDemons algorithm (34) for deformable registration. This Demons-based diffeomorphic registration algorithm solves for the transformation in the log-domain. The basic concept behind the approach is to represent the current transformation as the exponential of a smooth velocity field V. We use the diffeomorphic demons to quickly compute $\varphi \circ \exp(v) = \exp(V) \circ \exp(v)$ and then update v. Here exp denotes the exponential mapping in the Lie algebra (the vector space of velocity fields). The energy functional is defined with a weight $\lambda > 0$. Note that $\nabla\varphi$ denotes the Jacobian matrix of the transformation $\varphi$. The DiffeoDemons model guarantees a smooth displacement field at all times. $I_m$ stands for the moving image and $I_f$ for the fixed image.
Figure 2 shows the image registration process with the two methods. To compare the performance with and without the incorporation of the sCT, we obtained the deformable vector fields (DVFs) of $I_{pCT}$-$I_{CBCT}$ and $I_{pCT}$-$I_{sCT}$ by DiffeoDemons registration. The DVFs are used to warp the moving image and the corresponding target labels to obtain the warped CBCT images $I_{wCBCT}$, the warped sCT images $I_{wsCT}$, and the warped target labels.
Figure 3 compares the target segmentation performance when the CBCT image and the sCT image are used as the fixed image. The red, blue, and green contours represent the target areas on the $I_{wCBCT}$ and $I_{wsCT}$ images. With the sCT images as the fixed images, the propagated target contours are better than with the original CBCT images as the fixed images.
FIGURE 1 The CLG model network architecture.
Figure 4 shows the soft tissue comparison of CBCT, sCT, and pCT images at the same window width and level, with the red contours indicating the tumor bed area. The sCT improves the image quality and spatial uniformity while keeping the anatomy unchanged, and its tissue intensity distribution resembles that of the pCT images.
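The contour propagation step, in which a DVF warps the target labels, can be sketched for a 2-D label map. This is a generic illustration of applying a displacement field, not the DiffeoDemons implementation; the pull-back convention, the nearest-neighbour lookup, the function name, and the toy mask are all assumptions made here.

```python
def warp_labels(labels, dvf):
    """Warp a 2-D label map with a pull-back displacement field.

    labels: list of rows of integer labels (moving-image contours as a mask).
    dvf:    dvf[y][x] = (dy, dx), the displacement from fixed-image pixel
            (y, x) to the matching position in the moving image.
    Nearest-neighbour lookup keeps label values intact (no blending);
    positions that fall outside the moving image are set to background (0).
    """
    height, width = len(labels), len(labels[0])
    warped = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = dvf[y][x]
            src_y, src_x = round(y + dy), round(x + dx)
            if 0 <= src_y < height and 0 <= src_x < width:
                warped[y][x] = labels[src_y][src_x]
    return warped


# Hypothetical 4x4 mask with a 2x2 target, and a uniform one-pixel field.
moving_labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Each fixed pixel looks one pixel to its left, so the target moves right.
shift_right = [[(0.0, -1.0)] * 4 for _ in range(4)]
propagated = warp_labels(moving_labels, shift_right)
```

Nearest-neighbour lookup is the usual choice for label maps because interpolating between label values would create labels that do not exist in the original contour set.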
The proposed CLG model can greatly improve the tissue contrast of CBCT images, so soft tissue segmentation can be performed more accurately. The first row in Figure 5 shows the pCT, CBCT, warped CBCT (wCBCT), and warped sCT (wsCT) images of a single patient displayed with the same window, and the second row shows the differences between the pCT image and the CBCT, wCBCT, and wsCT images. Because the patient had significant weight loss and tumor shrinkage, and was affected by respiratory motion during the fractionated treatment, the difference between the pCT and the CBCT is obvious, especially in the lungs. In contrast, the differences in the wCBCT-pCT and wsCT-pCT images are significantly reduced, which indicates that the artifact in the CBCT was reduced. The black contour lines represent the CTV1, CTV2, and heart target areas in the difference maps. The comparison reveals that the difference between the wsCT and pCT images in the target region is smaller than that between the wCBCT and pCT images. Furthermore, in soft tissue regions such as the tumor bed, the intensity difference between the registered sCT and the pCT is smaller, which indicates that the sCT generated by the proposed CLG model improves the image contrast of the soft tissue. Therefore, the image quality of the sCT is comparable to that of the pCT and benefits the performance of the DIR.

Results and discussion
The Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), and average surface distance (ASD) of the two methods are shown in Table 1 and Figure 6. Compared with the traditional method using the CBCT image as the fixed image, incorporating the sCT image in the DIR achieved better DSC, HD95, and ASD. For example, the DSC of pCT to sCT in the CTV1 (tumor bed) is 0.81 ± 0.06, while the DSC of pCT to CBCT is only 0.79 ± 0.08.
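The three evaluation metrics can be sketched in plain Python. The text does not specify the exact conventions, so several choices below are assumptions: masks are flat 0/1 lists, surfaces are lists of point coordinates in physical units, HD95 is the larger of the two directed nearest-rank 95th percentiles, and ASD is the symmetric mean over both directions. The function names and toy data are illustrative only.

```python
import math


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks given as flat 0/1 lists."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    size_sum = sum(mask_a) + sum(mask_b)
    return 2.0 * intersection / size_sum if size_sum else 1.0


def directed_distances(points_a, points_b):
    """Distance from each point in points_a to its nearest point in points_b."""
    return [min(math.dist(p, q) for q in points_b) for p in points_a]


def percentile(values, q):
    """Nearest-rank percentile (one of several conventions in use)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def hd95(surface_a, surface_b):
    """95th-percentile Hausdorff distance, taken as the larger directed value."""
    return max(
        percentile(directed_distances(surface_a, surface_b), 95),
        percentile(directed_distances(surface_b, surface_a), 95),
    )


def asd(surface_a, surface_b):
    """Symmetric average surface distance over both directions."""
    distances = directed_distances(surface_a, surface_b)
    distances += directed_distances(surface_b, surface_a)
    return sum(distances) / len(distances)


# Hypothetical prediction and reference, offset by one voxel.
prediction = [0, 1, 1, 1, 0, 0]
reference = [0, 0, 1, 1, 1, 0]
pred_surface = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
ref_surface = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]

dsc_value = dice(prediction, reference)
hd95_value = hd95(pred_surface, ref_surface)
asd_value = asd(pred_surface, ref_surface)
```

A higher DSC and lower HD95 and ASD indicate closer agreement between the propagated and reference contours, which is the direction of improvement reported for the sCT-assisted registration.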
Breast cancer radiation therapy is planned on the pCT. However, the target area and anatomy change over the course of treatment. In addition, positioning errors and patient respiration can result in underdosing of the target area and increased dose to normal organs. ART uses daily CBCT images to analyze changes in the target area and anatomy, and anatomical changes are corrected by the DIR method. DIR between the CBCT and pCT images is a multimodal DIR problem, which is challenging because the nonlinear variation of grayscale features makes it difficult to establish effective similarity measures between regions or features of multimodal images. To this end, this study converts the multimodal DIR problem into a unimodal one, achieving accurate DIR between the pCT and the fractional CBCT of breast cancer patients and accurate propagation of the corresponding target contours. Traditional image synthesis methods usually use CycleGAN. CycleGAN is good at suppressing artifacts, but this does not mean that the generated images are reliable: the anatomy may change when the CBCT image quality is improved. These changes include movement, distortion, or disappearance of the target area, which can lead to significant errors in the subsequent DIR and automatic target localization. To improve the quality of CBCT images, we proposed a multi-level contrast learning-based approach that generates quantitative CBCT images with anatomical geometric consistency. Based on the generative adversarial network, we introduce a contrast loss function to ensure the consistency of the anatomical structure. The proposed loss function replaces the cycle-consistency loss of CycleGAN and thus avoids the strict bijective mapping that it imposes. As a result, it not only improves the computational speed but also produces better detail in the generated images. The advantage of our method is that it maintains consistent anatomical geometry before and after image generation, so the generated images are more trustworthy.
FIGURE 2 The image registration process with two different methods.
FIGURE 3 Comparison of target segmentation performance with the CBCT image and the sCT image as the fixed image.
Li et al. 10.3389/fonc.2023.1127866
This study used the CLG model to generate high-quality sCT from CBCT. A patch loss based on contrast learning was proposed to calculate the similarity between patches, which enables learning the CT value distribution of the pCT without changing the anatomical structure of the CBCT images. The proposed CLG model can alleviate the effects of low contrast, high noise, and artifact contamination in soft tissues. We performed multimodal pCT-CBCT DIR and unimodal pCT-sCT DIR with the DiffeoDemons algorithm. The target segmentation results show that the unimodal pCT-sCT registration is significantly better than the multimodal pCT-CBCT registration. In this paper, only the DiffeoDemons algorithm was used to perform DIR; however, many other DIR algorithms can be explored for potential performance improvements (35).
FIGURE 4 Comparison of soft tissue from CBCT, sCT, and pCT images with the same window width and level, with the tumor bed areas represented by the red contour line.
FIGURE 5 The pCT, CBCT, warped CBCT, and warped sCT images displayed in the same window for a single patient (first row), and the differences between the pCT images and the CBCT, warped CBCT, and warped sCT images (second row).
Although the proposed framework produces accurate results by incorporating the sCT, further refinements are possible. First, we adopted a patch-based unsupervised convolutional network, which is computationally intensive; training efficiency may be improved by balancing network width, depth, and resolution. Second, more training images may be needed to confirm the findings. In the future, we will expand the dataset with a wide range of anatomic variants and with images from other CT scanners to improve the network's robustness. This study used 8000 CT slices from 100 patients to evaluate the model, and the outcomes are deemed clinically satisfactory. With an expanded training dataset, the prediction accuracy should increase and the risk of overfitting should decrease.

Conclusion
This work proposed a CLG model to create high-quality sCT from CBCT. For target contour propagation, the pCT is registered to the high-quality sCT images instead of the CBCT images with severe artifacts. The results showed that incorporating the sCT image can improve the performance of DIR between pCT and CBCT, especially in soft tissues. Furthermore, the proposed method is quite general and can be applied to other sites, such as the abdomen and prostate.

Data availability statement
The hospital datasets are protected for patient privacy and are not publicly available. However, the datasets are available from the corresponding author upon reasonable request.
The results of the Dice similarity coefficient (DSC), 95 percent Hausdorff distance (HD95), and average surface distance (ASD) on the various techniques.