NuSegDA: Domain adaptation for nuclei segmentation

The accurate segmentation of nuclei is crucial for cancer diagnosis and further clinical treatments. To successfully train a nuclei segmentation network in a fully-supervised manner for a particular type of organ or cancer, we need a dataset with ground-truth annotations. However, such well-annotated nuclei segmentation datasets are highly rare, and manually labeling an unannotated dataset is an expensive, time-consuming, and tedious process. Consequently, we need a way to train the nuclei segmentation network with an unlabeled dataset. In this paper, we propose a model named NuSegUDA for nuclei segmentation on the unlabeled dataset (target domain). It is achieved by applying an Unsupervised Domain Adaptation (UDA) technique with the help of another labeled dataset (source domain) that may come from a different type of organ, cancer, or source. We apply the UDA technique at both the feature space and the output space. We additionally utilize a reconstruction network and incorporate adversarial learning into it so that the source-domain images can be accurately translated to the target domain for further training of the segmentation network. We validate our proposed NuSegUDA on two public nuclei segmentation datasets, and obtain significant improvement as compared with the baseline methods. Extensive experiments also verify the contribution of the newly proposed image reconstruction adversarial loss and target-translated source supervised loss to the performance boost of NuSegUDA. Finally, considering the scenario when we have a small number of annotations available from the target domain, we extend our work and propose NuSegSSDA, a Semi-Supervised Domain Adaptation (SSDA) based approach.

KEYWORDS nuclei segmentation, domain adaptation, Unsupervised Domain Adaptation, Semi-Supervised Domain Adaptation, adversarial learning

1. Introduction
Nuclei are the fundamental organizational unit of life (Sharma et al., 2022). Nuclei segmentation, a subclass of biomedical image segmentation, is considered an essential task of digital histopathology image analysis (Yang S. et al., 2021; Haq and Huang, 2022). However, accurate nuclei segmentation is quite challenging due to the significant variations in the shape and appearance of nuclei, clustered and overlapped nuclei, blurred nuclei boundaries, inconsistent staining methods, scanning artifacts, etc. (see Figure 1). Also, histopathology images of different organs or cancer types may exhibit different textures, color distributions, morphology, and scales (Xu et al., 2017; Mahmood et al., 2019).
The nuclei segmentation problem can be seen as a semantic segmentation problem in which we want to segment the nuclei from their background. Figure 1 shows an input image and the corresponding output of semantic segmentation of nuclei. Convolutional Neural Network (CNN) based approaches such as the Fully Convolutional Network (FCN) (Long et al., 2015), U-Net (Ronneberger et al., 2015), and UNet++ (Zhou et al., 2018) give very promising results in biomedical image segmentation tasks as well as in nuclei segmentation problems (Sirinukunwattana et al., 2016; Haq and Huang, 2022; Sharma et al., 2022). However, to successfully train these fully-supervised methods, we need at least a small amount of annotated data (i.e., images with their corresponding pixel-level ground-truth labels) (Zeiler and Fergus, 2014; Kumar et al., 2017; Sharma et al., 2022). Unfortunately, such well-annotated datasets, even very small ones, are highly rare in the biomedical domain. Moreover, due to the heterogeneity of nuclei, it is even harder to learn good models when annotations and samples are lacking. The commonly used strategy, which first collects an unannotated histopathology dataset and then performs manual pixel-level labeling with the help of experts, is also an expensive, time-consuming, and tedious process (Xu et al., 2017; Yang S. et al., 2021). For example, annotating even a small nuclei segmentation dataset consisting of 50 image patches takes 120-130 h of an expert pathologist's time (Hou et al., 2019). Therefore, an urgent question arises: how can we robustly train a deep CNN model for nuclei segmentation without any further need for annotations?
For the nuclei segmentation problem, simply applying Transfer Learning (i.e., training a model on one organ or cancer type and then evaluating it on different organ or cancer types) unfortunately leads to poor performance due to the domain shift problem (Sharma et al., 2022). This domain shift is caused by different scanners, scanning protocols, tissue types, etc. (Sharma et al., 2022). In this paper, we propose a framework based on Domain Adaptation, a subclass of Transfer Learning, to solve the domain shift problem for nuclei segmentation. We consider the unannotated dataset (i.e., the one for which we want to predict the labels) as the target domain. Then, with the help of another related but different annotated dataset, referred to as the source domain, we apply an adversarial learning (Goodfellow et al., 2014) based domain adaptation technique to the nuclei segmentation problem. Thus, our proposed framework learns from the labeled source domain and adapts to the unlabeled target domain.
In this work, we first propose an Unsupervised Domain Adaptation (UDA) model for nuclei segmentation to close the gap between the annotated source domain and the unlabeled target domain. Unsupervised Domain Adaptation methods are capable of minimizing the labeling cost by utilizing cross-domain data and aligning the distribution shift between labeled source domain data and unlabeled target domain data. We empirically observed that images from different nuclei datasets, even if collected from different organ or cancer types, look dissimilar although their corresponding segmentation ground-truth labels are quite similar (see Figure 2). In summary, ground-truth labels for nuclei segmentation are domain-invariant. Because of this observation, we apply domain adaptation in the output space. Thus, with the help of adversarial learning, we train a robust nuclei segmentation network to generate source-domain look-alike outputs for target images. Adversarial learning attempts to align target-domain predictions with source-domain ground truths via discriminator training. In addition to image-level domain adaptation at the output space, we apply domain-invariant class-conditional feature-level domain adaptation in the feature space. However, simply forcing the target-domain distribution toward the source-domain distribution can destroy the latent structural patterns of the target domain, leading to a drop in the model's accuracy. Consequently, we also use a reconstruction network to maximize the correlation between target images and target predictions. A reconstruction network alone, however, cannot perfectly reconstruct original images (i.e., the reconstructed images lack the original texture, style, color distribution, etc.), so we incorporate adversarial learning into the reconstruction network, which in turn helps us to translate source domain images to the target domain. We additionally train our UDA model with these target-translated source images, and observe a significant performance boost. Finally, we extend our UDA framework to a Semi-Supervised Domain Adaptation (SSDA) model for the case where some annotations are available from the target domain.
Based on extensive experiments on two nuclei segmentation datasets, we conclude that our proposed UDA method, NuSegUDA, outperforms a fully-supervised model trained on the source domain and evaluated on the target domain, as well as baseline generic and biomedical UDA segmentation models. The experimental results (see Section 4) also show the impact of training NuSegUDA with the proposed image reconstruction adversarial loss, target-translated source images, and feature-level clustering loss. Furthermore, the accuracy of our SSDA model, NuSegSSDA, is highly competitive with the upper bound given by a fully-supervised model trained in the target domain.
Therefore, the main contributions of this paper are: (1) We propose an adversarial learning based Unsupervised Domain Adaptation (UDA) approach, which is applied at both the feature space and the output space to solve the nuclei segmentation problem for unannotated datasets. (2) Additionally, we incorporate adversarial learning into a reconstruction network to translate source domain images to the target domain, and train the proposed model with these target-translated source images. (3) Compared to many of the baselines, our proposed method is simple, as it does not depend on any data synthesis or data augmentation.

2. Related works
In the literature, several domain adaptation models have been proposed for generic image segmentation. Isola et al. (2017) applied conditional GAN (Mirza and Osindero, 2014) to image-to-image translation problems. CyCADA proposed an Unsupervised Domain Adaptation (UDA) model utilizing both input space and feature space adaptation (Hoffman et al., 2017). A multi-level adversarial network based domain adaptation approach for semantic segmentation was proposed in AdaptSegNet (Tsai et al., 2018). Zhang et al. (2018) proposed a fully convolutional adaptation network for semantic segmentation. CrDoCo proposed a cross-domain consistency loss based pixel-wise adversarial domain adaptation algorithm (Chen Y.-C. et al., 2019). An adversarial self-supervision UDA model that maximizes agreement between clean samples and their adversarial examples has also been proposed. Toldo et al. (2021) proposed a feature-clustering based UDA framework that groups features of the same class into tight and well-separated clusters.
Domain adaptation has also been employed in different biomedical image segmentation tasks. A multi-connected domain discriminator based UDA model for brain lesion segmentation was proposed by Kamnitsas et al. (2017). Dong et al. (2018) introduced another UDA framework for cardiothoracic ratio estimation through chest organ segmentation. Huo et al. (2018) proposed an end-to-end CycleGAN-based network for whole abdomen MRI to CT image synthesis and CT splenomegaly segmentation. Mahmood et al. (2019) proposed a nuclei segmentation approach in which a large dataset is generated by synthesis. Gholami et al. (2019) proposed a biophysics-based medical image segmentation framework which enriches the training dataset by generating synthetic tumor-bearing MR images. Hou et al. (2019) also synthesized annotated training data for histopathology image segmentation. Haq and Huang (2020) utilized adversarial learning at the output space along with a reconstruction network for nuclei segmentation. Xia et al. (2020) proposed the Uncertainty-aware Multi-view Co-Training (UMCT) framework, which is capable of utilizing large-scale unlabeled data to improve volumetric medical image segmentation. Raju et al. (2020) proposed a user-guided domain adaptation framework for liver segmentation which uses prediction-based adversarial domain adaptation to model the combined distribution of user interactions and mask predictions. EndoUDA proposed another UDA-based segmentation model for gastrointestinal endoscopy imaging, which comprises a shared encoder and a joint loss function for improved generalization to the unseen target domain (Celik et al., 2021). Li et al. (2021) proposed another GAN (Mirza and Osindero, 2014) based framework for Unsupervised Domain Adaptation of nuclei segmentation which also utilized self-ensembling and a conditional random field (Boykov and Kolmogorov, 2004). Sharma et al. (2022) proposed a mutual information based UDA method for cross-domain nuclei segmentation.
Several previous approaches (Dong et al., 2018; Tsai et al., 2018; Haq and Huang, 2020; Toldo et al., 2021) employed Unsupervised Domain Adaptation in either the output space or the feature space. Unlike these approaches, we apply domain adaptation at both the output space and the feature space. Additionally, unlike previous works, we utilize a reconstruction network to ensure that the target domain predictions spatially correspond to the target domain images. Also, several recent works (Huo et al., 2018; Gholami et al., 2019; Hou et al., 2019; Mahmood et al., 2019) applied complicated data synthesis techniques to generate a large training dataset. In contrast, we simply incorporate adversarial learning so that the source domain images can be translated to the target domain for further training.

3. Methodology
In this section, we first describe the problem that we aim to solve. Then, we introduce the details of our proposed Unsupervised Domain Adaptation (UDA) and Semi-Supervised Domain Adaptation (SSDA) frameworks. Finally, we discuss the implementation of the proposed models.

3.1. Problem definition
In our nuclei segmentation problem, the input $X$ is a nuclei histopathology image patch of size $H \times W \times 3$. The input $X$ comes from either the source domain or the target domain. Depending on the problem (i.e., unsupervised or semi-supervised) and the domain (i.e., source or target), we may also have the corresponding pixel-wise ground-truth label $Y$ of size $H \times W \times 1$, which is a binary mask. Then, using the segmentation network, we want to predict the segmentation output $\hat{Y}$ of size $H \times W \times 1$. Formally, in the Unsupervised Domain Adaptation (UDA) problem, the source domain consists of $N_s$ annotated images $\{(X_s, Y_s)\}$, and the target domain has $N_t$ unannotated images $\{X_t\}$. In the Semi-Supervised Domain Adaptation (SSDA) problem, the source domain is the same as in the UDA problem, and we assume that the target domain has $N_t^l$ images with annotations $\{(X_t^l, Y_t)\}$ and $N_t^u$ unannotated images $\{X_t^u\}$. In both the UDA and SSDA problems, the source domain data and the target domain data are related, but they come from different distributions (i.e., different organ or cancer types). For both unsupervised and semi-supervised domain adaptation, our ultimate goal is to learn nuclei segmentation models that accurately produce the segmentation outputs in the target domain.

3.2. Unsupervised Domain Adaptation
We refer to our nuclei segmentation Unsupervised Domain Adaptation (UDA) model as NuSegUDA; the framework is shown in Figure 3. NuSegUDA consists of four modules: the Segmentation network ($S$), the Reconstruction network ($R$), the Prediction Discriminator ($D_P$), and the Image Discriminator ($D_I$).

3.2.1. Segmentation network
The segmentation network $S$ takes an image $X$ as input and produces a segmentation prediction $\hat{Y}$ of the same size as the input. Here, $X$ can be either a source domain image $X_s$ or a target domain image $X_t$. Hence, the source domain prediction is $\hat{Y}_s = S(X_s)$, and the target domain prediction is $\hat{Y}_t = S(X_t)$. From the perspective of the GAN framework (Goodfellow et al., 2014), the segmentation network $S$ can be thought of as the generator module.
We train $S$ so that the source domain segmentation predictions $\hat{Y}_s$ are similar to the source domain ground-truth labels $Y_s$. Since in Unsupervised Domain Adaptation (UDA) the ground-truth labels are not available for target images, we cannot compute any supervised pixel-level loss for target predictions. In practice, we found that combining a dice-coefficient loss and an entropy minimization loss is more effective than simply using a binary cross-entropy loss for nuclei segmentation tasks. Therefore, we define the segmentation loss $L_{seg}$ (Equation 3) as a combination of these two losses, computed on $Y'_s$ and $\hat{Y}'_s$, the flattened $Y_s$ and $\hat{Y}_s$, respectively. A question may arise as to why we use a single segmentation network $S$ in NuSegUDA although we have two different domains. Since we are looking for nuclei in images from both domains, it would be unusual to use multiple segmentation networks. Additionally, using two segmentation networks would increase the number of learnable parameters, which would in turn slow down the training process. Therefore, a single segmentation network helps to avoid memory issues and training latency in NuSegUDA.
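The combination of a dice-coefficient loss and an entropy minimization loss described above can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' implementation: the function names, the smoothing constant `eps`, and the choice to add the two terms without weights are all assumptions.

```python
import math


def dice_loss(y_true, y_pred, eps=1e-7):
    """1 - Dice coefficient between a flattened binary mask and flattened probabilities."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return 1.0 - (2.0 * intersection + eps) / (sum(y_true) + sum(y_pred) + eps)


def entropy_loss(y_pred, eps=1e-7):
    """Mean binary entropy of the predictions; small when predictions are confident."""
    total = 0.0
    for p in y_pred:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    return total / len(y_pred)


def segmentation_loss(y_true, y_pred):
    """Assumed combination: unweighted sum of the dice and entropy terms."""
    return dice_loss(y_true, y_pred) + entropy_loss(y_pred)
```

A confident, correct prediction drives both terms toward zero, while an uncertain prediction of 0.5 everywhere is penalized by both.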
Training the segmentation network $S$ with only the annotated source data teaches $S$ to make accurate predictions for source images. However, this segmentation network may generate incorrect outputs for target images, as there are visual discrepancies between source images and target images (see Figure 2). This visual gap between domains causes the domain shift problem. Following our earlier observation that nuclei segmentation outputs are domain-invariant, we require $S$ to produce target domain predictions that are as close as possible to the source domain predictions. In other words, we want to make the distribution of target predictions $\hat{Y}_t$ closer to the distribution of source predictions $\hat{Y}_s$. For this reason, we utilize the Prediction Discriminator $D_P$ in NuSegUDA and define the prediction adversarial loss $L_{advP}$ (Equation 4) on the discriminator output $D_P(\hat{Y}_t)$, where $\hat{Y}_t = S(X_t)$, and $H_p$ and $W_p$ are the height and width of that output. The details of the Prediction Discriminator $D_P$ are discussed in Section 3.2.3.
The prediction adversarial loss in Equation (4) helps $S$ to fool the prediction discriminator so that it considers $\hat{Y}_t$ to be source domain segmentation outputs. The segmentation loss and the prediction adversarial loss jointly guide $S$ to generate target domain predictions $\hat{Y}_t$ that look similar to source domain ground truths.
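For orientation, adversarial losses of this kind are commonly written as a cross-entropy against the "source" label. One plausible form that is consistent with the symbols defined above is sketched here. It assumes that $D_P$ outputs a per-location probability map and that the source label is 1; it should not be read as the exact published Equation (4).

```latex
L_{advP}(X_t) = -\frac{1}{H_p W_p} \sum_{h=1}^{H_p} \sum_{w=1}^{W_p}
    \log\!\left( D_P(\hat{Y}_t)^{(h,w)} \right),
\qquad \hat{Y}_t = S(X_t)
```

Minimizing this term with respect to $S$ pushes $D_P(\hat{Y}_t)$ toward 1 at every location, i.e., toward being classified as a source domain output.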

3.2.2. Reconstruction network
As mentioned earlier, the segmentation network $S$ produces domain-invariant predictions for both domains. In other words, we want to generate the target domain predictions so that they become similar to the source domain predictions. However, it is highly probable that the target predictions are not well correlated with the corresponding target input images. In this scenario, being able to reconstruct images from the predictions with a visual appearance similar to the input images ensures that there is a correlation between the input image and the segmentation output.
To ensure that our target domain predictions spatially correspond to the target domain images, a reconstruction network $R$ is used in NuSegUDA. Similar to Xia and Kulis (2017), we consider the segmentation network $S$ and the reconstruction network $R$ as an encoder and a decoder, respectively. $R$ reconstructs target images from the corresponding predictions. Thus, $S$ and $R$ together work as an autoencoder.
Using our reconstruction network $R$, we first reconstruct the target input images $X_t$ from $\hat{Y}_t$. Then, we calculate the pixel-wise reconstruction loss $L_{recons}$ (Equation 5) between $X_t$ and $R(\hat{Y}_t)$, where $R(\hat{Y}_t)$ is the output of the reconstruction network for $\hat{Y}_t$, and $C$ is the number of channels of the input image $X_t$.
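A pixel-wise reconstruction loss of this kind can be sketched as an average over all $H \times W \times C$ entries. The version below assumes a squared-error penalty, which the text does not state, and represents images as nested Python lists purely for illustration; the function name is hypothetical.

```python
def reconstruction_loss(image, reconstruction):
    """Mean squared error over all H x W x C entries of two nested-list images.

    Assumption: squared error is used; an absolute-error variant would only
    change the per-entry penalty.
    """
    total, count = 0.0, 0
    for row_x, row_r in zip(image, reconstruction):
        for pixel_x, pixel_r in zip(row_x, row_r):
            for x, r in zip(pixel_x, pixel_r):
                total += (x - r) ** 2
                count += 1
    return total / count  # count equals H * W * C
```

Because the loss compares values entry by entry, it says nothing about the overall distribution of pixel values in the reconstruction.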

FIGURE 3
Complete architecture of NuSegUDA. The segmentation network generates segmentation outputs, from which the reconstruction network reconstructs input images. The prediction discriminator distinguishes between source domain outputs and target domain outputs. The image discriminator distinguishes between original images and reconstructed images.

Although we use the above reconstruction loss to reconstruct the target domain images from their predictions, the reconstructed images may have very different textures and styles (for both nuclei and background) than the original images (see Figure 4). The reason is that the pixel-wise reconstruction loss $L_{recons}$ (Equation 5) cannot capture the overall pixel distribution of target domain images. To solve this issue, in addition to $L_{recons}$, we also utilize an Image Discriminator $D_I$ to distinguish between the original images and the reconstructed images. To train $R$ and $S$ to generate reconstructed images that resemble the originals, we define an image reconstruction adversarial loss $L_{advI}$ on the image discriminator output $D_I(\hat{X}_t)$, where $\hat{X}_t = R(\hat{Y}_t)$, and $H_i$ and $W_i$ are the height and width of that output. This adversarial loss $L_{advI}$ trains $R$ and $S$ to reconstruct target domain images whose distribution (in terms of texture, style, color distribution, etc.) is similar to that of the original target domain images.
In NuSegUDA, $L_{advP}$ helps the segmentation network $S$ to generate target predictions $\hat{Y}_t$ that are similar to the source predictions $\hat{Y}_s$. Due to $L_{advI}$, the reconstruction network $R$ learns to reconstruct target images (i.e., $\hat{X}_t$) that are very similar to the original target images in terms of texture, style, color distribution, etc. In other words, $S$ maps images from both domains (i.e., $X_s$ and $X_t$) to a common prediction subspace $\mathbb{R}^{n_p}$, and from $\mathbb{R}^{n_p}$, $R$ reconstructs the images in the target domain. Therefore, using $S$ and $R$ we can translate source domain images $X_s$ to the target domain. Thus, the target-translated source domain images are $X_{s \to t} = R(S(X_s))$. Figure 4 visualizes the impact of the image reconstruction adversarial loss $L_{advI}$ on $X_{s \to t}$. Finally, we train the segmentation network $S$ with $\{(X_{s \to t}, Y_s)\}$ using the loss $L_{trans}$, which is a combination of a dice-coefficient loss and an entropy minimization loss computed on $Y'_s$ and $\tilde{Y}'_s$, the flattened $Y_s$ and $\tilde{Y}_s$, respectively, where $\tilde{Y}_s = S(X_{s \to t})$.
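The quantities introduced in this subsection can be summarized in one place. The form of $L_{advI}$ below mirrors the usual generator-side adversarial loss and assumes that $D_I$ outputs a per-location probability map with label 1 for original images; it is a sketch consistent with the definitions above rather than the published equation.

```latex
\hat{X}_t = R(S(X_t)), \qquad
L_{advI}(X_t) = -\frac{1}{H_i W_i} \sum_{h=1}^{H_i} \sum_{w=1}^{W_i}
    \log\!\left( D_I(\hat{X}_t)^{(h,w)} \right)

X_{s \to t} = R(S(X_s)), \qquad
\tilde{Y}_s = S(X_{s \to t})
```

The second line makes explicit that the translated image $X_{s \to t}$ inherits the source label $Y_s$, which is what allows $L_{trans}$ to be a supervised loss.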

3.2.3. Discriminators
We utilize two discriminators in NuSegUDA: the Prediction Discriminator ($D_P$) and the Image Discriminator ($D_I$). The Prediction Discriminator distinguishes between source domain outputs and target domain outputs, whereas the Image Discriminator distinguishes between original images and reconstructed images. We discuss the details of both discriminators below.

Prediction discriminator As our goal is to generate similar predictions for both source images and target images, we incorporate the prediction discriminator $D_P$ in NuSegUDA. This discriminator takes a source domain prediction or a target domain prediction as input, and then distinguishes whether the input (i.e., the prediction) comes from the source domain or the target domain. To train $D_P$, we use a cross-entropy loss with domain label $z_p$, where $z_p = 0$ when $D_P$ takes a target domain prediction as its input, and $z_p = 1$ when the input is a source domain prediction.

Image discriminator We use the image discriminator $D_I$ in NuSegUDA so that the distribution of reconstructed images becomes similar to the distribution of original images. The input of $D_I$ is either an original target image or a reconstructed target image, and $D_I$ distinguishes whether the input is the original or the reconstructed one. Similar to $D_P$, we train $D_I$ with a cross-entropy loss with label $z_i$, where $z_i = 0$ when $D_I$ takes a reconstructed target image $\hat{X}_t$ as its input, and $z_i = 1$ when the input is an original target image $X_t$.
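Both discriminators are trained with the same kind of binary cross-entropy objective, differing only in their inputs and in the meaning of the label $z$. The sketch below assumes that each discriminator returns a 2D map of probabilities and averages the loss over that map; the function name and the nested-list representation are illustrative choices.

```python
import math


def discriminator_loss(d_out, z, eps=1e-7):
    """Binary cross-entropy averaged over a 2D discriminator output map.

    d_out: nested list (H x W) of probabilities that the input is a
           source prediction (for D_P) or an original image (for D_I).
    z:     label, 1 for source/original inputs and 0 for
           target/reconstructed inputs.
    """
    total, count = 0.0, 0
    for row in d_out:
        for p in row:
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(z * math.log(p) + (1 - z) * math.log(1.0 - p))
            count += 1
    return total / count
```

For example, an output of 0.9 incurs a small loss when the label is 1 and a large loss when the label is 0, which is the signal that trains the discriminator to separate the two kinds of input.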

3.2.4. Feature-level adaptation
In addition to image-level domain adaptation at the outputs, we also apply feature-level domain adaptation in NuSegUDA to reduce the domain gap in the feature space. We assume that our segmentation network $S$ is composed of an encoder $S_E$ and a decoder $S_D$ (i.e., $S = S_E \circ S_D$). Here, the encoder $S_E$ works as a feature extractor. Due to the discrepancy of input statistics across domains, there is also a shift of the feature distribution in the feature space spanned by $S_E$. Similar to Toldo et al. (2021), we utilize a clustering loss at the feature level as a constraint toward a class-conditional feature alignment between domains.
Given a source image $X_s$ and a target image $X_t$, we first extract the features $F_s = S_E(X_s)$ and $F_t = S_E(X_t)$. Then, the clustering loss $L_{cl}$ (Equation 12) is computed over the feature vectors of both domains, where $f_i$ is the feature vector corresponding to a spatial location of $F_s$ or $F_t$, $\hat{y}_i$ is the corresponding predicted class, and $C$ is the set of semantic classes, which is $\{0, 1\}$ for our nuclei segmentation problem. To compute $\hat{y}_i$, the segmentation prediction $\hat{Y}$ is downsampled to match the spatial dimension of $F$. We set the distance function $d(\cdot)$ to the L1 norm. In Equation (12), $c_j$ denotes the centroid of semantic class $j$, computed as the mean of the feature vectors $f_i$ selected by the indicator $\delta_{j,\hat{y}_i}$, which is equal to 1 if $\hat{y}_i = j$ and to 0 otherwise. The clustering loss in Equation (12) is composed of two terms: the first term measures how close the features are to their respective centroids, and the second term measures how far the semantic class centroids are from each other. According to the first term, the feature vectors of the same class, from the same or different domains, are tightened around the class feature centroids. Because of the second term, a repulsive force is applied to the feature centroids of different classes, which moves them apart.
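The two-term structure described above (attraction of features to their class centroid, repulsion between centroids of different classes) can be sketched as follows. This is an illustrative plain-Python version, assuming that each term is a simple average and that the second term is subtracted; the function names and normalization are assumptions, not the exact published Equations (12) and (13).

```python
def l1(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def class_centroids(features, labels, classes=(0, 1)):
    """Mean feature vector per predicted class; classes without members are skipped."""
    centroids = {}
    for j in classes:
        members = [f for f, y in zip(features, labels) if y == j]
        if members:
            dim = len(members[0])
            centroids[j] = [sum(f[k] for f in members) / len(members) for k in range(dim)]
    return centroids


def clustering_loss(features, labels, classes=(0, 1)):
    """Mean distance to own centroid minus mean distance between different centroids."""
    centroids = class_centroids(features, labels, classes)
    attract = sum(l1(f, centroids[y]) for f, y in zip(features, labels)) / len(features)
    present = list(centroids)
    pairs = [(j, k) for j in present for k in present if j != k]
    repel = sum(l1(centroids[j], centroids[k]) for j, k in pairs) / len(pairs) if pairs else 0.0
    return attract - repel
```

In this sketch, `features` would hold the feature vectors from both $F_s$ and $F_t$, so that source and target features of the same predicted class are pulled toward a shared centroid.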
Thus, when training our segmentation network $S$ and reconstruction network $R$, we minimize a total loss (Equation 14) that combines the losses defined above, where $\lambda_{advP}$, $\lambda_{recons}$, $\lambda_{advI}$, $\lambda_{trans}$, and $\lambda_{cl}$ are the weights that balance the corresponding losses.
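Given the five weights listed above, a natural reading of the total loss is a weighted sum in which the segmentation loss carries no weight of its own. The expression below is a sketch under that assumption, since no $\lambda_{seg}$ is mentioned in the text.

```latex
L_{total} = L_{seg}(X_s)
          + \lambda_{advP}\, L_{advP}(X_t)
          + \lambda_{recons}\, L_{recons}(X_t)
          + \lambda_{advI}\, L_{advI}(X_t)
          + \lambda_{trans}\, L_{trans}(X_{s \to t})
          + \lambda_{cl}\, L_{cl}
```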

3.3. Semi-Supervised Domain Adaptation
In the Semi-Supervised Domain Adaptation (SSDA) problem, we aim to make the best use of the available target domain annotations $Y_t$ when training our segmentation network $S$. For such scenarios, we extend the proposed NuSegUDA framework to NuSegSSDA, a nuclei segmentation SSDA model.
In NuSegSSDA, for unannotated target images $X_t^u$ we follow the same steps as in NuSegUDA. However, when we encounter an annotated target sample $(X_t^l, Y_t)$ during training, we additionally compute the segmentation loss $L_{seg}(X_t^l)$ in a manner similar to Equation (3). Then, while computing the total loss, we incorporate $L_{seg}(X_t^l)$ so that the segmentation network learns to generate predictions closer to the target ground truths. Therefore, the total loss in Equation (14) is extended with the additional term $L_{seg}(X_t^l)$.

4. Experiments

4.1. Datasets
In our experiments, we use two H&E stained histopathology datasets with ground-truth annotations for nuclei segmentation. Both datasets are public. We briefly describe them below.

Dataset-1 (KIRC) This dataset was introduced by Irshad et al. (2014); its images are extracted at 40x magnification from Whole Slide Images (WSI) of Kidney Renal Clear cell carcinoma (KIRC). This dataset, referred to as KIRC, consists of 486 H&E stained histology images of 400 × 400 pixel size with annotations made by expert pathologists and research fellows. In our experiments, we randomly split KIRC into 80% for training, 10% for validation, and 10% for testing.

Dataset-2 (TNBC) Naylor et al. (2018) generated this dataset by collecting slides from Triple Negative Breast Cancer (TNBC) patients at 40x magnification. For a total of 50 H&E stained histology images of pixel size 512 × 512, labeling was performed by an expert pathologist and research fellows. We follow the same data splitting as for KIRC. We refer to this dataset as TNBC in our experiments.

Visual differences among datasets Although both datasets consist of H&E stained histopathology images, they are collected from two different organs, cancer types, and institutions. KIRC images are collected from the TCGA portal (the image acquisition tools are unknown to us), whereas TNBC images were acquired at the Curie Institute using a Philips Ultra Fast Scanner 1.6RA. Differences in organ, cancer type, institution, and imaging tools and protocols cause the visual difference between the images of these two datasets. See Figure 2, where the TNBC image looks dimmer than the KIRC image.
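The random 80%/10%/10% split described above can be sketched as follows. The helper name, the fixed seed, and the rounding rule (fractions are truncated and the remainder goes to the test set) are assumptions made for illustration; the text does not specify how fractional counts were handled.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle a list of items and split it into train/validation/test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(train_frac * len(shuffled))
    n_val = int(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder after train and validation
    return train, val, test
```

For a dataset of 50 images such as TNBC, this yields 40 training, 5 validation, and 5 test images.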

4.2. Implementations
In our work, we use U-Net (Ronneberger et al., 2015) as both our segmentation network and our reconstruction network. We choose U-Net so that our proposed segmentation framework can be directly applied in other biomedical domains. We prefer U-Net over UNet++ (Zhou et al., 2018) because it has fewer parameters. Following DCGAN (Radford et al., 2015), we design our prediction discriminator and image discriminator with five convolutional layers each. To train NuSegUDA and NuSegSSDA, we follow the training strategy of GAN (Goodfellow et al., 2014). We use the Adam optimizer (Kingma and Ba, 2014) with learning rates of 0.0001, 0.001, 0.001, and 0.001 for the segmentation network, reconstruction network, prediction discriminator, and image discriminator, respectively. We empirically choose 0.001, 0.01, 0.001, 0.001, and 0.002 as $\lambda_{advP}$, $\lambda_{recons}$, $\lambda_{advI}$, $\lambda_{trans}$, and $\lambda_{cl}$, respectively. We implement NuSegUDA and NuSegSSDA using PyTorch (Paszke et al., 2019) and train them on a single GPU. We do not use any data augmentation in our experiments.
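The hyperparameters reported above can be collected into a small configuration, together with a helper that forms the weighted sum of the individual loss values. The dictionary layout, key names, and the helper `total_loss` are illustrative, and the helper assumes that the segmentation loss has an implicit weight of 1.

```python
# Loss weights as reported in the text; key names are illustrative.
LOSS_WEIGHTS = {
    "advP": 0.001,
    "recons": 0.01,
    "advI": 0.001,
    "trans": 0.001,
    "cl": 0.002,
}

# Adam learning rates as reported in the text; key names are illustrative.
LEARNING_RATES = {
    "segmentation": 1e-4,
    "reconstruction": 1e-3,
    "prediction_discriminator": 1e-3,
    "image_discriminator": 1e-3,
}


def total_loss(seg, terms, weights=LOSS_WEIGHTS):
    """Weighted sum of loss values; `seg` is assumed to carry weight 1."""
    return seg + sum(weights[name] * value for name, value in terms.items())
```

With these weights, the segmentation loss dominates the objective unless the auxiliary terms are several orders of magnitude larger.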
From Table 1, we see that the source-trained U-Net gives the lower bound of experimental performance (see the first row of Table 1), which happens because of the visual domain gap between source training images and target test images, also known as the domain shift problem. We see that our proposed UDA model NuSegUDA outperforms all UDA baseline models in terms of IoU%, Dice score, and Hausdorff distance. Specifically, NuSegUDA has 1.28 and 0.42 higher IoU% than the best generic UDA baseline OrClEmb and the best biomedical UDA baseline MaNi, respectively. Figure 5 shows the visualization results of CellSegUDA, SelfEnsemb, MaNi, and NuSegUDA. In Table 1, the second to last row [i.e., U-Net (target-trained)] shows the upper bound of experimental performance (i.e., training U-Net with TNBC-train and testing it on TNBC-test).

Experiment-2 (TNBC → KIRC) We conduct another experiment in a similar way to experiment-1 by selecting TNBC as the source domain and KIRC as the target domain. This experiment also shows that NuSegUDA achieves higher segmentation accuracy than the other approaches (see the last three columns of Table 1).
In (C-F), green pixels, red pixels, and blue pixels indicate the true positives, false positives, and false negatives, respectively. In other words, green and red pixels indicate the predicted nuclei pixels, whereas green and blue pixels indicate the ground-truth nuclei pixels. This average-dense nuclei histopathology image in (A) is chosen so that the reader can easily find the visual differences without further zooming in.
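The true positive, false positive, and false negative pixels described in this caption are also the quantities from which IoU and Dice scores are computed. A minimal sketch for flattened binary masks follows; the function name and the convention for two empty masks are assumptions.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two flattened binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)  # green pixels
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)  # red pixels
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)  # blue pixels
    union = tp + fp + fn
    if union == 0:
        # Both masks are empty; treated here as perfect agreement.
        return 1.0, 1.0
    return tp / union, 2 * tp / (2 * tp + fp + fn)
```

For a prediction with one true positive, one false positive, and one false negative, this gives an IoU of 1/3 and a Dice score of 0.5.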

Semi-Supervised Domain Adaptation
Experiment-1 (KIRC → TNBC) In experiment-1, we assess our Semi-Supervised Domain Adaptation (SSDA) method NuSegSSDA for KIRC → TNBC. Table 2 shows the experimental performance of NuSegSSDA. For this experiment, the source dataset KIRC is the same as in the UDA experiments. However, we now treat TNBC as partially labeled. We train NuSegSSDA assuming that 10%, 25%, 50%, and 75% of the images in the TNBC-train dataset have annotations available. Testing on TNBC-test then gives increasing IoUs and Dice scores and decreasing Hausdorff distances. This happens because NuSegSSDA can identify more false negative nuclei and remove some false positive nuclei as we train it with more target annotations (see Figure 6). We compare NuSegSSDA with the fully-supervised model U-Net (Ronneberger et al., 2015) and the baseline biomedical SSDA model CellSegSSDA (Haq and Huang, 2020) to demonstrate the superiority of our proposed SSDA model. To train U-Net, we combine the full KIRC dataset with the same 10%, 25%, 50%, and 75% of TNBC-train that we chose to train NuSegSSDA. We observe that the accuracy of NuSegSSDA approaches the upper bound (only lower by 1.35 IoU%) as we train with more annotations from the target domain.

Experiment-2 (TNBC → KIRC) In our second experiment, we select TNBC as the source domain and KIRC as the target domain. The second experiment also shows that NuSegSSDA outperforms U-Net (Ronneberger et al., 2015) and CellSegSSDA (Haq and Huang, 2020) (see the last three columns of Table 2). Similar to experiment-1, we again see that the segmentation accuracy of NuSegSSDA increases when more target images are annotated.

Ablation studies
To verify the robustness of the proposed UDA framework, we perform extensive ablation studies on the adaptation of NuSegUDA from KIRC to TNBC and from TNBC to KIRC. First, we examine the contribution of each loss to the final IoU%, Dice score, and Hausdorff Distance; then, we investigate the effects of different segmentation network backbones on NuSegUDA.
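For reference, the three evaluation metrics can be sketched as follows for a pair of binary masks. This is a minimal illustration under stated assumptions and not the evaluation code used in this work: the Hausdorff Distance is computed by brute force over all foreground pixels (rather than over object boundaries or per nucleus), and the function names and toy masks are hypothetical.

```python
import numpy as np

def iou_and_dice(pred, gt):
    """IoU = TP / (TP + FP + FN); Dice = 2 TP / (2 TP + FP + FN)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.sum(pred & gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    if tp + fp + fn == 0:                # both masks empty: perfect agreement
        return 1.0, 1.0
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)

def hausdorff_distance(pred, gt):
    """Symmetric Hausdorff distance between the foreground pixel sets."""
    p = np.argwhere(np.asarray(pred, dtype=bool))
    g = np.argwhere(np.asarray(gt, dtype=bool))
    if len(p) == 0 and len(g) == 0:
        return 0.0
    if len(p) == 0 or len(g) == 0:
        return float("inf")              # undefined when only one set is empty
    # Pairwise Euclidean distances; O(|p| * |g|) memory, fine for small masks.
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy masks (hypothetical): one TP, one FP, one FN.
pred = np.array([[1, 1, 0], [0, 0, 0]])
gt = np.array([[1, 0, 1], [0, 0, 0]])
iou, dice = iou_and_dice(pred, gt)
hd = hausdorff_distance(pred, gt)
```

Higher IoU and Dice values and lower Hausdorff Distances indicate better segmentation, which is the direction of improvement reported in the tables.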

Effectiveness of losses
The contribution of the image adversarial loss $L_{advI}$, the target-translated source supervised loss $L_{trans}$, and the clustering loss $L_{cl}$ to our proposed NuSegUDA model is shown in Table 3. We see that applying only $L_{advI}$ or only $L_{cl}$ to CellSegUDA (Haq and Huang, 2020) gives slightly better performance than CellSegUDA alone. However, when we apply only the target-translated source supervised loss $L_{trans}$ to CellSegUDA, the performance is inferior due to the absence of $L_{advI}$. Without the image adversarial loss $L_{advI}$, the target-translated source images $X_{s \to t}$ look very different from the target-domain images in terms of texture, style, color distribution, etc. (see Figure 4). As a result, the performance of the model (i.e., CellSegUDA w/ $L_{trans}$) decreases when it is trained with these $X_{s \to t}$ images. Similarly, NuSegUDA w/o $L_{advI}$ performs much worse than NuSegUDA, because it is trained with less realistic target-translated source-domain images. This again validates the effectiveness of $L_{advI}$ in NuSegUDA. Finally, with all the proposed losses enabled, we achieve the best-performing model, NuSegUDA, in both experiments, which demonstrates the combined impact of the newly proposed image adversarial loss, target-translated source supervised loss, and clustering loss. Figure 7 shows the visualization results of NuSegUDA w/o $L_{advI}$, NuSegUDA w/o $L_{trans}$, NuSegUDA w/o $L_{cl}$, and NuSegUDA.

Impacts of different segmentation networks
In NuSegUDA, we use U-Net (Ronneberger et al., 2015) as the backbone segmentation network. We also assess the model performance by replacing the backbone segmentation network with two other frequently used Convolutional Neural Network (CNN) based approaches: FCN (Long et al., 2015) and UNet++ (Zhou et al., 2018). As mentioned earlier, CNN-based approaches are still the dominant ones for semantic segmentation of nuclei. However, due to the intrinsic locality and limited receptive fields of convolution operations, CNN-based models may be incapable of capturing the global context of the input (Chen et al., 2021; Jia et al., 2021; Zheng et al., 2021). To this end, we explore the feasibility of Transformers, an alternative to CNNs, as the backbone segmentation network in NuSegUDA. Transformers mainly use a self-attention mechanism to extract inherent features (Tay et al., 2020), and this mechanism makes them powerful at modeling the global context of an input (Zheng et al., 2021). To examine the effectiveness of a Vision Transformer based model, we replace U-Net in NuSegUDA with TransUNet (Chen et al., 2021), which combines a hybrid CNN-Transformer encoder with a decoder.

Table 4 shows the quantitative results of using different segmentation networks in NuSegUDA. Among the CNN-based models, UNet++ performs best in Experiment-1 and U-Net performs best in Experiment-2. We also see that the Transformer-based model TransUNet does not give better accuracy than U-Net and UNet++ in either experiment. This is due to our small training datasets, since Vision Transformers (VT) need a lot of training data, usually more than standard CNNs require (Liu et al., 2021).
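The loss ablation variants reported in Table 3 differ only in which auxiliary loss terms are enabled. A minimal sketch of this bookkeeping is given below, assuming that the overall objective is a weighted sum of a base objective and the three auxiliary terms. The weights, the scalar loss values, the dictionary keys, and the function name `total_loss` are illustrative placeholders and are not values or names from this work.

```python
def total_loss(base, extras, weights, enabled):
    """Base objective plus the weighted auxiliary terms that are enabled."""
    return base + sum(weights[name] * extras[name] for name in enabled)

# Placeholder scalars standing in for per-batch loss values (hypothetical).
base = 1.0                                   # stands in for the base UDA objective
extras = {"adv_img": 0.5, "trans": 0.25, "cl": 0.125}
weights = {"adv_img": 1.0, "trans": 1.0, "cl": 1.0}   # assumed, not from the paper

# Each ablation variant lists the auxiliary terms it keeps.
variants = {
    "full": ["adv_img", "trans", "cl"],
    "w/o adv_img": ["trans", "cl"],
    "w/o trans": ["adv_img", "cl"],
    "w/o cl": ["adv_img", "trans"],
}
losses = {name: total_loss(base, extras, weights, terms)
          for name, terms in variants.items()}
```

In an actual training loop the scalars would be replaced by differentiable loss tensors; keeping the variants in one dictionary makes it easy to train every configuration with otherwise identical settings.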

Conclusion
Accurate nuclei segmentation is a significant step for cancer diagnosis and further clinical procedures. Collecting a fully annotated nuclei segmentation dataset, or manually labeling an unannotated dataset, is expensive, time-consuming, and impractical, although such annotations are required to train Convolutional Neural Networks in a fully-supervised manner. In this work, we propose a novel Unsupervised Domain Adaptation (UDA) framework named NuSegUDA for segmenting nuclei in unannotated datasets by utilizing adversarial learning. In NuSegUDA, we apply domain adaptation in both the feature space and the output space. We also incorporate an image adversarial loss and a target-translated source supervised loss into NuSegUDA, and train the model with target-translated source-domain images. Extensive experimental results validate the effectiveness of each of the newly proposed modules and losses, and the superiority of NuSegUDA over baseline models. Finally, assuming that a few annotations are available from the target domain, we extend our work to Semi-Supervised Domain Adaptation (SSDA). We expect our proposed UDA and SSDA approaches to be useful in other biomedical image segmentation tasks.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.