
ORIGINAL RESEARCH article

Front. Neurorobot., 05 January 2026

Volume 19 - 2025 | https://doi.org/10.3389/fnbot.2025.1665528

Subdomain adaptation method based on transferable semantic alignment and class correlation

Qian Han1, Jinfu Lao2, Jinyong Zhang1*
  • 1Department of Computer Engineering, Maoming Polytechnic, Maoming, China
  • 2China Mobile Communications Group Guangdong Co., Ltd., Maoming Branch, Maoming, China

To address these challenges, we propose a subdomain adaptation framework driven by transferable semantic alignment and class correlation. First, source and target domains are divided into subdomains according to class labels, and a joint subdomain distribution alignment mechanism is introduced to reduce intra-class distribution divergence while enlarging inter-class disparities. Second, a domain-adaptive semantic consistency loss is employed to cluster semantically similar samples and separate dissimilar ones in a unified representation space, enabling precise cross-domain semantic alignment. Third, pseudo-label quality in the target domain is improved via temperature-based label smoothing, complemented by a class correlation matrix and a loss function capturing inter-class relationships to exploit intrinsic intra-class coherence and inter-class distinction. Extensive experiments on multiple public datasets demonstrate that the proposed method achieves superior average classification accuracy compared to existing approaches, validating the effectiveness of semantic alignment and class correlation modeling. By explicitly modeling intra-class coherence and inter-class distinction without additional architectural complexity, the framework effectively mitigates domain shift, enhances semantic alignment, and improves recognition performance on the target domain, offering a robust solution for deep unsupervised domain adaptation.

1 Introduction

The past few years have seen rapid evolution of representation learning approaches, yielding notable improvements in visual computing and human language technologies (Mohammed and Kora, 2023). These advancements have primarily been driven by two critical factors: the swift advancement of powerful computational infrastructure, which enables the training of large-scale models, and the availability of massive labeled datasets, which allow deep models to learn rich and accurate feature representations (Niu et al., 2021; Yamada et al., 2021). However, collecting and annotating large-scale datasets remains a costly and labor-intensive process. Across specialized scenarios such as biomedical visual analysis and military-grade sensing, the acquisition of high-quality samples is often limited by practical constraints (Mei et al., 2022). Under such conditions, maintaining high model performance with limited or imbalanced data has become a central challenge in current research. This article's code is open-sourced at https://github.com/XXXX/XXXXX.

Domain adaptation (DA), a critical mechanism within the transfer learning paradigm, focuses on reducing domain shift by adapting knowledge acquired from the source domain to enhance learning in the target domain, where labeled data is limited or unavailable (Liu et al., 2022; Gao et al., 2022). DA is built on the premise that, while statistical discrepancies are present between the source and target domains, the two domains retain common, generalizable features that enable effective knowledge reuse. Through DA, models trained on the source domain can be effectively applied to the target domain with minimal or no labeled data, provided that appropriate adaptation strategies are employed (Singhal et al., 2023). By leveraging known target domain data—labeled or unlabeled—DA enables models to reduce domain shift and improve generalization. Nevertheless, in numerous practical applications, the availability of data from the target domain is often highly constrained or completely absent, which restricts the applicability of traditional domain adaptation methods (Zhou et al., 2022; Yang et al., 2025; Wang et al., 2025).

To overcome this limitation, increasing attention has been directed toward Domain generalization (DG), which can be viewed as an extension of DA (Fan et al., 2021). In domain generalization, the objective is to leverage multiple source domains to construct a model that can generalize effectively to domains not encountered during training. Unlike DA, DG does not rely on target domain data during training. Consequently, models must learn more generalizable and abstract representations from the source domains to address potential distribution shifts with unknown targets. DG thus presents a more realistic and challenging setting, and advancing this area is essential for enabling intelligent systems to function reliably in ever-changing and uncertain contexts, including self-driving vehicles and clinical decision-making (Khoee et al., 2024).

In summary, DA and DG constitute two core methodologies for handling distributional discrepancies in transfer learning scenarios. While the objective of DA is to bridge the statistical divergence between source and target domains through adaptation techniques that rely on scarce target domain samples, DG pushes the boundary further by requiring strong generalization to entirely unseen domains. Both paradigms have demonstrated substantial potential in practical applications (Bai et al., 2024). Focusing specifically on deep unsupervised domain adaptation, existing subdomain adaptation methods have achieved progress but still face two critical challenges (Yan et al., 2025). First, the semantic relationships among samples are often underexplored. Most current methods focus on aligning global or class-level distributions while overlooking fine-grained semantic structures within the same class. In practice, samples of the same class may exhibit significant intra-class diversity, while those of different classes may share overlapping semantics. Failure to capture these local semantic patterns can lead to suboptimal alignment and impaired discriminability on the target domain. Second, pseudo-label optimization remains complex and error-prone. Given the scarcity of human-annotated samples within the target domain, pseudo-labels are widely used for supervision. To enhance their accuracy, many approaches incorporate additional model components, complex training stages, or confidence calibration strategies. However, such designs increase training overhead and are susceptible to noise in early-stage pseudo-labels, ultimately degrading convergence and performance.

To address these issues, a novel subdomain adaptation method based on transferable semantic alignment and class correlation is proposed in this study. Specifically, the pairwise semantic transfer loss aims to enforce compact clustering of semantically similar instances while promoting greater separation for dissimilar ones in a shared representational domain, enabling fine-grained semantic alignment. A joint subdomain distribution alignment mechanism is introduced to simultaneously align intra-class subdomain distributions across domains while enhancing inter-class separability, thereby alleviating local domain shift. Moreover, a temperature-based soft pseudo-labeling strategy is adopted, and a novel class correlation loss is constructed based on the class-wise self-correlation matrix. This loss facilitates the learning of intra-class consistency and inter-class discriminability without introducing extra network components or multi-stage training, ensuring both simplicity and efficiency. The principal innovations of this research can be outlined as follows:

• A transferable semantic alignment loss is proposed to capture fine-grained semantic relations among samples. Leveraging a pairwise alignment approach, the model improves semantic cohesion among similar instances and enforces separation between divergent ones, thereby building more discriminative semantic structures in the feature space.

• A joint subdomain distribution alignment mechanism is introduced to mitigate local domain shift. By aligning the distributions of subdomains belonging to the same class and enhancing the disparity between those of different classes, the proposed method enables more precise cross-domain substructure alignment.

• A class correlation-driven pseudo-label optimization method is presented, balancing performance and simplicity. High-quality soft labels are generated using temperature rescaling, and a novel class correlation loss is formulated based on self-correlation matrices to enhance both intra-class consistency and inter-class discrimination, without requiring additional network structures or complex training procedures.

To further enhance transparency and reproducibility, we commit to releasing the full source code upon acceptance of this paper. All resources will be hosted on GitHub, enabling researchers to readily reproduce our results and adapt the proposed method to other domain adaptation tasks.

2 Related works

2.1 Unsupervised domain adaptation

Deep unsupervised domain adaptation (UDA) methods are commonly divided into two major paradigms: discrepancy minimization techniques and adversarial-based frameworks (Cheng et al., 2024). While discrepancy-driven techniques align source and target distributions by minimizing metrics such as maximum mean discrepancy (MMD) or correlation alignment (CORAL), adversarial-based methods promote the learning of domain-shared features through a min–max game between feature extractors and domain discriminators.

Discrepancy-based approaches typically align domains at three levels—domain-level, class-level, and sample-level—by quantifying and reducing distribution gaps between source and target domains. Ge et al. proposed a deep conditional adaptation network designed to minimize the conditional distribution gap between the source and target domains, thereby facilitating cross-domain adaptation (Ge et al., 2023). Chen et al. introduced HoMM, which performs higher-order moment matching to achieve finer-grained distribution alignment (Chen et al., 2020). Zhu et al. incorporated source-domain label information into the maximum mean and covariance discrepancy (MMCD), aligning domain differences at both the marginal and conditional levels to enhance model generalization (Zhu et al., 2024). Gilo et al. combined three adaptation strategies—local maximum mean discrepancy, correlation alignment, and entropy regularization—to achieve more precise alignment at both the domain and class levels (Gilo et al., 2024). Zhang et al. proposed A2LP, a label propagation method augmented with high-confidence virtual instances (termed anchor points), which refine pseudo-labels at the feature level and alternate between pseudo-label refinement and domain-invariant representation learning for adaptation (Zhang et al., 2020).

Inspired by the architecture of GAN (Goodfellow et al., 2014), adversarial learning techniques formulate a competitive training process between a representation learner and a domain discriminator to promote domain-invariant feature learning. Motivated by conditional GAN (Mirza and Osindero, 2014), the conditional domain adversarial network (CDAN; Long et al., 2018), proposed by Long et al., mitigates domain shift by aligning joint feature–label distributions, effectively enhancing the target domain’s classification capability. Chen et al. further introduced a contrastive adversarial adaptation approach, in which the balance between the feature extractor and the discriminator is optimized to strengthen cross-domain distributional alignment (Chen et al., 2024). Building on this, Yu et al. proposed a category-aware adversarial domain adaptation method. By employing multiple discriminators to capture diverse patterns and incorporating category prototype information, fine-grained alignment between source and target features was achieved at the category level (Yu and Wang, 2024).

Moreover, several studies have explored adversarial learning through dual classifiers. The Maximum Classifier Discrepancy (MCD) approach was developed by Saito et al. to address domain adaptation through classifier disagreement (Saito et al., 2018), which aims to identify target features outside the source support by promoting maximal prediction discrepancy between two independently trained classifiers, while forcing the feature generator to minimize this discrepancy to align the domains. Lü et al. proposed a neighborhood aggregation-based dual-classifier method, in which pseudo-labels for target samples are generated through a nearest-neighbor strategy, and multiple constraints are imposed from different perspectives to regularize the outputs of the two classifiers (Lü et al., 2025). Li et al. introduced a cross-domain gradient discrepancy minimization approach, where the explicit reduction of gradient differences between source and target samples enhances the classifier’s recognition accuracy on the target domain (Li et al., 2024).

2.2 Domain generalization

In recent years, domain generalization (DG) has emerged as a prominent research focus, aiming to train models from one or several related yet distributionally distinct source domains so that they maintain strong generalization performance in unseen target domains (Li et al., 2018). To this end, a variety of methods have been proposed. Dayal et al. integrated maximum mean discrepancy (MMD) with adversarial strategies to align cross-domain distributions, enabling consistency with arbitrary prior distributions and thereby enhancing model robustness (Dayal et al., 2023). Chen et al. introduced an adversarial augmentation mechanism based on angular center loss, which expands the source distribution in latent space and enlarges inter-class margins, thereby generating diverse samples to improve generalization (Chen et al., 2023). Cheng et al. developed an adversarial Bayesian augmentation approach, which synthesizes diversified data to improve performance in previously unseen domains (Cheng et al., 2023). Meanwhile, meta-learning has attracted increasing attention in the context of DG. Tian et al. proposed a cross-domain adaptive meta-learning framework that integrates structural relation modeling with semantic discrimination (Tian et al., 2025). By leveraging structural reduction and causality-driven feature disentanglement, their method extracts stable semantic features, significantly enhancing cross-domain adaptability. Qin et al. designed a bi-level meta-learning framework, where the lower level focuses on domain-specific feature representation and the higher level learns cross-domain priors, thereby improving knowledge transfer and generalization (Qin et al., 2023). Further, Chen et al. introduced a meta-causal learning paradigm that constructs auxiliary domains and employs counterfactual reasoning to identify and model the causal factors underlying the distribution shift between source and auxiliary domains (Chen et al., 2023). These causal insights are then embedded into factor-aware alignment, effectively mitigating distribution discrepancies during testing.

While most DG methods focus on the training phase, another line of work targets the testing phase by adapting models using unlabeled online test data. For example, Wang et al. developed Tent (Wang et al., 2020), a method that adjusts the model at inference by lowering entropy in predictions to boost confidence. Tent updates model parameters online during inference without access to labeled data, thereby reducing generalization error on new domains using only the test data and the model itself. Further extending this idea, Iwasawa et al. introduced T3A (Iwasawa and Matsuo, 2021), a test-phase method for tuning the classifier. T3A adapts the classifier module of a pre-trained model by: (1) deriving pseudo-prototypes per class through a base model trained on source data combined with incoming unlabeled target instances; and (2) classifying new samples based on their distances to the pseudo-prototypes. T3A requires no backpropagation and only modifies the final linear classification layer, resulting in negligible computational overhead during inference and avoiding the instability often caused by stochastic optimization.

3 Methodology

3.1 Proposed TSACC method

Assume that the set $D_S = \{X_S, Y_S\} = \{x_s^i, y_s^i\}_{i=1}^{n_s}$ consists of $n_s$ labeled source domain samples with data distribution $p$, and the set $D_T = \{X_T\} = \{x_t^i\}_{i=1}^{n_t}$ consists of $n_t$ unlabeled target domain samples with data distribution $q$, where $p \neq q$. It is additionally presumed that both the source and target domains possess an identical label space comprising $K$ categories, represented as $\mathcal{Y}_S = \mathcal{Y}_T = \mathcal{Y} = \{1, 2, 3, \ldots, K\}$. TSACC's objective is to create a deep learning model that effectively narrows the feature distribution divergence between aligned subdomains from the source and target domains, while acquiring transferable representations to reduce the risk on the target domain, $R_t(f) = \mathbb{E}_{(x,y)\sim q}[f(x) \neq y]$, through supervision from source labels.

The overall framework of the TSACC algorithm is shown in Figure 1. Specifically, the framework includes a shared feature extractor $F$, which maps the raw input data into a shared feature space, denoted as $f_s = F(x_s)$ and $f_t = F(x_t)$, and a task-specific classifier $C$, which is shared across domains and used to generate the corresponding predictions, denoted as $\hat{Y}_S = C(f_s)$ and $\hat{Y}_T = C(f_t)$. Within the shared feature space, TSACC first partitions data into subdomains based on class labels (using pseudo-labels for target samples). A novel Joint Subdomain Distribution Alignment Loss is then introduced. This loss simultaneously minimizes intra-class subdomain discrepancies to enhance transferability and maximizes inter-class subdomain disparities to improve discriminability, thereby minimizing the impact of domain shift across the source and target distributions. In addition, to promote cross-domain consistency, a Transferable Semantic Alignment Loss is designed to better align class-specific semantic features. This is achieved by encouraging more compact intra-class representations and greater inter-class separation, while replacing source-only features with combinations of source and target features to alleviate negative transfer. To reduce the adverse impact of overconfident pseudo-labels, a temperature rescaling strategy is employed, producing softened pseudo-labels that better reflect prediction uncertainty. Furthermore, a class self-correlation matrix is introduced, and a novel Class Correlation Loss is formulated. This loss function promotes stronger correlations within the same class and weaker correlations between different classes, thereby facilitating the model's ability to generalize shared traits while distinguishing class boundaries in the target domain. As a result, both the quality of pseudo-labels and the classification performance on the target domain are significantly improved. The following sections detail each component of the TSACC framework. The pseudocode of TSACC is presented in Algorithm 1.


Figure 1. Overall framework of the proposed TSACC method.


ALGORITHM 1. Transferable semantic alignment and class correlation.
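Algorithm 1 appears as an image in the original article; the Python sketch below is our reading of its per-iteration logic, not the authors' released code. The helper losses `jmmd_loss`, `tst_loss`, and `crl_loss` correspond to Equations 2, 5, and 9 and are sketched in the following sections; the hyperparameter values (T = 1.8, δ = 10) follow Section 4.1, and all function names are ours.

```python
import math
import torch
import torch.nn.functional as F_nn

def tsacc_step(F, C, x_s, y_s, x_t, alpha, beta, gamma, theta,
               num_classes, T=1.8, delta=10.0):
    """One TSACC optimization step. F: shared feature extractor,
    C: shared classifier, theta: training progress in [0, 1]."""
    f_s, f_t = F(x_s), F(x_t)              # shared feature space
    logits_s, logits_t = C(f_s), C(f_t)

    # Temperature-smoothed soft pseudo-labels for the target batch (Eq. 6).
    y_t_soft = torch.softmax(logits_t / T, dim=1)

    # Ramp-up tau in [0, 1) down-weights pseudo-label-dependent losses early.
    tau = 2.0 / (1.0 + math.exp(-delta * theta)) - 1.0

    loss = F_nn.cross_entropy(logits_s, y_s)                                     # Eq. 1
    loss = loss + alpha * tau * jmmd_loss(f_s, y_s, f_t, y_t_soft, num_classes)  # Eq. 2
    loss = loss + beta * tau * tst_loss(f_s, y_s, f_t, y_t_soft, num_classes)    # Eq. 5
    loss = loss + gamma * crl_loss(y_t_soft)                                     # Eq. 9
    return loss
```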

3.2 Source domain classification error loss

Ensuring the effectiveness and stability of unsupervised classification requires controlling empirical risk within the source domain. Specifically, the optimization of the feature extractor F and classifier C relies on minimizing the source-domain supervised classification error. The formal definition of this loss function is as follows:

$$\mathcal{L}_{cls}(x_s, y_s) = \frac{1}{n_s} \sum_{i=1}^{n_s} \mathcal{L}_{ce}\big(C(F(x_s^i)), y_s^i\big) \quad (1)$$

where $\mathcal{L}_{ce}(\cdot, \cdot)$ stands for the cross-entropy objective used to supervise classification.
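Equation 1 is the standard supervised cross-entropy on source samples; assuming `F` and `C` are the modules of Figure 1, it reduces to a single PyTorch call (which averages over the $n_s$ batch samples):

```python
import torch.nn.functional as F_nn

def cls_loss(F, C, x_s, y_s):
    # Mean cross-entropy between classifier logits and source labels (Eq. 1).
    return F_nn.cross_entropy(C(F(x_s)), y_s)
```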

3.3 Joint subdomain distribution alignment loss

There are usually various distribution differences between the data of the source domain and the target domain. To narrow this gap, traditional domain adaptation methods often concentrate on matching marginal or conditional distributions, but they often ignore the joint distribution discrepancy. By matching the joint probability distributions of the source and target domains, the model is guided to acquire richer and more generalizable knowledge, significantly enhancing its capability to adapt to the target domain.

Drawing on the JPDA technique (Zhang and Wu, 2020), which employs a discriminative joint probability–based variant of maximum mean discrepancy to bridge domain gaps, a new joint subdomain distribution alignment loss is designed in this paper based on local maximum mean discrepancy (LMMD). To enhance transferability across subdomains, the method aligns the distributions of source and target subsets belonging to the same category, and enlarges the distributional divergence among subdomains of different categories to strengthen their discriminative power, thereby significantly reducing the domain divergence between the source and target datasets. The formal definition of this loss function is as follows:

$$\mathcal{L}_{jmmd} = \frac{1}{K} \sum_{k=1}^{K} \Bigg\| \sum_{x_s^i \in D_S} w_{sk}^{i}\, \phi\big(F(x_s^i)\big) - \sum_{x_t^j \in D_T} w_{tk}^{j}\, \phi\big(F(x_t^j)\big) \Bigg\|_{\mathcal{H}}^{2} - \frac{1}{K(K-1)} \sum_{k=1}^{K} \sum_{l \neq k}^{K} \Bigg\| \sum_{x_s^i \in D_S} w_{sk}^{i}\, \phi\big(F(x_s^i)\big) - \sum_{x_t^j \in D_T} w_{tl}^{j}\, \phi\big(F(x_t^j)\big) \Bigg\|_{\mathcal{H}}^{2} \quad (2)$$

where $F(x^i)$ represents the feature vector of sample $x^i$ after being processed by the shared feature extractor $F$, $\phi(\cdot)$ denotes a feature mapping that embeds sample features into a reproducing kernel Hilbert space (RKHS), and $w_{sk}^{i}$ and $w_{tk}^{j}$ are the weights of samples $x_s^i$ and $x_t^j$ belonging to class $k$, with $\sum_{i=1}^{n_s} w_{sk}^{i} = \sum_{j=1}^{n_t} w_{tk}^{j} = 1$. The first component minimizes the distribution gap between source and target subdomains sharing the same class, whereas the second component maximizes the divergence between subdomains of distinct classes.
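The following sketch illustrates Equation 2 under two simplifying assumptions of ours: the kernel mapping $\phi$ is taken as the identity (a linear kernel rather than the RKHS embedding used in the paper), and the class weights are derived from one-hot source labels and soft target pseudo-labels, normalized per class.

```python
import torch
import torch.nn.functional as F_nn

def jmmd_loss(f_s, y_s, f_t, y_t_soft, num_classes):
    """Sketch of Eq. 2 with phi = identity (linear kernel)."""
    # Per-class sample weights: one-hot for source, soft pseudo-labels for target.
    w_s = F_nn.one_hot(y_s, num_classes).float()
    w_t = y_t_soft
    # Normalize columns so each class's weights sum to 1 (guard empty classes).
    w_s = w_s / w_s.sum(dim=0, keepdim=True).clamp(min=1e-8)
    w_t = w_t / w_t.sum(dim=0, keepdim=True).clamp(min=1e-8)
    mu_s = w_s.t() @ f_s   # (K, d) weighted subdomain centroids, source
    mu_t = w_t.t() @ f_t   # (K, d) weighted subdomain centroids, target
    K = num_classes
    # Intra-class term: pull same-class subdomains together.
    intra = ((mu_s - mu_t) ** 2).sum(dim=1).mean()
    # Inter-class term: push different-class subdomains apart.
    d2 = torch.cdist(mu_s, mu_t) ** 2
    inter = (d2.sum() - d2.diag().sum()) / (K * (K - 1))
    return intra - inter
```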

3.4 Transferable semantic alignment loss

Although the joint subdomain distribution alignment loss reduces distribution mismatches between matched subdomains and thereby mitigates domain shift, it neglects the fine-grained semantic content embedded within individual instances. This oversight might cause mismatches by aligning target features with unrelated source domain features; for example, features representing monitors in the target domain could be incorrectly positioned near mobile-phone features from the source domain. To address this problem, inspired by linear discriminant analysis (LDA; Xanthopoulos et al., 2013), this paper extends and improves the semantic transfer loss of MSTN (Xie et al., 2018) and proposes a novel transferable semantic alignment loss, which seeks to accurately model and leverage semantic correlations among samples for aligning class-wise semantic features across both domains. The semantic transfer loss employed in MSTN is formulated as:

$$\mathcal{L}_{ST} = \sum_{k=1}^{K} \Phi\big(C_S^k, C_T^k\big) \quad (3)$$

where $C_S^k = \Gamma(F(X_S^k))$ and $C_T^k = \Gamma(F(X_T^k))$ indicate the centroids of the $k$th class within the source and target domains' shared feature space, $\Phi(\cdot)$ is defined as a distance metric, and $\Gamma(\cdot)$ as the centroid extraction function.

Equation 3 only considers the semantic matching between same-class samples in the source and target domains. However, we argue that, whether inter-domain or intra-domain, the separation between samples of the same category in the shared feature space should be minimized, while the distance between samples belonging to different categories should be maximized. The semantic transfer loss is therefore extended in Equation 4.

$$\mathcal{L}_{TST} = \sum_{k=1}^{K} \Psi\big(C_S^k, C_T^k\big) + \lambda \sum_{k=1}^{K} \sum_{j \neq k}^{K} \Big(\Lambda\big(C_S^k, C_S^j\big) + \Lambda\big(C_S^k, C_T^j\big) + \Lambda\big(C_T^k, C_T^j\big)\Big) \quad (4)$$

where $\lambda$ is a balancing parameter, $\Psi(C_1, C_2) = \|C_1 - C_2\|_2$ denotes the Euclidean distance between centroids, $\Lambda(C_1, C_2) = \big(\frac{C_1 \cdot C_2}{\|C_1\|\,\|C_2\|}\big)^p$ indicates the centroid-wise cosine similarity, and $p$ is an exponent controlling the function's behavior. The initial term encourages alignment of samples sharing the same class from both source and target domains, while the latter term promotes discrimination among samples of distinct classes within and between domains.

In addition, given that significant distribution differences usually exist between source and target samples, transferring semantic representations directly between the two domains might lead to adverse transfer effects. To address this issue, this study proposes substituting the original source domain features with a hybrid of source and target features, thereby broadening the source domain’s semantic feature space and mitigating negative transfer during semantic alignment. Consequently, the final transferable semantic alignment loss is formulated in Equation 5.

$$\mathcal{L}_{TST} = \sum_{k=1}^{K} \Psi\big(C^k, C_T^k\big) + \lambda \sum_{k=1}^{K} \sum_{j \neq k}^{K} \Big(\Lambda\big(C^k, C^j\big) + \Lambda\big(C^k, C_T^j\big) + \Lambda\big(C_T^k, C_T^j\big)\Big) \quad (5)$$

where $C^k = \Gamma\big([F(X_S^k), F(X_T^k)]\big)$ corresponds to the average feature representation of all $k$th-class samples from the source and target domains in the common embedding space. By minimizing Equation 5, more compact representations can be created within each class, and greater distances between different classes can be achieved, empowering the model to identify semantic associations between the source and target inputs, leading to effective and accurate transfer of semantic structures.
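A minimal sketch of Equation 5 follows. Two details are our assumptions: soft pseudo-labels weight the target-side centroids, and cosine similarities are clamped to a small positive floor before raising them to the fractional power $p$ (the paper does not state how negative similarities are handled). The default `lam` is a placeholder, since the value of $\lambda$ is not reported; `p = 0.1` follows Section 4.1.

```python
import torch
import torch.nn.functional as F_nn

def class_centroids(feats, soft_labels):
    """Gamma(.): soft-weighted class centroids, shape (K, d)."""
    w = soft_labels / soft_labels.sum(dim=0, keepdim=True).clamp(min=1e-8)
    return w.t() @ feats

def tst_loss(f_s, y_s, f_t, y_t_soft, num_classes, lam=0.1, p=0.1):
    """Sketch of Eq. 5: C^k mixes source and target features of class k."""
    onehot_s = F_nn.one_hot(y_s, num_classes).float()
    c_t = class_centroids(f_t, y_t_soft)                        # C_T^k
    c_mix = class_centroids(torch.cat([f_s, f_t]),              # C^k
                            torch.cat([onehot_s, y_t_soft]))
    # Psi: Euclidean distance between same-class centroids.
    align = (c_mix - c_t).norm(dim=1).sum()
    # Lambda: pairwise cosine similarity (power p) between different classes.
    def cos_p(a, b):
        sim = F_nn.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=2)
        sim = sim.clamp(min=1e-8) ** p
        sim.fill_diagonal_(0.0)   # exclude the j == k terms
        return sim.sum()
    sep = cos_p(c_mix, c_mix) + cos_p(c_mix, c_t) + cos_p(c_t, c_t)
    return align + lam * sep
```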

3.5 Pseudo-label smoothing and class correlation loss

In both the subdomain joint distribution matching loss and the transferable semantic alignment loss, pseudo-labels—i.e., the class probability predictions of target domain samples generated by the classifier—are required. Consequently, the quality of the pseudo-labels plays a critical role in determining the model’s overall accuracy. Prior research (Xie et al., 2018) has shown that deep neural networks (DNNs) tend to produce overly confident predictions, which may introduce significant noise during training. To mitigate this issue, pseudo-labels must be softened to yield more reliable probability distributions.

A simple yet effective strategy is temperature scaling (Guo et al., 2017), which adjusts the sharpness of the predicted distributions. Specifically, the likelihood $\hat{Y}_{ij}$ that the $i$th instance belongs to category $j$ is adjusted as in Equation 6.

$$\hat{Y}_{ij} = \frac{\exp\big(\hat{Z}_{ij} / T\big)}{\sum_{j'=1}^{K} \exp\big(\hat{Z}_{ij'} / T\big)} \quad (6)$$

where $\hat{Z}_{ij}$ denotes the raw logits (pre-softmax outputs) of the classifier, and $T$ serves as the temperature coefficient controlling distribution smoothness. By setting $T > 1$, the softmax outputs are smoothed, reducing overconfidence and enhancing the robustness of pseudo-labels. This adjustment alleviates the risk of model collapse caused by incorrect but highly confident predictions and promotes more stable learning.
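Temperature rescaling is a one-line operation; the sketch below uses T = 1.8 as reported in Section 4.1 and shows how it tempers an overconfident prediction (the example logits are illustrative only):

```python
import torch

def soften_pseudo_labels(logits_t, T=1.8):
    """Eq. 6: temperature-rescaled softmax; T > 1 flattens the distribution."""
    return torch.softmax(logits_t / T, dim=1)

# An overconfident logit vector is tempered from ~0.999 to ~0.96.
logits = torch.tensor([[8.0, 1.0, 0.5]])
print(torch.softmax(logits, dim=1).max().item())   # ~0.9985
print(soften_pseudo_labels(logits).max().item())   # ~0.965
```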

To further exploit the class-wise relational structure in the classifier outputs, the class correlation matrix is introduced, inspired by the minimum class confusion (MCC) method (Jin et al., 2020). It is defined in Equation 7.

$$U = \hat{Y}^{\top} \hat{Y} \quad (7)$$

where $\hat{Y} \in \mathbb{R}^{B \times K}$ is the soft pseudo-label matrix of a mini-batch of $B$ target instances, whose $i$th row is the predicted distribution of the $i$th instance. Samples with near-uniform distributions exhibit high uncertainty and contribute limited supervisory value. In contrast, those with pronounced peaks (i.e., clearer predictions) provide more meaningful guidance during training.

To incorporate this reliability into learning, an entropy-based weighting scheme is applied. The prediction uncertainty of each sample is quantified by its entropy and transformed into a weighting factor via a softmax function, as illustrated in Equation 8.

$$E(\hat{y}_i) = -\sum_{j=1}^{K} \hat{y}_{ij} \log(\hat{y}_{ij}), \qquad Q_{ii} = \frac{\exp\big(-E(\hat{y}_i)\big)}{\sum_{m=1}^{B} \exp\big(-E(\hat{y}_m)\big)}, \qquad U = \hat{Y}^{\top} Q\, \hat{Y} \quad (8)$$

where $\hat{y}_i \in \mathbb{R}^{1 \times K}$ denotes the pseudo-label vector of the $i$th sample, and $Q$ is a diagonal matrix. This mechanism ensures that samples with high prediction confidence are assigned greater influence in the learning process, while uncertain samples are down-weighted, thereby improving the robustness of model updates.

In the class correlation matrix, diagonal entries quantify intra-class correlation, while off-diagonal entries reflect inter-class correlation—i.e., the degree of confusion between classes. High intra-class correlation enhances the model’s ability to capture shared characteristics within a class, whereas low inter-class correlation promotes better separation between different classes. To leverage this, a novel class correlation loss is proposed to refine the classifier’s discriminative capacity for target domain samples. It jointly encourages higher correlation among samples of the same class and discourages correlation between samples from different classes, thereby facilitating the learning of more distinct and cohesive feature representations. This objective can be mathematically expressed in Equation 9.

$$\mathcal{L}_{CRL} = \frac{1}{K} \Bigg(\sum_{i=1}^{K} \sum_{j \neq i}^{K} U_{ij} - \mu \sum_{i=1}^{K} U_{ii}\Bigg) \quad (9)$$

where $\sum_{i=1}^{K} \sum_{j \neq i}^{K} U_{ij}$ indicates the overall correlation between different classes, $\sum_{i=1}^{K} U_{ii}$ corresponds to the aggregated correlation within each class, and $\mu$ is a balancing coefficient. Minimizing Equation 9 facilitates improved consistency among same-class samples and clearer discrimination between different classes, ultimately enhancing both pseudo-label precision and target domain classification effectiveness.
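Putting Equations 8 and 9 together, a compact sketch is given below. The default `mu` is a placeholder since the value of $\mu$ is not reported, and the sign convention (softmax over negative entropies so that confident samples dominate) follows the description in the text.

```python
import torch

def crl_loss(y_t_soft, mu=1.0):
    """Sketch of Eqs. 8-9 on a batch of soft pseudo-labels (B, K)."""
    eps = 1e-8
    # Per-sample entropy E(y_i); low entropy means a confident prediction.
    entropy = -(y_t_soft * (y_t_soft + eps).log()).sum(dim=1)
    # Q_ii: softmax of negative entropies over the batch (Eq. 8).
    q = torch.softmax(-entropy, dim=0)
    # Weighted class self-correlation matrix U = Y^T Q Y, shape (K, K).
    U = y_t_soft.t() @ (q.unsqueeze(1) * y_t_soft)
    K = U.size(0)
    inter = U.sum() - U.diag().sum()   # off-diagonal: inter-class confusion
    intra = U.diag().sum()             # diagonal: intra-class correlation
    return (inter - mu * intra) / K    # Eq. 9
```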

Remark. The main difference between our proposed method and MCC (Jin et al., 2020) lies in the construction of the weight matrix. In MCC, the weight $W$ is computed from the entropy of the prediction distribution via a sigmoidal mapping, which reflects the absolute confidence of individual samples. In contrast, our approach defines a weight matrix $Q$ based on the energy function $E(\hat{y}_i)$, followed by a softmax normalization across the mini-batch, as illustrated in Equation 10.

$$Q_{ii} = \frac{\exp\big(-E(\hat{y}_i)\big)}{\sum_{m=1}^{B} \exp\big(-E(\hat{y}_m)\big)} \quad (10)$$

This design highlights the relative confidence differences among samples within a batch. Compared to W, the use of Q provides three advantages: (1) enhanced stability due to the smooth nature of the softmax mapping, (2) better discrimination between high-confidence and low-confidence samples through batch-level normalization, and (3) theoretical consistency with contrastive learning frameworks that also rely on energy-based normalization.

3.6 Objective loss function of TSACC

By aggregating Equations 1, 2, 5, and 9, the unified objective function for TSACC is constructed in Equation 11.

$$\mathcal{L} = \mathcal{L}_{cls} + \alpha \mathcal{L}_{jmmd} + \beta \mathcal{L}_{TST} + \gamma \mathcal{L}_{CRL} \quad (11)$$

where $\alpha$, $\beta$, and $\gamma$ are hyperparameters balancing the importance of each term.

4 Experimental results and analysis

4.1 Algorithm setup and experimental environment

In this work, TSACC adopts ResNet-50 (He et al., 2016) as the shared feature extractor, employing ImageNet-pretrained weights as initialization, followed by model fine-tuning in the training process. For the classifier, a single fully connected layer with 256 inputs and K outputs (K indicates how many distinct classes are present in the classification setting) is used. To obtain class probability distributions, the output is processed using a Softmax function. Training is performed for 200 epochs, and the final network is used for prediction.

Considering that the subdomain joint distribution matching loss and the semantic transfer loss rely on pseudo-labels from the target domain during training, TSACC employs a progressive strategy to mitigate noisy pseudo-labels in the early training stage. Specifically, TSACC introduces a dynamic hyperparameter $\tau = \frac{2}{1 + e^{-\delta\theta}} - 1$, which rises from 0 toward 1 during training, to adjust the balancing hyperparameters $\alpha$ and $\beta$, denoted as $\alpha \leftarrow \alpha\tau$ and $\beta \leftarrow \beta\tau$ respectively. In the experiments, $\delta$ is set to 10, and $\theta$ linearly increases from 0 to 1 throughout training. This gradual adjustment approach is designed to enable the network to effectively capture class-wise relational patterns from the autocorrelation matrix during the initial training phase, thereby improving pseudo-label quality and facilitating the learning of the subsequent subdomain joint distribution matching and semantic transfer modules.

For every experimental setup, the SGD optimizer with momentum 0.9 is applied, alongside the learning rate decay strategy introduced by RevGrad (Ganin and Lempitsky, 2015). Given the high computational cost, rather than employing grid search to determine the best learning rate, TSACC utilizes the adaptive learning rate schedule $\eta_\theta = \frac{\eta_0}{(1 + \sigma\theta)^\omega}$, where $\theta$ linearly increases from 0 to 1, $\eta_0 = 0.01$, $\sigma = 10$, and $\omega = 0.75$. This adaptive learning rate schedule effectively controls computational costs while enhancing model stability. Additionally, the exponent $p$ is assigned a value of 0.1, while the temperature parameter $T$ is fixed at 1.8. Finally, all experiments are conducted under the experimental environment summarized in Table 1.
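For clarity, the two training schedules described above (the ramp-up factor $\tau$ and the learning rate decay) can be transcribed directly; this sketch assumes `theta` is the fraction of training completed:

```python
import math

def ramp_up(theta, delta=10.0):
    """Dynamic weight tau = 2 / (1 + exp(-delta * theta)) - 1,
    rising from 0 at theta = 0 to ~1 at theta = 1."""
    return 2.0 / (1.0 + math.exp(-delta * theta)) - 1.0

def lr_schedule(theta, eta0=0.01, sigma=10.0, omega=0.75):
    """RevGrad-style decay: eta_theta = eta0 / (1 + sigma * theta)^omega."""
    return eta0 / (1.0 + sigma * theta) ** omega
```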


Table 1. Experimental environment of TSACC algorithm.

4.2 Experimental design and results analysis

The effectiveness of TSACC is demonstrated through empirical evaluation on a set of well-established public datasets commonly utilized in deep domain adaptation research. The datasets used in this section cover cross-domain object recognition (ImageCLEF-DA, Office-31, Office-Home) and cross-domain handwritten digit recognition (USPS, MNIST, and SVHN). The main baseline comparison methods include DANN (Ajakan et al., 2014), MCD (Saito et al., 2018), MSTN (Xie et al., 2018), CDAN (Long et al., 2018), GPDA (Kim et al., 2019), SWD (Lee et al., 2019), DFA-ENT (Wang et al., 2021), DSAN (Zhu et al., 2020), MCC (Jin et al., 2020), DCP (Chen et al., 2021), SCDA (Li et al., 2021), DALN (Chen et al., 2022), BIWAA (Westfechtel et al., 2023), FACT (Schrod et al., 2023), DAMP (Du et al., 2024), and PDA (Bai et al., 2024), among others.

4.2.1 Office-31 dataset

As a representative benchmark, Office-31 has been widely used in numerous studies focused on domain adaptation. It consists of images from real office environments spanning 31 different categories, with a total of 4,110 images. These samples originate from three separate domains, namely Amazon (A), Webcam (W), and DSLR (D). Sample images from the dataset are shown in Figure 2. While Amazon domain images are obtained online from amazon.com, the Webcam and DSLR domains include photos taken with a webcam and a DSLR camera in different physical environments.


Figure 2. Sample images from the Office-31 dataset.

For an in-depth performance comparison among domain adaptation approaches, the Office-31 dataset’s three domains are cyclically designated as source and target domains, yielding six unique transfer tasks: A → W, D → W, W → D, A → D, D → A, and W → A. A mini-batch size of 32 is adopted during training. The learning rate is initialized to 0.001 for the shared feature encoder and 0.01 for the classification module. Detailed outcomes are presented in Table 2.


Table 2. Recognition accuracy (%) of various algorithms on the office-31 dataset.

4.2.2 Office-home dataset

As a frequently used benchmark in domain adaptation, the Office-Home dataset is characterized by its higher level of difficulty and complexity. It not only contains real-world images but also includes diverse styles such as illustrations. The dataset covers 65 categories, with a total of 15,588 images. Based on the source of image collection, the image data are partitioned into four domains: Artistic (A), Clipart (C), Product (P), and Real-World (R). Figure 3 displays representative samples from each domain. To comprehensively evaluate the effectiveness of various domain adaptation methods, the Office-Home dataset’s four domains are pairwise combined by alternately designating them as source and target domains, forming 12 different domain adaptation tasks (A → C, A → P, A → R, C → A, C → P, C → R, P → A, P → C, P → R, R → A, R → C, R → P) for evaluation. A batch size of 96 is adopted during training, with initial learning rates of 0.003 and 0.03 assigned to the shared feature extractor and the classifier. Table 3 presents the corresponding experimental results.


Figure 3. Sample images from the Office-Home dataset.


Table 3. Recognition accuracy (%) of various algorithms on the office-home dataset.

4.2.3 ImageCLEF-DA dataset

The ImageCLEF-DA dataset, designed for the ImageCLEF2014 domain adaptation challenge, consists entirely of authentic natural images. It is divided into three domains: ImageNet ILSVRC2012 (I), Caltech-256 (C), and Pascal VOC2012 (P). Each domain contains 12 categories with 600 images, totaling 1800 images across all domains. Representative samples are shown in Figure 4.


Figure 4. Sample images from the ImageCLEF-DA dataset.

To comprehensively evaluate the effectiveness of various domain adaptation methods, the three domains of ImageCLEF-DA were alternately used as source and target domains, forming six domain adaptation tasks (I → P, P → I, I → C, C → I, C → P, P → C). Training was conducted with a batch size of 32, and initial learning rates were set to 0.001 and 0.01 for the shared feature encoder and classifier, respectively. The classification accuracy metrics achieved by several algorithms are summarized in Table 4.


Table 4. Recognition accuracy (%) of various algorithms on the ImageCLEF-DA dataset.

4.2.4 MNIST-USPS-SVHN dataset

MNIST, USPS, and SVHN are three widely used handwritten digit datasets, each containing images of digits from 0 to 9, covering 10 classes in total. The MNIST dataset is composed of 28 × 28 grayscale images of digits, USPS consists of 16 × 16 grayscale digit images, and SVHN includes 32 × 32 color images of digits. Sample images from these datasets are shown in Figure 5.


Figure 5. Sample images from the MNIST-USPS-SVHN datasets.

To evaluate the performance of various methods, three domain transfer tasks were set up: MNIST → USPS, USPS → MNIST, and SVHN → MNIST. Throughout training, a batch size of 96 was utilized, initializing the learning rates at 0.003 for the shared feature extractor and 0.03 for the classifier. The performance of the evaluated methods in terms of classification accuracy on these tasks is presented in Table 5.


Table 5. Recognition accuracy (%) of various algorithms on the MNIST-USPS-SVHN datasets.

The proposed TSACC algorithm was systematically evaluated on multiple benchmark datasets for cross-domain object recognition and handwritten digit classification, including Office-31, Office-Home, ImageCLEF-DA, as well as MNIST, USPS, and SVHN. Experimental results demonstrate that TSACC achieves superior performance across nearly all tasks, with average classification accuracy matching or exceeding state-of-the-art methods. In the first task group, TSACC exhibited slightly lower performance than PDA; however, its advantages became pronounced on larger-scale or more complex datasets, such as Office-Home and ImageCLEF-DA, where its class-center alignment and cross-domain adaptation strategies effectively captured domain-invariant features, leading to significant improvements in generalization. The experimental findings indicate that TSACC consistently delivers strong results in both cross-domain object recognition and handwritten digit recognition tasks, highlighting its applicability across diverse data types and task scenarios. Furthermore, TSACC maintained leading average accuracy across all evaluated datasets, confirming its effectiveness in capturing domain-invariant representations and enhancing knowledge transfer. Overall, TSACC not only outperforms existing adversarial and non-adversarial approaches but also provides a practical framework for efficient cross-domain adaptation without relying on complex adversarial training.

4.2.5 Algorithm convergence analysis

The convergence behavior of TSACC was assessed through experiments on two tasks: A → W from Office-31 and A → C from Office-Home. The variation of the total objective loss over the entire training process was monitored, as shown in Figure 6. The loss values stabilize after approximately 80 epochs in both tasks, indicating that the TSACC algorithm converges rapidly and exhibits favorable convergence properties.


Figure 6. Convergence curves of the TSACC algorithm on different tasks. (a) A → W, (b) A → C.

4.2.6 Feature visualization experiment

To provide an intuitive assessment of TSACC’s feature representation capacity, t-SNE was employed to project the extracted features into a low-dimensional space for visualization. Task D → A was selected as an example, where the feature distributions of ResNet-50, DANN, MCC, and the proposed TSACC are compared (Figure 7). The results show that TSACC yields markedly stronger intra-class compactness, with samples of the same category clustering more tightly in the low-dimensional space. In contrast, the other three methods exhibit greater intra-class dispersion and weaker consistency. Inter-class separability is also more pronounced under TSACC, with sharper category boundaries that reduce cross-class confusion.


Figure 7. t-SNE visualization of feature distributions for D → A. (a) ResNet-50, (b) DANN, (c) MCC and (d) TSACC.

Moreover, source–target alignment is achieved more accurately with TSACC: samples from the same category in both domains overlap more closely in the feature space, indicating more effective preservation and alignment of semantic structures across domains. Compared with competing methods, the representations learned by TSACC strike a better balance between discriminability and domain invariance, thereby facilitating improved classification performance on the target domain. These visual observations are consistent with the quantitative results on classification accuracy and domain discrepancy, confirming that TSACC produces more compact intra-class representations, clearer inter-class separation, and more precise source–target alignment. This demonstrates that the proposed semantic alignment and class-correlation constraints enhance both discriminability and generalization without introducing additional structural complexity.

4.2.7 Parameter sensitivity analysis

The TSACC objective function contains three balancing hyperparameters, $\alpha$, $\beta$, and $\gamma$. To assess how the model responds to changes in these hyperparameters, their values were systematically varied over the set {0.2, 0.4, 0.6, 0.8, 1.0}. This evaluation was based on the A → W scenario in the Office-31 dataset and the C → P scenario in the ImageCLEF-DA dataset. The results, illustrated in Figure 8, show that the model exhibits low sensitivity to hyperparameters $\alpha$ and $\beta$, maintaining stable performance across their ranges. Although some sensitivity to hyperparameter $\gamma$ was observed, the overall fluctuation was limited and could be mitigated by fine-tuning.


Figure 8. Classification accuracy of TSACC with varying balancing hyperparameters.

4.2.8 Confusion matrix visualization

Confusion matrix visualization provides an intuitive assessment of the model’s classification performance across different categories. The domain adaptation task from ImageCLEF-DA involving transfer from domain I to C was used for experimental validation, with results presented in Figure 9. The diagonal elements represent the proportion of correctly classified samples per class, with darker colors indicating higher accuracy. Compared to the DCP algorithm, TSACC exhibits substantially reduced inter-class confusion, providing evidence that the proposed technique can accurately extract class-level association features. This enhancement improves the classifier’s discriminative ability for target domain category features, thereby boosting recognition performance on target domain samples.


Figure 9. Confusion matrices of different algorithms on the I → C task of the ImageCLEF-DA dataset. (a) DCP, (b) TSACC.

4.2.9 Computational efficiency and resource consumption analysis

Under the experimental environment summarized in Table 1, the training and inference resource consumption of the TSACC algorithm was evaluated across representative cross-domain tasks, and the results are presented in Table 6. For the Office-31 (A→W) task, the training time per epoch was 10.4 s, with a peak GPU memory usage of 6.3 GB. In the Office-Home (A→C) task, which involves a larger dataset, each epoch required 30.2 s of training and 12.5 GB of GPU memory. For the smaller ImageCLEF-DA (I→C) task, the per-epoch training time was 5.6 s, with 5.7 GB of memory usage. The SVHN→MNIST task, involving handwritten digits, required 58.4 s per epoch and 12.1 GB of GPU memory. The average inference time per sample was approximately 2.2 ms across all tasks. These results indicate that the TSACC algorithm maintains strong model discriminative capability while achieving an average per-sample inference latency of ~2 ms, satisfying real-time requirements. Overall, the algorithm demonstrates fast training, moderate memory usage, and high inference efficiency on modern hardware, suggesting its feasibility for deployment in resource-constrained environments.


Table 6. Resource consumption of the TSACC algorithm.

4.2.10 Ablation study

To rigorously validate the effectiveness of each module in the proposed TSACC framework, ablation experiments were conducted on four benchmark datasets: Office-31, Office-Home, ImageCLEF-DA, and MNIST–USPS–SVHN. The results are reported in Tables 7–10. Four ablation settings were considered: Method 1 retains only the source classification loss and removes all other components; Method 2 excludes the subdomain joint distribution alignment loss $\mathcal{L}_{jmmd}$; Method 3 excludes the transferable semantic consistency loss $\mathcal{L}_{TST}$; and Method 4 excludes the class correlation loss $\mathcal{L}_{CRL}$.


Table 7. Ablation study results of TSACC on the office-31 dataset (%).


Table 8. Ablation study results of TSACC on the office-home dataset (%).


Table 9. Ablation study results of TSACC on the ImageCLEF-DA dataset (%).


Table 10. Ablation study results of TSACC on the MNIST-USPS-SVHN dataset (%).

Across all transfer tasks, the complete TSACC model (including all loss terms) consistently achieved the highest classification accuracy. For instance, on Office-31, TSACC reached an average accuracy of 90.7%, outperforming Method 1 (85.4%) and the reduced variants Method 2 (88.2%), Method 3 (88.8%), and Method 4 (88.7%). Similar trends were observed on other datasets: on Office-Home, TSACC achieved 72.9% compared to 66.5% (Method 1), 69.2% (Method 2), 71.5% (Method 3), and 70.7% (Method 4); on ImageCLEF-DA, TSACC reached 90.7%, surpassing 85.1% (Method 1) and 88.2%/89.5%/89.1% (Methods 2–4); on the digit benchmark, TSACC achieved 98.3%, clearly exceeding 92.7% (Method 1) and 94.6%/96.9%/96.5% (Methods 2–4).

These results highlight that the integrated TSACC model consistently delivers superior performance, whereas removing any advanced loss term leads to noticeable degradation. Excluding the subdomain joint distribution alignment loss reduced accuracy by ~2–3 percentage points, while removing either the transferable semantic consistency loss or the class correlation loss caused declines of ~1–2 points. The poorest performance was observed in Method 1, where only the source classification loss was retained, resulting in a 5–6 point drop relative to the full model. This demonstrates that source-domain supervision alone is insufficient for effective target-domain adaptation.

Further analysis reveals the complementary roles of the different modules. The subdomain joint distribution alignment loss plays a critical role in fine-grained alignment of source–target feature distributions, as evidenced by the sharp performance decline when it was removed. The transferable semantic consistency loss facilitates the learning of more discriminative shared representations in high-dimensional space, contributing directly to improved target-domain classification. The class correlation loss leverages intra- and inter-class relationships to enhance discriminability, with performance decreases observed upon its removal. When all three loss terms operate jointly, the model achieves its best results, confirming their mutual complementarity. Overall, the ablation study provides strong evidence for the rationality and necessity of the TSACC design: each module yields tangible performance gains, and their integration enables the model to achieve state-of-the-art domain adaptation accuracy across diverse transfer scenarios.

5 Conclusion

The proposed TSACC method avoids the complexity of adversarial training while achieving rapid and stable convergence. In this approach, label categories are used to partition different subdomains, and a novel transfer semantic loss is introduced to deeply capture the intrinsic semantic structures within both source and target domains. Additionally, TSACC designs a new joint distribution matching loss that simultaneously reduces distribution discrepancies among subdomains of the same class and enlarges the differences between subdomains of different classes. To uphold the reliability of pseudo-label assignments and ensure accurate model optimization, temperature scaling is applied for pseudo-label smoothing. Finally, by incorporating the concept of a class autocorrelation matrix, the proposed class correlation loss aims to boost the model’s proficiency in identifying intra-class feature consistency within the target domain, as well as to sharpen decision boundaries between different classes.

Despite the distributional differences between the source and target domains, this work presumes a common label set. Nevertheless, in real-world applications, this assumption may fail because the label spaces of the target and source domains might not align. Future work will focus on integrating feature selection techniques to extend the proposed method to broader domain adaptation settings.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://openxlab.org.cn/datasets/OpenDataLab/Office-31/tree/main.

Author contributions

QH: Conceptualization, Writing – original draft. JL: Methodology, Writing – review & editing. JZ: Formal analysis, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

JL is employed by the company China Mobile Communications Group Guangdong Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.


Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Ajakan, H., Germain, P., and Larochelle, H. (2014). Domain-adversarial neural networks. arXiv preprint arXiv:1412.4446.

Bai, S., Zhang, M., Zhou, W., Huang, S., Luan, Z., Wang, D., et al. (2024). Prompt-based distribution alignment for unsupervised domain adaptation. Proc. AAAI Conf. Artif. Intell. 38, 729–737. doi: 10.1609/aaai.v38i2.27830

Chen, T., Baktashmotlagh, M., and Wang, Z. (2023). "Center-aware adversarial augmentation for single domain generalization," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 4157–4165.

Chen, L., Chen, H., and Wei, Z. (2022). "Reusing the task-specific classifier as a discriminator: discriminator-free adversarial domain adaptation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7181–7190.

Chen, C., Fu, Z., Chen, Z., Jin, S., Cheng, Z., Jin, X., et al. (2020). HoMM: higher-order moment matching for unsupervised domain adaptation. Proc. AAAI Conf. Artif. Intell. 34, 3422–3429. doi: 10.1609/aaai.v34i04.5745

Chen, J., Gao, Z., and Wu, X. (2023). "Meta-causal learning for single domain generalization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7683–7692.

Chen, H., Li, L., and Chen, J. (2021). Unsupervised domain adaptation via double classifiers based on high confidence pseudo label. arXiv preprint arXiv:2105.04729.

Chen, J., Zhang, Z., and Li, L. (2024). Contrastive adversarial training for unsupervised domain adaptation. arXiv preprint arXiv:2407.12782.

Cheng, S., Gokhale, T., and Yang, Y. (2023). "Adversarial Bayesian augmentation for single-source domain generalization," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 11400–11410.

Cheng, Z., Wang, S., Yang, D., Qi, J., Xiao, M., and Yan, C. (2024). Deep joint semantic adaptation network for multi-source unsupervised domain adaptation. Pattern Recogn. 151:110409. doi: 10.1016/j.patcog.2024.110409

Dayal, A., Vimal, K. B., and Cenkeramaddi, L. R. (2023). MADG: margin-based adversarial learning for domain generalization. Adv. Neural Inf. Proces. Syst. 36, 58938–58952.

Du, Z., Li, X., and Li, F. (2024). "Domain-agnostic mutual prompting for unsupervised domain adaptation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 23375–23384.

Fan, X., Wang, Q., and Ke, J. (2021). "Adversarially adaptive normalization for single domain generalization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8208–8217.

Ganin, Y., and Lempitsky, V. (2015). "Unsupervised domain adaptation by backpropagation," in International Conference on Machine Learning (PMLR), 1180–1189.

Gao, C., Cai, G., Jiang, X., and Zheng, F. (2022). Conditional feature learning based transformer for text-based person search. IEEE Trans. Image Process. 31, 6097–6108.

Ge, P., Ren, C. X., Xu, X. L., and Yan, H. (2023). Unsupervised domain adaptation via deep conditional adaptation network. Pattern Recogn. 134:109088. doi: 10.1016/j.patcog.2022.109088

Gilo, O., Mathew, J., Mondal, S., and Sandoniya, R. K. (2024). Subdomain adaptation via correlation alignment with entropy minimization for unsupervised domain adaptation. Pattern Anal. Appl. 27:13. doi: 10.1007/s10044-024-01232-9

Goodfellow, I., Pouget-Abadie, J., and Mirza, M. (2014). Generative adversarial nets. Adv. Neural Inf. Proces. Syst. 27.

Guo, C., Pleiss, G., and Sun, Y. (2017). "On calibration of modern neural networks," in International Conference on Machine Learning (PMLR), 1321–1330.

He, K., Zhang, X., and Ren, S. (2016). "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

Iwasawa, Y., and Matsuo, Y. (2021). Test-time classifier adjustment module for model-agnostic domain generalization. Adv. Neural Inf. Proces. Syst. 34, 2427–2440.

Jin, Y., Wang, X., and Long, M. (2020). "Minimum class confusion for versatile domain adaptation," in Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI (Springer International Publishing), 464–480.

Khoee, A. G., Yu, Y., and Feldt, R. (2024). Domain generalization through meta-learning: a survey. Artif. Intell. Rev. 57:285. doi: 10.1007/s10462-024-10922-z

Kim, M., Sahu, P., and Gholami, B. (2019). "Unsupervised visual domain adaptation: a deep max-margin Gaussian process approach," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4380–4390.

Lee, C. Y., Batra, T., and Baig, M. H. (2019). "Sliced Wasserstein discrepancy for unsupervised domain adaptation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10285–10295.

Li, H., Pan, S. J., and Wang, S. (2018). "Domain generalization with adversarial feature learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5400–5409.

Li, S., Xie, M., and Lv, F. (2021). "Semantic concentration for domain adaptation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 9102–9111.

Li, J., Zhu, L., and Du, Z. (2024). "Bi-classifier adversarial learning-based unsupervised domain adaptation," in Unsupervised Domain Adaptation: Recent Advances and Future Perspectives (Singapore: Springer Nature Singapore), 69–104.

Liu, X., Yoo, C., Xing, F., Oh, H., el Fakhri, G., Kang, J. W., et al. (2022). Deep unsupervised domain adaptation: a review of recent advances and perspectives. APSIPA Trans. Signal Inf. Process. 11. doi: 10.1561/116.00000192

Long, M., Cao, Z., and Wang, J. (2018). Conditional adversarial domain adaptation. Adv. Neural Inf. Proces. Syst. 31.

Lü, S., Zhang, X., Li, Z., Li, J., and Kang, M. (2025). Bi-classifier with neighborhood aggregation for unsupervised domain adaptation. Inf. Sci. 718:122399. doi: 10.1016/j.ins.2025.122399

Mei, X., Liu, Z., Robson, P. M., Marinelli, B., Huang, M., Doshi, A., et al. (2022). RadImageNet: an open radiologic deep learning research dataset for effective transfer learning. Radiol. Artif. Intell. 4:e210315. doi: 10.1148/ryai.210315

Mirza, M., and Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.

Mohammed, A., and Kora, R. (2023). A comprehensive review on ensemble deep learning: opportunities and challenges. J. King Saud Univ. Comput. Inf. Sci. 35, 757–774. doi: 10.1016/j.jksuci.2023.01.014

Niu, Z., Zhong, G., and Yu, H. (2021). A review on the attention mechanism of deep learning. Neurocomputing 452, 48–62. doi: 10.1016/j.neucom.2021.03.091

Qin, X., Song, X., and Jiang, S. (2023). "Bi-level meta-learning for few-shot domain generalization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 15900–15910.

Saito, K., Watanabe, K., and Ushiku, Y. (2018). "Maximum classifier discrepancy for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3723–3732.

Schrod, S., Lippl, J., and Schäfer, A. (2023). FACT: federated adversarial cross training. arXiv preprint arXiv:2306.00607.

Singhal, P., Walambe, R., Ramanna, S., and Kotecha, K. (2023). Domain adaptation: challenges, methods, datasets, and applications. IEEE Access 11, 6973–7020. doi: 10.1109/ACCESS.2023.3237025

Tian, Q., Zhao, C., and Shao, M. (2025). "MLDGG: meta-learning for domain generalization on graphs," in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 1361–1372.

Wang, J., Chen, J., Lin, J., Sigal, L., and de Silva, C. W. (2021). Discriminative feature alignment: improving transferability of unsupervised domain adaptation by Gaussian-guided latent alignment. Pattern Recogn. 116:107943. doi: 10.1016/j.patcog.2021.107943

Wang, D., Shelhamer, E., and Liu, S. (2020). Tent: fully test-time adaptation by entropy minimization. arXiv preprint arXiv:2006.10726.

Wang, H., Song, Y., Yang, H., and Liu, Z. (2025). Generalized Koopman neural operator for data-driven modelling of electric railway pantograph-catenary systems. IEEE Trans. Transp. Electrific. doi: 10.1109/TTE.2025.3609347

Westfechtel, T., Yeh, H. W., and Meng, Q. (2023). "Backprop induced feature weighting for adversarial domain adaptation with iterative label distribution alignment," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 392–401.

Xanthopoulos, P., Pardalos, P. M., and Trafalis, T. B. (2013). "Linear discriminant analysis," in Robust Data Mining, 27–33. doi: 10.1007/978-1-4419-9878-1_4

Xie, S., Zheng, Z., and Chen, L. (2018). "Learning semantic representations for unsupervised domain adaptation," in International Conference on Machine Learning (PMLR), 5423–5432.

Yamada, K. D., Lin, F., and Nakamura, T. (2021). Developing a novel recurrent neural network architecture with fewer parameters and good learning performance. Interdiscip. Inf. Sci. 27, 25–40.

Yan, J., Cheng, Y., Zhang, F., Li, M., Zhou, N., Jin, B., et al. (2025). Research on multimodal techniques for arc detection in railway systems with limited data. Struct. Health Monit. doi: 10.1177/14759217251336797

Yang, H., Liu, Z., Ma, N., Wang, X., Liu, W., Hui, W., et al. (2025). CSRM-MIM: a self-supervised pre-training method for detecting catenary support components in electrified railways. IEEE Trans. Transp. Electrific. doi: 10.1109/TTE.2025.3562604

Yu, Z., and Wang, P. (2024). "Capan: class-aware prototypical adversarial networks for unsupervised domain adaptation," in 2024 IEEE International Conference on Multimedia and Expo (ICME) (IEEE), 1–6.

Zhang, Y., Deng, B., and Jia, K. (2020). "Label propagation with augmented anchors: a simple semi-supervised learning baseline for unsupervised domain adaptation," in European Conference on Computer Vision (Cham: Springer International Publishing), 781–797.

Zhang, W., and Wu, D. (2020). "Discriminative joint probability maximum mean discrepancy (DJP-MMD) for domain adaptation," in 2020 International Joint Conference on Neural Networks (IJCNN) (IEEE), 1–8.

Zhou, K., Liu, Z., Qiao, Y., Xiang, T., and Loy, C. C. (2022). Domain generalization: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 1–20. doi: 10.1109/TPAMI.2022.3195549

Zhu, T., Zheng, Y., and Pu, J. (2024). "Class discriminative maximum mean and covariance discrepancy for unsupervised domain adaptation," in 2024 9th International Conference on Signal and Image Processing (ICSIP) (IEEE), 226–230.

Zhu, Y., Zhuang, F., Wang, J., Ke, G., Chen, J., Bian, J., et al. (2020). Deep subdomain adaptation network for image classification. IEEE Trans. Neural Netw. Learn. Syst. 32, 1713–1722. doi: 10.1109/TNNLS.2020.2988928

Keywords: joint subdomain distribution alignment, transferable semantic alignment loss, class correlation-driven pseudo-label optimization, intra-class consistency, inter-class discriminability

Citation: Han Q, Lao J and Zhang J (2026) Subdomain adaptation method based on transferable semantic alignment and class correlation. Front. Neurorobot. 19:1665528. doi: 10.3389/fnbot.2025.1665528

Received: 14 July 2025; Accepted: 21 October 2025;
Published: 05 January 2026.

Edited by:

Xianmin Wang, Guangzhou University, China

Reviewed by:

Çağlar Uyulan, Izmir Katip Celebi University, Türkiye
Guozhao Kou, Vocational Flight College of Mianyang, China
Changlin Chen, University of Science and Technology of China, China

Copyright © 2026 Han, Lao and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jinyong Zhang, 2005010018@mmpt.edu.cn
