Abstract
Contrastive learning has attracted much interest in recent years for its ability to train without labeled data. An important factor in its success is the loss function, which guides the search for prominent features that separate the positive and negative classes. The widely used triplet loss pulls a positive instance toward an anchor instance while pushing a negative instance away from it, where the positive instance is often an augmented version of the anchor. To improve the performance of contrastive learning in automated Clock-Drawing Test (CDT) grading, this paper proposes a more comprehensive triplet loss function that aims not only to keep the distance between the anchor and a positive instance small and the distance between the anchor and a negative instance large, but also to keep the distance between the positive and negative instances large. Experimental results show that the improved loss function significantly improves the model’s accuracy, precision, recall, and F1-score by 3–5% on both the CIFAR-10 and CDT datasets, providing a new method for improving the accuracy of automatic CDT scoring and the early detection of cognitive impairments.
1 Introduction
Alzheimer’s disease (AD) is a very common neurodegenerative disease. Its main symptom is the gradual decline of cognitive function, which seriously affects patients’ quality of life and increases the burden on their families. According to data from the World Health Organization, there are approximately 50 million AD patients globally, and this number is expected to increase to 139 million by 2050 (World Health Organization, 2024). Mild cognitive impairment (MCI) is a transitional state between normal aging and dementia (especially AD) and an early stage of AD, characterized by abnormalities in memory, executive function, and language ability that do not yet reach the level of dementia, representing a state of brain degeneration. Some research indicates that approximately 10–15% of MCI patients will progress to AD each year (Chen Y. R. et al., 2022). Therefore, early detection of and early intervention for MCI and AD are of great significance for delaying disease progression and improving patients’ quality of life. In recent years, with the rapid development of artificial intelligence technologies such as deep learning and big data analysis, image recognition technology has received significant attention in the medical field due to its non-invasiveness and low cost. Compared with traditional cognitive assessment methods, image analysis can objectively and accurately capture subtle changes in patients’ drawing ability and spatial cognition, such as drawing accuracy, detail handling, spatial layout, and logical structure. The Clock-Drawing Test (CDT), a commonly used and convenient tool in cognitive assessment, can effectively reflect the executive and visuospatial functions of subjects, providing important references for the preliminary evaluation of cognitive function.
Research on the early detection of AD and mild cognitive impairment based on image recognition technology and clinical diagnosis therefore has important theoretical and practical significance. It can provide clinicians with a reliable screening method to help them detect MCI and AD patients earlier, thereby gaining more treatment time for patients. Moreover, it can promote the application of image recognition technology in the medical field, facilitate the cross-integration of artificial intelligence and medicine, and lay a solid technical foundation for the future development of precision medicine. In addition, it can serve as a reference for researchers in related fields and promote the continuous development and improvement of early detection technologies for AD and mild cognitive impairment.
Contrastive learning, a recently developed self-supervised learning method, has achieved good results in Natural Language Processing (NLP) and speech recognition. By minimizing the distance between positive sample pairs and the anchor, and maximizing the distance between negative sample pairs and the anchor, contrastive learning can enhance a model’s representation learning ability, demonstrating especially strong performance in low-resource and weakly supervised scenarios (van den Oord et al., 2018). A sufficiently large amount of labeled training data is essential to the success of deep learning. However, in many cases either the training set is small or the number of labeled examples is small. Without label information, unsupervised learning approaches may provide satisfactory performance, assuming they are appropriately trained. As an unsupervised learning technique, contrastive learning has found wide application in scenarios where labeled data is insufficient or absent. In an unsupervised setup, an instance, such as an image, may be treated as an anchor and augmented to create positive instances. Contrastive learning then samples the anchor, a positive instance, and a negative instance to look for features that best separate the positive and negative classes. The positive instance does not need to be independent of the anchor; it can be an augmented version of the anchor, such as a translated, scaled, rotated, clipped, or color-tilted copy. The basic idea of contrastive learning is to pull positive instances, which generally include the anchor, close to each other while pushing negative instances far away from the anchor (see Figure 1). This goal manifests as a loss function that, in binary classification, takes the form of a triplet loss:
$$L(\theta) = \|f_\theta(x) - f_\theta(x^+)\|_2^2 - \|f_\theta(x) - f_\theta(x^-)\|_2^2 \tag{1}$$

where x is the anchor, x⁺ is a positive instance, x⁻ is a negative instance, and θ parameterizes a neural network f. Sometimes, a small positive number is added to the second term to represent a margin in the loss function. In Equation 1 the distance between two instances is Euclidean, though other forms, such as cosine similarity, may be used. While the above definition of the loss function is for binary classification, it can be extended to N-ary classification in the form of a softmax function. It is worth noting that there are other loss definitions for contrastive learning. Though of different forms, the role of contrastive loss is to encourage the clustering of positive instances while maintaining a large distance between the positive and negative classes.
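As a concrete illustration, here is a minimal NumPy sketch of this triplet-style loss, with the optional margin and a hinge at zero (a common practical variant):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.0):
    """Triplet-style contrastive loss on embedding vectors: squared
    anchor-positive distance minus squared anchor-negative distance,
    with an optional margin and a hinge at zero."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the positive lies closer to the anchor than the negative,
# so the hinged loss is zero even with a margin of 1.
a = np.array([0.0, 0.0])
p = np.array([1.0, 0.0])
n = np.array([3.0, 0.0])
print(triplet_loss(a, p, n, margin=1.0))  # 1 - 9 + 1 = -7 -> hinged to 0.0
```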
Figure 1
In either binary or N-ary classification, the loss function for contrastive learning essentially leads a learning model to search for features and map the positive and negative classes into a feature embedding space such that the two classes are well separated, with the assumption that instances in the positive class should be clustered together and be positioned far away from the negative classes in the embedding space. Sometimes, another neural network is applied to the embedding space to further separate the classes.
The contribution of this work is a new loss function that more comprehensively reflects the principle of contrastive learning, namely, to reward the clustering of intra-class instances and push inter-class instances apart. We argue that the conventional definition of contrastive loss does not fully capture this principle, as it fails to consider the distance between an (augmented) positive instance and a negative instance. The new loss term can be viewed either as a complementary metric to the conventional loss definition or as a regularization term that encourages the cluster of positive instances to maintain a large distance from the negative class.
2 Related work
To contextualize our research, we review relevant studies in two areas: contrastive learning loss function optimization and automated CDT scoring. In contrastive learning, several factors are important for its success. One is the loss function, which plays a key role in determining the success of contrastive learning (Zhang et al., 2024). Both the distance metric in the loss definition and the temperature factor in calculating the loss are critical to the performance of learning (Sobal et al., 2024). The availability of a sufficiently large number of hard negatives is also key to the success of contrastive learning (Kalantidis et al., 2020). Besides the factors that may contribute positively to contrastive learning, some factors may cause a decrease in its performance. For example, Yeh et al. (2022) showed that there exists a negative–positive-coupling effect in contrastive learning, and this effect has an undesirable impact on the performance of contrastive learning. Sampling bias and batch size may also affect the performance of contrastive learning (Chen C. et al., 2022; Wu et al., 2023).
Though the idea behind contrastive learning is straightforward, how contrastive learning works is not fully understood, especially from the perspective of how different views of instances help set apart classes (Purushwalkam and Gupta, 2020). Clearly, given instances of two or more classes, there are optimal or less optimal views of the instances that may facilitate the categorization of new instances in a test. The views, to a large extent, are determined by the contrastive loss as the loss definition determines the contrastiveness between classes (Tian, 2022). Wang and Liu (2021) investigated the behavior of contrastive loss and showed that contrastive loss is a hardness-aware function and is key to the success of contrastive learning.
Research on the CDT (World Health Organization, 2024) has also made great progress. Some research teams have upgraded the traditional pen-and-paper CDT to a “digital CDT (dCDT)” collected via tablets or digital pens. Using techniques such as convolutional neural networks and random forests, automatic scoring has been achieved, with AUC results above 90%. Moreover, contrastive learning has achieved initial results in the recognition of dementia speech and images. A team from the University of Science and Technology of China reported at ICASSP 2025 that by using large language models to fuse ASR (Automatic Speech Recognition) transcribed texts and enhancing semantic representation through contrastive learning, AD detection accuracy on the DementiaBank dataset can be increased to over 87% (Si et al., 2024; Tóth et al., 2018), and the robustness of the model to ASR “error” noise was emphasized (Fang et al., 2024). However, current domestic research still faces challenges such as a lack of unified data standards, insufficient cross-language verification, and the need to strengthen model generalization and interpretability.
The US NHATS project released a deep-learning automatic scoring pipeline, training an end-to-end convolutional network on more than 40,000 hand-drawn clock images, with both the accuracy rate and the expert agreement rate exceeding 93% (Chen et al., 2020; Zhang, 2023). Souillard-Mandar et al. (2020) used the time-series features of a digital stylus and the SLIM model, achieving an AUC of approximately 0.78–0.83 in multi-category dementia classification with a highly interpretable model (World Health Organization, 2024). The latest contrastive learning method generates semantically “corrupted” samples by randomly deleting content and uses a BERT encoder to compare positive and negative samples, which increases the AUC on the ADReSS speech corpus by 5–7 percentage points. As an assessment tool focusing on specific cognitive dimensions, the CDT needs to be used in conjunction with other cognitive assessment tools to achieve a comprehensive evaluation of cognitive function.
In terms of methodology, international research has also made progress in causal-inference methods such as propensity score matching (Hu et al., 2024; Ho et al., 2007). These methods reduce model dependence and bias, improving the accuracy of causal effect estimation in observational studies. For example, some research has found that non-parametric matching preprocessing can effectively reduce a model’s sensitivity to parameter settings and bias, enhancing the robustness of causal inference.
Contrastive learning achieves unsupervised feature pre-training by constructing a loss constraint of “attracting positive samples and separating negative samples,” effectively addressing the issue of scarce annotations in clinical images. Its applications in pathological sections, fundus images, and other fields provide a reference for CDT analysis. In the field of general clinical images, Chen et al. (2020) proposed the SimCLR framework. By generating different views of samples through data augmentations such as random cropping and color jittering, and pre-training ResNet-50 with contrastive loss, they achieved an AUC of 0.94 for pneumonia detection on the ChestX-ray14 dataset, demonstrating that contrastive learning can learn clinically meaningful features (Chen et al., 2020). This framework abandons the reliance of traditional contrastive learning on memory banks and achieves performance breakthroughs through strong data augmentation and a non-linear projection head alone, providing a simplified paradigm for subsequent contrastive learning research in clinical images. For small-sample scenarios, Li et al. (2023) constructed a dynamic negative-sample queue based on the MoCo framework. On a dermoscopic image dataset containing only 300 annotated cases, the accuracy of melanoma recognition was increased to 91.2%, 9.3 percentage points higher than that of fully supervised models, verifying the advantages of contrastive learning in clinical small-sample tasks (He et al., 2020). The MoCo framework balances the number of negative samples and feature consistency through a momentum encoder and queue mechanism, and its design ideas are of great reference value for this study’s handling of the grade continuity of CDT samples. In cognitive impairment-related image analysis, Garcia et al.
(2024) applied contrastive learning to CDT image preprocessing for the first time: they pre-trained unannotated CDT images (n = 5,000) using the MoCo framework, then fine-tuned the classification head with a small number of annotated samples (n = 300), increasing the scoring accuracy by 6.7% compared to direct training. However, this study did not optimize the loss function for the “grade continuity” feature of CDT, and there is still room for improvement in the performance of distinguishing adjacent grades (Garcia et al., 2024).
Based on the above research, it can be seen that automated CDT scoring needs to overcome the bottlenecks of small-sample annotation and adjacent grade distinction. Contrastive learning provides a framework for this problem, but existing studies have not optimized the loss function for the grade continuity of CDT. At the same time, the “uniformity-tolerance dilemma” of contrastive learning loss has not been resolved in CDT tasks. The innovations of this study are as follows: (1) Introduce contrastive learning into automated CDT scoring to reduce reliance on annotations through unsupervised pre-training; (2) Design an improved loss function with positive–negative sample distance constraints to balance the separation of hard negative samples and the tolerance of similar samples; (3) Verify the effectiveness of the method on a multi-center CDT dataset, filling the research gap in “contrastive learning + CDT grade subdivision.”
3 Method
To address the limitations of existing contrastive loss functions in CDT grading, we propose an improved Triplet Loss with triple distance constraints. This section first defines key notations, then details the design of the new loss function for binary and N-ary classification tasks.
3.1 Notation
ε: Margin parameter, used to control the minimum threshold of sample distance, with a value range of (0, 1]. The default value in this paper is set to 0.5.
τ: Temperature scaling coefficient, used to adjust the smoothness of the similarity distribution. The default value is 0.1 for binary classification and 0.5 for N-ary classification.
λ: Weight coefficient of the positive–negative distance loss term (regularization weight), used to balance the contribution of the newly added loss term and the traditional loss term. The default value is 0.01.
d(·, ·): Distance metric function. This paper uniformly adopts the Euclidean distance, defined as:

$$d(x_i, x_j) = \|f_\theta(x_i) - f_\theta(x_j)\|_2$$

where f_θ denotes the neural network feature extractor, and θ represents the learnable parameters of the model. x: Anchor instance. x⁺: Positive instance, generated by data augmentation from the anchor instance or samples of the same category. x⁻: Negative instance, derived from samples of different categories or non-target distributions. L: Total loss function. L_base: Base triplet loss term (traditional triplet loss term). L_pn: Positive–negative distance loss term.
3.2 The new loss function
In this work, we propose a new design of contrastive loss that aims to more comprehensively reflect the principle of contrastive learning. For the purpose of discussion, in this paper we use the positive class to refer to the collection of positive instances and the anchor, positive instances to refer to any positive instances besides the anchor, and the negative class to refer to all the negative instances. As described above, the basic principle of contrastive learning is to pull positive instances together while pushing negative instances far away. However, the existing loss definition only considers minimizing the distance between the anchor and positive instances and maximizing the distance between the anchor and negative instances; it does not consider maximizing the distance between the positive and negative instances. We argue that overlooking this distance reduces the power of contrastive learning, as it serves the purpose of keeping the cluster of the positive class, which includes the anchor, and the cluster of the negative class away from each other. In other words, just minimizing the anchor-positive distance and maximizing the anchor-negative distance does not necessarily keep the two clusters themselves far apart, as we will illustrate next.
Here, we note that the new contrastive loss design we propose is not limited to a particular loss definition, but rather it represents a design idea that is applicable to any contrastive loss definition. Next, we describe the new design idea in binary and N-ary classifications.
3.2.1 Binary classification
For binary classification, we propose a new triplet loss function as

$$L = L_{\text{base}} + \lambda L_{\text{pn}} \tag{2}$$

Among them, the base triplet loss term is defined as

$$L_{\text{base}} = \max\left(d(x, x^+) - d(x, x^-) + \varepsilon,\ 0\right)$$

and the positive–negative distance loss term as

$$L_{\text{pn}} = \max\left(d(x, x^+) + \varepsilon - d(x^+, x^-),\ 0\right)$$

The loss function uses the hinge max(·, 0) to ensure that a penalty is only imposed when the distance between the positive and negative instances is smaller than the margin ε plus the anchor-positive distance. λ controls the penalty intensity, while τ indirectly acts on the distance calculation through feature normalization, thereby preventing the vanishing gradient problem.
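A minimal NumPy sketch of the proposed binary loss, assuming the additive form L = L_base + λ·L_pn with the default ε = 0.5 and λ = 0.01 from the notation section (the plain vectors below are stand-ins for the network embeddings f_θ(·)):

```python
import numpy as np

def euclidean(u, v):
    return np.linalg.norm(u - v)

def improved_triplet_loss(x, x_pos, x_neg, eps=0.5, lam=0.01):
    """Improved triplet loss: the base hinge term plus a
    positive-negative distance term weighted by lam."""
    l_base = max(euclidean(x, x_pos) - euclidean(x, x_neg) + eps, 0.0)
    # Penalize only when the positive-negative distance falls below
    # the margin plus the anchor-positive distance.
    l_pn = max(euclidean(x, x_pos) + eps - euclidean(x_pos, x_neg), 0.0)
    return l_base + lam * l_pn

# When the negative sits close to the positive (but not to the anchor),
# the base term is zero while the new term still applies a penalty.
x, p, n = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.5, 0.0])
print(improved_triplet_loss(x, p, n))  # 0.0 + 0.01 * 1.0 = 0.01
```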
As can be seen, besides looking for keeping a large distance between the anchor and the negative instance, the proposed equation aims to maintain a large distance between the positive and negative instances. The difference between the conventional loss definition and the new loss function can be visualized in Figure 2.
Figure 2
Mathematically, we can shed insights into the advantage of the new loss function. From the figure, it is seen that using the conventional definition, though it rewards a small distance between the anchor and the positive instance and a large distance between the anchor and the negative instance, it does not differentiate between the cases where the positive instance is located on a circle or sphere centered at the anchor. In other words, it does not reward the positive instance for being located far from the negative instance.
For the conventional loss function, the four locations for the positive instance on the circle make no difference. In the new loss definition, we reward the case where the positive instance is located far from the negative instance. In the ideal case, the positive instance should lie on the extension of the line from the negative instance through the anchor. In this case, not only is the anchor-positive distance small and the anchor-negative distance large, but the distance between the positive and negative instances is also at its maximum.
In Figure 2a, if we assume the anchor is located at the origin, and the distances between the anchor and positive instance are a and between the anchor and negative instance are b, then the center of the positive class, which is halfway between the anchor and positive instance, is located at a/2. The distance from this center to the negative instance is thus d₁ = (b− a/2). In Figure 2d, it can be seen that the distance from the center to the negative class is d₂ = (b+ a/2). As d₂ ≥ d₁, we can see that the positive class and negative instance are farther separated in Figure 2d than in Figure 2a. While the above comparison is made in two extreme cases, the reasoning is valid for a loss function to push not only the anchor far away from the negative instance but also the positive instance far away from the negative instance. In the above figure, we can swap the roles of the anchor and the positive instance, and the above argument holds as well.
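The distance comparison above can be checked with a line of arithmetic, where a and b are the illustrative anchor-positive and anchor-negative distances from the text:

```python
# Anchor at the origin; the negative instance lies at (b, 0).
a, b = 1.0, 3.0       # anchor-positive and anchor-negative distances
d1 = b - a / 2.0      # positive between anchor and negative (Figure 2a)
d2 = b + a / 2.0      # positive on the far side of the anchor (Figure 2d)
print(d1, d2)         # 2.5 3.5: the positive-class center ends up
                      # farther from the negative instance
```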
3.2.2 N-ary classification
In N-ary classification, given an anchor x, a positive instance x⁺, and a set of negative instances x⁻₁, …, x⁻ₙ, the usual contrastive loss has a softmax form:

$$L_{\text{base}} = -\log \frac{\exp\left(-d(x, x^+)/\tau\right)}{\exp\left(-d(x, x^+)/\tau\right) + \sum_{i=1}^{n} \exp\left(-d(x, x^-_i)/\tau\right)} \tag{3}$$

In Equation 3, d(·, ·) is a distance metric that usually takes the form of the L2 distance or cosine similarity, and τ is a temperature factor that controls the smoothness of the distance metric. For N-ary classification, the new contrastive loss is defined by

$$L = L_{\text{base}} + \frac{\lambda}{n} \sum_{i=1}^{n} \max\left(d(x, x^+) + \varepsilon - d(x^+, x^-_i),\ 0\right) \tag{4}$$
In Equation 4, the new loss function again rewards the pulling together of the positive class while pushing all negative classes away from the positive class.
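A hedged NumPy sketch of the N-ary version, assuming the softmax base term uses exp(−d/τ) over Euclidean distances and the positive–negative term is averaged over the negatives (defaults τ = 0.5, ε = 0.5, λ = 0.01 from the notation section):

```python
import numpy as np

def nary_contrastive_loss(x, x_pos, x_negs, tau=0.5, eps=0.5, lam=0.01):
    """N-ary loss: a softmax (InfoNCE-style) base term plus the
    positive-negative distance term averaged over all negatives."""
    d_pos = np.linalg.norm(x - x_pos)
    d_negs = np.array([np.linalg.norm(x - xn) for xn in x_negs])
    # Softmax base term over the anchor's distances to all instances.
    pos_term = np.exp(-d_pos / tau)
    l_base = -np.log(pos_term / (pos_term + np.exp(-d_negs / tau).sum()))
    # New term: keep every negative far from the positive instance.
    d_pn = np.array([np.linalg.norm(x_pos - xn) for xn in x_negs])
    l_pn = np.maximum(d_pos + eps - d_pn, 0.0).mean()
    return l_base + lam * l_pn
```

Pulling the negatives closer to the anchor increases the loss, as expected of a contrastive objective.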
3.2.3 Implementation
As the new loss function is not tied to a particular contrastive learning model, it has broad generalizability for use with existing models. Similar to the practice of contrastive learning, a projector network may be used following the contrastive learning model to further separate the features in the embedding space.
4 Experiment
4.1 Datasets
We conduct experiments on the public CIFAR-10 and CDT datasets. The former serves as a benchmark for general image classification, and the latter verifies the application in the medical field.
CIFAR-10 is a widely used benchmark dataset in the field of computer vision and machine learning. It consists of 60,000 32×32 color images categorized into 10 different classes, with 6,000 images per class. The dataset is divided into five training batches and one test batch, each containing 10,000 images. The 10 categories include common objects such as airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. CIFAR-10 remains a standard benchmark for image classification tasks and plays an important role in the development and evaluation of modern computer vision techniques.
Figure 3 shows clock drawings graded with the Shulman scoring system. The CDT (Raksasat et al., 2023) requires the subject to draw a clock showing a specified time on blank paper. It integrates multiple abilities such as visuospatial perception, executive planning (including initiation, monitoring, and correction), semantic understanding, and working memory. It can sensitively capture the subtle mistakes of MCI patients, such as the wrong time, deflection of the hands, and asymmetry of the clock face; therefore, it has high sensitivity for non-amnestic MCI characterized by executive or visuospatial impairment. Common scoring methods include the Shulman 0–5 scale and the digital CDT (dCDT) combined with computerized quantitative analysis. The latter, through the extraction of geometric features and time-series data, can increase detection sensitivity for non-amnestic MCI, especially in patients with impaired executive function and visuospatial ability, to over 80%. The identification of MCI is thus positively correlated with the accuracy of the Clock-Drawing Test: if recognition accuracy on the clock-drawing dataset can be improved, recognition accuracy for MCI can also be increased accordingly. The dataset consists of clock-drawing images from the paper-based MoCA (Montreal Cognitive Assessment) assessment collected at Siriraj Hospital in Bangkok, Thailand. Participants ranged in age from 29 to 90 years, with a male-to-female ratio of 3:1. For the clock-drawing part of the MoCA assessment, participants were instructed to draw a circular clock with all numbers and hands, showing the time of 11:10. The MoCA results were scanned, anonymized, and cropped using in-house software, yielding 3,108 images.
These images are classified according to their anomaly type and readability as (5 = normal, 4 = mild visuospatial deficit, 3 = incorrect representation of the correct time, 2 = moderate visuospatial deficit, 1 = severe visuospatial deficit, 0 = unable to reasonably depict the clock).
Figure 3
4.2 Dataset processing
Data Screening and Cleaning: First, we download the original 3,108 PNG/JPEG images and their rating tables locally. We then batch-read all images, mark files that cannot be parsed by PIL or OpenCV as corrupted, and remove them; we also delete images with a resolution below 128 × 128 px or those detected as severely blurred using the Laplacian variance (threshold 50). The data is split into an 80% training set and a 20% test set, using stratified sampling to ensure a balanced distribution of categories.
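The blur check above can be sketched without OpenCV; the following NumPy stand-in for `cv2.Laplacian(img, cv2.CV_64F).var()` applies a 4-neighbour Laplacian and thresholds its variance (threshold 50 as in the text):

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the Laplacian response of a 2-D grayscale array;
    low values indicate blur (flat images score zero)."""
    g = gray.astype(np.float64)
    # 4-neighbour Laplacian evaluated on the interior pixels.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return lap.var()

def is_too_blurry(gray, threshold=50.0):
    return laplacian_variance(gray) < threshold

# A flat image has zero Laplacian variance and is flagged as blurry.
flat = np.full((64, 64), 128.0)
print(bool(is_too_blurry(flat)))  # True
```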
Data Preprocessing: In the preprocessing stage, we first unify the image size: we scale while maintaining the aspect ratio so that the shortest side reaches 224 px, and then expand the image to 224 × 224 px with zero-padding to ensure consistent input for the convolutional network. Since the clock face information is mainly carried by the lines, we convert the color images to single-channel grayscale and duplicate them into three channels for compatibility with ImageNet pre-trained models. We apply the CLAHE algorithm for contrast enhancement to make light ink and overexposed areas clearer. Finally, we normalize the pixel values using the standard ImageNet mean and variance.
Due to the uneven data distribution in the dataset, such as insufficient data in categories 0 and 1, we increase the data volume through data augmentation. For training-data augmentation, we adopt geometric transformation strategies, including horizontal flipping, as well as mild dilation and Gaussian-noise processing, to increase the model’s generalization to image features. To alleviate class imbalance, especially for the abnormal samples with low ratings (0–3 points), we use a WeightedRandomSampler combined with Focal Loss to keep the ratio of positive and negative samples roughly balanced during training.
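A minimal sketch of the two rebalancing pieces: inverse-frequency per-sample weights (of the kind fed to PyTorch’s `WeightedRandomSampler`) and the focal-loss formula; the function names here are illustrative:

```python
import numpy as np

def sample_weights(labels):
    """Per-sample weights inversely proportional to class frequency,
    so that rare classes are drawn as often as common ones."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    return 1.0 / counts[labels]

def focal_loss(p_true, gamma=2.0):
    """Focal loss on the probability assigned to the true class:
    (1 - p)^gamma down-weights easy, well-classified examples."""
    p = np.clip(np.asarray(p_true, dtype=float), 1e-7, 1.0)
    return -((1.0 - p) ** gamma) * np.log(p)

# A sample from a class with 1 example gets triple the weight of a
# sample from a class with 3 examples.
print(sample_weights([0, 0, 0, 1]))  # [0.333... 0.333... 0.333... 1.0]
```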
4.3 Construction strategy for positive and negative sample pairs
Based on the improved triplet-based theoretical framework, we improve the traditional method of constructing positive and negative samples. As noted above, traditional contrastive learning only focuses on the distance between the anchor and the positive and negative samples, while ignoring the degree of separation between the positive and negative samples, which may lead to blurred class-cluster boundaries in the feature space.
In our CDT identification task, usually, a CDT score ≤ 2 is regarded as “significantly abnormal” (indicating cognitive impairment), while a score ≥ 4 is regarded as “normal,” and a score of 3 is in the gray area. In this experiment, in order to balance the number of positive and negative samples, positive samples are generated through data augmentation from clock face images with a score of 4 or above, ensuring that the augmented positive samples remain consistent in diagnostic features; negative samples are selected from combinations of different clock face images with a score difference within 3 points, and are generated through random selection and geometric transformation to enhance the diversity of negative samples.
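The construction rule can be sketched as follows; `build_pairs` and the `aug(...)` placeholder are hypothetical names, while the score thresholds follow the text (positives augmented from images scoring ≥ 4, negative pairs drawn from images whose scores differ by at most 3 points):

```python
import random

def build_pairs(samples, num_negatives=2, seed=0):
    """Sketch of the pair-construction rule in Section 4.3.
    `samples` is a list of (image_id, shulman_score) tuples;
    aug(...) stands in for the geometric-transform augmentation."""
    # Positives: augmented views of clock faces scoring 4 or above.
    positives = [(img, f"aug({img})") for img, score in samples if score >= 4]
    # Negative candidates: pairs of different images whose scores
    # differ by at most 3 points, drawn at random for diversity.
    candidates = [(a, b)
                  for i, (a, sa) in enumerate(samples)
                  for (b, sb) in samples[i + 1:]
                  if abs(sa - sb) <= 3]
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
    return positives, negatives
```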
This design not only maximizes the similarity of positive pairs but also shows the difference between abnormal and normal clock faces, promoting the discriminability of the representation space. We introduce the mechanism of “triple distance constraint”: minimizing the distance between the anchor point and the positive sample, maximizing the distance between the anchor point and the negative sample, and then maximizing the distance between the positive and negative samples (new constraint).
4.4 Experimental environment
The experimental environment of this paper is shown in Table 1.
Table 1
| Configuration | Name | Specific information |
|---|---|---|
| Hardware environment | CPU | 14 vCPU Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz |
| | GPU | NVIDIA GeForce RTX 3090 (24 GB) |
| | Video memory | 24 GB |
| | Memory | 45 GB |
Hardware environment table.
We install Anaconda and a suitable version of CUDA, set up a virtual environment in VS Code, install PyTorch and its dependencies, and confirm GPU availability. The specific software environment is shown in Table 2.
Table 2
| Configuration | Name | Specific information |
|---|---|---|
| Software environment | Operating system | Windows 11 |
| | Python | 3.12 |
| | PyTorch | 2.5.1 |
| | CUDA | 11.2 |
| | Ubuntu | 22.04 |
Software environment table.
4.5 MoCo model configuration
The MoCo framework adopts the momentum contrast mechanism. The queue size K is set to 4,096 to store negative-sample features, and the momentum coefficient m is 0.99 for the key-encoder parameter update. The base encoder uses ResNet-18 with a feature dimension of 128; the temperature parameter τ = 0.1 controls the sharpness of the similarity distribution. The optimizer is SGD with an initial learning rate of 0.06 (suitable for a batch size of 512) and a weight decay of 5e-4; training runs for 200 epochs with a cosine learning-rate scheduler.
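Two MoCo ingredients mentioned here, the momentum update of the key encoder and the fixed-size negative-feature queue, can be sketched in NumPy as a simplified stand-in for the PyTorch implementation:

```python
import numpy as np

def momentum_update(key_params, query_params, m=0.99):
    """MoCo key-encoder update: theta_k <- m*theta_k + (1-m)*theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

class FeatureQueue:
    """Fixed-size FIFO queue of negative features (K = 4,096 in the
    paper); new key features overwrite the oldest entries."""
    def __init__(self, dim, size=4096):
        self.feats = np.zeros((size, dim))
        self.ptr = 0
        self.size = size

    def enqueue(self, batch):
        n = len(batch)
        idx = (self.ptr + np.arange(n)) % self.size  # wrap around
        self.feats[idx] = batch
        self.ptr = (self.ptr + n) % self.size

# One momentum step pulls the key parameter 1% of the way to the query.
new = momentum_update([np.array([1.0])], [np.array([0.0])])
print(new[0][0])  # 0.99
```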
5 Results
5.1 CIFAR-10: preliminary validation
To verify the feasibility of the proposed contrastive learning framework and complete coarse-tuning of key hyperparameters before the target task, we first conduct experiments on the public CIFAR-10 dataset, which consists of 60,000 32×32 color images in 10 classes (50,000 training and 10,000 test images). Its sufficient annotations and moderate sample size make it suitable for adjusting hyperparameters such as the temperature τ, batch size, and projection-head dimension. We compare three methods in the MoCo model: the standard triplet loss, the Laplacian triplet loss with Laplacian regularization, and our proposed triplet loss.
5.2 Results of MoCo pre-experiment
Pre-experiments on the MoCo model for CIFAR-10 were carried out with the parameters given in the configuration settings. Figure 4 compares the standard triplet, the Laplacian triplet, and our proposed triplet losses.
Figure 4
As shown in Figure 4, the proposed loss exhibited faster convergence (stabilizing at epoch 120) with lower variance across runs, suggesting better gradient dynamics.
Table 3 summarizes performance across loss functions. The proposed loss achieved a Top-1 accuracy of 60.1% (±0.3%), outperforming the standard triplet (56.9 ± 0.4%) by 3.2% and the Laplacian triplet (54.6 ± 0.5%) by 5.5% (p < 0.01).
Table 3
| Model | Loss function | Accuracy | Precision | F1 score | Recall | Hyperparameters |
|---|---|---|---|---|---|---|
| MoCo | standard_triplet | 0.569 | 0.557 | 0.569 | 0.569 | Margin = 1.0 |
| MoCo | laplacian_triplet | 0.546 | 0.528 | 0.546 | 0.518 | Margin = 1.0 |
| MoCo | Our method | 0.601 | 0.597 | 0.601 | 0.601 | Margin = 1.0 |
Results on CIFAR-10 dataset.
The experiment results show that the proposed Triplet method achieves a Top-1 accuracy of 60.1% on the CIFAR-10 test set with the MoCo model, a 3.2% improvement over the standard triplet and an advantage of approximately 5.5% over the Laplacian Triplet. Further analysis reveals that the Triplet also outperforms the two control methods on all remaining indicators, showing not only higher accuracy but also better overall discriminative ability and balance between positive and negative samples. Overall, the performance improvement and convergence stability of the Triplet on this public dataset verify the effectiveness of the proposed framework, laying a reliable foundation of model parameters and experimental strategy for subsequent migration to the proprietary Clock-Drawing Test (CDT) dataset.
5.3 Results of SimCLR pre-experiment
Pre-experiments on the SimCLR model for CIFAR-10 were carried out with the parameters given in the configuration settings above. Figure 5 compares standard_triplet, Laplacian Triplet, and our Triplet.
Figure 5
The experimental results demonstrate that the improved Triplet method achieves a 97.26% Top-1 accuracy on the CIFAR-10 test set in the SimCLR model, improvements of roughly 0.15 and 0.10 percentage points over the Standard Triplet and the Laplacian Triplet, respectively (Table 4). Additionally, the Triplet outperforms the other two control methods on the remaining metrics, indicating higher accuracy, better discriminative power, and an improved balance between positive and negative samples. The performance enhancement and convergence stability of the Triplet on this public dataset validate the effectiveness of the proposed framework, establishing a reliable foundation of model parameters and experimental strategies for subsequent migration to the proprietary Clock-Drawing Test (CDT) dataset.
5.4 CDT: clinical task evaluation
Table 4 lists the SimCLR results just discussed; on the CDT task itself, the proposed loss dominates in both the binary (normal vs. impaired) and N-ary (6-level) settings (see Table 5).
Table 4
| Model | Loss function | Accuracy | Precision | F1 score | Recall | Hyperparameters |
|---|---|---|---|---|---|---|
| SimCLR | standard_triplet | 0.97073 | 0.97033 | 0.97053 | 0.97073 | Margin = 1.0 |
| SimCLR | laplacian_triplet | 0.97120 | 0.97090 | 0.97120 | 0.97130 | Margin = 1.0 |
| SimCLR | Triplet | 0.97220 | 0.97180 | 0.97210 | 0.97220 | Margin = 1.0 |
Results on CIFAR-10 dataset.
The experimental results demonstrate the significant impact of improved loss functions on enhancing the performance of the MoCo model. Compared with the traditional Triplet Loss, variants such as the Improved Triplet Loss and the Multi-class Improved Triplet Loss achieve notable improvements in accuracy, with the latter boosting it from 0.7527 to 0.7926. The improved loss functions not only enhance the model’s ability to distinguish positive from negative samples but also optimize the balance between precision and recall, as evidenced by the higher F1 scores. Specifically, the addition of regularization terms and structural enhancements in the improved loss functions effectively addresses issues like over-smoothing of similarity distributions and potential overfitting, leading to more robust and discriminative feature representations. These findings underscore the critical role of loss function design in maximizing the effectiveness of contrastive learning frameworks like MoCo.
Figure 6 illustrates the training and validation loss of the MoCo model with different loss functions. In all plots, the blue “Train Loss” line drops as epochs increase, showing the model learns to minimize loss on the training data. The orange “Val Loss” line also trends down but with more fluctuations, reflecting generalization to unseen data. For Triplet Loss, training loss falls steadily, yet validation loss stays higher, indicating a mild overfitting risk. The Improved Triplet Loss speeds up the training-loss reduction and keeps validation loss closer to training loss, showing better generalization from the structural upgrades. Multi-class Standard Triplet Loss enables fast training-loss drops but causes validation loss to rise later, hinting at instability from over-smoothing. Multi-class Improved Triplet Loss, with regularization, achieves a steep training-loss decline and stable validation loss, balancing learning efficiency and generalization. Overall, refined loss functions, especially with regularization, enhance the model’s training and generalization, aligning with the earlier metric findings (Figure 7).
Figure 6
Figure 7
These four plots in Figure 7 show the training and validation accuracy trends of MoCo models using different loss functions, ordered from top-left to bottom-right as Triplet Loss, Improved Triplet, Multi-class Standard Triplet Loss, and Multi-class Improved Triplet Loss. For Triplet Loss, training accuracy rises sharply and stabilizes, but validation accuracy fluctuates, hinting at mild overfitting. The Improved Triplet Loss speeds up training convergence but causes heavy validation-accuracy fluctuations, showing inconsistent generalization. Multi-class Standard Triplet Loss enables fast training but leads to a decline in validation accuracy later, hurting long-term generalization. In contrast, Multi-class Improved Triplet Loss, with regularization, achieves fast training convergence and stable, rising validation accuracy. Overall, refined loss functions, especially with regularization like Multi-class Improved Triplet Loss, enhance the model’s learning efficiency and generalization, aligning with the earlier metric improvements (Figure 8).
Figure 8
These four Receiver Operating Characteristic (ROC) plots assess the classification performance of MoCo models with distinct loss functions, ordered from top-left to bottom-right as Triplet Loss, Improved Triplet, Multi-class Standard Triplet Loss, and Multi-class Improved Triplet Loss. ROC curves plot the True Positive Rate (TPR) against the False Positive Rate (FPR), and the Area Under the Curve (AUC) summarizes performance (1 is perfect, 0.5 is chance level). For Triplet Loss, most class curves stay above the diagonal (AUC > 0.5), but Classes 3 and 4 show weaker discrimination. The Improved Triplet Loss steepens curves for the major classes (0, 1) but fails to help the harder classes (3, 4). Multi-class Standard Triplet Loss harms performance for easy classes (like Class 0) and does not assist harder ones, likely due to distorted feature similarity from temperature scaling. In contrast, Multi-class Improved Triplet Loss, with regularization, has curves closer to the top-left corner for all classes, notably boosting the AUC of previously weak classes (e.g., Class 4). Overall, refined loss functions with regularization, like Multi-class Improved Triplet Loss, are crucial for balanced, high-performance classification across all classes, addressing the limitations of basic losses and aligning with the earlier accuracy and loss-stability results (Figure 9).
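The one-vs-rest ROC/AUC computation described above can be reproduced with scikit-learn; the scores below are synthetic (a noisy one-hot pattern), used only to illustrate the mechanics, not the paper's model outputs:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_classes = 600, 6
y_true = rng.integers(0, n_classes, size=n_samples)

# Synthetic class scores: boost the true class so AUC sits above chance
scores = rng.normal(0.0, 1.0, size=(n_samples, n_classes))
scores[np.arange(n_samples), y_true] += 2.0

aucs = {}
for c in range(n_classes):
    y_bin = (y_true == c).astype(int)          # one-vs-rest labels for class c
    fpr, tpr, _ = roc_curve(y_bin, scores[:, c])
    aucs[c] = roc_auc_score(y_bin, scores[:, c])
```

Plotting each `(fpr, tpr)` pair per class reproduces the panel layout described in Figure 8.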
Figure 9
These four confusion matrices illustrate the classification performance of MoCo models across six classes (0–5) with distinct loss functions, ordered from top-left to bottom-right as Triplet Loss, Improved Triplet, Multi-class Standard Triplet Loss, and Multi-class Improved Triplet Loss. A confusion matrix has rows for true labels and columns for predicted labels, with darker blue indicating more predictions. For Triplet Loss, it performs well for major classes (3, 5) but struggles with finer distinctions in classes 0 and 2. The Improved Triplet Loss boosts recognition for Class 5 yet worsens confusion in Classes 1–2, showing uneven generalization. Multi-class Standard Triplet Loss fails to fix confusion for Classes 0 and 2 and weakens Class 3’s performance, likely due to distorted feature similarity from temperature scaling. In contrast, Multi-class Improved Triplet Loss, with regularization, has more dominant diagonal cells across all classes, reducing cross-class errors (e.g., Class 2 → 3). Minor errors remain but are less severe. Overall, refined loss functions with regularization, like Multi-class Improved Triplet Loss, are key for balanced classification, addressing uneven errors of basic losses and aligning with earlier ROC and accuracy trends to create more reliable, generalizable models.
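A confusion matrix of the kind analyzed above (rows for true labels, columns for predictions) can be computed as follows; the labels are a toy example rather than the paper's predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 3, 4, 5])
y_pred = np.array([0, 2, 1, 1, 2, 3, 2, 3, 4, 5])  # one 0->2 and one 2->3 error

# Rows index the true class, columns the predicted class
cm = confusion_matrix(y_true, y_pred, labels=list(range(6)))

# Diagonal counts are correct predictions; row-normalizing yields recall
per_class_recall = cm.diagonal() / cm.sum(axis=1)
```

Off-diagonal cells such as `cm[2, 3]` directly expose the cross-class errors (e.g., Class 2 → 3) discussed above.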
The four t-Distributed Stochastic Neighbor Embedding (t-SNE) plots in Figure 10 visualize the feature space of MoCo models using different loss functions, ordered from top-left to bottom-right as Triplet Loss, Improved Triplet, Multi-class Standard Triplet Loss, and Multi-class Improved Triplet Loss. t-SNE maps high-dimensional features to 2D for easy interpretation, with points representing samples and colors denoting classes. For Triplet Loss, features show some clustering but with notable overlaps, meaning class separation is limited. The Improved Triplet Loss creates slightly clearer clusters, yet some classes still mix, indicating partial improvement in feature discrimination. Multi-class Standard Triplet Loss results in fragmented clusters, suggesting features are less organized and harder to distinguish. In contrast, Multi-class Improved Triplet Loss produces the most distinct and compact clusters: classes are well separated, showing that regularization helps the model learn more discriminative and organized features. Overall, refined loss functions, especially the Multi-class Improved Triplet Loss, enhance feature clustering, aligning with earlier metrics such as accuracy and AUC and demonstrating that better loss designs lead to more meaningful feature representations.
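A t-SNE projection like those in Figure 10 can be generated with scikit-learn; the features below are synthetic stand-ins for the 128-d encoder output, with three artificial classes:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic 128-d "encoder features": three well-separated classes
feats = np.concatenate([
    rng.normal(loc=c * 5.0, scale=0.5, size=(50, 128)) for c in range(3)
])
labels = np.repeat(np.arange(3), 50)

# Map to 2-D for plotting; perplexity must stay below the sample count
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(feats)
```

Each row of `emb` can then be scattered with a color taken from the matching entry of `labels`.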
Figure 10
6 Discussion
The core of this study lies in the technical optimization of automated CDT scoring; it does not attempt to verify the effectiveness of CDT as a diagnostic tool, focusing solely on improving the objectivity and consistency of CDT scoring results. Through pre-experiments on CIFAR-10 and formal experiments on the CDT dataset, this section systematically verifies the effectiveness and robustness of the improved triplet loss (Triplet) under the MoCo framework. The pre-experiments show that, compared with the Standard Triplet and the Laplacian Triplet, the proposed loss has superior discriminative ability. When transferred to CDT, the Triplet also exhibits good convergence characteristics: the training and validation losses decrease in step, with no obvious overfitting, and the AUC of each category in the ROC curves is above 0.8. The confusion matrix further shows that the model is very accurate in identifying extreme grades. Finally, synthesizing multiple indicators over repeated experiments, the Triplet increases the accuracy of the MoCo baseline from 76.21% to 79.26%, clearly leading the Standard and Laplacian Triplet under the same hyperparameter configuration. In summary, these experiments confirm that the Triplet-based loss can significantly enhance the feature representations learned by MoCo and provide a clear direction for subsequent improvements in handling class imbalance and fine-grained discrimination.
On the clock-drawing dataset, the improved loss achieves a significant improvement in classification and recognition over other classical methods, making it easier to identify whether a patient has MCI and, if so, to judge its severity from the level of their clock-drawing image. However, in the CDT dataset used in this experiment, the data volume for classes 0 and 1 is too small, so the model does not fully learn their features. In addition, the experiment targets six-class and ten-class datasets and lacks a per-class analysis of classification accuracy.
7 Limitations
In terms of sample and dataset, the study has four main limitations: first, there is an imbalanced class distribution in the CDT dataset—classes 0 (unable to depict the clock) and 1 (severe visuospatial deficit) only have 32 and 48 samples respectively, accounting for less than 3% of the total 3,108 samples. This imbalance may lead to underfitting in extreme impairment cases, as the model lacks sufficient features to learn discriminative patterns for these two classes (Table 5 shows that the precision of classes 0–1 is 0.68–0.72, lower than the 0.75–0.81 of classes 4–5). Second, there is single-center data bias: all CDT samples are collected from Siriraj Hospital (Bangkok, Thailand), with participants aged 29–90 years and a male-to-female ratio of 3:1. This single-center setup limits the model’s generalizability to other regions, since factors like patient demographics (e.g., age structure, comorbidity rates) and clinical assessment standards may vary across different centers. Third, there is a lack of external validation—no external datasets (e.g., ADNI CDT subset, NHATS CDT dataset) are used to verify the model’s performance. Without cross-center validation, it remains unclear whether the proposed loss function can maintain its advantages in datasets with different scanning equipment, annotation standards, and population characteristics. Fourth, there are potential impacts of cultural differences in clock drawing: the CDT task requires drawing a clock with numbers 1–12 and hands showing 11:10, a design based on Western numerical and time-expression conventions. For populations with different cultural backgrounds (e.g., regions using non-Latin numerals or having different time-reading habits), the model may misjudge normal variations as “impairments,” leading to false positives.
Table 5
| Model | Loss function | Accuracy | Precision | F1 score | Recall | Hyperparameters |
|---|---|---|---|---|---|---|
| MoCo | Binary standard triplet loss | 0.7527 | 0.7517 | 0.7524 | 0.7572 | Margin = 1.0 |
| MoCo | Binary improved triplet loss | 0.7701 | 0.7489 | 0.7583 | 0.7701 | Margin = 1.0 |
| MoCo | Multi-class standard triplet loss | 0.7621 | 0.6547 | 0.7563 | 0.7621 | Temperature = 0.5 |
| MoCo | Multi-class improved triplet loss | 0.7926 | 0.7623 | 0.7816 | 0.7926 | Temperature = 0.5, lambda_reg = 0.01 |
Results on CDT dataset.
Regarding model and method, the study has three key limitations. First, the model is sensitive to scanning quality: although the preprocessing step (Section 4.2) removes images with resolution <128 × 128 px or severe blurriness (Laplacian variance <50), the model still shows performance degradation when dealing with low-quality images (e.g., those with uneven lighting or faded pen strokes). Analysis of misclassified samples (Figure 8) reveals that 18% of errors come from low-resolution images, even after CLAHE contrast enhancement. Second, there is a domain shift risk: the model is trained and tested on paper-based CDT images scanned from MoCA assessments, but in preliminary tests on a small digital CDT (dCDT) subset (collected via tablets or digital pens with time-series stroke data), the model’s accuracy drops by 7.2%, indicating poor adaptation to different data domains. Third, the model over-relies on visual features—it only uses the visual features of clock drawings and ignores clinical context (e.g., patient age, education level) as well as other neuropsychological test results (e.g., MMSE sub-scores). For cases where there are subtle visuospatial changes but normal visual features, the model may fail to identify potential cognitive impairment.
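The Laplacian-variance blur screen referenced above (threshold 50, from Section 4.2) can be sketched in pure NumPy; the usual OpenCV form is `cv2.Laplacian(img, cv2.CV_64F).var()`, and the 4-neighbor kernel with wrap-around borders used here is a simplification of that:

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a discrete 4-neighbor Laplacian; low values indicate blur."""
    g = gray.astype(np.float64)
    lap = (np.roll(g, 1, axis=0) + np.roll(g, -1, axis=0) +
           np.roll(g, 1, axis=1) + np.roll(g, -1, axis=1) - 4.0 * g)
    return float(lap.var())

def passes_blur_screen(gray: np.ndarray, threshold: float = 50.0) -> bool:
    # Images below the threshold are treated as too blurry to grade
    return laplacian_variance(gray) >= threshold

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(128, 128)).astype(np.float64)  # noisy, high-frequency
blurry = np.full((128, 128), 128.0)  # perfectly flat image, zero Laplacian response
```

A flat or heavily smoothed scan produces a near-zero Laplacian response and is rejected, which is exactly the failure mode the preprocessing step filters out.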
8 Conclusion
This paper focuses on the MoCo contrastive learning framework and proposes an improved Triplet loss strategy, aiming to enhance the discriminative ability of Clock-Drawing Test (CDT) images for the early identification of mild cognitive impairment. Firstly, the paper reviews the theoretical basis of contrastive learning and triplet loss, and designs an “anchor–positive–negative” triplet construction scheme suited to the characteristics of CDT data. Subsequently, a pre-experiment is carried out on the public dataset CIFAR-10, and key hyperparameters such as the learning rate and temperature coefficient τ are coarsely tuned to establish a parameter baseline for the formal experiment.
In the main experiment on the CDT dataset, the improved Triplet loss, under the MoCo framework and without changing the backbone network (ResNet-18), raises the Top-1 accuracy to 79.26%, a roughly 3% improvement over the Standard Triplet. It also comprehensively outperforms the Standard Triplet and the Laplacian Triplet on indicators such as macro-average F1 and AUROC. Analysis of the ROC curves and the confusion matrix shows that the model can stably identify extreme rating samples, but fine-grained confusion remains in the 3–5 neighborhood.
Through comparative experiments and visualization, we verified that the improved triplet loss achieves excellent results across models and datasets, and significantly improves on the standard triplet loss. We also completed a systematic analysis of the model’s convergence and deficiencies. These experimental results, particularly those from the CDT dataset, clearly demonstrate that the optimized contrastive learning loss function proposed in this study significantly improves the accuracy of automated CDT image scoring, with a notable enhancement in the ability to distinguish between adjacent scoring levels (e.g., Level 3 and Level 4). By more precisely capturing subtle differences in clock-drawing performance (such as number misplacement and pointer deflection), the method further increases the sensitivity of scoring results to features related to visuospatial and executive functions, thereby providing technical support for the standardization of CDT assessments. It should be clearly noted that the core output of this model is the standardized score of CDT images; the model does not possess the capability to identify specific drawing features (e.g., types of number arrangement abnormalities or the degree of pointer angle deviation) nor to directly distinguish between Healthy Controls (HC) and individuals with Mild Cognitive Impairment (MCI). The model’s output serves only as an objective reference for CDT assessments and cannot replace the cognitive function evaluations conducted by clinical professionals. It is also important to note that the Clock-Drawing Test (CDT) has inherent limitations in the scope of cognitive assessment: it primarily covers visuospatial and executive functions, with insufficient coverage of other cognitive domains such as memory and language.
Therefore, the technical outcomes of this study must be combined with multi-dimensional clinical assessment data (e.g., neuropsychological scales, imaging examinations) to support the comprehensive evaluation of cognitive impairment. Based on this, future work can be carried out in three key directions: First, conducting multi-modal fusion by integrating subjects’ clinical information (e.g., age, educational background) with CDT scoring data to further enhance the comprehensiveness of assessments. Second, incorporating multi-center clinical samples to construct a larger-scale and more diverse CDT benchmark dataset, thereby improving the model’s generalization ability in real-world scenarios. Third, adopting interpretable technologies such as attention mechanism visualization and Grad-CAM to provide clinical professionals with more transparent decision-making basis for scoring. Additionally, based on repeated CDT test data from the same patient, a temporal contrastive learning model can be developed to enable long-term dynamic monitoring and progression prediction of cognitive function.
Statements
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.
Ethics statement
This study focuses on the development of an automated scoring algorithm for Clock-Drawing Test (CDT) images. It does not directly involve human subjects in interventional research, but the image data used is derived from clinical assessment scenarios. The relevant ethical explanations are as follows: ① The data source platform CDT-API-Network has provided explicit ethical certification (approved by the Institutional Review Board (IRB) of Siriraj Hospital), confirming the compliance of the original clinical sample collection; ② This study only uses image features for algorithm training and has no access to any personal information of the subjects. The entire data processing process complies with the requirements for medical data usage specified in the Personal Information Protection Law; ③ During the research, the original attributes of the data were not altered, nor was the data used for clinical diagnosis. It was only employed as sample data for algorithm validation, which is in line with the ethical boundaries of academic research.
Author contributions
NL: Supervision, Writing – review & editing. QS: Data curation, Formal analysis, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft. XX: Conceptualization, Methodology, Supervision, Writing – original draft. HM: Resources, Validation, Writing – original draft. XL: Resources, Writing – review & editing. BR: Data curation, Software, Writing – review & editing. LW: Funding acquisition, Writing – review & editing.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcomp.2026.1690044/full#supplementary-material
References
1. Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020). “A simple framework for contrastive learning of visual representations,” in Proceedings of the 37th International Conference on Machine Learning (PMLR), 1597–1607.
2. Chen, Y. R., Qian, X. L., Zhang, Y. Y., Su, W., Huang, Y., Wang, X., et al. (2022). Prediction models for conversion from mild cognitive impairment to AD: a systematic review and meta-analysis. Front. Aging Neurosci. 14:840386. doi: 10.3389/fnagi.2022.840386
3. Chen, C., Zhang, J., Xu, Y., Chen, L., Duan, J., Chen, Y., et al. (2022). Why do we need large batch sizes in contrastive learning? A gradient-bias perspective. Adv. Neural Inf. Process. Syst. 2, 33860–33875. doi: 10.48550/arXiv.2210.05144
4. Fang, Z., Zhu, S., Chen, Y., Zou, B., Jia, F., Liu, C., et al. (2024). GFE-Mamba: Mamba-based AD multimodal progression assessment via generative feature extraction from MCI. arXiv [Preprint]. doi: 10.48550/arXiv.2407.15719
5. Garcia, M., Zhang, Y., Liu, Y., Wang, Y., Li, X., Sun, H., et al. (2024). Contrastive pre-training for automated clock-drawing test grading in cognitive assessment. Comput. Biol. Chem. 110:108021. doi: 10.1016/j.compbiolchem.2024.108021
6. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R., and Dollár, P. (2020). “Momentum contrast for unsupervised visual representation learning,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9729–9738.
7. Ho, D. E., Imai, K., King, G., and Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15, 199–236. doi: 10.1093/pan/mpl013
8. Hu, M., Qin, T., Gonzalez, R., Freedman, V., Zahodne, L., Melipillan, E., et al. (2024). Using deep learning neural networks to improve dementia detection: automating coding of the clock-drawing test. Res. Sq. doi: 10.21203/rs.3.rs-4909790/v1
9. Kalantidis, Y., Sariyildiz, M. B., Pion, N., Weinzaepfel, P., and Larlus, D. (2020). “Hard negative mixing for contrastive learning,” in Advances in Neural Information Processing Systems, Vol. 33, eds H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Red Hook, NY: Curran Associates Inc.), 21798–21809.
10. Li, Y., Wang, Z., Zhang, H., Liu, S., Chen, X., and Li, J. (2023). “Dynamic negative sample queue for MoCo-based contrastive learning in small-sample scenarios,” in Advances in Neural Information Processing Systems, Vol. 36, eds A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, 45678–45690.
11. Purushwalkam, S., and Gupta, A. (2020). Demystifying contrastive self-supervised learning: invariances, augmentations and dataset biases. Adv. Neural Inf. Process. Syst. 33, 3407–3418. doi: 10.5555/3495724.3496011
12. Raksasat, R., Teerapittayanon, S., Itthipuripat, S., Praditpornsilpa, K., Petchlorlian, A., Chotibut, T., et al. (2023). Attentive pairwise interaction network for AI-assisted clock drawing test assessment of early visuospatial deficits. Sci. Rep. 13:18113. doi: 10.1038/s41598-023-45442-3
13. Si, W., Xu, T., Lin, J., Cao, W., and Zhu, A. (2024). Application status of artificial intelligence technology in screening for geriatric dementia. Biomedical [Preprint]. doi: 10.12201/bmr.202407.00048
14. Sobal, V., Ibrahim, M., Balestriero, R., Cabannes, V., Bouchacourt, D., Astolfi, P., et al. (2024). X-sample contrastive loss: improving contrastive learning with sample similarity graphs. arXiv [Preprint]. doi: 10.48550/arXiv.2407.18134
15. Souillard-Mandar, R., Mitrovic, J., and McAllester, D. (2020). “Contrastive learning with hard negative examples,” in Proceedings of the 37th International Conference on Machine Learning, Vol. 119, eds H. Daumé III and A. Singh, 9080–9089.
16. Tian, Y. (2022). “Understanding deep contrastive learning via coordinate-wise optimization,” in Advances in Neural Information Processing Systems, Vol. 35, eds S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Red Hook, NY: Curran Associates Inc.), 19511–19522.
17. Tóth, L., Hoffmann, I., Gosztolya, G., Vincze, V., Szatloczki, G., Banreti, Z., et al. (2018). A speech recognition-based solution for the automatic detection of mild cognitive impairment from spontaneous speech. Curr. Alzheimer Res. 15, 130–138. doi: 10.2174/1567205014666171121114930
18. van den Oord, A., Li, Y., and Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv [Preprint]. doi: 10.48550/arXiv.1807.03748
19. Wang, F., and Liu, H. (2021). “Understanding the behaviour of contrastive loss,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2495–2504.
20. World Health Organization (2024). Dementia: Key Facts. Geneva: World Health Organization.
21. Wu, J., Chen, J., Wu, J., Shi, W., Wang, X., and He, X. (2023). Understanding contrastive learning via distributionally robust optimization. Adv. Neural Inf. Process. Syst. 36, 23297–23320. doi: 10.48550/arXiv.2310.11048
22. Yeh, C. H., Hong, C. Y., Hsu, Y. C., Liu, T. L., Chen, Y., and LeCun, Y. (2022). Decoupled contrastive learning. Eur. Conf. Comput. Vis. 2, 668–684. doi: 10.1007/978-3-031-19809-0_38
23. Zhang, L. (2023). Applications of computer vision in analysis of the clock-drawing test as a metric of cognitive impairment. arXiv [Preprint]. doi: 10.48550/arXiv.2305.00063
24. Zhang, A., Sheng, L., Cai, Z., Wang, X., and Chua, T. S. (2024). “Empowering collaborative filtering with principled adversarial contrastive loss,” in Advances in Neural Information Processing Systems, Vol. 37, eds A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, et al. (Red Hook, NY: Curran Associates Inc.).
Summary
Keywords
contrastive learning, mild cognitive impairment, MoCo, SimCLR, triplet loss
Citation
Liu N, Sun Q, Xu X, Mou H, Liao X, Rong B and Wang L (2026) An improved contrastive learning loss function for automated clock-drawing test grading with implications for cognitive impairment screening. Front. Comput. Sci. 8:1690044. doi: 10.3389/fcomp.2026.1690044
Received
26 August 2025
Revised
02 January 2026
Accepted
09 February 2026
Published
20 February 2026
Volume
8 - 2026
Edited by
Marcello Pelillo, Ca' Foscari University of Venice, Italy
Reviewed by
Fangda Leng, University of California, San Francisco, United States
Alok Misra, Lovely Professional University, India
Updates
Copyright
© 2026 Liu, Sun, Xu, Mou, Liao, Rong and Wang.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Qian Sun, 222409252025@zust.edu.cn
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.