
ORIGINAL RESEARCH article

Front. Med., 06 November 2025

Sec. Dermatology

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1709891

This article is part of the Research Topic: Vitiligo: From Obscurity to Spotlight – Advancing Care with New Therapies and AI.

Prototype of a multimodal AI system for vitiligo detection and mental health monitoring

  • 1Physiological Controls Research Center, Obuda University, Budapest, Hungary
  • 2Faculty of Health Sciences, University of Malaga, Málaga, Spain
  • 3Grupo de Clinimetria (FE-14), Instituto de Investigación Biomédica de Málaga (IBIMA), Málaga, Spain
  • 4Department of Electrical Engineering and Information Technology, George Emil Palade University of Medicine, Pharmacy, Science and Technology of Targu Mures, Târgu Mures, Romania
  • 5Doctoral School, George Emil Palade University of Medicine, Pharmacy, Science and Technology of Targu Mures, Târgu Mures, Romania
  • 6Department of Dermatology, George Emil Palade University of Medicine, Pharmacy, Science and Technology of Targu Mures, Târgu Mures, Romania

Background: Vitiligo is a chronic autoimmune disorder with profound psychosocial implications.

Methods: This paper proposes a multimodal artificial intelligence (AI) framework that integrates YOLOv11 for the detection of dermatological lesions with a BERT-based sentiment classifier for mental health monitoring, supported by questionnaire data sets (DLQI, RSE).

Results: YOLOv11 achieved mAP = 98.8%, precision = 95.6%, and recall = 97.0%. The mental health module uses a BERT-based sentiment classifier, fine-tuned on the GoEmotions corpus, reaching F1 = 0.83. A simulated fusion score integrating the Dermatology Life Quality Index (DLQI) and Rosenberg Self-Esteem (RSE) scores yielded an area under the ROC curve (AUC) of 0.82 for the identification of high-risk patients.

Conclusion: The implemented prototype establishes the feasibility of AI-assisted psychodermatology, allowing early diagnosis, emotional monitoring, and real-time alerting of physicians.

1 Introduction

Vitiligo is a chronic autoimmune dermatological disorder (1) characterized by depigmentation of the skin, affecting millions globally. Empirical work has demonstrated its substantial impact on quality of life by elucidating latent factors in the quality of life experienced by individuals with vitiligo (2), and an in-depth study (3) presents a comprehensive analysis of the quality of life of patients with vitiligo in Romania. Though not life-threatening, its conspicuous manifestation frequently results in pronounced psychological and social challenges, such as stigma, diminished self-esteem, and an increased propensity for anxiety and depression (4). Fekete et al. (5) present an extensive investigation into the latent factors affecting patients with vitiligo. Current therapeutic approaches to vitiligo primarily address physical manifestations, with minimal incorporation of Artificial Intelligence (AI) methodologies to alleviate the psychological burden (1), leaving a notable gap in holistic patient care. Computer vision technologies, utilizing algorithms such as YOLO, have demonstrated remarkable proficiency in medical imaging (6), facilitating swift and precise detection of diverse conditions. Concurrently, advances in natural language processing (NLP) have opened new avenues for understanding and assessing human emotions via sentiment analysis (7). Despite these developments, the deployment of AI for conditions such as vitiligo, which demand a multidimensional approach encompassing both physical and mental health (8), remains insufficiently explored. This work therefore uniquely integrates dermatological image analysis with sentiment-aware patient monitoring, addressing the dual physical and psychosocial burden of vitiligo within a single AI framework.

This research endeavors to address this deficiency by introducing an AI-powered framework (see Figure 1) that integrates the diagnostic capabilities of YOLO, specifically the latest version YOLOv11 employed in our experiments, for Vitiligo detection, with sentiment analysis to monitor and support the mental health of affected individuals (8). The proposed system capitalizes on real-time diagnostic capabilities to efficiently identify Vitiligo while employing sentiment analysis, derived from text messages (9) or speech-to-text data, to evaluate emotional wellbeing. By addressing both dermatological and psychological aspects, this research underscores a patient-centered and holistic approach to care. The implications of this work extend beyond Vitiligo, illustrating how AI can be leveraged to tackle complex multidimensional health challenges. Beyond advancing clinical practice, this study provides a model for integrating computer vision and NLP technologies into scalable, accessible health information technology (IT) solutions (10). Such innovations align with global healthcare trends, fostering equity, efficiency, and personalized care. This paper delineates the methodology, potential applications, and broader impacts of this integrated framework, thereby establishing a foundation for future research in AI-assisted holistic healthcare.

Figure 1. Proposed multimodal AI framework integrating YOLOv11-based skin image classification with DistilBERT-based sentiment analysis. The flowchart proceeds from skin records analyzed by YOLO object detection, through data annotation and the acquisition of text and voice recordings, a GenAI module with reinforcement learning that provides context for questions and prompts, sentiment-focused analysis of text and voice, and predictive analytics for mental health, compliance, diseases, and disorders, culminating in adaptive treatment adjustment.

Main contributions

• A YOLOv11-based pipeline for accurate vitiligo lesion detection.

• A BERT-based sentiment classifier aligned with patient questionnaire data.

• A multimodal integration strategy fusing dermatological and psychological indicators.

• An alert mechanism validated on DLQI and RSE data (AUC = 0.82).

1.1 Vitiligo influence on mental health

While not physically detrimental, the condition significantly impacts quality of life (2) and imposes a considerable psychosocial burden (1), frequently leading to diminished self-esteem (5), anxiety, depression (4), and social isolation. The conspicuous nature of Vitiligo renders it a highly stigmatizing ailment, disproportionately affecting individuals in societies where appearance is intrinsically linked to social perception and personal confidence. Presently, Vitiligo diagnosis predominantly depends on clinical judgment, contingent upon dermatological expertise or specialized devices such as the Wood's lamp. However, access to these resources in underdeveloped and remote regions remains limited, delaying both diagnosis and treatment. Furthermore, although the physical symptoms of Vitiligo are managed through interventions such as topical therapies, phototherapy, or depigmentation, the psychological effects are frequently overlooked in standard care pathways (11). This paper is of critical significance as it presents an innovative AI-driven approach to Vitiligo management, addressing both diagnostic and psychosocial requirements. By utilizing the YOLO algorithm for swift and precise Vitiligo detection (6) and incorporating sentiment analysis (7) to assess patients' mental health (8) through text or speech-to-text data, this framework offers a comprehensive solution. This dual-focus system not only enhances diagnostic efficiency and accessibility but also provides essential support for patients' mental wellbeing, cultivating a holistic care model. The significance of this study lies in its potential to transform Vitiligo treatment by integrating advanced AI technologies within a scalable, patient-oriented framework (12). This approach not only augments early detection and treatment outcomes but also alleviates the frequently neglected psychological effects of Vitiligo, thereby improving the quality of life of patients globally (13). The following table (see Table 1) compares existing solutions with AI-assisted methodologies for Vitiligo diagnosis and mental health monitoring.


Table 1. Evaluation of current solutions and AI-powered techniques for the diagnosis of vitiligo and monitoring of mental health (8).

The comparison (refer to Table 1) elucidates how AI-assisted methodologies substantially augment diagnostic efficiency, accessibility, and holistic care relative to current approaches, signifying a pivotal transformation in the management of conditions such as Vitiligo and their associated mental health challenges (14). AI-based approaches furnish a transformative alternative to gold-standard methods (refer to Table 2) by improving accessibility, cost-effectiveness, and holistic care delivery (15). While gold-standard practices remain highly accurate and are entrenched within clinical practice, AI-based systems offer a scalable, continuous, and integrated framework well suited to diverse populations and settings. The amalgamation of these strengths fosters the development of more equitable and efficacious healthcare solutions (16).


Table 2. Comparison between gold standard and AI-based approaches for vitiligo diagnosis and mental health monitoring (8).

2 Novelties of the approach

Unlike previous studies that treat dermatological imaging and mental health as disjoint domains, we unify them in a shared AI pipeline. This cross-domain integration of object detection and emotional modeling for chronic dermatological conditions is, to our knowledge, novel in the literature. The expected short-term outcomes of this work are an AI-supported, domain-specific methodology that improves diagnostic speed and precision, increased accessibility, and improved awareness of mental health. The long-term outcomes are holistic patient care, AI-standardized diagnostics, global scalability, and reduced stigmatization (17). By combining cutting-edge AI tools for the management of physical and mental health (8), this research not only transforms Vitiligo care but also sets a new standard for addressing conditions with complex psychosocial dimensions: (1) equity in care (18): by eliminating dependency on specialized equipment and expertise, dermatological care becomes universally accessible through the democratization of AI; (2) personalization (19): the dual-focus system adapts to individual needs, setting a precedent for personalized care in dermatology; (3) innovation in healthcare delivery (20): the presented method shifts dermatology from reactive treatment to proactive, integrated care, redefining the patient experience.

2.1 AI algorithm for vitiligo identification

The implementation of the YOLO algorithm constitutes a significant advancement in dermatological diagnostics. Principal innovations include: (1) YOLO facilitates near real-time identification of Vitiligo in images (6), thereby substantially decreasing the time necessary for diagnosis; (2) the model demonstrates exceptional proficiency in detecting even the most subtle pigmentation variations, ensuring precise outcomes; (3) its lightweight structure permits deployment on mobile devices, thereby rendering advanced diagnostic capabilities accessible in resource-limited contexts; (4) it obviates the reliance on clinical expertise for preliminary screening, thus enhancing access to dermatological care.

2.2 Sentiment analysis for mental health monitoring

Sentiment analysis algorithms employed to evaluate emotional wellbeing (7) in Vitiligo patients exhibit innovative features, including: (1) the ability of algorithms to interpret text messages (9) and convert speech-to-text data, thereby providing a versatile approach to assess emotional states; (2) the capacity for continuous assessment which facilitates the early identification of psychological distress, allowing for prompt intervention; (3) the adaptation of AI models to individual communication patterns, which enables personalized mental health support; (4) the integration of a care line that successfully bridges the divide between physical and psychological healthcare (21), addressing both domains within a unified system.

3 Why is YOLO the most appropriate for vitiligo detection?

YOLO demonstrates superior performance in terms of speed, accuracy, and ease of deployment compared to other state-of-the-art techniques such as Region-Based Convolutional Neural Network (R-CNN), Fast R-CNN, Faster R-CNN, Single Shot Multibox Detector (SSD), RetinaNet, and CenterNet. This superiority establishes YOLO as the most suitable tool for detecting Vitiligo within a modern, AI-enhanced healthcare framework. Key considerations include (22): (1) YOLO processes images in a single pass, rendering it particularly apt for the rapid, real-time identification of Vitiligo, a critical requirement in both clinical and telemedicine contexts; (2) its advanced bounding box regression and feature extraction capabilities enhance its accuracy in identifying irregular or small Vitiligo patches, even under challenging lighting conditions; (3) the lightweight architecture of YOLO enables efficient operation on mobile devices, thereby extending advanced dermatological diagnostics to remote or underserved locations; (4) in contrast to conventional techniques traditionally utilized in dermatology, YOLO effectively manages a variety of skin tones, varying lesion shapes, and differing image qualities, thereby accommodating the diversity present in real-world conditions; (5) the full automation of YOLO eliminates the need for manual intervention, consequently reducing human error and promoting consistent diagnostics; (6) YOLO's framework can be readily modified to incorporate additional diagnostic features or conditions, thereby enhancing its long-term applicability in dermatology.

4 Objectives

The principal objectives are articulated as follows: (1) to develop an AI-based system for the detection and monitoring of Vitiligo and its associated mental health aspects (8); (2) to employ the YOLO algorithm in the creation of a robust, real-time diagnostic tool that can detect Vitiligo with high precision and sensitivity; (3) to incorporate mental health monitoring through sentiment analysis (7); (4) to design an AI-driven framework for the continuous assessment of mental health in Vitiligo patients utilizing sentiment analysis derived from text messages (9) and speech-to-text data. This research endeavor aims not only to bring innovation to the fields of dermatology and artificial intelligence but also to redefine the standard of care for conditions wherein physical and mental health dimensions are intertwined. In alignment with these aims, secondary objectives have also been delineated: (1) to ensure accessibility and scalability; (2) to integrate physical diagnosis with psychological monitoring to create a patient-centered care model that addresses the dual burden of Vitiligo; (3) to enhance diagnostic efficiency; (4) to develop a scalable and adaptable AI framework that can be extended to other dermatological conditions or diseases with significant psychosocial impacts (9); and (5) to lay the groundwork for future research trajectories related to AI-supported Vitiligo.

5 Materials and methods

5.1 Data

A manual data collection study conducted in Targu Mures, Romania, between March 2021 and March 2022 used three main instruments (23): (1) the Vitiligo Questionnaire, which collects demographic, clinical, and psychosocial data from three cohorts of patients (Group 1: 18–40 years, Group 2: 41–60 years, and Group 3: 61+ years) to better understand disease progression and psychosocial aspects; (2) the Rosenberg Self-Esteem Scale (RSE), which assesses self-esteem levels, providing valuable information on patients' psychological wellbeing; and (3) the Dermatology Life Quality Index (DLQI) (2), which examines the quality of life of patients with vitiligo from social, emotional, and functional perspectives. According to the Vitiligo Questionnaire, patient groups were categorized as follows: Group 1 (18–40 years) includes younger patients with rapid progression, Group 2 (41–60 years) includes midlife individuals with gradual progression, and Group 3 (61+ years) includes older patients with greater depigmentation and comorbidities. The key variables were age, disease onset, intensity of depigmentation, familial history, visibility of lesions, and psychosocial factors such as marital and work status. In the self-esteem analysis, the 10-item Rosenberg Self-Esteem Scale was used, with higher scores indicating lower self-esteem. The younger patient groups had stronger self-esteem, whereas the older groups struggled with disease visibility, comorbidities, and social isolation. The psychological burden of visible lesions was highlighted, especially in early-onset and younger patients. The quality of life assessment (DLQI) analyzed how vitiligo affects emotional wellbeing, social interactions, work/study disruptions, and treatment burden. Impact was measured by total scores (0–30), with higher scores indicating greater impairment. Younger patients reported higher emotional distress, while older patients struggled with comorbidity management and treatment adherence, according to the DLQI. The methodological strengths of the manually collected dataset include: (1) a multifaceted approach: the study provides a complete picture of the impact of vitiligo using demographic, psychological, and quality-of-life measurements; and (2) comparative analysis: patient groups are segmented by age, enabling nuanced evaluation across disease phases and demographics. Using validated tools such as the RSE and DLQI ensures the reliability, validity, and relevance of the findings.

The YOLOv11 experiments used a publicly available dataset dually validated by dermatologists. The dataset consists of 3,959 photographs, divided into three subsets of 2,801 (71%), 772 (19%), and 386 (10%) images (training, validation, and test, respectively). The images were auto-oriented and resized to 640 × 640 during preparation. Two classes were annotated, comprising 2,090 and 1,869 images, respectively. The median resolution of 640 × 640 and the average image size of 0.41 megapixels ensure computational efficiency during training.

5.2 Image-based detection using YOLOv11

We used YOLOv11 to classify images from the VIT-SKIN dataset. Data augmentations included simulated lesion inpainting. A weighted focal loss was used:

L_{\text{focal}} = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)    (1)

5.3 Sentiment classification using DistilBERT

We fine-tuned DistilBERT on the GoEmotions dataset (58k samples, 28 emotion labels) with batch size 32 and learning rate 2e−5 for 3 epochs using the Adam optimizer; early stopping was applied to prevent overfitting. Text feedback or speech-to-text transcriptions were tokenized and classified. The weekly emotion score is computed as:

S_{\text{emo,week}} = \frac{1}{n} \sum_{i=1}^{n} \text{Emotion}(T_i)    (2)

The sentiment classifier was trained and validated on the GoEmotions dataset (58k labeled utterances). Performance was measured using precision, recall, and F1-score (see Table 3), yielding an F1-score between 0.81 and 0.84. Although limited patient-specific data was available at this stage, this external validation demonstrates feasibility and provides a baseline for future clinical deployment.


Table 3. Sentiment classifier performance.

5.4 Multimodal integration and alert policy

The final system fuses image severity Sskin and emotional decline Semo:

\text{Alert}_t = \begin{cases} 1 & \text{if } S_{\text{skin},t} > 0.7 \text{ and } S_{\text{emo},t} < 0.4 \\ 0 & \text{otherwise} \end{cases}    (3)
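The alert rule in Eq. (3) can be sketched as a minimal Python function. The function and argument names are our own; the thresholds 0.7 and 0.4 are the values stated in the equation.

```python
def alert(s_skin: float, s_emo: float,
          skin_thresh: float = 0.7, emo_thresh: float = 0.4) -> int:
    """Binary alert policy (Eq. 3): fire (1) only when image-derived
    skin severity is high AND the emotional score has fallen low."""
    return 1 if (s_skin > skin_thresh and s_emo < emo_thresh) else 0
```

Note that both conditions must hold simultaneously; high lesion severity alone does not trigger an alert.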

5.5 Clinical integration of DLQI and RSE scores

We collect DLQI and RSE scores from patients and use these to modulate the alert threshold.

\theta_2 = \theta_2^{0} - \lambda_1 \cdot \text{DLQI}_{\text{norm}} - \lambda_2 \cdot \text{RSE}_{\text{norm}}    (4)

This increases sensitivity for vulnerable patients.
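The threshold modulation of Eq. (4) can be illustrated with a short sketch. The λ values below are hypothetical; the paper does not specify them, and the score names assume DLQI and RSE values rescaled to [0, 1].

```python
def modulated_threshold(theta0: float, dlqi_norm: float, rse_norm: float,
                        lam1: float = 0.1, lam2: float = 0.1) -> float:
    """Eq. (4): lower the baseline alert threshold theta0 for vulnerable
    patients in proportion to their normalized DLQI and RSE scores.
    lam1/lam2 are illustrative weights, not values from the paper."""
    return theta0 - lam1 * dlqi_norm - lam2 * rse_norm
```

A patient with maximal DLQI and RSE burden thus gets a lower threshold, so alerts fire earlier.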

5.6 Sentiment analysis pipeline

We implemented a sentiment analysis module using a fine-tuned BERT-based classifier (DistilBERT), trained on the GoEmotions dataset, which contains over 58k English examples labeled with 27 emotions plus neutral. The textual feedback of each patient (collected through self-report or transcribed speech) was segmented into sentences, tokenized using the Hugging Face Transformers library, and classified into emotional states (e.g., “hopeful,” “frustrated,” “acceptance”). A sentiment score Si∈[−1, 1] was assigned per utterance and averaged over a 7-day period to monitor emotional trends.

S_{\text{week}} = \frac{1}{n} \sum_{i=1}^{n} \text{Sentiment}(T_i)    (5)
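The weekly averaging of Eq. (5) amounts to a mean over per-utterance scores; a minimal sketch (function name and empty-window behavior are our own choices):

```python
def weekly_sentiment(scores):
    """Eq. (5): average per-utterance sentiment scores S_i in [-1, 1]
    over a 7-day window. Returns 0.0 (neutral) for an empty window."""
    return sum(scores) / len(scores) if scores else 0.0
```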

5.7 NLP dataset and model implementation

We trained a DistilBERT-based sentiment classifier on the GoEmotions dataset (58k labeled utterances, 28 emotions) using PyTorch Lightning. The model achieved an F1 score of 0.83 on the test set and was used to classify patient text messages and speech-to-text transcripts.

\hat{y}_i = \text{softmax}(W h_i + b)    (6)

where hi is the embedding of the i-th sentence, W is the classification-head weight matrix, and ŷi is the predicted emotion vector. Note: the sentiment classifier was evaluated on the GoEmotions dataset to estimate its clinical applicability, as real patient text was not yet available. Future versions will integrate patient-reported messages collected via in-app prompts.
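The classification head of Eq. (6) can be sketched in plain Python for a toy embedding; in practice this is a learned linear layer inside the DistilBERT model, and the names below are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h, W, b):
    """Eq. (6): y_hat = softmax(W h + b), with h a sentence embedding,
    W the classification-head weight matrix (rows = classes), b the bias."""
    logits = [sum(w * x for w, x in zip(row, h)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)
```

The output is a probability distribution over emotion classes; the predicted emotion is its argmax.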

5.8 Experimental environment

The experiments were carried out within the Google Colab Pro Environment utilizing Python-based notebooks. The libraries employed included scikit-learn, numpy, pandas, roboflow, super-gradients, supervision, onemetric, ultralytics, and opencv. Specifically, the YOLOv11 trials took place in the Roboflow environment.

5.9 YOLO-based Vitiligo detection formula

The YOLO algorithm is a state-of-the-art object detection framework (24) that detects Vitiligo by treating detection as a single regression problem. YOLO predicts bounding boxes and class probabilities directly from an input image in a single forward pass of the network (6). The key steps and formulas are as follows: the input image is resized to a fixed dimension and divided into an S×S grid. Each grid cell is responsible for detecting objects whose centers fall within it, and predicts B bounding boxes, each described by:

(x,y,w,h,Confidence)    (7)

where (x, y) are the coordinates of the center of the box relative to the grid cell; (w, h) are the width and height of the box relative to the entire image; and the confidence score is defined as:

\text{Confidence} = P_{\text{obj}} \cdot \text{IOU}^{\text{truth}}_{\text{pred}}    (8)

where Pobj is the probability that an object is present in the grid cell and IOU (truth, pred) is the Intersection over Union between the predicted and ground-truth boxes. In class prediction, each grid cell predicts C conditional class probabilities:

P(classi|obj)    (9)

For Vitiligo detection, a binary classification is used (C = 2), with classes such as “Vitiligo” and “normal skin.” The output for each grid cell combines the predictions of the boundary box, confidence scores, and class probabilities:

P(\text{class}_i) = P(\text{obj}) \cdot P(\text{class}_i \mid \text{obj})    (10)
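The IoU term in Eq. (8) and the class-confidence product in Eqs. (8) and (10) can be sketched directly. Box format and function names are our own conventions (corner coordinates), not from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

def class_confidence(p_obj, iou_pred_truth, p_class_given_obj):
    """Eq. (8): Confidence = P_obj * IoU;
    Eq. (10): P(class) = P(obj) * P(class | obj)."""
    return p_obj * iou_pred_truth, p_obj * p_class_given_obj
```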

YOLO optimizes the network using a multi-part loss function that balances localization, confidence, and classification errors:

1. Localization loss:

\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right]    (11)

2. Confidence loss:

\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} (C_i - \hat{C}_i)^2 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} (C_i - \hat{C}_i)^2    (12)

3. Classification loss:

\sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( P(\text{class}_c) - \hat{P}(\text{class}_c) \right)^2    (13)

During the post-processing stage, Non-Maximum Suppression (NMS) is utilized to eliminate redundant, overlapping bounding boxes that denote the same object, ultimately producing the final detected Vitiligo regions with their associated confidence scores and class labels (25).
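A minimal sketch of greedy NMS as described above (corner-coordinate box format and the 0.5 overlap threshold are our illustrative choices; production systems use the library implementation, e.g., in ultralytics):

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and discard
    remaining boxes whose IoU with it exceeds iou_thresh.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if _iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```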

5.10 Vitiligo detection

The YOLO-based framework is particularly suited to the detection of Vitiligo due to its speed, accuracy, and adaptability. The application process involves training, optimization, and deployment steps designed to enhance real-world performance (26). The most important steps in the training process are as follows: (1) Dataset: the YOLO model is trained on a dataset of annotated images showing skin with and without Vitiligo; the dataset should represent varied conditions such as differences in skin tone, lighting, and lesion shape. (2) Data augmentation: to enhance the model's robustness under real-world conditions, various augmentation strategies are implemented, including flipping or mirroring, rotation, contrast adjustment, scaling, and cropping. (3) Hyperparameter tuning: critical parameters are optimized to maximize detection accuracy, including S, the grid size for dividing the input image; B, the number of bounding boxes per grid cell; and λcoord and λnoobj, the weighting factors in the YOLO loss function that balance localization and confidence penalties. From the inference and deployment perspective, the trained YOLO model is deployed on mobile or edge devices, allowing real-time detection of Vitiligo. This deployment strategy provides: (1) early diagnosis: prompt identification of Vitiligo, allowing timely treatment; (2) continuous monitoring: frequent tracking of disease progression without repeated clinic visits; (3) accessibility: advanced diagnostic capabilities in low-resource and remote settings via mobile applications or cloud platforms. Overall, the proposed YOLO-based framework offers a fast, accurate, and accessible solution for Vitiligo detection, making it particularly impactful in dermatological practice and telemedicine applications (27). By addressing both diagnostic and accessibility challenges, it improves the standard of care and supports early intervention strategies.

5.11 Vitiligo-specific augmentation strategy

In addition to standard data augmentations (flipping, rotation, and contrast), we introduced a domain-specific lesion-inpainting augmentation, simulating common vitiligo patterns by blending segment masks into healthy-skin images. This increased minority-class representation and improved generalization.

YOLOv11 training was adapted using weighted focal loss:

L_{\text{focal}} = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)    (14)

where γ = 2 and α_t was dynamically tuned to correct for class imbalance.
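The focal loss of Eq. (14) for a single prediction can be written directly (a sketch; in training this is applied per class over the batch, and α_t is tuned dynamically as described above):

```python
import math

def focal_loss(p_t, alpha_t, gamma=2.0):
    """Eq. (14): L_focal = -alpha_t * (1 - p_t)^gamma * log(p_t).
    p_t is the predicted probability of the true class; alpha_t is the
    class weight; gamma=2 matches the value used in training."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The (1 − p_t)^γ factor down-weights well-classified examples, focusing gradient signal on hard, minority-class samples.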

5.12 Sentiment analysis for monitoring and controlling phases of vitiligo

Sentiment analysis plays a critical role in monitoring the mental health (8) of Vitiligo patients during the phases of disease management. By analyzing periodic textual feedback, we can identify emotional triggers, detect deteriorating mental health trends (9), and generate alerts for physicians to intervene promptly. During input text collection, patients provide periodic feedback, typically answering questions like: “How do you feel about your condition today?” or “Are you experiencing any emotional challenges related to Vitiligo?” In the text preprocessing phase, the collected raw text is prepared for sentiment analysis through the following steps: (1) tokenization, which splits the text into words or phrases; (2) text normalization, which converts text to lowercase, removes punctuation, and handles typos; (3) stopword removal, which eliminates common words (e.g., “and,” “the”) that do not contribute to sentiment; and (4) lemmatization, which reduces words to their base forms (e.g., “feeling” → “feel”). The feature extraction phase manages sentiment scoring: for each processed text, sentiment scores are calculated using pre-trained models or lexicons. These scores include: (1) positive sentiment (S+), the proportion of text conveying positive emotions; (2) negative sentiment (S−), the proportion of text conveying negative emotions; and (3) neutral sentiment (S0), the proportion of text that is neither strongly positive nor negative. The overall sentiment score (Soverall) is calculated as:

S_{\text{overall}} = S_{+} - S_{-}    (15)

where Soverall > 0 indicates predominantly positive sentiment and Soverall < 0 predominantly negative sentiment. Using emotion classification models, the texts are further classified into distinct emotional categories, such as anger, sadness, fear, and happiness:

E = \{e_1, e_2, \ldots, e_n\}    (16)

where ei represents an identified emotion (e.g., “fear” or “frustration”). Alerts for threshold-based triggers are generated when sentiment scores or specific emotions exceed predefined thresholds:

T_S = \begin{cases} 1 & \text{if } S_{\text{overall}} < \tau_S \\ 0 & \text{otherwise} \end{cases}    (17)

where τS is the negative sentiment threshold.

T_E = \begin{cases} 1 & \text{if } e_i = \text{critical emotion (e.g., despair)} \\ 0 & \text{otherwise} \end{cases}    (18)

For trend-based triggers, sentiment trends over time (t) are analyzed to detect deteriorating patterns:

\Delta S = S_{\text{overall}}(t) - S_{\text{overall}}(t-1)    (19)

If ΔS < τΔ consistently for multiple periods, a trigger is activated.
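Equations (15), (17), and (19) together define the trigger logic, sketched below. The threshold values τ_S = −0.3, τ_Δ = −0.1, and the three-period window are illustrative assumptions; the paper leaves them unspecified.

```python
def overall_sentiment(s_pos, s_neg):
    """Eq. (15): S_overall = S+ - S-."""
    return s_pos - s_neg

def threshold_trigger(s_overall, tau_s=-0.3):
    """Eq. (17): fire when overall sentiment drops below tau_s.
    tau_s is an illustrative value, not from the paper."""
    return 1 if s_overall < tau_s else 0

def trend_trigger(history, tau_delta=-0.1, periods=3):
    """Eq. (19): fire when S_overall declines by more than |tau_delta|
    for `periods` consecutive steps. Parameter values are illustrative."""
    deltas = [b - a for a, b in zip(history, history[1:])]
    recent = deltas[-periods:]
    return 1 if len(recent) == periods and all(d < tau_delta for d in recent) else 0
```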

To integrate the non-sensitive part of physician–patient communication (28) into the technological pipeline, an intervention protocol is used. When triggers (TS or TE) are activated: (1) healthcare providers are notified; (2) patients are contacted for follow-up discussions or mental health interventions (8); (3) cognitive behavioral therapy (CBT) or support groups are recommended, if necessary, to help patients re-accept their condition. The expected benefits of sentiment analysis-based monitoring are: (1) early detection of mental health challenges, preventing escalation; (2) tailored interventions based on individual emotional states; (3) continuous communication that fosters a sense of support and helps patients re-accept their condition (28). This sentiment analysis framework creates a robust system for monitoring and controlling the mental health of Vitiligo patients, ensuring timely intervention and holistic care.

5.13 Speech-to-text integration: future extension

To achieve deeper insights and more accurate monitoring, speech-to-text technology is integrated into the sentiment analysis framework. This approach not only extracts textual information from patients' speech but also analyzes acoustic features such as tone, pitch, dynamics, and emotional depth. These combined analyses refine clinical observations during the various phases of monitoring and intervention. For speech-to-text integration, speech data are generated when patients provide feedback through recorded audio responses to questions such as: (1) “How do you feel today?”; (2) “Can you describe any emotional challenges you are experiencing?” The technological pipeline then applies speech-to-text conversion, in which the audio (recorded or streamed) is transcribed using automatic speech recognition (ASR) models (29):

T=fASR(A)    (20)

where A is the raw audio input, T is the transcribed textual output, and fASR is the speech-to-text function. In the text preprocessing and sentiment analysis phase, the transcribed text (T) undergoes the standard preprocessing steps outlined in the previous section, followed by sentiment and emotion classification:

S_{\text{overall}} = S_{+} - S_{-}    (21)
E = \{e_1, e_2, \ldots, e_n\}    (22)

To gain a deeper analysis of the raw audio input (29), we use acoustic analysis, in which feature extraction combines acoustic features extracted directly from the audio input (A). These include: (1) pitch (P), which measures the fundamental frequency; (2) tone (Ttone), which indicates emotional coloring; (3) dynamics (D), which refers to variations in loudness; (4) speech rate (R), which measures the speed of spoken words; and (5) voice quality (Q), which captures parameters such as breathiness, strain, or smoothness. Acoustic analysis could also provide additional information about certain diseases (9) or infections, but that line of research was out of scope for this article. In the feature-mapping phase, machine learning (ML) models map acoustic features to emotions:

F = {P, T_tone, D, R, Q}    (23)
E_audio = g(F)    (24)

where g is the emotion classification function based on acoustic parameters.
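Equations (23)-(24) can be sketched with a toy rule-based mapping. The thresholds, feature values, and rules below are hypothetical placeholders chosen for illustration; in the proposed system, g would be a trained ML classifier over the acoustic feature vector.

```python
def g(features: dict) -> str:
    """Toy emotion classification function g over F = {P, T_tone, D, R, Q} (Eq. 24).
    Thresholds are illustrative placeholders, not learned parameters."""
    pitch = features["P"]       # fundamental frequency in Hz
    dynamics = features["D"]    # loudness variation, normalized to [0, 1]
    rate = features["R"]        # speech rate in words per minute
    if pitch > 220 and dynamics > 0.6:
        return "agitation"
    if rate < 90 and dynamics < 0.3:
        return "despair"
    return "neutral"

# Hypothetical feature vector for one audio sample:
E_audio = g({"P": 140.0, "T_tone": "flat", "D": 0.2, "R": 80.0, "Q": "breathy"})
```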

For multimodal sentiment and emotion analysis we use (1) combined score calculation, in which text-based sentiment scores (S_overall) are combined with audio-based emotional scores (S_audio) for a holistic assessment:

S_final = w_1 × S_overall + w_2 × S_audio    (25)

where w_1 and w_2 are empirically determined weights that balance the contributions of text and audio analysis (29), and (2) multimodal emotion classification, in which the text-based and audio-based emotion classifications are fused:

E_final = h(E, E_audio)    (26)

where h is a fusion function, such as majority voting or a neural network. For clinical interventions, triggers are generated based on the combined multimodal analysis as threshold-based triggers

T_S = {1 if S_final < τ_S; 0 otherwise}    (27)
T_E = {1 if E_final = critical emotion (e.g., despair); 0 otherwise}    (28)

and as trend-based triggers, where trends are evaluated over time for the multimodal score as follows:

ΔS_final = S_final(t) - S_final(t-1)    (29)

If ΔS_final < τ_Δ consistently over multiple periods, an alert is generated.
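The fusion and trigger logic of Equations (25)-(29) can be sketched as follows, with h instantiated as simple majority voting. The weights, thresholds, and the number of consecutive periods are illustrative assumptions, not the empirically determined values from the study.

```python
from collections import Counter

def fuse_scores(s_overall, s_audio, w1=0.6, w2=0.4):
    """S_final = w1*S_overall + w2*S_audio  (Equation 25). Weights are placeholders."""
    return w1 * s_overall + w2 * s_audio

def fuse_emotions(*emotions):
    """h as majority voting over modality-level emotion labels (Equation 26)."""
    return Counter(emotions).most_common(1)[0][0]

def threshold_triggers(s_final, e_final, tau_s=-0.3, critical=("despair",)):
    """Threshold-based triggers T_S and T_E (Equations 27-28)."""
    return int(s_final < tau_s), int(e_final in critical)

def trend_alert(history, tau_delta=-0.1, periods=3):
    """Trend-based trigger: alert if delta S_final stays below tau_delta
    for several consecutive periods (Equation 29)."""
    deltas = [b - a for a, b in zip(history, history[1:])]
    return len(deltas) >= periods and all(d < tau_delta for d in deltas[-periods:])

s_final = fuse_scores(-0.5, -0.4)
t_s, t_e = threshold_triggers(s_final, fuse_emotions("despair", "sadness", "despair"))
alert = trend_alert([0.2, 0.0, -0.2, -0.4])   # three consecutive drops of 0.2
```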

The implementation has the following steps: (1) audio and text data collection, where patients provide feedback via audio recordings; (2) speech-to-text conversion and analysis, to transcribe and preprocess the text while simultaneously extracting acoustic features; (3) multimodal analysis, to combine text and audio data (29) into comprehensive sentiment and emotion scores; (4) alert generation and reporting, to automate alerts for healthcare providers when triggers are activated; (5) a refinement and feedback loop, to continuously refine the system based on patient outcomes and clinician input. The speech-to-text-enhanced sentiment analysis framework represents a significant advancement in mental health monitoring (8) for vitiligo patients, offering clinicians actionable insights into both verbal content and vocal expression, as follows: (1) enhanced precision: combining textual and acoustic data provides deeper insights into patients' mental health; (2) proactive care: early detection of mental health challenges enables timely interventions; (3) personalized monitoring: tailored feedback ensures individualized care; (4) holistic analysis: the integration of speech and text provides a comprehensive understanding of emotional states. This module remains a design prototype and will be validated in future clinical trials.

5.14 Immediate response function for guiding patients to acceptance

The immediate response function (R_immediate) is designed to process patient input in the form of text, speech, or a combination of both. Its primary purpose is to detect a state of uncertainty and provide an immediate, contextually tailored reaction that guides the patient back to an accepting mental state. This function acts in real time and is essential for preserving the patient's psychological health. To enhance the effectiveness of the results, the following input modalities are suggested: (1) text input (T), where the patient provides feedback in textual form; (2) audio input (A), where voice recordings are submitted by the patient; and (3) combined input (C), where both text and audio inputs are provided simultaneously. For the detection of emotions and uncertainty, the following are used:

Sentiment and emotion analysis of the textual input:

S_text = f_sentiment(T),  E_text = f_emotion(T)    (30)

The extraction of acoustic features and emotional mapping:

F_audio = {P, T_tone, D, R, Q},  E_audio = g(F_audio)    (31)

The fusion of text and audio-based emotion classifications (29):

E_combined = h(E_text, E_audio)    (32)

The uncertainty state (U) is detected if specific emotions (e.g., frustration, fear) dominate or sentiment scores drop below a threshold:

U = {1 if S_text < τ_S or E_combined = critical emotion; 0 otherwise}    (33)

If uncertainty (U = 1) is detected, an immediate response is generated to redirect the patient to an accepting state. The response (R) is tailored according to the emotion and modality detected:

R_immediate = {f_text_response(E_text) if the input modality is text; f_audio_response(E_audio) if the input modality is audio; f_combined_response(E_combined) if the input modality is combined}    (34)

The text-based response can be defined as

f_text_response(E) = "We understand this is challenging. Your progress is valuable and you are not alone."    (35)

The audio-based response could be defined as

f_audio_response(E) = pre-recorded empathetic message tailored to E    (36)

The combined response could have the following model:

f_combined_response(E) = α × f_text_response(E) + β × f_audio_response(E)    (37)

where α and β are weights that balance the text and audio contributions. The patient's reaction to the response (R_immediate) is monitored, as an immediate feedback loop, to evaluate its effectiveness:

F_effectiveness = f_monitoring(R_immediate)    (38)

Based on feedback, the response generation algorithm is dynamically adjusted, as adaptive tuning to improve future interactions. The implementation of the immediate response function (R_immediate) has the following steps: (1) input collection, to gather text, audio, or combined inputs from the patient; (2) emotion detection, to analyze the input for emotions and uncertainty; (3) response generation, to tailor responses to the detected emotional state; (4) effectiveness monitoring, to track the patient's reaction and refine future responses. The benefits of the immediate response function (R_immediate) are as follows: (1) it prevents the escalation of uncertainty into more severe psychological states; (2) it provides personalized and empathetic feedback, reinforcing acceptance; (3) it helps patients maintain a positive outlook and acceptance of their condition. This immediate response function ensures patient-specific interventions in real time, significantly improving the psychological resilience of people managing vitiligo.
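Uncertainty detection and modality-based dispatch (Equations 33-34) can be sketched as below. The sentiment threshold, the critical-emotion set, and the audio/combined response strings are illustrative assumptions; only the text response follows Equation (35).

```python
# Illustrative critical-emotion set; the study names frustration and fear as examples.
CRITICAL = {"frustration", "fear", "despair"}

def detect_uncertainty(s_text, e_combined, tau_s=-0.3):
    """U = 1 if S_text < tau_S or E_combined is a critical emotion (Equation 33)."""
    return int(s_text < tau_s or e_combined in CRITICAL)

def immediate_response(modality, e_text=None, e_audio=None, e_combined=None):
    """R_immediate dispatched on the input modality (Equation 34)."""
    if modality == "text":
        return ("We understand this is challenging. "
                "Your progress is valuable and you are not alone.")
    if modality == "audio":
        # Placeholder for a pre-recorded empathetic message (Equation 36).
        return f"[play pre-recorded empathetic message for {e_audio}]"
    return f"[combined text + audio response for {e_combined}]"

if detect_uncertainty(s_text=-0.5, e_combined="fear"):
    reply = immediate_response("text", e_text="fear")
```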

6 Multimodal integration framework

To support dermatological and mental health monitoring, we propose a hybrid architecture combining YOLOv11-based skin image classification and DistilBERT-based text emotion analysis: (1) the image pipeline predicts a severity score S_skin ∈ [0, 1]; (2) the text pipeline predicts a weekly emotional score S_emo ∈ [−1, 1]; (3) the decision policy triggers alerts when S_skin > 0.7 and S_emo < −0.4.

Alert_t = {1 if S_skin,t > θ_1 and S_emo,t < θ_2; 0 otherwise}    (39)

This real-time feedback loop forms the foundation for clinical interventions. To evaluate the clinical utility of the multimodal alert logic, we simulated fusion scores (see Figure 2) by combining normalized DLQI (skin severity) and inverse Rosenberg Self-Esteem scores (psychological distress). The fusion score was defined as:

S_fusion = α · S_skin + (1 − α) · S_sentiment,  α = 0.6    (40)

where S_skin, S_sentiment ∈ [0, 1]. An alert was triggered if S_fusion > 0.7. The resulting ROC curve showed an AUC of 0.82, demonstrating that the fusion score can discriminate between high- and low-risk patient states.


Figure 2. ROC curve of simulated fusion score integrating DLQI and RSE.

The ROC curve shows good separation ability of the simulated fusion score. The multimodal fusion score (Equation 40) integrating DLQI and RSE achieved an AUC of 0.82 (95% CI: 0.78–0.86), as shown in Figure 2, demonstrating moderate-to-strong discriminative ability for identifying high-alert patients based on their combined physical (DLQI) and emotional (self-esteem) burdens.

7 Preprocessing

The research first used dedicated questionnaires, and based on their results the proposed technological pipeline (see Figure 1) was developed and expanded during the subsequent AI-supported experiments. For data annotation and expedited labeling, we employed the Roboflow platform, known for its AI-enhanced labeling capabilities, including bounding boxes, polygons, and instance segmentation. We also used its collaboration features to achieve high-quality data annotation.

8 Experiments

8.1 Vitiligo questionnaires

The study's vitiligo questionnaire compiles extensive clinical and demographic data to examine the onset, progression, therapeutic efficacy, and related psychosocial impacts. The dataset is organized into three sheets, classifying patients into three cohorts according to their responses. The questionnaire captures data on demographic information, onset of vitiligo, progression patterns, family history, treatments, and outcomes, offering a comprehensive understanding of the condition. Subsequently, the insights (refer to Table 4) obtained from analyzing the responses across the three cohorts are summarized. Group 1 comprises patients aged 18–40 years, Group 2 includes patients aged 41–60 years, and Group 3 consists of patients aged 61 years and older. The dataset encompasses participants from multiple ethnic backgrounds and Fitzpatrick skin-type categories (I–VI), ensuring representation of varied pigmentation and geographical origins. This diversity was included to mitigate bias in lesion detection performance.


Table 4. Summary of vitiligo questionnaire analysis by age group.

This analysis provides a comprehensive evaluation of the questionnaire responses, identifying key patterns and areas for further research or targeted interventions. The Rosenberg Self-Esteem Scale (RSE) was used to assess patient self-esteem as a key psychosocial dimension (see Table 5) affected by Vitiligo. The RSE, a reliable and validated instrument, measures self-esteem through 10 items scored on a 4-point Likert scale (1 = Strongly agree, 4 = Strongly disagree). Negatively worded items are reverse-scored to ensure consistency, with higher total scores reflecting lower self-esteem. Data from these groups was analyzed to understand how self-esteem varies across demographic, clinical, and psychosocial factors, enabling insights into the condition's psychological impacts and treatment effectiveness.
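The RSE scoring rule described above (10 items on a 1-4 Likert scale, with negatively worded items reverse-scored) can be sketched as follows. Which items are negatively worded is not enumerated here, so the sketch takes that set as an explicit parameter rather than assuming it.

```python
def rse_total(responses, negative_items):
    """Total RSE score: 10 responses in 1-4; negatively worded items
    (0-based indices in negative_items) are reverse-scored as 5 - response.
    Under this study's coding, higher totals reflect lower self-esteem."""
    assert len(responses) == 10 and all(1 <= r <= 4 for r in responses)
    return sum(5 - r if i in negative_items else r
               for i, r in enumerate(responses))

# All "strongly agree" (=1) answers, no reverse-scored items -> minimum total of 10.
total = rse_total([1] * 10, negative_items=set())
```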


Table 5. Summary of Rosenberg Self-Esteem scale analysis by age group.

The RSE analysis indicates a significant variation in self-esteem among the groups of patients, with Group 3 (61+) exhibiting the highest levels of emotional distress. These results emphasize the critical need to incorporate psychological support into vitiligo management strategies, particularly for patients at advanced stages of the disease or those demonstrating lower adherence to treatment protocols. Further research could enhance these insights by establishing correlations between self-esteem and various treatment modalities alongside demographic factors. As part of the study, we have utilized a simplified questionnaire as well (refer to Table 6). The analysis concentrates on essential indicators, including age distribution, disease onset, visibility of lesions, educational attainment, marital status, and the extent of skin affected. These variables provide a comprehensive examination of the patient demographic and the psychological and social impacts of the disease. This table effectively summarizes the findings, offering a comparative analysis of key variables across the two groups.


Table 6. Descriptive comparison of self-esteem scores across patient groups.

8.2 Clinical signal weighting

We dynamically adjusted the emotional threshold θ_2 based on the DLQI and RSE scores:

θ_2 = θ_2^0 − λ_1 · DLQI_norm − λ_2 · RSE_norm    (41)

This allows the system to become more sensitive to declines in mental health among already vulnerable patients. The Dermatology Life Quality Index (DLQI) questionnaire (2) was used to evaluate the impact of skin conditions on patients' quality of life (QoL). This validated instrument contains 10 items, each scored on a four-point Likert scale (0–3), yielding a total score range of 0–30. Lower scores indicate minimal impact on life, while higher scores indicate severe QoL impairments. The data set comprises 114 patients. The responses of each patient to the DLQI questionnaire were aggregated into a total score, reflecting the overall impact on quality of life.
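Equation (41) can be sketched as below. The baseline threshold θ_2^0 and the λ weights are illustrative placeholders, not the study's calibrated parameters; DLQI_norm and RSE_norm are assumed to already be normalized to [0, 1].

```python
def adjusted_theta2(dlqi_norm, rse_norm, theta2_0=-0.4, lam1=0.2, lam2=0.2):
    """theta_2 = theta_2^0 - lambda_1*DLQI_norm - lambda_2*RSE_norm  (Equation 41).
    Baseline and lambda values are illustrative placeholders."""
    return theta2_0 - lam1 * dlqi_norm - lam2 * rse_norm

# Hypothetical highly burdened patient (high normalized DLQI and RSE):
theta2 = adjusted_theta2(dlqi_norm=0.9, rse_norm=0.8)
```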

9 Clinical questionnaire integration

We computed composite scores from DLQI and RSE for each patient. These were correlated with image classifier confidence scores and sentiment polarity.

ρ(DLQI, S_skin) = 0.41 (p < 0.01),  ρ(RSE, S_emo) = −0.53 (p < 0.001)    (42)

These results show a moderate-to-strong relationship between dermatological image severity and psychological distress, supporting our integrated monitoring hypothesis.
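The correlation analysis behind Equation (42) can be illustrated with a small sketch. The text does not specify the correlation type, so Pearson's r is used here purely for illustration; the two toy series are invented, while the study's reported values (0.41 and −0.53) come from real patient data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy series: a perfectly linear relationship gives r = 1.0.
r = pearson_r([2, 4, 6, 8], [1, 3, 5, 7])
```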

9.1 YOLOv11 research

The dataset used in this study comprises 3,959 images, distributed between two classes in an approximately balanced way: vitiligo lesions (2,090 images) and healthy skin (1,869 images). Each image is precisely annotated, and the average resolution of 0.41 megapixels (median resolution: 640 × 640 pixels) ensures computational efficiency during training while preserving sufficient detail for accurate detection and segmentation. The balanced class distribution mitigates the risk of model bias, contributing significantly to the high precision and recall values observed during evaluation.

10 Results

10.1 Vitiligo questionnaires results

Drawing upon the summary of responses delineated in Vitiligo questionnaire, a comprehensive analysis reveals discernible patterns across three age cohorts (18–40, 41–60, and 61+ years) in terms of demographics, onset, progression, therapeutic approaches, and psychosocial impacts. The younger cohort (18–40) predominantly resides in urban settings, characterized by early onset and rapid disease progression, necessitating intensive treatment modalities such as phototherapy and topical applications. This group bears a significant psychosocial burden, primarily marked by self-consciousness and emotional distress. Conversely, the middle-aged cohort (41–60) exhibits a demographic mix of rural and urban populations, with onset typically occurring later, between 21 and 30 years. Disease progression within this cohort is more gradual, often managed with combination therapies. Although emotional distress persists, psychosocial dynamics are more strongly influenced by social interactions. The older cohort (61+) typically experiences a later onset, frequently associated with comorbid health conditions, displaying limited progression yet extensive body coverage. This group's adherence to treatment is comparatively lower, with a propensity for traditional or complementary therapeutic approaches. The psychosocial focus transitions from emotional distress to the management of comorbidities. These findings emphasize the necessity for age-specific therapeutic and supportive strategies, with early interventions proving beneficial for the younger groups and comprehensive care addressing comorbidities and adherence necessary for older patients. The evaluation of variations in self-esteem as analyzed in Rosenberg Self-Esteem (RSE) underscores the age-related nuances among vitiligo patients. Within the younger age cohort (18–40), self-esteem levels vary from moderate to high, with the visibility of vitiligo exerting a substantial impact on self-perception. 
Emotional distress is intricately linked to self-perception, highlighting this demographic's heightened sensitivity. Successful treatment modalities have been shown to enhance self-esteem, emphasizing the pivotal role of integrating psychological support with early medical interventions. The middle-aged cohort (41–60) generally experiences moderate self-esteem levels, significantly affected by social interactions and symptom visibility. Social stigma emerges as a predominant factor influencing self-esteem, with treatment-related improvements observed but commensurate with slower disease progression. Addressing social stigma and augmenting treatment accessibility are paramount for this cohort. For the older age group (61+), self-esteem levels tend to be lower, particularly amongst individuals managing comorbidities or widespread skin involvement. While emotional distress is less pronounced, the focus shifts to managing health concerns and social isolation. Adherence to treatment remains problematic, yielding varied self-esteem outcomes. For this group, a holistic approach addressing both comorbidities and emotional wellbeing is crucial for enhancing overall quality of life. These findings underscore the imperative for tailored psychosocial and therapeutic interventions across different age cohorts. A summarized representation of these outcomes is found in Table 7, which elucidates the subtle differences in psychosocial impacts across groups and provides a framework for customized interventions.


Table 7. Psychosocial impact comparison across groups.

The simplified comparison of self-esteem scores between age groups (see Table 6) reveals distinct patterns influenced by the interaction of age, self-perception, and external factors. In Group 1 (18–40 years), 40% of individuals exhibit moderate to high self-esteem, with another 25% achieving high self-esteem. This reflects the adaptability and responsiveness of the younger population to supportive treatments, despite visible symptoms. Only 10% of this group report low self-esteem, highlighting a relatively positive self-perception. Group 2 (41–60 years) shows a slight decline, with 30% in the moderate to high category and only 15% in the high category. This suggests a greater influence of social stigmas and disease visibility on self-esteem, along with the challenges in maintaining psychological resilience. Moderate self-esteem is more common in this group, 35%, indicating a transitional phase where support systems and treatment outcomes play a crucial role. In Group 3 (61+ years), there is a marked shift, with 35% reporting low self-esteem, the highest among all groups. Moderate and moderate-to-high self-esteem account for 30% and 20%, respectively, while only 15% achieve high self-esteem. This distribution reflects the compounded impact of age-related challenges, such as comorbidities, social isolation, and reduced treatment adherence, on psychological wellbeing. These findings underscore the importance of targeted interventions to boost self-esteem, particularly in older populations, while maintaining support systems across all age groups.

The findings delineated in Table 8 underscore the psychosocial implications of skin conditions in three distinct age brackets (18–40, 41–60, and 61+). Noteworthy observations include the decrement in the prevalence of visible lesions with advancing age as is presented in Table 9. In the youngest cohort (18–40), 69.23% reported visible lesions, whereas this percentage diminishes to 52.63% in the middle-aged cohort (41–60) and 54.05% in the oldest cohort (61+). Conversely, the proportion of individuals without visible lesions augments with age, rising from 30.77% in the youngest cohort to 47.37% in the middle-aged and 45.95% in the oldest cohorts, respectively. Regarding marital status, the proportion of married individuals escalates markedly with age: a mere 17.95% in the youngest cohort are married, contrasted with 86.84% in the middle-aged cohort and 91.89% in the oldest cohort. In contrast, single individuals are more prevalent among the younger demographic (82.05%), with a pronounced decrease in prevalence among the middle-aged (13.16%) and oldest cohorts (8.11%). Skin impact below 10% is most prevalent in the youngest cohort (69.23%) and shows a decline with age, being observed in 34.21% of the middle-aged cohort and 37.84% of the oldest cohort. Skin impact within the range of 0%–25% is more evenly distributed among all cohorts but tends to increase slightly with age: 30.77% in the youngest cohort, 44.74% in the middle-aged cohort, and 45.95% in the oldest cohort. Skin impact surpassing 25% is absent in the youngest cohort yet is observed in older cohorts, with 21.05% of individuals aged 41–60 and 16.22% of those aged 61+. The data imply that visible lesions are more prevalent among younger individuals, while older cohorts exhibit higher rates of marital stability and a broader distribution of skin impact severity. 
These results highlight potential age-related disparities in psychosocial experiences and coping mechanisms pertaining to skin conditions, thereby suggesting the necessity for age-specific interventions.


Table 8. Summary of dermatology life quality index analysis.


Table 9. Comparison of key indicators across patient groups.

The Dermatology Life Quality Index (DLQI) scores (see Table 10) offer valuable information on the quality of life impairments associated with dermatological conditions in different age groups. The youngest cohort (ages 18–40) demonstrates the highest mean DLQI score of 15.1795, indicating a moderate level of impairment. The midlife cohort (ages 41–60) has a marginally reduced score of 14, still indicating moderate impairment. In contrast, the senior cohort (age 61 and older) records the lowest mean DLQI score of 10.7568, suggesting only a low level of impairment. Emotional distress manifests as moderate in the younger and middle-aged cohorts but decreases to low levels in the elderly population. This may suggest heightened emotional resilience or acceptance in older individuals. Disturbances in social activity participation are also moderate in younger and middle-aged cohorts, but are perceived as low by older individuals. Younger patients may show increased concern for social appearances, while older individuals might prioritize alternative aspects of life. The effects of dermatological conditions on work or study-related activities are considered moderate among younger patients, but low among middle-aged and senior cohorts, in agreement with the varying intensities of work or study pressures during different stages of life. Treatment-related challenges are consistently rated low in all age cohorts, indicating that issues of adherence to treatment or accessibility may not pose significant barriers. Data indicate that younger patients endure greater psychosocial impacts, attributable to visible lesions, interruptions in social activities, and challenges associated with work or study, as evidenced by their elevated DLQI scores and moderate levels of emotional distress. Middle-aged patients exhibit similar trends, albeit with slightly reduced overall impacts, possibly due to enhanced coping mechanisms developed over time. 
Seniors experience fewer psychosocial disturbances, characterized by lower DLQI scores and diminished emotional distress, denoting a shift in priorities or increased adaptation to their condition. These observations underscore the need for age-specific interventions, advocating for changes in lifestyle and social concerns for younger patients, while focusing on the management of physical symptoms and the optimization of treatment protocols for older populations.


Table 10. DLQI comparison across patient groups.

10.2 YOLOv11 results

The Mean Average Precision (mAP) (see Figure 3) achieved a very high value of 98.8%, indicating excellent overall performance in object detection and classification. mAP measures the system's precision-recall balance across different Intersection over Union (IoU) thresholds, making it a comprehensive metric of accuracy.


Figure 3. YOLOv11 - mAP and mAP@50:95 result.

Precision stands at 95.6%, reflecting the system's ability to minimize false positives. This is particularly important in avoiding incorrect detections of vitiligo lesions or healthy skin patches. Recall is 97.0%, highlighting the system's robustness in identifying most true positive cases. This ensures that very few Vitiligo lesions or healthy skin patches are missed during detection.

Box loss converges smoothly, indicating the model's ability to accurately localize bounding boxes for vitiligo lesions and healthy skin (see Figure 4). The class loss shows a gradual reduction, which confirms the system's ability to distinguish between vitiligo and healthy skin effectively. The distribution focal loss (DFL) shows a stable decline, demonstrating effective regression of bounding-box shapes and sizes. The segmentation loss converges rapidly with minimal oscillations, suggesting that the system accurately segments areas of interest (e.g., vitiligo patches).


Figure 4. YOLOv11 - Loss results. Training loss and validation loss curves demonstrating model convergence.

Validation loss curves are consistent with the training loss curves (see Figure 5), which confirms that the model generalizes well without overfitting. Minor oscillations are expected due to the complexity of the dataset. Precision and recall metrics improve steadily and plateau near their final values, reflecting strong model learning dynamics. The high precision-recall balance is indicative of minimal trade-offs between the two metrics. The mAP and mAP@50:95 both stabilize after the initial epochs, reaching values close to 1. This confirms that the model is not only precise but also consistent across varying IoU thresholds.


Figure 5. YOLOv11 - Training graphs. Training loss and validation loss curves demonstrating model convergence.

10.3 NLP component results

The classifier achieved an F1 score of 0.83 on the validation set. Table 11 shows the performance metrics.


Table 11. Sentiment classifier performance on GoEmotions subset.

11 Discussion

The proposed AI-driven framework for vitiligo detection and classification demonstrates exceptional performance across all key evaluation metrics, establishing its potential as a robust diagnostic tool in dermatology (Figure 6). With a mean average precision (mAP) of 98.8%, precision of 95.6%, and recall of 97.0%, the system maintains an optimal balance by reducing false positives while effectively identifying most true positives. This balance is critical for reliable diagnostic results, particularly in clinical settings. The loss curves for training and validation (including box loss, classification loss, segmentation loss, and distribution focal loss) exhibit smooth convergence with minimal oscillations. This reflects effective model learning and generalization to unseen validation data. The parallel trends between training and validation losses indicate that the model avoids overfitting, even with the complexity of the dataset. Throughout the training process, both precision and recall exhibit a steady increase, eventually stabilizing at elevated values. A precision of 95.6% emphasizes the model's ability to reduce false positives, ensuring that healthy skin is not erroneously labeled as vitiligo. At the same time, the recall of 97.0% illustrates the model's ability to identify almost all vitiligo cases, reducing the chance of overlooked diagnoses. The mAP of 98.8%, coupled with stable performance across IoU thresholds (mAP@50:95), indicates exceptional precision in detecting and classifying lesions. This high mAP ensures that the model maintains precision and recall across varying boundary overlaps, ensuring its applicability in practical scenarios.


Figure 6. YOLOv11 detection output showing vitiligo lesions in a smartphone image.

11.1 Comparison with prior work

Unlike earlier studies that addressed dermatological imaging and mental health in isolation, our framework unifies the two domains. Previous vitiligo studies focused primarily on clinical dermatology or psychosocial surveys, without multimodal integration. Our system represents, to our knowledge, the first AI prototype linking YOLO-based skin analysis with NLP-driven emotional monitoring. Future work will integrate patient-reported messages for sentiment training, validate the multimodal fusion in clinical cohorts, and extend the framework to other chronic dermatological conditions with strong psychosocial components. A prospective clinical validation study is planned to evaluate the fusion alert mechanism in real-world workflows, assessing its impact on patient outcomes and clinician response time.

11.2 Clinical implications

Integration of dermatological and psychosocial monitoring provides physicians with a decision support tool for holistic patient care. Mobile deployment improves accessibility in low-resource settings, while continuous sentiment monitoring fosters proactive interventions. The high sensitivity and specificity of the system have significant clinical implications. With high recall, the system ensures that vitiligo lesions are reliably identified, supporting early diagnosis and timely intervention. Simultaneously, high precision reduces false positives, preventing unnecessary clinical evaluations for healthy individuals. This dual capability improves the reliability of the system, ensuring both efficiency and trustworthiness in practical deployment. The resolution of 640 × 640 pixels provides an ideal compromise between computational efficiency and diagnostic precision, making the system highly suitable for scalable use in settings with limited resources, such as mobile or edge healthcare solutions. Additionally, the well-balanced dataset and accurate annotations enhance the model's capability to generalize across a wide variety of patient scenarios.

11.3 Limitations

The sentiment classifier was trained on the public GoEmotions corpus rather than patient-reported messages, which may limit clinical generalizability; it therefore serves as a transferable baseline, and future clinical validation will employ patient-generated text and voice inputs from vitiligo cohorts to fine-tune and benchmark real-world performance. Fusion experiments were simulated; real-world validation through prospective trials is necessary. Future work will integrate in-app patient data and extend the model to other dermatological conditions.

11.4 AI-driven ethical considerations

Due to the nature of the disease and its effect on other fields such as mental health or psychology, the adoption of AI-driven technologies in healthcare, especially in the management of sensitive conditions such as vitiligo, requires a rigorous ethical framework to ensure the responsible use of technology. Addressing ethical considerations is critical to safeguarding patient rights, building trust, and maximizing the societal benefits of this innovation.

1. Algorithmic bias and fairness warrant careful attention (30). AI models can unintentionally reflect biases present in training datasets, leading to disparities in system performance between demographic groups, such as those differentiated by skin tone, gender, or language. To mitigate such risks, training datasets must be diverse and representative. Furthermore, the system must be designed to ensure equitable access, particularly in low-resource or underserved settings, to prevent exacerbating existing disparities in healthcare.

2. The psychological impact of system interactions is another critical consideration (4). Vitiligo often carries a significant emotional burden, and the AI system must be designed to respond with empathy and sensitivity. Responses must be carefully calibrated to avoid reinforcing negative self-perceptions or causing additional distress. Trigger management mechanisms should also ensure that incorrect or poorly contextualized responses do not adversely affect patients' mental health.

3. Accountability and human oversight are fundamental to maintaining ethical standards (31). While the AI system provides diagnostic and monitoring support, it must not replace human judgment. Clinicians must remain central to the decision-making process, ensuring that interventions or treatments are guided by professional expertise.

4. Transparency and explainability are crucial to building trust in AI technologies (32). Patients and clinicians should have access to clear explanations of how the AI system generates its output. This transparency fosters confidence in the technology while allowing users to understand the basis of its recommendations. The design of algorithms must prioritize interpretability, ensuring that decisions are not perceived as “black boxes.”

5. Compliance with regulatory frameworks that govern medical devices and AI systems is imperative (33). Research must comply with relevant local, national and international regulations, and ethical approvals from Institutional Review Boards (IRBs) or ethics committees must be obtained before conducting the study.

6. The sustainability and monitoring of the AI system are equally important (34). Regular post-deployment evaluations should assess its ongoing ethical impact, with patient feedback incorporated to refine and improve the system. In addition, researchers must address dual-use risks by proactively establishing guidelines to prevent misuse of the technology, such as unauthorized emotional profiling or data exploitation. Continuous user-experience monitoring will be embedded to evaluate and mitigate any unintended psychological impact, ensuring that AI-generated feedback remains supportive and non-intrusive.

In conclusion, in projects such as the one presented in this article, addressing these ethical considerations ensures that this research respects patient rights, promotes equity, and improves the legitimacy of AI-driven healthcare solutions. Integrating ethical principles into the system's development and implementation holds the potential to enhance patient care and foster long-term societal trust in AI technologies.

11.5 Limitations of the approach

Although the proposed AI-driven framework for vitiligo diagnosis and mental health monitoring has significant potential, it is not without limitations:

1. Bias and fairness (35): algorithmic bias remains a critical concern. Imbalances (36) in training datasets, such as underrepresentation of certain skin types, languages, or age groups, can result in disparities in model performance. This could lead to unequal diagnostic results or misinterpretation of emotional states, disproportionately affecting vulnerable populations.

2. Complexity of emotional analysis (37): mental health assessment through sentiment and emotion analysis involves subjective interpretations, which may not always align with clinical evaluations. Emotional states are nuanced and influenced by context, personal history, and cultural factors, which can complicate accurate detection and response generation.

3. Integration of multimodal data (38): while integrating text and audio inputs can improve system precision, it also increases complexity. The fusion of multimodal data requires sophisticated models and computational resources, which can limit real-time processing and accessibility, particularly in resource-constrained environments. To manage this complexity, modular processing and lightweight model-compression strategies are being explored to preserve real-time responsiveness on mobile devices.

4. Technical challenges in speech-to-text (39): speech-to-text technology, while advanced, is not immune to errors, especially with diverse accents, dialects, or background noise. These inaccuracies could distort downstream emotion and sentiment analysis, leading to incorrect conclusions or inappropriate interventions.

5. Dependency on technology (40): reliance on AI systems may inadvertently reduce human oversight, leading to potential overreliance on automated decisions. Without proper checks, this could result in missed opportunities for nuanced clinical judgments that require human intuition and experience.

6. Acceptance and trust (41): adoption of AI-driven healthcare tools requires a high level of trust from both patients and clinicians. Concerns about the interpretability and reliability of the system may hinder its acceptance, particularly in settings where human-centric care is deeply valued.

7. Real-time adaptation (42): while reinforcement learning and feedback loops are incorporated to improve the system dynamically, real-time adaptation to diverse patient inputs and contexts remains a complex task. The system may struggle to balance responsiveness with consistency, particularly in unpredictable scenarios.

Addressing these limitations will require continuous refinement of the models, careful ethical oversight, and collaboration between AI developers, clinicians, and policy makers. By recognizing these challenges, research can prioritize strategies to mitigate them, ensuring that the proposed framework achieves its full potential to improve Vitiligo care and mental health support. Future work could also explore the integration of advanced AI techniques, such as reinforcement learning and Retrieval-Augmented Generation (RAG), to optimize real-time decision making and patient interaction.

Although the current study implements the dermatological AI pipeline, sentiment- and speech-based monitoring is still under development. No large-scale clinical validation has been performed. Future work includes testing the system with real patient interviews, mobile deployment, and longitudinal tracking.

12 Future implementation lines

Retrieval-Augmented Generation (RAG) combines neural networks with information retrieval systems to dynamically generate responses grounded in external knowledge (43). Potential applications include: (1) knowledge-driven support, providing patients with context-specific educational materials and coping strategies; (2) clinician assistance, retrieving clinical guidelines and case studies to support decision-making; and (3) speech-to-text integration, generating empathetic, context-sensitive responses based on the patient's history. Transfer learning can accelerate the development of AI models and ensure adaptability through: (1) domain adaptation, tuning pre-trained models (e.g., BERT, GPT) for healthcare tasks such as sentiment analysis and vitiligo detection; (2) cross-language support, adapting multilingual models for diverse populations; and (3) low-resource scalability, training models effectively in environments with limited labeled data. Integrating reinforcement learning (RL) enables the system to adapt and improve through feedback: (1) personalized interventions, optimizing responses based on prior patient interactions; (2) adaptive feedback, dynamically adjusting feedback mechanisms to guide patients toward acceptance; and (3) real-time learning, continuously refining the applied strategies as new data become available. Another promising line is multimodal and scalable deployment of the proposed method: (1) multimodal integration, combining text, audio, and wearable-device data for holistic analysis; (2) telemedicine deployment, expanding access to underserved areas through scalable, cloud-based solutions; and (3) population health insights, aggregating anonymized data to identify trends in vitiligo care and mental health.
A further direction is ethical and transparent AI: (1) explainable AI (XAI) (32), developing transparent models to foster patient and clinician trust; (2) bias mitigation, ensuring equitable performance across diverse demographic groups; and (3) regulatory compliance, adapting to evolving healthcare and AI standards. Regarding the impact of these AI-supported technologies: (1) RAG enables evidence-based, real-time responses by integrating external knowledge; (2) transfer learning reduces development time and enhances adaptability to new contexts; and (3) reinforcement learning optimizes interventions through continuous feedback and personalized strategies. This component remains a planned extension of the system; future versions will include Whisper-based transcription pipelines combined with prosodic analysis for emotion detection from voice.
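The RAG direction sketched above can be illustrated with a minimal retrieve-then-respond loop. The knowledge snippets, the bag-of-words similarity measure, and the response template below are all illustrative; a production system would index a curated clinical corpus, use dense embeddings, and generate responses with an LLM:

```python
import math
import string
from collections import Counter

# Tiny hypothetical knowledge base; a real deployment would retrieve
# from vetted clinical guidelines and patient-education material.
KB = [
    "Phototherapy is a common treatment option for vitiligo.",
    "Camouflage cosmetics can reduce the visible impact of vitiligo lesions.",
    "Support groups may lower the emotional burden of chronic skin disease.",
]

def _vec(text: str) -> Counter:
    """Bag-of-words vector with punctuation stripped."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k knowledge-base snippets most similar to the query."""
    q = _vec(query)
    return sorted(KB, key=lambda d: _cosine(q, _vec(d)), reverse=True)[:k]

def answer(query: str) -> str:
    """Ground a templated response in the retrieved evidence."""
    context = " ".join(retrieve(query))
    return f"Based on available guidance: {context}"

print(answer("what treatment options exist for vitiligo"))
```

The key design property is that the generated response is grounded in retrieved snippets rather than produced freely, which is what makes RAG attractive for evidence-based patient support.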

13 Conclusions

Our prototype represents a foundational step toward a fully integrated AI framework for dermatology and mental health; future work includes longitudinal patient validation and transfer learning from larger multimodal corpora. This study investigated the incorporation of cutting-edge AI technologies to address both the physical and psychological dimensions of vitiligo treatment and physiological and/or mental monitoring. Using state-of-the-art approaches such as the YOLO algorithm (version v11) for real-time, accurate vitiligo detection and sentiment analysis for continuous mental health monitoring, the framework presents a holistic, patient-centered solution. The findings and proposed methodologies align with the objectives outlined at the start of the study, demonstrating significant potential to improve clinical outcomes and patient quality of life. Even in diverse and resource-limited environments, YOLO improves the diagnosis of vitiligo lesions, and its real-time functionality enables dermatologists to diagnose and treat patients quickly, addressing a key care gap. With sentiment analysis and speech-to-text, the system monitors patients' emotional health beyond medical care: real-time emotional analysis identifies mental health problems early, enabling proactive treatment and acceptance. Under the proposed strategy, lightweight AI models on mobile and cloud platforms can extend diagnosis and monitoring to disadvantaged areas, addressing the global need for equity in healthcare. By covering both dermatological and psychological requirements, the paradigm bridges the boundary between physical and mental health management, allowing physicians to provide more personalized and compassionate care. This study also shows how AI-driven healthcare depends on dermatologists, mental health professionals, and technologists working together, since complex health problems require interdisciplinary approaches.
Scalable and flexible AI solutions can build on RAG, transfer learning, and reinforcement learning; these technologies improve diagnosis, patient engagement, and the targeting of interventions. AI can improve vitiligo treatment and support other chronic conditions with psychological comorbidity, and the system addresses the goal of equitable and sustainable healthcare through accessibility, ethics, and real-time adaptation. Data quality, algorithmic bias, and scalability in resource-limited contexts must still be examined, although the proposed solution addresses multiple needs. Future research should emphasize multimodal integration, transparency of AI-driven assessments, and cross-cultural and linguistic applicability; modern methods such as RAG and reinforcement learning may help overcome these limits. This study suggests that AI in healthcare could change vitiligo treatment, addressing physical and emotional health and advancing AI-assisted medicine in a patient-centered manner; through continuous improvement and collaboration, the proposed approach could improve outcomes in the treatment of chronic diseases worldwide. With exceptional mAP (98.8%), precision (95.6%), and recall (97.0%), the system demonstrates robust diagnostic capability, supported by consistent convergence of the loss curves and strong generalization to validation data. These findings underscore the suitability of the framework for clinical applications, offering a scalable and efficient solution for early vitiligo diagnosis. By addressing the noted constraints and broadening the framework to encompass comprehensive patient monitoring, this paradigm could significantly alter the standard of care in dermatology. The integration of dermatological AI with emotional monitoring opens a new frontier in patient-centered, multimodal digital health for chronic conditions such as vitiligo.

Data availability statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Ethics Commission of the Faculty of Medicine (approval no. 1255/2021) and by the Mureş County Clinical Hospital (approval no. 16501/2021). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

AB: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. LI: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Validation, Writing – review & editing. LF: Conceptualization, Formal analysis, Investigation, Visualization, Writing – review & editing. GL: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Visualization, Writing – review & editing, Validation.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported in part by the Consolidator Researcher Program of Óbuda University, Budapest, Hungary.

Acknowledgments

We would like to thank the Research Center on Artificial Intelligence, Data Science, and Smart Engineering (ARTEMIS) within the George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Targu Mures for its support.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Simons R, Zevy D, Jafferany M. Psychodermatology of vitiligo: psychological impact and consequences. Dermatol Ther. (2020) 33:13418. doi: 10.1111/dth.13418

2. Fekete L, Iantovics LB, Fekete GL. Validation of the DLQI questionnaire in assessing the disease burden and principal aspects related to life quality of vitiligo patients. Front Psychol. (2024) 15:1333723. doi: 10.3389/fpsyg.2024.1333723

3. Fekete L, Bacarea V, Fekete GL, Fekete JE, Kovacs B. Comprehensive analysis of life quality of patients with vitiligo in Romania: insights from a multivariate approach. Front Med. (2025) 12:1613083. doi: 10.3389/fmed.2025.1613083

4. Katroliya P. vitiligo and depression: an observational study in patients attending tertiary care centre. Int J Res Med Sci. (2024) 12:2029–33. doi: 10.18203/2320-6012.ijrms20241553

5. Fekete L, Iantovics LB, Fekete GL. Exploratory axis factoring for identifying the self-esteem latent factors and their correlation with the life quality of persons suffering from vitiligo. Front Psychol. (2023) 14:1200713. doi: 10.3389/fpsyg.2023.1200713

6. Biró A, Cuesta-Vargas AI, Szilágyi L. Applied AI for real-time detection of lesions and tumors following severe head injuries. In: IEEE 21st International Symposium on Intelligent Systems and Informatics (SISY). Pula, Croatia: IEEE, IEEE Xplore (2023). p. 653–658. doi: 10.1109/SISY60376.2023.10417915

7. Biró A, Cuesta-Vargas AI, Szilágyi L. Precognition of mental health and neurogenerative disorders using AI-parsed text and sentiment analysis. Acta Univers Sapientiae, Informatica. (2023) 15:359–403. doi: 10.2478/ausi-2023-0022

8. Ettman C. The potential influence of AI on population mental health. JMIR Ment Health. (2023) 10:e49936. doi: 10.2196/49936

9. Biró A, Jánosi-Rancz KT, Szilágyi L. Real-time artificial intelligence text analysis for identifying burnout syndromes in high-performance athletes. In: IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics (SAMI). Stará Lesná, Slovakia: IEEE, IEEE Xplore (2024). p. 000253–000258. doi: 10.1109/SAMI60510.2024.10432817

10. Olaye I, Seixas A. The gap between AI and bedside: participatory workshop on the barriers to the integration, translation, and adoption of digital health care and AI startup technology into clinical practice. J Med Internet Res. (2023) 25:e32962. doi: 10.2196/32962

11. Cadmus S, Riddle A, Sebastian K, Reddy P, Ahmed A. Psychosocial and quality-of-life factors associated with depigmentation therapy for vitiligo. Arch Dermatol Res. (2023) 315:2283–8. doi: 10.1007/s00403-023-02595-5

12. Witkowski K. Public perceptions of artificial intelligence in healthcare: ethical concerns and opportunities for patient-centered care. BMC Med Ethics. (2024) 25:74. doi: 10.1186/s12910-024-01066-4

13. Mohmoud Z, Elsayed S, Ahmed F. Psychosocial status and quality of life among vitiligo patients. Benha J Appl Sci. (2023) 8: 67–79. doi: 10.21608/bjas.2023.189668.1046

14. Balcombe L, Leo D. Digital mental health challenges and the horizon ahead for solutions. JMIR Ment Health. (2021) 8:e26811. doi: 10.2196/26811

15. Ng Z, Li L, Chew H, Lau Y. The role of artificial intelligence in enhancing clinical nursing care: a scoping review. J Nurs Manag. (2021) 30:3654–74. doi: 10.1111/jonm.13425

16. Khan B, Fatima H, Qureshi A, Kumar S, Hanan A, Hussain J, et al. Drawbacks of artificial intelligence and their potential solutions in the healthcare sector. Biomed Mater Dev. (2023) 1:731–8. doi: 10.1007/s44174-023-00063-2

17. Rollwage M. Using conversational AI to facilitate mental health assessments and improve clinical efficiency within psychotherapy services: real-world observational study. JMIR AI. (2023) 2:e44358. doi: 10.2196/44358

18. Lin S. A clinician's guide to artificial intelligence (AI): why and how primary care should lead the health care AI revolution. J Am Board Family Med. (2022) 35:175–84. doi: 10.3122/jabfm.2022.01.210226

19. Record J, Ziegelstein R, Christmas C, Rad C, Hanyok L. Delivering personalized care at a distance: how telemedicine can foster getting to know the patient as a person. J Pers Med. (2021) 11:137. doi: 10.3390/jpm11020137

20. Mulholland K. On-site dermatology care for older adults. J Dermatol Nurses Assoc. (2024) 16:152–6. doi: 10.1097/JDN.0000000000000796

21. O'Malley K, Blakley L, Ramos K, Torrence N, Sager Z. Mental healthcare and palliative care: barriers. BMJ Support Palliat Care. (2020) 11:138–44. doi: 10.1136/bmjspcare-2019-001986

22. Ragab M. A comprehensive systematic review of YOLO for medical object detection (2018 to 2023). IEEE Access. (2024) 12:57815–36. doi: 10.1109/ACCESS.2024.3386826

23. Fekete L, Fekete GL, Iantovics LB, Fekete JE, Bacarea V. Evaluation of the clinical and sociodemographic features of patients with vitiligo from the central region of Romania. Front Med. (2025) 12:1544184. doi: 10.3389/fmed.2025.1544184

24. Cong X, Li S, Chen F, Liu C, Yue M. A review of YOLO object detection algorithms based on deep learning. Front Comput Intell Syst. (2023) 4:17–20. doi: 10.54097/fcis.v4i2.9730

25. Jeon D. A method for reducing false negative rate in non-maximum suppression of YOLO using bounding box density. J Multimedia Inf Syst. (2023) 10:293–300. doi: 10.33851/JMIS.2023.10.4.293

26. Lee J, Hwang K. YOLO with adaptive frame control for real-time object detection applications. Multimed Tools Appl. (2021) 81:36375–96. doi: 10.1007/s11042-021-11480-0

27. Sud E, Anjankar A. Applications of telemedicine in dermatology. Cureus. (2022) 14:27740.

28. Adegoke B. Harnessing big data for tailored health communication: a systematic review of impact and techniques. Int J Biol Pharm Res Updates. (2024) 3:01–010. doi: 10.53430/ijbpru.2024.3.2.0024

29. Biró A, Cuesta-Vargas AI, Szilágyi SM. Real-time disease and COVID-19 detection pipeline from voice for performance sports. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), Special Section Cyber. Honolulu, Oahu, HI, USA: IEEE, IEEE Xplore (2023). p. 2309–2314. doi: 10.1109/SMC53992.2023.10394396

30. Sreerama J. Ethical considerations in AI addressing bias and fairness in machine learning models. J Knowl Learn Sci Technol. (2022) 1:130–138. doi: 10.60087/jklst.vol1.n1.p138

31. Elendu C. Ethical implications of AI and robotics in healthcare: a review. Medicine. (2023) 102:e36671. doi: 10.1097/MD.0000000000036671

32. Tiwari R. Explainable AI (XAI) and its applications in building trust and understanding in AI decision making. Int J Sci Res Eng Manag. (2023) 07. doi: 10.55041/IJSREM17592

33. Olorunsogo T. Ethical considerations in AI-enhanced medical decision support systems: a review. World J Adv Eng Technol Sci. (2024) 11:329–36. doi: 10.30574/wjaets.2024.11.1.0061

34. Alim N, Adebayo A. Ethics in artificial intelligence: issues and guidelines for developing acceptable AI systems. Global J Eng Technol Adv. (2022) 11:037–44. doi: 10.30574/gjeta.2022.11.3.0092

35. Paulus J, Kent D. Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities. NPJ Digital Med. (2020) 3:99. doi: 10.1038/s41746-020-0304-9

36. Muraru MM, Sim Z, Iantovics LB. Cervical cancer prediction based on imbalanced data using machine learning algorithms with a variety of sampling methods. Appl Sci. (2024) 14:1118. doi: 10.20944/preprints202409.1118.v1

37. Makhdom N. A review on sentiment and emotion analysis for computational literary studies. Int J Sci Res Comput Sci Eng Inf Technol. (2024) 10:107–19. doi: 10.32628/CSEIT241029

38. Karani R, Desai S. Review on multimodal fusion techniques for human emotion recognition. Int J Adv Comput Sci Applic. (2022) 13:287–296. doi: 10.14569/IJACSA.2022.0131035

39. Nandwani P, Verma R. A review on sentiment analysis and emotion detection from text. Soc Netw Anal Mining. (2021) 11:81. doi: 10.1007/s13278-021-00776-6

40. Koulu R. Proceduralizing control and discretion: human oversight in artificial intelligence policy. Maastrich J Eur Comp Law. (2020) 27:720–35. doi: 10.1177/1023263X20978649

41. Asan O, Bayrak A, Choudhury A. Artificial intelligence and human trust in healthcare: focus on clinicians. J Med Internet Res. (2020) 22:e15154. doi: 10.2196/15154

42. Mitri D, Schneider J, Drachsler H. Keep me in the loop: real-time feedback with multimodal data. Int J Artif Intell Educ. (2021) 32:1093–118. doi: 10.1007/s40593-021-00281-z

43. Chen J. Benchmarking large language models in retrieval-augmented generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. (2024). p. 17754–62. doi: 10.1609/aaai.v38i16.29728

Keywords: vitiligo, autoimmune disorder, YOLOv11, artificial intelligence, sentiment analysis, mental health monitoring, psychodermatology, personalized medicine

Citation: Biró A, Iantovics LB, Fekete L and Fekete GL (2025) Prototype of a multimodal AI system for vitiligo detection and mental health monitoring. Front. Med. 12:1709891. doi: 10.3389/fmed.2025.1709891

Received: 21 September 2025; Accepted: 13 October 2025;
Published: 06 November 2025.

Edited by:

Yan Valle, Vitiligo Research Foundation, United States

Reviewed by:

Zeinab Mohseni Afshar, Kermanshah Infectious Diseases Center, Iran
Konstantin Lomonosov, I.M. Sechenov First Moscow State Medical University, Russia

Copyright © 2025 Biró, Iantovics, Fekete and Fekete. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: László Barna Iantovics, barna.iantovics@umfst.ro
