ORIGINAL RESEARCH article
Front. Psychiatry
Sec. Computational Psychiatry
Leveraging Data Augmentation for Machine Learning Models in Predicting Depression and Anxiety Using the Revised Child Anxiety and Depression Scale (RCADS-47) Clinical Reports
Provisionally accepted
- 1National University of Sciences and Technology (NUST), Islamabad, Pakistan
- 2Rawalpindi Medical University, Rawalpindi, Pakistan
- 3University of Padua, Via Cesare Battisti, 241, 35121 Padova PD, Italy
Objective: An estimated 15 million people are affected by depression and anxiety in Pakistan, yet the country has relatively few government mental health facilities and certified psychiatrists. This shortfall highlights the need for efficient assessments that can support intervention strategies. This study aims to apply machine learning to RCADS responses to make the most of existing healthcare resources and facilitate depression and anxiety screening.
Methods: The dataset included 138 cases, of which 89 were retained after cleaning, with the 47 RCADS items as features. Based on RCADS-47 T-scores, cases were classified as normal (55%), borderline (7%), or clinical (38%). Three feature selection methods were applied: the Chi-square test of independence, Spearman’s correlation, and Random Forest-Recursive Feature Elimination. Data augmentation used the probability distribution of the existing data to generate hybrid-synthetic, correlated, discrete multinomial variants of each RCADS item. Six commonly employed ML algorithms (Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, Naive Bayes, and K-Nearest Neighbor) were trained on the original dataset; the top three models were then evaluated on the augmented datasets, and the best of them was further validated on an external dataset.
Results: Item 05 of the RCADS has a weak correlation with the evaluation of depression and anxiety in the study population. Augmenting the data to four times its original size was determined to be the optimal ratio for our dataset, as Random Forest yielded the best overall results, with up to 81% macro-averaged accuracy, precision, recall, and F1 score when tested on this data.
Conclusion: The findings suggest that the Random Forest algorithm using 46 features suits the data well and has the potential to be developed further into a decision support system for the professionals concerned, improving the usual way of screening for anxiety and depression in children and adolescents.
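The augmentation step and the macro-averaged scoring mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `augment_items` and `macro_f1` are hypothetical, "hybrid" is assumed to mean original rows plus synthetic rows, and each synthetic item is drawn independently from that item's empirical distribution within a class. That last simplification does not reproduce the inter-item correlation the abstract describes.

```python
import random


def augment_items(rows, labels, factor=4, seed=0):
    """Grow a dataset to `factor` times its size (hypothetical helper).

    Each synthetic row is built item by item from the empirical
    (multinomial) distribution of that item within one class.
    Assumption: items are sampled independently, so correlations
    between items are NOT preserved.
    """
    rng = random.Random(seed)

    # Group the original rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    class_names = sorted(by_class)
    class_sizes = [len(by_class[c]) for c in class_names]

    # Keep the original rows and append synthetic ones ("hybrid" data).
    out_rows, out_labels = list(rows), list(labels)
    for _ in range((factor - 1) * len(rows)):
        # Pick a class in proportion to its share of the original data.
        label = rng.choices(class_names, weights=class_sizes)[0]
        members = by_class[label]
        # Taking item j from a random class member is equivalent to
        # sampling from that item's empirical distribution in the class.
        synthetic = [rng.choice(members)[j] for j in range(len(members[0]))]
        out_rows.append(synthetic)
        out_labels.append(label)
    return out_rows, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging gives each class equal weight, which matters for a small class such as the 7% borderline group. Augmentation of this kind would normally be applied to the training split only, so that evaluation data stays free of synthetic rows derived from it.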
Keywords: Revised Child Anxiety and Depression Scale (RCADS), machine learning algorithms, Depression, Anxiety, Data augmentation
Received: 21 Aug 2025; Accepted: 04 Nov 2025.
Copyright: © 2025 Noor, Hussain, Hamdan, Zaman, Paracha and Zahra Shamsi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Mehwish Zaman, mehwish.zaman@studenti.unipd.it
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
