ORIGINAL RESEARCH article
Front. Comput. Sci.
Sec. Mobile and Ubiquitous Computing
Bursting the Bubble: Data Leakage and Inflated Deep Learning Accuracy in Multivariate Time-Series Frailty Classification
Provisionally accepted
Technical University of Berlin, Berlin, Germany
Background: Accurate prediction of frailty in older adults is crucial for preventing adverse outcomes, yet distinguishing frail, pre-frail, and non-frail states remains challenging. Amjad et al. (2025) applied InceptionTime to the GSTRIDE dataset and reported near-perfect multi-class frailty prediction (>98% accuracy), exceeding values typically observed in comparable studies.
Methods: We conducted a methodological re-evaluation and replication of this pipeline to assess the robustness of the reported performance. Corrections included subject-wise data partitioning, feature scaling fitted within training folds only, and non-overlapping sliding time windows applied separately to each subset to prevent potential leakage.
Results: Reimplementation of the original pipeline reproduced the previously reported high accuracy. After applying the corrected framework, overall recall and precision decreased to 47.4% and 45.9%, respectively, providing a more conservative, data-specific estimate of model generalizability. Per-class analysis indicated reductions across all categories, with Frail-class recall dropping to 21.4%, highlighting the particular challenge of identifying high-risk individuals.
Conclusion: The findings suggest that methodological factors, such as data leakage, likely contributed to the previously reported high performance. Under rigorous controls, frailty prediction remains challenging, particularly for the frail class, underscoring the need for careful evaluation of model generalizability.
Significance: This study illustrates the importance of transparent, methodologically sound pipelines in clinical AI research. By providing a reproducible framework for frailty prediction, we aim to support future studies in obtaining realistic performance estimates and developing clinically meaningful models.
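The three corrections named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the single train/test split (rather than cross-validation folds), the univariate signals, and the default window size are all assumptions made for brevity. It only shows the order of operations that avoids leakage: split by subject first, fit the scaler on training subjects only, then cut non-overlapping windows within each subset.

```python
import random
from statistics import mean, pstdev


def split_subjects(subject_ids, test_fraction=0.2, seed=0):
    """Assign whole subjects to train or test so no person appears in both."""
    unique = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])


def fit_scaler(signals):
    """Compute mean and standard deviation from the given (training) signals only."""
    values = [v for s in signals for v in s]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against constant signals
    return mu, sigma


def scale(signal, mu, sigma):
    """Standardise a signal with statistics estimated elsewhere (the training set)."""
    return [(v - mu) / sigma for v in signal]


def window(signal, size):
    """Cut non-overlapping windows (stride == size); a trailing remainder is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def build_sets(recordings, window_size=4, test_fraction=0.2, seed=0):
    """recordings: list of (subject_id, label, signal) tuples (hypothetical layout)."""
    train_ids, test_ids = split_subjects(
        [r[0] for r in recordings], test_fraction, seed
    )
    train_recs = [r for r in recordings if r[0] in train_ids]
    test_recs = [r for r in recordings if r[0] in test_ids]

    # Scaling statistics come from training subjects only, then are reused on test.
    mu, sigma = fit_scaler([r[2] for r in train_recs])

    def to_windows(recs):
        out = []
        for subject_id, label, signal in recs:
            for w in window(scale(signal, mu, sigma), window_size):
                out.append((subject_id, label, w))
        return out

    # Windowing happens after the split, separately for each subset.
    return to_windows(train_recs), to_windows(test_recs)
```

The leakage-prone alternative would window all recordings first and then split the windows at random, which can place overlapping segments of one person's recording on both sides of the split.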
Keywords: Aging, Data leakage, deep learning, Frailty, multivariate time-series classification
Received: 06 Sep 2025; Accepted: 02 Feb 2026.
Copyright: © 2026 Hughes and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Charmayne Mary Lee Hughes
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
