Bursting the Bubble: Data Leakage and Inflated Deep Learning Accuracy in Multivariate Time-Series Frailty Classification

Hughes, Charmayne Mary Lee; Zhang, Yan

doi:10.3389/fcomp.2026.1700489

ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Mobile and Ubiquitous Computing

Provisionally accepted

Charmayne Mary Lee Hughes*

Yan Zhang

Technical University of Berlin, Berlin, Germany

The final, formatted version of the article will be published soon.

Background: Accurate prediction of frailty in older adults is crucial for preventing adverse outcomes, yet distinguishing frail, pre-frail, and non-frail states remains challenging. Amjad et al. (2025) applied InceptionTime to the GSTRIDE dataset and reported near-perfect multi-class frailty prediction (>98% accuracy), exceeding values typically observed in comparable studies. Methods: We conducted a methodological re-evaluation and replication of this pipeline to assess the robustness of reported performance. Corrections included subject-wise data partitioning, feature scaling within training folds, and non-overlapping sliding time windows applied separately to each subset to prevent potential leakage. Results: Reimplementation of the original pipeline reproduced the previously reported high accuracy. After applying the corrected framework, overall recall and precision decreased (47.4% and 45.9%, respectively), providing a more conservative, data-specific estimate of model generalizability. Per-class analysis indicated reductions across all categories, with Frail-class recall dropping to 21.4%, highlighting the particular challenge of identifying high-risk individuals. Conclusion: The findings suggest that methodological factors, such as data leakage, likely contributed to the previously reported high performance. Under rigorous controls, frailty prediction remains challenging, particularly for the frail class, underscoring the need for careful evaluation of model generalizability. Significance: This study illustrates the importance of transparent, methodologically sound pipelines in clinical AI research. By providing a reproducible framework for frailty prediction, we aim to support future studies in obtaining realistic performance estimates and developing clinically meaningful models.
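The three corrections named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic recordings, the window width of 10 samples, and the single hold-out split (standing in for cross-validation folds) are all assumptions made for illustration, and only the Python standard library is used. The order of operations is the point: whole subjects are assigned to a subset first, each subset is windowed on its own with a stride equal to the window width, and scaling statistics are estimated from training windows only.

```python
import random
import statistics


def split_subjects(subject_ids, test_fraction=0.25, seed=0):
    """Assign whole subjects to train or test so no person appears in both."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return set(ids[n_test:]), set(ids[:n_test])


def window_recording(samples, width):
    """Cut one recording into non-overlapping windows (stride == width)."""
    return [samples[i:i + width]
            for i in range(0, len(samples) - width + 1, width)]


def fit_scaler(windows):
    """Per-channel mean and standard deviation, from training windows only."""
    n_channels = len(windows[0][0])
    stats = []
    for c in range(n_channels):
        values = [row[c] for w in windows for row in w]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # guard against constant channels
        stats.append((mean, std))
    return stats


def apply_scaler(windows, stats):
    """Standardize every channel using previously fitted (mean, std) pairs."""
    return [[[(v - m) / s for v, (m, s) in zip(row, stats)] for row in w]
            for w in windows]


# Synthetic stand-in data (hypothetical): 8 subjects, 50 time steps, 3 channels.
rng = random.Random(42)
recordings = {
    f"S{k:02d}": [[rng.gauss(k, 1.0) for _ in range(3)] for _ in range(50)]
    for k in range(8)
}

# 1. Subject-wise partitioning happens before any windowing.
train_ids, test_ids = split_subjects(recordings.keys())

# 2. Each subset is windowed separately, without overlap.
train_windows = [w for s in sorted(train_ids)
                 for w in window_recording(recordings[s], 10)]
test_windows = [w for s in sorted(test_ids)
                for w in window_recording(recordings[s], 10)]

# 3. Scaling statistics come from the training subset and are reused on test.
stats = fit_scaler(train_windows)
train_scaled = apply_scaler(train_windows, stats)
test_scaled = apply_scaler(test_windows, stats)
```

Reversing these steps is what allows leakage: windowing the pooled data before splitting can place overlapping or adjacent segments from the same person on both sides of the split, and fitting the scaler on all data lets test-set statistics influence training inputs. In a cross-validated setting the same three steps would be repeated inside every fold.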

Keywords: aging, data leakage, deep learning, frailty, multivariate time-series classification

Received: 06 Sep 2025; Accepted: 02 Feb 2026.

Copyright: © 2026 Hughes and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Charmayne Mary Lee Hughes

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.