ORIGINAL RESEARCH article
Front. Genet.
Sec. Evolutionary and Population Genetics
Volume 16 - 2025 | doi: 10.3389/fgene.2025.1631529
Rapid Forensic Ancestry Inference in Selected Northeast Asian Populations: A Y-STR Based Attention-Based Ensemble Framework for Initial Investigation Guidance
Provisionally accepted
Department of Management Engineering, Department of Engineering, Dankook University, Cheonan, Republic of Korea
Rapid inference of ancestral origin from DNA evidence is critical in time-sensitive forensic investigations, particularly during the first hours, when key investigative decisions must be made. Comprehensive analyses using multiple genetic markers give thorough results but often require substantial processing time and resources. Y-chromosome short tandem repeats (Y-STRs) exhibit population-specific allelic distributions and can be analyzed quickly, which makes them valuable for initial screening in forensic contexts. This study aims to improve population classification accuracy from Y-STR profiles, focusing on Northeast Asian populations, which commercial ancestry panels often merge into a single group. We developed a machine learning architecture built around an attention-based ensemble mechanism that combines three complementary algorithms: a One-vs-Rest Random Forest, XGBoost, and Logistic Regression, each configured to handle imbalanced datasets. Using only Y-STR data, the model achieved an overall accuracy of 80–81% and demonstrated high stability. The model also handled imbalanced data effectively, generating reliable outcomes for rapid ancestry assessment in time-critical investigations. By addressing a key limitation of commercial ancestry panels, namely their failure to differentiate among Northeast Asian subpopulations, the framework provides preliminary guidance in forensic cases involving Asian individuals. The approach thus strengthens rapid screening, informing early-stage investigations while complementing subsequent, more comprehensive genetic analyses.
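The abstract does not specify how the attention mechanism weights the three base classifiers. As a rough illustration only, the sketch below assumes one simple form: each model receives a scalar relevance score, the scores are passed through a softmax to obtain attention weights, and the ensemble output is the weighted average of the per-model class probabilities. The function names, the score values, the probability vectors, and the three-class setup are all hypothetical and are not taken from the article.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_ensemble(model_probs, model_scores):
    """Combine per-model class probabilities with softmax attention weights.

    model_probs:  dict mapping model name -> list of class probabilities
    model_scores: dict mapping model name -> unnormalised relevance score
                  (hypothetical; e.g. a validation metric per model)
    Returns (combined class probabilities, attention weight per model).
    """
    names = list(model_probs)
    weights = softmax([model_scores[n] for n in names])
    n_classes = len(model_probs[names[0]])
    combined = [0.0] * n_classes
    for w, name in zip(weights, names):
        for k, p in enumerate(model_probs[name]):
            combined[k] += w * p
    return combined, dict(zip(names, weights))

# Hypothetical outputs of the three base models for one Y-STR profile,
# over three placeholder population labels.
labels = ["pop_A", "pop_B", "pop_C"]
probs = {
    "random_forest":       [0.6, 0.3, 0.1],
    "xgboost":             [0.5, 0.4, 0.1],
    "logistic_regression": [0.2, 0.7, 0.1],
}
# Hypothetical relevance scores; a higher score yields a larger weight.
scores = {"random_forest": 2.0, "xgboost": 2.0, "logistic_regression": 0.0}

combined, weights = attention_ensemble(probs, scores)
predicted = labels[combined.index(max(combined))]
print(predicted, [round(p, 3) for p in combined])
```

In this sketch the weights are fixed per model. A learned attention layer could instead compute the scores per sample from the Y-STR profile, which would let the ensemble favor different base models for different haplotypes. Whether the article does so is not stated in the abstract.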
Keywords: Y-STR, rapid forensic screening, initial ancestry inference, machine learning, data imbalance, Northeast Asian populations, crime scene investigation
Received: 20 May 2025; Accepted: 25 Aug 2025.
Copyright: © 2025 Koo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Kyo-Chan Koo, Department of Management Engineering, Department of Engineering, Dankook University, Cheonan, Republic of Korea
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.