BRIEF RESEARCH REPORT article
Front. Public Health
Sec. Digital Public Health
Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1672038
This article is part of the Research Topic: Operationalizing Precision Health
Operationalizing language-based population stratification for widening access to precision genomics in Africa
Provisionally accepted
Ortholog, Nairobi, Kenya
Background: Despite remarkable advances in genomic technologies, individuals of predominantly African-related genetic similarity remain significantly under-represented, accounting for only 2.4% of published genome-wide association studies. This disparity limits our understanding of human biology and hinders the equitable translation of genomic advances into healthcare.

Methods: We applied a quantitative framework based on normalized Levenshtein distance (LDN) to analyse lexical similarity patterns across Kenya's ethnolinguistic landscape, which comprises Bantu, Nilotic, and Cushitic language groups. We compared lexical distance matrices with available genetic population differentiation data and with geographic proximity to evaluate their relative efficacy in predicting genetic relationships.

Results: Lexical similarity analysis revealed distinct clustering patterns that closely mirror Kenya's ethnolinguistic diversity. Multidimensional scaling and hierarchical clustering clearly separated the three major language families and identified fine-scale relationships within each group. Importantly, lexical distance showed a stronger correlation with genetic differentiation (r = 0.91, CI(0.55-0.99)) than geographic proximity did (r = 0.29, CI(0.29-0.53)), supporting language as a superior proxy for population genetic structure. Our analysis demonstrates an objective basis for prioritizing populations in genomic studies.
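The lexical distance measure named in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition of LDN, in which the Levenshtein edit distance between two word forms is divided by the length of the longer form, and it averages LDN over aligned word lists to obtain one distance per language pair. The function names (`levenshtein`, `ldn`, `mean_ldn`, `distance_matrix`) and the input layout are hypothetical and chosen for illustration only; the abstract does not specify the authors' implementation, word lists, or averaging scheme.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ldn(a: str, b: str) -> float:
    """Normalized Levenshtein distance in [0, 1].

    Assumption: normalization by the length of the longer word.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def mean_ldn(words_a: list[str], words_b: list[str]) -> float:
    """Average LDN over aligned word lists (same concept at the same index).

    Concepts missing in either list (empty strings) are skipped.
    """
    pairs = [(x, y) for x, y in zip(words_a, words_b) if x and y]
    if not pairs:
        raise ValueError("no shared concepts to compare")
    return sum(ldn(x, y) for x, y in pairs) / len(pairs)


def distance_matrix(wordlists: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Pairwise mean LDN for every pair of languages, keyed by sorted name pair."""
    return {
        (p, q): mean_ldn(wordlists[p], wordlists[q])
        for p, q in combinations(sorted(wordlists), 2)
    }
```

A matrix produced this way could then be passed to standard multidimensional scaling or hierarchical clustering routines, or compared against a genetic distance matrix; those steps are omitted here because the abstract gives no detail on how they were carried out.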
Keywords: Precision genomics, Africa, lexical similarity, Multi-ethnic, population stratification, Genomics
Received: 23 Jul 2025; Accepted: 29 Aug 2025.
Copyright: © 2025 Kulohoma and Wesonga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Benard Kulohoma, Ortholog, Nairobi, Kenya
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.