
BRIEF RESEARCH REPORT article

Front. Public Health

Sec. Digital Public Health

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1672038

This article is part of the Research Topic: Operationalizing Precision Health

Operationalizing language-based population stratification for widening access to precision genomics in Africa

Provisionally accepted
Benard Kulohoma*, Colette S A Wesonga
  • Ortholog, Nairobi, Kenya

The final, formatted version of the article will be published soon.

Background: Despite remarkable advancements in genomic technologies, individuals of predominantly African-related genetic similarity remain significantly under-represented, accounting for only 2.4% of published genome-wide association studies. This disparity limits our understanding of human biology and hinders equitable translation of genomic advances into healthcare. Methods: We used a quantitative framework based on normalized Levenshtein distance (LDN) to analyse lexical similarity patterns across Kenya's ethnolinguistic landscape, comprising Bantu, Nilotic, and Cushitic language groups. We compared lexical distance matrices with available genetic population differentiation data and geographic proximity to evaluate their relative efficacy in predicting genetic relationships. Results: Lexical similarity analysis revealed distinct clustering patterns that closely mirror Kenya's ethnolinguistic diversity. Multidimensional scaling and hierarchical clustering clearly separated the three major language families and identified fine-scale relationships within each group. Importantly, lexical distance demonstrated a stronger correlation with genetic differentiation (r = 0.91, CI(0.55-0.99)) than geographic proximity did (r = 0.29, CI(0.29-0.53)), supporting language as a superior proxy for population genetic structure. Our analysis demonstrates an objective basis for prioritizing populations in genomic studies.
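As an illustration of the lexical distance measure named in the abstract, the sketch below computes a normalized Levenshtein distance between word pairs and averages it over aligned word lists. It assumes the common normalization in which the edit distance is divided by the length of the longer word; the abstract does not state the exact formula the authors used. The function names and the `lang_A`/`lang_B`/`lang_C` word lists are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ldn(a: str, b: str) -> float:
    """Normalized Levenshtein distance in [0, 1] (assumed normalization:
    divide by the length of the longer word)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_ldn(words_a: list[str], words_b: list[str]) -> float:
    """Average LDN over two word lists aligned by meaning (same index)."""
    pairs = list(zip(words_a, words_b))
    return sum(ldn(x, y) for x, y in pairs) / len(pairs)


def distance_matrix(word_lists: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Pairwise mean LDN for every unordered pair of languages."""
    return {(p, q): mean_ldn(word_lists[p], word_lists[q])
            for p, q in combinations(sorted(word_lists), 2)}


# Hypothetical toy word lists; real analyses would use curated lexical data.
toy_lists = {
    "lang_A": ["mata", "kino"],
    "lang_B": ["mato", "kina"],
    "lang_C": ["sule", "erot"],
}
matrix = distance_matrix(toy_lists)
```

A matrix built this way could then be passed to multidimensional scaling or hierarchical clustering, or compared against genetic and geographic distance matrices, as the abstract describes.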

Keywords: Precision genomics, Africa, lexical similarity, Multi-ethnic, population stratification, Genomics

Received: 23 Jul 2025; Accepted: 29 Aug 2025.

Copyright: © 2025 Kulohoma and Wesonga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Benard Kulohoma, Ortholog, Nairobi, Kenya

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.