AUTHOR=Veintimilla Alison M., Acharya Chintan K., Mulligan Connie J., Fang Ruogu, Moore Erika
TITLE=TRACE: applying AI language models to extract ancestry information from curated biomedical literature
JOURNAL=Frontiers in Digital Health
VOLUME=Volume 7 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2025.1608370
DOI=10.3389/fdgth.2025.1608370
ISSN=2673-253X
ABSTRACT=Introduction: Ancestry reporting is essential to ensure transparency and proper representation in biomedical studies. However, manually extracting this information from study texts is time-consuming and inefficient. In this paper, we present TRACE (Tool for Researching Ancestry and Cell Extraction), powered by GPT-4 and web-crawling, to automate ancestry identification by detecting cell lines or cultures in texts and tracing their ancestry. Methods: TRACE extracts cell lines and primary cultures from research articles and follows web sources to determine their ancestry. We compared TRACE's outputs to a manually generated database to confirm its performance in identifying and verifying ancestry information. Results: The results reveal an overrepresentation of European/White samples and significant underreporting of ancestry. TRACE enables large-scale, systematic ancestry analysis, a valuable resource for researchers and agencies assessing biases in sample selection. Conclusions: As an open-source tool, TRACE facilitates broader use to evaluate and improve ancestry representation in biomedical research.