TECHNOLOGY AND CODE article
Front. Res. Metr. Anal.
Sec. Emerging Technologies and Transformative Paradigms in Research
Volume 10 - 2025 | doi: 10.3389/frma.2025.1596687
This article is part of the Research Topic: Impact Evaluation using the Translational Science Benefits Model Framework in the National Center for Advancing Translational Science Clinical and Translational Science Award Program
Topic Analysis on Publications and Patents Toward Fully Automated Translational Science Benefits Model Impact Extraction
Provisionally accepted
- 1 Digital Infuzion, Rockville, California, United States
- 2 University of Maryland, Baltimore County, Baltimore, Maryland, United States
Background: The Clinical and Translational Science Award (CTSA) program, funded by the National Center for Advancing Translational Sciences (NCATS), has supported over 65 hubs, generating 118,490 publications from 2006 to 2021. Measuring the impact of these outputs remains challenging, as traditional bibliometric methods fail to capture patents, policy contributions, and clinical implementation. The Translational Science Benefits Model (TSBM) provides a structured framework for assessing clinical, community, economic, and policy benefits, but its manual application is resource-intensive. Advances in natural language processing (NLP) and artificial intelligence (AI) offer a scalable solution for automating benefit extraction from large research datasets.
Objective: This study presents an NLP-driven pipeline that automates the extraction of TSBM benefits from research outputs using Latent Dirichlet Allocation (LDA) topic modeling to enable efficient, scalable, and reproducible impact analysis. Applying NLP allows topics and benefits to emerge from the very large corpus of CTSA documents without requiring directed searches or preconceived benefits to guide data mining.
Methods: We applied LDA topic modeling to publications, patents, and grants and mapped the resulting topics to TSBM benefits, with validation by subject matter experts (SMEs). Impact visualizations, including heatmaps and t-SNE plots, highlighted benefit distributions across the corpus and CTSA hubs.
Results: Spanning CTSA hub grants awarded from 2006 to 2023, our analysis corpus comprised 1,296 projects, 127,958 publications, and 352 patents. Applying our NLP-driven pipeline to deduplicated data, we found that clinical and community benefits were the benefits most frequently extracted from publications and projects, reflecting the patient-centered and community-driven nature of CTSA research. Economic and policy benefits were identified less frequently, prompting the inclusion of patent data to better capture commercialization impacts. The Publications LDA Model proved the most effective at extracting benefits from both publications and projects. All patents were automatically tagged as economic benefits, given their intrinsic focus on commercialization and in accordance with TSBM guidelines.
Conclusion: Automated NLP-driven benefit extraction enabled a data-driven approach to applying the TSBM at the scale of the entire CTSA program's outputs.
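The abstract does not say how the LDA models were implemented, so the following is only a minimal sketch of the general idea: fit an LDA topic model, let an expert label each topic with a benefit, then tag each document by its dominant topic. It uses collapsed Gibbs sampling written with the Python standard library, which may differ from the authors' inference method. The toy documents, the hyperparameters (`alpha`, `beta`, `n_iter`), and the names `fit_lda`, `tag_benefits`, and `topic_to_benefit` are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Three most frequent words per topic, for expert review.
    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -row[v])[:3]]
        for row in topic_word
    ]
    return theta, top_words


def tag_benefits(theta, topic_to_benefit):
    """Tag each document with the benefit mapped to its highest-weight topic."""
    return [
        topic_to_benefit[max(range(len(row)), key=row.__getitem__)]
        for row in theta
    ]


# Toy corpus (invented for this sketch, not drawn from the CTSA data).
docs = [
    "patient trial treatment outcome clinical trial".split(),
    "clinical treatment patient outcome diagnosis".split(),
    "community outreach engagement health education".split(),
    "community engagement partnership outreach health".split(),
]
theta, top_words = fit_lda(docs, n_topics=2)

# Placeholder labels: topic indices are arbitrary, so a real mapping would be
# assigned only after subject matter experts read each topic's top words.
topic_to_benefit = {0: "benefit label for topic 0", 1: "benefit label for topic 1"}
benefit_counts = Counter(tag_benefits(theta, topic_to_benefit))
print(top_words)
print(benefit_counts)
```

Tagging by a single dominant topic is the simplest possible choice; a document could instead receive every benefit whose topic weight exceeds a threshold, which is another assumption a real pipeline would have to settle.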
Keywords: translational science benefits model (TSBM), natural language processing (NLP), latent Dirichlet allocation (LDA), Clinical and Translational Science Award (CTSA), impact, artificial intelligence (AI), large language model (LLM), machine learning
Received: 21 Mar 2025; Accepted: 11 Jul 2025.
Copyright: © 2025 Manjunath, Appelmans, Balta, DiMercurio, Avalos and Stark. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Eline Appelmans, Digital Infuzion, Rockville, California, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.