TECHNOLOGY AND CODE article

Front. Res. Metr. Anal.

Sec. Emerging Technologies and Transformative Paradigms in Research

Volume 10 - 2025 | doi: 10.3389/frma.2025.1596687

This article is part of the Research Topic: Impact Evaluation using the Translational Science Benefits Model Framework in the National Center for Advancing Translational Science Clinical and Translational Science Award Program

Topic Analysis on Publications and Patents Toward Fully Automated Translational Science Benefits Model Impact Extraction

Provisionally accepted
Tejaswini Manjunath1, Eline Appelmans1*, Sinem Balta1, Dominick DiMercurio2, Claudia Avalos1, Karen Stark1
  • 1Digital Infuzion, Rockville, California, United States
  • 2University of Maryland, Baltimore County, Baltimore, Maryland, United States

The final, formatted version of the article will be published soon.

Background: The Clinical and Translational Science Award (CTSA) program, funded by the National Center for Advancing Translational Sciences (NCATS), has supported over 65 hubs, generating 118,490 publications from 2006 to 2021. Measuring the impact of these outputs remains challenging, as traditional bibliometric methods fail to capture patents, policy contributions, and clinical implementation. The Translational Science Benefits Model (TSBM) provides a structured framework for assessing clinical, community, economic, and policy benefits, but its manual application is resource-intensive. Advances in natural language processing (NLP) and artificial intelligence (AI) offer a scalable solution for automating benefit extraction from large research datasets.

Objective: This study presents an NLP-driven pipeline that automates the extraction of TSBM benefits from research outputs using Latent Dirichlet Allocation (LDA) topic modeling to enable efficient, scalable, and reproducible impact analysis. Applying NLP allows topics and benefits to emerge from the very large corpus of CTSA documents without requiring directed searches or preconceived benefits for data mining.

Methods: We applied LDA topic modeling to publications, patents, and grants and mapped the topics to TSBM benefits using subject matter expert (SME) validation. Impact visualizations, including heatmaps and t-SNE plots, highlighted benefit distributions across the corpus and CTSA hubs.

Results: Spanning CTSA hub grants awarded from 2006 to 2023, our analysis corpus comprised 1,296 projects, 127,958 publications, and 352 patents. Applying our NLP-driven pipeline to deduplicated data, we found that clinical and community benefits were the most frequently extracted benefits from publications and projects, reflecting the patient-centered and community-driven nature of CTSA research. Economic and policy benefits were less frequently identified, prompting the inclusion of patent data to better capture commercialization impacts. The Publications LDA Model proved the most effective for benefit extraction for publications and projects. All patents were automatically tagged as economic benefits, given their intrinsic focus on commercialization and in accordance with TSBM guidelines.

Conclusion: Automated NLP-driven benefit extraction enabled a data-driven approach to applying the TSBM at the scale of the entire CTSA program outputs.
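
The topic-to-benefit mapping described in the Methods can be illustrated with a minimal sketch. It assumes that a fitted LDA model has already produced per-document topic weights, and it is not the authors' implementation: the mapping table `TOPIC_TO_BENEFIT`, the functions `tag_benefits` and `benefit_counts_by_hub`, the 0.3 threshold, and the toy records are all hypothetical. The only rule taken from the abstract is that every patent is tagged as an economic benefit.

```python
from collections import Counter, defaultdict

# Hypothetical SME-validated mapping from LDA topic index to TSBM benefit domain.
TOPIC_TO_BENEFIT = {
    0: "clinical",
    1: "community",
    2: "economic",
    3: "policy",
}


def tag_benefits(topic_weights, threshold=0.3):
    """Return the sorted TSBM domains whose topics meet the weight threshold."""
    return sorted({
        TOPIC_TO_BENEFIT[topic]
        for topic, weight in enumerate(topic_weights)
        if weight >= threshold and topic in TOPIC_TO_BENEFIT
    })


def benefit_counts_by_hub(documents, threshold=0.3):
    """Count tagged benefits per hub; the nested counts are heatmap-ready."""
    counts = defaultdict(Counter)
    for doc in documents:
        if doc["kind"] == "patent":
            # The abstract states that all patents are tagged as economic benefits.
            benefits = ["economic"]
        else:
            benefits = tag_benefits(doc["topic_weights"], threshold)
        counts[doc["hub"]].update(benefits)
    return {hub: dict(per_hub) for hub, per_hub in counts.items()}


# Toy records with made-up hubs and topic weights (not data from the study).
documents = [
    {"hub": "hub_a", "kind": "publication", "topic_weights": [0.6, 0.3, 0.05, 0.05]},
    {"hub": "hub_a", "kind": "patent", "topic_weights": None},
    {"hub": "hub_b", "kind": "project", "topic_weights": [0.1, 0.7, 0.1, 0.1]},
]
heatmap_counts = benefit_counts_by_hub(documents)
print(heatmap_counts)
```

The nested hub-by-benefit counts are the kind of matrix a heatmap of benefit distributions across hubs could be drawn from. A fixed weight threshold is only one possible tagging rule; assigning each document its single dominant topic would be an equally plausible choice.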

Keywords: translational science benefits model (TSBM), natural language processing (NLP), latent Dirichlet allocation (LDA), Clinical and Translational Science Award (CTSA), impact, Artificial intelligence (AI), large language model (LLM), machine learning

Received: 21 Mar 2025; Accepted: 11 Jul 2025.

Copyright: © 2025 Manjunath, Appelmans, Balta, DiMercurio, Avalos and Stark. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Eline Appelmans, Digital Infuzion, Rockville, California, United States
