ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Natural Language Processing

This article is part of the Research Topic: The Use of Large Language Models to Automate, Enhance, and Streamline Text Analysis Processes. Large Language Models Used to Analyze and Check Requirement Compliance.

Analysis of article screening and data extraction performance by an AI systematic literature review platform

Provisionally accepted
Kelsie Cassell1*, Abiodun Ologunowa2, Majid Rastegar-Mojarad3, Bianca Chun1, Yi-Ling Huang1, Dong Wang1, Nicole Cossrow1
  • 1Merck & Co., Rahway, United States
  • 2University of Rhode Island, Kingston, United States
  • 3IMO Health, Rosemont, United States

The final, formatted version of the article will be published soon.

Background: Systematic literature reviews (SLRs) are critical to health research and decision-making but are often time- and labor-intensive. Artificial intelligence (AI) tools like large language models (LLMs) provide a promising way to automate these processes.

Methods: We conducted a systematic literature review on the cost-effectiveness of adult pneumococcal vaccination and prospectively assessed the performance of our AI-assisted review platform, Intelligent Systematic LiterAture Review (ISLaR) 2.0, compared to expert researchers.

Results: ISLaR demonstrated high accuracy (0.87 full-text screening; 0.86 data extraction), precision (0.88; 0.86), and sensitivity (0.91; 0.98) in article screening and data extraction tasks, but lower specificity (0.79; 0.42), especially when extracting data from tables. The platform reduced abstract and full-text screening time by over 90% compared to human reviewers.

Conclusion: The platform has strong potential to reduce reviewer workload but requires further development.
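
For readers less familiar with the four metrics reported in the Results, the following is a minimal sketch of their standard confusion-matrix definitions, treating the expert reviewers' decisions as the reference. The class name `ConfusionCounts`, the function `screening_metrics`, and the example counts are hypothetical and chosen only for illustration. They are not data from the study, and this is not necessarily how the authors computed their figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of AI decisions compared against expert reference decisions."""
    tp: int  # AI included, experts included
    fp: int  # AI included, experts excluded
    tn: int  # AI excluded, experts excluded
    fn: int  # AI excluded, experts included


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, sensitivity, and specificity."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),  # also called recall
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=90, fp=10, tn=40, fn=10)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions, high sensitivity combined with lower specificity means the tool rarely misses items the experts included but more often includes items the experts excluded.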

Keywords: artificial intelligence, systematic literature review, data extraction, reviewer workload, health technology assessment, large language models

Received: 08 Jul 2025; Accepted: 28 Oct 2025.

Copyright: © 2025 Cassell, Ologunowa, Rastegar-Mojarad, Chun, Huang, Wang and Cossrow. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Kelsie Cassell, kelsie.cassell@merck.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.