AUTHOR=Magnini Bernardo, Farzi Saeed, Ferrazzi Pietro, Ghosh Soumitra, Lavelli Alberto, Mezzanotte Giulia, Speranza Manuela
TITLE=A cost-effective approach to counterbalance the scarcity of medical datasets
JOURNAL=Frontiers in Disaster and Emergency Medicine
VOLUME=Volume 3 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/disaster-and-emergency-medicine/articles/10.3389/femer.2025.1558200
DOI=10.3389/femer.2025.1558200
ISSN=2813-7302
ABSTRACT=This paper presents an innovative methodology for addressing the critical issue of data scarcity in clinical research, specifically within emergency departments. Inspired by the recent advancements in the generative abilities of Large Language Models (LLMs), we devised an automated approach based on LLMs to extend an existing publicly available English dataset to new languages. We constructed a pipeline of multiple automated components which first converts an existing annotated dataset from its complex standard format to a simpler inline annotated format, then generates inline annotations in the target language using LLMs, and finally converts the generated target language inline annotations to the dataset's standard format; a manual validation is envisaged for erroneous and missing annotations. By automating the translation and annotation transfer process, the method we propose significantly reduces the resource-intensive task of collecting data and manually annotating them, thus representing a crucial step toward bridging the gap between the need for clinical research and the availability of high-quality data.
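
The two format conversions described in the abstract can be illustrated with a minimal sketch. The abstract does not specify either format, so everything here is an assumption: the "standard format" is taken to be standoff annotation with character offsets, the inline format is taken to be XML-like tags, and the names `to_inline`, `from_inline`, and the `SYMPTOM` label are hypothetical. The LLM step is not implemented; a hard-coded Italian string stands in for its output.

```python
import re
from typing import List, Tuple

# Assumed standoff annotation: (start, end, label), end exclusive.
Span = Tuple[int, int, str]

def to_inline(text: str, spans: List[Span]) -> str:
    """Wrap each span in <label>...</label> tags (spans must not overlap)."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

# Matches one non-nested tagged segment; the closing tag must repeat the label.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def from_inline(inline: str) -> Tuple[str, List[Span]]:
    """Strip the tags and recover plain text plus character-offset spans."""
    parts, spans = [], []
    cursor = 0   # position in the tagged string
    length = 0   # length of the plain text emitted so far
    for match in TAG.finditer(inline):
        before = inline[cursor:match.start()]
        parts.append(before)
        length += len(before)
        label, body = match.group(1), match.group(2)
        spans.append((length, length + len(body), label))
        parts.append(body)
        length += len(body)
        cursor = match.end()
    parts.append(inline[cursor:])
    return "".join(parts), spans

# Step 1: standoff -> inline for the English source.
text = "Patient reports chest pain."
spans = [(16, 26, "SYMPTOM")]
inline = to_inline(text, spans)

# Step 2 (not implemented): an LLM would translate `inline` while keeping
# the tags. This hard-coded string is a stand-in for that output.
translated_inline = "Il paziente riferisce <SYMPTOM>dolore toracico</SYMPTOM>."

# Step 3: inline -> standoff for the target language.
target_text, target_spans = from_inline(translated_inline)
```

Under these assumptions, the offsets in the target language are recomputed from the generated text rather than projected from the English offsets. A generated sentence with a dropped or malformed tag would simply yield fewer spans, which is the kind of case the manual validation mentioned in the abstract would need to catch.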