AUTHOR=Jabbarpour Amir, Moulton Eric, Kaviani Sanaz, Ghassel Siraj, Zeng Wanzhen, Akbarian Ramin, Couture Anne, Roy Aubert, Liu Richard, Lucinian Yousif A., Hejji Nuha, AlSulaiman Sukainah, Shirazi Farnaz, Leung Eugene, Bonsall Sierra, Arfin Samir, Gray Bruce G., Klein Ran TITLE=On the construction of a large-scale database of AI-assisted annotating lung ventilation-perfusion scintigraphy for pulmonary embolism (VQ4PEDB) JOURNAL=Frontiers in Nuclear Medicine VOLUME=Volume 5 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/nuclear-medicine/articles/10.3389/fnume.2025.1632112 DOI=10.3389/fnume.2025.1632112 ISSN=2673-8880 ABSTRACT=Introduction: Ventilation-perfusion (V/Q) nuclear scintigraphy remains a vital diagnostic tool for assessing pulmonary embolism (PE) and other lung conditions. Interpretation of these images requires specific expertise, which may benefit from recent advances in artificial intelligence (AI) to improve diagnostic accuracy and confidence in reporting. Our study aims to develop a multi-center dataset combining imaging and clinical reports to aid in creating AI models for PE diagnosis. Methods: We established a comprehensive imaging registry encompassing patient-level V/Q image data along with relevant clinical reports, CTPA images, DVT ultrasound impressions, D-dimer lab tests, and thrombosis unit records. Data extraction was performed at two hospitals in Canada and at multiple sites in the United States, followed by a rigorous de-identification process. We utilized the V7 Darwin platform for crowdsourced annotation of V/Q images including segmentation of V/Q mismatched vascular defects. The annotated data was then ingested into Deep Lake, a SQL-based database, for AI model training. Quality assurance involved manual inspections and algorithmic validation. Results: A query of The Ottawa Hospital's data warehouse followed by initial data screening yielded 2,137 V/Q studies with 2,238 successfully retrieved as DICOM studies.
Additional contributions included 600 studies from University Health Toronto and 385 studies from the private company Segmed Inc., resulting in a total of 3,122 V/Q planar and SPECT images. The majority of studies were acquired using Siemens, Philips, and GE scanners, adhering to standardized local imaging protocols. Annotation of 1,500 studies from The Ottawa Hospital identified 138 high-probability, 168 intermediate-probability, 266 low-probability, 244 very low-probability, and 669 normal studies, plus 15 studies with normal perfusion and a reversed mismatched ventilation defect. In these 1,500 patients, 3,511 vascular perfusion defects were segmented. Conclusion: The VQ4PEDB comprised 8 unique ventilation agents and 11 unique scanners. The VQ4PEDB database is unique in its depth and breadth in the domain of V/Q nuclear scintigraphy for PE, comprising clinical reports, imaging studies, and annotations. We share our experience in addressing challenges associated with data retrieval, de-identification, and annotation. VQ4PEDB will be a valuable resource for developing and validating AI models for diagnosing PE and other pulmonary diseases.
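
The counts reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes the Ottawa Hospital contribution to the total is the 2,137 figure, since that is the value that sums with the other two sites to the stated 3,122.

```python
# Counts copied from the abstract; dictionary keys and variable names are
# illustrative, not taken from the VQ4PEDB schema.
site_counts = {
    "The Ottawa Hospital": 2137,  # assumption: the figure that matches the stated total
    "University Health Toronto": 600,
    "Segmed Inc.": 385,
}

category_counts = {
    "high probability": 138,
    "intermediate probability": 168,
    "low probability": 266,
    "very low probability": 244,
    "normal": 669,
    "normal perfusion, reversed mismatched ventilation defect": 15,
}

segmented_defects = 3511

total_studies = sum(site_counts.values())
total_annotated = sum(category_counts.values())

print(f"Total studies across sites: {total_studies}")
print(f"Annotated studies: {total_annotated}")

# Share of each interpretation category among the annotated studies.
for label, count in category_counts.items():
    print(f"  {label}: {count} ({count / total_annotated:.1%})")

# Average number of segmented defects per annotated study.
mean_defects = segmented_defects / total_annotated
print(f"Mean segmented defects per annotated study: {mean_defects:.2f}")
```

The category counts sum to the 1,500 annotated studies and the site counts sum to 3,122, which agrees with the totals stated in the abstract.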