AUTHOR=Yang Junting, Wu Yunxiao, Guo Jinxin, Wang Xiaoxuan, Gao Xin, Chen Xin, Zhang Mengdi, Yang Jin, Liu Zuojing, Liu Yan, Liu Zhike, Zhan Siyan TITLE=Development and validation of identification algorithms for five autoimmune diseases using electronic health records: a retrospective cohort study in China JOURNAL=Frontiers in Immunology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2025.1541203 DOI=10.3389/fimmu.2025.1541203 ISSN=1664-3224 ABSTRACT=Objective: This study aims to assess identification algorithms for five autoimmune diseases, namely Hashimoto’s thyroiditis, inflammatory bowel disease (IBD), primary immune thrombocytopenia (ITP), rheumatoid arthritis (RA), and type 1 diabetes (T1D), using the Yinzhou Regional Health Information Platform (YRHIP) in China. Methods: Diagnostic data were extracted from YRHIP’s population registry (2010-2021), combining ICD-10 codes and Chinese medical terminology from outpatient, inpatient, and discharge records. Algorithms were validated through chart reviews, adhering to global clinical guidelines. Cases were adjudicated using electronic case report forms. We evaluated algorithm performance based on sensitivity and positive predictive value (PPV), with a 70% PPV threshold for optimization. Results: Among all reviewed cases, we identified 136 cases of Hashimoto’s thyroiditis, 65 of IBD, 76 of ITP, 130 of RA, and 43 of T1D. Algorithm performance varied across diseases: the final algorithm for Hashimoto’s thyroiditis achieved optimal accuracy (sensitivity 97.44%, PPV 98.28%), followed by RA (sensitivity 100.00%, PPV 76.92%). Algorithms for IBD and ITP required synthesis of multiple data sources to achieve acceptable performance (IBD: sensitivity 79.66%, PPV 70.15%; ITP: sensitivity 62.50%, PPV 70.00%).
For T1D, the final algorithm utilizing both admission and outpatient records yielded satisfactory results (sensitivity 84.09%, PPV 74.00%). Conclusions: This study presents the first validated algorithms for identifying autoimmune diseases using EHR data in China, demonstrating satisfactory performance (PPV ≥70%) across all diseases. Our findings show that combining data sources is crucial for accurate case identification in complex autoimmune conditions, providing an important methodological foundation for future real-world studies in Chinese populations.
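
The abstract evaluates each algorithm by sensitivity and PPV against chart-review adjudication, with a 70% PPV threshold. The two metrics can be sketched as below. This is a minimal illustration, not the authors' code: the names `ValidationCounts`, `sensitivity`, `ppv`, and `meets_ppv_threshold` are hypothetical, the counts in the usage lines are made up for the example and are not study data, and treating the threshold as inclusive (PPV at or above 70%) is an assumption.

```python
from dataclasses import dataclass

# 70% PPV threshold stated in the abstract; treating it as inclusive is an assumption.
PPV_THRESHOLD = 0.70


@dataclass(frozen=True)
class ValidationCounts:
    """Counts from comparing algorithm output with chart-review adjudication."""
    true_positives: int   # flagged by the algorithm and confirmed by chart review
    false_positives: int  # flagged by the algorithm but not confirmed
    false_negatives: int  # confirmed cases the algorithm missed


def sensitivity(counts: ValidationCounts) -> float:
    """Share of confirmed cases that the algorithm identified: TP / (TP + FN)."""
    confirmed = counts.true_positives + counts.false_negatives
    if confirmed == 0:
        raise ValueError("no confirmed cases; sensitivity is undefined")
    return counts.true_positives / confirmed


def ppv(counts: ValidationCounts) -> float:
    """Share of algorithm-flagged cases that were confirmed: TP / (TP + FP)."""
    flagged = counts.true_positives + counts.false_positives
    if flagged == 0:
        raise ValueError("no flagged cases; PPV is undefined")
    return counts.true_positives / flagged


def meets_ppv_threshold(counts: ValidationCounts,
                        threshold: float = PPV_THRESHOLD) -> bool:
    """True when the PPV is at or above the threshold."""
    return ppv(counts) >= threshold


# Hypothetical counts for illustration only (not taken from the study).
example = ValidationCounts(true_positives=45, false_positives=5, false_negatives=15)
print(f"sensitivity = {sensitivity(example):.2%}")
print(f"PPV = {ppv(example):.2%}")
print(f"meets threshold: {meets_ppv_threshold(example)}")
```

Sensitivity depends on the confirmed cases an algorithm misses, while PPV depends on the flagged cases that turn out to be wrong, so an algorithm can capture every confirmed case and still have a modest PPV, as the RA figures in the abstract illustrate.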