AUTHOR=Mbaye Ndèye Maguette , Danziger Michael M. , Rosen-Zvi Michal , Toussaint Aullène , Dumas Elise , Guerin Julien , Hamy-Petit Anne-Sophie , Reyal Fabien , Azencott Chloé-Agathe TITLE=Multimodal BEHRT: transformers for multimodal electronic health records to predict breast cancer prognosis JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1496215 DOI=10.3389/fonc.2025.1496215 ISSN=2234-943X ABSTRACT=Background: Electronic Health Records (EHRs) contain a wealth of patient information that could help improve treatment outcomes for breast cancer patients, but they remain mostly unexploited. Recent methodological developments in deep learning, however, open the way to new methods that leverage this information to improve patient care. Methods: We propose M-BEHRT, a Multimodal BERT for EHR data based on BEHRT, itself an architecture based on the popular natural language architecture BERT (Bidirectional Encoder Representations from Transformers). M-BEHRT models multimodal patient trajectories as a sequence of medical visits, comprising a variety of information such as clinical features, results from biological lab tests, medical department and procedure, and the content of free-text medical reports. M-BEHRT uses a pretraining task analogous to a masked language model to learn a representation of patient trajectories from data that includes patients who are unlabeled due to censoring, and is then fine-tuned to the classification task at hand. A gradient-based attribution method highlights which parts of the input patient trajectory were most relevant for the prediction. Results: We applied M-BEHRT to a retrospective cohort of about 15,000 breast cancer patients treated with adjuvant chemotherapy, using patient trajectories for up to one year after surgery to predict disease-free survival 3 years after surgery.
M-BEHRT achieves an AUC-ROC of 0.77 [0.70-0.84] on a held-out data set, compared to 0.67 [0.58-0.75] for the Nottingham Prognostic Index (NPI) and random forests (p < 0.05). In addition, we identified subsets of patients for which M-BEHRT performs particularly well, such as older patients with at least one lymph node affected. Conclusion: Our work highlights both the potential of EHR data for improving our understanding of breast cancer and the ability of transformer-based architectures to learn from EHR data containing far fewer than the millions of records typically used in currently published studies. The representation of patient trajectories used by M-BEHRT captures their sequential aspect, and opens new research avenues for understanding complex diseases and improving patient care.
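
The abstract reports each AUC-ROC with an interval in brackets, but this record does not say how the intervals were obtained. As a minimal sketch only, assuming a percentile bootstrap over the held-out patients (an assumption, not the authors' stated procedure), such a figure could be computed as follows. The function names `auc_roc` and `bootstrap_ci` and the toy labels and scores are hypothetical and introduced purely for illustration.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC-ROC.

    ASSUMPTION: the source does not describe how its intervals were
    computed; the percentile bootstrap is just one common choice.
    Labels are expected to be 0 or 1.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < len(ys):
            stats.append(auc_roc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (hypothetical): 1 = event within 3 years, 0 = no event.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
toy_auc = auc_roc(toy_labels, toy_scores)
toy_low, toy_high = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise definition is quadratic in the number of patients, which keeps the sketch short and transparent; a rank-based implementation would be the usual choice for larger cohorts.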