AUTHOR=Loizzi Vera , Comes Maria Colomba , Arezzo Francesca , Apostol Adriana Ionelia , Bove Samantha , Fanizzi Annarita , Fruscio Robert , Gregorc Vanesa , Legge Francesco , Mancari Rosanna , Marchetti Claudia , Negri Serena , Russo Giorgia , Vertechy Laura , Scambia Giovanni , Massafra Raffaella , Cormio Gennaro TITLE=Validation of machine learning-based models to predict and explain the risk of ovarian cancer: a multicentric study on BRCA-mutated patients undergoing risk-reducing salpingo-oophorectomy JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1574037 DOI=10.3389/fonc.2025.1574037 ISSN=2234-943X ABSTRACT=ObjectiveBRCA-mutated women are recommended to undergo bilateral risk-reducing salpingo-oophorectomy (RRSO) after childbearing, due to the lack of effective methods that could be able to early detect the occurrence of ovarian cancer. Thus, predictive machine learning (ML) techniques could be crucial to aid clinicians in identifying high-risk BRCA-mutated patients and determining the appropriate timing for performing RRSO.MethodsIn this work, we addressed this task by developing explainable ML models using clinical data referred to a multicentric cohort of 694 BRCA-mutated patients from six Italian centers (Policlinico Gemelli, IRCCS San Gerardo, Policlinico Bari, Istituto Tumori Regina Elena, Istituto Tumori Giovanni Paolo II, Ospedale F. Miulli), who performed salpingo-oophorectomy, out of which 39 patients showed tumor (5.6%). Data from Istituto Tumori Regina Elena and Policlinico Bari were used as External Validation Cohort (EVC). The other data were employed as Investigational Cohort (IC). Resampling and ensemble techniques were implemented to handle dataset imbalance. Explainable techniques enabled us to identify some protective and risk factors predicted by the models with respect to the task under study.ResultsThe best ML model achieved an AUC value of 79.3% (95% CI: 75.3% - 83.0%), an accuracy value of 73.8% (95% CI: 69.6% - 78.2%), a sensitivity value of 66.7% (95% CI: 58.1% - 75.3%), a specificity value of 74.3% (95% CI: 68.7% - 80.0%), and a G-mean value of 70.4% (95% CI: 63.0% - 76.0%) on EVC. Although the model demonstrated good overall performance, its limited sensitivity reduces its effectiveness in this high-risk population. The variables CA125, age and MatoRRSO were found to be the most significant risk factors, in agreement with the clinical perspective. Conversely, variables such as Estroprogestinuse and PregnancyNfdt played a protective factor role.ConclusionOur ML proposal explores the intricate relationships between multiple clinical variables, with a particular emphasis on understanding their non-linear associations. However, while our approach provides valuable insights into risk assessment for BRCA-mutated patients, its current predictive capacity does not significantly improve upon existing clinical models.