
ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

Volume 8 - 2025 | doi: 10.3389/frai.2025.1682919

This article is part of the Research Topic: The Applications of AI Techniques in Medical Data Processing.

Evaluating XAI techniques under class imbalance using CPRD data

Provisionally accepted
Teena Rai1*, Jun He1, Jaspreet Kaur2, Yuan Shen1, Mufti Mahmud3, David J. Brown1, Emma O'Dowd2, David Baldwin2
  • 1Department of Computer Science, Nottingham Trent University, Nottingham, United Kingdom
  • 2Division of Epidemiology and Public Health, University of Nottingham, Nottingham, United Kingdom
  • 3Information and Computer Science Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia

The final, formatted version of the article will be published soon.

The need for eXplainable Artificial Intelligence (XAI) in healthcare is more critical than ever, especially as regulatory frameworks such as the EU AI Act mandate transparency in clinical decision support systems. Post hoc XAI techniques such as Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) are widely used to interpret Machine Learning (ML) models for disease risk prediction, particularly on tabular Electronic Health Record (EHR) data. However, their reliability in real-world scenarios is not fully understood. Class imbalance is a common challenge in many real-world datasets, yet it is rarely accounted for when evaluating the reliability and consistency of XAI techniques. In this study, we design a comparative evaluation framework to assess the impact of class imbalance on the consistency of model explanations generated by LIME, SHAP, and Partial Dependence Plots (PDPs). Using UK primary care data from the Clinical Practice Research Datalink (CPRD), we train three ML models, XGBoost (XGB), Random Forest (RF), and a Multi-layer Perceptron (MLP), to predict lung cancer risk and evaluate how interpretability is affected under class imbalance compared with a balanced dataset. To our knowledge, this is the first study to evaluate explanation consistency under class imbalance across multiple models and interpretation methods using real-world clinical data. Our main finding is that class imbalance in the training data can significantly affect the reliability and consistency of LIME and SHAP explanations when evaluated against balanced data. To explain these empirical findings, we also present a theoretical analysis of LIME and SHAP that shows why explanations change under different class distributions. We also find that PDPs exhibit noticeable variation between models trained on imbalanced and balanced datasets with respect to clinically relevant features for predicting lung cancer risk. These findings highlight a critical vulnerability in current XAI techniques: their explanations are significantly affected by skewed class distributions, which are common in medical data. This emphasises the importance of consistent model explanations for trustworthy ML deployment in healthcare.
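To illustrate the kind of comparison the abstract describes, the sketch below trains an XGBoost classifier on an imbalanced training split and on a class-balanced (randomly undersampled) version of the same split, then compares the two models' global SHAP feature importances on shared test data. This is a minimal, hedged sketch, not the authors' pipeline: the DataFrame `df`, the `lung_cancer` label column, the undersampling strategy, and the rank-correlation consistency measure are illustrative assumptions only.

```python
# Minimal sketch (not the authors' exact pipeline): compare SHAP-based feature
# rankings for a model trained on imbalanced vs. class-balanced training data.
# Assumes `df` is a tabular DataFrame with a binary "lung_cancer" label column;
# the column name and the undersampling choice are illustrative placeholders.
import numpy as np
import pandas as pd
import shap
from scipy.stats import spearmanr
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier


def shap_rank(model, X):
    """Rank features by mean |SHAP value| (global importance) on data X."""
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    importance = np.abs(shap_values).mean(axis=0)
    return pd.Series(importance, index=X.columns).rank(ascending=False)


def balance_by_undersampling(X, y, random_state=0):
    """Randomly undersample the majority class to the minority class size."""
    counts = y.value_counts()
    n_min = counts.min()
    keep = pd.concat(
        [y[y == cls].sample(n=n_min, random_state=random_state) for cls in counts.index]
    ).index
    return X.loc[keep], y.loc[keep]


# df: tabular EHR-style data with a binary outcome (placeholder name).
X, y = df.drop(columns="lung_cancer"), df["lung_cancer"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Model trained on the original (imbalanced) class distribution.
model_imb = XGBClassifier(eval_metric="logloss").fit(X_tr, y_tr)

# Model trained on a class-balanced version of the same training split.
X_bal, y_bal = balance_by_undersampling(X_tr, y_tr)
model_bal = XGBClassifier(eval_metric="logloss").fit(X_bal, y_bal)

# Consistency check: rank correlation of global SHAP importances on shared test data.
rho, _ = spearmanr(shap_rank(model_imb, X_te), shap_rank(model_bal, X_te))
print(f"Spearman rank correlation of SHAP importances: {rho:.3f}")
```

Under this kind of setup, a low rank correlation would indicate that the explanation method attributes importance differently depending on the training class distribution, which is the inconsistency the study examines; the same comparison could be repeated with LIME weights or PDP curves and with the RF and MLP models.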

Keywords: Explainable AI, LIME, SHAP, PDP, Class imbalance, CPRD, Evaluation, Consistency

Received: 09 Aug 2025; Accepted: 21 Oct 2025.

Copyright: © 2025 Rai, He, Kaur, Shen, Mahmud, Brown, O'Dowd and Baldwin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Teena Rai, teena.rai2022@my.ntu.ac.uk

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.