
ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

This article is part of the Research Topic Equity in Cancer Prevention and Early Detection.

Predicting and Identifying Correlates of Inequalities in Breast Cancer Screening Uptake using National Data from India

Provisionally accepted
Aleena Tanveer1, Raja Hashim Ali2, Jitendra Majhi3, Moumita Mukherjee1*
  • 1Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, and Humboldt-Universität zu Berlin, Berlin, Germany
  • 2University of Europe for Applied Sciences, Potsdam, Germany
  • 3All India Institute of Medical Sciences Kalyani, Kalyani, India

The final, formatted version of the article will be published soon.

Despite large-scale national screening initiatives, breast cancer screening coverage in India remains extremely low, and late-stage diagnosis contributes to high mortality among women. Accurate, actionable predictions of socioeconomic and structural inequities in screening uptake are critical for designing equitable control strategies. This study applies machine learning to predict screening uptake and identify its key determinants, and measures inequality using concentration indices and their decomposition across economic, educational, and caste gradients. The analysis used cross-sectional NFHS-5 data covering 68,526 women aged 30–49 years, with variable selection guided by Levesque’s healthcare access framework, spanning approachability, acceptability, affordability, availability, and appropriateness. Three single learners (Logistic Regression, Naïve Bayes, and Decision Tree) and two ensemble learners (Random Forest and XGBoost) were trained on balanced, weighted data. To minimize the risk of overfitting after applying the Synthetic Minority Oversampling Technique (SMOTE), model performance was validated through 10-fold cross-validation. Five evaluation metrics were compared to select the best-performing learner for predicting screening uptake. SHAP-based decomposition computed each feature's contribution to inequalities across income, education, and social gradients. Screening uptake in India is exceptionally low (0.9%), with pronounced disparities across economic, educational, and social groups. Random Forest and XGBoost achieved high predictive accuracy (96%) and strong discrimination (AUROC = 0.99), while Decision Tree showed robust generalizability (mean AUROC = 0.995) across cross-validation folds. Feature importance analyses identified education, women’s autonomy, interactions with community health workers, and spatial–provincial characteristics as major contributors to the observed variation. 
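The concentration index referred to above can be illustrated with the standard "convenient covariance" formula C = 2 cov(h, r) / μ, where h is the outcome (here a 0/1 screening indicator), r is each woman's fractional rank in the socioeconomic distribution (poorest first), and μ is the mean of h. A positive C means uptake is concentrated among the better-off (a pro-rich or pro-educated gradient); a negative C means it is concentrated among the worse-off. The following is a minimal, unweighted sketch, not the authors' code: it assumes no NFHS-5 survey weights, applies none of the corrections sometimes used for binary outcomes (e.g. Wagstaff or Erreygers), and the function names are hypothetical.

```python
from statistics import fmean


def fractional_ranks(ses):
    """Fractional rank in (0, 1) by socioeconomic score, poorest first.

    Tied scores (common for categorical gradients such as education level
    or caste group) all receive the mean fractional rank of their group.
    """
    n = len(ses)
    order = sorted(range(n), key=lambda i: ses[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the tie group while the score is unchanged.
        while end < n and ses[order[end]] == ses[order[start]]:
            end += 1
        # Sorted positions start..end-1 have ranks (p + 0.5) / n; use their mean.
        r = (start + end) / (2 * n)
        for k in range(start, end):
            ranks[order[k]] = r
        start = end
    return ranks


def concentration_index(health, ses):
    """Unweighted concentration index C = 2 * cov(h, r) / mean(h)."""
    r = fractional_ranks(ses)
    mu = fmean(health)
    if mu == 0:
        raise ValueError("mean of the health variable must be non-zero")
    r_bar = fmean(r)
    # Population covariance between the outcome and the fractional rank.
    cov = fmean((h - mu) * (ri - r_bar) for h, ri in zip(health, r))
    return 2 * cov / mu


if __name__ == "__main__":
    # Toy data: only the two richest of five women were screened.
    print(concentration_index([0, 0, 0, 1, 1], [1, 2, 3, 4, 5]))  # about 0.6
```

Averaging tied ranks matters here because the educational and caste gradients are categorical, so many women share the same position in the ranking.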
Barriers such as distance to facilities, transport constraints, hesitancy to seek care unaccompanied, and financial limitations made smaller contributions to the variance but remained relevant in shaping uptake patterns. Concentration indices confirmed pro-rich (0.1, p < 0.001), pro-educated (0.182, p < 0.001), and pro-marginalized (–0.011, p < 0.05) gradients. Tree-based decomposition suggested that affordability and education exacerbate pro-rich and pro-educated inequalities, yet could serve as effective policy instruments to mitigate caste-based disparities. Access-related barriers intensified inequality across all gradients, whereas enabling access factors flattened them. Machine learning thus offers enhanced precision in predicting inequities in breast cancer screening uptake, revealing critical nonlinearities in how access barriers shape these gradients. The findings highlight the need for financial protection, improved spatial accessibility, greater autonomy, strengthened community health worker engagement, and targeted awareness programmes for poor, less educated, and socially marginalized women.
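One way a feature-level decomposition of the concentration index can work, assuming the model's prediction is additive in per-feature contributions (as SHAP values are: ŷ = φ₀ + Σₖ φₖ), is to use the linearity of covariance. For a fixed socioeconomic ranking r, C(ŷ) = Σₖ 2 cov(φₖ, r) / μ, where μ is the mean prediction and the constant base value φ₀ contributes nothing. Positive terms widen the pro-rich (or pro-educated) gradient and negative terms flatten it. The sketch below is a hypothetical illustration of that identity, not the authors' exact procedure. It assumes the contributions are on the same scale as the prediction (e.g. probability rather than log-odds), and the feature names and values are toy inputs.

```python
from statistics import fmean


def fractional_ranks(ses):
    """Fractional rank in (0, 1), poorest first; ties share the mean rank."""
    n = len(ses)
    order = sorted(range(n), key=lambda i: ses[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end < n and ses[order[end]] == ses[order[start]]:
            end += 1
        r = (start + end) / (2 * n)
        for k in range(start, end):
            ranks[order[k]] = r
        start = end
    return ranks


def decompose_concentration_index(base_value, contributions, ses):
    """Split C of an additive prediction into per-feature terms.

    base_value: constant term of the additive explanation (phi_0).
    contributions: dict of feature name -> per-person contribution list,
    so that prediction_i = base_value + sum of contributions[k][i].
    Returns feature name -> 2 * cov(phi_k, r) / mean(prediction).
    """
    r = fractional_ranks(ses)
    r_bar = fmean(r)
    n = len(ses)
    # Reconstruct each person's prediction from the additive explanation.
    y_hat = [base_value + sum(phi[i] for phi in contributions.values())
             for i in range(n)]
    mu = fmean(y_hat)
    if mu == 0:
        raise ValueError("mean prediction must be non-zero")

    def cov_with_rank(values):
        v_bar = fmean(values)
        return fmean((v - v_bar) * (ri - r_bar) for v, ri in zip(values, r))

    return {name: 2 * cov_with_rank(phi) / mu
            for name, phi in contributions.items()}


if __name__ == "__main__":
    # Toy additive explanation for four women ranked poorest to richest.
    parts = decompose_concentration_index(
        0.1,
        {"education": [-0.05, 0.0, 0.05, 0.1],   # hypothetical, rises with rank
         "distance": [0.02, 0.01, 0.0, -0.01]},  # hypothetical, falls with rank
        [1, 2, 3, 4],
    )
    print(parts)  # education positive (widens), distance negative (flattens)
```

Because the terms sum exactly to the concentration index of the predictions, each one can be read as that feature's share of the measured gradient. This reading holds only under the additivity assumption stated above.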

Keywords: accessibility, breast cancer screening, concentration index, concentration index decomposition, health inequality, India, machine learning

Received: 21 Oct 2025; Accepted: 19 Dec 2025.

Copyright: © 2025 Tanveer, Ali, Majhi and Mukherjee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Moumita Mukherjee

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.