Evaluating artificial intelligence bias in nephrology: the role of diversity, equity, and inclusion in AI-driven decision-making and ethical regulation

Balakrishnan, Suryanarayanan; Thongprayoon, Charat; Wathanavasin, Wannasit; Miao, Jing; Mao, Michael A.; Craici, Iasmina M.; Cheungpasitporn, Wisit

doi:10.3389/frai.2025.1525937

BRIEF RESEARCH REPORT article

Front. Artif. Intell., 27 May 2025

Sec. Medicine and Public Health

Volume 8 - 2025 | https://doi.org/10.3389/frai.2025.1525937

This article is part of the Research TopicEthical and Legal Implications of Artificial Intelligence in Public Health: Balancing Innovation and PrivacyView all 10 articles

Evaluating artificial intelligence bias in nephrology: the role of diversity, equity, and inclusion in AI-driven decision-making and ethical regulation

Suryanarayanan Balakrishnan¹

Charat Thongprayoon¹

Wannasit Wathanavasin^1,2

Jing Miao¹

Michael A. Mao³

Iasmina M. Craici¹

Wisit Cheungpasitporn¹^*

¹Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, United States
²Nephrology Unit, Department of Medicine, Charoenkrung Pracharak Hospital, Bangkok, Thailand
³Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Jacksonville, FL, United States

Background: The integration of Artificial Intelligence (AI) in nephrology has raised concerns regarding bias, fairness, and ethical decision-making, particularly in the context of Diversity, Equity, and Inclusion (DEI). AI-driven models, including Large Language Models (LLMs) like ChatGPT, may unintentionally reinforce existing disparities in patient care and workforce recruitment. This study investigates how AI models (ChatGPT 3.5 and 4.0) handle DEI-related ethical considerations in nephrology, highlighting the need for improved regulatory oversight to ensure equitable AI deployment.

Methods: The study was conducted in March 2024 using ChatGPT 3.5 and 4.0. Eighty simulated cases were developed to assess ChatGPT’s decision-making across diverse nephrology topics. ChatGPT was instructed to respond to questions considering factors such as age, sex, gender identity, race, ethnicity, religion, cultural beliefs, socioeconomic status, education level, family structure, employment, insurance, geographic location, disability, mental health, language proficiency, and technology access.

Results: ChatGPT 3.5 provided a response to all scenario questions and did not refuse to make decisions under any circumstances. This contradicts the essential DEI principle of avoiding decisions based on potentially discriminatory criteria. In contrast, ChatGPT 4.0 declined to make decisions based on potentially discriminatory criteria in 13 (16.3%) scenarios during the first round and in 5 (6.3%) during the second round.

Conclusion: While ChatGPT 4.0 shows improvement in ethical AI decision-making, its limited recognition of bias and DEI considerations underscores the need for robust AI regulatory frameworks in nephrology. AI governance must incorporate structured DEI guidelines, ongoing bias detection mechanisms, and ethical oversight to prevent AI-driven disparities in clinical practice and workforce recruitment. This study emphasizes the importance of transparency, fairness, and inclusivity in AI development, calling for collaborative efforts between AI developers, nephrologists, policymakers, and patient communities to ensure AI serves as an equitable tool in nephrology.

Introduction

Kidney disease is a global health issue, affecting millions of people worldwide. Chronic Kidney Disease (CKD) is particularly prevalent, with an estimated 850 million people suffering from this condition (Jager et al., 2019). CKD is characterized by a gradual loss of kidney function over time, which can progress to End-Stage Kidney Disease (ESKD) if not properly managed. ESKD requires dialysis or kidney transplantation, both of which pose significant health and economic burdens (Jha et al., 2023). The field of nephrology, which deals with the diagnosis and treatment of kidney diseases, has been at the forefront of medical innovation for decades. However, like many areas of healthcare, it faces significant challenges in ensuring equitable access to care and outcomes for all patients (Vanholder et al., 2023). The effect is more pronounced among minorities especially among women from these groups right from their inclusion in research studies; this is further exacerbated by existing systemic biases (Pinho-Gomes et al., 2023; Mohammed et al., 2024). In recent years, the concepts of diversity, equity, and inclusion (DEI) have gained increasing prominence in healthcare, with nephrology being no exception. The “Kidney Care for All” initiative exemplifies this shift, advocating for a more inclusive approach to kidney health that addresses disparities and promotes equal access to quality care for all individuals, regardless of their background or socioeconomic status (Pais and Iyengar, 2023). Despite efforts to increase diversity in medical education and practice, the field of nephrology, like many medical specialties, still struggles with underrepresentation of minority groups among its practitioners (Salsberg et al., 2021).

Concurrently, the rapid advancement of Artificial Intelligence (AI), particularly Large Language Models (LLMs) like ChatGPT (Open AI. Introducing Chat GPT, 2024), has begun to reshape the landscape of healthcare decision-making. These AI systems, capable of processing vast amounts of data and generating human-like responses, hold immense potential for augmenting clinical decision-making, streamlining administrative processes, and potentially reducing healthcare disparities (Garcia Valencia et al., 2023). However, the integration of AI into nephrology and other medical fields also raises critical questions about the ethical implications and potential unintended consequences of relying on machine-generated insights for patient care and professional decisions. There are concerns that AI systems, if not properly designed and implemented, could perpetuate or even exacerbate existing health disparities (Omiye et al., 2023; Yang et al., 2024). For instance, if training data for AI models are not sufficiently diverse or representative, the resulting algorithms may perform poorly for certain population subgroups or reinforce biased decision-making patterns (Kuhlman et al., 2020).

Given the significant impact that AI could have on nephrology, it is imperative to assess how well these technologies adhere to DEI principles. This study focuses on evaluating the ethical sensitivity and decision-making capabilities of two versions of ChatGPT (3.5 and 4.0) in nephrology-related scenarios. By examining how these AI models handle DEI considerations, particularly with regard to underrepresented socio-demographic variables such as ethnicity, employment status, education, and religious beliefs, we aim to identify potential risks and areas for improvement in their design and implementation. These attributes, while often underappreciated in traditional clinical decision-making workflows, are critical to fostering equitable care. Our work seeks to advance the dialogue on AI fairness in nephrology by showcasing how ethical AI evaluation must go beyond clinical indicators to include the broader socio-cultural determinants of health.

Methods

In the context of AI in nephrology, diversity refers to the inclusion of individuals from varied demographic backgrounds in data and practice; equity involves ensuring fair treatment and access to care and opportunities; and inclusion emphasizes meaningful engagement of underrepresented groups in AI design and implementation. These principles guided the development of our simulation scenarios and the evaluation of AI model responses.

Simulated cases development

A total of 80 simulated cases were collaboratively developed by two board-certified nephrologists (CT and WC) with expertise in DEI and clinical ethics. Each case was informed by real-world nephrology practice and ethical dilemmas and incorporated social determinants of health relevant to DEI considerations. The cases were iteratively reviewed to ensure clinical plausibility, decision-making complexity, and DEI sensitivity. Each scenario included four response options representing a gradient of ethical appropriateness: (1) ethically aligned and inclusive, (2) partially biased or utilitarian, (3) clearly discriminatory, and (4) neutral or non-committal. To promote transparency and reproducibility, all 80 scenarios and their response options are provided as Online Supplementary, along with their corresponding DEI domain and clinical context.

For each simulated case, four multiple-choice responses were carefully developed by the nephrologist authors to reflect common patterns of decision-making: one ethically aligned and inclusive option, one partially biased or utilitarian option, one clearly discriminatory option, and one neutral or non-committal response. This structure allowed us to assess the AI models’ ability to distinguish ethically sound recommendations from biased or inappropriate ones. All choices were reviewed to ensure internal consistency, clinical plausibility, and DEI relevance.

Evaluation process

ChatGPT was instructed to select the best response from four provided choices for each scenario. The AI was guided to prioritize decisions based on factors including age, sex, gender identity, sexual orientation, race, ethnicity, religion, cultural beliefs, socioeconomic status, education level, family structure, employment, insurance, geographic location, disability, impairment, mental health, language proficiency, and technology access (Figure 1). In alignment with DEI principles, ChatGPT was designed to decline making decisions when the background information could potentially lead to discriminatory outcomes.

Figure 1

Flowchart detailing a study on AI in nephrology. Steps: (1) Simulation case creation in March 2024 by two nephrologists; (2) Review for accuracy, relevance, and ethics; (3) Evaluation of ChatGPT 3.5 and 4.0 for decision-making and DEI sensitivity; (4) Performance comparison—ChatGPT 3.5 showed a utilitarian approach, while ChatGPT 4.0 refused decisions in 16.25% of cases based on discriminatory criteria; (5) Conclusion with ethical insights and AI regulation recommendations in nephrology.

Figure 1. Flowchart of the methodology for AI evaluation in nephrology simulation cases.

The study was conducted in March 2024 using both ChatGPT 3.5 and 4.0. Each model was queried twice at one-month intervals to assess the consistency of their responses over time.

Results

The full set of ChatGPT 3.5 and 4.0 responses to all 80 simulated scenarios is presented in the Online Supplementary Materials. This includes the selected response by each model, whether a refusal occurred (along with the stated rationale), and the consistency of responses across rounds for ChatGPT 4.0.

ChatGPT 3.5 answered all 80 scenarios without refusal, demonstrating a utilitarian approach that did not account for potential ethical conflicts. In contrast, ChatGPT 4.0 declined to make decisions based on potentially discriminatory criteria in 13 scenarios (16.3%) during the first round and in 5 scenarios (6.3%) during the second round. It provided consistent responses across both rounds in 67 scenarios (83.8%), suggesting moderate reliability in its ethical decision-making behavior.

These refusals predominantly involved scenarios in which decisions were explicitly linked to sensitive sociodemographic variables. For example, in one scenario regarding workforce recruitment, the model declined to prioritize candidates based on religious affiliation, stating that such decisions should be made based on qualifications rather than personal beliefs. In another case involving transplant eligibility, it refused to recommend prioritizing patients by immigration status, emphasizing the ethical obligation to treat all individuals equitably. Additional examples included declining to select candidates for living kidney donation promotion based on employment type (e.g., self-employed vs. corporate), or selecting dialysis access strategies based on language proficiency. A complete list of these 13 scenarios and the specific DEI attributes involved is available in online supplementary. These findings underscore the model’s partial sensitivity to ethically contentious prompts, though they also reveal inconsistency, as similar cases were occasionally answered without objection.

Figure 2 demonstrates examples of GPT output regarding critical considerations in nephrology treatment and workforce diversity.

Figure 2

Screenshot collage of ChatGPT responses to ethically sensitive nephrology-related questions. Topics include prioritizing ethnic groups, employment status, employment type, and religion in treatment or workforce decisions. ChatGPT selected: “Black” for kidney treatment prioritization, “Full-time employment” for diabetic nephropathy treatment, “Self-employed” for living kidney donation promotion, and “Christianity” for ensuring a diverse workforce. Each answer is followed by an explanatory paragraph.

Figure 2. Examples of outputs from GPT-3.5 and GPT-4.0.

In Figure 2, the left column presents outputs from GPT-3.5, while the right column features responses from GPT-4.0. For GPT-3.5, question #1 inquires about which ethnic group should be prioritized for a new kidney disease treatment. The model recommends prioritizing Black individuals, citing their historical disproportionate impact from kidney disease, which necessitates tailored treatment considerations to address their specific needs. In question #21, regarding employment types to prioritize for promoting living kidney donations, GPT-3.5 suggests focusing on self-employed individuals, as they often have more flexibility in managing their schedules, making it easier for them to accommodate the donation process and recovery time. On the right side, GPT-4.0 addresses question #36, which asks about the employment status to consider for new diabetic nephropathy treatments. The model advocates for prioritizing individuals in full-time employment, noting that this demographic is more likely to have access to employer-sponsored health insurance, thus improving equity in treatment access. In question #78, concerning which religious belief should be factored in when selecting candidates for nephrologist positions to ensure workforce diversity, GPT-4.0 emphasizes the importance of considering candidates from diverse religious backgrounds, specifically highlighting Christianity, to foster an inclusive environment that enhances cultural competency in patient care.

Discussion

The findings of this study reveal a marked difference in ethical decision-making capabilities between ChatGPT 3.5 and 4.0 in nephrology-related scenarios. ChatGPT 3.5 consistently selected treatment choices predicted to yield the best outcomes across all scenarios, demonstrating a utilitarian approach that incorporated various DEI factors. However, it did not refuse to make decisions in any scenario, reflecting a lack of sensitivity to potentially discriminatory criteria. In contrast, ChatGPT 4.0 declined to make decisions based on potentially discriminatory criteria in 16.25% of scenarios, explicitly stating that DEI factors should not affect decisions about treating patients or hiring nephrology staff. While this shows an improvement in ethical sensitivity, the relatively low refusal rate was unexpected, highlighting areas for further enhancement.

The observed improvement in ChatGPT 4.0’s handling of DEI-sensitive decisions underscores the iterative progress in AI ethical alignment, likely influenced by updated reinforcement learning techniques and refined safety layers in the model’s training pipeline (Sheikh et al., 2025; Alam et al., 2025; Miao et al., 2025). While not yet optimal, ChatGPT 4.0’s behavior reflects a greater sensitivity to fairness principles, selectively deferring decisions that could result in biased clinical or hiring recommendations. These changes are promising, as they suggest that ethical behavior in LLMs can be enhanced over successive versions. Nevertheless, the relatively low overall refusal rate indicates that existing mechanisms remain insufficient to fully safeguard against implicit bias, and further tuning—both algorithmic and regulatory—is essential to prevent harm.

Existing literature on AI in healthcare highlights both the potential benefits and risks associated with AI integration (Nazer et al., 2023; Ali et al., 2023). Previous studies have demonstrated AI’s capacity to enhance clinical decision-making and patient outcomes, but they also raise concerns about the perpetuation of biases inherent in training data (Omiye et al., 2023; Ayoub et al., 2024; Parikh et al., 2019). Obermeyer et al. (2019) showed in their study of racial bias present in a popular commercial algorithm for risk stratification used in the healthcare system. Our findings align with these concerns, illustrating that while AI can make clinically sound decisions, its consideration of ethical issues, particularly DEI, remains imperfect. The differences in performance between ChatGPT 3.5 and 4.0 likely stem from advancements in model training and updates in ethical guidelines integrated into the AI’s framework. At the same time, we acknowledge that the relatively low refusal rate points to the need for further refinement of AI models and the development of more robust ethical frameworks. To address this, our discussion has been expanded to emphasize that while the improvements observed in ChatGPT 4.0 are encouraging, they also serve as a catalyst for continued research. Future directions will focus on refining training datasets, enhancing algorithmic sensitivity to DEI factors, and incorporating interdisciplinary insights to align AI recommendations more closely with clinical judgment. Such measures are anticipated to support the responsible integration of AI into clinical practice, ensuring that these systems become reliable tools for promoting equitable healthcare outcomes.

Beyond the specific case of ChatGPT, DEI-related concerns are increasingly relevant across the broader spectrum of AI applications in nephrology. AI systems in clinical settings are not only used for decision support but also influence patient triage, diagnostic accuracy, treatment planning, and workforce recruitment (Miao et al., 2025; Koirala et al., 2025; Pham et al., 2024). If AI models are not carefully designed with DEI principles in mind, they risk reinforcing existing disparities in kidney care, including biases related to race, socioeconomic status, and geographic location. Moreover, many AI models are developed in high-income countries with datasets that may not be representative of the global population, further exacerbating inequities in nephrology care. Addressing these issues requires greater scrutiny of model training processes, the diversity of datasets, and the interpretability of AI-driven recommendations (Ferryman et al., 2023; Meng et al., 2022). Future research will be directed toward evaluating these alternative systems to compare their performance in handling DEI considerations. This broader approach will not only enrich our understanding of AI’s potential benefits and pitfalls in nephrology but also inform the development of more robust regulatory and ethical frameworks for AI integration across the discipline. Additionally, interdisciplinary collaboration between AI developers, ethicists, nephrologists, and policymakers will be essential in ensuring that AI systems are aligned with clinical needs while promoting health equity (Abbasgholizadeh Rahimi et al., 2024).

The implications of these findings for the integration of AI in nephrology and broader healthcare contexts are significant. While AI has the potential to enhance decision-making and improve patient outcomes, this study underscores the importance of robust ethical frameworks and careful regulation. The fact that even the more advanced ChatGPT 4.0 model failed to recognize potentially discriminatory factors in the majority of cases emphasizes the need for human oversight and the importance of using AI as a supportive tool rather than a replacement for human judgment in critical medical decisions. Furthermore, this study highlights the need for ongoing research and development in AI ethics, particularly in healthcare applications. As AI models continue to evolve rapidly, it is crucial to regularly assess their ethical decision-making capabilities and identify areas for improvement. This may involve developing more sophisticated training datasets representing the entire population at large that better represent diverse populations and ethical scenarios, as well as refining the algorithms that govern AI decision-making processes (Gaonkar et al., 2023; Mudgal and Das, 2020; Ueda et al., 2024).

Additionally, there is a clear need for interdisciplinary collaboration between AI developers, ethicists, healthcare professionals, and policymakers to ensure that AI systems are designed and implemented in ways that promote equity and avoid perpetuating or exacerbating existing healthcare disparities (Ueda et al., 2024; Dankwa-Mullan, 2024; Mennella et al., 2024; Walsh et al., 2023). Moreover, the study’s findings have important implications for medical education and professional development in nephrology and other healthcare fields. As AI becomes increasingly integrated into clinical practice, it is essential that healthcare professionals are trained not only in the use of these technologies but also in critically evaluating their outputs and understanding their ethical limitations (Garcia Valencia et al., 2023; Ueda et al., 2024). This includes developing skills in recognizing potential biases in AI-generated recommendations and maintaining a commitment to equitable, patient-centered care. By fostering a workforce that is both technologically adept and ethically grounded, the field of nephrology can work toward harnessing the benefits of AI while upholding its commitment to diversity, equity, and inclusion in all aspects of patient care and professional practice (Figure 3).

Figure 3

Flowchart outlining limitations and challenges in implementing AI in nephrology, with six main categories: data quality and bias, clinical implementation complexity, need for data and ethics, clinician interpretation and training, and trust. Each category leads to targeted solutions such as developing bias-free algorithms, integrating AI with workflows, addressing ethical data concerns, training professionals in AI, and demonstrating improved outcomes. The goal is equitable, transparent, and effective AI adoption in nephrology.

Figure 3. Limitations and challenges and implementing AI in nephrology.

The development of standardized DEI-related ethical guidelines for AI in nephrology is essential to ensuring that AI-driven decision-making aligns with the principles of fairness, transparency, and inclusivity (Abbasgholizadeh Rahimi et al., 2024; Solomonides et al., 2022). These guidelines should provide a structured framework requiring AI developers to incorporate diverse, representative training datasets, implement bias detection and mitigation strategies, and prioritize explainability so that clinicians can critically assess AI-generated recommendations (Chin et al., 2023). Furthermore, interdisciplinary collaboration among AI developers, nephrologists, bioethicists, and policymakers is necessary to create AI systems that are both clinically effective and socially responsible. AI models should undergo continuous auditing to evaluate their adherence to DEI principles, with regulatory oversight ensuring that biases are addressed over time. Healthcare institutions must also take an active role in developing policies that support responsible AI use in both patient care and workforce recruitment, ensuring that AI-driven decisions contribute to equitable healthcare (Abbasgholizadeh Rahimi et al., 2024). Future research should focus on refining these frameworks to optimize AI deployment in a manner that enhances healthcare access and outcomes for all patients.

Ensuring the ethical performance of AI models in nephrology requires a continuous monitoring and evaluation process that extends throughout the entire lifecycle of these systems (Abbasgholizadeh Rahimi et al., 2024). AI outputs can shift over time due to changes in training data, algorithm modifications, or evolving clinical practices. To prevent the unintended reinforcement of biases, a structured oversight framework should be established (Economou-Zavlanos et al., 2024). This framework should include real-time bias detection mechanisms integrated into AI deployment systems, periodic audits by interdisciplinary teams consisting of nephrologists, AI ethicists, data scientists, and regulatory experts, and ongoing user feedback loops that allow clinicians to report discrepancies or concerns regarding AI-generated recommendations. Additionally, regulatory bodies and healthcare institutions should conduct periodic evaluations to ensure that AI-driven decisions remain equitable and do not disproportionately disadvantage any patient populations. Transparency in reporting AI performance metrics such as disparities in AI decision-making across demographic groups will be critical in building confidence in AI-assisted nephrology (Chin et al., 2023; Economou-Zavlanos et al., 2024).

To ensure responsible AI use in nephrology, a structured process for addressing ethical concerns must be established. This should include both a dedicated ethics committee and a formal reporting mechanism for healthcare professionals (Economou-Zavlanos et al., 2024). The ethics committee should be an interdisciplinary body composed of nephrologists, AI ethicists, data scientists, legal experts, and patient advocates. Its role would be to evaluate ethical concerns related to AI applications, provide guidance on ethical AI implementation, and develop strategies to mitigate bias and ensure fairness. In parallel, a structured reporting mechanism should be created to allow healthcare professionals to flag AI-generated recommendations that appear biased, ethically questionable, or inconsistent with established clinical guidelines (Economou-Zavlanos et al., 2024). Reports should be reviewed systematically, with clear channels for follow-up and corrective action. Regular audits of AI performance, along with clinician and patient feedback, should inform ongoing improvements to AI models (Economou-Zavlanos et al., 2024). Establishing these processes will help maintain ethical integrity in AI-driven nephrology and ensure that AI tools are used in a manner that upholds DEI principles while enhancing patient care.

Active collaboration with patient communities is a critical component of AI development and deployment in nephrology. Engaging patients throughout the AI lifecycle allows their perspectives to be incorporated into model design, leading to AI systems that are more aligned with the diverse needs of the nephrology population (Lammons et al., 2023; Adus et al., 2023). Establishing patient advisory panels can provide valuable insights into AI-generated recommendations, helping to identify potential gaps or biases in decision-making. Additionally, focus groups with patients from different demographic and socioeconomic backgrounds can highlight concerns, expectations, and trust levels regarding AI-driven healthcare tools (Adus et al., 2023). Transparency in AI decision-making is essential, and efforts should be made to present AI-generated recommendations in a way that is understandable and actionable for both clinicians and patients. Patient advocacy organizations play a key role in facilitating these collaborations by acting as intermediaries between AI developers, healthcare providers, and patient communities.

This study has several notable strengths. It is one of the first systematic evaluations of how large language models respond to ethical scenarios involving DEI considerations within nephrology. The simulation-based design allows for structured comparison across different model versions (ChatGPT 3.5 and 4.0) using controlled variables. The inclusion of real-world DEI variables such as religion, race, employment, and immigration status provides ecological validity and highlights relevant ethical challenges in clinical care and workforce recruitment. Additionally, the full disclosure of simulated cases and AI responses enhances transparency and reproducibility.

While the simulated cases were constructed to reflect a wide range of real-world nephrology scenarios with DEI relevance, we recognize that the cases were authored by a limited number of clinicians and may unintentionally reflect biases in scenario framing or emphasis. Certain cultural or regional contexts, particularly those affecting underrepresented populations globally, may not be fully captured. These limitations could influence how AI models respond and may underrepresent ethical nuances faced by specific communities. Further studies involving larger and more diverse datasets, as well as real-world clinical trials, are necessary to validate and expand upon these results. Future research should focus on expanding case diversity by incorporating a broader range of scenarios and involving a more diverse group of nephrologists in case development. Implementing and evaluating AI models in actual clinical settings will be crucial to assess their performance and ethical sensitivity in real-time decision-making. Additionally, continuous refinement of AI models is essential to enhance their ability to recognize and appropriately handle DEI considerations (Ueda et al., 2024). These efforts will inform the development of robust regulatory frameworks and training protocols that emphasize ethical decision-making and DEI sensitivity. Although our simulated scenarios were created and evaluated by two nephrologists, we recognize that additional insights from a broader group of clinicians may further enrich the evaluation of AI outputs. In future work, we plan to incorporate perspectives from a diverse range of practicing clinicians to refine our findings and ensure that our ethical evaluation framework reflects varied clinical viewpoints. This approach is anticipated to further strengthen our assessment of AI performance and contribute to the development of thoughtful regulatory guidelines for AI integration into clinical practice.

To enhance DEI sensitivity in AI models, we recommend several concrete steps: (1) incorporating demographically diverse and representative training datasets; (2) embedding DEI-specific audit mechanisms within model evaluation pipelines; (3) using adversarial testing that challenges models with ethically complex scenarios; and (4) requiring transparent reporting of performance disparities across subgroups. Achieving these goals will require sustained interdisciplinary collaboration. Clinicians can define real-world ethical constraints; data scientists can implement fairness-aware algorithms; ethicists can guide value alignment; and patient advocates can ensure community relevance. This collective approach is essential to creating AI systems that are both clinically robust and socially responsible.

In summary, this study demonstrates that while advancements have been made in AI ethical sensitivity, as seen in ChatGPT 4.0, there is still considerable room for improvement. The relatively low rate of refusal to engage in potentially discriminatory decision-making underscores the need for ongoing refinement of AI models. By addressing these challenges and implementing robust regulatory and training frameworks, we can better ensure that AI systems not only enhance clinical decision-making but also uphold the principles of diversity, equity, and inclusion. These efforts will contribute to more equitable healthcare delivery and better outcomes for all patients in nephrology.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

SB: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. CT: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – original draft, Writing – review & editing. WW: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. JM: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. MM: Conceptualization, Data curation, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – original draft, Writing – review & editing. IC: Conceptualization, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. WC: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that Gen AI was used in the creation of this manuscript. The use of ChatGPT in this study was strictly limited to the response-generating protocol described in the methods section. ChatGPT was not used for data analysis or any other aspects of the production of this manuscript. The authors are responsible for all contents.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frai.2025.1525937/full#supplementary-material

References

Abbasgholizadeh Rahimi, S., Shrivastava, R., Brown-Johnson, A., Caidor, P., Davies, C., Idrissi Janati, A., et al. (2024). EDAI framework for integrating equity, diversity, and inclusion throughout the lifecycle of AI to improve health and Oral health care: qualitative study. J. Med. Internet Res. 26:e63356. doi: 10.2196/63356

PubMed Abstract | Crossref Full Text | Google Scholar

Adus, S., Macklin, J., and Pinto, A. (2023). Exploring patient perspectives on how they can and should be engaged in the development of artificial intelligence (AI) applications in health care. BMC Health Serv. Res. 23:1163. doi: 10.1186/s12913-023-10098-2

PubMed Abstract | Crossref Full Text | Google Scholar

Alam, S. F., Thongprayoon, C., Miao, J., Pham, J. H., Sheikh, M. S., Garcia Valencia, O. A., et al. (2025). Advancing personalized medicine in digital health: the role of artificial intelligence in enhancing clinical interpretation of 24-h ambulatory blood pressure monitoring. Digit Health. 11:20552076251326014. doi: 10.1177/20552076251326014

PubMed Abstract | Crossref Full Text | Google Scholar

Ali, O., Abdelbaki, W., Shrestha, A., Elbasi, E., Alryalat, M. A. A., and Dwivedi, Y. K. (2023). A systematic literature review of artificial intelligence in the healthcare sector: benefits, challenges, methodologies, and functionalities. J. Innov. Knowl. 8:100333. doi: 10.1016/j.jik.2023.100333

Crossref Full Text | Google Scholar

Ayoub, N. F., Balakrishnan, K., Ayoub, M. S., Barrett, T. F., David, A. P., and Gray, S. T. (2024). Inherent Bias in large language models: a random sampling analysis. Mayo Clinic Proc Digital Health. 2, 186–191. doi: 10.1016/j.mcpdig.2024.03.003

PubMed Abstract | Crossref Full Text | Google Scholar

Chin, M. H., Afsar-Manesh, N., Bierman, A. S., Chang, C., Colón-Rodríguez, C. J., Dullabh, P., et al. (2023). Guiding principles to address the impact of algorithm Bias on racial and ethnic disparities in health and health care. JAMA Netw. Open 6:e2345050. doi: 10.1001/jamanetworkopen.2023.45050

PubMed Abstract | Crossref Full Text | Google Scholar

Dankwa-Mullan, I. (2024). Health equity and ethical considerations in using artificial intelligence in public health and medicine. Prev. Chronic Dis. 21:E64. doi: 10.5888/pcd21.240245

PubMed Abstract | Crossref Full Text | Google Scholar

Economou-Zavlanos, N. J., Bessias, S., Cary, M. P. Jr., Bedoya, A. D., Goldstein, B. A., Jelovsek, J. E., et al. (2024). Translating ethical and quality principles for the effective, safe and fair development, deployment and use of artificial intelligence technologies in healthcare. J. Am. Med. Inform. Assoc. 31, 705–713. doi: 10.1093/jamia/ocad221

PubMed Abstract | Crossref Full Text | Google Scholar

Ferryman, K., Mackintosh, M., and Ghassemi, M. (2023). Considering biased data as informative artifacts in AI-assisted health care. N. Engl. J. Med. 389, 833–838. doi: 10.1056/NEJMra2214964

PubMed Abstract | Crossref Full Text | Google Scholar

Gaonkar, B., Cook, K., and Macyszyn, L. (2023). Ethical issues arising due to Bias in training a.I. Algorithms in healthcare and data sharing as a potential solution. AI Ethics J. 1:916. doi: 10.47289/AIEJ20200916

Crossref Full Text | Google Scholar

Garcia Valencia, O. A., Suppadungsuk, S., Thongprayoon, C., Miao, J., Tangpanithandee, S., Craici, I. M., et al. (2023). Ethical implications of Chatbot utilization in nephrology. J. Pers. Med. 13:1393. doi: 10.3390/jpm13091363

PubMed Abstract | Crossref Full Text | Google Scholar

Jager, K. J., Kovesdy, C., Langham, R., Rosenberg, M., Jha, V., and Zoccali, C. (2019). A single number for advocacy and communication-worldwide more than 850 million individuals have kidney diseases. Nephrol. Dial. Transplant. 34, 1803–1805. doi: 10.1093/ndt/gfz174

PubMed Abstract | Crossref Full Text | Google Scholar

Jha, V., Al-Ghamdi, S. M. G., Li, G., Wu, M. S., Stafylas, P., Retat, L., et al. (2023). Global economic burden associated with chronic kidney disease: a pragmatic review of medical costs for the inside CKD research Programme. Adv. Ther. 40, 4405–4420. doi: 10.1007/s12325-023-02608-9

PubMed Abstract | Crossref Full Text | Google Scholar

Koirala, P., Thongprayoon, C., Miao, J., Garcia Valencia, O. A., Sheikh, M. S., Suppadungsuk, S., et al. (2025). Evaluating AI performance in nephrology triage and subspecialty referrals. Sci. Rep. 15:3455. doi: 10.1038/s41598-025-88074-5

PubMed Abstract | Crossref Full Text | Google Scholar

Kuhlman, C., Jackson, L., and Chunara, R. (2020). “No Computation without Representation: Avoiding Data and Algorithm Biases through Diversity,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Virtual Event, CA: Association for Computing Machinery), 3593.

Google Scholar

Lammons, W., Silkens, M., Hunter, J., Shah, S., and Stavropoulou, C. (2023). Centering public perceptions on translating AI into clinical practice: patient and public involvement and engagement consultation focus group study. J. Med. Internet Res. 25:e49303. doi: 10.2196/49303

Crossref Full Text | Google Scholar

Meng, C., Trinh, L., Xu, N., Enouen, J., and Liu, Y. (2022). Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Sci. Rep. 12:7166. doi: 10.1038/s41598-022-11012-2

PubMed Abstract | Crossref Full Text | Google Scholar

Mennella, C., Maniscalco, U., De Pietro, G., and Esposito, M. (2024). Ethical and regulatory challenges of AI technologies in healthcare: a narrative review. Heliyon. 10:e26297. doi: 10.1016/j.heliyon.2024.e26297

PubMed Abstract | Crossref Full Text | Google Scholar

Miao, J., Thongprayoon, C., Kashani, K. B., and Cheungpasitporn, W. (2025). Artificial intelligence as a tool for improving health literacy in kidney care. PLOS Digit Health. 4:e0000746. doi: 10.1371/journal.pdig.0000746

PubMed Abstract | Crossref Full Text | Google Scholar

Mohammed, Y. N., Khurana, S., Gulati, A., Rahaman, Z., Lohana, A. C., and Santosh, R. (2024). Advancing healthcare equity in nephrology: addressing racial and ethnic disparities in research trials and treatment strategies. Cureus. 16:e56913. doi: 10.7759/cureus.56913

PubMed Abstract | Crossref Full Text | Google Scholar

Mudgal, K. S., and Das, N. (2020). The ethical adoption of artificial intelligence in radiology. BJR|Open. 2:20190020. doi: 10.1259/bjro.20190020

PubMed Abstract | Crossref Full Text | Google Scholar

Nazer, L. H., Zatarah, R., Waldrip, S., Ke, J. X. C., Moukheiber, M., Khanna, A. K., et al. (2023). Bias in artificial intelligence algorithms and recommendations for mitigation. PLOS Digit Health. 2:e0000278. doi: 10.1371/journal.pdig.0000278

PubMed Abstract | Crossref Full Text | Google Scholar

Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453. doi: 10.1126/science.aax2342

PubMed Abstract | Crossref Full Text | Google Scholar

Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V., and Daneshjou, R. (2023). Large language models propagate race-based medicine. NPJ Digital Med. 6:195. doi: 10.1038/s41746-023-00939-z

PubMed Abstract | Crossref Full Text | Google Scholar

Open AI. Introducing Chat GPT (2024). Available online at: https://chat.openai.com/ (Accessed March 2, 2024).

Google Scholar

Pais, P., and Iyengar, A. (2023). Kidney Care for all: addressing the gaps in an imperfect world-a global perspective on improving access to kidney Care in low-Resource Settings. Kidney 4, 982–986. doi: 10.34067/KID.0000000000000128

Crossref Full Text | Google Scholar

Parikh, R. B., Teeple, S., and Navathe, A. S. (2019). Addressing Bias in artificial intelligence in health care. JAMA 322, 2377–2378. doi: 10.1001/jama.2019.18058

PubMed Abstract | Crossref Full Text | Google Scholar

Pham, J. H., Thongprayoon, C., Miao, J., Suppadungsuk, S., Koirala, P., Craici, I. M., et al. (2024). Large language model triaging of simulated nephrology patient inbox messages. Front. Artif Intell. 7:1452469. doi: 10.3389/frai.2024.1452469

PubMed Abstract | Crossref Full Text | Google Scholar

Pinho-Gomes, A. C., Carcel, C., Woodward, M., and Hockham, C. (2023). Women's representation in clinical trials of patients with chronic kidney disease. Clin. Kidney J. 16, 1457–1464. doi: 10.1093/ckj/sfad018

PubMed Abstract | Crossref Full Text | Google Scholar

Salsberg, E., Richwine, C., Westergaard, S., Portela Martinez, M., Oyeyemi, T., Vichare, A., et al. (2021). Estimation and comparison of current and future racial/ethnic representation in the US health care workforce. JAMA Netw. Open 4:e213789-e. doi: 10.1001/jamanetworkopen.2021.3789

PubMed Abstract | Crossref Full Text | Google Scholar

Sheikh, M. S., Kashani, K. B., Gregoire, J. R., Thongprayoon, C., Miao, J., Craici, I. M., et al. (2025). Digital transformation of nephrology POCUS education-integrating a multiagent, artificial intelligence, and human collaboration-enhanced curriculum with expert feedback. Digit Health. 11:20552076251328807. doi: 10.1177/20552076251328807

PubMed Abstract | Crossref Full Text | Google Scholar

Solomonides, A. E., Koski, E., Atabaki, S. M., Weinberg, S., McGreevey, J. D. III, Kannry, J. L., et al. (2022). Defining AMIA's artificial intelligence principles. J. Am. Med. Inform. Assoc. 29, 585–591. doi: 10.1093/jamia/ocac006

PubMed Abstract | Crossref Full Text | Google Scholar

Ueda, D., Kakinuma, T., Fujita, S., Kamagata, K., Fushimi, Y., Ito, R., et al. (2024). Fairness of artificial intelligence in healthcare: review and recommendations. Jpn. J. Radiol. 42, 3–15. doi: 10.1007/s11604-023-01474-3

PubMed Abstract | Crossref Full Text | Google Scholar

Vanholder, R., Annemans, L., Braks, M., Brown, E. A., Pais, P., Purnell, T. S., et al. (2023). Inequities in kidney health and kidney care. Nat. Rev. Nephrol. 19, 694–708. doi: 10.1038/s41581-023-00745-6

PubMed Abstract | Crossref Full Text | Google Scholar

Walsh, G., Stogiannos, N., van de Venter, R., Rainey, C., Tam, W., McFadden, S., et al. (2023). Responsible AI practice and AI education are central to AI implementation: a rapid review for all medical imaging professionals in Europe. BJR|Open. 5:33. doi: 10.1259/bjro.20230033

PubMed Abstract | Crossref Full Text | Google Scholar

Yang, Y., Liu, X., Jin, Q., Huang, F., and Lu, Z. (2024). Unmasking and quantifying racial Bias of large language models in medical report generation. ArXiv. 4:601. doi: 10.1038/s43856-024-00601-z

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: artificial intelligence, diversity, equity, and inclusion, nephrology, bias detection, ethical AI regulation, decision-making, ChatGPT, clinical implications

Citation: Balakrishnan S, Thongprayoon C, Wathanavasin W, Miao J, Mao MA, Craici IM and Cheungpasitporn W (2025) Evaluating artificial intelligence bias in nephrology: the role of diversity, equity, and inclusion in AI-driven decision-making and ethical regulation. Front. Artif. Intell. 8:1525937. doi: 10.3389/frai.2025.1525937

Received: 10 November 2024; Accepted: 12 May 2025;
Published: 27 May 2025.

Edited by:

Filippo Gibelli, University of Camerino, Italy

Reviewed by:

Yuri Battaglia, University of Verona, Italy
Dinh Tuan Phan Le, New York City Health and Hospitals Corporation, United States
Ghulam Abbas, Southeast University, China

Copyright © 2025 Balakrishnan, Thongprayoon, Wathanavasin, Miao, Mao, Craici and Cheungpasitporn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wisit Cheungpasitporn, d2NoZXVuZ3Bhc2l0cG9ybkBnbWFpbC5jb20=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.