
REVIEW article

Front. Nutr., 07 August 2025

Sec. Nutrition Methodology

Volume 12 - 2025 | https://doi.org/10.3389/fnut.2025.1635682

Large language models in clinical nutrition: an overview of their applications, capabilities, limitations, and future prospects

  • 1Endocrinology Center Hasselt, Hasselt, Belgium
  • 2Department of Nutrition, UZ Brussel, Vrije Universiteit Brussel (VUB), Brussels, Belgium

The integration of large language models (LLMs) into clinical nutrition marks a transformative advancement, offering promising solutions for enhancing patient care, personalizing dietary recommendations, and supporting evidence-based clinical decision-making. Trained on extensive text corpora and powered by transformer-based architectures, LLMs demonstrate remarkable capabilities in natural language understanding and generation. This review provides an overview of their current and potential applications in clinical nutrition, focusing on key technologies including prompt engineering, fine-tuning, retrieval-augmented generation, and multimodal integration. These enhancements increase domain relevance, factual accuracy, and contextual responsiveness, enabling LLMs to deliver more reliable outputs in nutrition-related tasks. Recent studies have shown LLMs’ utility in dietary planning, nutritional education, obesity management, and malnutrition risk assessment. Despite these advances, challenges remain. Limitations in reasoning, factual accuracy, and domain specificity, along with risks of bias and hallucination, underscore the need for rigorous validation and human oversight. Furthermore, ethical considerations, environmental costs, and infrastructural integration must be addressed before widespread adoption. Future directions include combining LLMs with predictive analytics, integrating them with electronic health records and wearables, and adapting them for multilingual, culturally sensitive dietary guidance. LLMs also hold potential as research and educational tools, assisting in literature synthesis and patient engagement. Their transformative promise depends on cross-disciplinary collaboration, responsible deployment, and clinician training. 
Ultimately, while LLMs are not a replacement for healthcare professionals, they offer powerful augmentation tools for delivering scalable, personalized, and data-driven nutritional care in an increasingly complex healthcare environment.

1 Introduction

The field of clinical nutrition is facing a transformative change with the advent of large language models (LLMs), a domain within artificial intelligence (AI). These advanced systems are trained on vast datasets and exhibit remarkable generative capabilities (1–8). LLMs can autonomously interpret complex queries, synthesize information, and produce diverse, human-like responses (9–11). In clinical nutrition, where decision-making often involves integrating patient-specific data, scientific evidence, and evolving guidelines, the potential of LLMs is profound (1, 12). They promise to streamline workflows, enhance personalized care, and support clinicians in making data-driven decisions. However, while their capabilities are impressive, understanding their role, limitations, and ethical considerations is essential for responsible integration into clinical practice (2, 13–15).

Clinical nutrition involves screening, diagnosing, treating, and monitoring patients with specific nutritional issues or diseases that require dietary adjustments. To support this process, nutrition specialists rely on medical records, anthropometric measurements, laboratory results, and dietary information to develop personalized nutritional plans that align with current scientific guidelines. LLMs can significantly enhance and streamline this workflow by analyzing patient data, incorporating evidence-based guidelines, and assisting physicians or dietitians in the diagnosis and management of nutritional problems.

Although LLMs and LLM-based tools such as ChatGPT (Generative Pretrained Transformer) are widely adopted across various industries, their potential and application within clinical nutrition remain largely unexplored. Before these technologies can be integrated into routine practice, it is essential that nutrition specialists gain a thorough understanding of their underlying mechanisms, capabilities, and limitations. Furthermore, LLMs must operate transparently and provide users with the ability to verify the sources upon which their recommendations are based.

This article explores how LLMs are shaping the future of clinical nutrition, offering insights into their applications, benefits, challenges, and the potential to revolutionize patient care.

2 Natural language processing

Natural Language Processing (NLP) is a multidisciplinary field at the intersection of linguistics, computer science, and artificial intelligence, aiming to enable machines to understand, process, and generate human language in a meaningful way. NLP applications rely on diverse methodologies, ranging from traditional rule-based systems to cutting-edge machine learning techniques (16). Within the broader field of NLP, LLMs have emerged as a transformative technology. They represent a specialized class of machine learning models designed to handle complex language tasks by leveraging vast amounts of pretraining data (17–19).

Building on their general capabilities, LLMs enhance the functionality of NLP systems by allowing more accurate interpretation of complex language, including specialized terminology and context-dependent meaning. When applied to clinical domains, these models can be adapted to handle discipline-specific content with a high degree of relevance. In clinical nutrition, this opens the possibility to efficiently process and interpret diverse sources of textual information, such as dietary records, medical notes, and scientific publications, supporting clinicians in translating data into meaningful, individualized nutritional advice.

3 LLM types and architecture

Understanding the types and underlying architectures of LLMs is crucial to appreciating how they process and generate language in clinical nutrition applications. LLMs are built on a type of deep learning architecture known as the transformer, which has become foundational in NLP (20). Transformers process language by dividing text into units called tokens, which may represent words, parts of words, or characters; these tokens are then converted into numerical representations, allowing the model to analyze relationships and contextual meaning (21–23). A key feature of transformer-based models is their attention mechanism, which enables them to weigh the relevance of different words in a sentence or paragraph, even if they are far apart. This mechanism is what allows LLMs to interpret nuanced queries and maintain contextual coherence over long passages of text (24–26). Transformer models generally fall into three categories based on how they process information: encoder-only, decoder-only, and encoder-decoder models (27, 28).
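
The attention mechanism can be illustrated with a minimal NumPy sketch of single-head scaled dot-product self-attention (a full transformer adds learned projections, multiple heads, and feed-forward layers, all omitted here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Every token's query is scored against every key, so even distant
    # tokens can attend to one another directly.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 token embeddings of dimension 8
out, w = scaled_dot_product_attention(X, X, X)
print(out.shape)  # one contextualized vector per token
```

Because the attention weights connect every token pair, the distance between two related words has no effect on whether they can influence each other, which is what sustains coherence over long passages.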

Encoder-only models are designed to understand and analyze input text (29). They process text bidirectionally, considering both what comes before and after a given token, which makes them particularly effective for tasks like information extraction, classification, or identifying relevant clinical features in unstructured data.

Decoder-only models, such as GPT (Generative Pre-trained Transformer), are optimized for generating text (30). They process text in a unidirectional manner, predicting the next token based on the previous ones. These models excel at generating coherent, human-like responses and are well suited for use cases like clinical documentation support, patient education, or answering open-ended questions.

Encoder-decoder models are designed to take in an input via the encoder, transform it into an internal representation, and then generate a corresponding output via the decoder. This structure is particularly useful for tasks like summarization, translation, or structured question-answering, where the model must fully understand the input and produce a targeted response (31).
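
The left-to-right, next-token style of decoder-only generation can be caricatured with a toy bigram predictor in Python (purely illustrative; a real LLM replaces the frequency table with a trained transformer, but the unidirectional loop is the same):

```python
from collections import Counter, defaultdict

# Toy "training corpus", invented for illustration.
corpus = "patients with low protein intake need protein rich meals".split()

# Count, for each token, which tokens follow it.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def generate(start, n_tokens):
    """Unidirectional generation: each step sees only the tokens so far."""
    out = [start]
    for _ in range(n_tokens):
        candidates = followers[out[-1]].most_common(1)
        if not candidates:
            break                     # no known continuation
        out.append(candidates[0][0])  # greedily pick the most frequent follower
    return " ".join(out)

print(generate("protein", 3))
```

Because each prediction conditions only on preceding tokens, the loop mirrors how decoder-only models produce text one token at a time.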

Each architecture has strengths depending on the intended use (32). In clinical contexts such as nutrition, selecting the appropriate type of large language model is essential: whether the task involves analysis, content generation, or structured interaction, the model should match the use case.

A clear understanding of LLMs’ architectural distinctions is critical to aligning model capabilities with specific clinical tasks and ensuring meaningful, reliable outcomes in nutritional practice.

4 Techniques to enhance LLMs

Although LLMs are typically pre-trained on broad, general-purpose data, several methods can improve their performance, accuracy, and domain relevance, particularly for specialized applications in fields like clinical nutrition. Key enhancement techniques include prompt engineering, fine-tuning, retrieval-augmented generation (RAG), and multimodal integration (33–36).

4.1 Prompt engineering

Prompt engineering is the practice of structuring user inputs in a way that elicits more precise, relevant, or task-specific outputs from the model (37, 38). Because LLMs are highly sensitive to the phrasing and context of a prompt, small adjustments can significantly influence the quality of the response (39). For instance, a general query like “What is the dietary treatment for diabetes?” may yield vague or generic output. Rephrasing it as “Provide nutritional treatment for type 2 diabetes in adults, based on current clinical guidelines, and include references” tends to produce more structured and clinically relevant results. Several prompting strategies exist. Zero-shot prompting asks the model to perform a task without prior examples, relying on its general training. Few-shot prompting provides illustrative examples within the prompt to guide the model’s behavior (40). Chain-of-thought prompting instructs the model to reason step-by-step, thereby improving its performance on complex or multi-step queries (41). Prompt engineering is particularly valuable when model retraining is not feasible. It allows clinicians to adapt general-purpose models for specific tasks using careful phrasing, without altering the model itself.
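
In code, these strategies amount to assembling different prompt strings before calling the model. The following sketch (the `build_prompt` helper is hypothetical, not any provider's API) shows zero-shot, few-shot, and chain-of-thought variants:

```python
def build_prompt(question, examples=None, chain_of_thought=False):
    """Assemble a prompt string; `examples` enables few-shot prompting."""
    parts = []
    if examples:  # few-shot: show worked input/output pairs first
        for q, a in examples:
            parts.append(f"Q: {q}\nA: {a}")
    if chain_of_thought:  # nudge the model to reason step by step
        question += "\nLet's reason step by step."
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Zero-shot: the bare question, no examples.
zero_shot = build_prompt("What is the dietary treatment for type 2 diabetes?")

# Few-shot plus chain-of-thought (the example answer is an invented placeholder).
examples = [("Which foods are rich in haem iron?",
             "Red meat, poultry, and fish; see current clinical guidelines.")]
cot = build_prompt(
    "Provide nutritional treatment for type 2 diabetes in adults, "
    "based on current clinical guidelines, and include references.",
    examples=examples, chain_of_thought=True)
print(cot)
```

The model itself is untouched; only the input string changes, which is why prompt engineering is accessible even without retraining infrastructure.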

4.2 Fine-tuning

Fine-tuning involves updating a pre-trained model using additional data specific to a task, institution, or clinical domain (42). This process refines the model’s internal representations, improving performance on highly specialized queries. Full fine-tuning adjusts all model parameters and is effective but computationally intensive. It also carries a risk of overfitting when the dataset is small. Parameter-efficient methods, such as Low-Rank Adaptation (LoRA), modify only a subset of parameters, reducing resource requirements while preserving general capabilities (43). Domain adaptation, a subset of fine-tuning, uses field-specific datasets (e.g., clinical nutrition guidelines) to align the model with professional language, knowledge, and priorities in that domain. In practice, fine-tuned models can more reliably answer domain-specific questions, generate summaries from patient records, or support documentation using accurate terminology.
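
The parameter arithmetic behind LoRA can be sketched in a few lines of NumPy (an illustration of the low-rank idea only, not a training loop; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 64, 4                         # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))          # frozen pre-trained weight matrix
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # starts at zero: no change before training

def lora_forward(x, alpha=8.0):
    # Only A and B are updated during fine-tuning; W never changes.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(d,))
full_params = d * d                  # parameters a full fine-tune would touch
lora_params = 2 * d * r              # parameters LoRA actually trains
print(lora_params / full_params)     # 0.125: an 8x reduction at rank 4
```

Because B is initialized to zero, the adapted model starts out identical to the frozen one, and training only nudges the low-rank update, which is how LoRA preserves general capabilities while reducing resource requirements.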

4.3 Retrieval-augmented generation large language models

Retrieval-augmented generation large language models (RAG-LLMs) combine a language model with an external retrieval mechanism that supplies relevant documents or facts in real time, thereby increasing factual accuracy and contextual relevance (44, 45). This approach addresses a key limitation of LLMs: their reliance on static training data, which can become outdated or incomplete. In a RAG-LLM setup, when a user submits a query, the system first retrieves up-to-date or domain-specific documents, such as recent guidelines or clinical studies, and then provides this context to the LLM. The model uses this information to generate an informed, citation-backed response. RAG-LLMs are especially valuable in clinical domains like nutrition, where evidence changes regularly and accuracy is essential. RAG-LLMs enable dynamic access to trusted knowledge sources, improving factual consistency and clinical relevance.
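
A RAG pipeline can be sketched with a toy term-frequency retriever standing in for a production embedding index; the document names and guideline snippets below are invented placeholders:

```python
import math
from collections import Counter

# Invented stand-ins for an external, up-to-date knowledge base.
docs = {
    "espen_2023": "protein intake of 1.2 g per kg body weight per day is advised for older adults",
    "renal_2024": "potassium restriction may be needed for patients on dialysis",
    "general": "a varied diet with vegetables whole grains and fruit is recommended",
}

def tf_vector(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    # Step 1: rank documents by similarity to the query.
    qv = tf_vector(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, tf_vector(docs[d])),
                    reverse=True)
    return ranked[:k]

def rag_prompt(query):
    # Step 2: prepend the retrieved evidence so the LLM answers from it.
    context = "\n".join(docs[d] for d in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer, citing the context:"

print(retrieve("how much protein per kg per day should this patient get"))
```

Swapping the documents for current guidelines updates the system's knowledge without retraining the model, which is the central appeal of RAG in fast-moving clinical domains.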

4.4 Multimodal integration

Multimodal LLMs extend traditional text-based models by incorporating other forms of input, such as images or audio (46–48). This opens the door to richer, context-aware outputs across more complex workflows. In clinical nutrition, potential use cases include: interpreting food photos to assess dietary intake; combining blood test results and anthropometric data with textual dietary advice; supporting visually enhanced patient education materials. Although still emerging, multimodal models represent the next stage in LLM development, especially in domains where information comes in diverse formats. The use of multimodal LLMs in clinical nutrition has yet to commence; there are, however, examples in other fields of medicine (49).

In summary, enhancement techniques such as prompt engineering, fine-tuning, RAG, and multimodal integration significantly improve the practical utility of LLMs. By making models more responsive, accurate, and context-aware, these methods allow LLMs to meet the demands of specialized domains like clinical nutrition while maintaining clinical reliability.

5 LLM applications in clinical nutrition

The application of LLMs in clinical nutrition is expanding rapidly, with increasing adoption and diverse use cases (see Figure 1). This section highlights several key examples to illustrate their potential impact as well as their shortcomings. A summary is given in Table 1.

Figure 1. Overview of the possibilities of large language models in clinical nutrition, divided into four quadrants: dietary advice (generating individualized meal plans, cultural adaptation), dietary analysis (estimating energy, analyzing allergens), information and education (generating patient education materials, tools for students), and data extraction (extracting health data, summarizing patient history).


Table 1. Overview of some LLM application examples that are used in clinical nutrition.

This review does not aim for exhaustive coverage but instead provides a curated overview of illustrative examples to highlight the range of current and emerging applications of LLMs in clinical nutrition. Articles were selected based on their relevance, recency (2023–2025), and ability to demonstrate specific clinical or technological use cases. Selection was guided by the authors’ expertise, supplemented by targeted searches in PubMed and Google Scholar using keywords such as “ChatGPT,” “large language models,” “clinical nutrition,” and “personalized nutrition.” Studies were grouped thematically into five domains – dietary recommendations, information and education, ingredient analysis, data extraction, and cross-disciplinary innovations – to reflect common patterns and areas of interest. This approach prioritizes breadth and relevance over completeness, aiming to inform and inspire future research and implementation.

5.1 LLMs for dietary recommendations

Singh and colleagues performed a meta-analysis looking at chatbot interventions designed to improve physical activity, diet and sleep (50). Analyzing 19 trials with sample sizes ranging from 25 to 958 and participant ages between 9 and 71, the study found significant improvements in physical activity, daily steps, moderate-to-vigorous physical activity, fruit and vegetable consumption, sleep duration, and sleep quality. Text-based and AI-driven chatbots outperformed voice chatbots in dietary improvements, and multicomponent interventions were more effective than chatbot-only approaches for enhancing sleep outcomes. Despite a predominance of low-quality studies, findings demonstrate that chatbot interventions are effective across diverse populations and settings (Table 2).


Table 2. Glossary of technical terminology.

Arslan explored the potential of ChatGPT, an AI-driven language model, in the treatment of obesity, a growing global health concern (51). ChatGPT’s capabilities include providing personalized recommendations for nutrition plans, exercise programs, and psychological support, as well as developing predictive models for obesity-related diseases like diabetes and cardiovascular conditions. These features could enhance weight management and reduce associated health risks through tailored and adaptive treatment strategies. However, the study highlights challenges such as the model’s limited contextual understanding, lack of emotional intelligence, privacy and security concerns, and ethical considerations regarding accountability for AI-generated advice. Despite these limitations, ChatGPT presents promising opportunities in obesity management, though its application in healthcare requires cautious implementation and further research.

Haman et al. evaluated the accuracy and reliability of ChatGPT in generating nutritional information for dietary planning and weight management (52). Utilizing the United States Department of Agriculture (USDA) Food Data Central as a reference, ChatGPT demonstrated high accuracy in estimating energy values, with 97% of its predictions falling within a 40% margin of USDA data. The model exhibited consistency across nutrient estimates, as indicated by low coefficients of variation, and effectively generated daily meal plans, with all meals adhering to a 30% margin of USDA caloric values. However, limitations were observed, including variable accuracy for specific nutrients, the inability to account for chronic health conditions, and the potential for generating plausible yet inaccurate information. While ChatGPT showed promise as a supplementary tool, the study emphasized ChatGPT should not replace professional medical or dietary guidance.

Khan looked at the potential of ChatGPT in addressing protein-energy malnutrition (PEM), a critical global health issue (53). ChatGPT demonstrates the ability to provide personalized dietary recommendations, guidance on protein-rich food choices, psychological support, and real-time monitoring to improve PEM interventions. It can also analyze PEM-related data to inform research and policymaking. However, limitations such as the inability to perform physical assessments, reliance on user inputs, susceptibility to bias, and inadequate handling of complex cases highlight the importance of integrating AI tools with healthcare professionals. Collaborative efforts combining AI capabilities and human expertise are essential for achieving accurate diagnoses, individualized treatment plans, and comprehensive care in PEM management.

Wang and co-workers explored ChatGPT-4’s ability to support personalized nutritional advice for dialysis patients by generating meal plans based on virtual patient profiles created via Monte Carlo simulation (54). A renal dietitian evaluated the generated recipes, cooking instructions, and nutritional analyses, rating the instructions highly (5/5) but the recipes and nutritional analysis lower (3/5 and 2/5, respectively). ChatGPT’s nutritional analysis underestimated key nutrients, including calories (36%), protein (28%), and potassium (49%), among others. Recipe translations into multiple languages were rated as reliable (4/5). While ChatGPT-4 demonstrates potential for personalized guidance, significant improvements are needed for accurate nutritional analysis and medical applicability. Although this study states that translations in different languages were reliable, this is not always the case when using LLMs in clinical nutrition. Adilmetova et al. evaluated ChatGPT-4’s ability to provide personalized, evidence-based dietary recommendations in English, Kazakh, and Russian in Central Asia using 50 mock patient profiles (55). Performance was assessed for personalization, consistency, and practicality, revealing moderate effectiveness in English and Russian but unsuitability for Kazakh due to insufficient outputs. Statistically significant differences (p < 0.001) were observed across the three languages, with English and Russian outperforming Kazakh. The findings highlight ChatGPT-4’s limitations in underrepresented languages, emphasizing the need for customized models tailored to local diets and sociocultural contexts.

Using popular LLMs like ChatGPT in clinical practice may seem appealing. However, despite their potential utility, they carry the risk of generating inaccurate or misleading information. Hieronimus and colleagues assessed the ability of AI chatbots ChatGPT and Bard (now Gemini) to generate meal plans meeting dietary reference intakes (DRIs) for omnivorous, vegetarian, and vegan diets (56). Across 108 meal plans, nutrient analysis showed lower energy and carbohydrate content but excess protein relative to DRIs. Common deficiencies included vitamin D and fluoride, with vegan plans also lacking vitamin B12. ChatGPT suggested B12 supplementation in some cases, while Bard did not. No significant differences were observed between the chatbots or prompts. While these tools provide general dietary inspiration, they are unsuitable for creating nutritionally adequate plans, particularly for restrictive diets.

Niszczota and Rybicka evaluated ChatGPT’s ability to create elimination diets for individuals with food allergies (57). They focused on safety, accuracy, and variety. While ChatGPT correctly excluded allergens in most cases, critical errors were identified, such as including allergenic ingredients like almond milk in nut-free diets. The model also demonstrated inaccuracies in energy and portion calculations and generated monotonous menus with limited variety. Despite these shortcomings, ChatGPT adheres to some basic dietary guidelines and shows potential for improving accessibility to dietary advice. However, the study highlights the risks of using ChatGPT for critical health tasks, emphasizing the need for model fine-tuning and further research to enhance safety and reliability in nutritional recommendations.

To improve LLM performance, more sophisticated technologies can be incorporated. Papastratis et al. introduced a novel AI-based diet recommendation system that combines deep generative networks, such as variational autoencoders and recurrent neural networks, with LLMs like ChatGPT to provide accurate, personalized weekly meal plans (58). By modeling user profiles (e.g., anthropometric measurements and medical conditions) and embedding predefined nutritional guidelines from EFSA (European Food Safety Authority) and the WHO (World Health Organization) as loss functions, the system ensures outputs that align with expert-validated dietary standards while maintaining high accuracy and explainability. The integration of ChatGPT expands the meal database by generating additional meal options from diverse cuisines, enhancing variety and applicability across different populations without acting as a retrieval system. Evaluations on 3,000 virtual and 1,000 real user profiles demonstrated superior performance in energy and macronutrient alignment compared to ChatGPT alone, showcasing its precision and potential for seamless integration into healthcare and fitness applications. Future work will focus on accommodating more dietary preferences, international cuisines, and real-world user feedback to further refine its effectiveness.

5.2 LLMs for information and education

Barlas and colleagues assessed the credibility of ChatGPT-3.5 in providing information on the assessment and management of obesity in type 2 diabetes (T2D) based on the latest American Diabetes Association (ADA) and American Association of Clinical Endocrinology (AACE) guidelines (59). In a cross-sectional design, 20 patient-focused questions were posed by experienced endocrinologists, and responses were categorized as compatible, compatible but insufficient, partially incompatible, or incompatible with the guidelines. ChatGPT demonstrated 100% compatibility in the assessment of obesity but lower adherence in therapy-related sections, including nutrition, pharmacotherapy, and surgical interventions, often requiring additional prompts for completeness. While ChatGPT provided clear, systematic, and understandable answers, it lacked currency regarding recently updated information and specificity in sourcing guidelines. These findings emphasize that although ChatGPT holds potential as a supplementary tool for information retrieval, it should not replace healthcare professionals’ patient-centered approaches, as individualized care and human oversight remain critical for ensuring accuracy and reliability in medical guidance.

Although LLMs promise to greatly enhance the outcome of patients in clinical nutrition, adoption by patients could pose a challenge. Vandelanotte et al. explored user perceptions and expectations of an artificially intelligent physical activity digital assistant through six focus groups that consisted of 45 participants (60). Participants expressed enthusiasm for such an assistant, emphasizing the importance of customizable features, including notifications, personality, and appearance. While participants were open to sharing information for personalization, their willingness varied significantly. Despite privacy concerns, they supported the use of AI and machine learning for enhanced functionality. However, the strong demand for personalization presents challenges in terms of development cost and complexity, highlighting the need for careful design to meet user expectations.

LLMs like ChatGPT can, however, provide useful answers to general nutrition-related questions. Kirk et al. evaluated ChatGPT’s competency in answering common nutrition questions compared to dietitians’ responses (61). Questions and answers were graded by experts on scientific correctness, actionability, and comprehensibility. ChatGPT outperformed dietitians in overall scores for five out of eight questions, excelling in scientific correctness, actionability, and comprehensibility in several instances. Dietitians’ answers did not surpass ChatGPT’s scores in any category. These findings suggest that ChatGPT can effectively address frequently asked nutrition questions, highlighting its potential as a supportive tool for providing nutrition information.

LLMs are not only increasingly used in clinical nutrition practice; they are also being employed as educational tools. Liao and colleagues evaluated ChatGPT’s performance in providing dietary advice to college students, assessed by 30 dietitians and a nutrition literacy (NL) test (62). While ChatGPT demonstrated high accuracy in the NL test (84.38%), surpassing the NL level of Taiwanese students, its responses were often incomplete, impractical, and lacked thoroughness, raising concerns about potential misunderstandings. Dietitians frequently cited a lack of rigor in the information provided. Despite these gaps, ChatGPT’s readability and potential as a supplementary educational tool were recognized, emphasizing the need for improved AI guidelines and training materials to enhance its effectiveness in nutrition education.

5.3 LLMs for ingredient analysis

LLMs can also be used to estimate energy and macronutrient content of food. However, their performance is still suboptimal. Hoang and co-workers evaluated the reliability of ChatGPT-3.5 and ChatGPT-4 in estimating the energy and macronutrient content of 222 food items across eight menus, comparing their results to nutritionists’ recommendations (63). While AI estimations for energy, carbohydrates, and fats were consistent with nutritionists’ data, protein estimates showed significant discrepancies. ChatGPT-4 outperformed ChatGPT-3.5 in accuracy but overestimated protein content. Both chatbots provided accurate energy estimates within ±10% for 35–48% of food items. Despite these limitations, the study highlights AI chatbots as convenient tools for basic nutritional analysis but notes their inability to offer personalized dietary advice or account for household portion sizes. Enhancements in AI specialization for nutrition could significantly improve their utility in dietetics.

Sun et al. examined an AI-based nutritionist program designed to address the challenges of nutritional management in patients with type 2 diabetes mellitus in China (64). The program integrates advanced large language models (ChatGPT and GPT 4.0) and a deep learning-based image recognition model (Dino V2) to provide dietary recommendations and ingredient-level meal analysis. ChatGPT demonstrated proficiency by passing the Chinese Registered Dietitian Examination and generating responses that aligned well with expert recommendations, though inconsistencies were noted for certain Chinese-specific foods. The image recognition model achieved high accuracy in identifying ingredients, outperforming previous models. A user-friendly WeChat mini-program was developed to enhance patient engagement by enabling automated meal logging and dietary feedback. Despite promising results, limitations include variability in AI responses and the need for a defined question scope. The findings support advancing this AI nutritionist program to a clinical pilot study to assess its real-world effectiveness in improving patient adherence to dietary recommendations and health outcomes.

5.4 LLMs for data extraction

RAG-LLMs are an exciting extension of regular LLMs that make use of an external knowledge base in order to enhance the LLM’s performance. Alkhalaf and co-workers evaluated the effectiveness of using the open-source Llama 2 LLM with zero-shot prompt engineering, both alone and combined with RAG, to summarize and extract malnutrition-related data from electronic health records (EHRs) in Australian aged care facilities (65). Results showed that the model achieved high accuracy in summarizing structured malnutrition notes (93.25%) and extracting risk factors (90%), with RAG integration further improving summarization accuracy to 99.25%. While the model effectively processed explicit information, it encountered hallucination issues when details were implicit or missing. The RAG approach mitigated these limitations by providing relevant external data, enhancing the model’s ability to generate accurate, contextually relevant outputs. The findings underscore the potential of LLMs combined with RAG to streamline EHR data analysis, improve care quality, and support timely interventions for malnutrition and other healthcare challenges in aged care settings.

5.5 Cross-disciplinary innovations with potential for clinical nutrition

Although ChatGPT and similar LLMs can be useful, hallucinations and incomplete information are still important drawbacks. Lee et al. developed and evaluated a dual retrieval-augmented generation system to enhance the accuracy and reliability of LLMs in diabetes management across diverse languages and guidelines (66). By integrating dense and sparse retrieval methods, the system addressed limitations in semantic and keyword-based searches, utilizing dense retrievers like Solar Embedding-1-large and OpenAI’s text-embedding-3-large alongside the BM25 algorithm for sparse retrieval. Evaluation using the 2023 Korean and American diabetes guidelines demonstrated superior performance for ensemble retrievers, reducing hallucinations and maintaining high retrieval precision. The system highlights the potential for cross-regional applications, offering a scalable solution to provide accurate, current medical information in dynamic fields like diabetes management while minimizing the need for frequent LLM retraining. This strategy could be adopted in clinical nutrition for LLMs with better performance.
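
The dense-plus-sparse ensemble can be illustrated with reciprocal rank fusion, one common rule for merging rankings (the cited system's exact ensembling may differ; the document names are placeholders):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings: documents ranked well by several retrievers rise."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["guideline_b", "guideline_a", "guideline_c"]   # semantic search
sparse_ranking = ["guideline_a", "guideline_c", "guideline_d"]  # keyword (BM25-style)

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Here guideline_a and guideline_c, surfaced by both retrievers, outrank guideline_b even though the dense retriever ranked it first, illustrating how fusion rewards agreement between semantic and keyword search.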

LLMs hold substantial potential to enhance nutritional care by supporting dietary recommendations, information and education, ingredient analysis, and data extraction. However, challenges such as limited accuracy, incomplete outputs, and hallucinations remain significant barriers to their clinical adoption. Emerging strategies like retrieval-augmented generation and fine-tuning, already being applied in other medical domains such as diabetes care, offer promising pathways to overcome these limitations (67, 68). Adapting these solutions to the nutritional context will be crucial for developing safe, reliable, and context-aware LLMs that can meaningfully support clinicians and patients alike.

6 Limitations and challenges of LLMs

Although LLMs have impressive potential, their capabilities are accompanied by a range of limitations and challenges that constrain their effectiveness, reliability, and ethical deployment. These challenges span technical, practical, and societal domains.

6.1 Data quality

One fundamental limitation of LLMs lies in their reliance on training data. These models learn patterns and relationships from vast corpora of text, but their performance is inherently constrained by the quality, diversity, and representativeness of the data used for training. Biases present in the training data are often reflected in the outputs of LLMs, perpetuating stereotypes and inequities (69). Biased representations of gender, race, or cultural norms within datasets can lead to outputs that reinforce these biases, posing ethical challenges in applications where neutrality and fairness are paramount (70–72). For example, when using an LLM to generate dietary advice, the model should be able to take into account individual factors such as cultural background, religious dietary practices, and regional food availability. Achieving this requires training data that accurately reflects diverse populations and dietary contexts. If the underlying data lacks this diversity or contains cultural biases, the model may produce dietary recommendations that are unsuitable or insensitive, potentially undermining patient trust and limiting the clinical usefulness of the advice.

6.2 Accuracy of LLMs

LLMs exhibit limitations in reasoning and factual accuracy (73–76). Although they are capable of generating coherent and contextually appropriate responses, their knowledge is static and limited to the data they were trained on, which often has a cutoff date. This means they cannot access or incorporate new information that arises after their training. Furthermore, LLMs lack a true understanding of the concepts they process, relying instead on probabilistic patterns to predict outputs. This can lead to hallucinations, where the model generates plausible-sounding but incorrect or nonsensical information. Such inaccuracies pose risks in critical fields like clinical nutrition and healthcare in general, where errors can have significant consequences. For instance, an LLM might incorrectly recommend micronutrient dosages that exceed safe upper limits, fail to account for specific dietary restrictions in food allergies, or generate outdated guidance on nutritional care in gestational diabetes. Inaccurate interpretation of lab values or mismatched nutritional protocols for conditions like celiac disease or refeeding syndrome could further compromise patient safety. These examples highlight that LLMs have clear limitations that nutrition specialists must be aware of and actively consider in clinical practice. They underscore the importance of rigorous human oversight, domain-specific fine-tuning, and real-time validation when applying LLMs in clinical decision-making.
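
One concrete form such validation could take is a post-hoc safety check that compares model-suggested doses against a clinician-maintained table of tolerable upper intake levels before they reach the user. The sketch below is illustrative only; the limit values are placeholders, not clinical data:

```python
# Post-hoc guardrail sketch: an LLM-suggested micronutrient dose is checked
# against a configured upper-limit table and flagged for human review when it
# exceeds the limit or the nutrient is unknown. Limits are placeholders.

UPPER_LIMITS_MG_PER_DAY = {  # placeholder values for illustration only
    "vitamin_d": 0.1,  # 100 micrograms expressed in mg
    "zinc": 40.0,
    "iron": 45.0,
}

def validate_recommendation(nutrient: str, dose_mg: float) -> tuple[bool, str]:
    """Return (is_safe, message); anything not clearly safe needs review."""
    limit = UPPER_LIMITS_MG_PER_DAY.get(nutrient)
    if limit is None:
        return False, f"No upper limit on file for '{nutrient}': needs review."
    if dose_mg > limit:
        return False, f"{nutrient} dose {dose_mg} mg exceeds limit {limit} mg."
    return True, "Within configured limit."

ok, message = validate_recommendation("zinc", 55.0)  # e.g. an LLM suggestion
```

A check like this does not make the model accurate, but it converts silent dosage errors into explicit review items, which is the practical point of human oversight.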

6.3 Computational and energy costs

Another significant challenge is the computational intensity of LLMs. Training and deploying these models requires substantial computational resources, including high-performance hardware and significant energy consumption (77). This raises concerns about the environmental impact of large-scale LLMs, as well as their accessibility to smaller organizations or institutions with limited resources. The high cost associated with developing and maintaining these models exacerbates the divide between well-funded entities and smaller players, potentially centralizing control of this transformative technology. This economic barrier has direct implications for nutritional care in low-resource countries, where the burden of undernutrition is high, access to trained dietitians is limited, and current clinical guidelines are often unavailable. In these settings, the potential value of LLMs may be even greater. Yet the high cost of implementing such tools risks placing them out of reach precisely where they could make the most impact. Without targeted strategies to improve accessibility, the use of LLMs in clinical nutrition may inadvertently deepen global health disparities rather than help to close them.

6.4 Transparency

LLMs also face challenges in interpretability and explainability (78). Despite their remarkable capabilities, the decision-making processes of these models are often opaque, making it difficult to understand why a particular output was generated. This lack of transparency complicates their integration into domains requiring accountability, such as clinical nutrition. In these contexts, stakeholders need to trust the system’s outputs and have mechanisms to verify or challenge its conclusions, yet the black-box nature of LLMs undermines this trust. In clinical nutrition, transparency is especially important. Dietitians and nutrition physicians must be able to explain the rationale behind their recommendations, whether for an individualized dietary plan, a nutrient prescription, or a nutritional intervention for a complex patient. If an LLM suggests a course of action, clinicians must be able to assess where that advice came from and whether it aligns with current guidelines and the patient’s specific context.

Without a clear link between input, reasoning, and output, clinicians are left in a difficult position: either they accept the model’s advice without understanding its basis, or they disregard it altogether. Neither approach supports responsible, evidence-based care. For LLMs to be meaningfully integrated into clinical nutrition, they must offer more than just plausible suggestions; they must provide traceable reasoning, cite their sources where possible, and allow users to interrogate the path that led to a given recommendation. Transparency is not just a technical concern; it is central to clinical responsibility, professional credibility, and patient safety. Without it, the promise of LLMs in nutrition remains incomplete.

6.5 Contextual awareness

The models’ inability to handle contextual nuances and ambiguities effectively further limits their utility. While LLMs excel at generating text based on syntactic and semantic patterns, they may struggle to interpret subtle nuances, sarcasm, idiomatic expressions, or culturally specific references (79). This limitation becomes particularly problematic in multilingual or cross-cultural applications, where the model’s understanding of context may diverge significantly from human expectations. In clinical nutrition, where patient communication is central and often nuanced, this lack of contextual awareness can be a serious concern. For instance, a phrase like “I eat light” or “I do not eat much during the day” can carry very different meanings depending on the person’s background, culture, or even local habits. Without an understanding of that context, a model may misread the intent entirely. The same goes for dietary preferences or restrictions, which are sometimes expressed in everyday or non-standard language. This is particularly true in multilingual settings, where nuance is easily lost. Understanding what a patient really means often requires not just linguistic knowledge but also cultural sensitivity and clinical experience.

6.6 Expert knowledge

LLMs often lack domain-specific expertise, particularly when applied to specialized fields without additional fine-tuning or context augmentation (80). Their generalized training enables them to perform adequately across a broad range of tasks but often fails to meet the rigor and precision required in highly technical areas. Without domain adaptation, their outputs risk being overly generic, superficial, or inaccurate in professional settings. In clinical nutrition, recommendations must be tailored not only to individual patient needs but also to complex physiological conditions, disease states, and evidence-based guidelines. In more challenging scenarios, such as managing refeeding syndrome or formulating parenteral nutrition plans, superficial suggestions from a general-purpose LLM could mislead rather than support the clinician. To be truly useful in clinical nutrition, LLMs need more than fluent language; they require structured exposure to clinical nutrition guidelines from authoritative sources.

6.7 Human oversight

Finally, user interactions with LLMs present challenges related to over-reliance and the need for human oversight. Because LLMs generate text that appears authoritative and well-informed, users may overestimate their reliability, failing to scrutinize outputs critically (81). This can lead to erroneous decisions, particularly in high-stakes environments where unchecked reliance on model outputs can have severe repercussions. Therefore, human oversight remains imperative.

While LLMs represent a significant advancement in artificial intelligence, their limitations and challenges, as discussed in this section, underscore the need for continued research and thoughtful deployment. Addressing issues such as data bias, factual accuracy, computational demands, interpretability, and ethical safeguards is critical to ensure that these models are used responsibly and equitably. By acknowledging and addressing these challenges, the field can harness the transformative potential of LLMs while mitigating the risks they pose.

7 Future directions of LLMs in clinical nutrition

The future of LLMs in clinical nutrition holds immense potential to transform the field by enabling more personalized, evidence-based, and scalable approaches to patient care. As advancements in artificial intelligence continue to unfold, LLMs are poised to play a pivotal role in integrating complex nutritional data, facilitating decision-making, and empowering healthcare professionals to address the growing burden of nutrition-related diseases (82, 83).

One of the most promising applications of LLMs in clinical nutrition is their ability to synthesize vast amounts of nutritional and medical data to support personalized dietary recommendations (84–86). Nutrition is highly individualized, influenced by factors such as age, sex, genetics, metabolic profile, lifestyle, and comorbidities. LLMs, when integrated with data from wearable devices, EHRs, and genetic testing, could analyze these diverse inputs to generate tailored dietary plans. For instance, an LLM could consider a patient’s metabolic panel, body composition analysis, and physical activity data to recommend precise macronutrient and micronutrient targets, addressing specific health goals such as weight management, glycemic control, or reducing cardiovascular risk.
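
As a hint of the structured computations such a system could be grounded in, the sketch below estimates resting energy expenditure with the Mifflin-St Jeor equation and converts an energy target into macronutrient gram targets. The activity factor and macronutrient split are illustrative assumptions, not clinical recommendations:

```python
# Sketch: deterministic nutrition arithmetic an LLM-based planner could call
# instead of "guessing" numbers. Mifflin-St Jeor is a standard resting-energy
# equation; the 1.4 activity factor and 20/30/50 macro split are assumed.

def mifflin_st_jeor_kcal(weight_kg: float, height_cm: float,
                         age_yr: int, male: bool) -> float:
    """Resting energy expenditure (kcal/day) per the Mifflin-St Jeor equation."""
    ree = 10 * weight_kg + 6.25 * height_cm - 5 * age_yr
    return ree + 5 if male else ree - 161

def macro_targets(tdee_kcal: float, protein_pct: float = 0.20,
                  fat_pct: float = 0.30, carb_pct: float = 0.50) -> dict[str, int]:
    """Convert an energy target into gram targets (4/9/4 kcal per gram)."""
    return {
        "protein_g": round(tdee_kcal * protein_pct / 4),
        "fat_g": round(tdee_kcal * fat_pct / 9),
        "carb_g": round(tdee_kcal * carb_pct / 4),
    }

ree = mifflin_st_jeor_kcal(weight_kg=70, height_cm=175, age_yr=40, male=True)
tdee = ree * 1.4  # assumed moderate-activity factor
targets = macro_targets(tdee)
```

Pairing an LLM with explicit calculations like these, rather than letting it generate numbers freely, is one way to keep quantitative targets reproducible and auditable.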

The integration of LLMs with predictive analytics and machine learning models could further enhance their utility in clinical nutrition (87). By analyzing longitudinal health data, LLMs could predict an individual’s risk of developing nutrition-related conditions, such as type 2 diabetes or cardiovascular disease, and recommend preemptive dietary interventions. This proactive approach aligns with the broader goals of preventive medicine, shifting the focus from treating disease to maintaining health and wellness.

LLMs could also address challenges related to cultural and linguistic diversity in clinical nutrition. Nutrition advice must often be adapted to cultural preferences, traditional cuisines, and local food availability. LLMs trained on multilingual and culturally diverse datasets could assist healthcare providers in delivering culturally sensitive dietary recommendations. For example, an LLM could help tailor meal plans for a patient while taking into consideration specific dietary restrictions due to religious practices or cultural norms, ensuring greater relevance and acceptability of the guidance provided.

Another key area for the future application of LLMs in clinical nutrition is patient education and engagement (88–90). LLMs excel at generating human-like text, making them ideal tools for creating patient-facing educational materials, answering frequently asked questions, and providing real-time support (91). For example, an LLM-powered chatbot could assist patients in understanding dietary restrictions, decoding food labels, or identifying suitable recipes that align with their medical conditions and personal preferences. By delivering accessible and contextually relevant information, these models can empower patients to make informed decisions about their nutrition, fostering adherence to prescribed dietary regimens.

In the realm of research, LLMs could serve as invaluable tools for evidence synthesis and knowledge translation in clinical nutrition (92, 93). The volume of published nutritional science research grows rapidly, making it challenging for practitioners to stay current. LLMs can summarize recent studies, identify emerging trends, and highlight consensus or controversies within the field. Furthermore, these models could aid researchers in generating hypotheses by identifying gaps in the literature, fostering innovation in nutritional science.

Collaboration between healthcare providers, data scientists, and policymakers will be crucial in shaping the future of LLMs in clinical nutrition. Developing standardized protocols for integrating LLMs into clinical workflows and establishing guidelines for their ethical use will be vital steps in realizing their potential. Moreover, ongoing education and training for clinicians on the capabilities and limitations of LLMs will empower them to harness these tools effectively while maintaining critical oversight (94, 95).

LLMs offer transformative possibilities for advancing personalized, preventive, and culturally sensitive approaches in clinical nutrition. Realizing this potential will depend on thoughtful integration into clinical practice, guided by interdisciplinary collaboration, ethical oversight, and clinician education.

8 Discussion

Large language models offer considerable promise for advancing clinical nutrition. Their ability to interpret complex data, generate tailored dietary recommendations, and assist both clinicians and patients in decision-making aligns with the increasing need for scalable, personalized, and evidence-based care. Yet, despite their potential, the practical and ethical integration of LLMs into clinical nutrition remains a complex undertaking.

One of the primary concerns is the reliability of LLM-generated outputs. Although these models often produce coherent and convincing responses, they remain vulnerable to factual inaccuracies and hallucinations. This is a pressing issue in clinical contexts where incorrect information can have significant consequences. Mistakes in nutrient recommendations, dietary restrictions, or the management of nutrition-related conditions could compromise patient safety. As such, validation procedures, domain-specific fine-tuning, and routine human oversight are critical to ensure these tools support rather than undermine clinical judgment.

Ethical challenges further complicate implementation. LLMs are shaped by the data on which they are trained, and if that data lacks cultural, linguistic, or socioeconomic diversity, the outputs may inadvertently reinforce existing disparities. For example, a model might fail to account for local dietary practices or regional food availability, reducing the relevance and acceptability of its recommendations. Developing and training LLMs on more inclusive, representative datasets is essential to making their outputs both equitable and clinically useful.

Another key factor is the integration of LLMs into existing healthcare systems. To deliver real value, these tools must interact seamlessly with electronic health records, clinical decision support systems, and other digital health infrastructure. Achieving this requires not only technical interoperability but also regulatory alignment and user training. Without these elements in place, LLMs risk becoming isolated solutions that fail to improve efficiency or care quality in practice.

Equally important is the perception of both clinicians and patients. While many recognize the potential of AI tools in healthcare, concerns persist about the transparency, trustworthiness, and impersonal nature of automated advice. Designing user interfaces that are clear, interactive, and adaptable to individual preferences can help address these concerns. Importantly, LLMs should be positioned as support tools that enhance rather than replace human clinical expertise.

Scalability also remains a barrier to wide-scale adoption. The substantial computational and energy requirements of LLMs can make their deployment costly and technically demanding. This is particularly problematic in low-income countries, where malnutrition is highly prevalent, but the infrastructure to support AI tools is limited. When developing LLM-based tools for clinical nutrition, these issues should be taken into consideration.

Despite these limitations, LLMs represent a powerful new tool for enhancing nutrition care. They can assist in automating routine tasks, lowering barriers to dietary counseling, and expanding the availability of up-to-date, evidence-based information. Their effectiveness, however, will depend on thoughtful implementation guided by clinical priorities, ethical standards, and interdisciplinary collaboration.

In conclusion, the integration of LLMs into clinical nutrition holds considerable potential, but this promise can only be realized through deliberate, responsible development. Advances in techniques such as fine-tuning, retrieval-augmented generation, and multimodal input are improving the relevance and safety of these models. Moving forward, success will require more than technical refinement; it will demand sustained efforts to ensure that LLMs are accurate, fair, transparent, and aligned with the realities of clinical care. With appropriate safeguards and collaboration across disciplines, LLMs may ultimately become valuable allies in delivering high-quality, personalized nutrition care.

Author contributions

JB: Writing – review & editing, Project administration, Methodology, Investigation, Writing – original draft, Conceptualization. JP: Writing – original draft, Supervision, Conceptualization, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Bond, A, Mccay, K, and Lal, S. Artificial intelligence & clinical nutrition: what the future might have in store. Clin Nutr ESPEN. (2023) 57:542–9. doi: 10.1016/j.clnesp.2023.07.082

PubMed Abstract | Crossref Full Text | Google Scholar

2. Belkhouribchia, J. Artificial intelligence is going to transform the field of endocrinology: an overview. Front Endocrinol (Lausanne). (2025) 16:1513929. doi: 10.3389/fendo.2025.1513929

PubMed Abstract | Crossref Full Text | Google Scholar

3. Singhal, K, Azizi, S, Tu, T, Mahdavi, SS, Wei, J, Chung, HW, et al. Large language models encode clinical knowledge. Nature. (2023) 620:172–80. doi: 10.1038/s41586-023-06291-2

PubMed Abstract | Crossref Full Text | Google Scholar

4. Ogrinc, M, Koroušić Seljak, B, and Eftimov, T. Zero-shot evaluation of ChatGPT for food named-entity recognition and linking. Front Nutr. (2024) 11:1429259. doi: 10.3389/fnut.2024.1429259

PubMed Abstract | Crossref Full Text | Google Scholar

5. Iqbal, U, Tanweer, A, Rahmanti, AR, Greenfield, D, Lee, LT, and Li, YJ. Impact of large language model (ChatGPT) in healthcare: an umbrella review and evidence synthesis. J Biomed Sci. (2025) 32:45. doi: 10.1186/s12929-025-01131-z

PubMed Abstract | Crossref Full Text | Google Scholar

6. Yu, E, Chu, X, Zhang, W, Meng, X, Yang, Y, Ji, X, et al. Large language models in medicine: applications, challenges, and future directions. Int J Med Sci. (2025) 22:2792–801. doi: 10.7150/ijms.111780

PubMed Abstract | Crossref Full Text | Google Scholar

7. Su, H, Sun, Y, Li, R, Zhang, A, Yang, Y, Xiao, F, et al. Large language models in medical diagnostics: scoping review with bibliometric analysis. J Med Internet Res. (2025) 27:e72062. doi: 10.2196/72062

PubMed Abstract | Crossref Full Text | Google Scholar

8. Qin, H, and Tong, Y. Opportunities and challenges for large language models in primary health care. J Prim Care Community Health. (2025) 16:21501319241312571. doi: 10.1177/21501319241312571

PubMed Abstract | Crossref Full Text | Google Scholar

9. Preiksaitis, C, Ashenburg, N, Bunney, G, Chu, A, Kabeer, R, Riley, F, et al. The role of large language models in transforming emergency medicine: scoping review. JMIR Med Inform. (2024) 12:e53787. doi: 10.2196/53787

PubMed Abstract | Crossref Full Text | Google Scholar

10. Thirunavukarasu, AJ, Ting, DSJ, Elangovan, K, Gutierrez, L, Tan, TF, and Ting, DSW. Large language models in medicine. Nat Med. (2023) 29:1930–40. doi: 10.1038/s41591-023-02448-8

PubMed Abstract | Crossref Full Text | Google Scholar

11. Dergaa, I, Chamari, K, Zmijewski, P, and Ben Saad, H. From human writing to artificial intelligence generated text: examining the prospects and potential threats of ChatGPT in academic writing. Biol Sport. (2023) 40:615–22. doi: 10.5114/biolsport.2023.125623

PubMed Abstract | Crossref Full Text | Google Scholar

12. Bergling, K, Wang, LC, Shivakumar, O, Nandorine Ban, A, Moore, LW, Ginsberg, N, et al. From bytes to bites: application of large language models to enhance nutritional recommendations. Clin Kidney J. (2025) 18:sfaf 082. doi: 10.1093/ckj/sfaf082

PubMed Abstract | Crossref Full Text | Google Scholar

13. Busch, F, Hoffmann, L, Rueger, C, van Dijk, EH, Kader, R, Ortiz-Prado, E, et al. Current applications and challenges in large language models for patient care: a systematic review. Commun Med (Lond). (2025) 5:26. doi: 10.1038/s43856-024-00717-2

PubMed Abstract | Crossref Full Text | Google Scholar

14. Kim, J, and Vajravelu, BN. Assessing the current limitations of large language models in advancing health care education. JMIR Form Res. (2025) 9:e51319. doi: 10.2196/51319

PubMed Abstract | Crossref Full Text | Google Scholar

15. Wang, L, Wan, Z, Ni, C, Song, Q, Li, Y, Clayton, E, et al. Applications and concerns of ChatGPT and other conversational large language models in health care: systematic review. J Med Internet Res. (2024) 26:e22769. doi: 10.2196/22769

PubMed Abstract | Crossref Full Text | Google Scholar

16. Girouard, MP, Chang, AJ, Liang, Y, Hamilton, SA, Bhatt, AS, Svetlichnaya, J, et al. Clinical and research applications of natural language processing for heart failure. Heart Fail Rev. (2024) 30:407–15. doi: 10.1007/s10741-024-10472-0

PubMed Abstract | Crossref Full Text | Google Scholar

17. Yang, X, Li, T, Su, Q, Liu, Y, Kang, C, Lyu, Y, et al. Application of large language models in disease diagnosis and treatment. Chin Med J. (2025) 138:130–42. doi: 10.1097/CM9.0000000000003456

PubMed Abstract | Crossref Full Text | Google Scholar

18. Lee, J, Park, S, Shin, J, and Cho, B. Analyzing evaluation methods for large language models in the medical field: a scoping review. BMC Med Inform Decis Mak. (2024) 24:366. doi: 10.1186/s12911-024-02709-7

PubMed Abstract | Crossref Full Text | Google Scholar

19. Wong, IN, Monteiro, O, Baptista-Hon, DT, Wang, K, Lu, W, Sun, Z, et al. Leveraging foundation and large language models in medical artificial intelligence. Chin Med J. (2024) 137:2529–39. doi: 10.1097/CM9.0000000000003302

PubMed Abstract | Crossref Full Text | Google Scholar

20. Vaswani, A, Shazeer, N, Parmar, N, Uszkoreit, J, Jones, L, Gomez, AN, et al. Attention is all you need. Advances in neural information processing systems 30: annual conference on neural information processing systems 2017. Long Beach, CA, USA: Curran Associates, Inc. (2017):5998–6008. doi: 10.48550/arXiv.1706.03762

Crossref Full Text | Google Scholar

21. Denecke, K, May, R, and Rivera-Romero, O. Transformer models in healthcare: a survey and thematic analysis of potentials, shortcomings and risks. J Med Syst. (2024) 20:23. doi: 10.1007/s10916-024-02043-5

PubMed Abstract | Crossref Full Text | Google Scholar

22. Madan, S, Lentzen, M, Brandt, J, Rueckert, D, Hofmann-Apitius, M, and Fröhlich, H. Transformer models in biomedicine. BMC Med Inform Decis Mak. (2024) 24:214. doi: 10.1186/s12911-024-02600-5

PubMed Abstract | Crossref Full Text | Google Scholar

23. Dotan, E, Jaschek, G, Pupko, T, and Belinkov, Y. Effect of tokenization on transformers for biological sequences. Bioinformatics. (2024) 40:btae196. doi: 10.1093/bioinformatics/btae196

PubMed Abstract | Crossref Full Text | Google Scholar

24. Zhang, C, Peng, B, Sun, X, Niu, Q, Liu, J, Chen, K, et al. From word vectors to multimodal embeddings: techniques, applications, and future directions for large language models. ar Xiv. (2024). doi: 10.48550/arXiv.2411.05036

Crossref Full Text | Google Scholar

25. Bergmann, D, and Stryker, C. (2024) What is an attention mechanism? IBM. Available online at: https://www.ibm.com/think/topics/attention-mechanism (Accessed on 2025 Jan 30

Google Scholar

26. Ye, L, Tao, Z, Huang, Y, and Li, Y. Chunkattention: efficient self-attention with prefix-aware KV cache and two-phase partition. arXiv. (2024). doi: 10.48550/arXiv.2402.15220

Crossref Full Text | Google Scholar

27. Amatriain, X, Sankar, A, Bing, J, Bodigutla, PK, Hazen, TJ, and Kazi, M. (2023). Transformer models: an introduction and catalog. arXiv. doi: 10.48550/arXiv.2302.07730

Crossref Full Text | Google Scholar

28. Rashno, E, Eskandari, A, Anand, A, and Zulkernine, F. (2024). Survey: transformer-based models in data modality conversion. arXiv. doi: 10.48550/arXiv.2408.04723

Crossref Full Text | Google Scholar

29. Devlin, J, Chang, MW, Lee, K, and Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. (2018) :1810.04805. doi: 10.48550/arXiv.1810.04805

Crossref Full Text | Google Scholar

30. Roberts, J “How powerful are decoder-only transformer neural models?” In: Proceedings of the international joint conference on neural networks (IJCNN); (2024).

Google Scholar

31. Kementchedjhieva, Y, and Chalkidis, I. An exploration of encoder-decoder approaches to multi-label classification for legal and biomedical text. arXiv. (2023). doi: 10.48550/arXiv.2305.05627

Crossref Full Text | Google Scholar

32. Raschka, S. Understanding Encoder and Decoder LLMs. Ahead of AI. (2023)

Google Scholar

33. Hou, Y, Bishop, JR, Liu, H, and Zhang, R. Improving dietary supplement information retrieval: development of a retrieval-augmented generation system with large language models. J Med Internet Res. (2025) 27:e67677. doi: 10.2196/67677

PubMed Abstract | Crossref Full Text | Google Scholar

34. Azimi, I, Qi, M, Wang, L, Rahmani, AM, and Li, Y. Evaluation of LLMs accuracy and consistency in the registered dietitian exam through prompt engineering and knowledge retrieval. Sci Rep. (2025) 15:1506. doi: 10.1038/s41598-024-85003-w

PubMed Abstract | Crossref Full Text | Google Scholar

35. Zarecki, I. RAG vs fine-tuning vs prompt engineering: and the winner is…. K2View; (2024). Available online at: https://www.k2view.com/blog/rag-vs-fine-tuning-vs-prompt-engineering/#Prompt-engineering-is-an-essential-RAG-component (Accessed on 2025 Jan 30)

Google Scholar

36. Belcic, I, and Stryker, C. RAG vs. fine-tuning. IBM. (2024)

Google Scholar

37. Meskó, B. Prompt engineering as an important emerging skill for medical professionals: tutorial. J Med Internet Res. (2023) 25:e50638. doi: 10.2196/50638

PubMed Abstract | Crossref Full Text | Google Scholar

38. White, J, Fu, Q, Hays, S, Sandborn, M, Olea, C, Gilbert, H, et al. A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv. (2023):11382. doi: 10.48550/arXiv.2302.11382

Crossref Full Text | Google Scholar

39. Kim, TT, Makutonin, M, Sirous, R, and Javan, R. Optimizing large language models in radiology and mitigating pitfalls: prompt engineering and fine-tuning. Radiographics. (2025) 45:e240073. doi: 10.1148/rg.240073

PubMed Abstract | Crossref Full Text | Google Scholar

40. Oniani, D, Wu, X, Visweswaran, S, Kapoor, S, Kooragayalu, S, Polanska, K, et al. Enhancing large language models for clinical decision support by incorporating clinical practice guidelines. Proc (IEEE Int Conf Healthc Inform). (2024) 2024:694–702. doi: 10.1109/ichi61247.2024.00111

Crossref Full Text | Google Scholar

41. Bürgisser, N, Chalot, E, Mehouachi, S, Buclin, CP, Lauper, K, Courvoisier, DS, et al. Large language models for accurate disease detection in electronic health records: the examples of crystal arthropathies. RMD Open. (2024) 10:e005003. doi: 10.1136/rmdopen-2024-005003

PubMed Abstract | Crossref Full Text | Google Scholar

42. Wrightson, JG, Blazey, P, Moher, D, Khan, KM, and Ardern, CL. GPT for RCTs? Using AI to determine adherence to clinical trial reporting guidelines. BMJ Open. (2025) 15:e088735. doi: 10.1136/bmjopen-2024-088735

PubMed Abstract | Crossref Full Text | Google Scholar

43. Rangan, K, and Yin, Y. A fine-tuning enhanced RAG system with quantized influence measure as AI judge. Sci Rep. (2024) 14:27446. doi: 10.1038/s41598-024-79110-x

PubMed Abstract | Crossref Full Text | Google Scholar

44. Tozuka, R, Johno, H, Amakawa, A, Sato, J, Muto, M, Seki, S, et al. Application of notebook LM, a large language model with retrieval-augmented generation, for lung cancer staging. Jpn J Radiol. (2025) 43:706–12. doi: 10.1007/s11604-024-01705-1

PubMed Abstract | Crossref Full Text | Google Scholar

45. Vrdoljak, J, Boban, Z, Vilović, M, Kumrić, M, and Božić, J. A review of large language models in medical education, clinical decision support, and healthcare administration. Healthcare (Basel). (2025) 13:603. doi: 10.3390/healthcare13060603

PubMed Abstract | Crossref Full Text | Google Scholar

46. Meskó, B. The impact of multimodal large language models on health care's future. J Med Internet Res. (2023) 25:e52865. doi: 10.2196/52865

PubMed Abstract | Crossref Full Text | Google Scholar

47. Bochtler, M. How the technologies behind self-driving cars, social networks, ChatGPT, and DALL-E2 are changing structural biology. BioEssays. (2025) 47:e2400155. doi: 10.1002/bies.202400155

PubMed Abstract | Crossref Full Text | Google Scholar

48. Chen, J, Wu, X, Lan, T, and Li, B. LLMER: crafting interactive extended reality worlds with JSON data generated by large language models. IEEE Trans Vis Comput Graph. (2025) 31:2715–24. doi: 10.1109/TVCG.2025.3549549

49. Li, J, Guan, Z, Wang, J, Cheung, CY, Zheng, Y, Lim, LL, et al. Integrated image-based deep learning and language models for primary diabetes care. Nat Med. (2024) 30:2886–96. doi: 10.1038/s41591-024-03139-8

50. Singh, B, Olds, T, Brinsley, J, Dumuid, D, Virgara, R, Matricciani, L, et al. Systematic review and meta-analysis of the effectiveness of chatbots on lifestyle behaviours. NPJ Digit Med. (2023) 6:118. doi: 10.1038/s41746-023-00856-1

51. Arslan, S. Exploring the potential of ChatGPT in personalized obesity treatment. Ann Biomed Eng. (2023) 51:1887–8. doi: 10.1007/s10439-023-03227-9

52. Haman, M, Školník, M, and Lošťák, M. AI dietician: unveiling the accuracy of ChatGPT's nutritional estimations. Nutrition. (2024) 119:112325. doi: 10.1016/j.nut.2023.112325

53. Khan, U. Revolutionizing personalized protein energy malnutrition treatment: harnessing the power of ChatGPT. Ann Biomed Eng. (2024) 52:1125–7. doi: 10.1007/s10439-023-03331-w

54. Wang, LC, Zhang, H, Ginsberg, N, Nandorine Ban, A, Kooman, JP, and Kotanko, P. Application of ChatGPT to support nutritional recommendations for dialysis patients - a qualitative and quantitative evaluation. J Ren Nutr. (2024) 34:477–81. doi: 10.1053/j.jrn.2024.09.001

55. Adilmetova, G, Nassyrov, R, Meyerbekova, A, Karabay, A, Varol, HA, and Chan, MY. Evaluating ChatGPT's multilingual performance in clinical nutrition advice using synthetic medical text: insights from Central Asia. J Nutr. (2025) 155:729–35. doi: 10.1016/j.tjnut.2024.12.018

56. Hieronimus, B, Hammann, S, and Podszun, MC. Can the AI tools ChatGPT and Bard generate energy, macro- and micro-nutrient sufficient meal plans for different dietary patterns? Nutr Res. (2024) 128:105–14. doi: 10.1016/j.nutres.2024.07.002

57. Niszczota, P, and Rybicka, I. The credibility of dietary advice formulated by ChatGPT: robo-diets for people with food allergies. Nutrition. (2023) 112:112076. doi: 10.1016/j.nut.2023.112076

58. Papastratis, I, Konstantinidis, D, Daras, P, and Dimitropoulos, K. AI nutrition recommendation using a deep generative model and ChatGPT. Sci Rep. (2024) 14:14620. doi: 10.1038/s41598-024-65438-x

59. Barlas, T, Altinova, AE, Akturk, M, and Toruner, FB. Credibility of ChatGPT in the assessment of obesity in type 2 diabetes according to the guidelines. Int J Obes. (2024) 48:271–5. doi: 10.1038/s41366-023-01410-5

60. Vandelanotte, C, Hodgetts, D, Peris, DLIHK, Karki, A, Maher, C, Imam, T, et al. Perceptions and expectations of an artificially intelligent physical activity digital assistant - a focus group study. Appl Psychol Health Well-Being. (2024) 16:2362–80. doi: 10.1111/aphw.12594

61. Kirk, D, van Eijnatten, E, and Camps, G. Comparison of answers between ChatGPT and human dieticians to common nutrition questions. J Nutr Metab. (2023) 2023:1–9. doi: 10.1155/2023/5548684

62. Liao, LL, Chang, LC, and Lai, IJ. Assessing the quality of ChatGPT's dietary advice for college students from dietitians' perspectives. Nutrients. (2024) 16:1939. doi: 10.3390/nu16121939

63. Hoang, YN, Chen, YL, Ho, DKN, Chiu, WC, Cheah, KJ, Mayasari, NR, et al. Consistency and accuracy of artificial intelligence for providing nutritional information. JAMA Netw Open. (2023) 6:e2350367. doi: 10.1001/jamanetworkopen.2023.50367

64. Sun, H, Zhang, K, Lan, W, Gu, Q, Jiang, G, Yang, X, et al. An AI dietitian for type 2 diabetes mellitus management based on large language and image recognition models: preclinical concept validation study. J Med Internet Res. (2023) 25:e51300. doi: 10.2196/51300

65. Alkhalaf, M, Yu, P, Yin, M, and Deng, C. Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records. J Biomed Inform. (2024) 156:104662. doi: 10.1016/j.jbi.2024.104662

66. Lee, J, Cha, H, Hwangbo, Y, and Cheon, W. Enhancing large language model reliability: minimizing hallucinations with dual retrieval-augmented generation based on the latest diabetes guidelines. J Pers Med. (2024) 14:1131. doi: 10.3390/jpm14121131

67. Li, M, Kilicoglu, H, Xu, H, and Zhang, R. BiomedRAG: a retrieval augmented large language model for biomedicine. J Biomed Inform. (2025) 162:104769. doi: 10.1016/j.jbi.2024.104769

68. Sonnenburg, A, van der Lugt, B, Rehn, J, Wittkowski, P, Bech, K, Padberg, F, et al. Artificial intelligence-based data extraction for next generation risk assessment: is fine-tuning of a large language model worth the effort? Toxicology. (2024) 508:153933. doi: 10.1016/j.tox.2024.153933

69. Clusmann, J, Kolbinger, FR, Muti, HS, Carrero, ZI, Eckardt, JN, Laleh, NG, et al. The future landscape of large language models in medicine. Commun Med (Lond). (2023) 3:141. doi: 10.1038/s43856-023-00370-1

70. Hilling, DE, Ihaddouchen, I, Buijsman, S, Townsend, R, Gommers, D, and van Genderen, ME. The imperative of diversity and equity for the adoption of responsible AI in healthcare. Front Artif Intell. (2025) 8:1577529. doi: 10.3389/frai.2025.1577529

71. Ayoub, NF, Balakrishnan, K, Ayoub, MS, Barrett, TF, David, AP, and Gray, ST. Inherent bias in large language models: a random sampling analysis. Mayo Clin Proc Digit Health. (2024) 2:186–91. doi: 10.1016/j.mcpdig.2024.03.003

72. Schnepper, R, Roemmel, N, Schaefert, R, Lambrecht-Walzinger, L, and Meinlschmidt, G. Exploring biases of large language models in the field of mental health: comparative questionnaire study of the effect of gender and sexual orientation in anorexia nervosa and bulimia nervosa case vignettes. JMIR Ment Health. (2025) 12:e57986. doi: 10.2196/57986

73. Liang, S, Zhang, J, Liu, X, Huang, Y, Shao, J, Liu, X, et al. The potential of large language models to advance precision oncology. EBioMedicine. (2025) 115:105695. doi: 10.1016/j.ebiom.2025.105695

74. Chen, D, Parsa, R, Swanson, K, Nunez, JJ, Critch, A, Bitterman, DS, et al. Large language models in oncology: a review. BMJ Oncol. (2025) 4:e000759. doi: 10.1136/bmjonc-2025-000759

75. Chen, D, Avison, K, Alnassar, S, Huang, RS, and Raman, S. Medical accuracy of artificial intelligence chatbots in oncology: a scoping review. Oncologist. (2025) 30:oyaf038. doi: 10.1093/oncolo/oyaf038

76. Biesheuvel, LA, Workum, JD, Reuland, M, van Genderen, ME, Thoral, P, Dongelmans, D, et al. Large language models in critical care. J Intensive Med. (2024) 5:113–8. doi: 10.1016/j.jointm.2024.12.001

77. Rybinski, M, Kusa, W, Karimi, S, and Hanbury, A. Learning to match patients to clinical trials using large language models. J Biomed Inform. (2024) 159:104734. doi: 10.1016/j.jbi.2024.104734

78. Berkowitz, J, Weissenbacher, D, Srinivasan, A, Friedrich, NA, Acitores Cortina, JM, Kivelson, S, et al. Probing large language model hidden states for adverse drug reaction knowledge. medRxiv. (2025) 2025:25321620. doi: 10.1101/2025.02.09.25321620

79. Chen, CC, Chen, JA, Liang, CS, and Lin, YH. Large language models may struggle to detect culturally embedded filicide-suicide risks. Asian J Psychiatr. (2025) 105:104395. doi: 10.1016/j.ajp.2025.104395

80. Bicknell, BT, Rivers, NJ, Skelton, A, Sheehan, D, Hodges, C, Fairburn, SC, et al. Domain-specific customization for language models in otolaryngology: the ENT GPT assistant. OTO Open. (2025) 9:e70125. doi: 10.1002/oto2.70125

81. Singh, S, Chaurasia, A, Raichandani, S, Grewal, H, Udare, A, and Jawahar, A. Commentary: leveraging large language models for radiology education and training. J Comput Assist Tomogr. (2025). doi: 10.1097/RCT.0000000000001736

82. Iyer, R, Christie, AP, Madhavapeddy, A, Reynolds, S, Sutherland, W, and Jaffer, S. Careful design of large language model pipelines enables expert-level retrieval of evidence-based information from syntheses and databases. PLoS One. (2025) 20:e0323563. doi: 10.1371/journal.pone.0323563

83. Ganzinger, M, Kunz, N, Fuchs, P, Lyu, CK, Loos, M, Dugas, M, et al. Automated generation of discharge summaries: leveraging large language models with clinical data. Sci Rep. (2025) 15:16466. doi: 10.1038/s41598-025-01618-7

84. Omar, M, Nadkarni, GN, Klang, E, and Glicksberg, BS. Large language models in medicine: a review of current clinical trials across healthcare applications. PLOS Digit Health. (2024) 3:e0000662. doi: 10.1371/journal.pdig.0000662

85. Mu, Y, and He, D. The potential applications and challenges of ChatGPT in the medical field. Int J Gen Med. (2024) 17:817–26. doi: 10.2147/IJGM.S456659

86. Logan, JA, Sadhu, S, Hazlewood, C, Denton, M, Burke, SE, Simone-Soule, CA, et al. Bridging gaps in cancer care: utilizing large language models for accessible dietary recommendations. Nutrients. (2025) 17:1176. doi: 10.3390/nu17071176

87. Phalle, A, and Gokhale, D. Navigating next-gen nutrition care using artificial intelligence-assisted dietary assessment tools - a scoping review of potential applications. Front Nutr. (2025) 12:1518466. doi: 10.3389/fnut.2025.1518466

88. Lin, C, and Kuo, CF. Roles and potential of large language models in healthcare: a comprehensive review. Biomed J. (2025):100868. doi: 10.1016/j.bj.2025.100868

89. AlSammarraie, A, and Househ, M. The use of large language models in generating patient education materials: a scoping review. Acta Inform Med. (2025) 33:4–10. doi: 10.5455/aim.2024.33.4-10

90. Yan, Z, Liu, J, Fan, Y, Lu, S, Xu, D, Yang, Y, et al. Ability of ChatGPT to replace doctors in patient education: cross-sectional comparative analysis of inflammatory bowel disease. J Med Internet Res. (2025) 27:e62857. doi: 10.2196/62857

91. Rahmanti, AR, Yang, HC, Bintoro, BS, Nursetyo, AA, Muhtar, MS, Syed-Abdul, S, et al. SlimMe, a chatbot with artificial empathy for personal weight management: system design and finding. Front Nutr. (2022) 9:870775. doi: 10.3389/fnut.2022.870775

92. Wang, L, Li, J, Zhuang, B, Huang, S, Fang, M, Wang, C, et al. Accuracy of large language models when answering clinical research questions: systematic review and network meta-analysis. J Med Internet Res. (2025) 27:e64486. doi: 10.2196/64486

93. Zeng, H, Yin, C, Chai, C, Wang, Y, Dai, Q, and Sun, H. Cancer gene identification through integrating causal prompting large language model with omics data-driven causal inference. Brief Bioinform. (2025) 26:bbaf113. doi: 10.1093/bib/bbaf113

94. Mesko, B. The ChatGPT (generative artificial intelligence) revolution has made artificial intelligence approachable for medical professionals. J Med Internet Res. (2023) 25:e48392. doi: 10.2196/48392

95. Shool, S, Adimi, S, Saboori Amleshi, R, Bitaraf, E, Golpira, R, and Tara, M. A systematic review of large language model (LLM) evaluations in clinical medicine. BMC Med Inform Decis Mak. (2025) 25:117. doi: 10.1186/s12911-025-02954-4

Keywords: large language models, clinical nutrition, artificial intelligence, personalized nutrition therapy, personalized dietary recommendations, retrieval-augmented generation

Citation: Belkhouribchia J and Pen JJ (2025) Large language models in clinical nutrition: an overview of its applications, capabilities, limitations, and potential future prospects. Front. Nutr. 12:1635682. doi: 10.3389/fnut.2025.1635682

Received: 27 May 2025; Accepted: 23 July 2025;
Published: 07 August 2025.

Edited by:

Alessandra Durazzo, Council for Agricultural Research and Economics, Italy

Reviewed by:

Zhongheng Zhang, Sir Run Run Shaw Hospital, China
Xi Wang, Peking University, China
Kalthoum Rezgui, University of Manouba, Tunisia

Copyright © 2025 Belkhouribchia and Pen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jamal Belkhouribchia, info@endocrinologycenterhasselt.be

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.