
ORIGINAL RESEARCH article

Front. Educ., 13 January 2026

Sec. Higher Education

Volume 10 - 2025 | https://doi.org/10.3389/feduc.2025.1683909

This article is part of the Research Topic “Reimagining Higher Education: Responding Proactively to 21st Century Global Shifts”.

Strategies for developing AI competencies in higher education

  • Department of Art and Design, University of Monterrey, San Pedro Garza García, Mexico

As artificial intelligence (AI) continues to transform industries and redefine professional roles, integrating AI competence development into education has become a strategic priority. This exploratory study implements an LLM-based Delphi methodology to identify essential AI competencies, examine barriers to AI integration in academic settings, and develop actionable strategies for competency development in higher education. The research process employed large language models (LLMs) to conduct a simulated exploration with inductive thematic analysis of interdisciplinary perspectives, prioritize critical themes through iterative rating cycles, and resolve polarization via structured deliberation of disputed concepts. The key outputs include a consensus framework outlining universal AI literacy standards, human-AI collaborative pedagogy models, equity-centered implementation protocols, and ethical guardrails for responsible adoption, together with a toolkit of practical guidelines to operationalize the consensus findings. The study aims to assess AI's potential as a collaborative agent in educational design and to evaluate to what extent an AI-generated framework meets established OECD criteria for quality and robustness. This report addresses three primary audiences: curriculum designers developing AI competency models, institutional leaders implementing equity-focused AI policies, and researchers examining pedagogical impacts of human-AI collaboration.

1 Introduction

Artificial intelligence (AI) will play an increasingly significant role in shaping the future. Much like electricity or the internet, we will no longer think about using it consciously, but will simply experience an AI-powered world where everything is more intelligent, efficient, and intuitive (Raskovitch, 2025). As AI becomes more ubiquitous and pervasive, it fundamentally transforms industries, redefines professional roles, and demands new competences for individuals to engage with it in a critical, reliable and creative way. The ongoing debate on AI's impact highlights the skills needed for effective AI adoption to meet the evolving demands of the AI-integrated labor market (Borgonovi et al., 2023; Milanez, 2023; Babashahi et al., 2024; Ersanli et al., 2025; Morandini et al., 2023; Green, 2024). The Future of Jobs report predicts that the skills needed for work will change by 2030, with AI and big data identified as the fastest-growing skill. Moreover, 85% of employers surveyed plan to prioritize workforce upskilling, while 70% expect to recruit talent with newly emerging skills (World Economic Forum, 2025). The demand for AI-related skills signals a critical need that the educational system must address. Therefore, equipping students with AI competences should be a strategic priority for educators.

Long and Magerko (2020) made the first attempt to define AI literacy for a non-technical audience, describing it “as a set of competencies that enables individuals to critically evaluate AI technologies; communicate and collaborate effectively with AI; and use AI as a tool online, at home, and in the workplace.” Recognizing that research in the field is still in its early stages, Chiu et al. expand the concept of AI literacy, which focuses on knowledge and skill development, into the more complex concept of AI competency, which includes the application of this knowledge in an effective and beneficial way. They define AI competency as “an individual's confidence and ability to clearly explain how AI technologies work and impact society, as well as to use them in an ethical and responsible manner and to effectively communicate and collaborate with them in any setting. They should have the confidence and ability to self-reflect on their AI understanding for further learning. It focuses on how well individuals use AI in beneficial ways” (Chiu et al., 2024).

UNESCO recognizes AI's potential to address some of the biggest challenges in education. The organization proposes various initiatives to integrate AI competency development in education, aiming to build critical understanding of AI's potential and limitations and to empower students and educators to use AI responsibly, fostering a positive impact on society and the environment. The AI competency framework for students (UNESCO, 2024) focuses on four competencies: human-centered mindset, ethics of AI, AI techniques and applications, and AI system design. These domains are structured across a progression of three levels of mastery (understand, apply, and create), allowing for a developmental approach to AI literacy. At the “Understand” level, students build foundational knowledge of AI concepts, ethics, processes and methods, and relate them to real-life contexts. At the “Apply” level, they transfer and adapt this knowledge to more complex tasks and critically engage with AI tools. At the “Create” level, they innovate with AI technologies, develop new tools, and critically assess the impact of AI on human society, taking responsibility as citizens in the AI era. The framework ensures the proactive development of AI competencies so that students become both responsible users and co-creators of inclusive and sustainable AI.

To guide educators on AI use and to provide them with strategies to design and facilitate students' learning with AI, UNESCO developed the AI competency framework for teachers, which aims to empower teachers and guide their proactive, continuing professional development (UNESCO, 2024). It responds to growing concerns that an over-reliance on AI in education could undermine essential teaching competencies, diminish teachers' autonomous decision-making capacity, devalue the role of teachers, and weaken their relationships with learners. To mitigate those risks, educators must understand how AI is trained and operates, critically assess the accuracy of AI-generated content, and develop effective pedagogical methodologies. The AI competency framework comprises five interrelated and interdependent aspects: human-centered mindset, ethics of AI, AI foundations and applications, AI pedagogy, and AI for professional development, which can be developed across three progression levels—acquire, deepen and create. This is the first attempt to create an open-ended roadmap to assist policymakers and stakeholders involved in teachers' professional development in framing strategies that align with their specific needs and diverse local contexts.

The framework arrives at a critical moment, as AI's rapid integration into education is already reshaping teaching and learning. According to Chassignol et al. (2018), AI is transforming education across four major areas—customized educational content, innovative teaching methods, technology-enhanced assessment, and communication between student and lecturer. Holmes and Tuomi (2022) explore the development of AI tools for education (AIEd) and categorize existing systems into three distinct yet overlapping typologies: student-focused, teacher-focused, and institution-focused. Intelligent tutoring systems (ITS) are the most common student-focused tools. They capture data from students' input and analyze it to gauge individual strengths, weaknesses, and knowledge gaps. This allows the generation of a personalized pathway, which adapts learning content to the student's needs and interests. AI-assisted educational apps and virtual and augmented reality simulations offer immersive learning, enabling students to explore and manipulate three-dimensional objects or experience simulated environments. Many of these tools are designed to support students with disabilities or learning difficulties who struggle in traditional classroom settings.

Chatbots provide student support and guidance in academic services or act as virtual teaching assistants offering real-time feedback that scales personalized learning in large groups.

Teacher-focused AIEd shifts the teacher's role toward that of a coach or mentor. The reduced workload and time saved through the automation of tasks such as assessment, plagiarism detection, and feedback administration allow teachers to optimize and experiment with their teaching strategies.

Furthermore, teacher-focused AIEd provides insights into students' progress, enabling teachers to target their efforts accordingly. This flexibility supports diverse learning styles and promotes deeper student engagement.

The last typology, institution-focused AIEd, has clearly defined administrative functions, such as the allocation of financial aid, course planning, scheduling, e-proctoring, and identifying dropouts and students at risk.

The insight report “Shaping the Future of Learning: The Role of AI in Education 4.0” (World Economic Forum, 2024) highlights pioneering AI-driven education initiatives notable for their transformative impact, measurable outcomes, and scalable models. Selected for their significance (reach and magnitude of impact), quantifiability (use of metrics to measure impact), scalability (adaptability across contexts), and sustainability (long-term potential), those case studies demonstrate how AI can address systemic challenges and advance Education 4.0. Among the projects included in the report is UNICEF's Accessible Digital Textbooks (ADT), which leverages AI to create inclusive digital learning tools for children with disabilities, featuring customizable options like text-to-speech, sign-language videos, and offline accessibility, thus reimagining the future of textbooks. Another project is the Letrus Program, which tackles literacy gaps using AI-powered writing assessments that provide real-time feedback to students and actionable insights for teachers. The AI Tutor Project offers personalized lessons in multiple languages through adaptive learning algorithms, continuous assessment, 24/7 availability and data-driven insights to achieve its goals. It lessens teachers' workload, thus allowing them to focus on more strategic aspects of the learning experience. Together, these projects showcase AI's potential to democratize education by supporting marginalized learners, empowering educators, and delivering scalable, data-driven results.

Whereas the current predominant AIEd applications focus on assessment of student knowledge and learning content adjustment, emerging research areas progress toward self-regulated learning, emotion, motivation, engagement and collaboration, which present considerable potential for augmenting human cognition in learning (Molenaar, 2022).

As AI continues to reshape education, the skills required for the future are evolving rapidly. Traditional teaching methods and standardized assessments are becoming increasingly obsolete in a world where AI can perform many of these tasks with greater efficiency. In this context, educators must prioritize high-quality learning and equip students with a future-oriented skillset—global citizenship skills, innovation and creativity skills, technology skills and interpersonal skills (World Economic Forum, 2020).

Markauskaite et al. (2022) emphasize that the capabilities for a world with AI can be conceptualized from multiple perspectives, encompassing a range from fundamentally human capabilities, such as individual behavior, cognition and dispositions, to hybrid human-AI capabilities, involving joint sense making and value creation. These complex capabilities have to be empirically studied in authentic contexts, with AI-based approaches explored as supportive tools.

Advancing AI literacy requires the development of definitive frameworks that support educators in designing lessons with suitable pedagogies, learning outputs, and assessment approaches (Ng et al., 2021). In a recent study analyzing data from 40 higher education institutions across six global regions, Jin et al. (2025) revealed that generative AI is compatible with the goal of fostering innovation and 21st-century skills, and universities actively promote its integration into educational practices to remain relevant in an AI-influenced future. However, they observed that a relatively small number of institutions are actively engaged in evaluation measures for AI's impact, which signals significant gaps in comprehensive policy development, communication strategies, and equitable distribution of resources for generative AI integration. The key challenges in promoting AIEd include costs and scalability, ethics and privacy, the lack of actionable guidelines for educators, and limited AI expertise among teaching staff (Zhang and Aslan, 2021).

In addition to implementation, AI use poses challenges for assessment practices. Moorhouse et al. (2023) indicate that existing guidelines still focus on limiting or preventing generative AI and advocate for a greater emphasis to be placed on how these tools can be made integral to assessment tasks and students' lived educational experiences in future guidelines. Guidelines should not only raise awareness of academic integrity and mitigate academic misconduct, but also support instructors in adapting their teaching and assessment practices to the realities of AI.

As AI becomes increasingly embedded in educational practices, it is imperative that its integration be anchored within a robust, rights-based framework. As Chan and Lo (2025) emphasize, AI-enabled surveillance and data practices pose significant risks to privacy, transparency, and fairness. Therefore, AI implementation frameworks in education must prioritize responsible and ethical AI deployment to advance human welfare, safeguard individual liberties and democratic values, and prevent the marginalization of vulnerable groups or the establishment of privatized systems of social control. These principles should serve as foundational guardrails that guide institutional AI policies and the design of equitable, trustworthy learning environments.

Although the need for AI competency development in education has been widely recognized, and higher education institutions globally are embracing generative AI and taking proactive steps to integrate it into teaching and learning (McDonald et al., 2025), adoption remains uneven, with a lack of actionable, scalable, and equity-centered strategies.

The gap is widening further with the rapid advancement of these technologies. To address this, the study proposes a novel LLM-driven research process, in which AI is prompted to generate strategies for integrating AI competence development into teaching methodologies. The goal is to identify best practices for teaching AI competencies (values, knowledge and skills) and to develop instruments for assessing their acquisition, thus promoting educational practices that are more relevant to emerging demands. The study aims to assess AI's potential as a collaborative agent in educational design and to evaluate to what extent an AI-generated framework meets established OECD criteria for quality and robustness.

2 Methodology

In this exploratory research, the LLM-based Delphi method (Bertolotti and Mari, 2025) is employed as a key methodological approach. Developed in the early 1950s at the RAND Corporation and formally published in 1963, the Delphi method was devised “to obtain the most reliable opinion consensus of a group of experts by subjecting them to a series of questions in depth interspersed with controlled opinion feedback” (Dalkey and Helmer, 1963). The Delphi Technique has been widely used in educational settings to form guidelines, establish standards, and predict trends (Green, 2014), making it a suitable tool for identifying critical areas and proposing key strategies for future development. Nworie (2011) describes the following applications of the Delphi Technique in educational technology research—studies in identifying roles and responsibilities and determining competency levels, studies to determine areas of practice and importance, studies in leadership, studies in technology use, and studies predicting futures.

The traditional Delphi method is defined as a structured approach to group communication, designed to enable a collective of individuals to effectively address complex problems. It consists of four distinct phases—exploration of the subject under discussion, reaching an understanding of how the group views the issue, exploration of the possible disagreements to bring out the underlying reasons and evaluate them, and analysis of the previously gathered information and final evaluation (Linstone and Turoff, 1975). In this iterative multistage process, both qualitative and quantitative data are collected to obtain the most reliable consensus of opinion of the group (Clayton, 1997). Participant responses are summarized between rounds and fed back to the group through controlled feedback, allowing views to be refined until agreement is achieved or diminishing returns are observed (Hasson et al., 2000).

Delphi is characterized by six major components: consensus; accuracy, reliability, and validity; the panel and the notion of expert; iteration and controlled feedback; the role of the researcher; and anonymity (Crisp et al., 1997), with consensus being one of the most controversial issues in the process (Barrios et al., 2021). Diamond et al. (2014) underscore the critical role of consensus in the Delphi method, noting that its definition varies widely and that criteria for consensus achievement are poorly reported by researchers. To enhance methodological rigor and interpretability, they propose a standard set of quality indicators for reporting Delphi studies. These include defining the objective of the study, participant selection, how consensus will be determined, what threshold values will be required for stopping the study, and the criteria for dropping items or the number of rounds to be conducted (Diamond et al., 2014).

In addition to the uncertainties related to the notion of consensus, the technique has been criticized for the lack of a precise definition, criteria for defining an expert, and the wide variety of Delphi types available—all raising concerns about its methodological rigor (Hasson and Keeney, 2011). However, the authors reference a substantial body of research that affirms Delphi's validity as a research instrument. Over time, various adaptations of the Delphi method have been developed to address diverse research purposes and contexts. Mullen (2003) identified 23 different labels describing the types of Delphi applications. Among them, the most widely used are the real-time Delphi, the policy Delphi, the modified Delphi, and the e-Delphi.

While the traditional method relies on human expertise, recent advances in AI, particularly the breakthroughs in large language models (LLMs) capable of complex cross-domain reasoning (Schoenegger et al., 2024), now enable AI systems to simulate expert reasoning in this process.

Gordon (1994) notes that in certain applications of the Delphi method, particularly those aimed at informing quantitative simulation models, consensus is not required, as divergent responses can be used to explore variable ranges and model outcomes. Furthermore, he emphasizes that the success of a Delphi study is largely determined by the selection of the participants, since the composition of the panel can significantly influence the results obtained. Considering the debated definitions of “expert”, who has been characterized as an “informed individual”, a “specialist in the field” or “someone who has knowledge about a specific subject” (Keeney et al., 2001), LLMs offer a potential alternative by simulating expert reasoning and providing structured, reliable input in Delphi studies. This potential is supported by research by Luo et al. (2025), who demonstrate that LLMs can outperform human experts in predicting outcomes in neuroscience research, and suggest that these capabilities are transferable to other domains and may lead to transformative changes in scientific practice and the pace of discovery.

The first fully computerized implementation of the Delphi process, developed through a multi-agent system, showed the potential of replacing human experts with autonomous software agents in the domain of document relevance evaluation (García-Magariño et al., 2008). In this study, agents simulated expert reasoning by rating documents according to different criteria to reach an agreement, thereby replicating the structured feedback cycles of a traditional Delphi. The system achieved strong overall performance, successfully detecting on average 9 out of every 10 relevant documents.

With the advent of LLMs, such agent-based approaches have become increasingly popular in tasks requiring collective reasoning and reliable validation. Zhang and Aslan (2021) developed DelphiAgent—an agent-based fact-checking framework that employs multiple LLMs to replicate the Delphi method, with the goal of improving transparency in decision-making and reducing hallucinations when generating justifications. The system deploys several autonomous LLM agents, each with a distinct personality, to independently assess the factual accuracy of a claim against verified evidence, before reaching a consensus through iterative feedback and synthesis.

To date, two studies have been identified in which LLMs were used to perform Delphi studies for future forecasting. Nóbrega et al. (2023) proposed an AI Delphi (machine-machine) model to analyze and explore the future of work. They utilized three different models that use LLMs as central participants in a Delphi process. In the first model (Iconic Minds Delphi), experts were represented by fictional personas of well-known researchers, created on the basis of the knowledge the LLM has about them. The second model (Persona Panel Delphi) developed fictional personas representing different disciplines relevant to the researched topic, and in the third model (Digital Oracle Delphi) the experts were different LLM chats.

In the second study, Bertolotti and Mari (2025) developed and tested a novel approach for conducting Delphi studies based on text generated by LLMs to explore the future evolution of Generative AI. The proposed methodology enables extensive scenario exploration of complex and rapidly evolving systems, where qualitative insights are crucial.

Unlike the classic Delphi, which concludes upon achieving consensus, the LLM-based method consists of a fixed number of rounds. Additionally, as no humans are involved, issues such as participant fatigue, personal involvement and dropout are eliminated, which addresses some of the key critiques of the classical Delphi method. Other benefits of implementing LLM-based Delphi process in education research include scalability, rapid iteration of ideas, and the ability to simulate diverse expert perspectives, ultimately supporting more robust and innovative strategic planning.

2.1 The LLMs-based Delphi process

In this simulated Delphi experiment, DeepSeek-V3 (June-2025) (DeepSeek, 2024) was employed as the LLM, due to its up-to-date training data (knowledge cut-off: June 2025) and its accessibility as a cost-free tool. The model was used on its commercial website with its default parameters, which include a temperature of 1.0, balancing creativity and predictability, a top_p value of 0.9, controlling the diversity of considered tokens, and a 128k-token context window, sufficient for comprehensive responses. With this default configuration, the experimental setup can be replicated directly on the official DeepSeek website. Although the study was not conducted via API, all prompts and model parameters were documented to ensure procedural transparency. The interface non-determinism was mitigated by performing repeated trials and reporting stable median outcomes across replications rather than relying on single-run outputs. To ensure the robustness and stability of the responses, a two-stage sensitivity analysis was conducted. First, a multiple-prompt strategy was used, rephrasing the initial prompt and introducing instruction nuances, with two runs per expert role to verify the consistency of the outputs. Second, DeepSeek's outputs were compared against the results generated by GPT-5 (June-2024) (OpenAI, 2025). The researcher controlled the entire process, designing and refining the prompts, synthesizing the outputs and conducting the final analysis.
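Because the study was run through the web interface rather than an API, the procedure cannot be reproduced programmatically from the paper alone. The following sketch is a hypothetical reconstruction of the repeated-trial, role-based prompting protocol using an OpenAI-compatible client; the endpoint, model identifier, role list, and prompt wording are illustrative assumptions rather than the study's exact materials.

```python
# Hypothetical reconstruction of the repeated-trial, role-based prompting protocol.
# The study used DeepSeek's web interface with default parameters; the endpoint,
# model identifier, role list, and prompt wording below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")  # assumed endpoint

EXPERT_ROLES = [
    "AI ethics researcher",
    "Curriculum designer",
    # ... the remaining simulated expert roles listed in Table 1
]

ROUND1_QUESTION = (
    "Which key AI competencies should undergraduate students develop, and which "
    "barriers hinder the integration of AI in higher education?"
)  # paraphrase of the Round 1 question set (Table 2), not the exact prompt

def simulate_expert(role: str, question: str, runs: int = 2) -> list[str]:
    """Query the model `runs` times per role to check the stability of its answers."""
    outputs = []
    for _ in range(runs):
        response = client.chat.completions.create(
            model="deepseek-chat",   # assumed model identifier
            temperature=1.0,         # default setting reported in the study
            top_p=0.9,               # default nucleus-sampling setting
            messages=[
                {"role": "system", "content": f"You are a {role} on a Delphi expert panel."},
                {"role": "user", "content": question},
            ],
        )
        outputs.append(response.choices[0].message.content)
    return outputs

panel_responses = {role: simulate_expert(role, ROUND1_QUESTION) for role in EXPERT_ROLES}
```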

The Delphi simulation was conducted in multiple iterative phases to ensure comprehensive data generation and refinement. The overall procedure is summarized in Figure 1.

Figure 1. LLM-based Delphi technique: process flowchart. The process is divided into preparation, exploration and thematic analysis, structured feedback, scenario-based voting, and finalization phases, distinguishing LLM activities (assigning expert roles, developing questions, generating responses, identifying themes, developing questionnaires, and generating reports) from researcher-led activities (validating roles, refining questions, reviewing outputs, overseeing rankings, evaluating revisions, reviewing scenarios, and making implementation decisions).

As an initial step of the preparation phase, the LLM was prompted to generate a list of diverse expert profiles to simulate a multidisciplinary panel for the Delphi study. Twelve expert roles were selected to ensure diversity of perspectives and insights across innovation, policy, student support, and ethics. The types of expertise and their descriptions are presented in Table 1.


Table 1. Expert types for the LLM simulation.

The primary goal of the first round of exploratory open-ended questions was to determine the key AI competencies students should develop, to explore the barriers and challenges that hinder the integration of AI in higher education, and to propose effective teaching strategies for competency development and assessment.

Each of the twelve LLM-simulated experts was prompted to respond to four sets of questions (Table 2).


Table 2. Question set for round 1 ‘exploration and thematic analysis'.

After the LLM simulated responses reflecting the viewpoints of the twelve assigned experts, the same LLM was provided with a new prompt, instructing it to perform a data-driven inductive thematic analysis of the collective responses, summarizing the information into subtopics and defining key themes. Based on this structured synthesis, the LLM then developed a more focused questionnaire, incorporating Likert scales and ranking systems, for further distribution to the simulated expert panel (Table 3). This instrument aims to analyze the key tensions by quantifying simulated consensus levels, prioritizing critical themes, and identifying areas of divergence.
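As an illustration of the hand-off between Round 1 and Round 2, the sketch below shows how the pooled expert responses could be fed back to the model for inductive thematic analysis and questionnaire drafting. The client setup, model identifier, and prompt wording are assumptions; the study performed these steps through the DeepSeek web interface using the prompts listed in Table 4.

```python
# Illustrative hand-off from Round 1 to Round 2: pooled expert answers are fed back
# to the model for inductive thematic analysis and questionnaire drafting. The client
# setup, model identifier, and prompt wording are assumptions, not the study's exact text.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")  # assumed endpoint

panel_responses = {  # placeholder Round 1 answers keyed by simulated expert role
    "AI ethics researcher": ["...Round 1 answer text..."],
    "Curriculum designer": ["...Round 1 answer text..."],
}

pooled = "\n\n".join(
    f"[{role}] {answer}" for role, answers in panel_responses.items() for answer in answers
)

analysis_prompt = (
    "Perform a data-driven inductive thematic analysis of the expert responses below. "
    "Summarize them into subtopics, define key themes, and then draft a focused Round 2 "
    "questionnaire using 5-point Likert items and ranking items.\n\n" + pooled
)

round2_questionnaire = client.chat.completions.create(
    model="deepseek-chat",  # assumed model identifier
    messages=[{"role": "user", "content": analysis_prompt}],
).choices[0].message.content
print(round2_questionnaire)
```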


Table 3. Refined questionnaire for round 2 ‘structured feedback'.

In the simulated second Delphi round, the LLM was prompted to generate responses to the refined questionnaire from the perspective of each of the twelve simulated experts.

Following this, the model conducted an analysis of the simulated consensus, identifying areas of agreement, convergence, and divergence through a clustering-based comparison of the generated responses.

Median Likert scores were calculated for each theme. Themes with median scores ≥4.0 (on a 5-point scale) were considered validated and themes scoring < 3.5 were labeled as disputed and carried forward for further justification and reevaluation in the simulated third round.
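A minimal sketch of this consensus rule follows: themes with a median Likert score of at least 4.0 are treated as validated, themes below 3.5 as disputed and carried to Round 3, and anything in between as borderline (the band later handled with minor refinements in Round 2). The scores shown are placeholders, not the study's data.

```python
# Minimal sketch of the consensus rule described above: median >= 4.0 -> validated,
# median < 3.5 -> disputed (carried to Round 3), otherwise borderline. The scores
# below are placeholders, not the study's data.
from statistics import median

theme_ratings = {
    "Universal AI literacy": [5, 4, 5, 4, 4, 5, 4, 4, 5, 4, 5, 4],  # hypothetical panel scores
    "AI detection tools": [2, 3, 4, 2, 3, 3, 2, 4, 3, 2, 3, 3],
    "Multilingual AI options": [4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3],
}

def classify(scores: list[int]) -> tuple[float, str]:
    m = float(median(scores))
    if m >= 4.0:
        return m, "validated"
    if m < 3.5:
        return m, "disputed (to Round 3)"
    return m, "borderline (minor refinement)"

for theme, scores in theme_ratings.items():
    med, status = classify(scores)
    print(f"{theme}: median = {med:.1f} -> {status}")
```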

In this third simulated round, the LLM translated the disputed themes into concrete scenarios or decision alternatives. Each scenario was presented with a clear description of the proposed solution or approach, potential benefits and risks. The LLM simulated stakeholder perspectives, with the aim to resolve polarization through structured deliberation of disputed themes.

In the simulated Round 4, based on the validated and disputed themes, the LLM consolidated the final prioritized outputs and generated a final synthesis. The produced outputs included a simulated consensus framework (a structured outline of themes with median Likert scores ≥4.0, with actionable recommendations), areas for further study (themes with scores < 3.5, requiring additional research or expert review), and a conceptual implementation toolkit (illustrative guidelines to operationalize the simulated consensus findings).

Table 4 presents all the prompts used during phases 1 to 4 of the Delphi process.


Table 4. Prompts used during phases 1 to 4 of the Delphi process.

3 Results of the simulated scenario exploration

3.1 Round 1: exploration and thematic analysis

3.1.1 Current practices in AI education

Analysis of the first set of simulated responses to the questions in Table 2 indicated a strong level of model consensus (≥80%) on core AI-related competencies for undergraduates. The LLM-generated expert perspectives emphasized foundational knowledge, including understanding AI/ML concepts (e.g. algorithms, data dependence, and limitations), awareness of discipline-specific applications (such as ChatGPT in humanities or coding assistants in STEM), and recognition of ethical and societal implications (e.g. bias, privacy, and environmental impact). The simulated analysis identified key skills such as prompt engineering (crafting effective queries for disciplinary tasks), critical evaluation of AI outputs (assessing accuracy, bias, and relevance), and adaptive learning (iteratively refining prompts and validating sources based on AI feedback). Essential attitudes included proactive skepticism (balancing engagement with caution) and ethical accountability (e.g. proper citation and plagiarism avoidance). Areas showing lower simulated consensus (< 70% agreement) focused on technical depth—specifically, whether non-STEM students should learn AI system design, and the tension between universal and discipline-specific competency standards.

The simulation identified the most critical gaps in students' AI competencies, such as over-reliance on AI outputs without verification (e.g. blind trust in generative AI and ignorance of hallucinations), insufficient ethical framing (neglecting bias, copyright, and data privacy concerns), discipline-specific deficits (particularly non-STEM students' lack of tailored AI training), and gaps in critical thinking (e.g. inability to interrogate AI logic, such as asking, “Why did the AI suggest this?”). The highlighted disputed areas included whether the primary challenge lies in access to AI tools vs. skill development, as well as whether gaps stem from inadequate teaching methods or flawed assessment practices.

Regarding how AI encourages and discourages learning, the simulated responses indicated consensus that AI offers significant educational benefits, including personalization (e.g. adaptive tutoring via tools like Khanmigo), democratization (lowering barriers to entry, as with coding assistants for beginners), and creativity catalysis (generative AI like DALL·E sparking ideation). However, the simulated responses also pointed to potential risks like cognitive offloading (overreliance undermining problem-solving effort), homogenization (reduced diversity of thought from overused AI outputs), and erosion of foundational skills (e.g. weakened writing or research abilities). Disputed areas persisted in the simulated responses regarding AI's net impact—whether its benefits outweigh its drawbacks, and subject variability in effects, such as whether creativity is disproportionately hindered in humanities compared to STEM fields.

AI's positive contributions to creative processes included enhanced idea generation (e.g. using LLMs for story prompts), rapid prototyping (producing multiple design variants efficiently), and cross-pollination (exposing students to diverse styles and approaches). However, simulated responses suggested persistent concerns regarding originality risks (erosion of authentic creative voice), skill stagnation (circumventing the deliberate practice essential for mastery), and cultural homogenization (perpetuation of dominant narratives through biased training data). Areas with lower simulated consensus included assessment validity—whether AI-augmented creative work can be evaluated fairly, and long-term developmental effects, particularly whether students' creative capacities will evolve new paradigms or atrophy through AI dependency.

3.1.2 Barriers and challenges in AI integration

Simulated responses highlighted multiple barriers to effective AI adoption, including institutional (lack of training/resources, rigid curricula), pedagogical (difficulty aligning AI with learning outcomes), and cultural (faculty resistance, student overreliance) challenges. Key concerns centered on academic integrity risks, the potential erosion of critical thinking skills, and increased workload for adapting teaching methods.

Regarding AI's creative potential, the LLM reported that AI tools enhance professors' creativity by offering new pedagogical approaches (e.g. AI-assisted lesson design, personalized student feedback) and enabling exploratory learning (e.g. simulating debates or generating case studies). To foster critical thinking, structured AI interactions were emphasized—such as comparative analysis of AI outputs, error identification exercises, and reflective prompting, which prioritize process over answers.

Ethical considerations were prominent in the simulated perspectives, emphasizing the need to address bias in AI outputs, ensure transparency in AI use (e.g. disclosure requirements), protect student data privacy, and mitigate socioeconomic disparities in AI access. Disagreements in the simulated responses persisted on the extent of AI's role, with some advocating for strict boundaries to preserve human-centric learning.

3.1.3 Strategies for AI integration in education

A strong level of simulated consensus (≥80% agreement) was observed for several strategies for fostering AI literacy, including embedded first-year courses (covering prompt engineering, bias detection, and ethics), discipline-specific integration (e.g. tailored AI training for nursing vs. engineering), microcredentialing (stackable badges for skills like “Ethical AI Collaboration”), and experiential learning (e.g. AI-enhanced problem-based learning (PBL) and red team challenges). Regarding teaching methods, simulated responses identified AI-enhanced PBL, dual-path assignments with reflections, ethical debate sprints, and prompt engineering studios as particularly effective for developing critical evaluation and metacognitive skills. However, areas of lower simulated consensus persisted on whether AI training should be mandatory, the focus on general vs. specialized tools, the role of AI detection software, and the merits of standalone AI courses. Key recommendations from the simulation included adopting tiered competency frameworks, funding faculty “AI sandbox” pilots, using process-focused assessments, and designing “AI-interrupted” workflows to preserve human creativity. While these best practices are established, their implementation requires context-sensitive solutions to address disciplinary and institutional differences.

3.1.4 Future AI priorities for teaching (5–10 years)

Strong simulated consensus (≥80%) was observed on three critical technologies: adaptive AI tutors (personalizing support via eye-tracking or voice analysis), generative AI simulators (e.g. medical students diagnosing AI patients), and bias-neutralizing tools (flagging stereotypes in outputs). Divergent simulated perspectives emerged on holographic instructors (seen as either innovative or distracting) and blockchain credentials (debated for practicality). Transformative methodologies included just-in-time microlearning (AI-curated upskilling modules), competency-based streaming (dynamic skill badges), and human-AI co-creation (iterative refinement of work), though AI-generated curricula and emotion-sensing tools provoked ethical disputes. Equity priorities focused on culturally adaptive AI (respecting local epistemologies) and low-bandwidth solutions for underserved areas, while AI mental health tools divided opinion. Key recommendations from the simulation included investing in augmented intelligence (AI-human mentorship pairings), piloting ethical AI audits, and establishing educator “future labs” for controlled testing. All simulated responses consistently stressed that AI must enhance, not replace human connections in learning.

3.2 Round 2: structured feedback

The analysis of the simulated responses obtained in the second round revealed strong consensus (median ≥4.0) on six validated themes: universal AI literacy as a baseline requirement, prioritized faculty training, oral defenses for assessment, mandatory bias audits, AI-enhanced PBL, and contribution rubrics for AI-assisted work (Table 5). These themes are recommended for immediate adoption within institutional policies. Four themes were flagged as disputed (median < 3.5), including AI detection tools, standardization vs. disciplinary flexibility, AI in mental health support, and holographic teaching (Table 6), reflecting tensions between ethics, pedagogy, and innovation. These topics are set to advance to Round 3 for targeted debates and voting based on concrete scenarios. Borderline themes (median 3.5–3.9), such as multilingual AI and review boards, require minor refinements. The tiered framework (core literacy plus disciplinary customization) was endorsed as a compromise, alongside equity safeguards (bias audits, device lending) and human-centered assessments. Structured deliberation is recommended to resolve polarized themes while maintaining momentum on high-consensus priorities.


Table 5. Validated themes from simulated Delphi Round 2 (Median ≥4.0).


Table 6. Disputed themes from simulated Delphi Round 2 (Median <3.5).

3.3 Round 3: scenario-based voting

In Round 3 of the simulated Delphi, a scenario-based voting framework was developed for the four disputed themes, designed to force concrete trade-off analysis and reveal expert priorities.

In regard to “AI detection tools” the following scenario was created:

“Your institution must choose an academic integrity strategy for AI-generated work. Data shows detection tools have 15% false positives but reduce plagiarism by 40%. Alternatives (oral defenses, process portfolios) are resource-intensive.”

The following options for ranking were given: A. Ban detection tools → Invest $250K/year in human grading alternatives. B. Allow formatively only (drafts/low-stakes work) + student appeals process. C. Require for all graded work → Risk 10% increase in student anxiety reports.

LLMs, acting as simulated experts, were instructed to provide rationale addressing the potential effects of their selection on marginalized student populations.

On the topic of ‘Standardization vs. Flexibility', the following scenario was developed:

“Accreditors demand AI competency standards. STEM departments want uniform technical skills; humanities insist on discipline-specific critical AI literacy.”

For this scenario, the following voting options were provided:

A. Fully standardized (all students take identical AI courses). B. Core + addenda (50% shared content, 50% disciplinary). C. Decentralized (departments design own programs).

LLMs were prompted to substantiate the simulated expert's decision by weighing immediate workforce preparedness against fostering higher-order cognitive skills.

For the theme ‘AI in Mental Health Support', the following scenario was created: “Counseling centers are overwhelmed. AI chatbots could expand access but may give harmful advice 5% of the time. Human counselors cost 10x more per session.”

The following voting options were provided: A. Ban AI tools → Maintain waitlists (avg. 3-week delay for care). B. Clinician-supervised AI → Chatbots flag high-risk students for humans. C. Full automation → 24/7 support but risk liability incidents.

For the last topic ‘Holographic Teaching' the scenario was: “A donor offers $2M to pilot holographic professors in 10% of courses. Research shows engagement boosts in STEM labs but distractions in discussion-based humanities classes.”

Three voting options were presented:

A. Adopt widely → Prioritize STEM simulations; accept humanities trade-offs. B. Limited pilots → Test in 3 departments (e.g. med, engineering, art). C. Reject → Invest funds in traditional online learning upgrades.

Simulated experts were required to justify their selection using the structured prompt: “I choose [option] because ______, but we must mitigate ______.”

The simulated voting revealed a preference for pragmatic, mitigation-focused solutions, prioritizing equity and adaptability over universal implementation or rejection of emerging technologies.

Key resolutions from the simulation adopted balanced, safeguard-driven approaches for integrating AI in education. AI detection tools were approved for formative use only, with human review, bias audits, and student opt-outs to mitigate risks. Competency standards were structured as a core curriculum with disciplinary addenda, allocating 20% of budgets for local adaptations to respect field-specific needs. Mental health AI was restricted to clinician-supervised chatbots, requiring transparent disclosures and 48-h human follow-up for high-risk interactions. Lastly, holographic teaching was limited to discipline-specific pilots (e.g. STEM simulations), with mandatory opt-in participation and engagement monitoring to assess efficacy.

3.4 Round 4: finalization

3.4.1 Simulated consensus framework from AI-generated responses

The validated themes, defined by a median Likert score ≥4.0 in the simulated Delphi, were synthesized into a structured roadmap for AI integration in higher education. The first key theme was ‘universal AI literacy', which includes core competences such as prompting, bias detection, and ethics, delivered through a mandatory first-year course and supplemented by discipline-specific modules to ensure contextual relevance.

Assessment methods include oral defenses, process portfolios, and contribution rubrics, emphasizing both individual understanding and collaborative application of AI competencies.

“Human-AI collaborative pedagogy” emerged as a second key theme, emphasizing innovative teaching methods that combine the strengths of both human and artificial intelligence. This approach promotes AI-enhanced PBL, dual-path assignments that allow students to choose between traditional and AI-assisted workflows, and red team challenges designed to cultivate critical thinking and resilience by encouraging students to test and refine AI outputs. To support faculty in adopting these methods, the framework proposes an “AI Wrangler” certification program, offering specialized training, along with financial stipends. This initiative aims to equip educators with the skills and confidence needed to integrate AI meaningfully and ethically into their pedagogical practices.

“Equity-centered implementation” was identified as a third critical theme. To achieve fairness, accessibility, and inclusivity in education, the framework recommended mandatory safeguards such as regular bias audits for all AI tools used in learning environments, device lending programs to bridge the digital divide, and multilingual AI options to support diverse linguistic backgrounds. Additionally, it proposes a suite of policy templates to guide equitable practice across institutions, including standardized syllabus statements outlining acceptable AI use, charters for institutional AI review boards to oversee ethical compliance, and transparency dashboards that communicate how AI tools are used and monitored.

“Ethical guardrails” were also affirmed. An important area of concern is the use of AI in mental health support, where the framework mandates that such tools should be used only under clinician supervision and with clear, mandatory disclosures to users regarding their capabilities and limitations. In terms of academic integrity, AI detection tools are recommended for formative use only, meaning they should support learning rather than enforce punitive measures. All flagged content must undergo human review to prevent false positives and ensure fair treatment.

3.4.2 Areas for further study

The contentious themes with median Likert < 3.5 in the simulated Delphi were flagged for further research and pilot studies before widespread adoption. These include the use of ‘AI detection tools', with concerns about potential bias against non-native speakers and the effectiveness of explainable AI in reducing false positives, suggesting pilot comparisons with process-based assessments in both STEM and humanities fields. Research on “holographic teaching” should focus on the actual impact on learning to measure improvements in comprehension or skill acquisition, rather than assuming increased engagement alone. Lastly, the concept of “full AI mental health automation” raises ethical questions about replacing human counselors and using emotion-sensing chatbots, recommending clinician-monitored pilot studies to test their feasibility and safety.

3.4.3 Implementation toolkit

To operationalize the findings from the simulated Delphi, a practical toolkit was synthesized. It includes:

- ‘Policy templates', such as an AI Use Policy that outlines permitted and restricted tools, disclosure requirements, and opt-out protocols, along with an equity audit checklist for assessing tool bias and identifying access gaps.

- ‘Faculty resources' to aid instructors, such as an assignment design guide with templates for AI-enhanced PBL and dual-path assessments, and a prompt engineering library with curated prompts tailored to specific disciplines (e.g. generating culturally inclusive case studies).

- ‘Student resources', which include AI literacy modules offering self-paced instruction on ethical use and bias detection, and wellbeing guidelines, which promote healthy AI use habits and link to mental health resources.

- ‘Monitoring & evaluation tools' to ensure ongoing accountability and continuous improvement in AI integration across educational settings. This component features dashboard metrics that provide real-time analytics on AI tool usage, flag potential bias incidents, and track trends in student feedback. To assess broader impact, semesterly impact reports are recommended, offering data-driven reviews of learning outcomes, equity-related adjustments, and the overall efficacy of implemented tools. Additionally, feedback loops, including surveys and focus groups with students, faculty, and instructional designers, are recommended as essential mechanisms for capturing qualitative insights, informing policy refinement, and ensuring that AI use remains aligned with pedagogical goals and ethical standards.

4 Validation of the consensus framework and implementation toolkit for AI integration in higher education

To validate the framework and implementation toolkit derived from the LLM-based Delphi simulation, a follow-up expert evaluation round was implemented. This additional step in the process grounds the AI-generated consensus in human expertise, thus providing an initial bridge between the simulated hypotheses and their practical application. The expert panel involved in the validation consisted of 8 participants with diverse backgrounds relevant to AI integration in higher education (Table 7). Participants were recruited through targeted invitations, aiming to ensure representation similar to the expertise of the LLM-simulated expert panel.


Table 7. Characteristics of expert panel.

Expertise included educational technology and AI integration (n = 3), curriculum design, pedagogy, and learning innovation (n = 1), ethics and academic integrity (n = 1), higher-education leadership (n = 1), sustainability (n = 1), internationalization (n = 1), and faculty (n = 3). Most panelists were mid- to senior-career professionals, with 25% having 8–15 years of experience in higher education and 37.5% exceeding 16 years. This panel composition ensured a diverse range of practical, theoretical, and policy-oriented perspectives.

To evaluate the LLM-based outputs, the OECD criteria were applied, which provide a comprehensive framework for assessing their quality and impact through six key dimensions: relevance, coherence, effectiveness, efficiency, impact, and sustainability (OECD, 2021). These criteria emphasize alignment with stakeholder needs and policies, coordination across systems, achievement of objectives, responsible resource use, transformative outcomes, and the continuation of benefits over time.

Relevance assesses the extent to which the LLM outputs address the needs, priorities, and contextual realities of their intended users (students, faculty, and educational institutions), and the extent to which the generated framework and toolkit align with institutional strategies and national and global policy frameworks, such as the Sustainable Development Goals. Particular attention should be given to how well they address issues of equity, access, and inclusion, and whether they support diverse learning environments.

Coherence examines how well the LLM outputs fit within a broader system of policies, practices, and interventions, encouraging an integrated approach to understanding complex interventions.

Effectiveness is the evaluation criterion often used as an overall measure of an initiative's success, as it examines the extent to which objectives are achieved and what results are obtained.

Efficiency focuses on the extent to which the intervention delivers results in an economic and timely way. Evaluating efficiency involves balancing the quality and scope of results with the inputs required, while also considering whether the approach is scalable, cost-effective, and aligned with institutional capacities.

Impact assesses the extent to which significant positive or negative, intended or unintended, higher-level effects are achieved. This includes evaluating whether the LLM outputs influence institutional culture, teaching and learning practices, equity outcomes, and ethical standards in the use of AI.

Evaluating impact requires going beyond immediate outputs to consider the deeper, more transformative effects that the framework and toolkit may have on educational systems and the people within them.

Key questions include: to what extent does the toolkit contribute to long-term improvements in digital literacy, responsible AI use, and inclusive education? Does it lead to broader systemic or cultural changes within institutions? Are there signs of unintended consequences, either positive or negative, that affect students, faculty, or institutional processes?

Sustainability assesses the extent to which the benefits of the LLM outputs are likely to continue over time, considering financial, institutional, social, and technological factors. It involves examining whether it can adapt to evolving educational and technological contexts, and whether institutions are likely to maintain its use.

A 5-point Likert-scale survey was used to evaluate the LLM-generated output, focusing on the four consensus themes from the simulated responses (Universal AI literacy, Human–AI collaborative pedagogy, Equity-centered implementation, and Ethical guardrails) and the four components of the implementation toolkit (Policy templates, Faculty resources, Student resources, and Monitoring & evaluation) (Table 8). In addition, the evaluation instrument included open-ended questions inviting experts to share professional insights, and provide constructive recommendations for refining the proposed framework and toolkit (see Appendix 1).
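For transparency about how such survey responses can be summarized, the sketch below computes, for each theme and OECD criterion, the median rating and the share of experts rating 4 or higher, which is one plausible reading of the agreement percentages reported in the following paragraphs. The ratings shown are placeholders; the actual panel data appear in Tables 8 and 9.

```python
# Hedged sketch of one way to summarize the validation survey: for each theme and
# OECD criterion, compute the median rating and the share of experts rating >= 4
# (one plausible reading of the agreement percentages reported in the text).
# The eight ratings per cell are placeholders, not the actual panel data in Table 8.
from statistics import median

OECD_CRITERIA = ["relevance", "coherence", "effectiveness", "efficiency", "impact", "sustainability"]

# survey[theme][criterion] -> eight expert ratings on a 5-point Likert scale
survey = {
    "Universal AI literacy": {c: [5, 4, 5, 4, 5, 4, 4, 5] for c in OECD_CRITERIA},  # hypothetical
    "Ethical guardrails": {c: [4, 4, 3, 4, 5, 4, 3, 4] for c in OECD_CRITERIA},     # hypothetical
}

for theme, ratings in survey.items():
    for criterion, scores in ratings.items():
        med = median(scores)
        agreement = 100 * sum(s >= 4 for s in scores) / len(scores)
        print(f"{theme} / {criterion}: median = {med}, agreement = {agreement:.1f}%")
```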


Table 8. Survey results—simulated consensus framework.

Survey results showed that experts broadly endorsed the four themes of the simulated consensus, achieving strong agreement across the six OECD evaluation criteria with median Likert scores ≥4.0, with the sole exception of ethical guardrails' sustainability, which had a median of 3.5. The impact dimension received the highest median score across all four domains (Median = 5), indicating a strong consensus on the transformative potential and the strategic importance of the AI-generated framework. The universal AI literacy domain received the strongest overall support, with median scores for relevance, coherence and effectiveness ≥4.5. This indicates that the proposed AI literacy implementation, including a mandatory first-year course supplemented by discipline-specific modules, is contextually appropriate, while the proposed assessments (oral defenses, process portfolios, and contribution rubrics) were deemed both effective and feasible. In the open-ended questions, experts underscored the urgent necessity of integrating AI literacy into academic environments for both students and educators, and the importance of research that clearly supports the efficiency of AI as a tool or resource that enhances, rather than replaces, learning-centric tasks. Furthermore, the need to find new ways of assessing learning was also identified as imperative, as many of the methods currently used (essays, summaries, reports, graphic organizers, etc.) are vulnerable to the use of AI.

Experts recognized the relevance of the human-AI collaborative pedagogy proposals in addressing key gaps in current teaching practices. Eighty-seven point five percent of the experts agreed that integrating AI-enhanced problem-based learning, dual-path assignments, and “red team” challenges forms a coherent and logically connected pedagogical approach, with a median rating of 4.0, indicating a strong consensus. Faculty support measures like AI Wrangler certification and stipends were also viewed as efficient and sustainable in the long term, receiving a median rating of 4.0. Through the open-ended responses, experts drew attention to the necessity of building a symbiotic relationship between humans and AI to reduce the generational gap, which presents an indisputable challenge for professors in adopting new technologies, while also supporting them to work more efficiently. At the same time, experts emphasized that teaching responsible and reflective use of AI in current contexts with limited reading and analytical habits is crucial to avoid excessive reliance on these tools.

Equity-centered implementation strategies (bias audits, device-lending programs, multilingual AI options, and policy templates) were evaluated by 87.5% of the experts as highly relevant, and 75% defined them as effective in addressing access and fairness concerns, while a minority (37.5%) expressed neutrality on coherence, highlighting potential institutional variability. One recommendation was to focus on cultivating the right attitudes toward AI (promoting curiosity, openness, and critical awareness), as equity in the age of AI depends not only on access to infrastructure or skills but also on fostering a reflective and inclusive mindset. Experts also suggested collaborative initiatives with NGOs to establish self-sustaining learning centers, ensuring long-term inclusion rather than short-term access.

The domain of Ethical guardrails, which proposes measures to protect academic integrity and student wellbeing received the lowest support of the four consensus themes, with median ratings of 4.0 for relevance, coherence and effectiveness, and 4.5 for efficiency.

This cautious expert reception of the ethical guardrails theme underscores that a purely abstract ethical commitment is insufficient. To achieve greater legitimacy and operational clarity, ethical frameworks must be explicitly situated within a concrete human rights framework and translated into actionable policy. This involves embedding rights-aligned safeguards such as:

- A data protection impact assessment template mandating purpose limitation, data minimization, and retention limits.

- Vendor algorithmic transparency disclosures covering model provenance, evaluation datasets, known bias profiles, and update cadence.

- A student AI-use and rights charter guaranteeing informed consent, opt-in/out for non-essential data collection, and accessible appeals processes.

- Strict necessity criteria for any high-risk surveillance (e.g., online proctoring, facial recognition), requiring mandatory human review and alternative pathways.

Monitoring must be reinforced with concrete dashboard indicators tracking privacy complaints, false positive incident rates (with a focus on non-native speakers), demographic parity of flags, and time-to-resolution. These tangible components provide the critical foundation for mitigating risks and proactively protecting human rights within institutional AI systems.
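To make these indicators more tangible, the following minimal sketch (not one of the study's outputs) illustrates how such dashboard metrics could be computed from a hypothetical log of AI-related integrity flags. The data frame, column names, and values are illustrative assumptions only.

```python
# Minimal sketch of the proposed dashboard indicators, computed from a
# hypothetical log of academic-integrity flags. Column names and values
# (student_group, flag_outcome, days_to_resolution) are assumptions for
# illustration, not part of the consensus framework itself.
import pandas as pd

flags = pd.DataFrame({
    "student_group": ["native", "non_native", "non_native", "native", "non_native"],
    "flag_outcome": ["upheld", "dismissed", "dismissed", "upheld", "upheld"],
    "days_to_resolution": [4, 12, 9, 3, 15],
})

# Demographic parity of flags: share of all flags raised against each group.
parity = flags["student_group"].value_counts(normalize=True)

# False-positive incident rate: proportion of flags later dismissed,
# reported separately for non-native speakers, as the framework suggests.
fp_overall = (flags["flag_outcome"] == "dismissed").mean()
fp_non_native = (
    flags.loc[flags["student_group"] == "non_native", "flag_outcome"] == "dismissed"
).mean()

# Time-to-resolution indicator: median days from flag to decision.
median_resolution = flags["days_to_resolution"].median()

print(parity, fp_overall, fp_non_native, median_resolution, sep="\n")
```

In practice, such indicators would be recalculated on a fixed reporting cycle and broken down by demographic group so that disparities become visible before they accumulate.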

Experts cautioned that AI is still in its “infancy” with respect to applications in mental health, which presents notable ethical and practical challenges. They emphasized that higher education institutions must urgently recognize the potential risks to students' mental health when AI tools are used as substitutes for human guidance and support. Accordingly, they recommended the development of robust institutional wellbeing policies and early intervention measures to detect and mitigate possible adverse outcomes.

Sustainability, on the other hand, was the lowest-rated criterion, with a median rating ≤4.0. Experts noted that ensuring transparency and the efficient use of resources would help maintain the framework over time. They also highlighted that, beyond general AI literacy, users need to understand the environmental impacts of AI use, including water and energy consumption, and should strive to minimize these impacts wherever possible.

The AI-generated Implementation Toolkit was evaluated by experts in terms of its usability and pilot priority, using a 5-point Likert scale ranging from very high to very low (Table 9).

Table 9. Survey results – simulated implementation toolkit.

The policy templates component received moderate support, with a median rating of 3.5: 50% of experts indicated high to very high usability and pilot priority, while only one expert rated usability as low, suggesting the need for refinements. Faculty resources (including assignment design guides and discipline-specific prompt engineering libraries) and student resources (including self-paced AI literacy modules and wellbeing guidelines) were also positively evaluated, achieving median ratings of 4.0. Both usability and pilot priority were considered high or very high by more than 75% of the experts, suggesting that early implementation would be beneficial to support faculty in designing AI-enhanced learning activities. These results indicate that the resources are practical and actionable, providing faculty with structured tools to integrate AI effectively into pedagogy and supporting students in developing responsible AI practices and healthy engagement with AI tools. Finally, monitoring and evaluation tools (dashboards, impact reports, and feedback loops) received median ratings of 4.0, with 87.5% indicating high or very high usability and 62.5% indicating high or very high pilot priority.

Insights from the open-ended questions indicated that the key conditions for adoption are institutional openness, capacity building, and acceptance of failure. Experts emphasized that the awareness and commitment of university leaders are essential for successful AI policy implementation.

Overall, the statistical summary of expert evaluations indicated strong consensus for the simulated LLM-generated outputs, with median ratings consistently ≥4.0 for most dimensions. Experts concurred that the AI-generated framework and toolkit cover all the relevant aspects and can serve as a solid starting point for implementation of AI-supported educational initiatives.
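For transparency about how such summary statistics can be derived, the following minimal sketch uses purely hypothetical ratings, not the study's raw data, to show how median ratings and percentage agreement can be computed against the consensus threshold (median ≥4.0) applied throughout this section.

```python
# Minimal sketch, assuming hypothetical Likert ratings, of how medians and
# shares of ratings >= 4 can be summarized per evaluation criterion.
# The numbers below are illustrative and are not the study's survey data.
from statistics import median

ratings = {
    "relevance":      [5, 4, 5, 4, 4, 5, 3, 4],
    "coherence":      [4, 4, 5, 4, 3, 4, 4, 5],
    "effectiveness":  [4, 5, 4, 4, 4, 3, 5, 4],
    "sustainability": [4, 3, 4, 3, 4, 4, 3, 4],
}

for criterion, scores in ratings.items():
    med = median(scores)
    agreement = sum(s >= 4 for s in scores) / len(scores) * 100
    verdict = "consensus" if med >= 4.0 else "no consensus"
    print(f"{criterion:<14} median={med:.1f}  agreement={agreement:.1f}%  -> {verdict}")
```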

A key concern raised was that institutional culture, which is often conservative and risk-averse, may resist the transformative changes required for meaningful AI integration.

Experts emphasized the need to shift from hindering to guiding, promoting adoption rather than restriction, and the importance of embracing change, not only to enhance education but also to prepare students for the realities of future work environments.

4.1 Limitations of the study

While this study demonstrates the methodological potential of an LLM-based Delphi simulation in the context of AI competency development, several limitations must be acknowledged. The first limitation is inherent in the methodology's dependence on the state of LLM technology at the time of the study: as AI capabilities evolve, the implementation strategies may become obsolete. While the expert validation is a durable outcome, the specific AI-generated hypotheses and the dialogue dynamics of this particular Delphi process are not permanently fixed. This underscores the importance of interpreting this work as a foundational case study, with its actual longevity depending on the successful application of its methodology to future AI systems. A second limitation stems from the LLM's operational environment, a standard web interface. The inherent non-determinism of this interface, which does not provide control over its parameters, prevents the exact replication of the LLM's reasoning process across iterative Delphi rounds. This inability to perfectly recreate the AI's contribution constrains the strict reproducibility of the process. For future studies applying this method, we recommend the mandatory use of APIs to lock parameters, thereby ensuring full transparency and replicability in the consensus-building mechanism (see the sketch at the end of this subsection). A third constraint concerns the insufficient integration of a rights-based lens within the proposed framework. Although it addresses equity-centered implementation and ethical guardrails, it does not incorporate a comprehensive human rights perspective on AI use in education. Subsequent iterations of the framework should therefore include explicit alignment with human rights principles and develop mechanisms to ensure that AI adoption in higher education remains ethically sound and socially just.

Therefore, the most critical priority is the explicit incorporation of non-negotiable human oversight and ethical safeguards, ensuring that AI use in higher education remains accountable and ethically sound.
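Regarding the second limitation, the following minimal sketch illustrates the kind of API-based configuration recommended for future replications, in which the model snapshot and decoding parameters are pinned for each Delphi round. The model name, prompt, and seed value are placeholders, and seeding reduces, but does not fully eliminate, non-determinism in hosted LLM services.

```python
# Minimal sketch (illustrative, not the study's actual setup) of pinning
# model version, temperature, and seed when querying an LLM via its API,
# so that each simulated Delphi round can be re-run under identical
# decoding parameters and logged for auditability.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",       # pin an explicit model snapshot
    temperature=0,                    # deterministic-leaning decoding
    seed=42,                          # request reproducible sampling
    messages=[
        {"role": "system", "content": "You are one simulated Delphi panelist."},
        {"role": "user", "content": "Rate the proposed AI literacy theme on a 1-5 scale and justify."},
    ],
)

print(response.choices[0].message.content)
print(response.system_fingerprint)    # log the backend fingerprint with each round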

5 Conclusion

Though still in its infancy, AI has already disrupted education, supporting teachers by increasing their productivity, enabling new forms of teaching, and enhancing the learning experience for students.

This research illustrates how iterative LLM dialogues could serve as a rapid and structured hypothesis-generating approach that bridges AI-generated insights and human expertise. The human-AI co-created findings outline potential pathways for curricular transformation and systemic adaptation of pedagogy to AI, while proposing a promising preliminary model for rethinking research methodologies in education. The feasibility and conceptual soundness of the proposed framework received strong preliminary support from a small, diverse expert panel, offering initial face validity feedback and early confirmation of its practical relevance.

The analysis suggests that AI competency development should not be treated as a discrete subject area but as a foundational component of future-ready learning models. Future research must prioritize pilot implementations of the toolkit components to assess their feasibility and effectiveness. Subsequently, longitudinal research would be needed to assess AI's impact on both learning outcomes and educational equity, alongside investigation of educators' evolving role within AI-augmented educational environments.

5.1 Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the author(s) used DeepSeek-V3 (June 2025) and GPT-4o (June 2025) to conduct the LLM-based Delphi methodology and to assist with language editing. No generative AI tools were used to write the empirical sections reporting data from the human experts of the validation round or to perform the statistical analysis. The author reviewed and edited the content as needed and takes full responsibility for the final content of the publication.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

MP: Conceptualization, Validation, Investigation, Writing – review & editing, Supervision, Formal analysis, Writing – original draft, Methodology, Data curation, Project administration.

Funding

The author(s) declared that financial support was received for this work and/or its publication. The publication of this research was made possible through the financial support of the University of Monterrey.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript. During the preparation of this work, the author(s) used DeepSeek-V3 (June 2025) and GPT-4o (June 2025) to conduct the LLM-based Delphi methodology and to assist in editing the resulting data. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feduc.2025.1683909/full#supplementary-material


Keywords: AI competencies, teaching strategies, artificial intelligence in education, LLM-based Delphi methodology, higher education

Citation: Petrova MN (2026) Strategies for developing AI competencies in higher education. Front. Educ. 10:1683909. doi: 10.3389/feduc.2025.1683909

Received: 11 August 2025; Revised: 11 November 2025;
Accepted: 15 December 2025; Published: 13 January 2026.

Edited by:

Adan Lopez-Mendoza, Universidad Autónoma de Tamaulipas, Mexico

Reviewed by:

Dennis Arias-Chávez, Universidad Continental - Arequipa, Peru
Noble Lo, Lancaster University, United Kingdom

Copyright © 2026 Petrova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Miroslava Nadkova Petrova, miroslava.petrova@udem.edu
