Proactive and reactive engagement of artificial intelligence methods for education: a review

The education sector has benefited enormously from integrating digital technology driven tools and platforms. In recent years, artificial intelligence based methods have come to be considered the next generation of technology that can enhance the experience of education for students, teachers, and administrative staff alike. The concurrent boom of necessary infrastructure, digitized data and general social awareness has propelled these efforts further. In this review article, we investigate how artificial intelligence, machine learning, and deep learning methods are being utilized to support the education process. We do this through the lens of a novel categorization approach. We consider the involvement of AI-driven methods in the education process in its entirety, from student admissions, course scheduling, and content generation in the proactive planning phase to knowledge delivery, performance assessment, and outcome prediction in the reactive execution phase. We outline and analyze the major research directions under proactive and reactive engagement of AI in education using a representative group of 195 original research articles published in the past two decades, i.e., 2003–2022. We discuss the paradigm shifts in the solution approaches proposed, particularly with respect to the choice of data and algorithms used over this time. We further discuss how the COVID-19 pandemic influenced this field of active development and the existing infrastructural challenges and ethical concerns pertaining to global adoption of artificial intelligence for education.


Introduction
In 2015, the United Nations General Assembly identified quality education as one of the seventeen sustainable development goals or SDGs [1]. The target is to ensure that by 2030, issues pertaining to access to primary, secondary, technical, vocational and tertiary education are addressed globally. In response to this goal, countries have set individual targets in accordance with economic and social development needs. For instance, the United States Department of Education in 2016 adopted a vision for progress in STEM education by 2026 [2]. In a different part of the world, the Ministry of Education in India has rolled out several initiatives to accelerate equitable access to education [3]. In this context, it is anticipated globally that technology and more importantly artificial intelligence (AI) driven tools will be central to achieving the holistic goal set by the United Nations General Assembly [4].
In the past, there has been considerable discourse about how adoption of artificial intelligence driven methods for education might alter the course of how we perceive education [5,6]. However, in many of the earlier debates, the full potential of artificial intelligence was not recognized due to a lack of supporting infrastructure. It was not until very recently that AI-powered techniques could be used in classroom environments. Since the beginning of the 21st century, there has been rapid progress in the semiconductor industry in manufacturing chips that can handle computations at scale efficiently. In the coming decade too, it is anticipated that this growth trajectory will continue with a focus on wireless communication, data storage and computational resource development [7]. With this ongoing progress, plans to utilize AI driven tools to support students, educators and policy-makers in education appear to be the logical next step.
In this review article, we systematically review how machine learning and artificial intelligence can be utilized in different phases of the educational process, from planning and scheduling to knowledge delivery and assessment. To this end, we introduce a broad categorization of original research articles in the literature into methods that are relevant prior to knowledge delivery and those that are relevant in the process of knowledge delivery, i.e., proactive vs. reactive engagement. Proactive involvement of AI in education comes from its use in student admission logistics, curriculum design, scheduling, teaching content generation, etc. Reactive involvement of AI is considerably broader in scope: AI-based methods can be used for designing intelligent tutoring systems, assessing performance, predicting student outcomes, etc. In the schematic in Fig. 1, we present an overview of our categorization approach. We have selected a sample set of research articles under each category and identified the key problem statements addressed using AI methods in the past 20 years.
The COVID-19 pandemic has been one of the most significant social disruptions in recent history. With the outbreak of the virus, many brick-and-mortar educational institutions had to switch to alternative methods of delivering knowledge to students. In some situations this necessitated creative thinking by the administration, educators and students [8], and in turn accelerated the adoption and use of technology, including artificial intelligence, for education. In this article, we highlight how the global outbreak of the pandemic impacted and shaped the trajectory of artificial intelligence research for education (AIEd).
Through this review article, we aim to address the following questions:
• What were the prominent research directions in the involvement of AI in the end-to-end education process in the past two decades? How have the AI methodologies (i.e., choice of datasets and algorithms) evolved over this period in these major research directions?
• How did the COVID-19 pandemic influence the education landscape, and how can AI in particular drive future developments for educational technologies?
• Does the use of AI-driven methodologies for education widen or bridge the gap between population groups when it comes to access to quality education?
The rest of this review article is organized as follows. In Section 2, we state our search strategy for identifying research articles and present the summary statistics of the articles considered. Here, we clearly define the technical scope of the review. In Section 3, we contextualize our contribution in the light of technical review articles published in the domain of AIEd in the past five years. In Section 4, we present our categorization approach and review the scientific and technical contributions in each category. In Section 5, we discuss the impact of the COVID-19 pandemic on AIEd research. Finally, in Section 6, we discuss the existing limitations in the global adoption of AI driven tools for education and the next steps for this field in general.

Scope Definition
The term artificial intelligence (AI) was coined in 1956 by John McCarthy [9]. Since the first generally acknowledged work of McCulloch and Pitts in conceptualizing artificial neurons, AI has gone through several dormant periods and shifts in research focus. More recently, researchers and social scientists are increasingly using AI-based techniques to address social issues and to build towards a sustainable future [10]. In this article, we review the scope of using artificial intelligence to ensure quality education.

Paper Search Strategy
For the purpose of analyzing recent trends in this field (i.e., AIEd), we have sampled research articles published in peer-reviewed conferences and journals over the past 20 years, i.e., between 2003 and 2022, by leveraging the Google Scholar search engine using keywords specific to each of our identified use cases. Our aim here was to identify a sufficient number of research articles that outlined the breadth of use cases using AI-driven methods in the planning (i.e., proactive) and execution (i.e., reactive) phases of the education process. To accomplish this, we did not restrict our search space by considering only certain conferences or journals [10], by only including articles from authors having a certain h-index [11], or on the basis of citations [12]. With this strategy in place, we have selected 194 relevant


Inclusion and Exclusion Criteria
There is considerable debate in the scientific community about the scope of artificial intelligence [13]. Here, we do not provide a perspective on what can be included under the purview of AI in the context of education, but rather clearly delineate our inclusion/exclusion criteria. For this review article, we include research articles that use methods such as optimal search strategies (e.g., BFS, DFS, etc.), density estimation, machine learning, Bayesian machine learning, deep learning, reinforcement learning, etc. We do not include original research that proposes the use of concepts and methods rooted in operations research, evolutionary algorithms, adaptive control theory, robotics, etc. in our corpus of selected articles. In this review, we only consider peer-reviewed articles that were published in English. We do not include patented technologies and copyrighted EdTech software systems in our scope unless peer-reviewed articles outlining the same contributions have been published by the authors.

Summary statistics
With the scope of our review defined above, here we provide the summary statistics of the 194 technical articles we covered in this review. In Figure 2, we show the distribution of the included scientific and technical articles over the past two decades. We also examined the technical contributions in each category of our categorization approach with respect to the target audiences they catered to (see Figure 3). We primarily identify the following target audience groups for educational technologies: pre-school students, elementary school students, middle and high school students, university students, standardized test examinees, students in e-learning platforms, students of MOOCs, and students in professional/vocational education. Articles where the audience group was not clearly mentioned were marked as belonging to the 'Unknown' target audience category.
In Section 4, we introduce our categorization and perform a deep-dive to explore the breadth of technical contributions in each category. If applicable, we have further identified specific research problems currently receiving much attention as sub-categories within a category. In Figure 4, we demonstrate the distribution of significant research problems within a category. We defer the analysis of the identified trends from these summary plots to the Discussion section of this paper.

Related Works
Since quality education was identified as one of the seventeen SDGs [1], there has been increased attention to addressing existing issues in the field of education through the latest technology. It must be noted, however, that neither the use of technology to benefit education nor artificial intelligence as a technology is an invention of the twenty-first century. It is in fact the increased social awareness about equitable access to developmental resources and the simultaneous development of the infrastructure to support it that led to the increase in scientific and technical contributions in AIEd research. Against this backdrop, the number of review articles surveying the technical progress in this discipline has also increased in the last decade (see Fig. 5; note that we used Google Scholar as the search engine with keywords such as "artificial intelligence for education", "artificial intelligence for education review articles", etc.). Here, we discuss the premise of the review articles published in the last five years and contextualize this article with respect to previously published technical reviews.
Among the review articles identified based on the keyword search on Google Scholar and published between 2018 and 2022, one can identify two thematic categories: (i) Technical reviews with categorization: review articles that group research contributions based on some distinguishing factors such as problem statement, solution methodology, etc. [14,15,11,16,17,18,19,20,21]. (ii) Perspectives on challenges, trends and roadmap: review articles that highlight the current state of research in a domain and offer critical analysis of the challenges and the future road map for the domain [22,23,24,25,26,27]. Closely linked with (i) are review articles that dive deep into the developments within a particular sub-category associated with AIEd, such as AIEd in the context of early childhood education [28], online higher education [29], etc. We have designed this review article to belong to category (i): we distinguish between uses of artificial intelligence for education based on their proactive or reactive involvement in the education process. To the best of our knowledge, this is the first review to categorize AIEd research articles through such a lens and to provide an in-depth review of significant research problems in each category (see schematic in Fig. 1). We believe that our categorization approach introduces researchers to the wide scope of using AI-driven methods for providing quality education. At the same time, the article summarizes for expert researchers the progress of AI research in utilizing diverse datasets and algorithms for problem statements in the education sector and the scope for future research.

Figure 4: Distribution of reviewed technical articles across sub-categories under each category
In Table 1, we have outlined the context of recent categorical review articles along with ours to provide readers a comprehensive summary of how we place this article in the body of literature for AIEd. We introduce a novel categorization that distinguishes between proactive (admissions, scheduling, content generation, etc.) and reactive (tutoring systems, performance assessment, outcome prediction, etc.) involvement of AI for education and review scientific and technical contributions under each sub-category over the past two decades.

Engaging artificial intelligence driven methods in stages of education

Proactive vs Reactive engagement of AI: an introduction
The process of educating a student begins well before the student starts attending lectures and parsing lecture materials. In a traditional education setup, administrative staff and educators begin preparations related to making admissions decisions, scheduling classes to optimize resources, and curating course contents and preliminary assignment materials several weeks prior to the term start date. Once the term starts, the focus of educators is to deliver the course material, give out and grade assignments to assess progress, and provide additional support to students who might benefit from it. The role of the students is to regularly acquire knowledge, ask clarifying questions and seek help to master the material. The role of administrative staff in this phase is less hands-on: they remain involved to ensure smooth and efficient overall progress. Therefore, we can clearly identify two distinct work phases in the end-to-end education process. The first is proactive engagement, where all efforts are to design and curate so as to ensure optimal use of resources. The second is reactive engagement, where all efforts are to ensure that students acquire the necessary information and skills from the sessions they attend and to address any blockers they might encounter. In this review article, we leverage these two phases to distinguish between contributions in the field of AIEd research. Our primary categorization of AIEd research is therefore (i) Proactive engagement of AI for education, and (ii) Reactive engagement of AI for education. Within these broad categorizations, we further identify different genres of research relevant to the education sector. For instance, in the proactive engagement phase, AI-based algorithms can be leveraged to determine student admission logistics, design curricula and schedules, and create course content. On the other hand, in the reactive engagement phase, AI-based methods can be used for designing intelligent tutoring systems (ITS), performance assessment, prediction of student outcomes, etc. (see Figure 1). Another important distinction between the two phases lies in the nature of the data available to develop models. While the former primarily makes use of historical data points or pre-existing estimates of available resources and expectations about learning outcomes, the latter has at its disposal a growing pool of data points from the ongoing learning process, and can therefore be more adaptive and initiate faster pedagogical interventions in response to changing scopes and requirements.

Student admission logistics
In the past, although a number of studies used statistical or machine learning-based approaches to analyze or model student admissions decisions, they had little role in the actual admissions process [35,36]. However, in the face of growing numbers of applicants, educational institutes are increasingly turning to AI-driven approaches to efficiently review applications and make admission decisions. For example, the Department of Computer Science at the University of Texas at Austin (UTCS) introduced an explainable AI system called GRADE (Graduate Admissions Evaluator) that uses logistic regression on past admission records to estimate the probability of a new applicant being admitted to their graduate program [37]. While GRADE did not make the final admission decision, it reduced the number of full application reviews as well as the review time per application by experts. [38] used features extracted from application materials of students, as well as how they performed in the program of study, to predict an incoming applicant's potential performance and identify students best suited for the program. An important metric for educational institutes with regard to student admissions is the yield rate, the rate at which accepted students decide to enroll at a given school. Machine learning has been used to predict enrollment decisions of students, which would help the institute make strategic admission decisions in order to improve its yield rate and optimize resource allocation [39,40]. Additionally, whether students enroll in suitable majors based on their specific backgrounds and prior academic performance is also indicative of future success. Machine learning has also been used to classify students into suitable majors in an attempt to set them up for academic success [41].
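To make the idea of estimating an admission probability with logistic regression concrete, the following is a minimal sketch. It is not the GRADE system of [37]: the two features (centered GPA and centered test score), the training records, and the function names are all hypothetical, and the model is fit with plain batch gradient descent from the standard library only.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def admit_probability(w, b, x) -> float:
    """Estimated probability that an applicant with features x is admitted."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical past records: (GPA minus cohort mean, test score minus cohort mean)
X = [(-0.8, -0.25), (-0.5, -0.10), (-0.3, -0.15), (-0.1, -0.05),
     (0.2, 0.05), (0.4, 0.15), (0.5, 0.20), (0.6, 0.25)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = admitted in the past, 0 = rejected

w, b = fit_logistic(X, y)
p_strong = admit_probability(w, b, (0.6, 0.25))
p_weak = admit_probability(w, b, (-0.8, -0.25))
```

In a screening workflow of the kind described above, such a probability would only prioritize which applications receive a full expert review; the final decision stays with the reviewers.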
Another research direction in this domain approaches the admissions problem from the perspective of students by predicting the probability that an applicant will get admission at a particular university in order to help applicants better target universities based on their profiles as well as university rankings [42,43,44].Notably, more than one such work finds prior GPA (Grade Point Average) of students to be the most significant factor in admissions decisions [45,46].
Given the high stakes involved and the significant consequences that admissions decisions have on the future of students, there has been considerable discourse on the ethical considerations of using AI in such applications, including its fairness, transparency and privacy aspects [47,48]. Aside from the obvious potential risks of worthy applicants getting rejected or unworthy applicants getting in, such systems can perpetuate biases that exist in training data derived from past human decision-making [49]. For example, such systems might show unintentional bias towards certain demographics, genders, races, income groups, etc. The authors of [49] advocated for explainable models for making admission decisions, as well as proper system testing and balancing before reaching the end user. [50] showed that demographic parity mechanisms like group-specific admission thresholds increase the utility of the selection process in such systems in addition to improving its fairness. Despite concerns regarding fairness and ethics, interestingly, university students in a recent survey rated algorithmic decision-making (ADM) higher than human decision-making (HDM) in admission decisions in both procedural and distributive fairness aspects [51].
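As a rough illustration of what group-specific admission thresholds for demographic parity can look like, the sketch below picks, for each group, the score cutoff that admits the same fraction of that group's applicants. This is an assumption-laden toy and not the mechanism analyzed in [50]: the group names, scores, and function names are hypothetical, and ties at the cutoff would break exact parity.

```python
def group_thresholds(scores_by_group, accept_rate):
    """Per group, return the lowest admitted score when every group
    admits the same fraction of its own applicants."""
    thresholds = {}
    for group, scores in scores_by_group.items():
        ranked = sorted(scores, reverse=True)
        k = max(1, round(accept_rate * len(ranked)))  # number admitted
        thresholds[group] = ranked[k - 1]
    return thresholds

def acceptance_rates(scores_by_group, thresholds):
    """Fraction of each group whose score meets that group's threshold."""
    return {g: sum(s >= thresholds[g] for s in scores) / len(scores)
            for g, scores in scores_by_group.items()}

# Hypothetical model scores for two applicant groups
scores = {
    "group_a": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
    "group_b": [0.6, 0.5, 0.4, 0.3],
}
thr = group_thresholds(scores, accept_rate=0.5)
rates = acceptance_rates(scores, thr)
```

With these numbers a single global cutoff of 0.6 would admit half of group_a but only a quarter of group_b, whereas the per-group cutoffs equalize the two acceptance rates.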

Content design
In the context of education, we can define content as (i) learning content for a course, curriculum or test; and (ii) schedules/timetables of classes. We discuss AI/ML approaches for both of the above in this section.
(i) Learning content design: Prior to the start of the learning process, educators and administrators are responsible for identifying an appropriate set of courses for a curriculum, an appropriate set of contents for a course, or an appropriate set of questions for a standardized test. In course and curriculum design, there is a large body of work using traditional systematic and relational approaches [52]; however, the last decade has seen several works using AI-informed curriculum design approaches. For example, [53] uses classical ML algorithms to identify factors prior to declaration of majors in universities that adversely affect graduation rates, and advocates curriculum changes to alleviate these factors. [54] uses tree-based approaches on historical records to prioritize the prerequisite structure of a curriculum in order to determine student progression routes that are effective. [55] proposes an Outcome Based Education (OBE) approach where expected outcomes from a degree program, such as job roles/skills, are identified first, and subsequently the courses required to reach these outcomes are proposed by modeling the curriculum using ANNs. [56] suggests a semi-automated curriculum design approach by automatically curating low-cost, learner-generated content for future learners, but argues that more work is needed to explore data-driven approaches in curating pedagogically useful peer content.
For designing standardized tests such as TOEFL, SAT or GRE, an essential criterion is to select questions having a consistent difficulty level across test papers for fair evaluation. This is also useful in classroom settings if teachers want to avoid plagiarism issues by setting multiple sets of test papers, or in designing a sequence of assignments or exams with increasing order of difficulty. This can be done through Question Difficulty Prediction (QDP) or Question Difficulty Estimation (QDE), an estimate of the skill level needed to answer a question correctly. Question difficulty was historically estimated by pretesting on students or from expert ratings, which are expensive, time-consuming, subjective and often vulnerable to leakage or exposure [57]. Rule-based algorithms relying on difficulty features extracted by experts were also proposed in [58,59] for automatic difficulty estimation. As data-driven solutions became more popular, a common approach used linguistic features [60,61], readability scores [62,63], and/or word frequency features [62,63,64] with ML algorithms such as linear regression, SVMs, tree-based approaches, neural networks, etc. for downstream classification or regression, depending on the problem setup. With automatic testing systems and the ready availability of large quantities of historical test logs, deep learning has been increasingly used for feature extraction (word embeddings, question representations, etc.) and/or difficulty estimation [65,66,67]. Attention strategies have been used to model the difficulty contribution of each sentence in reading problems [68] or to model recall (how hard it is to recall the knowledge assessed by the question) and confusion (how hard it is to separate the correct answer from distractors) in [69]. Domain adaptation techniques have also been proposed to alleviate the need for difficulty-labeled question data for each new course by aligning it with the difficulty distribution of a resource-rich course [70]. [71] points out that a majority of data-driven QDP approaches belong to language learning and medicine, possibly spurred on by the existence of a large number of international and national-level standardized language proficiency tests and medical licensing exams.
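The feature-based QDP pipeline described above (hand-crafted text features followed by a regression model) can be sketched in a few lines. The example below is only illustrative and does not reproduce any cited system: it assumes a single crude readability proxy (mean word length), four hypothetical labeled questions with difficulties in [0, 1], and a closed-form one-variable least-squares fit.

```python
def mean_word_length(question: str) -> float:
    """Crude readability proxy: average number of characters per word."""
    words = [w.strip(".,?!;:()") for w in question.split()]
    words = [w for w in words if w]
    return sum(len(w) for w in words) / len(words)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical (question, difficulty) pairs, e.g. from pretesting
labeled = [
    ("What is a cat?", 0.1),
    ("Name the capital city of France.", 0.3),
    ("Explain how photosynthesis converts sunlight.", 0.6),
    ("Characterize thermodynamic irreversibility quantitatively.", 0.9),
]
slope, intercept = fit_line([mean_word_length(q) for q, _ in labeled],
                            [d for _, d in labeled])

def predict_difficulty(question: str) -> float:
    return slope * mean_word_length(question) + intercept
```

A realistic system would combine many linguistic, readability and word-frequency features and validate against held-out pretest data; the single feature here only shows the shape of the pipeline.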
(ii) Timetabling: The Educational Timetabling Problem (ETP) deals with the assignment of classes or exams to a limited number of time-slots such that certain constraints (e.g., availability of teachers, students, classrooms, equipment, etc.) are satisfied. This can be divided into three types: course timetabling, school timetabling and exam timetabling [72]. Timetabling not only ensures proper resource allocation; its design considerations (e.g., number of courses per semester, number of lectures per day, number of free time-slots per day, etc.) also have a noticeable impact on student attendance behavior and academic performance [73]. Popular approaches in this domain, such as mathematical optimization, meta-heuristic, hyper-heuristic, hybrid and fuzzy logic approaches [72,74], are mostly beyond the scope of our paper (see Section 2.2). Having said that, it must be noted that machine learning has often been used in conjunction with such mathematical techniques to obtain better performing algorithms. For example, [75] used supervised learning to find approximations for evaluating solutions to optimization problems, a critical step in heuristic approaches. Reinforcement learning has been used to select low-level heuristics in hyper-heuristic approaches [76,77] or to obtain a suitable search neighborhood in mathematical optimization problems [78].
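To show what the hard constraints in exam timetabling look like in code, here is a small greedy sketch. It is only meant to illustrate the problem structure (no student may sit two exams in the same time-slot); it is not one of the optimization or learning-based methods cited above, and the course names, student names and function name are hypothetical.

```python
def greedy_exam_timetable(enrolments, n_slots):
    """Assign each exam to the first time-slot with no student clash.

    enrolments maps exam name -> set of enrolled students.
    Returns {exam: slot}; raises ValueError if the greedy pass gets stuck.
    """
    timetable = {}
    # Place exams with the most students first (a simple ordering heuristic).
    for exam in sorted(enrolments, key=lambda e: (-len(enrolments[e]), e)):
        for slot in range(n_slots):
            clash = any(
                placed_slot == slot and enrolments[exam] & enrolments[other]
                for other, placed_slot in timetable.items()
            )
            if not clash:
                timetable[exam] = slot
                break
        else:
            raise ValueError(f"no clash-free slot for {exam}")
    return timetable

# Hypothetical enrolment data
enrolments = {
    "algebra": {"ana", "ben", "cy"},
    "biology": {"ana", "dee"},
    "chemistry": {"ben", "dee"},
    "drama": {"eve"},
}
timetable = greedy_exam_timetable(enrolments, n_slots=3)
```

A greedy pass can fail even when a feasible timetable exists, which is one reason the literature relies on the heuristic and hyper-heuristic search methods mentioned above, with learned components guiding the search.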

Content generation
The difference between content design and content generation is that of curation versus creation. While the former focuses on selecting and structuring the contents for a course/curriculum in a way most appropriate for achieving the desired learning outcomes, the latter deals with generating the course material itself. AI has been widely adopted to generate and improve learning content prior to the start of the learning process, as discussed in this section.
Automatically generating questions from narrative or informational text, or automatically generating problems for analytical concepts, is becoming increasingly important in the context of education. Automatic question generation (AQG) from teaching material can be used to improve learning and comprehension of students, assess information retention from the material and aid teachers in adding supplemental material from external sources without the time-intensive process of authoring assessments from them. Generated questions can also be used as a component in intelligent tutoring systems to drive engagement and assess learning. AQG essentially consists of two aspects: content selection, or what to ask, and question construction, or how to ask it [79], traditionally considered as separate problems. Content selection for questions was typically done using different statistical features (sentence length, word/sentence position, word frequency, noun/pronoun count, presence of superlatives, etc.) [80] or NLP techniques such as syntactic or semantic parsing [81,82], named entity recognition [83], topic modeling [84], etc. Machine learning has also been used in such contexts, e.g., to classify whether a certain sentence is suitable to be used as a stem in cloze questions (a passage with a portion occluded, which the participant needs to fill in) [85]. The actual question construction, on the other hand, traditionally adopted rule-based methods like transformation-based approaches [86] or template-based approaches [87]. The former rephrased the selected content using the correct question key-word after deleting the target concept, while the latter used pre-defined templates that can each capture a class of questions. [88] used an overgenerate-and-rank approach to overgenerate questions, followed by the use of supervised learning for ranking them, but still relied on handcrafted generating rules. Following the success of neural language models and concurrent with the release of large-scale machine reading comprehension datasets [89,90], question generation was later framed as a sequence-to-sequence learning problem that directly maps a sentence (or the entire passage containing the sentence) to a question [91,92,93], and can thus be trained in an end-to-end manner [79]. Reinforcement learning based approaches that exploit the rich structural information in the text have also been explored in this context [94]. While text is the most common type of input in AQG, such systems have also been developed for structured databases [95,96], images [97] and videos [98], and are typically evaluated by experts on the quality of generated questions in terms of relevance, grammatical and semantic correctness, usefulness, clarity, etc.
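The rule-based question construction step described above can be illustrated with two tiny helpers: one that builds a cloze stem by occluding a chosen target term, and one that fills a pre-defined question template. Both are hypothetical sketches rather than the systems of [85,86,87]; content selection (choosing the sentence and target) is assumed to have happened already, and the template set is invented for illustration.

```python
import re

def make_cloze(sentence: str, target: str, blank: str = "_____"):
    """Occlude the first whole-word occurrence of target.

    Returns (stem, answer); raises ValueError if target is absent.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    match = pattern.search(sentence)
    if match is None:
        raise ValueError("target not found in sentence")
    stem = sentence[:match.start()] + blank + sentence[match.end():]
    return stem, match.group(0)

# Hypothetical templates, each capturing one class of questions
TEMPLATES = {
    "definition": "What is {concept}?",
    "cause": "What causes {concept}?",
}

def from_template(kind: str, concept: str) -> str:
    """Fill the template of the given kind with the selected concept."""
    return TEMPLATES[kind].format(concept=concept)

stem, answer = make_cloze("Mitochondria produce most of the cell's ATP.", "ATP")
question = from_template("definition", "photosynthesis")
```

The brittleness of such handcrafted rules (every new question class needs a new rule or template) is what the sequence-to-sequence formulations discussed above aim to remove.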
Automatically generating problems that are similar to a given problem in terms of difficulty level can greatly benefit teachers in setting individualized practice problems to avoid plagiarism and still ensure fair evaluation [99]. It also enables the students to be exposed to as many (and as diverse) training exercises as needed in order to master the underlying concepts [100]. In this context, mathematical word problems (MWPs), an established way of inculcating math modeling skills in K-12 education, have witnessed significant research interest. Preliminary work in automatic MWP generation takes a template-based approach, where an existing problem is generalized into a template, and a solution space fitting this template is explored to generate new problems [101,102,103]. Following the same shift as in AQG, [104] proposed an RNN-based approach that encodes math expressions and topic words to automatically generate such problems. Subsequent research along this direction has focused on improving topic relevance, expression relevance, language coherence, as well as completeness and validity of the generated problems using a spectrum of approaches [105,106,107].
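The template-based approach to MWP generation can be sketched as follows: one existing addition problem is generalized into a template with slots, and the space of slot values is enumerated subject to a constraint on the answer. This is an illustrative toy, not the method of [101,102,103]; the template, the slot values, and the use of a maximum answer as a crude stand-in for difficulty are all assumptions.

```python
import itertools

# Template generalized from a single addition problem (hypothetical)
TEMPLATE = ("{name} has {a} {item}s and buys {b} more. "
            "How many {item}s does {name} have now?")

def generate_problems(names, items, values, max_answer):
    """Enumerate slot fillings whose answer a + b stays within max_answer.

    Returns a list of (problem_text, answer) pairs.
    """
    problems = []
    for name, item, a, b in itertools.product(names, items, values, values):
        answer = a + b
        if answer <= max_answer:
            text = TEMPLATE.format(name=name, item=item, a=a, b=b)
            problems.append((text, answer))
    return problems

problems = generate_problems(["Maya"], ["pencil"], [2, 3, 5], max_answer=7)
```

Every generated problem shares the same underlying expression (a + b), so students receive different surface forms of comparable difficulty; varying topic and language beyond a fixed template is where the neural approaches mentioned above come in.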
On the other end of the content generation spectrum lie systems that can generate solutions based on the content and related questions, which include Automatic Question Answering (AQA) systems, Machine Reading Comprehension (MRC) systems and automatic quantitative reasoning problem solvers [108]. These have achieved impressive breakthroughs with the research into large language models and are widely regarded in the larger narrative as a stepping-stone towards Artificial General Intelligence (AGI), since they require sophisticated natural language understanding and logical inferencing capabilities. However, their applicability and usefulness in educational settings remain to be seen.

Tutoring aids
Technology has long been used to help learners achieve their learning goals. More focused effort on developing computer-based tutoring systems in particular started following the findings of Bloom [109], who reported that students who received tutoring in addition to group classes fared two standard deviations better than those who only participated in group classes. Given its early start, research on Intelligent Tutoring Systems (ITS) is relatively more mature than other research areas under the umbrella of AIEd research. Fundamentally, the difference between designs of ITS comes from the difference in the underlying assumption of what augments the knowledge acquisition process for a student. The review paper on ITS [110] provides a comprehensive timeline and overview of research in this domain. Instead of repeating findings from previous reviews under this category, we distinguish between ITS designs through the lens of the underlying hypotheses. We primarily identified four hypotheses that are currently receiving much attention from the research community: (i) emphasis on tutor-tutee interaction, (ii) emphasis on personalization, (iii) inclusion of affect and emotion, and (iv) consideration of specific learning styles. It must be noted that tutoring itself is an interactive process; therefore most designs in this category have a basic interactive setup. However, contributions in categories (ii) through (iv) have other concepts as the focal point of their tutoring aid design.
(i) Interactive tutoring aids: Previous research in education [111] has pointed out that when a student is actively interacting with the educator or the course contents, the student stays engaged in the learning process for a longer duration. Learning systems that leverage this hypothesis can be categorized as interactive tutoring aids. These frameworks allow the student to communicate (verbally or through actions) with the teacher or the teaching entity (robots or software) and get feedback or instructions as needed.
Early designs of interactive tutoring aids for teaching and support consisted of rule-based systems mirroring interactions between expert teacher and student [112,113] or between peer companions [114]. These template rules provided output based on the inputs from the student. Over the course of time, interactive tutoring systems gradually shifted to inferring the student's state in real time from the student's interactions with the tutoring system and providing fine-tuned feedback/instructions based on the inference. For instance, [115] used a Bayesian active learning algorithm to assess a student's word reading skills while the student was being taught by a robot. Presently, a significant number of frameworks belonging to this category use chatbots as a proxy for a teacher or a teaching assistant [116]. These recent designs can use a wide variety of data such as text, speech, etc. and rely on a combination of sophisticated and resource-intensive deep-learning algorithms to infer and further customize interactions with the student. For example, [117] presents '@dawebot', which uses NLP techniques to train students using multiple choice question quizzes. [118] presents a conversational medical school tutor that uses NLP and natural language understanding (NLU) to understand the user's intent and present concepts associated with a clinical case.
Hint construction and partial solution generation are yet another way to keep students engaged interactively. For instance, [119] used Dynamic Bayes Nets to construct a curriculum of hints and associated problems. [120], in their architecture iGeoTutor, assisted students in mastering geometry theorems by implementing search strategies (e.g., DFS) from partially complete proofs. [121] aims to improve individual and self-regulated learning in group assignments through a conversational system, built using NLU and dialogue management systems, that prompts the students to reflect on lessons learnt while directing them to partial solutions, etc.
One of the requirements of certain professional and vocational training programs, such as those in biology, medicine, the military, etc., is practical experience. With the support of booming infrastructure, many such training programs are now adopting AI-driven augmented reality (AR)/virtual reality (VR) lesson plans. Interconnected modules driven by computer vision, NLU, NLP, text-to-speech (TTS) and information retrieval algorithms facilitate lessons and/or assessments in biology [122], surgery and medicine [123], pathological laboratory analysis [124], military leadership training [125], etc.
(ii) Personalized tutoring aids: As every student is unique, personalizing instruction and teaching content can positively impact the learning outcome of the student [126]. Tutoring systems that incorporate this can be categorized as personalized learning systems or personalized tutoring aids. Notably, personalization during instruction can occur through course content sequencing, display of prompts and additional resources, etc.
The sequence in which a student reviews course topics plays an important role in their mastery of a concept. One of the criticisms of early computer-based learning tools was the 'one approach fits all' method of execution. To address this limitation, personalized instructional sequencing approaches were adopted. In some early developments, [127] developed a course sequencing method that mirrored the role of an instructor using soft computing techniques such as self-organizing maps and feed-forward neural networks. [128] propose the use of decision trees trained on student background information to propose personalized learning paths for creativity learning. Reinforcement learning naturally lends itself to this task. Here an optimal policy (sequence of instructional activities) is inferred depending on the cognitive state of a student (estimated through knowledge tracing) in order to maximize a learning-related reward function. As knowledge delivery platforms are increasingly becoming virtual and thereby generating more data, deep reinforcement learning has been widely applied to the problem of instructional sequencing [129,130,131,132]. [56] presents a systematic review of RL-induced instructional policies that were evaluated on students, and concludes that over half outperform all baselines they were tested against.
In order to display a set of relevant resources personalized with respect to a student's state, an algorithmic search is carried out in a knowledge repository. For instance, [133] uses information retrieval and NLP techniques to present two frameworks: PedaBot, which allows students to connect past discussions to the current discussion thread, and MentorMatch, which facilitates student collaboration customized to the student's current needs. Both PedaBot and MentorMatch use text data coming from a live discussion board in addition to textbook glossaries. In order to reduce information overload and allow learners to easily navigate e-learning platforms, the Deep Learning-Based Course Recommender System (DECOR) has been proposed recently [134]. This architecture comprises neural network based recommendation systems trained using student behavior and course related data.
(iii) Affect aware tutoring aids: Scientific research proposes incorporating the affect and behavioral state of the learner into the design of the tutoring system, as it enhances the effectiveness of the teaching process [135,136]. In [137], Arroyo et al. suggest that cognition, meta-cognition and affect should indeed be modeled using real-time data and used to design intervention strategies. The affect and behavioral state of a student can generally be inferred from sensor data that tracks minute physical movements of the student (eye gaze, facial expression, posture, etc.). While initial approaches in this direction required sensor data, obtaining and using such data is constrained by ethical and legal concerns. 'Sensor-free' approaches have therefore been proposed that use data such as student self-evaluations and/or interaction logs of the student with the tutoring system. [138,139] use interaction data to build affect detector models: the raw data in these cases are first distilled into meaningful features and then fed into simple classifier models that detect individual affective states. [140] compares the usage of sensor and interaction data in delivering motivational prompts in the course of military training. [141] uses RNNs to enhance the performance of sensor-free affect detection models. In their review of affect and emotion aware tutoring aids, [142] explore in depth the different use cases for affect aware intelligent tutoring aids, such as enriching user experience, better curating learning material and assessments, and delivering prompts for appraisal, navigational instructions, etc., as well as the progress of research in each direction.
(iv) Learning style aware tutoring aids: Yet another perspective in the domain of ITS is that understanding the learning styles of the students prior to delivery of course content leads to better end outcomes. [143,144,145,146], among others, proposed different approaches to categorize learning styles of students. Traditionally, an individual's learning style was inferred via a self-administered questionnaire. More recently, however, machine learning based methods are being used to categorize learning styles more efficiently from noisy subject data. [147,148,149,150] use as input the completed questionnaire and/or other data sources, such as interaction data and behavioral data of students, and feed the extracted features into feed-forward neural networks for classification. Unsupervised methods such as self-organizing maps (SOMs) trained using curated features have also been used for automatic learning style identification [151]. While for categorization per the Felder and Silverman learning style model, counts of student visits to different sections of the e-learning platform were found to be more informative [150], [152], for categorization per the Kolb learning model, student performance and student preference features were found to be more relevant. Additionally, machine learning approaches have also been proposed for learning style based learning path design. In [153], learning styles are first identified through a questionnaire and represented on a polar map; thereafter, neural networks are used to predict the best presentation layout of the learning objective for a student.

Performance assessment and monitoring
A critical component of the knowledge delivery phase involves assessing student performance by tracing their knowledge development and providing grades and/or constructive feedback on assignments and exams, while simultaneously ensuring academic integrity is upheld. Conversely, it is also important to evaluate the quality and effectiveness of teaching, which has a tangible impact on the learning outcomes of students. AI-driven performance assessment and monitoring tools have been widely developed for both learners and educators. Since the majority of evaluation material is in textual format, NLP-based models in particular have a major presence in this domain. We divide this section into student-focused and teacher-focused approaches, depending on the direct focus group of such applications.
(i) Student-focused: Knowledge tracing. An effective way of monitoring the learning progress of students is through knowledge tracing, which models knowledge development in students in order to predict their ability to answer the next problem correctly given their current mastery level of knowledge concepts. This not only benefits the students by identifying areas they need to work on, but also the educators in designing targeted exercises, personalized learning recommendations and adaptive teaching strategies [154]. An important step of such systems is cognitive modeling, which models the latent characteristics of students based on their current knowledge state. Traditional approaches for cognitive modeling include factor analysis methods, which estimate student knowledge by learning a function (logistic in most cases) based on various factors related to the students, course materials, learning and forgetting behavior, etc. [155,156,157]. Another research direction explores Bayesian inference approaches that update student knowledge states using probabilistic graphical models like the Hidden Markov Model (HMM) on past performance records [158], with substantial research being devoted to personalizing such model parameters based on student ability and exercise difficulty [159,160]. Recommender system techniques based on matrix factorization have also been proposed, which predict future scores given a student-exercise performance matrix with known scores [161,162]. [163] provides a comprehensive taxonomy of recent work in deep learning approaches for knowledge tracing. Deep knowledge tracing (DKT) was one of the first such models, which used recurrent neural network architectures for modeling the latent knowledge state along with its temporal dynamics to predict future performance [164]. Extensions along this direction include incorporating external memory structures to enhance the representational power of knowledge states [165,166], incorporating attention mechanisms to learn the relative importance of past questions in predicting the current response [167,168], leveraging textual information from exercise materials to enhance prediction performance [169,154] and incorporating forgetting behavior by considering factors related to the timing and frequency of past practice opportunities [170,171]. Graph neural network based architectures were recently proposed in order to better capture dependencies between knowledge concepts or between questions and their underlying knowledge concepts [172,173,174]. Specific to programming, [175] used a sequence of embedded program submissions to train RNNs to predict performance in the current or the next programming exercise. However, as pointed out in [163], handling of non-textual content such as images, mathematical equations or code snippets to learn richer embedding representations of questions or knowledge concepts remains relatively unexplored in the domain of knowledge tracing.
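To make the HMM-based family of knowledge tracing models discussed above more concrete, the following is a minimal sketch of the classic Bayesian Knowledge Tracing update, which treats mastery of a single skill as a two-state hidden variable. It is offered as an illustration only and is not claimed to be the exact model of any work cited here; the class and function names, as well as the parameter values in the usage note, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """Hypothetical container for the four standard BKT parameters."""
    p_init: float   # prior probability that the skill is already mastered
    p_learn: float  # probability of acquiring the skill at each opportunity
    p_slip: float   # probability of answering wrongly despite mastery
    p_guess: float  # probability of answering correctly without mastery


def predict_correct(p_mastery: float, params: BKTParams) -> float:
    """Probability that the next response is correct given current mastery."""
    return p_mastery * (1 - params.p_slip) + (1 - p_mastery) * params.p_guess


def update_mastery(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Bayesian update on one observed response, then the learning transition."""
    if correct:
        numerator = p_mastery * (1 - params.p_slip)
        denominator = numerator + (1 - p_mastery) * params.p_guess
    else:
        numerator = p_mastery * params.p_slip
        denominator = numerator + (1 - p_mastery) * (1 - params.p_guess)
    posterior = numerator / denominator
    # The student may also learn the skill after this practice opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: list[bool], params: BKTParams) -> list[float]:
    """Mastery estimate after each response in a sequence."""
    p_mastery = params.p_init
    history = []
    for correct in responses:
        p_mastery = update_mastery(p_mastery, correct, params)
        history.append(p_mastery)
    return history
```

For example, with illustrative parameters `BKTParams(p_init=0.3, p_learn=0.2, p_slip=0.1, p_guess=0.2)`, calling `trace([True, True, False], params)` raises the mastery estimate after each correct answer and lowers it after the incorrect one. The personalization work cited above can be read as fitting these four parameters per student or per exercise rather than globally.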
Grading and feedback. While technological developments have made it easier to provide content to learners at scale, scoring their submitted work and providing feedback on similar scales remains a difficult problem. While assessing multiple-choice and fill-in-the-blank type questions is easy enough to automate, automating the assessment of open-ended questions (e.g., short answers, essays, reports, code samples, etc.) and questions requiring multi-step reasoning (e.g., theorem proving, mathematical derivations, etc.) is considerably harder. Automatic evaluation nevertheless remains an important problem because it not only reduces the burden on teaching assistants and graders, but also removes grader-to-grader variability in assessment and helps accelerate the learning process for students by providing real-time feedback [176].
In the context of written prose, a number of Automatic Essay Scoring (AES) and Automatic Short Answer Grading (ASAG) systems have been developed to reliably evaluate compositions produced by learners in response to a given prompt, and are typically trained on a large set of written samples pre-scored by expert raters [177,178]. Over the last decade, AI-based essay grading tools evolved from using handcrafted features such as word/sentence count, mean word/sentence length, n-grams, word error rates, POS tags, grammar, punctuation, etc. [179,180,181,182] to automatically extracted features using deep neural network variants [183,184,185,186]. Such systems have been developed not only to provide holistic scoring (assessing essay quality with a single score), but also for more fine-grained evaluation by providing scoring along specific dimensions of essay quality, such as organization [187], prompt-adherence [188], thesis clarity [189], argument strength [190], thesis strength [191], etc. Since it is often expensive to obtain expert-rated essays to train on each time a new prompt is introduced, considerable attention has been given to cross-prompt scoring using multi-task, domain adaptation or transfer learning techniques, both with handcrafted [180,181] and automatically extracted features [192,193]. Moreover, since feedback is a critical aspect of essay drafting and revising, AES systems are increasingly being adopted into Automated Writing Evaluation (AWE) systems that provide formative feedback along with (or instead of) final scores and therefore have greater pedagogical usefulness [194]. For example, AWE systems have been developed for providing feedback on errors in grammar, usage and mechanics [195] and text evidence usage in response-to-text student writings [196]. AI-based evaluation tools are also heavily used in computer science education, particularly programming, due to its inherent structure and logic. Traditional approaches for automated grading of source code, such as test-case based assessments [197] and assessments using code metrics (lines of code, number of variables, number of statements, etc.), while simple, are neither robust nor effective at evaluating program quality.
A more useful direction measures similarities between abstract representations (control flow graphs, system dependence graphs) of the student's program and correct implementations of the program [198,199] for automatic grading. Such similarity measurements could also be used to construct meaningful clusters of source code and propagate feedback on student submissions based on the cluster they belong to [200,201]. [176] extracts informative features from abstract representations of the code to train machine learning models using expert-rated evaluations in order to output a finer-grained evaluation of code quality. [202] used RNNs to learn program embeddings that can be used to propagate human comments on student programs to orders of magnitude more submissions. A bottleneck in automatic program evaluation is the availability of labeled code samples. Approaches proposed to overcome this issue include learning question-independent features from code samples [203,204] or zero-shot learning using human-in-the-loop rubric sampling [205].
Elsewhere, driven by the maturing of automatic speech recognition technology, AI-based assessment tools have been used for mispronunciation detection in computer-assisted language learning [206,207,208] or the more complex problem of spontaneous speech evaluation where the student's response is not known a priori [209]. Mathematical language processing (MLP) has been used for automatic assessment of open response mathematical questions [210,211], mathematical derivations [212] and geometric theorem proving [213], where grades for previously unseen student solutions are predicted (or propagated from expert-provided grades), sometimes along with partial credit assignment. [214], moreover, overcomes the limitation of having to train a separate model per question by using multi-task and meta-learning tools that promote generalizability to previously unseen questions.
Academic integrity issues. Another aspect of performance assessment and monitoring is to ensure the upholding of academic integrity by detecting plagiarism and other forms of academic or research misconduct. [215], in their review paper on academic plagiarism detection in text (e.g., essays, reports, research papers, etc.), classify plagiarism forms according to an increasing order of obfuscation level, from verbatim and near-verbatim copying to translation, paraphrasing, idea-preserving plagiarism and ghostwriting. In a similar fashion, plagiarism detection methods have been developed for increasingly complex types of plagiarism, and widely adopt NLP and ML-based techniques for each [215]. For example, lexical detection methods use n-grams [216] or vector space models [217] to create document representations that are subsequently thresholded or clustered [217] to identify suspicious documents. Syntax-based methods rely on PoS tagging [218], frequency of PoS tags [219] or comparison of syntactic trees [220]. Semantics-based methods employ techniques such as word embeddings [221], Latent Semantic Analysis [222], Explicit Semantic Analysis [223], word alignment [224], etc., often in conjunction with other ML-based techniques for downstream classification [225,226]. Complementary to such textual analysis-based methods, approaches that use non-textual elements like citations, math expressions, figures, etc. also adopt machine learning for plagiarism detection [227]. [215] also provides a comprehensive summary of how classical ML algorithms such as tree-based methods, SVMs, neural networks, etc. have been successfully used to combine more than one type of detection method to create the best-performing meta-system. More recently, deep learning models such as different variants of convolutional and recurrent neural network architectures have also been used for plagiarism detection [228,229].
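As a rough illustration of the lexical detection idea described above (n-gram document representations followed by thresholding), the sketch below compares word trigram sets with Jaccard similarity. It is a simplified stand-in rather than the method of any cited system; the function names, the choice of Jaccard similarity, and the default threshold of 0.5 are all assumptions made for this example.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, keep alphanumeric tokens, return the set of word n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_suspicious(submission: str, sources: dict[str, str],
                    n: int = 3, threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (source name, similarity) pairs at or above the threshold, highest first."""
    submission_grams = word_ngrams(submission, n)
    flagged = []
    for name, text in sources.items():
        score = jaccard(submission_grams, word_ngrams(text, n))
        if score >= threshold:
            flagged.append((name, score))
    return sorted(flagged, key=lambda pair: -pair[1])
```

A comparison of this kind only catches verbatim and near-verbatim copying; the paraphrasing and idea-preserving forms listed above are the reason the syntax-based and semantics-based methods exist.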
In computer science education, where programming assignments are given to evaluate students, source code plagiarism can also be classified based on increasing levels of obfuscation [230]. The detection process typically involves transforming the code into a high-dimensional feature representation followed by measurement of code similarity. Aside from traditionally used features extracted based on structural or syntactic properties of programs [231,232], NLP-based approaches such as n-grams [233], topic modeling [234], character and word embeddings [235] and character-level language models [236] are increasingly being used for robust code representations. Similarly, for downstream similarity modeling or classification, unsupervised [237] and supervised [238,235] machine learning and deep learning algorithms are popularly used.
It is worth noting that AI itself makes plagiarism detection an uphill battle. With the increasing prevalence of easily accessible large language models like InstructGPT [239] and ChatGPT [240] that are capable of producing natural-sounding essays and short answers, and even working code snippets in response to a text prompt, it is now easier than ever for dishonest learners to misuse such systems for authoring assignments, projects, research papers or online exams. How plagiarism detection approaches, along with teaching and evaluation strategies, evolve around such systems remains to be seen.
(ii) Teacher-focused: Teaching Quality Evaluations (TQEs) are important sources of information in determining teaching effectiveness and in ensuring learning objectives are being met. The findings can be used to improve teaching skills through appropriate training and support, and also play a significant role in employment and tenure decisions and the professional growth of teachers. Such evaluations have been traditionally performed by analyzing student evaluations, teacher mutual evaluations, teacher self-evaluations and expert evaluations [241], which are labor-intensive to analyze at scale. Machine learning and deep learning algorithms can help with teacher evaluation by performing sentiment analysis of student comments on teacher performance [242,243,244], which provides a snapshot of student attitudes towards teachers and their overall learning experiences. Further, such quantified sentiments and emotional valence scores have been used to predict students' recommendation scores for teachers in order to determine prominent factors that influence student evaluations [245]. [246] uses student ratings related to class planning, presentation, management and student participation to directly predict instructor performance.
Apart from helping extract insights from teacher evaluations, AI can also be used to evaluate teaching strategies on the basis of other data points from the learning process. For example, [247] used a symbolic regression-based approach to evaluate the impact of assignment structures and collaboration type on student scores, which course instructors can use for the purpose of self-evaluation. Several works use a combination of student ratings and attributes related to the course and the instructor to predict instructor performance and investigate factors affecting learning outcomes [248,249,250].

Outcome prediction
While a course is ongoing, one way to assess knowledge development in students is through graded assignments and projects. On the other hand, educators can also benefit from automatic prediction of students' performance and automatic identification of students at risk of course non-completion. This can be accomplished by monitoring students' patterns of engagement with the course material in association with their demographic information. Such a priori understanding of a student's outcome allows for designing effective intervention strategies. Presently, most K-12, undergraduate and graduate students, when the necessary resources are available, rely on computer and web-based infrastructure [251]. A rich source of data indicating student state is therefore generated when a student interacts with the course modules. Before computers became such an integral component of education, researchers frequently used surveys and questionnaires to gauge student engagement, sentiment and attrition probability. In this section we summarize research developments in the field of AI that generate early predictions of student outcomes, covering both final performance and the possibility of drop-out.
Early research in outcome prediction focused on building explanatory regression-based models for understanding student retention using college records [252]. The active research direction in this space gradually shifted to tackling the more complex and more actionable problems of understanding whether a student will complete a program [253], estimating the time a student will take to complete a degree [254] and predicting the final performance of a student [255] given the current student state. In the subsequent paragraphs, we discuss the research contributions for outcome prediction, distinguishing between performance prediction in assessments and course attrition prediction. Note that we discuss these separately as poor performance in any assessment cannot be generalized into a course non-completion.
(i) A priori performance prediction: A priori prediction of a student's performance has several benefits: it allows a student to evaluate their course selection, and allows educators to evaluate progress and offer additional assistance as needed. Not surprisingly, therefore, AI-based methods have been proposed to automate this important task in the education process.
Initial research articles predicting performance estimated time to degree completion [254] using student demographic, academic, residential and financial aid information, student parent data and school transfer records. In a related theme, researchers have also mapped the question of performance prediction into a final exam grade prediction problem (i.e., excellent, good, fair, fail, etc.) [255,256,257]. This granular prediction eventually allows educators to assess which students require additional tutoring. Baseline algorithms in this context are Decision Trees, Support Vector Machines, Random Forests, Artificial Neural Networks, etc. (regression or classification based on the problem setup). Researchers have aimed to improve the performance of the predictors by including relevant information such as student engagement and interactions [258,256], the role of external incentives [259], previous performance records [260], etc. [261] proposed that a student's performance, or the time at which the student is expected to graduate, should be predicted progressively (using an ensemble machine learning method) over the duration of the student's tenure, as the academic state of the student is ever-evolving and can be traced through their student records. The process of generalizing performance prediction to non-traditional modes of learning, such as hybrid or blended learning and online learning, has benefited from the inclusion of additional information sources such as web-browsing information [262], discussion forum activity, student study habits [263], etc.
In addition to exploring a more informative and robust feature set, deep learning based approaches have recently been found to outperform traditional machine learning algorithms. For example, [264] used deep feed-forward neural networks and split the problem of predicting student grade into multiple binary classification problems, viz., Pass-Fail, Distinction-Pass, Distinction-Fail and Withdrawn-Pass. [265] analyzed whether transfer learning (i.e., pre-training neural networks on student data from a different course) can be used to accurately predict student performance. [266] used a generative adversarial network based architecture, ICGAN-DSVM, to address the challenges of low volume of training data in alternative learning paradigms such as supportive learning. [257] proposed extensive data pre-processing using min-max scaling, quantile transformation, etc. before passing the data into a deep-learning model such as a one-dimensional convolutional network (CN1D) or a recurrent neural network (LSTM). For a comprehensive survey of ML approaches for this topic, we refer readers to [267] and [268].
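The decomposition of grade prediction into several binary problems mentioned above can be sketched as a data preparation step. The snippet below only illustrates how the four pairwise datasets might be assembled from labeled records; it does not reproduce the networks or training procedure of [264], and the function name, the toy feature rows and the 1/0 label encoding are assumptions made for this example.

```python
# Class pairs named in the text; the first class of each pair is encoded as 1.
PAIRS = [
    ("Pass", "Fail"),
    ("Distinction", "Pass"),
    ("Distinction", "Fail"),
    ("Withdrawn", "Pass"),
]


def build_binary_tasks(features, labels, pairs=PAIRS):
    """Split a multi-class dataset into one binary dataset per class pair.

    Returns a dict mapping "A-B" to (rows, targets), where rows holds only
    the examples labeled A or B, and targets is 1 for A and 0 for B.
    """
    tasks = {}
    for positive, negative in pairs:
        rows, targets = [], []
        for row, label in zip(features, labels):
            if label == positive:
                rows.append(row)
                targets.append(1)
            elif label == negative:
                rows.append(row)
                targets.append(0)
        tasks[f"{positive}-{negative}"] = (rows, targets)
    return tasks
```

Each resulting dataset could then be passed to any binary classifier; records whose label is in neither class of a pair are simply left out of that pair's dataset.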
(ii) A priori attrition prediction: Students dropping out before course completion is a concerning trend. This is more so in developing nations where very few students finish primary school [269]. The outbreak of the COVID-19 pandemic exacerbated the scenario due to indefinite school closures. This led to a loss in learning and in progress towards providing access to quality education [270]. The causes for dropping out of a course or a degree program can be diverse, but early prediction allows administrative staff and educators to intervene. To this end, there have been efforts in using machine learning algorithms to predict attrition.
Massive Open Online Courses (MOOCs): In the context of attrition, special mention must be made of Massive Open Online Courses (MOOCs). While MOOCs promise the democratization of education, one of the biggest concerns with MOOCs is the disparity between the number of students who sign up for a course and the number of students who actually complete it: the drop-out rate in MOOCs is very high [271], [272]. Yet in order to make post-secondary and professional education more accessible, MOOCs have become more a practical necessity than an experiment. The COVID-19 pandemic has only emphasized this necessity [273]. In our literature search phase, we found a sizeable number of contributions in attrition prediction that use data from MOOC platforms. In this subsection, we include those as well as attrition prediction in traditional learning environments.
Early educational data mining methods [253] proposed to predict student drop-out mostly used data sources such as student records (i.e., student demographics, academic, residential, gap year, financial aid information, etc.) and administrative records (major administrative changes in education, records of student transfers, etc.) to train simple classifiers such as Logistic Regression, Decision Tree, BayesNet, Random Forest, etc. Selecting an appropriate set of features and designing explainable models has been important as these later inform intervention [274]. To this end, researchers have explored features such as students' prior experiences, motivation and home environment [275], student engagement with the course [276], [277], etc. In an exciting experiment, [278] crowd-sourced feature engineering for predicting student attrition. With the inclusion of an online learning component (particularly relevant for MOOCs), the click-stream data and browser information generated allowed researchers to better understand student behavior in an ongoing course. Using historical click-stream data in conjunction with present click-stream data allowed [279] to effectively predict drop-outs weekly using a simple Support Vector Machine algorithm. This kind of data has also been helpful in understanding the traits indicative of decreased engagement [280], the role of a social cohort structure [281] and the sentiment in the student discussion boards and communities [282] leading up to student drop-out. [283] addresses the concern that weekly predictions of the probability of a student dropping out might have wide variance by including smoothing techniques. On the other hand, as resources to intervene might be limited, [284] recommends assigning a risk score per student rather than a binary label. [285] treats the level of activity of a student in bins of time during a semester as binary features (active vs. inactive) and then uses these sequences as n-grams to predict drop-out. Recent developments in predicting student attrition propose the use of data acquired from disparate sources in addition to more sophisticated algorithms such as deep feed-forward neural networks [286], the hybrid logit leaf model [287], etc.
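The idea of representing a student's activity over time bins as a binary sequence and then extracting n-grams from it can be sketched as follows. This is a simplified illustration of the feature construction only, not the pipeline of [285]; the function names, the per-bin event counts and the `min_events` cut-off for calling a bin "active" are assumptions made for this example.

```python
from collections import Counter


def binarize_activity(event_counts, min_events=1):
    """Map per-bin event counts (e.g. weekly clicks) to 1 (active) or 0 (inactive)."""
    return [1 if count >= min_events else 0 for count in event_counts]


def activity_ngrams(bits, n=3):
    """Count every length-n window of the binary activity sequence.

    Each window is rendered as a string such as "100", so the result can be
    used directly as a sparse feature vector for a drop-out classifier.
    """
    windows = [
        "".join(str(bit) for bit in bits[i:i + n])
        for i in range(len(bits) - n + 1)
    ]
    return Counter(windows)
```

For instance, weekly click counts of `[5, 0, 2, 0, 0, 0]` become the sequence `[1, 0, 1, 0, 0, 0]`, whose bigram counts are `{"10": 2, "01": 1, "00": 2}`. A trailing run of inactive windows such as `"00"` or `"000"` is the kind of pattern one would expect a downstream classifier to associate with elevated drop-out risk, although that association would have to be learned from data.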

Impact of COVID-19 pandemic on driving AI research in the frontier of education
The COVID-19 pandemic, possibly the most significant social disruptor in recent history, impacted more than 1.5 billion students worldwide [288] and is believed to have had far-reaching consequences in the domain of education, possibly even generational setbacks [289,290,291]. As lockdowns and social distancing mandated a hastened transition to fully virtual delivery of educational content, teachers and administrators grappled with the issue of providing quality education at scale in new formats while ensuring learning progress remains unhindered. We discuss how the pandemic changed the education landscape through remote learning and exacerbated socio-economic inequalities in learning, and outline how AI has the potential to address some of the identified issues.
Remote learning and AI. The pandemic era saw an increasing adoption of video conferencing software and social media platforms for knowledge delivery, combined with more asynchronous formats of learning. These alternative media of communication were often accompanied by decreasing levels of engagement and satisfaction of learners [292,293]. There was also a corresponding decrease in practical sessions, labs and workshops, which are critical in some fields of education. For example, [294] outlines the disruptive effect of the pandemic on the medical education landscape, possibly making medical students under-prepared for clinical training experiences typically conducted in person, such as history-taking, physical examination, etc. However, the pandemic also led to an accelerated adoption of AI-based approaches in education. Pilot studies show that the pandemic led to a significant increase in the usage of AI-based e-learning platforms [295]. Moreover, a natural by-product of the transition to online learning environments is the generation and logging of more data points from the learning process [296] that can be used in AI-based methods to assess and drive student engagement and provide personalized feedback. Online teaching platforms also make it easier to incorporate web-based content, smart interactive elements and asynchronous review sessions to keep students more engaged [295,297].
Inclusive education and AI. Several recent works have investigated the role of pandemic-driven remote and hybrid instruction in widening gaps in educational achievements by race, poverty level and gender [298,299,300]. A widespread transition to remote learning necessitates access to proper infrastructure (electricity, internet connectivity and smart electronic devices that can support video conferencing apps and basic file sharing) as well as resources (learning material, textbooks, educational software, etc.), which create barriers for low-income groups [301]. [301] also outlines the importance of parents as allies of educators to mitigate some of the limitations of remote learning. This is far less achievable in low-income groups (and ethnic minorities in some cases) due to a lack of awareness, lower educational qualifications of adults in the household, as well as the disproportionate loss of income and public health impact of the COVID-19 pandemic in these communities [302]. Even within similar populations, unequal distribution of household chores, income-generating activities and access to technology-enabled devices affects students of different genders disproportionately [300]. Moreover, remote learning requires a level of tech-savviness on the part of students and teachers alike, which might be less prevalent in people with learning disabilities. It is therefore undeniable that such widespread changes had far-reaching impacts on inclusive and quality education worldwide.
In this context, [303] outlines the different ways AI is used in special needs education for the development of adaptive and inclusive pedagogies. [304] reviews the different ways in which AI positively impacts the education of minority students, e.g. through facilitating performance/engagement improvement, student retention, student interest in STEM/STEAM fields, etc. [304] also outlines the technological, pedagogical and socio-cultural barriers for AIEd in inclusive education.

Discussion
In this article, we have investigated the involvement of artificial intelligence in the end-to-end educational process. We have highlighted specific research problems both in the planning and in the knowledge delivery phase and reviewed the technological progress in addressing those problems in the past two decades. To the best of our knowledge, this distinction between proactive and reactive phases of education, accompanied by a technical deep-dive, is unique to this review.

Major trends in involvement of AI in the end-to-end education process
The growing interest in AIEd can be inferred from Fig. 2 and Fig. 5, which show how both the count of technical contributions and the count of review articles on the topic have increased over the past two decades. It is to be noted that the number of technical contributions in 2021 and 2022 (assuming our sample of reviewed articles is representative of the population) might have fallen in part due to pandemic-related indefinite school closures and the shift to alternate learning models. This triggered a setback in data collection, reporting and annotation efforts due to a number of factors including lack of direct access to participants, unreliable network connectivity, the need for enumerators to adapt to new training modes, etc. [305]. Another important observation from Fig. 3 is that AIEd research in most categories focuses heavily on learners in universities, e-learning platforms and MOOCs; work targeting pre-school and K-12 learners is conspicuously absent. A notable exception is research surrounding tutoring aids, which receives nearly uniform attention across target audience groups.
In all categories, to different extents, we see a distinct shift from rule-based and statistical approaches to classical ML to deep learning methods, and from handcrafted features to automatically extracted features. This advancement goes hand-in-hand with the increasingly complex nature of the data being utilized for training AIEd systems. Whereas earlier approaches used mostly static data (e.g. student records, administrative records, demographic information, surveys and questionnaires), the use of more sophisticated algorithms necessitated (and in turn benefited from) more real-time and high-volume data (e.g. student-teacher/peer-peer interaction data, click-stream information, web-browsing data, etc.). The type of data used by AIEd systems also evolved from mostly tabular records to more text-based and even multi-modal data, spurred on by the emergence of large language models that can handle large quantities of such data.
Even though data-hungry models like deep neural networks have grown in popularity across almost all categories discussed here, AIEd often suffers from a lack of sufficient labeled data to train such systems. This is particularly true for small classes and new course offerings, or when existing curricula or tests are changed to incorporate new elements. As a result, another emerging trend in AIEd focuses on using information from resource-rich courses or existing teaching/evaluation content through domain adaptation, transfer learning, few-shot learning, meta learning, etc.

Existing challenges in adopting Artificial Intelligence for education
In 2023, artificial intelligence has permeated the lives of people globally in some aspect or other (e.g. chat-bots for customer service, automated credit score analysis, personalized recommendations etc.). At the same time, AI-driven technology for the education sector is gradually becoming a practical necessity globally. The question therefore is: what are the existing barriers to the global adoption of AI for education in a safe and inclusive manner, such that SDG4 can be achieved [1]? In this section, we analyze and discuss some of our observations with regard to existing AIEd technologies and the challenges in deploying these technologies globally.
Lack of cultural awareness in AIEd tools: The lack of international representation in the AIEd research field has been brought to light several times, both through perspective review articles [306,307] and through bibliometric analysis [32,308]. In particular, most of the existing AIEd technologies were conceptualized and developed in the context of Western, Educated, Industrialized, Rich and Democratic (WEIRD) societies. Therefore, such technologies make implicit assumptions about students and teachers (e.g., individualism, fairness, cooperation, sensitivity to disciplinary actions etc.) that are valid in such demographics. Moreover, it is likely that the training data for the AI entities in these frameworks was obtained from a very specific section of the global population. As a result, the trained models will have limited generalization capabilities when deployed for a different target audience. Classroom instruction strategies, teacher-student interactions, peer-to-peer interactions and teaching curricula are expected to vary widely across the globe, making cultural awareness and sensitivity in educational frameworks a must [309,310,311]. Beyond classroom and online courses, culture-aware tutoring systems are relevant for professional and vocational training as well. The notion of developing culturally aware educational technologies is relatively new, and some early research in this direction [312,313,314] proposes the embodiment of culturally sensitive gestures and movements in intelligent tutoring systems. However, it remains to be seen how this concept matures and is made available in classroom and online settings for learners from different backgrounds [315].
Lack of concrete legal and ethical guidelines for AIEd research: As pointed out by [25], besides most AIEd researchers being concentrated in the technologically advanced parts of the world, most AIEd platforms and applications are currently owned by the private sector. Private-investor-funded research at large corporations such as Coursera, EdX, IBM, McGraw-Hill and start-ups like Elsa, Century, Querium has yielded several robust AIEd applications. However, as these platforms are privately owned, there is little transparency or regulation regarding their development and operations. Due to this, there is growing concern on the part of guardians and teaching staff regarding the data accessed by these platforms, the privacy and security of the data stored and the explainability of the deployed models. Regulation policies at the international, national and state levels can help address these concerns of the end users. While many tech-savvy nations have had a head start in this [316], drafting general guidelines for AIEd platforms is still very much a nascent concept for most policy makers.
Lack of equitable access to infrastructure hosting AIEd: Education is one of the most important social equalizers [317]. However, in order to ensure more people have access to quality education, AI-enabled teaching and studying tools are necessary to reduce the stress on educators and administrative staff [25]. The irony here is that the cost of deploying and operating AIEd tools often alienates communities with limited means, thereby widening the gap in access to education. [318] mentions that limited access to electricity, internet, data storage and processing hardware has been a barrier to deploying AI-driven platforms. To remove these obstacles, changes must be brought about at local and global levels. While the formation of international alliances that invest in infrastructure development can usher in the technology in developing nations, changes in local policies can expedite the process [319].
Lack of skilled personnel to operate AIEd tools in production: Investing in AIEd research and supporting infrastructure alone is not sufficient to ensure long-term utility and usage of AI-driven tools for education. The workforce responsible for using these tools on a day-to-day basis must also be brought up to speed. Currently, there is a considerable amount of apprehension, particularly in developing countries, regarding the use of AI for education [315,320]. The main concerns relate to data privacy and security, job security, ethics etc. following the adoption of AI in this sector. These concerns in turn have slowed down the integration of technology for education. In this context, we must echo [25] in mentioning that while these concerns are relevant and must be addressed, in our review of AIEd research we have not found any evidence that should invoke panic in educators and administrative staff. AIEd research as it stands today only augments the role of the teacher, and does not eliminate it. Furthermore, for the foreseeable future, we would need a human in the loop to provide feedback and ensure proper daily usage of these tools.

Concluding Remarks
Through this review, we identified the paradigm shift over the past 20 years in formulating computational models (i.e., choice of algorithms, choice of features etc.) and training them (i.e., choice of data): we are indeed increasingly leaning towards sophisticated yet explainable frameworks. As the scope of this review includes a period of social disruption due to the COVID-19 pandemic, it gave us the opportunity to reflect on the utility and the robustness of the technology proposed thus far. To this end, we have discussed the concerns and limitations brought to light by the pandemic and the research ideas arising from them.
With the target of ensuring equitable access to education being set for 2030, one of the inevitable questions arising is: are we ready to use AI-driven ed-tech tools to support educators and students? This question remains to be answered. Based on our survey, we have observed that while some parts of the world have seen great momentum in making AIEd part and parcel of the education sector, in other parts of the world this progress is stymied by inadequate access to necessary infrastructure and human resources. The pivotal point at this time is that while there need to be changes at a socio-economic level to adopt state-of-the-art AI-driven ed-tech technologies as standard tools for education, the progress made and the ongoing conversations are reasons for optimism.

Figure 2: Distribution of the reviewed technical articles across the past two decades

Figure 3: Distribution of reviewed technical articles across categories and target audience categories

Figure 5: Number of review articles published in AIEd over the past decade.