Caring assessments: challenges and opportunities

Caring assessments is an assessment design framework that considers the learner as a whole and can be used to design assessment opportunities that learners find engaging and appropriate for demonstrating what they know and can do. This framework considers learners' cognitive, metacognitive, intra- and interpersonal skills, aspects of the learning context, and cultural and linguistic backgrounds as ways to adapt assessments. Extending previous work on intelligent tutoring systems that "care" from the field of artificial intelligence in education (AIEd), this framework can inform research and development of personalized and socioculturally responsive assessments that support students' needs. In this article, we (a) describe the caring assessment framework and its unique contributions to the field, (b) summarize current and emerging research on caring assessments related to students' emotions, individual differences, and cultural contexts, and (c) discuss challenges and opportunities for future research on caring assessments in the service of developing and implementing personalized and socioculturally responsive interactive digital assessments.


Introduction
Personalization in the assessment context is an umbrella term that can include many different approaches. Most prior research and development has focused on adaptations based on students' prior knowledge or performance during the assessment (e.g., Shemshack et al., 2021). However, personalization may sometimes consider other intra- or interpersonal aspects of students' experience (Du Boulay, 2018). For example, student engagement has been utilized in effort-monitoring computer-based tests (Wise et al., 2006, 2019), and a wider range of student emotions has been used to enhance performance-based adaptation in several personalized learning systems (D'Mello et al., 2011; Forbes-Riley and Litman, 2011). Research in the field of artificial intelligence in education (AIEd) has increasingly emphasized a more holistic picture of learners that takes into account cognitive, metacognitive, and affective aspects of the learner to explain their behavior in learning environments (Grafsgaard et al., 2012; Kizilcec et al., 2017; Yadegaridehkordi et al., 2019), reflecting growing interest in integrating positive psychology into research within the AIEd community (Bittencourt et al., 2023).
The caring assessments (CA) framework provides an approach for designing adaptive assessments that learners find engaging and appropriate for demonstrating their knowledge, skills, and abilities (KSAs; Zapata-Rivera, 2017). This conceptual framework considers cognitive aspects of the learner as well as metacognitive, intra- and interpersonal skills, aspects of the learning context, cultural and linguistic backgrounds, and interaction behaviors within an integrated learner model, and uses this model to personalize assessment to students' needs (Zapata-Rivera et al., 2023). Multiple lines of research must be conducted to bring this vision for caring assessment to fruition. This Perspective article describes the CA framework and its unique contributions to the field (Section 2) and summarizes current and emerging research on the CA framework, emphasizing students' emotions (Section 3), individual differences (Section 4), and cultural contexts (Section 5). Challenges and opportunities emerging from this literature are also discussed (Section 6), highlighting gaps and future directions for AIEd research that are most promising for advancing the vision of CA.

The caring assessments framework
The CA framework (see Figure 1) is a conceptual framework for adaptive assessment design which proposes that assessments can provide a more engaging student experience, while collecting more precise information about students' KSAs, by better understanding who students are and how they interact with the assessment (Zapata-Rivera, 2017). This better understanding of students can be leveraged to provide "caring" in the form of adaptations before, support during, and feedback after the assessment (Lehman et al., 2018).
Caring support before the assessment involves the development of student profiles that include a variety of information about the student, from personal characteristics (e.g., interests, beliefs, linguistic background) to contextual information such as prior learning opportunities (Zapata-Rivera et al., 2020). These profiles can then be leveraged to provide students with an adapted version of the assessment that affords them the best opportunity to engage with the assessment and demonstrate what they know and can do. Alternative versions of the assessment could vary in format (e.g., multiple-choice items or game-based tasks), language (e.g., toggling between English and Spanish), or context (e.g., using different texts while measuring the same underlying reading skills).
The student profiles that enable caring support before the assessment also serve as the starting point for providing caring support during the assessment. Caring support during the assessment will require an integrated learner model (ILM) that considers both student and contextual characteristics (from the student profile) and the interaction behaviors students demonstrate during the assessment. This ILM is a more complex learner model than is typically employed in personalized learning and assessment tasks, but it draws on prior research on various types of learner models (Zapata-Rivera and Arslan, 2021; Bellarhmouch et al., 2023). The ILM can leverage information from the student profile and interactions to provide on-demand support. For example, a student might become disengaged during the assessment, and the ILM could deploy a motivational message that has been personalized based on the student's interests or prior opportunity to learn within the domain (Kay et al., 2022).
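As a concrete illustration of how an ILM-driven decision step might work, the minimal Python sketch below combines hypothetical profile fields (interests, prior opportunity to learn) with simple behavioral indicators of disengagement to select a personalized motivational message; all class names, fields, and thresholds are illustrative assumptions rather than components of the published framework.

```python
# A minimal, hypothetical sketch of an ILM-style decision step: combining profile
# information with interaction behaviors to trigger a personalized message.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StudentProfile:
    interests: List[str] = field(default_factory=list)  # e.g., ["basketball", "music"]
    prior_opportunity_to_learn: float = 0.5             # 0-1, from contextual information

@dataclass
class InteractionState:
    idle_seconds: float = 0.0     # time since the student's last action
    rapid_guess_count: int = 0    # very fast responses on recent items

def is_disengaged(state: InteractionState,
                  idle_threshold: float = 90.0,
                  guess_threshold: int = 3) -> bool:
    """Crude behavioral check for disengagement based on logged interaction data."""
    return state.idle_seconds > idle_threshold or state.rapid_guess_count >= guess_threshold

def select_support(profile: StudentProfile, state: InteractionState) -> Optional[str]:
    """Return a personalized motivational message, or None if no support is needed."""
    if not is_disengaged(state):
        return None
    topic = profile.interests[0] if profile.interests else "something you enjoy"
    return (f"Take a moment to reset. Try connecting this task to {topic}, "
            f"then give the next question another look.")

# One step of a decision cycle during the assessment
profile = StudentProfile(interests=["basketball"], prior_opportunity_to_learn=0.3)
state = InteractionState(idle_seconds=120.0)
print(select_support(profile, state))
```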
Caring support after the assessment is primarily provided in the feedback report. The goal is to provide feedback that the student will find easy to understand and that motivates them to continue their learning journey. This necessitates feedback reports that utilize asset-based language (Gay, 2013; Ramasubramanian et al., 2021) and provide context for performance on the assessment by leveraging the information in the ILM (e.g., identifying learners' relevant prior knowledge and lived experiences and the strengths they demonstrated on the assessment, along with areas for improvement). This contextualized reporting could, for example, identify whether student responses were connected to specific behavioral patterns, or could connect current performance to students' prior experiences or opportunities to learn in order to highlight progress. Contextualized reporting can also be utilized when providing feedback to teachers, which can then support teacher decision-making on the next appropriate steps to support student learning and continue caring support outside of the assessment.
The CA framework builds on several areas of prior research. The notion of an adaptive "caring" assessment system (Zapata-Rivera, 2017) builds on AIEd research on adaptive intelligent tutoring systems that "care" as they support learning (Self, 1999; Kay and McCalla, 2003; Du Boulay et al., 2010; Weitekamp and Koedinger, 2023). Attending to a broader set of student characteristics, contexts, and behaviors also allows the CA framework to leverage findings from multiple learning theories when developing "caring" supports. The emphasis on using intra- and interpersonal characteristics and other contextual information to drive assessment adaptation is consistent with, and can leverage, models of self-regulated learning (e.g., Winne and Hadwin, 1998; Pintrich, 2000; Kay et al., 2022). The inclusion of a broader set of characteristics, contexts, and behaviors also extends the idea of "conditional fairness" in assessments that use contextual information about students' backgrounds to adapt assessment designs and scoring rules (Mislevy, 2018), and extends typical research on computer adaptive assessments driven by performance and item difficulty (van der Linden and Glas, 2010; Shemshack et al., 2021).
While the CA framework has relevance to both large-scale summative and classroom formative assessment contexts, there is greater flexibility in applying it to the design of tools for formative contexts, given the emphasis on providing on-demand "caring" support to help learners maximize their learning and engagement during assessment tasks (Zapata-Rivera, 2017). Efforts toward realizing this framework have investigated how students' emotions, individual differences, and cultural contexts can best be leveraged to provide personalized assessment experiences. Next, we summarize this current and emerging research.

Student emotions
As anyone who has completed an assessment knows, it can be an emotional experience. However, very few assessments support students in remaining in a productive emotional state (see Wise et al., 2006, 2019 for exceptions) or consider students' emotions when determining assessment outcomes (see Wise and DeMars, 2006 for an exception). Most research on student emotions during test taking has focused on documenting those experiences after test completion, and has shown that different emotions are differentially related to assessment outcomes (Spangler et al., 2002; Pekrun et al., 2004, 2011; Pekrun, 2006). The impact of student emotions during learning activities has received far greater attention (see D'Mello, 2013 for a review), and there are multiple examples of personalized learning systems that leverage both student cognition and emotions to provide feedback and guide instructional decisions (e.g., D'Mello et al., 2011; Forbes-Riley and Litman, 2011).
In our own research on the emotional experiences of students during interactions with conversation-based assessments, we build on prior work in both assessment and learning contexts by focusing on the intensity of discrete emotions (Lehman and Zapata-Rivera, 2018). Although boredom, frustration, and confusion showed no overall relationship with performance, the same pattern emerged for all three emotions once intensity was considered: low intensity was positively correlated with performance, medium intensity was not correlated, and high intensity was negatively correlated. While confusion has been found to have a more positive relationship with learning than boredom and frustration (e.g., Craig et al., 2004; D'Mello and Graesser, 2011, 2012, 2014; D'Mello et al., 2014), and frustration a more positive relationship than boredom (Baker et al., 2010), in the assessment context the three emotions appear to have a similar relationship with outcomes. The intensity findings for confusion, specifically, may relate to prior findings that the partial (Lee et al., 2011; Liu et al., 2013) or complete resolution of confusion (D'Mello and Graesser, 2014; Lehman and Graesser, 2015) is necessary for learning. Real-time tracking of students' emotional experiences (states and intensity) can be leveraged to provide caring support during the assessment, as has been successfully implemented in personalized digital learning systems. However, integrating emotion detectors into the ILM will require going beyond prior research, because both the experience of emotions and the ways in which those experiences are supported to promote learning will need to consider additional factors (e.g., student interest, cultural background). In the assessment context, the use of student emotions can also be expanded to provide caring support after the assessment by providing context for a student's performance to both the teacher and the student (e.g., the student was confused while responding to items 2, 5, and 7), allowing for more informed instructional decisions. In the CA framework, the ways in which student emotions are leveraged to support student learning will build upon prior learning research and will require new research efforts to ensure that emotions are productively integrated with other individual differences.
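To make this concrete, the sketch below shows one way a caring assessment might act on detected emotion states and intensities, interrupting only high-intensity episodes in line with the intensity pattern described above; the emotion labels, intensity cutoff, and messages are illustrative assumptions, not components of an existing system.

```python
# Illustrative sketch only: mapping a detected emotion state and its intensity to
# a support decision. Low- and medium-intensity episodes are left alone, loosely
# reflecting the finding that only high-intensity boredom, frustration, and
# confusion were negatively related to performance.
from typing import Optional

SUPPORT_MESSAGES = {
    "confusion": "This question seems tricky. Re-reading the passage may help; a hint is available.",
    "frustration": "This one is tough. You can flag it and come back to it later.",
    "boredom": "Almost there. The next task uses the scenario you picked at the start.",
}

def support_for_emotion(emotion: str, intensity: float) -> Optional[str]:
    """Return a caring message only for high-intensity emotional states.

    `intensity` is assumed to be a detector output in [0, 1]; 0.7 is an arbitrary cutoff.
    """
    if emotion in SUPPORT_MESSAGES and intensity >= 0.7:
        return SUPPORT_MESSAGES[emotion]
    return None  # low/medium intensity: do not interrupt the assessment

print(support_for_emotion("confusion", 0.85))  # returns a message
print(support_for_emotion("boredom", 0.30))    # returns None
```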

Individual differences
Students enter test-taking experiences with a wide variety of interests, prior knowledge, experiences, attitudes, motivations, dispositions, and other intra- or interpersonal qualities that can affect their engagement with and performance on educational assessments and other academic outcomes (Braun et al., 2009; Lipnevich et al., 2013; Duckworth and Yeager, 2015; West et al., 2016; Abrahams et al., 2019). For example, self-efficacy beliefs are strongly linked to academic achievement across domains (Guthrie and Wigfield, 2000; Richardson et al., 2012; Schneider and Preckel, 2017). Understanding how individual differences influence performance in interactive learning environments suggests directions for interventions or dynamic supports (Self, 1999) based on cognitive or motivational variables (Du Boulay et al., 2010) or prior knowledge (Khayi and Rus, 2019) that can be applied in assessments.
In previous work, we investigated student characteristics that predict performance on innovative conversation-based assessments of science inquiry and mathematical argumentation (Sparks et al., 2019, 2022). Students' science self-efficacy, growth mindset, cognitive flexibility, and test anxiety (with a negative coefficient) predicted performance on a science assessment (Sparks et al., 2019), while cognitive flexibility and perseverance (with a negative coefficient) predicted performance on mathematical argumentation (Sparks et al., 2022), controlling for student demographics and domain skills. Cluster analyses resulted in interpretable profiles with distinct relationships to student characteristics and performance, suggesting distinct paths for caring support within the CA framework (Sparks et al., 2020). For example, one profile represented students with average domain ability but relatively low cognitive flexibility, while another reflected motivated but test-anxious students. We hypothesize that these profiles would benefit from different supports (i.e., motivational messages vs. anxiety-reduction strategies; Arslan and Finn, 2023). However, the profiles and associated supports must be developed and validated in future research with students and teachers to ensure that the profiles reflect, and the adaptations address, the aspects most meaningful for instruction.
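For readers interested in how such profiles might be derived in practice, the sketch below clusters synthetic student characteristic data with k-means; the published analyses may have used different variables, preprocessing, and clustering methods, so this is purely illustrative.

```python
# Purely illustrative: deriving student profiles by clustering standardized
# self-report measures (synthetic data). The actual studies may differ in
# variables, preprocessing, and clustering approach.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)
# Columns: self-efficacy, growth mindset, cognitive flexibility, test anxiety
X = rng.normal(size=(200, 4))

X_std = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)

# Cluster centers (in standardized units) can be inspected to label profiles,
# e.g., "motivated but test-anxious" if motivation and anxiety are both high.
print(np.round(kmeans.cluster_centers_, 2))
print(kmeans.labels_[:10])  # profile assignment for the first ten students
```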

Cultural contexts
The prominence of social justice and anti-racist movements has resulted in increasing or renewed interest in (socio-)culturally responsive assessment (SCRA) practices (Hood, 1998; Lee, 1998; Qualls, 1998; Sireci, 2020; Bennett, 2022, 2023; Randall, 2021), which are themselves grounded in culturally relevant, responsive, and sustaining pedagogies (Paris, 2012; Gay, 2013; Ladson-Billings, 2014). Recent research reflects increasing attention to students' cultural characteristics when designing and evaluating AI-enabled instructional systems (Blanchard and Frasson, 2005; Mohammed and Watson, 2019; Talandron-Felipe, 2021); we can apply lessons from this work toward digital assessment design. As the K-12 student population becomes increasingly demographically, culturally, and linguistically diverse (National Center for Education Statistics, 2022), educational assessments must account for such variation, enabling test-takers to demonstrate their knowledge, skills, and abilities in ways that are most appropriate given their cultural, linguistic, and social contexts (Mislevy, 2018; Sireci and Randall, 2021). Test items can include content reflective of situations, contexts, and practices students encounter in their lives (Randall, 2021), which can tap into students' home and community funds of knowledge (Moll et al., 1992; González et al., 2005) in ways that foster deeper student learning through meaningful connections to familiar, interesting contexts (Walkington and Bernacki, 2018). For example, math problems assessing knowledge of fractions within a recipe context could vary the context to align with students' cultural backgrounds (e.g., beans and cornbread vs. a peanut butter sandwich).
Positive effects have also been shown for African American students interacting with pedagogical agents in personalized learning systems when the agents employ dialects similar to the students' own (Finkelstein et al., 2013).
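Building on the recipe example above, the sketch below illustrates one simple way an assessment could select an item variant whose surface context matches a student's self-identified familiar contexts while keeping the measured construct fixed; the variant texts, identifiers, and profile fields are hypothetical.

```python
# Hypothetical sketch: choosing an item variant whose surface context matches a
# student's self-reported familiar contexts, while the measured construct
# (fraction arithmetic) stays the same. Item texts and identifiers are invented.
from typing import Dict, List

ITEM_VARIANTS: Dict[str, Dict[str, str]] = {
    "fractions_recipe": {
        "beans_and_cornbread": "A beans-and-cornbread recipe calls for 3/4 cup of beans ...",
        "peanut_butter_sandwich": "A peanut butter sandwich recipe uses 3/4 cup of peanut butter ...",
        "default": "A recipe calls for 3/4 cup of flour ...",
    }
}

def select_variant(item_id: str, familiar_contexts: List[str]) -> str:
    """Return the first variant the student reports as familiar, else a default."""
    variants = ITEM_VARIANTS[item_id]
    for context in familiar_contexts:
        if context in variants:
            return variants[context]
    return variants["default"]

print(select_variant("fractions_recipe", ["beans_and_cornbread"]))
```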
Emerging work is exploring cultural responsiveness in the context of scenario-based assessments (SBAs). SBAs are a useful context for exploring cultural factors in assessment performance and the potential for implementing personalization within the CA framework (Sparks et al., 2023a,b). SBAs intentionally situate students in meaningful contexts for problem solving, providing a purpose and goal for responding to items (Sabatini et al., 2019). SBA developers have emphasized how scenarios can be made relevant to students from diverse racial, ethnic, and cultural backgrounds by intentionally incorporating contexts and content that celebrate students' cultural identities and integrate funds of knowledge from an asset-based perspective (O'Dwyer et al., 2023). Similar work has been conducted in designing robots for educational purposes, with students serving as co-creators to enable cultural relevance and responsiveness (Li et al., 2023). For example, SBA topics with greater cultural relevance to Black students (e.g., the Harlem Renaissance) show comparable reliability and validity but smaller group differences in performance relative to more general topics (Ecosystems, Immigration), potentially due to Black students' greater engagement (Wang et al., 2023). Our current research (Sparks et al., 2023a,b) involves measuring students' self-identified cultural characteristics to examine relationships among their engagement and performance on SBAs; their racial, ethnic, and cultural identities; and their emotions, interests, motivations, prior knowledge, and experiences (i.e., home and community experiences, values, and practices related to assessment topics; Lave and Wenger, 1991; Gutiérrez and Rogoff, 2003; González et al., 2005). In future research, we aim to incorporate these characteristics into student profiles and evaluate how the profiles can be leveraged to provide a personalized assessment experience. This combination of cultural responsiveness and personalization has been explored in the learning context (Blanchard, 2010); however, additional research is needed to understand these dynamics in order to provide caring support within assessments.

Challenges and opportunities for caring assessment
Personalization within a CA framework introduces several challenges, as well as opportunities, when considering implementation of this framework within a digital learning system. The holistic view of students reflected in the ILM, going beyond measures of cognitive skill or performance to incorporate emotions, motivations, knowledge, interest, and other characteristics, requires access to data that are not typically collected during educational assessments (Zapata-Rivera, 2017). Contextual variables are often collected via survey methods (e.g., Braun et al., 2009; Abrahams et al., 2019) but could increasingly be collected by less intrusive means, such as embedded assessment (Zapata-Rivera and Bauer, 2012; Zapata-Rivera, 2012; Rausch et al., 2019) and stealth assessment (Shute et al., 2009, 2015; Shute and Ventura, 2013) approaches that use logfile data from the assessment interaction. For example, student interest could be measured using time-on-task and clickstream behaviors rather than a survey. Such approaches may collect multimodal interaction data (e.g., audio or visual data) and leverage this information in an ILM. Collection of such multimodal data raises privacy concerns regarding what is being collected, where data are stored, and who has access, especially to the extent that personally identifiable information (PII) may be collected. Policy prohibitions may prevent collection and storage of certain data types (Council of the European Union, 2023). Ethical and secure data handling, and transparency with users about what data will be collected and how they will be retained and used, are paramount, especially for K-12 students. Thus, implementation of the CA framework will require innovative measurement and modeling methodologies as well as close collaboration with students and teachers to build trust. Much like the ILM itself, it will be critical to integrate these independent lines of work in new research efforts that apply the CA framework in practice. Such integrated research is being actively explored in the INVITE institute toward the development of "caring" STEM learning environments for K-12 students.
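As a simple example of what a less intrusive, logfile-based measure might look like, the sketch below combines time-on-task and counts of meaningful actions into a rough engagement proxy; the event names, weights, and thresholds are illustrative assumptions, not a validated measure.

```python
# Illustrative only: an embedded, logfile-based proxy for engagement/interest,
# assuming a per-task event log of (timestamp_seconds, event_type) tuples.
# Event names, weights, and thresholds are invented for the sketch.
def engagement_proxy(events, expected_time: float = 60.0) -> float:
    """Combine time-on-task and meaningful interactions into a rough 0-1 score."""
    if not events:
        return 0.0
    times = [t for t, _ in events]
    time_on_task = max(times) - min(times)
    meaningful = sum(1 for _, e in events if e in {"tool_use", "revisit_text", "revise_answer"})
    time_score = min(time_on_task / expected_time, 1.0)  # penalizes rapid guessing
    action_score = min(meaningful / 5.0, 1.0)            # caps the interaction count
    return 0.5 * time_score + 0.5 * action_score

log = [(0, "item_open"), (12, "revisit_text"), (40, "revise_answer"), (55, "submit")]
print(round(engagement_proxy(log), 2))  # rough score in [0, 1]
```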
A further challenge relates to the inherent tradeoffs in selecting the key student characteristics and behaviors that should be used to implement personalization. Variable selection requires care to ensure that measures are reliable and appropriate, so that personalization can be implemented along the dimensions that are most pertinent to students' needs. However, this challenge also inspires new research opportunities, particularly ones that focus on students who have been historically underrepresented in both research and educational technology, to determine what characteristics and behaviors are most relevant for different student groups. Research that is more inclusive and aware of the diverse experiences that students bring to personalized digital assessment and learning experiences can support effective variable selection. Open learner modeling approaches (Bull and Kay, 2016; Bull, 2020; Zapata-Rivera, 2020) introduce an opportunity to further refine CAs while building user trust by giving teachers and students the chance to inspect and reflect on the ILM, highlighting where the model and its interpretations should be revised or qualified. Developing the infrastructure needed to collect variables, classify behaviors, deploy adaptations, and continually update a caring system requires computational modeling, machine learning, and artificial intelligence expertise to develop, test, and iterate on the learner models. ILMs can be leveraged toward effective decision cycles within the caring system that, for example, provide necessary supports, route students to appropriate versions of subsequent tasks, and provide tailored, asset-based feedback.
A related issue concerns teachers' perceptions of personalization and whether they prioritize mastery of content or embrace a more holistic view and a need to personalize based on a broader set of emotional, motivational, or cultural aspects. The effectiveness of CAs will rest on their ability to integrate with teacher practice by supporting students with different constellations of strengths and challenges, and by flagging for teachers the students who are most in need of additional attention and support. Again, this challenge offers an opportunity for new research that incorporates teachers into the research and development process so that the CAs brought into practice reflect teachers' perspectives and priorities.

Integrating cultural responsiveness into the CA framework introduces additional challenges. While personalization implies treating students as individuals, culturally situated perspectives emphasize how individual students are positioned as members of socially and historically defined racial, ethnic, and cultural groups (Gutiérrez and Rogoff, 2003). Such views acknowledge that groups are not monolithic and that individual students' identification with the racial, ethnic, and cultural contexts they experience also varies (Tatum, 2017). Adapting at the group level therefore requires acknowledging this individual variation as well as the potential for individuals to identify in ways that may (or may not) be congruent with demographic group membership. Demographics may also intersect in meaningful ways that impact students' lived experiences (Crenshaw, 1989). However, culture is embodied in participation in practices with shared meaning and significance (Lave and Wenger, 1991; Gutiérrez and Rogoff, 2003; Nasir et al., 2014). This implies that CA should enable student self-identification of demographic characteristics, cultural group memberships, and engagement in home and community practices (i.e., in terms of their funds of knowledge). Further research is needed to understand how the complexities of student identities interact and impact their learning experiences.
Intersections among students' cultural backgrounds, knowledge, and experiences might be leveraged to increase the relevance and responsiveness of assessments (Walkington and Bernacki, 2018). Meaningful co-design activities, in which the knowledge, interests, values, and experiences of students and teachers from historically marginalized groups are centered, celebrated, and prioritized, have the potential to result in more engaging, relevant, and valid assessments and would support more responsive personalized designs (O'Dwyer et al., 2023; Ober et al., 2023). Open learner models that can be interrogated and critiqued by students and teachers will be essential for a culturally responsive CA framework, ensuring that student profiles and ILMs do not reflect biases or stereotypes, that misclassifications are appropriately corrected, and that contextual factors are considered when interpreting students' performance. Continued partnerships with teachers and students are needed to maximize the benefits for learning through connections to students' funds of knowledge while minimizing unintended consequences.

Discussion
The CA framework can be leveraged toward personalized and culturally responsive assessments designed to support K-12 teaching and learning. This article outlines the current state of CA research on student emotions, individual differences, and cultural contexts, and highlights key challenges and opportunities for future research. Critical issues for future research include the collection and handling of student data (characteristics, behavioral, multimodal) and associated privacy and security concerns; the selection of characteristics for learner modeling; teacher perceptions of personalization; individual variation and self-identification of students' cultural identities and contexts; and engaging students and teachers in co-design of personalized ILMs and responsive adaptations. Research that integrates these independent areas is needed to bring the CA conceptual framework into practice in personalized digital assessments.
Whether the primary aim is individual personalization or responsiveness to students' cultural contexts, it is imperative that researchers engage in deep, sustained co-design partnerships with teachers and students to ensure validity and utility for those most in need of support (Penuel, 2019). It is also important to consider the assessment context (e.g., formative vs. summative, group- vs. individual-level reporting) and the implications for measurement (e.g., comparability, scoring, interpretation) when determining how best to apply CA in practice. CA introduces opportunities to enhance students' assessment experiences and to advance the use of assessment outcomes to further individuals' educational opportunities and wellbeing (Bittencourt et al., 2023). However, effective design and implementation of personalized assessments is a complex endeavor, which may necessitate new processes for designing assessments (O'Dwyer et al., 2023). We invite other scholars to conduct research addressing these challenges, advancing the field's ability to provide personalized, culturally responsive assessments.