Edited by: Paul Seitlinger, University of Vienna, Austria
Reviewed by: Gerti Pishtari, Danube University Krems, Austria; Helena Macedo Reis, Federal University of Paraná, Brazil
This article was submitted to AI for Human Learning and Behavior Change, a section of the journal Frontiers in Artificial Intelligence
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Simulation-based training (SBT) programs are commonly employed by organizations to train individuals and teams for effective workplace cognitive and psychomotor skills in a broad range of applications. Distributed cognition has become a popular cognitive framework for the design and evaluation of these SBT environments, with structured methodologies such as
Modern workplaces require workers to develop and execute a complex combination of cognitive, metacognitive, and psychomotor skills to achieve effective performance. With advanced technologies now widely available, skill development can be made faster and more effective by designing training protocols that give learners multiple opportunities to practice, together with formative feedback that supports continual improvement and clear pathways to proficiency in their tasks. Simulation-based training (SBT) has become a popular paradigm to implement these training protocols. These environments provide safe and repeatable spaces for learners to practice and develop their workplace skills, and combined with adequate debrief and feedback they can support training in multiple domains (Ravert,
When SBT scenarios require collaboration and feedback among multiple agents (real and virtual), it is common to interpret the training scenarios and trainee performance using theories of
In parallel, other learning domains, such as K-12 classrooms, have seen a transformation in personalized learning through data-driven learner modeling and multimodal learning analytics (Hoppe,
Motivated by this gap, in this paper we develop a framework to apply a
As a demonstration of our framework for tracking and analyzing trainee behaviors and performance, we ran a small study with nursing students in this MRMB training environment. We developed and applied a mixed quantitative and qualitative approach to analyze the data collected with video, audio, and eye-tracking sensors. Our computational architecture processes the raw multimodal data streams and analyzes them within the constraints and insights derived from a qualitative analysis using the DiCoT distributed cognition approach. With the help of our cognitive task model, the results are mapped to a combined qualitative-quantitative representation of the nurses' problem-solving behaviors and performance. With continued development and refinement, results from our analysis methods can be provided to learners as formative feedback and to instructors to help them guide more detailed discussions during simulation debriefing.
The analysis presented in this paper supports an investigation of two primary research questions:
How can multimodal learning analysis be used to support a comprehensive analysis of distributed cognition in MRMB simulation training environments?
How do temporal alignment and analysis of multiple data modalities help us gain a deeper understanding of trainees' actions in the context of the tasks they are performing in an MRMB environment?
The rest of this paper is organized as follows. Section 2 presents previous work on SBT, the Distributed Cognition framework, and an overview of multimodal data analysis approaches applied to studying learner behaviors. Section 3 discusses our theoretical framing of the training scenarios and analysis by combining cognitive task modeling, distributed cognition through the DiCoT methodology, and multimodal data analytics. Section 4 provides details of the methods we have adopted in our study: an overview of the MRMB-based nurse training scenario; a Cognitive Task Analysis approach to interpreting and analyzing nurses' actions in the training environment and mapping them to higher-level cognitive behaviors; our adaptation of the DiCoT framework to study nurse performance and behaviors in the training scenarios; and a complete computational architecture to derive performance analysis from data collected in the SBT environment. Section 5 presents details of the analyses of the nurses' performance and behaviors in the case-study MRMB-based training environment. This is followed in Section 6 by a discussion of the results obtained for the two scenarios and their broader implications. Last, Section 7 provides the conclusions of the paper, limitations of the current approach, and directions for future work.
In this section, we briefly review past work in SBT, distributed cognition, and multimodal analytics applied to analyzing learners' training performance and behaviors.
Simulation-based environments offer many attractive properties for training applications; they provide controllable and repeatable environments in which learners and trainees can safely practice complex cognitive and psychomotor skills in rich and dynamic scenario representations. Thus, it is not surprising that simulation-based training has been widely adopted for a variety of domains, and many studies have shown it to be effective for both training and assessment (Ravert,
In addition, the integration of simulation environments with advanced computing resources has led to further advances in the field. Computer-based simulations allow for automated collection of trainee activity data, which can then be used to evaluate their performance, and for debriefing and after-action reviews (Ravert,
The overall goal of SBT is to help learners to develop a set of skills that are
Prior work has shown that providing formative feedback during debrief after the simulation improves both the application validity of the simulation and the competence and self-efficacy of the learners (Gegenfurtner et al.,
Traditionally, cognition is studied with the individual as the basic unit of analysis. In essence, this classical view treats the brain of an individual as a processing unit, which takes input from the outside world, manipulates this information, and produces some output, often in the form of body functions, such as movement and speech (Clark,
These limitations with classical cognition led some cognitive scientists, such as Clark, Hutchins, Cole, and others in the late twentieth century to begin developing alternative systems of examining cognition (Hutchins,
Hutchins argues that cognition occurs across at least three different modalities (Hutchins,
Distributed cognition is particularly relevant in analyzing training performance and behaviors in mixed-reality, simulation-based training. Mixed-reality SBT environments manifest many of the characteristics of these three distributed modalities. SBT inherently contains social structures and roles over which cognition is distributed. When multiple learners train simultaneously in the environment, the social distribution and interactions can be studied explicitly, with the learners collaborating and sharing the cognitive load and decision making processes in the task. Even in SBT cases with only one learner, there is a social distribution between the learner and the instructor, with information traveling and transforming between the instructor and student as they interact. SBT also contains instances of cognition distributed between internal and external structures. In mixed-reality scenarios, there is a distribution between the learners' minds, the physical space they inhabit, and the digital space with accompanying interfaces that are controlled by the simulation. In addition, many training domains require learners to learn and operate domain-specific tools, which also represent artifacts of distributed cognition. Finally, SBT is necessarily temporal, as learners practice skills that change (improve or degrade) over time. Thus, previous practice and previous actions will affect the ways in which learners approach current cognitive tasks.
Other studies which focus on nursing simulation-based training have also adopted distributed cognition for their analysis. Rybing et al. (
Despite the advantages of distributed cognition as a cognitive framework, application of the framework requires specific methodologies that are not outlined in the original work. Several structured qualitative analysis methodologies have been developed for analyzing distributed cognition in different domains and scenarios. For example, Wright et al. (
DiCoT is a qualitative analysis framework designed to analyze distributed cognition by breaking down a cognitive system into five independent themes: (1)
Illustration of the interactions between each of the five DiCoT themes and how they work together to construct the entire cognitive system.
In order to analyze each of the themes and their interactions, the DiCoT methodology defines several
The 18 principles of DiCoT analysis, summarized from Blandford and Furniss (
Principle | Description |
1. Space and cognition | The role space and spatial layout play in supporting cognition |
2. Perceptual principle | Spatial representations support cognition more than non-spatial representations, as long as there is a clear mapping between the space and that which the space represents |
3. Naturalness principle | Cognition is aided when the form of a representation matches the properties of what it represents |
4. Subtle bodily supports | Individuals often use their body to support cognitive processes |
5. Situation awareness | People need to be informed of and understand what has previously happened, what is currently going on, and what is planned |
6. Horizon of observation | The information that can be seen or heard by a person; closely related to and influencing situation awareness |
7. Arrangement of equipment | The layout of equipment affects what information people have access to, and thus their ability to process it |
8. Information movement | Information moves around a system in a number of ways, which all have unique functional consequences |
9. Information transformation | Information can be represented in many forms, and often must transform between these forms when moving and when being processed |
10. Information hubs | A central focus or source where different channels of information meet and are processed together |
11. Buffering | If incoming information interferes with ongoing activities, buffering allows the information to be held until an appropriate time where it will not interfere |
12. Communication bandwidth | Different modalities of communication often carry different amounts of information. For example, face-to-face communication offers more information than computer-mediated communication |
13. Informal communication | Not all communication is formal, and sometimes informal communication can carry very important information that is not otherwise passed |
14. Behavioral trigger factors | Groups of people can operate together without an overall plan by individually responding appropriately to certain local trigger factors |
15. Mediating artifacts | People often bring artifacts into coordination to support completion of a task |
16. Creating scaffolding | People often simplify their cognitive tasks by utilizing their environment |
17. Representation-goal parity | When an artifact is used to represent the system's goal, representations closer to the goal of the user are more powerful |
18. Coordination of resources | Different information structures can be coordinated to aid in cognition |
Learner modeling based on student performance and behavior has been the cornerstone for adapting and personalizing computer-based learning environments to individual learner needs. More recently, data-driven approaches to learner modeling based on learning analytics and machine learning methods have become popular for capturing and analyzing learner behaviors in complex instructional and training domains (Hoppe,
However, more recent work has begun to point out the potential limitations of these traditional methods. By only using logged data that is easy to collect, we may miss out on important context and interpretation that the information sources may provide. Therefore, we may require additional sensors to collect such data (Ochoa et al.,
Combining all of the modalities of operation (e.g., activities, communication, affective states, stress levels, and gaze) can lead to analyses that provide a more complete picture of the cognitive, psychomotor, and metacognitive processes of the learners (Blikstein and Worsley,
In our own previous work, we have applied MMLA methods to analyze teamwork behaviors in simulation-based training environments, including those that incorporate mixed-reality components (Biswas et al.,
Cognitive Task Analysis typically draws from multiple sources. This includes a review of relevant literature, interviews with domain experts, and observing and interpreting training activities in the mixed reality simulation environment in terms of their conceptual and procedural components. From this analysis, one can build a comprehensive task-subtask hierarchy that links high-level tasks and subtasks down to specific observable skills and activities performed by trainees (Biswas et al.,
By analyzing the observable multimodal data at the lowest levels of the hierarchy and propagating the results up to higher levels, we can generate inferences about cognitive activities and competencies of trainees. In this way, insights generated from cognitive task analysis combine top-down model-driven and bottom-up data-driven approaches. In previous work, we have applied cognitive task analysis methods to demonstrate how teamwork in mixed-reality SBTs can be evaluated using MMLA (Vatral et al.,
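The bottom-up propagation described above can be sketched as a small tree structure. This is a minimal illustration under stated assumptions, not the implementation used in this work: the `TaskNode` class, the task names, the counts, and the rule that a parent's evidence is the sum of its children's counts are all hypothetical choices made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """One node in a hypothetical task-subtask hierarchy."""
    name: str
    children: list[TaskNode] = field(default_factory=list)
    observed_count: int = 0  # only used on leaf (directly observable) nodes

    def total_count(self) -> int:
        """Propagate leaf-level observations up to this node by summing."""
        if not self.children:
            return self.observed_count
        return sum(child.total_count() for child in self.children)


# Illustrative fragment of a hierarchy; names and counts are made up.
check_chart = TaskNode("look at patient chart", observed_count=3)
ask_patient = TaskNode("ask patient a question", observed_count=5)
info_gathering = TaskNode("information gathering", [check_chart, ask_patient])

stop_infusion = TaskNode("stop infusion", observed_count=1)
intervention = TaskNode("intervention", [stop_infusion])

root = TaskNode("patient care", [info_gathering, intervention])

print(info_gathering.name, info_gathering.total_count())
print(root.name, root.total_count())
```

A simple sum is only one possible aggregation rule; durations, proportions, or sequence patterns could be propagated through the same structure.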
Our goal in this work is to present a framework for combining the benefits and insights from qualitative analysis of distributed cognition through the DiCoT methodology and quantitative analysis through data-driven multimodal analytics. Analysis using qualitative methods (Cognitive Task Analysis, DiCoT) provides domain semantics to inform how the quantitative analysis (MMLA) is performed, and in turn, results of the quantitative analysis provide new insights into the domain and the learner behaviors that inform a richer qualitative analysis. We believe that by presenting integrated qualitative and quantitative analyses that inform and shape one another, the strengths of each method can be amplified, thus providing deeper insights than either method individually and better feedback to learners, instructors, simulation designers, and researchers.
Our overall theoretical framework, illustrated in
The overall theoretical framework to combine qualitative DiCoT analysis with quantitative multimodal analytics for understanding learner behaviors in simulation-based training.
Next, we construct the hierarchical structure by breaking down the highest level cognitive, psychomotor, affective, and collaboration concepts into their more domain specific sub-components and sub-tasks using a progressive elaboration process. The primary reason for creating the different levels of abstraction is to ensure that variations of training scenarios, though they may differ in their lower-level task definitions and sub-divisions, map onto relevant higher level processes and help define proficiency measures in the task domain.
In more detail, primary tasks are decomposed into sub-tasks; sub-tasks are further decomposed into more fine-grained sub-tasks; and so on until we reach a set of basic task units that cannot be meaningfully decomposed further. We call this basic unit an
While the modeling of the domain is approached top-down, the interpretation of the learner actions and behaviors uses the model in a bottom-up manner, interpreting the multimodal data collected from the environment into lower level activities and behaviors. We employ a variety of multimodal analysis techniques to link from observable data back to the interpreted actions performed by the learners. This is illustrated by the arrow linking multimodal analytics (green) to the cognitive task model (blue) in
For example, in our nursing domain, the DiCoT analysis revealed that there are four meaningful semantic areas in the simulation space: left of the bed, right of the bed, foot of the bed, and outside the room (see Section 4.3.1). Thus, we can adopt this result from the qualitative analysis into the design of the quantitative algorithmic methods by using the video data to determine when the nurses move between each of these four semantic regions (see Section 4.4.1). As a second example, in our nursing domain, the DiCoT analysis revealed the various artifacts that are semantically important to information flow (see Section 4.3.2). We can adopt this result by using the eye-tracking gaze data and mapping the raw x-y gaze position data onto instances where the nurse is looking at each of the semantically important artifacts identified by the DiCoT analysis (see Section 4.4.4). In this way, we use the results of DiCoT analysis to create algorithmic models that convert raw data (e.g., video, audio, etc.) into action- and behavior-level interpretations.
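The two mappings in this example can be sketched in a few lines. The sketch below is illustrative only: the room coordinates, the left/right orientation, the pixel boxes, and the function names `classify_region` and `gaze_to_artifact` are assumptions, and fixed rectangular areas of interest are a simplification, since gaze recorded with head-mounted glasses is expressed in a frame that moves with the wearer.

```python
from typing import Optional

# Hypothetical floor-plan geometry in metres (not taken from the study).
BED_CENTER_X = 2.5   # x-coordinate of the bed's centre line
BED_FOOT_Y = 2.5     # y-coordinate where the bed ends
ROOM_Y_MAX = 5.0     # positions beyond this are outside the room


def classify_region(x: float, y: float) -> str:
    """Map a tracked floor position to one of the four semantic regions."""
    if y > ROOM_Y_MAX:
        return "outside_room"
    if y > BED_FOOT_Y:
        return "foot_of_bed"
    return "left_of_bed" if x < BED_CENTER_X else "right_of_bed"


# Hypothetical areas of interest as pixel boxes: (x_min, y_min, x_max, y_max).
AOIS = {
    "patient_chart": (0, 100, 300, 400),
    "vitals_monitor": (900, 100, 1200, 400),
    "patient": (400, 200, 800, 700),
}


def gaze_to_artifact(gx: float, gy: float) -> Optional[str]:
    """Return the artifact whose box contains the gaze point, if any."""
    for name, (x0, y0, x1, y1) in AOIS.items():
        if x0 <= gx <= x1 and y0 <= gy <= y1:
            return name
    return None


print(classify_region(1.0, 1.0))
print(gaze_to_artifact(100, 200))
```

Both functions turn continuous sensor values into the discrete labels that the qualitative analysis identified as meaningful, which is the step that lets later analysis work at the level of actions and behaviors.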
Once we convert from the raw data to the action- and behavior-level interpretations, they are mapped onto a common
We systematize this interpretation process by once again introducing results from the qualitative DiCoT analysis of the task environment, as illustrated by the arrow linking distributed cognition (yellow) to the cognitive task model (blue) in
While this restaurant scenario analysis represents a simplistic example, it demonstrates the second way in which DiCoT is important for adding semantic context to our computational analysis. First, DiCoT informs the design of algorithms and models to convert raw data to action-level interpretations. Second, DiCoT provides context-specific disambiguation when mapping lower-level actions and sub-tasks onto high-level tasks and sub-tasks. Through iterative analysis, we can propagate learners' activities up to the highest levels of the task model to understand their cognitive, psychomotor, affective, and collaborative behaviors.
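Context-specific disambiguation can be illustrated with a small rule-based sketch. The action names, the task labels, and the single rule used here (the same glance at the vitals monitor counts as information gathering before any intervention and as evaluation afterwards) are hypothetical and chosen only to show the idea; an actual mapping would be derived from the qualitative analysis.

```python
def interpret_action(action: str, intervention_started: bool) -> str:
    """Assign a low-level action to a higher-level task using context."""
    if action == "look_at_vitals_monitor":
        # Same observable action, different meaning depending on context.
        return "evaluation" if intervention_started else "information_gathering"
    if action == "stop_infusion":
        return "intervention"
    return "unknown"


def label_sequence(actions: list[str]) -> list[str]:
    """Label actions in order, tracking whether an intervention has begun."""
    labels = []
    intervention_started = False
    for action in actions:
        label = interpret_action(action, intervention_started)
        labels.append(label)
        if label == "intervention":
            intervention_started = True
    return labels


actions = ["look_at_vitals_monitor", "stop_infusion", "look_at_vitals_monitor"]
print(label_sequence(actions))
```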
By presenting the learners and instructors with quantitative metrics and qualitative descriptions of learner activities at multiple levels of the task model hierarchy, we can provide a basis for further discussion at different levels of detail during simulation debrief, while also tracking progress and changes in learner behavior over time. In addition, the results generated from this computational analysis also provide additional insights back into semantic models of the domain and inform a richer qualitative (DiCoT) analysis and task model construction. This idea is illustrated by the cyclic link from the cognitive task model (blue) to distributed cognition (yellow) in
In this section, we demonstrate application of our theoretical framework to a small case study of nurses training in an MRMB environment. We begin with a complete description of the case study, including description of the affordances of the simulation environment and all of the data that was collected for the analysis. After this, we show how each of the three components of our theoretical framework apply to interpreting and analyzing nurses' activities and behaviors in this domain. First, we explain the construction and structure of the complete cognitive task model, from the high-level abstract cognitive tasks down to specific actions and observable data. Second, we describe a DiCoT analysis of the training environment, explaining each of the five themes in depth. Finally, we present a computational architecture, based on multimodal analysis, which tracks the raw multimodal data collected from the training environment through the cognitive task model to generate inferences, analytics, and performance metrics that describe the nurses' training behaviors within the context of the distributed cognition system.
Following the description of each component of the theoretical framework applied to the case study, we demonstrate the processes of following the collected data through the framework to generate inferences about nurse behaviors.
The approach presented in this paper is supported by a case study that analyzes student nurses training in an MRMB environment. The training took place in a simulated hospital room, which was equipped with standard medical equipment and monitors for information display and communication of the provider's orders. The patient was represented by a high-fidelity manikin that was exhibiting distress symptoms and a deteriorating health state. The simulated hospital room is displayed in
Layout of the simulated hospital room from three viewpoints: the head camera (top-left), foot camera (bottom-left), and an abstract map representation (right).
In more detail, the patient manikin is a SimMan 3G advanced patient simulator from Laerdal Healthcare that supports hands-on deliberate practice, development of decision-making skills, and improved communication and teamwork among learners (Laerdal Medical,
In addition to these presets created prior to training, an instructor in a control room can modify the patient state in real-time by interacting through the LLEAP software. The instructors watch the simulation from behind a one-way glass partition, allowing them to observe the nurses' activities, conversations, and interventions. Then, based on the nurses' specific actions (or lack of actions), the instructor makes real-time modifications to the simulation on the LLEAP software. The instructor can also talk as the patient through a microphone in the control room, which can be heard through speakers on the manikin. In the current study, which represented an early training exercise in the nursing curriculum, the instructor was closely involved in the progression of the simulation and manikin.
Three groups of eight nursing students participated in the study over 2 days, taking turns playing their assigned roles in each scenario. The primary participant in each instance of the simulation was a nurse performing a routine assessment on a hospital patient, and discovering a condition that required immediate attention and additional interventions. After diagnosing the patient's condition and performing any relevant immediate stabilization, the nurse was required to call the patient's assigned medical provider to confirm an intervention that would alleviate the patient's newly discovered condition.
Students in the group who were not actively participating in a given run of the scenario watched from a live camera feed in a separate debriefing room. After each scenario was completed, the instructors and the participants joined the full group in the debriefing room, and the instructor guided a discussion-based debriefing of the simulation. Each instance of the simulation took between 5 and 20 min, and parameters of the patient's condition were changed between each run to ensure the next set of students did not come into the scenario with full knowledge of the condition and the required intervention.
All students who participated in the study provided their informed consent. With this consent, we collected data using multiple sensors: (1) video data from two overhead cameras that captured the physical movement and activities of all agents in the room (nurses, providers, and the manikin); (2) audio data, also from the camera videos, that captured the nurse's dialog with the patient and the provider; and (3) the simulation log files that tracked all of the patient's vital signs and data from the sensors on the manikin. In addition, a few students, who provided a second informed consent for the collection of eye-tracking data, wore eye-tracking glasses that allowed us to record their gaze as they worked through the simulated scenario. The study was approved by the Vanderbilt University Institutional Review Board (IRB).
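Once events have been extracted from each of these sources, they can be placed on a single timeline. The following is a minimal sketch under strong assumptions: every stream is already sorted and time-stamped against a shared clock (in practice, the recordings would first need to be synchronized), and the `Event` type and all example events are hypothetical.

```python
import heapq
from typing import NamedTuple


class Event(NamedTuple):
    t: float        # seconds since scenario start (assumed shared clock)
    modality: str   # which data source produced the event
    label: str      # interpreted action or observation


# Hypothetical, already-sorted event streams, one per modality.
video = [Event(2.0, "video", "enters room"),
         Event(30.5, "video", "moves to right of bed")]
audio = [Event(5.2, "audio", "greets patient")]
gaze = [Event(12.1, "gaze", "looks at patient chart"),
        Event(31.0, "gaze", "looks at vitals monitor")]

# heapq.merge lazily interleaves sorted inputs into one sorted sequence.
timeline = list(heapq.merge(video, audio, gaze, key=lambda e: e.t))

for event in timeline:
    print(f"{event.t:6.1f}s  {event.modality:<5}  {event.label}")
```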
In this paper, we chose two of the recorded scenarios for our case study, in both of which the primary participant wore the eye tracking glasses. In the first scenario (S1), the fictitious patient, Patrice Davis, is receiving an infusion of blood after a bowel resection surgery the night prior. The patient calls the nurse, stating that she is not feeling well. The goal for this training scenario is for the nurse to assess the patient and diagnose that the wrong blood type is being administered to the patient. The intervention requires the nurse to stop the current infusion and call the provider to discuss further treatment. The primary participant in S1 was a 23-year-old female nursing student.
In the second scenario (S2), the same fictitious patient, sometime later in the day, again calls the nurse complaining of pain in the right leg, stating that yesterday “it wasn't bothering me that much but today the pain is worse.” The goal of this training exercise is for the nurse to assess the patient and diagnose a potential deep-vein thrombosis (blood clot) in her right leg. The intervention requires the nurse to call the provider for updated treatment orders and to schedule medical imaging for the patient. The primary participant in S2 was a 24-year-old female nursing student.
Using the cognitive task analysis methods previously described, we generated a comprehensive task hierarchy for the nurse training domain. This hierarchy is illustrated in
Cognitive task model for the nursing simulation domain.
Information gathering represents the processes nurses apply to retrieve new information and monitor ongoing concepts. This process can be further characterized as either
Assessment represents the processes used to synthesize gathered information in order to construct and evaluate specific solutions and interventions. In addition, we further decompose assessment into intervention
During evaluation, similar processes are applied to synthesize information, but this time with a further emphasis placed on monitoring the progress of patient health over time. Temporally, the evaluation phase typically takes place after the nurse has already intervened in some way, and serves as a method to verify that progress toward the intervention goals is being achieved. The evaluation results in one of two possibilities depending on whether progress is made: either continue the intervention further or stop the intervention and re-assess to establish a new plan.
Intervention represents the actions and processes that nurses take in service of a specific goal related to patient health. These interventions are characterized as either
In the treatment phase of intervention, the nurses' actions are in service of fixing underlying health issues that could cause danger to the patient's health in the future. For example, a nurse might start administration of chemotherapy drugs for a cancer patient. In this case, the medication is not designed to help immediate symptoms, but is rather part of a longer-term plan to fix the underlying condition and put the cancer into remission. During the treatment phase, nurses start or continue an existing treatment order if they are aware of the patient's condition and a provider has prescribed the treatment. If the nurse finds a new condition in the patient, they contact a provider to follow up and get a new treatment order.
As discussed, the DiCoT framework, with its five themes of (1) physical layout, (2) artifacts and environment, (3) social structures, (4) information flow, and (5) temporal evolution, provides a qualitative framework for analyzing learner activities in the training environment. Results from this qualitative analysis then provide the basis for analyzing the multimodal data and inferring nurse activity and behavior information with supporting context.
Physical layout by her position on the left and right sides of the bed;
Artifacts and environment by her physical interactions with the available instrumentation and patient manikin;
Social roles by her verbal communication with the patient manikin;
Information flow by her referencing of the patient chart monitor; and
Temporal evolution by following the sequence of her actions in the environment over time.
Example of the distributed cognition in the context of nurse training across the physical layout, artifacts, and social themes.
Using the five themes and 18 principles (see
The complete layout of the room from three viewpoints can be seen in
The overall physical layout covered in the simulation environment mimics the layout of a typical hospital room, where the trained nurses apply their learned skills on real patients (P3, P17). In the center of the room along the back wall is the patient bed, where the manikin is placed. To the right of the bed is a computer monitor that displays the vital signs of the patient as graphs (P2, P7). The default graphs and other vital displays are large enough so that the nurse can see them from any position in the room (P5, P6, P7), but the nurse can physically interact with the monitor to test certain vital signs and to get more information when she is on the right side of the bed (P1, P5, P6, P7). To the left of the bed is a second computer monitor that displays the patient's chart. The information on this chart is displayed in smaller text font, so the nurse has to be close to the screen to read information and needs to scroll on the screen to view all of the information. In other words, the nurse must move to the left side of the bed to access this chart (P1, P5, P6, P7). Past the foot of the bed, the room opens into a larger area that contains a cart of medical supplies that may be needed to perform clinical procedures (e.g., gloves, masks, needles, tubing, etc.) (P7). Finally, outside of the room is a medication dispensary; the nurses must leave the room and walk to the dispensary to retrieve needed patient medications (P5, P6, P7).
Given the physical arrangement of the room, we divided the physical space of the simulation into four regions that nurses may move between: (1) left of the bed, (2) right of the bed, (3) foot of the bed, and (4) outside the room. As discussed, each of these regions has available equipment and information that the nurses can use to accomplish their goals. Therefore, they may have to move between the regions to achieve specific goals. At the right side of the bed, nurses can perform clinical procedures, such as taking vital signs or interacting with other stationary equipment (e.g., IV pump, oxygen unit). These clinical procedures are components of the
At the left side of the bed, nurses can primarily perform
The foot of the bed acts as a transition area for high-level cognitive tasks and lower-level sub-tasks. The training nurses enter the room through this area and establish their current goals, their observation (P6), and their situational awareness (P7) in relation to the patient in the room. The nurses pass through this region when moving from the left side of the bed to the right (and vice-versa), while gathering information and making decisions on what clinical procedures to perform (P1). They often pick up equipment from the cart along the way (P7). Nurses also have to pass through the foot of the bed to visit the medication cart, or otherwise exit the room. When doing so, the foot of the bed provides a final moment of situation awareness before their horizon of observation shifts and they are no longer directly viewing the room (P6, P7).
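Given a sequence of time-stamped region labels, movement between the regions can be summarized as dwell times and transition counts. The sketch below assumes such labels are already available; the sample data, the one-second sampling interval, and the function name `summarize_movement` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (time in seconds, region) samples for one nurse.
samples = [
    (0.0, "foot_of_bed"), (1.0, "foot_of_bed"), (2.0, "left_of_bed"),
    (3.0, "left_of_bed"), (4.0, "left_of_bed"), (5.0, "foot_of_bed"),
    (6.0, "right_of_bed"), (7.0, "right_of_bed"),
]


def summarize_movement(samples):
    """Return (dwell seconds per region, counts of region-to-region moves)."""
    dwell = defaultdict(float)
    transitions = Counter()
    # Pair each sample with its successor; time is credited to the
    # region the nurse occupied at the start of each interval.
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        dwell[r0] += t1 - t0
        if r1 != r0:
            transitions[(r0, r1)] += 1
    return dict(dwell), dict(transitions)


dwell, transitions = summarize_movement(samples)
print(dwell)
print(transitions)
```

In this toy sequence every move between the two sides of the bed passes through the foot of the bed, which matches its described role as a transition area.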
Within the simulation environment, the actors, in particular the nurses, utilize a variety of artifacts to support their training activities that are outlined in our task model. The first set of artifacts comes primarily in the form of medical equipment; some of them appear in
Another important artifact in the simulation is the script, which is a set of guidelines set by the instructor about the unfolding of events in the scenario. The script outlines the initial conditions (e.g., the patient's condition, expected vitals at start), as well as a set of behavioral triggers (P14) for how the scenario should evolve given the potential actions (or lack of actions) performed by the nurse. For example, the script might specify that if the nurse does not begin infusing medication within 3 min after the scenario begins, the patient's blood pressure will drop. These script triggers mediate the temporal evolution of the simulation (P15, see Section 4.3.5).
The manikin, representing the human patient, is another important artifact for the simulation. It provides an interface for the instructor to construct and guide the evolution of the scenario. The manikin is programmable; therefore, the instructor can digitally set parameters for the patient manikin (e.g., vital signs, movements, and conversations), which are then physically enacted by the manikin system (P13). During dialogue between the nurse and patient, the instructor speaks as the patient through the manikin offering additional information to the nurses (P10), as well as instructional scaffolding (P16), when needed. For example, if the nurse fails to take the patient's temperature, the instructor might scaffold this behavior by making a remark through the patient, such as “I also feel a chill,” which may prompt the nurse to check for a fever by taking the patient's temperature. These dialogue acts can also be used by the instructor to evaluate the nurse's understanding and thought processes. For example, consider the dialogue sequence from S1 shown in
Sample dialogue from S1 demonstrating evaluation of the nurse.
1 | Nurse: | I'm going to stop this infusion really quickly. |
2 | Patient: | Why? |
3 | Nurse: | Because when we give red blood cells, an indication that you're having a reaction to it is low back pain and feeling itchy. So it sounds like you're having a reaction to the blood transfusion. |
Within SBT, there are three main types of users (Rybing,
In our nursing case-study, each instance of the simulation has three basic users: two students and the instructor. The students act as learners in the simulation, one taking the role of the primary nurse and one taking the role of the medical provider. The instructor takes a dual role as both the teacher as well as a confederate playing the part of the patient. The patient is enacted through the manikin mediating artifact described in the previous section.
The primary goal for the nurse training in the MRMB simulation is to collect sufficient information about the patient (i.e., the
The first information source is the primary nurse, who typically provides information in the form of clinical knowledge that is previously learned during schooling and from prior experiences. This clinical knowledge includes
Declarative knowledge, e.g., what is the nominal range for blood pressure?
Procedural knowledge, e.g., how does one measure blood pressure accurately?
Inferred associations using prior knowledge and observed information, e.g., given that the measured blood pressure is greater than normal, does it explain the conditions that the patient is experiencing?
Diagnostic inferences, e.g., What may be the cause(s)?
It is important to note that the above is considered to be prior information, and not included as an element of
Next, the patient's electronic medical record (EMR), also known as the patient's chart, is an information source containing a comprehensive history of the patient's prior symptoms, conditions, and treatments. The chart acts primarily as an information hub (P10), which allows the nurse to quickly reference the patient's history in a comprehensive way. However, it also plays the role of a mediating artifact (P15), since the chart is generally divided into sections allowing the nurse to access the relevant historical information related to the current diagnosis task. Additionally, since the chart contains notes from previous nurse shifts and the treatment being currently administered to the patient, it also helps the nurse trainee to better analyze the patient's trajectory and current condition, and use this to determine their goals and the tasks they need to perform (P17).
Third, the nurse is able to perform clinical procedures on the manikin and gather information about the patient's health conditions. These clinical procedures take a variety of forms, but the most common is collecting and characterizing the patient's vital signs. Nurses make use of the clinical equipment as mediating artifacts (P15) to make measurements on the patient and assess their condition. The mediating artifacts transform measurements into textual and graphical information for easier interpretation by nurses and other providers (P9). The output information is aggregated and displayed on the vitals monitor (see Section 4.3.1), which then acts as an information hub (P10). Other clinical procedures can also be performed by the nurses as needed. For example, if a patient is having pain in one of their legs, as in S2, the nurse might perform a physical examination of the patient's leg to gain more information.
Finally, social interactions between the nurse and the patient provide important information that is not measured directly. The instructor speaks through the patient to provide some of this information to the nurse(s). This information often provides elaborations of the patient's symptoms and additional symptoms that are not directly measured. For example, the patient might describe the location, severity, and history of their pain. These social interactions represent the
As the simulated scenario evolves, information primarily flows from the four information sources described above to the nurse (P8), who then processes the information (P9, P18) and acts on it (P14). When nurses enter the room, they generally begin with a brief interaction with the patient, and this results in information transfer about the patient's general conditions and symptoms from the patient to the nurses. This typically provides an initial baseline for the nurses to check for additional symptoms and start making diagnostic inferences (P13). Thus, it is a component of the
Next, the nurses typically take some time to reference and review the chart, synthesizing the information that they just heard with the patient history before returning to a more extended dialogue with the patient to extract more specific information to support diagnostic inferences. The nurses may ask the patient a series of questions that combine what they saw in the chart with their clinical knowledge (P14). This discussion is typically followed up by one or more clinical procedures, such as taking vital signs and performing physical examinations. This cycle of discussion with the patient followed by clinical procedures can then be repeated as necessary until the nurse reaches some form of conclusion about the patient's condition. At a higher level, this can also be thought of as a cycle between the
Up to this point in the simulation, nearly all of the information has been flowing in from the other information sources in the environment to the nurses (P8). However, once nurses collect sufficient information to reach a conclusion, the process reverses and the synthesized information and resulting conclusions are provided back to the rest of the system through their resulting actions. Common actions at this point include explaining the situation to the patient, starting and stopping certain treatments (e.g., medications), and calling the medical provider to give an update and request updated treatment. These actions and the general flow of information from the nurse to the environment are an enactment of the
The simulation evolves over time in one of two possible ways: through nurse actions or nurse inaction. The instructor has a script artifact which outlines a set of behavioral triggers that detail how the scenario should change (P14). Most of this script deals with triggers due to nurse inaction. For example, the script might dictate that if the nurse does not start medication within 5 min of the scenario starting, then the patient's heart rate begins to climb steadily. On the other hand, scenario changes due to nurse actions are primarily dictated by medical and social responses based on the judgement of the instructor (P3). The nurses gather information to evaluate the situation. Then, based on their evaluations, the nurses intervene to alleviate the patient's conditions. Based on that intervention (or lack thereof), the instructor modifies the scenario. If the intervention was correct, then the patient improves and the simulation ends, but if the intervention was incorrect, then the instructor may cause the patient's health to decline further, and the nurse must re-evaluate the presented information and try a new intervention strategy. The temporal evolution of the simulation is built primarily along this cycle of information gathering and intervention.
One of the primary goals of this work is to show how quantitative data can enhance the qualitative DiCoT analysis and to integrate this analysis with a task modeling framework to better analyze and interpret learner behaviors. To do this, we create a computational framework that takes the raw data collected from the different sensors, maps it onto specific features derived from the DiCoT analysis, and then interprets them using the task hierarchy. In our case study, we perform analysis on two raw data sources,
Overhead video cameras, and
Eye tracking glasses.
These map onto four feature modalities that form the basis of our analyses: (1) position, (2) action, (3) speech, and (4) gaze. The complete computational framework is illustrated as a block-diagram in
The overall computational architecture used for the quantitative analysis.
From a combination of the four feature modalities, we construct a complete progression of activities and events on a
The nurse's positions in the simulated hospital room are derived using visual object motion tracking techniques applied to the video from the two overhead cameras. Our motion tracking techniques are derived from the tracking-by-detection paradigm, which is a two-stage approach to tracking Sun et al. (
In our case studies, we use the matching cascade algorithm originally developed in Wojke et al. (
However, these motion tracking techniques only produce a track of the nurses in reference to the video frame. We need to map these tracks into the nurses' positions in the hospital room as we have described in the physical layout theme of our DiCoT framework. To accomplish this, we extend our traditional motion tracking techniques to project the camera-space motion tracks onto a top-down map representation of the environment (see
Our approach for mapping these camera-space tracks onto this hospital room space computes a planar homography, which associates known points in the camera-space to known points in the map-space using rotation, translation, and scaling operators. Given the computed homography matrix, we can project the camera-space tracks onto the room-space for each frame of video, using the center of the person's bounding box as the projected point. This results in a continuous time-series of nurse positions in the simulation room relative to the top-down map. Further details of this map-projection object tracking can be found in Vatral et al. (
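The projection step described above can be illustrated with a minimal sketch. It is not the implementation used in the study: it assumes NumPy, estimates the homography with a standard direct linear transform from four or more point correspondences, and uses hypothetical function names and coordinates (`estimate_homography`, `project_point`, `bbox_center`, and the example pixel and map points are all illustrative assumptions).

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate a 3x3 planar homography from >= 4 point pairs
    using the direct linear transform (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)
    # H (up to scale) is the right singular vector of A associated
    # with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)

def bbox_center(x_min, y_min, x_max, y_max):
    """Center of a person's bounding box in camera (pixel) space."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def project_point(H, point):
    """Map a camera-space point to map-space with homography H."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    return (u / w, v / w)

# Hypothetical calibration: four floor points seen in the camera frame
# (pixels) and their known locations on the top-down map (meters).
camera_pts = [(100, 50), (540, 60), (620, 420), (30, 400)]
map_pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
H = estimate_homography(camera_pts, map_pts)

# Project the center of one (hypothetical) detection for one frame.
center = bbox_center(300, 150, 360, 330)
map_position = project_point(H, center)
```

Applying `project_point` to the bounding-box center of every frame yields the continuous map-space track. Because the homography is defined only up to scale, the sketch divides by the third homogeneous coordinate at projection time instead of normalizing the matrix.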
While the continuous time-series of nurse positions in the hospital room is a useful analysis tool, on its own, it lacks the semantic context necessary for meaningful insights. To add this semantic context back to the position data, we discretize the continuous positions into four regions developed using DiCoT analysis (see Section 4.3.1): (1) left of the bed, (2) right of the bed, (3) foot of the bed, and (4) outside the room. To perform this discretization, we define a polygonal region on the top-down map of the hospital room for each of the DiCoT semantic regions. Then, for each timestamp of the continuous track, we check the polygonal region that contains the nurse's position and assign that label to the given timestamp. This allows us to track the time intervals that nurses spend in each semantic region of the room, as well as when they transition between these regions.
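The discretization can be sketched as a point-in-polygon test followed by a pass that collapses per-timestamp labels into intervals. This is an illustrative sketch only: the polygon coordinates in `REGIONS` are hypothetical, the ray-casting test is one common choice and not necessarily the one used in the study, and treating any position outside the in-room polygons as "outside the room" is an assumption.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point is inside the polygon,
    given as a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical map-space polygons (meters) for the in-room regions.
REGIONS = {
    "left of the bed": [(0, 0), (2, 0), (2, 3), (0, 3)],
    "right of the bed": [(4, 0), (6, 0), (6, 3), (4, 3)],
    "foot of the bed": [(0, 3), (6, 3), (6, 5), (0, 5)],
}

def label_position(point, regions=REGIONS, default="outside the room"):
    """Return the name of the first region containing the point."""
    for name, polygon in regions.items():
        if point_in_polygon(point, polygon):
            return name
    return default

def to_intervals(timestamps, labels):
    """Collapse per-timestamp labels into (label, start, end) intervals."""
    intervals = []
    for t, label in zip(timestamps, labels):
        if intervals and intervals[-1][0] == label:
            intervals[-1] = (label, intervals[-1][1], t)
        else:
            intervals.append((label, t, t))
    return intervals

timestamps = [0, 1, 2, 3]
track = [(1, 1), (1, 2), (3, 4), (5, 1)]
labels = [label_position(p) for p in track]
intervals = to_intervals(timestamps, labels)
```

Each boundary between consecutive intervals marks a transition between semantic regions.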
In addition to providing position information, analysis of the overhead camera video also provides important information and context for the actions that the nurse performs in the training scenario. Specifically for this case study, we annotate instances in the video where the nurse performs an action by physically interacting with any of the artifacts in the MRMB environment previously identified from the DiCoT analysis.
Additional contextual information can be derived by combining the physical activity that defines an action with other modalities. For example, analyzing speech (see Section 4.4.3) may provide additional information about why a nurse is performing a specific action, or how two nurses are coordinating their actions, for example, when they are jointly performing a procedure. Similarly, a coding of the nurses' gaze (see Section 4.4.4) may provide additional information about how a nurse is performing an action. In some situations, the nurse may look at the same object that they are physically interacting with; in other situations, the nurse may look at a different object than the one they are physically interacting with. As an example, while physically examining a patient, a nurse may turn their gaze to the vitals monitor to see how their current measurement may match with other vital signs (e.g., blood pressure being measured and heart rate of the patient). These examples clearly illustrate the importance of combining information across modalities for action annotation to gain a complete understanding of the nurses' activities in the training environment.
To perform action annotation, we have developed a coding schema based on the artifacts from the DiCoT analysis, which represents all of the high-level objects that nurses physically interact with during the simulation. These objects are primarily medical equipment, e.g., the patient chart, the vitals monitor, and the IV unit. They also include specific parts of the patient that are relevant for physical examination in these scenarios, e.g., the patient's hands, legs, body, and head. In total, we coded nurse actions into 13 categories for the two scenarios in our case studies, which can be seen on the timelines for each scenario (
Raw speech is collected from multiple streams that include the audio from the two overhead cameras, and each of the Tobii eye tracking glasses. For this case study, we only analyzed audio from the overhead camera at the head of the bed. In future work, particularly during simulations with a greater focus on teamwork, we intend to analyze audio by creating an egocentric framework for each agent in the training scenario.
While raw recorded speech patterns are useful for some tasks (e.g., emotion detection), most NLP tasks perform analysis directly on a body of text, which requires raw audio to first be transcribed as a preprocessing task. For the current case study, we utilized the Otter.ai speech transcription service (Otter.ai,
An example of dialogue from scenario 1 which has been annotated using the developed tagging schema.
Additionally, it is important to note that there are transcription errors in
Gaze data is collected using Tobii Glasses 3. The glasses record multiple raw data streams including egocentric-view video, audio, eye gaze (2D and 3D), and inertial measurement unit (IMU) data (Tobii Pro,
Given the high sampling rate and noise present in eye-tracking data,
The final preprocessing step is to encode the fixation data into area of interest (AOI) sequences. Linking fixations to AOIs bridges the gap between direct sensory output and domain-specific content, thus providing further insight into the nurses' attention and engagement. The temporal evolution of nurses' visual attention is represented by AOI sequences. In this study, fixations are manually assigned to 11 objects of interest (OOI) that were selected based on the DiCoT analysis (patient, provider, screen chart, paper chart, vitals, medical tray, equipment, keyboard, instructor, one-way mirror, ground). Each of these physical objects is treated as an AOI and is annotated using the egocentric video. The manual tagging is performed through visual inspection of the egocentric video with fixation data overlaid. In each case where the nurse fixates on one of the AOIs, the start and end times of the fixation are recorded. An example of the gaze overlaid on the video is shown in
An example of fixation overlay from Scenario 2 used for manual annotation. In this frame, the resulting AOI is “patient”.
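Once fixations have been manually tagged with start time, end time, and AOI, turning them into an AOI sequence is a simple bookkeeping step. The sketch below is a hedged illustration, not the study's tooling: the record layout `(aoi, start, end)`, the function names, and the example values are assumptions. It merges consecutive fixations on the same AOI into a single dwell and counts transitions between AOIs.

```python
from collections import Counter

def aoi_sequence(fixations):
    """Merge consecutive fixations on the same AOI into dwells.

    fixations: iterable of (aoi, start, end) records in seconds.
    Returns a list of (aoi, start, end) dwells in temporal order.
    """
    dwells = []
    for aoi, start, end in sorted(fixations, key=lambda f: f[1]):
        if dwells and dwells[-1][0] == aoi:
            # Extend the current dwell on the same AOI.
            dwells[-1] = (aoi, dwells[-1][1], end)
        else:
            dwells.append((aoi, start, end))
    return dwells

def transition_counts(dwells):
    """Count how often gaze moves from one AOI to the next."""
    labels = [d[0] for d in dwells]
    return Counter(zip(labels, labels[1:]))

# Hypothetical annotated fixations.
fixations = [
    ("patient", 0.0, 0.4),
    ("patient", 0.5, 0.9),
    ("vitals", 1.0, 1.6),
    ("patient", 1.7, 2.0),
]
dwells = aoi_sequence(fixations)
transitions = transition_counts(dwells)
```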
The alignment and processing of multiple data modalities reveals new inferences about the simulation and the nurse's behaviors. In this section, we analyze and interpret these integrated multimodal timelines (
The complete timeline of events for scenario S1 containing annotated data from participant position, action, gaze, and speech.
The complete timeline of events for scenario S2 containing annotated data from participant position, action, gaze, and speech.
For scenario S1, the timeline breaks down into approximately five high-level segments. The first segment follows the general
Once the nurse decides she has enough information to build her initial mental model of this patient's situation, the simulation enters the second phase. This transition is marked by the nurse moving from the left side to the right side of the bed, as seen in the position modality around 80 s into the scenario. As previously shown from the DiCoT analysis, this movement between regions in the room is an important indicator of task transitions. During this new segment, the nurse moves to the
In this segment, we also see a reduction in dialogue, which likely has two causes. First, specific to this scenario, much of the information that can be obtained from the patient has already been gathered in the previous segment. Second, the cognitive load associated with performing clinical procedures (e.g., when taking a blood pressure reading) is likely higher than simply reading the patient chart. Because of this, the nurse may focus more on the clinical task at the expense of continuing conversations with the patient. This is especially true for novice trainee nurses who are still learning how to perform clinical procedures in correct and effective ways. Knowing that these clinical tasks require higher cognitive loads and having observed from the control room that the nurse reduced her dialogue, the instructor likely also intentionally reduced their conversations with the nurse during this period. The instructor may have spoken less through the patient while these tasks were being performed to avoid splitting the nurse's attention, conforming to the best practices during SBT (Fraser et al.,
Around 220 s into the scenario, our video analysis shows that the nurse begins to interact with the blood pump, which implies a transition to the third segment of her overall task. According to the DiCoT analysis, the blood pump is a mediating artifact, not an information source. Given this additional context, we can conclude that the nurse has reached the end of her diagnostic information gathering phase and has begun the
Specifically, in this segment the intervention represents the stabilization process, which requires the nurse to stop the blood infusion and prevent any further damage to the patient's health because of the infusion of the incorrect blood type. At the start of this segment, as our video analysis shows, the nurse stops the infusion, but the speech modality also records an
It is quite reasonable for a patient to ask questions about their condition and the treatments being administered in a real hospital setting. The instructor plays this role as the confederate. Indirectly, some of the questioning by the patient (i.e., the instructor as the confederate) also serves as an evaluation of the nurse, who must explain her reasoning. This sort of evaluation question arises from the instructor's role as the teacher, rather than the confederate. Since the instructor is playing both social roles, this discourse interaction may fulfill multiple pedagogical roles in the simulation scenario, i.e., how the nurse conveys diagnostic information to the patient to reassure them, and how the nurse has combined all of her observations to make diagnostic inferences. In this same time interval where the nurse interacts with the blood pump and verbally explains what she is doing, we also see her gaze move between the equipment (the blood pump) and the patient, which is likely part of the social dynamics when interacting with a patient. The nurse should not ignore the patient while performing clinical procedures, which is exemplified here as the nurse shifting her gaze between the patient and the blood pump.
Once the stopping of the infusion is observed in our video analysis, the fourth segment of the simulation begins, with the transition marked again by the nurse's movement; this time the movement is from the right side of the bed back to the left side. This segment maps onto the
During this same period, the speech also shows a few
In scenario 2, the timeline breaks down into four high-level segments. Once again, the first segment represents the
This initial movement and speech is then followed by a long period of attention strictly on the chart monitor, as seen in both the actions and gaze, as well as the absence of any dialogue. As shown in the information flow DiCoT analysis, this chart monitor is a significant information hub in the room and further supports this segment as information gathering. The absence of dialogue here is also particularly interesting when compared to the nurse in scenario 1, who tended to multi-task dialogue with the patient while reading the chart. However, here we see a different information gathering strategy of first spending devoted time to the chart, followed by a shorter period of
During this question and answer period, the nurse's position moves quickly between the foot of the bed, the right of the bed, and back to the left of the bed, with her gaze also moving rapidly between pieces of equipment and other artifacts in the room. On their own, the purpose of these rapid movement and gaze changes is unclear; however, given that they occur while the dialogue is primarily question and answer, which is an information gathering task, it is likely that the movement and gaze are also related to information gathering. While the nurse is using dialogue to gather information about the patient during this period, she is simultaneously gathering information about the available equipment and physical layout of the room using her movement and gaze.
At this point, the second segment of the simulation begins, marked by the nurse moving back to the left side of the bed and her gaze now stabilizing back on the patient and chart, around 140 s into the scenario. Like scenario 1, the second segment represents
The nurse begins examining the patient's left leg for a short period of time, while asking the patient whether certain areas that the nurse touches are tender. This is derived from our analysis of the nurse's actions, which shows physical interaction with the patient's leg, along with speech analysis which shows sequential
After a few more moments of examination and dialogue with the patient, the patient finally speaks up and says, “It's my other leg that hurts.” At this point, the nurse quickly moves over to examine the right leg, as shown in the action data. There are several interesting points about this interaction. First, the patient's dialogue is another manifestation of the dual social role of the instructor. The instructor is acting as the patient in this moment, but is also providing some instructional scaffolding, e.g., that the nurse needs to examine the other leg. By inhabiting this dual social role, the instructor can seamlessly introduce the instructional scaffolding into the simulation scenario by speaking through the patient.
Second, by combining data modalities, we gain a much deeper understanding of the nurse's activities in the training scenario. Because we have the eye gaze information and see that the nurse looks back at the chart, we interpret that the nurse realizes that there is an issue before being corrected by the patient. Pedagogically, this is important because it shows a level of metacognitive awareness in the nurse which we may not have realized otherwise. The nurse looks back at the chart to recheck her diagnostic hypothesis because of the conflicting information she has received that the patient's leg does not hurt to the touch. Without this gaze information, we may have surmised that the nurse had gone down a wrong path and would need to be corrected on her diagnostic hypothesis. However, her looking back to study the chart and asking the patient questions made us realize through the analyses that she was reconsidering her current diagnostic hypothesis.
After examining the right leg, the training scenario transitioned into the third segment, marked by the movement of the nurse from the right side of the bed where she was examining the leg back to the left side of the bed. This movement, around 220 s into the scenario, again highlights the physical layout theme of the DiCoT analysis. In this segment, the nurse began the
Shortly after the phone call, the scenario transitioned into the fourth segment, marked by the entry of the provider into the room and the nurse moving to the foot of the bed, around 290 s into the scenario. In this segment, the dialogue shows a series of sequential
In this section, we combine the analysis across both scenarios to demonstrate how the collected data supports the DiCoT analysis presented previously. For this analysis, we will focus on the three primary DiCoT themes which are typically analyzed: physical layout, information flow, and artifacts and environment. We will examine each of the three DiCoT themes individually and how the data-driven evidence supports the major conclusions from that theme.
To support the comparison between the contextually different scenarios, we computed a series of marginal and conditional distributions of the four data modalities.
Distribution of gaze across five major object categories conditioned on the nurse's position in the room for each scenario.
Distribution of total speech acts conditioned on the nurse's position in the room for each scenario.
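As a hedged illustration of how a conditional distribution such as gaze given position can be computed from interval annotations, the sketch below weights each AOI by the time its gaze intervals overlap with each position interval and then normalizes per region. Whether the study weighted by duration or by fixation count is not stated here, so duration weighting is an assumption, as are the function name and the example intervals.

```python
from collections import defaultdict

def conditional_distribution(position_intervals, gaze_intervals):
    """Estimate P(gaze AOI | position region) from interval annotations.

    Both inputs are lists of (label, start, end) in seconds. Each AOI is
    weighted by the duration it overlaps with each position interval.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for region, p_start, p_end in position_intervals:
        for aoi, g_start, g_end in gaze_intervals:
            overlap = min(p_end, g_end) - max(p_start, g_start)
            if overlap > 0:
                totals[region][aoi] += overlap
    distribution = {}
    for region, per_aoi in totals.items():
        region_total = sum(per_aoi.values())
        distribution[region] = {
            aoi: duration / region_total for aoi, duration in per_aoi.items()
        }
    return distribution

# Hypothetical annotations, not data from the study.
positions = [("left", 0.0, 10.0), ("right", 10.0, 20.0)]
gazes = [("chart", 0.0, 6.0), ("patient", 6.0, 12.0), ("vitals", 12.0, 20.0)]
dist = conditional_distribution(positions, gazes)
```

The same function applies to other modality pairs, e.g., speech acts conditioned on position, by passing the corresponding interval lists. A marginal distribution follows by treating the whole scenario as a single position interval.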
Beginning with the physical layout theme, a wealth of data supports the roles that space and physical layout play in the nurses' cognition. The timeline analysis shows that both nurses exhibit similar patterns in their movement through the physical space. Each nurse begins by entering the room through the door at the foot of the bed and immediately moving to the left side. The nurses stay on the left side to gather initial information from the chart and conversation with the patient before moving to the right side of the bed to begin their diagnostic clinical procedures. While the specifics of information gathering and clinical procedures differ between the two scenarios, the general movement patterns and associated tasks in these areas of the room remain very similar.
Support for the roles of these spaces can also be seen through the conditional distributions of gaze in
For scenario 2, while the difference in gaze for the chart monitor was fairly small, changing from 22.8% on the left down to 20.0% on the right, the difference in gaze for the vitals monitor was still quite large, with 10.5% when on the left and jumping to 40.0% when on the right. These differences between the left and right sides of the bed were also supported by the speech analysis. As shown in
For the information flow theme, data from the nurse gaze provided significant support for three primary information sources described in the DiCoT analysis: the chart, the patient, and the vitals monitor. Examining
Marginal distribution of nurse gaze across five major object categories for each scenario.
The nurses used these three sources to gather, aggregate, and synthesize information which may have been relevant to the patient's diagnosis and treatment. The timeline analysis also supports the information flow theme, demonstrating the transition from information flowing to the nurse to information flowing from the nurse. In both scenarios, the first two timeline segments involve the nurse gathering information. In the first segment, this information came primarily from reading the patient chart and conversation with the patient. In the second segment, the information came primarily from the nurse performing clinical activities.
At this point, the information flow in both scenarios reversed, with the nurses now becoming the information source and the patients and provider becoming the information recipients. Once the nurses had transformed and synthesized the gathered information, they reported their diagnostic inferences, thereby becoming an information source. In both scenarios, the nurse first provided information on her conclusions to the patient, explaining the diagnosis and how they arrived at that conclusion. Then the nurse provided information to the medical provider, first in the form of general patient information over the phone, and then in the form of explaining the diagnosis once the provider arrived in the room.
Moving on to the artifacts and environment theme, the gaze data again clearly supported the use of medical equipment as the primary mediating artifact. As seen in
Beyond these mediating artifacts, the data also supports the use of several artifacts as information hubs, specifically the chart and vitals monitors. As seen in
Overall, the patterns and distributions derived from our analysis framework demonstrate the effectiveness of our approach in combining qualitative DiCoT analysis with multimodal analytics and the task model to analyze and interpret learner activities and behaviors in the MRMB training simulation. Specifically, this study shows the benefits of our cyclic analysis, with insights generated from both passes of the framework. The forward pass uses the qualitative analysis to define and structure the quantitative analysis. The backward pass uses the results of the quantitative analysis to provide a more detailed analysis of the learners' activities and behaviors than we could generate by the purely qualitative analysis proposed by the DiCoT framework. The more in-depth information generated by multimodal analysis benefits the two primary stakeholders: (1) learners and instructors through debriefing and after-action reviews, and (2) simulation designers and researchers, who can study the effectiveness of the simulation scripts in promoting effective learning activities. In this section, we discuss the implications of the framework and its resulting insights for both of these groups.
The primary goal of any simulation-based training environment is for the trainees to learn, practice, and develop expertise in skills that transfer to the real task environment. In our nurse case-study, this means that the nurses develop new knowledge and experience that supports both the psychomotor skills and cognitive and metacognitive processes. One of the critical components that mediates this knowledge gain, especially for novice learners, is effective feedback mechanisms during simulation debrief (see Section 2.1). It is the analysis of nurse performance and the generation of relevant feedback linked to the performance, where our current work is most likely to impact learners and their instructors in constructive ways. By using our analysis framework to generate evaluations of learner behavior, we can present these insights back to learners and instructors during debriefing (also known as after-action reviews) to help promote constructive discussion among the trainees and instructor as part of a larger formative feedback system.
This paper represents an initial step toward analyzing learner performance and behaviors, and then generating formative feedback, and as a result, this case-study analysis was performed
While this is only one simple example, it demonstrates the underlying concept: analytics generated using our activity analysis framework can be presented back to learners and instructors to help promote meaningful discussion, especially around topics that may otherwise be difficult to identify in a single viewing of the scenario. The design of formative feedback that is actionable and important for discussion is a large research question in itself (Jørnø and Gynther,
Because of the cyclic nature of our analysis framework, the insights generated from our analysis and future analytics methods can be used to help refine the qualitative models of the simulation system. This is of particular importance and interest to simulation designers and researchers, as it uncovers new insights to improve our understanding of both the given simulation system and the science of simulation-based training as a whole.
For example, the multimodal data analysis permits the discovery of latent relations between different aspects of the distributed cognition system. In our nursing case study, this is exemplified through the use of information hubs, where the distribution of gaze conditioned on position reveals new insights. Initially, the DiCoT analysis revealed the role of physical space as a mediator in the collection and analysis of the information provided on the two screens that act as information hubs (i.e., the patient chart and the vitals monitor). By combining the physical, artifact, and information flow segments of the DiCoT analysis, we derived how the use of each screen was largely mediated by the nurse's position on the left side of the bed near the patient chart, or on the right side of the bed near the vitals monitor. As described in Section 5.3, we see support for this analysis in the conditional gaze distribution, with fixations on the vitals screen rising from 2.8% to 18.5% and from 10.5% to 20% when moving from the left to the right of the bed in scenarios 1 and 2, respectively.
Based on this initial DiCoT analysis, we would also expect fixations on the patient chart to show the opposite relationship, decreasing significantly when moving from the left to the right of the bed. In our case studies, however, the fixations on the patient chart decreased substantially only in S1, moving from 25% on the left to 1.9% on the right, while in S2 they decreased very slightly, from 22.8% on the left to 20% on the right. While it is clear that physical layout mediates the use of these information hubs, the data also suggest that an additional latent mediating factor is present. We hypothesize that differences in the simulation scenarios contributed to this, with S2 requiring more references to the patient chart than S1, probably because of the incorrect diagnostic hypothesis the nurse initially made, but there are also other potential explanations, such as differences in the strategies adopted by the two nurses.
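To make the notion of a gaze distribution conditioned on position concrete, the following is a minimal sketch of how such proportions could be computed from annotated fixation records. It assumes that each fixation has been labeled with the nurse's position and a gaze target and that fixations are simply counted; the record format, the label names, and the function name `conditional_gaze_distribution` are hypothetical, and the toy values are illustrative only and are not the study data.

```python
from collections import Counter, defaultdict

# Hypothetical annotated fixation records: (nurse position, gaze target).
# Illustrative values only; these are NOT the data analyzed in the study.
fixations = [
    ("left", "patient_chart"),
    ("left", "patient"),
    ("left", "patient_chart"),
    ("left", "vitals_monitor"),
    ("right", "vitals_monitor"),
    ("right", "patient"),
    ("right", "vitals_monitor"),
    ("right", "patient_chart"),
]


def conditional_gaze_distribution(records):
    """Return P(gaze target | position) as nested dicts of proportions.

    Assumption: every fixation counts equally (no duration weighting).
    """
    counts = defaultdict(Counter)
    for position, target in records:
        counts[position][target] += 1
    return {
        position: {t: n / sum(c.values()) for t, n in c.items()}
        for position, c in counts.items()
    }


dist = conditional_gaze_distribution(fixations)
for position, targets in dist.items():
    for target, p in sorted(targets.items()):
        print(f"P({target} | {position}) = {p:.2f}")
```

Comparing the rows of such a table across positions and scenarios is the kind of check described above: a large shift in a target's share between the left and right sides is consistent with physical layout mediating its use, while a small shift points toward some other factor.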
This is a simple example of a new insight generated by the quantitative methods that can lead to additional research to refine the qualitative models, but it also demonstrates the overall idea of the cyclic model design. After using the system to analyze learner data, we gain new insights that can be given back to simulation designers and researchers to help formulate new research questions and supporting simulation studies. We can iteratively update our qualitative understanding of simulation based on learner data, leading to better analysis of the data, and subsequent learner feedback, in the future.
In this paper, we presented an analysis of a nurse simulation-based training environment using multimodal learning analytics, cognitive task analysis, and distributed cognition analysis using the DiCoT framework. We showed how the analysis of multimodal data from both qualitative and quantitative perspectives can be combined into a common framework for analyzing mixed-reality simulation-based training environments, such as the nursing case study analyzed here. While this work is still in its initial stages, the analysis methods developed and demonstrated here suggest great potential for combining qualitative distributed cognition analysis with multimodal quantitative analytics in order to generate a more complete understanding of SBT as a whole. The strengths of each method are amplified when they are used together, and such an integrated approach can help shed new light on simulation-based training and generate new insights.
However, this work and the framework it presents are not without limitations, and future work is required to address these concerns. One of the major limitations of the presented framework is its lack of guidance on the selection of adequate data sources and the design of the associated analysis techniques. Since relevant data sources and analysis techniques differ widely among SBT domains, it is difficult to create universal guidance on the selection and design of these components while also preserving the domain-generality of the presented framework. In addition, this study was limited by its sample size, analyzing only a small case study of two simulations. This small study size allowed us to focus carefully on the design of the framework and the specific features of the analysis, but it limits the argument for the generalizability of the framework and the analysis results.
Future work will expand our study, both to more data from the nurse training simulation domain and to a variety of other training domains. This expanded work will help to mitigate both of these limitations, as it will allow us to further validate the analysis methods across a wide variety of participants, as well as reveal commonalities among disparate training domains that can be used to generate guiding principles for the selection of adequate data sources and the design of the associated analysis techniques. In addition, these further studies will place an emphasis on capturing data related to collaborative and teamwork activities in these environments, helping to further develop the distributed cognition frameworks that ground our data analysis techniques.
To support these expanded studies, future work will also focus on replacing the manual annotation of data used in this study with automated AI and machine learning techniques. Specifically, manual annotation was used in this study for the action, speech, and gaze modalities. For actions, techniques from video activity/action recognition will be applied to automatically extract time segments where the nurse is performing relevant actions (Ghadiyaram et al.,
Finally, this study and its associated framework were limited in guiding the design of formative learner feedback mechanisms based on the analysis. While Section 6 discussed some of the implications of the framework and its analysis for learning and pedagogy, including the possibility of developing formative learner feedback to support discussion sessions using contrasting cases, the framework itself does not provide detailed guidance for designing learner feedback mechanisms. In addition, for this study specifically, analysis of the case-study data was performed
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
The studies involving human participants were reviewed and approved by Vanderbilt University Institutional Review Board. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.
GB is the principal investigator of the study and contributed to its initial conceptualization and development of the analysis framework. CV, ED, and CC were responsible for data collection, annotation, and curation. NM maintained the computational and data infrastructure. CV performed the primary data analysis and wrote the initial draft of the manuscript. All authors contributed to model development, interpretation of results, and manuscript revision, and approved the submitted version.
This work represents independent research supported in part by Army Research Laboratory Award W912CG2220001 and NSF Cyberlearning Award 2017000, as well as equipment and funding from the Vanderbilt LIVE initiative.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The views expressed in this paper do not necessarily reflect the position or policy of the United States Government or the National Science Foundation, and no official endorsement should be inferred.
The authors would like to thank Daniel Levin, Madison Lee, and Eric Hall for their instrumental contributions to the design of the study and the collection of data. In addition, the authors would like to thank Eric Hall, Jo Ellen Holt, and Mary Ann Jessee for providing their domain expertise during initial planning of the study and during the development of the models used for analysis in this paper. We would also like to thank all of the nursing students who participated in the study. Finally, we would like to thank the reviewers of this paper, whose feedback and guidance strengthened the paper.