Collaborative Learning Quality Classification Through Physiological Synchrony Recorded by Wearable Biosensors

Interpersonal physiological synchrony has been consistently found during collaborative tasks. However, few studies have applied synchrony to predict collaborative learning quality in real classroom. To explore the relationship between interpersonal physiological synchrony and collaborative learning activities, this study collected electrodermal activity (EDA) and heart rate (HR) during naturalistic class sessions and compared the physiological synchrony between independent task and group discussion task. The students were recruited from a renowned university in China. Since each student learn differently and not everyone prefers collaborative learning, participants were sorted into collaboration and independent dyads based on their collaborative behaviors before data analysis. The result showed that, during group discussions, high collaboration pairs produced significantly higher synchrony than low collaboration dyads (p = 0.010). Given the equivalent engagement level during independent and collaborative tasks, the difference of physiological synchrony between high and low collaboration dyads was triggered by collaboration quality. Building upon this result, the classification analysis was conducted, indicating that EDA synchrony can identify different levels of collaboration quality (AUC = 0.767 and p = 0.015).


INTRODUCTION AND RELATED LITERATURES
In a world that is deeply connected, collaborative learning is believed to be the most important way of learning, shared knowledge construction, decision-making, critical thinking, and problem solving (Bruffee, 1999;Dillenbourg, 1999;Van Kleef et al., 2010;Gokhale, 2012). Scholars and practitioners advocating collaborative learning believe that learning is inherently active, constructive, and social. Successful collaboration benefits the whole group by immersing the students in an active learning condition to increase engagement and joint attention, relearn through retrieval, negotiate multiple perspectives, increase working memory resources, to name a few (Johnson and Johnson, 1985;Barron, 2003;Roediger III and Karpicke, 2006;Kirschner et al., 2009;Kuhn and Crowell, 2011). Broader education goals, such as involvement, cooperation and teamwork, and civic responsibility, are also believed to be achieved by collaborative learning (Smith and MacGregor, 1992). Dillenbourg (1999) provided a general definition of collaborative learning as "a situation in which two or more people learn or attempt to learn something together. " In this definition, "learn something" was broadly interpreted as activities, including "follow a course, " "study course material, " "perform learning activities such as problem solving, " "learn from lifelong work practice, " and "together, " was interpreted as different forms of interaction. In fact, individual interaction is crucial in successful collaborative learning (Soller et al., 1998;Hiltz and Turoff, 2002;Kreijns et al., 2003), and thus serves as a key component of collaboration quality.
Given the importance of collaborative learning, the measurement of its quality is, however, very complex and challenging. Existing approaches can be categorized into subjective and objective measurements. Subjective measurement mainly relies on self-report data, including both interview (Salovaara and Järvelä, 2003;Sidani and Reese, 2018) and scales (Orchard et al., 2012; Organization for Economic Co-operation and Development (OECD), 2013); while objective measurement mainly relies on explicit observational data, which captures verbal communication and non-verbal behaviors (Odom and Ogawa, 1992;Marty and Carron, 2011;Mehl, 2017;Chi et al., 2018). The main defect of self-report data is the subject perspective that could be manipulated by the participants. For instance, the social desirability bias is a famous potential threat (Fisher, 1993;Holbrook and Krosnick, 2009).
Analyzing the verbal content of interactions is the most straightforward approach for analyzing the quality of interaction, in both face-to-face and computer supported contexts and has been commonly used in educational and psychological studies (Grau and Whitebread, 2012;Kent et al., 2016;Vuopala et al., 2016). The limitations of content analysis, include labor intensive and difficult, to provide instant feedback to either students or teachers. Thanks to the rapid development of computing capability and machine learning algorithm, non-verbal interactions, such as eye contacts, facial expression, and body movement, are also possible to be captured and analyzed (Montague et al., 2013;Tunçgenç and Cohen, 2018) and has been proved to be sufficient indicators of collaborative learning quality in face-to-face classroom or in online courses (Bronack, 2011;Al Tawil, 2019). These qualitative methodologies have shown rich effectiveness in the interpretation of human behaviors (Hsieh and Shannon, 2005;Elo and Kyngäs, 2008). However, there are still issues on trustworthiness of both the content analysis and the interpretation of explicit behaviors when using these methods alone (Elo et al., 2014). The implicit factors, such as emotional contagion, and affect infusion among individuals may be more crucial to cognition (Okon-Singer et al., 2015), but is far from being fully investigated due to challenges in measurement and data constraint (Fujiki et al., 2002).
As quite a few factors may affect the quality of collaboration, emotion is one of the most significant and moderates human behaviors in observable patterns (Balters and Steinert, 2017).
Emotion regulation abilities are highly related to the success of interpersonal interactions, especially when the individuals collaborate with peers or in the workplace (Lopes et al., 2005;Eligio et al., 2012).
Educational activities require intensive interpersonal interactions. Thus, emotion plays an important role in education, especially in collaborative tasks (Schutz and Pekrun, 2007;Järvenoja and Järvelä, 2009). The effect can be either positive or negative (Imai, 2010). Comparing to verbal content, emotional state is directly detectable though quantitative measurements. Building upon the theory on human automatic nerves system (ANS), the important components of collaborative learning, such as cognitive load and emotional state, are believed to be monitorable through neurophysiological signals including EEG, fNIRs, ECG, and EDA. Arousal and valence can be evoked and detected in specific situations (Agrafioti et al., 2011;Boucsein, 2012;Ramirez and Vamvakousis, 2012;Dawson et al., 2017). The effects of individual interactions on emotion can also be measured through multimodal physiological signals (Heaphy and Dutton, 2008;Mønster et al., 2016).
Neurophysiological signals have been considered as promising measurements of emotional characteristics and can capture students' learning process that go beyond acquisition of knowledge (Léger et al., 2014;Ochoa and Worsley, 2016). Positive evidences on the correlation between interpersonal neurophysiological synchrony and interaction are consistently reported in recent years. Using various hyperscanning technologies, inter-brain synchrony has been identified during face-to-face communication or interactive decision-making (Jiang et al., 2012;Dikker et al., 2017;Hu et al., 2018), suggesting special neural processes recruited by interaction. Interpersonal physiological synchrony has also been consistently found during collaborative tasks and used as indicators of effective collaboration. Higher level of synchrony is associated with better task performance and learning gains in collaborative tasks (Ahonen et al., 2016;Pijeira-Díaz et al., 2016;Dich et al., 2018).
However, brain hyperscanning devices are usually not easily to use in real classroom learning, because of people's concern on health safety issues of brain hyperscanning, and their visual interruption to both students and teachers. Data quality might also be a problem in naturalistic settings (Matusz et al., 2019). Physiological signal, on the other hand, can be easily and steadily recorded at distal sites, such as fingers and wrists (Boucsein, 2012), and thus easily accepted by parents and students. The majority of existing studies that measure physiological synchrony in collaborative learning are basically lab-based experiments (Pijeira-Díaz et al., 2016;Dich et al., 2018;Dindar et al., 2019). The tasks include open-ending problem-based learning topics, such as designing breakfast for marathoners, (Haataja et al., 2018) or pair-programing task with restricted solutions (Xie et al., 2018). Although a few of them collected data in real classroom (Ahonen et al., 2018), they only reported correlation between synchrony and collaboration, but did not further explore the practical potential of categorizing collaboration quality through synchrony in naturalistic scenarios. Naturalistic scenario, instead of laboratory setting, is crucial in the research of collaborative learning. First, it is social in nature and cannot be simulated in fully controlled, isolated environment. Second, it is constructive and targets at highlevel cognitive skills, such as problem-solving, knowledge construction, and collaboration, and cannot be substituted by simple cognitive tasks, which are frequently used in laboratory investigations. Third, even applying ethologically relevant stimuli in a laboratory context, participants may react differently in both behaviors and neurocognitive signals (de Heer et al., 2017;Qu et al., 2020). In the real classroom context, the interaction behavior varies across students, which is very different from laboratory settings, where participants will try their best to comply to the research design. In fact, naturalistic real-world research is believed to be necessary to understanding human behaviors in neuroscience (Matusz et al., 2019). Classroom learning can serve as an ideal semi-structured scenario to bridge the laboratory based research and the real world.
The rapid development of wearable biosensing technologies makes it possible to record neurophysiological signals during naturalistic classroom learning. Recently, researchers used portable EDA and EEG sensors in classroom to record the neurophysiological signals of teachers and students in variance of learning conditions including lectures, discussion, movie viewing, and real exam (Dikker et al., 2017;Poulsen et al., 2017;Zhang et al., 2018;Qu et al., 2020). By recording physiological signal in fully real-world collaborative learning, the present study attempts to apply physiological synchrony to predict interaction quality in real collaborative learning. Although collaborative activities can range from classroom discussions to team research that covers a whole semester or year, this study focuses on classroom discussion as it is the simplest and most general scenario among diverse collaborative learning approaches.
In the current study, the researchers collected physiological and behavioral data from naturalistic class sessions and analyzed interpersonal synchrony during individual and collaborative tasks. Students were naturally divided into two kinds of collaborative dyads (CDs) according to their different learning styles as captured by behaviors, i.e., CDs and IDs. The result showed a significant difference in interpersonal physiological synchrony between CDs and IDs, i.e., high and low interactive levels. Following classification analysis confirmed the potential of applying physiological synchrony as an indicator of collaboration quality. This finding is promising in future applications of evaluation in student learning style.

Participants
Participants were recruited from an undergraduate level elective course that requires no prerequisite domain knowledge. All participants were Chinese nationals and full-time university students. The 16 students who registered in this course were from 12 different departments and programs across natural sciences, pharmaceutical science, engineering, social sciences, and humanities. Data collection lasted for two class sessions in 2 consecutive weeks. Fifteen out of 16 students (M = 21.61, SD = 2.43, eight females) signed the informed consent form at the beginning of the first data collection session. One student quit at the second session due to health conditions. Sixteen students formed four three-people and one fourpeople discussion groups at the beginning of the semester, resulted in 16 dyads in the first data collection session. The grouping was simply decided by their seat location and most of them were strangers to each other at the before taking this course. In the second session, the grouping remained the same. There were 14 dyads since one person quit from a three-people group, making a total of 30 dyads. Dyad no. 15 was eliminated from all analyses and no. 13 and no. 15 were eliminated from analyses on collaborative learning, because one student in this group shared laptop screen with their teammate and discussed (pair no. 15) during the independent task (IT), violating the experiment requirement (Figure 1).

Experimental Tasks and Materials
Each course sessions had two main parts. In the second half of each class session, after the lecture, the instructor assigned an open-ended problem to the class. Students were required to solve the problem independently first (IT), followed immediately by a group collaborative discussion (interaction analyzed in pairs, PT) on the same problem. The problems in both steps were the same, except that participants were asked to solve the problem alone or with group members. In the first class session, the students were asked to review course materials and sort out a list of key knowledge by its significance, then discuss with their group members to forge a comprehensive agreement on the list. In the second class session, the students were asked to explore new approaches for engagement measurements alone and then discuss with their group members to form a comprehensive approach. The group would share their final solution to the whole class.
FIGURE 1 | Classroom setting and intra-group dyads. The gray figure did not participate in the study at all. The yellow figure and nodes were excluded from the first data collection session because of the violation of experiment requirement. The green figure quit the study in the second-round data collection session because of health conditions. Thus, a total of 28 dyads were used in the analyses.
A short survey was used to evaluate participants' engagement level and emotional state during IT and PT, respectively. The engagement level was self-reported by the participants with a 5-point Likert scale. The emotional state was measured with a five-scale Self-Assessment Manikin (SAM) to rate the affective dimensions of valence, arousal, and dominance (Bradley and Lang, 1994).

Settings and Apparatus
The settings followed the naturalistic class settings of this course. Each student had their own chair desk with rolling wheels. Skin conductance and heart rate (HR) was collected from each participant using the unobtrusive Huixin Psychorus wristband (Beijing Huixin Technology, 2021), capturing data at a sampling rate of 40 Hz for EDA and 1 Hz for HR. Each group was videotaped during both IT and collaborative discussion.

Procedures
Before the beginning of the first data collection session, researchers collected the signed informed consent and helped the students to wear the wristbands properly to ensure good data quality.
There was a 3-min close-eye baseline session and a 2-min open-eye baseline session before the IT. After the baseline sessions, all instructions were given by the instructor. The independent step last for 7-10 min and the group collaborative learning task last for 12-17 min. The short survey was collected immediately after IT and PT (Figure 2). Two to five minutes were cut from the beginning of the PT sessions to eliminate any continued effect from the IT sessions in data analysis. Students' own physiological data reports were provided to them after the data collection to appreciate their participation.

Physiological Data Preprocessing
Skin conductance was collected using galvanic skin response (GSR). GSR records the changes of the electrical activity on the skin. The more general name of the GSR is electrodermal activity also known as EDA . The term EDA will be used to refer to the skin conductance signal in the following part of the article. The visual inspection was performed to control the quality of the raw EDA signals. Samples of the EDA signal clips during independent and paired tasks were randomly selected for visual inspection and artifacts manual removal. The cleaned data were used for further calculations. The results of the new dataset were then compared with the results from the original data set. Since the difference between two datasets was not significant, manual artifacts removal was skipped for the whole sample to minimize the influence on the raw data. The signals were then smoothed using the Gaussian smoothing algorithm and was down sampled from 40 to 10 Hz in a MATLAB-based EDA analysis software (Ledalab 3.4.9; Benedek and Kaernbach, 2010). 1

Ground Truth and Procedure of Analysis
In naturalistic setting without artificial design, ground truth should first be defined before physiological data analysis. The ground truth is that students learn differently and not everyone prefers/suites collaborative learning. In the real class sessions, students acted in their own learning style, which refers to the stable trait which decides how learners perceive and respond to learning environments (Keefe, 1979). The Felder and Silverman mode categorizes students' learning style into five dimensions: active/reflective; sensing/intuitive; visual/verbal; sequential/global; and inductive/deductive (Felder and Silverman, 1988;Alfonseca et al., 2006). Active learners prefer to internalize information from external environments, and they are more likely to share opinions with peers frequently during collaboration; however, reflective learners tend to examine and process information by themselves (Felder and Silverman, 1988). That is, even though the students were equally engaged in the tasks, they may generate different perspectives and preferences for collaborative learning mode, and use different cognitive strategies during interaction (Cabrera et al., 1998;Kayes, 2005). The homogeneity of learning styles within a group will also affect the interactive effect, and the higher homogeneity group members may have better collaboration quality (Alfonseca et al., 2006).
Thus, participants were sorted into CDs and IDs based on their collaborative behaviors. Table 1 presents the collaboration events and their descriptions. CDs are expected to have higher quality interaction during collaborative task and generate higher physiological synchrony.
A series of hypothesis test were conducted to compare groups of participants along different dimensions. The tests were performed by t-test. Before performing the t-tests, tests of normality (Kolmogorov-Smirnov and Shapiro-Wilk) were applied to the EDA synchrony, engagement, and emotional state datasets for IT and PT, all the tests did not show significant departure from normality. Therefore, the EDA synchrony, engagement, and emotional state variables for IT and PT were normally distributed. Classification analysis was then applied to see if physiological synchrony can identify collaborative learning quality. Figure 3 indicates the logic of the whole analysis.

Define Collaborative and Independent Dyads
In the current study, two raters watched the videos of the group discussion and categorized the dyads into high or low collaboration style without being aware of the physiological data analysis. Effective interaction includes verbal and non-verbal interactions such as tight conversation, eye contact, and joint attention on course material. As discussed in the previous section, better collaboration behavior quality is strongly correlated with collaborative learning quality including mutual gaze and joint attention (Schneider and Pea, 2013).
The two raters recorded the time range of key interaction events in the video for each pair of students. For groups with three people, when the total time of interaction events exceeds 1/3 of the total time of PT, this pair is then categorized to CDs, otherwise to IDs. For the one group with four people, the threshold is set to 1/6 since there are six different pairs of students sharing the total interaction time. Fifteen dyads were sorted to CD and 14 dyads to ID.

Engagement and Emotional Statement
It is important to check the validity of experimental settings in naturalistic classroom. First, students' engagement levels should be the same across the two sessions to ensure that any identified differences are not due to engagement differences. Second, students' self-report on their subjective experience during IT and PT should be compared to check if the IT and PT did mean different learning strategies to them.
According to Figure 4, engagement difference was not found between IT and PT. The participants were equally and highly engaged in both the IT (M = 2.76, SD = 0.951) and group discussion task (PT, M = 3.00, SD = 1.035), t(28) = 1.565, p = 0.258. This result suggests that participants were equally and highly engaged in both IT and PT, making engagement less likely to be the possible confounding factor for the additional synchrony during collaboration.
Same analysis was also conducted on the three affective dimensions. During group discussions, participants were more aroused ( Higher score on arousal and valence indicated pleasant and excited discussion atmosphere during the collaborative learning FIGURE 3 | Participants were categorized into collaborative dyads (in red) and independent dyads (in blue) according to the behavioral indicators during collaborative learning task. A hypothesis test was performed to explore the relationship between collaboration behavior and physiological synchrony during collaborative learning. At last, the classification analysis was applied to test the effectiveness of physiological synchrony as an identifier of collaborative learning quality.

Collaboration events Description
Tight conversation Discussion on one focused point between two people for several rounds Eye contact Though not talking, the person communicates with their teammates by eye contact, expressing agreement and disagreement on the topic Joint attention Two people show joint attention on the third object such as course material, laptop screen, or the blackboard in the front of the classroom.
task. Lower score in dominance is reasonable during PT since the process of multi-personal discussion came with negotiation and compromise (Figure 4).

Physiological Synchrony
The algorithm for the computation of EDA synchrony was adopted from Marci and Orr (2006) and calculated the momentby-moment physiological concordance named as single session index (SSI). Same algorithm was also implemented on HR.
It should be noted that the EDA signal used in the analysis was the overall EDA instead of plain skin conductance level (SCL) or skin conductance responses (SCRs). The SCL represents the tonic level of electrical conductivity of the skin, relating to the slow and background change of EDA. The SCRs represent the phasic changes of electrical conductivity of the skin, reflecting the rapid and eventrelated changes of EDA (Braithwaite et al., 2013;Posada-Quintero and Chon, 2020). This study did not focus on the SCRs of the physiological signal and paid more attention on the overall changing trend of the EDA. But to keep the high ecological validity of this naturalistic experiment, the researchers chose to not eliminate the possible influence of SCRs for the authenticity of the study.
First, the 10 Hz signal was further down sampled by averaging the 10 numbers in each second. The moment-by-moment slope of the 1 Hz data for each signal was then calculated using a 5-s window with a regression model at a 1-s roll-rate. Next, Pearson correlations were conducted on the slope for each pair of data with a 15 s window rolling at the rate of 1 s, reflecting a moment-by-moment synchrony in the last 15 s. The SSI is an index that shows the synchrony over a time period instead of discrete time points. It is the natural logarithm of the ratio of the sum of positive correlation coefficients divided by the absolute value of the sum of negative correlation coefficients over a given period of time. A sample of each step of the signal processing and synchronization calculation for IT and PT is shown in Figure 5.
In the EDA-IT figure, (a) is the raw EDA signals of pair no. 10 in IT; (b) is the trajectory of slope for a 5-s window, rolling on the rate of 1 s; and (c) shows the moment-bymoment correlation coefficients on a 15-s window. The three panels are the same as in the EDA-PT figure. This is an example of the same pair in the collaborative task. The synchrony (SSI) was higher than that of in IT. The figures in the third and fourth row are examples of HR data.
The results showed in Figure 6 also showed that SSI was significantly lower for ID during PT, as compared to IT [t(27)

Physiological Synchrony as a Classifier of Collaborative Learning Quality
Since there is a strong correlation between the interaction level and EDA synchrony, a receiver operating characteristic (ROC) analysis was performed to test the accuracy of synchrony as a classifier of the collaborative learning behaviors in both IT and PT. Results showed that SSI is an acceptable indicator to identify interaction levels for collaborative task (AUC = 0.767, p = 0.015). Synchrony did not discriminate different collaboration styles during IT (AUC = 0.343, p = 0.15), which is good since there was no collaborative behaviors and no significant difference between CD and ID during IT. The results of IT and PT together verified the robustness of synchrony as the predictor for collaborative learning quality (see Figure 8).
Same analysis was also applied on HR data. As shown in Figure 9, the synchrony of HR exhibited low accuracy in classifying collaboration style (AUC = 0.454, p = 0.679) during PT. This is consistent with the low HR synchrony and undifferentiated HR synchronization level across collaborative learning quality.

CONCLUSION AND DISCUSSION
The aim of the present study is to explore the potentials of using physiological synchrony to classify collaboration quality in realistic educational settings, based on consistently identified synchrony during interpersonal interaction by previous studies. Existing studies show that learners are diverse in learning style and collaborative learning can manifest this diversity while students take different roles in the learning process (Smith and MacGregor, 1992;Pashler et al., 2008).
In the current study, the participants were categorized into CDs and IDs according to their natural behaviors in the collaborative learning tasks, and this behavioral difference significantly correlated with EDA synchrony between the pairs of participants. The results showed that participants who were categorized as CD during group discussions were associated with higher EDA synchrony. However, there was no significant difference between CD and ID in collaborative tasks in their HR synchrony.
One possible explanation for these inconsistent results in EDA and HR may have to do with the fact that talking affects one's cardiovascular system but not EDA. Talking, even without emotional expression, can increase the blood pressure of hypertension patients (Le Pailleur et al., 2001). Simple mental and verbal activities also affect HR variation through changes in respiratory frequency (Bernardi et al., 2000). On the other FIGURE 5 | A sample of electrodermal activity (EDA) and heart rate (HR) data processing in the EDA-IT figure, (a) is the raw EDA signals of pair No. 10 in IT; (b) is the trajectory of slope for a 5-second window, rolling on the rate of 1 second; (c) shows the moment-by-moment correlation coefficients on a 15-second window. The three panels are the same as in the EDA-PT figure. This is an example of the same pair in the collaborative task. The synchrony (SSI) was higher than that of in IT. hand, no evidence was found for the correlation between EDA and free talking (Fowles et al., 2000). In the natural group discussion context, students focused on the same task, trying to forge an integrated answer. Among CD, two students were tuning their emotional state during the discussion, resulted in higher synchrony in EDA. But when two people talk, they talk FIGURE 7 | HR synchronization did not show significant difference between ID and CD during IT and PT.
FIGURE 6 | Single session index (SSI) among CD is significantly higher than that in ID during collaborative learning tasks. ** p < 0.01.
in turns, not simultaneously, thus the asynchronous HR. Actually, when participants doing verbal and motor activities in unison, HR synchrony was significantly higher than during unsynchronized moments (Müller and Lindenberger, 2011;Noy et al., 2015). Therefore, even the cognitive and emotional elements that generated synchrony in EDA may also synchronize HR in cognitive tasks as reported in the laboratory based studies (Henning and Korbelak, 2005;Montague et al., 2014;Mitkidis et al., 2015), the effect could be mixed with that of talking on one's HR. While on the other hand, EDA synchrony was identified during unstructured conversation (Silver and Parente, 2004).
The result also showed that during PT, the EDA synchrony of ID was significantly lower when during IT. It seemed counterfactual on first thought but it could be reasonable if learning style was brought into consideration. Learners differ in the preference for collaborative learning (Cabrera et al., 1998). As a result, different people would choose different learning strategies. Independent learners may prefer to learn by themselves and process information in a more implicit way. When doing IT, this kind of learners can spend more cognitive resources on their task, thus two learners may show a moderate physiological synchrony as shown in Figure 6. But when they A B FIGURE 9 | The receiver operating characteristic (ROC) curve of HR for PT (A) and IT (B) using synchrony as predictor for collaborative learning quality.

A B
FIGURE 8 | The ROC curve of EDA for PT (A) and IT (B) using synchrony as classifier for collaborative learning quality.
were in collaborative learning context, they have to spare part of their cognitive resources to other people or the entire environment, or could be overwhelmed by the intense communication in the group. In this case, the ID participants may show an even lower physiological synchrony than during IT. Classification analysis proved that physiological synchrony may serve as a good indicator for interpersonal interaction quality. Higher physiological synchrony is positively correlated with higher interaction level. That is, higher frequency and longer time of interaction behaviors. Similar approach can be found in the research of predicting communication behavior using neural or physiological synchronization (Henning et al., 2009;Jiang et al., 2012). This application can help to identify different collaborative learning quality of the learners. It can also give instructors feedback on course content. One student may be attracted to one topic or interaction scheme but disinclined to another. In such case, physiological synchrony can provide clues in teaching adjustment.
Our findings provide evidence for the potential application of biosensors in the real-world classroom. We focus on the connection between the bio-signals and human behaviors on which we believe is the advantage of this interdisciplinary research area. This project also suggests that future researches in the same realm place attention to the scope of appropriate assumptions and research questions so that the laboratory-based experiments and naturalistic setting studies can be good complement for each other.
Students' immediate learning outcome was not evaluated as the tasks were open-ended class discussions. Next, we will choose class sessions that has planned quizzes as a measure of learning performance.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institute of Education, Tsinghua University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YL, TW, and YZ designed the experiments and drafted the manuscript. YL, TW, and KW carried out the fieldwork and collected and analyzed the data. All authors contributed to the article and approved the submitted version.