Happiness and high reliability develop affective trust in in-vehicle agents

The advancement of Conditionally Automated Vehicles (CAVs) requires research into the critical factors for achieving optimal interaction between drivers and vehicles. The present study investigated the impact of driver emotions and in-vehicle agent (IVA) reliability on drivers’ perceptions, trust, perceived workload, situation awareness (SA), and driving performance toward a Level 3 automated vehicle system. Two humanoid robots acted as the in-vehicle intelligent agents to guide and communicate with the drivers during the experiment. Forty-eight college students participated in the driving simulator study. Each participant completed a 12-min writing task to induce their designated emotion (happy, angry, or neutral) prior to the driving task. Their affective states were measured before the induction, after the induction, and after the experiment using an emotion assessment questionnaire. During the driving scenarios, IVAs informed the participants about five upcoming driving events, three of which asked the participants to take over control. Participants’ SA and takeover driving performance were measured during driving; in addition, participants reported their subjective judgment ratings, trust, and perceived workload (NASA-TLX) toward the Level 3 automated vehicle system after each driving scenario. The results suggested that there was an interaction between emotions and agent reliability that contributed to part of affective trust and to the jerk rate in takeover performance. Participants in the happy and high reliability condition showed higher affective trust and a lower jerk rate than participants with other emotions in the low reliability condition; however, no significant difference was found in cognitive trust or in other driving performance measures. We suggest that affective trust can be achieved only when both conditions are met: a happy driver emotion and high reliability. Happy participants also perceived more physical demand than angry and neutral participants. Our results indicated that trust depends on driver emotional states interacting with the reliability of the system, which suggests that future research and design should consider the impact of driver emotions and system reliability on automated vehicles.


Introduction
Conditionally Automated Vehicles (CAVs) are a developing technology that will greatly impact transportation in the future. With these systems, drivers will need to rely on the judgment of artificial intelligence to make the safest decisions possible, particularly at Level 3 automation and beyond where the vehicle has primary control (SAE, 2018). Trust, in the context of CAVs, is defined as the "attitude that an agent will help achieve an individual's goals in a situation characterized by uncertainty and vulnerability" (Lee and See, 2004). Trust is critically important to ensure that these systems are used properly and to their greatest extent. Previous studies have identified that the emotions of a user can have a significant influence on trust development (Dunn and Schweitzer, 2005) and on driving performance (Jeon et al., 2014a,b; Jeon, 2016; Sterkenburg and Jeon, 2020). However, few studies have investigated how these emotions influence trust in a CAV context. Cognitive appraisal determines how individuals evaluate emotional situations. According to Smith and Ellsworth (1985), emotions can be categorized by different patterns of cognitive appraisal; among these, the self-other responsibility control dimension significantly influences people's trust (Dunn and Schweitzer, 2005). For example, anger is high in other-responsibility control, as an angry person perceives other people to be responsible for unpleasant situations.
Emotions are also categorized along other dimensions, such as certainty and attentional activities (Smith and Ellsworth, 1985). Therefore, in the present study, we decided to induce happy, angry, or neutral (baseline) emotional states in the participants to observe the potential relationship between the cognitive appraisals of different emotions and trust toward automated vehicle systems.
The use of in-vehicle agents (IVAs) plays a critical role in communication with drivers in CAVs. These IVAs are "anthropomorphized intelligent systems that can interact with drivers using natural human language" (Lee and Jeon, 2022). Given current technology, these agents can vary in reliability, which, along with emotions, can have a strong impact on the effectiveness of driver-agent interaction (Lee and Jeon, 2022). To investigate the influence of the reliability of IVAs, we created two separate reliability levels based on the percentage of correctly presented information: high (100% reliability) and low (67% reliability) conditions. In the present study, we focused on the impact of drivers' emotions and IVAs' reliability level on drivers' situation awareness (SA), perceptions, trust, perceived workload, and driving performance toward a Level 3 automated vehicle system.

Related work

Emotion induction
Multiple methods of emotion induction have been used in previous studies. One of these was a priming task, defined as a method of manipulation in which individuals were asked to recall a time that they felt an emotion without providing further elaboration. This was determined by Dunn and Schweitzer (2005) to have no significant emotional effect. In the present study, the desired emotion was induced using an Autobiographical Emotional Memory Task (AEMT). As described by FakhrHosseini and Jeon (2017), in this method of induction, the participant writes for 12 min about one or more of their past experiences related to the emotion being induced (e.g., happy or angry). The participant is asked to immerse themselves in the memory of their experience, writing as clearly as possible. This type of writing task has been demonstrated to be an effective method of inducing the intended emotion and does not require the additional technologies needed for music or film induction methods (Mills and D'Mello, 2014).

Agent anthropomorphism
The anthropomorphism of a robotic agent, defined as "a process of inductive inference whereby people attribute to nonhumans distinctively human characteristics" (Waytz et al., 2014), has been shown to significantly influence user trust, particularly in automated vehicles. Drivers in a vehicle with humanlike characteristics (such as a name, gender, and a voice) had higher physiological trust than those in a non-anthropomorphic vehicle (Waytz et al., 2014). Physiological trust indicated the participants' level of relaxation in an accident scenario. This was determined through changes in heart rate, measured using electrocardiography, and startle, measured using a 0 to 10 scale by 42 independent raters. Self-reported trust, however, did not differ significantly (Waytz et al., 2014). Furthermore, in unavoidable accidents, automated vehicles were blamed significantly more than non-automated vehicles (Waytz et al., 2014). However, drivers placed significantly less blame on vehicles with anthropomorphic characteristics (Waytz et al., 2014). A previous study also showed that drivers preferred embodied agents over voice agents because they found the humanoid robots more likeable and warmer (Wang et al., 2021).
Beyond a humanlike appearance, changes in an agent's voice have also shown significant effects on driver trust and behavior. In both manual and automated driving, the dominance of the voice caused significant changes in situation awareness (SA). The SA of a manual driver increased when the agent had a more dominant voice; however, this was reversed in Level 3 automated driving, where a more submissive voice increased SA (Yoo et al., 2022). Additionally, drivers in an automated vehicle with a submissive voice demonstrated a higher level of trust and improvements in regulating angry emotions (Yoo et al., 2022). Agents with speech patterns designed to improve SA were effective in increasing the SA and performance of angry drivers. These SA speech patterns were suggestive/notification style prompts that would ask or comment about the driver's surroundings ("If you see any restaurant, let me know."). These agents were also viewed as more likable than those with directive/command style speech patterns designed to regulate emotions ("Forget your angry feelings. You are driving now.").
While our study did not investigate the interactions between anthropomorphism and the other variables, it is important to understand the effects that it might have. The present study involved two humanoid robots, NAO and Milo, playing the roles of IVAs that communicate with the drivers. Each IVA's name was told to the participants. Note that we did not manipulate the degree of embodiment of the two robots as a study variable; instead, the two robots were used to represent different levels of reliability, respectively. To ensure consistency and minimize the plausible robot effects, the mapping of reliability on each robot agent was counterbalanced across participants. Beyond elements designed to humanize an agent, there were also other factors that influenced user trust in the system. According to Koo et al. (2015), IVAs can inform the user of autonomous actions in different ways. This can be through a message that explains the context of why an action was taken, how the action will be accomplished, or a combined explanation of both. While the combined "how and why" message was the safest method in terms of driving behavior and steering control, it also created the highest level of anxiety out of the three due to a possible increase in cognitive load (Koo et al., 2015). "Why" only messages, on the other hand, created the highest trust and lowest anxiety levels (Koo et al., 2015). In situations where safety is not a critical issue, "why" only messages may be preferred to build an acceptable level of trust (Koo et al., 2015), which we also considered in the present study.
Another method of increasing trust lies in the time spent using the system. With both initially trusting and distrusting drivers, 10 min in the simulator experiencing the sounds, environment, and system of highly automated driving (HAD) before the study significantly increased the level of trust in HAD (Manchon et al., 2022). After this time, drivers were shown to monitor the road with fewer glances and with more time spent engaging in Non-Driving Related Activities (NDRA). Additionally, this had a greater effect on drivers who were initially distrusting the system, as their level of trust development increased significantly more.

Cognitive and affective trust
Developing trust between drivers and automated vehicles has been a challenge to researchers. In addition to using the Trust in Automation scale (Jian et al., 2000), we also sought to measure other types of trust toward a Level 3 automated vehicle system. According to McAllister (1995), cognitive (cognition-based) trust is defined as trust based on the knowledge and evidence of someone's ability and achievements; affective (affect-based) trust is defined as trust based on the emotional bond with someone. Cognitive and affective trust could be a considerable factor in improving workers' performance in cooperative organizations (Morrow et al., 2004; Johnson and Grayson, 2005) and could impact users' satisfaction and loyalty (Trif, 2013). Various automated driving research studies have investigated trust in automation, but none in terms of cognitive and affective trust. The present study used a trust scale including cognitive and affective trust (McAllister, 1995) to determine any relationship among drivers' emotions, reliability, and trust in automated driving systems.

The relationship between emotion and trust
The emotional state of an individual has a large influence on driving behavior. Jeon et al. (2014a,b) found that angry and happy drivers had a greater number of errors than drivers who were fearful or emotionally neutral. Angry drivers were also shown to have the lowest level of perceived safety, and happier drivers were shown to have the highest perceived workload. Happy people typically want to maintain their happiness (Isen, 1987; Wegener et al., 1995), but a challenging driving task might have served as an obstacle to it, which made them perceive relatively high workload.
In previous studies, emotion was shown to have a large effect on the levels of trust. According to Dunn and Schweitzer (2005), when the trustee was unfamiliar, happy individuals had significantly higher levels of trust in the trustee than sad individuals, and sad individuals had higher levels of trust in the trustee than angry individuals. If the participant was familiar with the trustee, emotions had no significant effect on trust in the trustee (Dunn and Schweitzer, 2005). Inversely, the level of trust in an automated vehicle was also shown to influence emotional state. According to a study conducted by Dixon et al. (2020), drivers who gained trust in the automated vehicle were significantly more likely to display a happy emotion. On the other hand, a decrease in trust was correlated with displays of an angry emotion.
Trust is a crucial predictor of people's willingness to engage with technologies (Plaks et al., 2022). Literature also shows that emotions are important factors influencing trust toward automated systems (Cho et al., 2015; Granatyr et al., 2017). Therefore, the knowledge of emotional effects on automation trust is a matter of critical importance in the design of trustable automated systems. Through the use of emotion induction, we specifically focused on the influence of emotional states on driver trust. The findings made by Dunn and Schweitzer (2005) are particularly comprehensive, factoring in both the role of familiarity of the trustee and the influence of personal emotion. Given that our agent may be considered "unfamiliar" to participants, we expect to see similar results.

The relationship between reliability and trust
A system's reliability could directly affect the user's trust in the system. Muir and Moray (1996) proposed that faith was the primary contributor to trust. However, a replication study conducted by Long et al. (2022) falsified this finding two decades later and showed reliability to be the best predictor of trust over faith. Given that reliability was shown to be a higher predictor of trust (Long et al., 2022), we believe that investigating the impact of reliability on trust will yield more significant results than the impact of faith on trust.
Reliability also impacts the driver's experience and decisions. In the study by Chancey et al. (2017), participants were more likely to comply with the system if they had higher trust in the system. Therefore, we expect that agents who are more reliable may lead to a higher level of trust from a participant. With regards to the emotional state of the driver, low reliability resulted in more negative emotions and high reliability resulted in more positive emotions (Fahim et al., 2021). In addition, a more reliable agent has been shown to reduce anxiety but was not shown to lower hostility or loneliness (Fahim et al., 2021). We expect these results to be similar to those found by Chancey et al. (2017), with a correlation between the level of trust and the reliability of the agent. However, with the addition of emotional states, there may be a significant interaction that changes these results. Given that this will likely be the first interaction with automated vehicles for many of our participants, we predicted that trust levels would improve throughout the study regardless of the reliability of the system. We minimized this by counterbalancing the order of reliable and unreliable agents between participants in the present study.

Research gap and unique contributions
Previous research focused heavily on the connection between trust and factors of reliability. However, the amount of research connecting emotion and trust is limited in the automated vehicle context. Additionally, there is a lack of information about the influences of emotion on both cognitive trust and affective trust. This paper intends to expand on these factors, as well as determine an interaction between emotion and reliability on trust. With the findings in this paper, we aim to provide contributions to future CAV designs, namely regarding the design and implementation of IVAs.

Frontiers in Psychology 04 frontiersin.org
In the current study, our objective was to investigate how drivers' emotional states and the reliability of IVAs influence driver response in a Level 3 automated vehicle. To this end, we had the following research questions:

Method

Participants
Forty-eight participants (33 male, 14 female, 1 non-binary) were recruited. Two additional participants were excluded from the study due to simulator sickness and were not counted in the total of 48. All participants were between the ages of 15 and 30, had an active driver's license, and had normal or corrected-to-normal vision and hearing. The average age was 21.1 (SD = 2.0). They had an average driving experience of 4.6 years (SD = 2.1), drove an average of 7.2 times a week (SD = 2.3), and drove an average of 7254.8 miles per year (SD = 10478.3).

Stimuli and equipment
Participants used a Nervtech driving simulator, which included a steering wheel, gas and brake pedals, an adjustable seat, and three visual displays providing a horizontal view of 120°. Between and during trials, a desktop computer and a tablet were used to complete a series of surveys. IVAs were represented using two programmable humanoid robots with WiFi and Bluetooth connectivity. The first was NAO (Figure 1; Height: 22.6 in) by SoftBank Robotics, and the second was Milo (Figure 1; Height: 24 in) by RoboKind. These robots were placed next to the participant for the duration of each trial, as displayed in Figure 1 (right), and were used to communicate driving and vehicle information.
IVAs' speech clips were created through the Amazon Polly TTS (text-to-speech) service. Agent reliability was counterbalanced across participants to reduce the effects of robot appearance. For half of the participants, Milo represented a highly reliable agent and NAO represented an unreliable agent. This was reversed for the other half. SCANeR Studio, developed by AV Simulation, was used to develop driving scenarios. The computer used for this software contained an i7-8086K CPU and an Nvidia GTX 1080 graphics card.

Experimental design
The study used a 2 × 3 mixed factorial design with reliability as a within-subjects variable (high vs. low reliability) and emotion as a between-subjects variable (happy, angry, and neutral emotions). Sixteen participants experienced the happy emotion condition (13 male, 2 female, 1 non-binary), 16 experienced the angry emotion condition (10 male, 6 female), and 16 experienced the neutral emotion condition (10 male, 6 female). The reliability levels of IVAs were defined by the accuracy of information provided by the IVAs for each takeover event. Table 1 includes the scripts or instructions of the agent's speech for each takeover event during each driving scenario. Each agent's instruction was divided into three distinct pieces of information regarding the takeover event. In the high reliability condition, IVAs provided drivers with information with 100% reliability; however, in the low reliability condition, IVAs provided drivers with information with 67% reliability (one out of three pieces of information was wrong). Each participant experienced both reliability conditions, in fully counterbalanced order, with three optional takeover events per reliability condition. Optional takeover events allowed the drivers to choose whether to follow the agent's instruction at each takeover event and therefore served as a measure of compliance with the agent's instruction. The takeover events included a blockage on the road, a hardware or mechanical error, and hazardous weather including rain and fog. The two scenarios used the same city and events; however, the driving route and order of events differed. For example, the route of the first scenario started with driving on a straight road, but the second scenario started with a turning signal ahead. Regarding the differences in the order of events, the car-swerving event was the second event of the first scenario but the fifth event of the second scenario. These routes contained both straight and curved roadways, traffic signals, intersections, and other vehicles driving on the road.
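The counterbalancing described above can be illustrated with a short sketch. It is not the authors' assignment procedure: the function names, the crossing of robot mapping with condition order, and the cycling scheme are all assumptions made for illustration. Only the factor levels, the group size of 16, and the two-out-of-three reliability ratio come from the text.

```python
from itertools import product

EMOTIONS = ["happy", "angry", "neutral"]        # between-subjects factor
ROBOT_MAPPINGS = [                               # which robot plays which agent
    {"high": "Milo", "low": "NAO"},
    {"high": "NAO", "low": "Milo"},
]
ORDERS = [("high", "low"), ("low", "high")]     # order of the within-subjects conditions


def build_assignments(per_emotion=16):
    """Return one dict per participant (hypothetical scheme, not the authors')."""
    # Assumption: robot mapping and condition order are fully crossed (4 cells)
    # and participants cycle through the cells within each emotion group.
    cells = list(product(ROBOT_MAPPINGS, ORDERS))
    assignments = []
    for emotion in EMOTIONS:
        for i in range(per_emotion):
            mapping, order = cells[i % len(cells)]
            assignments.append({"emotion": emotion, "robots": mapping, "order": order})
    return assignments


def reliability(correct_pieces, total_pieces=3):
    """Share of correct information pieces in one takeover instruction."""
    return correct_pieces / total_pieces


plan = build_assignments()
print(len(plan))                      # 48
print(round(reliability(2) * 100))    # 67
```

Under this scheme, each emotion group contains equal numbers of participants for each robot mapping and each condition order, which is one way to satisfy the balancing constraints stated in the design.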

Procedure
The experimental procedure lasted at most 2 h. Upon arrival, participants were given a brief description of the study and were asked to sign a consent form approved by the Virginia Tech Institutional Review Board. Participants were then given an explanation of the use of the driving simulator and performed a test drive to familiarize themselves with the device. This test drive only included manual driving and did not expose the driver to Level 3 driving or any interaction with the IVAs, to avoid any bias toward the IVAs prior to the actual driving experiments. The primary purpose of the test drive was to evaluate participants' motion sickness level on the driving simulator. To assess simulator sickness, participants were given a questionnaire both before and after the test drive to rate 17 symptoms of motion sickness. These were rated on a scale of 0 to 10, with 0 being "not at all" and 10 being "severe." If simulator sickness was an issue, they were compensated and dismissed from the study. Two participants were dismissed due to simulator sickness. If it was not an issue, the demographics and emotional status surveys were completed. Participants were given sample paragraphs of a correlating emotional state (happy, angry, or neutral) and were given 12 min to write about a past positive or negative experience (happy or angry), or to write a detailed schedule of their previous day (neutral). This emotion induction method was validated in more than 20 previous emotional driving research studies (FakhrHosseini and Jeon, 2017). After completing the emotion induction task, participants repeated the emotional status survey as a manipulation check.
Before each simulated driving trial, participants were introduced to one of the robot agents. They were instructed that the robot may ask them to take over in certain situations, but they were allowed to choose not to. If the participant did take over the vehicle, they were required to hand control back when asked. Between each trial, participants were asked to complete the required questionnaires on their experience. Upon completion of both reliability conditions, participants completed a third emotional status questionnaire. They were then debriefed on the emotion induction to ensure that negative effects of emotions did not persist after the conclusion of the experiment.

Questionnaires
There were five categories of questionnaires used in the experiment. The first of these was a demographics survey, completed prior to starting the experiment. This asked participants about their age, gender, ownership of a driver's license, years of driving experience, number of times driven per week, and the number of miles driven per year.
An emotional status questionnaire was completed three times throughout the experiment. The first was before the emotion induction, the second was immediately after emotion induction, and the last was after the completion of both trials. Participants were asked to rate fear, happiness, anger, depression, confusion, embarrassment, urgency, boredom, and relief on a seven-point Likert scale. These emotions were stated to be important to driving situations in a previous study (Jeon and Walker, 2011).
At two specified points in each scenario, the simulation was paused, and participants were asked to complete a Situation Awareness Global Assessment Technique (SAGAT) questionnaire to assess SA. The first trial was paused at a tunnel accident with a police car and semi-truck on the left, and then at an intersection where a car was stopped at a light. The second trial was paused inside of a tunnel, and then near a roadside accident with a police car and a man with a stroller on the right. The questionnaire consisted of five open-ended questions, divided into three levels. Level one questions, relating to perception, asked "What elements of interest do you see on the screen?" and "What vehicles did you notice around you?" Level two questions, concerning comprehension of the event, asked "What do these elements tell you about the current situation?" and "What is currently happening in the scenario?" Lastly, the level three question, concerning projection of future events, asked "What do you think will happen next?"
After each trial, participants completed two sets of questionnaires. The first was the NASA Task Load Index (TLX, Hart and Staveland, 1988) to measure subjective workload. The second set was a series of Subjective Judgment Ratings, which included the Godspeed questionnaire (Bartneck et al., 2008), the Social Presence scale (Harms and Biocca, 2004), the Robotic Social Attributes Scale (RoSAS, Carpinella et al., 2017), the Subjective Assessment of Speech System Interfaces scale (SASSI, Hone and Graham, 2000), the Trust in Automation scale (Jian et al., 2000), the Cognitive Trust scale (McAllister, 1995), and the Affective Trust scale (McAllister, 1995).

Takeover performance
To compare subjective responses with actions, driving simulator recordings were taken of each trial. The compliance of the participants with the agent's instructions was measured by whether there was a takeover of control. Additional performance measures included takeover reaction time, lane position, speed, longitudinal acceleration, lateral acceleration, steering wheel angle, jerk, and takeover type (McDonald et al., 2019).
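Jerk is the time derivative of acceleration. The text does not specify how jerk was extracted from the simulator recordings, so the following is only a minimal sketch under stated assumptions: acceleration is sampled at a fixed interval `dt`, jerk is approximated with a forward finite difference, and the summary statistic (maximum absolute jerk) is a hypothetical choice. The function names and sample values are illustrative.

```python
def jerk_series(acceleration, dt):
    """Finite-difference jerk (m/s^3) from evenly sampled acceleration (m/s^2)."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Forward difference between consecutive samples; the result has one
    # fewer element than the input.
    return [(a1 - a0) / dt for a0, a1 in zip(acceleration, acceleration[1:])]


def max_abs_jerk(acceleration, dt):
    """Largest jerk magnitude in the window (an assumed summary measure)."""
    values = jerk_series(acceleration, dt)
    return max(abs(j) for j in values) if values else 0.0


# Hypothetical acceleration samples taken every 0.5 s during a takeover.
accel = [0.0, 0.5, 1.5, 1.0]
print(jerk_series(accel, dt=0.5))    # [1.0, 2.0, -1.0]
print(max_abs_jerk(accel, dt=0.5))   # 2.0
```

A lower value of such a measure corresponds to smoother vehicle control after the driver takes over.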

Results
All data were checked for sphericity and normality. When the sphericity assumption was violated, the Greenhouse-Geisser correction was applied. In a few cases, data were not normally distributed because some of the data came from the interval data (e.g., Likert-type data), but ANOVA was still applied to the data instead of the non-parametric analysis for the following reasons: (1) the F-test (e.g., ANOVA or ANCOVA) is known to be robust to violations of the interval data assumption and can be used to conduct statistical tests with no resulting bias (Carifio and Perla, 2007; Norman, 2010) and (2) non-parametric tests cannot show the interaction effects between variables, which we wanted to investigate in the present study.

Figure 1. Robotic IVAs NAO and Milo (left) and simulator and agent setup (right). NAO is placed in the same position when used.

Manipulation check
The emotion induction results from the emotional status questionnaire were analyzed with a separate ANOVA for each emotion condition (happy, angry, and neutral). Only the corresponding emotional state was analyzed for each condition (e.g., happiness score for the happiness condition). For the neutral condition, both happiness and anger scores were analyzed. Table 2 and Figure 2 show the average of the affective rating scores over three timings: before induction, after induction, and after the experiment. The results showed significant differences in the rating scores of anger [F(2,28) = 18.41, p < 0.001, $\eta_p^2$ = 0.55] and happiness [F(2,28) = 6.95, p < 0.01, $\eta_p^2$ = 0.33], and no significant difference was found in the neutral condition [happiness of neutral: F(2,30) = 1.70, p = 0.20; anger of neutral: F(2,30) = 1.18, p = 0.32]. With the Bonferroni correction (α = 0.0167), the average anger score from angry participants after induction was significantly higher than before induction (p < 0.01). In addition, the anger score after the experiment was significantly lower than after induction (p < 0.01). For happy participants, the average happiness score after induction was numerically higher than before induction, as expected, but the difference was not statistically significant. However, the happiness score after the experiment was significantly lower than after induction (p < 0.01). Overall, for both happy and angry participants, the self-reported affective rating scores were higher after induction than before induction.
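Two quantities in this paragraph can be reproduced with simple arithmetic. The sketch below assumes a family-wise α of 0.05 split over the three pairwise comparisons among the rating timings (the text reports only the corrected value of 0.0167), and it uses the standard identity that recovers partial eta squared from an F statistic and its degrees of freedom. The function names are illustrative.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumption: three pairwise comparisons (before/after induction,
# after induction/after experiment, before induction/after experiment).
print(round(bonferroni_alpha(0.05, 3), 4))           # 0.0167

# Happiness effect reported above: F(2,28) = 6.95.
print(round(partial_eta_squared(6.95, 2, 28), 2))    # 0.33
```

The same identity can be used to sanity-check effect sizes reported with other F statistics, bearing in mind that published values are rounded.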

Situation awareness (SAGAT)
A scoring rubric was made to grade participants' responses in SAGAT. The average score of each participant was analyzed with a 3 (Emotions) × 2 (Reliability Levels) mixed ANOVA for each condition. Emotions were found to have a main effect on SA (Table 3). No significant difference was found between the two reliability levels or in the interaction between emotions and reliability levels. According to least significant difference (LSD) post hoc tests, angry and neutral participants were found to have significantly higher SA scores than happy participants, especially for level 1 and level 2 questions (Figure 3).

Table 1 note: Bolded parts were presented in the high-reliability condition. Italic parts were presented in the low-reliability condition.

Subjective judgment ratings
The results of the subjective questionnaires (Godspeed, Social Presence, RoSAS, and SASSI) were analyzed with a 3 (Emotions) × 2 (Reliability Levels) mixed ANOVA for each condition.
In the Godspeed and RoSAS questionnaires, agent reliability was found to have a main effect on perceived intelligence [F(1,45) = 5.15, p = 0.03, $\eta_p^2$ = 0.10; Table 4]. A significantly higher rating of perceived intelligence was found in the high reliability condition than in the low reliability condition (p = 0.02; Figure 4 left).
There was also a statistically significant difference in warmth in the interaction between emotions and reliability levels [F(2,45) = 4.33, p = 0.019, $\eta_p^2$ = 0.16; Table 5]. According to LSD post hoc tests, angry and happy participants in the high reliability condition rated significantly higher scores of warmth than the neutral participants in the high reliability condition (p = 0.01 and p = 0.03, respectively). Also, neutral participants in the low reliability condition rated significantly higher scores of warmth than the neutral participants in the high reliability condition (p = 0.02; Figure 4 right). No significant difference was found between the two reliability levels or in the interaction between emotions and reliability levels in other items.
No significant effects were found in either the Social Presence scales or the SASSI scales.

Trust in automation, cognitive trust, and affective trust
The results of the trust scales were analyzed with a 3 (Emotions) × 2 (Reliability Levels) mixed ANOVA for each condition.

Perceived workload
The weighted perceived workload (NASA-TLX) over emotions was analyzed with a 3 (Emotions) × 2 (Reliability Levels) mixed ANOVA for each condition. Emotion was found to have a main effect on physical demand (Table 5). No significant difference was found between reliability levels or in the interaction of emotions and reliability levels. Participants in the happiness condition perceived higher physical demand than those in the anger and neutral conditions (both p = 0.01; Figure 6). There was no significant difference found in the other dimensions of NASA-TLX.
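For readers unfamiliar with the weighted NASA-TLX score, the following sketch shows the standard weighting procedure from Hart and Staveland (1988): each of the six dimensions is rated on a 0 to 100 scale and weighted by how often it was chosen across 15 pairwise comparisons. The ratings and weights below are made-up illustration values, not data from this study, and the function name is an assumption.

```python
DIMENSIONS = ["mental", "physical", "temporal", "performance", "effort", "frustration"]


def weighted_tlx(ratings, weights):
    """Overall weighted NASA-TLX workload on a 0 to 100 scale."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    # Six dimensions give 15 pairwise comparisons, so the weights must sum to 15.
    if total_weight != 15:
        raise ValueError("weights must sum to 15")
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / total_weight


# Hypothetical participant: raw ratings and pairwise-comparison tallies.
ratings = {"mental": 60, "physical": 40, "temporal": 50,
           "performance": 30, "effort": 70, "frustration": 20}
weights = {"mental": 4, "physical": 2, "temporal": 3,
           "performance": 1, "effort": 5, "frustration": 0}

print(round(weighted_tlx(ratings, weights), 1))   # 56.7
```

A per-dimension weighted value, such as the physical demand score discussed above, is commonly taken as the product of that dimension's rating and weight; whether the analysis here used exactly that form is not stated in the text.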

Takeover performance
The results of takeover performance were analyzed with 3 (Emotions) x 2 (Reliability Levels) mixed ANOVA for each condition. A data value more than three standard deviations from the mean was considered an outlier and removed from the data analysis. Two outliers were found in the takeover performance result; therefore, the sample size for this result was reduced by two (N = 46 - 2 = 44). There was a significant difference in the interaction between emotions and reliability levels for jerk in the takeover performance data (Table 6). Angry participants in the low reliability condition and happy participants in the high reliability condition had a significantly lower jerk rate than the happy participants in the low reliability condition (p = 0.05 and p = 0.01, respectively; Figure 7). The number of times participants complied with the agents was counted in each condition. No significant difference was found in other takeover performance items and compliance count (Tables 6, 7).
[Figure caption: The average affective rating scores over different rating timings (*: p < 0.0167; error bars represent standard errors).]
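The three-standard-deviation exclusion rule described above can be sketched as below. This is an illustrative implementation under stated assumptions, not the authors' analysis script: it assumes the rule is applied around the sample mean using the sample standard deviation, and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def remove_outliers(values, k=3.0):
    """Drop values lying more than k sample standard deviations from the mean.

    Assumption: deviations are measured from the sample mean with the
    sample (n - 1) standard deviation.
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - m) <= k * s]


# Hypothetical measurements: 20 typical values plus one extreme value.
measurements = [1.0, 1.2, 0.8, 1.1, 0.9] * 4 + [100.0]
cleaned = remove_outliers(measurements)
print(len(measurements), "->", len(cleaned))
```

Note that with very small samples a single extreme value cannot exceed three sample standard deviations, so the rule only becomes effective at sample sizes like the one in this study.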

Discussion
To determine the impact of driver emotions and agent reliability levels on a Level 3 automated driving system, drivers' responses on different measures, including situation awareness, subjective perception, trust, perceived workload, and takeover performance, were compared. Overall, the results showed that emotions play an important role in directing drivers' attention to observations in different driving situations, and that the reliability of the agent impacts drivers' perceived intelligence of the system. The interaction between emotions and reliability levels has a distinctive impact on the perceived warmth of the system, a part of affective trust, and takeover performance.

Emotion induction
Before interpreting the results, it was important to confirm that the emotion inductions were successful. Participants were asked to write about their past experience(s) with their assigned emotion, angry or happy, for 12 min. Neutral participants were asked to write about the events of their day in chronological order. Both anger (significantly) and happiness (numerically) scores were higher after induction than before induction, which is consistent with previous studies (e.g., Jeon et al., 2014a,b). Even though happy participants did not show a significantly higher score after induction than before induction, previous studies have shown the effectiveness of this emotion manipulation method (Jeon et al., 2014a,b). Happy participants' emotional state might have changed without their being cognitively aware of that change. The different behavioral outcomes across emotion conditions also supported this notion. The significantly lower happiness score after the experiment might be due to boredom and exhaustion from the driving scenarios (note that their physical demand was significantly higher than in other emotion conditions). Anger scores after the experiment decreased dramatically to almost the same level as before induction. These anger and happiness scores from before induction and after the experiment were similar to neutral participants' scores (Figure 2; Table 2). Therefore, participants in the angry and happy conditions were "neutral" before the induction, signifying that the writing task successfully induced participants' affective states to the designated emotion.

Situation awareness
Because drivers of Level 3 automated vehicles may need to take over, maintaining situation awareness is vital. Participants in the angry (72.01%) and neutral (71.09%) states showed significantly higher situation awareness scores than participants in the happy state (51.95%) for all three levels (Table 3). Because happiness broadens the scope of attention (Derryberry and Tucker, 1994) or reduces the resources available for effortful processing (Mackie and Worth, 1989), it could have led the participants to neglect important details in the surroundings and divert their attention to other aspects (Jeon, 2015). In the study by Finucane (2011), negative emotions with a high arousal like anger enhanced selective attention because those emotions inhibit unrelated stimuli and narrow attentional focus (Easterbrook, 1959; Fredrickson, 1998). These findings might explain the SA result in the present study.

Subjective judgment ratings
The Godspeed ratings demonstrated part of the expected results. Participants reported higher perceived intelligence for the IVAs in the high reliability condition than in the low reliability condition (Table 4). When IVAs provided correct instructions that were related to the driving scenarios on time, participants might perceive the IVAs as responsive and robust, which were identified as the two characteristics that impact perceived intelligence (Krening and Feigh, 2018). This result could also serve as a manipulation check that participants noticed the difference between the two reliability conditions. Interestingly, in the RoSAS ratings, there was an interaction between emotions and reliability levels in Warmth for the IVAs. Angry and happy participants perceived more warmth than the neutral participants in the high reliability condition; in addition, neutral participants in the low reliability condition also rated warmth higher than those in the high reliability condition (Table 5). Warmth plays a major role in the impression formation process and positive interpersonal interactions, which also elicits emotions (Carpinella et al., 2017). This result might imply that when the IVAs provided accurate takeover instructions to the drivers with high reliability, the drivers who were induced to have emotions with a high arousal, such as anger and happiness, were more likely to share the conversation and receive the information from the IVAs than the neutral drivers, which leads to a positive interaction during the driving scenarios (Berger, 2011). Meanwhile, neutral participants might perceive the IVAs' mistakes as more tolerable and human-like and trust the robots more in the low reliability condition than in the high reliability condition because of the embodiment effect (Kontogiorgos et al., 2020). In the study by Kontogiorgos et al. (2020), failures negatively impacted people's perception of a smart speaker but not of a human-like embodied robot.
With a sense of imperfection, people perceived the anthropomorphized robot as less machine-like and more likeable (Salem et al., 2013). This finding could also explain why no significant results appeared in the discomfort rating across the emotion and reliability conditions. No significant difference was found for either the Social Presence or SASSI scales, both of which measured participants' subjective judgment toward the agent. Both IVAs are humanoid robots of a similar size, and they both used their own factory voices during the experiment. This might suggest that participants perceived similar social presence and similar efficiency and effectiveness of speech from both IVAs regardless of participants' emotions and the reliability of the system in the present study.

Trust
No main effects were found in the Trust in Automation scale. This might imply that participants in different affective states still perceived relatively high trust (4.61 to 4.97) in the automated driving system or the robotic agents regardless of the emotion conditions. When working with an imperfect automation, people might benefit from calibrating their trust and adjusting attention to make better decisions (Parasuraman et al., 2000; Lee and See, 2004). Also, neither the cognitive trust nor the affective trust scale showed a significant difference across emotion conditions. However, a closer look at the data revealed significant differences in a couple of items in the Affective Trust scale toward the IVAs. A main effect was found among the three emotions for "I would feel a sense of loss if I could no longer use the agent" (McAllister, 1995). Angry participants rated this affective trust item lower than the happy and neutral participants. Because anger has high other-responsibility and high self-control (Smith and Ellsworth, 1985), Dunn and Schweitzer (2005) suggested that anger would decrease trust. For instance, angry participants might have perceived IVAs to be responsible for the hazardous driving scenarios and considered that they could control the situation themselves without the IVAs' help. There was also an interaction between emotions and reliability for "I would have to say that both the agent and I have made considerable emotional investment in our working relationship" (McAllister, 1995). The result demonstrated that happy participants in the high reliability condition are more likely to build this affective trust in the cooperation with driving agents than participants in the low reliability condition. This means that both happy (i.e., positive) emotion and high reliability matter for gaining high affective trust. Based on our outcomes, we can cautiously infer that cognitive trust can be formed relatively easily compared to affective trust, but users' positive emotions may help build affective trust, specifically when the system reliability is high. It is possible that when previous studies (e.g., Hafızoğlu and Sen, 2018; Fahim et al., 2021) showed that happy emotions lead to high trust, the effect might be attributed to affective trust. However, little research has investigated the two constructs (cognitive and affective trust) separately as in our study. According to Lee and See (2004), trust is an affective response with some influences from analytic and analogical (i.e., cognitive) processes; in addition, the affective process of trust development has a greater impact on the analytic process side. More importantly, less cognitive demand is required to develop affective trust that links to the characteristics of agents and environments (Lee and See, 2004).
Even though the main effect of reliability on the cognitive trust score did not reach the conventional significance level (p = 0.054; Table 4), the result might still imply that reliability levels showed a tendency to predict cognitive trust. However, this finding did not apply to affective trust because participants' affective responses might be impacted more by the participants' emotions than by the reliability of the system.
[Figure caption: The scores of situation awareness over emotions for different levels of SA questions (*: p < 0.05; error bars represent standard errors).]

Perceived workload
Happy participants reported significantly higher perceived physical demand than both angry and neutral participants in the NASA-TLX result. This result might suggest that happiness motivated and engaged participants to perform the driving task, which might require more physical effort (Joo and Lee, 2017). In the present study, participants experienced driving scenarios that were designed to contain dangerous driving situations that required takeovers, such as driving in foggy weather and an unpredictable car accident ahead. A motivational theory suggested that positive affect, such as happiness, might encourage the participants to work harder than other emotions in unpleasant situations to solve problems and maintain their positive state (Isen, 1987; Wegener et al., 1995; Jeon, 2015). In the study by Jeon et al. (2014b), happiness also had numerically higher scores in perceived workload than other emotions.

Takeover performance and compliance
An interaction between emotions and reliability levels was also found for the jerk rate. Jerky driving is considered one of the aggressive actions that could cause driving accidents (Bagdadi and Várhelyi, 2011). Happy participants in the low reliability condition had a significantly higher jerk rate than the angry participants in the low reliability condition. Happy participants' low situation awareness results partly explain this difference between the two emotion conditions. Also, happy participants in the high reliability condition showed a significantly lower jerk rate than those in the low reliability condition. The perceived utility model by Blanchette and Richards (2010) suggested that emotions have complex impacts on decision-making and reasoning. In the low reliability condition, happy participants might perceive more frustration from the wrong instructions provided by the agents in the takeover events than the angry participants (Isen et al., 1988), which might lead to more jerky driving. Because happy participants performed more jerky driving, this could also explain why they perceived a higher physical workload than participants in the other emotion conditions. When dealing with IVAs in the low reliability condition, angry participants might still insist on their own decision-making power in driving due to their certainty and controllability (Ghasem-Aghaee et al., 2009). This result suggests that not only do emotions contribute to driving behaviors, but having correct (reliable) takeover instructions is also important, which can reduce the risk of aggressive actions in driving.
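For readers unfamiliar with the measure, jerk is the time derivative of acceleration. The sketch below shows one generic way to estimate it from logged simulator data using finite differences. It is an assumption-laden illustration: the text does not state how the jerk rate was computed in this study, and the function names, sampling interval, and sample values are hypothetical.

```python
def jerk_series(accel, dt):
    """Finite-difference jerk (m/s^3) from equally spaced acceleration
    samples (m/s^2) recorded every dt seconds."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Difference between consecutive acceleration samples, divided by the
    # sampling interval.
    return [(a1 - a0) / dt for a0, a1 in zip(accel, accel[1:])]


def peak_abs_jerk(accel, dt):
    """Largest absolute jerk in the series, one possible summary of how
    abruptly a driver changed acceleration."""
    series = jerk_series(accel, dt)
    return max(abs(j) for j in series) if series else 0.0


# Hypothetical longitudinal acceleration samples logged every 0.5 s.
samples = [0.0, 0.5, 1.5, 1.0]
print(jerk_series(samples, 0.5))
print(peak_abs_jerk(samples, 0.5))
```

Other summaries, such as the mean absolute jerk or the count of samples exceeding a threshold, would fit the same structure.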
No significant differences were found in the number of compliances. The participants might have felt that the takeover situations were all urgent and therefore chose to switch to manual driving every time the IVAs provided takeover instructions in an event. Also, the takeover instructions in the driving events might have been clear and easy to follow, leading to similar compliance and similar results in the other takeover performance items (takeover time, speed, longitudinal/lateral accelerations, and wheel angles).

Limitations and future work
Two participants quit the present study after the test drive because of motion sickness, which is very common in driving simulator studies (Kennedy and Frank, 1985). Participants seemed to over-comply with the agent's instructions, which might indicate that participants perceived a high level of reliability from the humanoid robots regardless of the designated reliability levels in emergency situations, such as takeover events (Robinette et al., 2016). Varying the level of urgency in the driving scenario would be of interest. Future studies could also incorporate the impact of different levels of anthropomorphism on drivers' perceptions of and trust in in-vehicle agents. The present study was conducted using a driving simulator, which may not fully reflect participants' on-road takeover driving behaviors and their subjective responses in a real-life driving situation. Participants' past experiences with automated vehicles might also influence their trust development with the system. However, the results could still approximate drivers' behaviors in a CAV and provide insights on the design of future research studies. Although the happy state induction did not lead to a statistically significant difference, we still noticed a numerical difference in participants' happy states before and after induction. In future studies, physiological measurements can be included to compare with the subjective ratings of participants' emotional states. Finally, the result of the present study might not represent the responses of older drivers or other populations to a Level 3 automated vehicle because the participants were all college students. These limitations should be considered and addressed in future studies for generalization.
[Figure caption: The average rating score of perceived intelligence over reliability levels (left) and the average rating score of warmth over emotions and reliability levels (right) (*: p < 0.05; error bars represent standard errors).]

Design guidelines
Based on the overall outcomes from the present study and literature, we extracted practical guidelines for the design of conditionally automated vehicle (CAV) systems and for future research.
• When drivers are in a happy state, design the in-vehicle technologies and agents to mitigate drivers' distraction and increase drivers' situation awareness, while reducing their workload.
• Specifically, when drivers are in a happy state and the system reliability is low, design the system to improve drivers' performance; for example, the system can provide real-time feedback about their inappropriate driving behaviors, such as jerk, so they can be aware of their negative driving performance.
• When drivers are in an angry state, design the in-vehicle technologies and agents to enhance their affective trust, awareness of their emotional state, and decision-making processes.
• In future research, vary the level of urgency so that compliance can be differentiated between the different conditions.
• In future research, choose the variables carefully to better disentangle cognitive trust and affective trust and investigate their effects on the interaction with the agent.

Conclusion
The purpose of the present study was to investigate how driver emotional states and CAV agent reliability influence situation awareness, subjective judgment, trust, workload, and takeover performance in Level 3 automated driving. The findings showed that SA was lower for happy participants compared to both angry and neutral participants (RQ1). Happy participants seemed more likely to be distracted from the takeover events in the present study. Most importantly, interactions between emotions and reliability levels occurred in subjective judgment (Warmth, Affective Trust) and performance (Jerk) (RQ2). Agents with high reliability were rated as having a higher perceived intelligence. In general, happiness combined with high reliability contributed positively to those scales and benefited the driver's behavior. However, the results also showed that low reliability can negatively influence even happy drivers. There were no significant results in the other trust scales or in takeover compliance. This may imply that the influence of affective trust is independent of the influence on other trust forms. To conclude, the results imply that both positive emotion and high reliability are required in developing emotional relationships and trust with IVAs, which encourages positive driving behaviors.
[Figure caption: The average rating score of affective trust in "I would feel a sense of loss if I could no longer use the agent" over emotions (left) and "I would have to say that both the agent and I have made considerable emotional investment in our working relationship" over emotions and reliability levels (right) (*: p < 0.05; error bars represent standard errors).]
[Figure caption: The average rating scores of perceived physical demand over emotions (*: p < 0.05; error bars represent standard errors).]

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by Virginia Tech Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

Author contributions
SZ: implementation of the driving scenario on the simulator, data collection, writing-original draft, and writing-review and editing. JD: theoretical foundations and definition of research questions and hypotheses, formal analysis, writing-original draft, and writing-review and editing. ST: implementation of the driving scenario on the simulator, data collection, and writing-review and editing. CS: implementation of the driving scenario on the simulator, data collection, and writing-review and editing. MJ: theoretical foundations and definition of research questions and hypotheses, conceptual design of the experiment, project administration, supervision, funding acquisition, and writing-review and editing. All authors contributed to the article and approved the submitted version.
[Figure caption: The jerk rate over emotions and reliability levels (*: p < 0.05; error bars represent standard errors).]

Funding
This study was supported by the Northrop Grumman Undergraduate Research Program.