Children With Autism Spectrum Disorder Can Attribute False Beliefs in a Spontaneous-Response Preferential-Looking Task

An established body of literature indicates that children with autism spectrum disorder (ASD) have difficulty understanding figurative language due to a deficit in theory of mind, or the ability to consider the beliefs of other people. Children with ASD tend to similarly fail traditional theory of mind tasks, which assess their ability to represent false beliefs. Our claim is, however, that these tasks involve cognitive processing demands that might mask false belief understanding because they require elicited responses. We examined whether children with ASD demonstrate false-belief understanding when tested with a spontaneous-response false belief task that measures children’s eye gaze durations. The two child participant groups were composed of 20 males with ASD (aged 3–9 years) and 20 typically developing males (aged 2–5 years) who were individually matched according to verbal mental age. Children with ASD and typically developing children listened to a change-of-location story accompanied by a book with matching and non-matching pictures. The final page showed the character searching for her object in a location that was either consistent or inconsistent with her belief. Both groups of children looked reliably longer at the belief-consistent picture, regardless of whether the character’s belief was true or false, though children with ASD were slower to do so. We suggest that a spontaneous-response assessment technique can potentially reveal figurative language comprehension in children with ASD in future research.

Studies have demonstrated that people with autism spectrum disorder (ASD) have difficulty understanding figurative language due to a deficit in Theory of Mind (ToM; Happé, 1995;Martin and McDonald, 2004;Whyte et al., 2014). ToM, or the ability to think about the beliefs and intentions of other people, is a fundamental cognitive process integral to social interaction and interpersonal communication (Sperber and Wilson, 1995). Research on the developmental origins of mental-state reasoning has focused on when children understand that individuals can hold false beliefs, as this implies awareness of the fact that others can have mental states that differ from one's own. Earlier investigations examined false-belief understanding using elicited-response tasks, in which children answer direct questions about someone who holds a false belief (e.g., Baron-Cohen et al., 1985;Perner et al., 1987;Gopnik and Astington, 1988;Wellman et al., 2001). While typically developing (TD) children begin to pass these tasks around age 4, children with autism spectrum disorder (ASD) fail at a comparable verbal mental age (VMA; Baron-Cohen et al., 1985;Wellman et al., 2001). Although some children with ASD eventually pass elicited-response tasks, they do not do so until the relatively advanced VMA of 11 years (Happé, 1995). The traditional interpretation of these findings is that children with ASD lack the ability to represent false beliefs, and that those who eventually pass false-belief tasks do so via alternative strategies that depend on advanced verbal abilities and do not involve consideration of mental states (Baron-Cohen et al., 1985;Bowler, 1992;Baron-Cohen, 1995;Happé, 1995).
One problem with the traditional interpretation, however, is that successful performance in an elicited-response false-belief task requires more than false-belief understanding: these tasks also impose substantial executive function, linguistic, and pragmatic demands (e.g., Leslie and Polizzi, 1998;Birch and Bloom, 2003;Baillargeon et al., 2010;Hansen, 2010;Rubio-Fernández and Geurts, 2013;Helming et al., 2016;Roby and Scott, 2016;Kampis et al., 2017;Westra and Carruthers, 2017). To illustrate, consider the classic Sally-Ann task (e.g., Baron-Cohen et al., 1985) in which children hear the following story: Sally hides her marble in a basket and then leaves; in her absence Ann moves the marble to a nearby box. Children are then asked where Sally will look for the marble when she returns. According to the processing-demands account (e.g., Setoh et al., 2016;Scott and Baillargeon, 2017), answering this question correctly (i.e., pointing to the basket) depends on at least three processes. First, children must represent Sally's false belief and maintain this representation in working memory (representation process). Second, they must interpret the test question, choose to answer it, and select an appropriate response (response-generation process; Setoh et al., 2016; see also Mueller et al., 2007;Saxe et al., 2006). Third, while selecting their response, children must inhibit the prepotent tendency to respond based on their own knowledge of the marble's location (response-inhibition process; e.g., Birch and Bloom, 2003;Leslie and Polizzi, 1998; see also Nilsen and Graham, 2009). The demands imposed by the response-generation and response-inhibition processes might overwhelm children with limited executive function skills, such as TD toddlers and children with ASD (Ozonoff et al., 1991;Bennetto et al., 1996;Minshew et al., 2004), thereby masking an underlying ability to represent false beliefs.
This account predicts that if task demands were reduced, both young TD children and children with ASD might demonstrate false-belief understanding. Two sets of results with TD children provide support for this prediction. First, reducing the demands of elicited-response tasks enables TD children to pass at younger ages (e.g., Chandler et al., 1989;Lewis and Osborne, 1990;Bartsch, 1996;Rubio-Fernández and Geurts, 2013;Bialecka-Pikul et al., 2019;Psouni et al., 2019;Salter and Breheny, 2019). In particular, several recent studies have found that even 2.5-year-olds can pass elicited-response tasks when both the response-generation and response-inhibition demands are sufficiently reduced (Setoh et al., 2016;Grosso et al., 2019;Scott et al., 2020).
The second source of support for this prediction comes from recent studies that have used spontaneous-response false-belief tasks with TD infants and toddlers with positive results. In spontaneous-response tasks, children are not asked direct questions that require them to predict the behavior of an agent who holds a false belief. Instead, children's false-belief understanding is inferred from behaviors that they spontaneously produce as they watch the agent act in a scene. These include emotional expressions (e.g., Moll et al., 2016), looking behaviors such as where children look or how long they look at a scene (Southgate et al., 2007;Scott, 2017), and physical actions such as spontaneous pointing (Knudsen and Liszkowski, 2012). Because children are not asked direct questions in these tasks, the response-generation and response-inhibition processes are not activated. When these demands are removed, TD children demonstrate false-belief understanding as early as 7 months of age (e.g., Kovács et al., 2010;Onishi and Baillargeon, 2005;Scott et al., 2012;Southgate et al., 2007; for a review, see Scott and Baillargeon, 2017).
These positive findings with TD infants and toddlers raise the possibility that children with ASD might also succeed in false-belief tasks that involve reduced processing demands, such as spontaneous-response tasks. A handful of studies have investigated this possibility using anticipatory-looking tasks (Senju et al., 2010;Schuwerk et al., 2016a;Burnside et al., 2017). In these tasks, researchers measure where children look in anticipation of an agent's search for an object (e.g., Clements and Perner, 1994;Garnham and Ruffman, 2001;Southgate et al., 2007;He et al., 2012). If children visually anticipate the agent's search by looking at the location where she falsely believes her object is located, this suggests that they successfully represented the agent's false belief. Senju et al. (2010) tested 7.5-year-old TD children and children with ASD using a nonverbal anticipatory-looking task adapted from Southgate et al. (2007). Children watched videotaped events in which an agent wearing a visor sat behind a panel with two closed windows; a box sat below each window. In four familiarization trials, a toy was located on (first two trials) or inside (last two trials) one of the two boxes. The windows lit up and a chime sounded. After a brief delay, the agent reached through the correct window and retrieved the toy. In the test trial, the agent saw a puppet hide a toy in the right box. A phone then rang behind the agent, who turned toward the sound. While the agent was facing away, the puppet removed the toy from the box and left with it. The phone then stopped ringing, the agent turned back towards the boxes, and the windows lit up. Senju et al. measured children's looking time to each of the two windows during a 5-s interval after the windows were illuminated. Replicating prior findings, TD children anticipated the agent's behavior and looked longer at the window above the right box, where the agent falsely believed the toy was hidden.
In contrast, children with ASD looked relatively equally at the two windows, suggesting a failure to represent the agent's false belief. Similar negative results have been found in other studies that have used anticipatory-looking tasks with children with ASD (Schuwerk et al., 2016b;Burnside et al., 2017) and adults with ASD (Senju et al., 2009;Schneider et al., 2013;Schuwerk et al., 2015).
These negative results in anticipatory-looking tasks have led some researchers to argue that ToM is fundamentally impaired in individuals with ASD (e.g., Senju et al., 2010;Schneider et al., 2013;Burnside et al., 2017). However, there are at least two alternative explanations for these negative findings. First, a common feature of these studies is that they all measured anticipatory-looking responses. Previous research suggests that children and adults with ASD are less likely than typical individuals to correctly anticipate the actions of others, even in situations that do not involve false beliefs (e.g., Krogh-Jespersen et al., 2018;Ruffman et al., 2001;Schuwerk et al., 2016a;von Hofsten et al., 2009;Zhou et al., 2019). The negative findings in anticipatory-looking false-belief tasks might therefore reflect difficulties visually predicting others' actions rather than an inability to represent false beliefs. Second, these studies measured rapid responses made after an anticipatory prompt: either the first place that participants looked, or their looking time to each location during a short 2-5 s time window. Prior studies suggest that when presented with social stimuli, individuals with ASD and neurotypical individuals display different gaze patterns over short time intervals, but these group differences are reduced or absent at longer time intervals (e.g., Fletcher-Watson et al., 2009;Zwickel et al., 2011;Schuwerk et al., 2015). This suggests that the processing of social information might unfold more slowly in children and adults with ASD than in TD children and adults. For these reasons, anticipatory-looking tasks might provide a poor measure of underlying belief processing in children and adults with ASD, despite being spontaneous-response tasks.

USING EYE GAZE TO ASSESS FIGURATIVE LANGUAGE COMPREHENSION
A large proportion of our daily interactions require us to rapidly employ ToM reasoning to determine what other people believe and intend. For instance, we must consider speakers' true beliefs and intentions when they use figurative language forms including verbal irony, sarcasm, metonymy, metaphor, idioms, rhetorical questions, and hyperbole. A growing body of literature shows that eye gaze latencies (Nicholson, Whalen, and Pexman, 2013;Koder and Lakum, 2020;Whalen, Doyle, and Pexman, 2020) provide a more sensitive test of children's figurative language comprehension than verbal response tasks, because eye gaze tasks are less cognitively demanding.
In line with this idea, Pexman and colleagues (2011) measured eye fixations in children with high functioning ASD when determining the speaker's intent (i.e., nice or mean) for ironic criticisms and literal criticisms. Children were trained to respond to the experimenter's question "Was the speaker like the duck or like the shark?" by placing a "nice duck" or a "mean shark" into a response box. These researchers measured eye gaze latencies after the question and calculated the proportion of eye fixations to objects within three short phases after the prompt. Compared to typically developing children, children with high functioning ASD looked longer at an object linked to an incorrect literal interpretation (i.e., the duck) immediately after they heard the ironic criticism, but their eye gaze arrived at a correct ironic interpretation (i.e., the shark) faster than the age matched controls. Pexman et al. (2011) concluded that children with high functioning ASD and TD children demonstrated different verbal irony processing strategies.
In the present study, we similarly offer a novel experimental approach, but with a focus on examining ToM processing in children with ASD. Given the importance of ToM skills for figurative language processing in people with ASD (Whyte, Nelson and Scherf, 2014), our study offers insight into how new experimental methods and their cognitive demands can influence task performance. We propose that this technique can be applied in future research on figurative language comprehension in TD children and children with ASD.

THE PRESENT STUDY
In the present experiment, we asked whether children with ASD would demonstrate false-belief understanding in a different type of spontaneous-response task that did not involve rapid, anticipatory responses. To address this question, we tested children with ASD in a spontaneous-response preferential-looking false-belief task devised by Scott et al. (2012). In preferential-looking tasks, participants hear a word or sentence while viewing multiple images and their looking time to the images is assessed. Considerable research suggests that both children and adults tend to look longer at the image that matches the spoken utterance (e.g., Tanenhaus et al., 1995). Such tasks have been successfully used to study language development in children with ASD (e.g., Swensen et al., 2007;Tek et al., 2008;Naigles et al., 2011), suggesting they are well suited for this population.
Children were assigned to either a false-belief or a true-belief condition. In the false-belief condition, children heard a story while viewing a large picture book. The story told of a character named Emily, who hid her apple in a container and then left. While she was gone, a second character, Sarah, moved Emily's apple to another nearby container. Each page of the picture book showed two pictures: one picture matched the story and the other did not. On the last page of the book, one picture showed Emily looking for her apple where she falsely believed it was located (original-location picture), and the other picture showed her looking in the apple's current location (current-location picture). While viewing this last page, children heard 'Emily is looking for her apple.' We measured how long children looked at each picture during the first 8 s that the pictures were visible. Based on the original findings of Scott et al. (2012), we predicted that if children followed the story and represented Emily's false belief, then they would look significantly longer at the original-location picture, which matched the last line of the story.
The true-belief condition was identical to the false-belief condition except that Emily was present when Sarah moved the apple and thus held a true belief about its location. This condition was designed to address the possibility that children in the false-belief condition might look longer at the original-location image simply because they had formed an association between Emily and the container that she had acted on. If that were the case, then in both conditions, children should look longer at the original-location picture on the final page of the story. In contrast, if children were following the story and reasoning about Emily's belief about her apple's location, then children in the true-belief condition should respond differently than those in the false-belief condition: they should look longer at the current-location picture on the final page of the story.
This novel procedure allowed us to measure false-belief understanding in the absence of the concurrent response-generation and response-inhibition demands present in traditional elicited-response tasks. This task also did not require children to anticipate Emily's actions; rather, they needed to identify the picture in which Emily's action was consistent with her belief about the apple's location. Finally, our task used a longer, 8-s time window to reduce the need for rapid belief processing, and we directly examined the possibility that children with ASD process social scenes more slowly than TD children by comparing the two groups' performance during the first and second halves of this test window. Based on the results of the preferential-looking false-belief task reported by Barrett et al. (2013), we expected TD children in this age range to show a preference for the original-location picture during the first 4-s window. However, given reports that ToM reasoning in adults with ASD operates more slowly than in TD participants (e.g., Schuwerk et al., 2015), we reasoned that children with ASD might take longer than TD children to retrieve Emily's belief and identify the appropriate picture. Children with ASD might therefore show evidence of belief understanding only in the second 4-s window. Examining the two windows separately allowed us to explore potential differences in speed of processing across the two groups of children and gain insight into the time course of ToM processing in children with ASD.
We reasoned that evidence that children with ASD failed at the present task would support claims that children with ASD cannot represent false beliefs (e.g., Baron-Cohen, 1995;Senju et al., 2010). On the other hand, evidence that children with ASD succeeded at this task would 1) suggest that children with ASD can represent false beliefs, 2) begin to clarify the conditions under which children with ASD can display false-belief understanding, and 3) provide researchers with a new experimental paradigm for exploring social understanding in children with ASD. These findings are important because they would give us insight into the debate surrounding the nature of ToM processing in people with ASD.

Participants
In order to reduce the possibility that children with ASD might succeed using alternative linguistic strategies (e.g., Happé, 1995), we aimed to recruit young children with preschool level language skills who could be assessed by the Peabody Picture Vocabulary Test-IV (PPVT-4; Dunn and Dunn, 2007), which we used as our measure of VMA for all participants. Accessing these participants is challenging because many children in this category have recently been diagnosed with ASD and parents of children recently diagnosed tend to be overwhelmed with arranging intervention strategies and parenting a child with special needs, reducing their willingness and availability to participate in experimental research.
Descriptive characteristics of the final sample can be found in Table 1. The final sample in the ASD group consisted of 20 males who had independently received a diagnosis of ASD from at least one developmental pediatrician or psychiatrist. To ensure consistency with how children were diagnosed, only children who had been formally assessed by a professional using the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1989;Lord et al., 2000) were eligible to participate. Six additional children with ASD were unable to complete testing because their ASD symptomology interfered with their interest in participation, ability to display visual attention to both pages of the story book and ability to complete the PPVT-4. A further seven children with ASD were tested but excluded because they were distracted or fussy (2), they closed their eyes throughout the experiment (1), they demonstrated a side bias by looking 100% of the time to one side during the test trial (1), their test looking times were more than 2.5 standard deviations away from the mean of their condition (1), they failed to complete the PPVT-4 (1), or their VMA was over 2.5 standard deviations above the mean of the sample (1).
The TD group was recruited first to establish a pool of participants to individually match with participants with ASD. The final sample in the TD group consisted of 20 males who were individually matched according to VMA based on the PPVT-4. An additional eight TD children were tested but excluded because they were fussy or inattentive (2), they looked more than 85% of the time at the right picture during the setup trials (3), they never looked at the pictures in which the apple was hidden and moved (1), their test looking times were more than 2.5 standard deviations away from the mean of their condition (1), or they failed to complete the PPVT-4 (1). Twenty-three TD children were tested but excluded because they could not be individually matched with children with ASD, who tended to have very low VMA scores.
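The looking-time exclusion rule applied to both groups (test looking times more than 2.5 standard deviations from the condition mean) could be checked along the following lines. This is a minimal sketch rather than the authors' procedure: the function name `flag_outliers`, the example values, and the choice of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev

def flag_outliers(looking_times, criterion=2.5):
    """Mark values lying more than `criterion` sample standard
    deviations from the mean of the list (hypothetical helper)."""
    m = mean(looking_times)
    sd = stdev(looking_times)  # sample SD (n - 1); an assumption
    if sd == 0:
        return [False] * len(looking_times)
    return [abs(x - m) / sd > criterion for x in looking_times]

# Hypothetical total test looking times (s) for one condition
times = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.0, 4.7, 5.1, 5.0, 0.4]
flags = flag_outliers(times)
```

In this made-up list only the last child (0.4 s) would be flagged. Whether the mean and standard deviation should be computed with or without the candidate outlier is a design choice the text does not specify.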
Participants with ASD were recruited through a children's hospital and through an organization that supports people and families living with ASD. TD children were recruited from local daycares and from a university-based language development lab. Participants were from primarily Caucasian, middle-class, English-speaking families who lived in middle-class neighborhoods in a medium-sized Canadian city. An independent samples t-test confirmed that the two groups were not significantly different in VMA (children with ASD: M VMA = 4;11, SD = 1;8; TD children: M VMA = 5;0, SD = 1;7), t(38) = 0.06, p = 0.953. In each group, half of the children were randomly assigned to the false-belief condition (ASD group: M age = 6;5, SD = 2;1; TD group: M age = 4;1, SD = 1;1), and half to the true-belief condition (ASD group: M age = 5;2, SD = 1; TD group: M age = 4;1, SD = 0;8). Independent samples t-tests confirmed that there was no significant difference between the two conditions in VMA for either the ASD group (t(18) = 0.921, p = 0.369) or the TD group (t(18) = 0.794, p = 0.437).
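Ages and VMA scores above appear to follow the years;months convention common in developmental research (so 4;11 would mean 4 years, 11 months). Assuming that reading is correct, a small helper can convert such values to months before computing group statistics. The function name `age_to_months` is hypothetical.

```python
def age_to_months(age):
    """Convert a 'years;months' string such as '4;11' to months.
    Assumes the years;months convention; tolerates stray spaces."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)

asd_vma = age_to_months("4;11")  # mean VMA reported for the ASD group
td_vma = age_to_months("5;0")    # mean VMA reported for the TD group
gap_in_months = td_vma - asd_vma
```

On this reading, the reported group means differ by a single month of verbal mental age.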

Materials and Procedure
Participants were tested in a quiet room in their home or in a laboratory. Children sat at a table with a picture book raised at a 25° angle in front of them (for a visual depiction of the setup, please see Scott et al., 2012). Younger children sat on a parent's lap, while older children sat on their own. When a parent was present, the parent was instructed to close his/her eyes and not to engage the child throughout the story. An experimenter sat on the children's left. A first camera captured the children's eye movements for subsequent coding and a second camera captured the experimenter to confirm that the experimenter's behavior did not influence the children's eye gaze behavior. The picture book rested on a plastic bookstand (29.5 cm wide by 23 cm tall) and the pages were attached to the top of the stand with three binder rings. Each page (71 cm by 22.5 cm) consisted of two clear plastic photo sheets attached in the center; color photos (26 cm by 20 cm) were inserted in the sheets and separated by 19 cm by 21.5 cm of white paper to facilitate coding. The picture book had eight pages. On the first two pages, one side showed a picture and the other side showed only white paper. For the remaining six pages, one side of the page showed a picture that matched the story, and the other side showed a picture that did not match the story.
At the start of the experiment, the pages of the book were face down behind the bookstand. In each trial, the experimenter recited a line of the story and then turned the page towards the children so that the pictures were visible. The experimenter then repeated the story line and paused for about 8 s before reciting the story line for the next page. The experimenter looked at a neutral location in front of her during each trial to prevent children from using her gaze as a cue for finding the matching picture.
The conditions were based on a previous spontaneous-response task administered by Scott et al. (2012). The story was organized into three types of trials: introduction, setup, and test (see Scott et al., 2012 for pictures and script). The two introduction trials showed the main character, Emily (introduction-1), and her friend Sarah (introduction-2). The introduction trials showed only one picture to ensure that children correctly identified each story character. The introduction trials were followed by five setup trials. In the setup trials in the false-belief condition, children heard that Emily had an apple (setup-1) and that she hid it (setup-2) in either a basket or a box (hiding container was counterbalanced across children); Sarah watched Emily hide the apple. Emily then left to take a nap (setup-3). While Emily was taking a nap, Sarah moved the apple to the other container (setup-4) and then went outside to play (setup-5). The sides of the matching pictures in the introduction and setup trials were counterbalanced across trials and across children.
The story ended with a single test trial in which Emily looked for her apple. Emily grasped the basket in one picture and the box in the other picture. The side of the matching picture in the test trial was counterbalanced across hiding-container condition and side condition in the introduction and setup trials, with roughly equal numbers of children tested in each combination.
The script in the true-belief condition was similar to the false-belief condition except for three trials: setup-3, setup-4, and test (see Scott et al., 2012 for pictures and script). Instead of going to take a nap in setup-3, Emily sat down to read a book. She was therefore present and saw Sarah move the apple in setup-4. Finally, the script in the test trial did not reference Emily waking up from her nap. The pictures in the true-belief condition were identical to the false-belief condition with the exception of the matching picture for setup-4, which showed Emily watching Sarah move the apple.

Coding
In each 2-picture trial (i.e., the setup and test trials), we coded where the children looked (left picture, right picture, away) frame-by-frame from silent video. All children were coded independently by a primary and a secondary coder, who agreed on the children's direction of gaze for 95% of coded video frames. Trials in which agreement was less than 85% (33/240 trials) were resolved by a third coder. With the exception of 11 trials in which the third coder agreed with the secondary coder, the primary coder's data were used in all analyses. The primary coder collected the data and was therefore aware of each participant's diagnostic status and condition (i.e., false vs. true belief). However, the secondary coder was blind to the diagnostic status of all participants. While this secondary coder was aware of the study's aim, she was blind to the condition assigned to each participant.
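The frame-by-frame agreement check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' coding software: the gaze labels, the function names `percent_agreement` and `needs_third_coder`, and the example frame sequences are hypothetical. Only the 85% threshold comes from the text.

```python
def percent_agreement(primary, secondary):
    """Percentage of video frames on which two coders assigned
    the same gaze label ('left', 'right', or 'away')."""
    if len(primary) != len(secondary) or not primary:
        raise ValueError("coders must label the same non-empty set of frames")
    matches = sum(a == b for a, b in zip(primary, secondary))
    return 100.0 * matches / len(primary)

def needs_third_coder(primary, secondary, threshold=85.0):
    """True when trial-level agreement falls below the threshold."""
    return percent_agreement(primary, secondary) < threshold

# Hypothetical 20-frame trial in which the coders disagree on one frame
primary = ["left"] * 10 + ["right"] * 8 + ["away"] * 2
secondary = ["left"] * 9 + ["right"] * 9 + ["away"] * 2
agreement = percent_agreement(primary, secondary)
```

Simple percent agreement does not correct for chance agreement. A chance-corrected index could be substituted, but the text reports only raw agreement.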
For the five setup trials, we coded the first 8 s that the pictures were visible to the children; this 8-s window ended during the pause after the story line was repeated, prior to the story line for the next trial. In the test trial, we separately examined the first 4 s and the second 4 s that the pictures were visible.¹ Preliminary analyses of the setup and test trials revealed no interaction of picture with side condition in the setup trials or hiding container; the data were therefore collapsed across these factors in subsequent analyses.
¹ We predicted that children with ASD might take longer than TD children to identify the belief-consistent picture in the test trial, but no such delay was predicted for children with ASD in the setup trials because the images in these trials were more readily discriminable and identifying the correct picture did not require retrieval of prior story context.
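To make the windowing concrete, the sketch below derives looking times for the first and second 4-s windows of a test trial from frame-by-frame gaze codes. The frame rate of 30 frames per second, the function name `looking_time`, and the example trial are assumptions introduced for illustration; the text does not report a frame rate.

```python
def looking_time(frames, target, start_s, end_s, fps=30):
    """Seconds spent looking at `target` between start_s and end_s,
    where `frames` holds one gaze code per video frame, starting
    when the pictures become visible. fps=30 is an assumption."""
    start, end = int(start_s * fps), int(end_s * fps)
    return sum(1 for code in frames[start:end] if code == target) / fps

# Hypothetical 8-s test trial: 3 s left, 1 s away, 4 s right
trial = ["left"] * 90 + ["away"] * 30 + ["right"] * 120

first_window_left = looking_time(trial, "left", 0, 4)
second_window_right = looking_time(trial, "right", 4, 8)
```

Mapping "left" and "right" onto the original-location and current-location pictures would then depend on the counterbalanced side assignment for each child.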

RESULTS
We adopted a similar analysis strategy to prior studies that have used this preferential-looking task with TD children in order to facilitate comparison of our results to their findings (e.g., Scott et al., 2012;Barrett et al., 2013). In particular, consistent with these prior studies, we treated picture as a within-subject factor in all analyses. Analyses based on proportion scores (looking time to the matching image divided by looking time to either image) or difference scores (looking time to the matching image minus looking time to the non-matching image) yield the same pattern of results as those reported here.
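The two alternative scores mentioned above are simple transformations of the raw looking times. A minimal sketch follows; the function names and the example values are hypothetical.

```python
def proportion_score(matching, non_matching):
    """Looking time to the matching image divided by looking time
    to either image; None when the child looked at neither."""
    total = matching + non_matching
    if total == 0:
        return None
    return matching / total

def difference_score(matching, non_matching):
    """Looking time to the matching image minus looking time to
    the non-matching image."""
    return matching - non_matching

# Hypothetical child: 3.0 s on the matching picture, 1.0 s on the other
prop = proportion_score(3.0, 1.0)
diff = difference_score(3.0, 1.0)
```

A proportion of 0.5, or a difference of 0, would indicate no preference between the two pictures.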
Children's looking times (in seconds) during the five setup trials (see Figure 1) were averaged and analyzed using a mixed model analysis of variance (ANOVA) with group (ASD, TD) and condition (false-belief, true-belief) as between-subjects factors and picture (matching, non-matching) as a within-subject factor. The analysis revealed a significant effect of picture, F [...]
Children's looking times during the test trial were analyzed using a mixed model ANOVA with group (ASD, TD) and condition (false-belief, true-belief) as between-subjects factors, and window (first, second) and picture (original-location, current-location) as within-subject factors. The analysis revealed a significant interaction between picture and condition, F(1, 36) = 11.93, p = 0.001, ηp² = 0.25. This effect was qualified by a marginal three-way interaction of window, condition, and group, F(1, 36) = 3.46, p = 0.071, ηp² = 0.089, and a significant four-way interaction between picture, window, condition, and group, F(1, 36) = 9.77, p = 0.003, ηp² = 0.213. The analysis also revealed a marginal effect of window, F(1, 36) = 3.50, p = 0.069, ηp² = 0.089. No other effects were significant, all Fs < 1.62, all ps > 0.21.
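The partial eta squared values reported with each F ratio can be recovered from the F value and its degrees of freedom, using the standard identity that partial eta squared equals (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to two of the reported effects. The function name is hypothetical, and small discrepancies with published values can arise because the reported F ratios are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its
    degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Picture x condition interaction: F(1, 36) = 11.93
eta_two_way = partial_eta_squared(11.93, 1, 36)

# Picture x window x condition x group interaction: F(1, 36) = 9.77
eta_four_way = partial_eta_squared(9.77, 1, 36)
```

Rounded to the reported precision, these come to 0.25 and 0.213, matching the values in the text.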
The significant four-way interaction suggested that children in the two groups responded differently to the pictures in the first and second half of the trial in the false-belief and true-belief conditions. We next explored this interaction in two ways. First, to determine whether the children in each group responded appropriately given the agent's belief, we examined the TD and ASD groups separately. Second, to evaluate the possibility that children with ASD responded more slowly than TD children, we compared the two groups' performance within each analysis window.

Typically Developing Children
TD children's looking times during the test trial (see Figure 2) were analyzed using a mixed model ANOVA with condition (false-belief, true-belief) as a between-subjects factor, and window (first, second) and picture (original-location, current-location) as within-subject factors. The analysis revealed a significant interaction between picture and condition, F(1, 18) = 4.70, p = 0.044, ηp² = 0.21. This effect was qualified by a significant three-way interaction between picture, window, and condition, F(1, 18) = 5.50, p = 0.031, ηp² = 0.23. The analysis also revealed a marginal effect of window, F(1, 18) = 3.90, p = 0.064, ηp² = 0.18. No other effects were significant, all Fs < 1.36, all ps > 0.25.
In the first 4 s of the test trial, TD children in the false-belief condition looked significantly longer at the original-location (M = 2.66, SD = 0.76) than at the current-location (M = 1.14, SD = 0.82) picture, t(9) = 3.13, p = 0.006, whereas those in the true-belief condition looked significantly longer at the current-location (M = 2.59, SD = 0.66) than at the original-location (M = 1.38, SD = 0.68) picture, t(9) = 2.87, p = 0.009. Thus, consistent with prior findings (Barrett et al., 2013), in both conditions during the first 4 s of the test trial TD children looked reliably longer at the picture that was consistent with the agent's belief and hence matched the final line of the story. In contrast, during the second 4 s of the test trial, TD children in both the false-belief condition (original-location M = 1.48, SD = 1.32; current-location M = 2.12, SD = 1.35) and true-belief condition (original-location M = 1.44, SD = 1.10; current-location M = 1.92, SD = 1.08) looked relatively equally at the two pictures, both ts < 1.
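The follow-up comparisons above report t statistics with 9 degrees of freedom for 10 children per condition, which is consistent with a paired comparison of each child's looking times to the two pictures. Assuming that is the test used (the text does not name it), the statistic can be computed as below. The function name and the looking times are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two
    equal-length lists of within-child looking times."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical looking times (s) for 10 children in one window
original_location = [3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0]
current_location = [1.0] * 10

t_value, df = paired_t(original_location, current_location)
```

A p value would additionally require the t distribution, for example from a statistics library, together with a decision about one-tailed versus two-tailed testing that the text leaves implicit.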

Children With Autism Spectrum Disorder
The looking times of children with ASD during the test trial (see Figure 3) were analyzed using a mixed model ANOVA with condition (false-belief, true-belief) as a between-subjects factor, and window (first, second) and picture (original-location, current-location) as within-subject factors. The analysis revealed a significant interaction between picture and condition, F(1, 18) = 8.08, p = 0.011, ηp² = 0.31. This effect was qualified by a significant three-way interaction between picture, window, and condition, F(1, 18) = 4.53, p = 0.047, ηp² = 0.20. No other effects were significant (window by condition interaction F(1, 18) = 2.64, p = 0.12; all other Fs < 1).
During the first 4 s of the test trial, children with ASD looked relatively equally at the two pictures in both the false-belief condition (original-location M = 1.81, SD = 0.81; current-location M = 1.78, SD = 0.75) and the true-belief condition (original-location M = 1.96, SD = 1.47; current-location M = 1.59, SD = 1.43), both ts < 1. In contrast, during the second 4 s of the test trial, children with ASD in the false-belief condition looked significantly longer at the original-location (M = 2.12, SD = 1.21) than at the current-location (M = 0.99, SD = 1.05) picture, t(9) = 2.05, p = 0.035, whereas those in the true-belief condition looked significantly longer at the current-location (M = 2.78, SD = 1.15) than at the original-location (M = 0.97, SD = 0.98) picture, t(9) = 2.78, p = 0.011. Children with ASD thus looked reliably longer at the picture that matched the final story line in the second, but not the first, half of the test trial.
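As a consistency check on the effect sizes reported for the two groups' ANOVAs, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch in Python; the function name and dictionary labels are hypothetical and introduced only for illustration, and the F values are copied from the text rather than recomputed from raw data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics reported in the text; every test has df = (1, 18).
# The labels are illustrative, not taken from the original analysis scripts.
REPORTED_F = {
    "TD: picture x condition": 4.70,
    "TD: picture x window x condition": 5.50,
    "TD: window": 3.90,
    "ASD: picture x condition": 8.08,
    "ASD: picture x window x condition": 4.53,
}

for label, f_value in REPORTED_F.items():
    eta = partial_eta_squared(f_value, df_effect=1, df_error=18)
    print(f"{label}: partial eta squared = {eta:.2f}")
```

Under these assumptions, the recovered values agree with the reported effect sizes to two decimal places.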

Within-Window Comparisons
These results suggest that in both the true-belief and false-belief conditions, both groups of children were able to track the agent's belief and hence demonstrated a preference for the picture that was consistent with how she should act given that belief. However, the two groups demonstrated this preference at different points in time: the TD children looked reliably longer at the belief-consistent picture during the first half of the trial, whereas the children with ASD did so during the second half of the trial. This pattern of results is consistent with the speculation, outlined in the introduction, that children with ASD might exhibit slower responses to social scenes than TD children.
We next directly tested this possibility by examining whether the two groups differed reliably from one another during each analysis window. To do so, we calculated difference scores for each of the 4-s windows that reflected children's preference for the belief-consistent picture: we subtracted their looking time to the belief-inconsistent picture (false-belief condition: current-location picture; true-belief condition: original-location picture) from their looking time to the belief-consistent picture (false-belief condition: original-location picture; true-belief condition: current-location picture). Children's difference scores were analyzed with a mixed model ANOVA with group (TD, ASD) as a between-subjects factor and window (first, second) as a within-subject factor. This analysis revealed a significant interaction of group and window, F(1, 38) = 9.86, p = 0.003, ηp² = 0.21. No other effects were significant, both Fs < 1. During the first 4 s of the test trial, TD children demonstrated a significantly larger preference for the belief-consistent image (M = 1.36, SD = 1.41) than did children with ASD (M = −0.17, SD = 2.05), t(38) = 2.76, p = 0.004. This pattern reversed during the second 4 s of the test trial, where children with ASD demonstrated a significantly larger preference for the belief-consistent image (M = 1.47, SD = 1.89) than did the TD children (M = −0.08, SD = 2.30), t(38) = 2.33, p = 0.013. These results suggest that, as predicted, children with ASD demonstrated their understanding of the agent's belief later in the trial than did TD children.

FIGURE 3 | Mean looking time (in seconds) of children in the ASD group to the original-location and current-location pictures in the test trial, separately by condition and window. Error bars represent standard errors and asterisks indicate a significant difference in looking time to the two pictures within a condition (p < 0.05).

Frontiers in Communication | www.frontiersin.org July 2021 | Volume 6 | Article 669985
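The difference-score logic described above can be illustrated from the summary statistics alone. The sketch below is not the original analysis, which was run on individual children's data. It assumes 10 children per condition and 20 per group, as implied by the reported degrees of freedom, and all variable and function names are hypothetical. It first averages the belief-consistent minus belief-inconsistent differences across the two conditions, then computes an equal-n independent-samples t statistic from the reported means and standard deviations.

```python
import math

# Mean looking times in seconds, copied from the text, stored as
# (belief-consistent picture, belief-inconsistent picture).
CELL_MEANS = {
    ("TD", "first"): {"false-belief": (2.66, 1.14), "true-belief": (2.59, 1.38)},
    ("TD", "second"): {"false-belief": (1.48, 2.12), "true-belief": (1.92, 1.44)},
    ("ASD", "first"): {"false-belief": (1.81, 1.78), "true-belief": (1.59, 1.96)},
    ("ASD", "second"): {"false-belief": (2.12, 0.99), "true-belief": (2.78, 0.97)},
}


def mean_difference_score(cells):
    """Average the consistent-minus-inconsistent difference across conditions.

    Assumes equal numbers of children per condition, so the group mean is the
    unweighted mean of the two condition-level differences.
    """
    diffs = [consistent - inconsistent for consistent, inconsistent in cells.values()]
    return sum(diffs) / len(diffs)


def equal_n_t(mean_a, sd_a, mean_b, sd_b, n_per_group):
    """Independent-samples t statistic for two groups of equal size."""
    return (mean_a - mean_b) / math.sqrt((sd_a ** 2 + sd_b ** 2) / n_per_group)


for (group, window), cells in CELL_MEANS.items():
    print(group, window, f"{mean_difference_score(cells):.3f}")

# Group comparisons from the reported difference-score means and SDs,
# assuming n = 20 per group.
t_first = equal_n_t(1.36, 1.41, -0.17, 2.05, 20)   # TD vs. ASD, first window
t_second = equal_n_t(1.47, 1.89, -0.08, 2.30, 20)  # ASD vs. TD, second window
print(f"first window t = {t_first:.2f}, second window t = {t_second:.2f}")
```

Because the inputs are rounded summary values, the output can differ from the reported statistics in the second decimal place.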

DISCUSSION
The present study investigated whether children with ASD could demonstrate false-belief understanding in a preferential-looking task. When Emily did not see Sarah move her apple, both TD children and children with ASD attributed to her a false belief that her apple was still in the original container. When they heard the final story line, "Emily is looking for her apple", both groups of children looked longer at the original-location picture, which showed Emily searching in accordance with her false belief. In contrast, when Emily saw Sarah move her apple, both groups of children looked longer at the picture where she searched for the apple in its current location. Children with ASD, however, were slower to look at the belief-consistent picture in both conditions, thereby demonstrating a delayed processing style that became apparent only when we compared group responses within a longer, 8-s response window.
These results provide the first evidence that children with ASD are capable of attributing false beliefs to others in a spontaneous-response task. Our results suggest that children with ASD can understand others' false beliefs when cognitive demands are sufficiently reduced, but their false-belief processing unfolds more slowly than in TD children. Our findings are thus broadly consistent with recent evidence demonstrating that task demands impact false-belief performance (Rubio-Fernández and Geurts, 2013;Chevallier et al., 2014;Setoh et al., 2016;Carlsson et al., 2018;Bialecka-Pikul et al., 2019;Psouni et al., 2019;Salter and Breheny, 2019;Scott et al., 2020). More generally, this study offers a promising new paradigm for exploring social cognitive processing in children with ASD. We do, however, acknowledge that our results can only be generalized to children with ASD whose receptive language skills allow them to comprehend our story and whose symptoms do not interfere with their ability to sustain their attention to both pages of a storybook while the experimenter recites a 45-s story. Recall that six participants with ASD were not able to complete the task, suggesting that it may be more suitable for higher-functioning children with ASD in terms of receptive language and visual attention.
Our findings have implications for theories of ToM processing in children with ASD. As we noted in the introduction, the fact that children with ASD typically fail traditional elicited-response tasks could be interpreted in two ways: this failure could indicate that their ability to represent false beliefs is fundamentally impaired, or it could reflect difficulty coping with the executive function, linguistic, pragmatic, and social-interaction demands of the task. Recent evidence that children and adults with ASD fail anticipatory-looking false-belief tasks, which are completely nonverbal and do not require interpreting or responding to a request from an experimenter, has been taken to support the former possibility (Senju et al., 2009;Senju et al., 2010;Schneider et al., 2013;Schuwerk et al., 2016b). Specifically, these findings have been interpreted as indicating an impairment in spontaneous or 'implicit' mentalizing that persists throughout the lifespan and is present even in individuals with ASD who eventually pass elicited-response false-belief tasks (Senju et al., 2009;Schuwerk et al., 2016a). Such individuals are assumed to pass elicited-response tasks via alternative, compensatory strategies that rely heavily on verbal abilities.
Our findings are at odds with these claims. We deliberately recruited children with ASD with preschool-level verbal abilities. The average VMA of our ASD group was comparable to the chronological age at which TD children begin to pass elicited-response tasks (e.g., Wellman et al., 2001), and substantially younger than the VMA at which individuals with ASD have been found to pass elicited-response tasks (Happé, 1995). Moreover, within the ASD group, there was no correlation between children's VMA and their preference for the belief-consistent image in either the first test window, r = 0.18, p = 0.44, or the second test window, r = 0.21, p = 0.37. It therefore seems unlikely that the children with ASD succeeded in this task using a compensatory strategy that relied on advanced verbal abilities. Instead, it seems more plausible that they successfully attributed a false belief to the agent and used it to interpret her actions in the test trial, albeit somewhat more slowly than TD children.
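To make the size of these correlations concrete, a Pearson correlation can be converted to a t statistic with df = n − 2 and compared against the conventional two-tailed critical value. The sketch below is illustrative only: it assumes that all 20 children in the ASD group contributed to each correlation, and the function and constant names are hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N_ASD = 20          # assumption: every child in the ASD group is included
T_CRITICAL = 2.101  # two-tailed critical t for df = 18 at alpha = 0.05

for label, r in {"first window": 0.18, "second window": 0.21}.items():
    t_value = correlation_t(r, N_ASD)
    print(f"{label}: t(18) = {t_value:.2f}, exceeds critical value: {abs(t_value) > T_CRITICAL}")
```

Under this assumption, both t statistics fall well below the critical value, in line with the nonsignificant p values reported above.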
The present results thus suggest that children with ASD are capable of 'spontaneous mentalizing' that does not rely on advanced verbal abilities. Our findings also suggest a potential explanation for previous evidence that children with ASD fail anticipatory-looking false-belief tasks (e.g., Senju et al., 2010;Schuwerk et al., 2016b;Burnside et al., 2017). Specifically, we found that children with ASD looked longer at the belief-consistent image during the second half of the test trial, but they looked equally at the two images during the first half of the test trial. Prior studies that have measured anticipatory looking in participants with ASD have done so during short response windows after an anticipatory prompt, with the longest window being 5 s (Senju et al., 2010). Our results suggest that these short response windows do not give children with ASD sufficient time to retrieve the agent's false belief and use it to process her belief-based actions. This raises the possibility that children with ASD might succeed in anticipatory-looking tasks if anticipatory responses were examined in later response windows.
However, the discrepancy between our results and findings from anticipatory-looking tasks could reflect more than just the time windows analyzed: the nature of the spontaneous response might also matter. Although a considerable number of studies using anticipatory-looking tasks have produced positive findings with TD children and neurotypical adults (for reviews see Scott & Baillargeon, 2017;Scott et al., in press), there have also been several recent failures to replicate these findings (e.g., Burnside et al., 2018a;Dörrenberg et al., 2018;Kulke et al., 2018;Schuwerk et al., 2018). We know of no such failures to replicate the results from preferential-looking tasks (for successful replications, see Barrett et al., 2013;Roby & Scott, 2018). These negative results have led some to question whether anticipatory-looking tasks truly assess false-belief understanding.
We think it more likely that the mixed pattern of findings obtained with anticipatory-looking tasks indicates that the rapid, anticipatory responses measured in these tasks are more sensitive to variation in properties of the task (Baillargeon et al., 2018) as well as properties of the participants themselves (Roby & Scott, 2018;Scott et al., in press). In particular, some evidence suggests that predictive or anticipatory responses are correlated with individual differences in social experience and social motivation (e.g., Ferguson et al., 2015a;Ferguson., 2015b;Burnside et al., 2018b;Roby & Scott, 2018). For instance, belief-based anticipation is correlated with children's preference for social over non-social stimuli (Burnside et al., 2018a) and with adults' empathy scores (Ferguson et al., 2015a). Thus, even amongst TD individuals, those who are more socially motivated and engaged more readily anticipate others' belief-based behavior. These findings suggest that the mixed pattern of findings with anticipatory-looking tasks could stem in part from meaningful individual variation in participants' tendency to anticipate others' actions.
More critically for the present research, these findings also suggest a potential explanation for why individuals with ASD fail anticipatory-looking tasks. Chevallier et al. (2012) have proposed that ASD stems from early-emerging impairments in social motivation. These impairments include a reduced tendency to orient and maintain attention to social stimuli, which predicts TD children's performance in anticipatory-looking tasks (Burnside et al., 2018b). Thus, individuals with ASD might fail anticipatory-looking tasks due to deficits in social motivation, rather than an inability to spontaneously reason about an agent's false belief.
There is some evidence to suggest that preferential-looking tasks are less sensitive to differences in social experience and motivation than anticipatory-looking tasks. For instance, Roby and Scott (2018) tested TD 2.5-year-olds in a spontaneous-response task that involved both an anticipatory-looking trial and a preferential-looking trial. Children's performance on the anticipatory-looking trial was positively correlated with whether they had an older sibling and with their parents' use of cognition talk (i.e., sentences containing terms such as think and know), but their performance on the preferential-looking trial was not. Together with the present positive findings, these results suggest that future work on spontaneous ToM in children with ASD might benefit from using preferential-looking paradigms like the one introduced here rather than anticipatory-looking tasks, which might be poorly suited for this population.
By examining eye fixations in longer time intervals, our results lend support to the notion that ToM reasoning operates more slowly in people with ASD than in TD participants (e.g., Zwickel et al., 2011;Schuwerk et al., 2015). Given that ToM skills contribute to our ability to comprehend figurative language, we are pleased that there is a renewed interest in examining figurative language comprehension in children with ASD with experimental paradigms that pose decreased pragmatic and linguistic demands compared to traditional paradigms (Koder and Lakum, 2020). Our findings suggest that, in tasks with reduced demands, people with ASD can correctly recognize a speaker's belief in conversations containing nonliteral speech acts, including verbal irony.
Recall from our Introduction that Pexman et al. (2011) reported that children with high-functioning ASD produced eye gaze latencies indicating that they could process verbal irony faster than their age-matched controls. We would like to highlight that the methods used by Pexman and colleagues are quite different from those of the present research, because children were asked a direct question about the speaker and eye gaze duration measurements began when the experimenter started asking the question. We suspect that children with ASD were successful because the delay between the end of the speaker's statement and the beginning of the experimenter's question allowed them to process the speaker's intention. Our findings suggest that had Pexman and colleagues measured children's eye gaze in real time as the speaker was criticizing, children with ASD may have demonstrated slower latencies than the TD controls. This could potentially have ramifications in everyday conversations, where nonliteral interpretations must be computed rapidly online. We suggest that additional research measuring eye fixations with spontaneous-response tasks in children with ASD is required to uncover a deeper understanding of figurative language processing skills in this population.
Our findings also have implications for clinical interventions for children with ASD. Specifically, they point to the need to develop social skill training interventions with reduced linguistic demands, with the goal of carefully monitoring children's comprehension. For example, Persicke et al. (2013) have devised a training package to teach children with ASD how to detect and respond to verbal irony. Children with ASD were presented with rules, videos, and in vivo training sessions where verbal irony was explicitly labeled, with experimenters highlighting the counterfactual nature of the comment, the speaker's exaggerated intonation, and the speaker's body language. After training, comprehension of ironic remarks was inferred when the child demonstrated a congruent social response such as smiling, laughing, or commenting on the ironic nature of the remarks. We suggest that this type of training program might benefit from measuring children's eye fixations to monitor for comprehension in a way that does not require an elicited response. This may allow for a finer-grained method of assessing whether a child with ASD understands an ironic speaker's belief and intent.
In conclusion, the present study offers an alternative method of assessing ToM reasoning in children with ASD. We suggest that future research on figurative language processing in children with ASD may similarly uncover new insights into processing differences by measuring eye fixations in longer time intervals. This research has the potential to inform the development of more effective social skill intervention programs for children with ASD.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the University of Manitoba Psychology/Sociology Research Ethics Board. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
MG and RMS wrote the manuscript with editorial feedback from EB and MP. MP and EB recruited participants, collected data, and coded data for this project. AH-D helped with recruiting children with ASD to participate. RMS and MP analyzed and interpreted the data.