## TEMPORAL COGNITION: ITS DEVELOPMENT, NEUROCOGNITIVE BASIS, RELATIONSHIPS TO OTHER COGNITIVE DOMAINS, AND UNIQUELY HUMAN ASPECTS

EDITED BY: Patricia J. Brooks, Danielle DeNigris and Laraine McDonough

PUBLISHED IN: Frontiers in Psychology

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88963-151-3 DOI 10.3389/978-2-88963-151-3

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

Frontiers in Psychology 1 October 2019 | Temporal Cognition

## TEMPORAL COGNITION: ITS DEVELOPMENT, NEUROCOGNITIVE BASIS, RELATIONSHIPS TO OTHER COGNITIVE DOMAINS, AND UNIQUELY HUMAN ASPECTS

Topic Editors:

Patricia J. Brooks, College of Staten Island and the Graduate Center, CUNY, United States

Danielle DeNigris, Fairleigh Dickinson University, United States

Laraine McDonough, Brooklyn College, CUNY, United States

Image: geralt (pixabay.com)

Humans manifest an acute awareness of the passage of time and capacity for mental time travel, i.e., the ability to mentally place oneself in the past or future, as well as in counterfactual or hypothetical situations. The ability to perceive, estimate, and keep track of time involves multiple forms of representation (temporal concepts and frames of reference) and sensory modalities. Temporal cognition plays a critical role in various forms of memory (e.g., autobiographical memory, episodic memory, prospective memory), future-oriented thinking (foresight, planning), self-concepts, and autonoetic consciousness. This Research Topic addresses the myriad ways that temporal cognition impacts human behavior, how it develops, its clinical relevance, and the extent to which aspects of temporal cognition are uniquely human.

Papers in this Research Topic focus on the following:

1) Low-level perceptual mechanisms that track durations, intervals, and other temporal features of stimuli.

2) Inter-relatedness of temporal reasoning and language development.

3) Temporal cognition in children with autism.

4) Cross-domain mappings between space and time across visual and auditory modalities.

5) Assessing mental time travel as a uniquely human capacity.

6) Implications of individual differences in temporal processing for health and well-being.

Citation: Brooks, P. J., DeNigris, D., McDonough, L., eds. (2019). Temporal Cognition: Its Development, Neurocognitive Basis, Relationships to Other Cognitive Domains, and Uniquely Human Aspects. Lausanne: Frontiers Media. doi: 10.3389/978-2-88963-151-3

# Table of Contents

### CHAPTER 1

*Editorial: Temporal Cognition: Its Development, Neurocognitive Basis, Relationships to Other Cognitive Domains, and Uniquely Human Aspects*

Patricia J. Brooks and Danielle DeNigris

### CHAPTER 2

*Robust Temporal Averaging of Time Intervals Between Action and Sensation*

Huanke Zeng and Lihan Chen

### CHAPTER 3

*Spatial and Spectral Auditory Temporal-Order Judgment (TOJ) Tasks in Elderly People are Performed Using Different Perceptual Strategies*

Elzbieta Szelag, Katarzyna Jablonska, Magdalena Piotrowska, Aneta Szymaszek and Hanna Bednarek

### CHAPTER 4

*The Development of Temporal Concepts: Linguistic Factors and Cognitive Processes*

Meng Zhang and Judith A. Hudson

### CHAPTER 5

*Positive Effect of Visual Cuing in Episodic Memory and Episodic Future Thinking in Adolescents With Autism Spectrum Disorder*

Marine Anger, Prany Wantzen, Justine Le Vaillant, Joëlle Malvy, Laetitia Bon, Fabian Guénolé, Edgar Moussaoui, Catherine Barthelemy, Frédérique Bonnet-Brilhault, Francis Eustache, Jean-Marc Baleyte and Bérengère Guillery-Girard

### CHAPTER 6

*Temporarily out of Order: Temporal Perspective Taking in Language in Children With Autism Spectrum Disorder*

Jessica Overweg, Catharina A. Hartman and Petra Hendriks

### CHAPTER 7

*Time is not More Abstract Than Space in Sound*

Alexander Kranjec, Matthew Lehet, Adam J. Woods and Anjan Chatterjee

### CHAPTER 8

*Interrelations Between Temporal and Spatial Cognition: The Role of Modality-Specific Processing*

Jonna Loeffler, Rouwen Cañal-Bruland, Anna Schroeger, J. Walter Tolentino-Castro and Markus Raab

### CHAPTER 9

*Conversational Time Travel: Evidence of a Retrospective Bias in Real Life Conversations*

Burcu Demiray, Matthias R. Mehl and Mike Martin

### CHAPTER 10

*Detecting Temporal Cognition in Text: Comparison of Judgements by Self, Expert and Machine*

Erin I. Walsh and Janie Busby Grant

### CHAPTER 11

*Adults' Performance in an Episodic-Like Memory Task: The Role of Experience*

Gema Martin-Ordas and Cristina M. Atance

### CHAPTER 12

*Prognostic Value of Motor Timing in Treatment Outcome in Patients With Alcohol- and/or Cocaine Use Disorder in a Rehabilitation Program*

Susanne Yvette Young, Martin Kidd, Jacques J. M. van Hoof and Soraya Seedat

### CHAPTER 13

*The Functions of Prospection – Variations in Health and Disease*

Adam Bulley and Muireann Irish

# Editorial: Temporal Cognition: Its Development, Neurocognitive Basis, Relationships to Other Cognitive Domains, and Uniquely Human Aspects

Patricia J. Brooks <sup>1</sup> \* and Danielle DeNigris <sup>2</sup>

*<sup>1</sup> College of Staten Island and the Graduate Center, CUNY, New York, NY, United States, <sup>2</sup> Department of Psychology & Counseling, Fairleigh Dickinson University, Madison, NJ, United States*

Keywords: temporal cognition, temporal perception, temporal reasoning, autism spectrum disorder, spatial-temporal relations, mental time travel, motor timing, prospection

**Editorial on the Research Topic**

#### **Temporal Cognition: Its Development, Neurocognitive Basis, Relationships to Other Cognitive Domains, and Uniquely Human Aspects**

Human lives are organized around time. As a species, we manifest an acute interest in its passage as exemplified by the clocks, calendars, and other instruments used to mark time with precision. From early childhood, we acquire linguistic and other mental capacities to simulate travel from the ever-changing present into the past or future. Our abilities to perceive, estimate, and keep track of time, collectively described as temporal cognition, rely on multiple forms of representation. Temporal cognition underlies the development of episodic and autobiographical memory, foresight, and planning, and forms the basis for building a stable self-concept.

Studies of temporal cognition often distinguish lower-level perceptual mechanisms from higher-order capacities reliant on language and other symbolic media (Núñez and Cooperrider, 2013). Hoerl and McCormack (2018) offer a dual-systems approach, differentiating temporal updating mechanisms for tracking duration, elapsed time, and sequential order of events from temporal reasoning abilities. Temporal reasoning uses explicit formats to mark specific times/positions of events and mental simulation to imagine alternate realities. Like other forms of reasoning, it often relies on heuristics and is subject to bias.

For this Research Topic, we invited contributors to address the myriad ways temporal cognition impacts human behavior and psychological functioning, its development over the lifespan, and its uniquely human aspects. The first two papers aimed to characterize low-level perceptual mechanisms that track durations, intervals, and other temporal features of stimuli. Zeng and Chen examined perception of the time interval between an action and its sensory feedback, and demonstrated the robustness of our ability to average interval durations across these two modalities. Such temporal judgments may play a key role in the perception-action feedback loops that underpin coordinated behavior. Szelag et al. explored temporal resolution and sequencing abilities of healthy elderly adults, estimating separately their thresholds for perceiving temporal order of auditory stimuli varying in location (right ear-left ear vs. left ear-right ear) or spectral characteristics (high-low vs. low-high). The distinct response distributions and learning trajectories observed across the two tasks suggest that strategic processing influences low-level temporal perception.

Edited and reviewed by: *Bernhard Hommel, Leiden University, Netherlands*

\*Correspondence: *Patricia J. Brooks patricia.brooks@csi.cuny.edu*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *25 June 2019* Accepted: *29 July 2019* Published: *13 August 2019*

#### Citation:

*Brooks PJ and DeNigris D (2019) Editorial: Temporal Cognition: Its Development, Neurocognitive Basis, Relationships to Other Cognitive Domains, and Uniquely Human Aspects. Front. Psychol. 10:1865. doi: 10.3389/fpsyg.2019.01865*

Shifting to higher-level temporal cognition, Zhang and Hudson examined the interrelatedness of temporal reasoning and language development, asking whether language is necessary for the formation of temporal concepts and not just for their expression. The next two papers focused on children with autism, a population that exhibits deficits in temporal cognition (Boucher et al., 2007; Lind and Bowler, 2010). Anger et al. found beneficial effects of visual cues in eliciting past and future autobiographical details from autistic adolescents, who produced markedly fewer details than neurotypical controls when assessed via free recall. Overweg et al. compared autistic and neurotypical children's comprehension of the temporal conjunctions *before* and *after*. Autistic children performed worse than controls, with variance explained by receptive vocabulary, nonverbal abilities, and performance on a theory of mind task in which they made inferences about a person's beliefs about another person. The authors concluded that weak perspective-taking skills may partly account for children's difficulties in comprehending temporal expressions.

Next, we explore cross-domain mappings between space and time, as evident in the use of spatial terms to represent temporal concepts (e.g., the past is behind, the future is ahead; an earlier event is left of a later event). Observations that people use spatial terms to talk about time more often than temporal terms to talk about space have been taken as support for Conceptual Metaphor Theory: the view that people rely on concrete, highly structured experiences as a source for metaphorically representing more abstract experiences, e.g., representing time as money, a valuable commodity and limited resource (Lakoff and Johnson, 1980).

Two papers in this Research Topic challenge the assumption that the mapping across spatial and temporal domains is inherently asymmetric. Kranjec et al. used a cross-domain contamination paradigm to compare the extent to which temporal information influences spatial judgments and vice versa. The authors found bi-directional effects that varied with task modality, and concluded that visual-spatial and auditory-temporal associations are privileged relative to other mappings. Similarly, in their review of 16 empirical studies of spatial-temporal relations, Loeffler et al. found that studies supporting the asymmetric hypothesis tended to use visual tasks in both spatial and temporal domains, whereas studies supporting the symmetric hypothesis used auditory tasks for temporal representations but visual tasks for spatial representations. Modality effects are further corroborated by studies of lower-level statistical learning of probabilistic sequences, in which participants exhibit superior learning of temporal order when stimuli are presented in the auditory rather than the visual or tactile modality (Conway and Christiansen, 2005).

Three papers discuss methodological issues associated with mental time travel. Demiray et al. examined the temporal orientation of mental time travel assessed via electronically activated recordings (EARs) of snippets of naturally occurring speech. In contrast to signal-contingent experience sampling, where people respond to randomly timed signals, the EARs were collected unobtrusively. Participants showed a retrospective bias in conversational time travel, talking about their personal past more than twice as often as their personal future. Walsh and Busby Grant address coding challenges associated with experience sampling methods where participants' momentary thoughts are collected via text prompts. Human coders were more accurate than automated text coding algorithms in judging the temporal orientation of the recorded experiences. Accuracy was low (<80%) across conditions, indicating difficulties associated with coding ambiguous text for temporal perspective. The authors stress the importance of collecting temporal information from participants while sampling their experiences.

The claim that mental time travel is a uniquely human capacity (Suddendorf and Corballis, 2007) has led to innovative research on the capacities of non-human primates and avians to plan for the future (Bourjade et al., 2012; Clayton, 2015). Martin-Ordas and Atance tested adult humans on a decision-making task adapted from animal research, where participants had to choose which of two foods they would want in the future when one (a popsicle) would no longer be edible. Despite knowledge that popsicles melt, adults performed poorly in making future judgments, underscoring how difficult it is to envision how one will feel in the future and the biasing impact of the present (Gilbert and Wilson, 2007).

The final papers focus on implications of individual differences in temporal processing for health and well-being. Young et al. found motor timing deficits to be predictive of self-perceived efficacy to abstain from substance use among individuals in treatment for alcohol and/or cocaine use. Bulley and Irish review the role of prospective cognition in goal-directed behavior and decision-making, and highlight clinically relevant changes in prospection associated with psychiatric disorders including dementia, depression, anxiety, and addiction.

Understanding how humans represent lived and imagined experience in infinite variation requires a grasp of how the mind tracks change over time. As the variety of contributions to this Research Topic indicates, temporal cognition is multifaceted in its expression over the lifespan. As a field of inquiry, temporal cognition benefits from recent efforts to develop integrative theoretical frameworks relating higher- and lower-level processing mechanisms. Much remains to be understood about how outputs of temporal perceptual processes are redescribed into more explicit formats to support everyday judgment and decision-making.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

## REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Brooks and DeNigris. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Robust Temporal Averaging of Time Intervals Between Action and Sensation

#### Huanke Zeng and Lihan Chen\*

School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China

Perception of the time interval between one's own action (a finger tap) and the associated sensory feedback (a visual flash or an auditory beep) is critical for precise and flexible control of action and for behavioral decisions. Previous studies have examined temporal averaging of multiple time intervals and its role in perceptual organization and crossmodal integration. In the present study, we extended temporal averaging from sensory stimuli to the coupling of an action and its sensory feedback. We investigated whether and how temporal averaging could be achieved over the multiple intervals in a sequence of action-sensory feedback events, and hence affect subsequent timing behavior. In the unimodal task, participants voluntarily tapped their index finger at a constant pace while receiving auditory feedback (beeps) at intervals that varied in duration and variance throughout the sequence. In the crossmodal task, each tap in a given sequence was randomly accompanied by either a visual flash or an auditory beep as sensory feedback. When the sequence was over, observers produced a subsequent tap with either an auditory or a visual stimulus, which enclosed a probe interval. In both tasks, participants made a two-alternative forced choice (2AFC) indicating whether the probe interval was shorter or longer than the mean interval between taps and their associated sensory events in the preceding sequence. In both scenarios, participants' judgments of the probe interval suggested that they had internalized the mean interval associated with specific bindings of action and sensation, showing a robust temporal averaging process for the interval between action and sensation.

#### Keywords: temporal averaging, action, auditory, visual, interval

### INTRODUCTION

Perception of the interval between one's action and its sensory feedback (such as a visual flash or an auditory beep), i.e., sensorimotor timing, is critical for daily perception, behavioral decisions, and everyday life in general (Repp, 2005). Two prominent examples of sensorimotor timing are sensorimotor synchronization (Aschersleben and Bertelson, 2003; Repp, 2005, 2006a,b) and the temporal recalibration effect (TRE) (Stekelenburg et al., 2011; Sugano et al., 2012, 2014, 2016, 2017). In sensorimotor synchronization, observers produce tapping movements in synchrony with a sequence of isochronously (and continuously) repeated pacing signals, either light flashes or auditory beeps (Aschersleben and Bertelson, 2003). A typical finding in sensorimotor synchronization is that the timing of the taps is biased significantly more toward auditory signals than toward visual flashes, whereas when the taps are synchronized with continuous visual or auditory stimuli, the perceptual system shows a preference for continuous information with visual stimuli (Varlet et al., 2012; Armstrong and Issartel, 2014). TRE, on the other hand, reflects the perceived "causality" between an action and its sensory feedback and is a temporal adaptation aftereffect. In a seminal study, Stetson et al. (2006) inserted a temporal delay between participants' own actions (key presses) and the associated sensory feedback (visual flashes). After a period of adaptation, when the delay was removed and the flashes unexpectedly appeared immediately after the key presses, they were often perceived as occurring before the key presses (Stetson et al., 2006), demonstrating a recalibration effect for motor-sensory temporal order judgments.

#### Edited by:

Patricia J. Brooks, College of Staten Island, United States

#### Reviewed by:

Kielan Yarrow, City, University of London, United Kingdom

Antonella Conte, Sapienza University of Rome, Italy

\*Correspondence: Lihan Chen CLH@pku.edu.cn; clh20000@gmail.com

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 27 August 2018 Accepted: 20 February 2019 Published: 19 March 2019

#### Citation:

Zeng H and Chen L (2019) Robust Temporal Averaging of Time Intervals Between Action and Sensation. Front. Psychol. 10:511. doi: 10.3389/fpsyg.2019.00511

In a typical sensorimotor synchronization task, observers tap along with pacing signals that have a regular rhythm. However, the pacing rhythm is often irregular, in which case observers must compute the "mean" rhythm as a temporal reference for the subsequent prompted decision and action, whether in temporal estimation or (re)production tasks. The ability to extract the average time interval in an action-sensory feedback sequence reflects individual timing sensitivity (the "temporal window" for sensory integration) and helps us adapt to environmental changes (Repp, 2005). The computation of the "mean," i.e., temporal averaging, has been demonstrated in a number of contexts, including crossmodal interaction in recent studies (Cheng et al., 1996; Matell and Henning, 2013; Schweickert et al., 2014; De Corte and Matell, 2016a; Chen et al., 2018). One compelling example of temporal averaging is the central tendency effect, within the broader framework of Bayesian optimization. In the central tendency effect, observers incorporate the mean of the statistical distribution of a sensory property, so that their estimates are assimilated to (biased toward) that mean (Jazayeri and Shadlen, 2010; Burr et al., 2013; Shi et al., 2013; Karaminis et al., 2016; Roach et al., 2017). For example, discrimination of a target sensory interval was biased toward the preceding time interval from a different modality (Burr et al., 2013), and discrimination of visual apparent motion was modulated by the perceived mean interval in the preceding auditory sequence (Chen et al., 2018; Wan and Chen, 2018).
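The central tendency effect is often modeled as Bayesian estimation. A noisy measurement of an interval is combined with a prior centered on the mean of previously experienced intervals. The minimal sketch below assumes a Gaussian prior and Gaussian measurement noise. This is a textbook simplification, not the specific model used in any of the studies cited above. Under these assumptions, the estimate is a precision-weighted average of the measurement and the prior mean. The function name and all parameter values are illustrative.

```python
def bayesian_interval_estimate(measured_ms, prior_mean_ms, prior_sd_ms, noise_sd_ms):
    """Posterior mean of an interval under a Gaussian prior and Gaussian noise.

    Illustrative sketch only: the Gaussian prior/noise assumption and all
    parameter values are hypothetical, chosen to show the central tendency bias.
    """
    prior_var = prior_sd_ms ** 2
    noise_var = noise_sd_ms ** 2
    # Weight on the measurement grows as the prior broadens or the noise shrinks.
    w = prior_var / (prior_var + noise_var)
    return w * measured_ms + (1 - w) * prior_mean_ms


# With equal prior and noise SDs (w = 0.5), estimates move halfway toward the
# 800-ms prior mean: long intervals are underestimated, short ones overestimated.
for measured in (600, 800, 1000):
    print(measured, bayesian_interval_estimate(measured, 800, 100, 100))
# 600 700.0
# 800 800.0
# 1000 900.0
```

The Gaussian-Gaussian case is used because it has a closed-form posterior mean. Models with other priors or scalar (Weber-like) noise require numerical integration but show the same qualitative pull toward the mean.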

The perception of the time interval between an action and its sensory feedback, in which perceived time is biased toward concurrent actions, differs from the perception of time intervals between purely sensory events. A recent study showed that motor timing during rhythmic tapping influences visual timing. Tomassini et al. (2018) asked participants to tap their finger at the same rhythm as a preceding sequence of four auditory tones. During finger tapping, participants were presented with an empty visual interval and judged its duration relative to a previously established (internalized) interval of 150 ms. Perceived time was maximally expanded halfway between two consecutive finger taps; that is, the maximal expansion was anchored to the center of the inter-tap interval. This distortion in time perception indicates that a timing mechanism exists to keep perception and action accurately synchronized (Tomassini et al., 2018). In another seminal study, Yon et al. (2017) investigated the influence of movement duration on the perceived duration of an auditory tone. Judgments of tone duration were attracted toward the duration of the executed movement: tones were perceived to last longer when participants executed a longer movement (Yon et al., 2017).

Empirical work on temporal averaging has examined the distribution of irregular (unequal) time intervals (De Corte and Matell, 2016a; Chen et al., 2018; Wan and Chen, 2018), selective averaging of one of several sequences (Overduin et al., 2008), and potential capacity limits of simultaneous temporal processing (Cheng et al., 2014). Schweickert et al. (2014) demonstrated that observers could estimate the average of tone durations and that their performance was influenced by the distribution of the tone durations. In general, the estimated averages were a linear function of the stimulus means. The estimates were accurate for the smallest population mean but underestimated the larger means, and human observers subjectively shortened the durations in memory (Schweickert et al., 2014). With multiple intervals, human observers could encode two different, distinct standard durations. In this case, temporal generalization with respect to either of the two standards depended on the memory load imposed by the temporal references and on their variances (Jones and Wearden, 2004). Moreover, when two standards were presented consecutively (A and B, each presented three times, with B 100 ms longer than A), certain combinations of delay and interference could render the memory of A unusable, and a new standard (a "false memory") was constructed on the basis of the remembered relationship between A and B (Ogden et al., 2008). Therefore, the internal representation of temporal statistics depends on the distribution of time intervals and their variances, and is affected by a potential memory-mixing effect (due to time delays as well as interference among the many intervals being encoded).

In the current study, we examined the mechanisms of temporal averaging of the time intervals between an action and its sensory feedback (visual flash or auditory beep). First, we investigated how the mean and irregularity (variance) of the distribution of time intervals affect perception of a target interval in the action-sensory feedback loop. Second, we examined how human observers can selectively average the sensory-specific time intervals in two sequences in which actions were bound to either visual flashes or auditory beeps (Chen and Vroomen, 2013). Lastly, we examined the potential memory-mixing effect induced by memory load (and decay) and by the inherent individual capacity limit of simultaneous temporal processing.

We conducted four experiments to address these issues. In Experiment 1, we examined the ability to extract the mean interval from a sound sequence and replicated the central tendency effect. In Experiments 2 and 3, we studied selective temporal averaging in which actions were bound to two types of events: beeps of two different pitches, or two types of sensory stimuli (visual flashes and auditory beeps). In Experiment 2, we investigated whether observers could selectively separate the different mean action-auditory feedback intervals, and hence compare the produced interval with the preceding duration-specific mean auditory intervals. To examine whether temporal averaging depends on the individual modalities (events), in Experiment 3 we used both auditory beeps and visual flashes as sensory feedback and examined the selectivity of temporal assimilation to either the short or the long mean interval (actions associated with visual or auditory feedback). Through averaging, human observers could take into account both the mean interval and the variance of the intervals (Acerbi et al., 2012). In Experiment 4, we further examined whether the variability of the intervals (manipulated via the coefficient of variation, CV) affects the averaging of temporal information. The results from the four experiments largely support a robust temporal averaging process for time intervals between actions and their associated sensations. We further verified that observers averaged the intervals rather than sampling individual intervals (including the last interval of the action-sensation loops), and discussed the limited role of memory load in the averaging process with the current paradigms.
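Selective averaging, as tested in Experiments 2 and 3, can be pictured as maintaining a separate mean for each feedback type within an interleaved sequence. The sketch below is a hypothetical illustration, not the authors' analysis code. The event format, the function name, and the interval values are invented for demonstration. The sketch also returns the last interval of each type, which corresponds to the alternative "sampling" strategy that the authors checked against averaging.

```python
from collections import defaultdict
from statistics import fmean


def summarize_by_feedback(events):
    """Group (feedback_type, interval_ms) events and summarize each group.

    Hypothetical illustration: returns, per feedback type, the mean interval
    (an averaging strategy) and the last interval (a sampling strategy).
    """
    groups = defaultdict(list)
    for feedback_type, interval_ms in events:
        groups[feedback_type].append(interval_ms)
    return {
        ftype: {"mean": fmean(intervals), "last": intervals[-1]}
        for ftype, intervals in groups.items()
    }


# Invented interleaved sequence: short intervals for flashes, long ones for beeps.
sequence = [("visual", 380), ("auditory", 820), ("visual", 430),
            ("auditory", 740), ("visual", 390), ("auditory", 810)]
print(summarize_by_feedback(sequence))
# {'visual': {'mean': 400.0, 'last': 390}, 'auditory': {'mean': 790.0, 'last': 810}}
```

An averaging observer would judge a probe against 400 or 790 ms depending on its feedback type. A last-interval observer would instead use 390 or 810 ms. The two strategies diverge most when the final interval of a sequence lies far from the sequence mean.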

### MATERIALS AND METHODS

### Stimuli and Apparatus

Auditory stimuli in a sound sequence were pure tones (30 ms, 500 Hz or 1000 Hz) presented at 65 dB SPL. Two 2000-Hz pure tones served as cueing signals. The starting cue (500 ms) signaled the beginning of a trial, and the testing cue (200 ms, for the last tap) signaled the upcoming probe interval for discrimination (see the following procedure for more details).

The visual flash was a black disk (30 ms duration, 2.74° in diameter, 11 cd/m<sup>2</sup> luminance) presented at the center of the screen against a gray background (16.8 cd/m<sup>2</sup> luminance) on a 27-inch monitor (ASUS PG278QR, NVIDIA GeForce GTX 1080 Ti graphics card). The viewing distance from the participants to the center of the monitor was 60 cm. Auditory stimuli were delivered through NVIDIA High Definition Audio, and participants wore Sennheiser Momentum 2 headphones to receive the sounds. Responses were collected with an RTBox v6 (Suzhou Litong Company Limited, China). The experimental program was written in MATLAB (MathWorks Inc.) with the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007).
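A stimulus size given in degrees of visual angle can be converted to a physical size with the standard relation size = 2 · d · tan(θ/2), where d is the viewing distance. This may help readers who want to reproduce the display. The sketch below applies the relation to the 2.74° disk at the 60-cm viewing distance stated above. The helper names are hypothetical, and the calculation assumes the disk is centered on the line of sight.

```python
import math


def angle_to_size_cm(angle_deg, distance_cm):
    """Physical extent that subtends angle_deg at distance_cm (centered stimulus)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def size_to_angle_deg(size_cm, distance_cm):
    """Inverse conversion: visual angle subtended by a centered stimulus."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


disk_cm = angle_to_size_cm(2.74, 60)
print(f"disk diameter = {disk_cm:.2f} cm")  # disk diameter = 2.87 cm
```

Converting centimeters to pixels would additionally require the monitor's pixel pitch, which the text above does not report.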

In Experiment 1, only 500 Hz tones were used, and the mean of the eight intervals between taps and tones (sensory feedback) was 800 ms. The eight sequential intervals ranged from 600 to 1000 ms and were drawn from a Gaussian distribution N(800, 100). Using custom code, we composed each trial (sequence) so that the coefficient of variation (CV, i.e., the ratio of the standard deviation to the mean) of its intervals was between 0.1 and 0.15. This largely randomized the temporal information while keeping the task within human observers' perceptual capacity. In Experiment 2, two mean intervals were used. In one condition, the short interval (mean 400 ms) was associated with a low-pitch tone (500 Hz) and the long interval (mean 800 ms) with a high-pitch tone (1000 Hz); in the other condition, the mapping between tone pitch and mean interval was reversed. The short sequential intervals ranged from 200 to 600 ms and were drawn from a Gaussian distribution N(400, 100). The CV of the intervals was between 0.1 and 0.15. Experiment 3 used a configuration similar to that of Experiment 2, except that both auditory and visual feedback were used. In Experiment 4, we designed two types of tap-tone sequences in which the mean tap-tone interval was kept at 800 ms. In one sequence, the taps were followed by 500 Hz tones with a low CV of the intervals (between 0.1 and 0.15); in the other, the taps were followed by high-pitch tones (1000 Hz) with a high CV (between 0.3 and 0.35). The CV ranges were chosen based on previous evidence that human observers can perform the relevant tasks well within these ranges (Chen et al., 2018; Getty, 1975a,b).
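The text says only that custom code composed each sequence so that its intervals fell within a given range and CV window. Rejection sampling is one way to do this. The Python sketch below illustrates it under our own assumptions and is not the authors' code. It treats the second parameter of N(800, 100) as a standard deviation. It shifts each draw of eight intervals so that the sequence mean equals the nominal mean exactly. It then discards any sequence that violates the range or CV constraints. The helper names `cv` and `make_sequence` are hypothetical.

```python
import random
import statistics


def cv(intervals):
    """Coefficient of variation (CV): sample SD divided by the mean."""
    return statistics.stdev(intervals) / statistics.mean(intervals)


def make_sequence(mean_ms, sd_ms, lo_ms, hi_ms, cv_range, n=8,
                  rng=None, max_tries=100_000):
    """Rejection-sample n tap-to-feedback intervals (in ms).

    Assumed scheme: draw n values from N(mean_ms, sd_ms), shift them so the
    sequence mean equals mean_ms exactly, and accept the sequence only if
    every interval lies in [lo_ms, hi_ms] and its CV lies in cv_range.
    """
    rng = rng or random.Random()
    cv_lo, cv_hi = cv_range
    for _ in range(max_tries):
        draws = [rng.gauss(mean_ms, sd_ms) for _ in range(n)]
        offset = mean_ms - statistics.mean(draws)
        seq = [x + offset for x in draws]  # sequence mean is now mean_ms
        in_range = all(lo_ms <= x <= hi_ms for x in seq)
        if in_range and cv_lo <= cv(seq) <= cv_hi:
            return seq
    raise RuntimeError("no sequence met the range and CV constraints")


# Experiment 1 parameters as described in the text.
seq = make_sequence(800, 100, 600, 1000, (0.10, 0.15), rng=random.Random(1))
print([round(x) for x in seq], round(cv(seq), 3))
```

Accepted sequences follow the Gaussian draws conditioned on the constraints. A CV window far from the CV implied by the sampling SD would almost never be satisfied. An example is a 0.3 to 0.35 window with SD = 100 ms at an 800 ms mean. In such a case, the sampling SD in this sketch would have to be raised to match the target CV.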
For all of the above conditions, after the action-sensory feedback sequence, participants pressed a button to generate a probe interval. This interval was 200, 400, 600, 800, 1000, 1200, or 1400 ms for comparison with the preceding long mean interval (800 ms). It was 100, 200, 300, 400, 500, 600, or 700 ms for comparison with the preceding short mean interval (400 ms).

In the formal experiments, the preceding sequence contained two intermixed mean durations. Each was cued by a different pitch or by a different sensory event (visual flash or auditory beep). In this context, people can extract and maintain a separate standard for each duration. The two standards might interact and interfere with each other as memory references. We also tested whether mixing the two sequences (standards) produced perceptual shifts and response biases. To do so, we ran control tests with the same tasks as in the formal experiments and obtained baseline data for the 400 and 800 ms mean interval conditions from other groups of participants.

### Procedure

The experiments were performed in compliance with the institutional guidelines set by the Academic Affairs Committee, School of Psychological and Cognitive Sciences, Peking University. The protocol was approved by the Committee for Protecting Human and Animal Subjects, School of Psychological and Cognitive Sciences, Peking University. All participants gave written informed consent in accordance with the Declaration of Helsinki and were paid for their time at a rate of 40 CNY/hour, i.e., 6.3 United States dollars/hour.

In the preceding action-sensation sequence, participants made voluntary taps that triggered either auditory beeps or visual flashes as sensory feedback. This loop of multiple tap-sensation intervals (mean interval 400 or 800 ms) served as the temporal reference for the subsequent comparison with the target interval (a single action-sensation loop). The target interval was defined by a tap and its associated sensory feedback (visual flash or auditory beep). The target interval was 200, 400, 600, 800, 1000, 1200, or 1400 ms in the long mean duration (800 ms) condition and 100, 200, 300, 400, 500, 600, or 700 ms in the short mean duration (400 ms) condition. A typical trial started with a black fixation cross on the screen. The cross appeared 500 ms before the first cueing tone and lasted until the second cueing tone was over. The first cueing beep (2000 Hz, 500 ms) indicated the start of the action-sensory feedback sequence and prompted participants to issue their taps within 3 s. Each tap was accompanied by either a visual flash or an auditory beep, for eight action-sensation intervals in total (mean 400 or 800 ms). When the last sensory feedback was over, a blank interval of 300 ms followed. Participants then heard a 2000 Hz beep (200 ms) that prompted a final tap generating the target interval, marked by either a visual flash or an auditory beep (**Figure 1**). We used the method of constant stimuli to compare the target interval with the mean action-sensation interval. Participants made a two-alternative forced choice (2-AFC) with the RTBox, indicating which interval was longer: the mean action-sensation interval or the last target interval (**Figure 1**). The specific methods for each experiment are detailed below.

### Experiment 1

Thirteen participants (aged 19 to 25 years, 6 males) took part in Experiment 1. In this experiment, 500 Hz tones served as sensory feedback for participants' voluntary taps. Participants first tapped eight times in succession, and each tap was followed by a 500 Hz auditory beep as sensory feedback. The intervals between action and sensory feedback varied (mean 800 ms, coefficient of variation 0.1 to 0.15). The target interval was 200, 400, 600, 800, 1000, 1200, or 1400 ms. Participants completed two test blocks, each containing seven trials per target interval. To become familiar with the task, participants also completed 14 practice trials (two per target interval).
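The method of constant stimuli presents a fixed set of target intervals repeatedly. The Python sketch below builds Experiment 1's practice and test trial lists from the counts given in the text: 14 practice trials, and two blocks with seven trials per target interval. Shuffling the trial order within each block is our assumption, because the text does not state how trials were ordered. The names `PROBES_LONG_MS` and `make_block` are hypothetical.

```python
import random

# Target (probe) intervals for the long-mean (800 ms) condition, from the text.
PROBES_LONG_MS = [200, 400, 600, 800, 1000, 1200, 1400]


def make_block(levels, reps, rng):
    """Return a block in which every level appears `reps` times.

    The random order within the block is an assumption.
    """
    trials = [level for level in levels for _ in range(reps)]
    rng.shuffle(trials)
    return trials


rng = random.Random(0)
practice = make_block(PROBES_LONG_MS, 2, rng)                         # 7 x 2 = 14 trials
test_blocks = [make_block(PROBES_LONG_MS, 7, rng) for _ in range(2)]  # 2 blocks x 49 trials
print(len(practice), [len(b) for b in test_blocks])  # 14 [49, 49]
```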

The data from Experiment 1, in which only one type of auditory signal was used, served as part of the baseline data. Three further control experiments provided baseline data in which only a single type of stimulus was presented eight times: 500 Hz tones with short intervals (mean 400 ms), visual flashes with long intervals (mean 800 ms), and visual flashes with short intervals (mean 400 ms). The control experiments were modeled on Experiment 1. The only differences were the specific mappings of sensory feedback and intervals, and the practice phase: in each control experiment, participants practiced with visual feedback ("correct" or "wrong" after each response) until their accuracy exceeded 75%. The number of practice blocks was identical to that in the formal experiments. Thirteen participants (aged 19 to 24 years, 5 males) took part in control experiment 1 (CE1). In CE1 (the baseline for Experiments 2 and 3), the sensory feedback was a 500 Hz auditory beep, but the mean tap-beep interval was 400 ms. Thirteen participants (aged 19 to 24 years, 3 males) took part in CE2. In CE2 (baseline for Experiment 3), visual flashes served as the sensory feedback for the taps, with a mean tap-flash interval of 800 ms. Thirteen participants (aged 18 to 24 years, 3 males) took part in CE3. In CE3 (baseline for Experiment 3), the tap-flash sequence had a mean tap-flash interval of 400 ms. In all control experiments, the probe interval was presented after the preceding sequence was over. It was always demarcated by a sensory event with the same properties as those in the sequence. The probe interval was 200, 400, 600, 800, 1000, 1200, or 1400 ms for the long mean duration (800 ms) condition, and 100, 200, 300, 400, 500, 600, or 700 ms for the short mean duration (400 ms) condition.

### Experiment 2

Seventeen participants (aged 20 to 25 years, 5 males) took part in Experiment 2. We used two kinds of auditory feedback (500 or 1000 Hz) and two sets of tap-sensation intervals (mean = 400 ms or mean = 800 ms; the CV of both sets was 0.1 to 0.15). In one condition, short intervals were marked by 500 Hz tones and long intervals by 1000 Hz tones; nine participants were tested in this condition. In the other condition, eight participants were tested with the association between intervals and tones reversed (short intervals with high-pitch tones, long intervals with low-pitch tones). Within a tap-sensation sequence, short and long intervals were intermixed. Participants issued eight taps, and the ratio of short to long intervals was one of the given sets (1:1, 3:5, 5:3). Participants compared the target interval with the preceding mean action-sensory feedback interval in four blocks. The target interval was always marked by a tone of the same pitch as the preceding intervals with which it was compared. In each block, each of the seven target intervals was presented four times. Before the formal experiment, participants completed two practice tasks. In the first task, they practiced with both short and long mean intervals, but each sequence contained only 500 Hz or only 1000 Hz tones. Each target interval was presented three times, resulting in 42 trials, and participants could repeat this practice until their accuracy exceeded 75%. In the second task, they completed another 14 trials with mixed 500 and 1000 Hz tones (seven for each condition). Both practice tasks provided visual feedback ("correct" or "wrong") after each response. After the practice session, participants took the formal test.

FIGURE 1 | Stimulus configurations and schema for the experiments. (Upper): Experiments 1, 2, and 4. In a typical trial, upon hearing a beep, participants voluntarily pressed a button to trigger its sensory feedback ("beeps," with the same or different pitches). When the sequence of multiple action-sensory events was over, another signaling beep prompted the participants to press once more, and this press was followed by a last sensory feedback. Participants then judged whether the probe interval (between the offset of the action and the onset of the beep) was shorter or longer than the mean interval between the action and its sensory feedback. (Lower): The procedure for Experiment 3. The general procedure was the same as in the upper panel; however, the sensory feedback consisted of mixed streams of visual flashes and auditory beeps. Participants compared the probe interval between tap and flash, or between tap and beep, with the mean of the preceding intervals of the same type. Details are given in the main text.

Zeng and Chen Temporal Averaging
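In Experiment 2, each sequence interleaves short-mean and long-mean intervals, and each type is marked by its own tone. The Python sketch below composes the event list for one such sequence from the stated short-to-long ratios (1:1, 3:5, 5:3 among eight intervals). The text does not say how ratios were assigned to trials or how events were ordered. Picking a ratio at random and shuffling the events are therefore assumptions, as are the names `MAPPING`, `RATIOS`, and `mixed_sequence`. The interval durations themselves would then be drawn around 400 ms for short events and 800 ms for long events.

```python
import random

# Hypothetical encoding of one Experiment 2 mapping group: short intervals are
# marked by 500 Hz tones and long intervals by 1000 Hz tones (the other group
# used the reverse mapping).
MAPPING = {"short": 500, "long": 1000}
# Short:long counts among the eight intervals for the ratios 1:1, 3:5 and 5:3.
RATIOS = [(4, 4), (3, 5), (5, 3)]


def mixed_sequence(ratio, mapping, rng):
    """Return eight (interval_class, tone_hz) events for one mixed sequence.

    Short and long events are randomly interleaved; the actual ordering rule
    is not given in the text, so shuffling is an assumption.
    """
    n_short, n_long = ratio
    classes = ["short"] * n_short + ["long"] * n_long
    rng.shuffle(classes)
    return [(c, mapping[c]) for c in classes]


rng = random.Random(3)
seq = mixed_sequence(rng.choice(RATIOS), MAPPING, rng)
print(seq)
```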

### Experiment 3

Sixteen participants (ages from 20 to 25, 7 males) took part in Experiment 3. The stimuli configurations and timing parameters were similar to those in Experiment 2, except that the 1000 Hz tones were replaced by visual black disks as sensory feedback. The practice protocol was the same as the one in Experiment 2.

### Experiment 4

Twelve participants (aged 20 to 25 years, 4 males) took part in Experiment 4. The stimulus settings and timing parameters were similar to those in Experiment 2, except that the two sets of action-sensation intervals had the same mean (800 ms) but different CVs. In one configuration, intervals marked by 500 Hz tones had CVs of 0.1 to 0.15 (low variance), and intervals marked by 1000 Hz tones had CVs of 0.3 to 0.35 (high variance). In the other configuration, the mappings between tone pitches and CVs were reversed. Before the formal experiment, participants completed 14 practice trials with "correct" or "wrong" feedback, as in Experiment 2.

### Data Analysis

In all four experiments, the proportions of "longer" responses across the seven target intervals were fitted with a logistic psychometric function (Treutwein and Strasburger, 1999; Wichmann and Hill, 2001). The transitional threshold was estimated as the 50% point on the fitted curve. This point, the point of subjective equality (PSE), is where the participant was equally likely to judge the target interval as longer or shorter than the mean interval. The just noticeable difference (JND) is an index of temporal discrimination sensitivity. It was calculated as half of the difference between the 25% and 75% points on the psychometric curve.
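Consider a logistic psychometric function P(longer | x) = 1 / (1 + exp(-(x - α)/β)). Its PSE is α, and its 25% and 75% points lie at α - β ln 3 and α + β ln 3, so the JND as defined above is β ln 3. The Python sketch below fits a two-parameter logistic to binomial response counts by maximum likelihood (Newton-Raphson). It then derives the PSE and JND from the slope-intercept form. This is an illustration only. The cited methods (e.g., Wichmann and Hill, 2001) also allow lapse-rate parameters and other estimation choices. The function names and the example curve (α = 870 ms, β = 60 ms) are hypothetical.

```python
import math


def fit_psychometric(levels, n_longer, n_total, iters=100, tol=1e-10):
    """Maximum-likelihood fit of P(longer | x) = 1 / (1 + exp(-(b0 + b1 * x)))
    to binomial counts, by Newton-Raphson on (b0, b1).

    x is centered during fitting for numerical stability; the returned
    intercept b0 refers to the uncentered x.
    """
    xm = sum(levels) / len(levels)
    a, b1 = 0.0, 0.0                      # intercept (centered x) and slope
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(levels, n_longer, n_total):
            xc = x - xm
            p = 1.0 / (1.0 + math.exp(-(a + b1 * xc)))
            r = k - n * p                 # observed minus expected "longer"
            w = n * p * (1.0 - p)         # binomial weight
            g0 += r
            g1 += r * xc
            h00 += w
            h01 += w * xc
            h11 += w * xc * xc
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step = H^-1 g (2 x 2 inverse)
        d1 = (h00 * g1 - h01 * g0) / det
        a, b1 = a + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return a - b1 * xm, b1


def pse_and_jnd(b0, b1):
    """PSE: x at P = 0.5. JND: half the distance between the 25% and 75% points."""
    return -b0 / b1, math.log(3.0) / b1


# Noise-free check: expected counts from a known curve, 14 trials per level.
levels = [200, 400, 600, 800, 1000, 1200, 1400]
n_total = [14] * 7
n_longer = [n / (1.0 + math.exp(-(x - 870) / 60)) for x, n in zip(levels, n_total)]
pse, jnd = pse_and_jnd(*fit_psychometric(levels, n_longer, n_total))
print(round(pse, 1), round(jnd, 1))
```

With real data, `n_longer` would hold each participant's integer counts of "longer" responses per target interval.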

### RESULTS

### Experiment 1 and Control Experiments

### Exp1

#### **Baseline bias when eight sequential stimuli were drawn from a single distribution**

The mean PSE and JND were 869.3 ± 24.1 ms (standard deviation) and 194.4 ± 29.4 ms. All mean PSEs and JNDs are plotted in **Figure 3**. A one-sample t-test showed that participants underestimated the target interval relative to the 800 ms mean, i.e., the PSE was significantly larger than 800 ms, t(12) = 10.368, p < 0.001 (**Figure 2**, left).
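The ± values are standard deviations across the 13 participants. The reported t statistic can therefore be checked from the summary statistics alone, with t = (M - μ0) / (SD / √n) and df = n - 1. The small Python helper below (a hypothetical name) reproduces the Experiment 1 value. Values recomputed from rounded summaries can differ slightly from those computed on the raw data.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic and df from a mean, sample SD, and n."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


t, df = one_sample_t(869.3, 24.1, 13, 800)
print(f"t({df}) = {t:.3f}")  # t(12) = 10.368
```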

### **Effects of individual standards within the sequence**

We evaluated whether particular intervals in the action-sensation sequences played a significant role in the estimation of the probe interval, e.g., a potential recency effect stemming from the last interval (Wan and Chen, 2018). To do so, we performed a binary logistic regression for each participant. The response to the target interval ("0" = shorter, "1" = longer than the mean interval) was the dependent variable, and the eight sequential intervals and the probe interval were the predictors. The Omnibus Tests of Model Coefficients were significant for all participants' models (ps < 0.001), suggesting that at least one predictor contributed significantly to the discrimination of the probe interval. The Hosmer and Lemeshow Tests were not significant (ps > 0.143), indicating good model fit. We then used one-sample t-tests to compare the parameter estimates of the eight sequential intervals across participants with 0. None of the sequential intervals reached significance (ps > 0.521). Finally, we conducted a repeated-measures ANOVA on the parameter estimates of the sequential intervals across participants, with the position of the sequential interval as a within-subject variable. The difference between sequential intervals was marginally significant [F(7,84) = 2.112, p = 0.051, η <sup>2</sup> = 0.150], and the effect of the intercept was not significant [F(1,12) = 0.291, p = 0.599, η <sup>2</sup> = 0.024]. The detailed values are given in **Table 1**.
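The per-participant regression described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' analysis software. The logistic model is fitted by Newton-Raphson (iteratively reweighted least squares). The design matrix holds the eight sequential intervals (here in seconds, an assumption) plus the probe coded 1 to 7, following the Table 1 caption. The simulated observer is hypothetical: its responses depend on the probe minus the sequence mean.

```python
import numpy as np


def logistic_regression(X, y, iters=100, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + X @ b))) by Newton-Raphson (IRLS).

    X: (n_trials, n_predictors) array without an intercept column.
    y: 0/1 responses ("shorter" = 0, "longer" = 1). Returns [b0, b1, ..., bk].
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)
        grad = X1.T @ (y - p)                   # score vector
        hess = X1.T @ (X1 * w[:, None])         # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical simulated participant: 196 trials (28 per probe level).
rng = np.random.default_rng(0)
probe_level = np.tile(np.arange(1, 8), 28)         # probe coded 1..7
probe_s = 0.2 * probe_level                        # 200..1400 ms, in seconds
intervals_s = rng.normal(0.8, 0.1, size=(196, 8))  # eight sequential intervals
decision = (probe_s - intervals_s.mean(axis=1)) / 0.1  # assumed observer model
y = (rng.random(196) < 1.0 / (1.0 + np.exp(-decision))).astype(float)

X = np.column_stack([intervals_s, probe_level])    # 8 intervals + probe
beta = logistic_regression(X, y)
print(np.round(beta, 2))  # [intercept, 8 interval betas, probe beta]
```

In the analysis described in the text, the eight interval coefficients from each participant's fit would then be tested against zero across participants with one-sample t-tests.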

### CE1

In this separate control experiment with 500 Hz auditory beeps and short mean durations, the mean PSE and JND were 470.8 ± 19.5 ms and 119.1 ± 24.5 ms. A one-sample t-test revealed a significant bias toward perceived "compression" of the probe intervals relative to the 400 ms reference [t(12) = 13.333, p < 0.001]. The same binary logistic regression as in Exp1 was applied. The Omnibus Tests of Model Coefficients were significant for all models (ps < 0.001). The Hosmer and Lemeshow Tests were not significant for eleven participants (ps > 0.196). They were significant for the remaining two participants, meaning that these two models did not fit well. We therefore conducted the one-sample t-tests with these two participants excluded. None of the sequential intervals reached significance (ps > 0.055). The repeated-measures ANOVA revealed a marginally significant effect of the intercept [F(1,12) = 4.585, p = 0.053, η <sup>2</sup> = 0.276] but no significant effect of sequential intervals [F(7,84) = 0.702, p = 0.610, η <sup>2</sup> = 0.055].

### CE2

The mean PSE and JND in the control experiment with visual flashes and the long mean duration (800 ms) were 832.7 ± 27.6 and 138.2 ± 7.5 ms. A one-sample t-test showed that participants again tended to "compress" the probe intervals [t(12) = 4.271, p = 0.001] (**Figure 3**). For the binary logistic regression, the Omnibus Tests of Model Coefficients were significant (ps < 0.001), and the Hosmer and Lemeshow Tests were not significant [ps > 0.579]. One-sample t-tests showed that none of the sequential intervals had a significant effect (ps > 0.345). The repeated-measures ANOVA showed no significant effect of either sequential intervals [F(1.000,12.003) = 1.007, p = 0.335, η <sup>2</sup> = 0.077] or the intercept [F(1,12) = 0.958, p = 0.347, η <sup>2</sup> = 0.074].

### CE3

In the control experiment with visual flashes and the short mean duration (400 ms), the mean PSE and JND were 418.5 ± 13.0 and 75.7 ± 10.3 ms. Participants tended to "compress" the probe intervals [t(12) = 5.128, p < 0.001]. For the binary logistic regressions, the Omnibus Tests of Model Coefficients were significant (ps < 0.001), and the Hosmer and Lemeshow Tests were not significant (ps > 0.364). One-sample t-tests showed that none of the sequential intervals contributed significantly to the perceived probe intervals (ps > 0.277). The repeated-measures ANOVA showed no effect of either sequential intervals [F(1.001,12.015) = 1.018, p = 0.333, η <sup>2</sup> = 0.078] or the intercept [F(1,12) = 0.960, p = 0.347, η <sup>2</sup> = 0.074].

### **Combined Analysis of Exp1 and CEs**

A 2 × 2 ANOVA with modality (auditory/visual) and mean duration (short/long) as between-subject factors showed, for both PSEs and JNDs, a significant main effect of modality [PSE: F(1,48) = 54.890, p < 0.001, η <sup>2</sup> = 0.533; JND: F(1,48) = 79.144, p < 0.001, η <sup>2</sup> = 0.622]. It also showed a significant main effect of mean duration [PSE: F(1,48) = 4577.967, p < 0.001, η <sup>2</sup> = 0.990; JND: F(1,48) = 151.808, p < 0.001, η <sup>2</sup> = 0.760]. The interaction of modality and mean duration was not significant [PSE: F(1,48) = 1.725, p = 0.195, η <sup>2</sup> = 0.035; JND: F(1,48) = 1.314, p = 0.257, η <sup>2</sup> = 0.027]. Specifically, PSEs and JNDs in the auditory modality were significantly larger than those in the visual modality. PSEs and JNDs in the short mean duration condition were significantly smaller than those in the long mean duration condition.
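The reported effect sizes appear to be partial eta squared. This quantity can be recovered from an F ratio and its degrees of freedom. Since F = (SS_effect/df_effect) / (SS_error/df_error), it follows that η²_p = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error). The Python helper below (a hypothetical name) reproduces the modality effects reported above, as well as the Exp1 value for the sequential intervals. Reading the reported η² as partial η² is our inference from this agreement.

```python
def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared from an F ratio:
    eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return F * df_effect / (F * df_effect + df_error)


print(round(partial_eta_squared(54.890, 1, 48), 3))  # modality on PSE: 0.533
print(round(partial_eta_squared(79.144, 1, 48), 3))  # modality on JND: 0.622
print(round(partial_eta_squared(2.112, 7, 84), 3))   # Exp1 sequential intervals: 0.15
```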

The data from Exp1 and the CEs served as control references for the following experiments.

### Experiment 2

### Sequential Stimuli With Two Different Interval Distributions Around Two Alternative References (Standards)

The mean PSE and mean JND of probe intervals for the "500 Hz–400 ms" condition in the "1000 Hz–800 ms" context were 440.0 ± 58.3 and 84.8 ± 33.7 ms. The mean PSE and mean JND of "1000 Hz–400 ms" in the "500 Hz–800 ms" context were 493.0 ± 65.8 and 120.3 ± 47.8 ms (**Figure 2**, right). The mean PSE and mean JND of "1000 Hz–800 ms" in the "500 Hz–400 ms" context were 750.8 ± 96.2 and 146.1 ± 59.9 ms. The mean PSE and mean JND of "500 Hz–800 ms" in the "1000 Hz–400 ms" context were 784.5 ± 77.0 and 143.8 ± 49.5 ms.

We performed a repeated-measures analysis of variance (ANOVA) with context as a between-subject variable and the mean of the sequential intervals as a within-subject variable. Context was defined by the mapping between tones (500 Hz, 1000 Hz) and interval means (400 ms, 800 ms). For PSEs, there was no significant main effect of context [F(1,15) = 2.795, p = 0.115, η <sup>2</sup> = 0.157], but the interval mean had a significant main effect [F(1,15) = 131.618, p < 0.001, η <sup>2</sup> = 0.898]. For JNDs, context likewise made no difference [F(1,15) = 0.740, p = 0.403, η <sup>2</sup> = 0.047], whereas the main effect of mean interval duration was significant [F(1,15) = 9.704, p = 0.007, η <sup>2</sup> = 0.393]. This pattern indicates that participants selectively extracted different "mean" intervals to make perceptual decisions about the probe intervals. We therefore collapsed the data across the two tone pitches for further analysis. The mean PSEs for the short and long mean durations (across both pitches) were 464.9 ± 65.9 and 766.7 ± 86.7 ms. The mean JNDs for the short and long mean durations were 101.5 ± 43.6 and 145.0 ± 53.6 ms.


TABLE 1 | Parameter estimates of the binary logistic regressions. The probe intervals were labeled 1∼7 in the regression models.

The values in the table are the mean beta values in the regression models. The values in brackets are the p-values of the one-sample t-tests on the corresponding beta values (∗p < 0.05; ∗∗p < 0.01).

### Comparison Between Exp2 and Corresponding Control Experiments for Short and Long Mean Conditions

We used one-way ANOVAs to compare the Experiment 2 data with the data from the corresponding control experiments (CE1 and Experiment 1). For the short mean duration condition, we conducted a one-way ANOVA with context as a between-subject variable. Context comprised three conditions. The first was the 500 Hz short-mean-duration control ("500 Hz–400 ms"). The second was 500 Hz short-mean-duration stimuli in the context of 1000 Hz long-mean-duration stimuli ("500 Hz–400 ms and 1000 Hz–800 ms"). The third was 1000 Hz short-mean-duration stimuli in the context of 500 Hz long-mean-duration stimuli ("500 Hz–800 ms and 1000 Hz–400 ms"). The effect of context was not significant for PSEs [F(2,27) = 2.650, p = 0.089] or for JNDs [F(2,27) = 3.190, p = 0.057].
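Each of these one-way ANOVAs compares three independent groups. CE1 had 13 participants, and the two Experiment 2 mapping groups had 9 and 8. The degrees of freedom are therefore (3 - 1, 30 - 3) = (2, 27), matching the reported F(2,27). The pure-Python sketch below computes the between-subjects F ratio. The function name and the toy inputs are illustrative and are not the study's data.

```python
def one_way_anova(groups):
    """Between-subjects one-way ANOVA on a list of groups of scores.

    Returns (F, df_between, df_within).
    """
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups SS: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups SS: deviations of scores from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within


# Toy example: three small groups with equal within-group spread.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```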

For the long mean duration condition, the same one-way ANOVA was conducted. Context had a significant effect on PSEs [F(2,27) = 9.072, p = 0.001]. The PSE of the "500 Hz–800 ms" control was significantly larger than both the PSE of "500 Hz–800 ms" in the "1000 Hz–400 ms" context (p = 0.015) and the PSE of "1000 Hz–800 ms" in the "500 Hz–400 ms" context (p = 0.009). Context also had a significant main effect on JNDs [F(2,27) = 4.307, p = 0.024]. However, the JND of "500 Hz–800 ms" in the "1000 Hz–400 ms" context differed only marginally from the JND of the "500 Hz–800 ms" control [p = 0.061]. The JND of "1000 Hz–800 ms" in the "500 Hz–400 ms" context did not differ significantly from that control (p = 0.110).

#### Effects of Individual Standards Within the Sequence

Binary logistic regression analyses were applied to Experiment 2 as in Experiment 1. For all participants, the Omnibus Tests of Model Coefficients were significant (ps < 0.001), and the Hosmer and Lemeshow Tests were not significant (ps > 0.250). One-sample t-tests comparing the parameter estimates of the eight sequential intervals with 0 revealed that the last three sequential intervals contributed to participants' responses (ps < 0.010). A repeated-measures ANOVA was conducted as in Exp1. There was no significant effect of sequential intervals [F(7,112) = 0.898, p = 0.511, η <sup>2</sup> = 0.053], but the effect of the intercept was significant [F(1,16) = 13.675, p = 0.002, η <sup>2</sup> = 0.461]. This pattern indicates that with two reference standards (sequences), participants may have had an initial preference for responding to a specific sequence (short vs. long). Moreover, as stimulus complexity increased, participants relied more on the most recent intervals when making perceptual decisions about the probe interval.

Therefore, with mixed and complex action-sensation sequences, observers could selectively extract the mean interval of a specific action-sensation sequence to support temporal discrimination of the probe intervals. However, the repetition effect with multiple intervals (Pariyadath and Eagleman, 2007; Matthews and Meck, 2014; Matthews and Gheorghiu, 2016) shortened the perceived mean interval compared with the single-sequence (long) mean interval. This "compression" effect attracted and biased the probe interval to be subjectively perceived as shorter (with larger PSEs). We return to this point in the Discussion.

### Experiment 3

### Sequential Stimuli With Two Different (Auditory and Visual) Interval Distributions Around Two Alternative References (Standards)

The mean PSE and mean JND of "A(uditory) – 400 ms" in the "V(isual) – 800 ms" context were 456.2 ± 64.2 and 86.9 ± 47.8 ms. The mean PSE and mean JND of "V – 400 ms" in the "A – 800 ms" context were 439.5 ± 88.5 and 104.0 ± 42.7 ms. The mean PSE and mean JND of "V – 800 ms" in the "A – 400 ms" context were 784.3 ± 108.3 and 117.1 ± 77.6 ms. The mean PSE and mean JND of "A – 800 ms" in the "V – 400 ms" context were 764.3 ± 68.0 and 133.9 ± 70.9 ms (**Figure 3**). We conducted a repeated-measures ANOVA with the mean action-sensation interval (400 or 800 ms) as a within-subject variable and context as a between-subject variable. Context was defined by the mapping of visual flashes and auditory beeps to the short/long intervals. The analysis showed no significant influence of context [PSE: F(1,14) = 0.414, p = 0.530, η <sup>2</sup> = 0.029; JND: F(1,14) = 0.360, p = 0.558, η <sup>2</sup> = 0.025]. However, the main effect of the mean interval was significant for PSEs [F(1,14) = 111.644, p < 0.001, η <sup>2</sup> = 0.889] and JNDs [F(1,14) = 6.229, p = 0.026, η <sup>2</sup> = 0.308]. We therefore collapsed the data across stimulus types (auditory vs. visual). The mean PSEs for the short and long mean interval conditions were 447.8 ± 75.2 and 774.3 ± 88.0 ms. The mean JNDs for the short and long mean interval conditions were 95.4 ± 44.7 and 125.5 ± 72.3 ms (**Figure 3**).

### Comparison Between Exp3 and Corresponding Control Experiments for Short and Long Mean Conditions

As above, we conducted a two-way ANOVA on the collapsed data and the corresponding control data for the short mean duration condition. Modality of feedback (auditory beeps/visual flashes) and context (control vs. Exp3) were between-subject variables. For PSEs, the modality × context interaction was not significant [F(1,38) = 2.434, p = 0.127, η <sup>2</sup> = 0.060]. The modality of sensory feedback had a significant effect [F(1,38) = 6.686, p = 0.014, η <sup>2</sup> = 0.150], but context did not [F(1,38) = 0.034, p = 0.855, η <sup>2</sup> = 0.001]. The PSEs of "A – 400 ms" in the "V – 800 ms" context in Exp3 did not differ from those of "A – 400 ms" in CE1 (p = 0.225). The PSEs of "V – 400 ms" in the "A – 800 ms" context did not differ from those of "V – 400 ms" in CE3 [p = 0.336]. For JNDs, the modality × context interaction was significant [F(1,38) = 14.152, p = 0.001, η <sup>2</sup> = 0.271]. There was also a significant effect of modality [F(1,38) = 5.458, p = 0.025, η <sup>2</sup> = 0.126] but not of context [F(1,38) = 0.576, p = 0.452, η <sup>2</sup> = 0.015]. The JNDs of "A – 400 ms" in the "V – 800 ms" context in Exp3 were smaller than those of "A – 400 ms" in CE1 (p = 0.003). In contrast, the JNDs of "V – 400 ms" in the "A – 800 ms" context in Exp3 were larger than those of "V – 400 ms" in CE3 (p = 0.040).

For the long mean duration condition, the same two-way ANOVA was conducted. There was no significant modality × context interaction for PSEs [F(1,38) = 0.542, p = 0.466, η <sup>2</sup> = 0.014], and modality made no difference for PSEs [F(1,38) = 2.205, p = 0.146, η <sup>2</sup> = 0.055]. However, context had a significant effect on PSEs [F(1,38) = 9.741, p = 0.003, η <sup>2</sup> = 0.204]. The PSEs of "A – 800 ms" in the "V – 400 ms" context in Exp3 were significantly smaller than the PSEs of the "A – 800 ms" condition in Exp1 (p = 0.010). The PSEs of "V – 800 ms" in the "A – 400 ms" context in Exp3 did not differ significantly from the PSEs of the "V – 800 ms" condition in CE2 (p = 0.100). For JNDs, only modality had a significant effect [F(1,38) = 8.732, p = 0.005, η <sup>2</sup> = 0.187]. There was no significant modality × context interaction [F(1,38) = 0.238, p = 0.628, η <sup>2</sup> = 0.006] or effect of context [F(1,38) = 3.186, p = 0.082, η <sup>2</sup> = 0.077]. JNDs did not differ between "A – 800 ms" in the "V – 400 ms" context in Exp3 and the "A – 800 ms" condition in Exp1 (p = 0.116). Nor did they differ between "V – 800 ms" in the "A – 400 ms" context in Exp3 and the "V – 800 ms" condition in CE2 (p = 0.365).

#### Effects of Individual Standards Within the Sequence

The binary logistic regressions showed good fit for 15 participants: the Omnibus Tests of Model Coefficients were significant (ps < 0.001), and the Hosmer and Lemeshow Tests were not significant (ps > 0.163). Seven of the eight sequential intervals did not predict participants' responses [ps > 0.066], but the sixth interval contributed to participants' responses (p = 0.042). The repeated-measures ANOVA showed no effect of sequential intervals [F(3.995,59.919) = 0.335, p = 0.853, η <sup>2</sup> = 0.022] but a significant effect of the intercept [F(1,15) = 5.204, p = 0.038, η <sup>2</sup> = 0.258].

### Experiment 4

### Sequential Stimuli With Two Different Variances but With the Same Mean Reference Duration

We conducted a two-way repeated-measures ANOVA to examine whether the different mappings of tone pitch (500 Hz vs. 1000 Hz) to CV (0.1–0.15 vs. 0.3–0.35) made a difference. The orthogonal mappings did not make a difference [F(1,10) = 0.988, p = 0.344, η <sup>2</sup> = 0.090]. We therefore collapsed the data across tone pitches, as in Exp2. The mean PSEs for the low-CV and high-CV interval conditions were 900.4 ± 99.1 and 895.8 ± 101.6 ms. The mean JNDs under the two CVs were 165.0 ± 68.1 and 175.6 ± 87.9 ms.

### Comparison Between Exp 4 and Corresponding Control Experiments

A one-way ANOVA with CV (low/high/control) as the factor showed no significant main effect on either PSEs [F(2,34) = 0.533, p = 0.591] or JNDs [F(2,34) = 0.645, p = 0.531]. Again, for the binary logistic regressions, the Omnibus Tests of Model Coefficients were significant for all participants (ps < 0.001), and the Hosmer and Lemeshow Tests were not significant (ps > 0.138). One-sample t-tests showed that none of the sequential intervals was significant (ps > 0.093). Finally, a repeated-measures ANOVA found no differences between sequential intervals [F(2.389,26.278) = 0.509, p = 0.639, η <sup>2</sup> = 0.044] and no significant effect of the intercept [F(1,11) = 3.124, p = 0.105, η <sup>2</sup> = 0.221].

### DISCUSSION

In the current study, we found that humans can use the mean of multiple irregular action-sensation intervals as a reference for comparison with a subsequent probe interval. The probe interval was defined by a single tap and its sensory consequence (visual flash or auditory beep). However, in making this comparison, human observers might use only some of the intervals rather than all of them.

This temporal averaging ability was robustly observed in the action-sensation (sensory feedback) loop, as in the purely perceptual domain with a sequence of stimuli (Jazayeri and Shadlen, 2010; Shi et al., 2013; Karaminis et al., 2016; Wan and Chen, 2018). Importantly, human observers can selectively average multiple intervals between actions and sensations. This selectivity was demonstrated in two respects: (1) Tuning to short and long intervals. In the current configurations, we implemented short mean interval (400 ms) and long mean interval (800 ms) conditions by presenting a sequence of voluntary actions and their associated auditory beeps as sensory feedback (Experiments 1, 2, and 4). Participants adaptively discriminated the probe interval by referring to either the extracted "short" or "long" standard (mean) interval. (2) Selectivity across sensory modalities. In Experiment 3, we mixed auditory beeps and visual flashes in the same action-sensation loop. Participants could judge the probe interval by picking out the corresponding sequence and using its summarized mean tap-tone or tap-flash interval. The final marker of the probe could be either an "auditory" or a "visual" event. Temporal averaging of the intervals between action and sensation is thus relatively robust. The ability to average the intervals was little affected by their distribution profile (low vs. high variance). Human observers computed means for different temporal ranges (short vs. long) irrespective of the intersensory bindings of the different temporal ranges or sensory events (Chen and Vroomen, 2013). They did so regardless of the variability of the intervals themselves (Acerbi et al., 2012).

This robust temporal averaging between action and sensation was achieved by a mechanism similar to the central tendency effect (Jazayeri and Shadlen, 2010; Burr et al., 2013; Shi et al., 2013; De Corte and Matell, 2016a; Karaminis et al., 2016), in which the perceptual discrimination of the probe (target) interval was biased toward the mean of the preceding action-sensation intervals.
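The central tendency account can be illustrated with a simplified linear model. This is an assumption of ours, not the authors' fitted model: the context mean is computed separately for each feedback modality (mirroring the modality-selective averaging of Experiment 3), and the perceived probe is a weighted mix of the physical probe and the matching context mean. The function names and the weight of 0.75 on the probe are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def context_means(sequence):
    """Mean action-sensation interval per feedback modality.

    sequence: list of (modality, interval_ms) tuples, e.g. ("A", 780).
    "A" = tap-beep interval, "V" = tap-flash interval (labels are illustrative).
    """
    by_modality = defaultdict(list)
    for modality, interval_ms in sequence:
        by_modality[modality].append(interval_ms)
    return {m: mean(values) for m, values in by_modality.items()}


def biased_estimate(probe_ms, context_mean_ms, weight=0.75):
    """Linear central-tendency model: the perceived probe is pulled toward the
    context mean. `weight` (hypothetical) is the weight on the probe itself."""
    return weight * probe_ms + (1.0 - weight) * context_mean_ms


# A mixed sequence with a long auditory and a short visual context.
means = context_means([("A", 700), ("A", 900), ("V", 350), ("V", 450), ("A", 800)])
perceived = biased_estimate(600, means["A"])  # pulled from 600 ms toward 800 ms
```

In this toy form, any weight between 0 and 1 places the perceived probe between the physical probe and the context mean, which is the qualitative signature of the central tendency effect; Bayesian observer models (e.g., those in the cited timing literature) make the weight depend on sensory noise rather than fixing it.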

As shown in the timing literature, the perception of temporal synchrony or asynchrony between one's own action and its sensory feedback is quite flexible: the perceived temporal order of cause (action) and effect (sensory feedback) can even be reversed after repeated adaptation (Stetson et al., 2006; Heron et al., 2009; Sugano et al., 2010, 2012, 2014; Acerbi et al., 2012; Keetels and Vroomen, 2012). This flexibility takes different forms. Human observers can simultaneously adapt to different intersensory temporal bindings in audiovisual speech (Overduin et al., 2008; Heron et al., 2009, 2012; Roseboom and Arnold, 2011; Curran et al., 2012; Yuan et al., 2012; McWalter and McDermott, 2018) and in (hand) action-sensation couplings (Sugano et al., 2014). For the audiovisual temporal recalibration effect, humans can form multiple concurrent estimates of audiovisual synchrony: after "selective" adaptation to one of two temporal relations, positive or negative asynchronies between the auditory and visual streams (identified by their association with either male or female speech) led to corresponding shifts in the perceived temporal relations (Roseboom and Arnold, 2011). This concurrent recalibration effect was also demonstrated in a clever design in which Sugano et al. (2014) exposed participants' left and right hands to different action-feedback lags ("clicks"): a long delay (∼150 ms) for one hand and a short delay, subjectively perceived as no delay (∼50 ms), for the other. In addition to the traditional temporal recalibration (TR) effect, Sugano et al. (2014) found different TR effect sizes for the two "delayed" feedbacks. These findings indicate that human observers have both central and motor- or sensory-specific timing mechanisms for dealing with the temporal binding between events and actions (Chen and Vroomen, 2013; Ivry and Schlerf, 2008).

In the current study, although the central tendency effect was robustly replicated in the sensorimotor domain, we did not observe a consistent recency effect, i.e., a dominant role of the last interval in the action-sensation sequence (Burr et al., 2013). Interestingly, we also did not find a distinct change in behavioral performance with respect to modality (auditory vs. visual sensory events). This finding runs largely against the established auditory dominance (with high temporal precision) over visual signals in sensory timing and sensorimotor recalibration (Burr et al., 2009; Lukas et al., 2014; Sugano et al., 2016). However, one typical finding was that probe intervals were perceived as longer in the long mean auditory interval context when it was mixed with a short visual context ("A – 800 ms" in "V – 400 ms") than in the "A – 800 ms" baseline, whereas no such bias appeared for the long mean visual interval counterpart. This pattern indicates that observers remain sensitive to more salient and more precisely timed stimuli, namely auditory beeps, which are hence subject to contextual modulation.

Using the mean intervals in the action-sensation loop to compare with the subsequent probe interval could be attentionally resource-consuming, which may constrain the otherwise "advantageous" auditory events (Cheng et al., 2014). As the action-sensation loop unfolds, participants must hold many intervals in working memory (Van Rijn, 2016) and switch frequently between intervals of different durations and different sensory events (visual flashes and auditory beeps). In this context, we suggest that the fine detail of the last interval is interfered with and masked, so that it cannot exert an observable influence on the discrimination of the probe action-sensation interval. Nevertheless, maintaining and exploiting coarse "abstract" means is less demanding and may even be acquired automatically, as shown in a large body of literature (Chong and Treisman, 2003; Haberman et al., 2009; Haberman and Whitney, 2009; de Gardelle and Summerfield, 2011; Albrecht et al., 2012; Piazza et al., 2013). In our case, as the action-sensation sequence unfolded, participants had to hold multiple intervals and multiple sensory events in (working) memory before making a perceptual decision about the probe interval. The increased number of items in memory, the interference from holding two standards (short vs. long mean intervals), and the time decay between the preceding sequence and the probe could all challenge one's limited information-processing capacity (Cheng et al., 1996, 2014). However, we did not observe such a detriment in the present tasks. Note that the total time span of all events in a sequence was about 7 s, which is shorter than the pure delays (over 30 s) between the offset of the sequence and the probe stimuli in other relevant studies (Jones and Wearden, 2004; Ogden et al., 2008), where such long delays make memory subject to decay and interference. Therefore, we believe participants could maintain the events in memory and mobilize sufficient attentional resources to fulfill the tasks.

The control experiments with only one standard (mean duration of 400 or 800 ms), compared with the corresponding main experiments, further supported robust averaging, even though there was a general bias whereby the perceived (mean) time interval was "compressed" with mixed sequences ("standards"), most obviously with the "short" standard. This illusory "compression" of perceived time could be elicited by the repetition effect of extended, complex event structures, which leads to subjectively "shortened" element intervals (Sasaki et al., 2002; Nakajima et al., 2004; Matthews and Meck, 2014, 2016; Matthews and Gheorghiu, 2016). Alternatively, direct attention to (or distraction by) the multiple stimuli that demarcate the intervals may consume resources needed to process the "intervals" themselves, so that less attended intervals are perceived as shorter, which could lead to the observed "compression" effect (Mattes and Ulrich, 1998; Tse et al., 2004). Directing attention to auditory or visual events, and switching attention between different sensory events, also contributed to an imbalance in perceiving the same physical intervals. For example, in the control test, the mean 800 ms interval in the tap-beep sequence was perceived as shorter than the 800 ms interval in the tap-flash sequence. This is probably due to the expansion of intervals by the onset of visual events, especially when the visual events are dynamic and unexpected (Kanai et al., 2006; Kanai and Watanabe, 2006).

That said, the limitations of the current study should be noted. For instance, we did not empirically test how the efficiency of using mean intervals in the sensorimotor domain is constrained by individual working memory capacity. Nor do we know how the complexity of the temporal structure (including more levels of the coefficients of variation, CVs, of the durations) would affect the "averaging" process. Further research is needed to address these questions.

In sum, we revealed a novel and robust temporal averaging process in the sensorimotor domain by employing action-sensory intervals as building elements in the perception-action sequence. Our findings suggest that human observers can use the mean action-sensation interval to facilitate and optimize task-relevant perceptual decisions about subsequent time information in the critical action-sensation loop. The robust averaging of action-sensation intervals suggests that a centralized timing mechanism may subserve this process (Ivry and Schlerf, 2008), though it is constrained and even disrupted by contextual factors (Jazayeri and Shadlen, 2010; Cheng et al., 2014; De Corte and Matell, 2016b), including memory mixing (Van Rijn, 2016), attentional-capacity limitations (Cheng et al., 2014), and the contributions of salient individual events in the loop.

### AUTHOR CONTRIBUTIONS

LC designed the study. HZ and LC analyzed the data and wrote the manuscript.

### FUNDING

This work was funded by Project Crossmodal Learning NSFC 61621136008/DFG TRR-169, NSFC 61527804, NSFC 31861133012, and Research Fund from TAL group, China.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Zeng and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Spatial and Spectral Auditory Temporal-Order Judgment (TOJ) Tasks in Elderly People Are Performed Using Different Perceptual Strategies

Elzbieta Szelag<sup>1</sup> \*, Katarzyna Jablonska<sup>2</sup> , Magdalena Piotrowska<sup>1</sup> , Aneta Szymaszek<sup>1</sup> and Hanna Bednarek<sup>2</sup>

<sup>1</sup> Laboratory of Neuropsychology, Nencki Institute of Experimental Biology of Polish Academy of Sciences, Warsaw, Poland, <sup>2</sup> Faculty of Psychology, SWPS University of Social Sciences and Humanities, Warsaw, Poland

#### Edited by:

Patricia J. Brooks, College of Staten Island, United States

#### Reviewed by:

Makoto Wada, National Rehabilitation Center for Persons with Disabilities, Japan

Marc Wittmann, Institut für Grenzgebiete der Psychologie und Psychohygiene (IGPP), Germany

Leah Fostick, Ariel University, Israel

> \*Correspondence: Elzbieta Szelag e.szelag@nencki.gov.pl

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 27 September 2018 Accepted: 29 November 2018 Published: 11 December 2018

#### Citation:

Szelag E, Jablonska K, Piotrowska M, Szymaszek A and Bednarek H (2018) Spatial and Spectral Auditory Temporal-Order Judgment (TOJ) Tasks in Elderly People Are Performed Using Different Perceptual Strategies. Front. Psychol. 9:2557. doi: 10.3389/fpsyg.2018.02557

The Temporal-Order Judgment (TOJ) paradigm has been widely used in previous studies as an accurate measure of temporal resolution and sequencing abilities in the millisecond time range. Two auditory TOJ tasks are often used: (1) a spatial TOJ task, in which two identical stimuli are presented monaurally in rapid succession and the task is to indicate which ear received the first stimulus and which ear received the second (left-right or right-left), and (2) a spectral TOJ task, in which two tones of different frequencies are presented asynchronously and binaurally and the task is to report the order of these tones (low-high or high-low). Previous studies conducted on young volunteers indicated that the temporal acuity measured on these two tasks depended on the procedure used. As considerable data are now available about age-related decline in temporal resolution, the aim of the present study was to compare the pattern of performance on these two tasks in elderly subjects. A total of 40 healthy volunteers aged 62 to 78 years performed the two TOJ tasks. The measurement was repeated in two consecutive sessions. Temporal resolution was indexed by the Auditory Temporal-Order Threshold (ATOT), i.e., the minimum time gap between successive stimuli necessary for a participant to report a before-after relation with 75% correctness. The main finding was that elderly participants performed differently on the two tasks. In the spatial task, the distribution of ATOT values did not deviate from the Gaussian distribution.
In contrast, the distribution of data in the spectral task deviated significantly from the Gaussian and was skewed to the right. Although ATOT values were usually lower in Session 2 than in Session 1, this difference was significant only in the spectral task. We conclude that although temporal acuity and sequencing abilities in the millisecond time range are probably based on neuronal oscillatory activity, the ATOTs measured in the elderly seem to be stimulus-dependent, procedure-related, and influenced by the perceptual strategies used by participants.

Keywords: temporal information processing, spatial task, spectral task, temporal-order judgment, aging, auditory temporal-order threshold

## INTRODUCTION

For over three decades, an increasing number of experimental studies have suggested that Temporal Information Processing (TIP) is an essential component of human cognition. Researchers have been interested in this topic because of converging evidence indicating that patterning in time plays a fundamental role in human behavior, as many mental functions display specific temporal dynamics (Pöppel, 1994, 1997, 2009; Szelag et al., 2004a; Wittmann, 2009, 2011). Thus, patterning in time provides a structure for cognition and a framework for our working brains, proving that the brain incorporates the time dimension into its computation. Findings about differences in TIP among various clinical subgroups emphasize the importance of timing-cognition relations, as they can be understood as reflecting fundamental differences in TIP associated with deficient cognition (see Teixeira et al., 2013 for an overview). It seems, therefore, that cognitive processes cannot be understood without taking their time frame into account.

Existing evidence indicates that TIP is not a monolithic process. One may distinguish several time ranges controlled by specific neural mechanisms employing discrete time sampling. This study focuses on the millisecond time domain, which provides a structure for motor and sensory processing, including speech processing (Pöppel, 1997, 2009; Wittmann, 1999, 2009, 2011; Szelag et al., 2004a, 2008, 2010, 2011, 2014; Szelag and Dacewicz, 2016). This time domain is related to the perception of succession and the temporal order of events – distinct stimuli must be separated by some tens of milliseconds in order for them to be identified as different events.

The Temporal-Order Judgment (TOJ) paradigm is one of several psychophysical paradigms used to measure the efficiency of temporal resolution in this time domain. It reflects the ability to perceive the temporal order of (at least two) stimuli presented in rapid succession; the subject's task is to indicate their temporal order – i.e., identify a before-after relation. The correctness of such judgments reflects temporal acuity, necessary for the identification of incoming events in analytical, sequential information processing (von Steinbüchel et al., 1999; Szymaszek et al., 2009; Szelag et al., 2011; Babkoff and Fostick, 2013, 2017; Fostick and Babkoff, 2013a). Accordingly, it has been postulated that patterning in a time window of some tens of milliseconds is controlled by a neural mechanism characterized by time limits of approximately 30 ms (Pöppel, 1994, 1997). This temporal ordering ability directly indicates the distinct nature of TIP.

Auditory Temporal-Order Threshold (ATOT) can be used as an index of temporal acuity (i.e., the efficiency of identifying event order) and can be measured using a TOJ paradigm. ATOT is defined as the shortest time gap (in milliseconds) between two sounds presented in rapid succession, typically an Inter-Stimulus Interval (ISI) of some tens of milliseconds, that is necessary to identify their before-after temporal relation with at least 75% correctness (Szelag et al., 2011, 2014, 2015b; Bao et al., 2013, 2014). An auditory TOJ paradigm may employ various measurement procedures. Subjects may be presented with a sequence of tones of different frequencies delivered monaurally or binaurally (Ben-Artzi et al., 2005, 2011; Bao et al., 2014), sequences of two or four clicks, tones, or syllables (Ulbrich et al., 2009), or identical auditory stimuli (e.g., tone bursts or clicks) presented monaurally with a difference in the time of arrival at the left and right ears (Fink et al., 2005, 2006; Szymaszek et al., 2009; Szelag et al., 2011; Bao et al., 2013, 2014). Accordingly, in the spatial TOJ task two identical stimuli are presented monaurally in an alternating presentation mode, and the task is to identify the ear to which the first stimulus was delivered and the ear to which the second was delivered (left-right or right-left). In contrast, in the spectral TOJ task, two different stimuli (e.g., high and low tones) are presented binaurally, and the task is to indicate the order of their occurrence (high-low or low-high). It should be stressed that, in addition to temporal processing, these two TOJ tasks also involve task-specific perceptual processes, which are the topic of the present study.
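To make the 75% criterion concrete, here is a minimal sketch assuming a logistic psychometric function for a two-alternative TOJ response (left-right vs. right-left, or low-high vs. high-low), where guessing yields 50% correct. Under this parameterization the 75% point coincides with the location parameter `alpha`. The functional form and the values `alpha = 60` ms and `beta = 10` ms are illustrative assumptions, not estimates from the study.

```python
import math


def p_correct(gap_ms, alpha=60.0, beta=10.0):
    """Two-alternative psychometric function: 0.5 at chance, rising toward 1.0.

    alpha, beta: hypothetical location and slope parameters (ms).
    """
    return 0.5 + 0.5 / (1.0 + math.exp(-(gap_ms - alpha) / beta))


def gap_at(p_target, alpha=60.0, beta=10.0):
    """Invert p_correct: the gap (ms) giving p_target correct, 0.5 < p_target < 1."""
    return alpha - beta * math.log(0.5 / (p_target - 0.5) - 1.0)


# Under these assumptions the ATOT (75% point) is simply alpha.
atot_ms = gap_at(0.75)
```

Other sigmoid forms (e.g., Weibull or cumulative Gaussian) are equally common; the threshold then has to be read off at 75% rather than coinciding with a single parameter.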

Starting from the seminal papers by Hirsh and Sherrick (1961) and Efron (1963), measures of temporal resolution have been widely applied in experimental studies to assess millisecond timing efficiency in both normal subjects and various clinical subgroups. Hence, reliable measurement procedures are essential for drawing sound conclusions about a subject's information processing.

The basic and still open question is: how do our brains process temporal information in this time domain? According to the hypothesis proposed by Pöppel (1997, 2009), visual or acoustic stimuli processed within a time window of less than ca. 30 ms are treated as co-temporal or a-temporal. Thus, their before-after relation cannot be established. For healthy young subjects to perceive the temporal order of two distinct events correctly, the minimum delay between these stimuli must exceed ca. 30 ms.

Several authors claim that one central mechanism which samples time discretely is responsible for the assessment of temporal order both within and across sensory systems. Evidence supporting such a hypothesis comes from experimental studies indicating similar threshold values both across sensory systems and within sensory modalities, including the auditory system. The main evidence for a central mechanism was provided by Hirsh and Sherrick (1961). They found the same threshold of 17 ms for temporal ordering in different sensory systems, as well as for cross-modal comparisons. Other authors have also found evidence for this central mechanism hypothesis, e.g., Mills and Rollman (1980), Pöppel (1994, 1997), Wittmann (1999), Szelag et al. (2004a), Babkoff et al. (2005), Ben-Artzi et al. (2005), Fink et al. (2005, 2006), Bao et al. (2014). These studies, which employed various types of sensory stimuli and procedures in healthy young controls, indicated thresholds for temporal ordering between 20 and 60 ms. Furthermore, some of these authors reported unique response patterns produced by different TOJ paradigms (Fink et al., 2005, 2006; Szymaszek et al., 2006, 2009; Szelag et al., 2011; Bao et al., 2013, 2014; Fostick and Babkoff, 2013b).

In more recent papers using different variants of experimental TOJ tasks and various subject subpopulations, evidence has suggested that TOJ at the millisecond level may be influenced by various procedural and subject-related factors, the most important of which seem to be the type of stimuli used, presentation mode, age, cognitive status, and gender, as well as neurodevelopmental or neurodegenerative disorders (for an overview, see von Steinbüchel et al., 1999; Wittmann and Szelag, 2003; Szelag et al., 2004b, 2010, 2011, 2015a,b; Szymaszek et al., 2009, 2018; Teixeira et al., 2013; Matthews and Meck, 2014; Oron et al., 2015). Existing studies have also confirmed individual differences in TIP at this processing level in healthy volunteers of various ages (Szymaszek et al., 2009; Szelag et al., 2011; Bao et al., 2013, 2014).

Of course, in a given experimental situation, subject-related factors co-exist with procedure-related influences. However, the relations between these complex factors, which are critical for measuring an individual's temporal resolution, are still an open question. Furthermore, their neural basis remains poorly understood in psychology and neuroscience. One of the problems in these studies is clarifying the degree to which the applied paradigms are sensitive to pure temporal processes as opposed to stimulus-related, procedural, and other influences. As previous studies raise questions about the relationships between different paradigms, in this study we concentrate on the relationship between auditory spatial and spectral TOJ paradigms tested in the same subject pool with comparable procedures, also considering the test–retest repetition of measurements in consecutive sessions.

There are considerable data in the literature indicating an age-related decline in temporal resolution in the millisecond domain (e.g., Fitzgibbons and Gordon-Salant, 1998; Fink et al., 2005; Kołodziejczyk and Szelag, 2008; Ulbrich et al., 2009; Fostick and Babkoff, 2013a). This has been interpreted as part of the general deterioration of mental functions with advancing age, even in healthy elderly individuals who do not suffer from any neurodegenerative problems (e.g., Szelag et al., 2010; Nowak et al., 2016). One challenge for recent TIP studies has been to learn how the procedures used influence temporal acuity in different TOJ tasks. This topic has been explored mostly in young volunteers. For example, the recent meta-analysis by Fostick and Babkoff (2017) compared ATOT values obtained using auditory spectral vs. spatial TOJ tasks. This comparison was based on the threshold distribution characteristics of 388 subjects tested in 13 spectral TOJ experiments and 222 subjects tested in 9 spatial TOJ experiments. However, all meta-analyzed experiments included only young individuals (university students) aged 20 to 34 years (Fostick and Babkoff, 2017; see the characteristics of the meta-analyzed participants provided in Tables 1, 4 of this report). Despite many existing studies on age-related decline in TOJ, no definitive account of procedure-related influences on ATOT values in the elderly has been established (Fink et al., 2005; Szymaszek et al., 2006, 2009; Ulbrich et al., 2009).

On the other hand, our previous study on auditory TOJ using both spatial and spectral tasks in listeners aged 20 to 69 years focused mostly on differences in mean ATOT values between age groups (Szymaszek et al., 2009), whereas direct between-task comparisons of ATOT distributions within age groups were not analyzed. A similar approach was taken by Fink et al. (2005). Furthermore, Fink et al. (2006) reported lower ATOTs in the spectral task than in the spatial task, but their subject pool comprised individuals aged 21 to 50 years analyzed as a single group.

## Aim of the Study

To learn more about procedure-related influences on temporal acuity in advancing age, in the present study we test the effect of spatial vs. spectral paradigms on the auditory perception of temporal order in a relatively large group of elderly listeners. We aimed to extend existing findings about procedure-related effects on temporal acuity to elderly listeners and to clarify whether the expected influences are similar to those reported in previous studies (Fink et al., 2005, 2006; Szymaszek et al., 2009). We directly compare, in the same sample of subjects, the response distributions obtained using spatial vs. spectral TOJ paradigms.

Identifying such procedure-related differences may increase our understanding of TIP in the elderly. We therefore ask the following three questions: (1) Do the obtained ATOTs differ between the spectral and spatial TOJ tasks? (2) What are the distributions of the subjects' data on these two tasks? (3) Do results on these two tasks have high test–retest reliability? Similar performance on the two TOJ tasks would support the hypothesis of a common timing mechanism that, in the elderly, operates independently of the task (spatial or spectral).

## MATERIALS AND METHODS

This study was approved by the local Ethical Commission at the University of Social Sciences and Humanities (permission no 1/2017, registered as 2 /I/ 16-17) and was in line with the Declaration of Helsinki. All participants provided their written informed consent prior to the study.

### Participants

We tested 40 elderly subjects (36 females and 4 males) aged from 62 to 78 years (M = 67.4, SD = 3.6). They were recruited from the Warsaw area by advertisements in newspapers, on the internet, as well as at Universities of the Third Age (U3A) and in various local community centers. All subjects were right-handed native Polish speakers. They reported no history of neurological or psychiatric disorders, head injuries in the past, systemic diseases, or the use of medications affecting the central nervous system. The abovementioned inclusion criteria were verified in a brief interview with each subject.

All participants were screened for normal hearing levels (American National Standard Institute, 2004) using pure-tone audiometry (Audiometer MA33, MAICO) at 250, 500, 750, 1000, 1500, 2000, and 3000 Hz, covering the frequency spectrum of the presented stimuli. To screen for dementia or depression, all participants completed the Mini-Mental State Examination (MMSE; Folstein et al., 2001) and the Geriatric Depression Scale (GDS; Sheikh and Yesavage, 1986) prior to the TOJ tasks. Inclusion criteria were a score of at least 27 points on the MMSE (M = 28.8, SD = 1.1) and a score of 5 or fewer points on the GDS (M = 2.5, SD = 1.5). All subjects reported having between 11 and 18 years of education.
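The screening thresholds can be written as an explicit eligibility check, which helps when the same criteria are reused across sessions or sites. This is a minimal sketch: the `Candidate` record and `eligible` function are hypothetical, the audiometry result is reduced to an assumed yes/no summary, and the other characteristics verified by interview (handedness, native language, medical history) are omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mmse: int             # Mini-Mental State Examination total (0-30)
    gds: int              # Geriatric Depression Scale score
    normal_hearing: bool  # passed pure-tone audiometry (assumed yes/no summary)


MMSE_MIN = 27  # "a score of at least 27 points on the MMSE"
GDS_MAX = 5    # "a score of 5 or fewer points on the GDS"


def eligible(c: Candidate) -> bool:
    """True if the candidate meets the hearing, MMSE, and GDS thresholds."""
    return c.normal_hearing and c.mmse >= MMSE_MIN and c.gds <= GDS_MAX
```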

These inclusion criteria allowed us to expect that the participants were in relatively good physical and mental health. It may be assumed, therefore, that they exhibited the level of cognitive functioning typical of normal healthy aging.

### Stimuli and Presentation Modes

As noted above, two TOJ tasks were used, which differed in both the type of stimuli and the stimulus presentation mode (Szymaszek et al., 2009; Bao et al., 2014; Nowak et al., 2016). Both tasks used paired acoustic stimuli presented in rapid succession. The stimuli were generated by a computer with a Realtek ALC3246 sound controller, using Waves MaxxAudio Pro software, and presented via Philips SHP8500 headphones at a comfortable listening level. The two stimuli within each pair were separated by an ISI, i.e., the time gap between the offset of the first stimulus and the onset of the second. The ISI varied during the experiment according to a pre-defined adaptive algorithm (described in more detail below).

#### In the Spatial Task

The presented pairs consisted of two rectangular pulses (clicks) of 1 ms duration each, which were presented monaurally in an alternating stimulation mode, i.e., one click was presented to one ear followed by another click to the other ear. The subject's task was to verbally report the temporal order of the two successive stimuli within each pair. Two alternative responses were possible: left-right or right-left.

#### In the Spectral Task

The presented pairs consisted of two 10 ms sinusoidal tones – a low tone of 400 Hz and a high tone of 3000 Hz. The rise-and-fall time of each tone was 1 ms. The two tones within each pair were adjusted to equal loudness on the basis of isophones. The binaural stimulus presentation mode was used, i.e., each tone pair was presented to both ears with various ISIs between the two tones in each pair (similar to the spatial task, see above). The subjects were asked to verbally report the temporal order of the two successive tones within each pair. Two alternative responses were possible: low-high or high-low. The experimental situation is displayed in **Figure 1**.
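The two stimulus types can be sketched as stereo sample arrays with NumPy. This is an illustration under stated assumptions, not the authors' stimulus code: the 44.1 kHz sampling rate, linear onset/offset ramps, column 0 as the left channel, and unit amplitudes are all assumptions, and the equal-loudness adjustment based on isophones is not modeled. The function names are hypothetical.

```python
import numpy as np

FS = 44_100  # sampling rate in Hz (assumption; not reported in the text)


def n_samples(ms, fs=FS):
    return int(round(fs * ms / 1000))


def click(duration_ms=1.0, fs=FS):
    """Rectangular pulse (click) of the given duration."""
    return np.ones(n_samples(duration_ms, fs))


def tone(freq_hz, duration_ms=10.0, ramp_ms=1.0, fs=FS):
    """Sinusoidal tone with linear rise and fall ramps (ramp shape assumed)."""
    n = n_samples(duration_ms, fs)
    t = np.arange(n) / fs
    env = np.ones(n)
    n_ramp = n_samples(ramp_ms, fs)
    env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
    env[n - n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
    return np.sin(2 * np.pi * freq_hz * t) * env


def spatial_pair(isi_ms, first="left", fs=FS):
    """Monaural alternating mode: click to one ear, silent ISI, click to the other ear."""
    c = click(fs=fs)
    gap = n_samples(isi_ms, fs)
    out = np.zeros((2 * len(c) + gap, 2))  # columns: left, right
    a, b = (0, 1) if first == "left" else (1, 0)
    out[:len(c), a] = c
    out[len(c) + gap:, b] = c
    return out


def spectral_pair(isi_ms, order=("low", "high"), fs=FS):
    """Binaural mode: both tones go to both ears, separated by the ISI."""
    freqs = {"low": 400.0, "high": 3000.0}
    first, second = (tone(freqs[k], fs=fs) for k in order)
    mono = np.concatenate([first, np.zeros(n_samples(isi_ms, fs)), second])
    return np.column_stack([mono, mono])
```

Note that the ISI here is inserted between the offset of the first stimulus and the onset of the second, matching the definition in the text; with 1 ms clicks and 10 ms tones, the same ISI therefore yields different onset-to-onset times in the two tasks.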

### Procedure

The experiment was conducted in a soundproof room at the Laboratory of Neuropsychology at the Nencki Institute.

To focus the participant's attention on the upcoming task, each pair of stimuli was preceded by a warning signal delivered binaurally 1 s before the first stimulus within each pair. Then, the paired stimuli were presented monaurally (in the spatial task) or binaurally (spectral task). After each presentation, subjects reported the order of the two stimuli in the presented pair.

Prior to data collection, each participant received verbal instructions from the experimenter and was then presented with a few practice trials consisting of pairs with a relatively long ISI. In these practice trials, feedback on correctness was given after each answer. All participants performed these practice trials satisfactorily. Then the actual measurement began, and no feedback on correctness was given.

We used an adaptive algorithm based on maximum likelihood estimation to measure the subjects' ATOTs in both tasks. The implementation of the algorithm for the elderly listeners studied here was based on reports by Treutwein (1997), Fink et al. (2005, 2006), and Wittmann and Szelag (2003), as well as on our previous studies (Szymaszek et al., 2009; Szelag et al., 2011; Bao et al., 2013, 2014; Nowak et al., 2016). The algorithm consisted of two parts. In the first part, the participant responded to 20 trials comprising paired stimuli presented with 10 fixed ISIs of varying durations. These were presented first in decreasing and then in increasing order (i.e., down and then up) according to pre-defined rules. The ISIs ranged from 160 ms to 1 ms (changing in 18 ms steps) in the spatial task and from 240 ms to 1 ms (changing in 27 ms steps) in the spectral task. These different testing ranges resulted from our previous observations indicating different order thresholds in these two tasks in elderly subjects.
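One way to reproduce the first-part schedule from the stated ranges is sketched below. With 18 ms steps from 160 ms, nine steps reach 16 ms and a tenth would fall below 1 ms, so this sketch clips the last level at 1 ms; the same holds for 27 ms steps from 240 ms. The clipping rule and the simple "descend, then retrace upward" ordering are our assumptions about the unpublished "pre-defined rules", and the function names are hypothetical.

```python
def fixed_isis(start_ms, step_ms, n_levels=10, floor_ms=1):
    """Descending ISI levels from start_ms in equal steps, clipped at floor_ms.

    Clipping the final level to the floor is an assumption about how the
    series reaches the stated 1 ms minimum.
    """
    return [max(floor_ms, start_ms - i * step_ms) for i in range(n_levels)]


def part_one_schedule(start_ms, step_ms):
    """20 trials: the 10 ISIs in decreasing order, then in increasing order."""
    down = fixed_isis(start_ms, step_ms)
    return down + down[::-1]


spatial_levels = fixed_isis(160, 18)   # spatial task range
spectral_levels = fixed_isis(240, 27)  # spectral task range
```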

After completion of these 20 trials, the program applied maximum likelihood estimation to the subject's responses to calculate the ISI corresponding to a 75% probability of correct responses, which served as the ISI of the initial trial in the second part of testing (Treutwein, 1997). In the second part, 50 trials were presented. In each of these trials, the ISI was adjusted adaptively: it decreased after each correct response and increased after each incorrect response. The exact decrement or increment was randomly selected from a pre-defined range that depended on the ISI being tested. To ensure accurate and precise assessment, decrements were 0.5–5% of the previous trial's ISI, while increments were 10–20% of the previous ISI. On the basis of all 70 completed trials (20 in the first part and 50 in the second), the ATOT value for each participant was taken as the mean of the estimated likelihood, calculated at the 75% probability level of correct responses (Treutwein, 1997).

TABLE 1 | Descriptive statistics of the ATOT values (in ms) in two consecutive sessions for the spatial (monaural presentation of paired clicks) and spectral (binaural presentation of paired tones) tasks.

The measurement was conducted with each subject individually in two separate sessions (Session 1 and Session 2), separated by a break of a few days. In each session, both the spatial and spectral tasks were completed. The order of tasks within each session was constant: first the spatial task was conducted followed by the spectral task. The TOJ measurement lasted approximately 10 min for each task. Each session lasted approximately half an hour.
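The two-part adaptive procedure described in the Methods can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' software: the logistic psychometric function with a 50% guessing floor, the slope grid, the simulated listener (true threshold 60 ms, slope 10 ms), uniform sampling of step sizes within the stated percentage ranges, and the clamp to 1–160 ms are all assumptions, and the final estimate is a plain grid-search maximum likelihood point estimate rather than the "mean of the estimated likelihood" used in the study. All function names are hypothetical.

```python
import math
import random


def p_correct(isi_ms: float, threshold_ms: float, slope_ms: float) -> float:
    """Hypothetical psychometric function for a two-alternative order judgment:
    50% guessing floor, and exactly 75% correct when isi_ms == threshold_ms."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(isi_ms - threshold_ms) / slope_ms))


def ml_threshold(trials: list[tuple[float, bool]], max_isi_ms: float) -> float:
    """Grid-search maximum likelihood estimate of the 75%-correct ISI.
    The slope is a nuisance parameter searched on a coarse grid (assumption)."""
    best_t, best_ll = 1.0, -math.inf
    for t in range(1, int(max_isi_ms) + 1):
        for s in (5.0, 10.0, 20.0, 40.0):
            ll = 0.0
            for isi, correct in trials:
                p = min(max(p_correct(isi, t, s), 1e-9), 1.0 - 1e-9)
                ll += math.log(p if correct else 1.0 - p)
            if ll > best_ll:
                best_t, best_ll = float(t), ll
    return best_t


def next_isi(isi_ms: float, correct: bool, rng: random.Random, max_isi_ms: float) -> float:
    """Adaptive step from the text: shrink by 0.5-5% after a correct response,
    grow by 10-20% after an error. Uniform sampling within each range and the
    clamp to [1, max_isi_ms] are assumptions."""
    if correct:
        isi_ms *= 1.0 - rng.uniform(0.005, 0.05)
    else:
        isi_ms *= 1.0 + rng.uniform(0.10, 0.20)
    return min(max(isi_ms, 1.0), max_isi_ms)


def run_session(true_threshold_ms: float, max_isi_ms: float = 160.0, seed: int = 0):
    """Simulate one session for a hypothetical listener: 20 fixed-ISI trials,
    then 50 adaptive trials starting at the first-part ML estimate."""
    rng = random.Random(seed)

    def respond(isi: float) -> bool:
        # Simulated listener with an assumed 10-ms slope.
        return rng.random() < p_correct(isi, true_threshold_ms, 10.0)

    step = (max_isi_ms - 1.0) / 9
    levels = [max_isi_ms - i * step for i in range(10)]
    trials = [(isi, respond(isi)) for isi in levels + levels[::-1]]
    isi = ml_threshold(trials, max_isi_ms)  # starting ISI for the second part
    for _ in range(50):
        correct = respond(isi)
        trials.append((isi, correct))
        isi = next_isi(isi, correct, rng, max_isi_ms)
    return ml_threshold(trials, max_isi_ms), trials


estimate, trials = run_session(true_threshold_ms=60.0, seed=1)
print(f"{len(trials)} trials, estimated 75% ISI = {estimate:.0f} ms")
```

With the asymmetric steps (small decrements, large increments), the adaptive track in this sketch tends to hover somewhat above the 75% point, while the likelihood fit over all 70 trials still targets the 75% ISI.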

### RESULTS

Thresholds were estimated for all participants for both TOJ tasks based on performance in Session 1 and Session 2. Because temporal information was processed from the onset of the first stimulus within a pair, and different stimulus durations were used in the spatial (1 ms) and spectral (10 ms) tasks, ISI values were converted to Stimulus-Onset Asynchrony (SOA) values to compare performance between the two tasks. Such a procedure has been applied in many previous reports (Fink et al., 2005, 2006; Ulbrich et al., 2009, see their Table 1), including our own studies (Szymaszek et al., 2009). SOA reflects the time between the onset of the first stimulus and the onset of the second stimulus within a pair and gives the ATOT values analyzed for each task and session (see Introduction for the definition of ATOT). For example, 1-ms clicks (monaural task) separated by an ISI of 60 ms give an SOA of 61 ms, whereas the same ISI of 60 ms between 10-ms tones (binaural task) results in an SOA of 70 ms. Therefore, the analyzed SOA values were obtained by adding the stimulus duration (1 ms for the spatial task or 10 ms for the spectral task) to the ISI at which there was a 75% probability of correct responses.
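The ISI-to-SOA conversion is simple arithmetic; a minimal Python sketch using the stimulus durations stated above (the names `STIMULUS_DURATION_MS` and `isi_to_soa` are illustrative):

```python
# Stimulus durations stated in the text, in ms.
STIMULUS_DURATION_MS = {"spatial": 1.0, "spectral": 10.0}


def isi_to_soa(isi_ms: float, task: str) -> float:
    """SOA = duration of the first stimulus + ISI (onset-to-onset interval)."""
    return STIMULUS_DURATION_MS[task] + isi_ms


print(isi_to_soa(60.0, "spatial"))   # 61.0
print(isi_to_soa(60.0, "spectral"))  # 70.0
```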

### Distribution of ATOTs in Spatial and Spectral Tasks

Examining the data obtained from individual subjects, we observed important differences in the distribution of ATOT values between the spatial and spectral tasks (**Figure 2**). In the former (**Figure 2A**), the data showed no significant deviation from a Gaussian distribution across subjects and sessions. In contrast, in the spectral task (**Figure 2B**), the distribution of ATOTs deviated significantly from the Gaussian and was skewed to the right (based on visual inspection, values of skewness and kurtosis, and results of the Shapiro–Wilk test; see the **Figure 2** legend for details). This dissociation between the data distributions of the two tasks was observed in both sessions.

### Comparison of ATOTs in Spatial and Spectral TOJ Tasks

Descriptive statistics of the ATOT values obtained in the spatial and spectral TOJ tasks are presented in **Table 1**.

This table shows that the ATOTs were, in general, lower in Session 2 than in Session 1 (reflecting better performance), independent of the task. However, the between-session differences were more pronounced in the spectral task than in the spatial task.

Because of the between-task differences in data distribution (explained above), to compare performance on the two tasks directly using parametric statistical analysis, we transformed the ATOT data by taking the square root, which brought the distribution of ATOT data closer to Gaussian. Such a transformation is recommended in the literature for right-skewed distributions.
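The effect of the square-root transformation on a right-skewed distribution can be illustrated with the usual sample skewness (G1) and excess kurtosis (G2) statistics. This is a sketch on hypothetical ATOT values, not the study's data; the function names are illustrative.

```python
import math
import statistics


def skewness(xs: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness (G1); positive = longer right tail."""
    n = len(xs)
    m, s = statistics.fmean(xs), statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excess_kurtosis(xs: list[float]) -> float:
    """Sample excess kurtosis (G2); approximately 0 for Gaussian data."""
    n = len(xs)
    m, s = statistics.fmean(xs), statistics.stdev(xs)
    z4 = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * z4 - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


# Hypothetical right-skewed ATOTs in ms (not the study's data).
atots = [30, 35, 40, 45, 50, 55, 60, 70, 90, 130, 200]
rooted = [math.sqrt(x) for x in atots]
print(f"raw:  G1={skewness(atots):.2f}  G2={excess_kurtosis(atots):.2f}")
print(f"sqrt: G1={skewness(rooted):.2f}  G2={excess_kurtosis(rooted):.2f}")
```

The square root compresses large values more than small ones, which shortens the right tail and lowers G1; a formal normality check such as the Shapiro–Wilk test would still be needed afterwards.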

Further statistical analysis was therefore performed using a two-way repeated-measures ANOVA with 'Task' (spatial vs. spectral) and 'Session' (1 vs. 2) as within-subjects factors. The significance level was set at p < 0.05, with Bonferroni correction applied to post hoc comparisons for the observed main effects and interactions. Effect sizes, indexed by partial eta squared (η<sup>2</sup>), are reported for all significant effects.

The ANOVA revealed a main effect of 'Session' [F(1,39) = 8.156, p = 0.007, η<sup>2</sup> = 0.173], qualified by a 'Session x Task' interaction [F(1,39) = 4.371, p = 0.043, η<sup>2</sup> = 0.101]. The main effect of 'Task' was non-significant. These relationships are presented in **Figure 3**. The interaction resulted from the different effect of 'Session' in the two tasks. In the spatial (clicks) task, ATOTs were relatively stable across the two consecutive sessions, and the difference between sessions was non-significant. In contrast, in the spectral (tones) task, ATOTs in Session 2 were significantly (p = 0.009) lower than in Session 1, indicating improved performance. Furthermore, the difference between tasks was significant only in Session 1 (p = 0.004) and non-significant in Session 2.
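The reported effect sizes can be checked against the F ratios. Because F = (SS<sub>effect</sub>/df<sub>effect</sub>)/(SS<sub>error</sub>/df<sub>error</sub>) and partial eta squared is SS<sub>effect</sub>/(SS<sub>effect</sub> + SS<sub>error</sub>), it can be recovered from F and its degrees of freedom alone. A minimal sketch (the function name is illustrative):

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio:
    eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


print(round(partial_eta_squared(8.156, 1, 39), 3))  # 0.173 ('Session')
print(round(partial_eta_squared(4.371, 1, 39), 3))  # 0.101 ('Session x Task')
```

Both values agree with the effect sizes reported above.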

### Reliability of TOJ Measurement in Two Consecutive Sessions

In addition, to verify the test–retest reliability of both spectral and spatial tasks, Pearson correlation analysis was performed.

The correlation coefficients (controlling for the subjects' age) between the transformed ATOTs (see above) in Session 1 and Session 2 reached statistical significance in both tasks, indicating reliable measurement of ATOT with the two TOJ tasks used in the present study. However, the between-session coefficient was higher in the spatial task than in the spectral task (r = 0.61, p < 0.001 vs. r = 0.40, p = 0.011, respectively). Furthermore, within-session correlations between the two tasks reached statistical significance only in Session 1 (r = 0.43, p = 0.006) and were non-significant in Session 2 (r = 0.30, p = 0.062).
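A test–retest correlation "controlling for age" is a first-order partial correlation. The sketch below computes it from three Pearson coefficients using the standard formula; the data are hypothetical values for five listeners, not the study's data, and the function names are illustrative. The significance of such a coefficient is usually tested with n − 3 degrees of freedom.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x: list[float], y: list[float], z: list[float]) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical values for five listeners (not the study's data):
# square-root-transformed ATOTs in each session, plus age in years.
session1 = [math.sqrt(v) for v in (45, 60, 72, 90, 110)]
session2 = [math.sqrt(v) for v in (50, 55, 70, 80, 100)]
age = [63, 70, 66, 75, 71]
print(f"test-retest r controlling for age = {partial_r(session1, session2, age):.2f}")
```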

### DISCUSSION

### Comparison of Spatial and Spectral TOJ Tasks

The main finding of the present study was the identification of differences in elderly listeners' performance on two TOJ tasks (spatial and spectral) that tap temporal resolution. Differences were observed both in the threshold distributions for these two tasks and in the ATOT values obtained in the two sessions. Based on the threshold distributions, we observed a dissociation in performance on these two TOJ tasks in subjects aged between 62 and 78 years. In the elderly studied in this experiment, results on the spatial TOJ task showed a Gaussian threshold distribution (**Figure 2A**) accompanied by relatively stable performance across the two sessions (**Table 1** and **Figure 3**). In contrast, the spectral task was characterized by a non-Gaussian distribution (**Figure 2B**) and a significant lowering of ATOT values in Session 2, indicating improved performance (i.e., a shorter gap being necessary to correctly order incoming stimuli; **Table 1** and **Figure 3**).

This result is in line with data meta-analyzed by Fostick and Babkoff (2017), which also indicated a task-related dissociation in the distribution of ATOT data. Specifically, a Gaussian threshold distribution was reported for the spatial task, whereas the spectral TOJ threshold distribution was skewed to the right. Furthermore, similar mean ATOTs were reported for the two tasks (78.21 ms vs. 78.34 ms, respectively), but the range of mean ATOTs in the spectral task (31.95–116.13 ms) was broader than in the spatial task (56.84–93.23 ms), reflecting higher variability in the former. Taken together with this meta-analysis, a similar pattern of task-related dissociation in the threshold distribution was evident in both elderly listeners (studied here) and young listeners (investigated in previous reports). This suggests that, despite the substantial age-related decline in temporal acuity documented in previous studies (see papers cited in the Introduction), the effect of task specificity on the threshold distribution remains relatively stable across the lifespan. The question is which processes may be responsible for such task-related differences in the elderly.

The perceived order of two stimuli presented in rapid succession may reflect not only the temporal template, but also task-specific processes as well as various stimulus- and procedure-related influences. Therefore, the above dissociation can also be explained in terms of differences in the very structure of the two tasks. Our data indicated that experimental factors which constrain the subject's responses additionally affect the measured ATOT values. The two tasks seem to be performed using different perceptual strategies implemented in auditory processing within the nervous system. Performance on the TOJ paradigm, rooted in auditory perception, may thus not reflect a timing mechanism free of procedure-related (non-temporal) influences. Despite the rapid presentation of stimuli with short ISIs in both tasks, there were important differences between them. In the spatial task, the two clicks were identical in all of their characteristics, including their frequency spectrum, and were delivered asynchronously, one to each ear. In contrast, the spectral task employed two tones differing in frequency (400 vs. 3000 Hz), which were delivered asynchronously to both ears. Each of these tasks involved task-specific strategies in addition to temporal processes.

Previous reports on auditory perception indicate that two tones of different pitches may be perceived as a single sound of rising or falling frequency (Micheyl et al., 2007; Deike et al., 2010; Selezneva et al., 2018). This hypothetical phenomenon of auditory streaming or frequency modulation reflects a specific sensory integration process within the auditory system, which is likely also involved in auditory TOJ. It may reflect a perceptual bias toward integrated perception of tones presented in rapid succession. Accordingly, a sequence of two tones might be perceived as a single frequency-modulated glide, either rising (low-to-high) or falling (high-to-low), removing the need to identify the first and second tone. In the literature, auditory streaming has been studied using tone sequences in both humans and animals, and several theories about its neural basis have been proposed (Hartmann and Johnson, 1991; Rauschecker, 1998; Micheyl et al., 2007; Snyder and Alain, 2007; Selezneva et al., 2018).

Because of these processes, the two tones within a sequence could be integrated into a single percept at short SOAs, so that their before-after relation could be identified from a frequency-modulated pattern (rising or falling) rather than from detection of the temporal order of separate stimuli. As a consequence, some individuals might report the order from such tone glides and thus circumvent the need to identify the first and second stimulus within a pair. Because there are probably individual differences in the ability to perceive auditory streaming, this may account for the higher variability of the result distribution in the spectral task than in the spatial task (see **Figure 2**). Judgments probably rely on this strategy to a greater extent in the spectral task, which seems to depend more than the spatial task on auditory streaming and frequency modulation (both of which are more effective at shorter SOAs). The spectral paradigm, therefore, seems to be more constrained by the perceptual strategies associated with tone processing. In contrast, the streaming strategy cannot be used in spatial auditory TOJ, because the spatial task employs clicks of identical pitch rather than tones differing in pitch.

To summarize, different perceptual strategies seem to be engaged in TIP when clicks or tones are used to order sounds presented asynchronously in rapid succession. The question remains about the relation between the threshold values in the spatial and spectral tasks. In our study, we found significant differences between tasks only in Session 1, resulting from better temporal acuity in the spatial than in the spectral task (**Figure 3**). This pattern is not in line with previous reports. In our previous study (Szymaszek et al., 2009), the spectral task yielded shorter mean ATOT values (76 ms) than did the spatial task (88 ms) in a group of 16 listeners aged from 60 to 69 years. A similar relationship was reported by Fink et al. (2006, see their Table 1), who found thresholds of 57.54 ms vs. 31.24 ms in the spatial and spectral tasks, respectively, in a sample of 50 participants (aged between 21 and 50 years). Moreover, neither of these studies reported session-related differences between tasks.

We suggest that this reversed relation between ATOTs could result from the shape of the data distributions obtained on the two tasks. In recent literature, some researchers have reported that a relatively large number of participants performed a spectral TOJ task (separating successively presented tones) at very short SOAs (Fink et al., 2006, Figure 1; Fostick and Babkoff, 2013b, 2017, Figure 1). Thus, the higher skewness of the spectral threshold distribution was mainly due to a large number of participants with ATOTs shorter than 20 ms. For example, Fink et al. (2006) reported that more than 20 of 50 participants (aged 21–50 years) had thresholds of 10–20 ms. In our study, as shown in **Figure 2B**, the number of participants with short SOAs was smaller. It is likely that many participants had higher ATOTs in the spectral task in this experiment because they were older (aged from 62 to 78 years) than the participants in the above reports.

The question is whether the perceptual constraints on timing mechanisms reported here are age-specific (i.e., characteristic of the elderly) or age-independent (i.e., evident across the broader lifespan). One possibility is that auditory streaming ability and frequency modulation decline in late adulthood, even though all participants had normal hearing levels. To our knowledge, however, no role of age in auditory streaming has been previously reported. Further studies are needed to explore how temporal and perceptual (non-temporal) processes jointly affect measured threshold values in particular age groups.

### Comparison of the Two Consecutive Sessions

ATOT values were generally lower in Session 2 than in Session 1, as reflected in the significant main effect of 'Session' (p = 0.007, see ANOVA results). This may have been due to training or adaptation to the task after two repetitions of the measurement separated by a break of a few days. This learning effect may correspond not only to improved TIP (i.e., temporal acuity of information processing), but also to changes in the concomitant non-temporal processes involved in the TOJ task discussed above. The possibility of improving TIP has also been reported in previous studies and is applied in new cognitive therapy methods based on the transfer of improvement from the trained time domain to a cognitive domain that was not trained during the intervention (e.g., Tallal et al., 1996; Szelag et al., 2015a; Szymaszek et al., 2018).

The most important result of our study was a strong dissociation in the magnitude of this learning effect between the two tasks, reflected in the 'Session x Task' interaction (p = 0.043). Whereas a marked improvement in Session 2 (p = 0.009, Bonferroni test) was found in the spectral task, the corresponding difference in the spatial task was non-significant (**Figure 3**). To explain this dissimilar effect of session, we refer again to the use of specific perceptual strategies based on frequency modulation and auditory streaming in the processing of tones, discussed above. We hypothesize that repeated measurement may foster better application of the auditory streaming strategy in the spectral task, an application that was likely limited in Session 1 by the novelty of the task.

The use of the aforementioned rising vs. falling two-tone glides, rather than the identification of consecutive tones, may result in significantly lower ATOTs in Session 2 than in Session 1. Because, in the elderly participants studied here, this perceptual strategy seems less helpful in the spatial task (identical clicks presented monaurally), the learning effect there was rather small (median ATOT of 90 vs. 85 ms in Sessions 1 and 2, respectively; see **Table 1**) and statistically non-significant, and it probably reflects mainly improved TIP. To summarize, we are of the opinion that the test–retest comparisons in the two TOJ tasks indicate both improved TIP and the use of an auditory streaming perceptual strategy in the elderly participants.

### Reliability of ATOT Measurements and Practical Implications for Future Studies

Pearson correlations between the ATOTs obtained in the two consecutive sessions in each task (controlling for the subjects' age) yielded moderate, significant coefficients. Such positive correlations between the two sessions seem important for understanding how our brains create the inner experience of time and whether we are equipped with a central millisecond timing mechanism (see below). They suggest that both paradigms studied here constitute reliable tools for the replicable assessment of sequencing abilities measured by auditory TOJ.

The correlation coefficients had rather moderate values in both tasks (i.e., r = 0.61, p < 0.001 vs. r = 0.40, p = 0.011 in the spatial and spectral tasks, respectively), which may suggest an influence of intra-individual variability on the measured indices of temporal resolution across the two sessions. Such intra-individual variability might result from the contribution of other cognitive processes (e.g., perception, attention, working memory, decision making) to TOJ. Specifically, the lower correlation coefficient and lower significance level on the spectral task than on the spatial task may be due to the involvement of the perceptual strategies associated with the auditory perception of tonal stimuli discussed above. The effect of these strategies seems to be more pronounced in the spectral task because of the specific processes (auditory streaming) involved in tone perception. We are of the opinion that such extra effects, which co-exist with TIP, may generate variability in addition to the intra-individual variability of TIP, which would explain the lower correlation coefficient and lower level of significance on the spectral task than on the spatial task.

Further support for the involvement of extra non-temporal processes in TIP comes from the within-session correlations between tasks, which reached statistical significance in Session 1 but not in Session 2. This may result from the involvement of purer timing processes in Session 1, which in Session 2 are constrained by additional non-temporal perceptual processes related to auditory streaming in the spectral task (see **Figure 3**).

The discussion about the reliability of ATOT measurement using these two paradigms may be important for future clinical applications of TOJ in comparisons between normal samples and various clinical subpopulations. As mentioned in the Introduction, many patient groups show deficient timing accompanying a decline in cognitive processes. Thus, accurate and reliable measurement tools for the assessment of TIP (including temporal resolution in the millisecond domain) are necessary to provide measurements for diagnostic purposes. The obtained indices should be considered with caution because differences between groups may not necessarily reflect deficient temporal acuity, but rather difficulties using the same perceptual strategies as the controls.

On the basis of the comparisons between the two sessions reported here, we suggest some practical implications for future studies to increase their validity. For the spatial task, we postulate that the evaluation of TIP efficiency should not be based on a measurement from a single testing session, because some learning or adaptation effects (although non-significant) were visible between Sessions 1 and 2 (see **Figure 3**). To reduce inaccurate assessments, the measurement should be repeated a few times in consecutive sessions so that mean ATOTs reflect temporal acuity more veridically. In contrast, for the spectral task, the significantly higher ATOTs in Session 1 than in Session 2, reflecting a strong learning effect (**Figure 3**), suggest that in future applications TOJ indices from particular sessions should rather be treated as separate normative data characteristic of a given test repetition.

The procedure-related differences reported here raise a further question: which TOJ task is more suitable for use in future studies? Given that the spatial TOJ task probably better reflects TIP without the additional influence of perceptual strategies, this task might be recommended for a more veridical assessment of individual TIP efficiency. Additional influences generated by non-temporal perceptual strategies associated with the auditory perception of the presented stimuli might blur the genuine timing properties.

### One Central Mechanism vs. Various Task-Dependent Mechanisms

The final problem to be considered here concerns the conceptualization of the processing mechanisms controlling the perception of succession and temporal resolution. As described in the Introduction, two distinct hypotheses on this issue exist in the literature. One assumes a central timing mechanism responsible for TOJ, independent of the paradigm used (Hirsh and Sherrick, 1961; Efron, 1963; Mills and Rollman, 1980; Pöppel, 1994, 1997, 2009; Wittmann, 1999; Szelag et al., 2004b; Babkoff et al., 2005; Ben-Artzi et al., 2005; Bao et al., 2014). The other suggests a paradigm-specific and strongly procedure-dependent mechanism controlling this ordering ability (Fink et al., 2005, 2006; Szymaszek et al., 2006, 2009; Szelag et al., 2011; Fostick and Babkoff, 2017).

To better understand the neural basis of temporal ordering, one should refer to the hypothesis of time windows for temporal integration mentioned in the Introduction (Pöppel, 1994, 1997). Some evidence supports the notion of temporally discrete information processing within time windows of some tens of milliseconds, which leads to a theoretical model of temporal ordering. The neural mechanisms responsible for temporal resolution seem to be based on neuronal oscillatory activity, as evidenced by electrophysiological studies indicating a periodicity of about 40 Hz (between 30 and 80 Hz; VanRullen and Koch, 2003; Oron et al., 2015; see also Benasich et al., 2008 for a review). Thus, one oscillation period lasts ca. 25 ms. According to Pöppel (1997, 2009), a before-after relation can be perceived only if the two stimuli occur within at least two successive oscillatory periods. Thus, to identify the before-after relation of stimuli presented in rapid succession, the stimuli must be separated by a time gap of some tens of milliseconds. There is strong evidence that spontaneous (or stimulus-triggered) gamma-band oscillations, presumably corresponding to ATOT, play an important role in human cognition (VanRullen and Koch, 2003).
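The period arithmetic above (period = 1/frequency) can be made explicit for the stated 30–80 Hz range. Reading "two successive oscillatory periods" as twice the period is an illustrative simplification for this sketch, not a formal model of Pöppel's proposal; the function name is hypothetical.

```python
def period_ms(freq_hz: float) -> float:
    """Duration of one oscillation cycle in milliseconds."""
    return 1000.0 / freq_hz


for f in (30.0, 40.0, 80.0):
    # Two successive periods: a rough time scale for the window discussed in
    # the text (illustrative simplification only).
    print(f"{f:.0f} Hz: one period = {period_ms(f):.1f} ms, "
          f"two periods = {2 * period_ms(f):.1f} ms")
```

At 40 Hz this gives 25 ms per period, and the 30–80 Hz range corresponds to periods of roughly 33 to 12.5 ms, consistent with an order threshold of "some tens of milliseconds".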

On the basis of our results, which indicate a clear dissociation in performance between the spatial and spectral TOJ tasks, one might conclude that these data do not support the hypothesis of a central mechanism controlling the ability to sequence stimuli, as evaluated with various stimuli and procedures. This conclusion, however, should be drawn with caution. Given that neuro-oscillatory activity probably constitutes the physiological basis for TIP in the millisecond range (see above), one may assume that the time limits of the underlying mechanism can be modified by different non-temporal processes, including the perceptual strategies in the spatial and spectral TOJ tasks used in this study. In the former case, performance seems to reflect more the efficiency of genuine timing, whereas in the latter case it may correspond to auditory streaming and frequency modulation occurring within the auditory pathway. Because each task evokes distinct processes for detecting temporal order, different absolute threshold values should be expected; hence, we cannot rule out the hypothesis of a single central mechanism underlying temporal resolution measured with behavioral methods. The ATOTs obtained in the spectral task therefore seem to reflect an interaction between genuine timing and cognitive processes related to the perception of auditory stimuli. The idea of temporal processes constrained by non-temporal, task-specific perceptual processes is consistent with both the mean ATOT values and the correlation coefficients (between- and within-session) obtained in our study.

Finally, we are of the opinion that temporal acuity and sequencing abilities are based on neuronal oscillatory activity. However, the absolute thresholds measured in auditory TOJ tasks are stimulus-dependent, procedure-related, and influenced by the perceptual strategies used by participants.

### AUTHOR CONTRIBUTIONS

fpsyg-09-02557 December 8, 2018 Time: 15:8 # 10

ES conceived and designed the study, analyzed and interpreted the data, wrote the manuscript, and was responsible for the final version of the manuscript. KJ and MP recruited the subjects and acquired, analyzed, and interpreted the data. AS and HB interpreted the data and wrote the manuscript.

### FUNDING

This research was supported by National Science Centre (NCN), Poland, grant number 2015/17/B/HS6/04182.

### ACKNOWLEDGMENTS

We thank Anna Bombinska for her technical assistance during the data collection phase.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Szelag, Jablonska, Piotrowska, Szymaszek and Bednarek. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Development of Temporal Concepts: Linguistic Factors and Cognitive Processes

#### Meng Zhang\* and Judith A. Hudson

Department of Psychology, Rutgers University, The State University of New Jersey, New Brunswick, NJ, United States

Temporal concepts are fundamental constructs of human cognition, but the trajectory of how these concepts emerge and develop is not clear. Evidence of children's temporal concept development comes from cognitive developmental and psycholinguistic studies. This paper reviews the linguistic factors (i.e., temporal language production and comprehension) and cognitive processes (i.e., temporal judgment and temporal reasoning) involved in children's temporal conceptualization. The relationship between children's ability to express time in language and the ability to reason about time, and the challenges and difficulties raised by the interaction between cognitive and linguistic components are discussed. Finally, we propose ways to reconcile controversies from different research perspectives and present several avenues for future research to better understand the development of temporal concepts.

#### Edited by:

Danielle DeNigris, Fairleigh Dickinson University, United States

#### Reviewed by:

Christoph Hoerl, University of Warwick, United Kingdom

Tilmann Habermas, Goethe-Universität Frankfurt am Main, Germany

Petra Hendriks, University of Groningen, Netherlands

> \*Correspondence: Meng Zhang zhangmeng0904@gmail.com

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 11 July 2018 Accepted: 19 November 2018 Published: 05 December 2018

#### Citation:

Zhang M and Hudson JA (2018) The Development of Temporal Concepts: Linguistic Factors and Cognitive Processes. Front. Psychol. 9:2451. doi: 10.3389/fpsyg.2018.02451

Keywords: temporal concepts, temporal language, conceptual development, language development, temporal perspective

### INTRODUCTION

Time is an essential dimension of the universe. The concepts of past, present, and future are important mental constructs for structuring experiences. We live in an ever-changing present, and our experience of past, present, and future keeps shifting (Harner, 1982). As adults, we have a dynamic and flexible temporal perspective, which allows us to organize experiences and navigate through time mentally, but when do children acquire the concept of time? Grasping the abstract idea of time is not easy. A concept of time depends on the acquisition of many time-related abilities, such as understanding and being able to talk about time, being able to distinguish the past, present, and future, and reasoning about the sequence of events. Researchers studying both cognitive development and language acquisition have investigated children's understanding of time. However, the findings from these lines of research are not consistent. Because understanding time is a multifaceted competence that draws upon various cognitive and linguistic faculties, reconciling research findings from these different perspectives will help further our understanding of the roles of cognition and language in understanding time. This paper reviews research on children's understanding of time, focusing on the cognitive and linguistic components involved in early development.<sup>1</sup> In particular, conflicting results from language development studies and studies addressing cognitive processes are discussed, as well as theoretical issues about the role of language in the development of time concepts. Because children's cognitive abilities and linguistic capacities are interdependent, practical issues about how to measure each component individually are also considered. Finally, directions for future research to resolve theoretical and practical issues are proposed.

<sup>1</sup>This review focuses on the emergence and development of temporal concepts from 2 to 6 years. However, children's temporal understanding becomes more refined and sophisticated after age 6. Researchers have investigated how older children use temporal knowledge to improve their understanding of transformations in various domains (Montangero and Pownall, 1996), recall the time of past autobiographical events (Friedman, 2004) and construct life story narratives (Köber and Habermas, 2017), and develop an understanding of historical time (Thornton and Vukelich, 1988; Reisman and Wineburg, 2008).

### CONFLICTING EVIDENCE FROM PAST RESEARCH

The limited literature investigating the emergence of temporal concepts comes from two research lines focusing on children's temporal language acquisition and their temporal cognitive processes, respectively. Psycholinguistic researchers claimed that the separation of event time from speech time indicates children's emerging concept of time and their usage of tensed verbs is evidence of a grasp of the basic distinctions between past, present, and future by age 3 (Weist, 1989). However, researchers focusing on temporal cognition concluded that 4- and 5-year-olds do not yet understand the distinctions between the past, present, and future properly (Friedman, 2003). What is the evidence for these conclusions and how can they be reconciled?

### Acquisition of Temporal Language

Time is encoded in language in many ways. Language is the primary medium through which notions about past and future events are transmitted (Harner, 1982). In English, many devices, such as aspect, tense, and temporal adverbs, are used to denote time and code time-related characteristics of actions (Klein, 2009). For example, aspect delineates the internal contour of the event itself, whereas tense and temporal adverbs denote the position of an event on a timeline. Developmental psychologists have argued that the emergence of temporal markers in children's language indicates changes in their understanding of time (Weist, 1989; Busby Grant and Suddendorf, 2011).

In tensed languages, three important points in time are encoded in speech (Reichenbach, 1947). Speech Time (ST) is the time point of the act of speech. Event Time (ET) is the time when the event occurred, and Reference Time (RT) indicates the speaker's temporal vantage point. The role of RT is particularly clear when it coincides with neither ST nor ET, as in the past perfect tense (e.g., in "Peter had gone," RT is between ET and ST) and the future perfect tense (e.g., in "Peter will have gone," RT is after both ST and ET). Based on Reichenbach's theoretical work and observations of language acquisition, Weist (1989) proposed a four-system model of children's temporal language development, with each system reflecting a different level of competence. The first system is the ST system, used by children from 12 to 18 months. Children's speech at this stage focuses on the here and now; it does not include tense, aspect, or modality. Between 18 and 24 months, children begin to use past tense to mark an event anterior to speech time and future tense to mark an event posterior to speech time. This corresponds to the ET system, in which ET is expressed separately from ST. Later, between 30 and 36 months, children start to use temporal adverbs to indicate when an event occurs, which corresponds to the restricted reference time (RT<sup>r</sup>) system. For example, a child might say, "Yesterday I was in Lodz" (Weist, 1989, p. 108). Compared to utterances from the ET system, e.g., "I was in Lodz," utterances from the RT<sup>r</sup> system contain both event time (i.e., past tense) and reference time (i.e., yesterday), and both are located relative to speech time. The last system is the free reference time (RT<sup>f</sup>) system, emerging between 36 and 52 months. Unlike in the RT<sup>r</sup> system, children are now capable of manipulating RT, ST, and ET to express a wider range of temporal configurations freely. They can use the temporal prepositions "before" and "after," perfect tenses, and even temporal clauses, for example, "While this one is playing [RT], that one will be playing [ET]" (p. 105). Weist believed that the separation of ET from ST (i.e., the use of tense) indicates an emerging concept of time and that reference to specific non-present time points (i.e., the use of temporal adverbs) indicates a developing temporal framework. The development from the RT<sup>r</sup> system to the RT<sup>f</sup> system relates to more complicated cognitive processes such as temporal decentering and relational reasoning.
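Reichenbach's three time points can be made concrete by treating each tense as a pattern of orderings among speech time (S), event time (E), and reference time (R). The Python sketch below is only an illustration of that idea; it is not part of Weist's or Reichenbach's own work. The names `TenseConfig`, `classify`, and `et_system_label` are hypothetical, the table covers just six English configurations using the commonly cited Reichenbach orderings, and the simple future is encoded as S < R = E, which is one of the readings Reichenbach allows. The `et_system_label` helper mimics Weist's ET system, which attends only to E relative to S.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenseConfig:
    """Hypothetical container for speech (S), event (E), and reference (R) times.

    Only the relative order of the three values matters; units are arbitrary.
    """
    speech: float
    event: float
    reference: float


def rel(a: float, b: float) -> str:
    """Return '<', '=', or '>' for the order of two time points."""
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="


# Assumed lookup table keyed by (E vs R, R vs S, E vs S), following the
# commonly cited Reichenbach orderings for a few English tenses.
TENSE_BY_ORDER = {
    ("<", "<", "<"): "past perfect",     # E < R < S: "Peter had gone"
    ("=", "<", "<"): "simple past",      # E = R < S: "Peter went"
    ("<", "=", "<"): "present perfect",  # E < R = S: "Peter has gone"
    ("=", "=", "="): "present",          # E = R = S: "Peter goes"
    ("=", ">", ">"): "simple future",    # S < R = E: "Peter will go"
    ("<", ">", ">"): "future perfect",   # S < E < R: "Peter will have gone"
}


def classify(cfg: TenseConfig) -> str:
    """Name the tense whose S/E/R ordering matches cfg, if it is in the table."""
    key = (rel(cfg.event, cfg.reference),
           rel(cfg.reference, cfg.speech),
           rel(cfg.event, cfg.speech))
    return TENSE_BY_ORDER.get(key, "not covered by this sketch")


def et_system_label(cfg: TenseConfig) -> str:
    """ET-system view: only E relative to S matters (past, present, or future)."""
    return {"<": "past", "=": "present", ">": "future"}[rel(cfg.event, cfg.speech)]


if __name__ == "__main__":
    had_gone = TenseConfig(speech=2, event=0, reference=1)
    will_have_gone = TenseConfig(speech=0, event=1, reference=2)
    print(classify(had_gone), "/", et_system_label(had_gone))
    print(classify(will_have_gone), "/", et_system_label(will_have_gone))
```

The contrast between the two functions mirrors the contrast in Weist's model: `et_system_label` collapses "had gone" and "went" into the same "past" label, whereas `classify` needs R to tell them apart.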

In support of Weist's model, studies focusing on children's natural language production reveal a haphazard use of inflected verbs from 21 to 22 months (Nelson, 1989). From 22 to 24 months, children develop a present-past-progressive system, which first reflects a contrast between now and not-now, and later takes on direction by specifically coding pastness (Nelson, 1989). However, production of verbs with visible tense marking is rare in 2- to 3-year-olds' spontaneous speech (Valian, 1991). Even when asked to imitate an adult's past-tense utterances, 2-year-olds with a low mean length of utterance (MLU, from 1.5 to 2.5 words) marked only 2% of verbs for past tense, and 2-year-olds with a high MLU (2.5 to 4.6 words) marked only 14% (Valian and Aubry, 2005). Instead of verb inflections, English conveys the future time of an action through a set of modal auxiliaries such as will, shall, may, must, and can. Research by Ames (1946) and Harner (1981) found that children first spontaneously produced words such as gonna and in a minute to denote the future at age 2; later they used is gonna/is going to predominantly when referring to an action that was just about to happen.

Children's elicited language production indicates that they are able to use past tense and future verb forms quite accurately by age 3. For example, Harner (1981) demonstrated actions within a short timescale (e.g., a doll went down a slide) to 3- to 7-year-olds and asked them Tell me about this one while pointing either to the toy that had completed the action or to an identical toy that always did the same thing; children were asked to describe either what the first toy had done or what the other toy was going to do. Three-year-olds were able to distinguish past and future actions, and the majority of their responses contained past tense (70%) and future verb forms (87%), respectively. Other researchers elicited children's temporal language in describing events over a relatively longer timescale by simply asking questions such as, What are you going to do tomorrow? and What did you do yesterday? Three-year-olds were able to answer the tomorrow question with the appropriate verb form, gonna; 4-year-olds were able to answer the yesterday question using a past tense verb (Ames, 1946; Busby and Suddendorf, 2005).

Language production data indicate that children also begin to use temporal adverbials between 2 and 3 years of age (Ames, 1946; Weist, 1989; Pawlak et al., 2006). Ames (1946) observed 1.5- to 4-year-olds' spontaneous language production and found that references to the present (today) emerged around 24 months, references to the future (tomorrow) appeared around 30 months, and references to the past (yesterday) appeared around 36 months. Similarly, a longitudinal study (Pawlak et al., 2006) found that children produced today and tomorrow earlier than yesterday. However, although young children are able to produce temporal adverbs in the appropriate sentence position, their actual temporal references may be inaccurate (Bloom, 1970; Busby Grant and Suddendorf, 2011). For example, parents rated their 3- to 5-year-olds' use of temporal terms such as yesterday and tomorrow as less appropriate than their use of more general terms such as now, soon, and later (Busby Grant and Suddendorf, 2011).

These findings suggest that although children are able to produce temporal terms at 2 and 3 years, their usage may not always be appropriate. Nelson (1991) proposed that very young children have a basic grasp of time and temporal language, but that their understanding is still limited compared with that of older children. This interpretation raises the question of what temporal components children understand when they first use temporal language. This question has not been fully addressed, but one approach has been to examine children's comprehension of temporal language independently of their production of temporal language.

Temporal language comprehension studies have shown that 2- and 3-year-olds understand how tense is used to denote the past and future, but not how temporal adverbs locate events in time more precisely. For example, Herriot (1969) found that 3-year-olds used inflections and modal auxiliaries to correctly identify past and future actions, even with novel verbs. He presented completed and not-yet-begun actions to children using movable toys (one at the starting point of an action, and the other at the ending point). A novel verb (e.g., gling) was used to describe the action, and children were asked, Which one is going to gling? and Which one has glinged? With even younger children, Valian (2006) demonstrated familiar actions such as tying shoes (e.g., one shoe tied and the other about to be tied) and asked children either, Show me the one I did tie or, Show me the one I will tie. Two-year-olds successfully distinguished the auxiliaries will and did for future and past actions. Adding temporal adverbials to the questions, such as before or already for the past and in a second or next for the future, improved 3-year-olds' performance, but not 2-year-olds' (Wagner, 2001; Valian, 2006).

Understanding how temporal adverbs are used to represent, localize, and organize events in time is more difficult than understanding how verbs denote completed or future events. Weist et al. (1991) compared children's understanding of sentences referring to past and future ET using only tense (e.g., The girl threw/will throw the snow ball) with their understanding of sentences in the RT<sup>r</sup> system, which use both tense and temporal adverbs (e.g., The girl will dance tomorrow/in a while). They found that children could parse the temporal relation coded in the ET system (using tense) at 2.5 years, much earlier than they could parse temporal relations coded in the RT<sup>r</sup> system (using tense and adverbs), which was not achieved until 5.5 years.

What makes sentences with both tense and temporal adverbs more difficult? One possibility is that younger children simply do not understand temporal adverbs and find sentences containing them confusing. To test this possibility, researchers examined children's understanding of common temporal adverbs, such as yesterday, today, and tomorrow. In contrast to results from language production research, results from comprehension studies suggest that children's grasp of yesterday (referring to the past) appears before that of tomorrow (referring to the future). Harner (1975) assigned toys to 2- to 4-year-olds to play with on successive days (i.e., yesterday, the testing day, and tomorrow) and asked children to show a toy from yesterday and a toy for tomorrow. She found that 2-year-olds barely understood either yesterday or tomorrow, 3-year-olds performed better on yesterday questions, and 4-year-olds understood both yesterday and tomorrow. Zhang and Hudson (2018) explicitly tested children's understanding of the relational underpinnings of yesterday and tomorrow. They presented 3- to 5-year-olds with pairs of pictures of objects showing visible changes of state (e.g., a carved pumpkin and an intact pumpkin) and sentences referring to an action on the target object (e.g., I carved the pumpkin yesterday or I'm gonna carve the pumpkin tomorrow). Children were asked temporal questions such as What does it look like now? Compared to Harner's task, this task requires not only an understanding of yesterday and tomorrow as distinct categories but also an understanding of the underlying temporal relations between the past and the present and between the future and the present. Consistent with other comprehension studies, they found that children answered questions about yesterday more accurately than questions about tomorrow.

Thus, there seems to be a lag between children's production of temporal language and their comprehension of relational temporal language. Language production studies showed that children begin to use tense around 2 years of age and temporal adverbs around 3 years. Language comprehension studies showed that children were able to parse temporal relations from tense around 2 to 3 years, but could not understand the temporal relations coded by temporal adverbs even at age 4 or 5. Production-comprehension asymmetries have been observed in many studies of children's language. Although comprehension is typically found in advance of production (Clark, 1995), delays in comprehension relative to production have been found in children's mastery of pronouns, scalar implicatures, aspect, deictic references, and other linguistic forms (Hendriks, 2014). For example, research showed that children produced pronouns (him, her) from age 2 or 3, but adult-like comprehension of pronouns was usually not found before age 6 (Sekerina et al., 2004).

Explanations for discrepancies between language comprehension and production come from four perspectives. First, the grammatical account claims that children's immature use of direction-sensitive syntactic constraints causes delays in comprehension. Production proceeds from an input of meaning to an output of optimal form, whereas comprehension proceeds from an input of form to an output of optimal meaning. Comprehension lags behind production because young children cannot compute the speaker's alternative perspective; they have to acquire a Theory of Mind or develop greater processing capacity to be able to compute constraints from both the hearer's and the speaker's perspectives (van Hout, 2007). Second, the interface account emphasizes the cognitive resources needed for processing and integrating linguistic knowledge, discourse, and situational information (Hendriks and Koster, 2010; Hendriks, 2014). Working memory and cognitive control are required to keep multiple interpretations in mind during comprehension (Reinhart, 2004). Third, the pragmatic account proposes that children lack pragmatic knowledge (e.g., knowledge of implicature, reference, deixis, and discourse structure); they may not yet be aware of the subtleties involved in using certain words or grammatical forms, which makes comprehension more difficult. Fourth, the comprehension delay may be due to the testing context (Grimm et al., 2011). Most comprehension tasks take place in non-naturalistic, highly controlled situations, and test sentences are often presented with minimal context. Children lack contextual knowledge of the testing situation that would support their comprehension of the presented sentences (Papafragou and Musolino, 2003).

With regard to the comprehension and production of temporal language, this asymmetry may occur because understanding the relations conveyed by temporal language requires not only a basic understanding of the meaning of temporal markers, but also the ability to mentally represent temporal relations between events, which may develop later. For example, a basic grasp of temporal markers may involve only discriminating among them (e.g., knowing that yesterday refers to something different from tomorrow) and a sense of their sentential distribution, which seems to be enough for young children to produce utterances with temporal markers. Bloom's (1970) case study showed that a 30-month-old was able to produce sentences with temporal terms such as today, next Monday, and last night in appropriate positions, but the actual temporal references of these adverbs were inaccurate, except for the term now. A mature understanding of temporal markers involves differentiating the past and the future in terms of sequence, causal relations, distance to the present, and so on. These distinctions require additional cognitive processes to represent events, to mentally manipulate event representations, and to map representations and relations onto linguistic expressions. With the development of these cognitive skills, children's temporal understanding becomes more refined and their temporal language production also becomes more accurate and precise, as Weist described in his RT<sup>f</sup> system, in which children are able to manipulate and express multiple temporal relations.

### Children's Temporal Representation of Events

Conceptualizing time and mapping language onto temporal constructs involve a number of cognitive processes, including event representation, memory, and reasoning. Children's implicit and explicit understanding of time derives from their representation of events, including the expected sequence of components within particular events, the representation of sequences of multiple events, and eventually, the localization of events in time (Nelson, 1991). McCormack and Hoerl (1999, 2001) distinguished two frameworks, perspectival and non-perspectival, that can be used for representing the temporal location of events. Temporal representations within a perspectival framework locate entities/events relative to one's own position/point of view, for example, describing an event as a number of days from the present. In contrast, representations within a non-perspectival framework locate entities/events independently of one's position/point of view, for example, describing an event by its calendar date. Concepts of the past, present, and future belong to the perspectival framework because the past and future are defined from the vantage point of the present. With a stable but ever-changing present, our temporal perspective is dynamic, and the contents of past, present, and future keep shifting (Harner, 1982). In this review, we focus on children's acquisition of the perspectival framework of time, specifically their acquisition of tense and temporal adverbials, as discussed in the previous section, and their ability to differentiate the past, the present, and the future, as discussed in this section.

Friedman and colleagues (see Friedman, 2003, 2005 for reviews) conducted many studies investigating children's representation of a temporal framework ordered with distinct categories for past, present, and future events. In most of these studies, children were asked to judge the past–future status and temporal distances of events in verbal and spatial (timeline) tasks. Friedman consistently found that children often confused the past–future status of events; they judged impending events as having occurred a short time ago and recent past events as belonging to the near future (Friedman et al., 1995; Friedman and Kemp, 1998). For example, most 4-year-olds responded "yes" to the question, "Is Halloween coming soon?" in the weeks after the holiday (Friedman, 2000). Friedman argued that this confusion comes from a distance-based process of temporal differentiation, in which distance to the present is a salient cue for children to locate and differentiate the time of events.

For more familiar daily events (such as waking up in the morning, eating breakfast, lunch, and dinner, and going to bed), Friedman (2002) found similar past–future confusion. When tested after breakfast and before lunch, about 75% of 4-year-olds judged that lunch would occur in the future, but only 50% correctly judged that breakfast had occurred in the past. Similar confusions in 3-year-olds' temporal judgments were reported by Tillman et al. (2017) in a study of children's representations of familiar events using a timeline. This limited ability to discriminate past–future status is not confined to events with cyclic patterns: Friedman (2003) found that 4-year-olds failed to judge the temporal locations of autobiographical events provided by their parents.

Busby Grant and Suddendorf (2009) tested children's temporal differentiation using separate past and future timelines. Three silhouettes of a person were placed at appropriate points along each timeline to indicate the passage of time. Children were told that a larger silhouette indicated further in the future (e.g., "this is a picture of a bigger person, like when you are going to be a bit bigger than you are now") and that a smaller silhouette indicated longer ago in the past (e.g., "this is a picture of a smaller person, like when you were a bit smaller than you are now"). Children were asked to locate daily, annual, or remote events along the timeline. For example, the experimenter showed children a picture of a toothbrush and asked, "when did you last clean your teeth? Was it a little time ago, a long time ago, or a really long time ago? Point to where you think it should go." Three-year-olds discriminated the times of past events but failed to discriminate the times of future events. Four-year-olds performed well for past events and differentiated daily events from more remote future events. Five-year-olds differentiated both past and future events across all temporal distances. Hudson and Mayhew (2011) also found that after age 5, children were equally accurate in locating past and future events on a timeline. They showed children pictures of events, depicting either someone else (e.g., "This girl is going to the dentist tomorrow") or themselves (e.g., "When did you go to Sari's birthday party?"), and asked them to place each picture on a timeline made of rectangles representing days. They likewise found that, regardless of temporal distance, a differentiated sense of the past seemed to emerge earlier than a differentiated sense of the future.

The findings from this line of research suggest that children's ability to distinguish between past and future events is not as firm as would be expected from studies of temporal language comprehension and production. One explanation for this discrepancy between findings from language-based and timeline-based studies of children's temporal understanding is that in temporal judgment studies using spatial representations of time, the distance of events to the present is very salient. Young children may focus on the distance of an event to the current time point without considering whether events have already happened or have yet to happen. Another issue with the timeline methodology is that the direction of past and future and the scale of distance vary considerably across spatial tasks. For example, Friedman and colleagues (see Friedman, 2003, 2005 for reviews) used tasks in which time was represented as a road stretching ahead in front of the viewer; whereas other researchers (e.g., Busby Grant and Suddendorf, 2009; Hudson and Mayhew, 2011; Tillman et al., 2017) used horizontal time lines where time flowed from left to right. The variations in spatial representation of temporal direction and the saliency of temporal proximity to the present that are entailed by timeline-based measures are not an issue in language-based measures. This may contribute to the discrepant results from these two methods.

Judging and locating events on a timeline measures children's sequential representation of events, which is the cognitive foundation for other types of temporal reasoning, such as sorting out the relations between events in the past, the present, and the future. Research has shown that children understand basic sequential relations by age 3. For example, Carni and French (1984) told children stories about familiar events, accompanied by pictures of the events in the story, and asked them what happened before or after a specific action. They found that 3-year-olds reliably distinguished between the sequential relations of before and after in this highly supportive context.<sup>2</sup> Similarly, Fivush and Mandler (1985) presented children with pictures of familiar events, such as going to the supermarket, and unfamiliar events, such as going parachute jumping. After viewing all the pictures carefully, children were asked to put randomly ordered pictures in sequence. They found that 4-year-olds were able to reconstruct the temporal sequences of many familiar events. In general, forward temporal reasoning is easier for children than backward temporal reasoning (Tillman et al., 2015; Zhang and Hudson, 2018). Familiar events in forward order are the easiest to sequence, followed by unfamiliar events in forward order, familiar events in backward order, and finally, unfamiliar events in backward order (Fivush and Mandler, 1985).

Moreover, the temporal organization of an event is also a function of how well the mental representation of the event is encoded (Mandler, 1986). For events with a clear goal, outcome, and internal relationships, event representations are easier to establish, and the temporal sequences of event components are encoded automatically during initial construction. Causation is one internal relationship that connects events or event components. Physical causes precede their effects; therefore, causation inherently contains temporal sequence. Using an elicited imitation paradigm, Bauer and Shore (1987) and Bauer and Mandler (1989) showed that children as young as 2 years recalled events with causal relations better than those lacking causal relations, and that when causal relations were interrupted, children were still able to organize their recall around causal relations.<sup>3</sup>

A sense of the past and future involves not only judging events as belonging to the past or the future, but also understanding the conceptual relations between the past and future. For example, a past event, but not a future event, can physically affect the present state of affairs. The past, but not the future, can be known; the future, but not the past, can be altered. Although children's ability to reason about temporal and causal relations develops with age, 3-year-olds already understand that physical causes precede their effects (Gelman et al., 1980). The inherent sequence within causation contributes to children's understanding of the conceptual relations between the past and the present. Povinelli et al. (1999) presented children with videos and verbal descriptions of two past events in which they had just participated, such as hiding a puppet. Children as young as 4 years were able to find the puppet in its current location, indicating that they understood that very recent past events causally determined the present. With age, children's grasp of causal relations between past, present, and future becomes more flexible and applicable across different contexts. Busby and Suddendorf (2010) investigated children's temporal reasoning by describing two short vignettes to children: one about a character who acquired an object (e.g., a balloon) or knowledge (e.g., a name) in the past, and the other about another character acquiring that object or knowledge in the future. Children were asked which character currently possessed the object or knew the fact. They found that 5-year-olds were able to distinguish past and future changes in both physical and mental states. Friedman (unpublished, cited in Friedman, 2003) also reported that 6-year-olds could articulate the causal relations between the past and the present and between the future and the present. This conceptual understanding of the past and future in 6-year-olds correlated with their judgments of the past–future status of autobiographical events, supporting the idea that causal understanding underlies children's temporal reasoning.

<sup>2</sup>Although evidence of young children's understanding of before and after comes from this investigation of preschool children's performance in a script-based task and from observations of preschool children's spontaneous production of relational terms such as before, after, because, so, if, but, and or (French and Nelson, 1985), a flexible understanding of before and after, as tested by sentence comprehension tasks using more complex temporal clause structures (e.g., X before Y vs. before Y, X; Y after X vs. after X, Y), is not evident until age 12 (Pyykkönen and Järvikivi, 2012). More discussion of linguistic factors in the acquisition of the terms before and after can be found in Clark (1971) and Blything et al. (2015).

<sup>3</sup>Temporal-causal connections are also observed in children's personal narratives and stories, and children use temporal conjunctions (then, next, first, before, and after) to sequence actions within narratives (Hudson and Shapiro, 1991; Berman and Slobin, 2013). Because narrative production and story comprehension also depend on several other types of knowledge, such as an understanding of episodic structure (a story schema), content knowledge, and metalinguistic knowledge, this literature is not included in our review.

A crucial ingredient of temporal reasoning is the ability to envisage events from multiple temporal points of view, referred to as temporal perspective taking (McCormack and Hoerl, 2001). It allows individuals to switch back and forth between different vantage points in time, i.e., temporal decentering. In temporal reasoning tasks, children are often presented with events that happened at a given time point and are asked to reason about situations based on this information. Temporal decentering is involved because the question and the given information concern different time points. Children must retain the relevant information in memory, mentally travel from Time A, specified in the provided information, to Time B, and infer the effect or implication of that information for Time B. For example, to determine whether a character who is going to get a balloon tomorrow has a balloon now, children need to first decenter from the present and project themselves to tomorrow, when the character acquires the balloon, take the perspective of this time point, recognize that the question concerns a time before this point, then switch back to the present and respond to the question. Temporal perspective switching and temporal decentering are the keys to this temporal reasoning process.

Moreover, because temporal reasoning is based on a concept of time as a successive series of causally interdependent states, it plays an important role in many higher order cognitive processes, such as planning and problem solving. McColgan and McCormack (2008) examined 3- to 5-year-olds' temporal-causal reasoning in searching and planning. In their search task, children observed a puppet walking through a miniature zoo, passing different cages and taking a Polaroid picture at the kangaroo's cage. At the end of the visit, the puppet noticed that the camera was missing. While viewing the photo of the kangaroo, children were asked where in the zoo the camera might have been lost. In their planning task, the same scenario was used, and children were told that a puppet wanted to visit the zoo and take a picture of the kangaroo. Children were asked to pre-position the camera in the zoo so that the puppet could take the desired picture when passing by the kangaroo's cage. To make an appropriate choice, children had to combine knowledge about the temporal order of events with causal evidence (in the search task, the photo of the kangaroo) or causal knowledge (in the planning task, that the camera is a prerequisite for taking pictures). Four- and 5-year-olds, but not 3-year-olds, succeeded in the search task. Only 5-year-olds performed well on the planning task, whereas 3- and 4-year-olds' performance was at chance. Using a closely matched control task requiring mere updating, Lohse et al. (2015) found that younger children succeeded on the control task but not on the search task. These findings indicate that temporal-causal reasoning is qualitatively different from simple updating. It seems to emerge at around 4 years of age and continues to develop from 5 to 6 years.

In summary, studies focusing on temporal language indicate that children are able to distinguish past and future at 2–3 years, but studies focusing on temporal cognition show that children at 4 and 5 years still display past–future confusion and are not capable of reasoning about the past and future until age 5. This controversy may relate to the different methodologies employed in each line of research; for example, production of tense was taken as an indicator of temporal concepts in psycholinguistic studies, whereas differentiation of past and future events and their effects was taken as evidence of temporal understanding in cognitive developmental studies. More importantly, however, the controversy draws attention to the mental processes involved in mastering temporal language and in making temporal judgments and inferences, and it raises crucial questions: Does children's early use of temporal language indicate temporal understanding? How much do temporal judgment and temporal reasoning tasks tell us about children's temporal concepts? With both being closely involved in conceptual development, how can we distinguish the mental processes underlying temporal language from those underlying temporal cognition? Furthermore, how can we tease apart linguistic and cognitive processes in temporal reasoning tasks? By addressing these questions, we can begin to disentangle the linguistic and cognitive components in the conceptualization of time. Theoretical issues concerning the role of language in the development of temporal understanding and practical issues concerning how to assess cognitive and linguistic components separately are discussed in turn below.

### IS LANGUAGE NECESSARY FOR THE VERY FORMATION OF TEMPORAL CONCEPTS?

Our concepts of time are abstract; they are primarily communicated via language. The relationship between language and concept formation, or cognition in general, has been discussed by many theorists, including Chomsky, Piaget, Whorf, and Vygotsky. Piaget and Vygotsky focused on the effect of language development on changes in thought. Both assumed that thought and language are distinct representational systems. Piaget (1968) held a cognitive determinism view. He claimed that children's grasp of word meanings changes with development and reflects underlying changes in thought. Language is necessary but not sufficient for the construction of logical operations; both language and logical operations depend on non-linguistic intelligence. The intellectual unfolding of children's minds sets the pace for their language development. Vygotsky (1962) emphasized the interaction between language and thought. He proposed that language augments children's prelinguistic cognitive abilities; it gives children control over their own mental processes, such as directing attention, selecting a course of thought, and formulating mental plans. Vygotsky also emphasized the impact of social interaction and cultural symbol systems on language and cognitive development. Taking a Vygotskian perspective, Nelson (1991) argued for mutual influences among language, world knowledge, and the sociocultural context. She considered language and cognition to be interactive systems, with cognitive development inseparable from language. The interdependency between cognition and language is especially salient in children's acquisition of temporal concepts.

What role, then, does language play in constructing temporal concepts? Is language necessary for the very formation of these concepts and not merely for their expression? Do pre-linguistic children have some basic temporal understanding? Although researchers (O'Connell and Gerard, 1985; Bauer and Mandler, 1992) have found evidence of sequential understanding in 11-month-old infants using an elicited imitation paradigm, nonlinguistic concepts of past and future are very difficult to assess. Nelson (1989) proposed four logical possibilities with respect to the relation between the linguistic expression of time and the mastery of time concepts: (1) concepts of past, present, and future are innate and will be expressed in language when language development has reached a particular level; (2) concepts of past, present, and future are an inherent part of the human conceptual system, but this system matures independently of linguistic development; (3) concepts of past, present, and future are constructed, and temporal language may facilitate the construction of temporal systems by flagging potential distinctions, although the concepts are not wholly dependent upon linguistic expression; and (4) concepts of past, present, and future depend upon linguistic expression for their construction.

Nelson's (1989) longitudinal study of the pre-sleep monologs of a 2-year-old child named Emily provides data supporting the view that temporal concepts are constructed in response to linguistic coding (possibility 3). Linguistic coding of temporal concepts and relations emerged relatively late in Emily's speech, but it correlated with the development of many related notions, such as far and near, past, future, general event knowledge, frequency, contingency, and possibility. Further, many temporal adverbs, prepositions, and conjunctions appeared simultaneously in Emily's speech, which, according to Nelson, helped build a system of mutually defining temporal and causal relations and guide the acquisition of temporal concepts. These findings suggest that temporal language facilitates the construction of temporal systems. Moreover, compared to relative concepts of time, such as temporal perspectives (past, present, future), temporal sequence, duration, and speed of events, arbitrary concepts tied to conventional time systems, such as seasons, months, days of the week, and hours, require direct teaching by the language community (Nelson, 1991). In other words, children need explicit discussion and teaching from adults to acquire the meanings of such lexical terms. For example, Tillman and Barner (2015) found that preschoolers had little to no knowledge of the absolute durations encoded by duration words (e.g., second, minute, hour, day). This knowledge appears to be learned when children acquire the formal definitions of the words.

However, many commonly used temporal terms, such as morning, afternoon, night, yesterday, and tomorrow, are not directly taught to children. How do children learn these? Everyday communication between parents and children often contains a variety of temporal terms, for example, "Tomorrow we're going on a trip" or "Remember last week we were at grandma's house." These temporal terms (e.g., tomorrow, last week) refer to pseudo-objects whose meanings are not initially clear to children. They may first serve as placeholders that carry little semantic content but are strongly associated with specific contexts, namely the situations in which parents have used the terms. Children hold basic representations for the placeholders, for example, a rough idea of the domain referred to by temporal terms and of where temporal terms occur in a sentence. At this point, children acquire the forms of words from the discourse context but with little conceptual underpinning. Their early use of temporal terms is limited to the associated contexts and is often inaccurate. For example, they may produce sentences with temporal adverbs (e.g., yesterday, tomorrow) in appropriate sentence positions but refer inaccurately to time points (Bloom, 1970). In other words, the reference of their temporal linguistic expressions does not match the actual event time that they intend to express. Nelson (1991) called this "use before meaning"; it is consistent with Vygotsky's account of language acquisition in which "grammar precedes logic" (Vygotsky, 1962, p. 127). Thus, early use of temporal words is not necessarily evidence of early temporal understanding.

Parents' feedback and children's own experience of events allow them to update and refine the meanings of the linguistic forms. As the contexts in which temporal language is used accumulate and diversify, children's grasp of temporal terms gradually becomes decontextualized, and they can generalize the terms to novel situations. During this process, temporal language facilitates the construction of conceptual temporal systems by introducing new ideas and flagging potential distinctions, such as using the term yesterday to refer to any not-now event. At the same time, children's level of cognitive ability affects how much they benefit from hearing and using temporal language (Sachs, 1983). For example, Nelson (1977) observed a 3-year-old who reversed the order of past events, describing the most recent event first, the next most recent event second, and so on. This narrative pattern indicates the cognitive difficulty of decentering oneself to a non-present point and following the temporal sequence from there. Children's cognitive readiness to switch temporal perspectives flexibly, and to coordinate and manipulate mental representations of events, affects their use and understanding of temporal language in narrative discourse.

How much conceptual understanding of time can be inferred from children's natural language production? Nelson (1989) argued that appropriate production of temporal terms might not indicate genuine understanding. Children may use the terms meaningfully in a subset of the contexts where adults use them, or simply copy adults' usage in a particular context. For example, in her pre-bed monologs, Emily used the expression "just a minute" in connection with her father rocking her in the crib ("Daddy came in just a minute and rocked me," Nelson, 1991, p. 303). At that time, she used this expression only in this context, which was the same context in which her father used it (he usually responded to Emily's request to be rocked with "I will rock you for just a minute," Nelson, 1991, p. 303). Because of the strong association between "just a minute" and the crib-rocking context, Emily's production of the phrase was not underpinned by a genuine comprehension of its meaning (either a duration of 60 s or "a little while" in general). Contexts that entail children's active involvement or interest (e.g., Emily's desire to be rocked in the crib), as well as repetitive interactions associating the context with a small set of temporal terms, seem to incubate the production of those terms. Such production is context dependent; it is an important midpoint on the continuum of concept mastery from "not at all" to "full command."

For these reasons, researchers should be cautious in making conceptual inferences from language production data. For example, whether children's initial use of the past tense encodes ordered time relations or aspectual features is under debate. The aspect-before-tense hypothesis holds that children initially use the past tense to mark the completedness of an action, not the time of the action (Bronckart and Sinclair, 1973; Antinucci and Miller, 1976). On this view, children cannot be said to understand the notion of pastness until they use the past tense for both continuous, non-goal-oriented actions and completed, goal-oriented actions. However, other researchers provided evidence suggesting that English-speaking children were able to use the past tense to refer to a variety of past events, not just to goal-oriented ones with completive aspect (Kuczaj, 1977; Di Paolo and Smith, 1978; Sachs, 1979). They also argued that despite children's earliest tendency to use past forms in their own speech to signal a "present completedness of a past action," children might understand references to past events in the speech of others (Harner, 1982, p. 153). Inferences about children's understanding of the concept of past should be based on analyses of both their production and their comprehension of tense, in conjunction with the action types involved (goal- vs. non-goal-oriented).

Research directly addressing the role of language in forming concepts of time is very limited, but the influence of language has been addressed for many other aspects of conceptual development. For example, count nouns are considered "invitations" to children to form categories (Waxman and Markow, 1995). They serve as labels for concrete objects (or sets of concrete objects) and help children form theoretical kinds in mind (Gelman and Coley, 1991). Researchers (Waxman, 1991, 2004) believe that language helps children establish conceptual organizations such as categorical hierarchies. For young children, nouns highlight higher-order category relations (e.g., animal, plant), and adjectival phrases mark specific, lower-order distinctions (e.g., edible mushrooms, poisonous mushrooms). Most of the word-learning literature focuses on the mapping process between a conceptual category and its linguistic label. Several conceptual bases or initial constraints, such as the whole-object, taxonomic, and mutual exclusivity assumptions, have been shown to be useful in solving the inverse problem of mapping (Markman, 1991). Beyond categorization, language is an important instrument for children's acquisition of relational concepts. Common labels for relational roles (e.g., daddy, mommy, baby), relational verbs (e.g., buy and sell, come and go), relational adjectives (e.g., high and low, more and less), and even names for relations (e.g., same and different) provide representational tools that turn a restricted, implicit understanding of relations into a more powerful, explicit one (Gentner, 2003; Christie and Gentner, 2014).

Similarly, children begin to produce no and not between 15 and 27 months, but their grasp of the full range of these words' meanings, including their function as a logical operator that flips the truth value of a proposition, comes later (Feiman et al., 2017). This lag echoes the one between production and comprehension of temporal terms, and it is also evident in children's acquisition of mental state words. Researchers investigating the connection between mental state words and the development of Theory of Mind (ToM) (Nelson, 1996a; de Villiers and de Villiers, 2003, 2014) noticed that children start to use language about mental states, such as verbs of desire, belief, and knowledge, at age 3, around the same time that they show an ability to monitor others' mental states (Bartsch and Wellman, 1995). Although children's use of mental state terms may not carry the same meanings that adults attach to them, having labels for abstract mental states and being able to talk about minds make their representations of mental states more portable.

We can draw three important parallels between children's acquisition of negation terms, mental state terms, and temporal terms: (1) these terms do not refer to concrete objects; (2) children usually produce these words before they fully understand them; and (3) children's understanding is affected by context and pragmatic factors. For example, negative sentences are hard for children to process only when they are pragmatically infelicitous (Nordmeyer and Frank, 2015; Reuter et al., 2018). de Villiers and de Villiers (2014) suggested that more conversation in rich social contexts allows the meanings of mental state words to emerge. Nelson (1996a,b) also emphasized the role of context in acquiring meanings for words referring to abstract entities. Children learn to use abstract words in contexts where others use them. By using and interpreting words for abstract entities within their representations of familiar situations, children form a preliminary understanding of these words. As contexts and experiences accumulate, children's understanding is refined and becomes connected to other representations in a developing conceptual network. At that point, their understanding is stable, decontextualized, and conceptual. Nelson provided an insightful perspective on the constructive function of language, but she also proposed that concepts are not wholly dependent upon linguistic expression: there must be some prelinguistic representations onto which language can be mapped. The role of language in constructing abstract concepts in general, and in building temporal concepts in particular, needs to be addressed through further theoretical discussion and empirical investigation.

### HOW TO TEASE APART AND MEASURE COGNITIVE AND LINGUISTIC COMPONENTS?

Although language and conceptual development are interrelated in many domains, the connections are especially important and complicated for the concept of time. As a fundamental dimension of the universe, time is very abstract. Unlike number and space, it is difficult to instantiate with concrete entities. This makes language a crucial symbolic system for its conceptual representation. At the same time, time itself is a conceptual tool for measuring change and organizing experience. Children's temporal understanding develops in parallel with cognitive development and language development; it is also constructed through the interaction between cognitive processes and linguistic capacities. To better understand the developmental trajectory of temporal concepts, it is necessary to tease apart cognitive and linguistic components and measure them separately. However, devising paradigms that assess children's temporal cognition and temporal language separately poses practical challenges (McCormack and Hoerl, 2008).

First, tasks that test children's temporal language cannot easily avoid representational or reasoning demands. This issue is illustrated in research testing children's understanding of yesterday and tomorrow (Tillman et al., 2015; Zhang and Hudson, 2018). In Zhang and Hudson's (2018) now task, children had to answer the question "What does it look like now?" based on sentences referring to an event occurring yesterday or tomorrow. To respond correctly, children had to first decode the temporality indicated linguistically by the sentence and then parse the temporal relation between the referenced event and the present. Children's performance therefore reflected not only their understanding of yesterday and tomorrow, but also their temporal reasoning ability. Because forward temporal reasoning is easier than backward temporal reasoning, answering the now question given an event occurring yesterday ("I carved the pumpkin yesterday. What does it look like now?") was easier than answering the same question given an event occurring tomorrow ("I'm gonna carve the pumpkin tomorrow. What does it look like now?"). Similar effects were evident in the study by Tillman et al. (2015), who showed 3- to 5-year-olds pictures of increasing events (e.g., a flower growing) and decreasing events (e.g., a snowman melting) and asked them questions about yesterday and tomorrow. For example, for the event of a flower growing, they presented children with a picture of the flower today and asked them to select one of two alternative pictures to answer the question "What did the flower look like yesterday?" or "What will the flower look like tomorrow?" Children performed better on questions requiring forward temporal reasoning (i.e., from today to tomorrow) than on questions requiring backward temporal reasoning (i.e., from today to yesterday). Performance on both tasks was thus affected by the reasoning processes required.

It would be very difficult to completely eliminate reasoning or memory demands in tasks aiming to measure language ability, but researchers can be aware of the effects of cognitive demands and try to minimize them or test for them. For example, familiar settings and props can be used to reduce working memory and representational loads. Tasks can also be designed to test for the effects of the cognitive demands required. For instance, in studying children's understanding of yesterday and tomorrow, researchers can test and compare children's comprehension of the two terms across forward and backward reasoning settings.

Second, because temporal systems are abstract, we have to rely on language to express them, which makes it difficult for researchers to measure only the cognitive components of temporal understanding. Many temporal reasoning and representation tasks rely heavily on children's language comprehension. For example, Busby and Suddendorf (2010) investigated children's ability to infer current physical and mental states based on past and future events. Children were told stories, each describing two characters. In the possession stories, one character had acquired an object in the past, and the other was going to acquire it in the future. In the knowledge stories, one character had already acquired some knowledge and the other was going to acquire it. Children were asked "Which character has [the object]/knows [the knowledge] right now?" The stories were language heavy; each contained more than eight sentences, which required good language comprehension to understand, as well as good memory skills to keep all of the relevant information in mind. More importantly, understanding the temporal expressions in the stories (e.g., "Yesterday, Emma went shopping. When she went shopping she bought a new toothbrush" vs. "Tomorrow, Mindy is going shopping. When she is shopping she is going to buy a new toothbrush") is the key to success in this task. If children simply do not know the meaning of the temporal adverbs included (yesterday, tomorrow) and fail to parse or make use of the information in past-tense and future verb forms, they would likely perform poorly. Therefore, their poor performance in this task could be due to an incorrect understanding of temporal expressions rather than to an inability to perform temporal reasoning. Results from this study showed that 4-year-olds' performance was close to chance level. In a follow-up study, the authors simplified the stories by removing the temporal adverbs and adding the auxiliaries did and will (e.g., "Emma went to the beach. She did take some shells home from the beach" vs. "Mindy is going to the beach. She will take some shells home from the beach") and found that 4-year-olds' performance improved significantly (above chance). This indicates that the way information is presented in language, and children's comprehension of that linguistic information, directly affect their temporal reasoning performance.

When studying temporal reasoning and representation, the use of language often cannot be avoided. To minimize language demands, future research on temporal reasoning or judgments can make better use of pictures, props, and live or video demonstrations. For example, visible changes of objects over time (e.g., agents moving, plants growing) can be illustrated with pictures or demonstrations together with linguistic descriptions. Such contextual and visual support may give children alternative routes to the cognitive components targeted by the tasks and reduce the demands on language as well as on memory. Researchers can also differentiate the events or scenarios used to study temporal concepts and relations in terms of familiarity. Familiar events can be used to detect the emergence of temporal reasoning and judgment abilities. Attention can be paid to whether children solve temporal reasoning or judgment problems based on their temporal cognitive skills or on their memory of scripts for familiar events; in this case, memory factors can be measured and partialled out in data analyses. Unfamiliar or novel events can be used to test the proficiency of temporal cognitive skills. If children can apply the skills they use for familiar events to novel events, this shows that they have developed temporal cognitive skills that are generalizable and transferable.

Third, because the tasks employed vary, testing children's temporal cognition often requires several cognitive processes of differing complexity. Research on children's temporal judgments has largely investigated three aspects: judgments about past–future status, judgments about the distance of past/future events from the present, and placement of events along a timeline. The cognitive processes involved in each of these judgments are quite different. Past–future status is a categorical judgment, which may require only a basic differentiation of the past and future. Temporal distance judgments are both categorical and continuous and require more cognitive processes, such as retrieving memory for the exact event time, representing conventional timeframes, and comparing the event time to the present within this mental timeframe. In general, past/future distance judgments are difficult for children; depending on the task, they may also require cognitive flexibility or inhibitory control. For example, in Friedman's (2003, 2005) studies, children were asked which of two cyclical events, Christmas or the child's birthday, had occurred longer ago. The fact that both events happened in the past and will happen in the future makes the task ambiguous. Children might not fully understand what the task is asking for and simply respond based on the distances of the events from the present. Further, the question itself is not straightforward; in real life, when an annual event is upcoming, one is rarely asked how long ago the previous occurrence took place. It is more cognitively adaptive to represent the upcoming occurrence as closer than to represent the previous one as farther away. To respond correctly, children had to attend closely to the question, inhibit the more salient representation, and switch to thinking about the distances of past events. Given the complexity of the task, Friedman's conclusion that children at age 4 or 5 still do not properly understand the distinction between the past and future calls for careful re-examination.

To provide children with a visual representation of time, timeline-based tasks have employed a variety of forms of spatial representation. Researchers have used horizontal lines from left to right (e.g., Busby Grant and Suddendorf, 2009; Tillman et al., 2017), sagittal lines stretching away from the viewer (see Friedman, 2003 for a review), a line made of rectangles indicating time units (Hudson and Mayhew, 2011), and lines with markers indicating direction and scale (Busby Grant and Suddendorf, 2009; Tillman et al., 2017). These variations make it hard to compare children's performance across studies, and they also raise interesting questions about the spatial representation of time. The limited research on mapping between time and space (Tillman et al., 2018) shows that children are initially flexible in their spatial representations of time and that most preschoolers do not spontaneously represent time as a line. Their spatial representation of time becomes increasingly automatic and conventionalized in the early school years.

Similarly, research addressing temporal reasoning has used a variety of stimuli and methods. In research focused on sequencing (Nelson and Gruendel, 1981; Fivush and Mandler, 1985; Bauer and Shore, 1987; Bauer and Mandler, 1989), children are shown pictures of an event and are later asked to arrange randomly ordered pictures in the correct temporal order. The extent of children's sequencing ability has been investigated by varying the types of events (e.g., familiar vs. unfamiliar; causal vs. arbitrary) and the manner of sequencing (e.g., forward vs. backward). Event representations, understanding of sequence, and memory are all required for reconstructing event sequences.

Another line of research has focused on children's ability to reason about temporal-causal changes, which requires cognitive abilities beyond event representation and sequencing. In this line of research, investigators examined children's understanding of time as a series of changes, specifically, their understanding of the causal pathway from the past to the present and the non-causal pathway from the future to the present. Friedman (unpublished, cited in Friedman, 2003) explicitly asked children about the effect of a past or future event on the present (e.g., "Michelle had a birthday party yesterday. Can she know all the presents she got? Why or why not?"). Busby and Suddendorf (2010) told children stories about characters who did or will get/know something and asked them who had the thing or knew the information now. Tillman et al. (2015) showed children an event unfolding, such as a flower growing or a snowman melting, and asked them to identify what the item looked like in the past (yesterday) or would look like in the future (tomorrow) based on their understanding of the event trajectory. The temporal-causal chain was especially important in McColgan and McCormack's (2008) search and planning tasks, in which children faced problems in contexts with many parameters and variables (e.g., the goal, the layout, the sequence, the direction, the time point). Temporal reasoning ability was necessary but not sufficient for solving the problems. Children also needed to properly represent the goal and structure of the problem, be aware of contributing factors, temporally decenter themselves to envision the steps that needed to be taken forward or backward, and integrate steps and situations, whether representational or imaginative, to make decisions. These studies differ in the complexity of the task context and therefore call on different levels of other cognitive skills, such as working memory, cognitive flexibility, inhibition, and causal reasoning, to work together with temporal reasoning skills. This is perhaps one reason why results vary even within temporal reasoning studies. Future research needs not only to disentangle cognitive and linguistic components in temporal understanding, but also to investigate the elements of each component more systematically. For example, a series of tasks could be designed with increasing complexity, from processing basic temporal information to coordinating temporal and non-temporal factors in making inferences. Careful controls and contrasts could then be applied across the series of tasks.

### IMPLICATIONS FOR FUTURE RESEARCH

Psycholinguistic research has contributed much to our understanding of how children acquire temporal markers in language, but it has not fully explained the conceptual changes driven or brought on by language development. Researchers focusing on temporal representation and reasoning often use tasks that depend heavily on other cognitive abilities and knowledge (e.g., memory, cognitive flexibility, knowledge of annual holidays). The strengths and limitations of these two lines of research suggest several directions for future research.

First, to better understand development, it would be helpful for researchers to delineate the mature state of temporal concepts. The nature of time is perplexing; fundamental debates about it exist in physics (e.g., whether time exists independently of physical spacetime events or is merely a relation reflecting the causal ordering of events, Lobo, 2008) and in philosophy (e.g., whether time is a series of events each being past, present, or future, or a series of events in which one is "earlier than" another, McTaggart, 1908). Although conceptions of time may vary, psychologists interested in the cognitive understanding of time need to specify the key properties of the temporal concepts under investigation. McCormack (2015) proposed three key properties of a mature concept of time: (1) time is linear and unidirectional, so it does not recur and cannot be revisited; (2) time is represented as unified, connected by before/after relations, with every time point systematically related to every other point; and (3) adults can think of time independently of events, that is, they can think about time points independently of events that have occurred or will occur. McCormack (2014) also hypothesized important developmental shifts in concepts of time, from concepts grounded in script-like representations of repeated events, to concepts with distinct categories (happened vs. not yet), to a mature concept of event-independent time. This speculative account provides a way of thinking about development and calls for empirical investigation.

Second, to capture emerging temporal concepts, studies focusing on language or cognitive processes need to adopt tasks that minimize cognitive demands for memory, inference, and inhibition. For example, instead of using verbally described vignettes in temporal language comprehension tasks, straightforward demonstrations of scenarios with child-friendly props could reduce cognitive loads and keep children engaged. Valian (2006) tested children's understanding of temporal language by demonstrating, and asking them about, the familiar action of tying shoes, which effectively minimized memory and representation demands. Other linguistic factors, such as the position of temporal words in a sentence and the telicity of the verbs presented in a task, should be unambiguous and well controlled. Another way to reduce task complexity would be to design tasks within well-known domains and based on events that are familiar to young children. For example, Friedman (1990) showed that children's temporal reasoning was content-dependent; they were able to arrange familiar daily activities in backward order, but could not do the same for novel events, which demanded greater memory resources and left fewer for inhibition (children had to inhibit their dominant response of reasoning in forward order). Once the initial starting point for temporal conceptualization is clear, researchers can explore the development of more advanced temporal reasoning by gradually increasing task complexity, for example, by including more temporal factors and inferential reasoning.

Third, multiple perspectives and various methods are needed to construct a full picture of conceptual development with respect to time. Previous research on psycholinguistics and cognitive processes has shed light on how children understand and reason about time, but more studies with well-controlled designs are needed to flesh out these two perspectives and to facilitate conversation between them. For example, future investigations could pay more attention to the contexts in which temporal terms emerge or in which new temporal terms and concepts are introduced to young children. Parents and children talk about events in their daily lives, and teachers and children talk about schedules and plans for activities in school settings. Adults can facilitate children's language and conceptual learning in many ways. Research on parent–child talk about the past (Nelson and Fivush, 2004; Reese and Newcombe, 2007) showed that mothers' elaborative reminiscing enhanced children's autobiographical memory development. Research on parent–child talk about the future (Hudson, 2002, 2006) suggests that maternal time references contribute to children's understanding and use of temporal terms. Future research could make greater use of corpus analyses to identify the contextual factors that help children acquire temporal words and concepts of time. For example, in what contexts do children start producing different types of temporal words? What social-interactional and pragmatic cues are effective for early production? How do the quantity and quality of exposure to temporal language affect children's temporal language production and temporal understanding?

More research is also needed to compare and integrate findings from investigations of children's production and comprehension of temporal language. This approach is exemplified by research on children's production and comprehension of *no* and *not*, in which comprehension was measured with experimental tasks and production was analyzed with respect to the MacArthur-Bates CDI production norms (Feiman et al., 2017). The authors found that children's comprehension of truth-functional *no* lagged behind their normative production of *no* by about a year, suggesting that mapping the concept of negation onto the word *no* is developmentally challenging. Similarly, Sankaran (2011) investigated the influence of verb semantics on Tamil children's acquisition of aspect markers using both a production task and a comprehension task. She found that children understood the imperfective marker before they actively used it, and that although children frequently produced the perfective marker, their understanding of its function was limited. This approach of comparing and integrating comprehension and production data can also be used to explicate the construction of temporal concepts. Ideally, future research should use within-subjects designs to study children's comprehension and production of temporal language, so that stronger claims can be made about the relation between the two.

Efforts can also be made to design and employ on-line measures, such as preferential looking or eye tracking. Most previous research on temporal cognition and temporal language has adopted off-line measures, such as sentence-picture matching tasks, truth-value judgments, placement/sequencing tasks, act-out tasks, and question-after-story tasks. On-line measures may be more sensitive to, and more informative about, the process of parsing and analyzing sentences. For example, Sekerina et al. (2004) tested children's comprehension of pronouns using both on-line (eye-tracking) and off-line (picture-selection) tasks. They found a dissociation between the two: the eye-tracking task revealed more adult-like competence than the picture-selection task indicated. Similarly, Brandt-Kobele and Höhle (2010) investigated 3- to 4-year-old German children's comprehension of verb inflection as a cue to subject number using a preferential looking paradigm, in which children did not have to perform a specific task; instead, their eye gaze was tracked to measure comprehension of sentences with verb inflections. Using this paradigm, they found clear evidence that 3- to 4-year-olds were able to infer the number of subjects from inflectional information. When a similar task combining eye tracking and pointing was conducted, Brandt-Kobele and Höhle (2010) found weaker evidence from children's eye-movement data and, interestingly, no evidence from their pointing responses. Children's failure to select or point to the correct picture may be due to general task demands or to the different stages of the interpretation process engaged by on-line and off-line measures (Trueswell and Gleitman, 2007). Although researchers are still debating whether preferential looking and picture-selection tasks tap the same processes, and what these cross-task discrepancies can tell us about comprehension-production asymmetry, data from both on-line and off-line tasks could advance our understanding of the developmental trajectory in under-researched areas such as the development of temporal concepts.

Useful information and insights can also be obtained from studying the development of related cognitive abilities and processes that require temporal understanding. For example, an understanding of time is essential for autobiographical memory and future thinking. Remembering one's own past implies an understanding of the past and the ability to differentiate past time points. Planning one's own future implies an understanding of the future and the ability to differentiate future time points. Research has investigated the development of autobiographical memory, planning, and future thinking, but little attention has been paid to the extent to which children understand the temporal concepts or temporal language presented in these investigations. Considering children's performance on autobiographical memory and future thinking tasks from the perspective of temporal understanding is helpful both for research in these areas and for the study of temporal concepts, because an awareness of time is required and used for a pragmatic purpose in these tasks. Children may not fully understand the meaning of a temporal term or be able to reason about temporal relations when asked explicitly, but they may nonetheless be able to use their limited grasp of time when asked to recollect past experiences and to imagine their future selves. Future research would benefit not only from disentangling the linguistic factors and cognitive processes involved in forming temporal concepts, but also from understanding how temporal concepts contribute to the development of other aspects of cognition and language.

### AUTHOR CONTRIBUTIONS

MZ contributed to the conception and the writing of the manuscript. JH contributed to the writing of the manuscript by providing critical and valuable comments.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Zhang and Hudson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Positive Effect of Visual Cuing in Episodic Memory and Episodic Future Thinking in Adolescents With Autism Spectrum Disorder

Marine Anger<sup>1,2</sup>†, Prany Wantzen<sup>1</sup>†, Justine Le Vaillant<sup>1,2</sup>, Joëlle Malvy<sup>3</sup>, Laetitia Bon<sup>1,2</sup>, Fabian Guénolé<sup>1,2</sup>, Edgar Moussaoui<sup>2</sup>, Catherine Barthelemy<sup>3</sup>, Frédérique Bonnet-Brilhault<sup>3</sup>, Francis Eustache<sup>1</sup>, Jean-Marc Baleyte<sup>1,4</sup> and Bérengère Guillery-Girard<sup>1</sup>\*

#### Edited by:

Danielle DeNigris, Fairleigh Dickinson University, United States

#### Reviewed by:

Alain Morin, Mount Royal University, Canada

Catherine A. Best, Kutztown University of Pennsylvania, United States

#### \*Correspondence:

Bérengère Guillery-Girard berengere.guillery@unicaen.fr; berengere.guillery-girard@ephe.psl.eu

†Co-first authors

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

> Received: 19 July 2018 Accepted: 17 June 2019 Published: 09 July 2019

#### Citation:

Anger M, Wantzen P, Le Vaillant J, Malvy J, Bon L, Guénolé F, Moussaoui E, Barthelemy C, Bonnet-Brilhault F, Eustache F, Baleyte J-M and Guillery-Girard B (2019) Positive Effect of Visual Cuing in Episodic Memory and Episodic Future Thinking in Adolescents With Autism Spectrum Disorder. Front. Psychol. 10:1513. doi: 10.3389/fpsyg.2019.01513

<sup>1</sup> Normandie Université, UNICAEN, PSL Universités Paris, EPHE, INSERM, U1077, CHU de Caen, Neuropsychologie et Imagerie de la Mémoire Humaine, Caen, France, <sup>2</sup> Service de Psychiatrie de l'Enfant et de l'Adolescent, CHU de Caen, Caen, France, <sup>3</sup> UMR 1253, iBrain, Université de Tours, INSERM, Centre Universitaire de Pédopsychiatrie, CHRU de Tours, Tours, France, <sup>4</sup> Service de Psychiatrie de l'Enfant et de l'Adolescent, CHI de Créteil, Créteil, France

Cognitive studies generally report impaired autobiographical memory in individuals with autism spectrum disorder (ASD), but mostly using verbal paradigms. In the present study, we therefore investigated the properties of both past and future autobiographical productions using visual cues in 16 boys with ASD and 16 typically developing (TD) participants aged between 10 and 18 years. We focused on sensory properties, emotional properties, and recollection, probing past and future productions for both near and distant time periods. Results showed that the ASD group performed more poorly than controls on free recall for recent periods, but performed like them when provided with visual cues. In addition, the ASD group reported fewer sensory details than controls and exhibited difficulties in the experience of recollection for the most remote events. These data suggest a combination of consolidation and binding deficits. Finally, our findings reveal the relevance of using visual cues to probe autobiographical memory, with possible perspectives for memory rehabilitation.

#### Keywords: autobiographical memory, episodic memory, visual cues, sensory details, autism

### INTRODUCTION

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication, together with restricted and repetitive behaviors. There is growing evidence that people with ASD have atypical memory functioning (Bowler et al., 1997), even when their language skills are intact. Their difficulties include, among others, impaired autobiographical memory (AM). AM is a very long-term memory of personal knowledge and events from an individual's own life, accumulated from a very early age. AM allows individuals to build an identity based on a feeling of continuity (Conway, 2005; Bon et al., 2012).

Current cognitive models of AM distinguish between a semantic component pertaining to general personal knowledge or facts, and an episodic component relating to personal events.


This episodic component relies on the ability to remember past experiences (i.e., episodic autobiographical memories) and to imagine possible future experiences (episodic future thinking) (Tulving, 1985). Both episodic memories and projections involve autonoetic consciousness, namely the ability to project our states of self into the past, present, or future to maintain self-continuity. This mental time travel allows individuals to re-experience or pre-experience personal events associated with their original context, giving them a feeling of (re)living these events. To evoke episodic events, sufficient phenomenological details (i.e., feelings, emotions, and sensory details such as colors, sounds, smells, and tactile feelings) must be stored in memory, as they serve as retrieval cues. More specifically, episodic future thinking, or projection, involves imagining oneself in the future to pre-experience a possible scenario (Atance and O'Neill, 2005). This projection is supported in part by episodic memory oriented toward the past (Suddendorf and Corballis, 1997; Wheeler et al., 1997). Moreover, remembered personal events and envisioned future plans have been found to share a common brain network (Viard et al., 2011; D'Argembeau, 2015). This network is thought to support common constructive processes that allow stored information to be retrieved and flexibly recombined to reconstruct past experiences and construct novel future ones. Besides constructive and executive processes, AM involves a broad range of cognitive processes, from perception (Gottfried et al., 2003) to more integrative processes. Some of these are preferentially related to the self (self-concept: Howe and Courage, 1997; theory of mind: Perner and Ruffman, 1995; Welch-Ross, 1997) and to social events (Nelson, 1993), while others involve narrative abilities (Kleinknecht and Beike, 2004). 
Hence, the maturation of these cognitive processes during childhood and adolescence supports AM development (Nelson and Fivush, 2004; Bauer et al., 2007; Piolino et al., 2007; Picard et al., 2009).

In ASD, both children and adults produce fewer specific memories and projections, which are characterized by reduced specificity, elaboration, and episodic coherence. The content of these memories is also more semantic (e.g., general or repeated events) than episodic (Bon et al., 2012; Crane et al., 2012, 2013; Terrett et al., 2013; Goddard et al., 2014; McDonnell et al., 2017). Ciaramelli et al. (2018) recently reported that providing a series of standardized questions (e.g., "Where did this event take place?") does not seem to increase performance, either for past recollection or for future thinking. Similarly, difficulty retrieving specific memories is observed in children and adolescents with ASD, with poorer access to the remote past (8- to 17-year-olds; Goddard et al., 2014) and impaired episodic future thinking (8- to 12-year-olds; Terrett et al., 2013). Children with ASD also have greater difficulty recalling their own activities than typically developing (TD) children (Millward et al., 2000). However, differences may be observed between children and adults with ASD. For example, discourse analysis has shown that children with ASD aged 6–14 years produce fewer past narrative details, as well as fewer emotional (e.g., happy, scared), cognitive (e.g., thought, believed), and sensory (e.g., seen, heard) terms, than TD children (Brown et al., 2012). This difference is more pronounced for remote life events than for recent ones in children aged 5–17 years (Bruck et al., 2007; Brown et al., 2012; Goddard et al., 2014) or for future thinking (Terrett et al., 2013). By contrast, results obtained in adults show that sensory references are more frequent in ASD than in TD individuals for self-defining memories (Crane et al., 2010) and early childhood events (Zamoscik et al., 2016). Hence, some sensory details may be more salient than other features and contribute to the structure of AM in adulthood. 
This heterogeneity highlights the importance of exploring changes between childhood and adulthood, by focusing on the adolescence period.

The impairment of AM in ASD can be interpreted according to different cognitive theories. First, a theory of mind deficit, resulting in difficulty recognizing one's own psychological states and understanding the self (Williams, 2010), may impact the narration of episodic events (Losh and Capps, 2003; Goldman, 2008; McCabe et al., 2013; Kristen et al., 2014). Second, a detail-focused perceptual style, as described by perceptual theories or by the weak central coherence account of Happé and Frith (2006), may also have a significant impact on the properties of autobiographical memories. Temple Grandin, a woman with high-functioning ASD, reported in her book *Thinking in Pictures* (Grandin, 2006) that the visual modality is ubiquitous in her daily life:

"I translate both spoken and written word into full-color movies, complete with sound, which run like a VCR tape in my head. . . [I] see the words in pictures . . . I have a video library. . . When I recall something I have learned, I replay the video in my imagination. The videos in my memory are always specific . . . My imagination works like the computer graphics programs. . . When I do an equipment simulation in my imagination or work on an engineering problem, it is like seeing it on a videotape in my mind. I can view it from any angle, placing myself above or below the equipment and rotating it at the same time. . . I create new images all the time by taking many little parts of images I have in the video library in my imagination and piecing them together. . . Unlike those of most people, my thoughts move from video-like, specific images to generalization and concepts. For example, my concept of dogs is inextricably linked to every dog I've ever known. It's as if I have a card catalog of dogs I have seen, complete with pictures, which continually grows as I add more examples to my video library."

She describes her visual memory as a collection of personal photographs of her own life, which has a direct impact on the formation of visual representations of semantic concepts. Moreover, she is able to take different perspectives but, as suggested by her testimony, these tend to be field perspectives with egocentric navigation. This was experimentally corroborated by Ring et al. (2018). Hence, visual autobiographical memories may be very specific and detailed but more fixed than those of TD people.

Third, the AM deficit in ASD may result from difficulty mentally assembling the details that form the experience (e.g., episodic simulation; Schacter et al., 2012) and elaborating the context of this experience (e.g., scene construction; Hassabis and Maguire, 2007). Scene construction relies on visual imagery, which involves the mental generation and maintenance of single elements and the binding of all the properties of the event (e.g., objective and subjective details). Poorer scene construction is consistent with the impaired binding processes observed in ASD (Bowler et al., 2011; Lind et al., 2014a).

Most studies reporting difficulties with AM were conducted using verbal paradigms that draw on narrative abilities (Goddard et al., 2007; Crane and Goddard, 2008; Crane et al., 2009, 2012). Since these narrative abilities are impaired in ASD, relying solely on language to investigate AM may bias the assessment of memory properties. Most of the studies that have reported an AM impairment in ASD used questionnaires or a fluency task. However, individuals with ASD performed just as well as controls when other methodologies were used. No differences were observed with a sentence completion test that indexes memory retrieval (Crane et al., 2013) or with yes–no questions (Bruck et al., 2007), when the recall test was written rather than oral (Crane et al., 2012), or when the cue words were high in imageability (e.g., letter vs. permission) (Crane et al., 2012). All these tasks provide cues or support at retrieval. These observations are in line with the task support hypothesis, which emphasizes the role of retrieval support in improving AM productions (Bowler et al., 2004).

Hence, as suggested by Temple Grandin's testimony, pictures could be a valuable tool for studying AM, providing a visual aid that overcomes the language constraints associated with the free recall paradigm. Pictures might therefore constitute a more appropriate means of testing the properties of episodic memories in ASD. In addition, such visual supports would provide an opportunity to test different kinds of properties, including sensory details, and to investigate the possible impact on AM of the sensory processing impairments observed in ASD (Stevenson et al., 2014).

The main aim of the present study was to investigate the properties of episodic memories and future thinking in high-functioning adolescents with ASD using visual cues. We focused on the sensory and emotional properties and on the quality of the experience of recollection associated with autobiographical productions for four time periods: two in the past (i.e., yesterday and last summer vacation) and two in the future (i.e., tomorrow and next summer vacation). First, given the known retrieval deficit in ASD and possible difficulties in scene construction, we predicted that free recall performance would be impaired but would normalize when visual cues were provided. To help interpret our results, we added a general neuropsychological assessment focusing on the cognitive functions involved in AM retrieval, i.e., executive functions, short-term memory, and verbal episodic memory. Based on the cognitive profile of ASD, we expected to find baseline differences in verbal episodic memory, planning, and short-term memory. Second, given the perceptual bias reported in ASD (Mottron et al., 2003) and the frequent references to sensory details reported by adults with ASD (Crane et al., 2010), we predicted that participants would exhibit an atypical pattern of performance concerning sensory properties, probably focusing on some perceptual modalities to the detriment of others. Third, given the well-known difficulty with emotion processing and reduced recollection capacity in ASD (Gaigg, 2012), we expected participants with ASD to perform poorly on the emotion and recollection assessments.

### MATERIALS AND METHODS

## Participants

Participants were 16 boys aged 10–18 years (mean = 13.4 years, SD = 2.4) (**Table 1**). They were recruited through autism resource centers in Caen and Tours, France. Because recruitment started before the 2013 publication of DSM-5, all participants had been diagnosed with verbally and intellectually high-functioning autism or Asperger's syndrome according to DSM-IV criteria (American Psychiatric Association [APA], 2000). The diagnosis was established by experienced professionals using the Autism Diagnostic Interview-Revised (ADI-R; Lord et al., 1994) and/or the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1989). The ADI-R is a detailed semi-structured interview of parents about their child's developmental history and autism symptoms that yields ratings for reciprocal social interaction, language and communication, and restricted repetitive behaviors. The ADOS is a semi-structured, standardized assessment of social interaction, communication, play, and imaginative use of materials. Participants with ASD were compared with 16 TD controls matched for age, sex, and scores on the Perceptual Reasoning Index (PRI) and Verbal Comprehension Index (VCI) of the fourth edition of the Wechsler Intelligence Scale for Children (WISC-IV; Wechsler, 2005). These two indices were calculated from performance on four WISC-IV subtests: Block Design and Matrices for PRI, and Vocabulary and Similarities for VCI. They allowed us to ensure that participants had no general impairment of language comprehension or perceptual abilities. TD adolescents were recruited from several French schools. Brief interviews ensured that none of the participants met the exclusion criteria: history of neurological disorders or psychiatric illness (other than ASD in the ASD group), a first-degree relative with ASD (in the TD group), head trauma, current psychoactive medication, intellectual disability, or learning disabilities. 
Families were given a comprehensive description of the research. The study was approved by the relevant ethics committees, and written consent was obtained from all the participants (and from their parents, in the case of minors), in line with committee guidelines.

### General Cognitive Assessment

Each child also underwent a neuropsychological assessment focusing on the cognitive abilities involved in AM production (Picard et al., 2009). This assessment included tests of five executive and memory functions: inhibition (Stroop test; Albaret and Migliore, 1999), planning (Tower of London; Lussier et al., 1998), verbal short-term memory (forward digit span, WISC), visuospatial short-term memory (forward Corsi blocks; Pagulayan et al., 2006), and verbal episodic memory (story recall from the Children's Memory Scale; Cohen, 2001). Picard et al. (2009) found that these cognitive abilities were involved in the production of autobiographical memories in childhood (6–11 years).

#### TABLE 1 | Mean ages and cognitive data for the ASD and TD groups.

<sup>∗</sup>Significant differences observed between the ASD and TD groups. nd, not done owing to a ceiling effect; PRI, Perceptual Reasoning Index; VCI, Verbal Comprehension Index; ASD, participants with autism spectrum disorder; TD, typically developing participants.

Finally, all participants underwent a brief assessment of personal semantic knowledge to exclude a major deficit that might interfere with the AM task. This consisted of a questionnaire, coupled with visual cues, about general personal information on three different topics, adapted from the methodology of Piolino et al. (2007). Questions concerned acquaintances, school life, and personally relevant famous names (e.g., heroes, stars). The maximum score was 6 for each of these categories.

### From Past to Future Task

This task explored specific past personal events and future thinking for the day before (recent past), last summer vacation (remote past), the next day (near future), and the forthcoming summer vacation (distant future). For each period, visual cues were provided to support production (**Figure 1**). All responses were transcribed manually by the interviewer, who used a grid to code each reported personal event (free and cued recall of personal events). All other responses were coded directly by the participants themselves.

### Visual Cues

Questions were illustrated with drawings that provided a timeline and visual cues for detailing personal events, contents and perceptions (i.e., colors, smells, tactile feelings, sounds, tastes). Contents could refer to temporal situations, spatial locations (e.g., home, school, beach, etc.), modes of transport (e.g., car, plane, train, etc.), activities (e.g., video games, football, musical instrument, etc.) and people present (e.g., parents, children, etc.). All the pictures were drawn by a professional illustrator who ensured that each type of content was included. For example, for the who content, there was a person of every age (i.e., children, adults, and older adults) and gender. In addition, five types of perceptions were illustrated with drawings. For example, colors were associated with a color chart, while smells were indicated with a trash can or a flower; sounds with a musical note or bell; tastes with a lemon or a sweet; and tactile feelings with a finger placed on a pillow (mushy) or on ice (cold) (**Figure 1E**). Each question included explanations of the properties being tested (e.g., "Did you have tactile feelings? Did you touch something soft like cotton wool, cold like ice, mushy like a pillow, hard like wood, wet like water, or painful like a hedgehog?"). Participants repeated the property when they selected the drawing that supported their autobiographical production (e.g., "I touched something soft. . ."). This procedure was applied to all visual cues.

### Procedure

Each participant was asked to produce descriptions of memories or projections with as many details as possible, focusing on the past (i.e., one event that happened yesterday and one from last summer vacation) and the future (i.e., one event that could happen tomorrow and one next summer vacation). These questions allowed us to manipulate orientation (past vs. future) and temporal distance, either close (yesterday or tomorrow) or remote (last or next summer vacation). For past events, participants were instructed to remember real events that had happened to them (e.g., "Can you remember something that happened to you yesterday? I want you to recall it with plenty of details, as if you were reliving this event, and your description has to allow me to imagine this event too"). For future events, participants were instructed to imagine an event that could happen in their lives or that was completely invented (e.g., "Can you imagine what you might do tomorrow, either something planned or completely new, but I want you to imagine what could happen with plenty of details, as if you were living this event, and your description has to allow me to imagine this event"). If 1 min went by without an answer, the interviewer gave the children an open-ended prompt (e.g., "What else can you remember?"). If, after a further minute, they were still unable to provide the different contents associated with an episodic event, they were helped with visual cues for each of these components. Cues concerned activities (what), temporal situation (when), spatial location (where), course of the event (how), and people present (who) (**Figures 1A,B**). Episodic free recall and cued recall (with visual cues) were each scored out of 5, with 1 point per type of content: what (theme), when (e.g., beginning, middle or end of the month; morning, afternoon or evening), where (which city and where in that city; e.g., home, garden, beach), how (three different details; e.g., perception, feeling, activity, script), and who (participants). Scoring was performed separately by the interviewer and a psychologist until a consensus was reached (**Table 2**).

FIGURE 1 | … (D), perceptions (e.g., color; E), perspective (field or observer; F), mental imagery (G), and reliving (H).

TABLE 2 | Episodic memory paradigm, variables, and scoring.

<sup>∗</sup>Coded by two persons: the interviewer and the psychologist. All other measures were directly coded by participants. #Variables for future periods only. No, number.
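The 0–5 content score described above (1 point for each of *what*, *when*, *where*, *how*, and *who*) can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' coding grid: the record fields and function name are our own, and we assume, as our reading of "three different details", that the *how* point is awarded only when at least three distinct details are reported.

```python
from dataclasses import dataclass, field

# Assumption: the "how" point requires at least three distinct details,
# our reading of "three different details" in the scoring description.
HOW_DETAILS_REQUIRED = 3


@dataclass
class EventReport:
    """Hypothetical coding record for one reported past or future event."""
    what: bool = False   # theme of the event identified
    when: bool = False   # temporal situation given (e.g., morning, end of month)
    where: bool = False  # city and place within the city given
    who: bool = False    # people present named
    how_details: set[str] = field(default_factory=set)  # e.g. {"feeling", "activity"}


def content_score(report: EventReport) -> int:
    """Return the 0-5 content score: one point per content type reported."""
    points = sum([report.what, report.when, report.where, report.who])
    if len(report.how_details) >= HOW_DETAILS_REQUIRED:
        points += 1
    return points


# Usage: a sparse free-recall report versus a fuller cued-recall report.
free = EventReport(what=True, where=True, how_details={"activity", "feeling"})
cued = EventReport(what=True, when=True, where=True, who=True,
                   how_details={"activity", "feeling", "perception"})
print(content_score(free), content_score(cued))
```

The same function would be applied separately to the free recall and cued recall phases, since each was scored out of 5.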

Next, we asked participants about the properties of each event, and participants rated their own productions. First, we asked them to rate the emotional feeling associated with the event on a 6-point Likert-type scale featuring smiley faces ranging from very sad to very happy (e.g., "I was happy to do this, so I choose the fifth smiley"; **Figure 1C**). They also rated the level of emotional arousal on a triangular ruler, again with a 6-point Likert-type scale along each side (e.g., "I was happy to do this, but not very excited, so I rate it 2 on the scale"; **Figure 1D**). This Likert-type format was used for all the following questions. Participants were then asked to provide sensory details (i.e., colors, sounds, smells, tactile feelings, tastes; **Figure 1E**) and to indicate the importance of each one in their memories or future thinking, using the same 6-point triangular ruler (e.g., "Which colors do you remember being associated with your memories? What was the intensity of each one?"). In the final part of the questionnaire, we collected other information. One question concerned the perspective from which they had relived the event: either their own (field perspective, scored 3/3), that of an observer (observer perspective, scored 1/3), or alternating between the two (scored 2/3) (**Figure 1F**). Another question assessed the mental imagery associated with the personal event, asking participants whether they could visualize it in terms of the number of images (e.g., "When you think about this event? How do you see it? Please rate it on a scale from 0 (No image) to 6 (Lot of distinctive images)"; **Figure 1G**) and their accuracy (e.g., "Can you evaluate the accuracy or distinctiveness of these images on a scale from 0 (Completely blurry) to 6 (Very precise)?"; **Figure 1G**). 
We also asked about the sense of subjective recollection (i.e., feeling of reliving): "When you think about this event do you feel that you are reliving it with all the sensations you had at the time? Are you able to provide many details? And is it so realistic that you feel you are reliving the scene?" We used a film/video metaphor to highlight the nature of recollection: "When you think about this event, imagine that you have rewound the film and are reliving this event as a déjà-vu scene. How do you feel about reliving it with all the sensations you had at the time? Can you rate your feeling of experiencing it on a scale from 0 (No feeling of reliving) to 6 (Very intense feeling)?" (**Figure 1H**). Finally, we asked participants about the memory's personal relevance (e.g., "Was this event important to you? Please indicate your answer on a scale of 0 (Not at all) to 6 (Very important)"), its frequency of evocation (e.g., "How often do you remember or mention this event on a scale of 0 (Not at all) to 6 (Very often)") for past and future events. For future events only, we asked whether they wished them to happen (e.g., "Would you like this event to happen? Please indicate your answer on a scale of 0 (Not at all) to 6 (Very much)"), and the probability of occurrence (e.g., "Please rate the likelihood of this event happening on a scale of 0 (Not at all) to 6 (Certainly)") (**Table 2**). To ensure that the adolescents made appropriate use of the criteria, we asked them to reformulate the instructions. This procedure was adapted to each participant and repeated until the experimenter was confident that the child understood the judgment criteria.

### Statistical Analyses

Statistical analyses were performed using Statistica Version 10 software (StatSoft, Tulsa, OK, United States). The reported values are means and standard deviations.

Because of the limited number of participants and some non-normally distributed variables (Kolmogorov–Smirnov test, p < 0.05 in one or both groups), we conducted non-parametric analyses: Friedman ANOVAs and Wilcoxon tests for within-group comparisons, and Mann–Whitney U-tests for between-group comparisons, with adjusted Z values.
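The between-group comparisons can be illustrated with a minimal Python sketch of the Mann–Whitney U test. It uses only the standard library and applies the large-sample normal approximation with a correction for tied ranks. That correction is presumably what "adjusted Z" refers to, but the article does not describe Statistica's exact computation. The sketch also returns η² = z²/N, which is one common way of turning a z statistic into an effect size. This formula is an assumption, since the article does not state how its η² values were obtained. The function names and example scores are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def rank_with_ties(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(group_a: Sequence[float], group_b: Sequence[float]):
    """Return (U for group_a, tie-adjusted z, two-sided p, eta^2 = z^2 / N)."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = rank_with_ties(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Each run of t tied scores reduces the variance by a term in (t^3 - t).
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return u_a, z, p_two_sided, z**2 / n


# Hypothetical scores, not data from the study.
td_scores = [5, 6, 6, 7, 8]
asd_scores = [3, 4, 4, 5, 6]
u, z, p, eta_sq = mann_whitney(td_scores, asd_scores)
```

The sign of z depends only on which group is passed first. Squaring z for η² discards that sign.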

### RESULTS

### General Cognitive Assessment

As expected, Mann–Whitney U-tests revealed that the ASD group performed more poorly than the TD group on verbal episodic memory (immediate recall: z = 2.13, p = 0.03, η<sup>2</sup> = 0.14; recognition: z = 2.46, p = 0.01, η<sup>2</sup> = 0.18) and planning (Tower of London, success at first attempt: z = 2.11, p = 0.03, η<sup>2</sup> = 0.14), but none of the other comparisons, including working memory, yielded significant differences (**Table 1**).

Semantic performance plateaued in both groups (**Table 1**), confirming the absence of a major deficit in personal semantic knowledge in ASD.

### Personal Event

Mann–Whitney U-tests on free recall performance revealed significant differences for two periods, the recent past (z = 2.93, p = 0.004, η<sup>2</sup> = 0.25) and the near future (z = 2.41, p = 0.01, η<sup>2</sup> = 0.18), and a marginally significant effect for the distant future (z = 1.95, p = 0.056, η<sup>2</sup> = 0.11). The ASD group produced fewer event memories and projections than the TD group (see **Figure 2A**).

Mann–Whitney U-tests on cued recall performance did not show any significant between-group differences. However, a Friedman ANOVA revealed a significant period effect on performance in the control group (χ<sup>2</sup> = 13.1, p = 0.004, η<sup>2</sup> = 0.84): the control group reported fewer details for the distant future than for the recent past (p = 0.03) and the near future (p = 0.03) (see **Figure 2B**).
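The within-group period effects rely on the Friedman test. The Python sketch below computes the Friedman χ² with the standard correction for tied scores within a participant. It treats each participant as a block and the four periods (recent past, remote past, near future, distant future) as conditions. It also reports Kendall's W = χ²/(n(k − 1)), a common effect size for this test. The article does not say how its η² values were derived, so W is an assumed substitute and may not match them. Function names and example ratings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def rank_with_ties(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(rows: Sequence[Sequence[float]]):
    """rows: one row per participant, one score per condition (e.g., period).

    Returns (tie-corrected chi-square with k - 1 df, Kendall's W).
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        # Rank each participant's scores across the k conditions.
        for j, r in enumerate(rank_with_ties(row)):
            rank_sums[j] += r
        tie_term += sum(t**3 - t for t in Counter(row).values())
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    chi2 /= 1 - tie_term / (n * (k**3 - k))  # correction for within-row ties
    return chi2, chi2 / (n * (k - 1))


# Hypothetical ratings; columns = recent past, remote past, near future, distant future.
ratings = [
    [5, 4, 3, 2],
    [6, 4, 4, 3],
    [4, 5, 3, 1],
]
chi2, kendall_w = friedman(ratings)  # compare chi2 with a chi-square on k - 1 = 3 df
```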

### Emotional Feeling

The analyses of emotion (i.e., valence and arousal) revealed no significant differences between groups (**Table 3**). However, a Friedman ANOVA revealed a significant period effect on arousal in the TD group (χ<sup>2</sup> = 13.13, p = 0.004, η<sup>2</sup> = 0.84): arousal associated with memories of the recent past was lower than for the remote past (p = 0.02) and the distant future (p = 0.008). In the ASD group, a Friedman ANOVA showed a period effect for valence (χ<sup>2</sup> = 7.72, p = 0.05, η<sup>2</sup> = 0.39): memories of the remote past had a more positive valence than those of the recent past (p = 0.01).
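The pairwise follow-up comparisons between periods are presumably the Wilcoxon tests listed under Statistical Analyses. Below is a minimal Python sketch of the Wilcoxon signed-rank test using the normal approximation. It follows standard conventions: zero differences are discarded, and the variance is corrected for tied absolute differences. Whether Statistica handles zeros and ties identically is an assumption. The function names and the example data are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def rank_with_ties(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]):
    """Paired test of x against y. Returns (W+, z, two-sided p)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    ranks = rank_with_ties(abs_diffs)
    # W+ is the sum of the ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    tie_term = sum(t**3 - t for t in Counter(abs_diffs).values())
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    return w_plus, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical paired arousal ratings for the same participants in two periods.
recent_past = [2, 3, 3, 4, 2, 3]
remote_past = [4, 4, 5, 4, 3, 5]
w_plus, z, p = wilcoxon_signed_rank(recent_past, remote_past)
```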

### Sensory Perceptual Details

Analyses of the total number of sensory details showed a significant reduction in the ASD group for the remote past (z = 2.74, p = 0.006, η<sup>2</sup> = 0.23). Analyses of each perceptual modality revealed significant differences between the ASD and TD groups on color for the recent past (number: z = 2.48, p = 0.01, η<sup>2</sup> = 0.19; intensity: z = 2.19, p = 0.03, η<sup>2</sup> = 0.15) and for the remote past (number: z = 2.78, p = 0.005, η<sup>2</sup> = 0.24). We also observed differences on smell for the remote past (number: z = 2.61, p = 0.01, η<sup>2</sup> = 0.19; intensity: z = 2.00, p = 0.05, η<sup>2</sup> = 0.12), on sound intensity (remote past: z = 2.21, p = 0.03, η<sup>2</sup> = 0.15; distant future: z = −2.05, p = 0.04, η<sup>2</sup> = 0.13), and on tactile feeling for the remote past (number: z = 2.12, p = 0.04, η<sup>2</sup> = 0.13). Except for sound in the distant future, the ASD group reported fewer details, with lower intensity, than the TD group for all the modalities and periods cited above (**Table 3**).

Friedman ANOVAs were conducted within each group on each category of sensory perceptual details. In the TD group, analyses showed a period effect on both the number and the intensity of smell details (respectively, χ<sup>2</sup> = 12.05, p = 0.007, η<sup>2</sup> = 0.75 and χ<sup>2</sup> = 8.28, p = 0.04, η<sup>2</sup> = 0.44): both scores were lower for the near future than for the remote past (number p = 0.02, intensity p = 0.01) and the distant future (number p = 0.005, intensity p = 0.02). In the ASD group, analyses showed a period effect on the intensity of colors (χ<sup>2</sup> = 10.03, p = 0.02, η<sup>2</sup> = 0.58): color intensity was lower for the recent past than for the remote past (p = 0.01). We also observed in this group a period effect on the intensity of sounds (χ<sup>2</sup> = 10.74, p = 0.01, η<sup>2</sup> = 0.64): sound intensity was lower for the remote past than for the distant future (p = 0.02).

### Recollection and Other Properties

Mann–Whitney comparisons revealed no significant differences for the measures of perspective, personal relevance, wish for the event to happen, or probability of occurrence (**Table 4**). However, the ASD group had lower scores than the TD group on several measures associated with the remote past: mental imagery (number: z = 2.17, p = 0.03, η<sup>2</sup> = 0.14), subjective recollection (z = 1.98, p = 0.05, η<sup>2</sup> = 0.12), and frequency of evocation (z = 2.3, p = 0.02, η<sup>2</sup> = 0.16). Friedman ANOVAs conducted in the TD group showed a period effect on mental imagery (number: χ<sup>2</sup> = 8.01, p = 0.05, η<sup>2</sup> = 0.39; accuracy: χ<sup>2</sup> = 12.24, p = 0.007, η<sup>2</sup> = 0.39). The number of mental images associated with the recent past was greater than for the near (p = 0.05) and distant (p = 0.008) future. The accuracy of mental imagery was better for the recent past than for the remote past (p = 0.008) and the distant future (p = 0.008), and better for the near future than for the distant future (p = 0.05).

TABLE 3 | Mean (SD) emotional feeling and sensory details for each group and each period. Number of details and importance are reported.

<sup>∗</sup>Significant differences were observed between the ASD and TD groups (in bold), p < 0.05. # Importance refers to arousal for emotions and intensity for sensory details. ASD, participants with autism spectrum disorder; TD, typically developing participants; ND, mean of number of details (SD); I, mean of importance (SD).

TABLE 4 | Mean (SD) properties of personal events according to group.

<sup>∗</sup>Significant differences were observed between the ASD and TD groups (in bold), p < 0.05. #For future events only. ASD, participants with autism spectrum disorder; TD, typically developing participants.

### DISCUSSION

The aim of this study was to analyze the properties of past memories and future thinking produced by adolescents with ASD, compared with their TD peers, using a visual-cue paradigm. As hypothesized, results revealed difficulty with free recall in the ASD group that contrasted with typical performance on the visually cued task. The groups differed on the total number of sensory details only for the remote past. These differences also appeared when we considered each perceptual modality separately, with the ASD group reporting fewer and less intense color, smell, sound, and tactile details than the TD group. Finally, we did not observe any impairment on the measures of emotion and of the quality of the recollective experience, except for the number of mental images, subjective recollection, and frequency of evocation for the remote past.

### Visual Cues in Autobiographical Memory Tasks

Our results showed a significant benefit from visual cues in the production of both past and future episodic autobiographical events. This enhanced performance is in line with the task support hypothesis developed by Bowler et al. (1997), which postulates that performance is better when support is provided at retrieval. Hence, visual cues may be more effective for learning/retrieval, as demonstrated by previous studies that used pictorial prompts for teaching children with ASD (McClannahan and Krantz, 1997; Quill, 1997). AM could thus serve as a support for social interaction in social skills programs; for example, participants with ASD could use visual cues to share their personal memories.

The impaired performance of participants with ASD on the free recall task was in accordance with their story recall performance (i.e., on the verbal episodic memory test) and mirrors previous findings in individuals with ASD (Lind and Bowler, 2010; Brown et al., 2012; Lind et al., 2014a,b). Our data also corroborate the findings of previous studies on future thinking (Terrett et al., 2013; Ciaramelli et al., 2018). In addition, the planning difficulties observed in the ASD participants may have contributed to this result. We went beyond these studies by considering temporal distance, showing impairments of near-future projections that may extend to distant-future projections. These impairments may result from difficulty with scene construction, as suggested by Lind et al. (2014b) and, more recently, by Ciaramelli et al. (2018). These authors reported the production of fewer internal (i.e., episodic) details in ASD, compared with TD controls, but similar numbers of external (i.e., semantic) details. Difficulty describing internal states leads to abnormalities in binding experience directly to the self and in establishing bonds between the self and others, and consequently in giving coherent meaning to events (Fivush, 2009). Maister and Plaisted-Grant (2011) also suggested that poorer temporal processing abilities in ASD are related to episodic memory impairments. The difficulty accessing episodic AM seemed less pronounced for memories related to the previous and forthcoming summer vacations. Compared with the recent past (restricted to the previous or next day), the more extended vacation period offered a range of possible autobiographical events, facilitating the retrieval of one specific and especially salient moment. Moreover, in contrast to many other studies (Goddard et al., 2014), our task fixed the time period but not the topic, and consequently allowed participants greater flexibility in choosing their personal events, which may have been more closely related to their concerns.

### Sensory Properties

Contrary to our prediction, the episodic memories provided by the participants with ASD contained just as many sensory details as those produced by controls for three of the four periods. These results are in accordance with Crane and Goddard (2008), who did not observe any difference in sensory or emotional information in adults with ASD. This may result, in part, from the use of visual cues for each perceptual modality. However, a lack of details persisted for the remote past, which may reflect the consolidation difficulties reported by Goddard et al. (2007) and Bon et al. (2012). This reduction was relatively homogeneous and concerned all modalities except taste. Rather surprisingly, however, the recent episodic memories also lacked color details: the adolescents with ASD did mention colors, but fewer than controls. This finding is in accordance with the accounts of some families, who report particular interest in or aversion to some colors and lights in daily life. Some individuals with ASD may have either an obsession with or phobia of colors, as described by Ludlow et al. (2014) in a case study. Hence, they may have an atypical perception of colors that affects the formation/retrieval of memories, even when support is provided. Very few studies have used colored material to study either working memory (see, for example, Vogan et al., 2014) or long-term memory (Massand and Bowler, 2015) in ASD. When Franklin et al. (2008) investigated color memory per se, they found impaired performance for colors compared with shapes. Two years later, Franklin et al. (2010) also reported a general reduction in chromatic sensitivity. This atypical sensitivity to color may account for the present results.

### Recollection and Emotional Properties

When our participants with ASD were prompted by visual cues, we did not find any difference in the processing of either the valence or intensity of emotions: they produced memories that were just as positive as those of controls. These results further justify the use of visual cues at retrieval to compensate for the difficulty that individuals with ASD have understanding verbally expressed emotions. Moreover, Maccari et al. (2014) demonstrated that individuals with ASD are able to process positive emotional information embedded in pictures just as well as controls. Our results indicate that this ability can be generalized to familiar autobiographical scenes.

Concerning the other properties, we observed differences between the two groups only for the remote past. The ASD group had reduced mental imagery, subjective recollection, and frequency of evocation. Participants with ASD produced memories lacking in details and associated with reduced episodic properties, compared to controls. Once more, this result is in accordance with the abnormal forgetting previously reported in ASD. These data are in line with those of other experimental studies that used anterograde memory paradigms (Bowler et al., 1997; Souchay et al., 2013; Cooper and Simons, 2018). Our participants' recollection difficulties may reflect an additional deficit in relational processes, as demonstrated by Bowler et al. (2014) and Gaigg et al. (2015). Individuals with ASD have difficulty binding together the different features that make up an episodic event (Happé and Frith, 2006). Hence, the ASD group may have been successful in recalling some episodic features separately, with the aid of visual cues, but had difficulty binding them together to generate a feeling of reliving. This may be due to weak central coherence, leading to construction, organization, and retrieval difficulties (Happé and Frith, 2006; Bowler et al., 2011), and possibly impacting other abilities such as theory of mind, as suggested recently by Ciaramelli et al. (2018).

Surprisingly, we did not observe the same pattern of performance for projections into the future. Performance was poorer for future versus past periods in the control group for number and accuracy of mental imagery, as previously demonstrated by Abram et al. (2014), thus reducing differences with the ASD group. Hence, the ASD group had an intact feeling of pre-experiencing the future, supporting the notion that the feeling of reliving previous experiences and the pre-experiencing of future events are subtended by partially distinct mechanisms. The feeling of pre-experiencing may have been the product of reasoning based on vividness, the visual perspective adopted during the questionnaire, and personal relevance, as previously demonstrated by D'Argembeau and Van der Linden (2012). All these features were preserved in our participants. The sense of self may be involved to a more limited extent in the ability to elaborate a mental representation associated with future thinking than in the remembering of past autobiographical events.

### Limitations and Perspectives


This work has certain limitations. First, the sample size is relatively small, which prevents us from generalizing to the ASD population. In addition, since we were able to include only boys, a group of girls would be needed to extend our conclusions to ASD as a whole. Second, our groups do not differ in age but have a wide age range. Given the major influence of age on cognitive development, it would be particularly interesting to investigate the relationship between AM development and other cognitive abilities, such as theory of mind, which is impaired in ASD. Third, given the interaction between AM development and social interactions, environment, and lifestyle (e.g., family, therapies, activities), largely neglected in previous studies, it is crucial to consider these factors in future research. Fourth, each personal event was manually transcribed and scored according to a grid coding for five components of episodic memory (i.e., what, where, when, how, who). The interviewer and a psychologist scored each event separately until a consensus was reached. In future work, recording verbatim productions would refine the analysis by allowing a more detailed investigation of each component. Finally, the interviewer was one of the two coders and was thus not blind to group membership. It would be relevant to replicate our results with two coders blind to the diagnoses and to verify their inter-rater reliability.
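The proposed check of inter-rater reliability could use Cohen's kappa, one common index that corrects raw agreement between two coders for the agreement expected by chance. The Python sketch below rests on a hypothetical data layout. It assumes each coder scores every component of the grid (what, where, when, how, who) as present (1) or absent (0) for each event. The article does not describe the scoring at that level of detail, so the layout and the example codes are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence

COMPONENTS = ("what", "where", "when", "how", "who")  # the grid's five components


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two coders' labels for the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected if both coders labelled independently at their own base rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical 0/1 codes for four events: one list per component, per coder.
coder_1 = {c: [1, 1, 0, 1] for c in COMPONENTS}
coder_2 = {c: [1, 0, 0, 1] for c in COMPONENTS}
per_component = {c: cohens_kappa(coder_1[c], coder_2[c]) for c in COMPONENTS}
```

Computing kappa per component, rather than over all components pooled, shows which parts of the grid the coders agree on least.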

### CONCLUSION

Our study suggests that AM impairment in ASD may result from a consolidation deficit for the most remote events combined with a binding deficit, and it demonstrates the relevance of using visual cues to facilitate AM retrieval. These results are in keeping with other studies and may be relevant to other cognitive abilities, as recently suggested by Ciaramelli et al. (2018). This may offer new methodological opportunities for managing ASD. It also suggests that some specific properties associated with episodic memories, possibly colors, may be less important for individuals with ASD than they are for TD individuals. This raises the issue of the impact of perception on AM, which requires further investigation. In addition, we observed considerable variability, which we could not analyze because of our small sample size. Hence, characterizing the different AM profiles should be the next step in studies of cognition in ASD. This could open up new perspectives for cognitive rehabilitation, such as working on AM as the key to social interactions.

### ETHICS STATEMENT

Families were given a comprehensive description of the research. The study was approved by the relevant ethics committees, and written consent was obtained from the participants (or their parents, in the case of minors) in line with their guidelines.

### AUTHOR CONTRIBUTIONS

MA, JLV, FE, and BG-G contributed to the conception and design of the study. MA, JLV, JM, LB, FG, EM, CB, FB-B, and J-MB organized the database. MA, PW, JLV, and BG-G conducted the statistical analysis. MA, PW, and BG-G wrote the first draft of the manuscript. All authors contributed to the manuscript revision, and read and approved the submitted version.

### FUNDING

This study was supported by the Fondation de France (Grant No. 2007005799).

### ACKNOWLEDGMENTS

We are grateful to Elizabeth Wiles-Portier, Camille Chapot, and Renaud Coppalle for reviewing the language style. We are indebted to all the children and adolescents and their families who took part in this study, as well as to the teachers and head teachers who kindly accommodated us in their schools.

### REFERENCES

trajectory of autobiographical memory. Front. Psychol. 3:605. doi: 10.3389/fpsyg.2012.00605




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Anger, Wantzen, Le Vaillant, Malvy, Bon, Guénolé, Moussaoui, Barthelemy, Bonnet-Brilhault, Eustache, Baleyte and Guillery-Girard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Temporarily Out of Order: Temporal Perspective Taking in Language in Children With Autism Spectrum Disorder

Jessica Overweg<sup>1</sup> , Catharina A. Hartman<sup>2</sup> and Petra Hendriks<sup>1</sup> \*

<sup>1</sup> Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen, Netherlands, <sup>2</sup> Department of Psychiatry, University Medical Center Groningen, Groningen, Netherlands

Clinical reports suggest that children with autism spectrum disorder (ASD) struggle with time perception, but few studies have investigated this. This is the first study to examine these children's understanding of before and after. These temporal conjunctions have been argued to require additional cognitive effort when conjoining two events in a clause order that is incongruent with their order in time. Given the suggested time perception impairment and well-established cognitive deficits of children with ASD, we expected them to have difficulties interpreting temporal conjunctions, especially in an incongruent order. To investigate this, the interpretation of before and after in congruent and incongruent orders was examined in 48 children with ASD and 43 typically developing (TD) children (age 6–12). Additional tasks were administered to measure Theory of Mind (ToM), working memory (WM), cognitive inhibition, cognitive flexibility, IQ, and verbal ability. We found that children with ASD were less accurate in their interpretation of temporal conjunctions than their TD peers. Contrary to our expectations, they did not have particular difficulties in an incongruent order. Furthermore, older children showed better overall performance than younger children. The difference between children with ASD and TD children was explained by WM, ToM, IQ, and verbal ability, but not by cognitive inhibition and flexibility. These cognitive functions are more likely to be impaired in children with ASD than in TD children, which could account for their poorer performance. Thus, the cognitive factors found to affect the interpretation of temporal language in children with ASD are likely to apply in typical development as well. Sufficient WM capacity and verbal ability may help children to process complex sentences conjoined by a temporal conjunction. 
Additionally, ToM understanding was found to be related to children's interpretation of temporal conjunctions in an incongruent order, indicating that perspective taking is required when events are presented out of order. We conclude from this that perspective-taking abilities are needed for the interpretation of temporal conjunctions, either to shift one's own perspective as a hearer to another point in time, or to shift to the perspective of the speaker to consider the speaker's linguistic choices.

Keywords: autism spectrum disorder, executive functioning, perspective taking, temporal conjunctions, Theory of Mind

#### Edited by:

Danielle DeNigris, Fairleigh Dickinson University, United States

#### Reviewed by:

Janie Busby Grant, University of Canberra, Australia

Wing Chee So, The Chinese University of Hong Kong, Hong Kong

> \*Correspondence: Petra Hendriks p.hendriks@rug.nl

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 18 April 2018 Accepted: 20 August 2018 Published: 05 September 2018

#### Citation:

Overweg J, Hartman CA and Hendriks P (2018) Temporarily Out of Order: Temporal Perspective Taking in Language in Children With Autism Spectrum Disorder. Front. Psychol. 9:1663. doi: 10.3389/fpsyg.2018.01663

## INTRODUCTION

fpsyg-09-01663 September 4, 2018 Time: 11:45 # 2

Time is an important dimension by which we make sense of the world (Navon, 1978). Time is also deeply rooted in the structural organization of language (Klein, 1994). In language, time is generally conceived as a sequential order of events, where one event follows another from past to present to future. Speakers can use temporal expressions, like before or after, to express the order of events in time either in order of occurrence (i.e., temporally congruent) or out of order (i.e., temporally incongruent). The interpretation of the temporal conjunctions before and after in an incongruent order is found to be difficult for typically developing (TD) children (Clark, 1971; Pyykkönen and Järvikivi, 2012; Blything et al., 2015; de Ruiter et al., 2018). This may hold even more for children with an autism spectrum disorder (ASD). Clinical reports suggest that children with ASD encounter difficulties in time perception (Wing, 1996). Additionally, some studies have suggested that individuals with ASD have difficulty interpreting before and after (Boucher, 2001; Perkins et al., 2006). The present study investigates time perception in language in children with ASD and their TD peers by examining their interpretation of sentences containing temporal conjunctions.

Before and after are viewed as the prototypical linguistic expressions indicating temporal order (Schilder and Tenbrink, 2001). Speakers can use these expressions in several ways to express the order of events. For example, all four sentences below indicate that someone first climbed a tree and next read a book:


The speaker's choice of before in a main-subordinate clause order (1) or after in a subordinate-main clause order (4) results in a congruent presentation of the temporal order of events, whereas before in a subordinate-main clause order (2) or after in a main-subordinate clause order (3) results in an incongruent presentation. Thus, whether the hearer should interpret the event order as congruent or incongruent depends on the speaker's choice of conjunction and clause order.
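The congruent/incongruent classification reduces to a simple rule: a sentence is congruent when the event mentioned first also happened first. With before, the main-clause event happens first. With after, the subordinate-clause event happens first. The Python sketch below encodes that rule, together with the chronological re-ordering a hearer must perform for incongruent sentences. The type and function names are hypothetical, and the sketch models only the two-clause sentence types discussed here.

```python
from dataclasses import dataclass
from typing import Literal

Conjunction = Literal["before", "after"]
ClauseOrder = Literal["main-subordinate", "subordinate-main"]


@dataclass(frozen=True)
class TemporalSentence:
    main_event: str         # event expressed in the main clause
    subordinate_event: str  # event in the clause introduced by the conjunction
    conjunction: Conjunction
    clause_order: ClauseOrder


def chronological_order(s: TemporalSentence) -> tuple[str, str]:
    """Return (earlier event, later event) as implied by the conjunction."""
    if s.conjunction == "before":
        return (s.main_event, s.subordinate_event)
    if s.conjunction == "after":
        return (s.subordinate_event, s.main_event)
    raise ValueError(f"unsupported conjunction: {s.conjunction!r}")


def mention_order(s: TemporalSentence) -> tuple[str, str]:
    """Return the two events in the order in which the hearer encounters them."""
    if s.clause_order == "main-subordinate":
        return (s.main_event, s.subordinate_event)
    if s.clause_order == "subordinate-main":
        return (s.subordinate_event, s.main_event)
    raise ValueError(f"unsupported clause order: {s.clause_order!r}")


def is_congruent(s: TemporalSentence) -> bool:
    """Congruent iff the order of mention matches the order in time."""
    return mention_order(s) == chronological_order(s)


# The four combinations, numbered as in the text (climb a tree first, then read).
examples = [
    TemporalSentence("climbed a tree", "read a book", "before", "main-subordinate"),  # (1)
    TemporalSentence("climbed a tree", "read a book", "before", "subordinate-main"),  # (2)
    TemporalSentence("read a book", "climbed a tree", "after", "main-subordinate"),   # (3)
    TemporalSentence("read a book", "climbed a tree", "after", "subordinate-main"),   # (4)
]
```

All four sentences share one chronological order. Only their order of mention differs. For an incongruent sentence, the hearer must therefore reverse the order of mention, and this reordering is the step the text links to extra working-memory load.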

Developmental studies in TD children report that congruency has an effect on the correct interpretation of before and after (Clark, 1971; Trosborg, 1982; McCormack and Hanley, 2011; Pyykkönen and Järvikivi, 2012; Blything et al., 2015; de Ruiter et al., 2018). Children under the age of 7 have more difficulty interpreting conjunctions in a temporally incongruent order than in a temporally congruent order, and mostly rely on the order in which the events are presented. Pyykkönen and Järvikivi (2012) showed that children between 8 and 12 years old still experience difficulties interpreting temporal conjunctions in an incongruent order, especially when the cue to event order occurs sentence-medially, as in example sentence (3).

Children's difficulties with interpreting temporal conjunctions in an incongruent order have been explained in various ways. For example, these difficulties have been argued to result from a still fragile understanding of the meaning of the temporal conjunctions before and after (Clark, 1971), from difficulty shifting one's perspective to a different point in time (McCormack and Hoerl, 1999; McCormack and Hanley, 2011), from difficulty processing subordinate-main clause orders (Diessel, 2008), and from difficulty holding information active in working memory (WM) during processing to create a chronological mental representation of the events (Blything et al., 2015; Blything and Cain, 2016). In adults, interpreting temporal conjunctions in an incongruent rather than congruent order comes with processing costs and has been shown to tax WM (Münte et al., 1998). So, interpreting temporal conjunctions in an incongruent order may require additional cognitive effort.

According to anecdotal evidence and clinical reports, individuals with ASD encounter difficulties in time perception (Wing, 1996). They often report a need to adhere to rituals and routines and are commonly preoccupied with timetables, clocks, and calendars, which may serve to compensate for their failure to predict future events and their disorientation in time (Allman and DeLeon, 2009). This led Boucher (2001) to suggest that individuals with ASD have an impaired sense of time. So far, few studies have been conducted on time perception in children with ASD. Some studies report intact time perception (Wallace and Happé, 2008; Gil et al., 2012), while other studies suggest that children with ASD experience particular difficulties with understanding temporal ordering and concepts such as duration, succession, past, and future (Gillberg and Peeters, 1995; Boucher et al., 2007; Maister and Plaisted-Grant, 2011). Also, some studies report that children with ASD use fewer temporal expressions in story-telling (Colle et al., 2008) and more often omit tense marking than their TD peers (Roberts et al., 2004). These findings regarding the production of temporal expressions suggest that children with ASD may struggle with their interpretation of temporal conjunctions as well, although a mismatch between their production abilities and their comprehension abilities is also conceivable (see Hendriks, 2014 for an overview and discussion of attested production–comprehension asymmetries in child language).

Executive functioning (EF) impairments, often present in children with ASD (Hill, 2004), could make it especially difficult to interpret temporal conjunctions in an incongruent order. EF refers to cognitive processes such as WM (the capacity system that allows the temporary storage and manipulation of information necessary for complex tasks such as language comprehension; Baddeley, 2000), inhibition (the mental ability to suppress irrelevant information; Dagenbach and Carr, 1994), and flexibility (the mental ability to shift between different thoughts or actions; Scott, 1962), which allow for the flexible alteration of thought and behavior in response to changing contexts (Welsh and Pennington, 1988). Recent studies have argued that TD children between 3 and 7 years old have more difficulty interpreting temporal conjunctions in an incongruent order than in a congruent order because more information must be maintained in WM to revise the mental representation of the events and create a chronological mental representation (Blything et al., 2015; Blything and Cain, 2016). The neuroimaging studies of Münte et al. (1998) and Ye et al. (2012) suggest that WM is also needed for the temporal re-ordering of events in adults. Furthermore, the ability to inhibit an initial interpretation and to flexibly revise a mental representation of event order could be needed to interpret conjunctions in an incongruent order (Pyykkönen and Järvikivi, 2012; Blything and Cain, 2016). Thus, in addition to WM, cognitive inhibition and cognitive flexibility may also be involved.

In addition to impairments in these executive functions, impairments in Theory of Mind (ToM) understanding (Frith and Frith, 2006) could also make it difficult for children with ASD to interpret temporal conjunctions in an incongruent order. ToM is the ability to take the cognitive perspective of other people to understand their beliefs, desires and intentions (Wimmer and Perner, 1983) and is argued to be impaired in children with ASD (Baron-Cohen et al., 1985). If the interpretation of an incongruent temporal order involves ToM understanding, an incongruent temporal order may be especially difficult for children with ASD. Several studies have suggested that the interpretation of temporal language requires consideration not only of the actual perspective in time but also of alternative temporal perspectives (McGlone and Harding, 1998; McCormack and Hoerl, 1999; Stocker, 2012). According to McCormack and Hoerl (1999), hearers should not only be able to shift from the actual perspective in time to alternative temporal perspectives, but should also understand the relation between these perspectives. Based on their account of the development of temporal understanding, they posit that "temporal perspective taking involves mentalizing abilities" (McCormack and Hoerl, 1999; p. 174). Thus, mentalizing, or ToM understanding, could be involved in the comprehension of an incongruent order of events.

This is the first study to investigate how 6- to 12-year-old children with ASD and their TD peers interpret temporal conjunctions. We expect all children to find the interpretation of before and after more difficult in the incongruent order than in the congruent order, and children with ASD to find the interpretation of these temporal conjunctions in an incongruent order more difficult than their TD peers do. Because EF and ToM have been reported to be impaired in at least some individuals with ASD, impairments in these domains could explain the hypothesized difficulties with the interpretation of temporal conjunctions in children with ASD. Therefore, we further hypothesize that differences in the interpretation of temporal conjunctions in an incongruent order are associated with individual differences in EF and ToM understanding. In addition to the specific cognitive factors EF and ToM, we also examine the role of the more general cognitive factors IQ and verbal ability. EF and ToM may not only provide insight into the individual differences in ASD that play a role in temporal language understanding, but may also provide insight into what it is in the broad measures of IQ and verbal ability that possibly explains temporal language understanding.

### MATERIALS AND METHODS

### Participants

In this study, 48 children with ASD and 43 TD children participated. All children were monolingual native speakers of Dutch without any reported language disorders. The children in the ASD group had been diagnosed with ASD by clinicians on the basis of the DSM-IV-TR criteria (American Psychiatric Association, 2000) and had an IQ above 75 based on a clinically administered full IQ test. Additionally, in all children (ASD as well as TD), certified professionals administered the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1999), the Autism Diagnostic Interview-Revised (ADI-R; Rutter et al., 2003), two subtests (Vocabulary and Block Design) of the WISC-III-NL to estimate IQ (Kort et al., 2002), and the Peabody Picture Vocabulary Test to measure Verbal Ability (VA) (PPVT-III-NL; Schlichting, 2005). Two children from the ASD group were excluded because they met neither the ADOS criteria nor the ADI-R criteria for ASD (cf. the ASD2 criteria of Risi et al., 2006). One child from the TD group met the ADOS criteria for ASD and was therefore excluded as well, leaving 46 children with ASD (mean age = 9;4, SD = 2;2) and 42 TD children (mean age = 9;2, SD = 2;0) for further analysis. The group descriptives of the ASD group and the TD group are provided in **Table 1**.

Children with ASD were recruited via outpatient clinics for child and adolescent psychiatry in Groningen and a national website for parents of children with ASD. TD children were recruited via advertisements in newsletters and flyers at schools in the north of the Netherlands. The children were tested individually on a single day in a quiet room at the university with two experimenters present. This study is part of a wider study on language and perspective taking in children with ASD, in which all children of the current study participated. The medical ethics committee of the University Medical Hospital Groningen evaluated this study as not falling under the Medical Research Involving Human Subjects Act (WMO). Nevertheless, we followed the required procedures and obtained written informed consent from the parents of all participants for their child's participation in the research.

<sup>a</sup>PDD-NOS: pervasive developmental disorder-not otherwise specified; <sup>b</sup>The ASD2 criteria of Risi et al. (2006) are: "a child meets criteria on Social and Communication domains or meets criteria on Social and within two points of Communication criteria or meets criteria on Communication and within two points of Social criteria or within one point on both Social and Communication domains" (Risi et al., 2006; p. 1100); <sup>c</sup>Estimated IQ of two subtests of the Dutch version of the Wechsler Intelligence Scale for Children (WISC-III-NL; Kort et al., 2002); <sup>d</sup>Normed verbal ability score from the Dutch version of the Peabody Picture Vocabulary Test (PPVT-III-NL; Schlichting, 2005); <sup>e</sup>Excluded from the group descriptives in this table as well as from the analyses; ∗∗p < 0.01; ∗∗∗p < 0.001.

### Language Comprehension Task

Comprehension of temporal conjunctions was tested using a picture selection task. Per item, participants saw two pictures side by side on a computer screen, each depicting an event (see **Figure 1**). Simultaneously, they heard a pre-recorded sentence describing the temporal order of the two events. Participants had to press one of two buttons on a button box to select the picture that, according to the sentence, showed the event that happened first. The sentences contained either *voordat* ("before") or *nadat* ("after"), which occurred either in sentence-initial position (corresponding to subordinate-main clause order) or in sentence-medial position (corresponding to main-subordinate clause order). Examples of each of the four conditions (conjunction × position) in the language comprehension task are shown below in Dutch, followed by word-by-word glosses and English translations:

*Hij klom in de boom voordat hij het boek las.*

he climbed in the tree before he the book read

"He climbed the tree before he read the book."


The events in sentences (2) and (3) are mentioned in a congruent order, whereas the events in (1) and (4) are mentioned in an incongruent order. All events were unrelated to avoid a preference for one of the two event orders based on event typicality.
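The note to Table 2 states the mapping from conjunction and clause order to congruency: Before+Main-subordinate and After+Subordinate-main are congruent, the other two combinations incongruent. The Python sketch below is our own illustration of that rule, not the authors' stimulus materials; the string labels and the helper names `is_congruent` and `first_event` are hypothetical.

```python
def is_congruent(conjunction: str, clause_order: str) -> bool:
    """True if the event mentioned first is also the event that happened first.

    Rule from the Table 2 note: Before+Main-subordinate and
    After+Subordinate-main are congruent; the other two are incongruent.
    """
    if conjunction not in ("before", "after"):
        raise ValueError(f"unknown conjunction: {conjunction!r}")
    if clause_order not in ("main-subordinate", "subordinate-main"):
        raise ValueError(f"unknown clause order: {clause_order!r}")
    return (conjunction == "before") == (clause_order == "main-subordinate")


def first_event(conjunction: str, main_event: str, subordinate_event: str) -> str:
    """Event that happened first, i.e. the correct picture in the task.

    'X before Y': the main-clause event X is first.
    'X after Y': the subordinate-clause event Y is first.
    Clause order (sentence position) does not change which event was first.
    """
    if conjunction not in ("before", "after"):
        raise ValueError(f"unknown conjunction: {conjunction!r}")
    return main_event if conjunction == "before" else subordinate_event


if __name__ == "__main__":
    for conj in ("before", "after"):
        for order in ("main-subordinate", "subordinate-main"):
            label = "congruent" if is_congruent(conj, order) else "incongruent"
            print(f"{conj:6} {order:16} -> {label}")
    # "He climbed the tree before he read the book."
    print(first_event("before", "climb tree", "read book"))
```

The example sentence above (before in sentence-medial position, i.e. main-subordinate order) is therefore congruent: climbing is both mentioned first and happened first.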

Stimuli were presented and responses were recorded using the computer software E-Prime 2.0 (Schneider et al., 2002). First, children completed three practice items to learn that the left and right buttons corresponded to the left and right pictures, respectively. This was followed by an introduction of the boy in the pictures and three practice items containing other temporal expressions (e.g., "today" and "yesterday") to determine whether the participant understood the principle of temporal ordering in the task. Next, the participants received 32 test items, with a short break in the middle. The test items were distributed across 4 lists. Each list contained 16 congruent test items and 16 incongruent test items in a randomized order. We counterbalanced the position of the pictures on the screen. The experiment took approximately 15 min.

### Cognitive Tasks

#### Working Memory

To test WM, the N-Back task (Owen et al., 2005) was used. In this task, participants had to watch and remember pictures presented one by one on a computer screen and indicate whether the picture on the screen was a particular object or not (0-back or baseline condition), whether it matched the picture one trial before (one-back condition), and whether it matched the picture two trials before (two-back condition). Participants received a practice session of 15 trials per condition and a test session consisting of 60 trials per condition. The mean accuracy (ACC) on the two-back condition was calculated as a measure of WM.
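As a rough illustration of how two-back accuracy could be scored, the sketch below marks trial i as a target when its picture matches the picture n trials earlier, and counts a response as correct when the button press ("match" or "no match") agrees with that target status. The data layout (a list of picture labels and a list of boolean responses) and the decision to leave the first n trials unscored are our assumptions; the text does not describe the scoring at this level of detail.

```python
from typing import Hashable, List, Sequence


def n_back_targets(stimuli: Sequence[Hashable], n: int) -> List[bool]:
    """For each scorable trial i >= n, True if stimulus i equals stimulus i - n."""
    return [stimuli[i] == stimuli[i - n] for i in range(n, len(stimuli))]


def n_back_accuracy(stimuli: Sequence[Hashable],
                    responses: Sequence[bool],
                    n: int = 2) -> float:
    """Proportion of scorable trials on which the response matched target status.

    responses[i] is True if the child pressed "match" on trial i.
    The first n trials have no n-back comparison and are left unscored
    (an assumption made for this sketch).
    """
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")
    if len(stimuli) <= n:
        raise ValueError("need more than n trials to score an n-back block")
    targets = n_back_targets(stimuli, n)
    scored = list(responses)[n:]
    correct = sum(1 for t, r in zip(targets, scored) if t == r)
    return correct / len(targets)


if __name__ == "__main__":
    pictures = ["cat", "dog", "cat", "bus", "cat", "bus"]  # made-up sequence
    pressed_match = [False, False, True, False, True, False]
    print(n_back_accuracy(pictures, pressed_match, n=2))  # 0.75
```

Here trials 3 to 6 are scored; the child misses the final target (bus two trials after bus), giving 3 of 4 correct.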

#### Cognitive Inhibition

To test cognitive inhibition, the Flanker task [Amsterdam Neuropsychological Test battery (ANT) version 2.1; De Sonneville, 1999] was administered. In this task, participants had to identify the color of a target stimulus surrounded by eight distractors (flankers). The target color was red or green and was associated with the left or right button, respectively. The flankers were either in the same color as the target (compatible trials) or in the color that was associated with the opposite response (incompatible trials). For this task, participants received 12 practice items, 40 compatible test items, and 40 incompatible test items. Cognitive inhibition was quantified in terms of both ACC and mean reaction time (RT) by subtracting the mean ACC or RT on compatible trials from the mean ACC or RT, respectively, on incompatible trials (the congruency effect; see Mullane et al., 2009).
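The congruency effect is an incompatible-minus-compatible difference score computed per participant. In this sketch, trials are `(condition, correct, rt_ms)` tuples and mean RT is taken over correct trials only; both the tuple layout and the correct-trials-only rule are assumptions, since the text does not say how RTs were filtered.

```python
import math
from typing import Iterable, List, Tuple

Trial = Tuple[str, bool, float]  # (condition, correct, reaction time in ms)


def _acc_and_rt(trials: List[Trial], condition: str) -> Tuple[float, float]:
    """Proportion correct and mean RT (correct trials only) for one condition."""
    subset = [t for t in trials if t[0] == condition]
    if not subset:
        raise ValueError(f"no {condition} trials")
    acc = sum(1 for _, correct, _ in subset if correct) / len(subset)
    correct_rts = [rt for _, correct, rt in subset if correct]
    rt = sum(correct_rts) / len(correct_rts) if correct_rts else math.nan
    return acc, rt


def congruency_effect(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Return (ACC effect, RT effect), each incompatible minus compatible."""
    trials = list(trials)
    acc_inc, rt_inc = _acc_and_rt(trials, "incompatible")
    acc_com, rt_com = _acc_and_rt(trials, "compatible")
    return acc_inc - acc_com, rt_inc - rt_com


if __name__ == "__main__":
    # Made-up trials for one child, not data from the study.
    demo = [
        ("compatible", True, 400), ("compatible", True, 420),
        ("compatible", True, 410), ("compatible", False, 600),
        ("incompatible", True, 450), ("incompatible", False, 700),
        ("incompatible", True, 470), ("incompatible", False, 650),
    ]
    print(congruency_effect(demo))  # (-0.25, 50.0)
```

A negative ACC effect and a positive RT effect both indicate interference from incompatible flankers; larger magnitudes are usually read as weaker inhibition.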

FIGURE 1 | An example of the two pictures of an item in the language comprehension task. Written informed consent was obtained from the parents for publication of their child's images.

#### Cognitive Flexibility

To test cognitive flexibility, we adapted the gender–emotion switch task of De Vries and Geurts (2012) to make it more similar to a classical switch task (e.g., Rogers and Monsell, 1995). In our shape–color switch task, participants saw pictures of round or square figures in black or white on the computer screen and had to press the left or right button to report the shape (round or square) or the color (black or white) of the figure. The cue at the top of the screen indicated whether the shape or the color had to be reported. Participants received 16 items to practice with shape, 16 items to practice with color, and 40 items to practice switching between shape and color. The test consisted of 216 trials in total; a third of these trials (72) were switch trials (switching from color to shape or vice versa) and the remaining two-thirds were repeat trials. Switch costs were computed for both ACC and RT by subtracting the mean ACC or RT on repeat trials from the mean ACC or RT, respectively, on switch trials (cf. De Vries and Geurts, 2012).
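Switch costs use the same difference-score logic, but each trial must first be labeled as a switch or repeat trial by comparing its cued task with the task on the previous trial. In the sketch below, the first trial (which has no predecessor) is left unlabeled; that choice, the parallel-list data layout, and averaging RTs over correct trials only are our assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def label_trials(tasks: Sequence[str]) -> List[Optional[str]]:
    """'switch' if the cued task differs from the previous trial's, else 'repeat'."""
    labels: List[Optional[str]] = [None]  # first trial has no predecessor
    for prev, cur in zip(tasks, tasks[1:]):
        labels.append("switch" if cur != prev else "repeat")
    return labels


def switch_costs(tasks: Sequence[str], correct: Sequence[bool],
                 rts: Sequence[float]) -> Tuple[float, float]:
    """Return (ACC cost, RT cost), each switch minus repeat."""
    if not (len(tasks) == len(correct) == len(rts)):
        raise ValueError("tasks, correct and rts must have equal length")
    labels = label_trials(tasks)

    def summarize(kind: str) -> Tuple[float, float]:
        idx = [i for i, lab in enumerate(labels) if lab == kind]
        if not idx:
            raise ValueError(f"no {kind} trials")
        acc = sum(1 for i in idx if correct[i]) / len(idx)
        ok_rts = [rts[i] for i in idx if correct[i]]
        rt = sum(ok_rts) / len(ok_rts) if ok_rts else math.nan
        return acc, rt

    acc_sw, rt_sw = summarize("switch")
    acc_rep, rt_rep = summarize("repeat")
    return acc_sw - acc_rep, rt_sw - rt_rep


if __name__ == "__main__":
    # Made-up block, not data from the study.
    cued = ["shape", "shape", "color", "color", "shape", "shape"]
    ok = [True, True, False, True, True, True]
    ms = [500, 520, 700, 540, 650, 560]
    print(label_trials(cued))
    print(switch_costs(cued, ok, ms))  # (-0.5, 110.0)
```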

#### Theory of Mind

To test first-order and second-order ToM, the Bake Sale task adapted from Hollebrandse et al. (2014) was used. This task is a second-order false belief (FB) task with stories modeled after Perner and Wimmer's (1985) "ice cream truck story," in which the beliefs of various characters were manipulated. Per story, participants heard a verbal description of the events in the story, accompanied by four pictures that were presented one by one. During the presentation of the story, they received three questions to probe their understanding of the events in the story, as well as a question about another person's FB (first-order FB question) and a question about one person's FB about a second person's belief (second-order FB question). The task consisted of eight stories in total, each of which contained a first-order FB question and a second-order FB question. ToM1 and ToM2 were calculated as the ACC on the eight first-order FB questions and the ACC on the eight second-order FB questions, respectively.

### Data Analysis

The data of the language comprehension task were analyzed using generalized linear mixed models (GLMMs) with a logit link, to accommodate the binary outcome variable Accuracy (0 for incorrect, 1 for correct), which was measured repeatedly (32 trials) (Jaeger, 2008; Heck et al., 2012). Compound symmetry was used as the covariance matrix type. We started with a full factorial model with Congruency (Congruent vs. Incongruent) as a within-group factor and Group (TD vs. ASD) as a between-group factor. Age was mean-centered and additionally included in the model. Interactions that did not have an effect on Accuracy (p > 0.05) were removed from the model one by one, starting with the interaction with the largest p-value, after which we refitted the model. This resulted in model 1, which shows the extent to which Accuracy was predicted by Congruency, Group, and Age, as well as the relevant (p < 0.05) interactions. Possible effects of Type of conjunction (Before vs. After) and Clause order (Main-subordinate vs. Subordinate-main) were subsequently checked post hoc in model 1. For purposes of interpretation, we illustrate significant effects using the median split method.
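The backward-elimination procedure described above (drop the non-significant interaction with the largest p-value, refit, repeat) can be sketched independently of any particular mixed-model software. Here `fit` is a hypothetical placeholder for a routine that refits the GLMM with the given terms and returns a p-value per term. We also assume that an interaction is eligible for removal only when no higher-order interaction containing it remains in the model, a common hierarchy convention that the text does not state explicitly.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical interface: list of model terms -> p-value for each term.
FitFn = Callable[[List[str]], Dict[str, float]]


def _factors(term: str) -> FrozenSet[str]:
    """'Congruency*Group' -> {'Congruency', 'Group'}."""
    return frozenset(term.split("*"))


def backward_eliminate(terms: Sequence[str], fit: FitFn,
                       alpha: float = 0.05) -> Tuple[List[str], Dict[str, float]]:
    """Drop non-significant interactions one at a time, refitting after each drop.

    Main effects are never dropped. An interaction is eligible only if no
    higher-order term still in the model contains all of its factors.
    """
    current = list(terms)
    while True:
        pvals = fit(current)
        eligible = [
            t for t in current
            if "*" in t
            and not any(_factors(t) < _factors(other) for other in current)
            and pvals[t] > alpha
        ]
        if not eligible:
            return current, pvals
        worst = max(eligible, key=lambda t: pvals[t])
        current.remove(worst)


if __name__ == "__main__":
    # Fixed, made-up p-values; a real refit would change them at every step.
    fake_p = {
        "Congruency": 0.40, "Group": 0.01, "Age": 0.02,
        "Congruency*Group": 0.60, "Congruency*Age": 0.30, "Group*Age": 0.04,
        "Congruency*Group*Age": 0.80,
    }

    def fake_fit(terms: List[str]) -> Dict[str, float]:
        return {t: fake_p[t] for t in terms}

    final, _ = backward_eliminate(list(fake_p), fake_fit)
    print(final)  # ['Congruency', 'Group', 'Age', 'Group*Age']
```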

Next, the seven parameters derived from the N-Back task (WM), the Flanker task (Cognitive inhibition ACC and Cognitive inhibition RT), the cognitive flexibility task (Switch costs ACC and Switch costs RT) and the FB task (ToM1 and ToM2) were mean-centered and, one by one, examined as main effects and in interaction with the significant predictors from model 1 in seven separate analyses. The data of 3 participants (2 ASD and 1 TD) were missing in the Cognitive inhibition ACC and RT analyses, leaving the data of 44 participants with ASD and 41 TD participants. In each separate analysis, interactions that had no effect on Accuracy (p > 0.05) were removed from the model. Based on the outcomes of these analyses per predictor, we combined the cognitive factors that had (main or interaction) effects on Accuracy (p < 0.05) and added them, together with the significant predictors of model 1, to a model with multiple predictors to evaluate their effects adjusted for one another (cf. Kuijper et al., 2015; Overweg et al., 2018). This resulted in model 2, which shows the relevant cognitive factors that had an effect on the interpretation of temporal conjunctions.
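Mean-centering each predictor is straightforward, but because three participants lacked Flanker data, the mean has to be taken over the available cases. The sketch below represents missing values as `None` and keeps them missing after centering; how the authors handled these cases beyond excluding them from the relevant analyses is not described, so this is an assumption.

```python
from typing import List, Optional, Sequence


def mean_center(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Subtract the mean of the observed values; missing (None) stays None."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to center")
    m = sum(observed) / len(observed)
    return [None if v is None else v - m for v in values]


if __name__ == "__main__":
    # Made-up scores; the second participant has no Flanker data.
    print(mean_center([2, None, 4, 6]))  # [-2.0, None, 0.0, 2.0]
```

After centering, a predictor's main effect refers to a participant with an average score, which also makes the lower-order terms of any interaction easier to interpret.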

Finally, the parameters from the WISC (estimated IQ on the basis of the subtests Vocabulary and Block Design) and PPVT (VA) were mean-centered and entered into model 1 in two separate analyses. Predictors with an effect on Accuracy (p < 0.05) were added to model 2, resulting in model 3, which shows whether these general background variables changed the effects found in model 2. Given the significant group differences in estimated IQ and VA (see **Table 1**), this approach provides a statistical alternative to a priori matching on estimated IQ and VA.

### RESULTS

Model 1 showed main effects of Group and Age, indicating that the children in the TD group were more accurate in their interpretation of temporal conjunctions than the children in the ASD group, and that older children performed better than younger children. No main effect of Congruency or interactions with Congruency were found (all p-values > 0.05). A post hoc exploration of Type of conjunction and Clause order in model 1 showed a main effect of Type of conjunction (B = −0.943; SE = 0.14; p < 0.001), indicating that children performed better on sentences with before than on sentences with after. Clause order did not influence performance (p > 0.05). **Table 2** lists all remaining effects in model 1.
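Because the GLMMs use a logit link, coefficients such as B are on the log-odds scale. Exponentiating B gives an odds ratio, and B/SE gives an approximate Wald z statistic. The sketch below applies these standard conversions to the reported Type of conjunction effect; it assumes a Wald test against a normal reference distribution, and that After is the coded level relative to Before (the text does not name the reference category, although the negative sign and the reported direction suggest this).

```python
import math
from typing import Dict


def odds_ratio_summary(b: float, se: float, z_crit: float = 1.96) -> Dict[str, float]:
    """Convert a logit-scale coefficient to an odds ratio with a Wald 95% CI."""
    z = b / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return {
        "odds_ratio": math.exp(b),
        "ci_low": math.exp(b - z_crit * se),
        "ci_high": math.exp(b + z_crit * se),
        "z": z,
        "p": p_two_sided,
    }


if __name__ == "__main__":
    # Type of conjunction effect reported for model 1.
    s = odds_ratio_summary(-0.943, 0.14)
    print({k: round(v, 4) for k, v in s.items()})
```

Under these assumptions, the odds of a correct response for after are roughly 0.39 times those for before (95% CI about 0.30 to 0.51), and the Wald p-value is far below 0.001.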

**Figure 2** presents the mean proportions of correct responses in the congruent and incongruent condition separately for the ASD and TD groups.

Next, we examined one by one which cognitive factors were associated with Accuracy. The separate analyses indicated a main effect of WM (B = 2.355; SE = 0.747; p = 0.002) and interactions of ToM1∗Congruency (B = 2.325; SE = 1.034; p = 0.026) and ToM2∗Congruency (B = 1.465; SE = 0.552; p = 0.009). No effects of Cognitive inhibition or Cognitive flexibility were found (p-values > 0.05).

Then, we combined all significant interactions and main effects of these analyses per predictor in model 2, a model with multiple predictors. The interaction effect of ToM1∗Congruency was no longer significant when adjusted for the other cognitive variables and was removed from the model. **Table 2** lists all remaining effects in model 2.

Model 2 showed a main effect of WM (p = 0.03; see **Table 2**), indicating that children with lower WM were less accurate in their interpretation of temporal conjunctions than children with higher WM. Model 2 also showed an interaction effect of ToM2∗Congruency (p = 0.01; see **Table 2**). As shown in **Figure 3**, children with lower second-order ToM understanding were less accurate in their interpretation of temporal conjunctions in the Incongruent condition than children with higher second-order ToM understanding. The median split method was used to plot Accuracy of temporal conjunction interpretation in each condition per ToM2 group (low ToM2: ≤0.75 vs. high ToM2: >0.75) to illustrate the direction of the interaction effect. The figure caption of **Figure 3** provides background information about the ToM performance of each group.
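The median split used for Figure 3 can be sketched as follows. Scores at or below the median go to the low group, matching the "low ToM2: ≤0.75" convention, and mean accuracy is then computed per group and condition. The data layout (one dictionary of per-condition accuracies per child) and all numbers are made up for illustration; the split only visualizes the interaction, which was tested with the continuous ToM2 score.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def median_split(scores: Sequence[float]) -> Tuple[float, List[str]]:
    """Return the median and a 'low'/'high' label per score (ties go to 'low')."""
    m = median(scores)
    return m, ["low" if s <= m else "high" for s in scores]


def group_condition_means(groups: Sequence[str],
                          accuracy: Sequence[Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Mean accuracy per (group, condition); accuracy[i] maps condition -> proportion."""
    cells: Dict[Tuple[str, str], List[float]] = {}
    for g, acc in zip(groups, accuracy):
        for cond, value in acc.items():
            cells.setdefault((g, cond), []).append(value)
    return {key: sum(v) / len(v) for key, v in cells.items()}


if __name__ == "__main__":
    tom2 = [0.5, 0.75, 0.75, 1.0, 0.875]  # made-up second-order FB scores
    acc = [
        {"congruent": 0.90, "incongruent": 0.70},
        {"congruent": 0.95, "incongruent": 0.75},
        {"congruent": 0.85, "incongruent": 0.80},
        {"congruent": 0.90, "incongruent": 0.90},
        {"congruent": 0.95, "incongruent": 0.95},
    ]
    cut, groups = median_split(tom2)
    print(cut, groups)
    for key, value in sorted(group_condition_means(groups, acc).items()):
        print(key, round(value, 3))
```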

The main effects of Group and Age disappeared with the addition of ToM2 and WM in model 2 (all p-values >0.05; see **Table 2**).

Finally, we checked for possible effects of the background variables IQ and VA on Accuracy. These analyses per predictor indicated main effects of IQ (B = 0.026; SE = 0.005; p < 0.001) and VA (B = 0.033; SE = 0.006; p < 0.001) and an interaction effect of VA∗Age (B = 0.001; SE = 0.00; p < 0.001). In model 3, we combined these main and interaction effects with the effects of model 2. Model 3 showed main effects of IQ and VA, indicating that children with a lower IQ and lower VA were less accurate than children with a higher IQ and higher VA, respectively. The interaction of VA∗Age remained significant in this analysis with multiple predictors, indicating that younger children (regardless of their VA), and older children with low VA, were less accurate in their interpretation of temporal conjunctions than older children with high VA, as shown in **Figure 4**. Again, the median split method was used to plot Accuracy of temporal conjunction interpretation in each condition per VA group to illustrate the direction of the interaction effect. The figure caption of **Figure 4** provides background information about the VA performance of each group.
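The VA∗Age interaction can be read as an age-dependent VA slope. Writing the relevant part of the linear predictor on the logit scale with mean-centered VA and Age (other terms omitted; the β symbols are generic placeholders, not the reported estimates), the slope of VA at a given age is:

$$
\begin{aligned}
\operatorname{logit} P(\text{correct}) &= \beta_0 + \beta_{\mathrm{VA}}\,\mathrm{VA}_c + \beta_{\mathrm{Age}}\,\mathrm{Age}_c + \beta_{\mathrm{VA}\times\mathrm{Age}}\,\mathrm{VA}_c\,\mathrm{Age}_c + \cdots \\
\frac{\partial\,\operatorname{logit} P(\text{correct})}{\partial\,\mathrm{VA}_c} &= \beta_{\mathrm{VA}} + \beta_{\mathrm{VA}\times\mathrm{Age}}\,\mathrm{Age}_c
\end{aligned}
$$

With a positive interaction coefficient, the VA slope equals β_VA at the mean age (Age_c = 0) and becomes steeper for older children, which is consistent with VA mattering most among the older children in Figure 4.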

With the addition of IQ and VA, the main effect of WM disappeared (p > 0.05). The interaction effect of ToM2∗Congruency remained significant in model 3. Together, the results show that second-order ToM, WM, IQ, and VA play a role in the interpretation of temporal conjunctions. Individual and group differences therein explain why the TD group performs better than the ASD group and why older children perform better than younger children.

### DISCUSSION

We investigated time perception in language by examining the interpretation of sentences containing the temporal conjunctions before and after by native Dutch school-aged children with and without ASD. We found, in line with our predictions, that children with ASD were less accurate than their TD peers at interpreting these temporal conjunctions. Contrary to our predictions, however, children with ASD did not have particular difficulties with temporal conjunctions in an incongruent compared to a congruent order. Furthermore, older children were found to perform better than younger children.

TABLE 2 | Estimated effects of variables per model on the interpretation of temporal conjunctions.

The models were built with accuracy in the language comprehension task as the dependent variable and the variables listed in the first column as independent variables. The variable Congruency was manipulated by Type of conjunction (Before vs. After) and Clause order (Main-subordinate vs. Subordinate-main), with Before+Main-subordinate and After+Subordinate-main resulting in Congruent items, and Before+Subordinate-main and After+Main-subordinate resulting in Incongruent items. A post hoc exploration of Type of conjunction and Clause order in model 1 showed a main effect of Type of conjunction (B = −0.943; SE = 0.14; p < 0.001); <sup>∗</sup>p < 0.05; ∗∗p < 0.01.

To understand the group and age effects, we examined which cognitive factors were associated with the interpretation of temporal conjunctions. We also examined whether the general background variables IQ and Verbal Ability affected interpretation. We found that age, IQ and VA were the major predictors of children's correct interpretation of temporal conjunctions. Furthermore, the group effect was explained by differences in WM, second-order ToM understanding, IQ and VA. Children with ASD as well as TD children with lower WM made more errors when interpreting temporal conjunctions than children with higher WM. However, the effect of WM disappeared when children's IQ was taken into account. This is not surprising, given the strong relation between WM and IQ (Ackerman et al., 2005; Kidd, 2013). Also, IQ is a more broadly defined cognitive variable than WM: in addition to the simple short-term storage component of WM (Colom et al., 2008), it also measures other cognitive abilities. VA appeared to underlie the age improvement in our study. Younger children, and older children with lower VA, made more errors when interpreting temporal conjunctions than older children with higher VA. This suggests that verbal skills must be sufficiently well developed for a mature understanding of complex sentences such as those involving temporal conjunctions. Although previous studies suggested a role for cognitive flexibility and cognitive inhibition (McCormack and Hanley, 2011; Blything et al., 2015), we found no effects of either factor (cf. de Ruiter et al., 2018). Particularly relevant for our research question and hypotheses was our finding that better second-order ToM understanding was positively associated with correct interpretation in an incongruent temporal order.

FIGURE 4 | … group: 95.96; Old-High VA group: 118.48.

Although most children in our study showed a robust understanding of sentences containing temporal conjunctions, as predicted, the children with ASD were less accurate than their TD peers at interpreting these sentences. In line with our hypotheses, this group difference between children with ASD and TD children was explained by differences in WM, second-order ToM understanding, IQ and VA. Because these cognitive functions are more likely to be impaired in children with ASD than in TD children (see Section "Introduction"), we attribute the poorer performance of children with ASD to their impaired cognitive functions rather than to their clinical diagnosis of ASD per se. Thus, the observed effects of cognitive factors on the interpretation of temporal language are likely to be relevant beyond ASD, including for typical development.

We did not find confirmation for our prediction that children with ASD have particular difficulties with temporal conjunctions in an incongruent order. Also, we did not find a main effect of congruency. The children in our study performed as well on congruent items as on incongruent items, in contrast to what has been found in several earlier studies with TD children (Clark, 1971; Trosborg, 1982; McCormack and Hanley, 2011; Pyykkönen and Järvikivi, 2012; Blything et al., 2015; de Ruiter et al., 2018). Possibly, we did not find a main effect of congruency because the children in our study were on average older (with a mean age of 9) than the children in most earlier studies and can be expected to have a more robust understanding of the meaning of the temporal conjunctions. Only one effect of congruency emerged from our data: children who made more errors in their interpretation of temporal conjunctions in an incongruent order were found to have lower second-order ToM understanding. Good ToM understanding may thus help children to correctly interpret temporal conjunctions in an incongruent order, which suggests that perspective taking is needed to interpret temporal conjunctions when the events are presented out of order.

One way to explain the role of ToM is that ToM understanding helps children to shift their perspective to another point in time in response to temporal language, and to understand the relationship between these different temporal perspectives on the same events (cf. McCormack and Hoerl, 1999; McCormack and Hanley, 2011). This explanation is in line with the literature on episodic memory based on the notion of mental time travel (Suddendorf and Corballis, 2007), or mental self-projection (Kretschmer-Trendowicz et al., 2016). Mental time travel involves a shift of the self from the immediate present to an alternative temporal perspective, for example, a past or future perspective (Buckner and Carroll, 2007; Suddendorf and Corballis, 2007). Several studies have suggested that there is a relation between mental time travel abilities and the comprehension of temporal language (Suddendorf and Corballis, 2007; Ferretti and Cosentino, 2013). In addition, it has been found that the neural processes involved in false-belief inferencing and the neural processes involved in mental time travel, in particular in taking the perspective of one's future self to choose between an immediate and a future reward, overlap (O'Connell et al., 2018). In line with our results, this suggests that the comprehension of temporal language involves ToM understanding to enable hearers to shift from the immediate present to another point in time and perceive the situation from these different temporal perspectives.

An alternative possibility is that ToM understanding enables hearers to shift from their own perspective to the perspective of the speaker, for example, to find out why the speaker presented the events in an incongruent order. de Ruiter et al. (2018) explain their finding that children perform better with a congruent than an incongruent order in terms of the semantic principle of iconicity. They suggest that children initially assume an iconic (i.e., congruent) mapping between the order of events in the sentence and the order of events in the real world. Iconicity has been argued elsewhere to result from perspective taking; more complex, marked, forms tend to express more complex, marked, meanings (e.g., Horn, 1984; Levinson, 2000; Aissen, 2003). These more complex meanings have been argued to be acquired later in typical development than their less complex counterparts because they require the hearer to reason about why the speaker did not use the less complex form (e.g., De Hoop and Krämer, 2006; Hendriks et al., 2010). Incongruent meanings are more complex than congruent meanings. Also, sentences with after seem to be more complex than sentences with before, considering the post hoc effect of type of conjunction but not of clause order in our study (see note of **Table 2**) and the observation that before is acquired earlier than after (see Clark, 1971). Thus, a sentence with after may require the hearer to reason about why the speaker chose to use after rather than before, for example to foreground or background particular information. As mentioned above, our results indicate that children who make more errors in their interpretation of temporal conjunctions in an incongruent order have lower second-order ToM understanding. Good ToM understanding may thus help hearers to correctly interpret temporal conjunctions in an incongruent order by allowing them to take the speaker's perspective to find out why the speaker presented the events out of order.

In contrast to the study of de Ruiter et al. (2018), our study suggests that children need sufficient WM capacity for the interpretation of sentences containing temporal conjunctions. The different findings of the role of WM capacity could be the result of different WM measures. While we used a visuospatial WM task (an N-Back task) to operationalize WM capacity, de Ruiter and colleagues used three short-term memory tasks that do not require manipulation of the stored information (a word repetition task, a non-word repetition task and a sentence imitation task). These tasks may not have captured WM to the extent needed in complex sentence comprehension. Our findings confirm the results of Blything and Cain (2016), who used a verbal WM task (a digit span task) and also found a main effect of WM capacity on the interpretation of temporal conjunctions. Importantly, like Blything and Cain, we did not find that congruency interacted with WM in the accuracy task. This suggests that children's difficulties with interpreting temporal conjunctions in an incongruent order are not explained by insufficient WM. Rather, children seem to need sufficient WM to process complex sentences conjoined by a temporal conjunction in general. These findings are corroborated by studies that have shown that individuals need WM capacity for the comprehension of other types of complex sentences as well, such as relative clauses and complement clauses (Just and Carpenter, 1992; Lewis et al., 2006; Montgomery et al., 2008; Boyle et al., 2013).

Turning to the implications of our study for ASD, previous research on temporal language in children with ASD mostly focused on production, showing deficits in the use of temporal adverbials and tense marking (Roberts et al., 2004; Colle et al., 2008). Here, we showed that verbal children with ASD also struggle with the interpretation of temporal conjunctions, a difficulty associated with weaker ToM understanding and lower WM capacity. This finding highlights the need to further study the interpretation of temporal expressions and temporal ordering in individuals with ASD. Languages have various ways to mark present, past and future and do so in almost every sentence. For example, English has tense marking on the finite verb, temporal adverbials such as now, yesterday, and tomorrow, and in addition to before and after also has other temporal conjunctions such as when, while, and then. A possibility for future research is to examine the interpretation of these and other temporal expressions in children with ASD. A second implication of our study for ASD concerns the nature of the language and communication difficulties in children with ASD. Linguistic deficits in verbal children with ASD are mostly viewed as difficulties with pragmatic aspects of language, which depend on its usage in context (American Psychiatric Association, 2013). However, the interpretation of a temporal conjunction depends on its meaning and its position in the sentence, independently of the context in which it is used; therefore, difficulty with the interpretation of temporal conjunctions is structural (i.e., syntactic and semantic) rather than pragmatic in nature. In line with previous studies (Boucher, 2012; Durrleman et al., 2015), our results indicate the need to investigate the linguistic deficits in verbal children with ASD beyond pragmatics.

Summarizing, our study showed that children with ASD were less accurate at interpreting sentences containing temporal conjunctions than their TD peers, but did not have more difficulty with an incongruent than with a congruent order. The different overall performance of children with ASD and TD children was explained by differences in second-order ToM understanding, WM, IQ, and VA, indicating that these factors likely contribute to the mature interpretation of temporal conjunctions. Specifically, second-order ToM understanding was associated with the interpretation of temporal conjunctions in an incongruent order, suggesting that perspective taking is needed either to shift one's own perspective as a hearer from the immediate present to another point in time and relate these different temporal perspectives on the same events, or to shift to the perspective of the speaker to consider the speaker's linguistic choices.

### AUTHOR CONTRIBUTIONS

JO, CH, and PH contributed to the conception and design of the study. JO carried out the experiments. JO, CH, and PH analyzed the data. JO wrote the first draft of the manuscript. CH and PH wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

### FUNDING

This study was funded by the University of Groningen, Groningen, Netherlands.

### ACKNOWLEDGMENTS

The authors thank the children and their parents for participating in this study, Accare Groningen for helping with participant recruitment, and Sanne Kuijper for her statistical assistance.

### REFERENCES




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Overweg, Hartman and Hendriks. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Time Is Not More Abstract Than Space in Sound

#### Alexander Kranjec<sup>1,2</sup>\*, Matthew Lehet<sup>2,3</sup>, Adam J. Woods<sup>4,5</sup> and Anjan Chatterjee<sup>6</sup>

<sup>1</sup> Department of Psychology, Duquesne University, Pittsburgh, PA, United States, <sup>2</sup> Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, United States, <sup>3</sup> Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States, <sup>4</sup> Cognitive Aging and Memory Clinical Translational Research Program, Institute on Aging, University of Florida, Gainesville, FL, United States, <sup>5</sup> Department of Aging and Geriatric Research, University of Florida, Gainesville, FL, United States, <sup>6</sup> Department of Neurology, University of Pennsylvania, Philadelphia, PA, United States

Time is talked about in terms of space more frequently than the other way around. Some have suggested that this asymmetry runs deeper than language. The idea that we think about abstract domains (like time) in terms of relatively more concrete domains (like space) but not vice versa can be traced to Conceptual Metaphor Theory. This theoretical account has some empirical support. Previous experiments suggest an embodied basis for space-time asymmetries that runs deeper than language. However, these studies frequently involve verbal and/or visual stimuli. Because vision makes a privileged contribution to spatial processing, it is unclear whether these results speak to a general asymmetry between time and space based on each domain's general level of relative abstractness, or reflect modality-specific effects. The present study was motivated by this uncertainty and by what appears to be audition's privileged contribution to temporal processing. In Experiment 1, using an auditory perceptual task, temporal duration and spatial displacement were shown to be mutually contagious. Irrelevant temporal information influenced spatial judgments and vice versa, with a larger effect of time on space. Experiment 2 examined the mutual effects of space, time, and pitch. Pitch was investigated because it is a fundamental characteristic of sound perception. It was reasoned that if space is indeed less relevant to audition than time, then spatial distance judgments should be more easily contaminated by variations in auditory frequency, while variations in distance should be less effective in contaminating pitch perception. While time and pitch were shown to be mutually contagious in Experiment 2, irrelevant variation in auditory frequency affected estimates of spatial distance, whereas variations in spatial distance did not affect pitch judgments. Results overall suggest that the perceptual asymmetry between spatial and temporal domains does not necessarily generalize across modalities, and that time is not generally more abstract than space.

Keywords: space perception, time perception, pitch perception, embodied cognition, conceptual metaphor theory

#### Edited by:

Patricia J. Brooks, College of Staten Island, United States

#### Reviewed by:

Thora Tenbrink, Bangor University, United Kingdom

Junqing Chen, City University of New York, United States

> \*Correspondence: Alexander Kranjec, kranjeca@duq.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 23 March 2018; Accepted: 09 January 2019; Published: 01 February 2019

#### Citation:

Kranjec A, Lehet M, Woods AJ and Chatterjee A (2019) Time Is Not More Abstract Than Space in Sound. Front. Psychol. 10:48. doi: 10.3389/fpsyg.2019.00048

## INTRODUCTION

Time is frequently talked about using the language of space (Clark, 1973; Haspelmath, 1997; Tenbrink, 2007). A meeting can be long or short, and occupy a place that is either behind or in front of us in time. Space is used to talk about time not only frequently but also meaningfully. We talk about temporal extent or duration in terms of distance (e.g., a short time), and the past and future in egocentric locational terms (e.g., the past is behind us). These ways of talking and thinking about space and time are thought to reflect something about how we experience these domains together. We may talk about duration in terms of length because it takes more time to visually scan or travel through a more extended space, and about the past as behind because, as we walk forward, objects we pass begin to occupy the unseen space behind our bodies, becoming accessible only to memory and part of a temporal past. Experimental studies support the idea that the ways in which we experience space play a role in structuring the semantics of time (Boroditsky, 2000, 2001; Boroditsky and Ramscar, 2002; Matlock et al., 2005; Nunez and Sweetser, 2006; Nunez et al., 2006; Torralbo et al., 2006; Casasanto and Boroditsky, 2008; Kranjec et al., 2010; Miles et al., 2010; Kranjec and McDonough, 2011). See Nunez and Cooperrider (2013) for a recent review of experimental research, and Evans (2013) for a perspective from cognitive linguistics.

In semantics, time–space relations are relatively asymmetrical. Not only is time lexicalized in spatial terms much more frequently than vice versa, but in many ways time must be conceptualized using the language of space, whereas the opposite is not true (Jackendoff, 1983; Casasanto and Boroditsky, 2008). [However, see Tenbrink (2007) for a discussion of how such asymmetric mapping relations do not necessarily apply to discourse, and a general perspective on time–space relations that is highly compatible with the one presented in the current study.] These linguistic patterns have been interpreted to suggest a deeper conceptual organization. According to Conceptual Metaphor Theory (Lakoff and Johnson, 1999), we think about relatively abstract target domains (like time) in terms of more concrete source domains (like space). This basic organizational principle is purported to serve the functional role of making more abstract concepts easier to talk and think about. It is argued that we depend on such a hierarchy because, for example, we can directly see and touch things "in space" in a way that we cannot "in time." On this view, thinking about time in terms of space runs cognitively deep and reflects a mental organization more fundamental than that observed at the relatively superficial level of semantics.

In a widely cited paper, Casasanto and Boroditsky (2008) sought strong experimental evidence for this theoretical organizational principle. Specifically, they wanted to know if the asymmetry of space-time metaphors in language predicted a similar asymmetry in perception. They reasoned that low-level perceptual biases demonstrating concordant asymmetry with patterns found in language would provide strong evidence that temporal representations are grounded in more concrete spatial representations.

In their study, participants viewed growing or static lines one at a time on a computer screen. Lines could be of nine durations crossed with nine displacement sizes to produce 81 unique stimuli. After the presentation of each line, participants were randomly prompted to either reproduce a line's spatial extent (by dragging a mouse) or a line's duration (by clicking a mouse). Each line was presented twice: once in each kind of reproduction trial (i.e., displacement or duration estimation).

They found that the remembered size of a line in space concordantly modulated recall for its duration, but not vice versa. That is, spatially longer lines were remembered as being presented for longer times, but lines of greater durations were not remembered as having greater spatial extent. The results were consistent with the idea that asymmetrical patterns of space-time mappings in language are preserved further down at the level of perception. They concluded, "these findings provide evidence that the metaphorical relationship between space and time observed in language also exists in our more basic representations of distance and duration" (p. 592). Similar results reporting asymmetrical effects have been found with children (Casasanto et al., 2010) but not with monkeys (Merritt et al., 2010) or pigeons (De Corte et al., 2017).

That humans use space to think about time is now widely acknowledged. The idea that time is fundamentally more abstract (and less accessible to the senses) than space may be regarded as a prerequisite for this relation. However, there are still reasons to question this general organizational principle constraining "links between the abstract domain of time and the relatively concrete domain of space" (Casasanto, 2010, p. 455). At the very least, there may be some misunderstanding about what it means for time to be more abstract than space.

First, neural data supporting the idea that our temporal concepts are grounded in embodied spatial representations are scarce, partly because it is not entirely clear what an embodied spatial representation is in the first place (Kranjec and Chatterjee, 2010). Furthermore, recent fMRI evidence suggests that temporal and spatial concepts do not necessarily have privileged relations in the brain either. In an experiment (Kranjec et al., 2012) designed to look for functional architecture shared among basic abstract semantic categories (space, time, and causality), brain areas associated with the spatial extent of simple events had little overlap with those associated with their temporal duration. By focusing on space, embodied theories have neglected to investigate temporal conceptual grounding in neural systems that instantiate time perception in the body.

Another issue concerns what is meant by "concrete" and "abstract" in the Conceptual Metaphor Theory literature. In defining the distinction between concrete and abstract thought, Lakoff (2014) writes:

> Our current theory begins with a basic observation: The division between concrete and abstract thought is based on what can be observed from the outside. Physical entities, properties, and activities are "concrete." What is not visible is called "abstract": emotions, purposes, ideas, and understandings of other nonvisible things (freedom, time, social organization, systems of thought, and so on).

Or, in a more recent formulation from Mental Metaphor Theory:

> That is, people often think in "mental metaphors". . . point-to-point mappings between non-linguistic representations in a "source domain" (e.g., SPACE) and a "target domain" (e.g., TIME) that is typically more abstract (i.e., hard to perceive) or abstruse (i.e., hard to understand; Lakoff and Johnson, 1980), which support inferences in the target domain (Casasanto, 2017, p. 47).

While there is little agreement among philosophers regarding what counts as an abstract or concrete concept (Rosen, 2018), generally speaking, concrete representations are those that refer to physical objects that can be experienced directly through the senses. Regardless, behavioral studies in this area of research frequently rely on visual tasks and, perhaps more controversially, tend to conflate "space" with what could be more accurately described as the "visuospatial." This makes it unclear whether previously observed behavioral asymmetries between time and space reflect (1) very general differences in how humans process the abstract domains of space vs. time [e.g., "Aspects of time are often said to be more 'abstract' than their spatial analogs because we can perceive the spatial, but we can only imagine the temporal" (Casasanto and Boroditsky, 2008, p. 580)] or (2) a less general, modality-specific contribution of visual representations in humans. That is, perhaps space-time asymmetries discussed in previous behavioral studies can be better understood in terms of visual biases and do not directly reflect how differences in the relative abstractness of space vs. time may serve as a general organizing principle in human cognition. In fact, perceptual asymmetries between space and time may be better understood in terms of their relevance to a particular modality than in terms of their imagined placement on a concrete-abstract continuum.

To distinguish between these two alternatives, the present study directly probes time–space relations in the auditory domain. Audition was selected because there are intuitive reasons to think that the time–space asymmetries observed in vision might actually be reversed in sound. Phenomenologically, time, more than space, seems to be an intimate part of our auditory experience [but see Shamma (2001) for a dissenting view]. For example, whereas spatial relations and visual objects tend to be persistent, sound, like time, is relatively transient (Galton, 2011). Temporal information is more meaningful and/or salient in common forms of experience grounded in sound perception (e.g., music and speech). In the context of music, "when" a sound occurs matters much more than "where" it occurs. There are neuropsychological reasons too. While the retina preserves analog spatial relations in early representations, the cochlea does not (Ratliff and Hartline, 1974; Moore, 1977). That is, the pattern of activation on the sensory surface of the eye is representative of the relative spatial relations among visual objects in an array, and these relations are further preserved topologically in the cortex. In the auditory system, by contrast, spatial relations between auditory objects must be computed centrally, in part via a temporal mechanism (interaural time difference); there is no direct representation of these spatial relations on the primary sensory surface of the cochlea. For these reasons, sound localization is less precise than object localization in vision (Kubovy, 1988). In speech, the ability to perceive differences in voice onset time is critical for discriminating between phonological categories (Blumstein et al., 1977).
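As an illustration of that temporal mechanism (not part of the study's method), the classic spherical-head approximation attributed to Woodworth estimates the interaural time difference for a distant source at azimuth θ as ITD ≈ (a/c)(θ + sin θ). The head radius a = 0.0875 m and speed of sound c = 343 m/s below are conventional assumed values, and `woodworth_itd` and `source_azimuth_deg` are hypothetical helper names. For a source at the average starting position used in Experiment 1 (2.75 m to one side on a plane 1 m ahead), the sketch gives an ITD of roughly 0.55 ms. Lateral position is therefore carried by sub-millisecond timing differences between the ears.

```python
import math

# Assumed constants (conventional textbook values, not taken from the study)
HEAD_RADIUS_M = 0.0875       # approximate adult head radius
SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air at about 20 degrees C


def woodworth_itd(azimuth_deg: float) -> float:
    """Interaural time difference (s) for a distant source, spherical-head model.

    Valid for azimuths in [-90, 90] degrees: 0 is straight ahead,
    positive values are to the right (the sign of the ITD follows the side).
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie in [-90, 90] degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def source_azimuth_deg(lateral_m: float, forward_m: float = 1.0) -> float:
    """Azimuth of a source lateral_m to the side on a plane forward_m ahead."""
    return math.degrees(math.atan2(lateral_m, forward_m))


itd_s = woodworth_itd(source_azimuth_deg(2.75))
print(f"ITD at 2.75 m lateral, 1 m ahead: {itd_s * 1e6:.0f} microseconds")
```

The formula is a far-field approximation and ignores interaural level differences, the other major localization cue, so it should be read only as an order-of-magnitude illustration.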

Temporal relations, as compared to spatial ones, appear to be more relevant to hearing, as indicated by the relatively concrete manner in which temporal information is represented, processed, experienced, and embodied in the auditory system. While one might argue that relations between sound and time are relatively more concrete (i.e., more directly accessible to the senses) than relations between sound and space, perhaps it is more accurate to say that time is more modality-relevant than space in audition. While the difference between concreteness and modality-relevance may in part be a historical-philosophical distinction, the present research addresses some issues raised by how concreteness is frequently discussed in the literature, using a task that closely follows Casasanto and Boroditsky (2008) but with auditory instead of visual stimuli. It asks: are the kinds of space-time asymmetries observed in previous studies using visual stimuli also observed in a purely auditory task?

### EXPERIMENT 1

### Methods

### Ethics Statement

This study was approved by the Institutional Review Board at the University of Pennsylvania. Written informed consent was obtained from all participants.

### Participants

Twenty members of the University of Pennsylvania community participated for payment. All participants were right-handed, native English speakers, and between 18 and 26 years of age.

### Procedure and Experimental Design

The participants were equipped with headphones and seated at a computer for a self-paced experiment. Participants initiated the beginning of each new trial and the start of each within-trial component. Each trial consisted of two sounds, a target sound followed by a playback sound. In the first part of each trial, the target sound was presented, and participants were instructed to attend to both spatial and temporal aspects of the stimulus. Target sounds consisted of bursts of white noise that changed in location relative to a participant's head position across time. White noise bursts were of nine durations (lasting between 1000 and 5000 ms in 500 ms increments) and nine distances (moving between 0.5 and 4.5 m in increments of 0.5 m). All durations and distances were crossed to create 81 distinct target sounds. The initial location of the target sound was an average of 2.75 m to the left or right of the listener, with a jitter of between 0.1 and 0.5 m. The plane of movement was 1 m in front of the listener. Sounds starting on the right moved leftward, and sounds starting on the left moved rightward. Starting locations were randomly assigned to stimuli, with equal numbers of rightward- and leftward-moving trials. Stimuli were created using MATLAB and played using the OpenAL library provided with Psychophysics Toolbox extensions (Brainard, 1997). The OpenAL library is designed to model sounds moving in virtual metric space for a listener wearing headphones using head-related transfer functions (HRTFs).
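The 9 × 9 crossing and left/right balancing described above can be sketched in Python. Several details are assumptions that the text does not state:

- The jitter is drawn uniformly from 0.1 to 0.5 m with a random sign.
- Left/right balance is enforced across the full 162-trial list, since 81 stimuli alone cannot be split evenly.
- Trial order is fully shuffled.

`Trial` and `build_trials` are hypothetical names. The original stimuli were generated in MATLAB.

```python
import itertools
import random
from dataclasses import dataclass

DURATIONS_MS = [1000 + 500 * i for i in range(9)]          # 1000 ... 5000 ms
DISTANCES_M = [round(0.5 + 0.5 * i, 1) for i in range(9)]  # 0.5 ... 4.5 m
MEAN_START_OFFSET_M = 2.75                                  # mean lateral start position


@dataclass(frozen=True)
class Trial:
    duration_ms: int
    distance_m: float
    task: str          # "duration" or "distance"
    start_x_m: float   # lateral position; negative = left, positive = right
    end_x_m: float


def build_trials(seed: int = 0) -> list[Trial]:
    rng = random.Random(seed)
    stimuli = list(itertools.product(DURATIONS_MS, DISTANCES_M))  # 81 stimuli
    specs = [(dur, dist, task) for task in ("duration", "distance")
             for dur, dist in stimuli]                            # 162 trials
    # Assumption: equal numbers of left and right starts across all trials
    sides = ["left", "right"] * (len(specs) // 2)
    rng.shuffle(sides)
    trials = []
    for (dur, dist, task), side in zip(specs, sides):
        # Assumption: jitter magnitude uniform in [0.1, 0.5] m with random sign
        offset = MEAN_START_OFFSET_M + rng.choice((-1, 1)) * rng.uniform(0.1, 0.5)
        start_x = -offset if side == "left" else offset
        step = dist if side == "left" else -dist  # left starts move rightward
        trials.append(Trial(dur, dist, task, start_x, start_x + step))
    rng.shuffle(trials)
    return trials


trials = build_trials(seed=1)
print(len(trials), "trials;", sum(t.start_x_m < 0 for t in trials), "start on the left")
```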

After attending to the target sound, participants were prompted to reproduce either the sound's duration or distance and then instructed to press the spacebar to begin the playback sound. In this second part of each trial, the playback sound provided the medium for the participant's response. The playback sound began in the final location of the preceding target sound and moved in the reverse direction. So, if a target sound moved rightward, the playback sound moved leftward, and vice versa. On distance trials, participants were instructed to respond when the playback sound reached the start location of the target sound, thereby reproducing the distance from head to start point. In this manner, the participant's head provided a fixed reference point for judging distance. On duration trials, participants were instructed to respond when the playback sound duration was equal to the target sound duration. The playback sound lasted a maximum of 8500 ms and traveled up to 3.5 m past the starting location of the target sound, ending earlier if the participant responded. The playback sounds were designed in this way so that participants could either overshoot or undershoot their estimates. Participants heard each target sound in both duration and distance conditions (within-subject design) for a total of 162 trials.

### Results

The results (**Figure 1**) demonstrate that actual spatial displacement affected estimates of duration (**Figure 1B**: y = 128.97x + 2532.8, r = 0.878, df = 7, p < 0.01) and that actual durations affected estimates of spatial displacement (**Figure 1A**: y = 0.0002x + 1.4208, r = 0.982, df = 7, p < 0.01). On distance trials, for stimuli of the same average displacement (2.5 m), sounds of shorter durations were judged to be shorter in length, and sounds of longer durations were judged to be longer in length. On duration trials, for stimuli of the same average duration (3000 ms), sounds shorter in length were judged to be of shorter duration, and sounds longer in length were judged to be of longer duration. Space and time were mutually contagious in that information in the task-irrelevant domain affected participants' estimates of both duration and spatial displacement. Compatible effects were found using multiple regression analyses. Distance was significantly correlated with duration judgments even when variance associated with each trial's actual duration was removed [pr(81) = 0.64; df = 80, p < 0.01]. Duration was significantly correlated with distance judgments even when variance associated with each trial's actual distance was removed [pr(81) = 0.81; df = 80, p < 0.01] (sample N = 81 [nine space and nine time intervals fully crossed]). There was no effect of direction (left-moving vs. right-moving trials).
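The partial correlations above can be sketched as follows. This assumes, although the text does not say so, that they were computed over the 81 stimulus-level mean judgments. The standard residual method works in two steps:

1. Regress both the judgment and the task-irrelevant variable on the task-relevant one.
2. Correlate the two sets of residuals.

Whether the authors used this route or a regression package is not stated. The data are simulated, and the coefficients used to build `judged_ms` are hypothetical.

```python
import numpy as np


def partial_corr(x, y, control):
    """Pearson correlation of x and y after linearly removing `control` from both."""
    x, y, control = (np.asarray(a, dtype=float) for a in (x, y, control))
    design = np.column_stack([np.ones_like(control), control])
    # Least-squares residuals of each variable after regressing it on the control
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


# Hypothetical data: one mean judgment per cell of the 9 x 9 duration-by-distance grid
dur, dist = np.meshgrid(np.arange(1000, 5001, 500), np.arange(0.5, 4.51, 0.5), indexing="ij")
dur, dist = dur.ravel().astype(float), dist.ravel()
rng = np.random.default_rng(1)
# Simulated duration judgments: driven by duration plus a small distance "contamination"
judged_ms = 0.68 * dur + 800 + 60 * dist + rng.normal(0, 50, dur.size)

print(f"partial r(distance, judged duration | duration) = {partial_corr(dist, judged_ms, dur):.2f}")
```

Because duration and distance are fully crossed, they are uncorrelated. Removing the task-relevant variable therefore mostly strips variance from the judgment rather than from the irrelevant predictor.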

Participants' overall estimates of duration and displacement were very accurate. The effects of actual displacement on estimated displacement (**Figure 1C**: y = 0.6374x + 0.4115, r = 0.99, df = 7, p < 0.001) and actual duration on estimated duration (**Figure 1D**: y = 0.6805x + 813.64, r = 0.99, df = 7, p < 0.001) were also very similar to each other and to analogous analyses of accuracy in Casasanto and Boroditsky (2008). This suggests that participants were approximately equally accurate when making duration and distance judgments within the present experiment and between comparable experiments using auditory and visual stimuli. It also suggests that spatial and temporal changes are no more or less "hard to perceive" (Casasanto, 2017) in the approach used here.

The effect of duration on displacement was significantly greater than the effect of displacement on duration (see **Figure 2**; Fisher r-to-z transformation, difference of correlations = 0.104, z = 1.7, one-tailed p < 0.05). However, some caution should be taken when interpreting this result. It is unclear to us whether differences in perceptual judgments between domains can be directly compared at such a fine grain when arbitrarily defined scales, intervals, and ranges (e.g., in seconds and meters) are used to define temporal and spatial aspects of the stimuli. This is a concern even though spatial and temporal judgments focused on identical stimuli. It is possible that other scaled relations could yield different patterns of results.
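The Fisher r-to-z comparison can be recomputed from the reported correlations. Assume each correlation is based on nine condition means, which is consistent with df = 7. The textbook test for two independent correlations then gives z ≈ 1.70 for Experiment 1, matching the reported value. The same function also applies to the Experiment 2 comparison of r = 0.959 and r = 0.593. The function name is hypothetical. This formula treats the two correlations as independent, which is only an approximation here because both come from the same participants and stimuli.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """One-tailed Fisher r-to-z test that r1 > r2 for two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return z, p_one_tailed


comparisons = {
    "Exp. 1: TIME->SPACE vs SPACE->TIME": (0.982, 0.878),
    "Exp. 2: PITCH->SPACE vs SPACE->PITCH": (0.959, 0.593),
}
for label, (r_a, r_b) in comparisons.items():
    z, p = compare_independent_correlations(r_a, 9, r_b, 9)
    print(f"{label}: z = {z:.2f}, one-tailed p = {p:.3f}")
```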

### Experiment 1 Discussion

While strong claims about deeply embodied asymmetrical relations between space and time in the auditory domain may be premature, Experiment 1 found a significant time–space asymmetry in the auditory domain. This asymmetry is predicted by the temporal quality of auditory processing. It runs opposite to the asymmetry found in the visual domain, which is the direction predicted by Conceptual Metaphor Theory and patterns of language use (Casasanto and Boroditsky, 2008). The results suggest that the spatial nature of vision, more than space per se, explains the results of previous studies. So while one may suggest that time is relatively "concrete" as compared to space in sound (using the terms provided by Conceptual Metaphor Theory), it may be more useful to think of time as more "relevant" in the auditory modality. Either way, temporal representations may be more directly embodied or salient in audition than spatial representations.

While the results of Experiment 1 are suggestive of a perceptual asymmetry running opposite to that observed in the visual domain, broader claims regarding any deep asymmetry between time and space in the auditory domain are premature. Although the results from Experiment 1 suggest that "in sound," time appears to influence judgments of spatial displacement more than vice versa, these results may not generalize to other aspects of auditory phenomena. To make stronger claims about the relevance of space and time in the auditory domain, Experiment 2 extends the current approach by testing the manner in which representations of space and time contaminate an aspect of auditory perception that is itself directly represented by the nervous system. Whereas space and time are abstract facets of any perceptual modality, pitch is a fundamental attribute of hearing, analogous to color or brightness in vision (Boring, 1933; Marks, 2004).

### EXPERIMENT 2

To further probe the relative effects of space and time in the auditory modality, Experiment 2 examines the mutual effects of space, time, and pitch, a uniquely auditory attribute. The perception of pitch makes possible the processing of melody in music and prosody in speech. Defined as the perceived frequency or "repetition rate of an acoustic waveform" (Oxenham, 2012), pitch is, together with loudness and timbre, one of three basic auditory sensations. Current theories suggest that properties of the physical stimulus and the physiological mechanisms for transduction and neural representation, in addition to prior experience, all play a significant role in pitch perception. This most likely involves both temporal and place coding throughout the auditory system. When sound enters the cochlea, the distinct frequencies that make up an acoustic waveform activate tuned neural sites arranged along its membrane in a spatially analog manner. Such tonotopic, "rate-place" (or time–space) mapping is preserved in the auditory processing system as far as the primary auditory cortex [see Oxenham (2012) for a review]. As such, pitch perception involves the representation of both spatial and temporal information at multiple levels of processing. The centrality and salience of pitch perception in auditory experience, and its fundamental spatiotemporality, make it an ideal domain for further testing hypotheses supported by the results of Experiment 1.

[Figure 1, partial caption: … (D) (TIME→TIME) on the right depict within-domain effects. Error bars refer to standard error of the mean.]
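The spatially analog (tonotopic) place code can be made concrete with Greenwood's place-frequency function, F = A(10^(αx) − k). Here x is the proportional distance along the cochlea from the apex. The human constants below (A = 165.4, α = 2.1, k = 0.88) are the commonly cited values. The function is not used in the study and is offered only as a sketch; the helper names are hypothetical. Under these assumptions, the 150 to 1350 Hz values used in Experiment 2 correspond to roughly 12% to 46% of cochlear length from the apex.

```python
import math

# Commonly cited human parameters for Greenwood's place-frequency function
A_HZ = 165.4   # scaling constant (Hz)
ALPHA = 2.1    # slope when position is expressed as a proportion of cochlear length
K = 0.88       # integration constant


def greenwood_frequency(position: float) -> float:
    """Characteristic frequency (Hz) at a proportional distance from the apex (0 = apex, 1 = base)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be in [0, 1]")
    return A_HZ * (10 ** (ALPHA * position) - K)


def greenwood_position(frequency_hz: float) -> float:
    """Inverse map: proportional distance from the apex for a given frequency (Hz)."""
    return math.log10(frequency_hz / A_HZ + K) / ALPHA


for f in (150, 1350):
    print(f"{f} Hz -> {greenwood_position(f):.2f} of cochlear length from apex")
```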

Another reason pitch is an interesting domain to interrogate in the present study is that across many languages we talk about pitch in terms of space (e.g., tones can be described as "high" or "low"). Based on Conceptual Metaphor Theory, pitch as the target domain in such a mapping is assumed to be more abstract than space, the source domain. According to such a formulation, we talk about pitch in terms of space because spatial relations are easier to conceptualize. However, with respect to the approach taken here, pitch, as a fundamental attribute of auditory perception with a specific sensory mechanism devoted to its representation, can reasonably be conceptualized as more modality-relevant than both space and time in the auditory modality. In this manner, the inclusion of pitch allows Conceptual Metaphor Theory and the modality-relevance account introduced in the current study to make competing predictions. If we talk about pitch in terms of space because space is relatively "less abstract," then changes in spatial distance should contaminate judgments of pitch more than vice versa. However, if modality-relevance determines the strength of contamination effects, then the opposite pattern of results should be observed. In general, if a representational domain (space, time, or pitch) is more relevant to and/or more directly perceivable in a particular modality (audition), then it should be more effective in contaminating less relevant domains and less vulnerable to contamination by others.

Based on the results of Experiment 1, we reasoned that spatial distance, as representative of a less modality-relevant domain, should be less effective than duration in contaminating the perception of pitch in a procedure using purely auditory stimuli. We can further predict a range of transitive effects based on the relative degree of modality-relevance for space, time, and pitch. Suppose modality-relevance is ordered space < time < pitch. The first relation rests on the argument presented and the results of Experiment 1. The second rests on pitch being a fundamental attribute of audition with a unique physiological mechanism for sensory transduction. Under this ordering, the expected results should follow the general pattern displayed in **Figure 2**.
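The transitive predictions can be made explicit with a toy scoring rule. This is a sketch, not the authors' model. Assign the assumed ordinal relevance ranks SPACE = 1, TIME = 2, PITCH = 3, and score each source→target contamination as rank(source) − rank(target). Only the ordering of the scores is meaningful. The rule reproduces the qualitative predictions in the text: PITCH→SPACE > PITCH→TIME, SPACE→PITCH < TIME→PITCH, and TIME→SPACE > SPACE→TIME. `RELEVANCE` and `predicted_score` are hypothetical names.

```python
from itertools import permutations

# Assumed ordinal modality-relevance ranks in audition (higher = more relevant)
RELEVANCE = {"SPACE": 1, "TIME": 2, "PITCH": 3}


def predicted_score(source: str, target: str) -> int:
    """Hypothetical contamination score: higher means `source` should bias `target` more.

    Combines agent strength (relevance of the source) with patient
    resistance (relevance of the target); only the ordering matters.
    """
    return RELEVANCE[source] - RELEVANCE[target]


predictions = sorted(
    ((predicted_score(s, t), f"{s}->{t}") for s, t in permutations(RELEVANCE, 2)),
    reverse=True,
)
for score, label in predictions:
    print(f"{label:<14} {score:+d}")
```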

### Methods

#### Participants

Forty-two members of the University of Pennsylvania community participated for payment. All participants were right-handed, native English speakers, and between 18 and 26 years of age. Twenty participants performed Experiment 2A. Twenty-two distinct participants performed Experiment 2B. Data from two of these participants were excluded from the final analyses because their reaction times across conditions were greater than two standard deviations from the mean.
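The exclusion rule can be sketched as follows. The text leaves three details open:

- whether the criterion was applied to each participant's mean RT,
- whether the cutoff was one- or two-sided,
- whether the SD included the excluded participants.

The sketch assumes mean RTs, a two-sided cutoff, and a sample SD computed over all participants. `flag_rt_outliers` is a hypothetical name, and the RT values are invented.

```python
import statistics


def flag_rt_outliers(mean_rts, n_sd=2.0):
    """IDs whose mean RT lies more than n_sd sample SDs from the group mean (either direction)."""
    values = list(mean_rts.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [pid for pid, rt in mean_rts.items() if abs(rt - mu) > n_sd * sd]


# Hypothetical mean RTs (ms) for 22 participants; two are unusually slow
rts = {f"P{i:02d}": 1900 + 50 * (i % 5) for i in range(1, 21)}
rts.update({"P21": 6000, "P22": 6000})
print(flag_rt_outliers(rts))
```

With a small group, extreme values inflate the SD they are judged against. Some labs therefore use a leave-one-out or median-based criterion instead; which rule was used here is not stated.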

#### General Procedure and Design

The general procedure and design of Experiment 2 followed those of Experiment 1. Participants were equipped with headphones and seated at a computer for a self-paced experiment. Participants initiated the beginning of each new trial and the start of each within-trial component. Each trial consisted of two sounds, a target sound followed by a playback sound. In the first part of each trial, the target sound was presented, and participants were instructed to attend to either the distance and pitch of the stimulus (Experiment 2A) or the duration and pitch of the stimulus (Experiment 2B). After attending to the target sound, participants were informed of the trial type and instructed to press the spacebar to begin the playback sound. The playback sound provided the medium for the participant to reproduce the spatial displacement, duration, or pitch, depending on the experiment and trial type. As in Experiment 1, all stimuli were created using MATLAB and played using the OpenAL library provided with Psychophysics Toolbox extensions (Brainard, 1997).

### Experiment 2A: Space and Pitch

In Experiment 2A, participants (N = 20) were instructed to attend to both the distance and pitch of the stimulus. Target sounds were of nine distances (moving between 0.5 and 4.5 m in increments of 0.5 m, as in Experiment 1) and nine frequencies ranging between 150 and 1350 Hz in increments of 150 Hz, all crossed to create 81 discrete stimuli. The initial location of the target sound was an average of 2.75 m to the left or right of the listener, with a jitter of between 0.1 and 0.5 m. Sounds starting on the right moved leftward, and sounds starting on the left moved rightward. Starting locations were randomly assigned to stimuli, with equal numbers of rightward- and leftward-moving trials. The plane of movement was 1 m in front of the listener. Stimuli were created using MATLAB and played using the OpenAL library provided with Psychophysics Toolbox extensions (Brainard, 1997).

After attending to the target sound, participants in Experiment 2A were informed of the trial type (distance or pitch) and instructed to press the spacebar to begin the playback sound. The playback sound provided the medium for the participant's response. The playback sound began in the final spatial location and frequency endpoint of the preceding target sound and moved in the reverse direction (both in terms of space and pitch). Directionality in space (left to right or right to left) and pitch (high to low or low to high) was randomized across all trials. On distance trials, participants were instructed to respond when the playback sound reached the start location of the target sound. In this manner, the participant's head provided a fixed reference point for judging distance. On pitch trials, participants were instructed to respond when the playback sound spanned the target sound's frequency range.

### Experiment 2B: Time and Pitch

The procedure for Experiment 2B was identical to that in 2A but with duration replacing distance as a domain of interest. In Experiment 2B, when the target sound was presented, participants (N = 22) were instructed to attend to both the duration and pitch of the stimulus. The target sound in Experiment 2B was presented to both ears and moved continuously through a variable range of frequencies over a variable period of time. Target sounds were of nine durations (lasting between 1000 and 5000 ms in 500 ms increments, as in Experiment 1) and nine frequencies ranging between 150 and 1350 Hz in increments of 150 Hz (as in Experiment 2A). All durations and frequencies were crossed to create 81 distinct target sounds. Each discrete stimulus was used twice, once in the duration condition and once in the pitch condition. The initial frequency of the target sound was at the higher (2250 Hz) or lower (990 Hz) end of the audible range of speech, with a randomized jitter between 1 and 50 Hz. Frequency endpoints were determined by varying the number of frequency increments the sound moved through across trials. Frequency "direction" (high to low, or low to high) was random across trials.
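A target sound of this kind can be sketched as a frequency sweep. The paper does not specify the waveform or sweep law. This sketch therefore assumes a unit-amplitude sine whose frequency changes linearly over time, sampled at 44.1 kHz. The phase is the running integral of the instantaneous frequency f(t) = f_start + rate·t. `linear_sweep` is a hypothetical helper. The study's stimuli were generated in MATLAB and played through OpenAL rather than NumPy.

```python
import numpy as np


def linear_sweep(f_start_hz, span_hz, duration_s, upward=True, fs=44_100):
    """Unit-amplitude sine whose frequency moves linearly by span_hz over duration_s."""
    n = int(round(fs * duration_s))
    t = np.arange(n) / fs
    rate = (span_hz if upward else -span_hz) / duration_s  # Hz per second
    # Phase is the integral of the instantaneous frequency f(t) = f_start + rate * t
    phase = 2.0 * np.pi * (f_start_hz * t + 0.5 * rate * t ** 2)
    return np.sin(phase)


# A 1 s upward sweep from 990 Hz spanning 150 Hz (ending near 1140 Hz)
sweep = linear_sweep(990, 150, 1.0)
print(sweep.shape)
```

A logarithmic sweep, where frequency changes by a constant ratio per unit time, would match pitch perception more closely. The linear law is used here only because it is the simplest to state.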

After attending to the target sound, participants in Experiment 2B were informed of the trial type (duration or pitch) and instructed to press the spacebar to begin the playback sound. The playback sound provided the medium for the participant's response. It presented the same frequency range in the opposite direction, starting at the frequency endpoint of the target sound and moving toward the start point, and lasted a maximum of 8.5 s, ending earlier if the participant responded. On duration trials, participants were instructed to respond when the playback sound duration was equal to the target sound duration. On pitch trials, participants were instructed to respond when the playback sound span equaled that of the target sound's frequency range. For all trials, there were at least five additional frequency increments and seven additional duration increments within the playback sound, so that participants could either overshoot or undershoot their estimates. Data for both duration and frequency judgments were collected regardless of condition.

### Results: Experiments 2A and 2B

Between Experiments 2A and 2B there are four main correlations to consider. They describe the effects of frequency on (A) distance estimates (PITCH→SPACE) and (B) duration estimates (PITCH→TIME) and the effects of (C) distance and (D) duration on frequency estimates (SPACE→PITCH and TIME→PITCH, respectively). These results are displayed in **Figure 3**. A comparison of r values between conditions/experiments is depicted in **Figure 4**.

The effect of distance on frequency estimation (**Figure 3A**) was not significant (y = 15.955x + 598.21, r = 0.593, df = 7, p = 0.09), while actual duration affected estimates of frequency (**Figure 3B**) (y = 30.7x + 488.22, r = 0.793, df = 7, p = 0.01). Actual frequency affected estimates of duration (**Figure 3C**) (y = 0.4098x + 2597.1, r = 0.901, df = 7, p = 0.001) and spatial displacement (**Figure 3D**) (y = 0.0005x + 1.4745, r = 0.959, df = 7, p < 0.001). The effect of actual frequency on spatial displacement (r = 0.959) was significantly greater than the effect of space on frequency estimation (r = 0.593) (Figure 3D vs. 3A; difference of correlations = 0.366, Fisher r-to-z transformation, z = 2.17, one-tailed p < 0.05). Correlation coefficients for PITCH→TIME (r = 0.90) and TIME→PITCH (r = 0.79) effects were not significantly different from one another.

Complementary effects were found using multiple regression analyses. Distance was significantly correlated with frequency judgments even when variance associated with each trial's actual frequency was removed [pr(81) = 0.33; df = 80, p = 0.003], and duration was significantly correlated with frequency judgments even when variance associated with each trial's actual frequency was removed [pr(81) = 0.45; df = 80, p < 0.001]. Frequency was significantly correlated with duration judgments even when variance associated with each trial's actual duration was removed [pr(81) = 0.54; p < 0.001], and with distance judgments even when variance associated with each trial's actual distance was removed [pr(81) = 0.78; p < 0.001]. There was no effect of direction (left-moving vs. right-moving trials) in Experiment 2A.

Participants' overall estimates of duration, spatial displacement, and pitch were accurate. The effects of actual duration on estimated duration (y = 187.04x + 2122, r = 0.94, df = 7, p < 0.001), actual frequency on estimated pitch (Exp. 2A: y = 0.2555x + 431.53, r = 0.95, df = 7, p < 0.001), actual spatial displacement on estimated displacement (y = 0.4874x + 0.6134, r = 0.99, df = 7, p < 0.001), and actual frequency on estimated pitch (Exp. 2B: y = 0.4425x + 306.19, r = 0.99, df = 7, p < 0.001) were all highly reliable but not significantly different from one another. Again, these results suggest that spatial, temporal, and pitch changes are no more or less "hard to perceive" (Casasanto, 2017) in the current procedure.
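The regression lines reported in this section each summarize nine points, one per interval, hence df = n − 2 = 7. Each consists of an ordinary least-squares line plus a t test of r against zero. The sketch below shows how such statistics are conventionally computed and is not the authors' script. A slope near zero corresponds to the horizontal line expected when a distractor has no influence. Converting r to t shows why r = 0.593 with 7 degrees of freedom falls short of significance.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_to_t(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0 with n points."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


# Nine interval means give df = 7; r = 0.593 is the distance -> frequency effect.
t, df = r_to_t(0.593, 9)
print(f"t({df}) = {t:.2f}")  # prints: t(7) = 1.95, below the two-tailed 5% critical value of about 2.36
```

For comparison, a perfectly flat set of means (e.g., 750 Hz at every interval) yields a slope of exactly zero with `fit_line`.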

### Experiment 2 Discussion and Results Summary for Experiments 1 and 2

We predicted that if space is less relevant than time in the auditory modality, then pitch should affect spatial judgments more than temporal judgments (PITCH→SPACE > PITCH→TIME), but space should be less effective than time in influencing pitch judgments (SPACE→PITCH < TIME→PITCH). The significant asymmetry in the effects of pitch-on-space vs. space-on-pitch, together with an inspection of the r values (**Figure 4B**), is consistent with predictions based on the degree of modality-relevance of space, time, and pitch "in sound." The pattern of results suggests that in the auditory modality, space is particularly sensitive to irrelevant information while being less effective in modulating other kinds of information.

Across Experiments 1 and 2, considering both the strength and direction of the respective correlations, a domain's relative level of modality-relevance predicted both how effectively it acted as an agent, or modulator, of other domains (r = 0.96, **Figure 4C**) and how sensitive it was, as a patient, to modulation by other domains (r = –0.98, **Figure 4D**). These predictions run counter to those made by Conceptual Metaphor Theory, general patterns in language use, and a previous literature that often portrays time as fundamentally more abstract than space.

### GENERAL DISCUSSION

An earlier study (Casasanto and Boroditsky, 2008) using visual stimuli found strong evidence for an asymmetrical relationship between space and time, such that the remembered size of a stimulus in space modulated recall for its duration, but not vice versa. In contrast, Experiment 1, which had an analogous design but used auditory stimuli, found that space and time are mutually contagious. Furthermore, as predicted by the privileged relation between auditory and temporal processing, the perceived duration of a stimulus had a larger effect on perceived spatial displacement than the reverse. To further investigate the relevance of space and time in the auditory modality, Experiment 2 examined the mutual effects of space, time, and pitch. We reasoned that if space is less modality-relevant than time in sound, space should be more easily contaminated by pitch while being less effective in contaminating it. While time and pitch were shown to be mutually contagious, pitch affected estimates of space but not vice versa. Across Experiments 1 and 2, the results suggest that the visual asymmetry between space and time does not generalize to other modalities such as audition, and that time is not fundamentally more abstract than space.

**Figure 3.** Estimates across all participants for a particular trial type (distance, duration, or frequency estimation; y-axis), computed as the average of all nine interval values for that domain presented at each interval of the irrelevant distractor domain (actual frequency, distance, or duration; x-axis). If the irrelevant domain on x exerted no influence on estimation for y, one would expect a horizontal line. Deviation from that horizontal represents cross-domain interference. (A) Effect of distance on frequency estimates (expected = 750 Hz at each interval of actual distance). (B) Effect of duration on frequency estimates (expected = 750 Hz at each interval of actual duration). (C) Effect of frequency on duration estimates (expected = 3000 ms at each interval of actual frequency). (D) Effect of frequency on distance estimates (expected = 2.5 m at each interval of actual frequency). Error bars refer to standard error of the mean.

While the present results suggest a perceptual asymmetry running opposite to that observed in the visual domain, strong claims regarding a deep embodied asymmetry between time and space in the auditory domain require further support. Nor should it be assumed that the presence of modality-specific asymmetries implies asymmetries of equal strength (to those found in vision) in the opposite direction. Notably, the effect of spatial displacement on duration estimates was still strong in the auditory domain (r = 0.88). By contrast, in Casasanto and Boroditsky's (2008) study, actual duration had no discernible effect on spatial displacement judgments. Furthermore, although space appears to be less relevant than time "in sound," these results may not generalize to other scales, intervals, and ranges of time–space relations. And while the methods in the current auditory study attempted to mirror those of the original visual study, there are some differences. For example, Casasanto and Boroditsky's (2008) study used a relatively "active" task requiring participants to reproduce the spatial or temporal extent of the visual target with a mouse drag or click in "real" space. The current study, in contrast, used a relatively "passive" task in which participants responded to a playback sound, stopping it when it reached a certain duration or location in "virtual" space. The auditory reproduction task in the current study required that participants remain passive while the sound object moved through space and time to reach a certain location, duration, or frequency. However, the playback sounds were always the same: duration could not be used to judge distance, and distance could not be used to judge duration. Casasanto and Boroditsky's study required dragging a mouse between mouse clicks on spatial trials or clicking a stationary mouse on time trials. This task additionally required participants to translate between a visual stimulus and a motoric response in analog space. Also, because it generally takes longer to travel a longer distance, duration and spatial displacement may have been correlated in participants' reproduction responses, but only on space trials, despite space and time having been orthogonalized in the target stimuli. Future studies could aim to use identical, modality- and domain-unbiased reproduction tasks, using both visual and auditory stimuli across a range of scales. It should be noted, however, that equating scales between distinct perceptual modalities at the level of psychophysics and phenomenology is never straightforward. That is, identical distances may not scale and behave identically across vision and sound.
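The response-side confound raised above can be made concrete with a small simulation. Every number in it (distance levels, duration levels, drag speeds, trial count) is hypothetical and chosen only for illustration, and accurate spatial reproduction is assumed. Target distance and target duration are drawn independently. Yet because a longer drag takes longer, the time spent responding on space trials tracks distance closely.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_space_trials(n_trials=900, seed=1):
    """Hypothetical space trials: target distance and duration are independent,
    but the response is a drag whose duration depends on the distance covered."""
    rng = random.Random(seed)
    target_distance, target_duration, drag_time = [], [], []
    for _ in range(n_trials):
        d = rng.randint(1, 9)          # hypothetical distance level (arbitrary units)
        t = rng.randint(1, 9)          # hypothetical duration level, orthogonal to d
        speed = rng.uniform(0.8, 1.2)  # hypothetical drag speed
        target_distance.append(d)
        target_duration.append(t)
        drag_time.append(d / speed)    # assumes the distance is reproduced accurately
    return target_distance, target_duration, drag_time


dist, dur, drag = simulate_space_trials()
print(round(pearson_r(dist, dur), 2))   # near 0: the stimuli are orthogonal
print(round(pearson_r(dist, drag), 2))  # high: the responses are not
```

On time trials with a stationary click, no such coupling would arise, which is why the confound is specific to space trials.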

Another limitation concerns the extent to which one can isolate and describe the mechanism producing the pattern of results described here. The current experiments (and the previous studies on which they are based) require participants to attend to the perceived location, duration, and/or frequency of an auditory stimulus before reproducing one of these dimensions by responding to a later target sound. This means that participants were required to maintain information in working memory prior to making a response. Therefore, based on the current data, it is not possible to determine whether cross-domain contamination arises during attention, perception, or memory. Moreover, an extensive psychophysics literature has shown that visual and auditory stimuli, along with temporal and spatial information, differ in how they are attended to and processed, both online and in working memory (Cohen et al., 2009; Protzner et al., 2009; Delogu et al., 2012; Thelen et al., 2015). The approach used here does not allow us to determine where or when contamination occurs, only that it does occur in the auditory domain in ways that are not predicted by previous theory. Future studies, in describing which aspects of a stimulus are more or less "modality-relevant," would do well to better ground such assertions in the experimental psychophysics literature. In fact, the current study should be considered an invitation to do so.

Still, these results suggest that time is not necessarily or fundamentally more abstract than space, and that previously observed verbal and mental asymmetries of representing time in terms of space may be at least partially dependent on the human disposition to think visually. The general idea that visuospatial representations are central to how people talk and think is well established (Johnson-Laird, 1986; Talmy, 2000; Chatterjee, 2001; Tversky, 2005). In the context of previous research demonstrating a strong asymmetry for time–space relations, the results of the present study suggest something important about the nature of the "embodied spatial representations" that appear to structure patterns in language and thought: such representations are likely visuospatial in nature.

It should be noted that the present results in no way refute those reported in Casasanto and Boroditsky's (2008) study. Rather, our results suggest that the common understanding throughout the literature that time is generally more abstract than space may need to be revised or at least more consistently articulated. This should not come as a total surprise because "space" is itself a very abstract concept and, like "time," cannot be directly seen, touched, or heard. The present data, and the notion of modality-relevance, suggest that what makes certain spatial or temporal relations more or less abstract (in the terms of Conceptual Metaphor Theory) are the sensory modalities in which those relations are preferentially processed or experienced. As such, the present results support a refined but intuitive view of embodied cognition that takes into account contributions of a particular sensory modality in processing the qualities of a stimulus. While space and time may both be very abstract according to such an understanding, relations between objects immersed in either substrate (whether seen or heard) may be more or less so depending on a range of species-specific and contextual variables.

For humans, "embodied spatial representations" important for structuring other forms of thought and language may be most accessible when they are visuospatial in nature. Because humans have a general visual bias in perception, communication, and neural organization, we may tend to experience and understand space as relatively less abstract than time. But this does not mean that space is necessarily less abstract than time, or that other organisms experience space and time as we do. While it is famously difficult to imagine the quality of conscious experience in another organism (Nagel, 1974), perhaps animals (like bats) that rely more on audition than vision to locate objects in a dynamic environment could be biased to understand time as less abstract than space (if they had opinions on such matters). This is merely to say that what is experienced as "abstract" may be a function of an organism's particular form of embodiment, rather than a set of formal ontological (metaphysical) relations.

A more tractable issue worth reconsidering is why time is generally assumed to be more abstract than space in the first place. The argument may be based on the idea that time, as compared to space, cannot be "directly perceived" (Ornstein, 1969), or that we cannot "see or touch" time (Casasanto et al., 2010). Yet there are known, widely distributed neural mechanisms specific to temporal processing, and little basis for the assumption that spatial relations are themselves perceived directly (Kranjec and Chatterjee, 2010). The experiences of space and of time both involve inherently relational processes, making the representation of both domains relatively abstract.

For example, processing locations between objects in an array using vision is arguably no more or less direct than processing rhythm in a sequence of beats using audition, with each requiring the representation of a number of abstract relations between objects or sounds. That is, there is no reason to think that we can directly "see" space any more than we can "hear" time. Nowhere is the dissociation between vision and spatial processing more apparent than in simultanagnosia, a neuropsychological condition in which patients are characteristically unable to perceive more than a single object despite having intact visual processing (Luria, 1959; Coslett and Chatterjee, 2003). Nonetheless, visuo-spatial and audio-temporal relations appear to be privileged. Privileged relations between particular sensory modalities and experiential domains may play some part in determining what we come to label abstract or concrete. Further research is needed to determine why some senses are subjectively felt to be more or less abstract than others, and the specific roles that spatial and temporal organization play in structuring our sensory experience.

### AUTHOR CONTRIBUTIONS

AK, ML, and AC conceived and designed the experiments. ML performed the experiments. AK, ML, and AW analyzed the data. AK and AC wrote the manuscript.

### FUNDING

This research was supported by the National Institutes of Health (RO1 DC004817) and the National Science Foundation (subcontract under SBE0541957). Portions of this manuscript are based on work previously presented at the Cognitive Science Society. "The policy of the Society is that work published in a Proceedings paper may be considered for journal submission provided that the journal submission is substantially more elaborated than the Proceedings paper in terms of literature review, data analysis, and/or discussion" (http://www.cognitivesciencesociety.org/conference/).

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Kranjec, Lehet, Woods and Chatterjee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Interrelations Between Temporal and Spatial Cognition: The Role of Modality-Specific Processing

Jonna Loeffler<sup>1</sup>, Rouwen Cañal-Bruland<sup>2</sup>, Anna Schroeger<sup>2</sup>, J. Walter Tolentino-Castro<sup>1</sup> and Markus Raab<sup>1,3</sup>\*

<sup>1</sup> Department of Performance Psychology, Institute of Psychology, German Sport University Cologne, Cologne, Germany, <sup>2</sup> Institute of Sport Science, Friedrich-Schiller-University Jena, Jena, Germany, <sup>3</sup> School of Applied Sciences, London South Bank University, London, United Kingdom

Temporal and spatial representations are not independent of each other. Two conflicting theories provide alternative hypotheses concerning the specific interrelations between temporal and spatial representations. The asymmetry hypothesis (based on the conceptual metaphor theory, Lakoff and Johnson, 1980) predicts that temporal and spatial representations are asymmetrically interrelated such that spatial representations have a stronger impact on temporal representations than vice versa. In contrast, the symmetry hypothesis (based on a theory of magnitude, Walsh, 2003) predicts that temporal and spatial representations are symmetrically interrelated. Both theoretical approaches have received empirical support. From an embodied cognition perspective, we argue that taking sensorimotor processes into account may be a promising steppingstone to explain the contradictory findings. Notably, different modalities are differently sensitive to the processing of time and space. For instance, auditory information processing is more sensitive to temporal than spatial information, whereas visual information processing is more sensitive to spatial than temporal information. Consequently, we hypothesized that different sensorimotor tasks addressing different modalities may account for the contradictory findings. To test this, we critically reviewed relevant literature to examine which modalities were addressed in time-space mapping studies. Results indicate that the majority of the studies supporting the asymmetry hypothesis applied visual tasks for both temporal and spatial representations. Studies supporting the symmetry hypothesis applied mainly auditory tasks for the temporal domain, but visual tasks for the spatial domain. We conclude that the use of different tasks addressing different modalities may be the primary reason for (a)symmetric effects of space on time, instead of a genuine (a)symmetric mapping.

Keywords: time-space mapping, asymmetry hypothesis, symmetry hypothesis, conceptual metaphor theory, a theory of magnitude, spatial representation, temporal representation

#### Edited by:

Danielle DeNigris, Fairleigh Dickinson University, United States

#### Reviewed by:

Zhenguang Cai, University College London, United Kingdom

Metehan Çiçek, Ankara University, Turkey

> \*Correspondence: Markus Raab raab@dshs-koeln.de

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 17 July 2018; Accepted: 04 December 2018; Published: 21 December 2018

#### Citation:

Loeffler J, Cañal-Bruland R, Schroeger A, Tolentino-Castro JW and Raab M (2018) Interrelations Between Temporal and Spatial Cognition: The Role of Modality-Specific Processing. Front. Psychol. 9:2609. doi: 10.3389/fpsyg.2018.02609

## INTRODUCTION

For complex human behavior, including sensorimotor actions such as catching a ball, precise representations of time and space are of utmost importance (e.g., Rosenbaum et al., 2012). For instance, in movement-related tasks, the anticipation of duration (= time) and distance (= space) influences manifold decisions about how to act, such as whether to cross the street or stop walking (Zito et al., 2015), whether to accelerate or slow down when trying to catch a ball (Postma et al., 2017), or whether to wait for the elevator or take the stairs (Wittmann, 2014). In order to predict environmental demands and to plan actions, an actor has to constantly and adequately represent temporal and spatial information (Postma et al., 2017). For example, the looming sound of an approaching car helps a pedestrian to estimate its speed and moment of passing, and thus to adjust movements and avoid a collision. This is why e-cars, which typically generate little noise, are considered more dangerous for pedestrians than conventional cars. As a consequence, a law in the US requires newly manufactured e-cars to produce an audible sound when driving at low speeds. Though it is well known that interrelations between temporal and spatial representations are essential for human functioning, the mechanisms underlying these interrelations are far from well understood.

When reviewing the literature that addresses the (a)symmetry of time and space, it is evident that there is no consensus about the intimate links between temporal and spatial representations (Winter et al., 2015). Two influential and currently debated hypotheses are the asymmetry hypothesis, which is based on conceptual metaphor theory (CMT; Lakoff and Johnson, 1980; Boroditsky, 2000), and the symmetry hypothesis, which is based on a theory of magnitude (ATOM; e.g., Walsh, 2003). The two hypotheses assume different relationships between temporal and spatial representations and, as a consequence, make divergent claims about how time-space mappings modulate movements. Notwithstanding these divergent predictions, both hypotheses have received robust empirical support (e.g., Boroditsky, 2000; Merritt et al., 2010; Agrillo and Piffer, 2012; Bottini and Casasanto, 2013; Hyde et al., 2013; Skagerlund and Träff, 2014; Xue et al., 2014; Coull et al., 2015; Skagerlund et al., 2016; see **Tables 1**, **2** for an overview). This raises the question of how two contradictory hypotheses can both have received robust empirical support. In search of the mechanisms that cause the contradictory findings, it is important to realize that different modalities are differently sensitive to the processing of time and space. Consequently, we hypothesized that different sensorimotor tasks addressing different modalities may account for the contradictory findings. Based on this assumption, in this mini-review we critically review the relevant literature to examine which modalities were addressed in time-space mapping studies.

Focusing on the role of modalities during the processing of temporal and spatial information, it should be considered that auditory information processing shows enhanced sensitivity to temporal information but lower sensitivity to spatial information (e.g., O'Connor and Hermelin, 1972; Recanzone, 2009). By contrast, visual information processing shows higher sensitivity to spatial information but lower sensitivity to temporal information (e.g., O'Connor and Hermelin, 1972; Recanzone, 2009). Moreover, in audio-visual conditions, people tend to rely on the modality with the highest informational value for solving the task (e.g., Zhou et al., 2007). To illustrate, people are better at deducing spatial information about an approaching car when the information is presented visually than when it is presented auditorily. Therefore, when deducing temporal and spatial information from an approaching car, vision is our dominant system and is thereby relatively impervious to distortion (Keshavarz et al., 2017). By contrast, in foggy environments, when the car is almost invisible, auditory information becomes more important. This dependence of a modality's relative importance on its informational value also becomes apparent when individual capacities are considered, for example, in blind people playing tennis with rattling balls. Further empirical evidence for the strong dependence on modality-related task characteristics comes from illusion effects in which one modality dominates the perception of a multisensory object or event (Radeau and Bertelson, 1974). These illusion effects seem to be largely driven by the sensory modality that has the highest informational value for solving the task (for a review, see Recanzone, 2009).

In sum, the different sensitivities of different modalities to temporal and spatial information might moderate the empirical results. Because auditory information processing is more sensitive to temporal than spatial information and visual information processing is more sensitive to spatial than temporal information, it is reasonable to argue that different sensorimotor tasks may address auditory and visual information processing to different degrees. If true, then it can be hypothesized that different tasks addressing mainly one modality might cause the contradictory results with respect to the (a)symmetry of temporal and spatial representations. To test this, here we review the relevant literature to examine which modalities were addressed in studies that examined interrelations between temporal and spatial representations, supporting either the asymmetry or the symmetry hypothesis.

### THEORETICAL BACKGROUND: CMT VS. ATOM

According to the asymmetry hypothesis, spatial representations grounded in movement have a stronger impact on temporal representations than vice versa. The asymmetry hypothesis is based on conceptual metaphor theory (CMT), which assumes that the neural system characterizing concrete sensorimotor experience has more inferential connections, and therefore a greater inferential capacity, than the neural system characterizing abstract thoughts (Lakoff and Johnson, 1980; Boroditsky, 2000). It follows that the abstract representation of time tends to be asymmetrically dependent on the more concrete representation of space. This asymmetric relationship between time and space, which is at the core of the asymmetry hypothesis, was originally supported by the analysis of metaphorical language (Clark, 1973; Lakoff and Johnson, 2003): When we talk about time, we mainly use spatial terms that often include movement (e.g., "The weekend is getting closer," "The birthday is behind me"). Only rarely do we use temporal terms to talk about space ("I am five minutes from the central station," see Cai and Connell, 2015). A number of studies have provided evidence that these linguistic expressions reflect a deeper, asymmetric conceptual link between time and space (Boroditsky, 2000; Merritt et al., 2010; Bottini and Casasanto, 2013; Xue et al., 2014; Coull et al., 2015), with concurrent spatial information affecting time judgments (e.g., duration) to a greater extent than concurrent temporal information affecting spatial judgments (e.g., length). Taken together, a plethora of studies seems to support the asymmetry hypothesis and its assumption that spatial representations have a stronger impact on temporal representations than vice versa.

In contrast, according to the symmetry hypothesis, which is based on a theory of magnitude (ATOM), time and space are processed by a shared analog magnitude system (Walsh, 2003). In keeping with ATOM, temporal and spatial representations are processed in a common neural substrate and share representational and attentional resources (e.g., Walsh, 2003). The shared system for magnitudes of time and space (and numbers) explains compatibility effects without specifying any directionality of the effects. If space and time are both represented by the same general-purpose analog magnitude metric, there is no a priori reason to posit that representations in one domain should depend asymmetrically on representations in the other. Empirical evidence for ATOM is provided by studies showing, for example, that expertise in temporal tasks (e.g., in musicians) transfers positively to spatial tasks (Agrillo and Piffer, 2012), or that overlapping neural substrates are active across temporal and spatial magnitude tasks (Skagerlund et al., 2016). By now, there is considerable empirical evidence for the symmetry hypothesis that space and time share the same basic spatio-temporal metrics and thereby equally influence each other (Walsh, 2003; Agrillo and Piffer, 2012; Hyde et al., 2013; Skagerlund and Träff, 2014; Cai and Connell, 2015; Skagerlund et al., 2016).

To summarize, on the one hand, there is empirical evidence for the asymmetry hypothesis and its main assumption that time and space remain two separate representational systems, with spatial representations being paramount in shaping our understanding of time, whereas temporal representations have less relevance when making spatial judgments (Boroditsky, 2000; Merritt et al., 2010; Bottini and Casasanto, 2013; Xue et al., 2014; Coull et al., 2015). On the other hand, there is empirical evidence to support the symmetry hypothesis that time and space share a common representational system, and hence, are symmetrically interrelated (Agrillo and Piffer, 2012; Hyde et al., 2013; Skagerlund and Träff, 2014; Cai and Connell, 2015; Skagerlund et al., 2016).

### SCOPE OF MINI-REVIEW: SELECTION CRITERIA

The aim of this short review is to critically assess the literature supporting either the asymmetry hypothesis (Lakoff and Johnson, 1980; CMT, Boroditsky, 2000) or the symmetry hypothesis (ATOM, Walsh, 2003), with a special focus on whether different tasks addressing different modalities may be the primary reason for (a)symmetric effects of space on time, instead of a genuine (a)symmetric mapping. To this end, we assessed whether the temporal and spatial tasks in the studies addressed the visual and/or auditory modality. As both hypotheses have variants that refer to the same theory but use different wording (e.g., "metaphorical mapping," "magnitude system"), the literature search was based on the core word for each theoretical background ("metaphor," "magnitude"). The authors therefore performed two database searches (Web of Science, 24 March 2018) using the terms (a) "metaphor*," "time" OR "temporal," and "space" OR "spatial," and (b) "magnitude*," "time" OR "temporal," and "space" OR "spatial." Papers with these three terms in the title were included. The searches yielded (a) 36 and (b) 40 results. To extend and validate the search results, the authors performed an additional database search using the terms "time-space" OR "space-time" and "asymmetr* mapping" OR "symmetr* mapping." This search yielded only four hits, of which one was in favor of the symmetry hypothesis; this article was therefore added to (b). Two were off-topic, and the fourth article was non-empirical and therefore not included.
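The title criterion can be expressed as a simple filter. The sketch below approximates the Boolean logic described above. It treats a trailing asterisk as a prefix wildcard and requires every term group to appear as a whole word in the title. It does not reproduce Web of Science's own matching rules (e.g., stemming or hyphen handling). The function names and example titles are hypothetical.

```python
import re


def has_term(title: str, term: str) -> bool:
    """True if term occurs as a word in title; a trailing '*' acts as a prefix wildcard."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None


def matches_search(title: str, core_term: str) -> bool:
    """core_term AND ("time" OR "temporal") AND ("space" OR "spatial"), all in the title."""
    return (
        has_term(title, core_term)
        and (has_term(title, "time") or has_term(title, "temporal"))
        and (has_term(title, "space") or has_term(title, "spatial"))
    )


title = "Space-time metaphors in sensorimotor tasks"  # hypothetical title
print(matches_search(title, "metaphor*"))   # True
print(matches_search(title, "magnitude*"))  # False
```

Under this reading, a title such as "Temporal metaphors in language" would be excluded, because it names neither "space" nor "spatial."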

From the list of papers resulting from the literature search, we selected only empirical studies that focused on both time and space (e.g., some studies focused on temporal metaphors without addressing time-space (a)symmetry, and others were completely off-topic). Although important for understanding the interrelations of time and space, the following review makes no statements about accounts concerning the processing stage at which the interrelation might occur (encoding, memory interference, retrieval) or about other possible moderators or modulators (e.g., Wang and Cai, 2017). Furthermore, neural correlates of spatial and temporal representations are not discussed within the scope of this mini-review. In addition, based on suggestions by an anonymous reviewer, two further studies important in the context of temporal and spatial representations were added (Casasanto and Boroditsky, 2008; Casasanto et al., 2010). In the end, 16 studies were included in the analysis (see **Tables 1**–**3**). These 16 studies are summarized below with a special focus on the modality of the applied tasks.

### ASYMMETRY VS. SYMMETRY HYPOTHESIS: A MODALITY-SPECIFIC ANALYSIS

Results indicate that most studies in favor of an asymmetric time-space mapping (**Table 1**) used visual tasks for both temporal and spatial representations (Boroditsky, 2000; Casasanto et al., 2010; Merritt et al., 2010; Bottini and Casasanto, 2013; Xue et al., 2014; Coull et al., 2015; Zito et al., 2015). Only one study (Casasanto and Boroditsky, 2008) included an audiovisual task, and only for temporal judgments. Tasks applied were, for example, duration and distance judgments (Bottini and Casasanto, 2013) or ambiguous temporal and spatial questions (Boroditsky, 2000).

All reviewed studies in favor of a symmetric time-space mapping (**Table 2**; Agrillo and Piffer, 2012; Hyde et al., 2013; Skagerlund and Träff, 2014; Skagerlund et al., 2016) used exclusively visual tasks for the spatial domain (except for one study that applied haptic tasks, Cai and Connell, 2015). With respect to the temporal domain, most of the studies in favor of the symmetry hypothesis applied an auditory task to measure temporal representations. Tasks included, for instance, temporal (e.g., which of two tones lasted longer) and spatial (e.g., which of two lines was longer) discrimination tasks (Hyde et al., 2013), or incongruent vs. congruent audio-visual length-time pairings (Agrillo and Piffer, 2012). One study (Skagerlund and Träff, 2014) used a visual task for measuring temporal performance.

The results of three studies support neither a symmetric nor an asymmetric time-space mapping (**Table 3**; Yates et al., 2012; Rousselle et al., 2013; Cai and Connell, 2016). These studies applied visual tasks (except for one study that applied an auditory task for the temporal domain; Rousselle et al., 2013), consisting of, for example, temporal and spatial distance judgment tasks (Cai and Connell, 2016) or temporal and spatial discrimination tasks (Rousselle et al., 2013). Importantly, Yates et al. (2012) investigated whether the observed interrelations between time and space reflect effects on the underlying representations or a decisional bias. Because they found a reversed effect of space on time when the comparative task was changed to an equality judgment, they concluded that response requirements might also affect the interaction between space and time. These findings support neither ATOM nor CMT; therefore, the study was assigned to **Table 3**.

Furthermore, we decided to list Cai and Connell (2016) not in **Table 2**, which comprises studies supporting the symmetry hypothesis based on ATOM, but in **Table 3**, because the authors did not investigate the bidirectionality of the relationship between temporal and spatial representations. Only the influence of space on time was examined, and therefore no conclusion concerning the (a)symmetry can be drawn. Note, though, that Cai and Connell (2016) interpreted their results as favoring the internal clock model (Gibbon et al., 1984), which is based on ATOM.

Finally, Rousselle et al. (2013) failed to support the symmetry hypothesis in their study. They showed a relationship between the magnitude perception of numbers and space but no association with time perception. Hence, their results support neither of the two theories and were also included in **Table 3**.

### DISCUSSION AND CONCLUSIONS

Based on the evaluation of the 16 studies included in this short review, the results seem to provide initial support for the assumption that the use of different tasks addressing different modalities may account for (a)symmetric effects of space on time. In fact, compared with studies supporting the asymmetry hypothesis, the studies supporting the symmetry hypothesis predominantly used auditory (rather than visual) tasks to measure temporal representations. Given the discrepancy in the theoretical interpretation of the corresponding findings, we suggest that (task-dependent) modality-specific processing plays a significant role in the interrelations between temporal and spatial representations. Therefore, taking modality-specific processing into account when testing the conflicting hypotheses seems mandatory in order to shed light on the mechanisms underlying the interrelation between temporal and spatial representations.

Based on our assessment, it seems justified to argue that the studies in favor of either asymmetry or symmetry could easily be re-interpreted. For example, in the asymmetry experiment of Coull et al. (2015), both the spatial and the temporal information were provided visually. If we consider that visual information processing is more sensitive to spatial information and less sensitive to temporal information (e.g., Recanzone, 2009), the observed asymmetry could be based on the different informational values of vision and audition with respect to spatial and temporal information. In other words, when only visual information (but no auditory information) was provided, the reported asymmetry between space and time may hinge on the fact that the task was purely visual, and hence had a higher informational value for space than for time. In this context, Wang and Cai (2017), for instance, suggest that the cross-dimensional magnitude interaction depends on the amount of representational noise: if the rated construct is noisier and thus less reliable, it is more likely to be influenced by other magnitudes. Cai et al. (2018) accordingly propose a Bayesian interference model to explain the findings.
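To make the noise-dependence argument concrete, the following is a minimal Python sketch of generic precision-weighted (Gaussian) cue combination. It is not the Bayesian model of Cai et al. (2018); the function names are hypothetical, and it assumes, for simplicity, that temporal and spatial magnitudes have already been mapped onto a common scale. It shows that the weight given to the other dimension grows with the noise of the judged dimension, so a noisy temporal estimate is pulled more strongly toward space than a precise spatial estimate is pulled toward time.

```python
def combined_estimate(target, target_sd, other, other_sd):
    """Precision-weighted average of a noisy target magnitude and a co-presented
    magnitude from another dimension (both assumed to be on a common scale)."""
    w_target = 1.0 / target_sd**2   # precision of the judged dimension
    w_other = 1.0 / other_sd**2     # precision of the interfering dimension
    return (w_target * target + w_other * other) / (w_target + w_other)


def pull_toward_other(target_sd, other_sd):
    """Weight given to the other dimension; it grows as the target gets noisier."""
    return target_sd**2 / (target_sd**2 + other_sd**2)


# Illustrative values only: time judged with more noise (sd 2) than space (sd 1).
print(pull_toward_other(2.0, 1.0))  # time pulled toward space: 0.8
print(pull_toward_other(1.0, 2.0))  # space pulled toward time: 0.2
print(combined_estimate(10.0, 2.0, 5.0, 1.0))  # 6.0
```

Under these assumptions, the same mechanism produces an asymmetry whenever one dimension is encoded less reliably than the other, which is the kind of modality-dependent effect discussed above.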

Although the literature indicates that modality-specificity might matter when examining temporal and spatial representations, the results were not unequivocal: some studies showed evidence for a symmetric time-space mapping even though they applied a visual task to measure temporal representations. This pattern might arise because modality-specificity is not the only factor influencing time-space mappings. If we maintain the assumption that there may be no genuine time-space (a)symmetry, other factors besides modality-specificity likely have an impact on the (a)symmetry of time and space. Other potential moderators could be, for example, task automaticity/familiarity and response properties that cause a decisional bias (Yates et al., 2012).



In addition, the participant's age could be a moderator, given that temporal vision matures more rapidly than spatial vision during childhood (Ellemberg et al., 1999). Furthermore, it is still under debate at which stage of processing the interference between time and space occurs (encoding, memory interference, retrieval; e.g., Cai et al., 2018). Cross-dimensional relations might differ depending on the stage of processing, which provides an avenue for future research.

Although it seems challenging to dissociate cross-dimensional interactions, future studies might benefit from applying tasks that genuinely require a balanced representation of both time and space. Potential tasks of this kind include movement tasks such as catching a ball, as temporal and spatial representations play an analogous role in the execution of such movements. Further, recent evidence shows the importance of auditory information, in addition to visual information, in anticipation tasks involving moving stimuli (e.g., the landing location of a tennis ball; Cañal-Bruland et al., 2018). A crucial role of movements in interrelations of temporal and spatial representations is additionally supported by the fact that the processing of such quantities overlaps in parietal brain regions associated with action control (Bueti and Walsh, 2009). It is assumed that we learn associations across different magnitude domains by moving in our environment. For example, catching a ball that was thrown from far away requires a slower running speed than catching a ball that was thrown from a nearer distance (assuming that the balls were thrown with the same speed and that one is trying to catch them at the same interception location), because the ball thrown from farther away takes longer to arrive. Therefore, in future studies, a task that genuinely involves movement (e.g., catching a ball) and provides visual as well as auditory information might be beneficial for investigating the mechanisms that drive time-space mappings. Clearly, future empirical research that includes movement in the task and takes potential moderators (e.g., modality-specificity, task automaticity, age) into account is needed to confirm or reject our assumptions.

A potential limitation of our short review is that our literature search quite likely did not cover all studies scrutinizing time-space mappings. One evident reason is that different studies have used different terms and wording. We cannot rule out that some studies, for example, provide evidence for symmetric time-space mappings without using the term time-space mapping or mentioning ATOM.

In summary, our literature review highlighted that seemingly contradictory claims could be bridged if cross-dimensional magnitude interactions between temporal and spatial representations were considered. It follows that previous experiments that examined only one modality may have had limited success in specifying the (a)symmetry of temporal and spatial representations and hence do not provide a proper test to tease the conflicting hypotheses apart. Consequently, a systematic manipulation of the relative contributions of different modalities to executing task-appropriate solutions, in both the space-sensitive visual domain and the time-sensitive auditory domain, seems necessary. Using a task such as catching a ball as a testbed might be a promising approach to drawing conclusions about the (a)symmetry of temporal and spatial representations.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

This research was supported by the German Research Foundation (DFG)–RA 940/15-1 & 2 and CA 635/2-2.

### REFERENCES


### ACKNOWLEDGMENTS

We would like to thank the colleagues of the Performance Psychology Group for their helpful suggestions.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Loeffler, Cañal-Bruland, Schroeger, Tolentino-Castro and Raab. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Conversational Time Travel: Evidence of a Retrospective Bias in Real Life Conversations

Burcu Demiray<sup>1,2</sup>\*, Matthias R. Mehl<sup>3</sup> and Mike Martin<sup>1,2</sup>

<sup>1</sup> Department of Psychology, University of Zurich, Zurich, Switzerland, <sup>2</sup> University Research Priority Program "Dynamics of Healthy Aging", University of Zurich, Zurich, Switzerland, <sup>3</sup> Department of Psychology, University of Arizona, Tucson, AZ, United States

We examined mental time travel reflected in individuals' utterances in real-life conversations using a naturalistic observation method: the Electronically Activated Recorder (EAR, a portable audio recorder that periodically and unobtrusively records snippets of ambient sounds and speech). We introduced the term conversational time travel and examined, for the first time, how much individuals talked about their personal past versus personal future in real life. Study 1 included 9,010 sound files collected from 51 American adults who carried the EAR over 1 weekend and were recorded every 9 min for 50 s. Study 2 included 23,103 sound files from 33 young and 48 healthy older adults from Switzerland who carried the EAR for 4 days (2 weekdays and 1 weekend, counterbalanced), with 30-s recordings occurring randomly throughout the day. We developed a new coding scheme for conversational time travel: We listened to all sound files and coded each file for whether the participant was talking or not. Sound files that included participant speech were also coded in terms of their temporal focus (e.g., past, future, present, time-independent) and autobiographical nature (i.e., about the self, about others). We first validated our coding scheme using the text analysis tool Linguistic Inquiry and Word Count. Next, we compared the percentages of past- and future-oriented utterances about the self (to capture conversational time travel). Results were consistent across all samples and showed that participants talked about their personal past two to three times as much as their personal future (i.e., retrospective bias). This contrasts with research showing a prospective bias in thinking behavior, based on self-report and experience-sampling methods.

Findings are discussed in relation to the social functions of recalling the personal past (e.g., sharing memories to bond with others, to update each other, to teach, to give advice) and to the directive functions of future-oriented thought (e.g., planning, decision making, goal setting that are more likely to happen privately in the mind). In sum, the retrospective bias in conversational time travel seems to be a functional and universal phenomenon across persons and across real-life situations.

#### Edited by:

Patricia J. Brooks, College of Staten Island, United States

#### Reviewed by:

Krystian Barzykowski, Jagiellonian University, Poland

Jennifer M. Talarico, Lafayette College, United States

\*Correspondence: Burcu Demiray b.demiray@psychologie.uzh.ch

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 17 July 2018 Accepted: 22 October 2018 Published: 13 November 2018

#### Citation:

Demiray B, Mehl MR and Martin M (2018) Conversational Time Travel: Evidence of a Retrospective Bias in Real Life Conversations. Front. Psychol. 9:2160. doi: 10.3389/fpsyg.2018.02160

Keywords: Electronically Activated Recorder, mental time travel, autobiographical memory, future-oriented thought, retrospective bias, conversations, real life

### INTRODUCTION

fpsyg-09-02160 November 10, 2018 Time: 13:42 # 2

Live in the here and now – so goes a common credo. However, one of the most remarkable skills of humans is not their ability to have their minds set on the present, but, rather, to engage in mental time travel. The human cognitive apparatus is a powerful time travel machine, allowing us to almost effortlessly project ourselves into the future to simulate possible future events, as well as put ourselves back into the past to relive our past experiences (Suddendorf and Corballis, 2007). Recently, psychologists have started to emphasize that memory (e.g., autobiographical memory) and prospection (e.g., future-oriented thought) are closely related phenomena that share many common qualities (e.g., Schacter et al., 2012; Klein, 2013; Rasmussen and Berntsen, 2013). Thinking or talking about our past and future are such natural, moment-to-moment activities that we do not notice or wonder how often we recall our memories or imagine our future in everyday life. Psychologists have started to explore this prevalence question using a range of self-report methods (e.g., diary method, experience-sampling) with a focus on participants' thoughts. Here, we used, for the first time, an ecological behavioral observation method that is free of self-report to examine the prevalence of mental time travel behavior in everyday conversations (i.e., conversational time travel). Using the Electronically Activated Recorder (EAR; Mehl et al., 2001), we unobtrusively and intermittently sampled snippets of ambient sounds and speech from participants' natural lives, and extracted information about their moment-to-moment conversations.

The first goal of the current research was to develop and validate a new, naturalistic observation approach to studying mental time travel reflected in everyday conversations. We listened to and coded participants' recorded utterances in terms of whether they (a) had a time reference or not and (b) were about the self versus others. We captured mental time travel by focusing on those utterances that were about the self with a time reference. In two studies, we validated our coding scheme using a text analysis program and with adult samples representing different age groups and countries.

The second goal of the current research was to examine how often people engage in conversational time travel, and when they do, how often they talk about their past versus future. There is some work on how much people think about their personal past versus future in everyday life (e.g., Klinger and Cox, 1987; Berntsen and Jacobsen, 2008; Gardner and Ascoli, 2015), but no work on how much they talk about their past versus future. The solitary nature of thinking versus the social nature of talking should have different effects on mental time travel (e.g., Kulkofsky et al., 2010), which has methodological and theoretical implications. Humans spend 32–75% of their waking time with other people (Mehl and Pennebaker, 2003). Because much of human behavior occurs in a social context, we examined, for the first time, mental time travel in the context of conversations. We unobtrusively observed and objectively coded the overt behavior of talking in everyday life to examine mental time travel reflected in people's utterances.

### Prevalence of Past- and Future-Oriented Thoughts in Everyday Life

Previous studies measuring the incidence of subjective thoughts and experiences have typically used variants of the original experience-sampling method (ESM; Csikszentmihalyi et al., 1977). One type of ESM is event-contingent sampling (e.g., diary method; Berntsen, 2007), in which participants make a diary entry whenever they detect, through introspection, the occurrence of a target event. The second type is signal-contingent sampling, which requires people to evaluate the presence of a targeted experience when prompted by a randomly timed signal (e.g., Pasupathi and Carstensen, 2003). One important advantage of these methods is their high ecological validity.

Using the diary method, D'Argembeau et al. (2011) explored the frequency of thinking about the personal future in everyday life. They asked participants to report whenever they realized that they were thinking about their future and found that participants reported experiencing, on average, 59 future-oriented thoughts on a typical day. In contrast, Rasmussen and colleagues (i.e., Rasmussen and Berntsen, 2011; Rasmussen et al., 2015) examined the frequency of thinking about the personal past (i.e., autobiographical memories) using the diary method. They made a distinction between voluntarily thinking about memories versus involuntary memories (which spontaneously pop up without deliberate search) and compared their frequency. They showed that participants self-reported recalling on average 7–8 voluntary and 20–22 involuntary autobiographical memories per day. Taken together, these two studies suggest that young adults think about their personal future twice as much as their past. Berntsen and colleagues (Berntsen and Jacobsen, 2008; Finnbogadottir and Berntsen, 2013), however, did not replicate this finding. They used the diary method to examine involuntary mental time travel and compared the frequency of involuntary memories and involuntary future-oriented thoughts. They found that involuntary memories were as frequent as involuntary future-oriented thoughts in daily life (around 22 per day).

In a signal-contingent experience sampling study, Gardner et al. (2012) examined the frequency of thinking about the personal past (i.e., autobiographical memories) in everyday life. Via random prompts throughout the day, they asked young adults to report whether they were thinking about a specific autobiographical memory at that moment or not. They found that the probability of being caught while recalling a specific autobiographical memory was 15%. However, in a second study using the same method, Gardner and Ascoli (2015) found this probability to be 10%. It was unclear to the authors why this small discrepancy occurred, but they suggested it might be due to the investigation of both past- and future-oriented thoughts in the second study. They found that participants thought about their future about 13% of the time. The second study's results are in line with an early signal-contingent experience sampling study: Klinger and Cox (1987) have shown that people rated 12% of their momentary thoughts as focused on their past and 12% on their future (versus 67% on their present). However, Felsman et al. (2017) found a large difference between past- and future-oriented thoughts. Via text messages throughout the day, they asked participants to report which of the following would best characterize their thoughts: past-, present- or future-focused. They found that people reported focusing much more on the future (26%) than the past (8%), with present as the most frequent category (66%).

Signal-contingent experience sampling has also been used to examine involuntary thoughts, particularly mind-wandering. Mind-wandering is defined as a shift of attention from a primary task in the present toward internal information or self-generated thought, such as autobiographical memories (Smallwood and Schooler, 2006). Song and Wang (2012) examined the temporal orientation of mind wandering by randomly prompting participants throughout the day and asking whether they were mind wandering or not, and if mind wandering, whether they were thinking about past, future, present or atemporal events. They found a prospective bias such that participants were mind wandering about the future (40.53%) twice as much as the past (21.53%). We should note that this prospective bias has been repeatedly shown in laboratory studies of mind wandering (e.g., Smallwood et al., 2009; Baird et al., 2011; Stawarczyk et al., 2011). However, researchers have identified some factors that affect the temporal orientation of mind wandering, with some eliminating the prospective bias, such as manipulating the experimental settings (e.g., response options and cues; Jackson et al., 2013; Vannucci et al., 2017), and controlling for participant characteristics such as familiarity with the task (Smallwood et al., 2009) and mood (Poerio et al., 2013).

In sum, all studies reviewed above were conducted in the real world, focused on thoughts (retrieved voluntarily and/or involuntarily) and were based on self-report (i.e., diary method and signal-contingent experience sampling). They have yielded two different findings on the prevalence of thinking about the personal past versus future: Some reported that future-oriented thoughts occur almost twice as frequently as past-oriented thoughts, whereas others reported similar proportions of both.

### Temporal Orientation

Another line of research that is relevant for our work is time perspective or temporal orientation. Temporal orientation refers to relatively stable individual differences in the relative emphasis one places on the past, present, or future (Zimbardo and Boyd, 1999). Temporal orientation has been widely examined in relation to personality traits (e.g., Zhang and Howell, 2011), academic outcomes (e.g., Horstmanshof and Zimitat, 2007), risky behaviors (e.g., Daugherty and Brase, 2010), and health outcomes (see Stolarski et al., 2015 for reviews). Temporal orientation is usually assessed with surveys, such as the Zimbardo Time Perspective Inventory (ZTPI; Zimbardo and Boyd, 1999) and the Balanced Time Perspective Scale (BTPS; Webster, 2011). Jason et al. (1989) interviewed women (mean age = 31) and asked them to rank their past, present and future by the amount of thinking time devoted to each temporal focus. Women self-reported thinking about their present 41% of the time, their future 38% of the time and their past 21% of the time. This finding is similar to others (reviewed above) showing that future-oriented thoughts occur twice as often as past-oriented thoughts.

There is only one study on temporal orientation that is not based on self-report: Park et al. (2016) created a novel language-based measure of temporal orientation by developing a model to automatically classify individuals' social media messages as oriented toward the past, present or future (model accuracy = 72%). They used the model to classify over 1.3 million Facebook status updates (i.e., short text messages) written by 5,372 individuals aged 13–48. They found that 65% of messages were present-oriented, 19% were past-oriented and 16% were future-oriented. This result does not fit with the questionnaire findings above and shows roughly equal proportions of past- and future-oriented messages.

In conclusion, studies based on self-report (i.e., diary method, signal-contingent experience sampling, questionnaires) and one study based on a linguistic analysis of social media messages (Park et al., 2016) have yielded two findings: a prospective bias versus an equal proportion of past- and future-oriented thoughts. Because all of these studies focused on thoughts, we can conclude that future-oriented thoughts tend to dominate our private mental worlds compared to past-oriented thoughts.

In contrast, our social worlds might be dominated by past-oriented thoughts: Humans spend one fifth of their waking time in spontaneous conversation (Dunbar, 1998), and a significant portion of this time is dedicated to talking about past events (Eggins and Slade, 1997; Dessalles, 2018). According to Dessalles (2007), the function of recalling the past is to accumulate stories that are relevant to tell in conversation. He claims that events that are memorable are exactly those that are good for narrating. Similarly, Mahr and Csibra (2018) argue that the main function of remembering is communication. They claim that social interactions require the justification of entitlements and obligations, which is possible only by reference to past events. In sum, these theoretical accounts highlight the importance of recalling the personal past in conversations. Therefore, we examined, for the first time, mental time travel in conversations and explored whether there is a retrospective bias in talking behavior, in contrast to the prospective bias observed in thinking behavior (e.g., Felsman et al., 2017).

### Overview of the Present Studies

The most important novelty of this work is its naturalistic observation approach to studying spontaneous, everyday conversations unobtrusively and with minimal participant burden. We used the Electronically Activated Recorder in both studies to collect random snippets of everyday conversations. The EAR is a portable audio recorder that intermittently records brief snippets of ambient sound and speech (Mehl et al., 2001). It captures acoustically detectable aspects of participants' environments, such as their locations, activities and social interactions (Mehl, 2017). A further strength of the current work is its attempt to increase ecological validity by sampling from a wide range of natural situations: We obtained a large sample by collecting more than 32,000 sound snippets.

The EAR has been used with good acceptance and compliance (Mehl, 2017), in all age groups (Bollich et al., 2016; Demiray et al., 2017), and with healthy and clinical populations (e.g., Robbins et al., 2014). The psychometric properties of EAR-observed conversational behavior have been established in prior research with student (Mehl and Pennebaker, 2003) and adult populations (Bollich et al., 2016). Study 1 was approved by the Institutional Review Board of the University of Arizona, and Study 2 was approved by the Ethics Committee of the University of Zurich. We implemented a series of safeguards to protect participants' privacy and to ensure data confidentiality. First, the EAR recorded only a small fraction of the day (e.g., 2.5% when sampling 30 s). Second, participants had the opportunity to review their recordings and erase any files they did not want on record before the investigators accessed the data. Third, in order to protect bystanders, we encouraged participants to wear the EAR visibly (with large warning stickers on it) and to readily mention the study to others. Finally, although sound files included bystanders' utterances, we only coded and analyzed the utterances of our participants (for a detailed discussion of EAR privacy and confidentiality policies, see Mehl, 2017; Robbins, 2017).

In order to examine mental time travel as reflected in participants' utterances, we developed a novel coding scheme. We first coded whether participants' utterances were time-dependent (i.e., had a reference to time) or time-independent (e.g., semantic memory such as "The name of the restaurant is Satchel's"; Suddendorf et al., 2009). Next, we coded whether time-dependent utterances were about the self (i.e., autobiographical) or about others (e.g., vicarious memories; Pillemer et al., 2015). Finally, in order to capture mental time travel, we focused on the autobiographical, time-dependent utterances: We coded "personal past" when the participant was talking about personally experienced past events (e.g., "I visited my grandparents last week"). "Personal future" covered anything that will or might (or might not) happen in one's future (e.g., "Next year I'm starting my MA degree"). When the participant was talking about their current activity, task or situation, we coded "present" (e.g., "This show is boring, let's change the channel").
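The hierarchical coding steps described above can be expressed as a filter over coded sound files. The sketch below is hypothetical: the record fields, function names and toy data are illustrative assumptions, not the authors' materials or data. It keeps only files with participant speech that are time-dependent and about the self, tallies them by temporal focus, and summarizes the tally as a past-to-future ratio (a value above 1 indicating a retrospective bias).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodedFile:
    """Hypothetical record for one coded sound file (field names are illustrative)."""
    participant_talking: bool
    time_dependent: Optional[bool] = None  # None when there is no participant speech
    about_self: Optional[bool] = None      # autobiographical vs. about others
    temporal_focus: Optional[str] = None   # "past", "future" or "present"


def conversational_time_travel(files):
    """Count autobiographical, time-dependent utterances by temporal focus."""
    counts = Counter()
    for f in files:
        if not f.participant_talking:  # step 1: participant speech only
            continue
        if not f.time_dependent:       # step 2: drop time-independent utterances
            continue
        if not f.about_self:           # step 3: drop utterances about others
            continue
        counts[f.temporal_focus] += 1
    return counts


def past_future_ratio(counts):
    """Ratio of personal-past to personal-future utterances."""
    if counts["future"] == 0:
        return float("inf") if counts["past"] else float("nan")
    return counts["past"] / counts["future"]


# Toy example (invented data, not from either study).
example = [
    CodedFile(True, True, True, "past"),
    CodedFile(True, True, True, "past"),
    CodedFile(True, True, True, "future"),
    CodedFile(True, True, False, "past"),  # about someone else: excluded
    CodedFile(True, False),                # time-independent: excluded
    CodedFile(False),                      # participant not talking: excluded
]
counts = conversational_time_travel(example)
print(dict(counts), past_future_ratio(counts))  # {'past': 2, 'future': 1} 2.0
```

Because both percentages in the study share the same denominator within a sample, comparing them is equivalent to comparing raw counts as done here.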

In Study 1, we validated our coding scheme using the Linguistic Inquiry and Word Count (LIWC; Pennebaker et al., 2007), which is currently the most extensively validated text analysis tool in the social sciences. In Study 2, we validated the coding scheme with different samples. Previous studies on mental time travel have mostly focused on (1) college students or young adults, (2) one culture, with no cross-cultural comparisons, and (3) experiences of a single temporal focus, such as only autobiographical memories or only future-oriented thoughts (e.g., Pasupathi and Carstensen, 2003; Mace, 2004; Kvavilashvili and Fisher, 2007; Schlagman and Kvavilashvili, 2008; Schlagman et al., 2009; D'Argembeau et al., 2011; Rasmussen and Berntsen, 2011; Gardner et al., 2012). Important and novel aspects of the current work are the inclusion of (1) participants who represent the whole adult life span, (2) participants from two countries, and (3) both past- and future-oriented utterances. We compared the prevalence of past- and future-oriented utterances across young, middle-aged and older adults in the United States and Switzerland. Study 1 examined the utterances of healthy spouses of breast cancer patients over a weekend (United States), and Study 2 examined the utterances of healthy young and older adults over 4 days (Switzerland). In addition to sampling such a wide range of individuals, one novel achievement of this work is its sampling from the universe of real-life situations.

## STUDY 1

This study is part of a larger project on American couples coping with breast cancer. Breast cancer patients and their healthy spouses were recruited at the Arizona Cancer Center, as described in earlier work that examined cancer conversations of couples (Robbins et al., 2014). For the purposes of our research, we focused only on the spouses' utterances. We used this dataset because it was the only readily available dataset with EAR transcripts that we could use to develop our coding scheme.

The first goal of Study 1 was to validate our coding scheme using the LIWC (Pennebaker et al., 2007). We used LIWC to count specific words in participants' utterances. We first compared utterances manually coded as time-dependent and those coded as time-independent in terms of the following LIWC variables: future tense and past tense. We expected time-dependent utterances to include significantly more past- and future-tense verbs than time-independent utterances. Second, we compared autobiographical (self-related) and others-related utterances in terms of personal pronouns: We expected self-related utterances to include more 1st person singular and plural pronouns, and others-related utterances to include more 2nd and 3rd person pronouns. Finally, utterances coded as personal past, personal future and present were compared in terms of their verb tense. We expected, for example, utterances about the personal past to include more past-tense verbs than utterances about the present and personal future.
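LIWC-style scores are expressed as the percentage of all words in a text that belong to a given dictionary category. The following minimal sketch illustrates that computation; the tiny word lists and the function name are hypothetical stand-ins, since the actual LIWC dictionaries are much larger and are not reproduced here.

```python
import re

# Hypothetical mini-dictionaries standing in for LIWC categories.
CATEGORIES = {
    "i": {"i", "me", "my", "mine", "myself"},
    "we": {"we", "us", "our", "ours", "ourselves"},
    "past": {"was", "were", "visited", "went", "had", "did"},
    "future": {"will", "gonna", "shall"},
}


def category_percentages(utterance):
    """Percentage of words in the utterance that fall into each category."""
    words = re.findall(r"[a-z']+", utterance.lower())
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(w in vocab for w in words) / total
        for name, vocab in CATEGORIES.items()
    }


scores = category_percentages("I visited my grandparents last week")
print(scores)  # 'i' is about 33.3 (2 of 6 words), 'past' about 16.7 (1 of 6)
```

Validation then amounts to comparing such category scores between utterance groups, e.g., testing whether files hand-coded as "personal past" score higher on the past-tense category than files coded as "present".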

The second goal of Study 1 was to examine the prevalence of mental time travel in participants' utterances, and specifically to compare the frequency of past- versus future-oriented utterances. Recent theories on episodic memory (Dessalles, 2007; Mahr and Csibra, 2018) suggest that the main function of remembering the past is communication. Past research on autobiographical memory emphasizes significant social functions of memories, showing that people recall their personal past to provide material for conversation (Pasupathi et al., 2002), to update others about what is ongoing in their life (Webster, 2003), to create/enhance intimacy in relationships (Alea and Bluck, 2007), to elicit empathy for others (Bluck et al., 2013) and to teach and inform others (O'Rourke et al., 2017). In contrast, future-oriented thinking has been shown to serve directive functions such as planning, decision making, problem solving, goal intention and goal achievement (e.g., Szpunar, 2010; D'Argembeau et al., 2011; Schacter et al., 2017). Such directive functions should be inherently private and more likely to occur when people are thinking alone (O'Rourke et al., 2017). For example, Kulkofsky et al. (2010) have shown that private reminiscence favors directive functions (which guide current and future behavior), whereas social contexts are associated with memories that have higher social functions. Thus, we expected to observe significantly more autobiographical memories (i.e., past-oriented utterances) than future-oriented utterances in the social setting of conversations with others. That is, we expected a retrospective bias in conversational time travel, in contrast to the prospective bias observed in mental time travel (e.g., Jason et al., 1989; Song and Wang, 2012).

### Materials and Methods

### Sample

Our sample of real-life situations included 9,010 sound snippets collected from 51 healthy spouses. Of the 51 spouses, 44 were male (86%). Participants were on average 59 years old (range: 26–94, SD = 14). Eighty-two percent of participants were Caucasian (n = 42), 15% Latin American (n = 8), and 2% Asian (n = 1). All participants were in a marriage-like relationship and were primarily English-speaking. Each couple received \$150 for their participation.

### Procedure

The first study session usually took place on a Friday afternoon. First, all participants gave written informed consent in accordance with the Declaration of Helsinki. They then completed a set of questionnaires as part of the larger study and were introduced to the EAR. They were told to wear the device as much as possible during their waking hours over the weekend, and that the EAR would record 50 s of ambient sound at a time, for a total of approximately 10% of their waking hours. Participants were informed that the snippets would be recorded without their awareness and that they should go about their normal, everyday life as much as possible. They were also told that the EAR would stop recording during sleeping hours. All participants were explicitly told that they would have an opportunity to review all recordings before anyone listened to them and to erase any files they did not want on record. After the weekend, typically on Monday, the EAR devices were collected and another battery of questionnaires, including demographic and medical information, was administered. Participants were debriefed and given a password-protected CD containing all of their sound files to review. Of the more than 9,000 sound files collected, only one file was deleted, by a single participant.

### Measures

The EAR software ran on an HP iPAQ 100 handheld computer. The device was set to record 50 s every 9 min, a sampling rate that previous studies have shown to yield stable estimates of habitual daily behavior (Mehl et al., 2012). The device was housed in a protective case affixed to the participant's waistline, and an external microphone (Olympus ME-15) was attached to the participant's lapel. The EAR was preprogrammed not to record for 6 h during each participant's predefined normal sleep hours, starting 30 min after the time they indicated they typically go to sleep. The EAR recorded participants' waking days from the time they received the device until they went to sleep on Sunday. A valid sound file was defined as one recorded while the participant was awake and wearing the EAR, with no technical difficulties. This yielded an average of 176 (SD = 57) valid waking sound files per participant (approximately 2.4 h of data).
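The sampling figures above follow from simple arithmetic. The Python sketch below uses only the parameters stated in the text (50-s recordings every 9 min; 176 valid files on average). It ignores the nightly blackout and non-wear periods, so it is an illustrative approximation, not a model of the EAR software itself; the helper names are hypothetical.

```python
# Study 1 EAR sampling parameters, as stated in the text
RECORD_SECONDS = 50      # length of each recording
INTERVAL_MINUTES = 9     # one recording every 9 min


def duty_cycle(record_s: float, interval_min: float) -> float:
    """Fraction of time recorded under a fixed on/off schedule."""
    return record_s / (interval_min * 60)


def hours_of_audio(n_files: float, record_s: float) -> float:
    """Total recorded audio, in hours, for a given number of snippets."""
    return n_files * record_s / 3600


share = duty_cycle(RECORD_SECONDS, INTERVAL_MINUTES)
print(f"share of waking time recorded: {share:.1%}")  # 9.3%, i.e. "approximately 10%"
print(f"audio per participant: {hours_of_audio(176, RECORD_SECONDS):.2f} h")  # 2.44 h
```

The 9.3% duty cycle is what the participants were told was "approximately 10%", and 176 files of 50 s each give the reported 2.4 h per participant.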

### **EAR-Derived measures: coding of sound files**

All sound files were listened to, transcribed, and coded by trained coders. As part of the larger project, files were coded for whether the participant was talking. For the goals of the current study, we developed a coding scheme for the temporal focus of participants' utterances (see **Table 1** for examples; all coding guidelines are available upon request to interested researchers). We first differentiated between time-dependent and time-independent utterances. Time-independent utterances had no reference to time and included semantic memory (i.e., general knowledge about the world, such as "Paris is the capital of France") and personal comments, beliefs, preferences, and attitudes about anything in general (e.g., "He's really nice"). Time-dependent utterances included a reference to time (i.e., past, present, and/or future) and were divided into autobiographical (self-related) and others-related categories. Autobiographical utterances were about the personal past, the present moment, and the personal future, whereas utterances about others focused on others' past and others' future. Personal past refers to talking about personally experienced past events: These could be specific events (that happened at a particular place and time), repeated events (e.g., "I used to go to the gym every day"), extended events (e.g., "our 2-week vacation last Christmas"), and long periods of life (e.g., "When I lived in the United States"; Conway et al., 2004). In contrast, others' past refers to talking about other people's past experiences (i.e., events the participant did not experience himself/herself). Personal future refers to anything that will or might (or will not or might not) happen in one's future (e.g., "We will not go to the movies"). Others' future refers to talking about other people's future experiences in which the participant is not personally involved (e.g., "They might go skiing next week").
Finally, utterances about the present refer to talking about the current activity, task, or situation. This category also includes the very recent past and the imminent future when they are connected to the present moment (e.g., "I just washed the potatoes and I am going to cook the veggies now"). There is no "others' present" category, because observing others' present activity requires the participant to be there, which automatically involves the participant's own present. Utterances such as "David is at the cinema" were therefore coded as time-independent (i.e., as semantic knowledge).

All coding categories were dichotomous, indicating the presence (1) or absence (0) of a temporal focus. In addition, each sound file was coded in a TIME column with 1 = personal past, 2 = others' past, 3 = present, 4 = personal future, 5 = others' future, and 6 = time-independent (see **Table 2** for examples). Categories were not mutually exclusive, so any 50-s sound file could include any combination of temporal foci. For example, if a participant talked about both personal past and others' past within the same sound file, the file received a 1 for both temporal categories and a "1–2" for the TIME variable.

#### TABLE 1 | Examples of each coding category.

#### TABLE 2 | Examples of the coding scheme.

DOMINANT TIME exists only in Study 2. All variables except TIME and DOMINANT TIME are dichotomous.

Each sound file was double-coded by two coders. We calculated inter-rater reliability using the TIME variable rather than each single temporal focus variable separately: The two coders agreed on the TIME variable 64.12% of the time. This is a much stricter criterion than calculating inter-rater reliability for each temporal focus separately, because agreement on a combined code such as "1–3–4" requires the coders to agree on every component at once, whereas agreement on single columns (e.g., separately for 1 = personal past, 3 = present, 4 = personal future) is easier to obtain. All sound files on which the two coders disagreed were re-listened to, and the disagreements were resolved through discussion.
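The combined TIME code can be derived mechanically from the dichotomous columns. The sketch below assumes a hypothetical per-file dictionary of 0/1 flags keyed by the six category numbers defined above; the en dash separator mirrors codes such as "1–2" and "1–3–4" in the text. It also shows why agreement on TIME is strict: a single differing flag changes the whole code.

```python
# Category numbers of the TIME variable, as defined in the coding scheme
TIME_CODES = {
    1: "personal past",
    2: "others' past",
    3: "present",
    4: "personal future",
    5: "others' future",
    6: "time-independent",
}


def time_code(flags: dict) -> str:
    """Join the numbers of all categories flagged 1 into a TIME code such as '1–3–4'."""
    present = [code for code in sorted(TIME_CODES) if flags.get(code, 0) == 1]
    return "–".join(str(code) for code in present)


# Hypothetical file: coder A heard personal past, present, and personal future
coder_a = {1: 1, 3: 1, 4: 1}
coder_b = {1: 1, 3: 1}  # coder B missed the personal-future reference
print(time_code(coder_a))  # 1–3–4
print(time_code(coder_b))  # 1–3
print(time_code(coder_a) == time_code(coder_b))  # False: no agreement on TIME
```

The two coders agree on two of the three flagged columns, yet count as a disagreement under the TIME criterion.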

#### **Text analyses**

Transcriptions of utterances were analyzed using LIWC (Pennebaker et al., 2007), one of the most widely used and best-validated text analysis tools in psychological science (e.g., Pennebaker et al., 2003; Tausczik and Pennebaker, 2010). LIWC analyzes text word by word and assigns words to linguistic categories (e.g., pronouns, prepositions) and psychological categories (e.g., emotion words, social words). For each participant, it reports the percentage of words in each category (number of words in the category divided by the total number of words). In the current study, we used the following categories: past tense, future tense, present tense, and all personal pronouns.
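The percentage LIWC reports for a category is the share of all words that belong to that category. The Python sketch below illustrates only that computation. The three word lists are a hypothetical toy dictionary, not the LIWC dictionaries, which are much larger and also match word stems.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; real LIWC categories
# are far larger and include stem patterns, which this sketch does not handle.
TOY_DICTIONARY = {
    "past": {"was", "were", "went", "had", "did"},
    "future": {"will", "gonna", "shall"},
    "i": {"i", "me", "my", "mine"},
}


def category_percentages(text: str) -> dict:
    """Percentage of all words in `text` that fall into each toy category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in TOY_DICTIONARY}
    counts = Counter()
    for word in words:
        for cat, members in TOY_DICTIONARY.items():
            if word in members:
                counts[cat] += 1
    return {cat: 100 * counts[cat] / len(words) for cat in TOY_DICTIONARY}


print(category_percentages("I went to the gym and I will go again"))
# {'past': 10.0, 'future': 10.0, 'i': 20.0}
```

Because the output is normalized by word count, longer transcripts do not automatically score higher, which is what makes participants comparable.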

### Results

A total of 4,100 sound files included participant speech (45.5% of valid sound files). We were unable to code for temporal focus in 747 of these sound files (18%) because of the brevity or vagueness of speech. These transcripts contained four words on average (e.g., "The what? Oh yes," "Me, um, I guess"), and many of them contained information that could help identify participants (e.g., names). To examine only ordinary daily conversations, we also excluded 115 sound files (3.4%) that were related to cancer. Analyses were conducted on the remaining 3,238 sound files. To run the analyses of variance reported below, this dataset with one sound file per row (sound-level dataset) was converted into a person-level dataset (one row per participant) by aggregating the data within each participant. All data are available upon request to interested researchers.
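The exclusion steps and the sound-level to person-level conversion can be sketched as follows. The funnel arithmetic uses the counts from the text. The aggregation rule (summing 0/1 flags, i.e., counting files per temporal focus for each participant) and the field names `pid`, `past`, and `future` are assumptions made for illustration, not a description of the authors' scripts.

```python
from collections import defaultdict


def exclusion_funnel(with_speech: int, uncodable: int, cancer_related: int) -> int:
    """Sound files left after removing uncodable and cancer-related files."""
    return with_speech - uncodable - cancer_related


print(exclusion_funnel(4100, 747, 115))  # 3238


def to_person_level(rows):
    """Aggregate sound-level rows (one dict per file) to one row per participant.

    Assumption: each row has a hypothetical 'pid' key plus 0/1 temporal-focus
    flags, and the person-level value is the number of files flagged per focus.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for key, value in row.items():
            if key != "pid":
                totals[row["pid"]][key] += value
    return {pid: dict(flags) for pid, flags in totals.items()}


# Hypothetical sound-level rows for two participants
rows = [
    {"pid": "p01", "past": 1, "future": 0},
    {"pid": "p01", "past": 0, "future": 1},
    {"pid": "p02", "past": 1, "future": 0},
]
print(to_person_level(rows))
# {'p01': {'past': 1, 'future': 1}, 'p02': {'past': 1, 'future': 0}}
```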

#### Validation of the Coding Scheme

The first goal of Study 1 was to validate our coding scheme using the Linguistic Inquiry and Word Count (Pennebaker et al., 2007). Participants' verbatim EAR transcripts were submitted to LIWC. We first compared utterances manually coded as time-dependent and those coded as time-independent in terms of their past-tense and future-tense verbs. A repeated-measures MANOVA showed that time-dependent utterances included significantly more past-tense verbs (M = 4.58, SD = 1.38) than time-independent utterances (M = 2.62, SD = 2.19), F(1,45) = 29.64, p < 0.001, ηp² = 0.40. Similarly, time-dependent utterances included significantly more future-tense verbs (M = 2.18, SD = 0.68) than time-independent utterances (M = 0.78, SD = 0.99), F(1,45) = 62.41, p < 0.001, ηp² = 0.58. That is, utterances that we had coded as time-dependent included more past- and future-tense verbs than utterances coded as time-independent, which validated our coding.
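Partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The helper below is a generic sketch of that identity, not the authors' analysis code. Applied to the two F tests above, it reproduces the reported effect sizes.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Past-tense and future-tense comparisons, F(1,45)
for f in (29.64, 62.41):
    print(round(partial_eta_squared(f, 1, 45), 2))
# 0.4
# 0.58
```

The same check can be applied to any of the single-df tests in this section (e.g., F(1,39) = 294.72 gives 0.88).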

Second, we compared autobiographical (self-related) and others-related utterances in terms of their personal pronouns. We aggregated pronoun use at the person level and conducted a repeated-measures MANOVA, which confirmed our expectations. Self-related utterances included significantly more 1st person singular pronouns (M = 5.29, SD = 1.42) than others-related utterances (M = 1.19, SD = 1.89), F(1,43) = 132.58, p < 0.001, ηp² = 0.76, as well as more 1st person plural pronouns (M = 1.22, SD = 0.75 versus M = 0.12, SD = 0.45), F(1,43) = 75.31, p < 0.001, ηp² = 0.64. In addition, others-related utterances contained significantly more 2nd person pronouns (M = 5.85, SD = 4.98), 3rd person singular pronouns (M = 3.64, SD = 3.73), and 3rd person plural pronouns (M = 2.01, SD = 3.27) than self-related utterances (2nd person: M = 3.60, SD = 1.31; 3rd person singular: M = 1.41, SD = 1.01; 3rd person plural: M = 1.01, SD = 0.55), Fs(1,43) ranged from 4.21 to 8.32, ηp² ranged from 0.09 to 0.29, ps < 0.05. That is, utterances that we had coded as autobiographical were more about the self, with pronouns such as "I," "me," "we," and "us," whereas others-related utterances were more about second and third persons (e.g., "you," "he," "she," "they," and "him").

Finally, we validated our conversational time travel coding by comparing utterances manually coded as personal past, personal future, and present in terms of their verb tense. A repeated-measures MANOVA showed, as expected, that utterances coded as personal past included significantly more past-tense verbs (M = 8.39, SD = 2.61) than utterances coded as personal future (M = 0.88, SD = 0.98) and present (M = 1.83, SD = 0.90), F(1,39) = 294.72, p < 0.001, ηp² = 0.88. In contrast, utterances coded as personal future included significantly more future-tense verbs (M = 2.48, SD = 2.06) than utterances coded as personal past (M = 0.66, SD = 0.75) and present (M = 0.69, SD = 0.48), F(1,39) = 28.25, p < 0.001, ηp² = 0.42. Finally, utterances coded as present included significantly more present-tense verbs (M = 15.62, SD = 2.41) than utterances coded as personal past (M = 8.60, SD = 2.62) and personal future (M = 13.92, SD = 4.76), F(1,39) = 34.47, p < 0.001, ηp² = 0.47. In sum, all of our expectations regarding the coding categories were confirmed, and the coding scheme was validated with LIWC.

### Frequency of Past- Versus Future-Oriented Utterances

The second goal of Study 1 was to examine the prevalence of mental time travel in participants' utterances and to compare the frequency of past- versus future-oriented utterances. To calculate percentages, we used only the sound files that included a single temporal category (e.g., only personal past, only present, or only personal future) and excluded those that involved more than one temporal focus (e.g., a sound file including both personal past and personal future). This allowed us to use the sound file as the unit of analysis and to count unambiguously the frequencies of purely past- versus future-oriented sound files.

There were 2,297 sound files that included only one temporal category (**Figure 1**, top row). Of these, 17.5% were time-independent and included utterances presenting semantic memory or personal preferences, ideas, and beliefs (**Figure 1**, second row). Of the time-dependent sound files, 93% were about the self and 7% were about other people (**Figure 1**, third row). Utterances about others were further categorized as others' past (N = 92, 68.7%) and others' future (N = 42, 31.3%). Sound files that included self-related utterances were further divided into past (17.8%), present (72.9%), and future (9.3%) categories to represent mental time travel (**Figure 1**, bottom row). That is, participants talked about their personal past in 13.6% of all their sound files and about their personal future in 7.2%; in other words, they engaged in conversational time travel in 20.8% of their sound files.
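The overall shares follow from multiplying the nested proportions along the branches of Figure 1 (time-dependent, then self-related, then past or future). The Python sketch below uses the rounded percentages from the text. With these inputs it gives about 13.7% past and 7.1% future, which match the reported 13.6% and 7.2% up to rounding of the intermediate percentages; the combined 20.8% matches exactly.

```python
def share_of_all(*proportions: float) -> float:
    """Multiply nested proportions into a share of all single-focus files."""
    result = 1.0
    for p in proportions:
        result *= p
    return result


time_dependent = 1 - 0.175  # 82.5% of single-focus files
self_related = 0.93         # share of time-dependent files about the self
past = share_of_all(time_dependent, self_related, 0.178)
future = share_of_all(time_dependent, self_related, 0.093)
print(f"{past:.1%} past, {future:.1%} future, {past + future:.1%} total")
# 13.7% past, 7.1% future, 20.8% total
```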

We ran a repeated-measures ANOVA to compare the number of past-, present-, and future-oriented utterances, using the aggregated person-level amount of talking about the past, present, and future. People talked significantly more about their past (M = 6.08, SD = 4.53) than about their future (M = 3.17, SD = 2.60), t(51) = −5.47, p < 0.001. Furthermore, pairwise comparisons showed that present-oriented utterances (M = 24.71, SD = 14.51) were significantly more frequent than both past- and future-oriented utterances, F(2,50) = 78.69, p < 0.001, ηp² = 0.76.

### Discussion

We observed, over a weekend, the daily conversations of healthy spouses of breast cancer patients and developed a coding scheme for the temporal focus of their utterances. The first goal of the study was to validate our coding scheme using a text analysis tool. We succeeded in showing that utterances manually coded as (1) time-dependent versus time-independent, (2) self-related versus others-related, and (3) past-, present-, versus future-oriented indeed differed from each other in terms of the words they included.

The second goal of the study was to explore the prevalence of conversational time travel in everyday life and to compare the frequency of past- and future-oriented utterances. Our coding scheme first revealed that individuals mostly produced time-dependent utterances (82.5% of all sound files); semantic information and general comments about the world occurred in only 17.5% of the sound files. Second, individuals talked in a self-referential way most of the time: 77% of all sound files and 93% of time-dependent sound files included autobiographical utterances. In contrast, participants talked about other people in only 5.8% of the sound files. This suggests that vicarious memories (Pillemer et al., 2015) and vicarious future-oriented utterances (e.g., Grysman et al., 2013) are quite rare in daily conversations. This is the first study to examine the prevalence of vicarious thoughts about others in everyday life, so these findings may inspire future work.

Finally, we examined mental time travel as reflected in autobiographical utterances and found that 13.6% of sound files were about the personal past, whereas 7.2% were about the personal future. That is, people talked about their personal past almost twice as much as about their personal future, and the difference was significant. This is in line with our expectation of a retrospective bias in the social setting of conversations (e.g., Eggins and Slade, 1997): Participants referred to their past much more than to their imagined future while interacting with others. This contrasts with previous work on private thoughts, which suggests that while thinking, people focus more on the future than on the past (e.g., Andrews-Hanna et al., 2010) or focus equally on both (e.g., Klinger and Cox, 1987; Gardner and Ascoli, 2015). One explanation is that recalling past events (i.e., autobiographical memories) may be more useful than simulating future events in social interactions (e.g., Desalles, 2007). Talking about memories is known to serve social functions such as creating or enhancing feelings of intimacy, feeling empathy toward others, creating or enhancing conversation, and teaching and giving advice (e.g., Alea and Bluck, 2003; Webster, 2003; O'Rourke et al., 2011, 2013, 2017). In contrast, prospection may be more functional while thinking, as private thoughts tend to serve directive functions such as setting goals, planning, and decision making more strongly (e.g., Kulkofsky et al., 2010; Szpunar, 2010; D'Argembeau et al., 2011). For example, Rasmussen and Berntsen (2013) asked participants in the laboratory to remember two events from their past and to imagine two events from their future, and to rate each event on its perceived functions. Past events were rated higher than future events on the social function, as well as on how frequently they had been shared with others. Cole et al. (2016) also asked participants to recall past events and imagine future events using a laboratory paradigm and found that people reported thinking about future events more often than about past events. In sum, we believe that the social nature of conversations creates an efficient context for recalling memories in everyday life.

Present was the most frequent category, with 60% of all sound files being about the current activity or situation. This shows that while people are talking, more than half of their utterances are focused on what they are actually doing or observing (i.e., goal pursuit, Klinger, 2013). This is in line with previous work: In two experience-sampling studies, individuals rated 66% and 67% of their momentary thoughts as focused on the present (Klinger and Cox, 1987; Felsman et al., 2017, respectively). Similarly, Park et al. (2016) showed that 65% of participants' social media messages were present-oriented. In sum, present orientation is found to occupy about 60–67% of both our thoughts and utterances, as assessed with three different methodologies.

Study 1 had some limitations. First, the sample included partners of cancer patients, which may have biased the situation samples toward a present or past orientation. However, only 3.4% of situation samples included conversations about cancer, and these were removed from our analyses, so we assume that any such bias is minimal. Still, it is an open question to what degree the situation samples would differ in a population that is not affected by cancer. Furthermore, most of the participants were middle-aged men. Therefore, in Study 2, we aimed to obtain more gender-balanced samples from both young and late adulthood. A second limitation was that sound files were collected over a weekend only. Although 2 days of EAR sampling have been shown to yield reliable data (e.g., Mehl and Pennebaker, 2003), it is important to show that our findings are not an artifact of insufficient sampling or of sampling only over a weekend. Therefore, in Study 2, we collected data across one weekend and two weekdays, in counterbalanced order. A third limitation was that our inter-rater reliability calculation was overly strict, which led to lower agreement between coders than expected. In Study 2, we used the same strategy for consistency across studies, but also used a less strict calculation. Finally, because our unit of analysis was the sound file, we had to exclude from the analyses all sound files with multiple temporal foci (e.g., both past- and future-oriented utterances in one sound file). In Study 2, we used the same strategy for consistency across studies, but also ran additional analyses with all sound files, without any exclusions.

### STUDY 2

In Study 2, we validated our coding scheme with two new samples from a different country. We observed healthy young and older adults in Switzerland for 4 days. Our goal was to examine whether the coding scheme used in Study 1 would lead to similar results with participants (1) from different age groups, (2) from Switzerland who speak a different language (i.e., Swiss German), and (3) who were observed for a longer period of time that also included weekdays.

We expected our finding on conversational time travel to be replicated: Past-oriented utterances should outnumber future-oriented utterances independent of age group, country of origin (and language), and EAR sampling scheme. In terms of age effects, Park et al. (2016) found that across all age groups (ages 13–48), the rank order of past, present, and future orientation remained the same: Present-oriented social media messages were the most frequent, followed by past-oriented and then future-oriented messages (the difference between past and future was very small). However, there were some differences in the relative proportion of each orientation across age. We expected to find similar results, with a retrospective bias in conversational time travel for all age groups. There are no cross-cultural studies on mental time travel, but we did not expect country of origin to have a major impact, as mental time travel is a universal human ability (Suddendorf and Corballis, 2007). In terms of sampling effects, Gardner and Ascoli (2015) tested different sampling intervals in their experience-sampling study (e.g., weekend versus weekday, early versus late in the day) and found no significant effect on the prevalence of past- versus future-oriented thoughts. Accordingly, we did not expect the sampling scheme to affect our results.

## Materials and Methods

#### Sample

Our sample of real-life situations included 9,827 sound snippets collected from 33 young adults (19–31 years, M = 23.76, SD = 3.03; 10 men, 23 women) and 13,276 sound snippets collected from 48 healthy older adults (62–83 years, M = 70.54, SD = 4.65; 22 men, 26 women). Participants were recruited via the participant pool of the Gerontopsychology Lab at the University of Zurich, via flyers in university buildings and advertisements in a local newspaper, and through snowball sampling by a research assistant. All participants lived in Switzerland and spoke Swiss German. Young participants were mostly university students, with years of education ranging from 3 to 17 (M = 12.18, SD = 2.32). Sixty-nine percent of them were single, and 31% were in a long-term relationship.

Older participants were healthy, with no record of neurological or psychiatric illness, and lived independently. Sixty percent were married (including four couples within the sample), and 40% were divorced, widowed, or single. Forty-six percent lived alone, 44% lived with one other person in the same household, and the remaining 10% lived with more than one other person. Years of education ranged from 7 to 25 (M = 10.55, SD = 3.02). An inclusion criterion for the study was a minimum score of 27 on the Mini-Mental State Examination (MMSE; Folstein et al., 1975), and all participants met this cut-off (M = 29.2, SD = 0.84). Older participants were compensated with 50 Swiss francs, whereas young participants could choose between 50 Swiss francs and research credits.

#### Procedure

Participants met the researchers for an introduction session, after which data collection with the EAR started. Data collection spanned four consecutive days. Finally, participants met with the researchers again for a feedback session.

#### **Introductory laboratory session**

Participants came to the Psychology Institute for the first session, typically held on a Wednesday or Friday afternoon; six older participants were visited at home for their convenience. Participants were given instructions on the study and were asked to sign an informed consent form and to complete questionnaires including demographic and psychological measures. All questionnaires were administered in a group setting except for the MMSE, which was administered individually. Next, participants received their assigned iPhone with its protective case and charging cable. They were asked to think of the iPhone as a "recorder," as it was set to "Airplane mode" and locked, with only the EAR application running. They were asked to carry the iPhone as much as possible during their waking hours over the next 4 days. They were told that the EAR would record 30 s of ambient sound at a time and that they would not know when the EAR was recording, so that they could continue their normal lives. They were also informed that they would have the opportunity to review and delete any sound files at the end of the study, before anyone listened to them.

### **EAR data collection**

Data collection spanned 2 weekdays and 1 weekend in counterbalanced order: 46 participants started data collection on a weekday (Thu, Fri, Sat, Sun) and 35 participants started on a weekend (Sat, Sun, Mon, Tue). Over these 4 days, participants carried the iPhone either clipped to their waistline or in their pocket. They did not have to do anything with the iPhone other than carry it and charge it every night. Participants also filled out a short diary each day, in which they reported their main activities throughout the day, indicated when they were and were not carrying the EAR, and noted whether they wanted any sound files from a certain time slot to be deleted for privacy reasons.

#### **Final laboratory session**

After 4 days of data collection, typically on a Monday or a Wednesday, participants returned to the Psychology Institute or were revisited at home. The researcher collected the iPhones, the charging cables and the diaries, and administered a second questionnaire packet. The packet included psychological measures, as well as a questionnaire in which participants evaluated their experience with carrying the iPhone (e.g., degree to which they and others were aware of the EAR, degree to which carrying the iPhone changed their behavior). While participants filled out the questionnaires, the researcher downloaded the recorded sound files onto a lab computer and checked whether there were any problematic files. As participants had the right to listen to their sound files, the researcher burned a CD that included all of their files. Participants could either review their sound files in the lab and permanently delete any files they wished to have deleted, or they could receive the CD to review at home and inform the researcher within 10 days about any deletion requests. In the young group, 9 participants deleted between 1 and 40 sound files, 87 in total. In the old group, 6 participants deleted between 2 and 25 sound files, 46 in total.

### Measures

Each participant was provided with an iPhone 4S with the EAR application installed (version 2.3.0). The app was programmed to record 30-s sound snippets every 15 min, but with 100% randomization, so that recordings were randomly distributed throughout the day (72 per day). The app was active for four consecutive days, 18 h per day, with a blackout period between midnight and 6 AM each day (72 recordings per day × 4 days = 288 recordings per participant). In total, only 2.5% of each participant's day (i.e., 36 min per day) was recorded, which kept possible intrusions into participants' private lives to a minimum. The iPhone was set to "Airplane mode" and locked with a screen-lock password, so participants could not access the EAR settings or use the phone for other purposes. Participants were instructed to charge the iPhone overnight; as a reminder, the phone calendar was programmed to beep automatically every evening at 9 PM.
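A randomized schedule with these properties can be sketched in Python. The randomization rule here is an assumption: one 30-s recording is placed uniformly at random inside each 15-min block of the 18-h active window, and the actual EAR app algorithm may differ. The derived counts (72 per day, 288 per participant, 36 min per day, 2.5% of the day) come straight from the parameters in the text.

```python
import random

BLOCK_MIN = 15      # one recording per 15-min block
RECORD_S = 30       # 30-s snippets
ACTIVE_HOURS = 18   # 06:00 to midnight; blackout from midnight to 06:00
DAYS = 4


def daily_schedule(rng: random.Random) -> list:
    """Start times (seconds after 06:00) of one day's recordings.

    Assumption: one recording placed uniformly at random inside each
    15-min block, so that the whole 30-s snippet fits within the block.
    """
    n_blocks = ACTIVE_HOURS * 60 // BLOCK_MIN  # 72 blocks
    block_s = BLOCK_MIN * 60
    return [b * block_s + rng.uniform(0, block_s - RECORD_S) for b in range(n_blocks)]


rng = random.Random(2018)  # fixed seed so the example is reproducible
day = daily_schedule(rng)
per_participant = len(day) * DAYS
minutes_per_day = len(day) * RECORD_S / 60
print(len(day), per_participant, minutes_per_day, f"{minutes_per_day / (24 * 60):.1%}")
# 72 288 36.0 2.5%
```

Randomizing within fixed blocks keeps recordings unpredictable for participants while still spreading them evenly across the day.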

### **EAR-Derived measures: coding of sound files**

Similar to Study 1, each sound file was coded for whether the participant was talking and, if so, for the temporal focus of the participant's utterances (**Table 1**). All coding categories were dichotomous (1 versus 0), indicating the presence or absence of a category. As in Study 1, we also used the TIME variable (1 = personal past, 2 = others' past, 3 = present, 4 = personal future, 5 = others' future, 6 = time-independent). In Study 2, we made this variable much more fine-grained by adding all possible combinations of temporal foci (1–2 = personal past and others' past, 1–3 = personal past and present, 1–2–3 = personal past, others' past, and present, and so on). Furthermore, we created a new DOMINANT TIME variable, which classified every sound file with more than one temporal focus according to the temporal focus that was best represented in it (**Table 2**). Because this variable gave every sound file (every unit of analysis) a single temporal focus, it allowed us to include all sound files in our analyses.
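The two analysis strategies differ only in which files enter the denominator. In the Python sketch below, the data are hypothetical, and each file's DOMINANT TIME is taken as coder input, because the text describes it as a judgment of which focus is best represented rather than a mechanical rule. Strategy (1) keeps only single-focus files, whereas strategy (2) keeps every file and classifies it by its dominant focus.

```python
from collections import Counter

# Hypothetical coded files: TIME combination plus the coder-assigned DOMINANT TIME
files = [
    {"time": (3,), "dominant": 3},
    {"time": (1,), "dominant": 1},
    {"time": (1, 3), "dominant": 1},
    {"time": (3, 4), "dominant": 3},
    {"time": (4,), "dominant": 4},
]


def shares_single_focus(files):
    """Strategy (1): proportions among files with exactly one temporal focus."""
    counts = Counter(f["time"][0] for f in files if len(f["time"]) == 1)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def shares_dominant(files):
    """Strategy (2): proportions among all files, classified by DOMINANT TIME."""
    counts = Counter(f["dominant"] for f in files)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


print(shares_single_focus(files))  # three files kept: 1, 3 and 4 each one third
print(shares_dominant(files))      # {3: 0.4, 1: 0.4, 4: 0.2}
```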

All sound files were listened to and coded by two trained coders. When we used the strict strategy of Study 1, calculating inter-rater reliability on the TIME variable, agreement was 62.12%. In this study, however, we also used a lenient strategy: We calculated inter-rater reliability separately for each temporal focus, which led to higher agreement between the coders (personal past = 90.88%, others' past = 94.77%, present = 77.43%, personal future = 94.87%, others' future = 96.92%, time-independent = 80.83%). All sound files on which the two coders disagreed were re-listened to, and the disagreements were resolved through discussion between the two coders.
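The two reliability strategies can be contrasted as simple percent agreement. The Python sketch below rests on two assumptions: each coder's output per file is a set of TIME category numbers, and the lenient strategy counts joint absence of a category as agreement. The codes themselves are hypothetical. The example shows how a single disagreement about one category lowers strict agreement on the whole file while leaving the other categories untouched.

```python
CATEGORIES = (1, 2, 3, 4, 5, 6)


def strict_agreement(coder_a, coder_b):
    """Share of files on which both coders produced the same TIME code (same set of foci)."""
    matches = sum(set(a) == set(b) for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def lenient_agreement(coder_a, coder_b):
    """Per-category share of files on which the coders agree on presence versus absence."""
    n = len(coder_a)
    return {
        cat: sum((cat in a) == (cat in b) for a, b in zip(coder_a, coder_b)) / n
        for cat in CATEGORIES
    }


# Hypothetical codes for four sound files
coder_a = [{3}, {1, 3}, {6}, {1, 3, 4}]
coder_b = [{3}, {1}, {6}, {1, 3}]
print(strict_agreement(coder_a, coder_b))   # 0.5
print(lenient_agreement(coder_a, coder_b))  # {1: 1.0, 2: 1.0, 3: 0.75, 4: 0.75, 5: 1.0, 6: 1.0}
```

Rare categories are absent from most files, so per-category agreement is naturally high for them, consistent with the pattern of lenient values reported above.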

### Results

#### Preliminary Analyses

In the young sample, a total of 2,087 sound files included participant speech (21%). We were unable to code for temporal focus in 167 sound files (8%) because of the brevity or vagueness of speech. Of the remaining 1,920 sound files, 255 included more than one temporal focus (13%), and the remaining 1,665 included only a single temporal focus. In the older sample, 2,590 sound files (21%) included participant speech. Of these, temporal focus was unidentifiable in 336 files (13%). Of the remaining 2,254 files, 315 included more than one temporal focus (14%), and the remaining 1,939 included only a single temporal focus.

### Major Analyses

The goal of Study 2 was to use our validated coding scheme to examine the prevalence of past- versus future-oriented utterances and to replicate Study 1 results (i.e., retrospective bias in conversational time travel). Analyses were conducted in two ways: (1) Similar to Study 1, with sound files that included only a single temporal focus, and (2) with all sound files that included both single and multiple temporal foci, by using the new DOMINANT TIME variable.

### **Young adults**

(1) Similar to Study 1, we first ran analyses with only the sound files that included a single temporal focus. We found nearly identical percentages for young adults' time-dependent versus time-independent, and self-related versus others-related sound files (**Figure 2**, first three rows). That is, like Study 1 participants, young Swiss adults referred to time in 82.6% of their sound files and talked about semantic memory or personal comments in 17.4%. Again as in Study 1, of the time-dependent sound files, 92% were about the self and 8% were about others. Utterances about others were further divided into others' past (79.4%) and others' future (20.6%). Sound files that included self-related utterances were further divided into past (14.6%), present (80.5%), and future (5%) categories to represent mental time travel (**Figure 2**, bottom row). This is where young adults diverged slightly from Study 1 participants: They talked about their personal past in 11% of all their sound files and about their future in about 4% (i.e., conversational time travel occurred in 14.9% of their sound files). A repeated-measures ANOVA comparing the aggregated person-level amounts of past-oriented and future-oriented utterances showed that young adults talked significantly more about their past (M = 5.58, SD = 4.21) than about their future (M = 1.85, SD = 1.58), F(2,31) = 58.16, p < 0.001, ηp² = 0.079.

Similar to Study 1 participants, young adults' utterances about others were divided into others' past (78.5%) and others' future (21.5%). Once more, present was the most frequent category (61%, M = 30.97, SD = 17.30) and was significantly more frequent than both past- and future-oriented utterances (pairwise comparisons: ts ranged from −8.71 to 10.09, ps < 0.001). As expected, the retrospective bias in conversational time travel was replicated, with the same rank order of present, past and future orientation (Park et al., 2016).

(2) Next, we calculated percentages with the new DOMINANT TIME variable, using all sound files, including those with multiple temporal foci. We found almost the same percentages as in **Figure 2** (see **Supplementary Figure 1A**). The only difference was that the percentages for conversational time travel increased slightly: personal past was the dominant temporal focus in 12.7% of the sound files, whereas personal future was the dominant temporal focus in only 5.5% (as opposed to 11.1% versus 3.8% in **Figure 2**). A repeated-measures ANOVA showed that young adults talked significantly more about their past (M = 7.36, SD = 4.66) than their future (M = 3.21, SD = 2.56), F(2,31) = 55.07, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.078. In summary, the two ways of calculating percentages led to similar results.
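The two counting approaches can be illustrated with a small Python sketch. How DOMINANT TIME was actually derived is not specified in this section; the sketch *assumes* that a file's dominant focus is its most frequently coded focus and leaves ties unresolved. The focus labels and the helper names `dominant_focus` and `focus_percentages` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def dominant_focus(codes: List[str]) -> Optional[str]:
    """Most frequently coded focus in one sound file (assumed rule, hypothetical helper).

    Returns None for empty files and for ties, which this sketch leaves unresolved.
    """
    if not codes:
        return None
    ranked = Counter(codes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def focus_percentages(files: List[List[str]], single_only: bool) -> Dict[str, float]:
    """Percentage of classifiable files per focus.

    single_only=True keeps only files with exactly one distinct focus (approach 1);
    single_only=False assigns every file its dominant focus (approach 2).
    """
    labels: List[str] = []
    for codes in files:
        if single_only:
            if len(set(codes)) == 1:
                labels.append(codes[0])
        else:
            label = dominant_focus(codes)
            if label is not None:
                labels.append(label)
    if not labels:
        return {}
    counts = Counter(labels)
    return {focus: 100 * n / len(labels) for focus, n in counts.items()}
```

With four toy files, `[["personal_past"], ["present"], ["present", "present", "personal_future"], ["personal_past", "present"]]`, approach 1 uses only the first two files (50% each), while approach 2 also counts the third file as present and drops the tied fourth file.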

In addition, we created Venn diagrams of autobiographical, time-dependent utterances to take a closer look at conversational time travel frequencies. As depicted in **Figure 3**, young adults referred to their personal past (N = 315) much more often than to their personal future (N = 139).

#### **Older adults**

(1) Similar to Study 1, we first ran analyses with only the sound files that included a single temporal focus. Older adults' percentages were very similar to those of young adults (**Figure 4**). Ten percent of older adults' sound files were about the personal past, whereas only 2.7% were about the personal future (**Figure 4**, bottom row). We could not conduct a repeated-measures ANOVA because Shapiro–Wilk tests showed that the difference scores between temporal foci (i.e., past–present, present–future, future–past) were not normally distributed, Ws = 0.89–0.95, ps < 0.05. We therefore ran non-parametric pairwise comparisons with the Wilcoxon signed-rank test and found that older adults talked significantly more about their past (Mdn = 3.00) than their future (Mdn = 1.00), V = 61.5, p < 0.001, r = −0.48.
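The statistic V reported above is the name that R's `wilcox.test` gives, for paired data, to the sum of the ranks of the positive differences. The pure-Python sketch below shows how such a V is obtained; it is an illustration, not the authors' analysis script. It drops zero differences and computes z from the normal approximation without a tie correction, simplifications that statistical software may handle differently.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_v(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (V, z) for paired samples, e.g., per-person past and future counts.

    V is the sum of ranks of positive differences x - y; zero differences are
    dropped. z uses the normal approximation without a tie correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (v - mean) / sd if sd else 0.0
    return v, z
```

For example, `wilcoxon_v([5, 3, 4, 2, 6], [1, 3, 1, 3, 2])` drops the one zero difference and returns V = 9.0. Effect sizes such as r are often derived from z as r = z/√N, although conventions for N differ across sources.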

Similar to Study 1, present (66%, M = 26.85, SD = 17.04) was significantly more frequent than both past- and future-oriented utterances, Vs = 1173–1176, ps < 0.001. As expected, the retrospective bias in conversational time travel was replicated, with the same rank order of present, past and future orientation (Park et al., 2016).

We also examined the interaction between age group (young, old) and temporal focus (past, present, future), which was nonsignificant, F(2,78) = 0.58, p = 0.56. This suggests that the rank order of present, past and future orientation was similar across the two age groups. Finally, we calculated these percentages separately for weekdays and weekends. For both age groups, the percentages were highly similar to the original percentages (see **Supplementary Table 1**). Therefore, we can conclude that the retrospective bias holds similarly on weekdays and weekends.

(2) Next, we calculated percentages with the new DOMINANT TIME variable, using all sound files, including those with multiple temporal foci. We found almost the same percentages as in **Figure 4** (see **Supplementary Figure 1B**). As with young adults, the only difference was that the percentages for conversational time travel increased slightly: personal past was the dominant temporal focus in 11.2% of the sound files, whereas personal future was the dominant temporal focus in only 4% (as opposed to 10.1% versus 2.7% in **Figure 4**). A repeated-measures ANOVA showed that older adults talked significantly more about their past (M = 5.25, SD = 5.25) than their future (M = 1.88, SD = 1.42), F(2,46) = 53.85, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.070. Thus, for both young and older adults, the two ways of calculating percentages led to highly similar results.

Finally, we created Venn diagrams of older adults' autobiographical, time-dependent utterances. As depicted in **Figure 5**, older adults referred to their personal past (N = 347) much more than their future (N = 114). In conclusion, the retrospective bias was confirmed with Swiss older adults, similar to Swiss young adults and American adults.

### DISCUSSION

This study aimed to validate our coding scheme and to replicate Study 1 results. To this end, it built on Study 1 in three ways. First, we recruited both men and women and obtained more gender-balanced samples. This helped us to validate our coding scheme with different samples and to test whether the retrospective bias observed in Study 1 (i.e., with mostly middle-aged men) would generalize to these samples. Indeed, we showed that the results were highly similar across American and Swiss adults from different age groups.

Second, we used different EAR sampling designs across the two studies and tested whether this would affect the results. Neither the duration of recordings (50 versus 30 s) nor their distribution (every 9 min versus random) influenced the results. An additional advantage of Study 2 was that we collected data across 4 days (i.e., one weekend, as in Study 1, plus 2 weekdays). We found no difference between weekends and weekdays in the prevalence of past- versus future-oriented utterances, which suggests that people are more likely to talk about their past than their future on both weekends and weekdays.
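To make the difference between the two EAR sampling designs concrete, the following sketch generates a fixed schedule (a 50-s recording every 9 min, as in Study 1) and a random schedule of 30-s recordings (as in Study 2). The window length, the number of random recordings, and the slot-based way of preventing overlaps are illustrative assumptions; this is not how the EAR software itself schedules recordings.

```python
import random
from typing import List, Tuple


def fixed_schedule(window_s: int, interval_s: int, duration_s: int) -> List[Tuple[int, int]]:
    """Study 1 style: one recording of duration_s seconds every interval_s seconds."""
    return [(start, start + duration_s)
            for start in range(0, window_s - duration_s + 1, interval_s)]


def random_schedule(window_s: int, n: int, duration_s: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Study 2 style: n recordings at random, non-overlapping start times.

    The window is cut into duration-length slots and n distinct slots are drawn,
    so recordings can never overlap (a simplifying assumption). Raises
    ValueError if n exceeds the number of available slots.
    """
    rng = random.Random(seed)
    n_slots = window_s // duration_s
    slots = sorted(rng.sample(range(n_slots), n))
    return [(s * duration_s, s * duration_s + duration_s) for s in slots]
```

For a 90-min window, `fixed_schedule(90 * 60, 9 * 60, 50)` yields 10 recordings, the first spanning (0, 50) and the last (4860, 4910), whereas `random_schedule` spreads its recordings unpredictably across the same window.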

Third, we built on Study 1 with new analyses that did not exclude sound files with multiple temporal foci. That is, we conducted analyses with (1) sound files that included only a single temporal focus, and (2) all sound files with single and multiple temporal foci (using the new DOMINANT TIME variable). The two sets of analyses revealed very similar percentages for both young and older adults. In addition, the statistical tests showed the same pattern in both sets of analyses, with past-oriented utterances being significantly more frequent than future-oriented utterances. This indicates that our findings are robust. In sum, 10.1–12.7% of young and older adults' sound files were about their personal past, whereas only 2.7–5.5% were about their personal future. The retrospective bias in conversational time travel was replicated with the same rank order of present, past and future orientation as in Study 1.

### GENERAL DISCUSSION

This work is the first to examine mental time travel as reflected in everyday conversations and to introduce the term "conversational time travel." It is also the first to examine mental time travel using a naturalistic observation method. We used the EAR to observe the overt behavior of talking, rather than focusing on private thinking as previous work has done (e.g., Gardner et al., 2012). Because much of human behavior and cognition occurs in social settings (Mehl and Pennebaker, 2003), we aimed to investigate whether the prevalence of conversational time travel differs from that of private mental time travel. Using the EAR also allowed us to avoid possible limitations of the self-report method (e.g., memory errors, response biases, participant burden; Scollon et al., 2003) and to develop an ecological, objective and standardized way of assessing conversational time travel. Furthermore, it allowed us to sample across both real-life situations and individuals, maximizing the diversity of situations to establish ecological validity.

### Validation of the Coding Scheme

The first goal of this research was to develop and validate a coding scheme for conversational time travel. We validated our scheme with a text analysis program (i.e., LIWC). First, we showed that utterances we manually coded as time-dependent included more past- and future-tense verbs than utterances coded as time-independent (i.e., semantic memory and general comments). Second, utterances that we coded as autobiographical were more self-focused, containing more first-person pronouns ("I," "we," and "us"), whereas others-related utterances included more second- and third-person pronouns. Finally, we validated our mental time travel categories: utterances coded as personal past included the highest number of past-tense verbs, utterances coded as personal future included the highest number of future-tense verbs, and present-oriented utterances included the highest number of present-tense verbs. In sum, we succeeded in validating the whole coding scheme with LIWC.
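LIWC works by counting how many words in a text fall into predefined dictionary categories and expressing each count as a percentage of all words. The sketch below mimics that logic with tiny, made-up word lists; the category names and vocabularies are hypothetical and far smaller than LIWC's actual dictionaries, which are not reproduced here.

```python
import re
from typing import Dict, Set

# Toy stand-ins for dictionary categories (NOT the real LIWC word lists).
CATEGORIES: Dict[str, Set[str]] = {
    "past_verbs": {"was", "were", "went", "had", "did", "talked"},
    "future_markers": {"will", "shall", "gonna", "tomorrow"},
    "first_person": {"i", "we", "us"},
}


def category_percentages(text: str) -> Dict[str, float]:
    """Return, per category, the percentage of word tokens that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100 * sum(word in vocab for word in words) / len(words)
        for name, vocab in CATEGORIES.items()
    }
```

For the utterance "I went home and we will talk tomorrow" (8 words), the sketch gives 12.5% past verbs, 25% future markers and 25% first-person pronouns. Comparing such percentages between groups of manually coded utterances is the kind of check described above.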

Next, we validated our coding scheme with participants from (1) different age groups (i.e., young, middle-aged, old), (2) two countries that speak different languages (i.e., English and Swiss German), and (3) different EAR sampling designs. Across these person samples, inter-rater reliability was comparable (strict strategy: 64.12% in Study 1 and 62.12% in Study 2). In Study 2, we also used a lenient strategy, which led to very high agreement between coders for conversational time travel (personal past = 90.88%, personal future = 94.87%). This suggests that our coding scheme is robust and reliable across different person and situation samples. Furthermore, we obtained highly similar results across our person samples, which indicates that our results did not vary due to inconsistencies in coding across studies.
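Percent agreement is the share of sound files on which two coders agree. The exact strict and lenient rules used in the studies are not spelled out in this section, so the sketch below assumes one plausible version of each: strict agreement requires identical sets of codes for a file, while the lenient, category-specific rule only asks whether both coders agree that a given code (e.g., personal past) is present or absent. Both function names are hypothetical.

```python
from typing import List, Set


def _check(a: List[Set[str]], b: List[Set[str]]) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("coders must rate the same, non-empty list of sound files")


def strict_agreement(a: List[Set[str]], b: List[Set[str]]) -> float:
    """Percentage of sound files for which both coders assigned exactly the same codes."""
    _check(a, b)
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def presence_agreement(a: List[Set[str]], b: List[Set[str]], code: str) -> float:
    """Percentage of files where coders agree on whether `code` is present (assumed lenient rule)."""
    _check(a, b)
    return 100 * sum((code in x) == (code in y) for x, y in zip(a, b)) / len(a)
```

Because the lenient rule ignores disagreements about other codes in the same file, it can only be equal to or higher than the strict rule, which is consistent with the higher lenient figures reported above.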

### Retrospective Bias in Conversational Time Travel

The second goal of the current research was to compare the prevalence of past- and future-oriented utterances across young, middle-aged and older adults in the United States and Switzerland. Our results first revealed that individuals mostly produced time-dependent utterances in everyday life conversations (82.5–85.3% of all sound files across all samples). Semantic information and general comments about the world occurred in only 14.7–17.5% of the recorded situations across samples (Suddendorf et al., 2009). This suggests that time mattered greatly for everyone while communicating with others. This is not surprising, as time is an inescapable aspect of our life space (Lewin, 1939) that shapes our lives, including our social interactions (Webster, 2011).

Second, we found that individuals talked in a self-referential way in most situations: 76.2–79.1% of all sound files included autobiographical utterances across person samples. More specifically, 93% of time-dependent sound files included autobiographical utterances across samples. These percentages show that the majority of participants' utterances were both time-dependent and autobiographical, indicating that people tend to talk mostly about the "self in time." In contrast, participants talked about other people in only 5.8–6.4% of the sound files across all person samples. This suggests that vicarious memories (Pillemer et al., 2015) and vicarious future-oriented utterances (e.g., Grysman et al., 2013) occur quite rarely in conversations. Bryant et al. (2013), using signal-contingent sampling, also found that individuals experienced more self-related than others-related thoughts. Future research should further investigate the significance and functions of vicarious thoughts and utterances about others.

Finally, we examined mental time travel as reflected in participants' autobiographical utterances and found that 10.1–13.6% of their sound files were about the personal past, whereas 2.7–7.2% were about the personal future. That is, individuals across samples talked about their personal past two to three times as much as about their personal future. This is in line with our expectation of a retrospective bias in the social setting of conversations, and in contrast to previous work on private thoughts: while thinking, individuals seem to focus more on their future than their past (e.g., Song and Wang, 2012). Future-oriented thinking serves directive functions such as planning, decision making, problem solving, goal intention and goal achievement (e.g., Szpunar, 2010; D'Argembeau et al., 2011; Schacter et al., 2017). For example, Barsics et al. (2016) examined the functions of emotional future-oriented thoughts and found that participants self-reported four major functions: to plan actions, form intentions (i.e., set goals), make decisions, and regulate emotions. Twenty percent of emotional future-oriented thoughts were rated as not functional, and 5% were reported to involve other kinds of functions, such as daydreaming. Cole and Berntsen (2016) showed that participants' future representations were more frequently related to their goals (i.e., current concerns) than were their autobiographical memories. Furthermore, future-oriented mind wandering has been found to be more self-related and directive than past- and present-oriented mind wandering (Baird et al., 2011; Stawarczyk et al., 2011). Taken together, these results suggest that future-oriented thoughts mainly serve directive rather than social functions, which may explain why they are less frequent and less relevant in social interactions. They are more useful when people are thinking alone, as directive functions seem to be inherently private (Kulkofsky et al., 2010; O'Rourke et al., 2017).

In contrast, past research on autobiographical memories underlines significant social functions of memories, showing that people recall their past to provide material for conversation (Hyman and Faries, 1992; Pasupathi et al., 2002), to create or enhance intimacy in relationships (Alea and Bluck, 2007), to elicit empathy for others (Bluck et al., 2013) and to teach and inform others (O'Rourke et al., 2017). For example, Demiray et al. (2017) examined how and why older adults reminisced about their past in real-life conversations. They coded participants' utterances that included reminiscence in terms of their functions and found that reminiscence served mainly social functions (i.e., conversation, teaching) and did not serve any directive functions (e.g., problem solving, death preparation). It is therefore not surprising that we found a retrospective bias in conversational time travel: social settings and cues seem to trigger the recall of autobiographical memories.

Indeed, despite the widely observed prospective bias in mind wandering, Vannucci et al. (2017) showed that external verbal cues in an experimental task changed the nature of mind wandering: task-irrelevant verbal cues shifted the temporal orientation of mind wandering toward the past. In the Verbal-cues group, 44.5% of mind-wandering episodes were categorized as memories and 18.3% as future-oriented thoughts. In contrast, in the No-cues group, 28.3% were classified as memories and 38.7% as future-oriented thoughts. Furthermore, Mazzoni et al. (2014) found that more involuntary memories were elicited when verbal rather than pictorial cues were presented, whereas there was no difference between the effects of verbal and pictorial cues on other spontaneous (non-memory) thoughts. More generally, it has been shown that external or environmental cues primarily trigger past-oriented thoughts (Berntsen and Jacobsen, 2008; Maillet and Schacter, 2016a). All of these findings suggest that spontaneous past-oriented thinking is affected by external cues (rather than internal cues, such as mood), and especially by verbal cues (Plimpton et al., 2015). This link between environmental cues and past-oriented thinking may be an important adaptive mechanism that allows individuals to relate the current situation to similar events experienced in the past, which might support adaptive behavior (Maillet and Schacter, 2016b). Conversations provide strong verbal cues, which might be one factor underlying the retrospective bias we discovered in conversational time travel. In contrast, spontaneous future thinking is mainly related to and triggered by private concerns and is less dependent on external stimuli (Klinger, 2013; Cole and Berntsen, 2016).

In sum, the retrospective bias in conversational time travel seems to be a universal phenomenon across situations and persons (e.g., Suddendorf and Corballis, 2007; Suddendorf et al., 2009), as all of our samples revealed very similar percentages. Although the samples came from different countries, age groups and research designs, all focused on their past much more than their future during conversations. Across all age groups, the retrospective bias in conversational time travel was replicated with the same rank order of present, past and future orientation (Park et al., 2016). Past work shows that the frequency of recalling the personal past does not vary by age (Webster, 1999; Pasupathi and Carstensen, 2003; Gardner and Ascoli, 2015), and our results on talking behavior are in line with this finding on thinking. Gardner and Ascoli (2015) also found that older adults thought about their future twice as often (21%) as young adults (10%). We found, however, that older adults were quite similar to younger individuals in how often they talked about the personal future. These findings contradict the socioemotional selectivity theory (Carstensen, 1992), which states that older adults have a less positive and open-ended future time perspective than young adults (Demiray and Bluck, 2014). This suggests that one's subjective and global perspective on the future may not be associated with how much one thinks or talks about the future in everyday life. Thus, future studies could examine conversational time travel via both subjective self-report and objective observation.

### Methodological Issues in Measuring the Prevalence of Mental Time Travel

Previous studies measuring the incidence of subjective thoughts have typically used experience sampling. However, event-contingent sampling (i.e., the diary method; Berntsen, 2007) has some limitations. For example, when examining involuntary autobiographical memories (memories that pop into mind spontaneously), the method requires that participants first understand what qualifies as an involuntary memory. Next, when a memory comes into awareness, the participant must retrospectively identify the experience as "memory retrieval" (note that some memories may not be sufficiently activated to pass the awareness threshold; Barzykowski and Staugaard, 2017; Vannucci et al., 2017). Then, the participant must decide that the recollection is worth reporting in the diary. All of these requirements impose a cognitive burden on participants and create the risk that many memories go undetected or ignored due to demotivation or exhaustion (Hintzman, 2011; Vannucci et al., 2014; Barsics et al., 2016). Finally, informing participants about the phenomenon of interest may bias them toward thinking more about the past or toward voluntarily monitoring their thoughts (D'Argembeau et al., 2011; Barzykowski and Niedźwieńska, 2016). Indeed, Barsics et al. (2016) showed in their diary study that participants reported having experienced more thoughts than usual because they were asked to record them. Due to these limitations, we do not think that the diary method is ideal for examining the natural frequency of past- versus future-oriented thoughts.

Signal-contingent sampling has an advantage over the diary method in that it allows random sampling of experiences and avoids expectancy effects (Scollon et al., 2003). It is considered the gold standard for assessing cognitive or behavioral processes in everyday life, since recall and heuristic biases are minimized (Shiffman et al., 2008). However, participant burden remains an issue, and assessments may be reactive. Some participants have reported that the signals interrupted their thoughts, which might have led to confusion and possible misratings in the questionnaire (Bryant et al., 2013). As with event-contingent sampling, making participants aware of study aims might affect their responses. For example, asking participants to perform a mental check at each signal on whether they had been thinking about a memory (Gardner et al., 2012) might alter their experience. Indeed, research shows that participants asked to selectively report memories did so to a greater extent than participants asked to report any type of thought (Vannucci et al., 2014; Barzykowski and Niedźwieńska, 2016; Barzykowski and Staugaard, 2017).

Therefore, automated and unobtrusive methods that do not reveal study aims and that minimize participant burden, such as the EAR, are advantageous for examining observable phenomena that do not require self-report. They maximize ecological validity, as large amounts of data can be collected with little experimenter or participant burden, and contextual influences on experience can be detected (Mehl, 2017). However, although the EAR is an ideal method for examining conversations, it cannot be used to assess thoughts. Thus, signal-contingent sampling and the EAR should be combined as two strong ecological methodologies with different advantages (Mehl et al., 2012). Such a fine-grained multi-method approach would offer a uniquely powerful way of studying thought processes in natural habitats.

### Limitations

One limitation of the current study is its sole dependence on the coding and analysis of overt speech data. A multi-method approach that also collects self-reports from participants could inform us about what is happening in participants' minds. Experience sampling (combined with the EAR) could help us understand how and why individuals engage in conversational time travel in certain situations. This combination would allow us to examine both thinking and talking within the same study and to compare how these two modalities shape mental time travel. A strength of our study, however, is that it demonstrates that meaningful information can be derived from observing real-life verbal activities. This may allow researchers to include (older) persons who would feel overly burdened by self-report or are unable to provide reliable self-reports, and who have so far been excluded from research.

One limitation of our coding scheme is that it does not differentiate between self-related and others-related time-independent utterances. This distinction was not within the scope of the current study; however, future research could enhance the coding scheme with two separate dimensions for temporal focus (e.g., past, present, future, none) and subject (e.g., self, others, none).
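One way to represent the proposed two-dimensional scheme is as a pair of independent labels per utterance. The sketch below is hypothetical (the names `Focus`, `Subject` and `UtteranceCode` are not from the study); it shows how separating the two dimensions would let self- and others-related time-independent utterances be told apart, while personal past and personal future remain recoverable as specific combinations.

```python
from dataclasses import dataclass
from enum import Enum


class Focus(Enum):
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"
    NONE = "none"  # time-independent (e.g., semantic information, general comments)


class Subject(Enum):
    SELF = "self"
    OTHERS = "others"
    NONE = "none"


@dataclass(frozen=True)
class UtteranceCode:
    """One utterance coded on two independent dimensions (hypothetical scheme)."""
    focus: Focus
    subject: Subject

    @property
    def is_conversational_time_travel(self) -> bool:
        """True for self-related past or future talk (personal past / personal future)."""
        return self.subject is Subject.SELF and self.focus in (Focus.PAST, Focus.FUTURE)
```

Under this representation, a self-related general comment would be `UtteranceCode(Focus.NONE, Subject.SELF)` and an others-related one `UtteranceCode(Focus.NONE, Subject.OTHERS)`, a distinction the current one-dimensional scheme cannot express.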

Another limitation is that we took a between-persons approach and neglected the within-person dynamics of conversational time travel. The retrospective bias in conversational time travel seems to be a universal phenomenon; however, there are individual differences in how much people talk about their past or future (Demiray et al., 2017). Future work should focus on within-person variability in mental/conversational time travel across situations and examine the impact of context on the frequency, characteristics and functions of thinking and talking about the past versus the future. For example, Study 2 did not include middle-aged adults, who are active in the workforce and may use mostly time-independent, work-related language (e.g., semantic information) on weekdays. Such contextual effects (e.g., conversation partners; Demiray et al., 2017) and the topics of conversations should be examined in future research. Finally, our older sample included four couples, whose data may not be independent of each other. However, it is highly unlikely that the same 30-s sound snippets were sampled twice, as recording times were fully randomized.

### CONCLUSION

The current research has introduced the term "conversational time travel" and examined its prevalence in everyday life. It seems that individuals, across widely varying real-life situations, talk two to three times more about their personal past than their future. This retrospective bias in conversational time travel highlights the social functions of recalling and sharing the personal past with others. Talking about past experiences seems to be an adaptive behavior that helps us to connect with others and to survive in this social world.

### ETHICS STATEMENT

Study 1 was approved by the Institutional Review Board of the University of Arizona, and Study 2 was approved by the Ethics Committee of the University of Zurich. All participants gave written informed consent in accordance with the Declaration of Helsinki.

### AUTHOR CONTRIBUTIONS

BD developed the research concept and the research design. MRM conducted Study 1 for a larger project with prior students. BD conducted Study 2, collected the data, performed the data coding and analyses, and drafted the manuscript. BD and MRM worked together on the interpretation of results and on framing the manuscript. MM provided critical revisions of the manuscript.

### FUNDING

This research was supported by a grant from the University of Zurich, Forschungskredit (Grant No. K-63213-03-01) awarded to BD and by the National Institutes of Health (Grant No. R03CA137975).

### ACKNOWLEDGMENTS

We would like to thank Andra Arnicane, Isabel Berwian, Annika Martin, Zoe Waelchli, Marianne Mischler, Mirjam Imfeld, Marion Hauert, and Elisabeth Schoch for their assistance in data collection, data cleaning, and coding.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02160/full#supplementary-material

### REFERENCES

cognition. J. Neurophysiol. 104, 322–335. doi: 10.1152/jn.00830.2009

memories. Br. J. Psychol. 109, 321–340. doi: 10.1111/bjop.12259

Eggins, S., and Slade, D. (1997). Analysing Casual Conversation. London: Cassell.

autobiographical memories? Mem. Cognit. 36, 920–932. doi: 10.3758/MC.36.5.920


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Demiray, Mehl and Martin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Detecting Temporal Cognition in Text: Comparison of Judgements by Self, Expert and Machine

Erin I. Walsh<sup>1</sup> \* and Janie Busby Grant<sup>2</sup>

<sup>1</sup> Centre for Research on Ageing, Health & Wellbeing, Australian National University, Canberra, ACT, Australia, <sup>2</sup> Centre for Applied Psychology, University of Canberra, Canberra, ACT, Australia

Background: There is a growing research focus on temporal cognition, due to its importance in memory and planning, and links with psychological wellbeing. Researchers are increasingly using diary studies, experience sampling and social media data to study temporal thought. However, it remains unclear whether such reports can be accurately interpreted for temporal orientation. In this study, temporal orientation judgements about text reports of thoughts were compared across human coding, automatic text mining, and participant self-report.

#### Edited by:

Patricia J. Brooks, College of Staten Island, United States

#### Reviewed by:

Burcu Demiray, Universität Zürich, Switzerland

Rita Obeid, Case Western Reserve University, United States

> \*Correspondence: Erin I. Walsh erin.walsh@anu.edu.au

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 20 June 2018 Accepted: 03 October 2018 Published: 26 October 2018

#### Citation:

Walsh EI and Busby Grant J (2018) Detecting Temporal Cognition in Text: Comparison of Judgements by Self, Expert and Machine. Front. Psychol. 9:2037. doi: 10.3389/fpsyg.2018.02037

Methods: 214 participants responded to randomly timed text message prompts, categorically reporting the temporal direction of their thoughts and describing the content of their thoughts, producing a corpus of 2505 brief (1–358, M = 43 characters) descriptions. Two researchers independently and blindly coded the temporal orientation of the descriptions. Four approaches to automated coding used tense to establish a temporal category for each description. Concordance between temporal orientation assessments by self-report, human coding, and automatic text mining was evaluated.

Results: Human coding more closely matched self-reported coding than automated methods. Accuracy for human (79.93% correct) and automated (57.44% correct) coding was diminished when multiple guesses at ambiguous temporal categories (ties) were allowed in coding (reduction to 74.95% correct for human, 49.05% automated).

Conclusion: Ambiguous tense poses a challenge for both human and automated coding protocols that attempt to infer temporal orientation from text describing momentary thought. While methods can be applied to minimize bias, this study demonstrates that researchers need to be wary about attributing temporal orientation to text-reported thought processes, and emphasize the importance of eliciting self-reported judgements.

Keywords: temporal cognition, Stanford Natural Language Parser, self-report, temporal orientation, tense extraction

### INTRODUCTION

Research into how we cognitively create and experience events from the past and future has become ever more popular in the last decade (e.g., Gardner et al., 2012; Schacter et al., 2012; Stawarczyk and D'Argembeau, 2015; Karapanagiotidis et al., 2017). This work highlights the central role of temporal recall and projection in building and maintaining our self-concept over time, in our capacity to appropriately defer short-term gratification for longer-term planning, and in our ability to manage the complexities of everyday functioning in society (Boyer, 2008; Miloyan et al., 2016; Schacter et al., 2017). While there have been calls for more diverse research approaches in the field that assess thought, behavior and potential interventions in real-world contexts (Oettingen, 2012; Busby Grant and Walsh, 2016; O'Neill et al., 2016), methodological limitations have often restricted when, where and how research into temporal cognition can be conducted.

The majority of studies to date examine temporal thought and associated behavior in controlled lab-based settings. These studies provide insight into the neurological processes underlying past and future thought (e.g., Karapanagiotidis et al., 2017; Thakral et al., 2017), distinctions and relationships between cognitive factors (e.g., Abram et al., 2014; Cole et al., 2016), and the effect of future thought on behavior (e.g., Snider et al., 2016; O'Donnell et al., 2017). While lab-based methodologies provide gold-standard demonstrations of causal effects, they can lack external validity, particularly when they are attempting to demonstrate the efficacy of an intervention on behavior (e.g., Daniel et al., 2015). In contrast, experience sampling allows assessment of thoughts and behaviors in real-world context by prompting participants to report experiences at random intervals during their day. This approach has been demonstrated to provide a scalable, real-world method of assessing temporal thought (Killingsworth and Gilbert, 2010; Song and Wang, 2012; Busby Grant and Walsh, 2016). Diary studies similarly allow participants to report thoughts as experienced in real-world context (Berntsen and Jacobsen, 2008; Finnbogadóttir and Berntsen, 2013), by capturing either spontaneous thoughts, or those responding to cues provided by the researcher (e.g., Gardner et al., 2012). However, these approaches of necessity involve interruption to daily behavior, and can be affected by differential reporting and (in the case of diary studies) retrospective bias.

A different methodology rapidly gaining traction in fields similarly seeking to assess and evaluate human experience is the use of "big data," in part from social media (Abbasi et al., 2014; Moller et al., 2017; Oscar et al., 2017). This use of existing datasets (e.g., Twitter, Facebook, query logs in Google and Wikipedia, purchasing behavior) rather than active recruitment and data collection has substantial advantages. As well as the sheer size of the data set that can be retrieved, the data has real-world validity because participants are spontaneously recording their own thoughts independent of research context. While there are a number of other challenges around interpretation of this data (e.g., generalisability, differential recording), this approach represents a valuable potential addition to the methodological arsenal which is currently underutilized by psychologists (Oscar et al., 2017).

One of the key challenges for researchers seeking to assess temporal thought using large data sets, such as those created by social media, is the extraction of meaning from relatively short text entries. It is difficult to reliably determine temporal orientation (whether someone is thinking about the past, present or future) from a text statement, particularly in English. Take the statements: "In 2019, I will have remembered this example," and "I am thinking about making dinner at my parents' house"; in each of these cases, without the speaker's own insight to give context, it is not straightforward to identify the temporal orientation. For accurate analysis and interpretation, researchers need to be confident that they can reliably infer factors like temporal orientation from a statement; and to take advantage of large data sets, the analysis needs to be fast as well as accurate, which typically means automated tools rather than manual coding (Cole-Lewis et al., 2015). To date, analysis of Twitter data on human behavior has focused on sentiment analysis, that is, detecting whether a given tweet is positive, negative or neutral relative to a concept, event or product (Oscar et al., 2017; Rosenthal et al., 2017). Numerous machine sentiment classification tools exist, although they differ substantially in their accuracy (Abbasi et al., 2014). To the authors' knowledge, only Jatowt et al. (2015) and Park et al. (2017) have specifically investigated the temporal orientation of short social media posts (tweets and Facebook statuses, respectively). Jatowt et al. (2015) used the time and date entry identification capacity of the Stanford Natural Language Parser (SNLP) to automatically extract explicit mentions of time (e.g., "tomorrow," "next month," "December"). While this is a highly useful starting point that gives insight into the distance in time between the mention (e.g., "last week") and the topic (e.g., "holiday"), this approach is only applicable when explicit mentions of time are present, which is often not the case in natural language, where tense and informational context are the sole cues to orientation. Park et al. (2017) extended this by also including the frequency of words in a temporally oriented linguistic enquiry dictionary, but analysis remained constrained to post hoc (researcher vs. automated) coding.
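To make the limitation of explicit-mention extraction concrete, the following is a minimal Python sketch. It assumes a small, hypothetical regular-expression lexicon (`EXPLICIT_TIME_PATTERNS`) and is not SUTime or the CoreNLP tagger used by Jatowt et al. (2015); it only shows that a statement with no explicit time expression leaves nothing to anchor on.

```python
import re

# Hypothetical, deliberately small lexicon of explicit time expressions.
# This is NOT SUTime/CoreNLP; it only illustrates anchoring on explicit mentions.
# "may" is left out of the month list because it is usually a modal verb.
EXPLICIT_TIME_PATTERNS = [
    r"\b(?:yesterday|today|tonight|tomorrow)\b",
    r"\b(?:last|next|this)\s+(?:week|weekend|month|year)\b",
    r"\b(?:january|february|march|april|june|july|august|september|october|november|december)\b",
    r"\b(?:19|20)\d{2}\b",  # four-digit years such as 2019
]
TIME_REGEX = re.compile("|".join(EXPLICIT_TIME_PATTERNS), re.IGNORECASE)


def explicit_time_mentions(text: str) -> list[str]:
    """Return the explicit time expressions in `text`, in order of appearance."""
    return [match.group(0) for match in TIME_REGEX.finditer(text)]


if __name__ == "__main__":
    for statement in [
        "In 2019, I will have remembered this example",
        "Going on holiday next month",
        "I am thinking about making dinner at my parents' house",
    ]:
        print(statement, "->", explicit_time_mentions(statement))
```

The third statement yields an empty list: when tense and context are the only cues to orientation, an explicit-mention approach has nothing to work with.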

The current study is designed to inform researchers seeking to code temporal orientation from existing text data sets, in order to leverage the possibilities of large-scale social media corpora for temporal cognition research. This will be achieved by exploring the accuracy of human and automated post hoc temporal orientation extraction from real-world short English-language text strings, of the kind found in experience sampling research and on social media microblogging platforms such as Twitter. Careful manipulation of the coding protocol (e.g., allowing single or multiple concurrent possible orientations) and comparison of post hoc coding with the participant's own self-report, rather than potentially inaccurate researcher coding, will provide a useful foundation for setting expectations of accuracy in future research.

### METHODS

Detailed methods for data collection can be found in Busby Grant and Walsh (2016). Briefly, 214 undergraduate students, aged 17–55 (M = 21, SD = 7), participated in return for course credit. The sample was 70% female. All participants provided written, informed consent. The ethical aspects of this study were approved by the University of Canberra's Human Research Ethics Committee (protocol 12–134). Participants received 20 text message prompts across 2 days, randomly timed between 8 am and 8 pm (with some variation of this window on participant request). The high-quality random schedules for each participant were generated a priori using the program "Psrta". The text messages prompted participants to report the temporal category of their thoughts at the moment the prompt arrived ("What were you thinking about in the seconds before you received the SMS alert?" with options of past/future/present/other), and provide open-ended information about the content of their thoughts ("Please give more information about what you were thinking about in the seconds before you received the SMS alert").

Participants responded to an average of 14 of the 20 prompts (min = 1, max = 20, SD = 6). From an initial corpus of 2884 responses, 379 had either tied (multiple self-selected orientations, despite instructions to produce a statement including only one) or missing self-reported orientation, so were excluded. This resulted in a final corpus of 2505 brief (between 1 and 358 characters, M = 43) unique descriptions of momentary temporal thought, from 192 individuals aged 17–52 (M = 21.85, SD = 6.52), 70% female.

The temporal orientation of each description was extracted in seven ways, the first being self-report (**Table 1**). This was followed by post hoc human coding by two independent researchers, and by automated methods of increasing complexity using the Stanford Natural Language Parser (SNLP). SNLP coding was undertaken in R version 3.2.0 using the coreNLP package (v 3.3.3) (Manning et al., 2014). Further detail regarding SNLP implementation can be found in **Table 1**, with full R code available in the **Supplementary Materials**. Both researcher and SNLP coding were blind to the self-report orientation.

TABLE 1 | Temporal extraction methods, in the context of the example phrase "In 2019, I will have remembered this example."

For self-report, only one temporal orientation was allowed per description. However, ambiguity in post hoc coding can arise from multiple candidate orientations for a single statement. Hence, we also allowed "ties," circumstances where either a human or automated coder could specify multiple orientations in an attempt to capture the correct one. These circumstances were coded as "mixed." Researcher and/or automated coding was considered "correct" when its orientation matched self-report. This is reported as a percentage across the full corpus of 2505 responses. With four possible orientations, chance performance was 25%.
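The accuracy measure described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' R code (which is in the Supplementary Materials); in particular, it assumes that a tied ("mixed") code never counts as matching a single self-reported orientation. The function names `collapse_code` and `percent_correct` are hypothetical.

```python
from collections.abc import Sequence

ORIENTATIONS = ("past", "present", "future", "other")
# Chance accuracy when guessing uniformly among the four categories.
CHANCE_PERCENT = 100 / len(ORIENTATIONS)


def collapse_code(candidates: set[str]) -> str:
    """Reduce a coder's candidate orientations to one label.

    A single candidate is kept as-is; several candidates (a tie) become "mixed".
    """
    unknown = candidates - set(ORIENTATIONS)
    if not candidates or unknown:
        raise ValueError(f"invalid candidate set: {candidates!r}")
    return next(iter(candidates)) if len(candidates) == 1 else "mixed"


def percent_correct(self_reports: Sequence[str], coder_candidates: Sequence[set[str]]) -> float:
    """Percentage of descriptions whose post hoc code matches the self-report."""
    if not self_reports or len(self_reports) != len(coder_candidates):
        raise ValueError("need equal-length, non-empty self-reports and codes")
    labels = [collapse_code(c) for c in coder_candidates]
    # "mixed" never equals a single self-reported orientation, so ties count as misses.
    hits = sum(report == label for report, label in zip(self_reports, labels))
    return 100 * hits / len(self_reports)


if __name__ == "__main__":
    reports = ["present", "future", "past", "present"]
    codes = [{"present"}, {"present", "future"}, {"past"}, {"other"}]
    print(percent_correct(reports, codes), CHANCE_PERCENT)  # 50.0 25.0
```

The example inputs are invented for illustration and are not study data.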

### RESULTS

Results are summarized in **Figure 1**. Text responses were coded into the four temporal categories described above: past, present, future, and other. Self-reported orientations indicated that the majority (58.78%) of thoughts were oriented to the present. Approximately equal proportions were future- or past-oriented (19.56% and 19.03%, respectively), with very few (2.64%) self-categorized as "other" (self-reports in the "other" category were general status reports, such as "sleeping" and "drunk").

All methods except SUTime (method 7, see **Table 1**) performed above chance (> 25% correct). Overall, researcher coding more closely matched self-reported coding than automated methods did. When multiple temporal categories per response (ties) were allowed, both researcher and automated methods diverged notably from self-report. Where ties were not allowed, Researcher A (method 2) performed best, with 79.93% correct. Next best was the SNLP model using both POS-tagged word stems and explicit anchors (method 5), with 57.44% correct. This method notably over-estimated present orientation, particularly at the expense of future orientation. Where ties were allowed, Researcher B (method 3) also outperformed automated methods, with 74.93% correct and < 1% coded as ties. There was a slight improvement from the naïve to the anchored SNLP model (48.77% to 49.03%), though both models notably over-estimated both "other" and "present" orientations, at the expense of "future."

### DISCUSSION

This study highlights the importance of self-report judgements in evaluating accuracy of temporal orientation classification coding systems. The findings demonstrate that, using self-reported orientation as a gold standard, researchers were more accurate than automated systems based on natural language parsers in determining temporal orientation of short text strings. However, the best-performing researcher coding still resulted in around a 20% error rate in temporal orientation classification.

Almost every method (in particular the automated methods) overestimated present orientation and underestimated future orientation. This may be because, in English, present tense can be used to indicate non-present events, and future tense shares similar sentence constructions (Langacker, 2001). For example, "I am thinking about having dinner" could refer to a thought or process coincident with the time of writing (the act of eating dinner) or to a future event (a dinner yet to be had). Notably, a recent study similarly extracting temporal orientation from social media text also found a very high degree of present orientation (65% of statements were present-oriented in Park et al., 2017). Together with the current results, this suggests that present focus is genuinely the most common orientation of temporal thought, so the overestimation seen here may simply reflect the higher base rate of present relative to future thoughts.

Unexpectedly, attempts to account for bias due to multiple conflicting temporal orientation cues by allowing ties in both human and automated coding led to poorer performance. Too few tied responses were recorded (< 1%) to determine why human coding performance declined in this method. Broadly, this may relate to a phenomenon long recognized in the visual psychophysics and cognitive discrimination literature: a forced-choice paradigm is peculiarly stable and accurate, possibly because it reduces anchoring effects that scale with the number of potential alternative choices (Blackwell, 1952). For automated coding, "ties" were broken by temporal precedence (the first cue in the text response was taken as the correct cue). The discrepancy here is therefore most likely due to the "true" temporal cue appearing later in the sentence. Further expansion of the current approach to use the SNLP's part-of-speech functionality, as in Park et al. (2017), may ameliorate this.
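A minimal Python sketch of tie-breaking by temporal precedence follows. The cue lexicon (`CUES`) and the function name are hypothetical placeholders rather than the SNLP-derived cues used in the study; the sketch only shows how taking the earliest cue can select the wrong orientation when the decisive cue comes later in the sentence.

```python
import re
from typing import Optional

# Hypothetical cue lexicon, for illustration only.
CUES = {
    "past": ["yesterday", "was", "did", "remembered"],
    "present": ["now", "am", "is", "currently"],
    "future": ["tomorrow", "will", "going to", "next"],
}


def first_cue_orientation(text: str) -> Optional[str]:
    """Break ties by temporal precedence: the orientation of the earliest cue wins."""
    earliest: Optional[tuple[int, str]] = None  # (character offset, orientation)
    for orientation, cues in CUES.items():
        for cue in cues:
            match = re.search(rf"\b{re.escape(cue)}\b", text, re.IGNORECASE)
            if match and (earliest is None or match.start() < earliest[0]):
                earliest = (match.start(), orientation)
    return earliest[1] if earliest else None


if __name__ == "__main__":
    # "am" precedes "will"/"tomorrow", so precedence picks "present".
    print(first_cue_orientation("I am thinking about what I will cook tomorrow"))
    # With the future cue first, the same rule picks "future".
    print(first_cue_orientation("Tomorrow I will cook dinner"))
```

A grammatical rather than positional rule (for example, preferring the cue attached to the main verb of the embedded clause) would be one way to address the first case, in line with the part-of-speech approach mentioned above.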

There are a number of implications for researchers seeking to use large data sets to infer and interpret temporal cognition in situ. In these cases, self-report of key features such as temporal orientation is generally not available, and researcher coding, while being the most accurate available, is costly and time-consuming and by no means error free. Automated coding of temporal orientation would clearly be the most efficient means of categorizing large text data sets, but the current research highlights the need for further work on appropriate algorithms, using self-report (rather than error-prone researcher coding) as comparison.

This study provides insights into the accuracy of temporal coding of text by using a triad of self-report, machine and researcher assessments. It used a substantial corpus of data that closely mirrors the type of data available in big data sets such as social media. However, the sample had limited generalisability (primarily female undergraduate students), and there is considerable scope for extension to apply substantially more complex algorithms than the SNLP tools applied here. There is the possibility of using automated and researcher coding in concert, given strong historical evidence that a combination of human and automated information processing (human-in-the-loop augmented intelligence) can outperform either alone (Zheng et al., 2017). Further, this paradigm allowed only a single self-reported orientation per description, which may not reflect real-life complexity where multiple orientations are encapsulated within a single chain of thought.

Because the focus of this paper was triangulation of self-report against post hoc coding methods, one of its limitations is the comparatively unsophisticated automated coding methods. Future research could reduce the gap between human and automated methods through approaches such as machine learning, or by refining rules to better reflect English structure (e.g., using grammatical rather than temporal precedence to break ties, as was done in Park et al., 2017). Such endeavors are underway, particularly in the sphere of orientation extraction from social media text (e.g., Park et al., 2017). However, as our results indicate, reducing the gap between human and automated post hoc coding is an important but limited endeavor, as there is also a gap between contemporaneous self-report and post hoc researcher coding.

This study explored the accuracy of human and automated post hoc temporal orientation extraction, in the context of real-world, experience-sampled English-language data. Despite recent advances in natural language parsing, researchers need to be wary of any post hoc attribution of temporal orientation to text-reported thought processes, whether human or automated. Our findings indicate that future evaluations of the efficacy of automated and machine learning algorithms should use participants' own judgements, rather than researcher judgements, as the benchmark, and they emphasize the importance of eliciting self-reported judgements of temporal thought wherever possible.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Australian National University Human Research Ethics Committee with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Australian National University Human Research Ethics Committee (protocol 2012/402).

### AUTHOR CONTRIBUTIONS

EW contributed to the design of the study, conducted all the statistical analyses, and managed all aspects of the manuscript preparation and submission. JG contributed to the design of the study, provided methodological input and theoretical expertise, advised on statistical analyses, and contributed to writing and editing of the manuscript.

### REFERENCES


### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02037/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Walsh and Busby Grant. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Adults' Performance in an Episodic-Like Memory Task: The Role of Experience

#### Gema Martin-Ordas<sup>1,2</sup>\* and Cristina M. Atance<sup>2</sup>

*<sup>1</sup> Division of Psychology, University of Stirling, Stirling, United Kingdom, <sup>2</sup> School of Psychology, University of Ottawa, Ottawa, ON, Canada*

Episodic memory is the ability to consciously recollect personal past events. This type of memory has been tested in non-human animals by using depletion paradigms that assess whether they can remember the "what," "where," and "when" (i.e., how long ago) of a past event. An important limitation of these behavioral paradigms is that they do not clearly identify the cognitive mechanisms (e.g., episodic memory, semantic memory) that underlie task success. Testing adult humans in a depletion paradigm will help to shed light on this issue. In two experiments, we presented university undergraduates with a depletion paradigm that involved choosing one of two food snacks (a preferred but perishable food and a less preferred but non-perishable food) after either a short or a long interval. In Experiment 1, participants were asked to *imagine* the time between hiding the food items and choosing one of them, whereas in Experiment 2 participants *experienced* the time elapsed between hiding the food items and choosing one of them. In addition, in Experiment 2 participants were presented with two trials, which allowed us to investigate the role of previous experience in depletion paradigms. Results across both experiments showed that participants chose the preferred and perishable food (popsicle) after the short interval but did not choose the less preferred and non-perishable food (raisins) after the long interval. Crucially, in Experiment 2, experiencing the melted popsicle in Trial 1 improved participants' performance in Trial 2. We discuss our results in the context of how previous experience affects performance in depletion tasks. We also argue that variations in performance on "episodic-like memory" tasks may be due to different definitions and assessment criteria of the "when" component.

Keywords: episodic memory, episodic-like memory, temporal information, adults, depletion paradigms

### INTRODUCTION

Episodic memory is a form of declarative memory that allows people to recall personally experienced events (Tulving, 1983). Importantly, episodic recollection is entwined with a particular phenomenological experience that allows a person to mentally travel back in time to re-experience a past episode (so-called autonoetic awareness; Tulving, 1983) and to be aware of ". . . the temporal dimension of their own and others' existence . . .", an awareness referred to as chronosthesia (Tulving, 2002, p. 313).

#### Edited by:

*Danielle DeNigris, Fairleigh Dickinson University, United States*

#### Reviewed by:

*Rui Cao, Indiana University Bloomington, United States*

*Caroline L. Horton, Bishop Grosseteste University, United Kingdom*

#### \*Correspondence:

*Gema Martin-Ordas gema.martin-ordas@stir.ac.uk*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *15 August 2018* Accepted: *14 December 2018* Published: *21 January 2019*

#### Citation:

*Martin-Ordas G and Atance CM (2019) Adults' Performance in an Episodic-Like Memory Task: The Role of Experience. Front. Psychol. 9:2688. doi: 10.3389/fpsyg.2018.02688*

A wide range of language-based paradigms (e.g., word lists, mental imagery tasks, navigation tasks, autobiographical memory questionnaires) have been used to investigate episodic memory in human adults (e.g., Tulving, 1972; Williams and Broadbent, 1986; Hassabis et al., 2007; Mullally et al., 2012). In most of these paradigms, participants are asked to describe the content of a memory and the subjective experience (i.e., type of awareness) associated with remembering this content (e.g., Levine et al., 2002; Buckner and Carroll, 2007).

There is no doubt that participants in such tasks are retrieving episodic memories. Nonetheless, researchers have little control over how participants have formed these memories or how often these memories have been retrieved (Pause et al., 2013). Recent studies have also shown a lack of inter-task relations, thus calling into question the extent to which these different measures tap the same type of memory (e.g., Cheke and Clayton, 2013, 2015). In addition, relying exclusively on language-based tasks poses important challenges for testing episodic memory in non-verbal populations (e.g., pre-verbal children, non-human animals), and thus precludes making important comparisons across development and across species.

In order to overcome some of these limitations, there has recently been increasing interest in developing non-language-based tasks grounded in the behavioral components of episodic memory. These tasks usually take the form of assessing the ability to remember what happened, where, and when (Tulving, 1972), and have been adapted from a depletion paradigm that was first developed for use with birds (Clayton and Dickinson, 1998). In their study, Clayton and Dickinson had scrub-jays (*Aphelocoma californica*) cache two types of food in different locations: preferred but perishable wax worms, and less-preferred but non-perishable peanuts. Importantly, the scrub-jays could recover the food after either a short or a long retention interval. At recovery, scrub-jays searched for worms after a short time had passed since caching, but switched to peanuts after a long time had elapsed since caching. Thus, birds successfully recalled the type of food they had cached (i.e., "what"), its location (i.e., "where"), and how long ago (i.e., "when") they had cached it (Clayton and Dickinson, 1998). Because the paradigm did not directly assess the phenomenological components of the scrub-jays' memories, the authors concluded that scrub-jays had "episodic-like memories."

Recent studies have shown that in "episodic-like" memory paradigms human adults also recall what, where, and when something happened (e.g., Pause et al., 2010; Plancher et al., 2010; Holland and Smulders, 2011; Easton et al., 2012; Cheke and Clayton, 2013; Mazurek et al., 2015; Craig et al., 2016). In these studies, participants are usually asked to recall, for example, in which room (e.g., Holland and Smulders, 2011; Craig et al., 2016) or quadrant of a computer screen (Pause et al., 2010) (i.e., "where") and in which order (i.e., "when") coins (e.g., Holland and Smulders, 2011; Craig et al., 2016) or visual stimuli (Pause et al., 2010) (i.e., "what") were hidden or seen before. Crucially, adults' successful performance in these tasks has been interpreted as evidence that the what-where-when paradigms rely on episodic memory. However, Martin-Ordas et al. (2017) have suggested that there are at least two important differences between the studies with humans and Clayton and Dickinson's (1998) depletion paradigm: (1) the definition of the "when" component, and (2) the behavioral criteria used to assess episodic memory.

In the studies with humans conducted thus far, "when" is defined as the "order" of events (henceforth "what-where-in which order" paradigms), whereas in the studies with the scrub-jays, it is defined as "how long ago" an event took place (henceforth "what-where-how long ago" paradigm). This difference in the definition of the "when" component is particularly relevant because it has been argued that the "how long ago" component does not necessarily test chronosthesia, which, as mentioned earlier, Tulving (2002) defined as a critical feature of episodic memory (e.g., McCormack, 2001; Roberts et al., 2008). For example, Roberts et al. (2008) suggested that in a what-where-how long ago paradigm "instead of remembering when an event happened within a framework of past time, animals are keeping track of how much time has elapsed since caching or encountering a particular food item at a particular place and are using elapsed time to indicate return to or avoidance of that location" (p. 113). Thus, even if successful performance in what-where-in which order tasks relies on episodic memory, the same might not be true for successful performance in the what-where-how long ago task.

As for the behavioral criterion used to assess episodic memory, Clayton and Dickinson (1998) measured scrub-jays' correct choices (i.e., choosing worms after the short retention interval, and peanuts after the long retention interval). In contrast, humans' episodic memories are usually measured by their verbal responses to the "what" (e.g., coins), "where" (e.g., in which room), and "when/in which order" (e.g., order in which the coins were hidden) questions (although see Pause et al., 2010, 2013, for exceptions). Thus, these studies included no measure of whether participants use duration to make choices (e.g., choose the preferred food after a short interval and the less preferred food after a long interval), even though this is the crucial measure in the episodic-like memory paradigms used with non-human animals.

In order to address these two issues, Martin-Ordas et al. (2017) developed a what-where-how long ago depletion paradigm for children in which correct choices as well as responses to "what," "where," and "how long ago" questions were assessed. In two trials, 3-, 4-, and 5-year-olds were presented with a preferred food (i.e., popsicle) that was only edible after a short interval, and a less preferred food (i.e., raisins) that was edible after both short and long intervals. To make a successful choice, children had to remember what food item was hidden where as well as how much time had elapsed between the hiding of the two food items. Results showed that children chose their preferred food after the short intervals but, strikingly, did not select their less-preferred food after the long intervals. Consistent with previous findings, however, age-related changes in children's ability to remember "what" was hidden "where" were found. Nonetheless, children struggled to estimate the duration of the trials, which may explain why they failed to make the correct critical choice in the depletion paradigm. However, a more controversial interpretation of Martin-Ordas et al.'s (2017) findings is that what-where-how long ago depletion paradigms do not necessarily rely on episodic memory.

One way to address this issue is to test human adults in the what-where-how long ago task previously used with preschool children, because adults not only have episodic memories but also have less difficulty estimating temporal durations. In two experiments, we presented adult participants with a depletion task that involved choosing a food snack after either 3 min or 1 h. In Experiment 1, participants were asked to imagine the time between the hiding of the food items (a preferred but perishable grape popsicle and a less-preferred but non-perishable box of raisins) and choosing one of them. Successful performance would depend on participants' memory for what and where as well as on their ability to integrate temporal information into their decision-making process. In Experiment 2, participants experienced the time elapsed between the hiding of the food items and choosing one of them. Thus, Experiment 2 allowed us to assess whether adults would remember and incorporate temporal information to guide their choices in what-where-how long ago tasks. Participants' success would support the claim that depletion paradigms assess episodic thinking.

### EXPERIMENT 1: QUESTIONNAIRE VERSION

We developed a questionnaire version of Martin-Ordas et al.'s (2017) procedure with children. On a screen in a lecture theater, participants were shown the setup used by Martin-Ordas et al. (2017) (i.e., an image of a table with three opaque boxes) and images of two snacks: a preferred but perishable grape popsicle and a less-preferred but non-perishable box of raisins. Participants were asked to imagine that the two snacks were hidden under two of the three boxes. Next, each participant was provided with a questionnaire in which they were asked to imagine choosing one of the three containers after either 3 min or 1 h. Correct responses (i.e., choosing the popsicle after 3 min and the raisins after 1 h) would indicate that adults are able to integrate "what," "where," and "when" information (i.e., the hypothetical temporal distance between hiding the snacks and having to choose a container).

### Methods

#### Participants

An opportunistic sample of 84 university undergraduates was tested; 23 were excluded due to food preference (e.g., they did not like raisins, or they liked raisins more than popsicles), resulting in a final sample of 61 (46 females; 15 males). Participants were predominantly White, and all were fluent in English. Participants were informed that participation was voluntary and that they could leave the lecture theater if they did not want to participate in the study.

#### Materials and Procedure

On a projector screen, we presented images of a popsicle, a box of raisins, and a platform with three opaque cardboard boxes on top of it. Each participant was provided with a two-page questionnaire containing (1) a food preference test (page 1) and (2) a critical choice question and memory-check questions (page 2).


### Scoring and Analyses

#### **Critical choice question**

If participants selected the correct box, they received a score of 1, whereas if they selected an incorrect box, they received a score of 0. As in the depletion paradigms, choosing the box that contained the popsicle in the 3-min trials was scored as "correct" because it is the preferred food and is still edible, whereas choosing the empty box or the box containing the raisins was scored as "incorrect." In contrast, in the 1-h trials, choosing the box containing the raisins was considered "correct," and choosing the empty box or the box containing the popsicle (which would have melted and thus no longer be edible) was considered "incorrect."
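The scoring rule for the critical choice question can be expressed as a small Python function. The function name and the string labels (`"3-min"`, `"1-h"`, `"popsicle"`, `"raisins"`, and `None` for the empty box) are hypothetical encodings chosen for this sketch.

```python
from typing import Optional

# Box contents that count as correct for each retention interval:
# the popsicle is still edible after 3 min but has melted after 1 h.
CORRECT_CONTENTS = {"3-min": "popsicle", "1-h": "raisins"}


def score_critical_choice(interval: str, chosen_contents: Optional[str]) -> int:
    """Return 1 for a correct choice and 0 otherwise.

    `chosen_contents` is what the selected box held: "popsicle", "raisins",
    or None for the empty box (which is always scored 0).
    """
    if interval not in CORRECT_CONTENTS:
        raise ValueError(f"unknown interval: {interval!r}")
    return int(chosen_contents == CORRECT_CONTENTS[interval])
```

For example, `score_critical_choice("1-h", "popsicle")` scores 0, because the popsicle would have melted.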

#### **Memory-check questions (i.e., "Do you remember where the popsicle is? Do you remember where the raisins are?")**

Participants received a score of 1 if they answered that the box on the right contained the popsicle, and that the box on the left contained raisins. Any other response was scored as 0.

#### **Analyses**

We used Pearson chi-square tests to analyze performance in the critical choice question. We used binomial tests to assess whether participants were above chance in the critical choice question and memory check questions (chance = 33%). All statistical tests were exact two-tailed, and results were considered significant if p < 0.05.
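The comparison against chance can be illustrated with a minimal, standard-library Python sketch of an exact two-tailed binomial test with a chance level of 1/3 (one correct box out of three). It uses the common convention, also used by R's `binom.test`, of summing the probabilities of all outcomes no more likely than the observed one; the function names are our own, and the software the authors used is not stated here.

```python
from math import comb


def binom_pmf(i: int, n: int, p: float) -> float:
    """Probability of exactly i successes in n Bernoulli(p) trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_test_two_sided(k: int, n: int, p: float = 1 / 3) -> float:
    """Exact two-tailed binomial test of k successes in n trials against chance p.

    Sums the probabilities of every outcome that is no more likely than the
    observed one; a small relative tolerance guards against floating-point ties.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-7
    p_value = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * tolerance
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Sanity check with a fair coin: 10 successes out of 10 gives 2/1024.
    print(binom_test_two_sided(10, 10, p=0.5))  # 0.001953125
```

Calling `binom_test_two_sided(k, n)` with the number of correct participants `k` out of `n` uses the 1/3 chance level by default. The chi-square comparisons between trial types are not covered by this sketch.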

### Results

#### Critical Choice Question

Participants performed significantly better in the 3-min trial than in the 1-h trial (χ<sup>2</sup> = 15.74, df = 1, p < 0.001). Binomial tests indicated that participants chose the box containing the popsicle significantly above chance in the 3-min trial (p < 0.001) but did not choose the box containing the raisins significantly above chance in the 1-h trial (p = 0.87). In fact, 76% of the adult participants chose the box containing the popsicle after 1 h (p < 0.001) (see **Figure 1**).

#### Memory-Check Questions ("What Is Where")

Participants' responses to the "what is where" question were significantly above chance in both the 3-min (binomial test: p < 0.001; 97% of the participants answered this question correctly) and 1-h trials (binomial test: p < 0.001; 91% of the participants answered this question correctly), and did not differ as a function of trial type (χ<sup>2</sup> = 1.42, df = 1, p = 0.285). In other words, participants' memory of where the popsicle and raisins were hidden was not the limiting factor in their performance.

### Discussion

We developed a questionnaire version, for use with adults, of the what-where-how long ago paradigm previously used with non-human animals and preschool children. Strikingly, participants chose the preferred and perishable food (i.e., popsicle) after both 3 min and 1 h. Participants' responses to the memory-check questions revealed that failure to remember what was hidden where cannot explain our results. One could argue that participants chose their preferred food after 1 h because they were unable to integrate the temporal information with their knowledge about the perishability of the food items. However, it is also possible that temporal information was not salient enough in the current task. This is because participants were provided with the duration of the trials in the critical choice question, but did not actually experience the time between the hiding of the food items and choosing a container. This is an important difference between our method and previous studies using this paradigm. Another possibility is that participants were not sufficiently motivated by the food "rewards" (note that, unlike the non-human animals and children in previous studies, our participants were not presented with real rewards but, rather, with photographs of them).

To control for these alternative explanations, in Experiment 2 participants were presented with the same procedure that Martin-Ordas et al. (2017) developed for use with children. In this what-where-how long ago task, participants experienced the time between the hiding of two real food rewards and choosing one of the containers. As in Experiment 1, we predicted that if this task draws on episodic memory, participants would choose their preferred food snack after 3-min and their less-preferred food after 1-h.

### EXPERIMENT 2: LAB VERSION

Following Martin-Ordas et al. (2017), we presented adults with two trials in which they watched an experimenter (E) hide two snacks (a preferred but perishable grape popsicle, and a less-preferred but non-perishable box of raisins) in two of three locations on a platform. After a 3-min or 1-h retention interval (RI), participants were asked to choose one of the three locations (i.e., the critical choice question) and to answer a series of memory questions about "what" we hid, "where," and "how long ago" we hid it (i.e., memory-check questions). Importantly, after 3-min the popsicle was still edible, whereas after 1-h it was not (i.e., it had melted).

Crucially, the current paradigm also allowed us to investigate both participants' correct choices (the equivalent of scrub jays choosing worms or peanuts in Clayton and Dickinson, 1998) and their recollection of the behavioral components of episodic memory (similar to the measures assessed in previous studies with humans). In addition, presenting participants with two trials allowed us to assess how they respond to an "unexpected" question about a past event, that is, under what has been termed "incidental encoding" (Zentall et al., 2001, 2008). We explored this issue by analyzing participants' responses in Trial 1, when they were unaware of what the task would involve, and in Trial 2, when they knew what the task would entail. We included this manipulation because it has been argued that a feature of episodic memory is that recollection can occur when encoding is incidental and memory assessment is unexpected (Zentall et al., 2001, 2008). Importantly, recent studies have shown that manipulating the level of intentionality during the encoding phase (intentional vs. incidental encoding) affects recollection of "what," "where," and "in which order" something happened (e.g., Holland and Smulders, 2011; Craig et al., 2016). Finally, we were also interested in the relation between the different measures of episodic memory used in the present study: correct choices in the depletion paradigm and recollection of the behavioral components of episodic memory. A positive relation would support the claim that both measures rely on the same type of memory (i.e., episodic memory).

We hypothesized that if participants remember what, where, and how long ago in an integrated manner (e.g., Clayton and Dickinson, 1998; Clayton et al., 2003), they would choose the popsicle (preferred food) after 3-min and the raisins (less-preferred food) after 1-h. Since intentionality at encoding has been shown to affect recollection (e.g., Holland and Smulders, 2011), we expected participants to perform better in the second trial than in the first, both in terms of correct choices and responses to the memory-check questions. In particular, we predicted that participants who received the 1-h RI in Trial 1 (i.e., who experienced the melted popsicle) would perform better on the 1-h RI in Trial 2 than those who received the 3-min RI in Trial 1. Finally, if our measures (i.e., correct choices and responses to the memory-check questions) tap the same type of memory (i.e., episodic memory), then scores on these measures should be positively correlated. Although previous studies have investigated the relation between what-where-in which order, free recall, and source memory tasks (e.g., Cheke and Clayton, 2013), our study is the first to investigate the relation between the responses used in episodic-like memory tasks in animals and those used in episodic-like memory tasks in humans.

### Methods

### Participants

Thirty-five university undergraduates were recruited; 11 were excluded because of their food preferences (e.g., they did not like raisins, or they liked raisins more than popsicles) or failure to attend both sessions, resulting in a final sample of 24 (15 females, 9 males). Participants were predominantly White and middle class, and all were fluent in English. The research was approved by the Office of Research Ethics and Integrity at the University of Ottawa. Participants provided written informed consent.

### Materials and Procedure

We used exactly the same materials and procedure as Martin-Ordas et al. (2017). There were three cardboard boxes (∼12 cm wide × 19 cm long × 8.8 cm high each) and a wooden platform (91 cm long × 75.5 cm wide) in which three holes (5 cm diameter) had been drilled and then covered with plastic netting (see **Figure 2**). The netting allowed liquid from the melting popsicle to pass through and collect in a cup hidden under the platform. The experiment took place in two rooms: Room 1, where the hiding event took place, and Room 2, where participants waited for either 3-min or 1-h, depending on the type of trial.

Participants received two trials separated by five to seven days, and each trial consisted of five main events: (1) food preference test, (2) hiding event, (3) critical choice question, (4) memory-check questions, and (5) "how long ago" question.

1. Food preference test. E and participant sat facing each other. E placed a box of raisins (4.6 cm long × 3.4 cm wide × 1.7 cm high) and a popsicle (3 cm long × 2.5 cm wide × 1.5 cm high) on two small dishes and asked participants "Which one of these two snacks do you like best: popsicles or raisins?" Note that at this point participants did not receive either food item. Next, E proceeded with the hiding event.

2. Hiding event. E placed the three cardboard boxes on the platform. For each of the two snacks E said: "Look what I have here! I am going to put it here." E then placed the popsicle under one of the three boxes and the raisins under another; the third box remained empty. Hiding locations and box locations were counterbalanced within and across participants. The rationale for having an empty box was to control for participants remembering which boxes had food under them. Notably, participants never chose the empty box in either Trial 1 or Trial 2.

There were two types of trials, defined by the length of time (RI) that elapsed between hiding the food items and allowing participants to choose one of the boxes (i.e., the critical choice): 3-min and 1-h. On the 3-min trials, both the popsicle and the raisins were available (i.e., edible), whereas on the 1-h trials the popsicle had melted and only the raisins were edible. Fifty percent of the participants received the 3-min trial first, followed by either a 3-min or a 1-h trial. The other 50% received the 1-h trial first, followed by either a 3-min or a 1-h trial. Thus, the combination of trial type and order of presentation yielded four experimental conditions: 1-h then 1-h; 1-h then 3-min; 3-min then 3-min; and 3-min then 1-h. Participants were randomly assigned to the conditions. During the RIs, participants went to Room 2 and engaged in unrelated activities (e.g., reading). Importantly, before leaving Room 1, E clearly stated, "the door is going to be locked so no one can go inside the room while we are not there."

3. Critical choice question. After 3-min or 1-h, E and the participant returned to Room 1 and E asked the critical choice question: "Now you can have what is inside one of these boxes. Which one are you going to choose?" Our critical choice question is analogous to scrub jays being allowed to retrieve a particular food (e.g., peanuts or wax worms) after a predetermined RI. In the 1-h trials, once the box had been uncovered and participants had answered the memory-check and "how long ago" questions, E asked participants, "What happened to the popsicle?" All participants stated that the popsicle had melted, confirming that they understood the melting process.

4. Memory-check questions. E asked three memory-check questions to assess whether participants remembered "what" ("Do you remember what I put under the boxes?"), "where" ("Do you remember which boxes have something under them?"), and "what is where" ("Do you remember where the popsicle is? Do you remember where the raisins are?"). These questions are similar to those used to measure episodic memory in previous studies with adults. Half of the participants were asked the critical choice question first and the memory-check questions second; for the other half, this order was reversed. In both cases, participants were shown the content of their chosen box only after they had decided which location/box to uncover and had answered the memory-check questions.

5. How long ago question. We always asked this question at the end of the trial, and worded it as follows: "Do you remember when we were in the other room (i.e., Room 2)? Did it feel like the time that it takes to brush your teeth, or like the time that it takes to make dinner and then eat it with your family?" Similar to the experiment with the children (Martin-Ordas et al., 2017), E showed participants two pictures while presenting these two different options; one depicted a person brushing her teeth, and the other depicted a woman cooking with her family and then having dinner. To provide participants with a graphic representation of the duration of the actions, two lines were drawn under each of the two pictures: a short line for "brushing teeth," and a longer line for "making and eating dinner." The rationale behind the "how long ago" question was to assess whether incorrect responses on the critical choice question (e.g., choosing the popsicle after a 1-h RI) were due to difficulties estimating the amount of time/duration of the RIs.
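The counterbalanced design described under the hiding event (two retention intervals crossed with trial order) can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the condition labels, the function name `assign_conditions`, and the assumption of equal cell sizes are all hypothetical, since the text states only that participants were randomly assigned to the four conditions.

```python
import itertools
import random

# Hypothetical labels for the two retention intervals (RIs).
RETENTION_INTERVALS = ("3-min", "1-h")

# Crossing the Trial 1 RI with the Trial 2 RI yields the four conditions:
# (3-min, 3-min), (3-min, 1-h), (1-h, 3-min), (1-h, 1-h).
CONDITIONS = list(itertools.product(RETENTION_INTERVALS, repeat=2))


def assign_conditions(participant_ids, seed=None):
    """Randomly assign each participant to one (Trial 1 RI, Trial 2 RI) pair.

    Assumption (not from the source): cells are equal in size, so the number
    of participants must be a multiple of four. Equal cells also guarantee
    that half of the sample receives each RI first.
    """
    n_conditions = len(CONDITIONS)
    if len(participant_ids) % n_conditions != 0:
        raise ValueError("number of participants must be a multiple of 4")
    per_cell = len(participant_ids) // n_conditions
    # One slot per participant, then shuffle the slots to randomize assignment.
    slots = [cond for cond in CONDITIONS for _ in range(per_cell)]
    random.Random(seed).shuffle(slots)
    return dict(zip(participant_ids, slots))


if __name__ == "__main__":
    ids = [f"P{i:02d}" for i in range(1, 25)]  # 24 hypothetical participant IDs
    for pid, (first, second) in assign_conditions(ids, seed=1).items():
        print(f"{pid}: Trial 1 = {first}, Trial 2 = {second}")
```

Shuffling a pre-balanced list of slots, rather than drawing a condition independently for each participant, is one way to keep the four cells equal; an unbalanced draw would also count as random assignment.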

#### Scoring and Analyses

Trials were video-recorded and participants' choices were scored as a function of which box they pointed to first (correct box = 1; incorrect box = 0).

#### **Critical choice question**

As in previous studies using the depletion paradigm, choosing the box hiding the popsicle in the 3-min trials was scored as "correct" because the popsicle is the preferred food and is still edible, whereas choosing the empty box or the box hiding the raisins was scored as "incorrect." In contrast, in the 1-h trials, choosing the box hiding the raisins was scored as "correct," and choosing the empty box or the box hiding the popsicle (which had melted and was no longer edible) was scored as "incorrect."
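The scoring rule for the critical choice question amounts to looking up the correct target for each retention interval. The Python sketch below is illustrative only; the string labels and the function name `score_critical_choice` are hypothetical.

```python
# Hypothetical codes for what is hidden under each box.
POPSICLE, RAISINS, EMPTY = "popsicle", "raisins", "empty"

# Correct target per retention interval: the preferred popsicle while it is
# still edible (3 min); the non-perishable raisins once it has melted (1 h).
CORRECT_TARGET = {"3-min": POPSICLE, "1-h": RAISINS}


def score_critical_choice(retention_interval, first_box_contents):
    """Score the first box pointed to: 1 = correct, 0 = incorrect.

    Choosing the empty box is always incorrect, as is choosing the food
    that is not the target for this retention interval.
    """
    if retention_interval not in CORRECT_TARGET:
        raise ValueError(f"unknown retention interval: {retention_interval!r}")
    if first_box_contents not in (POPSICLE, RAISINS, EMPTY):
        raise ValueError(f"unknown box contents: {first_box_contents!r}")
    return int(first_box_contents == CORRECT_TARGET[retention_interval])


if __name__ == "__main__":
    for ri in ("3-min", "1-h"):
        for contents in (POPSICLE, RAISINS, EMPTY):
            print(ri, contents, score_critical_choice(ri, contents))
```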

#### **Memory-check questions ("what," "where," and "what is where")**

Participants received a score of 1 for the "what" question (i.e., "Do you remember what I put under the boxes?") if they responded with both "popsicle" and "raisins." Any other response was scored as 0. Participants received a score of 1 for the "where" question (i.e., "Do you remember which boxes have something under them?") if they pointed at the two boxes that contained the food items. Any other response was scored as 0. For the binding of "what is where" (i.e., "Do you remember where the popsicle is? Do you remember where the raisins are?"), participants received a score of 1 if they pointed at the box containing the popsicle, and at the box containing the raisins. Any other response was scored as 0.

#### **"How long ago" question**

For the "how long ago" question (i.e., "Do you remember when we were in the other room? Did it feel like the time that it takes to brush your teeth, or like the time that it takes to make dinner and then eat it with your family?"), participants received a score of 1 if they answered "brushing teeth" after the 3-min trial, and "making and eating dinner" after the 1-h trial.
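The memory-check and "how long ago" scoring rules can be sketched in Python as below. The sketch assumes boxes are identified by hypothetical labels (e.g., "A", "B", "C") and that verbal answers have already been coded into the strings shown; treating the "what" question as correct only when exactly the two snacks are named is our reading of "any other response was scored as 0."

```python
# Boxes are identified by hypothetical labels such as "A", "B", "C";
# verbal answers are assumed to be already coded as the strings below.
FOODS = {"popsicle", "raisins"}


def score_what(named_items):
    """1 if the participant named both snacks (and nothing else), else 0."""
    return int(set(named_items) == FOODS)


def score_where(pointed_boxes, food_boxes):
    """1 if the participant pointed at exactly the two boxes hiding food."""
    return int(set(pointed_boxes) == set(food_boxes))


def score_what_is_where(pointed_popsicle, pointed_raisins, popsicle_box, raisins_box):
    """1 if both the popsicle and the raisins were located correctly."""
    return int(pointed_popsicle == popsicle_box and pointed_raisins == raisins_box)


# Correct "how long ago" answer for each retention interval.
EXPECTED_DURATION = {"3-min": "brushing teeth", "1-h": "making and eating dinner"}


def score_how_long_ago(retention_interval, answer):
    """1 if the chosen picture matches the retention interval, else 0."""
    return int(answer == EXPECTED_DURATION[retention_interval])
```

For instance, with the popsicle under box "A" and the raisins under box "C", a participant who points at "C" and then "A" for the "where" question, and at "A" for the popsicle and "C" for the raisins, scores 1 on both questions.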

#### **Analyses**

We used Pearson chi-square tests to analyze performance on the critical choice question in Trial 1, and performance on the critical choice question in Trial 2 as a function of which type of trial participants received first. We used binomial tests to assess whether participants performed above chance on the critical choice question and the memory-check questions (chance = 33%), and on the "how long ago" question (chance = 50%). Because participants' performance was at ceiling on the memory-check and "how long ago" questions, correlations between the different measures could not be calculated. Instead, we used a Friedman test to analyze whether there were differences between the measures. To do so, proportion scores (i.e., participants' overall success across both trials) were created for the three variables. All statistical tests were exact and two-tailed, and results were considered significant if p < 0.05.
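The binomial tests against chance (1/3 for a choice among three boxes, 1/2 for the two "how long ago" options) can be computed exactly from the binomial distribution. The sketch below assumes the common "minimum-likelihood" definition of a two-sided p-value (the convention used by R's `binom.test` and SciPy's `scipy.stats.binomtest`); the text does not say which convention the authors' software used, and other definitions can give slightly different values. The function names and the counts in the usage lines are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test, 'minimum-likelihood' convention.

    The p-value is the total probability of all outcomes that are no more
    likely than the observed count k under the chance level p.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    # Small relative tolerance so that outcomes exactly as likely as k are
    # not excluded because of floating-point rounding.
    threshold = binom_pmf(k, n, p) * (1 + 1e-7)
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1)) if q <= threshold)
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: 20 of 24 correct on a three-box choice (chance = 1/3)
    # and 20 of 24 correct on the two-option "how long ago" question (chance = 1/2).
    print(binom_test_two_sided(20, 24, 1 / 3))
    print(binom_test_two_sided(20, 24, 1 / 2))
```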

### Results

#### Critical Choice Question

#### **Performance in Trial 1**

Participants performed significantly better in the 3-min trial than in the 1-h trial (χ<sup>2</sup> = 10.66, df = 1, p = 0.001). Binomial tests indicated that participants chose the box hiding the popsicle significantly above chance in the 3-min trial (p < 0.001) but failed to choose the box hiding the raisins significantly above chance in the 1-h trial (p = 0.37). In fact, 83% of the adult participants chose the box hiding the popsicle after 1-h (p < 0.001).

#### **Performance in Trial 2 as a function of Trial 1**

To investigate the effect of previous experience, we analyzed performance in Trial 2 as a function of the trial participants received first. Performance in the second 1-h trial was superior for those participants who received the 1-h trial first as compared to those who received the 3-min trial first (χ<sup>2</sup> = 5.33, df = 1, p = 0.021). However, performance in the second 3-min trial was not affected by whether participants received a 3-min or 1-h RI in Trial 1 (χ<sup>2</sup> = 0.444, df = 1, p = 0.505). Together, these results show that participants' choices after the 1-h RI in Trial 2 were significantly affected by which trial they received first (see **Figure 3**).
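Each of these comparisons can be framed as a Pearson chi-square test on a 2 × 2 table (type of Trial 1 × correct/incorrect in Trial 2). The sketch below computes the statistic without a continuity correction, together with the asymptotic df = 1 p-value, using the identity that a chi-square variable with one degree of freedom is a squared standard normal. It is illustrative only: the function name and any counts passed to it are hypothetical, and because the Analyses section describes the tests as exact, an asymptotic p-value (or an alternative such as Fisher's exact test) may not match the reported values when cells are this small.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Table layout:   [[a, b],    e.g., row 1 = received the 1-h RI first,
                     [c, d]]          row 2 = received the 3-min RI first,
                                      columns = correct / incorrect in Trial 2.
    Returns (chi2, p), with df = 1 and an asymptotic p-value.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With df = 1, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p
```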

Further analyses revealed that adults who received the 1-h RI in Trial 2 performed significantly above chance when they had received the 1-h RI in Trial 1 (binomial test, p = 0.017) but not when they had received the 3-min RI first (binomial test, p = 0.35). Participants who received the 3-min RI in Trial 2 performed significantly above chance when they had received the 3-min RI in Trial 1 (binomial test, p = 0.017), but not when they had received the 1-h RI in Trial 1 (binomial test, p = 0.097).

#### Memory-Check Questions ("What," "Where," "What Is Where")

#### **Performance in Trial 1**

Participants' memory for "what," "where," and "what is where" did not differ as a function of trial length. In fact, all participants correctly responded to these questions in the 3-min and 1-h trials.

#### **Performance in Trial 2**

As in Trial 1, participants' performance on the memory-check questions did not differ between the 3-min RI and the 1-h RI. As before, all participants responded to the three questions correctly.

#### How Long Ago Question

#### **Performance in Trial 1**

Ninety-six percent of participants correctly estimated the duration of the 1-h RI and 100% did so for the duration of the 3-min RI.

#### **Performance in Trial 2**

All participants correctly estimated the duration of the trial for both the 3-min and 1-h RIs.

#### Relation Between the Critical Choice, Memory-Check, and "How Long Ago" Questions

Because participants' performance was at ceiling on the memory-check and "how long ago" questions, correlations could not be calculated. Instead, we analyzed whether there were differences between overall performance on the critical choice question (i.e., combined score on Trials 1 and 2), overall performance on the "what-where-how long ago" questions (i.e., combined score for these three questions on Trials 1 and 2), and overall performance on the binding question "what is where" (i.e., combined score on Trials 1 and 2). A Friedman test of differences between these three overall scores was significant (χ<sup>2</sup> = 51.21, n = 24, p < 0.001). Post-hoc Wilcoxon tests showed that participants performed worse on the critical choice question than on the "what-where-how long ago" questions (Z = −3.879, n = 17, p < 0.001) and the "what is where" question (Z = −4.001, n = 18, p < 0.001).
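For reference, the Friedman statistic in its textbook form is shown below for n participants, each scored on k measures (here n = 24 and k = 3); R_j is the sum, across participants, of the within-participant rank of measure j, with tied scores receiving average ranks. This is the standard definition, not a reproduction of the authors' computation. Because ceiling effects produce many tied scores in these data, statistical software usually divides the statistic by the tie correction in the second line, where t_{ig} is the size of the g-th group of tied scores for participant i. The result is referred to a chi-square distribution with k − 1 = 2 degrees of freedom.

$$
\chi^2_F = \frac{12}{n k (k+1)} \sum_{j=1}^{k} R_j^2 - 3n(k+1)
$$

$$
\chi^2_{F,\text{corrected}} = \frac{\chi^2_F}{1 - \dfrac{\sum_{i=1}^{n} \sum_{g} \left(t_{ig}^3 - t_{ig}\right)}{n\,(k^3 - k)}}
$$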

### Discussion

We adapted the what-where-how long ago paradigm previously used with non-human animals and preschool children for use with adults. In Trial 1, participants chose the preferred and perishable food (i.e., the popsicle) after the short RI but did not choose the less-preferred, non-perishable food (i.e., the raisins) after the long RI. However, experiencing the melted popsicle in Trial 1 improved participants' performance in Trial 2. We also assessed recollection of "what," "where," "what is where," and "how long ago" and found that adults' performance was at ceiling on these measures in both trials. Finally, we analyzed whether our measures differed in difficulty and found that participants performed significantly worse on the critical choice question than on the "what-where-how long ago" and "what is where" questions.

### GENERAL DISCUSSION

In two experiments we presented adults with a what-where-how long ago task. Strikingly, participants struggled to adapt their food choices to the length of the trial. This was the case irrespective of whether the time elapsed between the hiding of the food rewards and choosing one of the containers was imagined (Experiment 1) or actually experienced (Experiment 2). Our results show that memory for the contents of the boxes cannot account for participants' failures on the critical choice questions. Rather, Experiment 2 highlights the role that previous experience might play in depletion paradigms.

### Critical Choice Questions

Participants' performance on the critical choice questions in Experiment 1 and Trial 1 of Experiment 2 was rather unexpected. Although they chose their preferred food after the short RI, they did not correctly choose their less-preferred food after the long RI. One could argue that participants may not have been motivated by the food rewards because they were neither particularly hungry nor thirsty. Yet our observations of participants' reactions in Experiment 2 suggested quite the opposite: participants expressed disappointment upon seeing that the popsicle had melted. Moreover, when they successfully obtained a reward (either the popsicle or the raisins), participants consumed it immediately after the experimenter gave it to them. As such, we do not think that lack of motivation can account for our findings.

We can also rule out the possibility that participants lacked "semantic" knowledge about melting, given that adults understand the transformation of certain substances over time (e.g., ice melts). This understanding was also confirmed by their responses to the "What happened to the popsicle?" question in Experiment 2 (all adults stated that it had melted). Importantly, in Experiment 2 we found quite a different pattern of results on the critical choice question in the second 1-h trial. Specifically, in the 1-h/1-h condition, adults' performance on the critical choice question in Trial 2 significantly improved. These findings suggest that participants correctly chose their less-preferred food (i.e., the raisins) only when they had previously experienced the melted popsicle.

What mechanisms can account for participants' improvement on the critical choice question in the second 1-h trial of Experiment 2? One possibility is that participants who experienced the melted popsicle in Trial 1 avoided choosing the popsicle in Trial 2, regardless of the duration of the trial. This seems unlikely, though, given that 67% of the participants who experienced the melted popsicle (i.e., the 1-h RI) in Trial 1 still chose the popsicle in the 3-min RI in Trial 2. It also seems unlikely that this improvement was due to a change in participants' preferences in the second trial, because participants who received the 3-min trial first chose the popsicle in the second trial independently of its duration.

More plausibly, participants' experience in Trial 1 shifted their attention in Trial 2 to the relation (i.e., binding) between the elements of the problem (Clayton et al., 2003). To make a correct choice, participants not only had to remember the contextual information (i.e., "what," "where," "what is where") and the temporal information (i.e., "how long ago"), but also had to integrate them (i.e., "how long ago a particular food item was placed where"). Indeed, whereas participants' responses to the critical choice question in Trial 1 could be explained by their simply choosing their preferred food, independently of the duration of the trial, integrating the temporal information with the contextual information can conceivably explain their performance in Trial 2. In this sense, our results do not differ from those reported with scrub-jays (e.g., Clayton and Dickinson, 1998). In Clayton and Dickinson's experiment, scrub-jays experienced four pre-training trials in which they had the opportunity to learn that worms degrade and become inedible after a long time has passed between caching and recovery. Thus, it is conceivable that becoming aware of how the passage of time affects the edibility of the food items is crucial for success in a what-where-how long ago task, for both humans and non-human animals. An interesting direction for future studies would be to address this issue by directly telling participants how long it takes a popsicle to melt. If awareness of the temporal information facilitates performance in depletion paradigms, participants should succeed in this version of the task. Relatedly, it is also important to show that participants' performance generalizes to other kinds of "depletion" paradigms that do not use food as stimuli. For example, one could imagine a task with an electronic device (e.g., an iPad) that plays a preferred game/show but whose battery runs out quickly vs. a device that plays a less-preferred game/show but has a longer-lasting battery. If participants do indeed have difficulty using duration information in their decision-making (as we have argued), they should fail to choose the longer-lasting device/less-preferred game after the long delay, just as participants in our experiments failed to choose the less-preferred raisins. This pattern of results would suggest that our findings are not specific to one particular domain of reasoning, such as "food."

### Memory-Check and How Long Ago Questions

Consistent with results from previous studies (e.g., Plancher et al., 2010; Holland and Smulders, 2011; Craig et al., 2016), adults in our study accurately remembered "what," "where," and "what was hidden where" and, in Experiment 2, also correctly estimated the length of both Trials 1 and 2 (a novel feature of our study). Consequently, failure to recall this contextual information cannot account for participants' poor performance in the first 1-h trial. Rather, as mentioned earlier, adults' difficulty appeared to be rooted in an inability to use duration information precisely. Indeed, although adults accurately judged trial duration, they did not appear to integrate this information to conclude that, after 1-h, the popsicle would have melted. Thus, using duration information when deciding which box to choose appears to be key to success in the current what-where-how long ago task.

### Incidental Encoding and the Role of Previous Experience

In Experiment 2, participants' improved performance in Trial 2 is also consistent with arguments that the memory processes involved in a first encounter with an event (a "trial," in the context of our study) differ from those involved in subsequent encounters (Zentall et al., 2001, 2008; Plancher et al., 2010). Most notably, Zentall et al. (2001, 2008) argued that deliberate encoding (e.g., use of training phases) helps organisms develop expectations of future rewards, and that the development of such expectations favors storing this information as semantic rather than episodic memories. In the context of our tasks, this suggests that when participants do not know what they are going to be asked (i.e., Experiment 1 and Trial 1 of Experiment 2), episodic memory is not sufficient to succeed on the critical choice question; however, when they do know (i.e., Trial 2), participants might integrate the spatio-temporal information with non-episodic information (e.g., semantic facts) to make the correct choice (e.g., "choose the non-perishable food after 1-h").

These results are consistent not only with adults' performance in previous what-where-in which order tasks (Holland and Smulders, 2011; Plancher et al., 2012; Craig et al., 2016) but also with preschoolers' performance in the what-where-how long ago task (Martin-Ordas et al., 2017). Specifically, children's successful estimation of "how long ago" the hiding event took place was related to successful performance on the critical choice question. Crucially, this relation held only for Trial 2, that is, once children knew what the task entailed. Martin-Ordas et al. (2017) argued that children might not spontaneously incorporate the duration of the trial into their decisions. This is consistent with the results of Experiment 2: once adults had experienced the melted popsicle, they were able to take the duration of the trial into account when making their choices.

### Comparisons Between Our Measures

Experiment 2 also allowed us to investigate the degree of relation between the different measures used in depletion paradigms. Adults' performance on the critical choice question differed from their performance on the what-where-how long ago questions and on the "what is where" question. Participants' better performance on the memory-check questions than on the critical choice question also suggests that these measures might tap different memory systems. Indeed, previous studies addressing the relation between different measures of episodic memory in adults have also reported such differences (e.g., Plancher et al., 2010; Easton et al., 2012; Cheke and Clayton, 2013; Pause et al., 2013). For example, Easton et al. (2012) found that whereas performance in a "what-where-in which context" task required recollection of the past event (i.e., episodic memory), performance in a "what-where-in which order" task did not. This finding led the authors to conclude that tasks that rely on temporal information might be susceptible to non-episodic strategies.

Although methodological differences could account for differences in results across studies, our task highlights the need for better consensus about the "when" component that is measured in episodic memory tests. Because this component has been tested in a variety of ways in both the human and animal cognition literatures, it is difficult to compare performance on this measure across studies. For example, whereas time of day (Roberts et al., 2008) and duration (e.g., Clayton and Dickinson, 1998) have been the main temporal markers used in animal research, order of events (e.g., Cheke and Clayton, 2013; Mazurek et al., 2015; Craig et al., 2016) has been the main temporal marker used in previous "episodic-like" memory tasks with adult humans. As the current and previous studies suggest, these different temporal markers might require the involvement of different memory systems. An important goal for future research and theorizing is therefore to operationalize the temporal component of episodic memory more consistently across studies. This is especially important when trying to validate methodologies previously used in the animal literature.


### CONCLUSION

Our episodic-like memory depletion paradigms showed that adult humans successfully took the retention interval into account when deciding whether to choose a non-perishable or a perishable food, but only after having experienced the event once before (i.e., the 1-h RI in Trial 1 of Experiment 2). Consistent with previous findings, our results also showed that participants successfully remembered episodic components of an event (e.g., "what," "where," "what is where") and, a new aspect of our task (Experiment 2), "how long ago" a particular event happened. These findings therefore suggest that recalling what-where-how long ago and deciding which food item to choose might rely on different memory systems.

### AUTHOR CONTRIBUTIONS

GM-O and CA designed the experiments. GM-O collected the data. GM-O and CA analyzed the data and wrote the manuscript.

### FUNDING

This work was supported by the Government of Ontario and by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada to CA.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Martin-Ordas and Atance. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Prognostic Value of Motor Timing in Treatment Outcome in Patients With Alcohol- and/or Cocaine Use Disorder in a Rehabilitation Program

#### Susanne Yvette Young<sup>1</sup> \*, Martin Kidd<sup>2</sup> , Jacques J. M. van Hoof <sup>3</sup> and Soraya Seedat <sup>1</sup>

<sup>1</sup> Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa, <sup>2</sup> Centre for Statistical Consultation, Statistics and Actuarial Sciences, Stellenbosch University, Stellenbosch, South Africa, <sup>3</sup> Department of Psychiatry, Radboud University Medical Centre, Nijmegen, Netherlands

#### Edited by:

Patricia J. Brooks, College of Staten Island, United States

#### Reviewed by:

Marc Wittmann, Institut für Grenzgebiete der Psychologie und Psychohygiene (IGPP), Germany

Yavor Yalachkov, Universitätsklinikum Frankfurt, Germany

> \*Correspondence: Susanne Yvette Young susanneyyoung@gmail.com; 16073371@sun.ac.za

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 18 April 2018 Accepted: 20 September 2018 Published: 22 October 2018

#### Citation:

Young SY, Kidd M, van Hoof JJM and Seedat S (2018) Prognostic Value of Motor Timing in Treatment Outcome in Patients With Alcohol- and/or Cocaine Use Disorder in a Rehabilitation Program. Front. Psychol. 9:1945. doi: 10.3389/fpsyg.2018.01945

Introduction: Individuals with Substance Use Disorder (SUD) often have cognitive deficits in multiple domains, including motor timing deficits, with recovery times of up to 1 year. Cognitive deficits influence treatment outcomes and abstinence. To our knowledge, timing deficits have not been investigated with regard to treatment outcome and relapse.

Methods: This prospective study tested the prognostic value of motor timing in SUD with regard to treatment outcome. The study sample consisted of 74 abstinent in-patients diagnosed with alcohol and/or cocaine dependence. All were attending a private treatment programme for drug/alcohol dependence at the Momentum Mental Healthcare clinic in Somerset West, South Africa. Participants were tested at three points:

(i) within 72 hours of the start of the treatment programme;
(ii) after completion of the 8-week treatment programme (measure of treatment response);
(iii) at 12 months, through a telephone interview (measure of relapse).

The first two visits comprised self-report questionnaires and experimental motor task testing.

Results: Motor timing alone predicted 27 percent of the variance in alcohol self-efficacy change scores and 25 percent of the variance in cocaine self-efficacy change scores at treatment completion. Specifically, three baseline measures from a spatial tapping task predicted change in alcohol self-efficacy: spatial errors, synchronization errors and inter-response interval errors. Cocaine self-efficacy was predicted by spatial errors and contact times on a spatial tapping task at very high tempi (300 ms) only. The high rate of dropout at 12 months post-treatment did not allow further analysis of the prognostic value of motor timing for relapse.

Conclusions: The results of this investigation indicate that motor timing holds prognostic value with regard to treatment outcomes. Motor timing predictors of relapse require further investigation.

Keywords: motor timing, prognostic value, temporal cognition, movement, substance use disorder, cocaine, alcoholism


## INTRODUCTION

Alcohol and cocaine are amongst the most widely abused substances (The Global Drug Survey 2015 Findings, 2015). Chronic exposure to substances leads to structural and functional brain disturbances (Moselhy et al., 2001; Oscar-Berman and Marinkovic, 2003; Scheurich, 2005; Verdejo-García et al., 2007; Volkow et al., 2010; Bühler and Mann, 2011). These disturbances underlie the cognitive decline and behavioral changes found in Substance Use Disorder (SUD) (Miller, 1991; Bates et al., 2002; Goldstein and Volkow, 2011). Recent studies on the neurocognitive effects of long-term substance use show that dysfunctions occur across a wide array of cognitive domains rather than as specific impairments (Spronk et al., 2013; Stavro et al., 2013). One such domain is motor timing (Wittmann et al., 2007), defined as the ability to organize movement according to temporal structures. Wittmann et al. (2007) conducted one of the few studies to date to examine motor timing in stimulant-dependent individuals while controlling for possible confounds. They found that motor timing deficits are present in this population: the stimulant-dependent group showed abnormal performance on all timing tasks except sensorimotor synchronization.

The direct influence of these functional deficits on the recovery and sobriety of individuals with SUD remains unclear (Bates et al., 2002). Long-lasting changes in brain regions have been shown to contribute to relapse, which can occur weeks, months, and even years after substance use (Welberg, 2011). There are few methods to measure the success of SUD treatment.

Self-efficacy is defined as an individual's confidence in his/her ability to abstain from certain adverse behaviors, such as substance use (Bandura, 1994). It is considered an important indicator in the management of SUDs, and of treatment outcome more specifically (Maisto et al., 2000; Burleson and Kaminer, 2005; Ilgen et al., 2005; Dolan et al., 2008; Kadden and Litt, 2011). Self-efficacy is also seen as an important factor in predicting health-related behavior, the successful application of coping mechanisms (Tate et al., 2008), and changes in unwanted behavior (Sheeran et al., 2016).

Studies have shown that increased self-efficacy is related to:

- the ability to suppress habitual responses;
- a higher level of wellbeing;
- the ability to achieve complete abstinence after treatment;
- the application of healthier coping mechanisms;
- increased participation in aftercare;
- decreased use of alcohol and other substances after treatment.

Self-efficacy also predicts the duration of abstinence (Vielva and Iraurgi, 2001; McKay et al., 2003; Warren et al., 2007; Tate et al., 2008). Higher self-efficacy at treatment admission, at discharge, and 1 month after treatment has been found to be a strong predictor of prolonged abstinence (Coon et al., 1998; Ilgen et al., 2005; Dolan et al., 2008; Kadden and Litt, 2011).

Amongst the more objective measures of treatment outcome are blood or urine tests. However, not every treatment setting allows such measures to be used meaningfully, so compromises are required to achieve the most valid outcome possible. In an inpatient treatment programme, criteria such as abstinence and retention are fulfilled by most, if not all, inpatients. They are therefore not necessarily an indication of treatment success or a guarantee of abstinence. In this setting, a more subjective measure, such as self-reported belief in one's ability to abstain, is an acceptable alternative.

In sum, individuals with SUD often have cognitive deficits in multiple domains, with recovery times of up to 1 year (Spronk et al., 2013; Stavro et al., 2013). These deficits influence treatment outcomes and abstinence (Pitel et al., 2007; Fox et al., 2008). Motor timing deficits have also been found in SUD (Wittmann et al., 2007), but, to our knowledge, they have not been investigated with regard to treatment outcomes. Early detection of motor timing deficits may be predictive of treatment outcomes.

Owing to the limited number of pharmacological treatment options, many clinicians worldwide rely solely on psychosocial approaches (Dackis and O'Brien, 2001). The cognitive deficits experienced by individuals with SUD may therefore be broadly relevant to psychosocial adaptation. More specialized research that informs clinical practice is needed to improve and broaden treatment options.

This prospective study tested the theoretical basis for motor timing as a prognostic indicator in SUD, with prognosis measured in terms of treatment response and relapse. We expected three capacities to be prognostic of treatment outcome (self-perceived efficacy to abstain from substances) at 8 weeks and of possible relapse (dichotomised as "yes/no"):

(i) the capacity to structure, organize and plan an action directed toward a visual target [motor reaction task (Task 1)];
(ii) synchronization abilities [spatial-tapping task (Task 2)];
(iii) cognitive control [Go-nogo task (Task 3)].

## METHODS

### Sample

The study sample consisted of 74 abstinent patients aged 18–60 years who were diagnosed with alcohol and/or cocaine dependence. Patients with a primary diagnosis of alcohol and/or cocaine dependence who had been detoxified were included. Patients who met criteria for abuse of other substances (lifetime or current) were also included, provided that these were not their primary drugs of use/abuse. Patients who met criteria for dependence on substances other than cocaine or alcohol were excluded. For the alcohol group, patients with a current or past history of cocaine dependence were excluded. For the cocaine group, patients with a current or past history of alcohol dependence were excluded.

### Procedures

Participants were all inpatients at a private treatment programme for drug/alcohol dependence at a treatment clinic in Somerset West, South Africa. The clinic mainly treats Dutch nationals, because its main patient referral company is based in the Netherlands. The comprehensive primary care treatment programme formed the standard of care for all participants. It centers on an 8-week cycle of treatment comprising group therapies, individual counseling, written work and a psycho-educational lecture series. All participants worked individually with a therapist. Every included patient underwent a full medical examination by the psychiatric nursing staff, consisting of a physical examination and a toxicology and biochemistry work-up.

Participants were tested at three points in time:

(i) within 72 hrs of the start of the treatment programme;
(ii) after completion of the treatment programme at 8 weeks (measure of treatment response);
(iii) at the 12-month follow-up (measure of relapse).

Designated counselors at the clinic asked patients whether they were interested in participating in the study. Only patients who gave written consent and were eligible upon screening were enrolled and invited for a first research visit. Two study visits were conducted at the clinic, each comprising self-report questionnaires and experimental motor task testing.

The following were administered at baseline:

- a socio-demographic questionnaire;
- the Measurements in the Addictions for Triage and Evaluation.2 (MATE.2.10) (Schippers et al., 2010);
- the Mini International Neuropsychiatric Interview version 5 (MINI 5) (Lecrubier et al., 1997);
- the Edinburgh Handedness Questionnaire (EHQ) (Büsch et al., 2010);
- the Alcohol Use Disorders Identification Test (AUDIT) (Lundin et al., 2015);
- the Drug Use Disorders Identification Test (DUDIT) (Hildebrand, 2015);
- the Sheehan Disability Scale (SDS) (Beck et al., 2004);
- the Beck Depression Inventory (BDI) (Beck et al., 1988);
- the Alcohol Abstinence Self-Efficacy Scale (AASE) and the Cocaine Abstinence Self-Efficacy Scale (CASE) (DiClemente et al., 1994);
- the Short Alcohol Withdrawal Scale (SAWS) (Gossop et al., 2002);
- a motor task battery (see section Temporal Processing: Action-Based Timing Tasks).

During the second visit (at treatment completion), the MATE.2.10, SDS, BDI, AASE, and CASE were repeated. All assessments were conducted in a structured manner by either the principal investigator (PI) or a trained research assistant, who was appointed for a period of 2 years.
For quality control, all questionnaires and task performance scores, including data entry, were cross-checked by both the PI and the research assistant. Standard operating procedures were followed for all assessments. Task instructions were read out in the same way to each participant, and the same order of assessment was used for each visit and each participant. After completion of the first visit, an appointment for the second assessment was made. Assessments were undertaken within 72 hrs of the start of the treatment programme (visit 1) and repeated at the end of the 8 weeks (within the last 72 hrs, visit 2). A telephone interview using the MATE.2.10 (Schippers et al., 2010) was administered at 12 months to assess relapse. Owing to the clinic's patient privacy policies, the research team did not stay in contact with participants between discharge and follow-up. All data were de-identified and kept confidential. To encourage honest responding, participants were reminded that none of the test results would be shared with clinical staff.

### Measures

The following were assessed: gender, age, handedness, ethnicity, education, family history of substance dependence, history of previous admissions, counseling and therapy, symptoms of disability, drug and alcohol use (including last intoxication, last drink and last withdrawal), depression, and psychopathology. The instruments used were a self-administered demographic questionnaire, the EHQ (Büsch et al., 2010), the MATE.2.10 (Schippers et al., 2010), the MINI 5 (Lecrubier et al., 1997), the AUDIT (Lundin et al., 2015), the DUDIT (Hildebrand, 2015), the SDS (Beck et al., 2004), the BDI (Beck et al., 1988), and the SAWS (Gossop et al., 2002).

### Self-Efficacy

The AASE and CASE (DiClemente et al., 1994) are self-report questionnaires of 20 items each. They indicate the degree of self-efficacy to abstain from substance use (i.e., the confidence to abstain from alcohol and/or cocaine). Items are rated on a 5-point Likert scale ranging from not at all (1) to very much (5). For example, an item may rate the level of temptation a person experiences to use a substance in a specific situation, such as when he/she is concerned about someone. Four subscales can be distinguished: (1) social situations, (2) negative affect, (3) positive emotions, and (4) physical or other worries (DiClemente et al., 1994). The total score is the sum of all items divided by the number of items (20).
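The scoring rule above (sum of the 20 ratings divided by 20) can be sketched in Python. The percentage-change helper is an assumption for illustration. The Results report "total change in percentages" in self-efficacy but do not give the formula. The function names and example ratings are hypothetical.

```python
from typing import Sequence

N_ITEMS = 20                    # AASE and CASE each have 20 items
MIN_RATING, MAX_RATING = 1, 5   # 5-point Likert scale


def total_score(ratings: Sequence[int]) -> float:
    """Total score as described: sum of the 20 item ratings divided by 20."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(not MIN_RATING <= r <= MAX_RATING for r in ratings):
        raise ValueError("ratings must lie on the 1-5 Likert scale")
    return sum(ratings) / N_ITEMS


def percent_change(baseline: float, discharge: float) -> float:
    """Baseline-to-discharge change as a percentage of the baseline score.
    Assumed definition: the article does not state the exact formula."""
    return (discharge - baseline) / baseline * 100.0


# Hypothetical ratings for one participant at visit 1 and visit 2.
baseline = total_score([2] * 10 + [3] * 10)
discharge = total_score([4] * 20)
print(baseline, discharge, percent_change(baseline, discharge))  # 2.5 4.0 60.0
```

Scoring each administration separately keeps the change computation independent of how individual items were answered. This mirrors the article's focus on total-score change rather than item- or subscale-level change.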

### Temporal Processing: Action-Based Timing Tasks

The motor tasks consisted of a series of reaction-prediction visuo-motor pointing tasks measuring different aspects of motor timing: motor sequencing, synchronization, and decision-making. The sequential pointing tasks were all designed by Professor Y. Delevoye-Turrell and her team at the University of Lille, France. These tasks have been used in previous research (Delevoye-Turrell et al., 2007, 2012; Dione et al., 2013; Dione, 2014; Dione and Delevoye-Turrell, 2015). They have not previously been used in SUD research or in prognostic research of any kind. For testing, participants were seated in front of a tactile screen (Elo Touch) measuring 53 cm by 36 cm by 30 cm. The screen lay flat, horizontally, close to the participant's midline to avoid muscle fatigue from the repetitive pointing movements. Visual and auditory signals were controlled via a PC running custom software written in C++. For a detailed overview of these tasks, please see the protocol publication (Young et al., 2016).

#### **Reactivity: the motor reaction task**

Motor sequencing abilities were evaluated using a simple finger-pointing task to visual dots presented on the touch screen. Participants are required to lift off (action initiation, measured as Reaction Time) and touch (action execution, measured as Movement Time) one dot (condition 1), a series of two dots (condition 2), or a series of three dots (condition 3).

Manipulating the complexity of the motor sequence (the number of dots) made it possible to assess both lower-order timing mechanisms (one target) and higher-order mechanisms (two and three targets). Both were assessed through participants' capacity to structure, organize, and plan an action through time and space, combining accurate pointing with fast movements. Condition 1 is designed to measure lower-order mechanisms of movement initiation and execution. Conditions 2 and 3 are designed to measure higher-order mechanisms, because their increased complexity requires structuring and planning of motor timing. Participants are instructed to start with the index finger of the dominant hand placed on the square starting zone, situated at the bottom left edge of the screen. As soon as a black dot appears on the screen, the task is to lift off from the starting square and touch the target(s) as fast as possible. The three levels of complexity (one-target, two-target and three-target conditions) are counterbalanced.
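Here is a minimal sketch of how Reaction Time and Movement Time could be derived from the touch-screen events of one trial. The event fields (`stimulus_onset`, `lift_off`, `touches`) are hypothetical. Defining Movement Time as the interval from lift-off to the last target contact (in conditions 2 and 3) is an assumption, not the authors' stated definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReachTrial:
    """Hypothetical event log of one trial; all times in ms from trial start."""
    stimulus_onset: float   # black dot(s) appear on the screen
    lift_off: float         # index finger leaves the square starting zone
    touches: List[float]    # contact time on each target, in order (1-3 targets)


def reaction_time(trial: ReachTrial) -> float:
    """Action initiation: stimulus onset to lift-off."""
    return trial.lift_off - trial.stimulus_onset


def movement_time(trial: ReachTrial) -> float:
    """Action execution: lift-off to contact with the last target (assumed definition)."""
    if not trial.touches:
        raise ValueError("trial has no target contacts")
    return trial.touches[-1] - trial.lift_off


def segment_times(trial: ReachTrial) -> List[float]:
    """Duration of each movement segment; several entries only in conditions 2 and 3."""
    points = [trial.lift_off] + trial.touches
    return [end - start for start, end in zip(points, points[1:])]


three_dots = ReachTrial(stimulus_onset=1000.0, lift_off=1320.0,
                        touches=[1650.0, 1900.0, 2180.0])
print(reaction_time(three_dots), movement_time(three_dots), segment_times(three_dots))
# 320.0 860.0 [330.0, 250.0, 280.0]
```

Keeping per-segment durations alongside the total makes it possible to compare the lower-order (one target) and higher-order (two or three targets) conditions on the same footing.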

#### **Synchronization: the spatial-tapping task**

With this task, we aimed to evaluate how well self-initiated actions are timed (synchronized) to external stimuli present in the environment, using a spatial-tapping task (Dione, 2014). The task measures pointing accuracy in time and space, as well as errors in fluency and accuracy. The tactile screen displays six black dots, 100 mm apart, arranged in a circle. The task is to touch each target, one after the other, starting from the bottom right target and moving counter-clockwise, using the right index finger (fist closed). The tempo of the external rhythm is fixed in terms of inter-stimulus interval (ISI), which is considered an important independent variable in timing research. Each condition consists of a series of sixty taps, with five trials in total (ISI = 1100, 700, 500, 400, and 300 ms). The total duration of the task is approximately 10 min. In each trial, participants are presented with an auditory rhythm that they must use to pace their actions. After listening to the tones for 5.5 s, participants start tapping, for a total trial duration of 35 s.
Timing performance on this task was measured through inter-response interval errors (IRI error) and synchronization errors (Asynchrony). The IRI was measured as the time interval between the onsets of two successive taps. The IRI error was then computed as the absolute difference between each IRI and the reference ISI of a given trial, expressed as a percentage. Asynchrony was measured as the difference between the onset of a tap and the onset of the corresponding tone in the external rhythm. Spatial performance was assessed from the endpoint distributions of pointing actions, plotted as a function of each visual target position. The mean spatial error (SE) of these spatial ellipses was used as an index of spatial performance. The control of pauses was measured through contact time (CT), defined as the duration of finger contact with the touch screen. This measure (in ms) was used as an indicator of the amount of voluntary pausing in the gesture. See **Figure 1** for an overview of how IRI errors, CT, and Asynchrony were measured.
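The timing and pause measures above can be sketched as follows. IRI error follows the stated definition: the absolute IRI-ISI difference, expressed as a percentage of the ISI. Two parts are assumptions of this sketch:

- Each tap is matched to its nearest tone when computing Asynchrony.
- The spatial error is simplified to the mean endpoint-to-target distance, not the authors' ellipse-based SE.

All function names and numbers are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def iri_errors(tap_onsets: Sequence[float], isi: float) -> List[float]:
    """IRI error (%) for each pair of successive taps: |IRI - ISI| / ISI * 100."""
    iris = [b - a for a, b in zip(tap_onsets, tap_onsets[1:])]
    return [abs(iri - isi) * 100.0 / isi for iri in iris]


def asynchronies(tap_onsets: Sequence[float], tone_onsets: Sequence[float]) -> List[float]:
    """Tap onset minus nearest tone onset (ms); negative values mean the tap anticipated
    the tone. Nearest-tone matching is an assumption of this sketch."""
    return [t - min(tone_onsets, key=lambda s: abs(t - s)) for t in tap_onsets]


def contact_times(tap_onsets: Sequence[float], tap_offsets: Sequence[float]) -> List[float]:
    """CT: how long the finger stays on the screen for each tap (ms)."""
    return [off - on for on, off in zip(tap_onsets, tap_offsets)]


def spatial_error(endpoints: Sequence[Tuple[float, float]],
                  targets: Sequence[Tuple[float, float]]) -> float:
    """Simplified stand-in for SE: mean Euclidean distance (mm) from each pointing
    endpoint to the target it was aimed at (not the authors' ellipse method)."""
    return mean(math.dist(p, q) for p, q in zip(endpoints, targets))


tones = [0.0, 500.0, 1000.0, 1500.0]        # ISI = 500 ms condition
onsets = [20.0, 510.0, 990.0, 1530.0]       # finger touches the screen
offsets = [120.0, 600.0, 1100.0, 1610.0]    # finger leaves the screen
print(iri_errors(onsets, 500.0))            # [2.0, 4.0, 8.0]
print(asynchronies(onsets, tones))          # [20.0, 10.0, -10.0, 30.0]
print(contact_times(onsets, offsets))       # [100.0, 90.0, 110.0, 80.0]
print(spatial_error([(3.0, 4.0), (100.0, 2.0)], [(0.0, 0.0), (100.0, 0.0)]))  # 3.5
```

Normalizing IRI error by the ISI lets errors be compared across the five tempo conditions. A fixed 20 ms deviation weighs more at 300 ms than at 1100 ms.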

#### **Cognitive control: the go-nogo task**

A modified version of the Go-nogo paradigm was designed to measure reaction times through touches on the touch screen. The starting zone is situated at the bottom left edge of the screen. The target is a white circle containing a black letter or a black one-digit number. Depending on the condition of the task, participants are instructed to act as fast as possible (Go) or to refrain from acting (Nogo).

- In the first condition, the task is to tap each target that appears as fast as possible (100% Go).
- In the following blocks, participants are instructed to tap the target as fast as possible, but only if it is a letter (50% Go). If the target is a number, they are to refrain from reacting (Nogo).

Numbers and letters were presented in semi-random order. Targets were presented on the screen for 5 s, with a random phase lag of ±300 ms to avoid anticipatory responses. Cognitive control was measured through two aspects:

- decision making: reaction times on responses directly following a Go target or a Nogo target;
- adaptability: reaction times on responses to targets that came directly after a Nogo error.
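The sketch below generates a 50% Go block and splits reaction times by the preceding trial, following the decision-making and adaptability measures above. It rests on three assumptions:

- A plain shuffle stands in for the unspecified semi-random order.
- The ±300 ms lag is drawn from a uniform distribution.
- A "Nogo error" means any response to a Nogo target.

All names are hypothetical.

```python
import random
import string
from statistics import mean
from typing import Dict, List, Optional


def make_block(n_trials: int, go_ratio: float, rng: random.Random) -> List[dict]:
    """Hypothetical block generator: letters are Go targets, one-digit numbers are Nogo."""
    n_go = round(n_trials * go_ratio)
    kinds = ["go"] * n_go + ["nogo"] * (n_trials - n_go)
    rng.shuffle(kinds)  # plain shuffle; the original used a semi-random order
    return [
        {
            "kind": kind,
            "symbol": rng.choice(string.ascii_uppercase) if kind == "go" else str(rng.randint(0, 9)),
            "onset_jitter_ms": rng.uniform(-300.0, 300.0),  # assumed uniform ±300 ms lag
            "display_ms": 5000,                             # target shown for 5 s
        }
        for kind in kinds
    ]


def sequential_rts(kinds: List[str], rts: List[Optional[float]]) -> Dict[str, float]:
    """Mean RT on answered Go trials, grouped by the previous trial's target and response.
    rts[i] is None when no response was made on trial i."""
    groups: Dict[str, List[float]] = {"after_go": [], "after_nogo": [], "after_nogo_error": []}
    for i in range(1, len(kinds)):
        rt = rts[i]
        if kinds[i] != "go" or rt is None:
            continue
        if kinds[i - 1] == "go":
            groups["after_go"].append(rt)
        elif rts[i - 1] is None:
            groups["after_nogo"].append(rt)        # previous Nogo correctly withheld
        else:
            groups["after_nogo_error"].append(rt)  # previous Nogo wrongly answered
    return {name: mean(values) for name, values in groups.items() if values}


block = make_block(20, 0.5, random.Random(7))
print(sum(t["kind"] == "go" for t in block))  # 10
kinds = ["go", "go", "nogo", "go", "nogo", "go"]
rts = [400.0, 420.0, None, 450.0, 380.0, 500.0]
print(sequential_rts(kinds, rts))
# {'after_go': 420.0, 'after_nogo': 450.0, 'after_nogo_error': 500.0}
```

Grouping by the previous trial separates the two constructs. Reaction times after a correct Nogo reflect ongoing decision making, while those after a Nogo error reflect how participants adapt following a mistake.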

### Data Analyses

Backward stepwise regressions were conducted to identify the best-fitting set of motor timing variables predicting the change in self-efficacy total score at 8 weeks. Best subset regressions were then used to select, from the top 20 models, the best-fitting models with the fewest predictor variables.
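Best subset selection can be sketched in pure Python. The Results use it to count how often each predictor appears among the top 20 models. The following are all assumptions of this sketch:

- models are ranked by adjusted R², with ties broken in favour of fewer predictors;
- subsets are capped at three predictors;
- the data are synthetic, with hypothetical variable names.

The backward stepwise step is not shown.

```python
import random
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Model = Tuple[float, Tuple[str, ...]]  # (adjusted R^2, predictor names)


def ols_r2(X: List[List[float]], y: Sequence[float]) -> float:
    """R^2 of an OLS fit with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting (no handling of collinearity).
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subsets(data: Dict[str, List[float]], y: Sequence[float],
                 max_size: int, top: int = 20) -> List[Model]:
    """Score every predictor subset up to max_size; return the top models."""
    n = len(y)
    scored: List[Model] = []
    for k in range(1, max_size + 1):
        for names in combinations(sorted(data), k):
            X = [[data[name][i] for name in names] for i in range(n)]
            adj_r2 = 1.0 - (1.0 - ols_r2(X, y)) * (n - 1) / (n - k - 1)
            scored.append((adj_r2, names))
    scored.sort(key=lambda m: (-m[0], len(m[1])))  # best fit first, then fewest predictors
    return scored[:top]


def predictor_counts(models: List[Model]) -> Counter:
    """How often each predictor appears among the given models."""
    return Counter(name for _, names in models for name in names)


rng = random.Random(1)
n = 40
names = ["SE_300", "Asyn_400", "IRI_500", "CT_300", "RT_1dot"]  # hypothetical labels
data = {name: [rng.gauss(0.0, 1.0) for _ in range(n)] for name in names}
y = [-0.5 * se - 0.4 * asyn + rng.gauss(0.0, 0.3)
     for se, asyn in zip(data["SE_300"], data["Asyn_400"])]

top = best_subsets(data, y, max_size=3)
print(top[0][1])                             # best-ranked predictor set
print(predictor_counts(top).most_common(2))  # most frequent predictors in the top 20
```

Exhaustive search is feasible only for a small number of predictors; with k candidate predictors there are 2^k - 1 subsets. The occurrence counts give a rough sense of how robust a predictor is across near-equivalent models.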

## RESULTS

### Sample

#### Demographics

All participants included in this study completed treatment. All participants (n = 74) were right-handed, 80 percent were male, and the mean age was 36.6 years (SD = 10.5, mode = 27, range 19–60). Forty-two participants (59%) were employed, and 27 participants (36.5%) were receiving unemployment benefits. Half of the participants were single, 13 participants (20%) were divorced, and 28 participants (40%) had children.

#### Clinical Characteristics

Patients with comorbid disorders at the beginning of treatment, as assessed on the MINI 5, were excluded from entry into the study. However, by discharge (8 weeks), some participants had been diagnosed with comorbid disorders by their treating clinicians during the course of treatment:

- n = 10, 15%, Axis I psychiatric disorders;
- n = 15, 20%, Axis II personality disorders;
- n = 5, 7%, both Axis I and Axis II.

Previous outpatient treatment had been attempted unsuccessfully by 38 participants (51.4%), and 23 participants (31%) had received psychotherapy. Twelve participants (16.2%) had previously been admitted to psychiatric inpatient care for non-SUD reasons, the majority following a failed suicide attempt. For 21 participants (29%), this was a second (or later) attempt at inpatient rehabilitation.

All patients were detoxified before treatment. However, upon admission, 23 participants (31%) had a positive alcohol test (breathalyzer examination). Thirty-eight participants (54%) had a positive drug test [cocaine n = 25 (33%), benzodiazepine n = 8 (10.8%), cannabis n = 5 (6.8%), and amphetamine n = 1 (1.4%)]. Craving symptoms were minimal at baseline (m = 7.5, SD = 3.9; MATE Q1 cut-off scores of < 12 indicate minimal craving). Withdrawal symptoms at admission were minimal on average (m = 8.66, SD = 6.5, Mdn = 7). However, a minority of participants suffered from moderate to severe withdrawal (cut-off score for minimal withdrawal < 12, CI = 0;30). Drug use other than cocaine and/or alcohol in the 30 days before admission was minimal: 9 percent used ecstasy, 15 percent used other stimulants (e.g., speed, methamphetamine), and 12 percent used sedatives. The severity of comorbid psychiatric symptoms was below threshold on the Anxiety, Depression and Stress scale (MATE Q2 total score < 60) (m = 41.8, SD = 25.2, mode = 12). A detailed overview of the clinical and demographic results can be found in **Table 1**.

### Main Results: Treatment Outcomes

#### Self-Efficacy to Abstain From Alcohol Use

An overview of the timing task results can be found in **Table 2**. A best subset regression analysis of all motor tasks showed that motor timing deficits at baseline hold prognostic value with regard to self-efficacy to abstain from alcohol use (R<sup>2</sup> = 0.27). Neither the Motor Reaction task nor the Go-nogo task was predictive of self-reported self-efficacy to abstain from alcohol use. The following baseline measures on the Spatial Tapping Task were predictive:

- SE (at ISI 300 ms) predicted the total percentage change in self-reported self-efficacy to abstain from alcohol use (b = −0.26, t(50) = −2.05, p = 0.04).
- Asynchrony (at ISI 400 ms) predicted the total change in alcohol self-efficacy scores at discharge (b = −0.37, t(50) = −2.14, p = 0.03).
- IRI errors predicted self-efficacy to abstain from alcohol use at ISI 500 ms (b = −0.28, t(50) = −2.10, p = 0.04) and at ISI 700 ms (b = −0.28, t(50) = −2.01, p = 0.04).

Although not statistically significant, Asynchrony and IRI errors on the Spatial Tapping Task in the 1100 ms condition appeared in 20 and 17, respectively, of the top 20 best predictor models.

#### Self-Efficacy to Abstain From Cocaine Use

A best subset regression analysis showed that motor timing deficits at baseline hold prognostic value with regard to self-efficacy to abstain from cocaine use (R<sup>2</sup> = 0.25). Neither the Motor Reaction task nor the Go-nogo task was predictive of self-reported self-efficacy to abstain from cocaine use. The following baseline measures on the Spatial Tapping Task were predictive:

- SE at ISI 300 ms (b = −0.31, t(50) = 2.62, p = 0.01) and at ISI 500 ms (b = 0.36, t(50) = 2.69, p < 0.01) predicted the total percentage change in self-reported self-efficacy to abstain from cocaine use.
- CT at ISI 300 ms predicted the total change in cocaine self-efficacy (b = 0.31, t(50) = −2.62, p = 0.01).

Although not significant, Asynchrony on the Spatial Tapping Task in the 300 ms condition appeared in 17 of the top 20 best predictor models.

#### Prognostic Value of Motor Timing in Relapse Prediction

Of the 74 participants, 44 were interviewed at 12 months post-discharge, and 30 participants were lost to follow-up. Data from the 36 participants with the least missing data were used for these analyses. Of these 36, six relapsed, while all other participants remained abstinent from drugs and alcohol after discharge. The small sample, and limited power, precluded analysis of motor timing predictors of relapse.

## DISCUSSION

The main aim was to test for prognostic indicators in SUD with regard to motor timing (measured in terms of treatment response). We expected three abilities to be prognostic of treatment outcomes (self-perceived efficacy to abstain from substances) at 8 weeks and of relapse at 12 months (yes/no): motor coordination and planning abilities, synchronization abilities, and decision making. With regard to treatment outcomes, we found that only the Spatial Tapping Task variables were predictive. They explained 27% of the variance in alcohol use self-efficacy and 25% of the variance in cocaine use self-efficacy at discharge.

TABLE 1 | Demographic and clinical characteristics of the separate groups and all patients combined.

Results of separate groups; Intelligence, Nederlandse Leestest voor Volwassenen; AUDIT, Alcohol Use Disorders Identification Test; DUDIT, Drug Use Disorders Identification Test; Quality of Life, Sheehan Disability Scale (SDS); GAF, Global Assessment of Functioning; Physical complaints MATE 5, Measurements in the Addictions for Triage and Evaluation, Physical Complaints/health related symptoms (withdrawal) in the last 30 days; Craving MATE Q1, Measurements in the Addictions for Triage and Evaluation, Craving Scale regarding the last 30 days; Comorbid symptom severity MATE Q2, Measurements in the Addictions for Triage and Evaluation, Anxiety, Depression, Stress Scale, last 30 days.

With regard to alcohol self-efficacy, SE (at ISI 300 ms), Asynchrony (at ISI 400 ms) and IRI errors (at ISI 500 and 700 ms) were predictive of self-perceived efficacy to abstain from alcohol use. With regard to cocaine self-efficacy, SE (at ISI 300 and 500 ms) and CT (at ISI 300 ms) were predictive of self-perceived efficacy to abstain from cocaine use. Because only a very small number of participants could be reached for follow-up, analyses of motor timing variables with regard to relapse at 12 months were omitted.

Interestingly, the motor timing variables predicting cocaine and alcohol self-efficacy were not the same, which may indicate that different factors are at play in different SUDs. The only timing variable shared by alcohol and cocaine self-efficacy was SE on the Spatial Tapping Task, and in both cases only at high tempi. Spatial abilities rely heavily on visual feedback. Patients may therefore prioritize spatial accuracy over temporal precision, which could point to high levels of compulsivity. What the predicting variables have in common is that they all occur at high tempi. This, again, may point to deficits that manifest only when patients are under pressure: deficits become apparent when time constraints increase cognitive load.

TABLE 2 | Means and standard deviations of all motor task performances of patient group comparisons at baseline.

Our findings also suggest a possible overlap between the brain circuitry underlying millisecond timing and the circuitry affected in SUD. The literature suggests that substance use is associated with deficits in frontal lobe and striatal functioning (Moselhy et al., 2001; Spronk et al., 2013) through altered activation of the corticolimbic reward circuit (Welberg, 2011). Aspects of self-control, delayed gratification, drive inhibition and anticipation of consequences all require the functional integrity of the executive prefrontal cortical system (Lyvers, 2000). The breakdown of orbitofrontal cortical communication may, in part, explain the decrease in motivation and self-control experienced by individuals with SUD (Dackis and O'Brien, 2001; Welberg, 2011).

A recent study of the brain circuits involved in time perception in the millisecond and second ranges probed three regions: the right supplementary motor area (SMA), the right dorsolateral prefrontal cortex (dlPFC), and the cerebellum (Méndez et al., 2017). The researchers temporarily altered activity in these regions in healthy participants, using transcranial magnetic stimulation with the continuous theta burst stimulation (cTBS) protocol. Participants were tested before and after stimulation on a temporal categorization task, using intervals in the hundreds and thousands of milliseconds ranges. They were also tested on a pitch categorization task, used as a further control.

The researchers looked for changes in two measures:

- the Constant Error, which reflects participants' accuracy in setting an interval that acts as a boundary between categories;
- the Relative Threshold, which reflects participants' sensitivity to interval duration.

After cTBS in all of the studied regions, the Relative Threshold, but not the Constant Error, was affected, and only when intervals in the hundreds of milliseconds were being categorized. Categorization of pitch and of intervals in the thousands of milliseconds was not affected. These results suggest that the frontocerebellar circuit is particularly involved in estimating intervals in the hundreds of milliseconds range (Méndez et al., 2017). Because this circuitry overlaps with circuitry affected by SUD, motor timing in the millisecond range may hold promise for future research on biomarkers of SUD or on indicators of the severity of damage due to substance abuse.
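The two measures from Méndez et al. (2017) can be illustrated with a common psychophysical convention:

- The point of subjective equality (PSE) is the duration judged "long" on 50% of trials.
- Constant Error is the PSE minus the category boundary.
- Relative Threshold is half the distance between the 25% and 75% points, divided by the PSE.

These definitions, the linear interpolation, the 200 ms boundary and the response proportions are assumptions for illustration. They are not necessarily the exact computations used in that study.

```python
from typing import Sequence, Tuple


def crossing_point(durations: Sequence[float], p_long: Sequence[float], level: float) -> float:
    """Duration at which the proportion of 'long' responses reaches `level`,
    by linear interpolation between neighbouring stimuli (assumes p_long increases)."""
    pairs = list(zip(durations, p_long))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 <= level <= p1 and p1 > p0:
            return x0 + (level - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError(f"responses never cross the {level:.0%} level")


def ce_and_relative_threshold(durations: Sequence[float], p_long: Sequence[float],
                              boundary: float) -> Tuple[float, float]:
    """Constant Error = PSE - boundary; Relative Threshold = threshold / PSE (assumed)."""
    pse = crossing_point(durations, p_long, 0.50)
    threshold = (crossing_point(durations, p_long, 0.75)
                 - crossing_point(durations, p_long, 0.25)) / 2.0
    return pse - boundary, threshold / pse


durations = [100.0, 150.0, 200.0, 250.0, 300.0]  # hypothetical intervals (ms)
p_long = [0.0, 0.125, 0.375, 0.875, 1.0]         # hypothetical proportion "long"
ce, rel_threshold = ce_and_relative_threshold(durations, p_long, boundary=200.0)
print(ce, round(rel_threshold, 3))  # 12.5 0.147
```

Under these definitions, a change in sensitivity alone (a shallower psychometric curve) raises the Relative Threshold without moving the Constant Error. Méndez et al. reported this pattern for hundreds-of-milliseconds intervals.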

One explanation of how motor timing deficits could contribute directly to a higher predisposition for relapse in addiction has been proposed by van Hoof (2002, 2003). The model proposes that the motor mechanisms necessary for grasping stationary and moving objects evolved and matured to organize cognitive and emotional processes, such as affiliation and intimidation. This organizational process resulted in the capacity to organize intentional behavior (van Hoof, 2002, 2003). Mental representations of intended or goal-action effects are thus responsible for planning and executing the movements required to achieve a goal (van Hoof, 2002, 2003). Following this model, major psychiatric disorders (e.g., schizophrenia and SUDs) may be understood as manifestations of imbalances between two modes of action:

- an automatic mode of action (referred to as the Drive Mechanism);
- a more cognitive-predictive mode of action (referred to as the Guidance Mechanism, GM).

This bimodal, evolutionary neurobiological model may provide a useful pathogenic framework for classifying major psychiatric disorders, including SUDs (van Hoof, 2002, 2003). It is being tested as part of an ongoing investigation (Young et al., 2016).

Several limitations warrant mention. First, the high attrition rate at the 12-month follow-up precluded the analysis of predictors of relapse. Attrition might have been reduced by a shorter follow-up period and by face-to-face structured interviews instead of telephone interviews, supplemented by urine drug testing to confirm abstinence.

Another limitation was the use of a subjective measure (self-reported self-efficacy) rather than the more objective measures available. As mentioned previously, the more objective measures include blood or urine tests. However, not every treatment setting allows such measures to be used meaningfully, so compromises are required to achieve the most valid outcome possible. The validity of treatment outcome measures in research depends on the type of treatment that patients are undergoing. Although subjective, self-efficacy is an acceptable measure of treatment outcome in our research setting. Studying treatment success in a closed inpatient setting precludes the use of more objective outcomes, such as retention and abstinence. Most patients in these settings achieve retention and abstinence, so using them as indicators would give a false impression of greater treatment success. Nevertheless, given the subjective nature of the outcome measures used, the results should be interpreted with care.

Another limitation is that only participants without comorbid disorders, and who were not using psychotropic medications at baseline, were included in the study. This was done to avoid the confounding effects of comorbid psychopathology and psychotropic medications on motor timing performance. It may, however, reduce the generalizability of these findings to patients with SUD and comorbid psychopathology. Moreover, even though we excluded patients with a comorbid disorder at baseline, more than a third of the sample had been diagnosed with comorbid disorders by their treating clinicians by the end of treatment. This is not unexpected, for two reasons:

(i) dual diagnosis is highly prevalent in this population;
(ii) when patients with SUD enter treatment, they often need to be observed after an extended period of abstinence, in order to distinguish between the effects of substance withdrawal (which can be prolonged) and the symptoms of comorbid mental disorders.

Comorbid disorders were not adjusted for in the analyses of baseline predictors. This raises the question of whether comorbid disorders may have had mediating effects.

A further limitation is that patients were in treatment for a period of 8 weeks. During this period, they did not have access to their phones, ate healthily, exercised, engaged in a structured programme in a supportive, therapeutic milieu, and did not face usual life stressors. This environmental "stability" may have influenced our findings. Research attempting to replicate the results in outpatient populations may shed light on this possible limitation.

Another limitation was that, even though patients were detoxified before arrival at the clinic, some of them still tested positive for substances. The clinic to which the Dutch patients were admitted is situated in South Africa, and some patients may have used substances during the long journey there. This means that a number of patients may have undergone a further withdrawal during their stay in the clinic. Even though withdrawal and craving scores were well below cut-off, some patients still experienced moderate to severe symptoms, such as tremor, which may have influenced their performance on the motor tasks.

Future research should focus on more diverse populations with SUD, including inpatients and outpatients at different points in their recovery process. One possible explanation for the association between cognitive load and motor timing abilities in SUD patients is that time constraints and errors may be perceived as (more) stressful; they also increase (perceived) cognitive load and subsequently lead to a loss of control over inhibition and rhythmic abilities. To our knowledge, this is the first study to demonstrate such an association; based on our findings, replication studies on motor timing abilities in SUD samples, their prognostic value, and their specificity for different SUDs are warranted.

### AVAILABILITY OF DATA AND MATERIALS

The raw data and materials supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Declaration of Helsinki, the South African Guidelines for Good Clinical Practice, and the University of Stellenbosch's Health Research Ethics Committee. The protocol was approved by the University of Stellenbosch's Health Research Ethics Committee. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

## AUTHOR CONTRIBUTIONS

SY: Has made substantial contributions to the conception and design, acquisition of data, and analysis and interpretation of data. SY has been involved in drafting the manuscript and revising it critically for important intellectual content. SS: Has made substantial contributions to the conception and design, analysis and interpretation of data. SS has been involved in drafting the manuscript and revising it critically for important intellectual content. SS provided final approval of the version to be published. MK: Has made substantial contributions to the statistical analysis and interpretation of data. MK has been involved in revising the manuscript critically for important intellectual content. MK provided final approval of the version to be published. JvH: Has made substantial contributions to the conception and design, and interpretation of the data. JvH has been involved in revising the manuscript critically for important intellectual content. JvH provided final approval of the version to be published. All authors have read and approved the final manuscript.

### FUNDING

This work is supported by the South African Research Chair in PTSD hosted by Stellenbosch University, funded by the DST and administered by The National Research Foundation of South Africa (NRF SA) and Stellenbosch University's Consolidoc Award programme. Additionally, the research project and publication costs are supported by the Late Estate Hendrik Vrouwes Foundation (NEDBANK Educational Bursary Programme) South Africa. NRF SA has awarded SY with a scholarship for the duration of the study. The French National Research Agency grant-ANR-2010-BLAN-1903-01 has partly funded the research project through contributions to Professor Yvonne Delevoye-Turrell and her team for the costs of the design and development of the motor task battery which has been used in several studies with different populations. Additionally, the motor task battery data analyses have been funded by the National Research Agency grant-ANR-2010-BLAN-1903-01.

### ACKNOWLEDGMENTS

We would like to acknowledge the University of Lille, France, and Professor Yvonne Delevoye-Turrell for their support of this project. We would like to acknowledge the University of Amsterdam and Professor Anneke Goudriaan for their support and guidance. We would like to thank the UCLA/South Africa Chronic Mental Disorders Research Training Programme for their support and guidance. We would like to acknowledge the Hendrik Vrouwes Foundation, Nedbank, South Africa, The National Research Foundation South Africa, and the South African Research Chairs Initiative of the Department of Science and Technology for their financial support of the project. In addition, we would like to thank Prof Lize Weich for her support of this project. We would also like to thank Justine Blampain, Merel van Gelder, Anouk Albien, Bodine van Styrum, and Mandi Broodryk for their research assistance.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Young, Kidd, van Hoof and Seedat. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Functions of Prospection – Variations in Health and Disease

#### Adam Bulley<sup>1</sup> and Muireann Irish<sup>2,3</sup>\*

<sup>1</sup> Centre for Psychology and Evolution, School of Psychology, The University of Queensland, St. Lucia, QLD, Australia, <sup>2</sup> The University of Sydney, Brain and Mind Centre, School of Psychology, Sydney, NSW, Australia, <sup>3</sup> Australian Research Council Centre of Excellence in Cognition and its Disorders, Sydney, NSW, Australia

Much of human life revolves around anticipating and planning for the future. It has become increasingly clear that this capacity for prospective cognition is a core adaptive function of the mind. Here, we review the role of prospection in two key functional domains: goal-directed behavior and flexible decision-making. We then survey and categorize variations in prospection, with a particular focus on functional impact in clinical psychological conditions and neurological disorders. Finally, we suggest avenues for future research into the functions of prospection and the manner in which these functions can shift toward maladaptive outcomes. In doing so, we consider the conceptualization and measurement of prospection, as well as novel approaches to its augmentation in healthy people and managing its alterations in a clinical context.

#### Edited by:

Patricia J. Brooks, College of Staten Island, United States

#### Reviewed by:

Mattie Tops, VU University Amsterdam, Netherlands; Guido Schillaci, Humboldt-Universität zu Berlin, Germany; Gail Robinson, The University of Queensland, Australia

#### \*Correspondence:

Muireann Irish muireann.irish@sydney.edu.au

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 23 August 2018; Accepted: 06 November 2018; Published: 27 November 2018

#### Citation:

Bulley A and Irish M (2018) The Functions of Prospection – Variations in Health and Disease. Front. Psychol. 9:2328. doi: 10.3389/fpsyg.2018.02328

Keywords: episodic future thinking, episodic foresight, decision-making, hippocampus, prefrontal cortex, Alzheimer's disease, frontotemporal dementia, evolution

### INTRODUCTION

A core function of the human mind is to predict and prepare for the immediate and distant future. The capacity for future-oriented cognition has been called prospection (Gilbert and Wilson, 2007; Szpunar et al., 2014), an umbrella term that has been used to cover an array of cognitive phenomena from low-level sensory prediction to the creation of long-term plans (Seligman et al., 2016). Here we focus on one form of prospection: episodic foresight or episodic future thinking – defined as the imagination of personal future scenarios (Atance and O'Neill, 2005; Suddendorf and Moore, 2011; Szpunar et al., 2014)<sup>1</sup>. This topic has spurred robust debate concerning the underlying mechanisms of future-directed control and its consequences for a multitude of adaptive behaviors.

To date, prospection has been implicated in everyday adaptive functions as diverse as flexible planning, prospective memory, emotion regulation, and deliberate practice (for reviews see Schacter et al., 2017; Suddendorf et al., 2018). In this article, we first appraise two important general functions of prospection: goal-directed behavior and flexible decision-making. We then explore how variation, as observed via individual differences and lifespan changes, as well as mechanistic alterations in psychopathology and in neurodegenerative disease, affects its functions. A theme of our analysis is that changes in prospection can be either adaptive or maladaptive, and discerning between these outcomes remains an important challenge. To this end, we focus on the following key questions: How do alterations in prospection broadly influence its expressions and functions? How can we objectively categorize differences or changes in prospection? And perhaps most importantly in practical terms, how and when do alterations in prospection become clinically relevant? Finally, we explore important future directions, suggest avenues for improving measurement of prospection, and outline novel approaches to its augmentation in healthy people and its management in a clinical context. These include prospection training and 'strategic compensation' via cognitive offloading.

<sup>1</sup>Relevant reviews of prospection and its measurement in the context of 'low-level' sensory prediction and reinforcement learning can be found in Friston (2009), Bar (2011), Bubić and Abraham (2014), Clark (2015), and Pezzulo (2016).

### PUTATIVE FUNCTIONS OF PROSPECTION

fpsyg-09-02328 November 23, 2018 Time: 15:50 # 2

First, what is a 'function'? An evolutionary approach to cognition and behavior views 'functions' as the utility that an adaptive cognitive system or behavior affords to reproductive fitness. Alternatively, 'adaptive' and 'functional' in the clinical literature and elsewhere can refer to (a) contributions to 'beneficial' everyday activities, and/or (b) the case where 'standard' operations are not impaired (e.g., Mercuri et al., 2016). Here, we focus on two such current functions pertinent to the activities of contemporary everyday living and relevant for wellbeing, namely goal-directed behavior and intertemporal decision-making<sup>2</sup>.

### Goal Directed Behavior

One of the most intuitive functional benefits of prospection is in relation to the setting and pursuit of goals, which can be assessed at different levels of analysis. As a reflection of desired or undesired possible future states of the world, goals are, by definition, prospective in nature. Goals may result from the simulation of possible outcomes and ascertaining their emotional significance, yet a goal is more than an "affective forecast" (Wilson and Gilbert, 2005) – it is inherently motivational (Pezzulo et al., 2014). Mental simulations of the future in humans tend to cluster around personal goals, suggesting that goals represent a common mechanism for organizing and driving adaptive behavior (D'Argembeau, 2016; Lehner and D'Argembeau, 2016). As such, the proclivity for humans to engage in self-referential forms of future-oriented thinking when not otherwise engaged by the external environment has been interpreted as an adaptive manifestation of the brain's "default" mode (Spreng et al., 2009).

Goal-directed behavior ostensibly underpins many important capacities. One notable example is deliberate practice: repeated actions driven by the goal to improve future capacities (Suddendorf et al., 2016). Deliberate practice is critical not only for achieving expert-level performance on specific tasks, but also for acquiring the wide range of abilities necessary for everyday life. Prospection underpins deliberate practice because it enables people to consider their future self as alterable, with abilities or knowledge that are an improvement on the present. This recognition also serves a motivational role by providing a small-scale internal representation of future payoffs. Thus, deliberate practice is just one useful function of having a 'temporally extended self' encompassing memories and anticipations alongside a self-referential narrative that guides the continuing accumulation of skill and knowledge for long-term ends (see Conway, 2005; Prebble et al., 2013). Disruption to deliberate practice in adulthood has clear clinical relevance, yet the role of prospection in this regard has received little attention to date. Exploring the development of deliberate practice in children may offer a useful testbed for understanding its alteration and deterioration in adulthood (Suddendorf et al., 2016).

Decades of research have implicated the frontal lobes in supporting goal-directed behavior (Shallice and Burgess, 1991; Duncan and Owen, 2000). One striking example of compromised goal-directed behavior in the context of frontal lobe dysfunction is provided by the behavioral variant of frontotemporal dementia (bvFTD), a younger-onset dementia syndrome characterized by habitual, perseverative, and stereotypical behaviors due to degeneration of the medial prefrontal cortex. Patients with bvFTD display a marked incapacity to engage in prospective forms of thinking, including simulating the future across personal (Irish et al., 2013) and non-personal (Irish et al., 2016) contexts. Patients increasingly become tethered to the present moment, showing highly inflexible and impulsive behavior driven by a need for immediate gratification where rewarding stimuli are concerned (Ahmed et al., 2015; Wong et al., 2018). An apparent lack of regard for the outcomes of such actions is noted, despite patients retaining an awareness of the ill-timed or inappropriate nature of their behavior.

Unsurprisingly, myriad functional domains related to prospection are compromised in bvFTD (see Irish and Piolino, 2016), as is frequently reported in frontal-lobe syndromes (Shallice and Burgess, 1991; Bechara et al., 2000). Notably, prospective memory, i.e., memory to perform intentions after a delay, is adversely impacted across event and time subscales in bvFTD (Kamminga et al., 2014; Dermody et al., 2015), with patients gravitating toward an increasingly present-oriented response style. Moreover, during conditions of minimal cognitive demand designed to elicit mind wandering (O'Callaghan et al., 2015), bvFTD patients display a marked propensity for stimulus-bound thinking, reflecting an increased reliance on the external environment similar to that observed in 'environmental dependency syndrome' (O'Callaghan et al., 2017).

### Flexible Intertemporal Decision-Making

Because people can imagine specific future scenarios, they often face a conflict between anticipated outcomes and present circumstances. Intertemporal trade-offs between immediate and delayed costs and benefits are ubiquitous in everyday life (Loewenstein et al., 2003), spanning routine decisions about what to eat for lunch (enjoy the snack, or adhere to one's diet?) to more profound concerns regarding whom one should marry (perhaps better prospects lie on the horizon?). In laboratory tasks, participants typically make a series of choices between smaller but sooner and larger but later monetary rewards (e.g., \$5 now versus \$15 in 1 week). Variation in answers to these questions reflects 'choice impulsivity' (Gullo et al., 2014; Hamilton et al., 2015), a clinically relevant trait variable which may relate to life expectancy (Bulley and Pepper, 2017), unhealthy behaviors (Story et al., 2014), obesity (Amlung et al., 2016), gambling (Wiehler et al., 2015), and a range of other potentially maladaptive decision-making patterns. Choice impulsivity is also exacerbated in various 'externalizing disorders' as well as in some neurodegenerative disorders (Gleichgerrcht et al., 2010); for example, bvFTD patients show increased delay discounting, mirroring the prominent displays of impulsivity in their daily lives.

<sup>2</sup>Note that the evolutionary "functions" and "function" in the clinical sense may in some circumstances converge, but they need not. Evolutionary processes create systems that maximize inclusive fitness, and while wellbeing is often a proxy for the successful operation of these functions, the two frequently diverge (Nettle, 2005; von Hippel, 2018).

FIGURE 1 | A role for cued prospection in adaptive intertemporal choice? (A) From a between-participants study with 297 participants: The mean proportion of larger, later (rather than smaller, sooner) rewards chosen in the Kirby monetary intertemporal choice task when participants were cued with neutral mental imagery (e.g., folding up paper), positive episodic future events (e.g., spending time in nature in 1 week), and negative episodic future events (e.g., getting food poisoning in 1 week). Imagining the future was associated with reduced delay discounting regardless of the valence. ∗∗∗ = Significant at p < 0.001. (B) In the same study, ratings of the event cues demonstrated strong correlations between the vividness with which events were imagined and the emotional impact of those events (valence: 1–7, low scores equate to negative valence and high scores equate to positive valence), illustrating the close ties between episodic mental simulation and emotion. Positive r = 0.62, negative r = −0.54, p's < 0.001. Figure from Bulley et al. (unpublished).
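Intertemporal choices of this kind are often summarized in the delay-discounting literature with a hyperbolic model, V = A / (1 + kD), where V is the subjective present value of an amount A received after a delay D, and k is a fitted discount rate (a larger k means steeper discounting, i.e., greater choice impulsivity). The article itself does not specify this model; the Python sketch below is an illustrative assumption, and its function names are hypothetical. It shows how the example choice of \$5 now versus \$15 in 1 week maps onto an indifference discount rate.

```python
def hyperbolic_value(amount: float, delay_days: float, k: float) -> float:
    """Subjective present value of `amount` after `delay_days` under V = A / (1 + k*D)."""
    return amount / (1.0 + k * delay_days)


def indifference_k(sooner: float, later: float, delay_days: float) -> float:
    """Discount rate k at which an immediate `sooner` reward and a `later` reward
    delivered after `delay_days` have equal subjective value.

    Solves sooner = later / (1 + k*D) for k.
    """
    return (later / sooner - 1.0) / delay_days


def prefers_larger_later(sooner: float, later: float, delay_days: float, k: float) -> bool:
    """True if the discounted larger-later reward exceeds the immediate smaller-sooner one."""
    return hyperbolic_value(later, delay_days, k) > sooner


if __name__ == "__main__":
    # Example choice from the text: $5 now versus $15 in 1 week (7 days).
    k_star = indifference_k(5.0, 15.0, 7.0)  # (15/5 - 1) / 7 = 2/7
    print(f"indifference k = {k_star:.4f} per day")  # → indifference k = 0.2857 per day
    # A shallower discounter (k below k_star) waits; a steeper one takes the $5.
    print(prefers_larger_later(5.0, 15.0, 7.0, k=0.1))  # → True
    print(prefers_larger_later(5.0, 15.0, 7.0, k=0.5))  # → False
```

Under this sketch, a chooser whose k is below about 0.29 per day prefers the \$15 reward, so a higher proportion of larger, later choices corresponds to a lower fitted k. The studies cited above may estimate discounting with other models or scoring procedures.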

The capacity to imagine future scenarios allows people to make more prudent, farsighted and flexible decisions that take future consequences – including mutually exclusive possible future outcomes – into account (Gilbert and Wilson, 2007; Boyer, 2008). Accumulating evidence suggests a role for cued episodic foresight in reducing impulsivity (see **Figure 1** for a recent example). In a series of recent experiments, participants have been cued to imagine specific, personally relevant future events while they make intertemporal choices or face temptations such as high calorie food (e.g., Dassen et al., 2016). This cuing paradigm consistently reduces choice and behavioral impulsivity: i.e., it makes people more 'patient' in their preferences and actions (for reviews see Bulley et al., 2016; Benoit et al., 2018; Rung and Madden, 2018). Such findings dovetail with a growing awareness about the key role of prospection variations in decision-making more broadly (Noël et al., 2017), and underscore the potential utility of prospection in clinical interventions for externalizing disorders.

### CHANGES IN PROSPECTION: ADAPTIVE ALTERATIONS VERSUS MALADAPTIVE SHIFTS?

We next consider how the dynamic and constructive nature of prospection supports adaptive functional purposes yet may also manifest in maladaptive outcomes (see Henry et al., 2016). Thus, we may ask not only how the mechanisms of prospection deteriorate, but how prospection becomes clinically relevant even when underlying mechanisms are intact. We propose three avenues by which variations in prospection may give rise to adaptive or maladaptive outcomes with a view to stimulating further research in this important area:

### Individual Differences and Shifts in Content

People vary considerably in their tendency to consider the future (Zimbardo et al., 1997), as well as in their preferences for delayed versus immediate rewards (Peters and Büchel, 2011). Such individual differences are important for understanding impulse-related conditions such as addiction, where immediate aspects of a decision-making situation can take precedence (Noël et al., 2017). For example, chronic opiate users have been shown to generate fewer internal (episodic) details when projecting themselves into the future, but not when imagining atemporal scenarios (Mercuri et al., 2016; Moustafa et al., 2018b). The direction of causality here is somewhat opaque, however: a disposition to present-orientation may predict the onset of drug use, while chronic drug use may also impact brain functioning, and thus instigate maladaptive feedback loops.

Shifts in the content and modes of episodic future thinking have been documented in detail in affective disorders. Content shifts include an overrepresentation of possible negative future events in both anxiety and depression, while a reduction in the generation of positive future events occurs in depression (for reviews see Miloyan et al., 2014; MacLeod, 2016; Moustafa et al., 2018a). Moreover, subtle shifts in the kinds of details (e.g., episodic versus semantic) and representational format (imagery-based versus verbal-linguistic) of episodic foresight have been demonstrated in various clinical disorders (reviewed in Hallford et al., 2018). We caution, however, against the unilateral labeling of such shifts as 'impairments,' as some of these changes may represent coping strategies or adaptive mechanisms for effectively dealing with particular kinds of environmental stressors<sup>3</sup> (Borkovec et al., 2004; Bulley et al., 2017; Engen and Anderson, 2018). Nevertheless, given that prospection has been implicated in wellbeing in general, it represents an important target for ameliorating distress in clinical populations.

### Mechanistic Impairments


As discussed, neurodegenerative disorders display pervasive changes in prospection, ranging from impaired prospective memory to an inability to envisage and describe the future in rich contextual detail. These compromised capacities reflect distinct underlying patterns of neural degeneration and the breakdown of key cognitive processes known to be important for prospection (Irish et al., 2012c). For example, episodic memory dysfunction precludes episodic future simulation in Alzheimer's disease (Addis et al., 2009), whereas loss of the conceptual knowledge base represents the key disruptive mechanism in semantic dementia (Irish et al., 2012a,b). Prospection difficulties in Parkinson's disease, by contrast, are associated exclusively with executive dysfunction (de Vito et al., 2012), while bvFTD represents a more complex picture with multiple neurocognitive processes implicated (Irish et al., 2013). Although the mechanisms by which prospection is altered differ across dementia subtypes, common to all syndromes is the observation of gross functional impairments in activities of daily living. We note, however, that empirical studies definitively linking altered prospection to functional impairment in dementia are lacking and this represents an important area for future investigation (for an initial exploration see Brunette et al., 2018).

### Lifespan Changes

When might a shift in the output of prospection be classified as adaptive? Counter to the prevailing deficit model, we contend that alterations in prospection in healthy aging may serve important adaptive functions (Andrews-Hanna et al., 2018). While older adults produce significantly fewer internal (episodic) details relative to young controls, this is offset by the provision of more external (semantic) details (Addis et al., 2010; Abram et al., 2014). This effect likely reflects a shift in the narrative style of older adults, wherein overall meaning and context are favored over specificity and detail (reviewed by Schacter et al., 2013)<sup>4</sup>.

Older adults also date imagined future events and future self-images much closer to the present time than younger adults (Chessell et al., 2014). This finding has been replicated in naturalistic mind-wandering paradigms, with older adults engaging in more atemporal/present-oriented rather than future-oriented spontaneous thoughts (Irish et al., 2018). Such changes make intuitive sense given the increased likelihood of negative events as one nears the end of the lifespan (Chessell et al., 2014). Similarly, worry in older adults shifts to considerations about "family concerns" and "world issues" (for review see Miloyan and Bulley, 2016), and this effect is further apparent in naturally occurring spontaneous thoughts, which tend to become less self-focussed (Irish et al., 2018). We tentatively suggest that such alterations in prospection serve a protective function in older age, potentially mediating the well-documented "positivity effect" in healthy aging (Carstensen et al., 2005). When viewed from a functional perspective, the available evidence suggests that the benefits conferred in terms of life outlook and positivity in older adults compensate for their reduction in episodic specificity.

### FUTURE DIRECTIONS

### Measurement

Given the multifaceted nature of prospection and its diversity of outcomes, how we define and measure it is paramount. The literature is replete with experimental techniques to assess prospection in its many guises. For example, the provision of 'internal' (episodic) contextual details is widely used as a marker of the episodic specificity of simulated future events (e.g., Addis et al., 2008), while the number of fulfilled intentions reflects prospective memory capacity (for review see Brandimonte et al., 2014). Miloyan and McFarlane (2018) performed a systematic review of existing episodic foresight tasks and categorized these measures into six main subcategories: (i) phenomenology (60%); (ii) examination (49%); (iii) fluency (12%); (iv) reaction time (12%); (v) sentence completion (5%); and (vi) thought sampling (2%). They concluded that none of the available instruments has been validated to acceptable psychometric standards. An important goal, then, is to develop appropriate measurement tools that permit the reliable assessment of prospection in clinical settings. The refinement of coding protocols to index the intersection of episodic and semantic elements within future thinking narratives may further offer improved differentiation between clinical syndromes (Strikwerda-Brown et al., 2018), moving beyond a strict episodic-semantic dichotomy when assessing prospection (Irish and Piguet, 2013; Szpunar et al., 2014).

### Toward Enhancement and Treatment

Finally, we briefly consider the pertinent question of how to augment prospection to support everyday function in healthy individuals and to intervene effectively in the context of impairment. We propose two broad categories that hold promise:

<sup>3</sup>The capacity to imagine virtually any possible future threat event and generate anxiety before any cues of danger arise is of obvious evolutionary advantage (Miloyan et al., 2018). Even in contemporary environments, it may motivate people to take precautionary steps to avoid danger (Marks and Nesse, 1994; Nesse, 2011). However, threat prospection is also a potent source of distress, to the point of being a core diagnostic feature of anxiety disorders such as generalized anxiety disorder (GAD). This case illustrates the nuances of addressing the potential 'functionality' of prospection variation in clinical contexts.

<sup>4</sup>Note that in developmental psychology the focus of research has been less on the content or format of prospection, and instead has concerned the fundamentals of the capacity itself: i.e., what is the developmental trajectory of future-directed cognition and when do the relevant subcomponents 'come online'? (for reviews see Atance, 2015; Suddendorf and Redshaw, 2013; Suddendorf, 2017).

after dragging non-target circles to the bottom of the box (D,E) – the new location of the target circles will remind them of the required action (F) (Redshaw et al., 2018). Child Development © 2018 Society for Research in Child Development, Inc. All rights reserved. 0009-3920/2018/8906-0015.

### Training Approaches

There have been numerous attempts to (a) directly improve simulation abilities or to guide the content thereof, and (b) use simulation abilities to augment other functions. The first category includes protocols such as working memory training (Bickel et al., 2011; Hill and Emery, 2013) and episodic specificity induction techniques (Madore et al., 2014) to bolster the provision of episodic detail during prospection. The second category includes the use of future event simulation techniques to improve prospective memory performance (Brewer and Marsh, 2010; Neroni et al., 2014; Altgassen et al., 2015), and to reduce delay discounting (e.g., Peters and Büchel, 2010). The applicability of the above-described approaches to clinical populations, however, remains largely unknown. In severe clinical cases where such interventions are arguably most necessary, it may be particularly difficult to implement simulation training or to leverage prospection to improve other tasks. Moreover, given that prospection is adversely affected in clinical conditions including depression (Williams et al., 1996; Addis et al., 2016), the efficacy and generalizability of such approaches remains an important open question<sup>5</sup>.

### Strategic Compensation

Metacognitive insight enables people to appreciate that their simulations of the future 'could be wrong'. This insight allows people to amend and update their expectations as appropriate, as well as to perform various strategic behaviors to compensate for prospection failures (Redshaw and Bulley, 2018). Two prominent examples are contingency planning and cognitive offloading:

(a) Contingency planning for mutually exclusive possible outcomes is a complex ability that requires the insight that one's representations of the future could be incorrect. Contingency planning is critical for numerous functions in everyday life, from arranging insurance and keeping receipts, to planning alternative transport options for important appointments; and from packing an umbrella in case it rains to backing-up one's hard-drive in case it gets corrupted. Fundamentals of contingency planning for mutually exclusive future events have been studied in child development and in other animals (e.g., Redshaw and Suddendorf, 2016), but its application in clinical settings has yet to receive concerted attention. Nonetheless, we note that some of the non-verbal protocols stemming from developmental and comparative psychology hold potential for translation into clinical populations characterized by cognitive impairment (see **Figure 2**, panel (i) for a recent example of a paradigm for exploring the capacity to prepare for mutually exclusive future events).

<sup>5</sup>Note that there have also been some calls to directly target prospection in depression with 'future directed' therapies (Vilhauer et al., 2012; Roepke and Seligman, 2015).


(b) Humans frequently set reminders, write lists, and modify their present surroundings in a variety of ways to augment future cognitive performance. With the increasing ubiquity of technologies that permit future-directed cognitive offloading, such as calendars, alarms, and digital personal assistants, such strategies represent promising forms of intervention in clinical settings (see **Figure 2**, panel (ii) for a recently developed minimalistic paradigm to examine cognitive offloading). Successful pre-emptive compensation through cognitive offloading likely requires metacognitive insight into the limits of one's own future performance (Risko and Gilbert, 2016); offloading may therefore be most suitable as an intervention in clinical populations in which awareness of disorder-related limitations remains intact.

### CONCLUSION

Prospection is a multifaceted construct that supports a diverse range of important functions, including goal-directed behavior and flexible decision-making. Our brief survey of the extant literature, focussing on episodic future thinking, highlights the manifold expressions of prospection and how its functional outcomes can vary according to individual differences (e.g., addiction), lifespan changes (e.g., healthy aging), and disruption of underlying neurocognitive mechanisms (e.g., dementia). We suggest that this inherent variability in the outcomes of prospection may serve important adaptive functions, as exemplified in healthy aging. Perhaps most importantly, we note the potential for shifts in content to give rise to maladaptive expressions of prospection even when the underlying mechanisms appear to be in working order or even augmented (e.g., anxiety). The factors that predispose individuals to maladaptive expressions of prospection remain unclear, yet a precise understanding of them will be critical to inform targeted behavioral interventions. Our intention here is to stimulate further research into simulation-based training and 'strategic compensation' approaches, both to explore the fundamentals of prospection in clinical contexts and, ultimately, to improve wellbeing in everyday life.

### AUTHOR CONTRIBUTIONS

AB and MI contributed equally to the conceptualization, literature review, and writing of this manuscript.

### FUNDING

This work was supported in part by the Australian Research Council (ARC) Centre of Excellence in Cognition and its Disorders (CE110001021). MI is supported by an ARC Future Fellowship (FT160100096).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer GR declared to the handling Editor a shared affiliation, with no collaboration, with one of the authors, AB, at the time of the review.

Copyright © 2018 Bulley and Irish. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.