# MATHEMATICAL MODELING TOWARD UNDERSTANDING HUMANS AND ANIMALS: FROM DECISION MAKING TO MOTOR CONTROLS

EDITED BY: Hiroshi Yamada, Kenway Louie, Jun Izawa and Tomohiko Takei

PUBLISHED IN: Frontiers in Neuroscience and Frontiers in Computational Neuroscience

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-303-3 DOI 10.3389/978-2-88966-303-3

Topic Editors:

Hiroshi Yamada, University of Tsukuba, Japan

Kenway Louie, New York University, United States

Jun Izawa, University of Tsukuba, Japan

Tomohiko Takei, Kyoto University, Japan

Citation: Yamada, H., Louie, K., Izawa, J., Takei, T., eds. (2020). Mathematical Modeling Toward Understanding Humans and Animals: From Decision Making to Motor Controls. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-303-3

# Table of Contents

*Visual Feature Integration of Three Attributes in Stimulus-Response Mapping is Distinct From That of Two*

Mizuki Furutate, Yumiko Fujii, Hiromi Morita and Masahiko Morita

*Stabilization of a Cart Inverted Pendulum: Improving the Intermittent Feedback Strategy to Match the Limits of Human Performance*

Pietro Morasso, Taishin Nomura, Yasuyuki Suzuki and Jacopo Zenzeri


Atsushi Fujimoto and Takafumi Minamimoto


Daiki Tamura, Shinya Aoi, Tetsuro Funato, Soichiro Fujiki, Kei Senda and Kazuo Tsuchiya

*Non-uniqueness Phenomenon of Object Representation in Modeling IT Cortex by Deep Convolutional Neural Network (DCNN)*

Qiulei Dong, Bo Liu and Zhanyi Hu

# Visual Feature Integration of Three Attributes in Stimulus-Response Mapping Is Distinct From That of Two

Mizuki Furutate<sup>1</sup>†, Yumiko Fujii<sup>2</sup>, Hiromi Morita<sup>3</sup> and Masahiko Morita<sup>4</sup>\*

<sup>1</sup> Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan, <sup>2</sup> Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Japan, <sup>3</sup> Faculty of Library, Information and Media Science, University of Tsukuba, Tsukuba, Japan, <sup>4</sup> Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan

#### Edited by:

Jun Izawa, University of Tsukuba, Japan

#### Reviewed by:

Leila Montaser-Kouhsari, Columbia University, United States

Vijay Mohan K. Namboodiri, University of North Carolina at Chapel Hill, United States

\*Correspondence: Masahiko Morita, mor@bcl.esys.tsukuba.ac.jp

†Present address: Mizuki Furutate, NS Solutions Corporation, Tokyo, Japan

#### Specialty section:

This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 15 November 2018

Accepted: 15 January 2019

Published: 13 February 2019

#### Citation:

Furutate M, Fujii Y, Morita H and Morita M (2019) Visual Feature Integration of Three Attributes in Stimulus-Response Mapping Is Distinct From That of Two. Front. Neurosci. 13:35. doi: 10.3389/fnins.2019.00035

In the human visual system, different attributes of an object are processed separately and are thought to be then temporarily bound by attention into an integrated representation to produce a specific response. However, if such representations existed in the brain for arbitrary multi-attribute objects, a combinatorial explosion problem would be unavoidable. Here, we show that attention may bind features of different attributes only in pairs and that bound feature pairs, rather than integrated object representations, are associated with responses for unfamiliar objects. We found that in a mapping task from three-attribute stimuli to responses, presenting three attributes in pairs (two attributes in each window) did not significantly complicate feature integration and response selection when the stimuli were not very familiar. We also found that repeated presentation of the same triple conjunctions significantly improved performance on the stimulus-response task when the correct responses were determined by the combination of three attributes, but this familiarity effect was not observed when the response could be determined by two attributes. These findings indicate that integration of three or more attributes is a distinct process from that of two, requiring long-term learning or some serial process. This suggests that integrated object representations are not formed or are formed only for a limited number of very familiar objects, which resolves the computational difficulty of the binding problem.

Keywords: feature integration, binding problem, stimulus-response mapping, visual attention, object representation

### INTRODUCTION

The human visual system is considered to process different visual attributes, such as shape, color, motion, and texture, separately in different modules (Livingstone and Hubel, 1987). The integration of these distinct attributes to produce a unified percept and specific response is known as the binding problem (von der Malsburg, 1981; Treisman, 1996), one of the most important open problems in cognitive psychology and neuroscience. One main reason for the difficulty of this problem is the explosion of feature combinations, that is, the fact that the number of possible combinations of features of all attributes is extremely large. This problem is critical not only for the "cardinal cell" concept, which hypothesizes that all attributes are integrated via converging hard-wired connections into an integrated representation, but also for the concept of binding via synchronous firing of neurons (von der Malsburg, 1981; Singer and Gray, 1995), because this requires as many synchrony detectors as the number of feature combinations (Shadlen and Movshon, 1999).

Psychological studies (Luck and Vogel, 1997; Treisman, 1999; Wolfe and Cave, 1999) show that there exists a mechanism that integrates arbitrary combinations of features. According to the standard theory of feature integration (Treisman and Gelade, 1980), when attention is focused on an object, all attributes of the object are rapidly bound into a unified representation for higher cognitive processing (Treisman, 1988; Kahneman et al., 1992), which we refer to as the all-attribute model. However, no neural mechanisms have been found for such binding that are free from the combinatorial explosion problem. A clue to resolving this conflict may be that psychological evidence supporting the existence of feature binding does not require the existence of unified representations of all attributes. In fact, most studies of attentional binding have used two-attribute stimuli, and no studies have confirmed that three or more attributes are directly bound into unified representations. Furthermore, Hommel (1998) reported that only two-way interactions between feature-repetition effects were observed in a prime-probe stimulus-response (SR) task, suggesting that temporary binding may be binary, and that an object representation may comprise a loosely connected, distributed network of pairwise bindings rather than a unitary structure (Hommel and Colzato, 2004).

Accordingly, we hypothesized that attention can bind only pairs of attributes and that unified representations of three or more attributes are not formed (the "no-triplet hypothesis"), except perhaps in the case of a limited number of familiar objects. Based on this hypothesis, Morita et al. (2010) developed a paired-attribute model, in which cognitive processes are based on multiple representations of paired attributes and their interactions, and discovered a new illusion arising from erroneous integration of attribute pairs, consistent with the model's prediction. Moreover, Ishizaki et al. (2015) showed that learning and performance for SR tasks were more difficult when three attributes of the stimulus determined the correct response (Triple condition) than when two attributes did (Double condition), suggesting that bound feature pairs, rather than object representations, are associated with responses.
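To give a sense of the scale behind the combinatorial explosion argument, the following minimal sketch compares the number of full conjunctions with the number of pairwise conjunctions. The attribute counts and the choice of 10 features per attribute are illustrative assumptions, and the function names are hypothetical; neither comes from the studies cited above.

```python
from math import comb


def n_full_conjunctions(n_attributes: int, n_features: int) -> int:
    """Distinct objects when each attribute takes one of n_features values."""
    return n_features ** n_attributes


def n_pairwise_conjunctions(n_attributes: int, n_features: int) -> int:
    """Feature pairs summed over all pairs of different attributes."""
    return comb(n_attributes, 2) * n_features ** 2


# Illustrative values only: 10 features per attribute is an assumption.
for k in (2, 3, 5, 10):
    print(k, n_full_conjunctions(k, 10), n_pairwise_conjunctions(k, 10))
```

Under these assumptions the full-conjunction count grows exponentially with the number of attributes, whereas the pairwise count grows only quadratically.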

The results of the study by Ishizaki et al. support not only the paired-attribute model but also the no-triplet hypothesis, because the task was designed such that integration of multiple attributes was necessary. It seems unlikely that integrated representations of three attributes existed but were not used for such a task. To explain this in more detail, let us assign S<sub>1</sub> and S<sub>2</sub> as shape features, C<sub>1</sub> and C<sub>2</sub> as color features, and S<sub>i</sub>C<sub>j</sub> as the conjunction of S<sub>i</sub> and C<sub>j</sub>. If stimuli S<sub>1</sub>C<sub>1</sub> and S<sub>1</sub>C<sub>2</sub> are mapped to response R<sub>1</sub>, and stimuli S<sub>2</sub>C<sub>1</sub> and S<sub>2</sub>C<sub>2</sub> to response R<sub>2</sub>, SR mapping is easily achieved by associating S<sub>1</sub> with R<sub>1</sub> and S<sub>2</sub> with R<sub>2</sub>. It is impossible, however, to associate stimuli S<sub>1</sub>C<sub>1</sub> and S<sub>2</sub>C<sub>2</sub> with response R<sub>1</sub> and stimuli S<sub>2</sub>C<sub>1</sub> and S<sub>1</sub>C<sub>2</sub> with response R<sub>2</sub>, without integrating shape and color. Similarly, we can design a mapping between triple conjunctions and responses so that integration of three attributes is required.
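The two shape-color mappings described above can be checked mechanically. The sketch below is an illustration rather than part of the original study; the dictionary layout and the function name `solvable_by_single_attribute` are hypothetical. It tests whether any single attribute is enough to fix the response.

```python
# Mapping that depends on shape alone (first example in the text).
separable = {("S1", "C1"): "R1", ("S1", "C2"): "R1",
             ("S2", "C1"): "R2", ("S2", "C2"): "R2"}

# Mapping that depends on the shape-color conjunction (second example).
conjunctive = {("S1", "C1"): "R1", ("S2", "C2"): "R1",
               ("S2", "C1"): "R2", ("S1", "C2"): "R2"}


def solvable_by_single_attribute(mapping):
    """True if one attribute (index 0 = shape, 1 = color) fixes the response."""
    for idx in (0, 1):
        responses_by_feature = {}
        for stimulus, response in mapping.items():
            responses_by_feature.setdefault(stimulus[idx], set()).add(response)
        # The attribute suffices if each of its features maps to one response.
        if all(len(r) == 1 for r in responses_by_feature.values()):
            return True
    return False


print(solvable_by_single_attribute(separable))    # True
print(solvable_by_single_attribute(conjunctive))  # False
```

The second mapping has the structure of an exclusive-or, which is why no association from a single feature to a response can solve it.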

In contrast, ordinary object recognition, visual search, or short-term memory tasks do not in principle require integration of attributes, because the tasks can be solved by comparing features for each attribute and integrating the comparison results; thus, experiments using such tasks cannot provide compelling evidence against the existence of integrated object representations. Accordingly, investigating the mapping process of multi-attribute stimuli to responses is critical to elucidate the representation underlying not only decision making, but also other various cognitive processes.

In the present study, we extended the previous study by Ishizaki et al. to obtain additional convincing evidence for the no-triplet hypothesis. Specifically, we performed the following two experiments using SR mapping tasks.

In Experiment 1, we tested a prediction derived from the paired-attribute model. In the previous study, spatially separated presentations of two or three attributes considerably complicated the SR task, although they did not markedly affect the target detection task, which does not require feature integration and response selection (Ishizaki et al., 2015). This indicates that feature integration and response selection became more difficult because separately presented features were not automatically bound by attention. The all-attribute model predicts that the same will occur if three attributes are presented separately in pairs (paired presentation), i.e., the SR task will be more complicated than the target detection task. However, according to the paired-attribute model, a three-attribute stimulus, say a red lattice-patterned circle, is represented by three attribute pairs—red circle, lattice-patterned circle, and red lattice pattern—which are separately associated with a response. This association process would be the same when three two-attribute stimuli are presented, and thus paired presentation will not affect feature integration and response selection. Accordingly, the paired-attribute model predicts that the paired presentation will not complicate the SR task more than the target detection task.

In Experiment 2, we examined the effect of stimulus familiarity on the SR mapping task. The no-triplet hypothesis does not exclude integrated object representations for a limited number of familiar objects, implying that repeated presentation of the same feature combinations may promote their integration. The all-attribute model predicts that, if all attributes are presented as a single stimulus, the familiarity effect will not appear or will appear independently of the number of attributes that need to be integrated; the effect may appear more clearly when attributes are presented individually or in pairs so that the attributes cannot be bound by attention. In contrast, the paired-attribute model predicts that the familiarity effect will not appear strongly with the Double condition, because even unfamiliar feature pairs can be quickly bound by attention, but may appear more clearly with the Triple condition, because integration of three attributes would require long-term learning. Thus, we compared familiar and unfamiliar stimuli, with participants performing a familiarization task on the first day and an SR task on the following day.

### MATERIALS AND METHODS

fnins-13-00035 February 11, 2019 Time: 15:57 # 3

#### Ethics Statement

This study was approved by the Ethical Committee of the Faculty of Library, Information and Media Science, University of Tsukuba, Japan, and was conducted in accordance with the Code of Ethics and Conduct of the Japanese Psychological Association. Written informed consent was obtained from all participants.

#### Experiment 1

The participants were 17 students (7 male and 10 female) with normal or corrected-to-normal vision. They were all paid volunteers who were uninformed of the experimental purpose. Participants viewed a CRT display from a distance of 114.5 cm in a dark room and responded by pressing keys on a numerical keypad. They performed SR trials and target detection trials (**Figure 1A**).

The display screen was gray (9.0 cd/m<sup>2</sup>), subtending 7.1 × 5.7° of visual angle, and had two large (1.9°) and four small (1.2°) square windows filled in black (**Figure 1B**). Stimuli were generated by combining two shapes (circle and diamond), two colors (red and green) with equal luminance (6.4 cd/m<sup>2</sup>), and two textures (lattice and random hashed lines) with equal average luminance (3.7 cd/m<sup>2</sup>) (**Figure 1C**). These features were common to all participants, but the mapping from feature combinations to response keys varied (counterbalanced across participants).

In each SR trial, after a blank screen showing only the presentation windows, one of the eight feature combinations was presented in the windows. Participants were instructed to select one of the four arrow keys and press it as quickly and accurately as possible. If the response was correct, the stimulus disappeared, and the next trial started with a 1000 ms blank screen; however, if the response was incorrect or no key was pressed within 2000 ms, a 400 Hz (incorrect) or 900 Hz (timeout) buzzer sounded for 150 ms and an arrow indicating the correct key was presented for 800 ms, after which the next trial started with a 200 ms blank screen.

In target detection trials, one of the eight feature combinations was designated as the target. Participants were requested to press a response key as quickly and accurately as possible when the target was presented in any presentation manner. If participants responded incorrectly to a non-target stimulus, a 400 Hz buzzer sounded, and if participants did not respond to the target within 1000 ms, a 900 Hz buzzer sounded. Simultaneously, with a correct response or a buzzer sound, the stimulus disappeared and the next trial started immediately.

There were three conditions: "Unified," "Paired," and "Separate." In the Unified condition, two three-attribute stimuli were presented in two large windows (**Figure 1B**, left panel). These two stimuli were identical in most cases (10/11), and participants were requested to press one of the response keys as quickly and accurately as possible. Occasionally (1/11), however, the two objects were different in shape, in which case the participants were instructed not to press any key, which made it necessary to attend to both windows. In the Paired condition, shape-color and shape-texture stimuli were presented inside the two large windows, and a color-texture stimulus was presented to fill the upper or lower (randomly selected) middle window (**Figure 1B**, middle panel). The participants were requested to press a key according to the combination of three attributes, except on occasional trials (1/11) when the shapes in the two large windows were different. In the Separate condition, the shape was presented inside the two large windows, the color was presented to fill the upper middle and the lower left small windows, and the texture was presented to fill the upper right and the lower middle small windows (**Figure 1B**, right panel). The participants were requested to press a key in the same way as in the Paired condition.

FIGURE 1 | Experimental paradigm for Experiment 1. (A) Schematic procedure for stimulus-response (SR) and target detection trials. (B) Stimulus display. In the Unified, Paired, and Separate conditions, six features were presented in two, four, or six windows, respectively. Participants were instructed to respond as quickly and accurately as possible to the stimulus presented. (C) Correspondence between stimuli and responses in SR trials. The two stimuli comprising sets SC, ST, or CT differed only in texture, color, or shape, respectively, and corresponded to the same response key, whereas those in set SCT differed in all attributes.

The mapping from the feature combinations to response keys is illustrated in **Figure 1C**, where the combination of shape S<sub>i</sub>, color C<sub>j</sub>, and texture T<sub>k</sub> is denoted as S<sub>i</sub>C<sub>j</sub>T<sub>k</sub> (i, j, k = 1 or 2). In set SC, for example, the combination presented was S<sub>1</sub>C<sub>1</sub>T<sub>1</sub> or S<sub>1</sub>C<sub>1</sub>T<sub>2</sub>, and these were mapped to R<sub>1</sub>. Thus, the correct response was determined by shape and color but did not depend on texture. Similarly, the correct response did not depend on color and shape in sets ST and CT, respectively. In contrast, the three attributes were all critical in set SCT. One of the stimuli in sets SC, ST, and CT was presented as the "Double" condition, and either stimulus in set SCT was presented as the "Triple" condition. Combining these two conditions with three presentation conditions created six cases, which are denoted as Double-Unified, Triple-Paired, etc.
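The distinction between the Double and Triple conditions can be made concrete with a small sketch. Only the set SC entries (both mapped to R1) are taken from the text; the remaining assignments are one hypothetical mapping chosen to satisfy the stated set definitions, since the actual mapping was counterbalanced across participants. The function name `critical_attributes` is likewise an assumption for illustration.

```python
from itertools import combinations

ATTRS = ("shape", "color", "texture")

# Hypothetical assignment consistent with the set definitions in the text.
MAPPING = {
    ("S1", "C1", "T1"): "R1", ("S1", "C1", "T2"): "R1",  # set SC
    ("S2", "C1", "T1"): "R2", ("S2", "C2", "T1"): "R2",  # set ST
    ("S1", "C2", "T2"): "R3", ("S2", "C2", "T2"): "R3",  # set CT
    ("S1", "C2", "T1"): "R4", ("S2", "C1", "T2"): "R4",  # set SCT
}


def critical_attributes(stimulus):
    """Smallest attribute subset whose features fix this stimulus's response."""
    target = MAPPING[stimulus]
    for size in (1, 2, 3):
        for idxs in combinations(range(3), size):
            # Responses of all stimuli sharing the features at these indices.
            matches = [r for s, r in MAPPING.items()
                       if all(s[i] == stimulus[i] for i in idxs)]
            if all(r == target for r in matches):
                return tuple(ATTRS[i] for i in idxs)


for stim in MAPPING:
    print(stim, MAPPING[stim], critical_attributes(stim))
```

With this assignment, every stimulus in sets SC, ST, and CT is resolved by one attribute pair, whereas both stimuli in set SCT require all three attributes.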

In each SR trial, one of the eight feature combinations and one of three presentation manners were pseudo-randomly selected, the stimulus display was presented, and the participant responded to it. The participants first performed 24 practice trials and then 10 blocks of experimental trials for the SR task. Each block comprised 240 (8 × 3 × 10) SR trials, in which each feature combination appeared in each presentation manner 10 times, and 24 "catch" trials in which the shapes presented in the two large windows were different. The pseudo-random order of the stimuli was predetermined under the constraint that two different stimuli in the same set (corresponding to the same response key) were never presented in consecutive trials, so that participants could not easily comprehend the mapping to a specific response.
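One way to produce such a constrained order is sequential sampling with restarts. The sketch below is only an assumption about how it could be done, since the text does not describe the actual procedure. The stimulus labels, the function names, and the omission of catch trials are all simplifications introduced here.

```python
import random

# Hypothetical labels: two stimuli (a, b) in each of the four sets.
STIMULI = ["SC_a", "SC_b", "ST_a", "ST_b", "CT_a", "CT_b", "SCT_a", "SCT_b"]
MANNERS = ["Unified", "Paired", "Separate"]
REPEATS = 10


def same_set_different_stimulus(a, b):
    """True if a and b are different stimuli mapped to the same response key."""
    return a != b and a.split("_")[0] == b.split("_")[0]


def make_block(rng, max_attempts=1000):
    """Build one 240-trial SR block by constrained sampling with restarts."""
    for _ in range(max_attempts):
        remaining = {(s, m): REPEATS for s in STIMULI for m in MANNERS}
        order, prev = [], None
        while remaining:
            candidates = [t for t in remaining
                          if prev is None
                          or not same_set_different_stimulus(prev, t[0])]
            if not candidates:
                break  # dead end: discard this attempt and start over
            weights = [remaining[t] for t in candidates]
            trial = rng.choices(candidates, weights=weights, k=1)[0]
            order.append(trial)
            remaining[trial] -= 1
            if remaining[trial] == 0:
                del remaining[trial]
            prev = trial[0]
        if not remaining:
            return order
    raise RuntimeError("no valid order found")


block = make_block(random.Random(0))
print(len(block))  # 240
```

Adding the 24 catch trials gives 264 trials per block, so catch trials make up 24/264 = 1/11 of the block, matching the proportion stated for the Unified and Paired conditions.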

Next, the participants performed 24 practice trials and one block of experimental trials for the target-detection task, in which one block comprised 240 (8 × 3 × 10) target-response trials and 24 catch trials, with the target appearing 30 times.

#### Experiment 2

The participants were 18 students (5 male and 13 female) with normal or corrected-to-normal vision. They were all paid volunteers who were uninformed of the experimental purpose and did not participate in Experiment 1. They performed a familiarization task on the first day and an SR task on the following day. The experimental environment was the same as that in Experiment 1.

In the familiarization task, the participants performed 42 blocks of target detection trials. For each block, one of the six stimuli shown in **Figure 2A** (fixed for all participants) was specified as the target, and the participants were instructed to press any key within 500 ms, only when the target was presented. Six three-attribute stimuli that differed from the target in only one attribute (shape, color, or texture) and would not be used in the SR task, were used as non-targets. If participants responded incorrectly to a non-target stimulus, a 400 Hz buzzer sounded, and if participants did not respond to the target within 500 ms, a 900 Hz buzzer sounded. The stimulus disappeared simultaneously with a correct response or a buzzer sound, and the next trial started immediately. Each block comprised 130 trials, in which the target appeared 100 times and non-targets appeared 30 (6 × 5) times in a random order.

After finishing one block, the participants proceeded to the next block, in which another stimulus was specified as the target. Six blocks, for six target stimuli, composed one cycle. Participants repeated seven cycles and viewed each of the six stimuli 700 times, which were used as the familiar stimuli in the SR task performed on the next day.
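The trial counts of the familiarization task follow directly from the block structure. The short check below uses only numbers given in the text; the variable names are hypothetical.

```python
TARGET_TRIALS_PER_BLOCK = 100
NON_TARGET_STIMULI = 6
NON_TARGET_REPEATS = 5
BLOCKS_PER_CYCLE = 6  # one block per target stimulus
CYCLES = 7

trials_per_block = TARGET_TRIALS_PER_BLOCK + NON_TARGET_STIMULI * NON_TARGET_REPEATS
total_blocks = BLOCKS_PER_CYCLE * CYCLES
exposures_per_target = TARGET_TRIALS_PER_BLOCK * CYCLES
total_trials = trials_per_block * total_blocks

print(trials_per_block, total_blocks, exposures_per_target, total_trials)
# 130 42 700 5460
```

The 700 exposures count only the presentations of a stimulus as the target, because the non-target stimuli were not among the six familiarized stimuli.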

The SR task was similar to that in Experiment 1, except that the Triple and Double conditions and the three presentation manners were not mixed in the same session. We also decreased the time limit when the average PCR in the previous block was over 90%, to create higher time pressure.

The experiment was performed under four conditions: Triple-Unified, Triple-Paired, Triple-Separate, and Double-Unified. The Triple conditions (Unified, Paired, and Separate) were always performed in the order Separate–Paired–Unified to avoid the influence of viewing unfamiliar triple feature conjunctions on subsequent conditions. The Double-Unified condition was given first for half of the participants and last for the other half. Participants were requested to press the correct response key as quickly as possible within a time limit.

In the Triple-Unified condition, eight stimuli were mapped to four response keys, as shown in **Figure 2B**. The correct response was always determined by three attributes, and each response key corresponded to one familiar and one unfamiliar stimulus. Each trial started with a blank screen, which was gray (9.0 cd/m<sup>2</sup>), subtending 5.7 × 5.7° of visual angle, and had a single square window (1.9°) filled in black, after which one of the eight stimuli shown in **Figure 2B** was presented. If the response was correct, the stimulus disappeared, and the next trial started with a 1000 ms blank screen; however, if the response was incorrect or no key was pressed within the time limit, a 400 Hz (incorrect) or 900 Hz (timeout) buzzer sounded for 150 ms after the disappearance of the stimulus, and an arrow indicating the correct key was presented for 600 ms, after which the next trial started with a 400 ms blank screen.

The Triple-Paired and Triple-Separate conditions differed from the Triple-Unified condition only in that the blank screen had two (Paired) or one (Separate) large (1.9°) and one or two small (1.2°) square windows, and three attributes of the stimuli in **Figure 2B** were presented in pairs or separately in these windows (**Figure 2C**). The same combination was mapped to different keys among these three Triple conditions (e.g., S<sub>1</sub>C<sub>1</sub>T<sub>1</sub> was mapped to R<sub>1</sub>, R<sub>2</sub>, and R<sub>3</sub> in the Triple-Unified, -Paired, and -Separate conditions, respectively), and the participants performed the task in the order of separate, paired, and unified presentations, so that unfamiliar feature combinations would not become familiar.

The Double-Unified condition was the same as the Triple-Unified condition in the manner of stimulus presentation, but a different stimulus set (**Figure 2D**) was used. These eight stimuli were common to all participants, but three kinds of mapping were each applied to one third of the participants. That is, in addition to the mapping shown in **Figure 2D**, which consists of sets SC (the response is determined by shape and color) and CT (the response is determined by color and texture), mappings consisting of sets SC and ST (the response is determined by shape and texture) and consisting of sets CT and ST were used. In **Figure 2D**, the two stimuli surrounded by red lines were familiar triple conjunctions (Familiar case) and the others were unfamiliar triple conjunctions (Unfamiliar case), but each unfamiliar stimulus contained one familiar feature pair. We dealt with each case, in which the familiar feature pair was critical for determining the response (case Familiar feature pair, surrounded by orange lines), separately from the Unfamiliar case.

Participants first performed four blocks of practice trials in the Triple-Unified condition with a novel stimulus set, whose components were completely different from those for experimental trials, and performed 10 blocks of experimental trials in each condition. Each block comprised 80 (8 × 10) trials, in which each stimulus or feature combination appeared 10 times in a pseudo-random order, with the constraint that two different stimuli corresponding to the same response key were never presented in consecutive trials. The time limit was fixed to 2000 ms during the first five blocks, but it was thereafter controlled according to the average PCR in the previous block. Specifically, if the average PCR for all stimuli was over 90%, the time limit in the next block was shortened such that 90% of correct RTs were within it.
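The time-limit rule can be sketched as follows. The text does not say how the 90% cutoff was computed, so the nearest-rank percentile used here is an assumption, as are the function name `next_time_limit` and its parameters.

```python
def next_time_limit(pcr, correct_rts_ms, current_limit_ms, threshold=90.0):
    """Return the time limit (ms) for the next block.

    pcr: average percentage of correct responses in the previous block.
    correct_rts_ms: response times (ms) of the correct trials in that block.
    """
    if pcr <= threshold or not correct_rts_ms:
        return current_limit_ms  # rule applies only when PCR is over 90%
    ordered = sorted(correct_rts_ms)
    # Nearest-rank 90th percentile (assumed): rank = ceil(0.9 * n).
    rank = (9 * len(ordered) + 9) // 10
    return min(current_limit_ms, ordered[rank - 1])


rts = list(range(100, 1100, 100))        # hypothetical RTs: 100, 200, ..., 1000 ms
print(next_time_limit(95.0, rts, 2000))  # 900
print(next_time_limit(85.0, rts, 2000))  # 2000
```

Because correct responses necessarily fall within the current limit, this rule can only keep or shorten the limit, which is consistent with the stated aim of increasing time pressure.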


### RESULTS

#### Experiment 1


We analyzed data for 17 participants. For each participant and condition, the percentage of correct responses (PCR) of the SR trials for each block was calculated. Response times in "correct" trials were log-transformed and averaged within each block to calculate the mean response time (RT). In the same way, the mean target detection times (TDTs) were calculated from the response times in the target detection trials.
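The per-block summary described above can be sketched as follows. The text does not state the logarithm base or whether the mean was back-transformed, so the natural logarithm and the optional back-transformation to a geometric mean are assumptions; the function name and the trial data are hypothetical.

```python
import math


def block_summary(trials):
    """Summarize one participant, condition, and block.

    trials: list of (is_correct, response_time_ms) tuples.
    Returns (PCR in percent, mean of log response times over correct trials).
    """
    correct_rts = [rt for ok, rt in trials if ok]
    pcr = 100.0 * len(correct_rts) / len(trials)
    mean_log_rt = sum(math.log(rt) for rt in correct_rts) / len(correct_rts)
    return pcr, mean_log_rt


# Hypothetical trials, for illustration only.
trials = [(True, 400), (True, 900), (False, 1500), (True, 600)]
pcr, mean_log_rt = block_summary(trials)
print(pcr)                    # 75.0
print(math.exp(mean_log_rt))  # geometric mean, approximately 600 ms
```

Averaging in the log domain reduces the influence of occasional very slow responses compared with an arithmetic mean of raw response times.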

**Figures 3A,B** show the time course of PCR and RT over the 10 blocks, averaged over the 17 participants. In all conditions, the PCR increased and the RT decreased during the first five blocks, but both were nearly constant thereafter. Therefore, to obtain stable responses, we analyzed only the data for the last half of the blocks (6 to 10).

The average PCR was analyzed using two-way repeated-measures ANOVA, with the number of critical attributes (conditions Double vs Triple) and the manner of presentation (conditions Unified vs Paired vs Separate) as factors. The main effect of attribute number was significant [F(1,16) = 31.0, P < 0.001], indicating that the mean PCR was significantly lower for the Triple conditions than for the Double conditions (**Figure 3C**). The main effect of the presentation manner was marginal [F(2,32) = 2.77, P = 0.078], likely because the correspondence between feature combinations and responses was common to all presentation manners. The interaction was not significant [F(2,32) = 0.19, P = 0.83]. Post-hoc multiple comparisons with Bonferroni correction were performed using two-tailed paired t-tests, and no significant differences were found between the Unified and Paired conditions (P = 0.20), between Unified and Separate (P = 0.24), and between Paired and Separate (P > 0.999).
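For readers unfamiliar with the correction, a minimal sketch of Bonferroni-adjusted p-values is given below. It assumes that the reported P values are adjusted values (the raw p multiplied by the number of comparisons and capped at 1), which the text does not state explicitly; the raw p-values in the example are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.004, 0.08, 0.5]
print(bonferroni(raw))
```

Under this assumption, a value reported as P > 0.999 would correspond to an adjusted p-value that reached the cap of 1.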

The same analysis was applied to the average RT. The main effects of attribute number [F(1,16) = 32.1, P < 0.001] and presentation manner [F(2,32) = 54.7, P < 0.001] were significant, but their interaction was not [F(2,32) = 0.613, P = 0.55]. Post-hoc multiple comparisons with Bonferroni correction were performed using two-tailed paired t-tests, and significant differences were found between the Unified and Paired conditions (P < 0.001), between Unified and Separate (P < 0.001), and between Paired and Separate (P < 0.001).

TDTs were tested using repeated-measures ANOVA with three levels (Unified, Paired, and Separate), and a significant main effect was found [F(2,32) = 14.3, P < 0.001]. Post-hoc multiple comparisons with Bonferroni correction were performed using two-tailed paired t-tests. Significant differences were found between the Unified and Paired conditions (P = 0.005) and between Unified and Separate (P = 0.001), but not between Paired and Separate (P > 0.999) (**Figure 3D**).

FIGURE 3 | Results for Experiment 1. (A) Mean percent correct responses (PCR) versus block number. (B) Mean response time (RT) versus block number. (C) Mean PCR for blocks 6–10. All data points (N = 17) are plotted as dots. (D) Mean target detection time (TDT). (E) Mean response time minus target detection time (RT – TDT) for blocks 6–10. Error bars indicate SEM in all graphs.

The differences in RT between presentation manners include the differences in the time required for perceiving features, and the difference in TDT is considered to mainly reflect the difference in information acquisition time. Accordingly, we examined RT minus TDT (RT – TDT; **Figure 3E**). This value was calculated in each case (TDT is independent of the attribute number) for each participant and analyzed in the same way as PCR. We found that the main effects of attribute number [F(1,16) = 32.1, P < 0.001] and presentation manner [F(2,32) = 6.77, P = 0.004] were significant, but their interaction was not [F(2,32) = 0.613, P = 0.55]. Post-hoc multiple comparisons with Bonferroni correction (two-tailed paired t-test) were then performed, without distinction between the Double and Triple conditions because no significant interaction was found. The differences between the Unified and Separate conditions (P = 0.01) and between Paired and Separate (P = 0.004) were significant, but the difference between Unified and Paired was not (P > 0.999). Finally, we directly tested RT – TDT between the Triple-Unified and Triple-Paired cases and between the Double-Unified and Double-Paired cases, using two-tailed paired t-tests without Bonferroni correction, and confirmed that no significant differences were found [t(16) = 0.908, P = 0.38 and t(16) = 0.431, P = 0.67, respectively].

The above results are summarized as follows: (1) The PCR was significantly smaller and the RT was significantly larger when triple conjunctions of attributes determined the response than when double conjunctions did. (2) The difference in RT between the Paired and Unified conditions was not significantly different from that in TDT, whereas the difference in RT between the Separate condition and the Unified or Paired condition was significantly larger than that in TDT.

#### Experiment 2

We analyzed data from 14 participants whose PCR increased to more than 50% in all conditions. Data from four participants who failed to reach this criterion were excluded. For each participant and condition, PCRs for familiar and unfamiliar stimuli (also for familiar feature pairs in the Double-Unified condition) for each block were calculated. Similarly, RTs in correct trials were log-transformed and averaged to calculate RTs for familiar stimuli (and feature pairs) and unfamiliar stimuli.

**Figure 4A** shows the time courses of the mean PCR and mean RT, with the mean time limit, for the 14 analyzed participants. The curves for the familiar and unfamiliar cases almost overlapped, except the PCR curves in the last two blocks of the Triple-Unified condition.

The average PCR for the last half of the blocks (6 to 10) was tested (**Figure 4B**) using a two-tailed paired t-test for each of the Triple conditions (Unified, Paired, and Separate).


The differences between the Familiar and Unfamiliar cases were significant in the Triple-Unified condition [t(13) = 3.49, P = 0.004], but not significant in the Triple-Paired [t(13) = −0.183, P = 0.86] and Triple-Separate [t(13) = 1.08, P = 0.30] conditions. A repeated-measures ANOVA with three levels (familiar stimuli, familiar feature pairs, and unfamiliar stimuli) was applied to the Double-Unified condition, and no significant main effect was found [F(2,26) = 0.755, P = 0.48]. We also analyzed RT in the same way, but did not find any significant differences (P > 0.46 for all comparisons).

In summary, the effect of stimulus familiarity was observed only when integration of three attributes was necessary and the stimuli were presented in a unified manner.

### DISCUSSION

Conceivable models to explain how automatic binding by attention contributes to the SR mapping objects are as follows:


Our no-triplet hypothesis, which states that attentional binding of arbitrary features occurs only between pairs of attributes and that triplets of attributes of an unfamiliar object are not directly integrated into a unified object representation, accords with the paired-attribute model for unfamiliar objects and conflicts with the all-attribute model for familiar and unfamiliar objects. Thus, let us examine these three models by comparing them with the experimental results.

First, the single-attribute model is not in accordance with the result of Experiment 1 in that RT – TDT was significantly longer for the separate presentation than for the paired or unified presentations, as the difference seems inexplicable without considering the contribution of attentional binding. For the same reason, the model appears inconsistent with the result of Experiment 2 in that the familiarity effect was observed only in the Triple-Unified condition.

Second, the all-attribute model is not in accordance with the results of Experiment 1 in that the PCR was lower and the RT was longer for the Triple condition than for the Double condition, because according to this model, any stimulus is mapped to the response via the object representation integrating three attributes in the Double and Triple conditions.

Additionally, the model appears inconsistent with the results in that separate presentation of stimuli increased RT – TDT compared to unified presentation but paired presentation did not. RT – TDT does not necessarily denote the time required for feature integration and response selection, as the response time is not a simple linear sum of the times for detection, feature integration, and response selection. Nevertheless, the absence of a significant difference in this value indicates that the difference in RT can be explained by the difference in information acquisition time. It may be natural that RT – TDT did not differ between unified and paired presentation for the Double condition, in which the correct response was determined by a feature pair; however, paired presentation did not increase it in the Triple condition either. This fact suggests that presenting two attributes at the same location contributes to feature integration and response selection, but presenting three attributes does not contribute more than that.

In addition, the all-attribute model is not in accordance with the result of Experiment 2 in that the effect of stimulus familiarity was observed in the Triple-Unified condition but not in the Double-Unified condition. Furthermore, the familiarity effect observed in the Triple-Unified condition disappeared in the Triple-Paired condition, implying that for familiar stimuli, paired presentation compared with unified presentation complicates feature integration and response selection in the Triple condition, whereas it does not for unfamiliar stimuli as indicated in Experiment 1. This is also difficult to explain with the all-attribute model.

In contrast, the above experimental results for unfamiliar stimuli are all as predicted or well explained by the paired-attribute model, which can also explain the result for the familiar stimuli. We therefore conclude that our results support the no-triplet hypothesis, indicating that bound feature pairs, rather than integrated object representations, are associated with responses for unfamiliar objects.

The no-triplet hypothesis allows that integrated representations of three or more attributes may exist for very familiar objects. However, this was not demonstrated by Experiment 2, because the familiarity effect was not observed during initial learning and because the task was obviously more difficult in the Triple condition than in the Double condition, even for familiar stimuli (although we cannot directly compare different conditions, the time limit for block 8 in the Triple-Unified condition and block 7 in the Double-Unified condition, for example, differed by more than 700 ms). If integrated representations of three attributes had been completely formed after the familiarization task, learning of the familiar stimuli would have been easier from the start, compared to learning of the unfamiliar stimuli, and performance would not have differed as much between the Triple and Double conditions.

The question, then, is how familiarity affected the feature integration process. According to the paired-attribute model, attributes at the same locations are integrated in pairs by attentional binding, and bound feature pairs are then associated with responses, with familiarity facilitating only the latter process. This model, however, is not in accordance with the result from Experiment 2 in that the familiarity effect was not observed in the paired presentation condition. Thus, a model with an additional path from individual features to responses, or a hybrid of the paired- and single-attribute models, would be more plausible. It should be noted that a distinct mechanism of feature integration using converging hardwired connections from lower-level modules for individual attributes is considered to exist independently of attentional binding (Hommel and Colzato, 2009; Vanrullen, 2009).

According to this two-path model, the results of our experiments can be explained as follows. If the stimulus is unfamiliar, only the first path, which involves attentional binding, is available, and mapping to responses is easy in the Double condition. In the Triple condition, however, mapping from feature pairs to responses is complicated and not easily learned, so that "thinking," or some serial process, would be involved in response selection. On the other hand, the second path is formed and available for very familiar stimuli, and is faster than the first path. This path does not necessarily make use of unified object representations, but may make use of types of integrated representations that do not completely correspond to individual objects. In the above experiment, complete object representations were not formed, presumably because the number of presentations was insufficient, or one day of familiarization was too short, or the familiarization task used did not require feature integration. In any case, if the component feature pairs are familiar but the stimulus is unfamiliar, or if the combination of three features is familiar but they are not presented in a unified manner, the second path would be only partly available, and the familiarity effect would disappear. However, this explanation is rather speculative, and further experiments (particularly with a longer period of familiarization) will be needed.

#### CONCLUSION

In conclusion, the results of the present study indicate that in the mapping of multi-attribute visual stimuli to responses, feature integration of two attributes and of three attributes are distinct processes, in that the former is easy and automatic, and is not affected by the familiarity of feature conjunctions, whereas the latter is more difficult and is facilitated by repeated presentation of triple feature conjunctions. The results also provide additional evidence supporting the no-triplet hypothesis, which greatly facilitates solving the binding problem by avoiding the combinatory explosion problem, as previously discussed (Ishizaki et al., 2015). However, more evidence would be necessary to establish this hypothesis, because the possibility is not ruled out that attentional binding of three or more attributes may be used in some other cognitive process. It is also unclear how attention binds arbitrary features between pairs of attributes. Although answering this question requires further studies, we note that feature binding between pairs of attributes is computationally much easier than binding all attributes, and several biologically feasible mechanisms may be responsible, such as mutual modulation between neuronal populations encoding different attributes (Morita et al., 2010).

### DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

### AUTHOR CONTRIBUTIONS

MM and MF designed the experiments. MF performed the experiments with assistance from YF. YF and HM analyzed the data and prepared the figures. MM wrote the manuscript.

#### FUNDING

This work was supported in part by JSPS KAKENHI Grant Numbers JP26590173 and JP18H03304 to MM. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with several of the authors MM, MF, YF, and HM.

Copyright © 2019 Furutate, Fujii, Morita and Morita. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Stabilization of a Cart Inverted Pendulum: Improving the Intermittent Feedback Strategy to Match the Limits of Human Performance

Pietro Morasso<sup>1</sup> \*, Taishin Nomura<sup>2</sup> , Yasuyuki Suzuki <sup>2</sup> and Jacopo Zenzeri <sup>1</sup>

*<sup>1</sup> Robotics, Brain and Cognitive Sciences Department, Center for Human Technologies, Italian Institute of Technology, Genoa, Italy, <sup>2</sup> Mechanical Science and Bioengineering Department, Graduate School of Engineering Science, Osaka University, Toyonaka, Japan*

Stabilization of the CIP (Cart Inverted Pendulum) is an analogy to stick balancing on a finger and is an example of unstable tasks that humans face in everyday life. The difficulty of the task grows exponentially with the decrease of the length of the stick and a stick length of 32 cm is considered as a human limit even for well-trained subjects. Moreover, there is a *cybernetic* limit related to the delay of the multimodal sensory feedback (about 230 ms) that supports a feedback stabilization strategy. We previously demonstrated that an intermittent-feedback control paradigm, originally developed for modeling the stabilization of upright standing, can be applied with success also to the CIP system, but with values of the critical parameters far from the limiting ones (stick length 50 cm and feedback delay 100 ms). The intermittent control paradigm is based on the alternation of on-phases, driven by a proportional/derivative delayed feedback controller, and off-phases, where the feedback is switched off and the motion evolves according to the intrinsic dynamics of the CIP. In its standard formulation, the switching mechanism consists of a simple threshold operator: the feedback control is switched off if the current (delayed) state vector is closer to the stable than to the unstable manifold of the off-phase and is switched on in the opposite case. Although this simple formulation is effective for explaining upright standing as well as CIP balancing, it fails in the most challenging configuration of the CIP. In this work we propose a modification of the standard intermittent control policy that focuses on the explicit selection of switching times and is based on the phase reset of the estimated state vector at each switching time and on the simulation of an approximated internal model of CIP dynamics. 
We demonstrate, by simulating the modified intermittent control policy, that it can match the limits of human performance, while operating near the edge of instability.

Keywords: Cart Inverted Pendulum, saddle-like instability, intermittent feedback control, phase reset, internal model simulation

#### Edited by:

*Jun Izawa, University of Tsukuba, Japan*

#### Reviewed by:

*Tetsuro Funato, University of Electro-Communications, Japan Shunta Togo, University of Electro-Communications, Japan*

> \*Correspondence: *Pietro Morasso pietro.morasso@iit.it*

Received: *08 January 2019* Accepted: *11 March 2019* Published: *05 April 2019*

#### Citation:

*Morasso P, Nomura T, Suzuki Y and Zenzeri J (2019) Stabilization of a Cart Inverted Pendulum: Improving the Intermittent Feedback Strategy to Match the Limits of Human Performance. Front. Comput. Neurosci. 13:16. doi: 10.3389/fncom.2019.00016*

### INTRODUCTION

The manual stabilization of an inverted pendulum hinged on a cart, allowed to shift in a forward/backward manner (in short, CIP: Cart Inverted Pendulum), is an example of the many unstable tasks that humans must face in everyday life. It is indeed a standardized implementation of the well-known stick balancing task, where human subjects enjoy the challenge of stabilizing a rigid stick on their fingertips in the vertically inverted position. Other challenging tasks that share similar dynamics, although quite different in many respects, are tightrope walking or walking on stilts. Apparently, a different ballgame is the task of upright standing, which any healthy adult is capable of managing in an effortless manner, without considering it "challenging" in any sense. However, although this "trivial" skill shares with the other "challenging" tasks the same inverted pendulum biomechanics, it differs in a specific but relevant aspect related to the control strategy, namely the availability of muscle stiffness (more specifically ankle stiffness) as a stabilizing mechanism, a feature which is not physically possible in stick balancing or walking on stilts.

Although the different balancing paradigms mentioned above involve a number of degrees of freedom, it is always possible, at least as a first approximation, to focus on a simplified inverted pendulum paradigm (IP) with a single degree of freedom: the ankle joint, in the case of upright standing, or the virtual joint that characterizes the relative motion of the stick on the fingertip in the stick balancing task. In the former case the neural controller can combine two stabilizing mechanisms, namely coactivation of ankle muscles in order to modulate ankle stiffness, and active generation of ankle torque, on the basis of a feedback control loop driven by sensory feedback of the body sway. In the case of stick balancing, in contrast, the stiffness of the virtual joint is null by definition and the only available control strategy is feedback based. As a matter of fact, the simplicity and availability of a stiffness mechanism has led some researchers (Winter et al., 1998) to support the hypothesis that an ankle stiffness strategy is sufficient for the stabilization of upright standing, without any need for an additional control loop that is complicated by the significant delay of sensory feedback. Unfortunately, direct measurements of ankle stiffness (Loram and Lakie, 2002; Casadio et al., 2005) as well as the detailed analysis of spindle feedback (van Soest et al., 2003) ruled out the chances of stabilizing upright stance with a pure stiffness strategy. However, stiffness does contribute to stabilization in such a paradigm, relieving delayed feedback control of a significant part of the effort. The remaining part, however, must struggle with the curse of instability due to delayed sensory feedback, on top of the intrinsic instability of the inverted pendulum mechanics, exactly as in the apparently different IP paradigms mentioned above.
The subjective impression of a marked difference, in terms of psychophysical challenge, between upright standing and CIP balancing, may be due to the fact that evolutionary adaptation to bipedal standing in humans had the chance to optimally tune the parameters that allow the apparently seamless integration of "passive" stiffness with "active" delayed feedback control, thus making upright standing an apparently trivial action.

Apart from the presence or absence of a stiffness component of the control action, the different IP paradigms differ as regards two other important features: (1) the employed sensory channels (visual, proprioceptive, and vestibular), and (2) the relation between the CoM (the projection of the Center of Mass of the IP on the support base) and the CoP (Center of Pressure, i.e., the centroid of the contact forces exchanged between the IP and the support base). In all cases, the horizontal acceleration of the CoM, with reference to an unstable equilibrium position typical of any IP system, is approximately proportional to the difference between the position of the CoM and the position of the CoP; moreover, the two variables (CoM and CoP position) can switch their role in the control framework, as controlled variable vs. control variable, while maintaining the goal of the control action, namely to avoid falling, which means to keep the CoM position within a limited interval around the equilibrium position.
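As a reference point for the proportionality mentioned above, the relation takes a particularly simple form in the textbook case of a point mass at height h above the support base, linearized around the upright position. The symbols h, y<sub>CoM</sub>, and y<sub>CoP</sub> are notation introduced here for illustration and are not taken from the article:

$$
\ddot{y}_{CoM} \approx \frac{g}{h} \left( y_{CoM} - y_{CoP} \right)
$$

For a pendulum with distributed mass the constant differs from g/h, but the structure is the same: the CoM accelerates away from the CoP, so either variable can be moved relative to the other to keep the CoM within the admissible interval.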

In standard bipedal upright standing, the CoP is the control variable and its motion is proportional to the variation of the ankle torque related to the activation of the ankle muscles. In stilt standing, which has been studied mainly as regards energetics (Vaida et al., 1981), the position of the CoP is constrained by the environment and cannot be controlled. The same situation also characterizes upright standing in reduced/constrained support conditions, such as standing on a narrow bar or on a tightrope: in such cases oscillations in the medio-lateral direction are compensated by spreading the control action to a number of joints of the lower and upper limbs in order to restrain as much as possible the overall sway of the CoM around the fixed CoP. Moreover, the period of such oscillations can be lengthened, thus simplifying the control action, by grasping a long balancing bar. In the CIP or the stick balancing task the relative position of the stick CoM with respect to the CoP is the controlled variable: vestibular information does not help in this case whereas vision becomes dominant. In any case, the feedback component of the stabilization process relies on sensory feedback information about the state of the controlled object, and the neural controller must overcome multiple sources of instability in addition to the gravitational toppling action, namely feedback time delays and sensory and motor noise (Milton et al., 2008).

There is ample evidence suggesting the discontinuous nature of the feedback control action, irrespective of the different experimental conditions and different body segments. Consider, for example, the analysis of posturographic patterns (Collins and De Luca, 1993; Morasso and Schieppati, 1999; Morasso and Sanguineti, 2002), EMG signals (Gatev et al., 1999; Loram and Lakie, 2002; Nomura et al., 2013), and the non-uniform character of sway path (Jacono et al., 2004). Several types of neural control have been proposed in recent years: time-delayed feedback with multiplicative noise (Cabrera and Milton, 2002), model predictive controllers with a sensory uncertainty (Mehta and Schaal, 2002; Gawthrop et al., 2011; Loram et al., 2011, 2016; Insperger and Milton, 2014), and time-delayed proportional-derivative-acceleration feedback control (Insperger et al., 2012).

Another promising alternative, which was investigated in previous studies specifically for upright standing, is the intermittent time-delayed feedback control policy (referred to as the intermittent-feedback controller or the intermittent-feedback-control strategy in this article), whereby the human body is modeled as a single or a double inverted pendulum (Bottaro et al., 2005, 2008; Asai et al., 2009, 2013; Suzuki et al., 2012). The power of this strategy stems from its ability to take advantage of an "affordance" of the intrinsic dynamics of an inverted pendulum, namely the fact that the upright equilibrium posture with no active feedback is characterized by a saddle-type instability accompanied by a hyperbolic vector field with stable and unstable manifolds in its phase space: when the driving action is switched off, the state vector is attracted to the equilibrium configuration if the vector is closer to the stable than to the unstable manifold, whereas it is repelled in the opposite case. This "affordance" suggests adopting an alternation paradigm between an "off-phase," in the former case, and an "on-phase," based on a simple proportional-derivative feedback of the delayed state vector, in the latter case. Surprisingly, the alternation between the off- and on-phases (although both characterized by unstable dynamics) can lead to overall bounded stability in a robust manner (Bottaro et al., 2008; Asai et al., 2009, 2013; Suzuki et al., 2012). A recent paper (Yoshikawa et al., 2016) showed that this control policy can also be applied with success to the CIP system, providing a robust dynamic stabilization of the inverted stick as well.
Moreover, that study demonstrated that such a control policy, based on the alternation of on-phases and off-phases, can reproduce features of the stick oscillations that are known to characterize the performance of expert CIP users: (1) the temporal fluctuations of the velocity increments of the stick, which are not Gaussian but exhibit a truncated Lévy distribution (Cabrera and Milton, 2004; Cluff and Balasubramaniam, 2009); (2) the corrective fingertip movements, which alternate between phases with extremely low movement amplitudes and those with high movement amplitudes, according to a power-law distribution of the inter-corrective movement intervals (Cabrera and Milton, 2002).
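The threshold rule described above can be sketched in a few lines. This is a minimal illustration, assuming the off-phase dynamics are linearized as θ̈ = λ²θ, so that the stable and unstable manifolds are the lines θ̇ = −λθ and θ̇ = +λθ in the (θ, θ̇) plane. The function name, the use of Euclidean distance in unscaled coordinates, and the value of λ are assumptions made here; the cited controllers may define the off-region differently (for instance with additional thresholds).

```python
def feedback_off(theta, dtheta, lam):
    """Return True when the (delayed) state is closer to the stable manifold.

    Assumes linearized off-phase dynamics ddtheta = lam**2 * theta, whose
    stable manifold is dtheta = -lam * theta and whose unstable manifold is
    dtheta = +lam * theta. Both point-to-line distances share the factor
    1 / sqrt(1 + lam**2), so it is omitted from the comparison.
    """
    dist_to_stable = abs(dtheta + lam * theta)
    dist_to_unstable = abs(dtheta - lam * theta)
    return dist_to_stable < dist_to_unstable


# Hypothetical delayed state: tilted forward but moving back toward upright.
print(feedback_off(theta=0.1, dtheta=-0.5, lam=7.8))
```

With this particular metric the comparison reduces to the sign of θ·θ̇: the feedback is off whenever the pendulum is moving toward the upright position and on whenever it is moving away from it.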

It is important to note that for the intermittent control policy the feedback control action operating in the on-phase is not intended to push the state toward the ideal equilibrium position, i.e., the origin of the phase space, but to drive the orbit as close as possible to the stable manifold in order to turn off the control action when and if such a condition is reached, in such a way as to exploit the "affordance" provided by the intrinsic dynamics of the system during the subsequent off-phase. Since this strategy can stabilize upright standing even if the dynamics of the on-phase is unstable when applied continuously, it greatly expands the size of the stability area in the space of control parameters, in comparison with a conventional continuous control paradigm. However, the application of this control policy to the CIP task (Yoshikawa et al., 2016) can be successful, in its standard formulation, only if the task is not too challenging: a stick length longer than 50 cm and a feedback delay shorter than 100 ms. In contrast, expert CIP users can perform well also in much more challenging situations, with a pendulum length as short as 32 cm and an overall sensory delay as long as 230 ms (Milton et al., 2016). Should we conclude that the intermittent control policy is not appropriate to reach the human performance limits but is only adequate for less challenging unstable tasks? The main goal of this paper was indeed to falsify this hypothesis, by outlining a plausible extension of the standard intermittent control policy of the CIP task while maintaining the simplicity of the approach.
In order to achieve that goal we will first analyze the reasons for the inability of the standard intermittent control policy to match the human limits and will focus, in particular, on the switching rule that supervises the alternation paradigm: in the standard version of the intermittent controller it is a simple threshold mechanism in the state space of the stick, based on delayed sensory information, and the design/learning problem is reduced to the identification of an optimal tuning of the proportional-derivative control parameters that could limit the oscillations around a limit-cycle. If the CIP task is not too challenging, it is indeed possible to identify a region in parameter space that supports bounded stability and thus allows optimal parameter tuning. However, with an increase of the task difficulty the size of that region decreases and ultimately vanishes when approaching the human performance limits. In other words, the problem is that the standard intermittent strategy is functional if the task is not too unstable and ultimately it fails when the delay of the sensory feedback is significantly larger than the intrinsic falling time constant of the inverted pendulum. An additional reason for failure, in a challenging configuration of the task, is the interaction between cart dynamics and stick dynamics during the on-phase: this interaction, together with the short time constant due to a short stick length, contributes to the inappropriate termination of the on-phase by the standard switching mechanism and thus the initiation of the off-phase with a state vector of the stick that is far away from the stable manifold and thus is not appropriate for taking advantage of the affordance provided by the saddle dynamics of the inverted stick.
The alternative that is proposed in this study is indeed to substitute the statically tuned threshold mechanism of the standard intermittent controller with a dynamic mechanism that focuses directly on the sequence of switching times, by phase-resetting the estimated state vector at each switching time, using a short-term sensorimotor memory for compensating the intrinsic feedback delay, and running a simplified internal model of the CIP dynamics for terminating each on-phase with a state vector as close as possible to the stable manifold.

Generally speaking, phase-resetting is a phenomenon of synchronization of self-sustained oscillatory activity that may characterize populations of neurons (Tass, 2007) or macroscopic behaviors driven by Central Pattern Generators, as in the case of locomotion (Yamasaki et al., 2003). In particular, it is well-known that the rhythmic walking pattern can have adaptive sudden phase shifts in response to external perturbations, such as the heel strike event. In the case of the CIP model, the underlying self-sustained oscillatory activity is the alternation of off-phases and on-phases intrinsic in the intermittent control paradigm. Moreover, we suggest that the crucial event that may allow the on-going oscillation to maintain bounded stability is the switch time that marks the termination of the on-phase and the initiation of the off-phase; the idea is to phase shift the estimated value of the state vector of the pendulum at that switch time by tapping the short-term memory of delayed estimates. This phase shift is made possible by a second "affordance" related to the off-phase of the intermittent control strategy, namely the possibility to predict the timing and the geometry of the off-phase trajectory. In conclusion, the new intermittent control policy includes a predictive element, intended to defeat the destabilizing effect of the sensory feedback delay, in contrast with the standard policy that does not use any prediction. However, such prediction is not continuous in time but discontinuous, like the underlying control action.

#### THE MODEL

The CIP model is a dynamical system with 2 Degrees of Freedom (DoFs): the cart position x and the pendulum angle θ (**Figure 1**).

It is an under-actuated system because the human user has a single control variable, namely the force f(t) applied to the cart, and thus it is impossible for the user to realize arbitrary trajectories of the two state variables. However, the task of the trained subject is (apparently) simpler: to control f(t) in order to avoid a "fall" over a suitably long interval of time. This means keeping the tilt angle smaller than a given value (in the simulations we used |θ(t)| < π/4) while maintaining the position of the cart inside a "reachable interval" (|x(t)| < Δx) that depends on whether the subject is sitting or standing, or on other physical arrangements.

The CIP system is feedback controlled, i.e., the Central Nervous System generates the output motor variable f(t) by relying on sensory feedback about the state of the system [θ, θ̇, x, ẋ]: this sensory information is multi-modal (vision + proprioception), delayed (delay δ) and noisy. Both elements, namely delay and noise, tend to set limitations to the performance of human subjects, reducing their capability to avoid the fall of the pendulum. In the simulations carried out for this study the feedback delay is set to 230 ms, taking into account the experimental evaluations of Milton et al. (2016). The sensory feedback uncertainties are modeled as an additive Gaussian noise, as in Yoshikawa et al. (2016), which is added to the control force f(t): f<sub>n</sub>(t) = σ ξ(t), where ξ(t) represents a Gaussian white noise with zero mean and unit variance and σ is the noise intensity (standard deviation of the noise).

The CIP system parameters are the pendulum length L, the pendulum mass m, and the cart mass M. From the point of view of task difficulty L is the critical parameter. As reported by Milton et al. (2016) a length of 32 cm is the limit for human subjects. We chose this value for the simulations. As regards the other two parameters we adopted the same values used by Yoshikawa et al. (2016): m = 0.125 kg, M = 2 · m = 0.25 kg.

The dynamics of the CIP system is governed by the following non-linear dynamic equations (see the **Supplementary Material** for details):

$$
\begin{bmatrix} \ddot{\theta} \\ \ddot{x} \end{bmatrix} = \begin{bmatrix} A\_{11}(\theta) & A\_{12}(\theta) \\ A\_{21}(\theta) & A\_{22}(\theta) \end{bmatrix} \begin{bmatrix} \sin \theta \\ f \end{bmatrix} \tag{1}
$$

where the matrix elements are functions of the pendulum angular tilt (g is the gravity acceleration):

$$\begin{cases} A\_{11} = \frac{1.5}{L(M + m(1 - 0.75 \cos^2 \theta))} ((M + m)g - 0.5 m L \dot{\theta}^2 \cos \theta) \\ A\_{12} = \frac{-1.5 \cos \theta}{L(M + m(1 - 0.75 \cos^2 \theta))} \\ A\_{21} = \frac{1}{M + m(1 - 0.75 \cos^2 \theta)} (0.5 m L \dot{\theta}^2 - 0.75 m g \cos \theta) \\ A\_{22} = \frac{1}{M + m(1 - 0.75 \cos^2 \theta)} \end{cases} \tag{2}$$
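As an illustration only, Equations 1 and 2 can be transcribed into a small function that returns the two accelerations for a given state and force. The function name, the argument order, and the value g = 9.81 m/s² are assumptions of this sketch (the text does not state the value of g); the default parameters are the ones quoted above for the simulations.

```python
import math

def cip_accelerations(theta, theta_dot, f, L=0.32, m=0.125, M=0.25, g=9.81):
    """Return (theta_ddot, x_ddot) of the non-linear CIP model (Equations 1 and 2).

    g = 9.81 m/s^2 is an assumed value; L, m, M default to the values in the text.
    """
    c = math.cos(theta)
    s = math.sin(theta)
    denom = M + m * (1.0 - 0.75 * c * c)  # common denominator of Equation 2
    a11 = 1.5 / (L * denom) * ((M + m) * g - 0.5 * m * L * theta_dot ** 2 * c)
    a12 = -1.5 * c / (L * denom)
    a21 = (0.5 * m * L * theta_dot ** 2 - 0.75 * m * g * c) / denom
    a22 = 1.0 / denom
    theta_ddot = a11 * s + a12 * f
    x_ddot = a21 * s + a22 * f
    return theta_ddot, x_ddot

# The upright position at rest, with no force, is an equilibrium.
print(cip_accelerations(0.0, 0.0, 0.0))
```

For small tilt angles the ratio between angular acceleration and angle approaches the constant coefficient A11 of the linearized model, which gives a simple consistency check between the two formulations.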

Although the simulations considered in the results section use the non-linear model above, for stability analysis and for managing the alternation between on- and off-phases a linearized model is used, in the neighborhood of the origin, described by the following equations:

$$
\begin{bmatrix} \ddot{\theta} \\ \ddot{x} \end{bmatrix} = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix} \begin{bmatrix} \theta \\ f \end{bmatrix} \tag{3}
$$

with the following constant matrix elements:

$$\begin{cases} A\_{11} = \frac{1.5(M+m)}{(M+0.25m)} \frac{\text{g}}{L} \\ A\_{12} = -\frac{1.5}{(M+0.25m)L} \\ A\_{21} = -\frac{0.75m\text{g}}{M+0.25m} \\ A\_{22} = \frac{1}{M+0.25m} \end{cases} \tag{4}$$
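To give a feeling for the magnitudes involved, the constant coefficients of Equation 4 can be evaluated for the parameters used in this study (L = 0.32 m, m = 0.125 kg, M = 0.25 kg). This is a minimal sketch: the function name is hypothetical and g = 9.81 m/s² is an assumed value.

```python
import math

def linearized_coefficients(L=0.32, m=0.125, M=0.25, g=9.81):
    """Constant matrix elements of the linearized CIP model (Equation 4)."""
    denom = M + 0.25 * m
    a11 = 1.5 * (M + m) * g / (denom * L)
    a12 = -1.5 / (denom * L)
    a21 = -0.75 * m * g / denom
    a22 = 1.0 / denom
    return a11, a12, a21, a22

a11, a12, a21, a22 = linearized_coefficients()
print(f"A11 = {a11:.3f} 1/s^2, A12 = {a12:.3f}, A21 = {a21:.3f}, A22 = {a22:.3f}")
# sqrt(A11) is the rate of divergence along the unstable direction of the saddle.
print(f"sqrt(A11) = {math.sqrt(a11):.3f} 1/s")
```

With these assumptions A11 is about 61.3 s⁻², so √A11 is about 7.8 s⁻¹, i.e., a time constant of roughly 130 ms for the uncontrolled fall of the 32 cm stick.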

By looking at Equations 1 or 3 it is immediate to observe that, in the absence of control action, the motion of the pendulum is independent of the motion of the cart. Moreover, in the case of the linearized model, such motion is characterized, in the phase plane of the pendulum (θ vs. θ̇), by an instability of the saddle type, with two real eigenvalues of opposite signs (λ = ±√A<sub>11</sub>). The corresponding eigenvectors identify, respectively, a stable manifold (θ̇ = −√A<sub>11</sub> θ), namely a line whose half-line trajectories converge to the origin, and an unstable manifold (θ̇ = +√A<sub>11</sub> θ), namely a line whose half-line trajectories diverge from the origin: the unstable manifold spans the first and third quadrants of the phase plane and the stable manifold spans the second and fourth quadrants.

#### The Standard Intermittent Control Policy of the CIP Model Based on Optimal Tuning of the Feedback Control Parameters

The intermittent stabilization strategy was originally developed for modeling the stabilization of upright standing, when representing the standing body as a single DoF inverted pendulum (Bottaro et al., 2005, 2008; Asai et al., 2009, 2013; Suzuki et al., 2012). In that case the control variable is the ankle torque τ, whereas in the CIP system it is the force f applied to the cart. In both cases, however, there is an alternation of on-phases, where the control action is provided by a simple Proportional/Derivative (PD) delayed feedback error mechanism, and off-phases, where the control action is switched off. The error signals, for the CIP system, are the differences of the two DoFs (θ, x) from the corresponding reference values (θ<sub>ref</sub> = 0; x<sub>ref</sub> = 0) and the control action is characterized by two proportional parameters (P<sub>θ</sub>, P<sub>x</sub>) and two derivative parameters (D<sub>ω</sub>, D<sub>v</sub>). In short, the standard version of the intermittent control policy is summarized by the script of **Box 1**:

This control policy should be compared with the corresponding continuous control model characterized by the following equation, active all the time:

$$f(t) = P\_{\theta} \,\theta(t-\delta) + D\_{\omega} \,\dot{\theta}(t-\delta) + P\_{x} \,x(t-\delta) + D\_{v} \,\dot{x}(t-\delta) \tag{5}$$

The stability analysis of this control policy, carried out by Yoshikawa et al. (2016), demonstrated that asymptotic stability can be achieved provided that the feedback delay satisfies the following condition:

$$
\delta < \sqrt{\frac{L}{g}} \tag{6}
$$

In particular, for L = 32 cm we have δ < 180 ms and this means that the continuous control policy has no chance of stabilizing the CIP system with such stick length and a feedback delay beyond 200 ms. But also the standard intermittent control policy could fail in such conditions for the reasons that we explain in the following.
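As a quick arithmetic check of the bound in Equation 6, assuming g ≈ 9.81 m/s² (the value of g is not stated in the text):

$$
\sqrt{\frac{L}{g}} = \sqrt{\frac{0.32\ \text{m}}{9.81\ \text{m/s}^2}} \approx \sqrt{0.0326\ \text{s}^2} \approx 0.181\ \text{s}
$$

This reproduces the limit of about 180 ms quoted above, which is well below the 230 ms feedback delay adopted in the simulations.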

In the standard intermittent control policy the switching rule between the two phases is formulated in the phase space of the pendulum (θ vs θ˙) and divides the plane into two areas, namely the on-area and the off-area. The off-area includes the second and fourth quadrants plus/minus an angular slice, whose amplitude is a function of the parameter a, whereas the on-area includes the first and third quadrants minus/plus the same angular slice. In the following we assume for simplicity that a = 0 and thus the angular slice disappears.
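The switching rule just described, combined with the PD law of Equation 5, can be sketched as follows for the simplified case a = 0. This is only an illustration of the rule as described in the text, not a reproduction of Box 1: the function name is hypothetical, and treating a state lying exactly on a coordinate axis as belonging to the off-area is an assumption of the sketch.

```python
def standard_intermittent_force(theta_d, omega_d, x_d, v_d,
                                P_theta, D_omega, P_x, D_v):
    """Control force of the standard intermittent policy for a = 0.

    All state arguments are DELAYED samples, i.e., values at time t - delta.
    On-area: first and third quadrants of the (theta, theta_dot) plane.
    Off-area: second and fourth quadrants (assumed to include the axes).
    """
    on_area = theta_d * omega_d > 0.0
    if not on_area:
        return 0.0  # off-phase: the control action is switched off
    # on-phase: delayed PD feedback of Equation 5
    return P_theta * theta_d + D_omega * omega_d + P_x * x_d + D_v * v_d

# First quadrant (tilting and falling in the same direction): control is on.
print(standard_intermittent_force(0.1, 0.2, 0.0, 0.0, 5.0, 0.1686, 0.01, 0.1))
# Fourth quadrant (tilted but moving back toward upright): control is off.
print(standard_intermittent_force(0.1, -0.2, 0.0, 0.0, 5.0, 0.1686, 0.01, 0.1))
```

The continuous policy of Equation 5 corresponds to the same PD expression applied at all times, without the quadrant test.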

During an off-phase, initiated at t = t<sub>off</sub> either in the second or the fourth quadrant of the phase space, the orbit of the state vector will follow a hyperbolic trajectory that initially approaches the origin, arriving at a minimum distance (at t = t<sub>c</sub>) when the trajectory intersects one of the two coordinate axes, thus entering one of the other two quadrants influenced by the unstable manifold: thereafter the trajectory will diverge while approaching the unstable manifold. The initial part of the hyperbolic trajectory (up to t = t<sub>c</sub>) is the "affordance" provided by the intrinsic dynamics of the inverted pendulum: during that time there is no need to force the system with active control because mechanics itself carries out the job of fighting the danger of falling. On the other hand, the switching rule is not applied to the current state vector [θ(t), θ̇(t)] but to the corresponding delayed sample [θ(t − δ), θ̇(t − δ)]; thus the off-phase will be terminated not at the time of crossing the border between the stable and unstable area but a time δ later: t<sub>on</sub> = t<sub>c</sub> + δ.

The problem, as exemplified in **Figure 2**, is that the timing of the hyperbolic trajectories, as well as the relative position of the state vector at t<sub>on</sub> with respect to the position at t<sub>off</sub>, strongly depend on the initial distance of the state vector from the stable manifold and on the main parameters of the CIP system, namely stick length L and feedback delay δ. In **Figure 2** all the hyperbolic trajectories initiate with the same angular tilt but with different distances from the stable manifold: the initial blue segment terminates when it intersects one of the two coordinate axes and the following red segment terminates after a fixed time interval equal to the sensory feedback delay δ, namely when the activation condition turns on. **Figure 2A** refers to the CIP model investigated by Yoshikawa et al. (2016) with the following parameters: L = 50 cm and δ = 100 ms. **Figure 2B** refers to a much more challenging CIP model, with L = 32 cm and δ = 230 ms. These graphs clarify that, for the same initial angular tilt of the pendulum, the final position of the state vector will end up further and further away from the origin as the initial distance from the stable manifold increases, and this potentially diverging pattern emerges clearly in the second configuration of the CIP model that, as observed above, represents the upper limit of human performance.

In the standard intermittent control policy the on-phase is initiated a time δ after the state vector of the stick has entered the unstable area, at time t = t<sub>on</sub>. Thereafter, the orbit of the state vector will follow an expanding spiral or nodal course, as a function of the PD parameters of the stick (P<sub>θ</sub>, D<sub>ω</sub>), if the PD parameters of the cart (P<sub>x</sub>, D<sub>v</sub>) are null; moreover, such unstable behavior of the inverted stick is further amplified by including the cart component in the control action. The purpose of this component is indeed to restrain the range of oscillation of the cart to a small feasible value, but from the point of view of stick balancing it is an additional source of instability. In any case, the forced orbit of the state vector will ultimately cross a coordinate axis at t = t<sub>c1</sub>, leaving the on-area, and will be terminated, thus initiating the next off-phase, at t = t<sub>off</sub> = t<sub>c1</sub> + δ. In short, the evolution of the state vector of the stick will be shaped as an alternation of segments of hyperbolic orbits in the off-condition and segments of expanding spiral or nodal orbits in the on-condition with the following timing:

$$t\_{\rm off} \to t\_c \to t\_{\rm on} = t\_c + \delta \to t\_{c1} \to t\_{\rm off} = t\_{c1} + \delta \to \dots \tag{7}$$

The orbits of the off-phases only depend on the CIP parameters (L, M, and m) whereas the orbits of the on-phases also depend on the control parameters and the motion of the cart, due to the feedback of the control policy. The chance of success of the standard intermittent control policy is determined by the choice of the PD parameters and, in particular, by the fact that such tuning may induce a distribution of state vectors at t = toff centered as much as possible on the stable manifold and with a very narrow standard deviation. In order to clarify this point, let us use the following parameter for measuring the distance of the state vector from the stable manifold at t = toff :

$$\gamma\_{\text{off}} = \left| \frac{\dot{\theta}(t\_{\text{off}})}{\theta(t\_{\text{off}})\sqrt{A\_{11}}} \right| \tag{8}$$

γ<sub>off</sub> = 1 means that the state vector is "on" the stable manifold, i.e., the distance is null; γ<sub>off</sub> > 1 means that the state vector is above the stable manifold and γ<sub>off</sub> < 1 that it is below it. The average value of this parameter should be as close as possible to 1, with a suitably small standard deviation. The target of the intermittent control policy indeed is not the equilibrium point, i.e., the origin, but the whole stable manifold at the end of the on-phases. If the PD parameters are optimally tuned, the value of γ<sub>off</sub> on average will be sufficiently close to 1 to induce hyperbolic segments with contracting properties, i.e., with a distance from the origin at t = t<sub>on</sub> smaller than the distance at t = t<sub>off</sub>. Such contracting properties of the off-phases may compensate, on average, the expanding properties of the spiral/nodal segments during the on-phases, supporting the emergence of limit-cycle oscillations. As a matter of fact, the study by Yoshikawa et al. (2016) demonstrated that this kind of bounded stability can be achieved with a stick length of 1 m and a sensory delay of 100 ms. On the other hand, this is not possible in the human limit conditions (stick length of 32 cm and sensory delay of 230 ms).

In order to better understand the reasons for this failure of the standard intermittent control policy, let us focus our attention on the kinematics of the stick during the off-phases. In the **Supplementary Material** we demonstrate that during the off-phase, initiated at t = t<sub>off</sub>, the time required by the hyperbolic trajectory of the state vector to cross the pertinent coordinate axis, at t = t<sub>c</sub>, is well-approximated by the following equation, which is derived from the linearized CIP model of Equation 3:

$$
\Delta t\_{\rm cross} = t\_{\rm c} - t\_{\rm off} = \frac{1}{2\sqrt{A\_{11}}} \ln \left( \frac{1 + \gamma\_{\rm off}}{|1 - \gamma\_{\rm off}|} \right) \tag{9}
$$

The time interval computed by this formula does not depend on the initial tilt angle per se but on the "distance" from the stable manifold, measured by the value of γ<sub>off</sub>: it strongly increases as the distance of the starting point from the stable manifold decreases, ultimately diverging when it becomes zero. The reason is that, in such case, the starting point is exactly on the stable manifold and the hyperbolic trajectory degenerates to the line of the corresponding manifold; moreover, the crossing point coincides with the origin and is reached asymptotically following an exponential descent.
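The behavior of Equation 9 can be explored numerically with a few lines of code. The sketch below is illustrative: the function name is hypothetical, the value A11 ≈ 61.3 s⁻² corresponds to the linearized model of Equation 4 with L = 0.32 m and an assumed g = 9.81 m/s², and returning infinity for γoff = 1 is a convention chosen here to represent the divergence discussed above.

```python
import math

def time_to_cross(gamma_off, a11):
    """Time to cross of the off-phase hyperbolic trajectory (Equation 9), in seconds."""
    if gamma_off == 1.0:
        return math.inf  # starting point exactly on the stable manifold
    ratio = (1.0 + gamma_off) / abs(1.0 - gamma_off)
    return math.log(ratio) / (2.0 * math.sqrt(a11))

A11 = 61.3125  # assumed: Equation 4 with L = 0.32 m, m = 0.125 kg, M = 0.25 kg, g = 9.81
for gamma in (0.5, 0.9, 0.99, 1.0, 1.01, 1.1, 1.5):
    print(f"gamma_off = {gamma:4.2f} -> time to cross = {time_to_cross(gamma, A11):.3f} s")
```

The printed values grow without bound as γoff approaches 1 from either side, and the initial tilt angle does not appear among the arguments.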

The graph of **Figure 3A** plots the variation of the time to cross described by Equation 9, computed for the most critical value of the sensory delay time (δ = 230 ms) and for different values of the stick length. It clearly shows that, with decreasing values of the stick length, the interval of values of γoff that are compatible with a contracting pattern of the off-phase strongly decreases. We should consider indeed that the total duration of the hyperbolic trajectory for a given off-phase, with the switching rule of the standard intermittent policy, is as follows:

$$\text{Duration of the off-phase: } \Delta t\_{cross} + \delta \tag{10}$$

Moreover, since the hyperbolic trajectories of the off-phase are approximately symmetric with respect to the intersected coordinate axis, the condition that the off-phase orbit is not expanding (a sufficient condition for stability) is as follows:

$$
\Delta t\_{cross} > \delta \Rightarrow t\_c - t\_{off} > t\_{on} - t\_c \tag{11}
$$

FIGURE 3 | Characteristic timing of the hyperbolic trajectories in the off-phases. Δt<sub>cross</sub> is the time taken by a hyperbolic trajectory to cross the border between the off-area and the on-area, as a function of the distance of the starting point [θ<sub>0</sub>, θ̇<sub>0</sub>] from the stable manifold (θ̇ = −√A<sub>11</sub> θ). Such distance is measured by the parameter γ<sub>off</sub> = |θ̇<sub>off</sub> / (θ<sub>off</sub> √A<sub>11</sub>)|. γ<sub>off</sub> = 1 means that the starting point of an off-phase is exactly on top of the manifold and in this case the crossing time diverges, whereas it quickly decreases with the increase of |γ<sub>off</sub> − 1|. (A) Displays Δt<sub>cross</sub> as a function of γ<sub>off</sub> for different values of the stick length and feedback delay δ = 230 ms. Since the stability condition of the off-phase for the standard intermittent control policy is Δt<sub>cross</sub> > δ, the graph clearly shows that the interval of values of γ<sub>off</sub> that support such condition strongly decreases with the shortening of the stick length. (B) Focuses on the most challenging configuration of the CIP balancing task (*L* = 32 cm, δ = 230 ms) and compares the range of values of γ<sub>off</sub> that support stability in the standard and in the new intermittent control policy (Δt<sub>cross</sub> > δ vs. Δt<sub>cross</sub> > δ/(1 + ρ), respectively). ρ = 0.8 is the "contraction factor".

The graph of **Figure 3A** shows that for a stick length of 100 cm the condition above requires that the initial distance of the state vector from the stable manifold, |γ<sub>off</sub> − 1|, is smaller than about 0.3; for a stick length of 50 cm the distance should be smaller than about 0.1, and for the limit case of the 32 cm stick it should be even smaller (about 0.05). We also emphasize that, even with an optimal tuning of the feedback parameters, the distance from the stable manifold at the end of the on-phase will be spread over a range that grows strongly as the stick length decreases, as a consequence of the sensory noise and the disturbing effect of the cart motion. For this reason the simple switching mechanism of the standard intermittent control policy is doomed to fail at some level of difficulty of the task, and this may suggest to the trained subject a modification of the intermittent control policy, focusing on the optimal tuning of the switching times rather than the PD control parameters.
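The admissible ranges of γoff can also be estimated directly by inverting Equation 9: the condition Δtcross > T is equivalent to (1 + γoff)/|1 − γoff| > R, with R = exp(2√A11 T), which gives (R − 1)/(R + 1) < γoff < (R + 1)/(R − 1). The sketch below applies this inversion to the standard condition T = δ; the function name is hypothetical, g = 9.81 m/s² is assumed, and the masses are those used in this study.

```python
import math

def gamma_interval(L, T, m=0.125, M=0.25, g=9.81):
    """Interval of gamma_off for which the time to cross (Equation 9) exceeds T seconds."""
    a11 = 1.5 * (M + m) * g / ((M + 0.25 * m) * L)  # Equation 4
    R = math.exp(2.0 * math.sqrt(a11) * T)
    return (R - 1.0) / (R + 1.0), (R + 1.0) / (R - 1.0)

delta = 0.23  # sensory feedback delay in seconds
for L in (1.0, 0.5, 0.32):
    low, high = gamma_interval(L, delta)
    print(f"L = {L:.2f} m: {low:.3f} < gamma_off < {high:.3f}")
```

Under these assumptions the intervals come out slightly asymmetric around 1, with half-widths of the same order as the values read from the graph (roughly 0.3, 0.1, and 0.05 for the three stick lengths).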

The **Supplementary Material**, in addition to Equation 9, provides also the derivation of the following equation, which describes the full course of the stick trajectory in the off-phase, and, in particular, can be used in order to predict the state of the stick at the time of termination, i.e., at t = ton:

$$\begin{cases} \theta(t) = \frac{\dot{\theta}\_{\text{off}} + \theta\_{\text{off}}\sqrt{A\_{11}}}{2\sqrt{A\_{11}}}e^{\sqrt{A\_{11}}(t - t\_{\text{off}})} + \frac{-\dot{\theta}\_{\text{off}} + \theta\_{\text{off}}\sqrt{A\_{11}}}{2\sqrt{A\_{11}}}e^{-\sqrt{A\_{11}}(t - t\_{\text{off}})} \\ \dot{\theta}(t) = \frac{\dot{\theta}\_{\text{off}} + \theta\_{\text{off}}\sqrt{A\_{11}}}{2}e^{\sqrt{A\_{11}}(t - t\_{\text{off}})} - \frac{-\dot{\theta}\_{\text{off}} + \theta\_{\text{off}}\sqrt{A\_{11}}}{2}e^{-\sqrt{A\_{11}}(t - t\_{\text{off}})} \end{cases} \tag{12}$$
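A direct transcription of the off-phase solution of the linearized model (θ̈ = A11 θ with f = 0) is given below as a minimal sketch. The function and variable names are hypothetical; `dt` stands for the elapsed time t − toff, and the angular velocity is obtained as the time derivative of the angle expression.

```python
import math

def off_phase_state(theta_off, omega_off, dt, a11):
    """State (theta, theta_dot) of the undriven linearized pendulum dt seconds after t_off."""
    lam = math.sqrt(a11)
    grow = (omega_off + lam * theta_off) / (2.0 * lam)    # weight of the unstable mode
    decay = (-omega_off + lam * theta_off) / (2.0 * lam)  # weight of the stable mode
    e_up = math.exp(lam * dt)
    e_down = math.exp(-lam * dt)
    theta = grow * e_up + decay * e_down
    omega = lam * (grow * e_up - decay * e_down)
    return theta, omega

A11 = 61.3125  # assumed: Equation 4 with L = 0.32 m and g = 9.81 m/s^2
lam = math.sqrt(A11)
# A start exactly on the stable manifold decays exponentially toward the origin.
print(off_phase_state(0.05, -lam * 0.05, 0.1, A11))
# A start slightly off the manifold (gamma_off = 0.9) eventually diverges.
print(off_phase_state(0.05, -0.9 * lam * 0.05, 0.5, A11))
```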

Moreover, let us consider the disturbing effect of the cart motion on the dynamics of the pendulum, i.e., the cross-coupling between the cart and the pendulum dynamics during the on-phase. Suppose indeed that the PD pendulum parameters were optimally tuned, in such a way as to drive the pendulum state vector, in the absence of cart control, on top of the stable manifold at t = t<sub>off</sub>, which is the ideal situation for exploiting the stabilizing effects of the off-phase dynamics. However, even in this case, a minimum amount of drive of the cart motion, just sufficient to maintain the cart position in a feasible range, will induce a variability of the initial state vector [θ(t<sub>off</sub>), θ̇(t<sub>off</sub>)] that, given the strong non-linearity of Equation 9, will inevitably trigger a transition to instability: the larger the error (i.e., the distance of γ<sub>off</sub> from the target value of 1), the quicker will be the descent of the undriven hyperbolic trajectory, with the danger of over-penetrating the on-region and thus enlarging more and more the composite orbit away from equilibrium.

### The New Intermittent Control Policy of the CIP Model Based on On-Line Selection of the Switching Times

In the standard intermittent control strategy, the sequence (t<sub>on</sub>, t<sub>off</sub>, t<sub>on</sub>, t<sub>off</sub>, ...) of switching times for activation/deactivation of the delayed feedback control is an indirect effect of the choice of control parameters and thus there is no guarantee that, when active control is switched off, the state vector is close enough to the stable manifold to produce a sequence of hyperbolic-spiral-hyperbolic-spiral-... oscillatory segments of the inverted stick approaching a limit cycle around the unstable equilibrium point. However, if the stick is sufficiently long (e.g., 0.5 m) it is possible to identify a range of control parameters that indirectly produce a bounded stability, as demonstrated by Yoshikawa et al. (2016). As a matter of fact, falling is what happens frequently to naïve subjects, who typically need a long training exercise for a CIP configuration near the limit conditions defined above. We suggest that this achievement can be obtained by building an internal model of on-line adaptation that complements, for the more challenging configurations of the system, the static parametric optimization of the standard intermittent control policy. The crucial step, in our opinion, is to focus the attention of such "cybernetic supervisor" on the explicit selection of switching times in relation with the corresponding sequence of on-phase and off-phase trajectories.

There are indeed two crucial events in the sequence that need to be optimized in order to avoid the spiraling away of the CIP oscillatory patterns:


In summary, the script of the standard version of the intermittent control policy is substituted by the following one (**Box 2**), which relies on explicit selection mechanisms for t<sub>on</sub> and t<sub>off</sub>, respectively, that will be examined in the following sections:

#### Terminating the Hyperbolic Trajectory of the Off-Phase by an Explicit Selection of ton

At the termination of the on-phase, i.e., when the time stamp t = t<sub>off</sub> is instantiated, the hyperbolic trajectory is started but the actual position in the phase space of the state vector of the stick, σ<sub>off</sub> = [θ(t<sub>off</sub>), θ̇(t<sub>off</sub>)], is unknown because the control system has direct access only to the delayed state, which may be markedly different from the real one. The knowledge of σ<sub>off</sub> is not relevant for the neural control of the hyperbolic off-phase trajectory, which is fully determined by the physics of the CIP system, but it is crucial for the explicit selection of t<sub>on</sub> and for the prediction of the corresponding initial state of the on-phase:

$$
\sigma\_{\rm on} = [\theta(t\_{\rm on}), \,\dot{\theta}(t\_{\rm on})].
$$

A key idea of the new intermittent control policy is that σoff can be recovered in a natural way not at t = toff but at t = toff + δ by assuming that the neural controller has access to the short-term sensory-motor memory of the trajectory of the stick: the initially unknown position will indeed become available δ seconds later by directly tapping the delayed sensorimotor information:

$$
\theta\_{\rm off} = \theta \left[ \left( t\_{\rm off} - \delta \right) + \delta \right]; \ \dot{\theta}\_{\rm off} = \dot{\theta} \left[ \left( t\_{\rm off} - \delta \right) + \delta \right].
$$

With this geometric information it is then possible to estimate Δt<sub>cross</sub>, which characterizes the descending part of the hyperbolic trajectory, up to t = t<sub>c</sub>, by using Equations 8 and 9, without any interference of the concurrent cart motion. Moreover, with such timing information it is possible to choose the appropriate termination time of the off-phase by setting up a timer at a future time instant t = t<sub>on</sub>, thus concluding the explicit selection of the off-phase temporal sequence: t<sub>off</sub> → t<sub>c</sub> → t<sub>on</sub>. More specifically, the terminal time should be selected in such a way as to induce a contracting effect of the off-phase trajectory, i.e., |σ<sub>on</sub>| < |σ<sub>off</sub>|, and this effect can be easily achieved with the following choice:

$$t\_{on} = t\_c + \Delta t\_{cross} \cdot \rho \tag{13}$$

where ρ is the "contraction factor" (in the simulations we used a value of 0.8 but the specific value is not critical for stability, provided that it is <1). In summary, the computational process for exploiting in the best way the self-balancing properties of the off-phase can be described by the following script (**Box 3**):

Box 3 | Explicit selection of ton in the new version of the Intermittent control policy
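The content of Box 3 is not reproduced here. As a rough sketch of the steps described in the text (recover σoff from the delayed feedback, evaluate Equations 8 and 9, set the timer with Equation 13, and predict σon with Equation 12), one could write something like the following. All names are hypothetical, the guards for degenerate inputs are choices of this sketch, and the returned flag simply checks that the timer is set after the instant toff + δ at which σoff becomes available.

```python
import math

def select_t_on(t_off, theta_off, omega_off, a11, delta=0.23, rho=0.8):
    """Sketch of the explicit selection of t_on; returns (t_on, sigma_on, timer_feasible).

    theta_off and omega_off are the state at t_off, read from the delayed
    feedback at time t_off + delta.
    """
    lam = math.sqrt(a11)
    if theta_off == 0.0:
        raise ValueError("gamma_off is undefined for theta_off = 0")
    gamma_off = abs(omega_off / (theta_off * lam))                    # Equation 8
    if gamma_off == 1.0:
        raise ValueError("on the stable manifold the time to cross diverges")
    dt_cross = math.log((1.0 + gamma_off) / abs(1.0 - gamma_off)) / (2.0 * lam)  # Equation 9
    t_on = (t_off + dt_cross) + rho * dt_cross                        # Equation 13
    # Predicted state at t_on from the off-phase solution (Equation 12).
    grow = (omega_off + lam * theta_off) / (2.0 * lam)
    decay = (-omega_off + lam * theta_off) / (2.0 * lam)
    e_up = math.exp(lam * (t_on - t_off))
    e_down = math.exp(-lam * (t_on - t_off))
    sigma_on = (grow * e_up + decay * e_down, lam * (grow * e_up - decay * e_down))
    timer_feasible = (t_on - t_off) > delta
    return t_on, sigma_on, timer_feasible

A11 = 61.3125  # assumed: Equation 4 with L = 0.32 m and g = 9.81 m/s^2
lam = math.sqrt(A11)
t_on, sigma_on, ok = select_t_on(0.0, 0.05, -0.9 * lam * 0.05, A11)
print(t_on, sigma_on, ok)
```

In this example (γoff = 0.9) the predicted state at ton is closer to the origin than the state at toff, which is the contracting effect that the choice ρ < 1 is meant to produce.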


It is important to highlight that the first step of the script plays the role of phase-resetting the time course of the measured stick angular oscillation, compensating at least locally the intrinsic feedback delay. However, the contracting pattern of the off-phase trajectory, namely that |σ<sub>on</sub>| < |σ<sub>off</sub>|, can occur if and only if the following condition is met:

$$t\_{on} - t\_{off} \, > \, \delta \tag{14}$$

This is also equivalent to the following condition on the time to cross of the hyperbolic trajectories and, ultimately, on the corresponding initial distance of the state vector from the stable manifold:

$$
\Delta t\_{\rm cross} > \frac{\delta}{1+\rho} \tag{15}
$$
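The equivalence between Equations 14 and 15 follows directly from Equation 13 and the definition of the time to cross, as the following short derivation shows (the numerical value uses δ = 230 ms and ρ = 0.8 from the text):

$$
t\_{on} - t\_{off} = (t\_c - t\_{off}) + (t\_{on} - t\_c) = \Delta t\_{cross} + \rho\,\Delta t\_{cross} = (1+\rho)\,\Delta t\_{cross} > \delta \;\Longleftrightarrow\; \Delta t\_{cross} > \frac{\delta}{1+\rho} = \frac{230\ \text{ms}}{1.8} \approx 128\ \text{ms}
$$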

Such stability condition of the new intermittent control policy should be compared with the corresponding condition of the standard policy:

$$
\Delta t\_{cross} > \delta \tag{16}
$$

We may conclude (see also **Figure 3B**) that the new intermittent control policy is much more robust than the standard policy as regards the regulation of the off-phase in order to guarantee that |σ<sub>on</sub>| < |σ<sub>off</sub>|, because it can tolerate a much larger range of values of γ<sub>off</sub>, i.e., a greater inaccuracy in the termination of the on-phase in terms of distance of σ<sub>off</sub> from the stable manifold.

#### Terminating the On-Phase by Running an Internal Model of the Forced Dynamics for the Explicit Selection of toff

After activation of the feedback control signal at t = t<sub>on</sub>, the orbit of the pendulum state vector will spiral away from the unstable manifold, intersecting first the x-axis and then the stable manifold. The latter event is the crucial piece of information for terminating the activation phase in the optimal way, i.e., for exploiting in the best possible way the stabilization affordance of saddle dynamics (θ̇ = −√A<sub>11</sub> θ). The problem is that detecting this event is far from trivial: while the evolution of the hyperbolic trajectory in the off-phase is fully predictable and can be computed by taking advantage of an explicit equation, no such formula is available in the on-phase, mainly because of the disturbing effect of the cart dynamics on the dynamics of the pendulum. On the other hand, attempting to detect the intersection directly by means of the delayed sensory information is likely to be very imprecise because of the high falling speed of the 32 cm stick. The proposed solution is to run a simulation of a simplified internal model of the forced CIP dynamics, for t > t<sub>on</sub>, using Equation 3: the simulation is initialized with the predicted value of the pendulum state vector at t = t<sub>on</sub>, i.e., σ̂(t<sub>on</sub>), made available by the phase reset of the stick oscillation pattern explained in the previous section. Such simulation will generate an approximated but un-delayed version θ̂(t) of the real trajectory of the stick that can be used for terminating the on-phase. Summing up, the explicit selection of t<sub>off</sub> in the new version of the intermittent control policy is characterized by the following script (**Box 4**):

Box 4 | Explicit selection of toff in the new version of the Intermittent control policy
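The content of Box 4 is not reproduced here, and this section does not specify how the internal model treats the feedback delay or the cart. Purely as an illustration of the idea, the sketch below integrates the pendulum row of Equation 3 forward from the predicted state at ton and returns the first time at which the simulated state crosses the stable manifold. The assumptions are strong and should be read as such: the internal model applies the PD law to its own un-delayed estimate, the cart terms are ignored, the integration uses forward Euler with a 1 ms step, and all names are hypothetical.

```python
import math

def select_t_off(t_on, theta_on, omega_on, a11, a12, P_theta, D_omega,
                 dt=0.001, t_max=2.0):
    """Return the first time the internal-model state crosses the stable manifold.

    Assumed internal model: theta_ddot = A11*theta + A12*f with
    f = P_theta*theta + D_omega*theta_dot (no delay, no cart terms).
    Returns None if no crossing is found within t_max seconds.
    """
    lam = math.sqrt(a11)
    theta, omega, t = theta_on, omega_on, t_on
    s_prev = omega + lam * theta  # zero exactly on the stable manifold
    while t - t_on < t_max:
        f = P_theta * theta + D_omega * omega
        alpha = a11 * theta + a12 * f
        theta, omega = theta + dt * omega, omega + dt * alpha  # forward Euler step
        t += dt
        s = omega + lam * theta
        if s_prev * s <= 0.0:  # sign change: the manifold has been crossed
            return t
        s_prev = s
    return None

A11, A12 = 61.3125, -1.5 / 0.09  # assumed: Equation 4 with L = 0.32 m, g = 9.81 m/s^2
print(select_t_off(0.0, 0.04, 0.25, A11, A12, P_theta=5.0, D_omega=0.1686))
```

In the new policy described in the text the slope of the manifold used for this test is itself uncertain, so the detected time would be perturbed accordingly; that aspect is not modeled in this sketch.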


#### Simulation of the New Intermittent Control Policy

The simulations were carried out with Matlab© (MathWorks), using the forward Euler method with a time step of 1 ms. The control force f(t) includes an additive noise term: a Gaussian white noise with zero mean and standard deviation equal to 0.015 N. Such noise intensity is similar to the average noise intensity used by Yoshikawa et al. (2016) for the standard intermittent control model.
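For readers who want to reproduce the integration scheme in another environment, a single forward Euler step with additive force noise might look as follows. This is a sketch in Python rather than the Matlab code used for the study: the function names are hypothetical, the dynamics are passed in as a callable, the example callable uses rounded coefficients of the linearized model (computed with an assumed g = 9.81 m/s²), and drawing one independent noise sample per 1 ms step with a standard deviation of 0.015 N is an assumption about how the noise was discretized.

```python
import random

def euler_step(state, force, accel, dt=0.001, noise_sd=0.015, rng=random):
    """One forward Euler step of the CIP state (theta, theta_dot, x, x_dot).

    accel(theta, theta_dot, f) must return (theta_ddot, x_ddot).
    A zero-mean Gaussian sample is added to the control force at every step.
    """
    theta, omega, x, v = state
    noisy_force = force + rng.gauss(0.0, noise_sd)
    theta_ddot, x_ddot = accel(theta, omega, noisy_force)
    return (theta + dt * omega, omega + dt * theta_ddot,
            x + dt * v, v + dt * x_ddot)

def linear_accel(theta, omega, f):
    """Linearized model with rounded, assumed coefficients (L = 0.32 m)."""
    return 61.3125 * theta - 16.6667 * f, -3.27 * theta + 3.5556 * f

rng = random.Random(0)          # seeded generator for repeatability
state = (0.01, 0.0, 0.0, 0.0)   # small initial tilt, everything else at rest
for _ in range(230):            # 230 ms without control (force = 0)
    state = euler_step(state, 0.0, linear_accel, rng=rng)
print(state)
```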

Another source of uncertainty is related to the estimate of the slope of the stable manifold, which is required by the new intermittent control policy for terminating the on-phase. We modeled such uncertainty with a zero mean Gaussian white noise in order to induce a 20–30% variability of the slope value. In this manner the intersection of the internal model simulation with the stable manifold will be randomized, triggering off-phase trajectories with different values of γoff . This uncertainty incorporates also the influence of the inaccuracy of the simplified internal model of CIP dynamics because both sources of uncertainty (the one related to the slope and the other to the internal model) only matter as long as they co-influence the misselection of the switching time from the on-phase to the off-phase.

As regards the PD parameters of the stick (Pθ , Dω) we identified rough initial estimates by considering the linearized model equations of Equation 3, while ignoring the influence of the cart on the stick dynamics:

$$\ddot{\theta} = A\_{11}\theta + A\_{12}f \approx A\_{11}\theta + A\_{12}\left(P\_{\theta}\,\theta(t-\delta) + D\_{\omega}\,\dot{\theta}(t-\delta)\right) \tag{17}$$

The delayed state vector was approximated with the first order Taylor's expansion<sup>1</sup> :

$$\begin{cases} \theta(t-\delta) \sim \theta(t) - \dot{\theta}(t)\delta \\ \dot{\theta}(t-\delta) \sim \dot{\theta}(t) - \ddot{\theta}(t)\delta \end{cases} \tag{18}$$

This provides the following approximated, linearized equation of the on-phase

$$\ddot{\theta}\left(1 + A\_{12}D\_{\omega}\delta\right) + \dot{\theta}\left(A\_{12}P\_{\theta}\delta - A\_{12}D\_{\omega}\right) + \theta\left(-A\_{11} - A\_{12}P\_{\theta}\right) = 0 \tag{19}$$

The requirements for asymptotic stability of such model are then as follows:

$$\begin{cases} D\_{\omega} < \frac{(M + 0.25m)L}{1.5\delta} \\ D\_{\omega} > P\_{\theta}\delta \\ P\_{\theta} > (M + m)g \end{cases} \tag{20}$$

<sup>1</sup>Although the Taylor series expansion of delayed terms in differential equations is not a well-defined mathematical procedure, it is a simple heuristic technique for obtaining order of magnitude evaluations whose plausibility can be checked by means of experiments or simulations.

Moreover, we can obtain an estimate of the limit critical value of the time delay for achieving such asymptotic stability:

$$\delta\_{crit} = \sqrt{\frac{(M + 0.25m)L}{1.5(M + m)\text{g}}} \tag{21}$$

With the model parameters used in this study the critical value of the time delay is 128 ms and thus, with the considered delay of 230 ms, it will be impossible to satisfy all the three conditions above at the same time, in particular the first and the second one. However, by choosing the two PD parameters in such a way to satisfy the first and the third conditions we will be confident that the trajectories of the on-phase will be characterized either by an unstable node or spiral:

$$\begin{cases} D\_{\omega} < 0.261 \text{ Ns/rad} \\ P\_{\theta} > 3.678 \text{ N/rad} \end{cases} \tag{22}$$
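The numerical bounds of Equations 21 and 22 can be reproduced with a few lines of code. The sketch below assumes g = 9.81 m/s² (not stated in the text) and uses a hypothetical function name; the remaining parameters are those of this study.

```python
import math

def pd_stability_bounds(delta, L=0.32, m=0.125, M=0.25, g=9.81):
    """Return (D_omega upper bound, P_theta lower bound, critical delay).

    The first two values are the first and third conditions of Equation 20,
    the third is Equation 21.
    """
    d_omega_max = (M + 0.25 * m) * L / (1.5 * delta)
    p_theta_min = (M + m) * g
    delta_crit = math.sqrt((M + 0.25 * m) * L / (1.5 * (M + m) * g))
    return d_omega_max, p_theta_min, delta_crit

d_max, p_min, d_crit = pd_stability_bounds(delta=0.23)
print(f"D_omega < {d_max:.3f} Ns/rad, P_theta > {p_min:.3f} N/rad, "
      f"critical delay = {1000 * d_crit:.0f} ms")
# Second condition of Equation 20 (D_omega > P_theta * delta) for the lowest P_theta tested:
print(0.1686 > 4.0 * 0.23)
```

With a delay of 230 ms the second condition cannot hold together with the other two, which is consistent with the critical delay of about 128 ms.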

In particular, in the simulations we used the value D<sub>ω</sub> = 0.1686 Ns/rad for the former parameter and we varied the latter in the following range: P<sub>θ</sub> = 4 ↔ 20 N/rad.

The choice for the PD parameters of the cart (Px, Dv) was guided by two conflicting requirements:


In particular, D<sub>v</sub> = 0.1 Ns/m is in the range of values validated by Yoshikawa et al. (2016); P<sub>x</sub> = 0.01 N/m satisfied the two requirements above, although its modification around that value was not critical.

### RESULTS

The simulations of the modified intermittent control policy of the extreme-CIP model were labeled successful if the controller could prevent the stick from falling (|θ(t)| < π/4), while keeping the cart in the prescribed range (|x(t)| < 0.8 m), for a time interval of 2 min (plus an initial transient of 1 min). A given control model was supposed to generate at least 70% successful repetitions in order to be labeled stable.
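The two labels defined above can be written as simple predicates. This is a minimal sketch with hypothetical function names; it assumes that the time series passed in cover the whole evaluated run, and that "at least 70%" is meant as greater than or equal to 0.7.

```python
import math

def run_is_successful(theta_series, x_series, theta_max=math.pi / 4, x_max=0.8):
    """True if the stick never falls and the cart never leaves the prescribed range."""
    stick_ok = all(abs(theta) < theta_max for theta in theta_series)
    cart_ok = all(abs(x) < x_max for x in x_series)
    return stick_ok and cart_ok

def model_is_stable(outcomes, min_fraction=0.7):
    """True if at least min_fraction of the repetitions were successful."""
    return sum(outcomes) / len(outcomes) >= min_fraction

print(run_is_successful([0.0, 0.1, -0.2], [0.0, 0.3, -0.5]))  # within both limits
print(run_is_successful([0.0, 0.9], [0.0, 0.1]))              # tilt exceeds pi/4
print(model_is_stable([True] * 7 + [False] * 3))
```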

The simulation experiments used the following set of parameters:

On the basis of the experience previously gained from the standard intermittent control policy of upright standing, we focused our attention on the P<sub>θ</sub> control parameter in order to test the plausibility of the heuristic indication coming from Equation 22. We found indeed that if P<sub>θ</sub> < 4 the intermittent controller failed in 100% of the simulation runs. However, a small increase of P<sub>θ</sub> was sufficient to stabilize the CIP in most of the cases. **Figures 4**, **5** show the result of a representative simulation performed with P<sub>θ</sub> = 5.

**Figure 4** displays the concurrent oscillations of the stick angle and the cart position, as well as the power spectral density (PSD) of the stick angle, characterized by a peak around 0.7 Hz, coherent with the experimental data of Yoshikawa et al. (2016) and Milton et al. (2016). **Figure 5** is a representative phase portrait of the stick oscillation, generated by the concatenation of hyperbolic off-phases and spiraling on-phases, disturbed more or less by the concurrent motion of the cart. **Figure 6A** shows the histogram of γ<sub>off</sub> values that identify the distances of the state vector from the stable manifold, at the initial instant of each off-phase. The ideal value, in order to maximize the self-balancing action of the saddle-like instability, would be γ<sub>off</sub> = 1; the histogram shows that the distribution of this indicator over a simulation trial is indeed centered around the target value. The other two panels of **Figure 6** display the histogram of the duration of the on-phases and the corresponding histogram of the off-phases, respectively. The on-phases generally have a longer duration and are spread over a much larger range of values, also as a consequence of the disturbing effect of the cart motion. In contrast, the off-phases are generally shorter and tend to cluster around a value slightly higher than the sensory delay δ as a consequence of the phase-reset mechanism of the new intermittent control policy.

In order to evaluate the robustness of the new intermittent control policy we performed 100 simulations while changing the P<sup>θ</sup> control parameter from 4 to 20. Sample tests were also performed to evaluate the sensitivity to variations of the other parameters, and they did not reveal any critical tuning problem. **Figure 7** provides some evidence about the performance of the new control policy. Panel A shows that the probability of falling is 1 for P<sup>θ</sup> < 4 but quickly decreases to <0.2 around a value of 5, where the failure rate is minimal. For higher values of P<sup>θ</sup> the failure rate progressively increases up to a value close to 100%. The other two panels show the standard deviation of the stick oscillations (panel B) and cart positions (panel C) averaged over the successful trials of the 100 repetitions. Remarkably, in spite of the increasing failure rate with greater values of P<sup>θ</sup>, the range of the stick oscillation of the successful trials remains approximately stable. In contrast, there is a steady and significant increase of the amplitude of cart motion that is probably one of the reasons for the increasing failure rate. In summary, the reported experiments support the conclusion that the key control parameter should be tuned at the lowest possible value, just before the full-fledged establishment of uncontrolled instability.

FIGURE 4 | (A) Time series of the stick angle θ(*t*) during a 2 min balancing exercise. (B) Corresponding PSD. (C) Time series of the cart displacement during the same time interval. CIP parameters: stick length *L* = 32 *cm*; cart mass *M* = 0.25 *kg*; stick mass *m* = 0.125 *kg*; feedback delay δ = 230 *ms*. Controller parameters: *P*<sup>θ</sup> = 5 *N*/*rad*; *D*<sup>ω</sup> = 0.1826 *Ns*/*rad*; *P<sup>x</sup>* = 0.01 *N*/*m*; *D<sup>v</sup>* = 0.1 *Ns*/*m*.

We also evaluated the role of the uncertainty of the manifold slope, i.e., σslope. With σslope = 0.2, namely a 20% uncertainty about the real value of the slope, the new control policy can indeed succeed in stabilizing the CIP for the limit human case of a stick length of 32 cm. However, we also found that with such an uncertainty level the control policy can perform in a super-human manner, achieving successful stabilization for stick lengths as short as 26 cm. In order to clarify the point we performed simulations with σslope varying between 0.2 and 0.3 and found that at the highest uncertainty level (σslope = 0.3) the control policy fails in 100% of the simulations with a stick length of 32 cm. We also found that the human performance limit (90% success rate with a stick length of 32 cm) can be achieved with σslope ∼ 0.25, i.e., with a 25% uncertainty of the slope of the unstable manifold. As previously remarked, this uncertainty also incorporates the inaccuracy of the internal simulation model as regards the selection of the termination time of the on-phase.

### DISCUSSION

The simulation experiments performed in this study demonstrate that the basic rationale of the intermittent control policy, namely the exploitation of the intrinsic "affordance" of saddle-like dynamics during off-phases, remains plausible for the extreme configuration of the CIP stabilization task, matching the human performance limit, with a modification that keeps the core computational outline based on an alternation of on-phases and off-phases. The additional computational process is a phase reset mechanism that provides a prediction capability, not in real time and in a continuous manner (with a frequency band of the order of kHz) but at specific time instants, at a rate of the order of 1 Hz.

In addition to the capability of matching the human performance limit in CIP balancing with a rather minor increase of the computational complexity of the standard intermittent control model, the new control policy is consistent with the experimental evidence (Milton et al., 2016) that the best performance in terms of successful CIP balancing trials is achieved by tuning the main control parameter near the edge of instability. Although this characteristic feature has been interpreted as evidence of a minimization of energetic costs, we doubt that the energetic issue is relevant in the specific case of CIP balancing with a very light apparatus like the one used by Yoshikawa et al. (2016) and the CIP model of this study. Indeed, we estimated that the mechanical power required for balancing the model in the successful trials is quite small, of the order of 0.1 mW on average, with brief power peaks, typically one or two per minute, never exceeding a fraction of a Watt. As an alternative to this explanation, we suggest that tuning the proportional feedback parameter to the lowest possible value, before triggering uncontrolled unstable oscillations, is consistent with the general strategy of minimizing "stiffness" (in the most general sense) during the acquisition of a new skill, in the framework of a challenging learning process.

We need to stress that the new intermittent control model is not intended to substitute the standard version based on a threshold switching mechanism but should be considered as an extension made necessary for the human user when the challenge of the task is stretched to the limit of human performance. Without this motivation the simpler version of the control policy is the default choice: in that case, for the human user it is only necessary to tune a few control parameters and then freeze them during performance of the balancing task. We may speculate that when this strategy starts failing for the increased difficulty of the task the naïve user may attempt to extend it rather than substituting it with a completely different one. The logical key element that may attract the attention of the user is a more precise determination of the switching times, to be adapted at each oscillatory cycle, while inheriting all the dynamic features of the standard strategy that depend on the alternation of on-phases and off-phases. As already remarked, this additional computation, although somehow more complex than a simple threshold, has a limited bandwidth, related to the fine trimming of the sequence of transition times (ton, toff , ton, . . .), namely a few transitions per second. In particular, we suggested that this objective may be obtained by learning an internal model of the CIP dynamics paired with a phase-reset of the stick-state.

The limitations of the new control policy as well as the limitations of human performance are determined by the degree of uncertainty of the internal model components together with the noise of the feedback information about the state of the system. Ultimately, such sources of uncertainty are not important per se but for their effect on the inaccurate selection of toff : as a matter of fact, when the decision is taken to turn off the active control action, the state vector of the stick, whose real value has been approximated by the simulation of the internal model, may end up far away from its ideal target, namely the stable manifold of the CIP, whose slope is known with some uncertainty in any case. Therefore, what matters is not the precision per se of the state vector prediction generated by the simulation model or the accuracy per se of the estimate of the stable manifold slope but the overall inaccuracy of the relative position at toff of the state vector with respect to the stable manifold, that we characterized with the γoff indicator.

From the simulations we could also evaluate that the limits of human performance, namely the inability to balance a stick shorter than 32 cm, can be expressed as a 25% uncertainty about such relative position. A smaller level of uncertainty, say 20%, would allow a super-human performance limit, i.e., the ability to stabilize a CIP with a stick length as short as 26 cm; a higher level of uncertainty, say 30%, would induce a degraded sub-human performance level. In any case, the acquisition of the relevant internal models (the geometric model of the stable manifold slope, the short-term sensorimotor memory for phase reset of the CIP state at toff, and the dynamic model for the on-phase simulation) implies a rather long learning process based on the acquisition and elaboration of a large number of unsuccessful trials: this reflects well the fact that human subjects indeed require a large effort and long training in order to become skilled performers at this level of challenge, whereas they almost immediately succeed in controlling the system in a less challenging situation, say a stick length of 1 m or more. Moreover, there are some subjects that persistently fail in the most challenging situation, whatever the amount of training. Characterizing and modeling a learning process of this kind is clearly outside the purpose of this work, although we may investigate it in the near future: in any case, some suggestion may come from a preliminary study that focused on the use of reinforcement learning in relation to the emergence of intermittent-feedback control (Michimoto et al., 2016).

### REFERENCES


### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

This research was supported by the Robotics, Brain and Cognitive Science Department of the Italian Institute of Technology, Genova, Italy, and by JSPS grants-in-aid 16H01614 (TN) and 17K13016 (YS).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fncom.2019.00016/full#supplementary-material


pendulum model. J. Theor. Biol. 310, 55–79. doi: 10.1016/j.jtbi.2012.06.019


Yoshikawa, N., Suzuki, Y., Kiyono, K., and Nomura, T. (2016). Intermittent feedback-control strategy for stabilizing inverted pendulum on manually controlled cart as analogy to human stick balancing. Front. Comput. Neurosci. 10:34. doi: 10.3389/fncom.2016.00034

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Morasso, Nomura, Suzuki and Zenzeri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Computational Neural Modeling of Auditory Cortical Receptive Fields

Jordan D. Chambers<sup>1</sup>\*, Diego Elgueda<sup>2,3</sup>, Jonathan B. Fritz<sup>3</sup>, Shihab A. Shamma<sup>3,4</sup>, Anthony N. Burkitt<sup>1</sup> and David B. Grayden<sup>1</sup>

<sup>1</sup> NeuroEngineering Laboratory, Department of Biomedical Engineering, University of Melbourne, Parkville, VIC, Australia, <sup>2</sup> Departamento de Patología Animal, Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile, <sup>3</sup> Institute for Systems Research, University of Maryland, College Park, MD, United States, <sup>4</sup> Laboratoire des Systèmes Perceptifs, École Normale Supérieure, Paris, France

Previous studies have shown that the auditory cortex can enhance the perception of behaviorally important sounds in the presence of background noise, but the mechanisms by which it does this are not yet elucidated. Rapid plasticity of spectrotemporal receptive fields (STRFs) in the primary (A1) cortical neurons is observed during behavioral tasks that require discrimination of particular sounds. This rapid task-related change is believed to be one of the processing strategies utilized by the auditory cortex to selectively attend to one stream of sound in the presence of mixed sounds. However, the mechanism by which the brain evokes this rapid plasticity in the auditory cortex remains unclear. This paper uses a neural network model to investigate how synaptic transmission within the cortical neuron network can change the receptive fields of individual neurons. A sound signal was used as input to a model of the cochlea and auditory periphery, which activated or inhibited integrate-and-fire neuron models to represent networks in the primary auditory cortex. Each neuron in the network was tuned to a different frequency. All neurons were interconnected with excitatory or inhibitory synapses of varying strengths. Action potentials in one of the model neurons were used to calculate the receptive field using reverse correlation. The results were directly compared to previously recorded electrophysiological data from ferrets performing behavioral tasks that require discrimination of particular sounds. The neural network model could reproduce complex STRFs observed experimentally through optimizing the synaptic weights in the model. The model predicts that altering synaptic drive between cortical neurons and/or bottom-up synaptic drive from the cochlear model to the cortical neurons can account for rapid task-related changes observed experimentally in A1 neurons. 
By identifying changes in the synaptic drive during behavioral tasks, the model provides insights into the neural mechanisms utilized by the auditory cortex to enhance the perception of behaviorally salient sounds.

Keywords: mathematical modeling, neural networks, auditory cortex, spectrotemporal receptive fields (STRFs), genetic algorithm

#### Edited by:

Jun Izawa, University of Tsukuba, Japan

#### Reviewed by:

Andreas L. Schulz, Leibniz Institute for Neurobiology (LG), Germany

Jian K. Liu, University of Leicester, United Kingdom

> \*Correspondence: Jordan D. Chambers jordanc@unimelb.edu.au

Received: 13 December 2018 Accepted: 23 April 2019 Published: 24 May 2019

#### Citation:

Chambers JD, Elgueda D, Fritz JB, Shamma SA, Burkitt AN and Grayden DB (2019) Computational Neural Modeling of Auditory Cortical Receptive Fields. Front. Comput. Neurosci. 13:28. doi: 10.3389/fncom.2019.00028

## INTRODUCTION

The auditory cortex utilizes a variety of processing strategies to enhance the perception of behaviorally-meaningful sounds in the presence of background noise. Rapid plasticity of receptive fields in primary (A1) cortical neurons is observed during behavioral tasks that require discrimination of particular sounds (Fritz et al., 2003, 2005a,b; Elhilali et al., 2004, 2007). This rapid, task-related change may enhance the ability to selectively attend to one acoustic feature or to one stream of sound in the presence of mixed sounds.

An essential property of acoustic signals is their temporal dynamics. Electrophysiological studies have shown that A1 neurons can encode the temporal structure of acoustic stimuli (Elhilali et al., 2004). A traditional view of auditory processing describes how a temporal sequence of sounds is distributed in the frequency domain along the auditory pathway from the basilar membrane to the cortex. Therefore, it is important to consider a sound's spectral and temporal features together.

The spectrotemporal receptive field (STRF) is a description of an auditory neuron's input-to-output transformation encompassing both the spectral and temporal features. The STRFs of A1 neurons exhibit complex patterns that can undergo rapid, task-related changes (Fritz et al., 2003, 2005b; Elhilali et al., 2007). Complex patterns are observed, such as an increase in firing rate in response to increases in power at a certain frequency together with a decrease in firing rate in response to increases in power at an adjacent frequency (Fritz et al., 2005b). Rapid changes in the STRF are observed in A1 during task performance of ferrets trained to attend to a tone of any frequency (Fritz et al., 2003). Attending to a target tone consistently induced facilitative changes in the STRF at the location of the target tone in a conditioned avoidance Go-NoGo task, whereas it rapidly induced suppressive STRF changes at the target tone frequency in a positive reinforcement Go-NoGo task (David et al., 2012). However, the neural mechanisms by which cortical neurons dynamically change their STRFs in a matter of seconds remain unknown.

There have been several previous attempts to model the auditory cortex to investigate the role it plays in the perception of important sounds. For example, it has been shown that changes in STRFs can enhance discrimination between different sounds using mathematical filters (Mesgarani et al., 2010). However, using filters or similar macro mathematical processes (for example, Wrigley and Brown, 2004; Loebel et al., 2007; Grossberg and Kazerounian, 2011) neglects the fine temporal information that is encoded in the precise timing of action potential firing of neurons in the cortex (Elhilali et al., 2004). Other models that do contain fine temporal information (for example, Bendor, 2015) have not addressed the challenge of global auditory perception. Extending previous models to include fine temporal information has the potential to overcome the limitations of previous studies and to allow an investigation of the role of timing in the adaptive neural mechanisms utilized by the brain.

This study develops a neural network model that can produce realistic STRFs in response to a sound signal. The model consists of a cochlea model and a neural network model representing the A1 cortex. Action potentials in the neural network model were used to calculate the STRF. Mechanisms by which cortical neurons can change their STRF were investigated and directly compared to electrophysiological recordings. This model can reproduce complex STRFs observed experimentally. The model shows that synaptic drive between cortical neurons and/or synaptic drive from the cochlear model to the cortical neurons can account for rapid task-related changes exhibited by A1 neurons.

### METHODS

A phenomenological neural network model was developed to investigate mechanisms by which cortical neurons can change their spectrotemporal receptive fields. **Figure 1** shows an overview of the model structure. A sound signal was sent into a model of the cochlea. The output of the cochlear model excited or inhibited integrate-and-fire neuron models to represent networks in the primary auditory cortex (A1). Each neuron in the network was tuned to a different frequency. All neurons were interconnected with excitatory or inhibitory synapses of varying strengths. Action potentials in one of the cortical neuron models were used to calculate the STRF using reverse correlation, which could be directly compared to experimentally derived STRFs from electrophysiological recordings in ferrets (Fritz et al., 2003). The cortical neuron from which the STRF was calculated was chosen according to the best frequency (defined as the frequency that produced the largest spiking response at a given sound intensity) in the experimentally recorded STRF. A genetic algorithm was used to optimize the synaptic drive between neurons to produce STRFs and the behavioral changes in STRFs that matched experimentally recorded data.

### The Model

#### Sound Signals

Computer-generated sound signals were used to activate the mathematical model of the cochlea. The sound signals consisted of temporally orthogonal ripple combinations (Klein et al., 2000), called TORCs. The TORCs were synthesized from the general expression (Klein et al., 2000):

$$S(t, x) = \sum_{i=1}^{N} 2a_{k_i, l_i} \cos \left\{ 2\pi \left( \omega_{k_i} t + \varepsilon_{l_i} x \right) + \varphi_{k_i, l_i} \right\}, \tag{1}$$

where N is the number of distinct moving ripples, t is the time course of the stimuli, x is the bandwidth of the stimuli, a<sub>k,l</sub> describes the amplitude, ω<sub>k</sub> describes the ripple velocity, ε<sub>l</sub> describes the ripple frequency, and ϕ<sub>k,l</sub> describes the phase of the stimulus components. The particular ripples chosen are parameterized by the lists of indices k = {k<sub>1</sub>, k<sub>2</sub>, . . . , k<sub>N</sub>} ∈ (−∞, ∞) and l = {l<sub>1</sub>, l<sub>2</sub>, . . . , l<sub>N</sub>} ∈ (0, ∞).

To generate an STRF, a set of 30 TORCs was used. The set of TORCs had identical properties to the TORCs used as acoustic stimuli in electrophysiological recordings. The TORCs had durations ranging over 1–2 s. Each TORC had a spectral profile that was the superposition of the envelopes of six ripples. Each ripple had a sinusoidal spectral profile with peaks spaced between 0 and 1.4 peaks per octave (ε<sub>l</sub> in Equation 1) and the envelope drifted temporally along the logarithmic tonotopic frequency axis at a constant velocity ranging from 2–24 to 4–48 Hz (ω<sub>k</sub> in Equation 1).
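As an illustration of Equation 1, the following minimal sketch evaluates the ripple sum on a time-by-octave grid. It is not the stimulus-generation code used in the study: the function name `ripple_sum` and all numerical values in the usage note are hypothetical and chosen only to show the structure of the expression.

```python
import numpy as np

def ripple_sum(t, x, amplitudes, velocities, densities, phases):
    """Evaluate S(t, x) = sum_i 2 a_i cos(2*pi*(w_i t + e_i x) + phi_i).

    t          : (T,) times in seconds
    x          : (X,) positions on the log-frequency axis, in octaves
    amplitudes : (N,) ripple amplitudes a_i
    velocities : (N,) ripple velocities w_i in Hz
    densities  : (N,) ripple frequencies e_i in cycles per octave
    phases     : (N,) ripple phases phi_i in radians
    Returns an array of shape (T, X).
    """
    t = np.asarray(t, dtype=float)[:, None, None]   # (T, 1, 1)
    x = np.asarray(x, dtype=float)[None, :, None]   # (1, X, 1)
    a = np.asarray(amplitudes, dtype=float)         # (N,) broadcasts on last axis
    w = np.asarray(velocities, dtype=float)
    e = np.asarray(densities, dtype=float)
    phi = np.asarray(phases, dtype=float)
    argument = 2.0 * np.pi * (w * t + e * x) + phi  # (T, X, N)
    return np.sum(2.0 * a * np.cos(argument), axis=-1)
```

For example, `ripple_sum(np.linspace(0, 1, 1000), np.linspace(0, 5, 15), [0.5] * 6, [4, 8, 12, 16, 20, 24], [0.2] * 6, [0.0] * 6)` superimposes six ripples over a five-octave axis; these particular amplitudes, velocities, and densities are illustrative assumptions, not the published TORC set.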

#### Cochlear Model

A model of the cochlea was used to convert the acoustic signal into neural activity. Fifteen auditory nerve fibers were simulated, with center frequencies evenly distributed along the logarithmic axis between 0.5 and 16 kHz. For each nerve, the auditory nerve model of Carney and colleagues was used (Tan and Carney, 2003; Zilany et al., 2009, 2014), so it will only be summarized here. There are two modes of basilar membrane excitation to the inner hair cell. The two modes are generated by two parallel filters. The first is a narrow-band chirp filter, which is responsible for low and moderate level responses. The second is linear, static, and broadly tuned, which is critical for producing transition regions and high-level effects. The responses from the two modes of the basilar membrane excitation are then added and passed through the inner hair cell low-pass filter followed by the inner hair cell-auditory nerve synapse model and discharge generator. The model responses are consistent with a wide range of physiological data from both normal and impaired ears for stimuli presented at levels spanning the dynamic range of hearing.

Version 5.2 of the auditory periphery model was used (Zilany et al., 2009) with modifications and updated simulation options (Zilany et al., 2014). Cat was used as the species option to produce good responses in the 12–16 kHz range, whereas the human option would show a decline in output above 12 kHz. A medium level of spontaneous activity and a variable noise type were chosen as these produced more robust responses when stimulating with TORCs. A normal setting for both inner and outer hair cell function was used.
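The placement of the fifteen center frequencies can be sketched as follows. This is only an illustration of even spacing on a logarithmic axis between 0.5 and 16 kHz; the helper name `center_frequencies` is hypothetical, and the auditory periphery model itself is not reproduced here.

```python
import numpy as np

def center_frequencies(f_low_hz=500.0, f_high_hz=16000.0, n_fibers=15):
    """Return n_fibers center frequencies evenly spaced on a log axis."""
    return np.geomspace(f_low_hz, f_high_hz, num=n_fibers)

cf = center_frequencies()
# Spacing between neighboring fibers, in octaves (5 octaves over 14 intervals).
octave_step = np.log2(cf[1] / cf[0])
```

With these endpoints, neighboring fibers are separated by 5/14 of an octave, roughly 0.36 octaves.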

#### Cortical Neuron Model

A standard integrate-and-fire neuron was used of the form,

$$\tau_m \frac{dv(t)}{dt} = \left(v_{\text{rest}} - v(t)\right) + I_{\text{syn}}(t) \ast R_m \ast \left(v(t) - v_{E_I}\right), \tag{2}$$

where v is the membrane potential, v<sub>rest</sub> is the resting membrane potential or equilibrium potential of the membrane leak, τ<sub>m</sub> is the membrane time constant of the neuron, R<sub>m</sub> is the membrane resistance, I<sub>syn</sub> is the current resulting from the synaptic input into the neuron, and v<sub>E<sub>I</sub></sub> is the driving force (or equilibrium potential) of the synaptic current.

TABLE 1 | Numerical values used in the cortical neuron model.

This table shows the numerical values used in the integrate-and-fire neuron model. These parameters appear in Equations (2, 3).

Synaptic input from the Carney model and synaptic input from other cortical neurons (**Figure 1**) was modeled by an injected synaptic current with an alpha time course,

$$I_{\text{syn}}(t) = h(t) \ast g_m \frac{t}{\tau_s} e^{-t/\tau_s}, \tag{3}$$

$$\text{where } h(t) = \begin{cases} 0 & \text{if } t \le 0 \\ 1 & \text{if } 0 < t \end{cases}$$

and t is the time since the synaptic event, g<sub>m</sub> is the maximum conductance, and τ<sub>s</sub> is the time constant of the synapse.

The parameter values for Equations (2, 3) are given in **Table 1**.
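The alpha time course of Equation 3 can be sketched in a few lines. The function name `alpha_current` is hypothetical, and the values in the usage note are placeholders rather than the Table 1 values.

```python
import numpy as np

def alpha_current(t, g_max, tau_s):
    """Alpha-function synaptic current: h(t) * g_max * (t / tau_s) * exp(-t / tau_s).

    t is the time since the synaptic event (same units as tau_s).
    """
    t = np.asarray(t, dtype=float)
    # Clipping negative times to zero implements the step function h(t):
    # the expression is exactly zero for t <= 0 and unchanged for t > 0.
    t_pos = np.clip(t, 0.0, None)
    return g_max * (t_pos / tau_s) * np.exp(-t_pos / tau_s)
```

With placeholder values such as `alpha_current(np.linspace(-5e-3, 50e-3, 5501), g_max=1.0, tau_s=2e-3)`, the current is zero before the event, peaks at t = τ<sub>s</sub> with amplitude g<sub>m</sub>/e, and then decays.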

#### Reverse Autocorrelation of the STRF

The STRF is a description of the auditory system's input-to-output transformation. It has the general form,

$$r(t) = \iint F_{\text{STRF}}(\tau, x) \ast S(t - \tau, x)\, d\tau\, dx, \tag{4}$$

where r is a neuron response as a firing rate, F<sub>STRF</sub> is the STRF functional, and S is the stimulus's dynamic spectrum.

The spectrotemporal reverse-correlation function C is obtained by cross-correlating the dynamic spectrum of the stimulus with the measured response,

$$C(\tau, x) = \frac{1}{T} \int_0^T S(t - \tau, x) \ast r(t)\, dt, \tag{5}$$

where x indexes the frequency bands (and denotes the number of octaves above the lowest frequency). By inserting the STRF functional (Equation 4) directly into Equation 5 and rearranging the terms, one obtains the spectrotemporal cross-correlation function,

$$C(\tau, x) = \iint F_{\text{STRF}}(\tau', x') \ast \Phi(\tau - \tau', x, x')\, d\tau'\, dx' + \varepsilon(\tau, x), \tag{6}$$

where ε is the portion of the measured response due to non-linear and random aspects of the system transformation not described by the STRF and Φ is given by

$$\Phi(\tau - \tau', x, x') \triangleq \int S(t - \tau', x')\, S(t - \tau, x)\, dt. \tag{7}$$

Here, Φ is a function that, in discrete channel interpretation, describes the cross-correlation between two channels x and x′ of the stimuli's dynamic spectrum. Thus, a single channel x of C is produced by the sum of the convolutions of every channel x′ of the STRF with the cross-correlation between the channels x′ and x of the stimulus. However, for both an ideal white noise dynamic spectrum and for a TORC, this expression reduces to a relatively simple two-dimensional convolution between the STRF and a spectrotemporal filter Φ(τ, x). For these two special cases, Φ depends only on the channel difference, x − x′, and is given by the autocorrelation of S.
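A discrete-time version of the reverse correlation in Equation 5 can be sketched as below. This is a simplified illustration, not the analysis code used for the recordings: the function name `reverse_correlation`, the array layout, and the white-noise stimulus in the usage note are all assumptions.

```python
import numpy as np

def reverse_correlation(stimulus, response, n_lags):
    """Estimate C[lag, x] = (1/T) * sum_t S[t - lag, x] * r[t].

    stimulus : (T, X) dynamic spectrum, one column per frequency channel
    response : (T,) firing rate sampled on the same time grid
    n_lags   : number of time lags (in samples) to evaluate
    """
    stimulus = np.asarray(stimulus, dtype=float)
    response = np.asarray(response, dtype=float)
    n_samples, n_channels = stimulus.shape
    C = np.zeros((n_lags, n_channels))
    for lag in range(n_lags):
        # Pair each response sample r[t] with the stimulus lag samples earlier.
        C[lag] = stimulus[: n_samples - lag].T @ response[lag:] / n_samples
    return C
```

As a quick check of the idea, if the stimulus is white noise and the response simply copies channel 1 of the stimulus with a delay of two samples, the largest entry of `C` appears at lag 2 in channel 1, which is the behavior expected from the white-noise special case described above.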

#### Optimization Algorithm

A genetic algorithm was used to optimize the synaptic drive between neurons to produce STRFs and the behavior-induced changes in STRF that matched experimentally recorded data. The neural network model contained 15 neurons, so the genetic algorithm optimized 255 parameters representing 225 synaptic connections between all 15 neurons, and 30 variables (2 per neuron) for the input from the cochlear model (15 for the strength of the input and 15 for the delay in transmission). All parameters were limited to a range of −5 to 5 and only integer values were used to reduce the probability of overfitting. A positive value for the synaptic drive resulted in an excitatory postsynaptic potential, whereas a negative value for the synaptic drive resulted in an inhibitory post-synaptic potential. The positive or negative values of the parameters for the delay in transmission were added to a baseline value to produce a transmission delay in the range of 0–50 ms.
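The parameter count above (225 + 15 + 15 = 255) can be made concrete with a small sketch that unpacks one candidate solution. The ordering of the parameters within the vector, the 25 ms baseline, and the 5 ms step per integer unit are assumptions made here only so that integer values in −5 to 5 map onto the stated 0–50 ms delay range; the source does not specify them.

```python
import numpy as np

N_NEURONS = 15
BASELINE_DELAY_MS = 25.0  # assumed baseline, not given in the text
DELAY_STEP_MS = 5.0       # assumed scaling, not given in the text

def unpack_parameters(params):
    """Split a 255-element integer vector into weights, input strengths, delays."""
    params = np.asarray(params)
    n = N_NEURONS
    if params.shape != (n * n + 2 * n,):
        raise ValueError("expected 255 parameters")
    weights = params[: n * n].reshape(n, n)        # 225 synaptic connections
    input_strength = params[n * n : n * n + n]     # 15 cochlear input strengths
    delay_ms = BASELINE_DELAY_MS + DELAY_STEP_MS * params[n * n + n :]  # 15 delays
    return weights, input_strength, delay_ms
```

Under these assumptions, a delay parameter of −5 corresponds to 0 ms and a delay parameter of 5 corresponds to 50 ms.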

The genetic algorithm followed a typical methodology. All parameters were assigned at random for the initial population of 1,000. The best 40 responses were classified as elite and passed directly to the next generation. The best 100 responses were used as parents for the next generation, where 480 were created from crossing-over parameters from parents and 480 were created from mutation of individual parents.
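One generation of the procedure described above can be sketched as follows. The population sizes are taken from the text; the specific operators (uniform crossover between two randomly chosen parents, and mutation by resampling a single randomly chosen parameter) are assumptions, since the text does not state which operators were used.

```python
import numpy as np

N_PARAMS = 255
POP_SIZE = 1000
N_ELITE = 40
N_PARENTS = 100
N_CROSSOVER = 480
N_MUTATION = 480
LOW, HIGH = -5, 5  # integer parameter range, inclusive

def initial_population(rng):
    """Random integer population of shape (POP_SIZE, N_PARAMS)."""
    return rng.integers(LOW, HIGH + 1, size=(POP_SIZE, N_PARAMS))

def next_generation(population, costs, rng):
    """Build the next generation from elite, crossover, and mutation groups."""
    order = np.argsort(costs)                 # lowest cost first
    elite = population[order[:N_ELITE]]       # passed on unchanged
    parents = population[order[:N_PARENTS]]

    # Uniform crossover (assumed operator): each gene comes from parent A or B.
    parent_a = parents[rng.integers(0, N_PARENTS, size=N_CROSSOVER)]
    parent_b = parents[rng.integers(0, N_PARENTS, size=N_CROSSOVER)]
    take_a = rng.random((N_CROSSOVER, N_PARAMS)) < 0.5
    crossed = np.where(take_a, parent_a, parent_b)

    # Point mutation (assumed operator): resample one gene of a copied parent.
    mutated = parents[rng.integers(0, N_PARENTS, size=N_MUTATION)].copy()
    gene = rng.integers(0, N_PARAMS, size=N_MUTATION)
    mutated[np.arange(N_MUTATION), gene] = rng.integers(LOW, HIGH + 1, size=N_MUTATION)

    return np.vstack([elite, crossed, mutated])
```

The three groups sum to 40 + 480 + 480 = 1,000 individuals, so the population size is preserved from one generation to the next.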

To determine the best responses, the STRF estimated from the action potentials in the cortical neuron network was compared with an experimentally recorded STRF by the cost function,

$$\mathcal{L} = \sum_{i}^{N_f} \sum_{j}^{N_t} \left| M_{\text{STRF}}(i, j) - E_{\text{STRF}}(i, j) \right| \ast \text{sig}\left(E_{\text{STRF}}(i, j)\right), \tag{8}$$

where N<sub>f</sub> is the number of points along the frequency axis of the STRF, N<sub>t</sub> is the number of points along the time axis of the STRF, M<sub>STRF</sub> is the STRF calculated in the cortical neuron model, E<sub>STRF</sub> is the experimentally recorded STRF, and sig is a function used to focus the genetic algorithm on the regions of the STRF that are significantly different from the remainder of the STRF. A point in the STRF was determined to be significantly different if it exceeded 3 standard deviations from the mean of the whole STRF,

$$\text{sig}(x) = \begin{cases} 1 & \text{if } x \ge 3 \text{ s.d. from mean}(x) \\ 0.1 & \text{if } x < 3 \text{ s.d. from mean}(x) \end{cases} \tag{9}$$

where s.d. is the standard deviation.
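The weighted cost of Equations 8 and 9 can be sketched as below. One interpretation is assumed here and should be read as such: a point counts as significant when its absolute deviation from the mean of the experimental STRF is at least 3 standard deviations, so that both strongly excitatory and strongly inhibitory regions receive full weight. The function names are hypothetical.

```python
import numpy as np

def sig_weights(e_strf, n_sd=3.0):
    """Weight 1 for points at least n_sd s.d. from the mean, 0.1 otherwise.

    Assumption: the deviation is taken in absolute value.
    """
    e_strf = np.asarray(e_strf, dtype=float)
    deviation = np.abs(e_strf - e_strf.mean())
    return np.where(deviation >= n_sd * e_strf.std(), 1.0, 0.1)

def strf_cost(m_strf, e_strf):
    """Sum over all (frequency, time) points of |M - E| * sig(E)."""
    m_strf = np.asarray(m_strf, dtype=float)
    e_strf = np.asarray(e_strf, dtype=float)
    return float(np.sum(np.abs(m_strf - e_strf) * sig_weights(e_strf)))
```

Because non-significant points still carry a weight of 0.1, a model STRF with spurious structure away from the main peaks is penalized, only less heavily than a mismatch at the peaks themselves.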

#### Sensitivity Analysis

The genetic algorithm was used to optimize the 255 parameters in the mathematical model to reproduce the electrophysiologically recorded STRF in the model. This was performed for STRFs electrophysiologically recorded in the passive state and the behavioral states, where the ferret was actively listening for the target tone. A comparison between the model parameters optimized for the passive and behavioral states was performed by doing a sensitivity analysis on each parameter. The sensitivity analysis involved sequentially increasing and decreasing each parameter by one value from the optimal solution and determining the resulting change in the cost function (see Equation 8), which was then normalized over all 255 parameters. This sensitivity analysis quantified the effect of each parameter change upon the STRF for a given solution. The optimization by the genetic algorithm and the sensitivity analysis were repeated five times for the same electrophysiologically recorded STRF.
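The one-at-a-time perturbation described above can be sketched as follows. Two details are assumptions, since the text does not specify them: the changes from the +1 and −1 perturbations are combined as a sum of absolute differences, and normalization divides by the total over all parameters. The function name and the `cost_fn` callable are hypothetical stand-ins for a full model evaluation.

```python
import numpy as np

def sensitivity_analysis(optimal_params, cost_fn):
    """Normalized cost change when each parameter is moved by +1 and -1.

    optimal_params : 1-D integer array (the optimized solution)
    cost_fn        : callable mapping a parameter vector to a scalar cost
    """
    optimal_params = np.asarray(optimal_params)
    base_cost = cost_fn(optimal_params)
    change = np.zeros(len(optimal_params))
    for i in range(len(optimal_params)):
        for step in (-1, 1):
            trial = optimal_params.copy()
            trial[i] += step
            change[i] += abs(cost_fn(trial) - base_cost)
    total = change.sum()
    # Normalize so the sensitivities sum to 1 (assumed convention).
    return change / total if total > 0 else change
```

In the study each call to the cost function would involve simulating the network and recomputing the STRF, so the 255 parameters require 510 model evaluations per analysis in addition to the baseline.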

To understand the changes occurring between the passive and behavioral states, a network diagram was generated containing the important parameters highlighted by the sensitivity analysis. A parameter was determined to be important if it was in the top two parameters of the sensitivity analysis for either the passive or behavioral states. Additional parameters were included to view the flow of information from the sound signal to the neuron from which the STRF was calculated, in order to understand how the important parameters would influence the STRFs. The additional parameters included input from the cochlear model to neurons involved in the important parameters and synaptic connections between the neurons involved in the important parameters and the neuron from which the STRF was calculated. Input from the cochlear model to the neuron from which the STRF was calculated was also included.

#### Experimental Procedures

All experimental procedures were approved by the University of Maryland Institutional Animal Care and Use Committee (IACUC). Electrophysiological recordings from the A1 cortex of adult ferrets were performed with a behavioral paradigm requiring the ferrets to actively listen for a particular tone. This experimental procedure has been described in detail elsewhere (Fritz et al., 2003, 2005a,b), so will only be summarized here.

A stainless-steel head post was surgically implanted onto the skull and mounted with dental cement to stabilize the head for neurophysiological recordings. Craniotomies were made over auditory cortex, allowing microelectrodes to be inserted into A1. Location was based on stereotaxic coordinates and distinctive A1 neurophysiological characteristics such as latency, receptive field tuning, and position relative to the cortical tonotopic map.

FIGURE 2 | Output at different stages of the model. Examples of the signals and activity patterns for different components in the model. (A) An example of one cycle of a TORC sound signal sent into the model of the cochlea. (B) The mean activity rate in the model of the cochlea with a center frequency of 3 kHz. (C) Synaptic events that occur in the hair cell-auditory nerve synapse model and subsequent discharge generator in response to the mean activity rate in (Continued)

Experiments were conducted in a sound attenuation chamber. Ferrets were trained on a tone detection task using a Go-NoGo conditional avoidance procedure (Klump, 1995; Fritz et al., 2003). In the passive state, the ferrets were awake and quiescent when the TORC stimuli were presented. In the behavioral state, ferrets licked water from a spout while listening to reference stimuli until they heard a target tone, whereupon the ferret learned to stop licking for a short period of time (400 ms) to avoid a mild shock. The same sets of TORCs and the same procedures for calculating the STRF from electrophysiologically recorded spikes were used in the animal experiments and the mathematical model described above.

## RESULTS

A neural network model was developed to reproduce experimentally observed changes in STRFs of neurons in the A1 cortex. A sound signal was sent into a model of the cochlea, which provided input into integrate-and-fire neuron models to represent neural networks in the A1 cortex (**Figure 1**). The action potentials in one of the neuron models were used to calculate an STRF using reverse correlation, which could be directly compared to electrophysiological recordings in vivo.

### Producing Anticipated STRFs

To ensure the overall model structure was producing expected results, the cortical neural network model was reduced in complexity by removing synaptic connections between cortical neurons. Playing a simple chirp as the input sound signal produced activity in the cochlear model and cortical neuron model at the expected times (data not shown). **Figure 2** shows an example TORC sound signal for one repetition (**Figure 2A**) and the resulting activity in the different stages of the model. The mean activity in the cochlear model (determined by the inner hair cell potential) fluctuated with the intensity of the sound signal (**Figure 2B**) and the resulting output from the discharge generator in the cochlear model was dependent on the mean activity (**Figure 2C**). Events in the discharge generator resulted in synaptic potentials in the cortical neuron model that were offset with a time delay corresponding to the delay in transmission from the cochlear model to the cortical neuron model (**Figure 2D**). When the events in the discharge generator were close enough together in time, summation of the synaptic potentials in the cortical neuron model could result in action potentials (**Figure 2D**). When investigating complex neural networks, more robust responses were observed with cortical neurons that had tonic firing of action potentials. **Figure 2E** shows a cortical neuron with tonic firing that received the exact same inputs as the cortical neuron in **Figure 2D**. The cortical neuron with tonic firing showed an increase in the action potential firing rate at times corresponding to the times of high activity in the discharge generator, as expected (**Figure 2E**). Therefore, when a TORC is played into the model, action potentials are observed in the cortical neurons as expected.
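The summation behavior described for **Figure 2D** can be illustrated with a minimal leaky integrate-and-fire sketch. The transmission delay, synaptic weight, time constant, and threshold used here are illustrative assumptions rather than the values of the published model, and the sketch omits the noise and after-hyperpolarising potential of the full neuron model.

```python
import math


def simulate_lif(input_times_ms, delay_ms=10.0, weight=0.6, tau_ms=5.0,
                 threshold=1.0, dt_ms=0.1, t_end_ms=100.0):
    """Leaky integrate-and-fire neuron driven by delayed input events.

    Each input event adds `weight` to the membrane potential after
    `delay_ms`. The potential decays with time constant `tau_ms` and is
    reset to zero after a spike. All parameter values are assumptions.
    """
    n_steps = int(round(t_end_ms / dt_ms))
    arrivals = [0] * (n_steps + 1)
    for t in input_times_ms:
        step = int(round((t + delay_ms) / dt_ms))
        if step <= n_steps:
            arrivals[step] += 1

    decay = math.exp(-dt_ms / tau_ms)
    v = 0.0
    spikes = []
    for step in range(n_steps + 1):
        v = v * decay + weight * arrivals[step]
        if v >= threshold:
            spikes.append(step * dt_ms)
            v = 0.0  # reset after an action potential
    return spikes


# Widely spaced events decay before they can sum: no action potential.
print(simulate_lif([10.0, 40.0, 70.0]))
# Two events 1 ms apart sum above threshold, offset by the 10 ms delay.
print(simulate_lif([10.0, 11.0]))
```

With these assumed values a single synaptic potential (0.6) stays below threshold, whereas two potentials arriving 1 ms apart sum to about 1.09 and trigger an action potential roughly 21 ms after stimulus onset, that is, at the arrival time of the second event.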

With a sound signal producing expected activity in the cochlear model and cortical neuron model, an STRF calculated from the action potentials in the cortical neuron model also produced expected results. When there were no synaptic connections between cortical neurons, each neuron produced a simple STRF with a simple excitatory region (**Figure 3A**). Altering the center frequency of tuning in the cochlear model (from 5 kHz in **Figure 3A** to 2.5 kHz in **Figure 3B**) moved the simple excitatory region to the expected frequency. Furthermore, switching the connection from the cochlear model to the cortical neuron model to inhibitory produced an inhibitory region (**Figure 3C**). Increasing the delay of transmission from the cochlear model to the cortical neuron model produced a time shift in the appearance of the excitatory region in the STRF (**Figure 3D**). These results demonstrate that the model is producing STRFs that accurately represent the model structure between the cochlear model and the cortical neuron model.
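The way an STRF exposes the frequency tuning and transmission delay of a connection can be illustrated with a small reverse-correlation (spike-triggered average) sketch. Here Gaussian noise stands in for the TORC stimuli and a thresholded toy neuron stands in for the integrate-and-fire model; both substitutions, along with all sizes and thresholds, are assumptions made for illustration only.

```python
import random


def reverse_correlation(spectrogram, spike_bins, n_lags):
    """Spike-triggered average of the stimulus, returned as strf[f][lag]."""
    n_freq = len(spectrogram)
    strf = [[0.0] * n_lags for _ in range(n_freq)]
    # Only use spikes with a full stimulus history of n_lags bins.
    valid = [t for t in spike_bins if t >= n_lags - 1]
    for t in valid:
        for f in range(n_freq):
            for lag in range(n_lags):
                strf[f][lag] += spectrogram[f][t - lag]
    count = max(len(valid), 1)
    return [[value / count for value in row] for row in strf]


rng = random.Random(0)
n_freq, n_time, n_lags = 4, 5000, 8
# Zero-mean noise stimulus: one row per frequency channel.
stim = [[rng.gauss(0.0, 1.0) for _ in range(n_time)] for _ in range(n_freq)]

# Toy neuron: fires when channel 2 was strongly driven 3 bins earlier.
spikes = [t for t in range(3, n_time) if stim[2][t - 3] > 1.0]

strf = reverse_correlation(stim, spikes, n_lags)
peak = max(((f, lag) for f in range(n_freq) for lag in range(n_lags)),
           key=lambda fl: strf[fl[0]][fl[1]])
print(peak)  # the excitatory region sits at channel 2, lag 3
```

Changing the driving channel moves the peak along the frequency axis, and changing the lag moves it along the time axis, which mirrors the manipulations shown in **Figures 3A,B,D**.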

To ensure the cortical neuron network was also influencing the STRF as expected, the cortical neuron network model was extended to include one synaptic connection from a cortical neuron receiving input from the cochlear model tuned to a different frequency. This resulted in a second excitatory region (**Figure 3F**) or an inhibitory region (**Figure 3E**), depending on the type of synaptic connection. The transmission time from one cortical neuron to another caused a delay in the appearance of the second region in the STRF (**Figures 3E,F**). This delay is proportional to the difference in frequency of tuning in the cochlear model, to represent the tonotopic organization observed in the A1 cortex.

In **Figure 3**, alternating regions of excitation and inhibition can be observed in the STRFs. Usually these regions are quite weak, but sometimes the regions can be strong (for example, the excitatory region at 3 kHz and 45 ms in **Figure 3E**). These alternating regions of excitation and inhibition are produced as a combination of the intrinsic properties of the neuron model (in particular resetting of the membrane potential after an action potential and the presence of an after-hyperpolarising potential, data not shown) and the properties of the TORCs. Reducing the noise within the model increases the appearance of these alternating regions, so they are obvious in **Figure 3** where the model was simplified to ensure it was working properly.

### Reproducing Experimentally Observed STRFs

The excitatory and inhibitory regions in the STRFs calculated from this model of the auditory system can be manipulated by synaptic connections from other cortical neurons, synaptic drive from the cochlear model, and delay in transmission from the cochlear model to the cortical neuron model. Complex STRFs can be generated using different combinations of these manipulations. To test if this model can reproduce experimentally recorded STRFs, a genetic algorithm was used to optimize the STRF generated from the mathematical model with an experimentally recorded STRF.
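As a rough illustration of this kind of optimization, the sketch below implements a small genetic algorithm with truncation selection, one-point crossover, Gaussian mutation, and elitism. The operators, population size, and toy cost function are assumptions chosen for brevity; this section does not specify the settings used in the study beyond the number of generations.

```python
import random


def genetic_algorithm(cost, n_params, pop_size=40, generations=100,
                      mutation_sd=0.1, seed=0):
    """Minimize `cost` over real-valued parameter vectors (n_params >= 2)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0.0, mutation_sd) for g in child]
            children.append(child)
        population = children
    population.sort(key=cost)
    history.append(cost(population[0]))
    return population[0], history


# Toy stand-in for the STRF mismatch: squared distance to a target vector.
target = [0.5, -0.2, 0.8, 0.0, -0.6, 0.3]


def toy_cost(params):
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_a, history_a = genetic_algorithm(toy_cost, len(target), seed=0)
best_b, history_b = genetic_algorithm(toy_cost, len(target), seed=1)
print(history_a[0], history_a[-1])
```

Because the best individual is carried over unchanged, the best cost never increases from one generation to the next. Running the sketch with a different seed yields a different parameter vector with a similar final cost.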

**Figure 4** displays the experimentally recorded STRFs and the STRFs generated from the computer model. We note that in physiological experiments, under both passive listening and active behavioral conditions, the same acoustic stimuli (i.e., broadband rippled noise combinations or TORCs, and target tones) were presented in identical order. To focus the genetic algorithm on important regions of the STRFs, points in the STRF more than three standard deviations from the mean value of all points in the STRF were emphasized (see Equations 8 and 9), and these physiologically recorded targets (**Figures 4A,E**) were closely matched by the model (**Figures 4B,F**). Furthermore, the electrophysiologically recorded STRFs and those generated from the mathematical model closely matched in both the passive (**Figures 4C,D**) and the behavioral states (**Figures 4G,H**). This demonstrates that the optimization method and parameter space of the mathematical model enable the mathematical model to reproduce experimental observations. Repeating the optimization method with a different seed for the random number generator produced a different solution, which also closely matched the experimental observations. For each experimental observation, repeating the optimization five times produced five unique solutions (**Figure S1**), but often similar values of the cost function were produced over 100 generations of the genetic algorithm.
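The three-standard-deviation criterion can be sketched as a mask over the target STRF combined with a weighted squared error. Equations 8 and 9 are not reproduced in this part of the text, so the weighting scheme and the `emphasis` factor below are assumptions that only illustrate the idea of emphasizing significant points.

```python
import statistics


def significant_mask(strf, n_sd=3.0):
    """Mark points more than n_sd standard deviations from the STRF mean."""
    values = [v for row in strf for v in row]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [[abs(v - mean) > n_sd * sd for v in row] for row in strf]


def weighted_cost(model, target, emphasis=10.0):
    """Squared error with extra weight on significant target points.

    The `emphasis` factor is a hypothetical choice, not a published value.
    """
    mask = significant_mask(target)
    total = 0.0
    for m_row, t_row, k_row in zip(model, target, mask):
        for m, t, k in zip(m_row, t_row, k_row):
            weight = emphasis if k else 1.0
            total += weight * (m - t) ** 2
    return total


# Toy target: a flat 10-by-10 STRF with a single strong excitatory point.
target_strf = [[0.0] * 10 for _ in range(10)]
target_strf[4][6] = 5.0
flat_model = [[0.0] * 10 for _ in range(10)]

mask = significant_mask(target_strf)
print(sum(sum(row) for row in mask))           # 1 significant point
print(weighted_cost(flat_model, target_strf))  # 250.0
```

A model that misses the single significant point is penalized ten times more heavily than it would be under a plain squared error, which steers the optimization toward the prominent excitatory and inhibitory regions.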

The optimal solutions found by the genetic algorithm for the passive and behavioral states were compared to investigate the changes that are likely to occur from top-down attentional control. Since the genetic algorithm was optimizing 255 parameters, a sensitivity analysis was performed to highlight the parameters that were important in determining the behavior of the model in relation to the cost function. To account for the parameters highlighted by the sensitivity analysis due to chance in the complex multi-dimensional space, the genetic algorithm and sensitivity analysis were performed five times and the results averaged (**Figure 5**). The averaged sensitivity analysis indicated that three parameters were important in determining the network structure in the passive state (**Figure 5A**): (1) the input strength to the neuron from which the STRF was recorded, (2) the delay from the cochlea model to the neuron from which the STRF was recorded, and (3) the strength of the synaptic connection from the neuron tuned to the same frequency as the target tone to the neuron from which the STRF was recorded. Intuitively, these parameters correspond to the simplest network to generate the excitatory and inhibitory regions that are significantly different to the remaining parts of the STRF (**Figure 4A**). For the behavioral state, the average sensitivity analysis indicated that only one parameter was important in determining the network structure (**Figure 5B**), corresponding to the input strength to the neuron from which the STRF is recorded. Therefore, the sensitivity analysis allows a comparison of the average numerical values between the passive and behavioral states (**Figures 5C,D**). In this example, it can be seen that the behaviorally induced reduction in the STRF at ∼8 kHz (**Figures 4A,D**) was reproduced in the mathematical model by reducing the strength of the inhibitory synaptic connection from neuron 13 (CF 10 kHz) to neuron 14 (CF 12 kHz) and by reducing the strength of input from the cochlea model to neuron 13 (CF 10 kHz). The network structures identified in **Figure 5** do display a large amount of variation. Increasing the number of repetitions from 5 to 10 (see **Figure S3**) decreases the variation for some parameters, but increases the variation for other parameters. This indicates the variation is due to the genetic algorithm finding unique combinations of parameters to fit the physiological data.
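The parameter count follows from the two matrices described for the network (a 15-by-15 synaptic matrix plus a 2-by-15 input matrix), and the averaging step can be sketched with a simple one-at-a-time perturbation. The perturbation scheme, step size, and toy cost function are assumptions, since the details of the sensitivity analysis are not given in this section.

```python
N_NEURONS = 15
n_synaptic = N_NEURONS * N_NEURONS   # 15-by-15 connection matrix: 225
n_input = 2 * N_NEURONS              # input strength and delay per neuron: 30
n_params = n_synaptic + n_input      # 255 optimized parameters


def one_at_a_time_sensitivity(cost, params, step=0.05):
    """Absolute change in cost when each parameter is perturbed alone."""
    base = cost(params)
    sensitivities = []
    for i in range(len(params)):
        perturbed = list(params)
        perturbed[i] += step
        sensitivities.append(abs(cost(perturbed) - base))
    return sensitivities


def average_sensitivity(cost, solutions, step=0.05):
    """Average the per-parameter sensitivity over repeated optimizations."""
    per_run = [one_at_a_time_sensitivity(cost, s, step) for s in solutions]
    return [sum(column) / len(per_run) for column in zip(*per_run)]


# Toy cost in which parameter 0 matters most and parameter 2 not at all.
def toy_cost(params):
    return 10.0 * params[0] + 1.0 * params[1] + 0.0 * params[2]


# Five hypothetical solutions standing in for five optimization runs.
solutions = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.9], [0.2, 0.2, 0.2],
             [0.0, 0.5, 0.7], [0.3, 0.3, 0.1]]
averaged = average_sensitivity(toy_cost, solutions)
print(n_params)   # 255
print(averaged)
```

Averaging over repetitions reduces the chance that a parameter is highlighted only because of where one particular run happened to land in the parameter space.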

### Reproducing More Experimentally Observed Changes in STRFs

The same protocol (see section Reproducing Experimentally Observed STRFs) was used to predict the changes in network structure for a further nine single cell recordings from the A1 cortex of ferrets. The STRFs, significant components of the STRFs, sensitivity analyses, and important network parameters for the passive and behavioral states are provided for all 10 cells in the **Supplementary Material** (**Figure S2**). For each cell recording, the genetic algorithm was able to reproduce a close match to the experimentally recorded STRF for both the passive and behavioral states. Running the genetic algorithm five times for both the passive and behavioral states allowed sensitivity analyses to highlight important network parameters that were able to provide an explanation for the changes observed in the STRFs between passive and behavioral states (**Figure 6** and **Figure S2**). **Figure 6** provides a summary of four single unit recordings. For the first cell (**Figure 6A**), there was a reduction in the inhibitory region at ∼8 kHz in the behavioral state compared to the passive state. This was reproduced in the network model by increasing the excitatory input from the neuron tuned to 12 kHz, which was producing the excitatory region in the STRF at this location (**Figure 6A**). For the second cell (**Figure 6B**), the inhibitory region at and below the target tone was converted to an excitatory region at the target tone during behavior. The network model reproduced this switch by changing the input to the neuron at the target tone from inhibitory input to excitatory input and removing the inhibitory input to the neuron tuned to a lower frequency (**Figure 6B**). The third example (**Figure 6C**) displayed a complex pattern of inhibitory regions above and below an excitatory region; during behavior the lower inhibitory region was abolished. The sensitivity analysis indicated the important network parameters were determining the excitatory region and the higher frequency inhibitory region, so these network parameters were maintained between the passive and behavioral states (**Figure 6C**). Intuitively, this makes sense because the excitatory and higher inhibitory regions are larger than the lower inhibitory region, so they would have a larger influence on the cost function. In the fourth example (**Figure 6D**), the inhibitory region at ∼1.5 kHz was abolished in the behavioral state. The network model reproduced this removal of the inhibitory region by increasing the strength of the excitatory region (**Figure 6D**), which reduces the magnitude and significance of the inhibitory connection from the lower frequency.

STRF that are significantly different (>3 standard deviations) from the mean value of the STRF. The behavioral target (E) was used in the cost function for the optimization. The behavioral fit (F) is the result of the genetic algorithm trying to reproduce (E). (G) The STRF electrophysiologically recorded from a ferret. (H) The STRF calculated from the mathematical model.

## DISCUSSION

Temporal dynamics are a key component of acoustic signals and neurons in the primary auditory cortex can detect the temporal structure of acoustic signals (Elhilali et al., 2004). The cochlea and auditory pathway distribute sounds in the frequency domain from the cochlear basilar membrane to the auditory cortex. Therefore, it is important to consider a sound's spectral and temporal features together. Spectrotemporal receptive fields (STRFs) combine both the spectral and temporal features of the auditory system. Previous studies have shown that the STRFs of A1 neurons display rapid plasticity during behavioral tasks requiring discrimination of particular sounds (Fritz et al., 2003, 2005b; Elhilali et al., 2007). However, the neural mechanisms underlying changes in a neuron's STRFs are yet to be elucidated. In this study, a neural network model was developed to investigate mechanisms by which cortical neurons can change their receptive fields. This model can reproduce complex STRFs observed experimentally and demonstrates that altering the synaptic drive between cortical neurons and/or synaptic drive from the cochlear model to cortical neurons can account for the rapid task-related changes displayed by A1 neurons.

matrices, one 15-by-15 matrix and one 2-by-15 matrix. The 15-by-15 matrix represents the synaptic connections between all 15 neurons, with the y-axis representing the neuron number providing the synaptic output and the x-axis representing the neuron number receiving the synaptic input. The 2-by-15 matrix represents the input from the cochlea model to the cortical neuron network, where Istr is the strength of the synaptic input and Itm is the length of the time delay for the synaptic input. The frequency axis on the far right hand side indicates the center frequency tuning for each neuron number. For clarity, if the sensitivity analyses were not in the top two values for either the passive or behavioral state, the value is not shown. (C,D) The network schematic diagram indicating the average values of the network parameters for the passive (C) and behavioral (D) states. Solid lines indicate that the parameter had a high sensitivity (the two most sensitive parameters over five repetitions of the optimization) for that state, whereas dashed lines indicate the parameter was sensitive for the other state or was required to follow the network path from the sound signal to the neuron from which the STRF was calculated. Red lines indicate excitatory synaptic connections, blue lines indicate inhibitory synaptic connections, and the thicknesses of the lines indicate the strength of the synaptic connections. The numerical values presented for each line indicate the mean ± standard deviation for the five repetitions of the optimization.

The mathematical model presented here comprised a cochlear model and a cortical neuron network. A sound signal was sent into a model of the cochlea. The cochlear model consisted of 15 auditory nerve fiber models, previously developed and published by Carney and colleagues (Tan and Carney, 2003; Zilany et al., 2009, 2014). Each of the auditory nerve models had a different center frequency, evenly distributed along a logarithmic tonotopic axis. The synaptic output from the cochlear model excited or inhibited one integrate-and-fire neuron model. There were 15 integrate-and-fire neuron models to represent networks in the A1 cortex. All neurons were interconnected with excitatory or inhibitory synapses of varying strengths. Action potentials of one of the cortical neuron models were used to calculate the STRF using reverse correlation, which could be directly compared to electrophysiological recordings of the STRF in a ferret (Fritz et al., 2003). A genetic algorithm was used to optimize the synaptic drive between neurons to produce STRFs and the behavioral changes in STRFs that matched experimentally recorded data.
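An even distribution along a logarithmic axis means that neighboring center frequencies differ by a constant ratio rather than a constant difference. The sketch below shows this spacing for 15 channels; the lower and upper frequency limits are placeholders, because the exact range used in the model is not stated in this section.

```python
def log_spaced_frequencies(f_min_hz, f_max_hz, n):
    """Return n center frequencies with a constant ratio between neighbors."""
    ratio = (f_max_hz / f_min_hz) ** (1.0 / (n - 1))
    return [f_min_hz * ratio ** i for i in range(n)]


# Placeholder limits (assumed for illustration, not taken from the study).
F_MIN_HZ = 250.0
F_MAX_HZ = 16000.0

center_frequencies = log_spaced_frequencies(F_MIN_HZ, F_MAX_HZ, 15)
for number, cf in enumerate(center_frequencies, start=1):
    print(f"neuron {number:2d}: {cf:8.1f} Hz")
```

With a constant ratio, each channel covers the same fraction of an octave, which is the usual way of approximating the tonotopic axis.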

The results demonstrate that this simple phenomenological model can produce complex STRFs similar to those observed experimentally. The genetic algorithm was able to optimize the synaptic drive in the cortical neural network to ensure a close match between an electrophysiologically recorded STRF and the STRF calculated in the mathematical model. A genetic algorithm was used because there were a large number of variables, because the cost function relating the fit of the STRFs to the experimental data was not smooth, and as a means of avoiding locally optimal solutions in a neural network with a large number of possible synaptic pathways. This optimization worked for both the passive and behavioral states, thereby allowing a comparison between network parameters for both states. Since the genetic algorithm was optimizing 255 parameters and the network behavior could be highly influenced by combinations of multiple parameters, the genetic algorithm optimization and sensitivity analyses were repeated five times and averaged. For each repetition, the optimization algorithm produced a unique set of parameters that produced an STRF similar to those observed experimentally. While the network parameters displayed variations between repetitions, the sensitivity analysis highlighted network parameters whose influence over the network behavior was preserved multiple times. The sensitivity analysis was also important for comparisons of parameter values between the passive and behavioral states where the numerical value showed little change. Traditional statistical methods would deem that no change has occurred, suggesting the parameter is not important in the switch from passive to behavioral states. However, the sensitivity analysis could show that a parameter was indeed important in determining the properties of the neural network for one or both states, even if there was no change in that parameter. Therefore, this process is able to predict the important network parameters and their changes, or no change, in the passive and behavioral states.

lines indicate that parameters have a high sensitivity, whereas dashed lines indicate the parameter was not sensitive but is provided for comparison between the two network structures or to follow the pathway from sound signal to the neuron from which the STRF is calculated. Red lines indicate excitatory synaptic connections, blue lines indicate inhibitory synaptic connections, and the thicknesses of the lines indicate the strengths of the synaptic connections. The numerical values presented for each line indicate the mean ± standard deviation for the five repetitions of the optimization.

This work has demonstrated that changes observed in the STRFs of neurons in the A1 cortex during behavioral tasks can be accounted for by changes in the synaptic drive between cortical neurons and/or synaptic drive from the cochlear model to the cortical neurons. The changes in the STRF between passive and behavioral states can occur within seconds to minutes (Fritz et al., 2003; Lu et al., 2017) and can dissipate rapidly or remain stable for a long time (Fritz et al., 2003). There are several mechanisms that can potentially change the synaptic drive between two neurons with a very rapid time course (such as synaptic potentials, spike-timing dependent plasticity (STDP), changes in dendritic spine shape, and activity-dependent depression) or in a manner that remains stable for an extended period of time (such as synaptogenesis, or long-term changes mediated by intracellular second messenger systems). A major advantage of producing a spiking neuron model to reproduce experimentally observed changes in the STRF is that these mechanisms for rapid changes or longer stability can be readily investigated by evoking the appropriate mechanisms within a single neuron. These investigations will require further challenging studies, such as intracellular recordings from individual A1 neurons in a behaving animal.

In this study, the experimental paradigm produced a positive change in the STRF at the target tone, which has been previously reported (Fritz et al., 2003, 2005a,b; Elhilali et al., 2004, 2007). In individual examples reproduced in this mathematical model, the model predicted the increased excitation in the behavioral STRF at the target tone could arise from reduced inhibitory output from the neuron at the target tone, increased excitatory synaptic drive from a region different to the target tone (which can reduce the influence of inhibition from the target tone; e.g., cell 10, **Figure S2**), or by reduced synaptic drive from the cochlear model to the neuron at the target tone. These different mechanisms could potentially be distinguished by further experimental studies. For example, distinguishing between increased excitatory drive and decreased inhibitory drive could be accomplished by pharmacological intervention or by directly recording neuronal membrane potential. Similarly, a bottom-up change in synaptic drive from the cochlear model to the cortical neurons could be distinguished from cortical neuron to cortical neuron interactions through localized pharmacological agonists or blockers for relevant neurotransmitters or neuromodulators. Such experiments require technical advances in pharmacological and optogenetic manipulations and in intracellular or patch-clamp recordings from A1 neurons in animals engaged in behavioral tasks such as the ones used in this study. The increased synaptic drive from the cochlear model could be measured experimentally by examining whether there was enhanced thalamic input to A1 neurons. An increase in thalamic drive to auditory cortex could arise from a variety of mechanisms, including increased thalamic firing or disinhibition of inhibitory circuits in A1 (for example, Letzkus et al., 2011).

The current model predicts that rapid task-related changes in A1 STRFs occur at the synaptic level by changing the weights of task-relevant synaptic inputs to A1 neurons; however, the mechanisms for this plasticity are not yet known. Neuroanatomical studies have shown the existence of diffuse and widespread cholinergic projections (that modify synaptic behavior) from the Nucleus Basalis to A1 that are likely to play an important role in neuroplasticity (Goard and Dan, 2009; Leach et al., 2013; Pinto et al., 2013; Bajo et al., 2014; Zhang et al., 2014, 2016). This projection pattern would suggest that cholinergic modulation during attentional tasks should produce uniform changes across the entire A1 cortex. However, our experimental results demonstrate highly selective attentional effects, and in our model, the pattern of network changes between passive and behavioral states was variable and complex, involving increases, no change, and/or decreases in synaptic drive at different synapses. The solutions generated by the model predict that the effect of top-down control from higher executive brain regions via cholinergic activation from Nucleus Basalis can influence the synaptic drive in A1 cortex in a specific fashion, perhaps by selectively modulating A1 synapses and neurons "tagged" by recent "target stimulus" activation. A previous model of cholinergic modulation of A1 suggests differential effects on the receptive fields of cortical neurons, depending on cholinergic receptors and site of action (thalamocortical or intracortical) (Soto et al., 2006). Another possibility is that there is focal top-down control from higher brain regions that targets the task-relevant subset of synaptic sites in A1 cortex. Our spiking neuron network model of the auditory receptive fields provides a platform to test these and other possible mechanisms of top-down control during behavioral tasks (Zhang et al., 2014, 2016).

In addition to using the Carney and colleagues model for the auditory nerve fibers (Tan and Carney, 2003; Zilany et al., 2009, 2014), we also tested the early auditory processing model of Shamma and colleagues (Chi et al., 2005) and gamma-tone filter banks (Johannesma, 1972). All three types of models produced qualitatively similar results. This indicates that the important process performed by the cochlea model in this work is band-pass filtering to break up the sound signal into its frequency components. Such band-pass filtering is present in the three cochlear models tested here. Other features present in some of these cochlear models and not others (for example, synaptic adaptation in the Carney model or lateral inhibition in the Shamma model) do not have a significant effect on the qualitative results observed in the model presented in this study. The model of Carney and colleagues had the advantage of better responses in the higher frequency ranges (12–16 kHz) used in this mathematical model. The better responses observed in these ranges may have been due to the high frequency range of the cat, whereas other cochlear models based on humans have responses that start to drop off at frequencies above 12 kHz, as did the model of Carney and colleagues when using the human parameters. The model of Carney and colleagues also had the advantage of providing a discrete synaptic drive into the integrate-and-fire neuron model compared to a continuous output provided by other cochlear models. A discrete synaptic drive is a more realistic response, but does create an additional source of noise in the system because the amplitudes are driving a Poisson process.
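The band-pass role of the cochlear stage can be illustrated with a single gamma-tone channel. The sketch below uses the standard fourth-order gamma-tone impulse response with a bandwidth taken from the human equivalent rectangular bandwidth (ERB) scale; the bandwidth rule, sampling rate, and test tones are assumptions for illustration and are not the settings of any of the three cochlear models tested.

```python
import math


def gammatone_impulse_response(fc_hz, fs_hz, duration_s=0.025, order=4):
    """Impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    erb = 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)  # assumed human ERB scale
    b = 1.019 * erb
    n_samples = int(duration_s * fs_hz)
    response = []
    for i in range(n_samples):
        t = i / fs_hz
        response.append(t ** (order - 1)
                        * math.exp(-2.0 * math.pi * b * t)
                        * math.cos(2.0 * math.pi * fc_hz * t))
    return response


def convolve(signal, kernel):
    """Causal convolution, truncated to the length of the signal."""
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(kernel))):
            acc += kernel[j] * signal[i - j]
        output.append(acc)
    return output


def tone(freq_hz, fs_hz, n_samples):
    return [math.sin(2.0 * math.pi * freq_hz * i / fs_hz)
            for i in range(n_samples)]


def energy(samples):
    return sum(v * v for v in samples)


FS_HZ = 16000.0
kernel = gammatone_impulse_response(3000.0, FS_HZ)  # channel centered at 3 kHz
energy_on = energy(convolve(tone(3000.0, FS_HZ, 800), kernel))
energy_off = energy(convolve(tone(1000.0, FS_HZ, 800), kernel))
print(energy_on > energy_off)  # True: the channel passes its own frequency
```

A bank of such channels with different center frequencies splits a sound into frequency components, which is the property the three cochlear models have in common.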

The model presented here is missing many steps from the cochlear model to the neural network model in the A1 cortex. As stated in the previous paragraph, qualitatively similar results can be observed using the cochlear model of Shamma and colleagues (Chi et al., 2005), which incorporates a limited amount of early processing such as lateral inhibition. However, given the multiple neuroanatomical stages of auditory information processing between cochlea and cortex (including cochlear nucleus, lateral lemniscus, inferior colliculus, and thalamus), our model is clearly oversimplified. Nevertheless, these results indicate that it is not critical to include early processing strategies to reproduce the electrophysiologically recorded STRFs. However, this model was tested against neurons that showed a change in their STRF during behavioral tasks, and only ∼70% of neurons show such a change (Fritz et al., 2003). It is also possible that initializing an A1 cortical neuron network where all neurons are interconnected allows the optimization to include early auditory processing as well as A1 cortical processing. Our results indicate that it is not necessary to incorporate early auditory processing to reproduce the experimental observations in the model presented here.

Furthermore, our model included two variables for each neuron to describe the strength and timing of the connections from the cochlear model to the auditory cortex. These variables were included in the optimization, and the sensitivity analyses indicated that these connections from the cochlear model to the auditory cortex are significant at times. Therefore, bottom-up information from band-pass filtering can be important in this model of STRFs, but the inclusion or exclusion of early processing strategies does not have a significant effect in this model of STRFs.

In conclusion, this study has produced a mathematical model that can replicate complex STRFs observed in response to the same sound signals. The model demonstrates that synaptic drive between cortical neurons can account for rapid task-related changes exhibited by A1 neurons.

These results lay a foundation for future extensions and elaborations of this model to include top-down control from higher brain regions, and a more detailed investigation into the multiple cellular mechanisms and neuronal receptive field plasticity utilized by the brain during sound discrimination tasks.

### ETHICS STATEMENT

All experimental procedures were approved by the University of Maryland Animal Care and Use Committee.

### AUTHOR CONTRIBUTIONS

JC, JF, SS, AB, and DG conceived and planned the experiments. JC planned and carried out the simulations and analysis. DE performed all animal experiments. JC took the lead in writing the manuscript. All authors contributed to the interpretation of results and provided critical feedback and helped shape the research, analysis, and manuscript.

### FUNDING

JC and this research were funded by an Australian Research Council grant (DP140101520). DE is funded by CONICYT-PCHA/Becas Chile Postdoctorado/Convocatoria 2016-folio 74170109.

### ACKNOWLEDGMENTS

This research was supported by Melbourne Bioinformatics at the University of Melbourne, grant number VR0003.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fncom.2019.00028/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Chambers, Elgueda, Fritz, Shamma, Burkitt and Grayden. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Trait and State-Dependent Risk Attitude of Monkeys Measured in a Single-Option Response Task

Atsushi Fujimoto\* † and Takafumi Minamimoto\*

Department of Functional Brain Imaging, National Institute of Radiological Sciences, National Institutes for Quantum and Radiological Science and Technology, Chiba, Japan

#### Edited by:

Hiroshi Yamada, University of Tsukuba, Japan

#### Reviewed by:

Masatoshi Yoshida, National Institute for Physiological Sciences (NIPS), Japan

Shunsuke Kobayashi, Fukushima Medical University, Japan

#### \*Correspondence:

Atsushi Fujimoto atsushi.fujimoto@mssm.edu; a.fujimoto.jul8@gmail.com

Takafumi Minamimoto minamimoto.takafumi@qst.go.jp

#### †Present address:

Atsushi Fujimoto, Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, United States

#### Specialty section:

This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 25 April 2019; Accepted: 22 July 2019; Published: 07 August 2019

#### Citation:

Fujimoto A and Minamimoto T (2019) Trait and State-Dependent Risk Attitude of Monkeys Measured in a Single-Option Response Task. Front. Neurosci. 13:816. doi: 10.3389/fnins.2019.00816

Humans and animals show diverse preferences for risks ("trait-like" risk attitude) and shift their preference depending on the state or current needs ("state-dependent" risk attitude). For a better understanding of the neural mechanisms underlying risk-sensitive decisions, useful animal models have been required. Here we examined the risk attitude of three male monkeys in a single-option response task, in which an instrumental lever-release was required to obtain a chance of reward. In each trial, reward condition, either deterministic (100% of 1, 2, 3, and 4 drops of juice) or probabilistic (25, 50, 75, and 100% of 4-drop juice) was randomly selected and assigned by a unique visual cue, allowing the monkeys to evaluate the forthcoming reward. The subjective value of the reward was inferred from their performance. Model-based analysis incorporating known economic models revealed non-linear probability distortion in monkeys; unlike previous studies, they showed a simple convex or concave probability distortion curve. The direction of risk preference was consistent between early and late phases of the testing period, suggesting that our observation reflected the trait-like risk attitude of monkeys, at least under the current experimental setting. Regardless of the baseline risk preference, all monkeys showed an enhancement of risk preference in a session according to the satiation level (i.e., state-dependent risk attitude). Our results suggest that, without choice or cognitive demand, monkeys show naturalistic risk attitude – diverse and flexible like humans. Our novel approach may provide a useful animal model of risk-sensitive decisions, facilitating the investigation of the neural mechanisms of decision-making under risk.

Keywords: risk attitude, subjective value, decision-making, monkeys, economic models

## INTRODUCTION

In an uncertain environment, one's preference toward risk biases one's decisions. Imagine that your friend encouraged you to buy an unlisted stock of a business venture. If you are a conservative person, you may pass on the opportunity to avoid the risk (i.e., risk-averse). However, if you are an adventurous person, you may buy the stock regardless of the risk (i.e., risk-prone). As such, inherent individual risk preference is diverse and determines the basic tendency to take (or not to take) a risky option ("trait-like" risk attitude) (Weber et al., 2002; Huettel et al., 2006; Tobler et al., 2008). In addition, the risk attitude is changeable depending on internal contexts; if you need to make money right away, you may buy the risky stock irrespective of your character ("state-dependent" risk attitude) (Caraco et al., 1980; Stephens and Krebs, 1986; McNamara and Houston, 1992).

Past studies measured the risk preference of human subjects in economic tasks, in which subjects repeatedly made choices between a risky option and a safe option, and mathematical models have been proposed to capture the choice decisions of subjects. The most influential model, prospect theory, assumes a distortion of probabilities and provides a better explanation of the non-normative choice pattern of human subjects than the expected utility theory does (Kahneman, 1979; Tversky and Kahneman, 1992; Prelec, 1998; Gonzalez and Wu, 1999). Calculation of the subjective value based on distorted probability is conceptually analogous to the assumption of finance theory, which calculates the subjective value with the mean–variance model (Markowitz, 1952; Levy and Markowitz, 1979; Tobler et al., 2009). These studies revealed various risk preferences of human subjects, and further facilitated research to find the neural correlates of trait-like risk attitude by coupling with brain imaging techniques (Tom et al., 2007; Takahashi et al., 2010; Gilaie-Dotan et al., 2014). Such an economic approach has also been applied to some animal studies using a liquid reward as an alternative to a monetary reward, and these studies consistently reported non-linear probability distortion in monkeys, just as in humans (Stauffer et al., 2015; Chen and Stuphorn, 2018).

Although economic approaches have begun to elucidate the mechanisms of risk-sensitive decisions across species, direct application of economic tasks to animals may pose limitations; for example, the cognitive capacity (e.g., working memory) of animals is not comparable to that of humans, but is largely limited to adaptation to their ecological niche (Krebs et al., 1977; Stevens et al., 2005; Elmore et al., 2011). Such disparity may impose extra task demands on animals even in physically identical task settings (Pearson et al., 2010; Blanchard et al., 2013). Another problem is that making repeated choices among available options is an unfamiliar setting for animals considering their feeding ecology, in which they typically make a cost–benefit decision on a single prey (i.e., non-choice decisions) (Krebs et al., 1977; Kacelnik et al., 2011; Hayden and Walton, 2014). As recently suggested, such non-choice decisions recruit brain circuits distinct from those for two-option choices (Kolling et al., 2012; Shenhav et al., 2016). Moreover, some studies using human subjects emphasized that humans showed distorted risk preference in the task without choice (Tobler et al., 2008; Levy et al., 2011). Hence, from an ethological perspective, it is worthwhile to test the risk preference of monkeys in a non-choice decision paradigm.

In this study, we aimed to assess the naturalistic risk attitude of monkeys by minimizing undesirable task demands. We adopted a non-choice, instrumental lever-release task, in which a visual cue revealed the size and probability of forthcoming reward condition as being either deterministic or probabilistic. The basic setting of this task was shown to be useful for inferring monkeys' evaluation of a certain reward value (e.g., reward size) based on their performance (Minamimoto et al., 2009). The inference has been formulated and applied in many studies (Bouret and Richmond, 2015; Eldridge et al., 2016; Nagai et al., 2016; Fujimoto et al., 2019), and can be extended to temporal discounting and workload discounting using the same basic task structure (Minamimoto et al., 2009, 2012). Here, we implemented well-known economic models to assess the trait-like and state-dependent risk attitude of monkeys in a quantitative manner (Stauffer et al., 2015; Chen and Stuphorn, 2018). Our results may fill the gap between human and monkey studies using economic tasks, thus providing a useful animal model to investigate the neural basis of risk-sensitive decision-making.

## MATERIALS AND METHODS

### Subjects

Three male macaque monkeys (Macaca mulatta, monkeys ST and KY, 5.3 kg and 6.8 kg; Macaca fuscata, monkey HI, 7.6 kg) were used. All experimental procedures were approved by the Animal Care and Use Committee of the National Institutes for Quantum and Radiological Science and Technology and were in accordance with the guidelines published in the NIH Guide for the Care and Use of Laboratory Animals.

### Behavioral Task

The monkeys squatted on a primate chair inside a dark, sound-attenuated, and electrically shielded room. A touch-sensitive lever was mounted on the chair. Visual stimuli were displayed on a computer video monitor in front of the animal. Behavioral control and data acquisition were performed using a real-time experimentation system (REX) (Hays et al., 1982). Presentation software was used to display visual stimuli (Neurobehavioral Systems Inc., Berkeley, CA, United States).

The monkeys performed the single-option response task (**Figure 1A**). In each trial, the monkey had the same requirement to obtain liquid rewards. A trial began when a monkey gripped a lever. A visual cue and a red spot appeared sequentially, with a 0.4 s interval, at the center of the monitor. After a variable interval (0.5–1.5 s), the central spot turned to green ("go" signal), and the monkey had to release the lever within the reaction time (RT) window (0.2–1.0 s). If the monkey released the lever correctly, the spot turned to blue (0.2–0.4 s), and then a reward was delivered in accordance with the visual cue. The next trial began following an inter-trial interval (ITI, 1.5 s). When trials were performed incorrectly, they were terminated immediately (all visual stimuli disappeared), and the next trial began with the same reward condition following the ITI. There were two types of errors: premature lever releases (lever releases before or no later than 0.2 s after the appearance of the go signal, named "early errors") and failures to release the lever within 1.0 s after the appearance of the go signal (named "late errors").

The combination of reward size and its probability was informed by the visual cue (grayscale images) at the beginning of each trial; four cues were used for the deterministic trials and the other four for the probabilistic trials (**Figure 1B**). In the deterministic trials, the size of the reward (1, 2, 3, or 4 drops) was chosen randomly, and the reward probability was fixed at 100%. In the probabilistic trials, the size of the reward was fixed at 4 drops and the probability of the reward (25, 50, 75, or 100%) was chosen randomly. Thus, the expected value was matched across the two conditions. The training schedule was as follows. Prior to the experiment with the single-option response task, all monkeys had been trained to perform color discrimination trials in a cued multi-trial reward schedule task for >1 month. Next, the monkeys were trained in the deterministic trials for 3 weeks, and subsequently in the probabilistic trials for 3 weeks, respectively ("separate" phase). Finally, the monkeys were tested under the condition in which deterministic and probabilistic trials were intermingled, and the test ran for >6 weeks ("mixed" phase; **Figure 1C**). The data obtained during the mixed phase (43, 53, and 41 sessions for monkeys ST, KY, and HI, respectively) were analyzed in the current study. The number of trials in a session was 1,338 ± 79 trials for monkey ST, 1,206 ± 300 trials for monkey KY, and 1,384 ± 109 trials for monkey HI, and the amount of reward intake in a session was 325 ± 20 ml for monkey ST, 286 ± 75 ml for monkey KY, and 327 ± 38 ml for monkey HI (mean ± SD).
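The matching of expected value across the two trial types can be checked with a few lines of arithmetic. The following is a minimal Python sketch (the authors' own analyses were run in R); the variable and function names are illustrative and not taken from the study.

```python
# Reward conditions as described in the task: (reward size in drops, probability).
deterministic = [(1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0)]
probabilistic = [(4, 0.25), (4, 0.50), (4, 0.75), (4, 1.0)]


def expected_value(drops, probability):
    """Expected reward in drops: reward size multiplied by its probability."""
    return drops * probability


det_ev = [expected_value(m, p) for m, p in deterministic]
prob_ev = [expected_value(m, p) for m, p in probabilistic]

# Each probabilistic cue has a deterministic counterpart with the same expected value.
print(det_ev)
print(prob_ev)
```

Because the expected values coincide, any systematic difference in performance between matched cues can be attributed to the presence of risk rather than to the average payoff.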

### Experimental Design and Statistical Analysis

All statistical analyses and model fitting were performed using R statistical software. We analyzed the error rate and RT. The error rate was calculated by dividing the total number of errors (the sum of early and late errors) by the total number of trials in a session. We reported the average error rate across sessions and the standard error of the mean (SEM). RT was defined as the duration from a "go" signal to the time point of lever release in a correct trial.
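As an illustration of the error-rate measure described above, the sketch below computes a per-session error rate and its mean and SEM across sessions. It is written in Python rather than the R used by the authors, and the session counts are hypothetical numbers chosen only to exercise the code.

```python
import math
import statistics


def error_rate(n_early, n_late, n_trials):
    """Session error rate: (early errors + late errors) / total trials."""
    return (n_early + n_late) / n_trials


def mean_and_sem(values):
    """Mean and standard error of the mean across sessions."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical sessions: (early errors, late errors, total trials).
sessions = [(60, 40, 1000), (90, 30, 1200), (50, 50, 1250)]
rates = [error_rate(*s) for s in sessions]
mean_rate, sem_rate = mean_and_sem(rates)
print(rates, mean_rate, sem_rate)
```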

As previously shown, the error rate in the same paradigm with deterministic reward has an inverse relationship to the subjective value (inverse function, Minamimoto et al., 2009). To infer the subjective reward value in each monkey, we used a modified version of the inverse function:

$$E = \frac{c}{V+b} \tag{1}$$

where E and V represented the error rate and the subjective value, while c and b were free parameters that represented the reward sensitivity of monkeys. We confirmed that this model fitted well with the error rates in deterministic trials of the training session, where (V) corresponded to the reward size (1, 2, 3, and 4 drops; R <sup>2</sup> > 0.86). We extended this model to infer the subjective reward value of probabilistic trials using three models: GW, Prelec, and mean–variance models (see below). For each monkey, parameters c and b were first determined using the best-fit of the inverse function (Eq. 1) to the error rate in the deterministic trials. These parameters were then applied to Eq. (1), which integrated one of the three subjective value models as V and then was fitted to the error rates in the probabilistic trials.
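To make the role of Eq. (1) concrete, the following Python sketch evaluates the inverse function and its algebraic inverse, which is the step that lets an observed error rate be read as a subjective value. The parameter values for c and b are hypothetical and chosen only for illustration; they are not estimates from the study, and the function names are assumptions.

```python
def error_rate_from_value(v, c, b):
    """Eq. (1): error rate E predicted from subjective value V."""
    return c / (v + b)


def value_from_error_rate(e, c, b):
    """Eq. (1) solved for V: subjective value implied by an error rate E."""
    return c / e - b


# Hypothetical sensitivity parameters (not fitted values from the paper).
c, b = 0.5, 1.0

for drops in (1, 2, 3, 4):
    e = error_rate_from_value(drops, c, b)
    v = value_from_error_rate(e, c, b)
    print(drops, round(e, 4), round(v, 4))
```

With c and b held at the values obtained from deterministic trials, the same inversion applied to a probabilistic cue gives the subjective value that the weighting models are asked to explain.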

#### GW Model

According to Gonzalez and Wu (1999), probability weighting function, w(p), was formulated as below:

$$w(p) = \frac{\delta p^{\gamma}}{\delta p^{\gamma} + (1-p)^{\gamma}} \tag{2}$$

where p represents the probability of winning a reward (25, 50, 75, and 100%), and γ and δ are free parameters that control the curvature and elevation of the function, respectively. This model yields a non-linear probability weighting function, although it allows monotonic increase/decrease of probability weighting when γ = 1. Subjective value V was then calculated by multiplying the reward magnitude m (4 drops) and subjective probability w(p) in accordance with the prospect theory (Kahneman, 1979; Tversky and Kahneman, 1992).

$$V = m \times w(p) \tag{3}$$
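Eqs (2) and (3) translate directly into code. The sketch below is a Python illustration (not the authors' R implementation); the function names are assumptions, and the parameter values in the loop are arbitrary examples rather than fitted estimates.

```python
def gw_weight(p, gamma, delta):
    """Eq. (2): Gonzalez and Wu probability weighting for 0 <= p <= 1."""
    numerator = delta * p ** gamma
    return numerator / (numerator + (1.0 - p) ** gamma)


def subjective_value(m, p, gamma, delta):
    """Eq. (3): reward magnitude multiplied by the weighted probability."""
    return m * gw_weight(p, gamma, delta)


# With gamma = delta = 1 the weighting is linear, so V equals the expected value.
for p in (0.25, 0.5, 0.75, 1.0):
    print(p, subjective_value(4, p, gamma=1.0, delta=1.0))

# Raising delta above 1 (gamma fixed at 1) lifts w(p) above p for 0 < p < 1.
print(gw_weight(0.5, gamma=1.0, delta=2.0))
```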

#### Prelec Model

According to Prelec (1998), the probability weighting function was formulated as below:

$$w(p) = e^{-\beta(-\ln p)^{\alpha}} \tag{4}$$

where α and β are free parameters that control the curvature and elevation of the function, respectively. For the one-parameter Prelec model, β is fixed at 1; this function yields an inverted S-shape when α < 1, while it yields an S-shape when α > 1, with the inflection point (p = w(p)) around p = 1/e. We defined the subjective values with Eqs (3) and (4).
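The shape produced by Eq. (4) for a given α can be inspected numerically. The following Python sketch (an illustration, not the authors' R code) evaluates the function at the task probabilities and at p = 1/e; the function name and the example α values are assumptions.

```python
import math


def prelec_weight(p, alpha, beta=1.0):
    """Eq. (4): Prelec probability weighting for 0 <= p <= 1."""
    if p <= 0.0:
        return 0.0  # limit of Eq. (4) as p approaches 0
    # -ln(p) is written as ln(1/p) so that p = 1 gives exactly 0.
    return math.exp(-beta * math.log(1.0 / p) ** alpha)


FIXED_POINT = 1.0 / math.e  # with beta = 1, w(p) = p at this probability

for alpha in (0.5, 1.0, 2.0):
    weights = [round(prelec_weight(p, alpha), 3) for p in (0.25, 0.5, 0.75, 1.0)]
    print(alpha, weights, round(prelec_weight(FIXED_POINT, alpha), 3))
```

With β fixed at 1, every α leaves the value at p = 1/e unchanged, so α only changes how the curve bends on either side of that point.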

#### Mean–Variance Model

According to financial theory, the subjective value is determined by combining the expected value (EV) and variance risk (Var) (Markowitz, 1952; Levy and Markowitz, 1979). First, EV and Var are calculated as follows:

$$\text{EV} = m \times p \tag{5}$$

$$\text{Var} = ((m - \text{EV}) \times p)^2 + ((0 - \text{EV}) \times (1 - p))^2 \tag{6}$$

Then, the subjective value is defined as:

$$V = \text{EV} + \text{Var} \times \varepsilon \tag{7}$$

where ε is a free parameter that describes a bonus by the variance risk.
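A short Python sketch of Eqs (5) to (7) is given below for illustration. The risk term follows Eq. (6) exactly as printed, which is an assumption about the intended formula, and the value of ε used in the example is arbitrary rather than a fitted estimate.

```python
def expected_value(m, p):
    """Eq. (5): expected value of a reward of size m delivered with probability p."""
    return m * p


def variance_risk(m, p):
    """Eq. (6), implemented term by term as printed in the text."""
    ev = expected_value(m, p)
    return ((m - ev) * p) ** 2 + ((0 - ev) * (1 - p)) ** 2


def mean_variance_value(m, p, epsilon):
    """Eq. (7): expected value plus the risk term scaled by epsilon."""
    return expected_value(m, p) + variance_risk(m, p) * epsilon


# Example with an arbitrary epsilon: a positive value adds a bonus for risk,
# a negative value would subtract a penalty.
for p in (0.25, 0.5, 0.75, 1.0):
    print(p, mean_variance_value(4, p, epsilon=0.5))
```

At p = 1 the risk term vanishes, so the subjective value reduces to the reward size regardless of ε.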

The model fittings were performed using the "optim" function implemented in R software. The standard error of each estimated parameter was calculated from the Hessian matrix of the fitted function. The goodness of fit was assessed with the R <sup>2</sup> value and the Akaike Information Criterion (AIC) (Akaike, 1973), which is calculated as follows:

$$\text{AIC} = -2\log L + 2k \tag{8}$$

where L is the maximum likelihood of the model and k is the number of free parameters in the model. Smaller AIC values indicated a better model fit to the data. A likelihood ratio test was used to compare GW models. The p-value was obtained by the parametric bootstrapping method (n = 10,000).
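The model-comparison quantities described above can be sketched as follows. This Python example is illustrative only: the log-likelihood values are hypothetical, the function names are assumptions, and the bootstrap helper shows one common way to turn simulated test statistics into a p-value rather than the authors' exact procedure.

```python
def aic(log_likelihood, k):
    """Eq. (8): AIC from the maximized log-likelihood and the number of free parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Twice the log-likelihood gain of the fuller model over the nested one."""
    return 2.0 * (loglik_full - loglik_reduced)


def bootstrap_p_value(observed, simulated):
    """Fraction of simulated statistics at least as large as the observed one."""
    return sum(s >= observed for s in simulated) / len(simulated)


# Hypothetical log-likelihoods for a nested pair of models.
loglik_simple, loglik_partial = -12.0, -10.0
print(aic(loglik_simple, k=0), aic(loglik_partial, k=1))
print(likelihood_ratio_statistic(loglik_simple, loglik_partial))
```

In a parametric bootstrap, the simulated statistics would come from refitting both models to data generated under the simpler model.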

The effect of the satiation level on risk attitude was assessed using a measure of accumulated reward level (Minamimoto et al., 2009). Satiation level (S) was defined as the normalized liquid intake that is the ratio between the amount of total reward delivered up to time t, Rcum(t), and the total amount of reward delivered in the entire session, RcumMax:

$$S = \frac{R_{\text{cum}}(t)}{R_{\text{cumMax}}} \tag{9}$$
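Eq. (9) amounts to a normalized running sum of delivered reward. The Python sketch below illustrates this with a hypothetical sequence of per-trial reward amounts; it assumes that at least some reward was delivered in the session, and the function name is an assumption.

```python
from itertools import accumulate


def satiation_levels(rewards):
    """Eq. (9): cumulative reward after each trial divided by the session total."""
    cumulative = list(accumulate(rewards))
    total = cumulative[-1]  # assumes a non-empty session with non-zero total reward
    return [r / total for r in cumulative]


# Hypothetical reward amounts delivered on four consecutive trials.
print(satiation_levels([1, 0, 1, 2]))
```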

The effect of the history of previous reward was also assessed by logistic regression analysis:

$$P = \beta_1 R + \beta_2 S + \beta_3 PR + e \tag{10}$$

where P is the performance (i.e., correct or error), R is the reward size, S is the satiation level, PR is the reward size in the previous trial, β are the regression coefficients, and e is a constant.
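For readers who want to see how Eq. (10) maps onto a probability of a correct response, the sketch below passes the linear predictor through a logistic link. Treating Eq. (10) as the linear predictor of a standard logistic regression is an assumption, and the coefficient values are hypothetical.

```python
import math


def p_correct(reward, satiation, prev_reward, b1, b2, b3, intercept):
    """Probability of a correct trial from the predictors of Eq. (10), logistic link assumed."""
    z = b1 * reward + b2 * satiation + b3 * prev_reward + intercept
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: larger rewards help, satiation hurts.
coefs = dict(b1=0.5, b2=-1.0, b3=0.0, intercept=1.0)
print(p_correct(reward=1, satiation=0.2, prev_reward=4, **coefs))
print(p_correct(reward=4, satiation=0.2, prev_reward=4, **coefs))
```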

## RESULTS

### Risk Preference in Three Monkeys

The error rate and RT were the two main behavioral measures of the monkeys' valuation of the current task; the higher the expected reward value, the fewer errors the subjects make and the faster they respond (Minamimoto et al., 2009; Nagai et al., 2016; Fujimoto et al., 2019). We first compared the overall error rate and RT between deterministic (1, 2, or 3 drops) and probabilistic trials (25, 50, or 75%) in each session separately. For this analysis, we excluded the trials in which the expected value was 4 drops (and the probability was 100%) to focus on the effect of risk. Although expected values were equivalent between the two trial types, motivation of monkey ST appeared to be higher in probabilistic trials; the overall error rate in the deterministic trials was significantly higher than that in the probabilistic trials (n = 43, p < 0.01, rank-sum test; **Figure 2A**, left), and RT in the deterministic trials was significantly longer than in the probabilistic trials (n = 43, p < 0.01, rank-sum test, **Figure 2B**, left). These results indicated a risk-prone tendency of this monkey, which was consistent across sessions. Monkey KY also showed a risk-prone tendency; the error rate and RT were significantly larger and longer in the deterministic trials (error rate, n = 53, p = 0.049; RT, n = 53, p < 0.01; **Figures 2A,B**, middle column). Monkey HI, on the other hand, displayed the opposite pattern; the error rate and RT tended to be larger and longer in the probabilistic trials (error rate, n = 41, p = 0.54; RT, n = 41, p < 0.01; **Figures 2A,B**, right column), indicating a risk-averse tendency of this monkey. These results demonstrate that our task allowed us to characterize the individual risk preference of monkeys as a consistent behavioral bias across sessions, which was not uniform across the monkeys examined.

As we reported previously, the error rate in the deterministic trials varied depending on the reward size, with higher error rates for smaller reward (**Figure 3**, plots in red), the relation of which was well explained by an inverse function (Eq. 1, R <sup>2</sup> > 0.80) (Minamimoto et al., 2009; Nagai et al., 2016). The error rate in the probabilistic trials also reflected the expected value of reward; however, they were lower (monkeys ST and KY) or higher (monkey HI) than those in deterministic trials for the corresponding expected value (**Figure 3**, plots in blue). Three-way ANOVA (expected value: 1, 2, 3, and 4 drops × trial type: deterministic or probabilistic × Monkey) revealed a significant main effect of the expected value [F(1,1088) = 39.6, p < 0.01] and a significant interaction of the trial type and monkey [F(1,1088) = 4.9, p = 0.027], suggesting the effects of reward expectation and individual risk preference on the subjective valuation of probabilistic rewards.

### Simulations With Parsimonious Models

To describe the relationship between error rate and reward probability, we used a modified version of the inverse function with the subjective value of probabilistic reward (i.e., subjective-value model). To estimate the subjective valuation of monkeys, we employed the probability-weighting function developed by Gonzalez and Wu (1999) ("GW model," Eq. 2), a prospect-theory model that is widely used to describe non-linear probability distortion measured in economic tasks. Because both probabilistic and deterministic trials were tested in the same sessions, we used the same monkey-specific parameters c and b in the inverse functions to explain the error rates in two trial types (see the section "Materials and Methods").

[Figure 3 caption, partially recovered: Error rates (mean ± SEM) in deterministic (red) and probabilistic trials (blue) are plotted as a function of expected values for monkey ST (left), KY (center), and HI (right). The best-fit inverse function (red) is superimposed on the plots (ST: c = 7.3, b = –1.8; KY: c = 26.9, b = 2.3; HI: c = 41.9, b = 5.6) with the goodness of fit (R <sup>2</sup>) on each panel.]

The GW model implements two free parameters, γ and δ, which control the curvature and elevation of the function, respectively. First, we simulated how each parameter modifies the probability-weighting function and the error rate by using parsimonious models ("partial GW models"), which incorporate one free parameter. When γ in the GW model was fixed [GW (δ| γ = 1)], the probability-weighting function became concave when δ > 1, while it became convex when δ < 1 (**Figure 4A**). The error rate in the probabilistic trials then simply rose or fell compared to that in the deterministic trials (**Figure 4B**). When δ in the GW model was fixed [GW (γ| δ = 1)], on the other hand, the function became S-shaped when γ > 1, while it became inverted S-shaped when γ < 1 (**Figure 4C**). Under this condition, the error rates in the two trial types crossed each other; when γ < 1, for instance, the error rate in 25% trials was lower than in 1-drop trials and that in 75% trials was higher than in 3-drop trials (**Figure 4D**). Because the data demonstrated simple reduction (monkeys ST and KY) or elevation (monkey HI) of error rate by imposing risk (**Figure 3**), the simulation suggests that the partial GW model with fixed γ [GW (δ| γ = 1)] may explain the probability distortion of monkeys.

[Figure 4 caption fragment: Schemas of the figures are the same as in A and B.]
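The qualitative outcome of this simulation can be reproduced with a small Python sketch (not the authors' R code). The sensitivity parameters C and B are hypothetical, and the function `error_gap` is an illustrative helper that returns the probabilistic minus the deterministic error rate at matched expected values.

```python
def gw_weight(p, gamma, delta):
    """Eq. (2): Gonzalez and Wu probability weighting."""
    numerator = delta * p ** gamma
    return numerator / (numerator + (1.0 - p) ** gamma)


def error_rate(v, c, b):
    """Eq. (1): error rate predicted from subjective value."""
    return c / (v + b)


C, B = 0.5, 1.0           # hypothetical sensitivity parameters
M = 4                     # reward size in probabilistic trials (drops)
PROBS = (0.25, 0.5, 0.75)


def error_gap(gamma, delta):
    """Probabilistic minus deterministic error rate for each matched expected value."""
    gaps = []
    for p in PROBS:
        deterministic = error_rate(M * p, C, B)
        probabilistic = error_rate(M * gw_weight(p, gamma, delta), C, B)
        gaps.append(probabilistic - deterministic)
    return gaps


print(error_gap(gamma=1.0, delta=2.0))  # uniformly lower errors under risk
print(error_gap(gamma=1.0, delta=0.5))  # uniformly higher errors under risk
print(error_gap(gamma=0.5, delta=1.0))  # sign changes across probabilities
```

Varying δ alone shifts the whole error curve in one direction, whereas varying γ alone makes the two curves cross, which is the distinction the simulation draws on.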

### Modeling Individual Risk Preference Reflecting Trait-Like Risk Attitude

The subjective-value model implementing the GW model [GW (γ, δ)] well explained the error rate in the probabilistic trials for all monkeys (R <sup>2</sup> > 0.75, **Figure 5A**). As predicted in the simulation, the best-fit probability-weighting function with the GW model showed a simple convex or concave pattern (**Figure 5B**), demonstrating overweighting of reward probability (monkeys ST and KY) and underweighting of reward probability (monkey HI) in subjective valuation of the probabilistic reward. This result suggests a risk-prone tendency of monkeys ST and KY and a risk-averse tendency of monkey HI, as demonstrated in **Figure 2**. Then, to validate the parsimonious model, we tested whether the partial GW model with fixed γ [GW (δ| γ = 1), **Figure 4A**] also fits the data. As expected, the subjective-value model implementing the partial GW model with fixed γ well described the error rate in the probabilistic trials for all monkeys (R <sup>2</sup> > 0.74, **Figure 5C**). The best-fit probability-weighting function and estimated parameter δ (**Figure 5D**) were comparable to those estimated by the full GW model. In contrast, the partial GW model with fixed δ or the simple GW model with fixed γ and δ did not provide good fits to the error rate in the probabilistic trials [GW (γ| δ = 1) and GW (γ = 1, δ = 1), **Table 1**]. The partial GW model, GW (δ| γ = 1), explained the data significantly better than the simple GW model in all monkeys (p < 0.05, likelihood ratio test), suggesting that the free parameter δ is both necessary and sufficient for explaining the individual risk preference of monkeys measured in the single-option response task. We also tested whether the subjective-value model (the inverse function combined with the partial GW model with fixed γ), which incorporated three free parameters c, b, and δ, fits the error rate in both trial types. The model again fitted well with the data for all monkeys (R <sup>2</sup> > 0.81), suggesting the robustness of the modified inverse function in the current task.

As shown in **Figure 2**, the risk in reward outcome biased error rate and RT in the same direction, and the direction of bias was roughly consistent during the testing period. If what we modeled reflects the trait-like risk attitude of monkeys, the direction of risk preference (i.e., risk-prone or risk-averse), in other words, a convex or concave probability weighting pattern, should be stable over a longer time period. To confirm the stability of individual risk preference, we separately calculated δ in the partial GW model for the early (e.g., #1–20 sessions) and late testing sessions (e.g., #21–40 sessions) for each monkey. As expected, risk preference was consistent over the sessions; monkeys ST and KY showed high δ (>1) in both early and late sessions (ST early: 2.8 ± 1.0, ST late: 2.3 ± 0.6; KY early: 1.3 ± 0.8, KY late: 3.2 ± 0.9, mean ± SEM), while monkey HI consistently showed low δ (<1) in both periods (early: 0.25 ± 0.43, late: 0.51 ± 0.14). These results suggested that we modeled the trait-like risk attitude of the monkeys.

[Table 1 footnote: <sup>∗</sup>Significantly better fit than GW (γ = 1, δ = 1) (p < 0.05; likelihood ratio test).]

### Convex/Concave Probability Distortion Was Not Model-Specific

The error rate was also well explained by other subjective-value models that incorporated the Prelec model (Eq. 4, R <sup>2</sup> > 0.77, **Figure 6A**) or mean–variance model (Eq. 7, R <sup>2</sup> > 0.71, **Figure 6C**), which also assume non-linear probability distortion (Markowitz, 1952; Levy and Markowitz, 1979; Prelec, 1998). The best-fit probability-weighting function calculated by the Prelec model (**Figure 6B**) or mean–variance model (**Figure 6D**) showed the convex or concave pattern that was comparable to that calculated by the full or partial GW model (**Figures 5B,D**). Thus, the individual risk preference assessed in the single-option response task can be modeled reasonably well by the economic models with a free parameter focusing on the elevation. The goodness of fit (AIC) and parameters estimated are summarized in **Table 2**.

### Assessing State-Dependent Risk Attitude Within a Session

[Figure caption fragment: ... the figures are the same as in Figures 5B,D.]

In addition to trait-like risk attitude, physiological drive state can influence risk attitudes; for example, thirsty monkeys became more risk averse (Yamada et al., 2013). To examine the effect of satiation on risk attitude, we analyzed the error rate in the subparts of a session according to reward accumulation (satiation level: 0–0.5, 0.25–0.75, 0.5–1.0; see the section "Materials and Methods"). We found that the difference in error rate between deterministic and probabilistic trials varied depending on the satiation level [one-way repeated measures ANOVAs, main effect of satiation level, F(1,409) = 5.9, p = 0.015, **Figures 7A–C**]. The satiation level also affected RT; the difference in RTs between the two conditions increased according to satiation [main effect of satiation level, F(1,409) = 17, p < 0.01].
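The division of a session into overlapping sub-sessions by satiation level can be sketched as below. This Python example is illustrative: the window labels, the inclusive treatment of window boundaries, and the trial data are all assumptions rather than details taken from the study.

```python
# Satiation windows from the text; the labels are hypothetical names.
WINDOWS = {"early": (0.0, 0.5), "middle": (0.25, 0.75), "late": (0.5, 1.0)}


def split_by_satiation(trials):
    """Group trial outcomes by window; trials are (satiation_level, is_error) pairs."""
    groups = {name: [] for name in WINDOWS}
    for satiation, is_error in trials:
        for name, (low, high) in WINDOWS.items():
            if low <= satiation <= high:  # boundaries treated as inclusive (assumption)
                groups[name].append(is_error)
    return groups


def window_error_rates(trials):
    """Error rate within each satiation window that contains at least one trial."""
    groups = split_by_satiation(trials)
    return {name: sum(errs) / len(errs) for name, errs in groups.items() if errs}


# Hypothetical trials: errors occur only early in the session.
trials = [(0.1, True), (0.3, True), (0.6, False), (0.9, False)]
print(window_error_rates(trials))
```

Because the windows overlap, a trial with a satiation level between 0.25 and 0.75 contributes to two sub-sessions.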

The satiation effect on risk attitude was further assessed by the modeling approach; we fitted the subjective-value model implementing the partial GW model with fixed γ to the error rate in the probabilistic trials and extracted the best-fit parameter δ from the probability-weighting function for each sub-session (**Figures 7D–F**). We found that parameter δ tended to increase in the latter sub-sessions for all monkeys; the risk-proneness of monkeys ST and KY was evident in the early period and was enhanced thereafter, while monkey HI exhibited weaker risk-averseness as the session progressed and became nearly risk-neutral in the last sub-session (**Figures 7G–I**). In contrast, the direction of risk attitude was unchanged over a session; δ was always >1 in monkeys ST and KY, whereas it was always <1 in monkey HI. These results demonstrated a state-dependent risk attitude in monkeys; that is, the risk preference gets stronger according to satiation.

[Figure 7 caption fragment: ... according to satiation for each monkey. The best-fit function for the data of each sub-session (left: 0–0.5, center: 0.25–0.75, right: 0.5–1.0, satiation level) is displayed. (G–I) Parameter δ is plotted for each sub-session and for each monkey. Colors are the same as in D–F.]

### Partial Effects of Reward History on Performance

In our task design, the subjective value of probabilistic reward was associated with the cue but was independent from trial sequence or history. However, monkeys could take local contextual reward information into account for the reward expectation that may influence the performance (i.e., correct or error). In other words, the differences in error rate between the deterministic and probabilistic trials could arise from the effect of reward history. If so, the effect should be parallel with the risk preferences of the three monkeys. To address this possibility, we performed logistic regression analysis with three regressors: expected value (1, 2, 3, or 4 drops), satiation level (0–1), and previous reward (0, 1, 2, 3, or 4 drops). Expected value and satiation level significantly contributed to the performance for all monkeys (p < 0.05 with Bonferroni correction; **Figure 8**). The previous reward, on the other hand, affected only the performance of monkey KY (p < 0.01), but not the other two (p > 0.10, **Figure 8**). This pattern of individual differences was unrelated to that of risk preference or state-dependent change among the three monkeys. Thus, the effect of reward history was apparently limited and did not correlate with individual risk attitude in our experimental condition.

## DISCUSSION

In the present study, monkeys' risk attitude was assessed by a single-option response task, in which the subjective value of a probabilistic reward was inferred from their performance. To the best of our knowledge, this is the first study to examine risk preference of monkeys in a non-choice paradigm. Model-based analysis revealed non-linear probability distortion and diverse risk preference among three monkeys. The subjective probability weighting of monkeys was well explained by economic models and showed a simple convex/concave pattern over testing sessions. Regardless of baseline risk preference, all monkeys showed an increase in risk preference as satiation increased in a session. The current results thus highlighted the trait-like and state-dependent risk attitude of monkeys in non-choice decisions.

Past studies demonstrated that monkeys show non-linear probability distortion using economic tasks (Stauffer et al., 2015; Chen and Stuphorn, 2018). The present study replicated this in the single-option response task that imposed no choice demand. The basic structure of the current task was shown to be useful to infer the valuation of monkeys when reward size or cost was varied (Minamimoto et al., 2009, 2012; Bouret and Richmond, 2015; Eldridge et al., 2016; Nagai et al., 2016; Fujimoto et al., 2019). By implementing known economic models, the present study extended this basic model to infer the subjective value of probabilistic reward. Our monkeys demonstrated diverse risk preferences; two monkeys were risk-prone and one was risk-averse. This seems to reflect the trait-like risk attitude of monkeys because their risk preferences were consistent across sessions. Their performance in probabilistic trials was well described by a subjective-value model incorporating a non-linear probability weighting function (Markowitz, 1952; Levy and Markowitz, 1979; Prelec, 1998; Gonzalez and Wu, 1999), and thus the results were largely consistent with the above literature despite differences in task structures and measures of subjective valuations. Our results also suggest that economic models are generalizable for describing the probability distortion in non-choice, ecological decisions (Hayden and Walton, 2014; Pearson et al., 2014).

Unlike the previous studies, our monkeys showed a simple convex or concave probability distortion, and that pattern was well explained by a parsimonious GW model in which one free parameter concerning the elevation of function was adopted. In the above studies using economic tasks, all monkeys tested showed inverted S-shaped probability distortion (i.e., risk-seeking for low probability and risk-aversion for high probability), which was well explained by Prelec's function with α < 1, while the same model failed to explain the monkeys' performance in the current study. Such a stereotypical pattern observed in the previous studies may arise from excessive task demand in economic tasks; the cognitive load due to choice demand could diminish sensitivity to the difference in the reward probability and result in the inverted S-shape probability distortion. Indeed, recent studies showed that manipulation in task structure (e.g., trial sequence) of economic tasks affected monkeys' inverted S-shape probability distortion, potentially due to contamination of reward history (Farashahi et al., 2018; Ferrari-Toniolo et al., 2019). Importantly, the effect of reward history was limited in our paradigm, and hence did not account for the observed individual risk preference. Therefore, the discrepancy could be attributed solely to the task design concerning the ecological decision situation.

Owing to their genetic kinship, humans and monkeys share a large number of cognitive traits. However, because monkeys learn the option value through their experience, a task structure per se would largely influence their task performance and therefore hamper a straightforward interpretation by investigators (Real, 1991). For example, Blanchard et al. (2013) demonstrated that monkeys did not care about the length of the delay period after reward delivery, and that had led to misunderstanding by preceding researchers about the temporal-discounting ability of monkeys. Similarly, economic tasks could contain undesirable confounds, such as working memory, inhibitory control, and value comparison, which may affect decision strategy and obscure natural behavioral traits (Stephens and Krebs, 1986; Elmore et al., 2011; Blanchard et al., 2014; Hayden and Walton, 2014). The current study eliminated such undesirable confounding effects by adopting a non-choice decision in the task. In fact, our monkeys quickly learned to perform the single-option response task (<1 month), while it usually takes several months for monkeys to learn to perform two-option choices. Unlike in choice tasks, diverse individual differences in trait-like risk attitudes were seen in our monkeys, as observed in human studies (Tom et al., 2007; Tobler et al., 2008; Takahashi et al., 2010; Gilaie-Dotan et al., 2014), and therefore the current task may provide a better opportunity to assess the naturalistic risk attitude of monkeys.

Adapting risk attitude based on current needs is vital for maximizing fitness in an uncertain environment (Stephens and Krebs, 1986). Human studies showed that subjects flexibly modulate risk attitude based on required points or "wealth level" even during a single experimental session (Symmonds et al., 2011; Kolling et al., 2014; Fujimoto and Takahashi, 2016; Juechems et al., 2017). Yamada et al. (2013) directly demonstrated the relationship between risk preference and satiety by monitoring the blood osmolality level within a session in macaque monkeys, which is a physiological form of "wealth level." Consistently, our monkeys showed enhancement of risk-prone tendency (ST and KY) or suppression of risk-aversion (HI) according to reward accumulation, and our model-based analysis successfully described the satiation effect. Of note, the increase of risk preference reflects state-dependent risk attitude, because it occurred irrespective of baseline risk preference. This change of risk preference within a session is not attributable to the reward history effect, which was limited in the monkeys. Importantly, human studies suggested that state-dependent modulation of risk attitude was not accounted for by change of the physiological state itself either (Symmonds et al., 2011; Kolling et al., 2014; Fujimoto and Takahashi, 2016). Hence, the current approach successfully quantified the trait-like and state-dependent risk attitude of monkeys within one task, suggesting a useful model of risk-sensitive decision for translational research.

What causes the inconsistent risk preference across animals remains unclear. Probably the best-known factors that lead to differences in risk attitude in humans are gender and age (Walker et al., 2017). However, they are unlikely to have a role in the current study because we used only adult male monkeys. Another possible cause is social rank (Davis et al., 2009), but the contribution of this factor is unknown because we have not tested the social relationship of our monkeys. Future studies should identify the exact cause of individual risk preference by employing a larger cohort of animals.

Past studies reported that the trait risk attitude correlated with individual differences in monoamine systems (Berridge and Waterhouse, 2003; Roiser et al., 2009; Takahashi et al., 2010), brain structures (Gilaie-Dotan et al., 2014; Leong et al., 2016), and activity patterns (Kuhnen and Knutson, 2005; Huettel et al., 2006; Preuschoff et al., 2008; Levy et al., 2010) of human subjects. However, the neural substrates of individual risk preference in monkeys are largely unknown. Our behavioral assessment, which successfully demonstrated diverse risk attitudes in monkeys with a single free parameter (δ), may provide an excellent opportunity to explore the neural basis of individual risk preference, as the animal model allows us to measure neural activities directly and to use neural modulation techniques (cf., Nagai et al., 2016). One of the potential applications is the study of gambling disorder (GD), which is considered to be a dysfunction of risk-sensitive decision (Hodgins et al., 2011; American Psychiatric Association [APA], 2013). Indeed, we recently showed that GD patients had deficits not only in trait-like risk attitude but also in state-dependent risk attitude (Fujimoto et al., 2017). Therefore, future studies should identify the neural substrates of both trait-like and state-dependent risk attitude in monkeys, providing therapeutic targets for GD patients.

One of the limitations of the current study was the small sample size. We thus could not address the mechanism behind individual differences in risk attitude. Another limitation was that we used only one reward size for probabilistic trials (4 drops); modifying the range of reward size may influence monkeys' risk attitude. Further validation with a larger cohort and/or broader reward environments will be needed to generalize our findings and identify other factors that influence the risk attitude of monkeys.

In conclusion, our approach based on economics and behavioral ecology illustrates the trait-like and state-dependent risk attitude of monkeys. Because our model-based analysis employed well-known functions from past human studies, the current animal model may accelerate translational research to determine neural mechanisms underlying risk-sensitive decision-making.

#### DATA AVAILABILITY

All datasets generated for this study are included in the manuscript and/or the supplementary files.

### ETHICS STATEMENT

All experimental procedures were approved by the Animal Care and Use Committee of the National Institutes for Quantum and Radiological Science and Technology and were in accordance with the guidelines published in the NIH Guide for the Care and Use of Laboratory Animals.

### AUTHOR CONTRIBUTIONS

AF designed and performed the research, analyzed the data, and wrote the manuscript. TM designed the research and edited the manuscript.

#### FUNDING

This work was supported by the JSPS KAKENHI [Grant Numbers JP15H06872 and JP17K13275 (to AF) and JP18H04037 (to TM)] from the Ministry of Education, Culture, Sports, Science, and Technology of Japan (MEXT), by the Takeda Science Foundation Overseas Research Fellowship (to AF), and by the AMED Grant Numbers JP18dm0107146 and JP18dm0207007 (to TM).

#### ACKNOWLEDGMENTS

We thank Dr. Y. Sakai for his comments on an early version of the manuscript, Dr. T. Suhara for financial support, Dr. K. Mimura for his assistance with statistical analyses, and members from the Neural Systems and Circuits Research Group, QST, for invaluable scientific discussion. We also thank J. Kamei, Y. Matsuda, R. Yamaguchi, Y. Sugii, R. Suma, and A. Maruyama for their technical assistance.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Fujimoto and Minamimoto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neuronal Representation of Object Choice in the Striatum of the Monkey

#### Satoshi Nonomura<sup>1,2</sup>\* and Kazuyuki Samejima<sup>1</sup>

<sup>1</sup> Brain Science Institute, Tamagawa University, Tokyo, Japan, <sup>2</sup> Physiology and Cell Biology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan

According to a widely held view, the decision-making process can be conceptualized as a two-step process: "object choice," which does not include physical actions, followed by "movement choice," in which action is executed to obtain the object. Accumulating evidence in the field of decision neuroscience suggests that the cortico-basal ganglia circuits play a crucial role in decision-making. However, the underlying mechanisms of the object and movement choices remain poorly understood, mainly because the two processes occur simultaneously in most experiments. In this study, to uncover the neuronal basis of object choice in the striatum, the main input site of the basal ganglia, we designed a behavioral task in which the processes of object and movement choice were temporally separated, and recorded the single-unit activity of phasically active neurons (PANs) (n = 375) in the striatum of two monkeys. We focused our study mainly on neuronal representation during the object choice period, before movement choice, using a mutual information analysis. Population striatal activities significantly represented the information of the chosen object during the object choice period, which indicated that the monkeys actually made the object choice during the task. For the activity of each individual neuron during the object choice period, we identified offered object- and chosen object-type neurons, corresponding to pre- and post-decision signals, respectively. We also found movement-type neurons during the movement period after the object choice. Most offered object- or chosen object-type neurons did not overlap with movement-type neurons. The presence of object choice-related signals independent of the movement signal in the striatum indicated that the striatum was part of the site where object choice was made within a cortico-basal ganglia circuit.

#### Edited by:

Hiroshi Yamada, University of Tsukuba, Japan

#### Reviewed by:

Veit Stuphorn, Johns Hopkins University, United States

NaYoung So, Columbia University, United States

#### \*Correspondence:

Satoshi Nonomura satoshi.nonomura@gmail.com

#### Specialty section:

This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 28 March 2019 Accepted: 12 November 2019 Published: 28 November 2019

#### Citation:

Nonomura S and Samejima K (2019) Neuronal Representation of Object Choice in the Striatum of the Monkey. Front. Neurosci. 13:1283. doi: 10.3389/fnins.2019.01283

Keywords: decision-making, object choice, striatum, monkey, electrophysiology, mutual information analysis

## INTRODUCTION

We often make decisions among abstract outcomes without undertaking physical actions. For example, imagine that you are in a kaitenzushi restaurant (also known as conveyor-belt sushi or sushi train). You can decide on the sushi topping before reaching toward the sushi on the dish carried by the conveyor belt. In this case, the first step, which does not include the physical action (reaching with your hand), could be regarded as the "object choice"; it is followed by the second step, the "movement choice," in which an action is executed to obtain the object when the object is conveyed in front of you. Recently, several neuroscientists have discussed the concept and neuronal mechanism of the consecutive two-step decision processes (Samejima and Doya, 2007; Padoa-Schioppa, 2011; Cisek, 2012; Chen and Stuphorn, 2015).

The striatum, the main entry nucleus of the basal ganglia, is thought to play major roles in decision-making. Anatomically, the striatum has inputs from various cerebral cortical areas, including the prefrontal, higher-order motor, and primary motor cortex, and it returns these inputs to the cortical areas largely in parallel via the thalamus (Yeterian and Van Hoesen, 1978; Selemon and Goldman-Rakic, 1985; Alexander et al., 1986; Flaherty and Graybiel, 1993; Haber and Knutson, 2010). Clinically, patients with Parkinson's disease, Huntington's disease, or obsessive–compulsive disorders, all of which are considered disorders of the basal ganglia, exhibit cognitive dysfunction in action choice as well as motor behaviors (Graybiel and Rauch, 2000; Mink, 2003; Frank et al., 2004; Beste et al., 2008). Several lines of evidence from primate and rodent electrophysiological and optogenetic studies have shown that the striatum plays important roles in decision-making by predicting future goals, taking action, and monitoring performance and outcome in order to improve future behavior (Lauwereyns et al., 2002; Takikawa et al., 2002; Cromwell and Schultz, 2003; Samejima et al., 2005; Yamada et al., 2007; Lau and Glimcher, 2008; Cai et al., 2011; Tai et al., 2012; Nonomura et al., 2018).

Note that although there is considerable evidence for the neural basis of decision-making in the striatum, it remains unknown whether and how this region of the brain is involved in the consecutive two-step choice process, i.e., object and movement choice. Because most studies in primates and rodents adopted behavioral tasks in which the alternatives for choice included both motor and non-motor factors simultaneously, e.g., alternatives predicting different reward values (non-motor factor) and the direction of a moving joystick (motor factor) (Samejima et al., 2005), neuronal activity in relation to the object and movement choices could not be clearly dissociated. Several studies have reported that an object signal unrelated to movement direction to guide the choice was represented in the orbitofrontal cortex (OFC), the supplementary eye field (SEF), and the amygdala (Padoa-Schioppa and Assad, 2006; So and Stuphorn, 2010; Grabenhorst et al., 2012; Cai and Padoa-Schioppa, 2014; Chen and Stuphorn, 2015). However, few studies have investigated the neuronal representation related to object choice in the striatum.

In this study, to investigate the neuronal representation of object choice in the striatum, we designed a choice task in which the consecutive two-step choice processes were temporally decomposed, recorded single-unit activity in the striatum of macaques performing the task, and performed a mutual information analysis. This is the first study to provide evidence for a neuronal representation of object choice in the striatum.

#### MATERIALS AND METHODS

#### Animals and Surgery

All experiments were approved by the Animal Research Ethics Committee of Tamagawa University (animal experiment protocol H21/27-14) and were carried out in accordance with the Fundamental Guidelines for Proper Conduct of Animal Experiments and Related Activities in Academic Research Institutions (Ministry of Education, Culture, Sports, Science and Technology of Japan) and the Guidelines for Animal Experimentation in Neuroscience (Japan Neuroscience Society). All surgical procedures were performed under appropriate anesthesia, and all efforts were made to minimize suffering (see below). Our procedures for primate animal experiments were established in previous studies at Tamagawa University (Nakayama et al., 2008; Yamagata et al., 2009; Hashimoto et al., 2010; Saga et al., 2011; Arimura et al., 2013).

We used two monkeys (Macaca fuscata): monkey 1 (8.5 kg) and monkey 2 (8.0 kg). During the experimental sessions, each monkey sat in a chair with its head and both arms restrained and its right wrist left free to enable it to push a button with its hand; the button was installed in front of the chair at waist level. A 19-inch video monitor screen equipped with a speaker to provide sound stimulation was placed in front of the monkey. Eye positions were monitored at 240 Hz with an infrared eye-tracking system (resolution, 0.25° in visual angle; EYETRAC6000, Applied Science Laboratories). The distance between the screen and the monkey's eyes was 340 mm. A tube was located near the monkey's mouth to give a reward of apple juice. The amount of reward was controlled by opening and closing an electromagnetic valve via a control signal from a TEMPO system (Reflective Computing, Olympia, WA, United States), which was also used to control the behavioral task, visual stimulus presentation by the liquid crystal display, and the sound stimulus predicting the amount of reward. The order of presentation of the visual stimuli was controlled by custom MATLAB code (MathWorks).

#### Behavioral Task

Two tasks were designed, a free-choice task and an instruction task. While seated in the chair, the monkey performed the task by operating a push-button with its right hand according to a visual stimulus presenting the alternatives for choice. If the monkey successfully performed the task, an apple juice reward following the reinforcement sound was given. Four different amounts of reward were used (reward 1, 0.095 ml; reward 2, 0.190 ml; reward 3, 0.284 ml; and reward 4, 0.376 ml). A reinforcement sound corresponding to the amount of reward was repeated before actual delivery of the reward (one to four repetitions of a short, high tone, corresponding to one to four units of reward).

In the free-choice task (**Figure 1A**), the monkey had to choose one of two objects presented on the screen. Pushing the button located near the monkey's hand started the task, after which a fixation point (4.5 × 4.5 mm white square dot) appeared in the center of the screen. If the monkey maintained its gaze on the fixation point (within 1 degree) for 0.8 s, a choice cue was presented in a 40 × 40 mm square area (about 6.7 degrees of visual angle) for 0.8 s. The choice cue consisted of two types of objects located at four corners (upper left and right, lower right and left). Each object was 20 mm in diameter. The choice cues were randomly picked from 16 objects (four colors × four shapes). After a delay period (0.8–1.2 s), the two objects were individually presented again in random order as the first and second targets. The monkey had to choose one of the two targets by releasing the button during presentation of the target. If the monkey released the button during the first presentation of the target (0.8 s), it received a reward of a size corresponding to the first target after a 1.2-s first-release delay period and a 0.5-s reinforcement sound. Conversely, if the monkey kept holding the button throughout the first target presentation (0.8 s), the second target was presented following a delay period of 0.4 s. If the monkey released the button during presentation of the second target (0.8 s), it received a reward of a size corresponding to the second target after a 0.5-s reinforcement sound. Trials were separated by an interval of 3–5 s. A trial was aborted if the monkey failed to maintain fixation of its gaze (deviation over 1 degree) throughout presentation of the fixation point (0.8 s). When a trial was aborted, all presented objects were immediately extinguished, neither the reinforcement sound nor the reward was delivered, and the trial began again.

FIGURE 1 | … was presented as an instruction cue instead of the choice cue in the free-choice task. (C) Object–reward association schedule. The association between four different amounts of reward and objects of four different colors and shapes was randomly changed in each block.
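The visual angles quoted for the stimuli can be checked against the stated 340 mm viewing distance. The following is a minimal sketch in Python, assuming a flat screen viewed head-on with each stimulus centered on the line of sight; the function name `visual_angle_deg` is hypothetical and is not part of the authors' TEMPO or MATLAB code.

```python
import math


def visual_angle_deg(size_mm: float, distance_mm: float) -> float:
    """Full visual angle (degrees) subtended by a stimulus of the given size,
    assumed to be centered on the line of sight at the given distance."""
    return math.degrees(2.0 * math.atan((size_mm / 2.0) / distance_mm))


VIEWING_DISTANCE_MM = 340.0  # screen-to-eye distance stated in the text

# Stimulus sizes taken from the task description.
for label, size_mm in [
    ("choice-cue area (40 mm)", 40.0),
    ("single object (20 mm)", 20.0),
    ("fixation point (4.5 mm)", 4.5),
]:
    angle = visual_angle_deg(size_mm, VIEWING_DISTANCE_MM)
    print(f"{label}: {angle:.2f} deg")
```

Under these assumptions the 40 mm choice-cue area comes out at roughly 6.7 degrees, in line with the value given in the task description, and the fixation point itself subtends less than 1 degree.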

In the instruction task (**Figure 1B**), the monkey had to choose only one instructed object presented on the screen. The task sequence was the same as in the free-choice task, except that in this case, the choice cue was replaced by the instruction cue (only one type of object).

According to the reward schedule (**Figure 1C**), the task was run in a block of 144 trials consisting of the first through the fourth subblocks. Each subblock included 12 trials of the instruction task and 24 trials of the free-choice task. Four different amounts of juice reward were associated with four colors or shapes in a block of trials. The association of color with reward and shape with reward was altered block by block. The amount of reward associated with each shape or color in a block was randomly changed across every block.
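The block structure described above (four subblocks of 12 instruction and 24 free-choice trials, with a random feature-reward association per block) can be sketched as follows. This is an illustrative Python outline only: the function `make_block` is hypothetical, the ordering of instruction and free-choice trials within a subblock is an assumption (the text does not state it), and the color and shape names are taken from the Data Analysis section.

```python
import random

REWARD_LEVELS = [1, 2, 3, 4]  # reward units 1-4, as in the task description
COLORS = ["red", "blue", "yellow", "green"]
SHAPES = ["circle", "triangle", "square", "cross"]


def make_block(feature: str, rng: random.Random) -> dict:
    """Build one block for a color-reward or shape-reward schedule."""
    values = COLORS if feature == "color" else SHAPES
    rewards = REWARD_LEVELS[:]
    rng.shuffle(rewards)  # random feature-reward association for this block
    association = dict(zip(values, rewards))

    trials = []
    for subblock in range(1, 5):
        # Assumption: instruction trials precede free-choice trials
        # within a subblock; the source does not specify the ordering.
        trials += [(subblock, "instruction")] * 12
        trials += [(subblock, "free-choice")] * 24
    return {"feature": feature, "association": association, "trials": trials}


block = make_block("color", random.Random(0))
print(len(block["trials"]))  # 4 subblocks x (12 + 24) trials = 144
print(block["association"])
```

The trial count reproduces the 144 trials per block stated in the text; alternating `"color"` and `"shape"` across successive calls would mimic the block-by-block switch of the rewarded feature.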

#### Electrophysiological Recording

After completing the behavioral training, the monkeys underwent aseptic surgery performed under pentobarbital sodium anesthesia (20–25 mg/kg, i.v.) with atropine sulfate. Antibiotics and analgesics were used to prevent postsurgical infection and pain. Polycarbonate screws were implanted in the skull, and two plastic pipes, rigidly attached with acrylic resin, were used to securely fix the head during the daily recording sessions. Part of the skull was removed over the anterior part of the striatum, and a recording chamber was implanted, tilted laterally by 35°. To confirm that the chamber was located appropriately to approach the target brain area, magnetic resonance images were recorded. Neuronal activity was recorded with glass-insulated tungsten electrodes (1.0–2 MΩ at 1 kHz; Alpha Omega Engineering) advanced by an oil-driven micromanipulator (MO-97-S; Narishige). The recording sites were determined using a grid system, which allowed us to record at intervals of 1 mm between penetrations. The electrode was introduced into the brain through a stainless steel guide tube, which was inserted into one of the grid holes and then into the brain through the dura mater. Detection of electrical signals from the electrode and online sorting were performed by a Multichannel Acquisition Processor (MAP/16, Plexon). The signal was amplified by a head-stage (HST/8o50-G20) and pre-amp with a band-pass filter (PBX2/16wb-G50, Plexon; final gain, 500; band-pass filter 0.1–8 kHz) and collected at 1 kHz. The behavioral task was controlled by a TEMPO system and MATLAB. The signals controlling the behavioral task from the TEMPO system were recorded in the MAP system with the neuronal signals. Offline sorting of action potentials was performed with an Offline Sorter (ver3, Plexon). The sorted action potentials and behavioral data were analyzed with MATLAB.

The recording site was the striatum of the left hemisphere (A: 21–30 mm and L: 18–27 mm for monkey 1; A: 22–30 mm and L: 18–28 mm for monkey 2). The dorsal border of the striatum was easily identified from changes in the background firing rate as the electrode was introduced through the cortex, white matter, and striatum. We classified striatum neurons as phasically active neurons (PANs) or tonically active neurons based on differences in spontaneous activity and spike waveform (Hikosaka et al., 1989; Aosaki et al., 1994). If we judged a PAN to be responsive to any task event by observing a phasic response during a trial, we started recording. All neurons in the database were recorded across at least two blocks of trials, including one shape-reward and one color-reward block.

#### Data Analysis

To determine whether the monkey actually made an object choice during the choice cue or delay period, we used the latter half of the trials in the block (third and fourth subblocks) for analysis to eliminate the effect of learning about the association between reward and objects. Unless otherwise noted, we analyzed the neuronal data in the free-choice task, excluding the instruction task. To investigate the neuronal representation related to object choice, we performed mutual information analysis for each recorded neuron (Optican and Richmond, 1987). Mutual information for each neuron was calculated based on the difference between a priori information of a task condition and information of the task condition given the firing rate in the trial. The following equation was used:

$$\begin{aligned} I(\mathbb{S}; R) &= H(\mathbb{S}) - H(\mathbb{S} \mid R) \\ &= \sum_{s} -p(s) \log p(s) - \left\langle \sum_{s} -p(s \mid r) \log p(s \mid r) \right\rangle_{r} \end{aligned}$$

where S is the set of task conditions {S1, S2, …}, R is the set of observed neuronal activities r<sub>i</sub> (the firing rate in the i-th trial), H(S) is the prior information entropy of the task condition S, H(S | R) is the information entropy of task condition S given neuronal activity R in the trial, and ⟨ ⟩<sub>r</sub> is the mean information entropy across all task conditions s given neuronal activity r. Here two task conditions S (S1 and S2) were used. The first task condition, S1, was six combinations of the choice cue (referred to as "offered object"), including six color or shape combinations (S1<sub>color</sub> = {red/blue, red/yellow, red/green, blue/yellow, blue/green, and yellow/green} and S1<sub>shape</sub> = {circle/triangle, circle/square, circle/cross, triangle/square, triangle/cross, and square/cross}). Under the first task condition, S1, p(s) was calculated using the probability of 1/6, and p(s | r) was calculated using the probability that trials s exhibit higher firing rates than the median firing rate across all trials. The second task condition, S2, was four colors or shapes of the chosen object (S2<sub>color</sub> = {red, blue, yellow, and green} and S2<sub>shape</sub> = {circle, triangle, square, and cross}). Under the second task condition, S2, p(s) was calculated using the probability of 1/4, and p(s | r) was calculated using the probability that trials s exhibit higher firing rates than the median firing rate across all trials.
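To make the quantity in the equation above concrete, the following Python sketch computes the mutual information between a task condition and a firing rate that has been binarized at its median. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, entropies are measured in bits (base-2 logarithm), and p(s) is estimated from the observed trial counts rather than fixed at 1/6 or 1/4.

```python
import math
from collections import Counter
from statistics import median


def entropy(counts) -> float:
    """Shannon entropy (bits) of an empirical distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mutual_information(conditions, rates) -> float:
    """I(S;R) = H(S) - <H(S|r)>_r, with R split into two classes
    (above vs. not above the median firing rate across all trials)."""
    med = median(rates)
    high = [r > med for r in rates]
    h_s = entropy(Counter(conditions).values())  # prior entropy H(S)

    h_s_given_r = 0.0
    for level in (True, False):
        subset = [s for s, h in zip(conditions, high) if h == level]
        if subset:
            p_r = len(subset) / len(conditions)  # weight of this rate class
            h_s_given_r += p_r * entropy(Counter(subset).values())
    return h_s - h_s_given_r


# Toy example (made-up numbers): a neuron that fires more for "red".
chosen = ["red"] * 4 + ["blue"] * 4
selective = [10, 12, 11, 13, 2, 3, 1, 4]
unselective = [10, 2, 11, 3, 10, 2, 11, 3]
print(mutual_information(chosen, selective))
print(mutual_information(chosen, unselective))
```

In the toy example, the selective neuron separates the two conditions perfectly at the median and therefore carries the full prior entropy of the condition, whereas the unselective neuron carries none.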

To find evidence that the monkeys actually made an object choice before a movement choice, we calculated mutual information of the chosen object during the period from onset of the choice cue to onset of the first target. Because mutual information of the chosen object could have a spurious correlation with that of the offered object, we checked whether the mutual information of the chosen object was significantly larger than the information expected from the offered object. To do this, we adopted the bootstrap method for the hypothesis test and calculated the surrogate mutual information of the chosen object, in which the information of the chosen object was randomized but the information of the offered object was kept. For example, to calculate the surrogate mutual information of the chosen shape for every recorded neuron, we generated trial-shuffled data in which the chosen shapes were shuffled randomly within every trial group in which the same shape combination of the offered object was presented (irrespective of their colors). We used the trial-shuffled data and calculated the surrogate mutual information of the chosen shape for every recorded neuron. Then, we calculated the summation of surrogate mutual information of the chosen shape from all recorded neurons. We performed this procedure repeatedly (10,000 shuffles) and generated the surrogate distribution of the mutual information of the chosen shape. The significance level was determined at the top 5% of the surrogate distribution. If the summation of real mutual information of the chosen shape was more than the significance level in the surrogate distribution, we considered that the real information of the chosen shape was significantly larger than the information expected from the offered shape at the population level (p < 0.05). In the case of information of the chosen color, we performed the same analysis using color information instead of shape. The dynamics of the summation of real information (**Figure 3B**) was calculated in 0.2-s sliding windows with 0.05-s steps. The bootstrap method was performed for eight consecutive 0.2-s windows starting from onset of the choice cue (0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1.0, 1.0–1.2, 1.2–1.4, and 1.4–1.6 s), corresponding to the upper triangles in **Figure 3B**. As with the chosen object, the significance of mutual information of the offered object was tested (p < 0.05) (**Figure 3C**). The statistical test was the same as for the chosen object except that the surrogate mutual information of the offered object was calculated from trial-shuffled data in which the offered objects were shuffled randomly within every trial group in which the same object was chosen, e.g., when the chosen object was red, the offered objects (red/blue, red/yellow, and red/green) were randomized.
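The surrogate test described above can be outlined in Python as follows. This is a sketch under explicit assumptions rather than the authors' code: all function names are hypothetical, mutual information is computed in bits with firing rates binarized at the median, each neuron is represented as a tuple `(chosen, offered, rates)` of per-trial lists, and the default number of shuffles is kept small for speed (the text reports 10,000).

```python
import math
import random
from collections import Counter, defaultdict
from statistics import median


def _entropy(labels) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(labels, rates) -> float:
    """I(S;R) in bits, with firing rates binarized at their median."""
    med = median(rates)
    info = _entropy(labels)
    for level in (True, False):
        subset = [s for s, r in zip(labels, rates) if (r > med) == level]
        if subset:
            info -= len(subset) / len(labels) * _entropy(subset)
    return info


def shuffle_within_groups(chosen, offered, rng):
    """Permute chosen-object labels only among trials with the same offer,
    so that information carried by the offered object is preserved."""
    groups = defaultdict(list)
    for i, pair in enumerate(offered):
        groups[pair].append(i)
    shuffled = list(chosen)
    for idx in groups.values():
        perm = idx[:]
        rng.shuffle(perm)
        for src, dst in zip(idx, perm):
            shuffled[dst] = chosen[src]
    return shuffled


def population_test(neurons, n_shuffles=1000, seed=0):
    """Return (summed real MI, top-5% threshold of the surrogate sums)."""
    rng = random.Random(seed)
    real = sum(mutual_information(ch, r) for ch, _, r in neurons)
    surrogate = []
    for _ in range(n_shuffles):
        total = 0.0
        for chosen, offered, rates in neurons:
            labels = shuffle_within_groups(chosen, offered, rng)
            total += mutual_information(labels, rates)
        surrogate.append(total)
    surrogate.sort()
    threshold = surrogate[min(int(0.95 * n_shuffles), n_shuffles - 1)]
    return real, threshold
```

With this outline, the population-level information about the chosen object would be judged significant when `real` exceeds `threshold`. Swapping the roles of the two label lists (shuffling the offered objects within groups of trials sharing the same chosen object) gives the corresponding test for the offered object.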

To compare two surrogate distributions in **Figure 4** by receiver operating characteristic (ROC) analysis for the classification of chosen (or offered) object and value, we recalculated the two surrogate distributions and the area under the curve (AUC) ten times, and compared the AUCs with 0.5 by Mann–Whitney U-test.
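As a reminder of what the AUC measures when two distributions are compared, the following Python sketch computes it directly as the probability that a value drawn from one sample exceeds a value drawn from the other, counting ties as one half. The function name `roc_auc` and the toy numbers are illustrative assumptions; the source does not specify how the AUC was computed, and the pairwise loop is meant for clarity rather than for speed with thousands of surrogate values.

```python
def roc_auc(sample_a, sample_b) -> float:
    """Area under the ROC curve for separating sample_a from sample_b:
    P(a > b) + 0.5 * P(a == b) over all pairs (a, b)."""
    wins = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(sample_a) * len(sample_b))


# Toy example: largely separated samples give an AUC close to 1,
# identical samples give exactly 0.5 (chance level).
print(roc_auc([3, 4, 5], [1, 2, 3]))
print(roc_auc([1, 2], [1, 2]))
```

An AUC of 0.5 corresponds to indistinguishable distributions, which is why the recalculated AUCs are compared against 0.5.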

We also checked the significance of mutual information of the chosen object and the offered object at the single-neuron level (**Figure 5**). In this case, we compiled the surrogate distribution of mutual information (100 shuffles) in four consecutive 0.4-s windows from choice cue onset for an individual neuron, and then checked whether its real mutual information was larger than the significance level (top 5% of the distribution of surrogate mutual information, p < 0.05). Furthermore, we performed one-way analysis of variance (ANOVA) with factors of chosen object (color and shape) or offered object (color and shape) for four consecutive 0.4-s windows (p < 0.05). If both statistical tests were passed in the same window, we defined the neuron as the offered object type (color or shape) or the chosen object type (color or shape).

In addition to the chosen and offered object, we calculated another mutual information using the task condition of "chosen movement" (first or second release) (**Figure 6B**). To find movement-type neurons in **Figures 6A,C,D**, we performed one-way ANOVA with the factor of chosen movement in the 0.8-s window from onset of the first target (p < 0.05).

#### RESULTS

#### Behavioral Task Performance

The point of the behavioral task in this study was that object choice with a greater amount of reward during the choice cue or delay period could be made in a manner that was temporally dissociated from movement choice (**Figure 1A**; see section "Materials and Methods" for details). First, using the behavioral data from the days on which neuronal activity was recorded (monkey 1, 119 days; monkey 2, 86 days), we confirmed that the monkeys could learn the association between a visual feature of the object (color or shape) and four levels of reward in every block of trials, and then choose the better target by releasing the button (**Figure 2**). The transition of the mean optimal choice ratio (the choice ratio of the target associated with the greater amount of reward) in the choice task across blocks indicated that the optimal choice ratio in both color-reward and shape-reward blocks increased after the block change, whereas the optimal choice ratio corresponding to the previous block decreased in both monkeys (**Figure 2A** left). The reaction time (RT) from the onset of the first target to button release was faster when the monkey made an optimal choice with a larger reward than when it made an optimal choice with a smaller one, and the RTs for the larger reward became faster as the trials progressed following the block change (**Figure 2A** right). The monkeys could also make an optimal choice for any combination of first and second targets, predicting different amounts of reward (**Figure 2B**). In a quantitative analysis of the two monkeys, the choice probability of the first target and the mean RT for choosing the first target against the difference in reward amount between the first and second targets were calculated and showed results consistent with **Figures 2A,B** (**Figures 2C,D**). The choice probability of the first target against the difference exhibited a significant effect, and these choice behaviors did not differ significantly between the two monkeys (**Figure 2C**; two-way ANOVA, F(5,2) = 280.39, p < 0.001 for the difference in reward amount; F(1,6) = 0.94, p = 0.38 for monkey). RT was also significantly affected by the difference in reward amount (**Figure 2D**; one-way ANOVA, monkey 1, F(5,9.99), p < 0.001; monkey 2, F(1,9.16), p < 0.001). In both monkeys, RT was significantly slower when the difference in reward amount was 1 than when it was 2 or 3 (Wilcoxon rank sum test with Bonferroni correction: monkey 1, p < 0.001 for reward 1 vs. reward 2, p < 0.001 for reward 1 vs. reward 3, p = 0.0052 for reward 2 vs. reward 3; monkey 2, p < 0.001 for reward 1 vs. reward 2, p < 0.001 for reward 1 vs. reward 3, p < 0.001 for reward 2 vs. reward 3). These behavioral results meant that the two monkeys learned the association between a visual feature of the object (color or shape) and four different amounts of reward and chose the target associated with the greater amount of reward by releasing the button according to the block change.

Next, we specifically checked whether the monkeys made an object choice (**Figures 2E,F**). If the monkeys chose one of the two objects during the choice cue or delay period in the choice task, the subsequent process would be the same as in the instruction task, in which they simply chose the decided or instructed object by releasing the button during the target presentation period. We therefore compared the RT of the choice task and the instruction task (**Figures 2E,F**). RT in the instruction task was found to be faster than that in the choice task (**Figure 2E**, Wilcoxon rank sum test: monkey 1, p < 0.001; monkey 2, p < 0.001). No significant effect was seen for RT in the instruction task for the four levels of reward, whereas RT in the choice task showed a significant effect (**Figure 2F**, one-way ANOVA, monkey 1, F(3,1.95), p = 0.118 for the instruction task, F(3,11.4), p < 0.001 for the choice task; monkey 2, F(3,0.79), p = 0.502 for the instruction task, F(3,12.1), p < 0.001 for the choice task). These results, showing different RTs between the choice and instruction tasks, indicated that the behavioral analysis could not provide evidence that the monkeys made an object choice.

FIGURE 2 | Behavioral performance (monkey 1: 119 days, 178 blocks; monkey 2: 86 days, 129 blocks). (A) Left: Transition of mean optimal choice ratio in the free-choice task after the block change with monkeys 1 and 2. Red and cyan indicate the optimal choice ratio in the current shape and color block, respectively. Blue and green indicate the optimal choice ratios in the previous shape and color blocks, respectively. Right: Mean reaction time (RT) and s.e.m. of the optimal choice trial vs. progress of subblocks (1st, 2nd, 3rd, and 4th) with monkeys 1 and 2. Cyan, red, and green indicate RT when the chosen amount of reward was 4, 3, and 2, respectively. (B) Mean probability of choosing the first target vs. the difference in the amount of reward with monkeys 1 and 2. (C) Mean probability of choosing the first target vs. the difference in reward amount (1st minus 2nd target) and s.e.m. for monkey 1 and monkey 2. (D) Mean RT and s.e.m. when the first target was chosen vs. the difference in the amount of reward (1st minus 2nd target) for monkeys 1 and 2. (E) Mean RT and s.e.m. of the choice task and the instruction task for monkeys 1 and 2. (F) Mean RT and s.e.m. of the choice task and the instruction task for the four levels of reward amount. Statistical tests were performed by Wilcoxon rank sum test with Bonferroni correction. ∗∗∗p < 0.001, ∗∗p < 0.01, <sup>∗</sup>p < 0.05.

### Population Neuronal Activity Evidence for Object Choice During the Choice Cue and Delay Periods

Again, the salient feature of the behavioral task used in this study was that the choice of an object's visual feature (object choice) with the greater amount of reward could be made during the choice cue or delay period, temporally dissociated from movement choice. However, we were unable to confirm this claim by behavioral analysis (**Figures 2E,F**). It is possible that the monkeys did not always need to choose one of the two objects during the choice cue or delay period, since they could choose both object and movement at the onset of the first target by remembering the two objects without making any object choice beforehand. In the first step of our neuronal analysis, we aimed to examine whether the monkeys actually made an object choice during the choice cue or delay period by analyzing the neuronal activities of all recorded neurons (monkey 1, n = 201; monkey 2, n = 174, **Figure 3A**). To this end, we searched for the neuronal representation of the "chosen object" during the choice cue or delay period using mutual information analysis and the bootstrap method. Because the mutual information of the chosen object could be spuriously correlated with that of the choice cue (hereafter called the "offered objects"), we tested whether the mutual information of the chosen object was significantly larger than the surrogate mutual information of the chosen object, in which the information of the chosen object was randomized but the information of the offered object was kept (see section "Materials and Methods" for details). We calculated the summation of mutual information of chosen shape (or color) from all recorded neurons (n = 375) (**Figure 3B**) and performed the statistical test in each of eight successive 0.2-s windows from the onset of the choice cue (threshold for significance: p < 0.05). We found significantly larger information of the chosen shape in the last five windows and of the chosen color in the second, third, and fourth windows than the surrogate information of chosen shape and color, respectively (**Figure 3B** and **Table 1**).
We also checked whether there was mutual information of the offered object in distinction from surrogate mutual information of the offered object in which the information of the offered object was randomized but the information of the chosen object was kept. We found significantly larger information of the offered shape in the second to sixth windows and color in the third window than the surrogate information of offered shape and color, respectively (**Figure 3C** and **Table 1**). The significant representation of the chosen and offered object by population neuronal activities (summation of mutual information from all recorded neurons) suggested the presence of post- and pre-decision signals in the striatum along with evidence that the monkeys actually made an object choice during the choice cue or delay period prior to the movement choice.
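The surrogate comparison described above can be illustrated with a minimal Python sketch for a single neuron. It is not the authors' implementation: the function names, the discretized `responses`, and the choice to randomize chosen-object labels only among trials that share the same offered-object label are assumptions made for illustration. The study itself summed mutual information across all recorded neurons and used the bootstrap procedure given in its Materials and Methods.

```python
import math
import random
from collections import Counter, defaultdict


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def shuffle_within_groups(labels, groups, rng):
    """Permute `labels` only among trials that share the same `groups` value.

    Assumption: this keeps the relation to the grouping variable (e.g., the
    offered objects) while randomizing the label of interest (e.g., the
    chosen object).
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    shuffled = list(labels)
    for idxs in by_group.values():
        vals = [labels[i] for i in idxs]
        rng.shuffle(vals)
        for i, v in zip(idxs, vals):
            shuffled[i] = v
    return shuffled


def surrogate_p_value(responses, chosen, offered, n_surrogates=1000, seed=0):
    """Observed MI(responses; chosen) and the fraction of surrogates at least as large."""
    rng = random.Random(seed)
    observed = mutual_information(responses, chosen)
    count = 0
    for _ in range(n_surrogates):
        surrogate = shuffle_within_groups(chosen, offered, rng)
        if mutual_information(responses, surrogate) >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return observed, (count + 1) / (n_surrogates + 1)
```

Swapping the roles of `chosen` and `offered` in `surrogate_p_value` gives the analogous test for the offered object.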

Many previous studies reported that the striatum represents value signals. In the present task, the association between a visual feature of the object (shape or color) and four levels of reward was changed across blocks (**Figure 1C**), which might enable us to discriminate the "object-visual feature" from the "reward value" associated with the object. However, there was a possibility that the change of association across blocks was not enough to discriminate them, because we recorded neuronal activity across only two or three blocks (two or three changes of the association). Therefore, to check whether the information of the object-visual feature and the reward value could be regarded as different, we compared two surrogate distributions of the summation of mutual information for all recorded neurons (n = 375) (**Figure 4**). One was the surrogate distribution of the information of chosen (or offered) reward value, in which the information of the chosen (or offered) object was randomized but the information of the chosen (or offered) reward value was kept; the other was the distribution of information of the chosen (or offered) object, in which the information of the chosen (or offered) reward value was randomized but the information of the chosen (or offered) object was kept. When the two distributions were compared by ROC analysis for the classification of chosen object and chosen reward value in the 0.2-s windows that showed significance in **Figure 3B**, the AUC for chosen shape was 0.74, 0.76, 0.69, 0.66, and 0.61 (**Figure 4A** upper, p < 0.001 for all five windows, Mann–Whitney U-test for difference between the distribution of AUCs and 0.5; see section "Materials and Methods" for statistics) and the AUC for chosen color was 0.69, 0.85, and 0.78 (**Figure 4A** lower, p < 0.001 for all three windows, Mann–Whitney U-test).
For the classification of offered object and reward value, the AUC for offered shape was 0.81, 0.71, 0.59, and 0.66 (**Figure 4B** upper, p < 0.001 for all four windows, Mann–Whitney U-test), and for offered color it was 0.50 (**Figure 4B** lower, p = 0.115, Mann–Whitney U-test). These results indicated that the information of chosen shape, chosen color, and offered shape was larger than that of value, whereas offered color and value were not discriminated well in the present task and data.
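As a reminder of what an AUC of 0.5 versus, say, 0.8 means when two distributions are compared, the following Python sketch computes the AUC directly from its pairwise definition. It is an illustrative sketch only; the function name and the toy inputs in the usage note are hypothetical and are not data from this study.

```python
def auc(pos, neg):
    """Area under the ROC curve for separating `pos` from `neg`.

    Equals the probability that a value drawn from `pos` exceeds a value
    drawn from `neg`, with ties counted as one half. This is the
    Mann-Whitney U statistic divided by len(pos) * len(neg).
    """
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For two identical samples the function returns 0.5 (no discrimination), and for fully separated samples it returns 1.0; here `pos` would hold the summed mutual information values from one surrogate distribution and `neg` those from the other.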

### Neuronal Representation in Relation to Object Choice at the Single-Neuron Level

As population neuronal activities indicating that the monkeys actually made object choices were confirmed (**Figures 3B,C**), we investigated the neuronal representations of offered object and chosen object during the choice cue or delay period at the single-neuron level. The perievent time histogram (PETH) of the averaged activity of an example neuron aligned at the onset of the choice cue in the choice task (**Figure 5A** left) revealed differential activity according to the shape combination of the offered objects in the 0–0.4-s window from the onset of the choice cue (one-way ANOVA, p = 0.0073). The mutual information of offered shape of this neuron in the same window was significantly larger than that of the surrogate information, in which the information of offered shape was randomized and the information of chosen shape was kept (bootstrap method, p = 0.02; see section "Materials and Methods" for statistics). Here, we defined this type of neuron as an "offered shape-type neuron." This offered shape-type neuron did not show differential activity according to the offered shape in the instruction task (p = 0.35) (**Figure 5A** right). To confirm the distribution of offered object (shape or color)-type neurons in the choice task, the instruction task, or both tasks, we calculated the proportion of offered object-type neurons for the choice task and the instruction task separately. For both offered shape- and color-type neurons, the proportion for both tasks exceeded the chance level expected from the proportions for the choice and instruction tasks separately [χ<sup>2</sup>(1) = 11.945, p < 0.001 for offered color, χ<sup>2</sup>(1) = 6.316, p = 0.012 for offered shape, Chi-squared test, **Figure 5B**]. An example of another neuron (**Figure 5C** left) showed differential activity according to the shape of the chosen object in the 1.2–1.6-s window from the onset of the choice cue in the choice task (one-way ANOVA, p = 0.0068). The mutual information of chosen shape of this neuron in the same window was significantly larger than the surrogate information, in which the information of chosen shape was randomized and the information of offered shape was kept (p = 0.01). We defined this type of neuron as a "chosen shape-type neuron." This chosen shape-type neuron did not show differential activity according to chosen shape in the instruction task (p = 0.56) (**Figure 5C** right). Similar to the offered object-type neurons, we calculated the proportion of chosen object-type neurons for the choice and instruction tasks separately. For chosen shape-type neurons, the proportion for both tasks exceeded the chance level expected from the proportions for the choice and instruction tasks separately, whereas the proportion for chosen color-type neurons did not [χ<sup>2</sup>(1) = 10.133, p = 0.015 for chosen shape, χ<sup>2</sup>(1) = 2.293, p = 0.13 for chosen color, Chi-squared test, **Figure 5D**]. These results indicated the presence of offered and chosen object signals at the single-neuron level during the choice cue or delay period, and these signals were represented by all three types of neurons (choice task, instruction task, and both tasks).

TABLE 1 | P-values for the information of chosen and offered objects related to Figure 3.

In the present task, following object choice the monkey needed to release the button during the first or second target-presenting period (**Figure 1**). The PETH of the averaged activity of an example neuron aligned at the onset of the choice cue (**Figure 6A**) revealed differential activity according to first or second release in the 0.8-s window after onset of the first target (one-way ANOVA, p = 2.2 × 10<sup>−36</sup>). We defined this type of neuron as a "movement-type neuron." Mutual information analysis using the task condition of movement (1st and 2nd release) revealed that the information was not evident during the choice cue and delay period, but it was strongly represented after onset of the first target (**Figure 6B**). We checked the overlap of offered object-type or chosen object-type neurons with movement-type neurons (**Figures 6C,D**). For both offered shape- and color-type neurons, the proportion overlapping with movement-type neurons was around or below the chance level [χ<sup>2</sup>(1) = 19.189, p < 0.001 for offered shape, χ<sup>2</sup>(1) = 0.354, p = 0.552 for offered color, Chi-squared test, **Figure 6C**]. For chosen shape- or color-type neurons, similar to the offered-type neurons, the proportion of overlap with movement-type neurons was around or below the chance level [χ<sup>2</sup>(1) = 1.132, p = 0.843 for chosen shape, χ<sup>2</sup>(1) = 4.772, p = 0.0289 for chosen color, Chi-squared test, **Figure 6D**]. These results indicated that the object and movement signals were represented by separate neurons.
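The overlap tests above compare an observed 2 × 2 table of neuron counts with the table expected if the two properties were independent. The following Python sketch shows one common form of that test. It assumes a Pearson chi-squared statistic without continuity correction, which the text does not specify, and the function names and example counts are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the
    2 x 2 table [[a, b], [c, d]], e.g. a = both properties, b = object only,
    c = movement only, d = neither (hypothetical layout)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi2_sf_df1(x):
    """Upper-tail p-value of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical counts, for illustration only (not data from this study)
statistic = chi2_2x2(20, 5, 5, 20)
p_value = chi2_sf_df1(statistic)
```

With one degree of freedom, the p-value follows from the statistic alone, so `chi2_sf_df1` can also be used to check a reported pair of values for consistency.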

#### DISCUSSION

To study neuronal representation in relation to object choice, which does not include physical action, in the striatum, we designed a behavioral task in which object choice could be temporally dissociated from movement choice, and trained two monkeys in the task (**Figures 1**, **2**). We recorded 375 striatal PANs from the two monkeys (**Figure 3A**). We calculated the mutual information using the task condition of the chosen object for all recorded neurons and performed statistical tests using the bootstrap method, and found that population striatal activities represented the information of the chosen object in distinction from the offered object during the choice cue and delay period, which indicated that the monkeys actually made an object choice during the task (**Figure 3B** and **Table 1**). We also found the neuronal representation of the offered object in distinction from the chosen object during the period (**Figure 3C** and **Table 1**). For the activity of individual neurons, we investigated the neuronal representations of the offered object and chosen object and identified offered object- and chosen object-type neurons (**Figure 5**). Furthermore, we also identified movement-type neurons that discriminated between the first and second release during the first target-presenting period (**Figures 6A,B**). Most offered object- or chosen object-type neurons did not overlap with movement-type neurons (**Figures 6C,D**). These findings suggested the presence of object choice-related signals in the striatum and that these signals were represented by neurons other than those related to movement.

Previous studies investigated the involvement of the striatum in action choice using behavioral tasks in which the alternatives for choice included both motor and non-motor factors simultaneously, e.g., alternatives predicting reward values and motor direction (Takikawa et al., 2002; Samejima et al., 2005; Lau and Glimcher, 2008; Tai et al., 2012). Although some studies have examined the neuronal activity of the striatum in relation to reward expectation without the motor aspect (Lauwereyns et al., 2002; Cromwell and Schultz, 2003), these behavioral tasks did not include choices of alternatives. A unique feature of this study is that object choice (choice of a visual feature) could be made during the choice cue or the delay periods, which was temporally dissociated from movement choice. Furthermore, in this task, because two objects were presented in a 2 × 2 spatial arrangement in the four corners, spatial information of the two objects was hashed, which means that the object choice could not be made for a specific spatial position. This is the first study to reveal the neuronal representations in the striatum in relation to object choice by designing and adopting a behavioral task in which the period used to make an object choice is explicitly extracted.

We were unable to confirm through behavioral analysis that the monkeys actually made an object choice (**Figures 2E,F**). However, in the neuronal analysis, we found that the neuronal representation of the chosen object was distinct from that of the offered object during the choice cue and delay period (**Figure 3B**), which indicated that an object choice was made. We also found the neuronal representation of the offered object during the period (**Figure 3C**). These representations of chosen and offered objects were regarded as post- and pre-decision signals without physical action, respectively. In fact, chosen shape and offered shape in **Figure 3** showed a dynamically significant representation in the order of the decision process (from pre- to post-decision signals). For the color representation, we were unable to explain the temporal dynamics in the same way as for shape information. Further research is required to reveal how different signals, such as shape, color, and offered and chosen information, are temporally represented and related to each other.

In the present study, we found the neuronal representations of the object-visual feature (chosen shape, color, and offered shape) rather than that of the reward value (**Figure 4**). Although this seems paradoxical in comparison with previous studies reporting value-related signals in the striatum, some previous studies (Samejima et al., 2005; Lau and Glimcher, 2008) have reported that, in addition to value-type neurons, there are many non-value neurons that show a differential response according to movement direction when animals make a decision. Although the present and previous tasks differ in whether the alternatives for choice include physical action, the non-value neurons in previous and present studies could be interpreted as the same type of neurons that represent the option signal without value. Therefore, the results of neuronal representation for the object-visual feature in this study are consistent with those of previous studies.

FIGURE 6 | Neuronal representation in relation to movement choice. (A) Example of time course of averaged single neuronal activity representing movement choice (0–0.8-s window from onset of the first target; one-way ANOVA, p = 2.2 × 10<sup>−36</sup>). First and second gray shadows indicate cue presentation period (0.4 s) and onset of the first target following a variable cue delay period (0.8–1.2 s), respectively. (B) Time course of mutual information of movement for all recorded neurons (n = 375) aligned at onset of choice cue. (C) Overlap of offered shape- (left) and color- (right) type neurons with movement-type neurons among all recorded neurons (n = 375). Green, orange, and blue indicate significant differences (p < 0.05) in both offered shape (or color) and movement, offered shape (or color) only, and movement only, respectively. (D) Same as (C) but for chosen object-type neurons.

Anatomically, the striatum has inputs from various cerebral cortical areas, including the prefrontal, higher-order motor, and primary motor cortex, and it returns these inputs via the thalamus largely in parallel (Alexander et al., 1986). A conceptual model has been proposed, in which the prefrontal and motor loops are involved in object and movement choice, respectively (Samejima and Doya, 2007). In addition, afferents from different functional cortical regions partially converge on the striatum (Yeterian and Van Hoesen, 1978; Selemon and Goldman-Rakic, 1985), and it is proposed that this convergence plays a role in integrating information across reward, cognitive, and motor functions (Haber, 2016). Several studies of primate electrophysiology have suggested that the OFC and the SEF play an important role in reward-based action choice without the motor aspect (Padoa-Schioppa and Assad, 2006; Padoa-Schioppa, 2009; So and Stuphorn, 2010; Cai and Padoa-Schioppa, 2014; Chen and Stuphorn, 2015). Non-spatial visual information about color or shape is represented in the prefrontal cortex (Divac et al., 1967; Levy et al., 1997; Sakagami and Tsutsui, 1999; Lauwereyns et al., 2001). The anterior caudate receives input mainly from the prefrontal cortex, including the dorsolateral prefrontal cortex, the OFC, the anterior cingulate cortex, and the SEF (Yeterian and Van Hoesen, 1978; Selemon and Goldman-Rakic, 1985; Alexander et al., 1986; Haber, 2016). Considering the anatomical connections from the prefrontal cortex to the anterior caudate and its neuronal representation, including the present results, object choice could be made through the prefrontal loop, including the anterior caudate and the prefrontal cortex, by using information about the non-spatial value and attributes of the object. However, human functional magnetic resonance imaging (fMRI) studies have reported the presence of object choice signals in the ventromedial prefrontal cortex (Wunderlich et al., 2010; Hare et al., 2011), which projects mainly to the ventral striatum. The neuronal representations for object choice in each striatal subarea need to be investigated.
On the other hand, for movement choice, several studies have suggested that the premotor cortex plays an important role (Schieber, 2000; Lauwereyns et al., 2002; Nakayama et al., 2008; Thura and Cisek, 2014). The present study revealed the presence of movement-related signals (**Figure 6**). Considering the anatomical connections, these movement-related signals might be processed within the premotor loop. It will be necessary to classify the distribution of neuronal representation in relation to object choice and movement choice based on the striatal subregion.

Taken together, the investigation of object choice has so far concentrated on the cortex. Our results reveal the neuronal representation in relation to object choice in the striatum and show the importance of cortico-basal ganglia circuits in decision-making.

## DATA AVAILABILITY STATEMENT

All data that support the findings of this study are available from the Lead Contact (satoshi.nonomura@gmail.com) upon reasonable request.

### ETHICS STATEMENT

All experiments were approved by the Animal Research Ethics Committee of Tamagawa University (animal experiment protocol H21/27-14) and were carried out in accordance with the Fundamental Guidelines for Proper Conduct of Animal Experiments and Related Activities in Academic Research Institutions [Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan] and the Guidelines for Animal Experimentation in Neuroscience (Japan Neuroscience Society). All surgical procedures were performed under appropriate anesthesia, and all efforts were made to minimize suffering. Our procedures for primate animal experiments were established in previous studies at Tamagawa University (Nakayama et al., 2008; Yamagata et al., 2009; Hashimoto et al., 2010; Saga et al., 2011; Arimura et al., 2013).

### AUTHOR CONTRIBUTIONS


SN and KS designed and performed the research, analyzed the data, and wrote the manuscript.

#### FUNDING

This work was supported by JSPS KAKENHI Grant Number JP12J56654 and MEXT KAKENHI Grant Numbers JP20680020, JP22120514, JP24120716, JP16H01725, JP19K16300, and JP19H04988.

#### ACKNOWLEDGMENTS

We thank E. Hoshi, J. Tanji, K. Doya, and members of the Samejima and Hoshi laboratories for general discussion and technical support; Y. Sakai and K. Mitani for help with statistical analysis; and K. Hamatani for administrative support.

#### REFERENCES

choice. Proc. Natl. Acad. Sci. U.S.A. 108, 18120–18125. doi: 10.1073/pnas.1109322108



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Nonomura and Samejima. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Phase-Dependent Response to Afferent Stimulation During Fictive Locomotion: A Computational Modeling Study

Soichiro Fujiki<sup>1</sup>\*, Shinya Aoi<sup>2</sup>, Kazuo Tsuchiya<sup>2</sup>, Simon M. Danner<sup>3</sup>, Ilya A. Rybak<sup>3</sup> and Dai Yanagihara<sup>4</sup>

*<sup>1</sup> Department of Physiology and Biological Information, Dokkyo Medical University School of Medicine, Mibu, Japan,*

*<sup>2</sup> Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University, Kyoto, Japan,*

*<sup>3</sup> Department of Neurobiology and Anatomy, Drexel University College of Medicine, Philadelphia, PA, United States,*

*<sup>4</sup> Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan*

#### Edited by:

*Kenway Louie, New York University, United States*

#### Reviewed by:

*Maria Knikou, The City University of New York (CUNY), United States Winfried Mayr, Medical University of Vienna, Austria*

> \*Correspondence: *Soichiro Fujiki fujiki@dokkyomed.ac.jp*

#### Specialty section:

*This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience*

Received: *28 June 2019* Accepted: *14 November 2019* Published: *29 November 2019*

#### Citation:

*Fujiki S, Aoi S, Tsuchiya K, Danner SM, Rybak IA and Yanagihara D (2019) Phase-Dependent Response to Afferent Stimulation During Fictive Locomotion: A Computational Modeling Study. Front. Neurosci. 13:1288. doi: 10.3389/fnins.2019.01288*

Central pattern generators (CPGs) in the spinal cord generate rhythmic neural activity and control locomotion in vertebrates. These CPGs operate under the control of sensory feedback that affects the generated locomotor pattern and adapts it to the animal's biomechanics and environment. Studies of the effects of afferent stimulation on fictive locomotion in immobilized cats have shown that brief stimulation of peripheral nerves can reset the ongoing locomotor rhythm. Depending on the phase of stimulation and the stimulated nerve, the applied stimulation can either shorten or prolong the current locomotor phase and the locomotor cycle. Here, we used a mathematical model of a half-center CPG to investigate the phase-dependent effects of brief stimulation applied to the CPG on the CPG-generated locomotor oscillations. The CPG in the model consisted of two half-centers mutually inhibiting each other. The rhythmic activity in each half-center was based on a slowly inactivating, persistent sodium current. Brief stimulation was applied to the CPG half-centers in different phases of the locomotor cycle to produce phase-dependent changes in CPG activity. The model reproduced several results from experiments on the effect of afferent stimulation on fictive locomotion in cats. The mechanisms of locomotor rhythm resetting under different conditions were analyzed using dynamic systems theory methods.

Keywords: central pattern generator, half-center CPG, afferent control of CPG, phase-dependent response, dynamic structure

### INTRODUCTION

The mammalian spinal cord contains neuronal circuitry that can generate a basic locomotor rhythm and produce the alternating flexor and extensor motoneuron activities underlying locomotion. Although this locomotor central pattern generator (CPG) can operate in the absence of sensory feedback (reviewed by Grillner, 1981; Rossignol, 1996; Orlovsky et al., 1999; Rossignol et al., 2006), afferent feedback plays a crucial role in adjusting the locomotor pattern to the motor task, the environment, and the biomechanical characteristics of the limbs and body (e.g., Pearson, 2004; Rossignol et al., 2006). Continuous electrical stimulation of the midbrain locomotor region in an immobilized decerebrate cat produces "fictive locomotion" consisting of rhythmic alternating activation of flexor and extensor motoneurons similar to that occurring during normal locomotion in an intact animal (see Rossignol, 1996). To investigate the effects of afferent inputs on the locomotor pattern generated by the CPG and step cycle timing, researchers often use the fictive locomotor preparations while applying stimulation to flexor or extensor sensory afferents (e.g., Guertin et al., 1995; Perreault et al., 1995; McCrea, 2001; Stecina et al., 2005). These studies revealed that in many cases, afferent stimulation can delay or accelerate the phase transition within the ongoing step cycle with or without changing the timing of the subsequent step cycles (Rybak et al., 2006b; McCrea and Rybak, 2007).

Although the anatomical structure of the CPG circuit remains unclear, the use of relatively simple mathematical models of CPGs allows the study of the general effects of afferent stimulation on CPG operation from a dynamic viewpoint. In particular, half-center type CPG models were previously used to reproduce some effects of sensory afferent stimulation on fictive locomotor pattern in cats (Rybak et al., 2006b).

The goal of the present study was to further investigate the mechanism for the phase-dependent response of the locomotor pattern during fictive locomotion using a simplified half-center CPG model. Specifically, we applied stimulation to the CPG model in different phases of the locomotor cycle and examined how the temporal activity of the CPG changed. The use of a relatively simple CPG model allowed us to apply the dynamic system methods and perform mathematical analysis to fully characterize the phase-dependent responses of the CPG to applied stimulation.

#### METHODS

#### Model

It has been suggested that the rhythmic pattern of the CPG activity is determined in the rhythm generator (RG) network of the CPG (Rybak et al., 2006a,b). In the present study, the model (**Figure 1**) consisted of two neuron populations representing RG centers (flexor RG-F and extensor RG-E) and two populations of inhibitory interneurons (In-F, In-E), providing mutual inhibition between the flexor and extensor centers. Each population was described as an activity-based (non-spiking) neuron model (Ermentrout, 1994; Markin et al., 2010; Molkov et al., 2015; Danner et al., 2016, 2017). The state of each neuron was characterized by the membrane potential $V_i$ ($i$ = F, E, IF, IE), where the indices F and E are used for the RG-F and RG-E neurons, respectively, and the indices IF and IE are used for the In-F and In-E neurons, respectively. The RG-F and RG-E neurons incorporated a persistent (slowly inactivating) sodium current that defined intrinsic rhythmogenic properties of these neurons. The intrinsic oscillation in each RG neuron depended on the variable $h_i$ ($i$ = F, E) that defined slow inactivation of the persistent sodium channels. Each RG center could produce rhythmic activities; however, if uncoupled, the extensor center was in the tonic regime due to a supraspinal drive and produced sustained activity. Rhythmic oscillations of the RG were defined by the flexor center, which provided rhythmic inhibition of the extensor center through In-F. The supraspinal drive to the flexor center determined the oscillation frequency. Synaptic interactions between all neurons in the model are shown in **Figure 1**. For the state variables of this model, we used $\mathbf{V} = [V_{\text{F}}, V_{\text{E}}, V_{\text{IF}}, V_{\text{IE}}]^{\text{T}}$ and $\mathbf{h} = [h_{\text{F}}, h_{\text{E}}]^{\text{T}}$.

The dynamics of the membrane potential $V_i$ of the RG neurons ($i$ = F, E) and the interneurons ($i$ = IF, IE) is described as

$$C\dot{V}_i = \begin{cases} -I_{\text{NaP}}\left(V_i, h_i\right) - I_{\text{Leak}}\left(V_i\right) - I_{\text{SynE}}^{i}\left(\mathbf{V}\right) - I_{\text{SynI}}^{i}\left(\mathbf{V}\right) & i = \text{F, E} \\ -I_{\text{Leak}}\left(V_i\right) - I_{\text{SynE}}^{i}\left(\mathbf{V}\right) - I_{\text{SynI}}^{i}\left(\mathbf{V}\right) & i = \text{IF, IE} \end{cases} \tag{1}$$

where $C$ is the membrane capacitance, $I_{\text{NaP}}$ is the persistent sodium current, $I_{\text{Leak}}$ is the leak current, and $I_{\text{SynE}}^{i}$ and $I_{\text{SynI}}^{i}$ are the currents through excitatory and inhibitory synapses, respectively. The ionic current $I_{\text{NaP}}$ and leak current $I_{\text{Leak}}$ are described as

$$\begin{aligned} I_{\text{NaP}}\left(V_i, h_i\right) &= \hat{g}_{\text{NaP}}\, m_{\text{NaP}}\left(V_i\right) h_i \left(V_i - E_{\text{Na}}\right) \qquad i = \text{F, E} \\ I_{\text{Leak}}\left(V_i\right) &= \begin{cases} \hat{g}_{\text{Leak}}^{\text{RG}} \left(V_i - E_{\text{Leak}}^{\text{RG}}\right) & i = \text{F, E} \\ \hat{g}_{\text{Leak}}^{\text{InRG}} \left(V_i - E_{\text{Leak}}^{\text{InRG}}\right) & i = \text{IF, IE} \end{cases} \end{aligned} \tag{2}$$

where $\hat{g}_{\text{NaP}}$, $\hat{g}_{\text{Leak}}^{\text{RG}}$, and $\hat{g}_{\text{Leak}}^{\text{InRG}}$ are the maximum conductances of the corresponding currents, and $E_{\text{Na}}$, $E_{\text{Leak}}^{\text{RG}}$, and $E_{\text{Leak}}^{\text{InRG}}$ are the reversal potentials of the corresponding currents. In addition, $m_{\text{NaP}}$ is the activation of the sodium channel of the RG neurons and is described as

$$m_{\text{NaP}}\left(V_i\right) = \frac{1}{1 + \exp\left(-\frac{V_i + 40.0}{6.0}\right)} \qquad i = \text{F, E} \tag{3}$$

The dynamics of the inactivation of the sodium channel $h_i$ of the RG neurons ($i$ = F, E) is given by

$$\tau\left(V_i\right)\dot{h}_i = h_{\infty}\left(V_i\right) - h_i \qquad i = \text{F, E} \tag{4}$$

where

$$\begin{aligned} h_{\infty}\left(V_i\right) &= \frac{1}{1 + \exp\left(\frac{V_i + 45.0}{4.0}\right)} \\ \tau\left(V_i\right) &= 320 + \frac{320}{\cosh\left(\frac{V_i + 35.0}{15.0}\right)}\ \text{ms} \end{aligned} \qquad i = \text{F, E} \tag{5}$$
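The voltage-dependent functions in Equations (3) and (5) use only constants given in the text, so they can be evaluated directly. The Python sketch below does so; the function names are chosen here for illustration, and it assumes that membrane potentials are expressed in mV.

```python
import math


def m_nap(v):
    """Steady-state activation of the persistent sodium current, Equation (3)."""
    return 1.0 / (1.0 + math.exp(-(v + 40.0) / 6.0))


def h_inf(v):
    """Steady-state inactivation of the persistent sodium current, Equation (5)."""
    return 1.0 / (1.0 + math.exp((v + 45.0) / 4.0))


def tau_h(v):
    """Inactivation time constant in ms, Equation (5)."""
    return 320.0 + 320.0 / math.cosh((v + 35.0) / 15.0)


# Tabulate the three functions over a range of potentials (assumed to be in mV)
for v in range(-70, -19, 10):
    print(f"V = {v:4d}  m_NaP = {m_nap(v):.3f}  h_inf = {h_inf(v):.3f}  tau = {tau_h(v):.1f} ms")
```

Activation is half-maximal at −40 and inactivation at −45, and the time constant peaks at 640 ms at −35 and approaches 320 ms far from that potential. The time constant is therefore several hundred milliseconds at all potentials, which is what makes $h_i$ the slow variable of each RG neuron.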

The currents generated by the synapses, $I_{\text{SynE}}^{i}$ and $I_{\text{SynI}}^{i}$, are given by

$$\begin{aligned} I_{\text{SynE}}^{i}\left(\mathbf{V}\right) &= \hat{g}_{\text{SynE}} \left(V_i - E_{\text{SynE}}\right) \left\{ \sum_{j \in \{\text{F, E, IF, IE}\}} a_{ij} f\left(V_j\right) + c_i d + w_i s_i \right\} \\ I_{\text{SynI}}^{i}\left(\mathbf{V}\right) &= \hat{g}_{\text{SynI}} \left(V_i - E_{\text{SynI}}\right) \left\{ \sum_{j \in \{\text{F, E, IF, IE}\}} b_{ij} f\left(V_j\right) \right\} \end{aligned} \qquad i = \text{F, E, IF, IE} \tag{6}$$

FIGURE 1 | Model schematic of the rhythm generator (RG) network and afferent inputs. The RG network is composed of flexor (RG-F) and extensor (RG-E) centers inhibiting each other via inhibitory interneurons In-F and In-E, respectively. The supraspinal drive provides excitation to the RG-F and RG-E neurons defining the frequency of oscillations. Sensory afferents can synaptically excite both RG neurons and inhibitory interneurons.

where $\hat{g}_{\text{SynE}}$ and $\hat{g}_{\text{SynI}}$ are the maximum conductances of the corresponding currents, $E_{\text{SynE}}$ and $E_{\text{SynI}}$ are the reversal potentials of the corresponding currents, $d$ is the tonic drive from the supraspinal region, $s_i$ ($i$ = F, E, IF, IE) is the feedback input from sensory fibers, and $a_{ij}$, $b_{ij}$, $c_i$, and $w_i$ ($i, j$ = F, E, IF, IE) are the weight coefficients. Moreover, the output function $f$ translates $V$ into the integrated population activity and is given by

$$f\left(V\_{i}\right) = \begin{cases} 0 & V\_{i} < V\_{\text{th}}\\ V\_{i} - V\_{\text{th}} & V\_{\text{th}} \le V\_{i} < V\_{\text{max}} \\ 1 & V\_{i} \ge V\_{\text{max}} \end{cases} \qquad i = \text{F, E, IF, IE} \tag{7}$$

where V<sub>th</sub> and V<sub>max</sub> are the lower and upper threshold potentials, respectively. Equations (1), (2), and (4) were solved numerically using the fourth-order Runge-Kutta method with a step size of 0.01 ms. The parameter values are shown in **Appendix A**.
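To illustrate the integration scheme, the sketch below applies a classical fourth-order Runge-Kutta step with the 0.01 ms step size to Equation (4) alone. It is a simplified illustration, not the full model: the membrane potential is assumed to be clamped at a constant value (in the model it evolves according to Equations (1) and (2)), and the function names and the clamp value are hypothetical.

```python
import math

def h_inf(v):
    """Steady-state inactivation, Eq. (5)."""
    return 1.0 / (1.0 + math.exp((v + 45.0) / 4.0))

def tau_h(v):
    """Inactivation time constant in ms, Eq. (5)."""
    return 320.0 + 320.0 / math.cosh((v + 35.0) / 15.0)

def dh_dt(h, v):
    """Eq. (4) solved for the time derivative of h."""
    return (h_inf(v) - h) / tau_h(v)

def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate_h(h0, v_clamp, t_end, dt=0.01):
    """Integrate h from 0 to t_end (ms) with the voltage held at v_clamp (assumption)."""
    h = h0
    for _ in range(int(round(t_end / dt))):
        h = rk4_step(lambda y: dh_dt(y, v_clamp), h, dt)
    return h

# Hypothetical example: relaxation of h over 100 ms at a clamped -50 mV.
print(integrate_h(h0=0.2, v_clamp=-50.0, t_end=100.0))
```

Because Equation (4) is linear in h when V is fixed, the result can be compared against the closed-form relaxation h(t) = h<sub>∞</sub> + (h<sub>0</sub> − h<sub>∞</sub>) exp(−t/τ), which makes this a convenient sanity check before coupling in the voltage equations.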

#### Modeling the Effects of Phase-Dependent Afferent Stimulation

The CPG model produced rhythmic activity and exhibited stable oscillations, as shown in **Figure 2**. The active phase for each neuron was defined as the time interval during which the neuron's potential was higher than Vth and the silent phase as the time interval when the potential was lower than Vth. The cycle period T was defined as the time interval between two consecutive onsets of the active phase. The phase of oscillation was defined as φ = 2πt/T ∈ [0, 2π).
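These definitions translate into a short post-processing routine for a simulated voltage trace. The following is a minimal sketch under stated assumptions: the helper names are illustrative, onsets are located by linear interpolation between samples, and t in φ = 2πt/T is taken to be measured from the most recent active-phase onset.

```python
import math

def active_phase_onsets(times, voltages, v_th):
    """Times at which the voltage rises above v_th (start of an active phase)."""
    onsets = []
    for k in range(1, len(times)):
        v0, v1 = voltages[k - 1], voltages[k]
        if v0 <= v_th < v1:
            # Linear interpolation of the crossing time between two samples.
            frac = (v_th - v0) / (v1 - v0)
            onsets.append(times[k - 1] + frac * (times[k] - times[k - 1]))
    return onsets

def cycle_period(onsets):
    """Mean interval between consecutive active-phase onsets."""
    if len(onsets) < 2:
        raise ValueError("need at least two onsets to define a period")
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    return sum(intervals) / len(intervals)

def phase(t, last_onset, period):
    """Phase in [0, 2*pi), with t measured from the last onset (assumption)."""
    return (2.0 * math.pi * (t - last_onset) / period) % (2.0 * math.pi)
```

On a stable oscillation the intervals between onsets are all equal, so the mean returned by `cycle_period` coincides with the interval between any two consecutive onsets.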

The CPG also received external ("sensory") signals (**Figure 1**). Based on a previous study (Demir et al., 1997), which investigated the response of a single neuron model to stimulations, we used depolarizing stimuli applied at different phases of oscillatory activity. Specifically, after oscillation stabilized, we applied a 200 ms stimulus to the flexor (RG-F and In-F) or extensor (RG-E and In-E) neurons. The intensity of stimulation was s<sub>F</sub> = s<sub>IF</sub> = 0.2 and s<sub>E</sub> = s<sub>IE</sub> = 0.0 for the flexor side and s<sub>F</sub> = s<sub>IF</sub> = 0.0 and s<sub>E</sub> = s<sub>IE</sub> = 0.2 for the extensor side in Equation (6). Suppose that the neuron activity is perturbed by the stimulation at phase φ<sub>s</sub> ∈ [0, 2π) and the period changes from T to α(φ<sub>s</sub>), as shown in **Figure 2**. To show the phase shift of the neuron activity in response to the stimulation, we define

$$
\Delta\left(\phi\_{\rm s}\right) = 2\pi \frac{\alpha\left(\phi\_{\rm s}\right) - T}{T} \tag{8}
$$
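Equation (8) is a one-line computation once the unperturbed and perturbed periods are known. The sketch below is illustrative only: the function name and the example periods are hypothetical. Under this sign convention a positive Δ means the perturbed cycle is longer (a delay) and a negative Δ means it is shorter (an advance).

```python
import math

def phase_shift(perturbed_period, period):
    """Phase shift of Eq. (8): 2*pi*(alpha - T)/T, in radians."""
    return 2.0 * math.pi * (perturbed_period - period) / period

# Hypothetical numbers: an unperturbed period of 1000 ms shortened to 900 ms
# by the stimulus gives a negative shift, i.e. a phase advance.
print(phase_shift(900.0, 1000.0))
```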

#### Calculation of Nullcline

The nullcline is a set of points at which the derivative of a differential equation is equal to zero. It reflects the structure of the solution of the differential equation. To investigate the mechanism of the phase-dependent response of the CPG model, we used a nullcline-based method. The state variable of the CPG model is given by (**V**, **h**). The nullclines for the RG neurons are given by

$$\begin{array}{l} \text{N}\_{i}^{\text{V}} = \left\{ (V, h) \mid \dot{V}\_{i} = 0 \right\} \\ \text{N}\_{i}^{\text{h}} = \left\{ (V, h) \mid \dot{h}\_{i} = 0 \right\} \quad i = \text{ F, E} \end{array} \tag{9}$$

To clarify the dynamics of each RG neuron, we focused on the V<sub>i</sub>-h<sub>i</sub> plane (i = F, E) for the nullclines by assuming that the other variables V<sub>j</sub> (j = F, E, IF, IE, j ≠ i) and h<sub>k</sub> (k = F, E, k ≠ i) are on the stable oscillation with phase φ. Therefore, we modify N<sup>V</sup><sub>i</sub> and N<sup>h</sup><sub>i</sub> in Equation (9) as

$$\begin{aligned} \hat{\mathcal{N}}\_i^V(\phi) &= \left\{ \left( V\_i, h\_i \right) \mid \dot{V}\_i = 0, \ V\_j = V\_j \left( \phi \right), h\_k = h\_k(\phi) \right\} \\ \hat{\mathcal{N}}\_i^h(\phi) &= \left\{ \left( V\_i, h\_i \right) \mid \dot{h}\_i = 0, \ V\_j = V\_j \left( \phi \right), h\_k = h\_k(\phi) \right\} \\ i &= \text{F, E} \ j = \text{F, E, IF, IE} \ j \neq i \ k = \text{F, E} \ k \neq i \end{aligned} \tag{10}$$

For N̂<sup>V</sup><sub>i</sub>(φ) and N̂<sup>h</sup><sub>i</sub>(φ), we can write h<sub>i</sub> = h<sub>i</sub>(V<sub>i</sub>; φ), as explained in **Appendix B**.
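For the h-nullcline, this reduction can be read off directly from Equations (4) and (5). Since τ(V<sub>i</sub>) is strictly positive, setting the time derivative of h<sub>i</sub> to zero gives

$$\dot{h}\_i = 0 \;\Longrightarrow\; h\_i = h\_{\infty}\left(V\_i\right) = \frac{1}{1 + \exp\left(\frac{V\_i + 45.0}{4.0}\right)} \qquad i = \text{F, E}$$

Because neither V<sub>j</sub> nor h<sub>k</sub> appears in this expression, the h-nullcline is the same sigmoid for both RG neurons and does not move with φ. Any dependence of the nullcline structure on φ therefore has to enter through the V-nullcline, whose explicit form relies on Equation (1) and the synaptic terms of Equation (6).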

#### RESULTS

#### Phase-Dependent Response

… decreased the cycle period in (C). Both stimulations produced phase shifts.

**Figure 3A** shows the phase shift Δ of the RG-F neuron activity after stimulation of sensory inputs on the flexor side at φ<sub>s</sub>. When stimulation was applied during the silent phase of RG-F (2.51 ≤ φ<sub>s</sub> < 2π), it caused the transition to the active phase to occur earlier, and this advance decreased with φ<sub>s</sub>. In contrast, almost no phase shift occurred when stimulation was applied at the beginning of the active phase of RG-F (0 ≤ φ<sub>s</sub> < 1.00). However, the neuron activity was delayed by stimulation during the middle and end of the active phase (1.00 ≤ φ<sub>s</sub> < 2.51). These trends were similar to those observed during fictive locomotion in cats (Schomburg et al., 1998; Frigon et al., 2010), as shown in **Figure 3B**. **Figure 3C** shows Δ of the RG-F neuron activity in response to stimulation of the sensory fibers on the extensor side. The active and silent phases of the RG-F neuron correspond to the silent and active phases, respectively, of the RG-E neuron. The neuron activity was advanced at the middle of the silent phase of the RG-E neuron and was delayed at the end of the active phase of the RG-E neuron. The response to stimulation of the extensor side was qualitatively similar to that of the flexor side. Moreover, these trends were similar to those seen in animal experiments (Schomburg et al., 1998; Frigon et al., 2010). The effects of the stimulation duration and intensity are further investigated in **Figure S1** in Appendix C.

### Analysis on Nullclines

Even though the oscillatory behavior of the RG-E neuron was similar to that of the RG-F neuron, as shown in **Figure 2**, the oscillating mechanism was different due to different nullclines, as suggested in previous studies (Spardy et al., 2011a,b; Molkov et al., 2015). To understand this mechanism, we briefly explain the roles of nullclines in our neuron model. **Figure 4A** shows the nullclines N̂<sup>V</sup><sub>F</sub>, N̂<sup>V</sup><sub>E</sub>, N̂<sup>h</sup><sub>F</sub>, and N̂<sup>h</sup><sub>E</sub> with the vector field for the case without synaptic connections from other neurons. While N̂<sup>h</sup><sub>F</sub> and N̂<sup>h</sup><sub>E</sub> are identical and have a sigmoid shape, N̂<sup>V</sup><sub>F</sub> and N̂<sup>V</sup><sub>E</sub> have different cubic curves. In particular, while N̂<sup>V</sup><sub>F</sub> has two distinct inflection points and the sign of the slope changes at the inflection points, N̂<sup>V</sup><sub>E</sub> changes monotonically. Because the two eigenvalues at the intersection of N̂<sup>V</sup><sub>F</sub> and N̂<sup>h</sup><sub>F</sub> are positive and negative, the intersection is a saddle, which induces a limit cycle (orange orbit) due to the following three characteristics: (1) the trajectory approaches N̂<sup>V</sup><sub>F</sub>, especially its branches with positive slope, due to the difference of the time constants between the dynamics of V and h; (2) the trajectory close to the positive branches moves along them until reaching the inflection points; and (3) the trajectory jumps to the opposite positive branch at the inflection points. In contrast to the case for the RG-F neuron, the two eigenvalues at the intersection of N̂<sup>V</sup><sub>E</sub> and N̂<sup>h</sup><sub>E</sub> are both negative and the intersection is a stable node. The trajectory is attracted to this node and stays there as long as the node exists. Therefore, the RG-E neuron does not show any oscillatory behavior.
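The saddle versus stable-node distinction used above depends only on the signs of the two eigenvalues of the Jacobian of (V̇<sub>i</sub>, ḣ<sub>i</sub>) evaluated at the nullcline intersection. The sketch below shows that classification for a generic 2 × 2 matrix. It is an illustration under assumptions: the Jacobian entries would have to be computed from Equations (1) and (4), which is not done here, and the numerical entries in the example calls are hypothetical.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the trace and determinant."""
    tr = a + d
    det = a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0

def classify_fixed_point(a, b, c, d, tol=1e-12):
    """Classify an equilibrium of a planar system from its Jacobian entries."""
    l1, l2 = eigenvalues_2x2(a, b, c, d)
    if abs(l1.imag) > tol:
        # Complex-conjugate pair: rotation, stability set by the real part.
        if l1.real < -tol:
            return "stable focus"
        if l1.real > tol:
            return "unstable focus"
        return "center"
    r1, r2 = l1.real, l2.real
    if r1 * r2 < 0.0:
        return "saddle"          # one positive and one negative eigenvalue
    if r1 < 0.0 and r2 < 0.0:
        return "stable node"     # both eigenvalues negative
    if r1 > 0.0 and r2 > 0.0:
        return "unstable node"
    return "degenerate"

print(classify_fixed_point(1.0, 0.0, 0.0, -1.0))   # eigenvalues +1 and -1
print(classify_fixed_point(-1.0, 0.0, 0.0, -2.0))  # eigenvalues -1 and -2
```

The first call has one positive and one negative eigenvalue and is reported as a saddle; the second has two negative eigenvalues and is reported as a stable node, matching the open and filled circles described for the two RG neurons.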

The synaptic connections from other neurons change N̂<sup>V</sup><sub>F</sub> and N̂<sup>V</sup><sub>E</sub>, as shown schematically in **Figure 4B**, so that both RG-F and RG-E neurons show oscillatory behavior. On the one hand, although the intersection of N̂<sup>V</sup><sub>F</sub> and N̂<sup>h</sup><sub>F</sub> temporarily forms a stable node, it remains close to the saddle point (burst mode), which produces an oscillatory behavior. On the other hand, while the intersection of N̂<sup>V</sup><sub>E</sub> and N̂<sup>h</sup><sub>E</sub> remains stable, N̂<sup>V</sup><sub>E</sub> transitions between two positions due to an inhibitory signal from the contralateral side, one of which has a high V at the intersection (tonic mode) and the other of which has a low V (silence mode). These transitions produce an oscillatory behavior. **Figure 4C** shows the details of our model at φ = 0, 0.89, 1.78, 2.68, 3.88, and 5.08 rad to show how the nullclines changed during one cycle.

#### Shortening of Activity Duration During Silent Phase

Next, we investigated the mechanism for the phase-dependent response during the silent phase. **Figure 5** shows the responses on the V<sub>F</sub>-h<sub>F</sub> plane to stimulation of the flexor side at φ<sub>s</sub> = 3.77, 5.03, and 5.53 rad. The disturbed trajectories took a shortcut to the limit cycle at different positions depending on φ<sub>s</sub>, which decreased the activity duration and advanced the neural activity. As shown in Equations (1), (4), and (6), while stimulation directly influences the membrane potential V<sub>i</sub> (i = F, E), it does not influence the inactivation of the sodium channel h<sub>i</sub> (i = F, E). Therefore, a shortcut was produced in the direction of V<sub>i</sub>. Moreover, for the same reason, as φ<sub>s</sub> occurs earlier, the shortcut has a larger truncated trajectory and the neural activity is more advanced. Although the intersection of N̂<sup>h</sup><sub>F</sub> and N̂<sup>V</sup><sub>F</sub> before the stimulation was in silence or burst mode, it suddenly changed to tonic mode after the stimulation, which attracted the trajectory toward the intersection and shortened the neuron activity.

**Figure 6A** shows the response on the V<sub>E</sub>-h<sub>E</sub> plane to stimulation of the extensor side at φ<sub>s</sub> = 1.13 rad. While N̂<sup>V</sup><sub>E</sub> moved to the right and the intersection of N̂<sup>h</sup><sub>E</sub> and N̂<sup>V</sup><sub>E</sub> changed from the silence to the tonic mode just after the stimulation, the movement of N̂<sup>V</sup><sub>E</sub> was smaller than that of N̂<sup>V</sup><sub>F</sub> when the flexor side was stimulated (**Figure 5**). After the stimulation, although the disturbed trajectory moved to the right, it did not completely enter the limit cycle (① in **Figure 6A**). However, N̂<sup>V</sup><sub>E</sub> gradually moved to the right and the intersection of N̂<sup>h</sup><sub>E</sub> and N̂<sup>V</sup><sub>E</sub> also moved further to the right. As a result, the trajectory eventually took a shortcut to the limit cycle (② in **Figure 6A**). Although the shortcut was induced by the change of the intersection of N̂<sup>h</sup><sub>E</sub> and N̂<sup>V</sup><sub>E</sub> from the silence to the tonic mode in the same way as for stimulation of the flexor side (**Figure 5**), it was delayed due to an inhibitory signal from the flexor side just after the stimulation. More specifically, **Figure 6B** shows the time profiles of the neurons after the onset of the stimulation. Just after the stimulation, the membrane potentials (V<sub>E</sub> and V<sub>IE</sub>) of the RG-E and In-E neurons increased immediately and crossed over V<sub>th</sub> (① in **Figure 6B**), which changed the effect on the connected neurons described by Equation (7). The immediate change of the In-E neuron changed the activities of the other neurons. In particular, the membrane potentials (V<sub>F</sub> and V<sub>IF</sub>) of the RG-F and In-F neurons decreased due to the inhibitory signal from the In-E neuron and crossed over V<sub>th</sub>. The decrease of the inhibitory signal from the flexor side increased V<sub>E</sub> (② in **Figure 6B**), which induced the shortcut.

FIGURE 4 | Roles of nullclines of RG-F and RG-E neurons to produce oscillatory behaviors. The green lines show N̂<sup>h</sup><sub>F</sub> and N̂<sup>h</sup><sub>E</sub>. The red and blue lines show N̂<sup>V</sup><sub>F</sub> and N̂<sup>V</sup><sub>E</sub>, respectively. Circles indicate intersections of nullclines [filled circles for both negative eigenvalues (stable node) and open circles for negative and positive eigenvalues (saddle)]. (A) N̂<sup>V</sup><sub>F</sub> and N̂<sup>V</sup><sub>E</sub> with the vector field for the case without synaptic connections from other neurons. The saddle produces a limit cycle (orange orbit) while the stable node does not produce any oscillatory behavior. (B) Schematic illustration of changes in N̂<sup>V</sup><sub>F</sub> and N̂<sup>V</sup><sub>E</sub> induced by synaptic connections from other neurons. The intersection of N̂<sup>V</sup><sub>F</sub> and N̂<sup>h</sup><sub>F</sub> almost remains a saddle (burst mode), which induces an oscillatory behavior. On the other hand, N̂<sup>V</sup><sub>E</sub> transitions between two positions depending on the inhibitory signal from the contralateral side. This transition produces oscillatory behavior between tonic and silence modes for the extensor side. (C) Detailed illustration of our model at φ = 0, 0.89, 1.78, 2.68, 3.88, and 5.08 rad. Red and blue diamonds are (*V*<sub>F</sub>, *h*<sub>F</sub>) and (*V*<sub>E</sub>, *h*<sub>E</sub>), respectively, and these points move in accordance with eigenvalues, as indicated by arrows.

### Prolongation of Activity Duration During Active Phase

At the end of the active phase, the neural activity was delayed, as shown in **Figure 3**. In the case without stimulation, (V<sub>F</sub>, h<sub>F</sub>) of the RG-F neuron swooped down to the right inflection point of N̂<sup>V</sup><sub>F</sub> at the end of the active phase, as shown in the panel for φ = 1.78 rad of **Figure 4C**. However, the stimulation at the end of the active phase moved N̂<sup>V</sup><sub>F</sub> to the right and changed the intersection of N̂<sup>h</sup><sub>F</sub> and N̂<sup>V</sup><sub>F</sub> from burst to tonic mode, as shown in **Figure 7A**. Furthermore, N̂<sup>V</sup><sub>F</sub> showed almost no change for a while. These effects inhibited the deactivation of the RG-F neuron and prolonged the activity duration. In addition, the intersection of N̂<sup>V</sup><sub>E</sub> and N̂<sup>h</sup><sub>E</sub> changed from the burst to the silence mode and stayed in the silence mode for a while, which also delayed the neural activity. **Figure 7B** shows the case of stimulation of the extensor side at the end of the active phase of the RG-E neuron. The RG-E neuron maintained the tonic mode due to the stimulation, and this prolonged the activity duration. This response was similar to the case of flexor stimulation (**Figure 7A**).

### DISCUSSION

In the present study, we investigated the underlying mechanism of the phase-dependent response of a half-center CPG model by applying a brief stimulation to it. The simulation results showed trends in the phase-dependent responses similar to those observed during fictive locomotion in cats (Schomburg et al., 1998; Frigon et al., 2010; **Figures 3A,B**).

It has been reported that the locomotor rhythm is reset to start a new flexion phase by an electrical stimulation to the flexor nerve in animals (Schomburg et al., 1998). Our simulation results suggest that, while the locomotor rhythm is reset to start a new flexion phase by stimulation during the silent phase, its start phase depends on the stimulation phase. The phase shifts of the RG-F neuron during the active phase (silent phase of the RG-E neuron) were also induced by stimulation of the extensor side (**Figure 3C**). However, in contrast to stimulation of the flexor side, the change in the intersection of the nullclines was smaller and a trajectory shortcut did not form just after the stimulation of the extensor side (**Figure 6**). Instead, the In-E neuron was activated by the stimulation (we can estimate this using Equation S7 in **Appendix D**), which deactivated the RG-F and In-F neurons due to the inhibitory signal from the In-E neuron. As a result, the RG-E neuron was activated because of the deactivation of the neurons on the flexor side. These processes delayed the shortcut after the stimulation of the extensor side.

FIGURE 6 | … cycle (②). (B) Time profiles of four neurons from the onset of the stimulation to the end of the shortcut. The vertical lines show the onset and 80 ms after the stimulation. The horizontal line shows V<sub>th</sub>. After the stimulation, the membrane potentials of the RG-E and In-E neurons rapidly changed and crossed over V<sub>th</sub> (①). After that, while the membrane potentials of the RG-F and In-F neurons decreased due to the inhibitory signal from the In-E neuron and crossed over V<sub>th</sub>, the membrane potential of the RG-E neuron gradually increased. As a result, the decrease of the inhibitory signal from the flexor side increased the activity of the RG-E neuron, which induced the shortcut (②).

Although the shortcut was delayed by the stimulation of the extensor side, the RG-E neuron had the potential to produce an immediate shortcut by stimulation, as in **Figure 5**, due to the nullcline intersection changing to a tonic mode when the stimulation intensity was larger as illustrated in **Figure S2** in **Appendix C**.

At the end of the active phase, the neural activity was delayed by the stimulation. When the flexor side was stimulated, the intersection of the nullclines of the RG-F neuron changed from burst to tonic mode (**Figure 7A**). Similarly, the stimulation of the extensor side at the end of the active phase of the RG-E neuron prolonged the active phase by maintaining the tonic mode (**Figure 7B**). Even though the parameters of synaptic connection were different between the flexor and extensor sides, the mechanism of the active phase prolongation was the same (**Figures 7A,B**). As **Figure S2** in Appendix C shows, the stimulation contributed to the nullcline intersection changing to a tonic mode irrespective of φ<sub>s</sub>. From our simulation results, the phase-dependency was caused by these acceleration and prolongation mechanisms, which were commonly induced by the change of the nullcline intersection to a tonic mode.

### Contribution of Different Afferent Types

Schomburg et al. (1998) demonstrated the resetting of the locomotor cycle in response to stimulation of various flexor nerves during fictive locomotion. They employed both shorter stimulation trains (around 60 ms) at stimulation intensities activating joint and cutaneous afferents and longer stimulation trains (over 200 ms) at intensities activating only group I and II afferents. Other studies investigating the effects of sensory afferents on locomotor modulation also used relatively longer stimulation (for example, Ia and II afferents of extensor and flexor were stimulated for over 125 ms in Frigon et al., 2010; Ia or Ib afferents of extensor were stimulated for over 500 ms in Whelan et al., 1995; and II afferents of flexor were stimulated for over 200 ms in Perreault et al., 1995). Based on the conditions of these experiments, we used a stimulation lasting 200 ms. In addition, the effect of the stimulation intensity was also investigated in those experiments. Therefore, we examined the effect of the stimulation duration and intensity (**Figures S1**, **S2** in Appendix C).

Functional roles of muscle spindles (Ia and II), Golgi tendon organs (Ib), and cutaneous afferent inputs during locomotion have been investigated in previous studies. During the stance phase, feedback from muscle spindles and Golgi tendon organs of extensor muscles prolongs the duration of extensor activity (Guertin et al., 1995; Whelan et al., 1995), and muscle spindles in hip flexors contribute to initiation of the swing phase (Hiebert et al., 1996). At the beginning of the swing phase, stimulation of cutaneous nerves prolonged this phase (Duysens, 1977). As indicated above, the different responses depended on the locomotor phase. Yet, it remains unclear how the neural circuit of the CPG interacts with different types of sensory fibers and which neural circuits contribute to the generation of a phase-dependent response. In our present model, we did not identify the relative contributions of different afferent types to the CPG (**Figure 1**). Nevertheless, our model reproduced a phase-dependent response (**Figure 3**). Further experimental and computational studies are necessary to delineate anatomically and functionally plausible interactions between the CPG and the sensory afferents.

### Functional Roles of the Different Layers in CPGs

Although the anatomical structure of the CPG remains unclear, it has been suggested from modeling studies (Rybak et al., 2006a,b) that the CPG consists of an RG layer and a pattern formation (PF) layer. The PF layer is thought to determine the spatial motor pattern depending on the phase generated in the RG neurons; that is, it determines the distribution of the co-activated α-motoneurons over time. The muscle synergy hypothesis is one candidate for the determination of the distribution (Ivanenko et al., 2004, 2006), and modeling studies have shown that a motor control system based on this hypothesis could generate locomotion using musculoskeletal models (Aoi et al., 2010, 2013, 2019; Fujiki et al., 2018). In those models, the amplitudes of the α-motoneuron activities were determined in the PF layer. Based on this, it is suggested that the neurons in the PF layer modulate their amplitudes, which would be related to the phase-dependent response in terms of the amplitude of the electromyographic Hoffmann reflex during locomotion (Capaday and Stein, 1986; Yang and Stein, 1990). In contrast, the neurons in the RG layer control the temporal aspect of the phase-dependent response, as shown in the present study. As physiological experiments have shown, the feedback from muscle spindles contributes to the modulation of the muscle activity strength (Mayer et al., 2018) and the timing of the stance-to-swing and swing-to-stance transitions (Grillner and Rossignol, 1978; Hiebert et al., 1996; Akay et al., 2014). Therefore, the different layers of the CPG may explain the two different types of phase-dependency.

### Limitations of Model

In our study, we used the activity-based neuron model (Ermentrout, 1994; Markin et al., 2010; Molkov et al., 2015; Danner et al., 2016, 2017). This neuron model does not show spiking because it omits the potassium and fast-type sodium currents. Instead, it uses a persistent sodium current, which enables the neuron model to generate bursting. Ausborn et al. (2018) showed that an activity-based neuron model preserved the principal dynamic features of neural activities as a half-center CPG. Even though our model did not include potassium and fast-type sodium currents, it reproduced the phase-dependent response and contributed to the analysis of its dynamic structure.

### Interaction Between Body and Neural System During Adaptive Walking

In the present study, we focused on the phase-dependent response of the CPG activity during fictive locomotion. When animals walk, motor commands are sent to the leg muscles from the spinal CPG, and the CPG receives sensory signals from the leg nerves. While fictive locomotion is generated in an open-loop system, actual locomotion is generated in a closed-loop system. In addition to the analysis of fictive locomotion, in the future, we would like to investigate the entrainment mechanism through the dynamics of the CPG circuit, the body mechanical system, and the sensory system. Moreover, it has been suggested that the CPG consists of the RG and PF layers. While the RG layer determines the rhythm pattern of motor commands, the PF layer determines the spatial pattern (Rybak et al., 2006a). In the future, we would like to introduce the PF layer to our model to clarify further neural mechanisms of sensorimotor integration for adaptive locomotion.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## AUTHOR CONTRIBUTIONS

SF and SA developed the study design in consultation with KT, SD, and DY. SF performed simulations and analyzed the data in consultation with SA, KT, and SD. SF, SA, SD, and IR wrote the manuscript. All authors reviewed and approved the manuscript.

## FUNDING

This research was supported in part by JSPS KAKENHI Grant-in-Aid for Young Scientists (B) JP16K16482, Grant-in-Aid for Scientific Research (B) JP15KT0015, and Grant-in-Aid for Scientific Research on Innovative Areas JP26120006 and also by the National Institutes of Health grants R01-NS100928 and R01-NS090919.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.01288/full#supplementary-material

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Fujiki, Aoi, Tsuchiya, Danner, Rybak and Yanagihara. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# How Our Perception and Confidence Are Altered Using Decision Cues

Tiasha Saha Roy <sup>1</sup> , Bapun Giri 2,3, Arpita Saha Chowdhury <sup>1</sup> , Satyaki Mazumder <sup>1</sup> and Koel Das <sup>1</sup> \*

*<sup>1</sup> Department of Mathematics and Statistics, Indian Institute of Science Education and Research Kolkata, Mohanpur, India, <sup>2</sup> Department of Psychology, University of Wisconsin-Milwaukee, Milwaukee, WI, United States, <sup>3</sup> Department of Anesthesiology, University of Michigan, Ann Arbor, MI, United States*

Understanding how individuals utilize social information while making perceptual decisions and how it affects their decision confidence is crucial in a society. To date, very little has been known about perceptual decision-making in humans and the associated neural mediators under social influence. The present study provides empirical evidence of how individuals are manipulated by others' decisions while performing a face/car identification task. Subjects were significantly influenced by what they perceived as the decisions of other subjects, while the cues, in reality, were manipulated independently from the stimulus. Subjects, in general, tend to increase their decision confidence when their individual decision and the cues coincide, while their confidence decreases when cues conflict with their individual judgments, often leading to reversal of decision. Using a novel statistical model, it was possible to rank subjects based on their propensity to be influenced by cues. This was subsequently corroborated by an analysis of their neural data. Neural time series analysis revealed no significant difference in decision-making using social cues in the early stages, unlike neural expectation studies with predictive cues. Multivariate pattern analysis of neural data alludes to a potential role of the frontal cortex in the later stages of visual processing, which appeared to code the effect of cues on perceptual decision-making. Specifically, the medial frontal cortex seems to play a role in facilitating perceptual decision preceded by conflicting cues.

Keywords: perceptual decision making, social influence, computational modeling, gamma mixture model, multivariate pattern classification

#### Edited by:

*Paul E. M. Phillips, University of Washington, United States*

#### Reviewed by:

*Krishna P. Miyapuram, Indian Institute of Technology Gandhinagar, India*

*Michael Georg Metzen, McGill University, Canada*

\*Correspondence: *Koel Das koel.das@iiserkol.ac.in*

#### Specialty section:

*This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience*

Received: *09 January 2019* Accepted: *04 December 2019* Published: *14 January 2020*

#### Citation:

*Saha Roy T, Giri B, Saha Chowdhury A, Mazumder S and Das K (2020) How Our Perception and Confidence Are Altered Using Decision Cues. Front. Neurosci. 13:1371. doi: 10.3389/fnins.2019.01371*

### 1. INTRODUCTION

In today's information-satiated society, perceptual decision and subsequent action are greatly influenced by social information. Modern human society is increasingly organized around collective opinions, as reflected in people's increased use of web ratings for daily choices about consumer products, lodging, food, and entertainment (Jayles et al., 2017). Opinions and choices can easily propagate through social networks (Jansen et al., 2009; Gonçalves and Perra, 2015) in this digitized world, and even political opinions can be manipulated using social transmission (Bond et al., 2012). The human tendency to conform to social influence has been explored systematically in classic studies by Solomon Asch (Asch and Guetzkow, 1951; Asch, 1955) and others (Berns et al., 2004, 2010; Behrens et al., 2008; Klucharev et al., 2008, 2009, 2011; Campbell-Meiklejohn et al., 2010; Biele et al., 2011; Izuma and Adolphs, 2013 and see Tajfel, 1982; Cialdini and Goldstein, 2004; Izuma, 2013 for reviews). Reliance on others' opinions is not unique to humans. Different species of animals depend on collective opinion to decide on life-critical perceptual tasks like foraging for food, placement of nests, and navigation (Simons, 2004; Conradt and List, 2009; Couzin, 2009) and evolve optimal decision-making strategies accordingly. Consideration of the beneficial effect of group decision can be traced back as early as 1907, when Francis Galton analyzed the opinions of 787 people about the weight of an ox and found that combining their numerical assessments resulted in a median estimate that was remarkably close to the true weight of the ox (Galton, 1907). In recent times, this idea has been popularly referred to as the "wisdom of the crowds" (Surowiecki, 2005). However, the effect of social cues in the form of collective decision on individual percept and the underlying neural mechanism remains largely unexplored (Klucharev et al., 2009; Izuma, 2013).

Neural expectation studies over the last decade have demonstrated that predictive cues typically lead to changes in early sensory processing (Carlsson et al., 2000; Kok et al., 2012a,b, 2013, 2014, 2016, 2017; Jiang et al., 2013; John-Saaltink et al., 2015; Todorovic et al., 2015; Sherman et al., 2016), but recent research has contradicted this claim (Bang and Rahnev, 2017; Rungratsameetaweemana et al., 2018). We sought to examine whether social information produces similar early top-down changes in the sensory cortex. We propose to manipulate the individual choice and decision confidence of humans performing a perceptual task by presenting visual cues that the subjects presume to be the collective opinion of other well-performing participants. The cues can be concurring, conflicting or neutral to the individual perceptual decision of the subjects. Using a novel statistical model, we studied the effect of the three types of cues on individual choice. We also analyzed the neural signals to explore the neural mediators producing the change in their individual choice upon being presented with social information. Finally, we performed a source reconstruction of the neural signals to elucidate the role played by specific spatio-temporal areas under the influence of cues. Specifically, we explored the following questions:

Can we manipulate individual perceptual decisions upon presenting potential social information cues when the cues differ from the individual choice? Does this reversal of opinion depend upon how confident the subject was in his/her choice without any influence from cues?

Can individual decision confidence be augmented when the cues concur with the individual choice?

Can we identify flip-floppers based on computational modeling of their behavioral data and corroborate using neural data?

Can we explore the neural mediators that contribute to the change in individual percept post-cue display?

Using a face/car discrimination task, we show that it is possible to manipulate individual choice post-presentation of cues in the guise of the decision of others. Although the cues were randomly generated and independent from the stimulus, it was possible to alter the individual percept, as subjects presumed the cues as concurring, conflicting, or neutral. Irrespective of the order in which they viewed the images with or without cues, most subjects were affected by the cues in a systematic manner. The distribution of the decision confidence under such a set up was found to be bimodal and skewed, with one mode guided by social information and the other influenced by the individual's own decision. The tendency to adhere to their own decision depends on the confidence level of the subject and is reflected in the skewness of the data distribution. Hence, using a Gaussian model to explore the data, which is the usual practice (Park et al., 2017), might not capture the complexities of data completely. We propose a novel model using a mixture of shifted gamma and negative gamma distributions that successfully captures the effect of social cues on individual choice. To the best of our knowledge, this is the first study using a mixture of variants of gamma distributions, which captures the bimodal nature as well as the skewness (whether high or low) of this kind of data. We compare our proposed model with the mixture of two Gaussian distributions and demonstrate the superiority of our model convincingly. Based on the behavioral model, it was possible to objectively identify subjects most prone to change their decisions upon being presented with the opinion of others. Subsequent multivariate pattern analysis (MVPA) of neural data substantiated the above finding. Neural analysis also elucidated the existence of a late component that seems to code the effect of this social information on individual perceptual decision. 
Source analysis of neural data revealed a role for the frontal cortex in coding perceptual decision using social information. Our analysis alludes to the role of the medial frontal cortex in coding information when conflicting social decisions are provided as cues.

## 2. MATERIALS AND METHODS

### 2.1. Stimuli and Display

The data set consisted of 290 × 290 pixel 8-bit gray-scale images of 12 cars and 12 faces with an equal number of frontal views and side views. Face images were taken from the Max Planck Institute for Biological Cybernetics face database (Troje and Bülthoff, 1996). All stimuli were filtered to attain a common frequency power spectrum. Noise was generated by filtering white Gaussian noise (std of 3.53 cd/m<sup>2</sup>) by the average power spectrum. Noise was added to the base stimuli to generate a set of 250 images (125 face, 125 car). The contrast energy of all 250 images was matched at 0.3367 deg<sup>2</sup>. The participants were at a distance of 125 cm from a display with a mean luminance of 25 cd/m<sup>2</sup>. Images subtended a visual angle of 4.57°.

### 2.2. Participants and Experiment

Twenty naïve participants (ages: 22–28, mean: 25.85, std: 2.39) participated in the study, which consisted of 1,000 trials split into 40 successive sessions. Three subjects were not considered in the analysis due to the high degree of noise present in the neural data. All participants had normal or corrected-to-normal vision and disclosed no history of neurological problems. The participants performed a face/car discrimination task and reported their decision using a 10-point confidence rating. Participants perceptually categorized briefly (50 ms) presented images of cars (C) and faces (F) embedded in filtered noise. The participants began by fixating on a central cross and clicking anywhere on the screen. After a delay of 50 ms, a cue was presented for 100 ms followed by a variable delay of 500–800 ms. The stimulus was presented for 50 ms followed by a delay of 700 ms, after which the response screen appeared. The participants reported their decision using the confidence rating, with a rating of 1 indicating complete confidence that the stimulus was a face and a rating of 10 indicating complete confidence that it was a car. The participants reported their confidence rating on a gray-scaled color wheel in the response screen to avoid any motor bias (**Figure 1A**). There were four types of cues, FF, CC, FC, and CF, representing decisions of two independent well-performing participants who had previously completed the study. Cues were systematically manipulated such that an equal number of images (250 per condition) had FF cues, FC/CF cues, and CC cues. There were also an additional 250 images without cues. Thus, each participant saw each stimulus four times in the course of the experiment, preceded by an FF cue, an FC/CF cue, a CC cue, and no cue, in random order, and the responses were recorded. Participants were naïve to the purpose of the study and, in a subsequent questionnaire after the study, failed to realize that the cues were not decision cues but were, in fact, synthetic cues generated randomly.

EEG activity was recorded using 64-channel active shielded electrodes mounted in an EEG cap following the international 10/20 system. EEG signals were recorded using two linked Nexus-32 bioamplifiers at a sampling rate of 512 Hz, band-pass filtered (0.01–40 Hz), and then referenced using average referencing. Trials with ocular artifacts (blinks and eye movements) were detected using bipolar electro-oculograms (EOG) with amplitude exceeding ±100 mV or by visual inspection and were not included in the analysis.

### 2.3. Behavioral Model

We propose a statistical model to explore the effect of the presented cues on perceptual decision making. In the experiment, for every face/car stimulus, subject responses corresponding to the three types of cues (FF, FC/CF, and CC) along with a response to the same stimulus with no cue were recorded. The response to the no-cue image was taken as the individual decision of the subject, k<sub>1</sub> ∈ {1, 2, …, 10}, for that image. Further, we define a social cue variable k<sub>2</sub> as

$$k_2 = \begin{cases} 1 & \text{if the cue shown was "FF",} \\ 5 & \text{if the cue shown was "FC/CF",} \\ 10 & \text{if the cue shown was "CC".} \end{cases}$$

All the images in which the individual decision of the subject was k<sub>1</sub> were considered, and the distribution of the decisions on the same images under the influence of each type of cue was studied. Hence, the data comprised the decisions of a particular subject for every (k<sub>1</sub>, k<sub>2</sub>) pair. In most cases, the data distributions were bimodal in nature, having positive and/or negative skew, as seen in **Figure 1B**. Hence, a two-component mixture model based on variants of the gamma distribution was proposed to explain the decisions taken by the subject under the influence of a cue. The data were made continuous by using jittering (addition of uniform random noise, Chanialidis, 2015) to provide flexibility in modeling.
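The jittering step can be illustrated with a minimal sketch. The width of the uniform noise is not stated in the text; the example below assumes Uniform[0, 1) noise added to each integer rating, which maps ratings 1–10 onto the interval [1, 11). The function name `jitter` is hypothetical.

```python
import random

def jitter(ratings, width=1.0, seed=None):
    """Add Uniform[0, width) noise to each integer rating.

    Assumption (not stated in the paper): width = 1, so a rating r
    becomes a continuous value in [r, r + 1).
    """
    rng = random.Random(seed)
    return [r + width * rng.random() for r in ratings]

ratings = [1, 1, 2, 5, 9, 10, 10]
continuous = jitter(ratings, seed=0)
```

Under this assumption the original rating can always be recovered by truncation, so no information is lost by the transformation.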

Let **X**<sub>i</sub>(k<sub>1</sub>, k<sub>2</sub>) contain the decisions taken by the i-th subject on all images where his/her individual decision was k<sub>1</sub> and the cue shown was k<sub>2</sub>. We consider the elements of **X**<sub>i</sub>(k<sub>1</sub>, k<sub>2</sub>) as i.i.d. observations from a distribution. To propose the statistical model depending on the choices of (k<sub>1</sub>, k<sub>2</sub>), we first introduce some terminology and notation. The probability densities of the shifted gamma and negative gamma distributions are given, respectively, as

$$g(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (x - 1)^{\alpha - 1} e^{-\beta(x - 1)}, \quad x \ge 1,\ \alpha \ge 1,\ \beta > 0, \tag{1}$$

$$ng(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (L - x)^{\alpha - 1} e^{-\beta(L - x)}, \quad x \le L,\ \alpha \ge 1,\ \beta > 0, \tag{2}$$

where α is the shape parameter, β is the rate (inverse scale) parameter, and L is a known constant.

Based on Equations (1) and (2), the following models are proposed depending on the choices of (k<sub>1</sub>, k<sub>2</sub>). If k<sub>1</sub> ∈ {1, 2, …, 5} and k<sub>2</sub> ∈ {1, 5}, we take our model as

$$f(x) = p \, g_{\alpha_1, \beta_1}(x) + (1 - p) \, g_{\alpha_2, \beta_2}(x), \tag{3}$$

a mixture of two shifted gamma distributions. When k<sub>1</sub> ∈ {6, 7, …, 10} and k<sub>2</sub> = 10, the proposed model is

$$f(x) = p \, ng_{\alpha_1, \beta_1}(x) + (1 - p) \, ng_{\alpha_2, \beta_2}(x), \tag{4}$$

a mixture of two negative gamma distributions. Finally, if either k<sub>1</sub> ∈ {1, 2, …, 5} and k<sub>2</sub> = 10 or k<sub>1</sub> ∈ {6, 7, …, 10} and k<sub>2</sub> ∈ {1, 5}, our suggested model is

$$f(x) = p \, g_{\alpha_1, \beta_1}(x) + (1 - p) \, ng_{\alpha_2, \beta_2}(x), \tag{5}$$

a mixture of a shifted gamma and a negative gamma distribution, where 0 ≤ p ≤ 1 is the mixing parameter.
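The case distinction in Equations (3)–(5) can be sketched in code as follows. This is an illustrative reading of the equations, not the authors' implementation; the function names are hypothetical, and L = 11 is the value the paper reports using.

```python
import math

L = 11.0  # upper end of the (jittered) rating scale, as used in the paper

def shifted_gamma_pdf(x, alpha, beta):
    """Equation (1): gamma density shifted to start at 1."""
    y = x - 1.0
    if y <= 0.0:
        return 0.0  # boundary point treated as zero density (measure zero)
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(y) - beta * y)

def negative_gamma_pdf(x, alpha, beta, upper=L):
    """Equation (2): mirror image of Equation (1), ending at `upper`."""
    return shifted_gamma_pdf(upper - x + 1.0, alpha, beta)

def mixture_pdf(x, k1, k2, p, a1, b1, a2, b2):
    """Equations (3)-(5): pick the two component families from (k1, k2)."""
    face_choice = k1 <= 5
    if face_choice and k2 in (1, 5):
        f1, f2 = shifted_gamma_pdf, shifted_gamma_pdf      # Eq. (3)
    elif not face_choice and k2 == 10:
        f1, f2 = negative_gamma_pdf, negative_gamma_pdf    # Eq. (4)
    else:
        f1, f2 = shifted_gamma_pdf, negative_gamma_pdf     # Eq. (5)
    return p * f1(x, a1, b1) + (1.0 - p) * f2(x, a2, b2)
```

For α ≥ 1, a gamma density with shape α and rate β has its mode at (α − 1)/β, so the component modes sit at 1 + (α − 1)/β for the shifted gamma and at L − (α − 1)/β for the negative gamma.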

#### 2.3.1. Parameter Space of the Model

We have taken the restricted parameter space for the shape parameter (α) in both the distributions (Equations 1 and 2) so that the modes of the distributions are defined and are either more than or equal to 1 (for the shifted gamma case) or less than or equal to L (for the negative gamma case). In our case, we consider L to be 11. In particular, for both shifted-gamma and negative-gamma distributions,


#### 2.3.2. Estimation of the Model Parameters

Next, for the purposes of estimating the parameters of our proposed model and of further inference, only those data sets **X**<sub>i</sub>(k<sub>1</sub>, k<sub>2</sub>) that contain more than 10 observations are considered. Note that the parameter estimates depend on i as well as (k<sub>1</sub>, k<sub>2</sub>); that is to say, for every individual i, the parameter estimates may vary for different choices of (k<sub>1</sub>, k<sub>2</sub>). Similarly, for a given (k<sub>1</sub>, k<sub>2</sub>), parameter estimates of the proposed model may vary from individual to individual. We estimate the model parameters by maximum likelihood (Casella and Berger, 2002). Since the proposed models are mixture densities, we use the EM algorithm to calculate the maximum likelihood estimates (MLE) (Casella and Berger, 2002). However, since closed-form solutions for the estimates of the shape parameters do not exist, we apply the Newton-Raphson numerical technique (Atkinson, 1978) within each M-step of the EM algorithm (see **Supplementary Information** for detailed calculation).
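The estimation scheme can be sketched for the simplest case, the mixture of two shifted gammas in Equation (3). The code below is a minimal illustration under stated assumptions, not the authors' procedure (which is detailed in their Supplementary Information): the initial soft split at the median, the stopping rule, the starting value for the shape, and all function names are choices made here for illustration. The shape is clipped to α ≥ 1 to respect the restricted parameter space. A negative gamma component could be handled the same way after the reflection y = L − x.

```python
import math
import random

def digamma(x):
    """psi(x) for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """psi'(x) for x > 0, same strategy as digamma."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def log_gamma_pdf(y, a, b):
    """Log density of a gamma with shape a and rate b at y > 0."""
    return a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(y) - b * y

def m_step(y, w):
    """Weighted MLE of (shape, rate): Newton-Raphson on the shape, clipped to >= 1."""
    sw = sum(w)
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    mean_log = sum(wi * math.log(yi) for wi, yi in zip(w, y)) / sw
    s = max(math.log(mean_y) - mean_log, 1e-8)
    a = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)  # starting value
    for _ in range(100):
        f = math.log(a) - digamma(a) - s          # score equation for the shape
        a_new = a - f / (1.0 / a - trigamma(a))   # Newton-Raphson update
        if a_new <= 0.0:
            a_new = 0.5 * a                       # keep the iterate positive
        converged = abs(a_new - a) < 1e-10 * max(1.0, a)
        a = a_new
        if converged:
            break
    a = max(a, 1.0)                               # restricted parameter space
    return a, a / mean_y                          # rate has a closed form given the shape

def em_two_shifted_gammas(x, n_iter=200, tol=1e-8):
    """EM for Equation (3); returns (p, (a1, b1), (a2, b2), log-likelihood trace)."""
    y = [xi - 1.0 for xi in x]
    cut = sorted(y)[len(y) // 2]
    r = [0.9 if yi <= cut else 0.1 for yi in y]   # soft initial split at the median
    trace = []
    for _ in range(n_iter):
        p = sum(r) / len(y)                       # M-step
        a1, b1 = m_step(y, r)
        a2, b2 = m_step(y, [1.0 - ri for ri in r])
        ll = 0.0
        for i, yi in enumerate(y):                # E-step (log-sum-exp for stability)
            l1 = math.log(p) + log_gamma_pdf(yi, a1, b1)
            l2 = math.log(1.0 - p) + log_gamma_pdf(yi, a2, b2)
            m = max(l1, l2)
            total = m + math.log(math.exp(l1 - m) + math.exp(l2 - m))
            r[i] = math.exp(l1 - total)
            ll += total
        trace.append(ll)
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
    return p, (a1, b1), (a2, b2), trace

def simulate_ratings(n1, n2, seed=0):
    """Hypothetical toy data: two shifted-gamma clusters (not clipped to [1, 11])."""
    rng = random.Random(seed)
    low = [1.0 + rng.gammavariate(2.0, 0.5) for _ in range(n1)]    # shape 2, scale 0.5
    high = [1.0 + rng.gammavariate(20.0, 0.3) for _ in range(n2)]  # shape 20, scale 0.3
    return low + high
```

Calling `em_two_shifted_gammas(simulate_ratings(300, 200, seed=1))` returns the mixing proportion, the two (shape, rate) pairs, and the log-likelihood after each iteration, which should be non-decreasing up to numerical error.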

#### 2.3.3. Goodness of Fit

To understand how well our model fits the observed data, the Kolmogorov-Smirnov (KS) test statistic (Gibbons and Chakraborti, 2011), based on the maximum absolute difference between the hypothesized cumulative distribution function (cdf) and the empirical cumulative distribution function (ecdf), was used. For each subject i, there were N<sub>i</sub> models to be tested simultaneously, and the case of multiple testing therefore arose. To control the family-wise error rate arising due to multiple hypothesis tests per subject, we used the Holm-Bonferroni method (Westfall et al., 1993) with a family-wise error rate (FWER) of 0.05.
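The two ingredients of this procedure, the KS statistic and the Holm-Bonferroni step-down rule, can be sketched as below. The function names are hypothetical, and the conversion of the KS statistic into a p-value is not shown.

```python
def ks_statistic(sample, cdf):
    """Maximum absolute difference between the ecdf of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The ecdf jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

def holm_bonferroni(p_values, fwer=0.05):
    """Return a reject/keep flag per hypothesis using Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    reject = [False] * m
    for rank, j in enumerate(order):
        if p_values[j] <= fwer / (m - rank):
            reject[j] = True
        else:
            break  # once one test fails, all larger p-values are kept
    return reject
```

For example, with p-values 0.01, 0.04, 0.03, and 0.005 at FWER 0.05, the thresholds applied in order are 0.0125, 0.0167, and 0.025, so only the hypotheses with p = 0.005 and p = 0.01 are rejected.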

#### 2.3.4. Model Prediction

We use a 10-fold cross-validation procedure to study the predictive performance of the proposed model. Since our data were bimodal in nature, it would not have been meaningful to judge this performance on the basis of a single predictive interval. To address this issue, we applied the following concept of a highest probability density region (HPDR) (Hyndman, 1996), which broadly computes the smallest region that contains most of the probability.

**Definition**: Let f(x) be the probability density function of a random variable X. The 100(1 − α)% HPDR is then defined as the subset R(f<sub>α</sub>) of the real numbers ℝ such that

$$R(f_{\alpha}) = \{ x : f(x) \ge f_{\alpha} \},$$

where f<sub>α</sub> is the largest constant with P(X ∈ R(f<sub>α</sub>)) ≥ 1 − α.

In each fold, the model was trained on the training set, and the 95% HPDR was computed. It was checked whether the validation set fell within the estimated HPDR, and the process was repeated for each cross-validation fold.
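One simple way to approximate the threshold f<sub>α</sub> in the definition above is a grid search: evaluate the density on a fine grid, sort the grid cells by density, and accumulate probability mass from the highest cell downward until 1 − α is reached. The sketch below follows this approach; it is an assumption made for illustration, not necessarily how the HPDR was computed in the study, and the function names are hypothetical.

```python
import math

def hpdr_threshold(pdf, lo, hi, alpha=0.05, n_grid=20000):
    """Approximate f_alpha on [lo, hi] by accumulating the highest-density cells."""
    h = (hi - lo) / n_grid
    heights = sorted((pdf(lo + (j + 0.5) * h) for j in range(n_grid)), reverse=True)
    mass = 0.0
    for f in heights:
        mass += f * h
        if mass >= 1.0 - alpha:
            return f
    return heights[-1]  # reached only if [lo, hi] holds less than 1 - alpha mass

def in_hpdr(x, pdf, f_alpha):
    """A point belongs to the HPDR when its density is at least f_alpha."""
    return pdf(x) >= f_alpha

def normal_pdf(x, mu=0.0, sd=1.0):
    """Gaussian density, used here only as a toy example."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def bimodal(x):
    """Toy bimodal density: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    return 0.5 * normal_pdf(x, -3.0, 0.5) + 0.5 * normal_pdf(x, 3.0, 0.5)
```

For the toy bimodal density, the resulting region is a union of two disjoint intervals around the modes, and the point midway between the modes is excluded. This is why a single predictive interval would be misleading for bimodal data.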

#### 2.3.5. Model Comparison

We compared the performance of our proposed model with the two-component Gaussian mixture model using a likelihood ratio test (Casella and Berger, 2002). Data were divided into 10 test sets using 10-fold cross-validation and, for each set, the likelihood was estimated with each of the two models. Finally, the medians of the likelihood ratios across the folds were computed for each of the models for the purpose of comparison.

### 2.4. Behavioral Data Processing

Guided by the proposed model, the behavior of the individuals was analyzed based on the following measures.

#### 2.4.1. Distance Metric Computation Using the Model

To quantify the overall shift in decisions from the subjects' individual choice, the following distance was used

$$D_i(k_1, k_2) = \begin{cases} \sqrt{\mathbf{x}_i^{\prime} \mathbf{x}_i} & \text{if } k_1 = k_2, \\ \sqrt{\mathbf{x}_i^{\prime} \Sigma^{-1} \mathbf{x}_i} & \text{otherwise,} \end{cases} \tag{6}$$

where **x**<sub>i</sub> = (k<sub>1</sub> − **m<sub>1</sub>**(i), k<sub>1</sub> − **m<sub>2</sub>**(i))′, **m<sub>1</sub>** and **m<sub>2</sub>** being the vectors containing the two modes of the N(k<sub>1</sub>, k<sub>2</sub>) subjects, and i = 1, 2, …, N(k<sub>1</sub>, k<sub>2</sub>). Here, N(k<sub>1</sub>, k<sub>2</sub>) denotes the number of subjects available for the pair (k<sub>1</sub>, k<sub>2</sub>), and Σ is the estimated variance-covariance matrix of the estimates of the modes for a particular choice of (k<sub>1</sub>, k<sub>2</sub>), given by

$$
\Sigma = \begin{bmatrix}
\text{Var}(\mathbf{m}_1) & \text{Cov}(\mathbf{m}_1, \mathbf{m}_2) \\
\text{Cov}(\mathbf{m}_1, \mathbf{m}_2) & \text{Var}(\mathbf{m}_2)
\end{bmatrix}.
$$
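Equation (6) can be evaluated directly for the two-dimensional case by inverting the 2 × 2 matrix Σ in closed form. In the sketch below, the use of the unbiased (n − 1) sample covariance is an assumption, since the text does not state which estimator was used; the function names are hypothetical, and `m1` and `m2` hold the two estimated modes for each subject.

```python
import math

def sample_cov(u, v):
    """Unbiased sample covariance (assumed estimator; not stated in the paper)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def distance(i, k1, k2, m1, m2):
    """Equation (6) for subject i, given the mode vectors m1 and m2."""
    x1, x2 = k1 - m1[i], k1 - m2[i]
    if k1 == k2:
        return math.sqrt(x1 * x1 + x2 * x2)        # Euclidean case
    s11, s22, s12 = sample_cov(m1, m1), sample_cov(m2, m2), sample_cov(m1, m2)
    det = s11 * s22 - s12 * s12                     # assumed non-zero
    # x' Sigma^{-1} x using the closed-form inverse of a 2 x 2 matrix
    quad = (s22 * x1 * x1 - 2.0 * s12 * x1 * x2 + s11 * x2 * x2) / det
    return math.sqrt(quad)
```

As a small worked case, with `m1 = [1, 2, 3]`, `m2 = [1, 3, 2]`, k<sub>1</sub> = 2, and k<sub>2</sub> = 10, the covariance matrix has unit variances and covariance 0.5, and the distance for the first subject is √(4/3) ≈ 1.155.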

#### 2.4.2. Social Bias Score

Using the cumulative distribution functions of the shifted-gamma and negative-gamma distributions (as calculated in **Supplementary Information**) and Equations (3)–(5), the proportion of decisions between k<sub>1</sub> and k<sub>2</sub> in the presence of social cues was estimated. The average proportion of decisions (p<sub>i</sub>) per subject across the (k<sub>1</sub>, k<sub>2</sub>) pairs, which are reported in **Tables S5–S8**, was considered. We ranked the subjects based on the social bias score, defined as

$$W_i = \frac{p_i - 0.5}{\sigma / \sqrt{n}},$$

for i ∈ {1, 2, …, 17} \ {2, 3}, with σ denoting the sample standard deviation of the proportions p<sub>i</sub>. Only those subjects whose W<sub>i</sub> exceeded 1.96 were considered for further analysis, indicating that the corresponding proportions are significantly greater than would be expected by chance.

### 2.5. Neural Data Processing

The preprocessed EEG signals were time-locked to stimulus onset and included a 200 ms pre-stimulus baseline and 500 ms post-stimulus interval.

#### 2.5.1. Multivariate Pattern Analysis of EEG

Univariate EEG analysis has traditionally been used to explore the relationship between behavioral performance and neural activity in specific cognitive tasks. However, univariate analysis techniques fail to fully utilize the spatio-temporal nature of multivariate neural data. Multivariate pattern analysis techniques provide a way to integrate the spatial and temporal information present in the data by fusing the neural information into a single decision variable that can be used in single-trial analysis. A comparison between univariate and multivariate analyses using a similar cognitive task has been shown in Das et al. (2010). Successful use of MVPA has been demonstrated in numerous studies using EEG and fMRI (Haynes and Rees, 2005; Kamitani and Tong, 2005; Philiastides et al., 2006). In the current study, MVPA was used to extract meaningful information from the multi-dimensional EEG data. Since the neural data are high dimensional and suffer from the small sample size problem (Das and Nenadic, 2009), a recently proposed principal component analysis (PCA)-based non-linear feature extraction technique, "Classwise Principal Component Analysis" (CPCA) (Das and Nenadic, 2009), was used. CPCA has been used previously to efficiently reduce the dimensionality of the EEG signals and extract informative features (Das et al., 2009, 2010; Do et al., 2011, 2013; Wang et al., 2012; King et al., 2013). The main goal of CPCA is to identify and discard the non-informative subspace in the data by applying principal component-based analysis to each class. The classification is then carried out in the residual space, in which small sample size conditions and the curse of dimensionality no longer hold. A Linear Bayesian Classifier was then used for computing the choice probability for single-trial EEG data for each subject. Pattern analysis was performed using 10-fold cross-validation. The original data were partitioned into 10 equally sized subsamples.
Of the 10 subsamples, a single subsample was retained as the test data, and the remaining nine subsamples were used in training the classifier. The performance of the classifier is captured by the receiver operating characteristics (ROC) curve, which plots the true positive rate vs. false positive rate at different classification thresholds. The area beneath this ROC curve (AUC) is often used as a measure to determine the overall accuracy of the classifier (Duda et al., 2012). We utilize the well-known approach of calculating the area under the ROC by finding the Mann Whitney U-statistic for the two-sample problem (Mason and Graham, 2002). All classification analyses were carried out for individual participants, and the average AUC performance was reported in the results.
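The link between the AUC and the Mann-Whitney U-statistic can be made concrete with a short sketch: the AUC equals the fraction of (positive, negative) trial pairs in which the positive trial receives the higher classifier score, with ties counted as one half. The function name and the toy scores below are illustrative only.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the normalized Mann-Whitney U-statistic (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy posterior scores for three trials of each class.
auc = auc_mann_whitney([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

In this toy example, eight of the nine pairs are ordered correctly, giving an AUC of 8/9. A value of 0.5 corresponds to chance performance.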

#### 2.5.2. Source Reconstruction

To identify underlying neuronal sources responsible for generating differences in the ERPs corresponding to the face and car trials under the influence of cues, source reconstruction was performed using sLORETA software (Pascual-Marqui, 2002, http://www.uzh.ch/keyinst/loreta). sLORETA (standardized low-resolution brain electromagnetic tomography) is based on standardization of the minimum norm inverse solution, which considers the variation of actual sources and the variation due to noisy measurement (if any) as well (Pascual-Marqui, 2002). As a result, it does not have any localization bias, even in the presence of measurement and biological noise. The head model for the inverse solution uses the electric potential lead field calculated using the boundary element method (Fuchs et al., 2002) on the MNI152 template (Mazziotta et al., 2001). The cortical gray matter is partitioned into 6,239 voxels at 5-mm spatial resolution. sLORETA images represent the standardized electric activity at each voxel in Montreal Neurological Institute (MNI) space as the exact magnitude of the estimated current density. Anatomical labels are reported using an appropriate correction from MNI to Talairach space (Talairach and Tournoux, 1988) using Talairach Daemon (Lancaster et al., 2000). For further details on sLORETA, refer to http://www.uzh.ch/keyinst/NewLORETA/Methods/MethodsSloreta. The source activity was estimated from the face-car difference wave post-stimulus onset.

#### 2.5.3. Statistical Analysis of Sources

Differences in the distribution of the sources between concurring and conflicting trials were calculated using statistical nonparametric mapping (SnPM) (Nichols and Holmes, 2002). This method relies on the randomization of the absolute maximum statistic over all channels. The randomization provides an estimator for the empirical distribution under the null hypothesis ("no difference between the sources of concurring and conflicting trials"). The advantage of this method is that it does not depend on any distributional form, in particular Gaussianity, and simultaneously takes care of multiple comparisons. A total of 5,000 random samples were generated while implementing the SnPM technique. Differences between the two conditions (concurring and conflicting) were assessed at the global level, and the brain areas showing the largest differences have been reported.
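The idea behind the maximum-statistic randomization can be sketched as follows. This is a simplified illustration, not the sLORETA/SnPM implementation: it assumes paired data, uses a one-sample t statistic on the per-subject differences, and generates the null distribution by random sign flips. The function names and data layout are hypothetical.

```python
import math
import random

def t_stats(diffs):
    """One-sample t statistic per location; diffs[s][v] = subject s, location v.

    Assumes non-zero variance at every location.
    """
    n = len(diffs)
    out = []
    for v in range(len(diffs[0])):
        col = [row[v] for row in diffs]
        mean = sum(col) / n
        var = sum((c - mean) ** 2 for c in col) / (n - 1)
        out.append(mean / math.sqrt(var / n))
    return out

def max_stat_permutation(diffs, n_perm=5000, seed=0):
    """Corrected p-value per location from the null distribution of max |t|."""
    rng = random.Random(seed)
    observed = [abs(t) for t in t_stats(diffs)]
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        # Under the null, the sign of each subject's difference is exchangeable.
        signs = [rng.choice((-1.0, 1.0)) for _ in diffs]
        flipped = [[s * d for d in row] for s, row in zip(signs, diffs)]
        max_t = max(abs(t) for t in t_stats(flipped))
        for v, obs in enumerate(observed):
            if max_t >= obs:
                exceed[v] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Because every location is compared against the distribution of the maximum statistic over all locations, the resulting p-values are already corrected for multiple comparisons, and no Gaussian assumption is required.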

## 3. RESULTS

### 3.1. Behavioral Results

The decisions taken by the subjects under the influence of a cue were modeled as a two-component mixture model based on the shifted-gamma and negative-gamma distributions (see Equations 3–5). To verify that the proposed model fits the observed behavioral data well, the Kolmogorov-Smirnov (KS) test (Gibbons and Chakraborti, 2011) was used. The proposed model captured the data correctly in most cases (see **Table S1**). **Figure 1B** depicts histograms of the decisions corresponding to all (k<sub>1</sub>, k<sub>2</sub>) pairings and the fitted density of our model for one subject. **Table S1** contains the p-values corresponding to the cases where the model was rejected. In over 96% of the cases, the hypothesized model was not rejected, supporting the adequacy of the model.

To measure the predictive performance of the proposed model and guard against possible over-fitting, the highest probability density region (HPDR) of the fitted model was computed from the training data, and it was then checked whether the test data fell within the calculated HPDR. **Table S2**, which shows mean prediction error rates across subjects, demonstrates that the cross-validation error rate never exceeded 5% for any fold, indicating good predictive performance and arguing against over-fitting. **Figure 2A** shows a fitted density function and the corresponding HPDR calculated from the training data of a particular validation fold of one subject. The test data, as seen from the figure, fall inside the indicated HPDR.

The Gaussian distribution has previously been used to model behavioral data successfully (Park et al., 2017). Hence, the proposed model was compared with a two-component Gaussian mixture model. The median of the likelihood ratios across subjects for a given (k<sub>1</sub>, k<sub>2</sub>) indicates that, in all but two cases (out of 30), the proposed model outperformed the Gaussian mixture model in terms of explaining the data (refer to **Table S3**).

#### 3.1.1. Effect of Cues on Individual Choice

The effect of cues on individual decision was studied using a distance metric between k<sub>1</sub> and the estimated modes of the fitted model (see Equation 6). Using a bootstrap resampling technique on mean distance per (k<sub>1</sub>, k<sub>2</sub>) pair, it can be observed that, post-cue, there was a significant shift in ratings when decisions from all subjects were pooled together (**Table S4**). Furthermore, to check whether this was also true for individual decisions, an additional analysis was carried out. If the proposed model predicted a mode in the direction of the social cue, the proportion of decisions between k<sub>1</sub> and k<sub>2</sub> was calculated by integrating the estimated density within the said interval. A significant proportion of decisions, as assessed by our model, was observed to lie between k<sub>1</sub> and k<sub>2</sub> (refer to **Tables S5–S8**), suggesting that, in general, subjects tend to be influenced by the social choice, irrespective of whether it conforms to their individual bias.

FIGURE 2 | Behavioral Data Analysis. (A) Estimated probability density function based on the training data set shown for one subject when *k*<sub>1</sub> = 3 and *k*<sub>2</sub> = 10. Bold lines on x-axis represent the 95% HPDR, and red stars represent the test observations for a subject. The test observations fall within the HPDR. (B) Increase in average proportions of decisions around the individual decision when viewing concurring cues vs. viewing neutral cues. The left part of the figure considers cases when the individual decision was a face, while the right part considers cases when it was a car. The bold dots depict the average across the individuals. (C) Mean proportion of decisions toward conflicting cues across individuals. Figure shows that crossover happens for all cases of individual confidence and is most prominent when individual decision confidence is low (5,6). Error bars denote ± SD. (D) Social bias ranking of subjects, indicating their tendency to be influenced by the cue shown. Larger and darker dots indicate subjects that are more socially influenced. The dotted line parallel to the x-axis depicts the significance level.

#### 3.1.2. Effect of Concurring Cues

In order to check whether the decision confidence increased when the subject was given a cue concurring with his/her own judgment, the area under the fitted density given the concurring cue ("FF," "CC") was compared with that of a neutral cue ("FC"/"CF") (see **Tables S9, S10**). These areas were assumed to be indicative of the proportion of decisions of the subjects around the individual decision. As compared to the neutral cue, for most of the subjects, the average proportion of decisions in the region [1, 6] was greater when individual choice was a face and the social cue was also a face. Similarly, this proportion in the region [7, 11] was greater when the individual and social choices were both a car. Thus, it can be concluded (refer to **Figure 2B**) that the decision confidence of most subjects increased when provided with concurring social information (FF/CC).

#### 3.1.3. Effect of Conflicting Cues

Further analysis was carried out to check whether there was a significant reversal in the decisions when the subject faced a cue contradictory to his/her individual decision. We say that there is a cross-over if there exists a mode on the opposite side of the decision boundary. Cross-over under the influence of concurring cues was found to be insignificant (in terms of area) compared with that under conflicting cues (see **Table S14**) and was hence ignored. For every k<sub>1</sub>, it was examined whether cross-over exists given a mismatch between the social cue and the individual choice. Using bootstrapping, it was shown that the proportion of cross-over was significant among the individuals. This is evident from the approximate achieved significance level (ASL) (Efron and Tibshirani, 1994) contained in **Table S11**. **Figure 2C** shows that the mean cross-over proportion increased with a decrease in individual confidence, implying that, in general, subjects tend to be influenced more by contradictory cues on images where their individual confidence was low. Refer to **Tables S12, S13** for a detailed list of the cross-over proportions per subject.

#### 3.1.4. Cue-Based Ranking of Subjects

Individuals differ in the manner in which social information influences their perceptual decision. Using the proposed behavioral model, it is possible to rank the subjects based on the level of influence social information had on their percept. **Figure 2D** shows the ranking of subjects based on a measure, called the social bias score, that captures their tendency to be influenced by social information. Based on the analysis, eight<sup>1</sup> subjects were selected as those most affected by cues and are referred to as chosen subjects in the EEG analysis.

### 3.2. Neural Results

#### 3.2.1. ERP Analysis

ERP analysis was performed on average-referenced and baseline-subtracted EEG signals for each condition. Epochs of a particular channel were marked noisy if their respective absolute differences from the median exceeded five times the interquartile range. Such noisy epochs were not considered for further ERP analysis. It is well-known that parieto-occipital electrodes show differential activity when perceiving faces and cars (Rossion et al., 2003). Several studies have hypothesized the role of the frontal cortex in choice manipulation under the influence of social information (Mason and Graham, 2002; Berns et al., 2010; Klucharev et al., 2011; Izuma and Adolphs, 2013). To explore the effect of the decision of others on face/car percepts, ERP analysis was carried out with parieto-occipital and fronto-central electrodes separately. To elucidate whether different types of comments induce different neural processing mechanisms, the grand average difference waves were plotted (refer to **Figure 3**) for correctly guessed face and car trials. A difference in face and car ERPs was visible across both fronto-central and parieto-occipital electrodes around 200 ms post-stimulus onset, closely following the N170 component (Bentin et al., 1996), which is known to be enhanced more in face than in non-face ERPs. The difference between concurring and conflicting conditions, however, seemed more prominent around 250–300 ms post-stimulus onset in both parieto-occipital and fronto-central electrodes. Further analysis was carried out using single-trial multivariate analysis.

#### 3.2.2. Single-Trial Multivariate Analysis

A pattern classifier was used to analyze single-trial EEG signals corresponding to the different types of cues. To quantify the predictive accuracy of the classifier, the posterior probabilities obtained from 10-fold cross-validation were used to calculate the area under the ROC curve (AUC). The AUCs were averaged across the subjects.

Multivariate analysis was performed on the entire post-stimulus dataset using all channels and all time points, and AUCs corresponding to the different conditions were plotted (**Figure 4A**). The classification accuracy appeared to be greater when the subject was provided with a cue that concurred with his/her individual guess than when he/she was provided with a conflicting cue (p = 0.0213, df = 14, t = 2.2314). An overall increase in difference was noted between the conditions (p = 0.0038, df = 7, t = 3.7147, corresponding to the null hypothesis of no difference in the classification rates between the two conditions) when an average over chosen subjects was considered (**Figure 4A**). The pattern analysis was executed separately using EEG data for all electrodes across different time windows, each having a length of 50 ms. AUCs corresponding to the late sensory period (200–450 ms after stimulus onset) were found to be significantly more than chance (p-value < 0.05, false discovery rate (FDR) corrected) for concurring trials.

<sup>1</sup>Out of the 17 subjects, two had only high-confidence trials and hence were not considered. Out of the 15 remaining, eight were found to be significantly more affected by the cues than the rest.

FIGURE 3 | (A,B) Show the grand average of difference ERPs (Face − Car) over parieto-occipital and fronto-central electrodes, respectively, across the three types of conditions: Concurring, Conflicting, and Neutral. A sharp peak in the difference waveforms is observed post-200 ms across all conditions. Difference between conflicting and concurring cues seems more prominent around 250–300 ms.

subjects. The effect is more prominent in case of the chosen subjects. Error Bars indicate ± SEM. (B) Plot of average AUC across all subjects at different time points. The increase in AUC is most pronounced in the 200–300 ms post-stimulus interval. The difference between the AUCs of concurring and conflicting trials is most significant (*p* = 0.054, FDR corrected) in the 200–250 ms window (marked using \*). (C) Topoplot of one subject showing per electrode per time window single trial classification under different cue conditions. Average AUCs of all the channels for successive time windows are shown. There appears to be a significant involvement of the frontal and occipital electrodes 200–350 ms post-stimulus onset. Color bar depicts the value of AUC.

Further analysis showed that the difference between AUCs of concurring and conflicting cues was statistically significant only in the time window 200–250 ms [p-value (without multiple correction) = 0.01, t = 2.585, df = 14, FDR corrected p-value = 0.054, multiple hypothesis test performed across time points where the classification rates corresponding to concurring trials are more than chance]. On performing a similar time-window analysis on the chosen subjects, it was seen that the difference stood out as statistically significant [p-value (without multiple correction) = 2.40 × 10<sup>−5</sup>, t = 8.8377, df = 7, FDR corrected p-value << 0.05] in the 200–250 ms time window.

**Figure 4B** clearly depicts that around 200–250 ms after stimulus onset, there was a sharp increase in the AUC value and the peak was more pronounced for concurring cues. Notably, prominent activity in fronto-central and occipitotemporal electrodes in a similar time window was also observed during ERP analysis.

Additional classifier analysis was carried out using data for each electrode separately for each of the time windows (**Figure 4C**), and the plot of scalp topography on the basis of the classifier performances (see **Figure 4C**) for individual electrodes seems to be consistent with the temporal findings (**Figure 4B**). Around 200–300 ms post-stimulus onset, we observe increased classification accuracy in the parieto-occipital regions and fronto-central regions across all conditions (concurring, conflicting, and neutral). In these regions, the magnitude of the AUCs was greater in case of concurring trials than in conflicting and neutral trials (see **Figure 4C**). The classifier results demonstrate that social decisions have an effect on individual perceptual decision and that it is most prominent around 200–300 ms post-stimulus onset.

#### 3.2.3. Source Reconstruction Results

Single-trial multivariate data analysis and ERP analysis revealed prominent discriminatory activity 200 ms post-stimulus onset. Source estimates identified more frontal activity under the influence of conflicting cues than with concurring cues (refer to **Figure 5**). Frontal sources seem to be primarily responsible for generating differences in the ERP waveforms of face and car trials across the whole neural timeline for conflicting trials, while a prominent fronto-parietal interplay was noticed in case of concurring and neutral trials. Particularly, the medial frontal gyrus seems to have contributed significantly in the presence of conflicting cues, in line with previous studies that also highlight the role of the medial frontal cortex during social conformity and cognitive dissonance (Klucharev et al., 2009; Berns et al., 2010; Izuma and Adolphs, 2013). The neural sources of the difference in the current density power between the concurring and conflicting conditions were analyzed using sLORETA with a one-tailed F-ratio test (concurring < conflicting) on paired data separately for the 200–250 and 250–300 ms time windows. Based on the results of the exceedance proportion test (Friston et al., 1990, 1991), which showed a threshold of 2.38 for a p-value of 0.058 for the 200–250 ms window and a threshold of −2.169 for a p-value of 0.059 for the 250–300 ms window, differences were localized mostly to the frontal areas (refer to **Tables S16, S17** for the complete list). We found the maximal differences in the medial frontal gyrus (BA 10, MNI coordinates: x = 40, y = 55, z = 0) in both the cases (refer to **Figures 5E,F** and **Tables S16, S17**).

#### 3.2.4. Neural Analysis of Cue Data

We performed an additional analysis of the neural signals recorded while the cue was displayed. We extracted the EEG signals locked to the cue onset. The 500-ms post-cue onset data were used to perform multivariate pattern analysis to explore the effects of expectation on early sensory processing. If the participants' responses were driven by the cues, then we would expect a higher classification rate for images selected as faces post-cue onset when preceded by an "FF" cue and vice versa for "CC" cues. However, pattern analysis of the cue data revealed no such trends (refer to **Figure 6**) and resulted in chance performance for all conditions (p > 0.05). A two-way ANOVA was performed to test for statistically significant differences among the four cue conditions, taking into account face and car trials separately, along with interactions. None of the differences was significant (see **Table S15**), indicating that classification accuracy did not differ across the cue conditions, including the condition where no cue was shown. It is interesting to note that similar chance performance was also observed in pre-stimulus and early post-stimulus (<200 ms) neural classification. Thus, based on the cue analysis, it seems unlikely that the participants' decision was influenced by cue-based expectation bias in the post-cue onset and early visual processing stage following the stimulus display.

### 4. DISCUSSION

How social decisions affect individual decision-making has been explored in social psychology since the 1940s, starting with the research on social conformity by Solomon Asch (Asch and Guetzkow, 1951; Asch, 1955; Tajfel, 1982). With the advent of social media, there has been a renewed interest in social cues influencing our decisions (Jansen et al., 2009; Bond et al., 2012; Gonçalves and Perra, 2015; Jayles et al., 2017). In the current study, how people respond to social information when performing a perceptual decision-making task was explored systematically. The neural mechanism of the decision-making process was studied while the subjects used cues in the form of the decision of two other well-performing subjects to perceive noisy images of faces and cars. Although the cues shown to the subject were non-informative, with an equal number of FF, neutral, and CC cues per stimulus displayed in a random order, they were found to be successful in manipulating percept. Most of the studies on social influence require participants to make a decision with and without social cues sequentially, but we demonstrate that, irrespective of the order in which the stimulus/cue was presented, cues always have a similar effect on individual decision-making. We conclude that the perceptual decision of the subject under the influence of the cue depends on two factors: his/her individual perception of the image, as reflected in his/her confidence ratings on the same images without any cue, and the social information presented to him/her. The distribution of confidence ratings under the influence of a cue is bimodal, with one mode corresponding to the individual decision and the other to the social cue (**Figure 1B**), with a significant proportion in the direction of the cue. We can thus infer that although there was a general tendency to adhere to one's individual decision, subjects' decision confidence could be altered by social influence. 
This shift in decision confidence varied between the subjects, as reported in previous studies (Jayles et al., 2017). Using the proposed computational model, the heterogeneity of the influence of cues on the subjects' decision was quantified successfully. The subjects were ranked based on the influence the cues elicited, and the findings used in subsequent neural analysis produced encouraging results.

Although social influence on perceptual decisions remains a highly researched topic, the neural mediators of the manipulation of perceptual decisions by social influence remain largely unexplored (Mason and Graham, 2002; Berns et al., 2010; Klucharev et al., 2011; Izuma and Adolphs, 2013). The difference in performance under the influence of concurring and conflicting cues is most prominent in the 200–300 ms interval. Similar differences between conflict and no-conflict trials have been reported in recent papers (Shestakova et al., 2012; Zubarev et al., 2017). This time interval can potentially reflect an interaction between the social cues provided and the sensory information. It is interesting to note that the time window corresponds with the timing of feedback-related negativity (FRN) (Holroyd and Coles, 2002) and task difficulty (Philiastides et al., 2006). The mean AUC value peaks around 200–300 ms in trials with concurring cues. This implies that the classifier could identify the class-specific discriminatory activity and predict the participants' decision more accurately when the cue received matched his/her individual perception. This corroborates our claim that the subjects were more sure about their decisions when the stimulus was preceded by a concurring cue. The effect is better defined for car trials, probably because car images impose a heavier mental load than faces. Humans are adept at face perception (Leopold and Rhodes, 2010), and the stimuli displayed had uniform noise for both faces and cars, thereby making the car-detection task comparatively difficult. **Figure 2B** shows this effect for concurring cues, where the increase in decision confidence was more prominent for CC cues than for FF cues. A similar trend is noticed for conflicting cues (**Figure 2C**), where significant reversal of decision in the direction of the social information was noticed and the proportion of crossover was greater for trials originally detected as cars. 
Almost all the existing neuroimaging studies using social cues suggest the role of the posterior medial frontal cortex (pMFC) and, to some extent, the ventral striatum (Klucharev et al., 2009; Berns et al., 2010; Izuma and Adolphs, 2013) in social conformity, but the neural mechanism remains poorly understood. Current research shows that activation in the pMFC is modulated by the difference between individual choice and group preference. The role of the pMFC in social conformity is further strengthened by a TMS study (Klucharev et al., 2011) where participants showed reduced social conformity when the pMFC was disrupted. One plausible interpretation of the involvement of the pMFC could be that conforming to social opinions triggers similar circuitry as does reinforcement learning (Klucharev et al., 2009). Neural activity in the pMFC might mirror activity similar to a prediction error signal, which can then subsequently be used to modify or strengthen the perceptual decision. In the current study, source analysis of ERP signals using conflicting cues also shows activity in the medial frontal cortex (MFC), starting around 200 ms post-stimulus onset. Neural signals following conflicting cues displayed comparatively greater frontal activity than concurring cues (**Figure 5**), possibly suggesting greater top-down processing of information when cues mismatch perceptual choice. It is particularly interesting to note that the MFC is active in the time interval immediately following the well-established N170 component, which is known to account for the difference between faces and cars (Daniel and Bentin, 2012). Possibly, the mismatch between the top-down expectation produced by the cue and the bottom-up sensory information triggered activity in the MFC, which has been reported to play a role in social conformity (Klucharev et al., 2009; Izuma, 2013). 
The medial frontal cortex perhaps generates a signal that encodes the difference between individual percept based on the stimulus and the group decision given by the cues. The absence of frontal activity for concurring cues in the same time interval further supports our claim. The strength of MFC activity has been shown to regulate the level of subsequent adjustment of individual choice (Berns et al., 2010). Hence the MFC activation was more pronounced for chosen subjects. Our results seem to suggest that, irrespective of stimulus order, neural circuitry similar to that reported in existing social conformity studies was active in making perceptual decisions under the influence of social cues.

There has been extensive research on face and object perception in the last few decades that has revealed significant involvement of various occipito-parietal regions in the early stages of visual processing (<200 ms) (Rossion et al., 2003). Additionally, there is a significant body of work finding that stimulus expectation leads to changes in early sensory processing (Carlsson et al., 2000; Kok et al., 2012a,b, 2013, 2014, 2016, 2017; Jiang et al., 2013; John-Saaltink et al., 2015; Todorovic et al., 2015; Sherman et al., 2016). It has been demonstrated in numerous studies that expectation about a stimulus in the form of predictive cues leads to a stimulus bias. Top-down expectation effects can be seen in the form of improvement in stimulus representation (Kok et al., 2012a), generation of a stimulus template in striate and extrastriate regions (Puri et al., 2009; Kok et al., 2014), and even reduction in amplitude in neural signals leading to an "expectation suppression" effect (Todorovic and de Lange, 2012). On the whole, top-down expectations in the form of predictive cues have been shown to bias neural activity in the pre-stimulus and early sensory processing stage, thereby orienting the bottom-up sensory information toward one perceptual decision. On the other hand, recent studies have questioned the role of neural expectation in the sensory cortex (Bang and Rahnev, 2017; Rungratsameetaweemana et al., 2018). In our study, however, probing into the neural time series unveiled no significant differences in perception under the influence of different social cues during early stages. We systematically analyzed the effect of social decision and found no significant effect of the cues before stimulus onset, post-cue onset, and immediately following stimulus onset. 
We extracted the neural data locked to cue presentation and used a multivariate pattern classifier on the cue data alone to show that the cue data were not indicative of any early top-down expectation based effect on the stimuli (see **Figure 6**). Our results seem to suggest, unlike studies involving predictive cues (Summerfield and De Lange, 2014), that expectation by virtue of social influence does not affect early sensory processing. It is worthwhile to note here that our cues were essentially social decisions of others instead of cues predictive about the stimulus itself (Summerfield and Koechlin, 2008; Summerfield and De Lange, 2014), which could possibly explain the lack of top-down expectation signals seen in the early sensory cortex in previous studies (Summerfield and De Lange, 2014). Our results seem to suggest the role of downstream processing in using the social information from the cue provided, similar to the concepts of Bayesian Decision Theory (Maloney and Mamassian, 2009) and Signal Detection Theory (Green and Swets, 1988; Macmillan and Creelman, 2004).

Overall, we conclude that perceptual decision and confidence are influenced by social information and that it is possible to compute the extent of influence using statistical modeling. Neural data analysis alludes to a role for the medial frontal cortex in perceptual decision under social influence. We found no expectation-related bias in early sensory processing using social information cues. Future studies could possibly focus on experiments using actual social groups to validate the neural results found in the current research.

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

#### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institute Ethics Committee at the Indian Institute of Science Education and Research Kolkata, India, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki.


### AUTHOR CONTRIBUTIONS

BG and KD designed the experiment. BG and AS collected the data. TS, SM, and KD analyzed the data and wrote the manuscript.

### FUNDING

This work was funded by a grant from Cognitive Science Initiative, Department of Science and Technology, India (Grant No: SR/CSI/78/2012, Grant Period: 2013–2017) to KD. Funds were used for EEG System Setup. TS was supported by an INSPIRE fellowship from the Department of Science and Technology (DST), Government of India.

#### ACKNOWLEDGMENTS

The authors are thankful to the two reviewers for many suggestions that improved the article significantly.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.01371/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Saha Roy, Giri, Saha Chowdhury, Mazumder and Das. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Gait Generation and Its Energy Efficiency Based on Rat Neuromusculoskeletal Model

Misaki Toeda<sup>1</sup>, Shinya Aoi<sup>2</sup>\*, Soichiro Fujiki<sup>3</sup>, Tetsuro Funato<sup>4</sup>, Kazuo Tsuchiya<sup>2</sup> and Dai Yanagihara<sup>1</sup>\*

*<sup>1</sup> Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan, <sup>2</sup> Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University, Kyoto, Japan, <sup>3</sup> Department of Physiology and Biological Information, School of Medicine, Dokkyo Medical University, Tochigi, Japan, <sup>4</sup> Department of Mechanical Engineering and Intelligent Systems, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan*

#### Edited by:

*Jun Izawa, University of Tsukuba, Japan*

### Reviewed by:

*Srinivasa Chakravarthy, Indian Institute of Technology Madras, India Gabriella Lindgren, Swedish University of Agricultural Sciences, Sweden*

\*Correspondence:

*Shinya Aoi shinya\_aoi@kuaero.kyoto-u.ac.jp Dai Yanagihara dai-y@idaten.c.u-tokyo.ac.jp*

#### Specialty section:

*This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience*

Received: *29 June 2019* Accepted: *27 November 2019* Published: *17 January 2020*

#### Citation:

*Toeda M, Aoi S, Fujiki S, Funato T, Tsuchiya K and Yanagihara D (2020) Gait Generation and Its Energy Efficiency Based on Rat Neuromusculoskeletal Model. Front. Neurosci. 13:1337. doi: 10.3389/fnins.2019.01337*

Changing gait is crucial for adaptive and smooth animal locomotion. Although it remains unclear what makes animals decide on a specific gait, energy efficiency is an important factor. It has been reported that the relationship of oxygen consumption with speed is U-shaped for each horse gait and that different gaits have different speeds at which oxygen consumption is minimized. This allows the horse to produce energy-efficient locomotion in a wide speed range by changing gait. However, the underlying mechanisms causing oxygen consumption to be U-shaped and the speeds for the minimum consumption to be different between different gaits are unclear. In the present study, we used a neuromusculoskeletal model of the rat to examine the mechanism from a dynamic viewpoint. Specifically, we constructed the musculoskeletal part of the model based on empirical anatomical data on rats and the motor control model based on the physiological concepts of the spinal central pattern generator and muscle synergy. We also incorporated the posture and speed regulation models at the levels of the brainstem and cerebellum. Our model achieved walking through forward dynamic simulation, and the simulated joint kinematics and muscle activities were compared with animal data. Our model also achieved trotting by changing only the phase difference of the muscle-synergy-based motor commands between the forelimb and hindlimb. Furthermore, the speed of each gait varied by changing only the extension phase duration and amplitude of the muscle-synergy-based motor commands and the reference values for the regulation models. The relationship between cost of transport (CoT) and speed was U-shaped for both the generated walking and trotting, and the speeds for the minimum CoT were different for the two gaits, as observed in the oxygen consumption of horses. We found that the resonance property and the posture and speed regulations contributed to the CoT shape and difference in speeds for the minimum CoT. We further discussed the energy efficiency of gait based on the simulation results.

Keywords: rat, walk, trot, energy efficiency, central pattern generator, muscle synergy, neuromusculoskeletal model

### 1. INTRODUCTION

Animals can generate adaptive and smooth locomotion in various conditions. One important strategy for such locomotion is the use of different gaits. For example, quadruped animals walk, amble, trot, pace, canter, and gallop. Although gait is the motor outcome of a complicated and redundant musculoskeletal system controlled by the central nervous system, it is largely unclear what makes animals decide on a gait. One important factor for deciding gait is the energy efficiency of locomotion; that is, animals want to minimize the cost of transport (CoT). In particular, it has been reported that the relation between oxygen consumption and speed is U-shaped for each horse gait and that different gaits have different speeds at which oxygen consumption is minimized (**Figure 1**) (Hoyt and Taylor, 1981). Walking, trotting, and galloping are energy-efficient at low, middle, and high speeds, respectively. Walking and trotting share a common speed range, as do trotting and galloping. Therefore, horses can produce energy-efficient locomotion over a wide speed range by changing their gait. However, the underlying mechanisms making the oxygen consumption U-shaped and the speeds for minimum consumption different between gaits remain unclear.

Locomotion is generated through interactions between the central nervous system, musculoskeletal system, and environment. It is difficult to fully analyze the locomotor mechanism with animal data alone. Recently, modeling studies have attracted attention because physiological findings and hypotheses can be used to develop reasonably realistic motor control models, and biomechanical and anatomical findings can be used to construct detailed musculoskeletal models (Ivashko et al., 2003; Yakovenko et al., 2004; Ekeberg and Pearson, 2005; Nishii, 2006; Aoi et al., 2013a; Fukuoka et al., 2015; Hunt et al., 2015; Aoi and Funato, 2016; Markin et al., 2016; Fujiki et al., 2018). Motor control and musculoskeletal models are integrated to produce locomotion through forward dynamics simulation. This allows the locomotor mechanism to be examined from a dynamic viewpoint.

In this study, we investigated the energy efficiency of gait using a rat neuromusculoskeletal model. Specifically, we constructed a musculoskeletal model composed of the trunk, forelimbs, and hindlimbs based on anatomical data. This model is an improvement on our previous rat hindlimb model (Aoi et al., 2013a). We also improved our previous motor control model to control the rat four-limb model. The motor control model was developed based on the hypothetical two-layer central pattern generator (CPG) model at the spinal cord level (Burke et al., 2001; Rybak et al., 2006) and the muscle synergy hypothesis (Tresch et al., 1999; Todorov and Jordan, 2002; d'Avella et al., 2003; Ting and Macpherson, 2005; Ivanenko et al., 2006; Drew et al., 2008; Takei et al., 2017), which describes a simple control strategy for redundant motor systems. Furthermore, we incorporated movement regulation models at the levels of the brainstem and cerebellum through brainstem descending pathways. We simulated the walking of our model and compared the simulation results with animal data. In addition, we simulated trotting and changed the speed of each gait using simple motor control strategies. We calculated the CoT of walking and trotting for the generated speeds and, in this paper, we discuss the energy efficiency of gait based on the simulation results.

## 2. METHOD

### 2.1. Musculoskeletal Model

We developed a rat musculoskeletal model based on our previous model, which focused on the hindlimbs without incorporating the forelimbs (Aoi et al., 2013a). The skeletal part of the model consists of eleven rigid links representing the trunk (including the head), forelimbs (two links each), and hindlimbs (three links each), as shown in **Figure 2**. This model is two-dimensional, and the walking behavior is constrained to the sagittal plane. When the brachium and antebrachium are in a straight line and perpendicular to the trunk, the shoulder angle is 120° and the elbow angle is 180°. When the thigh, shank, and foot are in a straight line and perpendicular to the trunk, the hip angle is 120° and the knee and ankle angles are both 180°. The joint angles increase as the joints extend. We modeled the contact between the limb tips and the ground using viscoelastic elements. We derived the equations of motion using Lagrangian equations and solved them using the fourth-order Runge-Kutta method with a time step of 0.02 ms.
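
As a rough illustration of the numerical scheme described above, the following Python sketch integrates a single point mass dropped onto a viscoelastic (spring-damper) ground with a fourth-order Runge-Kutta step of 0.02 ms. It is only a minimal stand-in for the full Lagrangian model: the mass, stiffness `k`, damping `c`, drop height, and the purely vertical contact are assumptions made for this example and are not values from the present model.

```python
G = 9.81  # gravitational acceleration in m/s^2


def ground_force(y, v, k=1000.0, c=20.0):
    """Vertical force from a spring-damper ground at y = 0 (k and c are hypothetical)."""
    if y >= 0.0:
        return 0.0  # the point is above the ground: no contact
    return max(0.0, -k * y - c * v)  # the ground can push but never pull


def deriv(state, m):
    """Time derivative of (height, vertical velocity) for a point mass m."""
    y, v = state
    return (v, ground_force(y, v) / m - G)


def rk4_step(state, dt, m):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(state, m)
    k2 = deriv(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)), m)
    k3 = deriv(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)), m)
    k4 = deriv(tuple(s + dt * d for s, d in zip(state, k3)), m)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c_ + d)
        for s, a, b, c_, d in zip(state, k1, k2, k3, k4)
    )


def settle(m=0.3, y0=0.01, dt=2e-5, t_end=1.0):
    """Drop a hypothetical 0.3 kg mass from 1 cm and integrate for t_end seconds."""
    state = (y0, 0.0)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, m)
    return state


if __name__ == "__main__":
    y, v = settle()
    print(f"final height {y:.6f} m, final velocity {v:.2e} m/s")
```

Under these assumed parameters the mass should come to rest where the spring force balances gravity, at a height of −mg/k.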

For the muscle part of the model, we used six principal muscles for each forelimb: four uniarticular, namely shoulder extension (supraspinatus, SSP), shoulder flexion (spinodeltoideus, SPD), elbow flexion (brachioradialis, BR), and elbow extension (triceps lateral head, TRIL), and two biarticular, namely shoulder extension and elbow flexion (biceps, BIC) and shoulder flexion and elbow extension (triceps, TRI), as shown in **Figure 2**. We used seven principal muscles for each hindlimb: five uniarticular, namely hip flexion (iliopsoas, IP), hip extension (gluteus maximus, GM), knee extension (vastus lateralis, VL), ankle flexion (tibialis anterior, TA), and ankle extension (soleus, SO), and two biarticular, namely hip extension and knee flexion (biceps femoris, BF) and knee flexion and ankle extension (gastrocnemius, GA). The moment arms of the muscles around the joints are constant, regardless of joint angles. Each muscle generates muscle tension $F_m$ ($m$ = SSP, SPD, BR, TRIL, BIC, TRI, IP, GM, VL, TA, SO, BF, and GA) through contractile and passive elements, which is given based on Aoi et al. (2013a) by

$$F_m = F_m^{\max} \left( a_m F_m^l F_m^v + F_m^p \right) \tag{1}$$

where $F_m^{\max}$ is the maximum muscle tension, $a_m$ is the muscle activation ($0 \le a_m \le 1$), $F_m^l$ is the force-length relationship, $F_m^v$ is the force-velocity relationship, and $F_m^p$ is the passive component. The muscle lengths were normalized by $l_m^{\max}$, which was set so that all uniarticular muscles had a length of 85% of $l_m^{\max}$ and all biarticular muscles had a length of 75% of $l_m^{\max}$ at a neutral posture with the shoulder at 60°, the elbow at 85°, the hip at 70°, the knee at 90°, and the ankle at 100°. In addition, 2° of joint motion corresponded to 1% of muscle length change, except for BIC and GA (4.5° at the shoulder for BIC, 1.5° at the ankle or 4.5° at the knee for GA). The muscle contractile velocities were normalized by $1.8\,l_m^{\max}$.

The muscle activation $a_m$ is determined through

$$
\tau_{\text{act}} \dot{a}_m + \left\{ \frac{\tau_{\text{act}}}{\tau_{\text{deact}}} + \left( 1 - \frac{\tau_{\text{act}}}{\tau_{\text{deact}}} \right) u_m \right\} a_m = u_m \tag{2}
$$

where $\tau_{\text{act}}$ and $\tau_{\text{deact}}$ are the activation and deactivation time constants (11 and 18 ms, respectively) and $u_m$ is the motor command determined in the motor control model.
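
To make the behavior of the activation dynamics in (2) concrete, the following Python sketch solves (2) for the rate of change of the activation and integrates it for a constant motor command. The time constants are the 11 and 18 ms stated above. The forward-Euler scheme, the step size, and the function names are choices made for this illustration only and do not reproduce the Runge-Kutta integration used in the model.

```python
TAU_ACT = 0.011    # activation time constant in seconds (11 ms, from the text)
TAU_DEACT = 0.018  # deactivation time constant in seconds (18 ms, from the text)


def activation_rate(a, u):
    """Solve Eq. (2) for da/dt given activation a and motor command u."""
    ratio = TAU_ACT / TAU_DEACT
    return (u - (ratio + (1.0 - ratio) * u) * a) / TAU_ACT


def simulate_activation(u, t_end, a0=0.0, dt=2e-5):
    """Forward-Euler integration for a constant command u (illustrative scheme)."""
    a = a0
    for _ in range(int(round(t_end / dt))):
        a += dt * activation_rate(a, u)
    return a


if __name__ == "__main__":
    print("u = 1.0 ->", simulate_activation(1.0, 0.3))
    print("u = 0.5 ->", simulate_activation(0.5, 0.3))
    print("u = 0.0 after one tau_deact ->", simulate_activation(0.0, TAU_DEACT, a0=1.0))
```

Setting the derivative in (2) to zero gives the steady-state activation $a_m = u_m / \{r + (1 - r) u_m\}$ with $r = \tau_{\text{act}}/\tau_{\text{deact}}$, so a full command drives the activation to 1. With $u_m = 0$, (2) reduces to $\dot{a}_m = -a_m/\tau_{\text{deact}}$, which is why $\tau_{\text{deact}}$ acts as the deactivation time constant.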

#### 2.2. Motor Control Model

We developed a motor control model based on our previous work (Aoi et al., 2013a). It consists of the following two components: (1) a movement generator, which produces motor commands in a feedforward fashion at the spinal cord level to create periodic limb movements based on the muscle synergy hypothesis, and (2) a movement regulator, which creates motor commands to regulate locomotor behavior in a feedback fashion at the brainstem and cerebellar levels based on proprioceptive and somatosensory information. The motor command $u_m$ is the summation of the two components from the movement generator and the movement regulator, namely $u_m^{\text{Syn}}$ and $u_m^{\text{Reg}}$, respectively:

$$
u_m = u_m^{\text{Syn}} + u_m^{\text{Reg}} \tag{3}
$$

#### 2.2.1. Movement Generator

The movement generator is based on the hypothetical two-layer CPG model composed of a rhythm generator (RG) network, which produces rhythm and phase information for motor commands, and a pattern formation (PF) network, which produces spatiotemporal patterns of motor commands (Burke et al., 2001; Rybak et al., 2006).

For the RG model, we used four simple phase oscillators, each of which produces a basic rhythm and phase information for the corresponding limb. We used $\phi_i^j$ ($i$ = left, right; $j$ = fore, hind) for the oscillator phase ($0 \le \phi_i^j < 2\pi$), which follows the dynamics given by

$$
\begin{aligned}
\dot{\phi}_{\text{left}}^{\text{fore}} &= \frac{2\pi}{T} - K_1 \sin(\phi_{\text{left}}^{\text{fore}} - \phi_{\text{right}}^{\text{fore}} - \pi) - K_2 \sin(\phi_{\text{left}}^{\text{fore}} - \phi_{\text{left}}^{\text{hind}} + \Delta) \\
\dot{\phi}_{\text{right}}^{\text{fore}} &= \frac{2\pi}{T} - K_1 \sin(\phi_{\text{right}}^{\text{fore}} - \phi_{\text{left}}^{\text{fore}} - \pi) - K_2 \sin(\phi_{\text{right}}^{\text{fore}} - \phi_{\text{right}}^{\text{hind}} + \Delta) \\
\dot{\phi}_{\text{left}}^{\text{hind}} &= \frac{2\pi}{T} - K_1 \sin(\phi_{\text{left}}^{\text{hind}} - \phi_{\text{right}}^{\text{hind}} - \pi) - K_2 \sin(\phi_{\text{left}}^{\text{hind}} - \phi_{\text{left}}^{\text{fore}} - \Delta) \\
\dot{\phi}_{\text{right}}^{\text{hind}} &= \frac{2\pi}{T} - K_1 \sin(\phi_{\text{right}}^{\text{hind}} - \phi_{\text{left}}^{\text{hind}} - \pi) - K_2 \sin(\phi_{\text{right}}^{\text{hind}} - \phi_{\text{right}}^{\text{fore}} - \Delta)
\end{aligned} \tag{4}
$$

where $T$ is the gait cycle duration and $K_1$ and $K_2$ are gain parameters. The second term on the right-hand side ensures that the left and right limbs move in antiphase to maintain interlimb coordination. The third term on the right-hand side ensures that the ipsilateral limbs move with a relative phase of $\Delta$ to maintain interlimb coordination.
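
The convergence of the four oscillators toward the prescribed phase relationships can be checked numerically. The Python sketch below integrates the oscillator dynamics with forward Euler from arbitrary initial phases. The cycle duration, the gains `K1` and `K2`, the initial phases, and the integration settings are all hypothetical values chosen for illustration, because the values used in the model are not given in this section.

```python
import numpy as np

# Oscillator order used throughout this sketch.
LIMBS = ("left_fore", "right_fore", "left_hind", "right_hind")


def phase_rates(phi, T, K1, K2, delta):
    """Right-hand side of the four coupled phase-oscillator equations."""
    lf, rf, lh, rh = phi
    omega = 2.0 * np.pi / T
    return np.array([
        omega - K1 * np.sin(lf - rf - np.pi) - K2 * np.sin(lf - lh + delta),
        omega - K1 * np.sin(rf - lf - np.pi) - K2 * np.sin(rf - rh + delta),
        omega - K1 * np.sin(lh - rh - np.pi) - K2 * np.sin(lh - lf - delta),
        omega - K1 * np.sin(rh - lh - np.pi) - K2 * np.sin(rh - rf - delta),
    ])


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi


def simulate(delta, T=0.5, K1=10.0, K2=10.0, dt=1e-3, t_end=10.0,
             phi0=(0.3, 2.0, 1.0, 5.0)):
    """Forward-Euler integration; T, K1, K2, and phi0 are hypothetical."""
    phi = np.array(phi0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        phi = phi + dt * phase_rates(phi, T, K1, K2, delta)
    return phi


if __name__ == "__main__":
    for name, delta in (("walk-like", np.pi / 2.0), ("trot-like", np.pi)):
        lf, rf, lh, rh = simulate(delta)
        print(name,
              "left-right fore:", wrap(lf - rf),
              "hind minus fore (left):", wrap(lh - lf))
```

In this sketch, the left and right oscillators settle half a cycle apart, and each hindlimb oscillator leads its ipsilateral forelimb oscillator by $\Delta$, which is the fixed point at which all coupling terms vanish.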

For the PF model, we determined the motor commands necessary to produce periodic limb movements in accordance with the corresponding oscillator phase based on the muscle synergy hypothesis, which suggests that the linear combination of only a small number of basic signals produces a large portion of motor commands in animal locomotion (Ivanenko et al., 2006; Dominici et al., 2011; Markin et al., 2012; Catavitello et al., 2015; Rigosa et al., 2015). Specifically, we used four rectangular pulses $p_i$ ($i = 1, \ldots, 4$) for each limb, which are given by

$$p_i(\phi) = \begin{cases} 1 & \Phi_i \le \phi < \Psi_i \\ 0 & \text{otherwise} \end{cases} \quad i = 1, \ldots, 4 \tag{5}$$

where $\Phi_i$ and $\Psi_i$ ($i = 1, \ldots, 4$) are the onset and end phases of the pulse, respectively, and we omitted the suffix of $\phi$. $p_1$, $p_2$, $p_3$, and $p_4$ contribute to early extension, late extension, early flexion, and late flexion, respectively, as shown in **Figure 3** [extension and flexion phases start at $\Phi_1$ (= 0 rad) and $\Phi_3$, respectively]. We used the same values of $\Phi_i$ and $\Psi_i$ for the four limbs irrespective of whether they were forelimb or hindlimb. The motor command $u_m^{\text{Syn}}$ of the movement generator is given by

$$u_m^{\text{Syn}} = \sum_{i=1}^4 w_{m,i}\, p_i(\phi) \tag{6}$$

where $w_{m,i}$ ($i = 1, \ldots, 4$) is the weighting coefficient.
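
The construction of the muscle-synergy-based command from rectangular pulses can be sketched in a few lines of Python. Only the pulse shape and the weighted sum follow the text. The onset and end phases (apart from the first onset at 0 rad), the two-muscle weight matrix, and the function names are hypothetical placeholders for illustration.

```python
import numpy as np

# Hypothetical onset and end phases (rad) of the four pulses; only the first
# onset (0 rad) is stated in the text.
ONSET = np.array([0.0, 0.6, 1.2, 1.6]) * np.pi
END = np.array([0.6, 1.2, 1.6, 2.0]) * np.pi

# Hypothetical weights: rows are two example muscles, columns are the pulses.
WEIGHTS = np.array([
    [0.5, 0.2, 0.0, 0.0],  # an extensor-like muscle
    [0.0, 0.0, 0.3, 0.1],  # a flexor-like muscle
])


def pulses(phi, onset=ONSET, end=END):
    """Rectangular pulses: 1 where onset <= phi < end, 0 otherwise."""
    phi = np.mod(phi, 2.0 * np.pi)
    return ((onset <= phi) & (phi < end)).astype(float)


def synergy_command(phi, weights=WEIGHTS, onset=ONSET, end=END):
    """Weighted sum of the pulses for each muscle at oscillator phase phi."""
    return weights @ pulses(phi, onset, end)


if __name__ == "__main__":
    for phi in (0.1, 2.5, 4.0, 6.0):
        print(f"phi = {phi:.1f} rad -> command = {synergy_command(phi)}")
```

Because each muscle's command is a linear combination of the same four pulses, changing a gait parameter only requires shifting the pulse phases or scaling the weights, not redefining a separate waveform per muscle.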

#### 2.2.2. Movement Regulator

At the levels of the brainstem and cerebellum, locomotor behavior is regulated based on proprioceptive and somatosensory information (Takakusaki, 2017). For the rat, it is crucial to maintain body height and forward speed during locomotion (**Figure 4**). For simplicity, we focused on these two factors.

For the body height, we used simple feedback control for the standing limb. For the forelimbs, we used the BR and TRIL muscles to maintain the shoulder height. The motor command $p_m^{\text{height}}$ ($m$ = BR and TRIL) is given by

$$p_m^{\text{height}} = \begin{cases} -K_m^{\text{height}} (h^{\text{Shoulder}} - h_0^{\text{Shoulder}}) - D_m^{\text{height}} \dot{h}^{\text{Shoulder}} & \text{in stance phase} \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

where $h^{\text{Shoulder}}$ and $\dot{h}^{\text{Shoulder}}$ are the shoulder height and its rate, respectively, $h_0^{\text{Shoulder}}$ is the reference height, and $K_m^{\text{height}}$ and $D_m^{\text{height}}$ are gain parameters. For the hindlimbs, we used the VL, TA, and SO muscles to maintain the hip height. The motor command $p_m^{\text{height}}$ ($m$ = VL, TA, and SO) is given by

$$p_m^{\text{height}} = \begin{cases} -K_m^{\text{height}} (h^{\text{Hip}} - h_0^{\text{Hip}}) - D_m^{\text{height}} \dot{h}^{\text{Hip}} & \text{in stance phase} \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

where $h^{\text{Hip}}$ and $\dot{h}^{\text{Hip}}$ are the hip height and its rate, respectively, and $h_0^{\text{Hip}}$ is the reference height.

For the forward speed, we used simple feedback control for the standing limb. We used the SSP, SPD, IP, GM, TA, and SO muscles to maintain speed. The motor command $p_m^{\text{speed}}$ ($m$ = SSP, SPD, IP, GM, TA, and SO) is given by

$$p_m^{\text{speed}} = \begin{cases} -K_m^{\text{speed}}(v - v_0) & \text{in stance phase} \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

where $v$ is the forward speed, $v_0$ is its desired value, and $K_m^{\text{speed}}$ is a gain parameter.

FIGURE 4 | Movement regulator based on shoulder height, hip height, and forward speed. BR and TRIL muscles are used for shoulder height, VL, TA, and SO muscles are used for hip height, and SSP, SPD, IP, GM, TA, and SO muscles are used for speed.

The summation of these elements produces the motor command of the movement regulator. Because regulation is managed at the brainstem and cerebellar levels, the command signals are delayed and the motor command $u_m^{\text{Reg}}$ of the movement regulator is given by

$$
u_m^{\text{Reg}}(t) = u_m^{\text{height}}(t) + u_m^{\text{speed}}(t) \tag{10}
$$

where

$$\begin{aligned} u_m^{\text{height}}(t) &= p_m^{\text{height}}(t - \tau^{\text{Delay}}) \\ u_m^{\text{speed}}(t) &= p_m^{\text{speed}}(t - \tau^{\text{Delay}}) \end{aligned} \tag{11}$$

and $\tau^{\text{Delay}}$ (= 15 ms) is the delay between receiving the transmission of proprioceptive and somatosensory information at the brainstem and cerebellar levels and sending the motor command to the spinal cord level.
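
One simple way to realize the delayed feedback of the movement regulator in a fixed-step simulation is a first-in, first-out buffer holding the commands of the last 15 ms. The Python sketch below does this for a single muscle that receives both a height term and a speed term. The gains, reference values, class name, and the assumption that the command is zero before any delayed signal is available are all hypothetical choices for this illustration.

```python
from collections import deque


class DelayedRegulator:
    """Height and speed feedback for one muscle, delayed by a fixed time.

    All gains and reference values passed in are hypothetical; a muscle that
    takes part in only one regulation would use a zero gain for the other.
    """

    def __init__(self, k_height, d_height, k_speed, h_ref, v_ref,
                 delay=0.015, dt=2e-5):
        self.k_height = k_height
        self.d_height = d_height
        self.k_speed = k_speed
        self.h_ref = h_ref
        self.v_ref = v_ref
        # Buffer pre-filled with zeros: no command before the delay has elapsed.
        self._buffer = deque([0.0] * int(round(delay / dt)))

    def step(self, h, h_dot, v, in_stance):
        """Compute the undelayed command now and return the delayed one."""
        if in_stance:
            p_height = -self.k_height * (h - self.h_ref) - self.d_height * h_dot
            p_speed = -self.k_speed * (v - self.v_ref)
        else:
            p_height = 0.0
            p_speed = 0.0
        self._buffer.append(p_height + p_speed)
        return self._buffer.popleft()


if __name__ == "__main__":
    reg = DelayedRegulator(k_height=2.0, d_height=0.1, k_speed=1.0,
                           h_ref=0.05, v_ref=0.3, delay=0.015, dt=0.001)
    outputs = [reg.step(h=0.04, h_dot=0.0, v=0.2, in_stance=True)
               for _ in range(20)]
    print(outputs)
```

With a 1 ms step as in the usage example, the buffer holds 15 samples, so the first nonzero output appears on the sixteenth call. With the 0.02 ms step used in the model, the same delay corresponds to 750 samples.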

#### 2.3. Changing Gait and Speed

In this study, we focused on two gaits, namely walking and trotting. They are mainly classified by the footfall sequence. Specifically, four limbs move out of phase in walking, and diagonal limbs are paired in trotting (**Figure 5**). Right and left limbs move in antiphase in both walking and trotting. The major difference between the gaits is the relative phase between the ipsilateral limbs. To change the relative phase of the limb movements, we changed the relative phase of the muscle-synergy-based motor command $u_m^{\text{Syn}}$ between the ipsilateral limbs by changing $\Delta$ in (4). In particular, we used $\Delta = \pi/2$ for walking and $\Delta = \pi$ for trotting.

Animals change the gait cycle duration to vary speed, where the duration of the flexion phase for swinging the limb remains almost unchanged and the duration of the extension phase for supporting the body and producing the propulsive forces is changed substantially (Goslow et al., 1973; Heglund and Taylor, 1998; Clarke and Still, 1999; Górska et al., 1999; Yakovenko et al., 2005). In this study, we changed the speed by changing the duration of the extension phase $T_{\text{ex}}$ (= $\beta T$) using $\beta$ while keeping the duration of the flexion phase $T_{\text{fl}}$ unchanged ($T = T_{\text{ex}} + T_{\text{fl}} = T_{\text{fl}}/(1 - \beta)$), as shown in **Figure 6A**. For the nominal speed, which we determined from animal data as explained below, we used $\hat{\beta}$, $\hat{T}_{\text{ex}}$, $\hat{\Phi}_i$, $\hat{\Psi}_i$, $\hat{w}_{m,i}$ ($i = 1, \ldots, 4$), $\hat{h}_0^{\text{Shoulder}}$, $\hat{h}_0^{\text{Hip}}$, and $\hat{v}_0$ for motor control parameters $\beta$, $T_{\text{ex}}$, $\Phi_i$, $\Psi_i$, $w_{m,i}$ ($i = 1, \ldots, 4$), $h_0^{\text{Shoulder}}$, $h_0^{\text{Hip}}$, and $v_0$, respectively. The onset phase $\Phi_i$ and end phase $\Psi_i$ ($i = 1, \ldots, 4$) of each pulse are given by

$$\begin{aligned} \Phi_i &= \begin{cases} \dfrac{\beta}{\hat{\beta}} \hat{\Phi}_i & i = 1, 2\\ \dfrac{1-\beta}{1-\hat{\beta}} \hat{\Phi}_i + \dfrac{2\pi(\beta-\hat{\beta})}{1-\hat{\beta}} & i = 3, 4 \end{cases} \\ \Psi_i &= \begin{cases} \dfrac{\beta}{\hat{\beta}} \hat{\Psi}_i & i = 1, 2\\ \dfrac{1-\beta}{1-\hat{\beta}} \hat{\Psi}_i + \dfrac{2\pi(\beta-\hat{\beta})}{1-\hat{\beta}} & i = 3, 4 \end{cases} \end{aligned} \tag{12}$$

We decreased (increased) the extension phase duration to increase (decrease) the speed, which decreased (increased) the duration of pulses of the extension phase. To prevent the model from decreasing (increasing) the speed, we increased (decreased) the weighting coefficients $w_{m,i}$ ($i = 1, 2$) of the muscle-synergy-based rectangular pulses for the extension phase (**Figure 6B**) as

$$
w_{m,i} = \frac{1-\beta}{1-\hat{\beta}} \hat{w}_{m,i} \quad i = 1,2\tag{13}
$$

As we changed the locomotion speed, we also changed the reference height (shoulder, hip) and speed for the movement regulator (**Figures 6C–E**) as

$$\begin{aligned} h_0^{\text{Shoulder}} &= \hat{h}_0^{\text{Shoulder}} + \alpha^{\text{Shoulder}}(\beta - \hat{\beta}) \\ h_0^{\text{Hip}} &= \hat{h}_0^{\text{Hip}} + \alpha^{\text{Hip}}(\beta - \hat{\beta}) \\ v_0 &= \hat{v}_0 + \alpha^{\text{Speed}}(\beta - \hat{\beta}) \end{aligned} \tag{14}$$

where $\alpha^{\text{Shoulder}}$, $\alpha^{\text{Hip}}$, and $\alpha^{\text{Speed}}$ are coefficients.
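To make the dependence on β concrete, the scaling rules (12)–(14) can be written as a few small Python functions. This is only an illustrative sketch: the function names are hypothetical, and the default constants are the nominal values reported in section 2.4.2 (nominal β of 0.62 and a flexion phase of 0.10 s).

```python
import math

BETA_HAT = 0.62   # nominal duty factor (section 2.4.2)
T_FL = 0.10       # flexion phase duration in seconds, kept constant


def cycle_duration(beta, t_fl=T_FL):
    """Gait cycle duration T = T_fl / (1 - beta)."""
    return t_fl / (1.0 - beta)


def scale_phase(phase_hat, i, beta, beta_hat=BETA_HAT):
    """Scale a nominal onset or end phase of pulse i (1..4) as in (12)."""
    if i in (1, 2):  # pulses of the extension phase
        return beta / beta_hat * phase_hat
    ratio = (1.0 - beta) / (1.0 - beta_hat)  # pulses of the flexion phase
    return ratio * phase_hat + 2.0 * math.pi * (beta - beta_hat) / (1.0 - beta_hat)


def scale_weight(w_hat, beta, beta_hat=BETA_HAT):
    """Scale an extension-phase weighting coefficient (i = 1, 2) as in (13)."""
    return (1.0 - beta) / (1.0 - beta_hat) * w_hat


def scale_reference(ref_hat, alpha, beta, beta_hat=BETA_HAT):
    """Shift a regulator reference (height or speed) as in (14)."""
    return ref_hat + alpha * (beta - beta_hat)
```

With these definitions, every quantity returns its nominal value at the nominal β, and a flexion-phase boundary at 2π stays at 2π for any β, which is a quick consistency check on (12).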

#### 2.4. Model Parameters

#### 2.4.1. Parameters for the Musculoskeletal Model

To determine the physical parameters of the musculoskeletal model, we used seven adult male Wistar rats (body weight: 125 ± 10 g). The rats were deeply anesthetized, and their musculoskeletal features were measured. The experiments were approved by the Ethical Committee for Animal Experiments at the University of Tokyo and carried out in accordance with the Guidelines for Research with Experimental Animals of the University of Tokyo and the Guide for the Care and Use of Laboratory Animals (NIH Guide).

For the skeletal model, we measured several physical parameters of the rats, such as masses, joint positions, and distances between joints, and determined the model parameters from these measurements, as shown in **Table 1**. For the muscle model, we first electrically stimulated individual muscles and determined which joint movements were needed to verify our musculoskeletal model. We measured the attachment, direction, and physiological cross-sectional area (PCSA) for each muscle and determined the model parameters from these measurements, as shown in **Table 2**, where the maximum muscle tension $F_m^{\max}$ was determined based on the measured PCSA and the moment arms were determined from the center of the range of joint movement during locomotion.

#### 2.4.2. Parameters for the Motor Control Model

Based on measured data for rats walking on a treadmill at a speed of 0.4 m/s (Aoi et al., 2013a), we set the durations of the flexion and extension phases for the nominal speed as $T_{\text{fl}}$ = 0.10 s and $\hat{T}_{\text{ex}}$ = 0.16 s, respectively ($\hat{\beta}$ = 0.62). We determined the motor control parameters for the nominal speed as follows so that our model achieved steady walking based on our previous results of the hindlimb model (Aoi et al., 2013a): $\hat{\Phi}_1 = 0$, $\hat{\Phi}_2 = 0.40\pi$, $\hat{\Phi}_3 = 1.24\pi$, $\hat{\Phi}_4 = 1.42\pi$, $\hat{\Psi}_1 = 0.33\pi$, $\hat{\Psi}_2 = 0.89\pi$, $\hat{\Psi}_3 = 1.42\pi$, $\hat{\Psi}_4 = 1.71\pi$, $\hat{w}_{\text{SSP},1} = 0.24$, $\hat{w}_{\text{SSP},4} = 0.20$, $\hat{w}_{\text{SPD},2} = 0.27$, $\hat{w}_{\text{SPD},3} = 0.08$, $\hat{w}_{\text{BR},3} = 0.09$, $\hat{w}_{\text{TRIL},1} = 0.47$, $\hat{w}_{\text{TRIL},2} = 0.57$, $\hat{w}_{\text{BIC},3} = 0.17$, $\hat{w}_{\text{BIC},4} = 0.08$, $\hat{w}_{\text{TRI},1} = 0.27$, $\hat{w}_{\text{TRI},2} = 0.56$, $\hat{w}_{\text{IP},3} = 0.32$, $\hat{w}_{\text{IP},4} = 0.32$, $\hat{w}_{\text{GM},1} = 0.61$, $\hat{w}_{\text{GM},2} = 0.25$, $\hat{w}_{\text{VL},1} = 0.19$, $\hat{w}_{\text{VL},2} = 0.22$, $\hat{w}_{\text{TA},3} = 0.45$, $\hat{w}_{\text{TA},4} = 0.06$, $\hat{w}_{\text{SO},1} = 0.58$, $\hat{w}_{\text{SO},2} = 0.14$, $\hat{w}_{\text{BF},1} = 0.22$, $\hat{w}_{\text{BF},2} = 0.12$, $\hat{w}_{\text{BF},3} = 0.09$, $\hat{w}_{\text{GA},1} = 0.47$, $\hat{w}_{\text{GA},2} = 0.10$, $\hat{w}_{m,i} = 0$ for the other values of $m$ and $i$, $\hat{h}_0^{\text{Shoulder}}$ = 0.033 m, $\hat{h}_0^{\text{Hip}}$ = 0.054 m, $\hat{v}_0$ = 0.4 m/s, $K_{\text{BR}}^{\text{height}} = -2.07$, $K_{\text{TRIL}}^{\text{height}} = 2.07$, $K_{\text{VL}}^{\text{height}} = 12.4$, $K_{\text{TA}}^{\text{height}} = -12.4$, $K_{\text{SO}}^{\text{height}} = 12.4$, $D_{\text{BR}}^{\text{height}} = -0.001$, $D_{\text{TRIL}}^{\text{height}} = 0.001$, $D_{\text{VL}}^{\text{height}} = 0.006$, $D_{\text{TA}}^{\text{height}} = -0.006$, $D_{\text{SO}}^{\text{height}} = 0.006$, $K_{\text{SSP}}^{\text{speed}} = -0.007$, $K_{\text{SPD}}^{\text{speed}} = 0.007$, $K_{\text{IP}}^{\text{speed}} = -0.052$, $K_{\text{GM}}^{\text{speed}} = 0.052$, $K_{\text{TA}}^{\text{speed}} = -0.026$, $K_{\text{SO}}^{\text{speed}} = 0.026$, $K_1 = 20$, and $K_2 = 10$. In addition, we set the coefficients for the regulation of the references in the movement regulator to change the speed as $\alpha^{\text{Shoulder}}$ = 0.01 m, $\alpha^{\text{Hip}}$ = 0.01 m, and $\alpha^{\text{Speed}}$ = −4.7 m/s.

TABLE 1 | Physical parameters of skeletal model.

*MOI, moment of inertia around center of mass.*

*MA, moment arm of muscle around joint; s, shoulder; e, elbow; h, hip; k, knee; and a, ankle.*

### 2.5. Comparison With Animal Data

To evaluate our neuromusculoskeletal model, we compared the simulation results for walking with animal data. We used the joint angles of the hindlimbs measured in Aoi et al. (2013a), where rats walked on a treadmill at a speed of 0.4 m/s, and the joint angles of the forelimbs measured in Aoki et al. (2013), where intact rats walked at the average speed of 0.36 m/s in a custom-made runway box (length: 140 cm; width: 14 cm).

We used the electromyographic (EMG) data measured from the muscles of the hindlimbs in Aoi et al. (2013a) and the EMG data measured from two muscles (BIC and TRI) of the forelimbs in Aoki et al. (2013). Because we could not find EMG data for four muscles (SSP, SPD, BR, and TRIL) of the forelimbs of rats, we used EMG data for these muscles in cats, whose gait and joint movements are similar to those of rats, given in Drew et al. (2008), where cats walked on a treadmill at a speed of 0.35–0.45 m/s. In the comparison with the simulation results, we showed the EMG data so that their magnitudes are similar to those of simulated muscle activities.

#### 2.6. Evaluation of Cost of Transport

The energetic cost of locomotion for our simulation results for walking and trotting was estimated based on the mechanical energy exerted by muscles. Based on previous work (Ogihara et al., 2011), we calculated the CoT ε as

$$\varepsilon = \frac{W}{D} \tag{15}$$

where

$$\begin{aligned} W &= \eta_+ + \eta_-\\ \eta_+ &= \int_T \sum_m F_m [v_m]^+ dt\\ \eta_- &= \frac{1}{4} \int_T \sum_m F_m [-v_m]^+ dt \end{aligned}$$

$v_m$ is the contracting velocity of the muscle (positive for contraction), and $[x]^+$ is $x$ if $x \geq 0$ and 0 if $x < 0$. $\eta_+$ and $\eta_-$ are the positive and negative mechanical work done by muscles, respectively, for one gait cycle duration. The negative mechanical work was divided by four based on Margaria et al. (1963) and Elmer and LaStayo (2014). $D$ is the moving distance of the model for one gait cycle duration, which corresponds to the stride length.
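As a rough illustration of how (15) could be evaluated from sampled simulation output, the following Python sketch integrates the muscle power with the trapezoidal rule. The data layout (one row of muscle forces and contraction velocities per time sample), the function names, and the choice of integration rule are assumptions for illustration, not details taken from the study.

```python
def positive_part(x):
    """[x]^+ : x if x >= 0, otherwise 0."""
    return x if x >= 0.0 else 0.0


def trapezoid(t, y):
    """Trapezoidal integral of samples y over sample times t."""
    return sum(0.5 * (y[k] + y[k + 1]) * (t[k + 1] - t[k]) for k in range(len(t) - 1))


def cost_of_transport(t, forces, velocities, stride_length):
    """CoT = W / D with W = eta_plus + eta_minus over one gait cycle.

    forces[k][m] and velocities[k][m] are the tension and contraction velocity
    (positive for contraction) of muscle m at time t[k].
    """
    pos_power = [sum(f * positive_part(v) for f, v in zip(fk, vk))
                 for fk, vk in zip(forces, velocities)]
    neg_power = [sum(f * positive_part(-v) for f, v in zip(fk, vk))
                 for fk, vk in zip(forces, velocities)]
    eta_plus = trapezoid(t, pos_power)
    eta_minus = 0.25 * trapezoid(t, neg_power)  # negative work weighted by 1/4
    return (eta_plus + eta_minus) / stride_length
```

For example, a single muscle producing a constant 2 N while shortening at 0.1 m/s for 1 s does 0.2 J of positive work; lengthening at the same rate counts as 0.05 J because of the factor of 1/4.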

In this study, the motor command $u_m$ is generated by three elements: rectangular pulses $u_m^{\text{Syn}}$ in the movement generator and motor commands $u_m^{\text{height}}$ and $u_m^{\text{speed}}$ to regulate the posture and speed, respectively, in the movement regulator ($u_m = u_m^{\text{Syn}} + u_m^{\text{height}} + u_m^{\text{speed}}$). Because they determine the muscle activation $a_m$ in (2), we calculated $a_m^{\text{Syn}}$, $a_m^{\text{height}}$, and $a_m^{\text{speed}}$ from $u_m^{\text{Syn}}$, $u_m^{\text{height}}$, and $u_m^{\text{speed}}$, respectively. Using these values, we calculated the CoTs $\varepsilon^{\text{Syn}}$, $\varepsilon^{\text{height}}$, and $\varepsilon^{\text{speed}}$ from the three elements to investigate their contributions.

### 3. RESULTS

#### 3.1. Simulation of Walking

First, we conducted a computer simulation of our neuromusculoskeletal model for the nominal speed of walking using $\beta = \hat{\beta}$ and $\Delta = \pi/2$ (see **Supplementary Movie S1**). The generated average speed was 0.2 m/s. **Figures 7A,B** show the joint angle and muscle activity, respectively, from the simulation compared with animal data. The simulation results show activity patterns similar to those of animals in terms of kinematics and muscle activity levels. However, our model was limited in its ability to accurately reproduce the locomotor behavior observed in animals. In particular, the elbow and knee joints were more extended than those of animals, which resulted in a shorter stride and slower speed than desired. The more extended elbow joint partly occurred because we did not incorporate the hand and wrist in the forelimb, and the forward speed was reduced by large ground reaction forces at the tips of the forelimbs. Similarly, the more extended knee joint partly occurred because we did not incorporate the phalangeal part in the hindlimb. The absence of flexibility of the spine in the trunk is another factor causing the extended posture. In addition, the activity of the SSP muscle appeared in a phase different from that of measured data. The SSP muscle in animals was activated in the same phase as that of the antagonistic SPD muscle so that the shoulder joint stiffness increased. In contrast, the SSP muscle in our model was activated in the same phase as that of the ipsilateral BR and BIC muscles.

### 3.2. Changing Gait and Speed

By changing $\Delta$ from $\pi/2$ to $\pi$, our model achieved steady trotting (see **Supplementary Movie S2**). Although this gait had activity patterns almost identical to those of walking in terms of joint kinematics and muscle activations (**Figure 7**), the footfall pattern was different, as shown in **Figure 8**. The difference in the footfall pattern caused the difference in the trunk movement. In particular, while walking has a slight pitching movement of the trunk, trotting has almost no pitching movement, as shown in **Figure 8** (see **Supplementary Movies S1**, **S2**).

To change the speed of each gait, we slowly increased or decreased $\beta$ from $\hat{\beta}$ while changing the duration of the extension phase, the amplitude of the muscle-synergy-based motor commands, and the reference values for the movement regulator based on $\beta$, as in (12–14). **Figure 9** shows the speed of the simulated walking and trotting. Our model achieved speeds of 0.15–0.2 m/s for walking and 0.18–0.22 m/s for trotting. Trotting was faster than walking at each $\beta$.

### 3.3. Cost of Transport

**Figure 10A** shows the CoT ε of walking and trotting for the generated speeds in the simulation. Both CoT curves are U-shaped. The speeds for the minimum CoT for walking and trotting are very different. Walking had lower (higher) CoTs than trotting at slow (fast) speed. The CoT was obtained by dividing the mechanical work W for one gait cycle duration by the stride length D, as in (15). **Figures 10B,C** show the mechanical work and stride length, respectively, with speed. The mechanical work slightly but monotonically increased in walking and decreased in trotting. In contrast, the stride length shows a single-peaked shape for speed in both walking and trotting. The speeds for the minimum CoT and maximum stride length were almost identical in both walking and trotting.

**Figures 10D,E**, respectively, show the contributions of the muscle synergy-based pulses and posture and speed regulators to the CoT ($\varepsilon^{\text{Syn}}$, and $\varepsilon^{\text{height}}$ and $\varepsilon^{\text{speed}}$, respectively). The CoT contribution of the muscle synergy-based pulses was U-shaped in both walking and trotting and was the largest among the three elements. The CoT contributions of the posture and speed regulators were small. While the contribution of the posture regulator remained constant with speed in walking, it decreased in trotting. The contribution of the speed regulator increased in both walking and trotting.

### 4. DISCUSSION

In this study, we improved our previous musculoskeletal model of rat hindlimbs (Aoi et al., 2013a) to construct a whole-body rat musculoskeletal model, which consists of the trunk, forelimbs, and hindlimbs. We also improved our motor control model (Aoi et al., 2013a) based on the muscle synergy hypothesis to control the whole-body rat model. Although the motor control model had a large number of motor control parameters, the rat model could be made to walk or trot by changing only the phase difference of the muscle-synergy-based motor commands between the forelimb and hindlimb (**Figures 7**, **8**). Furthermore, the speed of each gait could be varied by changing only the duration of the extension phase, the amplitude of the muscle-synergy-based motor commands, and the reference values for the movement regulator (**Figure 9**). The relation between speed and CoT was U-shaped for both the walking and trotting generated, and the speeds for the minimum CoT were different for the two gaits, as observed in the oxygen consumption of animals (**Figure 10**).

### 4.1. Characteristics of Cost of Transport

For our simulation, the CoT vs. speed curves were U-shaped for both walking and trotting (**Figure 10A**). Walking had lower (higher) CoTs than trotting at slow (fast) speed. The CoTs were the same at a certain middle speed. These results indicate that walking and trotting are energy-efficient at slow and fast speeds, respectively. These trends are similar to those observed for animals (**Figure 1**).

The CoT was calculated by dividing the mechanical work of one gait cycle duration by the stride length, as shown in (15). The stride length showed a single-peaked shape against speed (**Figure 10C**). The speeds for the minimum CoT and maximum stride length were almost identical. We decreased the extension phase duration to increase the speed, which decreased the gait cycle duration. Because we increased the muscle-synergy-based motor commands during the extension phase as in (13), the stride length increased. However, this increase of the stride length was limited by the increase of the gait frequency (decrease of the gait cycle duration). The stride length decreased above a critical frequency, which suggests a resonance property of the musculoskeletal dynamics and motor control input. Although these trends were similar between walking and trotting, the maximum stride length differed. These characteristics contributed substantially to the different energy efficiencies of the gaits. It has been reported that when the locomotor frequency increased, the stretch receptor of the hip prevented the hindlimbs from extending further (Mayer et al., 2018; Santuz et al., 2019). In the future, we would like to incorporate this sensory regulation model to control the stride length to investigate the mechanism of energy-efficient locomotion further.

In our model, the CoT had contributions from three elements, namely muscle synergy-based pulses and posture and speed regulators ($\varepsilon \simeq \varepsilon^{\text{Syn}} + \varepsilon^{\text{height}} + \varepsilon^{\text{speed}}$). The muscle synergy-based pulses had the largest contribution and determined the basic U-shaped characteristics (**Figure 10D**). Although the posture and speed regulators had small contributions (**Figure 10E**), they had specific characteristics. In particular, while the contribution of the posture regulator for walking remained almost constant with speed, that for trotting decreased, which moved the speed for the minimum CoT to the right (with respect to that for the muscle synergy-based pulses) and increased the difference in speed for the minimum CoT between walking and trotting. This allowed the model to achieve energy-efficient locomotion in a wider speed range. In contrast, the contribution of the speed regulator increased with speed in both walking and trotting and had a similar shape against speed for the two gaits, so it contributed little to the difference in speed for the minimum CoT.

### 4.2. Gait Generation Based on Muscle Synergy

A large portion of motor commands in our model was generated by a linear combination of four rectangular pulses for each limb, where we used the same onset and end phases for the pulses between the four limbs. We changed the relative phase of the pulses between the forelimb and hindlimb to make the gait generation simple. However, a muscle synergy analysis of dogs showed that although a large portion of the muscle activity can be reproduced by a linear combination of four basic patterns for both forelimbs and hindlimbs in walking and trotting, as done for our model, the basic patterns had some differences, especially in the activation timings between forelimbs and hindlimbs and between walking and trotting (Deban et al., 2012; Catavitello et al., 2015). In particular, the basic patterns for the late extension and early and late flexion of the hindlimbs were earlier than those of the forelimbs in walking. The basic pattern for the early flexion of the hindlimbs was earlier than that of the forelimbs in trotting. The control of the activation timings of the muscle synergy patterns could contribute to the gait change (Cappellini et al., 2006; Aoi et al., 2019). In future studies, we would like to measure the trotting of rats and incorporate motor control differences between forelimbs and hindlimbs and between walking and trotting to clarify the gait-generation mechanism further.

### 4.3. Limitations of Our Model and Future Work

Although our simulation results showed features similar to those of animals (**Figure 7**), our model has limitations that prevent it from accurately reproducing animal locomotion. In particular, we did not incorporate the hand, phalangeal part of the hindlimbs, or flexibility of the spine in the trunk (Schilling and Hackert, 2006). These elements might improve the gait speed and energy efficiency. Furthermore, we confined our musculoskeletal model to two dimensions, which neglected instability in the lateral direction. More contribution of the posture regulator would be required for a three-dimensional model to maintain a stable posture during locomotion. In addition, although head movements are important and specific for gait (Zsoldos et al., 2010), we did not incorporate the neck. We would like to incorporate these features to clarify adaptive motor control mechanisms in animal locomotion further.

Not only the metabolic cost of locomotion but also other factors, such as musculoskeletal forces (Farley and Taylor, 1991), gait stability (Schöner et al., 1990; Diedrich and Warren, 1995; Aoi et al., 2013b), terrain and ground surface conditions (Prost and Sussman, 1969; Gustås et al., 2006; Goldenberg et al., 2008; Chateau et al., 2013), and genetic mutation (Andersson et al., 2012), influence the gait decision of animals. Furthermore, although animals change their gait smoothly when triggered by these factors, the gait transition mechanism also remains unclear. Our neuromusculoskeletal model will be useful for investigating these mechanisms in the future.

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

#### ETHICS STATEMENT

The animal study was reviewed and approved by the Ethical Committee for Animal Experiments at the University of Tokyo.

### AUTHOR CONTRIBUTIONS

SA developed the study design. MT performed the computer simulations and analyzed the data in consultation with SA, SF, TF, KT, and DY. MT and SA wrote the manuscript, and all the authors reviewed and approved it.

### FUNDING

This study was supported in part by JSPS KAKENHI Grant-in-Aid for Scientific Research (B) JP15KT0015 and Grant-in-Aid for Scientific Research on Innovative Areas JP26120006.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.01337/full#supplementary-material

Supplementary Movie S1 | Simulated walking.

Supplementary Movie S2 | Simulated trotting.

### REFERENCES

absence of sensory feedback from muscle spindles. J. Physiol. 597, 3147–3165. doi: 10.1113/JP277515


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Toeda, Aoi, Fujiki, Funato, Tsuchiya and Yanagihara. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Contribution of Phase Resetting to Adaptive Rhythm Control in Human Walking Based on the Phase Response Curves of a Neuromusculoskeletal Model

Daiki Tamura<sup>1</sup> , Shinya Aoi <sup>1</sup> \*, Tetsuro Funato<sup>2</sup> , Soichiro Fujiki <sup>3</sup> , Kei Senda<sup>1</sup> and Kazuo Tsuchiya<sup>1</sup>

<sup>1</sup> Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University, Kyoto, Japan, <sup>2</sup> Department of Mechanical Engineering and Intelligent Systems, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan, <sup>3</sup> Department of Physiology and Biological Information, School of Medicine, Dokkyo Medical University, Tochigi, Japan

#### Edited by:

Tomohiko Takei, Kyoto University, Japan

### Reviewed by:

Srinivasa Chakravarthy, Indian Institute of Technology Madras, India Auke Ijspeert, École Polytechnique Fédérale de Lausanne, Switzerland

\*Correspondence:

Shinya Aoi shinya\_aoi@kuaero.kyoto-u.ac.jp

#### Specialty section:

This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 03 April 2019 Accepted: 09 January 2020 Published: 05 February 2020

#### Citation:

Tamura D, Aoi S, Funato T, Fujiki S, Senda K and Tsuchiya K (2020) Contribution of Phase Resetting to Adaptive Rhythm Control in Human Walking Based on the Phase Response Curves of a Neuromusculoskeletal Model. Front. Neurosci. 14:17. doi: 10.3389/fnins.2020.00017

Humans walk adaptively in varying environments by manipulating their complicated and redundant musculoskeletal system. Although the central pattern generators in the spinal cord are largely responsible for adaptive walking through sensory-motor coordination, it remains unclear what neural mechanisms determine walking adaptability. It has been reported that locomotor rhythm and phase are regulated by the production of phase shift and rhythm resetting (phase resetting) for periodic motor commands in response to sensory feedback and perturbation. While the phase resetting has been suggested to make a large contribution to adaptive walking, it has only been investigated based on fictive locomotion in decerebrate cats, and thus it remains unclear if human motor control has such a rhythm regulation mechanism during walking. In our previous work, we incorporated a phase resetting mechanism into a motor control model and demonstrated that it improves the stability and robustness of walking through forward dynamic simulations of a human musculoskeletal model. However, this did not necessarily verify that phase resetting plays a role in human motor control. In our other previous work, we used kinematic measurements of human walking to identify the phase response curve (PRC), which explains phase-dependent responses of a limit cycle oscillator to a perturbation. This revealed how human walking rhythm is regulated by perturbations. In this study, we integrated these two approaches using a physical model and identification of the PRC to examine the hypothesis that phase resetting plays a role in the control of walking rhythm in humans. More specifically, we calculated the PRC using our neuromusculoskeletal model in the same way as our previous human experiment. In particular, we compared the PRCs calculated from two different models with and without phase resetting while referring to the PRC for humans. As a result, although the PRC for the model without phase resetting did not show any characteristic shape, the PRC for the model with phase resetting showed a characteristic phase-dependent shape with trends similar to those of the PRC for humans. These results support our hypothesis and will improve our understanding of adaptive rhythm control in human walking.

Keywords: human walking, phase resetting, phase response curve, central pattern generator, muscle synergy, neuromusculoskeletal model

### 1. INTRODUCTION

Humans walk adaptively in varying environments by the skillful control of their complicated and redundant musculoskeletal system. Although many studies have investigated the underlying mechanism for adaptive walking, it remains largely unclear what neural mechanisms determine the walking adaptability.

Because human walking is rhythmic, elucidating the rhythm control strategy is crucial. The central pattern generators (CPGs) in the spinal cord are largely responsible for adaptive rhythm control through sensory-motor coordination (Orlovsky et al., 1999). In particular, it has been reported that locomotor rhythm and phase are regulated by producing phase shift and rhythm resetting (phase resetting) for periodic motor commands in response to sensory feedback and perturbation (Duysens, 1977; Conway et al., 1987; Guertin et al., 1995; Schomburg et al., 1998; Lafreniere-Roula and McCrea, 2005; Rybak et al., 2006a; Frigon and Gossard, 2010). However, such phase resetting behavior has been investigated only with electromyographic and electroneurographic data measured during fictive locomotion in decerebrate cats, and thus it is unclear if human motor control has such a rhythm regulation mechanism during walking. From a modeling approach on the basis of the hypothesis that phase resetting works for the control of walking rhythm in humans, the phase resetting mechanism has been introduced in motor control models of human walking. Although the models demonstrated that it improves stability and robustness of walking through forward dynamic simulations of human musculoskeletal models (Yamasaki et al., 2003a,b; Nomura et al., 2009; Aoi et al., 2010; Aoi and Funato, 2016), they did not necessarily verify whether the hypothesis is true.

To investigate rhythm regulation mechanisms in biological and natural phenomena, researchers have applied the phase response curve (PRC) in the phase reduction theory, which explains how the phase of a limit cycle oscillator shifts by a perturbation at an arbitrary phase (Kuramoto, 1984; Winfree, 2001). In our previous work (Funato et al., 2016), we assumed human walking as a limit cycle oscillator and identified the PRC from kinematic measurements by changing the belt speed of a treadmill during human walking, which clarified how human walking rhythm is regulated by perturbations. In this study, to examine the hypothesis, we integrated two previous different approaches that used a physical model and identification of the PRC. More specifically, we performed forward dynamic simulations with our previous neuromusculoskeletal model (Aoi et al., 2010) to walk on a treadmill and disturbed the belt speed at arbitrary phases in the same way as our previous experiments with humans (Funato et al., 2016). In particular, we obtained the PRC for two different cases with and without phase resetting in our motor control model and compared the results with the measured PRC in humans. Based on these results, we discuss the contribution of phase resetting to adaptive rhythm control in human walking.

## 2. METHODS

### 2.1. Model

In this study, we used the same neuromusculoskeletal model that we developed in our previous work (Aoi et al., 2010). We briefly explain the model below.

#### 2.1.1. Musculoskeletal Model

Our musculoskeletal model is two-dimensional (**Figure 1A**), and the physical parameters were determined from data obtained from measurement of human walking (Davy and Audu, 1987; Winter, 2004). The skeletal part of our model has seven rigid links: trunk (head, arms, and torso) and thigh, shank, and foot of each leg, and has nine degrees of freedom: hip, knee, and ankle joint angles of each leg and horizontal and vertical translations and rotation of the trunk. Each joint has a linear viscous element, and the knee and ankle joints are subject to large linear elastic and damping torques when these joint angles exceed their limits. We used four contact points on each sole to receive reaction forces from the treadmill belt (toe, heel, and 4.0 cm inside from the toe and from the heel). The reaction force is modeled by a linear spring and damper system for each horizontal and vertical direction. Our model contains nine principal muscles to achieve the necessary motions in each leg. Six muscles produce uniarticular motion: hip flexion [iliopsoas (IL)], hip extension [gluteus maximus (GM)], knee extension [vastus (VA)], knee flexion [biceps femoris short head (BFS)], ankle flexion [tibialis anterior (TA)], and ankle extension [soleus (SO)]. Three muscles produce biarticular motion: hip flexion and knee extension [rectus femoris (RF)], hip extension and knee flexion [biceps femoris long head (BFL)], and knee flexion and ankle extension [gastrocnemius (GC)]. The muscle model consists of contractile and passive elements. The contractile part depends on force-length and force-velocity relationships and the muscle activation, which is determined through a low-pass filtering of the motor command $u_m$ ($m$ = IL, GM, VA, BFS, TA, SO, RF, BFL, and GC) from the motor control model. The equations of motion in this model were derived using Lagrangian mechanics and solved using the fourth-order Runge-Kutta method with time steps of $2 \times 10^{-7}$ s for the forward dynamic simulation.

#### 2.1.2. Motor Control Model

Our motor control model consists of a hypothetical two-layered CPG model at the spinal cord level, which incorporates phase resetting, and a movement regulation model at the brainstem and cerebellum levels. The CPGs in the spinal cord have been suggested to consist of hierarchical networks that include rhythm generator (RG) and pattern formation (PF) networks (Burke et al., 2001; Lafreniere-Roula and McCrea, 2005; Rybak et al., 2006a,b). The RG network generates the basic rhythm and alters it by producing phase shifts and rhythm resetting in response to sensory feedback, while the PF network shapes the rhythm into spatiotemporal patterns of motoneuron activities. For the RG model, we used two simple oscillators whose phase is $\phi_i$ ($0 \leq \phi_i < 2\pi$, $i$ = right, left) to produce the basic rhythm of the corresponding leg and incorporated phase resetting as explained below. For the PF model, we determined motor commands necessary to produce periodic leg movements in accordance with the oscillator phase based on the muscle synergy hypothesis, which suggests that the linear combination of five basic signals produces a large portion of the motor commands for human locomotion (Ivanenko et al., 2006). More specifically, we used five rectangular pulses $p_i(\phi)$ ($i = 1, \ldots, 5$) for each leg (**Figure 1B**), which are given by:

$$p\_i(\phi) = \begin{cases} 1 & \Phi\_i < \phi \le \Phi\_i + \Delta\Phi\_i \\ 0 & \text{otherwise} \end{cases} \quad i = 1, \ldots, 5 \tag{1}$$

where Φ<sub>i</sub> and ΔΦ<sub>i</sub> (i = 1, ... , 5) are the onset phase and duration, respectively, of the rectangular pulses, and we omitted the suffix of φ. We determined the muscle synergy-based motor command u<sub>m</sub><sup>Syn</sup> by:

$$
u\_m^{\text{Syn}} = \sum\_{i=1}^5 w\_{m,i} \Lambda\_i p\_i(\phi) \tag{2}
$$

where w<sub>m,i</sub> (i = 1, ... , 5) is the weighting coefficient of the five rectangular pulses (w<sub>m,i</sub> ≥ 0) and Λ<sub>i</sub> (i = 1, ... , 5) is the tuning parameter of the amplitude of the rectangular pulses for different belt speeds.
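Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function and constant names are hypothetical, the numerical values are those quoted in Section 2.1.3 for the 1.3 m/s belt speed, and the pulse interval is assumed to wrap modulo 2π (the first pulse starts at 6.12 rad and lasts 0.70 rad, so it has to extend past φ = 0).

```python
import math

TWO_PI = 2.0 * math.pi

# Onset phases Phi_i and durations DeltaPhi_i (rad) quoted in Section 2.1.3.
ONSET = [6.12, 1.48, 2.56, 3.51, 5.38]
DURATION = [0.70, 0.90, 0.90, 1.07, 0.96]
AMPLITUDE = [1.0] * 5  # Lambda_i for the 1.3 m/s belt speed

# Non-zero weights w_{m,i}; the pulse index i is 1-based as in the text.
WEIGHTS = {
    ("VA", 1): 0.42, ("TA", 1): 0.35,
    ("SO", 2): 1.26, ("GC", 2): 0.87,
    ("IL", 3): 1.02, ("BFS", 3): 1.09, ("RF", 3): 0.10,
    ("VA", 4): 0.17, ("TA", 4): 0.21,
    ("GM", 5): 0.61, ("BFS", 5): 0.20, ("BFL", 5): 0.20,
}


def pulse(i, phi):
    """Rectangular pulse p_i(phi) of Eq. (1) for i = 1..5.

    Assumption: the interval (Phi_i, Phi_i + DeltaPhi_i] is taken modulo 2*pi.
    """
    offset = (phi - ONSET[i - 1]) % TWO_PI
    return 1.0 if 0.0 < offset <= DURATION[i - 1] else 0.0


def synergy_command(muscle, phi):
    """Synergy-based motor command u_m^Syn(phi) of Eq. (2)."""
    return sum(
        WEIGHTS.get((muscle, i), 0.0) * AMPLITUDE[i - 1] * pulse(i, phi)
        for i in range(1, 6)
    )
```

With these values, only the second pulse is active at φ = 2.0 rad, so `synergy_command("SO", 2.0)` returns the soleus weight 1.26 and every muscle without a weight on that pulse receives zero.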

To emulate the phase shift and rhythm resetting behavior, we incorporated the phase resetting mechanism in the RG model. More specifically, we reset the oscillator phase to a nominal value based on foot contact information by using the following phase dynamics:

$$\dot{\phi}\_i = \omega - K\_\phi \sin(\Delta \phi\_i - \pi) - (\phi\_i - \phi^{\rm FC})\delta(t - t\_i^{\rm FC} - \tau^{\rm FC}) \tag{3}$$

where,

$$
\Delta \phi\_i = \begin{cases}
\phi\_{\text{right}} - \phi\_{\text{left}} & i = \text{right} \\
\phi\_{\text{left}} - \phi\_{\text{right}} & i = \text{left}
\end{cases}
$$

ω is the basic frequency, K<sub>φ</sub> is the gain parameter, t<sub>i</sub><sup>FC</sup> is the foot-contact time, τ<sup>FC</sup> (= 50 ms) is the transmission delay in receiving the foot-contact information, φ<sup>FC</sup> is the phase value to be reset at the foot contact, and δ(·) is the Dirac delta function. The second term on the right-hand side maintains interlimb coordination so that the legs move in antiphase. The third term on the right-hand side corresponds to phase resetting, which resets the oscillator phase φ<sub>i</sub> to φ<sup>FC</sup> to modulate the timing of the muscle synergy-based motor command based on the foot contact information. The second and third terms regulate the gait frequency and contribute to the generation of a stable limit cycle for walking.
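The phase dynamics of Eq. (3) can be illustrated with a simple forward-Euler sketch. This is not the authors' integrator: the function name and the caller-supplied list of foot-contact times are hypothetical (in the model, contact times come from the musculoskeletal simulation), and the Dirac delta term is assumed to act as an instantaneous jump of the phase to φ<sup>FC</sup> once the delay τ<sup>FC</sup> has elapsed. Default parameter values are those quoted in Section 2.1.3.

```python
import math

TWO_PI = 2.0 * math.pi


def simulate_phases(contact_times, t_end, dt=1e-3, omega=TWO_PI / 1.0,
                    k_phi=1.7, phi_fc=0.36, tau_fc=0.050,
                    phi0=(0.0, math.pi)):
    """Euler integration of Eq. (3) for the right and left oscillators.

    contact_times: dict mapping "right"/"left" to sorted foot-contact
    times t_i^FC in seconds (hypothetical input for this sketch).
    Returns a list of (t, phi_right, phi_left) samples.
    """
    legs = ("right", "left")
    phi = dict(zip(legs, phi0))
    # Reset instants are the contact times shifted by the delay tau_fc.
    pending = {leg: [tc + tau_fc for tc in contact_times.get(leg, [])]
               for leg in legs}
    history = []
    for k in range(int(round(t_end / dt)) + 1):
        t = k * dt
        history.append((t, phi["right"], phi["left"]))
        d_right = phi["right"] - phi["left"]
        rate = {
            "right": omega - k_phi * math.sin(d_right - math.pi),
            "left": omega - k_phi * math.sin(-d_right - math.pi),
        }
        for leg in legs:
            phi[leg] = (phi[leg] + rate[leg] * dt) % TWO_PI
            # Delta term: jump to phi_fc once the delayed contact time passes.
            if pending[leg] and t + dt >= pending[leg][0]:
                pending[leg].pop(0)
                phi[leg] = phi_fc
    return history
```

Without any contact events the two phases stay in antiphase, which is the fixed point of the coupling term; a single contact event shifts the phase of that leg and the coupling then pulls the pair back toward antiphase.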

In addition to the CPG model at the spinal cord level, we used a movement regulation model at the brainstem and cerebellum levels based on somatosensory information, where only two crucial factors were incorporated for simplicity: maintenance of an upright posture and the desired forward speed. For the maintenance of an upright posture, a simple feedback control regulates the balance of the trunk pitch to prevent it from falling over using antagonistic uniarticular muscles in the hip of the standing leg.

$$p\_m^{\text{Trunk}} = \begin{cases} -\kappa\_m(\theta - \hat{\theta}) - \sigma\_m \dot{\theta} & \text{in stance phase} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

where θ is the trunk pitch angle, θ̇ is the angular rate, θ̂ is the reference angle, and κ<sub>m</sub> and σ<sub>m</sub> are the gain parameters (κ<sub>m</sub> = σ<sub>m</sub> = 0 when m ≠ IL or GM). For maintenance of the speed, a simple feedback control is used to increase the ankle push-off when the speed is lower than desired and suppress the pushing force in the opposite case by antagonistic uniarticular muscles in the ankle of the standing leg.

$$p\_m^{\text{Speed}} = \begin{cases} -\lambda\_m(v - \hat{v}) & \text{in stance phase} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

where v is the forward speed, v̂ is the target forward speed, and λ<sub>m</sub> is the gain parameter (λ<sub>m</sub> = 0 when m ≠ TA or SO). Because these regulations operate at the brainstem and cerebellar levels, the command signals are delayed and the motor command u<sub>m</sub><sup>Reg</sup> is given by:

$$u\_m^{\text{Reg}}(t) = p\_m^{\text{Trunk}}(t - \tau^{\text{Reg}}) + p\_m^{\text{Speed}}(t - \tau^{\text{Reg}}) \tag{6}$$

where τ<sup>Reg</sup> (= 80 ms) is the delay in receiving transmissions of somatosensory information at the brainstem and cerebellar levels and sending the motor command to the spinal cord level.

The motor command u<sub>m</sub> is given by the summation of the muscle synergy-based motor command u<sub>m</sub><sup>Syn</sup> and the motor command by the movement regulation u<sub>m</sub><sup>Reg</sup>:

$$
u\_m = u\_m^{\text{Syn}} + u\_m^{\text{Reg}} \tag{7}
$$
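The regulation terms of Eqs. (4) to (6) can be sketched as two feedback functions and a fixed-length delay buffer. This is only an illustration under stated assumptions: the names `p_trunk`, `p_speed`, and `DelayedRegulator` are hypothetical, the gains and references are the values quoted in Section 2.1.3, the delay is rounded to a whole number of time steps, and the buffer is assumed to start filled with zeros.

```python
from collections import deque

# Gains and references quoted in Section 2.1.3; unlisted muscles have zero gain.
KAPPA = {"IL": -1.0, "GM": 2.0}
SIGMA = {"IL": -0.20, "GM": 0.40}
LAMBDA = {"TA": -0.20, "SO": 0.12}
THETA_REF = -0.012  # reference trunk pitch angle (rad)
V_REF = 0.1         # target forward speed (m/s)


def p_trunk(muscle, theta, theta_dot, in_stance):
    """Posture feedback of Eq. (4); zero outside the stance phase."""
    if not in_stance:
        return 0.0
    return (-KAPPA.get(muscle, 0.0) * (theta - THETA_REF)
            - SIGMA.get(muscle, 0.0) * theta_dot)


def p_speed(muscle, v, in_stance):
    """Speed feedback of Eq. (5); zero outside the stance phase."""
    if not in_stance:
        return 0.0
    return -LAMBDA.get(muscle, 0.0) * (v - V_REF)


class DelayedRegulator:
    """Delays p_trunk + p_speed by tau_reg, as in Eq. (6)."""

    def __init__(self, muscle, dt, tau_reg=0.080):
        self.muscle = muscle
        n_delay = int(round(tau_reg / dt))
        self.buffer = deque([0.0] * n_delay)  # assumed zero history

    def step(self, theta, theta_dot, v, in_stance):
        """Push the current feedback and return the value from tau_reg ago."""
        self.buffer.append(p_trunk(self.muscle, theta, theta_dot, in_stance)
                           + p_speed(self.muscle, v, in_stance))
        return self.buffer.popleft()
```

The total command of Eq. (7) is then the sum of the synergy-based command and the value returned by `step` at each time step.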

#### 2.1.3. Model Parameters

While the model in our previous work (Aoi et al., 2010) walked over the ground, the model in this study walked on a treadmill, as explained below. Therefore, we slightly modified the values of the motor control parameters so that the model achieved steady walking on the treadmill whose belt speed was 1.3 m/s as follows: the onset phase and duration of rectangular pulses were Φ<sub>1</sub> = 6.12 rad, Φ<sub>2</sub> = 1.48 rad, Φ<sub>3</sub> = 2.56 rad, Φ<sub>4</sub> = 3.51 rad, Φ<sub>5</sub> = 5.38 rad, ΔΦ<sub>1</sub> = 0.70 rad, ΔΦ<sub>2</sub> = 0.90 rad, ΔΦ<sub>3</sub> = 0.90 rad, ΔΦ<sub>4</sub> = 1.07 rad, and ΔΦ<sub>5</sub> = 0.96 rad, where we set φ = 0 rad at foot contact; the amplitudes and weighting coefficients of the rectangular pulses were Λ<sub>i</sub> = 1.0 (i = 1, ... , 5), w<sub>VA,1</sub> = 0.42, w<sub>TA,1</sub> = 0.35, w<sub>SO,2</sub> = 1.26, w<sub>GC,2</sub> = 0.87, w<sub>IL,3</sub> = 1.02, w<sub>BFS,3</sub> = 1.09, w<sub>RF,3</sub> = 0.10, w<sub>VA,4</sub> = 0.17, w<sub>TA,4</sub> = 0.21, w<sub>GM,5</sub> = 0.61, w<sub>BFS,5</sub> = 0.20, w<sub>BFL,5</sub> = 0.20, and the other w<sub>m,i</sub> = 0; the parameters for the oscillator phase dynamics were ω = 2π/1.0 rad/s, K<sub>φ</sub> = 1.7, and φ<sup>FC</sup> = 0.36 rad; and the parameters for the movement regulation were κ<sub>IL</sub> = −1.0, κ<sub>GM</sub> = 2.0, σ<sub>IL</sub> = −0.20, σ<sub>GM</sub> = 0.40, λ<sub>TA</sub> = −0.20, λ<sub>SO</sub> = 0.12, θ̂ = −0.012 rad, and v̂ = 0.1 m/s.

For different belt speeds, we changed Φ<sub>2</sub>, ω, Λ<sub>i</sub> (i = 1, ... , 5), and φ<sup>FC</sup> in a similar way to Aoi et al. (2019) as follows: Φ<sub>2</sub> = 1.46 rad, ω = 2π/0.9 rad/s, Λ<sub>1</sub> = 1.04, Λ<sub>2</sub> = 1.14, Λ<sub>3</sub> = 1.10, Λ<sub>4</sub> = 1.03, Λ<sub>5</sub> = 1.18, and φ<sup>FC</sup> = 0.48 rad when the belt speed was increased by 0.02 m/s, and Φ<sub>2</sub> = 1.50 rad, ω = 2π/1.1 rad/s, Λ<sub>1</sub> = 0.96, Λ<sub>2</sub> = 0.90, Λ<sub>3</sub> = 0.90, Λ<sub>4</sub> = 0.98, Λ<sub>5</sub> = 0.82, and φ<sup>FC</sup> = 0.04 rad when the belt speed was decreased by 0.02 m/s.

#### 2.2. Phase Response Curve

In the phase reduction theory (Kuramoto, 1984; Winfree, 2001), for a limit cycle oscillator whose period is τ and closed orbit is C on the phase space (**Figure 2**), we can define ψ on C, which follows the dynamics:

$$
\dot{\psi} = \frac{2\pi}{\tau} \tag{8}
$$

To apply the phase dynamics to the neighborhood of the limit cycle, we assume that the point P on C and the point Q close to C have the same phase when they converge to the same point on C for t → ∞. The surface (curve) with the same phase (ψ = ψ<sub>0</sub> = const.) is called an isochron.

When a perturbation I(t) is added to the limit cycle oscillator, ψ follows the dynamics:

$$
\dot{\psi} = \frac{2\pi}{\tau} + Z(\psi)I(t) \tag{9}
$$

where Z(ψ) is the PRC and explains the phase-dependent rhythm change due to the perturbation. We determine cycles using a Poincaré section S, as shown in **Figure 2**. We assume that the trajectory converges to C before I(t) is added. We define t = 0 for the time at the last intersection of C with S before I(t) is added and ψ(0) = 0. We also define t = t<sub>n</sub> (n = 1, 2, ...) for the time at the nth intersection of the disturbed trajectory with S after I(t) is added. The integration of (9) from 0 to t<sub>n</sub> gives:

$$\int\_{0}^{t\_n} \left(\dot{\psi} - \frac{2\pi}{\tau}\right) dt = \int\_{0}^{t\_n} Z(\psi) I(t) dt \tag{10}$$

The Poincaré section S generally does not coincide with the isochron of ψ = 0, as shown in **Figure 2**, which induces a difference in ψ between the Poincaré section and the isochron and thus ∫<sub>0</sub><sup>t<sub>n</sub></sup> ψ̇ dt ≠ 2nπ (Imai and Aoyagi, 2016). However, because the disturbed trajectory approaches C as t → ∞, ∫<sub>0</sub><sup>t<sub>n</sub></sup> ψ̇ dt = 2nπ approximately for sufficiently large n. This gives:

$$\int\_{0}^{t\_n} Z(\psi) I(t)dt = 2\pi \frac{n\tau - t\_n}{\tau} \tag{11}$$
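For reference, the step from (10) to (11) can be written out term by term. The line below uses only the symbols already defined (τ, t<sub>n</sub>, and n) and the large-n approximation stated above:

$$\int\_{0}^{t\_n} \dot{\psi}\, dt - \frac{2\pi}{\tau} \int\_{0}^{t\_n} dt \approx 2n\pi - \frac{2\pi t\_n}{\tau} = 2\pi \frac{n\tau - t\_n}{\tau}$$

A crossing time t<sub>n</sub> shorter than nτ therefore corresponds to a positive (advanced) phase shift, and a longer one to a negative (delayed) phase shift.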

The right-hand side can be obtained from the phase shift caused by the perturbation, as shown in **Figure 3**. For an impulsive perturbation at t = s (0 ≤ s < τ), which is given by I(t) = µδ(t − s), where µ is a constant, (11) becomes:

$$
\mu \int\_0^{t\_n} Z(\psi) \delta(t - s) dt = 2\pi \frac{n\tau - t\_n}{\tau} \tag{12}
$$

This gives,

$$Z(\psi(s)) = \frac{2\pi}{\mu} \frac{n\tau - t\_n}{\tau} \tag{13}$$

In this study, we calculated the PRC from (13) using our neuromusculoskeletal model, where we used the foot contact condition of the right leg (any of the four contact points of the right foot is below the treadmill belt) for the Poincaré section S. In particular, our previous work (Funato et al., 2016) obtained the PRCs from human walking measurements by accelerating or decelerating the belt speed of a treadmill independently. To compare the simulation results with the human measurements, our model also walked on a treadmill and we obtained the PRCs for the acceleration and deceleration perturbations in the belt speed separately. More specifically, after the model achieved steady walking on a treadmill, we increased or decreased the belt speed by 0.1 m/s for 0.001 s once per trial [µ = 2π(±0.1 · 0.001)/(ντ), where ν is the belt speed], and we set n = 50 so that the model achieved steady walking after being disturbed. We performed 100 trials by changing the perturbation phase to obtain the PRC. Furthermore, we used the models with and without phase resetting in the motor control model and compared the PRCs calculated from these models by referring to the PRCs obtained from measurements of human walking.
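The impulse-method calculation of Eq. (13) reduces to a small amount of arithmetic once the crossing time t<sub>n</sub> is known. The sketch below is illustrative only: the function names are hypothetical, and the normalization of µ as 2π times the belt displacement divided by ντ is an assumed reading of the expression in the text.

```python
import math


def perturbation_strength(delta_v, duration, belt_speed, tau):
    """Perturbation size mu (assumed form: 2*pi * delta_v * duration / (nu * tau)).

    delta_v is signed: positive for acceleration, negative for deceleration.
    """
    return 2.0 * math.pi * delta_v * duration / (belt_speed * tau)


def perturbation_phase(s, tau):
    """Phase psi(s) of an impulse applied at time s within the cycle (Eq. 8)."""
    return 2.0 * math.pi * s / tau


def prc_value(t_n, n, tau, mu):
    """Z(psi(s)) from Eq. (13), given the n-th crossing time t_n."""
    return (2.0 * math.pi / mu) * (n * tau - t_n) / tau
```

For instance, with τ = 1.0 s, a belt speed of 1.3 m/s, and a +0.1 m/s perturbation lasting 0.001 s, a 50th crossing that arrives 0.001 s early (t<sub>50</sub> = 49.999 s) gives Z = 13.0 in these units. Sweeping s over one cycle and repeating the calculation yields one PRC sample per trial.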

#### 3. RESULTS

By modifying the motor control parameters from those in our previous work (Aoi et al., 2010), both the models with and without phase resetting achieved steady walking on a treadmill whose belt speed was 1.3 m/s. The locomotor behavior, especially the joint kinematics and muscle activities, was almost identical to that in our previous work (Aoi et al., 2010) except for the difference between walking over ground and on a treadmill. **Figures 4A,B** show representative responses of the forward speed for the models without and with phase resetting, respectively, after the models were disturbed. For both models, the forward speed fluctuated after the perturbation and then recovered to steady periodic behavior. Although the model without phase resetting had no shift of the locomotion phase after the recovery, the model with phase resetting had a phase shift.

From 100 trials with different perturbation phases, we obtained the PRCs. **Figures 5A–C** show the PRCs calculated by acceleration and deceleration perturbations for the model without phase resetting, the model with phase resetting, and kinematic measurements of human walking, respectively. Although the PRCs for the model without phase resetting were zero irrespective of the perturbation phase for both types of perturbation, the PRCs for the model with phase resetting showed characteristic phase-dependent shapes. In particular, they intersected with the horizontal axis around the foot-contact timings and mid-stance phases. They had steep positive slopes around the foot contact and positive peaks in the double-stance phase. They also had gentle negative slopes after the double-stance phase and negative peaks before the next foot contact. Furthermore, the PRCs for the acceleration and deceleration perturbations were almost identical. These trends are similar to those in the PRCs obtained from the measurement of human walking.

To investigate the robustness of the obtained results, we examined how the PRC changes for different belt speeds and different motor control parameters, such as the gait frequency. **Figures 6A,B** show the PRCs for the model with phase resetting when the steady belt speed was increased and decreased by 0.02 m/s, respectively. Although there are some differences, the characteristic properties mentioned above remain unchanged.

#### 4. DISCUSSION

#### 4.1. Contribution of Phase Resetting Based on Foot Contact

Our motor control model incorporated phase resetting induced by foot-contact information based on physiological evidence. In particular, cutaneous feedback was observed to contribute to phase shift and rhythm resetting behaviors in fictive locomotion of decerebrate cats (Duysens, 1977; Schomburg et al., 1998). Furthermore, spinal cats walking on a treadmill changed their gait, such as walking, trotting, and galloping, in accordance with the belt speed (Forssberg and Grillner, 1973; Orlovsky et al., 1999), which suggests that the tactile sensory information obtained by their feet from the belt influenced the locomotion phase and rhythm generated by the CPG (Duysens et al., 2000).

In human walking, the timing of basic muscle activation patterns was also strictly linked to the foot-contact event (Ivanenko et al., 2006). Our model integrated the phase resetting mechanism with the muscle synergy hypothesis, which suggests that a large portion of motor commands for walking is generated by the linear combination of five basic signals to solve the redundancy problem in motor control (Ivanenko et al., 2006). More specifically, phase resetting in our model just controlled the timing of the basic signals to determine the motor command using the foot-contact information, which is a very simple strategy. Despite the simple strategy, this timing regulation of the basic signals has been reported to produce various locomotor functions, such as the walk-run transition (Cappellini et al., 2006; Aoi et al., 2019), stepping over an obstacle (Ivanenko et al., 2005; Aoi et al., 2013), and split-belt treadmill walking (MacLellan et al., 2014; Fujiki et al., 2018). A Hodgkin-Huxley style neuron model showed that the phase of the neurons' activity was rapidly changed by external signals (Rybak et al., 2006a,b), which suggests that the neural system has a mechanism to quickly move the phase of neurons' activity. We would like to incorporate a more biologically detailed neuron model to further investigate the contribution of phase resetting in the future.

Electrical stimulation to the swing legs in cats (Forssberg et al., 1975; Forssberg, 1979) and humans (Belanger and Patla, 1984; Duysens et al., 1990) and mechanical stimulation in humans (Schillings et al., 1996, 2000) showed phase-dependent responses. In particular, stimulation early in the swing phase enhanced flexor muscle activities and extended the swing phase (elevating strategy), while stimulation late in the swing phase enhanced extensor muscle activities and advanced the foot-contact timing (lowering strategy) (Eng et al., 1994). From the intersection of the obtained PRCs with zero (**Figure 5C**), our previous work (Funato et al., 2016) showed that the mid-single stance phase extended in response to acceleration perturbations and the foot-contact timing advanced in response to deceleration perturbations, which correspond to the elevating and lowering strategies, respectively. In this study, we incorporated the phase resetting mechanism that modulates the timing of the motor command based on the foot-contact information. This mechanism is related to the lowering strategy. Despite not incorporating the elevating strategy, our model had a PRC shape similar to that for humans not only for deceleration perturbations but also for acceleration perturbations (**Figures 5B,C**). That is, application of only one of these two strategies allowed the model to reproduce the PRCs for acceleration and deceleration perturbations in humans. In the future, we would like to incorporate the elevating strategy in our motor control model to further clarify the adaptive rhythm control mechanism in human walking.

#### 4.2. Calculation of PRC

To calculate the PRC from kinematic measurements, two main methods have been proposed. One is the impulse method, which uses a single-impulse perturbation, and the other is the weighted spike-triggered average (WSTA) method, which uses a sequential pulse perturbation with zero mean and no temporal correlation (Ota et al., 2009). Our previous work (Funato et al., 2016) used both of these methods to calculate the PRCs for human walking. Because the impulse method required many trials that exhausted the subjects, the obtained PRCs had large deviations and low temporal resolution and could not show characteristic properties. The WSTA method improved the PRCs, and clear phase-dependent shapes could be resolved (**Figure 5C**). However, it still has limitations with regard to obtaining precise PRCs. For example, the timings of the two positive peaks of the PRC for the deceleration perturbation differed. In addition, the acceleration and deceleration perturbations showed some differences in the PRC, and it was difficult to determine whether they were actually different or due to limitations of the method. In particular, although the PRC was analytically derived under the assumption that the perturbation is sufficiently small, the perturbation must be large to reduce the influence of measurement noise in human experiments. In this study, we used a mathematical model to obtain the PRCs for human walking. Because of the high reproducibility of the simulation results, we obtained accurate PRCs for the model using arbitrarily small and short perturbations by the impulse method. Our model showed almost identical PRCs for the acceleration and deceleration perturbations. The modeling approach using the PRC has an advantage for improving our understanding of the underlying rhythm control mechanism.

#### 4.3. Limitations of Our Model and Future Work

The PRC for the model with phase resetting had a shape similar to that of the PRC for humans, whereas the PRC for the model without phase resetting did not, which supports the hypothesis that phase resetting contributes to adaptive rhythm control in human walking. However, our model has limitations for accurately reproducing the PRC for human walking. For example, the PRC for the model with phase resetting had much steeper positive peaks in the double-stance phases compared to the PRC for humans (**Figures 5B,C**). This is possibly because four discrete points on each sole were used for the foot-contact model. Due to the discrete points, perturbations in double-stance phases induced sudden changes in locomotor behavior and caused the steep positive peaks in the PRC. In addition, our model showed short double-stance phases compared to actual human walking (**Figures 5B,C**), which is mainly due to the absence of a phalangeal joint in our foot model. However, the PRCs for the model and humans had similar characteristics in the double-stance phase, such as steep positive slopes around the foot contact and positive peaks located in the double-stance phase.

Although we incorporated the phase resetting mechanism in the motor control model, other sensory-motor coordination mechanisms also play a role in human walking. For example, although we focused on the swing-to-stance phase transition using the foot-contact information, the stance-to-swing phase transition has been suggested to include important sensory-motor coordination mechanisms (Ekeberg and Pearson, 2005; Pearson et al., 2006; Dzeladini et al., 2014; Song and Geyer, 2015), such as the unloading rule that uses force-sensitive afferents in the ankle extensor muscles (Duysens and Pearson, 1980; Whelan et al., 1995) and the hip extension rule that uses position-sensitive afferents from the hip (Grillner and Rossignol, 1978; Hiebert et al., 1996). In addition, although this study changed the belt speed of a treadmill to disturb human locomotor behavior, other types of perturbations, such as pulling on the swing leg, have been used (Kobayashi et al., 2000; Nessler et al., 2016). Because the PRC depends on the perturbation, we would like to incorporate other sensory-motor coordination mechanisms and perturbations to further clarify adaptive rhythm control in human walking.

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

#### REFERENCES


#### AUTHOR CONTRIBUTIONS

SA developed the study design. DT performed computer simulations and analyzed the data in consultation with SA, TF, SF, KS, and KT. DT and SA wrote the manuscript and all the authors reviewed and approved it.

#### FUNDING

This study was supported in part by JSPS KAKENHI Grant Numbers JP15KT0015, JP26120006, and JP17H04914.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with some of the authors DT, SA, KS, and KT.

Copyright © 2020 Tamura, Aoi, Funato, Fujiki, Senda and Tsuchiya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Non-uniqueness Phenomenon of Object Representation in Modeling IT Cortex by Deep Convolutional Neural Network (DCNN)

Qiulei Dong<sup>1,2,3</sup>, Bo Liu<sup>1,2</sup> and Zhanyi Hu<sup>1,2,3</sup>\*

*<sup>1</sup> National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, <sup>2</sup> School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China, <sup>3</sup> Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China*

Recently, the DCNN (Deep Convolutional Neural Network) has been advocated as a general and promising modeling approach for neural object representation in primate inferotemporal cortex. In this work, we show that an inherent non-uniqueness problem exists in the DCNN-based modeling of image object representations. This non-uniqueness phenomenon reveals to some extent the theoretical limitation of this general modeling approach, and calls for due attention in practice.

Keywords: deep convolutional neural network, neural object representation, inferotemporal cortex, nonuniqueness, image object representation

#### Edited by:

*Kenway Louie, New York University, United States*

#### Reviewed by:

*Guy Elston, University of the Sunshine Coast, Australia Qiang Luo, Fudan University, China*

> \*Correspondence: *Zhanyi Hu huzy@nlpr.ia.ac.cn*

Received: *20 November 2019* Accepted: *09 April 2020* Published: *12 May 2020*

#### Citation:

*Dong Q, Liu B and Hu Z (2020) Non-uniqueness Phenomenon of Object Representation in Modeling IT Cortex by Deep Convolutional Neural Network (DCNN). Front. Comput. Neurosci. 14:35. doi: 10.3389/fncom.2020.00035*

### 1. INTRODUCTION

Object recognition is a fundamental task of a biological vision system. It is widely believed that the primate inferotemporal (IT) cortex is the final neural site for visual object representation. Due to viewpoint change, illumination variation and other factors, how visual objects are represented in IT cortex, which manifests sufficient invariance to such identity-orthogonal factors, is still largely an open issue in neuroscience.

There are many different natural and manmade object categories, and each category in turn contains various different members. Currently, a number of works in neuroscience advocate the DCNN (Deep Convolutional Neural Network) as a new framework for modeling vision and brain information processing (Cadieu et al., 2014; Khaligh and Kriegeskorte, 2014; Kriegeskorte, 2015). In Yamins et al. (2014) and Yamins and DiCarlo (2016), the DCNN is regarded as a promising general modeling approach for understanding sensory cortex, called "the goal-driven approach."

The basic idea of the goal-driven approach for IT cortex modeling can be summarized as follows: a multi-layered DCNN is trained by ONLY optimizing the object categorization performance with a large set of visual category-labeled objects. Once a high categorization performance is achieved, the outputs of the penultimate layer neurons of the trained DCNN, which are regarded as the object representation, can reliably predict the IT neuron spikes for other visual stimuli in rapid object recognition.<sup>1</sup> In addition, the outputs of the upstream layer neurons can also predict the V4 neuron spikes. The goal-driven approach is conceptually elegant and has been successfully used to model IT cortex in rapid object recognition and predict category-orthogonal properties (Hong et al., 2016).

<sup>1</sup>The goal-driven approach is for modeling IT neuron representation in rapid object vision, which is assumed to be largely a feedforward process and hence could be modeled by DCNNs, which are also feedforward networks.

### 2. DOES THE GOAL-DRIVEN APPROACH SATISFY THE UNIQUENESS REQUIREMENT IN MODELING IT CORTEX?

#### 2.1. Motivation

Although some experimental results have demonstrated the success of the goal-driven approach in modeling IT cortex to some extent as mentioned above, the following uniqueness problem on the fundamental premise of the goal-driven approach is still unclear: does there exist a unique pattern of activations of the neurons (units) in the penultimate layer of a DCNN to a given set of image stimuli by only optimizing the object categorization performance? This uniqueness problem on object representation via a DCNN has a great influence on the theoretical foundation and generality of the goal-driven approach in particular, and the DCNN as a new framework for vision modeling in general.

In this work, we aim to provide a theoretical analysis of this problem as well as some supporting experimental results. Note that our current work aims to clarify the non-uniqueness problem in object representation modeling with DCNNs under the goal-driven approach; it does not mean that DCNNs could account for the diverse specializations of IT, as revealed in numerous works (Elston, 2002, 2007; Jacobs and Scheibel, 2002; Spruston, 2008; Elston and Fujita, 2014; Luebke, 2017).

In order to analyse this problem more clearly, we firstly introduce the definition of DCNN layer's object representation as used for predicting the neuron responses of primate IT cortex in the aforementioned goal-driven approach:

Definition 1. For a layer of a DCNN for object recognition, the activations of the neurons in this layer to an input object image are defined as its object representation.

Following the convention in computational neuroscience, the following representation equivalence is introduced to evaluate whether the object representations learnt by two DCNNs are the same or not:

Definition 2. Given a set of object image stimuli, if the two object representations of two DCNNs on these stimuli can be related by a linear transformation, they are considered equivalent, or the same representations. Otherwise, they are different representations.
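One way to make Definition 2 operational is to fit the best linear map between two sets of activations and inspect the residual. The sketch below is an illustration under stated assumptions, not a procedure taken from the paper: the helper names are hypothetical, the fit is an ordinary least-squares solution of the normal equations without a bias term, and the toy data are random numbers rather than DCNN activations.

```python
import math
import random


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ z = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(b[0]))] for i in range(n)]


def linear_fit_residual(X, Y):
    """Relative residual of the least-squares linear map A with Y ~ X @ A.

    Rows of X and Y are stimuli; columns are units of the two representations.
    """
    Xt = transpose(X)
    A = solve(matmul(Xt, X), matmul(Xt, Y))
    fit = matmul(X, A)
    err = sum((f - y) ** 2 for rf, ry in zip(fit, Y) for f, y in zip(rf, ry))
    norm = sum(y ** 2 for ry in Y for y in ry)
    return math.sqrt(err / norm)


random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]  # toy data
Y_linear = matmul(X, [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [3.0, 0.0, 1.0]])
Y_cubic = [[v ** 3 for v in row] for row in X]  # element-wise non-linear map
```

A residual near zero, as for `Y_linear`, indicates equivalent representations in the sense of Definition 2, whereas a clearly non-zero residual, as for `Y_cubic`, indicates that no linear transformation relates the two on these stimuli.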

In the deep learning community, a recent active research topic is called "convergent learning" (Li et al., 2016), referring to whether different DCNNs can learn the same representation at the level of neurons or groups of neurons. A generally reached conclusion is that different DCNNs with the same network architecture, trained with only different random initializations, have largely different representations at the level of neurons or groups of neurons, although their image categorization performances are similar. Note that although Li et al.'s work and the goal-driven approach focus on the representation from different points of view, the representations in the two works are closely related. Hence, the results in Li et al. (2016) could also re-highlight the aforementioned uniqueness problem in object representation via a DCNN to some extent.

Addressing this uniqueness problem, we show in the following section that, in theory, by only optimizing the image categorization accuracy, different DCNNs can give different object representations though they have exactly the same categorization accuracy. In other words, the obtained object representations by DCNNs under the goal-driven approach could be inherently non-unique, at least in theory.

### 2.2. Theoretical Analysis and Experimental Results

Proposition 1. If the "Softmax" function is used as the final classifier for image categorization in modeling N categories of objects via a DCNN, and the object category with the largest probability is chosen as the final categorization, and if x = (x<sub>1</sub>, x<sub>2</sub>, · · · , x<sub>N</sub>)<sup>T</sup> ∈ ℝ<sup>N</sup> is the final output of this DCNN for an input image object I, f(·) is a univariate non-linear monotonically increasing function, and y ≜ (y<sub>1</sub>, y<sub>2</sub>, · · · , y<sub>N</sub>)<sup>T</sup> = F(x) = (f(x<sub>1</sub>), f(x<sub>2</sub>), · · · , f(x<sub>N</sub>))<sup>T</sup>, then x and y give exactly the same categorization result.

**Proof:** For x and y, their corresponding probability vectors by Softmax are respectively:

$$C\_x = \left( \frac{e^{x\_1}}{\sum\_{i=1}^N e^{x\_i}}, \frac{e^{x\_2}}{\sum\_{i=1}^N e^{x\_i}}, \dots, \frac{e^{x\_N}}{\sum\_{i=1}^N e^{x\_i}} \right)^T \tag{1}$$

$$C\_y = \left( \frac{e^{y\_1}}{\sum\_{i=1}^N e^{y\_i}}, \frac{e^{y\_2}}{\sum\_{i=1}^N e^{y\_i}}, \dots, \frac{e^{y\_N}}{\sum\_{i=1}^N e^{y\_i}} \right)^T \tag{2}$$

Since y<sub>i</sub> = f(x<sub>i</sub>) (i = 1, 2, · · · , N) and f(·) is a monotonically increasing function, the elements of x and y have the same magnitude order. Because the exponential function is monotonically increasing and the normalizing sum is common to all elements, the two probability vectors C<sub>x</sub> and C<sub>y</sub> also have the same magnitude order. Since the object category with the largest probability is chosen as the final categorization, the indices of the largest elements in C<sub>x</sub> and C<sub>y</sub> are the same; hence the same categorization results are obtained for x and y.

**Remark 1:** Since f(·) is a non-linear function, x and y cannot be related by a linear transformation. In addition, in the deep learning community, the Softmax function is commonly used to convert the output vector of the network into a probability vector, and the category with the largest probability value is chosen as the final category.

**Remark 2:** In theory, f(·) could be different for different input images I. More generally, even the monotonicity of f(·) is unnecessary: we only need the index of the largest value in y to be the same as that in x, because only the largest value determines the categorization. For the Top-K categorization accuracy, we need the index set of the K largest values in y to be the same as that in x, and no constraint is placed on the remaining elements. Hereinafter, for notational convenience in the discussion and practicality of implementation, we always assume that f(·) is a univariate non-linear monotonically increasing function.
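Proposition 1 and the Top-K statement in Remark 2 can be checked numerically on a small example. The sketch below is illustrative only: the output vector `x` is made up, and f(v) = v³ + v is just one convenient choice of a non-linear, strictly increasing function.

```python
import math


def softmax(values):
    """Softmax probabilities, shifted by the maximum for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ranking(values):
    """Indices sorted from the largest to the smallest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


def top_k(values, k):
    """Index set of the k largest values."""
    return set(ranking(values)[:k])


def f(v):
    """An example non-linear, strictly increasing function (assumption)."""
    return v ** 3 + v


x = [2.0, -1.0, 0.5, 3.1, 0.0]   # hypothetical final-layer output
y = [f(v) for v in x]            # element-wise transformed output

same_top1 = ranking(softmax(x))[0] == ranking(softmax(y))[0]
same_top3 = top_k(softmax(x), 3) == top_k(softmax(y), 3)
```

Both flags are true for this example, while the ratios y<sub>i</sub>/x<sub>i</sub> differ across elements, so y is not a scalar multiple of x even though the predicted category and the Top-3 set coincide.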

Proposition 2. As shown in **Figure 1**, assume that DCNN<sub>1</sub> is a multi-layered network, concatenating a sub-network DCNN<sub>1</sub><sup>P</sup> whose output is x, and a fully connected layer with weight matrix W<sub>1</sub> ∈ ℝ<sup>N×M</sup> and bias b<sub>1</sub> ∈ ℝ<sup>N×1</sup> ({M, N} are the numbers of neurons at the penultimate layer and last layer of DCNN<sub>1</sub>, respectively, with M > N), with x′ = W<sub>1</sub>x + b<sub>1</sub>. And assume that DCNN<sub>2</sub> is a multi-layered network, concatenating a sub-network DCNN<sub>2</sub><sup>P</sup> whose output is y, and a fully connected layer with weight matrix W<sub>2</sub> ∈ ℝ<sup>N×M</sup> and bias b<sub>2</sub> ∈ ℝ<sup>N×1</sup>, with y′ = W<sub>2</sub>y + b<sub>2</sub>. If y′ = f(x′) in element-wise mapping where f(·) is a monotonically increasing function, then the object representation x under DCNN<sub>1</sub> cannot be related by a linear transformation to the object representation y under DCNN<sub>2</sub>, or x and y are two different object representations under the goal-driven approach.

**Proof:** Since y′ = f(x′) element-wise, where f(·) is a monotonically increasing function, Proposition 1 implies that DCNN<sub>1</sub> and DCNN<sub>2</sub> have identical image object categorization performance.

Since x′ = W<sub>1</sub>x + b<sub>1</sub>, we have x = (W<sub>1</sub><sup>T</sup>W<sub>1</sub>)<sup>+</sup>W<sub>1</sub><sup>T</sup>(x′ − b<sub>1</sub>), where A<sup>+</sup> denotes the pseudo-inverse of matrix A. Similarly, y = (W<sub>2</sub><sup>T</sup>W<sub>2</sub>)<sup>+</sup>W<sub>2</sub><sup>T</sup>(y′ − b<sub>2</sub>). By Proposition 1, x′ and y′ are related by a non-linear function, so x and y cannot be related by a linear transformation either. In other words, x and y are two different object representations under the goal-driven approach. ∎

**Remark 3:** Since W<sub>1</sub>, W<sub>2</sub> ∈ R<sup>N×M</sup> and M > N in Proposition 2, the pseudo-inverse operator is used in the above proof. A few words on the pseudo-inverse: since M > N, which is the usual case in most existing DCNNs for object categorization (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014; Szegedy et al., 2015), the inverse (W<sub>i</sub><sup>T</sup>W<sub>i</sub>)<sup>+</sup> (i = 1, 2) is not unique, but the equalities in x = (W<sub>1</sub><sup>T</sup>W<sub>1</sub>)<sup>+</sup>W<sub>1</sub><sup>T</sup>(x′ − b<sub>1</sub>) and y = (W<sub>2</sub><sup>T</sup>W<sub>2</sub>)<sup>+</sup>W<sub>2</sub><sup>T</sup>(y′ − b<sub>2</sub>) can be strictly met.
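The pseudo-inverse expression used in the proof can be illustrated with a small numerical sketch. The sizes `N` and `M`, the random `W`, `b`, and `x`, and the explicit cutoff `rcond=1e-10` are all assumptions made for this example; only the formula (W<sup>T</sup>W)<sup>+</sup>W<sup>T</sup>(x′ − b) comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 16                       # hypothetical layer sizes with M > N
W = rng.normal(size=(N, M))        # stands in for W_1 (full row rank almost surely)
b = rng.normal(size=N)             # stands in for b_1
x = rng.normal(size=M)             # penultimate-layer representation
x_out = W @ x + b                  # x' = W x + b

# (W^T W)^+ W^T; the explicit rcond guards against round-off in the
# rank-deficient M x M matrix W^T W.
P = np.linalg.pinv(W.T @ W, rcond=1e-10) @ W.T
x_hat = P @ (x_out - b)

# The expression coincides with the Moore-Penrose pseudo-inverse of W ...
is_pinv = np.allclose(P, np.linalg.pinv(W))
# ... and the recovered vector reproduces the last-layer output x'.
reproduces_output = np.allclose(W @ x_hat + b, x_out)
print(is_pinv, reproduces_output)
```

In this sketch `x_hat` is the minimum-norm solution of W x + b = x′: it reproduces x′ exactly, although with M > N it coincides with a given `x` only when `x` has no component in the null space of `W`.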

Proposition 2 indicates that given DCNN<sub>1</sub> with output x′, if there exists another multi-layered network DCNN<sub>2</sub> with output y′ = f(x′), their representations x and y would be different but with identical categorization performance. This means that the aforementioned non-uniqueness problem in object representation modeling under the goal-driven approach would arise regardless of how many training images are used, and how many exemplar images in each category are included. In other words, the non-uniqueness problem is an inherent problem in DCNN modeling under the goal-driven approach, and it cannot be completely removed by using more training data, at least in theory.

In the above, an implicit assumption is that given a DCNN<sub>1</sub> with the output x′<sub>i</sub>, there always exists a DCNN<sub>2</sub> with the output y′<sub>i</sub> = f(x′<sub>i</sub>). Does such a DCNN<sub>2</sub> really always exist? This issue can be addressed separately for the following two cases. In the first, DCNN<sub>1</sub> and DCNN<sub>2</sub> may have different architectures; in the second, they have the same architecture but are merely initialized differently during training.

#### 2.2.1. The Different Architecture Case

Proposition 3. There always exists a multi-layered network to map I<sub>i</sub> to y<sub>i</sub> for the given input-output pairs {(I<sub>i</sub> ↔ y<sub>i</sub>), i = 1, 2, ···, n} in Proposition 2.

**Proof:** As shown in Proposition 2 and **Figure 1**, since DCNN<sub>1</sub> exists, it maps I to x. Denote this mapping function as x = S<sub>1</sub>(I) = DCNN<sup>P</sup><sub>1</sub>(I). Since x′ = W<sub>1</sub>x + b<sub>1</sub>, y′ = F(x′) = (f(x′<sub>1</sub>), f(x′<sub>2</sub>), ···, f(x′<sub>n</sub>)), y′ = W<sub>2</sub>y + b<sub>2</sub>, and y = (W<sub>2</sub><sup>T</sup>W<sub>2</sub>)<sup>+</sup>W<sub>2</sub><sup>T</sup>(y′ − b<sub>2</sub>), we have:

$$\begin{aligned}
\boldsymbol{y} &= (\boldsymbol{W}_2^T \boldsymbol{W}_2)^+ \boldsymbol{W}_2^T (\boldsymbol{y}' - \boldsymbol{b}_2) \\
&= (\boldsymbol{W}_2^T \boldsymbol{W}_2)^+ \boldsymbol{W}_2^T (\boldsymbol{F}(\boldsymbol{W}_1 \boldsymbol{S}_1(\boldsymbol{I}) + \boldsymbol{b}_1) - \boldsymbol{b}_2)
\end{aligned} \tag{3}$$

This is just the required mapping function. According to the Universal Approximation Theorem in Csáji (2001), it follows directly that there always exists a DCNN with an arbitrary number k + 1 (k > 1) of hidden layers, denoted as DCNN<sub>2</sub>, whose sub-network DCNN<sup>P</sup><sub>2</sub> with k hidden layers is able to approximate this function. ∎
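Equation (3) can be read as a recipe for computing the target representation y from the representation of DCNN<sub>1</sub>. The sketch below is one possible illustration under stated assumptions: the function name `representation_targets`, the choice f = tanh, the random weights, and the array sizes are hypothetical and not taken from the original implementation.

```python
import numpy as np

def representation_targets(x, W1, b1, W2, b2, f=np.tanh):
    """Compute y as in Equation (3) from representations of the first network.

    x      : (n, M) array whose rows are S_1(I_i) for n images
    W1, W2 : (N, M) weight matrices; b1, b2 : (N,) biases
    f      : element-wise monotonically increasing function (tanh is an assumption)
    """
    x_out = x @ W1.T + b1                        # x' = W1 x + b1
    y_out = f(x_out)                             # y' = F(x')
    return (y_out - b2) @ np.linalg.pinv(W2).T   # y = W2^+ (y' - b2)

rng = np.random.default_rng(1)
n, N, M = 50, 5, 32                              # hypothetical sizes with M > N
x = rng.normal(size=(n, M))
W1 = rng.normal(size=(N, M)) / np.sqrt(M)        # scaled so that tanh does not saturate
W2 = rng.normal(size=(N, M))
b1, b2 = rng.normal(size=N), rng.normal(size=N)

y = representation_targets(x, W1, b1, W2, b2)
labels_1 = np.argmax(x @ W1.T + b1, axis=1)      # labels from the first network
labels_2 = np.argmax(y @ W2.T + b2, axis=1)      # labels implied by the targets y
print(np.array_equal(labels_1, labels_2))
```

The final comparison checks that the labels read out from y through (W<sub>2</sub>, b<sub>2</sub>) agree with those of the first network, which is the property the proof relies on.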

Proposition 3 indicates that given a DCNN<sub>1</sub>, there always exists a DCNN<sub>2</sub>, whose architecture may be different from that of DCNN<sub>1</sub>, such that the object representations of the two DCNNs are different but give the same categorization performance. A training procedure is described in the **Appendix** to show how to train such a pair of DCNN<sub>1</sub> and DCNN<sub>2</sub>.

**Remark 4:** In the proof, the only requirement for DCNN<sub>2</sub> is that it should have sufficient capacity to represent the input object set, but it does not necessarily have a similar network architecture to DCNN<sub>1</sub>. Note that sufficient representational capacity is an implicit necessary requirement for any DCNN-based application.

**Remark 5:** In the proof, the number of input images is assumed to be unknown. However, for the finite-input case, Theorem 1 in Tian (2017) guarantees that there exists a two-layered neural network with ReLU activation and (2n + d) weights that can represent any mapping function from input to output on a sample of size n in d dimensions. Of course, such a constructed network may simply memorize the samples, i.e., it ensures that the given finite inputs are mapped to the required outputs, but it does not guarantee sufficient generalization ability for new samples.

#### 2.2.2. The Same Architecture Case

When DCNN<sub>1</sub> and DCNN<sub>2</sub> have the same network architecture and differ only in the random initialization used for training, a theoretical proof is clearly impossible. However, based on the results reported in the "convergent learning" literature as well as our own simulated experiments, it seems that they still largely have non-equivalent object representations, although they have similar categorization performances.

#### **(1) Non-uniqueness results from the "convergent learning" literature**

Using AlexNet (Krizhevsky et al., 2012) as a benchmark, Li et al. (2016) showed that when the architecture is kept unchanged and the network is merely trained with different random initializations, the four DCNNs obtained have similar categorization performances, but their object representations are largely different in terms of one-to-one, one-to-many, and many-to-many linear representation mapping. Note that the many-to-many mapping in Li et al. (2016) is closely related to the equivalence representation in Definition 2. Hence, the four representations are largely non-equivalent, and this non-equivalence becomes more prevalent as the convolutional layer depth increases.

By introducing the concepts of "ε-simple match set" and "ε-maximum match set," Wang et al. (2018) showed that for two representative DCNNs, VGG (Simonyan and Zisserman, 2014) and ResNet (He et al., 2016), the size of the maximum match set between the activation vectors of individual neurons at the same layer of the two DCNNs, which are also obtained with only different initializations as in Li et al. (2016), is tiny compared with the number of neurons at that layer. It was further found that only the outputs of neurons in the ε-maximum match set can be approximated within an ε-error bound by a linear transformation, which indicates that for the majority of the neurons at the same layer, their outputs cannot be reasonably approximated by a linear transformation; that is, the corresponding object representations are largely not equivalent.

#### **(2) Non-uniqueness results from our experiments**

Definition 3. If two DCNNs, DCNN<sub>1</sub> and DCNN<sub>2</sub>, have similar image categorization performances with the same network architecture but different parameter configurations, they are called a similar performing pair of DCNNs.

Generally speaking, our results further confirm the non-uniqueness phenomenon of object representation under the goal-driven approach. We systematically investigated the representation differences between a similar performing pair of DCNNs on two public object image datasets: CIFAR-10, which contains 60,000 images belonging to 10 categories of objects, and CIFAR-100, which contains 60,000 images belonging to 100 categories of objects (Krizhevsky, 2009). In our experiments, 5,000 images per category in CIFAR-10 (and 500 images per category in CIFAR-100) were randomly selected for network training, and the rest were used for testing. Six network architectures with different configurations (denoted as {D1, D2, D3, D4, D5, D6}) were employed for evaluation, where {D1, D2, D3, D5, D6} were used for CIFAR-10 and {D3, D4, D6} for CIFAR-100, as shown in **Table 1**.

The traditionally used measure, "explained variance" (EV), was employed to assess the degree of linearity between the learnt object representations from a similar performing pair of DCNNs, and we trained similar performing pairs of DCNNs under the following two schemes:


The main results from our experiments are as follows:

#### **(i) Explained variance on standard data**

The results using the training Scheme-1 are shown in **Figure 2**. **Figures 2A,C** show the categorization accuracies of similar performing pairs of DCNNs under different network architectures with two random initializations on CIFAR-10 and CIFAR-100, respectively. The blue bars of **Figures 2B,D** show the corresponding mean EVs on CIFAR-10 and CIFAR-100, respectively. As seen from **Figures 2B,D**, the mean EVs of {D1, D2, D3, D5, D6} are around 63.4–87.5% on CIFAR-10, while the mean EVs of {D3, D4, D6} are around 53.6–65.9% on CIFAR-100. In addition, the mean EV of the network D1 under the training Scheme-2 is 51.2% on CIFAR-10.

Two points are revealed from these results:



TABLE 1 | Network configurations (shown in columns).

*The convolutional layer parameters are denoted as "Conv⟨receptive field size⟩-bn-⟨number of channels⟩." The fully connected layer parameters are denoted as "Fc-⟨number of units⟩."*

larger explained variance between the two representations. The underlying reason seems to be that a DCNN with a deeper architecture generally has a larger representational capacity, whereas a fixed task has a fixed representation demand, so a DCNN with a larger capacity gives a more linear representation.

In addition, although the categorization performances of a similar performing pair are similar, this does not mean that the two DCNNs assign the identical categorization label, whether correct or wrong, to each input sample. We have manually checked the categorization results for CIFAR-10 and CIFAR-100. The orange bars of **Figures 2B,D** show the mean EVs computed for only those inputs that are correctly categorized. As seen from **Figure 2**, the discrepancy between the explained variances computed on only the correctly categorized inputs and on all inputs is negligible in most cases. This is perhaps due to the already high categorization rate of the two DCNNs, such that the incorrectly categorized inputs make up only a small fraction of a relatively large test set.
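To make the mean EV between two representations concrete, the following sketch computes it under an assumed definition: each unit of one representation is predicted from the other representation by an affine least-squares fit, EV is one minus the ratio of residual variance to total variance, and the result is averaged over units. The function name `mean_explained_variance`, the synthetic data, and the in-sample fit (no held-out split) are assumptions for illustration and may differ from the procedure actually used in the experiments.

```python
import numpy as np

def mean_explained_variance(X, Y):
    """Mean EV of an affine least-squares map from X to Y.

    X : (n, p) responses of network 1; Y : (n, q) responses of network 2.
    Assumed definition: EV_j = 1 - Var(residual_j) / Var(Y_j), averaged over j.
    """
    Xc = X - X.mean(axis=0)            # centring absorbs the bias term
    Yc = Y - Y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    resid = Yc - Xc @ coef
    ev = 1.0 - resid.var(axis=0) / Yc.var(axis=0)
    return float(ev.mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))         # placeholder representation
A = rng.normal(size=(20, 20))
ev_linear = mean_explained_variance(X, X @ A + 3.0)         # exactly affine in X
ev_nonlinear = mean_explained_variance(X, np.tanh(X @ A))   # non-linear in X
print(ev_linear, ev_nonlinear)
```

An affinely related pair yields an EV of essentially one, whereas a non-linearly related pair yields a smaller value, which is the sense in which EV measures the degree of linearity between two representations.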

#### **(ii) Explained variance on noisy data**

In Szegedy et al. (2014), it is reported that DCNNs are sometimes sensitive to adversarial images, that is, images slightly corrupted with random noise that poses no significant problem for human perception but dramatically alters the categorization performance of DCNNs. Here, we assessed the effect of noise on the representation equivalence on CIFAR-10. The input images are normalized to the range [0, 1], and Gaussian noise with mean 0 and standard deviation σ ∈ {0.01, 0.02, 0.03, 0.04, 0.05, 0.07, 0.1} is added to these images, respectively. **Figure 3A** shows the corresponding categorization accuracies of similar performing pairs of DCNNs under different architectures, while **Figure 3B** shows the corresponding mean EVs. We find that even at the noise level σ = 0.1, the explained variance does not change much, although the categorization accuracy decreases notably.
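The noise corruption described above can be sketched as follows. The noise levels are those listed in the text; the helper name `add_gaussian_noise`, the random placeholder images, and the decision not to clip the noisy images back to [0, 1] are assumptions, since the text does not specify these details.

```python
import numpy as np

SIGMAS = (0.01, 0.02, 0.03, 0.04, 0.05, 0.07, 0.1)   # noise levels from the text

def add_gaussian_noise(images, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to images in [0, 1]."""
    return images + rng.normal(loc=0.0, scale=sigma, size=images.shape)

rng = np.random.default_rng(0)
# Placeholder for normalized CIFAR-10 images of shape (batch, height, width, channels).
images = rng.uniform(0.0, 1.0, size=(8, 32, 32, 3))
noisy_sets = {s: add_gaussian_noise(images, s, rng) for s in SIGMAS}

for s, noisy in noisy_sets.items():
    # The empirical standard deviation of the added noise should be close to s.
    print(s, float((noisy - images).std()))
```

Each noisy image set would then be passed through both networks of a similar performing pair to obtain the accuracies and mean EVs at that noise level.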

#### **(iii) Variations of explained variance by changing stimuli size**

In neuroscience experiments, the number of stimuli cannot be too large. For image categorization by DCNNs, however, the test set can be very large. Does the size of the stimulus set play a role in the explained variance? To address this issue, we assessed the explained variance as the dataset size increases by resampling subsets from the original test set of images in CIFAR-10. Here, image subset sizes of [1000, 2000, ···, 10000] are evaluated. **Figures 4A,B** show the results on the resampled subsets from the whole set of test data and from the set of only those images which are correctly categorized, respectively. Our results show that once the size of the stimulus set reaches a modestly large number (around 3000), the explained variance stabilizes. That is to say, a very large number of stimuli is not needed to estimate the explained variance reliably. In other words, stimuli in the order of thousands could already reveal the essence, and a further increase in the number of stimuli would not alter the estimate much.

FIGURE 2 | (A) Categorization accuracies of {D1, D2, D3, D5, D6} with two random initializations on CIFAR-10 (Net1 and Net2 denote the same network with two initializations, similarly hereinafter). (B) Mean EVs on CIFAR-10 for all the inputs (blue bars)/only the correctly categorized inputs (orange bars). (C) Categorization accuracies of {D3, D4, D6} with two initializations on CIFAR-100. (D) Mean EVs on CIFAR-100 for all the inputs (blue bars)/only the correctly categorized inputs (orange bars).

FIGURE 4 | Mean EVs with different image samples: (A) Samples are randomly selected from the whole test image set. (B) Samples are randomly selected from the set of only those correctly categorized images.

#### **(iv) Explained variance vs. neuron selectivity**

Clearly, some DCNN neurons are more selective than others (Dong et al., 2017, 2018). Using the kurtosis (Lehky et al., 2011) of the neuron's response distribution to image stimuli, we investigated whether neuron selectivity has some correlation with the explained variance. We chose top {10%, 20%, · · · , 100%} most selective neurons from each DCNN in a similar performing pair, respectively, then computed the explained variance between the two chosen subsets, and the results are shown in **Figure 5**. As seen from **Figure 5**, with the increase of the percentage of selective neurons, the explained variance increases accordingly. This indicates that for the object representations of a similar performing pair of DCNNs, neuron selectivity is also an influential factor on their explained variance. The explained variance between the subsets of more selective neurons is smaller, and this result seems to be in concert with the conclusion in Morcos et al. (2018) where it is shown that neuron selectivity does not imply the importance in object generalization ability.
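The selection of the most selective neurons by kurtosis can be sketched as below. This is an illustrative assumption rather than the original code: the helper names `kurtosis` and `most_selective`, the use of excess kurtosis, and the synthetic responses (90 broadly tuned units and 10 rarely active units) are hypothetical.

```python
import numpy as np

def kurtosis(responses):
    """Excess kurtosis of each neuron; responses has shape (n_stimuli, n_neurons)."""
    centred = responses - responses.mean(axis=0)
    var = (centred ** 2).mean(axis=0)
    return (centred ** 4).mean(axis=0) / var ** 2 - 3.0

def most_selective(responses, fraction):
    """Indices of the top `fraction` of neurons ranked by kurtosis, highest first."""
    k = kurtosis(responses)
    n_keep = max(1, int(round(fraction * responses.shape[1])))
    return np.argsort(k)[::-1][:n_keep]

rng = np.random.default_rng(0)
dense = rng.normal(size=(2000, 90))                       # broadly tuned units
# Units that respond to only about 5% of the stimuli (highly selective).
sparse = rng.normal(size=(2000, 10)) * (rng.random((2000, 10)) < 0.05)
responses = np.hstack([dense, sparse])

top10 = most_selective(responses, 0.10)
print(sorted(top10.tolist()))
```

Applying `most_selective` to each network of a similar performing pair with fractions 0.1, 0.2, and so on up to 1.0 gives the neuron subsets between which the explained variance would then be computed.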

#### **(v) A good representation is not necessarily IT-like**

In the literature (Khaligh and Kriegeskorte, 2014), it is shown that if an object representation is IT-like, it can give a good object recognition performance. This work shows that the converse is not necessarily true, at least theoretically speaking. That is, as shown in the above experiments and discussions, many different representations can give the same or quite similar recognition results with/without noise.

**Remark 6:** In this work, we assume the final classifier is a Softmax classifier. For other linear classifiers, the general conclusion of non-equivalence can be derived similarly. Of course, if the classifier used is a non-linear one, or the output of the penultimate layer is further processed by a non-linear operator before being input to a linear classifier, as done in Chang and Tsao (2017), where a 3-order polynomial is used as a preprocessing step for the final classification, our results will no longer hold. However, since Majaj et al. (2015) showed that monkey IT neuron responses can be reliably decoded by a linear classifier, we believe that using Softmax as the final classifier for DCNN-based IT cortex modeling does not constitute a major problem for our results.

#### 3. CONCLUSION

We stress that we are not against using DCNNs to model sensory cortex. In fact, their potential and usefulness have been demonstrated in Yamins et al. (2014) and Yamins and DiCarlo (2016). Here, we only provide a theoretical reminder on the possible non-uniqueness phenomenon of the object representations learnt by DCNNs, in particular, by the goal-driven approach proposed in Yamins and DiCarlo (2016). As shown in the convergent-learning literature, such a non-uniqueness phenomenon is prevalent in deep learning. Hence, when DCNNs are used as a general framework for modeling sensory cortex, one should be aware of this potential and inherent non-uniqueness problem, and appropriate network architectures in DCNN learning should be carefully considered.

#### DATA AVAILABILITY STATEMENT

Publicly available datasets were analyzed in this study. This data can be found here: http://www.cs.toronto.edu/~kriz/cifar.html.

### AUTHOR CONTRIBUTIONS

ZH conceived of the non-uniqueness phenomenon of object representation in modeling IT cortex by DCNN. QD and ZH explored the method. QD and BL implemented the explored method and performed the validation. QD and ZH wrote the paper.

### REFERENCES

### FUNDING

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB32070100), and National Natural Science Foundation of China (61991423, U1805264, 61573359, 61421004).

### ACKNOWLEDGMENTS

This manuscript has been released as a Pre-Print at http://export.arxiv.org/pdf/1906.02487 (Dong et al., 2019).


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Dong, Liu and Hu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

### APPENDIX

#### **Procedure to train DCNN<sub>1</sub> and DCNN<sub>2</sub>:**

**Input:** A set of n image objects: D = {I<sub>i</sub>, i = 1, 2, ···, n} with known categorization labels.

**Output:** DCNN<sub>1</sub> and DCNN<sub>2</sub> whose object representations are different but with the same (or similar) categorization performance.


5. Use the training pairs {(I<sub>i</sub> ↔ y<sub>i</sub>), i = 1, 2, ···, n} to train the second DCNN to minimize the Euclidean loss between the DCNN's output ỹ<sub>i</sub> and y<sub>i</sub>.

6. The DCNN trained in step (5) is the required DCNN<sub>2</sub>. The object representations x<sub>i</sub> of DCNN<sub>1</sub> and y<sub>i</sub> of DCNN<sub>2</sub> are different representations by Definition 2, because for the same object I<sub>i</sub>, x<sub>i</sub> and y<sub>i</sub> give the same categorization results in theory without noise, or similar results with noise in practice, but they cannot be related by a linear transformation, as shown in Proposition 2.
