# A CONVERSATION WITH THE BRAIN: CAN WE SPEAK ITS LANGUAGE?

EDITED BY: Alejandro Barriga-Rivera, Tianruo Guo, Yuki Hayashida and Gregg Suaning

PUBLISHED IN: Frontiers in Neuroscience

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-067-4 DOI 10.3389/978-2-88966-067-4


# A CONVERSATION WITH THE BRAIN: CAN WE SPEAK ITS LANGUAGE?

Topic Editors:

Alejandro Barriga-Rivera, The University of Sydney, Australia; Tianruo Guo, University of New South Wales, Australia; Yuki Hayashida, Osaka University, Japan; Gregg Suaning, The University of Sydney, Australia

Citation: Barriga-Rivera, A., Guo, T., Hayashida, Y., Suaning, G., eds. (2020). A Conversation With the Brain: Can We Speak Its Language?. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-067-4


# Editorial: A Conversation With the Brain: Can We Speak Its Language?

Alejandro Barriga-Rivera<sup>1,2</sup>\*, Tianruo Guo<sup>3</sup>, Yuki Hayashida<sup>4,5</sup> and Gregg J. Suaning<sup>1</sup>

*<sup>1</sup> School of Biomedical Engineering, The University of Sydney, Sydney, NSW, Australia, <sup>2</sup> Department of Applied Physics III, University of Seville, Seville, Spain, <sup>3</sup> Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, Australia, <sup>4</sup> Graduate School of Engineering, Osaka University, Osaka, Japan, <sup>5</sup> Graduate School of Engineering, Mie University, Mie, Japan*

Keywords: bionics, electrostimulation, nervous system, prosthesis, cochlear, modeling, electrode, neural interface

#### **Editorial on the Research Topic**

#### **A Conversation With the Brain: Can We Speak Its Language?**

Hearing, sight, touch, and learning all happen in the brain. The organs that sense the environment send complex neural messages to the brain to inform it about the surrounding world. Likewise, the brain sends instructions to the organs to elicit a response such as a muscle contraction. Furthermore, the brain is also responsible for mental processes such as cognition and the generation of emotions. However, disease or trauma can alter these neural communications, causing blindness, deafness, paralysis, or mental illness, among others. Fortunately, a family of therapies based on the delivery of electric charge exists or is being investigated to treat some of these health conditions. An example of a successful treatment to restore hearing is the cochlear implant (Zeng et al., 2008). Visual and motor prostheses provide hope to the blind and the paralyzed, respectively. All of these medical devices share one common challenge: the replication of neural codes. This ambitious goal requires (1) developing better ways to "listen" to the neurons by means of improved electrode-tissue interfaces and signal processing algorithms, (2) devising stimulation strategies able to mimic physiological responses, and (3) enhancing or restoring brain computational capabilities (Barriga-Rivera et al., 2017a).

This Research Topic includes a total of 11 contributions from more than 40 world-leading experts and upcoming researchers, and provides a state-of-the-art view on some of the key questions related to our ability to maintain a conversation with the brain to treat disease. Ranging from highly sophisticated computational models to novel brain tissue alternatives, the works presented here suggest new strategies to overcome some of the difficulties engineers and scientists are facing.

#### INTERPRETING THE NEURONS

The quality of the conversation between the brain and devices depends heavily on the quality of the connection established with the neurons. On the one hand, computational models have proven highly useful in predicting the efficacy of this connection and, in particular, how the electric fields generated by implanted electrodes can activate different neurons. For example, Bai et al. used micro-CT scans to reconstruct the detailed three-dimensional anatomy of the human cochlea, which was then incorporated into finite element computational models of neural excitability. Along the same lines, a different modeling study (Bachmaier et al.) reported on the potential weaknesses of the most widely used computational models of auditory nerve fibers. With these modeling studies, we learned that limited biological features in the simulated nervous system, particularly missing anatomical microstructures and biophysical details, might yield inaccurate or even misleading information. On the other hand, sophisticated signal processing algorithms can assist in choosing the optimal message to be delivered to the brain. On this topic, a study on noise suppression in bionic hearing reminded us that signal processing can achieve superior performance in delivering information to the brain (Zhou et al.). However, an unmet need for improved electrode-tissue interfaces remains. A study by Gilmour et al. describes a new tool for testing brain-electrode interfaces: "An improved in vitro model of cortical tissue." As it integrates different cell types (astrocytes, microglia, oligodendrocytes, and neurons), this cost-effective approach can be used for large-scale preclinical evaluation of new-generation devices.

#### Edited and reviewed by:

*Michele Giugliano, International School for Advanced Studies (SISSA), Italy*

\*Correspondence: *Alejandro Barriga-Rivera alejandro.barriga-rivera@sydney.edu.au*

#### Specialty section:

*This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience*

Received: *01 July 2020* Accepted: *06 July 2020* Published: *14 August 2020*

#### Citation:

*Barriga-Rivera A, Guo T, Hayashida Y and Suaning GJ (2020) Editorial: A Conversation With the Brain: Can We Speak Its Language? Front. Neurosci. 14:794. doi: 10.3389/fnins.2020.00794*

#### ELICITING MEANINGFUL NEURAL ACTIVITY

One of the key limitations in the field of neural electrostimulation relates to its poor ability to replicate physiological neural patterns (Borst and Theunissen, 1999). The development of many neural prostheses has reached an impasse where the level of artificially elicited function does not warrant implanting these devices in more than an experimental-scale cohort of patients. Over the last decade, novel stimulation methods have been developed to directly address the challenge of restoring some of the natural processes that occur with normal function through the control of critical neural pathways. For example, high-frequency stimulation (Guo et al., 2017; Muralidharan et al., 2020) or field shaping techniques (Cicione et al., 2012; Barriga-Rivera et al., 2017a,b) have been investigated to improve artificial vision. State-of-the-art stimulation strategies in the field of bionic vision have been updated in this topic (Fernandez et al.; Tong et al.). In addition, the work of Saeedi and Hemmert offers new insights into how neural information elicited by multi-pulse electrical stimulation integrates within the auditory brainstem in 12 cochlear implant recipients. Other researchers (Vickery et al.; Yap et al.) provided an update on the current status of transcutaneous nerve stimulation, whereas Loulit and Potas proposed the dorsal column nuclei as a target for somatosensory restoration.

While many studies in this special issue are devoted to expanding our understanding of how artificial electrical stimulation interacts with neurons, with the hope of improving the quality of the artificially elicited neural activity, most of the proposed stimulation methods will require the support of improved materials, manufacturing and packaging techniques to eventually reach the clinic (Rivnay et al., 2017; Benfenati and Lanzani, 2018; Levi et al., 2018).

#### ENHANCING BRAIN COMPUTATIONAL POWER

Paraphrasing the first words of this editorial, everything occurs in the brain. It is therefore the ultimate target of nearly all afferent neuromodulation applications. While improving neural interfaces and signal processing techniques is essential to delivering meaningful neural messages, the brain has the last word in the interpretation of those messages. Fernandez and colleagues (Fernandez et al.) pointed to the potential that the brains of the blind have to adapt to the re-introduction of a visual input. The authors remarked on the importance of devising rehabilitation strategies to strengthen the brain's capacity to cope with artificially encoded neural messages, a practice that could plausibly bring the performance of neural prostheses to a superior level (Beyeler et al., 2017).

#### FINAL REMARKS

The brain is an extraordinarily complex organ that integrates over 100 trillion connections from nearly 100 billion neurons. In this topic, Buskila et al. remind us of the importance of other brain cells such as the astrocytes in the generation of brain states, a phenomenon known as lateral astrocytic synaptic regulation. In other words, the brain works as a perfectly coordinated orchestra with many instruments of different kinds. When disease or accidents alter the score or the composition of the orchestra, a different tune is played. To restore or even mimic the lost function, the many neurostimulation strategies under development and investigation require a highly multi-disciplinary approach that addresses the general problem from different viewpoints. Technological advancement can only be accelerated by establishing stronger collaborations between clinicians, neuroscientists and biomedical engineers.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### ACKNOWLEDGMENTS

This work was supported by the RTI2018-094465-J-I00 grant (MCIU/AEI/FEDER, UE) and by the European Union's Horizon 2020 Research and innovation program under the Marie Sklodowska-Curie Grant Agreement No. 746526.

#### REFERENCES

Barriga-Rivera, A., Bareket, L., Goding, J., Aregueta-Robles, U. A., and Suaning, G. J. (2017a). Visual prosthesis: interfacing stimulating electrodes with retinal neurons to restore vision. Front. Neurosci. 11:620. doi: 10.3389/fnins.2017.00620

Barriga-Rivera, A., Guo, T., Yang, C.-Y., Abed, A. A., Dokos, S., Lovell, N. H., et al. (2017b). High-amplitude electrical stimulation can reduce elicited neuronal activity in visual prosthesis. Sci. Rep. 7:42682. doi: 10.1038/srep42682

Benfenati, F., and Lanzani, G. (2018). New technologies for developing second generation retinal prostheses. Lab Anim. 47, 71–75. doi: 10.1038/s41684-018-0003-1


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Barriga-Rivera, Guo, Hayashida and Suaning. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Generating Brain Waves, the Power of Astrocytes

Yossi Buskila<sup>1,2</sup>\*, Alba Bellot-Saez<sup>1,2</sup> and John W. Morley<sup>1</sup>

<sup>1</sup> School of Medicine, Western Sydney University, Campbelltown, NSW, Australia, <sup>2</sup> International Centre for Neuromorphic Systems, The MARCS Institute, Western Sydney University, Penrith, NSW, Australia

Synchronization of neuronal activity in the brain underlies the emergence of neuronal oscillations termed "brain waves", which serve various physiological functions and correlate with different behavioral states. It has been postulated that at least ten distinct mechanisms are involved in the formation of these brain waves, including variations in the concentration of extracellular neurotransmitters and ions, as well as changes in cellular excitability. In this mini review we highlight the contribution of astrocytes, a subtype of glia, to the formation and modulation of brain waves, mainly due to their close association with synapses, which allows bidirectional interaction with neurons, and their syncytium-like activity via gap junctions, which facilitates communication to distal brain regions through Ca<sup>2+</sup> waves. These capabilities allow astrocytes to regulate neuronal excitability via glutamate uptake, gliotransmission and tight control of extracellular K<sup>+</sup> levels via a process termed K<sup>+</sup> clearance. Spatio-temporal synchrony of activity across neuronal and astrocytic networks, both locally and distributed across cortical regions, underpins brain states and thereby behavioral states, and it is becoming apparent that astrocytes play an important role in the development and maintenance of neural activity underlying these complex behavioral states.

Keywords: brain waves, oscillations, astrocytes, spatial buffering, K<sup>+</sup> clearance

# INTRODUCTION

### Neuronal Oscillations

In the central nervous system (CNS), neurons communicate via electrochemical signals, which lead to the flow of ionic currents through synaptic contacts (Schaul, 1998). At the network level, the synchronization of the neurons' electrical activity gives rise to rhythmic voltage fluctuations traveling across brain regions, known as neuronal oscillations or brain waves (Buzsaki, 2006).

Neuronal oscillations can be modulated in space and time and are affected by the dynamic interplay between neuronal connectivity patterns, cellular membrane properties, intrinsic circuitry, speed of axonal conduction and synaptic delays (Nunez, 1995; Sanchez-Vives and McCormick, 2000; Cunningham et al., 2006; Buskila et al., 2013; Tapson et al., 2013). At the cellular level, these synchronous oscillations fluctuate between two main states, known as "up states" and "down states", which occur in the neocortex both in vitro and in vivo (Sanchez-Vives and McCormick, 2000). Whereas Down states refer to resting activity and membrane hyperpolarization, Up states are associated with neuronal depolarization and firing bursts of action potentials (Cossart et al., 2003). Importantly, Up states occurring within spatially organized cortical ensembles have been postulated to interact with each other to produce a temporal window for neuronal network communication and coordination (Fries, 2005). This network coherence was found to be essential for several sensory and motor processes, as well as for cognitive flexibility (i.e., attention, memory), thereby playing a fundamental role in the brain's basic functions (Fries et al., 2001; Tallon-Baudry et al., 2004).

#### Edited by:

Yuki Hayashida, Osaka University, Japan

#### Reviewed by:

Jit Muthuswamy, Arizona State University, United States; Alexei Verkhratsky, University of Manchester, United Kingdom

\*Correspondence: Yossi Buskila, Y.buskila@westernsydney.edu.au; buskila63@gmail.com

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 30 July 2019 Accepted: 04 October 2019 Published: 18 October 2019

#### Citation:

Buskila Y, Bellot-Saez A and Morley JW (2019) Generating Brain Waves, the Power of Astrocytes. Front. Neurosci. 13:1125. doi: 10.3389/fnins.2019.01125

Emerging technologies during the past decades led to the description of multiple neuronal oscillations displaying different electrophysiological and connectivity properties across brain areas including the neocortex, thalamus and hippocampus (Steriade, 2006). Using power spectrum analysis, investigators identified that neuronal oscillations fluctuate within specific frequency bands, ranging from very slow (<0.01 Hz) to ultra-fast (>1,000 Hz) oscillations, mediated by at least ten different mechanisms (Penttonen and Buzsáki, 2003). Whereas fast oscillators are found to be more localized within a restricted neural volume (Contreras and Llinas, 2001), slow oscillations typically involve large synchronous membrane voltage fluctuations in wider areas of the brain (He et al., 2008). These network dynamics and connectivity patterns can change according to the behavioral state, with some frequency bands being associated with sleep, while other frequencies predominate during arousal or conscious states (Brooks, 1968; Achermann and Borbély, 1997; Murthy and Fetz, 2006) (**Table 1**). Interestingly, neuronal oscillations interact across different frequency bands to modulate each other and engage specific behaviors (Buzsaki, 2006; Steriade, 2006), and previous studies have postulated that different oscillation frequencies either compete with each other or cooperate in a specific manner to participate in distinct physiological processes such as bias of input selection, temporal linkage of neurons into assemblies and facilitation of synaptic plasticity (Buzsáki and Draguhn, 2004; Isomura et al., 2006). Moreover, oscillation phase relationships between regions are diverse and can be modulated by sensory and motor experiences (Maris et al., 2016), thereby adding greater complexity in deciphering how brain waves coordinate to subserve important functions in both the developing and adult human brain.
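As a rough illustration of the power spectrum analysis mentioned above, the following sketch builds a synthetic signal from two sinusoids and recovers the dominant frequency with a plain discrete Fourier transform. The sampling rate, the two component frequencies (6 Hz and 40 Hz), the band edges, and all function names are assumptions chosen for the example; none of them come from the cited studies.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Return (frequencies, power) of the one-sided DFT of a real signal."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        # Direct DFT sum; slow (O(n^2)) but dependency-free and easy to read.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(freqs, power, low_hz, high_hz):
    """Sum spectral power over the half-open band [low_hz, high_hz)."""
    return sum(p for f, p in zip(freqs, power) if low_hz <= f < high_hz)


FS = 200.0   # assumed sampling rate in Hz
N = 400      # 2 s of samples, giving 0.5 Hz frequency resolution

# Synthetic "recording": a strong 6 Hz rhythm plus a weaker 40 Hz rhythm.
signal = [1.0 * math.sin(2 * math.pi * 6 * i / FS)
          + 0.3 * math.sin(2 * math.pi * 40 * i / FS)
          for i in range(N)]

freqs, power = power_spectrum(signal, FS)
peak_freq = freqs[max(range(len(power)), key=power.__getitem__)]

# Hypothetical band edges used only to compare a slow and a fast band.
slow_band = band_power(freqs, power, 4.0, 8.0)
fast_band = band_power(freqs, power, 30.0, 80.0)

print(f"dominant frequency: {peak_freq:.1f} Hz")
print(f"slow-band power: {slow_band:.1f}, fast-band power: {fast_band:.1f}")
```

With these inputs the dominant frequency lands on the 6 Hz component and the slow band carries more power than the fast band. Real recordings would call for windowing and averaging (for example Welch's method), which this sketch leaves out.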

The common view of oscillatory frequency bands is that they represent groups of neuronal oscillations acting as distinct entities that work similarly during particular brain functions (Watson, 2015), and therefore, can serve as a fundamental tool for both clinical diagnosis and brain research (Huber et al., 2004; Buzsaki, 2006). In addition, the fact that brain waves are expressed in many species (e.g., human, macaque, cat, rabbit, rat) and that their behavioral correlates are preserved throughout evolution is a testament to their fundamental role in mediating synchronization across neuronal ensembles to efficiently coordinate and propagate neuronal signals at the network level (Hughes et al., 2004; Bereshpolova et al., 2007; Skaggs et al., 2007; Nir et al., 2011; Peyrache et al., 2011).

### Mechanisms Underpinning Neuronal Oscillations

Neuronal oscillations show a linear progression on a natural logarithmic scale with little overlap (Penttonen and Buzsáki, 2003), leading to the suggestion that at least ten distinct and independent mechanisms are required to cover the large frequency range of brain waves; it has also been reported that several oscillations are driven by multiple mechanisms (Buzsáki and Draguhn, 2004; Buzsaki, 2006). Some of the suggested mechanisms underlying the generation of network oscillations are summarized in **Table 1**, and most of them include reciprocal interactions between excitatory and inhibitory mechanisms (Singer, 1993) or changes in cellular excitability (Liljenström and Hasselmo, 1993; Ainsworth et al., 2011; Bellot-Saez et al., 2018). The latter is often associated with alterations in extracellular ions (e.g., K<sup>+</sup>, Ca<sup>2+</sup>) and the hyperpolarization-activated inward current (I<sub>h</sub>) (Steriade et al., 1993), which can regulate intrinsic membrane properties such as the resonance frequency (Tohidi and Nadim, 2009; Bellot-Saez et al., 2018), as well as the strength and frequency of network oscillations (Yue and Huguenard, 2001). In this mini-review we will focus on mechanisms by which astrocytes affect neuronal excitability.
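The phrase "linear progression on a natural logarithmic scale" can be written out explicitly. As a sketch, let $f_n$ denote the center frequency of the $n$-th oscillation class and $\Delta$ the constant spacing on the logarithmic axis; both symbols are notation introduced here for illustration and are not taken from the cited work:

$$
\ln f_n = \ln f_0 + n\,\Delta
\quad\Longleftrightarrow\quad
f_n = f_0\,e^{n\Delta},
\qquad
\frac{f_{n+1}}{f_n} = e^{\Delta}.
$$

Under this reading, neighboring classes differ by a constant ratio rather than a constant difference. For a sense of scale, the range quoted earlier, from 0.01 Hz to 1,000 Hz, spans

$$
\ln\frac{1000}{0.01} = \ln 10^{5} \approx 11.5
$$

natural-log units. If ten equally wide classes tiled exactly that interval (a hypothetical simplification), they would sit about 1.15 units apart, which corresponds to a frequency ratio of roughly $10^{1/2} \approx 3.2$ between neighbors.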

Neurons possess intrinsic membrane resonance and frequency preference properties (Hutcheon and Yarom, 2000; Buskila et al., 2013) that allow them to act as resonators or transient oscillators that amplify inputs within certain frequencies (Alonso and Llinás, 1989). This oscillatory behavior at multiple frequencies depends on the accurate combination of both low-pass (i.e., passive leak conductance, membrane capacitance) and high-pass (i.e., voltage-gated channels activated close to the resting membrane potential, RMP) filtering properties (Buzsaki, 2006), which endow neurons with a wide repertoire of ways to respond faster and more efficiently to spike trains or fast inputs (Pike et al., 2000). Therefore, alterations in membrane conductance or excitability along the somatodendritic compartments result in differential tuning of the resonant response in different cell types (e.g., interneurons vs. pyramidal or cholinergic cells), which on the one hand filters inputs from neurons that are not synchronized [see Hutcheon and Yarom (2000) and Laudanski et al. (2014) for comprehensive reviews], and on the other hand is essential for the synchronization of neurons that express similar resonance, thereby sculpting the functionality of a neuronal network (Hutcheon and Yarom, 2000; Whittington and Traub, 2003; Laudanski et al., 2014; Kékesi et al., 2019).
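How a low-pass and a high-pass element combine into a resonance can be sketched with a simple equivalent circuit: a leak resistance and membrane capacitance in parallel (low-pass), plus a series resistor-inductor branch that stands in phenomenologically for a slow voltage-gated conductance (high-pass). This circuit is an assumed textbook-style abstraction, and every component value below is hypothetical, picked only so that the impedance peak is easy to see.

```python
import math


def membrane_impedance(freq_hz, r_leak, c_mem, r_branch, l_branch):
    """Complex impedance (ohms) of leak || capacitance || (R + L in series)."""
    omega = 2 * math.pi * freq_hz
    y_leak = 1 / r_leak                                  # passive leak
    y_cap = 1j * omega * c_mem                           # membrane capacitance
    y_branch = 1 / (r_branch + 1j * omega * l_branch)    # slow "inductive" branch
    return 1 / (y_leak + y_cap + y_branch)


# Hypothetical parameters (SI units), not fitted to any cell type.
R_LEAK = 100e6     # 100 megaohm leak resistance
C_MEM = 100e-12    # 100 pF membrane capacitance
R_BRANCH = 50e6    # series resistance of the resonant branch
L_BRANCH = 4e6     # phenomenological inductance in henries

params = (R_LEAK, C_MEM, R_BRANCH, L_BRANCH)

# Scan 0.1 Hz to 100 Hz in 0.1 Hz steps and locate the impedance maximum.
freqs = [0.1 * k for k in range(1, 1001)]
magnitudes = [abs(membrane_impedance(f, *params)) for f in freqs]
peak_index = max(range(len(freqs)), key=magnitudes.__getitem__)

resonance_freq = freqs[peak_index]
z_dc = abs(membrane_impedance(0.0, *params))
# Ratio of peak impedance to steady-state impedance; >1 indicates resonance.
q_ratio = magnitudes[peak_index] / z_dc

print(f"resonance near {resonance_freq:.1f} Hz, peak/DC impedance = {q_ratio:.2f}")
```

Removing the inductive branch (for instance by making `R_BRANCH` very large) leaves a purely low-pass membrane whose impedance is largest at the lowest frequency, which is the contrast the paragraph describes.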

Consequently, changes in the concentration of extracellular ions that impact the excitability and resonance behavior of individual neurons (e.g., K<sup>+</sup>, Mg<sup>2+</sup>, Ca<sup>2+</sup>) can affect brain rhythms. Indeed, a recent comprehensive report from Nedergaard's group, which recorded different brain rhythms during the sleep-wake cycle, showed that different rhythms are linked with alterations in extracellular concentrations of K<sup>+</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, and H<sup>+</sup> (Ding et al., 2016), confirming that cellular mechanisms which particularly affect the ionic composition of the extracellular fluid can modulate the excitability and synchronous activity of neurons, thus affecting the different brain rhythms. Accordingly, K<sup>+</sup> channels, which mediate K<sup>+</sup> efflux and membrane repolarization, play a crucial role in determining the overall network excitability and have been suggested to affect the generation of neuronal oscillations at multiple frequencies (Buzsaki, 2006). Consistent with this view, D'Angelo et al. (2001) showed via experimental and computational modeling of cerebellar granule cells that slow repolarizing K<sup>+</sup> currents terminate the oscillatory "up state" of theta oscillations amplified by a persistent Na<sup>+</sup> current and therefore underlie the bursting and resonant behavior of theta oscillations. In line with these results, activation of K<sup>+</sup> currents has been associated with enhanced spike timing precision at gamma frequencies in both pyramidal and basket cells in the hippocampus (Penttonen et al., 1998), as well as with lower frequency oscillations in the delta range (Ushimaru et al., 2012). Moreover, intracellular recordings of cortical neurons during alterations in K<sup>+</sup> homeostasis indicate changes in neuronal excitability and resonance behavior that affected the amplification of network oscillations (Bellot-Saez et al., 2018).

#### TABLE 1 | Common characteristics of brain waves.

K<sup>+</sup> homeostasis in the brain is governed by the activity of astrocytes through several mechanisms, including K<sup>+</sup> clearance from the extracellular fluid. Astrocytes are strategically located close to synapses, which allows them to critically regulate the overall network function (Wang et al., 2012; Bellot-Saez et al., 2017). Two major mechanisms of astrocytic K<sup>+</sup> clearance have been established: (i) net K<sup>+</sup> uptake, in which the excess of extracellular K<sup>+</sup> ([K<sup>+</sup>]<sub>o</sub>) is taken up by K<sup>+</sup> cotransporters (Na<sup>+</sup>/K<sup>+</sup>/2Cl<sup>−</sup>), Na<sup>+</sup>/K<sup>+</sup> pumps (Na<sup>+</sup>/K<sup>+</sup> ATPase), and inward rectifying K<sup>+</sup> channels (K<sub>ir</sub>) that are expressed in astrocytic processes, and (ii) K<sup>+</sup> spatial buffering, in which K<sup>+</sup> ions propagate from high to low concentrations through gap-junction (GJ) mediated astrocytic networks, driven by the difference between the local K<sup>+</sup> reversal potential and the membrane potential of the astrocytic network, and are then released in distal regions of the astrocytic networks (**Figure 1**). Ultimately, [K<sup>+</sup>]<sub>o</sub> is returned to baseline levels to prevent hyperexcitability (Verkhratsky and Nedergaard, 2018). Consistent with the importance of K<sup>+</sup> clearance to normal oscillatory functioning, genetically modified mice that suffer from impaired clearance mechanisms exhibit epileptic seizures, growth retardation, and premature lethality at the age of 2 weeks (Kofuji et al., 2000; Bellot-Saez et al., 2017; Do-Ha et al., 2018). However, recent reports indicate that under physiological conditions, neuromodulators can directly trigger an increase in [K<sup>+</sup>]<sub>o</sub> and thus signal through astrocytes to alter neural circuit activity and regulate network oscillations (Ding et al., 2016; Ma et al., 2016).
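The voltage difference that drives spatial buffering can be made concrete with the Nernst equation, $E_K = \frac{RT}{zF}\ln\frac{[K^+]_o}{[K^+]_i}$. The sketch below compares a region at baseline extracellular K<sup>+</sup> (about 3 mM, the value given in the Figure 1 caption) with a region of raised K<sup>+</sup>. The intracellular concentration (140 mM), the raised extracellular value (12 mM), the syncytium membrane potential (−85 mV), and the temperature are all assumptions made for illustration.

```python
import math

R_GAS = 8.314462618    # gas constant, J/(mol*K)
FARADAY = 96485.33212  # Faraday constant, C/mol


def nernst_potential_mv(k_out_mm, k_in_mm, temp_c=37.0, valence=1):
    """Nernst reversal potential in mV for the given concentrations (mM)."""
    temp_k = temp_c + 273.15
    return 1000.0 * R_GAS * temp_k / (valence * FARADAY) * math.log(k_out_mm / k_in_mm)


K_IN_MM = 140.0          # assumed astrocytic intracellular K+
V_SYNCYTIUM_MV = -85.0   # assumed membrane potential of the coupled network

# Site label -> assumed local extracellular K+ concentration in mM.
sites = {"distal (baseline)": 3.0, "active (raised K+)": 12.0}

reversal = {}
driving_force = {}
for label, k_out in sites.items():
    e_k = nernst_potential_mv(k_out, K_IN_MM)
    reversal[label] = e_k
    # Convention: V_m - E_K > 0 favors outward K+ flux, < 0 favors inward flux.
    driving_force[label] = V_SYNCYTIUM_MV - e_k
    direction = "release (outward)" if driving_force[label] > 0 else "uptake (inward)"
    print(f"{label}: E_K = {e_k:.1f} mV, V_m - E_K = {driving_force[label]:+.1f} mV -> {direction}")
```

Under these assumptions the network potential sits between the two local reversal potentials, so K<sup>+</sup> tends to enter where [K<sup>+</sup>]<sub>o</sub> is high and leave where it is low, which is the redistribution that the spatial buffering description implies.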

### Astrocytic Modulation of Brain Waves

Numerous studies revealed the essential contributions made by astrocytes to many physiological brain functions, including synaptogenesis (Ullian et al., 2001), metabolic coupling (Magistretti, 2006), nitrosative regulation of synaptic release (Buskila et al., 2005; Abu-Ghanem et al., 2008; Buskila and Amitai, 2010), synaptic transmission (Fields and Stevens-Graham, 2002), network oscillations (Bellot-Saez et al., 2018), and plasticity (Suzuki et al., 2011; Oberheim et al., 2012).

Astrocytes express a plethora of receptors, ion channels, pumps (i.e., ATPase) and cotransporters, allowing them to dynamically interact with neurons through several pathways (Haydon and Carmignoto, 2006; Giaume and Theis, 2010; Larsen and Macaulay, 2014). Despite lacking the ability to fire action potentials, astrocytes communicate with neurons and other astrocytes mainly via Ca<sup>2+</sup> signals (Cornell-Bell et al., 1990; Shigetomi et al., 2010). Astrocytic Ca<sup>2+</sup> signals can occur either independently of neuronal activity or following neurotransmitter release, and include intrinsic Ca<sup>2+</sup> oscillations within individual cells and Ca<sup>2+</sup> waves that propagate from one astrocyte to another (Zur Nieden and Deitmer, 2006; Nett et al., 2017). Indeed, recent studies found that astrocytic Ca<sup>2+</sup> signaling and glutamate clearance by astrocytes play an essential role in the regulation of network activity and K<sup>+</sup> homeostasis, which ultimately affects the neuronal excitability underlying network oscillations (Wang et al., 2012; Ding et al., 2016). Recently, Ma et al. (2016) showed that neuromodulators can signal through astrocytes by affecting their Ca<sup>2+</sup> oscillations to alter neuronal circuitry and consequently behavioral output. In line with these observations, Nedergaard's group further demonstrated that bath application of neuromodulators to cortical brain slices increased [K<sup>+</sup>]<sub>o</sub> regardless of synaptic activity (Ding et al., 2016), suggesting that increased [K<sup>+</sup>]<sub>o</sub> could serve as a mechanism to maximize the impact of neuromodulators on the synchronous activity of neurons and their recruitment into networks.

Interestingly, an in vivo study found that spontaneous Ca<sup>2+</sup> oscillations in astrocytes differ between cortical layers, suggesting functional network segregation imposed by astrocytic function (Takata and Hirase, 2008). Indeed, the spatial and functional organization of astrocytes varies between different brain regions (Houades et al., 2008; Chai et al., 2017; Matias et al., 2019), establishing that astrocytes are organized into anatomical and functional compartments (Pannasch and Rouach, 2013). Similarly, a computational model of three-dimensional astrocytic networks showed that the propagation of astrocytic Ca<sup>2+</sup> waves is highly variable between brain regions depending on their GJ-coupling organization within the astrocytic network, with short-distance connections favoring spreading of Ca<sup>2+</sup> waves over wider areas (Lallouette et al., 2014). In addition, several studies have provided evidence that astrocytes respond to different neuronally released neurotransmitters and neuromodulators (e.g., Acetylcholine, 5-HT, Histamine, Norepinephrine, Dopamine) by eliciting Ca<sup>2+</sup> elevations that trigger signaling cascades leading to alterations in the concentrations of intracellular and extracellular ions (e.g., Na<sup>+</sup>, Ca<sup>2+</sup>, K<sup>+</sup>) and gliotransmitter release (Blomstrand et al., 1999; Jung et al., 2000; Oikawa et al., 2005; Ding et al., 2013; Jennings et al., 2017; Covelo and Araque, 2018). These studies emphasize the bidirectional communication pathway between neurons and astrocytes, which establishes a synergetic mechanism to affect network oscillations.
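The dependence of wave spread on coupling topology can be illustrated with a deliberately simple toy model. This is not the three-dimensional model of Lallouette et al. (2014); it assumes only that a cell becomes active once at least two of its gap-junction-coupled neighbors are active, and it compares two ring networks with the same number of links per cell. The network size, the coupling offsets, the threshold, and the function names are all hypothetical.

```python
def build_ring_network(n_cells, offsets):
    """Couple each cell i to cells i +/- d (mod n_cells) for every d in offsets."""
    neighbors = {i: set() for i in range(n_cells)}
    for i in range(n_cells):
        for d in offsets:
            j = (i + d) % n_cells
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def spread_wave(neighbors, seeds, threshold=2):
    """Synchronously activate cells with >= threshold active neighbors until stable."""
    active = set(seeds)
    while True:
        newly_active = {cell for cell, linked in neighbors.items()
                        if cell not in active and len(linked & active) >= threshold}
        if not newly_active:
            return active
        active |= newly_active


N_CELLS = 200
SEEDS = {0, 1}  # two adjacent cells start the wave

# Same degree (four links per cell), different link lengths.
local_net = build_ring_network(N_CELLS, offsets=(1, 2))       # short-distance only
long_range_net = build_ring_network(N_CELLS, offsets=(1, 37))  # one long-distance link pair

local_extent = len(spread_wave(local_net, SEEDS))
long_range_extent = len(spread_wave(long_range_net, SEEDS))

print(f"short-distance coupling: {local_extent} of {N_CELLS} cells activated")
print(f"with long-distance links: {long_range_extent} of {N_CELLS} cells activated")
```

In this toy setting, short-distance coupling lets neighboring active cells reinforce each other, so the wave covers the whole ring, while replacing some links with long-distance ones scatters the input and the wave stalls at the seed cells. The result depends on the threshold rule assumed here and should be read as an intuition aid rather than a reproduction of the published model.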

Recently, Mariotti et al. (2016, 2018) demonstrated that astrocytic modulation and signaling are circuit-specific, as cortical astrocytes not only respond to excitatory inputs, but also react to inhibitory interneurons by eliciting weak or strong [Ca<sup>2+</sup>]<sub>i</sub> elevations. In addition, two-photon imaging experiments revealed that cortical astrocytes are fast enough to respond to sensory stimulation by evoking fast Ca<sup>2+</sup> events (Stobart et al., 2018). Together, these studies suggest that astrocytes are able to process different patterns of network activity with a variety of Ca<sup>2+</sup> signals in order to decode and integrate local synaptic activity and plasticity (Perea and Araque, 2007; Henneberger et al., 2010; Navarrete et al., 2012), as well as other physiological processes including vasodilation through nitric oxide (Buskila and Amitai, 2010; Muñoz et al., 2015), K<sup>+</sup> signaling (Filosa et al., 2006), release of trophic factors (Igelhorst et al., 2015), and inflammatory mediators (Michelucci et al., 2016). Moreover, gliotransmitters can activate neuronal receptors, including extrasynaptic NR1/NR2B-containing NMDA receptors (Fellin et al., 2004; Jourdain et al., 2007; Wang et al., 2013), thereby establishing reciprocal interactions between neurons and astrocytes that result in the overall modulation of the network excitability and synchronous activity of groups of neurons (Sardinha et al., 2017; Adamsky et al., 2018).

FIGURE 1 | The impact of astrocytic K<sup>+</sup> clearance on network oscillations. (A) Image of GFP-labeled cortical astrocytes depicting their organization in non-overlapping domains. (B) Schematic diagram describing the mechanisms of astrocytic K<sup>+</sup> clearance. Top-right inset (K<sup>+</sup> uptake): a local increase of [K<sup>+</sup>]<sub>o</sub> is cleared from the extracellular space through the astrocytic Kir channels, NKCC and Na<sup>+</sup>/K<sup>+</sup> ATPase. Eventually, K<sup>+</sup> ions flow intracellularly through GJ-connected astrocytes (K<sup>+</sup> spatial buffering) and promote a distal outward current to the extracellular space, where [K<sup>+</sup>]<sub>o</sub> is low (∼3 mM), as shown in the lower inset (K<sup>+</sup> release). Arrows indicate the direction of K<sup>+</sup> driving force. (C) The functional role of astrocytic K<sup>+</sup> clearance processes on network oscillations. Traces of extracellular recordings showing the network activity before and after brief (1 s) application of 30 mM KCl (red arrow), in normal aCSF (left) and after bath application of 100 µM BaCl<sub>2</sub> (selective blocker of astrocytic Kir4.1 channels, middle trace) or Gap-26/27 (selective blocker of Cx43, right). Note the increase in network excitability following the increase in [K<sup>+</sup>]<sub>o</sub>, depicted as an increase in spiking activity. (D) Color-coded spectrogram of network oscillations depicting the network activity before and after a local increase in [K<sup>+</sup>]<sub>o</sub> (black arrows, imitating high local neuronal activity) under normal conditions (aCSF, left), following impairment in K<sup>+</sup> uptake with 100 µM BaCl<sub>2</sub> (middle spectrogram) or following blockade of astrocytic spatial buffering with selective astrocytic gap-junction blockers (Gap-26/27, right). Adapted from Neuroscience and Biobehavioral Reviews, vol 77, Alba Bellot-Saez, Orsolya Kékesi, John W. Morley, and Yossi Buskila, Astrocytic modulation of neuronal excitability through K<sup>+</sup> spatial buffering, 87–97, copyright (2017), with permission from Elsevier Ltd., under CC BY license (http://creativecommons.org/licenses/by/4.0/).

Astrocytes mediate long-distance communication not only via Ca<sup>2+</sup> waves but also through ATP release (Haas et al., 2006; Suadicani, 2006), which is followed by its degradation to adenosine by extracellular nucleotidases, leading to synaptic inhibition of neurotransmission (Pascual et al., 2005). Consistently, ATP release from neocortical astrocytes has been found to activate purinergic currents in pyramidal neurons, followed by attenuation of synaptic and tonic inhibition (Lalo et al., 2014). These results suggest that cortical astrocytes, via exocytosis of ATP, could also play a role in the modulation of neuronal GABA release and thus phasic and tonic inhibition, which eventually contribute to the generation of hypersynchronous oscillations at the network level.

#### DISCUSSION

In the 19th century, Carl Ludwig Schleich was first to propose that neuroglia is the anatomical locus for controlling neuronal excitation and its transmission from neuron to neuron (Schleich, 1894; Dierig, 1994). A year later, Ramón y Cajal, the father of modern neuroscience, proposed that astrocytes are directly involved in modulating neuronal activity by isolating neighboring neurons (Cajal, 1895; Navarrete and Araque, 2014). In support of this view, Cajal further revealed that "the neuroglia is abundant where intercellular connections are numerous and complicated, not due to the existence of contacts, but rather to regulate and control them, in such a manner that each protoplasmic expansion is in an intimate relationship with only a particular group of nerve terminal branches", which led him to propose that astrocytes exert a major role in modulating brain function during different behavioral states (Cajal, 1895, 1897). More than a century later, with the development of powerful electrophysiological and imaging tools (Berger et al., 2007; Pál et al., 2015), these initial insights about astrocytes as potential modulators of the brain circuitry are gaining more support.

The close association of astrocytes with synapses led to the concept of the tripartite synapse (consisting of the pre-synaptic terminal, the post-synaptic membrane and the cradling astrocyte), which allows the bidirectional interaction of astrocytes with neurons (Araque et al., 1999). Although the molecular and cellular pathways through which astrocytes affect neuronal network activity and brain rhythms are not fully clear, numerous in vivo and in vitro studies indicate that they play a key role in the modulation of neuronal excitability and synchronous network activity, thereby contributing to the "conversation in the brain" (Verkhratsky and Nedergaard, 2018).

The fact that astrocytes can regulate the activity of individual neurons prompted a new concept of network modulation termed "lateral astrocyte synaptic regulation" (Covelo and Araque, 2016). Accordingly, astrocytic regulation of synaptic transmission is heterosynaptic: it is not restricted to the active synapse itself but involves the activity of distant tripartite synapses via paracrine signaling of gliotransmitters, which depends on the morphological and functional properties of astrocytes. Astrocytes thereby act as a syncytium that can influence neuronal properties over wide brain regions (Pirttimaki et al., 2017). However, the physiological role of gliotransmission is highly debated (see Nedergaard and Verkhratsky, 2012; Chai et al., 2017; Papouin et al., 2017; Fiacco and McCarthy, 2018; Savtchouk and Volterra, 2018), as gliotransmitter release has been reliably demonstrated only in vitro in cultures and brain slice experiments that are often accompanied by manipulations (e.g., high frequency stimulation) which can affect astrocytic channels or receptors, leading to impaired signaling cascades. This experimental design raises questions about the existence of gliotransmission (Wolosker et al., 2016; Chai et al., 2017) and whether it plays a physiological role in the brain (Fiacco and McCarthy, 2018). Although previous studies found no correlation between astrocytic Ca<sup>2+</sup> signaling and gliotransmitter release (Fiacco et al., 2007; Petravicz et al., 2008; Agulhon et al., 2010), there is increasing evidence supporting the importance of both the GJ-mediated connectivity and function of astrocytic networks for neuronal-astrocytic communication and control of neuronal network activity (Covelo and Araque, 2016, 2018). Consequently, astrocytic alterations likely lead to aberrant modulation of both synaptic transmission and synchronization of network oscillations, which is also accompanied by changes in behavioral performance.

### AUTHOR CONTRIBUTIONS

All authors conceived the project, wrote and approved the manuscript.

### ACKNOWLEDGMENTS

This study was supported by IPRA to AB-S and the Ainsworth medical research innovation fund awarded to YB and JM.

#### REFERENCES

Zur Nieden, R., and Deitmer, J. W. (2006). The role of metabotropic glutamate receptors for the generation of calcium oscillations in rat hippocampal astrocytes in situ. Cereb. Cortex 16, 676–687. doi: 10.1093/cercor/bhj013

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Buskila, Bellot-Saez and Morley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparison of Multi-Compartment Cable Models of Human Auditory Nerve Fibers

Richard Bachmaier<sup>1</sup>, Jörg Encke<sup>1,2,3</sup>, Miguel Obando-Leitón<sup>1,2,4</sup>, Werner Hemmert<sup>1,2,4</sup> and Siwei Bai<sup>1,2,5</sup>\*

<sup>1</sup> Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany, <sup>2</sup> Munich School of Bioengineering, Technical University of Munich, Garching, Germany, <sup>3</sup> Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany, <sup>4</sup> Graduate School of Systemic Neurosciences, Ludwig Maximilian University of Munich, Planegg, Germany, <sup>5</sup> Graduate School of Biomedical Engineering, University of New South Wales, Sydney, NSW, Australia

Background: Multi-compartment cable models of auditory nerve fibers have been developed to assist in the improvement of cochlear implants. With the advancement of computational technology and the results obtained from in vivo and in vitro experiments, these models have evolved to incorporate a considerable degree of morphological and physiological details. They have also been combined with three-dimensional volume conduction models of the cochlea to simulate neural responses to electrical stimulation. However, no specific rules have been provided on choosing the appropriate cable model, and most models adopted in recent studies were chosen without a specific reason or by inheritance.

#### Edited by:

Alejandro Barriga-Rivera, University of Sydney, Australia

#### Reviewed by:

Frank Rattay, Vienna University of Technology, Austria; Javier Reina-Tosina, University of Seville, Spain

\*Correspondence: Siwei Bai, siwei.bai@tum.de

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 22 July 2019 Accepted: 16 October 2019 Published: 05 November 2019

#### Citation:

Bachmaier R, Encke J, Obando-Leitón M, Hemmert W and Bai S (2019) Comparison of Multi-Compartment Cable Models of Human Auditory Nerve Fibers. Front. Neurosci. 13:1173. doi: 10.3389/fnins.2019.01173

Methods: Three of the most cited biophysical multi-compartment cable models of the human auditory nerve, i.e., Rattay et al. (2001b), Briaire and Frijns (2005), and Smit et al. (2010), were implemented in this study. Several properties of single fibers were compared among the three models, including threshold, conduction velocity, action potential shape, latency, refractory properties, as well as stochastic and temporal behaviors. Experimental results regarding these properties were also included as a reference for comparison.

Results: For monophasic single-pulse stimulation, the ratio of anodic vs. cathodic thresholds in all models was within the experimental range despite a much larger ratio in the model by Briaire and Frijns. For biphasic pulse-train stimulation, thresholds as a function of both pulse rate and pulse duration differed between the models, but none matched the experimental observations even coarsely. Similarly, for all other properties including the conduction velocity, action potential shape, and latency, the models presented different outcomes and not all of them fell within the range observed in experiments.

Conclusions: While all three models presented similar values in certain single fiber properties to those obtained in experiments, none matched all experimental observations satisfactorily. In particular, the adaptation and temporal integration behaviors were completely missing in all models. Further extensions and analyses are required to explain and simulate realistic auditory nerve fiber responses to electrical stimulation.

Keywords: auditory nerve, computational model, biophysical, cable model, electrical stimulation, threshold

# 1. INTRODUCTION

Multi-compartment cable models of the auditory nerve fibers (ANF) have been developed to assist in understanding and predicting neural responses to external stimulation. They have been used to advance our knowledge regarding how the auditory nerve encodes timing, frequency and intensity information (Imennov and Rubinstein, 2009). Moreover, multi-compartment ANF models have been combined with three-dimensional volume conduction models of the human cochlea to simulate responses to cochlear implant (CI) stimulation (Rattay et al., 2001a; Kalkman et al., 2015; Malherbe et al., 2016; Nogueira and Ashida, 2018). Alongside psychophysical experiments, computational models of the auditory nerve are used to evaluate new sound coding and stimulation strategies and are therefore crucial for the improvement of CIs. Nevertheless, there exist several ANF models in the literature with varied morphological or ionic channel properties. Choosing the appropriate cable model for a given computational study is challenging because the models cannot easily be compared on the basis of their original publications. Consequently, most models adopted in existing studies were chosen without a specific reason or by inheritance.

Generally speaking, multi-compartment models are morphological extensions of single-node models. Based on the Schwarz–Eikhof (SE) node model of rat and feline ion channel kinetics (Schwarz and Eikhof, 1987), Frijns et al. (1994) developed an axon model, which was subsequently extended with dendrite and soma to match the feline ANF morphology (Frijns et al., 1995). However, differences in morphology between human and cat might impact spike travel time, and this must be taken into account for correct predictions of CI stimulus coding in humans (Rattay et al., 2001b; O'Brien and Rubinstein, 2016). Therefore, this feline ANF model was later modified to account for the human ANF morphology (Briaire and Frijns, 2005). Meanwhile, Rattay et al. (2001b) designed a different human ANF model based on Hodgkin's and Huxley's (HH) description of the unmyelinated squid axon (Hodgkin and Huxley, 1952) while also including human ANF morphology. Smit et al. (2008) adopted the dendrite and soma from Rattay et al. (2001b) but modified the properties of the axon in order to account for differences in membrane currents at the node of Ranvier between human (Schwarz et al., 1995) and squid.

In addition to differences in morphology and ion channel properties, some ANF cable models also include modifications in order to implement specific physiological properties, including stochastic effects and adaptation. For instance, Rattay et al. (2001b) incorporated a simple and efficient approach to predict stochastic ANF responses by adding a Gaussian noise current term to the total ion current. In comparison, Imennov and Rubinstein (2009) and Negm and Bruce (2014) represented the stochastic nature of ion channels by applying a channel-number tracking algorithm. Woo et al. (2010) included a model of rate adaptation based on a dynamic external potassium concentration, whereas van Gendt et al. (2016) integrated their biophysical model with a phenomenological approach to simulate threshold fluctuations, adaptation and accommodation.

Differences in the description of ANF morphology and physiology lead to distinct model characteristics. A meaningful comparison based on the respective publications is however not feasible, as the models were only fitted to specific ANF properties under certain stimulation patterns. For example, Rattay et al. (2001b) detailed the initiation and propagation of action potentials (APs) but did not describe properties like the strength-duration relation and refractory period. Frijns et al. (1994) and Smit et al. (2008) measured the AP shape, conduction velocity, strength-duration relation and refractory period, but none of these properties were mentioned for the updated versions of their model in Briaire and Frijns (2005) and Smit et al. (2010). Studies that included an adaptation mechanism in their ANF cable models investigated almost exclusively responses to pulse-train stimulation, but did not include single-pulse responses as in other studies. Therefore, it is necessary to compare the spiking characteristics of different ANF models in order to investigate how the models behave with more generalized stimuli. In this study, three often-cited biophysical human ANF cable models, namely the Rattay (RA) model from Rattay et al. (2001b), the Briaire-Frijns (BF) model from Briaire and Frijns (2005), and the Smit-Hanekom (SH) model from Smit et al. (2010), were chosen and implemented in a consistent framework, and their performance was evaluated by comparing them against experimental data. It should be noted that all chosen models represent type I spiral ganglion neurons.

# 2. METHODS

The multi-compartment ANF models by Rattay et al. (2001b), Briaire and Frijns (2005), and Smit et al. (2010), from here on abbreviated as RA, BF, and SH, respectively, were implemented in a single framework using Python 3.4, with the package Brian2 (Goodman and Brette, 2009). All models followed the morphology of a human ANF as described in the original publication and consisted of dendrite, soma, and axon. Dendrite and axon were composed of an alternating structure of active nodes and passive myelinated internodes. Additionally, all models included a peripheral terminal as well as a pre-somatic region. All morphological components were modeled as electrical circuits and represented by cylindrical compartments. The spherical shape of the somas in the RA and SH models was approximated by segmenting it into ten cylindrical compartments. Compartment lengths and diameters were distinct in each model, as shown in **Figure 1**. Details of the morphologies are included in Appendix **(Supplementary Material)**. The length of dendritic internodes in Briaire and Frijns (2005) was defined as scalable so as to reflect the varied lengths from the organ of Corti to the soma. In this study, the dendritic internodes were scaled as suggested by Kalkman et al. (2014) with a maximum length of 250 µm.

In unmyelinated compartments of the ANF models, the cell membrane was represented by a capacitor which was charged or discharged by ionic currents. These currents depended on the membrane's ionic permeabilities and Nernst potentials of individual ion species. All three models included exclusively sodium and potassium channels. The BF model utilized the gating properties suggested by Schwarz and Eikhof (1987) and calculated the ionic currents according to Frankenhaeuser and Huxley (1964), whereas RA and SH adopted the gating properties and equations proposed by Hodgkin and Huxley (1952). However, compared to the original gating properties of the Hodgkin-Huxley (HH) kinetics, which were measured in a squid at 6.3 °C, in the RA and SH models they were each multiplied by a compensating factor to account for the faster gating processes in mammalian nerve fibers, and the ionic channel densities were increased. Furthermore, in order to specifically account for the human ANF physiology, Smit et al. (2010) added two modifications to the HH ion channels in the axon: (a) the opening and closing of the potassium channels were modified to be slower (Smit et al., 2008); (b) a persistent sodium current was added to account for the total sodium current together with a transient one of the original HH model (Smit et al., 2009).

Regarding the passive internodes, Briaire and Frijns (2005) implied that they were surrounded by a perfectly insulating myelin sheath. As a consequence, both their capacity and conductivity were assumed to be zero, whereas Rattay et al. (2001b) described them as a passive resistor-capacitor network and thus as imperfect insulators. In Smit et al. (2010), the dendritic internodes were modeled following Rattay et al. (2001b), but the axonal internodes were described using a double-cable structure as proposed by Blight (1985). Detailed information regarding the ionic models can again be found in Appendix **(Supplementary Material)**.

The extracellular space of the ANF models was simulated as a homogeneous medium with an isotropic resistivity of 3 Ωm. Unless otherwise stated, each fiber was stimulated externally by a point electrode situated above the third dendritic node with a vertical distance of 500 µm to the fiber. Measurements were performed at the tenth axonal node to ensure the propagation of an action potential (AP) to the axon. For each of the properties investigated in this study, the parameters for the applied stimuli were taken from the respective physiological experiments in order to ensure a meaningful comparison with experimental results in the literature. Whenever a biphasic stimulus was administered, it was always cathodic-first.
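To make the stimulation geometry concrete, the following Python sketch evaluates the extracellular potential that a point electrode produces along a straight fiber. It assumes the standard point-source expression for an infinite homogeneous, isotropic medium, V = ρI/(4πr), with a resistivity of 3 Ωm; the expression, the function names and the node positions are illustrative assumptions and are not taken from the implementation used in the study.

```python
import math


def extracellular_potential(current_a, distance_m, resistivity_ohm_m=3.0):
    """Potential (V) at a distance r from a point current source.

    Assumes an infinite homogeneous, isotropic medium:
    V = rho * I / (4 * pi * r).
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return resistivity_ohm_m * current_a / (4.0 * math.pi * distance_m)


def potentials_along_fiber(current_a, node_positions_m, electrode_x_m,
                           height_m=500e-6, resistivity_ohm_m=3.0):
    """Potential at each node of a straight fiber lying on the x-axis.

    The electrode sits at x = electrode_x_m, a vertical distance
    height_m above the fiber (500 um by default, as in the text).
    """
    return [
        extracellular_potential(
            current_a,
            math.hypot(x - electrode_x_m, height_m),
            resistivity_ohm_m,
        )
        for x in node_positions_m
    ]


# Hypothetical example: five nodes spaced 250 um apart, electrode above
# the third node, cathodic (negative) current of 100 uA.
nodes = [i * 250e-6 for i in range(5)]
v_ext = potentials_along_fiber(-100e-6, nodes, electrode_x_m=nodes[2])
```

The potential is most negative directly below the electrode and decays symmetrically with distance, which is the spatial profile that drives the compartments of a cable model under external stimulation.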

While the models by Briaire and Frijns (2005) and Smit et al. (2010) in the original studies were deterministic, Rattay et al. (2001b) incorporated a simple approach to predict stochastic ANF responses by adding a Gaussian noise current term to the total ion current. In this study, this simple stochastic approach was added to all models to investigate the stochastic and temporal behaviors (sections 3.6, 3.7). The Gaussian noise current term was calculated with:

$$i_{\text{noise}} = X \cdot k_{\text{noise}} \sqrt{A\, g_{\text{Na}}} \tag{1}$$

where X is a Gaussian random variable (mean = 0, S.D. = 1), g<sub>Na</sub> denotes the maximum sodium conductivity, and A is the membrane surface area. The term is multiplied by the factor k<sub>noise</sub>, which is common to all compartments and is used to adjust how strongly the stochastic behavior of the channels is emphasized.
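As a minimal illustration of Equation (1), the Python sketch below draws one noise-current sample per call. The function name, argument names and the numerical values in the example are hypothetical; units are left to the caller, and the study's actual implementation in Brian2 may differ.

```python
import math
import random


def noise_current(area, g_na, k_noise, rng=random):
    """One sample of the Gaussian noise current of Equation (1).

    i_noise = X * k_noise * sqrt(A * g_Na), with X ~ N(0, 1).
    `rng` may be the `random` module or a seeded random.Random instance.
    """
    if area * g_na < 0:
        raise ValueError("area * g_na must be non-negative")
    x = rng.gauss(0.0, 1.0)  # standard normal variate
    return x * k_noise * math.sqrt(area * g_na)


# Hypothetical values, chosen only to show the call pattern.
rng = random.Random(42)  # seeded for reproducibility
samples = [noise_current(2.0e-6, 120.0, 0.05, rng) for _ in range(5)]
```

Because X has unit variance, the standard deviation of the noise current is k<sub>noise</sub>·√(A·g<sub>Na</sub>), so larger compartments or higher sodium conductivities receive proportionally stronger noise for the same k<sub>noise</sub>.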

#### 3. RESULTS

#### 3.1. Thresholds

The threshold current I<sub>th</sub> of an ANF model is defined as the minimal current amplitude required to elicit an AP with otherwise constant stimulation parameters. This section reports the dependency of I<sub>th</sub> on the phase length and polarity of single monophasic pulses, the pulse rate and duration of biphasic pulse trains, and the frequency and duration of sinusoidal stimuli.

#### 3.1.1. Single Monophasic Pulses

**Figure 2** compares the strength-duration curves, i.e., the relations between I<sub>th</sub> and the duration of the applied pulse, for both monophasic cathodic and anodic stimuli. All models demonstrated thresholds that decrease with longer pulse duration. Thresholds were also larger for anodic stimulation; this was most obvious for the BF model.

Smit-Hanekom models, respectively. The x-axis is set in a log-scale for a better comparison.

TABLE 1 | Rheobase I<sub>rh</sub> and chronaxie τ<sub>chr</sub> of ANF models for monophasic cathodic and anodic stimulation. The point electrode was situated above the third dendritic node with a vertical distance of 500 µm to the fiber.

The current threshold to which a strength-duration curve converges for a very long pulse is called the rheobase I<sub>rh</sub>; the chronaxie τ<sub>chr</sub> is the pulse width required to elicit an AP when applying twice I<sub>rh</sub>. These two values are commonly used to characterize the strength-duration behavior of a nerve fiber and are compared among the three models in **Table 1**. The values for I<sub>rh</sub> with cathodic stimuli ranged from 61.3 µA (RA) to 220 µA (BF) and were smaller than those with anodic pulses. While I<sub>rh</sub> for the two polarities differed by factors of 1.4 and 1.2 for the RA and SH models, respectively, the threshold for anodic stimulation increased by a factor of more than 2.1 in the BF model. The impact of polarity on τ<sub>chr</sub> was less pronounced, and the values ranged from 39.1 µs (BF) to 125 µs (RA).
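The two definitions above translate directly into a small numerical recipe. The Python sketch below takes a sampled strength-duration curve, uses the threshold at the longest pulse as the rheobase, and interpolates the pulse duration at which the threshold equals twice that value. The function names, the interpolation on a logarithmic duration axis, and the synthetic Weiss-type curve used as input are assumptions for illustration; they are not the procedure or data of the study.

```python
import math


def rheobase_and_chronaxie(durations, thresholds):
    """Estimate rheobase and chronaxie from a strength-duration curve.

    Rheobase: threshold at the longest sampled duration.
    Chronaxie: duration at which the threshold equals 2 * rheobase,
    found by linear interpolation on a log-duration axis.
    Thresholds are magnitudes and must decrease with duration.
    """
    pairs = sorted(zip(durations, thresholds))
    rheobase = pairs[-1][1]
    target = 2.0 * rheobase
    for (d0, i0), (d1, i1) in zip(pairs, pairs[1:]):
        if i0 >= target >= i1:
            if i0 == i1:
                return rheobase, d0
            frac = (i0 - target) / (i0 - i1)
            log_d = math.log(d0) + frac * (math.log(d1) - math.log(d0))
            return rheobase, math.exp(log_d)
    raise ValueError("curve does not cross twice the rheobase")


def weiss_threshold(duration, rheobase, chronaxie):
    """Synthetic curve I(d) = I_rh * (1 + tau_chr / d), for testing only."""
    return rheobase * (1.0 + chronaxie / duration)


# Synthetic input: 10 us to 100 ms, 50 samples per decade.
durations = [10e-6 * 10 ** (n / 50) for n in range(201)]
thresholds = [weiss_threshold(d, 61.3e-6, 125e-6) for d in durations]
i_rh, tau_chr = rheobase_and_chronaxie(durations, thresholds)
```

The estimate of the rheobase is only as good as the longest pulse in the data: if the curve has not yet flattened, both the rheobase and the chronaxie derived from it are biased.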

In Ranck (1975), τ<sub>chr</sub> of mammalian nerve fibers was found to lie between 29 and 100 µs, whereas van den Honert and Stypulkowski (1984) suggested a distinctly longer average chronaxie of 264 µs based on experiments with feline ANF. Variations in these experimental observations may be due to differences in experimental setup and stimulation method (Frijns et al., 1994). BeMent and Ranck (1969) measured that anodic pulses required 3.19–7.7 times the current of cathodic pulses to excite feline nerve fibers, and Armstrong et al. (1973) reported a ratio of 1.0–3.2. Therefore, despite the large variation between the three models, all of them show τ<sub>chr</sub> within the experimental range, and all three are consistent with the increased anodic thresholds.

#### 3.1.2. Biphasic Pulse Trains

Trains of biphasic pulses with 45 µs/phase and an 8 µs interphase gap were applied to all ANF models. I<sub>th</sub> was measured as a function of pulse rate and train duration, as depicted in **Figure 3**. In all cases, the thresholds remained constant for pulse rates up to 2,000 pulses per second (pps) and train durations longer than 1 ms. The RA model predicted a decreasing threshold for pulse rates higher than 2,000 pps with a maximal drop of 1 dB from the single biphasic pulse threshold at 10,000 pps. SH, however, showed an opposite trend: the threshold at 10,000 pps rose by over 1 dB for all train durations longer than 0.3 ms. No obvious differences from the single pulse threshold were observed in BF.

Experiments with human CI listeners have also shown that thresholds decrease with increasing pulse rate (multi-pulse integration). Carlyon et al. (2015) measured a drop of 3.9 dB from 71 to 500 pps and a larger drop of 7.7 dB from 500 to 3,500 pps.

Integration for pulse rates even smaller than 10 pps has been observed by Zhou et al. (2015), who delivered pulse-train stimuli through CIs in humans and guinea pigs. They also discovered temporal integration up to 640 ms. Our simulation results thus lead to the conclusion that none of the models were able to predict pulse-train integration in a comparable range with the experimental data.
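The threshold changes in this section are quoted in decibels. As a reading aid, the Python sketch below converts between a current ratio and a level change, assuming the usual amplitude convention of 20·log<sub>10</sub> of the ratio; this convention and the function names are assumptions, since the text does not spell out the formula.

```python
import math


def threshold_change_db(threshold, reference):
    """Level change of `threshold` relative to `reference` in dB.

    Uses the amplitude convention 20 * log10(I / I_ref), assumed here
    for current thresholds. Negative values mean a lower threshold.
    """
    if threshold <= 0 or reference <= 0:
        raise ValueError("thresholds must be positive magnitudes")
    return 20.0 * math.log10(threshold / reference)


def ratio_from_db(change_db):
    """Current ratio I / I_ref that corresponds to a change in dB."""
    return 10.0 ** (change_db / 20.0)


# A 1 dB drop, as predicted by the RA model at 10,000 pps, corresponds
# to a threshold current of roughly 89% of the single-pulse threshold.
ra_ratio = ratio_from_db(-1.0)
# The 7.7 dB drop reported by Carlyon et al. (2015) corresponds to a
# much larger reduction in current.
cl_ratio = ratio_from_db(-7.7)
```

Seen as current ratios, the gap between the models and the psychophysical data becomes tangible: a change of about 1 dB is a shift of roughly 10% in threshold current, whereas a 7.7 dB drop means the threshold falls to well under half of its reference value.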

#### 3.1.3. Sinusoidal Stimulation

I<sub>th</sub> was also measured for sinusoidal stimuli (positive phase first), with frequencies between 125 Hz and 16 kHz, as depicted in **Figure 4**. All models predicted the minimal threshold at a frequency of 500 Hz. In RA, a growth of approximately 6 dB per octave was obtained for frequencies higher than 1 kHz, and a similar increase, namely 7 dB per octave, was found in SH above 2 kHz; in comparison, BF predicted smaller threshold increases between 1 and 8 kHz; between 8 and 16 kHz the slope was close to 7 dB per octave. Stimulus duration exerted only minimal impact on the threshold.
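The slopes above are expressed in dB per octave. The Python sketch below shows one way to compute such a slope from two points of a threshold-frequency curve, assuming the amplitude convention of 20·log<sub>10</sub> for current ratios; the function name and the example values are hypothetical.

```python
import math


def slope_db_per_octave(f1_hz, i1, f2_hz, i2):
    """Threshold slope between two curve points in dB per octave.

    The level change 20 * log10(i2 / i1) is divided by the number of
    octaves log2(f2 / f1) separating the two frequencies.
    """
    if min(f1_hz, f2_hz, i1, i2) <= 0:
        raise ValueError("frequencies and thresholds must be positive")
    if f1_hz == f2_hz:
        raise ValueError("frequencies must differ")
    level_change_db = 20.0 * math.log10(i2 / i1)
    octaves = math.log2(f2_hz / f1_hz)
    return level_change_db / octaves


# Hypothetical thresholds: a current that doubles for every doubling
# of frequency gives a slope of about 6 dB per octave.
slope_linear = slope_db_per_octave(1000.0, 100e-6, 4000.0, 400e-6)
# A current that grows with the square root of frequency gives about
# 3 dB per octave.
slope_sqrt = slope_db_per_octave(1000.0, 100e-6, 4000.0, 200e-6)
```

In these terms, a slope of 6 dB per octave means the threshold current is roughly proportional to frequency, while 3 dB per octave corresponds to a growth with the square root of frequency.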

Dynes and Delgutte (1992) recorded threshold currents in cat auditory nerve fibers. While for high frequencies (8–20 kHz), the slope of the threshold increase approaches 6 dB per octave in most fibers as in the models, for low frequencies (200 Hz–1 kHz) the slope flattened only to about 3 dB per octave and never increased. Shannon (1983) measured the threshold of sinusoidal stimuli with frequencies between 30 Hz and 3 kHz in human CI users. The resulting threshold-frequency curve could be divided into three parts: a rather flat segment for frequencies below 100 Hz, a segment with an increase of 12–15 dB per octave at frequencies between 100 and 300 Hz, and a 3 dB per octave increase segment for higher frequencies. Pfingst (1988) also reported an increase in the threshold of roughly 3 dB per octave for frequencies between 1 and 16 kHz. Pfingst (1988) and Pfingst and Morris (1993) obtained threshold-frequency curves which dropped for small frequencies with a minimum threshold between 60 Hz and 200 Hz. Due to these differences, it must be concluded that comparisons between psychophysical thresholds and single fiber recordings/simulations must be taken with a grain of salt.

None of the ANF models predicted a threshold increase of more than 10 dB per octave as measured by Shannon (1983) between 100 and 300 Hz. The threshold-frequency curves predicted with the models dropped between 125 and 500 Hz, so the minimum was reached for a higher frequency than in experiments. The threshold increase measured from BF between 2 and 8 kHz matched the experimental results, whereas the other two models overestimated it by a factor of two.

In the absence of electrophysiological measurements however, psychoacoustic measurements might give an insight into general trends.

#### 3.2. Conduction Velocity

The conduction velocity v<sub>c</sub> describes how fast an AP propagates along the nerve fiber. Hursh (1939) found in feline nerve fibers that v<sub>c</sub> increased linearly with the fiber outer diameter D, and reported the scaling factor k to be 6, where k was defined as

$$k = \frac{v_c/(\text{m s}^{-1})}{D/\mu\text{m}}.\tag{2}$$

Boyd and Kalu (1979) obtained a slightly smaller scaling factor of 4.6 for feline nerve fibers, with an outer diameter between 3 and 12 µm. **Figure 5** compares the conduction velocities of ANF models with experimental results.
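Equation (2) can be applied to simulated data in a few lines. The Python sketch below derives a conduction velocity from the AP arrival times at two nodes and converts it into the dimensionless scaling factor k; the function names, the choice of two recording nodes and the numerical values are illustrative assumptions rather than the measurement procedure of the study.

```python
def conduction_velocity(distance_m, t_first_s, t_second_s):
    """Velocity (m/s) from AP arrival times at two points on the fiber."""
    dt = t_second_s - t_first_s
    if dt <= 0:
        raise ValueError("second arrival time must be later than first")
    return distance_m / dt


def scaling_factor(velocity_m_per_s, outer_diameter_um):
    """Scaling factor k of Equation (2): velocity in m/s over D in um."""
    if outer_diameter_um <= 0:
        raise ValueError("diameter must be positive")
    return velocity_m_per_s / outer_diameter_um


def within_fraction(value, reference, fraction=0.25):
    """True if `value` lies within +/- fraction of `reference`."""
    return abs(value - reference) <= fraction * abs(reference)


# Hypothetical fiber: 3 um outer diameter, AP travels 1.8 mm in 100 us.
v_c = conduction_velocity(1.8e-3, 0.0, 1.0e-4)  # about 18 m/s
k = scaling_factor(v_c, 3.0)                    # about 6, as in Hursh (1939)
matches_hursh = within_fraction(k, 6.0)
```

Because k normalizes the velocity by the outer diameter, it allows fibers or model sections of different calibers to be compared against the same experimental reference.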

The velocities of dendrite and axon were measured separately due to their morphological and physiological differences. Scaling factors for the dendrite of BF and the axon of SH were considerably smaller than experimentally obtained values, while all other scaling factors were within ±25% of the experimental results.

The soma of all three ANF models has a high capacitance due to its large diameter and reduced myelination. Consequently, the soma delays the conduction of APs. This is apparent in **Figure 6**, which illustrates the model responses to a 100 µs cathodic current pulse injected at the peripheral terminal. The duration of the somatic delay was determined by measuring the time difference between the APs at the nodes directly before and after the soma, which was found to be 305, 130, and 240 µs for RA, BF, and SH, respectively. Stypulkowski and van den Honert (1984) measured the electrically evoked compound AP of feline auditory nerves and observed two peaks with a time difference of 200 µs. They suggested that the earlier peak arose from a direct excitation of the axon near the soma, whereas the second peak had its origin at the dendrite. Accordingly, the time difference between the two peaks can be used to estimate the somatic delay for feline ANFs, which is closer to the values from BF and SH. On the other hand, the double peaks exhibited in neuronal response telemetry measurements with CI listeners have a temporal distance of 300 µs (Lai and Dillier, 2000). Using this value as a reference point for human ANFs, the somatic delay predicted by RA appears very realistic.

#### 3.3. Action Potential Shape

Smit-Hanekom models, respectively. Each line depicts the voltage over a course of time at a single morphologic component, starting from the peripheral terminal represented by the topmost line. The lines are vertically aligned true to scale according to the compartmental distances. The high capacitance of the soma causes a large additional delay of the AP.

The shape of AP was compared among ANF models by measuring the height as well as the rise and fall times of AP. The AP height was defined as the voltage difference between the resting potential and the peak value. Rise and fall times were determined as the time periods between the AP maximum and its 10% height, obtained during the ramp-up and -down phases, respectively. In this section, APs were triggered by a monophasic 100 µs cathodic current pulse with an amplitude of Ith and 2×Ith, as shown in **Figure 7**.
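The shape measures defined above can be computed directly from a sampled membrane-voltage trace. The following is a minimal sketch, assuming a trace that contains a single AP; the function name `ap_shape`, its arguments, and the use of linear interpolation at the 10% crossings are illustrative choices and are not taken from the study's scripts.

```python
def ap_shape(t_us, v_mv, v_rest_mv):
    """Return (height, rise_time, fall_time) of a single AP.

    Height is the peak voltage minus the resting potential. Rise and fall
    times are the intervals between the 10 % height crossings and the peak,
    on the rising and the falling flank, respectively.
    """
    i_peak = max(range(len(v_mv)), key=lambda i: v_mv[i])
    height = v_mv[i_peak] - v_rest_mv
    level = v_rest_mv + 0.1 * height  # 10 % of the AP height above rest

    def crossing(i_a, i_b):
        # Linearly interpolate the time at which the trace crosses `level`
        # between the neighbouring samples i_a and i_b.
        frac = (level - v_mv[i_a]) / (v_mv[i_b] - v_mv[i_a])
        return t_us[i_a] + frac * (t_us[i_b] - t_us[i_a])

    # Walk back from the peak to the first sample that is still above `level`.
    i = i_peak
    while i > 0 and v_mv[i - 1] > level:
        i -= 1
    t_rise_start = crossing(i - 1, i) if i > 0 else t_us[0]

    # Walk forward from the peak to the last sample that is still above `level`.
    j = i_peak
    while j < len(v_mv) - 1 and v_mv[j + 1] > level:
        j += 1
    t_fall_end = crossing(j, j + 1) if j < len(v_mv) - 1 else t_us[-1]

    return height, t_us[i_peak] - t_rise_start, t_fall_end - t_us[i_peak]
```

Interpolating between samples keeps the estimate from being quantized to the simulation time step, which matters when rise times are on the order of 100 µs.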

The increase of the stimulus amplitude by a factor of two resulted in no significant changes in the AP shape in any of the models but drastically shortened their latency, which is reported in section 3.4. The short hyperpolarization at the beginning of the curves from BF was a passive response to the external cathodic stimulus, which is not visible in the other models; this variation is likely due to the difference in distance between the stimulating electrode (at the third dendritic node) and the recording electrode (at the tenth axonal node) as a result of different internodal lengths among the three models. Another striking feature observed from **Figure 7** is the extremely long fall time of 712 µs with SH, which is more than three times as large as those with the other models. In comparison, the differences in AP height and rise time were relatively small: the AP height ranged from about 88 mV (RA) to 107 mV (SH), and all APs peaked at positive values; the rise time ranged from 87 µs (BF) to 121 µs (SH). These parameters that define the AP shape were almost independent of pulse form, phase duration, and stimulus amplitude.

Only a limited number of studies with the objective to investigate AP shape can be found in the literature. Paintal (1966) measured AP rise and fall times of feline nerve fibers at 37.1 °C and revealed an inverse relation with the conduction velocity. The rise time curve was steep for a conduction velocity below 40 m/s and flattened out for faster conduction. On the other hand, the relation between the fall time and conduction velocity was approximately linear. Based on the conduction velocities reported in section 3.2, the data from Paintal (1966) were used to interpolate rise and fall times of the models. The interpolated rise time values for RA, BF, and SH are roughly 220, 190, and 270 µs, respectively, whereas their fall times are longer and range from 350 to 365 µs. As a result, all three ANF models showed distinctly shorter rise times than interpolated values based on Paintal (1966). The fall time values of RA and BF were also smaller than results obtained by Paintal (1966), but the value of SH was about twice as much as the interpolated value. In addition, a recent computational study confirmed the simulated contribution of type I spiral ganglion cells with an AP duration of approximately 1/3 ms, which was close in timing with the experimentally recorded electrically evoked compound action potential (Miller et al., 2004; Rattay and Danner, 2014).

#### 3.4. Latency

The latency is defined as the time period between the onset of a stimulus and the peak of the resulting AP. Four monophasic cathodic stimuli differing in phase duration and stimulus amplitude were applied to the ANF models, and the corresponding latency was measured at the third dendritic node, which was right below the electrode. Results are listed in **Table 2** along with values from feline experiments. All models predicted a shorter latency than the experimental data for all considered stimuli, with RA in general having the closest values to experimental measurements and BF producing significantly smaller latency values than the other models. This could partly be due to determining the latency at the compartment closest to the electrode in the model while, in the experiment, it might have been determined further away from the spike initiation site, which would add a conduction delay. In both experiment and model, increases in phase duration led to a longer latency, while an increase in the amplitude resulted in a shorter latency. Nevertheless, the data from van den Honert and Stypulkowski (1984) suggest a latency reduction of around 50% when doubling the stimulation current (Stim. B to Stim. C). RA and BF predicted a larger decrease of around 69% and 66%, while SH predicted 57%.
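The latency definition and the relative reduction quoted above amount to two short computations. The sketch below assumes a sampled voltage trace recorded at one node; the function names and the numbers in the usage note are hypothetical and serve only to illustrate the arithmetic.

```python
def latency_us(t_us, v_mv, stim_onset_us):
    """Latency: time from stimulus onset to the peak of the resulting AP."""
    i_peak = max(range(len(v_mv)), key=lambda i: v_mv[i])
    return t_us[i_peak] - stim_onset_us


def relative_reduction(latency_ref_us, latency_new_us):
    """Fractional latency reduction relative to a reference stimulus."""
    return (latency_ref_us - latency_new_us) / latency_ref_us
```

For instance, a hypothetical latency of 400 µs at Ith that shortens to 200 µs at 2 Ith gives `relative_reduction(400.0, 200.0) == 0.5`, i.e., a 50% reduction.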

#### 3.5. Refractoriness

The refractoriness characterizes the reduced excitability of an ANF after the initiation of an AP. It was measured in this study as described in Frijns et al. (1994): two monophasic 50 µs cathodic stimuli were applied. The first stimulus with an amplitude of 1.5 Ith served as a masker for the second one; the current threshold of the second stimulus, necessary to elicit another AP, was measured for different inter-pulse intervals (IPI), i.e., the time period between the two stimuli (Wesselink et al., 1999).

TABLE 2 | Action potential latency of ANF models measured with four different stimuli.

Latency values from relevant feline studies are also included (italicized).

A: monophasic 40 µs cathodic current pulse with amplitude Ith.

B: monophasic 50 µs cathodic current pulse with amplitude Ith.

C: monophasic 50 µs cathodic current pulse with amplitude 2 Ith.

D: monophasic 100 µs cathodic current pulse with amplitude Ith.

**Figure 8** depicts the refractoriness of the ANF models. In this figure, the relative increase in threshold of the second stimulus compared to a single pulse threshold is plotted against the IPI. At small IPI values, the refractory curves of all models showed a steep decrease, where the thresholds of the second stimulus quickly approached the masker threshold. For IPI values around 2 ms, RA and SH predicted the threshold of the second pulse slightly smaller than the single pulse threshold.

The refractoriness of an ANF can be described by the absolute and relative refractory periods: the absolute refractory period (ARP) is the period after the initiation of an AP, during which it is impossible for a second propagating AP to be elicited regardless of the strength of stimulus; the subsequent period that requires an elevated threshold for spike generation is called the relative refractory period (RRP). In this study, ARP was recorded as the time interval between two stimuli, during which the second stimulus required a current amplitude of at least 4 times the masker amplitude to elicit a second AP, whereas RRP was the time period between the two stimuli, where the threshold of the second stimulus was only increased by a factor of 1.01 (Wesselink et al., 1999). The ARP and RRP of ANF models for different stimuli are listed in **Tables 3** and **4** along with values obtained in feline experiments. All models predicted a smaller RRP than the experimental measurements. Regarding ARP, the models predicted larger values than the experimental observations. In particular, the ARP magnitude of the SH model was twice as large as that of the other models. In the case of BF with a biphasic stimulus of 50 µs/phase, secondary activation was elicited in the model, which made it difficult to determine the ARP in this situation. This did not occur in any other situation. While the experimentally measured RRP values were approximately ten times larger than ARP, the ANF models predicted a ratio smaller than two.
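The two criteria above can be applied to a sampled refractory curve as in the following sketch. It assumes that the thresholds of the second pulse are expressed in multiples of the single-pulse threshold Ith, that the masker amplitude is 1.5 Ith, and that the IPIs are sorted in ascending order; the function name and the choice to report sampled IPIs without interpolation are illustrative assumptions.

```python
def refractory_periods(ipi_us, thr_rel, masker_rel=1.5,
                       arp_factor=4.0, rrp_factor=1.01):
    """Estimate (ARP, RRP) from a sampled refractory curve.

    ipi_us  : inter-pulse intervals in ascending order
    thr_rel : threshold of the second pulse at each IPI, in multiples of the
              single-pulse threshold Ith (float('inf') if no AP was elicited)

    ARP is the longest sampled IPI at which the second pulse still needs at
    least `arp_factor` times the masker amplitude. RRP is the shortest sampled
    IPI at which the threshold elevation has dropped to `rrp_factor` or below.
    Either value is None if the criterion is never met.
    """
    arp = None
    rrp = None
    for ipi, thr in zip(ipi_us, thr_rel):
        if thr >= arp_factor * masker_rel:
            arp = ipi
        if rrp is None and thr <= rrp_factor:
            rrp = ipi
    return arp, rrp
```

With the hypothetical curve `ipi_us = [200, 400, 600, 800, 1000, 1500]` and `thr_rel = [float("inf"), 8.0, 3.0, 1.5, 1.005, 1.0]`, the function returns `(400, 1000)`. A finer IPI grid or interpolation between samples would be needed to resolve steep refractory curves.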

#### 3.6. Stochasticity

The stochasticity of ANFs can be described with two aspects: one is the jitter, defined as the standard deviation of repeated measurements of the latency; the other is the relative spread of the threshold Ith, calculated as the standard deviation of the threshold measurements divided by the mean (van Gendt et al., 2016). In this section, the Gaussian noise current term proposed by Rattay et al. (2001b) was added to all three ANF models, as we wanted to investigate whether this simple and computationally efficient approach was sufficient to simulate the stochastic behavior within the range of experimental measurements. Monophasic 50 µs cathodic current pulses were used for simulations, and stochastic behaviors were recorded for various values of knoise, ranging from 0.1 to 2 times the initial value which was fitted in order to obtain a relative spread of about 5%. Threshold measurements for each knoise value were repeated 500 times to calculate the relative spread. Jitters were obtained by measuring the latency 500 times for a stimulation with Ith. Spontaneous APs, i.e., APs initiated at 0 A or before the onset of the stimulus, were excluded in both measurements. Results are illustrated in **Figure 9**.
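Both stochasticity measures reduce to simple statistics over repeated trials. The sketch below assumes that the sample standard deviation is used (the text does not state whether the sample or population form was applied) and that spontaneous APs are flagged per trial; all names are illustrative.

```python
from statistics import mean, stdev


def drop_spontaneous(values, spontaneous_flags):
    """Keep only trials that were not flagged as spontaneous APs."""
    return [v for v, s in zip(values, spontaneous_flags) if not s]


def relative_spread(thresholds):
    """Standard deviation of the threshold measurements divided by their mean."""
    return stdev(thresholds) / mean(thresholds)


def jitter(latencies):
    """Standard deviation of repeated latency measurements."""
    return stdev(latencies)
```

For example, hypothetical thresholds of 95, 100, and 105 µA give a relative spread of 0.05, i.e., 5%.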

TABLE 3 | Absolute refractory period (ARP) of ANF models measured with four stimuli.

Measurements from feline studies are also included (italicized). The question mark represents a difficulty in determining the exact ARP due to the secondary activation caused by biphasic stimuli in the Briaire-Frijns model.

A: monophasic 40 µs cathodic current pulses.

B: monophasic 50 µs cathodic current pulses.

C: monophasic 100 µs cathodic current pulses.

D: biphasic 50 µs cathodic-first current pulses.

For the selected range of knoise, the relative spread lay below 30 % for all models. Further increases in knoise can result in larger spreads but also in a high probability for spontaneous APs. In comparison, results for the jitter were more varied. While the jitter could reach as far as 180 µs with RA, it was confined to 25 µs in the case of the BF model.

Javel et al. (1987) reported a relative spread of 12% and 11% in feline ANFs using biphasic stimuli with phase durations of 200 and 400 µs, respectively. Smaller values between 5% and 10% were found by Miller et al. (1999) and Dynes (1996), who excited feline ANFs using monophasic pulses with a phase duration of 100 and 40 µs. Experimentally observed jitters for a stimulation of feline ANFs with Ith ranged from 80 µs (Cartee et al., 2000) to 190 µs (van den Honert and Stypulkowski, 1984). Hence, the addition of Gaussian noise current to RA and SH with appropriate values for knoise managed to produce both relative spread and jitter that fit the experimental range, as shown in **Figure 9**. However, the jitter generated by BF was too small even for high knoise values.

TABLE 4 | Relative refractory period of ANF models measured with four stimuli.

Measurements from feline studies are also included (italicized).

A: monophasic 50 µs cathodic current pulses.

B: monophasic 100 µs cathodic current pulses.

C: biphasic 200 µs cathodic-first current pulses.

#### 3.7. Pulse-Train Responses and Adaptation

In this section, the spiking behavior of the ANF models was investigated for pulse-train stimulations. The Gaussian noise current term was again added to all models to account for the stochasticity. Biphasic current pulses with a phase duration of 20 µs and an amplitude of 1.5 Ith were used.

The train of pulses lasted for 300 ms, and four different pulse rates were investigated. Each stimulation was repeated 50 times. Poststimulus time histograms (PSTHs) were used to depict the average number of APs in each 10 ms time bin in **Figure 10**.
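A PSTH of this kind can be built by counting spikes per time bin and averaging over repetitions. The following sketch assumes that each repetition is available as a list of spike times in milliseconds relative to the onset of the pulse train; the function name and data layout are illustrative.

```python
def psth(spike_times_ms_per_trial, duration_ms=300.0, bin_ms=10.0):
    """Average number of APs per time bin across repeated stimulations."""
    n_bins = int(round(duration_ms / bin_ms))
    counts = [0] * n_bins
    for trial in spike_times_ms_per_trial:
        for t in trial:
            if 0.0 <= t < duration_ms:
                # Clamp the index so that rounding cannot run past the last bin.
                counts[min(int(t // bin_ms), n_bins - 1)] += 1
    n_trials = len(spike_times_ms_per_trial)
    return [c / n_trials for c in counts]
```

With the defaults, the result has 30 bins; dividing by the number of repetitions (50 in this section) yields the mean spike count per bin rather than a total.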

In general, higher pulse rates led to reduced firing efficiency. With a rate of 400 pps, 100% firing efficiency was obtained in all models. For an increase to 800 pps, RA and SH predicted reduced firing rates. With a further increase to 2,000 pps, RA showed a similar spiking behavior as for 800 pps, while the spiking rate of BF was reduced by more than a factor of two, and SH responded almost solely to the first pulses of the pulse trains. When stimulated with 5,000 pps, small firing rates were measured with all models.

Adaptation of ANF spiking rate has been demonstrated in animal experiments. Zhang et al. (2007) measured adaptive responses to pulse trains with rates between 250 and 10,000 pps, and reported that the reduction in firing rates became larger as pulse rates increased. A similar tendency was observed by Litvak et al. (2001), who applied pulse-train stimuli with rates of 1,200 and 4,800 pps. Zhang et al. (2007) and Westerman and Smith (1984) concluded using feline and gerbil ANFs that adaptation was strongest during the first 10 ms of a pulse train, but still apparent after 100 ms. As none of the ANF models used in this study were explicitly developed to include adaptation, it is unsurprising that they showed no or little adaptation mostly limited to a reduction in firing efficiency following the first AP.

# 4. DISCUSSION AND CONCLUSION

In this study, we designed a computational framework to investigate some properties of biophysical multi-compartment models of the human ANF. We subsequently implemented three existing cable models in this framework, including RA (Rattay et al., 2001b), BF (Briaire and Frijns, 2005) and SH (Smit et al., 2010), and compared the outcomes with each other and with experimental measurements. This is the first study to perform a systematic comparison between different multi-compartment models of the human ANF, and will contribute to the future development of ANF models.

In comparison to experimental data, ANF models predicted drastically smaller ratios of RRP to ARP, as they revealed an overestimated ARP and an underestimated RRP. With axon models by Frijns et al. (1994) and Imennov and Rubinstein (2009), distinctly higher ratios of RRP to ARP have been predicted (detailed results not shown). A likely explanation for the more physiologically accurate refractoriness of axon models is the simplified morphology, particularly the lack of a soma. Moving the stimulus location for the human ANF models from dendrite to axon and therefore excluding the delay resulting from conduction across the soma region would have led to less steep refractory curves and more physiological ARP and RRP values. One exception may be the SH model, whose ARP was twice the magnitude of the other models. This large ARP is likely to be associated with the long AP duration exhibited by SH (approximately 1 ms, as shown in **Figure 7**), whereas the other two models presented a much shorter AP duration (approximately 1/3 ms). The long AP duration thus makes it impossible for the SH model to achieve the experimental ARP value of 300 µs to 500 µs. Moreover, computational studies demonstrated that the cathodic and anodic thresholds (and their ratio) varied as the stimulus was shifted at a constant distance along the axis of a cell (Rattay, 1999), or even as it moved along a fiber with constant diameter (Rattay, 2008). Since the chronaxie is rather different between myelinated axons and the nonmyelinated soma (Ranck, 1975; Rattay et al., 2012), moving the stimulation site also altered the strength-duration relationship of the neuron. As a consequence, model validation may only be sensible when the stimulation conditions are comparable in both the models and the experiments.

One major hindrance regarding human ANF modeling is that neither the precise morphology nor the ion channel kinetics of human neurons are completely characterized (O'Brien and Rubinstein, 2016). In general, the internode length increases roughly in proportion to the axon diameter (Rushton, 1951). The SH model, in which a shorter internode was attached to a thicker central axon compared to the peripheral axon, is thus in conflict with this observation. The inclusion of a soma is crucial for a realistic description of the human ANF; this necessitates the addition of a dendrite, which further complicates the optimization of an already large set of parameters in biophysical ANF models. The soma (unmyelinated but surrounded by layers of "satellite cells," as described in Rattay et al., 2001b) in human ANF models is highly capacitive and thus charge consuming, which imposes a huge barrier for the propagation of an AP. This leads to a large delay in propagation. Rattay et al. (2001b) mentioned that the somatic barrier became insurmountable for APs after only small variations of certain model parameters. This reveals the difficulty of balancing the capacity of the soma in order to predict a realistic somatic delay without erasing the AP. Even small changes in the stimulation pattern, such as an increase of the IPI by a few microseconds, can cause the loss of the second AP at the somatic region, which explains the very steep refractory curves shown in **Figure 8**. Somas in feline ANF models are less critical for the propagation of APs as they are small and myelinated (Liberman and Oliver, 1984), which reduces the capacity and in turn the chance of losing an AP at the somatic region. A shorter presomatic delay was reported when the somatic diameter in the RA model was reduced from 30 µm to 20 µm, which was closer to the average soma size of human spiral ganglion cells, and thus the temporal spiking behavior was altered when the soma diameter was changed (Potrusil et al., 2012). Furthermore, the conduction velocity was also influenced by the axon diameter. An increase in the respective diameter of peripheral and central axons in RA from 1 and 2 µm to 1.3 and 2.6 µm, which was closer to measurements from human specimens, decreased the conduction time by 21.4% (Rattay et al., 2013).

In this study, the Gaussian noise current term in RA was also applied to the other two models to account for the stochastic nature of ion channels. Based on Equation (1), this noise current increases with the maximum sodium conductivity and the membrane surface area, implying that stochasticity is more pronounced in larger fibers and with higher sodium densities. However, the contrary has been revealed in experiments: the strength of stochasticity was found to decrease as the fiber diameter increased (Verveen, 1962), and the relative spread was later demonstrated to be inversely proportional to the square root of the total number of sodium channels (Rubinstein, 1995). As a consequence, the role of a single channel in the voltage fluctuation is less significant when compared to the total ionic conductance (Rubinstein, 1995; Badenhorst et al., 2016). Moreover, experiments showed that the ionic channel noise of ANF increased as the membrane potential deviated from the resting potential (Verveen and Derksen, 1968), but such voltage dependency was not included in the noise current term by Rattay et al. (2001b). A modified version of the conductance-based stochastic model, which included the inverse relationship and voltage dependency, has been proposed by Badenhorst et al. (2016). Here, the authors were particularly motivated to have their model reflect the actual in vivo behaviors. The single node model by Negm and Bruce (2014) and the axon model by Imennov and Rubinstein (2009) produced stochastic responses using a channel number tracking algorithm with channel transitions following a Markov jumping process. This approach was found to be the most accurate one to model channel noise (Mino et al., 2002). It is hence worth further investigating the applicability of these approaches in our framework.

None of the three models predicted pulse-train responses in a range comparable with experimental results, because they were not able to appropriately account for temporal effects of ANF, such as pulse-train integration or adaptation. Therefore, these models need to incorporate a mechanism capable of predicting such long-term effects, as these effects are likely to exert a significant impact on the perception of CI users (Clay and Brown, 2007). Currently, there is still no precise knowledge regarding the mechanisms of the adaptive behavior observed in ANFs. Nevertheless, two biophysical approaches for adaptation have been developed. Woo et al. (2009) modeled adaptation using a dynamic external potassium concentration [K<sup>+</sup>]<sub>e</sub> at the nodes of Ranvier and applied it to a feline ANF model in Woo et al. (2010). The model was based on the findings on leeches that [K<sup>+</sup>]<sub>e</sub> changes induced adaptation-like effects (Baylor and Nicholls, 1969). However, there is no experimental evidence that an ongoing stimulation of a nerve fiber can alter [K<sup>+</sup>]<sub>e</sub> sufficiently, or that this is the case in mammalian ANFs.

Negm and Bruce (2014) incorporated adaptation in a single node model by adding hyperpolarization-activated cation channels and low-threshold potassium channels, both of which have been identified in mammalian spiral ganglion neurons. These two types of ion channels had a much slower gating property and complemented the relatively fast dynamics of sodium and potassium currents. As this approach has not yet been applied to a multi-compartment ANF model, it remains unclear how the additional ion channels will affect the initiation and propagation of APs. A simple inclusion of these channels to an existing ANF model is not sufficient, as the spiking behavior of the model may be altered, and subsequently extensive parameter optimization is required. On the other hand, stochasticity and temporal behaviors of ANF have been efficiently implemented in phenomenological models. van Gendt et al. (2016) created a hybrid model that combined the biophysical and phenomenological approaches to efficiently predict responses to pulse-train stimuli. This model was also implemented in combination with a three-dimensional volume conduction model of the cochlea (van Gendt et al., 2016, 2017). Nonetheless, as phenomenological models do not include realistic biophysical details in their implementation, their predictions are often limited only to predefined stimuli.

# DATA AVAILABILITY STATEMENT

The scripts and generated datasets for this study can be found at https://gitlab.lrz.de/tueibai-public/human-anf-models.git.

## AUTHOR CONTRIBUTIONS

RB contributed to model simulation, data acquisition and analysis, and manuscript drafting. JE contributed to study design, data analysis, and manuscript revising. MO-L contributed to data analysis and manuscript revising. WH and SB contributed to study design and critical manuscript revising. The final manuscript has been approved by all authors.

### FUNDING

This project and the authors were supported by the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 702030, and the German Research Foundation (DFG) under the D-A-CH programme (HE 6713/2-1). Publication with Frontiers is financed within the Open Access Publishing Funding Programme by the DFG and the Technical University of Munich.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.01173/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Bachmaier, Encke, Obando-Leitón, Hemmert and Bai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Electrical Stimulation in the Human Cochlea: A Computational Study Based on High-Resolution Micro-CT Scans

Siwei Bai<sup>1,2,3\*</sup>, Jörg Encke<sup>1,2,4</sup>, Miguel Obando-Leitón<sup>1,2,5</sup>, Robin Weiß<sup>1,2</sup>, Friederike Schäfer<sup>2</sup>, Jakob Eberharter<sup>2</sup>, Frank Böhnke<sup>6</sup> and Werner Hemmert<sup>1,2,5</sup>

<sup>1</sup> Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany, <sup>2</sup> Munich School of Bioengineering, Technical University of Munich, Garching, Germany, <sup>3</sup> Graduate School of Biomedical Engineering, University of New South Wales, Sydney, NSW, Australia, <sup>4</sup> Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany, <sup>5</sup> Graduate School of Systemic Neurosciences, Ludwig Maximilian University of Munich, Planegg, Germany, <sup>6</sup> Department of Otorhinolaryngology, Klinikum rechts der Isar, Munich, Germany

#### Edited by:

Alejandro Barriga-Rivera, University of Sydney, Australia

#### Reviewed by:

Pavel Mistrik, MED-EL, Austria

Randy Kevin Kalkman, Leiden University Medical Center, Netherlands

#### \*Correspondence:

Siwei Bai siwei.bai@tum.de

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 12 August 2019 Accepted: 22 November 2019 Published: 05 December 2019

#### Citation:

Bai S, Encke J, Obando-Leitón M, Weiß R, Schäfer F, Eberharter J, Böhnke F and Hemmert W (2019) Electrical Stimulation in the Human Cochlea: A Computational Study Based on High-Resolution Micro-CT Scans. Front. Neurosci. 13:1312. doi: 10.3389/fnins.2019.01312

Background: Many detailed features of the cochlear anatomy have not been included in existing 3D cochlear models, including the microstructures inside the modiolar bone, which in turn determine the path of auditory nerve fibers (ANFs).

Method: We captured the intricate modiolar microstructures in a 3D human cochlea model reconstructed from µCT scans. A new algorithm was developed to reconstruct ANFs running through the microstructures within the model. Using the finite element method, we calculated the electrical potential as well as its first and second spatial derivatives along each ANF elicited by the cochlear implant electrodes. Simulation results of electrical potential were validated against intracochlear potential measurements. Comparison was then made with a simplified model without the microstructures within the cochlea.

Results: When the stimulus was delivered from an electrode located deeper in the apex, the extent of the auditory nerve influenced by a higher electric potential grew larger; at the same time, the maximal potential value at the auditory nerve also became larger. The electric potential decayed at a faster rate toward the base of the cochlea than toward the apex. Compared to the cochlear model incorporating the modiolar microstructures, the simplified version resulted in relatively small differences in electric potential. However, in terms of the first and second derivatives of electric potential along the fibers, which are relevant for the initiation of action potentials, the two models exhibited large differences: maxima in both derivatives with the detailed model were larger by a factor of 1.5 (first derivative) and 2 (second derivative) in the exemplary fibers. More importantly, these maxima occurred at different locations, and opposite signs were found for the values of second derivatives between the two models at parts along the fibers. Hence, while one model predicts depolarization and spike initiation at a given location, the other may instead predict a hyperpolarization.


Conclusions: Although a cochlear model with fewer details seems sufficient for analysing the current spread in the cochlear ducts, a detailed-segmented cochlear model is required for the reconstruction of ANF trajectories through the modiolus, as well as the prediction of firing thresholds and spike initiation sites.

Keywords: cochlear implant, computational model, finite element analysis, electrical stimulation, auditory nerve fibers, model reconstruction

# 1. INTRODUCTION

The cochlea in the inner ear is a complex three-dimensional structure, where sound is coded by the sensory hair cells into electrical impulses traveling along the auditory nerve to the brain. These hair cells are easily damaged, which leads to permanent hearing loss. Cochlear implants (CIs) are surgically implantable biomedical devices that bypass the sensory hair cells and directly excite the remaining fibers of the auditory nerve with electric current. They are capable of restoring a surprisingly large degree of auditory perception to patients who are severely to profoundly deaf. By 2012, there were more than 325,000 CI recipients all over the world, and more than 100,000 CI users in Europe (De Raeve and van Hardeveld, 2013), corresponding to about 200 implanted patients per million inhabitants. However, this only accounts for 7% of all adults with hearing impairment that could benefit from a CI in Europe (De Raeve and van Hardeveld, 2013). In addition, the estimated prevalence of permanent bilateral hearing impairment among newborns varies from 0.1 to 0.4% (Fortnum et al., 2001), among which 45% are considered potential CI candidates (De Raeve and van Hardeveld, 2013).

As the human cochlea is deeply embedded inside the temporal bone, direct measurements of electrical potential or current along the auditory nerve fibers are not readily feasible. Computational cochlear models have been extensively utilized to simulate current spread in the cochlea and neuronal excitation, and have provided useful insights. For instance, it has been demonstrated that the anatomical structure, such as the tapering spiral feature of the cochlea (Briaire and Frijns, 2000), the conductivity of the bone and other structures (Kalkman et al., 2014; Wong et al., 2015; Malherbe et al., 2016) and the inclusion of a head model (Malherbe et al., 2016), influence the current spread as well as the neural excitation pattern. In addition, the location of the electrode array relative to the cochlear wall also has a strong effect on the distribution of electrical current as well as the excitation pattern of the auditory nerve (Frijns et al., 2001; Hanekom, 2001; Malherbe et al., 2016).

Nevertheless, due to limitations in image acquisition and model reconstruction, many detailed features of the cochlear anatomy have not been included in existing models. These features include the microstructures inside the modiolar bone, where spiral ganglion neurons (SGNs) reside and through which neural fibers as well as blood vessels run. As a result, the peripheral processes of the auditory nerve have been conventionally modeled as a smooth sheet extending into the main trunk of the nerve without taking into account the bone porosity (Finley et al., 1990; Frijns et al., 2001; Hanekom, 2001, 2005; Rattay et al., 2001a; Choi et al., 2005; Kalkman et al., 2014, 2015; Malherbe et al., 2016; Mangado et al., 2016; Nogueira et al., 2016). Moreover, during the auditory nerve fiber reconstruction in most studies, both the dendritic ends and the ganglion cell bodies were considered evenly distributed around the central axis of the modiolus; the auditory nerve fibers (ANFs) were then reconstructed by applying spline interpolation between the dendritic end and the ganglion cell body, and a spline extrapolation beyond the ganglion cell (Frijns et al., 2001; Hanekom, 2001, 2005; Kalkman et al., 2014, 2015; Malherbe et al., 2016; Mangado et al., 2016; Nogueira et al., 2016). It has been suggested in Kalkman et al. (2015) that a model with grouped ganglion cell bodies, similar to reality, results in a more focussed excitation pattern than a model with evenly distributed cell bodies. Hence, it is necessary to investigate the influence of the modiolar bone porosity on the electrical current spread as well as the excitation pattern of the auditory nerve.

The excitation pattern of ANFs is in general predicted by the implementation of multi-compartment cable models (Rattay et al., 2001b, 2013; Briaire and Frijns, 2005; Smit et al., 2010; Potrusil et al., 2012). The cable models incorporate neural compartmental impedances that affect the amplitude of intracellular potential generated within neural compartments in response to external stimulus delivered by CI electrodes. Nevertheless, there exist several ANF models in the literature with varied morphological or ionic channel properties. Choosing the appropriate cable model for a given computational study is difficult, as different models do not necessarily respond the same way to a given stimulus (Bachmaier et al., 2019). Consequently, most models adopted in existing studies were chosen without a specific reason or by inheritance. Rattay (1986) has shown that for a stimulation of an axon with an extracellular electrode, the activating function $f = \frac{d}{4\rho_i c} \cdot \frac{\partial^2 V}{\partial x^2}$ (where $d$, $\rho_i$, and $c$ represent the fiber diameter, the axoplasmic resistivity, and the capacity per unit length, respectively) predicts the initiation of an action potential. With the assumption of $d$, $\rho_i$, and $c$ remaining constant, activation is then correlated to the second spatial derivative of the external voltage, $\frac{\partial^2 V}{\partial x^2}$. We thus decided in this study to adopt the activating function for the analysis of spike initiation sites, before we implement more complex multi-compartment models.
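The activating function is proportional to the second spatial derivative of the extracellular potential along the fiber, so it can be estimated from potentials sampled along a fiber path. The sketch below is a minimal illustration, assuming equally spaced samples along the fiber and consistent units; the function names and the central-difference scheme are illustrative assumptions, not the method used in this study.

```python
def second_derivative(v_ext, dx):
    """Central-difference estimate of d2V/dx2 at the interior sample points.

    v_ext : extracellular potential sampled at equal spacing along the fiber
    dx    : spacing between neighbouring samples along the fiber
    """
    return [(v_ext[i - 1] - 2.0 * v_ext[i] + v_ext[i + 1]) / dx ** 2
            for i in range(1, len(v_ext) - 1)]


def activating_function(v_ext, dx, d, rho_i, c):
    """Scale the second derivative by d / (4 * rho_i * c).

    d, rho_i, and c are treated as constants along the fiber, so the result
    differs from the second derivative only by a constant factor.
    """
    scale = d / (4.0 * rho_i * c)
    return [scale * s for s in second_derivative(v_ext, dx)]
```

Because the scale factor is constant under this assumption, the locations and signs of the extrema are already determined by the second derivative alone; positive values indicate regions where depolarization is expected.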

In this paper, we introduced a new three-dimensional (3D) model of the implanted human cochlea from a set of high-resolution µCT scans using the finite element (FE) method; this model captured the intricate microstructures inside the modiolar bone. Subsequently, we validated simulation results against intracochlear measurements, and compared the detailed model to a simplified model without these microstructures. Due to the structural irregularity inside the modiolus of the detailed model, conventional methods to generate ANFs are not applicable. We therefore also developed a new algorithm to reconstruct ANFs within the 3D cochlear model.

# 2. METHODS

#### 2.1. FE Model Reconstruction

The µCT scans of a human cadaveric temporal bone with an inserted dummy electrode (pure silicone, without platinum alloy wires or contacts) were acquired by the Department of Otorhinolaryngology at the Rechts der Isar Hospital, with an isotropic voxel size of 5.9 µm and a spatial resolution of 3,000 × 3,000 × 2,752 voxels (Braun et al., 2012). The scans were initially processed to enhance the contrast and the edges between different tissues. Due to limited computational memory, the field of view of the scans was subsequently reduced to include only the cochlea and its immediate surroundings, and later downsampled to an isotropic resolution of 9.6 µm with a spatial resolution of 930 × 930 × 1,014 voxels.

The segmentation of the µCT scans was performed in 3D Slicer (Version 4.6) (Kikinis et al., 2014), an open-source platform for medical image processing. In 3D Slicer, each tissue compartment was assigned a label map. To generate a label map, a threshold was chosen for the gray level of the pixel intensity at a single slice to automatically select most of the desired tissue, and a paintbrush was used to manually modify the selection. This procedure was repeated at every second or third slice until the end of the dataset, and an interpolation method was later used to create a full segmentation by automatically connecting the sparse set of contours. A paintbrush was then chosen again to modify the tissue map until a desired accuracy was met. The segmented tissue compartments from the µCT scans are bony labyrinth, cochlear canal and cochlear nerve. A surface triangular mesh was generated for every compartment.

The T1-MRI scans of a human head were acquired with an isotropic voxel size of 1 mm. After the enhancement of the image contrast, the head scans were automatically segmented into three compartments, i.e., scalp, skull, and brain, in BrainSuite (Shattuck and Leahy, 2002), an open-source software specialized in processing MRI head scans. The surface meshes of the cochlear model and head model were then imported into Blender, an open-source platform for 3D computer graphics. The coordinate systems of both models were aligned in Blender, so that the cochlear model was embedded in the head model at the petrous part of the left temporal bone. Further processing was subsequently performed on all surface meshes in Geomagic Wrap (3D Systems, SC, USA) to increase the mesh quality and smoothness. The procedures included removing non-manifold edges, splitting self-intersecting triangles, reducing edge crease, smoothing spikes, and repairing holes.

Afterwards, all surface meshes were transferred to ANSYS ICEM CFD (ANSYS, PA, USA). After defining edges at the intersections between compartments and at the desired electrode contact locations, the tetrahedral volumetric mesh was generated with appropriate meshing and coarsening parameters. The aforementioned model reconstruction procedures are demonstrated in **Figure 1**. The volumetric mesh, with 21,937,778 elements, was exported to COMSOL Multiphysics (COMSOL AB, Sweden), a cross-platform FE solver, for the simulation of electrical stimulation. The geometry of the cochlear model (cochlear canal, auditory nerve and CI electrode) is presented in **Figures 2A–C**.

In the segmented cochlear model (named "ORI"), the fine details of microstructures through the Rosenthal's canals were captured, as illustrated in **Figure 2C**. In order to investigate the influence of these microstructures, the auditory nerve model was subsequently modified by removing all of the fine details inside the modiolar bone, and the resulting simplified cochlear model was named "SIM." The geometry of SIM (cochlear canal, simplified auditory nerve and CI electrode) is displayed in **Figures 2D,E**. This model resulted in a volumetric mesh of 26,848,015 elements.

The electric potential $V$ in the model was calculated using Laplace's equation: $\nabla \cdot (-\sigma \nabla V) = 0$, where $\sigma$ is the electric conductivity, and $\nabla$ is the nabla partial differentiation operator given by $\nabla \equiv \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)$. The electrical conductivity of model compartments (Bai et al., 2012; Malherbe et al., 2016) is shown in **Table 1**. The electric permittivity of biological tissues in the model is neglected under quasi-static approximation (Malmivuo and Plonsey, 1995).
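To illustrate how a spatially varying conductivity shapes the solution of Laplace's equation, the following minimal sketch solves the one-dimensional analog $\frac{d}{dx}\left(\sigma \frac{dV}{dx}\right) = 0$ with fixed potentials at both ends. It is a toy finite-difference example, not the 3D FE formulation solved in COMSOL; the function name `solve_laplace_1d`, the grid, and the conductivity values are assumptions chosen only for illustration.

```python
import numpy as np


def solve_laplace_1d(sigma, v_left, v_right):
    """Solve d/dx(sigma dV/dx) = 0 on a uniform 1D grid (illustrative only).

    sigma   : conductivity of each cell between neighbouring nodes
    v_left  : fixed potential at the first node (Dirichlet condition)
    v_right : fixed potential at the last node (Dirichlet condition)
    Returns the potential at the len(sigma) + 1 nodes.
    """
    sigma = np.asarray(sigma, dtype=float)
    n_nodes = sigma.size + 1
    A = np.zeros((n_nodes, n_nodes))
    b = np.zeros(n_nodes)

    # Boundary rows pin the potential at both ends.
    A[0, 0] = 1.0
    b[0] = v_left
    A[-1, -1] = 1.0
    b[-1] = v_right

    # Interior rows enforce flux continuity:
    # sigma[i-1] * (V[i] - V[i-1]) = sigma[i] * (V[i+1] - V[i])
    for i in range(1, n_nodes - 1):
        A[i, i - 1] = -sigma[i - 1]
        A[i, i] = sigma[i - 1] + sigma[i]
        A[i, i + 1] = -sigma[i]

    return np.linalg.solve(A, b)


# Hypothetical two-material segment: the low-conductivity half
# takes most of the potential drop.
potential = solve_laplace_1d([1.0, 1.0, 3.0, 3.0], v_left=1.0, v_right=0.0)
print(potential)
```

In this one-dimensional case the current density is the same in every cell, so the potential drop across each cell is inversely proportional to its conductivity.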

# 2.2. CI Electrode Design and Stimulation Scheme

The dummy CI electrode in the cadaveric temporal bone was also reconstructed from the µCT scans. The electrode was inserted through the round window into the scala tympani. The electrode then punctured the basilar membrane at approximately 270° and traveled along the scala vestibuli, until it stopped at an approximately 720° angle into the cochlear canal, as shown in **Figure 2A**. This translocation likely resulted from changes in the mechanical properties of tissues in the cadaveric bone, which became more rigid. Nevertheless, translocation may also occur in clinical settings (Holden et al., 2013; Risi, 2018).

The conductivity of the silicone CI electrode was assigned to be zero. The electrode contacts were arranged based on the MED-EL (Innsbruck, Austria) Standard twelve-contact-pair design, with a contact radius of approximately 0.18 mm and a centre-to-centre distance of approximately 2.4 mm. The current-controlled stimulation scheme was monopolar with a total electric current of 1 mA from an electrode contact pair; all other pairs were inactive at floating potentials, with the net current being zero. The stimulating electrodes were numbered from the base to the apex of the cochlea. The CI reference electrode with a radius of approximately 1 cm was set as ground and placed extracochlearly on the left temporal bone of the skull, superior and posterior to the left external acoustic meatus.

### 2.3. Nerve Fiber Reconstruction

In Blender, a spiral was defined along the entire outer edge of the osseous spiral lamina (25.003 mm). This curve, representing the synaptic ending of the peripheral axon, was used to derive the starting points for all fibers. A second spiral was created by projecting the starting curve onto the plane where the base of the truncated auditory nerve sits. The projected spiral was shrunk to fit in the base, and subsequently rotated by 45°. This curve then acted as the basis of the end points of all fibers. On both of the spirals, the spatial coordinates of 400 evenly-spaced seed points, including the endpoints of the spirals, were exported.

All conductivity values were adapted from Malherbe et al. (2016), except for the bone (Bai et al., 2012).

FIGURE 2 | (A–C) The cochlear model "ORI" with a detailed-segmented auditory nerve geometry; (D,E) The cochlear model "SIM" with a simplified nerve model, whose fine details through the Rosenthal's canals were removed.
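The export of evenly spaced seed points along each spiral can be sketched as a resampling of a polyline at equal arc-length intervals. The snippet below is a minimal illustration and not the Blender procedure used in the study; the function name `evenly_spaced_points` and the short test curve are assumptions, and the spiral is assumed to be available as an ordered array of 3D vertices.

```python
import numpy as np


def evenly_spaced_points(curve, n_points):
    """Resample an ordered polyline at n_points equally spaced arc lengths.

    curve    : (m, 3) array of ordered vertex coordinates
    n_points : number of seed points, including both end points
    """
    curve = np.asarray(curve, dtype=float)
    # Cumulative arc length at every vertex of the polyline.
    segment = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(segment)))
    # Target arc lengths, from the first to the last vertex.
    targets = np.linspace(0.0, arc[-1], n_points)
    # Linear interpolation of each coordinate along the arc length.
    return np.column_stack(
        [np.interp(targets, arc, curve[:, dim]) for dim in range(curve.shape[1])]
    )


# Hypothetical L-shaped curve of total length 4, resampled at 5 points.
seeds = evenly_spaced_points([[0, 0, 0], [1, 0, 0], [1, 3, 0]], 5)
print(seeds)
```

For the study's setting, `n_points` would be 400 for each of the two spirals; linear interpolation between vertices is an assumption that is adequate only when the input curve is already densely sampled.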

ANFs were reconstructed based on these seed points with a program written in Python. The seed points were first mapped onto the closest nodes of the FE mesh of the auditory nerve. The shortest path through the FE mesh between each pair of points on the start and end curves was calculated by using Dijkstra's algorithm, as shown in **Figure 1**. Later, a sub-volume was extracted around each of the approximated fiber trajectories, and was subsequently remeshed with a finer resolution in order to smooth the fiber. The final fiber trajectory was obtained by re-applying Dijkstra's algorithm on every remeshed sub-volume.
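The first pass of this procedure (mapping seed points to mesh nodes and tracing the shortest path along mesh edges) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the helper names `build_edge_graph`, `nearest_node`, and `dijkstra_path` are hypothetical, edge weights are assumed to be Euclidean edge lengths, and the sub-volume remeshing step is omitted.

```python
import heapq

import numpy as np


def build_edge_graph(nodes, tets):
    """Adjacency map of a tetrahedral mesh; weights are edge lengths."""
    graph = {i: {} for i in range(len(nodes))}
    for tet in tets:
        for i in range(4):
            for j in range(i + 1, 4):
                a, b = int(tet[i]), int(tet[j])
                weight = float(np.linalg.norm(nodes[a] - nodes[b]))
                graph[a][b] = weight
                graph[b][a] = weight
    return graph


def nearest_node(nodes, point):
    """Index of the mesh node closest to a seed point."""
    offsets = nodes - np.asarray(point, dtype=float)
    return int(np.argmin(np.linalg.norm(offsets, axis=1)))


def dijkstra_path(graph, start, goal):
    """Shortest path along mesh edges; returns (node indices, length)."""
    dist = {start: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, weight in graph[u].items():
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if goal not in dist:
        raise ValueError("goal node is not reachable from start node")
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# Hypothetical two-tetrahedron mesh sharing the face (1, 2, 3).
nodes = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
tets = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])
graph = build_edge_graph(nodes, tets)
path, length = dijkstra_path(graph, nearest_node(nodes, [0.1, 0.0, 0.0]), 4)
print(path, length)
```

Because the path is restricted to edges of the nerve mesh, it cannot leave the nerve volume, which is the property that a plain spline between the two seed points does not guarantee.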

We reconstructed fibers using the FE meshes of the auditory nerve in the ORI model. The reconstructed ANFs are illustrated in **Figure 3**, and the fiber lengths ranged from 5.520 to 8.151 mm. As the SGN peripheral axon has an average length of 1.5 mm (Spoendlin and Schrott, 1989; Rattay et al., 2001b), and the soma diameter was recently reported to be 20 µm (Potrusil et al., 2012), these reconstructed ANF trajectories represented the peripheral axon, soma, and part of the central axon of the SGNs.

For data analysis, the electrical potential data for both ORI and SIM were extracted using the coordinates of ANFs acquired from the detailed-segmented auditory nerve in the ORI model. We then calculated the first and second derivatives of electric potential along the fiber direction. As the derivatives of the raw voltage data exhibited large peak values that would have been smoothed by the nerve fibers, we applied a low-pass filter derived from the length constant of myelinated axons of spiral ganglion cells to the voltage data, before calculating the derivatives. A similar approach can be found in Zierhofer (2001), where the author approximated the steady-state solution to the cable equation with a convolution product of the second spatial derivative of the external potential and a spatial low-pass filter depending on the length constant of the fiber. The length constant λ is defined as

$$
\lambda = \sqrt{\frac{\rho\_m \cdot a}{2\rho\_i}},\tag{1}
$$

where the transmembrane resistivity $\rho_m$ is 1 kΩ·cm<sup>2</sup> per myelin layer for 80 layers, the intracellular resistivity $\rho_i$ is 0.05 kΩ·cm, and the axonal radius $a$ is 1 µm (values taken from Rattay et al., 2001b). The first derivative $\partial V / \partial x$ was approximated by

$$V\_k' \approx \frac{V\_{k+1} - V\_k}{|r\_{k,k+1}|},\tag{2}$$

where $k$ represents the $k$th node on an individual fiber, $V'_k$ is the first derivative of $V$ at the $k$th node, and $|r_{k,k+1}|$ is the distance between the $k$th and $(k+1)$th nodes on the fiber. Using the finite difference method, the second derivative $\partial^2 V / \partial x^2$ at the $k$th node can be approximated as

$$V\_k^{\prime\prime} \approx \frac{\frac{V\_{k+1} - V\_k}{|r\_{k,k+1}|} - \frac{V\_k - V\_{k-1}}{|r\_{k-1,k}|}}{\frac{|r\_{k-1,k+1}|}{2}} \tag{3}$$
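Equations 1–3 can be sketched in Python as follows. This is a minimal illustration and not the analysis code of the study. Three assumptions are made here: the 80 myelin layers are taken to add up to $\rho_m$ = 80 kΩ·cm<sup>2</sup>, the low-pass filter is taken to be a normalized exponential kernel $e^{-|x|/\lambda}$ (the exact filter is not specified in the text), and all function names are hypothetical.

```python
import numpy as np


def length_constant_cm(rho_m_kohm_cm2, radius_cm, rho_i_kohm_cm):
    """Eq. 1: lambda = sqrt(rho_m * a / (2 * rho_i)), returned in cm."""
    return np.sqrt(rho_m_kohm_cm2 * radius_cm / (2.0 * rho_i_kohm_cm))


def arc_length(points):
    """Cumulative distance along the fiber at every node."""
    segment = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return np.concatenate(([0.0], np.cumsum(segment)))


def lowpass(v, s, lam):
    """Spatial smoothing with an assumed kernel exp(-|x| / lam)."""
    v = np.asarray(v, dtype=float)
    spacing = np.gradient(s)  # local node spacing as quadrature weight
    weights = np.exp(-np.abs(s[:, None] - s[None, :]) / lam) * spacing[None, :]
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to one
    return weights @ v


def first_derivative(v, points):
    """Eq. 2: forward difference, defined at nodes 0 .. n-2."""
    segment = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return np.diff(v) / segment


def second_derivative(v, points):
    """Eq. 3: defined at the interior nodes 1 .. n-2."""
    segment = np.linalg.norm(np.diff(points, axis=0), axis=1)
    slope = np.diff(v) / segment
    span = np.linalg.norm(points[2:] - points[:-2], axis=1)  # |r_{k-1,k+1}|
    return (slope[1:] - slope[:-1]) / (span / 2.0)


# Length constant from the stated values (1 um = 1e-4 cm): about 2.83 mm.
lam_cm = length_constant_cm(80.0, 1e-4, 0.05)
print(f"lambda = {lam_cm * 10.0:.2f} mm")

# Hypothetical straight fiber with non-uniform node spacing (in cm).
fiber = np.array([[0.0, 0, 0], [0.1, 0, 0], [0.3, 0, 0], [0.6, 0, 0], [1.0, 0, 0]])
voltage = fiber[:, 0] ** 2
smoothed = lowpass(voltage, arc_length(fiber), lam_cm)
print(second_derivative(smoothed, fiber))
```

Filtering before differentiation follows the order described in the text; because finite differences amplify node-to-node noise, smoothing afterwards would not be equivalent in general.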

#### 2.4. Intracochlear Potential Measurements

Intracochlear potentials were measured in 10 CI users (16 ears) using the telemetry system of the CIs (Zierhofer, 2000). **Table 2** lists relevant information on all CI subjects in this experiment. In the present experiment, biphasic pulses of 40 µs, with an inter-phase gap of 2.1 µs and with the cathodic (negative) phase leading, were used as stimuli. The voltage at the measuring electrode was recorded by the telemetry system at the end of the anodic phase in the stimulating electrode. The pulses had an amplitude of 50 CU (1 CU ≈ 1 µA).

A Research Interface Box (RIB2, University of Innsbruck) was used to communicate with the implants. Customized software written in Python was used to generate the stimuli and record the telemetry results. The MED-EL impedance field telemetry (IFT) system used a track-and-hold circuit, which followed the voltage only during anodic phases and held the voltage at their end. This measured voltage was then output as 2,048 bits of adaptive sigma-delta-modulated data (Zierhofer, 2000). Subsequently, the voltage value was obtained by averaging and multiplying by a factor provided by the manufacturer. A more detailed description and characterization of the IFT system can be found in Neustetter (2014).

A full voltage spread matrix was measured, meaning that all (active) electrodes were measured against all other electrodes, respectively. Some data points were missing due to the electrodes being deactivated or showing clearly outlying (very high) impedances, which indicated bad contacts. This was mostly the case for the most basal electrodes, which indicated these electrodes were not completely inside the cochlea.

Measurements presented in this work were conducted prior to other experiments in our workgroup. All subjects gave their informed consent and received monetary compensation for their participation. Measurements were conducted in accordance with the Declaration of Helsinki, and were approved by the medical ethics committee of the Klinikum rechts der Isar (Munich, 2126/08).

TABLE 2 | Information on CI subjects participating in the intracochlear measurement.

All CIs are products from MED-EL (Innsbruck, Austria).

# 3. RESULTS

#### 3.1. Model Validation

For measurements at any implant electrode, a broad range of values was observed across cochlear implant subjects. This is shown in **Figure 4**, which presents the mean (dashed blue line) and standard deviation of measurements at two exemplary electrodes. Measurements at the stimulating electrodes were left out, as the model did not account for the electrode-lymph interface. Simulation data with the stimulating current adjusted to the same value as in the experiment are also shown in the figure as a solid red line. The shape of the simulated curve matched the measurements closely, and the simulated values fell within the range of measurement data, although the simulation slightly overshot the mean curve.

#### 3.2. Stimulation Profile of the Detailed Model

**Figure 5** describes the electrical potential profile, extracted from ORI, i.e., the detailed-segmented cochlear model, along the 400 reconstructed fibers arranged from the base to the apex of the cochlea. As is observed in the plots, the maximal potential value of each simulation appeared in proximity to the stimulating electrode, whose position is indicated by a small triangle in **Figure 5**. The maximal value was also located in most situations close to, if not at, the synaptic ending of peripheral axons (i.e., tip of the ANFs); the exception occurred when E1, i.e., the most basal electrode, was the stimulating electrode, and the maximal potential value showed up at approximately 1 mm from the nerve fiber tip. The reason is that in the 3D model E1 was close to the medial wall of scala tympani, whereas all other electrodes were near the lateral wall of cochlea.

FIGURE 4 | The mean (dashed blue line) and standard deviation (blue zone) of intracochlear potential measurement data for two exemplary electrodes: 4 and 9. The solid red line represents data from the simulation, whose stimulating current was adjusted to the same value as in the measurement, i.e., 50 µA.

It is also obvious from **Figure 5** that as the stimulating electrode shifted toward the apex, the extent covered by a higher potential grew larger, which suggests the stimulation became less discriminative; meanwhile, the maximal potential value became larger as the stimulating electrode was shifted upwards, even though the electrical stimulation current remained the same for all electrodes. This is also depicted in **Figure 6A**, which compares the electrical potential in absolute value along the edge of the spiral lamina, i.e., the synaptic ending of peripheral axons, for all stimulating electrodes. When these electrical potentials at the synaptic endings were normalized to their respective maximum, as in **Figure 6B**, they showed a converged decline toward the base at a rate of approximately 0.18 dB/mm; in comparison, the decay toward the apex was slower and flattened out at a different level for each electrode. The current conservation was also reflected in **Figure 7**, which illustrates the second spatial derivative of electrical potential along the fiber direction, but the effect was not as prominent as in the electrical potential profile.
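A decay rate in dB/mm such as the one quoted above can be estimated by fitting a straight line to the normalized potential expressed in decibels. The sketch below is one plausible way to do this and is not taken from the study: the function name `decay_rate_db_per_mm` is hypothetical, the decibel level is assumed to be $20 \log_{10}(V/V_{max})$ as is conventional for voltages, and the data are synthetic.

```python
import numpy as np


def decay_rate_db_per_mm(distance_mm, potential):
    """Slope of the normalized potential in dB versus distance.

    distance_mm : distance from the potential maximum along the spiral lamina
    potential   : potential at the synaptic endings at those distances
    Returns a positive number for a decaying potential.
    """
    distance_mm = np.asarray(distance_mm, dtype=float)
    magnitude = np.abs(np.asarray(potential, dtype=float))
    # Normalize to the maximum and convert to decibels (assumed 20*log10).
    level_db = 20.0 * np.log10(magnitude / magnitude.max())
    # Least-squares straight line; the slope is the change in dB per mm.
    slope, _intercept = np.polyfit(distance_mm, level_db, 1)
    return -slope


# Synthetic profile that decays by exactly 0.18 dB per mm.
distance = np.linspace(0.0, 10.0, 50)
profile = 10.0 ** (-0.18 * distance / 20.0)
print(decay_rate_db_per_mm(distance, profile))
```

The fit should be restricted to the side of the maximum being characterized (toward the base or toward the apex), since the two sides decay at different rates.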

#### 3.3. Comparison Between the Two Models

The removal of fine structures in the modiolar bone altered the electric potential along the cochlear ducts only slightly. Compared to ORI, a similar profile but with a downshift in value was observed in the electrical potential at the neural fiber tips in SIM, as shown in **Figure 6A**. The approximated basal decay rate for SIM was 0.17 dB/mm, which was marginally smaller than that for ORI. The comparison of the electrical potential along the entire length of ANFs revealed a more complex pattern: for many ANFs, as presented in **Figures 8A,B** as well as in **Figures 9A,B**, the potential drop along the fiber was smoother in SIM; as a result, the potential value on these fibers was initially larger in ORI, but farther away from the spiral lamina, the potential value in SIM surpassed that in ORI. This "intersection" also varied slightly depending on the location of the stimulating electrode as well as that of the fiber. Nevertheless, the difference in ANF electric potentials between ORI and SIM was relatively small, with the maximal absolute difference reaching only up to 10%.

In spite of small relative differences (RDs) in ANF electric potentials between ORI and SIM, the comparison of first and second derivatives of electric potential along the fiber, which are relevant for the initiation of action potentials, revealed a different story. **Figures 8**, **9** also present the first and second derivatives of electric potentials along example fibers in both ORI and SIM, when the stimulating electrode was E3 and E5, respectively. Considerable fluctuations were found in the first and second derivatives with both models, and their peaks were located at different positions along the fibers. For the exemplary fibers, maxima (and minima) in the first derivative between ORI and SIM differed by a factor of up to 1.5. In the case of the second derivative, such differences reached values up to 2; in addition, at several parts along the fibers, the values in ORI had an opposite sign compared to those of SIM at the same location. The difference in the polarity/sign of the second spatial derivative can be clearly observed by comparing **Figures 7**, **10**. As shown in **Figure 7**, major peaks occurred at regions in the proximity of stimulating electrodes. Specifically, negative peaks appeared at the synaptic ending of peripheral axons, soma, as well as peripheral and/or central axons close to the soma, whereas positive peaks showed up predominantly on the central axons. In comparison, major negative peaks in the SIM model were found, as revealed in **Figure 10**, only at the synaptic ending of peripheral axons, and the region around the soma (close to stimulating electrodes) exhibited mainly positive values for the second derivative. Nevertheless, derivatives of both models eventually converged toward the distal end of central axons, where the nerve trunk is solid in the ORI model; for example, as illustrated in **Figure 8E**, the convergence of second derivatives of the two models on the fiber closest to the stimulating electrode happened at approximately 4 mm away from the spiral lamina.

first (C) and second (E) derivatives on the left column were taken from the fiber 9.31 mm on the spiral lamina away from the base, which was also the closest fiber to the stimulating electrode. The electric potential (B), first (D) and second (F) derivatives on the right column were taken from the fiber 12.44 mm away from the base.

# 4. DISCUSSION

In this study we presented a detailed-segmented FE cochlear model reconstructed from the µCT scans of a human cadaveric temporal bone; as the porous characteristics of the modiolar bone were carefully delineated during the segmentation, the microstructures of the auditory nerve were also included in the model. Moreover, we developed a new algorithm to reconstruct the auditory nerve fiber model. Due to the presence of irregular microstructures included in the FE model, a straightforward spline interpolation as in previous modeling studies can inadvertently place segments of the nerve fibers outside the auditory nerve. By adopting a self-directed path-tracing through the edges of the FE auditory nerve mesh, we were able to ensure that the fiber tracts stayed within the auditory nerve, but not in any other structure of the model; and as the fibers inevitably passed through Rosenthal's canals in bundles, they displayed an appearance as natural as the osmium tetroxide-stained ANFs in the literature (Glueckert et al., 2018; van den Boogert et al., 2018).

To improve the realism of reconstructed ANFs, the following aspects should be taken into consideration. It has been reported in the literature on the anatomy of the auditory nerve in the cochlea that, due to the spiral feature of ANF bundles (Arnesen and Osen, 1978; Middlebrooks and Snyder, 2007), the peripheral axon of ANFs takes a radial trajectory from the organ of Corti to the corresponding region of spiral ganglion somas only within 20–60% relative length of the organ of Corti; fibers outside this region, i.e., in the most basal and apical regions, exhibit a more tangential course (Stakhovskaya et al., 2007; Li et al., 2019). The 45° rotation of the projected spiral prior to ANF reconstruction was intended to generate a spiral "wrap" of ANF bundles; as a result, basal fibers exhibited a more tangential trajectory. Furthermore, the ANF density is not uniform between base and apex and is highest in the middle region (Spoendlin and Schrott, 1989). Due to this non-uniform distribution, it is difficult to provide a realistic representation without further information. Another missing key feature in the reconstructed fibers is the cell body, as cell bodies are considerably thicker than axons and therefore cannot be bundled up as tightly as the modeled nerve fibers. The fiber bundles should thus expand to make room for the cell bodies. However, this is also difficult to achieve without knowing how the somas are packed in Rosenthal's canal. Therefore, further improvements to our algorithm are necessary to reconstruct more realistic ANFs, for instance, through the combination with imaging data of osmium tetroxide-stained fibers.

(truncated at 5 mm) within the "original" detailed-segmented and "simplified" cochlear models, when the stimulating electrode was E5. The electric potential (A), first (C) and second (E) derivatives on the left column were calculated on the fiber 13.27 mm on the spiral lamina away from the base, which was also the closest fiber to the stimulating electrode. The electric potential (B), first (D) and second (F) derivatives on the right column were calculated on the fiber 10.15 mm away from the base.

It has been established that a major problem for speech perception with CIs is the cross-talk between stimulating electrodes; this is because the electrical potential decays slowly with distance from the stimulating electrode. This phenomenon thus leads to a lack of spatial selectivity in representing the frequency components of the sound source. In the present study, we found that the electrical potential decay in the auditory nerve differed depending on the location of the electrode in the cochlea and the direction of the decay. In general, the maximal potential value at the synaptic ending of peripheral axons became larger as the stimulating electrode reached into the apex, and the potential decay toward the base of the cochlea was faster than toward the apex; this agrees with the observations of electric potential in the scala tympani in Girzon (1987) and current density in Rosenthal's canals in Whiten (2007), even though the ground in these two modeling studies was placed much closer to the stimulating electrode. In addition, regardless of the electrode location, the decay toward the base shared more or less the same rate of 0.18 dB/mm; on the other hand, the decay toward the apex depended on the location of the electrode: the deeper it went into the cochlea, the earlier it flattened out. A similar trend was observed in the peak values of the second derivative as shown in **Figure 7**. Therefore, as the electrode shifted toward the apex, it recruited more fibers. This suggests that it may be beneficial to use a CI with unevenly spaced electrodes, i.e., with an increasing distance between implant electrodes toward the tip. It should nevertheless be noted that the analysis of the electrical potential alone does not predict the activation of nerve fibers. In the case of the activation of the auditory nerve with a CI, where the cell bodies are relatively close to the stimulating electrodes, only cable models are able to predict threshold, polarity dependence and initiation site of the action potential generation. Detailed analysis with the inclusion of a cable model to simulate ANFs will be performed in future research.

Apart from helping to reconstruct lifelike ANFs, the fine modiolar details in the model may also have a major impact on predicting the activation pattern of ANFs. Our simulation results revealed that potentials along the cochlear duct and also along the nerve fibers were not much altered by using the simplified model. However, spikes are initiated at the maxima of the activating function, i.e., the second derivative of the potential along an axon (if the axon is homogeneous) (Rattay, 1986), where we found substantial differences between the detailed-segmented and simplified models. In the detailed model, the absolute values of the activating function were usually larger, which predicted lower thresholds, and more importantly, maxima occurred at different locations, which predicted different spike initiation sites. Even opposite signs were found for the values of second derivatives between ORI and SIM at several parts along the fibers, e.g., the initial segment of the peripheral axon in the proximity of the stimulating electrode, which suggested different polarity sensitivity; hence, while one model predicts depolarization and spike initiation at a given location, the other may instead predict a hyperpolarization. We therefore conclude that it is necessary to reconstruct a computational cochlear model with a detailed-segmented geometry combined with a detailed model of the neurons, which includes dendrite, soma, and axon, to provide accurate predictions of ANF activation.

In order to confirm that no FE discretization error influenced the simulation outcomes, we generated two additional FE meshes using the detailed-segmented model, with 18,163,954 and 46,170,857 elements. The maximal absolute RDs to the mesh with 21,937,778 elements for the coarser and finer models were 0.271 and 0.056%, respectively. This indicates the mesh used in this study was well-converged. Since it was difficult to retrieve information on the soft tissues from the µCT scans, the inner structure of the cochlea was not fully represented, and the blood vessels were included in the segmented nerve mesh model. A cochlear model including more tissue compartments, such as in Wong et al. (2015), is likely to provide more accurate prediction of the voltage profile. Nevertheless, our model was validated against intracochlear measurements from 16 implanted electrodes, and simulation results fell within the range of measurements, and in general presented a similar shape. This already indicates a good degree of validity for the model, especially considering that the cochlear structure was not fully represented, and the electrical properties were taken from literature without being fitted. At the same time, our measurements also presented several limitations: we were not capable of detecting individual disturbances to the electrode array, such as reduced contact due to scarring or tissue growth, and the presence of air bubbles; we were also unable to assess the size of patient cochleae or the exact placement of the electrode array. Future work on the model will involve incorporating more tissue compartments in order to investigate the sensitivity of electric potential and neural activation to the inclusion and variation of these properties.
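The mesh-convergence check described above can be sketched as a comparison of potentials sampled at the same locations in two meshes. The definition used here, $\mathrm{RD} = |V_{test} - V_{ref}| / |V_{ref}|$ expressed in percent, is an assumption, since the exact formula is not stated in this section; the function name and the sample values are hypothetical.

```python
import numpy as np


def max_abs_relative_difference(v_test, v_ref):
    """Maximal absolute relative difference in percent (assumed definition).

    v_test : potentials from the mesh under test
    v_ref  : potentials from the reference mesh at the same locations
    """
    v_test = np.asarray(v_test, dtype=float)
    v_ref = np.asarray(v_ref, dtype=float)
    relative = np.abs(v_test - v_ref) / np.abs(v_ref)
    return float(relative.max() * 100.0)


# Made-up potentials at three shared sample points.
reference = [1.0, 2.0, 4.0]
coarser = [1.01, 2.0, 3.98]
print(max_abs_relative_difference(coarser, reference))
```

This definition is undefined wherever the reference potential is zero, so in practice the comparison would be limited to locations with a non-zero reference value.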

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.


#### ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Klinikum rechts der Isar (Munich, 2126/08). The patients/participants provided their written informed consent to participate in this study.

#### AUTHOR CONTRIBUTIONS

SB contributed to study design, FE model reconstruction, ANF reconstruction, model simulation, data analysis, and manuscript drafting. JEn contributed to study design, ANF reconstruction, data analysis, and manuscript drafting. MO-L and JEb contributed to intra-cochlear potential measurements and manuscript revising. RW and FS contributed to FE model reconstruction and manuscript revising. FB contributed to µCT image acquisition and manuscript revising. WH contributed to study design and critical manuscript revising. The final manuscript has been approved by all authors.

### FUNDING

This project and the authors were supported by grants from the Alexander von Humboldt Foundation, the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 702030, and the German Research Foundation (DFG) under the D-A-CH programme (HE6713/2-1). Publication with Frontiers is financed within the Open Access Publishing Funding Programme by the DFG and the Technical University of Munich.

#### ACKNOWLEDGMENTS

The authors would like to thank Mr. Jalil Jalali from the Munich School of Bioengineering, TUM for his help in using Geomagic, and the test subjects for their participation and contributions to this work.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Bai, Encke, Obando-Leitón, Weiß, Schäfer, Eberharter, Böhnke and Hemmert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Improved in vitro Model of Cortical Tissue

#### Aaron Gilmour<sup>1,2</sup>, Laura Poole-Warren<sup>1</sup> and Rylie A. Green<sup>1,3</sup>\*

<sup>1</sup> Graduate School of Biomedical Engineering, University of New South Wales, Sydney, NSW, Australia, <sup>2</sup> Clem Jones Centre for Neurobiology and Stem Cell Research, Menzies Health Institute Queensland, Griffith University, Gold Coast, QLD, Australia, <sup>3</sup> Department of Bioengineering, Imperial College London, London, United Kingdom

Intracortical electrodes for brain–machine interfaces rely on intimate contact with tissues for recording signals and stimulating neurons. However, the long-term viability of intracortical electrodes in vivo is poor, with a major contributing factor being the development of a glial scar. In vivo approaches for evaluating responses to intracortical devices are resource intensive and complex, making statistically significant, high throughput data difficult to obtain. In vitro models provide an alternative to in vivo studies; however, existing approaches have limitations which restrict the translation of the cellular reactions to the implant scenario. Notably, there is no current robust model that includes astrocytes, microglia, oligodendrocytes and neurons, the four principal cell types critical to the health, function and wound responses of the central nervous system (CNS). In previous research a co-culture of primary mouse mature mixed glial cells and immature neural precursor cells was shown to mimic several key properties of the CNS response to implanted electrode materials. However, the method was not robust and took up to 63 days, significantly affecting reproducibility and widespread use for assessing brain-material interactions. In the current research a new co-culture approach has been developed and evaluated using immunocytochemistry and quantitative polymerase chain reaction (qPCR). The resulting method reduced the time in culture significantly and the culture model was shown to have a genetic signature similar to that of healthy adult mouse brain. This new robust CNS culture model has the potential to significantly improve the capacity to translate in vitro data to the in vivo responses.

Edited by:

#### Yuki Hayashida, Osaka University, Japan

Reviewed by:

Takashi D. Y. Kozai, University of Pittsburgh, United States Jeffrey R. Capadona, Case Western Reserve University, United States

#### \*Correspondence:

Rylie A. Green rylie.green@imperial.ac.uk

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 20 August 2019 Accepted: 02 December 2019 Published: 17 December 2019

#### Citation:

Gilmour A, Poole-Warren L and Green RA (2019) An Improved in vitro Model of Cortical Tissue. Front. Neurosci. 13:1349. doi: 10.3389/fnins.2019.01349

Keywords: brain machine interface, in vitro prediction, CNS, cell culture, neural interface response

# INTRODUCTION

Investigating the biocompatibility of brain interfacing devices using animal models is expensive, time consuming (Gilmour et al., 2016) and data yield from each animal can be limited by the tissue processing and histological methods used within a study (Woolley et al., 2011). However, existing in vitro models for investigating central nervous system (CNS)-device interactions are not a viable alternative, as they poorly represent the complex cell interactions within the CNS and provide little information on the expected in vivo response (Horvath et al., 2016; Belle et al., 2018). Despite this, cell culture is a powerful technique for high-throughput studies, enabling parallel assessment across a large number of variables (Astashkina et al., 2012; Zang et al., 2012). An ideal solution is a cell culture model with enough complexity to enable useful insight into implant performance, while not compromising on capacity to trial multiple variables.

For neural cell culture models to be mimetic of the CNS in health and disease, mimicking cell–cell interactions is essential. Interactions both within and between individual glial and neural cell types are critical for the development, function and dysfunction of the CNS (Jäkel and Dimou, 2017). The astrocyte–microglia interaction is the most notable cell–cell interaction and it is pivotal in development, normal function, and response to damage (Liddelow et al., 2017; Yates, 2017). Astrocytes and microglia perform multiple roles in CNS development, ongoing health, and degenerative disease (Burda and Sofroniew, 2014; Pekny and Pekna, 2014; Ferreira and Bernardino, 2015; Sofroniew, 2015; Ziebell et al., 2015; Burda et al., 2016; Liddelow and Barres, 2017). Importantly, the functions of these cells evolve during development, undergoing dynamic genotypic and phenotypic changes which are integral to the development of the CNS (see Reemst et al., 2016; Hasel et al., 2017 for in-depth reviews). Glial cells change roles from promoting development of neural networks and myelination, to maintaining the complex function of the adult CNS. In response to injury in the mature CNS, glial cells within the wound parenchyma transition to a reactive state (Silver and Miller, 2004; Anderson et al., 2014; Gilmour et al., 2016). In this reactive state mature glial cells produce an environment which does not support redevelopment of neural networks, inhibiting neuronal cell migration and axonal growth (Smith et al., 1990; Canning et al., 1996; Fawcett and Asher, 1999; Faulkner, 2004; Sofroniew, 2009; Cregg et al., 2014; Burda et al., 2016). In contrast, immature glial cells from fetal or neonatal origins lack the ability to undergo reactive gliosis-like reactions in vivo and in vitro (Schwartz et al., 1989; Wu and Schwartz, 1998).

A number of mixed glial and neuronal cultures have been developed in an attempt to incorporate complex cell behaviors into in vitro models (Potter and DeMarse, 2001; Polikov et al., 2006; Thomson et al., 2008; Nash et al., 2011b; Boomkamp et al., 2012; Sommakia et al., 2014). It is expected that this complexity introduces improved alignment with the in vivo CNS cell response. However, these culture models often have intricate, multistep methodologies (Polikov et al., 2009), are extremely sensitive to minor modifications and require additional stimulating factors to induce reactive gliosis, limiting their value as a high-throughput assessment tool (Gilmour et al., 2016). Current models have a second limitation whereby the apparent upregulation of glial fibrillary acidic protein (GFAP) and Iba1 in astrocytes and microglia respectively in response to insult does not impact on neural health and regrowth (Polikov et al., 2006; Sommakia et al., 2014). The maturity of glial cells and their relative ability to undergo reactive gliosis has implications for the development and use of complex culture models for modeling CNS and effects of injury. In brain injury and device interactions, scar tissue is formed with glial cells being the dominant component. These cells modulate neuron and oligodendrocyte function, survival, or dieback in the surrounding tissues (Sofroniew, 2009; Burda and Sofroniew, 2014; Burda et al., 2016). In rodents, astrocytes start to express mature genotypes and phenotypes after 3–4 weeks postnatal development (Yang et al., 2013a; Reemst et al., 2016; Hasel et al., 2017) which aligns with the end of the major period of astrogenesis. In contrast the relative maturity of the glial cell populations in prior cultures (Polikov et al., 2006; Sommakia et al., 2014) is equivalent to postnatal days 7–14 (Reemst et al., 2016), at which age rodents are still undergoing neurological development. 
To achieve adequate glial maturity in these cultures it is estimated that glia would need to be cultured for at least 35 days. It was therefore hypothesized that a more mature population of glial cells is required to enable a CNS culture model with capacity to respond appropriately to injury and implants. The objective of this research was to develop a simple, robust and validated model of the mature rodent CNS. Such a culture could be used for better understanding cell–cell interactions in the CNS, and for mechanistic investigations into CNS injury, repair, and interactions with neural devices.

Co-culture models have been developed to enable understanding and probing of specific glial–neural or glial–glial cell interactions (Banker and Cowan, 1977; Ishikawa et al., 1996; Plenz and Aertsen, 1996; Nakanishi et al., 1999; Flanagan et al., 2002; Faria et al., 2006; Cullen et al., 2007; Wanner et al., 2008; Shimizu et al., 2011; Bogdanowicz and Lu, 2013; van Duinen et al., 2015) of defined cell populations. Previous research (Gilmour, 2018) identified that mixed glial cells (MGCs) derived from neonatal mice and cultured for 21 days prior to co-culture generated a glial cell population which was capable of reactive gliosis. Co-culture can be approached by either combining cells in a single concurrent plating step or by staggering the plating to enable one population to develop, prior to addition of the second population. Previous attempts to combine glia and neurons have generally focused on step-wise combinations. One such approach has been the continuous culture of glial cells until they obtain maturity, followed by direct co-culture of neural progenitors (Gilmour, 2018). Despite showing that this culture method develops neural networks which respond to injury at the glial and neuronal level, there are a number of shortcomings limiting this method. First, to obtain mature neural networks a continuous culture timeline of ≥ 45 days was required. Second, reproducibility was poor: causes included failure to obtain time mated embryos at the correct time point (≈66% of failures), poor growth of MGCs after passage (≈15%) and, less commonly, contaminating cells overgrowing MGC cultures after passaging (≈10%), with an overall failure rate of ≈86%. As such, a more flexible and time efficient method is required to enable complex co-culture of MGCs in combination with neuroprogenitor cells. 
To address the long culture times and potential for mismatch in time mating, this study proposed the use of frozen mature glial populations that can be stored and reanimated to ensure flexibility and minimization of culture timeframes.

#### MATERIALS AND METHODS

All the chemicals and biological materials were obtained from Sigma-Aldrich (Australia) unless otherwise stated. MGC media consisted of 10% fetal calf serum in DMEM with L-glutamine. DMMC and co-cultures used three types of media previously described in Thomson et al. (2008), being plating media (PM), defined media with insulin (DfM + I) and defined media without insulin (DfM).

# Co-culture Methodologies

Co-cultures were formed through the combination of 30% MGC and 70% DMMC cells. Once in co-culture format they were fed three times per week with DfM + I for the first 12 days then transitioned to DfM thereafter. Co-cultures were grown on PLL coated glass for developing and assessing the baseline performance of the methods relative to both whole brain extract (for qPCR) and the DMMC cultures as developed by Sorensen et al. (2008) and Thomson et al. (2008).

# Primary Mixed Glia Culture (MGC)

All animal procedures were conducted in accordance with University of New South Wales animal ethics protocols (ACEC 13/44A). Postnatal 1–3 day old mouse pups were euthanized by exposure to excess gaseous isoflurane followed by decapitation. The isolation and culture of MGCs was performed as previously published in Goding et al. (2015), with the following modifications. Cultures were maintained until 80% confluence (approximately 7–10 days) in poly-L-lysine (134 µg mL<sup>−1</sup>) coated T75 tissue culture flasks. Once confluent, cultures were trypsinised and then frozen in DMEM + 10% FBS with the addition of 10% DMSO. Briefly, cultures were rinsed twice with PBS (without cations) then incubated with 3 mL 0.25% trypsin for 5 min. Trypsin was deactivated by the addition of DMEM + 10% FBS. The resulting cell suspension was centrifuged for 5 min at 290 g. Cell concentration was determined with a hemocytometer and diluted with DMEM + 10% FBS to achieve 2 × 10<sup>6</sup> cells mL<sup>−1</sup> in freezing media.

# Dissociated Mixed Myelinating Culture (DMMC)

Dissociated mixed myelinating cultures (DMMCs) were produced using the methods developed in Thomson et al. (2008) with minor modifications. Briefly, gestational day 13.5 pregnant mice were euthanized by an overdose of isoflurane followed by cervical dislocation. Embryos were extracted, and spinal cords were removed and stripped of meninges. Harvested spinal cords were dissociated manually, followed by 20 min in 0.25% trypsin EDTA with 0.1% w/v type 1 collagenase. Stop digestion mix was added [40 µg mL<sup>−1</sup> DNase, 250 µg mL<sup>−1</sup> trypsin inhibitor, 3 mg mL<sup>−1</sup> bovine serum albumin fraction V (BSA-V) dissolved in Leibovitz's L15 (Thermo Fisher Scientific, Australia)]. The cell suspension was then passed three times through a 21G needle followed by two times through a 23G needle. The cell suspension was diluted in PM and centrifuged at 290 g for 5 min. The cell pellet was resuspended in PM and cell counting was conducted with a hemocytometer. The DMMC cell suspension was then plated out at 1.5 × 10<sup>6</sup> cells cm<sup>−1</sup> onto PLL coated coverslips. After 2 h the culture media was topped up to 500 µL PM with 500 µL DfM + I. Alternatively the cell suspension was used in co-cultures as described below. Cultures were fed three times per week by replacing 50% of the media, using DfM + I for the first 12 days followed by DfM thereafter.

# Layered Co-culture

Time mating was undertaken to obtain E13.5 embryos for tissue harvesting for DMMC cultures. On the day a successful plug was noted, frozen MGC were thawed in a 37 °C water bath. Once thawed, the cell suspension was diluted with DMEM + 10% FBS then centrifuged at 290 g for 3 min. Revived cells were placed in a PLL coated T75 flask and cultured for 2 days in DMEM + 20% FBS then changed to 10% FBS. After 4 days MGC cultures were passaged and plated at 4 × 10<sup>4</sup> cells cm<sup>−2</sup> in 1 mL of DMEM + 10% FBS onto PLL coated glass coverslips. Once DMMC were harvested, all media was removed from MGC cultures and DMMC were plated on top in 500 µL of PM at 1.1 × 10<sup>5</sup> cells cm<sup>−1</sup>. Cultures were then fed as per the DMMC protocol above.

# Concurrent Co-culture

Mice were time mated and MGC cultures were revived as above; however, MGC were not thawed until day 9.5 of pregnancy (4 days prior to embryonic spinal cord harvest). After spinal cord tissue was harvested, dissociated and suspended at 4.4 × 10<sup>5</sup> cells mL<sup>−1</sup> in plating media, MGC cultures were passaged from the T75 flasks and resuspended to a concentration of 1.6 × 10<sup>5</sup> cells mL<sup>−1</sup>. The MGC and DMMC cell suspensions were mixed 1:1, resulting in a final concentration of 3 × 10<sup>5</sup> cells mL<sup>−1</sup>, and 500 µL of cell suspension was plated per coverslip. Cultures were maintained as described for DMMC above.
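The mixing arithmetic for the concurrent co-culture can be checked with a short calculation. The sketch below is illustrative only: the function name `mix_suspensions` is hypothetical, and the only inputs taken from the text are the two stock concentrations, the 1:1 mixing ratio and the 500 µL plated volume.

```python
def mix_suspensions(conc_a, conc_b, vol_a=1.0, vol_b=1.0):
    """Mix two cell suspensions.

    Returns (final concentration, fraction of all cells that came from A).
    Concentrations are in cells per mL, volumes in mL (or any equal ratio).
    """
    cells_a = conc_a * vol_a
    cells_b = conc_b * vol_b
    final_conc = (cells_a + cells_b) / (vol_a + vol_b)
    return final_conc, cells_a / (cells_a + cells_b)


MGC_CONC = 1.6e5         # cells per mL, from the text
DMMC_CONC = 4.4e5        # cells per mL, from the text
PLATED_VOLUME_ML = 0.5   # 500 µL per coverslip, from the text

# 1:1 mix by volume, as described for the concurrent co-culture
final_conc, mgc_fraction = mix_suspensions(MGC_CONC, DMMC_CONC)
cells_per_coverslip = final_conc * PLATED_VOLUME_ML

print(f"final concentration: {final_conc:.2e} cells/mL")
print(f"MGC fraction of plated cells: {mgc_fraction:.1%}")
print(f"cells per coverslip: {cells_per_coverslip:.2e}")
```

Under these inputs the mixture works out to 3 × 10<sup>5</sup> cells mL<sup>−1</sup>, with MGC making up roughly 27% of the plated cells, which is close to the nominal 30% MGC and 70% DMMC composition stated for the co-cultures.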

# Immunocytochemistry and Image Analysis

Cultures were fixed at 21, 28, and 35 days in 4% w/v formaldehyde and processed for microscopy. All primary and secondary antibodies were diluted in blocking buffer immediately prior to use. Secondary antibodies were raised in goat and conjugated to either DyLight<sup>®</sup> or Alexa Fluor<sup>®</sup> 405, 488, 555, and 647 nm fluorophores diluted at 1:200. Primary antibodies were against GFAP (Abcam; ab134436), Iba1 (Wako; 019-19741, RRID:AB\_839504), 200 kDa heavy chain neurofilament (H-NF) (Abcam; ab7795, RRID:AB\_306084) and proteolipid protein (PLP/DM20) from a hybridoma (RRID:AB\_2341144) (Jung et al., 1996).

All images were acquired using a Zeiss 780 laser scanning microscope (LSM) with a Plan-Apochromat 20x/0.8 M27 objective. Non-overlapping regions were captured as z-stacks, with 5 areas per sample. Images were post-processed with ImageJ software (ImageJ 1.50e, National Institutes of Health, United States) implemented on Java 1.8.0\_11 (64-bit). Individual channels were deconvolved with 15 iterations of the Richardson-Lucy algorithm implemented via the "DeconvolutionLab" plugin (Soltys et al., 2001) with a theoretical point spread function (PSF) and minimal intensity background subtraction. A theoretical PSF was generated with the "Diffraction PSF 3D" plugin for ImageJ to match the dimensions of the acquired images.
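To illustrate what an iterative Richardson-Lucy deconvolution does, the following minimal sketch applies the standard multiplicative update to a one-dimensional signal. It is not the ImageJ plugin used in the study, which operates on 3D stacks with a theoretical PSF; the three-tap PSF, the test signal and the helper names `convolve_same` and `richardson_lucy` are all assumptions made for illustration.

```python
def convolve_same(signal, kernel):
    """Zero-padded 'same'-size convolution of two 1D lists (odd-length kernel)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += signal[j] * weight
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations=15, eps=1e-12):
    """Richardson-Lucy update: estimate *= ((observed / (estimate * psf)) * mirrored psf)."""
    psf_mirror = psf[::-1]
    # Start from a flat, strictly positive estimate with the same total intensity
    estimate = [sum(observed) / len(observed)] * len(observed)
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / (b + eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_mirror)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


PSF = [0.25, 0.5, 0.25]   # hypothetical normalized blur kernel
truth = [0.0] * 21
truth[10] = 100.0         # a single bright point, hypothetical test object
observed = convolve_same(truth, PSF)
restored = richardson_lucy(observed, PSF, iterations=15)

print(f"blurred peak:  {observed[10]:.1f}")
print(f"restored peak: {restored[10]:.1f}")
```

In this noise-free toy case the restored peak is sharper than the blurred one while the total intensity is preserved. Real images contain noise, so the number of iterations (15 in the text) acts as a practical limit on noise amplification.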

The H-NF and PLP/DM20 channels were processed for colocalization to assess the level of interaction under different culture conditions. Colocalization was performed with the Coloc 2 plugin using the default settings. Threshold values generated from Coloc 2 were used as thresholds for the binary conversion of Z-stacks. Z-stacks were converted into maximum intensity projections and total coverage of each channel was expressed as a fraction of the total area in µm<sup>2</sup>.
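The coverage and colocalization fractions described above can be sketched on toy data as follows. This is a simplified illustration rather than the Coloc 2 workflow: the pixel intensities, the threshold of 100 and all function names are hypothetical, and the thresholds in the study came from Coloc 2 rather than being fixed by hand.

```python
def binarize(stack, threshold):
    """Convert a z-stack (list of 2D slices) to 0/1 using a fixed threshold."""
    return [[[1 if px > threshold else 0 for px in row] for row in sl] for sl in stack]


def max_intensity_projection(stack):
    """Collapse a z-stack to one 2D image by taking the pixel-wise maximum."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(sl[r][c] for sl in stack) for c in range(cols)] for r in range(rows)]


def coverage_fraction(mask):
    """Fraction of the image area that is positive in a 2D binary mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def colocalized_fraction(mask_a, mask_b):
    """Fraction of positive pixels in mask_a that are also positive in mask_b."""
    a_pixels = sum(sum(row) for row in mask_a)
    both = sum(a and b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    return both / a_pixels if a_pixels else 0.0


# Hypothetical two-slice, 2 x 3 pixel stacks for each channel
hnf_stack = [[[10, 200, 0], [0, 180, 0]], [[0, 0, 0], [150, 0, 0]]]
plp_stack = [[[0, 220, 0], [0, 0, 0]], [[0, 0, 90], [160, 0, 0]]]

hnf_mask = max_intensity_projection(binarize(hnf_stack, 100))
plp_mask = max_intensity_projection(binarize(plp_stack, 100))

axon_coverage = coverage_fraction(hnf_mask)
myelin_coverage = coverage_fraction(plp_mask)
myelinated_axon_fraction = colocalized_fraction(hnf_mask, plp_mask)
axon_associated_myelin = colocalized_fraction(plp_mask, hnf_mask)
```

The two colocalization calls differ only in which channel is the denominator: the first gives the share of axonal area that carries myelin, and the second gives the share of myelin that sits on axons.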

#### Statistical Analysis

Statistical analysis was performed in GraphPad Prism 7.03 (GraphPad Software, La Jolla, CA, United States). All data sets were tested for outliers using the ROUT method with Q = 0.1 (99% confidence that a data point is an outlier). Data were compared by one-way ANOVA followed by Tukey's multiple comparisons test, and p < 0.05 was considered significant.
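For readers unfamiliar with the test, the F statistic at the core of a one-way ANOVA can be computed by hand as sketched below. The group values are hypothetical (three groups with n = 3, mirroring the design reported in the figures), and the sketch covers only the ANOVA step; the ROUT outlier test and Tukey's post hoc comparisons performed in Prism are not reproduced here.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means vs the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: values vs their own group mean
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical coverage fractions, n = 3 per culture type
dmmc = [0.20, 0.22, 0.18]
layered = [0.30, 0.35, 0.25]
concurrent = [0.40, 0.42, 0.38]

f_stat, df_between, df_within = one_way_anova_f([dmmc, layered, concurrent])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The p value is then obtained by comparing F against the F distribution with the returned degrees of freedom, which a statistics package would normally handle.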

### Quantitative PCR

Cultures for messenger ribonucleic acid (mRNA) extraction were rinsed once with ice cold DPBS and processed with the ReliaPrep™ RNA cell miniprep system (Promega, Australia) following the manufacturer's instructions. The final RNA extract was eluted into 30 µL of RNase free water. RNA was stored frozen at −80 °C until conversion. 5 µL of RNA from each experimental triplicate was pooled, then 10 µL of the pooled RNA for each condition was converted into first strand complementary deoxyribonucleic acid (cDNA) using a High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Thermo Fisher, Australia) following the manufacturer's instructions, using a Bio-Rad C1000 thermal cycler (Bio-Rad, Australia). The resulting cDNA was diluted to a total final volume of 100 µL and stored at −80 °C prior to qPCR. Primer pairs (see **Table 1**) were designed with the assistance of Primer-Blast software (Ye et al., 2012). All primer sequences were then cross checked with the Beacon Designer Free online tool to identify possible dimers and hairpins.

Quantitative PCR (qPCR) was run on a CFX384 Touch™ real-time PCR detection system (Bio-Rad, Australia) and output was analyzed with the Bio-Rad CFX Maestro software package (Bio-Rad). Power SYBR™ Green PCR master mix (Applied Biosystems, Thermo Fisher, Australia) was used, following the manufacturer's instructions. Reaction volume was set to 10 µL with 2 µL of template used per well and final primer concentrations of 500 nM. Reaction cycles were repeated 40 times followed by a melt curve as in **Table 2**. Each template was run in triplicate for all genes assessed. Relative fold change calculations and statistics were calculated with two reference genes using the inbuilt analysis software package.
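A relative fold change normalized to two reference genes can be sketched with the widely used 2<sup>−ΔΔCt</sup> calculation. This is an assumption about the general approach, not a description of the vendor software's exact algorithm: the sketch assumes 100% amplification efficiency, and all Ct values and function names are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts_by_gene):
    """Mean target Ct minus the mean Ct across reference genes.

    Averaging Ct values across reference genes is equivalent to taking the
    geometric mean of their relative quantities.
    """
    reference_ct = mean(mean(cts) for cts in reference_cts_by_gene)
    return mean(target_cts) - reference_ct


def fold_change(sample_dct, calibrator_dct):
    """2^-(ddCt), assuming the template doubles every cycle (100% efficiency)."""
    return 2.0 ** -(sample_dct - calibrator_dct)


# Hypothetical technical triplicates (Ct values)
culture_target = [22.1, 22.0, 21.9]
culture_refs = [[18.0, 18.1, 17.9], [20.0, 19.9, 20.1]]
brain_target = [24.0, 24.1, 23.9]
brain_refs = [[18.5, 18.4, 18.6], [19.5, 19.6, 19.4]]

culture_dct = delta_ct(culture_target, culture_refs)
brain_dct = delta_ct(brain_target, brain_refs)
relative_expression = fold_change(culture_dct, brain_dct)

print(f"fold change vs whole brain: {relative_expression:.2f}")
```

With these made-up values the culture shows a fourfold higher expression than the whole brain calibrator, the same form in which the results below are reported.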

# RESULTS

# Morphological Properties and Interactions

Both the layered and concurrent co-culture methods resulted in dense myelinated neural networks which grew for 35 days (a targeted approach to ensure that the original predicted 21 day time period was sufficient for neural network maturation). Both co-culture methods resulted in significantly increased reliability and repeatability over the original continuous co-culture method, shown in **Figure 1**. The continuous co-culture had an 86% failure rate compared to 100% success rate of both the concurrent and layered methods investigated here. Additionally, accumulations of H-NF were noted in the continuous co-cultures, which indicated neural degeneration (see inset **Figure 1A**); this was not observed in either the layered or concurrent cultures at 35 days.

TABLE 1 | Primer pairs for qPCR.

TABLE 2 | qPCR reaction cycle settings.

Comparison of the two co-culture methods with the DMMC culture at the phenotypic and genotypic levels indicates that the concurrent co-culture method resulted in greater numbers of myelinated neural axonal processes, and that these cultures also expressed significantly higher levels of genes associated with phosphorylated neurofilament and myelin production. **Figure 2** shows an overview heat map comparing fold difference in gene expression at each time point and condition. Representative composite images of DMMC and co-cultures at 35 days are shown in **Figures 3A–C**.

The complexity imparted to the DMMC and co-cultures by including all the major cell types of the CNS resulted in a morphologically diverse and intertwined distribution of GFAP positive astrocytes. **Figures 3A–C** show the GFAP morphology in the astrocytes at 35 days in culture. The morphologies present in each culture type were variable across each sample, with fibrous, stellate, and protoplasmic being the most dominant morphologies. When assessed in isolation the GFAP morphologies appeared random; however, the three dominant morphologies occupied different domains when the astrocyte proximity with the other cell types was taken into consideration. Protoplasmic astrocytes were predominantly found at the interface of the culture and growth surface. Stellate astrocytes were associated with multiple nerve fibers, and the fibrous astrocytes were aligned with bundles of parallel axons. These cell–cell related morphologies are representative of in vivo interactions previously described for the different cell types (Oberheim et al., 2006, 2012; Wang and Bordey, 2008; Sofroniew and Vinters, 2010).

The majority of microglia present in all cultures were in ramified/resting states as shown in **Figures 3D–F**. The staining intensity for Iba1 in these control cultures on glass is relatively weak, which was expected as the microglia in culture conditions without insult (inflammatory or wound conditions) should not be activated. A notable observation is that the ramified branches of the microglia in the DMMC culture had a fluorescent intensity similar to that seen in the cytoplasm. Conversely in co-culture the ramified branches tended to be of a lower intensity, which suggests there are slight differences in their activation state. **Figure 4** shows the interaction between the three glial cells present in the concurrent co-cultures. It reveals a range of potential astrocyte, microglia, and oligodendrocyte interactions that are reflective of those observed in vivo (Domingues et al., 2016; Kiray et al., 2016). Note that staining of neural axons was excluded for clarity of the glial cell morphologies.

Representative images of the density and organization of axons (H-NF) and myelin (PLP/DM20) are shown in **Figures 3G–L**. The addition of mature MGC to the co-cultures did not alter the organization of the H-NF positive axons at 35 days in culture; however, both co-cultures had marginally increased axonal coverage compared to the DMMC alone, as summarized in **Figure 5**. This was significant for the layered co-culture (p < 0.05) when compared with the DMMC culture. However, the contiguity of the staining was more homogeneous along the lengths of the axons in the concurrent co-culture. This uniformity of axonal staining in the concurrent co-culture correlated with more consistent myelination along the lengths of the axons as shown in **Figure 3L**. The myelin coverage in the concurrent co-culture was more consistent when compared with DMMC and layered methods, which had greater variance in coverage, as shown in **Figure 6**.

The colocalization of H-NF and PLP/DM20 staining yielded two important features which are directly relevant to the level of maturation and health of the cultures. The first is the fraction of myelinated axons, as described in **Figure 7**. The concurrent co-culture method consistently generated greater levels of axonal myelination when compared to the DMMC (p < 0.0001) and layered co-culture (p < 0.05). The second is the fraction of myelin produced which is associated with the axons, as described in **Figure 8**, where lower values indicate the oligodendrocytes are less mature and are likely to be in a pre-myelinating state. Conversely, higher values indicate more mature oligodendrocytes, and indirectly more mature axons. The concurrent co-culture had significantly higher levels of myelinated axons compared to the DMMC (p < 0.01). The difference in myelination between concurrent and layered methods was not significant, although the layered co-culture exhibited greater variance between replicates.

FIGURE 4 | Maximum intensity projection of a 100x magnification tile scan from a concurrent co-culture demonstrating the potential interaction between microglia (red), myelin (orange), and astrocytes (green). Very fine ramified microglial processes can be seen in the upper left (arrow). Microglia can be observed in close apposition with myelin (∗) and astrocytes (∧) (Scale bar = 50 µm).

# Gene Expression – Comparison With in vivo CNS Tissue

To enable comparisons between the cultures and the mature in vivo mouse CNS, qPCR was performed on mRNA extracted at 21, 28, and 35 days in co-culture and from samples of whole brain. Where possible the qPCR primers were designed for the same targets that were used for immunofluorescence.

FIGURE 6 | Assessment of average myelin coverage as fraction of total image area inferred from PLP/DM20 positive staining. Data acquired at 35 days in co-culture (n = 3).

FIGURE 7 | Fraction of phosphorylated neurofilament which is colocalized with PLP/DM20, used to indicate the proportion of axonal area with a myelin sheath. Data acquired at 35 days in co-culture (n = 3, ∗p < 0.05, ∗∗∗∗p < 0.0001).

**Figure 9** compares the individual cultures to the in vivo mRNA expression of GFAP. Both co-cultures had at least fourfold more GFAP present than the whole brain control at all assessment time points. The DMMC culture had at least 2.5-fold higher expression than the brain extract. This indicates radial glia and/or immature astrocytes were possibly present in both DMMC and co-cultures. Both co-cultures had significantly more GFAP mRNA at 21, 28, and 35 days compared to the DMMC alone, with differences being greater than twofold at 21 days. At subsequent time points the GFAP expression difference between the DMMC and both co-cultures decreased to 1.5-fold (p < 0.001). Despite this, the elevated levels of GFAP gene expression did not appear to impact on the levels of H-NF production or myelination. This supports the premise that the increased GFAP expression is in part due to the presence of radial glia or immature astrocytes rather than reactive astrocytes.

mouse brain extract. Statistical comparisons relative to reference brain extract shown on graph only (n = 3, ∗p < 0.05, ∗∗∗p < 0.001).

**Figure 10** shows that the mRNA expression of Iba1 (microglial inflammatory factor) in the three cultures was at least threefold greater than whole brain extract. However, in contrast to GFAP, the concurrent co-cultures tended to have lower levels of expression when compared to DMMC and layered cultures. The differences between culture types were less than onefold, with the 28 and 35 day concurrent co-cultures being significantly lower than the respective DMMC cultures (p < 0.05). The elevated expression is possibly linked to the developmental role of microglia in regulating synapse formation and removal. The discrete differences in expression are in agreement with the small differences in Iba1 staining intensity.

Cdk5 is expressed in multiple CNS cells including neurons and regulates a diverse range of cellular events. Its expression is required for activation of Cdk5r1 to induce phosphorylation of heavy chain neurofilament expressed in the axonal segment of mature neurons. **Figure 11A** shows the expression of Cdk5 relative to whole brain mRNA expression. All cultures at all assessment time points are significantly different to the whole brain expression; however, the relative differences are small (<0.4-fold). Conversely, the expression of Cdk5r1 as shown in **Figure 11B** is at least 4.5-fold less (p < 0.001) in DMMC cultures when compared to whole brain extract, whereas layered co-cultures are at least fourfold less (p < 0.001) and the concurrent co-cultures are 3 to 3.3-fold less (p < 0.001).

The DMMC culture had the lowest expression of both Cdk5 and Cdk5r1 when compared to the co-culture techniques. No differences in Cdk5 expression were found between the layered and concurrent co-culture methods. For Cdk5r1, the concurrent co-culture expression was at least onefold greater than DMMC cultures (p < 0.05 at 21 days and p < 0.001 at 28 and 35 days). Most notable though was that the concurrent co-cultures showed consistent expression of Cdk5r1 over all time points, whereas both DMMC and layered cultures showed signs of downregulation at 35 days. This trend suggests the concurrent co-culture produced increased phosphorylated neurofilament formation when compared to the DMMC and layered cultures.

The production and phosphorylation of heavy chain neurofilament resulting in mature axon formation is indirectly linked to oligodendrocyte maturation and myelination (Jakovcevski et al., 2007; Simons and Nave, 2016). **Figure 12** shows the expression of PLP mRNA in the three culture types relative to whole brain mRNA expression. Importantly, at all assessment time points the concurrent co-culture had similar levels of PLP expression when compared with the whole brain extract. The concurrent co-culture was shown to have significantly greater PLP expression when compared to both DMMC (p < 0.001) and layered (p < 0.05) cultures. The mRNA expression of Cdk5r1 and PLP was found to support the morphological data in terms of continuity of H-NF staining in axons and the level of myelination. These results suggest that the concurrent co-culture produced a more consistent culture across the culture period and developed a mature myelinating neural network at an earlier time point. One concern with the PLP expression in the DMMC culture and layered co-culture is that there was a measurable downregulation between 28 and 35 days, suggesting possible degeneration.

### DISCUSSION

The two co-culture approaches using frozen MGC cultures were proposed to reduce the total culture time and improve the reliability of the co-culture system for modeling the CNS. Relative to the previous continuous co-culture method, which required 45 days to develop mature myelinated neural networks, the concurrent and layered methods required significantly less time, 25 and 35 days respectively. Compared to the DMMC the concurrent co-culture required two additional steps and four more days to develop. In addition, as MGC were only revived once a mouse was successfully time mated, the modified co-culture methods had a 100% success rate, significantly reducing animal breeding costs. Both co-culture approaches resulted in dense networks of myelinated axons with closely associated astrocytes and microglia. The freeze-thaw process on the MGC had no identifiable impact on the subsequent co-cultures. Most notably the shorter recovery time for the MGC in the concurrent co-culture approach was associated with a greater amount of myelinated neural networks at 35 days when compared with the layered and DMMC cultures. Although image analysis of the cultures revealed little difference between the co-cultures and the original DMMC culture with respect to total myelin and H-NF coverage, the concurrent co-culture resulted in increased myelination of axons. Assessment of GFAP did not reveal any notable differences between the culture types. The morphology of the Iba1 stained microglia indicated subtle differences between the DMMC and co-cultures. The microglia in both co-cultures appeared more ramified, thus suggesting a greater level of microglia maturity. This supports the hypothesis that the combination of MGC and DMMC would result in a more mature culture representative of normal CNS tissue in vivo.

The relative maturity of astrocytes plays a pivotal role in both neural network development and their ability to undergo reactive astrogliosis (Smith et al., 1990). In vivo the differentiation and maturation of astrocytes occurs via reciprocal maturation signals between astrocytes and neurons (Hasel et al., 2017). The time for which astrocytes are cultured prior to interaction with neurons and immature oligodendrocytes, impacts on their ability to myelinate axons (Ishikawa et al., 1996). Additionally, astrocyte maturity has been shown to directly impact oligodendrocyte differentiation (Ishikawa et al., 1996; Nash, 2010; Nash et al., 2011a), with increased time in isolated culture resulting in inhibition of myelination, as a consequence of absent cues from the developing neurons. Although there are no apparent differences in the GFAP morphologies present between the culture types, there are significant differences at the mRNA level. At 21 days both co-cultures had greater than twofold more GFAP mRNA relative to the DMMC culture alone. This difference decreased to 1.5-fold at 35 days. Elevated GFAP is classically associated with reactive gliosis associated with neurotrauma, diseases, or neurodegeneration (Ridet et al., 1997; Silver and Miller, 2004; Middeldorp and Hol, 2011; Gao et al., 2013; Brenner, 2014; Burda and Sofroniew, 2014; Cregg et al., 2014; Pekny et al., 2014; Liddelow and Barres, 2017). However, despite the increased mRNA expression of GFAP and its changes over time in culture, the increased levels had no measurable impact on the processes of axonal growth and phosphorylation of H-NF and subsequent myelination in the co-cultures.

In light of the apparent lack of impact of the elevated GFAP on neural network development, there are a number of possible explanations for the elevated GFAP expression compared to the in vivo tissues. Firstly, the site of mRNA extraction from the CNS carries potential variability. In vivo there is regional heterogeneity in GFAP positive astrocytes (Schitine et al., 2015) which results in differential expression levels of GFAP. The in vivo tissue collection site relative to the in vitro cell population could be inherently different. Secondly, the astrocytes in the culture are likely in a mild inflammatory state resulting in increased GFAP expression (Liddelow and Barres, 2017); this is a consequence of being grown on rigid substrates such as glass and tissue culture plastic (Wilson et al., 2016). Alternatively, it is possible that this difference is an additive result of the two component cultures, DMMC and MGC, contributing to the mRNA expression, which is partly supported by the relative increase in GFAP expression of the co-cultures over the DMMC alone. Further to this, at 7 days in culture the DMMC likely consists of GFAP positive radial glia which continue to divide and differentiate into mature astrocytes (McDermott et al., 2005) and non-astrocytic cells. In vivo, radial glia become prevalent in the mouse spinal cord tissue around E9.5 days (Hall and Miller, 2012) and undergo differentiation into immature astrocytes between E18 and P14 days of age (in rodents) (Reemst et al., 2016). This timeline correlates to a peak differentiation of the radial glia into astrocytes and other cell types around 10–14 days in culture from the DMMC population. Further to this, in vivo data from Riol et al. (1992) described initial increases in GFAP mRNA levels from P0 to P20 days followed by declining levels out to P60 days. Both co-cultures appeared to follow this trend after 21 days and the DMMC after 28 days. This suggests that the co-cultures develop at a faster rate compared to the DMMC. However, further research is required to map this change over the entire culture period to determine the exact difference in development time between the culture types, and how this relates to in vivo CNS development.

In conjunction with the elevated mRNA levels of GFAP, Iba1 was also at least threefold higher compared to whole brain mRNA in all culture types. Although only minor differences in expression were found between the culture types, the concurrent co-culture exhibited the lowest level of relative Iba1 expression. This elevated mRNA expression compared to whole brain was contrasted by the dominant ramified morphologies present in the co-cultures, which indicate a healthy, mature resting state (Lively and Schlichter, 2013; Ferreira and Bernardino, 2015). The increased Iba1 mRNA expression in the cultures relative to the adult mouse brain is potentially associated with the developmental roles of microglia in regulating synapse formation via pruning of unnecessary connections (Chaboub and Deneen, 2013; Tay et al., 2017). However, continuous co-cultures grown on different materials, described in Gilmour (2018), indicated that the microglia are capable of maintaining resting phenotypes on control materials or taking on activated phenotypes in response to test materials, suggesting that the elevated mRNA levels might not be due to immature microglia. It is also possible that the elevated mRNA is an artifact of the 2D culture format, combined with the physiologically irrelevant volume of media required to maintain the metabolic requirements of the cultures. Media volume and culture format have previously been shown to have significant effects on osteocytes (Yoshimura et al., 2017) and hepatocytes (Haque et al., 2016).

The phosphorylation of neurofilament is controlled through Cdk5 and the neuron specific activator Cdk5r1 (Wang et al., 2012). At 21 days in culture, the concurrent co-culture expressed significantly more Cdk5 compared to the other cultures, but this difference decreased at 28 and 35 days. As Cdk5 is associated with other processes and cell types within the developing and mature CNS (Zhu et al., 2011), the expression of Cdk5r1 combined with Cdk5 is more relevant. In vitro Cdk5r1 expression was significantly higher in the concurrent co-cultures at all time points, except 21 days when compared to the layered co-culture. This increased co-expression of Cdk5/Cdk5r1 did not result in a greater number of axons, but the axonal expression of the phosphorylated neurofilament was more contiguous. This implies that the axonal processes in concurrent co-cultures are more stable and more resistant to degeneration (Sun et al., 1996; Zhu et al., 2011). The increased stability of the neural processes could be indirectly linked to the lower levels of Iba1 in the concurrent co-cultures as there is less phagocytosis of degraded axons (Ekdahl, 2012).

Comparing the production of phosphorylated neurofilament in the co-cultures revealed that Cdk5r1 expression was at least threefold less in concurrent co-cultures and fourfold less in both DMMC and layered cultures compared to whole brain extract. The difference between in vivo and in vitro expression could be the result of the 2D nature of the culture environment (Zare-Mehrjardi et al., 2011). The 2D environment limits the total number of axons and axon length, an observation similar to that previously made by Sun et al. (2016) in reference to the differences between 2D and 3D neural cell cultures. Although there is less Cdk5r1 in vitro than in vivo, there are similar levels of Cdk5. This is likely due to a secondary role of Cdk5 in modulating OPC differentiation into oligodendrocytes (Miyamoto et al., 2007). The concurrent co-cultures expressed similar amounts of PLP mRNA compared to adult brain extract, which correlates with the expression of Cdk5 for all culture types and time points. It has been proposed that Cdk5 interacts with OPCs promoting differentiation, although via different pathways to neurofilament phosphorylation (Miyamoto et al., 2007), but is facilitated as a secondary effect of this interaction (Yang et al., 2013b; Luo et al., 2016). Taken together, the relative expression of Cdk5 and PLP is likely linked to the differentiation of OPCs into mature oligodendrocytes (Miyamoto et al., 2007), as all culture methods resulted in similar amounts of total myelin. However, the concurrent co-culture resulted in a higher level of myelin associated with axons and subsequently more myelinated axons. The process of axon myelination is complex and only partially understood, but is thought to be governed first by intrinsic actions followed by adaptive changes (Bechler et al., 2018). Oligodendrocytes have been shown to intrinsically wrap axons and axon-like structures (Rosenberg et al., 2008; Tuck et al., 2016); however, this initial myelination is transient unless stabilized through adaptive changes. The adaptive stabilization process is hypothesized to only occur based on interactive signals from active mature axons (Almeida, 2018). This might indicate that the combined co-culture has more mature neurons resulting in stabilized myelin sheaths compared with the layered approach.

Contrary to the expected relationship between GFAP, Cdk5/Cdk5r1 and PLP expression, the concurrent co-culture expressed the highest levels of GFAP at 21 and 28 days in culture. These time points were also associated with the highest level of axonal myelination. It was anticipated that the higher levels of GFAP expression would be associated with lower production of H-NF and PLP. In vivo H-NF is primarily found in its phosphorylated form in mature axons within the adult CNS (Wang et al., 2012) and is sparse in the developing and immature CNS (Haque et al., 2016). At present there is no known explanation for this relationship.

The model presented here does not include the blood–brain barrier (BBB) or peripheral immune cells, which are critical components of the in vivo response to intracortical implants and traumatic CNS injury (Polikov et al., 2005; Groothuis et al., 2014). However, the objective of this work was to establish a robust, rapidly maturing co-culture of the CNS which has the potential to replicate some of the hallmarks of CNS injury. Although BBB disruption is one of the key attributes of traumatic CNS injury, recent literature indicates the interplay of the peripheral immune system has greater impacts in wound progression and secondary degeneration (Evans et al., 2014; Ertürk et al., 2016; Makinde et al., 2017; Abe et al., 2018). Future studies could expand on this model through the inclusion of peripheral immune cells or immune cell conditioned media (Haan et al., 2015) at different developmental or post-insult time points to evaluate the mechanisms of how the peripheral immune system alters CNS behavior in development and after insult.

Within the limitations of a 2D model to represent the 3D in vivo CNS, co-culturing mature MGC and DMMC populations provides a promising platform for modeling multicellular behaviors and responses to exogenous stimuli. The inclusion of a more mature glial cell population enables the culture to react in a more in vivo mimetic way. This was demonstrated in our previous research, whereby the inclusion of mature astrocytes dramatically altered the response of neural cell development and oligodendrocyte differentiation in response to different materials. The concurrent co-culture method provides a robust model for use in wound healing studies and biomaterial assessment, which are often conducted on less relevant culture systems. The combined co-culture improves on existing models by enabling the formation of mature neural networks within 25 days, compared to alternative methods which take >5 weeks to reach maturity. In addition, the model does not require exogenous ECM coating of growth surfaces for cell attachment; this is an advantage because ECM type can affect neural progenitor differentiation and cell migration, thus impacting the overall cell behavior (Ma et al., 2008). The co-culture is completely serum free after 12 days in culture. The serum free nature enables evaluation of the cultures at the proteomic level without the confound of animal sera. Lastly, approximately 150 cultures can be obtained from 2 neonatal and 6 E13.5 embryonic mice in a 24 well format.

Future work will determine to what extent the combined co-culture model is able to replicate cell behaviors relevant to and consistent with in vivo CNS injury. To achieve this, it is necessary to analyze the expression of pro- and anti-inflammatory cytokines and chemokines present within the cultures, relative to the native CNS, in conjunction with genetic and morphological analysis.

#### CONCLUSION

The modified co-cultures both substantially increased the reliability and repeatability of the co-culture method. When the co-cultures were compared at the genotypic and phenotypic levels to the DMMC culture method, both methods resulted in improved and more rapid myelinated neural network development. The concurrent co-culture, where MGCs were plated at the same time as DMMCs, performed the most consistently over all experimental repeats with reference to axonal coverage and myelination. Although both co-cultures had elevated GFAP and Iba1 mRNA expression at all time points relative to the DMMC, this did not impact on the neural network development.

Comparing the co-cultures to whole brain extract, the layered co-culture expressed significantly decreased levels of myelin and Cdk5r1, resulting in lower neurofilament phosphorylation. The concurrent co-culture, on the other hand, had significantly increased production of myelin, similar to in vivo levels. It had lower levels of neurofilament phosphorylation relative to the whole brain control, although this was expected due to the spatial and ECM limitations of a 2D model. The concurrent co-culture showed consistent levels of PLP, Cdk5 and Cdk5r1, indicating that 21 days was sufficient to be considered a mature 2D in vitro model of the CNS. The concurrent co-culture method may provide a viable in vitro pre-clinical tool for assessing CNS cell responses, as it mimics a more comprehensive number of properties of the mature healthy CNS than existing in vitro models. Future work will characterize the concurrent co-culture response to physical injury and control biomaterials in order to assess its use as a tool for high-throughput pre-clinical testing of neural interfacing biomaterials.

#### DATA AVAILABILITY STATEMENT

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

#### ETHICS STATEMENT

The animal study was reviewed and approved by the University of New South Wales Animal Care and Ethics Committee.

#### AUTHOR CONTRIBUTIONS

The research studies presented herein were the work of AG during his doctoral studies, supervised by LP-W and RG. The manuscript preparation was undertaken by AG in consultation and with direct contribution from LP-W and RG.

#### REFERENCES

Bogdanowicz, D. R., and Lu, H. H. (2013). Studying cell-cell communication in co-culture. Biotechnol. J. 8, 395–396. doi: 10.1002/biot.201300054


intracortical brain implant material reactions. Biomaterials 91, 23–43. doi: 10.1016/j.biomaterials.2016.03.011




into neural cells on 3D poly (D, L-Lactic Acid) scaffolds versus 2D cultures. Int. J. Artif. Organs. 34, 1012–1023. doi: 10.5301/ijao.5000002


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Gilmour, Poole-Warren and Green. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Restoring Somatosensation: Advantages and Current Limitations of Targeting the Brainstem Dorsal Column Nuclei Complex

Alastair J. Loutit and Jason R. Potas\*

School of Medical Sciences, UNSW Sydney, Sydney, NSW, Australia

Current neural prostheses can restore limb movement to tetraplegic patients by translating brain signals coding movements to control a variety of actuators. Fast and accurate somatosensory feedback is essential for normal movement, particularly dexterous tasks, but is currently lacking in motor neural prostheses. Attempts to restore somatosensory feedback have largely focused on cortical stimulation which, thus far, has succeeded in eliciting only minimal naturalistic sensations. Yet, a question that deserves more attention is whether the cortex is the best place to activate the central nervous system to restore somatosensation. Here, we propose that the brainstem dorsal column nuclei are an ideal alternative target to restore somatosensation. We review some of the recent literature investigating the dorsal column nuclei functional organization and neurophysiology and highlight some of the advantages and limitations of the dorsal column nuclei as a future neural prosthetic target. Recent evidence supports the dorsal column nuclei as a potential neural prosthetic target, but also identifies several gaps in our knowledge as well as potential limitations which need to be addressed before such a goal can become reality.

#### Edited by:

Alejandro Barriga-Rivera, The University of Sydney, Australia

#### Reviewed by:

Solaiman Shokur, Federal Institute of Technology in Lausanne, Switzerland

Aneesha Krithika Suresh, University of Chicago, United States

> \*Correspondence: Jason R. Potas j.potas@unsw.edu.au

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 22 November 2019 Accepted: 10 February 2020 Published: 28 February 2020

#### Citation:

Loutit AJ and Potas JR (2020) Restoring Somatosensation: Advantages and Current Limitations of Targeting the Brainstem Dorsal Column Nuclei Complex. Front. Neurosci. 14:156. doi: 10.3389/fnins.2020.00156

Keywords: neural coding, brain-machine interface, neuroprosthesis, cuneate, gracile, tactile, proprioception, sensory feedback

# INTRODUCTION

A current challenge in neural prosthetic development is how to artificially activate the central nervous system to restore touch and proprioceptive sensation to tetraplegic patients (Lebedev and Nicolelis, 2017). Developments in the neural prosthetics field have raised the possibility of restoring limb movement, either by functional electrical stimulation of a tetraplegic patient's own muscles (Ajiboye et al., 2017) or by facilitating control of a robotic limb (Collinger et al., 2013). In one paradigm, brain signals coding a patient's intended movement can be acquired and decoded to control a robotic limb via thought alone. Improvements in decoding algorithms and anthropomorphic robotic limb design have enabled complex, thought-controlled movements, but realistic limb movement will require restored somatosensation to facilitate closed-loop feedback control (O'Doherty et al., 2011; Delhaye et al., 2016).

Recently, human intracortical microstimulation (ICMS) has been successful in eliciting minimal naturalistic tactile and proprioceptive sensations (Flesher et al., 2016; Salas et al., 2018). In one subject, some ICMS protocols targeted in somatosensory cortex were perceived as natural sensations such as squeezing, taps, vibration, and directional arm movement (Salas et al., 2018), whereas in another subject they were perceived as paraesthesia, buzzing, or almost natural (Flesher et al., 2016). Studies in monkeys have shown that different cortical stimulation parameters can elicit perception of variations in pressure, stimulus location, and virtual textures (Tabot et al., 2013, 2015; Kim et al., 2015), and can be used to provide artificial somatosensory feedback for movement control (O'Doherty et al., 2009; Klaes et al., 2014; O'Doherty et al., 2019). While these advances are promising, the effective restoration of natural tactile and proprioceptive feedback still faces many challenges.

One aspect requiring further investigation is whether other targets on the somatosensory neuraxis might offer advantages over the cortex for restoring somatosensory function. The complexity of neural networks in the cortex makes it a difficult region in which to target microstimulation. There has been better success in restoring somatosensory percepts in amputees by interfacing with peripheral nerves where the labeled line arrangement of afferent fibers has led to effective artificial recreation of somatosensory signals (Clark et al., 2014; Tan et al., 2015; Oddo et al., 2016; Valle et al., 2018; George et al., 2019). Users of some state-of-the-art peripheral nerve interfaces that used biomimetic stimulation approaches report that they feel as if they are grasping a real object and they can feel the intensity of the grasping force applied by the robotic hand (Valle et al., 2018). Another subject was able to determine whether the robotic arm held a golf ball or a lacrosse ball, based on their size, and discriminated the compliance of a soft foam block and hard plastic block during active manipulation with a robotic arm (George et al., 2019). The speed with which the subject could discriminate in these two tasks was significantly increased with biomimetic feedback algorithms, compared to simpler feedback algorithms using linear signal amplitude or frequency changes associated with the sensor output. Current peripheral neural prostheses outperform cortical ones for sensorimotor tasks. While integrating somatosensory feedback through ICMS is an impressive recent feat in humans, the subject still used a combination of visual and somatosensory feedback, and trained on the task for 2 years (Flesher et al., 2019).

Spinal cord injury sufferers require somatosensory signals to be recreated in the central nervous system above the site of damage, so peripheral interfaces are not appropriate for this purpose. In our view, the dorsal column nuclei (DCN, comprising the gracile and cuneate nuclei) and its complex (DCNc, comprising the DCN, external cuneate nuclei and nuclei X and Z), may be an ideal alternative target to the cortex as they are easily accessible, being located in a supraspinal position in the dorsal aspect of the brainstem medulla (**Figure 1**), and are one of the first processing sites for ascending somatosensory information from the entire body (excluding the head). As the DCNc are lower in the somatosensory processing hierarchy, it may prove easier for the brain to interpret artificial DCNc activation as naturalistic stimuli, mirroring the success of peripheral nerve interfaces. Perhaps the most crucial feature is that the DCNc are part of a distribution network that accesses not only the somatosensory cortex for conscious perception, but also other key brain regions including the cerebellum, tectum, pretectum, inferior olive, red nucleus, pontine nuclei, zona incerta, reticular formation, periaqueductal gray, and the spinal cord (**Figure 1**; Loutit et al., 2019b). Direct parallel access to these centers from the DCNc provides a distinct advantage as a neural prosthesis site over primary somatosensory cortex, which would not have the same direct access to other key sensorimotor systems. Congruently, the DCN have received attention as a prospective somatosensory neural prosthetic target. Recently, it was shown that chronically implanted microelectrode arrays in monkeys can collect stable recordings, and variations in electrical DCN stimulation can elicit behavioral responses that demonstrate perceptual discrimination (Richardson et al., 2015, 2016; Sritharan et al., 2016; Suresh et al., 2017). 
While these studies establish the DCNc as a potential target, there is a severe lack of fundamental knowledge of the DCNc functional organization and somatosensory signal processing, which needs to be addressed before this region can be pursued as a feasible neural prosthetic target.

Here, we review recent work on the functional organization and somatosensory-evoked signals of key contributors to the DCNc. We suggest that the DCNc show promise as a target for a somatosensory neural prosthetic device. We discuss some of the potential limitations of the DCNc as a neural prosthetic target and propose future directions that are necessary before development of a DCNc neural prosthesis can begin.

# DCNc FUNCTIONAL ORGANIZATION

Effective activation of the DCNc to elicit somatosensory percepts will require precise knowledge of its functional organization. The key components of the DCNc necessary to appreciate its potential use for neuroprosthetics are the gracile and cuneate nuclei, which receive tactile (and other) inputs from lower and upper body afferents, respectively, and the external cuneate nuclei, nuclei X, and nuclei Z, which are key regions of proprioceptive inputs (summarized by **Figure 1**). For further details, we have recently performed a comprehensive review of the structural organization and the inputs and outputs of the DCNc (Loutit et al., 2019b). Interestingly, despite DCN neurons being somatotopically arranged across the coronal plane of the nuclei, evidence from our laboratory suggests that activity hotspots are spatially displaced across the surface of these nuclei when evoked from different stimulus locations (Loutit et al., 2017, 2019a). Recently, Suresh et al. (2017) also showed that macaque cuneate somatotopic maps are rostrocaudally organized, in addition to the medial-lateral and dorsal-ventral organization (Loutit et al., 2019b). Accordingly, attempts to stimulate cutaneous upper and lower body regions of the DCN, in addition to proprioceptive regions in the external cuneate nuclei, nuclei X, and nuclei Z, will have to target reasonably spatially displaced areas (Loutit et al., 2019b). Moreover, selective targeting of receptive fields to communicate contact location may require activation of neurons at different depths under the same surface region, which could prove difficult with current brain-machine-interface technologies. However, the different rostrocaudal sites could be exploited to activate neural populations with current multi-electrode array technologies that could otherwise not access different depths at adequate resolution under the same surface region.

FIGURE 1 | Schematic diagram of the inputs and projections of the dorsal column nuclei complex and information flow of a potential neural prosthesis with brainstem somatosensory feedback. Shown are schematic views of the forebrain (coronal section, top), hindbrain (parasagittal section, bottom), and spinal cord and medulla (transverse sections, insert). The dorsal column nuclei complex (DCNc; collectively: CN, GN, ECN, X and Z) projects to many sensorimotor targets in addition to the commonly described pathway through the ventroposterior lateral nucleus (VPL) of the thalamus. Providing sensory feedback by cortical stimulation bypasses these other essential targets involved in sensorimotor function. Compared to the cortical approach, a DCNc somatosensory neural prosthesis would provide a more realistic quality sensorimotor experience by accessing these other key sensorimotor regions. Dashed lines indicate spinal cord and brainstem cross-sections shown in the insert. Insert: The DCNc receives upper and lower body cutaneous and proprioception-related afferents via the cf and gf of the dorsal columns, respectively. Some lower body proprioception-related, and mixed modality upper and lower body afferents, travel to the DCNc via the dorsal region of the lateral funiculus including, but not limited to, the dorsal spinocerebellar tract. These afferents primarily synapse in X and Z. Therefore, to adequately restore all tactile and proprioceptive elements of somatosensation, the entire DCNc may require targeting. Abbreviations: cf, cuneate fasciculus; CN, cuneate nucleus; ECN, external cuneate nucleus; gf, gracile fasciculus; GN, gracile nucleus; Po, posterior group of the thalamus; VPL, ventroposterior lateral nucleus of the thalamus; X, nucleus X; Z, nucleus Z.

Surprisingly, we also found that gracile activity hotspots evoked from bilateral nerve pairs were asymmetrically organized (Loutit et al., 2017, 2019a). This may indicate that the underlying structures that generate this activity are also asymmetrically organized. Previously, some variability in the somatotopic arrangement of hindlimbs has been shown in the gracile nuclei of cats and rats (Millar and Basbaum, 1975; Maslany et al., 1991). Little is known about lateralization in subcortical structures, but cortical lateralization related to handedness is a common phenomenon in mammals, including rats (Denenberg, 1981; Nudo et al., 1992; Dassonville et al., 1997; Hopkins and Cantalupo, 2004; Rogers, 2009). Recent evidence suggests that lateralization of cortical structures might, in part, result from gene expression asymmetries in the spinal cord (Ocklenburg et al., 2017). If this is the case, it is likely that the DCNc and other nuclei along the motor and sensory pathways between the spinal cord and the cortex will also show structural asymmetry. Preliminary data from our laboratory indicate that DCNc functional lateralization is related to paw dominance in the rat. DCNc asymmetries will need to be considered when designing a future DCNc neural prosthesis, but if each neural prosthesis is tailored to an individual, asymmetry is unlikely to be of major concern.

# DCNc NEUROPHYSIOLOGY

To be a useful neural prosthetic target we propose that the DCNc neurophysiological characteristics must meet some preliminary conditions. Firstly, somatosensory-evoked DCNc signals must be shown to be robust and reproducible and, secondly, they must contain information that can be used to predict the location and quality of somatosensory stimuli. Signal features that reliably predict the location and quality of somatosensory stimuli indicate that they are relevant to the peripheral somatosensory event. Therefore, these features may inform the construction of artificial stimulus patterns that can activate DCNc neurons and elicit somatosensory percepts of natural quality.

Toward this goal, our laboratory has characterized somatosensory-evoked DCN surface activity (Loutit et al., 2017) and used feature-learnability (Loutit and Potas, 2019; Loutit et al., 2019a) – a machine-learning approach for evaluating the relevance of input features to the outputs – to determine the most useful signal features for predicting the location and quality of somatosensory stimuli. We consistently found that DCN signal features contain a unique profile of high-frequency (HF) and low-frequency (LF) content when evoked from predominantly cutaneous nerves compared to nerves with mixed afferents from deep and cutaneous structures (Loutit et al., 2017, 2019a), and similarly, from tactile- or proprioceptive-dominated mechanical stimuli (Loutit and Potas, 2019). We extracted signal features from surface potential recordings of the DCN and, by using feature-learnability, were able to establish the relevance, or importance, of information inherently encoded in these signal features for predicting (1) the nerve or paw that was stimulated, and (2) the tactile or proprioceptive quality of the stimuli (Loutit and Potas, 2019; Loutit et al., 2019a). The best individual HF DCN signal features predicted electrically and mechanically evoked somatosensory events with 87 and 70% accuracy, respectively, while the best LF features achieved 90 and 66% accuracy, respectively (Loutit and Potas, 2019; Loutit et al., 2019a), suggesting that both frequency bands represent physiological events relevant to the somatosensory stimuli.
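To illustrate what it means to score the relevance of an individual signal feature, the following is a minimal sketch that rates each feature by how well it alone predicts the stimulus class. It is not the feature-learnability method of the cited studies: it swaps in a simple nearest-centroid classifier with leave-one-out cross-validation, and the feature names (`hf_power`, `lf_power`, `noise`), class labels, and all values are synthetic, hypothetical placeholders.

```python
import random
from statistics import mean


def loo_centroid_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier on ONE feature."""
    correct = 0
    for i, (x, y) in enumerate(zip(values, labels)):
        # Build class centroids from every trial except the held-out one.
        by_class = {}
        for j, (v, lab) in enumerate(zip(values, labels)):
            if j != i:
                by_class.setdefault(lab, []).append(v)
        centroids = {lab: mean(vs) for lab, vs in by_class.items()}
        # Predict the class whose centroid is closest to the held-out value.
        predicted = min(centroids, key=lambda lab: abs(x - centroids[lab]))
        correct += predicted == y
    return correct / len(values)


def make_synthetic_trials(n_per_class=50, seed=0):
    """Hypothetical data: two informative features and one pure-noise feature."""
    rng = random.Random(seed)
    labels = []
    features = {"hf_power": [], "lf_power": [], "noise": []}
    for label, hf_mean, lf_mean in (("cutaneous", 1.0, 2.0), ("mixed", 3.0, 3.0)):
        for _ in range(n_per_class):
            labels.append(label)
            features["hf_power"].append(rng.gauss(hf_mean, 0.6))
            features["lf_power"].append(rng.gauss(lf_mean, 0.6))
            features["noise"].append(rng.gauss(0.0, 1.0))
    return features, labels


def score_features(features, labels):
    """Return {feature name: single-feature cross-validated accuracy}."""
    return {name: loo_centroid_accuracy(vals, labels) for name, vals in features.items()}


if __name__ == "__main__":
    feats, labs = make_synthetic_trials()
    for name, acc in score_features(feats, labs).items():
        print(f"{name}: {acc:.2f}")
```

In this toy setting, a feature whose class distributions are well separated scores well above chance, while the noise feature does not. That is the sense in which a per-feature accuracy can serve as a proxy for relevance.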

Before artificial stimulus features can be designed to activate the DCNc, we need a greater understanding of somatosensory information coding in the DCNc. In the following section we discuss how knowledge of DCN functional organization and signal features, such as those described above, can inform the development of a future neural prosthetic device.

# DISCUSSION

### A Potential DCNc Neural Prosthesis

Two groups have successfully achieved chronic implantation of Utah microelectrode arrays (Blackrock Microsystems) and floating microelectrode arrays in the cuneate nuclei of macaques. The arrays yielded stable recordings up to about 140 days post-implantation, and awake behaving macaques could detect amplitude-dependent DCN stimulation at 100 Hz (Richardson et al., 2015, 2016; Sritharan et al., 2016; Suresh et al., 2017). These studies have demonstrated proof of principle that chronic microelectrode array implants are stable in this region and that peripheral receptive fields can be selectively activated. However, the next key advancement will be the careful selection and testing of parameters for DCN stimulation to elicit naturalistic sensations.

The HF and LF DCN activity we have investigated either reflects directly recorded volleys of action potentials arriving in the DCN from afferent fibers (HF activity), or arises from the subsequent activation of DCN neurons (HF and LF activity). Therefore, to ensure that neural activity giving rise to both the HF and LF features is restored, consideration should be given to whether the best approach is to activate the cuneate and gracile fasciculus fibers of the dorsal column (DC), rather than DCN neurons, or perhaps both. Some interesting recent studies show that rats and monkeys can detect differences in frequency and location of epidural DC stimulation (Yadav et al., 2019, 2020), which is a potential approach for restoring somatosensory feedback and is an FDA-approved method of chronic pain management in humans. However, exclusively stimulating the DC may limit the ability to activate lower body proprioceptive, and potentially other somatosensory information that travels in the lateral funiculus, projecting to nuclei X and Z (Loutit et al., 2019b). Like the sensorimotor cortex, the DC and the DCN are both somatotopically organized and modality segregated (Whitsel et al., 1969, 1970; Niu et al., 2013; Loutit et al., 2019b), making them useful targets for signaling contact location, and different sensory qualities. To restore tactile and proprioceptive sensation for both the upper and lower body, it will be necessary to either incorporate the lateral funiculus with DC stimulation, or target the entire DCNc.

The modular arrangement of the DCNc may be advantageous for neural prosthetic applications because each specific target relevant to the deficit region can be restored. The modularity and apparent sparsity of interconnectedness within the DCNc suggests that key regions can be specifically targeted with a neural prosthesis, without activating adjacent intact regions. Moreover, the entire body except the head is represented within a relatively small area across the DCNc surface (approximately 16 mm<sup>2</sup>), facilitating access to large body regions. One of the challenges faced in ICMS is the large cortical surface area dedicated to processing somatosensory information from the human hand (Collinger et al., 2018), which spans approximately 4 cm along the post-central gyrus (Flesher et al., 2016). Current microelectrode arrays are relatively small (typically 4 mm × 4 mm) and therefore can only evoke sensations in small regions of the hand. While the compactness of the DCNc is advantageous, a key challenge will be increasing the number and density of electrodes used to activate the DCNc with high precision. This challenge is currently being met with intense research effort.

Compared to attempts to restore sensation in tetraplegics, approaches to restore somatosensation in amputees with upper limb prostheses have been relatively successful, and may guide approaches in the DCNc. Peripheral nerve stimulation has been successful in eliciting percepts of contact location, pressure, proprioceptive qualities, and textural discrimination (Dhillon and Horch, 2005; Clark et al., 2014; Tan et al., 2014, 2015; Oddo et al., 2016). Typical stimulation protocols deliver trains of electrical pulses to peripheral nerves with amplitudes varying between 20–300 µA, and frequencies of 10–300 Hz (Raspopovic et al., 2014; Oddo et al., 2016; Valle et al., 2018; George et al., 2019), which are similar to those used in cortical stimulation (Johnson et al., 2013; Flesher et al., 2016; Salas et al., 2018). When linearly encoded, a higher value from a robotic force sensor produces an increased stimulation frequency or amplitude, which induces perception of increased stimulus intensity (Johnson et al., 2013; Raspopovic et al., 2014; Flesher et al., 2016; George et al., 2019).
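The linear encoding described above can be illustrated with a minimal sketch. The function name, the 10 N full-scale force, and the choice to scale frequency and amplitude together are assumptions made for illustration only; the default ranges simply reuse the 10–300 Hz and 20–300 µA spans quoted in the text and are not a recommended protocol.

```python
def linear_encode(force_n, force_max_n=10.0,
                  freq_range_hz=(10.0, 300.0),
                  amp_range_ua=(20.0, 300.0)):
    """Map a force-sensor reading linearly onto pulse frequency and amplitude.

    All parameter values are illustrative assumptions, not settings
    taken from any of the cited studies.
    """
    # Normalise the sensor reading and clamp it to [0, 1].
    x = min(max(force_n / force_max_n, 0.0), 1.0)
    # A larger sensor value produces a proportionally larger frequency and amplitude.
    freq_hz = freq_range_hz[0] + x * (freq_range_hz[1] - freq_range_hz[0])
    amp_ua = amp_range_ua[0] + x * (amp_range_ua[1] - amp_range_ua[0])
    return freq_hz, amp_ua


if __name__ == "__main__":
    for force in (0.0, 2.5, 5.0, 10.0):
        print(force, linear_encode(force))
```

In practice an encoder of this kind would modulate only one parameter at a time, or use subject-specific thresholds; the sketch only shows the proportional mapping itself.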

While these linear encoders can elicit perception of changes in stimulus intensity, they are often not perceived as naturalistic by the user. Of particular interest is trying to create biomimetic artificial touch, which would create naturalistic activation patterns, and therefore naturalistic sensations, in response to spatiotemporal stimulation patterns (Saal and Bensmaia, 2015). Biomimetic stimulus patterns mimic attributes of fast- or slowly adapting afferents, by modulating stimulus frequency or amplitude at different phases of a stimulus presentation e.g. varying the stimulus at the onset, offset, static, or dynamic phases of a stimulus. Indeed, spike timing and temporal features of spike trains, independent of mean spike rates, encode a variety of tactile stimulus features (Johansson and Birznieks, 2004; Saal et al., 2009; Birznieks et al., 2010; Birznieks and Vickery, 2017; Ng et al., 2018). Such parameters have facilitated the instantaneous estimation of fingertip forces, essential for tasks like object manipulation (Khamis et al., 2015). Recent biomimetic testing has shown that spatiotemporal stimulation patterning that mimics firing patterns of different fast- and slowly adapting peripheral afferents generates more natural percepts to the user (Oddo et al., 2016; Valle et al., 2018; George et al., 2019). This is complemented by evidence suggesting that vibration and intensity can be multiplexed by peripheral neural coding, without the need to alter current intensity (Ng et al., 2019). However, a biomimetic approach may be technologically limited by the number of electrodes that can be implanted in a peripheral nerve, to selectively activate individual or small groups of afferents of different submodalities.
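One way to picture the difference from a purely linear encoder is a sketch in which the pulse frequency is the sum of a sustained term that follows the static force (loosely analogous to slowly adapting afferents) and a transient term that follows the rate of change of force (loosely analogous to fast adapting afferents). This is a deliberately simplified illustration: the function name, gains, and the additive form are assumptions, and it is not the encoding model used in any of the cited studies.

```python
def biomimetic_encode(forces_n, sample_rate_hz,
                      sustained_gain=20.0, transient_gain=5.0,
                      max_freq_hz=300.0):
    """Return one pulse frequency (Hz) per force sample.

    Hypothetical gains: sustained_gain in Hz per N,
    transient_gain in Hz per (N/s).
    """
    freqs = []
    previous = 0.0  # assume no contact before the first sample
    for force in forces_n:
        # Sustained term: tracks the static phase of the stimulus.
        sustained = sustained_gain * max(force, 0.0)
        # Transient term: responds at onset, offset, and dynamic phases.
        transient = transient_gain * abs(force - previous) * sample_rate_hz
        freqs.append(min(sustained + transient, max_freq_hz))
        previous = force
    return freqs


if __name__ == "__main__":
    # A press-hold-release force trace sampled at 10 Hz.
    trace = [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0]
    print(biomimetic_encode(trace, sample_rate_hz=10.0))
```

With this trace the frequency is highest at contact onset, settles to a lower value during the hold, and shows a second burst at release, whereas a linear encoder would output the same value throughout the hold and nothing at release.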

Saal and Bensmaia (2015) have suggested that in higher centers, it may not be necessary to selectively activate so many neurons. Their reinterpretation of peripheral afferent coding suggests that all afferent classes encode aspects of most tactile features (Saal and Bensmaia, 2014), and electrophysiological evidence suggests massive cutaneous primary afferent convergence onto multiple DCN neurons (Witham and Baker, 2011; Bengtsson et al., 2013; Jörntell et al., 2014). Therefore, it may be possible to use fewer electrodes to biomimetically activate a small sample of neurons in the DCN that convey more complex, naturalistic, tactile features than attempting to stimulate a larger number of primary afferents.

One group has shown that microstimulation applied to the DCN of macaques at 100 Hz could be detected at amplitudes of 45–80 µA (Sritharan et al., 2016), which is comparable to cortical stimulation ranges that elicit somatosensory percepts. It is unclear what perceptual qualities are elicited from DCN stimulation as there are no reported studies of DCN electrical stimulation in humans. However, evidence from the abovementioned peripheral nerve studies suggests that future attempts to stimulate the DCN may benefit from adopting a biomimetic approach.

#### Limitations

Aside from the potential benefits, there are several concerns for targeting a somatosensory neural prosthesis in the DCNc. The required surgery to place electrodes in the DCNc is more invasive than in the cortex. Currently, the surgery will likely involve cutting the trapezius, splenius capitis, and semispinalis capitis muscles of the posterior neck, whereas cutting, removal, and replacement of a section of cranium is safer and routinely performed in humans. Moreover, the primary goal for spinal cord injury patients is to restore motor control. The state-of-the-art upper limb motor prostheses for tetraplegics are driven by neural activity recorded from electrodes in the motor cortex (Collinger et al., 2013; Ajiboye et al., 2017). A single surgery is required to place both motor and somatosensory arrays on the cortex to restore sensorimotor functions. Conversely, achieving sensory feedback using a DCNc somatosensory neuroprosthesis will require two surgeries: one in the cortex and one in the brainstem (**Figure 2** shows the proposed site). Future investigations will need to assess these risks and demonstrate that the sensory improvements achieved by a DCNc implant outweigh those that can be achieved by a cortical implant.

Current chronic DCNc electrode arrays risk being moved or damaged by head and neck movements. Several failed experiments in macaques have been reported due to damaging the wire bundles that transmit electrical signals between the electrode arrays and headstages fixed to the skull, or from the arrays falling out (Suresh et al., 2017). Thus far, rigid microelectrode arrays have been used, but the development of new technologies that use less rigid array structures to accommodate movement could solve this issue. For example, recently an approach has been developed that delivers small flexible electrode "threads" into the brain (Musk, 2019). Each thread can be targeted with micrometer precision and each array can have up to 3,072 active electrodes. New technologies such as this would permit the insertion of a network of flexible electrodes that could be sewn in place at high resolution throughout the entire DCNc (**Figure 2**), while permitting stable recordings during movement of brain tissues and without causing tissue damage.

FIGURE 2 | Proposed site for a potential dorsal column nuclei complex somatosensory neural prosthesis. Parasagittal view of a human brainstem and cerebellum (Cb). Dashed line indicates location of dorsal column nuclei complex (DCNc). As brain-machine-interface technology advances, future approaches may incorporate soft nanowire electrode "threads" that would permit stable targeted neural excitation during movement of the brainstem. Arrows indicate the DCNc region covered by the Cb, which can be easily retracted for electrode implantation if required. The gloved finger is inserted in the space proposed for surgical access, i.e. between the 1st cervical segment (C1) and the occipital bone (Oc).

The safety and efficacy of DCNc array insertion and electrical stimulation are also yet to be established. As described above, experiments in macaques showed that DCNc stimulation could be detected with currents in the range deemed safe to avoid neural damage (<100 µA per electrode; 20 µC/phase) (Chen et al., 2014; Rajan et al., 2015; Flesher et al., 2016). However, DCNc tissues will need to be analyzed following chronic microelectrode implantation and stimulation, to determine if the effects differ from those shown in the cortex. Moreover, penetrating electrodes and electrical stimulation in the DCNc pose a risk of damaging or activating neurons in respiratory control centers including the rostral ventrolateral medulla, the ventral respiratory column, and the nucleus of the solitary tract (Zoccal et al., 2014). Because the first two centers are located ventrally, they are unlikely to be affected by DCNc stimulation; however, there is some risk of physically penetrating or activating the nucleus of the solitary tract, which is located near the DCNc ventral border. In our laboratory we have routinely inserted electrode arrays in the gracile and cuneate nuclei of rats that occasionally penetrate the ventral border without affecting any cardiorespiratory functions. The two macaque studies using chronic arrays in the DCNc also reported no issues with cardiorespiratory function either from physical penetration or from stimulation up to 100 µA (Richardson et al., 2016; Sritharan et al., 2016; Suresh et al., 2017). Nevertheless, the potential to cause adverse effects on respiratory control or coupling of cardiovascular and respiratory activities is a serious concern, and safe stimulation levels and penetration depths will need to be established.

Finally, spinal cord or peripheral nerve injury has been shown to cause changes in DCN somatotopy and is thought to be a crucial driver of some cortical somatotopic reorganization (Kambi et al., 2014). This will need to be considered in each subject, to determine how best to target microelectrode arrays.

### FUTURE DIRECTIONS

We believe that a DCNc somatosensory neural prosthesis is a goal worth pursuing and may provide advantages over cortical somatosensory neural prostheses. However, a number of concerns regarding the safety and efficacy of placing microelectrode arrays in, and electrically stimulating, the DCNc need to be addressed before a brainstem somatosensory neural prosthesis can be considered feasible. Compared to peripheral nerves and the somatosensory cortex, there is also a dire lack of knowledge about how somatosensory information is coded in the DCNc, which demands future efforts directed toward understanding how tactile and proprioceptive features are represented in DCNc neurons. The next frontier will then be to determine how to implement neural codes using a biomimetic approach to artificially stimulate the DCNc, which is already connected to multiple sensorimotor systems, including conscious (likely involving the cortex) and unconscious (non-cortical) pathways. Such an approach could enable the subject to receive tactile and proprioceptive sensations from an anthropomorphic robotic limb for complete sensorimotor integration into multiple systems.

# AUTHOR CONTRIBUTIONS

Both authors wrote the manuscript.

# FUNDING

We are grateful to the Bootes Medical Research Foundation for financial support of our research and open access publication fees.

# REFERENCES


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Loutit and Potas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Stimulation Strategies for Improving the Resolution of Retinal Prostheses

Wei Tong<sup>1,2,3</sup>, Hamish Meffin<sup>1,2,4</sup>, David J. Garrett<sup>3</sup> and Michael R. Ibbotson<sup>1,2</sup>\*

<sup>1</sup> National Vision Research Institute, Australian College of Optometry, Carlton, VIC, Australia, <sup>2</sup> Department of Optometry and Vision Sciences, Melbourne School of Health Sciences, The University of Melbourne, Melbourne, VIC, Australia, <sup>3</sup> School of Physics, The University of Melbourne, Melbourne, VIC, Australia, <sup>4</sup> Department of Biomedical Engineering, The University of Melbourne, Melbourne, VIC, Australia

Electrical stimulation using implantable devices with arrays of stimulating electrodes is an emerging therapy for neurological diseases. The performance of these devices depends greatly on their ability to activate populations of neurons with high spatiotemporal resolution. To study electrical stimulation of populations of neurons, the retina serves as a useful model because the neural network is arranged in a planar array that is easy to access. Moreover, retinal prostheses are under development to restore vision by replacing the function of damaged light-sensitive photoreceptors, which makes retinal research directly relevant for curing blindness. Here we provide a progress review on stimulation strategies developed in recent years to improve the resolution of electrical stimulation in retinal prostheses. We focus on studies performed with explanted retinas, in which electrophysiological techniques are the most advanced. We summarize achievements in improving the spatial and temporal resolution of electrical stimulation of the retina and methods to selectively stimulate neurons with different visual functions. Future directions for retinal prostheses development are also discussed, which could provide insights for other types of neuromodulatory devices in which high-resolution electrical stimulation is required.

#### Edited by:

Tianruo Guo, University of New South Wales, Australia

#### Reviewed by:

Daniel Llewellyn Rathbun, Henry Ford Health System, United States

David Tsai, University of New South Wales, Australia

#### \*Correspondence:

Michael R. Ibbotson mibbotson@nvri.org.au

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 27 November 2019; Accepted: 09 March 2020; Published: 26 March 2020

#### Citation:

Tong W, Meffin H, Garrett DJ and Ibbotson MR (2020) Stimulation Strategies for Improving the Resolution of Retinal Prostheses. Front. Neurosci. 14:262. doi: 10.3389/fnins.2020.00262

Keywords: retina, retinal ganglion cell, electrical stimulation, stimulation resolution, retinal prostheses

# INTRODUCTION

Vision is amongst the most vital tools for functioning in daily activities. In healthy eyes, light enters through the cornea and is focused by the cornea and lens, onto the retina, the light sensitive tissue lining the back of the eye (**Figure 1A**). The retina (**Figure 1B**) contains light sensitive photoreceptors, including rods and cones, which can then transduce the light into chemical and electrical signals. The signals are sent to other neurons in the retina, including bipolar cells and retinal ganglion cells (RGCs). RGCs have axons that collectively form the optic nerve and deliver neural signals to the central brain. The brain processes the signals in a series of complex ways to ultimately generate the sensation of vision.

Retinal degenerative diseases, including age-related macular degeneration (AMD) and retinitis pigmentosa (RP), are leading causes of major vision loss and blindness worldwide (Bourne et al., 2013). Approximately one in every 3,000–7,000 people is affected by RP (Ferrari et al., 2011) and over 8% of the population over 45 have evidence of macular degeneration (Wong W. L. et al., 2014). Both diseases lead to the loss of photoreceptor cells, thus depleting the ability of retinas to transduce light into useful visual signals. For both AMD and RP, currently available therapies normally only aim to slow down the death of photoreceptors, by providing nutritional supplements (Krishnadev et al., 2010) or through anti-vascular endothelial growth factor (anti-VEGF) injections (Ba et al., 2015) and lasers (Virgili et al., 2015), with limited available treatments for stopping the progression of the diseases or restoring vision. More recent treatments showing encouraging results include gene therapy and cell transplantation (Scholl et al., 2016). For both these therapies, several issues remain unresolved. Gene therapy currently suffers from limited recognized mutations for treatment (Hartong et al., 2006) and cell transplantation has difficulties with cell function and connectivity.

Over the last two decades, retinal prostheses that electrically stimulate surviving retinal neurons have emerged as a promising treatment for returning sight to the blind (Goetz and Palanker, 2016; Weiland et al., 2016). These devices can be categorized into three types depending on the location of the electrode arrays (**Figure 1B**). Epi-retinal devices have electrode arrays on top of the retina, in contact with the RGC layer. Sub-retinal implants are placed under the retina, closest to the diseased photoreceptor layer. Suprachoroidal implants are between the sclera and choroid. Several devices have been implanted into human patients, such as Second Sight's epi-retinal Argus II (Stronks and Dagnelie, 2014), Retina Implant AG's sub-retinal Alpha AMS (Stingl et al., 2017), Bionic Vision Australia's suprachoroidal devices (Ayton et al., 2014), and Pixium Vision's epi-retinal IRIS II and the most recent sub-retinal PRIMA. Most of the clinical results released by these consortiums have been positive: patients have reported the ability to detect light, categorize large objects from a list and even identify large letters (Zrenner et al., 2011; Humayun et al., 2012; Stingl et al., 2013; Ayton et al., 2014). Nevertheless, the visual resolution obtained from existing devices is very limited, meaning that even recognizing simple objects is challenging. Crucial abilities, such as facial recognition, are not yet possible. Snellen acuity is commonly used for describing visual acuity. A Snellen acuity of 20/20 represents normal vision, and 20/200 is defined as legally blind. The best acuities reported in the literature so far from clinical trials are 20/1260 from Argus II (Humayun et al., 2012), 20/546 from Alpha-AMS (Stingl et al., 2017) and between 20/4451 and 20/21059 from BVA suprachoroidal devices (Ayton et al., 2014), all within legal blindness. The clinical results from retinal prostheses have been reviewed recently by Ayton et al. (2019).
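To compare the Snellen fractions quoted above on a common scale, they can be converted to logMAR, the base-10 logarithm of the minimum angle of resolution, where the angle in arcminutes equals the Snellen denominator divided by the numerator. The short sketch below applies this standard conversion; the function name and the dictionary labels are only illustrative.

```python
import math


def snellen_to_logmar(numerator, denominator):
    """Convert a Snellen fraction (e.g. 20/200) to logMAR.

    logMAR = log10(denominator / numerator); 20/20 gives 0.0 and
    larger values indicate poorer acuity.
    """
    return math.log10(denominator / numerator)


if __name__ == "__main__":
    # Snellen values as quoted in the text above.
    reported = {
        "normal vision": (20, 20),
        "legal blindness threshold": (20, 200),
        "Argus II": (20, 1260),
        "Alpha AMS": (20, 546),
    }
    for label, (num, den) in reported.items():
        print(f"{label}: 20/{den} -> logMAR {snellen_to_logmar(num, den):.2f}")
```

On this scale the legal blindness threshold of 20/200 corresponds to logMAR 1.0, so any device with a larger logMAR value remains within legal blindness.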

Animal testing can evaluate and predict the performance of devices prior to clinical trials. Compared with in vivo testing, ex vivo experiments using explanted retinas are normally easier to perform, allow more advanced electrophysiological approaches, and have provided a large amount of important information for understanding the performance of retinal prostheses. The knowledge gained from ex vivo experiments ranges from a better understanding of electrical stimulation and potential explanations of clinical observations to the development of novel stimulation strategies. In this review, we first describe the current challenges in electrical stimulation of retinal neurons, which limit the performance of retinal prostheses. We then introduce the animal models commonly used, and recent advances in electrophysiological tools for retinal experiments. After this, progress in the last 5 years in improving the resolution of electrical stimulation of retinal neurons is summarized. Finally, we discuss the trends for the next generation of retinal prostheses, which could provide insights for future development and guide the design of other neuromodulation devices.

# CURRENT CHALLENGES AND LIMITATIONS

The key challenges that inhibit visual function of retinal prostheses can be summarized as follows: (1) limited spatial resolution; (2) limited temporal precision; and (3) unselective activation of different visual pathways.

# Limited Spatial Resolution

Single electrode stimulation generates the perception of spots of light, referred to as phosphenes. However, patients often report phosphenes that are larger than the electrodes and distorted in shape. Ideally, stimulation of individual retinal neurons is desired to restore natural vision. There are over 1.5 million RGCs in the human retina (Harman et al., 2000) with the largest soma having a diameter of about 30 µm (Liu et al., 2017). Argus II devices stimulate with 60 electrodes, each of 200 µm in diameter (Dorn et al., 2013), Alpha AMS with 1600 electrodes, each of 30 µm (Stingl et al., 2017) and the BVA suprachoroidal devices with only 44 electrodes, each of 500 µm diameter (Ayton et al., 2014; Abbott et al., 2018). All of these electrodes are similar in size to, or far larger than, individual somas. There are several technical limitations to using higher density electrode arrays. For example, the impedance of electrodes increases when their size is reduced. High impedance electrodes require higher voltage stimulation drivers, which consume more power. Many materials do not have suitable electrochemical properties to elicit neural activity within the safe charge injection limit.

Another common cause of low spatial confinement of activation is a gap between the electrode array and the surface of the retina. The electric field spreads rapidly in the lateral direction with increasing distance above a stimulating electrode, resulting in a loss of spatial confinement. Epi-retinal devices are intended to stimulate RGCs; however, large electrode-retina gaps after surgery have been reported (Gregori et al., 2018). Sub-retinal devices stimulate nearby inner retinal neurons and thereby take advantage of the natural signal processing by sending signals in the direction that a healthy retina would normally employ. For these devices, there is also potential separation between the inner retinal cells and the surface of the electrode array because degenerated retinas often have a layer of debris where the photoreceptors have been replaced during degeneration. With suprachoroidal devices, the electrode/neuron separation is even larger, with a gap of around 1 mm usually expected.
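The effect of the electrode-retina gap on lateral spread can be illustrated with a textbook idealization that is not taken from the studies reviewed here: a point current source in a homogeneous, isotropic medium, for which the potential is V = I / (4πσr). Under that assumption, the lateral profile of the potential at height z above the source has a full width at half maximum of 2√3·z, so the width grows in proportion to the gap. Real electrodes are disks of finite size and the retina is layered, so the sketch below only conveys the trend; the function names and the placeholder conductivity are assumptions.

```python
import math


def point_source_potential(current_a, x_m, z_m, sigma_s_per_m=1.0):
    """Potential (V) of an idealized point source in a homogeneous medium.

    x_m is the lateral offset and z_m the height above the source.
    sigma_s_per_m is a placeholder conductivity, not a measured value.
    """
    r = math.hypot(x_m, z_m)
    return current_a / (4.0 * math.pi * sigma_s_per_m * r)


def lateral_fwhm(z_m):
    """Full width at half maximum of the lateral potential profile at height z."""
    # V(x) falls to half of V(0) when sqrt(x^2 + z^2) = 2z, i.e. x = sqrt(3) * z.
    return 2.0 * math.sqrt(3.0) * z_m


if __name__ == "__main__":
    for gap_um in (10, 50, 200, 1000):
        width_um = lateral_fwhm(gap_um * 1e-6) * 1e6
        print(f"gap {gap_um} um -> lateral FWHM of about {width_um:.0f} um")
```

Under these assumptions a 1 mm separation, as expected for suprachoroidal devices, gives a profile several millimeters wide, regardless of how small the electrode itself is.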

Even when the electrodes are placed close to the retina and their size is comparable to that of the target neurons, there are other biological issues to be resolved. One critical problem is the activation of RGC axon bundles (Fried et al., 2009), which is associated with patient reports of elongated phosphenes (Beyeler et al., 2019). This phenomenon occurs when electrodes not only stimulate the nearby neurons, but also errantly stimulate neurons from remote locations connected to the activated axons passing near the electrode.

# Limited Temporal Precision

In addition to localized activation, electrical stimulation with high temporal precision is required to replicate visual responses in the retina. RGCs can be stimulated either directly by the electrode or indirectly through the retinal network. Network-mediated stimulation may take advantage of the natural signal processing in the retina. In a subset of RGCs, responses to network-mediated stimulation were found to be similar to a natural light response, although delays of tens of ms were observed (Im and Fried, 2014). However, retinal remodeling can happen during degeneration (Jones and Marc, 2005), making it unclear whether the natural signal processing function of the retina is preserved. Compared with network-mediated responses, the responses of RGCs to direct stimulation normally occur with a short delay (below 5 ms). However, the encoding of images based on the direct RGC responses requires sophisticated image processing techniques in order to account for the natural visual processing in retinal circuits.

Another problem limiting temporal performance is the loss of responses to high frequency repetitive stimulation which has been found in all types of retinal cells. In a healthy retina, photoreceptors can resolve repetitive frequencies of 20–50 Hz (Zrenner, 2013), leading to the RGCs firing at frequencies over 200 Hz (Koch et al., 2004). However, in most cases, retinal prostheses allow an image refreshment frequency of 5–20 Hz, and images "fade" after repetitive stimulation (Zrenner, 2013). Therefore, the loss of responses to high frequency repetitive stimulation may be one of the reasons for image fading.

# Unselective Activation of Different Visual Pathways

The third limitation for existing devices originates from the nonselective stimulation of the many visual pathways within retina. In general, RGCs can be classified as ON or OFF cells. The spike rates of ON cells increase when light illuminates the center of the cell's receptive field, while the spike rates of OFF cells increase at light offset. In natural vision, ON and OFF cells in any patch of visual space are not activated simultaneously as light and dark patches are segregated. To date, more than 30 types of mammalian RGCs have been identified (Baden et al., 2016), each responsible for different aspects of visual information processing such as brightness, contrast, movement and color. Current retinal prostheses stimulate all types of retinal neurons in a similar manner without any preference, which is very different from the way that a healthy retina processes images. Approaches for selective activation of different RGC types are expected to significantly improve the vision restored.

# EXPERIMENTAL METHODS

# Animal Models


The animal models that have been used for visual processing research range from salamander to primates (including humans). The most popular models for studying the responses of retinal cells to electrical stimulation are mice, rats, rabbits and monkeys. Mammalian species share similar types of neurons in retina, e.g., photoreceptors, bipolar cells and RGCs, along with horizontal and amacrine cells, which provide lateral interactions (**Figure 1B**). However, there are also some differences between species. For example, in humans and some other mammals such as monkeys and cats, the location of the highest acuity in the retina is a small region at the center of the visual field that has the highest density of RGCs (area centralis). In rabbits, the area of highest acuity in their retina is not a single, restricted region but an elongated zone running across the retina, referred to as the visual streak. In contrast, rodents have RGCs distributed more uniformly without an obvious area centralis or visual streak.

The terminology commonly used for referring to different types of RGCs differs between species (**Table 1**). For example, RGCs with large somas, large dendritic sizes and large receptive fields are referred to as Alpha or A cells in rodents and cats, but can also be known as Y cells in cats. These cells are similar to so-called brisk transient cells in rabbits and parasol cells in primates. These cells can be further classified into ON or OFF cells, although there are even more subgroups for A cells in rodents, including sustained ON, sustained OFF and transient OFF according to their light responses. RGCs with very small somas, small dendritic sizes and also small receptive fields are known as Beta or B cells in cats and rodents, but can also be known as X cells in cats. These cells are similar to so-called brisk sustained cells in rabbits and midget cells in monkeys. In primates, the midget cells are known to be the main vehicle for generating high-resolution vision, but the function of beta cells in rodents is less clear (Sanes and Masland, 2015). Similar to alpha (A) cells, beta (B) cells also have ON and OFF responses to light illumination.

Despite the differences between retinas in rodents and primates, rodents are now the most popular species for research, in part due to their low costs and shorter breeding periods. Rodent animal models of retinal degeneration are also available, which are more relevant for studying retinal responses in terms of retinal prostheses. There are at least 15 mouse models of retinal degeneration with varying rates of photoreceptor loss, from a few days (rd1), to several months (rd10) (Chang et al., 2002). The commonly used rat models of retinal degeneration include Royal College of Surgeons (RCS), P23h and 344-ter rats (Goetz and Palanker, 2016). Photoreceptor degeneration is faster in RCS rats, with complete death of photoreceptors and loss of light responses by the age of 90 days (P90) (Ryals et al., 2017). In the other two types of rats, the degeneration is slower, with light responses being found even at P500 in P23h rats (Sekirnjak et al., 2009). The choice of animal model depends on the stage of retinal degeneration of interest. Abnormal spontaneous behaviors have been reported in degenerated retinas, e.g., RGCs tend to show low levels of background oscillation and bursts of spikes (Margolis and Detwiler, 2011). During electrical stimulation, such abnormal spontaneous activities lead to low signal-to-noise ratios (Choi et al., 2014). Several studies have also reported elevated thresholds for RGC stimulation in degenerated retinas (Jensen and Rizzo, 2009; Chan et al., 2011; Cho et al., 2016), although some others showed no significant differences (Sekirnjak et al., 2009; Cho et al., 2016). The differences observed between degenerated and healthy retinas further indicate the importance of using animal models with retinal degeneration for developing stimulation strategies for retinal prostheses.

# Electrophysiological Tools

Several electrophysiological tools have been applied for recording the responses of retinal neurons to electrical stimulation (**Figure 2**). To retain the integrity of the retinal circuits, experiments are normally performed using whole-mount retina, kept in a perfusion chamber with oxygenated Ames' medium at physiological temperatures between 33 and 37 °C.

#### Patch Clamping

Patch clamping (**Figures 2A,B**) is one of the most commonly used techniques for intracellular recording of neural responses to electrical stimulation in retina. Whole-cell patch clamping allows simultaneous recording from multiple ion channels by measuring membrane potentials or ionic currents. The impact of synaptic activity and individual ion channels can be studied when using various synaptic or ion channel blockers. In some studies, loose patch clamping has been used, which is less invasive and does not damage the integrity of the cell membrane (Im and Fried, 2014, 2015, 2016; Lee et al., 2017; Im et al., 2018).

To record RGCs from whole-mount retinas, it is sometimes necessary to first reveal RGCs by making small holes in the inner limiting membrane (ILM), but this does little damage to the cells (Cloherty et al., 2012). To record the responses of inner retinal neurons, retinal slice preparations have been used to gain access to the cells (Margalit and Thoreson, 2006; Cameron et al., 2013). Disadvantages of retinal slices are that they sever lateral synaptic connections and suffer from significant current shunting around the tissue during stimulation (Margalit et al., 2011). In whole-mount retina, the patch clamping of bipolar cells was made possible by first peeling off the photoreceptors using filter paper (Walston et al., 2018). In another work, the patch clamping on inner retinal neurons was also achieved using sharp glass pipettes without removing any layer of the retina (Tsai et al., 2017a).

TABLE 1 | Commonly identified RGCs and their names in different animal models.

| Animals | Mouse | Rat | Rabbit | Cat | Primate |
|---|---|---|---|---|---|
| Large soma and large dendritic field | Alpha cells | A cells | Brisk transient cells | Alpha cells/Y cells | Parasol cells |
| Small soma and small dendritic field | Beta cells | B cells | Brisk sustained cells | Beta cells/X cells | Midget cells |

Although patch clamping can provide the most information about a single neuron, the technique is delicate and time consuming, requires a great deal of training, practice and experience, and only allows simultaneous recording from small numbers of neurons.

#### Extracellular Recording


Extracellular recording is currently the only clinically viable method to measure retinal neuron signals. Ex vivo, it has been performed using either single sharp electrodes made of metals or carbon fibers, or multielectrode arrays (**Figures 2C,D**). Compared with intracellular recording, the signal-to-noise ratio from extracellular recording is lower, so it is more difficult to remove the artifacts arising from electrical stimulation. Single electrodes can only record from a small number of neurons, while population information can be obtained using multielectrode arrays. With the latest multielectrode array systems, it is possible to simultaneously record and classify recordings from more than 1,700 RGCs in a single experiment, using the high spatial and temporal resolution spiking activity collected by the recording system (Tsai et al., 2017b). Such recording, at subcellular resolution, is termed electrical imaging, and its principles and applications have been reviewed by Zeck et al. (2017).

#### Optical Imaging

Optical imaging using activity-sensitive fluorescent dyes, mainly calcium imaging (**Figures 2E,F**), is another useful electrophysiological tool for studying the activities of neurons in the retina. The dyes are first introduced into target neurons and the change of fluorescence intensity is used to infer neural activities, such as action potentials. Calcium imaging techniques for studying neural activity have been reviewed previously (Grienberger and Konnerth, 2012). The advantages of optical imaging include easy identification of soma locations and the absence of electrical artifacts, both of which reduce the burden of data analysis compared with electrical recording. Although recording of single action potentials with calcium imaging has been demonstrated (Smetters et al., 1999), this has rarely been demonstrated in the retina. Optical imaging is slow and therefore yields recordings with low temporal resolution compared with electrical recording. One limitation lies in the low imaging frame rates available on most microscopes (normally around 10–20 Hz), significantly lower than the sampling frequencies used during electrical recording (∼10–40 kHz). In addition, the fluorescence

FIGURE 2 | Electrophysiological techniques for recording neural responses in retina. (A) An RGC during whole cell patch clamping. The glass pipette electrode is in contact with the RGC's soma. (B) The membrane potential of an RGC in response to electrical stimulation, with an action potential (circle), a spikelet (cross) and no response. Black triangles indicate the stimulation artifacts, which were at the time of stimulation. (C,D) Electrical image of a single RGC recorded by a multielectrode array. (C) Raw voltage traces (left) and the average waveforms (right) as a function of time recorded on the six electrodes indicated in (D). The maximum absolute amplitude of the average voltage deflections from (C) is shown for each of the 519 electrodes in the hexagonal array in (D), indicated by the diameter of the dot plotted at each electrode location. Times of easily identified spikes recorded on Electrode 1 are identified as ticks in (C, left top). (E,F) Calcium imaging of a population of RGCs responding to electrical stimulation. The changes in fluorescence intensity of the five cells indicated in (E) in response to electrical stimulation are shown in (F). (A,B) are adapted with permission from Soto-Breceda et al. (2018). (C,D) are from Li et al. (2015). (E,F) are adapted with permission from Tong et al. (2019b).

intensity of the activity indicators needs some time to decay to its background level following neural activity and, depending on the indicator type and strength of the neural activity, the decay may take up to several seconds.

Several techniques have been reported for large-area loading of retinal cells with calcium indicators. Behrend et al. (2009) first reported the loading of RGCs in whole-mount retinas by immersing the optic nerve stumps in dye solution, but the method failed in adult mammalian retinas. Multicell bolus loading (Borghuis et al., 2011) using membrane permeable indicators has been reported, but uniform staining was difficult to achieve. Other methods that successfully stained RGCs in mammalian retinas include electroporation (Baden et al., 2016), dye incubation after dissolving the ILMs (Cameron et al., 2016), direct dye injection into the optic nerve (Tong et al., 2019a), and transduction with genetically encoded calcium indicators through adeno-associated viral vectors (Weitz et al., 2015). To reveal the responses of degenerated photoreceptors to electrical stimulation, incubation of the retina with cell permeable dyes has been reported (Haq et al., 2018).

# RECENT PROGRESS

# Electrical Stimulation of Retinal Neurons

Electrical stimulation of retinal neurons can be delivered intracellularly or extracellularly. Intracellular stimulation works by directly injecting current into the cells, normally through patch clamping electrodes. No clinical application is currently available for intracellular stimulation. Nevertheless, intracellular stimulation is a useful approach for characterizing the intrinsic properties of neurons (Wong et al., 2012; Hadjinicolaou et al., 2016). Without contribution from the network, intracellular stimulation simplifies the study by focusing on the properties of the recorded neurons and avoiding the complexity of extracellular stimulation, in which the placement of the electrode plays a significant role.

All clinical neural implants operating today use extracellular stimulation, which works by depolarizing cells in an electric field, instead of directly injecting current into the cells (Rattay, 1999; Meffin et al., 2012). In the most common mode of RGC stimulation, a non-uniform electric field is required. The non-uniform electric field causes charges to redistribute across the membrane of an axon or dendrite and concentrates them at the point where the gradient of the electric field is the greatest along the fiber. Stimulation of bipolar cells is most common through depolarization caused by charge accumulation at synaptic terminals, which can occur even in a uniform electric field directed across the cell (Werginz and Rattay, 2016). Firing of action potentials can be initiated when the membrane depolarization exceeds a threshold. The charge redistribution may happen on the membranes of axons, somas and dendrites, which all contribute to the depolarization of the retinal neurons. For RGCs, experimental evidence indicates that the axon initial segment (AIS), which is located at the proximal end to the soma and contains a high density of sodium channels, is the most sensitive area for activation (Fried et al., 2009). The AIS has the lowest activation threshold, followed by other axonal sections and the soma, with the dendrites exhibiting the highest threshold to electrical stimulation (Fried et al., 2009; Tsai et al., 2012). In addition to RGCs, extracellular stimulation can also lead to the activation of other retinal neurons, including bipolar cells and photoreceptors in healthy retinas, which will then activate RGCs through neuro-transmitters, in the same way that the retina processes visual stimuli. With extracellular stimulation, RGCs can be activated mainly through one of three routes (**Figure 3**): (1) direct activation through the AIS; (2) direct activation via axon bundles; (3) indirect activation via the retinal network. 
How RGCs are activated can determine the spatial and temporal resolution of electrical stimulation, which will be discussed in sections Spatial Resolution and Temporal Resolution.
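
The dependence of direct activation on a non-uniform field, described above, can be illustrated with a minimal numerical sketch. It assumes a point current source in a homogeneous, isotropic medium and a straight fiber, and uses the second spatial difference of the extracellular potential along the fiber (the "activating function" familiar from cable models such as Rattay, 1999) as a proxy for where depolarization is favored. The function names, the 30 µm electrode distance and the resistivity value are illustrative assumptions, not values taken from the studies cited here.

```python
import numpy as np

def point_source_potential_mV(x_um, distance_um, current_uA, resistivity_ohm_cm=1000.0):
    """Extracellular potential along a straight fiber from a point current source.

    Assumes a homogeneous, isotropic medium: V = rho * I / (4 * pi * r).
    The default resistivity is a placeholder, not a measured retinal value.
    """
    r_cm = np.sqrt(np.asarray(x_um, dtype=float) ** 2 + distance_um ** 2) * 1e-4
    current_A = current_uA * 1e-6
    return 1e3 * resistivity_ohm_cm * current_A / (4.0 * np.pi * r_cm)

def activating_function(potential_mV, step_um):
    """Second spatial difference of the potential along the fiber (mV/um^2)."""
    return np.diff(potential_mV, n=2) / step_um ** 2

# Positions along the fiber, 1 um apart, with the electrode above x = 0.
x_um = np.linspace(-200.0, 200.0, 401)
# Hypothetical cathodic (negative) current of 1 uA, 30 um from the fiber.
v_mV = point_source_potential_mV(x_um, distance_um=30.0, current_uA=-1.0)
af = activating_function(v_mV, step_um=1.0)
# The second difference is defined on interior points only.
peak_position_um = x_um[1:-1][np.argmax(af)]
```

In this idealized geometry the proxy peaks directly beneath a cathodic electrode. A spatially uniform field has a linear potential along the fiber and therefore a zero second difference, which is consistent with the statement that RGC stimulation in this mode requires a non-uniform field.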

The responses of neurons to electrical stimulation, for a fixed pulse duration, normally follow a sigmoidal function: the response efficacy increases with stimulus strength (current, voltage or charge), then reaches a maximum and saturates (Tsai et al., 2009; Meng et al., 2018; **Figure 4A**). The stimulus strength associated with 50% response efficacy is usually defined as the threshold of activation. Lowering the stimulation threshold is very important for retinal prostheses, as larger thresholds consume more power and may exceed the safe limits of the electrode materials or tissue. As neurons are activated by the electric fields generated by the electrodes, the stimulus effectiveness is greatly influenced by electrode size and the distance between electrode and neuron. Research has also indicated that stimulus effectiveness can be greatly influenced by various stimulation parameters. For example, Weitz et al. (2014) found that, for biphasic pulses, the currents required for RGC activation decreased as the interphase durations increased. Hadjinicolaou et al. (2015) and Jalligampala et al. (2017) proposed strategies for searching for the most efficient stimulation parameters for RGC activation.
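
The threshold definition above can be made concrete with a short sketch. It models response efficacy as a logistic function of stimulus amplitude and reads off the amplitude at 50% efficacy by linear interpolation. The logistic form, the parameter values and the function names are illustrative assumptions; published studies may fit other sigmoidal forms.

```python
import numpy as np

def sigmoid(stimulus, threshold, slope):
    """Response efficacy (0 to 1) as a logistic function of stimulus strength."""
    return 1.0 / (1.0 + np.exp(-slope * (stimulus - threshold)))

def threshold_at_half(stimulus, efficacy):
    """Stimulus strength at 50% efficacy, by linear interpolation.

    Assumes efficacy rises monotonically over the tested range.
    """
    return float(np.interp(0.5, efficacy, stimulus))

# Synthetic data: hypothetical amplitudes (uA) and a noiseless sigmoidal response.
amplitudes_uA = np.linspace(0.0, 40.0, 9)
efficacy = sigmoid(amplitudes_uA, threshold=20.0, slope=0.4)
estimated_threshold_uA = threshold_at_half(amplitudes_uA, efficacy)
```

The interpolation step relies on a monotonic response curve, so it would need to be restricted to the rising portion of the curve for cells whose efficacy drops again at high stimulus strengths.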

Recent years have also revealed an interesting observation known as the upper threshold phenomenon (**Figure 4B**) in RGC activation, i.e., a drop in response efficacy instead of saturation when stimulus strength exceeds a certain amount. The upper threshold phenomenon was first observed by Boinagrov et al. (2012), and then reported again in several other studies (Barriga-Rivera et al., 2017; Guo et al., 2018; Kotsakidis et al., 2018; Meng et al., 2018). Meng et al. (2018) found that 20/21 cells exhibited the upper threshold phenomenon when sufficiently high charge was injected. However, from modeling they observed different results between monophasic and biphasic stimulation. While the upper threshold in the soma was observed in simulation for both types of stimulation, the action potential in the distal axon was blocked with monophasic stimulation but not with biphasic pulses. However, Barriga-Rivera et al. (2017) reported the upper threshold phenomenon with biphasic stimulation in vivo: the recorded spike rates decreased in some channels with high amplitude stimulation. This indicates that, unlike in the modeling results of Meng et al. (2018), the upper threshold phenomenon may also occur in RGC axons under biphasic stimulation. The potential mechanisms for the upper threshold phenomenon have been discussed in detail by Guo et al. (2019). According to their discussion, the sodium channel kinetics in RGCs may play the major role in the upper threshold phenomenon.

#### Spatial Resolution

To confine electric fields generated by stimulating electrodes, attempts have been made to shape the electric fields with current focusing, e.g., replacing the remote return electrodes with local returns (Abramian et al., 2011, 2014; Flores et al., 2016, 2018; Matteucci et al., 2016; Fan et al., 2019; Tong et al., 2019a). Different local return configurations have been reported and compared, including connecting several stimulating electrodes as the return (Abramian et al., 2011, 2014; Matteucci et al., 2016), and specially designing a ring-shaped electrode surrounding the stimulating electrode as the local return (Flores et al., 2016, 2018; Fan et al., 2019; Tong et al., 2019a). A more detailed discussion about different local return configurations can be found in section Simultaneous Stimulation. Overall, local returns have been shown to confine the activation of RGCs to a certain extent. For example, Fan et al. (2019) reported that the return provided by six neighboring electrodes can enhance the capability of 10 µm epi-retinal electrodes to activate cells near (<30 µm) the central electrode. However, this study focused only on parasol cells, which are large in size. It remains unknown how a local return would affect spatial resolution when other neurons are considered, in particular the midget cells, which are believed to be responsible for high acuity vision in primates. Furthermore, this study neglected axon bundle activation, which is another main cause of the spread of RGC activation during epi-retinal stimulation. In another investigation, Tong et al. (2019a) compared the effect of return configurations for sub-retinal stimulation and showed different results depending on pulse durations and retinal degeneration. In the healthy retina, local returns were more effective in confining RGC activation when 0.1 and 0.2 ms pulses were used in comparison with 0.5 ms pulses. However, in the degenerated retina the RGC activation patterns were similar between the two return configurations, regardless of the pulse durations.

Both simulation and experimental results also indicate that more charge or current will be required for neural stimulation when using local returns, due to the decrease of electric field intensity (Tong et al., 2019a). The elevated thresholds will lead to larger power consumption and a greater charge requirement for the electrodes. To reduce thresholds whilst minimizing a loss of electric field confinement, Flores et al. (2018) proposed local returns in conjunction with pillar structured electrodes. The pillar electrodes reduced the distance between the stimulating electrodes and the target neurons. Experimentally (Ho et al., 2019), they demonstrated in vivo that 10 µm tall pillars with 55 µm pixels can lead to grating acuities of 48 ± 11 µm, which matches the linear pixel pitch of the hexagonal arrays they used. When converting the value into human visual acuity, the result is close to 20/192, which is marginally better than the legal blindness threshold of 20/200. Following these studies, they also proposed honeycomb-shaped electrodes for sub-retinal stimulation (Flores et al., 2019), where the stimulating electrodes sit within a deep honeycomb well, the walls of the well acting as the local returns. Experimentally (Flores et al., 2019) they demonstrated that the inner retinal cells migrated into the 25 µm deep wells after 5 weeks of implantation. No experimental stimulation results have been published using such arrays, but from simulation, the visual acuity is expected to be better than 20/100.
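
The conversion from a grating acuity on the retina to a Snellen fraction can be reproduced with simple arithmetic. The sketch below assumes that 20/20 vision corresponds to resolving about 1 arcmin, and that 1 arcmin spans roughly 5 µm on the human retina. Both figures are commonly used approximations introduced here for illustration and are not stated in the text above. The function name is hypothetical.

```python
def snellen_denominator(pitch_um, um_per_arcmin=5.0):
    """Snellen denominator (20/x) for a given resolvable pitch on the retina.

    Assumes 20/20 corresponds to 1 arcmin, i.e. about `um_per_arcmin`
    micrometres on the human retina (an approximation).
    """
    return 20.0 * pitch_um / um_per_arcmin

acuity_48um = snellen_denominator(48.0)    # grating acuity reported above
acuity_legal = snellen_denominator(50.0)   # pitch that maps to 20/200
```

Under these assumptions a 48 µm pitch maps to 20/192, in line with the value quoted above, and a 50 µm pitch maps to 20/200.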

Merely reducing the size of the electric field is sometimes insufficient to confine the activation of retinal cells. As mentioned above, epi-retinal stimulation using electrodes as small as 10 µm can also lead to a large spread of RGC activation due to axon bundle stimulation (Behrend et al., 2011). In another study, Grosberg et al. (2017) found that only 45% of electrodes, also 10 µm in diameter, can stimulate individual RGCs using current amplitudes below the threshold for axon bundle activation. Therefore, the activation of axon bundles has been identified as one of the main sources of the spread of retinal cell activation. The phenomenon is observed for both epi- and sub-retinal stimulation (Tong et al., 2019b).

Strategies for avoiding axon bundle stimulation can be divided into two routes. The first of these involves bypassing axon bundle stimulation by indirectly stimulating RGCs (Weitz et al., 2015; Haq et al., 2018). Weitz et al. (2015; **Figure 5**) demonstrated, via calcium imaging, that epi-retinal stimulation using both 24 ms biphasic square pulses and 20 Hz sine waves could effectively confine the RGC activation pattern because that type of stimulation primarily stimulates cells in the inner nuclear layer (inner retinal neurons) which, in turn, activate RGCs via the retinal network. Haq et al. (2018) studied sub-retinal stimulation using 1 ms voltage pulses, which were found effective for the activation of both degenerated cone photoreceptors (d-Phr) and inner retinal neurons. They showed that the 3 µm tip diameter electrodes used in the study mostly stimulated d-Phr about 60 µm, and RGCs about 160 µm, from the electrodes. By applying gap junction blockers, they found that the spread of both d-Phr and RGC activation could be confined. However, other studies have reported different results about the spatial resolution of RGC activation resulting from network stimulation. Hosseinzadeh et al. (2017) studied the spatial extent of epi-retinal stimulation by focusing on the network responses from the electrodes. They found that the network responses can also spread to a large area 300–1034 µm away from the electrodes even when the electrodes were as small as 10 µm. For sub-retinal stimulation, Tong et al. (2019a,b) also reported the spread of RGC activation when using 25 ms long pulses that mainly stimulated inner retinal neurons. One possible hypothesis (Tong et al., 2019a) is that network stimulation could lead to the activation of RGC dendritic fields, which could be as large as 500 µm in certain RGC types. The discrepancy could be due to the different techniques used for recording. As discussed in section Electrophysiological Tools, typical multielectrode arrays provide limited spatial coverage and/or resolution and the activated neurons could be out of the recording region or lie between the recording electrodes. On the other hand, calcium imaging may not have sensitivity high enough to detect single spikes and may not record every activated neuron.

Potential problems with network stimulation include its low temporal resolution (section Temporal Resolution), and the relatively higher charge thresholds required for neuron activation. With long pulses, the charge densities required for activation can be larger than 1 mC/cm<sup>2</sup> (Weitz et al., 2015; Tong et al., 2019a), which will require the use of electrode materials with much larger charge injection capacity than conventional materials such as platinum (charge injection capacity ∼150 µC/cm<sup>2</sup>). The larger charge required also consumes more power and leads to more heat generation.
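
The comparison between required charge density and electrode capacity can be sketched as a short calculation. The example assumes a flat disc electrode whose geometric area carries the charge of one rectangular pulse phase. The current, pulse duration and diameter used below are hypothetical values chosen for illustration; only the ∼150 µC/cm<sup>2</sup> platinum figure comes from the text.

```python
import math

PLATINUM_LIMIT_UC_PER_CM2 = 150.0  # approximate value quoted in the text

def charge_density_uC_per_cm2(current_uA, pulse_ms, diameter_um):
    """Charge per phase divided by the geometric area of a disc electrode."""
    charge_uC = current_uA * pulse_ms * 1e-3      # uA * ms = nC, then nC -> uC
    radius_cm = 0.5 * diameter_um * 1e-4          # um -> cm
    area_cm2 = math.pi * radius_cm ** 2
    return charge_uC / area_cm2

# Hypothetical long-pulse stimulus: 4 uA for 25 ms on a 100 um disc.
density = charge_density_uC_per_cm2(current_uA=4.0, pulse_ms=25.0, diameter_um=100.0)
exceeds_platinum = density > PLATINUM_LIMIT_UC_PER_CM2
```

With these illustrative numbers the charge density is roughly 1.3 mC/cm<sup>2</sup>, well above the platinum figure, which shows how even a small current delivered over a long pulse can exceed the capacity of a conventional electrode material.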

The other strategy for selectively activating RGCs within or near the electrodes is to increase the difference between the thresholds for axon bundle and RGC soma activation (Chang et al., 2019; Tong et al., 2019a). For both epi-retinal and sub-retinal stimulation, ultrashort pulses (shorter than 0.15 ms in Chang et al., 2019, and shorter than 0.1 ms in Tong et al., 2019a) were demonstrated to be effective at avoiding axon bundle stimulation. Esler et al. (2018a) proposed to simultaneously stimulate multiple electrodes aligned with the axon bundles to minimize bundle activation. The proposal was based on the fact that the most excitable part of an RGC is the AIS. The AIS are located in the RGC layer with random orientations, but the overlying axons are packed together as mostly parallel fibers. The simultaneous stimulation of electrodes parallel to the axons in the nerve fiber layer flattens the extracellular potential along the length of the axon, thus minimizing axon activation.

Human trials have shown that some patients see halo-shaped stimulation patterns from single electrodes (Humayun et al., 2003). There are also studies that provide insights to explain these halo-shaped phosphenes. Eickenscheidt and Zeck (2014) showed that the neurons with the lowest thresholds were at the edge of the stimulation electrode, where the gradient of the extracellular electric field is maximal. In another study (Barriga-Rivera et al., 2017), the halo-like phosphene shapes were explained using the upper threshold phenomenon. Here they found that neurons close to the stimulating electrodes were inhibited at amplitudes lower than the neurons far from the stimulating electrodes. The halo-shapes could also originate from network stimulation: Tong et al. (2019a) showed that long pulses tend to activate neurons further away from the stimulating electrodes compared to the RGCs within the electrodes.

#### Temporal Resolution

High quality vision restoration requires the control of retinal neural activities with precise timing, on similar time scales to normal visual responses. There has been research demonstrating electrical stimulation of RGCs with temporal patterns resembling light-evoked spike trains (Jepson et al., 2014b; Wong R. C et al., 2014). For example, Jepson et al. (2014b) reported reproduction of the temporal spiking sequence to visual responses in populations of macaque monkey ON parasol cells. Similar results were reported by Wong R. C et al. (2014) in cat brisk transient cells. However, in both studies only limited types of cells were recorded and analyzed; it remains unclear how the electrical stimulation of other cell types, in particular the midget cells responsible for high acuity vision, could replicate visual responses.

right (somas) to left (optic disc). The colored dots at right indicate somas of the cells whose passing axons are activated by the electrode. Figure adapted with permission from Weitz et al. (2015).

A good understanding of the temporal patterns of all types of retinal neurons following electrical stimulation could inform the design of stimulation strategies for retinal prostheses. In general, responses originating in RGCs show short latencies (<5 ms), those originating in the inner nuclear layer show medium latencies (3–70 ms), and the responses originating in photoreceptors show long latencies (>40 ms) (Boinagrov et al., 2014). The activation of RGCs through direct or network-mediated stimulation depends on the electrode location, pulse duration and pulse polarity. For example, Boinagrov et al. (2014) showed that monophasic cathodic epi-retinal stimulation with short pulses (below 0.5 ms) tends to directly stimulate RGCs, while long monophasic anodic pulses (above 10 ms) with electrodes in the outer plexiform layer showed optimal selectivity for network-mediated stimulation.

The response latencies of RGCs from direct stimulation exhibit a U-shape in relation to current amplitudes (Boinagrov et al., 2014; Meng et al., 2018). Compared with direct stimulation, the network-mediated responses of RGCs are normally slower and exhibit a variety of temporal response patterns depending on the types of the cells and the stimulus parameters. Im and Fried (2015) compared light and network-mediated electrical responses in different types of RGCs from wild-type rabbits. They showed that the response patterns to a single pulse stimulus varied between ON and OFF brisk transient or brisk sustained cells, which can also be used to infer the type of neuron recorded. The network-mediated electrical responses from ON cells could resemble their light responses much better than those from OFF cells. Also, the stimuli that activated photoreceptors yielded better correlations than those activating bipolar cells. In a following study (Im and Fried, 2016), they examined the network-mediated responses to repetitive stimulation and also found differences between ON and OFF cells. In both brisk transient and brisk sustained ON cells, they showed a reset phenomenon, in which each new stimulus elicited a brief burst of spikes. In contrast, OFF cells did not exhibit a reset in their responses; the responses to subsequent stimuli were diminished. Later, they demonstrated that varying stimulus durations (Im et al., 2018) could differentially modulate the responses between ON and OFF cells, providing a potential strategy for selective stimulation of different RGC types (see section Selective Activation). There are some further reports about the effects of varying stimulation parameters, such as duration, rate, current amplitudes, and waveform shapes (Im et al., 2018; Werginz et al., 2018; Lee and Im, 2019). This research mainly used wild-type animals; however, for the design of retinal prostheses, how cells in degenerated retinas respond to electrical stimulation is more relevant. Lee et al. (2017) found that the network-mediated responses of ON alpha RGCs in rd10 mice showed trial-to-trial variability and the variability increased over the course of retinal degeneration. More research needs to be done in the future to understand the impact of retinal degeneration.

In addition to RGCs, some studies have recorded directly from other types of retinal neurons. A survey of electrically evoked responses over different current amplitudes and pulse durations was performed by Tsai et al. (2017a). In this study, they found differences among 21 cell types in response to electrical stimulation, a finding which may enable preferential recruitment of certain cell types. Walston et al. (2018) studied ON-type bipolar cells in both normal and degenerated mouse retina, and reported desensitizing responses to repeated stimulation and the upper threshold phenomenon.

As previously mentioned (section Current Challenges and Limitations), fading is one critical problem in retinal prostheses and has been found to be associated with the desensitized responses of retinal cells to repetitive stimulation. The desensitization phenomenon has been observed experimentally in both direct and network-mediated responses of RGCs. For network-mediated responses, in the low frequency range (below 10 Hz), Im and Fried (2016) showed desensitization in OFF RGCs, but not in ON RGCs. However, Walston et al. (2018) observed desensitizing responses in ON bipolar cells at frequencies greater than 6 Hz. The cut-off frequencies of direct responses of RGCs vary among morphological types (Hadjinicolaou et al., 2016), and differ between intracellular and extracellular stimulation (Kotsakidis et al., 2018). One possible mechanism of desensitization in direct responses of RGCs is a lack of sodium channel deinactivation (Tsai et al., 2011). Other studies have emphasized the existence of electrical currents in the retina, such as those through axo-axonal gap junctions, which could inhibit the neuron and thus prevent it from generating full action potentials (Soto-Breceda et al., 2018).

Strategies have been proposed to reduce the decay of RGC responses during repetitive stimulation. Soto-Breceda et al. (2018) proposed the use of electrical pulses with irregular time intervals between them to replace the periodic pulses that are normally used in retinal prostheses (**Figure 6**). They found that the random interpulse intervals could lead to lower adaptation rates than stimulation with constant intervals at frequencies above 50 Hz. In another study, Sekhar et al. (2016) analyzed the network-mediated responses of RGCs to stimulation at 25 Hz, which would typically induce strong fading. As the retinal neurons could respond to sequences of subthreshold stimulation, they suggested the use of subthreshold sequences to minimize the fading problem.
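
The difference between periodic and irregular pulse trains can be sketched as follows. The example draws each interpulse interval uniformly around the mean interval, so that the average rate is preserved. The uniform jitter, the `jitter` parameter and the function name are assumptions made for illustration; the interval distribution used by Soto-Breceda et al. (2018) is not specified in the text above and may differ.

```python
import random

def pulse_times(rate_hz, n_pulses, jitter=0.0, seed=0):
    """Onset times (s) of a pulse train with an average rate of `rate_hz`.

    jitter = 0 gives a periodic train. For 0 < jitter < 1, each interval is
    drawn uniformly from mean * (1 - jitter) to mean * (1 + jitter).
    The uniform distribution is an illustrative assumption.
    """
    rng = random.Random(seed)
    mean_interval = 1.0 / rate_hz
    times, t = [], 0.0
    for _ in range(n_pulses):
        times.append(t)
        t += mean_interval * (1.0 + rng.uniform(-jitter, jitter))
    return times

regular = pulse_times(rate_hz=50.0, n_pulses=100)
irregular = pulse_times(rate_hz=50.0, n_pulses=100, jitter=0.5)
irregular_intervals = [b - a for a, b in zip(irregular, irregular[1:])]
```

Both trains contain the same number of pulses and share the same mean interval by construction, so any difference in adaptation measured with them could be attributed to interval regularity rather than to the average stimulation rate.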

#### Selective Activation

There have been some encouraging results about selective activation of individual neurons. For example, Jepson et al. (2013) first demonstrated that it is possible to stimulate a single RGC without activating neighboring cells. Selectivity was improved by the use of local returns (Fan et al., 2019). However, there are several limitations in these studies for retinal prosthesis application. First, the electrodes used for stimulation were very small, with diameters around 10 µm, and in direct contact with the retina surface (epi-retinal stimulation). Clinically available devices use much larger electrodes, and there is usually some separation in space between the target neurons and the electrodes (Gregori et al., 2018). Second, these studies recorded and analyzed a limited number of neurons within certain cell types. Jepson et al. (2013) examined the responses from midget, parasol and bistratified ganglion cells in the primate retina, while Fan et al. (2019) only examined the parasol cells. It is possible that other neuron types were also activated but not recorded or analyzed. The third limitation lies in the multielectrode array technique used for recording: activated neurons could be outside the recording region or lie between the recording electrodes.

While selectively stimulating individual RGCs may be too challenging for current technologies, preferential activation of selective types of RGCs can also be beneficial to the quality of the vision restored. In response to intracellular stimulation, RGCs showed similarities within the same morphological types (Wong et al., 2012; Hadjinicolaou et al., 2016; Zehra et al., 2018). The difference between morphological types indicates the possibility to selectively stimulate RGCs with intracellular stimulation. However this may have limited relevance to extracellular stimulation. Kotsakidis et al. (2018) examined the optimal range of combinations of current amplitude and frequencies (2–2048 Hz) that preferentially activate ON over OFF RGC population responses (**Figures 7A,B**), and they found the optimal ranges were very different between intracellular and extracellular stimulation.

In addition to the work of Kotsakidis et al. (2018), there are several other studies that demonstrated the successful selective activation of ON or OFF RGCs. Similar to that of Kotsakidis et al. (2018), some of these studies focused on optimizing the current amplitudes and stimulation frequencies for each cell type. Cai et al. (2013) pioneered the work using high frequency (1 kHz) biphasic stimulation. They found that the OFF-brisk transient cells in rabbits could only be activated with a medium range of current amplitudes, but the ON-OFF directionally selective cells maintained strong spiking when much higher current amplitudes were applied. Twyford et al. (2014) used 2 kHz stimulation with amplitude modulation using a slower envelope and successfully modulated the activities of ON and OFF cells in a differential manner. Guo et al. (2018) then systematically studied ON and OFF responses to high frequency stimulation (>1 kHz) with constant amplitudes (**Figures 7C,D**). With synaptic blockers, ON cells were preferentially activated at relatively higher stimulation amplitudes (>150 µA) and frequencies (2–6.25 kHz), whereas OFF RGCs were activated by lower stimulation amplitudes (40–90 µA) across all tested frequencies. The mechanisms underlying differential responses of ON and OFF cells have not been revealed experimentally but may be due to different ionic currents present in ON and OFF cells and different cell morphologies, as illustrated computationally (Guo et al., 2014, 2018; Kameneva et al., 2016).

Soto-Breceda et al. (2018).

Another strategy for selective stimulation is based on different pulse durations, as reported in Im et al. (2018) and Lee and Im (2019). Both works studied the network-mediated responses of RGCs. Im et al. (2018) found that the activities of ON cells decreased significantly when the pulse duration increased, whereas the responses of OFF cells changed more modestly with pulse duration. Lee and Im (2019) also found that ON cells are more sensitive to changes in current amplitude. Both works suggested that it is possible to bias the activation in favor of ON cells. However, it is unclear whether the differences between ON and OFF cells caused by network-mediated activation will remain in degenerated retina.

The third strategy investigated the impact of electrode configurations. Yang et al. (2018) showed that with synaptic blockers, ON RGCs showed higher thresholds than OFF RGCs for epi-retinal stimulation. Furthermore, the difference was enhanced when placing the stimulating electrodes away from the axon. However, the precise control of the stimulating electrode location is difficult during implantation, therefore its clinical application is challenging. With local returns, Fan et al. (2019) also showed selective activation of ON or OFF parasol cells. Guo et al. (2017) proposed the use of multiple stimulating electrodes, with a primary electrode near the target neurons and a bipolar return electrode pair near the optic disc. With their strategy, the propagation of OFF cells was blocked according to the computer simulation.

The last strategy determines the optimal waveforms for ON and OFF cell activation using spike-triggered analysis. Spiketriggered analysis was first used to determine the receptive fields of RGCs to visual stimuli, and has been used in recent years for studying the temporal and spatial electrical receptive fields of RGCs. A spatial electrical receptive field consist of the spatial arrangement of electrodes capable of stimulating a cell to spike, while a temporal electrical receptive field consist of the sequence of pulses the affected spike stimulation. The recent progress in spike triggered analysis for retinal stimulation is summarized in Rathbun et al. (2018). Sekhar et al. (2016) first reconstructed the temporal electrical receptive fields of RGCs in wild type mice and found that the waveforms were different for ON and OFF cells. After further analysis (Sekhar et al., 2017), they showed the waveforms had different polarities. ON cells tended to show waveforms with short-latency upward deflections, while OFF cells were correlated to short-latency downward deflections. Ho et al. (2018) obtain similar results, and showed that they could be attributed to photoreceptor response and it differential impact on ON and OFF bipolar cells. Although different receptive field polarities were also observed in the degenerated retina, it was not possible to identify the cell type. Comparing the waveforms between healthy and degenerated retinas, they found significant

preferred lower stimulation amplitudes across all tested frequencies. (A,B) are adapted from Kotsakidis et al. (2018). (C,D) are adapted from Guo et al. (2018).

differences between the latencies and widths of the waveform deflections, which were shorter and narrower in degenerated retina. Similar results were also reported in Ho et al. (2018) for sub-retinal photovoltaic stimulation. One hypothesis about the presence of two response polarities in the degenerate retina relates to the depolarization of the rod bipolar cells (Ho et al., 2018). The depolarization of the rod bipolar cells would lead to the activation of ON RGCs but inhibition of OFF RGCs.

While most existing research aims at selective stimulation of ON vs. OFF cells, little work has been reported on preferential activation across a broader range of cell types. One reported study showed preferential activation of brisk transient cells in rabbits (Im and Fried, 2015). The authors found that anodic pulses could selectively activate brisk transient cells but not brisk sustained cells. The same group later also found that the strength-duration curves were different for brisk transient and brisk sustained cells (Im et al., 2018).

#### Multielectrode Stimulation

Research concerning retinal cell responses to single electrode stimulation has provided the community with important information contributing to a deeper understanding of stimulation mechanisms and performance. However, to translate electrical stimulation to useful visual information in patients, it is necessary to stimulate multiple electrodes to create 2-D patterns, either simultaneously or in sequence. The knowledge collected from single electrodes can inform the stimulation strategies for multielectrode stimulation. The resolution of the percepts reproduced depends on the selection of electrodes and stimulation parameters.

#### Sequential Stimulation

Sequential stimulation was performed by Shah et al. (2019), in which each electrode was stimulated in series, at a rate expected to be faster than the integration time of visual perception. They first created a response library by recording the RGC responses to individual electrode stimulation. Then, to reconstruct the image, they stimulated the electrodes one by one. In each time frame, the stimulation electrode was determined using a greedy algorithm. This algorithm was built on the collected library and aimed at minimizing the difference between the accumulated stimulation pattern and the target. They further found the efficacy of image reconstruction to be better if they limited the stimulation library to the most frequently chosen electrodes. In this work, the error between the activation patterns and the targets decreased monotonically with the number of stimulation patterns delivered, but saturated after 4,000. However, stimulating 4,000 electrodes in series at a frequency of 10 kHz requires 400 ms, which is much longer than the likely integration time in the brain, expected to be tens of ms.
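The greedy selection described above can be illustrated with a minimal sketch. It is not the implementation of Shah et al. (2019): the function names, the squared-error measure, the additive accumulation of responses, and the toy response library are all assumptions made for illustration.

```python
def greedy_sequence(library, target, n_steps):
    """Pick one electrode per time step so the accumulated response nears `target`.

    library: dict mapping electrode id -> expected response of each cell (hypothetical)
    target:  desired accumulated response of each cell
    """
    accumulated = [0.0] * len(target)
    chosen = []

    def error(pattern):
        # Squared error between a response pattern and the target (an assumed metric).
        return sum((p - t) ** 2 for p, t in zip(pattern, target))

    for _ in range(n_steps):
        best_id, best_err = None, error(accumulated)
        for eid, response in library.items():
            candidate = [a + r for a, r in zip(accumulated, response)]
            err = error(candidate)
            if err < best_err:
                best_id, best_err = eid, err
        if best_id is None:  # no electrode reduces the error any further
            break
        accumulated = [a + r for a, r in zip(accumulated, library[best_id])]
        chosen.append(best_id)
    return chosen, accumulated, error(accumulated)


def stimulation_time_ms(n_pulses, rate_hz):
    """Time needed to deliver n_pulses one after another at rate_hz."""
    return 1000.0 * n_pulses / rate_hz


# Toy library: three electrodes, three cells (values are made up).
library = {
    "e1": [1.0, 0.0, 0.0],
    "e2": [0.0, 1.0, 0.0],
    "e3": [0.5, 0.5, 0.0],
}
target = [2.0, 1.0, 0.0]
sequence, pattern, err = greedy_sequence(library, target, n_steps=10)
duration = stimulation_time_ms(4000, 10_000)  # the 4,000-pulse case from the text
```

In this toy case the loop stops as soon as no single electrode improves the match, which mirrors the saturation of the error reported in the text. The timing helper reproduces the 400 ms figure for 4,000 pulses at 10 kHz.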

#### Simultaneous Stimulation


In clinical retinal implants, when neighboring electrodes are stimulated simultaneously, phosphenes tend to overlap, resulting in spatial resolution that is poor compared to the density of electrodes. This is a consequence of the spread of current from the stimulating electrode to areas underlying adjacent electrodes, resulting in an increase in the area of retinal activation encompassing several electrodes. At first, simultaneous stimulation on neighboring electrodes may seem likely to exacerbate this problem. However, some stimulation strategies propose to make use of simultaneous stimulation to focus, shift or otherwise shape retinal activity to overcome the problem and thereby improve spatial resolution toward the limits imposed by electrode density.

The most straightforward approach attempts to focus the area of retinal activation to just the area immediately under or around a stimulating electrode by using the adjacent electrodes as current sinks. This can be done using bipolar, tripolar or multipolar electrode configurations (Cicione et al., 2012; **Figure 8**). The rationale behind such approaches is that they contain the spread of current in the retina to just the neighboring electrodes, whereas a distant return electrode would allow a wider current spread. In theory, amongst these options, the hexapolar configuration has the greatest potential to limit the spread of activation across the two-dimensional retinal surface as it places a ring of "guard" electrodes around a central stimulating electrode. Consequently, it has received the greatest attention, and most studies have shown some benefit in using hexapolar over monopolar configurations in limiting the spread of neural activation. For example, using patch recordings in ex vivo retina, Habib et al. (2013) showed that a hexapolar configuration limited the spread of retinal activation more than a monopolar configuration, with a pronounced increase in stimulation threshold outside the hex-guard that was not observed at equivalent distances in the monopolar configuration. Similarly, Spencer et al. (2016) found that a hexapolar sub-retinal configuration limited the spread of visual cortical activation for near-threshold stimulation, when compared to monopolar stimulation. However, Cicione et al. (2012) found no significant difference in the spread of cortical activation between these two configurations, at least for stimulation levels approaching saturation. The difference between the studies of Spencer et al. and Cicione et al. may lie in the different stimulation levels used to assess spread. Finally, concurrent stimulation with two adjacent hexapolar electrode configurations reduces or even eliminates crosstalk between them, but interference occurs when one or both of the two electrode configurations are monopolar. This has been demonstrated at the level of the electrical potential in saline in vitro (Dommel et al., 2009) as well as in vivo (Matteucci et al., 2016).

A potential limitation of the hexapolar configuration is that RGCs underlying the ring of sink electrodes may also be stimulated due to the relatively large currents entering those electrodes. Further, the sink electrodes may have very different impedances so that some electrodes will sink a larger fraction of the current than others when they are connected to a common ground. This will distort the area of activation toward electrodes sinking the largest fraction of current. To contend with these difficulties a focused multipolar approach (**Figure 8E**) has been proposed (Spencer et al., 2016). It overcomes the first limitation by using electrodes across the whole array to distribute the return current from a central stimulating electrode to optimally focus electrical potential. To overcome the second limitation relating to electrode impedance, it uses the implant to directly measure electrode impedances and correct for any distortion of the electrical potential they would cause. In practice this correction requires significant departures from a hexapolar configuration. Spencer et al. found that the area of visual cortex activated by focused multipolar and hexapolar sub-retinal stimulation was significantly reduced compared to monopolar sub-retinal stimulation, albeit at the cost of approximately 50% higher thresholds. However, no significant differences in activated areas were found between the focused multipolar and hexapolar configurations.
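To make the impedance argument concrete, the sketch below treats the guard electrodes as parallel paths to a common ground and applies a simple current divider. This is an idealization assumed here for illustration (it ignores the tissue path and any frequency dependence of the impedance); the function name and the impedance values are hypothetical.

```python
def return_current_fractions(impedances):
    """Share of the return current taken by each guard electrode.

    Assumes an ideal current divider: each share is proportional to the
    electrode's conductance (1 / impedance).
    """
    conductances = [1.0 / z for z in impedances]
    total = sum(conductances)
    return [g / total for g in conductances]


# Six guards with equal impedance (in kilo-ohms) share the current evenly.
equal = return_current_fractions([10.0] * 6)

# If one guard has half the impedance, it sinks twice the share of any other.
skewed = return_current_fractions([5.0, 10.0, 10.0, 10.0, 10.0, 10.0])
```

Under these assumptions the low-impedance guard takes 2/7 of the return current and each of the others 1/7, which is the kind of imbalance that would pull the area of activation toward one side of the ring.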

The hexapolar and multipolar approaches described above use combinations of electrodes as current sources and sinks to steer or focus current. An alternative approach, proposed by Spencer et al. (2019), is to shape retinal activity directly, rather than through current, by utilizing a model that predicts the pattern of retinal activity resulting from multielectrode stimulation, estimated from recordings made with the implant. The proposed stimulation strategy effectively inverts the model to find the pattern of electrical stimulation on the electrode array that optimally matches a target pattern of retinal activity. The strategy is most effective if all the electrodes on the array are used simultaneously to shape retinal activity, although in principle any number and configuration of electrodes can be optimized using the approach. An additional novel aspect of the strategy is that it shapes activity globally: the target pattern of retinal activity could cover any part of the retina spanned by the implant, and not just an isolated phosphene as considered in current focusing or steering strategies. Thus, it could represent the activity evoked in the retina by an entire image during sighted vision. When a focal phosphene is desired, the strategy can give similar solutions to the hexapolar and multipolar approaches (with the appropriate numbers and configurations of electrodes) provided the RGC response is not too heterogeneous across the array.

Accurate shaping of neural activity requires careful measurement of how multiple electrodes interact to produce an RGC response during simultaneous stimulation. Ex vivo retinal recordings of responses to patterns of multielectrode stimulation have shown that, for direct activation of RGCs, electrodes interact linearly during simultaneous stimulation in 90% of RGCs (Jepson et al., 2014a; Maturana et al., 2016). This conclusion is also supported by theoretical studies of multielectrode stimulation of biologically detailed models of RGCs based on morphological reconstruction with Hodgkin-Huxley type dynamics (Esler et al., 2018b). Following this, Maturana et al. (2016) showed that a model can accurately predict direct RGC responses to multielectrode stimulation if it is formulated in terms of an electrical receptive field for each recorded RGC, which describes the contribution each electrode makes to stimulation of that cell in a linear weighted sum. The probability of the cell emitting a spike in response to multielectrode stimulation is a non-linear function of this weighted sum. For network-mediated activation, a more complicated non-linear model is required (Maturana et al., 2018), although at the level of responses in visual cortex the simpler linear summation appears to suffice (Halupka et al., 2017a,b).
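The linear weighted sum followed by a non-linearity can be sketched as below. The text only states that the spike probability is a non-linear function of the weighted sum; the logistic (sigmoid) form, the parameter names, and the numbers used here are assumptions for illustration, not the fitted model of Maturana et al. (2016).

```python
import math


def spike_probability(weights, currents, threshold, slope):
    """Linear-nonlinear sketch of direct RGC activation.

    weights:   electrical receptive field, one weight per electrode (hypothetical)
    currents:  stimulus amplitude applied on each electrode
    threshold: weighted drive at which the spike probability is 0.5 (assumed)
    slope:     steepness of the assumed logistic non-linearity
    """
    drive = sum(w * c for w, c in zip(weights, currents))  # linear stage
    return 1.0 / (1.0 + math.exp(-slope * (drive - threshold)))  # non-linear stage


erf = [0.8, 0.2, -0.1]  # made-up receptive field over three electrodes
p_mid = spike_probability(erf, [1.0, 1.0, 0.0], threshold=1.0, slope=5.0)
p_high = spike_probability(erf, [2.0, 2.0, 0.0], threshold=1.0, slope=5.0)
```

With these made-up values the first stimulus sits at the assumed threshold and the second lies well above it, so the predicted probability rises with the weighted drive.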

A key component of the strategy proposed by Spencer et al. (2019) is that it incorporates methods to determine the limitations on spatial resolution imposed by noise in the measurement of the RGC electrical receptive fields. Without noise, the strategy can in principle achieve a spatial resolution limited only by the spacing between electrodes. However, in practice, noise affects the higher spatial frequencies of the electrical receptive fields disproportionately, so that if the algorithm tries to use these spatial frequencies to optimize stimulation, gross departures from the target will result. The strategy can use the recordings from RGCs in response to multielectrode stimulation to identify the spatial frequencies at which noise exceeds the signal and use this to robustly optimize the spatial resolution of the implant.

# FUTURE DIRECTIONS

A significant amount of knowledge has been gained about electrical stimulation for retinal prostheses using explanted retinas from animals. However, there is generally a lack of translation of the stimulation strategies developed ex vivo to clinical practice, and it remains unclear whether they can improve the performance of retinal prostheses in patients. Some of the stimulation strategies in this review were developed using array configurations unavailable in the clinic. Furthermore, most current research is conducted using healthy animals with normal vision. Several experimental results indicate that retinal degeneration could introduce abnormal behavior in retinal neurons and their responses to electrical stimulation. Therefore, future research should focus more on the impact of degeneration.

In addition to searching for the optimal stimulation parameters for the spatiotemporal responses of populations of retinal neurons, it is now clear that retinal prostheses capable of simultaneous recording and stimulation have the potential to significantly improve performance via closed-loop feedback. The existing retinal prostheses available in the clinic can only stimulate. With no option of recording neural activity from the retina, these devices must rely on feedback from patients to optimize their performance, which is very time consuming. Descriptions from patients may be opaque, confusing, hard to quantify, and may vary according to their experiences and preferences. Also, regular device calibration will be necessary because electrode properties and the retinal condition change over time following implantation. An automatic adjustment using closed-loop feedback from the device can address the issue with much higher efficiency.

However, there are several challenges for the implementation of closed-loop retinal prostheses. First, current clinically available devices use very large electrodes, which are not suitable for high-quality single-unit neural spike recording. To record from single neurons, electrodes around 10 µm will be necessary, but such small electrodes create difficulties for neural stimulation, as described previously. It may be possible to combine several electrodes to provide sufficient stimulation capacity. Second, high-quality neural spike recording will require close contact between the electrodes and the target neurons, as the electrical potential falls off with the square of the distance. Suprachoroidal and sub-retinal devices are both far away from the RGCs, while placement of epi-retinal devices close to the retinal surface has been a surgical challenge. Flexible electrode arrays are expected to be in better contact with the retinal surface than rigid arrays, and might be a promising solution for neural recording. The third issue relates to data transmission and power. Single-unit recording normally requires signal sampling at a frequency of several tens of kilohertz. The amount of data that needs to be transmitted for external analysis will be difficult to accommodate within the bandwidth of current technologies, and transmitting it will also consume a large amount of power. One strategy to reduce data transmission is to incorporate data processing into the implanted devices. However, such data processing will consume power and may generate a lot of heat that could be dangerous. One potential solution to all three problems is to replace high-frequency single-unit recording with low-frequency local field potential (LFP) recording, which records the collective activity of neural populations rather than the action potentials of each neuron. However, there has been very little work reported on LFP recordings in the retina. It remains unclear whether LFP recording can be used to study the responses of RGCs to electrical stimulation and how LFPs could inform the stimulation strategy.


# CONCLUSION

In the last few years, there has been a significant growth in research on the topic of electrical stimulation of retinal neurons, from both the basic understanding of the stimulation mechanisms to the development of novel stimulation strategies for better retinal prostheses performance. The research performed using explanted retinas from animals has provided insights on refining device efficiency by improving the spatial and temporal resolution possible from electrical stimulation, and has suggested potential approaches for selectively activating retinal neurons responsible for different visual processing. The next generation of retinal prostheses will benefit from the incorporation of neural recording, which is expected to further improve the overall performance based on closed-loop feedback.

# AUTHOR CONTRIBUTIONS

All authors contributed to the writing of the manuscript.

# FUNDING

The research was supported by a Development Grant from The National Health and Medical Research Council (NHMRC, GNT1118223) of Australia. DG was supported by NHMRC Project Grant GNT1101717 and by an Australian Nanofabrication Facility (ANFF)/Melbourne Centre for Nanofabrication (MCN) Technology Ambassador Fellowship.






**Conflict of Interest:** DG was a shareholder and executive officer of Carbon Cybernetics Pty Ltd., a company developing diamond and carbon-based medical device components.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Tong, Meffin, Garrett and Ibbotson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# A New Approach for Noise Suppression in Cochlear Implants: A Single-Channel Noise Reduction Algorithm<sup>1</sup>

Huali Zhou<sup>1</sup> , Ningyuan Wang<sup>2</sup> , Nengheng Zheng<sup>3</sup> , Guangzheng Yu<sup>1</sup> \* and Qinglin Meng<sup>1</sup> \*

<sup>1</sup> Acoustics Lab, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China, <sup>2</sup> Nurotron Biotechnology Inc., Hangzhou, China, <sup>3</sup> The Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China

The cochlea "translates" the in-air vibrational acoustic "language" into the spikes of neural "language" that are then transmitted to the brain for auditory understanding and/or perception. During this intracochlear "translation" process, high resolution in time–frequency–intensity domains guarantees the high quality of the input neural information for the brain, which is vital for our outstanding hearing abilities. However, cochlear implants (CIs) have coarse artificial coding and interfaces, and CI users experience more challenges in common acoustic environments than their normal-hearing (NH) peers. Noise from sound sources that a listener has no interest in may be neglected by NH listeners, but it may distract a CI user. We discuss CI noise-suppression techniques and introduce noise management for a new implant system. The monaural signal-to-noise ratio estimation-based noise suppression algorithm "eVoice," which is incorporated in the processors of Nurotron® Enduro™, was evaluated in two speech perception experiments. The results show that speech intelligibility in stationary speech-shaped noise can be significantly improved with eVoice. Similar results have been observed in other CI devices with single-channel noise reduction techniques. Specifically, the mean speech reception threshold decrease in the present study was 2.2 dB. Nurotron devices already have more than 10,000 users, and eVoice is a start for noise management in the new system. Future steps on nonstationary-noise suppression, spatial-source separation, bilateral hearing, microphone configuration, and environment specification are warranted. The existing evidence, including our research, suggests that noise-suppression techniques should be applied in CI systems. The artificial hearing of CI listeners requires more advanced signal processing techniques to reduce brain effort and increase intelligibility in noisy settings.

Keywords: cochlear implant, noise reduction, cocktail party problem, monaural, speech in noise, intelligibility, Nurotron, eVoice

#### Edited by:

Yuki Hayashida, Osaka University, Japan

#### Reviewed by:

Sebastián Ausili, University of Miami, United States Li Xu, Ohio University, United States

#### \*Correspondence:

Guangzheng Yu scgzyu@scut.edu.cn Qinglin Meng mengqinglin@scut.edu.cn

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 30 November 2019 Accepted: 16 March 2020 Published: 21 April 2020

#### Citation:

Zhou H, Wang N, Zheng N, Yu G and Meng Q (2020) A New Approach for Noise Suppression in Cochlear Implants: A Single-Channel Noise Reduction Algorithm. Front. Neurosci. 14:301. doi: 10.3389/fnins.2020.00301

<sup>1</sup>Portions of this work were presented in "Implementation and evaluation of a single-channel noise reduction method in cochlear implants" at the 2017 Conference on Implantable Auditory Prostheses, Lake Tahoe, CA, United States, July 2017; "Neural Interface: Frontiers and Applications" in Advances in Experimental Medicine and Biology, Volume 1101; and "Speech intelligibility test of "eVoice", a new noise-reduction algorithm in Nurotron Enduro systems" at the 2019 Asia Pacific Symposium on Cochlear Implants and Related Sciences, Tokyo, Japan, November 2019.

# INTRODUCTION


The cochlear implant (CI) is one of the most successful prostheses ever developed and aims to rehabilitate hearing by transmitting acoustic information into the brains of people with severe to profound hearing impairment by electrically stimulating auditory nerve fibers (Shannon, 2014). The artificial electric hearing provided by current CIs is useful for speech communication but is still far from satisfactory compared with normal hearing (NH), especially in the aspect of speech-in-noise recognition.

The noise issue is a common complaint of CI users (e.g., Ren et al., 2018). Because of variability associated with implant surgery time, hearing history, rehabilitation and training, surgical conditions, devices and signal processing, and so on, large differences in hearing abilities have always been reported within any group of CI users. These reasons behind the CI-NH gap and intersubject CI variance may be classified into "top-down" and "bottom-up" types (Moberly and Reed, 2019; Tamati et al., 2019).

From a practical standpoint, knowledge about "top-down" memory and cognition is useful for rehabilitation and making surgical decisions (Kral et al., 2019), whereas the relationship between speech performance and the "bottom-up" signal processing functions, especially those on the electrode interface, determines the engineering approaches used in current CI systems (Wilson et al., 1991; Loizou, 1999, 2006; Rubinstein, 2004; Zeng, 2004; Zeng et al., 2008; Wouters et al., 2015; Nogueira et al., 2018). Although it has been suggested that the "top-down" approach be incorporated into CI systems to form an adaptive closed-loop neural prosthesis (Mc Laughlin et al., 2012), we only introduce "bottom-up"–related techniques that might be useful for CI users to tackle the problem of noise masking, as discussed below.

How to send more useful information upward? Sound pressure waveforms are decomposed by healthy cochleae into fine temporal-spectral "auditory images". CIs attempt to capture and deliver the same images but, unfortunately, in a coarse way. Theories in grouping, scene analysis, unmasking, and attention have demonstrated the significance of precise coding of acoustic cues including pitch or resolved harmonics, common onset, and spatial cues. For most CI systems, only temporal envelopes from a limited number of channels can be transferred to the nerve, and current interactions between channels are a key limitation of the multichannel CI framework.

Several research directions have been explored to improve the CI recognition performance of speech in noise by updating the technology of contemporary multichannel devices: (1) stimulating auditory nerves in novel physical ways such as optical stimulation (Jeschke and Moser, 2015) and penetrating nerve stimulation (Middlebrooks and Snyder, 2007); (2) developing intracochlear electrode arrays with different lengths, electrode shapes, and mechanical characteristics (Dhanasingh and Jolly, 2017; Rebscher et al., 2018; Xu et al., 2018); (3) steering and focusing the current spread by simultaneously activating multiple electrodes (Berenstein et al., 2008; Bonham and Litvak, 2008); (4) refining the strategies in the temporal domain by introducing harmonics (Li et al., 2012), timing of zero crossings (Zierhofer, 2003) or peaks (Van Hoesel, 2007), and slowly varying temporal fine structures (Nie et al., 2005; Meng et al., 2016); and (5) enhancing speech or suppressing noise before or within the core signal processing strategies. The first and second directions are developed from the perspective of neurophysiology; the third is mainly based on psychophysical tests; the fourth uses a combination of signal processing and psychophysics, and the fifth mainly concentrates on signal processing. All of these aspects are worth further investigation.

In the last two decades, the fifth approach of enhancing speech or suppressing noise before or within the core signal processing strategies has become a hot topic in academic and industrial research. Noise reduction and speech enhancement are two sides of the same coin, and the goal is to improve intelligibility or quality of speech in noise, in most cases with a signal-to-noise ratio (SNR) enhancement signal processing system. Some noise reduction techniques in telecommunications and hearing aids have been used to process noisy speech signals, and then the processed signals are presented through loudspeakers to CI users (e.g., classic single-channel spectral subtraction) (Yang and Fu, 2005) for feasibility verification. Now there are more sophisticated single-channel noise-reduction algorithms (NRAs) (Chen et al., 2015), directional microphone, or multimicrophone-based beamformers of hearing aids (Chung et al., 2004; Buechner et al., 2014), and more recently deep neural network–based algorithms (Lai et al., 2018; Goehring et al., 2019) that have been tried with CI listeners. Another line of research is to specifically optimize algorithm parameters with a consideration of the differences between CI and NH listeners. The parameters are generally related to the noise estimation or gain function for noise reduction (Hu et al., 2007; Kasturi and Loizou, 2007; Mauger et al., 2012a,b; Wang and Hansen, 2018). All these studies demonstrated significant improvements, which can be explained by the higher SNR yielded by the techniques before or within the CI core strategies.

In the newest versions of CI processors from current commercial companies such as Cochlear® (Hersbach et al., 2012), Advanced Bionics® (Buechner et al., 2010), and MED-EL® (Hagen et al., 2019), one or more SNR-based monaural noise reduction algorithms and spatial cue-based directional microphones or multimicrophone beamformers have been implemented and evaluated. Multimicrophone beamformers significantly improve speech intelligibility for CI recipients in noise. However, this benefit rests on the assumption that target speech and noise sources are spatially separated. Thus, single-microphone NRAs in CI systems are still worthy of attention to improve speech perception in noise, especially in scenarios where the target speech and noise sources are not spatially separated.

Some single-microphone NRAs that are already implemented in commercial CI products have been reported in the literature. ClearVoice is a monaural NRA implemented with the HiRes 120 speech processing strategy (Buechner et al., 2010; Holden et al., 2013). It first estimates noise by assuming that speech energy changes frequently in amplitude, whereas background noise energy is less modulated. Then, gain is reduced for channels identified as having mainly noise energy. The noise estimation operates over a time window of 1.3 s, which is the activation time of this algorithm. Experiments showed that ClearVoice can improve speech intelligibility in stationary noise (Buechner et al., 2010; Kam et al., 2012). Another monaural NRA is implemented with the ACE (advanced combination encoder) strategy in Nucleus devices. It uses a minimum statistics algorithm with an optimal smoothing method for noise estimation (Martin, 2001) and an a priori SNR estimate (McAulay and Malpass, 1980) in conjunction with a modified Wiener gain function (Loizou, 2007). It was reported to significantly improve hearing in stationary noise (Dawson et al., 2011).

We introduce a recently developed single-channel estimated-SNR–based NRA, termed "eVoice," which has been implemented in the second-generation research processor Enduro™ of Nurotron. Nurotron, a young company based in Irvine, CA, United States, and Hangzhou, Zhejiang, China, currently has more than 10,000 patients implanted. The Nurotron system has 24 electrode channels, and its users' speech performance in quiet and postsurgery development status are comparable with previous data from other brands (Zeng et al., 2015; Gao et al., 2016). The noise estimation in eVoice is processed on a frame-by-frame basis, that is, using a relatively short time window. eVoice is based on classical signal processing algorithms, and the Nurotron system is not the first CI device to use this kind of approach. The aims of this study include reporting the intelligibility experiment results for eVoice and rethinking noise management of a new CI system, which in this case is the Nurotron system.

#### EVOICE OF NUROTRON: A SINGLE-CHANNEL NRA

The default core strategy of Nurotron is the advanced peak selection (APS) strategy, which is similar to an "n-of-m" strategy (Zeng et al., 2015). The APS strategy is based on a short-time Fourier transform (STFT) and typically selects eight maxima (an automatic process defined in the coding strategy) for stimulation in each frame (Ping et al., 2017). A block diagram of the APS strategy and eVoice is shown in **Figure 1**. In APS, the acoustic input signal is first preamplified, followed by bandpass filtering (the band number m typically equals the active electrode number, i.e., m = 24 in Nurotron devices) and envelope calculation. Then, in peak selection, the n bands with the largest amplitude are selected for further nonlinear compression and electrical stimulation (typically, n = 8 in Nurotron devices). eVoice is an envelope-based noise reduction method implemented between envelope calculation and peak selection. It consists of two steps: noise estimation and gain calculation (Wang et al., 2017).
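The peak-selection step of an "n-of-m" strategy can be sketched as follows. This is a generic illustration, not Nurotron's APS code: the function name and the envelope values are hypothetical, and the preamplification, filtering, compression, and stimulation stages are omitted.

```python
def select_peaks(envelopes, n=8):
    """Keep the n largest band envelopes of one frame and zero the rest."""
    # Rank band indices by envelope amplitude, largest first.
    ranked = sorted(range(len(envelopes)), key=lambda k: envelopes[k], reverse=True)
    keep = set(ranked[:n])
    return [e if k in keep else 0.0 for k, e in enumerate(envelopes)]


# Made-up envelopes for one frame with m = 24 bands.
frame = [0.1 * k for k in range(24)]
selected = select_peaks(frame, n=8)
```

Because eVoice sits between envelope calculation and peak selection, a per-band noise-reduction gain would be applied to the envelopes before they are passed to a selection step like this one, so that bands dominated by noise are less likely to be chosen.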

#### Noise Estimation

The noise estimation algorithm is based on an improved minima-controlled recursive averaging (MCRA-2) algorithm (Rangachari and Loizou, 2006). To reduce response time, noise power in each channel is estimated on a frame-by-frame basis rather than over a time window spanning several frames.

Suppose that the noise is additive, then in the time domain, the input signal y(n) can be denoted as

$$y(n) = x(n) + d(n) \tag{1}$$

where x(n) is the clean speech signal and d(n) is the additive noise signal. We use Y(λ, k), i.e., the STFT of y(n), to represent the summation magnitude of channel k in frame λ in the frequency domain. The power spectrum of the noisy signal can be smoothed and updated on a frame-by-frame basis using the recursion below:

$$P\left(\lambda,k\right) = \eta P\left(\lambda - 1,k\right) + \left(1 - \eta\right) \left|Y(\lambda,k)\right|^2\tag{2}$$

where η is a smoothing factor. Then, the local minimum of the power spectrum in each channel can be tracked as follows:

$$P_{\min}(\lambda, k) = \begin{cases} P(\lambda, k), & P_{\min}(\lambda - 1, k) \ge P(\lambda, k) \\ \gamma P_{\min}(\lambda - 1, k) + \frac{1 - \gamma}{1 - \beta} \left(P(\lambda, k) - \beta P(\lambda - 1, k)\right), & P_{\min}(\lambda - 1, k) < P(\lambda, k) \end{cases} \tag{3}$$

where Pmin(λ, k) is the local minimum of the noisy speech power spectrum, and β and γ are constant parameters. The ratio of noisy speech power spectrum to its local minimum can be calculated as follows:

$$S_r\left(\lambda,k\right) = \frac{P(\lambda,k)}{P_{\min}(\lambda,k)}\tag{4}$$

This ratio is compared against a threshold T(λ, k) to determine the speech-presence probability I(λ, k) using the criterion below:

$$I(\lambda,k) = \begin{cases} 1, & S_r(\lambda, k) \ge T(\lambda, k) \\ 0, & S_r(\lambda, k) < T(\lambda, k) \end{cases} \tag{5}$$

where T(λ, k) is a threshold that is dynamically updated according to the estimated SNR of the previous frame. In the literature, this threshold is set to a constant; our pilot data showed that dynamic thresholds performed better than constant ones, so dynamic thresholds were adopted.

This speech-presence probability I(λ, k) can be smoothed as follows:

$$K\left(\lambda,k\right) = \alpha K\left(\lambda - 1,k\right) + (1-\alpha)I(\lambda,k) \tag{6}$$

where K(λ, k) is the smoothed speech-presence probability, and α is a smoothing constant. The smoothing factor to be used for noise estimation can be updated using the above calculated speech-presence probability:

$$\alpha_{s}(\lambda, k) = \alpha_{d} + (1 - \alpha_{d})K(\lambda, k) \tag{7}$$

where α<sub>s</sub> is the smoothing factor to be used for noise estimation, and α<sub>d</sub> is a constant. Finally, the noise power of each channel is estimated as follows:

$$D\left(\lambda,k\right) = \alpha_s\left(\lambda,k\right)D(\lambda-1,k) + \left(1-\alpha_s\left(\lambda,k\right)\right)\left|Y(\lambda,k)\right|^2\tag{8}$$
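Equations 2 to 8 can be read as a per-channel update applied once per frame. The sketch below is one possible Python rendering under stated assumptions: the minimum-tracking branch of Eq. 3 is taken when the previous minimum lies below the current smoothed power, Eq. 6 uses the previous frame's smoothed probability on its right-hand side, and the dynamic threshold T(λ, k) is supplied by the caller. The function name, the default constants, and the first-frame initialisation are illustrative and are not values reported by the authors.

```python
def make_noise_tracker(eta=0.7, gamma=0.998, beta=0.96, alpha=0.2, alpha_d=0.85):
    """Build a per-channel noise-power tracker (all defaults are hypothetical)."""
    state = {"P": None, "P_min": None, "K": 0.0, "D": None}

    def update(power, threshold):
        """power is |Y(lambda, k)|^2 for this frame; returns the noise estimate D."""
        if state["P"] is None:
            # First frame: start every recursion from the observed power.
            state.update(P=power, P_min=power, D=power)
            return state["D"]
        p_prev = state["P"]
        p = eta * p_prev + (1 - eta) * power                       # Eq. 2
        if state["P_min"] < p:                                      # Eq. 3
            p_min = (gamma * state["P_min"]
                     + (1 - gamma) / (1 - beta) * (p - beta * p_prev))
        else:
            p_min = p
        # Guard against a non-positive minimum (an assumption of this sketch).
        s_r = p / p_min if p_min > 0 else float("inf")             # Eq. 4
        speech = 1.0 if s_r >= threshold else 0.0                   # Eq. 5
        k_smooth = alpha * state["K"] + (1 - alpha) * speech        # Eq. 6
        alpha_s = alpha_d + (1 - alpha_d) * k_smooth                # Eq. 7
        d = alpha_s * state["D"] + (1 - alpha_s) * power            # Eq. 8
        state.update(P=p, P_min=p_min, K=k_smooth, D=d)
        return d

    return update


track = make_noise_tracker()
for _ in range(50):
    steady = track(1.0, threshold=5.0)   # stationary background
burst = track(100.0, threshold=5.0)      # sudden speech-like onset
```

With these made-up settings, the noise estimate stays at the background level during the stationary frames and rises only slightly on the onset frame, because a high speech-presence value pushes α<sub>s</sub> toward 1 and slows the update.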

#### Gain Function for Noise Reduction

Using the estimated noise power, the SNR can be estimated according to

$$\text{SNR}\left(\lambda,k\right) = \varepsilon\,\text{SNR}\left(\lambda - 1,k\right) + (1-\varepsilon) \frac{P(\lambda,k)}{D(\lambda,k)} - 1\tag{9}$$

where ε is a smoothing factor. Then, we use a gain function of the form:

$$G\left(\lambda,k\right) = \frac{\text{SNR}\left(\lambda,k\right)}{\text{SNR}\left(\lambda,k\right) + 1} \tag{10}$$

To suppress the noise to the maximum extent, the gain can be further adjusted:

$$G_0\left(\lambda,k\right) = \begin{cases} g, & G\left(\lambda,k\right) < T_g \\ G\left(\lambda,k\right), & G\left(\lambda,k\right) \ge T_g \end{cases} \tag{11}$$

where g is a small constant, and T<sub>g</sub> is a dynamic threshold determined by the SNR. T<sub>g</sub> is also one of the key factors that determine algorithm sensitivity.

Finally, the signal power after noise reduction is as follows:

$$S\left(\lambda,k\right) = G_0\left(\lambda,k\right)P(\lambda,k)\tag{12}$$
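The gain stage in Eqs. 10 to 12 can be sketched in a few lines of Python. The example takes an SNR estimate as input rather than computing it, and the function names, the floor value, the threshold value, and the clamping of negative SNR estimates are all assumptions made for illustration.

```python
def suppression_gain(snr, gain_threshold, floor_gain=0.05):
    """Gain for one channel and frame (hypothetical helper)."""
    # Assumption: a negative SNR estimate is treated as zero.
    snr = max(snr, 0.0)
    gain = snr / (snr + 1.0)                      # Eq. 10
    # Eq. 11: gains below the threshold are replaced by a small constant.
    return gain if gain >= gain_threshold else floor_gain


def denoised_power(power, snr, gain_threshold, floor_gain=0.05):
    """Eq. 12: scale the smoothed power by the adjusted gain."""
    return suppression_gain(snr, gain_threshold, floor_gain) * power


high_snr_gain = suppression_gain(3.0, gain_threshold=0.3)   # speech-dominated frame
low_snr_gain = suppression_gain(0.25, gain_threshold=0.3)   # noise-dominated frame
```

In this sketch, raising `gain_threshold` sends more frames to the floor value, which is one way to read the remark that the threshold governs the sensitivity of the algorithm.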

#### Example

An example of eVoice working in a speech-shaped noise (SSN) at +5 dB SNR is shown in **Figure 2**. eVoice was implemented with the APS coding strategy with a channel selection of 8-of-24 at a sampling rate of 16,000 Hz. **Figure 2** shows the power comparison in the eighth channel, including the signals for clean speech, noisy speech, processed speech, and estimated noise plotted in different colors.

### EXPERIMENT 1: SUBJECTIVE PREFERENCE AND SPEECH RECOGNITION IN NOISE

This experiment was designed to evaluate speech intelligibility with eVoice (denoted by "NR1") compared with another NRA (denoted by "NR2") that uses binary masking for noise reduction, as well as with the APS strategy with no NRA (denoted by "APS"). NR2 uses the same noise estimation method as NR1, as described in Noise Estimation. After noise estimation, NR2 calculates an SNR that is used to set the gain: if the SNR is higher than a threshold, the gain is set to 1 (speech dominant); otherwise, it is set to a small constant (noise dominant). NR2 was selected for comparison because it is as computationally efficient as eVoice and because ideal binary masking has been studied in other CI systems (Mauger et al., 2012b). Speech intelligibility was measured with a speech-in-noise recognition test and a subjective rating questionnaire.
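The difference between the two gain rules can be made concrete with a short Python sketch. The threshold of 1.5 and the floor of 0.05 are hypothetical values chosen for illustration, and the smooth rule is written in the SNR / (SNR + 1) form of Eq. 10 without the additional thresholding of Eq. 11.

```python
def binary_mask_gain(snr, snr_threshold, floor_gain=0.05):
    """NR2-style rule: unity gain when speech dominates, a small constant otherwise."""
    return 1.0 if snr > snr_threshold else floor_gain


def smooth_gain(snr):
    """Smooth rule of the form SNR / (SNR + 1), shown for comparison."""
    snr = max(snr, 0.0)  # assumption: negative estimates are clamped to zero
    return snr / (snr + 1.0)


# Print both gains for a few made-up SNR values (linear scale).
for snr in (0.5, 1.0, 2.0, 4.0):
    print(snr, binary_mask_gain(snr, snr_threshold=1.5), round(smooth_gain(snr), 2))
```

The binary rule switches abruptly at the threshold, whereas the smooth rule changes gradually with the SNR.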

#### Methods

#### Participants

This experiment involved 11 experienced CI users (six females and five males), aged from 20 to 59 years (mean age = 41.2 years). All were postlingually deafened adults unilaterally implanted with a CS-10A implant and using a Venus™ sound processor (i.e., first generation) programmed with the APS strategy. In this experiment, an Enduro™ sound processor was fitted instead of the Venus™. An option on the remote control selects whether an NRA is used (one with NR1, eVoice, and the other with NR2, binary masking). Demographics for individual participants are presented in **Table 1**. All participants' native language was Mandarin Chinese, and participants were paid for their time and traveling expenses. Written informed consent was obtained before the experiment, and all procedures were approved by the local institution's ethical review board.

#### Procedures and Materials

In this experiment, NR1 and NR2 performances were assessed first in a subjective evaluation, followed by a speech-in-noise recognition test.

The subjective evaluation lasted for 2 weeks. At the beginning of week 1, participants were fitted with an Enduro™ processor incorporating NR1 and were asked to complete a 1-week take-home trial. During that week, participants were free to turn NR1 on and off and use it in various everyday listening scenarios. At the end of week 1, subjective ratings were collected using the questionnaire shown in **Table 2**. Similar procedures were followed for NR2 in week 2. The questionnaire consists of eight questions that cover various everyday listening scenarios. A 5-point rating scale was used to collect participants' subjective ratings of NR1 or NR2 in each listening scenario after each 1-week take-home use: 2, strongly agree; 1, agree; 0, neutral; -1, disagree; -2, strongly disagree.

In the test of speech recognition in noise, we used two noise types (an SSN and a babble noise) at three SNRs (5, 10, and 15 dB) to compare the three algorithms (APS, NR1, and NR2). This yielded a total of 21 test blocks (two noise types × three SNRs × three algorithms + baselines of the three algorithms in quiet). The three baseline blocks (three algorithms in quiet) were conducted first in a random order, followed by the remaining 18 blocks in a random order. We used sentence materials from two published Mandarin speech databases: the PLA General Hospital sentence recognition test (Xi et al., 2012) corpus and the House Research Institute sentence recognition test (Fu et al., 2011) corpus. The PLA General Hospital corpus consists of 12 lists each with 11 sentences, and each sentence includes six to eight key words. The House Research Institute corpus comprises 10 lists each with 10 phonetically balanced sentences, and each sentence contains seven words. All sentences were read by female speakers. Eleven of the 12 lists in the PLA General Hospital corpus and all lists in the House Research Institute corpus were used.

Abbreviations: CI, cochlear implant; F, female; L, left; M, male; R, right.

Because of the limited number of material lists, different lists from the PLA General Hospital and House Research Institute corpora were randomly assigned to blocks for each participant, with one list for each block. Special care was taken to ensure that the blocks of each algorithm used lists from the same corpus. In each block, sentences were presented in a random order, and a percent-correct word score was calculated. Stimuli were presented in a soundproof room through a loudspeaker located 1 m in front of the participant at a comfortable level (approximately 65 dBA). The tests were administered using QuickSTAR4TR software developed by Qianjie Fu (Emily Fu Foundation, 2019).
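The block structure described above (three baseline blocks in quiet, followed by 18 noise blocks, each group in random order) can be sketched as follows. This is only an illustration of the ordering: the function name and the seed argument are assumptions, and the assignment of sentence lists to blocks, including the same-corpus constraint, is deliberately left out.

```python
import itertools
import random

ALGORITHMS = ("APS", "NR1", "NR2")
NOISE_TYPES = ("SSN", "babble")
SNRS_DB = (5, 10, 15)


def block_order(seed=None):
    """Return the 21 test blocks for one participant as (algorithm, noise, SNR) tuples."""
    rng = random.Random(seed)
    # Three baseline blocks in quiet, one per algorithm.
    baseline = [(alg, "quiet", None) for alg in ALGORITHMS]
    # 3 algorithms x 2 noise types x 3 SNRs = 18 blocks in noise.
    noisy = list(itertools.product(ALGORITHMS, NOISE_TYPES, SNRS_DB))
    rng.shuffle(baseline)
    rng.shuffle(noisy)
    # Baseline blocks always come first, each group in its own random order.
    return baseline + noisy


blocks = block_order(seed=1)
```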

#### Statistical Analysis

Repeated-measures one-way analysis of variance (ANOVA) was used to analyze speech recognition in quiet. Repeated-measures three-way ANOVAs were performed to assess speech recognition in noise. Bonferroni adjustments were used for multiple comparisons.

#### Results

#### Subjective Evaluation Test

**Figure 3** shows the results of the subjective ratings for NR1 and NR2.

For NR1 (i.e., eVoice), there were many positive ratings and few negative ones. Most participants gave positive ratings to Q2, Q4, Q5, and Q6, which indicated a better listening experience with NR1 on than off in scenarios such as multitalker communication, at an intersection, and in a vehicle. For Q3, Q7, and Q8, most participants gave neutral ratings; these questions corresponded to scenarios such as in a restaurant or supermarket and near an air conditioner or fan. This result suggested comparable performance between NR1 on and off in these settings. A few participants gave positive ratings to Q3, Q7, and Q8 (better experience with NR1 turned on in listening scenarios such as a one-on-one conversation in a restaurant, by an air conditioner or fan, or in a busy supermarket). For listening in quiet, most participants reported that NR1 had no effect on a one-on-one conversation and gave positive ratings to Q1 (the NRA had no effect on one-on-one conversations in quiet rooms).

For NR2 (i.e., binary masking), the feedback was more variable. In general, ratings were almost evenly distributed between negative and positive for all eight questions except Q2 and Q8, which means that there were participants who thought NR2 was helpful in most listening scenarios. However, comparable numbers of participants thought it was not helpful or were neutral. For Q2 and Q8, most participants gave neutral ratings, which indicates that most thought NR2 had no effect for multitalker communication in quiet or a one-on-one conversation in a supermarket.

TABLE 2 | Questionnaire used for subjective evaluation.

NRA refers to the noise-reduction algorithm to be evaluated (i.e., NR1 in week 1 or NR2 in week 2).

#### Speech Intelligibility Test

Results of speech recognition in quiet are shown in **Figure 4**. A repeated-measures one-way ANOVA revealed no significant difference among the mean results (∼90%) of the three algorithms (p = 0.452).

**Figure 5** shows the results of speech recognition in the SSN and babble noise. Statistical significance was determined using ANOVA with the percent correct scores as the dependent variable and the noise type (SSN or babble), SNR (5, 10, or 15 dB), and algorithm (APS, NR1, or NR2) as within-subject factors. Tests of within-subjects effects indicated significant effects of noise type (p = 0.022), SNR (p < 0.001), and algorithm (p = 0.002), as well as a significant interaction between noise type and SNR (p < 0.001). Pairwise comparisons revealed that the overall performance of NR1 was significantly better than that of APS (p = 0.001) and NR2 (p = 0.016), and there was no significant difference between APS and NR2 (p = 0.612). When noise type and SNR were fixed to determine the effect of algorithms at specific SNRs in a particular noise type, NR1 performed significantly better than NR2 at the 5-dB SNR in the SSN (p = 0.010) and also significantly better than APS (p = 0.027) at the 5-dB SNR in the babble noise. In both the SSN and babble noise at the SNRs of 10 and 15 dB, there were no significant differences among the three algorithms. However, NR1 had higher mean scores than APS and NR2 at the 10-dB SNR in SSN (nearly eight percentage points), as well as at the 10-dB SNR (eight percentage points higher than APS) and 15-dB SNR (∼5 percentage points higher than APS and NR2) in the babble noise, although these improvements were not statistically significant.

FIGURE 3 | Results of subjective evaluations of NR1 (left panel) and NR2 (right panel). The abscissa lists all eight questions used for the subjective evaluation, and the ordinate is the rating given by the participants. Along the ordinate, "-2" represents strong disagreement on the question, and "2" represents strong agreement. The larger the number, the more positive the subjective evaluation is that the NRA could help in different noisy scenarios and did not impact listening in quiet settings. The size of the circles represents the number of participants who gave the corresponding ratings, with larger circles indicating more participants.

#### Short Summary

In this experiment, we tested two NRAs: eVoice (NR1) and another that used binary masking (NR2). Both use the same noise estimation process but differ in the noise cancelation process. NR1 uses a smoothing gain function, whereas NR2 uses binary masking. The subjective evaluation ratings show that NR1 was positively reviewed, whereas ratings of NR2 were almost evenly distributed from negative to positive, with a slight dominance of neutral responses. The speech recognition test results indicate overall better performance of NR1 compared to NR2 and APS. However, a significant benefit was only found at 5-dB SNR. The above results demonstrate that NR1 performed better than NR2 in both the speech recognition tests and the subjective evaluations.

### EXPERIMENT 2: SPEECH RECEPTION THRESHOLD TEST

#### Rationale

The hypothesized significant benefit of eVoice was not always supported by the results of the first experiment. One reason may be the fixed-SNR procedure combined with the large performance variance in the cohort. From the results of Experiment 1 (left panel in **Figure 5**), we noticed that a ceiling effect could be observed in some participants at the SNR of 15 dB, and a floor effect could be observed at the SNR of 5 dB. Speech perception in noise varied dramatically among participants, even at the same SNR in the same noise. This indicates a limitation of testing percent correct scores at fixed SNRs because this type of test is not able to exclude potential ceiling and floor effects. To overcome this limitation, we designed Experiment 2, which used an adaptive speech reception threshold (SRT) test to measure the potential benefits of eVoice.

In the first experiment, NR1 (i.e., eVoice) clearly provided better performance than NR2 (i.e., the binary-masking algorithm) in the subjective test, although little improvement was observed in the speech-in-noise recognition test. To further explore the potential of eVoice and to save experiment time, only NR1 was evaluated in the second experiment.

#### Methods

#### Participants

Eight experienced CI users were recruited for this experiment (five females and three males, aged from 23 to 62 years with a mean of 43.6 years). All spoke Mandarin Chinese as their native language. They were all postlingually deafened adults unilaterally implanted with a CS-10A implant and used Enduro™ devices as their clinical processors, programmed with the APS coding strategy with a remote control option to switch eVoice on or off. Demographic data for individual participants are presented in **Table 3**. Participants were compensated for their time and traveling expenses. All provided informed consent before the experiment, and all procedures were approved by the local institution's ethical review board.

#### Materials and Procedures

An adaptive staircase SRT-in-noise test was administered to further evaluate the performance of eVoice. This SRT measurement method was adopted from our previous studies (Meng et al., 2016, 2019) with two minor changes: (1) the maximum number of presentations of each stimulus was reduced from three to two, and (2) the correctness criterion was changed from 50% of the words in a sentence to 80%. The first change was made to reduce experiment time. The second was made to track a higher threshold, which is more indicative of true understanding. Therefore, we were actually tracking the threshold at which subjects have a 50% chance of repeating 80% of the words correctly.

The Mandarin Hearing in Noise Test (MHINT) corpus (Wong et al., 2007) recorded by a single male speaker was used. There are 12 lists for formal tests and 2 lists for practice, with 20 sentences in each list, and 10 words in each sentence. In this experiment, 10 of 12 formal test lists were used as target speech in the formal tests, and both practice lists were used in the training stage to familiarize participants with the test procedures.

The SRTs for each condition with and without eVoice were tested. For each condition, two types of background noise were used: SSN and babble noise, which were generated using the method described in section "Speech Stimuli and Tasks" of Experiment 2 in Meng et al. (2019). The SRT for each condition–background combination was tested twice using two different MHINT lists, and the results were averaged between the two lists as the final SRT. Speech intelligibility for each condition in quiet was also measured using one MHINT list. Therefore, a total of 10 lists were used for testing (two backgrounds × two conditions × two lists per combination + two lists for speech intelligibility in quiet). The order of lists and conditions was randomized across participants. Prior to the formal test, two practice lists were used to familiarize participants with the test procedures of the SRT and the speech intelligibility in quiet test. During the test, each sentence was presented at most twice on the request of the participants; participants were instructed to repeat words that form a sentence with a meaning, and no feedback was given.

The SNR in each trial was adapted by changing the level of target speech with fixed background noise. Participants were instructed to repeat as many words as they could, and the target level was decreased if no less than eight of the words were repeated correctly; otherwise, the target level was increased. The step size was 8 dB before the second reversal, followed by 4 dB before the fourth reversal and 2 dB for the remaining reversals. The arithmetic mean of the SNRs of the last eight sentences was calculated and recorded as the final SRT.
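The adaptive rule described above can be sketched in Python as follows. The listener is modeled by a caller-supplied function, and the starting SNR, the 20-sentence list length, and the exact point at which the step size switches (here, as soon as the second and the fourth reversal have been counted) are assumptions of this sketch rather than details confirmed by the text.

```python
def run_staircase(words_correct_at, start_snr_db=20.0, n_sentences=20):
    """Simulate one adaptive SRT track and return the SRT estimate in dB.

    words_correct_at(snr_db) is a hypothetical listener model returning how
    many of the 10 words in a sentence were repeated correctly.
    """
    snr = start_snr_db
    presented = []          # SNR used for each sentence
    reversals = 0
    last_direction = None
    for _ in range(n_sentences):
        presented.append(snr)
        # At least 8 of 10 words correct: make the task harder (lower SNR).
        direction = -1 if words_correct_at(snr) >= 8 else +1
        if last_direction is not None and direction != last_direction:
            reversals += 1
        last_direction = direction
        # 8 dB before the second reversal, 4 dB before the fourth, then 2 dB.
        step = 8.0 if reversals < 2 else 4.0 if reversals < 4 else 2.0
        snr += direction * step
    # The SRT is the arithmetic mean SNR of the last eight sentences.
    return sum(presented[-8:]) / 8


# Deterministic toy listener: all words correct at 6 dB SNR or above, none below.
srt = run_staircase(lambda snr_db: 10 if snr_db >= 6 else 0)
```

For this toy listener, the track settles into alternating between 4 and 6 dB, so the estimate is the midpoint of those two levels.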

It is worth mentioning that the babble noise used in this study consisted of voices of the same talker as the target speech (Meng et al., 2019), which is extremely challenging for any NRA. Additional information about the procedures and materials can be obtained from Meng et al. (2016, 2019).

#### Results

The eight CI users listed in **Table 3** participated in this experiment, but N17 was found to have auditory neuropathy. Therefore, N17 data were excluded from the analyses.

Results of speech recognition in quiet are shown in **Figure 6**. The group mean scores were 93.1 and 93.3% for eVoice-off and eVoice-on, respectively. A two-tailed paired-samples t-test showed no significant difference between the two conditions (t(6) = −0.162, p = 0.877).

**Figure 7** shows the results of the SRTs in the SSN (left panel) and babble noise (right panel). In the SSN, every participant had lower SRTs with eVoice-on than with eVoice-off. The group mean SRTs were 7.9 and 5.7 dB for eVoice-off and eVoice-on, respectively. This 2.2-dB difference was a statistically significant improvement (t(6) = 6.892, p < 0.001).

In the babble noise, group mean SRTs of 10.9 and 10.7 dB were observed for eVoice-off and eVoice-on, respectively. A two-tailed paired-samples t-test revealed no significant difference between the two conditions (t(6) = 0.249, p = 0.812).
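For readers unfamiliar with the paired-samples t statistic reported above, a minimal Python sketch is given below. The input values are made up for illustration and are not the study data; the function name is an assumption. With seven participants the statistic has 7 - 1 = 6 degrees of freedom, which matches the t(6) notation used in the text.

```python
import math


def paired_t(condition_a, condition_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / math.sqrt(var / n)
    return t, n - 1


# Made-up paired scores for three hypothetical listeners.
t_value, dof = paired_t([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
```

Converting the statistic to a p-value requires the t distribution, which is not part of the Python standard library; a statistics package would normally be used for that step.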

#### Short Summary

The aim of this experiment was to quantify the benefit introduced by eVoice for speech intelligibility and exclude potential ceiling and floor effects. Speech intelligibility was measured using an adaptive SRT test with two different backgrounds: SSN and babble noise. There was no significant difference in speech recognition rates in quiet settings. This result indicates that eVoice would not affect speech perception in quiet. eVoice yielded an SRT decrease of 2.2 dB in SSN, whereas no significant effect was found in SRTs in babble noise.



Abbreviations: CI, cochlear implant; F, female; L, left; LAVS, large vestibular aqueduct syndrome; M, male; R, right.

FIGURE 7 | Results of SRT in the SSN (left panel) and babble noise (right panel). Individual SRTs are shown on the left, and the group mean SRTs are shown on the right. Error bars show the standard error of group means. The significant difference is illustrated by the asterisk (p < 0.05).

### DISCUSSION

In this study, we examined eVoice, the first noise-suppression technique in Nurotron® CIs. eVoice is a single-channel NRA implemented within the APS strategy in the Enduro processor. Two experiments were conducted to evaluate this algorithm. First, in Experiment 1 (N = 11), the performance of eVoice was compared with that of a binary-masking method in a speech recognition test and a subjective evaluation. eVoice performed slightly better than the binary-masking NRA. Then, the more indicative adaptive SRT test was conducted to quantify the noise reduction effect of eVoice on speech intelligibility in Experiment 2 (N = 7). Comparing eVoice on and off, there was a 2.2-dB SRT benefit in stationary noise and no difference in quiet and non-stationary noise.

The performance of eVoice is comparable with that of other single-channel NRAs implemented in CI strategies reported in the literature. For example, a single-channel NRA implemented in the ACE strategy was found to have an SRT benefit of up to 2.14 dB in stationary noise (Dawson et al., 2011). The ClearVoice implemented in the HiRes 120 strategy used a time window of 1.3 s for noise estimation and yielded a percent correct score increase of up to 24 percentage points (Buechner et al., 2010). This may translate to a 1.3- to 3.4-dB SRT decrease, given that, for typical speech materials, a 1-dB SRT decrease leads to a 7- to 19-percentage-point increase in the percent correct score (Moore, 2007). However, significant benefits in non-stationary noise are seldom reported in the literature, which may indicate a limit of traditional single-channel NRAs. More advanced techniques should be developed to improve speech perception in non-stationary noise for CI users.
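The conversion from a percent correct gain to an equivalent SRT change is a simple division by the slope of the psychometric function. The short Python sketch below reproduces the arithmetic; the function name is an assumption, and the slopes of 7 and 19 percentage points per dB are the values cited from Moore (2007).

```python
def srt_equivalent_db(score_gain_points, slope_low=7.0, slope_high=19.0):
    """Return the (smallest, largest) SRT decrease in dB implied by a score gain.

    Slopes are in percentage points per dB; a steeper slope implies a smaller
    equivalent SRT change.
    """
    return score_gain_points / slope_high, score_gain_points / slope_low


# A 24-percentage-point gain maps to roughly 1.3 dB (steep slope) up to 3.4 dB (shallow slope).
smallest, largest = srt_equivalent_db(24.0)
```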

This article is significant from the implantees' and the audiologists' perspectives. For a new system with a quickly growing number of users, this report on eVoice is useful for understanding the system and the new noise reduction method. For a new NRA in CIs, two questions are of great concern to users and audiologists: (1) whether this NRA really works in various types of noises and (2) to what extent users can benefit from it. Our results demonstrate that eVoice can improve speech intelligibility in stationary noise and does not affect speech perception in quiet and non-stationary noise. This is because eVoice is a monaural SNR estimation–based algorithm that assumes that the noise is relatively stationary compared with speech. We found that some users of the Enduro processor might not have noticed the existence of this NRA; their audiologists can advise or remind them to turn eVoice on to improve their speech perception performance in noise.

Another significant contribution of this article is to inspire people to rethink noise management for CI systems. Researchers should consider the assumptions about directionality and complex non-linear patterns that can be computationally modeled by signal processing or machine learning (e.g., Bianco et al., 2019; Gong et al., 2019). Previous studies and the present work provide considerable support for optimizing and updating noise-suppression techniques to improve speech-in-noise recognition for CI users.

# REFERENCES


# DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

# ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Medical Ethics Committee of Shenzhen University. The patients/participants provided their written informed consent to participate in this study.

# AUTHOR CONTRIBUTIONS

All authors conceived and designed the analysis and wrote the manuscript. HZ and NW collected the data.

# FUNDING

This work was supported by the National Natural Science Foundation of China (11704129, 11574090, and 61771320), Natural Science Foundation of Guangdong, China (2020A1515010386 and 2018B030311025), and Shenzhen Science and Innovation Funds (JCYJ 20170302145906843).

# ACKNOWLEDGMENTS

We are grateful to all the CI participants for their patience and cooperation during this study. We would like to thank Wenhe Tu, Sui Huang, Peiyao Wang, Carol Peng, Shanxian Gao, and Jiaqi Zhang from Nurotron for their help during data collection. We also thank the two reviewers for helpful comments.


system via animal experiments and clinical trials. Acta Otolaryngol. 136, 68–77.


cochlear implants. IEEE Trans. Biomed. Eng. 66, 573–583. doi: 10.1109/tbme.2018.2850753


**Conflict of Interest:** Nurotron provided some compensation to the subjects and accommodation to HZ during Experiment 2. NW was employed by the company Nurotron Biotechnology Inc.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Zhou, Wang, Zheng, Yu and Meng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Critical Review of Transcutaneous Vagus Nerve Stimulation: Challenges for Translation to Clinical Practice

Jonathan Y. Y. Yap<sup>1†</sup>, Charlotte Keatch<sup>2†</sup>, Elisabeth Lambert<sup>3,4</sup>, Will Woods<sup>3</sup>, Paul R. Stoddart<sup>1,2</sup> and Tatiana Kameneva<sup>2,4,5</sup>\*

*<sup>1</sup> ARC Training Centre in Biodevices, Swinburne University of Technology, Hawthorn, VIC, Australia, <sup>2</sup> Faculty of Science, Engineering and Technology, Swinburne University of Technology, Hawthorn, VIC, Australia, <sup>3</sup> School of Health Sciences, Swinburne University of Technology, Hawthorn, VIC, Australia, <sup>4</sup> Iverson Health Innovation Research Institute, Swinburne University of Technology, Hawthorn, VIC, Australia, <sup>5</sup> Department of Biomedical Engineering, The University of Melbourne, Parkville, VIC, Australia*

#### Edited by:

*Tianruo Guo, University of New South Wales, Australia*

#### Reviewed by:

*Peijing Rong, China Academy of Chinese Medical Sciences, China; Xiaohong Sui, Shanghai Jiao Tong University, China*

> \*Correspondence: *Tatiana Kameneva tkameneva@swin.edu.au*

*†These authors have contributed equally to this work*

#### Specialty section:

*This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience*

Received: *02 September 2019* Accepted: *12 March 2020* Published: *28 April 2020*

#### Citation:

*Yap JYY, Keatch C, Lambert E, Woods W, Stoddart PR and Kameneva T (2020) Critical Review of Transcutaneous Vagus Nerve Stimulation: Challenges for Translation to Clinical Practice. Front. Neurosci. 14:284. doi: 10.3389/fnins.2020.00284*

Several studies have illustrated that transcutaneous vagus nerve stimulation (tVNS) can elicit therapeutic effects that are similar to those produced by its invasive counterpart, vagus nerve stimulation (VNS). VNS is an FDA-approved therapy for the treatment of both depression and epilepsy, but it is limited to the management of more severe, intervention-resistant cases as a second or third-line treatment option due to perioperative risks involved with device implantation. In contrast, tVNS is a non-invasive technique that involves the application of electrical currents through surface electrodes at select locations, most commonly targeting the auricular branch of the vagus nerve (ABVN) and the cervical branch of the vagus nerve in the neck. Although it has been shown that tVNS elicits hypo- and hyperactivation in various regions of the brain associated with anxiety and mood regulation, the mechanism of action and influence of stimulation parameters on clinical outcomes remain predominantly hypothetical. Suppositions are largely based on correlations between the neurobiology of the vagus nerve and its effects on neural activity. However, tVNS has also been investigated for several other disorders, including tinnitus, migraine and pain, by targeting the vagus nerve at sites in both the ear and the neck. As most of the described methods differ in the parameters and protocols applied, there is currently no firm evidence on the optimal location for tVNS or the stimulation parameters that provide the greatest therapeutic effects for a specific condition. This review presents the current status of tVNS with a focus on stimulation parameters, stimulation sites, and available devices. For tVNS to reach its full potential as a non-invasive and clinically relevant therapy, it is imperative that systematic studies be undertaken to reveal the mechanism of action and optimal stimulation modalities.

Keywords: vagus nerve, vagus nerve stimulation, transcutaneous, neuromodulation, neurostimulation

# 1. INTRODUCTION

Vagus nerve stimulation (VNS) is an FDA-approved treatment for both pharmacoresistant depression and epilepsy and can produce clinically meaningful antidepressant and anti-seizure effects (Nemeroff et al., 2006; Johnson and Wilson, 2018). More than 100,000 VNS devices had been implanted in more than 70,000 patients globally by 2013 (Labiner and Ahern, 2007). The implantable device consists of an electrode, which is wrapped around the left vagus nerve, and an implantable unit, positioned below the collarbone and containing the battery and pulse generator.

Device implantation is predominantly performed on an outpatient basis under general anesthetic, but some patients may require an overnight stay if extended observation is necessary. Despite being a minimally invasive procedure, the surgery is inherently risky due to the location of implantation, with electrode placement requiring dissection of the vagus nerve from the carotid artery. Potential adverse events arising from the surgical intervention include bradyarrhythmias during device placement, the development of peritracheal hematoma (due to surgical trauma), and other respiratory complications, including vocal cord dysfunction and dyspnea (due to nerve trauma). VNS can also cause changes to breathing patterns during sleep, resulting in an increase in the number of obstructive apneas and hypopneas (Marzec et al., 2003; Fahy, 2010), and can, albeit rarely, produce late-onset bradyarrhythmias and severe asystolia due to atrioventricular block (Iriarte et al., 2009). These potential adverse events limit the intervention's applicability to those who are resistant to conventional therapeutic strategies, and total device and procedural costs amount to around AU \$50,000 (Lehtimäki et al., 2013), a price that is prohibitively high for many, as it is a non-subsidized treatment.

Transcutaneous vagus nerve stimulation (tVNS) is a method that has been developed to overcome these limitations, and the potential widespread accessibility of the technology adds to its appeal as a possible first-line treatment option. Anatomical studies of the ear suggest that the tragus, concha, and cymba concha are the places on the human body where there are cutaneous afferent vagus nerve distributions (**Figure 1**) (Peuker and Filler, 2002), and it is believed that stimulation of these afferent fibers should produce therapeutic effects that are similar to those of regular VNS (Hein et al., 2012; Rong et al., 2012; Stefan et al., 2012). Similarly, non-invasive stimulation of the cervical branch of the vagus nerve has gained popularity owing to the minimal side effects, low cost, and low morbidity associated with the technique (Goadsby et al., 2014; Grazzi et al., 2014; Kinfe et al., 2015b). In this review, we refer to both auricular and cervical nerve stimulation as tVNS.

The potential of tVNS is not limited to the treatment of depression and epilepsy, with the technology being investigated for a variety of disorders including headache, tinnitus, atrial fibrillation, post-error slowing, prosocial behavior, associative memory, schizophrenia, and pain (Laqua et al., 2014; Hasan et al., 2015; Hyvärinen et al., 2015; Jacobs et al., 2015; Nesbitt et al., 2015; Sellaro et al., 2015a,b; Stavrakis et al., 2015).

Despite the breadth of research being undertaken, many questions remain regarding the most effective stimulation sites and parameters. As many of the described methods differ in the parameters and protocols applied, there is currently no firm evidence regarding the optimal location for stimulation to achieve the greatest clinical effects let alone an understanding of the neurophysiological mechanisms. Therefore, this critical review aims to explore the reported studies in tVNS with a view to promoting more systematic approaches that might help to translate the technique into mainstream clinical practice.

In comparison to tVNS, the invasive approach to VNS has been the subject of a number of recent reviews. For example, a review of functional neuroimaging studies in VNS confirmed that invasive stimulation causes changes in various brain regions and at different levels (Chae et al., 2003). A review of VNS with a focus on depression is presented in Müller et al. (2018). Recent advances in devices for VNS have been covered in Mertens et al. (2018). Similarly, applications and potential mechanisms of VNS have been discussed in some detail (Groves and Brown, 2005; Yuan and Silberstein, 2016a,b).

The few reviews that specifically focus on tVNS are very recent. A systematic review of the safety and tolerability of tVNS was presented in Redgrave et al. (2018), while two companion papers have focused on the physiological and engineering perspectives of tVNS (Kaniusas et al., 2019a,b). Whereas Kaniusas et al. (2019a,b) outlined current research directions in auricular vagus nerve stimulation, this review takes a more critical approach and explores fundamental limitations of study design protocols that may lead to difficulties in translating current research into the clinic. We have also reviewed cervical vagus nerve stimulation in addition to auricular applications.

The review presented here focuses on a mechanistic understanding of tVNS, with a detailed description of stimulation parameters, sites of stimulation, and devices used in current research. We review current publications investigating the effect of electrode placement on auricular vagus nerve stimulation recruitment and corresponding neural activations, papers studying the effect of stimulation parameters (waveform, polarity, frequency, pulse width, duty cycle, and current), and manuscripts exploring the neurophysiological mechanisms of tVNS. We also consider whether tVNS can be used for closed-loop control of neural activity. We outline fundamental gaps in our understanding that need to be overcome in order to maximize efficacy, minimize risk, and thus support the successful translation of tVNS into mainstream clinical practice.

## 2. TRANSCUTANEOUS VAGUS NERVE STIMULATION (tVNS)

#### 2.1. Anatomical Considerations

Transcutaneous vagus nerve stimulation (tVNS) is based on the results of anatomical studies illustrating the path of the auricular branch of the vagus nerve (ABVN; also known as Alderman's nerve or Arnold's nerve), which originates from the superior ganglion of the vagus nerve within the jugular foramen (Tekdemir et al., 1998), passes transversely through the facial canal, enters the small canal of the petrous bone, emerges from the tympanomastoid fissure, and proceeds to innervate the external acoustic meatus and auricle (Kiyokawa et al., 2014). As Peuker and Filler identify, the ABVN (**Figure 2**) is most prominently spread through the antihelix, tragus, cymba concha, and concha (Peuker and Filler, 2002). These are the places on the human body where there are cutaneous afferent vagus nerve distributions, and thus, as theoretically proposed by Ventureyra (2000), it is believed that direct stimulation of these nerve fibers should produce therapeutic effects similar to those of VNS. More recently, the original article by Peuker and Filler was the subject of some controversy due to different numbers being reported for tragus innervation by the ABVN in the main text and in the table (possibly a typing error) (Burger and Verkuil, 2018). Peuker and Filler (2002) later explained that the knowledge of auricular vagus nerve anatomy does not rest solely on these data, and other publications support the same findings (He et al., 2012).

FIGURE 2 | Innervation of the auricular branch of the vagus nerve (ABVN). GAN, great auricular nerve; ATN, auriculotemporal nerve; STA, superficial temporal artery; LON, lesser occipital nerve; V, vessels. Adapted from Peuker and Filler (2002) with permission.

Transcutaneous cervical vagus nerve stimulation is another method that has been developed to non-invasively stimulate the vagus nerve with electrodes placed over the sternocleidomastoid muscle. This is a similar location to where the electrodes for VNS are positioned and is more reminiscent of Corning's initial approach. However, the vagus nerve's location within the carotid sheath (**Figure 3**), beneath the skin (2 mm), superficial fascia (3–6 mm), and sternocleidomastoid muscle (5–6 mm) (Seiden et al., 2013) can make selective transcutaneous stimulation of vagus nerve fibers difficult, with current product offerings most likely indiscriminately stimulating afferent and efferent fibers alike (Yuan and Silberstein, 2016b).

Conventionally, the left vagus nerve has mostly been selected as the preferred stimulation site due to safety concerns arising from observations during animal studies showing that right-sided VNS results in a greater degree of bradycardia (Yuan and Silberstein, 2016b). This is due to the asymmetric innervation of the heart, where the right vagus nerve predominantly innervates the sinoatrial (SA) node and the left predominantly innervates the atrioventricular (AV) node (Ardell and Randall, 1986). As such, right VNS in dog studies activated the cardiac motor efferents innervating the SA node, causing bradycardia through a reduction of depolarization rates and providing credence to the belief that right-sided VNS should not be attempted in clinical settings (Krahl, 2012). However, the anatomy of the cervical vagus trunk differs between dogs and humans, and the location around which the VNS stimulation electrodes are wrapped (in humans) does not include the superior or inferior cardiac branches, thereby diminishing the risk of significant cardiac adverse events (Krahl, 2012). Despite this, the FDA-approved labeling for VNS devices specifies that "the VNS Therapy System is indicated for use only in stimulating the left vagus nerve in the neck area inside the carotid sheath. The VNS Therapy System is indicated for use only in stimulating the left vagus nerve below where the superior and inferior cervical cardiac branches separate from the vagus nerve. The safety and efficacy of the VNS Therapy System have not been established for stimulation of the right vagus nerve or of any other nerve, muscle, or tissue" (Depression Physician's Manual, 2005).

While limiting treatments to the left side may be warranted for VNS, due to the potential to directly stimulate the cardiac motor efferents innervating the SA node, there are questions as to whether the application of these conventional reservations to tVNS is justified. The cardiac effects seen through ABVN stimulation are mediated through a neural pathway that involves the nucleus tractus solitarii (NTS); this activates the dorsal motor nucleus, which then delivers processed signals to the heart surface bilaterally via the efferent cervical vagus nerves. Therefore, unlike cervical VNS, tVNS circumvents the risk of directly and asymmetrically stimulating cardiac motor efferent fibers and thereby causing adverse cardiac events (Chen et al., 2015). As such, simply disregarding the therapeutic potential of bilateral ABVN stimulation, based on conventional preconceptions and parallels drawn from VNS, may be premature, and the approach warrants further investigation. Additionally, bilateral ABVN stimulation has been shown to be safe in pilot studies investigating tVNS as a complementary therapy for pediatric epilepsy (He et al., 2013).

#### TABLE 1 | Classification of nerve fibers.

*Adapted from Fix and Brueckner (2009).*

#### 2.2. Nerve Fiber Types

The vagus and its branches consist of around 80% sensory afferent and 20% motor efferent fibers (Yu et al., 2008). Nerve fibers can be further classified into one of three groups based on their diameter: the A group (consisting of Aα, Aβ, Aγ, and Aδ), B group, and C group. The different nerve fiber types have different diameters and myelination thicknesses (**Table 1**), which correspond to different conduction velocities, with thicker myelination typically linked to faster conduction velocities or signal propagation (Fix and Brueckner, 2009).

A-group fibers are thick and myelinated, include both afferent and efferent fibers, and typically have diameters of around 1–22 µm and a conduction velocity of 5–120 m/s. They are typically found in both motor and sensory pathways. B fibers are only moderately myelinated, with diameters ≤3 µm and a conduction velocity ranging from 3 to 15 m/s. C fibers are non-myelinated, and they thus have slower conduction speeds of 2 m/s and thinner diameters of between 0.2 and 1.5 µm.
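To give a sense of what these conduction velocities mean in practice, the following Python sketch converts the velocity ranges quoted above into conduction latencies over a fixed path. The 10 cm path length, the function names, and the treatment of the single C-fiber figure as a degenerate range are illustrative assumptions rather than values taken from the cited sources.

```python
# Conduction velocity ranges (m/s) as quoted in the text above.
# The C-fiber entry uses the single quoted value for both bounds (assumption).
VELOCITY_M_PER_S = {
    "A": (5.0, 120.0),
    "B": (3.0, 15.0),
    "C": (2.0, 2.0),
}


def latency_ms(distance_m, velocity_m_per_s):
    """Time in milliseconds for a signal to travel distance_m at a given velocity."""
    if velocity_m_per_s <= 0:
        raise ValueError("velocity must be positive")
    return 1000.0 * distance_m / velocity_m_per_s


def latency_range_ms(group, distance_m):
    """Return (shortest, longest) latency in ms for a fiber group."""
    slowest, fastest = VELOCITY_M_PER_S[group]
    return latency_ms(distance_m, fastest), latency_ms(distance_m, slowest)


if __name__ == "__main__":
    PATH_LENGTH_M = 0.10  # hypothetical 10 cm conduction path
    for group in VELOCITY_M_PER_S:
        shortest, longest = latency_range_ms(group, PATH_LENGTH_M)
        print(f"{group} fibers: {shortest:.1f} to {longest:.1f} ms")
```

Over the assumed 10 cm path, A fibers would conduct in roughly 0.8–20 ms, B fibers in roughly 6.7–33 ms, and C fibers in about 50 ms, which illustrates why fiber type matters when interpreting the timing of evoked responses.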

The cervical branch vagus nerve is made up of about 20% myelinated A and B fibers and 80% unmyelinated C fibers (Vonck et al., 2009). Contrary to earlier studies, which have suggested that C fiber recruitment during VNS was essential for seizure suppression, Kraus et al. (2007) showed that destruction of peripheral C fibers did not influence VNS-induced seizure suppression, and the therapeutic effects of VNS have thus been attributed to the maximal recruitment of thick afferent A and B nerve fibers (Evans et al., 2004). Minimal side effects suggest that stimulation of these fibers is well-tolerated (Helmers et al., 2012).

Similarly, Stefan et al. (2012) showed that tVNS does not elicit painful sensations in the participants, which suggests that afferent C axons and thin myelinated Aδ axons are not activated. A study by Mourdoukoutas et al. (2018) also investigated the fibers that can be activated by tVNS, and they found that at the typically used current of 10 mA, only A-axons and larger B-axons were activated; this is likely due to the diameter of their fibers, implying that C-fibers were too thin to be activated by the applied electrical stimulation.

At the cervical level, the vagus nerve mainly consists of small-diameter unmyelinated C fibers (65–80%) and of a smaller portion of intermediate-diameter myelinated B fibers and large-diameter myelinated A fibers. A, B, and C fiber distributions within the carotid vagus nerve have been well-documented (Standring, 2015), enabling the development of computational models to determine the optimal current and pulse width parameters for VNS to activate the myelinated A and B afferent fibers (Helmers et al., 2012). Despite this, the optimal stimulation parameters for VNS are unknown, as the effects of other parameters, such as frequency and duty cycle, are observed post-synaptically in various structures of the brain. Given that these activations cannot be computationally modeled, clinical application and stimulation parameter selection of VNS rely on subjective benefits reported by patients.

In contrast, the distributions of the various nerve fiber types of the ABVN have not been investigated to the level of detail necessary for computational modeling. Therefore, the presence of various nerve fiber types remains speculative and evaluations of intervention efficacy have been based on subjectively experienced therapeutic benefits correlated with other primary and secondary outcomes, such as neuroimaging studies.

As with stimulation of the cervical branches of the vagus nerve with low-level electrical currents, stimulation of the ABVN would be expected to activate thick myelinated fibers only, with no activation of the thin unmyelinated C fibers. The ABVN is a general sensory branch and is one of the few branches to contain no motor fibers. As such, the myelinated fibers found in the ABVN would be expected to be A-group sensory axons rather than B-group autonomic fibers. Only one study has determined the number of myelinated axons that are present in the ABVN (Safi et al., 2016). Around 50% of the myelinated axons were measured to have a diameter of between 2.5 and 4.4 µm, which suggests that they belong to the Aδ group. Nearly 20% of the axons were measured to have a diameter >7 µm, suggesting that these fibers belong to the Aβ class. However, the ABVN contains almost six times fewer Aβ-class nerve fibers than the cervical branch of the vagus nerve. This number also varied greatly between individuals, which may explain why some individuals do not experience therapeutic effects after treatment with tVNS, and it may go some way to explain the anatomical basis behind the mechanism and effectiveness of tVNS (Butt et al., 2019).

#### 2.3. tVNS for Common Health Conditions

#### 2.3.1. Depression

The mechanism behind the therapeutic anti-depressive effects of VNS and tVNS is still unknown. In 2007, Kraus et al. investigated the acute brain activations of healthy subjects following tVNS through functional magnetic resonance imaging (fMRI), showing hypoactivation of the amygdala, hippocampus, parahippocampal gyrus, and middle and superior temporal gyrus, and hyperactivation in the insula, precentral gyrus, and thalamus (Kraus et al., 2007). These cortical areas are connected both directly and indirectly to the nucleus tractus solitarii (NTS), which receives greatest afferent vagus input. The NTS relays incoming sensory information to the brain via an automatic feedback loop, direct projections to the reticular formation in the medulla, and ascending projections to the amygdala, insula, hypothalamus, thalamus, orbitofrontal cortex, and other limbic regions involved in anxiety and mood regulation via the parabrachial nucleus and the locus coeruleus (Mohr et al., 2011). It is hypothesized that hypoactivation of the amygdala suppresses the hyperactive limbic brain areas, as seen in patients with depression (Mayberg, 1997), through projections from the amygdala to the amygdala–hippocampus–entorhinal cortex of the limbic system (Kraus et al., 2007).

These results are consistent with the acute diminished activity of the limbic system found during VNS (Henry et al., 1998; Chae et al., 2003; Mohr et al., 2011). Interestingly, changes in regional cerebral blood flow induced by VNS are similar to those found in depressed patients treated with selective serotonin reuptake inhibitors (fluoxetine) (Mayberg et al., 2000), either in the amygdala, hippocampus, or parahippocampus (Nemeroff et al., 2006). fMRI studies of patients with depression, following 1 month of tVNS, showed increased functional connections between the default mode network and the precuneus, rostral anterior cingulate cortex, and medial prefrontal cortex. This has also been associated with a reduction in depression severity (Fang et al., 2016) and is similar to results illustrating the therapeutic effects of transcranial magnetic stimulation (Fitzgerald et al., 2006).

Activation of the central nervous system via electrical stimulation of peripheral nerves has become known as the "bottom-up" mechanism, which is a hypothesis based on the neurobiology of the vagus nerve and its effects on neural activity. This is in contrast to the well-known "top-down" mechanism of strategies, such as electroconvulsive therapy and transcranial magnetic stimulation, where the stimulus is applied to central brain structures and subsequently propagates to peripheral sites (Shiozawa et al., 2014). In both human and animal studies, VNS has been shown to elicit changes in neurotransmitters associated with the pathophysiology of depression, including serotonin, norepinephrine, GABA, and glutamate (Ben-Menachem et al., 1995; Krahl et al., 1998; Walker et al., 1999; Dorr and Debonnel, 2006; Manta et al., 2009).

Hein et al. (2012) illustrated the antidepressant effects of 2 weeks of tVNS using an add-on study design, which resulted in significantly improved outcomes on the Beck Depression Inventory (BDI; from 27.0 to 14.0 points). However, no significant changes were observed on the Hamilton Depression Rating Scale (HAMD). Very little information was provided regarding the stimulation parameters that were used: 1.5 Hz unipolar rectangular waves were applied, and currents were individually adjusted to maximal but not painful intensities (0–600 mA). In a single-blinded clinical trial conducted by Fang et al. (2016) investigating the antidepressant effects of tVNS as a sole treatment, significant improvement was seen not only on the HAMD (from 28.5 to 15.0) but also on the Self-Rating Anxiety Scale (SAS; from 56.56 to 42.83) and the Self-Rating Depression Scale (SDS; from 66.33 to 50.56). It is implied that these therapeutic effects may be due to modulation of the resting-state functional connectivity of the default mode network, as shown via fMRI. Again, the stimulation parameters used were not comprehensively reported, with density wave adjusted to 20 Hz, a wave width <1 ms, and intensity adjusted based on the tolerance of the patient (4–6 mA).

#### 2.3.2. Epilepsy

In addition to depression, tVNS has also been investigated for its use as a treatment option for drug-resistant epilepsy, a neurological disorder characterized by recurring seizures that affects around 50 million people worldwide (Beghi, 2019). Drug resistance is diagnosed in up to 30% of epilepsy patients (Kwan and Brodie, 2000). Handforth et al. (1998) demonstrated that invasive stimulation of the vagus nerve could suppress the occurrence of seizures and offer a non-pharmacological treatment for epilepsy.

Due to the success of invasive vagus nerve stimulation as a valid treatment option for epilepsy, Stefan et al. (2012) devised a pilot study to investigate whether tVNS would elicit the same anti-convulsive effects. In the pilot study, 10 participants with drug-resistant epilepsy who experienced a minimum of four seizures a month were stimulated on the auricular branch of the vagus nerve transcutaneously through the tragus of the left ear. The stimulation parameters were set to a frequency of 10 Hz with a pulse width of 0.3 ms, and the stimulation intensity was set to the individual's tolerance threshold. The participants were trained to self-administer the tVNS for three 1-h sessions per day as part of their daily routine over a period of 9 months. The participants were encouraged to keep a seizure diary to report the frequency of their seizures both before and during tVNS treatment. In five of the seven participants who completed the study, the seizure frequency was reduced, which suggested that tVNS could offer seizure-reduction effects.

He et al. (2013) also conducted a pilot study to investigate tVNS as a treatment option for pediatric epilepsy. The stimulation protocol differed from that of Stefan et al. above, as the stimulation was delivered to the left concha with a frequency of 20 Hz for only 30 min at a time, three times daily, for 6 months. These parameters were also found to elicit seizure-reduction effects, with a 54% reduction in seizure frequency reported after the 6 months of tVNS treatment. More recently, Liu et al. (2018) found an average seizure reduction of 64.4% in 16 out of 17 of their patients after 6 months of treatment with tVNS. The participants were trained to administer 20 min of tVNS three times a day for 6 months to the left concha with a stimulation frequency of 10 Hz.
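Because the three pilot protocols described above differ in both frequency and session length, it can help to reduce them to a common quantity. The Python sketch below computes the nominal number of pulses delivered per day for each protocol. It assumes that stimulation is continuous throughout each session (no on/off cycling), which the studies as summarized here do not state; the class and field names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Nominal daily tVNS schedule, using the figures quoted in the text."""
    study: str
    frequency_hz: int
    session_min: int
    sessions_per_day: int

    def pulses_per_day(self) -> int:
        # Assumes continuous stimulation within each session (not stated in the sources).
        return self.frequency_hz * self.session_min * 60 * self.sessions_per_day


PROTOCOLS = [
    Protocol("Stefan et al. (2012)", frequency_hz=10, session_min=60, sessions_per_day=3),
    Protocol("He et al. (2013)", frequency_hz=20, session_min=30, sessions_per_day=3),
    Protocol("Liu et al. (2018)", frequency_hz=10, session_min=20, sessions_per_day=3),
]

if __name__ == "__main__":
    for protocol in PROTOCOLS:
        print(f"{protocol.study}: {protocol.pulses_per_day()} pulses/day")
```

Under this assumption, the protocols of Stefan et al. and He et al. deliver the same nominal number of pulses per day (108,000), whereas that of Liu et al. delivers a third as many (36,000). Such a comparison ignores pulse width, current, and stimulation site, so it is a rough normalization rather than a measure of dose.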

The exact mechanism by which tVNS prevents or inhibits seizures is not well understood. It is thought that afferent projections from the ABVN to the NTS may be responsible for the anti-convulsive effect; however, the neural networks projecting downstream are unclear (Henry, 2002).

#### 2.3.3. Tinnitus

Tinnitus is the perception of sound in the absence of actual external sound and it affects 10–15% of the general population (Han et al., 2009). Recent imaging studies have suggested that chronic tinnitus is linked to a dysfunction in the auditory system, which results in abnormal neuronal behavior. Pairing of invasive vagus nerve stimulation with sound therapy has been shown to reverse tinnitus in rat models (Engineer et al., 2011), and so Lehtimäki et al. (2013) devised a pilot study to investigate whether tVNS could provide any therapeutic benefits for patients with chronic tinnitus. In addition, they also investigated whether tVNS could affect neuronal activity in the auditory cortex by imaging the brain using magnetoencephalography (MEG).

During the study, 10 participants with chronic tinnitus were stimulated continuously on the left tragus at 25 Hz for 45–60 min over seven sessions. The stimulation was paired with tailored sound therapy, which was classical music with the dominant frequency of the individual's tinnitus removed. After the study, all participants reported improved mood and decreased severity of tinnitus. In addition, MEG scans demonstrated that tVNS modulated the auditory cortical response, which suggests that the auditory system can be accessed and modulated via stimulation of the vagus nerve.

#### 2.3.4. Migraine

A number of studies have looked at applying non-invasive VNS to the neck to treat migraines (Goadsby et al., 2014; Grazzi et al., 2014, 2016; Barbanti et al., 2015; Kinfe et al., 2015b). In all of these studies, the gammaCore device (ElectroCore, 2018) was held against the neck in the region of the cervical branch of the vagus nerve, where two stainless steel electrodes deliver 25 Hz of burst stimulation. Total stimulation time varies between studies, but most give 90 s doses of stimulation at a time. This approach has found success in not only reducing the frequency of migraine attacks in participants but also the severity and resultant disability of the attacks.

In addition to non-invasive VNS at the neck, Straube et al. (2015) also investigated whether tVNS at the tragus would have a similar therapeutic effect on migraine. They devised a study for 46 participants, testing the NEMOS tVNS device applying 25 Hz to the tragus for 4 h per day over 3 months, and they also used 1 Hz to the tragus as an active control. Interestingly, the 1 Hz stimulation elicited a more significant reduction in the number of headache days than the 25 Hz active stimulation. This was an unexpected result and demonstrates that a more robust investigation into different stimulation parameters is crucial.

Again, the mechanism of non-invasive VNS and its effect on migraine is not well understood. One possibility is that the therapeutic effects of non-invasive vagus nerve stimulation are due to activation of the thalamus, which is responsible for information processing and regulation of cortical activity. In patients with migraine, fMRI studies have shown that there is a decrease in thalamocortical activity, and so stimulation of the vagus may help to counteract this decline (Coppola et al., 2004). Alternatively, it is possible that stimulation of the vagus nerve inhibits nociceptive trigeminal neurons, which may have a pain-inhibitory effect (Randich and Gebhart, 1992).

#### 2.3.5. Pain

Johnson et al. first attempted to study the effect of transcutaneous electrical stimulation of the ear on pain threshold in 1991, with a pilot study of 18 participants receiving low frequency burst stimulation at 2.3 Hz for 15 min on three different auricular sites (Johnson et al., 1991). In this study, pain threshold was noted to increase in 10 out of the 18 participants. Three participants also experienced a prolonged analgesic effect even after the stimulation device was turned off.

This pain-inhibitory effect was also noted by Multon and Schoenen (2005) in a review of clinical data collected from patients with implanted VNS devices. The pain thresholds of the patients and any effect VNS had on headaches were measured, confirming that implanted VNS offered an analgesic effect. Following on from this review of implanted VNS devices, Laqua et al. (2014) proposed a study to investigate whether non-invasive tVNS could offer the same analgesic effect. Electrical stimulation was delivered for 30 min transcutaneously at the cavum conchae in burst stimulation mode with a changing frequency between 2 and 100 Hz. The individual pain threshold was measured using a Neurometer device that measures the sensory nerve conduction threshold. Of the 21 participants, 15 responded with an increase in pain threshold during tVNS, while six noted a decrease in pain threshold during stimulation. These results, although contradictory, agree with the findings of Johnson et al. and support the view that the analgesic effects of VNS are very much dependent on individual sensitivity alongside stimulation parameters.

Busch et al. (2013) devised a study to investigate whether tVNS has the potential to alter pain processing by examining different submodalities of the somatosensory system. A total of 48 participants were stimulated at the left concha on the inner side of the tragus with a stimulation frequency of 25 Hz. Different tests were devised to measure different pain thresholds, such as heat, mechanical, and pressure-related pain thresholds. The results showed an inhibition of mechanical, heat and pressure pain sensitivity after 1 h of continuous tVNS. Detection thresholds for thermal or mechanical inputs were not altered. These results suggest that tVNS can influence pain processing and offer an inhibitory effect on different pain modalities. Analysis of these different submodalities also suggests that tVNS has an impact on the central pain processing centers rather than just peripheral nociceptor activity.

## 3. LIMITATIONS OF CURRENT STUDY PROTOCOLS

While the use of tVNS has been shown to elicit therapeutic benefits in various studies (Hein et al., 2012; Lehtimäki et al., 2013; Mei et al., 2014; Straube et al., 2015; Liu et al., 2018), these studies mostly use different primary and secondary outcome measures, and so the comparability between them is limited. While this is partly due to the application of the technique to various ailments where primary efficacy endpoints differ between studies, there are also major issues with incomplete reporting and inconsistent use of terminology when reporting the results of incomparable and, in some cases, non-reproducible experiments. The stimulation parameters, devices, electrode types and the main findings of relevant studies are summarized in **Table 2**.

#### 3.1. Stimulation Devices

Research groups generally report the stimulation device used in the experiment, but many of the models used have now been discontinued, and access to their technical specifications is limited. The most commonly used devices are the gammaCore (electroCore) and the NEMOS (Cerbomed) (**Figure 4**), with a third of the studies included in **Table 2** employing them for stimulation (e.g., Grazzi et al., 2014, 2016; Frangos et al., 2015; Straube et al., 2015; Frokaer et al., 2016; Lerman et al., 2016; Silberstein et al., 2016a,b). Almost always, the gammaCore device is used for stimulation at a neck site (e.g., Goadsby et al., 2014; Grazzi et al., 2014, 2016; Lerman et al., 2016; Silberstein et al., 2016a,b), whilst the NEMOS device is predominantly used for stimulation of the ABVN in the ear. The next most common stimulation device is the CM02 (Cerbomed), used in Sellaro et al. (2015a,b), Hasan et al. (2015), and Steenbergen et al. (2015), among others. The gammaCore or NEMOS devices are often selected for convenience as they provide an easy-to-use package that includes stimulation electrodes. On the other hand, devices such as the TENS-200 or Digitimer DS7A often require custom-made electrodes. The NMS 300 device from Xavant Technology has also been used (Schulz-Stübner and Kehl, 2011), while the device has not been specified in two studies (Gaul et al., 2016).

#### 3.1.1. electroCore gammaCore

The gammaCore, marketed by electroCore, is a handheld tVNS device that stimulates the vagus nerve within the cervical carotid sheath. The device has been granted investigational FDA approval for the acute and/or prophylactic treatment of primary headache and medication overuse headache in adults. Conductive gel is applied to the stimulation surfaces, which are then placed over the sternocleidomastoid muscle. Stimulation intensity is user-controlled (up to 24 V and 60 mA), with individual treatment sessions lasting for 120 s. The treatment can be safely administered multiple times per day, having been applied up to 6–12 times per day in clinical studies (Yuan and Silberstein, 2016b). The remaining stimulation parameters are fixed, delivering 1 ms pulses of 5 kHz sine waves at 25 Hz. The device delivers a proprietary pulse waveform that is designed to penetrate through various levels of tissue, including skin, muscle, and nerve sheaths, in order to stimulate the afferent vagus nerve fibers within the carotid sheath. Potential side effects can include tingling under the stimulation electrodes and mild facial twitching at high intensities. It is a limited-use device that is available in two models: 50 doses and 150 doses. Optimal device usage, in terms of the number of stimulations per day and/or total stimulation duration, is yet to be determined.
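The fixed gammaCore parameters quoted above imply a few derived quantities that are worth making explicit. The Python sketch below treats the output as an idealized, unit-amplitude gated sine wave. Because the actual waveform is proprietary, this shape, the constant and function names, and the use of the 120 s session length are simplifying assumptions.

```python
import math

# Nominal parameters as quoted in the text; the names are illustrative.
CARRIER_HZ = 5000.0    # sine-wave carrier frequency
BURST_S = 0.001        # duration of each burst (1 ms)
BURST_RATE_HZ = 25.0   # burst repetition rate
SESSION_S = 120.0      # length of one treatment session


def burst_summary(carrier_hz=CARRIER_HZ, burst_s=BURST_S,
                  burst_rate_hz=BURST_RATE_HZ, session_s=SESSION_S):
    """Derive timing quantities from the nominal stimulation parameters."""
    period_s = 1.0 / burst_rate_hz
    return {
        "cycles_per_burst": carrier_hz * burst_s,
        "period_ms": 1000.0 * period_s,
        "duty_cycle": burst_s / period_s,
        "bursts_per_session": burst_rate_hz * session_s,
    }


def idealized_waveform(t_s, carrier_hz=CARRIER_HZ, burst_s=BURST_S,
                       burst_rate_hz=BURST_RATE_HZ):
    """Unit-amplitude gated sine at time t_s (assumed shape, not the proprietary one)."""
    phase_s = t_s % (1.0 / burst_rate_hz)
    if phase_s < burst_s:
        return math.sin(2.0 * math.pi * carrier_hz * phase_s)
    return 0.0  # silent interval between bursts


if __name__ == "__main__":
    for name, value in burst_summary().items():
        print(f"{name}: {value:g}")
```

Under these assumptions, each 1 ms burst contains five carrier cycles, bursts recur every 40 ms (a 2.5% duty cycle), and a 120 s session comprises 3,000 bursts.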

#### 3.1.2. Cerbomed NEMOS

The NEMOS device (distributed by tVNS Technologies, previously Cerbomed) is a portable transcutaneous electrical nerve stimulator that delivers stimulus to ABVN distributions located in the left cymba concha. NEMOS has been granted the CE mark for the treatment of resistant epilepsy. It comprises two main components: the stimulation unit, which houses the battery and pulse generator (and is roughly the size of a mobile phone), and a dedicated ear electrode, which is connected to the stimulator via a cable. Stimulation intensity is user-controlled (up to 25 V), with treatments lasting at least 1 h in three to four sessions per day for a total of 4–5 h. The stimulation current is adjusted until a slight tingling or pulsating sensation is perceived at the stimulation site, implying Aβ fiber activation. Prior to stimulation, the user must clean the site of stimulation, as well as the electrodes, to minimize impedance and ensure optimal conductivity. The remaining stimulation parameters are fixed, delivering continuous 0.25-ms-duration monophasic square wave pulses at 25 Hz. Adverse effects may include a slight pain, burning, tingling or itching feeling under the electrode, which dissipates upon electrode removal.

#### TABLE 2 | Summary of previous tVNS clinical trials and studies.

*NS, not stated. An asterisk indicates that an electrode type was not stated in the study but was assumed by us from the type of the device. A dagger indicates parameters as stated in the original paper but that are outside the normal range (possible typing error).*

#### 3.1.3. Other

In addition to NEMOS and gammaCore, which are both manufactured specifically for tVNS, stimulation can also be performed by transcutaneous electrical nerve stimulator (TENS) devices, such as TENS-200, V-TENS PLUS, or TENS-NET 2000. Auri-Stim Medical have taken conventional TENS machines, which are typically used in pain management, and repurposed them for stimulating the ear by integrating the electrodes into a headset that can be worn by the user. These devices are portable battery powered control units that can administer tVNS in much the same way as the custom-built units, provided that the electrodes are placed in the correct location in the concha.

The TENS-NET 2000 was approved by the FDA in 2006 and labeled as a nerve stimulator for therapeutic use in depression and anxiety (Hein et al., 2012). User-programmable stimulation parameters include frequency (0.5–100 Hz), intensity (0–6 mA), and mode of stimulation (normal, burst or modulated). However, the polarity of the pulses cannot be varied, and the pulses are typically monophasic rectangular waves. The stimulation can also be delivered in combination with music or different sounds to enhance the therapeutic effects.

For trials in a clinical or research-based setting, mains-powered medical stimulators, such as the Digitimer DS7A or DS5, can be used. These allow complete personalization of stimulation parameters but sacrifice portability. These stimulators are isolated from the mains and can be connected to a computer via BNC cable to allow custom stimulation protocols to be delivered. The Digitimer DS7 is a general-purpose nerve or muscle stimulator for human stimulation and can output up to 100 mA. The frequency and pulse widths of the waves, as well as the duty cycle, are typically programmed on a computer and delivered to the stimulator via BNC cable. There is also the option of alternating the polarity of the pulses, which allows both monophasic and biphasic stimulation pulses to be output.
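As a rough illustration of what "programming the frequency, pulse width and polarity on a computer" can look like, the following minimal sketch builds a sampled rectangular pulse train in plain Python. The function name `pulse_train`, the 100 kHz sample rate, and the 1 mA amplitude are assumptions made for this example; the 25 Hz rate and 0.25 ms pulse width are borrowed from the fixed NEMOS settings described above. The sketch does not talk to any stimulator, and how such a waveform would be delivered to a real device is outside its scope.

```python
def pulse_train(freq_hz, pulse_width_s, amplitude_ma, duration_s,
                fs_hz=100_000, biphasic=False):
    """Return a list of current samples (mA) for a rectangular pulse train.

    Hypothetical helper for illustration only; fs_hz is an assumed
    sample rate, not a property of any particular stimulator.
    """
    n = int(round(duration_s * fs_hz))          # total number of samples
    width = int(round(pulse_width_s * fs_hz))   # samples per phase
    period = int(round(fs_hz / freq_hz))        # samples between pulse onsets
    samples = [0.0] * n
    for start in range(0, n, period):
        # Leading (positive) phase of each pulse.
        for k in range(start, min(start + width, n)):
            samples[k] = amplitude_ma
        if biphasic:
            # Equal and opposite second phase, so net charge is zero.
            for k in range(start + width, min(start + 2 * width, n)):
                samples[k] = -amplitude_ma
    return samples


# One second of 25 Hz, 0.25 ms pulses at an assumed 1 mA amplitude.
mono = pulse_train(25.0, 250e-6, 1.0, 1.0)
bi = pulse_train(25.0, 250e-6, 1.0, 1.0, biphasic=True)

# Fraction of time the monophasic train is "on" (frequency x pulse width).
duty_cycle = sum(1 for x in mono if x != 0.0) / len(mono)
print(duty_cycle)
```

The biphasic variant sums to zero over the train, which is the charge-balancing property that distinguishes it from the monophasic case in this simplified model.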

## 3.2. Electrode Types

Several studies report using gammaCore or NEMOS devices but do not specify stimulation electrode types (e.g., Goadsby et al., 2014; Grazzi et al., 2014; Huang et al., 2014; Altavilla et al., 2015; Barbanti et al., 2015; Nesbitt et al., 2015; Straube et al., 2015). In these cases, we assume that stimulation electrodes provided with the device were not modified for the study, and we report manufacturer specifications for the gammaCore/NEMOS electrodes in **Table 2** (noted with an asterisk).

When reported, the most commonly used stimulation electrodes are made of titanium (for the ear) (Hasan et al., 2015; Sellaro et al., 2015a,b; Fischer et al., 2018; Jongkees et al., 2018) or stainless steel (for the neck) (Kinfe et al., 2015b; Gaul et al., 2016; Grazzi et al., 2016; Lerman et al., 2016; Silberstein et al., 2016a,b). Silver is also used as an electrode material for stimulation of ABVN (e.g., Laqua et al., 2014; Capone et al., 2015; Weise et al., 2015; Badran et al., 2018a; Keute et al., 2019). Information about stimulation electrodes is often insufficient: the material or size of the electrodes is frequently not specified (Stefan et al., 2012; Hyvärinen et al., 2015; Weise et al., 2015; Fang et al., 2016; Yakunina et al., 2018). This limits our collective understanding of the electrode-tissue interface and its interactions. However, the fact that the stimulation current is often set according to the patient-specific pain threshold provides some control for variations in the electrode-tissue impedance.

## 3.3. Stimulation Site

Out of 61 studies included in **Table 2**, 13 use the neck as a stimulation location (**Figure 5A**) (see Gaul et al., 2016; Grazzi et al., 2016; Lerman et al., 2016; Silberstein et al., 2016a,b among others). Discrepancies exist between reported stimulation locations within the studies that stimulate ABVN (**Figures 5B–F**). This is true even when the same device is used; for example, Straube et al. (2015) and Frangos et al. (2015) both use the NEMOS device, yet report the concha and cymba concha as the location of stimulation, respectively. The stimulation location is often dictated by the geometry of an electrode, with clip electrodes typically attached to the tragus or concha (**Figures 5C,D**) (Lehtimäki et al., 2013; Mei et al., 2014; Straube et al., 2015; Fang et al., 2016; Rong et al., 2016; Liu et al., 2018). Often the outer auditory canal is reported as a site for stimulation, without further clarification of the location of an electrode (Hasan et al., 2015; Sellaro et al., 2015a,b; Steenbergen et al., 2015). Given that studies have been done in different participant groups with different clinical conditions and with different stimulation parameters, it is difficult to identify an optimal stimulation site for any particular disorder.

Initial investigations in this direction have been undertaken in Napadow et al. (2012) and Kraus et al. (2013). Napadow et al. concluded that the concha is the best site for stimulation, while Kraus et al. proposed that the anterior wall of the ear canal is the best for efficacy and participant convenience. Studies such as these are progressing in the right direction, but a more systematic approach is required to investigate the effect of the electrode placement on the ABVN recruitment and corresponding neural activations.

Although research groups acknowledge that the ABVN innervates the tragus, concha, and cymba concha as per Peuker and Filler's anatomical studies (Peuker and Filler, 2002), most do not mention antihelix innervation. Selection of the stimulation site appears to be arbitrary, either predetermined by the device employed in the experiment or based on other previous studies without providing any evidence or explanation for the designated stimulation site.

## 3.4. Stimulation Waveform

Most studies employ monophasic rectangular waveforms, often set by the specifications of the device used (Hein et al., 2012; Busch et al., 2013; Stavrakis et al., 2015; Badran et al., 2018a; Yakunina et al., 2018), while some others report using biphasic waveform stimulation (Stefan et al., 2012; Hyvärinen et al., 2015; Liu et al., 2018). Lerman et al. (2016) and Silberstein et al. (2016b) reported using sinusoidal wave bursts; however, it is not clear from these studies whether this waveform is more effective at activating neural fibers. The use of devices that employ "proprietary" or "modified" waveforms, such as electroCore's gammaCore, further hinders insights into the effect of stimulation waveforms on key research outcomes.

## 3.5. Stimulation Intensity

The justifications mentioned above are also employed to motivate the choice of stimulation parameters. Some studies have credited Kraus et al. (2007) and Polak et al. (2009) with having defined the optimal stimulation parameters for tVNS. However, further investigation suggests that these studies only elucidate the optimal stimulus intensity to induce the greatest vagus sensory evoked potential (VSEP) amplitudes (Polak et al., 2009), and that tVNS causes hypo- and hyperactivations of brain regions of interest relating to a decrease in depressive symptoms (Kraus et al., 2007). As Polak et al. (2009) have stated, "we chose a stimulation intensity of 8 mA allowing detection of sufficient VSEP amplitudes without perception of pain," which reveals nothing about the effects observed post-synaptically in various structures of the brain.

They also acknowledge that VSEP amplitudes are directly correlated to stimulation intensity (i.e., stimulation intensities >8 mA would elicit even greater VSEP amplitudes). Similarly, the studies of Kraus et al. (2007) showed no systematic effects of stimulation parameters on brain activation, although they did illustrate that tVNS does indeed elicit acute changes in brain regions that are related to a decrease in depressive symptoms similar to those caused by VNS. Therefore, neither of these studies can claim to have identified the optimal stimulation parameters of tVNS for the greatest decrease in depressive symptoms or seizure occurrence.

Furthermore, despite electrical current values being reported, the amount, or amplitude, of energy delivered to tissues is largely unknown given the substantial effect of electrode and tissue impedance and need for precise placement (e.g., a stated current of 8 mA presupposes that there is no impact of tissue impedance variation, and therefore voltage, and also neglects waveform shape, rise/fall-time, or any resultant residual charge). The stimulation current is often set according to the subject's sensitivity or just below pain threshold (Napadow et al., 2012; Frangos et al., 2015; Cha et al., 2016; Lerman et al., 2016; Fischer et al., 2018; Yakunina et al., 2018). Given the different stimulation tolerance of different participants, stimulation amplitudes vary over a wide range (from 0.5 mA in Jongkees et al., 2018 to 12 mA in Trevizol et al., 2016). Undoubtedly, the stimulation electrode electrochemistry also contributes to the maximum current that is tolerated by a participant.
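To make the point about impedance concrete, the sketch below works through the arithmetic for an idealized rectangular current pulse into a purely resistive load, where charge per pulse is $Q = I\,t$ and energy per pulse is $E = I^2 R\,t$. These are strong simplifying assumptions: real electrode-tissue interfaces are not purely resistive, and waveform shape and rise/fall time are ignored. The 0.25 ms pulse width and the 1 kΩ and 5 kΩ impedance values are hypothetical numbers chosen only for illustration; the 8 mA current is the value quoted above.

```python
def charge_per_pulse_uc(current_ma, pulse_width_ms):
    """Charge in microcoulombs: mA x ms = 1e-3 A x 1e-3 s = 1e-6 C."""
    return current_ma * pulse_width_ms


def compliance_voltage_v(current_ma, impedance_kohm):
    """Voltage needed to drive the current: mA x kOhm = V (Ohm's law)."""
    return current_ma * impedance_kohm


def energy_per_pulse_uj(current_ma, pulse_width_ms, impedance_kohm):
    """Energy in microjoules for a resistive load: E = I^2 * R * t.

    (mA)^2 x kOhm x ms = 1e-6 A^2 x 1e3 Ohm x 1e-3 s = 1e-6 J.
    """
    return current_ma ** 2 * impedance_kohm * pulse_width_ms


current_ma = 8.0        # current quoted in the text
pulse_width_ms = 0.25   # assumed pulse width (hypothetical for this example)

for impedance_kohm in (1.0, 5.0):  # hypothetical load impedances
    q = charge_per_pulse_uc(current_ma, pulse_width_ms)
    v = compliance_voltage_v(current_ma, impedance_kohm)
    e = energy_per_pulse_uj(current_ma, pulse_width_ms, impedance_kohm)
    print(f"{impedance_kohm} kOhm: Q = {q} uC, V = {v} V, E = {e} uJ")
```

Under these assumptions the charge per pulse is the same for both loads, while the required voltage and the delivered energy scale linearly with impedance, which is why a stated current alone does not determine the energy delivered to tissue.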

## 3.6. Stimulation Frequency

With regard to stimulation frequency, the currently used range of 20–30 Hz has never been validated for its therapeutic effects (Laqua et al., 2014). Following studies showing that stimulation frequencies of 50 Hz and above can cause major and irreversible damage to the vagus nerve during VNS (Agnew and McCreery, 1990), stimulation frequencies between 20 and 30 Hz were arbitrarily selected in order to limit adverse events associated with direct stimulation of the carotid sheath and were subsequently approved by the FDA (Groves and Brown, 2005).

Lower frequencies of stimulation have also been explored. Liu et al. (2018) have found that 10 Hz tVNS for 20 min periods three times per day for 6 months reduced the number of seizures, while 8 Hz stimulation leads to activation in frontal and limbic brain areas as measured by fMRI (Kraus et al., 2007). Straube et al. (2015) have seen a stronger reduction in migraine episodes when stimulating at 1 Hz than when stimulating at 25 Hz. Thus, it should not be assumed that stimulation frequencies within the 20–30 Hz range are optimal for tVNS, and additional controlled studies are warranted to elucidate the effect of stimulation frequency rather than a selection based on past FDA approval of a related, yet different, technique.

# 4. BRAIN ACTIVATION

Several studies have speculated about the brain areas that are activated as a result of tVNS (Schulz-Stübner and Kehl, 2011; Busch et al., 2013; Laqua et al., 2014; Colzato et al., 2018; Jongkees et al., 2018). For example, Burger and Verkuil (2018) proposed that tVNS leads to activation in limbic areas, such as the amygdala and hippocampus, whereas Cha et al. (2016) suggested that it normalizes autonomic imbalance due to an increase in sympathetic response in patients with vertigo. In contrast, Silberstein et al. (2016b) proposed that stimulation of the vagus nerve affects hypocretin and orexin pathways in people with cluster headache, while Kinfe et al. (2015b) hypothesized that tVNS may help counteract the decline in thalamocortical activity in people with migraine and sleep disturbances. Jacobs et al. (2015) suggested that tVNS enhances memory performance by increasing neural activity in the locus coeruleus. It is clear that researchers have proposed different effects of tVNS on neural activation depending on the focus of their study. Measuring neural activity using techniques such as fMRI, EEG, or MEG is critically important to confirm proposed hypotheses.

Brain activation in response to tVNS has been measured in Kraus et al. (2007), Kraus et al. (2013), Dietrich et al. (2008), Lehtimäki et al. (2013), Capone et al. (2015), Frangos et al. (2015), Hyvärinen et al. (2015), Weise et al. (2015), Fang et al. (2016), Yuan and Silberstein (2016b), Yu et al. (2017), Badran et al. (2018a), Fischer et al. (2018), Liu et al. (2018), Yakunina et al. (2018), Keute et al. (2019), Zhao et al. (2019), and Fallgatter et al. (2003). Most of these studies have been conducted in the last 5 years, with the exception of three that pioneered this field in the 2000s (Fallgatter et al., 2003; Kraus et al., 2007; Dietrich et al., 2008). Dietrich et al. (2008) showed that tVNS elicits activation in the left locus coeruleus, a brainstem nucleus that is implicated in clinical depression, as well as bilateral activation in the thalamus. Fallgatter et al. (2003) measured evoked potentials of post-synaptic brainstem activity from vagus nerve nuclei that can be elicited by electrical stimulation. Using fMRI, Kraus et al. (2007) demonstrated that tVNS leads to prominent changes in cerebral activation with marked deactivation in limbic and temporal brain areas.

Later fMRI studies have shown that active tVNS (i) produces a significantly larger increase in neural activity in the right caudate, bilateral anterior, left prefrontal cortex, cerebellum, and mid-cingulate than sham stimulation (Badran et al., 2018a); (ii) leads to a decrease in functional connectivity between posterior cingulate cortex and lingual gyrus (Zhao et al., 2019); and (iii) suppresses the auditory, limbic, and other brain areas implicated in the mechanisms involved in the generation of tinnitus (Yakunina et al., 2018).

EEG studies have shown a direct effect of tVNS on electrophysiological markers of conflict adaptation (Fischer et al., 2018) and on the number of seizures (Liu et al., 2018). MEG recordings have shown that tVNS modulates synchrony of tone-evoked brain activity, especially in the beta and gamma bands (Hyvärinen et al., 2015).

It is not clear why the areas of brain activation vary between these studies, but it may be due to the different conditions presented by the participants. Due to the variation in results, different studies have proposed different underlying mechanisms for tVNS, and, as such, there can be no clear conclusions made from the different imaging studies. Despite the breadth of research being undertaken, many questions remain regarding the most effective stimulation sites and parameters. As many of the described methods differ in the parameters and protocols applied, there is currently no firm evidence on the optimal parameters to provide the greatest benefit to subjects.

## 4.1. Side Effects

Although tVNS is on the whole well-tolerated as a treatment option, a number of different mild side effects have been noted, which Redgrave et al. (2018) summarized in their review. Common side effects include tingling or pain around the stimulation site, with some participants reporting itching or redness (Busch et al., 2013; He et al., 2013; Goadsby et al., 2014; Kreuzer et al., 2014; Rong et al., 2014; Barbanti et al., 2015; Hasan et al., 2015; Jacobs et al., 2015; Kinfe et al., 2015b; Stavrakis et al., 2015; Straube et al., 2015; Weise et al., 2015; Bauer et al., 2016; Cha et al., 2016; Grazzi et al., 2016; Lerman et al., 2016; Silberstein et al., 2016a,b; Trevizol et al., 2016). Other less common side effects that have been observed in <1% of the study population include gastrointestinal issues, such as nausea or vomiting (Schulz-Stübner and Kehl, 2011; Kreuzer et al., 2014; Jacobs et al., 2015; Bauer et al., 2016; Silberstein et al., 2016b; Trevizol et al., 2016), headache (Stefan et al., 2012; Kreuzer et al., 2014; Jacobs et al., 2015; Bauer et al., 2016; Gaul et al., 2016; Lerman et al., 2016; Silberstein et al., 2016a; Trevizol et al., 2016), heart palpitations (Bauer et al., 2016), facial drooping (Goadsby et al., 2014; Silberstein et al., 2016b), dizziness (Aihua et al., 2014; Goadsby et al., 2014; Huang et al., 2014; Kreuzer et al., 2014; Rong et al., 2014; Jacobs et al., 2015; Bauer et al., 2016; Gaul et al., 2016), vocal hoarseness (Stefan et al., 2012; Goadsby et al., 2014; Kreuzer et al., 2014), and nasopharyngitis (Bauer et al., 2016; Gaul et al., 2016). No study currently links stimulation parameters or dose to the rate of side effects experienced. Establishing this link should be a priority for future research in the field, and clear reporting of both side effects and stimulation parameters is important to be able to observe any trends.

# 5. DISCUSSION AND FUTURE DIRECTIONS

This review has focused on a mechanistic understanding of transcutaneous vagus nerve stimulation (tVNS), with a detailed discussion of stimulation parameters, sites of stimulation, and devices used in current research. It should be noted that there is an ongoing discussion about the translation of non-invasive neural stimulation therapies into clinical practice. Transcranial magnetic stimulation (TMS) is another type of non-invasive neural stimulation therapy that is becoming more commonly used as a treatment option for different conditions, although use of the device is limited to clinical settings where it is operated by a healthcare professional. In contrast, transcranial direct current stimulation (tDCS) (Wexler, 2015), much like tVNS, is a portable treatment option that does not require operation by a professional.

The affordability and easy availability of these devices, and an absence of severe adverse events, have led to a "do-it-yourself" movement that uses tDCS and tVNS at home for self-improvement purposes. Researchers are still trying to understand the risks and benefits of these techniques and fear that uncontrolled use may lead to unintended consequences (Bikson et al., 2013).

The situation is further complicated by the fact that, for regulatory purposes, the definition of a medical device focuses on the intended use of a device rather than the mechanism of action. This implies that manufacturers can skirt regulation by careful wording about the intended use. However, it is clear that a thorough risk analysis requires a sound understanding of the mechanism of action. Therefore, to promote the safe and efficacious use of tVNS in future, it is important to understand the mechanism of action of this promising technique.

The actual mechanisms of tVNS are still poorly understood. Many studies contradict the findings of similar studies and there is often very little homogeneity in results, making it difficult to draw conclusions from the findings. It has been proven by a number of studies that tVNS affects the same neural pathway as invasive VNS (He et al., 2009; Van Leusden et al., 2015); however, there is no conclusive evidence to explain why tVNS elicits therapeutic effects. It is therefore important for future studies to focus on the mechanism of action by following rigorous protocols that include objective measures of brain activation. It is also important that past assumptions about the effects of tVNS on brain neural activation and function do not restrict the direction of future investigations.

Given that stimulation parameters vary significantly between studies, a systematic approach is required to identify the optimal stimulation intensity, pulse width, waveform and frequency that provides the greatest clinical benefit. This may require participant-specific adjustment of parameters in a closed-loop setup, where stimulation parameters are set online, based on recorded neural activity. All current stimulation strategies for tVNS devices rely on open-loop control of the stimulation parameters, where the levels are set at the beginning of the stimulation protocol and do not change in response to any continuous measurement of the level of neuronal activation. It is reasonable to expect different outcomes in response to open-loop electrical stimulation between participants and between trials due to different ongoing brain activities at the time of stimulation. While many studies have been successful in using open-loop techniques (Barbanti et al., 2015; Trevizol et al., 2016; Liu et al., 2018), the outcomes differ from patient to patient. A customized closed-loop controller will allow the manipulation of specific patient-based neural responses. Pioneering steps in closed-loop VNS have been reported in Boon et al. (2015) and Fisher et al. (2016).
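The difference between open-loop and closed-loop operation can be sketched with a toy simulation. Everything in the block below is hypothetical: `toy_response` stands in for a measured neural marker and simply scales with amplitude, the `sensitivity` values represent made-up between-participant differences, and the gain, target, and amplitude limits are arbitrary. It is a minimal sketch of an incremental feedback update (the amplitude is nudged in proportion to the error on each step), not a description of any published controller or device.

```python
def closed_loop_step(amplitude_ma, measured, target, gain=0.5,
                     min_ma=0.0, max_ma=5.0):
    """Nudge the amplitude toward the target response, within fixed limits.

    All parameter values are illustrative assumptions.
    """
    error = target - measured
    proposed = amplitude_ma + gain * error
    # Clamp so the controller can never exceed the allowed amplitude range.
    return max(min_ma, min(max_ma, proposed))


def toy_response(amplitude_ma, sensitivity):
    """Hypothetical stand-in for a recorded neural marker (linear in amplitude)."""
    return sensitivity * amplitude_ma


def run_closed_loop(sensitivity, target=1.0, start_ma=0.5, steps=50):
    """Iterate measure -> update and return the final amplitude."""
    amplitude = start_ma
    for _ in range(steps):
        measured = toy_response(amplitude, sensitivity)
        amplitude = closed_loop_step(amplitude, measured, target)
    return amplitude


# Two simulated participants with different (made-up) sensitivities.
for sensitivity in (0.5, 1.0):
    final_ma = run_closed_loop(sensitivity)
    print(sensitivity, round(final_ma, 3), round(toy_response(final_ma, sensitivity), 3))
```

In this toy model an open-loop protocol would apply the same fixed amplitude to both simulated participants and obtain different responses, whereas the feedback rule settles on a different amplitude for each so that the response reaches the same target.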

A closed-loop protocol will require continuous measurements of behavioral outcomes or brain activity. Since behavioral measures are often imprecise, it is preferable that imaging techniques, such as EEG or MEG, be used during the stimulation protocol to study neural activation and information transfer. The EEG signal has low spatial resolution that makes it difficult to interpret brain network connectivity. In contrast, MEG imaging has higher spatial resolution than EEG and higher temporal resolution than fMRI. The reconstruction of neuronal activity sources from MEG has less sensitivity to model approximations and smaller localization errors than EEG reconstruction. The MEG is sensitive to a wide range of frequencies in the oscillatory brain signals and has full brain coverage. There exist various techniques to reconstruct the anatomical origin of brain activity from MEG signal. When a structural MRI scan is available, it is possible to coregister MEG signals to anatomical locations. These advantages of MEG offer a powerful tool to study connectivity between brain areas and analyze brain networks and function (Baillet, 2017).

Such combined neuroimaging techniques can also help to resolve the origin of vagus connections in the brain. The "vagus" in the term tVNS is based on the assumption that the auricular branch of the vagus nerve has been activated. Some researchers believe that the auricular branch of the vagus is a misplaced branch of the trigeminal nerve and carries somatic, not visceral, afferent fibers. In this respect, this nerve is just like the trigeminal nerve branches to the rest of the face. If this hypothesis is true, then the auricular nerve would not connect to the NTS in the brain but rather to the trigeminal (or possibly paratrigeminal) nuclei. The latter nucleus receives cough receptor afferents from the airways, which may be why the auricular branch ("Arnold's nerve") can stimulate coughing (Gupta et al., 1986). However, a recent investigation of central neuronal projections from nerves innervating the external auricle in rats appears to challenge the opinion that stimulation of the tragal nerve is conducted by the auricular branch of the vagus (Mahadi et al., 2019). Similar studies need to be done in primates to confirm whether the same conclusion may apply to humans.

Many studies have very few participants, with some having as few as one. This leads to difficulties in concluding whether the results or proposed mechanisms can be generalized to a larger population. To avoid the risk of accidentally having extreme or biased results, studies with a large number of participants are required. Because studies often enlist heterogeneous populations with various health conditions, medications, and treatment responses, it is impossible to generalize to another condition or to a healthy group. Rigorous studies with a large number of healthy participants, where a wide range of stimulation parameters are tested within a participant and across a cohort, are needed to draw solid, evidence-based conclusions. Such studies may also reveal biomarkers for responders and non-responders to tVNS.

There has been very little investigation into how long the effects of tVNS last after the stimulation period has ended. Most clinical trials involve daily stimulation periods over the course of the trial, with the therapeutic results measured concurrently. Studies such as that of Hein et al. (2012) have compared therapeutic results after the 2-week treatment period of daily stimulation to the baseline results recorded before the stimulation period. Other studies (Huang et al., 2014; Mei et al., 2014; Rong et al., 2016) measured the therapeutic effects of daily stimulation continuously over set intervals during the trial period. Many studies found that participants who completed the entire treatment study had a better response to tVNS than those who dropped out, and longer treatment periods corresponded with better therapeutic outcomes (He et al., 2013; Bauer et al., 2016; Silberstein et al., 2016a; Liu et al., 2018). However, these studies did not offer a follow-up to see whether the effects of tVNS were long-lasting or remained after the cessation of the treatment period. In the case study by Zhao et al. (2019) on a single participant with insomnia, after 2 weeks of twice daily tVNS the treatment was stopped, but the participant still felt an improvement at the follow-up meeting, 3 months after the trial period. Similarly, Trevizol et al. (2016) had a stimulation period of 10 days, but found the clinical response remained stable 1 month after stimulation had stopped.

Some studies into the pain-relieving effects of tVNS have investigated whether the effects last for some time after the stimulation. Johnson et al. (1991) and Napadow et al. (2012) reported that an analgesic effect was present for up to 15 min after stimulation ceased. Other studies measured the therapeutic effects immediately after stimulation (Capone et al., 2015; Stavrakis et al., 2015; Keute et al., 2019) or at the same time as stimulation (Fallgatter et al., 2003; Kraus et al., 2007, 2013; Dietrich et al., 2008; Lehtimäki et al., 2013). This may offer interesting results for the measurement of brain activity as a result of tVNS but does not indicate whether these effects are long-lasting. Indeed, Frangos et al. (2015) noted that neural activation gradually returned to the baseline after tVNS was stopped. Immediate measurement of the therapeutic effects of tVNS does not therefore indicate whether these effects are merely a temporary result of stimulation or long-lasting.

When long periods of stimulation are required to achieve the maximum effect, it is unreasonable to expect participants to attend prolonged sessions several times per day. Therefore, portable stimulators are required, but gammaCore and NEMOS are currently the only tVNS devices available. It is difficult to track participants' compliance with these devices and record how stimulation parameters change over time. More research is required to produce miniaturized devices that are convenient and safe to use.

# 6. CONCLUSION

tVNS has proven to be an effective way to modulate the central nervous system in some cases. However, the mechanism of action is not clear, and the robustness of the results is yet to be proven. The technique is safe and convenient with only a few relatively minor side effects reported. More rigorous systematic studies are required to investigate the effects of stimulation parameters, sites of stimulation, and electrode types on brain activation and clinical outcomes. Current limitations in study protocols may lead to difficulties in obtaining regulatory approval and challenges in translating research studies into clinical practice.

### AUTHOR CONTRIBUTIONS

JY and PS conceived and designed the idea. JY, CK, TK, and PS wrote the manuscript. All authors aided in interpreting the results and commented on the manuscript.

#### FUNDING

PS and JY acknowledge funding support from the Australian Research Council (IC140100023) and Grey Innovation Pty Ltd.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Handling Editor declared a past co-authorship with one of the authors TK.

Copyright © 2020 Yap, Keatch, Lambert, Woods, Stoddart and Kameneva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Tapping Into the Language of Touch: Using Non-invasive Stimulation to Specify Tactile Afferent Firing Patterns

Richard M. Vickery<sup>1,2\*</sup>, Kevin K. W. Ng<sup>1,2</sup>, Jason R. Potas<sup>1</sup>, Mohit N. Shivdasani<sup>3</sup>, Sarah McIntyre<sup>4</sup>, Saad S. Nagi<sup>4</sup> and Ingvars Birznieks<sup>1,2</sup>

<sup>1</sup> School of Medical Sciences, UNSW Sydney, Sydney, NSW, Australia, <sup>2</sup> Neuroscience Research Australia, Sydney, NSW, Australia, <sup>3</sup> Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, Australia, <sup>4</sup> Center for Social and Affective Neuroscience, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden

#### Edited by:

Yuki Hayashida, Osaka University, Japan

#### Reviewed by:

Vassiliy Tsytsarev, University of Maryland, College Park, United States

Hannes Philipp Saal, University of Sheffield, United Kingdom

#### \*Correspondence:

Richard M. Vickery, richard.vickery@unsw.edu.au

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 30 November 2019 Accepted: 21 April 2020 Published: 19 May 2020

#### Citation:

Vickery RM, Ng KKW, Potas JR, Shivdasani MN, McIntyre S, Nagi SS and Birznieks I (2020) Tapping Into the Language of Touch: Using Non-invasive Stimulation to Specify Tactile Afferent Firing Patterns. Front. Neurosci. 14:500. doi: 10.3389/fnins.2020.00500

The temporal pattern of action potentials can convey rich information in a variety of sensory systems. We describe a new non-invasive technique that enables precise, reliable generation of action potential patterns in tactile peripheral afferent neurons by brief taps on the skin. Using this technique, we demonstrate sophisticated coding of temporal information in the somatosensory system, showing that perceived vibration frequency is not encoded in peripheral afferents, as was expected, by either their firing rate or the underlying periodicity of the stimulus. Instead, a burst gap or silent gap between trains of action potentials conveys frequency information. This opens the possibility of new encoding strategies that could be deployed to convey sensory information using mechanical or electrical stimulation in neural prostheses and brain-machine interfaces, and may extend to senses beyond artificial encoding of aspects of touch. We argue that a focus on appropriate use of effective temporal coding offers more prospects for rapid improvement in the function of these interfaces than attempts to scale up existing devices.

Keywords: bionic, tactile, neural prosthesis, brain-machine interface, somatosensory, spike train, rate code, neural coding

# INTRODUCTION

A sensory brain-machine interface bypasses the default systems of sensory input that transduce environmental signals into neural activity. Instead, neural activity is generated in new ways, driven by computer inputs that are developed based on environmental signals. Improved sensory brain-machine interfaces offer promise in many fields, from quality of life for those with a disability, to augmenting the range of normal senses. One of the major challenges of sensory brain-machine interfaces has traditionally been viewed as the issue of spatial and numerical scale: for example in humans, the optic nerve has on the order of 10<sup>6</sup> axons (Mikelberg et al., 1989), the auditory component of the vestibulocochlear nerve has on the order of 10<sup>4</sup> axons (Spoendlin and Schrott, 1990), and the median nerve arising from the hand also has on the order of 10<sup>4</sup> axons (Johansson and Vallbo, 1979). In contrast, current retinal implants have 20 to a few hundred electrodes (Sinclair et al., 2016; Farvardin et al., 2018), the cochlear implant that stimulates surviving spiral ganglion cells has fewer than 25 electrodes (Patrick and Clark, 1991), and bionic hand prostheses that aim to provide user feedback as touch sensations use implants with 4 to 96 electrodes on up to three nerves (Boretius et al., 2010; Raspopovic et al., 2014; Schiefer et al., 2016; Wendelken et al., 2017). Considerable effort is being expended to bridge the gap between the number of sensors/electrodes and the number of afferent neurons. In this review we hope to provoke reflection on the tractability of this challenge and draw attention to the potential offered by a closer focus on precise control of the timing of inputs through these interfaces. In contrast to the spatial challenges, a sub-millisecond time resolution is easily achieved by any of the current interfaces, and is directly comparable to the time scale of the nervous system, which uses action potentials or "spikes," with a duration of around one millisecond. The language of the brain is spoken in the temporal pattern of these spikes, as well as the array of neurons in which they are active. More focus on this temporal patterning may represent a tractable parallel path to advance the quality of sensory neural prostheses. We will have a particular focus on recent findings in the tactile system, and their implication for efficient encoding of information for relay to the brain.

#### THE NATURE OF NEURAL INFORMATION

The last 100 years have revealed an unprecedented amount about the workings of the nervous system. It is now well understood how voltage-gated ion channels support the transmission of all-or-nothing action potentials in a reliable, rapid manner over long distances (Hodgkin and Huxley, 1952). The conversion of this action potential into a pulse of neurotransmitters that engage with receptors on post-synaptic elements at the synapse is also largely understood (Lisman et al., 2007). What has lagged behind is our understanding of the information content of these events. At a certain point in the neural processing of a sensory event, the entire information content of the event has to be conveyed in the pattern of action potentials travelling in the axons of afferent neurons. Each of the action potentials is just a brief alteration of the membrane potential, but somehow these flickering potentials can convey essential qualities of an event, such as the warmth, texture, shape and firmness of a hand that one is holding.

Some of the information is inherent in the nature of the afferent neuron and the environmental signals it is able to transduce. For instance, cold sensitive afferents express TRP channels in their cell membrane that open when cooled, leading to depolarisation and generation of action potentials (Bautista et al., 2007). Action potentials in these axons signal "cold" because that is the most common origin of action potentials in these axons, and because they are connected to other neurons higher in the nervous system that take part in cold-related behaviours such as shivering. A single action potential in these axons has this property of "cold," even if it is elicited in the axon by something other than the opening of TRP channels such as electrical stimulation.

However, the more detailed information about timing, intensity, and complex stimulus properties is conveyed by the pattern of firing of multiple action potentials in each axon. The default level of neuroscientific analysis of these firing patterns has been to convert them to a mean rate for use as an index of intensity of afferent activation. This is a simple and robust approach, but ignores a long history of research into the role of temporal encoding in the auditory system (Galambos and Davis, 1943) and considerable evidence of a potential role for action potential timing in a variety of sensory systems (VanRullen et al., 2005). A rate-based approach is also used as the default encoding strategy of many sensory prostheses, in part because these devices simultaneously activate large populations of afferents, and because of the view that the temporal information will be recoded to a rate code anyway at higher levels of the nervous system (Ahissar, 1998). This rate-based approach discards the temporal relation between individual action potentials, which we will show is potentially a rich source of information.

#### NEURAL INFORMATION IN TOUCH

Touch is an excellent sensory system in which to explore questions of neural information encoding for a number of important conceptual and practical reasons. There is a considerable body of existing research that suggests that the tactile system may encode information in multiple different ways, some of which depend on precise temporal features of action potential patterns. The tactile nervous system transduces a rich and varied set of stimuli that convey critical information for often subconscious manipulation, but also contributes to conscious and affective experiences. The afferent axons are generally readily accessible for invasive and non-invasive stimulation and recording in the tactile system. This property makes the system an excellent site for a sensory neural prosthesis, as evidenced by the development of increasingly sophisticated closed-loop technologies for prosthetic limbs and haptic devices.

The sense of touch at the fingertips is subserved in humans by four classes of myelinated afferent neurons, reviewed by Johnson (2001), Macefield and Birznieks (2009), and Abraira and Ginty (2013). Some of these afferents, called Fast Adapting (FA), respond only to dynamic stimuli that induce a time-varying strain profile across their receptor endings. Others, called Slowly Adapting (SA), also respond to dynamic stimuli, but are able to produce a response sustained over many seconds to static stimuli. The hand contains approximately 17 000 of these afferents (Johansson and Vallbo, 1979), and their combined activity is sufficient for us to discriminate shapes, textures, contact forces, vibration frequencies and directions of movement. Longstanding research using vibration of punctate probes on the skin has established a set of frequency sensitivity profiles for the four fast touch afferent types found on the hand (Talbot et al., 1968; Johansson et al., 1982). However, the extent to which the information from these different afferent types is maintained in separate channels, and how information in the firing pattern of action potentials conveys the sinusoidal stimulation frequency and the stimulation amplitude, remain areas of active enquiry.

The four different afferent classes have their peak sensitivities at different sinusoidal vibration frequencies. For SAI and SAII afferents, their best vibration sensitivity is at low frequencies, below 8 Hz, while FAI are most sensitive at 32 Hz, and FAII at 256 Hz (Johansson et al., 1982). Even though individual afferents will respond to a wide range of frequencies given a strong enough stimulus (Johansson et al., 1982), a prominent interpretation of the frequency sensitivity profiles of these afferents is that, similar to the "cold" property of a cold afferent, the four mechanosensitive afferent types (SAI, SAII, FAI, and FAII) each give rise to a qualitatively different sensation of frequency. The most developed of these interpretations is the four-channel model of mechanoreception which assigns each afferent type to part of the frequency range based on behaviourally-determined thresholds (Bolanowski et al., 1988). In this model, the SAII are assigned to a high frequency range, and it is suggested that channels interact by summation of their perceived magnitudes (Gescheider et al., 2004). However, this interpretation makes a logical leap that links the frequency-dependent thresholds of individual afferent types, and frequency-dependent variation in perceptual thresholds, to conclude that single afferent types directly and independently mediate the perception of frequency in particular frequency bands.

An extension of this interpretation addresses how to reconcile the signals when multiple afferent types are simultaneously active. In this case, it was hypothesised that the ratio of their activities could encode vibration frequency in a manner analogous to colour-sensitive cone cells in the retina, with the most active afferent types contributing most to frequency perception. However, a study that systematically varied the recruitment ratios of the FA afferents failed to show any consistent effect on perceptual judgements of frequency (Morley and Rowe, 1990). Indeed, more recent evidence from animal studies shows that the signals deriving from these afferents converge at the primary somatosensory cortex, or perhaps even lower levels (Pei et al., 2009; Carter et al., 2014; Saal et al., 2015). This suggests that we should not treat these afferents as pure channels representing a frequency band, and supports the idea that the information from these channels is integrated in a way that is somewhat agnostic about the afferent source.

The question of the neural code for frequency, and how to extract it independently of the stimulus amplitude, is challenging to answer as shown in **Figure 1**. At low amplitudes of stimulation (but above the threshold for a neural response), the tactile afferent neuron will generate an occasional action potential during a cycle of vibration that moves the probe down and up on the skin. At a higher amplitude, the afferent will generate 1 spike for each vibration cycle, and the response rate will match the stimulus frequency; this response pattern is termed 1:1 entrainment or the tuning plateau (Talbot et al., 1968). At even greater amplitudes, the afferent may respond 2, 3, or more times for each cycle of vibration (Johnson, 1974; Johansson et al., 1982). This relationship of amplitude and frequency implies that a neural code based on counting the number of action potentials in a fixed time period (rate code) could possibly be used to determine the amplitude (**Figure 1E**) but cannot be used to determine the frequency of the stimulus.

FIGURE 1 | The complex relationship of afferent response with stimulus amplitude for sinusoidal vibration on the skin. (A–D) The top trace in each condition represents the stimulus, in the bottom trace a vertical inflection represents an action potential or spike in one afferent neuron. (E) Schematic of FAI afferent response when stimulated at 50 Hz over a range of vibration amplitudes. Notice the plateaus in the response over a range of vibration amplitudes, in particular at 1:1 entrainment.
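The ambiguity of a pure rate code described above can be made concrete with a small worked example. The sketch below is illustrative only: the function name `spike_count` and the chosen frequencies are assumptions for this example, and it idealises the afferent as firing a fixed number of spikes on every vibration cycle.

```python
def spike_count(stim_freq_hz: float, spikes_per_cycle: float,
                window_s: float = 1.0) -> float:
    """Expected spike count in a fixed window for an idealised entrained afferent.

    Assumes the afferent fires exactly `spikes_per_cycle` spikes on every
    cycle of the vibration (a simplification of the plateaus in Figure 1E).
    """
    return stim_freq_hz * spikes_per_cycle * window_s


# A 50 Hz vibration strong enough to evoke 2 spikes per cycle ...
strong_low_freq = spike_count(50.0, 2)
# ... and a 100 Hz vibration at 1:1 entrainment give the same count.
weak_high_freq = spike_count(100.0, 1)

print(strong_low_freq, weak_high_freq)
```

Because both stimuli produce the same count in the window, a decoder that only counts spikes cannot recover the stimulus frequency without additional information about the timing of the spikes.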

# EVIDENCE FOR THE IMPORTANCE OF TIMING INFORMATION IN SENSORY NEURAL SIGNALS

Mountcastle's group advanced arguments for a neural code based on the periodicity of inter-spike intervals in neural firing as a means by which the nervous system could distinguish frequency from intensity in a vibratory stimulus (Talbot et al., 1968). This is a time-based code operating on the pattern of spikes to signal frequency, which could function independently of the underlying response rate of the neuron. No specific neural mechanism was proposed to perform this decoding, which was envisaged to operate in part by comparison across cortical neurons, as individual neurons cannot fire at frequencies matching the entrainment rate. Even 50 years later, no easily interpretable neural code or decoding mechanism for vibrotactile frequency in cortical neurons has been discovered, other than an apparent special case of place coding in mice related to deep Pacinian inputs (Prsa et al., 2019).

The neural code for the intensity of the vibratory stimulus was proposed by Johnson (1974) and later Muniak et al. (2007) to be based on either the number of active afferents, or a code based on the firing rate in these afferents.

Over the last 50 years, there has been increasing evidence to support an important role for timing information such as that based on phase-locking proposed by Talbot et al. (1968) rather than simply relying on the spike rate in sensory inputs. This has been demonstrated for hearing using both normal audition (von Békésy, 1961) and by stimulation through a cochlear implant, using pulse trains that alternated 2 intervals (Carlyon et al., 2008), where the perceptual frequency was not that expected from the simple arithmetic rate of pulses presented. There is suggestive evidence from visual experiments in animals (Gollisch and Meister, 2008) where it was shown that just the latencies of the very first spike from multiple retinal ganglion cells to a flashed visual image were sufficient to enable reconstruction of the image. The results were even more robust when latency differences between neurons were used, and other similar studies are reviewed in VanRullen et al. (2005) and Panzeri et al. (2014). For these measurements, stimulus onset, which in nature would be triggered by a rapid eye movement, acts as the base time point from which latencies are measured. There is similarly suggestive data for the significance of equivalent first spike timing in the tactile discrimination of force direction and object shape (Johansson and Birznieks, 2004), torque (Birznieks et al., 2010; Redmond et al., 2010), contact force and friction (Khamis et al., 2014). Analyses of inputs from FAI and SAI afferents suggest that fine-grained temporal information can be used to improve discrimination of the edge orientation of tactile objects (Pruszynski and Johansson, 2014).

Animal studies with tactile stimuli also indicate that temporal codes are important (Panzeri et al., 2001; Arabzadeh et al., 2006). In experiments in awake behaving rats that made texture judgements using their whiskers, it was shown that time-based measures carried greater information (Zuo et al., 2015). The time-based measures were created by determining a template that weighted spike contributions, based on their time after whisker contact, to produce best discrimination. A challenge in many such studies is that demonstrating that timing conveys more information than rate alone is not the same as showing that the nervous system makes use of this available information. This study by Zuo et al. (2015) begins to bridge the gap between what spike timing information, in particular relative spike timing information from an ensemble neural population, might enable, and what it is actually used for, by showing that these timing measures were better predictors of the actual correct-incorrect decision that the animal made about the texture than rate-based measures. This suggests that although the exact timing mechanism employed by the experimenters is unlikely to be what is implemented in the nervous system, the nervous system is actually employing some form of analysis of time-based information. In a study on discrimination between different fabrics on a rotating drum, human behavioural performance was compared with possible temporal and rate codes based on recordings in monkeys from single tactile afferents innervating the fingers (Weber et al., 2013). The evidence was clear that judgements about fine textures were predominantly based on temporally-coded information arriving via FA afferents, whereas coarse textures depended on a population rate code in SAI afferents.

# CONTROL OF TACTILE AFFERENT SPIKE TIMING BY NON-INVASIVE STIMULATION

The two studies described above were able to unite stimulus control, neuronal recording and behavioural experiments in awake behaving animals, but obtaining equivalent data in humans is particularly challenging. In our laboratory, we have been able to unite two technologies and bring a new approach toward trying to resolve these questions of information transmission in the peripheral nervous system for touch. One technology is pulsatile stimulation, which offers a non-invasive way to induce precise patterns of single action potentials in a small population of peripheral afferent neurons. The other technology is microneurography (Vallbo and Hagbarth, 1968), which enables us to record activity in real time from single afferent neurons in awake humans. The combination of these techniques with psychophysics enables us to confidently interrogate questions of the neural coding of complex tactile properties, by giving us near complete control over the ascending afferent activity patterns (Birznieks and Vickery, 2017).

The stimulation technique relies on creating precisely controlled spike patterns in tactile afferents by using brief taps (stimulus pulses) delivered at intensities well above the neural response threshold. Provided the duration of the pulse stimulus is approximately the same as the refractory period of the afferent axon (around 1.5 ms under normal conditions), each pulse will induce a maximum of a single spike in responding afferent axons over a wide range of stimulus amplitudes. This technique enables us to reproduce a desired spiking pattern in human peripheral afferent axons, and perform psychophysical experiments to interrogate the sensation elicited. The pulses can be repeated at any desired timing and repetition rate, while always activating the same population of afferent neurons. In this way, we can simulate varied environmental parameters by creating spiking patterns that reflect those that the environmental condition would have elicited, but maintain a fixed afferent population that drives the sensation. We use the technique of microneurography to validate the fidelity of the conversion of our stimulus into spike patterns by recording from single human tactile afferents while the subject receives the pulsatile stimulation (see **Figure 2**).

The pulsatile stimuli used in our experiments are all delivered non-invasively, either by mechanical stimulation of the skin, or by electrical stimulation of the skin overlying a peripheral nerve such as the digital nerve. For electrical stimulation, we use an isolated stimulator such as the DS5 (Digitimer, Hertfordshire, United Kingdom) to deliver charge-balanced stimulation, with a depolarising phase of 0.1 ms, and a repolarising phase of 1 ms. For mechanical stimulation we have several devices capable of producing a brief mechanical pulse. We have used an Optacon 1D (Bliss, 1969) driven by a custom-built interface which offers 144 pins with approximately 15 µm displacement over <2 ms. The amplitude and pulse width are adequate to reliably activate FAI and FAII afferents, but not SA afferents, which is consistent with findings in the monkey (Gardner and Palmer, 1989). To obtain larger pulse amplitudes that recruit SA afferents, we use a GW-V4 shaker (Data Physics, San Jose, United States) controlled via a Power1401 (CED, Milton, United Kingdom) where we use feedforward control to damp the resonance of the shaker to ensure a brief single mechanical pulse at amplitudes up to 150 µm.

The use of low amplitude (<5 µm) pulsatile stimulation enables selective activation of FAII afferents by exploiting their extreme sensitivity to short-period waveforms (Johansson et al., 1982). At frequencies below 40 Hz, tactile inputs are normally dominated by FAI afferents because they are activated at a lower threshold amplitude than FAII afferents when sinusoidal vibration is used, as has been the case in most experiments using vibrotactile stimuli. However, by using our low amplitude pulsatile stimuli, we are able to activate FAII afferents selectively without FAI activity, even at low frequencies. These non-invasive stimulation tools enabled us to demonstrate that the FAII afferents are capable of sustaining high-quality vibration perception at these low frequencies, which provides further support for the convergence of these input channels onto common cortical frequency processing circuits (Birznieks et al., 2019).

## MODIFYING HUMAN TOUCH SENSATION BY VARYING SPIKE TIMING PATTERNS

Through the use of these non-invasive stimulation techniques, we have been able to demonstrate the critical importance of spike timing in shaping human tactile perception, in this case, of vibrotactile frequency. We set out to show that the spike rate in peripheral afferents could not plausibly code for vibration frequency, informed by the intuition from **Figure 1**, that increasing vibration amplitude leads to more spikes per cycle, but does not produce an equivalent upward shift in the perceived frequency. Using controlled pulsatile stimuli such as those shown in **Figure 2**, which are a controlled way of simulating the burst firing illustrated in **Figure 1**, we demonstrated that the perceived frequency was not related to the spike rate (Birznieks and Vickery, 2017). Unexpectedly, however, we were not able to demonstrate that the perceived frequency was related to the underlying periodicity of these stimuli in accordance with the hypotheses of Talbot et al. (1968). Instead, we found that subjects' perceived frequency was best explained by the silent gap between the bursts of spikes irrespective of the number of spikes within a burst, burst duration or periodicity. The difference between the burst gap and the periodicity is illustrated in **Figure 3**. Bursts are well-known in the neuroscientific literature as a common form of neural activity in the tactile system (Vickery et al., 1994) and elsewhere. Bursts have been speculated to play a key role in information processing by providing precise information that can be reliably signalled across the next relay (Lisman, 1997), by interacting with resonant frequency tuning of relay cells (Izhikevich et al., 2003) and by offering a parallel information path in the form of bursts contrasted to isolated spikes (Naud and Sprekeler, 2018). 
In these three studies, the definition of a burst's duration ranged from 10 to 25 ms, with this difference likely dependent on the temporal integration properties of the neurons studied. We determined the time envelope within which subsequent spikes in our study would be grouped together as a burst, by determining the range of pulse separations over which the burst gap applied. We found that spikes in the tactile afferents with a 15 ms envelope were treated as a burst, and over the range of 15–25 ms, there was still some interaction. Beyond 25 ms the spikes were treated as independent sensory events and the burst gap code no longer applied and perception could be explained by a rate code (Birznieks and Vickery, 2017). We have now extended these findings to show that we can elicit the same burst gap responses when we deliver transcutaneous electrical stimulation to digital nerves instead of using mechanical stimulation (Ng et al., 2020).
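The distinction between mean spike rate, periodicity and burst gap can be illustrated with a short sketch. This is a simplified illustration, not the analysis used in the cited studies: the function names, the rule of anchoring the envelope to the first spike of each burst, and taking the reciprocal of the mean silent gap as a frequency estimate are all assumptions made for this example.

```python
def group_into_bursts(spike_times_ms, envelope_ms=15.0):
    """Group spike times into bursts.

    Assumption: a spike joins the current burst if it falls within
    `envelope_ms` of that burst's first spike; otherwise it starts a new burst.
    """
    bursts = []
    for t in sorted(spike_times_ms):
        if bursts and t - bursts[-1][0] <= envelope_ms:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def burst_gap_frequency_hz(spike_times_ms, envelope_ms=15.0):
    """Frequency estimate from the silent gap: last spike of one burst
    to the first spike of the next. Returns None if there is only one burst."""
    bursts = group_into_bursts(spike_times_ms, envelope_ms)
    gaps = [nxt[0] - cur[-1] for cur, nxt in zip(bursts, bursts[1:])]
    if not gaps:
        return None
    return 1000.0 / (sum(gaps) / len(gaps))


def mean_rate_hz(spike_times_ms, duration_ms):
    """Plain rate code: number of spikes divided by the stimulus duration."""
    return 1000.0 * len(spike_times_ms) / duration_ms


# Hypothetical train: 2-spike bursts (5 ms apart) repeating every 50 ms for 200 ms.
train = [0.0, 5.0, 50.0, 55.0, 100.0, 105.0, 150.0, 155.0]

print(mean_rate_hz(train, 200.0))      # rate code: 40 Hz
print(burst_gap_frequency_hz(train))   # silent gap of 45 ms: about 22.2 Hz
```

For this hypothetical train the three candidate codes give different answers: 40 Hz from the mean rate, 20 Hz from the 50 ms burst period, and about 22 Hz from the 45 ms silent gap, which is why such stimuli can separate the hypotheses psychophysically.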

To demonstrate the robustness of the integration envelope for spiking patterns containing bursts, we have also tested aperiodic stimuli (Ng et al., 2018) that may better model the variation encountered in day-to-day tactile exploration of surfaces. Using the Optacon, we delivered spike patterns with intervals ranging from 4 to 113 ms with mean spike rates below 50 Hz. Our prediction was that the perceived frequency of these stimuli would be lower than the mean spike rate as a result of the intervals of <25 ms falling within the burst window and so not contributing their weights to the apparent frequency. All three frequencies tested showed perceived frequencies approximately 80% of that which would be expected from the mean rate (Ng et al., 2018). This provides compelling evidence that mean spike rate is not the key determinant of perceived frequency, and that the fine temporal structure of spike trains plays a critical role in the sensory experience. This is consistent with experimental results that pooled afferent data from monkeys with psychophysics studies conducted in humans to show that fine temporal features on a millisecond scale detected by FA afferents were far stronger predictors of human perception of the similarity of two stimuli than were measures based on a rate code (Mackevicius et al., 2012).

# IMPLICATIONS FOR DESIGN OF SENSORY NEURAL PROSTHESES AND BRAIN-MACHINE INTERFACES

The insights into neural coding, and the opportunity afforded by control of spike timing, represent advances that should be translatable into advances in interfaces between external devices and peripheral nerves or neurons in the central nervous system. Although current bionic prostheses offer a profound benefit for users, as outlined in the introduction, they are nowhere near matching the scale of the sensory transduction systems that they are designed to interface with. Any currently foreseeable improvement would still leave interfaces a long way from a 1:1 connection between sensor and afferent neuron. This challenge has several dimensions; one is to maintain a stable connection with a neuron or axon that is fragile, flexible, and has a size on the order of 10 µm. Although there are possible new approaches using flexible materials and optical technologies, there remain very significant challenges to be overcome (Durand et al., 2014).

Assuming the problem of scale can somehow be overcome by increasingly miniaturised technology, the 1:1 sensor-afferent relationship can only work efficiently if the afferents can be mapped so that it is known what sensation each afferent gives rise to. This enables computational processing of the sensor signals to optimise the sensory experience by stimulating each afferent with input from appropriate sensors in appropriate patterns. For some afferents, this mapping should prove straight-forward, as it seems that in general single FAI and FAII afferents can give rise to a clear and localised percept (Ochoa and Torebjörk, 1983; Vallbo and Johansson, 1984) and SAI from the dorsum of the foot and hand can evoke conscious sensation (Nagi et al., 2019). In contrast, single SAII afferents (Ochoa and Torebjörk, 1983), and SAI afferents from the hairy skin of the forearm (Vickery et al., 1993), do not appear to give rise to a conscious percept. It may be that activation of certain combinations of these afferents can evoke a perception but the complexity and dimensionality of these combinations are currently not understood.

Without a 1:1 scale, the bionic prosthesis is left to activate groups of neurons rather than single afferents. In a relatively homogeneous sensory system such as the auditory system which has only inner and outer hair cells as the major neural distinctions in the cochlea, this strategy may prove effective. In vision, the photoreceptors converge in complex ways onto ganglion cells, creating, among others, on-centre and off-centre pathways. Simultaneously stimulating groups of on-centre and off-centre cells creates an unnatural perception, as normally only one or the other type would be active for a particular retinal location at any instant, and this may partly explain why retinal stimulation produces phosphenes (Sinclair et al., 2016) rather than more natural percepts. In the tactile system, a bionic prosthesis activating groups of afferents will likely activate both slowly adapting and rapidly adapting neurons, thereby limiting the ability to tailor the stimulation strategy to suit the particular afferent type. Using mechanical stimulation instead of electrical stimulation enables some selectivity over which tactile afferent types are recruited (Antfolk et al., 2013; Birznieks et al., 2019).

An important related question is the extent to which a single afferent should be treated as a unique individual input, rather than as a representative of an afferent class. The type of nerve ending and its distance from the stimulus site matters, but every tactile afferent has its own unique sensitivity profile whereby it is more efficiently excited than other afferents by certain features of a given stimulus due to variations in receptor embedding in skin tissue, such as geometry and anchoring (Birznieks et al., 2001). Natural stimuli will likely activate afferents of all classes (Johansson and Westling, 1987); however, any given afferent may contribute to perception in a very specific way in one situation but not at all in another situation. It is currently an open question as to how much the higher levels of the nervous system take advantage of these highly specific sources of information. Presumably, correlation of stimulus and particular individual afferent activations could be learned over the course of development through neural plasticity to inform decision making. A prosthetic replacement stimulus ideally would harness the same neural plasticity to maximise the information that can be conveyed.

The semi-selective mechanical stimulation proposed above could be combined with targeted sensory reinnervation surgery in amputees, where the stumps of the nerves that would normally have innervated the hand (ulnar and median) are surgically repositioned and stimulated to grow, so that they innervate the overlying skin of the new site which has been de-innervated. Mechanical stimulation of the skin at this new site would be able to provide a more natural mapping of sensation from different afferent types to the bionic hand, as the sensation would appear to the subject to arise from their hand, rather than from the body site actually stimulated (Hebert et al., 2014).

## USING TEMPORAL NEURAL CODES TO IMPROVE SENSORY NEURAL PROSTHESES

We suggest that a renewed focus on understanding, and deploying, precise temporal information in the induced spike patterns can help realise better outcomes for bionic prostheses. Unlike the problem of spatial and numerical scale, the timescale of the nervous system is very tractable with current technology. A time resolution of 0.2 ms, translating to a digital to analog conversion (DAC) rate of 5 kHz per channel, is almost certainly sufficient to capture the full temporal resolution of the nervous system (Mackevicius et al., 2012) except in the auditory domain for sound localisation by inter-aural time differences where thresholds can be 0.01 ms (Brughera et al., 2013). There are two approaches to better encode this temporal information, which vary in the extent of information interpretation required.

FIGURE 4 | [...] span the spiking activity across a population of afferents (1...n) as the "silent gap." The near simultaneous events in the periphery become dispersed in time on arrival at the central nervous system (CNS) due to differences between afferent conduction velocities.

One strategy is to take an agnostic view of the salience of the patterns, and instead simply relay the temporal information as realistic spiking patterns as faithfully as possible. The strength of such an approach is that potentially useful information, whose encoding we do not yet understand, is not discarded. This approach is behind one technique we have employed (Rager et al., 2013) to preserve spike firing patterns related to environmental features down to sub-millisecond precision. We built a library of virtual tactile afferent neurons by training noisy integrate-and-fire neurons (Paninski et al., 2004) on data derived from real afferents while driving them through artificial sensors given the same set of mechanical stimuli. We accepted that we would sacrifice spatial scale by using only a few transducers in place of the thousands of normal tactile receptors, but we were able to preserve spike firing patterns at high temporal resolution. A related approach, based on TouchSim which models afferent population responses (Saal et al., 2017), uses pooled outputs from the model (implemented in an efficient coding algorithm) to capture both spiking patterns and number of active afferents (Okorokova et al., 2018). This approach performs well and shows good modulation with variations in stimulus intensity, but may lose some time fidelity through pooling which is apparent at frequencies above 60 Hz (Okorokova et al., 2018).
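To give a flavour of how a sensor signal can be turned into spike times by a noisy integrate-and-fire unit, a minimal sketch follows. It is not the fitted model of Rager et al. (2013) nor the formulation of Paninski et al. (2004): it is a generic leaky integrate-and-fire neuron with additive Gaussian noise, and the function name and all parameter values (time constant, threshold, noise level, refractory period) are illustrative assumptions.

```python
import random


def noisy_lif_spike_times(drive, dt_ms=0.1, tau_ms=10.0, threshold=1.0,
                          noise_sd=0.05, refractory_ms=1.5, seed=0):
    """Return spike times (ms) of a noisy leaky integrate-and-fire unit.

    `drive` is a sequence of input samples (arbitrary units), one per time
    step of length `dt_ms`. All parameter values are illustrative only.
    """
    rng = random.Random(seed)       # seeded so the sketch is reproducible
    v = 0.0                         # membrane state, reset value is 0
    refractory_until = -1.0
    spikes = []
    for i, x in enumerate(drive):
        t = i * dt_ms
        if t < refractory_until:    # hold at reset during the refractory period
            v = 0.0
            continue
        # Euler step of dv/dt = (-v + x) / tau, plus noise scaled by sqrt(dt)
        v += dt_ms * (-v + x) / tau_ms
        v += noise_sd * (dt_ms ** 0.5) * rng.gauss(0.0, 1.0)
        if v >= threshold:
            spikes.append(t)
            v = 0.0
            refractory_until = t + refractory_ms
    return spikes


# Hypothetical sensor signal: a constant drive lasting 200 ms.
constant_drive = [2.0] * 2000
spikes = noisy_lif_spike_times(constant_drive)
print(len(spikes), spikes[:3])
```

With a 0.1 ms step the spike times are resolved at sub-millisecond precision, and replacing `constant_drive` with a recorded sensor waveform would yield a spike train whose timing follows the features of that waveform. Fitting such a unit to real afferent recordings is a separate step that is not shown here.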

The other strategy is to try to determine how sensory information is conveyed in the temporal patterning of spike firing. This approach enables synthesis of desired sensation by creating the appropriate spike pattern, as well as advancing our basic neuroscientific understanding. However, as the progress of more than 50 years of research outlined above shows, there remains much to be learned. The insights about how frequency is encoded by the burst gap rather than the period represent one small step in this direction. We are currently exploring whether the spikes that are "hidden" inside the burst envelope in peripheral afferent spike patterns may contribute to other aspects of tactile sensation such as intensity (Ng et al., 2019). Other groups are combining animal experiments with single unit cortical recording and behavioural experiments to understand the neural code for tactile information at higher levels of the nervous system (Harvey et al., 2013). Studies in the barrel cortex of mice, an important tactile area for whisker inputs in rodents, suggest that strong integration of whisker movements occurs over a short time period of less than 25 ms (Stüttgen and Schwarz, 2010; Estebanez et al., 2012; Tsytsarev et al., 2016), which fits well with our observations. Weaker integration of one or two inter-spike intervals over longer time periods has also been reported (Pitas et al., 2016), which supports the importance of temporal pattern encoding, and may underlie the recognition of the burst gap intervals.

An open question is whether it is sufficient to examine temporal coding in spike patterns of single afferents or whether temporal patterning should be considered as extending across a population of afferent neurons. One challenge facing a population-based model is the dispersion in arrival times at the central nervous system (CNS) of spikes travelling in different afferents that originate from a single tactile event in the periphery. The conduction velocity of human afferent axons varies from 35 to 70 m s<sup>−1</sup> between axons in a single nerve (Dorfman, 1984). Over the approximately 1 m conduction distance from fingertip to brainstem, this velocity difference translates to a difference in arrival times of about 15 ms. This is a close match to the time envelope defining a burst that we discovered for tactile afferents, and opens the possibility that one aspect of burst gap encoding is to preserve unity of sensation arising from the spikes produced by a single peripheral event. By ignoring the scattered spikes, the nervous system can reliably distinguish a single event. This suggests a modified form of the burst gap, which we have termed the "silent gap," where the burst is defined by spiking activity aggregated across afferents as shown in **Figure 4**. This aggregation would occur at the first central nervous system synapse, which for the main tactile pathways is in the dorsal column nuclei. These nuclei are also a focus for a possible brain-machine interface, with early work showing potential for decoding the afferent input signals using machine-learning techniques (Sritharan et al., 2016; Loutit et al., 2017, 2019; Loutit and Potas, 2020).
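The arrival-time dispersion quoted above follows from the conduction distance and the two extreme velocities. The short sketch below reproduces that arithmetic; the function names are illustrative and not taken from the source, and the 1 m distance is the approximate value given in the text.

```python
def arrival_time_ms(distance_m: float, velocity_m_per_s: float) -> float:
    """Conduction time in milliseconds for a spike travelling distance_m."""
    return 1000.0 * distance_m / velocity_m_per_s


def arrival_dispersion_ms(distance_m: float, v_slow: float, v_fast: float) -> float:
    """Difference in arrival time between the slowest and fastest axon."""
    return arrival_time_ms(distance_m, v_slow) - arrival_time_ms(distance_m, v_fast)


# Approximate fingertip-to-brainstem distance and the velocity range from the text.
dispersion = arrival_dispersion_ms(1.0, 35.0, 70.0)
print(f"{dispersion:.1f} ms")  # prints "14.3 ms", i.e. about 15 ms
```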

### CONCLUSION

Although it is clear that there is still far more to be understood about the information encoded by the temporal patterning of spikes, it is also clear that this represents a relatively underutilised tool to improve sensory neural prostheses and brain-machine interfaces. The tractability of precisely controlling temporal features, when compared with the many challenges of other ways of improving these interfaces, suggests that basic neuroscientific research needs to continue to advance the field, but that current understandings should be built into the next generation of devices. It is likely that tactile and auditory prostheses will show the most benefit from the introduction of temporal-based encoding, as the use of time-based information is best understood in these sensory systems; but further investigations will likely reveal ways to deploy this usefully in other modalities.

### AUTHOR CONTRIBUTIONS

RV wrote the first draft of the manuscript. All authors contributed to conception of the project, manuscript revision, and read and approved the submitted version.

### FUNDING

Some of the work described was supported by an NHMRC project grant APP1028284 to IB and RV, an ARC Discovery Project grant DP200100630 to IB, RV, JP, and MS, and an Australian Government Research Training Program Scholarship to KN.

### ACKNOWLEDGMENTS

We acknowledge the technical help of Mr. Edward Crawford (UNSW Sydney) and Mr. Hilary Carter (NeuRA) in developing the pulsatile stimulation systems.

### REFERENCES

system that utilizes spike pattern regardless of receptor type. eLife 8:e46510. doi: 10.7554/eLife.46510

electric and acoustic hearing. J. Acoust. Soc. Am. 123, 973–985. doi: 10.1121/1.2821986



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Vickery, Ng, Potas, Shivdasani, McIntyre, Nagi and Birznieks. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Investigation of Electrically Evoked Auditory Brainstem Responses to Multi-Pulse Stimulation of High Frequency in Cochlear Implant Users

Ali Saeedi<sup>1,2</sup>\* and Werner Hemmert<sup>1,2</sup>

<sup>1</sup> Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany, <sup>2</sup> Munich School of Bioengineering, Technical University of Munich, Garching, Germany

We investigated the effects of electric multi-pulse stimulation on electrically evoked auditory brainstem responses (eABRs). Multi-pulses with a high burst rate of 10,000 pps were assembled from pulses of 45-µs phase duration. Conditions of 1, 2, 4, 8, and 16 pulses were investigated. Psychophysical thresholds (THRs) and most comfortable levels (MCLs) in multi-pulse conditions were measured. The slopes of the psychophysical temporal integration functions (THRs and MCLs as a function of the number of pulses) were −1.30 and −0.93 dB per doubling of the number of pulses, respectively, where a doubling of the number of pulses corresponds to a doubling of stimulus duration. A total of 15 eABR conditions with different numbers of pulses and amplitudes were measured. The morphology of eABRs to multi-pulse stimuli did not differ from that of eABRs to conventional single pulses. eABR wave eV amplitudes and latencies were analyzed extensively. At a fixed stimulation amplitude, an increasing number of pulses caused increasing wave eV amplitudes up to a certain, subject-dependent number of pulses. Then, amplitudes either saturated or even decreased. This contradicted the conventional amplitude growth functions and also contradicted psychophysical results. We showed that destructive interference could be a possible reason for such a finding, where peaks and troughs of responses to the first pulses were suppressed by those of successive pulses in the train. This study provides data on psychophysical THRs and MCLs and corresponding eABR responses for stimulation with single-pulse and multi-pulse stimuli with increasing duration. Therefore, it provides insights into how pulse trains integrate at the level of the brainstem.

Keywords: multi-pulse stimulation, temporal integration, brainstem response, cochlear implants, threshold

#### Edited by:

Tianruo Guo, University of New South Wales, Australia

#### Reviewed by:

Rachele Sangaletti, University of Miami, United States

Mohit Naresh Shivdasani, University of New South Wales, Australia

#### \*Correspondence:

Ali Saeedi ali.saeedi@tum.de

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 21 December 2019 Accepted: 18 May 2020 Published: 30 June 2020

#### Citation:

Saeedi A and Hemmert W (2020) Investigation of Electrically Evoked Auditory Brainstem Responses to Multi-Pulse Stimulation of High Frequency in Cochlear Implant Users. Front. Neurosci. 14:615. doi: 10.3389/fnins.2020.00615

### INTRODUCTION

Cochlear implants (CIs) can restore hearing and speech understanding to people with severe to profound hearing loss to a surprisingly high degree by electrical stimulation of the residual auditory nerves (ANs). As the dynamic range of electric stimulation is much narrower than in the intact ear, it is necessary to set sensation thresholds and maximum stimulation levels properly. Both levels depend on the stimulation rate and on the number of pulses (or the length of the pulse train) delivered. The dependence on these two parameters is described by the multi-pulse integration (MPI) and temporal integration (TI) functions, respectively. For a fixed (usually long) stimulation duration, the MPI function relates the psychophysical detection threshold (THR) to the stimulation rate (McKay and McDermott, 1998). The TI function describes how the detection THR varies as a function of stimulation duration when the stimulation rate is fixed. The time range in TI functions varies from tens of milliseconds to hundreds of milliseconds with large individual variations. TI in acoustic hearing leads to a THR decrease with a slope of approximately 2.5 dB per doubling of stimulus duration up to about 300 ms (Gerken et al., 1990).

Studies that investigated TI functions for electric hearing generally reported that, similar to MPI functions, TI functions fall as the stimulation duration (or, equivalently, the number of pulses) increases, both in animal studies (Donaldson et al., 1997; Zhou et al., 2015) and in human studies (Zhou et al., 2015). Donaldson et al. (1997) found THR TI slopes of 0.42 dB/doubling of the number of pulses, ranging from 1 to 64 pulses at a 100-pps stimulation rate. Zhou et al. (2015) found that for a stimulation rate of 640 pps, mean TI slopes dropped about 0.88 dB/doubling of stimulation duration from 31.25 to 250 ms (20 to 160 pulses). Donaldson et al. (1997) found that not only THRs but also loudness levels, including maximum acceptable levels (MALs), dropped when the stimulation duration increased. For MALs, they found large intersubject variability of TI slopes, i.e., shallower, equally steep, and steeper TI slopes in comparison to the THR TI slopes. Obando Leitón (2019) measured TI functions for two rates in a very comprehensive study. Slopes showed a large variation between subjects but also between different electrodes within a subject. For a stimulus of 300-ms duration, slopes ranged from −5.24 dB to −2.32 dB/doubling when the stimulation rate increased from 1500 to 18000 pps. Over all subjects, Obando Leitón (2019) observed that increasing the stimulation rate from 1500 to 18000 pps caused THR levels to decrease by approximately 11 dB, which corresponds to a slope of −3.1 dB/rate doubling. Obando Leitón (2019) also found that the MALs dropped by 4 dB when the stimulation rate was increased from 1500 to 18000 pps, which suggests a slope of −1.11 dB/rate doubling. Temporal integration effects between two pulses are usually quite small (Karg et al., 2013). Nevertheless, for long pulse trains, MPI effects on THR and MAL can be large.

For low stimulation rates (below 1000 pulses per second, pps), THRs in CI users fall by less than 1 dB/doubling of stimulus duration (Donaldson et al., 1997). When the stimulation rate exceeds 1000 pps, the slope of the MPI function becomes steeper, in guinea pigs (Middlebrooks, 2004; Kang et al., 2010; Zhou et al., 2015) and in humans (Shannon, 1985; McKay and McDermott, 1998; Zhou et al., 2012; Carlyon et al., 2015). As an example, Kang et al. (2010) found a significant decrease in MPI slopes when rates below 1000 pps increased to above 1000 pps (Δslopes = −2.88 and −2.83 dB/doubling of pulse rate at two stimulation sites). Similarly, Carlyon et al. (2015) observed a THR decrease of 7.71 dB when increasing the stimulation rate from 500 to 3500 pps for pulse-train durations of 400 ms, which is equivalent to a slope of −2.74 dB/rate doubling. An exception was Skinner et al. (2000), who found the MPI slope to drop by less than 0.1 dB/doubling of the pulse rate for rates above 1000 pps and even less for rates below 1000 pps. Slopes of MPI functions for C-levels are reported to be steeper for rates above 1000 pps compared to rates below 1000 pps (Zhou et al., 2012). In a human study, they found that MPI slopes for the C-levels were 0.65, 0.54, and 1.19 dB/doubling of the stimulation rate steeper for rates above 1000 pps than for rates below 1000 pps, at three stimulation sites, respectively. Zhou et al. (2012) observed that TI slopes for THRs were steeper than those for MAL/C-levels. For basal and middle sites, MPI slopes for THRs were 1.24 dB and 1.07 dB/doubling of the rate, respectively, which were 0.59 dB and 0.53 dB steeper than their corresponding MPI slopes for C-levels. Since Zhou et al. (2012) found no correlation between slopes of C-level and THR MPI functions, they claimed that the underlying mechanisms of these two functions are probably different.
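The "dB per rate doubling" slopes quoted in the two preceding paragraphs can be checked by dividing the total level change by the number of rate doublings, log2(rate_to / rate_from). The sketch below is only an arithmetic illustration; the function name is hypothetical, and the results agree with the quoted slopes to within a few hundredths of a dB.

```python
import math


def slope_per_doubling(delta_db: float, rate_from_pps: float, rate_to_pps: float) -> float:
    """Average level change (dB) per doubling of the stimulation rate."""
    doublings = math.log2(rate_to_pps / rate_from_pps)
    return delta_db / doublings


# Level changes and rate ranges as quoted in the text.
thr_slope = slope_per_doubling(-11.0, 1500, 18000)    # THR, Obando Leitón (2019)
mal_slope = slope_per_doubling(-4.0, 1500, 18000)     # MAL, Obando Leitón (2019)
carlyon_slope = slope_per_doubling(-7.71, 500, 3500)  # THR, Carlyon et al. (2015)

for name, value in [("THR", thr_slope), ("MAL", mal_slope), ("Carlyon", carlyon_slope)]:
    print(f"{name}: {value:.2f} dB/rate doubling")
```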

Middlebrooks (2004) and Zhou et al. (2012) attributed the steeper MPI slopes at rates above 1000 pps to a residual partial depolarization mechanism, where initial subthreshold pulses partially depolarize a single AN or a group of ANs and further pulses, occurring within a 1-ms time window, increase the chance of firing an action potential, thus lowering the THR level. In terms of temporal considerations, this effect is also known as "facilitation," where the elevated membrane potential of the auditory nerve, caused by the first pulse in the train, makes it easier for successive pulses to elicit an action potential (Hodgkin, 1938; Hodgkin and Huxley, 1952; Boulet et al., 2016).

The slopes of MPI functions have been suggested as a possible indicator of cochlear health in the area close to the stimulation site, either in CI users (Kang et al., 2010; Pfingst et al., 2011; Zhou et al., 2012, 2018; Zhou and Pfingst, 2016) or in normal-hearing listeners (Shannon, 1983). Psychophysical results from Kang et al. (2010) and Pfingst et al. (2011) indicated that in guinea pigs, for stimulation rates below 1000 pps, there is a correlation between the THR MPI slopes and cochlear health state in terms of hair cell counts, auditory nerves, and ensemble spontaneous activity (ESA).

Electrical stimulation with high pulse rates is thought to resemble the spontaneous activity of auditory nerve fibers (ANFs) in a healthy ear (Rubinstein et al., 1999; Litvak et al., 2003; Hughes et al., 2012). Rubinstein et al. (1999) found that for pulse rates above 2000 pps, human electrically evoked auditory compound action potential (eCAP) responses to a pulse train dropped dramatically after a strong response to the initial pulse of the train and were sustained afterward. They interpreted this sustained activity as an independent quasi-stochastic activity of ANFs resulting from desynchronization of populations of ANFs. For stimulation rates below 1016 pps, they still observed an alternating amplitude pattern of the eCAP for successive pulses of the train after a relatively strong initial response to the first pulse. The rate at which the alternating pattern seemed to vanish and the sustained pattern appeared was referred to as the "stochastic rate" (Hughes et al., 2012) and occurred at rates above 2033 pps in Rubinstein et al. (1999). Hughes et al. (2012) observed that the stochastic rate was variable (about 2400 to 3500 pps) between different electrodes in human subjects. Similar to human results, Litvak et al. (2003) found a sustained discharge rate in cat ANFs in response to a 5000-pps pulse train. They claimed that, since no correlation between simultaneous measurements of pairs of ANF activities was found, the 5000-pps pulse rate desynchronized the auditory nerve activities, which is, again, evidence that high stimulation rates could improve the neural representation of electric stimuli.

Another motivation to use high pulse rates in electric hearing is to represent the global stimulation rate induced by the stimulation rate of individual electrodes in CIs. Results of the finite element model from Bai et al. (2019) and measurement data from Obando Leitón (2019) and many others suggest that stimulation of a single electrode contact leads to a broad spread of current along the cochlea, which means that in electric hearing, neurons are stimulated not only by the nearest electrode but also by the neighboring electrodes. Therefore, the effective stimulation which reaches a spiral ganglion neuron—at least in the continuous interleaved stimulation (CIS) strategy—is a burst with the global stimulation rate originating from neighboring electrodes, which is very similar to our experiment.

The studies mentioned above investigated the effects of multi-pulse stimulation on either the most central (psychophysical studies) or the most peripheral (eCAPs or ESA) stages of the auditory system. It is still worth investigating such an effect at a location between these two extremes, which, to the best of our knowledge, has not yet been done. Such a study will shed light on temporal integration at the level of the auditory brainstem as well as on how temporal properties such as refractoriness and facilitation function. Based on these foundations, we designed this study to investigate electrically evoked auditory brainstem responses (eABRs) to high-rate electrical multi-pulse stimuli in CI users. We measured eABRs to stimuli with different numbers of pulses but with the same physical stimulation amplitude to see how multi-pulses are integrated at the level of the brainstem. We also evaluated the contribution of nerve responses to each pulse or to a few consecutive pulses in multi-pulse stimulation to estimate the post-stimulus time histogram (PSTH) of the nerve.

### MATERIALS AND METHODS

Sixteen ears from twelve participants (two males, mean age: 56.5 years) implanted with Med-El CIs were measured (**Table 1**). Amplitude growth functions in MP conditions were measured from 8 ears (out of 16; last column of **Table 1**). Participants signed a written informed consent form and were paid for their participation. The experiment was approved by the Ethics Committee of Klinikum rechts der Isar, Munich.

#### Stimuli

In this study, we mainly focused on the analysis of eABR wave eV, which usually occurs at around 4 ms after the stimulus onset. This constrains the stimulation duration to be less than 4 ms; otherwise, stimulus and response would interfere. A further limitation comes from the large stimulation artifact, which follows the stimulus and limits the stimulation window to be even shorter. Therefore, in order to obtain clear eABR peak eVs, we employed a stimulation window of up to 1.6 ms, within which pulse trains of up to 16 pulses with a pulse rate of 10,000 pulses per second (pps) were closely packed together to form multi-pulse stimuli.

An overview of the stimuli is illustrated in **Figure 1**. Electric pulse trains of 1 pulse, 2 pulses, 4 pulses, 8 pulses, and 16 pulses were used. Pulses were anodic-leading charge-balanced biphasic pulses with a 45-µs phase width and a 2.1-µs interphase gap. Multi-pulse (MP) stimuli were assembled by putting single pulses together with an inter-pulse gap of 7.9 µs to achieve a pulse period of 100 µs and, consequently, a burst rate of 10,000 pps, which is well above standard clinical rates. All MP stimuli were delivered at a repetition rate of 37 Hz through an electrode in the middle of the array (subject specific electrode).
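The timing described above can be checked numerically: two 45-µs phases, a 2.1-µs interphase gap, and a 7.9-µs inter-pulse gap give a 100-µs pulse period, so 16 pulses fit within the 1.6-ms stimulation window. The following minimal sketch builds the phase segments of a multi-pulse stimulus. The function names and the convention that the anodic phase is represented by a positive current are assumptions made for illustration, not details from the source.

```python
PHASE_US = 45.0           # phase width
INTERPHASE_GAP_US = 2.1   # gap between the two phases of one pulse
INTERPULSE_GAP_US = 7.9   # gap between consecutive pulses


def pulse_period_us() -> float:
    """Duration of one biphasic pulse plus the following inter-pulse gap."""
    return 2 * PHASE_US + INTERPHASE_GAP_US + INTERPULSE_GAP_US


def burst_rate_pps() -> float:
    """Pulse rate within the burst, in pulses per second."""
    return 1e6 / pulse_period_us()


def multipulse_segments(n_pulses: int, amplitude_ua: float):
    """Return (start_us, end_us, current_ua) for every phase of the burst.

    Assumption: the leading (anodic) phase is coded as positive current.
    """
    segments = []
    for k in range(n_pulses):
        t0 = k * pulse_period_us()
        segments.append((t0, t0 + PHASE_US, +amplitude_ua))
        t1 = t0 + PHASE_US + INTERPHASE_GAP_US
        segments.append((t1, t1 + PHASE_US, -amplitude_ua))
    return segments


segments_16 = multipulse_segments(16, amplitude_ua=100.0)  # hypothetical amplitude
print(round(pulse_period_us(), 6), round(burst_rate_pps(), 3), len(segments_16))
```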

#### Pretest

In order to select the stimulation electrode for the experiment, trial psychophysical and eABR measurements were performed on electrode numbers 4, 5, 6, 7, and 8 (out of 12 electrodes in an apical-to-basal order). Psychophysical THRs and MCLs were determined by the CI users. The stimulus was a single pulse (1-pulse condition) with the same parameters mentioned above. For each electrode in eABR measurements, the stimulation amplitude was set to 95% of the corresponding psychophysical dynamic range (DR, defined as MCL − THR). The electrode corresponding to the eABR with the largest wave eV amplitude was selected and used for all subsequent measurements. In case of electrodes with similar eV amplitudes, the one with the larger DR was selected.
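As a small illustration of how a level such as "95% of the DR" might be computed, the sketch below places the stimulation amplitude between THR and MCL. It assumes that the percentage is taken on a linear current scale (THR + fraction × (MCL − THR)) and that the DR in dB is 20·log10(MCL/THR); the source does not spell out either convention, and the example values are hypothetical.

```python
import math


def level_at_percent_dr(thr_ua: float, mcl_ua: float, percent: float) -> float:
    """Current (µA) at a given percentage of the dynamic range.

    Assumption: the percentage is applied on a linear current scale.
    """
    return thr_ua + (percent / 100.0) * (mcl_ua - thr_ua)


def dynamic_range_db(thr_ua: float, mcl_ua: float) -> float:
    """Dynamic range in dB, assuming DR_dB = 20*log10(MCL/THR)."""
    return 20.0 * math.log10(mcl_ua / thr_ua)


# Hypothetical THR and MCL values for one electrode.
example_level = level_at_percent_dr(200.0, 500.0, 95.0)
example_dr_db = dynamic_range_db(200.0, 500.0)
print(example_level, round(example_dr_db, 2))
```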

Once an electrode was determined, psychophysical thresholds (THRs) and most comfortable levels (MCLs) in MP conditions were adjusted by the subjects while they were seated on a comfortable couch. On a normal keyboard, the subjects used two keys (PgUp and PgDn) for coarse changes and two other keys (up arrow and down arrow) for fine changes. The adjustment procedure was monitored by the examiners using a custom-designed graphical user interface. In order to avoid any visual biases, subjects did not see the monitor screen. The THRs and MCLs for each MP condition were measured in one trial round and two main rounds. Stimuli were presented randomly, but THRs and MCLs were measured in separate sessions. For THRs, CI users were asked to raise the stimulation amplitude until they could clearly perceive it and then to reduce it until they could no longer perceive it. For MCL measurements, they were asked to increase the stimulation amplitude to the highest level that they could still comfortably tolerate for 3 min. This duration is about three times the duration of a single eABR recording trial. Only the results of the main rounds were used for psychophysical analysis and, later, for eABR measurements. The stimuli used in psychophysical measurements were the same as those employed in eABR measurements.

#### eABR Multi-Pulse Stimuli

We refer to the measured DRs in the 1-, 2-, 4-, 8-, and 16-pulse conditions as DR1, DR2, DR4, DR8, and DR16, respectively. Maximum stimulation amplitudes (MSAs) were always limited to 95% of the corresponding DRs to avoid very loud stimulation. They are called MSA1, MSA2, MSA4, MSA8, and MSA16; e.g., MSA4 means a stimulation amplitude of 95% of DR4. An exception was subject S14R, where, due to a strong artifact at 95% of the DRs, 60% was used for all numbers of pulses.

OM, otitis media; Co, concerto; P, pulsar; So, sonata; Sy, synchrony.

**Figure 2** shows a schematic view of all stimulation conditions used in this study. Different numbers of vertical bars depict the number of pulses, and different bar sizes indicate stimulation amplitudes. Some conditions were not measured (n.m. in **Figure 2**) because they were above comfortable loudness. In each row of **Figure 2**, the number of pulses is constant, while the stimulation amplitude varies. Thus, a row-wise investigation of the table provides amplitude growth functions (AGFs) of MP conditions. On the other hand, in each column of the table, the stimulation amplitude is constant, while the number of pulses varies. Thus, the effect of the number of pulses can be investigated column-wise. We also provide eABR AGFs in MP conditions from 8 ears (out of 16 ears). Stimuli with amplitudes of 5 to 95% of the corresponding DRs in steps of 10% were used.

#### eABR Recording

Stimulation scripts were written in MATLAB and executed on a personal computer equipped with a National Instruments (NI) I/O card. Subjects were asked to remove their speech processors before the measurements, and stimuli were then generated and delivered to CIs via an external induction coil of a research interface box (RIB II), provided by the University of Innsbruck, Innsbruck, Austria.

The stimulation/recording setup is shown in **Figure 3**. The eABRs were recorded from surface electrodes glued on the skin. The positive electrode was placed behind the ear. The negative and ground electrodes were placed on the upper and lower forehead, respectively. Raw eABRs were recorded with a Biopac® MP36 system (Goleta, CA, United States) with a sampling rate of 100 kHz, a 24-bit A/D converter, and an amplifier gain of 1000. An internally implemented hardware band-pass filter with cutoff frequencies of 0.05 Hz and 20 kHz was used in eABR measurements. No trigger signal was recorded, as the electric stimulation artifact was large enough for stimulus onset detection. For each MP condition, 2184 epochs were recorded, each of which had a duration of 27 ms.

The skin beneath the electrodes was cleaned with alcohol swabs and gently but thoroughly scrubbed to achieve low electrode impedances. Conductive gel was used to improve the impedance match between the electrodes and the skin surface. Electrode impedances were checked with the recording setup and were kept below 10 kΩ. During eABR recording, subjects were either sitting or lying on a couch. They were asked to stay as calm as possible to avoid myogenic artifacts. Breaks were taken at regular intervals or at the subjects' request.

#### eABR Processing

Raw eABRs were processed offline using MATLAB R2017b in a series of steps. First, stimulus onset detection was performed using the electrical stimulation artifacts (which were larger than about 300 µV). They were orders of magnitude larger than the neuronal responses (maximum of about 2.6 µV). Using the onset indices, data were divided into 27-ms epochs. Since most of the eABR information is within the first 10 ms, epoch lengths were reduced to 10 ms. Epochs contaminated with myogenic activity (e.g., eye blinks, facial muscle movements) were removed, and only "clean" epochs were used in further analysis. In order to determine the clean epochs, the distribution of the RMS values of the epochs was used. For all users, the RMS values of the epochs had a lognormal distribution. A normal distribution was fitted to the logarithm of the RMS (logRMS) values of the epochs. Epochs with logRMS values in the range of µ ± kσ were considered clean epochs, where µ and σ are the mean and standard deviation of the fitted distribution, respectively. The k parameter was subject-specific and varied from 0.7 to 2. Across all subjects, at least 2053 epochs (out of 2184 epochs) remained for averaging.
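The epoch-rejection rule described above can be sketched in a few lines. This is a minimal illustration and not the authors' MATLAB code: the function names are hypothetical, and the fit of the normal distribution is approximated here by the sample mean and sample standard deviation of the logRMS values. Epochs that are entirely zero would need separate handling, since their logarithm is undefined.

```python
import math
import statistics


def epoch_rms(epoch):
    """Root-mean-square value of one epoch (a sequence of samples)."""
    return math.sqrt(sum(x * x for x in epoch) / len(epoch))


def select_clean_epochs(epochs, k):
    """Keep epochs whose logRMS lies within mu +/- k*sigma.

    Assumption: mu and sigma are taken as the sample mean and sample
    standard deviation of the logRMS values.
    """
    log_rms = [math.log(epoch_rms(e)) for e in epochs]
    mu = statistics.fmean(log_rms)
    sigma = statistics.stdev(log_rms)
    return [e for e, v in zip(epochs, log_rms) if abs(v - mu) <= k * sigma]


# Toy data: eight ordinary epochs and one with a large (e.g., myogenic) artifact.
scales = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 50.0]
toy_epochs = [[a, -a, a, -a] for a in scales]
clean = select_clean_epochs(toy_epochs, k=2.0)
print(len(clean))  # 8
```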

The next step dealt with electrical artifact suppression. The pattern of the electrical artifacts was subject-dependent. For some subjects, one-exponential fittings worked, while for other subjects, two-exponential fittings were required [blue curves in **Figure 4**, compared with Spitzer et al. (2006)]. Therefore, exponential functions with the general forms of Eq. (1) and Eq. (2) were used to eliminate electrical artifacts. For each subject, only one function was used for curve fitting, but for each measurement condition, the fitting was performed independently. The decision of using one exponential or two exponentials was made by visual inspection of the discharge curve shape. The starting point of the fitting window varied since the duration of electrical artifacts varied due to different numbers of pulses. Therefore, this parameter was excluded from the fitting curve, as in Hu et al. (2015). The end point of the fitting window was always set to 10.0 ms after the stimulus onset. The fitted artifact was subtracted from the individual eABR epochs.

$$f(t) = a_0 + a_1 e^{-b_1 t} + a_2 e^{-b_2 t} \tag{1}$$

$$f(t) = a_0 + a_1 e^{-b_1 t} \tag{2}$$
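To make the artifact-subtraction step concrete, the following sketch fits the one-exponential model of Eq. (2) and subtracts it from a trace. The source does not state which fitting algorithm was used; this example swaps in a simple grid search over the decay rate b1 combined with linear least squares for a0 and a1, which is only one of several possible approaches. All names, the time units (seconds), and the synthetic data are assumptions for illustration.

```python
import math


def fit_single_exponential(t, y, b_grid):
    """Fit y ~ a0 + a1*exp(-b1*t) and return (a0, a1, b1).

    For each candidate b1 in b_grid, a0 and a1 are obtained in closed form
    from the 2x2 normal equations; the b1 with the smallest squared error wins.
    """
    n = len(t)
    best = None
    for b in b_grid:
        e = [math.exp(-b * ti) for ti in t]
        se = sum(e)
        see = sum(ei * ei for ei in e)
        sy = sum(y)
        sey = sum(ei * yi for ei, yi in zip(e, y))
        det = n * see - se * se
        if abs(det) < 1e-12:
            continue  # basis is degenerate for this b1
        a0 = (see * sy - se * sey) / det
        a1 = (n * sey - se * sy) / det
        sse = sum((yi - a0 - a1 * ei) ** 2 for yi, ei in zip(y, e))
        if best is None or sse < best[0]:
            best = (sse, a0, a1, b)
    return best[1], best[2], best[3]


def subtract_artifact(t, y, a0, a1, b1):
    """Remove the fitted artifact from the recorded trace."""
    return [yi - (a0 + a1 * math.exp(-b1 * ti)) for ti, yi in zip(t, y)]


# Synthetic artifact sampled every 0.1 ms up to 10 ms (hypothetical values).
t = [i * 1e-4 for i in range(1, 101)]
y = [0.5 + 300.0 * math.exp(-900.0 * ti) for ti in t]
a0, a1, b1 = fit_single_exponential(t, y, b_grid=[100 + 50 * i for i in range(60)])
residual = subtract_artifact(t, y, a0, a1, b1)
```

A two-exponential fit as in Eq. (1) could follow the same pattern with a two-dimensional grid over b1 and b2, at a higher computational cost.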

Noise was reduced by zero-phase digital filtering (4th-order Butterworth band-pass filter, passband: 100 Hz to 3 kHz). As a final stage, weighted non-stationary fixed multi-point (WNSFMP) averaging was applied (Silva, 2009). In this method, the variation of multiple fixed time points in subsets of epochs is analyzed to estimate the variance of the residual noise (RN). The WNSFMP method assumes stationary noise within a subset of epochs, but still lets the noise vary between subsets. This enables the method to reduce the effect of non-stationary noise by weighted averaging, with the weights being the inverse of the corresponding subset variances. The WNSFMP method also provides a post-average RN estimate; its variance ($\hat{\sigma}^2_{\mathrm{RN}}$) is a measure of RN power. In this study, amplitude variances were estimated as $\hat{\sigma}^2_{\mathrm{amp}} = 2\hat{\sigma}^2_{\mathrm{RN}}$, as in Undurraga et al. (2013).
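The idea behind weighted averaging with subset-wise noise estimates can be illustrated with a strongly simplified sketch. It is not the WNSFMP implementation of Silva (2009): here the noise variance of each block of epochs is estimated from the across-epoch variance at a few fixed sample indices, each block mean is weighted by the inverse of its estimated variance, and the residual-noise variance of the result is taken as the reciprocal of the summed weights. Block size, fixed points, and all names are assumptions.

```python
import statistics


def block_noise_variance(block, fixed_points):
    """Across-epoch variance at each fixed sample index, averaged over indices."""
    return statistics.fmean(
        [statistics.variance([epoch[i] for epoch in block]) for i in fixed_points]
    )


def weighted_block_average(epochs, block_size, fixed_points):
    """Return (weighted average trace, estimated residual-noise variance)."""
    blocks = [epochs[i:i + block_size] for i in range(0, len(epochs), block_size)]
    blocks = [b for b in blocks if len(b) >= 2]  # variance needs >= 2 epochs
    n_samples = len(epochs[0])
    weights, means = [], []
    for block in blocks:
        var_single = block_noise_variance(block, fixed_points)
        # Variance of a block mean is var_single / len(block); weight by its inverse.
        weights.append(len(block) / var_single)
        means.append([statistics.fmean([e[j] for e in block]) for j in range(n_samples)])
    total = sum(weights)
    average = [
        sum(w * m[j] for w, m in zip(weights, means)) / total for j in range(n_samples)
    ]
    rn_variance = 1.0 / total
    return average, rn_variance
```

With this estimate, the amplitude variance used in the text would be `2 * rn_variance`. A block whose estimated noise variance is exactly zero would need special handling before the division.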

Only eABR wave eV amplitudes and latencies were analyzed, as wave eIII was corrupted by the stimulation artifact, especially in the 8- and 16-pulse conditions. Wave eV amplitude was calculated as the difference between peak eV and the next trough, and the latency of wave eV was defined as the time point where peak eV occurred. Only amplitudes greater than $\sqrt{2}\,\hat{\sigma}_{\mathrm{RN}}$ were accepted as valid amplitudes and were used for further analysis. Exemplary final eABRs in 1-, 4-, and 8-pulse conditions are shown in **Figure 5** for three subjects.
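The amplitude definition and the acceptance criterion above can be written as a small helper. In this sketch the index of peak eV is assumed to have been picked beforehand, and the "next trough" is approximated by the minimum within a search window after the peak; both choices, as well as the names and the toy trace, are assumptions for illustration.

```python
import math


def wave_ev_amplitude(trace, peak_index, search_len):
    """Peak-to-following-trough amplitude of wave eV.

    The trough is taken as the minimum of the samples from the peak up to
    search_len samples later (an approximation of "the next trough").
    """
    window = trace[peak_index:peak_index + search_len]
    return trace[peak_index] - min(window)


def is_valid_amplitude(amplitude, sigma_rn):
    """Accept the amplitude only if it exceeds sqrt(2) times the RN standard deviation."""
    return amplitude > math.sqrt(2.0) * sigma_rn


toy_trace = [0.0, 0.5, 1.0, 0.4, -0.6, -0.2, 0.0]  # hypothetical eABR samples (µV)
amplitude = wave_ev_amplitude(toy_trace, peak_index=2, search_len=4)
print(amplitude, is_valid_amplitude(amplitude, sigma_rn=0.1))
```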

#### Statistical Analysis

Repeated-measures analysis of variance (ANOVA) was used to statistically test the effect of the number of pulses. Statistical analysis was performed in MATLAB R2017b. In the psychophysical data, the within-subject variable was the change in THRs and MCLs, while in the eABR data, the within-subject variable was the change in wave eV amplitudes. For pairwise comparisons, Bonferroni-corrected post hoc analysis was applied. The statistical significance level was set to α = 0.05.
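As an illustration of the Bonferroni correction mentioned above, the five pulse conditions give ten pairwise comparisons, so each raw p-value is multiplied by ten (capped at 1) before being compared with α. The sketch below shows only this correction step. It does not reproduce the MATLAB analysis, the function name is hypothetical, and the raw p-values are made up.

```python
from itertools import combinations


def bonferroni_adjust(raw_p, alpha=0.05):
    """Map each comparison to (adjusted p-value, significant?).

    raw_p maps a condition pair to its uncorrected p-value; the adjusted
    value is the raw value times the number of comparisons, capped at 1.
    """
    m = len(raw_p)
    adjusted = {}
    for pair, p in raw_p.items():
        p_adj = min(1.0, p * m)
        adjusted[pair] = (p_adj, p_adj < alpha)
    return adjusted


conditions = [1, 2, 4, 8, 16]                  # numbers of pulses
pairs = list(combinations(conditions, 2))      # 10 pairwise comparisons
raw_p_values = {pair: 0.5 for pair in pairs}   # made-up p-values
raw_p_values[(1, 2)] = 0.004
results = bonferroni_adjust(raw_p_values)
```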

### RESULTS

#### Psychophysical Results

Results of psychophysical experiments are plotted in **Figure 6**. THRs and MCLs are plotted for individual subjects in **Figure 6A** with open blue and green circles, respectively. Total burst charges (TBCs) used to reach THRs and MCLs are also depicted in filled circles in **Figure 6B**. The TBC was defined as overall charges in positive phases of multi-pulses. The corresponding median values of each set of the data are shown with filled symbols.

FIGURE 4 | Surface electrode recordings (blue curves) and exponential fittings of stimulation artifacts (only after stimulation, red curves). The left column shows two-exponential fittings, and the right panels show one-exponential fittings. In each panel, the number of pulses and the stimulation amplitude are indicated. Note that the stimulation artifact exceeds the range displayed in the figure.

The median THRs and MCLs for single pulses were 211.8 µA and 514.5 µA, respectively, which corresponds to TBCs (of the integrated positive pulse phases) of 9.4 and 23.1 nC, respectively. This corresponds to a dynamic range from 4.65 to 12.61 dB (median: 7.17 dB). With increasing number of pulses, both THRs and MCLs decreased monotonically for almost every measurement and patient, with steeper drops for THRs. The median THR levels over all subjects dropped by about 6.30 dB when the number of pulses increased from 1 to 16 pulses, whereas the decrease for MCLs was only 2.90 dB. For the analysis, linear regression was calculated for each set of data and averaged. The THRs decreased with an average slope of 1.30 dB/doubling of the number of pulses (range: 0.65 to 2.34 dB/doubling), while the MCLs decreased with an average slope of 0.93 dB/doubling of the number of pulses (range: 0.66 to 1.32 dB/doubling).
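A slope in dB per doubling of the number of pulses can be obtained by regressing the level in dB on log2 of the number of pulses. The sketch below shows one way to do this with an ordinary least-squares line. It assumes that levels are expressed in dB relative to the single-pulse level, and the example data are synthetic, constructed to fall by exactly 1.3 dB per doubling.

```python
import math


def ti_slope_db_per_doubling(n_pulses, amplitudes_ua):
    """Least-squares slope of level (dB re the first amplitude) versus log2(n_pulses)."""
    x = [math.log2(n) for n in n_pulses]
    y = [20.0 * math.log10(a / amplitudes_ua[0]) for a in amplitudes_ua]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


pulse_counts = [1, 2, 4, 8, 16]
# Synthetic thresholds (µA) that fall by exactly 1.3 dB per doubling.
synthetic_thr = [200.0 * 10 ** (-1.3 * math.log2(n) / 20.0) for n in pulse_counts]
slope = ti_slope_db_per_doubling(pulse_counts, synthetic_thr)
print(round(slope, 2))  # -1.3
```

Averaging such per-subject slopes would then give a group value comparable in form to those reported in the text.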

Two-way repeated-measures ANOVA showed that THR and MCL data (amplitudes and TBCs) in **Figure 6** changed significantly as a function of the number of pulses. In panel A, both THR and MCL decreased significantly [main effect of the number of pulses; F(4,112) = 176.14, p < 0.001] when the number of pulses increased from 1 to 16. The interaction effects between THRs vs. MCLs were significant [F(4,112) = 5.26, p < 0.001], which indicates a shallower slope for MCLs compared to THRs. In panel B, THR and MCL TBCs increased significantly [main effect of the number of pulses; F(4,112) = 3470.2, p < 0.001] as a function of the number of pulses. The interaction effects between THRs vs. MCLs were significant [F(4,112) = 5.26, p < 0.001], which indicates a shallower slope for THR TBCs compared to MCL TBCs.

#### eABR Results

Since eABR wave eIII was corrupted by the multi-pulse stimulation artifact, especially in measuring conditions with larger numbers of pulses, we focused on wave eV amplitudes and latencies. **Figures 7**, **8** show individual eABR wave eV amplitudes and latencies for all CI subjects, respectively. Each panel consists of 15 data points (measurement conditions listed in **Figure 2**). In each panel, data points with the same color represent responses to stimuli with equal current amplitudes, but with different numbers of pulses. Amplitude growth functions in **Figure 7** (reading data for identical numbers of pulses) indicate that eV amplitudes generally grow monotonically with stimulus level. Lines in a single color show how wave eV parameters depend on the number of pulses. Note that because of the maximum stimulation levels mentioned earlier, measurement conditions differ in the number of data points. Since wave eV amplitude was calculated by subtraction of two values (peak eV and the following trough), error bars in **Figure 7** are equal to $\sqrt{2}\,\hat{\sigma}_{\mathrm{RN}}$. No efforts were made to estimate error bars for latencies (**Figure 8**). Results of eABR eV amplitudes in multi-pulse conditions over all subjects are plotted in **Figure 9**. In each panel, data were normalized to (divided by) the corresponding responses at the largest number of pulses (2, 4, 8, and 16 pulses in panels A–D, respectively). Data points in gray show individual CI responses to multi-pulses, and the colored circles, which match the colors in **Figure 7**, are their corresponding median values. Data for MSA1 are not plotted, as all values were 1 due to normalization.

The stimulation amplitudes in MP conditions were 95% of the corresponding DRs for the longest burst. For shorter bursts, however, the same stimulation amplitude corresponded to a far lower percentage of the DR. Over all subjects, stimulation amplitudes of MSA16 (95% of DRs in 16-pulse conditions) corresponded to averages of 35, 46, 60, and 74% of the DRs in 1-, 2-, 4-, and 8-pulse conditions, respectively. Similarly, stimulation amplitudes of MSA8 (95% of DRs in 8-pulse conditions) corresponded to averages of 52, 63, and 78% of the DRs in 1-, 2-, and 4-pulse conditions, respectively. For example, for the 1-pulse conditions, the stimulation amplitudes were at 35, 52, 65, 80, and 95% of the DR (averaged over all subjects; more details are available in **Supplementary Figures S1, S2**). Visual inspection of the curves from individual CI subjects in **Figure 7** shows that intersubject variability is high. Yet, some trends could be detected. For most subjects, and particularly in 8-pulse and 16-pulse conditions, eABR wave eV amplitudes tended to increase when the number of pulses increased from 1 pulse up to a certain number of pulses, i.e., up to 2, 4, or 8 pulses; beyond that point they seemed to saturate or even decrease. Such an increase was not found for the stimulation amplitude MSA16 (cyan data points in **Figure 7**) for S7L and S10L, where a monotonically decreasing trend was observed. The number of pulses at which wave eV amplitudes reached their maximum depended on the subject but also on level within a subject. Due to a facial nerve artifact, eABRs in some conditions could not be measured reliably and were thus excluded from the dataset (e.g., subject S3R). Similar to the amplitudes, latencies across subjects showed high variability, as depicted in **Figure 8**. However, for a fixed stimulation amplitude (lines with single colors), the general trend was that latency increased with the number of pulses. Moreover, for a fixed number of pulses, higher stimulation amplitudes resulted in shorter latencies, as expected.

Amplitudes averaged over all subjects, depicted in **Figure 9**, suggest that wave eV grew when the number of pulses increased from 1 to 2 pulses and then tended to decrease for further pulses. Statistical analysis on overall results showed a significant difference only between 1- and 2-pulse conditions when the stimulation amplitude was MSA2 [F(1,14) = 4.73, p < 0.05] (red data points in **Figure 9**) and MSA4 [F(2,28) = 3.66, p < 0.02] (green data points in **Figure 9**).

Overall results of wave eV latencies corresponding to data in **Figure 9** are depicted in **Figure 10**. Data in each panel were normalized to (subtracted from) the corresponding latencies at conditions with the largest number of pulses, i.e., MSA2, MSA4, MSA8, and MSA16 in panels A to D, respectively. Note that data for MSA1 are not plotted. Statistical analysis showed significant differences between 1 pulse and 4 pulses [F(2,28) = 3.15, p < 0.05] when the stimulation amplitude was MSA4 and also between four pairs of conditions when the stimulation amplitude was MSA8 [F(3,42) = 12.29; p < 0.01 for 1 pulse and 4 pulses; p < 0.01 for 1 pulse and 8 pulses; p < 0.02 for 2 pulses and 4 pulses; p < 0.01 for 2 pulses and 8 pulses]. When the stimulation amplitude was MSA16, only the difference between 2-pulse and 16-pulse conditions was significant [F(4,40) = 4.80; p < 0.05].

**Figure 11** shows wave eV amplitudes and latencies as a function of stimulation amplitudes (%DR) in different MP conditions for 8 ears (out of 16 ears). Columns show results for different numbers of pulses, while top and bottom rows show results of wave eV amplitudes and latencies, respectively. The amplitude data in the top panels were normalized to the largest wave eV amplitudes that could be measured in the 1-pulse condition (mostly 95% DR). Data from individual ears are in gray, and the corresponding median values are depicted in black. The median AGFs showed a monotonically increasing trend except for a few cases. Due to the small latency variability between subjects, latency data in the bottom panels were not normalized. Visual inspection of the top panels shows a saturating tendency for the AGFs in MP conditions. The variation of the range of eV amplitudes as a function of the number of pulses was significant only between 2 pulses and 16 pulses [F(4,24) = 7.55, p < 0.02]. The variation of the range of eV latencies as a function of the number of pulses was significant only between 1 pulse and 8 pulses [F(4,24) = 5.24, p < 0.02] and between 2 pulses and 8 pulses [F(4,24) = 5.24, p < 0.03].

The structure of data on AGFs in MP conditions is different from that presented in **Figures 9**, **10**. In the latter, we used fixed stimulation amplitudes for different numbers of pulses, while in the former, the stimulation amplitudes of the same percentage of the DRs were not identical. For instance, the physical stimulation amplitudes at 65% DR in 1, 2, 4, 8, and 16 pulses were not the same. Therefore, we could not apply the same analysis to both datasets.

#### DISCUSSION

#### Artifact Suppression

In neurophysiological measurements such as eABRs or eCAPs, electrical stimulation artifacts are inevitable. Factors such as stimulation mode, amplitude, phase width, polarity of the stimulus, and stimulation site affect the magnitude and morphology of the stimulation artifact. When low stimulation amplitudes generate only small artifacts, it may still be possible to extract eABRs without further processing (Gordon et al., 2008). Often even large artifacts decay rapidly, such that they do not interfere with the eABR waves and blanking of the artifact-contaminated region is sufficient (Tykocinski et al., 1995; Truy et al., 1998). When long and strong artifacts corrupt the eABRs, stimulation with alternating polarity is a further option to reduce artifacts (Abbas and Brown, 1991; Spitzer et al., 2006; Bahmer et al., 2008). However, due to non-linearities of the eABR generation (probably mostly due to the stimulation electrodes), residual artifacts may remain even with alternating polarity stimulation. A different approach was proposed by Bahmer et al. (2010), who measured eABRs in response to triphasic pulses. They varied the distribution of charge over the three phases and selected a configuration in which the artifact was minimal. However, adopting this procedure for pulse train stimulation is not straightforward. In this case, as well as when only single-polarity stimuli are used, exponential fitting can be used to subtract artifacts (Undurraga et al., 2013; Hu et al., 2015). For stimuli consisting of multi-pulses, accumulated charges remaining from individual pulses lead to larger artifacts compared to single-pulse stimulation. This could be the reason why, in this study, the stimulation artifacts clearly had two components, which can be fitted by two exponential functions. This was already found in a few studies even for conventional biphasic (Spitzer et al., 2006) or triphasic stimuli (Bahmer and Baumann, 2012). The two-exponential fitting functions used in this study appeared to remove the artifact robustly and reliably even for long stimuli, e.g., 16 pulses, where the artifact overlapped with the eABR wave eV.
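The two-component artifact model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the fitting procedure used in the study: the function names, the grid of candidate time constants, and the synthetic artifact (amplitudes 50 and 10, time constants 0.3 and 2.0 ms) are all hypothetical, and a coarse grid search combined with linear least squares stands in for whatever optimizer was actually applied.

```python
import math


def two_exp(t, a1, tau1, a2, tau2):
    """Sum of two decaying exponentials evaluated at the time points in t."""
    return [a1 * math.exp(-ti / tau1) + a2 * math.exp(-ti / tau2) for ti in t]


def fit_two_exponentials(t, y, candidate_taus):
    """Fit y ~ a1*exp(-t/tau1) + a2*exp(-t/tau2).

    Time constants are chosen by exhaustive search over candidate pairs;
    for each pair the two amplitudes follow from linear least squares
    (2x2 normal equations). Returns (sse, a1, tau1, a2, tau2).
    """
    best = None
    for i, tau1 in enumerate(candidate_taus):
        for tau2 in candidate_taus[i + 1:]:
            e1 = [math.exp(-ti / tau1) for ti in t]
            e2 = [math.exp(-ti / tau2) for ti in t]
            s11 = sum(p * p for p in e1)
            s22 = sum(q * q for q in e2)
            s12 = sum(p * q for p, q in zip(e1, e2))
            r1 = sum(p * v for p, v in zip(e1, y))
            r2 = sum(q * v for q, v in zip(e2, y))
            det = s11 * s22 - s12 * s12
            if abs(det) < 1e-12:
                continue  # nearly collinear basis functions, skip this pair
            a1 = (r1 * s22 - r2 * s12) / det
            a2 = (r2 * s11 - r1 * s12) / det
            sse = sum((v - a1 * p - a2 * q) ** 2 for v, p, q in zip(y, e1, e2))
            if best is None or sse < best[0]:
                best = (sse, a1, tau1, a2, tau2)
    return best


# Synthetic, purely hypothetical artifact sampled every 0.1 ms for 10 ms.
t_ms = [0.1 * k for k in range(100)]
recorded = two_exp(t_ms, 50.0, 0.3, 10.0, 2.0)

sse, a1, tau1, a2, tau2 = fit_two_exponentials(
    t_ms, recorded, [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 5.0]
)

# Subtracting the fitted model leaves the artifact-free residual.
model = two_exp(t_ms, a1, tau1, a2, tau2)
cleaned = [r - m for r, m in zip(recorded, model)]
```

With real recordings the residual would contain the neural response plus noise, and the fit would normally be restricted to samples where the artifact dominates; both choices are outside the scope of this sketch.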

#### TI Functions in Psychophysical Data

The first part of this study examined the TI functions of THRs and MCLs as a function of stimulation duration, which increased from a single pulse to 1600 µs (16 pulses). As the psychophysical THRs and MCLs in this study were determined for the purpose of eABR measurement, the stimulation pattern differed fundamentally from those usually used for psychophysical measurements in other studies (e.g., McKay and McDermott, 1999; Zhou et al., 2015). In this study, besides the high stimulation rate of 10,000 pps, bursts were repeated at a rate of 37 bursts per second, which was essential for recording eABRs, as these require fast averaging. This way, it was possible to apply identical stimuli for both psychophysical measurements and eABR recordings. Nevertheless, even with these deviations in stimulation pattern, results were in line with previous studies. We observed a TI slope of −1.31 dB/doubling of stimulation duration for THR levels. If this is combined with the TI slopes of −0.42 dB (Donaldson et al., 1997), −0.88 dB (Zhou et al., 2015), −1.0 dB, and −2.6 dB/doubling the number of pulses (Obando Leitón, 2019), one can see that the TI slopes decrease monotonically as the stimulation rate increases. We also compared the TI slopes of THR levels with those of wave eV amplitudes, for conditions of a fixed stimulation amplitude (MSA8 and MSA16) while the number of pulses changed, as well as for conditions of a fixed number of pulses while the stimulation amplitude changed (AGFs in 1-pulse and 2-pulse conditions). Details of these comparisons are available in **Supplementary Figures S2–S6**. TI slopes for MCLs showed a shallower decline of 0.78 dB/doubling the number of pulses, when compared to that of THRs. This was consistent with findings of Zhou et al. (2012) and Obando Leitón (2019), where shallower TI slopes were found for comfortable levels and MCLs, respectively. Nevertheless, given this shallow decline and that TBC is proportional to the power consumption of the implant, our results also show that very high pulse rates (when using biphasic pulses) do not stimulate neurons very efficiently (a schematic illustration of the integration of charges in the 16-pulse condition is depicted in **Supplementary Figure S7**).
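To make the unit "dB per doubling of the number of pulses" concrete, the following minimal sketch estimates a TI slope from levels measured at 1, 2, 4, 8, and 16 pulses. It assumes an ordinary least-squares line through the levels (in dB re the 1-pulse level) plotted against log2 of the number of pulses; the function name and the example currents are hypothetical and are constructed so that they follow the reported THR slope exactly. They are not data from the study.

```python
import math


def ti_slope_db_per_doubling(levels_by_pulses):
    """Least-squares slope of level (dB re the 1-pulse level) vs. log2(n pulses).

    levels_by_pulses maps the number of pulses to a current level in any
    linear unit; the entry for 1 pulse serves as the reference.
    """
    ref = levels_by_pulses[1]
    xs = [math.log2(n) for n in levels_by_pulses]
    ys = [20.0 * math.log10(a / ref) for a in levels_by_pulses.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical threshold currents that drop by exactly 1.31 dB per doubling.
thr = {n: 500.0 * 10 ** (-1.31 * math.log2(n) / 20.0) for n in (1, 2, 4, 8, 16)}
slope = ti_slope_db_per_doubling(thr)
print(f"TI slope: {slope:.2f} dB/doubling")
```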

The fact that not only a pulse rate (10,000 pps) but also a burst rate (37 bps) was employed in the study might raise the hypothesis that a combination of both rates, and not only the pulse rate, contributes to temporal integration functions. This requires considering phenomena related to temporal processing of ANFs, including refractoriness, facilitation, accommodation, and high-frequency spike rate adaptation [see Boulet et al. (2016) for review]. Each of the mentioned phenomena is effective in certain conditions and time ranges. Refractoriness and high-frequency spike rate adaptation are related to conditions where the stimulation amplitude is (well) above threshold (e.g., MCLs), whereas facilitation and accommodation deal with subthreshold amplitudes. Refractoriness means that a single nerve fiber has an elevated threshold after firing an action potential (relative refractory period); for a short period after a first action potential, it is even impossible to elicit another action potential (absolute refractory period). The duration of the absolute refractory period is around 0.5 ms (Hodgkin and Huxley, 1952; Matsuoka et al., 2001; Boulet et al., 2016); the relative refractory period for the auditory nerve is about 4 ms (Boulet et al., 2016). This means that the high pulse rate used in this study (10,000 pps) interacts with the refractory time for multi-pulse stimulation. That is, the population of nerve fibers that responded to the first pulse of a multi-pulse burst cannot be activated by the immediately following pulses of the burst; instead, only a population other than the one that responded to the first pulse may respond to the second pulse of the burst.

Spike rate adaptation characterizes the reduced ability of ANFs to elicit action potentials in response to pulse trains with relatively high rates (>250 pps). The time course of the spike rate adaptation effect is reported to be between 10 and 100 ms (Zhang et al., 2007; Miller et al., 2011; Boulet et al., 2016) when the stimulation lasts 300 ms, i.e., excitability of neurons starts to decrease immediately after the first spike and then continues to decrease with a time constant between 10 and 90 ms. In this study, although we used a high stimulation rate of 10,000 pps, the stimulation duration was not in the same range as that in the abovementioned studies. Nevertheless, spike rate adaptation has a massive effect on temporal response properties in the present study; it can be concluded that responses are dominated by the first pulse, which is supported by the relatively small changes in MCL amplitudes when the number of pulses was increased. The time courses of facilitation and accommodation are reported to be 0.5 ms and from 0.5–1 ms up to 10 ms, respectively (Boulet et al., 2016). Therefore, ANFs could integrate residual charge for multi-pulse stimulation, which leads to lower THRs. On the other hand, the inter-burst interval of 27 ms is longer than the 0.5- to 10-ms accommodation window, so that ANFs had enough time to recover.
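The timing argument in the two preceding paragraphs can be checked with a few lines of arithmetic. The sketch below uses only the rates and the approximate absolute refractory period quoted in the text; the variable and function names are illustrative, and treating the refractory period as a hard window that starts at the onset of the first pulse is a simplifying assumption.

```python
PULSE_RATE_PPS = 10_000      # pulse rate within a burst (from the text)
BURST_RATE_BPS = 37          # burst repetition rate (from the text)
ABS_REFRACTORY_US = 500      # approximate absolute refractory period (from the text)

pulse_period_us = 1_000_000 // PULSE_RATE_PPS   # time between pulse onsets
burst_period_ms = 1000 / BURST_RATE_BPS         # time between burst onsets


def pulse_onsets_us(n_pulses):
    """Onset time of each pulse in a burst, in microseconds."""
    return [i * pulse_period_us for i in range(n_pulses)]


def pulses_in_absolute_refractory(n_pulses):
    """1-based indices of pulses starting while a fiber that fired at the
    first pulse would still be absolutely refractory (simplified model)."""
    return [
        i + 1
        for i, onset in enumerate(pulse_onsets_us(n_pulses))
        if 0 < onset < ABS_REFRACTORY_US
    ]


blocked = pulses_in_absolute_refractory(16)
burst_duration_us = 16 * pulse_period_us

print(f"pulse period: {pulse_period_us} us, 16-pulse burst: {burst_duration_us} us")
print(f"pulses inside the absolute refractory window: {blocked}")
print(f"burst period: {burst_period_ms:.1f} ms")
```

Under these assumptions, pulses 2 to 5 of a burst start within the absolute refractory window of fibers that fired at the first pulse, and successive bursts are separated by about 27 ms.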

#### eABRs to Multi-Pulse Stimulation

The notion that responses to a high-frequency burst are dominated by the first pulse is also supported by the relatively small changes in eABR responses when the number of pulses increased (**Figures 9**, **10**). On average, amplitudes changed by less than 2.22 dB and latencies by less than 0.1 ms compared to the single-pulse response with the same amplitude. **Figure 9** even shows a decreasing trend for the eV amplitude in MSA4, MSA8, and MSA16 after an initial increase from 1 to 2 pulses, which suggests that the response amplitude falls as further pulses are added. Although the stimulation current in each panel of **Figure 9** is constant, the number of stimulation pulses, and with it the stimulation TBC, increased. Therefore, higher wave eV amplitudes in response to stronger stimuli would be expected, but this was not observed here. One possible explanation for this observation is destructive interference, where peaks and troughs of responses to the first pulse are reduced by anti-phasic (because of the delay) responses to later pulses in the train. For instance, the eABR in the 16-pulse condition could be regarded as an arithmetic summation of responses to individual pulses [as in Eq. (3)] or groups of pulses [as in Eq. (4)]. The responses to groups of pulses can be extracted by simple subtractions: for example, the response to the second pulse is $eABR_2 = eABR_{2p} - eABR_{1p}$ and the response to the third and fourth pulses can be derived as $eABR_{3\dots4} = eABR_{4p} - eABR_{2p}$, where $eABR_{ip}$ is the measured eABR to a train of $i$ pulses. **Figure 12** depicts such a decomposition of the responses to groups of pulses in the 16-pulse condition for subject S8L. It can be easily observed how the responses to successive pulses, especially $eABR_{5\dots8}$ and $eABR_{9\dots16}$ (cyan and magenta curves), contribute to suppressing the wave eV amplitude of $eABR_1$ by pushing down the peak of eV of $eABR_{1p}$ as well as by pulling up its trough, both resulting in a smaller wave eV amplitude of $eABR_{16p}$. A similar analysis on S8L data in MSA2, MSA4, and MSA8 conditions (not shown) supports the claim that the first pulse of the train has the dominant effect and responses to other pulses suppress the response to the first pulse. Therefore, the drop in eABR wave eV amplitudes of MSA4, MSA8, and MSA16 conditions might not be because of a weaker response but seems likely to be caused by destructive interference with eABR responses to later stimulation pulses. The effect of the destructive interference could also be observed in **Figure 11**, where the range of eV amplitudes decreased as a function of the number of pulses (significant difference only between 2 pulses and 16 pulses) and latencies and their ranges were elevated (significant differences only between 1 pulse and 8 pulses and between 2 pulses and 8 pulses).

$$eABR_{16p} = eABR_1 + eABR_2 + \dots + eABR_{15} + eABR_{16} \tag{3}$$

$$eABR_{16p} = eABR_1 + eABR_2 + eABR_{3\dots4} + eABR_{5\dots8} + eABR_{9\dots16} \tag{4}$$
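The subtraction scheme behind Eq. (4) can be sketched in a few lines. The example below is illustrative only: the function name is hypothetical and the short integer "waveforms" are made-up placeholders rather than measured eABRs. It derives the group responses from recordings made with 1, 2, 4, 8, and 16 pulses and confirms that they add up to the 16-pulse recording again.

```python
def decompose_by_groups(measured):
    """Split multi-pulse recordings into responses to groups of pulses.

    measured maps the number of pulses in the train (e.g. 1, 2, 4, 8, 16)
    to a waveform given as a list of samples. The result maps
    (first_pulse, last_pulse) to the response attributed to that group,
    obtained by subtracting the next shorter recording.
    """
    groups = {}
    previous = None
    for n in sorted(measured):
        if previous is None:
            groups[(1, n)] = list(measured[n])
        else:
            groups[(previous + 1, n)] = [
                a - b for a, b in zip(measured[n], measured[previous])
            ]
        previous = n
    return groups


# Made-up four-sample waveforms, for illustration only.
measured = {
    1: [0, 4, -2, 0],
    2: [0, 5, -3, 1],
    4: [1, 5, -2, 1],
    8: [1, 4, -1, 0],
    16: [0, 3, -1, 0],
}

groups = decompose_by_groups(measured)

# Summing all group responses sample by sample reproduces the 16-pulse recording.
reconstructed = [sum(samples) for samples in zip(*groups.values())]
```

Because the sum telescopes, the reconstruction is exact by construction; the informative part of such an analysis is the shape of the individual group responses, not the fact that they add up.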

Additional support for the destructive interference rationale mentioned above is provided here. As mentioned in the section "Materials and Methods," at each multi-pulse condition, eABRs to MSAs, which were defined as 95% of psychophysical MCLs, were measured. Assuming that all MSAs induce the same hearing impression (loudest tolerable level) in each CI subject, similar eABR signals and, consequently, similar wave eV amplitudes are expected. However, as shown in **Figure 13A**, when the number of pulses increased, the eABR wave eV amplitudes in response to MSAs tended to decrease rather than remain constant. The opposite trends in stimulation TBCs (**Figure 13B**) and wave eV amplitudes (**Figure 13A**) also support the rationale of destructive interference, as more TBC would mean more activated ANFs and, consequently, larger eV amplitudes. Additionally, such a destructive effect was found to reverse the tendency of latency, where normally shorter latencies are expected for higher stimulation amplitudes. **Figure 10**, however, suggests longer wave eV latencies (maximum of about 0.1 ms) over all subjects when the number of pulses increased.

#### Efficacy of Multi-Pulse Stimulation

For electric biphasic stimulation, pulse shape can affect the detection THRs at the level of a single ANF, eCAPs, or eABRs. It is known that pulses with longer phase durations evoke stronger neural responses when compared to pulses with shorter durations and equal stimulation amplitude. This means that, in comparison to shorter phases, pulses with longer phases need less current to reach THR. However, because the nerve membrane functions as a leaky integrator rather than a perfect one, pulses with longer phases seem to be less efficient than those with shorter phase durations of the same overall charge (Abbas and Brown, 1991; Shepherd et al., 2001). For single pulses, Moon et al. (1993) observed mean slopes of −3.60 and −5.71 dB/doubling of phase duration when pulse duration was less or more than 0.5 ms/phase, respectively. The effect of phase duration on eCAP and eABR was also found to be correlated with auditory nerve survival in guinea pigs (Prado-Guitierrez et al., 2006). Shepherd and Javel (1999) investigated the efficacy of pulses of different shapes. They found that not only ordinary biphasic pulses but also chopped pulses could make a single ANF elicit an action potential: charge packages of 2 × 30, 3 × 20, and 6 × 10 µs of the same polarity, followed by a series of reversed polarity, could charge the nerve membrane even up to eliciting an action potential. This packet structure, which was called a "chopped pulse," was found to show 1.5-dB higher THRs (less efficient) than a 60-µs/phase biphasic pulse with a 60-µs interphase gap and, interestingly, at least about 1.5 dB lower THRs (more efficient) when compared to a 60-µs/phase biphasic pulse without interphase gap.

Although the electric current and charge are closely related, in electric hearing, the current, rather than the charge, plays the main role in stimulating auditory nerves. Moreover, in MED-EL implants there is a coupling capacitor, which forces the net charge to be zero. A net residual potential of the electrodes should have no effect in the resistive fluid. In such a structure, if the stimulation mode was 100% efficient, it could be expected that the total charge required to elicit THR/MCL remained constant. In such a condition, the stimulation amplitude in an m-pulse condition should decrease by a factor of $\frac{1}{m}$, compared to the 1-pulse condition. This was not found in the data of the present study. **Figure 6** highlights the inefficiency of multi-pulse stimulation. The TBC of the positive phases in a multi-pulse condition is plotted as a function of the number of pulses for THR and MCL. In both THR and MCL data (**Figure 6B**), the TBC needed to elicit THR/MCL increased drastically as a function of the number of pulses (see also **Supplementary Figure S7**). The steeper slope for MCLs shows a stronger inefficiency compared to that for THRs. The inefficiency found in this study can be attributed to rapid phase switching of pulses; therefore, multi-pulse stimuli are far less efficient than single pulses.
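The gap between a perfectly efficient stimulus and the reported TI slopes can be put into numbers. The sketch below is a back-of-the-envelope calculation under two assumptions that go beyond the text: that the reported slopes (−1.31 dB/doubling for THR, −0.78 dB/doubling for MCL) hold uniformly from 1 to 16 pulses, and that TBC scales with amplitude times number of pulses because the phase duration is fixed. The function names are illustrative.

```python
import math

# A perfect integrator needs amplitude 1/m for m pulses, i.e. about
# -6.02 dB per doubling of the number of pulses.
IDEAL_SLOPE_DB = 20.0 * math.log10(0.5)


def amplitude_ratio(slope_db_per_doubling, n_pulses):
    """Amplitude in the n-pulse condition relative to the 1-pulse condition."""
    return 10 ** (slope_db_per_doubling * math.log2(n_pulses) / 20.0)


def tbc_ratio(slope_db_per_doubling, n_pulses):
    """Total burst charge relative to a single pulse, assuming fixed phase
    duration so that TBC scales with amplitude times number of pulses."""
    return n_pulses * amplitude_ratio(slope_db_per_doubling, n_pulses)


for label, slope in (("ideal", IDEAL_SLOPE_DB), ("THR", -1.31), ("MCL", -0.78)):
    print(f"{label:5s} slope {slope:6.2f} dB/doubling -> "
          f"amplitude x{amplitude_ratio(slope, 16):.3f}, "
          f"TBC x{tbc_ratio(slope, 16):.2f} at 16 pulses")
```

Under these assumptions, the total charge at 16 pulses is roughly 8.8 times (THR) and 11.2 times (MCL) that of a single pulse, whereas it would stay constant for a perfectly efficient stimulus.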

#### Temporal Effects in eABRs to Fast Pulse Trains

Since all multi-pulse stimuli used in the eABR section of this study were (well) above THR, temporal phenomena such as facilitation and accommodation would not be involved in temporal processing of ANFs. Refractoriness and depression, however, are likely to occur, and the eABR measurements might shed light on these effects. Abbas and Brown (1991) employed a masker-probe paradigm in which an initial pulse, termed masker, was followed by a second pulse, named probe, at varying inter-pulse intervals, and eABRs were measured. They found that average durations of 5.10 and 4.63 ms, respectively, were needed for the response to the probe (second) pulse to fully recover, using two different CI types. Their findings seem to be consistent with the relative refractory period of about 4 ms, as reported in Boulet et al. (2016). This also suggests that, in the 16-pulse condition of the present study, where the stimulation lasted for 1.6 ms, a portion of the ANFs might fire twice during the train. This portion would probably consist of those ANFs that responded to the first pulses and, later, most likely to the pulses close to the end of the train, due to their recovery after their absolute refractory period.

In the multi-pulse stimulation employed in this study, the initial pulse activated a population of ANFs, which consequently led to a detectable eABR in the brainstem. Because of refractoriness, this population is not capable of responding to the second pulse and has only limited responses during the rest of the pulses in the burst. Therefore, another population of ANFs, other than the one that responded to the first pulse and presumably located farther away, might be capable of eliciting action potentials in response to the second pulse. In case the second pulse alone is not strong enough, a group of pulses might be able to make ANFs fire, as described in Eq. (4). Generalized to further pulses, characteristics of wave eV amplitudes in response to multi-pulse stimulation provide insight into how multi-pulse stimuli are integrated at the level of the brainstem, and they might be a potential measure of the health state and/or survival of ANFs.

Bai et al. (2019) and Obando Leitón (2019) confirmed that stimulation of a single electrode of the CI leads to a broad spread of current along the cochlea, which means the auditory nerves are stimulated not only by the nearest electrode but also by a number of neighboring electrodes. This would mean that in the CIS strategy the effective stimulation rate in electric hearing is not the rate of individual electrodes but a burst with the global stimulation rate originating from neighboring electrodes with overlapping current spread. Considering a typical stimulation rate of 800–2000 pps for individual electrodes, the high stimulation rate of 10,000 pps used in this study represents the global stimulation rate induced by stimulation of N neighboring electrodes. Thus, eABRs in response to multi-pulse stimuli of high rate could be used for estimation of THRs like those used in clinics. This assumption of course requires further investigation.

#### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

#### ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Klinikum rechts der Isar der Technischen Universität München. The patients/participants provided their written informed consent to participate in this study.

#### AUTHOR CONTRIBUTIONS

AS contributed to the study design, data collection, data analysis, and manuscript drafting. WH contributed to the study design and critical manuscript revision. All authors contributed to the article and approved the submitted version.

#### FUNDING

This work was supported by the German Research Foundation (DFG) under the D-A-CH program (HE6713/2-1), the Technical University of Munich (TUM), and the Ministry of Science, Research and Technology of Iran (22494/N).

#### ACKNOWLEDGMENTS

We would like to greatly thank the CI users who patiently cooperated during the measurements. We would also like to thank Auguste Schulz for helping in subject recruitment and data collection and Drs. Sonja Karg and Miguel Obando Leitón for their helpful suggestions during measurements and data analysis.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2020.00615/full#supplementary-material



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Saeedi and Hemmert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Toward Long-Term Communication With the Brain in the Blind by Intracortical Stimulation: Challenges and Future Prospects

#### Eduardo Fernández<sup>1,2,3</sup>\*, Arantxa Alfaro<sup>2,4</sup> and Pablo González-López<sup>1,5</sup>

<sup>1</sup> Institute of Bioengineering, Universidad Miguel Hernández, Elche, Spain, <sup>2</sup> Center for Biomedical Research in the Network in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain, <sup>3</sup> John A. Moran Eye Center, University of Utah, Salt Lake City, UT, United States, <sup>4</sup> Hospital Vega Baja, Orihuela, Spain, <sup>5</sup> Hospital General Universitario de Alicante, Alicante, Spain

#### Edited by:

Alejandro Barriga-Rivera, The University of Sydney, Australia

#### Reviewed by:

Frank Rattay, Vienna University of Technology, Austria; Mohit Naresh Shivdasani, University of New South Wales, Australia

> \*Correspondence: Eduardo Fernández e.fernandez@umh.es

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 24 December 2019 Accepted: 03 June 2020 Published: 11 August 2020

#### Citation:

Fernández E, Alfaro A and González-López P (2020) Toward Long-Term Communication With the Brain in the Blind by Intracortical Stimulation: Challenges and Future Prospects. Front. Neurosci. 14:681. doi: 10.3389/fnins.2020.00681

The restoration of a useful visual sense in a profoundly blind person by direct electrical stimulation of the visual cortex has been a subject of study for many years. However, the field of cortically based sight restoration has made few advances in the last few decades, and many problems remain. In this context, the scientific and technological problems associated with safe and effective communication with the brain are very complex, and there are still many unresolved issues delaying its development. In this work, we review some of the biological and technical issues that still remain to be solved, including long-term biotolerability, the number of electrodes required to provide useful vision, and the delivery of information to the implants. Furthermore, we emphasize the possible role of the neuroplastic changes that follow vision loss in the success of this approach. We propose that increased collaborations among clinicians, basic researchers, and neural engineers will enhance our ability to send meaningful information to the brain and restore a limited but useful sense of vision to many blind individuals.

#### Keywords: visual prostheses, blindness, biocompatibility, biotolerability, neuroplasticity, visual cortex

# INTRODUCTION

Visual impairment affects personal independence, reduces quality of life, and has a significant impact on the lives of those who suffer from it (Bourne et al., 2017). Although some visual pathologies can be effectively treated, and there are some novel approaches to slow down the progression of several eye diseases, including gene and stem cell therapies (Higuchi et al., 2017; Artero Castro et al., 2018; Llonch et al., 2018; Benati et al., 2019; West et al., 2019), unfortunately, there are no treatments for all causes of blindness (Fernandez, 2018). Therefore, many scientists have long dreamed of the possibility of restoring vision by using neural prosthetic devices that bypass the damaged visual pathways.

The concept of artificially producing a visual sense in the blind is based on our current understanding of the structure of the mammalian visual system and the relationship between electrical stimulation of any part of the visual pathways and the resulting visual perceptions (Fernandez and Normann, 1995; Maynard, 2001). Thus, several research groups are focusing their efforts on the development of new approaches for artificial vision based on electric stimulation of the retina (Da Cruz et al., 2016; Lorach et al., 2016; Stingl et al., 2017), optic nerve (Duret et al., 2006; Lu et al., 2013; Gaillet et al., 2020), lateral geniculate nucleus (Vurro et al., 2014; Killian et al., 2016), or visual cortex (Fernandez et al., 2005; Normann et al., 2009; Kane et al., 2013; Normann and Fernandez, 2016; Fernandez, 2018; Niketeghad et al., 2019). All of these prosthetic devices work by exchanging information between the electronic devices and different types of neurons, and although most of them are still in development, they show promise of restoring vision in many forms of blindness.

At present, retinal prostheses are the most successful approach in this field, and several retinal devices have already been approved for patients with retinal dystrophies (Da Cruz et al., 2016; Stingl et al., 2017). However, the inner layers of the retina can degenerate in many retinal diseases. Consequently, a retinal prosthesis may not be useful, for example, in patients with advanced retinal degenerations, glaucoma, or optic atrophy. Therefore, there are compelling reasons for the development of other approaches able to restore a functional sense of vision bypassing the retina.

In this framework, since the neurons in the higher visual regions of the brain are usually spared from the damage to the retina and optic nerve, several researchers are trying to develop visual prostheses designed to directly stimulate the brain. Even if only a crude representation of the surrounding physical world can be evoked, a blind individual could use this artificially encoded neural information for tasks such as orientation and mobility. This functional performance has already been attained in the field of auditory prostheses. These devices have already allowed many deaf patients to hear sounds and acquire language capabilities (Merkus et al., 2014; Glennon et al., 2019), and the same hope exists in the field of neuroprosthetic devices designed for electrical stimulation of the visual cortex.

However, in spite of all the progress in materials and neuroelectronic interfaces, the scientific and technological problems associated with the long-term biocompatibility and biotolerability of cortical electrodes, together with the difficulties associated with the encoding of visual information, are very complex. Moreover, it is still unclear how to identify the ideal candidates for a cortical prosthesis (Merabet et al., 2007). Therefore, there are still many unresolved issues delaying its development. We summarize herein some of the main biological and technical issues that still remain to be fully solved, related mainly to the field of intracortical devices, and discuss some of the challenges in this highly multidisciplinary field.

# ELECTRODES THAT INTERACT WITH THE BRAIN IN THE BLIND: GENERAL REMARKS

Otfried Foerster was the first neurosurgeon to expose the occipital area of one cerebral hemisphere in an awake patient (under local anesthesia) and electrically stimulate it (Foerster, 1929). He found that electrical stimulation of this region of the brain induced the perception of small spots of light directly in front of the subject. These early findings, together with the studies of Wilder Penfield and co-workers in epileptic patients (Penfield and Rasmussen, 1950; Penfield and Jaspers, 1974), established the anatomical and physiological basis for the development of a cortical visual prosthesis for the blind. Later on, Giles Brindley in England (Brindley and Lewin, 1968a,b; Rushton and Brindley, 1978) and William Dobelle in the United States (Dobelle and Mladejovsky, 1974; Dobelle et al., 1976; Dobelle, 2000) showed that simultaneous stimulation of several electrodes placed on the surface of the brain allowed blind volunteers to see some predictable simple patterns, including Braille characters and letters (Bak et al., 1990; Schmidt et al., 1996). However, there were also some problems, such as the induction of epileptic seizures and the appearance of pain due to meningeal or scalp stimulation. These issues were associated with the large active surface of the electrodes, which required high electrical currents of the order of milliamps to evoke phosphenes. In addition, these large electrodes interacted with relatively large volumes of cortex (∼1 cm<sup>3</sup>), resulting in very low spatial resolution of the perceived phosphenes (Christie et al., 2016; Niketeghad et al., 2019). These latter findings have recently been confirmed by Beauchamp et al. (2020), who implanted two different types of electrodes on the surface of the visual cortex of two blind individuals and found that when multiple electrodes were stimulated simultaneously, phosphenes fused into larger formless perceptions, making shape recognition impossible.

Cortical artificial vision did not seem feasible until we could find a way to provide a much more focal stimulation of neurons in the visual cortex (Normann et al., 1996). This led a number of investigators to develop new approaches such as smaller intracortical electrodes designed to be similar in size to the cell bodies of the neurons they are trying to stimulate and able to penetrate through the surface of the cortex (Normann et al., 1999; Troyk et al., 2003; Wise, 2005). These new microelectrodes can be located very close to the neurons they intend to stimulate, which are generally situated 1–1.5 mm from the cortical surface, avoiding the relatively high electrical currents required by surface electrodes. Thus, we recently implanted an array of 100 penetrating electrodes (a Utah Electrode Array) in the occipital cortex of a 57-year-old person for a six-month period, and we found that stimulation thresholds to excite neurons were in the 1–100 microamp range (Fernandez et al., 2019). This is clearly two to three orders of magnitude smaller than the currents required to evoke phosphenes using surface electrodes.

Some examples of these new penetrating neural interfaces are the arrays built with metal microelectrodes, the Utah Electrode Array, the implantable microcoils for intracortical magnetic stimulation (Lee et al., 2016), and other penetrating devices made of a variety of other materials (Fernandez and Botella, 2017). However, although these penetrating microelectrodes have been used successfully in both the central (CNS) and peripheral (PNS) nervous systems, the brain imposes some specific conditions such as the absence of regeneration and the presence of different types of glial cells. Moreover, the requirements for electrical stimulation and recording in the brain are clearly different from those in the peripheral nervous system. Thus, the brain hosts different types of neurons arranged in several superficial layers and in deep nuclei and various types of glial cells that interact in very intricate ways. Furthermore, the brain is protected by the meninges, skull, and scalp, a multi-layered set of coverings formed by connective tissue, bone, and skin. This means that it is impossible to reach the desired cortical neurons without affecting neighboring parts of the nervous system. Likewise, the brain tissue includes a complex network of blood vessels that are likely to be injured by the introduction of any external device (**Figure 1**).

In addition, we should also consider the mechanical micromovements between the pulsating neural tissue (due mainly to cardiac pulse and breathing) and the static implants, which can induce different kinds of damage (Polanco et al., 2016). All of these factors place high demands on the long-term function of any intracortical electrode and also impose unique constraints on the materials, packaging, and insulation of the electronics (Normann and Fernandez, 2016).

# BIOTOLERABILITY OF NEURAL ELECTRODES

The implantation of any intracortical microelectrode into the brain is a traumatic procedure, and all neural electrodes to date, even those considered to be highly biocompatible, induce biological responses characterized by microhemorrhages and a certain amount of local tissue damage around the electrodes that may impact the stability, performance, and viability of the microelectrodes. Therefore, some authors suggest that instead of biocompatibility, we should talk about biotolerability, highlighting the capacity of the microelectrodes to stay fully functional in the brain without inducing any significant tissue damage for long periods of time (Fernandez and Botella, 2017).

While most materials used currently for the fabrication of intracortical electrodes remain relatively inert in the brain, they still induce a foreign-body reaction (FBR) characterized by a neuroinflammatory response of the tissue around the electrodes that may hinder the recording and stimulation of the neurons over time (Marin and Fernandez, 2010; Fernandez and Botella, 2017). Often, the FBR starts with the damage to the blood vessels encountered during the implantation of the microelectrodes in the neural tissue (see **Figure 1**), which causes small interstitial microhemorrhages. These microhemorrhages stop spontaneously, but there is also increased blood flow to the damaged region, together with increased permeability of local microvasculature, which induces extravasation of fluids, blood cells, and proteins toward the interstitial space. Thus, the microelectrodes become surrounded by many blood cells and plasma proteins that stick to their surface. **Figure 2** shows a representative example. Therefore, blood compatibility should be considered an important issue for improving the long-term performance and viability of any neural electrode.

On the other hand, as has been reviewed in detail elsewhere (Zhong and Bellamkonda, 2008; Marin and Fernandez, 2010; Fernandez and Botella, 2017; Ferguson et al., 2019), the inflammatory responses to the implantation of any neural probe into the brain involve a large network of physiological responses including edema, release of cytokines, platelet activation, complement system activation, invasion of blood-borne macrophages, and activation of neighboring astrocytes and microglial cells (Lee et al., 2005; Polikov et al., 2005; Biran et al., 2007; Grill et al., 2009; Mcconnell et al., 2009; Marin and Fernandez, 2010). Subsequently, activated macrophages surround the microelectrodes and fuse into multi-nucleated giant cells that form a barrier, similar to a thin protective membrane, that shields brain tissue from damage (Polikov et al., 2005). Most of these processes are spontaneously resolved; however, glial scarring and giant cells can be found around many microelectrodes implanted chronically in the brain (Polikov et al., 2005). This suggests the existence of a chronic inflammation reaction that persists over time and can induce the development of a dense sheath around the microelectrodes, making it difficult to record and stimulate nearby neurons. As a result, long-term biocompatibility or biotolerability is still an unresolved issue, and most intracortical microelectrodes have a maximum in vivo lifetime of several months or a few years (Suner et al., 2005; Prasad et al., 2012; Barrese et al., 2013).

A significant challenge here is to reduce the neuroinflammatory response. In recent years, several strategies for minimizing trauma and the inflammatory responses have been investigated, for example, the reduction of the cross-sectional area of the electrodes (Seymour and Kipke, 2007) and the use of more flexible and soft materials that better match the properties of the surrounding tissue (Patel et al., 2016; Fernandez and Botella, 2017; Cuttaz et al., 2019; Wang et al., 2019). However, these modifications also affect the mechanical properties of the electrodes and could result in a lack of the mechanical strength needed to withstand insertion without buckling and breaking. Another relatively simple way to control the biological responses and improve the long-term biotolerability of neural electrodes is the modification of the chemical composition of the surface of the electrodes by using different polymers and nanomaterials (Hara et al., 2016; Fernandez and Botella, 2017; Gulino et al., 2019). Moreover, we should also consider that the electronics and the connecting pathways to individual microelectrodes must be completely insulated and have to remain perfectly functional over time, which also imposes unique constraints on hermetic packaging (Jiang and Zhou, 2009; Vanhoestenberghe and Donaldson, 2013).

Although it is often not mentioned, an important issue for the long-term success of any neural implant is the quality of the surgical implantation procedures. Thus, we believe that many difficulties encountered in chronic experiments could be directly related to problems during surgery and implantation. Careful implantation seems to increase the biotolerability and longevity of intracortical microelectrode arrays, and there is no substitute for good planning and an adequate surgical technique.

# NUMBER OF ELECTRODES REQUIRED FOR FUNCTIONAL VISION

The functional vision that could be restored with an array of intracortical microelectrodes implanted into the brain is a function of many parameters, but it is in part related to the number of implanted electrodes, the interelectrode spacing, and the specific location of each microelectrode in the brain (Cha et al., 1992; Dagnelie et al., 2006). However, the assumption that visual perception will improve by increasing only the number of electrodes may be incorrect.

Although we see with the brain, the input information to the visual system begins at the eye, which catches and focuses light onto the retina. The human retina is approximately 0.5 mm thick and contains both the photoreceptors or sensory neurons that respond to light and intricate neural circuits that perform the first stages of image processing. The output neurons of the retina are the ganglion cells, which send their axons (approximately 1–1.5 million per eye) through the optic nerve to the brain (Watson, 2014). This means that, in order to encode all the features of objects in the visual space (for example, their form, localization, contour, intensity, color, etc.) and the change of these features in time in the same way that the human retina does, we would need at least 1 million parallel channels, which is clearly well beyond the state-of-the-art of current prosthetic technologies.

Fortunately, despite the above-mentioned figures, the results of several simulation studies suggest that the amount of visual input required to perform basic visually guided tasks is not as great as one might expect. In a series of psychophysical experiments, it was estimated that 625 electrodes implanted in the primary visual cortex could be enough for reading (although at lower speeds) and for navigating through complex visual environments (Cha et al., 1992). In this framework, the possibility of providing some degree of functional vision to facilitate the activities of daily living with only around 600–700 electrodes is very encouraging (Dagnelie et al., 2006). However, this low number of electrodes also usually implies "tunnel vision": a restricted visual field that can be a serious problem for orientation and mobility. To cope with this problem, we can implant several arrays of penetrating microelectrodes at different locations of the visual cortex. In this context, multiple microelectrode arrays have already been implanted in monkey visual cortex (Chen et al., 2017; Roelfsema and Holtmaat, 2018; Van Vugt et al., 2018; Self et al., 2019) and these implants are providing a better understanding of how the brain enhances the representations of visual objects in different visual regions (Klink et al., 2017; Self et al., 2019). However, more experiments are still needed, and the question of how many electrodes are necessary to restore a limited but useful vision will probably only be answerable by future experiments in blind subjects.
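To give a feel for what a 625-channel percept might carry, the following minimal sketch reduces a grayscale image to a 25 × 25 grid of coarse brightness values, one per simulated electrode. It is only a toy illustration and not the procedure used in the cited simulation studies: the function name `pixelize`, the block-averaging step, the four brightness levels, and the synthetic test image are all assumptions made for the example.

```python
def pixelize(image, grid=25, levels=4):
    """Block-average a grayscale image onto a grid x grid phosphene map.

    image: list of equal-length rows with intensities in [0.0, 1.0].
    Returns a grid x grid list of integer brightness levels in [0, levels - 1].
    The grid size and number of levels are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    if height < grid or width < grid:
        raise ValueError("image must be at least grid x grid pixels")
    phosphenes = []
    for gy in range(grid):
        # Integer bounds of the image rows covered by this grid row.
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        row = []
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) / len(block)
            # Quantize the block mean to a small number of brightness levels.
            row.append(min(levels - 1, int(mean * levels)))
        phosphenes.append(row)
    return phosphenes


# Synthetic 100 x 100 test image: a bright vertical bar on a dark background.
image = [[1.0 if 40 <= x < 60 else 0.0 for x in range(100)] for _ in range(100)]
phosphene_map = pixelize(image)
print(sum(len(row) for row in phosphene_map))  # 625 simulated electrodes
```

Even this crude reduction keeps the position and width of the bar, which is consistent with the idea that basic visually guided tasks may need far fewer channels than the retina provides. It ignores everything that makes real phosphenes difficult, such as irregular cortical mapping and variable phosphene size and brightness.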

# ENGINEERING A WIRELESS INTRACORTICAL DEVICE WITH HUNDREDS OF ELECTRODES

Although ongoing studies suggest that electrical stimulation via multiple electrodes may give rise to useful vision, extensive efforts are still needed to address the engineering challenges of realizing an intracortical device containing hundreds of electrodes. Furthermore, the device must be wireless, since it is necessary to avoid wires to reduce post-surgical complications such as the risk of infection. In this context, power and communication constraints, as well as power dissipation in the brain, could pose significant challenges (Sahin and Pikov, 2011; Lewis et al., 2015). Other relevant issues in this framework are the so-called "crosstalk" or interference between stimulating electrode sites and the multiplexing of stimulation channels (Barriga-Rivera et al., 2017). Thus, there is a clear need to develop new implantable technologies optimized for high channel count.
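One simple way to think about multiplexing is time-division interleaving, in which electrodes that fire in the same time slot are kept apart on the array. The sketch below is a hypothetical illustration of that idea only: the 10 × 10 layout, the `stride` parameter, and the function name `interleaved_slots` are assumptions, and whether a given spacing actually reduces crosstalk enough would have to be established experimentally.

```python
def interleaved_slots(rows=10, cols=10, stride=2):
    """Assign each electrode of a rows x cols array to a time slot.

    Electrodes sharing a slot have the same (row % stride, col % stride),
    so any two of them differ by a multiple of `stride` in row and in column.
    Returns a dict mapping slot index -> list of (row, col) positions.
    """
    slots = {}
    for r in range(rows):
        for c in range(cols):
            slot = (r % stride) * stride + (c % stride)
            slots.setdefault(slot, []).append((r, c))
    return slots


schedule = interleaved_slots()
for slot in sorted(schedule):
    print(slot, len(schedule[slot]))
```

With the assumed defaults, the 100 sites are split into four slots of 25 electrodes each, and no two electrodes in the same slot are immediate neighbors. The cost is that each site can be addressed in only one slot out of four, which is the kind of trade-off a high-channel-count design has to weigh.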

On the other hand, patients with retinal implants have to undergo long fitting procedures to measure thresholds and fine-tune the stimulation parameters on each individual electrode, but these procedures are not viable if hundreds or thousands of electrode sites need to be tested. Therefore, we need new procedures for fitting devices containing hundreds of electrodes in patients. A possible approach to facilitate the fitting procedures could be to develop bidirectional intracortical devices able to record the neuronal activity in response to electrical stimulation and use the recorded neural activity to optimize the stimulation parameters (Rotermund et al., 2019). Another possibility could be to use machine learning to find optimal stimulation settings (Kumar et al., 2016). In any case, more studies are still needed.
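As a rough illustration of how recorded responses could shorten fitting, the sketch below searches for a per-electrode threshold by bisection instead of sweeping every current level. It is a simplified sketch under strong assumptions: `evokes_response` is a hypothetical stand-in for a recorded neural response, it is treated as deterministic and monotonic in current (real responses are probabilistic), and the 1–100 microamp search range is borrowed from the thresholds reported earlier in the text.

```python
def estimate_threshold(evokes_response, low_ua=1.0, high_ua=100.0, tol_ua=1.0):
    """Bisect a current range until the threshold is bracketed within tol_ua.

    evokes_response(current_ua) -> bool is a hypothetical probe that reports
    whether stimulation at current_ua (microamps) evoked a response.
    Returns (estimate_ua, n_trials); estimate_ua is None if even the
    maximum current evokes nothing.
    """
    trials = 1
    if not evokes_response(high_ua):
        return None, trials
    while high_ua - low_ua > tol_ua:
        mid = (low_ua + high_ua) / 2.0
        trials += 1
        if evokes_response(mid):
            high_ua = mid   # threshold is at or below mid
        else:
            low_ua = mid    # threshold is above mid
    return high_ua, trials


# Simulated electrodes with made-up thresholds (microamps), for illustration.
true_thresholds_ua = [5.0, 20.0, 42.5, 80.0]
results = []
for true_ua in true_thresholds_ua:
    probe = lambda current_ua, t=true_ua: current_ua >= t
    results.append(estimate_threshold(probe))

for true_ua, (estimate_ua, n_trials) in zip(true_thresholds_ua, results):
    print(true_ua, estimate_ua, n_trials)
```

Under these assumptions each electrode needs 8 stimulation trials to reach a 1 microamp resolution, against roughly a hundred for an exhaustive 1 microamp sweep. A practical procedure would need repeated trials and a statistical criterion to cope with response variability.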

# DELIVERY OF INFORMATION TO IMPLANTS

Besides the number of electrodes and the engineering challenges, a key issue for the future success of cortical visual implants is related to how the brain understands artificially encoded information. All visual prostheses developed to date provide very poor vision, with relatively low spatial resolution; therefore, great efforts are still needed to design and develop new systems that can achieve results as successful as those achieved with cochlear implants.

Part of the success of cochlear implants seems to be related to the sophisticated signal-processing techniques and bioinspired coding strategies developed over the years (Clark, 2015; Boulet et al., 2016; Jain and Vipin Ghosh, 2018). Despite these encouraging results, most visual prosthesis devices only try to emulate the phototransducer aspects of the retina and do not consider the complex processes that are found in the mammalian visual system. Some researchers have proposed that performance could be increased significantly by incorporating the neural code (Nirenberg and Pandarinath, 2012), whereas others promote the use of computer vision algorithms and techniques of artificial intelligence (Sanchez-Garcia et al., 2020). Although more studies are still needed, we expect that bio-inspired visual encoders based on intelligent signal and image-processing strategies, together with new cutting-edge artificial intelligence algorithms running on neuromorphic hardware, could have a significant impact in the future to facilitate the interpretation of the processed signals (Fernandez, 2018).

On the other hand, whereas there are many relevant aspects in a visual scene (for example, form, color, and motion), most current coding strategies are only aimed at addressing the spatial details. This could be an oversimplification since, for example, the ability to recognize patterns in a scene, or the perceived receptive field size, is critical for many visual tasks. Thus, we can extract complex information, such as identifying human faces, from relatively poor-quality images by using specific cues and multiple visual features (Sinha, 2002). This suggests that besides image resolution, we should try to pay attention to other relevant visual attributes such as receptive field size, localization, orientation, or movement.

Another important issue is to focus on the specific needs of the end users. For example, some people may place more demands on object- or person-identification, whereas others could prefer to focus on orientation and mobility. The key issue is to encode and send useful information that can be translated into functional gains for daily life activities (Merabet et al., 2007). In addition, it is possible that there are subtle differences in the perceived visual field or in coding among subjects. Therefore, future advanced systems to interact with the brain in the blind should allow the customization of the functions to satisfy the particular needs and capabilities of each user.

# NEURAL PLASTICITY

The adult visual cortex does not completely lose its functional capacity after years of deprivation of visual input (Brindley and Lewin, 1968a); however, there is clear clinical evidence showing adaptive neurophysiological changes in the brain, specifically at the occipital lobe. Therefore, a relevant question is whether these adaptive changes could have a significant impact on the success of a cortical visual prosthesis.

In response to the loss of vision, brain areas normally devoted to the processing of visual information are recruited to process tactile and auditory information and even cognitive functions such as verbal memory and speech processing (Fernandez et al., 2005; Gilbert et al., 2009; Legge and Chung, 2016; Beyeler et al., 2017; Singh et al., 2018; Castaldi et al., 2020). These changes are related to the capability of blind subjects to extract greater information from other senses such as touch and hearing. Thus, neuroplasticity can be viewed as an adaptive and dynamic process able to change the processing patterns of sensory information.

This neuroplasticity implies that the brain undergoes important remodeling and adaptive changes after the onset of blindness that could directly impact the success of any cortical prosthesis (Glennon et al., 2019). Over time, these adaptive changes may lead to the establishment of new connections and functional roles of different brain areas, a process that is probably influenced by factors such as the cause of the visual loss and the duration of visual deprivation. All these issues may help to define a preferred time window for improving the likelihood of success of any device intended for communicating with the brain in the blind.

On the other hand, it is unlikely that the re-introduction of the lost sensory input alone will be able to promptly restore sight. Therefore, we should try to develop specific strategies to communicate with the brain of the blind in order to increase the chances of extracting useful information from the artificially encoded stimulation. Furthermore, we should consider the challenges of visual rehabilitation. Thus, improved rehabilitation strategies after the surgical implantation could contribute greatly to ever improving the performance of the neuroprosthetic devices.

# CONCLUSION AND FUTURE PERSPECTIVES

The development of new prosthetic technologies for restoring vision to many blind individuals for whose impairment there is currently neither prevention nor cure is a must for the future.

Cortical prostheses based on penetrating microelectrodes show promise for restoring some limited but useful vision to subjects with certain forms of blindness, but the scientific and technological problems associated with safe and effective communication with the visual brain are very complex, and there are still many unresolved issues delaying their development. We expect that ongoing research on the interactions between intracortical microelectrodes and the local cellular environments, along with a better understanding of neuroplasticity and progress in medical technologies, materials science, neuroelectronic interfaces, neuroscience, and artificial intelligence, will allow advances toward the success envisioned by this technology. Nevertheless, we should go step by step and not create false expectations or underrate the challenges that still remain to be resolved. In this framework, we propose that increased collaborations among clinicians, basic researchers, and neural engineers will enhance our ability to send meaningful information to the visually deprived brain and will help to restore a limited but useful sense of vision to many profoundly blind people.

### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

#### ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Hospital General Universitario de Alicante. The patients/participants provided their written informed consent to participate in this study.

#### AUTHOR CONTRIBUTIONS

EF, AA, and PG-L contributed to the design and implementation of the research and writing of the manuscript. All authors contributed to the article and approved the submitted version.

### FUNDING

This work was supported by grant RTI2018-098969-B-100 from the Spanish Ministerio de Ciencia Innovación y Universidades, by PROMETEO/2019/119 from the Generalitat Valenciana, and the Bidons Egara Research Chair of the University Miguel Hernández.

# ACKNOWLEDGMENTS

We are grateful to Dr. Lawrence Humphreys (CIBER-BBN) for critical reading of the manuscript.



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Fernández, Alfaro and González-López. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
