<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2021.726582</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>ES-ImageNet: A Million Event-Stream Classification Dataset for Spiking Neural Networks</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Lin</surname> <given-names>Yihan</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1546958/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Ding</surname> <given-names>Wei</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1546965/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Qiang</surname> <given-names>Shaohua</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1547196/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Deng</surname> <given-names>Lei</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/368959/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Li</surname> <given-names>Guoqi</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/368709/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Anup Das, Drexel University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Laxmi R. Iyer, Institute for Infocomm Research (A&#x0002A;STAR), Singapore; Maryam Parsa, George Mason University, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Guoqi Li <email>liguoqi&#x00040;mail.tsinghua.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience</p></fn></author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>15</volume>
<elocation-id>726582</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>06</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Lin, Ding, Qiang, Deng and Li.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Lin, Ding, Qiang, Deng and Li</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>With event-driven algorithms, especially spiking neural networks (SNNs), achieving continuous improvement in neuromorphic vision processing, a more challenging event-stream dataset is urgently needed. However, it is well-known that creating an ES-dataset is a time-consuming and costly task with neuromorphic cameras like dynamic vision sensors (DVS). In this work, we propose a fast and effective algorithm termed Omnidirectional Discrete Gradient (ODG) to convert the popular computer vision dataset ILSVRC2012 into its event-stream (ES) version, turning about 1,300,000 frame-based images into ES samples in 1,000 categories. The resulting ES-dataset, called ES-ImageNet, is dozens of times larger than other existing neuromorphic classification datasets and is generated entirely by software. The ODG algorithm applies image motion to generate local value changes with discrete gradient information in different directions, providing a low-cost and high-speed method for converting frame-based images into event streams, along with Edge-Integral to reconstruct high-quality images from the event streams. Furthermore, we analyze the statistics of ES-ImageNet in multiple ways, and a performance benchmark of the dataset is provided using both well-known deep neural network and spiking neural network algorithms. We believe that this work provides a new large-scale benchmark dataset for SNNs and neuromorphic vision.</p></abstract>
<kwd-group>
<kwd>data set</kwd>
<kwd>spiking neural network</kwd>
<kwd>dynamic vision sensor</kwd>
<kwd>brain-inspired computation</kwd>
<kwd>leaky integrate and fire</kwd>
</kwd-group>
<counts>
<fig-count count="11"/>
<table-count count="4"/>
<equation-count count="14"/>
<ref-count count="52"/>
<page-count count="14"/>
<word-count count="9206"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>In recent years, spiking neural networks (SNNs) have attracted extensive attention in the fields of computational neuroscience, artificial intelligence, and brain-inspired computing (Pei et al., <xref ref-type="bibr" rid="B37">2019</xref>; Roy et al., <xref ref-type="bibr" rid="B42">2019</xref>). Known as the third generation of neural networks (Maass, <xref ref-type="bibr" rid="B29">1997</xref>), SNNs are able to process spatiotemporal information and offer stronger biological interpretability than artificial neural networks (ANNs, or deep neural networks). They have been applied to a number of tasks, such as pattern recognition (Schrauwen et al., <xref ref-type="bibr" rid="B44">2008</xref>; Rouat et al., <xref ref-type="bibr" rid="B41">2013</xref>; Zhang et al., <xref ref-type="bibr" rid="B52">2015</xref>), high-speed object tracking (Yang et al., <xref ref-type="bibr" rid="B51">2019</xref>), and optical flow estimation (Paredes-Vall&#x000E9;s et al., <xref ref-type="bibr" rid="B35">2019</xref>), with the help of neuromorphic hardware such as TrueNorth (Akopyan et al., <xref ref-type="bibr" rid="B2">2015</xref>), Loihi (Davies et al., <xref ref-type="bibr" rid="B10">2018</xref>), DaDianNao (Tao et al., <xref ref-type="bibr" rid="B45">2016</xref>), and Tianjic (Pei et al., <xref ref-type="bibr" rid="B37">2019</xref>). Meanwhile, the continuous expansion of datasets in image classification (LeCun et al., <xref ref-type="bibr" rid="B27">1998</xref>; Deng et al., <xref ref-type="bibr" rid="B13">2009</xref>; Krizhevsky and Hinton, <xref ref-type="bibr" rid="B24">2009</xref>), natural language processing (Nguyen et al., <xref ref-type="bibr" rid="B32">2016</xref>; Rajpurkar et al., <xref ref-type="bibr" rid="B40">2016</xref>), and other fields has continually challenged AI systems and promoted their development, with researchers hoping that AI can surpass humans in the corresponding tasks.
However, research on SNNs is still at a rising stage, and obstacles are gradually appearing, among which the lack of suitable datasets is one of the biggest. We now have <italic>N-MNIST</italic> (Orchard et al., <xref ref-type="bibr" rid="B34">2015</xref>), <italic>N-Caltech101</italic> (Orchard et al., <xref ref-type="bibr" rid="B34">2015</xref>), <italic>DVS-Gesture</italic> (Amir et al., <xref ref-type="bibr" rid="B3">2017</xref>), <italic>CIFAR10-DVS</italic> (Li et al., <xref ref-type="bibr" rid="B28">2017</xref>), and other neuromorphic datasets (or event-stream datasets, ES-datasets), but these existing datasets designed for SNNs are relatively small in scale. As more algorithms are proposed, the scale of SNNs keeps growing. Therefore, the existing datasets can hardly meet the demands of training and validating SNNs.</p>
<p>A compromise solution to this problem is to train SNNs directly on large-scale traditional static datasets. Taking image classification as an example, the common method is to copy an image multiple times to form an image sequence, which is then fed into the spike encoding layer of an SNN, as <xref ref-type="fig" rid="F1">Figure 1A</xref> shows. However, this has an obvious shortcoming: the data redundancy multiplies the training cost without adding any effective information. For comparison, the way to train an SNN on an ES-dataset is shown in <xref ref-type="fig" rid="F1">Figure 1B</xref>. Compared with the common method, it is more natural for SNNs to process such sparse and temporal data by making full use of their temporal characteristics. Therefore, datasets inspired by the imaging mechanism of neuromorphic visual sensors are still considered the most suitable for SNNs.</p>
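<p>The repeated-input scheme described above can be sketched as follows (a minimal illustration of our own, not code released with this work):</p>

```python
def repeat_static_input(image, timesteps):
    """Copy one static frame T times to form the input sequence of a
    rate-coding SNN. Every timestep carries identical data, which is
    exactly the redundancy criticized in the text."""
    return [image for _ in range(timesteps)]

# A 2x2 grayscale "image" repeated over 4 timesteps
sequence = repeat_static_input([[0.1, 0.9], [0.5, 0.3]], 4)
```

<p>Note that storage and computation grow linearly with the number of timesteps while the information content stays constant.</p>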
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>(A)</bold> An approach for training the LIF-SNN (Dayan and Abbott, <xref ref-type="bibr" rid="B11">2001</xref>) on an ANN-oriented dataset. Here, the SNN uses rate coding and an ANN-like structure, so it can be trained using frame data naturally. <bold>(B)</bold> Training an LIF-SNN with GPUs on a DVS-dataset (ES-dataset recorded by DVS). Here we need to accumulate events within a small period as an event frame and get an event frame sequence with <italic>N</italic> frames for training. On the neuromorphic processor, these asynchronous event data can be processed more efficiently.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0001.tif"/>
</fig>
<p>Since SNNs benefit from neuromorphic data, efforts have also been devoted to recycling existing RGB-camera datasets to generate neuromorphic datasets. There are mainly two methods for this task. One is to use dynamic vision sensor (DVS) cameras to record a video of the raw data displayed on an LCD screen (Orchard et al., <xref ref-type="bibr" rid="B34">2015</xref>; Li et al., <xref ref-type="bibr" rid="B28">2017</xref>). This method is time-consuming and costly, making it impractical for producing a large-scale dataset. The other is to generate the event data in software by simulating the principle of DVS cameras (Bi and Andreopoulos, <xref ref-type="bibr" rid="B6">2017</xref>; Gehrig et al., <xref ref-type="bibr" rid="B16">2020</xref>). This kind of method is more suitable for generating large-scale event-based datasets. However, the data redundancy introduced by existing converting algorithms inflates the volume of the generated datasets. In this work, we optimize the existing algorithms of the second method to obtain a dataset with less redundancy.</p>
<p>In this way, an ES-dataset converted from the famous image classification dataset <italic>ILSVRC2012</italic> (Russakovsky et al., <xref ref-type="bibr" rid="B43">2015</xref>) is generated, named <italic>event-stream ImageNet</italic> or <italic>ES-ImageNet</italic>. <italic>ES-ImageNet</italic> contains about 1.3 M samples converted from <italic>ILSVRC2012</italic> in 1,000 different categories, making it currently the largest ES-dataset for object classification. We have compiled the information of representative existing ES-datasets and compared them with <italic>ES-ImageNet</italic>; the results are summarized in <xref ref-type="table" rid="T1">Table 1</xref>. Having more categories and samples also probably makes it the most challenging classification ES-dataset, providing room for continuous improvement of event-driven algorithms.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Comparison between existing ES-datasets and ES-ImageNet.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Name</bold></th>
<th valign="top" align="left"><bold>Generating speed<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref></bold></th>
<th valign="top" align="left"><bold>Resolution</bold></th>
<th valign="top" align="left"><bold>&#x00023; of samples</bold></th>
<th valign="top" align="center"><bold>&#x00023; number</bold></th>
<th valign="top" align="left"><bold>&#x00023; Type</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">POKER-DVS (Prez-Carrasco et al., <xref ref-type="bibr" rid="B38">2013</xref>)</td>
<td valign="top" align="left">&#x02013;</td>
<td valign="top" align="left">32&#x000D7; 32</td>
<td valign="top" align="left">131</td>
<td valign="top" align="center">4</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">N-MNIST (Orchard et al., <xref ref-type="bibr" rid="B34">2015</xref>)</td>
<td valign="top" align="left">300 ms/sample</td>
<td valign="top" align="left">28&#x000D7; 28</td>
<td valign="top" align="left">60, 000 training &#x0002B; 10, 000 test</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">DVS-Caltech101 (Orchard et al., <xref ref-type="bibr" rid="B34">2015</xref>)</td>
<td valign="top" align="left">300 ms/sample</td>
<td valign="top" align="left">302&#x000D7; 245 on average</td>
<td valign="top" align="left">8709</td>
<td valign="top" align="center">100</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">DVS-UCF-50 (Hu et al., <xref ref-type="bibr" rid="B19">2016</xref>)</td>
<td valign="top" align="left">6,800 ms/sample</td>
<td valign="top" align="left">240&#x000D7; 180</td>
<td valign="top" align="left">6,676</td>
<td valign="top" align="center">50</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">DVS-Caltech-256 (Hu et al., <xref ref-type="bibr" rid="B19">2016</xref>)</td>
<td valign="top" align="left">1,010 ms/sample</td>
<td valign="top" align="left">240&#x000D7; 180</td>
<td valign="top" align="left">30,607</td>
<td valign="top" align="center">257</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">DVS-VOT-2015 (Hu et al., <xref ref-type="bibr" rid="B19">2016</xref>)</td>
<td valign="top" align="left">30 FPS, 20.70 s/sample</td>
<td valign="top" align="left">240&#x000D7; 180</td>
<td valign="top" align="left">67</td>
<td valign="top" align="center">&#x02013;</td>
<td valign="top" align="left">Track</td>
</tr>
<tr>
<td valign="top" align="left">DVS-CIFAR10 (Li et al., <xref ref-type="bibr" rid="B28">2017</xref>)</td>
<td valign="top" align="left">300 ms/sample</td>
<td valign="top" align="left">512&#x000D7; 512</td>
<td valign="top" align="left">10,000</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">DVS-Gesture (Amir et al., <xref ref-type="bibr" rid="B3">2017</xref>)</td>
<td valign="top" align="left">6 s/sample</td>
<td valign="top" align="left">128&#x000D7; 128</td>
<td valign="top" align="left">1,342</td>
<td valign="top" align="center">11</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">Pred-18 (Moeys et al., <xref ref-type="bibr" rid="B31">2018</xref>)</td>
<td valign="top" align="left">15 FPS</td>
<td valign="top" align="left">240&#x000D7; 180</td>
<td valign="top" align="left">1.25 h (67.5k frames)</td>
<td valign="top" align="center">2</td>
<td valign="top" align="left">Detect</td>
</tr>
<tr>
<td valign="top" align="left">Action Recognition (Miao et al., <xref ref-type="bibr" rid="B30">2019</xref>)</td>
<td valign="top" align="left">5 s/sample</td>
<td valign="top" align="left">346&#x000D7; 260</td>
<td valign="top" align="left">450</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">1Mpx Detection Dataset (de Tournemire et al., <xref ref-type="bibr" rid="B12">2020</xref>)</td>
<td valign="top" align="left">60 s/sample</td>
<td valign="top" align="left">304&#x000D7; 240</td>
<td valign="top" align="left">14.65 h, 255,781 objects</td>
<td valign="top" align="center">2</td>
<td valign="top" align="left">Detect</td>
</tr>
<tr>
<td valign="top" align="left">SL-ANIMALS-DVS (Vasudevan et al., <xref ref-type="bibr" rid="B46">2020</xref>)</td>
<td valign="top" align="left">&#x02013;</td>
<td valign="top" align="left">128&#x000D7; 128</td>
<td valign="top" align="left">1,102</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">DVS-Gait-Day/Night (Wang et al., <xref ref-type="bibr" rid="B47">2021</xref>)</td>
<td valign="top" align="left">3&#x02013;4 s/sample</td>
<td valign="top" align="left">128&#x000D7; 128</td>
<td valign="top" align="left">4,000</td>
<td valign="top" align="center">20</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">N-ROD (Cannici et al., <xref ref-type="bibr" rid="B8">2021</xref>)</td>
<td valign="top" align="left">6.6 s/sample</td>
<td valign="top" align="left">256&#x000D7; 256</td>
<td valign="top" align="left">41,877</td>
<td valign="top" align="center">51</td>
<td valign="top" align="left">Classify</td>
</tr>
<tr>
<td valign="top" align="left">ES-ImageNet</td>
<td valign="top" align="left">29.47 ms/sample<xref ref-type="table-fn" rid="TN2"><sup>b</sup></xref></td>
<td valign="top" align="left">224&#x000D7; 224<xref ref-type="table-fn" rid="TN3"><sup>c</sup></xref></td>
<td valign="top" align="left">1,257,035 training &#x0002B; 49,881 test</td>
<td valign="top" align="center">1,000</td>
<td valign="top" align="left">Classify</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1">
<label>a</label>
<p><italic>The average time taken to generate each sample, or the average recording time (for directly recorded datasets)</italic>.</p></fn>
<fn id="TN2">
<label>b</label>
<p><italic>Threshold = 0.18</italic>.</p></fn>
<fn id="TN3">
<label>c</label>
<p><italic>The events are generated over a 256&#x000D7; 256-pixel area, but only those in the central 224&#x000D7; 224 pixels are meaningful; the others are noise generated by the motion of the image edges</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>A good conversion algorithm is expected to generate a dataset that is smaller than its source. If imitating the characteristics of DVS is not required, an optimal binary-coding conversion can encode the original information in data of the same size. Therefore, when a conversion algorithm generates a dataset larger than the original one, there must be data redundancy. In order to simulate DVS cameras, a little redundancy can be allowed. However, most existing conversion methods generate much larger datasets [for example, the storage volumes of N-MNIST (Orchard et al., <xref ref-type="bibr" rid="B34">2015</xref>) and Flash-MNIST are tens of GB, whereas the original MNIST is no larger than 100 MB]. This is far from the original intention of DVS sparsity, and it is not conducive to high-speed, efficient processing and large-scale applications. We therefore apply a simple bio-inspired algorithm called <italic>Omnidirectional Discrete Gradient</italic> (ODG). This algorithm captures the sequential features of images and then places them on the time axis with timestamps to generate event streams. It reduces information redundancy, bringing a higher generation speed and a smaller data volume than existing conversion algorithms. It can be regarded as a streamlined version of random saccades, a common bio-inspired generation method, adapted for deep learning use.</p>
<p>To guarantee a suitable data sparsity and amount of information, we also conduct preparatory experiments to control the event rates and the information content of the generated samples. A further analysis of the computation cost of different algorithms confirms that the dataset is SNN-friendly.</p>
<p>The main contributions of this work are 3-fold.</p>
<p>(i) We introduce a new large-scale ES-dataset named <italic>ES-ImageNet</italic>, which is aimed at examining SNNs&#x00027; ability to extract features from sparse event data and boosting research on neuromorphic vision. This work shall provide a new large-scale benchmark dataset for SNNs and neuromorphic vision tasks.</p>
<p>(ii) A new algorithm called ODG is proposed for converting data to its event stream version. We consider the guiding ideology behind it to be a paradigm for conversion from static data to ES-data, which avoids data redundancy.</p>
<p>(iii) Several ways of analyzing the dataset are provided, including an information-loss analysis using 2D information entropy (2D-Entropy) and the visual perception of the reconstructed pictures. Two preparatory experiments are also conducted to guide the design of the algorithm, which may provide inspiration for further improvement.</p>
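<p>For concreteness, a 2D image entropy of the kind mentioned in (iii) can be computed as sketched below. This is a simplified version under our own assumptions (gray values in [0, 1], a joint histogram of each interior pixel and its 4-neighbor mean, and the Shannon formula); the exact definition used for the dataset analysis may differ:</p>

```python
import math

def entropy_2d(img, levels=8):
    """Shannon entropy of the joint (pixel level, neighborhood-mean level)
    distribution over interior pixels; img is a nested list in [0, 1]."""
    h, w = len(img), len(img[0])
    counts, n = {}, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # mean of the 4-connected neighborhood
            nb = (img[y-1][x] + img[y+1][x] + img[y][x-1] + img[y][x+1]) / 4.0
            # quantize (pixel, neighborhood mean) into a joint histogram bin
            key = (int(img[y][x] * (levels - 1)), int(nb * (levels - 1)))
            counts[key] = counts.get(key, 0) + 1
            n += 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

<p>A flat image yields zero 2D-Entropy, while textured images yield higher values, which is what makes the metric usable as an information-loss indicator.</p>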
<sec>
<title>Related Work</title>
<sec>
<title>ES-Datasets Collected Directly From the Real Scenarios</title>
<p>DVS cameras can generate unlabeled ES data directly (Brandli et al., <xref ref-type="bibr" rid="B7">2014</xref>). The ES data are often organized as a quad (<italic>x, y, t, p</italic>), where (<italic>x, y</italic>) are the topological coordinates of the pixel, <italic>t</italic> is the time of spike generation, and <italic>p</italic> is the polarity of the spike. Such datasets are easy to generate and close to practical application scenarios; examples include datasets for tracking and detection (Bardow et al., <xref ref-type="bibr" rid="B4">2016</xref>; Moeys et al., <xref ref-type="bibr" rid="B31">2018</xref>; de Tournemire et al., <xref ref-type="bibr" rid="B12">2020</xref>), datasets for 3D scene reconstruction (Carneiro et al., <xref ref-type="bibr" rid="B9">2013</xref>; Kim et al., <xref ref-type="bibr" rid="B22">2016</xref>), neuromorphic datasets for optical flow estimation (Benosman et al., <xref ref-type="bibr" rid="B5">2013</xref>; Bardow et al., <xref ref-type="bibr" rid="B4">2016</xref>), and datasets for gesture recognition (Amir et al., <xref ref-type="bibr" rid="B3">2017</xref>). Thanks to their high sampling rate and authenticity, these datasets are of great help to the development of applications in high-speed scenes. However, because of the huge workload of recording real scenarios, their sizes are often small, which makes it difficult to meet the demand of evaluating deep SNN algorithms.</p>
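<p>To make the quad representation concrete, accumulating (<italic>x, y, t, p</italic>) events into an event-frame sequence, as is commonly done before GPU training (cf. Figure 1B), can be sketched as follows (an illustrative helper of our own, not part of any released tooling):</p>

```python
def events_to_frames(events, n_frames, t_max, height, width):
    """Bin a list of (x, y, t, p) quads by timestamp into n_frames
    signed frames of size height x width."""
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for x, y, t, p in events:
        k = min(int(t * n_frames / t_max), n_frames - 1)  # time-bin index
        frames[k][y][x] += p  # accumulate signed polarity
    return frames
```

<p>The resulting frames stay sparse when the event stream is sparse, which is what makes this representation efficient for SNN training.</p>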
</sec>
<sec>
<title>Transformed ES-Datasets With Help of Neuromorphic Sensors</title>
<p>These datasets are mainly generated from labeled static image datasets through neuromorphic sensors. Different from the first kind, they are mainly obtained from datasets that have been widely studied and used for traditional ANN tasks, such as <italic>N-MNIST</italic> (Orchard et al., <xref ref-type="bibr" rid="B34">2015</xref>), <italic>DVS-UCF-50, DVS-Caltech-256</italic> (Hu et al., <xref ref-type="bibr" rid="B19">2016</xref>), and <italic>CIFAR10-DVS</italic> (Li et al., <xref ref-type="bibr" rid="B28">2017</xref>). To make such a dataset, one approach is to display a static picture on a screen, point the DVS camera at the screen, and move the camera along a designed trajectory to generate events. Because of the similarity between the transformed dataset and the original one, this kind of dataset can be used and evaluated easily. Therefore, they are also the most widely used datasets in SNN research. However, noise is introduced during the recording process, especially by the flickering of the LCD screen.</p>
</sec>
<sec>
<title>Completely Software-Generated ES-Datasets Without Neuromorphic Sensors</title>
<p>Here, algorithms are used to simulate the characteristics of DVS cameras on labeled data. Dynamic sensors can capture dynamic information from video streams or picture sequences, and this process can also be completed by specific algorithms (Bi and Andreopoulos, <xref ref-type="bibr" rid="B6">2017</xref>; Yang et al., <xref ref-type="bibr" rid="B51">2019</xref>; Gehrig et al., <xref ref-type="bibr" rid="B16">2020</xref>). Such methods avoid the large number of experiments needed for collecting data. However, the existing algorithms for converting static data to event data tend to extract information with considerable redundancy, introduced by the randomness or repetitiveness of the generation algorithms.</p>
</sec>
</sec>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and Methods</title>
<p>In this section, we introduce the method used to generate <italic>ES-ImageNet</italic>, together with a corresponding reconstruction method, covering color space conversion, ODG processing, hyper-parameter selection, and sparse storage.</p>
<sec>
<title>Color Space Conversion</title>
<p>Traditional ES-datasets utilize DVS cameras to record intensity changes asynchronously in the ES format, encoding per-pixel brightness changes. In the RGB (red-green-blue) color model, a pixel&#x00027;s color is described as a triplet (<italic>red, green, blue</italic>) or (<italic>R, G, B</italic>), which does not indicate brightness directly. In an HSV (hue-saturation-value) color model, it is described as (<italic>hue, saturation, value</italic>) or (<italic>H, S, V</italic>). The images in the <italic>ILSVRC2012</italic> dataset are stored in the RGB color space, so they need to be converted to the HSV color space, as shown in</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mi>H</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='right'><mml:mtr columnalign='right'><mml:mtd columnalign='right'><mml:mrow><mml:msup><mml:mn>0</mml:mn><mml:mo>&#x000B0;</mml:mo></mml:msup></mml:mrow></mml:mtd><mml:mtd columnalign='right'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='right'><mml:mtd columnalign='right'><mml:mrow><mml:msup><mml:mrow><mml:mn>60</mml:mn></mml:mrow><mml:mo>&#x000B0;</mml:mo></mml:msup><mml:mo>&#x000D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>G</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>B</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:msup><mml:mn>0</mml:mn><mml:mo>&#x000B0;</mml:mo></mml:msup></mml:mrow></mml:mtd><mml:mtd columnalign='right'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mi>R</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>G</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='right'><mml:mtd columnalign='right'><mml:mrow><mml:msup><mml:mrow><mml:mn>60</mml:mn></mml:mrow><mml:mo>&#x000B0;</mml:mo></mml:msup><mml:mo>&#x000D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>G</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>B</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mn>360</mml:mn></mml:mrow><mml:mo>&#x000B0;</mml:mo></mml:msup></mml:mrow></mml:mtd><mml:mtd 
columnalign='right'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mi>R</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>G</mml:mi><mml:mo>&#x02264;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='right'><mml:mtd columnalign='right'><mml:mrow><mml:msup><mml:mrow><mml:mn>60</mml:mn></mml:mrow><mml:mo>&#x000B0;</mml:mo></mml:msup><mml:mo>&#x000D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mn>120</mml:mn></mml:mrow><mml:mo>&#x000B0;</mml:mo></mml:msup></mml:mrow></mml:mtd><mml:mtd columnalign='right'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='right'><mml:mtd columnalign='right'><mml:mrow><mml:msup><mml:mrow><mml:mn>60</mml:mn></mml:mrow><mml:mo>&#x000B0;</mml:mo></mml:msup><mml:mo>&#x000D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mn>240</mml:mn></mml:mrow><mml:mo>&#x000B0;</mml:mo></mml:msup></mml:mrow></mml:mtd><mml:mtd columnalign='right'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='right'><mml:mtr columnalign='right'><mml:mtd columnalign='right'><mml:mn>0</mml:mn></mml:mtd><mml:mtd columnalign='right'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>max</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='right'><mml:mtd columnalign='right'><mml:mrow><mml:mfrac><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mi>M</mml:mi></mml:mfrac><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mi>m</mml:mi><mml:mi>M</mml:mi></mml:mfrac></mml:mrow></mml:mtd><mml:mtd columnalign='right'><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mrow><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>M</italic> &#x0003D; max{<italic>R, G, B</italic>} and <italic>m</italic> &#x0003D; min{<italic>R, G, B</italic>}. In this algorithm, we use <italic>V</italic> as a reference for light intensity: in the HSV hex-cone model, the value indicates the brightness of the color, and for a light source it is also related to the brightness of the illuminant.</p>
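<p>Equations (1)&#x02013;(3) translate directly into code. The sketch below (our own illustration) assumes <italic>R, G, B</italic> normalized to [0, 1] and folds the two <italic>M</italic> &#x0003D; <italic>R</italic> branches of Equation (1) into one modular expression:</p>

```python
def rgb_to_hsv(r, g, b):
    """HSV conversion following Equations (1)-(3); inputs in [0, 1]."""
    M, m = max(r, g, b), min(r, g, b)
    if M == m:
        h = 0.0                                    # achromatic: H = 0 deg
    elif M == r:
        # covers both G > B (+0 deg) and G <= B (+360 deg) branches
        h = (60.0 * (g - b) / (M - m) + 360.0) % 360.0
    elif M == g:
        h = 60.0 * (b - r) / (M - m) + 120.0
    else:
        h = 60.0 * (r - g) / (M - m) + 240.0
    s = 0.0 if M == 0 else (M - m) / M             # Equation (2)
    v = M                                          # Equation (3): V = M
    return h, s, v
```

<p>Only the V channel is used downstream as the light-intensity reference.</p>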
</sec>
<sec>
<title>Event Generator</title>
<p>To simulate the intensity changes, we use ODG here. Based on the fact that animals such as toads and frogs can respond only to moving objects (Ewert, <xref ref-type="bibr" rid="B15">1974</xref>), we believe that the information necessary for object recognition can be obtained by imitating the frog retinal nerves, specifically the feature-generating ganglion cells. Three important kinds of ganglion cells act as edge detectors, convex-edge detectors, and contrast detectors, generating sparse local edge information. This inspires the main idea of ODG: artificially changing the light intensity and detecting the necessary local edge information in multiple directions.</p>
<p>Different from the widely used random saccades generation (Hu et al., <xref ref-type="bibr" rid="B19">2016</xref>), we choose only the necessary directions, in a fixed order, and the necessary number of frames to minimize data redundancy; we explain this choice below. The algorithm generates an event stream for each picture in <italic>ILSVRC2012</italic> with the specific moving path shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, and it is summarized in <xref ref-type="table" rid="T5">Algorithm 1</xref>. The trigger condition of the events is described in</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:mo>&#x000A0;</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mi>V</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>V</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>T</mml:mi><mml:mi>h</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>V</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>V</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>&#x0003E;</mml:mo><mml:mi>T</mml:mi><mml:mi>h</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>h</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>p</italic>(<italic>x, y, t</italic>) denotes the polarity of the event at (<italic>x, y, t</italic>), <italic>V</italic> is the value of the pixel, and <italic>Thresh</italic> is the difference threshold. This algorithm involves only linear operations, with a time complexity of <inline-formula><mml:math id="M5"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">O</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>T</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, where <italic>W</italic> denotes the width of the image and <italic>T</italic> is the length of time. ES-ImageNet is generated without randomness, so users can reconstruct the original information from the path information and design data augmentation freely.</p>
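<p>Equation (4) can be sketched as a simple thresholding of the V-channel difference between consecutive frames (the function name and the array-based layout are illustrative; 0.18 is the threshold value selected later in the paper):</p>

```python
import numpy as np

def trigger_events(v_curr, v_prev, thresh=0.18):
    """Polarity map per Equation (4): +1 where the V difference exceeds
    Thresh, -1 where it falls below -Thresh, and 0 otherwise."""
    diff = v_curr - v_prev
    p = np.zeros_like(diff, dtype=np.int8)
    p[diff > thresh] = 1
    p[diff < -thresh] = -1
    return p
```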
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The moving trajectory of images used to generate the events. The numbers in the small blue squares are the timestamps when the image reaches the corresponding position. The pipeline shows the complete process of generating an event stream.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0002.tif"/>
</fig>
<table-wrap position="float" id="T5">
<label>Algorithm 1:</label>
<caption><p>ODG event generator.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><bold>Require:</bold> &#x000A0;<italic>Image</italic></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Ensure:</bold> &#x000A0;<italic>Stream</italic></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;xTrace = [1,0,2,1,0,2,1,1,2], yTrace = [0,2,1,0,1,2,0,1,1],<break/>&#x000A0;&#x000A0;&#x000A0;&#x000A0;Thresh = 0.18, T = 8</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>function</bold> G<sc>ENERATOR</sc>(Image)</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;W = Image.size[0], H = Image.size[1]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Image = zeroPadding(upSampling(Image, (254, 254)), 2)</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;V = RGB2HSV(Image).V</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>for</bold> t = 0 &#x02192; T <bold>do</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;x = xTrace[t], y = yTrace[t]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;newImage = V[x : x&#x0002B;W, y : y&#x0002B;H]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>if</bold> t &#x0003E; 0 <bold>then</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;oldX = xTrace[t-1], oldY = yTrace[t-1]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;ImgDiff = newImage - lastImage</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;posEvent = ImgDiff(ImgDiff &#x02265; Thresh), negEvent = ImgDiff(ImgDiff &#x02264; -Thresh)</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>for</bold> i = 0 &#x02192; len(posEvent) <bold>do</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Ex = posEvent[0], Ey = posEvent[1]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>if</bold> (Ex, Ey) is in valid range <bold>then</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;posStream.append((Ex, Ey,t))</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end if</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end for</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>for</bold> i = 0 &#x02192; len(negEvent) <bold>do</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Ex = negEvent[0], Ey = negEvent[1]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>if</bold> (Ex,Ey) is in valid range <bold>then</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;negStream.append((Ex,Ey,t))</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end if</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end for</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end if</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;lastImage = newImage</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end for</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>end function</bold></td>
</tr> 
</tbody>
</table>
</table-wrap>
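<p>For reference, Algorithm 1 can be sketched in NumPy roughly as follows (a hedged sketch: the input is assumed to be the already upsampled and zero-padded V channel, and the event streams are returned as Python lists of (<italic>x, y, t</italic>) triples):</p>

```python
import numpy as np

# The fixed trajectory of Algorithm 1 (9 positions, 8 difference frames)
X_TRACE = [1, 0, 2, 1, 0, 2, 1, 1, 2]
Y_TRACE = [0, 2, 1, 0, 1, 2, 0, 1, 1]

def odg_generate(value, thresh=0.18):
    """Sketch of Algorithm 1. `value` is the upsampled V channel with a
    zero padding of 2 on each side; crops along the trajectory and
    thresholds frame differences per Equation (4)."""
    W, H = value.shape[0] - 4, value.shape[1] - 4
    pos_stream, neg_stream = [], []
    last = None
    for t in range(len(X_TRACE)):
        x, y = X_TRACE[t], Y_TRACE[t]
        frame = value[x:x + W, y:y + H]  # crop at the current position
        if t > 0:
            diff = frame - last
            for ex, ey in zip(*np.nonzero(diff > thresh)):
                pos_stream.append((int(ex), int(ey), t))
            for ex, ey in zip(*np.nonzero(diff < -thresh)):
                neg_stream.append((int(ex), int(ey), t))
        last = frame
    return pos_stream, neg_stream
```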
<p>In <xref ref-type="table" rid="T5">Algorithm 1</xref>, there are four hyper-parameters to be selected: a sequence of the x coordinate (<italic>xTrace</italic>), a sequence of the y coordinate (<italic>yTrace</italic>), the difference threshold (<italic>Thresh</italic>) in Equation (4), and the number of time steps (<italic>T</italic>). We designed two preparatory experiments to determine these hyper-parameters.</p>
</sec>
<sec>
<title>Select the Hyper-Parameters</title>
<sec>
<title>Trajectory</title>
<p>The choice of the path is important; it involves designing <italic>xTrace</italic> and <italic>yTrace</italic> along with choosing <italic>T</italic>. Most existing conversion methods use either fast random saccades or a repeated fixed path. The former selects among eight directions to simulate fast eye movement (random saccades), while the latter uses only four directions [repeated closed-loop smooth (RCLS)], as shown in <xref ref-type="fig" rid="F3">Figures 3A&#x02013;C</xref>.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Comparison of three different kinds of image motion. <bold>(A)</bold> The candidate moving directions used in random saccades generation of DVS-Caltech-256 and DVS-UCF-50 (Hu et al., <xref ref-type="bibr" rid="B19">2016</xref>). <bold>(B)</bold> The path used in RCLS of DVS-CIFAR-10 (Li et al., <xref ref-type="bibr" rid="B28">2017</xref>). <bold>(C)</bold> Trajectory used in ES-ImageNet. <bold>(D)</bold> Illustration to explain why opposite directions in the generation path would only generate opposite events. <bold>(E)</bold> 2D-Entropy comparison among the three generating paths with different steps (<italic>T</italic>). ODG is superior to the other two methods in the sense of 2D-Entropy based on the reconstructed gray images.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0003.tif"/>
</fig>
<p>To compare the information captured by these different methods, we evaluate their 2D information entropy (2D-Entropy), which is positively correlated with the amount of information in an image. The average neighborhood gray value of the image is selected to represent the spatial characteristics, and a 3 &#x000D7; 3 neighborhood is commonly used. The feature pair (<italic>i, j</italic>) is used to calculate the 2D-Entropy, where <italic>i</italic> is the gray value of the pixel and <italic>j</italic> is the floor of the mean neighborhood gray value. The 2D-Entropy of a gray image is defined in Equation (5), where <italic>p</italic><sub>(<italic>i,j</italic>)</sub> denotes the frequency of the feature pair (<italic>i, j</italic>) and <italic>g</italic> is the gray level.</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M6"><mml:mrow><mml:mi>H</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>g</mml:mi></mml:munderover><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>g</mml:mi></mml:munderover><mml:mo>&#x02212;</mml:mo></mml:mstyle></mml:mrow></mml:mstyle><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
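<p>Equation (5) can be implemented as follows (a sketch; we assume the 3 &#x000D7; 3 mean includes the center pixel and that borders are handled by edge replication, details the paper does not specify):</p>

```python
import numpy as np

def entropy_2d(gray, levels=256):
    """2D-Entropy per Equation (5): pair each pixel's gray value i with
    j, the floored 3x3 neighborhood mean, and take the Shannon entropy
    of the empirical (i, j) distribution."""
    H, W = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode='edge')
    neigh = sum(padded[dx:dx + H, dy:dy + W]
                for dx in range(3) for dy in range(3)) / 9.0
    j = np.floor(neigh).astype(np.int64)
    pairs = gray.astype(np.int64) * levels + j   # encode (i, j) as one key
    counts = np.bincount(pairs.ravel(), minlength=levels * levels)
    p = counts[counts > 0] / pairs.size
    return float(-(p * np.log2(p)).sum())
```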
<p>Because these algorithms use the frame difference for event generation, and the adjacent frames are actually the same image, a movement in the opposite direction always generates events with the opposite polarity. Therefore, a new step opposite in direction to an existing movement adds no effective information to the sample, and this is where the existing algorithms can be optimized. As shown in <xref ref-type="fig" rid="F3">Figure 3D</xref>, where the number in each cell denotes the pixel value, moving a row of pixels left or right and computing the difference under the same threshold yields two series of events with exactly opposite polarity.</p>
<p>Based on this observation, we avoid repeated or opposite path segments in the <italic>ODG</italic>. Furthermore, to quantify the benefit, we randomly select 100 images from ImageNet-1K, extract events for different <italic>T</italic> with the three methods, and then reconstruct them into gray images to calculate the 2D-Entropy. The results are shown in <xref ref-type="fig" rid="F3">Figure 3E</xref>; the higher curve of ODG supports our modification.</p>
<p>Analyzing the information (2D-Entropy) curves calculated for each method over several time steps in <xref ref-type="fig" rid="F3">Figure 3E</xref>, we find that the 2D-Entropy increases slowly once <italic>T</italic> &#x02265; 6, whereas the size of the dataset still grows linearly with <italic>T</italic>. To balance the temporal characteristics, the amount of information, and the size of the dataset, we set <italic>T</italic> &#x0003D; 8.</p>
</sec>
<sec>
<title>Threshold</title>
<p>An important indicator for an ES-dataset is the event rate, which is defined as the proportion of pixels that have triggered an event. With the motion path fixed, the most influential parameter for the event rate is the threshold <italic>Thresh</italic>. Because the brightness of adjacent pixels is highly correlated, it is hard to estimate the distribution of adjacent-pixel differences analytically, so a preparatory experiment is needed. We randomly select 5 pictures from each category, obtaining 5,000 pictures, and vary the threshold from 0.1 to 0.4. The results are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. After many tests, we choose 0.18 as the threshold value, for an estimated event rate of 5.186%, with the event rate of most samples lying in the range of 1 to 10%. This result will be verified on the whole dataset. It should be noted that many events may be generated by the movement of the image border, and these have been removed.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>A preparatory experiment for determining the threshold in ODG. We observed that the event rate decreases roughly exponentially with the threshold. Considering the event rate of the generated samples, and to avoid generating too many invalid samples with an extremely low event rate, we choose <italic>threshold</italic> &#x0003D; 0.18.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0004.tif"/>
</fig>
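<p>Under one plausible reading of the event-rate definition above, the rate can be computed from the polarity frames as follows (the function name and the (<italic>T, H, W</italic>) layout are our assumptions; the paper does not spell out the exact averaging):</p>

```python
import numpy as np

def event_rate(polarity_frames):
    """Proportion of pixel locations that triggered at least one event
    across all time steps. `polarity_frames` is a (T, H, W) array with
    values in {-1, 0, +1}."""
    fired = (polarity_frames != 0).any(axis=0)
    return float(fired.mean())
```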
<p>In addition, the longest side of each original image is normalized to 255 pixels by the nearest-interpolation algorithm, which maps the normalized coordinates after zooming onto the integer grid coordinates. The generated event-stream training set is 99 GB and the test set is 4.2 GB, stored in the quad format (<italic>x, y, t, p</italic>). If converted to a frame version, like a short video, the whole dataset can be further reduced to 37.4 GB without information loss. For ease of use, we store all of these tensors as files in the .npz format, using the Python scientific computing package "NumPy". The event-frame format is more suitable for deep learning, and we also provide this version, while the quad-format version is the classical ES-dataset form.</p>
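<p>A minimal sketch of quad-format (<italic>x, y, t, p</italic>) storage with NumPy's compressed .npz archives (the key name and the field dtypes are illustrative, not the dataset's actual schema; an in-memory buffer stands in for a file on disk):</p>

```python
import io
import numpy as np

# One record per event: coordinates, time step, and polarity.
events = np.array([(3, 5, 1, 1), (10, 2, 4, -1)],
                  dtype=[('x', 'u2'), ('y', 'u2'), ('t', 'u1'), ('p', 'i1')])
buf = io.BytesIO()                      # stands in for a .npz file on disk
np.savez_compressed(buf, events=events)
buf.seek(0)
loaded = np.load(buf)['events']         # round-trips losslessly
```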
</sec>
</sec>
<sec>
<title>Data Analysis</title>
<sec>
<title>Event Rate</title>
<p>To examine the quality of the data, we calculate the event rates over the whole generated dataset and summarize them in <xref ref-type="table" rid="T2">Table 2</xref>. Only about one-twentieth of all pixels trigger events, so from this point of view the prediction obtained from the preparatory experiment is accurate. Since the events are generated from images processed by nearest-neighbor interpolation, the event-rate statistics are also calculated over this range. During training, we often place the positive and negative events in different channels and organize them in a unified 2 &#x000D7; 224 &#x000D7; 224 (<italic>C</italic> &#x000D7; <italic>W</italic> &#x000D7; <italic>H</italic>) frame format. Therefore, we re-calculate the event rate for the training process in <xref ref-type="table" rid="T2">Table 2</xref>; it is lower than that of the generating process and is more meaningful for training.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Event rate of ES-ImageNet.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Generating process</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Training set</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Testing set</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Mean</bold></th>
<th valign="top" align="center"><bold>&#x003C3;</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
<th valign="top" align="center"><bold>&#x003C3;</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Events</td>
<td valign="top" align="center">5.215%</td>
<td valign="top" align="center">3.776%</td>
<td valign="top" align="center">5.385%</td>
<td valign="top" align="center">3.837%</td>
</tr>
<tr>
<td valign="top" align="left">ON</td>
<td valign="top" align="center">5.211%</td>
<td valign="top" align="center">3.777%</td>
<td valign="top" align="center">5.385%</td>
<td valign="top" align="center">3.838%</td>
</tr>
<tr>
<td valign="top" align="left">OFF</td>
<td valign="top" align="center">5.22%</td>
<td valign="top" align="center">3.78%</td>
<td valign="top" align="center">5.38%</td>
<td valign="top" align="center">3.84%</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left"><bold>Event-frame format</bold></td>
<td valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Training set</bold></td>
<td valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Testing set</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center"><bold>Mean</bold></td>
<td valign="top" align="center">&#x003C3;</td>
<td valign="top" align="center"><bold>Mean</bold></td>
<td valign="top" align="center"><bold>&#x003C3;</bold></td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">Events</td>
<td valign="top" align="center">4.461%</td>
<td valign="top" align="center">3.560%</td>
<td valign="top" align="center">5.231%</td>
<td valign="top" align="center">3.770%</td>
</tr>
<tr>
<td valign="top" align="left">ON</td>
<td valign="top" align="center">4.458%</td>
<td valign="top" align="center">3.570%</td>
<td valign="top" align="center">5.229%</td>
<td valign="top" align="center">3.770%</td>
</tr>
<tr>
<td valign="top" align="left">OFF</td>
<td valign="top" align="center">4.460%</td>
<td valign="top" align="center">3.560%</td>
<td valign="top" align="center">5.230%</td>
<td valign="top" align="center">3.770%</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Furthermore, we calculate the distribution histogram of positive and negative events and show it in <xref ref-type="fig" rid="F5">Figure 5</xref>. The results show that the distributions of positive and negative events are very close, which may be because most of the entities in the original images appear as closed shapes.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>A detailed inspection of the event rate. Most samples have a 5 to 6% event rate, and the figure shows a clearly skewed distribution. A sample with a 5% event rate is still at a relatively sparse level compared with recordings from event cameras.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0005.tif"/>
</fig>
</sec>
<sec>
<title>Visualization</title>
<p>To show the quality of the data intuitively, we reconstruct the original pictures from the event streams. First, we accumulate the events into frames, obtaining eight (<italic>T</italic> &#x0003D; 8) event frames. Different from a traditional DVS-dataset, our dataset is generated along a fixed path with multiple directions, so to reconstruct the original pictures we need to accumulate the difference frames (the so-called Edge-Integral used in Le Moigne and Tilton, <xref ref-type="bibr" rid="B26">1995</xref>) along the reverse of the generating path. The results are shown in <xref ref-type="fig" rid="F6">Figure 6</xref>, and the pseudo-code of Edge-Integral can be found in <xref ref-type="table" rid="T6">Algorithm 2</xref>. A visualization demo can be found in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Materials</xref>.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>(A)</bold> The visualization of ES-ImageNet. We show a few samples reconstructed from event streams and the event frames at the last three time steps for each sample. These examples are from four different categories and can be clearly identified. <bold>(B)</bold> Quality comparison of the reconstruction results of direct summation and Edge-Integral. Here <italic>I</italic>(<italic>x, y, t</italic>) means the intensity in (x,y) in the event frame at <italic>time</italic> &#x0003D; <italic>t</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0006.tif"/>
</fig>
<table-wrap position="float" id="T6">
<label>Algorithm 2:</label>
<caption><p>Edge-Integral.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="left" valign="top"><bold>Require:</bold> &#x000A0;<italic>imageList</italic></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Ensure:</bold> &#x000A0;<italic>grayImage</italic></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>function</bold> G<sc>ENERATOR</sc>(Image)</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Xtrace = [1,0,2,1,0,2,1,1,2]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Ytrace = [0,2,1,0,1,2,0,1,1]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;imSize = size(imageList[0])</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;H = imSize[0], W= imSize[1]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;SUM = zeros(H&#x0002B;4,W&#x0002B;4)</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;T = length(imageList)</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>for</bold> <italic>t</italic> &#x0003D; 0 &#x02192; <italic>T</italic> <bold>do</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;dx = Xtrace[t]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;dy = Ytrace[t]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;frame=imageList[t]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;SUM[2-dx:2-dx&#x0002B;H,2-dy:2-dy&#x0002B;W] &#x0002B;= frame[0]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;SUM[2-dx:2-dx&#x0002B;H,2-dy:2-dy&#x0002B;W] -= frame[1]</td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end for</bold></td>
</tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>gray</italic>_<italic>image</italic> = SUM</td>
</tr>
<tr>
<td align="left" valign="top"><bold>end function</bold></td>
</tr> 
</tbody>
</table>
</table-wrap>
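<p>Algorithm 2 can be sketched in NumPy as follows (assuming each event frame is a (2, <italic>H</italic>, <italic>W</italic>) array with positive events in channel 0 and negative events in channel 1, one frame per time step from <italic>t</italic> &#x0003D; 1 to <italic>T</italic>):</p>

```python
import numpy as np

# Trajectory from Algorithm 1, reused for the reverse accumulation
X_TRACE = [1, 0, 2, 1, 0, 2, 1, 1, 2]
Y_TRACE = [0, 2, 1, 0, 1, 2, 0, 1, 1]

def edge_integral(frames):
    """Sketch of Algorithm 2: shift each ON-minus-OFF event frame back
    along the generating path and sum, reconstructing a gray image."""
    H, W = frames[0].shape[1:]
    total = np.zeros((H + 4, W + 4))
    for t, frame in enumerate(frames, start=1):
        dx, dy = X_TRACE[t], Y_TRACE[t]
        total[2 - dx:2 - dx + H, 2 - dy:2 - dy + W] += frame[0]
        total[2 - dx:2 - dx + H, 2 - dy:2 - dy + W] -= frame[1]
    return total
```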
<p>Analyzing the process of conversion, we identify three operations that potentially cause information loss: first, only the V channel of the HSV color space is used; second, the gradient information obtained is approximate; and third, the information is stored at a low bit-depth. According to the method in <xref ref-type="fig" rid="F6">Figure 6</xref>, we are able to reconstruct the gray images, which can also be obtained directly from the original color images by a weighted sum of (<italic>R, G, B</italic>).</p>
</sec>
<sec>
<title>Information</title>
<p>To further analyze the loss of information during the conversion, we again turn to the 2D-Entropy of gray images defined in Equation (5). We randomly collect 5,000 RGB images from <italic>ILSVRC2012</italic> (5 per class) and convert them into gray images with 256 gray levels, 17 gray levels, and 5 gray levels, respectively. We then find the converted counterparts of those RGB images in <italic>ES-ImageNet</italic> and reconstruct the corresponding gray images. Because the default is <italic>T</italic> &#x0003D; 8 in <italic>ES-ImageNet</italic>, and each pixel value can be 0, 1, or &#x02212;1, the reconstructed samples have a total of 17 gray levels (from 0 to 16).</p>
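<p>The gray-level compression used for the comparison groups can be sketched as a uniform quantizer (a hedged sketch; the paper does not specify the exact quantization scheme, so the function below is only one plausible choice):</p>

```python
import numpy as np

def quantize(gray, levels):
    """Compress an 8-bit gray image to `levels` gray levels by uniform
    bucketing, as one way to build the 256 / 17 / 5 gray-level groups."""
    step = 256 / levels
    return (np.floor(gray / step) * step).astype(np.uint8)
```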
<p>As an ordinal measure, the 2D-Entropy tells us the level of information content of <italic>ES-ImageNet</italic>: it is no less than that of the 5 gray-level compressed RGB images and almost the same as that of the 17 gray-level compressed RGB images, as <xref ref-type="fig" rid="F7">Figure 7</xref> shows. It should be noted that the reconstruction process itself causes information loss, so the original <italic>ES-ImageNet</italic> may carry more effective information than we estimate here. Considering that neuromorphic applications do not require many high-level features, we believe that this amount of information makes the dataset a good validation tool for SNNs.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>2D-Entropy histogram of the different compression levels of ILSVRC2012 sample groups and the reconstructed ES-ImageNet sample group. We compare the 2D-Entropy of four sample groups here. The reconstructed group indicates that the samples in ES-ImageNet potentially have effective information for object classification.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0007.tif"/>
</fig>
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Training Experiments</title>
<p>Because this dataset is very large, it is difficult to train a classical classifier (such as K-nearest neighbor) on it compared to other DVS-datasets (Li et al., <xref ref-type="bibr" rid="B28">2017</xref>). Statistical learning methods such as the support vector machine (SVM) do not perform well on large-scale datasets with many categories, and training a vanilla nonlinear SVM on a dataset with only 500 K samples can take days (Rahimi and Recht, <xref ref-type="bibr" rid="B39">2007</xref>). To examine the quality of this dataset, we turn to four different types of deep neural networks, two of which are ANNs while the others are SNNs. The structures of ResNet-18 and ResNet-34 (He et al., <xref ref-type="bibr" rid="B18">2016</xref>) are applied in the experiments, and the results provide a benchmark for this dataset. Note that all accuracies mentioned here are top-1 test accuracies.</p>
<p>For ANNs, the two-dimensional convolutional neural network (2D-CNN) (Krizhevsky et al., <xref ref-type="bibr" rid="B25">2012</xref>) has become a common tool for image classification. To train a 2D-CNN on the ES-dataset, a common approach is to accumulate the events into event frames along the time dimension and then reconstruct gray images (Wu et al., <xref ref-type="bibr" rid="B50">2020</xref>) for training. Here we use the Edge-Integral algorithm described in <xref ref-type="fig" rid="F6">Figure 6</xref> for reconstruction. The network structures we use are the same as those in the original paper (He et al., <xref ref-type="bibr" rid="B18">2016</xref>).</p>
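The event-frame accumulation step described above can be sketched as follows; the (x, y, t, polarity) record layout and the equal-width time binning are assumptions about the data format, not the exact ES-ImageNet loader:

```python
import numpy as np

def events_to_frames(events, T=8, height=224, width=224):
    """Accumulate an event stream into T two-channel event frames.

    `events` is an (N, 4) array of (x, y, t, polarity) with polarity in
    {-1, +1}; this record layout and the equal-width time binning are
    assumptions, not the released ES-ImageNet format.
    """
    events = np.asarray(events)
    frames = np.zeros((T, 2, height, width), dtype=np.float32)
    t = events[:, 2].astype(np.float64)
    span = max(float(np.ptp(t)), 1e-9)           # avoid division by zero
    t_idx = np.minimum(((t - t.min()) / span * T).astype(int), T - 1)
    p_idx = (events[:, 3] > 0).astype(int)       # channel 0: OFF, 1: ON
    for (x, y), ti, pi in zip(events[:, :2].astype(int), t_idx, p_idx):
        frames[ti, pi, y, x] += 1.0              # count events per pixel
    return frames
```

The resulting (T, 2, H, W) tensor can either be summed over T for a 2D-CNN or kept whole for the 3D-CNN and SNN pipelines described below.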
<p>Meanwhile, by regarding the time dimension as depth, this dataset can also be treated as a video dataset, so classic video classification methods such as the 3D-CNN (Ji et al., <xref ref-type="bibr" rid="B21">2013</xref>; Hara et al., <xref ref-type="bibr" rid="B17">2018</xref>) can also be applied. By introducing convolution along the depth dimension, the 3D-CNN gains the ability to process time-domain information. The structures we use are 3D-ResNet-18 and 3D-ResNet-34 with 3 &#x000D7; 3 &#x000D7; 3 convolution kernels, which ensures that the largest receptive field of the network covers the whole time (depth) dimension.</p>
<p>For SNNs, we choose an SNN based on leaky integrate-and-fire (LIF) neurons (Dayan and Abbott, <xref ref-type="bibr" rid="B11">2001</xref>) and an SNN based on leaky integrate-and-analog-fire (LIAF) neurons (Wu et al., <xref ref-type="bibr" rid="B50">2020</xref>). Rate coding (Adrian and Zotterman, <xref ref-type="bibr" rid="B1">1926</xref>) is used to decode the event information because the precise timing of spikes in this dataset carries less information than the spike counts. Both SNN models are trained with the STBP method (Wu et al., <xref ref-type="bibr" rid="B49">2019</xref>) and sync-batch normalization (Ioffe and Szegedy, <xref ref-type="bibr" rid="B20">2015</xref>), and network structures similar to ResNet-18 and ResNet-34 are built as shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. The basic LIF model (Dayan and Abbott, <xref ref-type="bibr" rid="B11">2001</xref>) is described in Equation (6),</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M7"><mml:mrow><mml:mi>U</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>U</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:msub><mml:mi>I</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mo>&#x000A0;</mml:mo><mml:mi>U</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>U</mml:mi><mml:mo>&#x02265;</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>U</italic> is the membrane potential and <italic>E</italic><sub><italic>L</italic></sub> is adjusted so that the resting potential matches that of the cell being modeled. <italic>I</italic><sub><italic>e</italic></sub> is the input current and <italic>R</italic><sub><italic>m</italic></sub> is the membrane resistance. <italic>U</italic><sub><italic>reset</italic></sub> is a parameter adjusted according to the experimental data, and &#x003C4;<sub><italic>m</italic></sub> is the membrane time constant. The LIF neuron fires a spike when <italic>U</italic> reaches <italic>U</italic><sub><italic>thresh</italic></sub>; the spike is binary ({0,1}) in LIF or an analog value in LIAF. Solving the model yields <italic>U</italic>(<italic>t</italic>), as shown in Equation (7).</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M8"><mml:mrow><mml:mi>U</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:msub><mml:mi>I</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>U</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:msub><mml:mi>I</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>t</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
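As a quick numerical sanity check (the parameter values below are arbitrary illustrations, not fitted to any cell), a forward-Euler integration of Equation (6) converges to the closed-form solution of Equation (7) when no spike occurs:

```python
import math

def u_closed_form(t, u0, e_l, rm_ie, tau_m):
    """Closed-form subthreshold solution U(t), Equation (7)."""
    return e_l + rm_ie + (u0 - e_l - rm_ie) * math.exp(-t / tau_m)

def u_euler(t, u0, e_l, rm_ie, tau_m, dt=1e-4):
    """Forward-Euler integration of Equation (6), ignoring the reset."""
    u = u0
    for _ in range(int(round(t / dt))):
        u += dt / tau_m * (-u + e_l + rm_ie)
    return u
```

With a step size much smaller than tau_m, the two agree closely, confirming that Equation (7) is the solution of Equation (6) between spikes.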
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>SNN structure used in the experiments. On the right, we show the internal structure of LIAF neurons. By changing synaptic connections, we can obtain a variety of layer structures, where CN denotes the convolutional layer and BN denotes the 3D-BatchNorm layer. Using these layers, we can build a scalable LIAF residual network structure.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0008.tif"/>
</fig>
<p>This equation does not take the reset action into consideration. For large-scale computer simulation, this model needs to be simplified into its discrete LIAF/LIF form. Using <italic>l</italic> to denote the layer index and <italic>t</italic> the time step, the LIAF model can be described by the following equations</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M9"><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mn>0</mml:mn><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>o</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:math></disp-formula>
<disp-formula id="E9"><label>(9)</label><mml:math id="M10"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>0</mml:mn><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E10"><label>(10)</label><mml:math id="M11"><mml:mrow><mml:msup><mml:mi>o</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>0</mml:mn><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E11"><label>(11)</label><mml:math id="M12"><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>0</mml:mn><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mi>d</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>h</italic> is the weighted-sum function of the input vector <italic>o</italic><sup><italic>t,l</italic>&#x02212;1</sup>, which depends on the specific synaptic connection mode and is equivalent to <italic>R</italic><sub><italic>m</italic></sub><italic>I</italic><sub><italic>e</italic></sub>. <italic>s</italic> is the spike used to reset the membrane potential to <italic>U</italic><sub><italic>reset</italic></sub>, and <italic>o</italic><sup><italic>t, l</italic></sup> is the output of the neurons to the next layer. We often use <italic>d</italic>(<italic>x</italic>) &#x0003D; &#x003C4;(1 &#x02212; <italic>x</italic>) for simplification in this model, where <italic>d</italic>(<italic>x</italic>) describes the leaky process and &#x003C4; is a constant related to &#x003C4;<sub><italic>m</italic></sub>. <italic>f</italic> is usually a threshold-related spike function, while <italic>g</italic> is a commonly used continuous activation function. If <italic>g</italic> is chosen to be the same function as <italic>f</italic>, the above model reduces to the LIF model as</p>
<disp-formula id="E12"><label>(12)</label><mml:math id="M13"><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mn>0</mml:mn><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>o</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:math></disp-formula>
<disp-formula id="E13"><label>(13)</label><mml:math id="M14"><mml:mrow><mml:msup><mml:mi>o</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>0</mml:mn><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E14"><label>(14)</label><mml:math id="M15"><mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>0</mml:mn><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mi>&#x003C4;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
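A minimal sketch of this discrete LIF model for a fully-connected layer follows; the thresh and tau values echo the hyper-parameters of Table 3, f is a plain Heaviside threshold, and the surrogate-gradient machinery required for actual STBP training is deliberately omitted:

```python
import numpy as np

def lif_layer(inputs, weight, bias, thresh=0.5, tau=0.5):
    """Discrete dense LIF layer following Equations (12)-(14).

    inputs: (T, n_in) binary spike trains; weight: (n_out, n_in).
    This is an inference-only sketch: f is a Heaviside threshold and no
    surrogate gradient is defined.
    """
    T, n_out = inputs.shape[0], weight.shape[0]
    u = np.zeros(n_out)                          # membrane potential
    outputs = np.zeros((T, n_out))
    for t in range(T):
        u0 = u + weight @ inputs[t] + bias       # Eq. (12): integrate
        s = (u0 >= thresh).astype(float)         # Eq. (13): spike if over thresh
        outputs[t] = s
        u = u0 * tau * (1.0 - s)                 # Eq. (14): leak, reset on spike
    return outputs
```

A sub-threshold input accumulates over several steps before the neuron finally fires, which is the integrate-and-fire behavior the equations describe.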
<p>To build a Spiking-ResNet model, we propose the spiking convolutional layer and the spiking-ResBlock structure. Only <italic>h</italic> in Equations (8) and (12) needs to be changed to obtain different types of SNN layers. For the fully-connected (dense) layer, we choose <inline-formula><mml:math id="M16"><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002A;</mml:mo><mml:msup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, where <italic>W</italic><sub><italic>l</italic></sub> is the weight matrix of layer <italic>l</italic>. In the convolutional layer, <inline-formula><mml:math id="M17"><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02297;</mml:mo><mml:msup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, where &#x02297; denotes the convolution operation.</p>
<p>The residual block structure we use in the SNN differs slightly from the standard one. For better performance in deep SNN training, we add a 3D-BatchNorm layer on the membrane potential, treating the temporal dimension of the SNN as the depth of general 3D data. In <xref ref-type="fig" rid="F8">Figure 8</xref>, CN denotes the convolutional layer and BN denotes the 3D-BatchNorm layer; the <italic>mem_update</italic> layers are described by Equations (9)&#x02013;(11) in LIAF-ResNet and Equations (13)&#x02013;(14) in LIF-ResNet. To keep the coding consistent, we add a <italic>mem_update</italic> layer before each output of a residual block.</p>
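A hedged PyTorch sketch of such a block follows; the (1, 3, 3) kernel (a 2D convolution applied at every time step), the stateless threshold stand-in for mem_update, and the exact layer ordering are our assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class SpikingBasicBlock(nn.Module):
    """Sketch of a LIF residual block in the spirit of Figure 8.

    Data are kept as (batch, channel, T, height, width), so BatchNorm3d
    treats the time dimension T as depth. `mem_update` below is a
    stateless threshold stand-in for Equations (13)-(14); a real
    implementation carries membrane state across time steps.
    """
    def __init__(self, channels, thresh=0.5):
        super().__init__()
        # (1, 3, 3) kernel: spatial 3x3 convolution at each time step
        self.cn1 = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                             padding=(0, 1, 1), bias=False)
        self.bn1 = nn.BatchNorm3d(channels)   # BN on the membrane potential
        self.cn2 = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                             padding=(0, 1, 1), bias=False)
        self.bn2 = nn.BatchNorm3d(channels)
        self.thresh = thresh

    def mem_update(self, u):
        # stand-in for the spike function f: fire where u >= thresh
        return (u >= self.thresh).float()

    def forward(self, x):
        out = self.mem_update(self.bn1(self.cn1(x)))
        out = self.bn2(self.cn2(out)) + x     # residual connection
        return self.mem_update(out)           # mem_update before block output
```

The final mem_update keeps the block's output binary, matching the coding-consistency requirement described above.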
<p>The best test results are obtained with the same set of hyper-parameters, shown in <xref ref-type="table" rid="T3">Table 3</xref>, and different random seeds; the results are listed in <xref ref-type="table" rid="T4">Table 4</xref>. During training, the initial learning rate is 0.03, the optimizer is ADAM (Kingma and Ba, <xref ref-type="bibr" rid="B23">2014</xref>), and the learning rate follows the StepLR schedule. NVIDIA RTX 2080Ti GPUs and the PyTorch (Paszke et al., <xref ref-type="bibr" rid="B36">2019</xref>) deep-learning framework are used for all of these experiments.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Hyper-parameter setting.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="left"><bold>Names</bold></th>
<th valign="top" align="left"><bold>Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Network</td>
<td valign="top" align="left">T</td>
<td valign="top" align="left">8</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Thresh</td>
<td valign="top" align="left">0.5</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Decay</td>
<td valign="top" align="left">0.5</td>
</tr>
<tr>
<td valign="top" align="left">Optimizer (ADAM)</td>
<td valign="top" align="left">Lr</td>
<td valign="top" align="left">3e-2</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003B2;<sub>1</sub>, &#x003B2;<sub>2</sub>, &#x003BB;</td>
<td valign="top" align="left">0.9,0.999,1e-8</td>
</tr>
<tr>
<td valign="top" align="left">Activation</td>
<td valign="top" align="left">Lens</td>
<td valign="top" align="left">0.5</td>
</tr>
<tr>
<td valign="top" align="left">StepLR</td>
<td valign="top" align="left">Nepoch</td>
<td valign="top" align="left">10</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003B1;</td>
<td valign="top" align="left">0.2</td>
</tr>
<tr>
<td valign="top" align="left">Others</td>
<td valign="top" align="left">BatchSize</td>
<td valign="top" align="left">224<xref ref-type="table-fn" rid="TN4"><sup>a</sup></xref>/160<xref ref-type="table-fn" rid="TN5"><sup>b</sup></xref></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Max Epoch</td>
<td valign="top" align="left">50</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN4">
<label>a</label>
<p><italic>Used for the training of ResNet-18</italic>.</p></fn>
<fn id="TN5">
<label>b</label>
<p><italic>Used for the training of ResNet-34</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Test results &#x00026; benchmarks.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Structure</bold></th>
<th valign="top" align="center"><bold>Type</bold></th>
<th valign="top" align="center"><bold>Test Acc/%</bold></th>
<th valign="top" align="center"><bold>&#x00023; of Para</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">ResNet18</td>
<td valign="top" align="center">2D-CNN</td>
<td valign="top" align="center">41.030</td>
<td valign="top" align="center">11.68M</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">3D-CNN</td>
<td valign="top" align="center"><bold>43.140</bold></td>
<td valign="top" align="center">28.56M</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">LIF (baseline)</td>
<td valign="top" align="center">39.894</td>
<td valign="top" align="center">11.69M</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">LIAF</td>
<td valign="top" align="center">42.544</td>
<td valign="top" align="center">11.69M</td>
</tr>
<tr>
<td valign="top" align="left">ResNet34</td>
<td valign="top" align="center">2D-CNN</td>
<td valign="top" align="center">42.736</td>
<td valign="top" align="center">21.79M</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">3D-CNN</td>
<td valign="top" align="center">45.380</td>
<td valign="top" align="center">48.22M</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">LIF (baseline)</td>
<td valign="top" align="center">43.424</td>
<td valign="top" align="center">21.80M</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">LIAF</td>
<td valign="top" align="center"><bold>47.466</bold></td>
<td valign="top" align="center">21.80M</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Test Results</title>
<p>As <xref ref-type="table" rid="T4">Table 4</xref> shows, the highest test accuracy with the ResNet-18 structure, 43.140%, is obtained by the 3D-CNN, while the best result on ResNet-34, 47.466%, is obtained by the LIAF-SNN. To show the relationship between parameter count and accuracy more intuitively, we provide <xref ref-type="fig" rid="F9">Figure 9</xref>, using the area of each disk to indicate the number of parameters and highlighting the efficiency of SNNs. The experimental results of the LIF-SNN, the traditional SNN model, provide a baseline for this dataset, and we expect more advanced and large-scale SNNs or other neuromorphic algorithms to be tested on it.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Testing accuracy with the structure of ResNet-18 and ResNet-34. The radius of the data points represents the relative size of the parameters of the networks.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0009.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<sec>
<title>Performance</title>
<p>Observing the results, we find that the SNN models obtain relatively high classification accuracy with fewer parameters. The sparsity of the data in <italic>ES-ImageNet</italic> may explain this phenomenon: SNNs can process spatiotemporal information efficiently, while the large number of parameters in an ANN-based video classification algorithm (like the 3D-CNN tested in this article) may cause over-fitting on this dataset.</p>
<p>We identify two other causes of accuracy loss. First, wrongly labeled samples can seriously interfere with training. This problem is well documented for <italic>ImageNet</italic> (Northcutt et al., <xref ref-type="bibr" rid="B33">2019</xref>), and we also observed it during a manual inspection, but there is currently no good method for efficient and accurate screening. The second cause is information loss, for which we propose several possible mitigations. One is to filter out more of the samples with the highest and lowest information entropy (representing the largest noise rate and the smallest amount of information, respectively) from the training set. Another is to increase the number of time steps of the transformation, at the cost of additional storage.</p>
<p>It should be noted that we do not use any data augmentation method in the experiments. In fact, placing event frames in random order, using dynamic time lengths, or randomly clipping each frame are all acceptable on this dataset and may bring a significant performance boost. Research on such data augmentation and pre-training techniques is under way, and we hope more related research will use this dataset.</p>
</sec>
<sec>
<title>Computation Cost</title>
<p>To make a more objective comparison, we also count the theoretical minimum number of FP32 operations required by the feed-forward process of these networks and use it to estimate their power consumption on a field-programmable gate array (FPGA).</p>
<p>Here we compare the number of necessary operations required by the feed-forward process of the eight networks used in the main article. Note that we count floating-point multiplications and floating-point additions separately (not MACs), and the operations of the normalization layers are not included.</p>
<p><bold>2D-CNNs</bold> use the ResNet structures with 18/34 layers, and most of the operations come from the convolutional layers. In this work, we compress and reconstruct the 4-dimensional event data in <italic>ES-ImageNet</italic> into 2-dimensional gray images and then feed them into the 2D-CNNs; the process is then the same as training a ResNet on <italic>ImageNet</italic>. In the network, the dimensions of the features change in the following order: [1(<italic>channel</italic>) &#x000D7; 224(<italic>width</italic>) &#x000D7; 224(<italic>height</italic>)] &#x02192; (<italic>maxpooling</italic>)[64 &#x000D7; 110 &#x000D7; 110] &#x02192; [64 &#x000D7; 55 &#x000D7; 55] &#x02192; [128 &#x000D7; 28 &#x000D7; 28] &#x02192; [256 &#x000D7; 14 &#x000D7; 14] &#x02192; [512 &#x000D7; 7 &#x000D7; 7] &#x02192; [512] &#x02192; [1000].</p>
<p><bold>3D-CNNs</bold> consider the depth dimension (Ji et al., <xref ref-type="bibr" rid="B21">2013</xref>), and treat this dataset as a video dataset (Hara et al., <xref ref-type="bibr" rid="B17">2018</xref>), so the feature is kept in four dimensions in ResBlocks. In the network, the dimensions of the features change in the following order: [2(<italic>channel</italic>) &#x000D7; 8(<italic>depth</italic>) &#x000D7; 224(<italic>width</italic>) &#x000D7; 224(<italic>height</italic>)] &#x02192; (<italic>maxpooling</italic>)[64 &#x000D7; 8 &#x000D7; 110 &#x000D7; 110] &#x02192; [64 &#x000D7; 4 &#x000D7; 55 &#x000D7; 55] &#x02192; [128 &#x000D7; 2 &#x000D7; 28 &#x000D7; 28] &#x02192; [256 &#x000D7; 1 &#x000D7; 14 &#x000D7; 14] &#x02192; [512 &#x000D7; 1 &#x000D7; 7 &#x000D7; 7] &#x02192; [512] &#x02192; [1000].</p>
<p>The training procedure of <bold>LIF-SNNs</bold> is like running eight 2D-CNNs, together with processing the membrane potential carried over from the previous time step and the spike inputs of every layer, and then averaging the spike trains along the time dimension in the final linear layer to decode the spiking rate. These networks keep the data in four dimensions with <italic>T</italic> &#x0003D; 8 unchanged until the decoding layer, so the dimensions of the features change in the following order: [2(<italic>channel</italic>) &#x000D7; 8(<italic>T</italic>) &#x000D7; 224(<italic>width</italic>) &#x000D7; 224(<italic>height</italic>)] &#x02192; (<italic>maxpooling</italic>)[64 &#x000D7; 8 &#x000D7; 110 &#x000D7; 110] &#x02192; [64 &#x000D7; 8 &#x000D7; 55 &#x000D7; 55] &#x02192; [128 &#x000D7; 8 &#x000D7; 28 &#x000D7; 28] &#x02192; [256 &#x000D7; 8 &#x000D7; 14 &#x000D7; 14] &#x02192; [512 &#x000D7; 8 &#x000D7; 7 &#x000D7; 7] &#x02192; [8(<italic>depth</italic>) &#x000D7; 512](<italic>ratedecoded</italic>) &#x02192; [512] &#x02192; [1000].</p>
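The rate-decoding readout at the end of this pipeline can be sketched as follows (the average-then-argmax form is our reading of the description; the released code may decode differently):

```python
import numpy as np

def rate_decode(spike_trains):
    """Rate decoding: average spikes over the time dimension and pick the
    class with the highest firing rate."""
    rates = np.asarray(spike_trains, dtype=np.float64).mean(axis=0)
    return int(rates.argmax())
```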
<p>The training procedure of <bold>LIAF-SNNs</bold> is almost the same as that of <bold>LIF-SNNs</bold>; the only difference is that they convey information between layers with analog values instead of binary spikes. The dimensions of the features are the same as those in <bold>LIF-SNNs</bold>.</p>
<p>It is worth noting that since the inputs of <bold>LIF-SNNs</bold> are only 0 and 1, their convolutions require no floating-point multiplications, only additions over a limited set of input combinations. Because a large number of zeros appear in the input of each layer of <bold>LIF-SNNs</bold>, exploiting this input sparsity has become standard practice in SNN accelerators. Therefore, to make a fair comparison, we multiply the number of input-dependent operations of the SNN by the average firing rate obtained in the experiments, which gives the proportion of floating-point additions (FP32 &#x0002B;) actually performed and thus an estimate of the actual amount of computation of SNNs relative to CNNs. We observe that the firing rate shows a downward trend as training epochs increase, which indicates a decrease of meaningless spikes.</p>
<p>In this experiment, the initial firing rate of the SNNs is no larger than 30%, and with more training epochs it gradually drops below 10%. To evaluate the SNNs in the worst case, we take 30% as the sparsity rate. Under these conditions we obtain <xref ref-type="fig" rid="F10">Figure 10</xref>.</p>
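The counting rule can be sketched for a single convolutional layer as follows; this is a simplification in which padding effects and the accumulation cost of sparse inputs are idealized:

```python
def conv2d_ops(c_in, c_out, k, h_out, w_out):
    """FP32 multiplications and additions of one k x k conv layer (no bias)."""
    muls = c_out * h_out * w_out * c_in * k * k
    adds = c_out * h_out * w_out * (c_in * k * k - 1)
    return muls, adds

def lif_conv_ops(c_in, c_out, k, h_out, w_out, T=8, fire_rate=0.3):
    """Worst-case ops of the same layer in a LIF-SNN over T time steps:
    binary inputs remove the multiplications, and only a `fire_rate`
    fraction of the additions is actually performed."""
    _, adds = conv2d_ops(c_in, c_out, k, h_out, w_out)
    return 0, int(T * adds * fire_rate)
```

Even at the worst-case 30% firing rate, the LIF layer performs no FP32 multiplications at all, which is what drives the comparison in Figure 10.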
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>The comparison of FP32 addition (FP32 &#x0002B;) and FP32 multiplication (FP32 x) operations in the feed-forward process between the models used in the experiments. Note that the number of FP32&#x0002B; operations of LIF has been multiplied by a sparsity factor (30%) for a fair comparison.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0010.tif"/>
</fig>
<p>One of the advantages of an SNN over an ANN is its power consumption (especially on SNN accelerators). On FPGAs, SNNs can have a significant power advantage if the training algorithm is well designed. The data in Wu et al. (<xref ref-type="bibr" rid="B48">2019</xref>) on the power consumption of basic operations provide an estimate for the networks in our experiments: each FP32 &#x0002B; operation requires 1.273 pJ of energy, and each FP32 multiplication requires 3.483 pJ. The resulting comparison is shown in <xref ref-type="fig" rid="F11">Figure 11A</xref>. In addition, we give the power comparison commonly used for SNNs (Deng et al., <xref ref-type="bibr" rid="B14">2020</xref>) in <xref ref-type="fig" rid="F11">Figure 11B</xref>, where we calculate the energy for each feed-forward process (hence "power"). Note that SNNs need <italic>T</italic> frames to give one prediction, while both the 3D-CNN and the 2D-CNN give one prediction from a single frame.</p>
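With these per-operation figures, the per-network estimate reduces to a weighted sum of the operation counts; this sketch ignores memory-access and control energy, which the quoted figures do not cover:

```python
# Per-operation FP32 energies quoted from Wu et al. (2019), in picojoules.
E_ADD_PJ = 1.273
E_MUL_PJ = 3.483

def energy_pj(n_add, n_mul):
    """Estimated feed-forward energy (pJ) from operation counts alone."""
    return n_add * E_ADD_PJ + n_mul * E_MUL_PJ
```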
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Energy and power consumption in the experiments. <bold>(A)</bold> The comparison on energy cost. The LIF-SNNs have shown a significant advantage in energy consumption, whose energy cost is half of that of the CNN. <bold>(B)</bold> The power of each model in the <italic>Frame</italic> time unit is also the energy required for one feed-forward process.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-726582-g0011.tif"/>
</fig>
<p>These results also support the energy advantages of SNNs in this task. For these reasons, we consider <italic>ES-ImageNet</italic> an SNN-friendly dataset. We still hope that more ANN algorithms will be proposed to solve these challenges elegantly and efficiently, which may also guide the development of SNNs.</p>
</sec>
<sec>
<title>Limitations</title>
<p>For the conversion algorithm, we generate temporal features by applying artificial motion to static images, as most conversion methods do, which still differs from real scenes. This is a limitation shared by all dynamic datasets derived from static data. In addition, to compress the volume of the dataset and extract more information, we reduce the randomness of the data during generation, losing some characteristics of real DVS camera recordings but being more friendly to SNNs.</p>
<p>In the analysis part, due to the limitation of the available mathematical tools, the 2D-Entropy we adopt reflects only the amount of information, not the amount of effective information; it can therefore serve only as a reference rather than a standard. In addition, the reconstruction and compression methods used in the measurement influence the information, though we have compared them as fairly as possible.</p>
<p>In the training experiments, due to hardware and algorithmic limitations, we can only provide benchmarks of SNNs and ANNs based on ResNet-18 and ResNet-34. We hope future research will train larger and better models on this dataset.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>Conclusion</title>
<p>In this paper, we provide a new neuromorphic vision dataset named <italic>ES-ImageNet</italic> for event-based image classification and the validation of SNN algorithms. We propose a method called ODG that transforms the famous image classification dataset <italic>ILSVRC2012</italic> into its event-based version, and a method called Edge-Integral that reconstructs the corresponding gray images from these event streams. The ODG method applies carefully designed image movements, which produce value changes in the HSV color space and provide spatial gradient information, so the algorithm can efficiently extract spatial features to generate event streams.</p>
<p>To examine the properties of the dataset, we use the Edge-Integral method to exhibit some reconstructed samples and calculate the 2D-Entropy distribution of the dataset. Furthermore, a comparative experiment is conducted with 2D-CNN, 3D-CNN, LIF-SNN, and LIAF-SNN; these results provide a benchmark for later research and confirm that this dataset is a suitable validation tool for SNNs.</p>
<p>This dataset addresses the lack of a suitable large-scale classification dataset in the SNN research field. By filling this gap, it allows large-scale SNNs to be verified, the corresponding algorithms to be better optimized, and more SNN structures and training algorithms to be explored, thereby promoting practical applications of SNNs.</p>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>The ES-ImageNet dataset (100 GB) for this study can be downloaded at: <ext-link ext-link-type="uri" xlink:href="https://cloud.tsinghua.edu.cn/d/94873ab4ec2a4eb497b3">https://cloud.tsinghua.edu.cn/d/94873ab4ec2a4eb497b3</ext-link>. The converted event-frame version (40 GB) can be found at: <ext-link ext-link-type="uri" xlink:href="https://cloud.tsinghua.edu.cn/d/ee07f304fb3a498d9f0f/">https://cloud.tsinghua.edu.cn/d/ee07f304fb3a498d9f0f/</ext-link>. The code can be found at: <ext-link ext-link-type="uri" xlink:href="https://github.com/lyh983012/ES-imagenet-master">https://github.com/lyh983012/ES-imagenet-master</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>YL and GL conceptualized the work. YL designed the research. YL, WD, and SQ designed and conducted the experiment. YL analyzed data. LD and GL supervised the work. All authors wrote the manuscript.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This work was partially supported by the National Key R&#x00026;D Program of China (2020AAA0105200, 2018AAA0102604, and 2018YFE0200200), the National Science Foundation of China (61876215), the Beijing Academy of Artificial Intelligence (BAAI), a grant from the Institute for Guo Qiang, Tsinghua University, the Beijing Science and Technology Program (Z191100007519009), the Open Project of Zhejiang Laboratory, and the Science and Technology Major Project of Guangzhou (202007030006).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec sec-type="supplementary-material" id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fnins.2021.726582/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fnins.2021.726582/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Video_1.MP4" id="SM1" mimetype="video/mp4" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adrian</surname> <given-names>E. D.</given-names></name> <name><surname>Zotterman</surname> <given-names>Y.</given-names></name></person-group> (<year>1926</year>). <article-title>The impulses produced by sensory nerve-endings: part II. the response of a single end-organ</article-title>. <source>J. Physiol.</source> <volume>61</volume>:<fpage>151</fpage>. <pub-id pub-id-type="doi">10.1113/jphysiol.1926.sp002281</pub-id><pub-id pub-id-type="pmid">16993780</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Akopyan</surname> <given-names>F.</given-names></name> <name><surname>Sawada</surname> <given-names>J.</given-names></name> <name><surname>Cassidy</surname> <given-names>A.</given-names></name> <name><surname>Alvarez-Icaza</surname> <given-names>R.</given-names></name> <name><surname>Arthur</surname> <given-names>J.</given-names></name> <name><surname>Merolla</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Truenorth: design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip</article-title>. <source>IEEE Trans. Comput. Aid. Design Integr. Circuits Syst.</source> <volume>34</volume>, <fpage>1537</fpage>&#x02013;<lpage>1557</lpage>. <pub-id pub-id-type="doi">10.1109/TCAD.2015.2474396</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Amir</surname> <given-names>A.</given-names></name> <name><surname>Taba</surname> <given-names>B.</given-names></name> <name><surname>Berg</surname> <given-names>D.</given-names></name> <name><surname>Melano</surname> <given-names>T.</given-names></name> <name><surname>McKinstry</surname> <given-names>J.</given-names></name> <name><surname>Di Nolfo</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>A low power, fully event-based gesture recognition system</article-title>, in <source>CVPR</source> (<publisher-loc>Honolulu, HI</publisher-loc>), <fpage>7243</fpage>&#x02013;<lpage>7252</lpage>.</citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bardow</surname> <given-names>P.</given-names></name> <name><surname>Davison</surname> <given-names>A. J.</given-names></name> <name><surname>Leutenegger</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Simultaneous optical flow and intensity estimation from an event camera</article-title>, in <source>CVPR</source> (<publisher-loc>Las Vegas, USA</publisher-loc>), <fpage>884</fpage>&#x02013;<lpage>892</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benosman</surname> <given-names>R.</given-names></name> <name><surname>Clercq</surname> <given-names>C.</given-names></name> <name><surname>Lagorce</surname> <given-names>X.</given-names></name> <name><surname>Ieng</surname> <given-names>S.-H.</given-names></name> <name><surname>Bartolozzi</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>Event-based visual flow</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst.</source> <volume>25</volume>, <fpage>407</fpage>&#x02013;<lpage>417</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2013.2273537</pub-id><pub-id pub-id-type="pmid">24807038</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bi</surname> <given-names>Y.</given-names></name> <name><surname>Andreopoulos</surname> <given-names>Y.</given-names></name></person-group> (<year>2017</year>). <article-title>Pix2nvs: parameterized conversion of pixel-domain video frames to neuromorphic vision streams</article-title>, in <source>ICIP</source> (<publisher-loc>Beijing, China</publisher-loc>), <fpage>1990</fpage>&#x02013;<lpage>1994</lpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brandli</surname> <given-names>C.</given-names></name> <name><surname>Berner</surname> <given-names>R.</given-names></name> <name><surname>Yang</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>S.-C.</given-names></name> <name><surname>Delbruck</surname> <given-names>T.</given-names></name></person-group> (<year>2014</year>). <article-title>A 240 &#x000D7; 180 130 db 3 &#x003BC;s latency global shutter spatiotemporal vision sensor</article-title>. <source>IEEE J. Solid State Circuits</source> <volume>49</volume>, <fpage>2333</fpage>&#x02013;<lpage>2341</lpage>. <pub-id pub-id-type="doi">10.1109/JSSC.2014.2342715</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cannici</surname> <given-names>M.</given-names></name> <name><surname>Plizzari</surname> <given-names>C.</given-names></name> <name><surname>Planamente</surname> <given-names>M.</given-names></name> <name><surname>Ciccone</surname> <given-names>M.</given-names></name> <name><surname>Bottino</surname> <given-names>A.</given-names></name> <name><surname>Caputo</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>N-rod: a neuromorphic dataset for synthetic-to-real domain adaptation</article-title>, in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>, <fpage>1342</fpage>&#x02013;<lpage>1347</lpage>.</citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carneiro</surname> <given-names>J.</given-names></name> <name><surname>Ieng</surname> <given-names>S.-H.</given-names></name> <name><surname>Posch</surname> <given-names>C.</given-names></name> <name><surname>Benosman</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>Event-based 3d reconstruction from neuromorphic retinas</article-title>. <source>Neural Netw.</source> <volume>45</volume>, <fpage>27</fpage>&#x02013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2013.03.006</pub-id><pub-id pub-id-type="pmid">23545156</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davies</surname> <given-names>M.</given-names></name> <name><surname>Srinivasa</surname> <given-names>N.</given-names></name> <name><surname>Lin</surname> <given-names>T.-H.</given-names></name> <name><surname>Chinya</surname> <given-names>G.</given-names></name> <name><surname>Cao</surname> <given-names>Y.</given-names></name> <name><surname>Choday</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Loihi: a neuromorphic many-core processor with on-chip learning</article-title>. <source>IEEE Micro</source> <volume>38</volume>, <fpage>82</fpage>&#x02013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1109/MM.2018.112130359</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dayan</surname> <given-names>P.</given-names></name> <name><surname>Abbott</surname> <given-names>L. F.</given-names></name></person-group> (<year>2001</year>). <source>Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems</source>. Computational Neuroscience Series. <publisher-loc>London, England</publisher-loc>: <publisher-name>The MIT Press</publisher-name>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Tournemire</surname> <given-names>P.</given-names></name> <name><surname>Nitti</surname> <given-names>D.</given-names></name> <name><surname>Perot</surname> <given-names>E.</given-names></name> <name><surname>Migliore</surname> <given-names>D.</given-names></name> <name><surname>Sironi</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>A large scale event-based detection dataset for automotive</article-title>. <source>arXiv preprint arXiv:2001.08499</source>.</citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>J.</given-names></name> <name><surname>Dong</surname> <given-names>W.</given-names></name> <name><surname>Socher</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>L.-J.</given-names></name> <name><surname>Li</surname> <given-names>K.</given-names></name> <name><surname>Fei-Fei</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>Imagenet: a large-scale hierarchical image database</article-title>, in <source>CVPR</source> (<publisher-name>IEEE</publisher-name>) (<publisher-loc>Miami, USA</publisher-loc>), <fpage>248</fpage>&#x02013;<lpage>255</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name> <name><surname>Liang</surname> <given-names>L.</given-names></name> <name><surname>Ding</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Rethinking the performance comparison between snns and anns</article-title>. <source>Neural Netw.</source> <volume>121</volume>, <fpage>294</fpage>&#x02013;<lpage>307</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2019.09.005</pub-id><pub-id pub-id-type="pmid">31586857</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ewert</surname> <given-names>J.-P.</given-names></name></person-group> (<year>1974</year>). <article-title>The neural basis of visually guided behavior</article-title>. <source>Sci. Am.</source> <volume>230</volume>, <fpage>34</fpage>&#x02013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1038/scientificamerican0374-34</pub-id><pub-id pub-id-type="pmid">4204830</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gehrig</surname> <given-names>D.</given-names></name> <name><surname>Gehrig</surname> <given-names>M.</given-names></name> <name><surname>Hidalgo-Carri&#x000F3;</surname> <given-names>J.</given-names></name> <name><surname>Scaramuzza</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Video to events: recycling video datasets for event cameras</article-title>, in <source>CVPR</source> (<publisher-loc>Seattle, USA</publisher-loc>), <fpage>3586</fpage>&#x02013;<lpage>3595</lpage>.</citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hara</surname> <given-names>K.</given-names></name> <name><surname>Kataoka</surname> <given-names>H.</given-names></name> <name><surname>Satoh</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet?</article-title>, in <source>Proceedings of the IEEE conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, USA</publisher-loc>), <fpage>6546</fpage>&#x02013;<lpage>6555</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep residual learning for image recognition</article-title>, in <source>CVPR</source> (<publisher-loc>Las Vegas, USA</publisher-loc>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>.</citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <name><surname>Pfeiffer</surname> <given-names>M.</given-names></name> <name><surname>Delbruck</surname> <given-names>T.</given-names></name></person-group> (<year>2016</year>). <article-title>Dvs benchmark datasets for object tracking, action recognition, and object recognition</article-title>. <source>Front. Neurosci.</source> <volume>10</volume>:<fpage>405</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2016.00405</pub-id><pub-id pub-id-type="pmid">27630540</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ioffe</surname> <given-names>S.</given-names></name> <name><surname>Szegedy</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Batch normalization: accelerating deep network training by reducing internal covariate shift</article-title>. <source>arXiv preprint</source> arXiv:1502.03167.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ji</surname> <given-names>S.</given-names></name> <name><surname>Xu</surname> <given-names>W.</given-names></name> <name><surname>Yang</surname> <given-names>M.</given-names></name> <name><surname>Yu</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <article-title>3d convolutional neural networks for human action recognition</article-title>. <source>PAMI</source> <volume>35</volume>, <fpage>221</fpage>&#x02013;<lpage>231</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2012.59</pub-id><pub-id pub-id-type="pmid">22392705</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>H.</given-names></name> <name><surname>Leutenegger</surname> <given-names>S.</given-names></name> <name><surname>Davison</surname> <given-names>A. J.</given-names></name></person-group> (<year>2016</year>). <article-title>Real-time 3d reconstruction and 6-dof tracking with an event camera</article-title>, in <source>ECCV</source> (<publisher-loc>Amsterdam, Netherlands</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>349</fpage>&#x02013;<lpage>364</lpage>.</citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization</article-title>. <source>arXiv preprint</source> arXiv:1412.6980.</citation>
</ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2009</year>). <source>Learning multiple layers of features from tiny images</source> (Master&#x00027;s thesis). <publisher-name>Department of Computer Science, University of Toronto</publisher-name>, <publisher-loc>Toronto, ON, Canada</publisher-loc>.</citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2012</year>). <article-title>Imagenet classification with deep convolutional neural networks</article-title>. <source>Adv. Neural Inform. Process. Syst.</source> <volume>25</volume>, <fpage>1097</fpage>&#x02013;<lpage>1105</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le Moigne</surname> <given-names>J.</given-names></name> <name><surname>Tilton</surname> <given-names>J. C.</given-names></name></person-group> (<year>1995</year>). <article-title>Refining image segmentation by integration of edge and region data</article-title>. <source>IEEE Trans. Geosci. Remote Sens.</source> <volume>33</volume>, <fpage>605</fpage>&#x02013;<lpage>615</lpage>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Bottou</surname> <given-names>L.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Haffner</surname> <given-names>P.</given-names></name></person-group> (<year>1998</year>). <article-title>Gradient-based learning applied to document recognition</article-title>. <source>Proc. IEEE</source> <volume>86</volume>, <fpage>2278</fpage>&#x02013;<lpage>2324</lpage>.</citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <name><surname>Ji</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Shi</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>Cifar10-dvs: an event-stream dataset for object classification</article-title>. <source>Front. Neurosci.</source> <volume>11</volume>:<fpage>309</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2017.00309</pub-id><pub-id pub-id-type="pmid">28611582</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maass</surname> <given-names>W.</given-names></name></person-group> (<year>1997</year>). <article-title>Networks of spiking neurons: the third generation of neural network models</article-title>. <source>Neural Netw.</source> <volume>10</volume>, <fpage>1659</fpage>&#x02013;<lpage>1671</lpage>.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miao</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>G.</given-names></name> <name><surname>Ning</surname> <given-names>X.</given-names></name> <name><surname>Zi</surname> <given-names>Y.</given-names></name> <name><surname>Ren</surname> <given-names>K.</given-names></name> <name><surname>Bing</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Neuromorphic benchmark datasets for pedestrian detection, action recognition, and fall detection</article-title>. <source>Front. Neurorobot.</source> <volume>13</volume>:<fpage>38</fpage>. <pub-id pub-id-type="doi">10.3389/fnbot.2019.00038</pub-id><pub-id pub-id-type="pmid">31275128</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Moeys</surname> <given-names>D. P.</given-names></name> <name><surname>Neil</surname> <given-names>D.</given-names></name> <name><surname>Corradi</surname> <given-names>F.</given-names></name> <name><surname>Kerr</surname> <given-names>E.</given-names></name> <name><surname>Vance</surname> <given-names>P.</given-names></name> <name><surname>Das</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Pred18: dataset and further experiments with davis event camera in predator-prey robot chasing</article-title>. <source>arXiv preprint</source> arXiv:1807.03128.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nguyen</surname> <given-names>T.</given-names></name> <name><surname>Rosenberg</surname> <given-names>M.</given-names></name> <name><surname>Song</surname> <given-names>X.</given-names></name> <name><surname>Gao</surname> <given-names>J.</given-names></name> <name><surname>Tiwary</surname> <given-names>S.</given-names></name> <name><surname>Majumder</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Ms marco: a human-generated machine reading comprehension dataset</article-title>. <source>arXiv preprint</source> arXiv:1611.09268.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Northcutt</surname> <given-names>C. G.</given-names></name> <name><surname>Jiang</surname> <given-names>L.</given-names></name> <name><surname>Chuang</surname> <given-names>I. L.</given-names></name></person-group> (<year>2019</year>). <article-title>Confident learning: estimating uncertainty in dataset labels</article-title>. <source>arXiv preprint</source> arXiv:1911.00068.</citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Orchard</surname> <given-names>G.</given-names></name> <name><surname>Jayawant</surname> <given-names>A.</given-names></name> <name><surname>Cohen</surname> <given-names>G. K.</given-names></name> <name><surname>Thakor</surname> <given-names>N.</given-names></name></person-group> (<year>2015</year>). <article-title>Converting static image datasets to spiking neuromorphic datasets using saccades</article-title>. <source>Front. Neurosci.</source> <volume>9</volume>:<fpage>437</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2015.00437</pub-id><pub-id pub-id-type="pmid">26635513</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paredes-Vall&#x000E9;s</surname> <given-names>F.</given-names></name> <name><surname>Scheper</surname> <given-names>K. Y. W.</given-names></name> <name><surname>De Croon</surname> <given-names>G. C. H. E.</given-names></name></person-group> (<year>2019</year>). <article-title>Unsupervised learning of a hierarchical spiking neural network for optical flow estimation: from events to global motion perception</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>42</volume>:<fpage>2051</fpage>&#x02013;<lpage>2064</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2019.2903179</pub-id><pub-id pub-id-type="pmid">30843817</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Paszke</surname> <given-names>A.</given-names></name> <name><surname>Gross</surname> <given-names>S.</given-names></name> <name><surname>Massa</surname> <given-names>F.</given-names></name> <name><surname>Lerer</surname> <given-names>A.</given-names></name> <name><surname>Bradbury</surname> <given-names>J.</given-names></name> <name><surname>Chanan</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Pytorch: an imperative style, high-performance deep learning library</article-title>, in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Vancouver, Canada</publisher-loc>), <fpage>8026</fpage>&#x02013;<lpage>8037</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Song</surname> <given-names>S.</given-names></name> <name><surname>Zhao</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Towards artificial general intelligence with hybrid tianjic chip architecture</article-title>. <source>Nature</source> <volume>572</volume>, <fpage>106</fpage>&#x02013;<lpage>111</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1424-8</pub-id><pub-id pub-id-type="pmid">31367028</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prez-Carrasco</surname> <given-names>J. A.</given-names></name> <name><surname>Zhao</surname> <given-names>B.</given-names></name> <name><surname>Serrano</surname> <given-names>C.</given-names></name> <name><surname>Acha</surname> <given-names>B.</given-names></name> <name><surname>Serrano-Gotarredona</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Mapping from frame-driven to frame-free event-driven vision systems by low-rate coding and coincidence processing&#x02013;application to feedforward convnets</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>35</volume>, <fpage>2706</fpage>&#x02013;<lpage>2719</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2013.71</pub-id><pub-id pub-id-type="pmid">24051730</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rahimi</surname> <given-names>A.</given-names></name> <name><surname>Recht</surname> <given-names>B.</given-names></name></person-group> (<year>2007</year>). <article-title>Random features for large-scale kernel machines</article-title>, in <source>Proceedings of the 20th International Conference on Neural Information Processing Systems</source>, NIPS&#x00027;07 (<publisher-loc>Red Hook, NY</publisher-loc>: <publisher-name>Curran Associates Inc</publisher-name>), <fpage>1177</fpage>&#x02013;<lpage>1184</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rajpurkar</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Lopyrev</surname> <given-names>K.</given-names></name> <name><surname>Liang</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>Squad: 100,000&#x0002B; questions for machine comprehension of text</article-title>. <source>arXiv preprint</source> arXiv:1606.05250.</citation>
</ref>
<ref id="B41">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rouat</surname> <given-names>J.</given-names></name> <name><surname>Pichevar</surname> <given-names>R.</given-names></name> <name><surname>Loiselle</surname> <given-names>S.</given-names></name> <name><surname>Hai</surname> <given-names>A. H.</given-names></name> <name><surname>Lavoie</surname> <given-names>J.</given-names></name> <name><surname>Bergeron</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2013</year>). <source>Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer</source>. US Patent 8,346,692. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>U.S. Patent and Trademark Office</publisher-name>.</citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roy</surname> <given-names>K.</given-names></name> <name><surname>Jaiswal</surname> <given-names>A.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Towards spike-based machine intelligence with neuromorphic computing</article-title>. <source>Nature</source> <volume>575</volume>, <fpage>607</fpage>&#x02013;<lpage>617</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1677-2</pub-id><pub-id pub-id-type="pmid">31776490</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Russakovsky</surname> <given-names>O.</given-names></name> <name><surname>Deng</surname> <given-names>J.</given-names></name> <name><surname>Su</surname> <given-names>H.</given-names></name> <name><surname>Krause</surname> <given-names>J.</given-names></name> <name><surname>Satheesh</surname> <given-names>S.</given-names></name> <name><surname>Ma</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>ImageNet large scale visual recognition challenge</article-title>. <source>IJCV</source> <volume>115</volume>, <fpage>211</fpage>&#x02013;<lpage>252</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-015-0816-y</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schrauwen</surname> <given-names>B.</given-names></name> <name><surname>D'Haene</surname> <given-names>M.</given-names></name> <name><surname>Verstraeten</surname> <given-names>D.</given-names></name> <name><surname>Van Campenhout</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). <article-title>Compact hardware liquid state machines on FPGA for real-time speech recognition</article-title>. <source>Neural Netw.</source> <volume>21</volume>, <fpage>511</fpage>&#x02013;<lpage>523</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2007.12.009</pub-id><pub-id pub-id-type="pmid">18222634</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tao</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Ling</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>DaDianNao: a neural network supercomputer</article-title>. <source>IEEE Trans. Comput.</source> <volume>66</volume>:<fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TC.2016.2574353</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vasudevan</surname> <given-names>A.</given-names></name> <name><surname>Negri</surname> <given-names>P.</given-names></name> <name><surname>Linares-Barranco</surname> <given-names>B.</given-names></name> <name><surname>Serrano-Gotarredona</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>Introduction and analysis of an event-based sign language dataset</article-title>, in <source>2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)</source> (<publisher-loc>Buenos Aires, Argentina</publisher-loc>), <fpage>675</fpage>&#x02013;<lpage>682</lpage>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Shen</surname> <given-names>Y.</given-names></name> <name><surname>Du</surname> <given-names>B.</given-names></name> <name><surname>Zhao</surname> <given-names>G.</given-names></name> <name><surname>Cui</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Event-stream representation for human gaits identification using deep neural networks</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <fpage>1</fpage>&#x02013;<lpage>1</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2021.3054886</pub-id><pub-id pub-id-type="pmid">33502972</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Wu</surname> <given-names>D.</given-names></name> <name><surname>Xie</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title><italic>l</italic>1-norm batch normalization for efficient training of deep neural networks</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst.</source> <volume>30</volume>, <fpage>2043</fpage>&#x02013;<lpage>2051</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2018.2876179</pub-id><pub-id pub-id-type="pmid">30418924</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Xie</surname> <given-names>Y.</given-names></name> <name><surname>Shi</surname> <given-names>L.</given-names></name></person-group> (<year>2019</year>). <article-title>Direct training for spiking neural networks: faster, larger, better</article-title>, in <source>AAAI</source>, <volume>Vol. 33</volume>, (<publisher-loc>Hawaii, USA</publisher-loc>), <fpage>1311</fpage>&#x02013;<lpage>1318</lpage>.</citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Tang</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>LIAF-Net: leaky integrate and analog fire network for lightweight and efficient spatiotemporal information processing</article-title>. <source>arXiv preprint</source> arXiv:2011.06176.<pub-id pub-id-type="pmid">33979292</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>G.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Shi</surname> <given-names>L.</given-names></name></person-group> (<year>2019</year>). <article-title>DashNet: a hybrid artificial and spiking neural network for high-speed object tracking</article-title>. <source>arXiv preprint</source> arXiv:1909.12942.</citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Jin</surname> <given-names>Y.</given-names></name> <name><surname>Choe</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). <article-title>A digital liquid state machine with biologically inspired learning and its application to speech recognition</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst.</source> <volume>26</volume>, <fpage>2635</fpage>&#x02013;<lpage>2649</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2015.2388544</pub-id><pub-id pub-id-type="pmid">25643415</pub-id></citation></ref>
</ref-list>
</back>
</article> 
