Adaptive Extreme Edge Computing for Wearable Devices

Wearable devices are a fast-growing technology with societal and economic impact on personal healthcare. With the widespread deployment of sensors in pervasive and distributed networks, power consumption, processing speed, and system adaptation are vital in future smart wearable devices. The visioning and forecasting of how to bring computation to the edge in smart sensors have already begun, with the aspiration to provide adaptive extreme edge computing. Here, we provide a holistic view of hardware and theoretical solutions toward smart wearable devices that can guide research in this pervasive computing era. We propose various solutions for biologically plausible models of continual learning in neuromorphic computing technologies for wearable sensors. To envision this concept, we provide a systematic outline of the prospective low-power and low-latency scenarios expected for wearable sensors on neuromorphic platforms. We then describe key potential landscapes of neuromorphic processors exploiting complementary metal-oxide-semiconductor (CMOS) and emerging memory technologies (e.g., memristive devices). Furthermore, we evaluate the requirements for edge computing within wearable devices in terms of footprint, power consumption, latency, and data size. Finally, we investigate the challenges beyond neuromorphic computing hardware, algorithms and devices that could impede the enhancement of adaptive edge computing in smart wearable devices.


Introduction
Wearable devices can monitor various human body signals ranging from heart, respiration and movement to brain activities. Such miniaturized devices, using different sensors, can detect, predict, and analyze the physical performance, physiological status, biochemical composition, and mental alertness of the human body. Despite advances in novel materials that can improve the resolution and sensitivity of sensors, modern wearable devices face various challenges such as low computing capability, high power consumption, large amounts of data to be transmitted, and low data-transmission speed. Conventional wearable sensing solutions mostly transmit the collected data to external servers for off-chip computing and processing. This approach typically creates an information bottleneck that is one of the major factors limiting the power consumption and operating speed of sensing systems. In addition, the use of conventional remote servers with conventional signal-processing techniques to process these temporal, real-time sensing data makes the task computationally intensive and results in significant power consumption and hardware occupation. Moreover, standard von Neumann architectures feature a physical separation between memory and processing unit, further increasing the power consumed to shuttle data between units. Such solutions always require a trade-off between battery lifetime and computing capability. Bringing computing to the edge enables faster response times and opens the possibility of personalized, always-on wearable devices able to continuously interact with and learn from the environment. However, a radical change of paradigm using innovative algorithms, circuits and memory devices is needed to maximize system performance while keeping power and memory budgets at a minimum.
Advances in materials and fabrication allow sensors to capture clear information from the body. However, from the processing aspect, further development is still needed to make a signal meaningful for personalized devices. Because the sensed signal is relatively weak and noisy, a readout circuit (normally composed of an amplifier, a conditioning circuit and an analogue signal-processing unit) is necessary to make the signal readable by a system (27, 28). The subsequent high-level system processes the data and sends commands to actuators for closed-loop control or interaction (29-31). For various applications ranging from human-machine interfaces (29) to health monitoring (32, 33), different combinations of sensors and systems have been developed over the past decade (34, 35). The use of machine learning empowers sensors to build novel smart applications; examples are provided in the next section.

Wearable sensors with machine learning
Recently, the field of artificial intelligence has further boosted the possibilities of smart wearable sensory systems. Emerging intelligent applications and high-performance systems require more complexity and demand that sensory units accurately describe the physical object, so that the decision-making unit or algorithm can output a more reliable result (35-39). Depending on the signal-acquisition position, Fig. 1 summarizes the four biopotential sensors and two widely used wearable sensors along with their learning systems and applications. The biopotential sensors are introduced first, and the other two wearable sensors are described separately.
The biopotential signal can be extracted from the human body using a sensor with direct electrode contact. The electrochemical activity of cells in nervous, muscular and glandular tissue generates ionic currents in the body. An electrode-electrolyte transducer is needed to convert the ionic current into an electric current for the front-end circuit. The electrode, normally made of metal, can be oxidized by the electrolyte, generating metal ions and free electrons. In addition, the anions in the electrolyte can also be oxidized to neutral atoms and free electrons. These free electrons result in a current flow through the electrode. Thus, the surface potential generated by the electrochemical activity of cells can be sensed by the electrode. However, the bio-signals sensed by the electrode are weak and noisy. Before the collected signals are digitized by an analog-to-digital converter, an analogue front-end is essential to provide a readable signal. The design requirements of the front-end for biopotential electrodes can be summarized as follows: i) high common-mode rejection ratio; ii) high signal-to-noise ratio; iii) low power consumption; iv) signal filtering; and v) configurable gain (40).
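The filtering requirement (iv) can be mimicked digitally. The following sketch (synthetic signal; the sampling rate, frequencies and window length are assumptions) removes slow baseline wander from a simulated biopotential trace by subtracting a moving-average estimate, a minimal numerical stand-in for the analogue conditioning stage:

```python
import numpy as np

def remove_baseline(signal, fs, window_s=0.5):
    """Subtract a moving-average estimate of the baseline wander --
    a minimal digital stand-in for the analogue conditioning stage."""
    w = int(window_s * fs)
    kernel = np.ones(w) / w
    baseline = np.convolve(signal, kernel, mode="same")
    return signal - baseline

fs = 500.0                                   # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
component = np.sin(2 * np.pi * 10 * t)       # 10 Hz in-band activity
drift = 0.5 * np.sin(2 * np.pi * 0.2 * t)    # 0.2 Hz baseline wander
cleaned = remove_baseline(component + drift, fs)
```

The 0.5 s window averages over full cycles of the 10 Hz component, so the baseline estimate captures mostly the slow drift, which is then removed.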
Electrocardiography (ECG). ECG is the electrical activity generated by the electrochemistry of cardiac tissue. Containing morphological and statistical features, ECG provides comprehensive information for analyzing and diagnosing cardiovascular diseases (41). Previous studies achieved automatic ECG classification using machine learning algorithms such as Deep Neural Networks (DNN) (42, 43), Support Vector Machines (SVM) (44, 45), and Recurrent Neural Networks (RNN) (46, 47). According to the Association for the Advancement of Medical Instrumentation, there are five ECG beat classes of interest: normal, ventricular, supraventricular, fusion of normal and ventricular, and unknown beats. These methodologies can be evaluated on available ECG databases and yield over 90% accuracy and sensitivity for the five classes, which is essential for future cardiovascular health monitoring. In wearable applications, refs. 48 and 49 present systems that measure ECG and send it to the cloud for classification and health monitoring.
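As a toy illustration of beat classification over the five AAMI classes (not a clinical method; the two-dimensional features and cluster centres are entirely synthetic), a nearest-centroid classifier can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
classes = ["normal", "ventricular", "supraventricular", "fusion", "unknown"]

# Synthetic 2-D beat features (e.g. an RR-interval and a QRS-width surrogate);
# well-separated cluster centres, purely for illustration.
centres = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [2, 8]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.3, size=(40, 2)) for c in centres])
y = np.repeat(np.arange(len(classes)), 40)

# "Training": store the mean feature vector of each class.
model = np.array([X[y == k].mean(axis=0) for k in range(len(classes))])

def predict(model, X):
    # Assign each beat to the nearest class centroid.
    d = np.linalg.norm(X[:, None, :] - model[None, :, :], axis=2)
    return d.argmin(axis=1)

acc = (predict(model, X) == y).mean()
```

Real systems extract far richer features and use the learned models (DNN, SVM, RNN) cited above; this sketch only shows the shape of the classification step.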
Electroencephalography (EEG). Our brain neurons communicate with each other through electrical impulses. An EEG electrode placed on the surface of the skull can detect potential information associated with this activity (50, 51). In comparison with other biopotential signals, surface EEG is relatively weak (normally in the microvolt range) and noisy (52, 53). Therefore, it requires a high-input-impedance readout circuit and intensive signal pre-processing to obtain clean EEG data (40, 50). While wet electrodes (Ag/AgCl) are more precise and more suitable for clinical purposes, passive dry electrodes are more suitable for daily health monitoring and brain-machine interfaces (52, 54). Applications also include mental-disorder assessment (55), driving safety (51, 54), and emotion evaluation (56). A commercial biopotential data-acquisition system, Biosemi Active Two, provides up to 256 channels for EEG analysis (57). For a specific application, the number of electrodes can be reduced to cover only the relevant areas, such as 19 channels for depression diagnosis (58), four channels for evaluating driver vigilance (51), and 64 channels for emotional-state classification (56). Although EEG is an on-body biopotential, most existing EEG research employs offline learning and analysis because of the system complexity and the high number of channels. In wearable real-time applications, usually a smaller number of channels is selected and the data are wirelessly sent to the cloud for further processing (51, 54, 59, 60).
Electrooculography (EOG). Eye movement, which results in potential variations around the eyes recorded as EOG, is a combined effect of environmental and psychological changes. It returns a relatively weak voltage (0.01-0.1 mV) at low frequency (0-10 Hz) (53). Unlike other eye-tracking techniques that use a video camera and infrared light, EOG provides a lightweight, inexpensive and fully wearable solution for accessing human eye movement (61). It is the most widely used approach for wearable human-machine interfaces, especially for assisting quadriplegics (61). It has been used to control a wheelchair (62), control a prosthetic limb (31, 63), and evaluate sleep (64-66). Additionally, recent studies fuse EEG and EOG to increase the degrees of freedom of the signal and enhance system reliability, because the two signals carry similar implicit information such as sleepiness (64, 67) and mental health (68). EOG can also act as a supplement providing additional functions or commands to an EEG system (31, 69, 70).

Electromyography (EMG).
EMG is an electrodiagnostic method for recording and analyzing the electrical activity generated by skeletal muscles. EMG is generated by skeletal-muscle movement, which frequently occurs in the arms and legs. It yields a higher amplitude (up to 10 millivolts) and bandwidth (20-1000 Hz) than the other biopotentials (40, 53). Near the active muscle, different oscillation signals can be measured by a dry electrode array, which allows a computer to sense and decode body motion (71-73). A prime example is the Myo armband by Thalmic Labs, a commercial multi-sensor device that consists of EMG sensors, a gyroscope, an accelerometer and a magnetometer (74). The sensory data are sent to a phone or PC via Bluetooth, where various body movements can be recognized by feature extraction and machine learning. Moreover, the application of EMG is frequently linked to target control, such as a wheelchair (75) or a prosthetic hand (76, 77), for assisting disabled people. Its applications also include sign-language recognition (71), diagnosis of neuromuscular disorders (72, 78), analysis of walking strides (73) and virtual reality (79). Machine learning enables the system to overcome the variation of EMG signals across different users (71, 72).

Photoplethysmography (PPG).
PPG is a non-invasive and low-cost optical measurement method that is often used for blood-pressure and heart-rate monitoring in wearable devices. The optical properties of skin and tissue change periodically due to the blood flow driven by the heartbeat. By directing a light emitter toward the skin surface, a photosensor can detect the variations in light absorption, normally at the wrist or finger. This variation signal is called PPG and is highly correlated with the rhythm of the cardiovascular system (80). Compared with ECG, PPG is easily accessible and low cost, which makes it an ideal medium for wearable heart-rate measurement. Its main disadvantage against ECG is that the PPG waveform is not uniform across different persons and body positions. Thus, further analysis of PPG requires machine learning or other statistical tools to calibrate the signal to different scenarios. For example, it can be used for biometric identification after deep learning (81, 82). It is worth mentioning that PPG is a strong complement in ECG applications.
Bioimpedance spectroscopy (BIS). BIS is another low-cost and powerful sensing technique that provides informative body parameters. The principle is that the cell membrane behaves like a frequency-dependent capacitor and impedance. Emitter electrodes apply a multifrequency excitation signal (0.1-100 MHz) to the skin while receiver electrodes collect the resulting currents, from which the impedance spectrum of the tissue in between is demodulated (83, 84). Compared to homogeneous materials, body tissue presents more complicated impedance spectra because of its cell membranes and macromolecules. Therefore, tissue conditions such as muscle concentration and structural and chemical composition can be analysed through BIS. BIS can measure body composition such as fat and water content (84). Depending on the setup in terms of position and frequency, it can also help in the early detection of diseases such as lymphedema, organ ischemia and cancer (85). Furthermore, multiple pair-wise electrodes can form electrical impedance tomography, which maps the impedance distribution. By embedding these electrodes in a wristband, the tomography can estimate hand gestures after training, another novel solution for an inexpensive human-machine interface (86).
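The frequency dependence underlying BIS can be illustrated with the classic 2R-1C tissue model: an extracellular resistance in parallel with an intracellular resistance in series with the membrane capacitance (the component values below are illustrative, not physiological):

```python
import numpy as np

def tissue_impedance(f, Re=1000.0, Ri=300.0, Cm=1e-9):
    """Complex impedance of the 2R-1C tissue model: extracellular
    resistance Re in parallel with (intracellular Ri in series with
    membrane capacitance Cm). Values are illustrative only."""
    w = 2 * np.pi * f
    z_cell = Ri + 1.0 / (1j * w * Cm)        # intracellular branch
    return Re * z_cell / (Re + z_cell)       # parallel combination

z_low = abs(tissue_impedance(1e2))    # low frequency: membranes block current
z_high = abs(tissue_impedance(1e8))   # high frequency: membranes are shorted
```

At low frequency the current is confined to the extracellular path (|Z| approaches Re), while at high frequency the membrane capacitance is shorted and both paths conduct, which is exactly the contrast BIS exploits to probe tissue composition.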

Multisensory fusion in wearable devices
Every sensor has its own limitations. In some demanding cases, an individual sensor cannot satisfy system requirements such as accuracy or robustness (35, 87-89). The solution involves increasing the number and type of sensors to form a multisensory system or sensor network for a single measurement purpose (87-89). Multiple types of sensors working synergistically in a system provide more input dimensions to fully map an object onto the data stream. Different sensors return different data with respect to sampling rate, number of inputs and the information behind the data. Machine learning models, such as ANNs and SVMs, can be designed to combine multiple sources of data. Depending on the application, sensor types and data structure, several approaches have been proposed for multisensory fusion. Generally, in such a system, machine learning plays a vital role in merging different sources of sensory data thanks to its multidimensional data-processing mechanism, and it allows sensory fusion to occur at the signal, feature or decision level (88, 89). Results show that a multisensory system is advantageous in improving system performance. For example, the fusion of ECG and PPG patterns can be an informative physiological parameter for robust medical assessment (90). Counting the peak intervals between PPG and ECG can estimate arterial blood pressure (91). Interestingly, a recent study shows that the QRS complex of ECG can be reconstructed from PPG by a novel transformed attentional neural network after training (92). This could be beneficial for the accessibility of wearable ECG.
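A minimal sketch with synthetic data shows why combining independent sensors helps (the "ECG" and "PPG" scores below are stand-ins with made-up noise, not real features; the fusion here is simple decision-level averaging):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
y = rng.integers(0, 2, n)                  # ground-truth binary state

# Each "sensor" observes the state through independent noise
# (stand-ins for, e.g., an ECG-derived and a PPG-derived score).
s_ecg = y + rng.normal(scale=1.0, size=n)
s_ppg = y + rng.normal(scale=1.0, size=n)

# Single-sensor decision vs. decision-level fusion by score averaging.
acc_single = ((s_ecg > 0.5) == y).mean()
acc_fused = (((s_ecg + s_ppg) / 2 > 0.5) == y).mean()
```

Because the noise sources are independent, averaging reduces the effective noise standard deviation by a factor of the square root of two, so the fused decision is reliably more accurate than either sensor alone.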

Challenges towards smart wearable sensors with edge computing
Given the potential of sensory systems with machine learning, the main challenge is the shortage of power and computing efficiency (28). Novel applications using multiple sensors and high learning ability usually require more energy in the wearable computing unit (33). Nevertheless, the power supply in the wearable domain remains difficult with existing battery technologies. This weakness limits the further development of smart wearable devices (33). The existing solution is to wirelessly transfer the raw data to a cloud where the computationally intensive algorithm is implemented (93). However, this solution is not ideal considering 1) the complexity of using a wireless module, 2) its non-negligible power consumption, 3) the amount of data, 4) the space limitation due to the range of wireless transmission, 5) privacy issues due to the broadcasting of signals, and 6) the non-negligible latency of the communication channel. These drawbacks strongly limit the application of wearable sensors.
Implementation of ANNs in von Neumann architectures, as frequently used with sensors, is power-hungry. Conversely, it has been reported that signal processing in the brain is several orders of magnitude more power-efficient than digital systems, and one order of magnitude better in processing rate (94). Compared to conventional approaches based on binary digital systems, brain-inspired neuromorphic hardware still needs to advance in the contexts of data storage and removal, as well as data transmission between different units. In this perspective, a neuromorphic chip with a built-in intelligent algorithm can act as a front-end processor next to the sensor. Conventional analog-to-digital converters (ADCs) could be replaced by a delta encoder or feature extractor converting the analog sensor output into spike-based signals for the hardware (see Section 4). The output then becomes the result of recognition or prediction instead of an intensive data stream. In this way, computation occurs at the local edge with low power and a brain-like architecture.
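A send-on-delta encoder of the kind mentioned above can be sketched in a few lines (the threshold and input signal are illustrative; real implementations work in analogue or mixed-signal circuits):

```python
import numpy as np

def delta_encode(signal, threshold=0.1):
    """Send-on-delta encoder: emit a +1/-1 event whenever the input moves
    more than `threshold` away from the value at the last event."""
    events = []                 # list of (sample index, polarity)
    ref = signal[0]
    for i, v in enumerate(signal):
        while v - ref > threshold:
            ref += threshold
            events.append((i, +1))
        while ref - v > threshold:
            ref -= threshold
            events.append((i, -1))
    return events

t = np.linspace(0, 1, 1000)
sig = np.sin(2 * np.pi * 2 * t)
events = delta_encode(sig, threshold=0.1)
# The event stream is far sparser than the 1000-sample raw stream,
# yet the signal can be tracked to within one threshold step.
```

Only changes are transmitted, so a slowly varying biosignal produces a sparse, spike-like event stream instead of a dense sample stream, which is the representation neuromorphic processors consume.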

Models for biologically plausible continual learning
In this section we highlight some recently introduced methods to port the power of modern machine learning to neuromorphic edge devices. In the last couple of years, machine learning has made big steps forward, reaching close-to-human performance on a wide range of tasks. Many of the most successful machine learning methods are based on artificial neural networks (ANNs), which are inspired by the organization of information processing in the brain. However, somewhat contradictorily, mapping modern ANN learning methods to brain-inspired hardware poses considerable challenges to algorithm and hardware design. The main reason for this is that the development of machine learning algorithms has been strongly influenced by the availability of powerful mainframe computers that perform learning offline in big server farms, only eventually sending results back to the user. While this development has paved the way for today's success of ANNs, it has also led the field away from the principles used in biology for efficient learning. In Section 3.1 we review recent approaches to combining the strengths of modern machine learning and brain-inspired algorithms that are of particular interest for edge computing applications. In Section 3.2 we focus on the problem of coping with extreme memory constraints by exploiting sparsity. In Section 3.3 we highlight additional open challenges and future work.

Brain-inspired learning algorithms for neuromorphic hardware
[Figure 2 caption fragments: (b) The model of 97 exploits the variability of learning rules and redundancy in the task solution space to learn sparse and robust network configurations (adapted from 98). (c) Overcoming forgetting by selectively slowing down weight changes 99: after learning a first task A, parameter distributions are absorbed into a prior distribution that confines the motility of synaptic weights in subsequent tasks (task B).]

Today, the dominating method for training artificial neural networks is the error backpropagation (Backprop) algorithm 100, which provides an efficient and scalable solution for adapting the network parameters to a set of training data. Backprop is an iterative, gradient-based, supervised learning algorithm that operates in three phases. First, a given input activation is propagated through the network to generate the output based on the current set of parameters. Then, the mismatch between the generated outputs and the target values is computed using a loss function and propagated backwards through the network architecture to compute suitable weight changes. Finally, the network parameters are updated to reduce the loss. We will not go into the details of Backprop here; see 1 for an excellent review and historical survey of the development of the algorithm. The problem of porting Backprop to neuromorphic hardware stems from a well-known shortcoming of the algorithm known as locking: the weights of a network can only be updated after a full forward propagation of the data through the network, followed by loss evaluation, and finally after waiting for the back-propagation of error gradients 101. Locking prevents an efficient implementation of Backprop on online distributed architectures. Also, Backprop is not well suited for spiking neural networks, which have non-differentiable output functions. These problems have recently been addressed in brain-inspired variants of the Backprop algorithm.
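The three phases can be sketched in a minimal two-layer network (the toy data, layer sizes and learning rate are assumptions chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                         # toy input batch
Y = (X.sum(axis=1, keepdims=True) > 0).astype(float) # toy binary target

W1 = rng.normal(size=(4, 8)) * 0.5
W2 = rng.normal(size=(8, 1)) * 0.5

def forward(X):
    H = np.tanh(X @ W1)                              # phase 1: forward pass
    return H, 1.0 / (1.0 + np.exp(-(H @ W2)))

loss0 = np.mean((forward(X)[1] - Y) ** 2)
for _ in range(500):
    H, P = forward(X)
    dP = 2 * (P - Y) / len(X) * P * (1 - P)          # phase 2: loss gradient
    dW2 = H.T @ dP                                   # phase 3: backward pass
    dH = dP @ W2.T * (1 - H ** 2)
    dW1 = X.T @ dH
    W1 -= 1.0 * dW1                                  # parameter update
    W2 -= 1.0 * dW2

final_loss = np.mean((forward(X)[1] - Y) ** 2)
```

Note that `dW1` requires `W2.T`, i.e. weights from a later layer: this is exactly the non-local dependence (locking) that makes the algorithm awkward for distributed neuromorphic substrates.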

Brain-inspired alternatives to error backpropagation
In recent years a number of methods have been proposed to approximate the gradient computation performed by Backprop in order to prevent locking (see 102 for a recent review). Refs. 103, 104 proposed to replace the non-local error back-propagating term of the Backprop algorithm by sending the loss through a fixed feedback network with random weights that are excluded from training. In this approach, named random feedback alignment, the back-propagating error signal acts as local feedback to each synapse, similar to a reward signal in reinforcement learning. The fixed random feedback network de-correlates the error signals, providing individual feedback to each synapse. Lillicrap et al. showed that this simple approach already provides a viable approximation to the exact Backprop algorithm and performs well for practical machine learning problems of moderate size. In 105 an event-based version of random feedback alignment, well suited for neuromorphic hardware, was introduced. This approach was further generalized in 106 to include a larger class of algorithms that use error feedback signals.
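The core substitution of random feedback alignment, replacing the transposed forward weights with a fixed random matrix in the backward pass, can be sketched on a toy regression task (network sizes, learning rates and the target are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
Y = X @ np.array([[1.0], [-1.0], [0.5], [0.0]])   # made-up regression target

W1 = rng.normal(size=(4, 16)) * 0.5
W2 = rng.normal(size=(16, 1)) * 0.1
B = rng.normal(size=(16, 1))   # fixed random feedback weights, never trained

loss0 = np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)
for _ in range(500):
    H = np.tanh(X @ W1)
    P = H @ W2
    err = (P - Y) / len(X)
    W2 -= 0.5 * (H.T @ err)
    # Key difference from Backprop: the error reaches W1 through the
    # fixed random matrix B instead of the transposed weights W2.T.
    W1 -= 0.5 * (X.T @ (err @ B.T * (1 - H ** 2)))

loss = np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)
```

Because the feedback path is fixed and random, no weight transport between layers is needed, which is what makes the scheme attractive for distributed hardware.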
An efficient model for learning complex sequences in spiking neural networks, named Superspike, was introduced in 107. The model also uses a learning rule that is modulated by error feedback signals and locally minimizes the mismatch between the network output and a target spike train. To overcome the problem of non-differentiable outputs, Superspike uses a surrogate gradient approach that replaces the infinitely steep spike events with a finite auxiliary function at the time points of network spike events 108, 109. As in random feedback alignment, learning signals are communicated to the synapses via a feedback network with fixed weights. Using this approach, Zenke and colleagues demonstrated efficient learning of complex sequences in spiking networks.
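The surrogate-gradient idea can be sketched as a pair of functions: a hard threshold used in the forward pass, and a finite, fast-sigmoid-shaped stand-in used only in the backward pass (the particular shape and the steepness `beta` are illustrative choices, not the exact Superspike formulation):

```python
import numpy as np

def spike(v, thresh=1.0):
    """Forward pass: a hard, non-differentiable spike threshold."""
    return (np.asarray(v) >= thresh).astype(float)

def surrogate_grad(v, thresh=1.0, beta=10.0):
    """Backward pass: a finite stand-in for the spike 'derivative'
    (fast-sigmoid shape), peaked at the threshold."""
    return 1.0 / (1.0 + beta * np.abs(np.asarray(v) - thresh)) ** 2
```

During training, the forward pass emits binary spikes while gradients flow through `surrogate_grad`, sidestepping the zero-almost-everywhere true derivative of the step function.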
Another approach to approximating Backprop in spiking neural networks exploits an anatomical detail of cortical neurons. Ref. 110 introduced a biologically inspired two-compartment neuron model that approximates the error backpropagation algorithm by minimizing a local dendritic prediction error. Ref. 111 ports learning by Backprop to neuromorphic hardware by incorporating dynamics with finite time constants and by optimizing the backward pass with respect to substrate variability. They demonstrate the algorithm on the BrainScaleS analog neuromorphic architecture.

Brain-inspired alternatives to backpropagation through time
Recurrent neural network (RNN) architectures often show superior learning results for tasks that involve a temporal dimension, which is often the case in edge computing applications. Porting learning algorithms for RNNs is therefore of utmost importance for efficient machine learning at the edge. Backpropagation through time (BPTT), the standard RNN learning method used in most GPU implementations, unfolds the network in time and keeps this extended structure in memory to propagate information forward and backward, which poses a severe challenge to the power and area constraints of edge computing. Recent theoretical results 95, 112 show that the power of BPTT can be brought to biologically inspired spiking neural networks (SNNs) while the unfolding is avoided through an approximation that operates only forward in time, enabling online, always-on learning. This algorithm operates at every synapse in parallel and incrementally updates the synaptic weights. As for random feedback alignment and Superspike discussed above, the weight update depends on only three factors: the first two are determined by the states of the two related input/output neurons, and the third is given by synapse-specific feedback conveying the mismatch between the target and the actual output (see Fig. 2a for an illustration). The temporal gap between these factors is bridged by an eligibility trace describing a transient dynamic. Eligibility traces have long been theoretically predicted 113, 114 and have recently been observed experimentally in the brain 115-118.
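The three-factor structure with an eligibility trace can be sketched as follows (the spike trains, learning signal, decay constant and learning rate are random placeholders for illustration, not the exact e-prop equations):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_pre, n_post = 100, 5, 3
pre = (rng.random((T, n_pre)) < 0.2).astype(float)    # presynaptic spikes
post = (rng.random((T, n_post)) < 0.2).astype(float)  # postsynaptic spikes
learn_sig = rng.normal(size=(T, n_post))              # top-down feedback (3rd factor)

W = np.zeros((n_pre, n_post))
trace = np.zeros((n_pre, n_post))
decay, lr = 0.9, 0.01

for t in range(T):
    # Factors 1 and 2: local pre/post coincidences accumulate in a
    # decaying, synapse-local eligibility trace.
    trace = decay * trace + np.outer(pre[t], post[t])
    # Factor 3: synapse-specific feedback gates the online weight update,
    # so no unfolding through time is ever stored.
    W += lr * trace * learn_sig[t]
```

Each synapse stores only its own trace, so memory stays constant in the sequence length, which is exactly the advantage over BPTT's unfolded computation graph.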

Efficient learning under stringent memory constraints
The amount of available resources in neuromorphic systems is kept low to increase energy efficiency. Memory elements have an especially large impact on the energy budget. Therefore, algorithms are needed that make efficient use of the available memory resources. The largest amount of memory in a network is usually consumed by the synaptic weights. Since, in practice, the weights of many connections in a network converge to values close to zero, several methods have been proposed to reduce the memory footprint of machine learning algorithms by exploiting sparsity in the network connectivity. We discuss two types of algorithms: (1) those based on pruning connections after learning, and (2) online learning with sparse networks. These two types of sparse learning algorithms are discussed in the following sections.

Pruning
Many approaches to exploiting sparsity in learning algorithms focus on pruning the network after training (see 119 for a recent review). Simple methods rely on pruning by magnitude, eliminating the weakest (closest to zero) weights in the network 120-122. Some methods based on this idea have reported impressive sparsity rates of over 95% on standard machine learning benchmarks with negligible performance loss 123, 124. Other methods are based on theoretical motivations and classical sparsification and regularization techniques 125-127. These models reach high compression rates. Ref. 128 proposed a method to iteratively grow and prune a network in order to generate a compact yet precise solution. They provide a detailed comparison with state-of-the-art dense networks and other pruning methods, reaching sparsity above 99% for the LeNet-5 benchmark.
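Magnitude pruning itself is straightforward; a sketch with an assumed 95% sparsity target on a random weight matrix:

```python
import numpy as np

def prune_by_magnitude(W, sparsity):
    """Zero out the fraction `sparsity` of weights closest to zero."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    cutoff = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) <= cutoff, 0.0, W)

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 100))
W_sparse = prune_by_magnitude(W, 0.95)
frac_zero = (W_sparse == 0).mean()
```

After pruning, only the surviving 5% of weights need to be stored (e.g. in a sparse index/value format), which is where the memory saving on neuromorphic hardware comes from.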

Online learning in sparse networks
A number of authors have also introduced methods that work directly with sparse networks during training, which is often the more interesting case for neuromorphic applications with online training. Ref. 129 introduced an algorithm for online stochastic rewiring in deep neural networks that works with a fixed number of synaptic connections throughout learning. The algorithm showed close-to state-of-the-art performance at up to 98% sparsity. Sparse evolutionary training (SET) 130 introduced a heuristic approach that prunes the smallest weights and regrows new weights at random locations. Dynamic Sparse Reparameterization 131 introduced a prune-redistribute-regrow cycle and demonstrated compelling performance levels even for very deep neural network architectures. Ref. 132 introduced a single-shot pruning algorithm that yields sparse networks based on a saliency criterion prior to the actual training. Ref. 133 introduced a refined method for online pruning and redistribution that surpasses the previous methods in terms of sparsity and learning performance.
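A single SET-style prune-and-regrow step can be sketched as follows (the pruning fraction, connectivity level and initialization scale are assumptions; a real run interleaves such steps with training epochs):

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_and_regrow(W, mask, frac=0.2):
    """One SET-style rewiring step: remove the weakest active connections
    and regrow the same number at random inactive locations, keeping the
    total number of connections fixed."""
    active = np.flatnonzero(mask)
    k = int(frac * active.size)
    weakest = active[np.argsort(np.abs(W.flat[active]))[:k]]
    mask.flat[weakest] = False
    W.flat[weakest] = 0.0
    inactive = np.flatnonzero(~mask.ravel())
    reborn = rng.choice(inactive, size=k, replace=False)
    mask.flat[reborn] = True
    W.flat[reborn] = rng.normal(scale=0.01, size=k)  # fresh small weights
    return W, mask

W = rng.normal(size=(20, 20))
mask = rng.random((20, 20)) < 0.1          # ~10% of connections active
W = W * mask
n_before = mask.sum()
W, mask = prune_and_regrow(W, mask)
```

The connection budget is constant throughout learning, so the memory footprint never exceeds the sparse allocation, which is the property that matters for resource-limited neuromorphic systems.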

Open challenges and future work
As outlined above, edge computing poses quite specific challenges to learning algorithms that are substantially different from the requirements of classical applications. Some of the algorithms outlined above have already been successfully ported to neuromorphic hardware. For example, the e-prop algorithm of 112 has been implemented on the SpiNNaker 2 chip, yielding an additional energy reduction of two orders of magnitude compared to an x86 implementation 134. See Section 4 for more details on available neuromorphic hardware and its applications.
In the remainder of this section we highlight open challenges that remain to be solved for efficient learning in edge computing applications. In addition to the stringent memory and power constraints, learning at the edge also has to function in an online scenario where data arrive in a continuous stream. Some dedicated hardware resources, e.g. the memristive devices discussed in Section 5, may also show high levels of intrinsic variability, so the learning algorithm should be robust against these noise sources. In this section we discuss recent advances in this line of research and provide food for thought on how these specific challenges can be approached in future work.

Fault-tolerant robust learning algorithms for neuromorphic devices
Here we review recent advances in using inspiration from biology to make learning algorithms robust against device variability. Several authors have suggested that device noise and variability should not be seen as a nuisance but can rather serve as a computational resource for network simulation and learning algorithms (see 135 for a thorough discussion). Ref. 136 has shown that variability in neuronal outputs can be exploited to learn complex statistical dependencies between sensory stimuli. The stochastic behavior of the neurons is used in this model to perform probabilistic inference, while biologically motivated learning rules that require only information local to the synapses update the synaptic weights. A theoretical foundation of the model shows that the spiking network performs a Markov chain Monte Carlo sampling process that allows the network to 'reason' about statistical problems.
This idea was taken one step further in 137 by showing that the variability of synaptic transmission can also be used for stochastic computing. The intrinsic noise of synaptic release is used to drive a sampling process. It was shown that this model can be implemented in an event-based fashion; benchmarked on the MNIST digit classification task, it achieved 95.6% accuracy. In 97 it was shown that the variability of learning rules and weight parameters gives rise to a biologically plausible model of online learning. The intrinsic noise of synaptic weight changes drives a sampling process that can be used to exploit redundancies in the task solution space (see Fig. 2b for an illustration). This model was applied to unsupervised learning in spiking neural networks and to closed-loop reinforcement learning problems 98, 138. In 139 this model was also ported to the SpiNNaker 2 neuromorphic many-core system.

Biologically motivated mechanisms to combat forgetting in always-on learning scenarios
Neuromorphic systems often operate in an environment where they are permanently on and learn from a continuous stream of data. This mode of operation is quite different from most other machine learning applications, which work with hand-labeled batches of training data. Always-on learning on a system with limited resources inevitably leads to situations where the system reaches the limits of its memory capacity and thus starts forgetting previously learned sensory experiences. Inspiration to overcome forgetting relevant information comes from biology. The mammalian brain seems to combat forgetting by actively protecting previously acquired knowledge in neocortical circuits 140-144. When a new skill is acquired, a subset of synapses is strengthened, stabilized, and persists despite the subsequent learning of other tasks 143.
A theoretical treatment of the forgetting problem was conducted in the cascade model of Stefano Fusi and others 145,146. They showed that learning an increasing number of patterns in a single neural network unavoidably leads to a state which they called catastrophic forgetting: trying to train more patterns into the network interferes with all previously learned ones, effectively wiping out the information stored in the network. The cascade model proposed to overcome this problem uses multiple parameters per synapse that are linked through a cascade of local interactions. This cascade of parameters selectively slows down weight changes, thus stabilizing synapses when required and effectively combating forgetting. A related model that uses multiple parameters per synapse to combat forgetting was used in 99 (see also 147 for a recently introduced variation of the model). It takes a Bayesian approach that infers a prior distribution over parameter values at each synapse. Synapses that stabilize during learning (converge to a fixed solution) are considered relevant in subsequent learning, and Bayesian priors help to maintain their values (see Fig. 2c for an illustration).
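A minimal sketch of this kind of synaptic consolidation (loosely inspired by the cascade/Bayesian models above, not a reimplementation of them) adds a per-synapse quadratic penalty that pulls important weights back toward their previously learned values, so a new task cannot overwrite them:

```python
import numpy as np

def consolidated_update(w, grad_task_loss, w_anchor, importance, lr=0.1):
    """Gradient step on the current task loss plus a per-synapse quadratic
    penalty that pulls weights judged important for earlier tasks back
    toward their previously learned (anchor) values."""
    return w - lr * (grad_task_loss(w) + importance * (w - w_anchor))

# Task A was solved with w = 0; task B now pulls every weight toward 1.
grad_b = lambda w: (w - 1.0)             # gradient of 0.5 * (w - 1)^2
w_anchor = np.zeros(3)                   # weights learned on task A
importance = np.array([10.0, 0.0, 0.0])  # only synapse 0 was vital for task A

w = w_anchor.copy()
for _ in range(500):
    w = consolidated_update(w, grad_b, w_anchor, importance)
# Synapse 0 barely moves (protected); the unimportant synapses adopt task B.
```

The protected synapse settles at 1/(1 + importance) of the new target, i.e. it stays close to its task-A value, while unprotected synapses fully learn task B.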

Biologically motivated mechanisms to enhance transfer and sensor fusion
Distributed computing architectures at the edge need to make decisions by integrating information from different sensors and sensor modalities, and they should be able to make the best use of the sensory information across a wide range of tasks. It is clearly not very efficient to learn from scratch when confronted with a new task. Therefore, to boost the performance of edge computing, we will consider two aspects of transferring information to new situations: transfer of knowledge between sensors (sensor fusion), which has been treated in Section 2.2, and transfer of knowledge between multiple different tasks (transfer learning).
Transfer learning denotes the improvement of learning in a new task through the use of knowledge from a related task that has already been learned previously 148,149. This contrasts with most of today's machine learning applications, which focus on one very specific task. In transfer learning, when a new task is learned, knowledge from previous skills can be reused without interfering with them. For example, the ability to perform a tennis swing can be transferred to playing ping pong, while maintaining the ability to do both sports. The literature on transfer learning is extensive and many different strategies have been developed depending on the relationship between the different task domains (see 150 and 151 for systematic reviews). In machine learning, a number of approaches have been applied to a wide range of problems, including classification of images [152][153][154][155], text [156][157][158][159] or human activity 160.
A very general approach to learning across multiple domains is followed in the learning-to-learn framework of 161,162. Their model features networks that are able to modify their own weights through the network activity. These networks are therefore able to tinker with their own processing properties. This approach has been taken to its most extreme form, where a network learns to implement an optimization algorithm by itself 163. This model consists of an outer-loop learning network (the optimizer) that controls the parameters of an inner-loop network (the optimizee). The training algorithm of the inner-loop network works on single tasks that are presented sequentially, whereas the outer-loop learner operates across tasks and can acquire strategies to transfer knowledge. This learning-to-learn framework was recently applied to SNNs to obtain properties of LSTM networks and use them to solve complex sequence learning tasks 112. In 164 the learning-to-learn framework was also applied to a neuromorphic hardware platform.
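The outer-loop/inner-loop structure can be illustrated with a toy sketch (a heavy simplification of 161-163: here the "optimizer" merely selects a shared learning rate across a task family rather than being a trained network):

```python
import numpy as np

def inner_loop(target, lr, steps=20):
    """Optimizee: learn a scalar task (match `target`) with plain gradient
    descent on (w - target)^2 and return the final task loss."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)
    return (w - target) ** 2

def outer_loop(tasks, candidate_lrs):
    """Optimizer: across a family of related tasks, pick the inner-loop
    learning rate with the lowest average final loss."""
    avg_loss = [np.mean([inner_loop(t, lr) for t in tasks])
                for lr in candidate_lrs]
    return candidate_lrs[int(np.argmin(avg_loss))]

tasks = [0.5, 1.0, 2.0, -1.5]   # a family of related tasks
best_lr = outer_loop(tasks, [0.001, 0.01, 0.1, 0.45])
```

The inner loop sees one task at a time; only the outer loop accumulates knowledge that transfers across the whole family, which is the essence of the learning-to-learn setup.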

Signal processing for wearable devices on neuromorphic chips
Neuromorphic engineering is a branch of electrical engineering dedicated to the design of analog/digital data processors that aim to emulate biological neurons and synapses. Such processors typically consume less energy than conventional computing systems and present additional properties, such as massively parallel event-based computation, distributed local memory, and adaptation 165,166. The increasing interest in neuromorphic engineering shows that hardware SNNs are considered a key future technology with high potential in applications such as edge computing and wearable devices.
Neuromorphic technologies have sparked interest from universities 8,12,[167][168][169] and companies such as IBM 9 and Intel 10. In this Section, we will provide an overview of the neuromorphic platforms that, to the best of our knowledge, have been deployed for biomedical signal processing, showing promising results to be exploited in wearable devices.

Neuromorphic processors
TrueNorth. TrueNorth 9 is IBM's fully digital neuromorphic chip with one million neurons arranged in a tiled array of 4096 neurosynaptic cores enabling massively parallel processing. Each core contains 13 kB of local SRAM memory to keep the neuron and synapse states along with the axonal delays and information on the fan-out destination. There are 256 Leaky Integrate-and-Fire (LIF) neurons implemented by time-multiplexing, and 256 million synapses are designed in the form of SRAM memory. Each core can support up to 256 fan-in and fan-out, and this connectivity can be configured such that a neuron in any core can communicate its spikes to any other neuron in any other core. Thanks to its event-driven operation, the co-location of memory and processing units in each core, and the use of low-leakage silicon CMOS technology, TrueNorth can perform 46 billion synaptic operations per second (SOPS) per watt for real-time operation, with 26 pJ per synaptic event. Its power density of 20 mW/cm² is about three orders of magnitude smaller than that of typical CPUs.
SpiNNaker. The SpiNNaker machine 8, designed by the University of Manchester, is a custom-designed ASIC based on a massively parallel architecture that has been designed to efficiently simulate large spiking neural networks. It consists of ARM968 processing cores arranged in a 2D array, into which the precise details of the neurons and their dynamics can be programmed. Although the processing cores are synchronous microprocessors, the event-based aspect of SpiNNaker is apparent in its message-handling paradigm: a message (event) delivered to a core generates a request to be processed. The communications infrastructure between these nodes is specially optimized to carry very large numbers of very small packets, optimal for spiking neurons. A second generation of SpiNNaker was designed by the Technical University of Dresden 170. SpiNNaker 2 continues the line of dedicated digital neuromorphic chips for brain simulation, increasing the simulation capacity by a factor > 10 while staying within the same power budget (i.e. 10x better power efficiency). The full-scale SpiNNaker 2 consists of 10 million ARM cores distributed across 70,000 chips in 10 server racks. This system takes advantage of an advanced 22 nm FDSOI technology node with Adaptive Body Biasing, enabling reliable and ultra-low power processing. It also incorporates numerical accelerators for the most common operations.
Loihi. Loihi 10 is Intel's many-core neuromorphic chip incorporating online learning, designed in 14 nm FinFET technology. The chip supports about 130,000 neurons and 130 million synapses distributed over 128 cores. Spikes are transported between the cores of the chip as packetized messages over an asynchronous network-on-chip. It includes three embedded x86 processors and provides a very flexible learning engine on which diverse online learning algorithms, such as Spike-Timing Dependent Plasticity (STDP) and various three-factor and trace-based learning rules, can be implemented. The chip also provides hierarchical connectivity, dendritic compartments, and synaptic delays as features that can enrich a spiking neural network. The synaptic weights are stored in local SRAM memory and the bit precision can vary from 1 to 9 bits. All logic in the chip is digital, functionally deterministic, and implemented in an asynchronous bundled-data design style.
DYNAP-SE. DYNAP-SE implements a multi-core neuromorphic processor with a scalable architecture, fabricated using a standard 0.18 µm CMOS technology 12. It is a full-custom asynchronous mixed-signal processor, with a fully asynchronous inter-core and inter-chip hierarchical routing architecture. Each core comprises 256 adaptive exponential integrate-and-fire (AEI&F) neurons, for a total of 1k neurons per chip. Each neuron has a Content Addressable Memory (CAM) block containing 64 addresses representing the pre-synaptic neurons it is subscribed to. Rich synaptic dynamics are implemented on the chip using Differential Pair Integrator (DPI) circuits 171. These circuits produce EPSCs and IPSCs (Excitatory/Inhibitory Post-Synaptic Currents) with time constants that can range from a few µs to hundreds of ms. The analog circuits are operated in the sub-threshold domain, thus minimizing the dynamic power consumption and enabling implementations of neural and synaptic behaviors with biologically plausible temporal dynamics. The asynchronous CAMs on the synapses are used to store the tags of the source neuron addresses connected to them, while the SRAM cells are used to program the address of the destination core/chip that the neuron targets.
ODIN/MorphIC. The ODIN (Online-learning DIgital spiking Neuromorphic) processor occupies an area of only 0.086 mm² in 28 nm FDSOI CMOS 13. It consists of a single neurosynaptic core with 256 neurons and 256² synapses. Each neuron can be configured to phenomenologically reproduce the 20 Izhikevich behaviors of spiking neurons 172. The synapses embed a 3-bit weight and a mapping table bit that allows enabling or disabling Spike-Dependent Synaptic Plasticity (SDSP) locally 173, thus allowing for the exploration of both off-chip training and on-chip online learning setups. MorphIC is a quad-core digital neuromorphic processor with 2k LIF neurons and more than 2M synapses in 65 nm CMOS 174. MorphIC was designed for high-density large-scale integration of multi-chip setups. The four 512-neuron crossbar cores are connected with a hierarchical routing infrastructure that enables neuron fan-in and fan-out values of 1k and 2k, respectively. The synapses are binary and can be either programmed with offline-trained weights or trained online with a stochastic version of SDSP.

Biomedical signal processing on neuromorphic hardware
Table 1 summarizes the neuromorphic processors described previously and the biomedical signal processing applications in which they were used. These works show promising results for always-on embedded biomedical systems.
The first chip presented in this table is DYNAP-SE, used to implement SNNs for the classification or detection of EMG 175,176 and ECG 177,178 signals, and to implement a simple spiking perceptron as part of a design to detect High Frequency Oscillations (HFO) in human intracranial EEG 179. In particular, in 175,177 a spiking RNN is deployed for ECG/EMG signal separation to facilitate classification with a linear read-out. SVM and linear least-squares approximation were used in the read-out layer in 177,178, reaching overall anomaly-detection accuracies of 91% and 95%, respectively. In 175, the state property of the spiking RNN on EMG was investigated for different hand gestures. In 176 the performance of a feedforward SNN and a hardware-friendly spiking learning algorithm for hand gesture recognition using superficial EMG was investigated and compared to traditional machine learning approaches, such as SVM. The results show that applying an SVM to the spiking output of the hidden layer achieved a classification rate of 84%, while the spiking learning method achieved 74% with a power consumption of about 0.05 mW. The consumption was compared to state-of-the-art embedded systems, showing that the proposed spiking network is two orders of magnitude more power efficient 180,181.
Recently, the benchmark hand-gesture classification task was processed and compared on two other digital neuromorphic platforms, i.e. Loihi and ODIN/MorphIC 13,174. A spiking Convolutional Neural Network (CNN) was implemented on Loihi and spiking Multilayer Perceptrons (MLPs) were implemented on ODIN/MorphIC 182. Because of the properties of the neuromorphic chips, Loihi implemented a late fusion combining the output of the spiking CNN for vision with that of a spiking MLP for EMG signals, while on ODIN/MorphIC two spiking MLPs were fused in the last layer. The comparison with an embedded GPU was performed in terms of accuracy, power consumption, and latency, showing that the neuromorphic chips achieve the same accuracy with a significantly smaller energy-delay product: 30x and 600x more efficient for Loihi and ODIN/MorphIC, respectively 182.

Encoding
In SNNs a single spike by itself does not carry any information; rather, the number and the timing of the spikes produced by a neuron are important. Just like their biological counterparts, silicon neurons in neuromorphic devices produce spike trains at a rate that is proportional to their input current. At the input side, synapse circuits integrate the spikes they receive to produce analog currents, with temporal dynamics and time constants that can be made equivalent to their biological counterparts. The sum of all the positive (excitatory) and negative (inhibitory) synaptic currents afferent to the neuron is then injected into the neuron.
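This rate-coding behavior can be sketched in software with a simple discrete-time leaky integrate-and-fire neuron (the time constant, threshold, and input currents below are illustrative values, not parameters of any particular chip):

```python
def lif_spike_count(i_in, steps=1000, tau=20.0, v_th=1.0, dt=1.0):
    """Discrete-time leaky integrate-and-fire neuron: leak-integrate the
    input current, emit a spike and reset when the threshold is crossed.
    The resulting spike count grows with the input current."""
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += dt * (-v / tau + i_in)
        if v >= v_th:
            spikes += 1
            v = 0.0
    return spikes

low = lif_spike_count(0.06)   # weaker input  -> lower firing rate
high = lif_spike_count(0.10)  # stronger input -> higher firing rate
```

Inputs whose steady-state membrane potential stays below the threshold produce no spikes at all, which is the leaky neuron's built-in noise rejection.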
To provide biomedical signals to the synapses of the SNN input layer, it is necessary to first convert them into spikes. A common way to do this is to use a delta-modulator circuit 179,183 functionally equivalent to the one used in the Dynamic Vision Sensor (DVS) 184. This circuit, in practice, is an ADC that produces two asynchronous digital pulse outputs (UP or DOWN) for every biosignal channel in the input. An UP (DOWN) spike is generated every time the difference between the current and the previous value exceeds a pre-defined threshold, with the sign of the difference determining whether the spike is produced on the UP or the DOWN channel. This approach was used to convert EMG signals in mixed-signal neuromorphic chips 175,176 and in digital ones 182,185, as well as ECG signals 177,178 and EEG and HFO signals 179,183.
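A behavioral software model of this encoding scheme can be written in a few lines (the threshold and the toy sine "biosignal" below are illustrative choices, not values from the cited designs):

```python
import numpy as np

def delta_modulate(signal, threshold):
    """Convert a sampled signal into asynchronous UP/DOWN event trains, as a
    delta-modulator ADC does: emit an event whenever the input moves more
    than `threshold` away from the last reconstructed level."""
    up, down = [], []
    level = signal[0]
    for i, x in enumerate(signal[1:], start=1):
        while x - level >= threshold:
            up.append(i)
            level += threshold
        while level - x >= threshold:
            down.append(i)
            level -= threshold
    return up, down

# Toy "biosignal": one slow sine cycle sampled at 500 points.
t = np.linspace(0.0, 1.0, 500)
sig = np.sin(2.0 * np.pi * t)
up, down = delta_modulate(sig, threshold=0.1)
```

Only changes generate events, so a slowly varying channel produces very few spikes; this data-driven sparsity is what makes the scheme attractive for low-power front-ends.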

Adaptation in neuromorphic processors
Local adaptation is an important aspect of extreme edge computing, especially when it comes to wearable devices. Current methods for training networks on biomedical signals rely on large datasets collected from different patients. However, when it comes to biological data, there is no "one size fits all": each person has their own unique biological signature. Therefore, the field of Personalized Medicine (PM) has gained considerable attention in the past few years, and the online on-edge adaptation feature of neuromorphic chips can be a game changer for PM.
As discussed in Section 3.1, there is considerable effort in designing spike-based online learning algorithms that can be implemented on neuromorphic chips.
Examples of today's state of the art in on-chip learning are Intel's Loihi 10, the DynapSEL and ROLLS chips from UZH/ETHZ 168,186, BrainScaleS from Heidelberg 11, and ODIN from UCLouvain 13. Intel's Loihi includes a learning engine which can implement different learning rules, such as simple pairwise STDP, triplet STDP, reinforcement learning with synaptic tag assignments, or any three-factor learning rule. DynapSEL, ROLLS and ODIN implement SDSP, also known as the Fusi learning rule, which is a form of semi-supervised learning that can support both unsupervised clustering applications and supervised learning with labels for shallow networks 173. The BrainScaleS chip implements the STDP rule. Moreover, SpiNNaker 1 and 2 170,187 can implement a wide variety of on-chip learning algorithms, since their designs make use of ARM microcontrollers providing a high degree of configurability for the users.
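As a reference point, the pairwise STDP window that several of these chips support can be written in a few lines (the amplitudes and time constant below are generic textbook values, not chip-specific parameters):

```python
import numpy as np

def pairwise_stdp(dt, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Pairwise STDP window: potentiate when the pre-synaptic spike precedes
    the post-synaptic one (dt = t_post - t_pre > 0), depress otherwise,
    with magnitude decaying exponentially in the spike-time difference."""
    if dt > 0:
        return a_plus * np.exp(-dt / tau)
    return -a_minus * np.exp(dt / tau)

dw_causal = pairwise_stdp(+10.0)   # pre before post -> potentiation
dw_acausal = pairwise_stdp(-10.0)  # post before pre -> depression
```

The rule is fully local in time and space: each synapse only needs the relative timing of its own pre- and post-synaptic spikes, which is why it maps so naturally onto on-chip learning engines.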

Open challenges
Generally, implementing on-chip online learning is challenging for two core reasons: locality of the weight update and weight storage.
Locality. The learning information for updating the weights of any on-chip network should be locally available to the synapse, since otherwise this information would have to be "routed" to the synapse by wires, which would take a significant amount of chip area. The simplest form of learning which satisfies this requirement is Hebbian learning, which has been implemented on a variety of neuromorphic chips in forms of unsupervised/semi-supervised learning 11,13,168,186. However, Hebbian-based algorithms are limited in the tasks they can learn, and to the best of our knowledge no large-scale task has been demonstrated using this rule. Since gradient-descent-based algorithms such as Backprop have had great success in deep learning, more and more spike-based error Backprop rules are being developed, as discussed in Section 3.1. These types of learning algorithms have recently been custom designed in the form of a spike-based delta rule as the backbone of the Backprop algorithm. For example, a single-layer implementation of the delta rule has been designed in 188 and employed for EMG classification 176. Expanding this to multi-layer networks involves non-local weight updates, which limits its on-chip implementation. Making the Backprop algorithm local is a topic of ongoing research, which we have discussed in Section 3.1. Recently, a multi-layer perceptron error-triggered learning architecture has been proposed to overcome the non-locality of multi-layer networks, solving the spatial credit assignment problem on chip 106,189.

Weight storage. The ideal weight storage for online on-chip learning should have the following properties: (i) non-volatility, to keep the state of the learnt weights even when the power shuts down, reducing the time and energy footprints of reloading the weights onto the chip; (ii) linear update, which allows the state of the memory to change linearly with the calculated update; (iii) analog states, which allow full precision for the weights. Non-volatile memristive devices have been proposed as a strong candidate for weight storage, and there is a large body of work combining CMOS technology with memristive devices to get the best of both worlds.
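The locality requirement discussed above can be made concrete with the single-layer delta rule: every weight update uses only the pre-synaptic activity and the post-synaptic error, both available at the synapse. The sketch below is a plain NumPy illustration of the rule, not the spike-based hardware design of 188:

```python
import numpy as np

rng = np.random.default_rng(1)

def delta_rule_step(W, x, target, lr=0.1):
    """Single-layer delta rule: each weight W[i, j] is updated using only the
    local pre-synaptic activity x[j] and the post-synaptic error e[i] --
    exactly the locality that makes the rule on-chip friendly."""
    e = target - W @ x
    return W + lr * np.outer(e, x)

# Learn a 2x3 linear map from random input/target examples.
W_true = np.array([[1.0, -2.0, 0.5],
                   [0.0,  1.0, 1.0]])
W = np.zeros_like(W_true)
for _ in range(2000):
    x = rng.normal(size=3)
    W = delta_rule_step(W, x, W_true @ x)
```

Extending this to hidden layers requires propagating the error backwards, which is precisely the non-local step that complicates on-chip implementations.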
In the next Section we provide a thorough review of the state of the art in emerging memory devices and of the efforts to integrate and use them in conjunction with neuromorphic chips.

Memristive devices and computing
The severe power and area constraints under which a neuromorphic processor for edge computing must operate have opened ways towards the investigation of beyond-CMOS solutions. Despite still being at the dawn of their technological development, memristive devices have been drawing attention in the last decade thanks to their scalability, low-power operation, compatibility with CMOS chip power supplies and the CMOS fabrication process, and volatile/non-volatile properties. In Section 5.1, we will introduce memristive devices and the properties that are appealing for adaptive extreme edge computing paradigms. In Section 5.2, we will explore the role of memristive devices in neuromemristive systems and give examples of possible applications. In Section 5.3, we will discuss the current challenges and the future perspectives of memristive technology.

Conventional and wearable memristive devices
Memristive devices, as the name suggests, are devices which can change and memorize their resistance states. They are usually two-terminal devices, but can be implemented with various physical mechanisms, resulting in versatile existing forms, e.g. resistive random access memory (RRAM, Fig. 3a and 3b) ( 25 ), phase change memory (PCM, Fig. 3c) ( 190 ), magnetic random access memory (MRAM, Fig. 3d and Fig. 3e) ( 191 ), ferroelectric tunneling junction (FTJ, Fig. 3f) ( 192 ), etc. The resistance memory of these devices can mimic the memory effect of the basic components of the biological neural system, while the resistance change can mimic the plasticity of biological synapses. Thanks to the simplicity of their two-terminal configuration and their scalability to the nanoscale, they are inherently suitable for the hardware implementation of brain-inspired computation materializing an artificial neural network, i.e. neuromorphic computation ( 193,194 ). This notion has in recent years incited wide investigations of various memristive devices and of their applications in neural network learning and recognition, or, in short, memristive learning ( [195][196][197][198][199][200] ). Memristive learning can enable energy-efficient and low-latency information processing within compact systems that abandon the conventional von-Neumann architecture. Among other benefits, this will also make it possible to process information where it is acquired, i.e. within sensors, and to reduce the bandwidth needed for transferring the sensor data to data centers, accelerating the coming of the era of the Internet-of-Things (IoT). Table 2 summarizes the key features of the main memristive device technologies for neuromorphic / wearable applications in terms of cell area, electrical characteristics, main advantages, and challenges. It is worth noticing that some figures of merit in this context are radically different from standard memory requirements. Indeed, while in the memory scenario higher read currents enable faster reading, in neuromorphic applications currents as low as possible are preferred, since the current is a limiting factor for the neurons' fan-out. Similarly, SET and RESET times should be as fast as possible in memory applications, while in our applications this requirement can be relaxed thanks to the lower operating frequency of the neurons (20 Hz to 100 Hz). Moreover, the number of achievable conductance levels has to be increased ( 201 ). Some non-idealities which are usually detrimental in memory applications, for instance the stochasticity of switching parameters, can even be beneficial for neural networks.
In addition to the commonly referred non-volatile type of memristive switching, RRAM devices can also show volatile behavior, which usually occurs when active materials such as silver or copper are used as electrodes. The relatively long retention time of this volatile behavior (tens of milliseconds to seconds) is similar to the timescale of short-term memory, and it was thus naturally proposed to mimic the short-term memory effect of biological synapses ( 20,23,218 ).
Although most research on memristive devices is carried out on rigid silicon substrates, the simple structure of memristive devices can also be realized on flexible substrates ( 219 ), which opens new interesting possibilities for realizing local computation within wearable devices ( 220,221 ).

Memristive neural components
As mentioned in Section 5.1, the primary function of memristive devices is their use as synaptic devices implementing the memory and plasticity of biological synapses. However, there is increasing interest in utilizing these devices to implement nanoscale artificial neurons. On the neuron side, the gradual internal state change of the memristive device and its consequent abrupt switching closely mimic the integrate-and-fire behavior of biological neurons ( 222,224,225, Fig. 4a-c). Due to their simple structure and nanometer-scale scalability, memristive neurons can be much more compact than current CMOS neurons, which might consist of a current sensor, an analog-to-digital converter (ADC), a digital-to-analog converter (DAC), and capacitors, all of which are expensive to implement in current CMOS technology in terms of area and/or power consumption ( 226 ). The implementation of memristive neurons will also enable fully memristive neuromorphic computing ( 227 ), which promises further increases in the integration density of neuromorphic computing hardware.
On the synaptic side, the key feature of biological synapses is their plasticity, i.e. tunable weight, which can generally be implemented by resistance or conductance modification in the memristive devices (Fig. 4d). Fundamental learning rules based on STDP have already been widely explored ( 196,(228)(229)(230)(231) ). Spatial spiking pattern recognition ( 232 ), spiking coincidence detection ( 233,234 ), and spatio-temporal correlation ( 223,235 ) have been reported recently. Synaptic metaplasticity, such as paired-pulse facilitation, can also be achieved via various device operation mechanisms ( 20,236,237 ).

Memristive neural network architectures
There are generally two approaches for a hardware neuromorphic system implementing memristive devices as synapses: (i) deep learning accelerators, accelerating artificial neural network computing with multiple layers and error back-propagation, as well as its variations such as convolutional and recurrent neural networks; (ii) brain-like computing, attempting to closely mimic the behaviors of the biological neural system, such as spike representation (Fig. 4d) and collective decision-making behavior. In the deep learning accelerator approach, online training places more stringent requirements on the memristive synapses. For instance, a linear and symmetric weight update is crucial for online training ( 200,238 ), while offline training can ignore it, since the synaptic weight can be programmed into the memristive device with fine tuning and iterative verification ( 239 ).
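The weight-update non-linearity that online training must contend with can be modeled with a simple saturating conductance response (the exponential form and the nonlinearity parameter below are a common phenomenological choice, not a model of any specific device):

```python
import numpy as np

def apply_set_pulses(g, n_pulses, g_max=1.0, nonlinearity=5.0):
    """Phenomenological memristive conductance response to identical SET
    pulses: the increment shrinks as g approaches g_max, so updates are
    large near g = 0 and saturate near the top of the range."""
    alpha = 1.0 - np.exp(-1.0 / nonlinearity)
    for _ in range(n_pulses):
        g += (g_max - g) * alpha
    return g

step_early = apply_set_pulses(0.0, 1)                             # first pulse
step_late = apply_set_pulses(0.0, 10) - apply_set_pulses(0.0, 9)  # tenth pulse
```

Because identical pulses produce very different weight increments depending on the current conductance, a naive gradient update is distorted, which is why linearization schemes such as the differential pair discussed in Section 5.3 are needed for online training.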
Collective decision making is an important feature of brain computing, which requires high parallelism and, consequently, low-current devices. For instance, this feature is essential for Hopfield neural networks ( 240 ), cellular neural networks ( 241 ), and coupled oscillators ( 242 ). In a Hopfield neural network, the system automatically evolves to its energy minima, yielding the functionality of associative memory. The use of Hopfield-like recurrent neural networks (RNNs) with memristive devices has already been successfully demonstrated in a variety of tasks ( 243,244 ). As an example of a memristive coupled-oscillator network, 245 used a network of self-sustained van der Pol oscillators coupled with oxide-based memristive devices to investigate the temporal binding problem, a well-known issue in the field of cognitive neuroscience. In this experiment, the network is able to emulate an optical illusion which shows two patterns depending on the influence of attention. This means that the network is able to select relevant information from a pool of inputs, as in the case of a system collecting signals from multiple sensors.
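The associative-memory behavior of a Hopfield network can be sketched in a few lines, independently of its memristive implementation: Hebbian weights store ±1 patterns, and repeated thresholded updates let a corrupted probe relax toward the nearest stored pattern.

```python
import numpy as np

def hopfield_recall(patterns, probe, steps=10):
    """Hopfield associative memory: Hebbian weights store +/-1 patterns and
    synchronous sign updates let the state relax toward the stored pattern
    closest to the probe (descending the network energy)."""
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0.0)    # no self-connections
    s = probe.astype(float)
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

# Store two orthogonal 8-bit patterns and recall from a corrupted probe.
p1 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
p2 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
probe = p1.copy()
probe[0] = -probe[0]            # flip one bit
recalled = hopfield_recall([p1, p2], probe)
```

The recurrent matrix-vector product at the heart of each update is exactly the operation that a memristive crossbar computes in parallel in a single read step.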

Applications of memristive neural networks
At present, memristive technology has mainly been used in relatively simple networks with Hebbian-based learning algorithms. More recently, however, systems capable of solving different tasks, such as speech recognition ( 246 ), and exploring different architectures and learning algorithms are being investigated. In particular, the benefits of exploiting sparsity, mentioned in Section 3.2, have been demonstrated for feature extraction and image classification in networks trained with stochastic gradient descent and winner-take-all learning algorithms ( 247 ), as well as in hierarchical temporal memory, which does not need training ( 248 ).
In recent years, memristive devices have been used in applications closer to biology, enabling hybrid biological-artificial systems ( 249 ) and investigating biomedical applications ranging from speech and emotion recognition ( 250 ) to biosignal ( 251 ) and medical image ( 252 ) processing. Finally, an interesting application is that of memristive biosensors, which were used by 253 to implement a system for cancer diagnostics. This innovative use of memristive properties was demonstrated in hardware and opens the way to a broader use of memristive technology where sensors and computing co-exist in the same system or, possibly, in the same device.

Device non-idealities
The implementation of mainstream deep learning algorithms with the Backprop learning rule and memristive synapses imposes several requirements on the memristive device, including a linear current-voltage relation for reading, analog conductance tuning, linear and symmetric weight updates, long retention time, high endurance, etc. ( 254 ). However, no single device can fulfill all these requirements simultaneously.
Various techniques have been proposed to compensate for the device non-idealities. For instance, to compensate for the non-linear current-voltage relation during reading, a fixed read voltage with variable pulse width or pulse number can be used for synaptic weight reading, with the readout represented by the charge accumulated at the output nodes ( 255 ). A linear and symmetric weight update is crucial for accurate online learning of a memristive multilayer neural network with the Backprop learning rule ( 238 ). However, PCM devices usually only show gradual switching in the set direction (weight potentiation), while RRAM devices show gradual switching in the reset direction (weight depression). To achieve a linear and symmetric weight update, a differential pair of two such devices is usually used. For a differential pair of two PCM devices, potentiation is achieved by applying set pulses to the positive device and depression by applying set pulses to the negative device, so that gradual weight updates can be achieved in both directions. To further enhance the linearity of the weight update, a minor conductance pair consisting of capacitors can be used for frequent but smaller weight updates, which are periodically transferred to the major pair ( 200 ). Another option to improve device linearity is limiting the device dynamic range to a region far from saturation where the weight update is linear 256,257.
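A behavioral sketch of the PCM differential-pair scheme makes the idea concrete (the conductance values, step size, and pulse granularity below are illustrative assumptions, not measured device data):

```python
def program_differential_pair(g_pos, g_neg, delta_w, g_step=0.01, g_max=1.0):
    """Weight update with a PCM differential pair: the effective weight is
    g_pos - g_neg, and both potentiation and depression use gradual SET
    pulses only (partial RESET is unavailable), applied to the positive or
    the negative device respectively."""
    n_pulses = int(round(abs(delta_w) / g_step))
    if delta_w > 0:
        g_pos = min(g_pos + n_pulses * g_step, g_max)
    else:
        g_neg = min(g_neg + n_pulses * g_step, g_max)
    return g_pos, g_neg

g_pos, g_neg = 0.5, 0.5   # start with effective weight 0
g_pos, g_neg = program_differential_pair(g_pos, g_neg, +0.05)  # potentiate
w_after_up = g_pos - g_neg
g_pos, g_neg = program_differential_pair(g_pos, g_neg, -0.12)  # depress
w_after_down = g_pos - g_neg
```

One practical consequence of the scheme is visible in the sketch: both conductances only drift upward, so a periodic refresh that rewrites the pair back to its differential value is eventually needed before either device saturates at g_max.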
In addition to mitigating the non-idealities of memristive devices, more and more research efforts are being made to exploit these non-idealities for brain-like computation. For instance, the stochasticity or read noise of memristive devices can be used for probability computation in restricted Boltzmann machines ( 258 ), or to escape from local minima in a Hopfield neural network ( 259 ). Ag-filament-based resistive switching devices show short retention times and fast switching dynamics, and were thus proposed for reservoir computing ( 260 ) and spatiotemporal computing ( 218 ) to process time-encoded information.

Co-integration of hybrid CMOS-memristive neuromorphic systems
The main steps to be taken to exploit the full potential of an ASIC for an end-to-end processing system go through the integration of memristive devices and sensors with CMOS technology. Indeed, the works presented so far are based either on simulations, on real device data, or on memristive chips interfaced with standard digital hardware. Although the integration of non-volatile resistive switching devices with CMOS technology has already been demonstrated at a commercial level ( 261,262 ), the design of co-integrated memristive-based neuromorphic processors is still under development. We envisage a three-phase process to achieve a fully integrated system.
The first phase is the co-integration of non-volatile memristive devices with peripheral circuits ( 263 ) to implement logic and multiply-and-accumulate (MAC) operations ( 264 ), which reached maturity with the demonstration of a fully co-integrated SNN with analog neurons and memristive synapses ( 265 ). The second phase is the co-integration of different technologies. Although this approach results in higher fabrication costs, it presents several advantages in terms of system performance, which can be more compact and potentially more power efficient. In particular, the co-integration of non-volatile and volatile memristive devices can lead to a fully memristive approach. As an example, 227 exploit volatile memristive devices to emulate stochastic neurons and non-volatile memristive devices to store the synaptic weights on the same chip, thus demonstrating the feasibility and the advantages of the dual-technology co-integration process. Eventually, the final step to be taken in the development of a dedicated ASIC for wearable edge computing is the co-integration of sensors and memristive-based systems. 266 tackled this challenge by designing and fabricating a gas sensing system capable of gas classification. The system uses RRAM arrays as memory and carbon nanotube field-effect transistors (CNFETs) for computation and gas sensing, both 3D monolithically integrated on CMOS circuits, which carry out computation and allow memory access.

Learning with memristive devices
Adaptability is a feature of paramount importance in smart wearable devices, which need to be able to learn the unique features of their user. This calls for the implementation of lifelong learning paradigms, i.e. the ability to continuously learn new features from experience. Typically, a network has a limited memory capacity that depends on its size and architecture. Once the maximum number of experiences is recorded, newly learned features erase old ones, giving rise to the phenomenon of catastrophic forgetting.
The problem of an efficient implementation of continual learning has been thoroughly investigated (267). In the current scenario, a dichotomy exists between backprop-based ANNs, which have very high accuracy but limited memory capacity, and brain-inspired SNNs, which feature higher memory capacity thanks to their greater flexibility, but at the cost of lower accuracy. Models used to overcome forgetting are described in Section 3.3. The use of memristive devices in such networks is still an open point. It is possible that memristive devices will be beneficial to increase the network capacity (268) at no extra computational cost thanks to their slow approach to the conductance boundaries (269), but so far this topic remains largely unexplored. An interesting approach is proposed by ref. 270, where the key strengths of supervised convolutional ANNs, unsupervised SNNs, and memristive devices are combined in a single system. The results indicate that this approach is robust against catastrophic forgetting, while reaching 93% accuracy when tested with both trained and non-trained classes.
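The "selectively slowing down weight changes" strategy for overcoming forgetting can be sketched as a per-weight quadratic anchor in the spirit of elastic weight consolidation. This is a minimal toy model: the loss, importance values, and learning rate below are invented for illustration and do not come from the cited works.

```python
# Toy sketch of consolidation-based continual learning: after task A,
# each weight is anchored by a quadratic penalty whose strength reflects
# its importance for task A, so training on task B cannot freely
# overwrite weights that task A relies on.

def penalized_grad(grad_b, w, w_star, importance, lam=1.0):
    """Task-B gradient plus the consolidation pull back toward the
    task-A solution w_star, scaled per weight by its importance."""
    return [g + lam * f * (wi - wsi)
            for g, wi, wsi, f in zip(grad_b, w, w_star, importance)]

def sgd_step(w, grad, lr=0.1):
    return [wi - lr * g for wi, g in zip(w, grad)]

w_star = [1.0, -0.5]      # weights after learning task A
importance = [10.0, 0.0]  # weight 0 matters for task A, weight 1 does not
w = list(w_star)
for _ in range(100):      # task B's loss 0.5*||w||^2 pulls both toward 0
    grad_b = list(w)      # gradient of 0.5*||w||^2 is w itself
    w = sgd_step(w, penalized_grad(grad_b, w, w_star, importance))
# The important weight settles near its task-A value (10/11 ~ 0.91),
# while the unimportant one moves freely to task B's optimum (~0).
```

The same trade-off appears in the probabilistic formulation of Figure 2(c), where the task-A posterior becomes a prior that confines subsequent weight updates.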

Discussion and Conclusions
In this study, we presented the state-of-the-art core elements that enable the development of wearable devices with extreme edge adaptive computing capability. Various sensors that can collect different bio-signals from the human body were investigated.

A variety of sensing specifications in terms of size, resolution, mechanical flexibility, and output signals need to be considered, along with the analogue readout circuits, under a limited power budget. However, when the real-time processing of these signals is deployed at the edge, severe constraints arise in terms of power efficiency, fast response times, and accuracy in data classification. The widely used solutions are to find a trade-off between energy and computational capacity, or to send the data to the cloud. However, these strategies are not ideal and slow down the development of wearable smart sensing. To meet all the requirements, every element of the platform needs to be optimized in synergy with the others, across every aspect of the design, from the learning algorithms to the architecture.
In particular, continual learning is required for adaptive wearable devices. In this respect, brain-inspired algorithms promise to be valid alternatives to standard machine learning approaches such as Backprop and BPTT. The exploitation of sparsity in network connectivity increases power efficiency by optimizing the use of the available memory. However, the problem of algorithmic robustness to non-ideal hardware (affected by noise and variability) and the problems of forgetting and of information transfer between tasks still persist, and have to be solved in combination with neuromorphic and emerging technologies. SNNs are conceptually ideal for low-power in-memory computing. Their event-based approach, together with the use of analog subthreshold circuits to reproduce biological timescales, allows fast response times of the network while enabling smooth real-time processing of data. The encoding of the incoming signals into spikes is, however, still challenging. Moreover, a fully CMOS-based approach has two major technological issues. First, the synaptic weights are usually stored in SRAMs, which hold their state only in the presence of a power supply. Second, the capacitors used to implement biological time constants are massive and may consume up to 60% of the chip area. Memristive technology can be beneficial in this respect: non-volatile devices can potentially replace SRAMs, and volatile devices offer a compact alternative to CMOS capacitors. Besides low-power operation in a small footprint, memristive devices also exhibit noisy properties which, if exploited in the right way, might facilitate the implementation of stochastic learning algorithms. However, the technology is still in its infancy and fabrication processes are still under development, yielding high device variability, which makes it difficult to produce reliable multi-bit memory.
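One widely used family of signal-to-spike conversion schemes is delta modulation, which emits sparse UP/DOWN events only when the signal changes appreciably. The short sketch below illustrates the principle; the threshold and the signal values are invented for the example and are not tied to any specific sensor discussed here.

```python
# Illustrative sketch: delta-modulation encoding of a sampled sensor
# signal into sparse UP/DOWN spike events, one common way to feed
# analog wearable signals into an event-driven SNN.

def delta_encode(samples, threshold=0.1):
    """Emit (sample_index, +1/-1) events whenever the signal moves more
    than `threshold` away from the last encoded level. Flat stretches of
    the signal produce no events at all, which is the source of the
    sparsity (and hence the power saving) of event-based front ends."""
    events, level = [], samples[0]
    for i, x in enumerate(samples[1:], start=1):
        while x - level >= threshold:   # signal rose past the next level
            level += threshold
            events.append((i, +1))
        while level - x >= threshold:   # signal fell past the next level
            level -= threshold
            events.append((i, -1))
    return events

signal = [0.0, 0.05, 0.22, 0.31, 0.18, 0.0]
events = delta_encode(signal)  # a handful of events instead of 6 samples
```

Note that the encoder transmits only changes, so a slowly varying bio-signal generates events at a rate proportional to its activity rather than to the sampling rate.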
In summary, the ultimate goal towards smart wearable sensing with edge computing capabilities relies on a bespoke platform consisting of embedded sensors, a front-end circuit interface, a neuromorphic processor, and memristive devices. This platform requires high compatibility of existing sensing technologies with CMOS circuitry and memristive devices, to move the intelligent algorithms into the wearable edge without significantly increasing the energy cost. New solutions are needed to enhance the performance of local adaptive learning rules to be competitive with the accuracy of Backprop. Novel encoding techniques that allow seamless communication from sensors to the neuromorphic chip have to be developed and flanked by efficient event-based algorithms. So far there is no uniquely ideal solution, but we envisage that a holistic approach, where all the elements of the system are co-designed as a whole, is the key to building low-power end-to-end real-time adaptive systems for next-generation smart wearable devices.

Figure 1. A graphical overview of adaptive edge computing in wearable biomedical devices. The figure shows the pathway from wearable sensors to their application through intelligent learning.

Figure 2. Biologically inspired models of learning in spiking neural networks. (a) The e-prop algorithm (95) approximates back-propagation through time using random feedback to propagate error signals to the synapses of a recurrent SNN (adapted from 96). (b) Synaptic sampling (97) exploits the variability of learning rules and redundancy in the task solution space to learn sparse and robust network configurations (adapted from 98). (c) Overcoming forgetting by selectively slowing down weight changes (99). After learning a first task A, parameter distributions are absorbed into a prior distribution that confines the mobility of synaptic weights in subsequent tasks (task B).

Figure 4. Memristive devices as synapses or neurons for neuromorphic computing. (a)-(c) A memristive device acts as a threshold device for the firing function of a biological neuron (222, reproduced under the CC BY license). (d) Conceptual illustration of a memristive device as an artificial synapse for brain-like neuromorphic computing (223, reproduced under the CC BY-NC license).

Table 1. Summary of neuromorphic platforms and biomedical applications.

Table 2. Key features of non-volatile memristive devices.