How the credit assignment problems in motor control could be solved after the cerebellum predicts increases in error

We present a cerebellar architecture with two main characteristics. The first one is that complex spikes respond to increases in sensory errors. The second one is that cerebellar modules associate particular contexts where errors have increased in the past with corrective commands that stop the increase in error. We analyze our architecture formally and computationally for the case of reaching in a 3D environment. In the case of motor control, we show that there are synergies of this architecture with the Equilibrium-Point hypothesis, leading to novel ways to solve the motor error and distal learning problems. In particular, the presence of desired equilibrium lengths for muscles provides a way to know when the error is increasing, and which corrections to apply. In the context of Threshold Control Theory and Perceptual Control Theory we show how to extend our model so it implements anticipative corrections in cascade control systems that span from muscle contractions to cognitive operations.


Introduction
The anatomy of the cerebellum presents a set of well-established and striking facts [26,40], which have inspired a variety of functional theories over the years. In one of the first and most influential such theories, developed by a succession of researchers [61,1,43], the convergence of mossy fibers (which carry sensory and motor signals into the cerebellum) onto Purkinje cells supports pattern recognition in a manner similar to a perceptron. This pattern recognition capacity is used to improve motor control, and the Marr-Albus-Ito hypothesis states that the other major cerebellar input, the climbing fibers, provides a training signal that, thanks to conjunctive LTD at the parallel fiber synapses onto Purkinje cells, allows the right patterns to be selected. Within this general framework, a persistent challenge is determining what the right patterns are and how they are used to improve motor control. This clearly depends, among other things, on what triggers climbing fiber activity: different assumptions about the signal content of climbing fibers can lead to very different models.
For example, Ito [38] introduced a model where cerebellar microcomplexes (complete modular circuits) were inserted in reflex arcs and in the command systems of motor control, giving them the ability to learn adaptively. He also speculated that microcomplexes could provide adaptive learning for non-motor cortical circuits. This idea was refined in a model where the cerebellum formed inverse models of controlled objects, using the output of a feedback controller as a learning signal [47] (figure 1B). The difference between the desired trajectory and the actual trajectory is a sensory error, and the feedback controller essentially acts as a linear transformation that converts that sensory error into motor coordinates. A recent review [42] examined the signal content of climbing fibers and raised the possibility that climbing fibers carry sensory errors when a forward model is being trained (figure 1A), whereas motor errors are used to train inverse models. The forward models considered in [42] predicted the response of the controlled object.
Within the broad class of error-correcting models of the cerebellum (i.e., models in which the climbing fibers signal an error), there is the further problem of knowing which actuators to stimulate in order to reduce the error. This is a complex issue in motor control, where the error could be in the hand position and the actuators are the arm muscles. The model in [47] and subsequent models [83,37] address this issue by making the cerebellum learn an inverse model of the controlled system, which can then be used to translate a desired effect on one part of the system (e.g., the hand) directly into motor commands for another (e.g., the arm muscles). Nevertheless, it is unclear whether the cerebellum is indeed an inverse model [25], and more generally whether it is computationally necessary to develop an inverse model, which can be quite complex. For example, such an inverse model would need considerable flexibility to adapt to a wide variety of changes in the motor plant, such as fatigue or additional loads.
In this paper, we provide an alternative to the inverse model approach by leveraging two simple assumptions, which lead to a class of models with particular benefits. The first assumption is that climbing fiber activity arises in response to increases of an error measure over time, not just to raw instantaneous error values. The second assumption is that cerebellar microcomplexes implement forward models that predict the subsequent response of a central controller to encountered error conditions (figure 1C). The error signal we use is similar to the error signal in motor coordinates presented in [47] in that it can arise from an error in a feedback control system, but it uses sensory coordinates. These sensory coordinates, being part of the control loop, are linearly related to the motor coordinates. This is consistent with the fact that both sensory and motor information are present in complex spikes [51,90]. Using the onset of sensory error increase as the signal encoded in complex spikes, Purkinje cells (PCs) can learn a forward model of the controller that performs anticipative corrections, but only stores associations for those sensorimotor contexts where errors tend to increase instead of decreasing. These contexts indicate an insufficiency of the motor control mechanisms, and thus mark the times when the cerebellum is needed to provide anticipatory corrective action.
Moreover, this climbing fiber signal does not require high firing rates, and the magnitude of the error correction could be obtained through several mechanisms, such as cumulative learning over time, graded complex spikes [67,101], or complex spike synchrony. In addition, by having the cerebellum model the response of a central controller (not necessarily a feedback controller), we solve the problem of deciding which actuators to stimulate in order to reduce the error. We therefore don't require inverse or forward models of the controlled objects, as long as the central controller implements the right actions, even if in an untimely manner.
We make these ideas concrete in three ways. First, we review part of the cerebellum literature supporting the idea that the cerebellum could function by associating corrections with the contexts where they are needed, and that complex spikes could provide the learning signal mentioned above. Second, we implement a realistic physical simulation of a 3D reaching task with a central motor controller based on a variant of the Equilibrium Point Hypothesis, together with a computational implementation of our cerebellar correction system. Simulations of this computational model show how the cerebellum gradually reduces errors when reaching. Finally, we present a mathematical formulation of our cerebellar correction system, and prove that it can improve performance in the reaching task under particular assumptions.
Our computational simulations can use one of two types of error signals for the cerebellum. The first type of error is the distance between the hand and the target, which proves sufficient to obtain predictive corrections. Interestingly, by using the Equilibrium Point hypothesis [30] we can alternatively use a second type of error signal, generated for individual muscles that lengthen when they should be contracting. This allows the cerebellum to perform anticipative corrections in a complex multidimensional task like reaching, using learning signals that arise from one-dimensional systems. This learning mechanism can readily be extended to serial cascades of feedback control systems, such as those posited by Perceptual Control Theory [77,78] and Threshold Control Theory [30,54], allowing the cerebellum to perform corrections at various levels of a hierarchical organization spanning from individual muscle contractions to complex cognitive operations. We elaborate on this in the Discussion.

Results
We hypothesize that the role of the cerebellum in motor control is to associate afferent and efferent contexts with movement corrections produced by a central controller. The role of the central controller is to reduce error, and the role of the cerebellum is to apply the corrections of the central controller anticipatively. The main idea is described in Figure 2. Before an incorrect motion is made (moving the hand away from the target), the mossy fibers reaching the granule layer carry afferent and efferent information that could predict when this error will occur. When the error does increase during a reach, this is signaled by complex spikes, while the central motor controller acts to correct the error. Using climbing fiber activity as the training signal, the cerebellum associates the afferent and efferent information present in granule cells shortly before the increase in error with the motor actions required to correct it. The corrective motor actions are those that the central motor controller produces in order to stop the error from increasing, which come shortly after the onset of the error increase; thus the cerebellum doesn't have to compute those actions itself, it can merely remember what the central controller did. This idea is related to Fujita's feedforward associative learning model [32], and is not far from the paradigm of the cerebellum as an adaptive filter [21]. We formally validated our assumption about cerebellar function in the case of reaching by showing that if the central controller acts like a force always pointing at the target, whose magnitude depends only on the distance between the hand and the target, then an idealized cerebellar controller will necessarily reduce the energy of the system, resulting in smaller oscillation amplitudes and less angular momentum. This result is presented in the first part of the Models section. The remainder of this section explains the biological plausibility of the model's assumptions and presents simulation results.
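The energy argument summarized above can be illustrated with a toy simulation. The sketch below is not the formulation from the Models section: it assumes a unit point mass in a plane, a central controller modeled as a spring force of magnitude k·r pointing at the target, and an idealized correction that adds a constant inward force of magnitude c only while the hand-target distance is increasing (applied at that moment rather than anticipatively). The function name `final_energy` and the parameters `k` and `c` are hypothetical.

```python
import math


def final_energy(correct, k=1.0, c=0.5, dt=1e-3, steps=20000):
    """Hypothetical toy: final mechanical energy of a unit point mass.

    The central controller is a force of magnitude k*r pointing at the target
    (placed at the origin). If `correct` is True, an extra inward force of
    magnitude c is added whenever the hand-target distance is increasing,
    i.e. only while the error grows.
    """
    x, y = 1.0, 0.0    # initial hand position relative to the target
    vx, vy = 0.0, 0.2  # tangential velocity, so the hand misses the target
    for _ in range(steps):
        r = math.hypot(x, y)
        ux, uy = x / r, y / r             # unit vector from target to hand
        fx, fy = -k * x, -k * y           # central controller
        radial_speed = vx * ux + vy * uy  # d(error)/dt
        if correct and radial_speed > 0:
            fx -= c * ux                  # push toward the target while error grows
            fy -= c * uy
        vx += fx * dt                     # semi-implicit Euler step
        vy += fy * dt
        x += vx * dt
        y += vy * dt
    return 0.5 * (vx ** 2 + vy ** 2) + 0.5 * k * (x ** 2 + y ** 2)


if __name__ == "__main__":
    print("energy without correction:", final_energy(False))
    print("energy with correction:   ", final_energy(True))
```

The added force does no work while the distance shrinks and removes energy while it grows (its power is -c times the radial speed). Because it is purely central, it leaves angular momentum unchanged, so this toy illustrates only the energy part of the argument.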

Biological basis of the model
In this section we further describe the model and its biological underpinnings. We consider four main regions of an olivo-cerebellar circuit, namely the inferior olivary nucleus, the granule layer, the Purkinje cells, and the cerebellar nuclei. Putative functions of these regions are described next.

The inferior olivary nucleus and complex spikes
What the climbing fibers (CF) encode is still a contentious issue, and different assumptions lead to different models of cerebellar function. One set of assumptions is that CF activity encodes performance errors involving the neuronal circuits of the PCs receiving those CFs. CF activity has indeed been found to be related to performance errors and unpredicted perturbations [85,10,48,100], but it has also been found to correlate with both sensory and motor events, so the nature of what is being encoded remains controversial [10,2,57]. For example, complex spike discharge in the flocculus can be modulated by head rotation in the dark, when there are no visual fixations to correct [33,84]; this modulation can be reduced by in-phase rotation of visual and vestibular stimuli, which extinguishes vestibulo-ocular reflex adaptation [6]. Other examples are the weak modulation of climbing fiber activity by the current phase of locomotion [100], and the presence of climbing fiber receptive fields on the forelimbs of cats [28].
Models assuming that the cerebellum implements an inverse model of the controlled object tend to assume that the climbing fiber error signal is in motor coordinates [47] (figure 1B). To reconcile this with the presence of sensory information in climbing fibers, it has been assumed that some cerebellar circuits implement forward models of the controlled object (figure 1A). Forward models are trained with sensory errors, whereas inverse models are trained with motor errors [42]. It is unclear how cerebellar microcomplexes, which appear to perform similar computations throughout the cerebellum, can be specialized to implement either a forward or an inverse model.
Figure 2 (partial caption). 2. The context at this point will be associated with the correction. 3. The error begins to increase. 4. Complex spikes reach the cerebellar cortex in response to the error increase. 5. The error is no longer increasing. 6. The context at point 2 becomes associated with a correction, which consists of the mean efferent activity (roughly) between points 3 and 5. 7. Final hand position. B) After the correction in A) is learned and the same reach is attempted, the trajectory is modified upon approaching point 2, with the correction being applied anticipatively (blue line). Notice that a different trajectory (red line) which passes through the spatial location of point 2 may not elicit the correction learned in A). This is because the correction is applied when its associated context is near the current context (which is a point in state space); those contexts contain velocities, efferent activity, and target location in addition to the arm's spatial configuration.
Given the evidence that the cerebellum performs a predictive function [5,25,75,76], and given the uniformity of cerebellar circuits, it would be desirable to have a forward model that is capable of correcting either feedforward or feedback controllers, and whose learning signals are both sensory and motor. Such a model is the one in figure 1C. For a fixed desired trajectory, changes in sensory inputs can produce increases in sensory errors, eliciting complex spikes. The complex spike activity at the population level should provide enough information about sensory events, so it could be assumed that the IO broadly encodes sensorimotor events [72]. The information that gives rise to the complex spikes, however, also drives the motor response, so in the case of increasing performance errors it should be related to motor activity. Notice that this model is compatible with simple spikes encoding errors both with a lead (as contexts are associated with future corrections) and with a lag and the opposite modulation (as the sensory error is an input to Purkinje cells) [75,76,74].
In the specific case of a reaching model, complex spikes could signal an increase in error, with the error being the distance between the hand and the target. This is different from assuming that complex spikes perform a low-frequency encoding of the error [82,48,50], because our onset signal doesn't track the error's magnitude; it is only related to the positive part of the error's derivative.
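The distinction between tracking the error and signaling its increase can be made concrete. The sketch below assumes a sampled hand trajectory and a fixed target; it computes the error as the hand-target distance, keeps only the positive part of its finite-difference derivative, and marks onsets where that positive part switches on. The function name and the `min_rate` threshold are hypothetical.

```python
import math


def error_increase_onsets(hand, target, dt, min_rate=0.0):
    """Hypothetical helper returning (onsets, rate) for a sampled reach.

    error[t] is the distance between hand[t] and the target; rate[t] is the
    positive part of its finite-difference derivative. An onset is a sample
    where rate rises above min_rate after being at or below it.
    """
    error = [math.dist(h, target) for h in hand]
    rate = [0.0] + [max(0.0, (error[i] - error[i - 1]) / dt)
                    for i in range(1, len(error))]
    onsets = [i for i in range(1, len(rate))
              if rate[i] > min_rate and rate[i - 1] <= min_rate]
    return onsets, rate


# Hand moves toward the target, drifts away briefly, then resumes the approach.
hand = [(x, 0.0, 0.0) for x in (0.0, 0.2, 0.4, 0.5, 0.45, 0.4, 0.5, 0.6, 0.7)]
onsets, rate = error_increase_onsets(hand, target=(1.0, 0.0, 0.0), dt=0.01)
print(onsets)  # → [4]
```

Only sample 4 yields an onset, although the error keeps growing at sample 5; a large but constant error would produce no onsets at all, which is what separates this signal from a low-frequency encoding of the error itself.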
If error increase constitutes the training signal for our cerebellar network, we can ask how the magnitude of the error is encoded, and how this error signal can be reliably distinguished from background noise. We consider three possible solutions. The first is that a larger amount of synchrony among the olivary cells projecting to a microzone signals a larger need for a correction. This assumption seems to agree with experimental observations. For example, an air-puff stimulus (a potentially nociceptive stimulus which warrants closing the eyelid) can be reliably decoded when ensemble information is used, but not from the response of individual cells [72]. In another example, the phase of locomotion has a rather small modulatory effect on climbing fiber activity, but perturbations cause much larger responses [100]. Moreover, mice lacking electrotonic coupling in the inferior olivary nucleus show impairments in a learning-dependent motor task, or when conditioning their eyeblink responses to a tone [88]. Notice that if a perturbation elicits a complex spike in only one Purkinje cell, and if cerebellar responses depend on the simple-spike synchronization of several Purkinje cells, then the synaptic plasticity caused by that lone complex spike may not suffice to create a significant response, since plasticity in several Purkinje cells is required to achieve simple-spike synchrony; this could make background noise harmless to the learning mechanism.
The main factor controlling complex spike synchrony may be the phase of subthreshold oscillations in the inferior olivary nucleus, because stimuli arriving near the peak of the oscillation have a larger chance of causing a response. To precisely convey the timing of error increase onsets and to encourage stability, it is important to have a wide range of phases in the subthreshold oscillations of inferior olivary cells [45], which largely depends on the coupling strength of olivary gap junctions [58]. The complex desynchronized spiking mode [81] has a wide range of phases, and we show in our computational simulations that assuming something like it leads to good results. In our simulations we model the subthreshold oscillations of the IO cells so that the probability of spiking for each cell depends on both the strength of the input signal and the phase of the subthreshold oscillation. Larger increases in the error produce stronger input signals to the inferior olivary nucleus, which are reflected by a larger number of neurons responding; thus, for any short time interval, the magnitude of the error increase is reflected by the number of inferior olivary cells spiking in synchrony. The inhibitory feedback from the cerebellar nuclear cells, in addition to functioning as a negative feedback system that controls simple spike discharges [7], could also help to avoid large clusters of synchronized inferior olivary cells, so as to maintain the complex desynchronized spiking mode.
Another way to encode the magnitude of an error increase could be graded complex spikes [67,101], with longer durations indicating larger increases in the error. Finally, the magnitude of corrections can be obtained through repetitive learning, with small plastic changes on individual trials that accumulate as long as the error increase is still present. In our computational model, the factors indicating the magnitude of the error are captured by an equation giving the probability for each inferior olivary cell to produce a spike (equation 19 in the Models section).
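The population encoding described above can be sketched as follows. This is not equation 19 from the Models section, which is not reproduced here: it assumes, purely for illustration, that each IO cell spikes with probability proportional to the input (the error increase rate) times a raised-cosine function of its subthreshold oscillation phase, with phases spread uniformly across cells to mimic a desynchronized mode. The function names, the 8 Hz oscillation frequency, and the gain are hypothetical.

```python
import math
import random


def io_spikes(u, t, phases, freq=8.0, gain=0.5, rng=random):
    """Hypothetical IO population response at time t (s) to an input u >= 0.

    Each cell spikes with probability min(1, gain * u * drive), where drive in
    [0, 1] peaks when the cell's subthreshold oscillation is at its maximum.
    Returns one boolean per cell.
    """
    spikes = []
    for phase in phases:
        drive = 0.5 * (1.0 + math.cos(2.0 * math.pi * freq * t + phase))
        p = min(1.0, gain * u * drive)
        spikes.append(rng.random() < p)
    return spikes


def mean_synchrony(u, n_cells=100, trials=50, seed=0):
    """Average number of cells spiking together, sampled at random times."""
    rng = random.Random(seed)
    phases = [2.0 * math.pi * i / n_cells for i in range(n_cells)]  # spread phases
    total = 0
    for _ in range(trials):
        t = rng.random()  # random sampling time within one second
        total += sum(io_spikes(u, t, phases, rng=rng))
    return total / trials


for u in (0.0, 0.2, 1.0):
    print(u, mean_synchrony(u))
```

Because the phases are spread uniformly, the expected number of cells spiking at any instant is roughly proportional to u, so the size of a synchronous volley carries the magnitude of the error increase; a downstream learning rule could ignore volleys below some count, which is one way to make isolated spikes harmless.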

Granule Layer
Since the classical models of Marr and Albus [61,1], the function of the granule layer in many models has been to recode the information arriving from the mossy fibers so as to enhance discrimination in the Purkinje cell layer [73,8]. We follow this paradigm as well. We assume that the mossy fibers convey information about the kinematic state of the limbs, as well as the motor commands sent to those limbs. We refer to this information as the context, and the set of possible contexts is the state space of our model.
Another issue is the role played by the granule layer in temporal representation. Timing of events and movements is a putative cerebellar function [44,68,59], which in some models arises from the granule cell-Golgi cell loop [31,14,97,98,99]. Timing is relevant in our model because the context that becomes associated with a correction and the correction itself happen at different times, before and after a complex spike. To train the cerebellar network to correct movements before the error happens, we must either retain a memory trace of the contextual information before the arrival of a complex spike, or use a forward model for training. In keeping with the simplicity of our assumptions, we use a memory trace of contextual information. This could take place within the granule layer, as in the models above, or it could be encoded by intracellular chemical factors (such as calcium concentration) within the Purkinje cell dendrites. There is a wide variety of molecules involved in conjunctive LTD [39], and some of them could conceivably adjust the optimal delay between synaptic activity and complex spike onset required to produce synaptic modifications. For ease of implementation, our computational model assumes that a memory trace of recent activity is encoded within Purkinje cells (an assumption similar to that in [32]). Learned granule cell layer activities are represented mathematically in our model using radial basis functions [13].
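The memory trace mentioned above can be sketched with Gaussian radial basis functions over the context vector and an exponentially decaying trace of their activity. The exponential form, the time constant `tau`, and the names `rbf_features` and `ContextTrace` are assumptions for illustration; the text only requires that some trace of the context before a complex spike still be available when the spike arrives.

```python
import math


def rbf_features(context, centers, width):
    """Gaussian radial basis function activations for one context vector."""
    return [math.exp(-sum((c - x) ** 2 for c, x in zip(center, context))
                     / (2.0 * width ** 2))
            for center in centers]


class ContextTrace:
    """Hypothetical exponentially decaying trace of recent RBF activity."""

    def __init__(self, n_features, tau, dt):
        self.values = [0.0] * n_features
        self.decay = math.exp(-dt / tau)

    def update(self, features):
        # Low-pass filter each feature; older contexts fade with time constant tau.
        self.values = [self.decay * z + (1.0 - self.decay) * f
                       for z, f in zip(self.values, features)]
        return self.values


centers = [(0.0,), (1.0,)]
trace = ContextTrace(n_features=2, tau=0.1, dt=0.01)
for _ in range(20):  # 200 ms spent in context 0
    trace.update(rbf_features((0.0,), centers, width=0.3))
snapshot = trace.update(rbf_features((1.0,), centers, width=0.3))  # switch to context 1
print(snapshot)  # context 0 still dominates the trace right after the switch
```

A delay line holding the context from a fixed interval before the spike would be an equally simple alternative; the exponential trace was chosen here only because it needs one number per feature.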

Purkinje cells
Models based on the Marr-Albus framework tend to frame motor control as a pattern recognition problem, where particular sensory patterns are associated with motor commands. This includes most adaptive filter models [20]. The role played by Purkinje cells under this paradigm is usually that of a perceptron, where error-driven learning shapes appropriate motor commands based on high-dimensional granule cell input patterns. We also adopt the idea of associating contexts with motor commands. The association of contexts with commands could happen both in the indirect loop (involving projections from granule cells to PCs and molecular layer interneurons (MLIs), and projections from PCs to the cerebellar and vestibular nuclei) and in the direct loop (which proceeds from mossy fibers to cerebellar nuclear cells). In fact, in addition to modulating the activity of cells in the cerebellar nuclei, inhibitory PC inputs could provide a learning signal for the association in the direct loop, as discussed below. Our computational model is limited to associating contexts, represented as radial basis functions, with motor commands, in this case the average muscle activation over a particular period of time.
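Associating contexts with commands can be sketched as a table of weights from RBF features to muscles. In this sketch, which is an assumption rather than the paper's update rule, each error-increase episode moves the weights of the features active in the trace at onset toward the mean muscle activation the central controller produced while the error was increasing; a small learning rate makes the correction build up over repeated episodes. `CorrectionMemory` and its methods are hypothetical names.

```python
class CorrectionMemory:
    """Hypothetical context-to-correction association (features x muscles)."""

    def __init__(self, n_features, n_muscles, lr=0.2):
        self.w = [[0.0] * n_muscles for _ in range(n_features)]
        self.lr = lr

    def learn(self, trace_at_onset, efferent_window):
        """trace_at_onset: context trace when the error started to increase.
        efferent_window: muscle activations (one list per time step) produced by
        the central controller until the error stopped increasing."""
        n_steps = len(efferent_window)
        n_muscles = len(efferent_window[0])
        mean_act = [sum(step[m] for step in efferent_window) / n_steps
                    for m in range(n_muscles)]
        for j, z in enumerate(trace_at_onset):
            for m, a in enumerate(mean_act):
                # Move weights toward the mean correction, scaled by trace strength.
                self.w[j][m] += self.lr * z * (a - self.w[j][m])

    def correction(self, features):
        """Anticipative correction for the current context."""
        n_muscles = len(self.w[0])
        return [sum(f * self.w[j][m] for j, f in enumerate(features))
                for m in range(n_muscles)]


memory = CorrectionMemory(n_features=2, n_muscles=2)
for _ in range(30):  # the same error episode recurs on 30 reaches
    memory.learn([1.0, 0.0], [[0.2, 0.0], [0.4, 0.0]])
print(memory.correction([1.0, 0.0]))  # approaches [0.3, 0.0]
print(memory.correction([0.0, 1.0]))  # unrelated context: no correction
```

Features that were silent at the onset of the error increase are left untouched, which mirrors the point made in the Figure 2 caption that a trajectory passing through the same location with a different context need not trigger the learned correction.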

Cerebellar nuclei
Our computational model makes a large abstraction by associating contexts directly with muscle activations. The following discussion explains why we deem this biologically plausible.
The neurons in the deep cerebellar nuclei (DCN), through their projections to motor cortex, brainstem, and spinal cord, can modulate the activity of the motor system. We assume that the direct loop, which proceeds from mossy fibers to the DCN, is responsible for sending corrections to motor commands, modulated by the indirect loop. The mossy fiber inputs are excitatory and synapse onto most nuclear cells. In addition, excitatory climbing fiber collaterals from the inferior olivary nucleus synapse onto excitatory and inhibitory nuclear cells [62,102]. We assume that the appropriate inputs to implement the corrections are selected through the excitatory synapses from mossy fibers to nuclear neurons. Nevertheless, the excitation from most of those inputs is offset by inhibition from asynchronous simple spikes received from Purkinje cells.
We adopt the hypothesis that, in order to provide corrections to motor commands, the cerebellar nuclear cells learn to increase their firing rate in response to sensorimotor contexts that are predictive of an increase in error. In other words, the main role of plasticity in the synapses between mossy fibers and nuclear cells is to store patterns of mossy fiber activity which precede increased and/or synchronized inhibition of the nuclear cells by a few hundred milliseconds. Whenever these patterns are present, the firing rate of nuclear cells is increased. This increase in firing rate could have several causes, such as a reduction in Purkinje cell inhibition, mossy fiber stimulation through strengthened synapses, or synchronization in the Purkinje cell input causing rebound spikes (Jaeger, Raman). Some learning mechanisms that could be responsible for this behavior are described next.
First, when using the equilibrium point hypothesis [30,54], associating the output of the DCN cells with the right muscle contractions may be simplified significantly, at least when correcting reaching motions. We can achieve this by generating error signals (complex spikes) whenever a muscle becomes longer while its current length is larger than its threshold length. Since the muscle was supposed to contract but got longer instead, the appropriate corrective signal is a contraction of that muscle. Therefore, if the muscles whose afferent information triggers complex spikes in a microcomplex also receive activation from that microcomplex, then the excitation arising from the DCN before an error should contribute to reducing that error. In this case, each DCN cell could stimulate a particular muscle or a set of agonist muscles, and as long as the errors from those muscles are reflected as complex spikes or synchronous simple spikes arriving at the DCN cell, the error and the correction can be associated unambiguously. In the general case, however, the DCN cell may have to learn to stimulate the right muscles so its correction is effective. We now turn to this case.
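The muscle-level error signal described above reduces to a simple test per muscle. The sketch assumes lengths sampled at two consecutive times and one threshold (target) length per muscle; the function names and the one-to-one mapping from error signals to corrective contractions are hypothetical simplifications of the microcomplex wiring described in the text.

```python
def muscle_error_signals(lengths, prev_lengths, threshold_lengths):
    """Hypothetical test: True for each muscle that lengthened while already
    longer than its threshold length."""
    return [now > before and now > lam
            for now, before, lam in zip(lengths, prev_lengths, threshold_lengths)]


def corrective_contractions(errors, gain=1.0):
    """Each microcomplex contracts the same muscle whose afferents triggered it."""
    return [gain if err else 0.0 for err in errors]


errs = muscle_error_signals(lengths=[1.10, 0.90, 1.20],
                            prev_lengths=[1.00, 0.80, 1.25],
                            threshold_lengths=[1.00, 1.00, 1.00])
print(errs)                           # [True, False, False]
print(corrective_contractions(errs))  # [1.0, 0.0, 0.0]
```

The second muscle lengthens but is still shorter than its threshold, and the third is above threshold but shortening, so only the first produces an error signal and receives a corrective contraction.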
Our overall idea of learning in the direct loop is that DCN neurons come to respond preferentially to stimuli which predict an increase in "error," signaled by the Purkinje cell inhibition that accompanies complex spikes triggered by the IO. In addition, the increase in error could also be signaled by an increase in simple spike synchrony in the Purkinje cell input, due to indirect loop learning mechanisms hypothesized in a separate paper (Verduzco-Flores, submitted). The assumption that DCN neurons learn which stimuli predict error increases, aided by their Purkinje cell inputs, comes from evidence indicating that cerebellar cortical memories correspond to fast adaptation and are gradually transferred to the vestibular and cerebellar nuclei [63,46,71]. In turn, the DCN activity can be used in conjunction with signals from higher motor centers in a second learning process, which ensures that the DCN cells excite the motor units responsible for implementing the correction after the error. This second learning process may take place in the spinal cord, the brainstem, or motor cortex. We focus on the learning mechanism which is controlled by inhibitory inputs from Purkinje cells onto DCN neurons, because computational simulations suggest that only this form of learning can persist in the presence of background cerebellar activity [64]. These simulations show that when the learning is triggered by nuclear or climbing fiber activity in the presence of a reasonable level of background cerebellar activity, the memories fade due to spontaneous drifts of synaptic strength.
The critical learning mechanism for DCN neurons occurs when high-frequency trains of synaptic excitation precede a period of inhibition and disinhibition, which causes EPSCs to undergo synapse-specific LTP. This LTP is largest when the input excitation precedes the postinhibitory rebound by roughly 400 ms [79,102]. When stimuli arrive before or after this LTP window, LTD takes place.
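The timing dependence just described can be summarized by a plasticity window. The shape below, a Gaussian LTP lobe centred at 400 ms on top of a constant LTD term, and its width and amplitudes are assumptions chosen only to reproduce the qualitative description (maximal LTP near 400 ms, LTD outside the window); they are not fitted to [79,102], and the function name is hypothetical.

```python
import math


def dcn_weight_change(lag_ms, peak_ms=400.0, width_ms=150.0, a_ltp=1.0, a_ltd=0.3):
    """Hypothetical change in a mossy fiber -> DCN weight.

    lag_ms: time by which the excitatory input precedes the postinhibitory rebound.
    Positive values mean LTP, negative values mean LTD.
    """
    ltp = a_ltp * math.exp(-((lag_ms - peak_ms) ** 2) / (2.0 * width_ms ** 2))
    return ltp - a_ltd


for lag in (0, 200, 400, 600, 1000):
    print(lag, round(dcn_weight_change(lag), 3))
```

Subtracting a constant LTD term is the simplest way to make every timing outside the window depress the synapse; a separate, broader LTD lobe would be an equally valid assumption.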
The sequence of an inhibition period followed by a postinhibitory rebound is expected to occur in DCN cells under normal physiological conditions. Such a sequence has been proposed as an appropriate mechanism to learn the conditioned stimuli in classical eyelid conditioning, which has been found to depend on the cerebellum [65]. Generalizing this, we believe that learning should result in DCN cells being driven mostly by the inputs associated with errors from their climbing fiber receptive fields. This is consistent with the assumption that mossy fiber activity preceding complex spikes gradually elicits simple spike activity that is more synchronized, which is better suited to elicit LTP. The experimental results from [35], in very simplified terms, suggest that small rebound activity can bring about LTD, and large rebound activity can bring about LTP; the largest rebound activity comes from deep hyperpolarizations caused by complex spikes or by simple spikes arriving very close to each other. Furthermore, simultaneous excitation and inhibition can cause a calcium influx large enough to induce LTP, while stimulation of group I mGluRs alone can lead to a smaller calcium influx, only sufficient to induce LTD [103]. DCN inputs which are not associated with errors, by not being paired with strong enough inhibition, will tend to undergo LTD. On the other hand, DCN inputs associated with errors will occur along with synchronous volleys of simple spikes or with complex spikes, both bringing deep hyperpolarizations that favor LTP.
We have assumed that DCN neurons should learn to respond to stimuli which predict the need for a correction. We still need the output of those DCN neurons to activate the motor units which implement that correction. This is particularly important in tasks like reaching, where it is not obvious which muscles should be activated in order to reduce an error. In this case we assume that it is the central controller, presumably acting through motor cortex, that decides which muscles should be contracted or relaxed; the task of DCN cells is to apply those actions anticipatively when required. This requires the DCN projections affecting a particular set of muscles to be strengthened when those muscles are activated following complex spikes in the DCN's microcomplex. In this way, the activity of DCN cells would share one trait with Purkinje cell inhibition: both would serve the dual role of modulating their downstream targets and providing a learning signal. The downstream targets would modify their synapses according to when they become activated relative to the learning signal.
A sensible assumption is that the plasticity mechanisms that lead DCN activity to preferentially affect a particular set of muscles take place in the spinal cord. Activity-dependent plasticity in the spinal cord is a well-established phenomenon, and the activity guiding the synaptic modifications often comes from descending inputs [91,93]. The resulting adaptation in the spinal cord can reshape reflexes during development, such as the withdrawal and stretch reflexes, or it can create modifications that support skill acquisition [91]. There is a continuous barrage of descending input reaching the spinal cord, and there must be a way to select appropriate modifications using that input; the cerebellum constitutes one possible source of training signals. The cerebellum is required for the acquisition and maintenance of down-conditioning of the H-reflex [18,94], which is an electrical analog of the spinal stretch reflex. There are probably other adaptations that lead the output of the cerebellum to provide useful responses. One reflection of these adaptations could be that the limb movement controlled by a cerebellar module tends to move the cutaneous receptive field of its climbing fibers away from a stimulus applied to the skin [29,27].
One simple mechanism which could potentially underlie learning of the appropriate DCN targets is temporally asymmetric Hebbian learning. What is required is that when a motor unit is excited (directly or indirectly) by activity arising from a DCN cell, and this stimulation is followed by stimulation from a connection descending from motor cortex, then the connections from the DCN cell to the motor unit should be strengthened. Conversely, when a motor unit is stimulated by activity from a DCN cell, and this stimulation is followed by a reduction in the descending input to the motor unit, the connections from the DCN cell should be weakened. Each time a correction is applied, one particular set of DCN projections will be activated, with different corrections activating different sets of projections. Any given motor neuron should receive inputs whose activity originates from different sets of DCN projections. The learning rule above aims to strengthen the combinations of DCN inputs to a motor neuron that involve contraction of its corresponding muscle, and to weaken those that don't; this could be achieved by the combination of recurrent inhibition and back-propagation of action potentials in α-motoneurons [89]. The true learning mechanism may be more complicated; considering that adaptation of even the seemingly simple H-reflex involves plasticity at multiple sites [92], it may involve plasticity in brainstem nuclei and motor cortex. If the DCN neurons projected directly to the motor neurons, the motor neurons could perform only linear separation of their input patterns, but if the activity is relayed through an intermediate layer in the brainstem nuclei, non-linear separation is possible [3].
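The temporally asymmetric rule described above can be written as a weight update that multiplies DCN-driven activity at one moment by the change in descending drive a short time later. The lag, learning rate, non-negativity bound, and function name are assumptions; the sketch ignores the recurrent inhibition and back-propagating action potentials that the text proposes as possible implementations.

```python
def update_dcn_weight(w, dcn, descending, lag, eta=0.1, w_min=0.0):
    """Hypothetical temporally asymmetric Hebbian update for one DCN -> motor unit weight.

    dcn[t]: activity reaching the motor unit from the DCN cell at step t.
    descending[t]: descending (e.g. corticospinal) input to the motor unit at step t.
    The weight grows when DCN activity is followed, `lag` steps later, by an
    increase in descending input, and shrinks when it is followed by a decrease.
    """
    for t in range(len(dcn) - lag):
        w += eta * dcn[t] * (descending[t + lag] - descending[t])
    return max(w_min, w)


dcn = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(update_dcn_weight(0.5, dcn, [0, 0, 0, 1, 1, 1], lag=3))  # strengthened, about 0.7
print(update_dcn_weight(0.5, dcn, [1, 1, 1, 0, 0, 0], lag=3))  # weakened, about 0.3
```

Using the change in descending input, rather than its level, means that a motor unit already being driven by cortex gains nothing from coincident DCN activity unless cortex increases its drive afterwards, which is the asymmetry the text asks for.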
The adaptations triggered by cerebellar activity do not have to be limited to selecting appropriate targets for DCN cells. One proposed function of the cerebellum is to support sensory data acquisition [11,60]. The DCN signals produced by our model are well suited to influence sensory transduction when the information obtained is not sufficient for the correct performance of a task. Also, in light of the ascending projections to the cerebral cortex through the ventrolateral thalamus, the DCN activity is likely to induce changes in motor cortex activation.

Physical simulation of the arm
In order to test the principles of our cerebellar model in 3D reaching tasks, we created a detailed mechanical simulation of a human arm. Our arm model contains a shoulder joint with 3 rotational degrees of freedom and an elbow joint with one rotational degree of freedom. The actuators consist of 11 composite muscles that represent the main muscle groups of the human arm (figure 3). Some of these muscles wrap around "bending lines," which are used to model the curved shape of real muscles as they wrap around bones and other tissue. The force that each muscle produces in response to a stimulus comes from a Hill-type model used previously with equilibrium point controllers [34]. The mechanical simulation was implemented in SimMechanics, which is part of the Matlab/Simulink software package (http://www.mathworks.com/).
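The Hill-type model of [34] is not reproduced in this section. As a rough illustration of what a Hill-type muscle computes, the Python sketch below combines a Gaussian force-length curve, Hill's hyperbolic force-velocity relation for shortening, a saturating curve for lengthening, and a quadratic passive element. The curve shapes and all parameter values (`f_max`, `l_opt`, `v_max`, `width`, `k_passive`) are generic assumptions, not the values used in our arm model.

```python
import numpy as np

def hill_muscle_force(activation, length, velocity, f_max=1000.0, l_opt=0.3,
                      v_max=3.0, width=0.45, k_passive=4.0):
    """Generic Hill-type muscle force in newtons.

    activation in [0, 1]; length in m; velocity in m/s (positive = lengthening).
    """
    a = float(np.clip(activation, 0.0, 1.0))
    l_norm = length / l_opt
    v_norm = velocity / v_max
    f_l = np.exp(-((l_norm - 1.0) / width) ** 2)      # active force-length curve
    if v_norm <= -1.0:                                 # shortening faster than v_max
        f_v = 0.0
    elif v_norm <= 0.0:                                # shortening: Hill's hyperbola
        s = -v_norm
        f_v = (1.0 - s) / (1.0 + s / 0.25)
    else:                                              # lengthening: tends to 1.5
        f_v = 1.5 - 0.5 / (1.0 + 5.0 * v_norm)
    f_passive = k_passive * max(l_norm - 1.0, 0.0) ** 2   # parallel elastic element
    return f_max * (a * f_l * f_v + f_passive)
```

At the optimal length and zero velocity a fully activated muscle produces `f_max`; shortening lowers the force and lengthening raises it, which is the qualitative behavior that matters for equilibrium-point control.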

Central controller
The central controller we use to perform reaching is a modified version of Threshold Control Theory (TCT, [30]). In the lambda version of the equilibrium point hypothesis, the control signals arriving at the spinal cord specify threshold lengths for muscle activation. We refer to these as target lengths. The same argument could be made with threshold velocities as the control signals at the spinal cord level and threshold lengths set at a higher level, as long as velocity is sensed by the muscle spindles. This velocity representation is found in spindle afferents [55,56,23,24]. Our two-level control system (target lengths at the upper level, velocities at the lower level) is inspired by the hierarchical organization found in TCT and in Perceptual Control Theory [77,78], and is far more successful at stabilizing oscillations than pure proportional control. In general, it is hard to stabilize movement without velocity information, which is why this factor has been introduced in equilibrium-point controllers [19,53]. As in TCT, we assume that the forces are generated at the level of the spinal cord, similarly to the stretch reflex, and we assume a proprioceptive delay of 25 ms.
Our controller guides reaching by first mapping the Cartesian coordinates of a target into the muscle lengths that the arm would have with the hand located at those coordinates. In order to make this mapping one-to-one, we assume that the upper arm performs no rotation (one of the three shoulder angles is fixed; see the Models section). The difference between the current muscle length and the target muscle length produces a muscle stimulation, modulated by the contraction velocity (details in the Models section).
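A minimal Python sketch of the two ingredients just described follows. The gains `k_len` and `k_vel`, the sign convention (positive velocity means lengthening), and the exact form of the two-level law are illustrative assumptions, not the equations of our controller. The length error sets a desired shortening velocity, the velocity error drives the muscle, and a hypothetical `DelayLine` class models the 25 ms proprioceptive delay at an assumed 1 ms time step.

```python
from collections import deque

def muscle_stimulation(length, velocity, target_length, k_len=4.0, k_vel=2.0):
    """Two-level law: length error -> desired velocity -> stimulation.

    velocity > 0 means the muscle is lengthening. A muscle longer than its
    target is asked to shorten; the shortfall in shortening speed drives it.
    """
    desired_velocity = -k_len * (length - target_length)
    velocity_error = velocity - desired_velocity
    return max(0.0, k_vel * velocity_error)   # muscles can only pull


class DelayLine:
    """Returns the value pushed n_steps calls earlier (n_steps >= 1)."""

    def __init__(self, n_steps, initial=0.0):
        self.buffer = deque([initial] * n_steps, maxlen=n_steps)

    def push(self, value):
        delayed = self.buffer[0]
        self.buffer.append(value)   # maxlen makes the oldest sample drop out
        return delayed


# Usage: a stationary muscle 2 cm longer than its target is stimulated;
# one shorter than its target, or one already shortening fast, is not.
stim_long = muscle_stimulation(0.32, 0.0, 0.30)       # about 0.16
stim_short = muscle_stimulation(0.28, 0.0, 0.30)      # 0.0
stim_braking = muscle_stimulation(0.32, -0.2, 0.30)   # 0.0
proprioception = DelayLine(n_steps=25)                # 25 ms at dt = 1 ms
```

The third case shows the velocity term at work: a muscle that is still too long but already shortening faster than desired receives no drive, which damps the approach to the target.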

Cerebellar model
The cerebellar model provides motor commands whenever an "error-prone area" of state space is entered. Each error-prone area consists of a point in state space (its center, or feature vector) and a kernel radius. Each error-prone area also has a corresponding "correction vector," specifying which muscles are activated and which are inhibited when the error-prone area is entered. At each iteration of the simulation, the distance between the currently perceived point in state space and the center of each error-prone area is obtained, and each correction vector is applied with a weight that decreases with this distance, scaled by the area's kernel radius. The kernels used can be exponential or piecewise linear.
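The way the stored error-prone areas combine into a cerebellar output can be sketched in a few lines of Python. This is a hedged illustration: the exact kernel formulas are not given in this section, so `exp(-d)` and the triangular kernel `max(0, 1 - d)`, with d the distance divided by the kernel radius, are assumed forms of the exponential and piecewise linear kernels.

```python
import numpy as np

def cerebellar_output(state, centers, radii, corrections, kernel="exponential"):
    """Kernel-weighted sum of the correction vectors of all error-prone areas.

    state:       (D,) currently perceived point in state space
    centers:     (K, D) feature vectors of the error-prone areas
    radii:       (K,) kernel radii
    corrections: (K, M) correction vectors (positive = excite, negative = inhibit)
    """
    scaled = np.linalg.norm(centers - state, axis=1) / radii
    if kernel == "exponential":
        weights = np.exp(-scaled)
    elif kernel == "linear":
        weights = np.clip(1.0 - scaled, 0.0, None)   # zero beyond the kernel radius
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    return weights @ corrections


centers = np.array([[0.0, 0.0], [1.0, 1.0]])
radii = np.array([0.5, 0.5])
corrections = np.array([[1.0, -1.0, 0.0],    # area 1: excite muscle 1, inhibit muscle 2
                        [0.0, 0.5, 0.5]])    # area 2: excite muscles 2 and 3
inside = cerebellar_output(np.zeros(2), centers, radii, corrections, "linear")
outside = cerebellar_output(np.array([10.0, 10.0]), centers, radii, corrections, "linear")
smooth = cerebellar_output(np.zeros(2), centers, radii, corrections, "exponential")
```

The piecewise linear kernel gives each area a strictly local effect, whereas the exponential kernel lets every stored area contribute a little everywhere.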
Figure 4: A) Computational model with the visual error signal. The error consists of the distance between the hand and the target, and increases of this error cause the forward model to associate the context with a correction. The correction is based on the difference between the muscle length and the target length shortly after the error increase. B) Model with the proprioceptive error signal. The error is the muscle length minus the target length. Increases of this error in a particular context will cause the pattern classifier to apply an anticipative contraction when that context arises.

Learning in the model requires an error signal, which could be visual (such as the one that may be generated in posterior parietal cortex [22]), or could arise from muscle afferents. Block diagrams corresponding to the model with the visual and muscle error signals are in figure 4. The visual error signal arrives with a delay of 150 ms. Each time the error magnitude starts to increase (its derivative becomes positive), the probability of complex spikes increases; for each IO cell, this probability also depends on the current phase of its subthreshold oscillation. Complex spikes generate a new error-prone area. The feature vector associated with this area is the state of the system a short time span before the error increased; usually this time span will be half the time it takes for the error derivative to go back to zero, plus an amount of time comparable to the perceptual delay. For as long as the error derivative is positive, at each iteration we record the efferent signals produced by the central controller, and when the derivative stops being positive we obtain the average of all the recorded efferent signals. The correction is obtained from this average. The muscles are driven by the velocity errors (see the Models section), so these are the efferent signals collected during the correction period. All the kernel radii were equal and were not modified by learning.
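One way to turn the procedure just described into code is the Python sketch below. It assumes that the error, the perceived states and the efferent signals are sampled on a common time grid with step `dt`. The function name and arguments are hypothetical, complex-spike stochasticity is left out (every onset of error increase creates an area), and only the first episode of error increase is processed.

```python
import numpy as np

def create_error_prone_area(error, states, efferents, dt, perceptual_delay):
    """Return (feature_vector, correction) for the first episode of error increase.

    error:     (T,) error magnitude seen by the inferior olive
    states:    (T, D) perceived state at each step
    efferents: (T, M) efferent signals of the central controller at each step
    """
    d_error = np.diff(error)              # d_error[t] = error[t + 1] - error[t]
    rising = np.flatnonzero(d_error > 0)
    if rising.size == 0:
        return None                       # the error never increased
    onset = int(rising[0])
    end = onset
    while end + 1 < d_error.size and d_error[end + 1] > 0:
        end += 1                          # last step of the positive-derivative run
    n_rising = end - onset + 1
    # Feature vector: state half an episode plus one perceptual delay before onset.
    lookback = int(np.ceil(n_rising / 2 + perceptual_delay / dt))
    feature = states[max(onset - lookback, 0)]
    # Correction: average efferent command while the derivative was positive.
    correction = efferents[onset:end + 1].mean(axis=0)
    return feature, correction


# Usage with a toy episode: the error rises from step 3 to step 6.
error = np.array([5, 4, 3, 3, 4, 5, 6, 5, 4], dtype=float)
states = np.arange(9, dtype=float)[:, None]            # 1-D "state" = step index
efferents = np.arange(18, dtype=float).reshape(9, 2)
feature, correction = create_error_prone_area(error, states, efferents,
                                              dt=0.01, perceptual_delay=0.0)
```

Because the feature vector is taken from before the onset, the stored area is entered slightly before the error starts growing on the next repetition, which is what makes the correction anticipative.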
Notice that if the error derivative remains positive, more complex spikes will be generated as different olivary cells reach the peak of their subthreshold oscillations. Thus, we have two gain mechanisms for a correction. The first comes from the magnitude of the error derivative, which will promote a large (and synchronous) complex spike response. The second comes from the amount of time that the error derivative remains positive: the more inferior olivary cells reach the peak of their subthreshold oscillations while this derivative is positive, the more complex spikes are produced, creating error-prone areas along the trajectory of the arm. Performance-wise, it is beneficial to have a sequence of error-prone areas rather than a single one, since the appropriate correction to apply will change as the arm moves.
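The second gain mechanism can be illustrated with a deterministic Python toy: a population of inferior olivary cells with evenly spread oscillation phases, each firing when its subthreshold oscillation passes its peak (taken here to be the point where the phase wraps past 2π) while the error derivative is positive. The number of cells, the 8 Hz oscillation frequency and the firing rule are assumptions chosen for illustration, and the first (magnitude-dependent) mechanism is not modeled.

```python
import numpy as np

def count_complex_spikes(d_error, dt=0.001, n_cells=20, freq_hz=8.0):
    """Complex spikes per time step for a population of IO oscillators.

    A cell fires when its oscillation phase wraps past 2*pi (its peak) during a
    step in which the error derivative d_error[t] is positive.
    """
    phases = np.linspace(0.0, 2 * np.pi, n_cells, endpoint=False)
    step = 2 * np.pi * freq_hz * dt
    spikes = np.zeros(len(d_error), dtype=int)
    for t, de in enumerate(d_error):
        advanced = phases + step
        at_peak = advanced >= 2 * np.pi          # peak reached during this step
        phases = np.mod(advanced, 2 * np.pi)
        if de > 0:
            spikes[t] = np.count_nonzero(at_peak)
    return spikes


# Usage: the same positive derivative lasting 10 ms versus 100 ms.
short_episode = np.r_[np.ones(10), -np.ones(190)]
long_episode = np.r_[np.ones(100), -np.ones(100)]
n_short = count_complex_spikes(short_episode).sum()
n_long = count_complex_spikes(long_episode).sum()
```

The longer episode recruits many more olivary cells, and each resulting complex spike would create its own error-prone area along the trajectory.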
When the new feature vector is too close to a previously stored one, or when too many feature vectors have already been stored, the new feature vector is "fused" with the stored feature vector closest to it. When two areas fuse, they are both replaced by a new area whose feature vector lies somewhere along the line joining the feature vectors of its parent areas, and likewise for its correction vector.
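A Python sketch of storing and fusing error-prone areas follows. The distance threshold `min_dist`, the capacity `max_areas`, and the count-based weighting that decides where along the joining segment the fused area lies are all assumptions; the text only states that the fused area lies somewhere on that segment.

```python
import numpy as np

def store_area(centers, corrections, counts, new_center, new_correction,
               min_dist=0.1, max_areas=50):
    """Add an error-prone area in place, or fuse it with the nearest stored one."""
    new_center = np.asarray(new_center, float)
    new_correction = np.asarray(new_correction, float)
    if centers:
        dists = [np.linalg.norm(c - new_center) for c in centers]
        i = int(np.argmin(dists))
        if dists[i] < min_dist or len(centers) >= max_areas:
            # Fused area: a point on the segment joining both parents, weighted
            # by how many original areas the stored parent already represents.
            w = counts[i] / (counts[i] + 1.0)
            centers[i] = w * centers[i] + (1.0 - w) * new_center
            corrections[i] = w * corrections[i] + (1.0 - w) * new_correction
            counts[i] += 1
            return
    centers.append(new_center)
    corrections.append(new_correction)
    counts.append(1)


centers, corrections, counts = [], [], []
store_area(centers, corrections, counts, [0.0, 0.0], [1.0, 0.0])
store_area(centers, corrections, counts, [0.05, 0.0], [0.0, 1.0])   # too close: fused
store_area(centers, corrections, counts, [1.0, 1.0], [0.0, 0.0])    # far away: new area
```

Weighting by counts makes a frequently reinforced area move less with each new fusion; a fixed midpoint would be an equally valid reading of the text.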
More details are provided in the Models section.

Simulation results
As described before, we made simulations with two types of error signals. The first one is the visual error signal, creating complex spikes when the distance between the hand and the target increases. The second one is the proprioceptive signal, created when a muscle elongates despite already being longer than its target length (lambda value).
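The two error signals can be turned into discrete error events with a short Python sketch. It is a simplification under stated assumptions: signals are sampled on a fixed grid, the visual delay is an integer number of steps (`delay_steps`, for example 150 at a 1 ms step), and every increase counts as an event, whereas in the model an increase only raises the probability of a complex spike.

```python
import numpy as np

def visual_error_events(hand, target, delay_steps=0):
    """Steps at which the delayed hand-target distance increases."""
    dist = np.linalg.norm(np.asarray(hand, float) - np.asarray(target, float), axis=1)
    # Shift by the visual delay, holding the first sample during the initial gap.
    delayed = np.concatenate([np.full(delay_steps, dist[0]), dist[:dist.size - delay_steps]])
    return np.flatnonzero(np.diff(delayed) > 0) + 1

def proprioceptive_error_events(length, target_length):
    """Steps at which a muscle lengthens although it is already longer than its target."""
    length = np.asarray(length, float)
    lengthening = np.diff(length) > 0
    too_long = length[1:] > target_length
    return np.flatnonzero(lengthening & too_long) + 1


hand = np.array([[3, 0, 0], [2, 0, 0], [1, 0, 0], [1.5, 0, 0], [2, 0, 0], [1, 0, 0]], float)
target = np.zeros(3)
visual_now = visual_error_events(hand, target, delay_steps=0)       # steps 3 and 4
visual_late = visual_error_events(hand, target, delay_steps=2)      # step 5
muscle = proprioceptive_error_events([0.30, 0.31, 0.32, 0.31, 0.33], 0.315)  # steps 2 and 4
```

In the proprioceptive case the lengthening at step 1 produces no event because the muscle is still shorter than its target, which is exactly the condition stated above.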
Reaching a static target guided by the central controller with no cerebellar input resulted in slow motions that were prone to oscillations. Certain target locations could lead to instability due to proprioceptive delays and the inability of the muscular system to reduce the angular momentum in the absence of damping. Unstable locations were not chosen for the simulations, and gravity was also excluded.
Figure 5: A) Distance between hand and target through 4 seconds of simulation time for the first, fourth, and eighth reaches. The cerebellar system was trained using the distance between the hand and the target as the error, and the target had coordinates X=20 cm, Y=40 cm, Z=-20 cm. Notice how the first reach (red line) is slower, and oscillates away from the target after approaching it. This is significantly improved on the eighth reach (blue line). B) Integral of the distance between the hand and the target during the 4 seconds of simulation for the 8 successive reaches. Each bar corresponds to the value obtained from averaging this performance measure across the 8 targets. The bars were normalized by dividing by the value for the first reach.

To test whether the cerebellar corrections could gradually reduce the error as learning progressed through successive reaches, we selected 8 target locations and simulated 8 successive reaches to each target. This was done for both types of error signals. Panel A of figure 5 shows the evolution through time of the distance between the hand and the target in the 1st, 4th, and 8th reaches towards a representative target when using the first type of error signal (increase in the distance between hand and target). To measure the success of a reach we obtained the time integral of the distance between hand and target. Smaller values of this performance measure indicate a faster, more accurate reach. Panel B of figure 5 shows our performance measure for each of the 8 successive reaches, averaged over the 8 targets. Figure 6 shows the corresponding results for the second type of error signal (undue increases in length of individual muscles).
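The performance measure and the normalization used in figure 5B can be written as a short Python sketch, assuming the hand trajectory is sampled at times `t` (in seconds) and positions are in cm. The trapezoidal rule is our choice of quadrature here, not necessarily the one used in the simulations.

```python
import numpy as np

def reach_cost(t, hand, target):
    """Time integral of the hand-target distance (trapezoidal rule), in cm*s."""
    dist = np.linalg.norm(np.asarray(hand, float) - np.asarray(target, float), axis=1)
    return float(np.sum(0.5 * (dist[1:] + dist[:-1]) * np.diff(t)))

def normalized_learning_curve(costs):
    """costs[target, reach] -> cost per reach averaged over targets,
    divided by the value for the first reach."""
    mean_cost = np.asarray(costs, float).mean(axis=0)
    return mean_cost / mean_cost[0]


# Usage: a hand that stays 1 cm from the target for 4 s has a cost of 4 cm*s.
t = np.linspace(0.0, 4.0, 401)
target = np.array([20.0, 40.0, -20.0])
hand = np.tile(target + [1.0, 0.0, 0.0], (t.size, 1))
cost = reach_cost(t, hand, target)
curve = normalized_learning_curve([[2.0, 1.0], [4.0, 3.0]])   # [1.0, 0.667]
```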
It can be observed in figures 5 and 6 that, on average, performance improves through successive reaches. The error may not decrease monotonically, however, since the correction learned in the last trial may put the system in a new region of state space where new errors can arise within the time of the simulation. As the mathematical analysis in the Models section suggests, the cerebellar corrections could make the arm unstable if learning is not restricted to situations in which the error is relatively large and increases sufficiently fast.

Discussion
We have presented a model of how the cerebellum may reduce errors associated with climbing fiber activity when that activity arises from the increase in some error measure. Instead of assuming that complex spikes encode the magnitude of some performance error, we have assumed that they are generated when the derivative of the error becomes positive. This leads to a sparse code which generates a forward model for anticipative corrections. This forward model exists only in locations of state space where the error is prone to increase, and predicts the output of a central controller, not the output of the controlled object. Although we have assumed that the central controller uses closed-loop feedback, this is not necessary. Our model has the potential to explain the presence of predictive and feedback performance errors in Purkinje cell simple spikes [75,76], the correlation of complex spikes with both sensory and motor events [42], the sparsity of complex spikes, and as discussed below, the role of the cerebellum in nonmotor operations [41,12,52,74].
We have explored this idea for cerebellar function in the context of reaching in 3D space. We proved mathematically that for idealized versions of the central controller, the arm, and the cerebellum, the cerebellar corrections are guaranteed to reduce oscillations. This mathematical treatment provides a clear proof of concept and, just as importantly, indicates ways in which our cerebellar system could fail. We then presented a computational model of reaching in 3D space. The errors in the computational model could be generated by one of two mechanisms, both showing that by remembering the corrections performed by a central controller, and associating those corrections with particular contexts, we can provide predictive control without the need to predict the kinematic or dynamic state variables of the controlled plant. We also showed that a signal that very loosely represents the positive part of the error derivative is sufficient to train our predictive controller. The corrections that our model cerebellum provides avoid episodes in which the hand moves away from the target; this is important when using a controller based on the lambda model of the equilibrium-point hypothesis [30]. A controller that only specifies a set of target muscle lengths (and not a trajectory of such lengths) may produce reaches by simultaneously contracting all the muscles whose lengths are longer than their desired lengths. This, in general, will not result in a straight-line reach. What the cerebellar controller does is to modify the activity of antagonist muscles at different points of the trajectory so that the hand monotonically approaches its target, producing a reach that is closer to a straight line.
Our computational model doesn't reflect how nuclear cells learn to respond to signals associated with errors. We presented hypotheses of how this could be achieved depending on the type of error signal produced by the inferior olive. In the case of a visual signal that measures the distance between the hand and the target, we could rely on spinal cord plasticity. In the case of error signals arising from individual muscles the problem is simplified, as the cells in cerebellar nuclei only have to affect the muscles that generate their error signals. In effect, cerebellar modules could work as 1-dimensional systems, with an adaptive filter such as the one in [31] or [17] being sufficient to perform the corrections. A prediction of the model is thus that excitatory cells in the deep cerebellar nuclei should increase their rate in response to commands that affect the regions where the inferior olivary cells in their microcomplex have their receptive fields [29,27]. This response, however, may be suppressed by inhibition from Purkinje cells in common contexts where no errors are made. Moreover, the relation between the firing of Purkinje and nuclear cells should not be expected to be straightforward, particularly when measuring only one Purkinje cell at a time [9]. We believe that the effect of Purkinje cell activity on nuclear cells will only become evident if it is possible to simultaneously measure the activity of several Purkinje cells in a microcomplex.
Another prediction when using visually generated errors is that plasticity at the level of the brainstem or the spinal cord may be essential for ensuring that the cerebellar corrections achieve their intended effect, at least during development and for the control of multi-jointed limbs. Some models assume that plasticity in the cerebellum is distributed between the cerebellar cortex and the deep cerebellar nuclei [80]. Our model posits one further memory site outside of the cerebellum, responsible for adjusting the effect of its outputs. The outputs of the cerebellar cortex both modulate and act as a learning signal for the vestibular/cerebellar nuclei. In turn, the output from the cerebellar nuclei modulates and trains the response in the brainstem/spinal cord.
A model related to the one in this paper was presented by Fujita [32]. In this model, associative learning is used to link motor commands with the subsequent corrections performed by a high-level controller. Fujita assumed that if a high-level motor center unit made a projection to a microcomplex, then the nuclear cells of that microcomplex and the motor center unit would encode the same information. We have no high-level motor center units in our model; instead, we made the assumption that plasticity mechanisms outside the cerebellum permit the nuclear cell activity to affect the muscles required for the correction. Another difference is that, in our model, the context associated with a correction may contain afferent information [33,36,16], which allows for the possibility that the same motor command may require different corrections under different circumstances.
It has been shown that perceived errors are sufficient to produce adaptation in reaching movements, so that executing the corrective motion is not necessary for improving performance [49,87]. As in [32], movement execution is not necessary for training our model, as long as, shortly after an error is committed, a copy of the subsequent efferent command reaches the cerebellum, even if that command is suppressed.
After decades of research, there are many ideas about the role of the cerebellum in motor control coming from different viewpoints, such as eye movement control, eyeblink conditioning, control of grip forces, and motor control of speech [60]. A present challenge is to harmonize those ideas by finding the basic cerebellar computations that improve performance in the vast array of behaviors where the cerebellum is involved. This challenge is compounded by the requirement of finding plausible biological mechanisms that implement those computations. A candidate set of computations falls under the umbrella of forward models and feedforward processing [95,70], with the most common use of forward models being sensory prediction [96,5,25]. We have taken the idea of sensory prediction and transformed it into error onset prediction. The prediction of error onset can take place with a sparse training signal, like the one provided by climbing fibers, where the most important factor is the timing of its occurrence. The cerebellum does not necessarily need to compute the corrections that it presumably applies, since there is already a control system obtaining those corrections after an error is made. If the cerebellum can learn the onset of errors, all it may need to do in order to improve performance is to anticipatively apply the corrections that the control system would produce after the error. Moreover, the signal provided by the cerebellar nuclei is well suited to improve sensory data acquisition. The account of the cerebellum as an anticipative error correction system has the potential to explain its role in a variety of behaviors. There are other cerebellum models that provide anticipative error correction [4,15,69]; our model is characterized by predicting the response of a controller as it interacts with the environment, and by limiting its activity to episodes of error increase. Also, to obtain its commands, our model can rely not only on a central controller, but also on error signals from individual muscles, or on a mixture of both. For example, we can generate error signals when the hand separates from the target (a visual signal), and the corrections arising from that signal can be applied only to those muscles whose length is larger than their target value. The authors have found that such a hybrid system has unimpaired error-reducing abilities (data not shown).

Figure caption (fragment): Increases in this difference cause the olivo-cerebellar module (OC-MODULE) to associate the perceived context at the time of the increase with an anticipative correction. The effect of this correction could be additive, or it could modify a gain on the signal at the GAIN block. Notice that the difference between a threshold value and a perceived value could set the threshold of more than one control loop.
Perhaps the most interesting aspect of the model comes from its application to hierarchical models such as Threshold Control Theory (TCT) [30,54] and Perceptual Control Theory (PCT) [77,78]. Briefly, TCT posits that movement control begins by setting a threshold value for muscle lengths. Muscle contraction happens in response to the muscle length exceeding this threshold. For a given set of threshold values, interaction with the environment brings the organism to an equilibrium position; the organism needs to learn the threshold values that result in desired equilibrium positions. To solve redundancy problems with minimal action, this paradigm can be extended hierarchically. For example, if there is a neuron that responds monotonically to the aperture of the elbow angle, a controller can set a threshold value for that neuron (the neuron responds only when the elbow angle goes beyond the threshold). The elbow angle neuron can in turn set the threshold lengths of the biceps and triceps brachii muscles so that its threshold value can in fact control the elbow angle. At a higher level, there could be neurons that respond to the arm configuration, and affect the threshold levels for neurons responding to shoulder, elbow, and wrist angles. Each hierarchical level works as a feedback control system whose set point is specified by the level above. In this paradigm, known as cascade control, each level isolates the levels above from disturbances (as long as the lower levels operate on a faster timescale than the higher levels), and redundancies are resolved automatically. PCT shares some of the same ideas as TCT. In PCT the organism seeks to control its perceptions (instead of TCT's equilibrium positions), and this is achieved through a cascade control scheme, going from individual muscles to advanced cognitive operations. PCT also proposes a mechanism allowing such a hierarchy of control systems to arise. In either TCT or PCT, cerebellar modules can improve the performance of individual control loops if an error signal is emitted whenever a threshold value is exceeded (figure 7). The emission of this error signal can be conditioned on the error increasing on a higher level. This constitutes a hypothesis of how the cerebellum could function to improve motor and cognitive operations using repetitions of the same modular circuit.

Models
A formal proof of the cerebellum's ability to reduce error
Consider a point of mass $m$, moving under the influence of a central force $\mathbf{F}(\mathbf{r})$, where $\mathbf{r}$ is the position vector of the mass, $r = \|\mathbf{r}\|$, and at any moment $\mathbf{F}(\mathbf{r})$ is a vector directed opposite to $\mathbf{r}$ (pointing towards the origin). Define $f(r) \equiv \|\mathbf{F}(\mathbf{r})\|$. We assume that $f(0) = 0$ and $f'(r) > 0$ for all $r \ge 0$. We will identify the point mass with the hand, the origin with the target, and the force with the central controller. Before introducing a description of the idealized cerebellum, we summarize some classic results about central forces in the next lemma [66,86].

Lemma 1. Under the exclusive action of the central force, the trajectory of the point mass is: a) contained in a plane; b) a level curve of the energy function; c) a closed curve in the radial phase plane $(r, \dot r)$.
Proof. To see that the trajectory is contained in a plane, first notice that since the force always points towards the origin, there is no torque with respect to it. By conservation of angular momentum, the angular momentum $\mathbf{L}$ of the particle is a constant vector. Let $\mathbf{p}$ denote the momentum of the particle. By definition $\mathbf{L} = \mathbf{r} \times \mathbf{p}$. This means that $\mathbf{r}$ is orthogonal to $\mathbf{L}$, but $\mathbf{L}$ is constant, so the particle moves in a plane.
At this point we can define a pair of fixed coordinate axes x, y in the plane of motion. We can also define radial and tangential vectors $\mathbf{e}_r(t)$ and $\mathbf{e}_\theta(t)$, where $\mathbf{e}_r(t)$ is a unit vector pointing from the origin to the point mass at time t, and $\mathbf{e}_\theta(t)$ is a unit vector orthogonal to $\mathbf{e}_r(t)$, as in panel A of figure 8. The position of the mass is determined by its polar coordinates $(r, \theta)$, where $\theta(t)$ is the angle between $\mathbf{e}_r(t)$ and the x axis. The location of the system in state space is fully determined by the vector $(r, \theta, \dot r, \dot\theta)$.
To prove the next two parts of the lemma, let's write down the equations of motion for the particle. The force creates a potential $V(r)$, so that $V'(r) = f(r)$. The Lagrangian of the system is
$$\mathcal{L} = \frac{1}{2} m\left(\dot r^2 + r^2\dot\theta^2\right) - V(r). \tag{1}$$
The resulting equations of motion are
$$m\ddot r - m r\dot\theta^2 + f(r) = 0, \tag{2}$$
$$\frac{d}{dt}\left(m r^2\dot\theta\right) = 0. \tag{3}$$
Equation 3 states the conservation of angular momentum. We can rewrite it as
$$m r^2\dot\theta = L, \tag{4}$$
and notice that $|L| = \|\mathbf{L}\|$. Using equation 4 to eliminate $\dot\theta$ from equation 2, multiplying by $\dot r$, and integrating, we obtain
$$E = \frac{1}{2} m\dot r^2 + \left(V(r) + \frac{L^2}{2 m r^2}\right), \tag{5}$$
where E is a constant of integration that equals the energy of the system (the right-hand side of the equation is just the energy equation with $\dot\theta$ removed using equation 4). The fact that the motion of the particle obeys this equation proves the second part of the lemma. The expression enclosed by parentheses in equation 5 is known as the effective potential.
Since we assumed $f'(r) > 0$, $V(r)$ is a convex function, and the effective potential has the general shape shown in figure 8. The effective potential may attain the value E in at most two points $r_1, r_2$, which denote the minimum and maximum radii of the trajectory, respectively. Notice that smaller values of E will tend to reduce $r_2$, and smaller values of $|L|$ will reduce $r_1$. This will become relevant in the following theorems.
To show that the trajectory is a closed curve in the $(r, \dot r)$ plane, we use the result that in conservative systems, trajectories around isolated fixed points located at a minimum of the potential are closed. If we write equation 2 as two first-order differential equations in $r$ and $\dot r$, eliminate $\dot\theta$ using equation 4, and equate to zero, we can see that the only fixed point of our system occurs at $\dot r = 0$, $f(r) = L^2/(m r^3)$, which is the minimum of the effective potential. This is the case of circular motion.
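Lemma 1 can be checked numerically for one force law that satisfies our assumptions. The Python sketch below assumes a linear (Hooke) force $f(r) = kr$, for which $f(0) = 0$ and $f'(r) = k > 0$. Setting the effective potential $kr^2/2 + L^2/(2mr^2)$ equal to E gives $r^2 = (E \pm \sqrt{E^2 - kL^2/m})/k$, and a semi-implicit Euler simulation should keep r between these two radii. The initial conditions and step size are illustrative choices.

```python
import numpy as np

def turning_radii(E, L, k=1.0, m=1.0):
    """Radii where k r^2/2 + L^2/(2 m r^2) = E, for the force law f(r) = k r."""
    disc = np.sqrt(E**2 - k * L**2 / m)
    return np.sqrt((E - disc) / k), np.sqrt((E + disc) / k)

def simulate_radius(x0, v0, k=1.0, m=1.0, dt=1e-3, steps=10000):
    """Planar motion under F = -k r (semi-implicit Euler); returns r(t)."""
    x, v = np.array(x0, float), np.array(v0, float)
    radii = np.empty(steps)
    for n in range(steps):
        radii[n] = np.linalg.norm(x)
        v -= dt * k * x / m
        x += dt * v
    return radii


x0, v0, m, k = np.array([1.0, 0.0]), np.array([0.0, 0.5]), 1.0, 1.0
E = 0.5 * m * (v0 @ v0) + 0.5 * k * (x0 @ x0)      # 0.625
L = m * (x0[0] * v0[1] - x0[1] * v0[0])            # 0.5
r1, r2 = turning_radii(E, L, k, m)                 # 0.5 and 1.0
radii = simulate_radius(x0, v0, k, m)
```

For this particular force law the orbit in the plane is an ellipse with semi-axes $r_2$ and $r_1$.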
We will now describe the idealized cerebellum, which acts by applying an instantaneous impulse to the mass whenever the current point in state space reaches one of a set of previously stored points. Each correction will thus consist of a pair $(x_c, \mathbf{I}_c)$, where $x_c$ is the point of state space where the impulse $\mathbf{I}_c$ is applied.
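Definition 1. Let S(t) denote the point in state space of our central force system at time t. Let τ, α, d be positive real numbers, with α ∈ (0, 1). A cerebellum with speed threshold τ, displacement threshold d, and gain α will create a correction at the point $x_c = S(t_c)$ at time $t_c$ whenever two conditions are met:
1. $\dot r(t_c) \ge \tau$;
2. $r(t_c) \ge d$.

The impulse vector to be applied at point $x_c$ is obtained from:
$$\mathbf{I}_c = \alpha\int_{t_c}^{t_c+\xi_3}\mathbf{F}(\mathbf{r}(t))\,dt. \tag{7}$$
The value $\xi_3$ in equation 7 is the least upper bound of all values $\xi_3^*$ satisfying the next 3 conditions:
1. $\dot r(t) \ge \tau$ for all $t \in [t_c, t_c + \xi_3^*]$;
2. $|\theta(t_c + \xi_3^*) - \theta(t_c)| < \pi/2$;
3. $|I_r| < 2m|\dot r(t_c)|$, where $I_r$ is the component along $\mathbf{e}_r(t_c)$ of the impulse of equation 7 computed with $\xi_3^*$ in place of $\xi_3$.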
The first condition for $\xi_3$ states that the error should be increasing at a rate of at least τ during the integration interval. The second condition for $\xi_3$ is that the mass shouldn't rotate around the origin by more than π/2 radians. When L = 0 this is trivially satisfied, since equation 4 implies $\dot\theta(t) = 0$ for all t. When L ≠ 0 it suffices to have
$$\frac{|L|\,\xi_3^*}{m\,r^2(t_c)} < \frac{\pi}{2}.$$
In order to see this, notice that equation 4 implies $\dot\theta = L/(mr^2)$. Therefore, the condition is equivalent to
$$\left|\int_{t_c}^{t_c+\xi_3^*}\frac{L}{m\,r^2(t)}\,dt\right| < \frac{\pi}{2}.$$
Since $\dot r > 0$ on this interval (first condition), $r(t) \ge r(t_c)$, so the integrand is bounded in magnitude by $|L|/(m r^2(t_c))$, which gives the sufficient condition above. Because of this condition, in our computational simulations we reduced the probability of storing a correction when the value of r was small. The third condition for $\xi_3$ means that the impulse will not be strong enough to increase the original magnitude of the radial velocity after reversing it.
Applying an instantaneous impulse is akin to "teleporting" to a different point of state space, and is physically impossible. This abstraction models the application of an anticipative correction close to the point of "teleportation," in which case the impulse is applied through a force that is active for a short period of time.
We are now ready to prove that in the case of repeated identical trajectories the cerebellum reduces the energy and angular momentum of the system.

Theorem 1. Let the trajectory of the system in state space S(t) have initial conditions $S_0 = (r_0, \dot r_0, \theta_0, \dot\theta_0)$ with $r_0 > 0$, and assume that a correction is created at point $x_c = S(t_c)$. If we start the trajectory once more at $S_0$, and apply the correction at $x_c$, then the energy E of equation 5 will be reduced after the impulse is applied. If L ≠ 0, the magnitude of the angular momentum will also be reduced.
Proof. For visualization purposes, we take advantage of the fact that, up to a reflection, the trajectory of the point mass at time $t_c$ will always look as in figure 9A. Write the correction's impulse as $\mathbf{I}_c = I_r\mathbf{e}_r(t_c) + I_\theta\mathbf{e}_\theta(t_c)$, and the particle's momentum as $\mathbf{p}(t_c) = m\mathbf{v}(t_c) = m v_r(t_c)\mathbf{e}_r(t_c) + m v_\theta(t_c)\mathbf{e}_\theta(t_c)$. Notice that $L = m\,r(t)\,v_\theta(t)$ and $\dot r(t) = v_r(t)$.
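We first show that $\mathbf{I}_c$ acts to reduce $|L|$. To do this, we show that $I_\theta v_\theta \le 0$ and $|I_\theta| < |m v_\theta|$. Since the momentum $\mathbf{p}$ after the application of the impulse is the previous momentum plus $\mathbf{I}_c$, and the radius does not change at the moment of the impulse, $|L|$ will then be reduced.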
Assume that integration in equation 7 stops due to condition 1. Then the velocity of the particle at time $t_c + \xi_3$ will be nearly tangential (nearly orthogonal to $\mathbf{e}_r(t_c + \xi_3)$), but pointing a bit "to the right" depending on the value of τ (figure 9B), which means a positive $\mathbf{e}_\theta(t_c)$ projection. $\mathbf{e}_r(t_c + \xi_3)$ will lie between $\mathbf{e}_r(t_c)$ and $\mathbf{e}_\theta(t_c)$, otherwise condition 2 would be false. Therefore, in the interval $(t_c, t_c + \xi_3)$, $\mathbf{F}(\mathbf{r}(t))$ points to the left and $\mathbf{v}(t)$ points to the right, so $I_\theta v_\theta < 0$. Moreover, the accumulated impulse from integrating $\mathbf{F}(\mathbf{r}(t))$ is not sufficient to make $\mathbf{v}(t)$ point to the left, so $|I_\theta| < |m v_\theta(t_c)|$. If integration in equation 7 stops due to condition 2, this means that the particle is still moving "to the right" when crossing the horizontal line going through the origin, as otherwise condition 1 would be false. The same argument as in the previous case is applicable.
If the integration stops due to condition 3, then the velocity will still point to the right, since equation 4 implies that the angle increases monotonically.
To show that the energy is decreased after the impulse, we will show that the kinetic energy component $(1/2)m\dot r^2$ in equation 5 is reduced. The effective potential depends only on the radius and on $L^2$; the impulse does not change the radius at the moment of its application, and we have just shown that it does not increase $|L|$, so the effective potential cannot increase and the energy will decrease.
From condition 2 we know that $\mathbf{F}(\mathbf{r}(t))$ has a positive upwards component in $(t_c, t_c + \xi_3)$, so $I_r v_r(t_c) < 0$. Condition 3 ensures that even if the impulse reverses the direction of $v_r$, its magnitude will not increase; namely $|I_r| < 2m|v_r(t_c)|$. Therefore $\dot r^2$ will decrease its value, and this is further supported by the fact that α < 1.
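Theorem 1 can also be checked numerically. The Python sketch below discretizes Definition 1 for the illustrative force law $\mathbf{F} = -k\mathbf{r}$ (all parameter values are assumptions). It simulates one trajectory, creates a correction at the first point where $\dot r \ge \tau$ and $r \ge d$ (our reading of the speed and displacement thresholds), and accumulates $\alpha\int\mathbf{F}\,dt$ while $\dot r \ge \tau$, the rotation stays below π/2, and the radial impulse stays below $2m\dot r(t_c)$. It then replays the same trajectory with the impulse applied at $x_c$ and compares energy and angular momentum.

```python
import numpy as np

K, M, DT = 1.0, 1.0, 1e-3     # force constant, mass, time step (illustrative)

def run(x0, v0, steps, kick=None):
    """Semi-implicit Euler under F = -K r; kick = (step, impulse) is applied once."""
    x, v = np.array(x0, float), np.array(v0, float)
    xs, vs = np.empty((steps, 2)), np.empty((steps, 2))
    for n in range(steps):
        if kick is not None and n == kick[0]:
            v = v + kick[1] / M
        xs[n], vs[n] = x, v
        v = v - DT * K * x / M
        x = x + DT * v
    return xs, vs

def create_correction(xs, vs, tau, d, alpha):
    """Discretized Definition 1: returns (index of t_c, impulse) or None."""
    r = np.linalg.norm(xs, axis=1)
    rdot = np.sum(xs * vs, axis=1) / r
    candidates = np.flatnonzero((rdot >= tau) & (r >= d))
    if candidates.size == 0:
        return None
    tc = int(candidates[0])
    e_r = xs[tc] / r[tc]
    theta_c = np.arctan2(xs[tc, 1], xs[tc, 0])
    impulse = np.zeros(2)
    for n in range(tc, len(xs)):
        trial = impulse + alpha * (-K * xs[n]) * DT       # alpha * integral of F
        turned = abs(np.angle(np.exp(1j * (np.arctan2(xs[n, 1], xs[n, 0]) - theta_c))))
        if rdot[n] < tau or turned >= np.pi / 2 or abs(trial @ e_r) >= 2 * M * rdot[tc]:
            break                                         # a condition on xi_3 fails
        impulse = trial
    return tc, impulse

def energy(x, v):
    return 0.5 * M * (v @ v) + 0.5 * K * (x @ x)

def angular_momentum(x, v):
    return M * (x[0] * v[1] - x[1] * v[0])


x0, v0, steps = (1.0, 0.0), (0.5, 0.8), 3000
xs, vs = run(x0, v0, steps)
tc, I_c = create_correction(xs, vs, tau=0.1, d=0.5, alpha=0.5)
xs2, vs2 = run(x0, v0, steps, kick=(tc, I_c))       # same start, correction applied
E_before, E_after = energy(xs[tc], vs[tc]), energy(xs2[tc], vs2[tc])
L_before = angular_momentum(xs[tc], vs[tc])
L_after = angular_momentum(xs2[tc], vs2[tc])
```

With these values both the energy and the magnitude of the angular momentum drop when the stored impulse is applied, as the theorem predicts.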
Theorem 1 shows that the cerebellum works well when encountering error-prone points of state space that were visited before. These are discrete points, however, and form a set of measure zero (i.e., a set with no volume). If we expect the cerebellum to generalize its corrections to some extent, then those corrections should still be useful when applied at points of state space near the point where the correction was created. The next theorem addresses this issue.
Theorem 2. Let S(t) denote an arbitrary trajectory of the system through state space. Assume the cerebellum has stored a correction $(x_c, \mathbf{I}_c)$. Then there exists $\eta > 0$ such that $\|S(t^*) - x_c\| < \eta$ implies that $\mathbf{I}_c$ applied at $S(t^*)$ will reduce the energy of the system.
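Proof. Let's write $x_c = (r_c, \dot r_c, \theta_c, \dot\theta_c)$. Let $\mathbf{e}_r(t_c)$ and $\mathbf{e}_\theta(t_c)$ be the radial and tangential vectors corresponding to the point $x_c$. The impulse of the correction and the velocity of the point at $t_c$ in the trajectory where the correction was created can be written as
$$\mathbf{I}_c = I_r\mathbf{e}_r(t_c) + I_\theta\mathbf{e}_\theta(t_c), \qquad \mathbf{v}(t_c) = \dot r_c\,\mathbf{e}_r(t_c) + r_c\dot\theta_c\,\mathbf{e}_\theta(t_c).$$
Right after the impulse application at $x_c$ the new radial velocity will be $\dot r_c^+ = \dot r_c + I_r/m$, and the new angular momentum will be $L^+ = L + r_c I_\theta$. More generally, if $\mathbf{I}_c$ is applied at a state $(r, \dot r, \theta, \dot\theta)$, the change in energy after the impulse will be:
$$\Delta E = \dot r\,\|\mathbf{I}_c\|\cos\varphi + r\dot\theta\,\|\mathbf{I}_c\|\sin\varphi + \frac{\|\mathbf{I}_c\|^2}{2m},$$
where $\varphi = \theta_I - \theta$ and $\theta_I$ is the angle between $\mathbf{I}_c$ and the x axis. From theorem 1 we know that $\Delta E(x_c) = \Delta E_c < 0$. We want to know if there is a ball centered at $x_c$ where all values of $\Delta E$ are negative.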
To do this, we will use the fact that the gradient of ∆E is bounded near x c .We'll start by obtaining the partial derivatives.
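Notice that $r_c \ge d > 0$. Let $\delta_1 = r_c/2$, and define the ball $B(x_c, \delta_1) = \{s : \|s - x_c\| < \delta_1\}$. The Fundamental Theorem of Calculus for line integrals states that for any given trajectory γ of length λ starting at $x_c$ and ending at some point $s_B$ on the boundary of $B(x_c, \delta_1)$, and with tangent vector ν(t), we have
$$\Delta E(s_B) - \Delta E(x_c) = \int_\gamma \nabla(\Delta E)\cdot d\nu.$$
We define the number J to be the magnitude of the vector (h1, I c , h2, |I c ), which bounds $\|\nabla(\Delta E)\|$ on $B(x_c, \delta_1)$.
Since only the end points of the trajectory determine the value of the integral, we may choose a straight line from $x_c$ to the boundary of $B(x_c, \delta_1)$, yielding
$$\left|\int_\gamma \nabla(\Delta E)\cdot d\nu\right| \le J\lambda.$$
Considering that $\Delta E(s_B) = \Delta E_c + \int_\gamma \nabla(\Delta E)\cdot d\nu$, we can ensure that $\Delta E(s_B)$ remains negative if we make a short enough displacement along the straight line trajectory of equation 12. Namely, we can find a displacement $\delta_2$ such that $J\delta_2 < |\Delta E_c|$. Setting $\eta = \min(\delta_1, \delta_2)$, we can now ensure that if $s \in B(x_c, \eta)$ then $\Delta E(s) < 0$.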
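The continuity argument of Theorem 2 can be probed numerically. The Python sketch below evaluates the energy change produced by a fixed impulse $\mathbf{I}_c$ applied at a state $(r, \dot r, \theta, \dot\theta)$, using $\Delta E = \dot r\|\mathbf{I}_c\|\cos\varphi + r\dot\theta\|\mathbf{I}_c\|\sin\varphi + \|\mathbf{I}_c\|^2/2m$ with $\varphi = \theta_I - \theta$. This follows from $\Delta E = \mathbf{v}\cdot\mathbf{I}_c + \|\mathbf{I}_c\|^2/2m$, since the impulse changes only the velocity. The sketch then samples states in a small ball around $x_c$. The stored state, the impulse, and the radius `eta` are illustrative values, not results from our simulations.

```python
import numpy as np

def delta_energy(state, impulse, m=1.0):
    """Energy change when a fixed Cartesian impulse hits the state (r, rdot, theta, thetadot)."""
    r, rdot, theta, thetadot = state
    mag = np.hypot(impulse[0], impulse[1])
    phi = np.arctan2(impulse[1], impulse[0]) - theta   # phi = theta_I - theta
    return (rdot * mag * np.cos(phi) + r * thetadot * mag * np.sin(phi)
            + mag**2 / (2.0 * m))


x_c = np.array([1.0, 0.5, 0.0, 0.8])       # stored state (r, rdot, theta, thetadot)
I_c = np.array([-0.35, -0.04])             # stored impulse, opposing the motion
dE_c = delta_energy(x_c, I_c)              # negative at the stored point

rng = np.random.default_rng(0)
eta = 0.05                                 # candidate radius of the ball around x_c
directions = rng.normal(size=(1000, 4))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
samples = x_c + eta * rng.uniform(size=(1000, 1)) * directions
worst = max(delta_energy(s, I_c) for s in samples)   # largest energy change found
```

Sampling cannot prove the theorem, but it shows the expected behavior: because $\Delta E$ is smooth, a strictly negative value at $x_c$ stays negative throughout a small enough neighborhood.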
One final issue to address is that the two theorems above assume that the only force acting on the point mass is the force field $F(r)$. This ceases to be true once we store corrections associated with balls of positive radius in state space, since these corrections apply impulses to the mass. When these balls do not overlap, the conclusions of the theorems still hold, and if necessary the radii of the balls could be reduced so that they do not overlap. In practice, we did not find it necessary to ensure that the zones associated with different corrections do not overlap in our computational model.

Equations for the central controller
The central controller performs two tasks in order to reach for a target. The first task is, given the coordinates of the target, to produce the muscle lengths that would result from the hand being at those coordinates. The second task is to contract the muscles so that those target lengths are reached.
The first task (inverse kinematics) requires mapping 3D coordinates onto an arm configuration. The spatial configuration of the arm, which determines the hand location, is specified by three Euler angles α, β, γ at the shoulder joint and the elbow angle δ. To obtain a bijective relation between the 3D hand coordinates and the four arm angles, we set γ = 0.
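The formulas for α, β and δ are not reproduced in this extract. For the elbow alone, a law-of-cosines sketch follows, under two assumptions of ours: the shoulder is at the origin, and δ measures flexion away from full extension (δ = 0 for a straight arm). The function name and the segment lengths in the example are hypothetical.

```python
import math

def elbow_flexion(hand, L_arm, L_farm, tol=1e-12):
    """Elbow flexion (0 = straight arm) for a hand position relative to the shoulder."""
    d = math.sqrt(sum(c * c for c in hand))  # shoulder-to-hand distance
    if d > L_arm + L_farm + tol or d < abs(L_arm - L_farm) - tol:
        raise ValueError("target is out of reach")
    # Law of cosines for the interior angle at the elbow.
    cos_interior = (L_arm**2 + L_farm**2 - d**2) / (2 * L_arm * L_farm)
    cos_interior = max(-1.0, min(1.0, cos_interior))  # guard against rounding
    return math.pi - math.acos(cos_interior)

print(elbow_flexion((0.3, 0.35, 0.0), L_arm=0.3, L_farm=0.35))  # approximately pi/2
```

Because the shoulder-to-hand distance depends only on the elbow, δ can be obtained first; the shoulder angles then orient the arm, and their formulas depend on the Euler-angle convention, so they are not sketched here.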
For a given target hand position, we calculate the angles α, β, γ, δ corresponding to it. Using these angles, we calculate the coordinates of the muscle insertion points, from which the muscle lengths can be readily computed. When a muscle wraps around a bending line, we first calculate the point of intersection between the muscle and the bending line. The muscle length in this case is the sum of the distances between the muscle insertion points and the point of intersection with the bending line.
The formulas used to calculate the angles α, β, γ, δ given hand coordinates (x, y, z), with the shoulder at the origin, are:

where $L_{\mathrm{arm}}$ and $L_{\mathrm{farm}}$ are the lengths of the upper arm and forearm, respectively. If we have the coordinates of a humerus muscle insertion point (as a column vector) at the resting position, then we can find the coordinates of that insertion point at the position specified by α, β, γ using the following rotation matrix:

The coordinates of insertion points on the forearm at the pose determined by α, β, γ, δ are obtained by first applying the elbow (δ) rotation to the coordinates in the resting position, and then applying the shoulder rotation (α, β, γ). Details on how to determine whether a muscle intersects a bending line can be found in the function piece5.m, included with the source code. This function also obtains the point of intersection, which is the point along the bending line that minimizes the muscle length.
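The length of a wrapped muscle can be sketched as a one-dimensional minimization. The point $p(t) = p_0 + t\,d$ on the bending line that minimizes $\|a - p(t)\| + \|p(t) - b\|$ is found by ternary search, which is valid because this sum of Euclidean norms of affine functions of $t$ is convex. This is not a transcription of piece5.m: it assumes the muscle is already known to wrap around the line, and it restricts $t$ to a hypothetical parameter range $[t_{\min}, t_{\max}]$.

```python
import math

def wrapped_length(a, b, p0, d, t_min, t_max, iters=100):
    """Length a -> p(t) -> b minimized over points p(t) = p0 + t*d on a bending line.

    Assumes the muscle is already known to wrap around the line and that the
    contact point lies in the (hypothetical) parameter range [t_min, t_max].
    """
    def length(t):
        p = [p0_k + t * d_k for p0_k, d_k in zip(p0, d)]
        return math.dist(a, p) + math.dist(p, b)

    lo, hi = t_min, t_max
    for _ in range(iters):  # ternary search; length(t) is convex in t
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if length(m1) <= length(m2):
            hi = m2
        else:
            lo = m1
    t_star = (lo + hi) / 2
    return length(t_star), t_star

# Insertion points 1 and 2 units away from a bending line along the x axis.
total, t_star = wrapped_length((0.0, 1.0, 0.0), (3.0, 2.0, 0.0),
                               p0=(0.0, 0.0, 0.0), d=(1.0, 0.0, 0.0),
                               t_min=-5.0, t_max=5.0)
print(total, t_star)  # about 4.2426 (= sqrt(18)) and 1.0
```

For a straight bending line the result can be checked by "unfolding" the two insertion points into a common plane: the minimal length is $\sqrt{\Delta s^2 + (\rho_a + \rho_b)^2}$, where $\Delta s$ is their separation along the line and $\rho_a, \rho_b$ their distances from it.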
Once we have found target equilibrium lengths for the muscles, we must contract them until they adopt those lengths. To control the muscles we use a simple serial cascade control scheme. The length error $e_l$ of a muscle is the difference between its current length $l$ and its equilibrium length $\lambda$. The velocity error $e_v$ is the difference between the current contraction velocity $v$ (negative when the muscle contracts) and the length error $e_l$. The constants $g_l$, $g_v$ are gain factors. The input to the muscles is the positive part of the velocity error. This creates a force that tends to contract the muscle whenever its length exceeds the equilibrium length, but this force is reduced according to the contraction speed. At steady state the muscle lengths may or may not match the equilibrium lengths, depending on the forces acting on the arm. To promote stability, the output of the central controller was passed through a low-pass filter before being applied to the muscles. Also, to avoid getting stuck in equilibria away from the target, a small integral component was added, proportional to the time integral of the central controller's output.
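Because the expressions for $e_l$ and $e_v$ are not reproduced in this extract, the sketch below assumes $e_l = g_l(l - \lambda)$ and $e_v = g_v(e_l + v)$, i.e. the length error minus the contraction speed $-v$. This reproduces the behaviour described above (a positive input when the muscle is longer than $\lambda$, reduced as it contracts faster), but it is our assumption. The first-order low-pass filter, its time constant `tau_lp`, and the integral gain `k_i` are likewise hypothetical choices.

```python
from dataclasses import dataclass

@dataclass
class MuscleController:
    """One muscle's cascade loop (assumed error forms; see text)."""
    g_l: float           # length gain
    g_v: float           # velocity gain
    tau_lp: float        # hypothetical low-pass time constant (s)
    k_i: float = 0.0     # hypothetical integral gain
    filtered: float = 0.0
    integral: float = 0.0

    def step(self, l: float, v: float, lam: float, dt: float) -> float:
        e_l = self.g_l * (l - lam)        # > 0 when the muscle is longer than lambda
        e_v = self.g_v * (e_l + v)        # v < 0 while contracting, which reduces e_v
        raw = max(e_v, 0.0)               # muscle input: positive part of e_v
        alpha = dt / (self.tau_lp + dt)   # first-order low-pass filter coefficient
        self.filtered += alpha * (raw - self.filtered)
        self.integral += raw * dt         # running time integral of the controller output
        return self.filtered + self.k_i * self.integral

ctrl = MuscleController(g_l=10.0, g_v=2.0, tau_lp=0.02, k_i=0.1)
u = ctrl.step(l=0.31, v=-0.05, lam=0.30, dt=0.001)
```

One controller instance per muscle keeps the filter and integrator states separate; with `tau_lp = 0` the filter passes the raw input through unchanged.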

Algorithm for the cerebellum simulations
We now describe the part of the computational model that deals with the functions of a microcomplex (the file CBloop11c.m of the source code). To simplify the exposition, we do not consider the case in which the maximum number of "feature vectors" has already been stored.
The input to the microcomplex model has components that represent error and afferent/efferent signals. The error component consists of the distance between the hand and the target (the visual error) and its derivative (from which complex spikes are generated). The afferent information includes a quaternion describing the shoulder joint position, the derivative of this quaternion, an angle describing the elbow position, and this angle's derivative. The efferent input is the muscle input described in section 4.2 (consisting of 11 velocity errors) and, in addition, the desired shoulder position (expressed as a quaternion) and the desired elbow angle. The error and its derivative arrive with a visual delay of 150 ms. The rest of the information arrives with a proprioceptive delay of 25 ms.
The output of the microcomplex consists of 11 additional signals which will be added to the muscle inputs.
The algorithm's pseudocode is presented next. An unhandled spike is a complex spike whose "context", consisting of the afferent/efferent signals and the error shortly before the spike, has not been stored as a "feature vector". A "feature vector" is a context associated with a motor correction.
At each step of the simulation:

1: Generate complex spikes using the error derivative
2: if there are unhandled spikes then
2.1:   if the error derivative is no longer positive, or the time since the spike exceeds 300 ms then
2.1.1:     Store the context corresponding to the unhandled spike as a new feature vector
2.1.2:     Store the motor correction associated with the new feature vector
       end if
   end if
3: For each feature, calculate its distance to the current context, and add its motor correction to the output as a function of that distance

The process of generating complex spikes from the visual error is explained next. There are $N$ inferior olivary cells, of which $N_3$ are assumed to oscillate at 3 Hz and $N_7$ at 7 Hz. The phases of both cell subpopulations are uniformly distributed so as to occupy the whole range $[0, 2\pi]$ in the equation below. Let $\varphi(i)$ denote the phase of cell $i$, and $\alpha(i)$ denote its angular frequency. The probability to spike at time $t$ for cell $i$ is calculated as:

where $p$ is a constant parameter, $E$ is the visual error, and $[E']_+$ is the positive part of its derivative. At each step of the simulation a random number between 0 and 1 is generated for each cell. If that number is smaller than $P^i_{CS}$, and cell $i$ has not spiked in the last 200 ms, then a complex spike is generated.
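The spike-generation loop of step 1 can be sketched as below. Since the expression for $P^i_{CS}$ is not reproduced in this extract, `hypothetical_p_cs` is a stand-in of our own that only mimics the qualitative properties stated in the text: it is zero when $[E']_+ = 0$, smaller when $E$ is small, and modulated by each cell's oscillation. The 3 Hz and 7 Hz frequencies, the evenly spread phases, and the 200 ms refractory window follow the description above; the class name and parameter values are illustrative.

```python
import math
import random

def hypothetical_p_cs(t, phase, omega, p, E, dE):
    """HYPOTHETICAL spike probability; not the equation used in the paper."""
    oscillation = 0.5 * (1.0 + math.cos(omega * t + phase))
    return min(1.0, p * E * max(dE, 0.0) * oscillation)

class InferiorOlive:
    def __init__(self, n3, n7, p, refractory=0.2, seed=0):
        self.omegas = [2 * math.pi * 3.0] * n3 + [2 * math.pi * 7.0] * n7
        # Phases spread evenly over [0, 2*pi) within each subpopulation.
        self.phases = ([2 * math.pi * k / n3 for k in range(n3)]
                       + [2 * math.pi * k / n7 for k in range(n7)])
        self.last_spike = [-math.inf] * (n3 + n7)
        self.p, self.refractory = p, refractory
        self.rng = random.Random(seed)

    def step(self, t, E, dE):
        """Return the indices of cells that fire a complex spike at time t (seconds)."""
        fired = []
        for i, (omega, phase) in enumerate(zip(self.omegas, self.phases)):
            prob = hypothetical_p_cs(t, phase, omega, self.p, E, dE)
            if self.rng.random() < prob and t - self.last_spike[i] >= self.refractory:
                self.last_spike[i] = t
                fired.append(i)
        return fired

olive = InferiorOlive(n3=10, n7=10, p=5.0)
dt = 0.001
spikes = {}
for k in range(1000):
    t = k * dt
    for i in olive.step(t, E=0.2, dE=0.5):
        spikes.setdefault(i, []).append(t)
```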
Notice that complex spikes are less likely to be generated when the error is small. When the hand is close to the target, it is likely to oscillate around it. Generating cerebellar corrections in this situation could be counterproductive, as the angle between the hand and the target changes rapidly, and so do the required corrections. The second condition for equation 7 in the mathematical definition of our idealized cerebellum is the one that ensures that no corrections are created when the angle between the hand and the target has changed too much. Another mechanism in our computational simulations that ensures this is that no corrections are stored if the time between the complex spike and the time when the error stops increasing is more than 250 ms.
In step 2.1.1, the stored feature vector consists of the context as it was $\tau_v - \tau_p + \frac{t - t_{cs}}{2}$ milliseconds before the complex spike, with $\tau_v$ being the visual delay, $\tau_p$ the proprioceptive delay, $t$ the current time, and $t_{cs}$ the time when the complex spike arrived.
In step 2.1.2, the motor correction that gets stored is the average motor input from $t_{cs} - \tau_v + \tau_p$ to $t - \tau_v + \tau_p$.
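Steps 2.1.1 and 2.1.2 amount to reading two delayed windows from buffered signals. The sketch below computes them from the quantities defined above, with times in seconds. The buffer class `SignalHistory` and the function `store_correction` are hypothetical helpers, and the buffers are assumed to be time-stamped when the signals reach the microcomplex.

```python
from bisect import bisect_left, bisect_right

class SignalHistory:
    """Time-stamped buffer of vectors (hypothetical helper)."""
    def __init__(self):
        self.times, self.values = [], []

    def append(self, t, value):
        self.times.append(t)
        self.values.append(list(value))

    def at(self, t):
        """Most recent stored vector at or before time t."""
        k = bisect_right(self.times, t) - 1
        if k < 0:
            raise ValueError("no sample at or before t")
        return self.values[k]

    def mean(self, start, end):
        """Component-wise mean of the vectors stored in [start, end]."""
        window = self.values[bisect_left(self.times, start):bisect_right(self.times, end)]
        if not window:
            raise ValueError("empty window")
        return [sum(col) / len(window) for col in zip(*window)]

def store_correction(t_cs, t, tau_v, tau_p, contexts, motor_inputs):
    """Return (feature vector, motor correction) for a spike at t_cs handled at time t."""
    context_time = t_cs - (tau_v - tau_p + (t - t_cs) / 2)  # step 2.1.1
    feature = contexts.at(context_time)
    correction = motor_inputs.mean(t_cs - tau_v + tau_p, t - tau_v + tau_p)  # step 2.1.2
    return feature, correction

# Usage with the delays stated above (tau_v = 150 ms, tau_p = 25 ms).
contexts, motor = SignalHistory(), SignalHistory()
for k in range(2001):
    t_k = k * 0.001
    contexts.append(t_k, [t_k])
    motor.append(t_k, [1.0, t_k])
feature, correction = store_correction(1.0, 1.2, 0.15, 0.025, contexts, motor)
```

With a spike at 1.0 s handled at 1.2 s, the context is read at about 0.775 s and the correction averages the motor input over roughly 0.875 s to 1.075 s.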
The output that the microcomplex provides at each simulation step is obtained using radial basis functions. The distance between the current context and each feature vector is calculated, and those distances are normalized. The contribution of each feature vector to the output is its corrective motor action scaled by an exponential kernel of that normalized distance. Let $f(i)$ be the $i$-th feature vector, and $w(i)$ its associated correction. Let $v$ denote the vector with the current context information. We first obtain a distance vector $D$, whose components are:

The distance vector is normalized as $D_N = (\sqrt{M_F}/\|D\|)\,D$, where $M_F$ is the maximum number of feature vectors allowed. The contribution of feature $i$ to the output is $F(i) = w(i)\,e^{\gamma D_N(i)}$, with $\gamma$ being the kernel radius.
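A minimal sketch of step 3 follows. The component formula for $D$ is not reproduced in this extract, so we assume Euclidean distances $D(i) = \|f(i) - v\|$, and we assume the output is the sum of the contributions $F(i)$. With the exponent written as $\gamma D_N(i)$, contributions shrink with distance only if $\gamma < 0$, which is what the example uses. The function name and its `n_out` parameter are hypothetical.

```python
import math

def microcomplex_output(v, features, corrections, max_features, gamma, n_out=11):
    """Sum over stored features of w(i) * exp(gamma * D_N(i)); gamma < 0 assumed."""
    output = [0.0] * n_out
    if not features:
        return output
    D = [math.dist(f, v) for f in features]  # assumed Euclidean distance
    norm = math.sqrt(sum(d * d for d in D))
    scale = math.sqrt(max_features) / norm if norm > 0 else 0.0  # D_N = scale * D
    for w, d in zip(corrections, D):
        k = math.exp(gamma * scale * d)  # kernel on the normalized distance
        for j, w_j in enumerate(w):
            output[j] += k * w_j
    return output

features = [[0.0, 0.0], [1.0, 0.0]]
corrections = [[1.0], [1.0]]
out = microcomplex_output([0.0, 0.0], features, corrections,
                          max_features=4, gamma=-1.0, n_out=1)
print(out)  # approximately [1.135], i.e. 1 + exp(-2)
```

Because $\|D_N\| = \sqrt{M_F}$ after normalization, the kernel acts on relative rather than absolute distances: the feature nearest to the current context always dominates, whatever the overall scale of the context vectors.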
The coordinates of the targets used in our test reaches are shown in Table 4.

Figure 1: A) Typical use of a forward model to improve the performance of a feedforward controller. The forward model learns to predict the response of the controlled object to the motor commands, using an error that considers the difference between the predicted trajectory and the realized trajectory. Red lines indicate signals used for training the forward model. Based on figure 1A of (ref Ito13). B) Use of an inverse model to improve the performance of a feedback controller using the feedback error learning scheme of (ref KawatoGomi92). The output of the feedback controller is used to approximate the error in the motor command, so the inverse model can be trained. The red line indicates the learning signal. C) The forward model proposed in this paper when used to improve the performance of a feedback controller. The forward model associates a context consisting of a variety of sensory and motor signals with a command produced by the controller. The context will be associated with future controller commands whenever the sensory error increases, as indicated by the red line.

Figure 2: A) Schematic trajectory of the hand as it reaches for target T in 2 dimensions. Seven points of the trajectory are illustrated, corresponding to seven important points in time with different afferent/efferent contexts. 1. Initial position of the hand. 2. The context at this point will be associated with the correction. 3. The error begins to increase. 4. Complex spikes reach the cerebellar cortex in response to the error increase. 5. The error is no longer increasing. 6. The context at point 2 becomes associated with a correction, which consists of the mean efferent activity (roughly) between points 3 and 5. 7. Final hand position. B) After the correction in A) is learned and the same reach is attempted, the trajectory will be modified upon approaching point 2, with the correction being applied anticipatively (blue line). Notice that a different trajectory (red line) that passes through the spatial location of point 2 may not elicit the correction learned in A). This is because the correction is applied when its associated context is near the current context (which is a point in state space); those contexts contain velocities, efferent activity, and target location in addition to the arm's spatial configuration.

Figure 3: Geometry of the arm model. Blue lines represent the upper arm and forearm, with the small black sphere representing the shoulder. Red lines represent muscles. Cyan lines are bending lines. The colored spheres (with color representing their position along the Z axis) show the locations of the targets used in the reaching simulations.

Figure 6: Same as Figure 5, but the cerebellar system was trained using an error signal produced when muscle lengths exceeded their target values.

Figure 7: Olivo-cerebellar modules used to anticipatively adjust threshold values in a cascade control scheme. The difference between a received threshold value and a value perceived from the environment is transmitted to the olivo-cerebellar module. Increases in this difference cause the olivo-cerebellar module (OC-MODULE) to associate the perceived context at the time of the increase with an anticipative correction. The effect of this correction could be additive, or it could modify a gain on the signal at the GAIN block. Notice that the difference between a threshold value and a perceived value could set the threshold of more than one control loop.

Figure 8: A) Coordinate axes for the planar motion. B) Effective potential as a function of distance from the origin.

Figure 9: A) A rotated view of the motion, where the vector $e_r(t_c)$ points downwards and the vector $e_\theta(t_c)$ points to the right. B) General direction of all vectors when the correction is formed.

Let $\theta_I$ be the angle between the $I_c$ vector and the $x$ axis. The angle $\varphi_c$ formed by the vectors $I_c$ and $e_r(t_c)$ can be written as $\varphi_c = \theta_I - \theta_c$. We may also write $I_r = I_c \cos(\varphi_c)$, $I_\theta = I_c \sin(\varphi_c)$. Substituting this and equation 4 into equation 9 we get

$$\Delta E_c = I_c \dot r_c \cos(\varphi_c) + \frac{I_c^2 \cos^2(\varphi_c)}{2m} + I_c \sin(\varphi_c)\,\dot\theta_c + \frac{I_c^2 \sin^2(\varphi_c)}{2 m r_c^2}. \qquad (10)$$

Using equation 10 we may explore how the energy changes as a function of where the impulse is applied in state space. For a given vector $s = (r, \dot r, \theta, \dot\theta)$ with $r > 0$ we may define a function

$$\Delta E(s) \equiv I_c \dot r \cos(\varphi) + \frac{I_c^2 \cos^2(\varphi)}{2m} + I_c \sin(\varphi)\,\dot\theta + \frac{I_c^2 \sin^2(\varphi)}{2 m r^2},$$

where $\varphi = \theta_I - \theta$.