Context-dependent memory decay is evidence of effort minimization in motor learning: a computational study

Recent theoretical models suggest that motor learning includes at least two processes: error minimization and memory decay. While learning a novel movement, a motor memory of the movement is gradually formed to minimize the movement error between the desired and actual movements in each training trial, but the memory is slightly forgotten in each trial. The learning effects of error minimization trained with a certain movement are partially available in other non-trained movements, and this transfer of the learning effect can be reproduced by certain theoretical frameworks. Although most theoretical frameworks have assumed that a motor memory trained with a certain movement decays at the same speed during performing the trained movement as non-trained movements, a recent study reported that the motor memory decays faster during performing the trained movement than non-trained movements, i.e., the decay rate of motor memory is movement or context dependent. Although motor learning has been successfully modeled based on an optimization framework, e.g., movement error minimization, the type of optimization that can lead to context-dependent memory decay is unclear. Thus, context-dependent memory decay raises the question of what is optimized in motor learning. To reproduce context-dependent memory decay, I extend a motor primitive framework. Specifically, I introduce motor effort optimization into the framework because some previous studies have reported the existence of effort optimization in motor learning processes and no conventional motor primitive model has yet considered the optimization. Here, I analytically and numerically revealed that context-dependent decay is a result of motor effort optimization. My analyses suggest that context-dependent decay is not merely memory decay but is evidence of motor effort optimization in motor learning.


INTRODUCTION
After a few years of not training in a previously learned sport, it is easy to forget how to move one's body in a manner that is suitable for the sport. This phenomenon is known as memory decay and is a well-known component of motor learning. Indeed, recent studies suggest that motor learning includes at least two processes: the minimization of movement error and motor memory decay (Scheidt et al., 2001;Smith et al., 2006;Hirashima and Nozaki, 2012).
The properties of error minimization have been extensively investigated using perturbation paradigms such as a force field paradigm (Shadmehr and Mussa-Ivaldi, 1994) or a visuomotor transformation paradigm (Krakauer et al., 1999). In these paradigms, subjects exhibit trial-by-trial adaptations to the novel environment, and the movement error between the desired and actual movements decreases in each trial. The learning effect of error minimization generalizes when kinematics (e.g., the movement direction) change (Thoroughman and Shadmehr, 2000;Donchin et al., 2003). For example, when trained in unimanual reaching movements toward a target direction θ, the learning effects of the error minimization are partially available for reaching movements toward other target directions. In other words, the availability of the learning effect depends on context (in this example, the context is equal to the target direction). In the following, I refer to the learning effects of error minimization as learning effects and to the context-dependent availability of learning effects as generalization.
Generalization can be modeled using a motor primitive framework (Thoroughman and Shadmehr, 2000;Donchin et al., 2003). In this framework, the activities of the motor primitive determine the motor commands, the recruitment pattern of the motor primitive is determined by a target direction, and the motor commands are updated to minimize movement error. The motor primitive framework can naturally reproduce the generalization because a group of motor primitives recruited in reaching movements toward a target direction θ overlaps with a group of primitives recruited in movements toward other directions, θ ( = θ), which results in learning effects embedded in the motor primitives responsible for reaching movements toward θ are partially available in other reaching movements. Recent studies have revealed that the motor primitive framework can reproduce generalization in various situations, such as motor learning in a force field (Thoroughman and Shadmehr, 2000;Donchin et al., 2003) or visuomotor rotation (Tanaka et al., 2009), motor learning of bimanual reaching movements (Yokoi et al., 2011), motor learning of unimanual reaching movements when the both target direction and shoulder posture change (Brayanov et al., 2012), the dependence of the generalization on error-feedback information (Taylor et al., 2012), and the generalization between reaching movements of the right hand and those of the left hand (Yokoi et al., 2014). Thus, the properties of error minimization or generalization in a wide range of situations can be explained by the motor primitive framework.
By contrast, the properties of memory decay have rarely been investigated. In the perturbation paradigms, the memory of the learning effects decays in each trial (Scheidt et al., 2000). In the following, I refer to the memory of learning effects as motor memory. Nearly all the models of motor primitives assume that the rate of motor memory decay is context independent (Thoroughman and Shadmehr, 2000;Scheidt et al., 2001;Donchin et al., 2003;Smith et al., 2006;Tanaka et al., 2009;Yokoi et al., 2011;Brayanov et al., 2012;Hirashima and Nozaki, 2012;Taylor et al., 2012;Takiyama and Okada, 2012a;Yokoi et al., 2014), i.e., the motor memory learned with reaching movements toward θ decays during the trials of the reaching movements at the same speed as during trials of reaching movements toward other directions, θ ( = θ ). In the motor primitive framework, the context-independent decay is equivalent to weight decay (Hirashima and Nozaki, 2012). The weight values of each motor primitive determine how the primitives contribute to generate motor commands, and the weight decay introduces a trial-by-trial decay of the magnitudes of the weight values (see the Weight Decay section in the Results for details). However, a recent behavioral experiment found that the decay rate of motor memory can be context dependent (Ingram et al., 2013). When trained in reaching movements toward a target direction θ, the motor memory decayed faster during the trials of reaching movements toward θ than during trials of reaching movements toward other target directions θ ( = θ ). The authors of this study reported that a conventional model of motor learning could be fit well to their data with the assumptions of both generalization function and context-dependent memory decay. Notably, the authors heuristically introduced context-dependent decay to fit the conventional model to the actual data. Thus, the reason that memory decay is context dependent remains unclear. Although motor learning has conventionally been modeled as an optimization framework (Thoroughman and Shadmehr, 2000;Scheidt et al., 2001;Donchin et al., 2003;Smith et al., 2006;Tanaka et al., 2009;Yokoi et al., 2011;Brayanov et al., 2012;Hirashima and Nozaki, 2012;Taylor et al., 2012;Takiyama and Okada, 2012a,b;Yokoi et al., 2014), e.g., movement error minimization, no conventional optimization framework can reproduce the context-dependent memory decay, i.e., context-dependent memory decay raises the question of what is optimized in motor learning.
In the current study, I extend a motor primitive framework to reproduce the context-dependent decay of motor memory. Specifically, I introduce effort minimization, i.e., the minimization of motor command magnitude (Harris and Wolpert, 1998;Fagg et al., 2002), into the framework because previous studies have reported the minimization during motor learning processes, such as adapting to a force field (Emken et al., 2007;Huang et al., 2012) and a visuomotor transformation (Huang and Ahmed, 2014) and because no conventional model of the motor primitive has yet considered effort minimization. Effort minimization is a widely accepted framework in motor control because the standard deviation of muscle activity is proportional to the muscle activity itself, which allows effort minimization to effectively minimize motor noise (Jones et al., 2002) or endpoint variance (Harris and Wolpert, 1998). Thus, it is natural to introduce effort minimization into conventional frameworks of motor primitives. Here, I analytically and numerically demonstrate that a motor primitive framework can reproduce contextdependent decay with the assumptions of both error and effort minimization.

GENERAL FRAMEWORK
The present study focused on unimanual reaching movements in the horizontal plane toward radially distributed targets whose directions were defined as θ ∈ [−π, π] ( Figure 1A). During each reaching movement, an unpredictable perturbation p, such as a force field (Shadmehr and Mussa-Ivaldi, 1994) or a visuomotor transformation (Krakauer et al., 1999), was imposed and yielded a movement error e ( Figure 1B). The aim of the task was to accurately reach to a given target by generating an additional motor command x to compensate for the movement error, i.e., e = p − x. Here, following several previous studies, I assumed that x represents the lateral force in the force adaptation (Yokoi et al., 2011;Brayanov et al., 2012;Yokoi et al., 2014) or the movement angle in the visuomotor rotation (Tanaka et al., 2009;Taylor et al., 2012) at the time of the peak velocity because these values are considered to represent the degree of adaptation in a feedforward controller, which many motor learning studies have focused on. That is, p represents an applied lateral force in a curl force field paradigm (the units of p, x, and e are newtons) and a rotated movement angle in a visuomotor rotation paradigm (the units of p, x, and e are degrees). Notably, the movement error does not include the target direction. The motor primitive framework assumes that the motor commands of the baseline trials (before experiencing the adaptation paradigm) x b can be appropriately generated, i.e., x b = τ θ in force field adaptation, where τ θ is the desired torque to achieve the reaching movements toward θ, and x b = θ in the visuomotor rotation adaptation. When the perturbation p is applied and the compensatory motor command x is generated, the baseline motor command is distorted as x b − p + x, resulting in movement errors that can be written as e = τ θ − (x b − p + x) = p − x in the force field adaptation and e = θ − (x b − p + x) = p − x in the visuomotor rotation adaptation.
Following previous motor primitive models (Thoroughman and Shadmehr, 2000;Donchin et al., 2003;Tanaka et al., 2009;Yokoi et al., 2011;Brayanov et al., 2012;Taylor et al., 2012;Yokoi et al., 2014), we assumed that the recruitment pattern of the motor primitives or the motor primitive activities are determined by a target direction θ and the Gaussian The target direction (desired movement direction) θ t determined the activities of the motor primitives A i (θ t ) in the t-th trial. The task was to generate a motor command x t to compensate for an environmental change (perturbation) p t . The motor command x t was determined by a linear combination of motor primitive activities, , and these linear coefficients were modified to minimize a cost function E t (e.g., the squared movement error and effort): . These procedures are summarized in the Summary of the Motor Primitive Framework Section in the Materials and Methods.
where the scaling parameter σ i = σ is independent of i; ϕ i is the preferred direction (PD) of the i-th primitive, which is randomly sampled from a uniform distribution in the range [−π, π]; ||θ|| is a periodic function ||θ|| = ||θ + 2π || such that ||θ|| = θ for −π ≤ θ < π (θ = 0 is defined as the target for a straightforward reaching movement, Figure 1A), and i = 1, ..., N, and N is the number of primitives. The scaling parameter σ controls the number of primitives responsible for a reaching movement toward θ because the i-th primitive is activated independent of θ when σ is sufficiently large (i.e., the number of recruited primitives is large) and because the primitive shows its activity only in the case θ = ϕ i when σ is sufficiently small (i.e., the number of recruited primitives is small). The PD determines the reaching direction in which the i-th primitive shows its maximal activity. Because the PD was randomly sampled in each simulation run, the simulation results were variable across runs. The standard deviation of the simulation results is plotted in Figures 3D-I. The compensatory motor command for the reaching movement toward θ , x(θ ), is a linear summation of the motor primitive activities A 1 (θ), ..., A N (θ): where the adaptable weight value W i determines how the i-th primitive contributes to the generation of the motor command. Each weight W i is modified by (gradient descent rule) at the t-th trial to reduce the cost function E t , where the positive constant η > 0 denotes the learning rate. Conventional motor primitive frameworks assume that E t consists of the squared movement error e 2 t and the sum of squared which leads to the learning rule of Equation (12) in the Results section or to context-independent decay of motor memory. Here, I suppose that the cost function consists of the squared movement error and the squared motor command amplitude (effort) x 2 t , which leads to the learning rule of Equation (15) in the Results section or to context-dependent decay of motor memory. (4) where λ 1 and λ 2 are regularization parameters to determine the tradeoff between the minimization of the squared movement error, the minimization of the sum of the squared weight values, and the minimization of effort.

SUMMARY OF THE MOTOR PRIMITIVE FRAMEWORK
The motor primitive framework can be summarized as follows. Setting the parameters σ , η, λ 1 , λ 2 , p to certain values, e 0 = 0, W 0 = 0, ϕ i to a randomly sampled value in the range [−π, π], and the target direction θ t ∈ [−π, π] to a certain value at the t-th trial yields the following: (Determining recruitment pattern of motor primitives) (Update of linear coefficients) Notably, with the assumptions of the motor primitive frameworks and the gradient descent as the learning rule, these procedures are exact and without approximation. This motor primitive framework can explain trial-dependent changes in the movement Frontiers in Computational Neuroscience www.frontiersin.org February 2015 | Volume 9 | Article 4 | 3 error and generalization (Thoroughman and Shadmehr, 2000;Donchin et al., 2003).

GENERALIZATION FUNCTION
Learning effects trained with reaching movements toward θ t were generalized to untrained reaching movements toward θ through a generalization function G(θ t , θ), where θ t and θ were assumed to be the target directions in the training and test trials, respectively. Test trials indicate the trials to probe learning effects during or after the training trials. Equation (4) indicates the learning process during the training of reaching movements toward θ t . The degree of generalization from the reaching movements toward θ t to other reaching movements toward θ can be derived by multiplying A i (θ) and summing across all values of i in Equation (4): where the generalization function is defined as and both λ 1 and λ 2 are set to 0 for simplicity (cases in which λ 1 = 0 or λ 2 = 0 are discussed in the Results section). The generalization function G(θ t , θ) is defined as the inner product of the recruitment pattern of motor primitives during reaching toward θ t and those for reaching toward θ. Larger overlaps between the two recruitment patterns result in greater generalization. In addition, the generalization function is dependent only on σ (see Equation (10) described below) and is independent of other parameters, including the perturbation p.
More intuitively, when the number of primitives N is sufficiently large and η is divided by N (most conventional motor primitive models assume these conditions), the generalization function can be approximated as which indicates that the closer the target direction in the test trials (θ) is to the direction in the training trials (θ t ), the greater the generalization. In addition, the generalization function depends only on σ . Although this derivation of Equation (10) requires a sufficiently large N, N does not significantly influence the shape of G(θ t , θ) (Takiyama and Okada, 2012a). Thus, Equation (10) permits an intuitive understanding of the results of the current study. Furthermore, this type of generalization function suitably explains the actual data from a wide range of motor learning experiments (Thoroughman and Shadmehr, 2000;Donchin et al., 2003;Tanaka et al., 2009;Yokoi et al., 2011;Brayanov et al., 2012;Taylor et al., 2012;Yokoi et al., 2014).

STATE SPACE MODELING
Although the motor primitive framework assumed trial-by-trial variations in the weight value, W, it was possible to model only the trial-by-trial variation of the motor command x. In conventional state space models, motor learning is modeled as an optimization process regarding motor commands: where E t is a cost function at the t-th trial, e.g., the squared movement error. Notably, Equation (9) is equivalent to Equation (11) when θ = θ t and √ πσ 2 η is redefined as η. Thus, this state space model can be viewed as a motor learning model that focuses only on the trial-by-trial variation of the motor commands without assuming any generalization function. A previous study demonstrated that the state space model fits the data well if context-dependent decay and the generalization function are heuristically introduced into this model (Ingram et al., 2013), i.e., x t+1 (θ) = (1 − ηλ 2 G 1 (θ t , θ))x t (θ) + ηG 2 (θ t , θ)e t (G 1 and G 2 are some functions).

NUMERICAL SIMULATION
The learning rates in the motor primitive framework and state space modeling, i.e., η in Equations (12) and (15) and η in Equation (14), were set to 0.5 and 0.04, respectively. The rates of forgetting in weight decay (λ 1 in Equation 12) and in state space modeling (λ 2 in Equation 14) were set to 0.0015/η. The rate of forgetting in effort minimization in a motor primitive framework (λ 2 in Equation 15) was set to 0.03/η. The number of motor primitives N was set to 100, and the perturbation p was set to π/4. The three types of models considered in the current study, i.e., the motor primitive framework with weight decay, the state space model with effort minimization, and the motor primitive with effort minimization, produced nearly identical learning curves using these parameters ( Figure 3D). To investigate the parameter dependencies, p was set to π/6 or π/12 in Figures 2E,F, respectively, and η and λ 2 were set to 0.8 and 0.01 in Figures 3H,I.

RESULTS
I analytically and numerically investigated a condition in which the decay of motor memory and the availability of learning effects were context dependent. The present study assumed a task involving unimanual reaching movements toward radially distributed target directions in the horizontal plane ( Figure 1A; detailed descriptions of the following framework, analytical calculations, and each parameter value were provided in the Materials and Methods section). The goal of the task was to reach to the target accurately in a situation in which executed movements were perturbed by changes in the environment p, e.g., the external force generated by a manipulandum (Shadmehr and Mussa-Ivaldi, 1994) or visuomotor transformation (Krakauer et al., 1999). The motor command x to compensate for a perturbation was modeled by the summation of the activities of the motor primitives as x(θ) = N i=1 W i A i (θ ), where the adaptable W i represents how the i-th primitive contributes to the production of the motor command and A i (θ ) is the activity of the i-th primitive determined by the target direction. The current study assumed that the recruitment pattern of motor primitives was determined by the target direction. The movement error in the t-th trial can thus be expressed as e t = p t − x t . To minimize the squared movement error, the squared sum of the weight values, and the squared motor command amplitude (effort), W i was modified in each trial (Equation 4).

WEIGHT DECAY
Conventional models of motor primitives assume weight decay (Thoroughman and Shadmehr, 2000;Donchin et al., 2003;Tanaka et al., 2009;Yokoi et al., 2011;Brayanov et al., 2012;Hirashima and Nozaki, 2012;Takiyama and Okada, 2012a;Taylor et al., 2012;Yokoi et al., 2014), i.e., the cost function E t to be minimized is defined as a weighted sum of the squared error and the squared sum of weight values E t = 1 2 e 2 t + λ 1 2 N i=1 W 2 i,t , where a regularization parameter λ 1 determines the trade-off between error minimization and weight value minimization. The learning rule can thus be written as where λ 1 indicates the forgetting rate and η denotes the learning rate: The larger λ 1 is, the faster the motor memory decays, and the larger η is, the faster the movement error is minimized. In addition, the larger the i-th primitive is activated, the faster the motor memory can be embedded in the i-th primitive because the learning rate η is modulated by A i in Equation (12). However, the forgetting rate is not modulated by A i , which results in motor memory decays in the trial when the i-th primitive is not activated. The assumed situation in this equation was an adaptation process in which the subjects adapted to a perturbation with reaching movements toward θ t . Equation (12) can be rewritten as a recursive equation of x: where is the generalization function (detailed derivation and description are provided in the Generalization Function section in the Materials and Methods). This equation indicates how the learning effects trained during reaching toward θ t are generalized to the untrained reaching movements toward θ in each trial. The generalization function indicates that the learning effects can be largely observed in test trials when tested movement direction θ is close to the trained movement direction θ t . Although this generalization function can be fit well to a wide range of experimental data (Thoroughman Frontiers in Computational Neuroscience www.frontiersin.org February 2015 | Volume 9 | Article 4 | 5

FIGURE 3 | Results of simulation 2. (A)
Simulated experimental setting for simulation 2. Training trials were conducted for 200 trials with θ = 0. After the training, 20 test trials with θ = −π + 2π k K and 100 relearning trials with θ = 0 were alternately simulated for K cycles, where K = 16, and the integer k was pseudorandomly sampled from the range [0, K − 1] to assume different values in each cycle. In the test trials, the movement error was forcibly set to 0 assuming error clamp trials. (B) Trial-by-trial variation of the motor command x in the test trials with the motor primitive framework with effort minimization, θ = 0 (blue line), θ = π/12 (green line), θ = π/6 (red line), and θ = π/3 (cyan line). The vertical black dotted line indicates the 10th test trial. (C) Trial-by-trial variation of the motor command in the relearning trials with the motor primitive framework with effort minimization. The black dotted line denotes the trial-by-trial variation of the motor command in the relearning trials with the motor primitive framework with weight decay (independent of θ ). The vertical black dotted line indicates the 10th relearning trial. (D) Learning curves. The horizontal axis indicates the trial number, and the vertical axis denotes the movement error e t . The red line, blue line, and circle denote the trial-by-trial variation of e t averaged across 20 simulation runs in the motor primitive framework with effort minimization, the framework with weight decay, and the state space model with effort minimization, respectively. The red and blue shaded areas indicate the standard deviations of the learning curves in the 20 simulation runs in the motor primitive framework with effort minimization and those with weight decay, respectively. After 200 trials, the motor commands x 200 (0) converged to x 0 in all three models. (E) Generalization function averaged across 20 simulation runs. The horizontal axis indicates the tested movement direction θ , and the vertical axis indicates the motor command x(θ ). To draw the generalization function, I averaged the motor commands of the initial 10 test trials in each cycle (the 10th test trial is indicated by the vertical black dotted line in B). The red and blue shaded areas indicate the standard deviation of the generalization function in the 20 simulation runs in the motor primitive frameworks with effort minimization and weight decay, respectively. (F) Context-dependent decay investigated in the relearning phase with θ = 0. The horizontal axis indicates the tested movement direction θ , and the vertical axis indicates the normalized motor command in relearning trials. The normalized motor command was defined as 1 10 10 t=1 x relearning,t (0)/x 0 , where x relearning,t (0) is a motor command at the t-th relearning trial in each cycle, i.e., I averaged the motor commands of the initial 10 relearning trials in each cycle (the 10th relearning trial is indicated by the vertical black dotted line in C) and divided the averaged value by x 0 . The (Continued) Frontiers in Computational Neuroscience www.frontiersin.org February 2015 | Volume 9 | Article 4 | 6 FIGURE 3 | Continued red and blue shaded areas indicate the standard deviations of the memory decays in the 20 simulation runs in the motor primitive frameworks with effort minimization and weight decay, respectively. (D,E,F) Parameter sensitivities of the learning curve, the generalization function, and the context-dependent memory decay in the motor primitive with effort minimization. The red lines denote the generalization function and the context-dependent decay with η = 0.5 and λ = 0.03, and the green lines denote those with η = 0.8 and λ = 0.01. The red and green shaded areas indicate the standard deviations of the learning curves in the 20 simulation runs in the motor primitive frameworks with effort minimization when η = 0.5 and λ = 0.03 and η = 0.8 and λ = 0.01, respectively. and Shadmehr, 2000;Donchin et al., 2003;Tanaka et al., 2009;Yokoi et al., 2011;Brayanov et al., 2012;Taylor et al., 2012;Yokoi et al., 2014), the decay of motor memory, the first term in the right side of Equation (13), is independent of the target directions θ and θ t (i.e., the context-independent memory decay), in contrast to the results of a previous behavioral experiment (Ingram et al., 2013). Thus, the motor primitive framework with weight decay does not reproduce context-dependent memory decay.

EFFORT MINIMIZATION IN A STATE-SPACE MODEL
In addition to the motor primitive framework, the state space model is another widely used model of motor learning (Scheidt et al., 2001;Smith et al., 2006;Lee and Schweighofer, 2009). The state space model focuses on the trial-by-trial variation of the motor command x rather than the trial-by-trial variation of the weight value W. Thus, weight decay cannot be assumed in the state space model because this model does not assume any weight value. Memory decay was modeled as effort minimization in the state space model using the cost function E t = 1 2 e 2 t + λ 2 2 x 2 t , i.e., a weighted sum of squared error and effort. Optimizing this cost function led to the minimization of movement error and effort. The learning rule can be written as where the memory decay and the availability of learning effects are context independent, also in contrast to the results of previous behavioral experiments (Thoroughman and Shadmehr, 2000;Donchin et al., 2003;Ingram et al., 2013). Although the state space model can be fit well to actual data by heuristically introducing context-dependent memory decay and a generalization function (Ingram et al., 2011(Ingram et al., , 2013, such modeling leaves the question of what is optimized in motor learning. While more complicated versions of state space models can be considered (Smith et al., 2006;Lee and Schweighofer, 2009;Takiyama et al., 2009;Takiyama and Okada, 2011), context-dependent memory decay cannot be reproduced. Thus, the state space model is a powerful model in some cases (Scheidt et al., 2001;Smith et al., 2006;Lee and Schweighofer, 2009), but is not suitable for modeling and interpreting the context-dependent decay of motor memory and generalization in an optimization framework.

EFFORT MINIMIZATION IN THE MOTOR PRIMITIVE FRAMEWORK
In the previous two models, the parameter being optimized differed: the weight value was minimized in the conventional motor primitive framework, while effort was minimized in the state space model. Although the functional roles of weight decay in motor learning remain unclear (cf. Hirashima and Nozaki, 2012), effort minimization has been reported to be effective in minimizing motor noise (Jones et al., 2002) and endpoint variance (Harris and Wolpert, 1998). Furthermore, some previous studies have reported that not only movement error but also effort is minimized in the adaptation to a force field (Emken et al., 2007;Huang et al., 2012) and a visuomotor transformation (Huang and Ahmed, 2014). Thus, I propose a motor primitive framework with effort minimization. When E t is defined as E t = 1 2 e 2 t + λ 2 2 x 2 t , motor learning can be modeled as The second term in the right side of Equation (15) is a memory decay that can be modulated by x(θ t ) and A i (θ t ); the larger the i-th primitive is activated, the faster the motor memory embedded in the primitive decays. Equation (15) yields a recursive equation for motor commands: where both memory decay and error minimization are context dependent. When trained with reaching movement toward θ t and tested with movements toward θ, the motor memory decays faster in the test trials when the tested movement direction is close to the trained movement direction because the forgetting rate ηλ 2 G(θ t , θ) is maximal when θ = θ t . This result is consistent with results of a previous experiment (Ingram et al., 2013). Taken together, effort minimization in a motor primitive framework is the only candidate that can simultaneously reproduce context-dependent memory decay and generalization.

SIMULATION 1: VALIDATION OF THE ANALYTICAL CONSIDERATION
To validate the analytical considerations, I conducted numerical simulations (Figures 2, 3). The first simulation consisted of 100 training trials, 100 test trials, and 50 retest trials (Figure 2A). In the training trials, the (simulated) subjects were required to adapt to p = π/4 in reaching movements toward θ = 0. The test trials were "error clamp" trials in which the movement error e was forcibly set to 0 to enable a clear discussion of learning effects by excluding any adaptation process (Scheidt et al., 2000). These test trials were simulated to investigate how the learning effects trained in training trials are generalized to reaching movements toward θ = 0 (blue line), θ = π/12 (green line), θ = π/6 (red line), or θ = π/4 (cyan line). Notably, the error clamp trials also enabled a clear discussion of forgetting processes because no adaptation can be assumed to occur during the trials. Thus, the memory of learning effects trained in the training trials decayed during the test trials depending on θ. To investigate how the target direction (context) affected the decay of the motor memory, another 50 error clamp trials were imposed with θ = 0 after the test trials (Figure 2A).

Frontiers in Computational Neuroscience www.frontiersin.org
February 2015 | Volume 9 | Article 4 | 7 The above simulated experimental setting enabled an investigation of whether the decay of motor memory and the availability of learning effects are context dependent or context independent. If the availability of learning effects is context dependent, the learning effects trained with θ = 0 in the training trials will be generalized to other reaching movements in the test trials, and the motor command toward the other target directions x(π/12), x(π/6), and x(π/4) will not be 0 in those trials. By contrast, if the availability of learning effects is context independent, then no generalization should be observed, and x(π/12), x(π/6), and x(π/4) are 0 in the test trials. If the memory decay is context independent, the motor commands in the retest trials should be independent of the target direction in the test trials, i.e., the blue, green, red, and cyan lines that indicate x(0) should converge to the same value in the retest trials. This result would indicate that the motor memory decays at the same rate in the test trials independent of the target direction. However, if the memory decay is context dependent, then the motor commands in the retest trials should depend on the target direction in the test trials, and the blue, green, red, and cyan lines that indicate x(0) should diverge to different values in the retest trials. This result would indicate that the motor memory decays with different rates in the test trials depending on the target direction.
In the motor primitive framework with weight decay Equation (12), the availability of learning effects is context dependent ( Figure 2B, the blue, green, red, cyan lines in the test trials). However, the memory decay is context independent (Figure 2B, all the lines converged to the same value in retest trials). In the state space model with effort minimization Equation (14), neither the availability of learning effects nor the memory decay is context dependent (Figure 2C). Thus, these conventional frameworks are not sufficient to reproduce both generalization and context-dependent memory decay.
By contrast, the motor primitive framework with effort minimization Equation (15) reproduced both generalization and the context-dependent memory decay ( Figure 2D). Furthermore, these generalization and context-dependent decay were independent of the magnitude of perturbation (Figures 2E,F). Thus, my analytical considerations were validated; context-dependent memory decay is a result of effort minimization in the motor primitive framework.

SIMULATION 2: REPRODUCTION OF PREVIOUS EXPERIMENTAL DATA
Next, I reproduced the results of a previous behavioral experiment (Ingram et al., 2013) using the assumptions of a similar experimental setting (Figure 3). The training involved 200 trials with θ = 0. After these trials, the motor command x 200 (0) converged to x 0 ( Figure 3D). After the training, 20 test trials with θ = −π + 2π k K and 100 relearning trials with θ = 0 were alternately simulated for K cycles in which the integer k was peudorandomly sampled from the range [0, K − 1] to take different value in each cycle and K was 16. Once the integer k was sampled, k, or θ was fixed across the 20 test trials. The movement error was forcibly set to 0 in the test trials (error clamp trials). These test trials enabled to investigate the generalization. If the availability of the learning effects is context independent, motor commands in test trials will be 0 except for the case when θ = 0. By contrast, the availability of the learning effects is context dependent, the learning effects trained in training or relearning trials will be generalized to other reaching movements in test trials. Because the magnitude of the motor commands (motor memory) decayed during the test trials (Figure 3B), I averaged the motor commands in the initial 10 test trials to obtain the generalization function shown in Figures 3E,H to eliminate the effects of decay on the function.
After the test trials, 100 relearning trials with θ = 0 were imposed to investigate the context dependence of the memory decay. If the decay rate of the memory was independent of θ in the test trials, the motor commands in the relearning trials should have the same values independent of θ . By contrast, if the decay rate of the memory depended on θ , the motor commands in the relearning trials should have different values depending on θ . Because the motor commands converged to the same value during the relearning trials (Figure 3C), I averaged the motor commands in the initial 10 relearning trials to obtain the contextdependent memory decay in Figures 3F,I to eliminate the effects of relearning on the function of the context-dependent memory decay. In addition, based on a previous study (Ingram et al., 2013), the motor commands in relearning trials were denoted after normalizing by x 200 (0) in the training trials (Figures 3C,F,I).
Similar to the previous section, Simulation 1: Validation of the Analytical Consideration, neither the motor primitive framework with weight decay nor the state space model with effort minimization reproduced the context-dependent memory decay in an experimental setting similar to that of the previous study (Ingram et al., 2013) (blue line and circles in Figures 3E,F). By contrast, the motor primitive framework with effort minimization reproduced both the context-dependent memory decay and the generalization in this experimental setting (red line in Figures 3E,F). Furthermore, these results were not sensitive to parameter changes (Figures 3G-I); although the magnitude of the memory decay was reduced when λ 2 was small, the context dependencies of the memory decay and the generalization were invariant. Thus, this framework was able to reproduce the generalization and context-dependent decay within an experimental setting similar to that of the previous study (Ingram et al., 2013). Taken together, the context-dependent decay and the generalization were the results of optimizing the effort and movement error in the motor primitive framework.

DISCUSSION
In the current study, I revealed that context-dependent decay is evidence of effort minimization in motor learning by extending a motor primitive framework. The conventional motor primitive framework succeeded in reproducing the availability of the learning effects trained with reaching movements toward θ in other reaching movements (Thoroughman and Shadmehr, 2000;Donchin et al., 2003;Tanaka et al., 2009;Yokoi et al., 2011;Brayanov et al., 2012;Taylor et al., 2012;Yokoi et al., 2014). Although the conventional models of motor learning support the existence of memory decay in motor learning processes (Thoroughman and Shadmehr, 2000;Scheidt et al., 2001;Donchin et al., 2003;Smith et al., 2006;Tanaka et al., 2009; Frontiers in Computational Neuroscience www.frontiersin.org February 2015 | Volume 9 | Article 4 | 8 widths of the context-dependent decay and the generalization function.