Computational model of precision grip in Parkinson's disease: a utility based approach

We propose a computational model of Precision Grip (PG) performance in normal subjects and Parkinson's Disease (PD) patients. Prior studies on grip force generation in PD patients show an increase in grip force during ON medication and an increase in the variability of the grip force during OFF medication (Ingvarsson et al., 1997; Fellows et al., 1998). Changes in grip force generation in dopamine-deficient PD conditions strongly suggest contribution of the Basal Ganglia, a deep brain system having a crucial role in translating dopamine signals to decision making. The present approach is to treat the problem of modeling grip force generation as a problem of action selection, which is one of the key functions of the Basal Ganglia. The model consists of two components: (1) the sensory-motor loop component, and (2) the Basal Ganglia component. The sensory-motor loop component converts a reference position and a reference grip force, into lift force and grip force profiles, respectively. These two forces cooperate in grip-lifting a load. The sensory-motor loop component also includes a plant model that represents the interaction between two fingers involved in PG, and the object to be lifted. The Basal Ganglia component is modeled using Reinforcement Learning with the significant difference that the action selection is performed using utility distribution instead of using purely Value-based distribution, thereby incorporating risk-based decision making. The proposed model is able to account for the PG results from normal and PD patients accurately (Ingvarsson et al., 1997; Fellows et al., 1998). To our knowledge the model is the first model of PG in PD conditions.


INTRODUCTION
Precision grip (PG) is the ability to grip objects between the forefinger and thumb (Napier, 1956). Successful performance of PG requires a delicate control of two forces (grip force, F G , and lift force, F L ) exerted by two fingers on the object. In a grip-lift task F G is kept sufficiently high to couple F L with the object via the agency of friction between the object and the fingers. An optimal F L is also required to overcome the object's weight and lift it off the surface of the table on which it rests. These forces (F L and F G ) are thought to be generated in parallel by different subsystems in the brain (Ehrsson et al., 2001(Ehrsson et al., , 2003. The critical F G at which the object slips is called the slip force (F slip ) and the difference between the actual steady state F G (SGF), used in a successful lift, and F slip is known as the safety margin (SM = SGF − F slip ). Johansson and Westling (1988) demonstrated the SM in controls to be 40-50% of slip force (Johansson and Westling, 1988). A high SM is employed to prevent the object from slipping due to internal (accelerations due to arm motion) (Werremeyer and Cole, 1997) and external (random changes in object load) perturbations -motor activity is optimized for the internal perturbations and this optimality is lost on the addition of an external perturbation external perturbation (Charlesworth et al., 2011;Sober and Brainard, 2012;Wolpert and Landy, 2012). An excessive SM is undesirable as it would cause muscle fatigue and may even lead to crushing of the object. SM in grip force is a crucial and defining parameter of PG performance. F slip serves as the threshold below which the object cannot be lifted. Human subjects operate at F G that is much higher than F slip ; operation at a small SM makes the gripping unstable. Therefore, this need to operate sufficiently far from the border of instability may be regarded as a strategy for minimizing risk. The need for a large SM indicates that concepts from theories of risk-dependent decision making (DM) may be applied to understand PG performance (Bell, 1995;D'Acremont et al., 2009). By definition, risk is the variance in reward outcome (Bell, 1995;D'Acremont et al., 2009). In the context of PG performance, a reward may be thought to be associated with successful lifts. The risk is maximum close to F slip and expected reward (value) saturates for grip forces much greater than F slip . Optimal PG consists of maximizing average reward while minimizing risk. The present study approaches the problem of PG in terms of risk minimization and describes it within the framework of Reinforcement Learning (RL). The model is used to explain PG performance in both controls and Parkinson's disease (PD) patients.
PG studies in PD patients show a remarkable difference in SM patterns between PD patients and controls Fellows et al., 1998). PD patients were shown to be capable of storing and recalling previous lift parameters (Muller and Abbs, 1990;Ehrsson et al., 2003). This allows them to scale F G when the load force on the object changes Fellows et al., 1998). Interestingly, even though F G scaling is preserved, scientific community is divided on the question of sensory deficits in PD patients being a cause for their altered SM. A sensory deficit would lead to suboptimal sensory-motor coordination. Some studies support the above theory of sensory deficit in PD (Moore, 1987;Schneider et al., 1987;Klockgether et al., 1995;Jobst et al., 1997;Nolano et al., 2008) and others Ingvarsson et al., 1997) reject it. Ingvarsson et al. (1997) demonstrated that the controls and PD OFF patients generated nearly similar SGFs. A higher SGF was generated in PD ON (L-Dopa Medication) case when the lifted object was covered with silk, suggesting a higher safety margin in PD ON condition . It has been suggested that this increase in SGF may be due to L-Dopa induced hyperkinesias Gordon and Reilmann, 1999). In another study, Fellows et al. (1998) observed that PD ON subjects show a higher SGF than controls, but there was no mention of PD OFF results (Fellows et al., 1998). Reports also suggest a considerably higher SGF variance in PD OFF condition when compared to controls and PD ON condition . This may indicate the importance of the concepts of risk sensitivity in understanding the SGF in controls, PD ON and PD OFF conditions. Furthermore, recent evidence suggests that risk-takers are less prone to Parkinson's disease (Sullivan et al., 2012). PD medications such as L-Dopa (Ehrsson et al., 2003) and dopamine agonists (Claassen et al., 2011) increase impulsivity and risk-seeking behavior in PD patients. PD subjects under medication tend to show less sensitivity to negative outcomes and therefore tend to make risky choices (Wu et al., 2009). The effect of PD medication (dopamine agonist) in enhancing riskseeking tendency was also confirmed using the Balloon Analog Risk Task-an elegant assay for risk-related behavior (Claassen et al., 2011). The impairment of risk-processing in PD patients (Ehrsson et al., 2003;Wu et al., 2009;Claassen et al., 2011), and altered SM in PD, makes a risk-based motor control approach to PG performance even more compelling.
None of the previously explained computational models for PG lift tasks (Kim and Inooka, 1994;Fagergren et al., 2003;de Gruijl et al., 2009) explain the grip force levels used by controls and PD patients. Hence, a computational model to explain grip forces in controls and PD forms the motivation for the present work.
Drawing from the aforementioned presentation of facts, we propose to model PG performance, and its alterations in PD condition, using the mathematics of risk. We use the concept of utility function, a combination of value and risk components, embedded in the framework of Reinforcement Learning (RL) (Bell, 1995;Long et al., 2009;Wu et al., 2009;Wolpert and Landy, 2012). Concepts from RL have been used extensively in the past to model the function of the Basal Ganglia (BG) in control and PD conditions (Sridharan et al., 2006;Gangadhar et al., 2008Gangadhar et al., , 2009Chakravarthy et al., 2010;Krishnan et al., 2011;Magdoom et al., 2011;Kalva et al., 2012;Pragathi Priyadharsini et al., 2012;Sukumar et al., 2012). In a recent modeling study, we used the utility function to model the role of the BG in reward, punishment and risk based learning (Pragathi Priyadharsini et al., 2012).
We now present a computational model for human PG performance in controls and PD subjects in (ON/OFF) medicated states. Using risk-based DM to model PG performance, we show the alteration of F G in PD patients (Wolpert and Landy, 2012) in a modified RL framework. Modeling results match favorably with experimental PG performance.
The paper is organized as follows. Section "Model" presents the model. Section "The Precision Grip Control System" presents PG control system. Section "The Utility Function Formulation and Computing U(F Gref )" presents the utility function formulation and Section "Modeling Precision Grip Performance as Risk-Based Action selection" presents a model of the BG based on the same (Magdoom et al., 2011;Pragathi Priyadharsini et al., 2012;Sukumar et al., 2012). In the results section, the model of the BG is used to explain PG performance of PD patients described by Ingvarsson et al. (1997) and Fellows et al. (1998). A discussion of the proposed model and modeling results is presented in the final section.

MODEL THE COMPLETE PROPOSED PRECISION GRIP MODEL IN A NUTSHELL
1. We first define a closed-loop control system in which the plant (the finger and object system) is controlled by two controllers-a F L controller and a F G . There are two inputs to the entire loop-a reference grip force (F Gref ) and a reference position (X ref ). The reference position, the position to which the object must be lifted, is predefined for a given task by the experimenter. We are now left with F Gref as the crucial parameter that determines the PG performance of the control system. F Gref is given as a step input to the F G controller; the output of the controller, F G (T), is used to grip the object ('T' denotes simulation time in milliseconds). The challenge consists of finding F Gref that leads to successful lifts. 2. We then construct the Utility function, U(F Gref ), consisting of both value and risk components, as a function of F Gref using a modified RL approach. The problem of finding the optimal F Gref is then treated as an action selection problem in the BG. In previous studies, we introduced the Go-Explore-NoGo (GEN) (Magdoom et al., 2011;Sukumar et al., 2012) method as a model of action selection in the BG. In the present study, we apply GEN on U(F Gref ) to simulate PG performance results for control and PD condition. 3. Using the gradient of the Utility function, δ U (representing the dopamine signal), as a key signal in control of action selection, we simulate the PG performances of controls and PD subjects in the experimental tasks by Ingvarsson et al. (1997) and Fellows et al. (1998).

THE PRECISION GRIP CONTROL SYSTEM
PG performance consists of two fingers and an object interacting through friction (represented by friction coefficient μ). A free body diagram showing the various forces acting on the fingers and the object are shown in Figure 1. The fingers, shown in two parts on either side of the object, represent the index finger and the thumb. For simplicity we assume that the two fingers are identical in mass and shape. F G is the grip force applied on the object horizontally acting in opposite directions. F L is the lift force acting at the finger-object interface, lifting the object up. By the action of the F G pressing on the object, and due to the friction between the finger and object, F L gets coupled to the object. The forces thus emerging between the finger and object are shown in Figure 1B. The frictional force F f acts on the object in the upward direction, with F f /2 on either side of the object.  Figure 2A illustrates the interaction of F L and F G in controlling the position of the object (X o ). The model consists of two controllers (F L and F G controllers) and a plant. Inputs to the plant are F L and F G while the outputs are the object position (X o ), finger position (X fin ) and their derivatives (Ẋ o ,Ẍ o ,Ẋ fin ,Ẍ fin ). The objective of the control task is to lift the object to a reference position, X ref . The model receives F Gref and X ref as the inputs and the X fin (position of finger), X o (position of object),Ẋ fin (velocity of finger),Ẋ o (velocity of object),Ẍ fin (acceleration of finger) andẌ o (acceleration of object) as outputs. The plant is described in detail in Appendix A.
The following sections describe the design of the controllers (F G and F L ) followed by their training method, respectively.

The grip force (F G ) controller
The F G controller (designed as a second order system) is used to generate the F G which couples the fingers to the object. Typical F G profiles of human subjects show a peak and a return to a steady state value, resembling the step-response of an underdamped second order system, thereby justifying the choice of an underdamped second order system as a minimal model. The F G controller for a step input (F Gref ) is given in Equation (1).
In order to determine the values of natural frequency, ω n and damping factor, ζ, maximum overshoot (M p , defined as the maximum peak value of the response curve) and time to peak (T p , peaking time of the response curve) are required. Using prior published experimental values (Johansson and Westling, 1984) for M P and T p , F G controller parameters are obtained using Equations (2, 3) (Ogata, 2002). where ω d the damped natural frequency is given as, See "Controller Training from the Model of The Precision Grip Control System" for the details of the above calculations.

Lift force (F L ) controller
The F L controller, which is a Proportional-Integral-Derivative (PID) controller [Equation (5)], takes the position (E L ) as input [Equation (6)], and produces a time-varying F L profile (F L,PID ) as output [Equation (5)] which in turn controls the object position.
Here the K P,L , K I,L and K D,L are the proportional, integral and derivative gains, respectively, for the F L controller. PID controller output is non-zero initially which is not realistic since the initial value of F L must be zero. Hence, F L,PID is smoothened using Equation (7).
where F L (T = 0) = 0. In order to design the F L controller, we simplify the full system of Figure 2A as a F L controller with a high constant F G (10 N) to prevent the slip ( Figure 2B). Note that if a constant and high value of F G is assumed, slip is completely prevented, and the F G controller is effectively eliminated from the complete system ( Figure 2B). The F L controller design now involves lifting a simple inertial load straight up from an initial position (X o = 0 m) to a final position (X o = 0.05 m).
Performance of the lift is evaluated using the cost function, CE, [Equation (8)]. This Cost function comprises of (1) average position difference between the finger (X fin ) and the object (X o ) at the end of the trial and (2) the difference in position between the desired and actual average position of the object.
The F L PID controller parameters were then optimized for cost function (CE) using Genetic Algorithm (GA) (Goldberg, 1989;Whitley, 1994) (refer Figure 3 for block diagram and for parameters refer Supplementary Material A) keeping F G constant (=10 N) at a sufficiently high value so that the object does not slip. The F L controller described above is designed assuming a constant and high grip force. But, when both F G and F L controllers are inserted in the full system ( Figure 2A) the system may behave in a very different manner due to gradual F G buildup starting from zero. When a step input of magnitude F Gref is given to the F G controller, the F G starts from 0, then approaches a peak value and stabilizes at a steady-state value known as Stable Grip Force (SGF). Even with the trained controllers, for low values of F Gref , the object may slip. But once the F Gref is sufficiently high, slip is prevented and the object can be lifted successfully. Therefore, for a successful lift, an optimal value of F Gref needs to be determined. The optimal F Gref , which is related to SGF needed for a successful grip/lift performance, varies with the experimental setup, skin friction etc. Fellows et al., 1998). It is here that we use concepts from RL and the utility function formulation for searching the F Gref state space.

THE UTILITY FUNCTION FORMULATION
The optimality of a decision is measured by the rewards fetched by it. Selection of an optimal choice providing the maximum expected value of the rewards (value) is known as optimal decision making (DM). DM can be seen in stock exchange, corporates and even our daily lives (which may or may not involve explicit monetary rewards). A lot of DM is carried out subconsciously when the person is unaware of the reason for choosing ones decision (Ferber et al., 1967). Non-human primates also show DM capabilities (Lakshminarayanan et al., 2011;Leathers and Olson, 2012). A mathematical framework is essential to empirically understand subjective DM behavior. Various independent studies confirm the role of value (average reward) and risk (variance in the rewards) in DM (Milton and Savage, 1948;Markowitz, 1952;Hanoch and Levy, 1969;Kahneman and Tversky, 1979;Bell, 1995;Lakshminarayanan et al., 2011).
A search for components of DM in the living world lead to identification neural correlates of value and risk in human and non-human primates (D'Acremont et al., 2009;Schultz, 2010;Lakshminarayanan et al., 2011;Leathers and Olson, 2012). Human DM process takes account of the risk in addition to the mean rewards that the decisions fetch. Furthermore, many neurobiological correlates of risk sensitivity are found in support of risk-based DM in humans (Wu et al., 2009;Schultz, 2010;Zhang et al., 2010;Wolpert and Landy, 2012).
An important instance of risk-based DM model is the utility function formulation, which is a combination of value and risk information (D'Acremont et al., 2009). The utility function used in the current model of PG performance derives from (Bell, 1995;D'Acremont et al., 2009) study that uses the concept of utility (U) as a weighted sum of value (V), which represents expected reward, and risk (h), which denotes variance in the reward. The weighting factor, λ, involved in the linear sum of V and h, denotes risk preference [Equation (9)].
We here introduce few terms from RL used in our study, following a policy (π) is associated with an expected value (V) of the state (s) at trial (t) is given by Equation (10).
where, E is expectation and γ is the discount factor denoting the myopicity in the prediction of future rewards, and the reward is denoted by "r." The V update is by Equation (11).
where η V is the learning rate for V and "δ" is the temporal difference error or the reward prediction error, and is given by Equation (12) Similarly, the risk prediction error, "ξ(t)" is denoted by Equation (13).
where, the risk function denoted by (h) is the variance in the prediction error [Equation (14)] Here η risk is the learning rate for risk. We now introduce a modified version of Utility (U) [Equation (9)], as follows, where "α sign(V(t))" is the risk preference (Pragathi Priyadharsini et al., 2012). The sign(V) term achieves a familiar feature of human decision making viz., risk-aversion for gains and risk-seeking for losses (Kahneman and Tversky, 1979). In other words, when V is positive (negative), U is maximized (minimized) by minimizing (maximizing) risk. We now use the above basic framework for modeling the Utility as a function of F Gref .

COMPUTING U(F Gref )
We now explain how the above-described Utility function formulation is applied in the present study. The Utility function, U, and its components V and h, are expressed as functions of F Gref , which is taken as the state variable. A given value of F Gref results in either slip or successful lift of an object. The measure of performance is expressed by CE [Equation (8)].
The value "V" and risk "h" are computed as a function of F Gref by repeatedly simulating lift for a range of values around F Gref . This helps us to analyze the possible variability caused on the performance of the PG task with F Gref . A range of values were obtained by adding uniformly distributed noise (ν: refer Table 1) to F Gref [Equation (16)]. We modeled ν to proportionally represent the F slip i.e., the value of ν is increased for a low μ.F For the given value of F Gref , V CE was drawn from the performance measure [Equation (8)] and is generated using the cost function CE as [Equation (17)]. Refer Figure 4 for the block diagram for determining V CE .
There is no explicit reward in the present formulation. V CE represents a reward-like quantity, the average of which is linked to value. Value [Equation (18)] and risk [Equation (19)] were calculated as the mean and variance of the V CE ,   (18, 19) is general to all the trials. These processes are summarized in Table 2. (20)], is then obtained in terms of V and h, as shown below: Decisions are made by choosing actions that maximize U(F Gref (t)). Increasing the value of α makes the decision more risk aversive, while the decisions are more risk-seeking for smaller values of α. Maximizing U(F Gref (t)) is done by a stochastic hillclimbing process called the "Go/Explore/NoGo" (GEN) method. This method is inspired by dynamics of the BG and was described in greater detail in earlier work (Magdoom et al., 2011). We now present a brief account of the GEN method.

MODELING PRECISION GRIP PERFORMANCE AS RISK-BASED ACTION SELECTION
The key underlying idea of the proposed model is to treat the problem of choosing the right F G as an action selection problem and thereby suggest a link between the action selection function of the BG and PG performance. Impairment in action selection machinery of the BG under PD conditions is then invoked to explain F G changes in PD ON and OFF conditions. In line with the tradition of Actor-Critic approach to modeling the BG (Joel et al., 2002); we recently proposed a model of the BG in which value is thought to be computed in the striatum. Furthermore, we hypothesized that the action selection function of the Basal Ganglia is accomplished by performing some sort of a stochastic hill-climbing over the value function computed in the striatum. We dubbed this stochastic hill-climbing method the "Go/Explore/NoGo" (GEN) method (Magdoom et al., 2011;Kalva et al., 2012), which denotes a conceptual expansion of the classical Go/NoGo picture of the BG function. As an extension of the above model, we had recently proposed a model of the  (18) and (19), respectively. •

Train two separate RBFNNs to generate V[F Gref (t)] and h[F Gref (t)]
with F Gref as input (see Appendix B for details).
BG in which the striatum computes not just value but the Utility function (Pragathi Priyadharsini et al., 2012). Action selection is then achieved by applying the GEN method to the Utility function. In the present study, we apply the GEN method to the Utility function and estimate grip forces in control and PD conditions.

Neurobiological interpretation of the GEN method in controls
We now present some background and neurobiological interpretation of the GEN method in connection with functional understanding of the BG, following which details of the GEN method will be provided.
The BG system consists of 7 nuclei situated on two parallel pathways that form loops-known as the direct pathway (DP) and the indirect pathway (IP)-starting from the cortex and returning to the cortex via the thalamus. Striatum is the input port of the BG, which receives inputs from the cortex. The Globus Pallidus interna (GPi) and the Substantia Nigra pars reticulata (SNr) are the output ports that project to thalamic nuclei, which in turn project to the cortex, closing the loop. The DP consists of the striatum projecting directly to GPi/SNr, while the IP consists of a longer route starting from the striatum and returning to GPi/SNr via Globus Pallidus externa (GPe) and the Subthalamic Nucleus (STN) in that order. The striatum receives dopaminergic projections from the Substantia Nigra pars compacta (SNc) (Mink, 1996;Smith et al., 1998). Striatal dopamine levels are thought to switch between DP and IP, since the DP (IP) is selected for higher (lower) levels of dopamine (Sutton and Barto, 1998;Frank, 2005;Wu et al., 2009) (Figure 5).
In classical accounts of the BG function, the DP is known as the Go pathway since it facilitates movement and the IP is called the NoGo pathway since it inhibits movement. Striatal dopamine levels are thought to switch between Go and NoGo regimes (Albin et al., 1989). We have been developing a view of the BG modeling in which the classical Go/NoGo picture is expanded to Go/Explore/NoGo picture, wherein a new Explore regime is inserted between the classical Go and NoGo regimes (Kalva et al.,  . This explore regime corresponds to random exploration of action space which is a necessary ingredient of any RL machinery. Kalva et al. (2012) show that the Explore regime arises naturally due to the chaotic dynamics of the STN-GPe loop in the IP. The three regimes together amount to a stochastic hill-climbing, which we describe as the GEN method. The GEN method has been used in the past to describe a variety of functions of the BG, in control and PD conditions, including reaching movements (Magdoom et al., 2011) and spatial navigation (Sukumar et al., 2012). Magdoom et al. (2011) used the GEN method to hill-climb over the value function (Magdoom et al., 2011). δ V (t) is defined as a temporal difference in value function [Equation (21)].
where "t" is not "time" but "trial number." The GEN method used in Magdoom et al. (2011) can be summarized using the following Equation (22), where, ψ is a random vector, and ||ψ|| = χ, a positive constant. DA hi and DA lo are the thresholds at which dynamics switches between Go, NoGo and Explore regimes [Equation (22)]. The underlying logic of the above set of Equations (22a-c) is as follows. If δ V (t) > DA hi , the last update resulted in a sufficiently large increase in V; therefore continue in the same direction in the next step. This case is called the "Go" (Equation 22a) regime. If δ V (t) ≤ DA lo , the last update resulted in a significant decrease in V; therefore proceed in the direction opposite to the previous step. This case is called the "NoGo" (Equation 22b) regime. If DA lo < δ V (t) ≤ DA hi , there was neither a marked increase nor decrease in V; therefore Explore (Equation 22c) for new directions. This case is called the Explore regime. In Magdoom et al. (2011) we assumed a simple symmetry between DA hi and DA lo , such that DA hi > 0 and DA lo = −DA hi .
However, in the present study we introduce a small variation of the above formulation of the GEN method. Instead of V, we perform hill-climbing over the Utility landscape. The three separate Equations (22a-c) can be combined into a single Equation (23) [as in Sukumar et al. (2012)], as follows: where, And, F Gref is the change in F Gref and 't' is the trial number; A G/E/N are the gains of Go/Explore/NoGo components, respectively; λ G/N are the sensitivities of the Go/NoGo components, respectively; ψ is a random variable uniformly distributed between −1 and 1 and σ E is the standard deviation used for the Explore component.
Just as TD error is interpreted as dopamine signal in classical RL-based accounts of the BG function, we interpret δ U as dopamine signal in the present study. In Equation (23) above, the first term on the RHS corresponds to "Go" regime, since it is significant for large positive values of δ U (since A G and λ G are positive). The second term on the RHS of Equation (23) above corresponds to "NoGo" regime, since it is significant for large negative values of δ U (due to A N > 0 and λ G < 0). The third term on the RHS corresponds to "Explore" regime, since it is dominant for values of δ U close to 0.

Neurobiological Interpretation of the GEN Method in PD
PD being a dopamine deficient condition, a natural way to incorporate Parkinsonian pathology is to attenuate the dopamine signal, δ U . In (Magdoom et al., 2011;Sukumar et al., 2012), PD pathology was modeled by clamping the dopamine signal, δ U , and preventing it from exceeding an upper threshold. The rationale behind such clamping is that with fewer dopaminergic neurons left, SNc may not be able to produce a signal intensity that exceeds a certain threshold. In the present case of PG, such a constraint is applied to δ U . If [a,b] is the natural unconstrained range of values of δ U for controls, then for PD OFF simulation, a clamped value of δ Lim changes the δ U range to [a, δ Lim ] where δ Lim < b. Furthermore, to simulate the increase in dopamine levels in PD ON condition, due to administration of L-dopa, a positive constant is added to δ U , thereby changing the range of δ U in PD ON condition to [a+ δ Med , δ Lim + δ Med ] [Equation (26) where δ Lim + δ Med < b and a < b.

Training the GEN parameters
The output of the GEN system is F G from which SGF is calculated as average F G between 4000 and 5000 ms of simulation time, the mean and variance of which must match with the mean and variance of SGF obtained from PG experiments under control and PD conditions Fellows et al., 1998). The parameters to be trained are A G/E/N [gains of the Go/Explore/NoGo terms in Equation (23) term in Equation (23)]. Determination of the GEN parameters is done by optimizing a cost function CE GEN given as.
SGF is the stable grip force generated and σ is the variance in the error. Subscripts expt and sim denote experimental and simulated values, respectively. CE GEN is formulated such that more weightage is given to the SGF error and lesser to the variance in the error [Equation (27)]. The six model parameters (A G/E/N , λ G/N , and σ E ) of the GEN method [Equation (23)] are trained to capture the following experimental conditions: Controls and PD ON conditions are obtained from Fellows et al. (1998), whereas controls, PD ON and OFF from Ingvarsson et al. (1997) for both sandpaper and silk surfaces. However, the parameters A G/E/N , λ G/N , and σ E are not all trained separately for every experimental condition. Initial parameter values for A G/E/N , λ G/N , σ E , and α were determined ( Figure 6A) by matching the SGF results for controls of Fellows et al. (1998); this matching is achieved by optimizing CE GEN [Equation (27)] using GA ( Figure 6A). This initial training of GEN parameters is a kind of calibration of the parameters for a given experimental setup. Once an initial estimate of parameters was obtained, A G/E/N , λ G/N , and σ E were fixed and optimal values of α, δ Lim , and δ Med were obtained using GA for the PD conditions, of Fellows et al. (1998). Similarly, for Ingvarsson et al. (1997), the initial parameter values for A G/E/N , λ G/N , σ E , and α were determined by matching the SGF results for controls using Equation (27). For the cases of PD ON and PD OFF , the search space was limited to α, δ Lim , δ Med , by having fixed the values of A G/E/N , λ G/N , σ E obtained from controls .
The model was tested (Figure 6B) using the trained GEN parameters to determine if the model generated outputs are close to the experimental values.

RESULTS
We now apply the model described in the previous section to explain the experimental results for two published studies  Fellows et al., 1998). For simplicity we used only constant weight trials (i.e., only the trials where only one load was repeatedly used for lifting). The friction coefficient was calculated as load force/slip force  [Equation (28)]. Fellows et al. (1998) investigated 12 controls, 16 PD patients and four hemi-parkinsonian patients. All the PD subjects were in medication ON state. The subjects were asked to lift the object to a height of 4-8 cm. The study comprised of two loads 3.3 N and 7.3 N. Various combinations of these two loads were used to give rise to "light," "heavy," "unload," and "load" condition. In our study, we simulated only the experimental results for the "light" condition (which featured lifting the 3.3 N object for all the trials) with a desired object lift height of 5 cm. Ingvarsson et al. (1997) investigated the role of medication in 10 controls and 10 PD subjects under two object surface conditions (silk and sandpaper). The task required the object to be PG-lifted to a height of 5 cm above the table. The entire experiment was divided into 3 parts (a) coordination of forces, (b) adaptation to friction and (c) rapid load changes. "Coordination of forces" required the object to be lifted to 5 cm height, and maintained at that position for 4-6 s using PG. "Adaptation to friction" required the subjects to gradually reduce the grip force on the object thereby causing slip F slip is calculated. Finally, in the "rapid load changes" task a plastic disk was dropped in a padded plate thereby causing abrupt changes in F L .
In the present study, we use the F slip for determining the friction coefficient which is used to match the experimentally obtained grip force values under silk and sandpaper condition for coordination of forces case. Ingvarsson et al. (1997) reported the results in median ± Q3 quartile format. Hence for simplicity the results are assumed to be normally distributed with mean = median and Q3 = mean + 0.6745 standard deviation. The entire text reports the results in terms of mean and variance.
We now describe the simulation results starting from controller training to simulation of results from Ingvarsson et al. (1997) and Fellows et al. (1998).

The F G controller
In the present model the grip force controller is designed as reference tracking controller which receives F Gref as the input and generates a time-varying F G (T) as the output. So in the proposed configuration the F G controller is only affected by the input (F Gref ) it receives. Using the overshoot ratio, M p = 1.25, [Equation (2)] and time to peak, T p = 530 ms, [Equation (3)] as design criteria [M p and T p values obtained from Johansson and Westling (1984)], we determined ω n = 6.4 and ζ = 0.4 as the parameters for transfer function of the F G controller [Equation (1) in The Precision Grip Control System; Ogata, 2002]. Figure 7A shows the grip force profile (for T = 5000 ms) for the input F Gref = 10 N. Since the F G controller is modeled as a transfer function which is dependent only on the F Gref value, the controller did not require retraining for different friction conditions.

The F L controller
The efficacy of F L controller output can be observed in the output object and finger position. If a suboptimal F L is generated, the object does not reach the reference position. The cardinal components affecting the object-finger slip are μ and M o (Refer Table 1 for the values of μ and M o used in simulation). Since the F L controller is affected by μ and M o , a decreased μ or increased M o increases the F slip Therefore, in this study we trained the F L controller for the minimum μ and the maximum M o to prevent the object from slipping even when μ increases or M o decreases. Furthermore, to train the F G controller, we assume a sufficiently large F G (=10 N) thereby effectively decoupling the F G controller. With a large, constant F G , the cost function (CE) [Equation (8)] was optimized using the GA (Goldberg, 1989;Whitley, 1994) for the setup parameters from Fellows et al. (1998) Ingvarsson et al. (1997) also. In Figure 7, the output of the simulated finger and object position is shown for ( Figure 7B) the silk condition of , ( Figure 7C) the sandpaper condition of  and ( Figure 7D) light condition of (Fellows et al., 1998).
Since we fixed the PID parameters for the F L controller, efficacy of the PID parameters across the three control conditions viz. Ingvarsson et al. (1997) silk condition, Ingvarsson et al. (1997) sandpaper, andFellows et al. (1998), needs to be determined. Two important criteria for determining a successful lift are: low slip (X fin −X o ) and low position error (X ref − X o ). In all the three cases (refer Table 1) the finger-object slip distance was <0.005 m and the X ref −X o <0.001 m, where, X o is average position between simulation time (T) as 4000 and 5000 ms. The F G controller output is shown in Figure 7A; object and hand position for Ingvarsson et al. (1997) sandpaper, Ingvarsson et al. (1997) sandpaper, andFellows et al. (1998) is shown in Figures 7B-D. Note that F Gref is held constant at 10 N.
Utility based approach requires estimation of V(F Gref ) and h(F Gref ) [refer model Section "Computing U(F Gref )"]. V and h are generated for Ingvarsson silk and Ingvarsson sandpaper and were obtained using the parameters given in Table 1. We assumed the noise to be inversely related to the friction coefficient. Hence a higher noise was used in case of lower μ and vice versa.
Note that value functions [expressed as a function of F Gref (Appendix B, Equation (42)] in case of PG are expected to have a sigmoidal shape, since a F G level that exceeds the F slip always results in a successful grip-lift and therefore reward. On the contrary, the risk function, h [Appendix B, Equation (43)], is expected to be bell-shaped since risk is the highest in the neighborhood of slip force, and zero far from it. These observations are supported by the value and risk functions constructed for various experimental conditions- Fellows et al. (1998) light andIngvarsson et al. (1997) silk and sandpaper cases. Figure 8A shows the value and risk functions for Fellows et al. (1998), Figure 8B shows the value and risk functions for Ingvarsson et al. (1997) silk and Figure 8C shows the value and risk functions for Ingvarsson et al. (1997) sandpaper condition.

PERFORMING ACTION SELECTION USING THE BG AND SIMULATING THE CONTROL AND PD CASES OF THE STUDY
Two features mark the difference in V(F Gref ) and h(F Gref ) between controls and PD. In PD OFF case we apply the clamped δ U (= δ Lim ) condition [Equation (26b)], whereas in PD ON case we add a positive constant (δ Med ) to δ U [Equation (26c)]. In addition to these changes in the dopamine signal, δ U , we assume altered risk sensitivity in PD. Studies also suggest altered risk taking in PD patients (in particular, risk in PD ON> risk in PD OFF) when compared to healthy controls (Cools et al., 2003). Since α represents risk sensitivity in the utility function [Equation (20)], we use a smaller α in PD case (both ON and OFF) (Refer Tables 3, 4 for the simulated values).
As described earlier, the GEN method [Equation (23)], produces a series of SGF values with a certain mean and standard deviation computed over the trials. It may be recalled, from "Model", that Equation (23) represents a form of stochastic hill-climbing over the utility function, U. It represents a map from F Gref (t − 1) to F Gref (t), where "t" represents the trial number. The three terms on the RHS of Equation (23) represent GO, NOGO and Explore regimes, in that order. The three regimes are active under conditions of high, low, and moderate dopamine (δ U ), respectively. Figure 9 shows some sample profiles of the three terms on the RHS of Equation (23).  Using the GEN policy on δ U, we simulate our model with parameters described in Tables 3, 4 (However, simulation with δ V of Equation (21) with the same parameter set described in this section yields Supplementary Material B).

Procedure to train the GEN parameters
• The change in F Gref (i.e., F Gref ) from trial, "t" to "t + 1" is given in Equation (23). • This Equation (23) contains six parameters (A G/E/N , λ G/N and σ E ) whose values need to be determined. • The training for GEN parameters (A G/E/N , λ G/N and σ E ) was carried out using GA for optimizing CE GEN on Fellows et al. (1998) control condition experimental data. (Figure 6A) • The GEN parameters obtained from the control conditions are also used for simulating PD. In case of (Fellows et al., 1998), Here A G = A N = A E = 1. The F Gref (t − 1) is set to 1. We thereby analyze the selection output F Gref (t) as a function of δ U alone. The "overall" graph is produced by actually adding the noise term [III term on RHS in Equation (23)]. Note the high variability in F Gref (t) in the vicinity of δ U = 0.
the parameters from controls are used for PD ON only. In case of , the parameters from controls are used in PD ON and OFF for two surface conditions -silk and sandpaper. • In PD OFF case, only δ Lim and α are trained. In PD ON case, δ Lim, δ Med and α are trained. Fellows et al. (1998) for controls was simulated using the parameters (Table 4) obtained by optimizing GA on CE GEN.

Simulation of Fellows et al. (1998) results
(a) Using the A G/E/N , λ G/N, σ E and α = 0.7 obtained from the GA, the controls was simulated using δ Lim = 1 and δ Med = 0. (b) A similar approach was performed for modeling the PD ON condition. The parameters (A G/E/N , λ G/N and σ E ) were the same as the controls, while the values of α = 0.312, δ Lim = −0.5 and δ Med = 0.427 were obtained by GA optimization. For a detailed list of parameters see Table 3.
A comparison of the experimental and simulated data obtained for Fellows et al. (1998) using the parameters in Table 3 is given as Figure 10.

Simulation of Ingvarsson et al. (1997) results
Following the simulation of the Fellows et al. experiment: (a) The results of the Ingvarsson et al. (1997) controls for both sandpaper and silk were simulated for obtaining (A G/E/N , λ G/N, σ E, and α = 0.5) using GA ( Table 4). The other parameters are set as δ Lim = 1 and δ Med = 0. A comparison of the experimental and simulated data obtained for Ingvarsson et al. (1997) for silk and sandpaper using the parameters in Table 4 is given as Figures 11 and 12, respectively. In order to be consistent with the experimental result the simulation results in Figures 11 and 12 are shown in median ± Q3. Figures 10-12 replicate the empirical findings (both mean / median and variance profiles) well. Fellows et al. (1998) results show that the mean(SGF) is higher in PD ON case compared to controls (SGFnorm<SGF PDON ) (Figure 10). A similar result (SGFnorm<SGF PDON ) is also reported in Ingvarsson et al. (1997) silk (Figure 11) and sandpaper cases (Figure 12). Furthermore, var(SGF) under PD OFF cases is observed to be greater than that of controls. It may be inferred that the increase in SGF during the PD conditions would be due to their increased SM while playing risk aversive in the game of risk based decision making. This increased SM could be needed to oppose the increased internal perturbations/sensory-motor incoordination observed in the PD patients.

DISCUSSION
In this paper, we present a computational model to explain the changes in F G in controls and PD ON/OFF conditions Fellows et al., 1998). To our knowledge this is the first computational model of PG performance in PD conditions. A novel aspect of the proposed approach to modeling PG is to apply (based on the observation that PG performance involves a SM) concepts from risk-based DM to explain F G generation. To this end, we applied a recent model of the BG action selection based on the utility function formulation, instead of value function, to explain PG performance (Pragathi Priyadharsini et al., 2012). There are significant challenges involved in developing a computational model of PG in PD conditions. The root cause of this difficulty is that the pathology in PD is located at a high level (dopaminergic neurons of the BG) in motor hierarchy, while the motor expression is at the lowest level in the hierarchy (finger forces). Ideally speaking, a faithful computational model must incorporate these two well-separated levels in motor hierarchy, and all the levels in between. But development of such an extensive model would be practically infeasible, and may not be essential to the problem at hand. On the other hand, if model compactness is the sole governing principle, one may build an empirical, data-fitting model that links experimental parameters like friction, object weight, and abstract neural parameters like dopamine level, medication level, with observed data like mean and variance of grip force generated. But such an over-simplified model could be a futile mathematical exercise without much neurobiological content.
Therefore, a conservative approach to model PG performance in PD would have two components: (1) a minimal model of sensory-motor loop dynamics involved in generating PG forces, and (2) a minimal model of the BG that incorporates the effect of dopamine changes on the BG dynamics. Whereas the BG level generates evaluations of actions (F G ) based on the rewards (successful grip performance) obtained from PG performance, the sensory-motor loop dynamics receives the command from the BG level and generates F G . While the first component represents DM, the second represents execution. These two components must then be integrated. PD-related reduction in dopamine level in the integrated model must then be manifested as appropriate changes seen in PG forces. This is the approach adopted in the present study. Figure 13 presents the grand plan of the entire model, and the training/design steps followed to build various components.

MODELING THE SENSORIMOTOR LOOP
The sensory-motor loop component consists of two controllers, for generating the grip force and lift force, and a plant model. F G and F L and given as inputs to the plant model which simulates the of the object and the fingers. The error between actual and desired object positions is fed back to the F L controller. The F G controller receives F Gref as input and generates the F G (T) profile. F Gref is generated by the second component, the BG component, by a DM process. The BG component is built on the lines of Actor-Critic models of the BG, where the utility function used instead of value function, and the dopamine signal, δ, is the temporal difference of utility [Equation (24)].

THE F L CONTROLLER FORMS A LOOP WITH THE PLANT
The controller gives F L as input to the plant and receives position error as feedback. The controller is trained by GA as described in "The Precision Grip Control System". The grip force controller is designed as an open-loop second order system that gives F G as input to the plant (see The "Precision Grip Control System"). Plant dynamics is described in Appendix A. The controller and plant system, with its trained parameters, is then integrated with appropriately trained the BG models to simulate control and PD results Fellows et al., 1998).

Simulation of Fellows et al. (1998) results
Incorporating the values of object mass (M o = 0.33 kg) and friction (μ = 0.44) from Fellows et al. (1998), the controller and plant system trained above is used to calculate V and h functions (see "The Precision Grip Control System"). The V and h functions thus computed are explicitly modeled using RBFNNs (see "The Utility Function Formulation"). The V and h functions are combined to produce the utility function, which is  Figures 11, 12).

Simulation of Ingvarsson et al. (1997) results
The case of Ingvarsson et al. (1997) is a bit more complicated since it involves two friction conditions: silk and sandpaper. It also involves both PD ON and OFF unlike Fellows et al. (1998) which describes results for only PD ON. For the condition of sandpaper, object mass (M o = 0.3 kg) and friction (μ = 0.94) are incorporated into the trained controller and plant system. V and h functions are computed and explicitly modeled using RBFNNs ("The Utility Function Formulation") and Utility function is computed by combining V and h. The Utility function is used in the GEN method; the GEN parameters (A G/E/N , λ G/N and σ E ) are trained to minimize CE GEN [Equation (27), Figure 5] so as to calibrate the model for the sandpaper case of Ingvarsson et al. (1997). The trained GEN parameters are used for the PD OFF case and only δ Lim is trained to optimize CE GEN . The same GEN parameters are again used for the PD ON case, where both δ Lim and δ Med are optimized. A similar procedure is followed for the "silk" case (M o = 0.3 kg and μ = 0.44) of Ingvarsson et al. (1997) as outlined in Figure 13. Risk-based decision making can arise in both motor and cognitive domains (Claassen et al., 2011). The present study deals with risk in motor domain. In this context, an interesting question naturally arises. Is there a correlation between risk-sensitivity in motor and cognitive domains. Does impaired risk-sensitivity in one domain carry over to the other? In other words, do PD patients show impaired decision making in motor and cognitive domains equally? In order to answer the above line of questioning, we propose to use a task, the Balloon Analog Risk Task (BART), which tests risk-sensitivity in cognitive domain (Claassen et al., 2011). We then propose to adapt that BART to the motor domain.
In BART, the subject is asked to press a key and inflate a virtual balloon displayed on the monitor. For every key press, the virtual balloon is inflated by a fixed amount and the subject earns a fixed number of points. The catch lies in that, on inflation beyond a threshold volume, the balloon bursts and the subject loses all the points. Knowing when to stop and redeem all the points earned so far involves risk based decision making.
The above task, which is a cognitive task, can be redesigned in terms of motor function, specifically in terms of PG performance. Just as in the BART task there is a threshold point at which the balloon bursts, in PG task there is a threshold grip force at which the object slips. In the redesigned BART task, the subject will earn more points as he/she grips the object with the grip force as close as possible to the slip force. Uncertainty can be incorporated by using objects that look identical but with different weights. It will be interesting to see possible parallels in performance of normal or PD patients, on both the cognitive and PG versions of the BART. If the above line of experimentation confirms that there is correlation between risk-sensitivity in motor and cognitive domains, it would place risk-based decision making approach to understanding PD on a stronger foundation.

AUTHOR CONTRIBUTIONS
Ankur Gupta: Computational model development, analysis and manuscript preparation. Pragathi Priyadharsini B: Computational model development, analysis and manuscript preparation. V. Srinivasa Chakravarthy: Computational model development, analysis and manuscript preparation.

Plant
The forces (F L and F G ) obtained from the two controllers are used for determining the kinetic (position, velocity and acceleration of finger and object). The plant model incorporates the F L and F G for obtaining the net forces acting on both the finger (F fin ) and object (F o ), with the interaction based on with the interaction based on finger-object interface through friction (F f ). The net force acting on finger and object is given in Equations (29,30). Please note that M fin is kept constant to M o /10 in the model.
When the object is resting on surface the net force on object is zero as there is no acceleration. So, the normal force is obtained by keeping F o = 0 in Equation (30). When the object is lifted from the table the normal force becomes zero. F n determination is given in Equation (31).
The frictional force (F f ) coupling the finger and object is given in Equation (32) Where, the F slip , representing the maximum frictional force that can be generated is given in Equation (33).
The F f required to prevent slip is given in Equation (34) According to Newton's second law of motion force is given as a product of mass and acceleration. So, Equations (29, 30) can also be represented as Equations (35,36).
The kinetic parameters can be obtained by integrating d 2 X o dt 2 to obtain velocity and double integrated to obtain the position.

APPENDIX B TRAINING RBF
In order to determine U in PG performance we first need to identify the state and the reward signal. Since F Gref is the key variable that decides the final outcome, F Gref at trial ('t') is treated as a state variable.
As described earlier, calculation of U(F Gref (t)) requires V(F Gref (t)) and h(F Gref (t)) as explicit functions of F Gref (t). To this end we use data-modeling capabilities of neural networks to implement V(F Gref ) and h(F Gref ) as explicit functions of F Gref .
Using the values of V(F Gref ) and h(F Gref ) as output and F Gref as input, an RBFNN (contains 60 neurons with the centroids distributed over a range [0.1 12] in steps of 0.2, and a standard deviation (σ RBF ) of 0.7) was constructed and trained to approximate V(F Gref ) and h(F Gref ). For a given F Gref (t) , a feature vector ( ) is represented using RBFNN [Equation (37)].
Here, for the m th basis function, μ m denotes the center and σ m denotes the spread.
Using the φ that was obtained from Equation (37). The RBFNN weight for determining value, w V , is updated in Equation (38). Hence the value is the mean of all the V CE 's obtained on where η V is the learning rate maintained to be 0.1, and the change in V CE is given as in Equation (39).
The risk function (h) is then the variance in the V CE as per Equation (40). Risk is the variance seen in all the V CE 's obtained onF Gref.
The weights for risk function w h , is updated Equation (41) Here, η h is the learning rate for risk function = 0.1 and ξ is the risk prediction error [Equation (40)]. From the trained RBFNN, V(F Gref ) and h(F Gref ) are calculated using Equations (42, 43), respectively.