Statistical Data Assimilation: Formulation and Examples from Neurobiology

For the Research Topic Data Assimilation and Control: Theory and Applications in Life Sciences we first review the formulation of statistical data assimilation (SDA) and discuss algorithms for exploring variational approximations to the conditional expected values of biophysical aspects of functional neural circuits. Then we report on the application of SDA to (1) the exploration of properties of individual neurons in the HVC nucleus of the avian song system, and (2) characterizing individual neurons formulated as very large scale integration (VLSI) analog circuits, with the goal of building biophysically realistic VLSI representations of functional nervous systems. Networks of neurons pose a substantially greater challenge, and we comment on formulating experiments to probe the properties, especially the functional connectivity, of song command circuits within HVC.


INTRODUCTION
A broad class of 'inverse' problems presents itself in many scientific and engineering inquiries. The overall question addressed by these is how to transfer information from laboratory and field observations to candidate models of the processes underlying those observations.
The existence of large, information rich, well curated data sets from increasingly sophisticated observations on complicated nonlinear systems has set new challenges to the information transfer task. Assisting with this challenge are new substantial computational capabilities.
Together they have provided an arena in which principled formulation of this information transfer along with algorithms to effect the transfer have come to play an essential role. This paper reports on some efforts to meet this class of challenge within neuroscience. Many of the ideas are applicable much more broadly than our focus, and we hope the reader will find this helpful in their own inquiries.
In this special issue entitled Data Assimilation and Control: Theory and Applications in Life Sciences, of the journal Frontiers in Applied Mathematics and Statistics-Dynamical Systems, we participate in the broader quantitative setting for this information transfer. The procedures are called "data assimilation" following its use in the effort to develop realistic numerical weather prediction models [1,2] over many decades. We prefer the term 'statistical data assimilation' (SDA) to emphasize that key ingredients in the procedures involved in the transfer rest on noisy data and on recognizing errors in the models to which information in the noisy data is to be transferred.
This article begins with a formulation of SDA with some additional clarity beyond the discussion in [3]. We also discuss some algorithms helpful for implementing the information transfer, testing model compatibility with the available data, and quantitatively identifying how much information in the data can be represented in the model selected by the SDA user. Using SDA will also remind us that data assimilation efforts are well cast as problems in statistical physics [4].
After the discussion of SDA, we turn to some working ideas on how to perform the high dimensional integrals involved in SDA. In particular we address the 'standard model' of SDA where data is contaminated by Gaussian noise and model errors are represented by Gaussian noise, though the integrals to be performed are, of course, not Gaussian. The topics include the approximation of Laplace [5,6] and Monte Carlo methods.
With these tools in hand, we turn to neurobiological questions that arise in the analysis of individual neurons and, in planning, for network components of the avian song production pathway. These questions are nicely formulated in the general framework, and we dwell on specifics of SDA in a realistic biological context. The penultimate topic we address is the use of SDA to calibrate VLSI analog chips designed and built as components of a developing instantiation of the full songbird song command network, called HVC. Lastly, we discuss the potential utlization of SDA for exploring biological networks.
At the outset of this article we may expect that our readers from Physics and Applied Mathematics along with our readers from Neurobiology may find the conjunction of the two "strange bedfellows" to be incongruous. For the opportunity to illuminate the natural melding of the facets of both kinds of questions, we thank the editors of this special issue.

General Overview of Data Assimilation
We provide a structure within which to frame our discussion of the transfer of information from data to a model of the underlying processes producing the data.
We start with an observation window in time [t_0, t_F] within which we make a set of measurements at times t = {τ_1, τ_2, ..., τ_k, ..., τ_F}; t_0 ≤ τ_k ≤ t_F. At each of these measurement times, we observe L quantities y(τ_k) = {y_1(τ_k), y_2(τ_k), ..., y_L(τ_k)}. The number L of observations at each measurement time τ_k is typically less, often much less, than the number of degrees of freedom D in the observed system; D ≫ L.
These are views into the dynamical processes of a system we wish to characterize. The quantitative characterization is through a model we choose. It describes the interactions among the states of the observed system. If we are observing the time course of a neuron, for example, we might measure the membrane voltage y 1 (τ k ) = V m (τ k ) and the intracellular Ca 2+ concentration y 2 (τ k ) = [Ca 2+ ](τ k ). From these data we want to estimate the unmeasured states of the model as a function of time as well as estimate biophysical parameters in the model.
The processes characterizing the state of the system (neuron) we call x_a(t); a = 1, 2, ..., D ≥ L, and they are selected by the user to describe the dynamical behavior of the observations through a set of equations in continuous time

dx_a(t)/dt = F_a(x(t), q),   (1)

or in discrete time t_n = t_0 + n∆t; n = 0, 1, ...,

x_a(n + 1) = f_a(x(n), q),   (2)

where q is a set of fixed parameters associated with the model. f(x(n), q) is related to F(x(t), q) through the numerical method the user chooses for solving the continuous time flow for x(t) [7].
Considering neuronal activity, Eq. (1) could be coupled Hodgkin-Huxley (HH) equations [8,9] for voltage, ion channel gating variables, constituent concentrations, and other ingredients. If the neuron is isolated in vitro, such as by using drugs to block synaptic transmission, then there would be no synaptic input to the cell to describe. If instead it is coupled to a network of neurons, their functional connectivity would be described in F(x(t), q) or f(x(n), q). Typical parameters might be maximal conductances of the ion channels, reversal potentials, and other time-independent numbers describing the kinetics of the gating variables. In many experiments L is only 1, namely, the voltage across the cell membrane, while D may be on the order of 100; hence D ≫ L.
As we proceed from the initiation of the observation window at t_0 we must move our model equation variables x(0), Eq. (2), from t_0 to τ_1 where a measurement is made. Then using the model dynamics we move along to τ_2 and so forth until we reach the last measurement time τ_F and finally move the model from x(τ_F) to x(t_F). In each stepping of the model equations Eq. (2) we may make many steps of ∆t in time to achieve accuracy in the representation of the model dynamics. The full set of times t_n at which we evaluate the model x(t_n) we collect into the path of the state of the model through D-dimensional space: X = {x(0), x(1), ..., x(n), ..., x(N); q}. The dimension of the path is (N + 1)D + N_q, where N_q is the number of parameters q in our model.
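As a minimal illustration of how such a path is assembled, the sketch below steps a toy one-dimensional model (not the HH models used later in this paper) through the window and collects X; the vector field, step size, and parameter values are all illustrative assumptions.

```python
import numpy as np

def step_model(x, q, dt):
    # One Euler step of the toy model dx/dt = -q[0]*x; in practice
    # f(x(n), q) would be a higher-order integrator applied to the
    # user's D-dimensional vector field F(x, q).
    return x + dt * (-q[0] * x)

def build_path(x0, q, n_steps, dt):
    # Collect X = {x(0), x(1), ..., x(N)}; together with the N_q
    # parameters q this forms the ((N+1)*D + N_q)-dimensional path.
    path = [np.atleast_1d(np.asarray(x0, dtype=float))]
    for _ in range(n_steps):
        path.append(step_model(path[-1], q, dt))
    return np.array(path)

X = build_path(x0=1.0, q=[2.0], n_steps=100, dt=0.01)
```

With D = 1, N = 100, and one parameter, the path here has (N + 1)D + N_q = 102 components.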
It is worth a pause here to note that we have now collected two of the needed three ingredients to effect our transfer of the information in the collection of all measurements Y = {y(τ 1 ), y(τ 2 ), ..., y(τ F )} to the model f (x(n), q) along the path X through the observation window [t 0 , t F ]: (1) data Y and (2) a model of the processes in Y, devised by our experience and knowledge of those processes. The notation and a visual presentation of this is found in Fig. (1).
The third ingredient, comprised of methods to generate the transfer from Y to properties of the model, will command our attention throughout most of this paper. If the transfer methods are successful and, according to some metric of success, we arrange matters so that at the measurement times τ_k the L model variables associated with y(τ_k) satisfy x_l(τ_k) ≈ y_l(τ_k), we are not finished. We have then only demonstrated that the model is consistent with the known data Y. We must use the model, completed by the estimates of q and the state of the model at t_F, x(t_F), to predict forward for t > t_F, and we should succeed in comparison with measurements y(τ_r) for τ_r > t_F. As the measure of success of predictions, we may use the same metric as utilized in the observation window.

Figure 1. A visual representation of the window t_0 ≤ t ≤ t_F in time during which L-dimensional observations y(τ_k) are performed at observation times t = τ_k; k = 1, ..., F. This also shows times at which the D-dimensional model developed by the user, x(n + 1) = f(x(n), q), is used to move forward from time n to time n + 1: t_n = t_0 + n∆t; n = 0, 1, ..., N. D ≥ L. The path of the model X = {x(0), x(1), ..., x(n), ..., x(N) = x(F)} and the collection Y of L-dimensional observations at each observation time τ_k, Y = {y(τ_1), y(τ_2), ..., y(τ_k), ..., y(τ_F)} (y = {y_1, y_2, ..., y_L}), are also indicated.
As a small aside, the same overall setup applies to supervised machine learning networks [10] where the observation window is called the training set; the prediction window is called the test set, and prediction is called generalization.

The Data are Noisy; the Model has Errors
Inevitably, the data we collect are noisy, and equally the model we select to describe the production of those data has errors. This means we must, at the outset, address a conditional probability distribution P(X|Y) as our goal in the data assimilation transfer from Y to the model. In [3] we describe how to use the Markov nature of the model x(n) → x(n + 1) = f(x(n), q) and the definition of conditional probabilities to derive the recursion relation:

P(X(n+1)|Y(n+1)) = [P(y(n+1), x(n+1), X(n)|Y(n)) / (P(y(n+1)|Y(n)) P(x(n+1), X(n)|Y(n)))] · P(x(n+1)|x(n)) P(X(n)|Y(n))
= exp[CMI(y(n+1), {x(n+1), X(n)}|Y(n))] · P(x(n+1)|x(n)) P(X(n)|Y(n)),

where we have identified CMI(a, b|c) = log[P(a, b|c) / (P(a|c) P(b|c))]. This is Shannon's conditional mutual information [11] telling us how many bits (for log_2) we know about a when observing b conditioned on c. For us a = {y(n+1)}, b = {x(n+1), X(n)}, c = {Y(n)}.
Using this recursion relation to move backwards through the observation window from t_F = t_0 + N∆t through the measurements at times τ_k to the start of the window at t_0, we may write, up to factors independent of X,

P(X|Y) ∝ exp[−A(X)],

where A(X), the negative of the log likelihood, we call the action. Conditional expected values for functions G(X) along the path X are then defined by

⟨G(X)⟩ = E[G(X)|Y] = ∫ dX G(X) exp[−A(X)] / ∫ dX exp[−A(X)],   (5)

and all factors in the action independent of X cancel out here. The action takes the convenient expression

A(X) = − Σ_{k=1}^{F} CMI(y(τ_k), X(τ_k)|Y(τ_{k−1})) − Σ_{n=0}^{N−1} log P(x(n+1)|x(n)) − log P(x(0)),

which is the sum of the terms which modify the conditional probability distribution when an observation is made at t = τ_k, the sum over the stochastic version of x(n) → x(n + 1) = f(x(n), q), and finally the distribution when the observation window opens at t_0.
What quantities G(X) are of interest? One natural one is the path itself, G(X) = X_µ; µ = {a, n}; another is the covariance around the mean X̄_µ = ⟨X_µ⟩, namely ⟨(X_µ − X̄_µ)(X_ν − X̄_ν)⟩. Other moments are of interest, of course. If one has an anticipated form for the distribution at large X, then G(X) may be chosen as a parametrized version of that form and those parameters determined near the maximum of P(X|Y).
The action simplifies to what we call the 'standard model' of data assimilation when (1) observations y are related to their model counterparts via Gaussian noise with zero mean and diagonal precision matrix R_m, and (2) model errors are associated with Gaussian errors of mean zero and diagonal precision matrix R_f:

A(X) = Σ_{k=1}^{F} Σ_{l=1}^{L} (R_m(l)/2) [x_l(τ_k) − y_l(τ_k)]² + Σ_{n=0}^{N−1} Σ_{a=1}^{D} (R_f(a)/2) [x_a(n+1) − f_a(x(n), q)]².   (7)

If we have knowledge of the distribution P(x(0)) at t_0 we may add −log P(x(0)) to this action. If we have no knowledge of P(x(0)), we may take its distribution to be uniform over the dynamic range of the model variables; then it, as here, is absent, canceling between numerator and denominator in Eq. (5).
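The standard-model action can be evaluated directly from a candidate path. The sketch below is an illustrative implementation (not the code used in this work); it assumes scalar precisions R_m and R_f and assumes the observed quantities are the first L components of the state, neither of which is required by the formulation.

```python
import numpy as np

def standard_action(X, Y, obs_idx, f, q, Rm, Rf):
    """Standard-model action: Gaussian measurement error at the
    observation times plus Gaussian model error along the whole path.
    X: (N+1, D) path; Y: (F, L) observations; obs_idx: time indices n
    at which observations were made. Scalar Rm, Rf for simplicity."""
    L = Y.shape[1]
    meas = 0.5 * Rm * np.sum((X[obs_idx, :L] - Y) ** 2)
    model = 0.5 * Rf * np.sum((X[1:] - f(X[:-1], q)) ** 2)
    return meas + model

# Toy check: a path that satisfies the model exactly and matches the
# data exactly has zero action; a shifted path does not.
f_lin = lambda x, q: q * x
X = np.array([[1.0], [0.9], [0.81]])
Y = X[[0, 2], :]                       # observe at n = 0 and n = 2
A0 = standard_action(X, Y, [0, 2], f_lin, 0.9, 1.0, 1.0)
A1 = standard_action(X + 0.1, Y, [0, 2], f_lin, 0.9, 1.0, 1.0)
```

The balance described in the text appears here explicitly: the first sum pulls the path toward the data, the second toward the model dynamics.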
Our challenge is to perform integrals such as Eq. (5). One should anticipate that the dominant contribution to the expected value comes from the maxima of P(X|Y) or, equivalently, the minima of A(X). At such minima, the two contributions to the action, the measurement error and the model error, balance each other to accomplish the explicit transfer of information from the former to the latter.
We note, as before, that when f (x(n), q) is nonlinear in X, as it always is in interesting examples, the expected value integral Eq. (5) is not Gaussian. So, some thinking is in order to approximate this high dimensional integral. We turn to that now. After consideration of methods to do the integral, we will return to a variety of examples taken from neuroscience.
The two generally useful methods available for evaluating this kind of high dimensional integral are Laplace's method [5,6] and Monte Carlo techniques [7,12,13]. We address them in order. We also add our own new and useful versions of the methods.

Laplace's Method
To locate the minima of the action A(X) = − log[P (X|Y)] we must seek paths X (j) ; j = 0, 1, ... such that ∂A(X)/∂X| X (j) = 0, and then check that the second derivative at X (j) , the Hessian, is a positive definite matrix in path coordinates. The vanishing of the derivative is a necessary condition.
Laplace's method [5] expands the action around the X (j) seeking the path X (0) with the smallest value of A(X). The contribution of X (0) to the integral Eq. (5) is approximately exp[A(X (1) ) − A(X (0) )] bigger than that of the path with the next smallest action.
This sounds more or less straightforward; however, finding the global minimum of a nonlinear function such as A(X) is an NP-complete problem [14]. In a practical sense one cannot expect to succeed with such problems. However there is an attractive feature of the form of A(X) that permits us to accomplish more.
We now discuss two algorithmic approaches to implementing Laplace's method.

Precision Annealing for Laplace's Method
Looking at Eq. (7) we see that if the precision of the model is zero, R f = 0, the action is quadratic in the L measured variables x l (n) and independent of the remaining states. The global minimum of such an action comes with x l (τ k ) = y l (τ k ) and any choice for the remaining states and parameters. Choose the path with these values of x(τ k ) and values from a uniform distribution of the other state variables and parameters covering the expected dynamic range of those, and call it path X init . In practice, we recognize that the global minimum of A(X) is degenerate at R f = 0, so we select many initial paths. We choose N I of them, and initialize whatever numerical optimization program we have selected, to run on each of them. We continue to call the collection of N I paths X init .
• Now we increase R_f from R_f = 0 to a small value R_f0. Using each of the N_I paths in X^init as an initial condition for our numerical optimization program chosen to find the minima of A(X), we arrive at N_I paths X^0. Evaluate A(X^0) on all N_I paths X^0.
• We increase R_f: R_f0 → R_f0 α, with α > 1. Now using the N_I paths X^0 as the initial conditions for our numerical optimization program, we arrive at N_I paths X^1. Evaluate A(X^1) on all N_I paths X^1.
• We increase R_f again: R_f0 α → R_f0 α². Using the N_I paths X^1 as the initial conditions, we arrive at N_I paths X^2. Evaluate A(X^2) on all N_I paths X^2.
• Continue in this manner, increasing R_f to R_f = R_f0 α^β; β = 0, 1, ..., each time using the selected numerical optimization program to arrive at N_I paths X^β. Evaluate A(X^β) on all N_I paths X^β.
• Display the N_I action levels A(X^β) as a function of β, that is, of R_f on a logarithmic scale in α.

We call this method precision annealing (PA) [15,16,17,18]. It slowly turns up the precision of the model, collecting paths at each R_f that emerged from the degenerate global minimum at R_f = 0. In practice it is able to track N_I possible minima of A(X) at each R_f. When not enough information is presented to the model, that is, when L is too small, there are many local minima at all R_f. This is a manifestation of the NP-completeness of the minimization of A(X). None of the minima may dominate the expected value integral of interest.
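The annealing loop can be sketched compactly. This is an illustrative skeleton under simplifying assumptions (a generic optimizer, a toy two-component action standing in for the full standard-model action), not the production code used for the calculations in this paper.

```python
import numpy as np
from scipy.optimize import minimize

def precision_annealing(action, init_paths, Rf0, alpha, n_beta):
    """Sketch of PA. `action(xflat, Rf)` is a standard-model action on
    a flattened path; init_paths are the N_I starting paths drawn at
    Rf = 0. At each beta, the N_I minimizations restart from the
    minimizers found at the previous beta."""
    paths = [np.asarray(p, dtype=float) for p in init_paths]
    levels = []
    for beta in range(n_beta):
        Rf = Rf0 * alpha ** beta
        res = [minimize(action, p, args=(Rf,)) for p in paths]
        paths = [r.x for r in res]
        levels.append([r.fun for r in res])
    return paths, np.array(levels)

# Toy action: one observed value y = 1 at the first path point, one
# model-error term tying the second point to the first.
def toy_action(x, Rf):
    return 0.5 * (x[0] - 1.0) ** 2 + 0.5 * Rf * (x[1] - x[0]) ** 2

paths, levels = precision_annealing(
    toy_action, [[0.0, 5.0], [2.0, -3.0]], Rf0=0.01, alpha=2.0, n_beta=10)
```

The array `levels`, plotted against β, is the action-level plot used below (e.g., Fig. 5) to find the plateau where A(X) becomes independent of R_f.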
As L increases, and enough information is transmitted to the model, for large R_f one minimum appears to stand out as the global minimum, and the paths associated with that smallest minimum yield good predictions. We note that there are always paths, not just a single path, as we have a distribution of paths, N_I of which are sampled in the PA procedure, within a variation of size 1/√R_m. A clear example of this is seen in [19] in a small, illustrative model.

"Nudging" within Laplace's Method
In meteorology one approach to data assimilation is to add a term to the deterministic dynamics which moves the state of the model towards the observations [20]:

x_a(n + 1) = f_a(x(n), q) + u(n) [y_a(n) − x_a(n)],   (8)

where u(n) > 0 and vanishes except where a measurement is available. This is referred to as 'nudging'. It appears in an ad hoc, but quite useful, manner.
Within the structure we have developed, one may see that the 'nudging term' arises through the balance between the measurement error term and the model error term in the action. This is easy to see when we look at the continuous time version of the data assimilation standard model:

A(X) = ∫_{t_0}^{t_F} dt { Σ_{l=1}^{L} (R_m(l, t)/2) [x_l(t) − y_l(t)]² + Σ_{a=1}^{D} (R_f(a)/2) [dx_a(t)/dt − F_a(x(t), q)]² },

where R_m(l, t) is nonzero only at the measurement times. The extremum of this action is given by the Euler-Lagrange equations for the variational problem [21],

Σ_b R_f(b) [δ_ab d/dt + ∂F_b(x(t), q)/∂x_a] [dx_b(t)/dt − F_b(x(t), q)] = R_m(a, t) [x_a(t) − y_a(t)],

in which the right hand side, proportional to the mismatch between model and data, is the 'nudging' term appearing in a natural manner. Approximating the operator δ_ab d/dt + ∂F_b/∂x_a leads back to a discrete nudging rule of the form of Eq. (8). We will use both the full variation of the action, in discrete time, as well as its nudging form in our examples below.
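The nudged dynamics x(n+1) = f(x(n), q) + u(n)[y(n) − x(n)] are simple to integrate. The sketch below is illustrative only: it uses a toy scalar model and applies the nudging term to the full state, whereas in practice u(n) is nonzero only for observed components at measurement times.

```python
import numpy as np

def nudged_step(x, y_obs, u, dt, F, q):
    """One Euler step of dx/dt = F(x, q) + u*(y - x). The nudging term
    u*(y - x) is switched on (u > 0) only when a measurement y is
    available; pass y_obs=None between measurements."""
    dx = F(x, q)
    if y_obs is not None:
        dx = dx + u * (y_obs - x)
    return x + dt * dx

# Toy model dx/dt = -q*x nudged toward a constant observation y = 2:
# the state is pulled off the model's fixed point toward the data.
F_toy = lambda x, q: -q * x
x = 0.0
for _ in range(4000):
    x = nudged_step(x, y_obs=2.0, u=5.0, dt=0.01, F=F_toy, q=1.0)
```

With constant nudging the state settles at the balance point of model and data terms, u·y/(q + u) = 10/6 here, illustrating how the strength u sets the compromise between the two.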

Monte Carlo Methods
Monte Carlo methods [22,12,18,7] are well covered in the literature. We have not used them in the examples in this paper. However, the development of a precision annealing version of Monte Carlo techniques promises to address the difficulties with large matrices for the Jacobian and Hessians required in variational principles [23]. When one comes to network problems, about which we comment later, this method may be essential.

Using SDA to Analyze the Avian Song System
We take our examples of the use of SDA in neurobiology from experiments on the avian song system. These have been performed in the University of Chicago laboratory of Daniel Margoliash, and we do not describe in detail either the experiments or the song production pathways in the avian brain. We give the essentials of the experiments and direct the reader to our references for a full, biologically oriented account of why this system is enormously interesting.
Essentially, however, the manner in which songbirds learn and produce their functional vocalization, song, is an elegant non-human example of a behavior that is cultural: the song is determined both by a genetic substrate and, interestingly, by refinement on top of that substrate by juveniles learning the song from their (male) elders. The analogs to learning speech in humans [24] are striking.
Our avian singer is a zebra finch. They, as do most other songbirds, learn vocal expression through auditory feedback [25,26,27,28,24]. This makes the study of the song system a good model for learning complex behavior [29,27,30]. Parts of the song system are analogous to the mammalian basal ganglia and regions of the frontal cortex [27,31,32]. Zebra finch in particular have the attractive property of singing only a single learned song, and with high precision, throughout their adult life.
Beyond the auditory pathways themselves, two neural pathways are principally responsible for song acquisition and production in zebra finch. The first is the Anterior Forebrain Pathway (AFP) which modulates learning. The second is a posterior pathway responsible for directing song production: the Song Motor Pathway (SMP) [26,28,33]. The HVC nucleus in the avian brain uniquely contributes to both of these [28].
There are two principal classes of projection neurons which extend from HVC: neurons which project to the robust nucleus of the arcopallium (HVC RA ), and neurons which project to Area X (HVC X ). HVC RA neurons extend to the SMP pathway and HVC X neurons extend to the AFP [28,34]. These two classes of projection neurons combined with classes of HVC interneurons, make up the three broad classes of neurons within HVC. Fig. (2) [35] displays these structures in the avian brain.
In vitro observations of each HVC cell type have been obtained through patch-clamp techniques making intracellular voltage measurements in a reduced, brain slice preparation [25]. In this configuration, the electrode can simultaneously inject current into the neuron while measuring the whole cell voltage response [36]. From these data, one can establish the physical parameters of the system [25]. Traditionally this is done using neurochemicals to block selected ion channels and measuring the response properties of others [37]. Single current behavior is recorded and parameters are found through mathematical fits to the data. This procedure has its drawbacks, of course. There are various technical problems with the choice of channel blockers. Many of even the modern channel blockers are not subtype specific [38] and may only partially block channels [39]. A deeper conceptual problem is that it is difficult to know what channels one may have missed altogether. Perhaps there are channels which express themselves only outside the bounds of the experimental conditions.
Our solution to such problems is the utilization of statistical data assimilation (SDA). This is a method developed by meteorologists and others as computational models of increasingly large dynamical systems have been developed. Data assimilation has been described in our earlier sections.
In this paper, we focus on the song learning pathway, reporting on experiments involving the HVC X neuron. The methods are generally applicable to the other neurons in HVC, and actually, to neurons seen as dynamical systems in general.
We start with a discussion of the neuron model. First we demonstrate the utility of our precision annealing methods through the use of twin experiments. These are numerical experiments in which 'data' are generated through a known model (of HVC X ), then analyzed via precision annealing. In a twin experiment, we know everything, so we can verify the SDA method by looking at predictions after an observation window in which the model is trained, and we may also compare the estimations of unobserved state variables and parameters to the ingredients and output of the model.
Twin experiments are meant to mirror the circumstances of the real experiment. We start by taking the model that we think describes our physical system. Initial points for the state variables and parameters are chosen, which are used along with the model to numerically integrate forward in time. This leaves us with complete information about the system. Noise is added to a subset of the state variables to emulate the data to be collected in a lab experiment. We then perform PA on these simulated data, as if they were real data. The results of these numerical experiments can be used to inform laboratory experiments, and indeed help design them, by identifying the necessary measurements and stimulus needed to accurately electrophysiologically characterize a neuron.
The second set of SDA analyses we report on using 'nudging', as described above, to estimate some key biophysical properties of HVC X neurons from laboratory data. This SDA procedure is applied to HVC X neurons in two different birds. The results show that though each bird is capable of normal vocalization, their complement of ion channel strengths is apparently different. We report on a suggestive example of this, leaving a full discussion to [40].
In order to obtain good estimation results, we must choose a forcing or stimulus with the model in mind: the dynamical range of the neuron must be thoroughly explored. This suggests a few key properties of the stimulus:
• The current waveform I_injected(t) must have sufficient amplitude, of both signs, and must be applied sufficiently long in time that it explores the full range of the neuron's variation.
• The frequency content of the stimulus current must be low enough that it does not exceed the low-pass cutoff frequency associated with the RC time constant of the neuron. This cutoff is typically in the neighborhood of 50-100 Hz.
• The current must explore all time scales expressed in the neuron's behavior.
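One simple way to realize these three properties, offered here as an illustrative sketch rather than the waveform used in the experiments, is a sum of random-phase sinusoids with all frequencies below the cutoff; the amplitude, mode count, and frequency band below are assumptions.

```python
import numpy as np

def make_stimulus(t_s, f_max=50.0, n_modes=20, amp=200.0, seed=0):
    """Band-limited random current: random-phase sinusoids with all
    frequencies below f_max (Hz), so the power stays under the
    neuron's RC cutoff. Normalized to span +/- amp so it can both
    drive spiking and dwell in hyperpolarizing territory. All numbers
    are illustrative."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, f_max, size=n_modes)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_modes)
    waves = np.sin(2.0 * np.pi * freqs[:, None] * t_s[None, :]
                   + phases[:, None])
    I = waves.mean(axis=0)
    return amp * I / np.max(np.abs(I))

t = np.arange(0.0, 4.0, 1e-4)      # 4 s sampled at 10 kHz
I_inj = make_stimulus(t)
```

A long window with many incommensurate low frequencies lets the current linger on the slow time scales while still exciting the fast spiking dynamics.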

Analysis of HVC X Data
The model for an HVC X neuron is substantially taken from [25] and described in our Appendix. We now use this model in a 'twin experiment' in which PA is utilized, and then using 'nudging' we present the analysis of experimental data on two Zebra Finch.

Twin Experiment on HVC X Neuron Model
A twin experiment is a synthetic numerical experiment meant to mirror the conditions of a laboratory experiment. We use our mathematical model with some informed parameter choices in order to generate numerical data. Noise is added to observable variables in the model, here V (t). These data are then put through our SDA procedure to estimate parameters and unobserved states of the model. The neuron model is now calibrated or completed.
Using the parameters and the full state of the model at the end t F of an observation window [t 0 , t F ], we take a current waveform I injected (t ≥ t F ) to drive the model neuron and predict the time course of all dynamical variables in the prediction window [t F , ...]. This validation of the model is the critical test of our SDA procedure, here PA. In a laboratory experiment we have no specific knowledge of the parameters in the model and, by definition, cannot observe the unobserved state variables; here we can do that. So, 'fitting' the observed data within the observation window [t 0 , t F ] is not enough, we must reproduce all states for t ≥ t F to test our SDA methods. We use the model laid out in the Appendix. We assume that the neuron has a resting potential of −70 mV and set the initial values for the voltage and each gating variable accordingly. We assume that the internal calcium concentration of the cell is C in = 0.1 µM . We use an integration time step of 0.02 ms and integrate forward in time using an adaptive Runge-Kutta method [7]. Noise is added to the voltage time course by sampling from a Gaussian distribution N (0, 2) in units of mV. The waveform of the injected current was chosen to have three key attributes: (1) It is strong enough to cause spiking in the neuron, (2) it dwells a long time in a hyperpolarizing region, and (3) its overall frequency content is low enough to not be filtered out by the neuron's RC time constant. On this last point, a neuron behaves like an RC circuit, it has a cut off frequency limited by the time constant of the system. Any input current which has a frequency higher than that of the cut off frequency won't be 'seen' by the neuron. The time constant is taken to be the time it takes to spike and return back to 37% above its resting voltage. We chose a current where the majority of the power spectral density exists below 50 Hz.
The first two seconds of our chosen current waveform is a varying hyperpolarizing current. In order to characterize I h (t) and I CaT (t), it is necessary to thoroughly explore the region where the current is active. I h (t) is only activated when the neuron is hyperpolarized. The activation of I h (t) deinactivates I CaT (t), thereby allowing its dynamics to be explored. In order to characterize I N a (t) and I K (t), it is necessary to cause spiking in the neuron. The depolarizing current must be strong enough to hit the threshold potential for spike activation.
The parameters used to generate the data used in the twin experiment are in Table (1), and the injection current data and the membrane voltage response may be seen in figures Fig. (3) and Fig. (4).
The numbers chosen for the data assimilation procedure in this paper are α = 1.4 and β ranging from 1 to 100, with R_f0 = 10^{−4} for the voltage and R_f0 = 1 for all gating variables. These choices are made by the user. During estimation we instructed our methods to estimate the inverse capacitance and the ratios g̃ = g/C_m instead of g and C_m independently, as that separation can present a challenge to numerical procedures. We also estimated the reversal potentials as a check on the SDA method.
Within our computational capability we can reasonably perform estimates on 50,000 data points. This captures one second of data when ∆t = 0.02 ms. However, there are time constants in the model neuron which are on the order of 1 second. In order for us to estimate the behavior of these parameters accurately, we need to see multiple instances of the full response; we need a window on the order of 2-3 seconds. We can obtain this by downsampling the data. We know from previous results that downsampling can lead to better estimations [41]. We take every i-th data point, with i depending on the level of downsampling we want. In this data assimilation run, we downsampled by a factor of 4 to incorporate 4 seconds of data in the estimation window.
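The downsampling step is elementary but worth making explicit; the sketch below uses the numbers quoted in the text (dt = 0.02 ms, a 50,000-point budget, factor 4).

```python
import numpy as np

def downsample(t, v, factor):
    # Keep every factor-th sample so that a fixed budget of estimation
    # points covers a longer window: at dt = 0.02 ms, a factor of 4
    # stretches a 50,000-point budget over 4 s of data.
    return t[::factor], v[::factor]

t = np.arange(200000) * 0.02          # 4 s of samples at dt = 0.02 ms
v = np.sin(t)                         # stand-in for the voltage trace
t_ds, v_ds = downsample(t, v, 4)
```

The effective time step seen by the estimator becomes 0.08 ms, which must remain small compared with the fastest (spiking) time scales of the model.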
We look at a plot of the action as a function of β, that is, of log[R_f/R_f0]. We expect to see a leveling off of the action [17] as a function of R_f. If the action becomes independent of R_f, we can then explore how well our parameter estimations perform when integrating them forward as predictions of the calibrated model. Looking at the action plot in Fig. (5), we can see there is a region in which the action appears to level off, around β = 40. It is in this region where we look for our parameter estimates.

Figure 5. Action levels of the standard model action for the HVC X neuron model discussed in the text. We see that the action rises to a 'plateau' where it becomes quite independent of R_f. The calculation of the action uses PA with α = 1.4 and R_f0 = R_m. N_I = 100 initial choices for the path X_init were used in this calculation. For small R_f one can see the slight differences in action level associated with local minima of A(X).
We examine all solutions around this region of β and utilize their parameter estimates in our predictions. We compare our numerical prediction to the "real" data from our synthetic experiment. We evaluate prediction quality by computing the correlation coefficient between these two curves. This metric is chosen instead of a simple root mean square error because slight variations in spike timing yield a large error even if the general spiking pattern is correct. The prediction plot and parameters for the best prediction can be seen in Fig. (6) and Table (2). The voltage trace in red is the estimated voltage after data assimilation is completed. It is overlaid on the synthetic input data in black. The blue time course is a prediction, starting at the last time point of the red estimated V(t) trace and using the parameter estimates for t ≤ 4000 ms.
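The prediction metric just described can be sketched as follows; the noisy test signal and noise level are illustrative assumptions, not our experimental traces.

```python
import numpy as np

def prediction_score(v_pred, v_true):
    """Correlation coefficient between predicted and measured voltage
    traces; preferred here over RMS error because small jitter in
    spike timing inflates RMS even when the spiking pattern is
    right."""
    return np.corrcoef(v_pred, v_true)[0, 1]

# Illustration: a 10 Hz 'spiking' surrogate plus small observation noise
# still scores near 1, while RMS error would already be substantial.
t = np.linspace(0.0, 1.0, 2000)
v_true = np.sin(2.0 * np.pi * 10.0 * t)
rng = np.random.default_rng(3)
v_pred = v_true + rng.normal(0.0, 0.1, size=t.size)
score = prediction_score(v_pred, v_true)
```

Note that the correlation coefficient is insensitive to overall offsets and scalings of the trace, so gross amplitude errors should still be checked by eye against plots like Fig. (6).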
The red curve matches the computed voltage trace quite well. There is no deviation in the frequency of spikes, spike amplitudes, or the hyperpolarized region of the cell. Looking at the prediction window, we can see that there is some deviation in the hyperpolarized voltage trace around 9000 ms. This is an indication that our parameter estimates for currents activated in this region are not correct. Comparing parameters, we can see that E_h is estimated as lower than its actual value. Despite that, we are still able to reproduce the neuron's behavior fairly well.

Figure 6. Results of the 'twin experiment' using the model HVC X neuron described in the Appendix. Noise was added to data developed by solving the dynamical equations. The noisy V(t) was presented to the precision annealing SDA calculation along with I_injected(t) in the observation window t_0 = 0 ms, t_F = 4000 ms. The noisy model voltage data are shown in black, and the estimated voltage is shown in red. For t ≥ 4000 ms we show the predicted membrane voltage, in blue, generated by solving the HVC X model equations using the parameters estimated during SDA within the observation window.

Analysis of Biophysical Parameters from HVC_X Neurons in Two Zebra Finches
Our next use of SDA employs the 'nudging' method described in Eq. (8). In this section we used some of the data [40] taken in experiments on multiple HVC_X neurons from different zebra finches. The question we asked was whether, using SDA, we could identify differences in biophysical characteristics between the birds. This question is motivated by prior biological observations [40].
Using the same HVC_X model as before, we estimated the maximal conductances {g_Na, g_K, g_CaT, g_SK, g_h}, holding fixed the other kinetic parameters and the Nernst/reversal potentials. The baseline characteristics of an ion channel are set by the properties of the cell membrane and the complex proteins penetrating the membrane that form the physical channel. Differences among birds would then come from the density or number of the various channels, as characterized by the maximal conductances. If such differences were identified, it would promote further investigation of the biologically exciting proposition that these differences arise in relation to some aspect of the song learning experience of the birds [40].
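The nudging idea invoked here can be sketched on a deliberately simple example. The block below (a minimal sketch using a leak-only membrane, not the full HVC_X model; all numbers are illustrative) adds a coupling term k(V_data(t) − V(t)) to the voltage equation, which steers the model toward the observations even when a model parameter is wrong.

```python
import numpy as np

def nudged_integrate(v_data, dt, g_leak, e_leak, c_m, k_nudge):
    """Euler integration of C dV/dt = g_leak*(E_leak - V), with a nudging
    term k_nudge*(V_data - V) steering the model toward the observations."""
    v = np.empty_like(v_data)
    v[0] = v_data[0]
    for i in range(len(v_data) - 1):
        dvdt = g_leak * (e_leak - v[i]) / c_m + k_nudge * (v_data[i] - v[i])
        v[i + 1] = v[i] + dt * dvdt
    return v

dt, n = 0.01, 2000
t = np.arange(n) * dt
v_data = -65.0 + 15.0 * np.exp(-t / 2.0)   # hypothetical observed trace (mV vs ms)

# Even with a wrong leak reversal potential (-70 mV rather than the -65 mV
# implied by the data), the nudged model tracks the observations closely,
# while the free-running model (k_nudge = 0) drifts away:
v_nudged = nudged_integrate(v_data, dt, g_leak=0.5, e_leak=-70.0, c_m=1.0, k_nudge=5.0)
v_free = nudged_integrate(v_data, dt, g_leak=0.5, e_leak=-70.0, c_m=1.0, k_nudge=0.0)
```

In the full SDA setting, the residual coupling needed to keep the model on the data is what informs the parameter estimates.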
In Fig. (7) and Fig. (8) we display the stimulating current and membrane voltage response from one of the 9 neurons in our large sample. The analysis using SDA covered four neurons from one bird and seven neurons from another. The results for {g_CaT, g_Na, g_SK} are displayed in Fig. (9). The maximal conductances from one bird are shown in blue and those from the other bird in red. There is a striking difference between the distributions of maximal conductances.
We do not propose here to delve into the biological implications of these results. Nevertheless, we note that the neurons from each bird occupy a small but distinct region of the parameter space (Fig. 9). This result and its implications for birdsong learning, and more broadly for neuroscience, are described in [40].
Here, however, we display this result as an example of the power of SDA to address a biologically important question in a systematic, principled manner beyond what is normally achieved in analyses of such data.
To fully embrace the utility of SDA for these experiments, however, further work is needed. A limitation of the present result is that the SDA estimates of g_SK for a subset of the neurons/observations for Bird One reach the imposed search bounds (Fig. 9). Addressing such issues would be a prelude to the exciting possibility of estimating more parameters than just the principal ion currents in the Hodgkin-Huxley equations. This could use SDA numerical techniques to calculate, over hours or days, estimates of parameters that could require months or years of work to measure with traditional biological and biophysical approaches, in some cases requiring specialized equipment beyond that available for most in vitro recording setups. In contrast, applying SDA to such data sets requires only a computer.

Figure 8. The voltage response to I_applied of Fig. (7), one of the library of stimuli used to excite voltage response activity in an HVC_X neuron. The cell was prepared in vitro, and a single patch clamp electrode injected I_injected(t) and recorded the membrane potential.

Analysis of Neuromorphic VLSI Instantiations of Neurons
An ambitious effort in neuroscience is the creation of low power consumption analog neural-emulating VLSI circuitry. The goals for this effort range from the challenge itself to the development of fast, reconfigurable circuitry on which to incorporate information revealed in biological experiments. One of the curious roadblocks in achieving critical steps toward these goals is that after the circuitry is designed and manufactured into VLSI chips, what comes back from a fabrication plant is not precisely what was designed. This is due to the realities of the manufacturing processes, not inadequacies of the designers.

Figure 9. A three dimensional plot of three of the maximal conductances estimated from HVC_X cells using the stimulating current shown in Fig. (7). Membrane voltage responses from five neurons from one bird were recorded many times, as were membrane voltage responses from four neurons from a second bird. One set of maximal conductances {g_Na, g_CaT, g_SK} is shown. The estimates from Bird 1 are in red-like colors, and the estimates from Bird 2 are in blue-like colors. This is just one of a large number of examples discussed in detail in [40].
To overcome this barrier to using the VLSI devices in networks, we need an algorithmic tool to determine just what did return from the factory, so we know how the nodes of a silicon network are constituted. As each chip is an electronic device built on a model design, and the flaws in manufacturing are imperfections in the realization of design parameters, we can use data from the actual chip, together with SDA, to estimate the parameters actually realized on the chip.
SDA is in an advantageous position here. If we present to the chip input signals of much the same design as those prepared for the neurobiological experiments discussed in the previous section, we can measure everything about each output from the chip and use SDA to estimate the actual parameters produced in manufacturing. Of course, we do not know those parameters a priori, so after estimating the parameters, thus 'calibrating' the chip, we must use those estimates to predict the response of the chip to a new stimulating current. That will validate (or not) the model of the actual circuitry on the chip and permit confidence in using it to build interesting networks.
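The calibrate-then-validate loop just described can be illustrated on a toy one-compartment circuit. This is a hypothetical linear stand-in for the chip, and the estimator is plain least squares rather than the full SDA machinery; the point is the workflow: estimate parameters from one stimulus/response pair, then predict the response to a different stimulus.

```python
import numpy as np

def simulate_rc(i_inj, dt, g, e_rev, c_m, v0):
    """Euler integration of the one-compartment test circuit C dV/dt = g(E - V) + I."""
    v = np.empty_like(i_inj)
    v[0] = v0
    for k in range(len(i_inj) - 1):
        v[k + 1] = v[k] + dt * (g * (e_rev - v[k]) + i_inj[k]) / c_m
    return v

def calibrate_rc(i_inj, v_obs, dt):
    """Least-squares fit of (g, E, C): write C dV/dt = g(E - V) + I as
    dV/dt = a*V + b + c*I and solve the linear regression for (a, b, c)."""
    dvdt = np.gradient(v_obs, dt)
    A = np.column_stack([v_obs, np.ones_like(v_obs), i_inj])
    (a, b, c), *_ = np.linalg.lstsq(A, dvdt, rcond=None)
    c_m = 1.0 / c
    g = -a * c_m
    e_rev = b / (-a)
    return g, e_rev, c_m

dt, n = 0.01, 5000
t = np.arange(n) * dt
i_train = 5.0 * (np.sin(0.3 * t) > 0)                 # 'calibration' stimulus
v_train = simulate_rc(i_train, dt, g=0.7, e_rev=-68.0, c_m=1.2, v0=-68.0)

g_est, e_est, c_est = calibrate_rc(i_train, v_train, dt)

# Validation: predict the response to a different stimulus with the
# estimated ('calibrated') parameters and compare to the true response.
i_test = 5.0 * np.exp(-((t - 25.0) ** 2) / 20.0)
v_true = simulate_rc(i_test, dt, 0.7, -68.0, 1.2, -68.0)
v_pred = simulate_rc(i_test, dt, g_est, e_est, c_est, -68.0)
```

For the nonlinear, spiking chip the estimation step is SDA itself, but the validation logic, prediction on a stimulus not used in calibration, is exactly the one sketched here.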
We have done this on chips produced in the laboratory of Gert Cauwenberghs at UCSD using PA [42,41] and using 'nudging' as we now report.
The chip we worked with was designed to produce the simplest spiking neuron, namely one having just Na, K, and leak channels [8,9], as in the original HH experiments. This neuron model has four state variables, {V(t), m(t), h(t), n(t)}; the gating variables and the functions w∞(V) are discussed in depth in [8,9].
In our experiments on a 'NaKL' chip we used the stimulating current displayed in Fig. (10) and measured all the neural responses {V(t), m(t), h(t), n(t)}. These observations were presented to the designed model within SDA to estimate the parameters in the model.

Figure 10. This waveform for I_injected(t) was used to drive the VLSI NaKL neuron after receipt from the fabrication facility.
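As context for the NaKL model, the following sketch writes out its four-state-variable dynamics with tanh-form steady states. The parameter values are illustrative placeholders, not the calibrated chip values.

```python
import numpy as np

def nakl_rhs(state, i_inj, p):
    """Right-hand side of a NaKL (Na, K, leak) Hodgkin-Huxley model with
    first-order gating kinetics and tanh steady-state activation curves."""
    v, m, h, n = state

    def w_inf(v, theta, sigma):
        return 0.5 * (1.0 + np.tanh((v - theta) / sigma))

    dv = (p['g_na'] * m**3 * h * (p['e_na'] - v)
          + p['g_k'] * n**4 * (p['e_k'] - v)
          + p['g_l'] * (p['e_l'] - v)
          + i_inj) / p['c_m']
    dm = (w_inf(v, p['th_m'], p['s_m']) - m) / p['tau_m']
    dh = (w_inf(v, p['th_h'], p['s_h']) - h) / p['tau_h']   # s_h < 0: inactivation
    dn = (w_inf(v, p['th_n'], p['s_n']) - n) / p['tau_n']
    return np.array([dv, dm, dh, dn])

# Illustrative parameter values (placeholders, not the chip's):
params = dict(g_na=120.0, e_na=50.0, g_k=20.0, e_k=-77.0, g_l=0.3, e_l=-54.0,
              c_m=1.0, th_m=-40.0, s_m=15.0, th_h=-60.0, s_h=-15.0,
              th_n=-55.0, s_n=30.0, tau_m=0.1, tau_h=1.0, tau_n=1.0)

state0 = np.array([-65.0, 0.1, 0.6, 0.3])     # V, m, h, n near rest
rhs0 = nakl_rhs(state0, i_inj=10.0, p=params)
state1 = state0 + 0.01 * rhs0                  # one forward-Euler step, dt = 0.01 ms
```

In the SDA calibration, the entries of `params` are the unknowns estimated from the recorded {V(t), m(t), h(t), n(t)}.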
We then tested/validated the estimates by using the calibrated model to predict how the VLSI chip would respond to a different injected current. In Fig. (11) we show the observed V_data(t) in black, the SDA estimate of the voltage in red, and the prediction of V(t) in blue for times after the end of the observation window.
While one can be pleased with the outcome of these procedures, for our purposes the key point is that the use of our SDA algorithms gives the user substantial confidence in the functioning characteristics of the VLSI chips one wishes to place at the nodes of a large, perhaps even very large, realization of a desired neural circuit in VLSI. We are not unaware of the software developments required to allow efficient calibration of very large numbers of manufactured silicon neurons. A possible worry about also determining the connectivity, both the links and their strengths and time constants, may be alleviated by realizing these links through a high bandwidth bi-directional connection of the massive array of chips and the designation of connectivity characteristics on an off-chip computer.

Figure 11. The NaKL VLSI neuron was driven by the waveform for I_injected(t) shown in Fig. (10). The four state variables {V(t), m(t), h(t), n(t)} of the NaKL model were recorded and used in an SDA 'nudging' protocol to estimate the parameters of the model actually realized at the manufacturing facility. Here we display the membrane voltages {V_data(t), V_est(t), V_pred(t)}: the observed membrane voltage response when I_injected(t) was applied, the estimated voltage response using SDA, and the predicted voltage response V_pred(t) from the calibrated model of the circuitry actually on the VLSI chip. In a laboratory experiment, only this attribute of a neuron would be observable.
Part of the same analysis is the ability to observe, estimate and predict the experimentally unobservable gating variables. This serves, in this context, as a check on the SDA calculations. The Na inactivation variable h(t) is shown in Fig. (12) as its measured time course h data (t) in black, its estimated time course h est (t) in red, and its predicted time course h pred (t) in blue.

DISCUSSION
Our review of the general formulation of statistical data assimilation (SDA) started our remarks. Many details can be found in [3] and subsequent papers by the authors. The essential take away message is that the core problem is to evaluate, approximately of course, the integral in Eq. (5). This task requires well curated data and a model of the processes producing the data. In the context of experiments in the life sciences or, say, acquisition of data from earth system sensors, curation includes an assessment of errors and of the properties of the instruments as well.
Once we have the data and a model, we still need a set of procedures to transfer the information from the data to the model, then to test/validate the model on data not used to train it. The techniques we covered are general. Their application to examples from neuroscience comprises the second part of this paper.
In the second part we first address properties of the avian song production pathway and a neural control pathway modulating the learning and production of functional vocalization: song. We focus our attention on one class of neurons, HVC_X, but have also demonstrated the utility of SDA to describe the response properties of other classes of neurons in HVC, such as HVC_RA [43] and HVC_I [44]. Indeed, the SDA approach is generally applicable wherever there is insight to relate biophysical properties of neurons to their dynamics through Hodgkin-Huxley equations.
Our SDA methods considered variational algorithms that seek the highest conditional probability distributions of the model states and parameters conditioned on the collection of observations over a measurement window in time. Other approaches, especially Monte Carlo algorithms, were not discussed here but are equally attractive.
We discussed methods of testing models of HVC_X neurons using "twin experiments," in which a model of the individual neuron produces synthetic data to which we add noise with a selected signal-to-noise ratio. Some state variable time courses from the library of these model-produced data, for us the voltage across the cell membrane, then enter the action, Eq. (7), specifically in the measurement error term. Errors in the model are represented in the model error term of the action.
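The noise-addition step of such a twin experiment can be sketched as follows; the trace here is a hypothetical stand-in for the model V(t), and the helper simply scales Gaussian noise to achieve a chosen signal-to-noise ratio in dB.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, seed=None):
    """Add zero-mean Gaussian noise so that the result has the requested
    signal-to-noise ratio (in dB), measured against the signal's variance."""
    rng = np.random.default_rng(seed)
    sig_power = np.var(signal)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

# Hypothetical model voltage trace and a 20 dB noisy copy for the twin experiment:
t = np.linspace(0.0, 100.0, 1000)
v_model = -65.0 + 30.0 * np.sin(0.5 * t) ** 8      # stand-in for the model V(t)
v_noisy = add_noise_at_snr(v_model, snr_db=20.0, seed=0)
```

The noisy trace `v_noisy` plays the role of the "data" presented to SDA, while `v_model` and the generating parameters are held back as ground truth for judging the estimates.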
Using a precision annealing protocol to identify and track the global minimum of the action, the successful twin experiment gives us confidence in this SDA method for transferring information from the data to the model.
We then introduced a 'nudging' method as an approximation to the Euler-Lagrange equations arising in the numerical optimization of the action, Eq. (7); this is Laplace's method in our SDA context. The nudging method, introduced in meteorology some time ago, was used to distinguish between two different zebra finches. We showed, in a quite preliminary manner, that the two unrelated birds of the same species express different HVC network properties, as seen in a critical set of maximal conductances for the ion channels in their dynamics.
Finally we turned to the challenge of implementing in VLSI technology the neurons in HVC, toward the goal of building a silicon HVC network. The challenges at the design and fabrication stages of this effort were illuminated by our use of SDA to determine what was actually returned from the manufacturing process for our analog neurons.

Moving Forward to Network Analysis
Finally, we offer a few comments on the next stage of analysis of HVC. In this and previous papers we analyzed individual neurons in HVC. These analyses were assisted by using SDA, through twin experiments, to design laboratory experiments through the selection of effective stimulating injected currents.
Having characterized the electrophysiology of an individual neuron within the framework of Hodgkin-Huxley (HH) models, we may now proceed beyond the study of individual neurons [45] in vitro. Once we have characterized an HVC neuron through a biophysical HH model, we may then use it in vivo as a sensor of the activity of the HVC network where it is connected to HVC RA , HVC I , and other HVC X neurons. The schema for this kind of experiment is displayed in Fig. (13). These experiments require the capability to perform measurements on HVC neurons in the living bird. That capability is available, and experiments as suggested in our graphic are feasible, if challenging.
The schematic indicates that the stimulating input to the experiments is auditory signals, chosen by the user, presented to the bird's ear and reaching HVC through the auditory pathway. The stimulus is then distributed in a manner to be deduced from experiment and produces activity in the HVC network that we must model. The goal is, at least initially, to establish, again within the models we develop, the connectivity of the HVC neuron classes as it manifests itself in the function of the network. We have some information about this [46,47], and these results will guide the development of the HVC model used in these whole-network experiments. An important point to address is what changes to the in vitro model might be necessary to render it a model for in vivo activity.

Figure 13. A cartoon-like idea of an experiment to probe the HVC network. In this graphic three neuron populations, {HVC_X, HVC_I, HVC_RA}, are stimulated by auditory signals P(t) presented to the bird in vivo. This drives the auditory pathway from the ear to HVC, and the network activity is recorded from a calibrated, living HVC_X neuron, used here as a sensor for network activity. While the experiment is now possible, the construction of a model HVC network will proceed in steps of complexity, using simple then more biophysically realistic neuron models and connections among the nodes (neurons) of the network. From libraries of time courses of P(t), chosen by the user, and of the responses V(t) in the 'sensor' HVC_X neuron, we will use SDA to estimate properties of the network.
The gating variables w(t) satisfy a first order kinetic equation, dw(t)/dt = (w∞(V(t)) − w(t))/τ_w(V(t)), with w∞(V) = (1/2)[1 + tanh((V − θ_w)/σ_w)], for all gating variables except h∞(V) appearing in I_Na(t) [25]. θ_w is the half-activation voltage and σ_w controls the slope of the activation function. For fast gating variables, such as m of I_Na and s of I_CaL, we replace the time dependence by w∞(V).
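A small numerical check of the tanh-form steady-state curve, assuming the sigmoidal form described in this Appendix; a negative σ_w turns the same expression into a decreasing (inactivation) curve.

```python
import numpy as np

def w_inf(v, theta_w, sigma_w):
    """Steady-state gating curve 0.5*(1 + tanh((V - theta_w)/sigma_w)).
    With sigma_w < 0 this becomes a decreasing (inactivation) curve."""
    return 0.5 * (1.0 + np.tanh((v - theta_w) / sigma_w))

# Half-activation occurs exactly at V = theta_w:
print(w_inf(-40.0, -40.0, 15.0))   # 0.5
```

Far from θ_w the curve saturates at 0 or 1, so the maximal conductances alone set the current amplitudes during a spike.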
τ_w(V) is the time constant of each gating variable. Time constants for the n and hp gating variables (these names refer to [25]) are given below, where τ̄_w is an average time constant. Our model differs from [25] by one time constant: instead, τ_rs(V) takes the form presented here. For our choice of ion currents we follow the results of experimental data [49,50,51] and generally reproduce the model listed in [25]. HVC_X spiking properties include a fast rectifying current, sag in response to hyperpolarizing current, and spike frequency adaptation in response to depolarizing current.
C dV(t)/dt = I_Na(t) + I_K(t) + I_L(t) + I_CaT(t) + I_CaL(t) + I_A(t) + I_SK(t) + I_h(t) + I_Nap(t) + I_injected(t)   (18)

I_Na(t) and I_K(t) are the standard HH currents; they produce fast spiking in response to injected currents. I_L(t) is a leak current meant to capture all linear currents of the neuron. I_CaT(t) is a low threshold T-type calcium current that causes rebound depolarization in cooperation with I_h(t). I_CaL(t) is a high threshold L-type calcium current; it works in conjunction with I_SK(t), a calcium concentration dependent potassium current, to create frequency adaptation in neuron spiking. I_A(t) is an A-type potassium current, and I_Nap(t) is a persistent sodium current. From the model presented in [25], we eliminate I_KNa(t), a sodium dependent potassium current, and rewrite all sigmoidal functions as hyperbolic tangents.
The mass conservation equation for the Ca2+ concentration C(t) is written as dC(t)/dt = (I_CaT(t) + I_CaL(t)) + k_Ca(b_Ca − C(t)), again following [25].

Table 1. Parameter values used to numerically generate the HVC_X data. These values come from [25]. Data were generated using an adaptive Runge-Kutta method and can be seen in Fig. (3) and Fig. (4).

Table 2. Parameter estimates from the best predictions. The best prediction is chosen by finding the highest correlation coefficient between the predicted voltage and the "real" voltage. This comparison can also be made on experimental data. It represents an attractive alternative to the familiar least squares metric, which is very sensitive to spike times in data with action potentials: small errors in spike times may result in large errors in a least squares metric.