Capturing Multiple Timescales of Adaptation to Second-Order Statistics With Generalized Linear Models: Gain Scaling and Fractional Differentiation

Single neurons can dynamically change the gain of their spiking responses to take into account shifts in stimulus variance. Moreover, gain adaptation can occur across multiple timescales. Here, we examine the ability of a simple statistical model of spike trains, the generalized linear model (GLM), to account for these adaptive effects. The GLM describes spiking as a Poisson process whose rate depends on a linear combination of the stimulus and recent spike history. The GLM successfully replicates gain scaling observed in Hodgkin-Huxley simulations of cortical neurons, which occurs when the ratio of spike-generating potassium and sodium conductances approaches one. Gain scaling in the GLM depends on the length and shape of the spike history filter. Additionally, the GLM captures adaptation that occurs over multiple timescales as a fractional derivative of the stimulus envelope, which has been observed in neurons that include long timescale afterhyperpolarization conductances. Fractional differentiation in GLMs requires spike history filters that span several seconds. Together, these results demonstrate that the GLM provides a tractable statistical approach for examining single-neuron adaptive computations in response to changes in stimulus variance.


INTRODUCTION
Neurons adapt their spiking responses in a number of ways to the statistics of their inputs (Fairhall, 2014; Weber and Fairhall, 2019). A particularly well-studied example is adaptation to the stimulus variance, which can provide important computational properties. First, neurons can show gain scaling, such that the input is scaled by the stimulus standard deviation (Fairhall et al., 2001a; Mease et al., 2013). Scaling of the gain by the stimulus standard deviation implies that single spikes maintain the same information about the stimulus independent of its overall amplitude. This adaptation of the "input gain" with stimulus standard deviation can occur very rapidly. Second, the mean firing rate can adapt to variations in the stimulus variance across multiple timescales (Fairhall et al., 2001b; Wark et al., 2007). This form of spike frequency adaptation can in some cases have power-law properties (Pozzorini et al., 2013) and serve to compute the fractional derivative of the variance (Anastasio, 1998; Lundstrom et al., 2008).
One approach to studying such adaptation is to use Hodgkin-Huxley style (HH) conductance-based models to explore potential single-neuron mechanisms underlying these computations (Lundstrom et al., 2008; Mease et al., 2013). Although HH models can indeed capture such behavior, the mechanistic HH framework is not ideally suited to reveal its dynamical basis, as HH model parameters are difficult to interpret in terms of computation and coding. Moreover, fitting HH models to intracellular data is difficult (Buhry et al., 2011; Csercsik et al., 2012; Vavoulis et al., 2012; Lankarany et al., 2014), and only recently have methods that fit HH models to spike trains alone been gaining success (Meng et al., 2011, 2014). In contrast, statistical point-process models based on the generalized linear model (GLM) framework have provided a tractable tool for modeling spiking responses of neurons in sensory systems (Truccolo et al., 2005; Pillow et al., 2008). Previous work has shown the utility of finding linear features that can explain the spiking behavior of HH models (Agüera y Arcas and Fairhall, 2003; Weber and Pillow, 2017). Unlike simple linear/non-linear models, GLMs also incorporate a dependence on the history of activity, potentially providing a helpful interpretative framework for adaptation (Mease et al., 2014; Latimer et al., 2019a). We therefore fit GLMs to spike trains generated from HH neurons. We considered gain scaling and fractional differentiation individually, using one set of HH neurons that showed gain scaling and another that produced fractional differentiation. Our goal here is not to analyze the details of these models to explain the adaptation phenomena, but to use them as a source of "experimental" data with which to test the ability of the GLM to capture the same phenomena. We found that the GLMs could reproduce the single-neuron adaptive computations of gain scaling or fractional differentiation.
Capturing gain scaling across a range of HH active conductance parameters depended on both the choice of link function and the spike history length. As the length of the spike history filter increased, the stimulus dependency of neurons changed from differentiating to integrating (Stevenson, 2018). Capturing adaptation as a fractional derivative required a history filter that could account for long timescale effects, on the order of 10 s. Together, these results demonstrate that the GLM provides a tractable statistical framework for modeling adaptation that occurs at the single-neuron level.

MATERIALS AND METHODS
To generate spike train data that display the two types of adaptation we study, we use Hodgkin-Huxley neuron models with parameter choices and channel dynamics previously shown to reproduce effects seen in experimental data. In principle we could use a single model that incorporates both gain scaling and fractional differentiation. While both types of adaptation have been observed in fly neuron H1 (Fairhall et al., 2001b), they have not been reported to occur in the same cortical cells. We therefore choose here to separate the two effects by considering two previously proposed HH models: one that has been applied to model gain scaling (Mease et al., 2013) and a second which exhibits fractional differentiation (Lundstrom et al., 2008). We verified (data not shown) that both effects can be obtained within the same model, but with slight quantitative differences.

Gain Scaling
Gain scaling refers to the case in which, for a neuron's input-output function, the input gain is proportional to the standard deviation (SD) of the stimulus (σ). Thus, the gain depends on the recent context. If a neuron achieves perfect gain scaling, the firing rate R given a particular stimulus value, s, and input standard deviation can be written as

R(s) = R̄_σ R̂(ŝ),

where the normalized stimulus ŝ = s/σ, and the output gain, R̄_σ, is constant in s.
To quantify the degree of gain scaling in a neuron's spiking output, we measure the firing rate function in response to a white-noise input, x(t), at different SDs and constant mean µ (Figure 1A). For each standard deviation, we compute the normalized spike-triggered average (STA; Figure 1B) (Rieke et al., 1999). We then compute the filtered stimulus as the convolution

s(t) = (STA ∗ x)(t).

The spike rate function is then defined probabilistically as

R(ŝ) Δt = p_σ(spk | ŝ) = p_σ(ŝ | spk) p_σ(spk) / p_σ(ŝ),

where the right side follows from Bayes' rule. The average firing rate in a time bin of width Δt is p_σ(spk). Thus, we get R̄_σ Δt = p_σ(spk) and R̂(ŝ) = p_σ(ŝ | spk) / p_σ(ŝ). The spike-triggered stimulus distribution, p_σ(ŝ | spk), is the probability of the stimulus given that a spike occurred in the bin. By definition, the marginal stimulus distribution, p_σ(ŝ), is a standard normal distribution that does not depend on σ. Therefore, if p_σ(ŝ | spk) is similar across different values of σ, gain scaling is achieved because R̂(ŝ) does not depend on σ.
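As an illustration, the STA analysis described here can be sketched in a few lines of numpy (a schematic re-implementation, not the paper's code; the function names and the unit-SD example in the usage note are ours):

```python
import numpy as np

def normalized_sta(x, spike_bins, n_lags):
    """Unit-norm spike-triggered average of the n_lags stimulus bins
    preceding each spike."""
    windows = np.stack([x[i - n_lags:i] for i in spike_bins if i >= n_lags])
    sta = windows.mean(axis=0)
    return sta / np.linalg.norm(sta)

def spike_triggered_dist(x, spike_bins, sta, sigma,
                         edges=np.arange(-5, 5.1, 0.1)):
    """Histogram estimate of p_sigma(s_hat | spk): project the stimulus
    window preceding each spike onto the STA and normalize by sigma."""
    n_lags = len(sta)
    proj = np.array([x[i - n_lags:i] @ sta for i in spike_bins if i >= n_lags])
    p, _ = np.histogram(proj / sigma, bins=edges, density=True)
    return p
```

Comparing the histograms returned at different σ (e.g., with the Wasserstein metric below) then quantifies gain scaling.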
We measure gain scaling in terms of the spike-triggered distribution. We do so using the 1st Wasserstein, or earth-mover's, metric (Rubner et al., 1998) (we obtained qualitatively similar results using the symmetrized Kullback-Leibler divergence and Jensen-Shannon divergence). The Wasserstein metric is a distance function between two probability distributions. Intuitively, it can be thought of as the minimum work needed to transform one distribution into the other by moving probability mass, as if the distributions were piles of sand (Supplementary Figure 1). Formally, it is defined as

W₁(µ, ν) = inf_{γ ∈ Γ(µ,ν)} ∫_{M×M} d(x, y) dγ(x, y),

where ν and µ are probability measures on a metric space M with metric d(·, ·). The infimum is taken over Γ(µ, ν), the collection of measures on M × M with marginal distributions µ and ν. We compute the gain scaling score at σ as D_σ = W₁(p₁(ŝ|spk), p_σ(ŝ|spk)).
A distance close to 0 indicates that the spike-triggered distributions are similar, and therefore the cell is gain scaling its input (Figures 1C,D). Larger values of D σ indicate that the input-output function does not scale with σ (Figures 1E-G). We computed the spike-triggered distribution using a histogram with bins of width 0.1.
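For one-dimensional distributions, the 1st Wasserstein distance reduces to the area between the quantile functions, which makes the score easy to compute directly from samples. A minimal sketch (our own helper, not the paper's implementation):

```python
import numpy as np

def wasserstein1(a, b, n_quantiles=2000):
    """1st Wasserstein (earth-mover's) distance between two 1-D samples,
    via the quantile form W1 = integral_0^1 |F_a^{-1}(q) - F_b^{-1}(q)| dq."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.mean(np.abs(np.quantile(a, q) - np.quantile(b, q)))
```

For the score D_σ, `a` and `b` would be samples of ŝ preceding spikes at σ = 1 and at the test σ; a value near 0 indicates gain scaling.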

Gain Scaling in Hodgkin-Huxley Neurons
A previous study by Mease et al. (2013) found that Hodgkin-Huxley models could account for gain scaling observed in pyramidal neurons. We therefore simulated spikes from single-compartment Hodgkin-Huxley style models of pyramidal neurons, providing a source of data with which to explore the expression of this property using GLMs. In this model, the membrane voltage depends on voltage-gated sodium and potassium conductances (G_Na and G_K) and a passive leak conductance (G_L). The voltage and gating dynamics followed the equations (Mainen et al., 1995)

C dV/dt = −G_Na m³h (V − E_Na) − G_K n (V − E_K) − G_L (V − E_L) + I_stim,

such that for each gate x ∈ {n, m, h}

dx/dt = α_x(V)(1 − x) − β_x(V) x.

The parameters of the model were the same as in Mease et al. (2013). The reversal potentials were E_Na = 50, E_K = −77, and E_L = −70 mV, and the capacitance was C = 1 µF/cm². The leak conductance was set to 0.4 pS/µm² so that the resting membrane had a time constant of ∼25 ms. As in Mease et al. (2013), we explored a range of values for the active conductances G_Na and G_K: from 600 to 2,000 pS/µm² in increments of 100 pS/µm². Simulations were performed in MATLAB using a fourth-order Runge-Kutta method with step size 0.01 ms. Spike times were defined as upward crossings of the voltage trace at −10 mV separated by at least 2 ms. The input was constructed as independent Gaussian draws every 1 ms with parameters N(µ, (4µσ)²), where σ was set to 1.0, 1.3, 1.6, or 2.0. The mean was thus constrained to be proportional to the standard deviation, similarly to the current-clamp protocol used to study gain scaling in Mease et al. (2013). For each value of G_Na and G_K, the mean input, µ, was tuned so that at baseline, where σ = 1, each simulation produced ∼10 spk/s over a 100 s simulation. We did not consider values of G_Na and G_K for which the model spiked spontaneously (i.e., spiked when µ = 0). We simulated 2,000 s of spiking activity at each stimulus level (generating ∼20,000 spikes at σ = 1).
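The noise-input protocol can be sketched as follows (a hypothetical helper written for illustration; the function name and units are ours, and the actual simulations were run in MATLAB):

```python
import numpy as np

def make_stimulus(mu, sigma, duration_s, dt_ms=1.0, rng=None):
    """White-noise current: independent Gaussian draws every dt_ms with
    mean mu and standard deviation 4*mu*sigma, mirroring the protocol
    described above (so the SD is proportional to the mean)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(duration_s * 1000.0 / dt_ms)
    return rng.normal(loc=mu, scale=4.0 * mu * sigma, size=n)
```

A 100 s stimulus at σ = 2 would be `make_stimulus(mu, 2.0, 100.0)`; µ would then be tuned until the simulated cell fires at ∼10 spk/s at σ = 1.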

Fractional Differentiation
We next looked at periodic modulations of the stimulus standard deviation to model long timescale adaptive effects. We applied stimuli consisting of Gaussian noise with sinusoidal or square-wave modulation of the SD between 1 and σ, with σ again taking values of 1.3, 1.6, or 2.0, at a number of different frequencies. We analyzed simulated spike trains across seven noise modulation periods: 1, 2, 4, 8, 16, 32, and 64 s. The simulations were 3,200 s for each period, giving a minimum of 50 cycles per period. Lundstrom et al. (2008) found that pyramidal neurons can act as fractional differentiators of the stimulus amplitude envelope for this type of input. Fractional derivatives generalize the derivative operator such that, analogous to taking the first derivative of a function twice to obtain the second derivative, taking the fractional derivative of order α = 1/2 twice results in the first derivative (Oldham and Spanier, 1974). Fractional differential filters respond to a square stimulus with an exponential-like decay whose time constant depends on α (Figures 2A,B). Fractionally differentiating a sinusoidal stimulus produces a frequency-dependent gain change (Figure 2C),

gain(f) = (2πf)^α, (9)

where f is the frequency. Additionally, fractionally differentiating the sine function gives a frequency-independent phase shift, φ, of the stimulus (Figure 2D):

φ = α × 90°. (10)

These three measures can be combined to estimate approximate fractional differentiation by neurons.
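The gain and phase relationships above can be checked numerically. Below is a sketch of a fractional differentiator for periodic signals, implemented with the Fourier multiplier (iω)^α (our own illustration; the paper estimates α from spike trains rather than computing it this way):

```python
import numpy as np

def fractional_derivative(s, dt, alpha):
    """Order-alpha fractional derivative of a periodic signal: multiply
    each Fourier component by (i*omega)**alpha (zero at omega = 0)."""
    n = len(s)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)   # angular frequencies
    mult = np.zeros(n, dtype=complex)
    nz = omega != 0.0
    mult[nz] = (1j * omega[nz]) ** alpha            # principal branch
    return np.fft.ifft(np.fft.fft(s) * mult).real
```

Applied to s(t) = sin(2πft), this returns (2πf)^α sin(2πft + απ/2): the gain of Equation (9) and the constant phase lead of Equation (10).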
To compute the fractional derivative order, we computed cycle-averaged responses using 30 bins per cycle at each stimulus amplitude modulation frequency. We fit the cycle-averaged square-wave responses across all modulation frequencies with the best-fitting fractional derivative of the stimulus amplitude (plus a baseline rate) using least squares. To fit α to the phase lead of the sine-wave responses, we computed the mean phase lead (φ) across frequencies and applied Equation (10). To fit α to the gain of the sine-wave responses, we applied Equation (9) by fitting a least-squares regression between the logarithm of the modulation frequency and the logarithm of the gain.
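The gain- and phase-based estimators of α can be sketched as follows (schematic helpers using Equations 9 and 10; the function names are ours):

```python
import numpy as np

def alpha_from_gain(periods, gains):
    """Slope of log gain vs. log frequency: for an ideal fractional
    differentiator, gain = (2*pi*f)**alpha, so the slope equals alpha."""
    log_f = np.log(1.0 / np.asarray(periods))
    slope, _ = np.polyfit(log_f, np.log(gains), 1)
    return slope

def alpha_from_phase(phase_leads_deg):
    """Mean phase lead across periods divided by 90 degrees (Equation 10)."""
    return np.mean(phase_leads_deg) / 90.0
```

For an ideal α = 0.2 differentiator, both helpers recover 0.2 exactly; applied to measured cycle averages they give the empirical estimates.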

Fractional Differentiation by Hodgkin-Huxley Neurons
We simulated neurons from the standard HH model with three additional afterhyperpolarization (AHP) currents with time constants ranging from 0.3 to 6s as was done by Lundstrom et al.

FIGURE 1 | (A) The voltage (V), stimulus current (I_stim), and stimulus filtered by the STA (s) for Hodgkin-Huxley simulations of a neuron stimulated with white noise at two different standard deviation levels (black, σ = 1; red, σ = 2). In this simulation, the total sodium and potassium conductances were equal (G_Na = G_K = 1,000 pS/µm²). (B) The STAs measured at the two stimulus standard deviations. (C) Left shows the spike-triggered distributions of the STA-filtered input (s) and right shows the distributions over the STA-filtered input scaled by the standard deviation (ŝ). The shaded areas show the prior stimulus distributions, which are Gaussian with standard deviation σ. (D) The input-output functions of the simulation at each stimulus level. Scaling the input by the standard deviation shows that the simulated neuron scales the gain of the input by the stimulus standard deviation (right). (E) The STAs measured at two standard deviations from a Hodgkin-Huxley simulation with high potassium and low sodium total conductances (G_Na = 600 and G_K = 2,000 pS/µm²). The spike-triggered stimulus distribution (F) and scaled input-output function (G) for this simulation do not show gain scaling.
(2008). The equations for the HH neurons were

C dV/dt = −G_Na m³h (V − E_Na) − G_K n⁴ (V − E_K) − G_L (V − E_L) − Σᵢ G_AHP,i gᵢ (V − E_AHP) + I_stim.

The gates x ∈ {n, m, h} follow the dynamics

dx/dt = α_x(V)(1 − x) − β_x(V) x.
The AHP currents have linear dynamics and are incremented by 1 at spike times (t_spk,i):

dgⱼ/dt = −gⱼ/τⱼ + Σᵢ δ(t − t_spk,i),

where δ is the Dirac delta function. The standard parameters were: G_Na = 120, G_K = 36, G_L = 0.3 mS/cm²; E_Na = 50, E_K = −77, E_L = −54.4 mV; and C = 1 µF/cm². The AHP conductances were set relative to the leak conductance: G_AHP,· = (0.05, 0.006, and 0.004) G_L. The AHP reversal potential was E_AHP = −100 mV and the AHP timescales were set to τ = (0.3, 1, and 6) s. Similarly to the gain scaling simulations, the stimulus was sampled independently in each 1 ms bin from a normal distribution with mean µ. The time-dependent standard deviation given σ and the period p was 4µ f_p(t, σ). The time-dependent modulation function for the square-wave stimulus was

f_p(t, σ) = 1 + (σ − 1) (⌊2t/p⌋ mod 2),

where ⌊·⌋ denotes the floor operator, and the function for the sine-wave stimulus was similarly defined as

f_p(t, σ) = 1 + (σ − 1) (1 + sin(2πt/p))/2.

The parameter µ was calibrated so that with no variance modulation (i.e., σ = 1), the simulated cells produced ∼10 spk/s.

Generalized Linear Models
The GLM defines the spiking activity as an autoregressive Poisson process (Figure 3A). The spike rate at time t is given as a linear-nonlinear function of the stimulus and the spike history

λ_t = f(k_stim · x_t + h_spk · y_hist + b), (17)

where x_t is the stimulus vector preceding time t (the values of I_stim), and y_hist is the spike history vector. The parameters of the GLM are the stimulus filter (k_stim), the spike history filter (h_spk), and the baseline rate (b). For the inverse-link function, f, we used the canonical exponential function except where otherwise noted. The log-likelihood of a binned spike train, y, given the model parameters is then

log p(y | k_stim, h_spk, b) = Σ_t [y_t log(λ_t Δt) − λ_t Δt − log(y_t!)]. (18)

For all model fits and simulations, we set Δt = 1 ms. We numerically maximized the log-likelihood using conjugate-gradient methods.
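A schematic numpy implementation of the GLM rate and log-likelihood (our own sketch, assuming the canonical exponential inverse link and binary spike counts, for which the log y! term vanishes):

```python
import numpy as np

def glm_rate(k_stim, h_spk, b, X, Y_hist):
    """Conditional intensity: lambda_t = exp(k.x_t + h.y_hist + b).
    X and Y_hist hold, per row, the stimulus and spike history vectors."""
    return np.exp(X @ k_stim + Y_hist @ h_spk + b)

def glm_loglik(y, rate, dt=1e-3):
    """Poisson log-likelihood of a binned 0/1 spike train:
    sum_t [y_t * log(lambda_t * dt) - lambda_t * dt]."""
    return np.sum(y * np.log(rate * dt) - rate * dt)
```

Maximizing `glm_loglik` over (k_stim, h_spk, b), e.g., with a conjugate-gradient optimizer, fits the model; the problem is concave for the exponential link.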
To reduce the number of model parameters, we parameterized the filters using smooth basis functions (Figure 3B). The stimulus filter was parameterized using 15 raised cosine basis functions

b_j(t) = ½ cos((log(t + c) − φ_j)/a) + ½ for |log(t + c) − φ_j| ≤ aπ, and 0 otherwise,

where t is in seconds. To fit k_stim, we optimized the weights z_j of the basis expansion k_stim = Σ_j z_j b_j. We set c = 0.02 and a = 2(φ₂ − φ₁)/π. The φ_j were evenly spaced from φ₁ = log(T₀/1000 + c) to φ₁₅ = log(T_end/1000 + c), where the peaks of the filters lie in the range T₀ = 0 to T_end = 100 ms. The spike history filter bases were constructed in two parts. To account for the absolute refractory period, we used five boxcar filters of width 2 ms for the first 10 ms of the spike history. The remaining spike history filter was parameterized using raised cosine basis functions with the parameter c = 0.05. For the gain scaling simulations, N = 15 cosine basis functions were used with spacing T₀ = 10 and T_end = 150 ms. For the fractional differentiation simulations, N = 25 cosine basis functions were used with spacing T₀ = 10 and T_end = 16,000 ms. To explore how the timescale of spike history affected adaptation in the GLM, for each model we fit the GLM using only the first i cosine basis functions, for each i from 0 (using only the refractory boxcar functions) to N. Thus, we obtained N + 1 nested model fits across a range of spike history lengths. When stated, the length of the spike history filter, T_hist, denotes the time of the peak of the ith basis function.
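The raised cosine basis can be sketched as below (our reconstruction following the convention of Pillow et al., 2008; the exact parameterization in the original code may differ):

```python
import numpy as np

def raised_cosine_basis(n_basis, t0_ms, tend_ms, c, t):
    """Log-scaled raised cosine basis: b_j(t) = 0.5*cos((log(t+c)-phi_j)/a)
    + 0.5 where the argument lies in [-pi, pi], and 0 elsewhere. Peaks are
    log-spaced between t0_ms and tend_ms; t is in seconds."""
    phi = np.linspace(np.log(t0_ms / 1000.0 + c),
                      np.log(tend_ms / 1000.0 + c), n_basis)
    a = 2.0 * (phi[1] - phi[0]) / np.pi         # sets the basis width
    arg = (np.log(t[:, None] + c) - phi[None, :]) / a
    return 0.5 * np.cos(np.clip(arg, -np.pi, np.pi)) + 0.5   # (len(t), n_basis)
```

For the stimulus filter described above, `raised_cosine_basis(15, 0.0, 100.0, 0.02, t)` produces 15 bumps whose widths grow with lag, giving fine resolution near the spike and coarse resolution at long lags.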

Evaluating Model Performance
We evaluated the GLM performance by assessing the ability of the GLM to predict the HH model response to a novel 32 s stimulus. For the gain scaling simulations, we tested the response to the test stimulus at each stimulus SD (σ). For the fractional differentiation simulations, the stimulus SD was modulated by a sine or square wave with a 4 s period and a modulation height of σ = 2.0. Predictive performance was evaluated using the pseudo-R² score (Cameron and Windmeijer, 1997). We selected this measure because it can be applied to Poisson process observations instead of trial-averaged firing rates, as is required by the standard R² measure of explained variance (Benjamin et al., 2018). Thus, it is especially appropriate for comparing the stochastic GLM to a spike train simulated by the deterministic HH model. The pseudo-R² is written as a ratio of deviances:

pseudo-R² = 1 − [log p_satur.(y*) − log p_GLM(y* | k_stim, h_spk, b)] / [log p_satur.(y*) − log p_null(y* | ȳ*)],

where y* is the test spike train. The GLM likelihood is p_GLM(y* | k_stim, h_spk, b), and the likelihood of the null model (p_null(y* | ȳ*)) is the probability of the spike train given only the mean firing rate, ȳ*. The saturated model likelihood (p_satur.(y*)) is the probability of observing y* given one parameter per bin: that is, the Poisson probability of observing y* given a model with rate λΔt = 1 in each bin in which the HH model spiked and λ = 0 in each bin in which it did not. Thus, the pseudo-R² measures the fraction of explainable log-likelihood captured by the GLM.
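The pseudo-R² can be sketched as follows (our own helper; for binary spike trains, the general Poisson saturated model, whose expected count equals the observed count in each bin, matches the definition above):

```python
import numpy as np

def poisson_loglik(y, mu):
    """Poisson log-likelihood (dropping the constant log y! term) with
    expected counts mu per bin; 0*log(0) is treated as 0."""
    pos = y > 0
    return np.sum(y[pos] * np.log(mu[pos])) - np.sum(mu)

def pseudo_r2(y, mu):
    """Deviance-based pseudo-R^2 (Cameron and Windmeijer, 1997): fraction
    of the null-to-saturated log-likelihood gap captured by the model."""
    ll_model = poisson_loglik(y, mu)
    ll_null = poisson_loglik(y, np.full_like(y, y.mean(), dtype=float))
    ll_sat = poisson_loglik(y, y.astype(float))
    return 1.0 - (ll_sat - ll_model) / (ll_sat - ll_null)
```

A perfect prediction scores 1, predicting only the mean rate scores 0, and a model worse than the mean rate scores below 0 (as for the σ = 1.0-trained GLMs tested at σ = 2.0 in the results below).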

RESULTS

GLMs Capture Gain Scaling Behavior
To investigate how GLMs can capture biophysically realistic gain scaling, we fit the Hodgkin-Huxley simulations with GLMs (Figure 4A). We fit a unique GLM for each value of G_Na and G_K in the HH model, and the GLMs were fit using the entire range of stimulus SDs (σ = 1.0, 1.3, 1.6, and 2.0). Applying the STA analysis at the four stimulus SDs, we quantified gain scaling in the GLM fits and compared it to the gain scaling in the HH neurons (Figures 4B,C). Across the range of spiking conductance values, we found that the GLM fits consistently showed gain scaling (Figure 4D). The HH neurons showed the greatest degree of gain scaling when the G_Na/G_K ratio was close to one, with the lowest D_2 score occurring at a ratio of 1.17 (Mease et al., 2013). We observed the same pattern in the GLM simulations, but the GLM fits generally exhibited stronger gain scaling than the HH neurons when G_Na/G_K < 1. We note that in general the optimal G_Na/G_K ratio depends on the leak conductance; however, here we assumed a fixed leak for simplicity. The GLM's characterization of the HH neurons depended on the spike history filter. This is revealed by comparing the stimulus filters (Figure 4A) to the stimulus features extracted by spike-triggered averaging (Figure 4B): while the STA showed multiphasic responses, the GLM stimulus filter was consistent with a simple, monophasic integration. This demonstrates that the STA reflects the combination of stimulus and spike history effects (Agüera y Arcas and Fairhall, 2003; Stevenson, 2018). A spike history filter of sufficient length was necessary to achieve accurate model fits across all stimulus SDs (Figures 5A,B). A possible interpretation of this finding is that the spike history acts as a measure of the "context" that serves to normalize the stimulus response, and that a sufficiently long sampling of the spiking output is needed to perform this normalization appropriately.
We also explored how the stimulus conditions used to fit the GLM determined the model's ability to capture gain scaling. Remarkably, we found that the GLM fit only to the baseline stimulus SD (σ = 1.0) captured the gain scaling pattern seen in the HH neuron (Figure 5B). The gain scaling observed in the GLMs required a sufficiently long spike history filter, on the order of at least 50 ms. With shorter spike history, the GLM did not obtain the same level of gain scaling at the optimal G_Na/G_K ratio. However, these GLM fits failed to generalize across stimulus SDs. The GLM trained only at σ = 1.0 explained less variance in the spiking responses to a stimulus at σ = 2.0 than a model capturing only the mean firing rate, for all values of G_Na and G_K (predictive pseudo-R² < 0; Figure 5C). Therefore, the GLM trained at σ = 1.0 does not accurately characterize the HH responses despite accurately predicting gain scaling in those cells. In contrast, GLMs trained at all four σ values failed to capture the lack of gain scaling at low G_Na/G_K values despite showing improved model fit across all σ (Figure 5D; a detailed example is provided in Supplementary Figure 2A). Because the GLM trained on all σ showed both consistent generalization performance and strong gain scaling behavior, the remaining analyses considered only that training condition.

FIGURE 7 | (B) The difference in predictive performance (measured as pseudo-R²) between the exponential GLMs and the power-law GLMs for p ∈ {2, 3, 4, 5} for a test stimulus of σ = 1.0 (left) and σ = 2.0 (right). Positive values indicate that the GLM with a power-law non-linearity had greater predictive performance than the exponential GLM. The GLMs were fit to all σ. (C) Gain scaling predicted by the power-law GLMs (gray traces) compared to the exponential GLMs (red) and the HH simulations (blue).
We next considered how the GLM parameters related to the gain scaling computation and to the space of G_Na and G_K in the HH models. To visualize the geometry of the model parameters, we performed PCA on the stimulus and spike history filters (Figures 6A,E). The filters produced across the two HH parameters spanned a two-dimensional subspace (variance explained: stimulus 98.8%, spike history 97.3%). The PCA reconstructions for example stimulus filters are given in Supplementary Figure 3. This decomposition of the stimulus filter shows that the first mode is primarily an integrator, while the second acts as a derivative and serves to adjust the timescale of the filter's integration window. The inflection point for the second mode suggests that there may be an optimal time constant of stimulus and spike history integration needed to support the gain scaling property (Figures 6B,F). The first component of both filters correlated with the G_Na/G_K ratio (Figures 6C,G; stimulus PC1 r = −0.97, p < 10⁻⁴; spike history PC1 r = 0.97, p < 10⁻⁴). The second PC correlated with the gain scaling value observed in the corresponding HH model (Figures 6D,H; stimulus PC2 r = −0.89, p < 10⁻⁴; spike history PC2 r = 0.90, p < 10⁻⁴).

Power-Law Firing Rate Nonlinearities
The GLMs we considered used the canonical inverse-link function, the exponential nonlinearity (McCullagh and Nelder, 1989), to transform the filtered stimulus plus spike history into a firing rate. However, firing rate nonlinearities that instead have a power-law dependence on the input are known to produce gain scaling (Miller and Troyer, 2002; Murphy and Miller, 2003). We therefore considered soft-power nonlinearities over a range of exponents for the GLM firing rate (Figure 7A; Equation 17),

f(x) = [log(1 + eˣ)]^p,

for p ∈ {2, 3, 4, 5} (for p = 1, the model performed poorly for all HH simulations and the results are not shown). We found that the power-law nonlinearity produced a better predictive fit than the exponential for HH simulations with low G_Na/G_K ratios (Figure 7B). For those ratios, the exponential GLM in fact predicted greater gain scaling than the HH simulation actually showed (Figure 5A and Supplementary Figure 2A). The power-law nonlinearities showed less gain scaling in the low G_Na/G_K regime, which was more consistent with the HH simulations (Figure 7C). This perhaps counterintuitive result is likely due to the temporal processing of the GLM: the spike history filter shapes the effective stimulus-response function over longer timescales. Thus, the instantaneous spike rate function need not be a power law to produce gain scaling, and an instantaneous power-law function may not result in strong gain scaling in the presence of spike history dependencies.
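One common form of soft-power nonlinearity, and the one we assume here for illustration, is a softplus raised to the p-th power (which behaves as x^p for large inputs while remaining positive and smooth):

```python
import numpy as np

def soft_power(x, p):
    """Soft-power inverse link: f(x) = log(1 + exp(x))**p, computed
    stably with logaddexp to avoid overflow for large x."""
    return np.logaddexp(0.0, x) ** p
```

Substituting `soft_power` for the exponential in Equation (17) gives the power-law GLMs compared in Figure 7.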

GLMs Capture Fractional Differentiation With Long Timescales of Adaptation
In this section, we address adaptive computations occurring over multiple timescales spanning tens of seconds, rather than instantaneous gain. We consider adaptation to changes in stimulus variance in the responses of HH simulations with three AHP currents (Lundstrom et al., 2008). The neurons were injected with noise stimuli with a periodically modulated SD, following either a sine or a square wave. We focused our analyses on the cycle-averaged firing rate to see how the neural responses reflect fractional differentiation of the stimulus SD envelope. We fit GLMs to HH simulations responding to either sine- or square-wave SD modulation. The training data included simulations with noise modulation periods of 1–64 s. We considered GLMs with different lengths of spike history filters. Cycle-averaged responses of HH and GLM simulations appear qualitatively similar (Figure 8), and thus we aimed to characterize how well the GLM fits captured the fractional differentiation properties of the HH neuron. Although the AHP conductances provide a simple linear filtering of the spike train, similar to the GLM's spike history filter, the GLM effectively assumes that the spike history contribution is a current, such that the total conductance of the cell remains constant (Latimer et al., 2019b). Therefore, it is not guaranteed that the GLM can replicate the computational effects of the AHP conductances.
The sinusoidal noise simulations show two properties of fractional differentiation. First, we estimated the response gain (i.e., the strength of the sinusoidal modulation in the cycle-averaged response) as a function of stimulus period (Figure 9A). In an ideal fractional differentiator, the log gain is proportional to the log of the stimulus period. The HH neuron shows a near-linear response (r² = 0.99, p < 10⁻⁴). Although the GLM with short history shows an almost flat relationship, increasing the spike history length produces a slope similar to the HH neuron's. The second property was the phase lead of the cycle-averaged response relative to the stimulus (Figure 9B). The phase lead should be constant under perfect fractional differentiation. The phase lead declines with longer periods, but the HH simulation still shows a strong phase lead at a 64 s period. GLMs with short spike history filters exhibit a phase lead that tends to zero at long SD periods. However, the GLM fit with a long spike history filter closely tracks the HH neuron's phase lead.

FIGURE 9 | (A) The gain of the average responses to sine-wave modulated noise as a function of stimulus period (Figures 8A,B, right). The GLM fit with a 0.17 s spike history filter (gray) is compared to the GLM with the full 16 s spike history (black). The HH simulation is given in blue. The noise stimulus was modulated with σ = 2. (B) The phase lead of the average responses to sine-wave modulated noise as a function of stimulus period for the HH and GLMs. The fraction of variance explained in the HH phase lead curve by the GLM with 16 s spike history was R² = 0.61. (C) The time constant of an exponential function fit to the cycle-averaged response to square-wave noise for each stimulus period (Figures 8C,D, right). The markers denote time constants estimated for steps from low to high variance or from high to low variance. The fraction of variance explained of the log time constants of the HH simulation by the GLM with 16 s spike history was R² = 0.80. (D) The fractional differentiation order (α) of the GLM estimated by the slope of the gain as a function of the log stimulus period in (A). The value is estimated for each spike history length (black) and compared to α estimated from the HH simulation (blue). The red trace shows α estimated from the GLM fit only to unmodulated noise. (E) α estimated by the average phase lead across stimulus periods. (F) α estimated by fitting the square-wave responses with a fractional differentiating filter. (G–I) α estimated at different noise modulation strengths for the 16 s spike history GLM and HH simulation.
The final signature of fractional differentiation was the exponential decay of the cycle-averaged response under square-wave noise stimulation (Figure 9C). We estimated the time constant of the decay of the square-wave cycle average for both steps up and steps down in stimulus SD. The time constant increases approximately linearly with the SD period, and GLMs with long spike history showed time constants that closely approximated those of the HH neuron.
From each signature, we estimated the order of the fractional differentiation (α) in both the HH neurons and the GLM fits. For the sine-wave SD simulations, we estimated the order using the slope of log gain vs. log period and using the mean phase lead across all stimulus periods (Figures 9D,E). A least-squares fit of a fractional differentiating filter of order α was applied to the square-wave noise responses (Figure 9F). We considered α for the GLM fits as a function of the spike history length. The order estimates for the HH neuron, although slightly different for each signature, were approximately α = 0.2. The GLM's fractional differentiation order approaches that of the HH neuron from below as the spike history length increases. Surprisingly, a GLM trained only on a flat noise stimulus (no sine or square modulation; stimulus SD σ = 1.0) showed similar estimates of α (Figures 9D–F, red traces). Thus, the response properties giving rise to fractional differentiation of the noise envelope could be detected by the GLM even without driving the neuron with long timescale noise modulation.
We then considered how the estimated fractional differentiation order depended on the strength of the SD modulation. We found a slightly higher α for lower stimulus SDs (note that σ = 2.0 was used to fit the GLMs) for the gain and timescale estimates (Figures 9G–I). However, the phase lead estimate was fairly stable across SDs.

FIGURE 10 | (A) Comparing simulations of the HH model to sine wave modulated (left) and square wave modulated (right) noise. The black traces show the stimulus and the blue traces show the standard deviation envelope. The red raster denotes the HH spike response. Several repeated simulations of the GLM are shown for a GLM trained with only a 10 ms spike history (gray raster) and a GLM trained with the full 16 s spike history (black raster). (B) Assessing the model fitness for GLMs fit to sine wave modulated noise (left), square wave noise (middle), and unmodulated noise (right). The pseudo-R² measured on a withheld test set simulated from the HH model as a function of spike history length. The stimulus was sine (solid lines) or square (dashed lines) modulated noise with a 4 s period and a modulation strength of σ = 2. (C) The integral over time (i.e., cumulative sum) of each spike history filter. (D) The stimulus filters for all GLM fits.
Next, we quantified how well the GLM predicted the HH responses to new stimuli (Figure 10A). Longer spike history filters improved the GLM's ability to predict spike trains, and the improvement continued as the spike history extended to several seconds (Figure 10B). However, training only on unmodulated noise did not result in a good GLM fit despite predicting α (Figure 10B).
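One common deviance-based definition of the pseudo-R² used to score Poisson spike-count models compares the model log-likelihood against homogeneous (null) and saturated references; the paper's exact formula may differ, so this is a sketch of the standard construction:

```python
import numpy as np

def poisson_pseudo_r2(y, rate_hat, dt):
    """Deviance-based pseudo-R^2 for binned Poisson spike counts (a sketch).

    y: observed spike counts per bin; rate_hat: predicted rate (spikes/s);
    dt: bin width (s). The log(y!) terms are constant across models and
    cancel in the ratio, so they are dropped."""
    eps = 1e-12
    def ll(mu):
        return np.sum(y * np.log(mu + eps) - mu)
    mu_model = rate_hat * dt
    mu_null = np.full_like(y, y.mean(), dtype=float)   # homogeneous-rate null
    mu_sat = np.maximum(y.astype(float), eps)          # saturated model: mu = y
    return (ll(mu_model) - ll(mu_null)) / (ll(mu_sat) - ll(mu_null))

y = np.array([0, 2, 1, 3, 0, 1, 2, 4])
print(poisson_pseudo_r2(y, y / 0.1, 0.1))                       # perfect -> ~1
print(poisson_pseudo_r2(y, np.full(8, y.mean() / 0.1), 0.1))    # null -> 0
```

A value of 1 means the model matches the data as well as the saturated model; 0 means it does no better than a constant firing rate.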
We examined the parameter estimates in the GLM as a function of spike history length. We plotted the integral of the spike history filter to show how the filter integrates spikes over time. The integrals show long timescales for the GLM fit to either sine- or square-wave noise (Figure 10C). The GLM fit to either type of noise predicted over 60% of the variance in the HH responses to both sine- and square-wave noise. The flat noise GLM also showed long timescales, but its integral changed substantially with the length of the spike history. This indicates that the combination of spike-history-dependent timescales is not well-constrained in the flat noise condition despite predicting α, perhaps due to biases present in the data without modulations (Stevenson, 2018). The stimulus filters had short timescales and showed little dependence on spike history length (Figure 10D). Thus, the GLM captured fractional differentiation in the HH neuron by linearizing the long timescale AHP currents.

DISCUSSION
Individual neurons can adapt their responses to changes in input statistics. Here, we studied two adaptive computations to changes in the stimulus variance that are captured by biophysically realistic neurons. First, we examined gain scaling of the inputs so that the spike-triggered stimulus distribution was independent of the stimulus variance. The ability of the neuron to gain scale depended on the ratio of the spike-generating potassium and sodium conductances. Second, we considered spiking responses that approximate a fractional derivative of the stimulus standard deviation, which can be produced by a set of AHP currents with different timescales. Although HH neurons can produce these adaptive effects, it is difficult to fit the HH model to data. Our results demonstrate that the GLM provides a tractable statistical framework for modeling adaptation to stimulus variance in single neurons. The GLM provides an alternative representation of the spiking responses as two linear filters (stimulus and spike history filters) with a fixed spiking non-linearity instead of a multidimensional (and potentially stochastic) dynamical system (Meng et al., 2011, 2014). Importantly, a single GLM could accurately approximate the responses of HH neurons across multiple levels of input variance or across multiple timescales of variance modulation. The GLM accomplished this by feeding a linearized effect of recent spiking into a non-linear, stochastic spiking mechanism; the recent spike history thereby acts as a measure of the current stimulus context. To reproduce gain scaling, around 150ms of spike history is required, in line with the rapid expression of the gain scaling property with changes in stimulus statistics (Fairhall et al., 2001a;Mease et al., 2013). In the fractional derivative case, the GLM summarized the multiple AHP currents of the HH models as a single linear autoregressive function with multiple timescale effects.
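The GLM structure described above can be written compactly: the conditional intensity is an exponential of the filtered stimulus plus the filtered spike history, with Poisson spiking. A minimal simulation sketch (the filter shapes and rates below are illustrative, not the paper's fitted parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.001                                    # 1 ms bins
T = 5000

# Illustrative filters: a short stimulus filter and a long, negative
# spike-history filter mixing two decay timescales (AHP-like feedback).
t_k = np.arange(0, 0.02, dt)
k = np.exp(-t_k / 0.005)                      # ~20 ms stimulus filter
t_h = np.arange(dt, 2.0, dt)
h = -(0.5 * np.exp(-t_h / 0.05) + 0.2 * np.exp(-t_h / 1.0))

x = rng.normal(0.0, 1.0, T)                   # white-noise injected current
b = np.log(20.0 * dt)                         # baseline log-rate (~20 Hz)
kx = np.convolve(x, k, mode="full")[:T]       # causally filtered stimulus

spikes = np.zeros(T)
for t in range(T):
    nh = min(t, len(h))
    # h[0] acts on the spike one bin back, h[1] two bins back, etc.
    hist = spikes[t - nh:t][::-1] @ h[:nh] if nh > 0 else 0.0
    lam = np.exp(b + kx[t] + hist)            # conditional intensity per bin
    spikes[t] = rng.poisson(lam)
```

Because the spike-history filter is negative over long lags, each spike transiently suppresses the rate across multiple timescales, which is the mechanism by which a single autoregressive term can stand in for a bank of AHP conductances.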
This approach shows that at least part of the spectrum of adaptive behaviors in response to stimuli with time-varying statistics can be captured through a single linear spike-history feedback term. Effective alternative approaches have captured time-varying context with a multiplicity of filters acting on different timescales of the stimulus alone (Kass and Ventura, 2001;McFarland et al., 2013;Qian et al., 2018;Latimer et al., 2019a).
The simulations explored here assumed the input to a cell was an injected current generated from a Gaussian distribution. However, neurons receive input as excitatory and inhibitory conductances, which can be integrated across complex dendritic processes (Ozuysal et al., 2018;Latimer et al., 2019b). Additionally, realistic input statistics may not follow a Gaussian distribution (Heitman et al., 2016;Maheswaranathan et al., 2018). Further work toward understanding the adaptive computations performed by single neurons should consider both the inputs the neuron receives within a broader network and non-linear stimulus processing (McFarland et al., 2013;Benjamin et al., 2018). Neural coding and computations that occur across a wide range of input levels depend heavily on adaptation to the stimulus variance (Wark et al., 2007). The GLM, despite being a simple approximation, can provide a good representation of adaptive computations in biophysically realistic neurons.

AUTHOR CONTRIBUTIONS
KL and AF designed the study and wrote the manuscript. KL performed the simulations and statistical analysis. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the Human Frontiers in Science Program to AF. KL was supported by a Chicago Fellowship.