Edited by: Yilei Zhang, Nanyang Technological University, Singapore
Reviewed by: Salvador Dura-Bernal, SUNY Downstate Medical Center, United States; Harel Z. Shouval, University of Texas Health Science Center at Houston, United States
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The perceptron learning algorithm and its multiple-layer extension, the backpropagation algorithm, are the foundations of the present-day machine learning revolution. However, these algorithms utilize a highly simplified mathematical abstraction of a neuron; it is not clear to what extent real biophysical neurons with morphologically-extended non-linear dendritic trees and conductance-based synapses can realize perceptron-like learning. Here we implemented the perceptron learning algorithm in a realistic biophysical model of a layer 5 cortical pyramidal cell with a full complement of non-linear dendritic channels. We tested this biophysical perceptron (BP) on a classification task, in which it had to correctly assign each of 100, 1,000, or 2,000 input patterns to one of two classes, and on a generalization task, in which it had to discriminate between noisy variants of two underlying patterns. We show that the BP performs these tasks with an accuracy comparable to that of the original perceptron, though the classification capacity of the apical tuft is somewhat limited. We conclude that cortical pyramidal neurons can act as powerful classification devices.
There has been a long-standing debate within the neuroscience community about the existence of “grandmother neurons”—individual cells that code for high-level concepts such as a person's grandmother. Recent experimental evidence, however, has indicated that some individual neurons are indeed selective for specific high-level inputs. In particular, Quiroga et al. (2005) identified neurons in the human medial temporal lobe that respond selectively to images of particular individuals, largely invariant to viewing conditions.
From a physiological standpoint, achieving a high degree of accuracy on a recognition task is a daunting challenge for a single neuron. To put this in concrete terms, a pyramidal neuron may receive around 30,000 excitatory synapses (Megías et al., 2001).
There are several ways that a neuron can selectively respond to different input patterns. The most well-known method is to adjust synaptic “weights” such that only input patterns which activate a sufficient number of highly-weighted synapses will cause the cell to fire. It is this principle which serves as the basis of the perceptron learning rule (Rosenblatt, 1958).
The M&P and biophysical perceptron.
While the remarkable efficacy of networks of M&P neurons has been demonstrated for various learning tasks, few attempts have been made to replicate the perceptron learning algorithm in a detailed biophysical neuron model with a full morphology, active dendrites, and conductance-based synapses. It thus remains to be determined whether real cells in the brain, with all their biological complexity, can integrate and classify their inputs in a perceptron-like manner.
In this study, we used the perceptron learning algorithm to teach a detailed, realistic biophysical model of a layer 5 pyramidal cell with a wide variety of active dendritic channels (Hay et al., 2011) to perform classification and generalization tasks on patterns of synaptic input.
To implement the perceptron learning algorithm in a modeled layer 5 thick-tufted pyramidal cell (L5PC), we distributed excitatory conductance-based AMPA and NMDA synapses on the detailed model developed by Hay et al. (2011).
We then used the perceptron learning algorithm (see section Materials and Methods) to modify the synaptic weights such that the cell could correctly classify all the patterns.
Learning the classification task with the biophysical perceptron.
We compared the classification accuracy for each condition in the biophysical model to that of an equivalent M&P perceptron with excitatory weights (see section Materials and Methods). When all synapses are placed on the soma or the proximal basal tree of the biophysical perceptron, the classification accuracy of the BP is close to that of the M&P perceptron.
As expected from the theoretical literature on sign-constrained perceptrons (Chapeton et al., 2012), constraining the weights to be purely excitatory reduces classification capacity relative to an unconstrained perceptron.
In all synaptic placement conditions, the M&P perceptron and the BP performed with perfect accuracy on the “easy” task with P = 100 patterns.
However, when the synapses are all placed on the apical tuft of the biophysical cell, the classification accuracy of the biophysical perceptron decreases dramatically, even in the presence of supra-linear boosting mechanisms such as NMDA receptors and active Ca2+ channels in the dendritic membrane. For larger numbers of patterns (1,000 and 2,000), this reduction in accuracy becomes more pronounced.
We argue that the discrepancy in the BP's classification accuracy between the conditions in which synapses are placed on the apical tuft, as opposed to the soma or basal dendrites, arises from the passive filtering properties of the neuronal cable and the saturation effect of conductance-based synapses. Specifically, the attenuation of voltage along the cable from the apical tuft dendrites to the spike initiation zone means that the maximal possible impact of these distal synapses on the somatic voltage is severely limited.
We claim that “democratization” via disproportionately increasing distal synaptic conductances does not solve this problem for synapses located on the apical tuft, because effective synaptic weights are bounded by the synaptic reversal potential in the distal dendrites even if one were to increase synaptic conductances to arbitrarily high values. As such, the maximal effective synaptic weight (MESW)—defined as the peak somatic EPSP voltage obtained when a given dendritic location approaches the synaptic reversal potential—places a hard ceiling on the influence that any synapse at that location can exert on the soma.
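Stated schematically (our notation, restating the definition above; $E_{\mathrm{syn}} = 0$ mV is the excitatory synaptic reversal potential and $x$ a dendritic location):

$$\mathrm{MESW}(x) \;=\; \max_{0 \le t \le 100\,\mathrm{ms}} \Big[\, V_{\mathrm{soma}}(t) \,\big|\, V_{\mathrm{dend}}(x) \approx E_{\mathrm{syn}} \,\Big] \;-\; V_{\mathrm{rest}}$$

No increase in synaptic conductance can push the somatic EPSP above this ceiling, since the local driving force vanishes as $V_{\mathrm{dend}}(x) \to E_{\mathrm{syn}}$.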
Effect of location of dendritic synapses on the maximum effective synaptic weight (MESW).
Importantly, the marginal effect of each synapse in the presence of background dendritic activity (as in our case, where we activated 200 synapses simultaneously) differs from the MESW (measured when the synapse acts in isolation). For example, a single synapse brought to its reversal potential can interact supralinearly with other synapses by activating NMDA conductances, strengthening the effect of the other synapses (Polsky et al., 2004). We therefore also computed a background-dependent MESW (bgMESW), measured in the presence of near-threshold background synaptic activity (see section Materials and Methods).
From the standpoint of learning theory, the “cap” on the effective weights of distal apical synapses restricts the parameter space of the biophysical perceptron, reducing its capacity. When a perceptron learns to classify between two sets of patterns, it creates a linear separation boundary—i.e., a hyperplane—which separates the patterns in an N-dimensional space, where N is the number of synaptic inputs in each pattern. The separation boundary learned by the perceptron is the hyperplane orthogonal to the vector of the perceptron's weights. When the weights of the perceptron are unconstrained, the perceptron can implement any possible hyperplane in the N-dimensional space. However, when the weights are constrained—for example by the MESWs of the apical tuft of L5PCs—the perceptron can no longer learn every conceivable linear separation boundary, reducing its ability to discriminate between large numbers of patterns [note that, because we use only excitatory synapses, the weight space in all synaptic placement conditions is already substantially constrained to non-negative values even before imposing MESWs; see Chapeton et al. (2012)].
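In M&P notation (a schematic restatement, with $\Theta$ the Heaviside step function, $\theta$ the firing threshold, and $m_i$ the MESW at synapse $i$'s dendritic location), the constrained classifier is:

$$\hat{y} \;=\; \Theta\!\left(\sum_{i=1}^{N} w_i x_i - \theta\right), \qquad 0 \le w_i \le m_i$$

The separating hyperplanes realizable under this box constraint are a strict subset of those realizable with merely non-negative weights, which is what reduces capacity.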
The fact that switching the apical synapses from conductance-based to current-based substantially improves classification accuracy supports the notion that voltage saturation due to the synaptic reversal potential is responsible for the reduced performance of the apical tuft synapses.
To explore whether the apical tuft is always at a disadvantage when it comes to pattern classification, we also tested the biophysical perceptron on a generalization task. Instead of classifying a large set of fixed patterns, in the generalization task the neuron was presented with “noisy” patterns drawn from one of two underlying fixed patterns. In this task, noise was added to the underlying pattern by performing “bit flips,” i.e., flipping an active synapse to an inactive synapse or vice versa.
Generalization task with the biophysical perceptron.
In this task, we observe that in all conditions the BP performs similarly to the M&P perceptron. We do not observe any substantial diminution in classification performance between the apical tuft and the soma, as we do in the classification task.
Learning the generalization task with the biophysical perceptron.
The discrepancy between the apical tuft and soma may be smaller in the generalization task than in the classification task because the difficulty in the classification task is fundamentally about finding the correct hyperplane that will separate between the two classes of patterns. As we increase the number of patterns in each of the classes, we require more flexibility in the weight space of the neuron to ensure that all the positive and negative patterns end up on opposite sides of the separating hyperplane. This flexibility is impeded by the bgMESWs of the apical tuft. By contrast, the generalization problem only contains two canonical “patterns.” The difficulty in learning the generalization task with a large amount of noise (in terms of bit flips) does not stem from the challenge of precisely defining a separation boundary. Rather, solving the generalization task is hard because, even if we had an optimal separation boundary, the noise in the input entails that some of the noisy patterns would still necessarily be misclassified.
We utilized a detailed biophysical model of a cortical layer 5b thick-tufted rat pyramidal cell (Hay et al., 2011), implemented in NEURON with a Python wrapper (Carnevale and Hines, 2006).
Excitatory synapses included both AMPA- and NMDA-mediated conductances, following the synapse model of Muller and Reimann.
For the classification task, each of the P patterns was generated by randomly choosing 200 out of the 1,000 synapses to be activated. The patterns were then randomly assigned to either the positive or negative class. Patterns were presented to the cell by simultaneously stimulating the 200 active synapses with a single presynaptic spike at the beginning of the simulation. Simulations of the neuron were run with a Δt of 0.1 ms for a total of 100 ms. Patterns were considered to have been classified as “positive” if they produced at least one spike within the 100 ms time window and as “negative” if no spikes occurred.
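The following minimal sketch illustrates this pattern-generation procedure in NumPy (function and variable names are ours, not from the original code):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def make_patterns(n_patterns, n_synapses=1000, n_active=200):
    """Each pattern activates a random subset of 200 of the 1,000 synapses;
    patterns are randomly assigned to the positive (1) or negative (0) class."""
    patterns = np.zeros((n_patterns, n_synapses), dtype=int)
    for p in patterns:
        p[rng.choice(n_synapses, size=n_active, replace=False)] = 1
    labels = rng.integers(0, 2, size=n_patterns)
    return patterns, labels
```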
The choice of 200 active synapses was intended to simulate a regime of high cortical activity. The maximal firing rate for excitatory cortical neurons is estimated to be around 20 Hz (Heimel et al.); assuming an integration window on the order of 10 ms, each of the 1,000 modeled synapses would then have a ~0.2 probability of being active within the window, yielding ~200 co-active synapses.
We utilized an “online” version of the perceptron learning algorithm, applying the plasticity rule every time a pattern was presented to the neuron. Also, because we limited our analysis to excitatory synapses, we used the modified algorithm proposed by Amit et al. (1989) for perceptrons with sign-constrained weights.
The algorithm works as follows: a presynaptic input pattern $\mathbf{x} = (x_1, \ldots, x_N)$, with $x_i \in \{0, 1\}$, is presented to the neuron, and each synaptic weight $w_i$ is updated according to

$$w_i \;\leftarrow\; \max\big(0,\; w_i + \eta\,(d - o)\,x_i\big)$$

where $d \in \{0, 1\}$ is the desired (target) output, $o \in \{0, 1\}$ is the actual output of the neuron (at least one spike vs. no spike within the 100 ms window), and $\eta$ is the learning rate.
In other words, if the target output is the same as the actual output of the neuron, we do nothing. If the target is “should spike” and the neuron does not spike, we increase the weight of all synaptic inputs that were active in the pattern. If the target is “shouldn't spike” and the neuron does spike, we decrease the weights of all synaptic inputs that were active in the pattern, unless that would take a synaptic weight below 0, in which case we set that synapse's weight to 0.
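A minimal sketch of this excitatory perceptron rule in abstract M&P form is given below (in the BP, the same rule is applied to peak synaptic conductances, and “spike/no spike” of the simulated neuron replaces the thresholded sum; the `eta` and `theta` values are illustrative):

```python
import numpy as np

def train_step(w, x, d, eta=0.01, theta=1.0):
    """One online update. x is a 0/1 input pattern, d the desired output."""
    o = 1 if w @ x >= theta else 0                   # M&P output: thresholded weighted sum
    if o != d:                                       # update only on errors
        w = np.maximum(0.0, w + eta * (d - o) * x)   # clip at 0: weights stay excitatory
    return w
```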
The accuracy of the neuron's output was calculated after each epoch, which consisted of a full pass of presenting each pattern (in random order) to the neuron. To ensure that accuracy improved steadily across epochs and reached a reasonable asymptote for all conditions, we hand-tuned the learning rate η, setting it to 0.002 for the condition with AMPA/NMDA conductance synapses and an active tree, and to 0.19 for the condition with current synapses. We also used the “momentum” technique (Rumelhart et al., 1986) to speed up convergence.
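The momentum modification can be sketched as follows (a schematic version under the same assumptions as above; the name and value of the momentum coefficient `alpha` are our illustrative choices):

```python
import numpy as np

def momentum_step(w, x, d, velocity, eta=0.002, alpha=0.9, theta=1.0):
    """Momentum-modified update: a decaying trace of past updates is added
    to the current error-driven update (Rumelhart et al., 1986)."""
    o = 1 if w @ x >= theta else 0
    velocity = eta * (d - o) * x + alpha * velocity
    w = np.maximum(0.0, w + velocity)  # weights remain non-negative
    return w, velocity
```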
To compare the BP to an equivalent M&P perceptron, we trained an M&P perceptron with 1,000 non-negative input weights on the same sets of patterns, using the same excitatory learning rule.
To calculate the MESWs for the L5PC model, we added a very strong synapse (500 nS) to each dendritic segment in the neuron model, bringing that segment to within 2.5 mV of the synaptic reversal potential of 0 mV. The MESW for a dendritic segment is defined as the difference between the somatic resting potential and the peak depolarization obtained at the soma within 100 ms after synaptic activation.
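A minimal sketch of this measurement in NEURON's Python interface is shown below, assuming the Hay et al. (2011) model is already loaded and that `seg` and `soma_seg` are segment references into that model (names are ours). A generic Exp2Syn conductance synapse stands in for the AMPA/NMDA synapse model used in the actual simulations:

```python
from neuron import h
h.load_file("stdrun.hoc")

def measure_mesw(seg, soma_seg, gmax_uS=0.5):
    """Peak somatic depolarization produced by one very strong (500 nS)
    conductance synapse placed at `seg`."""
    syn = h.Exp2Syn(seg)                       # conductance synapse at the location
    syn.tau1, syn.tau2, syn.e = 0.3, 3.0, 0.0  # reversal potential of 0 mV
    stim = h.NetStim()                         # a single presynaptic spike
    stim.number, stim.start = 1, 5.0
    nc = h.NetCon(stim, syn)
    nc.weight[0] = gmax_uS                     # NetCon weight is in uS; 0.5 uS = 500 nS
    v = h.Vector()
    v.record(soma_seg._ref_v)                  # record somatic voltage
    h.finitialize(-80.0)                       # approximate resting potential
    h.continuerun(100.0)                       # 100 ms window, as in the text
    return v.max() - v[0]                      # peak depolarization relative to rest
```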
To create an MESW-constrained M&P model for the apical tuft, we calculated the distribution of MESWs per unit length of the dendritic membrane in the apical tuft and used values drawn from this distribution as upper bounds on the corresponding M&P weights. The median and quartile values of the MESWs for all synaptic placement conditions are summarized in a box-and-whisker plot. The constrained weights were then updated according to

$$w_i \;\leftarrow\; \min\!\Big(m_i,\; \max\big(0,\; w_i + \eta\,(d - o)\,x_i\big)\Big)$$

where $\eta$, $d$, $o$, and $x_i$ are as defined above, and $m_i$ is the MESW-derived upper bound for synapse $i$.
To calculate the bgMESWs for the L5PC model, we distributed 199 “background” synapses on the neuron according to a uniform distribution per unit length of the dendritic membrane. All background synapses had the same conductance. To find the synaptic conductance required to bring the neuron near its spiking threshold, we gradually increased the conductances of all synapses in 0.05 nS steps until the neuron produced at least one spike. The largest conductance that did not cause the neuron to spike was used as the conductance for the near-threshold background activity. The conductances for each distribution condition were: soma: 0.3 nS; basal: 0.3 nS; apical tuft: 0.52 nS; full: 0.48 nS (values are averaged over 10 trials of this procedure to account for the randomness in the placement of the background synapses). In the presence of this background activity, we added a strong synapse to each dendritic location, as detailed in the section on the MESW calculation. To find the marginal contribution of a single strong input at each location, we subtracted the somatic EPSP obtained from the background activity alone from the somatic EPSP obtained when both the background activity and the strong synapse at that location were active, creating a difference curve; the bgMESW at that location is defined as the peak of this difference curve.
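Given the two recorded somatic voltage traces, the bgMESW computation reduces to taking the peak of their difference (a sketch; array names are ours):

```python
import numpy as np

def bg_mesw(v_background, v_background_plus_syn):
    """Marginal somatic contribution of one strong synapse in the presence of
    near-threshold background activity: the peak of the difference curve."""
    diff = np.asarray(v_background_plus_syn) - np.asarray(v_background)
    return diff.max()
```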
In the second task (generalization), we created two underlying patterns of 1,000 synapses each, in which 200 synapses were active, as in the classification task. These patterns were then corrupted by flipping a given number of synapses (0, 100, or 200, depending on the condition) and presented to the neuron. To maintain the sparsity of the patterns, half of the flipped synapses were switched from active to inactive and the other half from inactive to active. For example, in the condition with 100 flipped bits, 50 out of the 200 previously active synaptic inputs were flipped to inactive, and 50 out of the 800 previously inactive synaptic inputs were switched to active.
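A sketch of this sparsity-preserving bit-flip procedure (names are ours; `base` is a 0/1 vector of length 1,000):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_pattern(base, n_flips):
    """Flip n_flips/2 active synapses off and n_flips/2 inactive synapses on,
    so the number of active synapses (200) is preserved."""
    pattern = base.copy()
    active = np.flatnonzero(base == 1)
    inactive = np.flatnonzero(base == 0)
    off = rng.choice(active, size=n_flips // 2, replace=False)
    on = rng.choice(inactive, size=n_flips // 2, replace=False)
    pattern[off] = 0
    pattern[on] = 1
    return pattern
```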
In every epoch of the learning task, we presented the neuron with 50 noisy patterns generated from the first underlying pattern and 50 noisy patterns generated from the second underlying pattern, for a total of 100 patterns per epoch (the presentation order of patterns from the two underlying patterns was also randomized). We set the learning rate η to 0.25 for the condition with AMPA/NMDA conductance synapses and an active tree, and a rate of 10 for the condition with current synapses. Learning rates were hand-tuned as described above. As in the classification task, we used the online perceptron learning rule with the momentum modifier. In this task we ran the algorithm for only 5 epochs, as this was enough for the learning to reach a plateau. Results shown are from the final epoch.
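Putting the pieces together, one epoch of the generalization task could be sketched as follows (using the `noisy_pattern` and `train_step` helpers sketched above; this follows the structure described in the text rather than the exact original code, and which underlying pattern receives the positive label is arbitrary):

```python
def run_epoch(w, base_a, base_b, n_flips, n_per_class=50):
    """One epoch: 50 noisy variants of each underlying pattern, in random order."""
    batch = [(noisy_pattern(base_a, n_flips), 1) for _ in range(n_per_class)]
    batch += [(noisy_pattern(base_b, n_flips), 0) for _ in range(n_per_class)]
    rng.shuffle(batch)               # randomize presentation order
    for x, d in batch:
        w = train_step(w, x, d)      # online perceptron update on each pattern
    return w
```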
Simulations were all performed using NEURON v7.6 (Carnevale and Hines, 2006).
In the simulations described above, we have demonstrated that the perceptron learning algorithm can indeed be implemented in a detailed biophysical model of an L5 pyramidal cell with conductance-based synapses and active dendrites. This is despite the fact that the perceptron learning algorithm traditionally assumes a cell that integrates its inputs linearly, which is not the case for detailed biophysical neurons with their variety of non-linear active and passive properties and conductance-based synapses. That being said, the ability of a biophysical perceptron to distinguish between different patterns of excitatory synaptic input does depend on the location of the relevant synapses. Specifically, if all the synapses are located proximally to the soma, such as on the proximal basal tree, the cell has a classification capacity similar to that of the M&P perceptron. However, for activation patterns consisting of more distal synaptic inputs, such as those on the apical tuft, the classification capacity of the BP is reduced. We showed that this is caused by the reduced effectiveness of distal synapses, a consequence of cable filtering and synaptic saturation in the presence of other synaptic inputs, which limits the parameter space of the learning algorithm and thus hampers classification capacity. We also demonstrated that the diminished classification capacity in the apical tuft is negligible in a generalization task. This indicates that, while the maximum effective synaptic weights of the apical tuft may somewhat limit its classification capacity, they do not hamper the apical tuft's robustness to noise.
The above discussion assumes that the pyramidal cell separately classifies inputs that synapse onto different regions of its dendrites (such as the apical tuft and the basal tree) and that it does not simultaneously integrate all the synaptic input impinging on the cell. This decision was motivated by a growing body of evidence that different parts of the dendritic tree may play separate roles in shaping the neuron's output. From anatomical studies, it is known that axons from different brain regions preferentially synapse onto particular regions of layer 5 pyramidal cells; for example, basal dendrites tend to receive local inputs, whereas the apical tuft receives long-range cortical inputs (Crick and Asanuma, 1986).
Our study made several simplifications to the learning and plasticity processes found in biology. Critically, our plasticity algorithm utilized only excitatory synapses and did not consider the effect of inhibition on learning. This is not because we believe that inhibition does not play a role in learning; on the contrary, inhibitory synapses are essential both for the learning process and in defining the input-output function of the cell (Wulff et al., 2009).
The focus on excitatory synapses also enables our work to be directly compared to studies of excitatory perceptron-like learning done on Purkinje cells—which have been classically conceived of as perceptrons (Marr, 1969).
Our focus on perceptron-like learning constitutes an additional simplification, as perceptron learning ignores how dendritic non-linearities, such as local NMDA spikes (Schiller et al., 2000), could be harnessed for learning.
There are several other models of learning and plasticity that make use of neuronal biophysics and constitute promising opportunities for improving the learning ability of pyramidal cell models in a biologically plausible way. The calcium-based plasticity rule of Graupner and Brunel (2012) is one such example.
Another crucial element that remains to be studied in detailed biophysical models is the role of the timing of both the input and output of pyramidal cells in learning and computation. Regarding input timing, some theoretical work has been done on the M&P perceptron, which has been extended in a variety of ways to take into account several components of real neurons. One such extension is the tempotron, which uses a leaky integrate-and-fire mechanism to learn spike-timing-based classifications (Gütig and Sompolinsky, 2006).
The present study shows that, by implementing the perceptron learning rule, layer 5 cortical pyramidal cells are powerful learning and generalization units, comparable—at the very least—to the abstract M&P perceptron. Other plasticity rules, which take into account synaptic clustering, input and output timing, and interaction between the apical and basal regions of pyramidal cells will be explored in further studies in detailed biophysical models in order to determine their biological plausibility and classification capacity. Until then, our study should be viewed as a baseline for comparison of any future work implementing learning algorithms in detailed biophysical models of neurons.
The code used for the biophysical model (including the hoc files for the pyramidal cell model) and the M&P model, as well as the code used to generate the input patterns, can be found at
TM and IS designed the research. TM implemented the simulation, analyzed the results, and created the figures. IS supervised the research and contributed to the development of the theoretical and biophysical aspects of the study.
The authors declare that this study received funding from Huawei Technologies Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We would like to thank Oren Amsalem for his assistance in many aspects of this work, particularly his help with the use of the NEURON software. We also would like to thank David Beniaguev, Guy Eyal, and Michael Doron for their insightful discussions about machine learning and neuronal biophysics. Itamar Landau provided several useful comments on an early version of this work, inspiring important revisions. Additionally, we appreciate Nizar Abed's support in maintaining the computer systems used to perform our simulations and data analysis. This manuscript has been released as a preprint at bioRxiv (Moldwin and Segev, 2019).