Lessons From Deep Neural Networks for Studying the Coding Principles of Biological Neural Networks

Bae, Hyojin; Kim, Sang Jeong; Kim, Chang-Eop

doi:10.3389/fnsys.2020.615129

ORIGINAL RESEARCH article

Front. Syst. Neurosci., 15 January 2021

Volume 14 - 2020 | https://doi.org/10.3389/fnsys.2020.615129

1. Department of Physiology, Gachon University College of Korean Medicine, Seongnam, South Korea
2. Laboratory of Neurophysiology, Department of Physiology, Seoul National University College of Medicine, Seoul, South Korea

Parts of this article's content have been modified or rectified in:

Erratum: Lessons From Deep Neural Networks for Studying the Coding Principles of Biological Neural Networks

Abstract

One of the central goals in systems neuroscience is to understand how information is encoded in the brain, and the standard approach is to identify the relation between a stimulus and a neural response. However, the features of a stimulus are typically defined by the researcher's hypothesis, which may bias the research conclusions. To demonstrate potential biases, we simulate four likely scenarios using deep neural networks trained on the image classification dataset CIFAR-10 and show that a researcher may select suboptimal or irrelevant features or overestimate the network's feature representation or noise correlation. Additionally, we present studies investigating neural coding principles in biological neural networks to which our points can be applied. This study aims not only to highlight the importance of careful assumptions and interpretations regarding the neural response to stimulus features but also to suggest that comparative studies of deep and biological neural networks from the perspective of machine learning can be an effective strategy for understanding the coding principles of the brain.

Introduction

A standard approach to studying neural coding principles in biological neural networks (BNNs) is to characterize the statistical properties of neural responses and elucidate their association with sensory or other information (Dayan and Abbott, 2001; Panzeri et al., 2015). For example, one can use statistical tests to compare neural responses across a set of features, or decoding models that predict the feature labels from neural activity, to reveal the information present in a brain region.

The use of machine learning (ML) in neuroscience has grown rapidly during the last decade (Glaser et al., 2019). The role of ML in neuroscience ranges from a tool for neural data analysis (Carlson et al., 2013; Lebedev et al., 2014; Mathis et al., 2018; Pandarinath et al., 2018) to a model for the brain (Cadieu et al., 2014; Yamins et al., 2014; Kell et al., 2018; Keshishian et al., 2020; Yang and Wang, 2020). In particular, it has become a popular idea that deep neural networks (DNNs) can serve as good models of biological networks, given their near human-level performance across challenging domains (Marblestone et al., 2016; Cichy and Kaiser, 2019). Although DNNs have been criticized for lacking biological plausibility and transparency, remarkable progress has been made in methods for analyzing and understanding their representations (Samek et al., 2016; Fong and Vedaldi, 2018; Zhou et al., 2018; Cohen et al., 2019; Zhang et al., 2019), and recent studies propose that DNN models can provide insights into the brain's computing mechanisms based on their similar response properties (Kriegeskorte, 2015; Güçlü and van Gerven, 2017; Kell and McDermott, 2019). Because both fields ask how the representations of neural networks should be understood, there are opportunities for synergy between DNN research, which enables controllable and tractable simulation, and BNN research, which has far more accumulated experience with this question (Barrett et al., 2019; Richards et al., 2019).

In this study, we demonstrate the dangers latent in a widely used research framework for identifying informational content from a neural representation. Notably, these dangers are related to problems that have been raised in DNN research, a field with rich experience in dealing with the so-called black box. In particular, we focus on misleading conclusions that may arise from a researcher-defined feature space. Employing the DNN as an in silico model of the BNN, we simulate four likely scenarios and present BNN studies to which our points apply: (A) Because the full feature space is inaccessible, a researcher can misjudge neural feature selectivity. (B) The researcher-defined feature might be a confounding variable coupled with both the ground-truth feature and the neural response. (C) Overlooking the assumption about the feature space built into a decoding model may lead to an overestimation of the network's feature representation. (D) Incorrect assumptions about feature complexity, or disregard of internal-state coding, may lead to an overestimation of the noise correlation. Finally, we discuss the root cause of the constraints on identifying associations between predefined features and neural responses and suggest the feasibility of a comparative approach between DNNs and BNNs.

Results

The Deep Neural Network Trained on CIFAR-10 as a Model of BNN

To simulate potential errors in a neural coding study, we mimic a BNN with a DNN model trained on an image dataset. A six-layer feedforward fully connected neural network was constructed and trained on the CIFAR-10 dataset, which comprises 60,000 images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck) (Figures 1A,B). After 500 epochs of training using stochastic gradient descent with a batch size of 512, the model's test set accuracy saturated at ~53% (chance level = 10%) (Figure 1C). Convolutional neural networks (CNNs) are known to perform better on image data. Nonetheless, because we intended the DNN to serve as a general model of a BNN rather than specifically as a model of visual processing in the brain, we employed a fully connected network, the most fundamental architecture of artificial neural networks.
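The architecture described above can be sketched as a NumPy forward pass. This is a minimal sketch under stated assumptions and is not the authors' code. We read "six-layer" as six weight layers (five hidden layers plus the output layer). The hidden widths in `sizes` are hypothetical placeholders, and ReLU hidden units and He-style initialization are assumed. The random batch only stands in for 512 flattened 32 × 32 × 3 CIFAR-10 images. Training by stochastic gradient descent is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical widths: 32*32*3 = 3072 input pixels -> five hidden layers -> 10 classes.
# The article does not report hidden-layer sizes; these values are placeholders.
sizes = [3072, 512, 256, 256, 128, 64, 10]
weights = [rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]


def forward(x):
    """Return the hidden-layer activations and the softmax class probabilities for a batch."""
    hidden = []
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)                     # assumed ReLU nonlinearity
        hidden.append(h)
    logits = h @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)    # stabilize the exponent
    probs = np.exp(logits)
    return hidden, probs / probs.sum(axis=1, keepdims=True)


batch = rng.random((512, 3072))  # stand-in for one mini-batch of 512 flattened images
hidden, probs = forward(batch)
last_hidden = hidden[-1]         # plays the "recorded region" in scenario 3
output_units = probs             # plays the "recorded region" in scenarios 1, 2, and 4
```

Each row of `probs` sums to 1. Column k plays the role of the unit tuned to class k in the CIFAR-10 ordering listed above; for example, column 9 is the truck unit.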

Figure 1

The input image, image class, and model output correspond to the stimulus presented in the experiment, the feature defined by the researcher, and the recorded neural activity, respectively. The ten units (nodes) in the output layer calculate the probability of each class for the input image. We regard them as neural units tuned to each of the 10 classes, like neurons in the inferior temporal cortex that respond selectively to complex visual stimuli such as faces (Bruce et al., 1981). For convenience, we refer to each unit by its preferred class (e.g., the airplane unit). In the DNN model, the genuine feature space is predefined by the labels of the training data. Using this model, we demonstrate the errors that may occur when a researcher investigates neural coding principles by presenting a prepared feature set. The output layer corresponds to the recorded brain region in the simulations for scenarios 1, 2, and 4, and the last hidden layer corresponds to it in scenario 3 (Figure 1B).

Scenario 1. Suboptimal Feature Selectivity

In the first scenario, we demonstrate a possible error in determining neural feature selectivity from differences in a neural unit's responses to the presented feature set. To simulate a researcher who records a neuron highly tuned to the truck feature and explores its feature selectivity, we presented the test set images of all classes and averaged the truck unit's response to each class. As expected, this unit exhibited selectivity for the truck feature [regression coefficient 0.55 (p < 0.001), explained variance 0.61] (Figure 2). However, had the researcher not included truck images in the stimulus set, only the results for the classes other than trucks (red box in Figure 2) would have been obtained. From these, the researcher would likely have concluded that the truck unit responds selectively to the automobile feature [regression coefficient 0.21 (p < 0.001), explained variance 0.30]. Experimental settings in which a researcher determines feature selectivity from the neural responses to the features presented in the experiment are, in fact, common in practice (Miyashita and Chang, 1988; Liu and Richmond, 2000; Tanaka et al., 2001; Stalnaker et al., 2010).
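The selectivity test in this scenario can be sketched as follows. The tuning values below are hypothetical. They were chosen only to mimic a unit that prefers trucks and, to a lesser degree, automobiles; they are not the article's data, and the article's regression analysis is not reproduced here. The helper `preferred_class` is a name introduced for illustration. It averages the unit's response over the images of each presented class and returns the class with the largest mean.

```python
import numpy as np

classes = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
rng = np.random.default_rng(1)

# Hypothetical mean response of a "truck unit" to each class (illustrative, not the article's data).
true_tuning = np.array([0.10, 0.35, 0.02, 0.03, 0.02, 0.03, 0.02, 0.04, 0.12, 0.80])
labels = np.repeat(np.arange(10), 100)                        # 100 test images per class
responses = true_tuning[labels] + rng.normal(0.0, 0.05, labels.size)


def preferred_class(responses, labels, presented):
    """Average the unit's response within each presented class and return the best class."""
    means = {c: responses[labels == c].mean() for c in presented}
    return classes[max(means, key=means.get)]


full_set = preferred_class(responses, labels, range(10))
no_trucks = preferred_class(responses, labels, [c for c in range(10) if c != 9])
print(full_set)    # → truck
print(no_trucks)   # → automobile
```

The unit and its responses are identical in both calls. Only the presented stimulus set differs, and that change alone moves the apparent preference.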

Figure 2

Indeed, previous studies have also acknowledged the limitations inherent in their experimental settings. Sauerbrei et al. (2015) measured motor-related variables (speed, acceleration, roll, pitch, and electromyogram) in freely moving mice to reveal the information represented in the variability of Purkinje cell activity during locomotion. They noted that only a limited set of variables could be measured in unanesthetized, freely moving mice without head fixation, so primary variables other than the measured features could not be ruled out (Sauerbrei et al., 2015). Another investigation demonstrated that an internal state (satiety) was encoded across global brain regions and that the propagation of sensory information was gated by the animal's satiety state. However, the authors acknowledged that variables correlated with satiety, such as arousal, might be more influential than satiety itself (Allen et al., 2019).

Other studies have demonstrated these concerns more directly. Primary visual cortex neurons have traditionally been assumed to encode mainly the local orientation components of higher-order patterns. However, large-scale two-photon imaging of primary visual cortex neurons in awake macaques, combined with an extensive stimulus set, showed that a large portion of neurons in the superficial layer were highly selective for various complex patterns, such as curvatures, corners, junctions, and other higher-order patterns (Tang et al., 2018). Furthermore, most of the neurons selective for higher-order complex patterns also showed significant orientation tuning. The authors therefore concluded that our understanding of neural selectivity may be biased and restricted by the neurons that can be sampled and the stimuli that can be tested.

Of course, it is impossible to predefine the entire feature space of a recorded region, and the physically measurable variables are limited by the experimental setting. Nevertheless, these inevitable constraints must be recognized. The features that are coupled with a neural response cannot be guaranteed to be the optimal features, and any definition of feature selectivity is premised on a specific experimental setting. Therefore, researchers must describe their findings rigorously, and readers must interpret them carefully in context (i.e., with respect to the feature space defined in the experiment).

Scenario 2. Irrelevant Feature Selectivity

In the second scenario, we simulate a case in which the feature defined by the researcher is a confounding variable coupled with both the ground-truth feature and the neuronal responses. To simulate stimuli containing features that are difficult for the researcher to recognize, we modified the data by superimposing two different veterinary hospital logos onto the images of cats and dogs (Figure 3A). On the original data, the model distinguished cats from dogs with an accuracy of 38%. A model trained on the cat and dog images containing the logos reached a much higher accuracy of 78%. If the researcher considers only the 78% result without recognizing the logos, it is easy to conclude that the model encodes a discriminative feature for each class. In this case, however, the features that the model encodes are presumably the logo patterns (Figure 3B). To confirm this, we reconstructed the receptive field of each class unit through the weighted summation of the activation values along every route linking a single input pixel to a class unit (see the Materials and Methods section). The reconstruction clearly showed that the artificially synthesized logos accounted for most of the activation of the cat and dog units (Figure 3A). This indicates that the features defined by the researcher (cat and dog) were merely confounding variables.
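The exact receptive-field reconstruction is specified in the Materials and Methods, which this section does not include. The sketch below rests on one plausible reading, which is our assumption. For a bias-free ReLU network and a given image, it sums the product of weights along every route from an input pixel to a class unit that passes only through active hidden units, then weights that sum by the pixel's value. Under this reading, the route sum equals the product of the weight matrices after masking the inactive units, and the pixel contributions add up exactly to the class unit's output. The tiny network and its random weights are hypothetical.

```python
import itertools

import numpy as np

rng = np.random.default_rng(2)

# Tiny hypothetical bias-free ReLU network: 4 "pixels" -> 3 -> 3 -> 2 class units.
W1, W2, W3 = rng.normal(size=(4, 3)), rng.normal(size=(3, 3)), rng.normal(size=(3, 2))
x = rng.random(4)                                          # one input image

h1 = np.maximum(0.0, x @ W1)
h2 = np.maximum(0.0, h1 @ W2)
logits = h2 @ W3
g1, g2 = (h1 > 0).astype(float), (h2 > 0).astype(float)   # 1 where a hidden unit is active


def route_sum(i, k):
    """Sum over routes pixel i -> unit j -> unit l -> class k of the product of weights,
    counting only routes whose hidden units are active for this image."""
    return sum(W1[i, j] * g1[j] * W2[j, l] * g2[l] * W3[l, k]
               for j, l in itertools.product(range(3), range(3)))


# The same route sums for every pixel and class at once: mask inactive units, then multiply.
effective = (W1 * g1) @ (W2 * g2) @ W3       # shape (4 pixels, 2 class units)
receptive_field = x[:, None] * effective     # contribution of each pixel to each class unit
```

For an image-sized network, the same masked product would be taken over the real weight matrices. Reshaping one column of `receptive_field` back to 32 × 32 × 3 then gives a map that can be displayed as an image.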

Figure 3

A similar problem has recently been discussed in the deep learning field under the name of shortcut learning. In shortcut learning, a trained model shows a strong discrepancy between the intended and the actual learning strategy, leading to poor generalization to unseen domains. For instance, in a classification task, a DNN model can use the background context (a shortcut feature) as the decision rule for recognizing the primary object. Such a model may appear accurate for objects in a “common” context (e.g., a cow in a pasture) but fail unexpectedly on objects in an uncommon context (e.g., a cow on a beach) (Supplementary Figure 1) (Szegedy et al., 2013; Beery et al., 2018; Zech et al., 2018). This illustrates the pitfall of interpreting the behavior of DNNs in terms of the researcher's intended solution (Geirhos et al., 2020). Shortcut learning may therefore offer a useful frame of reference for BNN research: even if the recorded neurons respond differently to the presented stimuli, the stimulus feature defined by the researcher may be a confounding variable that differs from the feature actually encoded by the BNN.
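Shortcut learning can be reproduced in a toy setting. In this sketch, whose features and numbers are hypothetical, each "image" is reduced to two features. One is a weak object cue that genuinely depends on the class. The other is a strong background cue that agrees with the class in the common context used for training. The classifier is logistic regression fit by plain gradient descent, a stand-in rather than the article's model. It leans on the background, so its accuracy collapses when the background is flipped, as for a cow on a beach.

```python
import numpy as np

rng = np.random.default_rng(6)


def make_data(n, background_agrees):
    """Two features per sample: a weak 'object' cue and a strong 'background' cue."""
    y = rng.integers(0, 2, n)
    sign = 2 * y - 1
    obj = 0.5 * sign + rng.normal(0.0, 1.0, n)              # truly class-related but noisy
    bg_sign = sign if background_agrees else -sign          # common vs. uncommon context
    bg = 1.0 * bg_sign + rng.normal(0.0, 0.1, n)
    return np.column_stack([obj, bg]), y


def train_logistic(X, y, lr=0.5, steps=500):
    """Plain batch gradient descent on the mean logistic (cross-entropy) loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted class (score > 0 means class 1) matches y."""
    return np.mean((X @ w + b > 0) == y)


w, b = train_logistic(*make_data(2000, background_agrees=True))
acc_common = accuracy(w, b, *make_data(2000, background_agrees=True))
acc_uncommon = accuracy(w, b, *make_data(2000, background_agrees=False))
print(f"weights (object, background): {w}; common: {acc_common:.2f}; uncommon: {acc_uncommon:.2f}")
```

Judged only on the common-context test set, the classifier looks as if it has learned the classes. Only the uncommon context reveals which feature it actually uses.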

Indeed, Kim et al. (2019) demonstrated that characterizations of the feature selectivity of primary sensory cortex neurons can be misleading when neurons are described by a single stimulus feature and other aspects of the sensory stimuli are disregarded. Neurons in the primary sensory cortex are often classified as non-nociceptive, nociceptive, or convergent according to their electrophysiological responses to innocuous brush strokes and noxious forceps pinches. Contrary to previous findings, the majority of neurons that appeared to encode noxiousness showed high selectivity for the texture of the stimuli and low selectivity for noxiousness. This implies that texture is more likely to be the explanatory feature and that the noxious stimuli evoked different responses only because of their paired texture. Overall, one should keep in mind that an alternative cause may explain the results obtained in a given experimental paradigm.

Scenario 3. Overestimation of Network Feature Representation

In the third scenario, we demonstrate a potential error in studying the feature representation of a network through neural decoding. One way to evaluate whether particular information is present in a brain region is to implement decoding models, including ML models, and test whether they can decode features from neural activity with above-chance performance (Yan et al., 2014; Kriegeskorte and Douglas, 2019). Importantly, chance-level performance depends on the number of labels provided to the decoder. However, it is often overlooked that the entire set of labels represented in the recorded region is unavailable, meaning that the genuine baseline performance is beyond our reach. In other words, decoding performance should be interpreted relative to the baseline performance, but because the genuine baseline cannot be known, experimental results are prone to misinterpretation.
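The chance level enters the significance test directly, so the same observed accuracy can pass or fail depending on how many labels are assumed. The minimal sketch below uses an exact one-sided binomial test. The choice of test is ours, not necessarily the one used in the studies cited. In the example, 130 correct out of 1,000 trials is significant against a 10-label chance level but not against an 8-label one. The label counts and trial numbers are hypothetical.

```python
import math


def chance_level(n_labels):
    """Accuracy expected from guessing uniformly among the decoder's labels."""
    return 1 / n_labels


def binomial_p_value(hits, n, p0):
    """One-sided P(X >= hits) for X ~ Binomial(n, p0), with 0 < p0 < 1.
    Terms are summed in log space to avoid floating-point underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p0) + (n - k) * math.log1p(-p0)
        for k in range(hits, n + 1)
    ]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


p_ten = binomial_p_value(130, 1000, chance_level(10))    # assumed chance = 0.100
p_eight = binomial_p_value(130, 1000, chance_level(8))   # assumed chance = 0.125
print(f"p vs 1/10: {p_ten:.4f}   p vs 1/8: {p_eight:.2f}")
```

The test answers "better than the assumed chance?", never "better than the genuine baseline?", because the genuine baseline is not available to plug in.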

Specifically, we simulate an evaluation of whether the cat feature is represented in the recorded region, using a linear support vector machine (SVM) as the decoder. Linear decodability is commonly taken as evidence for an “explicit” representation, in that it can be read out by downstream neurons in a single step (Misaki et al., 2010; Kriegeskorte and Kievit, 2013; Ritchie et al., 2019). In this simulation, the last hidden layer of the DNN model was regarded as the region of interest. Because the model was trained on data with 10 labels (classes), the genuine chance-level performance for recognizing a given stimulus as a cat was 10%, and the model's classification accuracy for the cat class was 28%. Even if this result is declared statistically significant (i.e., the p-value falls below the threshold), its effect size, which provides some indication of practical meaningfulness (Benjamin et al., 2018), may not be sufficient to conclude that the two classes are linearly separable in the neuronal representational space. Now suppose that the researcher implements a binary classifier to investigate whether the cat feature is represented in the recorded region. Because a binary classifier is trained on data with binary labels, only two options are available for unseen data (i.e., chance-level performance is 50%). When we trained the linear SVM to distinguish cats from the other classes, its accuracy on the cat test set images was 82%, a result from which linear decodability might be claimed (Figure 4, Table 1). In brief, when the feature representation of a neural network is evaluated by decoding performance, the evaluation can be inflated by the high chance-level performance of the task defined by the researcher (e.g., binary classification).
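The contrast between the 10-way and the binary framing can be reproduced on a synthetic representation. This sketch makes two assumptions. First, the "hidden layer" patterns are simulated as weakly separated Gaussian class clusters. Second, the linear SVM is replaced by a nearest-centroid decoder, which is also a linear classifier in the binary case, so that the example depends only on NumPy. As in Table 1, both decoders are scored only on test patterns of the target class. The dimensions, separations, and the choice of class 3 as "cat" are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim, n_per_class, target = 10, 20, 1000, 3   # class 3 plays the role of "cat"

# Hypothetical last-hidden-layer patterns: weakly separated class means plus unit-variance noise.
class_means = rng.normal(0.0, 0.3, size=(n_classes, dim))


def sample(n):
    """Draw n noisy patterns per class; returns (patterns, labels)."""
    X = np.concatenate([m + rng.normal(0.0, 1.0, size=(n, dim)) for m in class_means])
    y = np.repeat(np.arange(n_classes), n)
    return X, y


X_train, y_train = sample(n_per_class)
X_test, y_test = sample(n_per_class)
X_cat = X_test[y_test == target]              # score on target-class test patterns only

# 10-way decoder: assign each pattern to the nearest class centroid (chance = 1/10).
centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in range(n_classes)])
dists = ((X_cat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
multi_acc = np.mean(dists.argmin(axis=1) == target)

# Binary decoder: target centroid vs. pooled "rest" centroid (chance = 1/2).
c_pos = X_train[y_train == target].mean(axis=0)
c_neg = X_train[y_train != target].mean(axis=0)
binary_acc = np.mean(((X_cat - c_pos) ** 2).sum(axis=1) < ((X_cat - c_neg) ** 2).sum(axis=1))
print(f"10-way: {multi_acc:.2f} (chance 0.10)   binary: {binary_acc:.2f} (chance 0.50)")
```

The representation is the same in both cases. The binary framing alone yields the more impressive-looking number.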

Figure 4

Table 1

Model	Accuracy	Baseline accuracy
Deep neural network model	0.28	0.1
Linear support vector machine	0.82	0.5

Comparison of the performance of the deep neural network model and the binary decoder.

Classification accuracy was used as the evaluation measure and was computed on the same test data (1,000 cat images) for both models. Baseline accuracy is the probability that a model matches the class of the input image by chance, which is determined by the number of labels in the training data.

Various decoding methods have been used to decipher neural coding principles from activity patterns distributed across neurons or cortical regions (Haxby et al., 2014). In particular, multivoxel pattern analysis, a popular technique for analyzing fMRI data, is widely used to compare how distributed patterns of activity over multiple voxels differ between task conditions or stimuli (i.e., multivariate pattern classification) (Popov et al., 2018). It is common to apply a linear classifier to a region of interest to decode binary information. Researchers then draw inferences such as the engagement of certain brain areas in specific tasks or the relationship between brain states and informational content. However, as this scenario shows, the genuine dimension of the label space of the BNN is unavailable, which leaves room to misinterpret the network's feature representation. Even if the decoding performance is statistically significant, it should be remembered that the test is based on the chance-level performance assumed by the decoder model. Both readers and researchers should be aware of the exact characteristics of the decoder model and of the extent to which it can account for the data.

Scenario 4. Overestimation of the Noise Correlation

Finally, we simulate situations in which the noise correlation of the network is overestimated because of an incorrect assumption about feature complexity or the omission of globally coded features. Noise correlation is the degree to which the trial-by-trial variability in responses to an identical stimulus is shared by a pair of neurons. It differs from signal correlation, which is measured from the responses of a neuron pair to different stimuli and indicates similar tuning properties (Supplementary Figure 2) (Cohen and Kohn, 2011). Noise correlation is investigated mainly in the context of its relationship with population coding, network architecture, or behavior (Cohen and Maunsell, 2009; Hofer et al., 2011; Sauerbrei et al., 2015; Ruff and Cohen, 2019). Therefore, a biased estimate of noise correlation can propagate into errors in inferences about its role in sensory processing, about network connectivity, and about the mechanisms that produce it.
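Both quantities can be computed from a stimulus-by-trial response matrix for each neuron. The minimal sketch below uses simulated neurons with hypothetical tuning curves and noise levels. Signal correlation is the correlation of trial-averaged responses across stimuli. Noise correlation is the correlation of each trial's deviation from the mean response to its stimulus, pooled over stimuli. Studies often z-score within each stimulus before pooling; plain mean subtraction is used here for brevity.

```python
import numpy as np

rng = np.random.default_rng(3)
n_stimuli, n_trials = 20, 200

# Hypothetical tuning curves: neuron B's preferences closely follow neuron A's.
tuning_a = rng.random(n_stimuli)
tuning_b = tuning_a + rng.normal(0.0, 0.05, n_stimuli)
# Trial-to-trial fluctuation shared by both neurons, plus private noise for each.
shared = rng.normal(0.0, 0.3, (n_stimuli, n_trials))
resp_a = tuning_a[:, None] + shared + rng.normal(0.0, 0.3, (n_stimuli, n_trials))
resp_b = tuning_b[:, None] + shared + rng.normal(0.0, 0.3, (n_stimuli, n_trials))


def signal_correlation(a, b):
    """Correlation across stimuli of the trial-averaged responses (similarity of tuning)."""
    return np.corrcoef(a.mean(axis=1), b.mean(axis=1))[0, 1]


def noise_correlation(a, b):
    """Correlation of deviations from each stimulus' mean response, pooled over stimuli."""
    dev_a = a - a.mean(axis=1, keepdims=True)
    dev_b = b - b.mean(axis=1, keepdims=True)
    return np.corrcoef(dev_a.ravel(), dev_b.ravel())[0, 1]


sig = signal_correlation(resp_a, resp_b)
noise = noise_correlation(resp_a, resp_b)
print(f"signal correlation: {sig:.2f}   noise correlation: {noise:.2f}")
```

With these settings, the noise correlation should come out near 0.5, the share of each neuron's trial-to-trial variance that is common to both.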

Specifically, in this scenario, we show that signal correlation can be absorbed into the estimated noise correlation when the researcher makes incorrect assumptions about the features. The activity of each unit in the output layer of the DNN model is regarded as the average activity of a homogeneous population and, after training, is deterministic for a given input. Therefore, to mimic noise correlation estimation in BNN research, we constructed a model capable of stochastically generating individual neural activity (see the Materials and Methods section for details).

Feature Complexity

Neurons along the ventral pathway of the human brain are known to be tuned to features of different complexity (Riesenhuber and Poggio, 2002; Güçlü and van Gerven, 2015). Unlike in lower cortical areas, the preferred feature of a neuron in the higher visual areas is hard to determine (Riesenhuber and Poggio, 2002). Here, we point out that the noise correlation can be overestimated if the researcher assumes the wrong feature complexity. If the features defined by the researcher are subdivided into finer features in the region of interest, signal correlation may be incorporated into the estimated noise correlation. This is because even when the researcher repeats a stimulus that is identical in terms of the coarse feature, neurons respond differently depending on their tuning to the fine features.

As an example, we assumed that the stimulus feature defined by the researcher is the dog class (coarse-grained representation, high-level feature), whereas the recorded region actually codes features at the level of individual dog breeds (fine-grained representation, low-level feature) (Figure 5A). We compared the correlation of response variability between two models with different feature complexities. One is consistent with the researcher's assumption (coarse-grained representation model) and the other with the actual situation (fine-grained representation model). Each model was parameterized as a multivariate normal distribution with a different assumption about within-unit variance, enabling stochastic sampling (see the Materials and Methods section for details). From each model, individual neural activities were randomly sampled for each unit in response to the dog-class images, which include different breeds, and pairwise correlations were calculated from these samples. In the fine-grained representation model, dog images of different breeds (e.g., Pomeranians and golden retrievers) are different stimuli. Hence, signal correlation may arise between neurons with similar breed preferences. From the researcher's point of view, however, the region of interest encodes the dog class, so all dog images are regarded as the same stimulus regardless of breed. Consequently, the pairwise correlation calculated from the response variability (which includes signal correlation) is counted solely as noise correlation. Compared with the case in which the researcher's hypothesis is correct (coarse-grained representation model), the noise correlation is overestimated (Figure 5B, Table 2).
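The mechanism can be illustrated with two simulated neurons that share a preference for dog breeds but have independent trial-to-trial noise. The breed tuning values and noise levels are hypothetical. This two-neuron sketch is also far simpler than the multivariate normal models described in the Materials and Methods. Removing each breed's mean (the fine-grained view) recovers a noise correlation near zero. Treating every dog image as the same stimulus (the coarse-grained view) leaves the breed tuning in the residuals and inflates the estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n_breeds, n_images = 10, 200

# Hypothetical breed preference shared by both neurons; their noise is independent.
breed_tuning = np.linspace(-1.5, 1.5, n_breeds)
neuron_1 = breed_tuning[:, None] + rng.normal(0.0, 1.0, (n_breeds, n_images))
neuron_2 = breed_tuning[:, None] + rng.normal(0.0, 1.0, (n_breeds, n_images))


def residual_correlation(a, b, remove_breed_means):
    """Correlate trial-by-trial deviations after removing each breed's mean or only the overall mean."""
    if remove_breed_means:                  # fine-grained view: each breed is its own stimulus
        a = a - a.mean(axis=1, keepdims=True)
        b = b - b.mean(axis=1, keepdims=True)
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]   # corrcoef always removes the overall mean


fine = residual_correlation(neuron_1, neuron_2, remove_breed_means=True)     # near 0
coarse = residual_correlation(neuron_1, neuron_2, remove_breed_means=False)  # near 0.5 here
print(f"fine-grained view: {fine:.3f}   coarse-grained view: {coarse:.3f}")
```

The inflation in the coarse-grained view is entirely signal correlation from the shared breed preference. The neurons' noise is independent by construction.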

Figure 5

Table 2

Model	Average correlation (total units)	Average correlation (dog unit)
Fine-grained model	0.043	0.275
Coarse-grained model	0.029	0.099
Model with internal rhythm	0.215	0.371
Model without internal rhythm	0.028	0.097

Comparison of the estimated noise correlations.

Means of the absolute pairwise correlations calculated from the trial-by-trial variability of neural responses for each conceptual model in scenario 4. Values for total units are averaged over all pairs among the 90 neurons in total (9 neurons per unit); values for the dog unit are averaged over all pairs among the 9 neurons within the dog unit.

In fact, a researcher can plausibly assume the wrong level of feature complexity. The feature defined by the researcher could be more coarse-grained or more fine-grained than the actual feature, or the tuning to the feature can change over time. Recent findings have shown that the abstraction level of features varies across the hierarchy of brain areas and that more abstract features may emerge within the same population after learning (Connor et al., 2007; Engel et al., 2015; Tang et al., 2018). According to Engel et al. (2015), a considerable fraction of lateral intraparietal cortex neurons showed mixed selectivity for both direction and category, and feature selectivity even shifted with training from pure-direction to pure-category tuning. Therefore, a tuning property defined at a particular observation point and setting may not generalize, which can lead to confusion between signal correlation and noise correlation.

Internal Dynamics

Local neural activity is known to depend not only on the current sensory input but also on the current brain state (Panzeri et al., 2015), and it has been hypothesized that variations in brain state account for a significant fraction of the trial-to-trial variability of population activity (Curto et al., 2009). Internal subjective features are difficult to control or measure directly in an experiment, but because they affect the correlations among neurons, they can bias noise correlation estimates. Here, we simulate a case in which neglecting internal-state coding leads to an overestimated noise correlation. We mimic a situation in which a periodic internal rhythm is globally represented in the neural network and stimuli are presented alternately in the up-cycle and down-cycle of the rhythm (Figure 5C). Again, we constructed two models, one with and one without the internal rhythm. In the former, after presenting 1,000 test set images of dogs to the DNN model, we added artificially generated random values to the outputs of the sampled neurons for half of the instances and subtracted them for the remaining instances. In the latter, individual neural activity for the same input was taken without any manipulation. As expected, the model with the internal rhythm showed a higher correlation in response variability than the model without it, because its responses included the signal correlation evoked by jointly coding the internal state (Figure 5D, Table 2). If the internal-state feature is not considered, this correlation may be mistaken for noise correlation.
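The internal-rhythm manipulation can be sketched under assumptions that go beyond the text. Nine neurons respond to the same dog image with independent unit-variance noise. On every trial, one random positive value is shared by all neurons; it is added on up-cycle trials and subtracted on down-cycle trials, and trials alternate between the two phases. The summary statistic follows the note to Table 2: the mean absolute correlation over all neuron pairs. Without the rhythm it stays near zero. With the rhythm, the shared term appears as correlated variability even though it carries no stimulus information.

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials, n_neurons = 1000, 9

# Responses to the same dog image: fixed mean (taken as 0) plus independent noise per neuron.
without_rhythm = rng.normal(0.0, 1.0, (n_trials, n_neurons))

# Hypothetical global rhythm: stimuli alternate between up- and down-cycles, and one random
# positive value, shared by all neurons, is added (up-cycle) or subtracted (down-cycle).
phase = np.where(np.arange(n_trials) % 2 == 0, 1.0, -1.0)
offset = rng.random(n_trials)
with_rhythm = without_rhythm + (phase * offset)[:, None]


def mean_abs_pairwise_corr(responses):
    """Mean |r| over all neuron pairs; responses has shape (trials, neurons)."""
    corr = np.corrcoef(responses, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)      # each pair once, diagonal excluded
    return np.abs(corr[upper]).mean()


print(f"without rhythm: {mean_abs_pairwise_corr(without_rhythm):.3f}   "
      f"with rhythm: {mean_abs_pairwise_corr(with_rhythm):.3f}")
```

Per-neuron or per-trial variants of the shared offset would change the size of the effect but not its source: a globally coded internal state that the estimate treats as noise.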

Recent studies have revealed global representations of internal states such as satiety, anxiety, or latent behavioral states throughout the brain. Allen et al. (2019) showed that in thirsty mice, a thirst motivational state (satiety) was globally represented across brain regions and that neural responses to the same task-relevant feature changed as the thirst was gradually satisfied over repeated trials. Stringer et al. (2019) demonstrated that behavioral-state information is represented in the primary sensory cortex, suggesting that previously reported trial-by-trial variability during stimulus presentation may depend on it. Another investigation found that single prefrontal cortex neurons contribute to complex cognitive tasks through mixed selectivity, encoding internal cognitive processes simultaneously with task-relevant information (Rigotti et al., 2010). Bányai and Orbán (2019) reported that in hierarchical models, inferences about task-related and higher-level perceptual variables were a dual source of noise correlation. Here, we emphasize the possibility that signal correlation due to undetected features can be incorporated into the noise correlation.

Reproducing the Scenarios Using a CNN

As mentioned at the beginning, we implemented a fully connected network to avoid restricting the issues raised to the visual system. Nonetheless, because all the examples used visual stimuli, we repeated the simulations using a CNN, an architecture widely used in image processing. After we fine-tuned a pretrained ResNet18 model (He et al., 2016) on the same CIFAR-10 dataset, it reached an accuracy of 94.6% on the held-out data, and the simulated results were consistent with the previous results in all scenarios (Figure 6). In scenario 3, the model's classification accuracy for the cat class (85%) and that of the linear SVM (91%) did not differ much because the model's performance was already saturated.

Figure 6

Discussion

The primary purpose of this study was to highlight the need to carefully inspect our research framework, which identifies the relationship between a stimulus and a neural response on the basis of a predefined feature space. The second aim was to demonstrate the feasibility of bridging artificial intelligence (AI) and neuroscience research in various respects. By using the DNN as a model of a BNN and mimicking the specific situation of each scenario, we were able to demonstrate the misleading points explicitly. This research suggests that a comparative approach between AI and neuroscience can provide new insights into the same problem and enable alternative interpretations based on the experience accumulated in each field.

This study highlights that the interpretation of empirical results requires care, because an association (dependency) between variables in observed data does not necessarily imply a causal relationship. Associations can arise between variables in the absence of a causal relationship if they share a common cause (Altman and Krzywinski, 2015). As shown in scenarios 1 and 2 and in the internal-dynamics part of scenario 4, overlooking an unevaluated variable can lead to erroneous conclusions in diverse ways. Additionally, scenario 3 underlines that empirical results should be interpreted in the context in which the data were obtained, such as the specific experimental setting and the assumptions of the model used. Above all, researchers need to describe the experimental conditions in detail and be aware of the extent to which they can draw inferences from the techniques (or models) they implement. Even if statistical significance and experimental rigor are satisfied, deriving implications from an observation is a separate problem. Readers, for their part, tend to focus on the conclusions and thereby miss the detailed context. Although understanding the details is cognitively demanding, careful attention from readers is needed as much as careful reporting from researchers.

The four scenarios presented in this study fundamentally arise from the fact that the feature of a stimulus is an idea defined by the researcher. Selective neurons that encode easily recognizable features, such as orientation, color, and motion, have been found in many brain areas (Kamitani and Tong, 2005). However, unlike in brain regions receiving primary sensory information, it has proven difficult to determine the selective features of regions performing higher cognitive functions (Poldrack, 2006). Moreover, as bottom-up inputs are integrated downstream, receptive fields may be dynamically modulated by attention, placing them beyond the researcher's control (David et al., 2008; Zirnsak and Hamker, 2010; Ibos and Freedman, 2014, 2016; Pattadkal et al., 2018; Wutz et al., 2018). In addition, the vast amount of information handled at the unconscious level is not even in an interpretable form (György Buzsáki, 2019). A more fundamental concern is that identifying the informational content of a given representation does not necessarily mean that the representation stands for that content (Brook and Akins, 2005; Morgan and Piccinini, 2018; Gomez-Marin and Ghazanfar, 2019). Buzsáki described an “outside-in framework” that takes invented terms (defined features of the stimulus) as independent categories to be explained and then looks for brain mechanisms that can explain those ideas. He argued instead that what needs to be explained is the activity of the brain, not the invented terms (the “inside-out framework”) (György Buzsáki, 2019).

It is worth noting that this study was inspired by failures of DNNs. Because DNNs often fail unexpectedly in real-world scenarios, the need to understand the representations they learn has come to the forefront under the name of shortcut learning (Nguyen et al., 2015; Beery et al., 2018; Geirhos et al., 2020). A shortcut strategy is a decision rule learned by the model that differs from the solution the researcher intended. Although superficially successful, it fails to generalize when the model confronts unseen data from a different distribution. Shortcut learning in DNNs demonstrates the risk of attributing underlying abilities to models based on the researcher's assumptions (i.e., the expected solution based on how a human would solve the problem). Given that our understanding of the brain's internal mechanisms is still rudimentary, it would be fruitful to learn from the mistakes made with DNNs.

Our study is based on the idea that DNNs can serve as a good model for BNNs. Artificial neural networks were originally inspired by neural computation and the structure of the brain (Hassabis et al., 2017). Although much of their subsequent development has been driven by mathematics and engineering, with a focus on efficient optimization rather than neuroscientific findings, there are still opportunities for synergy with neuroscience (Marblestone et al., 2016). Because both fields share the question of how to analyze the representations of neural networks, the data analysis tools and concepts established in each can facilitate the re-examination of preconceptions as well as the development of fresh methodologies and theories (Barrett et al., 2019). Among “AI to neuroscience” approaches, there have been successful attempts to adopt DNNs as in silico model systems for BNNs, yielding testable hypotheses about neural computation (Cohen et al., 2019; Lillicrap and Kording, 2019; Richards et al., 2019; Bellec et al., 2020). These attempts have provided significant insight into information processing in the brain. Here, we demonstrated that in silico simulation using DNNs may also be particularly effective in articulating the influence that a researcher's assumptions can have. Compared with BNNs, DNNs are more explainable models, at least in that they can explicitly show the results of strategically manipulating an assumption or a specific factor in a controlled setting, which makes biases easier to detect and remove (Koh and Liang, 2017; Samek et al., 2017). In this study, we used a fully connected network and a convolutional neural network to mimic and display the expected error situations in BNN research. However, depending on the research subject and target, researchers may be able to use DNNs with other inductive biases and customize the analysis, structure, and learning of the model.

Lastly, the simulated scenarios are not mutually exclusive; each was described with emphasis on a particular aspect of potential bias. Additionally, the issues presented herein are only illustrative examples intended to highlight concerns about the currently adopted research strategy. Although our study did not go so far as to propose an alternative paradigm, we tried to demonstrate possible biases and constraints in research aimed at understanding neural representation, and we believe that this approach can encourage discussion and efforts to revisit and complement such research. In conclusion, by incorporating lessons from the interface with the AI field into neuroscience research, we will be able to gain insights and devise creative approaches to investigating the operating principles of the brain.

Materials and Methods

The DNN Trained on CIFAR-10 as a Model of BNN

We constructed the DNN classifier using the open-source neural network library Keras (Chollet, 2017) with TensorFlow (Abadi et al., 2016) as the backend. The model comprised six densely connected layers: five hidden layers with rectified linear activation (400, 200, 100, 50, and 20 nodes) and an output layer with softmax activation (10 nodes). A dropout layer with a keep probability of 0.8 was added to each layer, and an L2 regularizer was applied to each layer to penalize layer activity during optimization. The model was trained on the CIFAR-10 dataset, which comprises 60,000 32 × 32 color images in 10 classes with 6,000 images per class; the dataset was split 75%/8%/17% into training/validation/test sets. After 500 epochs of training with stochastic gradient descent, the test set accuracy saturated at ~53% (chance level = 10%).
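The architecture described above can be sketched in plain NumPy to make the layer sizes and regularization concrete. This is a minimal, untrained sketch, not the authors' Keras code: the He-style random initialization, the L2 activity coefficient, applying dropout after each hidden layer only, and the helper names (`init_params`, `forward`, `l2_activity_penalty`) are all assumptions. In Keras itself, a keep probability of 0.8 corresponds to `Dropout(rate=0.2)`, and a penalty on layer activity corresponds to the `activity_regularizer` argument of `Dense`.

```python
import numpy as np

# 3,072 inputs (32 x 32 x 3 pixels), five ReLU hidden layers, 10-way softmax output.
LAYER_SIZES = [3072, 400, 200, 100, 50, 20, 10]
KEEP_PROB = 0.8     # dropout keep probability (Keras: Dropout(rate=0.2))
L2_COEF = 1e-4      # hypothetical activity-penalty coefficient


def init_params(rng, sizes=LAYER_SIZES):
    """Random He-style weights and zero biases; a stand-in for trained values."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(params, x, rng=None, train=False):
    """Return (hidden-layer activities, class probabilities)."""
    hidden, h = [], x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)       # ReLU
        hidden.append(h)                     # activity that the L2 penalty acts on
        if train:                            # inverted dropout: rescale kept units
            h = h * (rng.random(h.shape) < KEEP_PROB) / KEEP_PROB
    W, b = params[-1]
    return hidden, softmax(h @ W + b)


def l2_activity_penalty(hidden, coef=L2_COEF):
    """Extra loss term penalizing the squared activity of each hidden layer."""
    return coef * sum(np.sum(a ** 2) for a in hidden)


rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.random((4, 3072))                    # four fake flattened images in [0, 1]
hidden, probs = forward(params, x, rng=rng, train=True)
penalty = l2_activity_penalty(hidden)
n_params = sum(W.size + b.size for W, b in params)   # 1,335,780
```

With these layer widths, the dense layers hold 1,335,780 weights and biases, of which 1,229,200 sit in the first layer connecting the 3,072 input pixels to 400 hidden units.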

Scenario 1. Suboptimal Feature Selectivity

Each class unit in the output layer of the DNN model was treated as a neural unit tuned to the corresponding class; it computes the probability that the input image belongs to that class. After feeding each of the 1,000 test set images per class into the trained model, we obtained the output values of the truck unit and regarded them as the neural activity of a concept neuron tuned to the truck class. The average output per input class was presented as the selectivity profile of the truck unit. Multiple linear regression was used to model the relationship between the class features and the responses of the truck unit: we fitted the model with the activation of the truck unit as the response variable and the 10 class features (one-hot vectors) as predictors, and presented the coefficients and their p-values for the automobile and truck features. In addition, we calculated the explained variance of each feature, which can be used as an indicator of statistical effect size.
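A minimal NumPy sketch of the selectivity profile and the one-hot regression follows. It runs on synthetic stand-in activity rather than real truck-unit outputs, fits the regression without an intercept (an assumption: with an intercept, the ten one-hot columns would be perfectly collinear), and takes a feature's explained variance to be the R² of a regression on that single class indicator, which is only one possible reading of the statistic described above. The function names are hypothetical.

```python
import numpy as np

# Standard CIFAR-10 label order (index 9 = truck, index 1 = automobile).
CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
N_CLASSES = len(CLASSES)


def selectivity_profile(activity, labels):
    """Mean response of one unit to the images of each input class."""
    return np.array([activity[labels == c].mean() for c in range(N_CLASSES)])


def one_hot_regression(activity, labels):
    """OLS of unit activity on 10 one-hot class features (no intercept).

    Returns the coefficients and their t statistics.
    """
    X = np.eye(N_CLASSES)[labels]                 # (n, 10) indicator matrix
    beta, *_ = np.linalg.lstsq(X, activity, rcond=None)
    resid = activity - X @ beta
    dof = len(activity) - N_CLASSES
    sigma2 = resid @ resid / dof                  # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se


def feature_r2(activity, labels, c):
    """Share of response variance explained by the class-c indicator alone."""
    x = (labels == c).astype(float)
    X = np.column_stack([np.ones_like(x), x])     # intercept + one indicator
    beta, *_ = np.linalg.lstsq(X, activity, rcond=None)
    resid = activity - X @ beta
    centered = activity - activity.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


# Synthetic stand-in for truck-unit outputs: 1,000 images per class,
# strong response to trucks, weaker response to automobiles.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(N_CLASSES), 1000)
class_means = np.full(N_CLASSES, 0.02)
class_means[9], class_means[1] = 0.6, 0.3
activity = np.clip(class_means[labels] + rng.normal(0, 0.1, labels.size), 0, 1)

profile = selectivity_profile(activity, labels)
beta, t_stats = one_hot_regression(activity, labels)
r2 = {CLASSES[c]: feature_r2(activity, labels, c) for c in (1, 9)}
```

Without an intercept, each coefficient is simply the mean response to its class, so the coefficients reproduce the selectivity profile. Two-sided p-values can be obtained from the t statistics as 2 × `scipy.stats.t.sf(|t|, n - 10)`.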

Scenario 2. Irrelevant Feature Selectivity

To mimic a situation in which the dataset prepared by the researcher contains an unrecognized confounding variable, two veterinary hospital logo images were superimposed at fixed positions on the cat and dog images (lower right and lower left). Given the low resolution of the dataset, the logos were downsampled to 5 × 5 × 3 and then blended in with 80% transparency. Subsequently, the accuracy of the trained model in distinguishing cats from dogs was evaluated on both the original and the modified (logo-synthesized) datasets.
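The logo manipulation can be sketched as simple alpha blending. This is a minimal NumPy sketch under stated assumptions: "80% transparency" is taken to mean that the blended patch is 80% original image and 20% logo, the down-resolution to 5 × 5 × 3 is done by block averaging, and the logo array and helper names (`downsample`, `add_logo`) are hypothetical.

```python
import numpy as np


def downsample(img, size=5):
    """Block-average an (H, W, 3) image to (size, size, 3).

    Assumes H and W are divisible by `size`.
    """
    h, w, c = img.shape
    return img.reshape(size, h // size, size, w // size, c).mean(axis=(1, 3))


def add_logo(image, logo, corner, transparency=0.8):
    """Blend a small logo into the lower-right or lower-left corner of an image."""
    out = image.astype(float).copy()
    lh, lw, _ = logo.shape
    h, w, _ = out.shape
    rows = slice(h - lh, h)
    cols = slice(w - lw, w) if corner == "lower_right" else slice(0, lw)
    # transparency=0.8: keep 80% of the underlying pixels, add 20% of the logo
    out[rows, cols] = transparency * out[rows, cols] + (1.0 - transparency) * logo
    return out


rng = np.random.default_rng(0)
logo_hi_res = rng.random((50, 50, 3))        # hypothetical full-size logo
logo = downsample(logo_hi_res)               # -> (5, 5, 3)
image = rng.random((32, 32, 3))              # stand-in for a cat or dog image
marked = add_logo(image, logo, "lower_right")
```

Because the faint logo occupies the same 25 pixels in every image of a class, those pixels alone could in principle separate the two classes, which is the kind of confound this scenario is designed to expose.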

To expose the implicit attention of the trained model on an image, we visualized the receptive field of each unit in the output layer. Although the feedforward mapping cannot be inverted analytically, the contribution of each input pixel to the activation of an output unit can be estimated from the learned weight parameters. For every route linking a single input pixel to a class unit, proceeding from back to front, a one-hot class vector was sequentially multiplied by the connection weights and passed through the inverse of the rectified linear activation. The values obtained from all connecting routes were then summed and assigned as the intensity of the corresponding input pixel, so that high-intensity pixels indicate discriminative image regions. Finally, the 3,072-dimensional receptive field vectors were converted into RGB images (32 × 32 × 3) for visualization.

To confirm the reliability of this reconstruction method, we asked the trained model to predict the class of each reconstructed receptive field image. The model predicted the correct class for 8 of the 10 classes. Because these errors might be attributable to incomplete learning (test set accuracy = 53%), we also evaluated the method on the same model trained on MNIST, a database of handwritten digits comprising a training set of 60,000 images in 10 classes and a test set of 10,000 images. After training the model (test set accuracy = 92%) and reconstructing the receptive field vectors of the class units in the same manner, the model predicted the correct class for all 10 classes.
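The back-projection described above can be sketched with NumPy. This is a minimal sketch under several assumptions: biases are ignored, the "inverse of the rectified linear activation" is approximated by zeroing negative values at each hidden layer (the text does not say how negative values were handled), flattened vectors are reshaped in channel-last (32 × 32 × 3) order, and random weights stand in for the trained ones. The helper names (`back_project`, `to_rgb`, `predict`) are hypothetical.

```python
import numpy as np

LAYER_SIZES = [3072, 400, 200, 100, 50, 20, 10]


def back_project(weights, class_idx, clip_negative=True):
    """Project a one-hot class vector back to input-pixel space.

    weights[k] has shape (n_in, n_out) for layer k, ordered input -> output.
    With clip_negative=False this equals the sum, over every route from each
    pixel to the class unit, of the product of the weights along that route.
    """
    v = np.zeros(weights[-1].shape[1])
    v[class_idx] = 1.0
    for k in range(len(weights) - 1, -1, -1):
        v = weights[k] @ v              # distribute contributions to the layer below
        if clip_negative and k > 0:     # k > 0: v now lives in a hidden (ReLU) layer
            v = np.maximum(v, 0.0)      # assumed stand-in for the "inverse" ReLU
    return v


def to_rgb(v, shape=(32, 32, 3)):
    """Min-max scale a 3,072-vector to [0, 1] and reshape it to an RGB image."""
    v = (v - v.min()) / (v.max() - v.min() + 1e-12)
    return v.reshape(shape)


def predict(weights, x):
    """Bias-free forward pass; returns the predicted class index."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)
    return int(np.argmax(h @ weights[-1]))


rng = np.random.default_rng(0)
weights = [rng.normal(0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
           for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
fields = [to_rgb(back_project(weights, c)) for c in range(10)]
# Reliability check: feed each receptive-field image back into the model.
n_correct = sum(predict(weights, f.reshape(-1)) == c for c, f in enumerate(fields))
```

With `clip_negative=False`, the result equals column `c` of `np.linalg.multi_dot(weights)`, that is, the exact sum over all routes; the clipping step makes the projection nonlinear.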

Scenario 3. Overestimation of the Network Feature Representation

In BNN research, a linear support vector machine (SVM) is commonly regarded as a surrogate for a linear read-out neuron, and its decoding performance is used to estimate the amount of information represented in the region of interest. We mimicked a situation in which the researcher evaluates whether the cat feature is represented in a brain region based on the performance of a linear decoder trained to discriminate feature labels from neural population activity. The last hidden layer of the DNN model was regarded as the region of interest, and we obtained the output vectors of that layer (treated as recorded population activity) after feeding in 1,800 test set images labeled as cat or non-cat: 900 cat images and 900 non-cat images (100 from each of the other nine classes). A linear SVM was then trained to classify these output vectors as cat or non-cat. Finally, the classification accuracy of both the DNN model and the linear SVM was evaluated on the remaining 100 cat test set images.
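The read-out analysis can be sketched as follows. To keep the example self-contained, the linear SVM is trained by full-batch subgradient descent on the primal soft-margin objective rather than with a library solver (in practice, scikit-learn's `LinearSVC` is a common choice). The synthetic 20-unit activity, the z-scoring step, and all hyperparameters are assumptions, not values from the study.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.1, n_iter=1000):
    """Primal soft-margin linear SVM trained by full-batch subgradient descent.

    Minimizes 0.5 * ||w||^2 + C * mean(max(0, 1 - y * (X @ w + b)))
    for labels y in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        viol = y * (X @ w + b) < 1.0            # samples with nonzero hinge loss
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -C * y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical stand-in for last-hidden-layer (20-unit, ReLU) activity.
rng = np.random.default_rng(0)
n_units = 20
cat_train = np.maximum(rng.normal(1.5, 1.0, (900, n_units)), 0.0)
noncat_train = np.maximum(rng.normal(0.0, 1.0, (900, n_units)), 0.0)  # 9 classes x 100
cat_heldout = np.maximum(rng.normal(1.5, 1.0, (100, n_units)), 0.0)

X = np.vstack([cat_train, noncat_train])
y = np.concatenate([np.ones(900), -np.ones(900)])
mu, sd = X.mean(axis=0), X.std(axis=0)          # z-score with training statistics
w, b = train_linear_svm((X - mu) / sd, y)

pred_cat = ((cat_heldout - mu) / sd) @ w + b > 0
heldout_accuracy = pred_cat.mean()              # fraction of held-out cats read out as "cat"
```

The fraction of held-out cat images classified as "cat" is the read-out accuracy that would then be compared with the DNN's own accuracy on the same images.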