
OPINION article

Front. Neuroinform., 27 January 2026

Volume 20 - 2026 | https://doi.org/10.3389/fninf.2026.1729805

This article is part of the Research Topic: AI and Natural Learning Systems: Bi-Directional Insights.

On the need for abstract, deep reinforcement learning models in neuroscience


Santina Duarte1, Xena Al-Hejji1, Edgar Bermudez Contreras2* and Eric Chalmers1*
  • 1Department of Mathematics and Computing, Mount Royal University, Calgary, AB, Canada
  • 2Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, Canada

Deep Reinforcement Learning (DRL) algorithms combine artificial neural networks with reward-based learning processes, and are a useful analog of reward-based information processing and dopamine-driven learning in the brain (Botvinick et al., 2020). In our work, we have successfully used DRL algorithms to recreate features of mood disorders observed in vivo. However, when we present our results, we sometimes hear the objection that DRL algorithms are so much simpler than the brain that they can't possibly be useful tools for studying it. For example, reviewers point out (quite correctly) that plasticity in the artificial neural network is achieved through backpropagation, which is computationally convenient but biologically implausible. Or that bias terms in the artificial network have no obvious neural correlate. Or that DRL ignores the roles of genetics, specific brain structures, signaling methods, etc.
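
To make this level of abstraction concrete, the sketch below shows the kind of model we mean: a small neural network estimates action values, and a reward-prediction-error signal, backpropagated through the network, drives learning. It is a minimal illustration only, not the model from our studies; the toy 5-state chain task, the network sizes, and the learning parameters are assumptions chosen for brevity.

```python
# Illustrative toy DRL agent (not the model from our studies).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_hidden = 5, 2, 16        # chain of 5 states; actions: left/right
W1 = rng.normal(0, 0.1, (n_hidden, n_states))   # input-to-hidden weights
W2 = rng.normal(0, 0.1, (n_actions, n_hidden))  # hidden-to-output weights
gamma, lr, eps = 0.9, 0.05, 0.1

def q_values(s):
    """Forward pass: one-hot state -> hidden layer -> one Q-value per action."""
    x = np.eye(n_states)[s]
    h = np.tanh(W1 @ x)
    return W2 @ h, h, x

for episode in range(500):
    s = 0
    for t in range(20):
        q, h, x = q_values(s)
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q))
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0                 # reward only at the far end of the chain
        target = r + (0.0 if done else gamma * np.max(q_values(s_next)[0]))
        delta = target - q[a]                    # reward-prediction error (TD error)
        # Backpropagate the error to both layers (gradients from pre-update weights)
        grad_q = np.zeros(n_actions); grad_q[a] = 1.0
        grad_h = (W2.T @ grad_q) * (1 - h**2)
        W2 += lr * delta * np.outer(grad_q, h)
        W1 += lr * delta * np.outer(grad_h, x)
        s = s_next
        if done:
            break

print(np.array([q_values(s)[0] for s in range(n_states)]).round(2))  # learned Q-values
```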

At the risk of sounding ungrateful to our critics (on the contrary, we would be nowhere without them!), we will argue here for the importance of abstract models like DRL in science, and show that DRL is already being used to generate testable neuroscientific hypotheses.

Of models and metaphors

In science we understand complex phenomena through various models, which exist on a spectrum from high to low abstraction. High-level models abstract away detail in order to highlight “big picture” principles. They could include block diagrams, schematics, or mathematical or other models that describe a system's behavior but abstract away implementation details (roughly corresponding to the Algorithmic level in Marr's levels of analysis). Lower-level models include more details of the phenomena, and so obscure overarching principles; these models are closer to Marr's Implementation level of analysis. Examples may include animal models,1 or mathematical models of specific cellular interactions.

Every model is a kind of metaphor that helps us understand part of a complex phenomenon. To ask whether a metaphor (or a model) is “correct” is nonsensical; to quote the aphorism attributed to George Box, “All models are wrong, but some are useful” (Box, 1976).2 Consider a schematic diagram of brain structure interactions (high abstraction), and an animal model of a human disorder (low abstraction)—neither is a completely accurate representation, but each is useful in the right situation: imagine trying to teach neuroscience without the diagram, or test drugs without animal models, or teach neuroscience with only animal models!

Pursuing more detailed/accurate models for their own sake can distract us from more fundamental concerns. Jorge Luis Borges warned against this in a 1946 short story titled On Exactitude in Science. It describes a fictional empire whose preoccupation with perfection in cartography led them to create ever larger and more detailed maps. Their consummate creation was a map as large as the empire itself, corresponding point-for-point with it. Obviously, this map was worse than useless. It was abandoned by the next generation, and the story concludes tragically: “in all the land there is no other relic of the disciplines of geography” (Borges, 1999).

It's possible that the phenomena we care about most (consciousness, desire, love, etc.) cannot be found in the lowest-level mechanisms of information processing. Rather, they may themselves be higher-order phenomena that are best understood with the help of more abstract models. If so, a preference for more detailed models—to the exclusion of higher-level metaphors—seems doomed to end in the frustration hinted at by Borges' story. Yet, as was presumably the case in Borges' day, there are a few life scientists on this trajectory now. To quote Douglas Hofstadter's lament: “how widespread is the tacit assumption that the level of the most primordial physical components of a brain must also be the level at which the brain's most complex and elusive mental properties reside” (Hofstadter, 2008). Science requires thought across the full spectrum of abstraction. Very occasionally it may even be possible to consider multiple levels at the same time; recent work by Shine et al. illustrates how computational models may facilitate this (Shine et al., 2021).

DRL algorithms exist at an intermediate level of abstraction, much lower than block diagrams but much higher than animal models. This gives them the unique ability to illustrate core features of learning while glossing over specifics of their neural implementation. The latter point sometimes dissuades scientists who are accustomed to working at a level closer to implementation. But most neuroscientists are comfortable working at higher levels of abstraction, and see that it is precisely because DRL models fly above implementation details that they offer a unique computational perspective on intelligent behavior. This perspective has allowed these scientists to generate and test a variety of meaningful hypotheses about biological learning. We conclude with a few examples.

Successes of deep reinforcement learning in neuroscientific discovery

Song et al. (2017) built a DRL model that reproduced observations from an early study conducted with monkeys (Padoa-Schioppa and Assad, 2006). That earlier study had found particular neurons in the orbitofrontal cortex that seemed to encode the inferred economic value of the monkey's preferred choices in several tasks. Song et al. found that reward-based training of artificial neural networks caused certain artificial neurons to take on similar roles. Their computational model predicted a role for value representations in the brain that support learning of a task but exist independently of its execution (it would have been difficult to reach this hypothesis through in vivo experiments).

Banino et al. (2018) discovered that when an artificial neural network with recurrent connections was trained to do path integration, neurons in the network began to fire at regular spatial intervals—resembling the behavior of entorhinal grid cells. They next found that a DRL model based on this network could effectively perform vector-based navigation, even in unfamiliar or dynamic environments. These findings support neuroscientific hypotheses that grid cells enable vector-based navigation.
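
To make the path-integration setup concrete, here is a minimal sketch of the same general idea (not Banino et al.'s architecture): a recurrent network receives two-dimensional velocity signals and is trained to report the position obtained by integrating them. The network size, trajectory statistics, and training settings are illustrative assumptions.

```python
# Toy path-integration task: an RNN maps velocity sequences to positions.
import torch
import torch.nn as nn

torch.manual_seed(0)
T, batch = 50, 64                               # 50 timesteps per trajectory

class PathIntegrator(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.RNN(input_size=2, hidden_size=hidden, batch_first=True)
        self.readout = nn.Linear(hidden, 2)      # decode 2-D position from hidden state
    def forward(self, vel):
        h, _ = self.rnn(vel)
        return self.readout(h)

model = PathIntegrator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    vel = 0.1 * torch.randn(batch, T, 2)         # random velocity sequences
    pos = torch.cumsum(vel, dim=1)               # ground-truth integrated position
    loss = nn.functional.mse_loss(model(vel), pos)
    opt.zero_grad(); loss.backward(); opt.step()

# After training, hidden-unit activity can be binned by position to look for
# spatially periodic ("grid-like") firing patterns, as in the study.
```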

In their work with DRL, Wang et al. (2018) noticed an interplay between relatively slow RL processes and faster recurrent network dynamics, which allows the system to learn general characteristics of a family of tasks, and then quickly solve new tasks from that same family—similar to how humans can transfer knowledge gained in one task to other, similar tasks. They used these results to explain neuroscientific findings that challenged traditional notions of dopamine-driven RL, such as the observation that the prefrontal cortex and dopamine neurons both seem to encode reward-based signals that classical RL models ascribe solely to dopamine. Follow-up work by Jensen et al. (2024) augmented Wang's DRL system with the ability to sample imagined action sequences based on previous experience. Jensen's model provides a new computational account of planning and hippocampal replay in mammals.
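
The two-timescale idea can be sketched in a toy form (this is our illustration, not Wang et al.'s system; the two-armed bandit task, LSTM size, and crude REINFORCE update are assumptions): a slow outer loop adjusts the recurrent network's weights across many tasks, while within each task the agent adapts using only its recurrent activity, with the previous action and reward fed back as input.

```python
# Toy meta-RL: slow weight learning across bandit tasks, fast in-task adaptation
# carried by recurrent activity.
import torch
import torch.nn as nn

torch.manual_seed(0)

class MetaRLAgent(nn.Module):
    """Recurrent policy; input is (one-hot previous action, previous reward)."""
    def __init__(self, hidden=48):
        super().__init__()
        self.lstm = nn.LSTMCell(3, hidden)
        self.policy = nn.Linear(hidden, 2)
    def forward(self, x, state=None):
        state = self.lstm(x, state)
        return torch.distributions.Categorical(logits=self.policy(state[0])), state

agent = MetaRLAgent()
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)

for task in range(2000):                          # slow outer loop: weight changes
    p0 = float(torch.rand(1))                     # new bandit: arm 0 pays off with prob p0
    arm_probs = [p0, 1.0 - p0]
    state, x = None, torch.zeros(1, 3)
    log_probs, rewards = [], []
    for t in range(20):                           # fast inner loop: activity changes only
        dist, state = agent(x, state)
        action = dist.sample()
        r = 1.0 if torch.rand(1).item() < arm_probs[action.item()] else 0.0
        log_probs.append(dist.log_prob(action))
        rewards.append(r)
        x = torch.zeros(1, 3)
        x[0, action.item()] = 1.0                 # feed back last action and reward
        x[0, 2] = r
    advantage = sum(rewards) - 10.0               # crude constant baseline (20 steps * 0.5)
    loss = -torch.stack(log_probs).sum() * advantage / 20
    opt.zero_grad(); loss.backward(); opt.step()
```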

Wijmans et al. (2023) discovered that if a DRL agent's neural network was endowed with recurrent connections, this recurrence allowed map-like representations and collision-detection neurons to emerge in the network. These effects were observed even when the agents were blind—with no sensory input other than egomotion—predicting that network recurrence is a sufficient condition for the emergence of “mental maps.” These findings have interesting implications for understanding the mechanistic basis of navigation in mammals.

While the above studies observed neural phenomena in DRL models, a study by Dabney et al. (2020) is an interesting example of the reverse. Classical RL models use a reward-prediction-error signal, thought to be conveyed in the brain by phasic dopamine activity (Schultz et al., 1997; Montague et al., 1996), to drive learning. While this signal is typically a scalar—representing a single, point prediction of expected value—artificial intelligence practitioners have long recognized the benefit of using a probability distribution over possible values instead. This allows the agent to reason over possible futures, and better manage risk. Inspired by this idea, Dabney et al. searched for and found similar distributional codes in mouse brains.
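
The contrast between a scalar and a distributional value code can be shown with a few lines of tabular code. This is an illustration of the general idea only, not Dabney et al.'s model; the two-outcome reward distribution, the number of quantile estimates, and the learning rate are assumptions. The scalar learner converges to the mean reward, while the quantile learner recovers the shape of the reward distribution.

```python
# Scalar vs. distributional (quantile) value learning on a stochastic reward.
import numpy as np

rng = np.random.default_rng(0)
alpha, n_steps = 0.02, 30000
v_scalar = 0.0                        # classical RL: a single point estimate of value
n_q = 8
quantiles = np.zeros(n_q)             # distributional RL: one estimate per quantile level
taus = (np.arange(n_q) + 0.5) / n_q   # the quantile level each estimate tracks

for step in range(n_steps):
    r = 10.0 if rng.random() < 0.5 else 0.0   # reward is 0 or 10, equally likely
    # Scalar reward-prediction-error update: tracks only the mean
    v_scalar += alpha * (r - v_scalar)
    # Quantile-regression update: up- and down-steps are weighted asymmetrically,
    # so each estimate settles near a different quantile of the reward distribution
    below = (r < quantiles).astype(float)
    quantiles += alpha * (taus - below)

print("scalar estimate:", round(v_scalar, 2))     # close to the mean, ~5
print("quantile estimates:", quantiles.round(1))  # low quantiles near 0, high ones near 10
```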

Finally, our own work has used DRL to create computational models of major depressive disorder (Chalmers et al., 2024) and schizophrenia (Al-Hejji et al., 2025). After impairing a DRL agent's neural network in ways analogous to what is observed in these disorders, we observed behaviors similar to those exhibited by depressed and schizophrenic humans and animals. In both cases, close examination of the DRL model suggested ways to reconcile competing ideas about the disorders' pathologies (e.g., the model suggests a way to reconcile dopaminergic and neurodevelopmental theories of schizophrenia) and offered useful insight to guide treatment research. These results were possible partly because making alterations in a DRL model is so much easier than attempting the same alterations in vivo.
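
As an illustration of the kind of manipulation involved (a simplified stand-in, not the code from our studies; the toy network, the 30% loss fraction, and the choice to spare bias terms are assumptions), one can zero a random fraction of connection weights in a trained network to mimic synapse loss, and then compare the agent's behavior before and after:

```python
# Simulated synapse loss: randomly zero a fraction of weights in a network.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))  # toy Q-network

def simulate_synapse_loss(model, fraction=0.3):
    """Zero a random subset of connection weights, in place."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" in name:                          # leave bias terms untouched
                mask = (torch.rand_like(param) > fraction).float()
                param.mul_(mask)

state = torch.randn(1, 8)
print("Q-values before lesion:", net(state))
simulate_synapse_loss(net, fraction=0.3)
print("Q-values after lesion: ", net(state))
```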

In conclusion

Advances in AI have largely been driven by engineering goals and applications, which has led to the perception that AI approaches, such as DRL, offer little value in understanding biological systems. However, as the examples mentioned above (and many more) show, this view is increasingly outdated. When used appropriately, DRL and related approaches provide a useful framework for studying the dynamic interplay between agents and their environments, shedding light on fundamental principles of how representations, learning, memory, and decision making work in biological systems. Such frameworks offer a way to generate and formalize hypotheses, test mechanisms, and obtain insights that are difficult to reach through experimental work alone. We hope that this perspective gains broader recognition, allowing deep reinforcement learning and related methods to enrich neuroscience research.

Author contributions

SD: Writing – review & editing. XA-H: Writing – review & editing. EB: Writing – original draft, Writing – review & editing. EC: Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. The authors would like to acknowledge funding from Alberta Innovates and the Natural Sciences and Engineering Research Council of Canada.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. ^Though even animal models are too abstract to elucidate some human disorders. “High” and “low” abstraction are relative, and a model that is not abstract enough for one purpose could be too abstract for another.

2. ^This line is often attributed to Box. A more faithful (and pertinent) quote from his 1976 paper is: “Since all models are wrong the scientist cannot obtain a ‘correct' one by excessive elaboration… Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity”.

References

Al-Hejji, X., Guillermo Gomez Castro, J., Duarte, S., Bermudez Contreras, E., and Chalmers, E. (2025). An in-silico Integration of Neurodevelopmental and Dopaminergic Views of Schizophrenia. Presented at the NeurIPS 2025 Workshop AI4Science.


Banino, A., Barry, C., Uria, B., Blundell, C., Lillicrap, T., Mirowski, P., et al. (2018). Vector-based navigation using grid-like representations in artificial agents. Nature 557, 429–433. doi: 10.1038/s41586-018-0102-6


Borges, J. L. (1999). Collected Fictions. Translated by A. Hurley. London: Penguin Classics.


Botvinick, M., Wang, J. X., Dabney, W., Miller, K. J., and Kurth-Nelson, Z. (2020). Deep reinforcement learning and its neuroscientific implications. Neuron 107, 603–616. doi: 10.1016/j.neuron.2020.06.014


Box, G. E. P. (1976). Science and statistics. J. Am. Stat. Assoc. 71, 791–799. doi: 10.1080/01621459.1976.10480949


Chalmers, E., Duarte, S., Al-Hejji, X., Devoe, D., Gruber, A., and McDonald, R. J. (2024). Simulated synapse loss induces depression-like behaviors in deep reinforcement learning. Front. Comput. Neurosci. 18:1466364. doi: 10.3389/fncom.2024.1466364


Dabney, W., Kurth-Nelson, Z., Uchida, N., Starkweather, C. K., Hassabis, D., Munos, R., et al. (2020). A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675. doi: 10.1038/s41586-019-1924-6


Hofstadter, D. R. (2008). I Am a Strange Loop. New York, NY: Basic Books.


Jensen, K. T., Hennequin, G., and Mattar, M. G. (2024). A recurrent network model of planning explains hippocampal replay and human behavior. Nat. Neurosci. 27, 1340–1348. doi: 10.1038/s41593-024-01675-7


Montague, P. R., Dayan, P., and Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947. doi: 10.1523/JNEUROSCI.16-05-01936.1996


Padoa-Schioppa, C., and Assad, J. A. (2006). Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226. doi: 10.1038/nature04676


Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599. doi: 10.1126/science.275.5306.1593


Shine, J. M., Müller, E. J., Munn, B., Cabral, J., Moran, R. J., and Breakspear, M. (2021). Computational models link cellular mechanisms of neuromodulation to large-scale neural dynamics. Nat. Neurosci. 24, 765–776. doi: 10.1038/s41593-021-00824-6


Song, H. F., Yang, G. R., and Wang, X.-J. (2017). Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6:e21492. doi: 10.7554/eLife.21492


Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., et al. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868. doi: 10.1038/s41593-018-0147-8


Wijmans, E., Savva, M., Essa, I., Lee, S., Morcos, A. S., and Batra, D. (2023). “Emergence of maps in the memories of blind navigation agents,” in The Eleventh International Conference on Learning Representations. Available online at: https://openreview.net/forum?id=lTt4KjHSsyl (Accessed May 09, 2023).


Keywords: abstraction, artificial neural network (ANN), deep learning, modeling, reinforcement learning

Citation: Duarte S, Al-Hejji X, Bermudez Contreras E and Chalmers E (2026) On the need for abstract, deep reinforcement learning models in neuroscience. Front. Neuroinform. 20:1729805. doi: 10.3389/fninf.2026.1729805

Received: 21 October 2025; Revised: 08 January 2026;
Accepted: 12 January 2026; Published: 27 January 2026.

Edited by:

Ludovico Minati, University of Electronic Science and Technology of China, China

Reviewed by:

Pierre Bongrand, Aix-Marseille Université, France
Taegon Kim, Korea Institute of Science and Technology, Republic of Korea

Copyright © 2026 Duarte, Al-Hejji, Bermudez Contreras and Chalmers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Edgar Bermudez Contreras, edgar.bermudez@uleth.ca; Eric Chalmers, echalmers@mtroyal.ca
