Original Research Article
Integrating Non-monotonic Logical Reasoning and Inductive Learning with Deep Learning for Explainable Visual Question Answering
- 1 University of Birmingham, United Kingdom
- 2 The University of Auckland, New Zealand
State-of-the-art algorithms for many pattern recognition problems rely on deep network models. Training these models requires a large labelled dataset and considerable computational resources, which are not readily available in many domains. In addition, it is difficult to understand how these learned models arrive at their decisions, which limits their use in some critical applications. As a step towards addressing these limitations, the architecture described in this paper draws inspiration from research in cognitive systems and integrates the principles of commonsense logical reasoning, inductive learning, and deep learning. In the context of answering explanatory questions about scenes and the underlying classification problems, the architecture uses deep networks to extract features from images and to generate answers to queries. Between these deep networks, it embeds components for non-monotonic logical reasoning with incomplete commonsense domain knowledge, and for decision tree induction. It also incrementally learns previously unknown constraints governing the domain's states, and uses these constraints in subsequent reasoning. Experimental results show that, in comparison with an "end-to-end" architecture based on deep networks, our architecture provides: (i) better accuracy on classification problems when the training dataset is small, and comparable accuracy with larger datasets; and (ii) more accurate answers to explanatory questions. Furthermore, we show that the incremental acquisition of the unknown constraints improves the ability to answer explanatory questions, and that the architecture can be adapted to address planning tasks on a robot.
Keywords: non-monotonic logical reasoning, inductive learning, deep learning, visual question answering, commonsense reasoning, human-robot collaboration
Received: 18 Sep 2019;
Accepted: 05 Nov 2019.
Copyright: © 2019 Sridharan and Riley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Mohan Sridharan, University of Birmingham, Birmingham, United Kingdom, firstname.lastname@example.org