About this Research Topic
Given the fundamental role of attention in the field of computer vision, the goal of this Research Topic is to contribute to the growth and development of attention-based solutions, focusing on both traditional approaches and fully-attentive models. Moreover, the study of human attention has inspired models that leverage human gaze data to supervise machine attention. This Research Topic aims to present innovative research that relates to the study of human attention and to the usage of attention mechanisms in the development of deep learning architectures and in the enhancement of model explainability.
Research papers employing either traditional attentive operations or novel Transformer-based architectures are encouraged, as well as works that apply attentive models to integrate vision with other modalities (e.g., language, audio, speech). We also welcome submissions on novel algorithms, datasets, literature reviews, and other innovations related to the scope of this Research Topic.
The topics of interest include but are not limited to:
- Saliency prediction and salient object detection
- Applications of human attention in Vision
- Visualization of attention maps for Explainability of Deep Networks
- Use of Explainable-AI techniques to improve aspects of the network such as generalization, robustness, and fairness
- Applications of attentive operators in the design of Deep Networks
- Transformer-based or attention-based models for Computer Vision tasks (e.g., classification, detection, segmentation)
- Transformer-based or attention-based models to combine Vision with other modalities (e.g., language, audio, speech)
- Transformer-based or attention-based models for Vision-and-Language tasks (e.g., image and video captioning, visual question answering, cross-modal retrieval, textual grounding / referring expression localization, vision-and-language navigation)
- Computational issues in attentive models
- Applications of attentive models (e.g., robotics and embodied AI, medical imaging, document analysis, cultural heritage)
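To make the core operation behind many of these topics concrete, the following is a minimal sketch of scaled dot-product attention, the building block of Transformer architectures. It is illustrative only; the function name, dimensions, and NumPy implementation are assumptions, not part of any submission requirement.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                         # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over keys
    return weights @ V                                      # weighted sum of values

# Hypothetical shapes: 4 queries and 6 key/value pairs, dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one attended vector per query
```

The attention weights computed inside this function are exactly the "attention maps" referred to above: visualizing them per query is a common explainability technique for Transformer-based models.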
Keywords: Attention, Attentive Models, Transformer, Saliency, Explainability, Multimodal Networks, Vision-and-Language
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.