About this Research Topic
The integration of computer vision and natural language processing is one of the holy grails of machine learning and artificial intelligence research, with important and wide-reaching applications in both real-world and digital lives. Progress in vision and language (V&L) research has been swift, driven by algorithms that produce increasingly strong results on various V&L tasks, such as visual question answering, image captioning, image retrieval, visual dialogue, and several other visual reasoning tasks. However, despite this stimulating progress, recent studies have shown that the majority of current datasets contain a non-negligible amount of spurious correlations and/or dataset biases. Moreover, several existing models mirror or amplify those biases, and current evaluation metrics may be insufficient to identify these issues. In this context, it is imperative to strive towards the proper design, evaluation, and analysis of the data and models used for V&L research.
Possible solutions may lie in a three-pronged approach involving improvements to the datasets, algorithms, and evaluation metrics used for V&L research. Recently, advances have been made in each of these areas, including the creation of synthetic datasets (NLVR, CLEVR), new variations on existing datasets (VQA-CP, TDIUC, nocaps), and altogether new tasks (GQA, FOIL, Social-IQ). In parallel, a plethora of new algorithms, evaluation metrics, and critical analyses have emerged regarding dataset bias, spurious correlations, interpretability, out-of-distribution performance, and related issues.
By bringing together researchers from machine learning, computer vision, natural language processing areas, and experts from a variety of application domains, this Research Topic aims at representing the state-of-the-art in V&L research and at fostering new foundational research towards robust, fair, and interpretable AI for V&L.
Therefore, we seek a broad range of original contributions from researchers and practitioners across the disciplines within the V&L domain. We welcome submissions of novel algorithms, datasets, analyses, and other innovations that advance the field by highlighting and addressing challenges in vision and language research, particularly those demonstrating improved algorithmic fairness, interpretability, and robustness to bias, spurious correlations, and long-tailed or out-of-distribution data.
Submissions may include, but are not limited to, the following topics:
- Novel algorithms and techniques that help improve the state-of-the-art in existing V&L tasks;
- Novel V&L algorithms that are less prone to dataset bias and spurious correlations, enforce demographic fairness and/or are more interpretable and explainable;
- Novel datasets, sub-tasks, and challenges that help test for new capabilities and/or highlight shortcomings with existing datasets and algorithms in V&L;
- Controlled test sets aimed at evaluating specific abilities involved in language-grounded visual understanding;
- Probing tasks aimed at evaluating the quality of multimodal V&L representations;
- Novel evaluation metrics that enable accurate and fair evaluation of V&L algorithms with respect to dataset bias, label imbalance, lack of compositionality, and other issues related to V&L tasks;
- Previously unreported analyses, key observations, discussions, and insights about bias and related issues in existing V&L datasets and algorithms;
- Negative or critical results regarding practices currently used in mainstream V&L research;
- Successes or challenges of integrating vision and language in a novel application domain.
We also welcome well-formulated survey articles, opinion pieces, position papers, and commentaries on the current state and future prospects of V&L research, as long as they fit within the theme of the Research Topic.
Keywords: vision and language, bias and fairness, explainable visual grounding, probing tasks for vision and language, evaluation of vision and language models
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.