Research Topic

Identifying, Analyzing, and Overcoming Challenges in Vision and Language Research

About this Research Topic

The integration of computer vision and natural language processing is one of the holy grails of machine learning and artificial intelligence research, with important and wide-reaching applications in both the physical and digital worlds. Progress in vision and language (V&L) research has been swift, driven by algorithms that produce increasingly strong results on a variety of V&L tasks, such as visual question answering, image captioning, image retrieval, visual dialogue, and several other visual reasoning tasks. However, despite this stimulating progress, recent studies have shown that the majority of current datasets contain a non-negligible amount of spurious correlations and/or dataset biases. Moreover, several existing models mirror or amplify those biases, and current evaluation metrics may be insufficient to identify these issues. In this context, it is imperative to strive for the proper design, evaluation, and analysis of the data and models used for V&L research.

Possible solutions may lie in a three-pronged approach involving improvements to the datasets, algorithms, and evaluation metrics used for V&L research. Recently, advances have been made on each of these fronts, including the creation of synthetic datasets (NLVR, CLEVR), new variations on existing datasets (VQA-CP, TDIUC, nocaps), and altogether new tasks (GQA, FOIL, Social-IQ). In parallel, a plethora of new algorithms, evaluation metrics, and critical analyses addressing dataset bias, spurious correlations, interpretability, out-of-distribution performance, and related issues have been proposed.

By bringing together researchers from machine learning, computer vision, and natural language processing, along with experts from a variety of application domains, this Research Topic aims to represent the state of the art in V&L research and to foster new foundational research towards robust, fair, and interpretable AI for V&L.
We therefore seek a broad range of original contributions from researchers and practitioners across the disciplines within the V&L domain. We welcome submissions on novel algorithms, datasets, analyses, and other innovations that highlight and address challenges in vision and language research, particularly those demonstrating improved algorithmic fairness, interpretability, and robustness to bias, spurious correlations, and long-tailed or out-of-distribution data.

Submissions may include, but are not limited to, the following topics:
- Novel algorithms and techniques that help improve the state-of-the-art in existing V&L tasks;
- Novel V&L algorithms that are less prone to dataset bias and spurious correlations, enforce demographic fairness, and/or are more interpretable and explainable;
- Novel datasets, sub-tasks, and challenges that help test for new capabilities and/or highlight shortcomings with existing datasets and algorithms in V&L;
- Controlled test sets aimed at evaluating the specific abilities involved in language-grounded visual understanding;
- Probing tasks aimed at evaluating the quality of multimodal V&L representations;
- Novel evaluation metrics that enable accurate and fair evaluation of V&L algorithms with respect to dataset bias, label imbalance, lack of compositionality, and other related issues in V&L tasks;
- Novel analyses, key observations, discussions, and insights regarding bias and related issues in existing V&L datasets and algorithms;
- Negative or critical results regarding practices currently used in mainstream V&L research;
- Successes or challenges of integrating vision and language in a novel application domain.

We also welcome well-formulated survey articles, opinion pieces, position papers, and commentaries regarding the current state and future prospects of V&L research, as long as they fit within the theme of the Research Topic.


Keywords: vision and language, bias and fairness, explainable visual grounding, probing tasks for vision and language, evaluation of vision and language models


Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

About Frontiers Research Topics

With their unique mixes of varied contributions from Original Research to Review Articles, Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area!


Submission Deadlines

Abstract: 13 February 2021
Manuscript: 13 June 2021

Participating Journals

Manuscripts can be submitted to this Research Topic via the following journals:


