About this Research Topic
Online platforms such as Wikipedia, Google Photos, and social media have become commonplace, but they are also sources of widely used training datasets for machine learning. Existing research indicates that, for specific platforms, such datasets (including structured data, images, text, and video) can be biased. Biased human-curated datasets broadly reflect the stereotypes and prejudices of our societies. Using them to train machine learning models can, thus, bias those models. As a result, biased training datasets (encoding, for example, gender or racial stereotypes) can have dramatic consequences for the fairness of applications that rely on machine learning models.
When a model trained on biased data is used for decision making, it can produce unfair decisions. Such decisions can unjustifiably exclude members of society from certain benefits, such as obtaining a loan, accessing mobility as a service, having access to justice, or succeeding in a job application, to name a few. It is of paramount importance to develop techniques and frameworks for identifying and measuring the fairness of training data fed into machine learning algorithms. By successfully identifying and analyzing the nature of different data-driven algorithmic biases and making them measurable and transparent, we can raise awareness and address them in an appropriate manner. Furthermore, existing datasets and pre-trained models do not always allow for easy analysis or modification of the underlying training data. It is, therefore, essential to develop methods to identify and measure biases encoded inside those more opaque datasets and models.
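As a minimal illustration of what making bias measurable can look like in practice, the sketch below computes the statistical parity difference (the gap in positive-outcome rates between two groups) for a toy training dataset. The column names, group labels, and data are assumptions chosen for the example, not part of any specific dataset or method discussed here.

```python
def statistical_parity_difference(records, group_key, outcome_key, privileged, unprivileged):
    """Return P(outcome = 1 | unprivileged group) - P(outcome = 1 | privileged group)."""
    def positive_rate(group):
        rows = [r for r in records if r[group_key] == group]
        return sum(r[outcome_key] for r in rows) / len(rows) if rows else 0.0
    return positive_rate(unprivileged) - positive_rate(privileged)

# Toy loan dataset: outcome 1 means the loan was granted, 0 means it was denied.
# Both the attribute names and the values are illustrative assumptions.
toy_data = [
    {"gender": "male", "loan_granted": 1},
    {"gender": "male", "loan_granted": 1},
    {"gender": "male", "loan_granted": 0},
    {"gender": "female", "loan_granted": 1},
    {"gender": "female", "loan_granted": 0},
    {"gender": "female", "loan_granted": 0},
]

spd = statistical_parity_difference(toy_data, "gender", "loan_granted", "male", "female")
print(f"Statistical parity difference: {spd:.2f}")  # negative values indicate a disadvantaged group
```

A value of zero would indicate equal positive-outcome rates across the two groups; larger absolute values indicate stronger disparity in the labelled data before any model is trained.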
This Research Topic welcomes contributions from practical and theoretical perspectives that address, but are not limited to, the following topics:
● Data transparency
● Applications of fairness measures to community challenges
● Methods for, and applications of, the ethical evaluation of datasets or pre-trained models
● Novel methodologies for bias evaluation of algorithmic decisions and algorithms in the context of image processing, natural language processing, and probabilistic methods
● Studies that identify and characterize different types of bias, including but not limited to gender bias, origin or racial bias, religious bias, socioeconomic bias, or age bias, in existing datasets, algorithms, or models
● The impact of cultural aspects on bias in multilingual datasets
● Methods for, and applications of, fairly curating training datasets
Topic editor Prof. Dr. Mascha Kurpicz-Briki is a member of the board of directors at IFAA cooperative in Bern, Switzerland. All other Topic Editors declare no competing interests with regard to the Research Topic subject.
Keywords: machine learning, training data, fairness, digital ethics, bias
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.