Front. Big Data | doi: 10.3389/fdata.2021.625290

A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings

Provisionally accepted. The final, formatted version of the article will be published soon.

  • 1 Bern University of Applied Sciences, Switzerland

Publicly available off-the-shelf word embeddings, which are often used in production applications for natural language processing, have been shown to be biased. We have previously shown that this bias can take different forms, depending on the language and the cultural context. In this work, we extend our previous work and further investigate how bias varies across languages. We examine Italian and Swedish word embeddings for gender and origin bias, and demonstrate that an origin bias concerning local migration groups in Switzerland is present in German word embeddings. We propose BiasWords, a method to automatically detect new forms of bias. Finally, we discuss how cultural and language aspects are relevant to the impact of bias on the application, and to potential mitigation measures.

Keywords: word embeddings, fairness, digital ethics, natural language processing, training data, language models, gender, bias

Received: 02 Nov 2020; Accepted: 08 Apr 2021.

Copyright: © 2021 Kurpicz-Briki and Leoni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mx. Mascha Kurpicz-Briki, Bern University of Applied Sciences, Bern, Switzerland