AUTHOR=Grieve Jack, Bartl Sara, Fuoli Matteo, Grafmiller Jason, Huang Weihang, Jawerbaum Alejandro, Murakami Akira, Perlman Marcus, Roemling Dana, Winter Bodo
TITLE=The sociolinguistic foundations of language modeling
JOURNAL=Frontiers in Artificial Intelligence
VOLUME=Volume 7 - 2024
YEAR=2025
URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1472411
DOI=10.3389/frai.2024.1472411
ISSN=2624-8212
ABSTRACT=In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.