Language is the medium through which we access the world, and it reflects the culture of the people who speak it. The development of artificial intelligence (AI) language models (LMs) has a long history. Since the introduction of the transformer architecture in 2017, the adoption of large language models (LLMs) capable of engaging in dialogue, answering queries, and generating human-like content has grown. These advancements offer great opportunities for new technological applications and services that will benefit people.
Globally, there are approximately 7,000 spoken languages. However, most LLMs focus on only about 50 high-resource languages.
Though they have little digital presence, minority (low-resource) languages are a large and culturally significant reality. In many regions, they are spoken by a significant portion of the vulnerable population. Language barriers reduce opportunities for quality education, healthcare, financial access, employment, and other services that contribute to a high quality of life.
Although the many low-resource languages represent significant global communities, they generally lack the digital data and resources necessary to support AI-based LM tasks or benefit from recent advancements in the field.
In particular, low-resource languages are subject to two significant limitations: a shortage of both labeled and unlabeled language data, and the poor quality of the data that does exist, which does not sufficiently represent the languages and their sociocultural contexts.
Some efforts have been made through workshops and conferences such as the Conference of the European Chapter of the Association for Computational Linguistics (EACL), LoResLM, IberLEF, and the Conference on Language Modeling (COLM), some of which are better established than others. However, a special issue in a high-impact journal would be an opportunity to consolidate efforts and address challenges in the field. Ultimately, this special issue aims to advance responsible AI innovation that empowers low-resource language communities and shapes a more inclusive future for global language technologies.
We invite submissions on a broad range of topics related to the development and evaluation of language models for low-resource languages, including but not limited to the following:
• Building LMs for low-resource languages.
• Adapting/extending existing LMs/LLMs for low-resource languages.
• Corpora creation and curation technologies for training LMs/LLMs for low-resource languages.
• Strategies for overcoming data scarcity.
• Benchmarks to evaluate LMs/LLMs in low-resource languages.
• Prompting/in-context learning strategies for low-resource languages with LLMs.
• Promoting participatory research with low-resource language communities.
• Review of available corpora to train/fine-tune LMs/LLMs for low-resource languages.
• Multilingual/cross-lingual LMs/LLMs for low-resource languages.
• Applications of LMs/LLMs for low-resource languages (e.g., machine translation, chatbots, content moderation).
• Bias and fairness in low-resource language technologies.
• Sociolinguistic considerations in technology development.
• Cultural appropriateness and sensitivity.
Article types and fees
Articles that are accepted for publication by our external editors following rigorous peer review incur a publishing fee charged to authors, institutions, or funders.
Article types
This Research Topic accepts the following article types, unless otherwise specified in the Research Topic description:
Brief Research Report
Conceptual Analysis
Data Report
Editorial
FAIR² Data
FAIR² DATA Direct Submission
General Commentary
Hypothesis and Theory
Methods
Mini Review
Opinion
Original Research
Perspective
Policy and Practice Reviews
Review
Systematic Review
Technology and Code
Keywords: Language Models (LMs), Low-Resource Languages, Adaptive Large Language Models (LLMs), Corpora creation, Multilingual pre-training, Cross-lingual knowledge transfer, Ethical considerations, Cultural sensitivity
Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.