Linguistic Representations in Large Language Models for High-Impact Discourse: Structure, Pragmatics, and Reliability

About this Research Topic

Submission deadlines

  • Manuscript Summary Submission Deadline: 29 April 2026
  • Manuscript Submission Deadline: 17 August 2026

This Research Topic is currently accepting articles.

Background

Learning robust linguistic representations from text has long been a central challenge in natural language processing (NLP) and the language sciences. NLP systems seek to capture syntactic, lexical, semantic, and pragmatic cues to support generalization to unseen language and complex communicative contexts. Earlier approaches typically modeled isolated linguistic features for narrowly defined tasks, limiting their ability to represent linguistic nuance. In contrast, contemporary pretrained and large language models (LLMs) exhibit strong performance across language understanding and generation, including information retrieval and synthesis, often producing text that resembles expert human discourse. Despite this progress, the mechanisms by which such models learn, organize, and operationalize linguistic rules and structures across multiple levels of language remain poorly understood. This limitation is particularly critical in contexts, such as health, where nuanced interpretation, uncertainty expression, and evidence integration are essential. Persistent failures such as hallucination, bias, and misinformation highlight the need for deeper insight into how linguistic representation learning shapes both the capabilities and risks of LLMs in information use.

This Research Topic aims to advance theoretical and empirical understanding of how LLMs acquire, represent, and deploy linguistic knowledge in critical, high-consequence discourse, such as healthcare and legal contexts. The first goal is to investigate how LLMs internalize language structures across multiple levels, including morphology, syntax, semantics, pragmatics, and discourse, in contexts where precise interpretation is critical. Contributions may examine the formal properties of learned representations, mechanisms for capturing compositionality and hierarchical structure, the handling of ambiguity and polysemy, and the modeling of modality, evidentiality, and register variation. Studies may also probe cross-linguistic or cross-register generalization, the role of training corpora in shaping linguistic competence, and methods for interpreting latent representations in light of linguistic theory.
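
As one concrete, hedged illustration of the probing methods mentioned above, the sketch below trains a linear classifier on frozen hidden states to test whether a single layer of a pretrained encoder linearly separates hedged from unhedged statements. This is a minimal sketch, not a prescribed method: the encoder name, the probed layer, and the toy dataset are all illustrative assumptions.

```python
# Minimal linear-probe sketch (assumptions: "bert-base-uncased" as the
# encoder, layer 8 as the probed layer, and a toy hedging dataset; none
# of these choices is prescribed by this Research Topic).
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

# Toy labels: 1 = hedged/uncertain phrasing, 0 = categorical phrasing.
sentences = [
    ("The drug may reduce symptoms in some patients.", 1),
    ("These findings could suggest a possible interaction.", 1),
    ("The results appear to support a tentative link.", 1),
    ("The drug reduces symptoms.", 0),
    ("This interaction is well established.", 0),
    ("The results prove a causal link.", 0),
]

def embed(text: str, layer: int = 8) -> torch.Tensor:
    """Mean-pool the token representations from one hidden layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0)

X = torch.stack([embed(s) for s, _ in sentences]).numpy()
y = [label for _, label in sentences]

# High accuracy suggests the feature is linearly decodable at this layer,
# subject to the usual probing caveats (probe capacity, dataset artifacts).
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("toy probe accuracy:", probe.score(X, y))
```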

The second goal is to explore how these linguistic representations are operationalized in high-stakes information retrieval and synthesis. This includes analyses of how models negotiate semantic coherence, pragmatic inference, and discourse-level structuring in query understanding, summarization, explanation, and retrieval-augmented generation. Contributions may further examine the limitations of modeled linguistic competence, including misrepresentation, uncertainty propagation, and biases in discourse-level output, providing insights into the interplay between formal linguistic structure and the reliability of generated information.
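
To make the notion of uncertainty propagation concrete, the following sketch flags summaries that drop the hedging expressed in their source documents. It is deliberately naive: the hand-picked hedging lexicon and the 0.5 ratio threshold are illustrative assumptions; submitted studies would presumably use validated lexicons or trained classifiers.

```python
# Naive uncertainty-propagation check (assumption: a small, hand-picked
# hedging lexicon and an arbitrary 0.5 threshold, both for illustration).
HEDGES = {"may", "might", "could", "possibly", "suggests",
          "appears", "likely", "tentative", "preliminary"}

def hedge_ratio(text: str) -> float:
    """Fraction of tokens that are hedging markers."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return sum(t in HEDGES for t in tokens) / max(len(tokens), 1)

source = ("The treatment may reduce mortality, although preliminary "
          "evidence suggests caution in older patients.")
summary = "The treatment reduces mortality."

# Flag a synthesis that is markedly less hedged than its source.
if hedge_ratio(summary) < 0.5 * hedge_ratio(source):
    print("warning: uncertainty expressed in the source was lost in synthesis")
```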

This Research Topic, therefore, welcomes original research, brief research reports, systematic reviews, and conceptual analysis submissions addressing, but not limited to, the questions listed below, with a particular emphasis on health and healthcare applications alongside other high-stakes domains:

* How do LLMs capture and differentiate linguistic nuances specific to high-stakes language?

* How do models handle pragmatic inference, uncertainty expression, and discourse coherence in high-impact communication?

* To what extent do LLMs recognize and generate register-appropriate language, hedging, modality, and terminological variation in text?

* How do gaps in linguistic competence contribute to errors such as misinformation, biased recommendations, or misinterpretation in risk-sensitive outputs?

* Which probing and evaluation methods best reveal the models’ capacity to encode semantic, pragmatic, and discourse-level knowledge?

* How can insights from the language sciences improve the interpretability, reliability, and human-centeredness of LLM-generated information in mission-critical settings?

Article types and fees

This Research Topic accepts the following article types, unless otherwise specified in the Research Topic description:

  • Brief Research Report
  • Case Report
  • Clinical Trial
  • Community Case Study
  • Conceptual Analysis
  • Curriculum, Instruction, and Pedagogy
  • Data Report
  • Editorial
  • FAIR² Data

Articles that are accepted for publication by our external editors following rigorous peer review incur a publishing fee charged to authors, institutions, or funders.

Keywords: Linguistic Representation Learning, Large Language Models (LLMs), Pragmatics and Discourse Coherence, Uncertainty, Modality, and Evidentiality, High-Stakes NLP

Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

Manuscripts can be submitted to this Research Topic via the main journal or any other participating journal.
