Learning robust linguistic representations from text has long been a central challenge in natural language processing and the language sciences. NLP systems seek to capture syntactic, lexical, semantic, and pragmatic cues in order to generalize to unseen language and complex communicative contexts. Earlier approaches typically modeled isolated linguistic features for narrowly defined tasks, limiting their ability to represent linguistic nuance. In contrast, contemporary pretrained and large language models exhibit strong performance across language understanding and generation, including information retrieval and synthesis, often producing text that resembles expert human discourse. Despite this progress, the mechanisms by which such models learn, organize, and operationalize linguistic rules and structures across multiple levels of language remain poorly understood. This limitation is particularly critical in contexts such as health, where nuanced interpretation, uncertainty expression, and evidence integration are essential. Persistent failures such as hallucination, bias, and misinformation highlight the need for deeper insight into how linguistic representation learning shapes both the capabilities and risks of LLMs in information use.
This Research Topic aims to advance theoretical and empirical understanding of how LLMs acquire, represent, and deploy linguistic knowledge in high-consequence discourse, such as healthcare and legal communication. The first goal is to investigate how LLMs internalize language structures across multiple levels, including morphology, syntax, semantics, pragmatics, and discourse, in contexts where precise interpretation is critical. Contributions may examine the formal properties of learned representations, mechanisms for capturing compositionality and hierarchical structure, handling of ambiguity and polysemy, and the modeling of modality, evidentiality, and register variation. Studies may also probe cross-linguistic or cross-register generalization, the role of training corpora in shaping linguistic competence, and methods for interpreting latent representations in light of linguistic theory.
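To illustrate one family of methods in scope, the sketch below trains a linear probe on frozen hidden states to test whether a linguistic distinction is linearly recoverable from a model's internal representations. It is a minimal sketch under illustrative assumptions: the choice of bert-base-uncased, the pooled final layer, and the toy hedged-versus-assertive labels are placeholders, not a prescribed protocol.

```python
# Minimal probing sketch: is a hedged-vs-assertive distinction linearly
# recoverable from frozen hidden states? Model, layer, and labels are
# illustrative assumptions, not a prescribed protocol.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Toy labeled data: 1 = hedged claim, 0 = unhedged claim.
examples = [
    ("The drug may reduce symptoms in some patients.", 1),
    ("Results suggest a possible association with recovery.", 1),
    ("The findings appear consistent with a modest benefit.", 1),
    ("The drug reduces symptoms.", 0),
    ("The treatment cures the disease.", 0),
    ("The findings confirm a large benefit.", 0),
]

def embed(text, layer=-1):
    # Mean-pool token states from one hidden layer (layer index is a free choice).
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        states = model(**inputs, output_hidden_states=True).hidden_states[layer]
    return states.mean(dim=1).squeeze().numpy()

X = [embed(text) for text, _ in examples]
y = [label for _, label in examples]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))  # training accuracy only; a real probe needs held-out data
```

Layer-wise comparisons of such probes, together with control tasks, are one established way to connect latent representations to claims grounded in linguistic theory.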
The second goal is to explore how these linguistic representations are operationalized in high-stakes information retrieval and synthesis. This includes analyses of how models negotiate semantic coherence, pragmatic inference, and discourse-level structuring in query understanding, summarization, explanation, and retrieval-augmented generation. Contributions may further examine the limitations of modeled linguistic competence, including misrepresentation, uncertainty propagation, and biases in discourse-level output, providing insight into the interplay between formal linguistic structure and the reliability of generated information.
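For concreteness, the sketch below shows the bare retrieval step that retrieval-augmented generation pipelines build on. The corpus, query, and TF-IDF retriever are illustrative assumptions; a real system would pass the top-ranked passages to a language model for synthesis, which is where the questions of coherence and uncertainty propagation raised above arise.

```python
# Bare-bones retrieval step of the kind RAG pipelines build on.
# The corpus, query, and TF-IDF retriever are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Trial data suggest the vaccine may lower hospitalization rates.",
    "The statute defines negligence in terms of a duty of care.",
    "Preliminary results indicate possible side effects in older adults.",
]
query = "Does the vaccine reduce hospitalization?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)
query_vector = vectorizer.transform([query])

# Rank passages by cosine similarity to the query.
scores = cosine_similarity(query_vector, doc_vectors).ravel()
ranked = sorted(zip(scores, corpus), reverse=True)
for score, passage in ranked:
    print(f"{score:.2f}  {passage}")

# The generation step would then condition an LLM on the top passage, e.g.:
# prompt = f"Answer using only this evidence:\n{ranked[0][1]}\n\nQuestion: {query}"
```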
This Research Topic, therefore, welcomes original research, brief research reports, systematic reviews, and conceptual analysis submissions addressing, but not limited to, the questions listed below, with a particular emphasis on health and healthcare applications alongside other high-stakes domains:
* How do LLMs capture and differentiate linguistic nuances specific to high-stakes language?
* How do models handle pragmatic inference, uncertainty expression, and discourse coherence in high-impact communication?
* To what extent do LLMs recognize and generate register-appropriate language, hedging, modality, and terminological variation in text?
* How do gaps in linguistic competence contribute to errors such as misinformation, biased recommendations, or misinterpretation in risk-sensitive outputs?
* Which probing and evaluation methods best reveal models’ capacity to encode semantic, pragmatic, and discourse-level knowledge? (One rudimentary evaluation of this kind is sketched after this list.)
* How can insights from language science improve the interpretability, reliability, and human-centeredness of LLM-generated information in mission-critical settings?
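As flagged above, one rudimentary example of an evaluation targeting hedging and modality is sketched below: it checks whether the hedging markers present in a source text survive in a model's summary. The marker lexicon, the recall metric, and the example texts are illustrative assumptions, not a validated instrument.

```python
# Toy evaluation: do hedging/modality markers in a source text survive in
# generated text? The marker lexicon and metric are illustrative assumptions.
import re

HEDGE_MARKERS = {
    "may", "might", "could", "possibly", "probably", "suggest", "suggests",
    "appear", "appears", "likely", "unlikely", "uncertain", "preliminary",
}

def hedge_set(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return {token for token in tokens if token in HEDGE_MARKERS}

def hedge_recall(source, generated):
    # Fraction of source hedges retained in the generated text.
    source_hedges = hedge_set(source)
    if not source_hedges:
        return 1.0  # nothing to retain
    return len(source_hedges & hedge_set(generated)) / len(source_hedges)

source = "The evidence suggests the therapy may reduce relapse risk."
summary = "The therapy reduces relapse risk."
print(hedge_recall(source, summary))  # 0.0: both hedges were dropped
```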
Article types and fees
Articles that are accepted for publication by our external editors following rigorous peer review incur a publishing fee charged to authors, institutions, or funders.
This Research Topic accepts the following article types, unless otherwise specified in the Research Topic description:
Brief Research Report
Case Report
Clinical Trial
Community Case Study
Conceptual Analysis
Curriculum, Instruction, and Pedagogy
Data Report
Editorial
FAIR² Data
FAIR² Data Direct Submission
General Commentary
Hypothesis and Theory
Methods
Mini Review
Opinion
Original Research
Perspective
Policy and Practice Reviews
Policy Brief
Registered Report
Review
Study Protocol
Systematic Review
Technology and Code
Keywords: Linguistic Representation Learning, Large Language Models (LLMs), Pragmatics and Discourse Coherence, Uncertainty, Modality and Evidentiality, High-Stakes NLP
Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.