<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <title>Frontiers in Artificial Intelligence | Natural Language Processing section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/artificial-intelligence/sections/natural-language-processing</link>
        <description>RSS Feed for Natural Language Processing section in the Frontiers in Artificial Intelligence journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>Sat, 04 Apr 2026 00:45:30 GMT</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1749517</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1749517</link>
        <title><![CDATA[From simulated empathy to structural attunement: Realtime Editable Memory Topology and the evolution of emotionally grounded AI]]></title>
        <pubDate>Tue, 31 Mar 2026 00:00:00 GMT</pubDate>
        <category>Perspective</category>
        <author>John Albanese</author>
        <description><![CDATA[Large language models (LLMs) and retrieval-augmented generation (RAG) systems have achieved remarkable linguistic fluency, and many now implement persistent cross-session memory at the application layer. However, these mechanisms typically rely on external storage and reinjection of stored content rather than structural reorganization of memory relationships. As a result, they remain limited in their ability to integrate affective salience into a dynamically evolving internal memory topology capable of supporting coherent long-term behavior. To address this gap, we introduce Realtime Editable Memory Topology (REMT), an architectural framework for imbuing conversational agents with persistent autobiographical memory organized as an evolving graph of emotionally valenced nodes. REMT formalizes synthetic neuroplasticity through explicit update rules governing edge reinforcement, decay, and pruning, and introduces a bounded Mood Index that modulates retrieval bias and response generation as a function of accumulated affective experience. In this Perspective, we argue that memory-grounded architectures integrating insights from cognitive science, affective computing, and memory-augmented neural systems are necessary for building adaptive conversational agents with stable long-term interactional tendencies. We conclude by outlining a roadmap for empirical validation using an internally developed evaluation framework, with results to be reported in a future Original Research article.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1766899</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1766899</link>
        <title><![CDATA[A multi-layer annotated corpus for information extraction in Russian clinical NLP]]></title>
        <pubDate>Tue, 31 Mar 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Anar Sultangaziyeva</author><author>Madina Sambetbayeva</author><author>Nurzhan Mukazhanov</author><author>Bayangali Abdygalym</author><author>Sandugash Serikbayeva</author>
        <description><![CDATA[Introduction: Clinical exome sequencing reports contain valuable genetic and phenotypic information but are typically stored in unstructured text form, making automated biomedical information extraction challenging. For the Russian language, publicly available annotated corpora for genetic report analysis remain extremely limited. Methods: We present GENEXOM, the first multi-level annotated corpus of Russian-language clinical exome sequencing reports designed for biomedical information extraction. The corpus includes 5,318 reports (318 authentic and 5,000 synthetic) and comprises 16 entity types and 7 relation types aligned with HGVS, OMIM, ClinVar, and ACMG/AMP standards. Annotation was performed in the Label Studio platform by expert geneticists. Baseline transformer models (RuBERT, RuBioBERT, ModernBERT) were fine-tuned for Named Entity Recognition (NER) and Relation Extraction (RE). Results: The annotation achieved span-level F1-IAA = 0.83 and macro κ = 0.79 ± 0.04, indicating substantial inter-annotator agreement. Among the evaluated models, ModernBERT achieved the best performance with F1 = 0.88 ± 0.03 for NER and F1 = 0.836 ± 0.04 for RE on the held-out test set. Discussion: The GENEXOM corpus provides a linguistically and clinically adapted resource for Russian medical NLP and supports downstream tasks such as variant interpretation, phenotype–disease mapping, and biomedical knowledge graph construction. The corpus and accompanying code are publicly available for research purposes.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1768701</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1768701</link>
        <title><![CDATA[Advanced feature selection and temporal attention mechanisms with Bi-LSTM classifier for optimizing emotion recognition in Kashmiri speech]]></title>
        <pubDate>Wed, 18 Mar 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>GH Mohmad Dar</author><author>Radhakrishnan Delhibabu</author>
        <description><![CDATA[This study introduces an advanced methodology for enhancing emotion recognition in Kashmiri speech by leveraging optimized feature selection and integrating temporal attention mechanisms into Long Short-Term Memory (LSTM) networks. A meticulous feature selection process identified key acoustic features, including Mel Frequency Cepstral Coefficients (MFCCs), Linear Predictive Coding (LPC), and other relevant descriptors, as optimal for emotion classification. The incorporation of temporal attention layers significantly improved the model's capacity to capture complex emotional patterns and temporal dynamics within the speech data. The proposed attention-augmented LSTM model achieved an accuracy of 90.2%, outperforming the baseline LSTM model's accuracy of 86%. Notable improvements in precision, recall, and F1-scores across multiple emotional categories further highlight the efficacy of the attention mechanism in capturing subtle emotional variations. In addition to performance gains, the study provides a clear research direction by demonstrating how attention-based temporal modeling can benefit low-resource languages such as Kashmiri, where linguistic and prosodic cues differ significantly from widely studied languages. The findings therefore establish a methodological baseline that supports future SER deployments in digital domains, including chat-based systems, affect-aware agents, and other human–machine interfaces. These findings underscore the model's ability to enhance both the sensitivity and specificity of emotion recognition systems, offering a robust and efficient framework for speech-based emotion analysis. Future work will extend the proposed methodology to multilingual settings and incorporate multimodal information, enabling deeper analysis of emotional expression across diverse linguistic and cultural contexts.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1759136</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1759136</link>
        <title><![CDATA[Graph convolution-based techniques for pragmatic Arabic figurative language classification]]></title>
        <pubDate>Wed, 18 Mar 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Zouheir Banou</author><author>Fatima-Zahra Alaoui</author><author>Sanaa El Filali</author><author>El Habib Benlahmar</author><author>Laila El Jiani</author><author>Hasnae Sakhi</author>
        <description><![CDATA[Figurative language, including euphemism and metonymy, presents significant challenges in natural language processing (NLP) due to its abstract and context-dependent nature, particularly in morphologically rich and low-resource languages like Arabic. This paper introduces a graph-based embedding framework for figurative language classification that captures both syntactic dependencies and semantic relationships using heterogeneous graphs. We propose a configurable pipeline that converts text into structured graphs incorporating lexical, morphological, and syntactic cues, enabling deeper semantic reasoning. These graphs are processed using various graph neural network (GNN) architectures—such as GAT, HANConv, and MixHopConv—designed to model complex linguistic interactions. The approach is evaluated on two Arabic-language tasks: euphemism and metonymy detection. Our results demonstrate that attention-based and multi-hop GNNs outperform both traditional baselines and state-of-the-art transformer models (e.g., AraBERT, XLM-RoBERTa), particularly in metonymy detection where topological cues are more pronounced. HANConv and GAT achieve the highest F1-scores across tasks, while models like GraphConv and SAGEConv offer stability across configurations. We also introduce a validated Arabic lexical ontology for enriching semantic graphs. Our findings highlight the potential of graph-structured embeddings for nuanced linguistic tasks and suggest future directions including cross-lingual transfer, ontology expansion, and application to additional figurative categories.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1737790</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1737790</link>
        <title><![CDATA[An AI-driven conceptual framework for detecting fake news and deepfake content: a systematic review]]></title>
        <pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate>
        <category>Systematic Review</category>
        <author>Bravlyn VC. Moyo</author><author>Tite Tuyikeze</author><author>Fezile Matsebula</author><author>Ibidun C. Obagbuwa</author>
        <description><![CDATA[The rapid advancement of generative artificial intelligence (AI) has enabled the creation of highly realistic synthetic media, commonly referred to as deepfakes, which are increasingly multimodal and difficult to detect. While these technologies offer creative and commercial potential, they also pose critical challenges related to misinformation, media trust, and societal harm. Despite the growing body of research, existing reviews remain fragmented, often separating technical detection advances from social and governance considerations. This study addresses this gap through a systematic review conducted in accordance with PRISMA guidelines across IEEE Xplore, Scopus, ACM Digital Library, and Web of Science. From an initial set of 120 database records, complemented by citation chaining, 34 studies published between 2014 and 2025 were included for analysis. Eighteen studies focused on deepfake generation and detection models, eight examined social and behavioural implications, and eight addressed ethical and regulatory frameworks. Thematic synthesis reveals a clear methodological shift from convolutional neural networks toward transformer- and CLIP-based architectures, alongside the emergence of large-scale benchmark datasets. However, persistent challenges remain in multimodal detection, cross-dataset generalization, explainability–robustness trade-offs, and the translation of governance principles into deployable systems. This review contributes an integrated conceptual framework that operationally connects detection technologies, explainable AI (XAI), and governance mechanisms through explicit feedback loops. Future research directions emphasize robust multimodal benchmarks, retrieval-augmented detection systems, and interdisciplinary approaches that align technical innovation with ethical and policy safeguards.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1665992</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1665992</link>
        <title><![CDATA[An efficient strategy for fine-tuning large language models]]></title>
        <pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Benjamin Marsh</author><author>Adam Michaleas</author><author>Darrell O. Ricke</author><author>Shaun Monera</author><author>Shriya Zembruski</author>
        <description><![CDATA[Introduction: Large Language Models (LLMs) achieve strong performance on many Natural Language Processing tasks, but adapting them to domain-specific applications is resource-intensive due to the cost of curating task-specific datasets and the compute required for fine-tuning. This work proposes an end-to-end strategy for rapidly fine-tuning LLMs for domain-specific tasks when both data and compute are limited. Methods: The strategy uses Distilling Step-by-Step (DSS) for dataset development and model training, where a teacher model generates task labels and intermediate rationales via Chain-of-Thought prompting for a natural-language-to-Query-DSL structured generation task. Using the resulting supervision, we benchmark three fine-tuning modalities through hyperparameter sweeps: full-precision fine-tuning, Low-Rank Adaptation (LoRA), and Quantized LoRA (QLoRA). To isolate the effect of rationale supervision, we additionally conduct an ablation study comparing DSS training (label + rationale supervision) against a label-only configuration. Results: Across the evaluated configurations, DSS combined with full-precision fine-tuning yields the strongest overall performance. Under resource constraints, DSS with LoRA provides an effective performance-efficiency tradeoff, and DSS with QLoRA enables training under tighter GPU memory budgets while maintaining competitive performance. In the parameter-efficient regimes, an alpha-to-rank ratio of 4:1 provides a consistent balance of performance and compute consumption across the explored settings. Discussion: These findings support a practical process for resource-constrained domain adaptation: use DSS to efficiently construct datasets, then select the fine-tuning modality based on available compute (full-precision when feasible; LoRA or QLoRA when memory-limited). The proposed workflow offers a general guide for efficiently fine-tuning LLMs for domain-specific tasks with limited data availability.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1706369</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1706369</link>
        <title><![CDATA[Impact of Natural Language Processing models on diagnosis and decision-making in healthcare, business, education, and sports: a review]]></title>
        <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
        <category>Systematic Review</category>
        <author>Aryan Choudhary</author><author>Sireesha Pamidimokkala</author><author>Krithiga R.</author><author>Bhavadharini R. M.</author>
        <description><![CDATA[Natural Language Processing (NLP) now influences almost every field, including business, healthcare, and sports, by enabling advanced interactions with human language and providing analytics. In business, NLP has been revolutionary, improving customer service through advanced chatbots, sentiment analysis, and automated content generation, which enhances efficiency, personalization, and, most importantly, decision-making. In healthcare, NLP is of crucial importance in decoding unstructured data such as medical records, supporting diagnostic accuracy, and making patient communication smoother, leading to better outcomes and improved efficiency. In sports, NLP provides critical insights through performance analytics, media content interpretation, and improved fan engagement, turning raw data into a practical advantage. The aim of this review is to systematically evaluate NLP's effectiveness across these sectors, address existing and emerging challenges, and propose directions for future research. Through the integration of case studies and performance assessments, we seek to explain clearly how NLP promotes innovation, resolves complex issues, and contributes to advances in these domains.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1708993</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1708993</link>
        <title><![CDATA[Small language models applied in text summarization task of health-related news to improve public health audit: an experimental case study]]></title>
        <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Alysson Guimarães</author><author>Methanias Colaço Junior</author><author>Samuel Santana De Almeida</author><author>Gabriely Garcia Ferreira de Araújo</author><author>Raphael Silva Fontes</author><author>Helder Prado</author><author>Luca Pareja Credidio Freire Alves</author><author>Natan Matos</author><author>Ricardo Alexsandro de Medeiros Valentim</author><author>João Paulo Queiroz dos Santos</author>
        <description><![CDATA[Context: Fraud and corruption are among the main crimes affecting public institutions, with the healthcare sector being particularly vulnerable due to its structural complexity, the coexistence of public and private providers, the large number of actors involved, the globalized nature of supply chains, the high financial costs, and the information asymmetry among stakeholders. These factors weaken healthcare systems, resulting in resource waste, reduced resilience during medical emergencies, and limited access to essential services. Objective: This study aims to evaluate automatic text summarization methods by comparing the quality of machine-generated summaries with those produced by humans, from the perspective of Data Scientists and SUS Auditors, within the context of audits carried out by the National Department of Unified Health System (Sistema Único de Saúde—SUS) Auditing (AudSUS). Method: A controlled experiment was conducted to assess the performance of Small Language Models (SLMs) in summarization tasks, using the metrics ROUGE-N, ROUGE-L, BLEU, METEOR, and BERTScore. In addition, the consistency of results across 35 runs, their contribution to reducing information overload, and their pairwise performances were evaluated. Results: The models NousResearch/Hermes-3-Llama-3.2-3B, Qwen/Qwen2.5-7B-Instruct, and meta-llama/Llama-3.2-3B-Instruct achieved the highest average performances across all metrics, standing out for their ability to preserve contextual meaning and synthesize essential information more effectively than human-generated summaries. Conclusion: The findings highlight the potential of SLMs as tools to reduce information overload, thereby enhancing the effectiveness of the analytical phase of audits and enabling faster preparation of teams for the operational stage.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1666674</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1666674</link>
        <title><![CDATA[TEGAA: transformer-enhanced graph aspect analyzer with semantic contrastive learning for implicit aspect detection]]></title>
        <pubDate>Wed, 04 Feb 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Piyush Kumar Soni</author><author>Radhakrishna Rambola</author>
        <description><![CDATA[Implicit aspect detection aims to identify aspect categories that are not explicitly mentioned in text, but existing models struggle with four persistent challenges: aspect ambiguity, where multiple latent aspects are implied by the same expression, data imbalance and sparsity of implicit cues, contextual noise and syntactic variability in unstructured user reviews, and aspect drift, where the relevance of implicit cues changes across sentences or domains. To address these issues, this paper proposes the Transformer-Enhanced Graph Aspect Analyzer (TEGAA), a unified framework that tightly integrates dynamic expert routing, semantic representation refinement, and hierarchical graph reasoning. First, a Dynamic Expert Transformer (DET) equipped with a Dynamic Adaptive Expert Engine (DAEE) mitigates syntactic complexity and contextual noise by dynamically routing tokens to specialized expert sub-networks based on contextual and syntactic–semantic cues, enabling robust feature extraction for ambiguous implicit expressions. Second, Semantic Contrastive Learning (SCL) directly addresses data imbalance and weak implicit signals by enforcing semantic coherence among contextually related samples while increasing separability from irrelevant ones, thereby improving discriminability of sparse implicit aspect cues. Third, implicit aspect ambiguity and aspect drift are handled through a Graph-Enhanced Hierarchical Aspect Detector (GE-HAD), which models word- and sentence-level dependencies via context-aware graph attention. The incorporation of Attention Sinks prevents dominant but irrelevant tokens from overshadowing subtle implicit cues, while Pyramid Pooling aggregates multi-scale contextual information to stabilize aspect inference across varying linguistic scopes. Finally, an iterative feedback loop aligns graph-level reasoning with transformer-level expert routing, enabling adaptive refinement of aspect representations. 
Experiments on three benchmark datasets—Mobile Reviews, SemEval14, and Sentihood—demonstrate that TEGAA consistently outperforms state-of-the-art methods, achieving F1-scores above 0.88, precision above 0.89, recall above 0.87, accuracy exceeding 89%, and AUC values above 0.89. These results confirm TEGAA’s effectiveness in resolving implicit aspect ambiguity, handling noisy and imbalanced data, and maintaining robust performance across domains.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1691074</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1691074</link>
        <title><![CDATA[Private speech: similarities between a large language model and children]]></title>
        <pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate>
        <category>Brief Research Report</category>
        <author>Zhiyu Liang</author><author>Leon On Tay</author><author>Simon Dennis</author>
        <description><![CDATA[This study investigates the capability of a non-reasoning large language model (GPT-4o) to generate private speech and evaluates its similarity to human private speech. We placed the model in a simulated solitary block-construction scenario via textual prompts, eliciting and classifying its self-directed utterances using an established semantic framework for categorizing private speech in children. The distribution of these categories was compared to two human benchmarks: a classic block-construction study and a more recent experiment employing a similar task setting. Analysis using scatter plots and Pearson correlation coefficients revealed a striking pattern: GPT-4o’s semantic profile showed negligible similarity to the classic benchmark (r = 0.01) but very strong similarity to the recent benchmark (r = 0.93). This discrepancy is interpreted as stemming from differences in task nature, namely a goal-directed, scaffolded task versus self-determined, unscaffolded play, which exert a stronger influence on speech content than the difference in experimental subjects between GPT-4o and children. In an exploratory study, we tasked GPT-3.5-Turbo-instruct with serial recall and observed incidental private speech, indicating that the phenomenon extends across contexts. This provides an avenue for investigating LLM replication of private speech and, potentially, computational consciousness.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1679962</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1679962</link>
        <title><![CDATA[Advancing cyberbullying detection in low-resource languages: a transformer- stacking framework for Bengali]]></title>
        <pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Md. Nesarul Hoque</author><author>Rudra Pratap Deb Nath</author><author>Abu Nowshed Chy</author><author>Debasish Ghose</author><author>Md Hanif Seddiqui</author>
        <description><![CDATA[Cyberbullying on social networks has emerged as a pressing global issue, yet research in low-resource languages such as Bengali remains underdeveloped due to the scarcity of high-quality datasets, linguistic resources, and targeted methodologies. Many existing approaches overlook essential language-specific preprocessing, neglect the integration of advanced transformer-based models, and do not adequately address model validation, scalability, and adaptability. To address these limitations, this study introduces three Bengali-specific preprocessing strategies to enhance feature representation. It then proposes Transformer-stacking, an effective hybrid detection framework that combines three transformer models, XLM-R-base, multilingual BERT, and Bangla-Bert-Base, via a stacking strategy with a multi-layer perceptron classifier. The framework is evaluated on a publicly available Bengali cyberbullying dataset comprising 44,001 samples across both binary (Sub-task A) and multiclass (Sub-task B) classification settings. Transformer-stacking achieves an F1-score of 93.61% and an accuracy of 93.62% for Sub-task A, and an F1-score and accuracy of 89.23% for Sub-task B, outperforming eight baseline transformer models, four transformer ensemble techniques, and recent state-of-the-art methods. These improvements are statistically validated using McNemar's test. Furthermore, experiments on two external Bengali datasets, focused on hate speech and abusive language, demonstrate the model's scalability and adaptability. Overall, Transformer-stacking offers an effective and generalizable solution for Bengali cyberbullying detection, establishing a new benchmark in this underexplored domain.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1725853</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1725853</link>
        <title><![CDATA[Positive sentiments in early academic literature on DeepSeek: a cross-disciplinary mini review]]></title>
        <pubDate>Mon, 12 Jan 2026 00:00:00 GMT</pubDate>
        <category>Mini Review</category>
        <author>Yuxing He</author><author>Angie Giangan</author><author>Nam Vu</author><author>Casey Watters</author>
        <description><![CDATA[DeepSeek is a free and self-hostable large language model (LLM) that recently became the most downloaded app across 156 countries. Because early academic literature on ChatGPT was predominantly critical of the model, this mini review examines how DeepSeek is being evaluated across academic disciplines. The review analyzes available articles with DeepSeek in the title, abstract, or keywords, using the VADER sentiment analysis library. Due to limitations in comparing sentiment across languages, we excluded Chinese literature from our selection. We found that Computer Science, Engineering, and Medicine are the most prominent fields studying DeepSeek, showing an overall positive sentiment. Notably, Computer Science had the highest mean sentiment and the most positive articles. Other fields of interest included Mathematics, Business, and Environmental Science. While there is substantial academic interest in DeepSeek’s practicality and performance, discussions on its political or ethical implications are limited in academic literature. In contrast to ChatGPT, where all early literature carried a negative sentiment, DeepSeek literature is mainly positive. This study enhances our understanding of DeepSeek’s reception in the scientific community and suggests that further research could explore regional perspectives.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1550604</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1550604</link>
        <title><![CDATA[RiCoRecA: rich cooking recipe annotation schema]]></title>
        <pubDate>Mon, 12 Jan 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Filippos Ventirozos</author><author>Mauricio Jacobo-Romero</author><author>Haifa Alrdahi</author><author>Sarah Clinch</author><author>Riza Batista-Navarro</author>
        <description><![CDATA[Despite recent advancements, modern kitchens, at best, have one or more isolated (non-communicating) “smart” devices. The vision of having a fully fledged ambient kitchen where devices know what to do and when has yet to be realized. To address this, we present RiCoRecA, a novel schema for parsing cooking recipes into a workflow representation suitable for automation, a step in that direction. Methodologically, the schema requires a number of information extraction tasks, i.e., annotating named entities, identifying relations between them, coreference resolution, and entity tracking. RiCoRecA differs from previously reported approaches in that it learns these different information extraction tasks using one joint model. We also provide a dataset containing annotations that follow this schema. Furthermore, we compared two transformer-based models for parsing recipes into workflows, namely, PEGASUS-X and LongT5. Our results demonstrate that PEGASUS-X surpassed LongT5 on all of the annotation tasks. Specifically, PEGASUS-X surpassed LongT5 by 39% in terms of F-Score when averaging the performance on all the tasks; it demonstrated almost human-like performance.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1703949</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1703949</link>
        <title><![CDATA[Understanding user perceptions of DeepSeek: insights from sentiment, topic and network analysis using a Reddit-based study]]></title>
        <pubDate>Tue, 06 Jan 2026 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Naisarg Patel</author><author>Rajesh Sharma</author><author>Prakash Lingasamy</author><author>Vino Sundararajan</author><author>Sajitha Lulu Sudhakaran</author><author>Vijayachitra Modhukur</author>
        <description><![CDATA[Introduction: The launch of DeepSeek, a Chinese open-source generative AI model, generated substantial discussion regarding its capabilities and implications. The r/deepseek subreddit emerged as a key forum for real-time public evaluation. Analyzing this discourse is essential for understanding the sociotechnical perceptions shaping the integration of emerging AI systems. Methods: We analyzed 46,649 posts and comments from r/deepseek (January–May 2025) using a computational framework combining VADER sentiment analysis, Hartmann emotion classification, BERTopic for thematic modeling, hyperlink extraction, and directed network analysis. Data preprocessing included cleaning, normalization, and lemmatization. We also examined correlations between sentiment/emotion scores and dominant topics. Results: Sentiment was predominantly positive (posts: 47.23%; comments: 44.26%), with neutral sentiment comprising ~30% of content. The most frequent emotion was neutrality, followed by surprise and fear, indicating ambivalent user reactions. Prominent topics included open-source AI models, DeepSeek usage, device compatibility, comparisons with ChatGPT, and censorship concerns. Hyperlink analysis indicated strong engagement with GitHub, Hugging Face, and DeepSeek’s own services. Network analysis revealed a fragmented but active community, depicting Open-Source AI Models as the most cohesive cluster. Discussion: Community discourse framed DeepSeek as both a technical tool and a geopolitical issue. Enthusiasm centered on its performance, accessibility, and open-source nature, while concerns were voiced about censorship, data privacy, and potential ideological influence. The integrated analysis shows that collective perception emerged through decentralized, dialogic engagement, reflecting broader sociotechnical tensions related to openness, trust, and legitimacy in global AI development.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1694388</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1694388</link>
        <title><![CDATA[GECOBench: a gender-controlled text dataset and benchmark for quantifying biases in explanations]]></title>
        <pubdate>2026-01-05T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Rick Wilming</author><author>Artur Dox</author><author>Hjalmar Schulz</author><author>Marta Oliveira</author><author>Benedict Clark</author><author>Stefan Haufe</author>
        <description><![CDATA[Large pre-trained language models have become a crucial backbone for many downstream tasks in natural language processing (NLP). Because they are trained on a plethora of data containing a variety of biases, such as gender biases, they can inherit such biases in their weights, potentially affecting their prediction behavior. However, it is unclear to what extent these biases also affect feature attributions generated by applying “explainable artificial intelligence” (XAI) techniques, possibly in unfavorable ways. To systematically study this question, we create a gender-controlled text dataset, GECO, in which the alteration of grammatical gender forms induces class-specific words and provides ground truth feature attributions for gender classification tasks. This enables an objective evaluation of the correctness of XAI methods. We apply this dataset to the pre-trained BERT model, which we fine-tune to different degrees, to quantitatively measure how pre-training induces undesirable bias in feature attributions and to what extent fine-tuning can mitigate such explanation bias. To this end, we provide GECOBench, a rigorous quantitative evaluation framework for benchmarking popular XAI methods. We show a clear dependency between explanation performance and the number of fine-tuned layers, where XAI methods are observed to benefit particularly from fine-tuning or complete retraining of embedding layers.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1618791</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1618791</link>
        <title><![CDATA[Designing intelligent chatbots with ChatGPT: a framework for development and implementation]]></title>
        <pubdate>2026-01-05T00:00:00Z</pubdate>
        <category>Systematic Review</category>
        <author>Sajjad Hyder</author><author>Javeed Kittur</author>
        <description><![CDATA[Background: The rapid evolution of interactive AI has reshaped human-computer interaction, with ChatGPT emerging as a key tool for chatbot development. Industries such as healthcare, customer service, and education increasingly integrate chatbots, highlighting the need for a structured development framework. Purpose: This study proposes a framework for designing intelligent chatbots using ChatGPT, focusing on user experience, hybrid design models, prompt engineering, and system limitations. The framework aims to bridge the gap between technical innovation and real-world application. Methods: A systematic literature review (SLR) was conducted, analyzing 40 relevant studies. The research was structured around three key questions: (1) How do user experience and engagement influence chatbot performance? (2) How do hybrid design models improve chatbot performance? (3) What are the limitations of using ChatGPT, and how does prompt engineering affect responses? Results: The findings emphasize that well-designed user interactions enhance engagement and trust. Hybrid models integrating rule-based and machine learning techniques improve chatbot functionality. However, challenges such as response inconsistencies, ethical concerns, and prompt sensitivity require careful consideration. This study proposes a framework for the design, development, and implementation of effective chatbots with ChatGPT. Conclusion: This study provides a structured framework for chatbot development with ChatGPT, offering insights into optimizing user experience, leveraging hybrid design, and mitigating limitations. The proposed framework serves as a practical guide for researchers, developers, and businesses aiming to create intelligent, user-centric chatbot solutions.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1654496</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1654496</link>
        <title><![CDATA[Text summarization method of argumentative discourse by combining the BERT-transformer model]]></title>
        <pubdate>2025-11-28T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Yaser Altameemi</author><author>Mohammed Altamimi</author><author>Adel Alkhalil</author><author>Diaa Uliyan</author><author>Romany F. Mansour</author>
        <description><![CDATA[Text summarization has become an essential practice, as it carefully presents the main ideas of a text. The current study aims to provide a methodology for summarizing complex texts such as argumentative discourse. Extractive and abstractive summarization techniques have recently gained significant attention. Each has its own limitations that reduce how well a summary covers the main points of a text, but combining them exploits the strengths of each to improve both summarization performance and summary generation quality. This paper presents a novel extractive-abstractive text summarization method that ensures coverage of the main points of the entire text. It is based on combining Bidirectional Encoder Representations from Transformers (BERT) and transfer learning. Using a dataset comprising two UK parliamentary debates, the study shows that the proposed method effectively summarizes the main points. Comparing extractive and abstractive summarization, the experiment used the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) sets of metrics and achieved scores of 30.1, 9.60, and 27.9 for the first debate, and 36.2, 11.80, and 31.5 for the second, using ROUGE-1, ROUGE-2, and ROUGE-L metrics, respectively.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1662202</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1662202</link>
        <title><![CDATA[Analysis of article screening and data extraction performance by an AI systematic literature review platform]]></title>
        <pubdate>2025-11-20T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Kelsie Cassell</author><author>Abiodun Ologunowa</author><author>Majid Rastegar-Mojarad</author><author>Bianca Chun</author><author>Yi-Ling Huang</author><author>Dong Wang</author><author>Nicole Cossrow</author>
        <description><![CDATA[Background: Systematic literature reviews (SLRs) are critical to health research and decision-making but are often time- and labor-intensive. Artificial intelligence (AI) tools like large language models (LLMs) provide a promising way to automate these processes. Methods: We conducted a systematic literature review on the cost-effectiveness of adult pneumococcal vaccination and prospectively assessed the performance of our AI-assisted review platform, Intelligent Systematic Literature Review (ISLaR) 2.0, compared to expert researchers. Results: ISLaR demonstrated high accuracy (0.87 full-text screening; 0.86 data extraction), precision (0.88; 0.86), and sensitivity (0.91; 0.98) in article screening and data extraction tasks, but lower specificity (0.79; 0.42), especially when extracting data from tables. The platform reduced abstract and full-text screening time by over 90% compared to human reviewers. Conclusion: The platform has strong potential to reduce reviewer workload but requires further development.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1635436</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1635436</link>
        <title><![CDATA[Testing network clustering algorithms with natural language processing]]></title>
        <pubdate>2025-11-13T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Ixandra Achitouv</author><author>David Chavalarias</author><author>Bruno Gaume</author>
        <description><![CDATA[Introduction: We propose a hybrid methodology to evaluate the alignment between structural communities inferred from interaction networks and the linguistic coherence of users' textual production in online social networks. Understanding whether community structure reflects language use allows for a more nuanced validation of Community Detection Algorithms (CDAs) beyond assuming their outputs as ground truth. Methods: Using Twitter data on climate change discussions, we compare several CDAs by training Natural Language Processing Classification Algorithms (NLPCA), such as BERTweet-based models, on the communities they generate. Classification accuracy serves as a proxy for the semantic coherence of CDA-induced groups. This comparative scoring approach offers a self-consistent framework for evaluating CDA performance without requiring manually annotated labels. We also introduce a coverage–precision trade-off metric to assess community-level performance. Results: Our results show that the best CDA/NLPCA combinations predict a user's community with over 85% accuracy using only three short sentences. This demonstrates a strong alignment between structural and linguistic patterns in online discourse. Discussion: Our framework enables scoring CDAs based on semantic predictability and allows prediction of community membership from minimal textual input. It offers practical benefits, such as providing proxy labels for low-supervision NLP tasks, and is adaptable to other social platforms. Limitations include potential noise in CDA-generated labels, but the approach offers a generalizable method for evaluating CDA performance and the coherence of online social groups.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1674927</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1674927</link>
        <title><![CDATA[Accelerating earth science discovery via multi-agent LLM systems]]></title>
        <pubdate>2025-11-12T00:00:00Z</pubdate>
        <category>Perspective</category>
        <author>Dmitrii Pantiukhin</author><author>Boris Shapkin</author><author>Ivan Kuznetsov</author><author>Antonia Anna Jost</author><author>Nikolay Koldunov</author>
        <description><![CDATA[This Perspective explores the transformative potential of multi-agent systems (MAS) powered by Large Language Models (LLMs) in the geosciences. Users of geoscientific data repositories face challenges due to the complexity and diversity of data formats, inconsistent metadata practices, and a considerable number of unprocessed datasets. MAS offers a promising way to improve scientists’ interaction with geoscientific data by enabling intelligent data processing, natural language interfaces, and collaborative problem-solving capabilities. We illustrate this approach with “PANGAEA GPT,” a specialized MAS pipeline integrated with the diverse PANGAEA database for Earth & Environmental Science, demonstrating how MAS-driven workflows can effectively manage complex datasets and accelerate scientific discovery. We discuss how MAS can address current data challenges in geosciences, highlight advancements in other scientific fields, and propose future directions for integrating MAS into geoscientific data processing pipelines. In this Perspective, we show how MAS can fundamentally improve data accessibility, promote cross-disciplinary collaboration, and accelerate geoscientific discoveries.]]></description>
      </item>
      </channel>
    </rss>