BRIEF RESEARCH REPORT article
Front. Artif. Intell.
Sec. Language and Computation
Hybrid Artificial Intelligence Architectures for Automatic Text Correction in the Kazakh Language
Provisionally accepted
- 1Turan University, Almaty, Kazakhstan
- 2Home Credit Bank JSC, Almaty, Kazakhstan
- 3Narxoz University, Almaty, Kazakhstan
Kazakh is an agglutinative, morphologically rich language, which makes natural language processing (NLP) tools for it difficult to build. Traditional rule-based analyzers provide full coverage but lack flexibility; statistical and neural models handle disambiguation more effectively, yet require large annotated corpora and substantial computational resources. This paper presents a hybrid morphological analyzer that integrates Finite-State Transducers (FST), Conditional Random Fields (CRF), and transformer-based architectures (KazRoBERTa, mBERT). For the experiments, a new corpus, KazMorphCorpus-2025, was created, consisting of 150,000 sentences from diverse domains annotated for morphological analysis. Experimental evaluation showed that the KazRoBERTa model consistently outperforms mBERT in accuracy, F1-score, and prediction speed. The hybrid architecture combines the exhaustive coverage of FST with the contextual disambiguation of neural networks, reducing errors associated with homonymy, borrowings, and long affixal chains. The results indicate that the proposed system balances accuracy, efficiency, and scalability. The study underscores the practical significance of hybrid approaches for tasks such as spell checking, information retrieval, and machine translation in the Kazakh language, as well as their potential transferability to other low-resource Turkic languages. Future work will include the expansion of the corpus, integration of KazBERT and mBERT models, and validation of the proposed approach in applied NLP systems.
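The division of labor described above (exhaustive candidate generation followed by contextual disambiguation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `TOY_ANALYSES` lexicon stands in for a real FST, the tag names are invented, and `toy_score` is a hand-written rule standing in for the CRF or transformer scorer. It decodes greedily from left to right, which is simpler than the sequence-level decoding a CRF performs, and it is not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lexicon standing in for an FST: each surface form maps to
# every analysis the rule-based component would propose (tags are illustrative).
TOY_ANALYSES: Dict[str, List[str]] = {
    "ат": ["ат+N+Nom", "ат+V+Imp"],  # homonym: noun reading vs. verb imperative
    "жүйрік": ["жүйрік+Adj"],
    "оқ": ["оқ+N+Nom"],
}


def generate_candidates(token: str) -> List[str]:
    """Stand-in for FST lookup: return all candidate analyses for a token."""
    return TOY_ANALYSES.get(token, [token + "+Unk"])


def toy_score(prev_analysis: Optional[str], candidate: str) -> float:
    """Stand-in for a learned contextual scorer (hand-written rule, assumption)."""
    prev_pos = prev_analysis.split("+")[1] if prev_analysis else None
    pos = candidate.split("+")[1]
    if prev_pos == "Adj" and pos == "N":
        return 1.0  # adjective followed by noun
    if prev_pos == "N" and pos == "V":
        return 1.0  # object noun followed by verb
    return 0.0


def analyze(
    tokens: List[str],
    score: Callable[[Optional[str], str], float] = toy_score,
) -> List[str]:
    """Greedy left-to-right disambiguation over the generated candidates."""
    result: List[str] = []
    prev: Optional[str] = None
    for tok in tokens:
        candidates = generate_candidates(tok)
        # Ties resolve to the first candidate listed in the lexicon.
        best = max(candidates, key=lambda c: score(prev, c))
        result.append(best)
        prev = best
    return result
```

In this toy setup, `analyze(["жүйрік", "ат"])` selects the noun reading of "ат", while `analyze(["оқ", "ат"])` selects the verb reading. The candidate set comes entirely from the rule-based side, so the scorer only has to choose among analyses that are already morphologically valid.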
Keywords: Kazakh language, morphological analysis, hybrid architecture, machine learning, KazRoBERTa, mBERT, natural language processing (NLP)
Received: 24 Sep 2025; Accepted: 12 Nov 2025.
Copyright: © 2025 Baitenova, Tussupova, Mambetov, Munaitbas and Mukhamejanova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Gulnar Mukhamejanova, gulnar.mukhamedzhanova@narxoz.kz
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
