ORIGINAL RESEARCH article
Front. Commun.
Sec. Language Communication
This article is part of the Research Topic: Artificial Intelligence for Technology Enhanced Learning
PhonoMetric: A Dual-Metric Engine for Real-Time English Language Accent Evaluation and Personalized Speech Training for Indian Learners
Provisionally accepted
Mepco Schlenk Engineering College, Sivakasi, India
The core objective of this study is to develop a novel method for measuring and improving spoken English pronunciation accuracy relative to a target accent, using current speech processing and information retrieval methods. The system uses an ECAPA-TDNN model, fine-tuned on American-accented speech, to extract speaker embeddings from the user's audio. These embeddings are compared with embeddings of reference accent speech samples using cosine similarity to produce an Accent Similarity Score (ASS). In parallel, the user's speech is transcribed with the open-source Whisper ASR model and then aligned with a reference sentence at the phoneme level using a forced alignment tool. Users are automatically classified by proficiency level (Beginner, Intermediate, Advanced) on the basis of semantic and phonetic closeness and measures of comprehensibility errors. For training, the system uses the user's fluency profile to build a tailored YouTube query through SerpAPI, returning relevant, high-quality pronunciation resources that take the user's native-language background and accent gaps into account. An experimental study was conducted with thirty undergraduate students. The evaluation shows that the dual-metric engine offers a scalable and adaptable solution for real-time accent evaluation, with classification accuracies of 91.3%, 88.6%, and 93.1% for beginner, intermediate, and advanced users, respectively. The results also show a strong negative correlation (r = -0.82) between phoneme error rate (PER) and ASS, and users gave the system a satisfaction score of 4.6/5 in initial usability studies.
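The two metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how reference embeddings are aggregated or how PER is computed, so the sketch assumes the ASS is the mean cosine similarity to the reference embeddings and that PER is the Levenshtein distance between phoneme sequences divided by the reference length. The function names and the toy inputs are hypothetical; in the real system the vectors would come from the ECAPA-TDNN model and the phoneme sequences from the forced aligner.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def accent_similarity_score(
    user_embedding: Sequence[float],
    reference_embeddings: Sequence[Sequence[float]],
) -> float:
    """ASS as the mean cosine similarity to the references (an assumption)."""
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    sims = [cosine_similarity(user_embedding, ref) for ref in reference_embeddings]
    return sum(sims) / len(sims)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev: List[int] = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER as edit distance over reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference phoneme sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under these assumptions, a learner whose phoneme sequence drifts further from the reference gets a higher PER, and one whose embedding drifts further from the reference accent gets a lower ASS, which is consistent with the negative correlation reported between the two.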
Keywords: Accent Evaluation, ECAPA-TDNN, Language Learning Automation, Pronunciation feedback, Speaker embeddings, Whisper ASR
Received: 13 Sep 2025; Accepted: 12 Dec 2025.
Copyright: © 2025 Soundarraj, Anantharajan and Loganathan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Saranraj Loganathan
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
