ORIGINAL RESEARCH article
Front. Public Health
Sec. Digital Public Health
Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1608241
BERTopic_Teen: A Multi-Module Optimization Approach for Short Text Topic Modeling in Adolescent Health
Provisionally accepted
- 1 Sichuan Agriculture University, Cheng'du, China
- 2 Sichuan Agricultural University, Ya'an, Sichuan, China
- 3 Hohai University, Chang'zhou, China
Adolescent health has become a critical concern in the digital era, as social media platforms emerge as vital sources of real-time behavioral data for informing sustainable and equitable public health strategies. However, conventional topic modeling methods often struggle with the semantic sparsity and noise inherent in short-form texts. This study proposes BERTopic_Teen, an enhanced topic modeling framework optimized for adolescent health-related tweets. The model incorporates three key innovations: a Popularity Deviation Regularizer (PDR), which suppresses high-frequency generic terms and amplifies domain-specific vocabulary; a Dynamic Document Embedding Optimizer (DDEO), which adaptively selects the optimal UMAP dimensionality based on silhouette scores; and a Probabilistic Reassignment Matrix (PRM), which reassigns outlier documents to relevant topic clusters. On a dataset of 64,441 tweets (61,039 successfully classified), BERTopic_Teen outperforms LDA, NMF, Top2Vec, and the original BERTopic on all key evaluation metrics. It achieves a 16.1% improvement in topic coherence (NPMI = 0.2184), higher topic diversity (TD = 0.9935), and lower perplexity (1.7214), indicating superior semantic clarity, topic distinctiveness, and modeling stability. These findings suggest that BERTopic_Teen offers a robust solution for extracting meaningful topics from social media data and advancing public health surveillance.
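For readers unfamiliar with the two headline metrics, the following minimal sketch shows how topic diversity (TD) and NPMI coherence are commonly computed. It is not the authors' evaluation code, which the abstract does not describe. The function names `topic_diversity` and `npmi_coherence` are hypothetical, and NPMI is estimated here from document-level co-occurrence, which is an assumption (sliding-window estimates are also common).

```python
import math
from itertools import combinations


def topic_diversity(topics):
    """Share of unique words among the top words of all topics (1.0 = no overlap)."""
    all_words = [word for topic in topics for word in topic]
    return len(set(all_words)) / len(all_words)


def npmi_coherence(topics, documents):
    """Mean pairwise NPMI per topic, averaged over topics.

    Probabilities are estimated from document-level co-occurrence
    (an assumption of this sketch, not taken from the article).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for w1, w2 in combinations(topic, 2):
            p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
            if p12 == 0.0:
                # The pair never co-occurs: NPMI takes its lower limit.
                pair_scores.append(-1.0)
            elif p12 == 1.0:
                # Both words are in every document, so NPMI is 0/0;
                # this sketch scores the pair as 0.0 by convention.
                pair_scores.append(0.0)
            else:
                pmi = math.log(p12 / (p1 * p2))
                pair_scores.append(pmi / -math.log(p12))
        topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores)


if __name__ == "__main__":
    # Toy, made-up corpus for illustration only.
    docs = [
        ["sleep", "stress", "exam"],
        ["sleep", "stress"],
        ["vaping", "school"],
        ["vaping", "friends"],
    ]
    topics = [["sleep", "stress"], ["vaping", "school"]]
    print("TD:", topic_diversity(topics))
    print("NPMI:", npmi_coherence(topics, docs))
```

On this toy corpus the two topics share no words, so TD is 1.0. "sleep" and "stress" always appear together (NPMI of 1.0), while "vaping" and "school" co-occur in only one of the two documents containing "vaping" (NPMI of 0.5), which gives a mean coherence of 0.75.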
Keywords: Adolescent Health, Social media analytics, Topic Modeling, BERTopic, Health Systems
Received: 11 Apr 2025; Accepted: 23 Jul 2025.
Copyright: © 2025 Feng, Chen, Zhang, Huang, Zhang and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Yiqiang Feng, Sichuan Agriculture University, Cheng'du, China
Siyu He, Sichuan Agriculture University, Cheng'du, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.