ORIGINAL RESEARCH article

Front. Public Health

Sec. Digital Public Health

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1608241

BERTopic_Teen: A Multi-Module Optimization Approach for Short Text Topic Modeling in Adolescent Health

Provisionally accepted
Yiqiang Feng1*, Ziao Chen2, Yuxin Zhang2, Wenyuan Huang2, Xuanming Zhang3, Siyu He1*
  • 1Sichuan Agriculture University, Cheng'du, China
  • 2Sichuan Agricultural University, Ya'an, Sichuan, China
  • 3Hohai University, Chang'zhou, China

The final, formatted version of the article will be published soon.

Adolescent health has become a critical concern in the digital era, as social media platforms emerge as vital sources of real-time behavioral data for informing sustainable and equitable public health strategies. However, conventional topic modeling methods often struggle with the semantic sparsity and noise inherent in short-form texts. This study proposes BERTopic_Teen, an enhanced topic modeling framework optimized for adolescent health-related tweets. The model incorporates three key innovations: a Popularity Deviation Regularizer (PDR) that suppresses high-frequency generic terms and amplifies domain-specific vocabulary; a Dynamic Document Embedding Optimizer (DDEO) that adaptively selects the UMAP dimensionality based on silhouette scores; and a Probabilistic Reassignment Matrix (PRM) that reassigns outlier documents to relevant topic clusters. On a dataset of 64,441 tweets (61,039 successfully classified), BERTopic_Teen outperforms LDA, NMF, Top2Vec, and the original BERTopic on all key evaluation metrics. It achieves a 16.1% improvement in topic coherence (NPMI = 0.2184), higher topic diversity (TD = 0.9935), and lower perplexity (1.7214), indicating superior semantic clarity, topic distinctiveness, and modeling stability. These findings suggest that BERTopic_Teen offers a robust solution for extracting meaningful topics from social media data and advancing public health surveillance.
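The abstract reports topic diversity (TD) and describes reassigning outlier documents to topic clusters, but it gives no formulas. As a rough illustration only, the Python sketch below assumes the common definition of topic diversity (the fraction of unique words among the top-k words of all topics) and a simple threshold rule that moves an outlier to its most probable topic. The function names, the threshold value, and the sample data are hypothetical and are not taken from the paper; the label -1 for outliers follows the convention used by BERTopic.

```python
from typing import List, Sequence

OUTLIER = -1  # BERTopic convention: documents without a topic are labeled -1


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top_k words of every topic.

    Assumed definition; a value of 1.0 means no word is shared between topics.
    """
    words = [word for topic in topics for word in topic[:top_k]]
    if not words:
        raise ValueError("at least one topic word is required")
    return len(set(words)) / len(words)


def reassign_outliers(
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.1,  # hypothetical cut-off, not a value from the paper
) -> List[int]:
    """Move each outlier to its most probable topic if that probability
    reaches the threshold; all other labels are returned unchanged."""
    new_labels = list(labels)
    for i, (label, row) in enumerate(zip(labels, probs)):
        if label != OUTLIER:
            continue
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            new_labels[i] = best
    return new_labels


# Toy data, invented for illustration only.
toy_topics = [["vape", "nicotine", "smoke"], ["sleep", "screen", "smoke"]]
toy_labels = [0, -1, -1, 1]
toy_probs = [[0.9, 0.1], [0.2, 0.7], [0.04, 0.05], [0.1, 0.8]]

print(round(topic_diversity(toy_topics, top_k=3), 3))  # 0.833 (5 unique of 6)
print(reassign_outliers(toy_labels, toy_probs))         # [0, 1, -1, 1]
```

In the toy run, the second document is moved to topic 1 because its highest probability (0.7) clears the threshold, while the third stays an outlier because neither probability does. A stricter threshold leaves more documents unassigned but reduces the risk of forcing noisy tweets into unrelated topics.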

Keywords: Adolescent Health, Social media analytics, Topic Modeling, BERTopic, Health Systems

Received: 11 Apr 2025; Accepted: 23 Jul 2025.

Copyright: © 2025 Feng, Chen, Zhang, Huang, Zhang and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Yiqiang Feng, Sichuan Agriculture University, Cheng'du, China
Siyu He, Sichuan Agriculture University, Cheng'du, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.