ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Natural Language Processing

TESTING NETWORK CLUSTERING ALGORITHMS WITH NATURAL LANGUAGE PROCESSING

Provisionally accepted
Ixandra Achitouv*, David Chavalarias, Bruno Gaume
  • Centre National de la Recherche Scientifique (CNRS), Paris, France

The final, formatted version of the article will be published soon.

We propose a hybrid methodology to evaluate the alignment between structural communities inferred from interaction networks and the linguistic coherence of users' textual production in online social networks. Using Twitter data on climate change discussions, we compare different Community Detection Algorithms (CDAs) by training Natural Language Processing Classification Algorithms (NLPCAs), such as BERTweet-based models, on the communities they generate. The classification accuracy serves as a proxy for how semantically coherent the CDA-induced groups are. Rather than treating CDA outputs as ground truth, our approach uses NLPCA accuracy as a relative scoring function to rank CDAs by their alignment with linguistic identity. This comparative framework provides a self-consistent evaluation of community coherence without requiring manually annotated labels. Our key results show that the best CDA/NLPCA pairs can predict a user's community with over 85% accuracy using only three succinct sentences. We also introduce a coverage-precision trade-off metric to assess community-level performance. Limitations include potential noise in CDA-generated labels and the need for deeper robustness checks.
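The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a small bag-of-words naive Bayes classifier stands in for the BERTweet-based models, the texts are invented toy examples, and the partition names `cda_a` and `cda_b` are hypothetical placeholders for the output of two community detection algorithms. Each partition is scored by the held-out accuracy of a classifier trained on its labels, and the partitions are then ranked by that score.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(texts, labels):
    """Fit a multinomial naive Bayes model on whitespace-tokenised texts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the most probable community label (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def heldout_accuracy(texts, labels, test_every=4):
    """Hold out every `test_every`-th user and train on the rest."""
    pairs = list(zip(texts, labels))
    train = [p for i, p in enumerate(pairs) if i % test_every != 0]
    test = [p for i, p in enumerate(pairs) if i % test_every == 0]
    model = train_naive_bayes([t for t, _ in train], [l for _, l in train])
    hits = sum(predict(model, t) == l for t, l in test)
    return hits / len(test)


def rank_cdas(texts, partitions):
    """Rank partitions by classifier accuracy, best first."""
    scores = {name: heldout_accuracy(texts, labels)
              for name, labels in partitions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented): one short text per user.
texts = [
    "solar wind renewable energy",
    "renewable solar policy energy",
    "wind energy solar transition",
    "energy renewable wind policy",
    "hoax tax scam alarmism",
    "scam hoax alarmism tax",
    "alarmism tax hoax scam",
    "tax scam hoax alarmism",
]
# Hypothetical community labels from two detection algorithms.
partitions = {
    "cda_a": ["A", "A", "A", "A", "B", "B", "B", "B"],  # follows vocabulary
    "cda_b": ["A", "B", "A", "B", "A", "B", "A", "B"],  # cuts across it
}

ranking = rank_cdas(texts, partitions)
for name, acc in ranking:
    print(f"{name}: held-out accuracy = {acc:.2f}")
```

In this toy setting the partition that follows the vocabulary split ranks first. The score is only relative: it compares partitions against each other and does not certify any of them as ground truth, which matches the framing in the abstract.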

Keywords: community detection, natural language processing, social network, classification, social community

Received: 29 May 2025; Accepted: 24 Oct 2025.

Copyright: © 2025 Achitouv, Chavalarias and Gaume. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Ixandra Achitouv, ixandra.achitouv@cnrs.fr

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.