AUTHOR=Tran Van Trung, Le Quang Dao, Pham Bao Son, Luu Viet Hung, Bui Quang Hung

TITLE=Large-scale Vietnamese point-of-interest classification using weak labeling

JOURNAL=Frontiers in Artificial Intelligence

VOLUME=Volume 5 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.1020532

DOI=10.3389/frai.2022.1020532

ISSN=2624-8212

ABSTRACT=Points of Interest (POIs) represent geographic locations in different categories (e.g., tourist sites, amenities, or shops) and play a prominent role in many location-based applications. However, most POI category labels are crowd-sourced by the community and are therefore often of low quality. In this paper, we introduce the first annotated dataset for the POI category classification task in Vietnamese. A total of 750,000 POIs were collected from WeMap, a Vietnamese digital map. Because large-scale hand-labeling is inherently time-consuming and labor-intensive, we propose a new approach based on weak labeling. The resulting dataset covers 15 categories, with 275,000 weakly labeled POIs for training and 30,000 gold-standard POIs for testing, making it the largest among existing Vietnamese POI datasets. We conduct POI category classification experiments on our dataset using a strong baseline (BERT-based fine-tuning) and find that our approach is highly efficient and applicable at large scale. The proposed baseline achieves an F1 score of 90% on the test set and improves the accuracy of WeMap POI data by 37 percentage points (from 56% to 93%).