AUTHOR=Bhargavi Divya, Gholami Sia, Pelaez Coyotl Erika TITLE=Jersey number detection using synthetic data in a low-data regime JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 5 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.988113 DOI=10.3389/frai.2022.988113 ISSN=2624-8212 ABSTRACT=Automatic player identification is an essential and complex task in sports video analysis. Different strategies have been devised over the years, but identification based on jersey numbers is one of the most common approaches given its versatility and relative simplicity. However, automatic detection of jersey numbers is still challenging due to changing camera angles, low video resolution, small object size in wide-range shots, and transient changes in the player's posture and movement. In this paper we present a novel approach for jersey number identification on a small, highly imbalanced dataset from Seattle Seahawks practice videos. We use a multi-step strategy that enforces attention to a particular region of interest (the player's torso) to identify jersey numbers. We generate in-house synthetic datasets of different complexities to compensate for the imbalance and scarcity of the samples. Our multi-step pipeline first identifies and crops players in a frame using a pretrained person detection model. We then use a pretrained human pose estimation model to localize jersey numbers (using torso key-points) in the detected players, obviating the need to annotate bounding boxes for number detection. This results in images that are on average 20x25 px in size. To compare performance, we trained two lightweight Convolutional Neural Networks (CNNs) with different learning objectives: multi-class for two-digit number identification and multi-label for digit-wise detection. 
Both models went through a pre-training round with the synthetic datasets and were fine-tuned with the real-world dataset to achieve a final best accuracy of 89%. Our results indicate that simple models can achieve acceptable performance on the jersey number detection task and that synthetic data can improve performance substantially (accuracy increase of ~9% overall, ~18% on low-frequency numbers), making our approach achieve state-of-the-art results.
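
The abstract describes two steps that a short sketch may make more concrete: cropping the torso from pose key-points, and the two learning objectives (multi-class versus multi-label). The Python below is only an illustration under stated assumptions, not the authors' implementation. The key-point names (`left_shoulder`, `right_shoulder`, `left_hip`, `right_hip`), the padding factor, the 0-99 number range, and the 10-dimensional multi-hot digit encoding are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates of the player crop


def torso_box(keypoints: Dict[str, Point], width: int, height: int,
              pad: float = 0.1) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) around shoulders and hips.

    Assumption: the pose model exposes the four key-points named below.
    The box is padded by a fraction of its size and clipped to the image.
    """
    names = ("left_shoulder", "right_shoulder", "left_hip", "right_hip")
    xs = [keypoints[n][0] for n in names]
    ys = [keypoints[n][1] for n in names]
    pad_x = pad * (max(xs) - min(xs))
    pad_y = pad * (max(ys) - min(ys))
    left = max(0, int(min(xs) - pad_x))
    top = max(0, int(min(ys) - pad_y))
    right = min(width, int(max(xs) + pad_x) + 1)    # exclusive bound
    bottom = min(height, int(max(ys) + pad_y) + 1)  # exclusive bound
    return left, top, right, bottom


def multiclass_target(number: int) -> int:
    """Multi-class objective: one class index per jersey number (assumed 0-99)."""
    if not 0 <= number <= 99:
        raise ValueError("jersey number assumed to lie in 0..99")
    return number


def multilabel_target(number: int) -> List[int]:
    """Multi-label objective: multi-hot vector with a 1 for each digit present.

    Assumption: this simple encoding ignores digit order and repetition.
    """
    target = [0] * 10
    for ch in str(number):
        target[int(ch)] = 1
    return target
```

In this sketch, the returned box can be used to slice an image array as `image[top:bottom, left:right]`. One consequence of the assumed multi-hot encoding is that numbers such as 12 and 21 map to the same target, so a digit-wise model of this form would need extra positional information to separate them.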