ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Pattern Recognition
This article is part of the Research Topic "Deep Learning for Computer Vision and Measurement Systems".
InfoMSD: An Information-Maximization Self-Distillation Framework for Parameter-Efficient Fine-Tuning on Artwork Images
Provisionally accepted
1 School of Mathematics and Statistics, Southwest University, Chongqing, China
2 Postdoctoral Research Center, Beijing Pico Exhibition Management Co., Ltd, Beijing, China
3 Beijing Pico Exhibition Management Co., Ltd, Beijing Postdoctoral Research Foundation, Beijing, China
Despite the remarkable performance of large-scale vision-language models across various visual classification tasks, their substantial parameter counts and high fine-tuning costs hinder deployment in resource-constrained cultural and artwork settings. This work specifically addresses object recognition in artwork—that is, identifying semantic objects (e.g., animals, people, everyday items) depicted within paintings, sketches, and other artistic renditions, rather than classifying artistic styles or genres. To address this issue, we propose InfoMSD, an unsupervised, information-maximization self-distillation framework designed for parameter-efficient fine-tuning on unlabeled artwork imagery while preserving robust performance. Specifically, InfoMSD adopts a teacher-student architecture in the self-distillation phase, where the teacher model generates pseudo-labels for artworks and the student model learns from the teacher through a cross-entropy objective. By aligning the student's predictions with the discriminative signals in the teacher's pseudo-labels, and simultaneously applying entropy-based regularization to sharpen the per-sample probability distributions and balance class coverage across the batch, the framework improves both the quality of the pseudo-labels and the discriminative capacity of the model. To enable parameter-efficient fine-tuning, only the layer-normalization parameters and visual prompts of the student model are updated while all remaining parameters stay frozen, significantly reducing computational overhead. Extensive experiments on artwork datasets show that InfoMSD achieves accuracy improvements of +6.43% and +3.02% over CLIP zero-shot baselines while updating less than 1% of the model parameters. Compared with existing lightweight distillation methods, InfoMSD achieves average accuracy gains of 1.35% and 0.96% on the two benchmarks, respectively.
Overall, InfoMSD offers a novel, information-theoretic paradigm for unsupervised and efficient fine-tuning in object recognition within artistic imagery, balancing performance and efficiency.
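The objective sketched in the abstract—cross-entropy against teacher pseudo-labels plus an information-maximization regularizer that sharpens per-sample predictions while balancing the batch-level class marginal—can be illustrated with the following minimal sketch. All names, the hard-argmax pseudo-labeling, and the regularization weight `lam` are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of an InfoMSD-style information-maximization
# self-distillation loss on a small batch; pure-Python for clarity.
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def infomax_sd_loss(student_logits, teacher_probs, lam=1.0):
    """Cross-entropy against the teacher's hard pseudo-labels, plus an
    information-maximization term: minimize mean per-sample entropy
    (sharpen predictions) and maximize the entropy of the batch-mean
    prediction (encourage balanced class coverage)."""
    n = len(student_logits)
    k = len(student_logits[0])
    probs = [softmax(l) for l in student_logits]
    # Hard pseudo-label: the teacher's argmax class for each sample.
    ce = 0.0
    for p, t in zip(probs, teacher_probs):
        y = max(range(k), key=lambda c: t[c])
        ce += -math.log(max(p[y], 1e-12))
    ce /= n
    cond_ent = sum(entropy(p) for p in probs) / n            # sharpening term
    marginal = [sum(p[c] for p in probs) / n for c in range(k)]
    marg_ent = entropy(marginal)                             # class-balance term
    return ce + lam * (cond_ent - marg_ent)
```

In a full pipeline, only the student's layer-normalization parameters and visual prompts would receive gradients from this loss; a confident, class-balanced student drives the regularizer `cond_ent - marg_ent` toward its minimum of `-log(k)`.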
Keywords: artwork recognition, information maximization, parameter-efficient fine-tuning, self-distillation, vision-language models
Received: 10 Oct 2025; Accepted: 09 Feb 2026.
Copyright: © 2026 Guan, Hong and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Yong Wang
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
