ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Computer Vision

Single-Item Training for Multi-Dish Recognition: A Class-Agnostic Framework for Indian Food Platters

Provisionally accepted
  • SRM University AP, Amaravati, India

The final, formatted version of the article will be published soon.

Accurate dietary assessment increasingly depends on automated food recognition systems that operate effectively in real-world environments. While most vision-based models perform well on single-item datasets, their performance degrades significantly in complex multi-dish settings. This is particularly evident in Indian thalis, which contain overlapping food items with diverse textures and high visual variability. These challenges make large-scale multi-dish annotation expensive and limit practical deployment of such systems. To address this gap, we propose a novel two-stage framework that enables recognition of multi-dish food images using only single-item training data. The proposed pipeline performs class-agnostic segmentation with the Segment Anything Model (SAM), followed by classification with an SE-DenseNet121 network whose hyperparameters are tuned with Optuna. The model is trained exclusively on single-item annotated images and generalizes to multi-item thali images at inference time through a segmentation-classification mapping strategy. Because the segmentation stage is zero-shot, no multi-dish ground-truth annotations are needed, and the annotation complexity is reduced from O(N × M) to O(N). The proposed system achieves an accuracy of 97.48% on single-item food image classification and demonstrates strong applicability to multi-dish Indian thali images through region-wise inference on segmented food items. Furthermore, the framework is computationally efficient, achieving 2× faster inference with a latency of 1.58 ms while using substantially fewer parameters than transformer-based baselines. It operates at low computational cost (2.90 GFLOPs), has 8.06M parameters compared to 26.69–86.77M (roughly 70% to 91% fewer), and delivers higher throughput (633.32 samples/s). These results demonstrate that the proposed method provides a scalable and practical solution for real-time dietary assessment applications.
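The region-wise inference described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `segment_fn` and `classify_fn` are hypothetical stand-ins for SAM mask generation and the SE-DenseNet121 classifier, images are toy nested lists rather than tensors, and the `min_area` and `min_conf` filters are assumptions added for the example.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]    # toy grayscale image: rows of pixel values
Mask = List[List[bool]]    # same shape as the image, True = pixel in region
BBox = Tuple[int, int, int, int]


def mask_bbox(mask: Mask) -> BBox:
    """Return (top, left, bottom, right) of the True pixels; bottom/right exclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def crop_region(image: Image, mask: Mask) -> Image:
    """Crop the mask's bounding box and zero out pixels outside the mask."""
    top, left, bottom, right = mask_bbox(mask)
    return [
        [image[r][c] if mask[r][c] else 0 for c in range(left, right)]
        for r in range(top, bottom)
    ]


def recognize_platter(
    image: Image,
    segment_fn: Callable[[Image], List[Mask]],               # stage 1 (assumed interface)
    classify_fn: Callable[[Image], Tuple[str, float]],       # stage 2 (assumed interface)
    min_area: int = 4,       # assumption: drop tiny segments
    min_conf: float = 0.5,   # assumption: drop low-confidence predictions
) -> List[Tuple[str, float, BBox]]:
    """Segment the platter, classify each region, and map labels back to regions."""
    results = []
    for mask in segment_fn(image):
        area = sum(v for row in mask for v in row)
        if area < max(min_area, 1):
            continue
        label, conf = classify_fn(crop_region(image, mask))
        if conf >= min_conf:
            results.append((label, conf, mask_bbox(mask)))
    return results


# Toy stand-ins so the sketch runs without any model weights.
def toy_segment(image: Image) -> List[Mask]:
    """One mask per distinct nonzero pixel value."""
    values = sorted({v for row in image for v in row if v})
    return [[[px == v for px in row] for row in image] for v in values]


def toy_classify(crop: Image) -> Tuple[str, float]:
    """Label a crop by its brightest pixel."""
    peak = max(max(row) for row in crop)
    return ("rice", 0.9) if peak > 100 else ("dal", 0.8)


platter = [
    [0,   0,   0, 0,  0,  0],
    [0, 200, 200, 0, 50, 50],
    [0, 200, 200, 0, 50, 50],
    [0,   0,   0, 0,  0,  7],
]

for label, conf, bbox in recognize_platter(platter, toy_segment, toy_classify):
    print(label, conf, bbox)
```

In this sketch the classifier only ever sees one cropped region at a time, which is why a model trained on single-item images can be applied to a multi-item platter without multi-dish labels.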

Keywords: Food recognition, Hyperparameter optimization, Indian food classification, Optuna, SE-DenseNet121, Segment Anything Model (SAM), Squeeze-and-Excitation (SE) module

Received: 25 Nov 2025; Accepted: 13 Feb 2026.

Copyright: © 2026 Garisa, Kumar and Singh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Ravi Kant Kumar

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.