ORIGINAL RESEARCH article

Front. Big Data

Sec. Recommender Systems

Volume 8 - 2025 | doi: 10.3389/fdata.2025.1611389

This article is part of the Research Topic: Generative Search and Recommendation

LLM-as-a-Judge: Automated Evaluation of Search Query Parsing Using Large Language Models

Provisionally accepted
  • 1 sahibinden.com, Istanbul, Türkiye
  • 2 sahibinden.com, Ankara, Ankara, Türkiye
  • 3 Department of Computer Engineering, Faculty of Engineering, Boğaziçi University, Bebek, Istanbul, Türkiye

The final, formatted version of the article will be published soon.

The adoption of Large Language Models (LLMs) in search systems calls for evaluation methodologies that go beyond traditional rule-based or manual approaches. In this work, we propose a general approach for evaluating structured outputs and demonstrate its effectiveness on search query parsing, a task critical to retrieval accuracy, within an online classifieds platform. Our framework leverages the contextual reasoning capability of LLMs through three evaluation methodologies: Pointwise, Pairwise, and Pass/Fail assessment. We also introduce a Contextual Evaluation Prompt Routing strategy to improve evaluation reliability and mitigate hallucinations. Experiments on small- and large-scale datasets show that LLM-based evaluation achieves around 90% agreement with human judgments. The results validate LLM-driven assessment as a scalable, interpretable, and effective alternative to traditional evaluation methods for ensuring robust query parsing in real-world search systems.
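The agreement figure reported above can be illustrated with a minimal sketch. It assumes that agreement means the fraction of items on which the LLM judge and the human annotator assign the same label; the abstract does not specify the exact metric, so this is an assumption. The function name `agreement_rate` and all labels below are hypothetical toy values, not data from the study.

```python
from typing import Hashable, Sequence


def agreement_rate(llm_labels: Sequence[Hashable],
                   human_labels: Sequence[Hashable]) -> float:
    """Fraction of items where the LLM judge and the human give the same label.

    Assumption: simple percent agreement. The paper may use a different
    or additional metric (for example, a chance-corrected one).
    """
    if len(llm_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not llm_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(1 for llm, human in zip(llm_labels, human_labels) if llm == human)
    return matches / len(llm_labels)


# Hypothetical Pass/Fail verdicts for five parsed queries (toy data).
llm_pass_fail = ["pass", "pass", "fail", "pass", "fail"]
human_pass_fail = ["pass", "fail", "fail", "pass", "fail"]
pass_fail_agreement = agreement_rate(llm_pass_fail, human_pass_fail)

# Hypothetical Pairwise preferences between two candidate parses (toy data).
llm_pairwise = ["A", "B", "A", "tie"]
human_pairwise = ["A", "B", "B", "tie"]
pairwise_agreement = agreement_rate(llm_pairwise, human_pairwise)

print(f"Pass/Fail agreement: {pass_fail_agreement:.2f}")
print(f"Pairwise agreement:  {pairwise_agreement:.2f}")
```

The same function applies to Pointwise scores when they are discrete (for example, integer ratings). For continuous scores, a correlation measure or a tolerance-based match would be a more natural choice.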

Keywords: LLM-as-a-Judge, structured output evaluation, search query parsing, large language models, evaluation framework, generative search, automatic evaluation, query understanding

Received: 14 Apr 2025; Accepted: 30 Jun 2025.

Copyright: © 2025 Baysan, Uysal, İşlek, Çığ Karaman and Güngör. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mehmet Selman Baysan, sahibinden.com, Istanbul, Türkiye

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.