ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Human-Media Interaction

This article is part of the Research Topic: Re-Imagining Mediated Human Building Interaction and Sensory Environments: Volume II

From 2D Screens to Immersive Spatial Interaction: LLM-driven Depth-Sensing for Behavior Modeling and Analysis in Media Architecture

Provisionally accepted
  • University College London, London, United Kingdom

The final, formatted version of the article will be published soon.

Large media façades are reshaping interactions in buildings and public spaces into immersive environments, yet empirical knowledge of how pedestrians behave inside these media spaces is still limited. This study introduces a fully automated pipeline for in-the-wild behavior analysis that integrates a stereo-depth camera, an object detection model with a multi-target tracking algorithm, and GPT-4o with visual reasoning. Deployed at London's immersive media building Now Arcade, the system captured two hours of depth-enhanced video and produced more than six hundred anonymised visitor trajectories without any manual annotation. It reliably identified three recurrent behaviors: passing-by, lingering, and shooting (photographing or filming). To reveal where these actions occur, we propose Behavior Instance Density (BiD) heat-maps that project frame-level behavior instances onto a floor-plan grid of 0.5 m × 0.5 m squares. A comparative BiD study of two hour-long content loops, one with static high-contrast imagery and one with dynamic low-contrast animation, shows clear content-driven behavior differences. Static saturated graphics encourage longer stays and more filming at both the building's entrance and exit thresholds, while dynamic darker visuals maintain a predominantly transit-oriented flow through the corridor. The proposed pipeline uses a compact, cost-effective sensing setup, safeguards privacy by discarding raw images after processing, and can be scaled for long-term or multi-site deployments. The resulting behavioral insights offer concrete guidance for media-architecture design and lay the groundwork for responsive façades that can update their digital content in real time according to observed human engagement.
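The BiD projection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each frame-level behavior instance has already been mapped to floor-plan coordinates in metres with the origin at one corner of the space, and it simply counts instances per 0.5 m × 0.5 m cell for each behavior. The function name `bid_grid`, the tuple layout, and the absence of any normalisation are all assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, List, Tuple

CELL_SIZE_M = 0.5  # grid resolution stated in the abstract

# One frame-level behavior instance: (x in metres, y in metres, behavior label).
# This layout is hypothetical; the article does not specify a data format.
Instance = Tuple[float, float, str]


def bid_grid(
    instances: Iterable[Instance],
    width_m: float,
    depth_m: float,
    cell_m: float = CELL_SIZE_M,
) -> Dict[str, List[List[int]]]:
    """Count behavior instances per floor-plan cell, one grid per behavior.

    Returns a dict mapping each behavior label to a row-major grid
    (rows along y, columns along x) of raw instance counts.
    """
    n_cols = math.ceil(width_m / cell_m)
    n_rows = math.ceil(depth_m / cell_m)
    grids: Dict[str, List[List[int]]] = {}

    for x, y, label in instances:
        # Skip instances that fall outside the mapped floor plan.
        if not (0.0 <= x < width_m and 0.0 <= y < depth_m):
            continue
        # Clamp to guard against floating-point edge cases at the boundary.
        col = min(int(x // cell_m), n_cols - 1)
        row = min(int(y // cell_m), n_rows - 1)
        if label not in grids:
            grids[label] = [[0] * n_cols for _ in range(n_rows)]
        grids[label][row][col] += 1

    return grids


# Usage with made-up coordinates on a 2 m x 1 m floor plan.
sample = [
    (0.1, 0.1, "lingering"),
    (0.4, 0.2, "lingering"),
    (1.2, 0.1, "passing-by"),
    (9.0, 9.0, "shooting"),  # outside the floor plan, ignored
]
grids = bid_grid(sample, width_m=2.0, depth_m=1.0)
```

Keeping one grid per behavior makes it straightforward to compare, cell by cell, where lingering or shooting concentrates under different content loops. Any normalisation (for example by observation time or by total instances) would be a separate design choice that the abstract does not describe.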

Keywords: Depth camera, HBI, LLMs, Media Architecture, spatial analysis

Received: 14 Nov 2025; Accepted: 06 Feb 2026.

Copyright: © 2026 Wu and Fatah gen Schieck. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Ava Fatah gen Schieck

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.