EDITORIAL article

Front. Robot. AI, 01 March 2024

Sec. Robot Vision and Artificial Perception

Volume 11 - 2024 | https://doi.org/10.3389/frobt.2024.1348022

Editorial: Enhanced human modeling in robotics for socially-aware place navigation

  • 1. Democritus University of Thrace, Komotini, Greece

  • 2. Athena Research Center, Marousi, Greece

  • 3. University of Maryland, Baltimore, MD, United States

  • 4. Italian Institute of Technology (IIT), Genova, Liguria, Italy


1 Introduction

Autonomous and accurate navigation is a prerequisite for any intelligent system assigned to a mission. The task becomes more complex when a mobile robot navigates unfamiliar terrain, as it must move through the environment while constructing a detailed map of its surroundings. At the same time, the system must estimate its position and orientation as its internal map is incrementally built (Tsintotas et al., 2022). This process, widely known as simultaneous localization and mapping (SLAM), is paramount for effective and context-aware navigation. The challenge becomes even more intricate when robots work within human environments, as human-robot coexistence introduces variables such as human activities, intentions, and their impact on the robot’s path (Keroglou et al., 2023). Moreover, this integration requires adherence to stringent safety and security requirements. Consequently, the robotics community tackles these challenges through several techniques that collectively shape a demanding, interdisciplinary pursuit known as socially aware navigation. It involves both technical considerations and a deep understanding of the social dynamics between humans and robots, marking a crucial intersection of robotics, artificial intelligence, and human-computer interaction. If human activities, intentions, and social dynamics can be understood through intelligent pipelines, robots can navigate spaces shared with humans and foster a harmonious coexistence in applications ranging from healthcare and assistive technologies to smart homes and public spaces. Ultimately, socially aware robot navigation aims to bridge the gap between artificial intelligence and human interaction, paving the way for a more integrated and socially intelligent future.

2 Analysis of the Research Topic

The paradigm of socially aware place navigation is situated within the domain of human modeling, which spans dimensions such as human pose estimation (Wei et al., 2022), action recognition (Charalampous et al., 2017; Dessalene et al., 2021), language understanding (Vatakis and Pastra, 2016), and affective computing (Kansizoglou et al., 2022) (see Figure 1). Pose estimation determines the spatial configuration of an individual’s body, enabling a robotic system to comprehend humans’ physical presence and movements in its immediate environment (An et al., 2022). Action recognition augments this comprehension by interpreting the activities in which individuals are engaged (Dessalene et al., 2023), thereby contributing to a nuanced understanding of the context (Moutsis et al., 2023). Language understanding empowers the robot to interpret verbal cues and commands (Pastra and Aloimonos, 2012), facilitating seamless communication with its human counterparts. Affective computing adds an emotional dimension, enabling the robot to discern and appropriately respond to human emotions and thus enhancing its adaptability to intricate social contexts (Kansizoglou et al., 2019). Integrating these human-centric capacities into the navigation task constitutes a sophisticated methodological approach, and such frameworks are consequently well placed to excel in adverse, dynamic, and highly interactive scenarios.

FIGURE 1

2.1 Contributing articles

Although user-centered approaches are essential for comfortable and safe human-robot interaction, they are still rare in industrial settings. Aiming to close this research gap, Bernotat et al. conducted two user studies with large heterogeneous samples. In particular, User Study 1 explored the participants’ ideas about robot memory, which aspects of the robot’s movements they found positive, and what they would change, while controlling for the effects of participants’ demographic backgrounds and attitudes. Next, even in an environment that is elementary and minimal compared to the real world, home agents require guidance from dense reward functions to learn to carry out complex tasks. As task decomposition is an easy-to-use approach for introducing such dense rewards, Petsanis et al. present a method that improves training in embodied AI environments by harnessing the task decomposition capabilities of TextWorld. Karasoulas et al., in turn, examined how to detect the presence or absence of individuals indoors by analyzing the CO2 concentration of the ambient air using simple Markov chain models. While this study focused on testing sets with 1-hour windows, there is significant potential for accurately assessing occupancy profiles over shorter, minute-scale intervals. Finally, Arapis et al. focus on localizing humans in the world and predicting the free space around them while accounting for other static and dynamic obstacles. Their research relies on a multi-task learning strategy to handle both tasks with minimal computational demands, even in difficult industrial conditions such as humans at close distance or at the limits of the capturing sensor’s field of view.
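To make the idea of occupancy detection with simple Markov chain models more concrete, the following is a minimal sketch and not the method of Karasoulas et al. It assumes one first-order Markov chain fitted to discretized CO2 readings from occupied periods and another from unoccupied periods, with a new window assigned to whichever chain gives it the higher log-likelihood. The bin edges, function names, and toy readings are all hypothetical.

```python
import math

# Hypothetical CO2 bin edges in ppm, chosen for illustration only.
BIN_EDGES = (500, 700, 900)
N_STATES = len(BIN_EDGES) + 1


def discretize(ppm):
    """Map a CO2 reading in ppm to a state index in 0..N_STATES-1."""
    return sum(ppm >= edge for edge in BIN_EDGES)


def fit_chain(sequences, alpha=1.0):
    """Estimate a first-order transition matrix with add-alpha smoothing."""
    counts = [[alpha] * N_STATES for _ in range(N_STATES)]
    for readings in sequences:
        states = [discretize(r) for r in readings]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    # Normalize each row so that it sums to one.
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(readings, chain):
    """Log-probability of the observed state transitions under a chain."""
    states = [discretize(r) for r in readings]
    return sum(math.log(chain[a][b]) for a, b in zip(states, states[1:]))


def is_occupied(readings, occupied_chain, empty_chain):
    """Label a window by comparing its likelihood under the two chains."""
    return log_likelihood(readings, occupied_chain) > log_likelihood(
        readings, empty_chain
    )


# Made-up training windows: CO2 rising while people are present,
# decaying after they leave.
occupied_chain = fit_chain([[600, 650, 720, 800, 910, 950]])
empty_chain = fit_chain([[900, 850, 720, 680, 520, 480]])

print(is_occupied([620, 710, 905], occupied_chain, empty_chain))  # True
print(is_occupied([905, 710, 620], occupied_chain, empty_chain))  # False
```

In this sketch the window length is simply the number of readings passed to `is_occupied`, so shorter windows need no change to the code, although they provide fewer transitions on which to base the decision.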

3 Discussion and conclusion

Overall, the main objective of a human-aware navigation pipeline is to facilitate human-robot coexistence in a shared environment. Such a scenario requires that each member’s goals be pursued efficiently and in parallel, without needless external interference or delays, and that specific everyday tasks be completed successfully. On top of that, the robotic agent is expected to inspire a sense of trust and friendliness in humans, which is mainly achieved when the agent operates concisely, adaptively, transparently, and naturally. Thus, robot navigation techniques should employ enhanced human understanding and modeling techniques that capture the features most affecting the efficiency of the task. As a result, it becomes increasingly vital to develop robust, lightweight action and affect estimation solutions based on robotic sensory data and capacities, such as active vision (Aloimonos et al., 1988). Finally, the demands of computational efficiency and real-time operation always constrain the solutions that can be introduced.

Statements

Author contributions

KT: Conceptualization, Writing–original draft, Writing–review and editing. IK: Conceptualization, Writing–original draft, Writing–review and editing. KP: Supervision, Writing–review and editing. YA: Supervision, Writing–review and editing. AG: Supervision, Writing–review and editing. GoS: Supervision, Writing–review and editing. GuS: Supervision, Writing–review and editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1. Aloimonos, J., Weiss, I., Bandyopadhyay, A. (1988). Active vision. Int. J. Comput. Vis. 1, 333–356. doi: 10.1007/bf00133571

  • 2. An, S., Zhang, X., Wei, D., Zhu, H., Yang, J., Tsintotas, K. A. (2022). Fasthand: fast monocular hand pose estimation on embedded systems. J. Syst. Archit. 122, 102361. doi: 10.1016/j.sysarc.2021.102361

  • 3. Charalampous, K., Kostavelis, I., Gasteratos, A. (2017). Recent trends in social aware robot navigation: a survey. Robotics Aut. Syst. 93, 85–104. doi: 10.1016/j.robot.2017.03.002

  • 4. Dessalene, E., Devaraj, C., Maynord, M., Fermuller, C., Aloimonos, Y. (2021). “Forecasting action through contact representations from first person video,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 June 2023 (IEEE), 6703–6714. doi: 10.1109/TPAMI.2021.3055233

  • 5. Dessalene, E., Maynord, M., Fermüller, C., Aloimonos, Y. (2023). “Therbligs in action: video understanding through motion primitives,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17-24 June 2023, 10618–10626. doi: 10.1109/CVPR52729.2023.01023

  • 6. Kansizoglou, I., Bampis, L., Gasteratos, A. (2019). An active learning paradigm for online audio-visual emotion recognition. IEEE Trans. Affect. Comput. 13, 756–768. doi: 10.1109/taffc.2019.2961089

  • 7. Kansizoglou, I., Misirlis, E., Tsintotas, K., Gasteratos, A. (2022). Continuous emotion recognition for long-term behavior modeling through recurrent neural networks. Technologies 10, 59. doi: 10.3390/technologies10030059

  • 8. Keroglou, C., Kansizoglou, I., Michailidis, P., Oikonomou, K. M., Papapetros, I. T., Dragkola, P., et al. (2023). A survey on technical challenges of assistive robotics for elder people in domestic environments: the aspida concept. IEEE Trans. Med. Robotics Bionics 5, 196–205. doi: 10.1109/tmrb.2023.3261342

  • 9. Moutsis, S. N., Tsintotas, K. A., Kansizoglou, I., Shan, A., Aloimonos, Y., Gasteratos, A. (2023). “Fall detection paradigm for embedded devices based on yolov8,” in IEEE International Conference on Imaging Systems and Techniques (IST), Copenhagen, Denmark, 17-19 Oct. 2023, 1–6. doi: 10.1109/IST59124.2023.10355696

  • 10. Pastra, K., Aloimonos, Y. (2012). The minimalist grammar of action. Philosophical Trans. R. Soc. B Biol. Sci. 367, 103–117. doi: 10.1098/rstb.2011.0123

  • 11. Tsintotas, K. A., Bampis, L., Gasteratos, A. (2022). Online appearance-based place recognition and mapping: their role in autonomous navigation, 133. Springer Nature.

  • 12. Vatakis, A., Pastra, K. (2016). A multimodal dataset of spontaneous speech and movement production on object affordances. Sci. Data 3, 150078–150086. doi: 10.1038/sdata.2015.78

  • 13. Wei, D., An, S., Zhang, X., Tian, J., Tsintotas, K. A., Gasteratos, A., et al. (2022). “Dual regression for efficient hand pose estimation,” in 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23-27 May 2022 (IEEE), 6423–6429. doi: 10.1109/ICRA46639.2022.9812217

Summary

Keywords

robotics, social navigation, AI, machine learning, language processing

Citation

Tsintotas KA, Kansizoglou I, Pastra K, Aloimonos Y, Gasteratos A, Sirakoulis GC and Sandini G (2024) Editorial: Enhanced human modeling in robotics for socially-aware place navigation. Front. Robot. AI 11:1348022. doi: 10.3389/frobt.2024.1348022

Received

01 December 2023

Accepted

14 February 2024

Published

01 March 2024

Edited and reviewed by

Giuseppe Boccignone, University of Milan, Italy

*Correspondence: Konstantinos A. Tsintotas,
