ORIGINAL RESEARCH article

Front. Phys.
Sec. Optics and Photonics
Volume 12 - 2024 | doi: 10.3389/fphy.2024.1398678

Enhancing Target Detection Accuracy through Cross-Modal Spatial Perception and Dual-Modality Fusion

Provisionally Accepted

Ning Zhang1*, Wenqing Zhu2
  • 1Shanghai Institute of Technical Physics, Chinese Academy of Sciences (CAS), China
  • 2Chinese Academy of Sciences (CAS), China

The final, formatted version of the article will be published soon.

The disparity between human and machine perception of spatial information makes it challenging for machines to sense their surroundings accurately and to improve target detection performance. Cross-modal data fusion is a potential solution for enhancing the perceptual capabilities of detection systems. This article introduces a spatial perception method that integrates dual-modality feature fusion and coupled attention mechanisms to validate the improvement in detection performance achievable through cross-modal information fusion. The proposed approach performs cross-modal feature extraction with a multi-scale feature extraction structure built on a dual-flow architecture. A transformer is then integrated for feature fusion, and the information perception of the detection system is optimized through a linear combination of loss functions. Experimental results demonstrate that our algorithm outperforms single-modality target detection on visible images, with an average accuracy improvement of 30.4%. It also outperforms single-modality infrared image detection by 3.0% and comparative multimodal target detection algorithms by 3.5%. These results validate the effectiveness of the proposed algorithm in fusing dual-band features and significantly enhancing target detection accuracy, and they showcase the adaptability and robustness of our approach.
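To make the fusion idea concrete, the sketch below illustrates one plausible reading of the abstract: visible and infrared feature tokens coupled through cross-attention, followed by a linear combination of loss terms. This is a minimal, hypothetical PyTorch sketch; the module sizes, the coupled-attention wiring, and the loss weights are all assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of dual-modality (visible + infrared) fusion via
# coupled cross-attention. Layer sizes and loss weights are illustrative
# assumptions; the paper's actual architecture is not given in the abstract.
import torch
import torch.nn as nn


class DualModalityFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Two attention blocks, one per modality, so each stream can
        # attend to the other (a "coupled" attention pattern).
        self.attn_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)  # merge the two fused streams

    def forward(self, vis_tokens: torch.Tensor, ir_tokens: torch.Tensor):
        # Visible tokens query infrared features, and vice versa.
        vis_fused, _ = self.attn_vis(vis_tokens, ir_tokens, ir_tokens)
        ir_fused, _ = self.attn_ir(ir_tokens, vis_tokens, vis_tokens)
        return self.proj(torch.cat([vis_fused, ir_fused], dim=-1))


def combined_loss(cls_loss, box_loss, fusion_loss, w=(1.0, 1.0, 0.5)):
    # Linear combination of loss terms, as the abstract describes;
    # the individual terms and weights here are placeholders.
    return w[0] * cls_loss + w[1] * box_loss + w[2] * fusion_loss


# Usage example with dummy token sequences (batch=2, tokens=100, dim=256):
fusion = DualModalityFusion()
vis = torch.randn(2, 100, 256)
ir = torch.randn(2, 100, 256)
fused = fusion(vis, ir)  # -> shape (2, 100, 256)
```

Cross-attention in both directions, rather than simple concatenation, lets each band weight the other's features per location, which is one common way to realize the dual-band fusion the abstract reports.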

Keywords: spatial perception, cross-modal data fusion, dual-modality feature fusion, target detection performance, multi-scale feature extraction, dual-band feature fusion

Received: 10 Mar 2024; Accepted: 10 May 2024.

Copyright: © 2024 Zhang and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Ning Zhang, Shanghai Institute of Technical Physics, Chinese Academy of Sciences (CAS), Shanghai, China