AUTHOR=Wang Ke, Hu Dengshu, Cheng Yuan, Che Yukui, Li Yuelin, Jiang Zhiwei, Chen Fengxian, Li Wenjuan

TITLE=Infrared and visible image fusion driven by multimodal large language models

JOURNAL=Frontiers in Physics

VOLUME=Volume 13 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2025.1599937

DOI=10.3389/fphy.2025.1599937

ISSN=2296-424X

ABSTRACT=Introduction: Existing image fusion methods primarily focus on obtaining high-quality features from source images to enhance the quality of the fused image, often overlooking the impact of improved image quality on downstream task performance. Methods: To address this issue, this paper proposes a novel infrared and visible image fusion approach driven by multimodal large language models, aiming to improve the performance of pedestrian detection tasks. The proposed method fully considers how enhancing image quality can benefit pedestrian detection. By leveraging a multimodal large language model, we analyze the fused images based on user-provided questions related to improving pedestrian detection performance and generate suggestions for enhancing image quality. To better incorporate these suggestions, we design a Text-Driven Feature Harmonization (Text-DFH) module. Text-DFH refines the features produced by the fusion network according to the recommendations from the multimodal large language model, enabling the fused image to better meet the needs of pedestrian detection tasks. Results: Compared with existing methods, the key advantage of our approach lies in utilizing the strong semantic understanding and scene analysis capabilities of multimodal large language models to provide precise guidance for improving fused image quality. As a result, our method enhances image quality while maintaining strong performance in pedestrian detection. Extensive qualitative and quantitative experiments on multiple public datasets validate the effectiveness and superiority of the proposed method. Discussion: In addition to its effectiveness in infrared and visible image fusion, the method also demonstrates promising application potential in the field of nuclear medical imaging.