AUTHOR=Ren XingTao, Ma Yan, Zhou YiXin TITLE=Label semantics and image features aware remote sensing sample retrieval from multi-source datasets for AI-enabled remote sensing monitoring JOURNAL=Frontiers in Environmental Science VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2025.1580797 DOI=10.3389/fenvs.2025.1580797 ISSN=2296-665X ABSTRACT=Introduction: With the advent of remote sensing (RS) big data, deep learning (DL) enabled analysis has shown remarkable potential for uncovering intricate features and detecting environmental changes in the vast influx of remote sensing imagery. This is particularly promising for data-driven, intelligent environmental change monitoring, including coastal land cover classification and urbanization impact analysis. Accordingly, remote sensing imagery sample datasets have become crucial for training robust models that perform satisfactorily across various artificial intelligence (AI)-enabled remote sensing applications. However, as the demand for more abundant and diverse remote sensing imagery samples continues to surge, sample data scarcity has emerged as a critical challenge for large-scale AI-enabled remote sensing applications. Moreover, inconsistencies in the semantic label categories among sample datasets, coupled with the limits of the commonly used single-label representation, pose a major barrier to fully leveraging training samples across diverse remote sensing sample datasets. 
In addition, existing remote sensing training sample datasets are dispersed across various hosting platforms and are typically organized in supplier-specific, arbitrary data structures, making it tedious and difficult for applications to gather the sample datasets they need for model training. Methods: To tackle these challenges, we propose an intelligent remote sensing sample dataset retrieval approach that is aware of both label semantics and visual features, so that sample datasets can be fully integrated and leveraged for AI-enabled remote sensing monitoring applications. Notably, during cross-dataset querying it measures the similarity distance, in both visual features and label semantics, between samples and the cluster centers of label categories. In this way, it can dynamically build application-tailored remote sensing sample datasets and better harness multi-source sample datasets despite single-label limits and label disparities. Moreover, it establishes a dynamic RS label category system that can expand with distinct categories from new sample datasets through label semantic similarity mapping, resolving label inconsistencies across sample datasets. In addition, it conducts in-memory sample data discovery and integration across clouds, supported by a virtual distributed storage system, to make full use of multi-source remote sensing sample datasets from platforms with limited interoperability. Results: Comparative performance experiments confirmed the effectiveness and efficiency of this approach. The proposed method is capable of dynamically integrating and leveraging multi-source sample datasets, effectively addressing the challenges of sample data scarcity, label inconsistencies, and data structure disparities. 
The dynamic RS label category system and the cross-dataset retrieval approach showed significant improvements in sample dataset integration and retrieval accuracy. Discussion: The proposed intelligent remote sensing sample dataset retrieval approach provides a comprehensive solution to the challenges faced by large-scale AI-enabled remote sensing applications. By integrating label semantics and visual features, the method enhances the accuracy and efficiency of sample dataset retrieval and integration. The dynamic label category system and in-memory data discovery mechanisms further improve the usability and accessibility of multi-source remote sensing sample datasets. Future work may focus on further optimizing the retrieval algorithms and expanding the range of supported data platforms to enhance the robustness and scalability of the system.
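
The two mechanisms named in the Methods part of the abstract, ranking samples by their similarity to a label category's cluster center and expanding the label category system through label semantic similarity mapping, can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: cosine similarity as the measure, a weighted sum with a hypothetical weight `alpha` to combine visual and label similarity, a hypothetical `threshold` for deciding when a new label maps onto an existing category, and the function names and toy vectors themselves. It should not be read as the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors, used as a cluster center."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_samples(samples, center_visual, center_label, alpha=0.5):
    """Rank samples against one label category's cluster center.

    Assumption: the combined score is a weighted sum of visual and label
    cosine similarity, with the hypothetical weight `alpha` on the visual part.
    Returns (sample_id, score) pairs, best match first.
    """
    scored = []
    for s in samples:
        visual_sim = cosine_similarity(s["visual"], center_visual)
        label_sim = cosine_similarity(s["label"], center_label)
        scored.append((s["id"], alpha * visual_sim + (1 - alpha) * label_sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def map_or_add_label(label_name, label_vec, categories, threshold=0.8):
    """Map a new dataset's label onto an existing category, or add it as a new one.

    Assumption: `categories` maps category name -> label embedding, and a label
    is merged into the closest category when similarity reaches `threshold`.
    """
    best_name, best_sim = None, -1.0
    for name, vec in categories.items():
        sim = cosine_similarity(label_vec, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is not None and best_sim >= threshold:
        return best_name
    categories[label_name] = list(label_vec)  # expand the category system
    return label_name


# Toy data: 2-dimensional stand-ins for visual features and label embeddings.
samples = [
    {"id": "a", "visual": [1.0, 0.0], "label": [1.0, 0.0]},
    {"id": "b", "visual": [0.0, 1.0], "label": [1.0, 0.0]},
    {"id": "c", "visual": [0.0, 1.0], "label": [0.0, 1.0]},
]
ranked = rank_samples(samples, center_visual=[1.0, 0.0], center_label=[1.0, 0.0])

categories = {"forest": [1.0, 0.0]}
merged = map_or_add_label("woodland", [0.9, 0.1], categories)
added = map_or_add_label("water", [0.0, 1.0], categories)
```

In this toy run, sample `a` matches the category center on both visual and label similarity and ranks first, `b` matches on label only, and `c` matches on neither. The label `woodland` is close enough to `forest` to be merged into it, while `water` is dissimilar and becomes a new category. In practice the vectors would come from an image feature extractor and a text embedding of the label, neither of which the abstract specifies.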