- 1 College of Electronic and Information Engineering, Changchun University, Changchun, Jilin, China
- 2 Key Laboratory of Intelligent Rehabilitation and Barrier-free for the Disabled, Ministry of Education, Changchun University, Changchun, Jilin, China
Hyperspectral image denoising is crucial for restoring effective information from noisy data and plays a significant role in various downstream applications. Deep learning–based methods have become the mainstream research direction due to their ability to handle complex noise. However, the spatial feature extraction of most existing methods is not comprehensive enough, and the adoption of a fixed spectral reconstruction mode fails to fully utilize the spectral information of the original image. To address these issues, we propose a Spatial-Spectral Modulated Network for hyperspectral image denoising (SMDe). It consists of a spatial feature extraction network and a spectral modulation module. For spatial feature extraction, we construct a hybrid network that combines Mamba and Transformer layers, which effectively captures both global and local spatial information. For spectral modulation, we design a discrimination strategy that adaptively preserves or reconstructs spectral information from the original image spectra. This information is then used to modulate the spatial features, enhancing the spectral fidelity of the denoised image. Experiments on synthetic datasets show that SMDe outperforms other advanced methods in most noisy scenarios, not only restoring image details but also maintaining excellent spectral consistency. In cross-dataset and real-data evaluations, the denoising results of SMDe also demonstrate strong competitiveness.
1 Introduction
Compared with RGB images, hyperspectral images provide abundant spectral information. This unique characteristic has led to their extensive applications in remote sensing (Li et al., 2024; Arun and Akila, 2023), material identification (Wang et al., 2019; Kang et al., 2022), industrial inspection (Długosz et al., 2023), food quality analysis (Wang et al., 2017; Ahmed et al., 2025), and medical diagnosis (Ma et al., 2024; Saeed et al., 2025). In these applications, the effective utilization of spectral information plays a critical role. For example, in agricultural crop analysis, spectral features are widely exploited for fine-grained crop or seed classification (Chen et al., 2025; Wang et al., 2025). However, during the imaging process, factors including atmospheric disturbance, dark current, and sensor defects inevitably introduce noise, thereby degrading image quality and limiting the reliable extraction and utilization of valuable information. In order to restore information from degraded data, hyperspectral image denoising methods have attracted significant research attention.
Among hyperspectral image denoising methods, traditional approaches typically exploit the intrinsic properties of hyperspectral data to construct degradation models based on handcrafted priors. Typical priors include non-local similarity (Jia et al., 2014; Wang and Li, 2014), spatial–spectral correlation (Wang and Xie, 2018; Fu et al., 2016), low-rankness (Sumarsono and Du, 2015; Lin et al., 2024), and sparsity (Tang and Zhou, 2018; Akhtar et al., 2014). Although traditional methods can achieve image restoration in some noisy scenes, they have limitations when dealing with complex types of noise. In addition, these methods often have complex modeling processes and are difficult to optimize.
In recent years, deep learning–based denoising methods have developed rapidly. These approaches can be broadly divided into CNN-based methods, Transformer-based methods, and Mamba-based methods. CNN-based methods extract image features through convolutional kernels for denoising modeling. For example, HSIDCNN (Yuan et al., 2019) employs three-dimensional convolution of different sizes in both spatial and spectral dimensions, enabling the network to extract spatial and spectral features and thereby learn the mapping from noisy images to clean hyperspectral images. QRNN3D (Wei et al., 2021) also utilizes three-dimensional convolutions for feature extraction and incorporates quasi-recurrent units to model global spectral correlations. In addition to CNN-based methods, Transformer-based methods have become a research focus owing to the strong contextual modeling capability of the self-attention mechanism (Dosovitskiy et al., 2021). For instance, SwinIR (Liang et al., 2021) employs window self-attention (WSA) to build a series of residual blocks and has been successfully applied to image restoration tasks, achieving state-of-the-art performance. For hyperspectral image denoising, SST (Li et al., 2023) employs WSA and global spectral self-attention (GSSA) to capture both spatial and spectral positional dependencies, and has achieved remarkable results. HSTNet (Yadav et al., 2025) combines 3D CNNs with a spectral transformer to jointly exploit local spectral–spatial feature extraction and long-range spectral dependency modeling, thereby improving hyperspectral image denoising performance. As for Mamba-based methods, they have recently attracted considerable attention due to their global receptive fields with linear complexity (Gu and Dao, 2024; Chen et al., 2024; Zhang et al., 2024). For example, SSUMamba (Fu et al., 2024) extends the original Mamba model with six scanning modes, enabling multi-directional modeling of spatial–spectral correlations in hyperspectral images. MambaIRv2 (Guo et al., 2025), on the other hand, rearranges pixels by taking advantage of the spatial correlation of images and inputs the global information of spatial positions into the state space model, requiring only one sequence scan, and achieves state-of-the-art performance in RGB image restoration.
Although existing hyperspectral image denoising methods have achieved promising performance, effectively modeling the complex interaction between spatial structures and spectral information remains a critical challenge. Window-based Transformer methods restrict the input to a predefined window size, preventing distant spatial information from directly interacting (Liu et al., 2021). On the other hand, Mamba, as a one-dimensional state-space sequence model, is effective in capturing long-range dependencies but lacks explicit mechanisms to model fine-grained local spatial structures in images. Simply altering the scanning order provides limited improvement for preserving detailed spatial information (Yu and Erichson, 2025; Qu et al., 2025). Moreover, most deep learning–based denoising approaches adopt fixed network parameters after training, implicitly applying the same spectral reconstruction pattern to different inputs and spatial regions. Such a strategy fails to fully exploit the input-dependent spectral characteristics of hyperspectral images.
To address these issues, this paper proposes a spatial–spectral modulation network for hyperspectral image denoising (SMDe). The proposed method integrates Mamba and Transformer to jointly capture global dependencies and local spatial structures. Moreover, we introduce a spectral modulation module that explicitly leverages the original input spectra to adaptively modulate spatial features in a patch-wise manner. By establishing a direct interaction between spatial structures and spectral information, SMDe improves the spatial and spectral consistency of the reconstructed hyperspectral image. The main contributions of this work are summarized as follows:
• We propose SMDe, a Spatial-Spectral Modulated Network for hyperspectral image denoising, which captures comprehensive spatial information while fully leveraging the spectral information from the original images.
• We construct a hybrid spatial feature extraction module combining Mamba and Transformer, enabling the model to capture comprehensive spatial features while modeling long-range dependencies and preserving local information.
• A spectral modulation module is designed to discriminate the input spectral vector and adaptively select a retention or reconstruction strategy, thereby enabling effective use of the spectral information in the original image.
• Experimental results on both synthetic and real hyperspectral data demonstrate the effectiveness of the proposed method for hyperspectral image denoising under various noise conditions.
2 Methodology
2.1 Overall network architecture
The proposed SMDe network consists of two main parts: a spatial feature extraction stage composed of residual networks, and a spectral modulation module. As shown in Figure 1, during spatial feature extraction, the noisy hyperspectral image $Y \in \mathbb{R}^{H \times W \times B}$ is fed into stacked spatial feature extraction blocks to produce spatial feature maps, where $H$ and $W$ denote the spatial dimensions and $B$ denotes the number of spectral bands.
Figure 1. Overall architecture of the proposed SMDe network. Given a noisy hyperspectral image as input, the spatial feature extraction module (a) is first employed to extract spatial features from the input image. Then, the spectral modulation module (b) divides both the extracted spatial feature maps and the original input into corresponding image patches. For each patch, a spectral strategy block is used to generate a spectral modulation vector, which adaptively modulates the spatial features. Finally, all modulated patches are concatenated to reconstruct the denoised hyperspectral image.
Finally, the output denoised hyperspectral image is reconstructed by concatenating all of the modulated patches.
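To make the pipeline concrete, the following minimal PyTorch sketch mirrors the data flow of Figure 1: spatial features are extracted from the noisy input, every patch is modulated by a spectral vector pooled from the corresponding input region, and the result is reassembled into the denoised image. All module internals, names, and sizes are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SMDeSketch(nn.Module):
    """Toy version of the SMDe data flow in Figure 1; the internals of the
    spatial extractor, strategy block, and head are placeholder assumptions."""
    def __init__(self, bands=31, feat=64, patch=16):
        super().__init__()
        self.patch = patch
        self.spatial = nn.Sequential(              # stand-in for the SFEB stack
            nn.Conv2d(bands, feat, 3, padding=1), nn.GELU(),
            nn.Conv2d(feat, feat, 3, padding=1))
        self.strategy = nn.Sequential(             # stand-in for the spectral strategy block
            nn.Linear(bands, feat), nn.GELU(), nn.Linear(feat, feat))
        self.head = nn.Conv2d(feat, bands, 3, padding=1)

    def forward(self, y):                          # y: (N, B, H, W) noisy HSI
        f = self.spatial(y)                        # spatial feature maps
        v = F.avg_pool2d(y, self.patch)            # one pooled spectrum per patch
        m = self.strategy(v.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        m = F.interpolate(m, size=f.shape[-2:], mode='nearest')  # broadcast back to patches
        return y + self.head(f * m)                # modulate, then reconstruct residually

x = torch.randn(1, 31, 64, 64)
print(SMDeSketch()(x).shape)                       # torch.Size([1, 31, 64, 64])
```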
2.2 Spatial feature extraction block
As shown in Figure 2, the SFEB is a residual block composed of six spatial feature extraction layers (SFELs). Each layer contains a Mamba layer and a Transformer layer. The Mamba layer employs the attentive state space model (ASSM) (Guo et al., 2025) to capture long-range dependencies in the image, while the core of the Transformer layer is window-based multi-head self-attention (WMSA) (Liu et al., 2021), which enhances local information within each window. In addition, ConvFFN (Fan et al., 2023) is applied as the feedforward network to further strengthen local feature representation.
Figure 2. The structure of the SFEB, which is a residual block composed of six sequential SFELs. Each layer consists of a Mamba layer and a Transformer layer.
The Mamba layer is formulated as described in Equations 11, 12:

$$X' = \mathrm{ASSM}(\mathrm{LN}(X)) + X, \tag{11}$$

$$X'' = \mathrm{ConvFFN}(\mathrm{LN}(X')) + X', \tag{12}$$

where LN denotes layer normalization, and ASSM uses the SGN (Guo et al., 2025) module to unfold the image into a 1D sequence according to semantic proximity, and then performs sequence scanning. The sequence modeling process is expressed as described in Equations 13, 14:

$$h_t = \bar{A} h_{t-1} + \bar{B} x_t, \tag{13}$$

$$y_t = C h_t, \tag{14}$$

where $x_t$ is the $t$-th token of the unfolded sequence, $h_t$ is the hidden state, $\bar{A}$ and $\bar{B}$ are the discretized state-transition and input matrices, and $C$ is the output matrix.
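As a concrete illustration of the recurrence in Equations 13, 14, the sketch below runs a discretized state-space scan over a token sequence with input-dependent $\bar{B}$ and $C$, in the spirit of selective state-space models; the exact ASSM parameterization differs, so this is a toy version under stated assumptions.

```python
import torch

def ssm_scan(x, A_bar, B_bar, C):
    """Minimal discretized SSM recurrence (Equations 13, 14):
    h_t = A_bar * h_{t-1} + B_bar_t * x_t,  y_t = C_t h_t.
    x: (L, D) tokens; A_bar: (D, S); B_bar, C: (L, D, S) input-dependent."""
    L, D = x.shape
    h = torch.zeros(D, A_bar.shape[-1])                 # hidden state, one per channel
    ys = []
    for t in range(L):
        h = A_bar * h + B_bar[t] * x[t].unsqueeze(-1)   # state update (Eq. 13)
        ys.append((C[t] * h).sum(-1))                   # state readout (Eq. 14)
    return torch.stack(ys)                              # (L, D)

L, D, S = 16, 8, 4
y = ssm_scan(torch.randn(L, D), torch.rand(D, S) * 0.9,
             torch.randn(L, D, S), torch.randn(L, D, S))
print(y.shape)  # torch.Size([16, 8])
```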
The Transformer layer is formulated as described in Equations 15, 16:

$$X' = \mathrm{WMSA}(\mathrm{LN}(X)) + X, \tag{15}$$

$$X'' = \mathrm{ConvFFN}(\mathrm{LN}(X')) + X', \tag{16}$$

where WMSA models the spatial relationships within local windows. The attention computation is given as described in Equation 17:

$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\left(\frac{QK^{\top}}{\sqrt{d}} + P\right)V, \tag{17}$$

where $Q$, $K$, and $V$ are the query, key, and value matrices projected from the window features, $d$ is the dimension of each attention head, and $P$ is the learnable relative position bias.
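The window attention of Equation 17 can be sketched as follows: the feature map is split into non-overlapping windows, and scaled dot-product attention with a learned position bias is computed independently inside each window. This single-head simplification is an assumption; the actual WMSA layer is multi-head, typically with shifted windows in alternating blocks.

```python
import torch
import torch.nn as nn

def window_attention(x, win, qkv, bias):
    """Single-head window self-attention over an (N, H, W, C) feature map;
    each win x win window attends only within itself (Equation 17)."""
    n, h, w, c = x.shape
    x = x.view(n, h // win, win, w // win, win, c)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, c)  # windows as batch
    q, k, v = qkv(x).chunk(3, dim=-1)                          # project to Q, K, V
    attn = (q @ k.transpose(-2, -1)) / c ** 0.5 + bias         # scaled scores + position bias
    out = attn.softmax(dim=-1) @ v                             # weighted sum of values
    out = out.view(n, h // win, w // win, win, win, c)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(n, h, w, c)

win, c = 8, 32
qkv = nn.Linear(c, 3 * c)
bias = torch.zeros(win * win, win * win)  # simplified stand-in for the relative position bias
x = torch.randn(2, 16, 16, c)
print(window_attention(x, win, qkv, bias).shape)  # torch.Size([2, 16, 16, 32])
```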
The ConvFFN block is formulated as described in Equations 18, 19:

$$\hat{X} = \mathrm{GELU}\big(\mathrm{DWConv}(\mathrm{Linear}_{1}(X))\big), \tag{18}$$

$$\mathrm{ConvFFN}(X) = \mathrm{Linear}_{2}(\hat{X}), \tag{19}$$

where DWConv denotes a depthwise separable convolution with a kernel size of $3 \times 3$, which aggregates neighboring spatial information and thereby strengthens local feature representation.
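A minimal ConvFFN in the spirit of Equations 18, 19 is shown below: point-wise expansion, a depthwise convolution for local mixing, then point-wise projection back. The expansion ratio and activation placement are assumptions.

```python
import torch
import torch.nn as nn

class ConvFFN(nn.Module):
    """Feedforward block with a depthwise convolution between the two
    point-wise layers, so each token also mixes with its spatial neighbors."""
    def __init__(self, dim, ratio=2, kernel=3):
        super().__init__()
        hidden = dim * ratio
        self.fc1 = nn.Conv2d(dim, hidden, 1)                     # point-wise expansion
        self.dw = nn.Conv2d(hidden, hidden, kernel,
                            padding=kernel // 2, groups=hidden)  # depthwise local mixing
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(hidden, dim, 1)                     # point-wise projection

    def forward(self, x):                                        # x: (N, C, H, W)
        return self.fc2(self.act(self.dw(self.fc1(x))))

x = torch.randn(1, 64, 32, 32)
print(ConvFFN(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```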
2.3 Spectral strategy block
As shown in Figure 3, to adaptively select between retaining or reconstructing the regionally pooled spectral vectors, the SSB first applies a forward difference along the channel dimension to the pooled spectral vector and feeds the resulting difference vector into a discriminator,
where the discriminator is a three-layer fully connected network. The first two layers use LeakyReLU as the activation function, while the final layer employs Sigmoid to output the discrimination value. An empirical threshold of 0.5 is used. If the discrimination value exceeds the threshold, the original spectral vector is retained; otherwise, a bidirectional GRU maps the spectral vectors to the feature space, followed by a linear layer to reconstruct the spectral vectors.
Figure 3. The SSB determines whether to retain or reconstruct the input spectral vectors, which are then used for subsequent modulation.
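The retain-or-reconstruct logic of the SSB translates to a few lines of PyTorch. Only the three-layer fully connected discriminator, the 0.5 threshold, and the bidirectional GRU followed by a linear layer come from the description above; layer widths and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class SpectralStrategyBlock(nn.Module):
    """Per patch: forward-difference the pooled spectrum, let an MLP
    discriminator score it, then keep or reconstruct the spectrum."""
    def __init__(self, bands=31, hidden=64):
        super().__init__()
        self.disc = nn.Sequential(                 # three-layer FC discriminator
            nn.Linear(bands - 1, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())
        self.gru = nn.GRU(1, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, 1)       # map GRU features back to band values

    def forward(self, v):                          # v: (N, B) pooled spectral vectors
        d = v[:, 1:] - v[:, :-1]                   # forward difference along channels
        keep = self.disc(d) > 0.5                  # (N, 1) retain-or-reconstruct decision
        seq, _ = self.gru(v.unsqueeze(-1))         # treat the bands as a sequence
        rec = self.proj(seq).squeeze(-1)           # (N, B) reconstructed spectra
        return torch.where(keep, v, rec)

v = torch.randn(8, 31)
print(SpectralStrategyBlock()(v).shape)            # torch.Size([8, 31])
```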
3 Experimental setup
3.1 Datasets
The ICVL (Arad and Ben-Shahar, 2016) hyperspectral image dataset contains 201 images with a spectral range of 400–700 nm. Each image has 31 bands with a wavelength interval of 10 nm. Following the settings in QRNN3D (Wei et al., 2021), we select 100 images from the ICVL dataset to build the training set and extract image blocks of size 64 × 64 for training.
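A minimal sketch of the training patch extraction is given below; non-overlapping crops and the absence of augmentation are assumptions, not necessarily the original pipeline.

```python
import numpy as np

def extract_patches(img, size=64, stride=64):
    """Crop size x size spatial blocks from one (B, H, W) hyperspectral cube."""
    b, h, w = img.shape
    patches = [img[:, i:i + size, j:j + size]
               for i in range(0, h - size + 1, stride)
               for j in range(0, w - size + 1, stride)]
    return np.stack(patches)                           # (num_patches, B, size, size)

hsi = np.random.rand(31, 512, 512).astype(np.float32)  # stand-in for an ICVL image
print(extract_patches(hsi).shape)                      # (64, 31, 64, 64)
```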
3.2 Synthetic noise denoising experiments
We define nine types of synthetic noise scenarios, listed below; a synthesis sketch for the complex cases follows the list:
Cases 1–4: Gaussian white noise with zero mean and standard deviations of 30, 50, 70, and a blind setting (randomly selected from 30 to 70).
Case 5: Different bands are corrupted by Gaussian noise with standard deviations randomly selected from 10 to 70. This type of noise is also known as non-i.i.d. noise.
Case 6: Based on Case 5, add stripe noise to 5%–15% of the columns in one-third of the randomly selected bands.
Case 7: Based on Case 5, add deadline noise to 5%–15% of the columns in one-third of the randomly selected bands.
Case 8: Based on Case 5, add impulse noise ranging in intensity from 0.1 to 0.7 to one-third of the randomly selected bands.
Case 9: Each band is interfered with at random by at least one of the types of noise in Cases 5–8.
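The sketch below synthesizes the complex degradations of Cases 5–8 on a [0, 1]-scaled cube. The band and column fractions follow the definitions above; the stripe amplitudes, impulse values, the 255-based noise scale, and the sequential composition used for the demonstration are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noniid_gaussian(x):                        # Case 5: per-band sigma in [10, 70]/255
    sigma = rng.uniform(10, 70, size=(x.shape[0], 1, 1)) / 255.0
    return x + sigma * rng.standard_normal(x.shape)

def pick(x, band_frac=1 / 3, col_frac=(0.05, 0.15)):
    """Choose one-third of the bands and 5%-15% of the columns per band."""
    bands = rng.choice(x.shape[0], int(x.shape[0] * band_frac), replace=False)
    cols = [rng.choice(x.shape[2], int(x.shape[2] * rng.uniform(*col_frac)),
                       replace=False) for _ in bands]
    return zip(bands, cols)

def add_stripes(x):                                # Case 6: constant offset per column
    y = x.copy()
    for b, cols in pick(x):
        y[b][:, cols] += rng.uniform(-0.25, 0.25, size=len(cols))
    return y

def add_deadlines(x):                              # Case 7: zeroed (dead) columns
    y = x.copy()
    for b, cols in pick(x):
        y[b][:, cols] = 0.0
    return y

def add_impulse(x, band_frac=1 / 3):               # Case 8: salt-and-pepper, ratio 0.1-0.7
    y = x.copy()
    for b in rng.choice(x.shape[0], int(x.shape[0] * band_frac), replace=False):
        mask = rng.random(x.shape[1:]) < rng.uniform(0.1, 0.7)
        y[b][mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return y

clean = rng.random((31, 64, 64)).astype(np.float32)
noisy = add_impulse(add_deadlines(add_stripes(add_noniid_gaussian(clean))))
print(noisy.shape)  # (31, 64, 64)
```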
For evaluation, 50 images from the ICVL test set are used for the first four Gaussian noise scenarios, while the remaining 51 images are used for the last five complex noise scenarios. The CAVE dataset (Yasuma et al., 2010) is employed for synthetic experiments in the first four Gaussian noise scenarios, whereas the Harvard dataset (Chakrabarti and Zickler, 2011) is used for the last five complex noise scenarios. All test images are of size 512 × 512 with 31 spectral bands.
3.3 Real-world denoising experiments
The Urban dataset contains 210 bands with a spectral range of 400–2500 nm and a spectral resolution of 10 nm. Due to water vapor absorption and atmospheric effects, some bands in the dataset suffer from real noise degradation. For testing, we extract hyperspectral image patches containing noisy bands, sampled at 31 bands. The spatial resolution of each image patch is 304 × 304.
3.4 Network training
We implement our model using the PyTorch framework and optimize it with the AdamW optimizer by minimizing the mean squared error (MSE) between the outputs and the ground truths. A multi-stage learning rate scheduling strategy is adopted, in which the learning rate is initialized at a preset value and then decayed in stages as training progresses.
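In code, the optimization setup amounts to the loop below; AdamW, the MSE objective, and staged learning rate decay come from the text, while the specific learning rate, milestones, batch, and model are placeholder assumptions.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(31, 31, 3, padding=1)               # placeholder for SMDe
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr value is an assumption
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[30, 60, 90], gamma=0.5)
loss_fn = nn.MSELoss()

for epoch in range(2):                                # shortened; real training runs longer
    noisy = torch.randn(4, 31, 64, 64)                # stand-in for a training batch
    clean = torch.randn(4, 31, 64, 64)
    opt.zero_grad()
    loss = loss_fn(model(noisy), clean)               # MSE between output and ground truth
    loss.backward()
    opt.step()
    sched.step()                                      # multi-stage learning-rate decay
```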
4 Results and discussion
4.1 Denoising results for synthetic noise on the ICVL dataset
To evaluate the denoising performance of the proposed method, we conduct experiments on nine synthetic noise scenarios of the ICVL dataset. The objective metrics are reported in Table 1. Figures 4, 5 present the results and corresponding spectral curves in the Gaussian blind denoising scenario (Case 4), while Figures 6, 7 show the results and spectral curves in the complex noise scenario (Case 9).
Table 1. Denoising results with different noise on ICVL. Cases 1–4 correspond to Gaussian noise with σ = 30, 50, 70, and blind settings; Cases 5–9 correspond to non-i.i.d., stripe, deadline, impulse, and mixture noise. The best results in each row are in bold, the second best results are underlined.
Figure 4. Denoising visualization comparison for simulated Gaussian noise (case 4) on the ICVL. The pseudocolor image consists of bands (9, 14, 31). Zoom in for a better view of the difference.
Figure 6. Denoising visualization comparison for simulated complex noise (case 9) on the ICVL. The pseudocolor image consists of bands (5, 16, 29). Zoom in for a better view of the difference.
As shown in Table 1, SMDe achieves the best results in Cases 2, 4, 6, 8, and 9, and also obtains suboptimal results in the remaining scenarios. Compared with the second-best method, MambaIRv2, SMDe achieves an average PSNR improvement of 0.2 dB across the nine noise scenarios. Moreover, SMDe consistently outperforms all competing methods in terms of the SAM metric across all scenarios, indicating that it maintains good spectral consistency under different denoising conditions.
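For reference, the two reported metrics can be computed as below: band-averaged PSNR and the mean per-pixel spectral angle (SAM) in degrees, where a lower angle indicates better spectral consistency. Averaging conventions vary slightly across papers, so this is one common variant.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Band-averaged PSNR between two (B, H, W) hyperspectral images."""
    mse = ((ref - est) ** 2).mean(axis=(1, 2))                 # per-band MSE
    return float(np.mean(10 * np.log10(peak ** 2 / mse)))

def sam(ref, est, eps=1e-8):
    """Mean spectral angle (degrees) between corresponding pixel spectra."""
    ref = ref.reshape(ref.shape[0], -1)                        # (B, H*W)
    est = est.reshape(est.shape[0], -1)
    cos = (ref * est).sum(0) / (np.linalg.norm(ref, axis=0) *
                                np.linalg.norm(est, axis=0) + eps)
    return float(np.degrees(np.arccos(np.clip(cos, -1, 1))).mean())

gt = np.random.rand(31, 64, 64)
noisy = np.clip(gt + 0.05 * np.random.randn(*gt.shape), 0, 1)
print(psnr(gt, noisy), sam(gt, noisy))
```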
For Gaussian blind denoising (Case 4), as shown in Figure 4, all methods achieve strong denoising performance. In particular, the spectral response curves in Figure 5 show that our method produces results closest to the ground truth. For complex noise denoising (Case 9), Figure 6 illustrates that HSIDCNN, QRNN3D, and T3SC suffer from excessive smoothing, while SST, SSUMamba, and MambaIRv2 exhibit obvious stripes. In contrast, SMDe provides the best visual quality. As shown in Figure 7, the spectral curves of SMDe and MambaIRv2 are the closest to the ground truth. In summary, the denoising results on Gaussian and complex noise demonstrate that SMDe can effectively restore image details while maintaining good spectral consistency across different noise scenarios.
4.2 Denoising results for synthetic noise on the CAVE dataset
To evaluate cross-dataset generalization, we conduct experiments on the CAVE dataset with four synthetic Gaussian noise scenarios, where all methods are tested using the parameters trained on the ICVL dataset. As shown in Table 2, the objective metrics demonstrate that for noise levels of 30 (Case 1), 50 (Case 2), and blind denoising (Case 4), all models achieve comparable performance, with SST obtaining the best result for Case 1 and SMDe achieving the best results for Case 2 and Case 4. For noise level 70 (Case 3), the performance of all methods except SMDe and MambaIRv2 drops significantly. Specifically, compared with Case 2 (noise level 50), SMDe and MambaIRv2 show a moderate PSNR decrease of about 1 dB, whereas the other methods suffer a much larger drop of approximately 3 dB.
Table 2. Denoising results with different noise on CAVE. Cases 1–4: Gaussian noise with σ = 30, 50, 70, and blind settings. The best results in each row are in bold, the second best results are underlined.
As shown in Figure 8, for the denoising results of Case 3, SMDe preserves clear structural details, whereas other methods suffer from over-smoothing and blurred boundaries. Furthermore, as observed in Figure 9, the spectral response curves of SMDe, QRNN3D, and SSUMamba are closer to the ground truth than those of the other methods. Overall, these results indicate that SMDe remains highly competitive for denoising on datasets different from the training domain.
Figure 8. Denoising visualization comparison for simulated Gaussian noise (case 3) on the CAVE. The pseudocolor image consists of bands (5, 17, 26). Zoom in for a better view of the difference.
4.3 Denoising results for synthetic noise on the Harvard dataset
To further evaluate cross-dataset generalization, we conduct experiments on the Harvard dataset with five complex noise scenarios, where all methods are again tested using the parameters trained on the ICVL dataset. As shown in Table 3, SMDe achieves the highest PSNR in the non-i.i.d. (Case 5), deadline (Case 7), and impulse (Case 8) noise scenarios, and obtains the second-highest PSNR in the mixture noise scenario (Case 9). MambaIRv2 attains the highest PSNR for mixture noise (Case 9) while ranking second in the other scenarios. SST produces the best result for stripe noise (Case 6).
Table 3. Denoising results with different noise on Harvard. Case 5–9: non-i.i.d., stripe, deadline, impulse, and mixture complex noise. The best results in each row are in bold, the second best results are underlined.
Figure 10 shows the denoised images for Case 9. The results of HSIDCNN and T3SC appear over-smoothed, while QRNN3D still contains noticeable noise, and SSUMamba lacks vertical structural details. The results of SST, MambaIRv2, and SMDe are visually similar. Figure 11 presents the spectral response curves. It can be seen that the reconstructions by SMDe and SSUMamba are closest to the ground truth. Overall, these observations indicate that SMDe maintains strong denoising performance for complex noise across different datasets.
Figure 10. Denoising visualization comparison for simulated complex noise (case 9) on the Harvard. The pseudocolor image consists of bands (5, 14, 26). Zoom in for a better view of the difference.
4.4 Denoising results for real noise on the Urban dataset
To evaluate the effectiveness of SMDe in removing real noise, we conduct experiments on the Urban dataset. Figure 12 presents the denoising results of the 104th band. HSIDCNN and T3SC exhibit varying degrees of over-smoothing, while QRNN3D still contains noticeable noise. SSUMamba produces the brightest results, but the image boundaries are unclear. SMDe and MambaIRv2 preserve the image details most effectively.
Figure 12. Denoising visualization comparison for real noise on the Urban. Zoom in for a better view of the difference.
4.5 Ablation study
To validate the effectiveness of each component in our model, we conduct ablation experiments using the Case 9 noise test set of the ICVL dataset as the benchmark. We gradually add the Mamba layer, the Transformer layer, and the spectral modulation module (SMM) to the model for training. To evaluate the effectiveness of the SSB, we also compare results with and without it. Table 4 reports the objective metrics of these experiments, showing that the full model achieves the best denoising performance.
4.6 Computational complexity and runtime analysis
To further evaluate the efficiency and practicality of the proposed method, we analyze the computational complexity and runtime performance of different methods. All experiments are conducted on hyperspectral images of size 512 × 512 × 31.
Table 5. Comparisons of Params, GFLOPs, inference time, and PSNR of different methods with an input size of 512 × 512 × 31.
HSIDCNN and QRNN3D achieve relatively fast inference speed, but rely on higher computational cost or exhibit limited denoising performance. SST and SSUMamba demonstrate strong restoration capability at the expense of substantially increased computational complexity and runtime. In contrast, the proposed SMDe achieves the highest PSNR with relatively low GFLOPs and a moderate number of parameters, indicating a favorable balance between denoising performance and computational efficiency.
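The quantities in Table 5 can be measured with a few lines of PyTorch, as sketched below; the model is a placeholder, and GFLOPs are typically obtained with an external counter such as thop or fvcore.

```python
import time
import torch
import torch.nn as nn

model = nn.Conv2d(31, 31, 3, padding=1)        # placeholder for any denoiser under test
x = torch.randn(1, 31, 512, 512)               # the 512 x 512 x 31 evaluation size

params = sum(p.numel() for p in model.parameters())
print(f"Params: {params / 1e6:.3f} M")

model.eval()
with torch.no_grad():
    for _ in range(3):                         # warm-up iterations
        model(x)
    t0 = time.perf_counter()                   # on GPU, wrap with torch.cuda.synchronize()
    for _ in range(10):
        model(x)
    print(f"Inference: {(time.perf_counter() - t0) / 10:.4f} s per image")
```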
5 Conclusion
In this paper, we proposed a spatial–spectral modulation network (SMDe) for hyperspectral image denoising, consisting of two main components: spatial feature extraction and spectral modulation. A residual network combining Mamba and Transformer is employed to extract spatial features, while a spectral modulation module adaptively determines whether spectral information requires reconstruction. The extracted spatial and spectral information is fused to generate the denoised image. Experimental results on both synthetic and real hyperspectral datasets demonstrate that SMDe outperforms state-of-the-art approaches in most noisy scenarios, effectively restoring image details.
Nevertheless, the current method relies on regionally pooled spectra obtained through window-based image partitioning, which may limit the accuracy of spectral information extraction. Moreover, the spectral modulation module may not fully capture subtle spectral variations in complex or highly noisy regions, potentially limiting the accuracy of spectral reconstruction. Future work will focus on integrating region-based semantic segmentation for patch division, exploring adaptive spectral reconstruction strategies, and improving the generalization and computational efficiency of the model to broaden its applicability to diverse hyperspectral datasets.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.
Author contributions
KY: Conceptualization, Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing. WY: Data curation, Validation, Visualization, Writing – review and editing. BF: Formal Analysis, Writing – review and editing. HX: Investigation, Visualization, Writing – review and editing. XX: Funding acquisition, Supervision, Writing – review and editing. WS: Supervision, Writing – review and editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was partially supported by the Project of the Jilin Provincial Department of Science and Technology (No. 20220101133JC).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frsen.2025.1712816/full#supplementary-material
References
Ahmed, M. T., Monjur, O., Khaliduzzaman, A., and Kamruzzaman, M. (2025). A comprehensive review of deep learning-based hyperspectral image reconstruction for agri-food quality appraisal. Artif. Intell. Rev. 58, 96. doi:10.1007/s10462-024-11090-w
Akhtar, N., Shafait, F., and Mian, A. (2014). Sparse spatio-spectral representation for hyperspectral image super-resolution. Comput. Vis. – ECCV 2014, 63–78. doi:10.1007/978-3-319-10584-0_5
Arad, B., and Ben-Shahar, O. (2016). “Sparse recovery of hyperspectral signal from natural rgb images,” in Computer Vision – 14th European Conference, ECCV 2016, Proceedings, 19–34. doi:10.1007/978-3-319-46478-7_2
Arun, A. S., and Akila, A. S. (2023). Land-cover classification with hyperspectral remote sensing image using cnn and spectral band selection. Remote Sens. Appl. Soc. Environ. 31, 100986. doi:10.1016/j.rsase.2023.100986
Bodrito, T., Zouaoui, A., Chanussot, J., and Mairal, J. (2021). A trainable spectral-spatial sparse coding model for hyperspectral image restoration. Adv. Neural Inf. Process. Syst. 34, 5430–5442. Available online at: https://arxiv.org/abs/2111.09708.
Chakrabarti, A., and Zickler, T. (2011). “Statistics of real-world hyperspectral images,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 193–200. doi:10.1109/cvpr.2011.5995660
Chen, K., Chen, B., Liu, C., Li, W., Zou, Z., and Shi, Z. (2024). Rsmamba: remote sensing image classification with state space model. IEEE Geoscience Remote Sens. Lett. 21, 1–5. doi:10.1109/LGRS.2024.3407111
Chen, G., Li, G., Jin, S., and Bai, L. (2025). Dacnet: depth-aware convolutional network for corn hyperspectral image classification. Eng. Res. Express 7, 045231. doi:10.1088/2631-8695/ae1368
Długosz, J., Dao, P. B., Staszewski, W. J., and Uhl, T. (2023). Damage detection in composite materials using hyperspectral imaging. Eur. Workshop Struct. Health Monit., 463–473. doi:10.1007/978-3-031-07258-1_48
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). “An image is worth 16x16 words: transformers for image recognition at scale,” in International Conference on Learning Representations.
Fan, Q., Huang, H., Guan, J., and He, R. (2023). Rethinking local perception in lightweight vision transformer. arXiv preprint arXiv:2303.17803.
Fu, Y., Zheng, Y., Sato, I., and Sato, Y. (2016). Exploiting spectral-spatial correlation for coded hyperspectral image restoration. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 3727–3736. doi:10.1109/CVPR.2016.405
Fu, G., Xiong, F., Lu, J., and Zhou, J. (2024). Ssumamba: spatial-spectral selective state space model for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 62, 1–14. doi:10.1109/TGRS.2024.3446812
Guo, H., Guo, Y., Zha, Y., Zhang, Y., Li, W., Dai, T., et al. (2025). “Mambairv2: attentive state space restoration,” in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 28124–28133. doi:10.1109/cvpr52734.2025.02619
Jia, M., Gong, M., Zhang, E., Li, Y., and Jiao, L. (2014). Hyperspectral image classification based on nonlocal means with a novel class-relativity measurement. IEEE Geosci. Remote Sens. Lett. 11, 1300–1304. doi:10.1109/LGRS.2013.2292823
Kalman, L. S., and Bassett, E. M., III (1997). Classification and material identification in an urban environment using HYDICE hyperspectral data. Imaging Spectrom. III (SPIE) 3118, 57–68. doi:10.1117/12.283843
Kang, X., Wang, Z., Duan, P., and Wei, X. (2022). The potential of hyperspectral image classification for oil spill mapping. IEEE Trans. Geosci. Remote Sens. 60, 1–15. doi:10.1109/TGRS.2022.3205966
Li, M., Fu, Y., and Zhang, Y. (2023). Spatial-spectral transformer for hyperspectral image denoising. Proc. AAAI Conf. Artif. Intell. 37, 1368–1376. doi:10.1609/aaai.v37i1.25221
Li, Z., Chen, G., Li, G., Zhou, L., Pan, X., Zhao, W., et al. (2024). Dbanet: dual-branch attention network for hyperspectral remote sensing image classification. Comput. Electr. Eng. 118, 109269. doi:10.1016/j.compeleceng.2024.109269
Liang, J., Cao, J., Sun, G., Zhang, K., Gool, L. V., and Timofte, R. (2021). “Swinir: image restoration using swin transformer,” in 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1833–1844. doi:10.1109/ICCVW54120.2021.00210
Lin, P., Sun, L., Wu, Y., and Ruan, W. (2024). Hyperspectral image denoising via correntropy-based nonconvex low-rank approximation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 17, 6841–6859. doi:10.1109/JSTARS.2024.3373466
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., et al. (2021). “Swin transformer: hierarchical vision transformer using shifted windows,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 9992–10002. doi:10.1109/ICCV48922.2021.00986
Ma, L., Pruitt, K., and Fei, B. (2024). A hyperspectral surgical microscope with super-resolution reconstruction for intraoperative image guidance. Proc. SPIE Int. Soc. Opt. Eng. 12930, 129300Z. doi:10.1117/12.3008789
Saeed, A., Hadoux, X., and van Wijngaarden, P. (2025). Hyperspectral retinal imaging biomarkers of ocular and systemic diseases. Eye 39, 667–672. doi:10.1038/s41433-024-03135-9
Sumarsono, A., and Du, Q. (2015). Hyperspectral image classification with low-rank subspace and sparse representation. Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2864–2867. doi:10.1109/IGARSS.2015.7326412
Tang, S., and Zhou, N. (2018). Local similarity regularized sparse representation for hyperspectral image super-resolution. Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 5120–5123. doi:10.1109/IGARSS.2018.8518168
Wang, R., and Li, H.-C. (2014). Nonlocal similarity regularization for sparse hyperspectral unmixing. Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2926–2929. doi:10.1109/IGARSS.2014.6947089
Wang, X., and Xie, W. (2018). “An adaptive spatio-spectral domain correlation parallel framework for hyperspectral image classification,” in Proc. IEEE Int. Conf. Signal Process. (ICSP), 350–354. doi:10.1109/ICSP.2018.8652407
Wang, X., Rohani, N., Manerikar, A., Katsagellos, A., Cossairt, O., and Alshurafa, N. (2017). “Distinguishing nigerian food items and calorie content with hyperspectral imaging,” in New trends image anal. Process. – ICIAP 2017, 462–470. doi:10.1007/978-3-319-70742-6_45
Wang, H., Wang, H., Yu, W., and Li, H. (2019). “Research on wood species recognition method based on hyperspectral image texture features,” in Proc. Int. Conf. Mech. Control Comput. Eng. (ICMCCE), 413–4133. doi:10.1109/ICMCCE48743.2019.00099
Wang, B., Chen, G., Wen, J., Li, L., Jin, S., Li, Y., et al. (2025). Ssatnet: spectral-spatial attention transformer for hyperspectral corn image classification. Front. Plant Sci. 15, 1458978. doi:10.3389/fpls.2024.1458978
Wei, K., Fu, Y., and Huang, H. (2021). 3-d quasi-recurrent neural network for hyperspectral image denoising. IEEE Trans. Neural Netw. Learn. Syst. 32, 363–375. doi:10.1109/TNNLS.2020.2978756
Yadav, D. P., Kumar, D., Jalal, A. S., and Sharma, B. (2025). Hyperspectral image denoising through hybrid spectral transformer network. Adv. Space Res. 76, 6673–6693. doi:10.1016/j.asr.2025.09.028
Yasuma, F., Mitsunaga, T., Iso, D., and Nayar, S. K. (2010). Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Process. 19, 2241–2253. doi:10.1109/TIP.2010.2046811
Yuan, Q., Zhang, Q., Li, J., Shen, H., and Zhang, L. (2019). Hyperspectral image denoising employing a spatial–spectral deep residual convolutional neural network. IEEE Trans. Geosci. Remote Sens. 57, 1205–1218. doi:10.1109/TGRS.2018.2865197
Keywords: deep learning, denoising, hyperspectral image, mamba, neural network, transformer
Citation: Yang K, Yuan W, Fang B, Xia H, Xing X and Shang W (2026) SMDe: enhancing hyperspectral image denoising through a spatial-spectral modulated network. Front. Remote Sens. 6:1712816. doi: 10.3389/frsen.2025.1712816
Received: 25 September 2025; Accepted: 24 December 2025;
Published: 12 January 2026.
Edited by:
Qiangqiang Yuan, Wuhan University, China

Reviewed by:

Dhirendra Prasad Yadav, GLA University, India

GongChao Chen, Henan Institute of Science and Technology, China
Copyright © 2026 Yang, Yuan, Fang, Xia, Xing and Shang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xiaoxue Xing, xingxx@ccu.edu.cn