ORIGINAL RESEARCH article

Front. Phys., 02 April 2025

Sec. Radiation Detectors and Imaging

Volume 13 - 2025 | https://doi.org/10.3389/fphy.2025.1575606

This article is part of the Research Topic "Multi-Sensor Imaging and Fusion: Methods, Evaluations, and Applications, Volume III".

Multi-focus image fusion based on pulse coupled neural network and WSEML in DTCWT domain

  • 1School of Statistics, Renmin University of China, Beijing, China
  • 2School of Computer Science and Technology, Xinjiang University, Urumqi, China

The goal of multi-focus image fusion is to merge near-focus and far-focus images of the same scene into an all-focus image that accurately and comprehensively represents the focus information of the entire scene. Current multi-focus fusion algorithms can cause loss of detail and edges, as well as local blurring, in the resulting images. To address these problems, a novel multi-focus image fusion method based on the pulse coupled neural network (PCNN) and the weighted sum of eight-neighborhood-based modified Laplacian (WSEML) in the dual-tree complex wavelet transform (DTCWT) domain is proposed in this paper. The source images are decomposed by DTCWT into low- and high-frequency components; an average gradient (AG)-motivated PCNN-based fusion rule is then used to process the low-frequency components, and a WSEML-based fusion rule is used to process the high-frequency components. Simulation experiments on the public Lytro dataset demonstrate the superiority of the proposed algorithm.

1 Introduction

Multi-focus image fusion is a technique in the field of image processing that combines multiple images, each focused on different objects or regions, into a single image that captures the sharp details from all focal points [1]. This approach is particularly useful in applications where the depth of field is limited, such as in macro photography, surveillance, medical imaging, and robotics [2, 3].

In typical photography, a single image can only present objects within a certain range of focus clearly, leaving objects closer or farther away blurry [4, 5]. However, by capturing several images with different focus points and then combining them through image fusion techniques, it is possible to create a final image that maintains sharpness across a wider range of depths [6–8].

The process of multi-focus image fusion generally involves several key steps: image alignment, where all the images are spatially registered; focus measurement, where the sharpness of various regions in each image is assessed; and fusion, where the sharpest information from each image is retained [9–11]. Advanced fusion algorithms, including pixel-level, transform-domain, and machine learning-based methods, can be employed to optimize fusion quality and preserve important features from all focused regions. This technology has a broad range of applications. In medical imaging, it helps to create clearer, more detailed visualizations of organs or tissues. In surveillance, it enhances the clarity of objects at varying distances. In robotics, it contributes to improved perception by enabling robots to focus on multiple objects simultaneously [12, 13]. As computational power and algorithms continue to advance, multi-focus image fusion is expected to play an increasingly significant role in a variety of fields requiring high-quality visual information [14–17].

Currently, image fusion methods can be categorized into two types: traditional algorithms and deep learning algorithms [18–20]. Traditional algorithms typically rely on handcrafted features and conventional image processing techniques, such as the Laplacian pyramid [21], wavelet transform [22], dual-tree complex wavelet transform (DTCWT) [23], contourlet [24–26], shearlet [27, 28] and gradient-based methods [29], to combine focused regions from multiple images. Mohan et al. [30] introduced a multi-focus image fusion method based on the quarter-shift dual-tree complex wavelet transform (qshiftN DTCWT) and modified principal component analysis (MPCA) in the Laplacian pyramid (LP) domain; this method outperforms many state-of-the-art techniques in terms of visual and quantitative evaluations. Mohan et al. [31] introduced an image fusion method based on DTCWT combined with the stationary wavelet transform (SWT). Lu et al. [32] introduced a multi-focus image fusion method using residual removal and a fractional-order differentiation focus measure; this algorithm employs the nonsubsampled shearlet transform together with a sum of Gaussian-based fractional-order differentiation. These methods are generally effective in simpler scenarios, but they may struggle with more complex images, especially when dealing with varying levels of focus and noise. The pulse coupled neural network (PCNN) also has extensive applications in image fusion; Xie et al. [33] proposed a multi-focus image fusion method based on the sum-modified Laplacian and PCNN in the nonsubsampled contourlet transform domain, which markedly improves focus clarity.

On the other hand, deep learning has extensive applications in image fusion [34–37], image segmentation [38, 39], video restoration [40–44], and image super-resolution [45, 46]. Deep learning algorithms leverage convolutional neural networks (CNNs), Transformers, generative adversarial networks (GANs), Mamba and other advanced models to automatically learn features and perform fusion in an end-to-end manner [47–49]. These methods can adapt to a wide range of image complexities, providing more accurate and visually appealing fused images, especially in challenging conditions such as low-light or high-noise environments [50, 51]. Deep learning approaches have shown superior performance in recent years, particularly with the availability of large datasets and powerful computational resources [52, 53].

Inspired by the ideas of the algorithm in Reference [33], this paper proposes a novel multi-focus image fusion method based on PCNN and the weighted sum of eight-neighborhood-based modified Laplacian (WSEML) in the DTCWT domain. The motivation behind this approach is to achieve a robust and effective fusion method that can handle complex images with varying focus levels and noise, while also being computationally efficient. The source images are decomposed by DTCWT into low- and high-frequency components; an average gradient (AG)-motivated PCNN fusion rule is then used to process the low-frequency components, and a WSEML-based fusion rule is used to process the high-frequency components. The algorithm’s superiority is validated through comparative experiments on the public Lytro dataset.

2 DTCWT

The dual-tree complex wavelet transform (DTCWT) is an advanced signal processing technique designed to overcome some of the limitations of the traditional discrete wavelet transform (DWT) [54]. It was introduced to provide better performance in tasks such as image denoising, compression, and feature extraction. The DTCWT is particularly useful for applications where directional sensitivity and shift invariance are important.

The DTCWT provides improved directional information compared to the traditional wavelet transforms. It uses two parallel trees of wavelet filters (hence “dual-tree”), one for the real part and one for the imaginary part. This structure allows for better representation of image features, especially edges and textures, in multiple orientations. Unlike the traditional DWT, which suffers from shift variance (i.e., small translations in the signal can cause large changes in the wavelet coefficients), the DTCWT provides a level of shift invariance [55, 56]. This makes it more robust to small shifts or distortions in the input signal, which is critical for many image and signal processing tasks. The transform uses complex-valued coefficients rather than real-valued coefficients. This allows for better capture of phase information in addition to amplitude, providing more detailed and richer representations of the signal or image. The DTCWT significantly reduces the aliasing effect, a common issue in wavelet transforms when high-frequency components mix with low-frequency ones. The dual-tree structure and the use of complex filters help mitigate this problem [57].
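To make the sub-band structure concrete, a minimal Python sketch of a forward 2-D DTCWT is given below. It assumes the open-source dtcwt package, which is not part of this work and is named only as one convenient implementation; the printed shapes are those produced by that package.

```python
import numpy as np
import dtcwt  # assumed third-party package; any DTCWT implementation would do

image = np.random.rand(256, 256)      # stand-in for a grayscale source image

transform = dtcwt.Transform2d()
pyramid = transform.forward(image, nlevels=4)

# Real-valued low-frequency component (coarse approximation of the image).
print("lowpass:", pyramid.lowpass.shape)

# Complex-valued high-frequency coefficients: 6 orientations per level
# (approximately +/-15, +/-45 and +/-75 degrees).
for level, highpass in enumerate(pyramid.highpasses, start=1):
    print("level", level, "highpasses:", highpass.shape)  # (rows, cols, 6)
```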

3 The proposed method

The multi-focus image fusion algorithm we proposed can be mainly divided into four steps: image decomposition, low-frequency fusion, high-frequency fusion, and image reconstruction. The structure of the proposed method is shown in Figure 1, and the specific process is as follows.


Figure 1. The structure of the proposed method.

3.1 Image decomposition

The source images A and B are decomposed by DTCWT into low-frequency components L_A, L_B and high-frequency components H_{l,d}^{A}, H_{l,d}^{B}, where L_X (X ∈ {A, B}) denotes the low-frequency sub-band of image X, and H_{l,d}^{X} denotes its high-frequency sub-band at decomposition level l in orientation d.

3.2 Low-frequency fusion

The low-frequency component of the image contains the main background information. The average gradient (AG)-motivated PCNN fusion rule is used to process the low-frequency sub-bands, and the corresponding equations are defined as follows [58, 59]:

AG_{ij} = \frac{1}{m \times n}\sum_{i}\sum_{j}\sqrt{\frac{\left[f(i,j)-f(i+1,j)\right]^{2}+\left[f(i,j)-f(i,j+1)\right]^{2}}{2}}  (1)
F_{ij}(n) = AG_{ij}  (2)
L_{ij}(n) = e^{-\alpha_{L}}L_{ij}(n-1) + V_{L}\sum_{pq}W_{ij,pq}Y_{ij,pq}(n-1)  (3)
U_{ij}(n) = F_{ij}(n)\left[1+\beta L_{ij}(n)\right]  (4)
\theta_{ij}(n) = e^{-\alpha_{\theta}}\theta_{ij}(n-1) + V_{\theta}Y_{ij}(n-1)  (5)
Y_{ij}(n) = \begin{cases} 1, & \text{if } U_{ij}(n) > \theta_{ij}(n) \\ 0, & \text{otherwise} \end{cases}  (6)
T_{ij}(n) = T_{ij}(n-1) + Y_{ij}(n)  (7)

In Equation 1, f(i,j) is the pixel intensity at position (i,j) and m × n is the size of the image. In the mathematical model of the PCNN in Equations 2–6, the feeding input F_{ij} is equal to the normalized AG_{ij}. The linking input L_{ij} is the weighted sum of the firing outputs of the neurons within the linking range. W_{ij,pq} is the synaptic gain strength, and the subscripts p and q index the linking range of the PCNN. α_L and α_θ are decay constants, V_L and V_θ are amplitude gains, and β is the linking strength. U_{ij} is the total internal activity, θ_{ij} is the threshold, and n denotes the iteration index. If U_{ij} is larger than θ_{ij}, the neuron generates a pulse Y_{ij} = 1, also called one firing. The sum of Y_{ij} over n iterations, defined in Equation 7 and called the firing times, is used to represent image information. Rather than Y_{ij}(n), one usually analyzes T_{ij}(n), because neighboring coefficients with similar features produce similar firing times for a given number of iterations. AG is fed into the PCNN to motivate the neurons and generate pulses according to Equations 2–6; the firing times T_{ij}(n) are then accumulated as in Equation 7.
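For illustration, a minimal NumPy sketch of Equations 1–7 is given below. It assumes that the feeding input is a per-coefficient average gradient computed in a small local window and normalized to [0, 1]; the helper names, the window size, and the threshold initialization are our own choices, while the parameter values follow Section 4.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

def average_gradient_map(L, win=3):
    # Per-coefficient average gradient (a local-window reading of Equation 1).
    dx = np.diff(L, axis=0, append=L[-1:, :])   # difference along rows
    dy = np.diff(L, axis=1, append=L[:, -1:])   # difference along columns
    grad = np.sqrt((dx ** 2 + dy ** 2) / 2.0)
    return uniform_filter(grad, size=win)       # local mean of the gradient

def pcnn_firing_times(L, n_iter=200, alpha_L=0.06931, alpha_theta=0.2,
                      beta=0.2, V_L=1.0, V_theta=20.0):
    # Synaptic weight matrix W for a 3x3 linking range (center excluded).
    W = np.array([[0.7071, 1.0, 0.7071],
                  [1.0,    0.0, 1.0],
                  [0.7071, 1.0, 0.7071]])
    F = average_gradient_map(L)
    F = (F - F.min()) / (F.max() - F.min() + 1e-12)  # normalized feeding input (Eq. 2)
    Y = np.zeros_like(F)        # pulse output
    link = np.zeros_like(F)     # linking input
    theta = np.ones_like(F)     # threshold (initial value is an arbitrary choice)
    T = np.zeros_like(F)        # accumulated firing times
    for _ in range(n_iter):
        link = np.exp(-alpha_L) * link + V_L * convolve(Y, W, mode='constant')  # Eq. 3
        theta = np.exp(-alpha_theta) * theta + V_theta * Y                       # Eq. 5
        U = F * (1.0 + beta * link)          # internal activity (Eq. 4)
        Y = (U > theta).astype(float)        # pulse generation (Eq. 6)
        T += Y                               # firing times (Eq. 7)
    return T
```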

The decision map D_{ij} is obtained from Equation 8, and the fused coefficients are selected with Equation 9; that is, the coefficients with larger firing times are selected as the coefficients of the fused low-frequency sub-band. The fusion rule is designed as follows:

D_{F,ij}(n) = \begin{cases} 1, & \text{if } T_{A,ij}(n) \geq T_{B,ij}(n) \\ 0, & \text{otherwise} \end{cases}  (8)
L_{F}(i,j) = \begin{cases} L_{A}(i,j), & \text{if } D_{ij}(n) = 1 \\ L_{B}(i,j), & \text{if } D_{ij}(n) = 0 \end{cases}  (9)

where L_F denotes the fused low-frequency sub-band.
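Under the same assumptions, Equations 8, 9 reduce to a coefficient-wise comparison of firing times; a short sketch reusing the pcnn_firing_times helper above is:

```python
import numpy as np

def fuse_lowpass(L_A, L_B):
    # Firing times of each low-frequency sub-band (Equations 1-7).
    T_A = pcnn_firing_times(L_A)
    T_B = pcnn_firing_times(L_B)
    D = T_A >= T_B                    # decision map (Equation 8)
    return np.where(D, L_A, L_B)      # fused low-frequency sub-band (Equation 9)
```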

3.3 High-frequency fusion

The high-frequency components of the image contain the detailed information of the image. The weighted sum of eight-neighborhood-based modified Laplacian (WSEML) is used to process the high-frequency sub-bands, as given in Equations 10–12 [60]:

WSEML_{X}(i,j) = \sum_{m=-r}^{r}\sum_{n=-r}^{r}\Phi(m+r+1,\,n+r+1)\times EML_{X}(i+m,\,j+n)  (10)
EML_{X}(i,j) = \left|2X(i,j)-X(i-1,j)-X(i+1,j)\right| + \left|2X(i,j)-X(i,j-1)-X(i,j+1)\right| + \frac{1}{\sqrt{2}}\left|2X(i,j)-X(i-1,j-1)-X(i+1,j+1)\right| + \frac{1}{\sqrt{2}}\left|2X(i,j)-X(i-1,j+1)-X(i+1,j-1)\right|  (11)

where X ∈ {A, B}, and Φ is a (2r+1) × (2r+1) weighting matrix with radius r. Each element of Φ is set to 2^{2r-d}, where d is its four-neighborhood distance to the center, and the matrix is then normalized. As an example, the 3 × 3 normalized version of Φ is

\Phi = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}

The fused high-frequency sub-bands are defined as follows:

H_{l,d}^{F}(i,j) = \begin{cases} H_{l,d}^{A}(i,j), & \text{if } WSEML\!\left(H_{l,d}^{A}(i,j)\right) \geq WSEML\!\left(H_{l,d}^{B}(i,j)\right) \\ H_{l,d}^{B}(i,j), & \text{otherwise} \end{cases}  (12)

where H_{l,d}^{F}(i,j) denotes the fused high-frequency sub-band.
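The sketch below illustrates Equations 10–12 for a single oriented sub-band. Since the DTCWT coefficients are complex-valued, the focus measure is computed here on their magnitudes; this is an assumption of the sketch, as the handling of complex coefficients is not spelled out above.

```python
import numpy as np
from scipy.ndimage import convolve

# Normalized 3x3 weighting matrix Phi (the r = 1 case given in the text).
PHI = np.array([[1.0, 2.0, 1.0],
                [2.0, 4.0, 2.0],
                [1.0, 2.0, 1.0]]) / 16.0

def eml(X):
    # Eight-neighborhood modified Laplacian (Equation 11) on a 2-D array.
    p = np.pad(X, 1, mode='edge')
    c = p[1:-1, 1:-1]
    t_row = np.abs(2 * c - p[:-2, 1:-1] - p[2:, 1:-1])            # X(i-1,j), X(i+1,j)
    t_col = np.abs(2 * c - p[1:-1, :-2] - p[1:-1, 2:])            # X(i,j-1), X(i,j+1)
    t_d1  = np.abs(2 * c - p[:-2, :-2] - p[2:, 2:]) / np.sqrt(2)  # main diagonal
    t_d2  = np.abs(2 * c - p[:-2, 2:] - p[2:, :-2]) / np.sqrt(2)  # anti-diagonal
    return t_row + t_col + t_d1 + t_d2

def fuse_highpass(H_A, H_B):
    # WSEML (Equation 10) and coefficient-wise max-selection (Equation 12).
    w_A = convolve(eml(np.abs(H_A)), PHI, mode='nearest')
    w_B = convolve(eml(np.abs(H_B)), PHI, mode='nearest')
    return np.where(w_A >= w_B, H_A, H_B)
```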

3.4 Image reconstruction

The fused image F is obtained by applying the inverse DTCWT to L_F(i,j) and H_{l,d}^{F}(i,j).
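Putting the four steps together, an end-to-end sketch of the pipeline might look as follows; it again assumes the dtcwt package and the fuse_lowpass / fuse_highpass helpers sketched above, with img_A and img_B being registered grayscale source images stored as floating-point arrays.

```python
import numpy as np
import dtcwt  # assumed third-party package

def fuse_images(img_A, img_B, nlevels=4):
    t = dtcwt.Transform2d()
    pyr_A = t.forward(img_A, nlevels=nlevels)            # step 3.1: decomposition
    pyr_B = t.forward(img_B, nlevels=nlevels)

    low_F = fuse_lowpass(pyr_A.lowpass, pyr_B.lowpass)   # step 3.2: low-frequency fusion

    high_F = []
    for H_A, H_B in zip(pyr_A.highpasses, pyr_B.highpasses):
        # step 3.3: fuse each of the 6 oriented sub-bands at this level
        fused = np.stack([fuse_highpass(H_A[..., d], H_B[..., d])
                          for d in range(H_A.shape[-1])], axis=-1)
        high_F.append(fused)

    # step 3.4: inverse DTCWT on the fused sub-bands
    return t.inverse(dtcwt.Pyramid(low_F, tuple(high_F)))
```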

4 Experimental results and analysis

To demonstrate the effectiveness of our algorithm, we conducted simulation experiments on the commonly used public Lytro dataset [61] and compared it with six representative image fusion algorithms, namely, GD [29], FusionDN [62], PMGI [63], U2Fusion [64], ZMFF [65], and UUDFusion [66]. Additionally, we employed six objective evaluation metrics to quantitatively assess the experimental results, namely, the edge-based similarity measure Q^{AB/F} [59], the mutual information metric Q_MI [59], the nonlinear correlation information entropy Q_NCIE [67], the Chen-Blum metric Q_CB [67], the phase congruency-based fusion metric Q_P [67] and the gradient-based fusion performance Q_G [67]. The higher these metric values, the better the fusion effect. We adopt a combined subjective and objective evaluation approach to measure the effectiveness of the algorithms. The parameters of the comparison algorithms were set according to the original papers. In our algorithm, the DTCWT decomposition level was set to 4; the PCNN parameters were set to a 3 × 3 linking range p × q, α_L = 0.06931, α_θ = 0.2, β = 0.2, V_L = 1.0, V_θ = 20, W_{ij,pq} = \begin{bmatrix} 0.7071 & 1 & 0.7071 \\ 1 & 0 & 1 \\ 0.7071 & 1 & 0.7071 \end{bmatrix}, and the maximal number of iterations n = 200.
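As an illustration of how such metrics are computed, the sketch below evaluates a simple mutual-information score, taken here as MI(A, F) + MI(B, F) estimated from 256-bin joint histograms. Published definitions of Q_MI differ in their normalization, so this is indicative only and not necessarily the exact metric used for Table 1.

```python
import numpy as np

def mutual_information(x, y, bins=256):
    # Histogram-based estimate of the mutual information between two images.
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def q_mi(src_a, src_b, fused):
    # Sum of the mutual information between each source and the fused image.
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)
```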

Figure 2 shows the fused results with different methods on Lytro-01. The GD method retains significant focus information from both the foreground and background. However, some blending artifacts are visible, and the focus transitions may not be smooth. The FusionDN algorithm preserves structural details well but exhibits some loss of sharpness in the golfer and background. The fusion quality is moderate, with slight blurring at focus boundaries. The PMGI method achieves reasonable fusion but struggles with preserving contrast and sharpness, especially in the golfer’s details. The background appears slightly oversmoothed. The ZMFF method performs well in maintaining the focus of both the foreground (golfer) and background. The details are well-preserved, but minor artifacts can be noticed in the focus transition areas. The UUDFusion method produces an average fusion result, with noticeable blurring in both the foreground and background. The image lacks the clarity and sharpness needed for an effective all-focus image. The proposed method delivers the best results. Both the golfer (foreground) and the background are sharply focused, with smooth transitions between the focus regions. The image appears natural and well-balanced, with no noticeable artifacts.


Figure 2. Fusion results on Lytro-01. (a) Source A; (b) Source B; (c) GD; (d) FusionDN; (e) PMGI; (f) U2Fusion; (g) ZMFF; (h) UUDFusion; (i) Proposed.

Figure 3 presents fusion results for various algorithms applied to the Lytro-02 dataset, aiming to create an all-focus image by combining the near-focus (foreground) and far-focus (background) regions. The proposed method clearly outperforms all other methods, producing a sharp and balanced image where both the diver’s face and the background are well-preserved. The transitions between focus regions are smooth and free of noticeable artifacts, resulting in a natural-looking image. ZMFF demonstrates competitive performance, preserving sharpness in both the diver’s face and the background. However, slight artifacts and less refined transitions between focus regions make it less effective than the proposed method. Similarly, FusionDN and U2Fusion provide moderate results, balancing focus between the foreground and background but lacking the sharpness and clarity of the best-performing algorithms. PMGI maintains good detail in the background but struggles with sharpness in the foreground, leading to an imbalanced fusion result. GD performs adequately, but the diver’s face appears softened, and overall sharpness is inconsistent. Finally, UUDFusion produces the weakest fusion result, with significant blurring in both focus areas, making it unsuitable for generating high-quality all-focus images. In summary, the proposed method achieves the most visually appealing and technically superior fusion result, while ZMFF serves as a strong alternative with slight limitations. Other algorithms exhibit varying levels of performance but fall short of achieving the balance and detail provided by the proposed method.


Figure 3. Fusion results on Lytro-02. (a) Source A; (b) Source B; (c) GD; (d) FusionDN; (e) PMGI; (f) U2Fusion; (g) ZMFF; (h) UUDFusion; (i) Proposed.

Figure 4 compares the fusion results of multiple algorithms on the Lytro-03 dataset. Each algorithm demonstrates varying capabilities in handling multi-focus image fusion, balancing sharpness, color fidelity, and detail preservation. The two input images have distinct focal regions: Source A focuses on the foreground, while Source B highlights the background. The goal of the fusion algorithms is to combine these focal regions into a single, sharp image. The GD method struggles with detail preservation and produces a fused image that appears slightly blurred, especially around the edges of the child’s face. The colors also seem less vibrant, which detracts from the overall quality. As a deep learning-based approach, FusionDN performs well in preserving details and maintaining sharpness. The child’s face and the cartoon portrait are both clear, with vivid colors. However, minor edge artifacts are noticeable, which slightly impacts the naturalness of the result. The PMGI approach achieves a good balance between sharpness and detail integration, but it slightly lacks precision in integrating the finest details. U2Fusion provides decent sharpness and color fidelity but occasionally fails to balance focus across regions. For example, the child’s face is slightly less sharp compared to the background, resulting in a less seamless fusion. Some areas also become very dark, resulting in severe information loss. The ZMFF method exhibits noticeable limitations: the fused image lacks sharpness, and the details in both the foreground and background are not well-preserved. The colors are also muted, leading to an overall decrease in visual quality. The image produced by UUDFusion exhibits severe distortion and artifacts, with significant color information loss and poor fusion performance. The proposed method outperforms all others in this comparison. It successfully combines the sharpness and details of both the child’s face and the gingerbread figure. The colors are vibrant and natural, with no visible artifacts or blurriness. The transitions between the foreground and background are smooth, creating a visually seamless result.


Figure 4. Fusion results on Lytro-03. (a) Source A; (b) Source B; (c) GD; (d) FusionDN; (e) PMGI; (f) U2Fusion; (g) ZMFF; (h) UUDFusion; (i) Proposed.

Figure 5 compares the fusion results of various algorithms on the Lytro-04 dataset, focusing on how well the algorithms preserve details, manage focus regions, and maintain color fidelity. Figure 5a focuses on the foreground, specifically the man’s face and sunglasses, while the background is blurred. Figure 5b focuses on the background (the person and chair) but blurs the foreground. Figures 5c-i represent the fusion results of different algorithms. The GD exhibits moderate sharpness in both the foreground and background. However, some details in the man’s sunglasses and the background elements appear slightly smoothed, reducing overall clarity. The color representation is acceptable but lacks vibrancy compared to other methods. As a deep learning-based method, FusionDN achieves good sharpness and color fidelity. The man’s face and sunglasses are well-preserved, and the background details are clear. However, subtle edge artifacts are noticeable around the foreground and background transitions, slightly affecting the fusion quality. The PMGI fails to preserve sufficient details in both the foreground and background. The man’s sunglasses appear blurred, and the background lacks clarity. The overall image looks less vibrant and exhibits significant information loss, making it one of the weaker methods in this comparison. The overall quality of the fused image is subpar. The U2Fusion method achieves decent fusion but struggles with focus balance. The foreground (sunglasses and face) is slightly less sharp, while the background elements are relatively clear. The ZMFF method produces relatively good fusion results, but the brightness and sharpness of the image still need improvement. The UUDFusion generates noticeable artifacts and distortions, particularly in the background. The details in the foreground (the man’s face and sunglasses) are not clear, with significant color distortion, resulting in poor fusion performance. The proposed method demonstrates the best performance among the algorithms. Both the foreground (man’s face and sunglasses) and the background (chair and person) are sharp, with vibrant and natural colors. The transitions between the focused regions are smooth, and there are no visible artifacts or distortions. It successfully preserves all critical details, making it the most effective fusion approach in this comparison.


Figure 5. Fusion results on Lytro-04. (a) Source A; (b) Source B; (c) GD; (d) FusionDN; (e) PMGI; (f) U2Fusion; (g) ZMFF; (h) UUDFusion; (i) Proposed.

Table 1 reports the average metric values of the different algorithms over 20 image pairs from the Lytro dataset, across six evaluation metrics: Q^{AB/F}, Q_MI, Q_NCIE, Q_CB, Q_P and Q_G. Each metric highlights a different aspect of image fusion quality. Among the listed methods, the proposed method demonstrates the best overall performance, achieving the highest scores in all metrics: Q^{AB/F} = 0.7409, Q_MI = 7.1960, Q_NCIE = 0.8313, Q_CB = 0.7504, Q_P = 0.8137 and Q_G = 0.7385. These results suggest that the proposed method is highly robust and effective, delivering superior results across multiple dimensions of evaluation. ZMFF also shows competitive performance. FusionDN and U2Fusion maintain balanced performance but fail to excel in any particular metric. UUDFusion performs consistently lower across all metrics, indicating limited effectiveness compared to the other algorithms. In summary, the proposed method clearly outperforms all other algorithms, providing the best fusion performance. ZMFF and GD are strong competitors on specific metrics, but their inconsistency in other areas limits their overall efficacy. This comparison highlights the superiority of the proposed method for image fusion tasks on the Lytro dataset. These objective results are consistent with the subjective evaluation shown in Figures 2–5.


Table 1. The average metric values of different methods on Lytro dataset.

5 Conclusion

In this paper, a novel multi-focus image fusion method based on the pulse coupled neural network and WSEML in the DTCWT domain has been proposed. The source images are decomposed by DTCWT into low- and high-frequency components; the AG-motivated PCNN fusion rule is then used to process the low-frequency components, and the WSEML-based fusion rule is used to process the high-frequency components. The experimental results show that our method achieves better performance in terms of both visual quality and objective evaluation metrics compared with several state-of-the-art image fusion algorithms. The proposed approach effectively preserves important details and edges while reducing artifacts and noise, leading to more accurate and reliable fused images. Future work will focus on further exploring its potential in other image processing tasks.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

Ethics statement

Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

YJ: Conceptualization, Data curation, Formal Analysis, Methodology, Software, Supervision, Writing–original draft, Writing–review and editing. TM: Data curation, Formal Analysis, Funding acquisition, Methodology, Software, Supervision, Writing–original draft, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This research was funded by the Tianshan Talent Training Project-Xinjiang Science and Technology Innovation Team Program (2023TSYCTD).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Bai X, Zhang Y, Zhou F, Xue B. Quadtree-based multi-focus image fusion using a weighted focus-measure. Inf Fusion (2015) 22:105–18. doi:10.1016/j.inffus.2014.05.003


2. Li H, Shen T, Zhang Z, Zhu X, Song X. EDMF: a new benchmark for multi-focus images with the challenge of exposure difference. Sensors (2024) 24:7287. doi:10.3390/s24227287


3. Zhang Y, Bai X, Wang T. Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure. Inf Fusion (2017) 35:81–101. doi:10.1016/j.inffus.2016.09.006


4. Li X, Zhou F, Tan H, Chen Y, Zuo W. Multi-focus image fusion based on nonsubsampled contourlet transform and residual removal. Signal Processing (2021) 184:108062. doi:10.1016/j.sigpro.2021.108062


5. Liu Y, Qi Z, Cheng J, Chen X. Rethinking the effectiveness of objective evaluation metrics in multi-focus image fusion: a statistic-based approach. IEEE Trans Pattern Anal Machine Intelligence (2024) 46:5806–19. doi:10.1109/tpami.2024.3367905


6. Zheng K, Cheng J, Liu Y. Unfolding coupled convolutional sparse representation for multi-focus image fusion. Inf Fusion (2025) 118:102974. doi:10.1016/j.inffus.2025.102974


7. Li X, Li X, Ye T, Cheng X (2024). Bridging the gap between multi-focus and multi-modal: a focused integration framework for multi-modal image fusion. In Proceedings of the 2024 IEEE Winter Conference on Applications of Computer Vision (WACV 2024), Waikoloa, HI, United States, January 4–8, 2024.


8. Li X, Li X, Cheng X, Wang M, Tan H. MCDFD: multifocus image fusion based on multiscale cross-difference and focus detection. IEEE Sensors J (2023) 23:30913–26. doi:10.1109/jsen.2023.3330871


9. Zhou Z, Li S, Wang B. Multi-scale weighted gradient-based fusion for multi-focus images. Inf Fusion (2014) 20:60–72. doi:10.1016/j.inffus.2013.11.005


10. Liu Y, Liu S, Wang Z. Multi-focus image fusion with dense SIFT. Inf Fusion (2015) 23:139–55. doi:10.1016/j.inffus.2014.05.004


11. Li S, Kang X, Hu J. Image fusion with guided filtering. IEEE Trans Image Process (2013) 22:2864–75. doi:10.1109/TIP.2013.2244222


12. Wang W, Deng L, Vivone G. A general image fusion framework using multi-task semi-supervised learning. Inf Fusion (2024) 108:102414. doi:10.1016/j.inffus.2024.102414


13. Wang W, Deng L, Ran R, Vivone G. A general paradigm with detail-preserving conditional invertible network for image fusion. Int J Comput Vis (2024) 132:1029–54. doi:10.1007/s11263-023-01924-5


14. Wu X, Cao Z, Huang T, Deng L, Chanussot J, Vivone G. Fully-connected transformer for multi-source image fusion. IEEE Trans Pattern Anal Machine Intelligence (2025) 47:2071–88. doi:10.1109/tpami.2024.3523364


15. Li J, Li X, Li X, Han D, Tan H, Hou Z, et al. Multi-focus image fusion based on multiscale fuzzy quality assessment. Digital Signal Process. (2024) 153:104592. doi:10.1016/j.dsp.2024.104592


16. Wan H, Tang X, Zhu Z, Xiao B, Li W. Multi-focus color image fusion based on quaternion multi-scale singular value decomposition. Front Neurorobot (2021) 15:695960. doi:10.3389/fnbot.2021.695960


17. Li X, Li X, Tan H, Li J (2024). SAMF: small-area-aware multi-focus image fusion for object detection. In Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), Seoul, Korea, pp. 3845–9.


18. Basu S, Singhal S, Singh D. Multi-focus image fusion: a systematic literature review. SN COMPUT SCI (2025) 6:150. doi:10.1007/s42979-025-03678-y


19. Li J, Chen L, An D, Feng D, Song Y. A novel method for CSAR multi-focus image fusion. Remote Sens. (2024) 16:2797. doi:10.3390/rs16152797


20. Zhang X. Deep learning-based multi-focus image fusion: a survey and a comparative study. IEEE Trans Pattern Anal Mach Intell (2022) 44:4819–38. doi:10.1109/tpami.2021.3078906


21. Liu Y, Liu S, Wang Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf Fusion (2015) 24:147–64. doi:10.1016/j.inffus.2014.09.004


22. Giri A, Sagan V, Alifu H, Maiwulanjiang A, Sarkar S, Roy B, et al. A wavelet decomposition method for estimating soybean seed composition with hyperspectral data. Remote Sens (2024) 16:4594. doi:10.3390/rs16234594


23. Wang F, Chen T. A dual-tree–complex wavelet transform-based infrared and visible image fusion technique and its application in tunnel crack detection. Appl Sci (2024) 14:114. doi:10.3390/app14010114


24. Vivone G, Deng L, Deng S, Hong D, Jiang M, Li C, et al. Deep learning in remote sensing image fusion: methods, protocols, data, and future perspectives. IEEE Geosci Remote Sensing Mag (2024) 2–43. doi:10.1109/mgrs.2024.3495516


25. Wang G, Li J, Tan H, Li X. Fusion of full-field optical angiography images via gradient feature detection. Front Phys (2024) 12:1397732. doi:10.3389/fphy.2024.1397732


26. Zhu Z, Zheng M, Qi G, Wang D, Xiang Y. A phase congruency and local laplacian energy based multi-modality medical image fusion method in NSCT domain. IEEE Access (2019) 7:20811–24. doi:10.1109/access.2019.2898111


27. Chen H, Wu Z, Sun Z, Yang N, Menhas M, Ahmad B. CsdlFusion: an infrared and visible image fusion method based on LatLRR-NSST and compensated saliency detection. J Indian Soc Remote Sens (2025) 53:117–34. doi:10.1007/s12524-024-01987-y


28. Ramakrishna Y, Agrawal R. Pan-sharpening through weighted total generalized variation driven spatial prior and shearlet transform regularization. J Indian Soc Remote Sens (2024) 53:681–91. doi:10.1007/s12524-024-02006-w


29. Paul S, Sevcenco I, Agathoklis P. Multi-exposure and multi-focus image fusion in gradient domain. J Circuits Syst Comput (2016) 25:1650123. doi:10.1142/s0218126616501231


30. Mohan CR, Chouhan K, Rout RK, Sahoo KS, Jhanjhi NZ, Ibrahim AO, et al. Improved procedure for multi-focus images using image fusion with qshiftN DTCWT and MPCA in Laplacian pyramid domain. Appl Sci (2022) 12:9495. doi:10.3390/app12199495


31. Mohan CR, Kiran S, Vasudeva. Improved procedure for multi-focus image quality enhancement using image fusion with rules of texture energy measures in the hybrid wavelet domain. Appl Sci (2023) 13:2138. doi:10.3390/app13042138


32. Lu J, Tan K, Li Z, Chen J, Ran Q, Wang H. Multi-focus image fusion using residual removal and fractional order differentiation focus measure. Signal Image Video Process. (2024) 18:3395–410. doi:10.1007/s11760-024-03002-w


33. Xie Q, Yi B. Multi-focus image fusion based on SML and PCNN in NSCT domain. Computer Sci (2017) 44:266–9.


34. Wu Z, Zhang K, Xuan H, Yuan X, Zhao C. Divide-and-conquer model based on wavelet domain for multi-focus image fusion. Signal Processing: Image Commun (2023) 116:116982. doi:10.1016/j.image.2023.116982


35. Liu Y, Shi Y, Mu F, Cheng J, Chen X. Glioma segmentation-oriented multi-modal MR image fusion with adversarial learning. IEEE/CAA J Automatica Sinica (2022) 9:1528–31. doi:10.1109/jas.2022.105770


36. Zhang K, Wu Z, Yuan X, Zhao C. CFNet: context fusion network for multi-focus images. The Institution of Engineering and Technology (2022) 16:499–508.


37. Shi Y, Liu Y, Cheng J, Wang Z, Chen X. VDMUFusion: a versatile diffusion model-based unsupervised framework for image fusion. IEEE Trans Image Process (2025) 34:441–54. doi:10.1109/tip.2024.3512365


38. Zhu Z, He X, Qi G, Li Y, Cong B, Liu Y. Brain tumor segmentation based on the fusion of deep semantics and edge information in multimodal MRI. Inf Fusion (2023) 91:376–87. doi:10.1016/j.inffus.2022.10.022


39. Zhu Z, Wang Z, Qi G, Mazur N, Yang P, Liu Y. Brain tumor segmentation in MRI with multi-modality spatial information enhancement and boundary shape correction. Pattern Recognition (2024) 153:110553. doi:10.1016/j.patcog.2024.110553


40. Wu Z, Sun C, Xuan H, Zhang K, Yan Y. Divide-and-conquer completion network for video inpainting. IEEE Trans Circuits Syst Video Technology (2023) 33:2753–66. doi:10.1109/tcsvt.2022.3225911


41. Wu Z, Sun C, Xuan H, Liu G, Yan Y. WaveFormer: wavelet transformer for noise-robust video inpainting. AAAI Conf Artif Intelligence (2024) 38:6180–8. doi:10.1609/aaai.v38i6.28435


42. Wu Z, Chen K, Li K, Fan H, Yang Y. BVINet: unlocking blind video inpainting with zero annotations. arXiv (2025). doi:10.48550/arXiv.2502.01181


43. Chen K, Wu Z, Hou W, Li K, Fan H, Yang Y. Prompt-aware controllable shadow removal. arXiv (2025). doi:10.48550/arXiv.2501.15043


44. Wang F, Guo D, Li K, Zhong Z, Wang M (2024). Frequency decoupling for motion magnification via multi-level isomorphic architecture. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA. 18984–94.


45. Li J, Zheng K, Gao L, Han Z, Li Z, Chanussot J. Enhanced deep image prior for unsupervised hyperspectral image super-resolution. IEEE Trans Geosci Remote Sensing (2025) 63:1–18. doi:10.1109/tgrs.2025.3531646


46. Li J, Zheng K, Gao L, Ni L, Huang M, Chanussot J. Model-informed multistage unsupervised network for hyperspectral super-resolution. IEEE Trans Geosci Remote Sensing (2024) 62:5516117. doi:10.1109/TGRS.2024.3391014


47. Li S, Huang S. AFA–Mamba: adaptive feature alignment with global–local mamba for hyperspectral and LiDAR data classification. Remote Sensing (2024) 16:4050. doi:10.3390/rs16214050


48. Ouyang Y, Zhai H, Hu H, Li X, Zeng Z. FusionGCN: multi-focus image fusion using superpixel features generation GCN and pixel-level feature reconstruction CNN. Expert Syst Appl (2025) 262:125665. doi:10.1016/j.eswa.2024.125665


49. Liu Y, Chen X, Peng H, Wang Z. Multi-focus image fusion with a deep convolutional neural network. Inf Fusion (2017) 36:191–207. doi:10.1016/j.inffus.2016.12.001


50. Feng S, Wu C, Lin C, Huang M. RADFNet: an infrared and visible image fusion framework based on distributed network. Front Plant Sci (2023) 13:1056711. doi:10.3389/fpls.2022.1056711


51. Li H, Ma H, Cheng C, Shen Z, Song X, Wu X. Conti-Fuse: a novel continuous decomposition-based fusion framework for infrared and visible images. Inf Fusion (2025) 117:102839. doi:10.1016/j.inffus.2024.102839


52. Zhang Y, Liu Y, Sun P, Yan H, Zhao X, Zhang L. IFCNN: a general image fusion framework based on convolutional neural network. Inf Fusion (2020) 54:99–118. doi:10.1016/j.inffus.2019.07.011


53. Gao X, Liu S. BCMFIFuse: a bilateral cross-modal feature interaction-based network for infrared and visible image fusion. Remote Sens (2024) 16:3136. doi:10.3390/rs16173136


54. Selesnick I, Baraniuk R, Kingsbury N. The dual-tree complex wavelet transform. IEEE Signal Process. Mag (2005) 22:123–51. doi:10.1109/msp.2005.1550194


55. Jiang J, Zhai H, Yang Y, Xiao X, Wang X. Multi-focus image fusion method based on adaptive weighting and interactive information modulation. Multimedia Syst (2024) 30:290. doi:10.1007/s00530-024-01506-6


56. Vishwanatha JS, Srinivasa Pai P, D’Mello G, Sampath Kumar L, Bairy R, Nagaral M, et al. Image-processing-based model for surface roughness evaluation in titanium based alloys using dual tree complex wavelet transform and radial basis function neural networks. Sci Rep (2024) 14:28261. doi:10.1038/s41598-024-75194-7


57. Ghosh T, Jayanthi N. Multimodal fusion of different medical image modalities using optimised hybrid network. Int J Ad Hoc Ubiquitous Comput (2025) 48:19–33. doi:10.1504/ijahuc.2025.143546


58. Shreyamsha Kumar BK. Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process. (2015) 9:1193–204. doi:10.1007/s11760-013-0556-9


59. Qu X, Yan J, Xiao H, Zhu ZQ. Image fusion algorithm based on spatial frequency-motivated pulse coupled neural networks in nonsubsampled contourlet transform domain. Acta Autom (2008) 34:1508–14. doi:10.1016/s1874-1029(08)60174-3


60. Yin M, Liu X, Liu Y, Chen X. Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain. IEEE Trans Instrumentation Meas (2019) 68:49–64. doi:10.1109/tim.2018.2838778


61. Nejati M, Samavi S, Shirani S. Multi-focus image fusion using dictionary-based sparse representation. Inf Fusion (2015) 25:72–84. doi:10.1016/j.inffus.2014.10.004


62. Xu H, Ma J, Le Z, Jiang J, Guo X (2020). FusionDN: a unified densely connected network for image fusion. Proc Thirty-Fourth AAAI Conf Artif Intelligence (AAAI) 34:12484–91. doi:10.1609/aaai.v34i07.6936


63. Zhang H, Xu H, Xiao Y, Guo X, Ma J (2020). Rethinking the image fusion: a fast unified image fusion network based on proportional maintenance of gradient and intensity. Proc AAAI Conf Artif Intelligence 34:12797–804. doi:10.1609/aaai.v34i07.6975


64. Xu H, Ma J, Jiang J, Guo X, Ling H. U2Fusion: a unified unsupervised image fusion network. IEEE Trans Pattern Anal Mach Intell (2022) 44:502–18. doi:10.1109/tpami.2020.3012548


65. Hu X, Jiang J, Liu X, Ma J. ZMFF: zero-shot multi-focus image fusion. Inf Fusion (2023) 92:127–38. doi:10.1016/j.inffus.2022.11.014


66. Wang X, Fang L, Zhao J, Pan Z, Li H, Li Y. UUD-Fusion: an unsupervised universal image fusion approach via generative diffusion model. Computer Vis Image Understanding (2024) 249:104218. doi:10.1016/j.cviu.2024.104218


67. Liu Z, Blasch E, Xue Z, Zhao J, Laganiere R, Wu W. Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: a comparative study. IEEE Trans Pattern Anal Mach Intell (2012) 34:94–109. doi:10.1109/tpami.2011.109


Keywords: multi-focus image, image fusion, DTCWT, PCNN, WSEML

Citation: Jia Y and Ma T (2025) Multi-focus image fusion based on pulse coupled neural network and WSEML in DTCWT domain. Front. Phys. 13:1575606. doi: 10.3389/fphy.2025.1575606

Received: 12 February 2025; Accepted: 05 March 2025;
Published: 02 April 2025.

Edited by:

Zhiqin Zhu, Chongqing University of Posts and Telecommunications, China

Reviewed by:

Xiaosong Li, Foshan University, China
Zhiliang Wu, Zhejiang University, China

Copyright © 2025 Jia and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tiande Ma, 20201303225@stu.xju.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.