ORIGINAL RESEARCH article

Front. Mar. Sci., 08 May 2025

Sec. Ocean Observation

Volume 12 - 2025 | https://doi.org/10.3389/fmars.2025.1584345

This article is part of the Research Topic: Remote Sensing Applications in Marine Ecology Monitoring and Target Sensing.

A hybrid contrast method for infrared small dim target detection

Jinhui Han1*, Saed Moradi2, Wei Wang3, Nan Li1, Qian Zhao1, Zhen Luo3
  • 1College of Physics and Telecommunication Engineering, Zhoukou Normal University, Zhoukou, China
  • 2Department of Electrical Engineering, University of Isfahan, Isfahan, Iran
  • 3School of Artificial Intelligence, Zhoukou Normal University, Zhoukou, China

Infrared (IR) small dim target detection under complex backgrounds is crucial in many fields, such as maritime search and rescue. However, due to the interference of high brightness backgrounds, complex edges/corners and random noises, it remains a difficult task. In particular, when a target approaches a high brightness background area, the target is easily submerged. In this paper, a new contrast method framework named hybrid contrast measure (HCM) is proposed. It consists of two main modules: the relative global contrast measure (RGCM) calculation, and the small patch local contrast weighting function. In the first module, instead of directly using some neighboring pixels as the benchmark during contrast calculation, the sparse and low rank decomposition method is adopted to obtain the global background of a raw image as the benchmark, and a local max dilation (LMD) operation is applied on the global background to recover edge/corner information. A Gaussian matched filtering operation is applied on the raw image to suppress noises, and the RGCM is calculated between the filtered image and the benchmark to enhance the true small dim target and eliminate flat background areas simultaneously. In the second module, the Difference of Gaussians (DoG) filtering is adopted and improved as the weighting function. Since the benchmark in the first module is obtained globally rather than locally, and the patch size in the second module is very small, the proposed algorithm can avoid the problem of targets approaching high brightness backgrounds and being submerged by them. Experiments on 14 real IR sequences and one single frame dataset show the effectiveness of the proposed algorithm: it can usually achieve better detection performance than the baseline algorithms from both the target enhancement and the background suppression points of view.

1 Introduction

In the fields of guidance, early warning, airborne or spaceborne monitoring/surveillance, maritime search and rescue, etc., infrared detection systems have become an effective supplement or alternative to traditional visible light and radar detection systems (Kou et al., 2023). However, in some practical applications, targets are far from the detector and occupy only a few pixels (usually less than 9×9) in the output image with low intensity values (Pang et al., 2022a); they are usually called IR small targets or IR small dim targets. The detection of IR small dim targets is highly challenging (Luo et al., 2024; Ma et al., 2022), as shown in Figure 1: Firstly, the small size of the target results in a lack of significant shape or texture information; Secondly, the low intensity value makes it difficult to obtain the target directly from the raw image; Thirdly, there are usually various complex backgrounds such as buildings, clouds and sea waves in real applications, which may have high brightness and complex edges, resulting in many false alarms; Finally, some bad detector pixels and random electrical noise may cause pixel-size noise with high brightness (PNHB), which may also bring false alarms.


Figure 1. Distributions of real IR small target (labeled with rectangle) and common interference (labeled with ellipses). (a) A raw IR image; (b) Distributions of different components, TT represents true small target, HB represents high-brightness background, EB represents background edge, CB represents background corner, and PNHB represents pixel-size noise with high brightness.

So far, a lot of methods have been proposed to address the problem of IR small dim target detection under complex backgrounds, which can be mainly divided into two categories: data-driven methods and model-driven methods (Kou et al., 2023). The data-driven methods typically use prepared image data to train a deep convolutional network, enabling the network to classify the input data as targets or backgrounds. They can be further divided into two-stage methods such as RCNN (Girshick et al., 2014; Ren et al., 2017), single-stage methods such as YOLO (Redmon et al., 2016; Liu et al., 2024; Xu et al., 2024), and deep unfolding methods such as RPCANet (Wu et al., 2024). However, data-driven methods only work well when the input data and the training data have the same distribution, which may not always hold in practical applications. In addition, a deep convolutional network usually contains a large number of parameters, which makes it difficult to train and to deploy on a single-chip system.

The model-driven methods typically first design a hand-crafted feature model according to the feature differences between the true small dim target and the complex background, and then search for targets in the feature saliency map. Compared to the data-driven methods, the model-driven methods are easier to understand and implement in real applications. Therefore, the model-driven methods still attract a lot of attention nowadays.

The feature model is a key module in model-driven methods, and it can be designed between consecutive frames or within a single frame; these methods are usually called sequence-based methods and frame-based methods, respectively (Luo et al., 2022). The sequence-based methods (Zhang et al., 2021; Du and Hamdulla, 2020a; Dang et al., 2023; Pang et al., 2022b) utilize information from multiple frames simultaneously for target detection, so they usually have better performance. However, such algorithms typically have higher computational costs and require more processing resources as they need to consider a large amount of information. On the contrary, the frame-based algorithms perform target detection within a single frame, so they usually require less computation and storage space, making them easier to implement. Moreover, a frame-based algorithm is often adopted as a basic module in sequence-based algorithms. Therefore, in this paper we focus on frame-based IR small target detection.

According to the different information used during feature extraction, existing frame-based model-driven algorithms can be further divided into background estimation methods, morphological methods, directional derivative/gradient methods, local contrast methods, frequency filter methods, sparse representation methods, and sparse and low rank decomposition methods.

1.1 Background estimation methods

Background estimation methods estimate the background value of each pixel using its neighboring pixels as the benchmark, then subtract the estimated background image from the raw image to get the foreground image. Since the gray value of each pixel in an IR image consists of the background value, the target value and random noise, the target can be obtained easily in the foreground. Current background estimation methods include median filtering (Yang and Shen, 1998), max-mean/max-median filtering (Deshpande et al., 1999), and some adaptive background estimation methods such as the Two-Dimensional Least Mean Square (TDLMS) filter (Ding and Zhao, 2015), etc.
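For concreteness, a minimal sketch of this background-subtraction pipeline is given below, using a median filter as the background estimator; the window size and threshold factor are illustrative assumptions, not values from the cited papers.

```python
# Minimal background-estimation detector: estimate the background with a median
# filter, subtract it, and threshold the residual. Window size and threshold
# factor are illustrative assumptions.
import numpy as np
from scipy.ndimage import median_filter

def background_subtraction_detect(img, win=11, k=4.0):
    img = img.astype(np.float64)
    background = median_filter(img, size=win)   # local background estimate
    foreground = img - background               # residual: targets + noise
    foreground[foreground < 0] = 0              # keep only hot (positive) residuals
    th = foreground.mean() + k * foreground.std()
    return foreground, foreground > th          # saliency map and binary detection mask
```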

1.2 Morphological methods

Morphological methods first design a specifically shaped structural window according to the characteristics of IR small dim targets, then use this window to perform morphological operations such as erosion and dilation at each position of the raw image to highlight the target and suppress background and noise. The ring-like new Top-Hat structure window (Bai and Zhou, 2010) is an excellent representative of morphological methods for IR small dim target detection, and it was further developed by Deng et al. (2021) and Zhu et al. (2020a); Zhang et al. (2024) designed two dilation structures to enhance the true target and suppress clutters, respectively; Peng et al. (2024) designed a dual structure template with eight directions to distinguish between background edges and real targets; etc.
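As a simple illustration of this idea, the classical white top-hat transform can be sketched as follows; the flat square window is an illustrative stand-in for the ring-shaped structuring elements used in the cited works.

```python
# Sketch of a classical white top-hat small-target filter: subtract the grey
# opening so that only bright structures smaller than the window survive.
import numpy as np
from scipy.ndimage import grey_opening

def white_top_hat(img, win=9):
    img = img.astype(np.float64)
    opened = grey_opening(img, size=(win, win))  # removes bright structures smaller than win x win
    return img - opened                          # bright residual: candidate small targets
```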

1.3 Directional derivative/gradient methods

Directional derivative/gradient methods take the gray value of a pixel as a scalar. If we calculate the directional derivative/gradient of a pixel, we can get directional information. Therefore, the number of features used to distinguish between the target and background will increase during this process, and the detection performance will be better. For example, Lu et al. (2022a) used a small kernel model to calculate directional derivatives; Bi et al. (2020) proposed the high-order directional derivatives; Yang et al. (2022) adopted the multidirectional difference measure as a weighting function of the multidirectional gradient; Hao et al. (2024) designed a gradient method that can adaptively handle multiscale targets; etc.

1.4 Local contrast methods

Local contrast methods are bionic methods inspired by the contrast mechanism of the human visual system. A human can quickly and accurately capture a target that is locally salient in an image, while large-area high brightness backgrounds are ignored, because the human eye is more sensitive to local contrast than to absolute brightness (Itti et al., 1998). Therefore, better performance can be achieved by using local contrast instead of brightness as the basis for IR small dim target detection.

The core of the local contrast is the dissimilarity between a current pixel and its neighboring benchmark. According to the size of the benchmark, existing local contrast methods can be divided into large patch methods and small patch methods.

1.4.1 Large patch local contrast methods

Large patch methods first design a double-layer or tri-layer window in which the inner layer is used to capture the target and the outer layer is used to capture the neighboring benchmark, then slide the window over the whole image and calculate the local contrast information at each pixel. For example, Chen et al. (2014) proposed the Local Contrast Measure (LCM) algorithm based on a double-layer nested window, which takes the ratio of the central cell to the surrounding cells as the local contrast; Han et al. (2014) proposed an improved LCM (ILCM) algorithm that introduces the average of the central cell to suppress noise; Qin and Li (2016) proposed a novel LCM (NLCM) algorithm that only averages some of the largest pixels of each cell to protect true targets; Han et al. (2018) proposed the relative LCM (RLCM) in which both ratio and difference operations are used to enhance the target and suppress clutter simultaneously; Wei et al. (2016) proposed a multiscale patch-based contrast measure (MPCM) to deal with targets of unknown size; Han et al. (2020) and Han et al. (2021) designed two tri-layer windows to deal with targets of unknown size using only single-scale calculation, which was further developed by Liu et al. (2023); Han et al. (2022) extended the gray contrast to feature contrast; etc. In these methods, a large contrast value can usually be extracted when the target is locally salient, because the benchmark is the background surrounding the target. However, they also have a disadvantage: if the background in the field of view is complex and the target is not locally salient, that is, when the target approaches a high brightness background, the high brightness background will probably be included in the patch window and selected as the benchmark, and the true target will be submerged.
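To make the mechanism concrete, a minimal single-scale sketch in the spirit of LCM is shown below; the cell size, the 3×3 cell layout and the naive per-pixel loop are illustrative simplifications rather than the published implementation.

```python
# Naive single-scale LCM-style contrast: for each position, compare the central
# cell with the brightest of its eight surrounding cells (LCM uses Ln^2 / max(mi)).
import numpy as np

def simple_lcm(img, cell=3):
    img = img.astype(np.float64)
    H, W = img.shape
    out = np.zeros_like(img)
    r = cell  # offset from the central cell to each surrounding cell
    for y in range(r, H - 2 * r):
        for x in range(r, W - 2 * r):
            center_max = img[y:y + cell, x:x + cell].max()
            means = [img[y + dy:y + dy + cell, x + dx:x + dx + cell].mean()
                     for dy in (-r, 0, r) for dx in (-r, 0, r)
                     if not (dy == 0 and dx == 0)]
            m_max = max(means)
            # large only when the central cell is brighter than all surrounding cells
            out[y, x] = center_max ** 2 / m_max if m_max > 0 else 0.0
    return out
```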

1.4.2 Small patch local contrast methods

The reason for the drawback of the large patch methods is that their patch size is too large; for example, in ILCM the patch size can reach up to 24×24, and in NLCM it reaches up to 30×30. A natural idea to address this drawback is to reduce the size of the patch window. Researchers have found that a true IR small dim target usually attenuates uniformly in all directions, which means the gray value of its center is usually larger than that of its surroundings. Therefore, we can still obtain the local contrast information of a true target even if the patch size is smaller than the size of the target. For example, Shao et al. (2012) used the Laplacian of Gaussian (LoG) filter template, with positive center coefficients and negative surrounding coefficients, for contrast calculation; Wang et al. (2012) proposed a similar but simpler filter template named the Difference of Gaussians (DoG) filter; Han et al. (2016) used elliptical Gabor functions instead of circular Gaussian functions as kernel functions to distinguish true targets from complex background edges; Chen et al. (2023a) and Guan et al. (2020a) used LoG and Gaussian filters as a preprocessing stage; etc. In these small patch algorithms, the template size is usually set to only 5×5, so that the risk of the target being submerged is reduced as much as possible. However, if the target is large enough, the extracted contrast value will be small, because the benchmark pixels still belong to the target.

Due to their advantages such as theoretical simplicity, ease of implementation, and the ability to enhance true targets and suppress complex backgrounds simultaneously, local contrast methods still attract a lot of attention. However, the large patch methods cannot work well when the target approaches a high brightness background, and the small patch methods cannot work well when the target is relatively large. To improve the detection performance further, some researchers have also combined local contrast with other methods. For example, Cui et al. (2016) combined local contrast with support vector machines; Deng et al. (2016); Deng et al. (2017) and Yao et al. (2023) introduced local entropy to weight the local contrast; Du and Hamdulla (2020b) used local smoothness as a weighting function; Xiong et al. (2021) and Zhou and Wang (2023) combined it with the local gradient; Kou et al. (2022) combined it with the density peak; Wang et al. (2024) used the variance as a weighting function; Wei et al. (2022) used facet filtering as a preprocessing stage; Han et al. (2019) used TDLMS to get the background benchmark; Tang et al. (2023) used a dilation operator with a ring structure to get the background benchmark; Han et al. (2021) selected the surrounding gray value closest to the center in eight directions as the benchmark; etc. However, these methods can only alleviate rather than solve these problems.

1.5 Frequency filter methods

Frequency filter methods assume that different components in the raw image occupy different bands in frequency domain: the background occupies the low-frequency band since background is usually flat and continuously distributed in large area; the target occupies the high-frequency band since it has a significant discontinuity with its neighborhood; the noise occupies the highest frequency band since noise is usually random. If we first transform the original image into the frequency domain and use a frequency filter, the target will be detected. For example, Yang et al. (2004) used Butterworth high-pass filters to obtain true targets; Qi et al. (2014) proposed a detection method based on quaternion Fourier transform; Gregoris et al. (1994) used the wavelet transform to get the frequency information of the raw image; Kong et al. (2016) used Haar wavelet to detect the sea-sky line first, and then detected targets in the next steps; Chen et al. (2019) combined local contrast with some frequency domain algorithms; etc. However, if the backgrounds are very complex, some edges and corners also contain a lot of high-frequency information, making it difficult to distinguish them from true targets.
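As a minimal sketch of this frequency-domain idea, a Butterworth high-pass response can be applied in the Fourier domain; the cutoff and order below are illustrative values, not settings from the cited works.

```python
# Frequency-domain high-pass filtering: attenuate low frequencies (flat background)
# with a Butterworth response and transform back to the spatial domain.
import numpy as np

def butterworth_highpass(img, cutoff=0.08, order=2):
    img = img.astype(np.float64)
    H, W = img.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    d = np.sqrt(fx ** 2 + fy ** 2)                            # normalized radial frequency
    hp = 1.0 - 1.0 / (1.0 + (d / cutoff) ** (2 * order))      # high-pass = 1 - low-pass
    return np.real(np.fft.ifft2(np.fft.fft2(img) * hp))
```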

1.6 Sparse representation methods

The main idea of sparse representation is to use an overcomplete dictionary containing many atoms to linearly represent the original data. If some atoms have a high correlation with the original data, their corresponding coefficients will be relatively large, so the coefficient vector will exhibit obvious sparsity. Based on this, Zhao et al. (2011) used many simulated IR small targets as atoms to construct an overcomplete dictionary; He et al. (2015) discussed the features of the background, found that the background is usually low rank, and proposed a representation method with low rank constraints; Zhang et al. (2017) first used a particle filter to construct a saliency map, and then represented the saliency map instead of the raw image; Qin et al. (2016) and Liu et al. (2017) constructed two dictionaries, one used to represent targets and the other used to represent backgrounds; Chen et al. (2023b) utilized an adaptive group sparse representation method to denoise the raw IR image; etc. However, the shapes of the target and the background vary greatly across practical applications, so it is extremely difficult to construct a dictionary that covers all situations.

1.7 Sparse and low rank decomposition methods

Sparse and low rank decomposition methods consider that a raw image consists of two parts: the background image and the foreground image. The background in nature usually has a certain degree of non-local self-similarity, so the rank of the background image is relatively low; the foreground image mainly contains small targets and random noises, so it is usually very sparse. If we decompose the raw image into a matrix with low rank and a matrix with sparsity, we will be able to identify the targets easily in the sparse matrix. According to the type of the original data used for decomposition, the sparse and low rank decomposition methods can be further divided into two main categories: the image patch type and the tensor type.

1.7.1 Image patch type sparse and low rank decomposition methods

Image patch type sparse and low rank decomposition methods first construct an image patch matrix and then decompose it by iteration. For example, Gao et al. (2013) first divided the original image into many image patches, stretched them into column vectors to form a new matrix, named the infrared patch image (IPI), and then used the Robust Principal Component Analysis (RPCA) algorithm to decompose it; Dai et al. (2017) only focused on some small singular values during RPCA decomposition to maintain complex background edges; Wang et al. (2017); Fang et al. (2020) and Zhu et al. (2020b) embedded some regularization factors in the objective function of RPCA decomposition to constrain the results; Hao et al. (2023) proposed a novel continuation strategy based on the proximal gradient algorithm to suppress strong edges; etc.

1.7.2 Tensor type sparse and low rank decomposition methods

Tensor type sparse and low rank decomposition methods first stack image blocks into a three-dimensional tensor and then decompose it by iteration. For example, Dai and Wu (2017) constructed the infrared patch tensor model and also used some local priors in the image as weighting functions; Zhang et al. (2019) simplified Dai's model and abandoned the weighting function; Fan et al. (2022) proposed a new anisotropic background feature as the weighting function for the infrared patch tensor model; Guan et al. (2020b) used global sparsity information as a weighting function for local contrast; Lu et al. (2022b) utilized some gradient information as a weighting function during sparse and low rank decomposition; Zhang et al. (2023) utilized the local entropy as a weighting function; etc.

Generally speaking, the sparse and low rank decomposition methods directly extract global target features from the whole IR image, so they can achieve good detection performance, especially when the target is not locally salient, for example, when a target approaches a high brightness background. However, current algorithms only focus on the sparse foreground and attempt to directly find targets within it; when the background in the field of view is very complex, a few background edges/corners and noise components also have sparse features and will easily be decomposed into the foreground, seriously interfering with the detection of real targets.

1.8 The main work of this paper

As mentioned earlier, when the target is locally salient, the large patch local contrast methods, such as LCM, ILCM, etc., can extract contrast information to the maximum extent, but they fail when the target is not locally salient. This is because they directly utilize some neighboring pixels as the benchmark for the current pixel when calculating its contrast information. Therefore, we studied how to improve the benchmark selection principle to address this issue. We found that the sparse and low rank decomposition methods can decompose a raw image into a low rank background matrix and a sparse foreground matrix, and, since this decomposition is performed at the global level, the background at a true target's position will be more accurate and not affected by neighboring high brightness areas. Thus, the decomposed low rank background matrix is more suitable as the benchmark for calculating contrast information, and a new detection framework named hybrid contrast measure (HCM) for detecting IR small dim targets in complex backgrounds is proposed in this paper.

The main work and contributions of this paper can be described as:

1. A new detection framework named HCM is proposed; it consists of two main modules: the relative global contrast calculation and the small patch local contrast weighting function. Both global and local information are utilized in this new framework to achieve better detection performance even when the background is very complex and the target is not locally salient.

2. In the relative global contrast calculation module, the IPI model and the sparse and low rank decomposition are adopted to get the global background as the benchmark for contrast calculation. However, due to the sparsity of edge and corner information, it is difficult to maintain them in a low rank global background. In this paper, we analyzed the essence of matrix low rank conversion and stated that some local operations can also be introduced on the global background to recover edge and corner information. Therefore, a simple but effective local max dilation (LMD) method is proposed and used on the estimated background image.

3. Inspired by our former work in the field of local contrast, we propose the relative global contrast measure (RGCM) between a raw image and its global benchmark to enhance true small dim target and eliminate flat background area simultaneously. Especially, before global contrast calculation, a Gaussian filter is applied on the raw image to suppress PNHB better.

4. In the small patch local contrast weighting function module, the DoG filter is utilized to obtain local contrast information, which is then used as a weighting function after a simple non-negative operation to get the final saliency map. The advantage of choosing DoG is that its template is small enough, so we don’t have to worry about the problem of the targets approaching high brightness backgrounds and being submerged by them.

2 Methodology

The framework of the proposed HCM method is shown in Figure 2; it contains two main modules: the RGCM calculation, and the small patch local contrast weighting function.


Figure 2. The framework of the proposed method.

2.1 RGCM calculation

2.1.1 Global background separation

A raw IR image is usually modeled as the summation of three components, the background image, the target image, and the noise image, as shown in Equation 1:

I(x, y) = IB(x, y) + IT(x, y) + IN(x, y)    (1)

where (x, y) is the coordinate of each pixel in the image, I is the raw image, IB is the background image, IT is the target image, and IN is the noise image.

In practical applications, the background is usually self-similar, so IB is usually a low rank image. The size of a small target is very small, so IT is usually a sparse image. If we use a sparse and low rank decomposition algorithm such as RPCA to divide a raw IR image into a sparse image and a low rank image, the target will become salient in the sparse part. This can be modeled as Equation 2:

min_{B,T} rank(B) + λ||T||_0,  s.t. D = B + T    (2)

where D is the raw image I, B is the low rank background image IB, and T is the sparse foreground image IT.

However, it is NP-hard to solve this problem since the rank function and l0-norm are both non-convex and discontinuous. Many researchers have used another relaxed form, as shown in Equation 3:

min_{B,T} ||B||_* + λ||T||_1,  s.t. D = B + T    (3)

Here, the nuclear norm is a relaxation of the rank function, and the l1-norm is a relaxation of the l0-norm.

When the effect of noise is considered, the augmented Lagrangian function of this problem can be written as Equation 4:

L(B, T) = ||B||_* + λ||T||_1 + (μ/2)||D − B − T||_F²    (4)

Then we can solve it via optimization algorithms such as the Accelerated Proximal Gradient (APG) method (Lin et al., 2009). In particular, to deal with complex backgrounds better, researchers usually do not decompose the raw image directly, but first construct the IPI or tensor data and then decompose it. Taking IPI as an example, the flowchart of the algorithm is shown in Figure 3.


Figure 3. The flowchart of IPI algorithm. The figure is taken from Gao et al. (2013).

Firstly, a window is put on the raw image and slid at a given step to obtain a series of image patches. Then, these patches are flattened into vectors and stacked to form a new matrix D, and the decomposition in Equation 5 is applied on D to get the low rank matrix B and the sparse matrix T.

L(B, T) = ||B||_* + λ||T||_1 + (μ/2)||D − B − T||_F²    (5)

Finally, a median operator is applied over the overlapping patches to reconstruct the low rank background image and the sparse target image from B and T.
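A compact sketch of this IPI-style separation is given below, assuming a grayscale frame stored as a float NumPy array. The RPCA solver is a plain inexact augmented Lagrange multiplier scheme with singular value thresholding; the patch size, sliding step, λ, μ and iteration budget are illustrative choices rather than the settings used in this paper.

```python
# IPI-style global separation sketch: build the patch image D, decompose it with
# a basic inexact-ALM RPCA, and rebuild images by a median over overlapping patches.
import numpy as np

def to_patch_image(img, patch=50, step=10):
    H, W = img.shape
    cols, coords = [], []
    for y in range(0, H - patch + 1, step):
        for x in range(0, W - patch + 1, step):
            cols.append(img[y:y + patch, x:x + patch].ravel())
            coords.append((y, x))
    return np.stack(cols, axis=1), coords          # D has one column per patch

def rpca_ialm(D, lam=None, mu=None, iters=100, tol=1e-6):
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 1.25 / np.linalg.norm(D, 2)
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)   # dual variable initialization
    B, T = np.zeros_like(D), np.zeros_like(D)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(D - T + Y / mu, full_matrices=False)
        B = (U * np.maximum(s - 1.0 / mu, 0)) @ Vt             # singular value thresholding
        R = D - B + Y / mu
        T = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)   # soft thresholding (l1 prox)
        Z = D - B - T
        Y = Y + mu * Z
        if np.linalg.norm(Z, 'fro') < tol * np.linalg.norm(D, 'fro'):
            break
    return B, T

def from_patch_image(M, coords, shape, patch=50):
    # median over overlapping patches, mirroring the reconstruction step above
    votes = [[[] for _ in range(shape[1])] for _ in range(shape[0])]
    for col, (y, x) in zip(M.T, coords):
        block = col.reshape(patch, patch)
        for dy in range(patch):
            for dx in range(patch):
                votes[y + dy][x + dx].append(block[dy, dx])
    out = np.zeros(shape)
    for y in range(shape[0]):
        for x in range(shape[1]):
            out[y, x] = np.median(votes[y][x]) if votes[y][x] else 0.0
    return out
```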

Researchers usually tend to directly find small dim targets in the sparse matrix; however, when the background is very complex, due to the sparsity of edge and corner information, edges and corners are likely to be separated into the sparse matrix as well, overwhelming the target. In our work, we instead turn to the low rank matrix and take it as the benchmark for global contrast calculation. The problem then becomes how to maintain as much edge/corner information as possible in the low rank matrix.

Let us analyze the reason for the data changes at complex edge and corner positions during the low rank conversion. Consider two simple cases and ignore the IPI operation, as shown in Figure 4: in (a), the raw image contains an edge and its rank is 5; if we forcefully reduce its rank to 3, two pixels will be changed and their gray values will be partially separated into the sparse image, see Figure 4b or Figure 4c.


Figure 4. The low rank conversion at edges or corners. (a) A 5 × 5 image with rank of 5 containing a diagonal edge. (b) The image after conversion with rank of 3. (c) Another case with rank of 3.

Due to the self-similarity of pixel values in flat background areas, it can be reasonably inferred that these data changes should only occur at edges and corners, and within a small local area, as there are usually some constraints in the objective function during iterations. Therefore, we state that some local operations can be introduced on the global low rank background to maintain as much edge and corner information as possible, and apply a simple but effective local max dilation (LMD) operation on the separated low rank image.

The procedure for the global background separation in this paper is shown in Algorithm 1. The IPI model is adopted here.

Algorithm 1. The global background separation.


In LMD, the dilation radius r is a key parameter. The larger r is, the more edge/corner information will be maintained. However, when a target approaches a high brightness background and r is too large, the dilated high brightness background will cover the target location, thus submerging the target during the next step of contrast calculation. After a lot of experiments, we decided to set r to 1 in this paper to avoid this situation as much as possible.
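The LMD step itself is straightforward; a minimal sketch (a grey-level max filter over a (2r+1)×(2r+1) window, with r = 1 as chosen above) is:

```python
# Local max dilation of the separated low rank background: each pixel takes the
# maximum of its (2r+1)x(2r+1) neighborhood, pushing edge/corner values back
# into the background benchmark.
import numpy as np
from scipy.ndimage import maximum_filter

def local_max_dilation(background, r=1):
    return maximum_filter(background.astype(np.float64), size=2 * r + 1)
```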

2.1.2 Matched filtering

Random noises, especially PNHB, may also easily be separated into the sparse foreground, and it is very hard to recover them in the low rank background image by the LMD operation since they usually emerge as single pixels. In this paper, before calculating the contrast information between the raw image I and the benchmark BLMD, a matched filtering operation is applied on the raw image to suppress random noises first.

The theory of matched filtering tells us that when the filter template is similar to the signal shape, the SNR of an image can be maximally improved (Moradi et al., 2016). Since true IR small dim targets usually have a Gaussian-like shape, a typical normalized Gaussian filtering template (Figure 5) is applied on the raw image to suppress random noises, as described in Equation 6:


Figure 5. The Gaussian filtering template used in this paper.

IGS(x, y) = Σ_{l=−1}^{1} Σ_{k=−1}^{1} GS(l, k)·I(x+l, y+k)    (6)

where I is the raw image, (x, y) is the coordinate of each position in the raw image, GS is the Gaussian template in Figure 5, and IGS is the filtering result.

We can also explain Equation 6 as a weighted gray sum over a small local area. A true target usually has an area larger than one pixel due to the point spread function of the optical system, so if there is a true target at the current location, the filtering result will be large. If there is a random noise whose gray value is equal to or slightly larger than the true target's, the filtering result will be small, because a noise usually emerges as a single pixel.
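A minimal sketch of this step is shown below; the exact coefficients of the template in Figure 5 are not reproduced here, and a standard normalized 3×3 Gaussian approximation is assumed instead.

```python
# Gaussian matched filtering (Equation 6): convolve the raw frame with a small
# normalized Gaussian template to suppress single-pixel noise such as PNHB.
import numpy as np
from scipy.ndimage import convolve

GS = np.array([[1., 2., 1.],
               [2., 4., 2.],
               [1., 2., 1.]]) / 16.0    # assumed normalized 3x3 Gaussian template

def gaussian_matched_filter(img):
    return convolve(img.astype(np.float64), GS, mode='nearest')
```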

2.1.3 Definition of RGCM

Inspired by our former work RLCM in the field of local contrast methods, the RGCM in this paper is defined as Equation 7:

RGCM(x, y) = max{0, [IGS(x, y) / max(τ, BLMD(x, y))]·IGS(x, y) − IGS(x, y)}    (7)

The small value τ here is used to avoid division by zero; in this paper it is set to 5 for an 8-bit digital image. Meanwhile, the non-negative constraint is used to suppress clutters.

It also can be written as Equation 8:

RGCM(x, y) = max[0, f(x, y)·IGS(x, y) − IGS(x, y)]    (8)

where f is defined as Equation 9:

f(x, y) = IGS(x, y) / max[τ, BLMD(x, y)]    (9)

It can be taken as an enhancement factor of the current pixel.
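Following Equations 7–9 as written above, the RGCM map can be sketched directly in a vectorized form, with τ = 5 as stated; IGS and BLMD are the filtered image and the dilated benchmark from the previous steps.

```python
# Relative global contrast measure (Equations 7-9): enhancement factor times the
# filtered image, minus the filtered image, clipped at zero.
import numpy as np

def rgcm(i_gs, b_lmd, tau=5.0):
    f = i_gs / np.maximum(tau, b_lmd)            # Equation 9: enhancement factor
    return np.maximum(0.0, f * i_gs - i_gs)      # Equations 7/8 with the non-negative constraint
```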

2.2 Small patch local contrast weighting function

In this paper, the DoG filter is selected as a weighting function for RGCM. DoG is a small patch local contrast method and can extract the local contrast information within a small area, thereby avoiding the problem of the targets being submerged by high brightness backgrounds when approaching them. Similar to the original DoG method, we use two 5×5 templates (as shown in Figure 6) as the approximation of the Gaussian kernels, and define the weighting operation as Equation 10:


Figure 6. The two templates used for DoG filtering. (a) T1, (b) T2.

W = max[I ∗ (T1 − T2), 0]    (10)

Note that considering a desired target is usually hotter than the environment, a simple non-negative operation is utilized in the weighting function in this paper to suppress clutters better.
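A sketch of this weighting function is given below. The two 5×5 templates of Figure 6 are not reproduced exactly; two sampled Gaussian kernels with assumed standard deviations are used as stand-ins for T1 and T2.

```python
# DoG weighting function (Equation 10): convolve with the difference of two small
# Gaussian kernels and keep only the non-negative response.
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def dog_weight(img):
    t1, t2 = gaussian_kernel(5, 0.8), gaussian_kernel(5, 1.6)   # assumed sigmas for T1, T2
    return np.maximum(convolve(img.astype(np.float64), t1 - t2, mode='nearest'), 0.0)
```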

2.3 HCM calculation and discussions

Finally, the HCM is defined as Equation 11:

HCM(x, y) = RGCM(x, y)·W(x, y)    (11)

It is obvious that both local and global contrast information are utilized in the proposed HCM method, which is why it is called a “hybrid” contrast method.

Algorithm 2 gives the main steps for HCM calculation.

Algorithm 2. HCM calculation.

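A hedged end-to-end sketch of this calculation, reusing the helper functions sketched in the previous subsections (the patch-image parameters remain illustrative), is:

```python
# HCM pipeline sketch (Algorithm 2): Gaussian matched filtering, IPI-style global
# background separation, LMD, RGCM, and DoG weighting, multiplied into the SM.
import numpy as np

def hcm(img, patch=50, step=10):
    img = img.astype(np.float64)
    i_gs = gaussian_matched_filter(img)                          # Section 2.1.2
    D, coords = to_patch_image(img, patch, step)                 # IPI construction
    B, _ = rpca_ialm(D)                                          # sparse and low rank decomposition
    background = from_patch_image(B, coords, img.shape, patch)   # low rank background image
    b_lmd = local_max_dilation(background, r=1)                  # LMD benchmark (Section 2.1.1)
    r_gcm = rgcm(i_gs, b_lmd)                                    # Section 2.1.3
    w = dog_weight(img)                                          # Section 2.2
    return r_gcm * w                                             # Equation 11
```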

It is necessary to discuss the cases where (x, y) belongs to different types of pixels:

a) If (x, y) is a true target pixel: since a true target’s size is usually very small and most of its information will be separated into the sparse foreground, it can be easily deduced that

IGS(x, y) > BLMD(x, y)    (12)

So, there will be

f(x, y) > 1    (13)
RGCM(x, y) > 0    (14)

Besides, it can be easily deduced that

W(x, y) > 0    (15)

So

HCM(x, y) > 0    (16)

Here, Equation 13 means that the true target can be enhanced by HCM.

Please note that when the target approaches a high brightness background, Equations 12–16 will still hold as long as the dilation radius r in Algorithm 1 is smaller than the distance between the target and the high brightness background.

b) If (x, y) is a pure background pixel: since the background is usually flat, most of its information will be separated into the low rank background, so it can be easily deduced that

IGS(x, y) ≈ BLMD(x, y)    (17)

So, there will be

f(x, y) ≈ 1    (18)
RGCM(x, y) ≈ 0    (19)

Besides, it can be easily deduced that

W(x, y) ≈ 0    (20)

So

HCM(x, y) ≈ 0    (21)

Equations 17–21 mean that the flat background can be eliminated by HCM.

Note that Equations 17–21 are independent of the actual value of the current pixel, which means that the proposed method can eliminate high brightness background properly.

c) If (x, y) is near a background edge or corner, although some of its information will be separated into the sparse foreground, the LMD operation on the low rank background image can recover as much edge/corner information as possible, i.e.,

BLMD(x, y) ≈ I(x, y)    (22)

Considering

IGS(x, y) ≈ I(x, y)    (23)

There will be

BLMD(x, y) ≈ IGS(x, y)    (24)

So, we can get that

f(x, y) ≈ 1    (25)
RGCM(x, y) ≈ 0    (26)

And

HCM(x, y) ≈ 0    (27)

Equations 22–27 mean that the proposed method can effectively suppress complex background edges and corners.

d) If (x, y) is a random noise pixel, the case will be similar to that of the true target. However, since the Gaussian filtering operation can suppress single-pixel random noise to some extent, it can be easily deduced that the HCM of a noise pixel will be smaller than a true target’s, even if its gray value is equal to or slightly larger than the target’s. Therefore, the HCM method can suppress random noise effectively.

2.4 Threshold operation

For each pixel of the raw IR image, its HCM is calculated, and the results form a new matrix. It is obvious that the true target will be the most salient in the HCM result, while other interferences such as edges, corners and noises are all inhibited. Therefore, in this paper the HCM result is treated as the Saliency Map (SM), and an adaptive threshold operation is used to extract the true target from it. The threshold value Th is defined as Equation 28:

Th = ξ·max(SM) + (1 − ξ)·mean(SM)    (28)

where max(SM) and mean(SM) are the maximum and mean values of SM, respectively. ξ is a factor in the range 0–1; according to our experiments, a ξ between 0.7 and 0.95 is proper for most single-target detection cases, but note that it is better to set ξ to a smaller value for multi-target detection, since different targets may have different saliency.

By applying the threshold Th on SM, the pixels larger than Th are labeled as 1 and the others as 0. In the final binary image, each connected area is output as a detected target (to eliminate clutters better, a dilation operation may be needed first).

Algorithm 3 summarizes the main steps for the threshold operation.

Algorithm 3. Threshold operation.

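A minimal sketch of this threshold operation (Equation 28 followed by connected-component extraction; ξ = 0.8 is an illustrative value within the 0.7–0.95 range suggested above) is:

```python
# Adaptive threshold and target extraction: threshold the saliency map, optionally
# dilate the binary mask, and report one centroid per connected component.
import numpy as np
from scipy.ndimage import label, binary_dilation, center_of_mass

def threshold_and_extract(sm, xi=0.8):
    th = xi * sm.max() + (1.0 - xi) * sm.mean()        # Equation 28
    mask = binary_dilation(sm > th)                    # optional dilation to merge fragments
    labels, n = label(mask)
    return center_of_mass(sm, labels, list(range(1, n + 1)))   # detected target centroids
```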

3 Details of the IR image data used in this paper

In this paper, 14 real IR sequences which contain different types of targets and backgrounds are used to verify the performance of the proposed algorithm. Figure 7 shows some samples of them, and Table 1 reports the details.


Figure 7. Samples of the 14 real IR sequences. (a–n): Sequence 1 ~ Sequence 14.


Table 1. Features of the 14 real IR sequences.

Besides, a single frame dataset is also used to test the detection performance of the proposed method, some samples are shown in Figure 8.


Figure 8. Six samples of the single frame dataset. (a–f): Sample 1 ~ Sample 6.

From Figures 7, 8 it can be seen that in the raw IR images, the targets are usually very small and dim, while the backgrounds are usually very complex. Besides, some images contain heavy noises.

4 Experiments and results

In this section, we first present each processing step of the proposed method and compare its performance with that of some baseline algorithms. Then, the computational complexity and time consumption of the proposed method are analyzed, and its robustness to noises is tested. Finally, some ablation experiments are conducted to verify the effectiveness of the important modules of the proposed method. All the experiments are conducted on a PC with 8 GB of random access memory and a 3.1 GHz Intel i5 processor.

4.1 Processing results of the proposed algorithm

Firstly, the detection ability of the proposed algorithm is tested, and the processing results of the proposed algorithm are given step by step in Figures 9, 10. The same samples as in Figures 7, 8 are used here.


Figure 9. The processing results of the proposed algorithm for the samples of the 14 sequences, from top to bottom: Sequence 1~ Sequence 14. (a) Raw IR images. (b) Images after Gaussian filtering. (c) The separated global background B. (d) The global background BLMD after LMD. (e) The RGCM results. (f) The DoG weighting function. (g) The HCM results after weighting. (h) The final detection results after threshold operation.


Figure 10. The processing results of the proposed algorithm for the samples of the single frame dataset. (a) Raw IR images. (b) Images after Gaussian filtering. (c) The separated global background B. (d) The global background BLMD after LMD. (e) The RGCM results. (f) The DoG weighting function. (g) The HCM results after weighting. (h) The final detection results after threshold operation.

It can be seen from Figures 9, 10 that:

In the original IR image I, the targets are usually small and dim, while the backgrounds are usually complex; they may have high brightness and complex edges and corners. Meanwhile, there are many random noises (including PNHB) in some sequences, too.

The Gaussian filter in the first step can effectively suppress random noises and improve image quality to a certain extent.

After the sparse and low rank decomposition, the global background image B separated from the original image mainly contains background information. However, it loses some important information at complex edges/corners, which makes it unsuitable to be used directly as the benchmark for the global contrast calculation in the next steps.

After LMD, the dilated global background image BLMD recovers as much edge/corner information as possible, so it is more suitable to be used as the benchmark than B.

After RGCM calculation, real targets become very salient. However, there are still a few clutters in some images with complex backgrounds.

After the weighting operations by the DoG filtering result, residual clutters are suppressed further and real targets become the most salient in the SM.

Finally, after the threshold operation, all the real targets are output successfully, and only one false alarm emerges in Sequence 7 (it is a broken cloud which has a similar pattern to the real target; in our future work, we will utilize some time-domain methods to eliminate it). Therefore, the effectiveness of the proposed method is demonstrated.

4.2 Comparisons with other algorithms

Nine existing algorithms are chosen as baselines for comparisons to verify the advantages of the proposed method, including:

Seven local contrast algorithms: DoG (Wang et al., 2012), ILCM (Han et al., 2014), MPCM (Wei et al., 2016), RLCM (Han et al., 2018), the Weighted Local Difference Measure (WLDM) (Deng et al., 2016), the Multi-Directional Two-Dimensional Least Mean Square (MDTDLMS) method (Han et al., 2019), and the Enhanced Closest Mean Background Estimation (ECMBE) method (Han et al., 2021).

One global decomposition algorithm, i.e., IPI (Gao et al., 2013).

One deep learning algorithm, i.e., RPCANet (Wu et al., 2024).

Here is a summary of each baseline method:

a. DoG is a traditional small patch local contrast method.

b. ILCM is a large patch local contrast method; it takes the ratio between a current pixel and its surrounding benchmark as the contrast information.

c. MPCM is a large patch local contrast method, but it performs multiscale calculation to extract the target better.

d. RLCM is a multiscale local contrast method too, and both ratio and difference operations are utilized in it to enhance true target and suppress background simultaneously.

e. WLDM introduces the local entropy as a weighting function for the local contrast information.

f. MDTDLMS utilizes a background estimation method to get the benchmark for contrast information calculation; however, its benchmark is still obtained by local operations.

g. ECMBE also combines the local contrast method with background estimation, and it proposes a new background estimation principle named closest mean, so the problem of target submergence caused by the neighboring high brightness background can be alleviated.

h. IPI is a sparse and low rank decomposition algorithm, but it focuses on the sparse foreground image and tries to directly search for targets in it.

i. RPCANet is a newly proposed deep learning algorithm; it unfolds the traditional iterations of the RPCA algorithm with deep networks to achieve sparse and low rank decomposition.

The key parameters of each algorithm are listed in Table 2.


Table 2. The parameter values used in the baseline algorithms.

Firstly, two objective indicators, the signal-to-clutter ratio gain (SCRG) and the background suppression factor (BSF), are used to describe different aspects of the algorithms’ performance. SCRG, defined as Equation 29, describes the target enhancement ability of an algorithm. BSF, defined as Equation 30, describes the background suppression ability of an algorithm.

SCRG = SCRout / SCRin    (29)
BSF = σin / σout    (30)

where SCRin and SCRout are the SCR (defined as Equation 31) of the raw image and the SM, respectively, and σin and σout are the standard deviations of the raw image and the SM, respectively.

SCR = |It − Inb| / σ    (31)

where It is the maximal gray value of the target center, Inb is the average gray value of the neighboring background around the target center (in this paper, the area between the 15 × 15 and 9 × 9 windows around the target center), and σ is the standard deviation of the image.
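These indicators can be sketched as follows, assuming the target center (ty, tx) is known from the ground truth and lies far enough from the image border; the inner/outer window sizes follow the 9 × 9 and 15 × 15 areas stated above.

```python
# Evaluation metrics (Equations 29-31): SCR of an image around a known target
# center, then SCRG and BSF between the raw image and the saliency map.
import numpy as np

def scr(img, ty, tx):
    img = img.astype(np.float64)
    t = img[ty - 4:ty + 5, tx - 4:tx + 5].max()        # It: brightest pixel of the target area
    ring = img[ty - 7:ty + 8, tx - 7:tx + 8].copy()    # 15x15 neighborhood
    ring[3:12, 3:12] = np.nan                          # exclude the inner 9x9 window
    inb = np.nanmean(ring)                             # Inb: mean of the surrounding ring
    return abs(t - inb) / img.std()                    # Equation 31

def scrg_and_bsf(raw, sm, ty, tx):
    return scr(sm, ty, tx) / scr(raw, ty, tx), raw.std() / sm.std()   # Equations 29 and 30
```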

The results of SCRG and BSF are shown in Tables 3, 4, respectively. Please note that as a deep learning method, the output of RPCANet is the target probability of each pixel, so we are unable to calculate its SCRG and BSF.


Table 3. The SCRG of different algorithms in the 14 sequences.


Table 4. The BSF of different algorithms in the 14 sequences.

It can be seen from Tables 3, 4 that compared to the baselines, the proposed algorithm can achieve the highest SCRG and BSF in most cases.

Then, to intuitively show the detection performance of different algorithms, Figures 11, 12 give the saliency maps and the detection results of each algorithm for the samples of the 14 sequences.


Figure 11. The saliency maps of the 14 sequences using different algorithms. (a) DoG. (b) ILCM. (c) MPCM. (d) RLCM. (e) WLDM. (f) MDTDLMS. (g) ECMBE. (h) IPI. (i) RPCANet.


Figure 12. The detection results of the 14 sequences using different algorithms. (a) DoG. (b) ILCM. (c) MPCM. (d) RLCM. (e) WLDM. (f) MDTDLMS. (g) ECMBE. (h) IPI. (i) RPCANet.

Besides, the Receiver Operating Characteristic (ROC) curves for each whole sequence are utilized to compare the detection performance of different algorithms, and the results are shown in Figure 13. Here, the False Positive Rate (FPR) and the True Positive Rate (TPR) are defined as Equation 32 and Equation 33.


Figure 13. The ROC curves of different algorithms in different sequences. (a–n): Sequence 1 ~ Sequence 14.

FPR = (number of detected false targets) / (total number of pixels in the whole image)    (32)
TPR = (number of detected true targets) / (total number of real targets)    (33)

It can be seen from Figures 11-13 that:

a. DoG can only extract true targets in Sequence 1, Sequence 3 and Sequence 14 in Figure 12, and its ROC performance is usually the worst in Figure 13, too.

b. ILCM can extract true targets in Sequence 1, Sequence 7, Sequence 10, Sequence 12 and Sequence 14, and its performance is slightly better than DoG’s. However, it is not satisfactory in many other sequences.

c. MPCM can output true targets in Sequence 1, Sequence 2, Sequence 12 and Sequence 14. However, when the background is complex, many interferences are also enhanced and output, for example, in Sequence 4, Sequence 6, Sequence 7 and Sequence 9, etc. In particular, in Figure 13 its performance is worse than that of many other algorithms.

d. RLCM can achieve a better detection performance in some sequences, for example, in Sequence 4, Sequence 8 and Sequence 9, etc. However, when the background is very complex and the target is very dim, it will fail, for example, in Sequence 5, Sequence 6, Sequence 11 and Sequence 13, etc.

e. WLDM utilizes the local entropy as the weighting function for local contrast information, however, when the target is dim and the background is very complex, the target will be submerged by clutters, for example, in Sequence 3 ~ Sequence 10, etc.

f. MDTDLMS utilizes the TDLMS background estimation method to get the benchmark for contrast information calculation, so it can achieve good performance in some cases, such as in Sequence 3, Sequence 5, Sequence 12 and Sequence 13, etc. However, its benchmark is obtained locally, so its performance is still not good in some cases, especially when the target approaches a high brightness background, for example, in Sequence 11.

g. ECMBE improved the principle of benchmark selection, so it can achieve good detection performance even if the target is not locally salient, for example, in Sequence 11. However, since its benchmark is still obtained locally, its performance is not very good in some cases, for example, in Sequence 5 ~ Sequence 7, and Sequence 9 ~ Sequence 11, etc.

h. As a global decomposition method, IPI can achieve good performance even when the target is not locally salient, for example, in Sequence 11. However, it focuses on the sparse part and is sensitive to complex edge/corner information. If the target is weak and the background is complex, it will fail, for example, in Sequence 2 ~ Sequence 7, Sequence 10 ~ Sequence 13, etc.

i. RPCANet, as a deep learning method, can achieve good performance when the data distribution is the same as that of the training samples, for example, in Sequence 1. However, when the data distribution is different, its performance decreases significantly.

Compared to the existing methods, the performance of the proposed HCM algorithm is always at the forefront in all 14 sequences. In particular, when the clutter is heavy and the target is not locally salient, it can still achieve a good detection performance.

4.3 Comparisons of computational complexity and time consumption

In this section, the computational complexity and time consumption of different algorithms are analyzed. For simplicity, suppose the raw image has a resolution of X × Y, and the scale of the patch window or cell is (2L+1)². For multi-scale local algorithms, such as MPCM, RLCM and WLDM, denote S as the number of scales and Li (i = 1, 2, …, S) as the L of the ith scale.

For DoG, there will be (2L+1)² multiplications and (2L+1)² additions for each pixel during the convolution operation, so the computational complexity will be O(L²XY).

For ILCM, since it uses a DoG filter as preprocessing and the subsequent subblock-stage processing consumes fewer calculations, its computational complexity will be O(L²XY).

For MPCM, for each scale, the average operation costs (2Li+1)² additions for each pixel, so for all S scales its computational complexity will be O(S·LS²·XY).

For RLCM, for each scale, the sort operation within a cell costs (2Li+1)²·log(2Li+1)² calculations, so for all S scales its computational complexity will be O[S·LS²·log(LS²)·XY].

For WLDM, for each scale, the average operation costs (2Li+1)² additions for each pixel, and the entropy calculation needs a sort operation within a cell first, which costs (2Li+1)²·log(2Li+1)² calculations, so for all S scales its computational complexity will be O[S·LS²·log(LS²)·XY].

For MDTDLMS, its computational complexity is reported as O(L²XY) in the original paper.

For ECMBE, its computational complexity is reported as O(LXY) in the original paper.

For IPI, the computational complexity is reported as O(Nkmn·log(mn) + rc(p + 1)) in the original paper. Here, m is the number of pixels in the patch window, i.e., (2L+1)² in this paper; n is the number of patches; k is the number of nonzero singular values (the rank) in each iteration; N is the iteration number; p is the number of overlapping pixels during the transformation from the target/background patch image to the reconstructed image; and r and c are the row and column numbers of the original image, i.e., X and Y in this paper, respectively. Therefore, its computational complexity can be rewritten as O(NkL²n·log(L²n) + XY(p + 1)) here.

For RPCANet, since it is a deep learning method, its computational complexity is not given here.

The proposed algorithm has four main steps: global background separation, Gaussian matched filtering, RGCM calculation and the DoG weighting operation. The global background separation has the same computational complexity as the IPI algorithm; the Gaussian matched filtering needs 9 multiplications and 1 addition at each pixel, i.e., 10XY operations for the whole image; the RGCM calculation needs 8 comparisons for the LMD of the background benchmark and 1 division, 1 subtraction, and 1 multiplication for the relative GCM calculation, i.e., 11XY operations for the whole image; the DoG weighting operation needs 25 multiplications and 1 addition for each pixel, i.e., 26XY operations for the whole image. Therefore, its computational complexity will be O(NkL²n·log(L²n) + XY(p + 47)).

Table 5 summarizes the comparisons of computational complexity for different algorithms. The average time consumption (in seconds per frame) of different algorithms is listed in Table 6. Please note that although some existing algorithms achieve lower computational complexity and average time consumption, their detection performance is much worse. The proposed algorithm, although it does not show an advantage in computational complexity or time consumption, achieves better detection performance.


Table 5. Computational complexity of different algorithms.


Table 6. Average time consumption of different algorithms over the 14 sequences (seconds per frame).

4.4 The robustness to noises

To test the robustness of the proposed algorithm to noises, we select one sequence (Sequence 11), add different levels of noise to it, and then draw the ROC curves, see Figure 14. It can be seen that after the noise is added, the detection performance of the proposed algorithm does not change obviously.


Figure 14. The detection performance of the proposed algorithm in Sequence 11 after different levels of noises are added.

4.5 The ablation experiments

To verify the effectiveness of the important modules, ablation experiments are conducted last. In the proposed algorithm, three modules are important: the Gaussian matched filter, the LMD for the low rank background, and the DoG weighting function. All of them are tested and the results are given in Table 7.


Table 7. Ablation experiments.

From Table 7 we can see that the proposed algorithm with all three modules achieves the best or the second best SCRG and BSF in most cases, which proves the effectiveness of these modules for improving detection performance. It is worth noting that in some sequences, the variant that does not perform the Gaussian filtering operation can achieve a larger SCRG. This is because in these sequences the target is extremely small and the Gaussian filtering may smooth it to some extent. However, we still think the Gaussian filter is necessary since it can smooth backgrounds and noises too, which is why the variant without the Gaussian filter has a smaller BSF.

5 Conclusions

In this paper, a new contrast method framework named hybrid contrast measure (HCM) is proposed for IR small target detection. It combines two types of contrast information: the global contrast and the local contrast. In the global contrast calculation, the benchmark is first obtained via a global sparse and low rank decomposition, so the method can handle the situation where a target approaches a high brightness background and is no longer locally salient. In particular, a simple LMD operation is applied on the global low rank background benchmark to recover as much edge/corner information as possible. Then, the relative global contrast measure is calculated between the benchmark and the image after Gaussian filtering (to suppress random noises), to enhance the true target and suppress the background simultaneously. In the local contrast calculation, the DoG filter is adopted and improved with a non-negative constraint to get the weighting function, which suppresses clutters further. Experiments on 14 real sequences and a single frame dataset show the effectiveness of the proposed algorithm for different types of targets and backgrounds, and, compared to the baseline methods, the proposed algorithm can usually achieve better performance in terms of SCRG, BSF and ROC curves. Besides, ablation experiments are conducted to verify the effectiveness of the important modules.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Author contributions

JH: Conceptualization, Data curation, Methodology, Writing – original draft, Writing – review & editing. SM: Formal analysis, Writing – original draft, Writing – review & editing. WW: Methodology, Writing – original draft. NL: Data curation, Investigation, Writing – original draft. QZ: Project administration, Writing – original draft. ZL: Data curation, Writing – original draft.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported in part by the National Natural Science Foundation of China under Grant 61802455, 62003381 and 62472464, in part by the Natural Science Foundation of Henan Province under Grant 252300420399, 242300421718 and in part by the Foundation of the Science and Technology Department of Henan Province under Grant 192102210089, 222102210077, 232102320066 and 252102210231.

Acknowledgments

Some of the testing IR sequences used in this paper were acquired from Hui et al. (2019) and Dai et al. (2021); we would like to thank the authors for kindly sharing them. Also, we would like to express our sincere appreciation to the editor and reviewers who provided valuable comments to help improve this paper. Additionally, we are grateful to the researchers who provided the comparison methods.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Bai X., Zhou F. (2010). Analysis of new top-hat transformation and the application for infrared dim small target detection. Pattern Recognition 43, 2145–2156. doi: 10.1016/j.patcog.2009.12.023


Bi Y., Chen J., Sun H., Bai X. (2020). Fast detection of distant, infrared targets in a single image using multiorder directional derivatives. IEEE Trans. Aerospace Electronic Syst. 56, 2422–2436. doi: 10.1109/TAES.2019.2946678

Chen C. L. P., Li H., Wei Y., Xia T., Tang Y. Y. (2014). A local contrast method for small infrared target detection. IEEE Trans. Geosci. Remote Sens. 52, 574–581. doi: 10.1109/TGRS.2013.2242477

Chen Y., Song B., Du X., Guizani M. (2019). Infrared small target detection through multiple feature analysis based on visual saliency. IEEE Access 7, 38996–39004. doi: 10.1109/ACCESS.2019.2906076

Chen J., Zhu Z., Hu H., Qiu L., Zheng Z., Dong L. (2023a). Multi-scale local contrast fusion based on LoG in infrared small target detection. Aerospace 10, 449. doi: 10.3390/aerospace10050449

Chen J., Zhu Z., Hu H., Qiu L., Zheng Z., Dong L. (2023b). A novel adaptive group sparse representation model based on infrared image denoising for remote sensing application. Appl. Sci. 13, 5749. doi: 10.3390/app13095749

Cui Z., Yang J., Jiang S., Li J. (2016). An infrared small target detection algorithm based on high-speed local contrast method. Infrared Phys. Technol. 76, 474–481. doi: 10.1016/j.infrared.2016.03.023

Dai Y., Wu Y. (2017). Reweighted infrared patch-tensor model with both nonlocal and local priors for single-frame small target detection. IEEE J. Selected Topics Appl. Earth Observations Remote Sens. 10, 3752–3767. doi: 10.1109/JSTARS.2017.2700023

Dai Y., Wu Y., Song Y., Guo J. (2017). Non-negative infrared patch-image model: Robust target background separation via partial sum minimization of singular values. Infrared Phys. Technol. 81, 182–194. doi: 10.1016/j.infrared.2017.01.009

Dai Y., Wu Y., Zhou F., Barnard K. (2021). Attentional local contrast networks for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 59, 9813–9824. doi: 10.1109/TGRS.2020.3044958

Dang C., Li Z., Hao C., Xiao Q. (2023). Infrared small marine target detection based on spatiotemporal dynamics analysis. Remote Sens. 15, 1258. doi: 10.3390/rs15051258

Deng H., Sun X., Liu M., Ye C., Zhou X. (2016). Infrared small-target detection using multiscale gray difference weighted image entropy. IEEE Trans. Aerospace Electronic Syst. 52, 60–72. doi: 10.1109/TAES.2015.140878

Deng H., Sun X., Liu M., Ye C., Zhou X. (2017). Entropy-based window selection for detecting dim and small infrared targets. Pattern Recognition 61, 66–77. doi: 10.1016/j.patcog.2016.07.036

Deng L., Zhang J., Xu G., Zhu H. (2021). Infrared small target detection via adaptive m-estimator ring top-hat transformation. Pattern Recognition 112, 107729. doi: 10.1016/j.patcog.2020.107729

Deshpande S. D., Er M. H., Venkateswarlu R., Chan P. (1999). “Max-mean and max-median filters for detection of small targets,” in 1999 SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation. (Denver, CO, United States), 3809, 74–83. doi: 10.1117/12.364049

Ding H., Zhao H. (2015). Adaptive method for the detection of infrared small target. Optical Eng. 54, 113107. doi: 10.1117/1.OE.54.11.113107

Du P., Hamdulla A. (2020a). Infrared moving small-target detection using spatial–temporal local difference measure. IEEE Geosci. Remote Sens. Lett. 17, 1817–1821. doi: 10.1109/LGRS.2019.2954715

Du P., Hamdulla A. (2020b). Infrared small target detection using homogeneity-weighted local contrast measure. IEEE Geosci. Remote Sens. Lett. 17, 514–518. doi: 10.1109/LGRS.2019.2922347

Fan X., Wu A., Chen H., Huang Q., Xu Z. (2022). Infrared dim and small target detection based on the improved tensor nuclear norm. Appl. Sci. 12, 5570. doi: 10.3390/app12115570

Fang H., Chen M., Liu X., Yao S. (2020). Infrared small target detection with total variation and reweighted l1 regularization. Math. Problems Eng. 2020, 1529704. doi: 10.1155/2020/1529704

Gao C., Meng D., Yang Y., Wang Y., Zhou X., Hauptmann A. G. (2013). Infrared patch-image model for small target detection in a single image. IEEE Trans. Image Process. 22, 4996–5009. doi: 10.1109/TIP.2013.2281420

Girshick R., Donahue J., Darrell T., Malik J. (2014). “Rich feature hierarchies for accurate object detection and semantic segmentation,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Columbus, OH, USA: IEEE). 580–587. doi: 10.1109/CVPR.2014.81

Gregoris D. J., Yu S. K., Tritchew S., Sevigny L. (1994). “Wavelet transform-based filtering for the enhancement of dim targets in FLIR images,” in Wavelet Applications, vol. 2242 . Ed. Szu H. H. (Orlando, FL, United States: International Society for Optics and Photonics (SPIE), 573–583. doi: 10.1117/12.170058

Guan X., Peng Z., Huang S., Chen Y. (2020a). Gaussian scale-space enhanced local contrast measure for small infrared target detection. IEEE Geosci. Remote Sens. Lett. 17, 327–331. doi: 10.1109/LGRS.2019.2917825

Guan X., Zhang L., Huang S., Peng Z. (2020b). Infrared small target detection via non-convex tensor rank surrogate joint local contrast energy. Remote Sens. 12, 1520. doi: 10.3390/rs12091520

Han J., Liang K., Zhou B., Zhu X., Zhao J., Zhao L. (2018). Infrared small target detection utilizing the multiscale relative local contrast measure. IEEE Geosci. Remote Sens. Lett. 15, 612–616. doi: 10.1109/LGRS.2018.2790909

Han J., Liu C., Liu Y., Luo Z., Zhang X., Niu Q. (2021). Infrared small target detection utilizing the enhanced closest-mean background estimation. IEEE J. Selected Topics Appl. Earth Observations Remote Sens. 14, 645–662. doi: 10.1109/JSTARS.2020.3038442

Han J., Liu S., Qin G., Zhao Q., Zhang H., Li N. (2019). A local contrast method combined with adaptive background estimation for infrared small target detection. IEEE Geosci. Remote Sens. Lett. 16, 1442–1446. doi: 10.1109/LGRS.2019.2898893

Han J., Ma Y., Huang J., Mei X., Ma J. (2016). An infrared small target detecting algorithm based on human visual system. IEEE Geosci. Remote Sens. Lett. 13, 452–456. doi: 10.1109/LGRS.2016.2519144

Han J., Ma Y., Zhou B., Fan F., Liang K., Fang Y. (2014). A robust infrared small target detection algorithm based on human visual system. IEEE Geosci. Remote Sens. Lett. 11, 2168–2172. doi: 10.1109/LGRS.2014.2323236

Han J., Moradi S., Faramarzi I., Liu C., Zhang H., Zhao Q. (2020). A local contrast method for infrared small-target detection utilizing a tri-layer window. IEEE Geosci. Remote Sens. Lett. 17, 1822–1826. doi: 10.1109/LGRS.2019.2954578

Han J., Xu Q., Moradi S., Fang H., Yuan X., Qi Z., et al. (2022). A ratio-difference local feature contrast method for infrared small target detection. IEEE Geosci. Remote Sens. Lett. 19, 1–5. doi: 10.1109/LGRS.2022.3157674

Hao C., Li Z., Zhang Y., Chen W., Zou Y. (2024). Infrared small target detection based on adaptive size estimation by multidirectional gradient filter. IEEE Trans. Geosci. Remote Sens. 62, 1–15. doi: 10.1109/TGRS.2024.3502421

Hao X., Liu X., Liu Y., Cui Y., Lei T. (2023). Infrared small-target detection based on background suppression proximal gradient and GPU acceleration. Remote Sens. 15, 5424. doi: 10.3390/rs15225424

He Y., Li M., Zhang J., An Q. (2015). Small infrared target detection based on low-rank and sparse representation. Infrared Phys. Technol. 68, 98–109. doi: 10.1016/j.infrared.2014.10.022

Hui B., Song Z., Fan H., Zhong P., Hu W., Zhang X., et al. (2019). A dataset for infrared image dim-small aircraft target detection and tracking under ground/air background. doi: 10.11922/sciencedb.902

Itti L., Koch C., Niebur E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20, 1254–1259. doi: 10.1109/34.730558

Kong X., Liu L., Qian Y., Cui M. (2016). Automatic detection of sea-sky horizon line and small targets in maritime infrared imagery. Infrared Phys. Technol. 76, 185–199. doi: 10.1016/j.infrared.2016.01.016

Kou R., Wang C., Fu Q., Yu Y., Zhang D. (2022). Infrared small target detection based on the improved density peak global search and human visual local contrast mechanism. IEEE J. Selected Topics Appl. Earth Observations Remote Sens. 15, 6144–6157. doi: 10.1109/JSTARS.2022.3193884

Kou R., Wang C., Peng Z., Zhao Z., Chen Y., Han J., et al. (2023). Small target segmentation networks: A survey. Pattern Recognition 143, 109788. doi: 10.1016/j.patcog.2023.109788

Lin Z., Ganesh A., Wright J., Wu L., Chen M., Ma Y. (2009). Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix.

Liu Y., Li N., Cao L., Zhang Y., Ni X., Han X., et al. (2024). Research on infrared dim target detection based on improved YOLOv8. Remote Sens. 16, 2878. doi: 10.3390/rs16162878

Liu D., Li Z., Liu B., Chen W., Liu T., Cao L. (2017). Infrared small target detection in heavy sky scene clutter based on sparse representation. Infrared Phys. Technol. 85, 13–31. doi: 10.1016/j.infrared.2017.05.009

Liu L., Wei Y., Wang Y., Yao H., Chen D. (2023). Using double-layer patch-based contrast for infrared small target detection. Remote Sens. 15, 3839. doi: 10.3390/rs15153839

Lu Z., Huang Z., Song Q., Bai K., Li Z. (2022b). An enhanced image patch tensor decomposition for infrared small target detection. Remote Sens. 14, 6044. doi: 10.3390/rs14236044

Lu R., Yang X., Li W., Fan J., Li D., Jing X. (2022a). Robust infrared small target detection via multidirectional derivative-based weighted contrast measure. IEEE Geosci. Remote Sens. Lett. 19, 1–5. doi: 10.1109/LGRS.2020.3026546

Luo Y., Li X., Chen S., Xia C., Zhao L. (2022). IMNN-LWEC: A novel infrared small target detection based on spatial–temporal tensor model. IEEE Trans. Geosci. Remote Sens. 60, 1–22. doi: 10.1109/TGRS.2022.3230051

Luo Y., Li X., Wang J., Chen S. (2024). Clustering and tracking-guided infrared spatial–temporal small target detection. IEEE Trans. Geosci. Remote Sens. 62, 1–20. doi: 10.1109/TGRS.2024.3384440

Ma T., Yang Z., Wang J., Sun S., Ren X., Ahmad U. (2022). Infrared small target detection network with generate label and feature mapping. IEEE Geosci. Remote Sens. Lett. 19, 1–5. doi: 10.1109/LGRS.2022.3140432

Moradi S., Moallem P., Sabahi M. F. (2016). Scale-space point spread function based framework to boost infrared target detection algorithms. Infrared Phys. Technol. 77, 27–34. doi: 10.1016/j.infrared.2016.05.007

Pang D., Shan T., Li W., Ma P., Tao R., Ma Y. (2022a). Facet derivative-based multidirectional edge awareness and spatial-temporal tensor model for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 60, 1–15. doi: 10.1109/TGRS.2021.3098969

Pang D., Shan T., Ma P., Li W., Liu S., Tao R. (2022b). A novel spatiotemporal saliency method for low-altitude slow small infrared target detection. IEEE Geosci. Remote Sens. Lett. 19, 1–5. doi: 10.1109/LGRS.2020.3048199

Peng L., Lu Z., Lei T., Jiang P. (2024). Dual-structure elements morphological filtering and local z-score normalization for infrared small target detection against heavy clouds. Remote Sens. 16, 2343. doi: 10.3390/rs16132343

Qi S., Ma J., Li H., Zhang S., Tian J. (2014). Infrared small target enhancement via phase spectrum of quaternion fourier transform. Infrared Phys. Technol. 62, 50–58. doi: 10.1016/j.infrared.2013.10.008

Qin H., Han J., Yan X., Zeng Q., Zhou H., Li J., et al. (2016). Infrared small moving target detection using sparse representation-based image decomposition. Infrared Phys. Technol. 76, 148–156. doi: 10.1016/j.infrared.2016.02.003

Qin Y., Li B. (2016). Effective infrared small target detection utilizing a novel local contrast method. IEEE Geosci. Remote Sens. Lett. 13, 1890–1894. doi: 10.1109/LGRS.2016.2616416

Redmon J., Divvala S., Girshick R., Farhadi A. (2016). “You only look once: Unified, real-time object detection,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Las Vegas, NV, USA: IEEE). 779–788. doi: 10.1109/CVPR.2016.91

Ren S., He K., Girshick R., Sun J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149. doi: 10.1109/TPAMI.2016.2577031

Shao X., Fan H., Lu G., Xu J. (2012). An improved infrared dim and small target detection algorithm based on the contrast mechanism of human visual system. Infrared Phys. Technol. 55, 403–408. doi: 10.1016/j.infrared.2012.06.001

Tang Y., Xiong K., Wang C. (2023). Fast infrared small target detection based on global contrast measure using dilate operation. IEEE Geosci. Remote Sens. Lett. 20, 1–5. doi: 10.1109/LGRS.2023.3233958

Wang H., Hu Y., Wang Y., Cheng L., Gong C., Huang S., et al. (2024). Infrared small target detection based on weighted improved double local contrast measure. Remote Sens. 16, 4030. doi: 10.3390/rs16214030

Wang X., Lv G., Xu L. (2012). Infrared dim target detection based on visual attention. Infrared Phys. Technol. 55, 513–521. doi: 10.1016/j.infrared.2012.08.004

Wang X., Peng Z., Kong D., Zhang P., He Y. (2017). Infrared dim target detection based on total variation regularization and principal component pursuit. Image Vision Computing 63, 1–9. doi: 10.1016/j.imavis.2017.04.002

Wei H., Ma P., Pang D., Li W., Qian J., Guo X. (2022). Weighted local ratio-difference contrast method for detecting an infrared small target against ground–sky background. Remote Sens. 14, 5636. doi: 10.3390/rs14225636

Wei Y., You X., Li H. (2016). Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognition 58, 216–226. doi: 10.1016/j.patcog.2016.04.002

Wu F., Zhang T., Li L., Huang Y., Peng Z. (2024). “Rpcanet: Deep unfolding rpca based infrared small target detection,” in 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). (Waikoloa, HI, USA: IEEE). 4797–4806. doi: 10.1109/WACV57701.2024.00474

Xiong B., Huang X., Wang M. (2021). Local gradient field feature contrast measure for infrared small target detection. IEEE Geosci. Remote Sens. Lett. 18, 553–557. doi: 10.1109/LGRS.2020.2976208

Xu K., Song C., Xie Y., Pan L., Gan X., Huang G. (2024). RMT-YOLOv9s: An infrared small target detection method based on UAV remote sensing images. IEEE Geosci. Remote Sens. Lett. 21, 1–5. doi: 10.1109/LGRS.2024.3484748

Yang P., Dong L., Xu W. (2022). Detecting small infrared maritime targets overwhelmed in heavy waves by weighted multidirectional gradient measure. IEEE Geosci. Remote Sens. Lett. 19, 1–5. doi: 10.1109/LGRS.2021.3080389

Yang W., Shen Z. (1998). Small target detection and preprocessing technology in infrared image sequences. Infrared Laser Eng. 27, 23–28.

Yang L., Yang J., Yang K. (2004). Adaptive detection for infrared small target under sea-sky complex background. Electron. Lett. 40, 1083–1085. doi: 10.1049/el:20045204

Yao H., Liu L., Wei Y., Chen D., Tong M. (2023). Infrared small-target detection using multidirectional local difference measure weighted by entropy. Sustainability 15, 1902. doi: 10.3390/su15031902

Zhang Z., Ding C., Gao Z., Xie C. (2023). ANLPT: Self-adaptive and non-local patch-tensor model for infrared small target detection. Remote Sens. 15, 1021. doi: 10.3390/rs15041021

Zhang X., Ding Q., Luo H., Hui B., Chang Z., Zhang J. (2019). Infrared small target detection based on an image-patch tensor model. Infrared Phys. Technol. 99, 55–63. doi: 10.1016/j.infrared.2019.03.009

Zhang Y., Li Z., Siddique A., Azeem A., Chen W. (2024). A real-time infrared small target detection based on double dilate contrast measure. IEEE J. Selected Topics Appl. Earth Observations Remote Sens. 17, 16005–16019. doi: 10.1109/JSTARS.2024.3421646

Zhang X., Ren K., Wan M., Gu G., Chen Q. (2017). Infrared small target tracking based on sample constrained particle filtering and sparse representation. Infrared Phys. Technol. 87, 72–82. doi: 10.1016/j.infrared.2017.10.003

Zhang P., Zhang L., Wang X., Shen F., Pu T., Fei C. (2021). Edge and corner awareness-based spatial–temporal tensor model for infrared small-target detection. IEEE Trans. Geosci. Remote Sens. 59, 10708–10724. doi: 10.1109/TGRS.2020.3037938

Zhao J., Tang Z., Yang J., Liu E. (2011). Infrared small target detection using sparse representation. J. Syst. Eng. Electron. 22, 897–904. doi: 10.3969/j.issn.1004-4132.2011.06.004

Zhou D., Wang X. (2023). Research on high robust infrared small target detection method in complex background. IEEE Geosci. Remote Sens. Lett. 20, 1–5. doi: 10.1109/LGRS.2023.3297523

Zhu H., Liu S., Deng L., Li Y., Xiao F. (2020a). Infrared small target detection via low-rank tensor completion with top-hat regularization. IEEE Trans. Geosci. Remote Sens. 58, 1004–1016. doi: 10.1109/TGRS.2019.2942384

Zhu H., Ni H., Liu S., Xu G., Deng L. (2020b). TNLRS: Target-aware non-local low-rank modeling with saliency filtering regularization for infrared small target detection. IEEE Trans. Image Process. 29, 9546–9558. doi: 10.1109/TIP.2020.3028457

Keywords: remote sensing, infrared small dim target detection, relative global contrast measure, local contrast measure, hybrid contrast measure, sparse and low rank decomposition

Citation: Han J, Moradi S, Wang W, Li N, Zhao Q and Luo Z (2025) A hybrid contrast method for infrared small dim target detection. Front. Mar. Sci. 12:1584345. doi: 10.3389/fmars.2025.1584345

Received: 27 February 2025; Accepted: 09 April 2025;
Published: 08 May 2025.

Edited by:

Yimian Dai, Nanjing University of Science and Technology, China

Reviewed by:

Fei Zhou, Henan University of Technology, China
Renke Kou, Air Force Engineering University, China

Copyright © 2025 Han, Moradi, Wang, Li, Zhao and Luo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jinhui Han, hanjinhui@zknu.edu.cn
