Real-world low-light image enhancement via domain-gap aware framework and reverse domain-distance guided strategy

Low-light image enhancement (LLIE) has high practical value and development potential in real scenarios. However, current LLIE methods generalize poorly to real-world low-light (LL) conditions of poor visibility. We attribute this phenomenon to the severe domain bias between the synthetic LL domain and the real-world LL domain. In this article, we put forward the Domain-Gap Aware Framework, a novel two-stage framework for real-world LLIE, which, as far as we know, is the first work to introduce domain adaptation into LLIE. More specifically, in the first stage, to eliminate the domain bias between the existing synthetic LL domain and the real-world LL domain, this work leverages the source domain images via adversarial training. By doing so, we align the distribution of the synthetic LL domain to the real-world LL domain. In the second stage, we put forward the Reverse Domain-Distance Guided (RDDG) strategy, which takes full advantage of the domain-distance map obtained in the first stage and guides the network to be more attentive to the regions that are not compliant with the real-world distribution. This strategy makes the network robust to input LL images, some areas of which may have large relative domain distances to the real world. Numerous experiments demonstrate the efficacy and generalization capacity of the proposed method. We hope this analysis can boost the development of low-light research in different fields.


Introduction
Real-world LLIE aims to reconstruct normal-light (NL) images from observations acquired under low-light conditions with low visibility and poor details. Numerous deep-learning approaches [1][2][3][4][5], which benefit from a powerful capability to learn features [6][7][8][9][10], have been proposed. For the LLIE task [11][12][13], they recover the visibility and precise details of low-illumination images by learning the relationship between LL and NL images. As the effectiveness of deep learning methods depends on the dataset, some methods collect bursts of images with multiple exposure levels captured in real scenarios for real-world LLIE applications [14,15]. However, since collecting large-scale paired datasets is incredibly laborious and expensive [16], the existing paired datasets are usually small, which may cause overfitting when networks are trained on them. Therefore, some methods enlarge the scale of datasets by synthesizing low-illumination images and forming paired datasets with normal-illumination images [17,18]. However, the synthetic LL images are usually not compliant with the real-world distribution, leading to poor real-world generalization for the LLIE methods trained on these datasets [19]. Specifically, the illumination level cannot be improved sufficiently to recover details, or the white balance cannot be maintained correctly. Therefore, it is a worthwhile but challenging task to generate enhancement results that match the real-world distribution.
Unsupervised methods are of high practical value and development potential because they do not require paired datasets captured in the same static scenarios [20,21]. They implement LLIE tasks by taking full advantage of unpaired real-world NL images and LL images. To align the distribution of enhanced NL images to the unpaired NL domain, existing methods usually apply adversarial training directly to the enhanced results against the real-world NL images. Further, to ensure that all regions in the enhanced images are close to the real ones, EnlightenGAN [22] crops image patches randomly from the enhanced images and adopts adversarial training against the real NL image patches. However, these methods focus only on the enhancement procedure and seldom notice that the severe domain gap of the input LL images may impede the enhancement performance, which degrades the generalization performance of the networks trained on synthetic datasets. Moreover, randomly comparing image patches does not guarantee that all regions of the enhanced images match the real-world distribution.
Over recent years, researchers have extensively proposed ways to address the shortage of labeled training data. In domain adaptation (DA) methods, the labeled data enable adequate training in the source domain, and the trained model is then adapted to perform tasks on an unlabeled target domain with a different distribution [23]. DA greatly improves the effectiveness of methods on the target domain, which makes it appropriate for real-world LLIE tasks.
In this article, by comprehensively reviewing the potential and reaping the full benefits of alternative methods, we put forward a two-stage framework with the merit of both adversarial learning and domain adaptive methods. Specifically, we propose the Domain-Gap Aware Framework to implement real-world LLIE tasks, which addresses the issue that the input LL images deviate from the real-world distribution.
As shown in Figure 1, a noticeable domain gap between the real-world and the synthetic LL domain can be observed. Besides, different areas in a single low-light image may have different relative domain distances. We find that the domain gap severely degrades the generalization competency of the network to real-world low-light conditions. Therefore, unlike existing methods that ignore the domain bias of synthetic LL images, we propose the Domain-Gap Aware Framework. Specifically, in the first stage, we impose adversarial training on the Darkening Network to eliminate the severe domain gap and generate realistic pseudo-LL images. By doing so, we obtain pseudo-LL images that are consistent with the real-world distribution, as well as domain-distance maps. In the second stage, we propose the Reverse Domain-Distance Guided strategy to capitalize fully on the domain-distance maps and mitigate the unrealistic areas of pseudo-LL images. In detail, during training we assign higher weights to the regions in the generated NL images that are relatively far from the real-world domain and smaller weights to the realistic regions, thus mitigating the weak enhancement performance in real-world scenarios caused by unrealistic input LL patches. The proposed two-stage framework generalizes well to the real world with boosted illumination level and clearly reconstructed structural details, which can significantly facilitate subsequent computer vision tasks and systems [24] focusing on objects at nighttime.
The following are the key contributions of this article:
• We put forward the Domain-Gap Aware Framework to address the domain-gap issue and generate pseudo-low-light images consistent with the real-world distribution, which is essential to attain models with high generalization capability for real-world LLIE.
• A Reverse Domain-Distance Guided strategy is proposed for real-world applications. The pixel-wise domain distance maps are fully exploited to further promote the robustness of the Enlightening Network.
It is worth pointing out that, as far as we know, this is the first work to introduce DA to LLIE.
The remainder of this paper is structured as follows. In Section 2, we present a brief review of some related works in the LLIE field. Section 3 introduces the proposed framework and strategy. Section 4 shows experimental results to demonstrate the effect of our method, and Section 5 gives a conclusion of the paper.
Related work

CNN-based approaches
CNN-based approaches have become a principal method in the LLIE field owing to their high efficiency in image analysis [25,26]. They reconstruct the contrast and structural details of LL images by learning the mapping relationship between LL-NL images. Some methods have collected paired data in real scenarios [27]. However, it is difficult to construct large-scale paired datasets due to the required high cost and heavy workforce. Since the applications of deep learning methods are usually hampered by shortages of paired training data [28], some methods have also made attempts to construct simulated datasets [17,18,29]. It is widely known that the training data are essential for the networks' performance [30]. However, the synthetic datasets were generated under the assumption of simple degradation in terms of illumination level, noise, etc., which leads to poor generalization of the trained networks to the real world and to side effects, e.g., color distortion and insufficiently improved illumination.
Real-world LLIE has attracted significant research attention due to its high practical value. Researchers have been extensively designing diverse architectures to achieve better generalization to the real world. In EnlightenGAN [22], the design philosophy is to address the domain gap issue by applying adversarial training using unpaired datasets. In addition, researchers have also made efforts toward zero-shot LLIE. Zero-DCE [31] regards LLIE as a task of curve estimation for image-wise dynamic range adjustment. However, these methods pay little attention to the domain gap between the to-be-enhanced LL images and the real-world ones and only focus on the enhanced NL images, which degrades the generalization performance of the networks trained on synthetic datasets.

Domain adaptation
Domain adaptation (DA) aims to enhance performance on a new target domain despite domain bias [32]. It is beneficial for dealing with data shortages in tasks for which real data are difficult to obtain.
In this work, we concentrate on eliminating the domain gap to synthesize realistic LL images, which is a preparation phase for the enlightening stage. Inspired by relevant studies in super-resolution [33], we construct the Domain-Gap Aware Framework to perform real-world LLIE. We apply DA to improve the performance of LLIE on real data.
In the next sections, we introduce the proposed Domain-Gap Aware Framework and Reverse Domain-Distance Guided strategy in detail.

Network architecture
Given two domains, which can be described as the LL domain and the NL domain, our goal is to learn an Enlightening Network that improves the visibility and reconstructs the structural details of the images in the LL domain while generating enhanced NL estimations belonging to the real-world NL domain. To achieve this objective, we propose the Domain-Gap Aware Framework. We do not follow previous work that directly utilizes the existing synthetic low-illumination datasets to train the Enlightening Network. Instead, our framework takes the domain bias between the generated LL images $x^g$ and the real-world LL images $x^r$ into full account. As shown in Figure 2, during the first stage, we train the Darkening Network using adversarial training, which generates pseudo-LL images belonging to the real-world LL domain as well as domain distance maps. Then, in the second stage, we put forward the Reverse Domain-Distance Guided strategy, which leverages the pseudo-LL-NL image pairs and domain distance maps to train the Enlightening Network.
In the next subsection, we first describe how to train the Darkening Network to generate LL-NL image pairs in line with real-world distributions. Then, we show the Reverse Domain-Distance Guided strategy.

Training of darkening network
The general procedure of synthesizing low-light images by existing methods is manually adjusting the illumination and adding noise [17,18]. However, the illumination levels in the real world are diverse and may also vary spatially in a single image. Moreover, it is difficult to represent noise with a simple and known distribution. In short, the degradations assumed by existing methods are too simple to fully simulate the complex degradation in the real world, which unfortunately leads to domain bias between the synthesized LL images and the real-world ones. In contrast, our approach employs a deep network (i.e., the Darkening Network) to learn the real-world degradation process. It works as the generator in the whole framework and extracts the features of NL images using eight blocks (each layer is convolved by a 3 × 3 kernel and activated by a ReLU activation in between).
Frontiers in Physics frontiersin.org

Losses
We employ multiple loss functions to train the Darkening Network. To ensure that the content of the pseudo-LL images stays consistent with the GT (Ground-Truth) LL images, we adopt a content loss along with a perceptual loss to minimize the distance between them at the image level and the feature level, respectively. In detail, the content loss consists of a reconstruction loss, which is the L1-norm, and an SSIM (Structural SIMilarity Index) [34] loss, which measures structural similarities between two images. We adopt the L1-norm as our reconstruction loss because it treats all errors equally, so that training keeps progressing even when the error is tiny. The perceptual loss, which is also widely used in the image reconstruction field, measures the distance between features extracted via deep neural networks.
We show the adopted loss functions above, where Φ(·) denotes the convolutional layers of the conv5_3 of VGG-16 [35], and SSIM(·,·) means the SSIM score between two input images.
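As a hedged reconstruction from the prose description only, the three terms can be sketched as follows. The symbols $G$ (the Darkening Network), $y$ (the NL input), and $x$ (the GT LL image) are our assumptions, as is the squared L2 distance in the perceptual term; $\Phi(\cdot)$ and $\mathrm{SSIM}(\cdot,\cdot)$ are used as defined in the text.

```latex
\begin{aligned}
\mathcal{L}_{rec}  &= \lVert G(y) - x \rVert_1, \\
\mathcal{L}_{ssim} &= 1 - \mathrm{SSIM}\big(G(y),\, x\big), \\
\mathcal{L}_{per}  &= \lVert \Phi(G(y)) - \Phi(x) \rVert_2^2 .
\end{aligned}
```

Under this sketch, the content loss is the combination of $\mathcal{L}_{rec}$ and $\mathcal{L}_{ssim}$, while $\mathcal{L}_{per}$ compares the two images in the VGG-16 feature space.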
In addition to the above losses, to address the domain gap issue and align the distribution of the pseudo-LL images to the real world, the pseudo-LL images are trained against the real-world LL images by adversarial training. Specifically, we adopt a strategy similar to DASR [33], which uses a patch discriminator with four fully convolutional layers to determine whether each image patch matches the real-world distribution. This strategy drives the pseudo-LL images to fit the real-world distribution.
The loss functions are shown above, where D(·) denotes the patch discriminator.
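A minimal sketch of such an adversarial objective is given below, assuming the standard (non-saturating) GAN formulation; the text does not state which GAN variant is used, so this choice, together with the symbols $G$ and $y$ for the Darkening Network and its NL input, is an assumption. $D(\cdot)$ outputs a map of per-patch probabilities of belonging to the real-world LL domain, and the expectations also average over patches.

```latex
\begin{aligned}
\mathcal{L}_{adv} &= -\,\mathbb{E}_{y}\big[\log D(G(y))\big], \\
\mathcal{L}_{D}   &= -\,\mathbb{E}_{x^{r}}\big[\log D(x^{r})\big]
                     - \mathbb{E}_{y}\big[\log\big(1 - D(G(y))\big)\big].
\end{aligned}
```

Here $\mathcal{L}_{adv}$ is minimized by the Darkening Network and $\mathcal{L}_{D}$ by the patch discriminator, with $x^{r}$ denoting real-world LL images.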

Reverse domain-distance guided strategy
As shown in Figure 1 previously, the regions in a generated LL image may lie at different distances from the real-world domain, i.e., some regions lie relatively close to the real-world domain, while others are relatively far. Since the regions relatively far from the real-world domain may degrade the enhancement competency of the network, we should endow different regions with different levels of attention. We realize this concept by first reversing the domain distance maps and then applying them to eliminate the discrepancy between $y^g_i$ and $x^r_i$. Thereby, we adaptively adjust the loss functions by assigning diverse weight parameters to these regions. We present the Reverse Domain-Distance Guided strategy in Figure 3.

Losses
We denote the supervised losses as follows, where $\omega_i$ denotes the domain distance map for $x^g_i$, which is obtained from the patch Discriminator trained in the first stage.
The trained patch Discriminator can differentiate between the pseudo-LL patches and those from the real-world domain. A smaller value in $\omega_i$ means a lower probability that the pseudo-LL patches belong to the real-world domain. It also indicates a higher value in the reverse of $\omega_i$, i.e., $1 - \omega_i$, and a larger domain distance from the pseudo-LL to the real-world domain. Therefore, we guide the network to be attentive to the enhanced outcomes of the input pseudo-LL patches relatively far from the real-world domain by endowing distance-related importance to different areas. The Reverse Domain-Distance Guided strategy makes full use of domain distance to remedy the unrealistic areas of pseudo-LL images and further improves the generalization to the real world.
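The weighting idea can be illustrated with a minimal sketch, assuming an L1 supervised term and a domain distance map that has already been resized to the image resolution with values in [0, 1]. The function name and the nested-list image representation are hypothetical and chosen only to keep the example self-contained; this is not the authors' implementation.

```python
def reverse_distance_weighted_l1(enhanced, target, omega):
    """Mean of (1 - omega) * |enhanced - target| over all pixels.

    enhanced, target: 2D nested lists of pixel intensities (same shape).
    omega: 2D nested list with the discriminator's probability that each
           location belongs to the real-world LL domain (assumed in [0, 1]).
    Locations with small omega (far from the real-world domain) receive
    large weights 1 - omega, as described in the text.
    """
    total = 0.0
    count = 0
    for e_row, t_row, w_row in zip(enhanced, target, omega):
        for e, t, w in zip(e_row, t_row, w_row):
            if not 0.0 <= w <= 1.0:
                raise ValueError("omega values must lie in [0, 1]")
            total += (1.0 - w) * abs(e - t)
            count += 1
    if count == 0:
        raise ValueError("inputs must not be empty")
    return total / count


# Two pixels with the same error; only the unrealistic one (omega = 0) counts.
loss = reverse_distance_weighted_l1([[0.5, 0.5]], [[1.0, 1.0]], [[1.0, 0.0]])
```

In this toy case the first pixel is judged fully realistic and contributes nothing, so the error of the second pixel alone determines the loss.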
To evaluate the proposed method, we describe experimental settings and results in detail in the next section.

Experiments
Since the similarity with the ground-truth NL images reflects the quality of the enhanced result to a large extent, we adopt PSNR and SSIM [34] as reference metrics, two widely adopted quality metrics in the image restoration field. In addition, as our method adopts a generative adversarial network, we also focus on perceptual quality. Therefore, we also adopt LPIPS (Learned Perceptual Image Patch Similarity) [36] as a quality metric. We carry out diverse ablation studies to determine the effect of the proposed strategies in our framework. Then, to assess our method's generalization competency, a real-world LL dataset is used as the testing set. Finally, we make comparisons with other competing LLIE approaches by applying them to real-world LL datasets.
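For reference, PSNR follows the standard definition 10 · log10(MAX² / MSE). The sketch below computes it for two equally sized single-channel images given as nested lists; the function name and the assumed 8-bit peak value of 255 are illustrative choices, not details taken from the experiments.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if not ref or len(ref) != len(est):
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


score = psnr([[100, 100]], [[110, 90]])  # MSE = 100
```

Higher PSNR means the enhanced image is closer to the ground truth, whereas for LPIPS a smaller value indicates higher similarity.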

Training settings
The MIT-Adobe FiveK dataset [37] consists of 5,000 photos, each retouched by five experts to adjust the global tone. It has been widely leveraged in the LLIE field. We applied GladNet [38] to the normal-light images retouched by the MIT-Adobe FiveK dataset's Expert E to obtain synthetic LL images. We separate 4,000 paired NL-LL images from it for training and 1,000 paired images for validation. Then we resize the images to 600 × 400 resolution. Besides, we adopt the DARK FACE dataset (4,000 for training, 1,000 for validation) [39], which consists of 6,000 images obtained under real-world nighttime conditions, as the real-world low-light references.
Let us now turn our attention to the main framework. The network weights are initialized randomly. The Adam method (with momentum and weight decay set to 0.9 and 0.001, respectively) is adopted to update the network's parameters. Besides, the learning rate is initially set to 0.0001 and is then halved every ten epochs. During the whole training procedure, we maintain a batch size of 16. We carry out all the evaluation experiments on an NVIDIA GeForce RTX 3090 and an NVIDIA GeForce GTX 1080 Ti with PyTorch.
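The step-wise learning-rate schedule described above can be written down as a small sketch. The function name and the zero-based epoch indexing are assumptions; in PyTorch, an equivalent schedule would typically be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)`.

```python
def learning_rate(epoch, base_lr=1e-4, factor=0.5, step=10):
    """Learning rate at a zero-based epoch: base_lr halved every `step` epochs."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * factor ** (epoch // step)


# Epochs 0-9 use 1e-4, epochs 10-19 use 5e-5, epochs 20-29 use 2.5e-5, ...
schedule = [learning_rate(e) for e in (0, 9, 10, 25)]
```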

Ablation studies
Before conducting comparison experiments with recently competing methods, we carried out a variety of ablation experiments to delve into diverse loss functions as well as the proposed framework.

Effect of loss functions
We carry out a variety of trials with diverse loss functions and report the quantitative outcomes on the widely adopted metrics, i.e., PSNR and SSIM, along with LPIPS. During the computation of LPIPS, we extract features of the input images through AlexNet [40] to calculate the distance between them. A small LPIPS value means a high similarity. Table 1 displays the quantitative outcomes.
Firstly, let us analyze the effect of each loss function. The 3rd, 4th, and 5th rows show that, compared with supervision by the reconstruction loss and adversarial loss only, adding the SSIM loss improves PSNR, SSIM, and LPIPS by 3.871 dB, 0.0654, and 0.1177, respectively, and adding the perceptual loss improves the three metrics by 3.19 dB, 0.0195, and 0.0338, respectively. This indicates the effectiveness of both the SSIM loss and the perceptual loss in reconstructing texture and details of contents. Secondly, the 1st and 2nd rows show that the best performance is achieved when using merely the content loss, which includes the reconstruction loss and the SSIM loss. Note that the content loss aims at reducing the distance between the generated images and the GT images. Therefore, training with it equals supervised learning, which easily achieves better performance than adversarial training. Nevertheless, our method, which contains adversarial learning for fitting real-world low-light image distributions, achieves quantitative results similar to those of supervised learning. Specifically, our method ranks second on PSNR and SSIM and third on LPIPS, with differences of only 0.028 dB, 0.0054, and 0.0075 from the top-ranked scores.

FIGURE 3
The proposed Reverse Domain-Distance Guided strategy. It drives the Enlightening Network to be more attentive to the less realistic regions of the input LL images.
As shown qualitatively in Figure 4, our method generates images with sufficiently low exposure levels and correct white balance. Therefore, we can conclude that the LL images generated by our method keep contents consistent with LL images from the existing dataset but are closer to the ones captured under poor light conditions. We finally chose $\omega_{col} = 1$, $\omega_{ssim} = 1$, $\omega_{per} = 0.02$, and $\omega_{adv} = 0.02$ as the weight parameters of the loss functions.
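How these weights combine the individual terms can be sketched as a weighted sum. The dictionary keys mirror the subscripts of the weight parameters; the function name and the example loss values are hypothetical and serve only to illustrate the arithmetic.

```python
# Weights reported in the text, keyed by the subscripts of the weight parameters.
LOSS_WEIGHTS = {"col": 1.0, "ssim": 1.0, "per": 0.02, "adv": 0.02}


def total_darkening_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of the individual loss terms for the Darkening Network."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical per-term values for a single training step.
example = total_darkening_loss({"col": 0.5, "ssim": 0.25, "per": 1.0, "adv": 2.0})
```

With these weights, the perceptual and adversarial terms act as small corrections on top of the image-level terms.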

Effect of the darkening network

Comparisons of LL images
We adopt a generative adversarial network to generate pseudo-LL images so that they are close to LL images from the existing dataset in terms of contents while complying with the distribution of real-world LL images. Figure 4 contrasts the LL images of the existing dataset built on MIT-Adobe FiveK [37] with the pseudo-LL images synthesized by our method. The 2nd row of panels (A) and (B) in Figure 4 shows that the white balance of several images in the existing low-light dataset is incorrect: white areas of the original NL images appear orange. This may lead to a color shift in the enhanced images, which is further confirmed in Figure 5. Besides, the 2nd row of Figure 4B suggests that the illumination level is not low enough to simulate night lighting conditions. In contrast, the proposed Darkening Network maintains the correct white balance and decreases the illumination level sufficiently in the LL versions, as displayed in the 3rd row of panels (A) and (B) in Figure 4, which helps the Enlightening Network generalize better to real-world low-light conditions.

Comparisons of enhancement results
Furthermore, the effect of the proposed Darkening Network on the enhancement results is investigated in this subsection. We train the Enlightening Network using both the pseudo-LL-NL pairs $(x^g, y^r)$ and the existing paired dataset $(x^r, y^r)$, and compare the outcomes qualitatively and quantitatively. Figure 5 shows qualitative comparisons. As shown in Figure 5A, the enhanced outcomes of the existing LL dataset suffer from color shift. This is because the input LL images have imperfect white balance, as shown in Figure 4 previously. Besides, as shown in Figure 5C, over-exposure appears easily, which unfortunately prevents some regions (e.g., dark-colored regions such as hair, ribbon, and skin) from retaining their semantic darkness. This is due to the insufficiently low illumination level in existing LL images. Moreover, the enhanced results of the backlit image (in the 5th row of Figure 5C) suffer severely from artifacts. In contrast, as shown in Figures 5B, D, the enhanced results of pseudo-LL images have correct white balance and appropriate exposure level with good preservation of semantic information, and far fewer artifacts are introduced to backlit images. Therefore, we confirm that the Enlightening Network produces superior enhancement results in collaboration with the Darkening Network, which fully reflects the effect of the Darkening Network.
Next, we display quantitative comparison results in Table 2.
We can clearly find that training with the pseudo-LL-NL pairs $(x^g, y^r)$ achieves better scores on PSNR, SSIM, and LPIPS, exceeding the GT LL-NL pairs $(x^r, y^r)$ to a great extent. Training with $(x^g, y^r)$ achieves considerably higher scores on PSNR and SSIM than training with $(x^r, y^r)$, both with and without the Reverse Domain-Distance Guided strategy. More specifically, as shown in columns 2 and 4, training with $(x^g, y^r)$ exceeds training with $(x^r, y^r)$ by 16.525 dB (= 35.276 - 18.751 dB) on PSNR and by 0.1022 (= 0.9696 - 0.8674) on SSIM. This demonstrates that the synthesized pseudo-LL-NL pairs are more suitable for real-world LLIE than the GT LL-NL pairs from the existing dataset. The reason is that the proposed Darkening Network addresses the domain gap via adversarial training so that the pseudo-LL images match the real-world distribution in terms of exposure level, hue, noise, and so on. In contrast, the existing procedure of synthesizing low-light images assumes simple degradations from NL images, which are far from the complex degradations of the real world. Therefore, we confirm that by adequately taking advantage of target domain data during the training process, the proposed Darkening Network makes a significant contribution to the improvement of enhancement quality.
Recons., Adv., and Percept. indicate reconstruction loss function, adversarial loss, and perceptual loss, respectively. Scores marked in bold indicate the highest scores on the corresponding metric.

Effect of reverse domain-distance guided strategy
To verify the effectiveness of the Reverse Domain-Distance Guided strategy, we conduct ablation studies with the settings of training with $(x^g, y^r)$ and $(x^r, y^r)$. Table 2 and Figure 6 show quantitative and qualitative results, respectively. Adopting the Reverse Domain-Distance Guided strategy improves PSNR by 1.981 dB (= 37.257 - 35.276 dB) and SSIM by 0.0035 (= 0.9731 - 0.9696) when training with $(x^g, y^r)$. The reason is that the domain distances between the pseudo-LL images $x^g$ and the real-world LL images $x^r$ are fully exploited at the enhancement stage. Specifically, the Enlightening Network is driven to emphasize the regions that are not in compliance with the real world by allocating greater weight to them during the training process. Therefore, combining the Reverse Domain-Distance Guided strategy with the proposed Darkening Network is beneficial to the reconstruction of texture and details with the pseudo-LL-NL pairs $(x^g, y^r)$.
Let us then investigate the qualitative effect of the Reverse Domain-Distance Guided strategy. Figure 6 shows that semantically dark regions maintain their semantic darkness while the illumination level is improved, without under-exposure in other regions, which makes the images appear more realistic. Therefore, the proposed Reverse Domain-Distance Guided strategy is significant for the generalization of LLIE to the real world.

Evaluations of generalization on the real-world dataset
The Exclusively Dark dataset [41] was proposed to facilitate better detection under poor visibility conditions for nighttime systems and applications. It contains a total of 7,363 images of 12 specified object categories. Some images were sub-sampled from existing large-scale datasets, including Microsoft COCO, PASCAL VOC, and ImageNet. We carry out evaluations of the generalization capacity on the Exclusively Dark dataset and the DARK FACE dataset. Figure 7 shows the results.
The scores marked in bold indicate the highest scores on the corresponding metric.

FIGURE 6
Qualitative comparison of the enhancement results without and with the Reverse Domain-Distance Guided strategy. The texture and details are better reconstructed with the proposed strategy.

FIGURE 7
Evaluation of generalization capability on real-world datasets, i.e., (A) Exclusively Dark dataset and (B) DARK FACE dataset. In both (A,B) panels, the 1st row indicates input LL images, and the 2nd row shows enhanced results by the proposed framework. Our method has superior generalization capability to extremely low-illumination real-world conditions.

Comparative experiments with state-of-the-art methods
Let us conduct comparative experiments with recent competing LLIE approaches on the DARK FACE dataset [39]. Figure 8 [48]. Qualitative results show that all the methods can effectively enhance, in terms of illumination level, the LL images captured in severe real-world low-light nighttime environments. However, some methods introduce side effects. Specifically, the overall hue of the image is distorted in EnlightenGAN. Besides, DRBN, TBEFN, and MBLLEN introduce distinct artifacts to local areas. DSLR, RRDNet, and our method attain the three best performances. Let us further investigate their differences in detail. DSLR and RRDNet tend to generate blur artifacts, i.e., structural details cannot be clearly reconstructed. Besides, RRDNet cannot sufficiently improve the illumination level. In contrast, our method improves visibility without introducing blurriness and shows a better reconstruction of details, as shown in the zoomed-in comparison results. Therefore, we can confirm that our approach works most effectively relative to other recently competing LLIE methodologies.

FIGURE 8
Qualitative enhancement outcomes of recently competing network structures and our framework on the DARK FACE dataset. We present the results of SOTA methods on two specified images. We further compare the three most competitive methods, marked with boxes, by zooming in on the local area of their enhanced results. Our method achieves superior enhancement results for real-world LL images compared with SOTA methods with respect to structural details and visibility.

Finally, we give a conclusion in Section 5.

Conclusion
This paper introduces domain adaptation to the LLIE field. Unlike previous methods that directly adopt existing synthetic low-light datasets, we propose the Domain-Gap Aware Framework, which addresses the domain gap between the pseudo-LL and real-world LL domains. To eliminate the domain gap, we apply adversarial training to the Darkening Network in the first stage and obtain domain distance maps. In the second stage, we put forward a Reverse Domain-Distance Guided (RDDG) strategy, which further guides the Enlightening Network to be attentive to the regions that are not consistent with the real-world distribution. We validate the effect of our framework on real-world LL datasets and conduct comparative experiments with other methods. The experimental outcomes show that our framework outperforms other competing network structures.
In our future endeavors, we will explore more contributory approaches for the LLIE field. In addition, we will introduce LLIE methods to subsequent computer vision tasks and systems for diverse applications, such as driving assistant systems and nighttime surveillance.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://flyywh.github.io/CVPRW2019LowLight/, https://github.com/weichen582/GLADNet.

Author contributions
YC and MH contributed to putting forward core conceptions and design of the study; YC provided the computing platform; HL, KS, and JZ organized the database; YC and MH performed the statistical analysis and wrote the first draft of the manuscript; YC contributed to manuscript revision, read, and approved the submitted version.