<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neuroinform.</journal-id>
<journal-title>Frontiers in Neuroinformatics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neuroinform.</abbrev-journal-title>
<issn pub-type="epub">1662-5196</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fninf.2022.953235</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Multi illumination color constancy based on multi-scale supervision and single-scale estimation cascade convolution neural network</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wang</surname> <given-names>Fei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1824492/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Wei</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wu</surname> <given-names>Dan</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Gao</surname> <given-names>Guowang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Zetian</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Electronic Engineering, Xi&#x00027;an Shiyou University</institution>, <addr-line>Xi&#x00027;an</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Hunan University</institution>, <addr-line>Hunan</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>School of Telecommunications Engineering, Xidian University</institution>, <addr-line>Xi&#x00027;an</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Ludovico Minati, Tokyo Institute of Technology, Japan</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Kannimuthu Subramanian, Karpagam Academy of Higher Education, India; Shaobing Gao, Sichuan University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Fei Wang <email>200102&#x00040;xsyu.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>12</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>16</volume>
<elocation-id>953235</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>05</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>10</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Wang, Wang, Wu, Gao and Wang.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Wang, Wang, Wu, Gao and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Color constancy methods are generally based on a simplifying assumption that the spectral distribution of a light source is uniform across scenes. However, in reality, this assumption is often violated because of the presence of multiple light sources, that is, two or more illuminations. In this paper, we propose a unique cascade network of deep multi-scale supervision and single-scale estimation (CN-DMS4) to estimate multi-illumination. The network parameters are supervised and learned from coarse to fine in the training process, and only the final, finest-level illumination map is estimated in the illumination estimation process. Furthermore, to reduce the influence of the color channel on the Euclidean distance or the pixel-level angle error, a new loss function with a channel penalty term is designed to optimize the network parameters. Extensive experiments are conducted on single- and multi-illumination benchmark datasets. In comparison with previous multi-illumination estimation methods, our proposed method shows improvement on several quantitative metrics and in visual effect, suggesting a direction for future research in end-to-end multi-illumination estimation.</p>
</abstract>
<kwd-group>
<kwd>color constancy</kwd>
<kwd>multi-illumination</kwd>
<kwd>convolution neural network</kwd>
<kwd>cascade</kwd>
<kwd>multi-scale</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="6"/>
<equation-count count="7"/>
<ref-count count="55"/>
<page-count count="13"/>
<word-count count="6844"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>With the rapid proliferation of digital imaging and digital video, accurately recording the constant color of a scene from a device-captured image is extremely important for many practical applications, ranging from color-based object recognition and tracking to quality control of textiles (Funt et al., <xref ref-type="bibr" rid="B20">1999</xref>; Vrhel et al., <xref ref-type="bibr" rid="B47">2005</xref>; Gao et al., <xref ref-type="bibr" rid="B24">2017</xref>, <xref ref-type="bibr" rid="B25">2019</xref>). The color of an object is influenced by the illumination: the observed color of an object in an image (its values in RGB space) depends on both the object&#x00027;s intrinsic color and the light-source color (Ebner, <xref ref-type="bibr" rid="B15">2007</xref>).</p>
<p>Human color constancy (HCC) is a perceptual phenomenon that stabilizes the appearance of an object&#x00027;s colors across changes in illumination. One possible ecological justification for color constancy in mammals is that it facilitates the recognition of objects in a scene (Kraft and Brainard, <xref ref-type="bibr" rid="B34">1999</xref>; Smithson, <xref ref-type="bibr" rid="B46">2005</xref>). In Helmholtz&#x00027;s words: &#x0201C;Colors are mainly important for us as properties of objects and as means of identifying objects.&#x0201D; A mechanism that preserves the color appearance of objects therefore serves this purpose. As a perceptual phenomenon, all variables affecting color constancy lie in the content of the perceived scene, e.g., scene chromaticity, three-dimensional information, and object movement; these factors are called visual cues (Jameson, <xref ref-type="bibr" rid="B31">1989</xref>; Roca-Vila et al., <xref ref-type="bibr" rid="B40">2009</xref>). Numerous tests of human perception of colored surfaces indicate a high level of perceptual constancy, in which the appearance of a surface changes relatively little. However, endowing a computer with the same ability is difficult (Gilchrist, <xref ref-type="bibr" rid="B27">2006</xref>; Roca-Vila et al., <xref ref-type="bibr" rid="B40">2009</xref>). To help a computer solve this problem, the central task is to estimate the real object&#x00027;s color coordinates in some color space, which is called computational color constancy (CCC).</p>
<p>Previous methods have mostly been limited to a single-illumination assumption. However, in reality, most scenes contain more than one illuminant. In multi-illumination scenes, unlike single-illumination scenes, each pixel in an image may be influenced by a different light source. For example, in an image with shadows, there are at least two illuminants, because the light color differs between shadowed regions and regions in direct sunlight. Therefore, research on multi-illumination color constancy (MCC) has greater practical significance.</p>
<p>However, fewer studies have been conducted on MCC than on single illumination. This is mainly because it is difficult to obtain datasets for multiple lighting conditions, especially for lighting colors requiring manual calibration of pixel-level accuracy.</p>
<p>As with single illumination, MCC methods can be classified into optimization- and learning-based methods.</p>
<p><bold>Optimization-based methods:</bold> Land et al. first proposed the Retinex model (Brainard and Wandell, <xref ref-type="bibr" rid="B12">1986</xref>; Land, <xref ref-type="bibr" rid="B36">1986</xref>; Funt et al., <xref ref-type="bibr" rid="B21">2004</xref>), which is the earliest theoretical model that can deal with the MCC problem. This theory is based on a series of psychological and physical experiments. The early purpose was not to estimate the illumination under multiple illumination conditions but to restore the relative reflectivity of objects in a scene. Barnard et al. (<xref ref-type="bibr" rid="B5">1997</xref>) proposed a model to deal with the MCC problem by detecting the change of illumination color in the scene. The model is patch-based and estimates the illumination of an image patch through single-illumination color constancy. Xiong and Funt (<xref ref-type="bibr" rid="B49">2006</xref>) used a diffusion technique in which a large-scale convolution kernel is used to filter the color-biased images in complex scenes. It is assumed that the images after convolution meet the local gray-world assumption. Although this method has achieved good results, it only uses simple convolution kernels that are easily affected by the real color of the object itself. For example, part of the obtained illumination map is the color of the object itself, rather than the illumination color.</p>
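<p>The local gray-world idea behind such large-kernel filtering can be illustrated with a short numpy sketch, in which block averaging stands in for the large convolution kernel (the function name and block size are ours, for illustration only):</p>

```python
import numpy as np

def local_grayworld_illuminant(image, block=8):
    # Within each block, the mean color is taken as the local illuminant
    # estimate (local gray-world assumption); `block` must divide both
    # image dimensions.
    h, w, _ = image.shape
    means = image.reshape(h // block, block, w // block, block, 3).mean(axis=(1, 3))
    # upsample the per-block estimates back to pixel resolution
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)

# a uniformly reddish image: the method attributes the cast to the light,
# even when it is actually the color of the objects themselves
img = np.ones((16, 16, 3)) * np.array([0.8, 0.5, 0.2])
est = local_grayworld_illuminant(img)
```

<p>The sketch also exposes the weakness noted above: a large, uniformly colored object is indistinguishable from a color cast of the illuminant under this assumption.</p>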
<p><bold>Learning-based methods:</bold> Like other data mining tasks, these methods learn useful information from large amounts of data (Barnard et al., <xref ref-type="bibr" rid="B6">2010</xref>; Kannimuthu et al., <xref ref-type="bibr" rid="B32">2012</xref>; Arunkumar et al., <xref ref-type="bibr" rid="B3">2019</xref>). Shi et al. (<xref ref-type="bibr" rid="B42">2016</xref>) and Bianco et al. (<xref ref-type="bibr" rid="B9">2017</xref>) used patch-based convolutional neural networks (CNNs) to estimate a single illumination for each patch. By inputting each patch into the network, the local illumination of all patches can be obtained. Afifi and Brown (<xref ref-type="bibr" rid="B1">2020</xref>) proposed an end-to-end approach to learning the correct white balance, which consists of a single encoder and multiple decoders, mapping an input image into two additional white-balance settings corresponding to indoor and outdoor illuminations. This method can also be used in multi-illumination estimation; however, our experiments show that it is very time-consuming.</p>
<p>The abovementioned multi-illumination and single-illumination estimation methods have achieved good performance on some multi-illumination datasets. However, these methods may not find the optimal solution in some complex situations owing to their inflexibility. In summary, these approaches still leave some open problems, which fall into two broad aspects:</p>
<list list-type="bullet">
<list-item><p>Many of these methods (Xiong and Funt, <xref ref-type="bibr" rid="B49">2006</xref>; Zeng et al., <xref ref-type="bibr" rid="B52">2011</xref>; Mutimbu and Robles-Kelly, <xref ref-type="bibr" rid="B37">2016</xref>) work by clustering the illumination of local regions. However, clustering itself is difficult: if the illumination distribution in the scene is scattered, accurate illumination is hard to obtain. The selection of region size is also a key problem; an inappropriate region size reduces the accuracy of illumination estimation. Moreover, these methods rely on traditional illumination-estimation assumptions, and if a region does not satisfy the assumption, the corresponding regional illumination estimate may be erroneous.</p></list-item>
<list-item><p>Most existing CNN-based single-illumination estimation methods used for multi-illumination estimation are time-consuming (Barron, <xref ref-type="bibr" rid="B7">2015</xref>; Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>; Bianco et al., <xref ref-type="bibr" rid="B9">2017</xref>) when adopting the local image patches for estimation.</p></list-item>
</list>
<p>In recent years, CNNs have been widely used, especially the fully convolutional networks for image pixel classification (Shelhamer et al., <xref ref-type="bibr" rid="B41">2014</xref>; Yu and Koltun, <xref ref-type="bibr" rid="B51">2015</xref>; Badrinarayanan et al., <xref ref-type="bibr" rid="B4">2017</xref>) and image depth estimation (Eigen et al., <xref ref-type="bibr" rid="B17">2014</xref>; Eigen and Fergus, <xref ref-type="bibr" rid="B16">2015</xref>), to improve the estimation accuracy to a new level. In multi-illumination estimation, the pixel-level illumination is estimated from the original color-biased image, which is consistent with the image segmentation scene and depth estimation scene (Eigen et al., <xref ref-type="bibr" rid="B17">2014</xref>; Shelhamer et al., <xref ref-type="bibr" rid="B41">2014</xref>; Yu and Koltun, <xref ref-type="bibr" rid="B51">2015</xref>; Badrinarayanan et al., <xref ref-type="bibr" rid="B4">2017</xref>).</p>
<p>In this paper, we propose a cascade network of deep multiscale supervision and single-scale estimation (CN-DMS4) to estimate multi-illumination<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>. During training, the parameters are learned from coarse to fine across different scales. In the test phase, only the illumination map of the finest level is estimated.</p>
<p>The CN-DMS4 network differs from existing methods, and provides two contributions:</p>
<list list-type="bullet">
<list-item><p>Multiscale supervision and single-scale estimation. The network is an end-to-end cascaded structure; the network parameters are supervised and learned from coarse to fine during the training process. Only the final, finest-level illumination map is estimated in the illumination estimation process.</p></list-item>
<list-item><p>A new loss function with a channel penalty term is designed to optimize the network parameters, which reduces the influence of the color channels on the Euclidean distance or pixel-level angular error.</p></list-item>
</list>
<p>The remainder of this paper is organized as follows. In Section 2, the structure of the proposed network and training strategy are presented. The experimental results are provided in Section 3. The conclusion is given in Section 4.</p>
</sec>
<sec id="s2">
<title>2. Multi-scale supervision and single-scale estimation in a cascade convolutional neural network</title>
<p>We follow the widely accepted simplified diagonal model (Finlayson et al., <xref ref-type="bibr" rid="B18">1994</xref>; Funt and Lewis, <xref ref-type="bibr" rid="B22">2000</xref>). For multi-illumination estimation, we modify the diagonal model as follows:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where (<italic>x, y</italic>) is the spatial position in the image, <italic>I</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) represents the image under unknown illumination, <italic>E</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) represents the illumination image, and <italic>R</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) represents the image under standard illumination.</p>
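<p>Equation (1) acts independently per pixel and per color channel, so both rendering under a spatially varying illuminant and correcting for it are element-wise operations. A minimal numpy sketch (the function names are ours, for illustration):</p>

```python
import numpy as np

def apply_illumination(reflectance, illumination):
    # Equation (1): I_c(x, y) = E_c(x, y) * R_c(x, y), c in {r, g, b}
    return illumination * reflectance

def remove_illumination(image, illumination, eps=1e-8):
    # inverting Equation (1) once E_c has been estimated recovers the
    # image under standard illumination
    return image / (illumination + eps)

# toy scene: white reflectance, reddish light on the left column
R = np.ones((2, 2, 3))
E = np.ones((2, 2, 3))
E[:, 0, :] = [1.0, 0.5, 0.5]
I = apply_illumination(R, E)
recovered = remove_illumination(I, E)
```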
<sec>
<title>2.1. Problem formulation</title>
<p>As in single-illumination estimation, we only know the image <italic>I</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) under an unknown light source <italic>E</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>), which needs to be estimated. The goal of multi-illumination estimation is to estimate <italic>E</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) from <italic>I</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>); the image under standard illumination can then be recovered as <italic>R</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) &#x0003D; <italic>I</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>)/<italic>E</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>). We formulate the estimation of <italic>E</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) from <italic>I</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) as a regression problem. In recent years, a color-space model, <italic>log</italic>&#x02212;<italic>uv</italic>, has been used in color constancy methods (Finlayson et al., <xref ref-type="bibr" rid="B19">2004</xref>; Barron, <xref ref-type="bibr" rid="B7">2015</xref>; Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>) and has certain advantages<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>. The calculation is as follows:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>G</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi><mml:mo>/</mml:mo><mml:mi>G</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>After the illumination is estimated, it can be converted back to <italic>RGB</italic> space through a simple formula:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mo class="qopname">exp</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mo class="qopname">exp</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where (<italic>L</italic><sub><italic>u</italic></sub>, <italic>L</italic><sub><italic>v</italic></sub>) is the image in the <italic>log</italic>&#x02212;<italic>uv</italic> color space, and (<italic>R, G, B</italic>) is the image in the <italic>RGB</italic> color space.</p>
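<p>Equations (2) and (3) can be sketched in numpy. One hedge: since Equation (2) defines <italic>L</italic><sub><italic>u</italic></sub> &#x0003D; <italic>log</italic>(<italic>R</italic>/<italic>G</italic>), an exact round trip requires <italic>exp</italic>(<italic>L</italic><sub><italic>u</italic></sub>) in the inverse; the <italic>exp</italic>(&#x02212;<italic>L</italic><sub><italic>u</italic></sub>) in Equation (3) corresponds to the opposite sign convention <italic>L</italic><sub><italic>u</italic></sub> &#x0003D; <italic>log</italic>(<italic>G</italic>/<italic>R</italic>), so the sketch below uses the positive sign for consistency with Equation (2):</p>

```python
import numpy as np

def rgb_to_loguv(rgb, eps=1e-8):
    # Equation (2): L_u = log(R/G), L_v = log(B/G)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([np.log((R + eps) / (G + eps)),
                     np.log((B + eps) / (G + eps))], axis=-1)

def loguv_to_rgb(luv):
    # inverse map onto a unit-norm RGB triple; exp(+L_u) makes this the
    # exact inverse of rgb_to_loguv above (see the sign note in the text)
    r, b = np.exp(luv[..., 0]), np.exp(luv[..., 1])
    z = np.sqrt(r ** 2 + b ** 2 + 1.0)
    return np.stack([r / z, 1.0 / z, b / z], axis=-1)

rgb = np.array([[[0.6, 0.3, 0.1]]])
back = loguv_to_rgb(rgb_to_loguv(rgb))
```

<p>The round trip recovers the chromaticity only up to an overall scale, since the log-uv representation discards intensity.</p>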
<p>In this study, we first convert the RGB image <italic>I</italic><sub><italic>c</italic></sub>(<italic>x, y</italic>) to the log-uv image <italic>I</italic><sub><italic>uv</italic></sub>(<italic>x, y</italic>) &#x0003D; (<italic>I</italic><sub><italic>u</italic></sub>(<italic>x, y</italic>), <italic>I</italic><sub><italic>v</italic></sub>(<italic>x, y</italic>)). Our goal is to find a mapping <italic>f</italic><sub>&#x003B8;</sub>, such that <italic>f</italic><sub>&#x003B8;</sub>(<italic>I</italic><sub><italic>uv</italic></sub>) &#x0003D; <italic>E</italic><sub><italic>uv</italic></sub>(<italic>x, y</italic>), where <italic>E</italic><sub><italic>uv</italic></sub>(<italic>x, y</italic>) represents the illumination value at each (<italic>x, y</italic>) in the <italic>log</italic>&#x02212;<italic>uv</italic> space; <italic>E</italic><sub><italic>uv</italic></sub>(<italic>x, y</italic>) should be as close as possible to the real light at position (<italic>x, y</italic>). In this paper, we define <italic>f</italic><sub>&#x003B8;</sub> as a CNN model that is optimized over the parameters &#x003B8;.</p>
<p>Based on the semantic segmentation model, we divide the network into encoding and decoding parts. The encoding part performs feature extraction, and the decoding part remaps these features back to the image. We define the encoding process by Equation (4) and the decoding process by Equation (5):</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>D</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C8;<sub>1</sub> represents the network of the encoding process, &#x003B8;<sub>1</sub> indicates the parameters to be optimized in the encoding part, &#x003C8;<sub>2</sub> represents the decoding process, and &#x003B8;<sub>2</sub> indicates the parameters to be optimized in the decoding part.</p>
<p>In addition, we draw on the idea of Mutimbu and Robles-Kelly (<xref ref-type="bibr" rid="B37">2016</xref>), who use a factor graph defined across the scale space of the input image and estimate the multi-illumination at multiple scales from fine to coarse (i.e., the image becomes increasingly blurred); the pixelwise illuminant can then be viewed as the geometric mean of the illuminants across all scales. In this paper, we also use a multiscale network to improve the estimation accuracy. The difference is that our method supervises and learns the parameters from coarse to fine (i.e., the image becomes increasingly clear).</p>
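<p>Under the factor-graph formulation cited above, fusing per-scale estimates by a pixelwise geometric mean is straightforward; a numpy sketch (the function name is ours, and all maps are assumed to have been upsampled to the finest resolution beforehand):</p>

```python
import numpy as np

def fuse_scales_geometric(illuminant_maps, eps=1e-8):
    # pixelwise geometric mean across scale-space illuminant estimates:
    # exp of the mean of the logs, clipped away from zero for stability
    logs = [np.log(np.clip(e, eps, None)) for e in illuminant_maps]
    return np.exp(np.mean(logs, axis=0))

coarse = np.full((2, 2, 3), 2.0)
fine = np.full((2, 2, 3), 8.0)
fused = fuse_scales_geometric([coarse, fine])  # geometric mean of 2 and 8
```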
</sec>
<sec>
<title>2.2. Network architecture</title>
<p>As introduced in the previous section, it is necessary to design a network structure that includes an encoding part &#x003C8;<sub>1</sub> and decoding part &#x003C8;<sub>2</sub>. The network structure is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The network structure of CN-DMS4.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fninf-16-953235-g0001.tif"/>
</fig>
<p><bold>Encoding part of the network</bold>. In <xref ref-type="fig" rid="F1">Figure 1</xref>, the encoding part corresponds to &#x003C8;<sub>1</sub> in Equation 4. The encoding part extracts features, which are then input to the decoding part to estimate the illumination. For this part, we also tried AlexNet (Krizhevsky et al., <xref ref-type="bibr" rid="B35">2017</xref>), VGGNet-16 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B45">2014</xref>), and VGGNet-19 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B45">2014</xref>), but the results showed little difference. Finally, we used a structure adapted from AlexNet (Krizhevsky et al., <xref ref-type="bibr" rid="B35">2017</xref>) containing five convolutional layers. We removed all pooling layers and replaced them with strided convolutions. All layers use 3 &#x000D7; 3 convolution kernels, and the stride of every convolution is set to 2.</p>
<p><bold>Decoding part of the network</bold>. In <xref ref-type="fig" rid="F1">Figure 1</xref>, the decoding part corresponds to &#x003C8;<sub>2</sub> in Equation 5. The decoding part reconstructs the pixel-level illumination. Conv6, conv7, conv8, and conv9 use 1 &#x000D7; 1 convolution kernels to reduce the dimensionality, while the others use 3 &#x000D7; 3 convolution kernels with a stride of 2. In the training phase, in addition to <italic>E</italic><sub><italic>c</italic></sub>, the intermediate maps <inline-formula><mml:math id="M6"><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;and&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> also participate in supervised learning. In the illumination estimation stage, either illumination maps at different scales can be obtained, or only the final, finest illumination map; in the latter case, the part marked by the green box in <xref ref-type="fig" rid="F1">Figure 1</xref> does not participate in the calculation.</p>
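<p>With five 3 &#x000D7; 3, stride-2 convolutions and no pooling, each encoder stage halves the spatial resolution. A small sketch of the convolution arithmetic (the 256 &#x000D7; 256 input resolution and padding of 1 are illustrative assumptions, not values from the paper):</p>

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    # standard convolution output size: floor((n + 2p - k) / s) + 1
    return (size + 2 * pad - kernel) // stride + 1

sizes = [256]                  # hypothetical input resolution
for _ in range(5):             # conv1 .. conv5 of the encoder
    sizes.append(conv_out(sizes[-1]))
# sizes -> [256, 128, 64, 32, 16, 8]
```

<p>Each stride-2 layer thus plays the downsampling role that a pooling layer would otherwise play, while still learning its own filter weights.</p>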
</sec>
<sec>
<title>2.3. Loss function</title>
<p>Our goal is to train a mapping function that generates an illumination image <italic>E</italic>(<italic>u, v</italic>) close to the ground-truth illumination image <italic>E</italic><sub><italic>t</italic></sub>(<italic>u, v</italic>). Instead of minimizing the mean squared error between <italic>E</italic>(<italic>u, v</italic>) and <italic>E</italic><sub><italic>t</italic></sub>(<italic>u, v</italic>) at each scale, we propose a variant of the L1 loss. The overall loss function is defined as:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mtext>_</mml:mtext><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mtext>_</mml:mtext><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M8"><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msqrt></mml:math></inline-formula>, <italic>N</italic> indicates the number of samples in each batch, <italic>S</italic> indicates the number of scales in the cascade, <inline-formula><mml:math id="M9"><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M10"><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represent the illumination in log-uv space estimated by the model at the <italic>j</italic>-th scale, <inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mstyle class="text"><mml:mtext>_</mml:mtext></mml:mstyle><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M12"><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mstyle class="text"><mml:mtext>_</mml:mtext></mml:mstyle><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represent the corresponding ground truth at the <italic>j</italic>-th scale, and &#x003B5; is empirically set to 0.001.</p>
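As an illustrative sketch (not the authors' released code), the loss in Equation (6) — a Charbonnier-style smoothed L1 applied per scale and averaged over the batch — could be written as follows. The per-pixel mean inside each term is our own assumption, since the equation leaves the pixel reduction implicit:

```python
import numpy as np

def omega(x, eps=1e-3):
    # Charbonnier-style smoothed L1 penalty: sqrt(x^2 + eps^2).
    return np.sqrt(x ** 2 + eps ** 2)

def multiscale_loss(est_u, est_v, gt_u, gt_v, eps=1e-3):
    """Loss of Equation (6): est_u[j][i] / est_v[j][i] hold the log-uv
    illumination estimated for sample i at cascade scale j; gt_u / gt_v
    hold the matching ground truth at the same scale."""
    n = len(est_u[0])                    # batch size N
    total = 0.0
    for j in range(len(est_u)):          # sum over the S cascade scales
        for i in range(n):               # sum over the batch
            inner = omega(est_u[j][i] - gt_u[j][i], eps) \
                  + omega(est_v[j][i] - gt_v[j][i], eps)
            total += float(np.mean(omega(inner, eps)))  # assumed per-pixel mean
    return total / n
```

Note that even a perfect estimate yields a small positive floor of sqrt((2&#x003B5;)&#x000B2; + &#x003B5;&#x000B2;), a known property of Charbonnier penalties.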
</sec>
</sec>
<sec id="s3">
<title>3. Experimental results</title>
<sec>
<title>3.1. Datasets</title>
<p>Only a few public multi-illumination datasets exist, and each contains a limited number of images, whereas network training requires much more data. Following the approach of the dissertation by Gao (<xref ref-type="bibr" rid="B23">2017</xref>), we use the single-illumination datasets Color Checker (Gehler et al., <xref ref-type="bibr" rid="B26">2008</xref>) and NUS 8-Camera (Cheng et al., <xref ref-type="bibr" rid="B13">2014</xref>) to render a large number of multi-illumination images.</p>
<p>The rendering process is as follows. First, the images are corrected to standard white light according to the illumination provided by the datasets. Next, multiple spatial positions are randomly generated on each image, and 3&#x02013;8 different lighting colors are simulated, as shown in <xref ref-type="fig" rid="F2">Figure 2A</xref> (the boundaries are blurred).</p>
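A rough sketch of this rendering step is given below; the sampling ranges and the inverse-distance soft blending are our own assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def render_multi_illuminant(white_img, sigma=40.0, seed=None):
    """Simulate a multi-illuminant image from a white-balanced H x W x 3 one.
    Random light centres get random RGB colors; inverse-distance weights
    blend them so that the illumination boundaries stay blurred."""
    rng = np.random.default_rng(seed)
    h, w, _ = white_img.shape
    k = int(rng.integers(3, 9))                      # 3-8 simulated lights
    centres = rng.uniform([0, 0], [h, w], size=(k, 2))   # random spatial positions
    colors = rng.uniform(0.4, 1.0, size=(k, 3))      # assumed light-color range
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys[..., None] - centres[:, 0]) ** 2 + (xs[..., None] - centres[:, 1]) ** 2
    wgt = 1.0 / (d2 + sigma ** 2)                    # soft (blurred) assignment
    wgt /= wgt.sum(axis=-1, keepdims=True)           # normalize to a convex blend
    illum = wgt @ colors                             # per-pixel illumination image
    return white_img * illum, illum
```

Because the weights form a convex combination, every pixel of the illumination image stays inside the range of the sampled light colors.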
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Images under multiple illuminations. From left to right: <bold>(A)</bold> Synthetic images; <bold>(B)</bold> Images from Gijsenij dataset: <ext-link ext-link-type="uri" xlink:href="http://www.colorconstancy.com/wp-content/uploads/2014/10/multiple_light_sources_dataset.zip">http://www.colorconstancy.com/wp-content/uploads/2014/10/multiple_light_sources_dataset.zip</ext-link> Reproduced with permission from Arjan et al. (<xref ref-type="bibr" rid="B2">2012</xref>); <bold>(C)</bold> Images from MIMO dataset available at: <ext-link ext-link-type="uri" xlink:href="http://www5.cs.fau.de/research/data/two-illuminant-dataset-with-computed-ground-truth/">http://www5.cs.fau.de/research/data/two-illuminant-dataset-with-computed-ground-truth/</ext-link>. Reproduced with persmission from Beigpour et al. (<xref ref-type="bibr" rid="B43">2014</xref>); <bold>(D)</bold> Images from Bleier dataset available at: <ext-link ext-link-type="uri" xlink:href="http://www5.cs.fau.de/research/data/multi-illuminant-dataset/index.html">http://www5.cs.fau.de/research/data/multi-illuminant-dataset/index.html</ext-link>. Reproduced with permission from Bleier et al. (<xref ref-type="bibr" rid="B10">2011</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fninf-16-953235-g0002.tif"/>
</fig>
<p>In addition, the following multi-illumination datasets collected in real scenes are used. The Gijsenij dataset (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>) is a multi-illumination dataset collected in natural scenes; it includes 59 indoor and 9 outdoor multi-illumination images and their corresponding illuminations. <xref ref-type="fig" rid="F2">Figure 2B</xref> shows an indoor and an outdoor image from this database.</p>
<p>The multiple-input multiple-output (MIMO) dataset (Beigpour et al., <xref ref-type="bibr" rid="B43">2014</xref>) was established by Beigpour et al., which contains 57 indoor images and 21 outdoor images; it provides pixel-level illumination images. <xref ref-type="fig" rid="F2">Figure 2C</xref> shows an indoor and an outdoor image from this database.</p>
<p>The Bleier dataset (Bleier et al., <xref ref-type="bibr" rid="B10">2011</xref>) was collected and established by Bleier et al. It contains 36 high-quality images and the corresponding illumination images, obtained under nine different illuminations in four scenes. <xref ref-type="fig" rid="F2">Figure 2D</xref> shows two images from the database.</p>
<p>To enable the model to be used for single-illumination estimation, we added a single-illumination dataset, the SFU Grayball dataset (Ciurea and Funt, <xref ref-type="bibr" rid="B14">2003</xref>).</p>
<p>In addition, we augment the data by horizontal and vertical mirroring, by rotating at 90<sup><italic>o</italic></sup> and 180<sup><italic>o</italic></sup>, and by rotating from &#x02212;60<sup><italic>o</italic></sup> to 60<sup><italic>o</italic></sup> in steps of five degrees. At the same time, we scale the data by factors in [0.6, 1.5], obtaining a total of 14,500 real-scene training images. We selected 5,000 images from the real multi-illumination datasets, 4,000 from the dataset that we constructed, 3,000 from the SFU Grayball dataset (Ciurea and Funt, <xref ref-type="bibr" rid="B14">2003</xref>), and 2,500 from shadow-removal datasets (Zhu et al., <xref ref-type="bibr" rid="B55">2010</xref>; Gong and Cosker, <xref ref-type="bibr" rid="B28">2014</xref>; Sidorov, <xref ref-type="bibr" rid="B44">2019</xref>). Finally, we resized these images to 512 &#x000D7; 512 as the input of the training network. As in most learning-based tasks, we used 3-fold cross-validation.</p>
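The mirroring and right-angle rotations above can be sketched with plain array operations; this is a partial illustration only, since the five-degree rotations in [&#x02212;60, 60] and the [0.6, 1.5] rescaling would additionally require an interpolating image library:

```python
import numpy as np

def mirror_rotate_variants(img):
    """Return the mirror and right-angle rotation variants of an
    H x W x C image used for augmentation."""
    return [
        img,
        np.flip(img, axis=1),   # horizontal mirror
        np.flip(img, axis=0),   # vertical mirror
        np.rot90(img, k=1),     # rotate 90 degrees
        np.rot90(img, k=2),     # rotate 180 degrees
    ]
```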
</sec>
<sec>
<title>3.2. Metrics</title>
<p>Similar to color constancy under single illumination, we use the angular error to measure the performance of our MCC method. The difference is that we calculate the angular error pixel by pixel and then average it over the whole image. The angular error is defined in Equation (7):</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo class="qopname">arccos</mml:mo><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo stretchy="false">&#x0007C;</mml:mo><mml:mo stretchy="false">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">&#x0007C;</mml:mo><mml:mo stretchy="false">&#x0007C;</mml:mo><mml:mo>.</mml:mo><mml:mo stretchy="false">&#x0007C;</mml:mo><mml:mo stretchy="false">&#x0007C;</mml:mo><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo stretchy="false">&#x0007C;</mml:mo><mml:mo stretchy="false">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>E</italic><sub><italic>e</italic></sub>(<italic>x, y</italic>) and <inline-formula><mml:math id="M14"><mml:msubsup><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> represent the estimated and the ground-truth illumination at position (<italic>x, y</italic>), respectively, and <italic>M</italic> and <italic>N</italic> represent the width and height of the image. The smaller the <italic>err</italic>, the better the method performs.</p>
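A minimal numpy sketch of this metric follows; reporting in degrees is an assumption, matching the magnitudes in the tables below:

```python
import numpy as np

def mean_angular_error(est, gt, eps=1e-12):
    """Equation (7): per-pixel angle between estimated and ground-truth
    illumination vectors, averaged over the whole H x W x 3 image."""
    dot = (est * gt).sum(axis=-1)
    norms = np.linalg.norm(est, axis=-1) * np.linalg.norm(gt, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)  # keep arccos in its domain
    return float(np.degrees(np.arccos(cos)).mean())
```

The metric is scale-invariant: multiplying either illumination image by a constant leaves the angle unchanged, so only chromaticity is compared.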
<p>Similar to previous multi-illumination estimation studies (Brainard and Wandell, <xref ref-type="bibr" rid="B12">1986</xref>; Land, <xref ref-type="bibr" rid="B36">1986</xref>; Barnard et al., <xref ref-type="bibr" rid="B5">1997</xref>; Funt et al., <xref ref-type="bibr" rid="B21">2004</xref>; Xiong and Funt, <xref ref-type="bibr" rid="B49">2006</xref>; Zeng et al., <xref ref-type="bibr" rid="B52">2011</xref>; Mutimbu and Robles-Kelly, <xref ref-type="bibr" rid="B37">2016</xref>), we compare only the <italic>mean</italic> and <italic>median</italic> on multi-illumination datasets.</p>
</sec>
<sec>
<title>3.3. Implementation parameters</title>
<p>In this subsection, the parameter sets for training our final model are given.</p>
<p><bold>Encoding network selection:</bold> Different network structures, such as AlexNet (Krizhevsky et al., <xref ref-type="bibr" rid="B35">2017</xref>), VGGNet-16 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B45">2014</xref>), and VGGNet-19 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B45">2014</xref>), were tested for performance. The network we designed (modified from AlexNet; Krizhevsky et al., <xref ref-type="bibr" rid="B35">2017</xref>) is slightly less accurate than VGGNet-19 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B45">2014</xref>), but runs more than 4 times faster than both AlexNet (Krizhevsky et al., <xref ref-type="bibr" rid="B35">2017</xref>) and VGGNet-19 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B45">2014</xref>). Considering both effect and efficiency, the structure in <xref ref-type="fig" rid="F1">Figure 1</xref> is used in this study.</p>
<p><bold>Decoding network selection:</bold> The decoder is equivalent to a feature-reconstruction process. The backbone structure we used is symmetrical to the encoding network. We tested different resize stages and compared their performance; the resulting curves are shown in <xref ref-type="fig" rid="F3">Figures 3A</xref>&#x02013;<xref ref-type="fig" rid="F3">C</xref>. Considering both effect and efficiency, the decoding structure shown in <xref ref-type="fig" rid="F1">Figure 1</xref> is used in this study. The curves also show that, because the number of deconvolution layers does not grow with the decoder resize level, the time consumed by illumination estimation remains essentially constant.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Performance curves under different parameters. <bold>(A)</bold> Comparison of training time of different decoding scales; <bold>(B)</bold> Comparison of average angular errors of different decoding scales; <bold>(C)</bold> Comparison of average time consumption of illumination estimation at different decoding scales; <bold>(D)</bold> Comparison of training curves of different loss functions.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fninf-16-953235-g0003.tif"/>
</fig>
<p><bold>Loss function selection:</bold> During training, the angular-error loss and the loss function proposed in this study are compared; the resulting curves are shown in <xref ref-type="fig" rid="F3">Figure 3D</xref>. The proposed loss converges faster than the angular error, and its training error is smoother. At the same time, the average test error on several datasets is slightly lower than that obtained with the angular-error loss.</p>
<p><bold>Training parameters:</bold> We used Adam (Kingma and Ba, <xref ref-type="bibr" rid="B33">2014</xref>) with <italic>batch</italic> &#x0003D; 64 to optimize the network. The learning rate was set to 0.0001. Approximately 4,000 epochs (906,250 iterations in total at <italic>batch</italic> &#x0003D; 64) were performed.</p>
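For reference, a single Adam update with the learning rate used here can be sketched as follows; this is the generic algorithm of Kingma and Ba, not the authors' training loop:

```python
import numpy as np

def adam_step(p, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam parameter update with the paper's learning rate (1e-4).
    m and v are the running first/second-moment estimates; t is the
    1-based step counter used for bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```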
</sec>
<sec>
<title>3.4. Comparison with state-of-the-art methods</title>
<p>This study addresses multi-illumination estimation. We compare the proposed method with existing MCC methods and with methods that can estimate local illumination, grouped into the following three types.</p>
<p>One type consists of methods for which segmentation is not required, such as gray pixel (GP) (Yang et al., <xref ref-type="bibr" rid="B50">2015</xref>), and a retinal neuron mechanism-based method proposed by Zhang et al. (<xref ref-type="bibr" rid="B53">2016</xref>).</p>
<p>The second type requires image segmentation, including the method of Arjan et al. (<xref ref-type="bibr" rid="B2">2012</xref>), the multi-illumination model proposed by Gu et al. (<xref ref-type="bibr" rid="B29">2014</xref>), and a multi-illumination estimation model based on the factor graph (FG) model (Mutimbu and Robles-Kelly, <xref ref-type="bibr" rid="B37">2016</xref>).</p>
<p>The third type, developed in recent years, comprises single-illumination estimation methods based on CNNs, including CC-CNN (Bianco et al., <xref ref-type="bibr" rid="B8">2015</xref>), DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>), and the grayness index (GI) (Qian et al., <xref ref-type="bibr" rid="B39">2019</xref>).</p>
<p>The quantitative performance comparison on the Gijsenij dataset (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>) is presented in <xref ref-type="table" rid="T1">Table 1</xref>, the results on the MIMO dataset (Beigpour et al., <xref ref-type="bibr" rid="B43">2014</xref>) in <xref ref-type="table" rid="T2">Table 2</xref>, and those on the Bleier dataset (Bleier et al., <xref ref-type="bibr" rid="B10">2011</xref>) in <xref ref-type="table" rid="T3">Table 3</xref>. Some results are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Quantitative evaluation on the Gijsenij dataset (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>), red indicates the best.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Gijsenij dataset</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Lab</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Outdoor</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Retinex (Funt et al., <xref ref-type="bibr" rid="B21">2004</xref>)</td>
<td valign="top" align="center">13.15</td>
<td valign="top" align="center">13.16</td>
<td valign="top" align="center">6.62</td>
<td valign="top" align="center">7.25</td>
</tr>
<tr>
<td valign="top" align="left">Zhang (Zhang et al., <xref ref-type="bibr" rid="B53">2016</xref>)</td>
<td valign="top" align="center">14.64</td>
<td valign="top" align="center">14.48</td>
<td valign="top" align="center">8.45</td>
<td valign="top" align="center">8.23</td>
</tr>
<tr>
<td valign="top" align="left">GIJ-GW (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">11.7</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">6.4</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">GIJ-GE2 (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">12.4</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">5.1</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">GU-GE1 (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center">3.25</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">3.26</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">GU-WP (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center">2.97</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">3.20</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">FG (Mutimbu and Robles-Kelly, <xref ref-type="bibr" rid="B37">2016</xref>)</td>
<td valign="top" align="center">2.68</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">3.10</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">CC-CNN (Bianco et al., <xref ref-type="bibr" rid="B8">2015</xref>)</td>
<td valign="top" align="center">5.71</td>
<td valign="top" align="center">5.97</td>
<td valign="top" align="center">3.92</td>
<td valign="top" align="center">4.26</td>
</tr>
<tr>
<td valign="top" align="left">DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>)</td>
<td valign="top" align="center">3.76</td>
<td valign="top" align="center">4.13</td>
<td valign="top" align="center">4.60</td>
<td valign="top" align="center">4.80</td>
</tr>
<tr>
<td valign="top" align="left">CN-DMS4</td>
<td valign="top" align="center"><inline-formula><mml:math id="M15"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>2.51</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M16"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>2.58</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M17"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>2.39</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M18"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>2.41</mml:mn></mml:mstyle></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Quantitative evaluation on the MIMO dataset (Beigpour et al., <xref ref-type="bibr" rid="B43">2014</xref>), red indicates the best.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>MIMO dataset</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Lab</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Outdoor</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Retinex (Funt et al., <xref ref-type="bibr" rid="B21">2004</xref>)</td>
<td valign="top" align="center">4.92</td>
<td valign="top" align="center">5.36</td>
<td valign="top" align="center">4.69</td>
<td valign="top" align="center">5.84</td>
</tr>
<tr>
<td valign="top" align="left">Zhang (Zhang et al., <xref ref-type="bibr" rid="B53">2016</xref>)</td>
<td valign="top" align="center">2.71</td>
<td valign="top" align="center">3.21</td>
<td valign="top" align="center">4.35</td>
<td valign="top" align="center">5.18</td>
</tr>
<tr>
<td valign="top" align="left">GIJ-WP (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">4.2</td>
<td valign="top" align="center">5.1</td>
<td valign="top" align="center">3.8</td>
<td valign="top" align="center">4.2</td>
</tr>
<tr>
<td valign="top" align="left">GIJ-GE1 (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">4.2</td>
<td valign="top" align="center">4.8</td>
<td valign="top" align="center">9.2</td>
<td valign="top" align="center">9.1</td>
</tr>
<tr>
<td valign="top" align="left">GU-GE1 (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center">3.16</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">3.54</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">GU-GW (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center">3.86</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">4.43</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">FG (Mutimbu and Robles-Kelly, <xref ref-type="bibr" rid="B37">2016</xref>)</td>
<td valign="top" align="center">2.96</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">3.48</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">CC-CNN (Bianco et al., <xref ref-type="bibr" rid="B8">2015</xref>)</td>
<td valign="top" align="center">2.98</td>
<td valign="top" align="center">3.22</td>
<td valign="top" align="center">3.35</td>
<td valign="top" align="center">3.72</td>
</tr>
<tr>
<td valign="top" align="left">DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>)</td>
<td valign="top" align="center">3.21</td>
<td valign="top" align="center">3.46</td>
<td valign="top" align="center">3.01</td>
<td valign="top" align="center">3.86</td>
</tr>
<tr>
<td valign="top" align="left">CN-DMS4</td>
<td valign="top" align="center"><inline-formula><mml:math id="M19"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>2.50</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M20"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>2.83</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M21"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>2.99</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M22"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>3.33</mml:mn></mml:mstyle></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Quantitative evaluation on Bleier (Bleier et al., <xref ref-type="bibr" rid="B10">2011</xref>), red indicates the best.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Bleier dataset</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Lab</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Retinex (Funt et al., <xref ref-type="bibr" rid="B21">2004</xref>)</td>
<td valign="top" align="center">2.68</td>
<td valign="top" align="center">3.40</td>
</tr>
<tr>
<td valign="top" align="left">Zhang (Zhang et al., <xref ref-type="bibr" rid="B53">2016</xref>)</td>
<td valign="top" align="center">3.97</td>
<td valign="top" align="center">4.50</td>
</tr>
<tr>
<td valign="top" align="left">GIJ-GW (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">4.71</td>
<td valign="top" align="center">4.93</td>
</tr>
<tr>
<td valign="top" align="left">GIJ-GE1 (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">14.89</td>
<td valign="top" align="center">14.52</td>
</tr>
<tr>
<td valign="top" align="left">GU-GE1 (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center">3.39</td>
<td valign="top" align="center">3.32</td>
</tr>
<tr>
<td valign="top" align="left">GU-GW (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center"><inline-formula><mml:math id="M23"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>1.18</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M24"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>1.16</mml:mn></mml:mstyle></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">FG (Mutimbu and Robles-Kelly, <xref ref-type="bibr" rid="B37">2016</xref>)</td>
<td valign="top" align="center">2.90</td>
<td valign="top" align="center">2.95</td>
</tr>
<tr>
<td valign="top" align="left">CC-CNN (Bianco et al., <xref ref-type="bibr" rid="B8">2015</xref>)</td>
<td valign="top" align="center">3.32</td>
<td valign="top" align="center">3.51</td>
</tr>
<tr>
<td valign="top" align="left">DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>)</td>
<td valign="top" align="center">3.10</td>
<td valign="top" align="center">3.46</td>
</tr>
<tr>
<td valign="top" align="left">CN-DMS4</td>
<td valign="top" align="center">2.54</td>
<td valign="top" align="center">2.61</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Qualitative results on MIMO multi-illumination datasets (Beigpour et al., <xref ref-type="bibr" rid="B43">2014</xref>), the top right-hand corner of each image indicates the angle error. From left to right: <bold>(A)</bold> Original image; <bold>(B)</bold> Ground truth illumination image; <bold>(C)</bold> Estimated illumination image; <bold>(D)</bold> Corrected image; <bold>(E)</bold> Ground truth image. Dataset available at: <ext-link ext-link-type="uri" xlink:href="http://www5.cs.fau.de/research/data/two-illuminant-dataset-with-computed-ground-truth/">http://www5.cs.fau.de/research/data/two-illuminant-dataset-with-computed-ground-truth/</ext-link>. Reproduced with permission from Beigpour et al. (<xref ref-type="bibr" rid="B43">2014</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fninf-16-953235-g0004.tif"/>
</fig>
<p>From <xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T2">2</xref>, it can be seen that, compared with the second-best method, the mean error of the proposed method is reduced by 6.3% on the Gijsenij dataset and by 15.5% on the MIMO dataset.</p>
<p>From the first row of images in <xref ref-type="fig" rid="F4">Figure 4</xref>, we find that the approximate shadow boundary can be accurately distinguished at the illumination shadow boundary. This fineness is achieved because our method works step by step and can therefore estimate the illumination position accurately. In addition, the training datasets contain a large number of synthetic images, and the illumination boundaries of the synthetic color-biased images are very similar to real light and shadow, so our method handles such boundaries well. The images in the second column contain more illumination colors, with almost every pixel of the ground truth differing; no such fine-grained data exist in the training data, so the estimated illumination is consistent only in overall color. Moreover, in many areas the real illumination color in the training datasets is close to the color of the actual object surface, and our method has difficulty distinguishing whether a color comes from the illumination or from the object surface itself. It should be noted, however, that the best existing MCC methods must use gray-world to estimate the light-source color, and gray-world is prone to color deviations of varying degrees caused by the colors of the scene objects themselves. Because high-precision multi-illumination datasets are limited, learning-based methods cannot learn these features well. It can therefore be argued that all known MCC methods share this problem, which may lead to color deviation; solving it with a small number of samples requires further research.</p>
<p>In addition, we downloaded several color-biased images captured under multiple illuminations from the Internet<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. These images are corrected by different MCC methods. Because the real illumination cannot be obtained, the corrected images can only be judged subjectively. Some comparison results are shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. As the first row of the figure shows, these scenes contain a variety of lighting. Visually, the color deviation caused by the different illuminations is partially removed; for example, in the images of the first column, the yellow light of the morning glow masks the green of some trees, and after applying our method, the trees and the sky look more realistic. The images in the second and fourth columns show that, although the other methods also remove part of the illumination, the overall tone still exhibits visible color deviation; with our method, the images remain slightly color-biased but appear more natural.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Qualitative results on natural scenes. The images in columns 1, 2, and 4 are taken from Baidu.com, available at: <ext-link ext-link-type="uri" xlink:href="http://mms2.baidu.com/it/u=3199546478,84333290&#x00026;fm=253&#x00026;app=138&#x00026;f=PNG&#x00026;fmt=auto&#x00026;q=75?w=669&#x00026;h=500">http://mms2.baidu.com/it/u=3199546478,84333290&#x00026;fm=253&#x00026;app=138&#x00026;f=PNG&#x00026;fmt=auto&#x00026;q=75?w=669&#x00026;h=500</ext-link>, <ext-link ext-link-type="uri" xlink:href="http://mms0.baidu.com/it/u=576667012,1565892735&#x00026;fm=253&#x00026;app=138&#x00026;f=JPEG&#x00026;fmt=auto&#x00026;q=75?w=500&#x00026;h=331">http://mms0.baidu.com/it/u=576667012,1565892735&#x00026;fm=253&#x00026;app=138&#x00026;f=JPEG&#x00026;fmt=auto&#x00026;q=75?w=500&#x00026;h=331</ext-link> and <ext-link ext-link-type="uri" xlink:href="http://mms2.baidu.com/it/u=3592193920,2788102915&#x00026;fm=253&#x00026;app=138&#x00026;f=JPEG&#x00026;fmt=auto&#x00026;q=75?w=500&#x00026;h=329">http://mms2.baidu.com/it/u=3592193920,2788102915&#x00026;fm=253&#x00026;app=138&#x00026;f=JPEG&#x00026;fmt=auto&#x00026;q=75?w=500&#x00026;h=329</ext-link>. The image in the third column is from the doctoral thesis (Gao, <xref ref-type="bibr" rid="B23">2017</xref>). For each column, from top to bottom: Original image; <bold>(A)</bold> Result By GP (Yang et al., <xref ref-type="bibr" rid="B50">2015</xref>); <bold>(B)</bold> Result By Retinex (Brainard and Wandell, <xref ref-type="bibr" rid="B12">1986</xref>); <bold>(C)</bold> Result By Zhang (Zhang et al., <xref ref-type="bibr" rid="B53">2016</xref>); <bold>(D)</bold> Our method.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fninf-16-953235-g0005.tif"/>
</fig>
<p>Given the lack of multi-illumination datasets, as an extension we evaluate the proposed method on a tinted multi-illuminant dataset (Sidorov, <xref ref-type="bibr" rid="B44">2019</xref>) synthesized from SFU Gray-Ball (Ciurea and Funt, <xref ref-type="bibr" rid="B14">2003</xref>); this synthesis produces not only multiple lights but also superpositions of multiple lights. Performance is quantitatively compared with that of state-of-the-art methods in <xref ref-type="table" rid="T4">Table 4</xref>, and some images are shown for visual evaluation in <xref ref-type="fig" rid="F6">Figure 6</xref>. The proposed technique outperforms all existing multi-illuminant algorithms. We observed that some images had slightly increased or reduced brightness, although the color cast is removed correctly.</p>
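The median and mean errors reported in the quantitative comparison are angular errors between estimated and ground-truth illumination. A minimal sketch of this metric, computed per pixel over an H × W × 3 illumination map, is given below; the function and array names are illustrative and not taken from the paper's code.

```python
import numpy as np

def angular_error_map(pred, gt, eps=1e-9):
    """Per-pixel angular error (degrees) between two H x W x 3 illumination maps."""
    dot = np.sum(pred * gt, axis=-1)
    norm = np.linalg.norm(pred, axis=-1) * np.linalg.norm(gt, axis=-1)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Identical maps give (numerically) zero error everywhere.
gt = np.random.rand(8, 8, 3) + 0.1
err = angular_error_map(gt, gt)
mean_err, median_err = float(err.mean()), float(np.median(err))
```

Summarizing such a map by its mean and median gives the two columns reported in the tables.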
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Quantitative evaluation on the tinted multi-illuminant dataset (Sidorov, <xref ref-type="bibr" rid="B44">2019</xref>); red indicates the best result.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GIJ-GW (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">6.61</td>
<td valign="top" align="center">10.50</td>
</tr>
<tr>
<td valign="top" align="left">GIJ-GE1 (Arjan et al., <xref ref-type="bibr" rid="B2">2012</xref>)</td>
<td valign="top" align="center">6.70</td>
<td valign="top" align="center">12.10</td>
</tr>
<tr>
<td valign="top" align="left">GU-GE1 (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center">8.14</td>
<td valign="top" align="center">15.56</td>
</tr>
<tr>
<td valign="top" align="left">GU-GW (Gu et al., <xref ref-type="bibr" rid="B29">2014</xref>)</td>
<td valign="top" align="center">5.51</td>
<td valign="top" align="center">9.78</td>
</tr>
<tr>
<td valign="top" align="left">CC-CNN (Bianco et al., <xref ref-type="bibr" rid="B8">2015</xref>)</td>
<td valign="top" align="center">5.64</td>
<td valign="top" align="center">5.88</td>
</tr>
<tr>
<td valign="top" align="left">DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>)</td>
<td valign="top" align="center">6.19</td>
<td valign="top" align="center">7.66</td>
</tr>
<tr>
<td valign="top" align="left">FC4 (Hu et al., <xref ref-type="bibr" rid="B30">2017</xref>)</td>
<td valign="top" align="center">4.27</td>
<td valign="top" align="center">4.89</td>
</tr>
<tr>
<td valign="top" align="left">CN-DMS4</td>
<td valign="top" align="center"><inline-formula><mml:math id="M25"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>3.42</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M26"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>3.71</mml:mn></mml:mstyle></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Results produced by the proposed approach for removing multi-illuminant color casts; the images come from SFU Gray-Ball (Ciurea and Funt, <xref ref-type="bibr" rid="B14">2003</xref>). From left to right: <bold>(A)</bold> Tint maps; <bold>(B)</bold> Synthesized images; <bold>(C)</bold> Predictions; <bold>(D)</bold> Ground truth. Dataset available at: <ext-link ext-link-type="uri" xlink:href="https://www2.cs.sfu.ca/&#x0007E;colour/data/gray_ball/index.html">https://www2.cs.sfu.ca/&#x0007E;colour/data/gray_ball/index.html</ext-link>. Reproduced with permission from Ciurea and Funt (<xref ref-type="bibr" rid="B14">2003</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fninf-16-953235-g0006.tif"/>
</fig>
</sec>
<sec>
<title>3.5. Adaptation for single-illumination</title>
<p>As stated earlier in this paper, the proposed method aims to solve the color constancy problem under multiple illuminants, so we mainly compare it with existing MCC methods and with methods that can estimate local illumination. For the single-illumination case, we added several single-illumination datasets and used the uniform illumination as the illumination map for training. We take the mean value of the estimated illumination map as the estimated illuminant and compare it with three single-illumination methods: DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>), FC4 (Hu et al., <xref ref-type="bibr" rid="B30">2017</xref>), and our previous single-illumination method, MSRWNS (Wang et al., <xref ref-type="bibr" rid="B48">2022</xref>). Quantitative comparisons on the SFU Gray-Ball dataset (Ciurea and Funt, <xref ref-type="bibr" rid="B14">2003</xref>) and the ADE20k dataset (Zhou et al., <xref ref-type="bibr" rid="B54">2016</xref>) are presented in <xref ref-type="table" rid="T5">Tables 5</xref>, <xref ref-type="table" rid="T6">6</xref>, and some results are shown in <xref ref-type="fig" rid="F7">Figure 7</xref>. The proposed method also performs well in single-illuminant estimation, second only to our previous method.</p>
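The single-illumination adaptation described above reduces the per-pixel illumination map to one global estimate by averaging. A minimal sketch, assuming an H × W × 3 map and a unit-norm RGB illuminant (names are illustrative):

```python
import numpy as np

def global_illuminant(illum_map):
    """Collapse an H x W x 3 illumination map to a single normalized RGB
    illuminant by channel-wise averaging over all pixels."""
    v = illum_map.reshape(-1, 3).mean(axis=0)
    return v / np.linalg.norm(v)

# A spatially uniform map recovers its own (normalized) color.
uniform = np.full((16, 16, 3), 2.0)
est = global_illuminant(uniform)
```

Comparing this averaged vector against the single ground-truth illuminant makes the multi-illuminant network directly comparable to single-illuminant methods such as DS-Net and FC4.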
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Quantitative evaluation on SFU Gray-Ball (Ciurea and Funt, <xref ref-type="bibr" rid="B14">2003</xref>); red indicates the best result.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>)</td>
<td valign="top" align="center">0.96</td>
<td valign="top" align="center">2.41</td>
</tr>
<tr>
<td valign="top" align="left">FC4 (Hu et al., <xref ref-type="bibr" rid="B30">2017</xref>)</td>
<td valign="top" align="center">1.12</td>
<td valign="top" align="center">2.33</td>
</tr>
<tr>
<td valign="top" align="left">MSRWNS (Wang et al., <xref ref-type="bibr" rid="B48">2022</xref>)</td>
<td valign="top" align="center"><inline-formula><mml:math id="M27"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>0.82</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M28"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>1.83</mml:mn></mml:mstyle></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">CN-DMS4</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">2.24</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Quantitative evaluation on ADE20k (Zhou et al., <xref ref-type="bibr" rid="B54">2016</xref>); red indicates the best result.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Mean</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>)</td>
<td valign="top" align="center">0.96</td>
<td valign="top" align="center">1.68</td>
</tr>
<tr>
<td valign="top" align="left">FC4 (Hu et al., <xref ref-type="bibr" rid="B30">2017</xref>)</td>
<td valign="top" align="center">1.32</td>
<td valign="top" align="center">1.56</td>
</tr>
<tr>
<td valign="top" align="left">MSRWNS (Wang et al., <xref ref-type="bibr" rid="B48">2022</xref>)</td>
<td valign="top" align="center"><inline-formula><mml:math id="M29"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>0.61</mml:mn></mml:mstyle></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M30"><mml:mstyle class="text" mathcolor="#ed1d23"><mml:mn>1.68</mml:mn></mml:mstyle></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">CN-DMS4</td>
<td valign="top" align="center">1.13</td>
<td valign="top" align="center">0.95</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Results under single illumination; the images come from ADE20k (Zhou et al., <xref ref-type="bibr" rid="B54">2016</xref>). From left to right: <bold>(A)</bold> Original image; <bold>(B)</bold> Result by DS-Net (Shi et al., <xref ref-type="bibr" rid="B42">2016</xref>); <bold>(C)</bold> Result by FC4 (Hu et al., <xref ref-type="bibr" rid="B30">2017</xref>); <bold>(D)</bold> Result by MSRWNS (Wang et al., <xref ref-type="bibr" rid="B48">2022</xref>); <bold>(E)</bold> Result by proposed method; <bold>(F)</bold> Ground truth. Dataset available at: <ext-link ext-link-type="uri" xlink:href="https://groups.csail.mit.edu/vision/datasets/ADE20K/">https://groups.csail.mit.edu/vision/datasets/ADE20K/</ext-link>. Reproduced with permission from Zhou et al. (<xref ref-type="bibr" rid="B54">2016</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fninf-16-953235-g0007.tif"/>
</fig>
</sec>
<sec>
<title>3.6. Efficiency</title>
<p>The code used to test the efficiency of the proposed method is based on PyTorch (Paszke et al., <xref ref-type="bibr" rid="B38">2019</xref>); training took approximately 8 h, after which the loss stabilized. In the testing phase, we used OpenCV (Bradski, <xref ref-type="bibr" rid="B11">2000</xref>) to load the model. On average, an image required 200 ms on a CPU and 32 ms on a GPU<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref>. For low-resolution images, real-time estimation can be achieved on a GPU, but high-resolution images still require significant time. In future work, we will prune the model to further improve its efficiency.</p>
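Per-image latencies such as those above are typically measured with a warm-up-then-average harness. The sketch below shows one such harness; the lambda stands in for the actual network forward pass, which is not part of this paper's listing.

```python
import time
import numpy as np

def time_inference(fn, image, warmup=2, runs=10):
    """Average wall-clock latency of fn(image) in milliseconds.

    Warm-up calls are excluded so one-time costs (allocation,
    kernel compilation) do not inflate the average.
    """
    for _ in range(warmup):
        fn(image)
    start = time.perf_counter()
    for _ in range(runs):
        fn(image)
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in workload (hypothetical): replace with the model's forward pass.
img = np.random.rand(256, 256, 3).astype(np.float32)
ms = time_inference(lambda x: x.mean(), img)
```

For GPU timing with PyTorch, the forward pass would additionally need a device synchronization before reading the clock, since CUDA kernels launch asynchronously.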
</sec>
</sec>
<sec sec-type="conclusions" id="s4">
<title>4. Conclusion</title>
<p>Most studies of color constancy assume that there is only a single illuminant in the scene; in reality, however, most scenes contain more than one. For illumination estimation, this study introduced an encoder-decoder network and designed a network model combining multi-scale supervision with single-scale estimation. An improved loss function and a simple penalized operator were designed to train the network. In tests on several public datasets, our method partially improved on previous multi-illumination estimation methods in both quantitative metrics and visual quality. This provides a research direction for end-to-end multi-illumination estimation.</p>
</sec>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s6">
<title>Author contributions</title>
<p>FW is responsible for conceptualization, investigation, data curation, and writing. WW is responsible for formal analysis, investigation, and methodology. DW is responsible for formal analysis, investigation, and validation. GG is responsible for data curation and investigation. ZW is responsible for polishing the language and the major experiments in the revised version. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>This work was supported by the Science Fund of the State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body (No. 32015013) and the Shaanxi Province Key R&#x00026;D Program Project (No. 2022GY-435).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Afifi</surname> <given-names>M.</given-names></name> <name><surname>Brown</surname> <given-names>M. S.</given-names></name></person-group> (<year>2020</year>). <article-title>Deep white-balance editing,</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Seattle, WA</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arjan</surname> <given-names>G.</given-names></name> <name><surname>Rui</surname> <given-names>L.</given-names></name> <name><surname>Theo</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>Color constancy for multiple light sources</article-title>. <source>IEEE Trans. Image Process</source>. <volume>21</volume>, <fpage>697</fpage>. <pub-id pub-id-type="doi">10.1109/TIP.2011.2165219</pub-id><pub-id pub-id-type="pmid">21859624</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arunkumar</surname> <given-names>P.</given-names></name> <name><surname>Chandramathi</surname> <given-names>S.</given-names></name> <name><surname>Kannimuthu</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Sentiment analysis-based framework for assessing internet telemedicine videos</article-title>. <source>Int. J. Data Anal. Techn. Strategies</source> <volume>11</volume>, <fpage>328</fpage>&#x02013;<lpage>336</lpage>. <pub-id pub-id-type="doi">10.1504/IJDATS.2019.103755</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Badrinarayanan</surname> <given-names>V.</given-names></name> <name><surname>Kendall</surname> <given-names>A.</given-names></name> <name><surname>Cipolla</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>SegNet: a deep convolutional encoder-decoder architecture for scene segmentation</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>39</volume>, <fpage>2481</fpage>&#x02013;<lpage>2495</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2644615</pub-id><pub-id pub-id-type="pmid">28060704</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barnard</surname> <given-names>K.</given-names></name> <name><surname>Finlayson</surname> <given-names>G. D.</given-names></name> <name><surname>Funt</surname> <given-names>B. V.</given-names></name></person-group> (<year>1997</year>). <article-title>Colour constancy for scenes with varying illumination</article-title>. <source>Comput. Vis. Image Understand</source>. <volume>65</volume>, <fpage>311</fpage>&#x02013;<lpage>321</lpage>. <pub-id pub-id-type="doi">10.1006/cviu.1996.0567</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barnard</surname> <given-names>K.</given-names></name> <name><surname>Martin</surname> <given-names>L.</given-names></name> <name><surname>Funt</surname> <given-names>B.</given-names></name> <name><surname>Coath</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>A data set for color research</article-title>. <source>Color Res. Appl</source>. <volume>27</volume>, <fpage>1049</fpage>. <pub-id pub-id-type="doi">10.1002/col.10049</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Barron</surname> <given-names>J. T.</given-names></name></person-group> (<year>2015</year>). <article-title>Convolutional color constancy,</article-title> in <source>Proceedings of IEEE International Conference on Computer Vision</source> (<publisher-loc>Santiago</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>379</fpage>&#x02013;<lpage>387</lpage>.</citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beigpour</surname> <given-names>S.</given-names></name> <name><surname>Riess</surname> <given-names>C.</given-names></name> <name><surname>van de Weijer</surname> <given-names>J.</given-names></name> <name><surname>Angelopoulou</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>Multi-illuminant estimation with conditional random fields</article-title>. <source>IEEE Trans. Image Process</source>. <volume>23</volume>, <fpage>83</fpage>&#x02013;<lpage>96</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2013.2286327</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bianco</surname> <given-names>S.</given-names></name> <name><surname>Cusano</surname> <given-names>C.</given-names></name> <name><surname>Schettini</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Color constancy using <italic>CNNs</italic></article-title>, in <source>IEEE Conference on Computer Vision and Pattern Recognition Workshops</source>, <fpage>81</fpage>&#x02013;<lpage>89</lpage>. <pub-id pub-id-type="doi">10.1109/CVPRW.2015.7301275</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bianco</surname> <given-names>S.</given-names></name> <name><surname>Cusano</surname> <given-names>C.</given-names></name> <name><surname>Schettini</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>Single and multiple illuminant estimation using convolutional neural networks</article-title>. <source>IEEE Trans. Image Process</source>. <volume>26</volume>, <fpage>4347</fpage>&#x02013;<lpage>4362</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2017.2713044</pub-id><pub-id pub-id-type="pmid">28600246</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bleier</surname> <given-names>M.</given-names></name> <name><surname>Riess</surname> <given-names>C.</given-names></name> <name><surname>Beigpour</surname> <given-names>S.</given-names></name> <name><surname>Eibenberger</surname> <given-names>E.</given-names></name> <name><surname>Angelopoulou</surname> <given-names>E.</given-names></name> <name><surname>Tr&#x000F6;ger</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Color constancy and non-uniform illumination: can existing algorithms work?</article-title> in <source>IEEE International Conference on Computer Vision Workshops</source> (<publisher-loc>Barcelona</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>774</fpage>&#x02013;<lpage>781</lpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bradski</surname> <given-names>G.</given-names></name></person-group> (<year>2000</year>). <article-title>The opencv library</article-title>. <source>Dr. Dobb&#x00027;s Journal: Software Tools for the Professional Programmer</source> <volume>25</volume>, <fpage>120</fpage>&#x02013;<lpage>123</lpage>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brainard</surname> <given-names>D. H.</given-names></name> <name><surname>Wandell</surname> <given-names>B. A.</given-names></name></person-group> (<year>1986</year>). <article-title>Analysis of the retinex theory of color vision</article-title>. <source>J. Opt. Soc. Am. A Optics Image Sci</source>. <volume>3</volume>, <fpage>1651</fpage>. <pub-id pub-id-type="doi">10.1364/JOSAA.3.001651</pub-id><pub-id pub-id-type="pmid">3772627</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>D.</given-names></name> <name><surname>Prasad</surname> <given-names>D. K.</given-names></name> <name><surname>Brown</surname> <given-names>M. S.</given-names></name></person-group> (<year>2014</year>). <article-title>Illuminant estimation for color constancy: why spatial-domain methods work and the role of the color distribution</article-title>. <source>J. Opt. Soc. Am. A Opt. Image Sci. Vis</source>. <volume>31</volume>, <fpage>1049</fpage>. <pub-id pub-id-type="doi">10.1364/JOSAA.31.001049</pub-id><pub-id pub-id-type="pmid">24979637</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ciurea</surname> <given-names>F.</given-names></name> <name><surname>Funt</surname> <given-names>B.</given-names></name></person-group> (<year>2003</year>). <article-title>A large image database for color constancy research,</article-title> in <source>Color and Imaging Conference (Society for Imaging Science and Technology), Vol. 2003</source>. p. <fpage>160</fpage>&#x02013;<lpage>4</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www2.cs.sfu.ca/&#x0007E;colour/publications/PCIC-2003/LargeImageDatabase.pdf">https://www2.cs.sfu.ca/&#x0007E;colour/publications/PCIC-2003/LargeImageDatabase.pdf</ext-link></citation>
</ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ebner</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <source>Color constancy, Vol. 7</source>. <publisher-name>John Wiley and Sons</publisher-name>.</citation>
</ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Eigen</surname> <given-names>D.</given-names></name> <name><surname>Fergus</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,</article-title> in <source>IEEE International Conference on Computer Vision</source> (<publisher-loc>Santiago</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2650</fpage>&#x02013;<lpage>2658</lpage>.</citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Eigen</surname> <given-names>D.</given-names></name> <name><surname>Puhrsch</surname> <given-names>C.</given-names></name> <name><surname>Fergus</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Depth map prediction from a single image using a multi-scale deep network,</article-title> in <source>28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014</source> (<publisher-loc>Neural Information Processing Systems Foundation</publisher-loc>), <fpage>2366</fpage>&#x02013;<lpage>2374</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finlayson</surname> <given-names>G. D.</given-names></name> <name><surname>Drew</surname> <given-names>M. S.</given-names></name> <name><surname>Funt</surname> <given-names>B. V.</given-names></name></person-group> (<year>1994</year>). <article-title>Spectral sharpening: sensor transformations for improved color constancy</article-title>. <source>J. Opt. Soc. Am. A Opt. Image Sci. Vis</source>. <volume>11</volume>, <fpage>1553</fpage>&#x02013;<lpage>1563</lpage>. <pub-id pub-id-type="doi">10.1364/JOSAA.11.001553</pub-id><pub-id pub-id-type="pmid">8006721</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Finlayson</surname> <given-names>G. D.</given-names></name> <name><surname>Drew</surname> <given-names>M. S.</given-names></name> <name><surname>Lu</surname> <given-names>C.</given-names></name></person-group> (<year>2004</year>). <source>Intrinsic Images by Entropy Minimization</source>. <publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Funt</surname> <given-names>B.</given-names></name> <name><surname>Barnard</surname> <given-names>K.</given-names></name> <name><surname>Martin</surname> <given-names>L.</given-names></name></person-group> (<year>1999</year>). <article-title>Is machine colour constancy good enough?</article-title> in <source>Proceedings of European Conference on Computer Vision</source> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>).<pub-id pub-id-type="pmid">9274768</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Funt</surname> <given-names>B.</given-names></name> <name><surname>Ciurea</surname> <given-names>F.</given-names></name> <name><surname>Mccann</surname> <given-names>J.</given-names></name></person-group> (<year>2004</year>). <article-title>Retinex in matlab</article-title>. <source>J. Electron. Imaging</source> <volume>13</volume>, <fpage>112</fpage>&#x02013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1117/1.1636761</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Funt</surname> <given-names>B. V.</given-names></name> <name><surname>Lewis</surname> <given-names>B. C.</given-names></name></person-group> (<year>2000</year>). <article-title>Diagonal versus affine transformations for color correction</article-title>. <source>J. Opt. Soc. Am. A Opt. Image Sci. Vis</source>. <volume>17</volume>, <fpage>2108</fpage>&#x02013;<lpage>2112</lpage>. <pub-id pub-id-type="doi">10.1364/JOSAA.17.002108</pub-id><pub-id pub-id-type="pmid">11059611</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <source>Computational Models of Visual Adaptation and Color Constancy and Applications</source> (Ph.D. thesis). <publisher-name>University of Electronic Science and Technology of China</publisher-name>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>S. B.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>C. Y.</given-names></name> <name><surname>Li</surname> <given-names>Y. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Improving color constancy by discounting the variation of camera spectral sensitivity</article-title>. <source>J. Opt. Soc. Am. A Opt. Image Vis</source>. <volume>34</volume>, <fpage>1448</fpage>&#x02013;<lpage>1462</lpage>. <pub-id pub-id-type="doi">10.1364/JOSAA.34.001448</pub-id><pub-id pub-id-type="pmid">29036112</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>S. B.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>Y. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Improving color constancy by selecting suitable set of training images</article-title>. <source>Opt. Express</source>. <volume>27</volume>, <fpage>25611</fpage>. <pub-id pub-id-type="doi">10.1364/OE.27.025611</pub-id><pub-id pub-id-type="pmid">31510431</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gehler</surname> <given-names>P. V.</given-names></name> <name><surname>Rother</surname> <given-names>C.</given-names></name> <name><surname>Blake</surname> <given-names>A.</given-names></name> <name><surname>Minka</surname> <given-names>T.</given-names></name> <name><surname>Sharp</surname> <given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>Bayesian color constancy revisited,</article-title> in <source>Proceedings of IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Anchorage, AK</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>8</lpage>.</citation>
</ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gilchrist</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <source>Seeing Black and White</source>. <publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gong</surname> <given-names>H.</given-names></name> <name><surname>Cosker</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>Interactive shadow removal and ground truth for variable scene categories,</article-title> in <source>BMVC</source> (<publisher-loc>Citeseer</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>11</lpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gu</surname> <given-names>L.</given-names></name> <name><surname>Huynh</surname> <given-names>C. P.</given-names></name> <name><surname>Robleskelly</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Segmentation and estimation of spatially varying illumination</article-title>. <source>IEEE Trans. Image Process</source>. <volume>23</volume>, <fpage>3478</fpage>&#x02013;<lpage>3489</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2014.2330768</pub-id><pub-id pub-id-type="pmid">24951698</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>B.</given-names></name> <name><surname>Lin</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Fc4: fully convolutional color constancy with confidence-weighted pooling,</article-title> in <source>IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>330</fpage>&#x02013;<lpage>339</lpage>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jameson</surname> <given-names>D. H. L.</given-names></name></person-group> (<year>1989</year>). <article-title>Essay concerning color constancy</article-title>. <source>Annu. Rev. Psychol</source>. <volume>40</volume>, <fpage>1</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.ps.40.020189.000245</pub-id><pub-id pub-id-type="pmid">2648972</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kannimuthu</surname> <given-names>S.</given-names></name> <name><surname>Premalatha</surname> <given-names>K.</given-names></name> <name><surname>Shankar</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). <article-title>Investigation of high utility itemset mining in service oriented computing: deployment of knowledge as a service in e-commerce,</article-title> in <source>2012 Fourth International Conference on Advanced Computing (ICoAC)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>8</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization</article-title>. <source>arXiv [Preprint]</source>. arXiv: 1412.6980. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/pdf/1412.6980.pdf">https://arxiv.org/pdf/1412.6980.pdf</ext-link></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kraft</surname> <given-names>J. M.</given-names></name> <name><surname>Brainard</surname> <given-names>D. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Mechanisms of color constancy under nearly natural viewing</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>96</volume>, <fpage>307</fpage>&#x02013;<lpage>312</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.96.1.307</pub-id><pub-id pub-id-type="pmid">9874814</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2017</year>). <article-title>Imagenet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>&#x02013;<lpage>90</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf">https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf</ext-link></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Land</surname> <given-names>E. H.</given-names></name></person-group> (<year>1986</year>). <article-title>Recent advances in retinex theory</article-title>. <source>Vis. Res</source>. <volume>26</volume>, <fpage>7</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1016/0042-6989(86)90067-2</pub-id><pub-id pub-id-type="pmid">3716215</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mutimbu</surname> <given-names>L.</given-names></name> <name><surname>Robles-Kelly</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Multiple illuminant colour estimation via statistical inference on factor graphs</article-title>. <source>IEEE Trans. Image Process</source>. <volume>25</volume>, <fpage>5383</fpage>&#x02013;<lpage>5396</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2016.2605003</pub-id><pub-id pub-id-type="pmid">28113585</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Paszke</surname> <given-names>A.</given-names></name> <name><surname>Gross</surname> <given-names>S.</given-names></name> <name><surname>Massa</surname> <given-names>F.</given-names></name> <name><surname>Lerer</surname> <given-names>A.</given-names></name> <name><surname>Chintala</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Pytorch: an imperative style, high-performance deep learning library,</article-title> in <source>Advances in Neural Information Processing Systems 32</source>.</citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Qian</surname> <given-names>Y.</given-names></name> <name><surname>Nikkanen</surname> <given-names>J.</given-names></name> <name><surname>K&#x000E4;m&#x000E4;r&#x000E4;inen</surname> <given-names>J. K.</given-names></name> <name><surname>Matas</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>On finding gray pixels,</article-title> in <source>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roca-Vila</surname> <given-names>J.</given-names></name> <name><surname>Parraga</surname> <given-names>C. A.</given-names></name> <name><surname>Vanrell</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <source>Human and Computational Color Constancy</source>.</citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shelhamer</surname> <given-names>E.</given-names></name> <name><surname>Long</surname> <given-names>J.</given-names></name> <name><surname>Darrell</surname> <given-names>T.</given-names></name></person-group> (<year>2014</year>). <article-title>Fully convolutional networks for semantic segmentation</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>39</volume>, <fpage>640</fpage>&#x02013;<lpage>651</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2572683</pub-id><pub-id pub-id-type="pmid">27244717</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>W.</given-names></name> <name><surname>Loy</surname> <given-names>C. C.</given-names></name> <name><surname>Tang</surname> <given-names>X.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep specialized network for illuminant estimation,</article-title> in <source>European Conference on Computer Vision</source> (<publisher-loc>Springer</publisher-loc>), <fpage>371</fpage>&#x02013;<lpage>378</lpage>.</citation>
</ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sidorov</surname> <given-names>O.</given-names></name></person-group> (<year>2019</year>). <article-title>Conditional gans for multi-illuminant color constancy: revolution or yet another approach?</article-title> in <source>The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops</source> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B45">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Very deep convolutional networks for large-scale image recognition</article-title>. <source>arXiv [Preprint]</source>. arXiv: 1409.1556. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/pdf/1409.1556.pdf">https://arxiv.org/pdf/1409.1556.pdf</ext-link></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smithson</surname> <given-names>H. E.</given-names></name></person-group> (<year>2005</year>). <article-title>Sensory, computational and cognitive components of human colour constancy</article-title>. <source>Philos. Trans. R. Soc. B</source> <volume>360</volume>, <fpage>1329</fpage>&#x02013;<lpage>1346</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2005.1633</pub-id><pub-id pub-id-type="pmid">16147525</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vrhel</surname> <given-names>M.</given-names></name> <name><surname>Saber</surname> <given-names>E.</given-names></name> <name><surname>Trussell</surname> <given-names>H. J.</given-names></name></person-group> (<year>2005</year>). <article-title>Color image generation and display technologies</article-title>. <source>IEEE Signal Process. Mag</source>. <volume>22</volume>, <fpage>23</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1109/MSP.2005.1407712</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Wu</surname> <given-names>D.</given-names></name> <name><surname>Gao</surname> <given-names>G.</given-names></name></person-group> (<year>2022</year>). <article-title>Color constancy via multi-scale region-weighed network guided by semantics</article-title>. <source>Front. Neurorobot</source>. <volume>16</volume>, <fpage>841426</fpage>. <pub-id pub-id-type="doi">10.3389/fnbot.2022.841426</pub-id><pub-id pub-id-type="pmid">35464675</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xiong</surname> <given-names>W.</given-names></name> <name><surname>Funt</surname> <given-names>B.</given-names></name></person-group> (<year>2006</year>). <article-title>Color constancy for multiple-illuminant scenes using retinex and svr,</article-title> in <source>Color and Imaging Conference, Vol. 2006</source> (<publisher-name>Society for Imaging Science and Technology</publisher-name>), <fpage>304</fpage>&#x02013;<lpage>308</lpage>.</citation>
</ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>K. -F.</given-names></name> <name><surname>Gao</surname> <given-names>S. -B.</given-names></name> <name><surname>Li</surname> <given-names>Y. -J.</given-names></name></person-group> (<year>2015</year>). <article-title>Efficient illuminant estimation for color constancy using grey pixels,</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>2254</fpage>&#x02013;<lpage>2263</lpage>.</citation>
</ref>
<ref id="B51">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>F.</given-names></name> <name><surname>Koltun</surname> <given-names>V.</given-names></name></person-group> (<year>2015</year>). <article-title>Multi-scale context aggregation by dilated convolutions</article-title>. <source>arXiv [Preprint]</source>. arXiv: 1511.07122. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/pdf/1511.07122.pdf">https://arxiv.org/pdf/1511.07122.pdf</ext-link></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeng</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>K.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name></person-group> (<year>2011</year>). <article-title>Contour detection based on a non-classical receptive field model with butterfly-shaped inhibition subregions</article-title>. <source>Neurocomputing</source> <volume>74</volume>, <fpage>1527</fpage>&#x02013;<lpage>1534</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2010.12.022</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X. S.</given-names></name> <name><surname>Gao</surname> <given-names>S. B.</given-names></name> <name><surname>Li</surname> <given-names>R. X.</given-names></name> <name><surname>Du</surname> <given-names>X. Y.</given-names></name> <name><surname>Li</surname> <given-names>C. Y.</given-names></name> <name><surname>Li</surname> <given-names>Y. J.</given-names></name></person-group> (<year>2016</year>). <article-title>A retinal mechanism inspired color constancy model</article-title>. <source>IEEE Trans. Image Process</source>. <volume>25</volume>, <fpage>1219</fpage>&#x02013;<lpage>1232</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2016.2516953</pub-id><pub-id pub-id-type="pmid">26766375</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>B.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Puig</surname> <given-names>X.</given-names></name> <name><surname>Fidler</surname> <given-names>S.</given-names></name> <name><surname>Barriuso</surname> <given-names>A.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Semantic understanding of scenes through the ade20k dataset</article-title>. <source>Int. J. Comput. Vis</source>. <volume>127</volume>, <fpage>302</fpage>&#x02013;<lpage>321</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-018-1140-0</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Samuel</surname> <given-names>K.</given-names></name> <name><surname>Masood</surname> <given-names>S. Z.</given-names></name> <name><surname>Tappen</surname> <given-names>M. F.</given-names></name></person-group> (<year>2010</year>). <article-title>Learning to recognize shadows in monochromatic natural images,</article-title> in <source>2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p>1 We assume that the scene contains multiple light sources and that the illumination from each light source is uniform.</p></fn>
<fn id="fn0002"><p>2 As demonstrated in the literature (Finlayson et al., <xref ref-type="bibr" rid="B19">2004</xref>; Barron, <xref ref-type="bibr" rid="B7">2015</xref>), log-uv has advantages over <italic>RGB</italic>. First, there are two variables instead of three. Second, the multiplicative constraint of the illumination estimation model is converted to a linear constraint.</p></fn>
<fn id="fn0003"><p>3 <ext-link ext-link-type="uri" xlink:href="https://image.baidu.com">https://image.baidu.com</ext-link></p></fn>
<fn id="fn0004"><p>4 Experimental hardware platform: Intel Xeon Silver 4210R, 64 GB memory, NVIDIA RTX 3090. The resolution of each test image was less than 800 &#x000D7; 600.</p></fn>
</fn-group>
</back>
</article>
