<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Sig. Proc.</journal-id>
<journal-title>Frontiers in Signal Processing</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Sig. Proc.</abbrev-journal-title>
<issn pub-type="epub">2673-8198</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">932873</article-id>
<article-id pub-id-type="doi">10.3389/frsip.2022.932873</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Signal Processing</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Joint image compression and denoising <italic>via</italic> latent-space scalability</article-title>
<alt-title alt-title-type="left-running-head">Ranjbar Alvar et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/frsip.2022.932873">10.3389/frsip.2022.932873</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ranjbar Alvar</surname>
<given-names>Saeed</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1793468/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ulhaq</surname>
<given-names>Mateen</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1942365/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Choi</surname>
<given-names>Hyomin</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1969501/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Baji&#x107;</surname>
<given-names>Ivan V.</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1619413/overview"/>
</contrib>
</contrib-group>
<aff>
<institution>School of Engineering Science</institution>, <institution>Simon Fraser University</institution>, <addr-line>Burnaby</addr-line>, <addr-line>BC</addr-line>, <country>Canada</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1282307/overview">Wenhan Yang</ext-link>, Nanyang Technological University, Singapore</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1169249/overview">Ionut Schiopu</ext-link>, Huawei Technologies Oy, Finland</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1151081/overview">Miaohui Wang</ext-link>, Shenzhen University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Ivan V. Baji&#x107;, <email>ibajic@ensc.sfu.ca</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Image Processing, a section of the journal Frontiers in Signal Processing</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>2</volume>
<elocation-id>932873</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>04</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>08</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Ranjbar Alvar, Ulhaq, Choi and Baji&#x107;.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Ranjbar Alvar, Ulhaq, Choi and Baji&#x107;</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>When it comes to image compression in digital cameras, denoising is traditionally performed prior to compression. However, there are applications where image noise may be necessary to demonstrate the trustworthiness of the image, such as court evidence and image forensics. This means that the noise itself needs to be coded, in addition to the clean image. In this paper, we present a learning-based image compression framework where image denoising and compression are performed jointly. The latent space of the image codec is organized in a scalable manner such that the clean image can be decoded from a subset of the latent space (the base layer), while the noisy image is decoded from the full latent space at a higher rate. Using a subset of the latent space for the denoised image allows denoising to be carried out at a lower rate. Besides providing a scalable representation of the noisy input image, performing denoising jointly with compression makes intuitive sense, because noise is hard to compress; hence, compressibility is one of the criteria that may help distinguish noise from signal. The proposed codec is compared against established compression and denoising benchmarks, and the experiments reveal considerable bitrate savings compared to a cascade combination of a state-of-the-art codec and a state-of-the-art denoiser.</p>
</abstract>
<kwd-group>
<kwd>image denoising</kwd>
<kwd>image compression</kwd>
<kwd>deep learning</kwd>
<kwd>multi-task compression</kwd>
<kwd>scalable coding</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Images acquired by digital imaging sensors are degraded by noise arising from many factors, such as scene lighting, the sensor itself, and shutter speed. In practice, noticeable noise is often encountered in low-light conditions, as illustrated in the Smartphone Image Denoising Dataset (SIDD) <xref ref-type="bibr" rid="B1">Abdelhamed et al. (2018)</xref>. In a typical image processing pipeline, noise in the captured image is attenuated or removed before the image is compressed. Noise removed in this pre-processing stage cannot be restored, and the compressed image carries no information about the original noise. While the absence of noise in the stored image is desirable for the majority of applications, the captured noise may carry useful information in certain applications, such as court evidence, image forensics, and artistic intent. For such applications, the noise needs to be preserved in the compressed image. In fact, compressed-domain denoising together with techniques to preserve the noise is part of the recent JPEG AI call for proposals <xref ref-type="bibr" rid="B22">ISO/IEC and ITU-T (2022a)</xref>. The major drawback of encoding the noise is that it significantly increases the bitrate required to store and transfer the images. For example, an independent and identically distributed (iid) Gaussian source, a common noise model, has the worst rate-distortion performance among all sources with the same variance <xref ref-type="bibr" rid="B14">Cover and Thomas (2006)</xref>. Another issue is that, when the clean (denoised) image is needed, denoising must be applied to the reconstructed noisy image; this additional step increases the run time and the complexity of the pipeline.</p>
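The worst-case claim about Gaussian noise can be made concrete. For an iid Gaussian source with variance σ² under squared-error distortion, the rate-distortion function <xref ref-type="bibr" rid="B14">Cover and Thomas (2006)</xref> is:

```latex
R(D) =
\begin{cases}
\dfrac{1}{2}\log_2 \dfrac{\sigma^2}{D}, & 0 \le D \le \sigma^2,\\[4pt]
0, & D > \sigma^2.
\end{cases}
```

Among all sources with the same variance, the Gaussian maximizes R(D) at every distortion level, so encoding Gaussian-like noise at a given fidelity costs the most bits.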
<p>To overcome these drawbacks of encoding the noisy image and denoising in cascade, we present a scalable multi-task image compression framework that performs compression and denoising jointly. We borrow the terminology from scalable video coding <xref ref-type="bibr" rid="B32">Schwarz et al. (2007)</xref>, where the input video is encoded into a scalable representation consisting of a <italic>base layer</italic> and one or more <italic>enhancement layers</italic>, which enables the reconstruction of various representations of the original video: different resolutions, frame rates, and/or qualities. In the proposed Joint Image Compression and Denoising (JICD) framework, the encoder maps the noisy input to a latent representation that is partitioned into a base layer and an enhancement layer. The base layer contains the information about the clean image, while the enhancement layer contains information about the noise. When the denoised image is needed, only the base layer needs to be encoded (and decoded), thereby avoiding noise coding. The enhancement layer is encoded only when the noisy input reconstruction is needed.</p>
<p>The scalable design of the system provides several advantages. Since only a subset of latent features is encoded for the denoised image, the bitrate is reduced compared to using the entire latent space. Another advantage is that the noise is not completely removed from the latent features, only separated from the features corresponding to the denoised image. Therefore, when the noisy input reconstruction is needed, the enhancement features are used in addition to the base features to decode the noisy input. The multi-task nature of the framework means that compression and denoising are trained jointly; it also allows us to obtain both the reconstructed noisy input and the corresponding denoised image in a single forward pass, which reduces complexity compared to a cascade implementation of compression and denoising. In fact, our results demonstrate that such a system provides improved performance (better denoising accuracy at the same bitrate) compared to a cascade combination of a state-of-the-art codec and a state-of-the-art denoiser.</p>
<p>The novel contributions of this paper are as follows:<list list-type="simple">
<list-item>
<p>&#x2022; We develop JICD, the first multi-task image coding framework that supports both image denoising and noisy image reconstruction.</p>
</list-item>
<list-item>
<p>&#x2022; JICD employs latent space scalability, such that the information about the clean image is mapped to a subset of the latent space (base layer) while noise information is mapped to the remainder (enhancement layer).</p>
</list-item>
<list-item>
<p>&#x2022; Unlike many methods in the literature, which are developed for a particular type of noise and/or require some noise parameter(s) to operate properly, the proposed JICD is capable of handling unseen noise.</p>
</list-item>
</list>
</p>
<p>The remainder of the paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> briefly describes prior work related to compression, denoising, and joint compression and denoising. <xref ref-type="sec" rid="s3">Section 3</xref> discusses the preliminaries related to learning-based multi-task image compression. <xref ref-type="sec" rid="s4">Section 4</xref> presents the proposed method. <xref ref-type="sec" rid="s5">Section 5</xref> describes the experiments and analyzes the experimental results. Finally, <xref ref-type="sec" rid="s6">Section 6</xref> presents concluding remarks.</p>
</sec>
<sec id="s2">
<title>2 Related works</title>
<p>The proposed JICD framework is a multi-task image codec that performs image compression and denoising jointly. In this section, we briefly discuss the most relevant works related to image denoising (<xref ref-type="sec" rid="s2-1">Section 2.1</xref>), learning-based image compression (<xref ref-type="sec" rid="s2-2">Section 2.2</xref>), and multi-task image compression including joint compression and denoising (<xref ref-type="sec" rid="s2-3">Section 2.3</xref>).</p>
<sec id="s2-1">
<title>2.1 Image denoising</title>
<p>State-of-the-art classical image denoising methods are based on Non-local Self-Similarity (NSS). In these methods, repetitive local patterns in a noisy image are used to capture signal and noise characteristics and perform denoising. In BM3D <xref ref-type="bibr" rid="B16">Dabov et al. (2007b)</xref>, similar patches are first found by block matching. Then, they are stacked to form a 3D block. Finally, transform-domain collaborative filtering is applied to obtain the clean patch. <xref ref-type="bibr" rid="B38">Yahya et al. (2020)</xref> used adaptive filtering to improve BM3D. WNNM <xref ref-type="bibr" rid="B19">Gu et al. (2014)</xref> performs denoising by applying low-rank matrix approximation to the stacked noisy patches. In <xref ref-type="bibr" rid="B37">Xu et al. (2015)</xref>, a patch-group-based NSS prior learning scheme is proposed to learn explicit NSS models from natural images. The denoising method in <xref ref-type="bibr" rid="B41">Zha et al. (2019)</xref> uses NSS priors from both the degraded images and external clean images. CBM3D <xref ref-type="bibr" rid="B15">Dabov et al. (2007a)</xref> and MCWNNM <xref ref-type="bibr" rid="B36">Xu et al. (2017)</xref> are extensions of BM3D and WNNM, respectively, created to handle color images.</p>
<p>More recently, (deep) learning-based denoising methods have gained popularity and surpassed the performance of classical methods. <xref ref-type="bibr" rid="B7">Burger et al. (2012)</xref> used a multi-layer perceptron (MLP) to achieve denoising results comparable to those of state-of-the-art classical methods. Among learning-based denoisers, DnCNN <xref ref-type="bibr" rid="B43">Zhang et al. (2017)</xref> was the first Convolutional Neural Network (CNN) to perform blind Gaussian denoising. FFDNet <xref ref-type="bibr" rid="B44">Zhang et al. (2018)</xref> improved upon DnCNN by proposing a fast and flexible denoising CNN that handles different noise levels with a single model. In <xref ref-type="bibr" rid="B20">Guo et al. (2019)</xref>, a noise estimation subnetwork is added before the CNN-based denoiser to obtain an accurate estimate of the noise level in real-world noisy photographs. A Generative Adversarial Network (GAN)-based denoising method is proposed in <xref ref-type="bibr" rid="B8">Chen et al. (2018)</xref>. The mentioned works are supervised methods, where a clean reference image is needed for training. In <xref ref-type="bibr" rid="B25">Laine et al. (2019)</xref>; <xref ref-type="bibr" rid="B30">Quan et al. (2020)</xref>, self-supervised denoising methods are proposed.</p>
</sec>
<sec id="s2-2">
<title>2.2 Learning-based image compression</title>
<p>In recent years, there has been an increasing interest in the development of learning-based image codecs. Some of the early works <xref ref-type="bibr" rid="B35">Toderici et al. (2016)</xref>; <xref ref-type="bibr" rid="B28">Minnen et al. (2017)</xref>; <xref ref-type="bibr" rid="B24">Johnston et al. (2018)</xref> were based on Recurrent Neural Networks (RNNs), whose purpose was to model spatial dependence of pixels in an image. More recently, the focus has shifted to Convolutional Neural Network (CNN)-based autoencoders. <xref ref-type="bibr" rid="B3">Ball&#xe9; et al. (2017)</xref> introduced Generalized Divisive Normalization (GDN) as a key component of the nonlinear transform in the encoder. The image codec based on GDN was improved by introducing a hyperprior to capture spatial dependencies and take advantage of statistical redundancy in the entropy model <xref ref-type="bibr" rid="B4">Ball&#xe9; et al. (2018)</xref>. To further improve the coding gains, discretized Gaussian mixture likelihoods are used in <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref> to parameterize the distributions of latent codes. Most recently, this approach has been extended using advanced latent-space context modelling <xref ref-type="bibr" rid="B21">Guo et al. (2022)</xref> to achieve even better performance.</p>
<p>Most state-of-the-art learning-based image coding approaches <xref ref-type="bibr" rid="B4">Ball&#xe9; et al. (2018)</xref>; <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>; <xref ref-type="bibr" rid="B21">Guo et al. (2022)</xref> train different models for different bitrates by changing the Lagrange multiplier that trades off rate and distortion. Such an approach is meant to explore the potential of learning-based compression, rather than to be used in practice as is. There has also been a considerable amount of work on variable-rate learning-based compression <xref ref-type="bibr" rid="B35">Toderici et al. (2016)</xref>; <xref ref-type="bibr" rid="B12">Choi et al. (2019)</xref>; <xref ref-type="bibr" rid="B39">Yang et al. (2020)</xref>; <xref ref-type="bibr" rid="B33">Sebai (2021)</xref>; <xref ref-type="bibr" rid="B40">Yin et al. (2022)</xref>, where a single model is able to produce multiple rate-distortion points. However, in terms of rate-distortion performance, &#x201c;fixed-rate&#x201d; approaches such as <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>; <xref ref-type="bibr" rid="B21">Guo et al. (2022)</xref> currently seem to have an advantage over variable-rate ones.</p>
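The training objective behind these fixed-rate models can be summarized by the standard Lagrangian rate-distortion loss; the notation below follows common practice in learned compression rather than any single cited paper:

```latex
\mathcal{L} \;=\; R\big(\hat{\mathcal{Y}}\big) \;+\; \lambda \, D\big(\mathbf{X}, \hat{\mathbf{X}}\big)
```

where R is the estimated bitrate of the quantized latents, D is a distortion measure such as MSE, and each chosen value of the Lagrange multiplier λ yields one trained model, i.e., one point on the rate-distortion curve.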
</sec>
<sec id="s2-3">
<title>2.3 Multi-task image compression</title>
<p>The mentioned learning-based codecs are single-task models, where the task is the reconstruction of the input image, just like with conventional codecs. However, the real power of learning-based codecs is their ability to be trained for multiple tasks, for example, image processing or computer vision tasks, besides the usual input reconstruction. In fact, the goal of JPEG AI standardization is to develop a coding framework that can support multiple tasks from a common compressed representation <xref ref-type="bibr" rid="B23">ISO/IEC and ITU-T (2022b)</xref>.</p>
<p>
<xref ref-type="bibr" rid="B11">Choi and Baji&#x107; (2022)</xref> proposed a scalable multi-task model with multiple segments in the latent space to handle computer vision tasks in addition to input reconstruction. The concept was based on latent-space scalability <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021)</xref>, where the latent space is partitioned in a scalable manner, from tasks that require less information to tasks that require more information. Our JICD framework is also based on latent-space scalability <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021)</xref>. However, unlike in these earlier works, the latent space here is organized to support image denoising from the base layer and noisy input reconstruction from the full latent space; in other words, the tasks themselves are different.</p>
<p>Recently, <xref ref-type="bibr" rid="B34">Testolina et al. (2021)</xref> and <xref ref-type="bibr" rid="B2">Alves de Oliveira et al. (2022)</xref> developed joint image compression and denoising pipelines built upon learning-based image codecs, where the pipeline is trained to take the noisy input image, compress it, and decode a denoised image. However, with these approaches, it is not possible to reconstruct the original noisy image, so they are not multi-task models. Our proposed JICD performs the denoising task in its base layer but keeps the noise information in the enhancement layer, thereby also enabling noisy input reconstruction if needed.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Preliminaries</title>
<p>When considering how to construct a learning-based system that can produce both the denoised image and the reconstructed noisy image, it is useful to examine the processing pipeline in which the noisy image is first compressed, then decoded, and denoising is finally applied to obtain the denoised image. Let <bold>X</bold>
<sub>
<italic>n</italic>
</sub> be the noisy input image. If such an image is input to a learning-based codec <xref ref-type="bibr" rid="B3">Ball&#xe9; et al. (2017</xref>, <xref ref-type="bibr" rid="B4">2018)</xref>; <xref ref-type="bibr" rid="B27">Minnen et al. (2018)</xref>; <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>, encoding would proceed in three steps:<disp-formula id="e1">
<mml:math id="m1">
<mml:mi mathvariant="bold-script">Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(1)</label>
</disp-formula>
<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(2)</label>
</disp-formula>
<disp-formula id="e3">
<mml:math id="m3">
<mml:mi mathvariant="bold">B</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(3)</label>
</disp-formula>where <italic>g</italic>
<sub>
<italic>a</italic>
</sub> is the analysis transform, <italic>&#x3d5;</italic> represents the parameters of <italic>g</italic>
<sub>
<italic>a</italic>
</sub>, <italic>Q</italic> is the quantization function, and <bold>B</bold> is the bitstream obtained by applying the arithmetic encoder <italic>A</italic>
<sub>
<italic>E</italic>
</sub> to <inline-formula id="inf1">
<mml:math id="m4">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>The noisy input image is reconstructed at the decoder by applying entropy decoding and the synthesis transform to the encoded bitstream:<disp-formula id="e4">
<mml:math id="m5">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">B</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(4)</label>
</disp-formula>
<disp-formula id="e5">
<mml:math id="m6">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(5)</label>
</disp-formula>where <italic>A</italic>
<sub>
<italic>D</italic>
</sub> is the entropy decoder, <italic>g</italic>
<sub>
<italic>s</italic>
</sub> and <italic>&#x3b8;</italic> are the synthesis transform and its parameters, respectively. Then the denoised image can be obtained by applying a denoiser to the reconstructed noisy input as:<disp-formula id="e6">
<mml:math id="m7">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>&#x3c8;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(6)</label>
</disp-formula>where <italic>F</italic> and <italic>&#x3c8;</italic> are the denoiser and its parameters, respectively, and <inline-formula id="inf2">
<mml:math id="m8">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is the denoised image.</p>
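As a concrete illustration of Eqs. (1)&#x2013;(6), the following sketch chains toy stand-ins for the analysis transform g_a, quantizer Q, synthesis transform g_s, and denoiser F. The function bodies are hypothetical placeholders, not the learned networks of the cited codecs, and the lossless entropy coding of Eqs. (3)&#x2013;(4) is omitted:

```python
import numpy as np

def g_a(x_n):
    # Eq. (1), analysis transform: toy 2x downsampling "feature" map
    return 0.5 * (x_n[..., ::2] + x_n[..., 1::2])

def quantize(y):
    # Eq. (2), Q: rounding to the nearest integer
    return np.round(y)

def g_s(y_hat):
    # Eq. (5), synthesis transform: toy upsampling back to input size
    return np.repeat(y_hat, 2, axis=-1)

def denoiser_F(x_hat_n):
    # Eq. (6), denoiser applied after decoding: toy moving average
    kernel = np.ones(3) / 3.0
    return np.convolve(x_hat_n, kernel, mode="same")

x_n = np.array([10.0, 12.0, 9.0, 11.0, 30.0, 10.0, 12.0, 11.0])  # noisy input
y = g_a(x_n)                  # latent representation
y_hat = quantize(y)           # quantized latent (entropy coding omitted)
x_hat_n = g_s(y_hat)          # reconstructed noisy image
x_hat = denoiser_F(x_hat_n)   # denoised output of the cascade
```

The cascade nature of the pipeline is visible in the last line: the denoiser only ever sees the reconstructed noisy image, never the latent representation.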
<p>This processing pipeline forms a Markov chain <inline-formula id="inf3">
<mml:math id="m9">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2192;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2192;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2192;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. Applying the data processing inequality (DPI) <xref ref-type="bibr" rid="B14">Cover and Thomas (2006)</xref> to this Markov chain, we get<disp-formula id="e7">
<mml:math id="m10">
<mml:mi>I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2265;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(7)</label>
</disp-formula>where <italic>I</italic>(&#x22c5;; &#x22c5;) is the mutual information <xref ref-type="bibr" rid="B14">Cover and Thomas (2006)</xref> between two random quantities. Based on <xref ref-type="disp-formula" rid="e7">(7)</xref>, we can conclude that the latent representation <inline-formula id="inf4">
<mml:math id="m11">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> carries less information about the denoised image <inline-formula id="inf5">
<mml:math id="m12">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> than it does about the noisy reconstructed image <inline-formula id="inf6">
<mml:math id="m13">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>. Moreover, because <inline-formula id="inf7">
<mml:math id="m14">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is obtained from <inline-formula id="inf8">
<mml:math id="m15">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, the information that <inline-formula id="inf9">
<mml:math id="m16">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> carries about <inline-formula id="inf10">
<mml:math id="m17">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is a subset of the information that it carries about <inline-formula id="inf11">
<mml:math id="m18">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>. This motivates us to structure the latent representation <inline-formula id="inf12">
<mml:math id="m19">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> in such a way that only a part of it (the base layer) is used to reconstruct the denoised image <inline-formula id="inf13">
<mml:math id="m20">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, while the whole of <inline-formula id="inf14">
<mml:math id="m21">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> (base &#x2b; enhancement) is used to reconstruct the noisy image <inline-formula id="inf15">
<mml:math id="m22">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>.</p>
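The latent-space partition motivated above can be sketched as a channel-wise split of the quantized latent tensor; the tensor size and split point below are illustrative choices, not the values used in the paper:

```python
import numpy as np

# Quantized latent Y_hat with shape (channels, height, width); 192 channels
# and a 128-channel base layer are hypothetical, illustrative values.
y_hat = np.round(np.random.randn(192, 16, 16))

n_base = 128
base = y_hat[:n_base]           # base layer: clean-image information
enhancement = y_hat[n_base:]    # enhancement layer: noise information

# The denoised image would be decoded from `base` alone, while the noisy
# input would be decoded from the full latent (base + enhancement).
full = np.concatenate([base, enhancement], axis=0)
```

Because the base layer is a strict subset of the full latent, decoding the denoised image never requires transmitting the enhancement channels.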
</sec>
<sec id="s4">
<title>4 Proposed method</title>
<p>The proposed joint image compression and denoising (JICD) framework consists of an encoder and two task-specific decoders, as illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref>. The architecture of the blocks that make up the encoder and the two decoders in <xref ref-type="fig" rid="F1">Figure 1</xref> is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. Note that the architecture of the individual building blocks (Analysis Transform, Synthesis Transform, etc.) is the same as in <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>; <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021</xref>, <xref ref-type="bibr" rid="B11">2022)</xref>, but these blocks have been retrained to support a scalable latent representation for joint compression and denoising. Specifically, compared to <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>, our encoder is trained to produce a scalable latent space that enables both denoising and noisy input reconstruction. Compared to <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021</xref>, <xref ref-type="bibr" rid="B11">2022)</xref>, our system is trained to support different tasks, and correspondingly, the structure of the latent space and the training procedure are different. Details of the individual components are described below.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>The proposed joint compression and denoising (JICD) framework. AE/AD represent arithmetic encoder and decoder, respectively. <italic>C</italic>
<sub>
<italic>m</italic>
</sub> and EP stand for the context model and entropy parameters, respectively. Q represents the quantizer, which is simple rounding to the nearest integer. The architecture of the individual building blocks is the same as in <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>; <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021</xref>, <xref ref-type="bibr" rid="B11">2022)</xref>, but they have been retrained to support different tasks, as explained in the text.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g001.tif"/>
</fig>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The architecture of the components inside the proposed JICD framework.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g002.tif"/>
</fig>
<sec id="s4-1">
<title>4.1 Encoder</title>
<p>The encoder employs an analysis transform to obtain a high-fidelity latent-space representation of the input image. In addition, the encoder contains blocks to efficiently encode the obtained latent-space tensor. The encoder&#x2019;s analysis transform is borrowed from <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref> due to its high compression efficiency. Besides the analysis transform, we also adopted the entropy parameter (EP) module, the context model (CTX) for the arithmetic encoder/decoder (AE/AD), and the synthesis transform and hyper analysis/synthesis without attention layers from <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>.</p>
<p>The analysis transform converts the input image <bold>X</bold> into <inline-formula id="inf16">
<mml:math id="m23">
<mml:mi mathvariant="bold-script">Y</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>, with <italic>C</italic> &#x3d; 192 as in <xref ref-type="bibr" rid="B27">Minnen et al. (2018)</xref>; <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>. Unlike <xref ref-type="bibr" rid="B27">Minnen et al. (2018)</xref>; <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>, the latent representation <inline-formula id="inf17">
<mml:math id="m24">
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:math>
</inline-formula> is split into two separate sub-latents <inline-formula id="inf18">
<mml:math id="m25">
<mml:mi mathvariant="bold-script">Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x222a;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, <inline-formula id="inf19">
<mml:math id="m26">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:math>
</inline-formula>, where <inline-formula id="inf20">
<mml:math id="m27">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is the base layer containing <italic>i</italic> channels, <inline-formula id="inf21">
<mml:math id="m28">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf22">
<mml:math id="m29">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is the enhancement layer containing <italic>C</italic> &#x2212; <italic>i</italic> channels, <inline-formula id="inf23">
<mml:math id="m30">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. This allows the latent representation to be used efficiently for multiple purposes, namely denoising (from <inline-formula id="inf24">
<mml:math id="m31">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>) and noisy input reconstruction (from <inline-formula id="inf25">
<mml:math id="m32">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x222a;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>). Since denoising requires only <inline-formula id="inf26">
<mml:math id="m33">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, it can be accomplished at a lower bitrate compared to decoding the full latent space. The sub-latents are then quantized to produce <inline-formula id="inf27">
<mml:math id="m34">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> and <inline-formula id="inf28">
<mml:math id="m35">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, respectively, and then coded using their respective context models to produce two independently decodable bitstreams, as discussed in <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021</xref>, <xref ref-type="bibr" rid="B11">2022)</xref>. The side bitstream shown in <xref ref-type="fig" rid="F1">Figure 1</xref> is considered part of the base layer, and its rate is included in the bitrate calculations for the base layer bitstream in the experiments.</p>
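The channel-wise split and quantization described above can be sketched in a few lines. This is a minimal illustration on plain numpy arrays, not the published implementation; the split index i = 160 (160 base + 32 enhancement channels) is one of the configurations reported in Section 5.

```python
import numpy as np

# Encoder-side latent split and quantization (illustrative sketch).
# C = 192 latent channels as in the text; i = 160 base channels is the
# sigma = 50 configuration from Section 5. N, M are arbitrary spatial
# dimensions of the latent tensor.
C, i = 192, 160
N, M = 16, 16

Y = np.random.randn(N, M, C)       # latent produced by the analysis transform
Y1 = Y[..., :i]                    # base sub-latent (first i channels)
Y2 = Y[..., i:]                    # enhancement sub-latent (remaining C - i channels)

# Q: simple rounding to the nearest integer, as stated in the Figure 1 caption
Y1_hat = np.round(Y1)
Y2_hat = np.round(Y2)
```

Each quantized sub-latent is then entropy-coded with its own context model, yielding the two independently decodable bitstreams.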
</sec>
<sec id="s4-2">
<title>4.2 Decoder</title>
<p>Two task-specific decoders are constructed: one for denoised image decoding and one for noisy input image reconstruction. The hyperpriors used in both decoders are reconstructed from the side bitstream which, as mentioned above, is considered part of the base layer. The quantized base representation <inline-formula id="inf29">
<mml:math id="m36">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is reconstructed in the base decoder by decoding the base bitstream, and used to produce the denoised image <inline-formula id="inf30">
<mml:math id="m37">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. Unlike <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021</xref>, <xref ref-type="bibr" rid="B11">2022)</xref>, where the base layer was dedicated to object detection/segmentation, our decoder does not require latent space transformation from <inline-formula id="inf31">
<mml:math id="m38">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> into another latent space; the synthesis transform (<xref ref-type="fig" rid="F1">Figure 1</xref>) produces the denoised image <inline-formula id="inf32">
<mml:math id="m39">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> directly from <inline-formula id="inf33">
<mml:math id="m40">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>. The quantized enhancement representation <inline-formula id="inf34">
<mml:math id="m41">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is decoded only when noisy input reconstruction is needed. The reconstructed noisy input image <inline-formula id="inf35">
<mml:math id="m42">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is produced by the second decoder using <inline-formula id="inf36">
<mml:math id="m43">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x222a;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>.</p>
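The data flow through the two decoders can be sketched as follows. In the actual system the synthesis transforms are learned networks; here `synthesis` is a stand-in (an assumption of this sketch) so that only the routing of the sub-latents is shown.

```python
import numpy as np

def synthesis(latent):
    # Placeholder for a learned synthesis transform; it maps a latent tensor
    # to a single-channel "image" purely for illustration.
    return latent.sum(axis=-1)

# Decoded sub-latents (shapes match the sigma = 50 configuration in Section 5)
Y1_hat = np.ones((16, 16, 160))   # base: always decoded
Y2_hat = np.ones((16, 16, 32))    # enhancement: decoded only on demand

# Base decoder: denoised image directly from Y1_hat, no latent-space transform
X_hat = synthesis(Y1_hat)

# Enhancement path: noisy input reconstruction uses the full latent
Y_hat = np.concatenate([Y1_hat, Y2_hat], axis=-1)   # base + enhancement
X_n_hat = synthesis(Y_hat)
```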
<p>Although not pursued in this work, it is worth mentioning that the proposed JICD framework can be extended to perform various computer vision tasks as well, such as image classification or object detection. These tasks typically require clean images, so one can think of the processing pipeline described by the following Markov chain: <inline-formula id="inf37">
<mml:math id="m44">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2192;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2192;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>T</mml:mi>
</mml:math>
</inline-formula>, where <italic>T</italic> is the output of a computer vision task, for example, a class label or object bounding boxes. Applying the DPI to this Markov chain, we have<disp-formula id="e8">
<mml:math id="m45">
<mml:mi>I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2265;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(8)</label>
</disp-formula>which implies that a subset of information from <inline-formula id="inf38">
<mml:math id="m46">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is sufficient to produce <italic>T</italic>. Hence, if such tasks are required, the encoder&#x2019;s latent space can be further partitioned by splitting <inline-formula id="inf39">
<mml:math id="m47">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-script">Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, in a manner similar to <xref ref-type="bibr" rid="B10">Choi and Baji&#x107; (2021</xref>, <xref ref-type="bibr" rid="B11">2022)</xref>, to support such tasks at an even lower bitrate than our base layer.</p>
</sec>
<sec id="s4-3">
<title>4.3 Training</title>
<p>The model is trained end-to-end with a rate-distortion Lagrangian loss function of the form:<disp-formula id="e9">
<mml:math id="m48">
<mml:mi mathvariant="script">L</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
</mml:math>
<label>(9)</label>
</disp-formula>where <italic>R</italic> is an estimate of rate, <italic>D</italic> is the total distortion of both tasks, and <italic>&#x3bb;</italic> is the Lagrange multiplier. The estimated rate is affected by latent and hyper-priors as in <xref ref-type="bibr" rid="B27">Minnen et al. (2018)</xref>,<disp-formula id="e10">
<mml:math id="m49">
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="double-struck">E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>log</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xfe38;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mtext>latent</mml:mtext>
</mml:mrow>
</mml:munder>
<mml:mo>&#x2b;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="double-struck">E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>log</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xfe38;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mtext>hyper</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mtext>priors</mml:mtext>
</mml:mrow>
</mml:munder>
<mml:mo>,</mml:mo>
</mml:math>
<label>(10)</label>
</disp-formula>where <italic>x</italic> denotes input data, <inline-formula id="inf40">
<mml:math id="m50">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is the quantized latent data and <inline-formula id="inf41">
<mml:math id="m51">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is the quantized hyper-prior. Total distortion <italic>D</italic> is computed as the weighted average of image denoising distortion and noisy input reconstruction distortion:<disp-formula id="e11">
<mml:math id="m52">
<mml:mi>D</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x22c5;</mml:mo>
<mml:mtext>MSE</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mtext>MSE</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(11)</label>
</disp-formula>where <italic>w</italic> is a trade-off factor that adjusts the relative importance of the two tasks. Automatic differentiation <xref ref-type="bibr" rid="B29">Paszke et al. (2019)</xref> ensures that the gradients from <italic>D</italic> flow to the corresponding parameters without further modification of the back-propagation algorithm.</p>
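The objective in Eqs. (9)&#x2013;(11) can be illustrated numerically on plain arrays. In this sketch `p_y` and `p_z` stand in for the learned likelihoods of the quantized latent and hyper-prior symbols (assumptions here, produced by the entropy models in the real system), so the rate term is their empirical cross-entropy in bits.

```python
import numpy as np

def rd_loss(p_y, p_z, X, X_hat, X_n, X_n_hat, lam, w=0.05):
    """Rate-distortion Lagrangian loss, Eqs. (9)-(11) (illustrative).

    p_y, p_z : model likelihoods of the quantized latent / hyper-prior symbols
    X, X_hat : clean image and denoised output (base task)
    X_n, X_n_hat : noisy image and its reconstruction (enhancement task)
    """
    # Rate estimate, Eq. (10): negative log2-likelihood of latent + hyper-prior
    R = -np.log2(p_y).sum() - np.log2(p_z).sum()

    # Total distortion, Eq. (11): weighted MSEs of the two tasks
    def mse(a, b):
        return np.mean((a - b) ** 2)
    D = (1 - w) * mse(X, X_hat) + w * mse(X_n, X_n_hat)

    # Lagrangian, Eq. (9)
    return R + lam * D

# Example with toy values: two latent symbols of probability 0.5 (1 bit each)
# and one hyper-prior symbol of probability 0.25 (2 bits) give R = 4 bits.
loss = rd_loss(np.array([0.5, 0.5]), np.array([0.25]),
               np.zeros(4), np.zeros(4), np.ones(4), np.zeros(4),
               lam=0.0035)
```

The paper's setting w = 0.05 weights noisy input reconstruction lightly, so most of the distortion budget goes to the denoising task.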
</sec>
</sec>
<sec id="s5">
<title>5 Experimental results</title>
<sec id="s5-1">
<title>5.1 Network training</title>
<p>The proposed multi-task model is trained from scratch using randomly cropped 256 &#xd7; 256 patches from the CLIC dataset <xref ref-type="bibr" rid="B13">CLIC (2019)</xref>. The noisy images are obtained by adding white Gaussian noise (AWGN) at three noise levels <italic>&#x3c3;</italic> &#x3d; {15, 25, 50}, clipping the resulting values to [0, 255], and quantizing the clipped values to mimic how noisy images are stored in practice. The batch size is set to 16. Training is run for 300 epochs using the Adam optimizer with an initial learning rate of 1 &#xd7; 10<sup>&#x2013;4</sup>. The learning rate is reduced by a factor of 0.5 when the training loss plateaus. We trained six different models by varying the value of <italic>&#x3bb;</italic> in <xref ref-type="disp-formula" rid="e9">(9)</xref>; the values used are listed in <xref ref-type="table" rid="T1">Table 1</xref>. For all the models we used <italic>w</italic> &#x3d; 0.05 in <xref ref-type="disp-formula" rid="e11">(11)</xref>. The model for the first rate point (lowest <italic>&#x3bb;</italic>) is trained from scratch, while each subsequent rate point is fine-tuned starting from the previous rate point&#x2019;s weights.</p>
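The noisy-image synthesis described above (AWGN, clipping, 8-bit quantization) can be sketched as follows; this mirrors the text rather than the exact training code.

```python
import numpy as np

def make_noisy(image, sigma, rng):
    """Synthesize a quantized noisy training image from a clean uint8 image.

    Follows the procedure in the text: add AWGN with standard deviation
    sigma, clip to [0, 255], then round to integers to mimic how noisy
    images are stored in practice.
    """
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    noisy = np.clip(noisy, 0, 255)            # clipping
    return np.round(noisy).astype(np.uint8)   # quantization to 8-bit storage

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # a 256x256 patch
sigma = rng.choice([15, 25, 50])   # per-iteration sigma, as in the variable-sigma model
noisy = make_noisy(clean, sigma, rng)
```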
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>
<italic>&#x3bb;</italic> values used for training various models. Higher <italic>&#x3bb;</italic> leads to higher quality and higher bitrates.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model index</th>
<th align="left">1</th>
<th align="left">2</th>
<th align="left">3</th>
<th align="left">4</th>
<th align="left">5</th>
<th align="left">6</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<italic>&#x3bb;</italic>
</td>
<td align="left">0.0035</td>
<td align="left">0.0067</td>
<td align="left">0.013</td>
<td align="left">0.025</td>
<td align="left">0.0483</td>
<td align="left">0.09</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We trained models under two different settings. In the first setting, a separate model is trained for each noise level. In this case, the number of enhancement channels <italic>C</italic> &#x2212; <italic>i</italic> is chosen according to the strength of the noise: for stronger noise, we allocate more channels to the enhancement layer so that it can capture enough information to reconstruct the noise, and the number of enhancement channels is reduced as the noise gets weaker. Specifically, the number of enhancement channels is empirically set to 32, 12, and 2 for <italic>&#x3c3;</italic> &#x3d; 50, <italic>&#x3c3;</italic> &#x3d; 25, and <italic>&#x3c3;</italic> &#x3d; 15, respectively. In the second setting, a single model is trained on all noise levels <italic>&#x3c3;</italic> &#x2208; {50, 25, 15} simultaneously, and the final trained model performs denoising at all noise levels. This is beneficial when the noise level is not known. For this model, we used 180 base channels and 12 enhancement channels, and at each training iteration <italic>&#x3c3;</italic> is chosen uniformly at random from {50, 25, 15}.</p>
</sec>
<sec id="s5-2">
<title>5.2 Data</title>
<p>To evaluate the performance of the proposed JICD framework, four color image datasets are used: 1) CBSD68 <xref ref-type="bibr" rid="B26">Martin et al. (2001)</xref>, 2) Kodak24 <xref ref-type="bibr" rid="B18">Franzen (1999)</xref>, 3) McMaster <xref ref-type="bibr" rid="B45">Zhang et al. (2011)</xref> and 4) the JPEG AI testset <xref ref-type="bibr" rid="B23">ISO/IEC and ITU-T (2022b)</xref>, which is used in the JPEG AI exploration experiments. These datasets contain 68, 24, 18, and 16 images, respectively. The resolution of the images in the Kodak24 and McMaster datasets is fixed at 500 &#xd7; 500. The CBSD68 dataset contains the lowest-resolution images among the four datasets, with image heights and widths ranging between 321 and 481 pixels. The images in the JPEG AI testset are high-resolution, with heights between 872 and 2,456 pixels and widths between 1,336 and 3,680 pixels. The results are reported for two sets of noisy images. In the first set, we added synthesized AWGN to the test images at three noise levels, <italic>&#x3c3;</italic> &#x3d; {15, 25, 50}, and evaluated on the quantized noisy images. In the second set, we used synthesized noise obtained from the noise simulator in <xref ref-type="bibr" rid="B31">Ranjbar Alvar and Baji&#x107; (2022)</xref>, which was also used to generate the final test images for the denoising tasks in the ongoing JPEG AI standardization. This type of noise was not used during the training of the proposed JICD framework; hence, the goal of testing on this second set is to evaluate how well the proposed JICD generalizes to noise not seen during training.</p>
</sec>
<sec id="s5-3">
<title>5.3 Baselines</title>
<p>The denoising performance of the proposed JICD framework is compared against well-established baselines: CBM3D <xref ref-type="bibr" rid="B15">Dabov et al. (2007a)</xref> and FFDNet <xref ref-type="bibr" rid="B44">Zhang et al. (2018)</xref>. CBM3D is an NSS-based denoising method, while FFDNet belongs to the learning-based denoising category. FFDNet is trained using AWGN at a range of noise levels and, at inference time, requires the variance of the noise as input. FFDNet-clip <xref ref-type="bibr" rid="B44">Zhang et al. (2018)</xref> is a version of FFDNet trained with quantized noisy images. Since our focus is on practical settings with quantized noisy images, we used FFDNet-clip as a baseline in the experiments. We also tested the DRUNet denoiser <xref ref-type="bibr" rid="B42">Zhang et al. (2021)</xref>, one of the latest state-of-the-art denoisers. DRUNet assumes that the noise is not quantized and, when tested with quantized noise, performs worse than FFDNet-clip; as a result, we did not include it in the experiments.</p>
<p>Two baselines are established by applying CBM3D and FFDNet-clip directly to noisy images, without compression. To assess the interaction of compression and denoising, we establish one more baseline. In this third baseline, the noisy image is first compressed using the end-to-end image compression model from <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref> (the &#x201c;Cheng model&#x201d;) with an implementation from CompressAI <xref ref-type="bibr" rid="B5">B&#xe9;gaint et al. (2020)</xref>, and then decoded. FFDNet-clip is then used to denoise the decoded noisy image. We refer to this cascaded denoising approach as Cheng &#x2b; FFDNet-clip. It is worth mentioning that Cheng &#x2b; FFDNet-clip, like the proposed JICD framework, produces both the reconstructed noisy image and the denoised image, and hence can also be considered a multi-task approach.</p>
</sec>
<sec id="s5-4">
<title>5.4 Experiments on AWGN removal</title>
<p>We evaluate the baselines and the proposed JICD method using quantized noisy images obtained with AWGN at three noise levels, <italic>&#x3c3;</italic> &#x2208; {15, 25, 50}. The test results for the strongest noise (<italic>&#x3c3;</italic> &#x3d; 50) across the four datasets (CBSD68, Kodak24, McMaster, and JPEG AI) are shown in <xref ref-type="fig" rid="F3">Figure 3</xref> in terms of rate vs. Peak Signal-to-Noise Ratio (PSNR) and in <xref ref-type="fig" rid="F4">Figure 4</xref> in terms of rate vs. Structural Similarity Index Measure (SSIM). The horizontal lines in these figures correspond to applying CBM3D and FFDNet-clip to the raw (uncompressed) noisy images. The blue curve shows the results for Cheng &#x2b; FFDNet-clip; the six points on this curve correspond to the six Cheng models from CompressAI <xref ref-type="bibr" rid="B5">B&#xe9;gaint et al. (2020)</xref>. For JICD, two curves are shown. The orange curve shows the results obtained from the models trained for <italic>&#x3c3;</italic> &#x3d; 50 with 160 base feature channels and 32 enhancement channels. The yellow curve corresponds to the results obtained using the model that was trained with variable <italic>&#x3c3;</italic> values and has 180 base and 12 enhancement channels. The six points on the orange and yellow curves correspond to the six JICD models trained with the <italic>&#x3bb;</italic> values shown in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Denoising rate-PSNR curves for <italic>&#x3c3;</italic> &#x3d;50: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Denoising rate-SSIM curves for <italic>&#x3c3;</italic> &#x3d;50: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g004.tif"/>
</fig>
<p>As seen in <xref ref-type="fig" rid="F3">Figure 3</xref>, for <italic>&#x3c3;</italic> &#x3d; 50, the quality of the images denoised by CBM3D is considerably lower compared to those obtained using FFDNet-clip. It was shown in <xref ref-type="bibr" rid="B44">Zhang et al. (2018)</xref> that CBM3D and FFDNet-clip achieve comparable performance for non-quantized noisy images. Our results show that CBM3D&#x2019;s performance is degraded when the noise deviates (due to clipping and quantization) from the assumed model, at least at high noise levels.</p>
<p>A comparison of the results obtained by JICD and Cheng &#x2b; FFDNet-clip reveals that JICD reduces the bitrate substantially while achieving the same denoising performance as Cheng &#x2b; FFDNet-clip. This is because the Cheng model allocates the entire latent representation to noisy input reconstruction, whereas the proposed method uses only a subset of the latent features to perform denoising. The results of JICD trained with variable <italic>&#x3c3;</italic> are also shown in the curves. Since the number of base channels is larger in this model than in the model trained for <italic>&#x3c3;</italic> &#x3d; 50, its denoising performance is improved.</p>
<p>To summarize the differences between the performance-rate curves, we compute the Bj&#xf8;ntegaard Delta-rate (BD-rate) <xref ref-type="bibr" rid="B6">Bj&#xf8;ntegaard (2001)</xref>. The BD-rate of the proposed JICD compared to Cheng &#x2b; FFDNet-clip on the four datasets is given in the first two rows of <xref ref-type="table" rid="T2">Table 2</xref> for PSNR, and <xref ref-type="table" rid="T3">Table 3</xref> for SSIM. It can be seen that the proposed method achieves up to 80.2% BD-rate savings compared to Cheng &#x2b; FFDNet-clip. Both the JICD and Cheng &#x2b; FFDNet-clip denoising methods outperform CBM3D at all tested rate points for <italic>&#x3c3;</italic> &#x3d; 50. Using the proposed JICD method, we are able to denoise images at a quality close to what FFDNet-clip achieves on raw images, while at the same time compressing the input.</p>
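For readers unfamiliar with the metric, the BD-rate used above can be sketched as follows. This is the standard construction (fit a cubic polynomial of log-rate as a function of quality for each curve, then average the gap over the overlapping quality interval), not the exact script used for the paper's numbers.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta-rate (%) of a test RD curve versus an anchor curve.

    Negative values mean the test curve needs fewer bits, on average,
    for the same quality.
    """
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    # Cubic fits of log-rate as a function of quality (PSNR or SSIM)
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the overlapping quality interval
    lo = max(np.min(psnr_anchor), np.min(psnr_test))
    hi = min(np.max(psnr_anchor), np.max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_rate_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_rate_diff) - 1.0) * 100.0

# Hypothetical curves for illustration: halving the rate at every quality
# point yields a BD-rate of -50%.
rates = [1.0, 2.0, 4.0, 8.0]
psnrs = [30.0, 34.0, 38.0, 42.0]
saving = bd_rate(rates, psnrs, [r / 2 for r in rates], psnrs)
```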
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>The PSNR-based BD-rate of the proposed JICD compared to Cheng &#x2b; FFDNet-clip on the image denoising task.</p>
</caption>
<table>
<thead>
<tr>
<th align="left">Noise type</th>
<th align="left">Model</th>
<th align="left">CBSD68</th>
<th align="left">Kodak24</th>
<th align="left">McMaster</th>
<th align="left">JPEG AI</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="2" align="left">AWGN <italic>&#x3c3;</italic> &#x3d; 50</td>
<td align="left">
<italic>&#x3c3;</italic> &#x3d; 50</td>
<td align="left">&#x2212;69.28%</td>
<td align="left">&#x2212;72.91%</td>
<td align="left">&#x2212;72.69%</td>
<td align="left">&#x2212;74.45%</td>
</tr>
<tr>
<td align="left">variable <italic>&#x3c3;</italic>
</td>
<td align="left">&#x2212;70.66%</td>
<td align="left">&#x2212;77.27%</td>
<td align="left">&#x2212;76.55%</td>
<td align="left">&#x2212;80.20%</td>
</tr>
<tr>
<td rowspan="2" align="left">AWGN <italic>&#x3c3;</italic> &#x3d; 25</td>
<td align="left">
<italic>&#x3c3;</italic> &#x3d; 25</td>
<td align="left">&#x2212;30.58%</td>
<td align="left">&#x2212;41.00%</td>
<td align="left">&#x2212;33.18%</td>
<td align="left">&#x2212;45.13%</td>
</tr>
<tr>
<td align="left">variable <italic>&#x3c3;</italic>
</td>
<td align="left">&#x2212;30.28%</td>
<td align="left">&#x2212;42.61%</td>
<td align="left">&#x2212;33.52%</td>
<td align="left">&#x2212;45.77%</td>
</tr>
<tr>
<td rowspan="2" align="left">AWGN <italic>&#x3c3;</italic> &#x3d; 15</td>
<td align="left">
<italic>&#x3c3;</italic> &#x3d; 15</td>
<td align="left">1.07%</td>
<td align="left">&#x2212;11.99%</td>
<td align="left">&#x2212;4.95%</td>
<td align="left">&#x2212;15.82%</td>
</tr>
<tr>
<td align="left">variable <italic>&#x3c3;</italic>
</td>
<td align="left">8.00%</td>
<td align="left">&#x2212;2.99%</td>
<td align="left">9.22%</td>
<td align="left">&#x2212;5.78%</td>
</tr>
<tr>
<td align="left">Practical noise</td>
<td rowspan="2" align="left">variable <italic>&#x3c3;</italic>
</td>
<td rowspan="2" align="left">&#x2212;23.25%</td>
<td rowspan="2" align="left">&#x2212;33.83%</td>
<td rowspan="2" align="left">&#x2212;21.51%</td>
<td rowspan="2" align="left">&#x2212;23.42%</td>
</tr>
<tr>
<td align="left">simulator</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>The SSIM-based BD-rate of the proposed JICD compared to Cheng &#x2b; FFDNet-clip on the image denoising task.</p>
</caption>
<table>
<thead>
<tr>
<th align="left">Noise type</th>
<th align="left">Model</th>
<th align="left">CBSD68</th>
<th align="left">Kodak24</th>
<th align="left">McMaster</th>
<th align="left">JPEG AI</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="2" align="left">AWGN <italic>&#x3c3;</italic> &#x3d; 50</td>
<td align="left">
<italic>&#x3c3;</italic> &#x3d; 50</td>
<td align="left">&#x2212;63.34%</td>
<td align="left">&#x2212;72.08%</td>
<td align="left">&#x2212;72.29%</td>
<td align="left">&#x2212;72.89%</td>
</tr>
<tr>
<td align="left">variable <italic>&#x3c3;</italic>
</td>
<td align="left">&#x2212;64.09%</td>
<td align="left">&#x2212;73.64%</td>
<td align="left">&#x2212;75.57%</td>
<td align="left">&#x2212;76.43%</td>
</tr>
<tr>
<td rowspan="2" align="left">AWGN <italic>&#x3c3;</italic> &#x3d; 25</td>
<td align="left">
<italic>&#x3c3;</italic> &#x3d; 25</td>
<td align="left">&#x2212;24.14%</td>
<td align="left">&#x2212;38.99%</td>
<td align="left">&#x2212;53.40%</td>
<td align="left">&#x2212;43.11%</td>
</tr>
<tr>
<td align="left">variable <italic>&#x3c3;</italic>
</td>
<td align="left">&#x2212;24.45%</td>
<td align="left">&#x2212;39.86%</td>
<td align="left">&#x2212;52.21%</td>
<td align="left">&#x2212;43.97%</td>
</tr>
<tr>
<td rowspan="2" align="left">AWGN <italic>&#x3c3;</italic> &#x3d; 15</td>
<td align="left">
<italic>&#x3c3;</italic> &#x3d; 15</td>
<td align="left">4.52%</td>
<td align="left">&#x2212;11.85%</td>
<td align="left">&#x2212;21.57%</td>
<td align="left">&#x2212;15.32%</td>
</tr>
<tr>
<td align="left">variable <italic>&#x3c3;</italic>
</td>
<td align="left">9.14%</td>
<td align="left">&#x2212;5.96%</td>
<td align="left">&#x2212;8.60%</td>
<td align="left">&#x2212;8.78%</td>
</tr>
<tr>
<td align="left">Practical noise</td>
<td rowspan="2" align="left">variable <italic>&#x3c3;</italic>
</td>
<td rowspan="2" align="left">&#x2212;15.83%</td>
<td rowspan="2" align="left">&#x2212;27.66%</td>
<td rowspan="2" align="left">&#x2212;28.18%</td>
<td rowspan="2" align="left">&#x2212;37.72%</td>
</tr>
<tr>
<td align="left">simulator</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We repeat the denoising experiment for <italic>&#x3c3;</italic> &#x3d; 25, and the results are shown in <xref ref-type="fig" rid="F5">Figure 5</xref> for PSNR and <xref ref-type="fig" rid="F6">Figure 6</xref> for SSIM. As seen in the figures, the gap between the CBM3D and FFDNet-clip performance is now reduced, and the compression-based methods now outperform CBM3D only at the higher rates. The gap between the curves corresponding to JICD and Cheng &#x2b; FFDNet-clip is also reduced. However, JICD still achieves a considerable BD-rate saving compared to Cheng &#x2b; FFDNet-clip, as shown in the third row of <xref ref-type="table" rid="T2">Table 2</xref> and <xref ref-type="table" rid="T3">Table 3</xref>. JICD trained with variable <italic>&#x3c3;</italic> has slightly better PSNR performance compared to the noise-specific model on three datasets, and a slightly worse performance (by 0.3%) on the low-resolution CBSD68 dataset.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Denoising rate-PSNR curves for <italic>&#x3c3;</italic> &#x3d;25: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g005.tif"/>
</fig>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Denoising rate-SSIM curves for <italic>&#x3c3;</italic> &#x3d;25: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g006.tif"/>
</fig>
<p>At the lowest noise level (<italic>&#x3c3;</italic> &#x3d; 15), the gap between CBM3D and FFDNet-clip shrinks further. It can be seen in the denoising rate-PSNR curves in <xref ref-type="fig" rid="F7">Figure 7</xref> and rate-SSIM curves in <xref ref-type="fig" rid="F8">Figure 8</xref> that when the noise is weak, applying denoising to the raw images achieves high PSNR, and the compression-based methods cannot outperform either CBM3D or FFDNet-clip at the tested rates. The gap between the JICD and Cheng &#x2b; FFDNet-clip curves is also reduced compared to the higher noise levels. This can also be seen from the BD-rates in the fourth row of <xref ref-type="table" rid="T2">Tables 2</xref>, <xref ref-type="table" rid="T3">3</xref>. JICD trained for <italic>&#x3c3;</italic> &#x3d; 15 outperforms Cheng &#x2b; FFDNet-clip on three datasets, but it suffers a 1% (4.5% for SSIM) loss on the low-resolution CBSD68.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Denoising rate-PSNR curves for <italic>&#x3c3;</italic> &#x3d;15: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g007.tif"/>
</fig>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Denoising rate-SSIM curves for <italic>&#x3c3;</italic> &#x3d;15: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g008.tif"/>
</fig>
<p>As seen above, the performance of the proposed JICD framework is lower on the low-resolution CBSD68 dataset than on the other datasets. The reason is as follows. The processing pipeline used in JICD expects the input dimensions to be multiples of 64. Images whose dimensions do not satisfy this requirement are padded up to the nearest multiple of 64. At low resolutions, the padded area may be large relative to the original image, which causes noticeable performance degradation. At high resolutions, the padded area is insignificant compared to the original image, and the impact on JICD&#x2019;s performance is correspondingly smaller. It is worth mentioning that for <italic>&#x3c3;</italic> &#x3d; 15, the JICD trained with variable <italic>&#x3c3;</italic> has weaker denoising performance than the model trained specifically for <italic>&#x3c3;</italic> &#x3d; 15. This is because the number of base channels in the variable-<italic>&#x3c3;</italic> model (180) is smaller than in the noise-specific model (190). At low noise levels, fewer channels are needed to hold noise information, so the number of base channels could be higher. Hence, the structure chosen for the noise-specific model is better suited for this case. However, we show in the next subsection that the variable-<italic>&#x3c3;</italic> model is more useful when the noise parameters are not known.</p>
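The padding step can be sketched as follows. Replication padding and the helper name are our own illustrative choices, since the paper does not specify the padding mode.

```python
import numpy as np

def pad_to_multiple(img, multiple=64):
    """Pad the height and width of an image up to the nearest multiple
    of `multiple`, replicating the border pixels (an assumed mode)."""
    h, w = img.shape[:2]
    new_h = -(-h // multiple) * multiple  # ceiling division
    new_w = -(-w // multiple) * multiple
    pad = ((0, new_h - h), (0, new_w - w)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pad, mode="edge")

# A 481x321 CBSD68-sized image is padded to 512x384, so roughly 21% of
# the padded image is padding; a 2048x1536 image needs no padding at all.
```

This makes the resolution effect concrete: the smaller the image, the larger the fraction of the coded signal that is padding rather than content.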
</sec>
<sec id="s5-5">
<title>5.5 Experiments on unseen noise removal</title>
<sec id="s5-5-1">
<title>5.5.1 Image compression and denoising</title>
<p>The proposed JICD denoiser and the baselines are also tested with noise that was not used in training. The purpose of this experiment is to evaluate how well the denoisers handle unseen noise. To generate unseen noise, we used the noise simulator from <xref ref-type="bibr" rid="B31">Ranjbar Alvar and Baji&#x107; (2022)</xref>. This noise simulator, which we subsequently refer to as &#x201c;practical noise simulator,&#x201d; was created by fitting the Poissonian-Gaussian noise model <xref ref-type="bibr" rid="B17">Foi et al. (2008)</xref> to the noise from the Smartphone Image Denoising Dataset (SIDD) <xref ref-type="bibr" rid="B1">Abdelhamed et al. (2018)</xref>. It is worth mentioning that this noise simulator is used in the evaluation of the image denoising task in JPEG AI standardization.</p>
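Under the Poissonian-Gaussian model of Foi et al. (2008), the noise variance grows linearly with the clean signal intensity. A simplified simulator in that spirit might look as follows; the parameters a and b are illustrative placeholders, not the values fitted to SIDD by the practical noise simulator.

```python
import numpy as np

def simulate_noise(clean, a=0.01, b=1e-4, rng=None):
    """Signal-dependent noise: per-pixel variance a*y + b, where y is the
    clean intensity in [0, 1] (a models the Poissonian photon component,
    b the Gaussian readout component). Parameters here are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    y = clean.astype(np.float64)
    std = np.sqrt(np.clip(a * y + b, 0.0, None))
    noisy = y + rng.normal(size=y.shape) * std
    # Clip and quantize, as a real 8-bit imaging pipeline would.
    return np.round(np.clip(noisy, 0.0, 1.0) * 255.0) / 255.0
```

Note that brighter regions receive stronger noise, unlike the AWGN used in training, which is what makes this noise "unseen" for the tested models.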
<p>For this experiment we use the JICD model trained with variable <italic>&#x3c3;</italic>. One advantage of this model is that, unlike some of the baselines, it does not require any additional input or noise information besides the noisy image. FFDNet, on the other hand, needs <italic>&#x3c3;</italic> to perform denoising. In this experiment, <italic>&#x3c3;</italic> is estimated for each image by computing the standard deviation of the difference between the noisy test image and the corresponding clean image.</p>
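This oracle estimate of the noise level can be sketched in one line; the 0-255 intensity scale assumed below is our choice, as the paper does not state the scale.

```python
import numpy as np

def estimate_sigma(noisy, clean):
    """Per-image noise level for the FFDNet baseline: the standard
    deviation of the residual between the noisy and clean images
    (assumed here to be on a 0-255 intensity scale)."""
    residual = noisy.astype(np.float64) - clean.astype(np.float64)
    return float(np.std(residual))
```

Because it uses the clean image, this estimator gives FFDNet-clip close to the best noise-level input it could hope for, which makes the baseline comparison conservative with respect to JICD.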
<p>The denoising rate-PSNR and rate-SSIM curves are illustrated in <xref ref-type="fig" rid="F9">Figures 9</xref>, <xref ref-type="fig" rid="F10">10</xref>, respectively. Since the variance of the noise obtained from the practical noise simulator is not large, the PSNR range of the denoised images is close to that observed in the AWGN experiments with <italic>&#x3c3;</italic> &#x3d; 15 and <italic>&#x3c3;</italic> &#x3d; 25. The results indicate that JICD achieves better denoising performance than Cheng &#x2b; FFDNet-clip across all four datasets. Moreover, at higher bitrates (1 bpp and above), JICD outperforms CBM3D applied to uncompressed noisy images. BD-rate results are summarized in the last row of <xref ref-type="table" rid="T2">Tables 2</xref>, <xref ref-type="table" rid="T3">3</xref>, which show that JICD achieves a 15&#x2013;30% BD-rate gain over Cheng &#x2b; FFDNet-clip across the four datasets.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Denoising rate-PSNR curves for the unseen noise: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g009.tif"/>
</fig>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Denoising rate-SSIM curves for the unseen noise: <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g010.tif"/>
</fig>
<p>A visual example comparing the denoised images obtained from JICD and Cheng &#x2b; FFDNet-clip encoded at similar bitrates is shown in <xref ref-type="fig" rid="F11">Figure 11</xref>. As seen in the figure, JICD preserves more details compared to Cheng &#x2b; FFDNet-clip. In addition, the colors inside the white circle are reproduced closer to the ground truth with JICD compared to the image produced by Cheng &#x2b; FFDNet-clip.</p>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>An example of denoised images. Top to bottom: noisy image, clean image, denoised by Cheng &#x2b; FFDNet-clip (bpp &#x3d; 0.57), denoised by the proposed method (bpp &#x3d; 0.55). Images in the right column show the region marked by the red square in the left images.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g011.tif"/>
</fig>
</sec>
<sec id="s5-5-2">
<title>5.5.2 Noisy image reconstruction</title>
<p>Besides denoising, the proposed JICD framework is also able to reconstruct the noisy input image when enhancement features are decoded together with base features. While the main focus of this work was on denoising (and the majority of the experiments were devoted to that goal), for completeness we also evaluate the noisy image reconstruction performance using unseen noise. We compare the noisy input reconstruction performance of JICD against <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref>, i.e., the compression model used earlier in the Cheng &#x2b; FFDNet-clip baseline. The PSNR between the noisy input and the reconstructed noisy images is shown against bitrate in <xref ref-type="fig" rid="F12">Figure 12</xref>, while <xref ref-type="fig" rid="F13">Figure 13</xref> shows SSIM versus bitrate. As illustrated in <xref ref-type="fig" rid="F12">Figure 12</xref>, our JICD achieves better noisy input reconstruction than <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref> in most cases. BD-rate results corresponding to <xref ref-type="fig" rid="F12">Figures 12</xref>, <xref ref-type="fig" rid="F13">13</xref> are given in <xref ref-type="table" rid="T4">Tables 4</xref>, <xref ref-type="table" rid="T5">5</xref>, respectively. As the numbers in <xref ref-type="table" rid="T4">Table 4</xref> indicate, the proposed JICD achieves noticeable BD-rate savings on three of the four test datasets; the only exception is, again, the low-resolution CBSD68 dataset, where the loss is mainly concentrated at higher bitrates. It is worth noting that, since our proposed method is trained using the MSE loss, it performs better in terms of PSNR than SSIM. Overall, the proposed JICD framework achieves gains on both the denoising and compression tasks compared to the Cheng &#x2b; FFDNet-clip and <xref ref-type="bibr" rid="B9">Cheng et al. (2020)</xref> models.</p>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption>
<p>The rate-PSNR curves for noisy input reconstruction. <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG-AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g012.tif"/>
</fig>
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption>
<p>The rate-SSIM curves for noisy input reconstruction. <bold>(A)</bold> CBSD68, <bold>(B)</bold> Kodak24, <bold>(C)</bold> McMaster, <bold>(D)</bold> JPEG-AI.</p>
</caption>
<graphic xlink:href="frsip-02-932873-g013.tif"/>
</fig>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>The PSNR-based BD-rate of the proposed JICD compared to the Cheng model on noisy input reconstruction.</p>
</caption>
<table>
<thead>
<tr>
<th align="left">Noise type</th>
<th align="left">Model</th>
<th align="left">CBSD68</th>
<th align="left">Kodak24</th>
<th align="left">McMaster</th>
<th align="left">JPEG AI</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Practical noise</td>
<td rowspan="2" align="left">variable <italic>&#x3c3;</italic>
</td>
<td rowspan="2" align="left">5.50%</td>
<td rowspan="2" align="left">&#x2212;11.74%</td>
<td rowspan="2" align="left">&#x2212;3.97%</td>
<td rowspan="2" align="left">&#x2212;13.49%</td>
</tr>
<tr>
<td align="left">simulator</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>The SSIM-based BD-rate of the proposed JICD compared to the Cheng model on noisy input reconstruction.</p>
</caption>
<table>
<thead>
<tr>
<th align="left">Noise type</th>
<th align="left">Model</th>
<th align="left">CBSD68</th>
<th align="left">Kodak24</th>
<th align="left">McMaster</th>
<th align="left">JPEG AI</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Practical noise</td>
<td rowspan="2" align="left">variable <italic>&#x3c3;</italic>
</td>
<td rowspan="2" align="left">22.58%</td>
<td rowspan="2" align="left">1.90%</td>
<td rowspan="2" align="left">4.05%</td>
<td rowspan="2" align="left">0.58%</td>
</tr>
<tr>
<td align="left">simulator</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="s6">
<title>6 Conclusion</title>
<p>In this work, we presented a joint image compression and denoising framework. The proposed framework is a scalable multi-task image compression model based on latent-space scalability. The base features are used to perform denoising, while the enhancement features are used when noisy input reconstruction is needed. Extensive experiments show that the proposed framework achieves significant BD-rate savings, up to 80.20% across the tested datasets, compared to the cascaded compression and denoising method. The experimental results also indicate that the proposed method achieves improved results on unseen noise for both the denoising and noisy input reconstruction tasks.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: For the CBSD68 and Kodak24 datasets: <ext-link ext-link-type="uri" xlink:href="https://github.com/cszn/KAIR">https://github.com/cszn/KAIR</ext-link>, For the McMaster dataset: <ext-link ext-link-type="uri" xlink:href="https://www4.comp.polyu.edu.hk/%7Ecslzhang/CDM_Dataset.htm">https://www4.comp.polyu.edu.hk/~cslzhang/CDM_Dataset.htm</ext-link>, For JPEG AI: <ext-link ext-link-type="uri" xlink:href="https://jpeg.org/jpegai/dataset.html">https://jpeg.org/jpegai/dataset.html</ext-link>.</p>
</sec>
<sec id="s8">
<title>Author contributions</title>
<p>SA and IB contributed to conception and design of the study. HC developed the initial code. MU and SA contributed to further code development and optimization. SA wrote the first draft of the manuscript and worked with IB on the revisions.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>Funding for this work was provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada under the grants RGPIN-2021-02485 and RGPAS-2021-00038, and by Huawei Technologies.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abdelhamed</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>M. S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A high-quality denoising dataset for smartphone cameras</article-title>. <source>Proc. CVPR</source> <volume>18</volume>, <fpage>1692</fpage>&#x2013;<lpage>1700</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr.2018.00182</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alves de Oliveira</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Chabert</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Oberlin</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Poulliat</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bruno</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Latry</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Satellite image compression and denoising with neural networks</article-title>. <source>IEEE Geosci. Remote Sens. Lett.</source> <volume>19</volume>, <fpage>1</fpage>&#x2013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/lgrs.2022.3145992</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ball&#xe9;</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Laparra</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Simoncelli</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>End-to-end optimized image compression</article-title>. <source>Proc. ICLR</source> <volume>17</volume>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ball&#xe9;</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Minnen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Johnston</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Variational image compression with a scale hyperprior</article-title>. <source>Proc. ICLR</source> <volume>18</volume>. </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>B&#xe9;gaint</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Racap&#xe9;</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Feltman</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pushparaja</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Compressai: A pytorch library and evaluation platform for end-to-end compression research</article-title>. <comment>
<italic>arXiv preprint arXiv:2011.03029</italic>
</comment>. </citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bj&#xf8;ntegaard</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2001</year>). &#x201c;<article-title>VCEG-M33: Calculation of average PSNR differences between RD-curves</article-title>,&#x201d; in <source>
<italic>Video coding experts group (VCEG)</italic> (ITU &#x2013;telecommunications standardization)</source>. </citation>
</ref>
<ref id="B7">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Burger</surname>
<given-names>H. C.</given-names>
</name>
<name>
<surname>Schuler</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Harmeling</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2012</year>). &#x201c;<article-title>Image denoising: Can plain neural networks compete with bm3d?</article-title>,&#x201d; in <conf-name>2012 IEEE conference on computer vision and pattern recognition</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>2392</fpage>&#x2013;<lpage>2399</lpage>. </citation>
</ref>
<ref id="B8">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Image blind denoising with generative adversarial network based noise modeling</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name>, <fpage>3155</fpage>&#x2013;<lpage>3164</lpage>. </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Takeuchi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Katto</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Learned image compression with discretized Gaussian mixture likelihoods and attention modules</article-title>. <source>Proc. IEEE CVPR</source> <volume>20</volume>, <fpage>7936</fpage>&#x2013;<lpage>7945</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr42600.2020.00796</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Baji&#x107;</surname>
<given-names>I. V.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Latent-space scalability for multi-task collaborative intelligence</article-title>. <source>Proc. IEEE ICIP.</source>, <fpage>3562</fpage>&#x2013;<lpage>3566</lpage>. <pub-id pub-id-type="doi">10.1109/icip42928.2021.9506712</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Baji&#x107;</surname>
<given-names>I. V.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Scalable image coding for humans and machines</article-title>. <source>IEEE Trans. Image Process.</source> <volume>31</volume>, <fpage>2739</fpage>&#x2013;<lpage>2754</lpage>. <pub-id pub-id-type="doi">10.1109/tip.2022.3160602</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>El-Khamy</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Variable rate deep image compression with a conditional autoencoder</article-title>. <source>Proc. IEEE/CVF ICCV.</source>, <fpage>3146</fpage>&#x2013;<lpage>3154</lpage>. </citation>
</ref>
<ref id="B13">
<citation citation-type="web">
<collab>CLIC</collab> (<year>2019</year>). <article-title>Challenge on learned image compression (CLIC)</article-title>. <comment>[Online]: <ext-link ext-link-type="uri" xlink:href="http://www.compression.cc/">http://www.compression.cc/</ext-link>.</comment> </citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Cover</surname>
<given-names>T. M.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>J. A.</given-names>
</name>
</person-group> (<year>2006</year>). <source>Elements of information theory</source>. <edition>2nd edn</edition>. <publisher-name>Wiley</publisher-name>. </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dabov</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Foi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Katkovnik</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Egiazarian</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2007a</year>). <article-title>Color image denoising via sparse 3d collaborative filtering with grouping constraint in luminance-chrominance space</article-title>. <source>Proc. IEEE ICIP</source> <volume>07</volume> (<issue>1</issue>), <fpage>313</fpage>&#x2013;<lpage>316</lpage>. </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dabov</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Foi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Katkovnik</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Egiazarian</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2007b</year>). <article-title>Image denoising by sparse 3-d transform-domain collaborative filtering</article-title>. <source>IEEE Trans. Image Process.</source> <volume>16</volume>, <fpage>2080</fpage>&#x2013;<lpage>2095</lpage>. <pub-id pub-id-type="doi">10.1109/tip.2007.901238</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Foi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Trimeche</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Katkovnik</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Egiazarian</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data</article-title>. <source>IEEE Trans. Image Process.</source> <volume>17</volume>, <fpage>1737</fpage>&#x2013;<lpage>1754</lpage>. <pub-id pub-id-type="doi">10.1109/tip.2008.2001399</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Franzen</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>1999</year>). <source>Kodak lossless true color image suite</source>. <comment>
<italic>source:</italic> <ext-link ext-link-type="uri" xlink:href="http://r0k.us/graphics/kodak/">http://r0k.us/graphics/kodak/</ext-link>.</comment> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Weighted nuclear norm minimization with application to image denoising</article-title>. <source>Proc. IEEE CVPR</source> <volume>14</volume>, <fpage>2862</fpage>&#x2013;<lpage>2869</lpage>. </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Toward convolutional blind denoising of real photographs</article-title>. <source>Proc. IEEE CVPR</source> <volume>19</volume>, <fpage>1712</fpage>&#x2013;<lpage>1722</lpage>. </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Causal contextual prediction for learned image compression</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>32</volume>, <fpage>2329</fpage>&#x2013;<lpage>2341</lpage>. <pub-id pub-id-type="doi">10.1109/tcsvt.2021.3089491</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<collab>ISO/IEC and ITU-T</collab> (<year>2022a</year>). <article-title>Final call for proposals for JPEG AI. ISO/IEC JTC 1/SC29/WG1 N100095</article-title> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<collab>ISO/IEC and ITU-T</collab> (<year>2022b</year>). <article-title>JPEG AI use cases and requirements. ISO/IEC JTC 1/SC29/WG1 N100094</article-title> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnston</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Vincent</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Minnen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Covell</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chinen</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks</article-title>. <source>Proc. IEEE/CVF CVPR.</source>, <fpage>4385</fpage>&#x2013;<lpage>4393</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr.2018.00461</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Laine</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Karras</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lehtinen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Aila</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>High-quality self-supervised deep image denoising</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>32</volume>. </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname>
<given-names>D. R.</given-names>
</name>
<name>
<surname>Fowlkes</surname>
<given-names>C. C.</given-names>
</name>
<name>
<surname>Tal</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Malik</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics</article-title>. <source>Proc. IEEE ICCV</source>, <fpage>416</fpage>&#x2013;<lpage>425</lpage>. </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Minnen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ball&#xe9;</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Toderici</surname>
<given-names>G. D.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Joint autoregressive and hierarchical priors for learned image compression</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>31</volume>, <fpage>10771</fpage>&#x2013;<lpage>10780</lpage>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Minnen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Toderici</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Covell</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Chinen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Johnston</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Shor</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Spatially adaptive image compression using a tiled deep network</article-title>. <source>Proc. IEEE ICIP.</source>, <fpage>2796</fpage>&#x2013;<lpage>2800</lpage>. </citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Paszke</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gross</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Massa</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lerer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bradbury</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chanan</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). &#x201c;<article-title>PyTorch: An imperative style, high-performance deep learning library</article-title>,&#x201d; in <source>Advances in neural information processing systems</source> (<publisher-name>Curran Associates, Inc.</publisher-name>), <volume>32</volume>, <fpage>8024</fpage>&#x2013;<lpage>8035</lpage>. </citation>
</ref>
<ref id="B30">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Quan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Self2self with dropout: Learning self-supervised denoising from single image</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</conf-name>, <fpage>1890</fpage>&#x2013;<lpage>1898</lpage>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ranjbar Alvar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Baji&#x107;</surname>
<given-names>I. V.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Practical noise simulation for RGB images</article-title>. <comment>
<italic>arXiv preprint arXiv:2201.12773</italic>
</comment>. </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schwarz</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Marpe</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wiegand</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Overview of the scalable video coding extension of the H.264/AVC standard</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>17</volume>, <fpage>1103</fpage>&#x2013;<lpage>1120</lpage>. <pub-id pub-id-type="doi">10.1109/TCSVT.2007.905532</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sebai</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Multi-rate deep semantic image compression with quantized modulated autoencoder</article-title>. <source>Proc. IEEE MMSP.</source>, <fpage>1</fpage>&#x2013;<lpage>6</lpage>. </citation>
</ref>
<ref id="B34">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Testolina</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Upenik</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Ebrahimi</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2021</year>). &#x201c;<article-title>Towards image denoising in the latent space of learning-based compression</article-title>,&#x201d; in <source>Appl. Digital Image Process. XLIV</source> (<publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>SPIE</publisher-name>), <volume>11842</volume>, <fpage>412</fpage>&#x2013;<lpage>422</lpage>. </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Toderici</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>O&#x2019;Malley</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Vincent</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Minnen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Baluja</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Variable rate image compression with recurrent neural networks</article-title>. <source>Proc. ICLR.</source> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Multi-channel weighted nuclear norm minimization for real color image denoising</article-title>. <source>Proc. IEEE Int. Conf. Comput. Vis.</source>, <fpage>1096</fpage>&#x2013;<lpage>1104</lpage>. <pub-id pub-id-type="doi">10.1109/iccv.2017.125</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Patch group based nonlocal self-similarity prior learning for image denoising</article-title>. <source>Proc. IEEE Int. Conf. Comput. Vis.</source>, <fpage>244</fpage>&#x2013;<lpage>252</lpage>. <pub-id pub-id-type="doi">10.1109/iccv.2015.36</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yahya</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Su</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>BM3D image denoising algorithm based on an adaptive filtering</article-title>. <source>Multimed. Tools Appl.</source> <volume>79</volume>, <fpage>20391</fpage>&#x2013;<lpage>20427</lpage>. <pub-id pub-id-type="doi">10.1007/s11042-020-08815-8</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Herranz</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Weijer</surname>
<given-names>J. v. d.</given-names>
</name>
<name>
<surname>Guiti&#xe1;n</surname>
<given-names>J. A. I.</given-names>
</name>
<name>
<surname>L&#xf3;pez</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Mozerov</surname>
<given-names>M. G.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Variable rate deep image compression with modulated autoencoder</article-title>. <source>IEEE Signal Process. Lett.</source> <volume>27</volume>, <fpage>331</fpage>&#x2013;<lpage>335</lpage>. <pub-id pub-id-type="doi">10.1109/lsp.2020.2970539</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Universal efficient variable-rate neural image compression</article-title>. <source>Proc. IEEE ICASSP.</source>, <fpage>2025</fpage>&#x2013;<lpage>2029</lpage>. </citation>
</ref>
<ref id="B41">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zha</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Simultaneous nonlocal self-similarity prior for image denoising</article-title>,&#x201d; in <conf-name>2019 IEEE International Conference on Image Processing (ICIP)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1119</fpage>&#x2013;<lpage>1123</lpage>. </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Van Gool</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Timofte</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Plug-and-play image restoration with deep denoiser prior</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/tpami.2021.3088914</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising</article-title>. <source>IEEE Trans. Image Process.</source> <volume>26</volume>, <fpage>3142</fpage>&#x2013;<lpage>3155</lpage>. <pub-id pub-id-type="doi">10.1109/tip.2017.2662206</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>FFDNet: Toward a fast and flexible solution for CNN-based image denoising</article-title>. <source>IEEE Trans. Image Process.</source> <volume>27</volume>, <fpage>4608</fpage>&#x2013;<lpage>4622</lpage>. <pub-id pub-id-type="doi">10.1109/tip.2018.2839891</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Buades</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Color demosaicking by local directional interpolation and nonlocal adaptive thresholding</article-title>. <source>J. Electron. Imaging</source> <volume>20</volume>, <fpage>023016</fpage>. <pub-id pub-id-type="doi">10.1117/1.3600632</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>