<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Sci.</journal-id>
<journal-title>Frontiers in Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Sci.</abbrev-journal-title>
<issn pub-type="epub">2624-9898</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcomp.2022.910233</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computer Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Face beautification: Beyond makeup transfer</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Xudong</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1659816/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Ruizhe</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Peng</surname> <given-names>Hao</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Yin</surname> <given-names>Minglei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname> <given-names>Chih-Fan</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Li</surname> <given-names>Xin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1299931/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Lane Department of Computer Science and Electrical Engineering, West Virginia University</institution>, <addr-line>Morgantown, WV</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>ObEN, Inc.</institution>, <addr-line>Pasadena, CA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Matteo Ferrara, University of Bologna, Italy</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Hongfu Liu, Brandeis University, United States; Sudha Velusamy, Samsung, India</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Xin Li <email>xin.li&#x00040;mail.wvu.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Computer Vision, a section of the journal Frontiers in Computer Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>4</volume>
<elocation-id>910233</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>04</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>09</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Liu, Wang, Peng, Yin, Chen and Li.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Liu, Wang, Peng, Yin, Chen and Li</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Facial appearance plays an important role in our social lives. Subjective perception of women&#x00027;s beauty depends on various face-related (e.g., skin, shape, hair) and environmental (e.g., makeup, lighting, angle) factors. Similarly to cosmetic surgery in the physical world, virtual face beautification is an emerging field with many open issues to be addressed. Inspired by the latest advances in style-based synthesis and face beauty prediction, we propose a novel framework for face beautification. For a given reference face with a high beauty score, our GAN-based architecture is capable of translating an inquiry face into <italic>a sequence of</italic> beautified face images with the referenced beauty style and the target beauty score values. To achieve this objective, we propose to integrate both style-based beauty representation (extracted from the reference face) and beauty score prediction (trained on the SCUT-FBP database) into the beautification process. Unlike makeup transfer, our approach targets many-to-many (instead of one-to-one) translation, where multiple outputs can be defined by different references with various beauty scores. Extensive experimental results are reported to demonstrate the effectiveness and flexibility of the proposed face beautification framework. To support reproducible research, the source codes accompanying this work will be made publicly available on GitHub.</p></abstract>
<kwd-group>
<kwd>face beautification</kwd>
<kwd>GAN</kwd>
<kwd>beauty representation</kwd>
<kwd>prediction</kwd>
<kwd>translation</kwd>
</kwd-group>
<counts>
<fig-count count="11"/>
<table-count count="2"/>
<equation-count count="9"/>
<ref-count count="43"/>
<page-count count="12"/>
<word-count count="6528"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Facial appearance plays an important role in our social lives (Bull and Rumsey, <xref ref-type="bibr" rid="B3">2012</xref>). People with attractive faces have many advantages in their social activities, such as dating and voting (Little et al., <xref ref-type="bibr" rid="B22">2011</xref>). Attractive people have been found to have higher chances of dating (Riggio and Woll, <xref ref-type="bibr" rid="B33">1984</xref>), and their partners are more likely to gain satisfaction compared to less attractive people (Berscheid et al., <xref ref-type="bibr" rid="B1">1971</xref>). Faces have also been found to affect hiring decisions and influence voting behavior (Little et al., <xref ref-type="bibr" rid="B22">2011</xref>). Overwhelmed by the social fascination with beauty, women with unattractive faces can suffer from social isolation, depression, and even psychological disorders (Macgregor, <xref ref-type="bibr" rid="B26">1989</xref>; Phillips et al., <xref ref-type="bibr" rid="B31">1993</xref>; Bradbury, <xref ref-type="bibr" rid="B2">1994</xref>; Rankin et al., <xref ref-type="bibr" rid="B32">1998</xref>; Bull and Rumsey, <xref ref-type="bibr" rid="B3">2012</xref>). Consequently, there is strong demand for face beautification both in the physical world (e.g., facial makeup and cosmetic surgeries) and in the virtual space (e.g., beautification cameras and filters). To our knowledge, there is no existing work on face beautification that can achieve fine-granularity control of beauty scores.</p>
<p>The problem of face beautification has been extensively studied by philosophers, psychologists, and plastic surgeons. Rapid advances in imaging technology and social networks have greatly expedited the popularity of digital photos, especially selfies, in our daily lives. Most recently, virtual face beautification based on the idea of applying or transferring makeup has been developed in computer vision communities, such as PairedCycleGAN (Chang et al., <xref ref-type="bibr" rid="B4">2018</xref>), BeautyGAN (Li et al., <xref ref-type="bibr" rid="B20">2018</xref>), and BeautyGlow (Chen et al., <xref ref-type="bibr" rid="B5">2019</xref>). Although these existing works have achieved impressive results, we argue that makeup transfer-based face beautification has fundamental limitations. Without changing important facial attributes (e.g., shape and lentigo), makeup application, abstracted as image-to-image translation (Zhu et al., <xref ref-type="bibr" rid="B43">2017</xref>; Huang et al., <xref ref-type="bibr" rid="B16">2018</xref>; Lee et al., <xref ref-type="bibr" rid="B19">2018</xref>), can only improve the beauty score to a limited extent.</p>
<p>A more flexible and promising framework is to formulate the face beautification process as a <italic>one-to-many</italic> translation, where the destination can be defined in many ways (<xref ref-type="fig" rid="F1">Figure 1</xref>). The motivation behind this work is two-fold. On the one hand, we can target producing a sequence of output images with monotonically increasing beauty scores by gradually transferring the style-based beauty representation learned from a given reference (with a high beauty score). On the other hand, we can also produce a variety of personalized beautification results by learning from a sequence of references (e.g., celebrities with different beauty styles). In this framework, face beautification can be made more flexible; for example, we can transfer the beauty style from a reference image to reach a specified beauty score, which is beyond the reach of makeup transfer (Li et al., <xref ref-type="bibr" rid="B20">2018</xref>; Chen et al., <xref ref-type="bibr" rid="B5">2019</xref>).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Face beautification as many-to-many image translation: Our approach integrates style-based beauty representation with a beauty score prediction model and is capable of fine-granularity control.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0001.tif"/>
</fig>
<p>To achieve this objective, we propose a novel architecture based on generative adversarial networks (GANs). Inspired by the latest advances in style-based synthesis [e.g., styleGAN (Karras et al., <xref ref-type="bibr" rid="B17">2019</xref>)] and face beauty understanding from data (Liu et al., <xref ref-type="bibr" rid="B23">2019</xref>), we propose to integrate both style-based beauty representation (extracted from the reference face) and beauty score prediction (trained on the SCUT-FBP database Xie et al., <xref ref-type="bibr" rid="B41">2015</xref>) into the face beautification process. More specifically, style-based beauty representations will first be learned from both inquiry and reference images <italic>via</italic> a light convolutional neural network (LightCNN) and used to guide the process of style transfer (actual beautification). Then, a dedicated GAN-based architecture is constructed and integrated with the reconstruction, beauty, and identity loss functions. To have fine-granularity control of the beautification process, we have invented a simple, yet effective, reweighting strategy that gradually improves the beauty score in synthesized images until reaching the target (specified by the reference image).</p>
<p>Our key contributions are summarized below.</p>
<list list-type="bullet">
<list-item><p>A forward-looking view toward virtual face beautification and a holistic style-based approach beyond makeup transfer (e.g., BeautyGAN and BeautyGlow). We argue that facial beauty scores offer a quantitative solution to guide the facial beautification process.</p></list-item>
<list-item><p>A face beauty prediction network is trained and integrated into the proposed style-based face beautification network. The prediction module provides valuable feedback to the synthesis module as we approach the desirable beauty score.</p></list-item>
<list-item><p>A piggyback trick to extract both identity and beauty features from fine-tuned LightCNN and the design of loss functions, reflecting the trade-off between identity preservation and face beautification.</p></list-item>
<list-item><p>To the best of our knowledge, this is the first work capable of delivering facial beautification results with fine granularity control (that is, a sequence of face images that approach the reference with increasing beauty scores monotonically).</p></list-item>
<list-item><p>A comprehensive evaluation shows the superiority of the proposed approach compared to existing state-of-the-art image-to-image transfer techniques, including CycleGAN (Zhu et al., <xref ref-type="bibr" rid="B43">2017</xref>), MUNIT (Huang et al., <xref ref-type="bibr" rid="B16">2018</xref>), and DRIT (Lee et al., <xref ref-type="bibr" rid="B19">2018</xref>).</p></list-item>
</list></sec>
<sec id="s2">
<title>2. Related works</title>
<sec>
<title>2.1. Makeup and style transfer</title>
<p>Two recent works on face beauty are BeautyGAN (Li et al., <xref ref-type="bibr" rid="B20">2018</xref>) and BeautyGlow (Chen et al., <xref ref-type="bibr" rid="B5">2019</xref>). In BeautyGlow (Chen et al., <xref ref-type="bibr" rid="B5">2019</xref>), the makeup features (e.g., eyeshadows and lip gloss) are first extracted from the reference makeup images and then transferred to the source non-makeup images. A magnification parameter in the latent space can be tuned to control the extent of the makeup. In BeautyGAN (Li et al., <xref ref-type="bibr" rid="B20">2018</xref>), the issue of extracting and transferring local and delicate makeup information was addressed by incorporating a global domain-level loss and a local instance-level loss in a dual-input/output GAN.</p>
<p>Face beautification is also related to more general image-to-image translation. Both symmetric (e.g., CycleGAN Zhu et al., <xref ref-type="bibr" rid="B43">2017</xref>) and asymmetric (e.g., PairedCycleGAN Chang et al., <xref ref-type="bibr" rid="B4">2018</xref>) architectures have been studied in the literature; the latter was shown to be effective for makeup application and removal. Extensions of style transfer to the multimodal domain (that is, one-to-many translation) have been considered in MUNIT (Huang et al., <xref ref-type="bibr" rid="B16">2018</xref>) and DRIT (Lee et al., <xref ref-type="bibr" rid="B19">2018</xref>). It is also worth mentioning the synthesis of face images <italic>via</italic> StyleGAN (Karras et al., <xref ref-type="bibr" rid="B17">2019</xref>), which has shown super-realistic performance.</p>
</sec>
<sec>
<title>2.2. Face beauty prediction</title>
<p>Perception of facial appearance or attractiveness is a classical topic in psychology and cognitive sciences (Perrett et al., <xref ref-type="bibr" rid="B30">1998</xref>, <xref ref-type="bibr" rid="B29">1999</xref>; Thornhill and Gangestad, <xref ref-type="bibr" rid="B36">1999</xref>). However, developing a computational algorithm that can automatically predict beauty scores from facial images is only a recent endeavor (Eisenthal et al., <xref ref-type="bibr" rid="B9">2006</xref>; Gan et al., <xref ref-type="bibr" rid="B11">2014</xref>). Thanks to the public release of the SCUT-FBP face beauty database (Xie et al., <xref ref-type="bibr" rid="B41">2015</xref>), there has been a growing interest in machine learning-based approaches to face beauty prediction (Fan et al., <xref ref-type="bibr" rid="B10">2017</xref>; Xu et al., <xref ref-type="bibr" rid="B42">2017</xref>).</p></sec></sec>
<sec id="s3">
<title>3. Proposed method</title>
<sec>
<title>3.1. Facial attractiveness theory</title>
<p>Why does facial attractiveness matter? From an evolutionary perspective, a plausible working hypothesis is that the psychological mechanisms underlying primates&#x00027; judgments about attractiveness are consequences of long-term evolution and adaptation. More specifically, facial attractiveness is beneficial in choosing a partner, which in turn facilitates gene propagation (Thornhill and Gangestad, <xref ref-type="bibr" rid="B36">1999</xref>). At the primitive level, facial attractiveness is hypothesized to reflect information about an individual&#x00027;s health. Consequently, conventional wisdom in the research on facial attractiveness has focused on <italic>ad hoc</italic> attributes such as facial symmetry and averageness as potential biomarkers. In the history of modern civilization, the social norm of facial attractiveness has constantly evolved and varies from region to region (e.g., the sharp contrast between eastern and western cultures Cunningham, <xref ref-type="bibr" rid="B6">1986</xref>).</p>
<p>In particular, facial attractiveness for young women is a stimulating topic, as evidenced by the long-lasting popularity of beauty pageants. In Cunningham (<xref ref-type="bibr" rid="B6">1986</xref>), the relationship between the facial characteristics of women and the responses of men was investigated. Based on the attractiveness ratings of male subjects, two classes of facial features (e.g., large eyes, a small nose, and a small chin; prominent cheekbones and narrow cheeks) were found to be positively correlated with the attractiveness ratings. It is also known from the same study (Cunningham, <xref ref-type="bibr" rid="B6">1986</xref>) that facial features can predict personality attributions and altruistic tendencies. In this work, we focus on facial beauty for women only.</p>
</sec>
<sec>
<title>3.2. Problem formulation and motivation</title>
<p>Given a target face (an ordinary one that is less attractive) and a reference face (usually a celebrity one with a high beauty score), how can we beautify the target face by transferring relevant information from the reference image? This problem of facial beautification can be formulated as two subproblems: <italic>style transfer</italic> and <italic>beauty prediction</italic>. Meanwhile, an important new insight brought into our problem formulation is that facial beautification is treated as a sequential process in which the beauty score of the target face can be gradually improved by successive style transfer steps. As the fine-granularity style transfer proceeds, the beauty score of the beautified target face will monotonically approach that of the reference face.</p>
<p>The problem of style transfer has been extensively studied in the literature, dating back to content-style separation (Tenenbaum and Freeman, <xref ref-type="bibr" rid="B35">2000</xref>). The idea of extracting style-based representation (style code) has attracted increasingly more attention in recent years, e.g. Mathieu et al. (<xref ref-type="bibr" rid="B28">2016</xref>), Donahue et al. (<xref ref-type="bibr" rid="B7">2017</xref>), Huang and Belongie (<xref ref-type="bibr" rid="B15">2017</xref>), Huang et al. (<xref ref-type="bibr" rid="B16">2018</xref>), and Lee et al. (<xref ref-type="bibr" rid="B19">2018</xref>). Note that makeup transfer only represents a special case where style is characterized by local features only (e.g., eyeshadow and lipstick). In this work, we conceive of a more generalized solution to transfer both global and local style codes from the reference image. The extraction of style codes will be based on the solution to the other problem of beauty prediction. Such a sharing of learned features between style transfer and beauty prediction allows us to achieve fine-granularity control over the process of beautification.</p>
</sec>
<sec>
<title>3.3. Architecture design</title>
<p>As illustrated in <xref ref-type="fig" rid="F2">Figure 2</xref>, we use <italic>A</italic> and <italic>B</italic> to denote the target face (unattractive) and the reference face (attractive), respectively. The objective of beautification is to translate the image <italic>A</italic> into a new image <italic>AB</italic> whose beauty score is <italic>Q</italic>-percent close to that of <italic>B</italic> (<italic>Q</italic> is an integer between 0 and 100 specifying the granularity of the beauty transfer). Assume that both images <italic>A</italic> and <italic>B</italic> can be decomposed into a two-part representation consisting of style and content. That is, both images will be encoded by a pair of encoders: content encoder (identity) <italic>E</italic><sub><italic>c</italic></sub> and style encoder (beauty) <italic>E</italic><sub><italic>s</italic></sub>, respectively. In order to transfer the beauty style from reference <italic>B</italic> to target <italic>A</italic>, it is natural to concatenate the representation based on content (identity) <italic>C</italic><sub><italic>a</italic></sub> with the representation based on style (beauty) <italic>S</italic><sub><italic>b</italic></sub>; and then reconstruct the beautified image &#x000C3; through a dedicated decoder <italic>G</italic> defined by</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The rest of our architecture in <xref ref-type="fig" rid="F2">Figure 2</xref> mainly includes two components: a GAN-based module (<italic>G</italic> pairs with <italic>D</italic>) responsible for style transfer and a beauty and identity loss module responsible for beauty prediction (please refer to <xref ref-type="fig" rid="F3">Figure 3</xref>).</p>
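As a schematic illustration of Eq. (1), the composition of the two encoders and the decoder can be sketched with toy linear maps. The names mirror the equation; the fixed random projections are hypothetical stand-ins for the actual convolutional encoders and decoder, not the paper's trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the learned mappings in Eq. (1). The real E_c, E_s,
# and G are convolutional networks; here they are fixed linear maps.
D_IMG, D_CONTENT, D_STYLE = 64, 16, 8
W_c = rng.standard_normal((D_CONTENT, D_IMG))
W_s = rng.standard_normal((D_STYLE, D_IMG))
W_g = rng.standard_normal((D_IMG, D_CONTENT + D_STYLE))

def E_c(x):
    """Content (identity) encoder."""
    return W_c @ x

def E_s(x):
    """Style (beauty) encoder."""
    return W_s @ x

def G(c, s):
    """Decoder: reconstructs an image from a (content, style) pair."""
    return W_g @ np.concatenate([c, s])

A = rng.standard_normal(D_IMG)  # target (inquiry) face
B = rng.standard_normal(D_IMG)  # reference face

# Eq. (1): beautified output from A's content code and the fused style codes.
A_tilde = G(E_c(A), E_s(A) + E_s(B))
assert A_tilde.shape == (D_IMG,)
```

The additive fusion of the two style codes follows Eq. (1) term by term; everything else (dimensions, the linear stand-ins) is illustrative only.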
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Overview of the proposed network architecture (training phase). Left: given a source image A and a target image B, our objective is to transfer the style of target image B to A in a continuous (fine-granularity control) manner. Right: (<italic>S</italic><sub><italic>a</italic>/<italic>b</italic></sub>, <italic>C</italic><sub><italic>a</italic>/<italic>b</italic></sub>) denote the style-content encoder for images <italic>A</italic>/<italic>B</italic> respectively (<italic>A</italic>&#x02032;, <italic>B</italic>&#x02032; denote the reconstructed images of <italic>A, B</italic>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>(A)</bold> Fine-tuning network for beauty score prediction. <bold>(B)</bold> Testing stage for fine-granularity beautification adjustment.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0003.tif"/>
</fig>
<p>Our GAN module, consisting of two encoders, one decoder, and one discriminator, aims at distilling the beauty/style representation from the reference image and embedding it into the target image for the purpose of beautification. Inspired by recent work (Ulyanov et al., <xref ref-type="bibr" rid="B37">2017</xref>), we propose to integrate an instance normalization (IN) layer after the convolutional layers as part of the encoder for content feature extraction. Meanwhile, a global average pooling layer and a fully connected layer follow the convolutional layers as part of the encoder for beauty feature extraction. Note that we skip the IN layer in the beauty encoder because IN would remove the characteristics of the original feature that represent critical beauty-related information (Huang and Belongie, <xref ref-type="bibr" rid="B15">2017</xref>) (which is why we keep it within the content encoder). To cooperate with the beauty encoder and speed up translation, the decoder is equipped with Adaptive Instance Normalization (AdaIN) layers (Huang and Belongie, <xref ref-type="bibr" rid="B15">2017</xref>). Furthermore, we have adopted the popular multiscale discriminators (Wang et al., <xref ref-type="bibr" rid="B39">2018b</xref>) with the least squares GAN (LSGAN) loss (Mao et al., <xref ref-type="bibr" rid="B27">2017</xref>) as the discriminator in our GAN module.</p>
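The role of AdaIN in the decoder can be illustrated with a minimal NumPy sketch. The per-channel normalize-then-restyle operation is the standard AdaIN definition (Huang and Belongie, 2017); the feature map and style statistics are toy values, not outputs of the actual networks:

```python
import numpy as np

def adain(content_feat, style_mean, style_std, eps=1e-5):
    """Adaptive Instance Normalization: normalize each channel of the
    content feature map to zero mean / unit variance, then re-scale and
    re-shift it with statistics supplied by the style (beauty) code."""
    # content_feat: (C, H, W); style_mean, style_std: (C,)
    mu = content_feat.mean(axis=(1, 2), keepdims=True)
    sigma = content_feat.std(axis=(1, 2), keepdims=True)
    normalized = (content_feat - mu) / (sigma + eps)
    return style_std[:, None, None] * normalized + style_mean[:, None, None]

rng = np.random.default_rng(1)
feat = rng.standard_normal((4, 8, 8))  # toy content feature map
out = adain(feat, style_mean=np.full(4, 2.0), style_std=np.full(4, 0.5))

# After AdaIN, each channel carries the style statistics, not the content's.
assert np.allclose(out.mean(axis=(1, 2)), 2.0, atol=1e-3)
assert np.allclose(out.std(axis=(1, 2)), 0.5, atol=1e-3)
```

This also makes visible why IN is skipped in the beauty encoder: the per-channel statistics that IN discards are exactly the information AdaIN injects.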
<p>Our beauty prediction module is based on fine-tuning an existing LightCNN (Wu et al., <xref ref-type="bibr" rid="B40">2018</xref>), as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. Because it is difficult to train a deep neural network for beauty prediction from scratch with limited labeled beauty score data, we opt to work with LightCNN (Wu et al., <xref ref-type="bibr" rid="B40">2018</xref>), a model pre-trained for face recognition on millions of face images, and employ a fine-tuning layer (FC2) to adapt it to beauty score prediction (FC2 plays the role of a beauty feature extractor). Meanwhile, to preserve identity during face beautification, we propose taking full advantage of our beauty prediction model by piggybacking on the identity feature it produces. More specifically, the identity feature is generated from the penultimate fully connected layer (FC1) of LightCNN; note that we have only fine-tuned the last fully connected layer (FC2) for beauty prediction. Using this piggyback trick, we managed to extract both identity and beauty features from a single model.</p>
</sec>
<sec>
<title>3.4. Fine-granularity beauty adjustment</title>
<p>As we argued before, beautification should be modeled as a continuous process rather than a discrete domain transfer. To achieve fine-granularity control of the beautification process, we propose the following weighted beautification equation:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>w</italic><sub>1</sub>&#x0002B;<italic>w</italic><sub>2</sub> &#x0003D; 1 and 0 &#x02264; <italic>w</italic><sub>1</sub>, <italic>w</italic><sub>2</sub> &#x02264; 1. It is easy to observe the two extreme cases: 1) Eq. (2) degenerates into reconstruction when <italic>w</italic><sub>1</sub> &#x0003D; 1, <italic>w</italic><sub>2</sub> &#x0003D; 0; 2) Eq. (2) corresponds to the most complete beautification when <italic>w</italic><sub>1</sub> &#x0003D; 0, <italic>w</italic><sub>2</sub> &#x0003D; 1. This linear weighting strategy represents a simple solution to adjust the amount of beautification.</p>
<p>To make our model more robust, we have adopted the following training strategy: Replace <italic>G</italic>[<italic>E</italic><sub><italic>c</italic></sub>(<italic>A</italic>), <italic>E</italic><sub><italic>s</italic></sub>(<italic>A</italic>)&#x0002B;<italic>E</italic><sub><italic>s</italic></sub>(<italic>B</italic>)] with <italic>G</italic>[<italic>E</italic><sub><italic>c</italic></sub>(<italic>A</italic>), <italic>E</italic><sub><italic>s</italic></sub>(<italic>B</italic>)] in the training stage so that we do not need to train multiple weighted models when the weights vary. Instead, we apply the weighted beautification equation of Eq. (2) for testing directly. In other words, we pretend that the beauty feature of the target image <italic>A</italic> is forgotten during training; and partially exploit it during testing (since it is less relevant than the identity feature). In summary, our strategy for fine-granularity beauty adjustment is highly dependent on the ability of the beauty encoder <italic>E</italic><sub><italic>s</italic></sub> to reliably extract beauty representation. The effectiveness of the proposed fine-granularity beauty adjustment can be justified by referring to <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
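A minimal sketch of the reweighting strategy in Eq. (2), with toy vectors standing in for the beauty encoder outputs <italic>E</italic><sub><italic>s</italic></sub>(<italic>A</italic>) and <italic>E</italic><sub><italic>s</italic></sub>(<italic>B</italic>):

```python
import numpy as np

def blend_style(s_a, s_b, w2):
    """Eq. (2): convex combination of the target's and reference's beauty
    codes, with w1 + w2 = 1 controlling the degree of beautification."""
    w1 = 1.0 - w2
    return w1 * s_a + w2 * s_b

s_a = np.array([0.1, -0.3, 0.5])  # toy style (beauty) code of target A
s_b = np.array([0.9, 0.4, -0.2])  # toy style code of reference B

# Sweeping w2 from 0 to 1 yields a sequence of progressively
# beautified style codes (reconstruction -> full beautification).
sequence = [blend_style(s_a, s_b, w2) for w2 in np.linspace(0.0, 1.0, 5)]

assert np.allclose(sequence[0], s_a)   # w2 = 0: pure reconstruction
assert np.allclose(sequence[-1], s_b)  # w2 = 1: full beautification
```

At test time, each blended code would be fed to the decoder in place of the plain style code, producing one frame of the fine-granularity beautification sequence.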
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Beauty degree adjustment by controlled beauty representation (the leftmost is the original input, from left to right: light to heavy beautification).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0004.tif"/>
</fig>
</sec>
<sec>
<title>3.5. Loss functions</title>
<sec>
<title>3.5.1. Image reconstruction</title>
<p>The encoders and the decoder need to ensure that the target and reference images can be approximately reconstructed from the extracted content/style representations. Here, we have adopted the <italic>L</italic><sub>1</sub>-norm for reconstruction loss because it is more robust than the <italic>L</italic><sub>2</sub>-norm.</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">REC</mml:mtext></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>A</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">REC</mml:mtext></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>B</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where ||&#x000B7;||<sub>1</sub> denotes the <italic>L</italic><sub>1</sub> norm.</p></sec>
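The within-domain reconstruction of Equation (3) can be sketched in a few lines. The following is an illustrative NumPy version (not the authors' implementation), where `enc_style`, `enc_content`, and `decode` stand in for E_s, E_c, and G, and the per-pixel mean is used in place of the raw L1 sum, a common normalization:

```python
import numpy as np

def l1_reconstruction_loss(images, enc_content, enc_style, decode):
    """Mean L1 reconstruction error over a batch, as in Equation (3):
    each image is encoded into (style, content) codes and decoded back."""
    losses = [np.abs(decode(enc_style(x), enc_content(x)) - x).mean()
              for x in images]
    return float(np.mean(losses))
```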
<sec>
<title>3.5.2. Adversarial loss</title>
<p>We apply adversarial losses (Goodfellow et al., <xref ref-type="bibr" rid="B12">2014</xref>) to match the distribution of the generated images <italic>AB</italic> with that of the reference data <italic>B</italic>. In other words, the adversarial loss ensures that the beautified face looks as realistic as the reference.</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M"><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mtext>GAN</mml:mtext></mml:mrow><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant='double-struck'>E</mml:mi><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:msub><mml:mo stretchy='false'>[</mml:mo><mml:mi>log</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>D</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>G</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant='double-struck'>E</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:mo stretchy='false'>[</mml:mo><mml:mi>log</mml:mi><mml:mi>D</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where <italic>G</italic>(&#x000C3;) is defined by Equation (1).</p></sec>
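In scalar form, Equation (4) is the classic minimax GAN objective. A minimal NumPy sketch follows (illustrative only; Section 4.2.2 notes that the actual discriminator is trained with the LSGAN variant):

```python
import numpy as np

def gan_value(d_fake, d_real):
    """Value of Equation (4): E[log(1 - D(G(A~)))] + E[log D(B)].

    d_fake: discriminator scores on beautified faces G(A~), each in (0, 1).
    d_real: discriminator scores on reference faces B, each in (0, 1).
    The discriminator D maximizes this value; the generator minimizes it.
    """
    d_fake, d_real = np.asarray(d_fake), np.asarray(d_real)
    return float(np.log(1.0 - d_fake).mean() + np.log(d_real).mean())
```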
<sec>
<title>3.5.3. Identity preservation</title>
<p>To preserve identity information during the beautification process, we adopt an identity loss based on the standard face recognition model LightCNN (Wu et al., <xref ref-type="bibr" rid="B40">2018</xref>), trained on millions of faces. The identity features are extracted from the FC1 layer as a 2<sup>13</sup>-dimensional vector, denoted <italic>f</italic><sub><italic>id</italic></sub>.</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" 
mathvariant="normal">ID</mml:mtext></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M8"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M9"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are responsible for the preservation of identity and <inline-formula><mml:math id="M10"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>B</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> aims to preserve identity after beautification. Note that our objective is not only to preserve the identity but also to improve the beauty of the generated image <italic>AB</italic> as jointly constrained by Equations (4) and (5).</p></sec>
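Each term in Equation (5) is an L1 distance between two identity embeddings. A hedged sketch (the embeddings here are plain vectors, standing in for LightCNN's FC1 output):

```python
import numpy as np

def feature_l1_loss(feat_a, feat_b):
    """L1 distance between two feature vectors, the form shared by
    every term of the identity loss in Equation (5)."""
    return float(np.abs(np.asarray(feat_a) - np.asarray(feat_b)).sum())
```

The beauty loss of Equation (6) reuses the same form, with the 256-dimensional f_bt features in place of f_id.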
<sec>
<title>3.5.4. Beauty loss</title>
<p>To exploit the beauty features of the reference, a beauty prediction model first extracts the beauty features, and we then minimize the <italic>L</italic><sub>1</sub> distance between the beautified face <italic>AB</italic> and the reference <italic>B</italic> as follows:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BT</mml:mtext></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" 
mathvariant="normal">BT</mml:mtext></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BT</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>f</italic><sub><italic>bt</italic></sub> denotes the operator extracting the 256-dimensional beauty feature (FC2 as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>).</p></sec>
<sec>
<title>3.5.5. Perceptual loss</title>
<p>Unlike makeup transfer, our face beautification seeks a many-to-many mapping in an unsupervised way, which is more challenging, especially in view of both intra-domain and cross-domain variations. As noted in Ma et al. (<xref ref-type="bibr" rid="B25">2018</xref>), semantic inconsistency is a major issue for such unsupervised many-to-many translations. To address this problem, we apply a perceptual loss that minimizes the perceptual distance between the beautified face <italic>AB</italic> and the reference face <italic>B</italic>. This is a modified version of Ma et al. (<xref ref-type="bibr" rid="B25">2018</xref>) in which instance normalization (Ulyanov et al., <xref ref-type="bibr" rid="B37">2017</xref>) is applied to the VGG features (Simonyan and Zisserman, <xref ref-type="bibr" rid="B34">2014</xref>) (<italic>f</italic><sub><italic>vgg</italic></sub>) before computing the perceptual distance.</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where ||&#x000B7;||<sub>2</sub> denotes the <italic>L</italic><sub>2</sub> norm.</p></sec>
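A sketch of this instance-normalized perceptual distance (Equation 7) on a single (C, H, W) feature map, where `f_generated` and `f_reference` stand in for VGG features of the beautified and reference faces. Note how instance normalization removes per-channel statistics, so the loss compares feature structure rather than raw magnitudes:

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Zero-mean, unit-variance normalization per channel of a (C, H, W) map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    return (feat - mu) / (sigma + eps)

def perceptual_loss(f_generated, f_reference):
    """Equation (7): L2 distance between instance-normalized features."""
    diff = instance_norm(f_generated) - instance_norm(f_reference)
    return float(np.sqrt((diff ** 2).sum()))
```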
<sec>
<title>3.5.6. Total loss</title>
<p>Putting everything together, we jointly train the whole architecture by optimizing the following objective function:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">REC</mml:mtext></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">REC</mml:mtext></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ID</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BT</mml:mtext></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BT</mml:mtext></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" 
mathvariant="normal">BT</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">GAN</mml:mtext></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x000C3;</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003BB;<sub>1</sub>, &#x003BB;<sub>2</sub>, &#x003BB;<sub>3</sub>, &#x003BB;<sub>4</sub>, &#x003BB;<sub>5</sub> are the regularization parameters.</p></sec></sec></sec>
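For concreteness, the weighted combination in Equation (8) can be assembled from precomputed loss terms as follows (an illustrative sketch; the default weights follow Section 4.2.3, with λ1 = 10 and λ2 = λ3 = λ4 = λ5 = 1):

```python
def total_loss(rec, ident, beauty, gan, perceptual,
               lambdas=(10.0, 1.0, 1.0, 1.0, 1.0)):
    """Objective of Equation (8) from precomputed loss terms.

    rec:        (L_REC^A, L_REC^B)
    ident:      (L_ID^A, L_ID^B, L_ID^AB)
    beauty:     (L_BT^A, L_BT^B, L_BT^AB)
    gan:        scalar adversarial term
    perceptual: scalar perceptual term
    """
    l1, l2, l3, l4, l5 = lambdas
    return (l1 * sum(rec) + l2 * sum(ident) + l3 * sum(beauty)
            + l4 * gan + l5 * perceptual)
```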
<sec id="s4">
<title>4. Experimental setup</title>
<sec>
<title>4.1. Training datasets</title>
<p>Two datasets are used in our experiments. First, we used CelebA (Liu et al., <xref ref-type="bibr" rid="B24">2015</xref>) to carry out the beautification experiment (only female celebrities are considered in this article). The authors of Liu et al. (<xref ref-type="bibr" rid="B23">2019</xref>) found that some facial attributes have a positive impact on the perception of beauty. Following their findings, we prepared our training datasets as follows: images containing these positive attributes (e.g., arched eyebrows, heavy makeup, high cheekbones, wearing lipstick) form our reference dataset <italic>B</italic>, and images that do not contain those attributes form our target dataset <italic>A</italic> (to be beautified).</p>
<p>We combined the CelebA training and validation sets into a new training set to increase the training size, but kept the test set the same as in the original protocol (Liu et al., <xref ref-type="bibr" rid="B24">2015</xref>). Our finalized training set includes 7,195 images for <italic>A</italic> and 18,273 for <italic>B</italic>, and the test set has 724 class-<italic>A</italic> images and 2,112 class-<italic>B</italic> images. Another dataset, SCUT-FBP5500 (Liang et al., <xref ref-type="bibr" rid="B21">2018</xref>), is used to train our facial beauty prediction network. Following its protocol, we used 60% of the samples (3,300 images) for training and the remaining 40% (2,200) for testing.</p>
</sec>
<sec>
<title>4.2. Implementation details</title>
<sec>
<title>4.2.1. Generative model</title>
<p>Similarly to Huang et al. (<xref ref-type="bibr" rid="B16">2018</xref>), our <italic>E</italic><sub><italic>c</italic></sub> consists of several strided convolutional layers and residual blocks (He et al., <xref ref-type="bibr" rid="B14">2016</xref>), and all convolutional layers are followed by Instance Normalization (IN) (Ulyanov et al., <xref ref-type="bibr" rid="B37">2017</xref>). For <italic>E</italic><sub><italic>s</italic></sub>, the strided convolutional layers are followed by a global average pooling layer and a fully connected (FC) layer; the IN layers are removed to preserve the beauty features. Inspired by recent GAN work (Dumoulin et al., <xref ref-type="bibr" rid="B8">2016</xref>; Huang and Belongie, <xref ref-type="bibr" rid="B15">2017</xref>; Karras et al., <xref ref-type="bibr" rid="B17">2019</xref>) that uses affine transformation parameters in normalization layers to better represent style, our decoder <italic>G</italic> is equipped with residual blocks and adaptive instance normalization (AdaIN). The AdaIN parameters are dynamically generated by a multilayer perceptron (MLP) from the beauty codes as follows:</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">AdaIN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>z</mml:mi><mml:mo>-</mml:mo><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B2;</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>z</italic> is the activation of the previous convolutional layer, &#x003BC; and &#x003C3; are the channel-wise mean and standard deviation, and &#x003B3; and &#x003B2; are the parameters generated by the MLP.</p></sec>
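A NumPy sketch of Equation (9) for a single (C, H, W) activation, with per-channel γ and β (in the model these come from the MLP; here they are passed in directly):

```python
import numpy as np

def adain(z, gamma, beta, eps=1e-5):
    """Adaptive instance normalization (Equation 9).

    z:           activation of shape (C, H, W)
    gamma, beta: per-channel style parameters of shape (C, 1, 1)
    """
    mu = z.mean(axis=(1, 2), keepdims=True)
    sigma = z.std(axis=(1, 2), keepdims=True)
    return gamma * (z - mu) / (sigma + eps) + beta
```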
<sec>
<title>4.2.2. Discriminative model</title>
<p>We implement multiscale discriminators (Wang et al., <xref ref-type="bibr" rid="B38">2018a</xref>) to guide the generative model toward producing realistic and globally consistent images. Furthermore, LSGAN (Mao et al., <xref ref-type="bibr" rid="B27">2017</xref>) is used in our discriminative model to improve image quality.</p></sec>
<sec>
<title>4.2.3. Beauty and identity model</title>
<p>As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, we use an existing face recognition model, LightCNN (Wu et al., <xref ref-type="bibr" rid="B40">2018</xref>), which was trained on millions of faces and achieved state-of-the-art performance on several benchmarks. To extract facial beauty features, we fine-tune the pre-trained LightCNN model: the last fully connected (FC2) layer is the learnable layer for beauty score prediction, and all previous layers are kept fixed during training. When tested on the popular SCUT-FBP5500 dataset (Liang et al., <xref ref-type="bibr" rid="B21">2018</xref>), our method achieves an MAE of 0.2372 on the test set, which significantly outperforms their reported result of 0.2518 (Liang et al., <xref ref-type="bibr" rid="B21">2018</xref>).</p>
<p>In our experimental setting, the standard LightCNN serves as the identity feature extractor, and the fine-tuned beauty prediction model serves as the facial beauty extractor. To extract both identity and beauty features with one model, we take advantage of the beauty prediction model: the beauty feature is extracted from the last FC layer (FC2 in <xref ref-type="fig" rid="F3">Figure 3</xref>), and the second-to-last FC layer (FC1 in <xref ref-type="fig" rid="F3">Figure 3</xref>) provides the identity feature. When optimization involves two interacting networks, we found such a piggyback strategy more efficient than jointly training the beautification and beauty prediction modules. The following hyperparameters are chosen empirically: the batch size is set to 4 on a single 2080Ti GPU. Training runs for 360,000 iterations, around 200 epochs in total. We use the Adam optimizer (Kingma and Ba, <xref ref-type="bibr" rid="B18">2014</xref>) with &#x003B2;<sub>1</sub> &#x0003D; 0.5, &#x003B2;<sub>2</sub> &#x0003D; 0.999 and Kaiming initialization (He et al., <xref ref-type="bibr" rid="B13">2015</xref>). The learning rate is set to 0.0001 with a decay rate of 0.5 every 100,000 iterations. The style codes from <italic>fc</italic> have 64 dimensions, and the loss weights are set as &#x003BB;<sub>1</sub> &#x0003D; 10, &#x003BB;<sub>2</sub> &#x0003D; &#x003BB;<sub>3</sub> &#x0003D; &#x003BB;<sub>4</sub> &#x0003D; &#x003BB;<sub>5</sub> &#x0003D; 1.</p></sec></sec></sec>
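The step-decay learning-rate schedule stated above can be expressed as a one-liner (a sketch of the stated schedule, not the authors' code):

```python
def learning_rate(iteration, base_lr=1e-4, decay=0.5, step=100_000):
    """Learning rate at a given iteration: start at 1e-4 and halve
    every 100,000 iterations, as described in Section 4.2.3."""
    return base_lr * decay ** (iteration // step)
```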
<sec id="s5">
<title>5. Experimental results and evaluations</title>
<sec>
<title>5.1. Baseline methods</title>
<sec>
<title>5.1.1. CycleGAN</title>
<p>Zhu et al. (<xref ref-type="bibr" rid="B43">2017</xref>) introduced a cycle-consistency loss to facilitate image-to-image translation, providing a simple but effective solution to style transfer from unpaired data.</p></sec>
<sec>
<title>5.1.2. DRIT</title>
<p>Lee et al. (<xref ref-type="bibr" rid="B19">2018</xref>) proposed an architecture that projects images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Similarly to CycleGAN, a cross-cycle consistency loss based on disentangled representations is introduced to deal with unpaired data. Unlike CycleGAN, DRIT is capable of generating diverse images on a wide range of tasks.</p></sec>
<sec>
<title>5.1.3. MUNIT</title>
<p>Huang et al. (<xref ref-type="bibr" rid="B16">2018</xref>) proposed a framework for unsupervised multimodal image-to-image translation, in which images are decomposed into a domain-invariant content code and a style code that captures domain-specific properties. By combining a content code with a random style code, MUNIT can also generate diverse outputs in the target domain.</p>
<p>As mentioned in Section 2, all baseline methods have weaknesses when applied to reference-based beautification. CycleGAN cannot take advantage of a specific reference for translation, and its output lacks diversity once training is completed. DRIT and MUNIT are capable of many-to-many translations but fail to generate a sequence of correlated images (e.g., faces with increasing beauty scores). In contrast, our model is capable not only of beautifying faces based on a given reference but also of controlling the degree of beautification at a fine granularity, as shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
</sec>
</sec>
<sec>
<title>5.2. Qualitative and quantitative evaluations</title>
<sec>
<title>5.2.1. User study</title>
<p>To evaluate image quality as perceived by humans, we conducted a user study asking participants to vote for the most attractive image among ours and the baselines. One hundred face images from the test set were submitted to Amazon Mechanical Turk (AMT), and each survey required 20 users. We collected 2,000 data points in total to evaluate human preference. The final results, shown in <xref ref-type="table" rid="T1">Table 1</xref>, <xref ref-type="fig" rid="F5">Figure 5</xref>, and <xref ref-type="fig" rid="F6">Figure 6</xref>, demonstrate the superiority of our model.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>User study preference for beautified images.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Count</bold></th>
<th valign="top" align="center"><bold>Percent</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">CycleGAN</td>
<td valign="top" align="center">401</td>
<td valign="top" align="center">20.05</td>
</tr>
<tr>
<td valign="top" align="left">DRIT</td>
<td valign="top" align="center">282</td>
<td valign="top" align="center">14.1</td>
</tr>
<tr>
<td valign="top" align="left">MUNIT</td>
<td valign="top" align="center">390</td>
<td valign="top" align="center">19.5</td>
</tr>
<tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="center"><bold>927</bold></td>
<td valign="top" align="center"><bold>46.35</bold></td>
</tr>
</tbody>
</table><table-wrap-foot><p>The bold values indicate the best results.</p>
</table-wrap-foot>
</table-wrap>
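<p>As a sanity check, the percentages in Table 1 follow directly from the vote counts: 100 images rated by 20 workers each give 2,000 votes, and each percentage is the model's count divided by that total.</p>

```python
# Vote counts copied from Table 1; 100 images x 20 AMT workers = 2,000 votes.
votes = {"CycleGAN": 401, "DRIT": 282, "MUNIT": 390, "Ours": 927}
total = sum(votes.values())          # 2000
percent = {m: 100 * c / total for m, c in votes.items()}
# percent reproduces the Percent column, e.g. Ours -> 46.35
```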
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Comparison of beautification with different references against the baseline models. The top row shows the original inputs and the left column shows five references; note that the CycleGAN outputs are identical because they are not influenced by the reference.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Beautification comparison with the baseline models using the same reference (Reference 1 in <xref ref-type="fig" rid="F5">Figure 5</xref>); the average beauty scores are reported in <xref ref-type="table" rid="T2">Table 2</xref>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0006.tif"/>
</fig>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Average beauty score after beautification.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Beauty score</bold></th>
<th valign="top" align="center"><bold>Gain</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Original</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">CycleGAN</td>
<td valign="top" align="center">1.15</td>
<td valign="top" align="center">18.56%</td>
</tr>
<tr>
<td valign="top" align="left">DRIT</td>
<td valign="top" align="center">1.25</td>
<td valign="top" align="center">28.87%</td>
</tr>
<tr>
<td valign="top" align="left">MUNIT</td>
<td valign="top" align="center">1.01</td>
<td valign="top" align="center">4.12%</td>
</tr>
<tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="center"><bold>1.33</bold></td>
<td valign="top" align="center"><bold>37.11</bold>%</td>
</tr>
</tbody>
</table><table-wrap-foot><p>The bold values indicate the best results.</p>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>5.2.2. Beauty score improvement</title>
<p>To further evaluate the effectiveness of the proposed beautification approach, we feed the beautified images into our face beauty prediction model to obtain their beauty scores. The beauty prediction model is trained on SCUT-FBP as mentioned above, in which beauty scores are rated on a 5-point scale. Averaged over the 724 test images, our model outperforms all other methods, gaining a 37.11% increase over the average beauty score of the original inputs, as shown in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
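<p>The Gain column in Table 2 is the relative improvement of each method's average predicted beauty score over the original inputs' average score of 0.97, which the following minimal computation reproduces.</p>

```python
# Average beauty scores copied from Table 2.
original = 0.97
scores = {"CycleGAN": 1.15, "DRIT": 1.25, "MUNIT": 1.01, "Ours": 1.33}

# Relative gain over the original average, in percent.
gains = {m: 100 * (s - original) / original for m, s in scores.items()}
# gains reproduces the Gain column, e.g. Ours -> 37.11%
```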
</sec></sec>
<sec>
<title>5.3. Ablation study</title>
<p>To investigate the importance of each loss, we tested three variants of our model by removing <inline-formula><mml:math id="M18"><mml:msub><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mi mathvariant="-tex-caligraphic">I</mml:mi><mml:mi mathvariant="-tex-caligraphic">D</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="M19"><mml:msub><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mi mathvariant="-tex-caligraphic">B</mml:mi><mml:mi mathvariant="-tex-caligraphic">T</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="M20"><mml:msub><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:msub></mml:math></inline-formula>, one at a time. See <xref ref-type="fig" rid="F7">Figures 7</xref>&#x02013;<xref ref-type="fig" rid="F9">9</xref> for visual comparisons. These losses complement each other and work in harmony to achieve the optimum beautification effect. This further demonstrates that our loss functions and architecture are well-designed for the facial beautification task.</p>
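<p>The ablation protocol above can be sketched as follows. This assumes, as is common for GAN-based translation models, that the total objective is a weighted sum of an adversarial loss and the three auxiliary terms studied in the ablation; the function <code>total_loss</code> and the lambda weights are illustrative, not the paper's actual values.</p>

```python
def total_loss(l_adv, l_id, l_bt, l_p,
               lam_id=1.0, lam_bt=1.0, lam_p=1.0,
               use_id=True, use_bt=True, use_p=True):
    """Weighted sum of the adversarial loss and the auxiliary losses.

    Each ablation variant disables exactly one auxiliary term
    (identity, beauty, or perceptual) while keeping the others.
    """
    loss = l_adv
    if use_id:
        loss += lam_id * l_id   # identity preservation term
    if use_bt:
        loss += lam_bt * l_bt   # beauty transfer term
    if use_p:
        loss += lam_p * l_p     # perceptual (artifact suppression) term
    return loss
```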
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Comparisons with and without ID Loss <inline-formula><mml:math id="M17"><mml:msub><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mi mathvariant="-tex-caligraphic">I</mml:mi><mml:mi mathvariant="-tex-caligraphic">D</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. Note that the inclusion of ID loss can better preserve the identity information of a given face.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0007.tif"/>
</fig>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Comparisons with and without beauty loss <inline-formula><mml:math id="M21"><mml:msub><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mi mathvariant="-tex-caligraphic">B</mml:mi><mml:mi mathvariant="-tex-caligraphic">T</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> (the average beauty scores are 1.06, 1.21, and 1.35, respectively).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0008.tif"/>
</fig>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Comparisons with and without Perceptual Loss <inline-formula><mml:math id="M22"><mml:msub><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:msub></mml:math></inline-formula>. We can observe that perceptual loss helps suppress undesirable artifacts in the beautified images.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0009.tif"/>
</fig>
</sec>
<sec>
<title>5.4. Discussions and limitations</title>
<p>Our approach differs from recently developed makeup transfer methods, such as BeautyGAN (Li et al., <xref ref-type="bibr" rid="B20">2018</xref>) and BeautyGlow (Chen et al., <xref ref-type="bibr" rid="B5">2019</xref>), in the following aspects. Similar to BeautyGAN (Li et al., <xref ref-type="bibr" rid="B20">2018</xref>), ours assumes the availability of a reference image; but unlike BeautyGAN (Li et al., <xref ref-type="bibr" rid="B20">2018</xref>), which focuses only on local touch-up, ours is capable of transferring both global and local beauty features from the reference to the target. Similar to BeautyGlow (Chen et al., <xref ref-type="bibr" rid="B5">2019</xref>), ours can adjust the magnification in the latent space; but unlike BeautyGlow (Chen et al., <xref ref-type="bibr" rid="B5">2019</xref>), ours can improve the beauty score (rather than only increasing the extent of makeup).</p>
<p>Both the user study and the beauty score evaluation demonstrate the superiority of our model. The proposed model is robust to low-quality images, such as those with blur or challenging lighting conditions, as shown in <xref ref-type="fig" rid="F10">Figure 10</xref>. However, we also observe a few typical failure cases: our model tends to produce noticeable artifacts when the inputs have large occlusions or pose variations (please refer to <xref ref-type="fig" rid="F11">Figure 11</xref>). This is most likely caused by poor alignment; our references are mostly frontal images, so large occlusions and pose variations lead to misalignment.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Our model is robust to low-quality images and small pose variations.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0010.tif"/>
</fig>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Failure cases with artifacts caused by large occlusions and pose variations.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-910233-g0011.tif"/>
</fig></sec></sec>
<sec id="s6">
<title>6. Conclusions and future work</title>
<p>In this paper, we have studied the problem of face beautification and presented a novel framework that is more flexible than makeup transfer. Our approach integrates style-based synthesis with beauty score prediction by piggybacking a LightCNN onto a GAN-based architecture. Unlike makeup transfer, our approach targets many-to-many (instead of one-to-one) translation, where multiple outputs can be defined by either different references or varying beauty scores. In particular, we have constructed two interacting networks for beautification and beauty prediction. Through a simple weighting strategy, we have demonstrated fine-granularity control of the beautification process. Our experimental results have shown the effectiveness of the proposed approach both subjectively and objectively.</p>
<p>Personalized beautification is expected to attract more attention in the coming years. In this work, we have focused only on the beautification of female Caucasian faces. A similar question can be studied for other populations, although the relationship between gender, race, cultural background, and the perception of facial attractiveness remains under-researched in the literature. How AI can help shape the practice of personal makeup and plastic surgery is an emerging field for future research.</p>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.</p></sec>
<sec id="s8">
<title>Ethics statement</title>
<p>Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p></sec>
<sec id="s9">
<title>Author contributions</title>
<p>XuL, RW, and XiL contributed to the conception and design of the study. XuL, RW, HP, MY, and C-FC performed experiments and statistical analysis. XiL wrote the first draft of the manuscript. All authors contributed to the review of the manuscript and approved the submitted version.</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>Authors XuL, RW, HP, and C-FC were employed by ObEN, Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationship that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berscheid</surname> <given-names>E.</given-names></name> <name><surname>Dion</surname> <given-names>K.</given-names></name> <name><surname>Walster</surname> <given-names>E.</given-names></name> <name><surname>Walster</surname> <given-names>G. W.</given-names></name></person-group> (<year>1971</year>). <article-title>Physical attractiveness and dating choice: a test of the matching hypothesis</article-title>. <source>J. Exp. Soc. Psychol</source>. <volume>7</volume>, <fpage>173</fpage>&#x02013;<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1016/0022-1031(71)90065-5</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bradbury</surname> <given-names>E.</given-names></name></person-group> (<year>1994</year>). <article-title>The psychology of aesthetic plastic surgery</article-title>. <source>Aesthetic Plast Surg</source>. <volume>18</volume>, <fpage>301</fpage>&#x02013;<lpage>305</lpage>. <pub-id pub-id-type="doi">10.1007/BF00449799</pub-id><pub-id pub-id-type="pmid">7976766</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bull</surname> <given-names>R.</given-names></name> <name><surname>Rumsey</surname> <given-names>N.</given-names></name></person-group> (<year>2012</year>). <source>The Social Psychology of Facial Appearance</source>. Springer Science &#x00026; Business Media.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>H.</given-names></name> <name><surname>Lu</surname> <given-names>J.</given-names></name> <name><surname>Yu</surname> <given-names>F.</given-names></name> <name><surname>Finkelstein</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Pairedcyclegan: asymmetric style transfer for applying and removing makeup,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>40</fpage>&#x02013;<lpage>48</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>H.-J.</given-names></name> <name><surname>Hui</surname> <given-names>K.-M.</given-names></name> <name><surname>Wang</surname> <given-names>S.-Y.</given-names></name> <name><surname>Tsao</surname> <given-names>L.-W.</given-names></name> <name><surname>Shuai</surname> <given-names>H.-H.</given-names></name> <name><surname>Cheng</surname> <given-names>W.-H.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Beautyglow: on-demand makeup transfer framework with reversible generative network,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>10042</fpage>&#x02013;<lpage>10050</lpage>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cunningham</surname> <given-names>M. R.</given-names></name></person-group> (<year>1986</year>). <article-title>Measuring the physical in physical attractiveness: quasi-experiments on the sociobiology of female facial beauty</article-title>. <source>J. Pers. Soc. Psychol</source>. 50, 925. <pub-id pub-id-type="doi">10.1037/0022-3514.50.5.925</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Donahue</surname> <given-names>C.</given-names></name> <name><surname>Lipton</surname> <given-names>Z. C.</given-names></name> <name><surname>Balsubramani</surname> <given-names>A.</given-names></name> <name><surname>McAuley</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Semantically decomposing the latent spaces of generative adversarial networks</article-title>. <source>arXiv preprint arXiv:1705.07904</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1705.07904</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dumoulin</surname> <given-names>V.</given-names></name> <name><surname>Shlens</surname> <given-names>J.</given-names></name> <name><surname>Kudlur</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>A learned representation for artistic style</article-title>. <source>arXiv preprint arXiv:1610.07629</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1610.07629</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eisenthal</surname> <given-names>Y.</given-names></name> <name><surname>Dror</surname> <given-names>G.</given-names></name> <name><surname>Ruppin</surname> <given-names>E.</given-names></name></person-group> (<year>2006</year>). <article-title>Facial attractiveness: beauty and the machine</article-title>. <source>Neural Comput</source>. <volume>18</volume>, <fpage>119</fpage>&#x02013;<lpage>142</lpage>. <pub-id pub-id-type="doi">10.1162/089976606774841602</pub-id><pub-id pub-id-type="pmid">16354383</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>Y.-Y.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Guo</surname> <given-names>Z.</given-names></name> <name><surname>Samal</surname> <given-names>A.</given-names></name> <name><surname>Wan</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Label distribution-based facial attractiveness computation by deep residual learning</article-title>. <source>IEEE Trans. Multimedia</source> <volume>20</volume>, <fpage>2196</fpage>&#x02013;<lpage>2208</lpage>. <pub-id pub-id-type="doi">10.1109/TMM.2017.2780762</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gan</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Zhai</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name></person-group> (<year>2014</year>). <article-title>Deep self-taught learning for facial beauty prediction</article-title>. <source>Neurocomputing</source> <volume>144</volume>, <fpage>295</fpage>&#x02013;<lpage>303</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2014.05.028</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodfellow</surname> <given-names>I.</given-names></name> <name><surname>Pouget-Abadie</surname> <given-names>J.</given-names></name> <name><surname>Mirza</surname> <given-names>M.</given-names></name> <name><surname>Xu</surname> <given-names>B.</given-names></name> <name><surname>Warde-Farley</surname> <given-names>D.</given-names></name> <name><surname>Ozair</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>&#x0201C;Generative adversarial nets,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Montreal, QC</publisher-loc>), <fpage>2672</fpage>&#x02013;<lpage>2680</lpage>.</citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Delving deep into rectifiers: surpassing human-level performance on imagenet classification,&#x0201D;</article-title> in <source>Proceedings of the IEEE International Conference on Computer Vision</source> (<publisher-loc>Santiago</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1026</fpage>&#x02013;<lpage>1034</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Deep residual learning for image recognition,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>.<pub-id pub-id-type="pmid">32166560</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Belongie</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Arbitrary style transfer in real-time with adaptive instance normalization,&#x0201D;</article-title> in <source>Proceedings of the IEEE International Conference on Computer Vision</source> (<publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1501</fpage>&#x02013;<lpage>1510</lpage>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>M.-Y.</given-names></name> <name><surname>Belongie</surname> <given-names>S.</given-names></name> <name><surname>Kautz</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Multimodal unsupervised image-to-image translation,&#x0201D;</article-title> in <source>Proceedings of the European Conference on Computer Vision (ECCV)</source> (<publisher-loc>Munich</publisher-loc>: <publisher-name>ECCV</publisher-name>), <fpage>172</fpage>&#x02013;<lpage>189</lpage>.<pub-id pub-id-type="pmid">32759031</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karras</surname> <given-names>T.</given-names></name> <name><surname>Laine</surname> <given-names>S.</given-names></name> <name><surname>Aila</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;A style-based generator architecture for generative adversarial networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>4401</fpage>&#x02013;<lpage>4410</lpage>.<pub-id pub-id-type="pmid">32012000</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization</article-title>. <source>arXiv preprint arXiv:1412.6980</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1412.6980</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>H.-Y.</given-names></name> <name><surname>Tseng</surname> <given-names>H.-Y.</given-names></name> <name><surname>Huang</surname> <given-names>J.-B.</given-names></name> <name><surname>Singh</surname> <given-names>M.</given-names></name> <name><surname>Yang</surname> <given-names>M.-H.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Diverse image-to-image translation <italic>via</italic> disentangled representations,&#x0201D;</article-title> in <source>Proceedings of the European Conference on Computer Vision (ECCV)</source> (<publisher-loc>Munich</publisher-loc>: <publisher-name>ECCV</publisher-name>), <fpage>35</fpage>&#x02013;<lpage>51</lpage>.</citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>T.</given-names></name> <name><surname>Qian</surname> <given-names>R.</given-names></name> <name><surname>Dong</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Yan</surname> <given-names>Q.</given-names></name> <name><surname>Zhu</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>&#x0201C;Beautygan: instance-level facial makeup transfer with deep generative adversarial network,&#x0201D;</article-title> in <source>2018 ACM Multimedia Conference on Multimedia Conference</source> (<publisher-loc>Seoul</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>645</fpage>&#x02013;<lpage>653</lpage>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liang</surname> <given-names>L.</given-names></name> <name><surname>Lin</surname> <given-names>L.</given-names></name> <name><surname>Jin</surname> <given-names>L.</given-names></name> <name><surname>Xie</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Scut-fbp5500: a diverse benchmark dataset for multi-paradigm facial beauty prediction,&#x0201D;</article-title> in <source>2018 24th International Conference on Pattern Recognition (ICPR)</source> (<publisher-loc>Beijing</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1598</fpage>&#x02013;<lpage>1603</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Little</surname> <given-names>A. C.</given-names></name> <name><surname>Jones</surname> <given-names>B. C.</given-names></name> <name><surname>DeBruine</surname> <given-names>L. M.</given-names></name></person-group> (<year>2011</year>). <article-title>Facial attractiveness: evolutionary based research</article-title>. <source>Philos. Trans. R. Soc. B Biol. Sci</source>. <volume>366</volume>, <fpage>1638</fpage>&#x02013;<lpage>1659</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2010.0404</pub-id><pub-id pub-id-type="pmid">21536551</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>T.</given-names></name> <name><surname>Peng</surname> <given-names>H.</given-names></name> <name><surname>Chuoying Ouyang</surname> <given-names>I.</given-names></name> <name><surname>Kim</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Understanding beauty <italic>via</italic> deep facial features,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops</source> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>).<pub-id pub-id-type="pmid">33186876</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Luo</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Tang</surname> <given-names>X.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Deep learning face attributes in the wild,&#x0201D;</article-title> in <source>Proceedings of International Conference on Computer Vision</source> (<publisher-loc>Santiago</publisher-loc>: <publisher-name>ICCV</publisher-name>).</citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>L.</given-names></name> <name><surname>Jia</surname> <given-names>X.</given-names></name> <name><surname>Georgoulis</surname> <given-names>S.</given-names></name> <name><surname>Tuytelaars</surname> <given-names>T.</given-names></name> <name><surname>Van Gool</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>Exemplar guided unsupervised image-to-image translation with semantic consistency</article-title>. <source>arXiv preprint arXiv:1805.11145</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1805.11145</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Macgregor</surname> <given-names>F. C.</given-names></name></person-group> (<year>1989</year>). <article-title>Social, psychological and cultural dimensions of cosmetic and reconstructive plastic surgery</article-title>. <source>Aesthetic Plast Surg</source>. <volume>13</volume>, <fpage>1</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1007/BF01570318</pub-id><pub-id pub-id-type="pmid">2728993</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mao</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Xie</surname> <given-names>H.</given-names></name> <name><surname>Lau</surname> <given-names>R. Y.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Paul Smolley</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Least squares generative adversarial networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE International Conference on Computer Vision</source> (<publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2794</fpage>&#x02013;<lpage>2802</lpage>.<pub-id pub-id-type="pmid">30273144</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mathieu</surname> <given-names>M. F.</given-names></name> <name><surname>Zhao</surname> <given-names>J. J.</given-names></name> <name><surname>Zhao</surname> <given-names>J.</given-names></name> <name><surname>Ramesh</surname> <given-names>A.</given-names></name> <name><surname>Sprechmann</surname> <given-names>P.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Disentangling factors of variation in deep representation using adversarial training,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Barcelona</publisher-loc>), <fpage>5040</fpage>&#x02013;<lpage>5048</lpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perrett</surname> <given-names>D. I.</given-names></name> <name><surname>Burt</surname> <given-names>D. M.</given-names></name> <name><surname>Penton-Voak</surname> <given-names>I. S.</given-names></name> <name><surname>Lee</surname> <given-names>K. J.</given-names></name> <name><surname>Rowland</surname> <given-names>D. A.</given-names></name> <name><surname>Edwards</surname> <given-names>R.</given-names></name></person-group> (<year>1999</year>). <article-title>Symmetry and human facial attractiveness</article-title>. <source>Evolut. Hum. Behav</source>. <volume>20</volume>, <fpage>295</fpage>&#x02013;<lpage>307</lpage>. <pub-id pub-id-type="doi">10.1016/S1090-5138(99)00014-8</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perrett</surname> <given-names>D. I.</given-names></name> <name><surname>Lee</surname> <given-names>K. J.</given-names></name> <name><surname>Penton-Voak</surname> <given-names>I.</given-names></name> <name><surname>Rowland</surname> <given-names>D.</given-names></name> <name><surname>Yoshikawa</surname> <given-names>S.</given-names></name> <name><surname>Burt</surname> <given-names>D. M.</given-names></name> <etal/></person-group>. (<year>1998</year>). <article-title>Effects of sexual dimorphism on facial attractiveness</article-title>. <source>Nature</source> <volume>394</volume>, <fpage>884</fpage>. <pub-id pub-id-type="doi">10.1038/29772</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Phillips</surname> <given-names>K. A.</given-names></name> <name><surname>McElroy</surname> <given-names>S. L.</given-names></name> <name><surname>Keck Jr</surname> <given-names>P. E.</given-names></name> <name><surname>Pope Jr</surname> <given-names>H. G.</given-names></name> <name><surname>Hudson</surname> <given-names>J. I.</given-names></name></person-group> (<year>1993</year>). <article-title>Body dysmorphic disorder: 30 cases of imagined ugliness</article-title>. <source>Am. J. Psychiatry</source> <volume>150</volume>, <fpage>302</fpage>. <pub-id pub-id-type="doi">10.1176/ajp.150.2.302</pub-id><pub-id pub-id-type="pmid">8422082</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rankin</surname> <given-names>M.</given-names></name> <name><surname>Borah</surname> <given-names>G. L.</given-names></name> <name><surname>Perry</surname> <given-names>A. W.</given-names></name> <name><surname>Wey</surname> <given-names>P. D.</given-names></name></person-group> (<year>1998</year>). <article-title>Quality-of-life outcomes after cosmetic surgery</article-title>. <source>Plast Reconstr. Surg</source>. <volume>102</volume>, <fpage>2139</fpage>&#x02013;<lpage>2145</lpage>. <pub-id pub-id-type="doi">10.1097/00006534-199811000-00053</pub-id><pub-id pub-id-type="pmid">10654779</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riggio</surname> <given-names>R. E.</given-names></name> <name><surname>Woll</surname> <given-names>S. B.</given-names></name></person-group> (<year>1984</year>). <article-title>The role of nonverbal cues and physical attractiveness in the selection of dating partners</article-title>. <source>J. Soc. Pers. Relat</source>. <volume>1</volume>, <fpage>347</fpage>&#x02013;<lpage>357</lpage>. <pub-id pub-id-type="doi">10.1177/0265407584013007</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Very deep convolutional networks for large-scale image recognition</article-title>. <source>arXiv preprint arXiv:1409.1556</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1409.1556</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name> <name><surname>Freeman</surname> <given-names>W. T.</given-names></name></person-group> (<year>2000</year>). <article-title>Separating style and content with bilinear models</article-title>. <source>Neural Comput</source>. <volume>12</volume>, <fpage>1247</fpage>&#x02013;<lpage>1283</lpage>. <pub-id pub-id-type="doi">10.1162/089976600300015349</pub-id><pub-id pub-id-type="pmid">10935711</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thornhill</surname> <given-names>R.</given-names></name> <name><surname>Gangestad</surname> <given-names>S. W.</given-names></name></person-group> (<year>1999</year>). <article-title>Facial attractiveness</article-title>. <source>Trends Cogn. Sci</source>. <volume>3</volume>, <fpage>452</fpage>&#x02013;<lpage>460</lpage>. <pub-id pub-id-type="doi">10.1016/S1364-6613(99)01403-5</pub-id><pub-id pub-id-type="pmid">10562724</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ulyanov</surname> <given-names>D.</given-names></name> <name><surname>Vedaldi</surname> <given-names>A.</given-names></name> <name><surname>Lempitsky</surname> <given-names>V.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>6924</fpage>&#x02013;<lpage>6932</lpage>.</citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>T.-C.</given-names></name> <name><surname>Liu</surname> <given-names>M.-Y.</given-names></name> <name><surname>Zhu</surname> <given-names>J.-Y.</given-names></name> <name><surname>Tao</surname> <given-names>A.</given-names></name> <name><surname>Kautz</surname> <given-names>J.</given-names></name> <name><surname>Catanzaro</surname> <given-names>B.</given-names></name></person-group> (<year>2018a</year>). <article-title>&#x0201C;High-resolution image synthesis and semantic manipulation with conditional gans,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>8798</fpage>&#x02013;<lpage>8807</lpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Tang</surname> <given-names>X.</given-names></name> <name><surname>Luo</surname> <given-names>W.</given-names></name> <name><surname>Gao</surname> <given-names>S.</given-names></name></person-group> (<year>2018b</year>). <article-title>&#x0201C;Face aging with identity-preserved conditional generative adversarial networks,&#x0201D;</article-title> in <source>The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>X.</given-names></name> <name><surname>He</surname> <given-names>R.</given-names></name> <name><surname>Sun</surname> <given-names>Z.</given-names></name> <name><surname>Tan</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>A light cnn for deep face representation with noisy labels</article-title>. <source>IEEE Trans. Inf. For. Security</source> <volume>13</volume>, <fpage>2884</fpage>&#x02013;<lpage>2896</lpage>. <pub-id pub-id-type="doi">10.1109/TIFS.2018.2833032</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>D.</given-names></name> <name><surname>Liang</surname> <given-names>L.</given-names></name> <name><surname>Jin</surname> <given-names>L.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Scut-fbp: a benchmark dataset for facial beauty perception,&#x0201D;</article-title> in <source>2015 IEEE International Conference on Systems, Man, and Cybernetics</source> (<publisher-loc>Hong Kong</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1821</fpage>&#x02013;<lpage>1826</lpage>.</citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Jin</surname> <given-names>L.</given-names></name> <name><surname>Liang</surname> <given-names>L.</given-names></name> <name><surname>Feng</surname> <given-names>Z.</given-names></name> <name><surname>Xie</surname> <given-names>D.</given-names></name> <name><surname>Mao</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Facial attractiveness prediction using psychologically inspired convolutional neural network (pi-cnn),&#x0201D;</article-title> in <source>2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source> (<publisher-loc>New Orleans, LA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1657</fpage>&#x02013;<lpage>1661</lpage>.</citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>J.-Y.</given-names></name> <name><surname>Park</surname> <given-names>T.</given-names></name> <name><surname>Isola</surname> <given-names>P.</given-names></name> <name><surname>Efros</surname> <given-names>A. A.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Unpaired image-to-image translation using cycle-consistent adversarial networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE International Conference on Computer Vision</source> (<publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2223</fpage>&#x02013;<lpage>2232</lpage>.</citation>
</ref>
</ref-list> 
</back>
</article>