Face Beautification: Beyond Makeup Transfer

Facial appearance plays an important role in our social lives. Subjective perception of women's beauty depends on various face-related (e.g., skin, shape, hair) and environmental (e.g., makeup, lighting, angle) factors. Similar to cosmetic surgery in the physical world, virtual face beautification is an emerging field with many open issues to be addressed. Inspired by the latest advances in style-based synthesis and face beauty prediction, we propose a novel framework for face beautification. For a given reference face with a high beauty score, our GAN-based architecture is capable of translating an inquiry face into a sequence of beautified face images with the referenced beauty style and targeted beauty score values. To achieve this objective, we propose to integrate both a style-based beauty representation (extracted from the reference face) and beauty score prediction (trained on the SCUT-FBP database) into the process of beautification. Unlike makeup transfer, our approach targets many-to-many (instead of one-to-one) translation, where multiple outputs can be defined by either different references or varying beauty scores. Extensive experimental results are reported to demonstrate the effectiveness and flexibility of the proposed face beautification framework.


Introduction
Facial appearance plays an important role in our social lives [3]. People with attractive faces have many advantages in social activities such as dating and voting [23]. It has been found that attractive people enjoy higher chances of dating [34], and their partners are more likely to be satisfied than those dating less attractive people [1]. It has also been found that faces can affect hiring decisions and influence voting behavior [23]. Overwhelmed by social fascination with beauty, women with unattractive faces may suffer from social isolation, depression, and even psychological disorders [3,33,32,27,2]. Consequently, there is strong demand for face beautification both in the physical world (e.g., facial makeup and cosmetic surgeries) and in the virtual space (e.g., beautification cameras and filters).
The problem of face beautification has been extensively studied by philosophers, psychologists and plastic surgeons. Rapid advances in imaging technology and social media have greatly expedited the popularity of digital photos, especially selfies, in our daily lives. Most recently, virtual face beautification based on the idea of makeup application or transfer has been developed in the computer vision community: PairedCycleGAN [4], BeautyGAN [21], BeautyGlow [5]. Although these existing works have achieved impressive results, we argue that face beautification based on makeup transfer alone has fundamental limitations. Without changing important facial attributes (e.g., shape and lentigo), the application of makeup, abstracted as image-to-image translation [44,16,20], can only improve the beauty score to some extent.
A more flexible and promising framework is to formalize the process of face beautification as one-to-many translation where the destination can be defined in many different manners. On one hand, we can aim to produce a sequence of output images with monotonically increasing beauty scores by gradually transferring the style-based beauty representation learned from a given reference (with a high beauty score). On the other hand, we can also produce a variety of personalized beautification results by learning from a sequence of references (e.g., celebrities with different beauty styles). Under this framework, face beautification can be made more flexible: for example, we can transfer the beauty style from a reference image to reach a specified beauty score, which is beyond the reach of makeup transfer [21,5].
To achieve this objective, we propose a novel generative adversarial network (GAN)-based architecture in this paper. Inspired by the latest advances in style-based synthesis (e.g., StyleGAN [17]) and face beauty understanding from data [24], we propose to integrate both a style-based beauty representation (extracted from the reference face) and beauty score prediction (trained on the SCUT-FBP database [42]) into the process of face beautification. More specifically, style-based beauty representations are first learned from both inquiry and reference images via a light convolutional neural network (LightCNN) and leveraged to guide the process of style transfer (the actual beautification). Then a dedicated GAN-based architecture integrated with reconstruction, beauty and identity loss functions is constructed. In order to have fine-granularity control of the beautification process, we have devised a simple yet effective reweighting strategy that gradually improves the beauty score in synthesized images until reaching the target (specified by the reference image).
Our key contributions are summarized as follows:
• A forward-looking view toward virtual face beautification and a holistic style-based approach beyond makeup transfer (e.g., BeautyGAN and BeautyGlow). We argue that facial beauty scores offer a quantitative solution to guiding the process of face beautification.
• A face beauty prediction network based on fine-tuning of LightCNN is trained and integrated into the proposed style-based face beautification network. The prediction module provides valuable feedback to the synthesis module while approaching the desired beauty score.
• A piggyback trick to extract both identity and beauty features from the fine-tuned LightCNN, and the design of loss functions reflecting the tradeoff between identity preservation and face beautification.
• To the best of our knowledge, this is the first work capable of delivering face beautification results with fine-granularity control (i.e., a sequence of face images approaching the reference one with monotonically increasing beauty scores).
• A comprehensive evaluation shows the superiority of the proposed approach when compared to existing state-of-the-art image-to-image transfer techniques including CycleGAN [44], MUNIT [16], and DRIT [20].

Related Works
Makeup and Style Transfer. Two recent works on face beauty are BeautyGAN [21] and BeautyGlow [5]. In BeautyGlow [5], the makeup features (e.g., eyeshadows and lip gloss) are first extracted from reference makeup images and then transferred to source non-makeup images. The magnification parameter in the latent space can be tuned to adjust the extent of the makeup. In BeautyGAN [21], the issue of extracting/transferring local and delicate makeup information was addressed by incorporating both a global domain-level loss and a local instance-level loss in a dual input/output GAN.
Face beautification is also related to more general image-to-image translation. Both symmetric (e.g., CycleGAN [44]) and asymmetric (e.g., PairedCycleGAN [4]) translations have been studied in the literature; the latter was shown to be effective for makeup application and removal. Extensions of style transfer into the multimodal domain (i.e., one-to-many translations) have been considered in MUNIT [16] and DRIT [20]. It is also worth mentioning face image synthesis via StyleGAN [17], which has demonstrated super-realistic performance.
Face Beauty Prediction. The perception of facial appearance or attractiveness is a classical topic in psychology and cognitive sciences [37,31,30]. However, developing a computational algorithm that can automatically predict beauty scores from facial images is only a recent endeavor [9,11]. Thanks to the public release of face beauty databases such as SCUT-FBP [42], there has been a growing interest in machine learning based approaches to face beauty prediction [10,43].

Facial Attractiveness Theory
Why does facial attractiveness matter? From an evolutionary perspective, a plausible working hypothesis is that the psychological mechanisms underlying primates' judgments about attractiveness are the consequence of long-term evolution and adaptation. More specifically, facial attractiveness is beneficial to choosing a mate, which in turn facilitates gene propagation [37]. At the primitive level, facial attractiveness is hypothesized to reflect information about an individual's health. Accordingly, conventional wisdom in facial attractiveness research has focused on ad-hoc attributes such as facial symmetry and averageness as potential biomarkers. In the history of modern civilization, the social norm of facial attractiveness has constantly evolved and varies from region to region (e.g., the sharp contrast between eastern and western culture [6]).
In particular, facial attractiveness for young females is a stimulating topic, as witnessed by the long-lasting popularity of beauty pageants. In [6], the relation between female facial features and the responses of males was investigated. Based on the male subjects' attractiveness ratings, two classes of facial features (e.g., large eyes, small nose, and small chin; prominent cheekbones and narrow cheeks) were found to be positively correlated with attractiveness ratings. It is also known from the same study [6] that facial features can predict personality attributions and altruistic inclinations. We opt to focus on face beautification for females only in this work.

Problem Formulation and Motivation
Given a target face (an ordinary one that is less attractive) and a reference face (usually a celebrity face with a high beauty score), how can we beautify the target face by transferring relevant information from the reference image? This problem of face beautification can be formulated as two subproblems: style transfer and beauty prediction. Meanwhile, an important new insight brought into our problem formulation is the treatment of face beautification as a sequential process in which the beauty score of the target face is gradually improved by consecutive style transfer steps. As the fine-granularity style transfer proceeds, the beauty score of the beautified target face will monotonically approach that of the reference face.
The problem of style transfer has been extensively studied in the literature, dating back to content-style separation [36]. The idea of extracting a style-based representation (style code) has attracted increasing attention in recent years, e.g., [15,16,19,7,29]. Note that makeup transfer only represents a special case where style is characterized by local features only (e.g., eye shadow and lipstick). In this work we conceive a more generalized solution that transfers both global and local style codes from the reference image. The extraction of style codes is based on the solution to the other problem, beauty prediction. Such sharing of learned features between style transfer and beauty prediction allows us to achieve fine-granularity control over the process of beautification.

Architecture Design
As illustrated in Fig. 2, we use A and B to denote the target face (unattractive) and the reference face (attractive) respectively. The objective of beautification is to translate image A into a new image AB whose beauty score is Q percent close to that of B (Q is an integer between 0 and 100 specifying the granularity of beauty transfer). Assume both images A and B can be decomposed into a two-part representation consisting of style and content. That is, both images will be encoded by a pair of encoders: the content (identity) encoder $E_c$ and the style (beauty) encoder $E_s$. In order to transfer the beauty style from reference B to target A, it is natural to concatenate the content (identity)-based representation $C_a$ with the style (beauty)-based representation $S_b$, and then reconstruct the beautified image AB through a dedicated decoder G defined by
$$AB = G(C_a, S_b) = G(E_c(A), E_s(B)). \qquad (1)$$
The rest of our architecture in Fig. 2 mainly includes two components: a GAN-based module (G pairs with D) responsible for style transfer and a module of beauty and identity loss responsible for beauty prediction (please refer to Fig. 3).
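The encode, swap and decode data flow described above can be illustrated with a minimal Python sketch. The function `beautify` and the three toy stand-ins are hypothetical names introduced only for illustration; the real encoders and decoder are convolutional networks, not the simple list operations used here.

```python
from typing import Callable, List

Vector = List[float]


def beautify(
    target: Vector,
    reference: Vector,
    encode_content: Callable[[Vector], Vector],
    encode_style: Callable[[Vector], Vector],
    decode: Callable[[Vector, Vector], Vector],
) -> Vector:
    """Decode the content code of the target with the style code of the reference."""
    content_a = encode_content(target)   # identity-related code of A
    style_b = encode_style(reference)    # beauty-related code of B
    return decode(content_a, style_b)


# Toy stand-ins (assumptions, NOT the real networks): the "content" is the
# mean-removed signal, the "style" is the mean, and the decoder adds them.
def toy_content(x: Vector) -> Vector:
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def toy_style(x: Vector) -> Vector:
    return [sum(x) / len(x)]


def toy_decode(content: Vector, style: Vector) -> Vector:
    return [c + style[0] for c in content]


a = [1.0, 2.0, 3.0]
b = [10.0, 10.0, 13.0]
ab = beautify(a, b, toy_content, toy_style, toy_decode)
aa = beautify(a, a, toy_content, toy_style, toy_decode)  # self-reconstruction
```

In this toy setting, using the target as its own reference returns the target unchanged, which mirrors the role that reconstruction plays in the full model.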
Our GAN module, consisting of two encoders, one decoder, and one discriminator, aims at distilling the beauty/style representation from the reference image and embedding it into the target image for the purpose of beautification. Inspired by recent work [38], we propose to integrate an Instance Normalization (IN) layer after the convolutional layers as part of the encoder for content feature extraction. Meanwhile, a global average pooling layer and a fully connected layer follow the convolutional layers as part of the encoder for beauty feature extraction. Note that we skip IN in the beauty encoder because IN would remove the characteristics of the original features that represent critical beauty-related information [15], whereas it is kept in the content encoder. To cooperate with the beauty encoder and speed up the translation, the decoder is equipped with Adaptive Instance Normalization (AdaIN) [15]. Additionally, we have adopted the popular multi-scale discriminators [40] with Least-Squares GAN (LSGAN) [28] as the discriminator in our GAN module.
Our beauty prediction module is based on fine-tuning an existing LightCNN [41], as shown in Fig. 3. Since it is difficult to train a deep neural network for beauty prediction from scratch, we opt to work with LightCNN [41], a model pre-trained for face recognition on millions of face images. We employ a fine-tuning layer (FC2) to adapt it for beauty score prediction (FC2 plays the role of beauty feature extractor). Meanwhile, in order to preserve the identity during face beautification, we propose to take full advantage of our beauty prediction model by piggybacking on the identity feature it produces. More specifically, the identity feature is generated from the second-to-last fully connected layer (FC1) of LightCNN; note that we have only fine-tuned the last fully connected layer (FC2) for beauty prediction. By using this piggyback trick, we manage to extract both identity and beauty features from one off-the-shelf model.
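To see why instance normalization is skipped in the beauty encoder, the following sketch (an illustration on a single one-dimensional channel, not the paper's implementation) normalizes two channels that share the same spatial pattern but differ in mean and scale. After normalization they become nearly indistinguishable, so any information carried by those statistics is lost.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one feature channel to zero mean and (almost) unit variance."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


base = [0.0, 1.0, 2.0, 3.0]
# Same pattern, but shifted by 5 and scaled by 3: different channel statistics.
shifted_scaled = [5.0 + 3.0 * v for v in base]

norm_base = instance_norm(base)
norm_other = instance_norm(shifted_scaled)

# Largest element-wise difference between the two normalized channels.
max_gap = max(abs(x - y) for x, y in zip(norm_base, norm_other))
```

The residual gap comes only from the small `eps` term; the original offset and scale can no longer be recovered from the normalized channel.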
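The piggyback idea can be sketched as a single forward pass that exposes the outputs of both fully connected layers. The class `PiggybackExtractor`, its tiny dimensions, the random weights and the absence of nonlinearities are all assumptions made for illustration; they do not reproduce LightCNN.

```python
import random


def linear(weights, x):
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class PiggybackExtractor:
    """Toy stand-in for a recognition network whose last layer is fine-tuned."""

    def __init__(self, in_dim=8, id_dim=4, beauty_dim=3, seed=0):
        rng = random.Random(seed)
        self.fc1 = random_matrix(id_dim, in_dim, rng)      # frozen, pre-trained
        self.fc2 = random_matrix(beauty_dim, id_dim, rng)  # fine-tuned for beauty
        self.trainable = ("fc2",)  # every other layer stays fixed

    def extract(self, backbone_feat):
        """Return (identity feature, beauty feature) from one forward pass."""
        identity_feat = linear(self.fc1, backbone_feat)  # FC1 output
        beauty_feat = linear(self.fc2, identity_feat)    # FC2 output
        return identity_feat, beauty_feat


net = PiggybackExtractor()
id_feat, bt_feat = net.extract([0.1 * i for i in range(8)])
```

Because FC2 sits on top of FC1, the identity feature is obtained at no extra cost whenever the beauty feature is computed.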

Fine-granularity Beauty Adjustment
As we argued before, beautification should be modeled as a continuous process instead of a discrete domain transfer. In order to achieve fine-granularity control of the beautification process, we propose to formulate a weighted beautification equation by
$$AB = G(C_a, w_1 S_a + w_2 S_b), \qquad (2)$$
where $S_a = E_s(A)$ is the style code of the target, $w_1 + w_2 = 1$ and $0 \le w_1, w_2 \le 1$. It is easy to observe the two extreme cases: 1) Eq. (2) degenerates into reconstruction when $w_1 = 1, w_2 = 0$; 2) Eq. (2) corresponds to the fullest-extent beautification when $w_1 = 0, w_2 = 1$. Such a linear weighting strategy represents a simple solution for adjusting the amount of beautification.
To make our model more robust, we have adopted the following training strategy: in the training stage the decoder receives only the reference style, i.e., $AB = G(C_a, S_b)$, so that we do not need to train multiple weighted models when the weights vary. We then apply the weighted beautification equation of Eq. (2) directly at test time. In other words, we pretend the beauty feature of the target image A is forgotten during training but partially exploit it during testing (since it is less relevant than the identity feature). In summary, our fine-granularity beauty adjustment strategy relies heavily on the capability of the beauty encoder $E_s$ to reliably extract the beauty representation. The effectiveness of the proposed fine-granularity beauty adjustment is illustrated in Fig. 5.
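The linear weighting of the two style codes can be sketched as follows. The helper names `blend_style` and `beautification_sequence`, and the choice of evenly spaced weights, are assumptions made for illustration; each blended code would be passed to the decoder together with the content code of the target.

```python
def blend_style(style_a, style_b, w2):
    """Return w1 * S_a + w2 * S_b with w1 = 1 - w2 and 0 <= w2 <= 1."""
    if not 0.0 <= w2 <= 1.0:
        raise ValueError("w2 must lie in [0, 1]")
    w1 = 1.0 - w2
    return [w1 * sa + w2 * sb for sa, sb in zip(style_a, style_b)]


def beautification_sequence(style_a, style_b, steps):
    """Blended style codes for evenly spaced weights from w2 = 0 to w2 = 1."""
    if steps < 2:
        raise ValueError("at least two steps are needed")
    return [blend_style(style_a, style_b, i / (steps - 1)) for i in range(steps)]


s_a = [0.0, 2.0]  # toy style code of the target
s_b = [1.0, 4.0]  # toy style code of the reference
codes = beautification_sequence(s_a, s_b, steps=5)
```

The first code reproduces the target style (pure reconstruction) and the last one equals the reference style (fullest beautification), matching the two extreme cases of the weighting.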

Loss Functions
Image reconstruction. Both the encoders and the decoder need to ensure that the target and reference images can be approximately reconstructed from the extracted content/style representations. We adopt the $L_1$-norm, denoted $\|\cdot\|_1$, for the reconstruction loss for the sake of robustness.
Adversarial loss. We apply adversarial losses [12] to match the distribution of the generated image AB to that of the target data B, where the generated image G(AB) is defined by Eq. (1). In other words, the adversarial loss ensures that the beautified face looks as realistic as the reference.
Identity preservation. To preserve the identity information during the process of beautification, we propose to adopt an identity loss function based on the off-the-shelf face recognition model LightCNN [41], trained on millions of faces. Identity features are extracted from the FC1 layer as a $2^{13}$-dimensional vector. The identity loss has three terms: $L^{A}_{ID}$ and $L^{B}_{ID}$ are responsible for identity preservation, and $L^{AB}_{ID}$ aims at preserving the identity after beautification. Note that our objective is to preserve the identity but improve the beauty in the generated image AB, as jointly constrained by Eqs. (4) and (5).
Beauty loss. In order to leverage the beauty feature from the reference, a beauty prediction model is first used to extract beauty features, and we then minimize the $L_1$ distance between the beauty features of the beautified face AB and the reference B:
$$L_{BT} = \| f_{bt}(AB) - f_{bt}(B) \|_1,$$
where $f_{bt}$ denotes the operator extracting the 256-dimensional beauty feature (FC2 as shown in Fig. 3).
Perceptual loss. Unlike makeup transfer, our face beautification seeks a many-to-many mapping in an unsupervised way, which is more challenging, especially in view of both inner-domain and cross-domain variations. As mentioned in [26], semantic inconsistency is a major issue for such unsupervised many-to-many translation. To address this issue, we propose to apply a perceptual loss that minimizes the perceptual distance between the beautified face AB and the reference face B. This is a modified version of [26], where Instance Normalization [38] is performed on VGG [35] features before computing the perceptual distance.
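As a minimal sketch of the $L_1$-based terms above, the following code computes reconstruction and beauty losses on flat feature vectors and combines named terms into one objective. Summing (rather than averaging) the absolute differences and the loss weights passed to `total_loss` are assumptions; the source does not specify either.

```python
def l1_distance(x, y):
    """Sum of absolute differences between two equally sized vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def reconstruction_loss(image, reconstructed):
    """L1 distance between an image and its reconstruction (flattened pixels)."""
    return l1_distance(image, reconstructed)


def beauty_loss(beauty_feat_ab, beauty_feat_b):
    """L1 distance between beauty features of the output and the reference."""
    return l1_distance(beauty_feat_ab, beauty_feat_b)


def total_loss(terms, weights):
    """Weighted sum of named loss terms; the weights are hypothetical."""
    return sum(weights[name] * value for name, value in terms.items())


rec = reconstruction_loss([1.0, 2.0, 3.0], [1.0, 0.0, 5.0])
bt = beauty_loss([0.5, 0.5], [0.5, 0.5])
objective = total_loss({"rec": rec, "bt": bt}, {"rec": 10.0, "bt": 1.0})
```

The adversarial, identity and perceptual terms would enter `total_loss` in the same way once their values are computed by the corresponding networks.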

Training Datasets
Two datasets are used in our experiments. First, we have used CelebA [25] to conduct the beautification experiment (only female celebrities are considered in this paper). The authors of [24] have found that some facial attributes have a positive impact on beauty perception. We have therefore followed their findings to prepare our training datasets: the images containing those positive attributes (e.g., arched eyebrows, heavy makeup, high cheekbones, wearing lipstick) form our reference dataset B, and images that do not contain those attributes form our target (to be beautified) dataset A. We have merged the training and validation sets of CelebA into our new training set in order to enlarge the training size, but keep the testing set the same as in the original protocol [25]. Our finalized training set includes 7195 images for A and 18273 images for B, and the testing set has 724 class-A images and 2112 class-B images. Another dataset called SCUT-FBP5500 [22] is used to train our face beauty prediction network. Following their protocol, we have used 60% of the samples (3300 images) for training and the remaining 40% (2200 images) for testing in our experiment.
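The attribute-based construction of the two domains can be sketched as below. The attribute names follow the CelebA annotation file, where each attribute is 1 (present) or -1 (absent). The exact selection rule is an assumption: here an image goes to B only if all four listed attributes are present and to A only if none is, and the full attribute list used in the paper may be longer than the four examples it names.

```python
POSITIVE_ATTRIBUTES = (
    "Arched_Eyebrows",
    "Heavy_Makeup",
    "High_Cheekbones",
    "Wearing_Lipstick",
)


def assign_domain(attrs):
    """Return 'B' (reference), 'A' (target) or None for one CelebA record.

    `attrs` maps attribute names to 1 (present) or -1 (absent).
    """
    if attrs.get("Male", -1) == 1:
        return None  # only female faces are considered
    flags = [attrs.get(name, -1) == 1 for name in POSITIVE_ATTRIBUTES]
    if all(flags):
        return "B"
    if not any(flags):
        return "A"
    return None  # mixed cases are left out under this assumed rule


reference_like = {name: 1 for name in POSITIVE_ATTRIBUTES}
reference_like["Male"] = -1
target_like = {name: -1 for name in POSITIVE_ATTRIBUTES}
target_like["Male"] = -1
```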

Implementation details
Generative model. Similar to [16], our $E_c$ consists of several strided convolutional layers and residual blocks [14]; all convolutional layers are followed by Instance Normalization (IN) [38]. As for $E_s$, a global average pooling layer and a fully connected (FC) layer follow the strided convolutional layers. The IN layer is removed to preserve the beauty features. Inspired by recent GAN works [15,8,17] that use affine transformation parameters in normalization layers to better represent style, our decoder G is equipped with residual blocks as well as Adaptive Instance Normalization (AdaIN). The parameters of AdaIN are dynamically generated by a multilayer perceptron (MLP) from the beauty codes, similar to [16], as follows:
$$\mathrm{AdaIN}(z, \gamma, \beta) = \gamma \left( \frac{z - \mu(z)}{\sigma(z)} \right) + \beta,$$
where z is the activation of the previous convolutional layer, $\mu$ and $\sigma$ are the channel-wise mean and standard deviation, and $\gamma$ and $\beta$ are parameters generated by the MLP.
Discriminative model. We have implemented multi-scale discriminators [39] to guide the generative model to generate images that are both realistic and globally consistent. In addition, LSGAN [28] is used in our discriminative model to improve the image quality.
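The AdaIN operation used in the decoder can be sketched on plain Python lists as follows. This is an illustrative single-sample version in which each channel is a flat list of activations; the `eps` term for numerical stability is an assumption, and in the model `gamma` and `beta` would come from the MLP rather than being set by hand.

```python
import math


def adain(channel, gamma, beta, eps=1e-5):
    """AdaIN for one channel: gamma * (z - mean) / std + beta."""
    n = len(channel)
    mean = sum(channel) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in channel) / n + eps)
    return [gamma * (v - mean) / std + beta for v in channel]


def adain_feature_map(channels, gammas, betas):
    """Apply AdaIN channel by channel with per-channel gamma and beta."""
    return [adain(c, g, b) for c, g, b in zip(channels, gammas, betas)]


out = adain([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=5.0)
out_mean = sum(out) / len(out)
out_std = math.sqrt(sum((v - out_mean) ** 2 for v in out) / len(out))
```

After the operation the channel mean equals `beta` and the standard deviation is approximately `gamma`, which is how the style code steers the decoder's feature statistics.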
Beauty and identity model. As shown in Fig. 3, we have used an off-the-shelf face recognition model, LightCNN [41], which was trained on millions of faces and achieved state-of-the-art performance in several benchmark studies. In order to extract face beauty features, we fine-tune the pre-trained LightCNN model: the last fully connected (FC2) layer is the learnable layer for beauty score prediction, and all previous layers are kept fixed during the training process. When tested on the popular SCUT-FBP5500 dataset [22], our method achieves an MAE of 0.2372 on the testing set, which outperforms theirs (0.2518) [22] in our experiment, a relative reduction of about 5.8%.
In our experimental setting, the off-the-shelf LightCNN is considered as the identity feature extractor and the fine-tuned beauty prediction model is used as the face beauty extractor. In order to extract both ID and beauty features using one model, we have taken advantage of the beauty
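For reference, the mean absolute error (MAE) used to evaluate the beauty predictor can be computed as in the sketch below. The scores in the example are made-up placeholders, not values from SCUT-FBP5500.

```python
def mean_absolute_error(predicted, ground_truth):
    """Average absolute difference between predicted and true scores."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted, ground_truth))
    return total / len(predicted)


# Placeholder scores on a 5-point scale, for illustration only.
predicted_scores = [3.0, 2.5, 4.0]
true_scores = [3.5, 2.5, 3.0]
mae = mean_absolute_error(predicted_scores, true_scores)
```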

Baseline Methods
CycleGAN [44] A cycle consistency loss was introduced to facilitate the image-to-image translation, which provides a simple but efficient solution to style transfer from unpaired data.
DRIT [19] An architecture that projects images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Similar to CycleGAN, a cross-cycle consistency loss based on disentangled representations is introduced to deal with unpaired data. Unlike CycleGAN, DRIT is capable of generating diverse images on a wide range of tasks.
MUNIT [16] A framework for multimodal unsupervised image-to-image translation, where images are decomposed into a content code that is domain-invariant and a style code that captures domain-specific properties.By combining content code with a random style code, MUNIT can also generate diverse outputs from the target domain.
As mentioned in Section 2, all baseline methods have weaknesses when applied to reference-based beautification. CycleGAN cannot take advantage of specific references for translation, so its outputs lack diversity once training is done. DRIT and MUNIT are capable of many-to-many translation but fail to generate a sequence of correlated images (e.g., faces with increasing beauty scores). By contrast, our model is capable not only of beautifying faces based on a given reference but also of controlling the degree of beautification at fine granularity, as shown in Fig. 5.

Qualitative and Quantitative Evaluations
User study. To evaluate image quality from the perspective of human perception, we conduct a user study in which users vote for the most attractive image among ours and the baselines. 100 face images from the testing set are submitted to Amazon Mechanical Turk (AMT), and each survey requires 20 users. We collect 2000 data points in total to evaluate human preference. The final results, shown in Table 1, demonstrate the superiority of our model.
Beauty Score Improvement. To further evaluate the effectiveness of the proposed beautification approach, we have fed the beautified images into our face beauty prediction model to obtain their beauty scores. The beauty prediction model is trained on SCUT-FBP as mentioned before, and beauty scores in that dataset are on a 5-point scale. After computing and averaging the scores over the 724 testing images, our model outperforms all other methods and gains a 37.11% increase over the average beauty score of the original inputs, as shown in Table 2.
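The bookkeeping behind such a user study can be sketched as below: 100 images rated by 20 users each give 2000 votes, and the preference for each method is its share of those votes. The vote list in the example is a made-up placeholder, not the collected AMT data.

```python
from collections import Counter


def expected_votes(num_images, users_per_image):
    """Total number of votes collected in the study."""
    return num_images * users_per_image


def preference_shares(votes):
    """Percentage of votes received by each method."""
    if not votes:
        raise ValueError("no votes to tally")
    counts = Counter(votes)
    return {method: 100.0 * n / len(votes) for method, n in counts.items()}


total_votes = expected_votes(100, 20)
# Placeholder votes for illustration only.
shares = preference_shares(["ours", "ours", "ours", "cyclegan"])
```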
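The relative increase reported above is the change in the average predicted score divided by the average score of the original inputs. A minimal sketch, using placeholder scores rather than the paper's data:

```python
def mean(values):
    if not values:
        raise ValueError("no values to average")
    return sum(values) / len(values)


def relative_increase(before, after):
    """Percentage increase of `after` over `before`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (after - before) / before


def average_score_gain(scores_before, scores_after):
    """Relative increase of the mean predicted beauty score, in percent."""
    return relative_increase(mean(scores_before), mean(scores_after))


# Placeholder predicted scores (5-point scale), for illustration only.
gain = average_score_gain([1.0, 3.0], [2.0, 3.0])
```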

Discussions and Limitations
When compared against recently developed makeup transfer methods such as BeautyGAN [21] and BeautyGlow [5], we note that our approach differs in the following aspects. Similar to BeautyGAN [21], ours assumes the availability of a reference image; but unlike BeautyGAN [21], which focuses on local touch-up only, ours is capable of transferring both global and local beauty features from the reference to the target. Similar to BeautyGlow [5], ours can adjust the magnification in the latent space; but unlike BeautyGlow [5], ours can improve the beauty score (rather than only increasing the extent of makeup).
Both the user study and the beauty score evaluation have demonstrated the superiority of our model. The proposed model is robust to low-quality images, such as those with blur and challenging lighting conditions, as shown in Fig. 8. However, we also notice a few typical failure cases in which our model tends to produce noticeable artifacts when the inputs have large occlusions and pose variations (please refer to Fig. 9). This is most likely caused by poor alignment: our references are mostly frontal images, while large occlusions and pose variations lead to misalignment.

Conclusions and Future Works
In this paper, we have studied the problem of face beautification and presented a novel framework that is more flexible than makeup transfer. Our approach integrates style-based synthesis with beauty score prediction by piggybacking a LightCNN onto a GAN-based architecture. Unlike makeup transfer, our approach targets many-to-many (instead of one-to-one) translation, where multiple outputs can be defined by either different references or varying beauty scores. In particular, we have constructed two interacting networks for beautification and beauty prediction. Through a simple weighting strategy, we manage to demonstrate fine-granularity control of the beautification process. Personalized beautification is expected to attract increasing attention in the coming years. In this work we have only focused on the beautification of female Caucasian faces. A similar question can be studied for other populations, even though the relationship between gender, race, cultural background and the perception of facial attractiveness has remained under-researched in the literature. How AI can help reshape the practice of personal makeup and plastic surgery is an emerging question for future research.

Figure 2: Overview of the proposed network architecture.

Figure 3: Fine-tuning network for beauty score prediction.

Figure 5: Beauty degree adjustment by controlled beauty representation (the leftmost is the original input, from left to right: light to heavy beautification).

Figure 6: Comparison of beautification with different references against baseline models. The top images are the original inputs and the left images are the five references; note that the CycleGAN outputs are the same regardless of the reference.

Figure 8: Our model is robust to low quality images and small pose variations.

Figure 9: Failure cases with artifacts: large occlusions and pose variations.

Figure 10: Comparisons with and without identity loss $L_{ID}$.

To investigate the importance of each loss, we experiment with three variants of our model by removing $L_{ID}$, $L_{BT}$ and $L_P$, one at a time. See Figs. 10, 11 and 12 for visual comparisons. These losses complement each other and work in harmony to reach the optimum beautification effect. This further demonstrates that our loss functions and architecture are well designed for the facial beautification task.

Figure 12: Comparisons with and without perceptual loss $L_P$.

Table 1: User study preference for beautified images.

Table 2: Average beauty score after beautification.