<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurorobot.</journal-id>
<journal-title>Frontiers in Neurorobotics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurorobot.</abbrev-journal-title>
<issn pub-type="epub">1662-5218</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnbot.2023.1182375</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Face morphing attack detection based on high-frequency features and progressive enhancement learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Jia</surname> <given-names>Cheng-kun</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/2233543/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Liu</surname> <given-names>Yong-chao</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2232381/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname> <given-names>Ya-ling</given-names></name>
</contrib>
</contrib-group>
<aff><institution>School of Electrical and Information Engineering, Hunan Institute of Traffic Engineering</institution>, <addr-line>Hengyang</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Weiran Yao, Harbin Institute of Technology, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Vittorio Cuculo, University of Modena and Reggio Emilia, Italy; &#x000D6;nder Tutsoy, Adana Science and Technology University, T&#x000FC;rkiye; Luis Arturo Soriano, Chapingo Autonomous University, Mexico</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Yong-chao Liu <email>excellence_lyc&#x00040;163.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>05</day>
<month>06</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>17</volume>
<elocation-id>1182375</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>03</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>05</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Jia, Liu and Chen.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Jia, Liu and Chen</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Face morphing attacks have become increasingly sophisticated, and existing methods exhibit limitations in capturing fine-grained texture and detail changes. To overcome these limitations, a detection method based on high-frequency features and progressive enhancement learning was proposed in this study. In this method, high-frequency information is first extracted from the three color channels of the image to accurately capture detail and texture changes. Next, a progressive enhancement learning framework was designed to fuse the high-frequency information with RGB information. This framework includes self-enhancement and interactive-enhancement modules that progressively enhance features to capture subtle morphing traces. Experiments conducted on standard databases, with comparisons against nine classical methods, revealed that the proposed approach achieves excellent performance.</p></abstract>
<kwd-group>
<kwd>face morphing attacks</kwd>
<kwd>machine learning</kwd>
<kwd>high-frequency features</kwd>
<kwd>progressive enhancement learning</kwd>
<kwd>self-enhancement module</kwd>
<kwd>interactive-enhancement module</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="7"/>
<equation-count count="12"/>
<ref-count count="25"/>
<page-count count="11"/>
<word-count count="6189"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction</title>
<p>Facial features are widely used as personal identity authentication information. With improvements in recognition accuracy, face recognition systems are increasingly being used in banking, mobile phone national ID card systems, face payment, and border management.</p>
<p>However, studies have revealed that face recognition systems are vulnerable to face morphing attacks (Scherhag et al., <xref ref-type="bibr" rid="B20">2017</xref>), in which two facial images with different biometric characteristics are synthesized into a morphed facial image whose biometric information is similar to both source images. A morphed face image causes a face recognition system to match two different people. If such images are embedded in passports or other electronic travel documents, border management systems become vulnerable.</p>
<p>In many countries, applicants provide the facial images used in e-passport applications. Criminals can use free software to morph their facial images with those of accomplices of similar appearance. Because morphed faces resemble the real faces, if an accomplice uses a morphed face to apply for electronic travel documents, criminals can use the facial images on those documents to deceive border inspectors and recognition systems and pass automatic border control. Because such attacks have been proven effective (Ferrara et al., <xref ref-type="bibr" rid="B5">2014</xref>), detecting faces generated by this attack is critical for social security.</p>
<p>Detection approaches are classified into conventional and depth-feature-based methods. Conventional feature-based methods include texture-based (Raghavendra et al., <xref ref-type="bibr" rid="B16">2016</xref>, <xref ref-type="bibr" rid="B15">2017</xref>; Venkatesh et al., <xref ref-type="bibr" rid="B23">2020</xref>) and quality-based methods (Makrushin et al., <xref ref-type="bibr" rid="B12">2017</xref>; Debiasi et al., <xref ref-type="bibr" rid="B4">2018a</xref>,<xref ref-type="bibr" rid="B3">b</xref>; Scherhag et al., <xref ref-type="bibr" rid="B19">2019</xref>). With deep learning technology evolving rapidly, methods based on depth features (Seibold et al., <xref ref-type="bibr" rid="B22">2017</xref>; Long et al., <xref ref-type="bibr" rid="B10">2022</xref>, <xref ref-type="bibr" rid="B9">2023</xref>) are widely used. Conventional feature methods are simple to implement but cannot achieve satisfactory discriminability. By contrast, although depth-feature-based methods can extract semantic information effectively and exhibit superior generalization, they tend to extract global information from images and ignore details. Studies (Luo et al., <xref ref-type="bibr" rid="B11">2021</xref>) have revealed that existing deep learning methods perform poorly in recognizing realistic synthetic faces because they cannot extract details effectively.</p>
<p>With advances in morphing attack technology (Makrushin et al., <xref ref-type="bibr" rid="B12">2017</xref>; Qin et al., <xref ref-type="bibr" rid="B14">2020</xref>), morphed faces are becoming increasingly realistic, making it difficult to discern real from morphed images because the differences are subtle and localized. Consequently, the limitations of existing methods are especially concerning. To address this problem, a novel face morphing attack detection method based on high-frequency features and progressive enhancement learning was proposed to effectively extract details and overcome the limitations of existing methods. The contributions of this study are as follows:</p>
<list list-type="bullet">
<list-item><p>A novel face morphing detection method based on high-frequency features was proposed. High-frequency features typically represent parts of the image with high variation rates, including details and texture information. The use of high-frequency information as the input to a neural network can better capture image details, thereby improving the performance and accuracy of the model in detecting morphed images.</p></list-item>
<list-item><p>A progressive enhancement learning framework based on two-stream networks was proposed for training a detection model. The framework comprises self-enhancement and interactive-enhancement modules, which gradually improve the feature representation of the model and enable it to accurately capture subtle morphing traces.</p></list-item>
<list-item><p>The proposed method was evaluated on standard databases. Experiments on two databases revealed excellent performance in both single- and cross-dataset tests.</p></list-item>
</list>
<p>The rest of the paper is organized as follows: Section 2 introduces related work. Section 3 describes the proposed method. Section 4 provides experimental results and analysis. Finally, Section 5 presents conclusions.</p>
</sec>
<sec id="s2">
<title>2. Related work</title>
<p>Face morphing detection is a critical task for ensuring social security, and various techniques have been proposed to address it. In this section, we review several state-of-the-art methods for detecting face morphing. Specifically, we categorize these methods into three types: texture-based, image-quality-based, and depth-feature-based methods. We discuss the strengths and weaknesses of each type and highlight the need for effective and accurate face morphing detection techniques.</p>
<sec>
<title>2.1. Face morphing detection based on texture</title>
<p>Raghavendra et al. (2016) proposed the use of binary statistical image features (BSIF) to detect morphed faces. The method was tested on a large database of 450 morphed face images created from 110 subjects of different races, ages, and genders, and experimental results proved the method efficient. Subsequently, Raja et al. proposed a method using multi-color spatial features (Raghavendra et al., <xref ref-type="bibr" rid="B15">2017</xref>), in which texture features extracted from the HSV and YCbCr color spaces were used for detection. The bona fide presentation classification error rate (BPCER) of this method was 1.73%, and the attack presentation classification error rate (APCER) was 7.59%, revealing superior detection performance compared with earlier methods. Venkatesh et al. proposed the use of multiple features to improve detection performance (Venkatesh et al., <xref ref-type="bibr" rid="B23">2020</xref>), extracting features with BSIF, HOG, and LBP. Compared with earlier studies, this model exhibited stable detection performance under various environments and conditions.</p>
</sec>
<sec>
<title>2.2. Face morphing detection based on image quality</title>
<p>Neubert et al. proposed an automated detection approach based on JPEG degradation of continuous images (Makrushin et al., <xref ref-type="bibr" rid="B12">2017</xref>). The accuracy rate was 90.1% under laboratory conditions and 84.3% under real-world conditions. Photo response non-uniformity (PRNU) is a source of pattern noise in digital cameras, generated when photons in a digital image sensor are converted into electrons. PRNU features are widely used in image forgery detection because operations such as image copying or moving change the PRNU features of an image. Accordingly, Debiasi et al. (<xref ref-type="bibr" rid="B4">2018a</xref>) proposed the use of PRNU features for detection. According to experimental results, PRNU analysis achieved reliable detection of morphed faces and maintained excellent performance even under image scaling and sharpening. Debiasi et al. (<xref ref-type="bibr" rid="B3">2018b</xref>) proposed an improved version in which two PRNU-based detection methods were used to analyze the Fourier spectrum of the PRNU, and statistical methods were used to quantify the spectral distinction between real and morphed face images. The PRNU is affected by the fusion operation in both the spatial and frequency domains. Scherhag et al. (<xref ref-type="bibr" rid="B19">2019</xref>) introduced spatial features for the parallel analysis of frequency domain features.</p>
</sec>
<sec>
<title>2.3. Face morphing detection based on depth feature</title>
<p>Most morphing detection methods use deep learning, especially pre-trained CNN architectures. Seibold et al. (<xref ref-type="bibr" rid="B22">2017</xref>) first proposed a detection approach based on deep learning, evaluating three popular network structures: AlexNet, GoogLeNet, and VGG19. Experimental results revealed that VGG19 with pre-processing achieved excellent performance. In subsequent studies, evaluation has gradually been combined with ResNet, Inception, and other networks. Long et al. used a lightweight network structure and local feature information to improve accuracy; the network achieved high accuracy with fewer parameters (Long et al., <xref ref-type="bibr" rid="B10">2022</xref>). To enhance the generalization ability of the network, Long et al. (<xref ref-type="bibr" rid="B9">2023</xref>) proposed a detection method based on a two-stream network with a channel attention mechanism and residuals of multiple color spaces, in which the residual noise of multiple color spaces and the attention mechanism were used to detect morphed faces. Experimental results revealed that the proposed method outperformed existing methods.</p>
<p>Methods based on conventional features are simple to implement but cannot achieve satisfactory discriminability, whereas methods based on deep features generally outperform conventional methods but tend to extract global information from images and ignore details. To overcome these limitations, a detection method based on high-frequency features and progressive enhancement learning was proposed for detecting morphed faces. High-frequency features typically represent the parts of an image with high variation rates, including detail and texture information. Using high-frequency information as input to a neural network enhances the details captured from the image. Progressive enhancement learning progressively strengthens feature representations by inserting a self-enhancement module after each convolution block of a convolutional neural network and an interactive-enhancement module after each stage. This method effectively utilizes high-frequency information to better locate subtle morphing traces.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Proposed method</title>
<p>The proposed scheme is displayed in <xref ref-type="fig" rid="F1">Figure 1</xref>. The scheme detects morphed face images by using high-frequency features and progressive enhancement learning. First, the image is preprocessed and decomposed into the R, G, and B color channels, and high-frequency features are extracted from each channel. Finally, the merged high-frequency information image and the RGB image are input into the designed progressive enhancement learning framework for end-to-end training to detect morphed faces. The scheme consists of three parts: pre-processing, high-frequency information extraction, and the progressive enhancement learning framework. Each part is described below.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Presented approach.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1182375-g0001.tif"/>
</fig>
<sec>
<title>3.1. Pre-processing</title>
<p>To effectively extract features from the image, pre-processing is critical. In the pre-processing stage, the dlib detector was first used for face detection (King, <xref ref-type="bibr" rid="B8">2009</xref>). The detected faces were then cropped to 224 &#x000D7; 224 pixels to ensure that the morphing detection algorithm is applied to the face area; this size also matches the input layer of the progressive enhanced two-stream network.</p>
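As a concrete illustration of this stage, the crop-and-resize step can be sketched as follows. The detector call itself is omitted; the bounding-box format and the nearest-neighbour resizing are illustrative assumptions, since the paper does not specify the interpolation method.

```python
import numpy as np

def preprocess_face(image, box, out_size=224):
    """Crop a detected face region and resize it to out_size x out_size.

    `image` is an H x W x 3 array; `box` is a (top, left, bottom, right)
    rectangle such as one produced by a face detector. Nearest-neighbour
    resizing is used here only to keep the sketch dependency-free.
    """
    top, left, bottom, right = box
    face = image[top:bottom, left:right]
    h, w = face.shape[:2]
    # Map every output pixel to its nearest source pixel.
    rows = np.minimum(np.arange(out_size) * h // out_size, h - 1)
    cols = np.minimum(np.arange(out_size) * w // out_size, w - 1)
    return face[rows[:, None], cols[None, :]]
```

The 224 &#x000D7; 224 output is then fed directly to the two-stream network's input layer.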
</sec>
<sec>
<title>3.2. High-frequency information extraction</title>
<p>High-frequency information contains considerable detail as well as noise. The detail information can be used to detect subtle differences between real and morphed faces, and the noise can suppress the image content. Therefore, high-frequency information was introduced to detect morphed faces.</p>
<p>To extract the high-frequency information of facial image <italic>X</italic>, the high-frequency information of the R, G, and B color channels was extracted. First, the input image was decomposed into its three channels, represented as <italic>X</italic><sub><italic>r</italic></sub>, <italic>X</italic><sub><italic>g</italic></sub>, and <italic>X</italic><sub><italic>b</italic></sub>. The corresponding frequency spectra <italic>X</italic><sub><italic>fr</italic></sub>, <italic>X</italic><sub><italic>fg</italic></sub>, and <italic>X</italic><sub><italic>fb</italic></sub> are obtained through the Fourier transform as follows:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mi>G</mml:mi><mml:mi>B</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Where, <inline-formula><mml:math id="M3"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>W</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, and <italic>D</italic> represents the discrete Fourier transform (DFT). The image obtained after the DFT exhibits a regular frequency layout: the low-frequency response is concentrated in the upper left corner, and the high-frequency response in the lower right corner. To extract high-frequency information, the low-frequency part in the upper left corner is moved to the middle. Specifically, the four quadrants of the frequency-domain image are exchanged symmetrically, that is, the first and third quadrants exchange positions, as do the second and fourth quadrants, moving the zero-frequency component to the center of the spectrum. Next, the image content is suppressed by filtering out low-frequency information to magnify high-frequency subtle artifacts as follows:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Where, <italic>F</italic> represents high-pass filtering and &#x003B1; controls the proportion of low-frequency components to be filtered out. The value of &#x003B1; is generally limited to the range [0.1, 0.5], because within this range &#x003B1; not only filters low-frequency components to a certain extent but also retains the high-frequency information in the image, achieving a superior filtering effect. Therefore, the value of &#x003B1; was set to 0.33. Finally, the frequency spectrum containing high-frequency information was converted back into the RGB color space by using the inverse Fourier transform to obtain the output image with high-frequency information as follows:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>h</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>h</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>h</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:mrow><mml:mi>h</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mi>h</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:msubsup><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;,</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Where, <inline-formula><mml:math id="M6"><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">,X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">,X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>W</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, and <italic>D</italic><sup>&#x02212;1</sup> represents inverse discrete Fourier transform (IDFT). Finally, the high-frequency information images extracted from the three channels are spliced along the channel direction to obtain the final high-frequency feature image as follows:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The high-frequency information extraction process is displayed in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Extraction process of three-channel high-frequency features.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1182375-g0002.tif"/>
</fig>
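The extraction pipeline of Eqs. (1)&#x02013;(5) can be sketched with NumPy's FFT routines. The centred square mask whose half-width is controlled by &#x003B1; is an assumption; the text specifies only that a fraction &#x003B1; of low-frequency components is filtered out.

```python
import numpy as np

def high_frequency_image(x, alpha=0.33):
    """Per-channel high-frequency extraction, following Eqs. (1)-(5).

    x: H x W x 3 RGB image (float array). For each colour channel: apply the
    DFT, shift the zero-frequency component to the centre, zero out a central
    low-frequency square (half-width alpha * H/2 and alpha * W/2 -- an
    illustrative mask shape), apply the inverse DFT, and concatenate the
    three filtered channels along the channel axis.
    """
    h, w, _ = x.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(alpha * h / 2), int(alpha * w / 2)
    out = np.empty(x.shape, dtype=np.float64)
    for c in range(3):
        spec = np.fft.fftshift(np.fft.fft2(x[:, :, c]))  # low freq to centre
        spec[cy - ry:cy + ry, cx - rx:cx + rx] = 0       # suppress low freq
        out[:, :, c] = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
    return out
```

A constant (content-only, detail-free) image maps to an all-zero output, which is exactly the content-suppression behaviour described above.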
</sec>
<sec>
<title>3.3. Progressive enhancement learning framework</title>
<p>Attention mechanisms are widely used in image processing tasks (Sa et al., <xref ref-type="bibr" rid="B17">2021</xref>; Gu et al., <xref ref-type="bibr" rid="B6">2022</xref>). Inspired by these mechanisms, this study proposed a progressive enhancement learning framework (PELF) to enhance detection performance by combining RGB image information with high-frequency information. The RGB image information provides basic color and shape information, whereas the high-frequency image information provides detailed information. Fusing RGB and high-frequency features enables a comprehensive feature representation, which results in improved detection performance.</p>
<p>The framework is based on a two-stream network architecture in which RGB images and the corresponding high-frequency information images are fed into the network simultaneously. The backbone network is ShuffleNetV2, trained end to end. To enhance the features both within and between streams, self-enhancement and interactive-enhancement modules were designed: each convolutional block of the backbone is followed by a self-enhancement module, and an interactive-enhancement module is inserted after each stage. The self-enhancement module enhances the features of each stream, whereas the interactive-enhancement module enhances the feature interaction between the RGB and high-frequency streams. This progressive feature enhancement effectively locates subtle morphing traces and improves detection performance. In the feature fusion stage, the AFF module (Dai et al., <xref ref-type="bibr" rid="B2">2021</xref>) is used to fuse the RGB and high-frequency features, as displayed in <xref ref-type="fig" rid="F3">Figure 3</xref>. This fusion exploits the complementarity and correlation between the two features, improving their expressive power and classification performance. After passing through the AFF module, the output dimension remains consistent with the input dimension, 7 &#x000D7; 7 &#x000D7; 1024. The fused features are then sent to the Softmax layer for classification.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>AFF module.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1182375-g0003.tif"/>
</fig>
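As a parameter-free stand-in for this fusion stage, the following NumPy sketch derives a channel-wise sigmoid weight from the two streams and uses it as a soft selector between them. The real AFF module (Dai et al., 2021) uses learned multi-scale channel attention; this sketch only illustrates that the fused output keeps the 7 &#x000D7; 7 &#x000D7; 1024 input dimensions, and the function name and pooling choice are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attentional_fusion(f_rgb, f_hf):
    """Simplified attentional fusion of two 7 x 7 x 1024 feature maps.

    A channel descriptor is computed from the sum of the two streams via
    global average pooling; its sigmoid serves as a per-channel soft weight
    that blends the RGB and high-frequency features.
    """
    s = f_rgb + f_hf
    w = sigmoid(s.mean(axis=(0, 1)))     # per-channel weights, shape (C,)
    return w * f_rgb + (1.0 - w) * f_hf  # same shape as the inputs
```

Because the weight is a convex combination per channel, the fused map stays within the range spanned by the two input streams.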
<sec>
<title>3.3.1. Self-enhancement module</title>
<p>Inspired by the channel attention mechanism, a self-enhancement module (<xref ref-type="fig" rid="F4">Figure 4</xref>) was designed to enhance the features of each stream. Specifically, the global features of each channel are extracted through global average pooling (GAP) and global max pooling (GMP), and the global spatial features of each channel are considered as the representation of that channel, forming a 1 &#x000D7; 1 &#x000D7; <italic>C</italic> channel descriptor. The description is as follows:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mtext class="textrm" mathvariant="normal">1</mml:mtext><mml:mo>=</mml:mo><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">, S2</mml:mtext><mml:mo>=</mml:mo><mml:mi>G</mml:mi><mml:mi>M</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Self-enhancement module.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1182375-g0004.tif"/>
</fig>
<p>Where, <italic>f</italic><sub><italic>in</italic></sub> represents the input feature map. To effectively capture cross-channel interactions, local interaction information is captured from each channel and its <italic>k</italic> neighbors. To this end, the obtained global spatial features <italic>S1</italic> and <italic>S2</italic> are subjected to fast one-dimensional convolution with a kernel size of <italic>k</italic>. Passing the convolved features through a sigmoid function yields two channel attention maps, <italic>Z1</italic> and <italic>Z2</italic>. The description is as follows:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>Z</mml:mi><mml:mtext class="textrm" mathvariant="normal">1</mml:mtext><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>C</mml:mi><mml:mn>1</mml:mn><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi><mml:mtext class="textrm" mathvariant="normal">1</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext><mml:mi>Z</mml:mi><mml:mtext class="textrm" mathvariant="normal">2</mml:mtext><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>C</mml:mi><mml:mn>1</mml:mn><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi><mml:mtext class="textrm" mathvariant="normal">2</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>C1D</italic> represents one-dimensional convolution, &#x003C3; represents the sigmoid function, and the convolution kernel size <italic>k</italic> is the number of neighboring channels that participate in the attention prediction for each channel. The final channel attention map <italic>Z</italic> is computed by adding <italic>Z1</italic> and <italic>Z2</italic>; this map multiplies the input feature of each stream <italic>f</italic><sub><italic>in</italic></sub> to yield an enhanced feature representation, which is then added to the original input feature to produce the final output <italic>f</italic><sub><italic>out</italic></sub>. The description is as follows:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>Z</mml:mi><mml:mo>=</mml:mo><mml:mi>Z</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:mi>Z</mml:mi><mml:mn>2</mml:mn><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E9"><label>(9)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02297;</mml:mo><mml:mi>Z</mml:mi><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>f</italic><sub><italic>out</italic></sub> represents the output feature of the module. The self-enhancement module was inserted after each convolution block; through channel attention, the traces present in the different input spaces were captured to enhance the features of each stream.</p>
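<p>To make the data flow of Equations (6)&#x02013;(9) concrete, the self-enhancement module can be sketched in PyTorch as follows. This is an illustrative reconstruction from the text, not the authors' released code; in particular, sharing one 1D convolution between the GAP and GMP branches is an assumption.</p>

```python
import torch
import torch.nn as nn


class SelfEnhancement(nn.Module):
    """Sketch of the self-enhancement module, Eqs. (6)-(9).

    Illustrative reconstruction: sharing the 1-D convolution between the
    GAP and GMP branches is an assumption, not confirmed by the paper.
    """

    def __init__(self, k: int = 3):
        super().__init__()
        # Fast 1-D convolution over the 1 x 1 x C channel descriptor;
        # k is the number of neighboring channels used for attention.
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_in: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f_in.shape
        s1 = f_in.mean(dim=(2, 3))                     # Eq. (6): GAP -> (B, C)
        s2 = f_in.amax(dim=(2, 3))                     # Eq. (6): GMP -> (B, C)
        z1 = self.sigmoid(self.conv(s1.unsqueeze(1)))  # Eq. (7)
        z2 = self.sigmoid(self.conv(s2.unsqueeze(1)))  # Eq. (7)
        z = (z1 + z2).view(b, c, 1, 1)                 # Eq. (8)
        return f_in + f_in * z                         # Eq. (9): residual enhancement
```

<p>The module is shape-preserving, so it can be inserted after any convolution block without changing the downstream architecture.</p>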
</sec>
<sec>
<title>3.3.2. Interactive-enhancement module</title>
<p>To exploit RGB information and high-frequency information, an interactive-enhancement module (<xref ref-type="fig" rid="F5">Figure 5</xref>) was used to enhance the interaction of two-stream features.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Interactive-enhancement module.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1182375-g0005.tif"/>
</fig>
<p>As displayed in <xref ref-type="fig" rid="F5">Figure 5</xref>, <italic>U1</italic> and <italic>U2</italic> represent the feature maps of the frequency stream and the RGB stream, respectively, at the <italic>l</italic>-th stage of the network, and H, W, and C represent the height, width, and number of channels of the feature maps. First, <italic>U1</italic> and <italic>U2</italic> are concatenated along the channel dimension to obtain <italic>U</italic>. Next, average-pooling and max-pooling operations along the channel dimension are applied to <italic>U</italic> to generate effective feature descriptors, and a 7 &#x000D7; 7 convolution reduces them to a single channel. The spatial attention map is then generated by a sigmoid function. Finally, this map is multiplied with the input feature of each stream to obtain the enhanced feature as follows:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mi>V</mml:mi></mml:mtd><mml:mtd columnalign='left'><mml:mo>=</mml:mo></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>&#x003C3;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mn>7</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>U</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>;</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>U</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mn>7</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>Z</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mi>s</mml:mi></mml:msubsup><mml:mo>;</mml:mo><mml:msubsup><mml:mi>Z</mml:mi><mml:mrow><mml:mi>max</mml:mi></mml:mrow><mml:mi>s</mml:mi></mml:msubsup><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x000A0;</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E12"><label>(11)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>Z</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>V</mml:mi><mml:mo>&#x02297;</mml:mo><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x02297; represents element-wise multiplication, &#x003C3; represents the sigmoid function, and <italic>U</italic><sub><italic>Z</italic></sub> is the enhanced feature of each input stream <italic>U</italic><sub><italic>i</italic></sub> produced by the interactive-enhancement module. The module is inserted after each stage, following the self-enhancement module, so that the RGB and high-frequency branches are enhanced simultaneously.</p>
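<p>A minimal PyTorch sketch of Equations (10) and (11) follows. It is an illustrative reconstruction under a CBAM-style spatial-attention reading of the text, not the authors' implementation; the class and variable names are chosen for exposition.</p>

```python
import torch
import torch.nn as nn


class InteractiveEnhancement(nn.Module):
    """Sketch of the interactive-enhancement module, Eqs. (10)-(11).

    Illustrative reconstruction: the two streams are concatenated, pooled
    along the channel axis, and a shared 7x7 conv yields one spatial map V.
    """

    def __init__(self):
        super().__init__()
        # Two pooled maps in, one spatial attention map out.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, u1: torch.Tensor, u2: torch.Tensor):
        u = torch.cat([u1, u2], dim=1)                 # concat along channels
        avg = u.mean(dim=1, keepdim=True)              # (B, 1, H, W)
        mx = u.amax(dim=1, keepdim=True)               # (B, 1, H, W)
        v = self.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # Eq. (10)
        return u1 * v, u2 * v                          # Eq. (11) for each stream
```

<p>Because <italic>V</italic> is computed from both streams but applied to each one, the RGB and high-frequency branches are enhanced with a shared spatial prior.</p>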
</sec>
</sec>
</sec>
<sec id="s4">
<title>4. Experimental results and analysis</title>
<sec>
<title>4.1. Datasets and evaluation criteria</title>
<p>The FEI and HNU datasets (Zhang L. B. et al., <xref ref-type="bibr" rid="B24">2018</xref>; Peng et al., <xref ref-type="bibr" rid="B13">2019</xref>) were used, with splicing-based morphing as the primary attack mode. The images in the HNU dataset were collected from Chinese subjects and cover faces of both genders. To ensure a convincing fusion effect, individuals of similar age were selected, and the same lighting and background conditions were used. For evaluating generalization, the HNU dataset includes four sub-protocols that differ in their pixel fusion factors and position fusion factors. In HNU (MDB1), both the pixel fusion factor and the position fusion factor are fixed at 0.5, so the two source faces contribute equally to the fused photograph. This setting yields the strongest attack, because the fused face image is highly similar to the document holder from the perspective of both human vision and face recognition systems. In practice, however, photographs may be fused with various proportions of pixels and positions. To simulate this scenario, the position fusion factor of HNU (MDB2) and the pixel fusion factor of HNU (MDB3) were randomly selected from the range 0.1 to 0.9, with the other factor fixed at 0.5; in HNU (MDB4), both factors were randomly selected. The FEI dataset was collected from European and American subjects, and both its position fusion factor and pixel fusion factor are fixed at 0.5. The details of the two datasets are presented in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
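<p>The pixel fusion factor can be read as the weight of an alpha blend between the two source faces. The NumPy sketch below illustrates only this pixel-blending step, assuming the faces are already warped to the blended landmark positions determined by the position fusion factor; it is not the generation pipeline actually used to build the datasets.</p>

```python
import numpy as np


def blend_pixels(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend two aligned face images.

    alpha is the pixel fusion factor: 0.5 gives both sources an equal
    contribution (the HNU MDB1 / FEI setting). Landmark warping by the
    position fusion factor is assumed to have been applied beforehand.
    """
    blended = alpha * img_a.astype(np.float64) + (1.0 - alpha) * img_b.astype(np.float64)
    return np.clip(blended, 0, 255).astype(np.uint8)
```

<p>Drawing alpha uniformly from 0.1 to 0.9 instead of fixing it at 0.5 reproduces the varied-factor setting of the MDB2&#x02013;MDB4 sub-protocols.</p>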
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>HNU and FEI dataset.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="center" colspan="2"><bold>Training set</bold></th>
<th valign="top" align="center" colspan="2"><bold>Validation set</bold></th>
<th valign="top" align="center" colspan="2"><bold>Testing set</bold></th>
<th valign="top" align="center"><bold>Pixel fusion factor</bold></th>
<th valign="top" align="center"><bold>Position fusion factor</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td/>
<td valign="top" align="center"><bold>Real face</bold></td>
<td valign="top" align="center"><bold>Morphed face</bold></td>
<td valign="top" align="center"><bold>Real face</bold></td>
<td valign="top" align="center"><bold>Morphed face</bold></td>
<td valign="top" align="center"><bold>Real face</bold></td>
<td valign="top" align="center"><bold>Morphed face</bold></td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">HNU (MDB1)</td>
<td valign="top" align="center">1,121</td>
<td valign="top" align="center">1,121</td>
<td valign="top" align="center">564</td>
<td valign="top" align="center">330</td>
<td valign="top" align="center">566</td>
<td valign="top" align="center">377</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.5</td>
</tr>
<tr>
<td valign="top" align="left">HNU (MDB2)</td>
<td valign="top" align="center">1,121</td>
<td valign="top" align="center">1,125</td>
<td valign="top" align="center">564</td>
<td valign="top" align="center">567</td>
<td valign="top" align="center">566</td>
<td valign="top" align="center">567</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.1&#x02013;0.9</td>
</tr>
<tr>
<td valign="top" align="left">HNU (MDB3)</td>
<td valign="top" align="center">1,121</td>
<td valign="top" align="center">1,125</td>
<td valign="top" align="center">564</td>
<td valign="top" align="center">567</td>
<td valign="top" align="center">566</td>
<td valign="top" align="center">567</td>
<td valign="top" align="center">0.1&#x02013;0.9</td>
<td valign="top" align="center">0.5</td>
</tr>
<tr>
<td valign="top" align="left">HNU (MDB4)</td>
<td valign="top" align="center">1,121</td>
<td valign="top" align="center">1,134</td>
<td valign="top" align="center">564</td>
<td valign="top" align="center">567</td>
<td valign="top" align="center">566</td>
<td valign="top" align="center">567</td>
<td valign="top" align="center">0.1&#x02013;0.9</td>
<td valign="top" align="center">0.1&#x02013;0.9</td>
</tr>
<tr>
<td valign="top" align="left">FEI</td>
<td valign="top" align="center">81</td>
<td valign="top" align="center">6,480</td>
<td valign="top" align="center">20</td>
<td valign="top" align="center">380</td>
<td valign="top" align="center">99</td>
<td valign="top" align="center">9,702</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.5</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To assess the effectiveness of the proposed scheme, its experimental results were compared with those of nine existing classical methods; the experimental settings and results are presented in <xref ref-type="table" rid="T2">Tables 2</xref>, <xref ref-type="table" rid="T3">3</xref>. Among deep learning technologies, the method was compared with VGG16 (Seibold et al., <xref ref-type="bibr" rid="B22">2017</xref>), PLFL (Long et al., <xref ref-type="bibr" rid="B10">2022</xref>), TSCR (Long et al., <xref ref-type="bibr" rid="B9">2023</xref>), ResNet34 (He et al., <xref ref-type="bibr" rid="B7">2016</xref>), ShuffleNet (Zhang X. et al., <xref ref-type="bibr" rid="B25">2018</xref>), and MobileNet (Sandler et al., <xref ref-type="bibr" rid="B18">2018</xref>). Among non-deep learning technologies, it was compared with BSIF (Raghavendra et al., <xref ref-type="bibr" rid="B16">2016</xref>), FS-SPN (Zhang L. B. et al., <xref ref-type="bibr" rid="B24">2018</xref>), and HOG (Scherhag et al., <xref ref-type="bibr" rid="B21">2018</xref>).</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Experimental settings.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Parameter</bold></th>
<th valign="top" align="center"><bold>Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Framework</td>
<td valign="top" align="center">Pytorch</td>
</tr>
<tr>
<td valign="top" align="left">Optimizer</td>
<td valign="top" align="center">Stochastic gradient descent (SGD)</td>
</tr>
<tr>
<td valign="top" align="left">Learning rate</td>
<td valign="top" align="center">1e-4</td>
</tr>
<tr>
<td valign="top" align="left">Loss criterion</td>
<td valign="top" align="center">Cross-entropy loss</td>
</tr>
<tr>
<td valign="top" align="left">Epochs</td>
<td valign="top" align="center">20</td>
</tr>
<tr>
<td valign="top" align="left">GPU</td>
<td valign="top" align="center">GeForce GTX 1060Ti</td>
</tr>
<tr>
<td valign="top" align="left">Batch size</td>
<td valign="top" align="center">16</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Detection results of the presented approach on fixed fusion factor datasets.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="center" colspan="3"><bold>FEI</bold></th>
<th valign="top" align="center" colspan="3"><bold>HNU (MDB1)</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td/>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center" colspan="2"><bold>BPCER&#x00040;APCER</bold></td>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center" colspan="2"><bold>BPCER&#x00040;APCER</bold></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">=<bold>5%</bold></td>
<td valign="top" align="center">=<bold>10%</bold></td>
<td/>
<td valign="top" align="center">=<bold>5%</bold></td>
<td valign="top" align="center">=<bold>10%</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="7" style="background-color:#dee1e1"><bold>Traditional technologies</bold></td>
</tr>
<tr>
<td valign="top" align="left">BSIF-SVM (Raghavendra et al., <xref ref-type="bibr" rid="B16">2016</xref>)</td>
<td valign="top" align="center">3.38</td>
<td valign="top" align="center">8.67</td>
<td valign="top" align="center">4.79</td>
<td valign="top" align="center">20.80</td>
<td valign="top" align="center">22.60</td>
<td valign="top" align="center">19.82</td>
</tr>
<tr>
<td valign="top" align="left">HOG-SVM (Scherhag et al., <xref ref-type="bibr" rid="B21">2018</xref>)</td>
<td valign="top" align="center">3.03</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.60</td>
<td valign="top" align="center">24.84</td>
<td valign="top" align="center">62.90</td>
<td valign="top" align="center">48.39</td>
</tr>
<tr>
<td valign="top" align="left">FS-SPN (Zhang L. B. et al., <xref ref-type="bibr" rid="B24">2018</xref>)</td>
<td valign="top" align="center">0.51</td>
<td valign="top" align="center">1.58</td>
<td valign="top" align="center">0.35</td>
<td valign="top" align="center">1.93</td>
<td valign="top" align="center">1.53</td>
<td valign="top" align="center">1.21</td>
</tr>
<tr>
<td valign="top" align="left" colspan="7" style="background-color:#dee1e1"><bold>Deep learning technologies</bold></td>
</tr>
<tr>
<td valign="top" align="left">VGG16 (Seibold et al., <xref ref-type="bibr" rid="B22">2017</xref>)</td>
<td valign="top" align="center">2.93</td>
<td valign="top" align="center">2.55</td>
<td valign="top" align="center">2.01</td>
<td valign="top" align="center">11.06</td>
<td valign="top" align="center">15.33</td>
<td valign="top" align="center">13.95</td>
</tr>
<tr>
<td valign="top" align="left">ResNet34 (He et al., <xref ref-type="bibr" rid="B7">2016</xref>)</td>
<td valign="top" align="center">3.95</td>
<td valign="top" align="center">2.01</td>
<td valign="top" align="center">1.33</td>
<td valign="top" align="center">3.00</td>
<td valign="top" align="center">4.45</td>
<td valign="top" align="center">2.53</td>
</tr>
<tr>
<td valign="top" align="left">ShuffleNetV2 (Zhang X. et al., <xref ref-type="bibr" rid="B25">2018</xref>)</td>
<td valign="top" align="center">2.02</td>
<td valign="top" align="center">2.02</td>
<td valign="top" align="center">1.01</td>
<td valign="top" align="center">4.01</td>
<td valign="top" align="center">3.98</td>
<td valign="top" align="center">1.89</td>
</tr>
<tr>
<td valign="top" align="left">MobileNetV2 (Sandler et al., <xref ref-type="bibr" rid="B18">2018</xref>)</td>
<td valign="top" align="center">3.95</td>
<td valign="top" align="center">1.01</td>
<td valign="top" align="center">1.01</td>
<td valign="top" align="center">3.53</td>
<td valign="top" align="center">3.36</td>
<td valign="top" align="center">1.41</td>
</tr>
<tr>
<td valign="top" align="left">PLFL (Long et al., <xref ref-type="bibr" rid="B10">2022</xref>)</td>
<td valign="top" align="center">0.85</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.55</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">1.35</td>
<td valign="top" align="center">0.37</td>
</tr>
<tr>
<td valign="top" align="left">TSCR (Long et al., <xref ref-type="bibr" rid="B9">2023</xref>)</td>
<td valign="top" align="center">1.04</td>
<td valign="top" align="center">1.09</td>
<td valign="top" align="center">0.66</td>
<td valign="top" align="center">0.88</td>
<td valign="top" align="center">1.31</td>
<td valign="top" align="center">0.37</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Proposed method</bold></td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center"><bold>0.66</bold></td>
<td valign="top" align="center"><bold>0.17</bold></td>
<td valign="top" align="center"><bold>0.84</bold></td>
<td valign="top" align="center"><bold>0.36</bold></td>
<td valign="top" align="center"><bold>0.18</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values represent the best results.</p>
</table-wrap-foot>
</table-wrap>
<p>Furthermore, the standardized ISO metrics (Biometrics, <xref ref-type="bibr" rid="B1">2016</xref>) APCER, BPCER, ACER, ACC, and EER were used to evaluate detection performance. APCER is the proportion of morphed images incorrectly classified as real, BPCER is the proportion of real images incorrectly classified as morphed, and ACER is defined as the average of BPCER and APCER. The EER, the operating point at which BPCER = APCER, was also reported.</p>
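<p>These error rates can be computed directly from detector scores. The sketch below assumes the convention that higher scores indicate "morphed"; it illustrates the metric definitions and is not an official ISO implementation.</p>

```python
import numpy as np


def apcer_bpcer(scores_morph, scores_real, threshold):
    """APCER: morphs accepted as real; BPCER: real faces rejected as morphs.

    Assumes scores at or above the threshold are classified as morphed.
    """
    scores_morph = np.asarray(scores_morph)
    scores_real = np.asarray(scores_real)
    apcer = float(np.mean(scores_morph < threshold))
    bpcer = float(np.mean(scores_real >= threshold))
    return apcer, bpcer


def eer(scores_morph, scores_real):
    """EER: sweep candidate thresholds, return the rate where APCER ~ BPCER."""
    thresholds = np.unique(np.concatenate([scores_morph, scores_real]))
    candidates = []
    for t in thresholds:
        a, b = apcer_bpcer(scores_morph, scores_real, t)
        candidates.append((abs(a - b), (a + b) / 2))
    return min(candidates)[1]
```

<p>ACER at any chosen operating point is then simply (APCER + BPCER) / 2.</p>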
</sec>
<sec>
<title>4.2. Implementation details</title>
<p>The proposed approach is based on the PyTorch deep learning framework. In the training stage, the stochastic gradient descent (SGD) optimizer was used to optimize the two branches, with the learning rate set to 1e-4 and cross-entropy as the loss criterion. The two branches were trained for 20 epochs on a GeForce GTX 1060Ti GPU with a batch size of 16. A summary of the parameters and criteria used is presented in <xref ref-type="table" rid="T2">Table 2</xref> for easy comparison.</p>
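<p>The settings in Table 2 translate into a few lines of PyTorch. The helper below is a generic sketch of those settings; the function names are illustrative and the single-step template is not the authors' exact training loop.</p>

```python
import torch
import torch.nn as nn


def make_training_setup(model: nn.Module):
    """Optimizer and loss matching Table 2: SGD with lr = 1e-4, cross-entropy.

    The 20 epochs and batch size of 16 are applied by the caller's loop/loader.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    return optimizer, criterion


def train_step(model, optimizer, criterion, x, y):
    """One SGD update on a batch (generic sketch)."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return float(loss)
```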
</sec>
<sec>
<title>4.3. Experimental results and analysis</title>
<sec>
<title>4.3.1. Single-dataset experiment and analysis</title>
<p>In the single-dataset comparison experiment, the proposed method was compared with conventional methods and deep learning-based methods to verify its effectiveness. <xref ref-type="table" rid="T3">Table 3</xref> presents the quantitative results of the presented approach and the nine classical approaches.</p>
<p>On FEI, the proposed approach achieved an EER of 0.12%, with BPCER = 0.66% &#x00040;APCER = 5% and BPCER = 0.17% &#x00040;APCER = 10%. On HNU (MDB1), the EER was 0.84%, with BPCER = 0.36% &#x00040;APCER = 5% and BPCER = 0.18% &#x00040;APCER = 10%. Excellent results were thus obtained on both datasets. The accuracy of the conventional methods is low, reflecting the considerable limitations of hand-crafted feature extraction, whereas the deep learning-based methods perform clearly better, indicating the obvious advantage of deep learning technology. The performance of the proposed approach was further verified on datasets with various fusion factors, and <xref ref-type="table" rid="T4">Table 4</xref> presents the relevant results.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Results of the presented approach on various fusion factors datasets.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="center" colspan="3"><bold>HNU (MDB2)</bold></th>
<th valign="top" align="center" colspan="3"><bold>HNU (MDB3)</bold></th>
<th valign="top" align="center" colspan="3"><bold>HNU (MDB4)</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td/>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center" colspan="2"><bold>BPCER&#x00040; APCER</bold></td>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center" colspan="2"><bold>BPCER&#x00040; APCER</bold></td>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center" colspan="2"><bold>BPCER&#x00040; APCER</bold></td>
</tr>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td/>
<td/>
<td valign="top" align="center">=<bold>5%</bold></td>
<td valign="top" align="center">=<bold>10%</bold></td>
<td/>
<td valign="top" align="center">=<bold>5%</bold></td>
<td valign="top" align="center">=<bold>10%</bold></td>
<td/>
<td valign="top" align="center">=<bold>5%</bold></td>
<td valign="top" align="center">=<bold>10%</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="10" style="background-color:#dee1e1"><bold>Traditional technologies</bold></td>
</tr>
<tr>
<td valign="top" align="left">BSIF-SVM (Raghavendra et al., <xref ref-type="bibr" rid="B16">2016</xref>)</td>
<td valign="top" align="center">20.39</td>
<td valign="top" align="center">20.27</td>
<td valign="top" align="center">18.17</td>
<td valign="top" align="center">19.38</td>
<td valign="top" align="center">21.67</td>
<td valign="top" align="center">17.79</td>
<td valign="top" align="center">21.36</td>
<td valign="top" align="center">23.67</td>
<td valign="top" align="center">18.08</td>
</tr>
<tr>
<td valign="top" align="left">HOG-SVM (Scherhag et al., <xref ref-type="bibr" rid="B21">2018</xref>)</td>
<td valign="top" align="center">22.72</td>
<td valign="top" align="center">61.02</td>
<td valign="top" align="center">46.91</td>
<td valign="top" align="center">21.16</td>
<td valign="top" align="center">47.09</td>
<td valign="top" align="center">31.92</td>
<td valign="top" align="center">23.46</td>
<td valign="top" align="center">59.61</td>
<td valign="top" align="center">48.50</td>
</tr>
<tr>
<td valign="top" align="left">FS-SPN (Zhang L. B. et al., <xref ref-type="bibr" rid="B24">2018</xref>)</td>
<td valign="top" align="center">1.56</td>
<td valign="top" align="center">2.02</td>
<td valign="top" align="center">1.01</td>
<td valign="top" align="center">1.49</td>
<td valign="top" align="center">1.01</td>
<td valign="top" align="center">0.49</td>
<td valign="top" align="center">1.69</td>
<td valign="top" align="center">1.41</td>
<td valign="top" align="center">0.18</td>
</tr>
<tr>
<td valign="top" align="left" colspan="10" style="background-color:#dee1e1"><bold>Deep learning technologies</bold></td>
</tr>
<tr>
<td valign="top" align="left">VGG16 (Seibold et al., <xref ref-type="bibr" rid="B22">2017</xref>)</td>
<td valign="top" align="center">12.05</td>
<td valign="top" align="center">18.44</td>
<td valign="top" align="center">10.88</td>
<td valign="top" align="center">17.78</td>
<td valign="top" align="center">15.43</td>
<td valign="top" align="center">12.02</td>
<td valign="top" align="center">14.45</td>
<td valign="top" align="center">11.56</td>
<td valign="top" align="center">9.22</td>
</tr>
<tr>
<td valign="top" align="left">ResNet34 (He et al., <xref ref-type="bibr" rid="B7">2016</xref>)</td>
<td valign="top" align="center">3.52</td>
<td valign="top" align="center">1.33</td>
<td valign="top" align="center">0.55</td>
<td valign="top" align="center">3.44</td>
<td valign="top" align="center">1.51</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">4.53</td>
<td valign="top" align="center">2.72</td>
<td valign="top" align="center">0.88</td>
</tr>
<tr>
<td valign="top" align="left">ShuffleNetV2 (Zhang X. et al., <xref ref-type="bibr" rid="B25">2018</xref>)</td>
<td valign="top" align="center">4.06</td>
<td valign="top" align="center">3.53</td>
<td valign="top" align="center">1.41</td>
<td valign="top" align="center">9.19</td>
<td valign="top" align="center">14.66</td>
<td valign="top" align="center">8.66</td>
<td valign="top" align="center">7.60</td>
<td valign="top" align="center">14.49</td>
<td valign="top" align="center">4.42</td>
</tr>
<tr>
<td valign="top" align="left">MobileNetV2 (Sandler et al., <xref ref-type="bibr" rid="B18">2018</xref>)</td>
<td valign="top" align="center">4.59</td>
<td valign="top" align="center">4.59</td>
<td valign="top" align="center">2.65</td>
<td valign="top" align="center">5.65</td>
<td valign="top" align="center">6.18</td>
<td valign="top" align="center">2.83</td>
<td valign="top" align="center">4.94</td>
<td valign="top" align="center">4.77</td>
<td valign="top" align="center">2.30</td>
</tr>
<tr>
<td valign="top" align="left">PLFL (Long et al., <xref ref-type="bibr" rid="B10">2022</xref>)</td>
<td valign="top" align="center">1.00</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center">1.24</td>
<td valign="top" align="center">1.21</td>
<td valign="top" align="center"><bold>0.31</bold></td>
<td valign="top" align="center">1.21</td>
<td valign="top" align="center">1.51</td>
<td valign="top" align="center">0.18</td>
</tr>
<tr>
<td valign="top" align="left">TSCR (Long et al., <xref ref-type="bibr" rid="B9">2023</xref>)</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.59</td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center">1.21</td>
<td valign="top" align="center">1.01</td>
<td valign="top" align="center">0.57</td>
<td valign="top" align="center">1.16</td>
<td valign="top" align="center">1.41</td>
<td valign="top" align="center">0.16</td>
</tr>
<tr>
<td valign="top" align="left">Proposed method</td>
<td valign="top" align="center"><bold>0.88</bold></td>
<td valign="top" align="center"><bold>0.27</bold></td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center"><bold>1.06</bold></td>
<td valign="top" align="center"><bold>0.65</bold></td>
<td valign="top" align="center">0.35</td>
<td valign="top" align="center"><bold>0.77</bold></td>
<td valign="top" align="center"><bold>0.37</bold></td>
<td valign="top" align="center"><bold>0.05</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values represent the best results.</p>
</table-wrap-foot>
</table-wrap>
<p>The presented approach achieved an EER of 0.88% on HNU (MDB2), 1.06% on HNU (MDB3), and 0.77% on HNU (MDB4). Compared with the nine MAD technologies, the proposed approach achieved excellent detection results on datasets with various pixel fusion factors and remained robust under different pixel fusion and position fusion factors.</p>
</sec>
<sec>
<title>4.3.2. Cross-dataset experiments and analysis</title>
<p>A cross-dataset test was conducted to verify the generalization ability of the approach. The HNU (MDB1) and FEI datasets were used; the two datasets share the property that the position fusion factor and pixel fusion factor are both fixed at 0.5. <xref ref-type="table" rid="T5">Table 5</xref> presents the relevant results.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Detection results on cross dataset.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Training dataset</bold></th>
<th valign="top" align="center"><bold>Test dataset</bold></th>
<th valign="top" align="center"><bold>Algorithms</bold></th>
<th valign="top" align="center"><bold>EER (%)</bold></th>
<th valign="top" align="center" colspan="2"><bold>BPCER&#x00040;APCER</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td valign="top" align="left" colspan="4"></td>
<td valign="top" align="center">=<bold>5%</bold></td>
<td valign="top" align="center">=<bold>10%</bold></td>
</tr>
<tr>
<td valign="top" align="left">HNU (MDB1)</td>
<td valign="top" align="center">FEI</td>
<td valign="top" align="center" colspan="4">Traditional technologies</td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">BSIF-SVM (Raghavendra et al., <xref ref-type="bibr" rid="B16">2016</xref>)</td>
<td valign="top" align="center">30.27</td>
<td valign="top" align="center">87.77</td>
<td valign="top" align="center">63.16</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">HOG-SVM (Scherhag et al., <xref ref-type="bibr" rid="B21">2018</xref>)</td>
<td valign="top" align="center">40.01</td>
<td valign="top" align="center">60.01</td>
<td valign="top" align="center">50.33</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">FS-SPN (Zhang L. B. et al., <xref ref-type="bibr" rid="B24">2018</xref>)</td>
<td valign="top" align="center">37.37</td>
<td valign="top" align="center">85.69</td>
<td valign="top" align="center">75.02</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center" colspan="4">Deep learning technologies</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">VGG16 (Seibold et al., <xref ref-type="bibr" rid="B22">2017</xref>)</td>
<td valign="top" align="center">10.22</td>
<td valign="top" align="center">12.16</td>
<td valign="top" align="center">10.24</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">ResNet34 (He et al., <xref ref-type="bibr" rid="B7">2016</xref>)</td>
<td valign="top" align="center">5.65</td>
<td valign="top" align="center">8.08</td>
<td valign="top" align="center">4.55</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">ShuffleNet (Zhang X. et al., <xref ref-type="bibr" rid="B25">2018</xref>)</td>
<td valign="top" align="center">12.63</td>
<td valign="top" align="center">24.24</td>
<td valign="top" align="center">15.15</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">MobileNet (Sandler et al., <xref ref-type="bibr" rid="B18">2018</xref>)</td>
<td valign="top" align="center">15.87</td>
<td valign="top" align="center">35.35</td>
<td valign="top" align="center">23.23</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">PLFL (Long et al., <xref ref-type="bibr" rid="B10">2022</xref>)</td>
<td valign="top" align="center">4.52</td>
<td valign="top" align="center">3.30</td>
<td valign="top" align="center">1.51</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">TSCR (Long et al., <xref ref-type="bibr" rid="B9">2023</xref>)</td>
<td valign="top" align="center">4.48</td>
<td valign="top" align="center"><bold>2.02</bold></td>
<td valign="top" align="center"><bold>1.01</bold></td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center"><bold>Proposed method</bold></td>
<td valign="top" align="center"><bold>3.26</bold></td>
<td valign="top" align="center">3.03</td>
<td valign="top" align="center">1.47</td>
</tr>
<tr>
<td valign="top" align="left">FEI</td>
<td valign="top" align="center">HNU (MDB1)</td>
<td valign="top" align="center" colspan="4">Traditional technologies</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">BSIF-SVM (Raghavendra et al., <xref ref-type="bibr" rid="B16">2016</xref>)</td>
<td valign="top" align="center">40.09</td>
<td valign="top" align="center">81.86</td>
<td valign="top" align="center">47.19</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">HOG-SVM (Scherhag et al., <xref ref-type="bibr" rid="B21">2018</xref>)</td>
<td valign="top" align="center">35.48</td>
<td valign="top" align="center">87.10</td>
<td valign="top" align="center">80.97</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">FS-SPN (Zhang L. B. et al., <xref ref-type="bibr" rid="B24">2018</xref>)</td>
<td valign="top" align="center">25.09</td>
<td valign="top" align="center">60.07</td>
<td valign="top" align="center">45.09</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center" colspan="4">Deep learning technologies</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">VGG16 (Seibold et al., <xref ref-type="bibr" rid="B22">2017</xref>)</td>
<td valign="top" align="center">10.53</td>
<td valign="top" align="center">20.26</td>
<td valign="top" align="center">10.22</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">ResNet34 (He et al., <xref ref-type="bibr" rid="B7">2016</xref>)</td>
<td valign="top" align="center">17.47</td>
<td valign="top" align="center">45.58</td>
<td valign="top" align="center">23.04</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">ShuffleNet (Zhang X. et al., <xref ref-type="bibr" rid="B25">2018</xref>)</td>
<td valign="top" align="center">16.08</td>
<td valign="top" align="center">39.40</td>
<td valign="top" align="center">22.08</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">MobileNet (Sandler et al., <xref ref-type="bibr" rid="B18">2018</xref>)</td>
<td valign="top" align="center">26.68</td>
<td valign="top" align="center">65.02</td>
<td valign="top" align="center">51.24</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">PLFL (Long et al., <xref ref-type="bibr" rid="B10">2022</xref>)</td>
<td valign="top" align="center">8.33</td>
<td valign="top" align="center"><bold>11.25</bold></td>
<td valign="top" align="center">5.84</td>
</tr>
 <tr>
<td/>
<td/>
<td valign="top" align="center">TSCR (Long et al., <xref ref-type="bibr" rid="B9">2023</xref>)</td>
<td valign="top" align="center"><bold>7.95</bold></td>
<td valign="top" align="center">12.54</td>
<td valign="top" align="center"><bold>4.77</bold></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center"><bold>Proposed method</bold></td>
<td valign="top" align="center">10.22</td>
<td valign="top" align="center">14.25</td>
<td valign="top" align="center">5.25</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values represent the best results.</p>
</table-wrap-foot>
</table-wrap>
<p>In the cross-dataset test, overall performance decreased compared with the single-dataset experiments because the two datasets differ in how the images were acquired and in the races of the photographed individuals. When HNU (MDB1) was used as the training set and FEI as the test set, the EER of the presented approach was 3.26%. By contrast, when FEI was used for training and HNU (MDB1) for testing, the EER of the proposed approach was 10.22%. Overall, the proposed approach achieved competitive generalization ability.</p>
</sec>
<sec>
<title>4.3.3. Ablation experiment and analysis</title>
<p>(1) Ablation experiment for the two-stream network</p>
<p>Ablation experiments were conducted to verify the effectiveness of the designed two-stream convolution neural network. <xref ref-type="table" rid="T6">Table 6</xref> indicates relevant results.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Ablation results for the two-branch network.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th/>
<th valign="top" align="center" colspan="3"><bold>FEI</bold></th>
<th valign="top" align="center" colspan="3"><bold>HNU (MDB1)</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td/>
<td valign="top" align="center"><bold>ACER (%)</bold></td>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center"><bold>ACC (%)</bold></td>
<td valign="top" align="center"><bold>ACER (%)</bold></td>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center"><bold>ACC (%)</bold></td>
</tr>
<tr>
<td valign="top" align="left">High-frequency-CNN</td>
<td valign="top" align="center">1.55</td>
<td valign="top" align="center">0.55</td>
<td valign="top" align="center">98.80</td>
<td valign="top" align="center">1.99</td>
<td valign="top" align="center">1.74</td>
<td valign="top" align="center">98.13</td>
</tr>
<tr>
<td valign="top" align="left">RGB-CNN</td>
<td valign="top" align="center">6.08</td>
<td valign="top" align="center">2.59</td>
<td valign="top" align="center">97.21</td>
<td valign="top" align="center">7.07</td>
<td valign="top" align="center">3.00</td>
<td valign="top" align="center">94.16</td>
</tr>
<tr>
<td valign="top" align="left">TSCNN</td>
<td valign="top" align="center"><bold>0.67</bold></td>
<td valign="top" align="center"><bold>0.32</bold></td>
<td valign="top" align="center"><bold>98.93</bold></td>
<td valign="top" align="center"><bold>1.95</bold></td>
<td valign="top" align="center"><bold>0.88</bold></td>
<td valign="top" align="center"><bold>98.26</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values represent the best results.</p>
</table-wrap-foot>
</table-wrap>
<p>The effect of the high-frequency stream is superior to that of the RGB stream under the same conditions. This result indicates that distinguishing between real and morphed faces in the RGB color space is difficult, whereas the high-frequency stream can directly capture the differences between the two categories of images. On the FEI dataset, the ACER of the TSCNN was 0.67%, the EER was 0.32%, and the ACC was 98.93%. On the HNU (MDB1) dataset, the ACER was 1.95%, the EER was 0.88%, and the ACC was 98.26%. On both datasets, the two-branch network achieved performance superior to that of either single-branch network, indicating that the fusion of the two branches contributes to a more comprehensive feature representation.</p>
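<p>The advantage of the high-frequency stream can be illustrated with a minimal high-pass filtering sketch based on the DFT and IDFT; the circular cutoff radius below is an assumed illustrative value, not the exact setting of the proposed network.</p>

```python
import numpy as np

def high_frequency_component(image, radius=8):
    """Return the high-frequency residual of a grayscale image via
    DFT -> zero out a central low-frequency disc -> IDFT."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))   # centre the zero frequency
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    spectrum[dist <= radius] = 0                     # suppress low frequencies
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum))
    return np.real(filtered)
```

A smooth (near-constant) region yields a residual close to zero, while fine texture and blending artifacts, which concentrate at high frequencies, survive the filter, which is what makes this representation discriminative for morphing traces.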
<p>(2) Ablation experiment for the self-enhancement and interactive-enhancement modules</p>
<p>To highlight the contributions of the self-enhancement module (SEM) and the interactive-enhancement module (IEM) to the detection system, an ablation study was conducted on the two datasets; the relevant results are presented in <xref ref-type="table" rid="T7">Table 7</xref>.</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>Ablation results for two enhancement modules.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="center" colspan="3"><bold>FEI</bold></th>
<th valign="top" align="center" colspan="3"><bold>HNU (MDB1)</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<td/>
<td valign="top" align="center"><bold>ACER (%)</bold></td>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center"><bold>ACC (%)</bold></td>
<td valign="top" align="center"><bold>ACER (%)</bold></td>
<td valign="top" align="center"><bold>EER (%)</bold></td>
<td valign="top" align="center"><bold>ACC (%)</bold></td>
</tr>
<tr>
<td valign="top" align="left">TSCNN</td>
<td valign="top" align="center">0.67</td>
<td valign="top" align="center">0.32</td>
<td valign="top" align="center">98.93</td>
<td valign="top" align="center">1.95</td>
<td valign="top" align="center">0.88</td>
<td valign="top" align="center">98.26</td>
</tr>
<tr>
<td valign="top" align="left">TSCNN&#x0002B;SEM</td>
<td valign="top" align="center">0.33</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">99.57</td>
<td valign="top" align="center">1.91</td>
<td valign="top" align="center">0.88</td>
<td valign="top" align="center">98.46</td>
</tr>
<tr>
<td valign="top" align="left">PELF (ours)</td>
<td valign="top" align="center"><bold>0.08</bold></td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center"><bold>99.83</bold></td>
<td valign="top" align="center"><bold>1.59</bold></td>
<td valign="top" align="center"><bold>0.84</bold></td>
<td valign="top" align="center"><bold>98.70</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values represent the best results.</p>
</table-wrap-foot>
</table-wrap>
<p>After introducing the designed self-enhancement and interactive-enhancement modules, the ACER on the FEI dataset was 0.08%, the EER was 0.12%, and the ACC was 99.83%; on the HNU (MDB1) dataset, the ACER was 1.59%, the EER was 0.84%, and the ACC was 98.70%. The SEM strengthens the characteristics of each stream, whereas the IEM lets the two streams complement each other by enhancing their feature interaction; hence both modules improve performance. Through this progressive feature enhancement process, high-frequency and RGB information can be used effectively to expose subtle morphing traces.</p>
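<p>As a rough illustration of the progressive enhancement idea (a sketch, not the authors' implementation), the code below applies a self-enhancement step to each stream via channel gating driven by global average pooling, followed by an interactive step in which each stream is reweighted by the other's channel statistics; the sigmoid gating and additive fusion are assumptions made for illustration.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_enhance(feat):
    """SEM sketch: gate each channel of (C, H, W) features with a
    squeeze (global average pool) followed by a sigmoid excitation."""
    gate = sigmoid(feat.mean(axis=(1, 2)))          # one gate per channel
    return feat * gate[:, None, None]

def interactive_enhance(rgb_feat, hf_feat):
    """IEM sketch: each stream is reweighted by the OTHER stream's
    channel statistics, so complementary cues reinforce each other."""
    rgb_gate = sigmoid(hf_feat.mean(axis=(1, 2)))
    hf_gate = sigmoid(rgb_feat.mean(axis=(1, 2)))
    return rgb_feat * rgb_gate[:, None, None], hf_feat * hf_gate[:, None, None]

def progressive_enhance(rgb_feat, hf_feat):
    """Progressive order: self-enhance each stream first, then interact,
    then fuse by addition into a single representation."""
    rgb_feat, hf_feat = self_enhance(rgb_feat), self_enhance(hf_feat)
    rgb_feat, hf_feat = interactive_enhance(rgb_feat, hf_feat)
    return rgb_feat + hf_feat
```

The ordering mirrors the progressive scheme described above: per-stream refinement precedes cross-stream interaction, so the interaction operates on already-strengthened features.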
</sec>
</sec>
</sec>
<sec id="s5">
<title>5. Discussion</title>
<p>The findings of our experiments demonstrated the effectiveness of the proposed method in detecting morphing attacks. Specifically, we compared the proposed method with existing methods in both single-dataset and cross-dataset evaluations, and the results revealed that the method achieved a lower equal error rate. These results are particularly significant given the increasing prevalence of morphing attacks in security-sensitive applications.</p>
<p>Furthermore, ablation experiments on the dataset demonstrated the critical importance of incorporating high-frequency features and a progressive enhancement learning framework into the detection process. The use of high-frequency features and a progressive enhancement learning framework based on two-stream networks considerably improved the performance of the model. High-frequency features are crucial in distinguishing between morphed and authentic images, as they can capture subtle differences that may not be visible to the naked eye. Moreover, the progressive enhancement learning framework enables the model to learn more discriminative features.</p>
</sec>
<sec id="s6">
<title>6. Conclusion</title>
<p>Morphed face detection is critical for mitigating illegal activities. Building on conventional deep-learning binary classification, a novel detection framework based on high-frequency features and a progressively enhanced two-branch network structure was proposed. An RGB stream and a high-frequency information stream were used simultaneously to detect morphed faces, and the feature interaction of the two streams was enhanced by the SEM and IEM. The robustness and generalization of the approach were verified on the HNU and FEI datasets. In the future, high-frequency features and a progressively enhanced two-stream network could be applied to detecting differential morphing attacks.</p>
</sec>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.</p>
</sec>
<sec sec-type="ethics-statement" id="s8">
<title>Ethics statement</title>
<p>Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec sec-type="author-contributions" id="s9">
<title>Author contributions</title>
<p>Conceptualization and validation: C-kJ and Y-cL. Investigation and writing&#x02013;review and editing: Y-cL. Writing&#x02013;original draft preparation: C-kJ. Supervision: C-kJ and Y-lC. Formal analysis: Y-lC. All authors have read and agreed to the published version of the manuscript.</p>
</sec>
</body>
<back>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<title>Abbreviations</title>
<fn fn-type="abbr"><p>ID, Identity Document; RGB, Red Green Blue; BSIF, Binary Statistical Image Features; HOG, Histogram of Oriented Gradient; LBP, Local Binary Patterns; JPEG, Joint Photographic Experts Group; CNN, Convolutional Neural Network; GMP, Global Max Pooling; GAP, Global Average Pooling; DFT, Discrete Fourier Transform; IDFT, Inverse Discrete Fourier Transform; APCER, Attack Presentation Classification Error Rate; BPCER, Bona Fide Presentation Classification Error Rate; ACER, Average Classification Error Rate; ACC, Accuracy; EER, Equal Error Rate; SGD, Stochastic Gradient Descent; SEM, Self-Enhancement Module; IEM, Interactive-Enhancement Module; TSCNN, Two-Stream Convolutional Neural Networks; PELF, Progressive Enhancement Learning Framework.</p></fn></fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><collab>International Organization for Standardization</collab></person-group> (<year>2016</year>). <source>ISO/IEC 30107-3-2017, Information technology-Biometric Presentation Attack Detection&#x02014;Part 3: Testing and reporting (First Edition).</source> International Organization for Standardization.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>Y.</given-names></name> <name><surname>Gieseke</surname> <given-names>F.</given-names></name> <name><surname>Oehmcke</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Barnard</surname> <given-names>K.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Attentional feature fusion,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</source> (<publisher-loc>Waikoloa, HI</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>3560</fpage>&#x02013;<lpage>3569</lpage>. <pub-id pub-id-type="doi">10.1109/WACV48630.2021.00360</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Debiasi</surname> <given-names>L.</given-names></name> <name><surname>Rathgeb</surname> <given-names>C.</given-names></name> <name><surname>Scherhag</surname> <given-names>U.</given-names></name></person-group> (<year>2018b</year>). <article-title>&#x0201C;PRNU variance analysis for morphed face image detection,&#x0201D;</article-title> in <source>2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS)</source> (<publisher-loc>Redendo, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Debiasi</surname> <given-names>L.</given-names></name> <name><surname>Scherhag</surname> <given-names>U.</given-names></name> <name><surname>Rathgeb</surname> <given-names>C.</given-names></name></person-group> (<year>2018a</year>). <article-title>&#x0201C;PRNU-based detection of morphed face images,&#x0201D;</article-title> in <source>2018 International Workshop on Biometrics and Forensics (IWBF)</source> (<publisher-loc>Sassari</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1109/IWBF.2018.8401555</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferrara</surname> <given-names>M.</given-names></name> <name><surname>Franco</surname> <given-names>A.</given-names></name> <name><surname>Maltoni</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0201C;The magic passport,&#x0201D;</article-title> in <source>IEEE International Joint Conference on Biometrics</source> (<publisher-loc>Florida, FL</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/BTAS.2014.6996240</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gu</surname> <given-names>Q.</given-names></name> <name><surname>Chen</surname> <given-names>S.</given-names></name> <name><surname>Yao</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Ding</surname> <given-names>S.</given-names></name> <name><surname>Yi</surname> <given-names>R.</given-names></name></person-group> (<year>2022</year>). <article-title>Exploiting fine-grained face forgery clues via progressive enhancement learning</article-title>. <source>Proceedings of the AAAI Conf. Art. Intell</source>. <volume>36</volume>, <fpage>735</fpage>&#x02013;<lpage>743</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v36i1.19954</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Deep residual learning for image recognition,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2016.90</pub-id><pub-id pub-id-type="pmid">32166560</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>King</surname> <given-names>D. E.</given-names></name></person-group> (<year>2009</year>). <article-title>Dlib-ml: A machine learning toolkit</article-title>. <source>J. Mach. Learn. Res</source>. <volume>10</volume>, <fpage>1755</fpage>&#x02013;<lpage>1758</lpage>. <pub-id pub-id-type="doi">10.1145/1577069.1755843</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Long</surname> <given-names>M.</given-names></name> <name><surname>Jia</surname> <given-names>C.</given-names></name> <name><surname>Peng</surname> <given-names>F.</given-names></name></person-group> (<year>2023</year>). <article-title>Face morphing detection based on a two-stream network with channel attention and residual of multiple color spaces</article-title>. <source>International Conference on Machine Learning for Cyber Security</source>. (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>439</fpage>&#x02013;<lpage>454</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-031-20102-8_34</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Long</surname> <given-names>M.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>L. B.</given-names></name></person-group> (<year>2022</year>). <article-title>Detection of face morphing attacks based on patch-level features and lightweight networks</article-title>. <source>Secu. Commun. Networks</source> 2022, 20. <pub-id pub-id-type="doi">10.1155/2022/7460330</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Yan</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>W.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Generalizing face forgery detection with high-frequency features,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> (<publisher-name>IEEE</publisher-name>), <fpage>16317</fpage>&#x02013;<lpage>16326</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR46437.2021.01605</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makrushin</surname> <given-names>A.</given-names></name> <name><surname>Neubert</surname> <given-names>T.</given-names></name> <name><surname>Dittmann</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Automatic generation and detection of visually faultless facial morphs</article-title>. <source>VISIGRAPP</source> <volume>3</volume>, <fpage>39</fpage>&#x02013;<lpage>50</lpage>. <pub-id pub-id-type="doi">10.5220/0006131100390050</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>F.</given-names></name> <name><surname>Zhang</surname> <given-names>L. B.</given-names></name> <name><surname>Long</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>FD-GAN: face de-morphing generative adversarial network for restoring accomplice&#x00027;s facial image</article-title>. <source>IEEE Access</source>. <volume>7</volume>, <fpage>75122</fpage>&#x02013;<lpage>75131</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2920713</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qin</surname> <given-names>L.</given-names></name> <name><surname>Peng</surname> <given-names>F.</given-names></name> <name><surname>Venkatesh</surname> <given-names>S.</given-names></name> <name><surname>Ramachandra</surname> <given-names>R.</given-names></name> <name><surname>Long</surname> <given-names>M.</given-names></name> <name><surname>Busch</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Low visual distortion and robust morphing attacks based on partial face image manipulation</article-title>. <source>IEEE Transact. Biomet. Behav. Identity Sci</source>. <volume>3</volume>, <fpage>72</fpage>&#x02013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1109/TBIOM.2020.3022007</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Raghavendra</surname> <given-names>R.</given-names></name> <name><surname>Raja</surname> <given-names>K.</given-names></name> <name><surname>Venkatesh</surname> <given-names>S.</given-names></name> <name><surname>Busch</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Face morphing vs. face averaging: vulnerability and detection,&#x0201D;</article-title> in <source>2017 IEEE International Joint Conference on Biometrics (IJCB)</source> (<publisher-name>IEEE</publisher-name>), <fpage>555</fpage>&#x02013;<lpage>563</lpage>. <pub-id pub-id-type="doi">10.1109/BTAS.2017.8272742</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Raghavendra</surname> <given-names>R.</given-names></name> <name><surname>Raja</surname> <given-names>K. B.</given-names></name> <name><surname>Busch</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Detecting morphed face images,&#x0201D;</article-title> in <source>BTAS 2016</source> (<publisher-loc>IEEE</publisher-loc>). <pub-id pub-id-type="doi">10.1109/BTAS.2016.7791169</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sa</surname> <given-names>L.</given-names></name> <name><surname>Yu</surname> <given-names>C.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Attention and adaptive bilinear matching network for cross-domain few-shot defect classification of industrial parts</article-title>. <source>2021 International Joint Conference on Neural Networks (IJCNN). IEEE</source>. <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN52387.2021.9533518</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sandler</surname> <given-names>M.</given-names></name> <name><surname>Howard</surname> <given-names>A.</given-names></name> <name><surname>Zhu</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Mobilenetv2: Inverted residuals and linear bottlenecks,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision And Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>4510</fpage>&#x02013;<lpage>4520</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00474</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scherhag</surname> <given-names>U.</given-names></name> <name><surname>Debiasi</surname> <given-names>L.</given-names></name> <name><surname>Rathgeb</surname> <given-names>C.</given-names></name> <name><surname>Busch</surname> <given-names>C.</given-names></name> <name><surname>Uhl</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Detection of face morphing attacks based on PRNU analysis</article-title>. <source>IEEE Transact. Biomet. Behav. Identity Sci.</source> <volume>1</volume>, <fpage>302</fpage>&#x02013;<lpage>317</lpage>. <pub-id pub-id-type="doi">10.1109/TBIOM.2019.2942395</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Scherhag</surname> <given-names>U.</given-names></name> <name><surname>Raghavendra</surname> <given-names>R.</given-names></name> <name><surname>Raja</surname> <given-names>K. B.</given-names></name> <name><surname>Gomez-Barrero</surname> <given-names>M.</given-names></name> <name><surname>Rathgeb</surname> <given-names>C.</given-names></name> <name><surname>Busch</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>On the vulnerability of face recognition systems toward morphed face attacks</article-title>. <source>2017 5th International Workshop on Biometrics and Forensics (IWBF)</source> (Boston, MA: IEEE), <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/IWBF.2017.7935088</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Scherhag</surname> <given-names>U.</given-names></name> <name><surname>Rathgeb</surname> <given-names>C.</given-names></name> <name><surname>Busch</surname> <given-names>C.</given-names></name></person-group> (<year>2018</year>). <article-title>Towards detection of morphed face images in electronic travel documents</article-title>. <source>2018 13th IAPR International Workshop on Document Analysis Systems (DAS)</source>. IEEE. 187&#x02013;192. <pub-id pub-id-type="doi">10.1109/DAS.2018.11</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seibold</surname> <given-names>C.</given-names></name> <name><surname>Samek</surname> <given-names>W.</given-names></name> <name><surname>Hilsmann</surname> <given-names>A.</given-names></name> <name><surname>Eisert</surname> <given-names>P.</given-names></name></person-group> (<year>2017</year>). <article-title>Detection of face morphing attacks by deep learning</article-title>. <source>Digital Forensics and Watermarking: 16th International Workshop, IWDW 2017, Magdeburg, Germany, August 23-25, 2017. Proceedings 16</source>. Springer International Publishing. 107&#x02013;120. <pub-id pub-id-type="doi">10.1007/978-3-319-64185-0_9</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Venkatesh</surname> <given-names>S.</given-names></name> <name><surname>Ramachandra</surname> <given-names>R.</given-names></name> <name><surname>Raja</surname> <given-names>K.</given-names></name> <name><surname>Busch</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Single image face morphing attack detection using ensemble of features</article-title>. <source>2020 IEEE 23rd International Conference on Information Fusion (FUSION)</source>. IEEE. 1&#x02013;6. <pub-id pub-id-type="doi">10.23919/FUSION45008.2020.9190629</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>L. B.</given-names></name> <name><surname>Peng</surname> <given-names>F.</given-names></name> <name><surname>Long</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). Face morphing detection using Fourier spectrum of sensor pattern noise. 2018 <italic>IEEE International Conference on Multimedia and Expo (ICME)</italic> (La Jolla, CA: IEEE). 1&#x02013;6. <pub-id pub-id-type="doi">10.1109/ICME.2018.8486607</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Zhou</surname> <given-names>X.</given-names></name> <name><surname>Lin</surname> <given-names>M.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Shufflenet: an extremely efficient convolutional neural network for mobile devices,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>La Jolla, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>6848</fpage>&#x02013;<lpage>6856</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00716</pub-id></citation>
</ref>
</ref-list>
</back>
</article>