<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mater.</journal-id>
<journal-title>Frontiers in Materials</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mater.</abbrev-journal-title>
<issn pub-type="epub">2296-8016</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">756798</article-id>
<article-id pub-id-type="doi">10.3389/fmats.2021.756798</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Materials</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Transfer Learning-Based Algorithms for the Detection of Fatigue Crack Initiation Sites: A Comparative Study</article-title>
<alt-title alt-title-type="left-running-head">Wang and Guo</alt-title>
<alt-title alt-title-type="right-running-head">AI-Driven Fatigue Crack Detection</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>S.Y.</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1437477/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Guo</surname>
<given-names>T.</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
</contrib-group>
<aff>School of Civil Engineering, Southeast University, <addr-line>Nanjing</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1276477/overview">Tanmoy Mukhopadhyay</ext-link>, Indian Institute of Technology Kanpur, India</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1276246/overview">Vinod Kushvaha</ext-link>, Indian Institute of Technology Jammu, India</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1294803/overview">Sathiskumar Anusuya Ponnusami</ext-link>, City University of London, United&#x20;Kingdom</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: T. Guo, <email>guotong@seu.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Materials Science, a section of the journal Frontiers in Materials</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>8</volume>
<elocation-id>756798</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Wang and Guo.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Wang and Guo</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>The identification of fatigue crack initiation sites (FCISs) is routinely performed in engineering failure analyses; this process is not only time-consuming but also knowledge-intensive. The emergence of convolutional neural networks (CNNs) has inspired numerous innovative solutions for image analysis problems in interdisciplinary fields. As an exploratory study, we trained models based on the principle of transfer learning using three state-of-the-art CNNs, namely VGG-16, ResNet-101, and the feature pyramid network (FPN), as feature extractors, and a faster R-CNN as the backbone to establish models for FCIS detection. The models showed application-level detection performance, with the highest precision reaching 95.9% at a confidence threshold of 0.6. Among the three models, the ResNet model exhibited the highest accuracy and lowest training cost. The performance of the FPN model closely followed that of the ResNet model, with an advantage in terms of recall.</p>
</abstract>
<kwd-group>
<kwd>computer vision</kwd>
<kwd>machine learning</kwd>
<kwd>transfer learning</kwd>
<kwd>fatigue crack initiation sites</kwd>
<kwd>faster R-CNN</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Fatigue fracture often occurs in engineering structures, such as aircraft (<xref ref-type="bibr" rid="B4">Cowles, 1996</xref>), bridges (<xref ref-type="bibr" rid="B11">Guo et al., 2020</xref>), and surgical implants (<xref ref-type="bibr" rid="B14">Huang et al., 2005</xref>), at a variety of scales. Fractographic studies <italic>via</italic> different types of microscopy techniques are key to identifying the causes and crack growth behaviors in fatigue-fractured components (<xref ref-type="bibr" rid="B21">Kushvaha and Tippur, 2014</xref>). However, fractographic analyses are not only time-consuming but also knowledge-intensive: even an experienced material scientist may require a considerable amount of time to identify the characteristics of a new material. Hence, it would be beneficial to develop methods that can accurately interpret most of the information in such images without much human effort.</p>
<p>Convolutional neural networks (CNNs), which are inspired by the structure of actual visual systems, are one of the major advancements in the field of computer vision (<xref ref-type="bibr" rid="B15">Hubel and Wiesel, 1962</xref>; <xref ref-type="bibr" rid="B7">Fukushima, 1980</xref>; <xref ref-type="bibr" rid="B22">Lecun et&#x20;al., 1998</xref>). With the popularity of deep CNNs in computer vision, they have been increasingly applied to the material science field, such as for predicting compounds (<xref ref-type="bibr" rid="B45">Xie and Grossman, 2018</xref>) and material properties (<xref ref-type="bibr" rid="B36">Sharma et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B37">Sharma and Kushvaha, 2020</xref>), analyzing X-ray diffraction patterns (<xref ref-type="bibr" rid="B29">Park et&#x20;al., 2017</xref>), and classifying crystal structures (<xref ref-type="bibr" rid="B33">Ryan et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B53">Ziletti et&#x20;al., 2018</xref>). In the years to come, it is reasonable to believe that CNNs will significantly accelerate the development of data-driven material science.</p>
<p>In a previous study (<xref ref-type="bibr" rid="B43">Wang et al., 2020</xref>), we employed machine learning approaches to recognize fatigue crack initiation sites (FCISs) in fractographic images of metallic compounds. We planned to develop the models into an automatic FCIS detection module that could be embedded in the observation systems attached to microscopes for quick and accurate detection of FCISs. Given the lack of data, we selected a deep learning framework, namely the deeply supervised object detector (DSOD) (<xref ref-type="bibr" rid="B38">Shen et al., 2017</xref>), which can train models with good performance from scratch. Although the DSOD has shown comparable or even superior accuracy to other state-of-the-art detectors in many domains (<xref ref-type="bibr" rid="B38">Shen et al., 2017</xref>), our results fell below expectations. We attribute the difficulty of extracting features from FCISs to the following: 1) FCISs do not have clear boundaries for identification, unlike the objects in other detection tasks (e.g., animals, cars, or plants); 2) their features are typically blurry and nonobjective because of the low contrast and poor resolution, especially at low magnifications; and 3) there is no distinction between foreground and background in most cases (as seen in <xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Examples of fatigue crack initiation sites (FCISs) of different types, morphologies, and locations. The red boxes indicate the FCISs.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g001.tif"/>
</fig>
<p>Owing to the rapid development of computer vision algorithms, the above problem can be addressed by transfer learning. Transfer learning is a machine learning method in which the knowledge of an already trained model is reused as the starting point for a different but related problem (<xref ref-type="bibr" rid="B27">Pan and Yang, 2009</xref>; <xref ref-type="bibr" rid="B44">Weiss et al., 2016</xref>). In problems with a limited supply of training data, transfer learning can be utilized to develop efficient models. It has been successfully employed in many different areas, such as text mining (<xref ref-type="bibr" rid="B28">Pan et al., 2012</xref>), image classification (<xref ref-type="bibr" rid="B31">Quattoni et al., 2008</xref>; <xref ref-type="bibr" rid="B52">Zhu et al., 2011</xref>), spam filtering (<xref ref-type="bibr" rid="B25">Meng et al., 2010</xref>), and speech emotion recognition (<xref ref-type="bibr" rid="B3">Coutinho et al., 2014</xref>; <xref ref-type="bibr" rid="B40">Song et al., 2014</xref>). There are many ways of transferring knowledge from one task to another. Using pre-trained networks is a highly effective transfer learning approach that can improve the detection capacity of a new model when data are insufficient. State-of-the-art deep architectures, such as VGG (named after the Visual Geometry Group at the University of Oxford) (<xref ref-type="bibr" rid="B39">Simonyan and Zisserman, 2014</xref>), the Residual Neural Network (ResNet) (<xref ref-type="bibr" rid="B12">He et al., 2016</xref>), and Inception (<xref ref-type="bibr" rid="B41">Szegedy et al., 2016</xref>), exhibit good performance for classification and localization problems. Most object detection and segmentation architectures, such as the faster R-CNN (<xref ref-type="bibr" rid="B32">Ren et al., 2015</xref>), can be built on top of these models through transfer learning.</p>
<p>The region-based CNN (R-CNN) family, namely the R-CNN, fast R-CNN, and faster R-CNN, was developed by Girshick and Ren (<xref ref-type="bibr" rid="B8">Girshick et al., 2014</xref>; <xref ref-type="bibr" rid="B9">Girshick, 2015</xref>; <xref ref-type="bibr" rid="B32">Ren et al., 2015</xref>) for object localization and recognition. The R-CNN algorithm is faster and more accurate than conventional object detectors [e.g., the histogram of oriented gradients (HOG)], as the sliding windows are replaced by &#x201c;selective search&#x201d; to extract CNN features from each candidate region. However, the training phase is computationally slow because of the multi-stage pipeline training process (<xref ref-type="bibr" rid="B9">Girshick, 2015</xref>). In the fast R-CNN, the training process is accelerated by employing a single model instead of three separate models to extract features and by exploiting a region of interest (RoI) pooling layer to share the computation. The faster R-CNN (<xref ref-type="bibr" rid="B32">Ren et al., 2015</xref>) was developed from the fast R-CNN, with improved training and detection speeds. Its architecture comprises a region proposal network (RPN) and a fast R-CNN network with shared convolutional (conv.) feature layers in a single unified model.</p>
<p>The fast R-CNN and faster R-CNN are both initialized by taking the output of a deep CNN, such as VGG-16, pre-trained on large-scale datasets (<xref ref-type="bibr" rid="B9">Girshick, 2015</xref>; <xref ref-type="bibr" rid="B32">Ren et al., 2015</xref>). Fine-tuning the conv. layers of the pre-trained VGG-16 model improved the mAP (mean average precision) of both algorithms (<xref ref-type="bibr" rid="B9">Girshick, 2015</xref>; <xref ref-type="bibr" rid="B32">Ren et al., 2015</xref>). Similarly, <xref ref-type="bibr" rid="B26">Oquab et al. (2014)</xref> transferred mid-level image features from a source task, for which a CNN was trained on ImageNet (<xref ref-type="bibr" rid="B5">Deng et al., 2009</xref>) with a large number of labeled images, to target tasks by migrating the pre-trained conv. layers (C1&#x2013;C5). The results showed that the features transferred from the pre-trained CNNs could significantly improve the classification performance of the target task with limited training data. In a survey, <xref ref-type="bibr" rid="B44">Weiss et al. (2016)</xref> reported that this type of fine-tuning process can be classified under feature-based transfer learning.</p>
<p>Apart from the VGG-based and ZF net-based faster R-CNN, He et al. tested the performance of a ResNet-based faster R-CNN (<xref ref-type="bibr" rid="B12">He et al., 2016</xref>). When VGG-16 was replaced by ResNet-101, the faster R-CNN system improved the mAP (@[.5, .95]) by 6.0% on the COCO validation set (<xref ref-type="bibr" rid="B12">He et al., 2016</xref>). <xref ref-type="bibr" rid="B23">Lin et al. (2017)</xref> further updated the ResNet-based faster R-CNN by modifying the backbone with a feature pyramid network (FPN), achieving even better results on the COCO minival set over several strong baselines. Sharing features was also found to marginally improve accuracy and reduce the testing time.</p>
<p>Focusing on the detection of FCISs, we constructed three transfer learning models using the faster R-CNN as the backbone and three CNN architectures, namely VGG-16, ResNet-101, and FPN, as feature extractors. The datasets were composed of fractographic images containing various types of FCISs of metallic compounds. This interdisciplinary study aimed to propose effective transfer learning methods that can accurately detect FCISs with a limited amount of data. The purpose of this study was to explore the possibilities of employing machine learning approaches in identifying nonobjective features seen in material science images and inspire researchers to solve similar image-driven material problems.</p>
<p>The rest of this article is organized as follows: <italic>Extended Introduction of Related Work</italic> reviews the definition of transfer learning and its correlations with the computer vision approaches exploited in this study. Subsequently, a brief introduction to the development of faster R-CNN and relevant deep architectures (VGG-16, ResNet-101, and FPN) is given to help researchers from areas other than computer science, such as material scientists, to understand the salient features of the faster R-CNN-based object detectors. <italic>Experimental Works</italic> introduces the databases and training implementations of the three models in detail. In <italic>Results and Discussion</italic>, the performances of the models for FCIS detection are evaluated in terms of the detection accuracy, training ability, calculation efficiency, and calculation cost. <italic>Conclusion</italic> presents the conclusions drawn from the study results.</p>
</sec>
<sec id="s2">
<title>Extended Introduction of Related Work</title>
<p>Since this is an interdisciplinary study, we review related concepts here to help researchers from a non-computer-science background understand the approaches used.</p>
<sec id="s2-1">
<title>Transfer Learning</title>
<p>Transfer learning in general refers to machine learning approaches in which knowledge gained in one task is reused to improve learning in a related task. The motivation for transfer learning is that the successful application of a deep neural network depends on a tremendous amount of training or pre-training data, which are sometimes expensive or difficult to obtain, as in the case of FCISs. Many examples have shown that transfer learning can be beneficial for problems where the training and testing data are in different feature spaces or follow different data distributions (<xref ref-type="bibr" rid="B44">Weiss et al., 2016</xref>). Transfer learning can be broadly categorized into inductive, unsupervised, and transductive transfer learning, and each category contains various sub-approaches (<xref ref-type="bibr" rid="B27">Pan and Yang, 2009</xref>; <xref ref-type="bibr" rid="B44">Weiss et al., 2016</xref>). The approaches most closely related to our work in computer vision are transfer learning methods with a pre-trained model/ConvNet (convolutional network) (<xref ref-type="bibr" rid="B26">Oquab et al., 2014</xref>; <xref ref-type="bibr" rid="B34">Schwarz et al., 2015</xref>). A pre-trained model is typically trained on large benchmark image datasets, such as ImageNet, and contains rich feature representations from a low level to a high level (<xref ref-type="bibr" rid="B49">Zeiler and Fergus, 2014</xref>). These feature representations can be partly or entirely reused in other tasks simply by integrating the activated conv. layers into a new deep neural network as a feature extractor and then fine-tuning the layers of the pre-trained model <italic>via</italic> continuous backpropagation.
Many applications (<xref ref-type="bibr" rid="B26">Oquab et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B34">Schwarz et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B16">Huh et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B30">Qian et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B24">Lucena et&#x20;al., 2017</xref>) have shown that state-of-the-art object detectors can be built <italic>via</italic> this approach.</p>
</sec>
<sec id="s2-2">
<title>Faster R-CNN-Based Object Detectors</title>
<sec id="s2-2-1">
<title>Faster R-CNN Baseline</title>
<p>As previously highlighted, the faster R-CNN was developed along the lines of R-CNN, fast R-CNN, and its previous versions (<xref ref-type="bibr" rid="B6">Du, 2018</xref>; <xref ref-type="bibr" rid="B19">Khan et&#x20;al., 2019</xref>). The R-CNN marks one of the most important milestones in object detection. It is the first neural network that uses an object proposal algorithm called &#x201c;selective search&#x201d; to extract a manageable number of independent regions for classification and bounding-box regression. However, training an R-CNN model is expensive and slow because of the multiple steps involved in the process (<xref ref-type="bibr" rid="B8">Girshick et&#x20;al., 2014</xref>). In contrast to using the R-CNN to extract CNN feature vectors independently for each region proposal, the fast R-CNN passes the entire image to the deeper VGG-16 network to generate a conv. feature map for sharing the computation among the region proposals (<xref ref-type="bibr" rid="B9">Girshick, 2015</xref>). For each object proposal, an RoI pooling layer is used to replace the last max-pooling layer in the pre-trained CNN for extracting a fixed-length feature vector. The fast R-CNN has two heads, namely a classification head and a bounding-box regression head, which are jointly trained using a multi-task loss L (Softmax Loss &#x2b; Smooth<sub>L1</sub> Loss) on each labeled RoI. Thus, the precision of the algorithm is improved. All the above steps are executed simultaneously, making this method faster than the R-CNN (<xref ref-type="bibr" rid="B9">Girshick, 2015</xref>).</p>
<p>Although the selective search approach in the R-CNN and fast R-CNN localizes objects more efficiently than the sliding windows in earlier CNN methods, the process is slow because of the large number of separate region proposals (<xref ref-type="bibr" rid="B6">Du, 2018</xref>). The faster R-CNN replaces the selective search with an RPN, which slides a small network over the last shared conv. layer of the pre-trained CNNs (VGG-16 or ZF-net) (<xref ref-type="bibr" rid="B32">Ren et al., 2015</xref>). To handle the shape variations of objects, anchor boxes are introduced in the faster R-CNN. At each sliding position, there are nine candidate anchors (3 scales &#xd7; 3 aspect ratios), and the RPN predicts the probability of each being foreground (positive sample) or background (negative sample). A positive sample has an intersection-over-union (IoU) ratio greater than 0.7 with a ground-truth box, whereas a negative sample has an IoU ratio less than 0.3. For each region proposal, the RPN uses two fully connected (FC) layers to score and select anchors according to the above rules. Thus, the RPN generates bounding boxes of various sizes together with their objectness scores. The first-round proposals generated by the RPN are used to train the fast R-CNN, which in turn initializes the RPN training process. This alternating process of training the fast R-CNN and then fine-tuning the RPN-specific layers is repeated until the results converge (<xref ref-type="bibr" rid="B32">Ren et al., 2015</xref>).</p>
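The anchor-labeling rule described above (positive above an IoU of 0.7, negative below 0.3, anchors in between ignored) can be sketched in plain Python; the box coordinates in the example are hypothetical:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_box, pos_thr=0.7, neg_thr=0.3):
    """Label an anchor as a positive, negative, or ignored training sample."""
    overlap = iou(anchor, gt_box)
    if overlap > pos_thr:
        return "positive"
    if neg_thr > overlap:
        return "negative"
    return "ignored"  # neither clearly foreground nor clearly background

gt = (10, 10, 50, 50)                        # hypothetical ground-truth FCIS box
print(label_anchor((12, 12, 52, 52), gt))    # heavy overlap -> "positive"
print(label_anchor((60, 60, 100, 100), gt))  # no overlap    -> "negative"
```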
<p>Although the correlation between sharing features from pre-trained CNN layers and transfer learning is less emphasized for the fast R-CNN (<xref ref-type="bibr" rid="B9">Girshick, 2015</xref>), this point is highlighted in the case of the faster R-CNN as it has the advantage of sharing computation from the pre-trained conv. layers (<xref ref-type="bibr" rid="B32">Ren et&#x20;al., 2015</xref>). As mentioned above, there is consensus that training an object detector on a small dataset using the already learned features from a CNN trained on large datasets can be categorized under transfer learning (<xref ref-type="bibr" rid="B26">Oquab et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B1">Akcay et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B44">Weiss et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B19">Khan et&#x20;al., 2019</xref>). Based on the features of four different transfer learning approaches specified by <xref ref-type="bibr" rid="B27">Pan and Yang (2009)</xref>, and examples given by <xref ref-type="bibr" rid="B44">Weiss et&#x20;al. (2016)</xref>, it can be deduced that using a pre-trained model for feature extraction represents a feature-based transfer learning approach.</p>
</sec>
<sec id="s2-2-2">
<title>ResNet-Based and FPN-Based Faster R-CNN</title>
<p>The integration of ResNet into the faster R-CNN was first proposed as an implementation of ResNet (<xref ref-type="bibr" rid="B12">He et al., 2016</xref>). <xref ref-type="bibr" rid="B12">He et al. (2016)</xref> employed the faster R-CNN as the baseline and replaced VGG-16 with ResNet-101 (a 101-layer residual net); the modified detector showed remarkable improvement in terms of the mAP, which was 6.9% [@.5] and 6.0% [@.5, .95] higher than those of the original version on the COCO validation set. <xref ref-type="bibr" rid="B23">Lin et al. (2017)</xref> further modified the ResNet-based faster R-CNN with an FPN architecture. Instead of sharing the features of the last conv. layer of the pre-trained models, the FPN generates a feature pyramid from the ResNet backbone and multi-scale feature proposals in the RPN. Thus, both the high-resolution features from the lower conv. layers and the high-level semantic features can be used for prediction, improving the detection accuracy. The architectural details of the ResNet and FPN are introduced in the next section.</p>
</sec>
</sec>
<sec id="s2-3">
<title>VGG, ResNet, and FPN</title>
<sec id="s2-3-1">
<title>VGG</title>
<p>The VGG networks, introduced by <xref ref-type="bibr" rid="B39">Simonyan and Zisserman (2014)</xref>, are known for their simplicity, homogeneous topology, and considerable network depth. In the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) held in 2013, it was found that small filters can improve the performance of CNNs (<xref ref-type="bibr" rid="B49">Zeiler and Fergus, 2014</xref>); therefore, only 3&#x20;&#xd7; 3 conv. filters are used in all the layers of the VGG to increase the network depth. VGG-16 contains 16 weight layers in total: 13 conv. layers and three FC layers. The 13 conv. layers are grouped into five conv. blocks (2&#x2013;3 conv. layers &#x2b; ReLU each), and a max-pooling layer at the end of each block downsamples the feature maps. VGG networks show good performance in image classification (<xref ref-type="bibr" rid="B39">Simonyan and Zisserman, 2014</xref>). However, the training process is slow because of the large number of parameters [138 million (<xref ref-type="bibr" rid="B39">Simonyan and Zisserman, 2014</xref>)].</p>
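The arrangement above can be checked against the standard VGG-16 configuration, written here in the torchvision convention (a bookkeeping sketch, not code from this study): each number is the output-channel count of a 3 × 3 conv. layer, and "M" marks the max-pooling layer that closes each block.

```python
# Standard VGG-16 layer configuration: numbers are 3x3 conv. layer output
# channels (+ ReLU); "M" is the 2x2 max-pooling layer closing each of the
# five conv. blocks.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

n_conv = sum(1 for v in VGG16_CFG if v != "M")  # conv. layers
n_blocks = VGG16_CFG.count("M")                 # conv. blocks (one pool each)
n_fc = 3                                        # FC layers after the conv. stack

print(n_conv, n_blocks, n_conv + n_fc)  # 13 5 16
```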
</sec>
<sec id="s2-3-2">
<title>ResNet</title>
<p>In conventional sequential architectures, networks are constructed by stacking a set of &#x201c;building blocks&#x201d;; as such networks get deeper, they suffer from drawbacks such as vanishing gradients and network degradation. The ResNet architecture proposed by <xref ref-type="bibr" rid="B12">He et al. (2016)</xref> constructs deep networks without these problems by using batch normalization and a residual learning framework. Besides the common features of a CNN, such as convolution, pooling, activation, and FC layers, residual blocks are created in the residual learning framework by adopting identity/shortcut connections between every few stacked layers. A residual block is defined as in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>.<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <italic>x</italic> and <italic>H</italic>(<italic>x</italic>) are respectively the input and output vectors of a residual block whose dimensions are assumed to be identical. Therefore, the residual function <inline-formula id="inf1">
<mml:math id="m2">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> represents the difference between the input and output. If the dimensions of <italic>x</italic> and <italic>F</italic> are unequal, <italic>x</italic> is multiplied by a linear projection <italic>W</italic>
<sub>
<italic>s</italic>
</sub> to match the dimensions (<xref ref-type="bibr" rid="B12">He et&#x20;al., 2016</xref>). It can be deduced from <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> that if there is nothing to learn (<inline-formula id="inf2">
<mml:math id="m3">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>), i.e., when the identity mappings are optimal, the residual block allows the network to preserve the previously learned features by applying an identity mapping, and thus the input equals the output (<inline-formula id="inf3">
<mml:math id="m4">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>). If the layer learns something, i.e.,&#x20;<inline-formula id="inf4">
<mml:math id="m5">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, it will be added to the network. Therefore, ResNet is always able to produce an optimal feature map for precise image classification, as evidenced by its (ResNet-152) first-place performance in the ILSVRC-2015 competition (<xref ref-type="bibr" rid="B12">He et&#x20;al., 2016</xref>).</p>
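Eq. 1 can be illustrated with a deliberately minimal one-dimensional sketch, in which a single multiplicative weight stands in for the stacked conv. layers of the residual branch (a simplification for illustration only):

```python
def residual_block(x, weight):
    """Toy 1-D residual block: H(x) = F(x) + x, with F(x) = weight * x
    standing in for the stacked (conv.) layers of the residual branch."""
    f_x = weight * x  # residual branch F(x)
    return f_x + x    # shortcut (identity) connection

x = 3.0
# Nothing to learn: the weights drive F(x) to 0 and the block reduces to
# the identity mapping H(x) = x, preserving the pre-learnt features.
print(residual_block(x, weight=0.0))  # 3.0
# Something to learn: the residual F(x) is simply added to the input.
print(residual_block(x, weight=0.5))  # 4.5
```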
</sec>
<sec id="s2-3-3">
<title>FPN</title>
<p>Low-level features extracted from lower layers, such as edges, curves, and dots, have high resolution and localization ability, but are less semantic. By contrast, the features generated at higher layers have high semantic value but have low resolution and localization accuracy (<xref ref-type="bibr" rid="B49">Zeiler and Fergus, 2014</xref>). With the introduction of a pyramid architecture composed of a bottom&#x2013;up pathway, top&#x2013;down pathway, and lateral connection, the FPN can combine low-resolution, semantically strong features with high-resolution, semantically weak features. In the original version of the FPN proposed by <xref ref-type="bibr" rid="B23">Lin et&#x20;al. (2017)</xref>, the ResNet is used as the backbone in the bottom&#x2013;up pathway, and the features are extracted from the last layer in each residual block (denoted by C<sub>1</sub>, C<sub>2</sub>, &#x2026;, C<sub>5</sub>). A 1&#x20;&#xd7; 1 convolution filter is applied to the last feature map layer of the bottom&#x2013;up pathway to reduce the dimension and is used as the starting layer (denoted by <italic>P</italic>
<sub>
<italic>5</italic>
</sub>) of the top&#x2013;down pathway. The top&#x2013;down pathway creates new layers (<italic>P</italic>
<sub>
<italic>2</italic>
</sub>, <italic>P</italic>
<sub>
<italic>3</italic>
</sub>, <italic>and P</italic>
<sub>
<italic>4</italic>
</sub>) by upsampling the previous layer by a factor of two using nearest-neighbor interpolation. A lateral connection is used at each pyramid level to merge the feature maps of the same spatial size (<italic>d</italic>&#x20;&#x3d; 256) from the bottom&#x2013;up and top&#x2013;down pathways by element-wise addition. Since the FPN is not an object detector in itself, it must be integrated with an object detector, e.g., the faster R-CNN. Unlike the VGG and ResNet in the faster R-CNN, which transfer only a single-scale feature map to create RoIs, the FPN generates a pyramid of feature maps. Thus, the RoIs are assigned to pyramid levels (<italic>P</italic>
<sub>
<italic>k</italic>
</sub>) of different scales (<inline-formula id="inf5">
<mml:math id="m6">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mi>&#x230a;</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msqrt>
<mml:mo>/</mml:mo>
<mml:mn>224</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>
</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>&#x230b;</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, where <italic>k</italic>
<sub>
<italic>0</italic>
</sub> is set to 4, and <italic>w</italic> and <italic>h</italic> are the width and height, respectively). The predictor heads are attached to all levels with shared parameters (<xref ref-type="bibr" rid="B23">Lin et&#x20;al., 2017</xref>).</p>
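The level-assignment formula can be written out directly; the clamping of k to the available levels P2–P5 follows Lin et al. (2017), with k0 = 4 and 224 being the canonical ImageNet pre-training size:

```python
import math

def fpn_level(w, h, k0=4, k_min=2, k_max=5):
    """Map an RoI of width w and height h to pyramid level P_k using
    k = floor(k0 + log2(sqrt(w*h) / 224)), clamped to [P2, P5]."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(k_min, min(k_max, k))

print(fpn_level(224, 224))  # 4: an ImageNet-sized RoI lands on P4
print(fpn_level(112, 112))  # 3: a half-sized RoI drops one level
print(fpn_level(448, 448))  # 5: a double-sized RoI rises one level
```

Smaller RoIs are thus routed to higher-resolution levels, which is what lets the FPN keep localization accuracy for small objects.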
</sec>
</sec>
</sec>
<sec id="s3">
<title>Experimental Works</title>
<sec id="s3-1">
<title>Datasets</title>
<p>Because FCIS detection is a comparatively new machine learning domain, there is no &#x201c;off-the-shelf&#x201d; image library for it. Therefore, we acquired data from the Internet through meticulous selection and from our in-house research works on fatigue in metallic materials. The images were normalized in format and size based on the standards used in the faster R-CNN (<xref ref-type="bibr" rid="B32">Ren et&#x20;al., 2015</xref>). The datasets in the target task contain 291 images in total, with various FCIS details (such as location, morphology, size, and type) and image magnifications (&#xd7;18&#x223c;&#xd7;600) to ensure the generalization of the datasets. Following an existing data partitioning ratio of 65% (training data):35% (testing data), the data were split into a training set (212 images) and a testing set (79 images). Instead of arbitrary selection, most of the testing data were chosen to maintain consistency between the distributions of the training and testing datasets (examples are shown in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). Several images of unusual FCIS types (<xref ref-type="fig" rid="F3">Figure&#x20;3</xref>) were also added to the testing dataset to evaluate the generalization ability of the models. Although the unusual FCISs did contain some features of common FCISs, they also contained patterns that were rare or absent in the training dataset. It should be emphasized that all the images in the training and testing datasets contained at least one&#x20;FCIS.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Examples of images in the training and testing datasets.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Unusual FCIS types used for testing the generalization of models.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g003.tif"/>
</fig>
<p>The training data were annotated using LabelImg by a professional in the fatigue research field. LabelImg is an image annotation tool written in Python that can generate object bounding boxes. The annotations were saved as XML files in the PASCAL VOC format.</p>
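A PASCAL VOC annotation is a small XML file listing each object's class name and bounding box. The minimal sketch below (the file content and the <monospace>FCIS</monospace> label are hypothetical examples, not taken from the authors' dataset) shows how such an annotation can be read with Python's standard library.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_string):
    """Parse object names and (xmin, ymin, xmax, ymax) boxes from a VOC annotation."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes

# Minimal, hypothetical annotation in the VOC layout produced by LabelImg.
sample = """<annotation>
  <filename>fracture_001.jpg</filename>
  <object><name>FCIS</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>210</xmax><ymax>190</ymax></bndbox>
  </object>
</annotation>"""
print(read_voc_boxes(sample))  # [('FCIS', (48, 60, 210, 190))]
```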
</sec>
<sec id="s3-2">
<title>Training on VGG-16 and ResNet-101-Based Faster R-CNN</title>
<p>The two deep neural networks (VGG-16 and ResNet-101) were pre-trained on the ImageNet dataset following the procedures provided in previous studies (<xref ref-type="bibr" rid="B39">Simonyan and Zisserman, 2014</xref>; <xref ref-type="bibr" rid="B12">He et&#x20;al., 2016</xref>), thus attaining a robust hierarchy of features. For the pre-training, we implemented VGG-16 and ResNet-101 in the Caffe format, as done by <xref ref-type="bibr" rid="B39">Simonyan and Zisserman (2014)</xref> and <xref ref-type="bibr" rid="B17">Kaiming et&#x20;al. (2015)</xref>, respectively. <xref ref-type="table" rid="T1">Table&#x20;1</xref> lists the details of the pre-training of VGG-16 and ResNet-101.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Details of the pre-training processes for VGG-16 and ResNet-101 on ImageNet.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="center">Dataset</th>
<th align="center">Batch size</th>
<th align="center">Learning rate</th>
<th align="center">Optimizer</th>
<th align="center">Weight decay</th>
<th align="center">Momentum</th>
<th align="center">Dropout ratio</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<bold>VGG-16</bold>
</td>
<td align="left">1.3 million training images, 50&#xa0;k validation, 100&#xa0;k testing images</td>
<td align="char" char=".">256</td>
<td align="left">Starts from 0.01&#x2192; divided by 10 when validation accuracy stops increasing</td>
<td align="left">Mini-batch gradient descent</td>
<td align="left">L2 penalty multiplier 0.0005</td>
<td align="char" char=".">0.9</td>
<td align="center">FC layer 0.5</td>
</tr>
<tr>
<td align="left">
<bold>ResNet-101</bold>
</td>
<td align="left">1.3 million training images, 50&#xa0;k validation, 100&#xa0;k testing images</td>
<td align="char" char=".">256</td>
<td align="left">Starts from 0.1 &#x2192; divided by 10 when error plateaus</td>
<td align="left">Mini-batch SGD</td>
<td align="left">L2 penalty multiplier 0.0001</td>
<td align="char" char=".">0.9</td>
<td align="center">N/A</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We used the frozen convolutional layers of the pre-trained VGG-16/ResNet-101 (i.e., the weights were unchanged during model training) as feature extractors to train the RPN and detection network in the faster R-CNN (as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>). The training was carried out using the open-source faster R-CNN framework shared by <xref ref-type="bibr" rid="B46">Yang (2017)</xref>. The input images were resized and then cropped to dimensions of 224&#x20;&#xd7; 224 following the method reported in (<xref ref-type="bibr" rid="B39">Simonyan and Zisserman, 2014</xref>; <xref ref-type="bibr" rid="B32">Ren et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B12">He et&#x20;al., 2016</xref>). Each mini-batch contained only two images, thus avoiding overfitting on small datasets (<xref ref-type="bibr" rid="B42">Talo et&#x20;al., 2019</xref>). All the training processes were terminated after 5,000 epochs, and a checkpoint was saved every 500 epochs. The initial learning rate was set to 0.01, and the remaining optimization parameters were set to the default values of the torch.optim.SGD() function in the PyTorch library; the weight decay and dampening were set to 0, and Nesterov momentum was not applied. The experiments were performed on a single GeForce GTX 1060 GPU with 6&#xa0;GB of memory.</p>
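The freezing-and-fine-tuning setup described above can be sketched in PyTorch. The tiny backbone and head below are stand-ins (the actual networks are the pre-trained VGG-16/ResNet-101 and the faster R-CNN heads); only the learning rate of 0.01 and the stated torch.optim.SGD() defaults come from the text.

```python
import torch
import torch.nn as nn

# Stand-in backbone: in the paper this is pre-trained VGG-16/ResNet-101;
# a tiny conv stack keeps the sketch self-contained.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
head = nn.Linear(8, 2)  # stand-in for the RPN/detection heads

# Freeze the transferred convolutional layers: their weights stay
# unchanged during fine-tuning, as described in the text.
for p in backbone.parameters():
    p.requires_grad = False

# Only the head is optimized; lr = 0.01, and the remaining arguments are
# the torch.optim.SGD defaults mentioned in the text.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01,
                            weight_decay=0, dampening=0, nesterov=False)

trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters() if not p.requires_grad)
print(trainable, frozen)  # 18 224
```

During training, gradients are computed only for the head parameters, so the transferred features act purely as a fixed extractor.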
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Schematic of fine-tuning the faster R-CNN with pre-trained parameters transferred from VGG-16 and ResNet-101.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g004.tif"/>
</fig>
</sec>
<sec id="s3-3">
<title>Training on FPN-Based Faster R-CNN</title>
<p>As shown in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>, the architecture of the FPN-based faster R-CNN was modified from that of the ResNet-based version. Hence, we only pre-trained the ResNet-101 baseline. The model was trained using the open-source PyTorch code shared by <xref ref-type="bibr" rid="B47">Yang (2018)</xref>. For comparison, the training parameters were kept consistent with those of the above experiments on the VGG-16 and ResNet-101 based faster R-CNN. The model was trained for 5,000 epochs using the stochastic gradient descent (SGD) method, with an initial learning rate of 0.001 and a mini-batch size of&#x20;2.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Schematic of the overall architecture of FPN-based faster R-CNN.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g005.tif"/>
</fig>
</sec>
<sec id="s3-4">
<title>Evaluation of Results</title>
<p>The IoU is often used to judge whether a predicted bounding box constitutes a good detection. It is the ratio of the intersection area of the prediction bounding box and the ground-truth box to their union area. However, the IoU is difficult to use as an evaluation index in this study for two reasons: 1) the ground-truth bounding boxes were drawn based on manual estimation, which introduces a high degree of arbitrariness; 2) it is difficult to set standards for drawing standardized ground-truth bounding boxes, because FCISs vary in shape and size and have no exact boundaries. Hence, for consistency, the results were judged by the same material scientist who drew the ground-truth bounding boxes. We classified the prediction bounding boxes into three types in adherence to strict standards: true positive (TP), false positive (FP), and false negative (FN). The three types of results are defined as follows:</p>
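For reference, the IoU mentioned above can be computed as follows for axis-aligned boxes; this is a generic sketch, not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # width of the overlap
    ih = max(0, min(ay2, by2) - max(ay1, by1))  # height of the overlap
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping by half: intersection 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333...
```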
<p>
<bold>TP Results</bold>: The prediction boxes accurately detect the FCISs, or miss only a marginally small part of an FCIS while still covering most of its area (as shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>).</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Examples of TP (green boxes) and FP results (yellow boxes).</p>
</caption>
<graphic xlink:href="fmats-08-756798-g006.tif"/>
</fig>
<p>
<bold>FP Results</bold>: The prediction boxes lie on an incorrect part of the image or far from the fatigue crack initiation areas (as shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>).</p>
<p>
<bold>FN Results</bold>: No FCISs are detected in the images.</p>
<p>The counts of TP, FP, and FN results are used to calculate the accuracy (A), precision (P), recall (R), and F1&#x20;score.</p>
<p>The accuracy represents the proportion of correct prediction boxes among all predictions; since there are no true negatives, it serves as a quick index of the general performance of the models.<disp-formula id="e2">
<mml:math id="m7">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>The precision is the proportion of correct positive identifications among all the prediction boxes.<disp-formula id="e3">
<mml:math id="m8">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>Recall is the proportion of all the FCISs in the testing dataset that are correctly detected. It can be considered the sensitivity of the models in detecting FCISs, i.e.,&#x20;a higher R value indicates a stronger ability to detect FCISs under all conditions.<disp-formula id="e4">
<mml:math id="m9">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
<p>The F1 score is the harmonic mean of the precision and recall. An F1 score approaching 1 means that the model is little affected by false results.<disp-formula id="e5">
<mml:math id="m10">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
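Equations 2&#x2013;5 can be collected into one small helper; the counts below are illustrative and are not taken from the paper's tables.

```python
def detection_metrics(tp, fp, fn):
    """Accuracy, precision, recall, and F1 from TP/FP/FN counts (Eqs. 2-5).

    With no true-negative results, accuracy reduces to TP / (TP + FP + FN).
    """
    a = tp / (tp + fp + fn)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * r * p / (r + p)
    return a, p, r, f1

# Illustrative counts only (not the paper's results).
a, p, r, f1 = detection_metrics(tp=80, fp=10, fn=10)
print(round(a, 3), round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.889 0.889 0.889
```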
<p>All the above evaluation metrics vary with the confidence threshold, i.e., the lower bound on the confidence score of an object bounding box below which results are discarded. To obtain high recall values, i.e., to miss as few TP results as possible, a low confidence threshold of 0.1 was set so that TP results with low confidence scores would not be discarded. A higher threshold of 0.6 was used for comparison.</p>
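Thresholding itself amounts to a one-line filter over the predicted boxes; the detections below are hypothetical examples.

```python
def filter_by_confidence(detections, threshold):
    """Discard predicted boxes whose confidence score falls below the threshold."""
    return [d for d in detections if d["score"] >= threshold]

# Hypothetical detections: a low threshold (0.1) keeps low-confidence boxes
# (raising recall), while a high threshold (0.6) trades recall for precision.
dets = [{"box": (10, 10, 50, 50), "score": 0.93},
        {"box": (60, 20, 90, 70), "score": 0.34}]
print(len(filter_by_confidence(dets, 0.1)))  # 2
print(len(filter_by_confidence(dets, 0.6)))  # 1
```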
</sec>
</sec>
<sec sec-type="results|discussion" id="s4">
<title>Results and Discussion</title>
<sec id="s4-1">
<title>Model Accuracy</title>
<p>
<xref ref-type="table" rid="T2">Table&#x20;2</xref> compares the performances of the three models, hereafter referred to as the VGG, ResNet, and FPN models, at thresholds of 0.1 and 0.6. Remarkable improvements in the detection performance can be seen compared with our previous study on training a similar model from scratch using the DSOD algorithm (<xref ref-type="bibr" rid="B43">Wang et&#x20;al., 2020</xref>). In that study, our best result was a 22.1% rate of accurately detecting FCISs with just one bounding box, with 24.0% of the images yielding no valid results (i.e.,&#x20;FN results). Comparatively, even the VGG model, which exhibited the lowest performance, achieved ratios of 81.0 and 84.8% in accurately detecting FCISs with one bounding box at thresholds of 0.1 and 0.6, respectively.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Summary of evaluation metrics of the three models at thresholds of 0.1 and 0.6 (the highest values are <bold>boldfaced</bold>).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Threshold value</th>
<th rowspan="2" align="center">Evaluation metrics</th>
<th colspan="3" align="center">Model</th>
</tr>
<tr>
<th align="center">VGG</th>
<th align="center">ResNet</th>
<th align="center">FPN</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="4" align="left">
<bold>0.1</bold>
</td>
<td align="center">A</td>
<td align="char" char=".">0.612</td>
<td align="char" char=".">
<bold>0.757</bold>
</td>
<td align="char" char=".">0.740</td>
</tr>
<tr>
<td align="center">P</td>
<td align="char" char=".">0.679</td>
<td align="char" char=".">
<bold>0.839</bold>
</td>
<td align="char" char=".">0.802</td>
</tr>
<tr>
<td align="center">R</td>
<td align="char" char=".">0.860</td>
<td align="char" char=".">0.886</td>
<td align="char" char=".">
<bold>0.917</bold>
</td>
</tr>
<tr>
<td align="center">F1</td>
<td align="char" char=".">0.759</td>
<td align="char" char=".">
<bold>0.862</bold>
</td>
<td align="char" char=".">0.856</td>
</tr>
<tr>
<td rowspan="4" align="left">
<bold>0.6</bold>
</td>
<td align="center">A</td>
<td align="char" char=".">0.733</td>
<td align="char" char=".">
<bold>0.835</bold>
</td>
<td align="char" char=".">0.809</td>
</tr>
<tr>
<td align="center">P</td>
<td align="char" char=".">0.917</td>
<td align="char" char=".">
<bold>0.959</bold>
</td>
<td align="char" char=".">0.889</td>
</tr>
<tr>
<td align="center">R</td>
<td align="char" char=".">0.786</td>
<td align="char" char=".">0.866</td>
<td align="char" char=".">
<bold>0.900</bold>
</td>
</tr>
<tr>
<td align="center">F1</td>
<td align="char" char=".">0.846</td>
<td align="char" char=".">
<bold>0.910</bold>
</td>
<td align="char" char=".">0.894</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>At both thresholds, the ResNet model presents the best performance in terms of both the accuracy (0.757 and 0.835) and precision (0.839 and 0.959). The accuracy of the models is comparable to that achieved in advanced studies applying artificial intelligence to material problems (<xref ref-type="bibr" rid="B13">Hemath et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B20">Kushvaha et&#x20;al., 2020</xref>). Since there are no true negative results, the higher accuracy values mean that the ResNet model has a lower portion of false results (FP and FN) among the observations, and the higher precision values indicate that its portion of FP results was lower.</p>
<p>The FPN model is slightly superior in terms of the recall (0.917 and 0.900) among the three methods, indicating that it correctly detects a larger proportion of the FCISs present in the testing dataset. The F1 score helps evaluate which model has a better overall performance in terms of both the precision and recall. Although the ResNet model achieves the highest F1 scores (0.862 and 0.910), these are only marginally higher than those of the FPN model (by 0.006 and 0.016 at the two threshold values); the overall performances of the ResNet and FPN models are thus nearly identical.</p>
<p>The respective strengths of the two models in terms of the precision and recall can guide their application scenarios. If an application requires a model that recognizes more FCISs in the images, the FPN model, with its higher recall values, is preferable; if the application requires highly certain results, the ResNet model should be selected. The VGG model is less competitive, as all of its evaluation metrics were the lowest.</p>
<p>The variations in the metrics between thresholds of 0.1 and 0.6 are compared in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>. As the threshold increased from 0.1 to 0.6, the evaluation metrics of the three models improved except for the recall. The VGG model shows more distinct improvements in terms of the accuracy, precision, and F1 score, but a more significant drop in the recall value compared to the other two models. There is always a tradeoff between the precision and recall as the confidence threshold varies. Thus, the threshold values should be chosen depending on the application requirement.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Changes in the accuracy, precision, recall, and F1 score of the three models for thresholds ranging from 0.1 to 0.6.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g007.tif"/>
</fig>
<p>None of the three models is excellent for all types of FCISs, i.e.,&#x20;each performs well at detecting some FCISs but not others; examples are given in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref>. Hence, it can be helpful to try a different model when no TP results are obtained with the employed one. Comparing <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> with <xref ref-type="fig" rid="F8">Figure&#x20;8</xref>, we find that some of the unusual FCISs (<xref ref-type="fig" rid="F8">Figures 8B,C</xref>) can still be correctly detected by at least one of the three models, indicating that the models can detect features that are rare but relevant in the training dataset.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Examples of undetected cases: <bold>(A)</bold> only undetected by the VGG model; <bold>(B)</bold> only undetected by the ResNet model; <bold>(C)</bold> only undetected by the FPN model; <bold>(D)</bold> only correctly detected by the FPN model; <bold>(E)</bold> only correctly detected by the ResNet model; and <bold>(F)</bold> only correctly detected by the VGG&#x20;model.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g008.tif"/>
</fig>
</sec>
<sec id="s4-2">
<title>Training Loss</title>
<p>The loss function distills all aspects of a model into a single number, which allows models to be evaluated and compared; the loss value is thus an indicator of model performance, and for a perfect model the loss is zero. As shown in <xref ref-type="fig" rid="F9">Figure&#x20;9</xref>, when the loss curves are plotted over the entire training cycle (5,000 epochs), the variation tendency is compressed and hidden. Thus, the loss was replotted over only the first 50 epochs for each model (as shown in the small windows) to amplify the changes. The loss values describe how closely the values predicted by a model match the true values of the problem (<xref ref-type="bibr" rid="B10">Goodfellow et&#x20;al., 2016</xref>). For an ideal model, we can expect a reduction in the loss after each, or every few, iteration(s)/epoch(s). For all the models, a sharp drop occurred at the first epoch, followed by a gradual decline, eventually leading to stable oscillations. The stable oscillations started between 10 and 20 epochs, with small amplitudes, and are likely due to the small batch size. Although the small batch size leads to noise in the gradient estimation, this noise is crucial for avoiding sharp minima, which can lead to poor generalization in deep learning (<xref ref-type="bibr" rid="B18">Keskar et&#x20;al., 2016</xref>). A lower loss indicates that the model makes fewer errors on the data; thus, a training process that converges to a lower loss indicates a better model. The lowest average loss in the stably fluctuating region, approximately 0.106, was achieved by the FPN model (as shown in <xref ref-type="fig" rid="F9">Figure&#x20;9</xref>).</p>
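The average loss in the stably fluctuating region can be estimated with a simple moving average; the loss trace below is illustrative only (apart from the reported plateau of roughly 0.106 for the FPN model, no value is taken from the paper).

```python
def moving_average(values, window):
    """Simple moving average used to smooth an oscillating loss curve."""
    out = []
    for i in range(len(values) - window + 1):
        out.append(sum(values[i:i + window]) / window)
    return out

# Toy loss trace: sharp initial drop, then stable oscillation near 0.1;
# the numbers are illustrative, not the paper's measurements.
loss = [2.0, 0.9, 0.4, 0.2, 0.11, 0.10, 0.11, 0.10, 0.11, 0.10]
smoothed = moving_average(loss, window=4)
print(round(smoothed[-1], 3))  # 0.105
```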
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Comparison of loss during training.</p>
</caption>
<graphic xlink:href="fmats-08-756798-g009.tif"/>
</fig>
</sec>
<sec id="s4-3">
<title>Calculation Efficiency and Cost</title>
<p>From the perspective of practical applications, the calculation efficiency and cost should be considered, since high demands on time, memory, and storage would restrict a model in real-time applications or on devices with limited computing capacity.</p>
<p>The calculation efficiency is evaluated using the total training time for 5,000 epochs and the average detection time per image (<xref ref-type="table" rid="T3">Table&#x20;3</xref>); the former reflects the time required for training a new model with a new dataset, and the latter can be used to evaluate whether a model can be implemented in instant scenarios, such as on-site detection of FCISs during microscopy. The ResNet model required the least training time, approximately 118&#xa0;h less than the FPN model. With the GeForce GTX 1060 GPU (6&#xa0;GB of memory), all three models could make detections within 0.15&#xa0;s per image, with the VGG model being the fastest. The subtle differences between the average detection times cannot serve as a decisive index for model selection unless a considerable number of images must be processed.</p>
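The average detection time per image can be measured as wall-clock time over a batch of images; in the sketch below, `detect` is a trivial stand-in for a trained model's inference call, not the actual detector.

```python
import time

def average_time(fn, inputs):
    """Average wall-clock time of fn over a list of inputs (per-image time)."""
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    return (time.perf_counter() - start) / len(inputs)

# detect() is a hypothetical placeholder for model inference.
detect = lambda image: sum(image)              # trivial stand-in workload
images = [list(range(1000)) for _ in range(50)]
t = average_time(detect, images)
print(t >= 0.0)  # True
```

As noted in the table footnote, a real measurement should exclude one-time costs such as loading the model and configuring parameters.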
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Average training time and average detection time per image required by the three models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="center">Training time (5,000 epochs)/h</th>
<th align="center">Average detection time per image<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>/s</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">VGG</td>
<td align="char" char=".">194.9</td>
<td align="char" char=".">0.11</td>
</tr>
<tr>
<td align="left">ResNet</td>
<td align="char" char=".">171.2</td>
<td align="char" char=".">0.15</td>
</tr>
<tr>
<td align="left">FPN</td>
<td align="char" char=".">289.7</td>
<td align="char" char=".">0.14</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn1">
<label>a</label>
<p>This time includes the time used for detection and annotation, but does not include the time required for loading the model and configuring the parameters.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>
<xref ref-type="table" rid="T4">Table&#x20;4</xref> summarizes the memory space and model sizes required for training, the pre-trained model sizes, and the video memory consumption during detection. The pre-trained model size of VGG-16 was nearly three times those of the other two models, leading to a larger final VGG model. Typically, for the same input data, a larger model size means a higher number of parameters in the deep neural network. <xref ref-type="bibr" rid="B2">Canziani et&#x20;al. (2016)</xref> reported that VGG-16 requires more parameters [138&#xa0;M (<xref ref-type="bibr" rid="B39">Simonyan and Zisserman, 2014</xref>)] than ResNet-101 [44.5&#xa0;M (<xref ref-type="bibr" rid="B48">Yu et&#x20;al., 2017</xref>)] when trained on ImageNet and uses larger feature maps in many layers, making it computationally costly.</p>
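The link between model size and parameter count can be checked directly in PyTorch; the small network below is a stand-in for illustration, not one of the paper's models.

```python
import torch.nn as nn

def count_parameters(model):
    """Total number of learnable parameters, the main driver of model size."""
    return sum(p.numel() for p in model.parameters())

# Tiny stand-in network; VGG-16 (~138 M) and ResNet-101 (~44.5 M) differ
# mainly in how many such parameters their layers hold.
net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Linear(16, 10))
print(count_parameters(net))  # 3*16*3*3 + 16 + 16*10 + 10 = 618
```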
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Memory usage, final model size, and pre-trained model size of the three models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="center">Pre-trained model size (MB)</th>
<th align="center">Final model size (GB)</th>
<th align="center">Training memory (GB)</th>
<th align="center">Detection memory per image (GB)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">VGG model</td>
<td align="char" char=".">528</td>
<td align="char" char=".">1.10</td>
<td align="char" char=".">4.82</td>
<td align="char" char=".">2.89</td>
</tr>
<tr>
<td align="left">ResNet model</td>
<td align="char" char=".">171</td>
<td align="char" char=".">0.36</td>
<td align="char" char=".">3.67</td>
<td align="char" char=".">4.00</td>
</tr>
<tr>
<td align="left">FPN model</td>
<td align="char" char=".">171</td>
<td align="char" char=".">0.46</td>
<td align="char" char=".">4.63</td>
<td align="char" char=".">3.90</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Although the FPN and ResNet models have similar sizes, the FPN model required more memory for training, indicating a greater number of computations because of the additional intermediate variables involved. On the other hand, the ResNet model is at a disadvantage in terms of the memory usage for detection, particularly compared with the VGG&#x20;model.</p>
</sec>
</sec>
<sec id="s5">
<title>Summary</title>
<p>The three models were compared in terms of the accuracy, training process, calculation efficiency, and memory cost. The following are their features for FCIS detection:</p>
<sec id="s5-1">
<title>VGG Model</title>
<p>It was the most memory-intensive model to train and exhibited the lowest detection accuracy among the three models. However, it was advantageous in terms of the detection time and detection memory&#x20;cost.</p>
</sec>
<sec id="s5-2">
<title>ResNet Model</title>
<p>It showed the best performance in terms of the detection accuracy with the minimum model size and training memory cost. The only drawbacks were the relatively high detection memory cost and slightly longer detection&#x20;time.</p>
</sec>
<sec id="s5-3">
<title>FPN Model</title>
<p>Its detection performance was largely similar to that of the ResNet model, although it outperformed the ResNet model in terms of the recall; its calculation cost was likewise similar. Thus, if an application requires a higher recall, this model is superior to the ResNet model. It was also the best trained of the three, as its average loss value was the lowest. However, training it on a new dataset was time-consuming.</p>
<p>The results show that all three models can be trained thoroughly to obtain good accuracies for real-time FCIS detection. The relatively complex architecture of the faster R-CNN demands more memory for detection, imposing moderate requirements on the processors. In applications for which high-capacity computers are available, e.g., computers attached to microscopes for imaging, the models can be developed as modules and embedded into microscope software packages for quick FCIS detection.</p>
<p>Currently, the faster R-CNN-based models cannot be deployed on small devices, such as smartphones, because of their relatively large model sizes and memory requirements. However, solutions could be developed using simpler algorithms as the backbone and feature extractor, or simply by using a single state-of-the-art algorithm, albeit with a lower detection accuracy, as has been demonstrated in several studies (<xref ref-type="bibr" rid="B2">Canziani et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B35">Sehgal and Kehtarnavaz, 2019</xref>; <xref ref-type="bibr" rid="B50">Zhang and Deng, 2019</xref>; <xref ref-type="bibr" rid="B51">Zhu and Spachos, 2019</xref>).</p>
<p>This work has two limitations. The first is that our training dataset was not general enough to cover all types of fatigue surfaces of metallic materials. A possible solution is to collect more annotated data during the deployment of the FCIS detection module and then update the module with the added data for improved generalization. The second limitation is that only a small number of transfer learning algorithms were evaluated, hindering the discovery of further possibilities.</p>
</sec>
</sec>
<sec sec-type="conclusion" id="s6">
<title>Conclusion</title>
<p>This paper presented a comparative study of three transfer learning algorithms using the faster R-CNN as the detection framework for FCISs. The three networks, namely VGG-16, ResNet-101, and FPN, were used as feature extractors to transfer features learned from ImageNet. All three models showed remarkable improvements in the detection accuracy compared with our previous study on training a similar model from scratch, indicating the underlying benefits of transferring different semantic features for detecting abstract features such as FCISs. A comparison of the three models in terms of the accuracy, precision, recall, and F1 score showed that the feature extractor with the deeper architecture (ResNet-101) was more effective in improving the accuracy of the transfer learning models. The overall detection performances of the ResNet and FPN models were similar, with subtle advantages in the precision and recall, respectively. The ResNet model exhibited a better performance in terms of the training time and memory cost than the FPN model, whereas the FPN model was better trained because of its lower average loss. Although the VGG model exhibited the lowest detection accuracy among the three, it outperformed the others in terms of the detection time and detection memory requirement.</p>
<p>Moreover, there was always a tradeoff between the precision and recall. Increasing the confidence threshold value increased the accuracy, precision, and F1 score but reduced the recall. Therefore, the threshold value should be carefully selected depending on the application requirement.</p>
</sec>
</body>
<back>
<sec id="s7">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>SW contributed to the conception and design of the study, carried out the experiments, and wrote the manuscript under the supervision of TG.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>Financial support from the Jiangsu Key Research and Development Program (Grant No. BE2019107), the China Postdoctoral Science Foundation (Grant No. 2020M681460), and the Natural Science Foundation of Jiangsu (Grant No. BK20210255) is gratefully acknowledged.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akcay</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kundegorski</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Devereux</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Breckon</surname>
<given-names>T. P.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Transfer Learning Using Convolutional Neural Networks for Object Classification within X-ray Baggage Security Imagery</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE International Conference on Image Processing</conf-name>, <conf-date>25-28 Sept. 2016</conf-date>, <fpage>1057</fpage>&#x2013;<lpage>1061</lpage>. </citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Canziani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Paszke</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Culurciello</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2016</year>). <source>An Analysis of Deep Neural Network Models for Practical Applications</source>. <comment>arXiv:1605.07678</comment> </citation>
</ref>
<ref id="B3">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Coutinho</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Schuller</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Transfer Learning Emotion Manifestation across Music and Speech</article-title>,&#x201d; in <conf-name>Proceedings of the International Joint Conference on Neural Networks (IJCNN)</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/ijcnn.2014.6889814</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cowles</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>High Cycle Fatigue in Aircraft Gas Turbines&#x2014;An Industry Perspective</article-title>. <source>Int. J.&#x20;Fracture</source> <volume>80</volume> (<issue>2-3</issue>), <fpage>147</fpage>&#x2013;<lpage>163</lpage>. <pub-id pub-id-type="doi">10.1007/bf00012667</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Deng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Socher</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Fei-Fei</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2009</year>). &#x201c;<article-title>ImageNet: A Large-Scale Hierarchical Image Database</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/cvpr.2009.5206848</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Du</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Understanding of Object Detection Based on CNN Family and YOLO</article-title>. <source>J.&#x20;Phys. Conf. Ser.</source> <volume>1004</volume>, <fpage>012029</fpage>. <pub-id pub-id-type="doi">10.1088/1742-6596/1004/1/012029</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fukushima</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>1980</year>). <article-title>Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position</article-title>. <source>Biol. Cybernetics</source> <volume>36</volume> (<issue>4</issue>), <fpage>193</fpage>&#x2013;<lpage>202</lpage>. <pub-id pub-id-type="doi">10.1007/bf00344251</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Donahue</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Darrell</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Malik</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/cvpr.2014.81</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Fast R-CNN</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>. <pub-id pub-id-type="doi">10.1109/iccv.2015.169</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Courville</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Deep Learning</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name>. </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Correia</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>de Jesus</surname>
<given-names>A. M. P.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Experimental Study on Fretting-Fatigue of Bridge Cable Wires</article-title>. <source>Int. J.&#x20;Fatigue</source> <volume>131</volume>, <fpage>105321</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijfatigue.2019.105321</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep Residual Learning for Image Recognition</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/cvpr.2016.90</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hemath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mavinkere Rangappa</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kushvaha</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Dhakal</surname>
<given-names>H. N.</given-names>
</name>
<name>
<surname>Siengchin</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Comprehensive Review on Mechanical, Electromagnetic Radiation Shielding, and Thermal Conductivity of Fibers/Inorganic Fillers Reinforced Hybrid Polymer Composites</article-title>. <source>Polym. Composites</source> <volume>41</volume> (<issue>10</issue>), <fpage>3940</fpage>&#x2013;<lpage>3965</lpage>. <pub-id pub-id-type="doi">10.1002/pc.25703</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>H. M.</given-names>
</name>
<name>
<surname>Tsai</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>C. C.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>C. T.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S. Y.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Evaluation of Loading Conditions on Fatigue-Failed Implants by Fracture Surface Analysis</article-title>. <source>Int. J.&#x20;Oral Maxillofac. Implants</source> <volume>20</volume> (<issue>6</issue>), <fpage>854</fpage>&#x2013;<lpage>859</lpage>. </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hubel</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Wiesel</surname>
<given-names>T. N.</given-names>
</name>
</person-group> (<year>1962</year>). <article-title>Receptive Fields, Binocular Interaction and Functional Architecture in the Cat&#x27;s Visual Cortex</article-title>. <source>J.&#x20;Physiol.</source> <volume>160</volume> (<issue>1</issue>), <fpage>106</fpage>&#x2013;<lpage>154</lpage>. <pub-id pub-id-type="doi">10.1113/jphysiol.1962.sp006837</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Agrawal</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Efros</surname>
<given-names>A. A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>What Makes ImageNet Good for Transfer Learning?</article-title> </citation>
</ref>
<ref id="B17">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Deep Residual Learning for Image Recognition</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/KaimingHe/deep-residual-networks">https://github.com/KaimingHe/deep-residual-networks</ext-link>
</comment>. </citation>
</ref>
<ref id="B18">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Keskar</surname>
<given-names>N. S.</given-names>
</name>
<name>
<surname>Mudigere</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Nocedal</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Smelyanskiy</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>P. T. P.</given-names>
</name>
</person-group> (<year>2016</year>). <source>On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima</source>. </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khan</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sohail</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Zahoora</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Qureshi</surname>
<given-names>A. S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Survey of the Recent Architectures of Deep Convolutional Neural Networks</article-title>. <comment>arXiv preprint arXiv:1901.06032</comment>. </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kushvaha</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Madhushri</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Artificial Neural Network Technique to Predict Dynamic Fracture of Particulate Composite</article-title>. <source>J.&#x20;Compos. Mater.</source> <volume>54</volume> (<issue>22</issue>), <fpage>3099</fpage>&#x2013;<lpage>3108</lpage>. <pub-id pub-id-type="doi">10.1177/0021998320911418</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kushvaha</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Tippur</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Effect of Filler Shape, Volume Fraction and Loading Rate on Dynamic Fracture Behavior of Glass-Filled Epoxy</article-title>. <source>Composites B: Eng.</source> <volume>64</volume>, <fpage>126</fpage>&#x2013;<lpage>137</lpage>. <pub-id pub-id-type="doi">10.1016/j.compositesb.2014.04.016</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lecun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bottou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Haffner</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Gradient-based Learning Applied to Document Recognition</article-title>. <source>Proc. IEEE</source> <volume>86</volume> (<issue>11</issue>), <fpage>2278</fpage>&#x2013;<lpage>2324</lpage>. <pub-id pub-id-type="doi">10.1109/5.726791</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>T.-Y.</given-names>
</name>
<name>
<surname>Dollar</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hariharan</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Belongie</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Feature Pyramid Networks for Object Detection</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/cvpr.2017.106</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lucena</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Junior</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Moia</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Souza</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Valle</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Lotufo</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Transfer Learning Using Convolutional Neural Networks for Face Anti-spoofing</article-title>,&#x201d; in <conf-name>Proceedings of the International Conference Image Analysis and Recognition</conf-name>, <conf-date>02 June 2017</conf-date> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>). </citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Meng</surname>
<given-names>J.-n.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>H.-f.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Y.-h.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>Transfer Learning Based on SVD for Spam Filtering</article-title>,&#x201d; in <conf-name>Proceedings of the International Conference on Intelligent Computing and Cognitive Informatics</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/icicci.2010.115</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Oquab</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bottou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Laptev</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Sivic</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/cvpr.2014.222</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>A Survey on Transfer Learning</article-title>. <source>IEEE Trans. Knowl. Data Eng.</source> <volume>22</volume> (<issue>10</issue>), <fpage>1345</fpage>&#x2013;<lpage>1359</lpage>. </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2012</year>). &#x201c;<article-title>Transfer Learning for Text Mining</article-title>,&#x201d; in <source>Mining Text Data</source> (<publisher-loc>New York</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>223</fpage>&#x2013;<lpage>257</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4614-3223-4_7</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>W. B.</given-names>
</name>
<name>
<surname>Chung</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jung</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sohn</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>S. P.</given-names>
</name>
<name>
<surname>Pyo</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Classification of Crystal Structure Using a Convolutional Neural Network</article-title>. <source>Int. Union Crystallogr. J.</source> <volume>4</volume> (<issue>4</issue>), <fpage>486</fpage>&#x2013;<lpage>494</lpage>. <pub-id pub-id-type="doi">10.1107/s205225251700714x</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Qian</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Learning and Transferring Representations for Image Steganalysis Using Convolutional Neural Network</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE International Conference on Image Processing</conf-name> (<publisher-loc>Phoenix, AZ</publisher-loc>: <publisher-name>ICIP</publisher-name>), <fpage>2752</fpage>&#x2013;<lpage>2756</lpage>. <pub-id pub-id-type="doi">10.1109/icip.2016.7532860</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Quattoni</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Collins</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Darrell</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>Transfer Learning for Image Classification with Sparse Prototype Representations</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/cvpr.2008.4587637</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks</article-title>,&#x201d; in <source>Advances in Neural Information Processing Systems</source>. <volume>28</volume>, <fpage>91</fpage>&#x2013;<lpage>99</lpage>. </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ryan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Lengyel</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shatruk</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Crystal Structure Prediction via Deep Learning</article-title>. <source>J.&#x20;Am. Chem. Soc.</source> <volume>140</volume> (<issue>32</issue>), <fpage>10158</fpage>&#x2013;<lpage>10168</lpage>. <pub-id pub-id-type="doi">10.1021/jacs.8b03913</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Schwarz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Behnke</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>RGB-D Object Recognition and Pose Estimation Based on Pre-trained Convolutional Neural Network Features</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE International Conference on Robotics and Automation</conf-name> (<publisher-loc>Seattle, WA</publisher-loc>: <publisher-name>ICRA</publisher-name>). <pub-id pub-id-type="doi">10.1109/icra.2015.7139363</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sehgal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kehtarnavaz</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps</article-title>. <source>Mach. Learn. Knowl. Extr.</source> <volume>1</volume> (<issue>1</issue>), <fpage>450</fpage>&#x2013;<lpage>465</lpage>. </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Anand Kumar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kushvaha</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Effect of Aspect Ratio on Dynamic Fracture Toughness of Particulate Polymer Composite Using Artificial Neural Network</article-title>. <source>Eng. Fracture Mech.</source> <volume>228</volume>, <fpage>106907</fpage>. <pub-id pub-id-type="doi">10.1016/j.engfracmech.2020.106907</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kushvaha</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Predictive Modelling of Fracture Behaviour in Silica-Filled Polymer Composite Subjected to Impact with Varying Loading Rates Using Artificial Neural Network</article-title>. <source>Eng. Fracture Mech.</source> <volume>239</volume>, <fpage>107328</fpage>. <pub-id pub-id-type="doi">10.1016/j.engfracmech.2020.107328</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>DSOD: Learning Deeply Supervised Object Detectors from Scratch</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/iccv.2017.212</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Simonyan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zisserman</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Very Deep Convolutional Networks for Large-Scale Image Recognition</source>. <comment>arXiv: 1409.1556</comment> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Song</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Xin</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Speech Emotion Recognition Using Transfer Learning</article-title>. <source>IEICE Trans. Inf. Syst.</source> <volume>E97.D</volume> (<issue>9</issue>), <fpage>2530</fpage>&#x2013;<lpage>2532</lpage>. <pub-id pub-id-type="doi">10.1587/transinf.2014edl8038</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Szegedy</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Vanhoucke</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Ioffe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Shlens</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wojna</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Rethinking the Inception Architecture for Computer Vision</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/cvpr.2016.308</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Talo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Baloglu</surname>
<given-names>U. B.</given-names>
</name>
<name>
<surname>Y&#x131;ld&#x131;r&#x131;m</surname>
<given-names>&#xd6;.</given-names>
</name>
<name>
<surname>Rajendra Acharya</surname>
<given-names>U.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Application of Deep Transfer Learning for Automated Brain Abnormality Classification Using MR Images</article-title>. <source>Cogn. Syst. Res.</source> <volume>54</volume>, <fpage>176</fpage>&#x2013;<lpage>188</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogsys.2018.12.007</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>S. Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>P. Z.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>S. Y.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>D. B.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F. K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Computer Vision Based Machine Learning Approach for Fatigue Crack Initiation Sites Recognition</article-title>. <source>Comput. Mater. Sci.</source> <volume>171</volume>, <fpage>109259</fpage>. <pub-id pub-id-type="doi">10.1016/j.commatsci.2019.109259</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weiss</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Khoshgoftaar</surname>
<given-names>T. M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>A Survey of Transfer Learning</article-title>. <source>J.&#x20;Big Data</source> <volume>3</volume> (<issue>1</issue>), <fpage>9</fpage>. <pub-id pub-id-type="doi">10.1186/s40537-016-0043-6</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Grossman</surname>
<given-names>J.&#x20;C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties</article-title>. <source>Phys. Rev. Lett.</source> <volume>120</volume> (<issue>14</issue>), <fpage>145301</fpage>. <pub-id pub-id-type="doi">10.1103/physrevlett.120.145301</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Batra</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Parikh</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <source>A Faster PyTorch Implementation of Faster R-CNN</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/jwyang/faster-rcnn.pytorch">https://github.com/jwyang/faster-rcnn.pytorch</ext-link>
</comment> </citation>
</ref>
<ref id="B47">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <source>PyTorch Implementation of Feature Pyramid Network (FPN) for Object Detection</source>. </citation>
</ref>
<ref id="B48">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Koltun</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Funkhouser</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Dilated Residual Networks</source>. </citation>
</ref>
<ref id="B49">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zeiler</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Fergus</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Visualizing and Understanding Convolutional Networks</article-title>,&#x201d; in <conf-name>European Conference on Computer Vision</conf-name> (<publisher-loc>New York</publisher-loc>: <publisher-name>Springer</publisher-name>). <pub-id pub-id-type="doi">10.1007/978-3-319-10590-1_53</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Deep Learning Based Fossil-Fuel Power Plant Monitoring in High Resolution Remote Sensing Images: A Comparative Study</article-title>. <source>Remote Sensing</source> <volume>11</volume> (<issue>9</issue>), <fpage>1117</fpage>. <pub-id pub-id-type="doi">10.3390/rs11091117</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Spachos</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Towards Image Classification with Machine Learning Methodologies for Smartphones</article-title>. <source>Mach. Learn. Knowl. Extr.</source> <volume>1</volume> (<issue>4</issue>), <fpage>1039</fpage>&#x2013;<lpage>1057</lpage>. <pub-id pub-id-type="doi">10.3390/make1040059</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>G-R.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). &#x201c;<article-title>Heterogeneous Transfer Learning for Image Classification</article-title>,&#x201d; in <conf-name>Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence</conf-name>, <conf-loc>San Francisco, CA</conf-loc>, <conf-date>August 7&#x2013;11, 2011</conf-date>. </citation>
</ref>
<ref id="B53">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ziletti</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Scheffler</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ghiringhelli</surname>
<given-names>L. M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Insightful Classification of Crystal Structures Using Deep Learning</article-title>. <source>Nat. Commun.</source> <volume>9</volume> (<issue>1</issue>), <fpage>2775</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-018-05169-6</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>