<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Energy Res.</journal-id>
<journal-title>Frontiers in Energy Research</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Energy Res.</abbrev-journal-title>
<issn pub-type="epub">2296-598X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">747622</article-id>
<article-id pub-id-type="doi">10.3389/fenrg.2021.747622</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Energy Research</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Intelligent Fault Diagnosis Method of Wind Turbines Planetary Gearboxes Based on a Multi-Scale Dense Fusion Network</article-title>
<alt-title alt-title-type="left-running-head">Huang et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Intelligent Fault Diagnosis for Gearboxes</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Huang</surname>
<given-names>Xinghua</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1420276/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Yuanyuan</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Chai</surname>
<given-names>Yi</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>College of Automation, Chongqing University, <addr-line>Chongqing</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>College of Automation, Chongqing University of Posts and Telecommunications, <addr-line>Chongqing</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1122333/overview">Yolanda Vidal</ext-link>, Universitat Politecnica de Catalunya, Spain</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1019911/overview">Davide Astolfi</ext-link>, University of Perugia, Italy</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1446398/overview">Kang Ding</ext-link>, South China University of Technology, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/910123/overview">Yongbo Li</ext-link>, Northwestern Polytechnical University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Yi Chai, <email>chaiyi@cqu.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Wind Energy, a section of the journal Frontiers in Energy Research</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>29</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>9</volume>
<elocation-id>747622</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>07</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Huang, Li and Chai.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Huang, Li and Chai</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Due to its powerful feature extraction capability, the convolutional neural network (CNN) is increasingly applied to the fault diagnosis of key components of rotating machinery. However, in traditional CNN-based fault diagnosis methods, continuous convolution and pooling operations constantly reduce the feature resolution, which may cause the loss of subtle fault information in the samples. This paper proposes a CNN-based model with an improved structure, the multi-scale dense fusion network (MSDFN), to realize the fault diagnosis of wind turbine planetary gearboxes under complicated working conditions. First, the continuous wavelet transform is applied to preprocess the vibration signals, and the two-dimensional wavelet time-frequency diagrams are used as the network input. Then, a multi-scale feature fusion (MSFF) module and a feature of maximum (FoM) module are used in the feature extraction and classification stages, respectively. Next, the multi-scale features of each network layer are fused to enhance the fault features. Finally, high fault diagnosis accuracy is achieved by extracting a separable fusion result of the fault features. The proposed method achieves an average fault diagnosis accuracy of more than 99% on a planetary gearbox dataset. Comparative experimental results verify the effectiveness of the proposed method and its superiority over several mainstream approaches. An ablation study further confirms that the MSFF and FoM modules each play a positive role in fault diagnosis.</p>
</abstract>
<kwd-group>
<kwd>wind turbines planetary gearbox</kwd>
<kwd>fault diagnosis</kwd>
<kwd>convolutional neural network</kwd>
<kwd>feature fusion</kwd>
<kwd>wavelet transform</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>The planetary gearbox is a key component in the transmission system of wind turbines (WT) (<xref ref-type="bibr" rid="B10">Feng and Liang, 2014</xref>; <xref ref-type="bibr" rid="B39">Wang et&#x20;al., 2019</xref>). Due to the harsh working environment and complex structure, the key gear components in wind turbine planetary gearboxes are prone to damage, which adversely affects the entire transmission system. Since wind turbines are often installed in places with inconvenient transportation (<xref ref-type="bibr" rid="B27">Lu et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B34">Sun et&#x20;al., 2021</xref>), any gear fault in a planetary gearbox may cause long downtime of the corresponding wind turbine and high costs of operation, maintenance, and repair (<xref ref-type="bibr" rid="B1">Cao et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B35">Sun et&#x20;al., 2019</xref>). Over the service life of a wind turbine, maintenance and operation account for about 75% of the total investment (<xref ref-type="bibr" rid="B25">Lin et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B49">Zhu et&#x20;al., 2021</xref>). Monitoring the health status of each planetary gearbox therefore plays an important role in the normal operation of a wind turbine. The gear fault types of wind turbine planetary gearboxes mainly include chipped tooth, missing tooth, cracked tooth, and surface wear (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B24">Liang et&#x20;al., 2020</xref>). At present, most fault diagnosis research on the gears of planetary gearboxes is based on vibration signals (<xref ref-type="bibr" rid="B22">Lei et&#x20;al., 2020</xref>).</p>
<p>Due to non-stationary working conditions and complex structure, it is extremely difficult to establish a general mathematical model for the vibration signals of planetary gearboxes (<xref ref-type="bibr" rid="B11">Feng and Zuo, 2012</xref>). Since these vibration signals have three main characteristics, namely composite signals, the pass-through effect, and nonlinearity, it is difficult to extract fault features directly by observing vibration responses (<xref ref-type="bibr" rid="B23">Li et&#x20;al., 2017</xref>). In addition, traditional signal processing methods struggle to process, in time, the monitoring data covering the massive number of states found in modern industry (<xref ref-type="bibr" rid="B31">Pan et&#x20;al., 2019</xref>). Therefore, deep learning-based methods with powerful feature extraction capabilities are increasingly applied to the fault diagnosis of key components of rotating machinery. They usually involve three main steps: data preprocessing, fault feature extraction, and fault classification (<xref ref-type="bibr" rid="B26">Liu et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B28">Ma et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B24">Liang et&#x20;al., 2020</xref>).</p>
<p>Deep learning-based fault diagnosis methods often preprocess the original vibration signals first. The wavelet transform, which has excellent time-frequency analysis ability for non-stationary vibration signals, is often applied to the fault diagnosis of rotating machinery. For example, an optimized Morlet wavelet transform has been used to process vibration signals to obtain better time- and frequency-domain statistical feature sets (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>). Fault diagnosis can also be achieved with the wavelet packet coefficient matrix of vibration signals (<xref ref-type="bibr" rid="B46">Zhao et&#x20;al., 2017</xref>) or time-frequency images (<xref ref-type="bibr" rid="B24">Liang et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B48">Zhao et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B6">Cheng et&#x20;al., 2021</xref>).</p>
<p>Since deep learning can automatically extract abstract features from raw data (<xref ref-type="bibr" rid="B41">Wen et&#x20;al., 2017</xref>), various deep learning models are often applied to extract and classify fault features. These methods effectively avoid using complex signal processing techniques to calculate feature parameters that express the fault information. Taking vibration signals, statistical feature sets, frequency spectra, or time-frequency spectra as input, DBN (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B21">Kang et&#x20;al., 2020</xref>), LSTM (<xref ref-type="bibr" rid="B1">Cao et&#x20;al., 2019</xref>), SAE (<xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2017</xref>), RNN (<xref ref-type="bibr" rid="B30">Miao et&#x20;al., 2020</xref>), and CNN (<xref ref-type="bibr" rid="B19">Jiang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B36">Wang et&#x20;al., 2020a</xref>) models can obtain relatively high fault diagnosis accuracy.</p>
<p>Considering the powerful feature extraction capabilities of deep learning, this paper proposes a CNN-based fault diagnosis model for wind turbine planetary gearboxes. The proposed model uses a CNN to extract fault features from the time-frequency images of vibration signals and achieve fault classification. However, in actual wind turbine applications, owing to noise interference and the difficulty of determining the impact of faults, the fault information that the vibration signals of wind turbine planetary gearboxes map onto time-frequency images may be extremely subtle, especially for early-stage faults (<xref ref-type="bibr" rid="B40">Wei et&#x20;al., 2019</xref>). As the feature resolution continually decreases when a CNN extracts fault features, some important fault information may be lost, with an adverse impact on diagnosis (<xref ref-type="bibr" rid="B37">Wang et&#x20;al., 2020b</xref>). In traditional CNN methods, each network layer learns only from the features produced by the previous layer; if part of the information in those features is lost, the features learned subsequently are adversely affected. To solve this issue, this paper proposes an intelligent fault diagnosis method based on a multi-scale dense fusion network (MSDFN).</p>
<p>MSDFN consists of two parts, a feature extraction network and a classifier. A dense feature fusion structure (<xref ref-type="bibr" rid="B15">Huang et&#x20;al., 2017</xref>) based on projection and back-projection operators (<xref ref-type="bibr" rid="B16">Irani and Peleg, 1991</xref>; <xref ref-type="bibr" rid="B7">Dai et&#x20;al., 2007</xref>) is used to optimize the traditional CNN, and the output of each layer of the feature extraction network is fused as the input of the classifier. Through the dense fusion of multi-scale features to supplement the fault information, the fault diagnosis of wind turbine planetary gearboxes under complicated working conditions is realized. The main contributions of this paper can be summarized as follows.<list list-type="simple">
<list-item>
<p>1) A CNN variant, MSDFN, is proposed for fault diagnosis of wind turbine planetary gearboxes under complex working conditions. It extracts enhanced fusion results of fault features from the time-frequency images of vibration signals to improve diagnosis accuracy.</p>
</list-item>
<list-item>
<p>2) An MSFF module designed with projection and back-projection operators is embedded in each network layer, and a fault feature enhancement algorithm based on multi-scale feature fusion supplements the missing information of every layer in time. The fused features express fault information more effectively and have stronger separability.</p>
</list-item>
<list-item>
<p>3) An FoM module is designed to fuse the output fault features of each feature extraction layer. Specifically, this module uses adaptive maximum pooling to convert the features of each layer to the same resolution and then concatenates them. This gives the classifier input features with more complete fault information and improves the corresponding diagnosis accuracy.</p>
</list-item>
</list>
</p>
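<p>The paper does not include source code for the FoM module; the following numpy sketch (our illustration, not the authors' implementation) shows the behavior described in contribution 3: each layer's feature map is adaptive-max-pooled to a common resolution and the results are concatenated along the channel axis. The target resolution (4, 4) and the layer shapes are illustrative assumptions.</p>

```python
import numpy as np

def adaptive_max_pool2d(x, out_h, out_w):
    """Max-pool a (C, H, W) feature map down to (C, out_h, out_w).
    Bin boundaries tile the whole map, as in adaptive max pooling."""
    c, h, w = x.shape
    out = np.empty((c, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        h0, h1 = (i * h) // out_h, -(-((i + 1) * h) // out_h)  # floor, ceil
        for j in range(out_w):
            w0, w1 = (j * w) // out_w, -(-((j + 1) * w) // out_w)
            out[:, i, j] = x[:, h0:h1, w0:w1].max(axis=(1, 2))
    return out

def fom_fuse(layer_outputs, target_hw=(4, 4)):
    """Pool every layer's output to a common resolution, then
    concatenate along the channel axis to form the classifier input."""
    pooled = [adaptive_max_pool2d(f, *target_hw) for f in layer_outputs]
    return np.concatenate(pooled, axis=0)

# five feature-extraction layers with halving resolution (illustrative shapes)
feats = [np.random.rand(8 * 2**k, 64 >> k, 64 >> k) for k in range(5)]
fused = fom_fuse(feats)
print(fused.shape)  # (8+16+32+64+128, 4, 4) = (248, 4, 4)
```

Because the pooling bins tile each feature map completely, the global maximum of every layer survives into the fused tensor, which is the sense in which the FoM module preserves the strongest activations of each scale.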
<p>The rest of this paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> reviews the CNN-based fault diagnosis research of key components of rotating machinery in recent years; <xref ref-type="sec" rid="s3">Section 3</xref> describes the proposed fault diagnosis method based on MSDFN; <xref ref-type="sec" rid="s4">Section 4</xref> compares the proposed method with several existing fault diagnosis networks to verify its effectiveness and also conducts the corresponding ablation study to test the performance of each module; and <xref ref-type="sec" rid="s5">Section 5</xref> concludes this&#x20;paper.</p>
</sec>
<sec id="s2">
<title>2 Related Work</title>
<p>In existing deep learning-based fault diagnosis research on rotating machinery, CNN is one of the most commonly used deep learning models. Compared with SAE, DBN, and other models, CNN and its variants, such as the deep residual network (DRN) and the deep convolutional network (DCN), are more convenient to train when vibration signals or their time-frequency features are used as input (<xref ref-type="bibr" rid="B41">Wen et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B46">Zhao et&#x20;al., 2017</xref>), because their local receptive fields and weight-sharing strategy usually require fewer parameters. In recent years, many CNN-based fault diagnosis solutions for rotating machinery have been published.</p>
<p>CNN was applied to the fault diagnosis of rotating machinery with fault features extracted from the spectrum of vibration signals, achieving better performance than classical classifiers such as random forest and SVM (<xref ref-type="bibr" rid="B17">Janssens et&#x20;al., 2016</xref>). A two-dimensional DCN was used to extract fault features from the wavelet packet energy images of vibration signals, and the corresponding method achieved high fault diagnosis accuracy (<xref ref-type="bibr" rid="B8">Ding and He, 2017</xref>). Taking time-frequency information composed of the original vibration signals and their frequency spectra as input, a one-dimensional CNN was applied to the fault diagnosis of planetary gearboxes (<xref ref-type="bibr" rid="B20">Jing et&#x20;al., 2017</xref>). Fault features were extracted from multi-sensor data by a CNN with a multi-input branch structure to realize the fault diagnosis of rotating machinery (<xref ref-type="bibr" rid="B43">Xia et&#x20;al., 2017</xref>). A one-dimensional CNN-based method realized the end-to-end fault diagnosis of rotating machinery (<xref ref-type="bibr" rid="B42">Wu et&#x20;al., 2019</xref>). The discrete wavelet transform was used to obtain the time-frequency matrix of the vibration signals, and CNN was applied to extract the fault features of planetary gearboxes (<xref ref-type="bibr" rid="B4">Chen et&#x20;al., 2019</xref>). Vibration signals were first analyzed with recurrence plots, and then CNN was used to achieve the fault diagnosis of rotating machinery from the obtained recurrence matrices (<xref ref-type="bibr" rid="B37">Wang et&#x20;al., 2020a</xref>). Since the second-order cyclostationary behavior of vibration signals can reveal valuable health information, cyclic spectral coherence (CSCoh) analysis of vibration signals was used to preprocess the original data, which reduced the difficulty of feature learning in the deep diagnosis model and improved diagnosis accuracy (<xref ref-type="bibr" rid="B5">Chen et&#x20;al., 2020</xref>).</p>
<p>The above work focuses on methods for preprocessing raw data and studies the effects of various data processing methods on fault diagnosis. However, such research not only relies on rich knowledge of vibration signal processing but also increases the workload of data processing. The following research shows that improving the network structure is conducive to extracting detailed fault features and achieving high diagnostic accuracy. Zhao proved that the deep residual network (DRN) can efficiently extract the high-level fault features contained in wavelet packet coefficients (<xref ref-type="bibr" rid="B46">Zhao et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B47">Zhao et&#x20;al., 2020b</xref>). On the basis of DRN, a dynamic weighting module was introduced to weight the fault features of different frequency bands in the time-frequency images, which improved diagnosis accuracy (<xref ref-type="bibr" rid="B46">Zhao et&#x20;al., 2017</xref>). A multi-branch CNN structure was used to extract features at different scales and improve the diagnostic ability of the model (<xref ref-type="bibr" rid="B31">Pan et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B32">Peng et&#x20;al., 2020</xref>). MSCNN used multiple CNN branches to process vibration signals at multiple scales, improving the fault diagnosis accuracy of wind turbine planetary gearboxes (<xref ref-type="bibr" rid="B19">Jiang et&#x20;al., 2018</xref>). An SE-Res module (<xref ref-type="bibr" rid="B14">Hu et&#x20;al., 2018</xref>) was added to an ordinary CNN to reduce the interference of redundant information and enhance fault features (<xref ref-type="bibr" rid="B3">Cao et&#x20;al., 2020</xref>). A CNN model using dilated convolutions was applied to enlarge the receptive fields and improve the gear fault diagnosis accuracy of planetary gearboxes (<xref ref-type="bibr" rid="B12">Han et&#x20;al., 2019</xref>). To accurately and automatically identify the health status of rotating machinery, a normalized convolutional neural network was proposed for fault type diagnosis under variable operating conditions (<xref ref-type="bibr" rid="B45">Zhao et&#x20;al., 2020a</xref>). A multi-kernel cascade CNN structure was used instead of a single kernel for fault diagnosis (<xref ref-type="bibr" rid="B37">Wang et&#x20;al., 2020b</xref>). Xu developed a new method, VMD-DCNNs, which integrates convolutional neural networks with the variational mode decomposition (VMD) algorithm (<xref ref-type="bibr" rid="B44">Xu et&#x20;al., 2020</xref>). This method uses a CNN to extract features from each intrinsic mode function (IMF) and directly processes the original vibration signals in an end-to-end manner, without manual experience or intervention, to realize the fault diagnosis of the key components of wind turbines.</p>
<p>All the above methods use a CNN model as the fault feature extractor and classifier and achieve good fault diagnosis performance for the key components of rotating machinery. However, these methods ignore the fact that, when a CNN extracts features, the reduction of feature resolution may cause the loss of partial fault information and even a decrease in fault diagnosis accuracy (<xref ref-type="bibr" rid="B37">Wang et&#x20;al., 2020b</xref>). A dense feature fusion structure makes the input passed to the next network layer come from all previously extracted features, which can effectively supplement the fault information. Therefore, inspired and guided by the above work, this paper focuses on improving the network with a feature fusion structure and develops a multi-scale dense fusion network (MSDFN). The proposed model extracts enhanced fusion results of fault features from the time-frequency images of the vibration signals of wind turbine planetary gearboxes under complicated working conditions, so relatively complete fault information is retained and fault diagnosis accuracy is improved. <xref ref-type="sec" rid="s3">Section 3</xref> specifies the overall structure and working principle of MSDFN.</p>
</sec>
<sec id="s3">
<title>3&#x20;MSDFN-Based Fault Diagnosis Method</title>
<p>The fault diagnosis of wind turbine planetary gearboxes is of great practical importance, but traditional CNN-based fault diagnosis methods may lose fault information. This paper therefore proposes an MSDFN-based intelligent fault diagnosis method for wind turbine planetary gearboxes. <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> shows the diagnosis process. The proposed method uses the continuous wavelet transform to preprocess the original vibration signal data. A multi-scale feature fusion (MSFF) module is embedded into each feature extraction layer, and an FoM module fuses the output of each feature extraction layer to obtain the classifier input. In this way, information loss during feature extraction is reduced, and the corresponding fault diagnosis accuracy is improved. The feature extraction network contains five layers, which can fuse features of at most five scales. A structure that is too shallow results in poor accuracy, while a structure that is too deep significantly increases the amount of computation without improving accuracy; this is verified in <xref ref-type="sec" rid="s4">Section&#x20;4</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>The process of MSDFN-based fault diagnosis.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g001.tif"/>
</fig>
<sec id="s3-1">
<title>3.1 Data Preprocessing</title>
<sec id="s3-1-1">
<title>3.1.1 The Generation of Time-Frequency Images</title>
<p>The wavelet transform is widely used in the vibration signal processing of rotating machinery because of its excellent time-frequency analysis ability for non-stationary signals. The commonly used expression of the wavelet transform is shown in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>.<disp-formula id="e1">
<mml:math id="m1">
<mml:mi>W</mml:mi>
<mml:mi>T</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mo>&#x222b;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi>&#x3c8;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mi>d</mml:mi>
<mml:mi>t</mml:mi>
</mml:math>
<label>(1)</label>
</disp-formula>where <inline-formula id="inf1">
<mml:math id="m2">
<mml:mi>W</mml:mi>
<mml:mi>T</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> is the wavelet coefficient, <inline-formula id="inf2">
<mml:math id="m3">
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> is the input signal, <inline-formula id="inf3">
<mml:math id="m4">
<mml:mi>&#x3c8;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> is the wavelet basis function, <italic>a</italic> is the scale factor that controls the dilation and contraction of the wavelet basis function, and <italic>&#x3c4;</italic> is the translation amount that controls the shift of the wavelet basis function. The scale corresponds to frequency, and the translation amount corresponds to time. As the wavelet function is translated along the time axis at each scale, it is multiplied by the input signal, so the frequency components of the signal in each time period can be obtained.</p>
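<p>Equation 1 can be discretized directly. The following numpy sketch (an illustrative textbook implementation, not the authors' code) evaluates the wavelet coefficients on a grid of scales and translations; the real-valued Morlet-style wavelet used in the demo is an assumption made for illustration.</p>

```python
import numpy as np

def cwt(f, scales, psi, dt=1.0):
    """Discretization of Eq. 1:
    WT(a, tau) = a**-0.5 * sum_t f(t) * psi((t - tau) / a) * dt."""
    t = np.arange(len(f)) * dt
    coeffs = np.empty((len(scales), len(f)))
    for k, a in enumerate(scales):
        for n, tau in enumerate(t):
            coeffs[k, n] = (f * psi((t - tau) / a)).sum() * dt / np.sqrt(a)
    return coeffs

# real Morlet-style wavelet (illustrative choice)
psi = lambda u: np.exp(-u**2 / 2) * np.cos(5 * u)

t = np.arange(256)
signal = np.sin(2 * np.pi * 0.1 * t)   # 0.1 Hz test sinusoid
scales = np.array([2.0, 8.0, 32.0])    # a = 8 gives f = 5/(2*pi*a) ~ 0.1
C = cwt(signal, scales, psi)
energy = np.abs(C).mean(axis=1)        # the matched scale carries the most energy
```

Scanning the scale factor <italic>a</italic> and translation <italic>&#x3c4;</italic> in this way is exactly how the time-frequency image is assembled: each row of <monospace>C</monospace> is one frequency band, each column one time instant.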
<p>The selection of the wavelet basis function is an important step in applying the wavelet transform. Similarity coefficients (<xref ref-type="bibr" rid="B29">Mao et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B24">Liang et&#x20;al., 2020</xref>) are used to select the wavelet basis function quantitatively. The expression of the similarity coefficient is shown in <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>.<disp-formula id="e2">
<mml:math id="m5">
<mml:mi>&#x3b4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(2)</label>
</disp-formula>where <italic>&#x3b4;</italic> is the dimensionless similarity coefficient, <italic>k</italic> is the number of peaks of the absolute value of the wavelet basis function, and <italic>s</italic><sub><italic>i</italic></sub>, <italic>m</italic><sub><italic>i</italic></sub>, and <italic>&#x3b1;</italic><sub><italic>i</italic></sub> are the area, maximum, and weighting coefficient of the <italic>i</italic>th peak, respectively.</p>
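<p>Equation 2 is straightforward to evaluate once the peaks of the absolute value of the wavelet basis function have been located. A minimal numpy sketch follows; the two peak values below are made-up illustrative numbers, not the ones behind Table 1.</p>

```python
import numpy as np

def similarity_coefficient(alpha, m, s):
    """Eq. 2: delta = sum_i alpha_i * m_i**2 / s_i over the k peaks
    of |psi(t)|, with weights alpha_i, maxima m_i, and areas s_i."""
    alpha, m, s = (np.asarray(v, dtype=float) for v in (alpha, m, s))
    return float(np.sum(alpha * m**2 / s))

# two illustrative peaks: unit weights, maxima 2 and 3, areas 4 and 3
delta = similarity_coefficient(alpha=[1.0, 1.0], m=[2.0, 3.0], s=[4.0, 3.0])
print(delta)  # 1*4/4 + 1*9/3 = 4.0
```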
<p>
<xref ref-type="table" rid="T1">Table&#x20;1</xref> lists the similarity coefficients of several wavelets commonly used for fault diagnosis. A larger similarity coefficient indicates that the wavelet is closer to the original signals, which means it captures more fault information and facilitates diagnosis. Therefore, the cmor wavelet defined in <xref ref-type="disp-formula" rid="e3">Eq. 3</xref> is used. The complex Morlet wavelet is a complex sinusoid modulated by a Gaussian envelope, defined by:<disp-formula id="e3">
<mml:math id="m6">
<mml:mi>c</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2061;</mml:mo>
<mml:mi mathvariant="italic">exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>i</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi mathvariant="italic">exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(3)</label>
</disp-formula>where <italic>F</italic>
<sub>
<italic>b</italic>
</sub> is the bandwidth of the wavelet, <italic>F</italic>
<sub>
<italic>c</italic>
</sub> is the central frequency of the wavelet function, and <italic>i</italic> is the imaginary unit. This paper takes <italic>F</italic>
<sub>
<italic>c</italic>
</sub> &#x3d; <italic>F</italic>
<sub>
<italic>b</italic>
</sub> &#x3d;&#x20;3.</p>
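<p>A direct numpy transcription of Eq. 3 with the paper's parameter choice <italic>F</italic><sub><italic>b</italic></sub> &#x3d; <italic>F</italic><sub><italic>c</italic></sub> &#x3d; 3 is given below (this matches the definition used by pywt's <monospace>'cmor3-3'</monospace> continuous wavelet; identifying the two is our reading, not stated in the paper).</p>

```python
import numpy as np

def cmor(x, Fb=3.0, Fc=3.0):
    """Complex Morlet wavelet of Eq. 3: a complex sinusoid at center
    frequency Fc modulated by a Gaussian envelope of bandwidth Fb."""
    x = np.asarray(x, dtype=float)
    return (np.pi * Fb) ** -0.5 * np.exp(2j * np.pi * Fc * x) * np.exp(-x**2 / Fb)

x = np.linspace(-4, 4, 9)
psi = cmor(x)
# the envelope |cmor(x)| peaks at x = 0 and is symmetric about it
```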
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>The similarity of five commonly used wavelet basis functions.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Wavelet basis function</th>
<th align="center">Coif5</th>
<th align="center">Meyr</th>
<th align="center">Morlet</th>
<th align="center">Db10</th>
<th align="center">Cmor3-3</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Similarity coefficient</td>
<td align="char" char=".">6.3298</td>
<td align="char" char=".">6.6082</td>
<td align="char" char=".">7.2953</td>
<td align="char" char=".">7.4970</td>
<td align="char" char=".">40.0507</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3-1-2">
<title>3.1.2&#x20;Time-Frequency Image Preprocessing</title>
<p>Time-frequency images need to be preprocessed before being input into the network to improve training performance. For image datasets, commonly used data augmentation methods include rotation, flipping, and random cropping. The purpose of data augmentation is to diversify the training samples so that the network learns to extract key features from samples undergoing various changes, improving its generalization ability. However, since a vibration signal sequence is time-dependent and can be regarded as a discrete function of time, flipping, rotation, and random cropping disrupt the relationship between the time and frequency features and result in poor training performance. Therefore, the proposed image preprocessing first resizes the input images and then normalizes them to improve training&#x20;speed.</p>
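<p>The resizing and normalization step can be sketched as follows (our illustration; the 224&#xd7;224 target size and the zero-mean, unit-variance normalization are assumptions, since the paper only states that the images are resized with bilinear interpolation and then normalized).</p>

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation from (H, W[, C]) to (out_h, out_w[, C]):
    linear interpolation along both image axes."""
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    if img.ndim == 3:                      # broadcast over channels
        wy = wy[..., None]; wx = wx[..., None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def preprocess(img, size=(224, 224)):
    """Resize the time-frequency image, then normalize to zero mean and
    unit variance. No flips/rotations/crops, which would break the
    time-frequency ordering of the sample."""
    x = bilinear_resize(img.astype(np.float64), *size)
    return (x - x.mean()) / (x.std() + 1e-8)
```

Note that only value scaling and resolution change are applied; the orientation of the time and frequency axes is never altered, in keeping with the argument above.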
<p>Bilinear interpolation is used to convert the resolution of the time-frequency images. It extends linear interpolation to two variables, performing linear interpolation in two directions to obtain the value of the unknown function <italic>f</italic> (<italic>x</italic>, <italic>y</italic>) at the point <italic>P</italic>&#x20;&#x3d; (<italic>x</italic>, <italic>y</italic>). The transformation expression of bilinear interpolation is shown in <xref ref-type="disp-formula" rid="e4">Eq. 4</xref>.<disp-formula id="e4">
<mml:math id="m7">
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(4)</label>
</disp-formula>where <italic>Q</italic>
<sub>11</sub> (<italic>x</italic>
<sub>1</sub>, <italic>y</italic>
<sub>1</sub>), <italic>Q</italic>
<sub>21</sub> (<italic>x</italic>
<sub>2</sub>, <italic>y</italic>
<sub>1</sub>), <italic>Q</italic>
<sub>12</sub> (<italic>x</italic>
<sub>1</sub>, <italic>y</italic>
<sub>2</sub>), and <italic>Q</italic>
<sub>22</sub> (<italic>x</italic>
<sub>2</sub>, <italic>y</italic>
<sub>2</sub>) are the four points closest to the target point <italic>f</italic> (<italic>x</italic>, <italic>y</italic>), and <italic>f</italic> (<italic>Q</italic>
<sub>
<italic>ij</italic>
</sub>) represents the value of the point <italic>Q</italic>
<sub>
<italic>ij</italic>
</sub>, <italic>x</italic>
<sub>
<italic>i,j</italic>
</sub>, and <italic>y</italic>
<sub>
<italic>ij</italic>
</sub> represent the abscissa and ordinate of the point <italic>Q</italic>
<sub>
<italic>ij</italic>
</sub>, respectively.</p>
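A minimal NumPy sketch of the bilinear resizing described by Eq. 4. The function name and the output-to-input grid mapping are illustrative assumptions; production code would normally use a library resizer.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D image with bilinear interpolation over the four
    nearest source pixels Q11, Q21, Q12, Q22 (cf. Eq. 4)."""
    in_h, in_w = img.shape
    # map each output pixel back to a fractional source coordinate
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    q11 = img[np.ix_(y0, x0)]; q21 = img[np.ix_(y0, x1)]
    q12 = img[np.ix_(y1, x0)]; q22 = img[np.ix_(y1, x1)]
    return (q11 * (1 - wy) * (1 - wx) + q21 * (1 - wy) * wx
            + q12 * wy * (1 - wx) + q22 * wy * wx)
```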
<p>Image normalization scales the value of each image pixel to a small specific interval, removing the unit limit of the data and converting it into a dimensionless pure value, which facilitates the comparison and weighting of indicators with different units or magnitudes. The normalization formula used in this paper is shown in <xref ref-type="disp-formula" rid="e5">Eq. 5</xref>.<disp-formula id="e5">
<mml:math id="m8">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(5)</label>
</disp-formula>where the pixel value of a point is converted from <italic>x</italic> to <inline-formula id="inf4">
<mml:math id="m9">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, <italic>x</italic>
<sub>max</sub> and <italic>x</italic>
<sub>min</sub> are the maximum and minimum values of the sample image pixel values, respectively. The values of all points can be converted to the interval of 0&#x2013;1 by normalization, which improves the convergence speed and accuracy of the&#x20;model.</p>
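The min-max normalization of Eq. 5 can be sketched as follows; the zero fallback for constant images is an added assumption, not part of the paper.

```python
import numpy as np

def minmax_normalize(img):
    """Scale pixel values to [0, 1] per Eq. 5; constant images map to zeros
    (an assumed convention to avoid division by zero)."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)
```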
</sec>
</sec>
<sec id="s3-2">
<title>3.2 MSDFN</title>
<p>As shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, the MSDFN consists of two parts: a feature extraction network and a classifier. <xref ref-type="table" rid="T2">Table&#x20;2</xref> shows the specific composition of the MSDFN. In <xref ref-type="table" rid="T2">Table&#x20;2</xref>, &#x201c;Conv&#x201d; refers to a convolutional layer; &#x201c;RG&#x201d; refers to a residual group; &#x201c;AMP&#x201d; refers to adaptive maximum pooling; &#x201c;AGAP&#x201d; refers to adaptive global average pooling; &#x201c;Fc&#x201d; refers to a fully connected layer; &#x201c;numclass&#x201d; refers to the number of fault categories. The convolutional layers of the feature extraction network transform the channels and reduce the feature resolution, the residual groups (RG) perform fault feature extraction, and the MSFF modules perform multi-scale feature fusion. FoM merges the output features of each layer of the feature extraction network again and inputs them into the classifier, constructed from a fully connected layer, to obtain a prediction vector. The residual group shown in <xref ref-type="fig" rid="F2">Figure&#x20;2A</xref> is composed of three residual blocks, as shown in <xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>. The ReLU function is used as the activation function. Although multi-scale feature fusion introduces redundancy, it supplements more complete fault information, so that subsequent network layers can extract more comprehensive fault features, enhancing the extracted fault features and improving diagnosis accuracy.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Detailed structure of MSDFN. &#x201c;Conv&#x201d;, convolutional layer; &#x201c;RG&#x201d;, residual group; &#x201c;AMP&#x201d;, adaptive maximum pooling; &#x201c;AGAP&#x201d;, adaptive global average pooling; &#x201c;Fc&#x201d;, fully connected layer; &#x201c;numclass&#x201d;, the number of fault categories.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Layer name</th>
<th align="center">Structure</th>
<th align="center">Input channels, output channels</th>
<th align="center">Output size (Input size&#x20;&#x3d;&#x20;256)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="2" align="left">Layer1</td>
<td>Conv1</td>
<td align="center">3, 16</td>
<td align="center">256&#x20;&#xd7; 256</td>
</tr>
<tr>
<td>RG1</td>
<td align="center">16, 16</td>
<td align="center">256&#x20;&#xd7; 256</td>
</tr>
<tr>
<td rowspan="3" align="left">Layer2</td>
<td>Conv2</td>
<td align="center">16, 32</td>
<td align="center">128&#x20;&#xd7; 128</td>
</tr>
<tr>
<td>MSFF1</td>
<td align="center">(16,32), 32</td>
<td align="center">128&#x20;&#xd7; 128</td>
</tr>
<tr>
<td>RG2</td>
<td align="center">32, 32</td>
<td align="center">128&#x20;&#xd7; 128</td>
</tr>
<tr>
<td rowspan="3" align="left">Layer3</td>
<td>Conv3</td>
<td align="center">32, 64</td>
<td align="center">64&#x20;&#xd7; 64</td>
</tr>
<tr>
<td>MSFF2</td>
<td align="center">(16,32,64), 64</td>
<td align="center">64&#x20;&#xd7; 64</td>
</tr>
<tr>
<td>RG3</td>
<td align="center">64, 64</td>
<td align="center">64&#x20;&#xd7; 64</td>
</tr>
<tr>
<td rowspan="3" align="left">Layer4</td>
<td>Conv4</td>
<td align="center">64, 128</td>
<td align="center">32&#x20;&#xd7; 32</td>
</tr>
<tr>
<td>MSFF3</td>
<td align="center">(16,32,64,128), 128</td>
<td align="center">32&#x20;&#xd7; 32</td>
</tr>
<tr>
<td>RG4</td>
<td align="center">128, 128</td>
<td align="center">32&#x20;&#xd7; 32</td>
</tr>
<tr>
<td rowspan="2" align="left">Layer5</td>
<td>Conv5</td>
<td align="center">128, 256</td>
<td align="center">16&#x20;&#xd7; 16</td>
</tr>
<tr>
<td>MSFF4</td>
<td align="center">(16,32,64,128,256), 256</td>
<td align="center">16&#x20;&#xd7; 16</td>
</tr>
<tr>
<td align="left">FoM</td>
<td>AMP</td>
<td align="center">(16,32,64,128,256), 496</td>
<td align="center">16&#x20;&#xd7; 16</td>
</tr>
<tr>
<td align="left">GAP</td>
<td>AGAP</td>
<td align="center">496, 496</td>
<td align="center">1&#x20;&#xd7; 1</td>
</tr>
<tr>
<td align="left">Classifier</td>
<td>Fc</td>
<td align="center">&#x2013;</td>
<td align="center">1&#x20;&#xd7; <italic>numclass</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
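The channel counts and output sizes in Table 2 follow a simple halving/doubling rule; the small bookkeeping sketch below reproduces them. The rule itself (each Conv after Layer1 halves the resolution and doubles the channels) is inferred from the table.

```python
def msdfn_shapes(input_size=256, num_layers=5, base_channels=16):
    """Reproduce the per-layer channel counts and output sizes of Table 2,
    assuming each Conv after Layer1 halves resolution and doubles channels."""
    shapes, size, ch = [], input_size, base_channels
    for n in range(1, num_layers + 1):
        shapes.append((f"Layer{n}", ch, size))
        size //= 2   # stride-2 convolution in the next layer
        ch *= 2      # channel expansion in the next layer
    # FoM concatenates every layer's output along the channel dimension
    fom_channels = sum(c for _, c, _ in shapes)
    return shapes, fom_channels
```

With the defaults this yields (16, 256), (32, 128), (64, 64), (128, 32), (256, 16) and a concatenated FoM width of 16 + 32 + 64 + 128 + 256 = 496, matching the table.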
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Structure of residual group. <bold>(A)</bold> Residual group <bold>(B)</bold> Residual&#x20;block.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g002.tif"/>
</fig>
<sec id="s3-2-1">
<title>3.2.1 MSFF Module</title>
<p>Following the feature fusion method proposed in (<xref ref-type="bibr" rid="B9">Dong et&#x20;al., 2020</xref>), the MSFF module is proposed, which applies projection and back-projection operators to the fault diagnosis of planetary gearboxes. As shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, the MSFF module fuses the features of all layers to supplement important time-frequency and fault information. <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> shows the structure of the MSFF module. The MSFF module of the <italic>n</italic>-th network layer is defined in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref>.<disp-formula id="e6">
<mml:math id="m10">
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>where <italic>j</italic>
<sup>
<italic>n</italic>
</sup> is the latent feature of the n-th feature extraction network layer, <inline-formula id="inf5">
<mml:math id="m11">
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is the enhanced features obtained through dense fusion, and <inline-formula id="inf6">
<mml:math id="m12">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> are the enhanced fusion features produced by the <italic>n</italic>&#x20;&#x2212;&#x20;1 MSFF modules preceding this layer in the network. This paper uses the enhanced features <inline-formula id="inf7">
<mml:math id="m13">
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>1,2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> in turn to gradually refine the feature <italic>j</italic>
<sup>
<italic>n</italic>
</sup>. The specific update and improvement process is shown as follows.<list list-type="simple">
<list-item>
<p>1) As shown in <xref ref-type="disp-formula" rid="e7">Eq. 7</xref>, the difference <inline-formula id="inf8">
<mml:math id="m14">
<mml:msubsup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> between the enhanced feature <inline-formula id="inf9">
<mml:math id="m15">
<mml:msubsup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> of the <italic>t</italic>-th iteration and the <italic>t</italic>-th enhanced feature <inline-formula id="inf10">
<mml:math id="m16">
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is calculated.</p>
</list-item>
</list>
<disp-formula id="e7">
<mml:math id="m17">
<mml:msubsup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
<label>(7)</label>
</disp-formula>where <inline-formula id="inf11">
<mml:math id="m18">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> represents the back projection operator, which upsamples the promoted feature <inline-formula id="inf12">
<mml:math id="m19">
<mml:msubsup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> to the same dimension as <inline-formula id="inf13">
<mml:math id="m20">
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>.<list list-type="simple">
<list-item>
<p>2) The enhanced feature <inline-formula id="inf14">
<mml:math id="m21">
<mml:msubsup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is updated through back projection of the difference calculated by <xref ref-type="disp-formula" rid="e7">Eq. 7</xref>, as shown in <xref ref-type="disp-formula" rid="e8">Eq.&#x20;8</xref>.</p>
</list-item>
</list>
<disp-formula id="e8">
<mml:math id="m22">
<mml:msubsup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
<label>(8)</label>
</disp-formula>where <inline-formula id="inf15">
<mml:math id="m23">
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is the projection operator, which downsamples the difference <inline-formula id="inf16">
<mml:math id="m24">
<mml:msubsup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> in the <italic>t</italic>-th iteration to the same dimension as <inline-formula id="inf17">
<mml:math id="m25">
<mml:msubsup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>.<list list-type="simple">
<list-item>
<p>3) After iterating over all previous enhanced features, the enhanced feature <inline-formula id="inf18">
<mml:math id="m26">
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is finally obtained.</p>
</list-item>
</list>
</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>The MSFF module.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g003.tif"/>
</fig>
<p>Unlike traditional back-projection techniques, the sampling operators <inline-formula id="inf19">
<mml:math id="m27">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf20">
<mml:math id="m28">
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> in the network are unknown. The proposed method uses strided convolutions (deconvolution layers) to learn the downsampling (upsampling) operators in an end-to-end manner. To avoid introducing too many parameters, this paper uses a stride of 2 and stacks <italic>n</italic>&#x20;&#x2212;&#x20;<italic>t</italic> convolutional and deconvolutional layers to achieve the downsampling and upsampling learned in <inline-formula id="inf21">
<mml:math id="m29">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and&#x20;<inline-formula id="inf22">
<mml:math id="m30">
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>.</p>
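A minimal sketch of the iteration in Eqs. 7, 8, with the learned operators replaced by fixed stand-ins (nearest-neighbour upsampling for the back projection q and average pooling for the projection p) and channels omitted. This only illustrates the data flow; in the actual module both operators are learned strided-conv/deconv layers.

```python
import numpy as np

def up_to(x, size):
    """Stand-in for the learned back projection q: nearest-neighbour upsampling."""
    f = size // x.shape[0]
    return np.kron(x, np.ones((f, f)))

def down_to(x, size):
    """Stand-in for the learned projection p: average pooling."""
    f = x.shape[0] // size
    return x.reshape(size, f, size, f).mean(axis=(1, 3))

def msff(j_n, earlier):
    """Refine the (S, S) latent feature j_n of layer n against the earlier
    enhanced features j~_1 ... j~_{n-1} (each a larger square array)."""
    j = j_n
    for jt in earlier:                       # t = 1, ..., n-1
        e = up_to(j, jt.shape[0]) - jt       # Eq. 7: residual at scale t
        j = down_to(e, j.shape[0]) + j       # Eq. 8: project back and update
    return j
```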
<p>In summary, the multi-scale feature fusion (MSFF) algorithm used to enhance fault features is described in <xref ref-type="statement" rid="Algorithm_1">
<bold>Algorithm&#x20;1</bold>
</xref>.</p>
<p>
<statement content-type="algorithm" id="Algorithm_1">
<label>Algorithm 1</label>
<p>Fault feature enhancement algorithm based on multi-scale feature fusion</p>
<p>
<inline-graphic xlink:href="fenrg-09-747622-fx1.tif"/>
</p>
</statement>
</p>
</sec>
<sec id="s3-2-2">
<title>3.2.2 FoM</title>
<p>The MSFF module achieves feature fusion during feature extraction, but the information remains incomplete when the network uses the extracted features to classify faults. Therefore, the FoM module is placed before the classifier; its definition, <italic>fom</italic>, is shown in <xref ref-type="disp-formula" rid="e9">Eq. 9</xref>. As shown in <xref ref-type="fig" rid="F4">Figure&#x20;4A</xref>, the FoM module converts the output features of each network layer to a uniform size and concatenates them along the channel dimension to fuse the predicted features.<disp-formula id="e9">
<mml:math id="m42">
<mml:mi>f</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2026;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2026;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
<disp-formula id="e10">
<mml:math id="m43">
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>m</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>H</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>where <italic>cat</italic> represents the concatenation of each input feature on the channel dimensions, and <italic>amp</italic>
<sub>
<italic>W</italic>&#xd7;<italic>H</italic>
</sub> represents the adaptive maximum pooling operation with an output size of <italic>W</italic>&#x20;&#xd7; <italic>H</italic>. As shown in <xref ref-type="fig" rid="F4">Figure&#x20;4B</xref>, the features of each scale are converted into feature maps with the same resolution but different channel numbers to represent the diagnosis result obtained by each layer. <inline-formula id="inf34">
<mml:math id="m44">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2026;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> are the output features of the first to <italic>L</italic>-th (<italic>L</italic>&#x20;&#x3d; 5 in this paper) network layers. The FoM module performs a second fusion of the fault features so that the feature vectors used for classification do not degrade the diagnosis result through insufficient or missing information.</p>
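Eqs. 9, 10 can be sketched with NumPy as follows. Divisible spatial sizes are assumed for the adaptive pooling (true for the 256/128/64/32/16 maps of Table 2); the real layer handles arbitrary sizes.

```python
import numpy as np

def adaptive_max_pool(x, out_size):
    """amp_{W x H} of Eq. 10 for one (C, H, W) feature map,
    assuming H and W are divisible by out_size."""
    c, h, w = x.shape
    fh, fw = h // out_size, w // out_size
    return x.reshape(c, out_size, fh, out_size, fw).max(axis=(2, 4))

def fom(features, out_size=16):
    """Eq. 9: pool every layer's output feature y_i to out_size x out_size,
    then concatenate along the channel dimension."""
    return np.concatenate([adaptive_max_pool(y, out_size) for y in features],
                          axis=0)
```

Applied to the five layer outputs of Table 2, the result has 16 + 32 + 64 + 128 + 256 = 496 channels at 16 × 16 resolution, matching the FoM row of the table.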
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>The FoM module and Adaptive Maxpool. <bold>(A)</bold> the FoM Module <bold>(B)</bold> the Process of Adaptive Maxpooling.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g004.tif"/>
</fig>
</sec>
</sec>
</sec>
<sec id="s4">
<title>4 Experiment</title>
<p>To verify the effectiveness of the proposed MSDFN-based wind turbine planetary gearbox fault diagnosis method and the performance of the MSFF and FoM feature fusion modules, comparative and ablation experiments were carried out on the planetary gearbox dataset of the State Key Laboratory of Mechanical Transmission of Chongqing University (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>) and the gearbox dataset of the University of Connecticut (<xref ref-type="bibr" rid="B2">Cao et&#x20;al., 2018</xref>). The aim is to use the multi-stage gearboxes of the experimental platforms to simulate wind turbine gearboxes for fault diagnosis research.</p>
<sec id="s4-1">
<title>4.1 An Introduction of the Dataset</title>
<p>The vibration signal datasets of the planetary gearboxes used in this research come from the University of Connecticut gearbox data (<xref ref-type="bibr" rid="B2">Cao et&#x20;al., 2018</xref>) and actual experimental data from the State Key Laboratory of Mechanical Transmission of Chongqing University (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>). The two datasets are described below.</p>
<p>1) Chongqing University gearbox dataset. As shown in <xref ref-type="fig" rid="F5">Figure&#x20;5A</xref>, the fault diagnosis experiment platform of planetary gearboxes is mainly composed of a motor, a planetary gearbox, a parallel shaft gearbox, and an electromagnetic brake. A multi-axis accelerometer is installed on the housing directly above the second-stage sun gear of the planetary gearbox to collect the original vibration signals, and the rotation frequency of the second-stage sun gear is set to 4.17&#xa0;Hz. As shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>, the types of gear faults in the experiments include 1) surface wear of gear teeth, 2) cracked gear teeth, 3) chipped gear teeth, and 4) missing gear teeth. For the collection of vibration signals, the sampling frequency is set to 5,120&#xa0;Hz and each sampling run lasts 200&#xa0;s, so each fault type includes 1,024,000 sampling points under each load condition. In actual engineering, working conditions are often unmeasured and prone to fluctuation. To examine the fault diagnosis performance of network models on planetary gearboxes under different load conditions, four load conditions of 0&#xa0;N&#xb7;m, 1.4&#xa0;N&#xb7;m, 2.8&#xa0;N&#xb7;m, and 25.2&#xa0;N&#xb7;m are set for each type of sun gear fault. The collected data are processed by high-pass filtering to remove some low-frequency noise interference in the vibration signals. All the experimental data used in this paper are the time sequences of the filtered vibration signals (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>).</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Fault diagnosis experiment platform of planetary gearboxes. <bold>(A)</bold> Chongqing University gearbox (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>) <bold>(B)</bold> University of Connecticut gearbox (<xref ref-type="bibr" rid="B2">Cao et&#x20;al., 2018</xref>).</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g005.tif"/>
</fig>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Types of gear faults (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>). <bold>(A)</bold> Surface wear of gear teeth <bold>(B)</bold> Cracked gear teeth <bold>(C)</bold> Chipped gear teeth <bold>(D)</bold> Missing gear&#x20;teeth.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g006.tif"/>
</fig>
<p>2) University of Connecticut gearbox dataset. The experimental data are collected from a benchmark two-stage gearbox with replaceable gears, as shown in <xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>. The gear speed is controlled by a motor, and the torque is supplied by a magnetic brake that can be adjusted by changing its input voltage. A 32-tooth pinion and an 80-tooth gear are installed on the first-stage input shaft; the second stage consists of a 48-tooth pinion and a 64-tooth gear. The input shaft speed is measured by a tachometer, and the gear vibration signals are measured by an accelerometer. The signals are recorded through a dSPACE system (DS1006 processor board, dSPACE Inc.) with a sampling frequency of 20&#xa0;kHz. Nine different gear conditions are introduced to the pinion on the input shaft, including the healthy condition, missing tooth, root crack, spalling, and chipping tip with five different levels of severity (<xref ref-type="bibr" rid="B2">Cao et&#x20;al., 2018</xref>). This dataset contains 104 samples per class, and each sample contains 3,600 points.</p>
<p>The Chongqing University gearbox dataset contains the usual types of gear failures and covers a wide range of working conditions, which better simulates the variable load conditions of wind turbine gearboxes (<xref ref-type="bibr" rid="B1">Cao et&#x20;al., 2019</xref>). At the same time, the gearbox data shared by the team of Professor Jiong Tang at the University of Connecticut were selected to further verify the effectiveness of the proposed method. This dataset contains nine types of failures, which better tests the model&#x2019;s ability to distinguish samples of the same type but with different failure levels. The experimental platform configurations of these two datasets and the structures of the gearboxes are reasonable, as has been demonstrated in related wind turbine gearbox fault diagnosis research (<xref ref-type="bibr" rid="B19">Jiang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B27">Lu et&#x20;al., 2020</xref>).</p>
</sec>
<sec id="s4-2">
<title>4.2 Data Preprocessing</title>
<p>According to the method proposed in <xref ref-type="sec" rid="s3">Section 3</xref>, the original vibration signals are preprocessed to obtain the dataset of time-frequency images. 1) Chongqing University gearbox dataset: since the rotation frequency of the faulty gears is 4.17 Hz, the data collected in one second contain the vibration information of multiple rotation periods, so each sample is composed of one second of data. In this way, 200&#x20;time-frequency images are obtained for each fault type. The four load conditions are marked as L1, L2, L3, and L4, and each of them has 1,000 image samples. 2) University of Connecticut gearbox data: according to the dataset structure and sampling frequency, 104 samples are obtained for each class.</p>
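The segmentation and image-generation steps above can be sketched as follows. This is a minimal illustration, not the exact pipeline of Section 3: the windowed-FFT magnitude spectrogram stands in for the paper's time-frequency transform, and the sampling rate, frame length, and hop size are illustrative assumptions.

```python
import numpy as np

def segment_signal(signal, samples_per_segment):
    """Split a 1-D vibration signal into equal-length, non-overlapping segments
    (one segment per second when samples_per_segment equals the sampling rate)."""
    n = len(signal) // samples_per_segment
    return signal[:n * samples_per_segment].reshape(n, samples_per_segment)

def time_frequency_image(segment, frame_len=256, hop=64):
    """Magnitude spectrogram from windowed FFT frames; rows are frequency bins,
    columns are time frames (a stand-in for the transform of Section 3)."""
    window = np.hanning(frame_len)
    frames = [segment[i:i + frame_len] * window
              for i in range(0, len(segment) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1)).T

# Example: a simulated 10 s signal at a hypothetical 3,600 Hz sampling rate
# yields 10 one-second samples, each converted to one time-frequency image.
fs = 3600
signal = np.sin(2 * np.pi * 50 * np.arange(10 * fs) / fs)
segments = segment_signal(signal, fs)
image = time_frequency_image(segments[0])
```

In the actual experiments each image is further resized to 256 × 256 before being fed to the network.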
<p>
<xref ref-type="table" rid="T3">Table&#x20;3</xref> shows the average accuracy of MSDFN for different ratios of training and testing sets from the Chongqing University gearbox dataset. The accuracy neither changes significantly nor shows an obvious monotonic trend; it only fluctuates slightly as the ratio changes. Considering the number of samples, and in order to obtain a higher diagnostic accuracy, all image samples are randomly divided into the training and testing sets at a ratio of 7:3. The data collected under different load conditions are combined to obtain a mixed dataset. The size of the obtained time-frequency images is 256&#x20;&#xd7; 256. <xref ref-type="table" rid="T4">Tables 4</xref>, <xref ref-type="table" rid="T5">5</xref> show the details of the obtained datasets.</p>
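The 7:3 random split described above amounts to a generic shuffle-and-slice partition; a minimal sketch follows (the fixed seed is only for reproducibility of the sketch and is an assumption, not a detail from the paper).

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=0):
    """Randomly divide image samples into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]

# 1,000 samples per load condition split 7:3, matching Table 4 (700 / 300).
train_set, test_set = split_dataset(range(1000))
```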
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Accuracy under different ratios of training and testing&#x20;sets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Train set: Test set</th>
<th align="center">5: 5</th>
<th align="center">6: 4</th>
<th align="center">7: 3</th>
<th align="center">8: 2</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Train set</td>
<td align="char" char=".">2,000</td>
<td align="char" char=".">2,400</td>
<td align="char" char=".">2,800</td>
<td align="char" char=".">3,200</td>
</tr>
<tr>
<td align="left">Test set</td>
<td align="char" char=".">2,000</td>
<td align="char" char=".">1,600</td>
<td align="char" char=".">1,200</td>
<td align="char" char=".">800</td>
</tr>
<tr>
<td align="left">Accuracy</td>
<td align="char" char=".">99.73%</td>
<td align="char" char=".">99.72%</td>
<td align="char" char=".">99.75%</td>
<td align="char" char=".">99.74%</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Time-frequency image datasets of Chongqing University gearboxes.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Chipped</th>
<th align="center">Crack</th>
<th align="center">Missing</th>
<th align="center">Normal</th>
<th align="center">Surface wear</th>
<th align="center">Load condition</th>
<th align="center">Train set</th>
<th align="center">Test set</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Label</td>
<td align="char" char=".">0</td>
<td align="char" char=".">1</td>
<td align="char" char=".">2</td>
<td align="char" char=".">3</td>
<td align="char" char=".">4</td>
<td align="left"/>
<td align="left"/>
<td align="left"/>
</tr>
<tr>
<td align="left">DatasetA</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="center">L1</td>
<td align="char" char=".">700</td>
<td align="char" char=".">300</td>
</tr>
<tr>
<td align="left">DatasetB</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="center">L2</td>
<td align="char" char=".">700</td>
<td align="char" char=".">300</td>
</tr>
<tr>
<td align="left">DatasetC</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="center">L3</td>
<td align="char" char=".">700</td>
<td align="char" char=".">300</td>
</tr>
<tr>
<td align="left">DatasetD</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="center">L4</td>
<td align="char" char=".">700</td>
<td align="char" char=".">300</td>
</tr>
<tr>
<td align="left">DatasetE</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="char" char=".">200</td>
<td align="center">L1, L2, L3, L4</td>
<td align="char" char=".">700</td>
<td align="char" char=".">300</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Time-frequency images dataset of University of Connecticut gearbox.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Chipping1a</th>
<th align="center">Chipping2a</th>
<th align="center">Chipping3a</th>
<th align="center">Chipping4a</th>
<th align="center">Chipping5a</th>
<th align="center">Crack</th>
<th align="center">Healthy</th>
<th align="center">Missing</th>
<th align="center">Spall</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Label</td>
<td align="char" char=".">0</td>
<td align="char" char=".">1</td>
<td align="char" char=".">2</td>
<td align="char" char=".">3</td>
<td align="char" char=".">4</td>
<td align="char" char=".">5</td>
<td align="char" char=".">6</td>
<td align="char" char=".">7</td>
<td align="char" char=".">8</td>
</tr>
<tr>
<td align="left">Dataset F</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
<td align="char" char=".">104</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-3">
<title>4.3 Model Training</title>
<sec id="s4-3-1">
<title>4.3.1 Hyperparameter Setting</title>
<p>In the training of deep learning networks, the hyperparameter settings have a considerable impact on performance. In the experiments, the batch size of the network input (the number of samples fed to the network at each step) affects the testing accuracy of the model. If the batch size is too large, the model is difficult to fit or fits poorly; if it is too small, the model is difficult to converge and the accuracy oscillates. Considering the number of samples and the testing results, a batch size of 16 is chosen.</p>
<p>The learning rate is another important hyperparameter; it determines the step size of the network parameter updates. If the learning rate is too small, training is inefficient and too much time is spent on it. If it is too large, the loss oscillates, the model can be trapped in a local optimum, and training fails to converge. A dynamic adjustment mechanism is therefore adopted: the initial learning rate is 0.001, and the decay coefficient is 0.9, i.e., the learning rate is multiplied by 0.9 every 10 training epochs. This ensures a fast training speed in the initial stage; when the convergence of the model slows down, the learning rate is reduced so that the parameters gradually approach their optimal values.</p>
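The step-decay schedule just described (initial rate 0.001, multiplied by 0.9 every 10 epochs) can be written in one line; in PyTorch the equivalent built-in is `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.9)`.

```python
def learning_rate(epoch, lr0=0.001, gamma=0.9, step=10):
    """Step decay: multiply the initial rate by gamma once every `step` epochs."""
    return lr0 * gamma ** (epoch // step)

# Epochs 0-9 use 0.001, epochs 10-19 use 0.0009, epochs 20-29 use 0.00081, ...
```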
</sec>
<sec id="s4-3-2">
<title>4.3.2 Other Training Details</title>
<p>All experiments are implemented on a computer with an RTX 3090 GPU, 16&#xa0;GB RAM, and an Intel i7-10700 CPU. The network model is implemented, trained, and tested with the PyTorch 1.7.0 deep learning framework. MATLAB 2018a is used to divide the original vibration signals into sample sequences and to generate the time-frequency image samples.</p>
<p>This paper uses the softmax cross-entropy loss function. The corresponding loss calculation formula for a single sample is shown in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref>.<disp-formula id="e11">
<mml:math id="m45">
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="italic">log</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="italic">log</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(11)</label>
</disp-formula>where <italic>c</italic> is the label (the category index), the vector <italic>x</italic> is the network output (the prediction score of each category), and <italic>x</italic><sub><italic>c</italic></sub> is the <italic>c</italic>-th element of <italic>x</italic>. As the network trains, the softmax probability of the true class, exp(<italic>x</italic><sub><italic>c</italic></sub>)/&#x2211;<sub><italic>j</italic></sub>exp(<italic>x</italic><sub><italic>j</italic></sub>), approaches 1, so the loss approaches 0. The loss of each batch is shown in <xref ref-type="disp-formula" rid="e12">Eq. 12</xref>.<disp-formula id="e12">
<mml:math id="m46">
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
<p>Here <italic>x</italic><sub><italic>i</italic></sub> and <italic>C</italic><sub><italic>i</italic></sub> are the network output and the label of the <italic>i</italic>-th sample, respectively, and <italic>N</italic> is the batch size. The loss of each batch is the average loss over its samples.</p>
<p>In order to speed up convergence and avoid falling into local optima, the momentum-driven stochastic gradient descent (Momentum-SGD) algorithm is used to optimize the network model; the momentum parameter is set to&#x20;0.9.</p>
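The Momentum-SGD update can be sketched in a few lines. This is a generic formulation with the paper's momentum of 0.9; the toy quadratic objective and the learning rate of 0.01 are illustrative assumptions, not the paper's training setup.

```python
def momentum_sgd_step(w, v, grad, lr=0.01, momentum=0.9):
    """One update: the velocity v accumulates past gradients, which smooths
    the descent direction and helps carry the iterate past shallow minima."""
    v = momentum * v + grad
    return w - lr * v, v

# Minimize the toy objective f(w) = w**2 (gradient 2w) starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(300):
    w, v = momentum_sgd_step(w, v, 2.0 * w)
```

In PyTorch this corresponds to `torch.optim.SGD(params, lr=..., momentum=0.9)`.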
</sec>
</sec>
<sec id="s4-4">
<title>4.4 Experimental Results and Analysis</title>
<p>The depth of the feature extraction network in MSDFN affects feature fusion and the accuracy of fault diagnosis; therefore, the best depth is selected through experiments. <xref ref-type="table" rid="T6">Table&#x20;6</xref> shows the diagnostic accuracy of MSDFN with feature extraction networks of different depths on Dataset E. If the structure is too shallow, the diagnostic accuracy decreases; if it is too deep, the accuracy does not improve further, while each added layer multiplies the number of model parameters several times. Therefore, the feature extraction network is set to five layers.</p>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Diagnostic accuracy of MSDFN with feature extraction networks of different depths.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Depth of the feature extraction network <italic>L</italic>
</th>
<th align="center">3</th>
<th align="center">4</th>
<th align="center">5</th>
<th align="center">6</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Accuracy</td>
<td align="char" char=".">96.53%</td>
<td align="char" char=".">98.12%</td>
<td align="char" char=".">99.75%</td>
<td align="char" char=".">99.72%</td>
</tr>
<tr>
<td align="left">Parameters</td>
<td align="char" char=".">1.53M</td>
<td align="char" char=".">6.15M</td>
<td align="char" char=".">24.66M</td>
<td align="char" char=".">98.68M</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec id="s4-4-1">
<title>4.4.1 Performance Comparison of Different Network Models</title>
<p>In order to verify the effectiveness of the proposed method, the proposed MSDFN is compared with DWWC &#x2b; DRN (<xref ref-type="bibr" rid="B46">Zhao et&#x20;al., 2017</xref>), WT-CNN (<xref ref-type="bibr" rid="B24">Liang et&#x20;al., 2020</xref>), and three traditional image classification networks: ResNet18, DenseNet201, and VGG11. The comparative experiments are explained as follows.</p>
<p>Since the innovations of the proposed method focus on the fault diagnosis network, only the networks proposed by the existing methods are used in the comparative experiments; the data preprocessing methods of the original literature are not reproduced, and all networks are tested on the same dataset of time-frequency images. On this basis, this paper selects the networks proposed in <xref ref-type="bibr" rid="B46">Zhao et&#x20;al. (2017)</xref> (the original literature uses the wavelet packet coefficient matrix) and <xref ref-type="bibr" rid="B24">Liang et&#x20;al. (2020)</xref> (the original literature uses time-frequency images) for comparison.</p>
<p>In addition, several traditional image classification networks are included in the performance comparison. DBN (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>) is the source of the dataset used in this article, and its listed results are taken from the original literature. Since the original DBN structure is not convenient for processing time-frequency images and its training method differs considerably from the others, this paper does not retrain DBN; all its comparative results come from the original literature (<xref ref-type="bibr" rid="B38">Wang et&#x20;al., 2018</xref>). <xref ref-type="table" rid="T7">Table&#x20;7</xref> shows the accuracy of fault diagnosis on each dataset. Compared with DBN and CNNs using traditional structures, the residual network and MSDFN have obvious advantages in accuracy. Compared with the residual network, the proposed network uses fewer network layers while achieving a further improvement in fault diagnosis accuracy. The accuracy of each network on Dataset F (the University of Connecticut dataset) confirms the effectiveness of the proposed gearbox fault diagnosis method. In addition, all networks perform better on Dataset F, which reflects the different complexity of the dataset distributions. <xref ref-type="fig" rid="F7">Figure&#x20;7</xref> shows the accuracies of several CNN-based networks over 10 repeated experiments. MSDFN achieves a more stable and higher accuracy on randomly partitioned datasets, which indicates stronger robustness. <xref ref-type="fig" rid="F8">Figures 8A,B</xref> respectively compare the accuracy and loss curves of each network on Dataset E during training. The convergence speed of MSDFN is close to that of ResNet, and its accuracy fluctuation is&#x20;small.</p>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Accuracy of different methods on different datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th colspan="2" align="left">Network</th>
<th align="center">Dataset A (%)</th>
<th align="center">Dataset B (%)</th>
<th align="center">Dataset C (%)</th>
<th align="center">Dataset D (%)</th>
<th align="center">Dataset E (%)</th>
<th align="center">Dataset F (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="2" align="left">DenseNet121, <xref ref-type="bibr" rid="B15">Huang et&#x20;al. (2017)</xref>
</td>
<td>Train</td>
<td align="char" char=".">99.80</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">99.80</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td>Test</td>
<td align="char" char=".">98.40</td>
<td align="char" char=".">98.60</td>
<td align="char" char=".">98.60</td>
<td align="char" char=".">98.60</td>
<td align="char" char=".">98.35</td>
<td align="char" char=".">99.65</td>
</tr>
<tr>
<td rowspan="2" align="left">VGG11, <xref ref-type="bibr" rid="B33">Simonyan and Zisserman (2017)</xref>
</td>
<td>Train</td>
<td align="char" char=".">96.80</td>
<td align="char" char=".">98.80</td>
<td align="char" char=".">98.60</td>
<td align="char" char=".">95.40</td>
<td align="char" char=".">99.70</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td>Test</td>
<td align="char" char=".">96.60</td>
<td align="char" char=".">97.60</td>
<td align="char" char=".">97.00</td>
<td align="char" char=".">93.60</td>
<td align="char" char=".">97.05</td>
<td align="char" char=".">99.54</td>
</tr>
<tr>
<td rowspan="2" align="left">ResNet18, <xref ref-type="bibr" rid="B13">He et&#x20;al. (2016)</xref>
</td>
<td>Train</td>
<td align="char" char=".">99.80</td>
<td align="char" char=".">99.80</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td>Test</td>
<td align="char" char=".">99.20</td>
<td align="char" char=".">98.82</td>
<td align="char" char=".">99.20</td>
<td align="char" char=".">99.60</td>
<td align="char" char=".">99.50</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td rowspan="2" align="left">DBN, <xref ref-type="bibr" rid="B38">Wang et&#x20;al. (2018)</xref>
</td>
<td>Train</td>
<td align="char" char=".">99.92</td>
<td align="char" char=".">99.94</td>
<td align="char" char=".">99.33</td>
<td align="char" char=".">98.28</td>
<td align="char" char=".">98.22</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td>Test</td>
<td align="char" char=".">99.52</td>
<td align="char" char=".">98.82</td>
<td align="char" char=".">98.65</td>
<td align="char" char=".">95.82</td>
<td align="char" char=".">95.72</td>
<td align="char" char=".">98.92</td>
</tr>
<tr>
<td rowspan="2" align="left">WT-CNN, <xref ref-type="bibr" rid="B24">Liang et&#x20;al. (2020)</xref>
</td>
<td>Train</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">99.90</td>
<td align="char" char=".">99.90</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td>Test</td>
<td align="char" char=".">99.40</td>
<td align="char" char=".">98.20</td>
<td align="char" char=".">98.40</td>
<td align="char" char=".">98.80</td>
<td align="char" char=".">97.60</td>
<td align="char" char=".">99.23</td>
</tr>
<tr>
<td rowspan="2" align="left">DWWC &#x2b; DRN, <xref ref-type="bibr" rid="B46">Zhao et&#x20;al. (2017)</xref>
</td>
<td>Train</td>
<td align="char" char=".">99.40</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">99.90</td>
<td align="char" char=".">99.90</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td>Test</td>
<td align="char" char=".">98.80</td>
<td align="char" char=".">98.20</td>
<td align="char" char=".">98.60</td>
<td align="char" char=".">99.20</td>
<td align="char" char=".">98.90</td>
<td align="char" char=".">99.85</td>
</tr>
<tr>
<td rowspan="2" align="left">MSDFN</td>
<td>Train</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
<td align="char" char=".">100.00</td>
</tr>
<tr>
<td>Test</td>
<td align="char" char=".">98.62</td>
<td align="char" char=".">99.27</td>
<td align="char" char=".">99.03</td>
<td align="char" char=".">99.35</td>
<td align="char" char=".">99.75</td>
<td align="char" char=".">100.00</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Boxplots of the accuracies of different networks.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g007.tif"/>
</fig>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Learning results of different networks. <bold>(A)</bold> Accuracy curves on test Dataset E <bold>(B)</bold> Loss curves on training Dataset E.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g008.tif"/>
</fig>
</sec>
<sec id="s4-4-2">
<title>4.4.2 The Role of Dense Feature Fusion</title>
<p>In this paper, the feature fusion module is first removed, and an ablation study is then performed against the complete model to verify the impact of the proposed multi-scale dense fusion mechanism on diagnosis performance. The models with different degrees of ablation are shown in <xref ref-type="table" rid="T8">Table&#x20;8</xref>. After removing all parts of the feature fusion structure, the diagnosis network shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> is used as the backbone, which is mainly composed of a convolutional layer, a residual block, and a fully connected layer. The complete network D is MSDFN. The diagnostic accuracy of each ablated model on each dataset is shown in <xref ref-type="table" rid="T9">Table&#x20;9</xref>. It can be seen that the feature fusion module has a positive effect on the accuracy of fault diagnosis. The diagnostic accuracy of the ablated models does not fluctuate greatly under different working conditions, but it is still affected by them, and different models show different diagnostic performance on different datasets. As the load increases, the diagnostic accuracy of models B, C, and D also improves, whereas model A does not follow this rule, which reflects that the feature fusion module increases the model&#x2019;s sensitivity to changes in load conditions.</p>
<table-wrap id="T8" position="float">
<label>TABLE 8</label>
<caption>
<p>The ablation networks.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Network</th>
<th align="center">A</th>
<th align="center">B</th>
<th align="center">C</th>
<th align="center">D</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Backbone</td>
<td align="center">
<italic>&#x2713;</italic>
</td>
<td align="center">
<italic>&#x2713;</italic>
</td>
<td align="center">
<italic>&#x2713;</italic>
</td>
<td align="center">
<italic>&#x2713;</italic>
</td>
</tr>
<tr>
<td align="left">MSFF</td>
<td align="left"/>
<td align="center">
<italic>&#x2713;</italic>
</td>
<td align="left"/>
<td align="center">
<italic>&#x2713;</italic>
</td>
</tr>
<tr>
<td align="left">FoM</td>
<td align="left"/>
<td align="left"/>
<td align="center">
<italic>&#x2713;</italic>
</td>
<td align="center">
<italic>&#x2713;</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T9" position="float">
<label>TABLE 9</label>
<caption>
<p>Ablation experiment results.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Network</th>
<th align="center">A (%)</th>
<th align="center">B (%)</th>
<th align="center">C (%)</th>
<th align="center">D (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Dataset A</td>
<td align="char" char=".">92.67</td>
<td align="char" char=".">95.00</td>
<td align="char" char=".">94.00</td>
<td align="char" char=".">98.62</td>
</tr>
<tr>
<td align="left">Dataset B</td>
<td align="char" char=".">92.67</td>
<td align="char" char=".">95.30</td>
<td align="char" char=".">95.00</td>
<td align="char" char=".">99.27</td>
</tr>
<tr>
<td align="left">Dataset C</td>
<td align="char" char=".">92.33</td>
<td align="char" char=".">97.33</td>
<td align="char" char=".">95.00</td>
<td align="char" char=".">99.03</td>
</tr>
<tr>
<td align="left">Dataset D</td>
<td align="char" char=".">92.00</td>
<td align="char" char=".">97.00</td>
<td align="char" char=".">95.33</td>
<td align="char" char=".">99.35</td>
</tr>
<tr>
<td align="left">Dataset E</td>
<td align="char" char=".">92.58</td>
<td align="char" char=".">97.08</td>
<td align="char" char=".">96.00</td>
<td align="char" char=".">99.75</td>
</tr>
<tr>
<td align="left">Dataset F</td>
<td align="char" char=".">96.88</td>
<td align="char" char=".">99.65</td>
<td align="char" char=".">98.96</td>
<td align="char" char=".">100.00</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<xref ref-type="fig" rid="F9">Figure&#x20;9</xref> shows the confusion matrices of the diagnosis results obtained by each ablated network model on the testing set of Dataset E, which contains 1,200 images. After adding the feature fusion structure to the network, the accuracy is significantly improved. Moreover, because dense feature fusion occurs at every network layer, the MSFF module plays a more important role than the FoM module. Although the projection and back-projection operations of each layer increase the amount of computation, the fault diagnosis accuracy is effectively improved. The three images incorrectly diagnosed by the complete network D belong to different categories and are predicted as three different categories, which indicates that MSDFN has no obvious bias in its diagnosis ability across fault types; such a bias does exist in networks A, B, and C. The integration of the two feature fusion modules, MSFF and FoM, effectively mitigates the unstable recognition of the various fault types. Then, 240 images are randomly selected from the testing set of Dataset E to compose a dataset. For each network with a different degree of ablation, <xref ref-type="fig" rid="F10">Figure&#x20;10</xref> shows the distribution of the output features after t-SNE dimensionality reduction. The feature distribution intuitively shows the separability of the output features, so the fault feature extraction performance of each ablated network can be evaluated.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Confusion matrix of the diagnosis accuracy obtained by different networks.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g009.tif"/>
</fig>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Visualization results, obtained by t-SNE, of the outputs of different layers of each network.</p>
</caption>
<graphic xlink:href="fenrg-09-747622-g010.tif"/>
</fig>
<p>The output features of the first two feature extraction layers of each network model fall into two parts, a large one and a small one, because the separability of the features is still low at this stage. Since testing is carried out on a mixed dataset, there are obvious distribution differences among the data of the mixed working conditions. This mainly reflects the loads L1 (0&#xa0;N&#xb7;m), L2 (1.4&#xa0;N&#xb7;m), L3 (2.8&#xa0;N&#xb7;m), and L4 (25.2&#xa0;N&#xb7;m): presumably because L4 differs greatly from the other three loads, the two parts in the scatter diagram show an approximate ratio of 3:1. The performance of the two feature fusion modules on fault diagnosis can be analyzed by comparing the feature distributions of networks A, B, C, and D. As the network depth increases, the differences caused by the working conditions gradually decrease, so the features of the same fault type gradually become concentrated. The ablated networks show different recognition capabilities: comparing the third-layer outputs of the networks, the introduction of the MSFF module enhances the ability to distinguish features. Judging from the outputs of each layer of each model, network D achieves the best performance.</p>
<p>The fault recognition capability of each network can be assessed by observing the output of the FoM module in each model. Compared with the other models, the output features of the FoM module in model A (the backbone network) contain many errors, and the features of the different fault categories lie relatively close together. As the complete network, model D produces FoM output features with high separability and achieves 100% accuracy on the set of 240 random samples.</p>
</sec>
</sec>
</sec>
<sec id="s5">
<title>5 Conclusion</title>
<p>Monitoring the health status of each wind turbine planetary gearbox is of great significance for reducing the operation and maintenance costs of wind turbines. To mitigate the loss of fault information during diagnosis and to improve diagnostic ability, this paper proposes an intelligent fault diagnosis method for wind turbine planetary gearboxes based on MSDFN. The MSFF and FoM modules perform feature fusion in the feature extraction and fault classification stages, respectively, which effectively mitigates the loss of fault information caused by the successive convolution and pooling operations of a CNN. The proposed method is compared with two mainstream methods and three traditional image classification networks to verify its effectiveness. Compared with traditional CNN-based networks, MSDFN uses feature fusion twice to improve the accuracy of fault diagnosis under both single and mixed loads, achieving more than 99.5% accuracy in repeated experiments. The ablation study verifies that feature fusion is conducive to the gear fault diagnosis of planetary gearboxes: on the mixed dataset, the MSFF and FoM modules increase the accuracy by 4.5% and 3.42%, respectively. To maintain high fault diagnosis accuracy under large changes in working conditions, future work will focus on using transfer learning to improve the model&#x2019;s adaptability to working conditions.</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The data analyzed in this study is subject to the following licenses/restrictions: The dataset is provided by the author of paper &#x201c;<ext-link ext-link-type="uri" xlink:href="https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs169538">https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs169538</ext-link>&#x201d;. Requests to access these datasets should be directed to <ext-link ext-link-type="uri" xlink:href="https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs169538">https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs169538</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>Conceptualization, YC and XH; Methodology, YL and XH; Software, XH; Writing XH and&#x20;YC.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This work is jointly supported by the National Natural Science Foundation of China under Grant No. 61906026; Innovation research group of universities in Chongqing; Special key project of Chongqing technology innovation and application development: cstc2019jscx-zdztzx0068; The Chongqing Natural Science Foundation under Grant cstc2020jcyj-msxmX0577, cstc2020jcyj-msxmX0634; &#x201c;Chengdu-Chongqing Economic Circle&#x201d; innovation funding of Chongqing Municipal Education Commission KJCXZD2020028.</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zareipour</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Fault Diagnosis of Wind Turbine Gearbox Based on Deep Bi-directional Long Short-Term Memory under Time-Varying Non-stationary Operating Conditions</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>155219</fpage>&#x2013;<lpage>155228</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2947501</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Preprocessing-free Gear Fault Diagnosis Using Small Datasets with Deep Convolutional Neural Network-Based Transfer Learning</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>26241</fpage>&#x2013;<lpage>26253</lpage>. <pub-id pub-id-type="doi">10.1109/access.2018.2837621</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Deep Domain Adaption Model with Multi-Task Networks for Planetary Gearbox Fault Diagnosis</article-title>. <source>Neurocomputing</source> <volume>409</volume>, <fpage>173</fpage>&#x2013;<lpage>190</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2020.05.064</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Intelligent Fault Diagnosis Method of Planetary Gearboxes Based on Convolution Neural Network and Discrete Wavelet Transform</article-title>. <source>Comput. Industry</source> <volume>106</volume>, <fpage>48</fpage>&#x2013;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2018.11.003</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Mauricio</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Gryllias</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Deep Learning Method for Bearing Fault Diagnosis Based on Cyclic Spectral Coherence and Convolutional Neural Networks</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>140</volume>, <fpage>106683</fpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2020.106683</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Shao</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Intelligent Fault Diagnosis of Rotating Machinery Based on Continuous Wavelet Transform-Local Binary Convolutional Neural Network</article-title>. <source>Knowledge-Based Syst.</source> <volume>216</volume>, <fpage>106796</fpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2021.106796</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Dai</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2007</year>). &#x201c;<article-title>Bilateral Back-Projection for Single Image Super Resolution</article-title>,&#x201d; in <conf-name>2007 IEEE International Conference on Multimedia and Expo</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1039</fpage>&#x2013;<lpage>1042</lpage>. <pub-id pub-id-type="doi">10.1109/icme.2007.4284831</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Energy-fluctuated Multiscale Feature Learning with Deep Convnet for Intelligent Spindle Bearing Fault Diagnosis</article-title>. <source>IEEE Trans. Instrum. Meas.</source> <volume>66</volume>, <fpage>1926</fpage>&#x2013;<lpage>1935</lpage>. <pub-id pub-id-type="doi">10.1109/tim.2017.2674738</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Dong</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xiang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). &#x201c;<article-title>Multi-scale Boosted Dehazing Network with Dense Feature Fusion</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>2157</fpage>&#x2013;<lpage>2167</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr42600.2020.00223</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Complex Signal Analysis for Wind Turbine Planetary Gearbox Fault Diagnosis via Iterative Atomic Decomposition Thresholding</article-title>. <source>J.&#x20;Sound Vibration</source> <volume>333</volume>, <fpage>5196</fpage>&#x2013;<lpage>5211</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsv.2014.05.029</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>M. J.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Vibration Signal Models for Fault Diagnosis of Planetary Gearboxes</article-title>. <source>J.&#x20;Sound Vibration</source> <volume>331</volume>, <fpage>4919</fpage>&#x2013;<lpage>4939</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsv.2012.05.039</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>An Enhanced Convolutional Neural Network with Enlarged Receptive fields for Fault Diagnosis of Planetary Gearboxes</article-title>. <source>Comput. Industry</source> <volume>107</volume>, <fpage>50</fpage>&#x2013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2019.01.012</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep Residual Learning for Image Recognition</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>770</fpage>&#x2013;<lpage>778</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr.2016.90</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Squeeze-and-excitation Networks</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>7132</fpage>&#x2013;<lpage>7141</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr.2018.00745</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Van Der Maaten</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Weinberger</surname>
<given-names>K. Q.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Densely Connected Convolutional Networks</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>4700</fpage>&#x2013;<lpage>4708</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr.2017.243</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Irani</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Peleg</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>1991</year>). <article-title>Improving Resolution by Image Registration</article-title>. <source>CVGIP: Graphical Models Image Process.</source> <volume>53</volume>, <fpage>231</fpage>&#x2013;<lpage>239</lpage>. <pub-id pub-id-type="doi">10.1016/1049-9652(91)90045-l</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Janssens</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Slavkovikj</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Vervisch</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Stockman</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Loccufier</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Verstockt</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Convolutional Neural Network Based Fault Detection for Rotating Machinery</article-title>. <source>J.&#x20;Sound Vibration</source> <volume>377</volume>, <fpage>331</fpage>&#x2013;<lpage>345</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsv.2016.05.027</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Stacked Multilevel-Denoising Autoencoders: A New Representation Learning Approach for Wind Turbine Gearbox Fault Diagnosis</article-title>. <source>IEEE Trans. Instrum. Meas.</source> <volume>66</volume>, <fpage>2391</fpage>&#x2013;<lpage>2402</lpage>. <pub-id pub-id-type="doi">10.1109/tim.2017.2698738</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Multiscale Convolutional Neural Networks for Fault Diagnosis of Wind Turbine Gearbox</article-title>. <source>IEEE Trans. Ind. Elect.</source> <volume>66</volume> (<issue>4</issue>), <fpage>3196</fpage>&#x2013;<lpage>3207</lpage>. <pub-id pub-id-type="doi">10.1109/TIE.2018.2844805</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jing</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A Convolutional Neural Network Based Feature Learning and Fault Diagnosis Method for the Condition Monitoring of Gearbox</article-title>. <source>Measurement</source> <volume>111</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.measurement.2017.07.017</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Na</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Mikulovich</surname>
<given-names>V. I.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Method of State Identification of Rolling Bearings Based on Deep Domain Adaptation under Varying Loads</article-title>. <source>IET Sci. Meas. Techn.</source> <volume>14</volume>, <fpage>303</fpage>&#x2013;<lpage>313</lpage>. <pub-id pub-id-type="doi">10.1049/iet-smt.2019.0043</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Nandi</surname>
<given-names>A. K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Applications of Machine Learning to Machine Fault Diagnosis: A Review and Roadmap</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>138</volume>, <fpage>106587</fpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2019.106587</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>F.-g.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>P.-m.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A New Method for Feature Extraction of Vibration Signals of Planetary Gearboxes</article-title>. <source>Noise and Vibration Control</source>. </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Intelligent Fault Diagnosis of Rotating Machinery via Wavelet Transform, Generative Adversarial Nets and Convolutional Neural Network</article-title>. <source>Measurement</source> <volume>159</volume>, <fpage>107768</fpage>. <pub-id pub-id-type="doi">10.1016/j.measurement.2020.107768</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Coordinated Pitch &#x26; Torque Control of Large-Scale Wind Turbine Based on Pareto Efficiency Analysis</article-title>. <source>Energy</source> <volume>147</volume>, <fpage>812</fpage>&#x2013;<lpage>825</lpage>. <pub-id pub-id-type="doi">10.1016/j.energy.2018.01.055</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zio</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Artificial Intelligence for Fault Diagnosis of Rotating Machinery: A Review</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>108</volume>, <fpage>33</fpage>&#x2013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2018.02.016</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Wind Turbine Planetary Gearbox Condition Monitoring Method Based on Wireless Sensor and Deep Learning Approach</article-title>. <source>IEEE Trans. Instrum. Meas.</source> <volume>70</volume>, <fpage>1</fpage>&#x2013;<lpage>16</lpage>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Deep Residual Learning with Demodulated Time-Frequency Features for Fault Diagnosis of Planetary Gearbox under Nonstationary Running Conditions</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>127</volume>, <fpage>190</fpage>&#x2013;<lpage>201</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2019.02.055</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>M. J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Predicting Remaining Useful Life of Rolling Bearings Based on Deep Feature Representation and Transfer Learning</article-title>. <source>IEEE Trans. Instrum. Meas.</source> <volume>69</volume> (<issue>4</issue>), <fpage>1594</fpage>&#x2013;<lpage>1608</lpage>. <pub-id pub-id-type="doi">10.1109/TIM.2019.2917735</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>An</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Novel Real-Time Fault Diagnosis Method for Planetary Gearbox Using Transferable Hidden Layer</article-title>. <source>IEEE Sensors J.</source> <volume>20</volume>, <fpage>8403</fpage>&#x2013;<lpage>8412</lpage>. <pub-id pub-id-type="doi">10.1109/jsen.2020.2965988</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Novel Deep Learning Network via Multiscale Inner Product with Locally Connected Feature Extraction for Intelligent Fault Detection</article-title>. <source>IEEE Trans. Ind. Inf.</source> <volume>15</volume>, <fpage>5119</fpage>&#x2013;<lpage>5128</lpage>. <pub-id pub-id-type="doi">10.1109/tii.2019.2896665</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peng</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Multibranch and Multiscale CNN for Fault Diagnosis of Wheelset Bearings under Strong Noise and Variable Load Condition</article-title>. <source>IEEE Trans. Ind. Inf.</source> <volume>16</volume>, <fpage>4949</fpage>&#x2013;<lpage>4960</lpage>. <pub-id pub-id-type="doi">10.1109/tii.2020.2967557</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Simonyan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zisserman</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>. <comment>arXiv preprint arXiv:1409.1556</comment> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Mazur</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Structural Scheduling of Transient Control under Energy Storage Systems by Sparse-Promoting Reinforcement Learning</article-title>. <source>IEEE Trans. Ind. Inf.</source> <volume>18</volume> (<issue>2</issue>), <fpage>744</fpage>&#x2013;<lpage>756</lpage>. <pub-id pub-id-type="doi">10.1109/TII.2021.3084139</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Chai</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>An Integrated Critic-Actor Neural Network for Reinforcement Learning with Application of Ders Control in Grid Frequency Regulation</article-title>. <source>Int. J.&#x20;Electr. Power Energ. Syst.</source> <volume>111</volume>, <fpage>286</fpage>&#x2013;<lpage>299</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijepes.2019.04.011</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>D.-F.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Na</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Litak</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>Planetary-gearbox Fault Classification by Convolutional Neural Network and Recurrence Plot</article-title>. <source>Appl. Sci.</source> <volume>10</volume>, <fpage>932</fpage>. <pub-id pub-id-type="doi">10.3390/app10030932</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>Cascade Convolutional Neural Network with Progressive Optimization for Motor Fault Diagnosis under Nonstationary Conditions</article-title>. <source>IEEE Trans. Ind. Inf.</source> <volume>17</volume> (<issue>4</issue>), <fpage>2511</fpage>&#x2013;<lpage>2521</lpage>. <pub-id pub-id-type="doi">10.1109/TII.2020.3003353</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>An Intelligent Fault Diagnosis Approach for Planetary Gearboxes Based on Deep Belief Networks and Uniformed Features</article-title>. <source>J.&#x20;Intell. Fuzzy Syst.</source> <volume>34</volume>, <fpage>3619</fpage>&#x2013;<lpage>3634</lpage>. <pub-id pub-id-type="doi">10.3233/jifs-169538</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.-r.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>G.-d.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>C.-f.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Planetary Gearbox Fault Feature Learning Using Conditional Variational Neural Networks under Noise Environment</article-title>. <source>Knowledge-Based Syst.</source> <volume>163</volume>, <fpage>438</fpage>&#x2013;<lpage>449</lpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2018.09.005</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Review of Early Fault Diagnosis Approaches and Their Applications in Rotating Machinery</article-title>. <source>Entropy</source> <volume>21</volume>, <fpage>409</fpage>. <pub-id pub-id-type="doi">10.3390/e21040409</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A New Convolutional Neural Network-Based Data-Driven Fault Diagnosis Method</article-title>. <source>IEEE Trans. Ind. Elect.</source> <volume>65</volume> (<issue>7</issue>), <fpage>5990</fpage>&#x2013;<lpage>5998</lpage>. <pub-id pub-id-type="doi">10.1109/TIE.2017.2774777</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Intelligent Fault Diagnosis of Rotating Machinery Based on One-Dimensional Convolutional Neural Network</article-title>. <source>Comput. Industry</source> <volume>108</volume>, <fpage>53</fpage>&#x2013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2018.12.001</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xia</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>De Silva</surname>
<given-names>C. W.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Fault Diagnosis for Rotating Machinery Using Multiple Sensors and Convolutional Neural Networks</article-title>. <source>IEEE/ASME Trans. Mechatronics</source> <volume>23</volume> (<issue>1</issue>), <fpage>101</fpage>&#x2013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1109/TMECH.2017.2728371</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Fault Diagnosis of Rolling Bearing of Wind Turbines Based on the Variational Mode Decomposition and Deep Convolutional Neural Networks</article-title>. <source>Appl. Soft Comput.</source> <volume>95</volume>, <fpage>106515</fpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2020.106515</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>Intelligent Fault Diagnosis of Rolling Bearings Based on Normalized CNN Considering Data Imbalance and Variable Working Conditions</article-title>. <source>Knowledge-Based Syst.</source> <volume>199</volume>, <fpage>105971</fpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2020.105971</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Pecht</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Deep Residual Networks with Dynamically Weighted Wavelet Coefficients for Fault Diagnosis of Planetary Gearboxes</article-title>. <source>IEEE Trans. Ind. Elect.</source> <volume>65 (5)</volume>, <fpage>4290</fpage>&#x2013;<lpage>4300</lpage>. <pub-id pub-id-type="doi">10.1109/TIE.2017.2762639</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pecht</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis</article-title>. <source>IEEE Trans. Ind. Elect.</source> <volume>68 (3)</volume>, <fpage>2587</fpage>&#x2013;<lpage>2597</lpage>. <pub-id pub-id-type="doi">10.1109/TIE.2020.2972458</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2020c</year>). <article-title>Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study</article-title>. <source>ISA Trans.</source> <volume>107</volume>, <fpage>224</fpage>&#x2013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.1016/j.isatra.2020.08.010</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jie</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Mazur</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Power System Structure Optimization Based on Reinforcement Learning and Sparse Constraints under DoS Attacks in Cloud Environments</article-title>. <source>Simulation Model. Pract. Theor.</source> <volume>110</volume>, <fpage>102272</fpage>. <pub-id pub-id-type="doi">10.1016/j.simpat.2021.102272</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>