<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">578613</article-id>
<article-id pub-id-type="doi">10.3389/frai.2020.578613</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Prognostics and Health Management of Industrial Assets: Current Progress and Road Ahead</article-title>
<alt-title alt-title-type="left-running-head">Biggio and Kastanis</alt-title>
<alt-title alt-title-type="right-running-head">PHM: Progress and Road Ahead</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Biggio</surname>
<given-names>Luca</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">
<sup>&#x2a;</sup>
</xref>
<uri xlink:href="http://loop.frontiersin.org/people/913886/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kastanis</surname>
<given-names>Iason</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Data Analytics Lab, Institute of Machine Learning, Department of Computer Science, ETHZ: Eidgen&#xf6;ssische Technische Hochschule Z&#xfc;rich, <addr-line>Zurich</addr-line>, <country>Switzerland</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Robotics and Automation, CSEM SA: Swiss Center for Electronics and Microtechnology S.A., <addr-line>Alpnach</addr-line>, <country>Switzerland</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/719219/overview">Dimitris Kiritsis</ext-link>, &#xc9;cole Polytechnique F&#xe9;d&#xe9;rale de Lausanne, Switzerland</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/632708/overview">Saeed Tabar</ext-link>, Ball State University, United States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/630649/overview">Mehmet Ergun</ext-link>, Istanbul &#x15e;ehir University, Turkey</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Luca Biggio, <email>lbiggio@student.ethz.ch</email>
</corresp>
<fn>
<p>
<bold>Specialty section:</bold> This article was submitted to AI in Business, a section of the journal Frontiers in Artificial Intelligence</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>11</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>3</volume>
<elocation-id>578613</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>06</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>09</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2020 Biggio and Kastanis</copyright-statement>
<copyright-holder>Biggio and Kastanis</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Prognostic and Health Management (PHM) systems are some of the main protagonists of the Industry 4.0 revolution. Efficiently detecting whether an industrial component has deviated from its normal operating condition or predicting when a fault will occur are the main challenges these systems aim at addressing. Efficient PHM methods promise to decrease the probability of extreme failure events, thus improving the safety level of industrial machines. Furthermore, they could potentially drastically reduce the often conspicuous costs associated with scheduled maintenance operations. The increasing availability of data and the stunning progress of Machine Learning (ML) and Deep Learning (DL) techniques over the last decade represent two strong motivating factors for the development of data-driven PHM systems. On the other hand, the black-box nature of DL models significantly hinders their level of interpretability, <italic>de facto</italic> limiting their application to real-world scenarios. In this work, we explore the intersection of Artificial Intelligence (AI) methods and PHM applications. We present a thorough review of existing works both in the contexts of fault diagnosis and fault prognosis, highlighting the benefits and the drawbacks introduced by the adoption of AI techniques. Our goal is to highlight potentially fruitful research directions along with characterizing the main challenges that need to be addressed in order to realize the promises of AI-based PHM systems.</p>
</abstract>
<kwd-group>
<kwd>prognostic and health management</kwd>
<kwd>predictive maintenance</kwd>
<kwd>industry 4.0</kwd>
<kwd>artificial intelligence</kwd>
<kwd>machine learning</kwd>
<kwd>deep learning</kwd>
</kwd-group>
<counts>
<page-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<label>1</label>
<title>Introduction</title>
<p>Sustaining the constant growth of modern industrial markets requires optimizing operational efficiency and minimizing superfluous costs. A substantial part of these costs often derives from the maintenance of industrial assets.</p>
<p>Recent studies<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref> show that, for the average factory, inefficient maintenance policies are responsible for costs ranging from 5 to 20% of the plant&#x2019;s entire productive capacity. Furthermore, according to the International Society of Automation (ISA)<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref>, the overall burden of unplanned downtime on industrial manufacturers across all industry segments is estimated to touch the impressive figure of $647&#xa0;billion per year.</p>
<p>If, on one hand, the above considerations highlight the fundamental impact of maintenance operations on manufacturers&#x2019; balances, on the other hand a large number of companies are still not satisfied with their maintenance strategies. According to a recent trend study gathering interviews with more than 230 senior European business leaders<xref ref-type="fn" rid="FN3">
<sup>3</sup>
</xref>, roughly 93% of them deem their maintenance policy inefficient.</p>
<p>As discussed later, the current most popular approaches to maintenance are divided into two categories, namely reactive maintenance and scheduled maintenance. Roughly speaking, the first implements maintenance operations immediately after a system failure occurs, whereas the second is based on scheduling maintenance operations at regular time intervals. These strategies naturally introduce significant extra costs due to machine downtime, component replacement or unnecessary maintenance interventions.</p>
<p>On the other hand, Predictive Maintenance (PM) represents a completely different paradigm that holds the promise of overcoming the inefficiencies of the aforementioned methods. PM is one of the hallmarks of the so-called Industry 4.0 revolution, i.e., the process of modernization of the industrial world induced by the advent of the digitalization era. The goal of PM systems is to implement a smarter and more dynamic approach to maintenance by leveraging recent advances in sensor engineering and data analysis. The health state of a machine is now constantly monitored by a network of sensors, and future maintenance operations are based on the analysis of the resulting data. An increasing number of organizations, motivated by their need to reduce costs and by the potential of PM, are starting to invest significant amounts of resources in the modernization of their current maintenance strategies<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref>.</p>
<p>A natural question at this point is to what extent PM solutions can actually improve a company&#x2019;s efficiency in terms of downtime reduction, cost savings and safety. A recent PWC study<xref ref-type="fn" rid="FN4">
<sup>4</sup>
</xref> investigates the actual potential of PM beyond the hype generated around it in the last few years. The results are quite impressive: 95% of the interviewed organizations claim that the adoption of PM strategies contributed to the improvement of several key performance indicators. Roughly 60% of the involved companies report average improvements of more than 9% in machine uptime, along with further gains in terms of cost savings, reduced health risks and extended asset lifetimes.</p>
<p>As mentioned above, as a key player in the fourth industrial revolution, PM exploits some of the most recent advances introduced in the last few years in computer science and information engineering. Among them, ML is arguably one of the technologies experiencing the most impressive growth in terms of investments and interest from the private sector. This increasing attention to AI technologies is mainly due to the tremendous contributions they have made over the last decade to fields such as Computer Vision (CV), Natural Language Processing (NLP) and Speech Recognition.</p>
<p>PM approaches are heavily based on ML techniques. The increasing availability of relatively cheap sensors has made it much easier to collect large amounts of data, which are in turn the main ingredient that ML systems require.</p>
<p>However, AI-based technologies should not be considered as a &#x201c;silver bullet&#x201d; capable of immediately addressing all the issues affecting current maintenance strategies. ML and DL, in particular, are constantly evolving fields and, despite their significant achievements, a number of drawbacks still limit their wide application to real-world scenarios. It is, therefore, necessary to be cautious and try to understand the limitations of current AI approaches in the context of PM and drive further research toward the resolution or the alleviation of these shortcomings.</p>
<p>The goal of this manuscript is to provide an updated critical review of the main AI techniques currently used in the context of PM. Specifically, we focus on highlighting the benefits introduced by modern DL techniques along with the challenges that these systems are not yet able to solve. Furthermore, we present a number of relatively unexplored solutions to these open problems based on some of the most recent advances proposed in the AI community in the last few years.</p>
<p>This manuscript is structured as follows: <xref ref-type="sec" rid="s2">Section 2</xref> briefly describes classic maintenance strategies and introduces the core ideas from Prognostic and Health Management (PHM). <xref ref-type="sec" rid="s3">Section 3</xref> discusses the benefits of data-driven approaches and presents some of the most popular AI-based methods used in PHM. <xref ref-type="sec" rid="s4">Section 4</xref> summarizes the main open challenges in PHM and presents some of their possible solutions. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Elements of Prognostic and Health Management</title>
<p>Prognostic and Health Management (PHM) is an engineering field whose goal is to provide users with a thorough analysis of the health condition of a machine and its components (<xref ref-type="bibr" rid="B92">Lee et al., 2014</xref>). To this end, PHM employs tools from data science, statistics and physics in order to detect a potential fault in the system (anomaly detection), classify it according to its specific type (diagnosis) and forecast how long the machine will be able to work in the presence of this fault (prognosis) (<xref ref-type="bibr" rid="B83">Kadry, 2012</xref>).</p>
<p>First, we present the most popular maintenance approaches, highlighting the advantages and disadvantages of the different methods in terms of costs and overall machine downtime. Then, we outline the entire PHM process, describing the role of its main sub-components in the context of the previously introduced maintenance approaches.</p>
<sec id="s2-1">
<label>2.1</label>
<title>Different Approaches to Maintenance</title>
<p>The choice of an efficient maintenance strategy is crucial for reducing costs and minimizing the overall machine&#x2019;s downtime. The adoption of a particular maintenance strategy primarily depends on the needs and the characteristics of the company&#x2019;s production line. Indeed, each maintenance policy introduces some benefits and disadvantages directly impacting costs in different modalities. In this review, we identify four distinct approaches to maintenance, namely: Reactive Maintenance (RM), Scheduled Maintenance (SM), Condition-Based Maintenance (CBM), and Predictive Maintenance (PM) (<xref ref-type="bibr" rid="B41">Fink, 2020</xref>).</p>
<sec id="s2-1-1">
<label>2.1.1</label>
<title>Reactive Maintenance</title>
<p>RM consists of repairing or substituting a machine component only once it fails and can no longer operate. The immediate advantage of this approach is that the amount of maintenance manpower and the expenses related to keeping machines running are minimized (<xref ref-type="bibr" rid="B179">Swanson, 2001</xref>). Furthermore, since machines are active until they break, their utilization time is maximized. On the other hand, this approach is risky from many perspectives. First and foremost, it is potentially dangerous from the point of view of safety: waiting for a machine to reach its maximum stress level can result in catastrophic failures. Moreover, such failures usually introduce larger costs and require a significant amount of time to repair. Therefore, by adopting this maintenance strategy, one might expect conspicuous costs arising both from repairs of severe failures and from relatively long unplanned machine downtimes.</p>
</sec>
<sec id="s2-1-2">
<label>2.1.2</label>
<title>Scheduled Maintenance</title>
<p>SM is based on maintenance interventions carried out at regular time intervals. The goal is to minimize the probability of failures, and thus avoid costly unplanned downtime, by performing maintenance activities even when the machine is still operating under normal conditions. SM strongly relies on a meaningful schedule that has to be tailored to the specific properties of the equipment. In particular, experts have to provide a detailed evaluation of the failure behavior of the machines and of their components in order to maximize the accuracy of the prediction of the next failure time. This analysis typically results in the so-called &#x201c;bathtub&#x201d; curves (<xref ref-type="bibr" rid="B132">Mobley, 2002</xref>), as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>The bathtub curve shows that the most likely times for a machine to break are right after installation and after its normal operating time.</p>
</caption>
<graphic xlink:href="frai-03-578613-g001.tif"/>
</fig>
<p>The bathtub curve illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref> shows that a machine component presents a high risk of failure right after it is installed (because of installation errors or incompatibility issues with other components) and after its normal operation interval (because of natural degradation and wear-out). Between these two phases, the machine is supposed to work properly and its failure probability is low and constant.</p>
<p>The main advantage of SM is that it significantly reduces unplanned downtime. Furthermore, repair costs are generally less dramatic than those encountered in RM, since machines are no longer allowed to operate until their breaking point. On the other hand, an SM approach presents the concrete risk of carrying out several relatively expensive maintenance interventions even when the equipment is still working properly. Sticking to a fixed degradation model of a certain machine might lead maintenance operators to miss anomalies caused by external factors or internal malfunctions that make the machine&#x2019;s degradation pattern deviate from its predicted trend.</p>
</sec>
<sec id="s2-1-3">
<label>2.1.3</label>
<title>Condition Based and Predictive Maintenance</title>
<p>CBM and PM differ from the maintenance strategies previously described in that they employ data-driven techniques to assist technicians in efficiently setting times for maintenance activities. The goal of these methods is to provide a good compromise between maintenance frequency and its relative costs (<xref ref-type="bibr" rid="B152">Ran et al., 2019</xref>).</p>
<p>The difference between CBM and PM lies in how they respond once a defective system condition is detected. A CBM approach would intervene on the system immediately after the detection time. This could lead to the replacement or repair of a component even if it could have continued its normal routine for a longer time without affecting other parts of the machine. Furthermore, intervening immediately after the fault has been detected might result in stopping the machine&#x2019;s working cycle at an inconvenient stage from the point of view of production efficiency.</p>
<p>By contrast, PM tries to predict the useful lifetime of a component at a certain time step in order to indicate the point in the future at which maintenance has to be performed. This approach inevitably results in lower maintenance costs compared to CBM, since each component can be fully exploited without sacrificing safety or efficiency (<xref ref-type="bibr" rid="B41">Fink, 2020</xref>).</p>
<p>
<xref ref-type="fig" rid="F2">Figure 2</xref> summarizes the maintenance strategies presented above by illustrating the costs resulting from their different approaches.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Scheme of the behavior of the different maintenance approaches described above. Figure adapted from <xref ref-type="bibr" rid="B41">Fink (2020)</xref>.</p>
</caption>
<graphic xlink:href="frai-03-578613-g002.tif"/>
</fig>
</sec>
</sec>
<sec id="s2-2">
<label>2.2</label>
<title>Prognostic and Health Management Process</title>
<p>As mentioned before, PHM makes use of information extracted from data to assess the health state of an industrial component and to drive maintenance operations accordingly. <xref ref-type="fig" rid="F3">Figure 3</xref> illustrates the main components of the typical PHM pipeline, from data acquisition to decision making.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Main steps of the typical PHM process. This can be divided into CBM <bold>(left)</bold> and PM <bold>(right)</bold>. RUL estimation is enhanced by information extracted at the CBM level, such as the time step where degradation starts to show its effects. Figure adapted from <xref ref-type="bibr" rid="B85">Khan and Yairi (2018)</xref>.</p>
</caption>
<graphic xlink:href="frai-03-578613-g003.tif"/>
</fig>
<p>The very first step of the PHM process consists of selecting a suitable set of sensors and devices, setting them up in the most appropriate locations and deciding on an optimal sampling frequency for data collection. The communication system between sensors and databases must be implemented so as to allow for both real-time machine health monitoring and offline data handling. To this end, a solution widely adopted in industry is the Open Platform Communication Unified Architecture (OPC UA), a popular communication protocol that allows information to be shared across sensors, industrial assets and the Cloud in a highly secure way (<xref ref-type="bibr" rid="B17">Bruckner et al., 2019</xref>).</p>
<p>Once the sensor array is in place, data can be acquired. These data typically arrive in forms that are not compatible with the input shape required by AI algorithms. Therefore, a data pre-processing step must be implemented in order to clean the data, mitigate the effects of noise or simply reshape them so that their new format can be interpreted by data analysis techniques.</p>
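As a concrete illustration of this pre-processing step, the sketch below segments a raw one-dimensional sensor signal into fixed-length windows and standardizes each window. The function names and parameter choices are illustrative assumptions, not part of any specific PHM toolchain.

```python
import numpy as np

def make_windows(signal, window_size, step):
    """Segment a 1-D sensor signal into overlapping fixed-length windows.

    Returns an array of shape (n_windows, window_size), a typical input
    format for downstream data analysis models.
    """
    n = (len(signal) - window_size) // step + 1
    return np.stack([signal[i * step : i * step + window_size] for i in range(n)])

def standardize(windows):
    """Remove each window's mean and scale to unit variance (simple noise mitigation)."""
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    return (windows - mu) / np.where(sigma == 0, 1.0, sigma)
```

In practice the window length and overlap would be chosen from the sampling frequency and the time scale of the degradation phenomena of interest.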
<p>The resulting data are cleaner than the original ones but can still contain a substantial amount of redundant information. This motivates the application of feature extraction techniques to reduce the dimensionality of the data and retain only the most meaningful pieces of information. As we will see in the next section, most modern AI techniques are designed to automatically extract informative features without any need for expert knowledge or manual feature engineering.</p>
<sec id="s2-2-1">
<label>2.2.1</label>
<title>Condition-Based Maintenance</title>
<p>CBM consists of two main elements: anomaly detection and diagnosis [see <xref ref-type="fig" rid="F3">Figure 3</xref> (left)]. Both these processes immediately follow the data extraction and data pre-processing pipelines described above and aim at supporting the decision making step with meaningful information about the state of the system. The information extracted by the anomaly detection and diagnosis modules can subsequently be exploited at the PM level in order to provide an even richer description of the machine&#x2019;s health state [see <xref ref-type="fig" rid="F3">Figure 3</xref> (right)].</p>
<sec id="s2-2-1-1">
<label>2.2.1.1</label>
<title>Anomaly Detection</title>
<p>Anomaly detection is responsible for automatically establishing whether the input data present any discrepancy with respect to some internal model of the machine&#x2019;s normal behavior (<xref ref-type="bibr" rid="B85">Khan and Yairi, 2018</xref>). This internal representation can be learned by extracting and storing representative features from data gathered from healthy machines. It is important to note that, in general, healthy data, i.e., data gathered from machines operating under normal conditions, are much more abundant than faulty data. This is because a machine can typically incur several different types of faults, each of which is, fortunately, relatively rare. As a concluding remark, we highlight that the detection of an anomaly does not necessarily imply a fault. The anomaly might, for instance, correspond to a new healthy operating condition that has no representatives in the historical data or has not been captured by the detection algorithm&#x2019;s internal model.</p>
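The core idea can be sketched with a minimal detector that models normal behavior as per-feature statistics estimated from healthy data only, and flags samples that deviate strongly from them. The class name and the z-score threshold are illustrative assumptions; real systems rely on much richer internal models.

```python
import numpy as np

class HealthyBaselineDetector:
    """Toy anomaly detector: flags deviations from healthy-operation statistics.

    Fits a per-feature mean and standard deviation on healthy data only,
    then scores new samples by their largest absolute z-score.
    """
    def __init__(self, threshold=3.0):
        self.threshold = threshold  # z-score above which a sample is anomalous

    def fit(self, healthy_features):
        self.mu = healthy_features.mean(axis=0)
        self.sigma = healthy_features.std(axis=0) + 1e-12  # avoid division by zero
        return self

    def score(self, x):
        return np.max(np.abs((x - self.mu) / self.sigma), axis=-1)

    def is_anomaly(self, x):
        return self.score(x) > self.threshold
```

Note that, consistent with the remark above, a flagged sample is only an anomaly with respect to the learned baseline, not necessarily a fault.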
</sec>
<sec id="s2-2-1-2">
<label>2.2.1.2</label>
<title>Fault Diagnosis</title>
<p>Fault diagnosis moves one step beyond anomaly detection since, besides detecting that an outlier is present, it also identifies the underlying cause of that anomaly (<xref ref-type="bibr" rid="B62">Hess, 2002</xref>). Fault diagnosis models are based on historical data representing different faulty conditions. These data are used to characterize each type of fault and allow the models to classify new, previously unseen data within a predefined set of fault cases.</p>
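In its simplest form, this classification step can be sketched as a nearest-centroid classifier over historical feature vectors labeled by fault type; the class name and the fault labels below are purely illustrative, and practical diagnosers use far more expressive models.

```python
import numpy as np

class NearestCentroidDiagnoser:
    """Toy fault-diagnosis model: each fault type is represented by the
    centroid of its historical feature vectors, and new data are assigned
    to the closest centroid."""
    def fit(self, features, fault_labels):
        self.labels = sorted(set(fault_labels))
        labels = np.asarray(fault_labels)
        self.centroids = np.stack(
            [features[labels == c].mean(axis=0) for c in self.labels]
        )
        return self

    def predict(self, x):
        # Euclidean distance to each fault-type centroid
        d = np.linalg.norm(self.centroids - x, axis=1)
        return self.labels[int(np.argmin(d))]
```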
</sec>
</sec>
<sec id="s2-2-2">
<label>2.2.2</label>
<title>Predictive Maintenance</title>
<p>The main difference between CBM and PM is that PM algorithms deal with the problem of predicting the Remaining Useful Life (RUL) of an industrial component before a complete failure occurs and the machine is no longer able to operate (<xref ref-type="bibr" rid="B127">Medjaher et al., 2012</xref>; <xref ref-type="bibr" rid="B41">Fink, 2020</xref>). Therefore, the key enablers of PM strategies are algorithms capable of efficiently forecasting the future state of a machine, i.e., provide prognostic information about its RUL.</p>
<sec id="s2-2-2-1">
<label>2.2.2.1</label>
<title>Fault Prognosis</title>
<p>As mentioned before, fault prognosis is about providing as accurate a prediction as possible of the RUL of a certain machine component. The RUL estimation process starts from the identification of a time step at which a fault begins to show its effects. The final goal is to infer how long the machine can continue operating even in the presence of a degradation trend due to the previously detected fault.</p>
<p>In contrast to diagnosis, time plays a crucial role in prognosis, since the objective is now to estimate the future time step at which a certain event will occur (<xref ref-type="bibr" rid="B92">Lee et al., 2014</xref>). It is important to note that RUL predictions are strongly affected by various sources of noise. These can arise from noisy sensor readings, the inherent stochasticity of the RUL forecasting problem and the choice of an imperfect model of the machine degradation process.</p>
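A deliberately simplified sketch of RUL estimation, assuming a linear degradation trend in a scalar health indicator, is given below. The function name and failure threshold are illustrative; realistic prognostic models must account for the noise sources just mentioned, typically by propagating uncertainty rather than extrapolating a single deterministic fit.

```python
import numpy as np

def estimate_rul(times, health_indicator, failure_threshold):
    """Estimate Remaining Useful Life by fitting a linear degradation trend
    to a health indicator and extrapolating it to a failure threshold.

    Returns the predicted number of time units until the threshold is
    crossed, measured from the last observation.
    """
    slope, intercept = np.polyfit(times, health_indicator, deg=1)
    if slope <= 0:
        return float("inf")  # no degradation trend detected
    t_fail = (failure_threshold - intercept) / slope
    return max(0.0, t_fail - times[-1])
```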
</sec>
</sec>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Artificial Intelligence-Based Prognostic and Health Management</title>
<p>The attempt to devise artificial agents with the ability to emulate or even improve on some aspects of human intelligence is what makes AI an extremely exciting field of research, from both a fundamental and a practical point of view. ML, as a branch of AI, studies the problem of designing machines capable of learning through experience and by extracting information from data (<xref ref-type="bibr" rid="B131">Mitchell, 1997</xref>). &#x201c;Learning from experience&#x201d; is a distinctive human feature that enables us to actively interact with the world we live in. It allows us to build a progressively more accurate internal model of the surrounding environment by processing and interpreting the external signals our body is able to perceive.</p>
<p>Similarly to humans, intelligent systems can process the information perceived by an array of sensors about a given industrial component and provide a model of its operating condition and its health status. The increasing availability of data and the high level of computational power reached by modern hardware components make the application of AI techniques even more appealing.</p>
<p>ML has witnessed increasing interest over the last few decades. A turning point was marked by the introduction of the first state-of-the-art DL technique almost 10&#xa0;years ago by <xref ref-type="bibr" rid="B89">Krizhevsky et al. (2012)</xref> in the context of Image Recognition (IR). This event triggered a new era in the field of data analysis, characterized by a plethora of new applications of DL to disparate engineering fields, ranging from NLP to CV.</p>
<p>The goal of this section is to give the reader an insight into the intersection of ML and PHM and the progress made by the scientific community hitherto. First, we present the main steps involved in the application of &#x201c;traditional&#x201d; ML techniques to PHM and we discuss how these can be utilized in the contexts of diagnosis and prognosis. Then, we present a number of popular DL techniques and we review some of their most interesting applications in this context.</p>
<sec id="s3-1">
<label>3.1</label>
<title>&#x201c;Classical&#x201d; Machine Learning Methods</title>
<p>Before the explosion of DL almost one decade ago, the typical process followed by the majority of data-driven approaches to PHM was the one illustrated in <xref ref-type="fig" rid="F4">Figure 4</xref>. The raw measurements provided by a battery of sensors cannot be straightforwardly linked with the health state of the machine or its RUL. Indeed, they are often affected by a significant amount of noise introduced by either external factors, such as a sudden temperature increase, or imperfect signal transmission. Furthermore, these data often consist of complex time series or images, which are typically characterized by highly redundant information that tends to hide the relatively few discriminative features of interest. For the above reasons, once data are acquired, a set of candidate features has to be extracted and only the most informative among them properly selected. Once these steps are completed, the final set of extracted features can be used to train a ML algorithm to perform the desired diagnosis or prognosis task.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Main steps characterizing the approaches based on traditional ML algorithms. Adapted from Zhao et al. (2016).</p>
</caption>
<graphic xlink:href="frai-03-578613-g004.tif"/>
</fig>
<p>In the following, we briefly go through all the aforementioned steps, discussing some of the main techniques involved in each of them.</p>
<sec id="s3-1-1">
<label>3.1.1</label>
<title>Feature Extraction and Feature Selection</title>
<sec id="s3-1-1-1">
<label>3.1.1.1</label>
<title>Feature Extraction</title>
<p>According to Yu (2019), feature extraction can be defined as the task of transforming raw data into more informative features that serve the needs of downstream predictive models and help improve performance on unseen data.</p>
<p>A general recipe for the feature extraction task does not exist and a set of key context-dependent factors must be taken into account. Some of these are, for example, the specific type of task to be performed, the characteristics of the data, the application domain and the algorithmic and efficiency requirements (<xref ref-type="bibr" rid="B54">Guyon et al., 2006</xref>). For instance, traditional choices of features in the context of IR are those obtained by the SIFT (<xref ref-type="bibr" rid="B115">Lowe, 2004</xref>) and SURF (<xref ref-type="bibr" rid="B11">Bay et al., 2008</xref>) algorithms, whereas mel-cepstral coefficients (<xref ref-type="bibr" rid="B28">Davis and Mermelstein, 1980</xref>; <xref ref-type="bibr" rid="B88">Kopparapu and Laxminarayana, 2010</xref>) are typically chosen in speech recognition applications.</p>
<p>In the context of PHM, data recorded for the purpose of equipment maintenance often come in the form of time series. Therefore, an appropriate set of features must be chosen according to the properties of the signal under consideration, e.g., its physical nature (temperature, pressure, voltage, acceleration,&#x2026;), its dynamics (cyclic, periodic, stationary, stochastic), its sampling frequency and its sample value discretization (continuous, discrete)<xref ref-type="fn" rid="FN5">
<sup>5</sup>
</xref>. Typical examples of features extracted from raw time-series data can be divided into three categories (<xref ref-type="bibr" rid="B95">Lei et al., 2020</xref>): time domain, frequency domain and time-frequency domain. The first includes statistical indicators such as mean, standard deviation, root mean square, skewness, kurtosis, crest factor, signal-to-noise ratio. Other standard time-domain feature extraction methods are traditional signal processing techniques such as auto and cross-correlation, convolution, fractal analysis (<xref ref-type="bibr" rid="B213">Yang et al., 2007</xref>) and correlation dimension (<xref ref-type="bibr" rid="B114">Logan and Mathew, 1996</xref>). Finally, model-based approaches such as autoregressive (AR, ARMA) or probability distribution models where features consist of the model parameters (<xref ref-type="bibr" rid="B147">Poyhonen et al., 2004</xref>) are also commonly used.</p>
<p>Features extracted from the frequency domain are typically obtained through spectral analysis of the signal of interest. The Fast Fourier Transform (FFT) is applied to raw data to extract the power spectrum and retrieve information about the characteristic frequencies of the signal. Finally, time-frequency domain feature extraction techniques include the short-time Fourier transform, the wavelet transform and empirical mode decomposition, among others. These methods aim to capture how the frequency components of the signal vary as functions of time, which makes them particularly useful for non-stationary time-series analysis.</p>
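<p>To make the distinction concrete, the following sketch computes a handful of the time-domain indicators listed above (RMS, skewness, kurtosis, crest factor) and one frequency-domain feature (the dominant spectral peak via the FFT) on a synthetic vibration signal. The 50 Hz tone, the noise level and all parameter values are illustrative assumptions, not taken from the cited works.</p>

```python
import numpy as np

def time_domain_features(x):
    """A few of the statistical indicators listed above."""
    rms = np.sqrt(np.mean(x ** 2))
    std = np.std(x)
    skewness = np.mean((x - np.mean(x)) ** 3) / std ** 3
    kurtosis = np.mean((x - np.mean(x)) ** 4) / std ** 4
    crest_factor = np.max(np.abs(x)) / rms
    return {"rms": rms, "std": std, "skewness": skewness,
            "kurtosis": kurtosis, "crest_factor": crest_factor}

def dominant_frequency(x, fs):
    """Frequency-domain feature: location of the power-spectrum peak."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC component

# Synthetic vibration signal: 50 Hz tone plus noise, sampled at 1 kHz.
fs = 1000
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(len(t))

feats = time_domain_features(signal)
f_dom = dominant_frequency(signal, fs)
```

<p>In practice, such feature vectors, computed per time window, form the inputs handed to the downstream ML algorithm.</p>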
</sec>
<sec id="s3-1-1-2">
<label>3.1.1.2</label>
<title>Feature Selection</title>
<p>The goal of feature extraction is to obtain a first set of candidate features that are as informative as possible for the problem under consideration. Feature selection aims at reducing the dimension of the feature space by identifying a subset of features that are maximally relevant for a certain objective. According to the pioneering work of <xref ref-type="bibr" rid="B54">Guyon et al. (2006)</xref>, feature selection methods can be divided into three categories: filters, wrappers and embedded methods. The first class of approaches consists of finding a subset of features that is optimal according to a specified objective measuring the information content of the proposed candidates. This objective is independent of the particular ML algorithm used to perform the PHM task, so the resulting features are typically more general and potentially usable by different ML algorithms. Several feature selection techniques are based on the calculation of statistical or information-theoretic quantities such as the Pearson correlation coefficient or the information gain. For instance, the Minimum-Redundancy-Maximum-Relevance (mRMR) technique is based on the idea that the optimal subset of features should be highly correlated with the target variable (which might be, for example, the classification label indicating a specific fault type) and minimally redundant with respect to each other.</p>
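<p>A minimal, illustrative filter-style selector in the spirit of mRMR (not the actual mRMR algorithm, which scores candidates with mutual information): rank features by absolute Pearson correlation with the target and greedily skip candidates that are nearly collinear with an already chosen feature. The toy data and the redundancy threshold are invented for illustration.</p>

```python
import numpy as np

def filter_select(X, y, k, redundancy_threshold=0.95):
    """Greedy filter: pick up to k features, most target-correlated first,
    skipping near-duplicates of already selected features."""
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_features)])
    selected = []
    for j in np.argsort(-relevance):          # descending relevance
        if len(selected) == k:
            break
        redundant = any(abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                        > redundancy_threshold for s in selected)
        if not redundant:
            selected.append(int(j))
    return selected

# Toy data: feature 0 drives the target, feature 1 is a near-copy of
# feature 0 (redundant), feature 2 is pure noise.
rng = np.random.default_rng(1)
f0 = rng.standard_normal(200)
X = np.column_stack([f0,
                     f0 + 1e-3 * rng.standard_normal(200),
                     rng.standard_normal(200)])
y = 2.0 * f0 + 0.1 * rng.standard_normal(200)

chosen = filter_select(X, y, k=2)   # one of {0, 1}, plus feature 2
```

<p>Only one of the two collinear features survives; the filter never consults the downstream ML algorithm, which is precisely what distinguishes it from a wrapper.</p>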
<p>Wrapper-based methods differ from their filter-based counterparts in the criteria they use for assessing the &#x201c;goodness&#x201d; of a specific set of features. Specifically, they directly employ the ML algorithm to get feedback, usually in the form of an accuracy score or a loss value, about the selected candidates. Wrappers are usually able to achieve better performance than filters since they are optimized with respect to a specific ML algorithm which is in turn tailored to a specific task. On the other hand, wrappers are biased toward the ML algorithm they are based on, and the resulting feature subset will therefore generally not be adequate for alternative ML techniques.</p>
<p>The final class of feature selection methods is represented by the so-called embedded approaches. These techniques integrate the feature selection process directly into the ML algorithm in an end-to-end fashion. A popular example of an embedded approach is the LASSO (Least Absolute Shrinkage and Selection Operator) (<xref ref-type="bibr" rid="B186">Tibshirani, 1996</xref>), a linear-regression method that solves the following optimization problem:<disp-formula id="e1">
<mml:math>
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mtext>min</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi>
</mml:mfrac>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>with<disp-formula id="e2">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>d</mml:mi>
</mml:munderover>
<mml:mo>&#x7c;</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>The <inline-formula id="inf1">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> norm forces the learnt solution <inline-formula id="inf2">
<mml:math>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>w</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> to be sparse, so that only the most relevant features are assigned nonzero weights. Other criteria used for end-to-end feature selection are, for instance, the Akaike Information Criterion (AIC) (<xref ref-type="bibr" rid="B158">Sakamoto et al., 1986</xref>) and the Bayesian Information Criterion (BIC) (<xref ref-type="bibr" rid="B136">Neath and Cavanaugh, 2012</xref>), both of which penalize model complexity in order to favor features that generalize rather than being problem-specific.</p>
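<p>A compact numerical sketch of how the objective in Eq. 1 induces sparsity, solved here with proximal gradient descent (ISTA) rather than any specific solver from the cited literature; the synthetic data and the value of the regularization strength are illustrative assumptions.</p>

```python
import numpy as np

def lasso_ista(X, y, lam, lr=0.01, n_iter=5000):
    """LASSO via ISTA: a gradient step on the squared error followed by
    soft-thresholding (the proximal operator of lam * ||w||_1), which is
    what drives irrelevant coefficients to exactly zero."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        r = X @ w + b - y                       # residuals
        w -= lr * (2.0 / n) * X.T @ r           # gradient step on w
        b -= lr * (2.0 / n) * r.sum()           # gradient step on b
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # prox step
    return w, b

# Toy problem: only the first 2 of 10 features actually matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]
y = X @ w_true + 0.01 * rng.standard_normal(100)

w_hat, b_hat = lasso_ista(X, y, lam=0.5)
```

<p>The recovered weight vector is sparse: the eight irrelevant coefficients are shrunk to (numerically) zero, so the surviving features are exactly the selected ones.</p>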
<p>As a conclusive remark, it is worth mentioning that, similarly to feature selection approaches, dimensionality reduction methods also aim at reducing the level of redundancy while preserving the informativeness of the feature candidates. Techniques such as Principal Component Analysis (PCA) (<xref ref-type="bibr" rid="B81">Jolliffe, 1986</xref>) are used to project data onto a lower-dimensional linear subspace spanned by the directions of maximal variance. Other popular dimensionality reduction techniques are Linear Discriminant Analysis (LDA) (<xref ref-type="bibr" rid="B125">McLachlan, 2004</xref>), Exploratory Projection Pursuit (EPP) (<xref ref-type="bibr" rid="B42">Friedman, 1987</xref>), Independent Component Analysis (ICA) (<xref ref-type="bibr" rid="B69">Hyv&#xe4;rinen and Oja, 2000</xref>) and T-distributed Stochastic Neighbor Embedding (t-SNE) (<xref ref-type="bibr" rid="B121">Maaten and Hinton, 2008</xref>), among others.</p>
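<p>As a brief illustration of PCA as described above, the following sketch computes the principal directions via the SVD of the centered data and projects 3-D points that lie, up to noise, on a 2-D plane. The toy data and the mixing matrix are assumptions made purely for illustration.</p>

```python
import numpy as np

def pca(X, n_components):
    """Project data onto the top principal components (directions of
    maximal variance), computed via the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                  # principal directions
    explained_var = S[:n_components] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_var

# Toy data: 3-D points lying (up to small noise) on a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 2))
mixing = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, -0.3]])
X = latent @ mixing.T + 0.01 * rng.standard_normal((500, 3))

Z, components, explained_var = pca(X, n_components=2)
```

<p>Because the data are essentially two-dimensional, projecting onto two components loses almost no information, which is the situation in which dimensionality reduction pays off.</p>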
</sec>
</sec>
<sec id="s3-1-2">
<label>3.1.2</label>
<title>Traditional Machine Learning Algorithms</title>
<p>As shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, once features are extracted and properly selected, they can be used as input for a ML algorithm responsible for performing the diagnosis or prognosis task we are interested in. In this section, we focus on &#x201c;traditional&#x201d; ML algorithms, i.e., popular AI methods widely employed before the advent of DL. These techniques can be divided into four main sub-categories, namely: (shallow) Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and K-Nearest Neighbor (KNN).</p>
<sec id="s3-1-2-1">
<label>3.1.2.1</label>
<title>Diagnosis</title>
<p>All the aforementioned classes of algorithms have been applied to fault diagnosis in several different contexts. In the following, we first briefly discuss the basic principles of these methods and then we list some of their most interesting applications.</p>
<sec id="s3-1-2-1-1">
<label>3.1.2.1.1</label>
<title>Artificial Neural Networks</title>
<p>ANNs are popular ML algorithms whose design draws inspiration from the biological mechanism at the basis of neural connections in the human brain. They consist of elementary processing units, called neurons, connected to each other by means of weighted connections of variable magnitude, whose role is meant to emulate the behavior of synaptic connections in animals&#x2019; brains. Different types of ANN topologies can be constructed by organizing the neurons and their connections in different ways. The choice of the specific ANN architecture crucially depends on the nature of the task to be performed, the structure of the data under consideration and the availability of computational resources.</p>
<p>Over the last two decades, ANNs have been used to detect and classify faults incurring in several diverse types of machines. For instance, they have been applied to fault diagnosis of rolling element bearings (<xref ref-type="bibr" rid="B161">Samanta and Al-Balushi, 2003</xref>), induction motors (<xref ref-type="bibr" rid="B8">Ayhan et al., 2006</xref>), gears (<xref ref-type="bibr" rid="B160">Samanta, 2004</xref>; <xref ref-type="bibr" rid="B4">Abu-Mahfouz, 2005</xref>), engines (<xref ref-type="bibr" rid="B117">Lu et al., 2001</xref>), turbine blades (<xref ref-type="bibr" rid="B90">Kuo, 1995</xref>; <xref ref-type="bibr" rid="B138">Ngui et al., 2017</xref>), electrical (<xref ref-type="bibr" rid="B133">Moosavi et al., 2016</xref>) and photovoltaic (<xref ref-type="bibr" rid="B25">Chine et al., 2016</xref>) devices, among others.</p>
<p>The choice of the output layer directly reflects the kind of task we are interested in. For instance, for fault detection tasks, two neurons can be used to output the probability that the input corresponds to a healthy instance or a faulty one. On the other hand, if we are interested in fault diagnosis, the number of output neurons is equal to the number of fault types affecting the machine under consideration. A typical example of an ANN application to fault detection is provided by <xref ref-type="bibr" rid="B161">Samanta and Al-Balushi (2003)</xref>. In this work, five time-domain features (RMS, skewness, variance, kurtosis, and normalized sixth central moment) are extracted from raw vibration signals. These features are then used as inputs to a shallow ANN consisting of two hidden layers with 16 and 10 neurons, respectively, and one output layer with two neurons (indicating whether the input corresponds to a normal or a failed bearing).</p>
</sec>
<sec id="s3-1-2-1-2">
<label>3.1.2.1.2</label>
<title>Support Vector Machines</title>
<p>Given a dataset <inline-formula id="inf3">
<mml:math>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf4">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf5">
<mml:math>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, SVMs aim at separating the two classes of data by finding the optimal hyperplane with the maximum margin between them. The margin is the distance between the hyperplane and the nearest training data points of either class. In most real-world problems, data are not linearly separable. In these cases, the so-called kernel trick (<xref ref-type="bibr" rid="B64">Hofmann et al., 2008</xref>) can be used to tackle nonlinear classification tasks by implicitly mapping the data into a high-dimensional feature space.</p>
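<p>A small sketch of the kernel trick&#x2019;s central object, the Gaussian (RBF) kernel matrix: each entry is an inner product in an implicit feature space that is never constructed explicitly. The data and the value of the width parameter gamma are assumptions made for illustration.</p>

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, x') = exp(-gamma * ||x - x'||^2).
    Computes inner products in an implicit feature space without ever
    mapping the data there explicitly (the kernel trick)."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negatives

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = rbf_kernel(X, X)   # the N x N kernel matrix used by a kernelized SVM
```

<p>The N &#xd7; N size of this matrix is exactly the scalability bottleneck mentioned below: its cost grows quadratically with the number of training instances.</p>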
<p>Standard SVMs, along with a number of improved variants, have been extensively applied to fault diagnosis. For example, they have been used for assessing the health state of rolling element bearings (<xref ref-type="bibr" rid="B212">Yang et al., 2005</xref>; <xref ref-type="bibr" rid="B2">Abbasion et al., 2007</xref>; <xref ref-type="bibr" rid="B50">Gryllias and Antoniadis, 2012</xref>; <xref ref-type="bibr" rid="B40">Fern&#xe1;ndez-Francos et al., 2013</xref>; <xref ref-type="bibr" rid="B71">Islam et al., 2017</xref>; <xref ref-type="bibr" rid="B73">Islam and Kim, 2019b</xref>), induction motors (<xref ref-type="bibr" rid="B200">Widodo and Yang, 2007</xref>), gearboxes (<xref ref-type="bibr" rid="B111">Liu et al., 2013</xref>), engines (<xref ref-type="bibr" rid="B105">Li et al., 2012</xref>), wind turbines (<xref ref-type="bibr" rid="B163">Santos et al., 2015</xref>) and air conditioning systems (<xref ref-type="bibr" rid="B176">Sun et al., 2016a</xref>).</p>
<p>In order to perform fault diagnosis tasks, SVMs are typically employed alongside One-Against-One (OAO) (<xref ref-type="bibr" rid="B212">Yang et al., 2005</xref>; <xref ref-type="bibr" rid="B71">Islam et al., 2017</xref>) or One-Against-All (OAA) (<xref ref-type="bibr" rid="B2">Abbasion et al., 2007</xref>; <xref ref-type="bibr" rid="B50">Gryllias and Antoniadis, 2012</xref>) strategies. Furthermore, SVMs can also be applied to anomaly detection. For example, <xref ref-type="bibr" rid="B111">Liu et al. (2013)</xref> train a one-class SVM only on healthy data to detect anomalies in bearings vibrational data.</p>
<p>Generally, SVMs are particularly well suited for problems characterized by high-dimensional features. On the other hand, the computation of the <inline-formula id="inf6">
<mml:math>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> kernel matrix can be highly expensive when the number of data instances is relatively large.</p>
</sec>
<sec id="s3-1-2-1-3">
<label>3.1.2.1.3</label>
<title>Decision Trees</title>
<p>Decision trees (DTs) represent a class of non-parametric supervised ML algorithms commonly used for regression and classification. DTs are trained to infer a mapping between data features and the corresponding output values by learning a set of relatively simple and interpretable decision rules. As the name suggests, these classification rules correspond to paths linking the root node to the leaf nodes. Indeed, each internal node can be seen as a condition on a particular attribute. The different outcomes of this test are represented by the branches generated from that node. The C4.5 algorithm (<xref ref-type="bibr" rid="B151">Quinlan, 2014</xref>) is one of the most popular approaches to learn a DT.</p>
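<p>The split criterion underlying algorithms of the C4.5 family can be illustrated by computing the information gain of a candidate attribute (C4.5 itself uses the gain ratio, a normalized variant of this quantity); the toy fault labels below are invented for illustration.</p>

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute):
    """Entropy of the labels minus the weighted entropy of each branch
    produced by splitting on a discrete attribute."""
    gain = entropy(labels)
    for v in np.unique(attribute):
        mask = attribute == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

# Toy fault-classification data: attribute A perfectly separates the
# two classes, attribute B is uninformative.
labels = np.array([0, 0, 0, 1, 1, 1])
attr_a = np.array([0, 0, 0, 1, 1, 1])
attr_b = np.array([0, 1, 0, 1, 0, 1])

gain_a = information_gain(labels, attr_a)   # 1 bit: perfect split
gain_b = information_gain(labels, attr_b)   # close to 0: useless split
```

<p>At each internal node, the learner picks the attribute with the highest score, which is what produces the interpretable root-to-leaf decision rules described above.</p>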
<p>DTs have been widely employed in the context of fault diagnosis over the last two decades. For example, they have been applied to process data gathered from rolling element bearing systems (<xref ref-type="bibr" rid="B173">Sugumaran and Ramachandran, 2007</xref>; <xref ref-type="bibr" rid="B172">Sugumaran, 2012</xref>), gearboxes (<xref ref-type="bibr" rid="B164">Saravanan and Ramachandran, 2009</xref>; <xref ref-type="bibr" rid="B148">Praveenkumar et al., 2018</xref>), wind turbines (<xref ref-type="bibr" rid="B3">Abdallah et al., 2018</xref>), centrifugal pumps (<xref ref-type="bibr" rid="B159">Sakthivel et al., 2010</xref>), and photovoltaic systems (<xref ref-type="bibr" rid="B14">Benkercha and Moulahoum, 2018</xref>).</p>
<p>Multiple DTs can be employed jointly to form a random forest (RF), an ensemble learning algorithm capable of overcoming some shortcomings of single decision trees, such as limited generalization and overfitting. RFs have been successfully applied to fault diagnosis of induction motors (<xref ref-type="bibr" rid="B211">Yang et al., 2008</xref>), rolling bearings (<xref ref-type="bibr" rid="B194">Wang et al., 2017</xref>), and aircraft engines (<xref ref-type="bibr" rid="B208">Yan, 2006</xref>) among others.</p>
<p>The main advantages of DTs lie in their high level of interpretability, resulting from the easily decipherable decision rules they implement. Moreover, they often achieve reasonably high accuracy in most of the classification problems they are applied to. On the other hand, these methods are often prone to overfitting and therefore tend to provide poor generalization performance.</p>
</sec>
<sec id="s3-1-2-1-4">
<label>3.1.2.1.4</label>
<title>K-Nearest Neighbor</title>
<p>KNN is a non-parametric algorithm widely used for classification tasks. Given a set of input-output pairs <inline-formula id="inf7">
<mml:math>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and a test datum <inline-formula id="inf8">
<mml:math>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, the KNN algorithm searches the <italic>k</italic> closest training inputs to <inline-formula id="inf9">
<mml:math>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> in the feature space and labels the test datum with the class having the most representatives among the <italic>k</italic> selected training data. Closeness can be measured by an arbitrary similarity measure, such as the Euclidean distance. Due to their simplicity and their high level of interpretability, KNN-based approaches have found many applications in fault diagnosis. For example, the literature includes applications in the context of rolling element bearings (<xref ref-type="bibr" rid="B126">Mechefske and Mathew, 1992</xref>; <xref ref-type="bibr" rid="B134">Moosavian et al., 2013</xref>; <xref ref-type="bibr" rid="B185">Tian et al., 2016</xref>) and gears (<xref ref-type="bibr" rid="B96">Lei and Zuo, 2009</xref>; <xref ref-type="bibr" rid="B48">Gharavian et al., 2013</xref>).</p>
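<p>The basic procedure can be sketched in a few lines; the two synthetic clusters standing in for &#x201c;healthy&#x201d; and &#x201c;faulty&#x201d; feature vectors are illustrative assumptions.</p>

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training
    points, with closeness measured by Euclidean distance."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]
    classes, counts = np.unique(y_train[nearest], return_counts=True)
    return classes[np.argmax(counts)]

# Two well-separated clusters: "healthy" (label 0) and "faulty" (label 1).
rng = np.random.default_rng(0)
healthy = rng.standard_normal((20, 2))
faulty = rng.standard_normal((20, 2)) + 6.0
X_train = np.vstack([healthy, faulty])
y_train = np.array([0] * 20 + [1] * 20)

pred = knn_predict(X_train, y_train, np.array([5.8, 6.1]), k=5)
```

<p>Note that every prediction scans the full training set, which is the source of the computational cost discussed below for large datasets.</p>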
<p>Enhanced versions of the basic KNN algorithm have been gradually introduced to boost its classification performance and to overcome some of its limitations, such as the computational load required to process large datasets. For instance, <xref ref-type="bibr" rid="B6">Appana et al. (2017)</xref> introduce a new type of metric which augments the information provided by the distance between sample pairs with their relative densities. Also, <xref ref-type="bibr" rid="B94">Lei et al. (2009)</xref> apply a combination of weighted KNN (WKNN) classifiers to fault diagnosis of rolling bearings in order to cope with the problem of data instances belonging to different classes overlapping in the feature space. Finally, in <xref ref-type="bibr" rid="B33">Dong et al. (2017)</xref> and <xref ref-type="bibr" rid="B193">Wang and Ma (2014)</xref>, KNN is optimized with the particle swarm algorithm (<xref ref-type="bibr" rid="B84">Kennedy and Eberhart, 1997</xref>) to alleviate the storage requirements of the standard algorithm.</p>
<p>Overall, KNN and its enhanced versions can be considered relatively effective algorithms for fault diagnosis, especially because of their simplicity and interpretability. Their main limitations lie in their high computational cost and their considerable sensitivity to noise.</p>
</sec>
</sec>
<sec id="s3-1-2-2">
<label>3.1.2.2</label>
<title>Prognosis</title>
<p>Generally, prognosis is a more challenging problem than diagnosis, and effective methods in this context are accordingly harder to find. Below, we list some of the most interesting applications of ANNs, SVMs, and DTs to fault prognosis. KNN, by contrast, is not as widespread here as in fault diagnosis and is rarely applied to RUL estimation.</p>
<sec id="s3-1-2-2-1">
<label>3.1.2.2.1</label>
<title>Artificial Neural Networks</title>
<p>Two of the first attempts at applying ANNs to fault prognosis problems are introduced in <xref ref-type="bibr" rid="B170">Shao and Nezu (2000)</xref> and <xref ref-type="bibr" rid="B47">Gebraeel et al. (2004)</xref>. Both approaches are proposed in the context of bearing RUL prediction. In <xref ref-type="bibr" rid="B170">Shao and Nezu (2000)</xref>, a three-layer neural network is used to forecast the value of the bearing health indicator. In <xref ref-type="bibr" rid="B47">Gebraeel et al. (2004)</xref>, several fully-connected models are trained either on individual bearings or on clusters of similar bearing features. Both methods use manually extracted statistical features as inputs to the corresponding ANNs. More recent approaches include, for example, <xref ref-type="bibr" rid="B35">Elforjani and Shanbr (2018)</xref> and <xref ref-type="bibr" rid="B182">Teng et al. (2016)</xref>. The first work proposes a comparative study of the performance of SVMs, Gaussian Processes (<xref ref-type="bibr" rid="B153">Rasmussen, 2003</xref>) and ANNs for RUL estimation from features extracted from acoustic emission signals. The study reveals that the proposed ANN is the best performing model for the RUL prediction task under consideration. In <xref ref-type="bibr" rid="B182">Teng et al. (2016)</xref>, ANNs are used to provide short-term tendency prediction of a wind turbine gearbox degradation process. The approach is validated by a series of experiments on bearing degradation trajectory datasets, showing good RUL prediction performance.</p>
</sec>
<sec id="s3-1-2-2-2">
<label>3.1.2.2.2</label>
<title>Support Vector Machines</title>
<p>SVM-based methods have been extensively applied to fault prognosis tasks. <xref ref-type="bibr" rid="B66">Huang et al. (2015)</xref> provide an extensive review of the most relevant techniques employing SVM-related approaches in the context of RUL prediction. Application examples include RUL estimation of bearings (<xref ref-type="bibr" rid="B175">Sun et al., 2011</xref>; <xref ref-type="bibr" rid="B23">Chen et al., 2013</xref>; <xref ref-type="bibr" rid="B174">Sui et al., 2019</xref>), lithium-ion batteries (<xref ref-type="bibr" rid="B86">Khelif et al., 2017</xref>; <xref ref-type="bibr" rid="B195">Wei et al., 2018</xref>; <xref ref-type="bibr" rid="B226">Zhao H. et al., 2018</xref>; <xref ref-type="bibr" rid="B227">Zhao Q. et al., 2018</xref>) and aircraft engines (<xref ref-type="bibr" rid="B140">Ord&#xf3;&#xf1;ez et al., 2019</xref>). For instance, in <xref ref-type="bibr" rid="B195">Wei et al. (2018)</xref>, Support Vector Regression (SVR) is used to provide a state-of-health state-space model capable of simulating the battery aging mechanism. A comparison with an ANN-based model of the same type shows the superiority of the proposed approach over its neural network-based counterpart. In the context of bearing fault prognosis, <xref ref-type="bibr" rid="B175">Sun et al. (2011)</xref> introduce a multivariate SVM for life prognostics based on multiple features that are known to be tightly correlated with the bearings&#x2019; RUL. The proposed method shows good prediction performance and leverages the ability of SVMs to deal with high-dimensional, small-sized datasets.</p>
</sec>
<sec id="s3-1-2-2-3">
<label>3.1.2.2.3</label>
<title>Decision Trees</title>
<p>DTs and RFs have also been applied to fault prognosis, in particular in the contexts of RUL estimation of bearings (<xref ref-type="bibr" rid="B165">Satishkumar and Sugumaran, 2015</xref>; <xref ref-type="bibr" rid="B145">Patil et al., 2018</xref>; <xref ref-type="bibr" rid="B181">Tayade et al., 2019</xref>), lithium-ion batteries (<xref ref-type="bibr" rid="B230">Zheng H. et al., 2019</xref>; <xref ref-type="bibr" rid="B232">Zheng Z. et al., 2019</xref>) and turbofan engines (<xref ref-type="bibr" rid="B124">Mathew et al., 2017</xref>). In <xref ref-type="bibr" rid="B145">Patil et al. (2018)</xref>, the authors train a RF to perform RUL regression using time-domain features extracted from the bearings&#x2019; vibration signals. The model is evaluated on the dataset provided by the IEEE PHM Challenge 2012 (<xref ref-type="bibr" rid="B5">Ali et al., 2015</xref>), showing improved results over previous benchmarks. One further example is provided by <xref ref-type="bibr" rid="B165">Satishkumar and Sugumaran (2015)</xref>, who cast the RUL estimation problem into a classification framework. In particular, statistical features in the time domain are extracted from five different temporal intervals ranging from normal condition to bearing damage. A DT is then used to classify new data into one of these intervals, resulting in an accuracy of about 96%.</p>
</sec>
</sec>
</sec>
<sec id="s3-1-3">
<label>3.1.3</label>
<title>Discussion</title>
<sec id="s3-1-3-1">
<label>3.1.3.1</label>
<title>Dependency on Feature Extraction</title>
<p>Traditional ML algorithms have been widely applied both to fault diagnosis and fault prognosis tasks. They present the relevant advantage of combining rather good performance with a relatively high degree of interpretability. On the other hand, most of them rely on good quality features that have to be carefully extracted and selected by human experts. This dependency on the feature extraction step limits the potential of traditional ML methods and imposes a strong inductive bias on the learning process. As we discuss in the next section, &#x201c;deep&#x201d; algorithms can extract information directly from raw data and can often improve on the generalization performance of traditional ML approaches.</p>
</sec>
<sec id="s3-1-3-2">
<label>3.1.3.2</label>
<title>Model Selection</title>
<p>It is important to observe that it is not possible to identify a specific algorithm, among those discussed above, that clearly outperforms the others in all possible settings. Selecting a specific technique highly depends on the requirements and characteristics of the PHM problem at hand. For example, a black-box ANN approach might be more suitable when one is mainly interested in performance and less in interpretability, SVMs can be useful in the low-data regime, and DTs can be a sensible choice if interpretability is prioritized. Ultimately, the final algorithm is often chosen by calculating a set of performance metrics for each candidate technique and selecting the method providing the highest scores. Some standard examples of these metrics are accuracy, precision, recall, F1 score, Cohen&#x2019;s Kappa (CK), and Area Under the Curve (AUC). A description of these metrics can be found, for instance, in <xref ref-type="bibr" rid="B10">Bashar et al. (2020)</xref>.</p>
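<p>For a binary fault-detection problem, the first four of these metrics can be computed directly from the confusion-matrix counts, as in the following sketch (the labels are invented, and class 1 plays the role of the faulty class).</p>

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for a binary problem,
    taking class 1 (faulty) as the positive class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # ground-truth health states
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])   # a candidate model's output

acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
```

<p>Computing such scores on held-out data for each candidate technique, and keeping the best, is the selection procedure described above.</p>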
</sec>
<sec id="s3-1-3-3">
<label>3.1.3.3</label>
<title>Overfitting</title>
<p>The long-standing problem of overfitting (or over-training) is a well-known pathology affecting data-driven approaches. In essence, it stems from the imbalance between model capacity and data availability. If, on one hand, the adoption of ML techniques can be significantly beneficial in PHM, on the other hand, it also requires effective solutions to counteract overfitting in order to fully exploit the advantages of data-driven approaches. In the context of PHM applications, a key requirement for the deployment of a given ML algorithm is indeed the robustness of its performance on data that differ from the training set. Although algorithm-specific techniques exist to tackle overfitting, held-out cross-validation (<xref ref-type="bibr" rid="B59">Hastie et al., 2001</xref>) is probably the most popular one and can be used independently of the particular ML algorithm (see, for instance, <xref ref-type="bibr" rid="B47">Gebraeel et al., 2004</xref> for ANNs, <xref ref-type="bibr" rid="B71">Islam et al., 2017</xref> for SVMs, <xref ref-type="bibr" rid="B3">Abdallah et al., 2018</xref> for decision trees, and <xref ref-type="bibr" rid="B185">Tian et al., 2016</xref> for KNN).</p>
<p>As regards DTs, overfitting is typically tackled by pruning the tree in order to prevent it from merely memorizing the training set and to improve performance on unseen data (<xref ref-type="bibr" rid="B148">Praveenkumar et al., 2018</xref>). Random forests have also been used for the same purpose (<xref ref-type="bibr" rid="B211">Yang et al., 2008</xref>). They consist of ensembles of DTs, and one of their main benefits is to mitigate the overfitting tendency of standard DTs.</p>
<p>A widely used strategy to counteract over-training in SVMs is to introduce a set of so-called slack variables in order to allow some data instances to lie on the wrong side of the margin (<xref ref-type="bibr" rid="B59">Hastie et al., 2001</xref>). The extent to which this class overlap is permitted is regulated by a regularization constant <italic>C</italic>. Furthermore, the smoothness of the margin can be adjusted by appropriately tuning the hyperparameters of the kernel. <xref ref-type="bibr" rid="B176">Sun et al. (2016a)</xref>, for instance, use cross-validation to find optimal values of the constant <italic>C</italic> and of the Gaussian kernel width parameter.</p>
<p>In ANNs, the effects of overfitting become increasingly pronounced as the number of hidden layers increases (<xref ref-type="bibr" rid="B160">Samanta, 2004</xref>). Two typical strategies to alleviate its impact are early stopping and regularization. The first consists in stopping the training phase once the first signs of over-training appear. The second introduces a penalizing term in the loss function (typically in the form of <inline-formula id="inf10">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">L</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> or <inline-formula id="inf11">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">L</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> norms on the network weights) to keep the values of the weights as small as possible. In <xref ref-type="bibr" rid="B8">Ayhan et al. (2006)</xref>, for instance, the authors apply early stopping, halting the training phase once the validation error has kept increasing for a specific number of epochs.</p>
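<p>The early-stopping logic can be sketched independently of any particular network: track the best validation loss seen so far and stop after a fixed number of non-improving epochs (the &#x201c;patience&#x201d;). The validation curve below is simulated, and the patience value is an illustrative assumption, not a setting reported in the cited works.</p>

```python
import numpy as np

def train_with_early_stopping(val_losses, patience=3):
    """Scan per-epoch validation losses and stop once the loss has failed
    to improve for `patience` consecutive epochs; return the epoch whose
    weights would be kept (the best one seen so far) and its loss."""
    best_epoch, best_loss, bad_epochs = 0, np.inf, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:   # signs of over-training: stop
                break
    return best_epoch, best_loss

# Simulated validation curve: improves until epoch 4, then overfits.
val_losses = [1.0, 0.7, 0.5, 0.4, 0.35, 0.4, 0.45, 0.5, 0.6]
best_epoch, best_loss = train_with_early_stopping(val_losses, patience=3)
```

<p>In a real training loop the same check would run after each epoch, with the model weights checkpointed whenever the best validation loss improves.</p>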
<p>Finally, the KNN algorithm yields different performances depending on the value of <italic>k</italic>. Small values of <italic>k</italic> result in very sharp boundaries and might lead to overfitting. On the other hand, large values of <italic>k</italic> are more robust to noise but might result in poor classification performance. This hyperparameter is therefore typically chosen via cross-validation by selecting the best performing value among a set of candidates. In <xref ref-type="bibr" rid="B48">Gharavian et al. (2013)</xref>, for instance, <italic>k</italic> is varied from 1 to the number of training samples.</p>
</sec>
</sec>
</sec>
<sec id="s3-2">
<label>3.2</label>
<title>The Deep Learning Revolution</title>
<p>Most of the methods we have discussed so far are characterized by relatively &#x201c;shallow&#x201d; architectures. This has two main consequences: first, their representational power can be fairly limited and, second, their input often consists of high-level features manually extracted from raw data by human experts.</p>
<p>DL is a relatively recent class of ML methods that provides a new set of tools able to cope with the aforementioned shortcomings of traditional approaches. Essentially, DL techniques arise as an extension of classical ANNs. DL models, in their simplest form, can be seen as standard ANNs with the addition of multiple hidden layers between the network&#x2019;s input and output. An increasingly large corpus of empirical results has shown that these models are characterized by a superior representational power compared to shallow architectures. Once deep networks are trained, their inputs pass through a nested series of consecutive computations, resulting in the extraction of a set of complex features that are highly informative for the task of interest. This characteristic is one of the hallmarks of DL and can be seen as one of the key factors of its success.</p>
<p>In light of its improved representational power, its ability to automatically extract complex features, its dramatic achievements across different engineering fields and its multiple dedicated freely available software libraries (<xref ref-type="bibr" rid="B77">Jia et al., 2014</xref>; <xref ref-type="bibr" rid="B1">Abadi et al., 2016</xref>; <xref ref-type="bibr" rid="B183">Theano Development Team, 2016</xref>; <xref ref-type="bibr" rid="B144">Paszke et al., 2019</xref>), DL has the potential to provide effective solutions also in the context of PHM applications. Big data handling, automated end-to-end feature extraction from different data structures (e.g., images, time-series) and improved generalization are some of the targets on which DL models can make a difference compared to traditional ML approaches.</p>
<p>In the following, we introduce some of the most popular DL techniques used in PHM. Specifically, we focus on Autoencoder (AE) architectures, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and some of their variants and combinations. For each model, we list some interesting applications both in the context of fault diagnosis and prognosis.</p>
<sec id="s3-2-1">
<label>3.2.1</label>
<title>Methods and Techniques</title>
<sec id="s3-2-1-1">
<label>3.2.1.1</label>
<title>Autoencoders</title>
<p>AEs, in their simplest form, consist of feed-forward neural networks that are trained to output a reconstructed version of their input. They are composed of two sub-networks, namely an encoder and a decoder. The encoder, <inline-formula id="inf12">
<mml:math>
<mml:mi mathvariant="normal">h</mml:mi>
</mml:math>
</inline-formula>, implements a mapping from the input space to a typically lower-dimensional space. More concretely, we have:<disp-formula id="e3">
<mml:math>
<mml:mrow>
<mml:mi mathvariant="normal">h</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c8;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <inline-formula id="inf13">
<mml:math>
<mml:mrow>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the input vector, <italic>&#x3c8;</italic> is the activation function and <inline-formula id="inf14">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf15">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>q</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> are the parameters of the encoder. The decoder implements a mapping from the embedding to the input space in order to reconstruct the original input vector. In formulas:<disp-formula id="e4">
<mml:math>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c8;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mi mathvariant="normal">h</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf16">
<mml:math>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the reconstructed input vector and <inline-formula id="inf17">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf18">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> are the parameters of the decoder. Given a dataset of <italic>N</italic> data instances <inline-formula id="inf19">
<mml:math>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, the accuracy of the model can be measured with, for example, the Root-Mean-Squared-Error (RMSE), which evaluates the reconstruction error made by the autoencoder:<disp-formula id="e5">
<mml:math>
<mml:mrow>
<mml:mi mathvariant="normal">RMSE</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>In the equation above, the symbol <italic>&#x3b8;</italic> has been used to indicate the parameters of the network, i.e., <inline-formula id="inf20">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The values of the parameters are found by minimizing the RMSE w.r.t. the parameters <italic>&#x3b8;</italic> of the model. <xref ref-type="fig" rid="F5">Figure 5</xref> shows an illustration of the typical AE architecture.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Typical Autoencoder architecture.</p>
</caption>
<graphic xlink:href="frai-03-578613-g005.tif"/>
</fig>
<p>Note that the model assumes a so-called bottle-neck shape, characterized by an embedding space with a lower dimension than the input space. By setting <inline-formula id="inf21">
<mml:math>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>&#x3c;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, we can force the algorithm to find a more expressive representation of the input by discarding redundant pieces of information and keeping only those most relevant for the reconstruction task. It is important to point out that here we have limited our description to a one-hidden-layer architecture for the sake of simplicity. However, deep models can be obtained simply by stacking multiple consecutive hidden layers, following the bottle-neck architecture.</p>
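Equations 3&#x2013;5 can be rendered directly in a few lines of numpy. This is a minimal sketch of a single forward pass and its reconstruction error, assuming a sigmoid activation, randomly initialized (untrained) weights, and the squared Euclidean norm per instance inside the RMSE.

```python
import numpy as np

rng = np.random.default_rng(42)
d, q = 8, 3                                      # input and embedding dims, q < d (bottleneck)
W1, b1 = rng.normal(size=(q, d)), np.zeros(q)    # encoder parameters (Eq. 3)
W2, b2 = rng.normal(size=(d, q)), np.zeros(d)    # decoder parameters (Eq. 4)
psi = lambda z: 1.0 / (1.0 + np.exp(-z))         # sigmoid activation (an assumption)

def reconstruct(x):
    h = psi(W1 @ x + b1)                         # encoder: h = psi(W1 x + b1)
    return psi(W2 @ h + b2)                      # decoder: x~ = psi(W2 h + b2)

def rmse(X):
    # Eq. 5: root-mean-squared reconstruction error over N data instances
    return np.sqrt(np.mean([np.sum((x - reconstruct(x)) ** 2) for x in X]))
```

Training would then minimize `rmse` over `W1, W2, b1, b2` by gradient descent, as described in the text.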
<p>There exist several more powerful extensions of the basic AE discussed above. Some examples include Sparse AEs (SAEs) (<xref ref-type="bibr" rid="B137">Ng et al., 2011</xref>), denoising AEs (DAEs) (<xref ref-type="bibr" rid="B188">Vincent et al., 2008</xref>) and variational AEs (VAEs) (<xref ref-type="bibr" rid="B87">Kingma and Welling, 2013</xref>). Sparse AEs regularize the standard AE loss function with an additional term that forces the model to learn sparse features. This regularization term can be, for instance, the <inline-formula id="inf22">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="double-struck">L</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> norm of the activations:<disp-formula id="e6">
<mml:math>
<mml:mrow>
<mml:mi mathvariant="normal">Loss</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">RMSE</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:mo>&#x7c;</mml:mo>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>where <inline-formula id="inf23">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the <italic>i</italic>th component of the embedding <inline-formula id="inf24">
<mml:math>
<mml:mi mathvariant="normal">h</mml:mi>
</mml:math>
</inline-formula>. Alternatively, one can consider the KL divergence between the average <italic>i</italic>th activation and a small sparsity parameter &#x3b1;, yielding the following loss:<disp-formula id="e7">
<mml:math>
<mml:mrow>
<mml:mi mathvariant="normal">Loss</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">RMSE</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:mi>K</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>where <inline-formula id="inf25">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>m</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <italic>m</italic> is the number of training examples.</p>
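Both sparsity penalties (Eqs 6 and 7) are straightforward to compute given a matrix of embedding activations, one row per training example. The sketch below assumes activations in (0, 1) (e.g., from a sigmoid) and illustrative values for &#x3bb; and &#x3b1;.

```python
import numpy as np

def l1_sparse_loss(rmse_val, H, lam=1e-3):
    # Eq. 6: RMSE plus an L1 penalty on the embedding activations
    return rmse_val + lam * np.sum(np.abs(H))

def kl_sparse_loss(rmse_val, H, alpha=0.05, lam=1e-3):
    # Eq. 7: RMSE plus the KL divergence between the target sparsity alpha
    # and the mean activation rho_i of each embedding unit over the data
    rho = H.mean(axis=0).clip(1e-8, 1 - 1e-8)    # assumes activations in (0, 1)
    kl = (alpha * np.log(alpha / rho)
          + (1 - alpha) * np.log((1 - alpha) / (1 - rho)))
    return rmse_val + lam * np.sum(kl)
```

When the average activation of every unit equals &#x3b1;, the KL term vanishes and the loss reduces to the plain RMSE.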
<p>DAEs take as input a corrupted version of the data and aim to output a reconstructed version of the original, uncorrupted data. The underlying assumption is that the algorithm is thereby forced to retain only the most informative parts of the input distribution in order to recover the uncorrupted data instance.</p>
<p>VAEs differ from the previous AE techniques in that they belong to the class of generative models. They aim at learning a parametric latent variable model through the maximization of a lower bound on the marginal log-likelihood of the training data. The goal of these approaches is to learn a so-called disentangled representation of the latent space, i.e., a representation where the most relevant independent factors of variation in the data are decoupled and clearly separated. To conclude this part, it is worth mentioning that it is possible to design autoencoders where the encoder and the decoder are not limited to simple feed-forward neural networks but can also assume the form of CNNs and RNNs. We discuss these methods later in this section.</p>
</sec>
<sec id="s3-2-1-2">
<label>3.2.1.2</label>
<title>Convolutional Neural Networks</title>
<p>CNNs are some of the most successful and widely applied DL models. They reached the peak of their popularity thanks to their state-of-the-art performance in CV tasks, such as IR, pose estimation and object tracking. They have also been successfully applied in the contexts of NLP, Reinforcement Learning and time-series modeling. Their design draws inspiration from the organization of the animal visual cortex (<xref ref-type="bibr" rid="B67">Hubel and Wiesel, 1968</xref>). Indeed, single cortical neurons fire in response to stimuli received from relatively narrow regions of the visual field called receptive fields. Furthermore, neurons that are close to each other are often associated with similar and partially overlapping receptive fields, allowing them to map the whole visual field. These properties are useful for recognizing specific features in natural images independently of their location.</p>
<p>CNNs implement these concepts by modifying the way computations are usually performed in standard feed-forward neural networks. In particular, CNNs convolve the input image with filters composed of learnable parameters. These parameters are trained to automatically extract features from the image in order to perform the task specified by a final loss function.</p>
<p>The standard CNN model shown in <xref ref-type="fig" rid="F6">Figure 6</xref> is composed of a set of elementary consecutive blocks. First, the input layer defines the data structure. A convolutional layer follows the input layer and performs the convolution operation over the input data. The dimensionality of the filters depends on the input structure: two-dimensional filters are used for grid-like inputs, whereas one-dimensional filters are used for time-series. Each filter has a user-specified size, which defines its receptive field. Batch normalization (<xref ref-type="bibr" rid="B70">Ioffe and Szegedy, 2015</xref>) is often applied right after the convolutional module in order to reduce the so-called covariate shift phenomenon and introduce a regularization effect. Then, a point-wise nonlinear activation function (e.g., ReLU) is applied.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Typical 1D-CNN architecture. Adapted from <xref ref-type="bibr" rid="B79">Jiao et al. (2020)</xref>.</p>
</caption>
<graphic xlink:href="frai-03-578613-g006.tif"/>
</fig>
<p>The convolutional layer is then followed by a so-called pooling layer, whose role is to reduce the number of parameters by sub-sampling the filtered signals. One common strategy for this operation, called max-pooling, consists of extracting only the maximum value of each fixed-size window of consecutive inputs.</p>
<p>Several instances of convolutional and pooling layers are typically alternated through the network. The final filtered signals are then flattened and fed into a sequence of fully-connected layers that map them into the output layer. The dropout technique (<xref ref-type="bibr" rid="B171">Srivastava et al., 2014</xref>) can be used both between the fully-connected and the convolutional layers in order to counteract overfitting.</p>
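The elementary building blocks described above (convolution, ReLU activation and max-pooling) can be sketched in plain numpy for the one-dimensional case; the &#x201c;valid&#x201d; padding and non-overlapping pooling windows are illustrative choices rather than requirements.

```python
import numpy as np

def conv1d(x, w, b=0.0):
    # "valid" 1-D convolution (cross-correlation, as in most DL frameworks):
    # slide the learnable filter w over the signal x
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) + b for i in range(n)])

def relu(z):
    # point-wise nonlinear activation
    return np.maximum(z, 0.0)

def max_pool1d(x, size):
    # keep only the maximum of each non-overlapping window of `size` inputs,
    # sub-sampling the filtered signal
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)
```

A full network would stack several such conv/pool stages, learn the filter weights by backpropagation, and end with flattening plus fully-connected layers, as in Figure 6.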
</sec>
<sec id="s3-2-1-3">
<label>3.2.1.3</label>
<title>Recurrent Neural Networks</title>
<p>RNNs form another class of DL methods that has achieved impressive results in a wide variety of ML fields. RNNs are especially effective in processing data characterized by a sequential structure. Such data are widespread in fields such as NLP, Speech Recognition, Machine Translation and Sentiment Analysis, to name a few, where recurrent architectures have been employed successfully. Given their particular suitability for analyzing sequential data, it is not surprising that RNN models have been widely applied in the context of PHM applications. We review some of these applications later in this section.</p>
<p>The architecture of the simplest possible recurrent model is shown in <xref ref-type="fig" rid="F7">Figure 7</xref>.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Most elementary RNN architecture.</p>
</caption>
<graphic xlink:href="frai-03-578613-g007.tif"/>
</fig>
<p>Given a sequential input vector <inline-formula id="inf26">
<mml:math>
<mml:mrow>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf27">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> at each time-step the RNN shown above performs the following operations:<disp-formula id="e8">
<mml:math>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi mathvariant="normal">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="normal">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="normal">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(8)</label>
</disp-formula>where, <inline-formula id="inf28">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf29">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf30">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">W</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf31">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf32">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">b</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the parameters of the model, <inline-formula id="inf33">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf34">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are activation functions, <inline-formula id="inf35">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the so-called hidden state at time <italic>t</italic> and <inline-formula id="inf36">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the output at time <italic>t</italic>. Predictions are performed at each time step by mapping the current hidden state to the output, <inline-formula id="inf37">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, through a nonlinear activation. The hidden state is constantly updated at each iteration by combining the previous hidden state and the current input. This allows the network to store past information and propagate it over time. The basic architecture described above, however, suffers from the so-called vanishing gradient problem. This phenomenon is caused by the structure of simple RNNs, which typically perform the composition of the same function sequentially at each time step. As shown by <xref ref-type="bibr" rid="B13">Bengio et al. (1994)</xref>, this results in increasingly small magnitudes associated with the gradients of long-term interactions. To cope with this problem, a number of refinements have been introduced to the elementary architecture discussed before. The most popular ones are arguably the Long Short-Term Memory (LSTM) (<xref ref-type="bibr" rid="B63">Hochreiter and Schmidhuber, 1997</xref>), Bidirectional RNNs (Bi-RNN) (<xref ref-type="bibr" rid="B167">Schuster and Paliwal, 1997</xref>) and Gated Recurrent Units (GRUs) (<xref ref-type="bibr" rid="B26">Cho et al., 2014</xref>). These techniques have been widely applied to PHM over the last few years, both for diagnosis and prognosis tasks. Current state-of-the-art methods in NLP complement the aforementioned recurrent architectures with the so-called attention mechanism (<xref ref-type="bibr" rid="B30">Devlin et al., 2018</xref>), which has resulted in significant performance improvements. Despite their success in NLP and related fields, attention-based networks have so far found few applications in PHM, pointing to a potentially fruitful research direction.</p>
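A forward pass through the elementary RNN of Eq. 8 can be sketched as follows, assuming tanh for both activation functions &#x3c8;<sub>1</sub> and &#x3c8;<sub>2</sub> and small random (untrained) weights; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_hidden, n_out = 4, 5, 2
W1 = rng.normal(scale=0.1, size=(n_hidden, d))         # input  -> hidden
W2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (recurrence)
W3 = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output
b1, b2 = np.zeros(n_hidden), np.zeros(n_out)

def rnn_forward(xs):
    # Eq. 8: h_t = psi1(W1 x_t + W2 h_{t-1} + b1),  o_t = psi2(W3 h_t + b2)
    h = np.zeros(n_hidden)                # initial hidden state
    outputs = []
    for x_t in xs:
        h = np.tanh(W1 @ x_t + W2 @ h + b1)    # combine input with past state
        outputs.append(np.tanh(W3 @ h + b2))   # prediction at time t
    return outputs
```

Note that the same weights are reused at every time step; it is exactly this repeated composition that gives rise to the vanishing gradient problem discussed in the text.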
</sec>
</sec>
<sec id="s3-2-2">
<label>3.2.2</label>
<title>Diagnosis</title>
<sec id="s3-2-2-1">
<label>3.2.2.1</label>
<title>Autoencoder</title>
<p>AEs provide a first example of how DL methods can overcome some of the limitations of classical approaches. Indeed, typically AEs are used to automatically extract complex and meaningful features from raw data or to obtain more informative representations of a set of already extracted features. AEs have been applied to data gathered from several machines and industrial components, such as rolling element bearings (<xref ref-type="bibr" rid="B76">Jia et al., 2016</xref>; <xref ref-type="bibr" rid="B107">Liu et al., 2016</xref>; <xref ref-type="bibr" rid="B116">Lu et al., 2016</xref>; <xref ref-type="bibr" rid="B75">Jia et al., 2018</xref>), gearboxes (<xref ref-type="bibr" rid="B75">Jia et al., 2018</xref>), electrical generators (<xref ref-type="bibr" rid="B130">Michau et al., 2017</xref>; <xref ref-type="bibr" rid="B128">Michau et al.,</xref> <xref ref-type="bibr" rid="B128">2019</xref>), wind turbines (<xref ref-type="bibr" rid="B214">Yang et al., 2016</xref>), chemical industrial plants (<xref ref-type="bibr" rid="B119">Lv et al., 2017</xref>), induction motors (<xref ref-type="bibr" rid="B177">Sun et al., 2016b</xref>), air compressors (<xref ref-type="bibr" rid="B184">Thirukovalluru et al., 2016</xref>), hydraulic pumps (<xref ref-type="bibr" rid="B232">Zhu et al., 2015</xref>), transformers (<xref ref-type="bibr" rid="B190">Wang et al., 2016</xref>), spacecrafts (<xref ref-type="bibr" rid="B100">Li and Wang, 2015</xref>) and gas turbine combustors (Yan and Yu, 2019).</p>
<p>As mentioned before, AEs are often used in combination with other classifiers, such as simple softmax classifiers (<xref ref-type="bibr" rid="B107">Liu et al., 2016</xref>), feed-forward neural networks (<xref ref-type="bibr" rid="B177">Sun et al., 2016b</xref>), RFs (<xref ref-type="bibr" rid="B184">Thirukovalluru et al., 2016</xref>) and SVMs (<xref ref-type="bibr" rid="B177">Sun et al., 2016b</xref>; <xref ref-type="bibr" rid="B119">Lv et al., 2017</xref>). In <xref ref-type="bibr" rid="B177">Sun et al. (2016b)</xref>, feed-forward NNs trained on top of the features learned by the AE model provide excellent classification results in terms of fault diagnosis accuracy. An SVM trained on the same features performs only slightly worse. <xref ref-type="bibr" rid="B107">Liu et al. (2016)</xref> propose a combination of stacked SAEs and a softmax classifier for element bearings fault diagnosis. Short-time-Fourier transformed raw inputs undergo several nonlinear transformations implemented by the sparse AEs. The resulting features are fed into a softmax classifier which outputs the classification results.</p>
<p>
<xref ref-type="bibr" rid="B116">Lu et al. (2016)</xref> compare the features extracted by stacked DAEs with a set of manually extracted features. The comparison is based on the fault classification accuracies achieved by an SVM and a RF model trained on top of the two classes of features. The results show that the former features possess greater discriminative power for the task under consideration.</p>
<p>Another interesting application of AEs is shown in the work of <xref ref-type="bibr" rid="B76">Jia et al. (2016)</xref>. Here, the nonlinear mapping implemented by deep AEs is exploited to pre-train an ANN which is in turn used to perform fault diagnosis both on rolling element bearings and planetary gearboxes. More specifically, the weights between two hidden layers are initialized by training an AE to minimize the reconstruction error of the input values specified by the first hidden layer. With this pre-training strategy, the feature extraction ability of AEs is used to encode relevant properties of the data directly into the ANN weight configuration.</p>
<p>AE architectures can also be used to estimate a health indicator which measures the &#x201c;distance&#x201d; of a test data point from the healthy training class (<xref ref-type="bibr" rid="B130">Michau et al., 2017</xref>; <xref ref-type="bibr" rid="B128">Michau et al.,</xref> <xref ref-type="bibr" rid="B128">2019</xref>; <xref ref-type="bibr" rid="B196">Wen and Gao, 2018</xref>). For example, in the work of <xref ref-type="bibr" rid="B128">Michau et al. (2019)</xref>, a system comprising an AE and a one-class classifier is trained with only healthy data to assess the health state of a complex electricity production plant. In this work, both the AE and the one-class classifier have the structure of a particular type of neural network called an Extreme Learning Machine (ELM). ELM-based AEs have also been successfully employed in <xref ref-type="bibr" rid="B130">Michau et al. (2017)</xref> and <xref ref-type="bibr" rid="B214">Yang et al. (2016)</xref>, among others.</p>
</sec>
<sec id="s3-2-2-2">
<label>3.2.2.2</label>
<title>Convolutional Neural Networks</title>
<p>CNNs are particularly advantageous in the context of fault diagnosis since they implement the feature extraction and classification tasks in an end-to-end fashion. Moreover, they can be applied to several data structures, including both time-series and images (<xref ref-type="bibr" rid="B79">Jiao et al., 2020</xref>). A common strategy to employ 2D-CNNs<xref ref-type="fn" rid="FN6">
<sup>6</sup>
</xref> in PHM applications is to feed these models with image-like data. This poses the problem of how to convert sensor measurements, which are typically in the form of multivariate time-series, into a grid-like structure. Examples of this procedure can be found in <xref ref-type="bibr" rid="B31">Ding and He (2017)</xref>, <xref ref-type="bibr" rid="B178">Sun et al. (2017)</xref>, <xref ref-type="bibr" rid="B53">Guo et al. (2018b)</xref>, <xref ref-type="bibr" rid="B199">Wen et al. (2018)</xref>, <xref ref-type="bibr" rid="B19">Cao et al. (2019)</xref>, <xref ref-type="bibr" rid="B72">Islam and Kim (2019a)</xref>, <xref ref-type="bibr" rid="B98">Li et al. (2019a)</xref>, <xref ref-type="bibr" rid="B189">Wang et al. (2019)</xref>. Most of these works employ popular signal processing techniques to perform the two-dimensional mapping. In particular, <xref ref-type="bibr" rid="B98">Li et al. (2019a)</xref> use the S-transform to map bearing vibrational data into a time-frequency representation. Similarly, in <xref ref-type="bibr" rid="B31">Ding and He (2017)</xref>, <xref ref-type="bibr" rid="B178">Sun et al. (2017)</xref>, <xref ref-type="bibr" rid="B53">Guo et al. (2018b)</xref>, <xref ref-type="bibr" rid="B19">Cao et al. (2019)</xref>, <xref ref-type="bibr" rid="B72">Islam and Kim (2019a)</xref>, transformations based on the wavelet transform are used to process data gathered from bearings, rotating machinery and gears. An additional strategy is proposed in <xref ref-type="bibr" rid="B199">Wen et al. (2018)</xref>, where the following mapping is applied to convert time-series data into two-dimensional images:<disp-formula id="e9">
<mml:math>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>round</mml:mtext>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mtext>Min</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mtext>Max</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mtext>Min</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>255</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>where the input signal is a vector of size <inline-formula id="inf38">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mi>M</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf39">
<mml:math>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the signal magnitude at the <italic>j</italic>th time step, and <inline-formula id="inf40">
<mml:math>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the intensity of the <inline-formula id="inf41">
<mml:math>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> pixel in the output image. This technique has been applied to data extracted from rolling element bearings and from hydraulic and centrifugal pumps, yielding nearly optimal fault classification accuracy in all three cases.</p>
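For concreteness, the mapping of Eq. 9 can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code; the function name and the toy example are ours.

```python
def signal_to_image(L, M):
    """Map a 1D signal of length M*M to an M x M grayscale image (Eq. 9).

    Pixel (j, k) (1-indexed, as in the paper) is the min-max normalized
    amplitude L((j-1)*M + k), rescaled to the 0-255 intensity range.
    """
    assert len(L) == M * M, "input signal must have length M^2"
    lo, hi = min(L), max(L)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    return [
        [round((L[(j - 1) * M + (k - 1)] - lo) / span * 255) for k in range(1, M + 1)]
        for j in range(1, M + 1)
    ]

# toy 2x2 example: the minimum amplitude maps to 0, the maximum to 255
image = signal_to_image([0.0, 1.0, 2.0, 4.0], 2)
```

Each row of the resulting image corresponds to one contiguous chunk of the signal, so local temporal patterns become local spatial patterns that a 2D-CNN can pick up.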
<p>Another class of methods applies CNNs directly to image data, thus leveraging the great success of these architectures in CV tasks. For example, <xref ref-type="bibr" rid="B74">Janssens et al. (2018)</xref>; <xref ref-type="bibr" rid="B78">Jia et al. (2019)</xref> use CNNs to perform fault diagnosis of rotating machinery based on infrared thermal videos and images, respectively. <xref ref-type="bibr" rid="B221">Yuan et al. (2018)</xref> propose a method that fuses features extracted from different data structures, including infrared images, for CNN-based fault classification of a rotor system.</p>
<p>As an alternative to 2D-CNNs, 1D-CNNs can be used to process time-series data directly. The literature contains a large number of examples applying 1D-CNNs to bearing (<xref ref-type="bibr" rid="B37">Eren, 2017</xref>; <xref ref-type="bibr" rid="B24">Chen et al., 2018</xref>; <xref ref-type="bibr" rid="B38">Eren et al., 2019</xref>; <xref ref-type="bibr" rid="B149">Qin et al., 2019</xref>; <xref ref-type="bibr" rid="B206">Xueyi et al., 2019</xref>) and gear (<xref ref-type="bibr" rid="B80">Jing et al., 2017</xref>; <xref ref-type="bibr" rid="B215">Yao et al., 2018</xref>; <xref ref-type="bibr" rid="B57">Han et al., 2019b</xref>) fault diagnosis. <xref ref-type="bibr" rid="B24">Chen et al. (2018)</xref>, for instance, propose a novel DL model based on the popular Inception architecture (<xref ref-type="bibr" rid="B180">Szegedy et al., 2015</xref>) and a particular type of dilated convolution (<xref ref-type="bibr" rid="B65">Holschneider et al., 1990</xref>). The model is trained with data generated from artificial bearing damages and achieves very good performance on real data. The proposed method requires no pre-processing, since it takes raw temporal signals directly as input.</p>
<p>The ability of CNN architectures to extract features in an end-to-end manner is tested in <xref ref-type="bibr" rid="B80">Jing et al. (2017)</xref>. Here, the authors compare the quality of these features with a number of benchmarks consisting of conventional feature engineering approaches. The results show the superiority of the feature-learning pipeline implemented by CNNs over manual feature extraction.</p>
<p>Finally, CNNs have also been applied to generate health indicators and to estimate the degradation trend of rolling bearings (<xref ref-type="bibr" rid="B52">Guo et al., 2018a</xref>; <xref ref-type="bibr" rid="B216">Yoo and Baek, 2018</xref>). In <xref ref-type="bibr" rid="B216">Yoo and Baek (2018)</xref>, for instance, the authors apply a continuous wavelet transform to the data and feed the resulting two-dimensional images into a 2D-CNN which, in turn, outputs the health indicator.</p>
</sec>
<sec id="s3-2-2-3">
<label>3.2.2.3</label>
<title>Recurrent Neural Networks</title>
<p>RNNs have mainly been used for fault prognosis, and only a relatively small number of works focus on their application to fault diagnosis. Some examples are (<xref ref-type="bibr" rid="B101">Li et al., 2018a</xref>; <xref ref-type="bibr" rid="B102">Li et al., 2018b</xref>; <xref ref-type="bibr" rid="B150">Qiu et al., 2019</xref>) for bearings; (<xref ref-type="bibr" rid="B226">Zhao H. et al., 2018</xref>; <xref ref-type="bibr" rid="B227">Zhao Q. et al., 2018</xref>; <xref ref-type="bibr" rid="B219">Yuan and Tian, 2019</xref>) for chemical process control [see the Tennessee Eastman dataset (<xref ref-type="bibr" rid="B22">Chen, 2019</xref>)]; and (<xref ref-type="bibr" rid="B93">Lei et al., 2019</xref>) for wind turbines.</p>
<p>These methods can be divided into two categories: &#x201c;RNN &#x2b; classifier&#x201d; and end-to-end approaches. The works of <xref ref-type="bibr" rid="B101">Li et al. (2018a</xref>, <xref ref-type="bibr" rid="B102">2018b</xref>) and <xref ref-type="bibr" rid="B219">Yuan and Tian (2019)</xref> belong to the first category. The former employ an LSTM-based architecture to extract informative features from the input data. The resulting features are then fed into a softmax classifier that performs fault classification. <xref ref-type="bibr" rid="B219">Yuan and Tian (2019)</xref> use a GRU network to obtain dynamic features from several sub-sequences extracted from the raw signals. Multi-class classification is performed by a final softmax layer fed with the features obtained by the GRU module.</p>
<p>
<xref ref-type="bibr" rid="B226">Zhao H. et al. (2018)</xref>; <xref ref-type="bibr" rid="B227">Zhao Q. et al. (2018)</xref>; <xref ref-type="bibr" rid="B150">Qiu et al. (2019)</xref>; <xref ref-type="bibr" rid="B93">Lei et al. (2019)</xref> use RNN architectures in an end-to-end manner. For instance, <xref ref-type="bibr" rid="B150">Qiu et al. (2019)</xref> use a variant of Bi-LSTMs, specifically designed to process long-term dependencies, to directly classify fault types. The network is trained with a set of features extracted by means of the wavelet packet transform and employs softsign activation functions to counteract the vanishing gradient problem. Another end-to-end approach is proposed in <xref ref-type="bibr" rid="B93">Lei et al. (2019)</xref>, where the authors use an LSTM-based model for fault diagnosis of a wind turbine. In this work, features are extracted directly by the network and there is no need for manual feature extraction. The proposed method is shown to outperform existing fault diagnosis techniques such as ANNs, SVMs, and CNNs.</p>
</sec>
<sec id="s3-2-2-4">
<label>3.2.2.4</label>
<title>Hybrid</title>
<p>By hybrid approaches we mean all those methods that combine the benefits of AE, CNN, and RNN models into a single, more powerful system.</p>
<p>For example, <xref ref-type="bibr" rid="B104">Li et al. (2019d)</xref>; <xref ref-type="bibr" rid="B143">Park et al. (2019)</xref> propose techniques leveraging the efficacy of AEs in extracting valuable features and the advantages provided by RNN architectures in analyzing time-dependent data. In <xref ref-type="bibr" rid="B104">Li et al. (2019d)</xref>, stacked AEs first generate a latent representation of the raw input rotary machinery data. An LSTM network is then used to predict the value corresponding to the 10th time step in the feature sequence given the previous nine. The error between the prediction and the ground-truth value is used to determine whether the input is anomalous.</p>
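The decision rule in Li et al. (2019d) — flag an input as anomalous when the prediction error on the last step of a window exceeds a threshold — can be sketched as below. The LSTM-over-AE-features predictor of the cited work is beyond a short sketch, so a naive persistence forecaster stands in for it here; the threshold value is likewise an illustrative assumption.

```python
def is_anomalous(window, predict, threshold):
    """Flag a 10-step window as anomalous when the error of a one-step-ahead
    forecast of its last value exceeds a threshold.

    `predict` maps the first nine values to a forecast of the tenth; in the
    cited work this role is played by an LSTM over stacked-AE features.
    """
    history, target = window[:9], window[9]
    error = abs(predict(history) - target)
    return error > threshold

# naive stand-in predictor: persistence (forecast = last observed value)
persistence = lambda history: history[-1]

normal = [1.0] * 10         # flat signal: persistence predicts it well
spiked = [1.0] * 9 + [9.0]  # sudden jump: large prediction error
```

The same skeleton works for any predictor: only `predict` changes, while the thresholding logic that turns a prediction error into an anomaly decision stays the same.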
<p>An alternative approach consists of using recurrent models in the form of AEs to better deal with time-series data. In <xref ref-type="bibr" rid="B108">Liu et al. (2018)</xref>, for instance, a GRU-based DAE is proposed for rolling bearing fault diagnosis. Specifically, the proposed GRU model is used to predict the next period of the signal given the previous one. One such model is trained per fault type, and classification is performed by selecting the model providing the lowest reconstruction error.</p>
<p>CNN-based architectures can also be combined with other types of networks for the purpose of fault diagnosis. In <xref ref-type="bibr" rid="B110">Liu et al. (2019b)</xref>, for instance, a one-dimensional convolutional DAE is proposed to extract features from bearing and gearbox data. The model is given corrupted time-series as input, and its goal is to clean and reconstruct them at the output level. The learned features are then fed into an additional CNN model that performs the classification task.</p>
<p>In <xref ref-type="bibr" rid="B227">Zhao et al. (2017)</xref>, <xref ref-type="bibr" rid="B141">Pan et al. (2018)</xref>, <xref ref-type="bibr" rid="B206">Xueyi et al. (2019)</xref>, the combination of CNNs and RNNs is investigated. For example, in <xref ref-type="bibr" rid="B206">Xueyi et al. (2019)</xref> a 1D-CNN and a GRU network are used to extract discriminative features from acoustic and vibration signals, respectively. The resulting features are then concatenated and fed into a softmax classifier which performs gear pitting fault diagnosis. This hybrid method is shown to outperform CNN and GRU models applied individually to the same data.</p>
<p>
<xref ref-type="bibr" rid="B141">Pan et al. (2018)</xref>, instead, propose a method fusing a 1D-CNN and an LSTM network into a single structure. The LSTM takes as input the output of the CNN and performs fault diagnosis over bearing data. The proposed algorithm provides nearly optimal performance on the test set.</p>
</sec>
</sec>
<sec id="s3-2-3">
<label>3.2.3</label>
<title>Prognosis</title>
<sec id="s3-2-3-1">
<label>3.2.3.1</label>
<title>Autoencoder</title>
<p>AEs are typically used in combination with other regression techniques for the purpose of fault prognosis. The literature contains examples of AE-based techniques applied to RUL estimation of bearings (<xref ref-type="bibr" rid="B154">Ren et al., 2018</xref>; <xref ref-type="bibr" rid="B204">Xia et al., 2019</xref>), machining centers (<xref ref-type="bibr" rid="B207">Yan et al., 2018</xref>), aircraft engines (<xref ref-type="bibr" rid="B120">Ma et al., 2018</xref>) and lithium-ion batteries (<xref ref-type="bibr" rid="B156">Ren et al., 2018b</xref>). The role of AEs in all the above references is to perform automatic feature extraction to facilitate the work of regression or classification methods used for health state assessment or RUL estimation. <xref ref-type="bibr" rid="B204">Xia et al. (2019)</xref>, for example, utilize a DAE and a softmax classifier trained on top of the AE embedding to classify the inputs into different degradation stages. Then, ANN-based regressors are used to model each stage separately. The final RUL is obtained by applying a smoothing operation to all the previously computed regression models.</p>
<p>In <xref ref-type="bibr" rid="B120">Ma et al. (2018)</xref>, AEs are used in a similar manner. The authors propose a system composed of a DAE, an SAE, and a logistic regressor to predict the RUL of an aircraft engine. The first AE module generates low-level features which are in turn fed into the second AE module, which outputs a new set of high-level features. Finally, the logistic regressor predicts the RUL based on the features extracted by the second AE.</p>
</sec>
<sec id="s3-2-3-2">
<label>3.2.3.2</label>
<title>Convolutional Neural Networks</title>
<p>CNN architectures have also been extensively explored for fault prognosis. These methods have mainly been applied to open-source evaluation platforms such as the popular NASA C-MAPSS dataset (<xref ref-type="bibr" rid="B166">Saxena and Goebel, 2008</xref>) for aero-engine unit prognostics (<xref ref-type="bibr" rid="B9">Babu et al., 2016</xref>; <xref ref-type="bibr" rid="B101">Li et al., 2018a</xref>; <xref ref-type="bibr" rid="B102">Li et al., 2018b</xref>; <xref ref-type="bibr" rid="B197">Wen et al., 2019a</xref>) and the PRONOSTIA dataset (<xref ref-type="bibr" rid="B5">Ali et al., 2015</xref>) for bearing health assessment (<xref ref-type="bibr" rid="B155">Ren et al., 2018a</xref>; <xref ref-type="bibr" rid="B233">Zhu et al., 2018</xref>; <xref ref-type="bibr" rid="B103">Li et al., 2019c</xref>; <xref ref-type="bibr" rid="B193">Wang et al., 2019b</xref>; <xref ref-type="bibr" rid="B210">Yang et al., 2019</xref>).</p>
<p>In <xref ref-type="bibr" rid="B101">Li et al., 2018a</xref>; <xref ref-type="bibr" rid="B102">Li et al., 2018b</xref>, a 1D-CNN model is used to predict the RUL on the C-MAPSS dataset. Data are first chunked into fixed-length windows and then fed directly into the network without any pre-processing step. Despite the relative simplicity of the employed architecture, the proposed technique provides good prediction results, especially in the proximity of the final failure.</p>
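The fixed-length windowing used to feed such 1D-CNNs can be sketched as follows. This is a generic illustration in pure Python; the window length and stride below are our own example values, not those of the cited works.

```python
def sliding_windows(series, length, stride=1):
    """Chunk a univariate run-to-failure series into fixed-length windows.

    Each window can then be paired with the RUL at its last time step to
    build (input, target) training examples for a 1D-CNN.
    """
    return [
        series[start:start + length]
        for start in range(0, len(series) - length + 1, stride)
    ]

# a series of 10 samples, windows of length 4 taken every 2 steps
windows = sliding_windows(list(range(10)), length=4, stride=2)
```

With a stride smaller than the window length, consecutive windows overlap, which multiplies the number of training examples extracted from a single run-to-failure trajectory.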
<p>In <xref ref-type="bibr" rid="B197">Wen et al. (2019a)</xref> the authors build upon the work of <xref ref-type="bibr" rid="B101">Li et al., 2018a</xref>; <xref ref-type="bibr" rid="B102">Li et al., 2018b</xref> and propose a novel CNN model for RUL estimation which draws inspiration from the popular ResNet architecture (<xref ref-type="bibr" rid="B60">He et al., 2016</xref>). The proposed technique is shown to outperform traditional methods such as SVMs, ANNs, LSTMs, and the model proposed by <xref ref-type="bibr" rid="B101">Li et al., 2018a</xref>; <xref ref-type="bibr" rid="B102">Li et al., 2018b</xref> in terms of RUL mean and standard deviation on the C-MAPSS dataset.</p>
<p>In the context of bearing fault prognosis, <xref ref-type="bibr" rid="B155">Ren et al. (2018a)</xref> propose a new approach based on manual feature extraction and CNNs for RUL estimation. First, a new method for feature extraction is proposed to generate a feature map which is highly correlated with the decay of bearing vibration over time. This feature map is then fed into a deep 2D-CNN which outputs the RUL estimate. Linear regression is then used as a smoothing method to reduce the discontinuity problem in the final prediction result. Experiments show that the proposed method is able to provide improved prediction accuracy in bearing RUL estimation.</p>
</sec>
<sec id="s3-2-3-3">
<label>3.2.3.3</label>
<title>Recurrent Neural Networks</title>
<p>The application of RNN architectures to fault prognosis has been explored on various industrial components such as lithium-ion batteries (<xref ref-type="bibr" rid="B223">Zhang et al., 2018</xref>), gears (<xref ref-type="bibr" rid="B205">Xiang et al., 2020</xref>), fuel cells (<xref ref-type="bibr" rid="B109">Liu et al., 2019a</xref>), and on the C-MAPSS dataset (<xref ref-type="bibr" rid="B220">Yuan et al., 2016</xref>; <xref ref-type="bibr" rid="B229">Zheng et al., 2017</xref>; <xref ref-type="bibr" rid="B202">Wu et al., 2018a</xref>; <xref ref-type="bibr" rid="B203">Wu et al., 2018b</xref>; <xref ref-type="bibr" rid="B21">Chen et al., 2019</xref>; <xref ref-type="bibr" rid="B36">Elsheikh et al., 2019</xref>; <xref ref-type="bibr" rid="B201">Wu et al., 2020</xref>). One of the most popular RNN-based approaches proposed in the literature is the work of <xref ref-type="bibr" rid="B203">Wu et al. (2018b)</xref>. The authors first extract dynamic features containing inter-frame information and then use these features to train a vanilla LSTM model to predict the RUL. An SVM model is employed to detect the degradation starting point. The proposed technique is shown to consistently outperform a standard RNN and a GRU model trained on the same dataset. The remarkable performance of LSTM networks on the RUL estimation task is further confirmed by the work of <xref ref-type="bibr" rid="B229">Zheng et al. (2017)</xref>. The authors combine LSTM layers with a feed-forward neural network, showing that the proposed approach performs better than ANNs, SVMs, and CNNs. In <xref ref-type="bibr" rid="B205">Xiang et al. (2020)</xref>, the attention mechanism is used to enhance the performance of an LSTM network in predicting the RUL of gears. The aforementioned model, named LSTMP-A, is trained with time-domain and frequency-domain features, and a comparison with other recurrent models shows that it provides the best prediction accuracy.</p>
</sec>
<sec id="s3-2-3-4">
<label>3.2.3.4</label>
<title>Hybrid</title>
<p>Hybrid approaches have also been applied in the context of fault prognosis. For instance, the literature contains examples of AE &#x2b; RNN (<xref ref-type="bibr" rid="B91">Lal Senanayaka et al., 2018</xref>; <xref ref-type="bibr" rid="B29">Deng et al., 2019</xref>) and CNN &#x2b; RNN (<xref ref-type="bibr" rid="B227">Zhao et al., 2017</xref>; <xref ref-type="bibr" rid="B122">Mao et al., 2018</xref>; <xref ref-type="bibr" rid="B99">Li et al., 2019b</xref>) combinations. In <xref ref-type="bibr" rid="B227">Zhao et al. (2017)</xref>, sensory data from milling machine cutters are processed by a novel technique combining a CNN component and an LSTM network. The CNN is used to extract local features, whereas a bi-LSTM captures long-term dependencies and takes into account both past and future contexts. A sequence of fully connected layers and a linear regression layer takes as input the output of the LSTM and predicts the tool-wear level.</p>
<p>Similarly, <xref ref-type="bibr" rid="B122">Mao et al. (2018)</xref> combine LSTM and CNN models for feature extraction and RUL prediction. In particular, time-series from the C-MAPSS dataset are first sliced by applying a time window. The resulting data are then independently fed into an LSTM network and a CNN. The features extracted by these two networks are then combined and further processed by an additional LSTM network and a fully connected layer which predicts the RUL.</p>
<p>
<xref ref-type="bibr" rid="B29">Deng et al. (2019)</xref> propose a method based on the combination of stacked SAEs and a GRU model. The AE is used for automatic feature extraction, and the GRU models the mapping from the features extracted by the AE to the RUL values. The proposed method is applied to the C-MAPSS dataset, showing satisfactory results.</p>
</sec>
</sec>
<sec id="s3-2-4">
<label>3.2.4</label>
<title>Discussion</title>
<sec id="s3-2-4-1">
<label>3.2.4.1</label>
<title>Dependency on Feature Extraction</title>
<p>One of the key advantages of DL algorithms over traditional ML approaches lies in their lower degree of dependence on the feature extraction step. Their input can consist of either raw data or a set of manually extracted features, depending on the amount of prior information available to the user about the task under consideration.</p>
</sec>
<sec id="s3-2-4-2">
<label>3.2.4.2</label>
<title>Model Selection</title>
<p>As already discussed for traditional ML algorithms, a universal approach valid for all possible application scenarios does not exist. In general, the nature of the problem dictates which method to utilize. For instance, when the PHM problem at hand involves image data, the usage of 2D CNNs might be preferred. On the other hand, when sensor measurements consisting of time-series data have to be analyzed, 1D CNN and RNN architectures are more sensible choices. Ultimately, the final model can be selected by evaluating each candidate on the same metrics mentioned at the end of paragraph 3.1.3.2 and comparing the corresponding scores.</p>
</sec>
<sec id="s3-2-4-3">
<label>3.2.4.3</label>
<title>Overfitting</title>
<p>As already mentioned, a larger number of hidden layers is often associated with a higher risk of overfitting. Beyond the techniques already discussed for ANNs (e.g., cross-validation, early stopping, and regularization), deep models can be equipped with more advanced tools to counteract over-training. A popular example is the Dropout technique (<xref ref-type="bibr" rid="B171">Srivastava et al., 2014</xref>), which randomly drops neurons from the neural network at training time. Intuitively, this prevents the network from specializing on a particular set of data. Dropout is used, for instance, in <xref ref-type="bibr" rid="B57">Han et al. (2019b)</xref> and <xref ref-type="bibr" rid="B189">Wang et al. (2019)</xref> with the corresponding parameter fixed at 0.5. Finally, data augmentation can also be used to generate new images by applying simple transformations (e.g., rotation, mirroring, cropping, padding) to the training data. For instance, this technique is applied in <xref ref-type="bibr" rid="B189">Wang et al. (2019)</xref> to time-frequency images obtained from bearing accelerometers, in order to increase the size and the level of diversity of the training set.</p>
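Dropout itself is a one-line idea. The sketch below shows the &#x201c;inverted&#x201d; formulation commonly used in practice, which rescales surviving activations at training time so that nothing needs to be rescaled at test time; the original formulation of Srivastava et al. (2014) instead rescales weights at test time. This is an illustration, not code from the cited works.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout over a list of unit activations.

    At training time each unit is zeroed with probability p and survivors
    are scaled by 1/(1-p), so the expected activation is unchanged; at
    test time the layer is the identity.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]
```

With p = 0.5, as used in the works cited above, each surviving activation is doubled and roughly half of the units are silenced on every forward pass, forcing the network to avoid relying on any single co-adapted feature.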
</sec>
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Critique and Future Directions</title>
<p>In the previous section, we have discussed some of the most popular DL techniques that have been applied to PHM problems over the last few years. We have compared traditional ML approaches with DL techniques, trying to highlight the strengths of both methods and emphasizing the change of paradigm introduced by the so-called DL revolution.</p>
<p>The goal of this section is to shed some light on a number of open challenges that need to be addressed to bridge the gap between research and industrial applications. We start by briefly discussing some of these open questions and some limitations of DL models that hinder their solution. Then, we discuss some first attempts to cope with these challenges, along with some proposals for future investigations. Our goal is to provide the reader with a set of possible fruitful research directions that we consider valuable candidates to further increase the impact of DL on PHM.</p>
<sec id="s4-1">
<label>4.1</label>
<title>Open Challenges</title>
<sec id="s4-1-1">
<label>4.1.1</label>
<title>Reliability and Interpretability</title>
<p>One of the most common criticisms of DL models arises from their black-box nature, i.e., the sometimes opaque mechanism by which they make their decisions. This characteristic of deep models derives from one of the properties that allows them to successfully tackle several different tasks: the complex sequence of nonlinear operations they implement across their deep architectures. A complete mathematical characterization of the behavior of DL models, in light of their inherent complexity, is very hard to obtain. This negative property of deep networks represents a significant limitation to their deployment in areas such as healthcare, finance, and PM. In these delicate contexts, humans need to have control over their tools, and it is not always possible to sacrifice trust and transparency for better performance. It is therefore urgent to enhance the level of interpretability of these models in order to make them fully deployable while minimizing the risks.</p>
<p>However, it is not straightforward to provide a unique definition of the concept of interpretability (<xref ref-type="bibr" rid="B106">Lipton, 2018</xref>). DL models can, for instance, be enhanced with complementary functionalities responsible for providing a post-hoc explanation of their actions. Alternatively, one can build some notion of interpretability directly into the models in order to constrain their learning process to align with inductive biases that we deem trustworthy. The strategy of providing post-hoc explanations of model behavior has been widely investigated in CV (<xref ref-type="bibr" rid="B157">Ribeiro et al., 2016</xref>; <xref ref-type="bibr" rid="B231">Zhou et al., 2016</xref>; <xref ref-type="bibr" rid="B118">Lundberg and Lee, 2017</xref>). Few attempts, however, have been made to extend these approaches to time-series data [see for example (<xref ref-type="bibr" rid="B39">Fawaz et al., 2019</xref>), (<xref ref-type="bibr" rid="B51">Guillem&#xe9; et al., 2019</xref>)].</p>
<p>Imposing appropriate inductive biases on DL models has recently been identified as a key step in performing unsupervised learning tasks (<xref ref-type="bibr" rid="B112">Locatello et al., 2019a</xref>; <xref ref-type="bibr" rid="B113">Locatello et al., 2019b</xref>). Some possible inductive biases can derive from a priori available physical knowledge of the problem under consideration. This complementary information can be incorporated directly into the network architecture or can be used to drive a model toward more meaningful output decisions. We discuss some of these approaches later in this section.</p>
<p>To conclude this discussion, it is worth mentioning that another important requirement for interpretable and transparent models lies in their ability to provide uncertainty estimates about their predictions. Uncertainty can derive both from the intrinsic stochasticity of the task (aleatoric uncertainty) and from the approximations introduced by our imperfect model (parametric uncertainty). Bayesian approaches can in principle deal with uncertainty estimation, and their combination with DL methods is a hot research area (<xref ref-type="bibr" rid="B27">Damianou and Lawrence, 2013</xref>; <xref ref-type="bibr" rid="B15">Blundell et al., 2015</xref>; <xref ref-type="bibr" rid="B46">Garnelo et al., 2018</xref>).</p>
</sec>
<sec id="s4-1-2">
<label>4.1.2</label>
<title>Highly Specialized Models</title>
<p>An increasing amount of experimental evidence (<xref ref-type="bibr" rid="B222">Zhang et al., 2017</xref>; <xref ref-type="bibr" rid="B12">Beery et al., 2018</xref>; <xref ref-type="bibr" rid="B7">Arjovsky et al., 2019</xref>) has recently drawn the attention of the scientific community to an additional relevant limitation of deep models: they often tend to learn &#x201c;shortcuts&#x201d; instead of the underlying physical mechanisms describing the data. For instance, consider the task of classifying cows and camels based on a training set of labeled images where cows are mostly found in green pastures and camels in sandy deserts (<xref ref-type="bibr" rid="B12">Beery et al., 2018</xref>). Testing such a model on images of cows taken in a different environment, such as beaches, leads to wrong classification decisions. Similar generalization deficiencies can also be observed in the context of PHM applications. Typically, labeled data are available only for a single machine; training a model on these data can lead to good performance on a test set extracted from the same machine but to very disappointing results on a similar machine running under slightly different operating conditions. The variability in the machines&#x2019; operational modes can arise from differences in specific design choices or from external factors (e.g., environmental variables such as humidity, temperature, seasonality). Ideally, an efficient model should be able to deal with these factors of variability and provide predictions that are robust to changing operating conditions. On the other hand, the majority of the DL approaches proposed in the literature do not address this point and focus on relatively narrow systems without taking generalization into account. If we really aim at designing &#x201c;Intelligent&#x201d; systems that can make decisions following cognitive patterns similar to those characterizing human decision making, we have to provide new solutions to the aforementioned shortcomings.</p>
</sec>
<sec id="s4-1-3">
<label>4.1.3</label>
<title>Data Scarcity</title>
<p>An immediate consequence of using DL models is that, as the depth of the network increases, the number of parameters grows accordingly. As a result, finding an optimal weight configuration requires training these networks with very large datasets. In particular, supervised learning approaches rely on the availability of large numbers of labeled data instances for each class under consideration. This aspect poses a significant practical limitation on the application of DL models in the industrial domain. In the case of fault diagnosis, for example, it is difficult to find an adequately large number of data points for each possible fault. This is mainly because, fortunately, faulty data tend to be relatively rare compared to healthy ones. Furthermore, some faults may not even be known a priori, making it impossible to precisely characterize them. This lack of representativeness (<xref ref-type="bibr" rid="B129">Michau et al., 2018</xref>) of the training data delineates a very common scenario in practical applications. Two alternative approaches can be adopted to cope with it: the first is to design algorithms that are less data-intensive, whereas the second is to generate artificial data that strongly resemble real ones. We discuss some of these methods in the next section.</p>
</sec>
</sec>
<sec id="s4-2">
<label>4.2</label>
<title>Possible Solutions</title>
<sec id="s4-2-1">
<label>4.2.1</label>
<title>Fusing Deep Learning With Physics</title>
<p>One possible way to cope with the aforementioned challenges is to incorporate information about the physics of the system under consideration into the learning process. DL algorithms, in and of themselves, are not able to capture the primitive causal mechanisms at the basis of the input observations (<xref ref-type="bibr" rid="B146">Pearl, 2019</xref>). On the other hand, physical models of complex systems are built from fundamental laws of physics but often rely on relatively strong approximations which result in poor predictive power. Taking prior physics knowledge into account can help induce a higher level of interpretability in deep models and improve their generalization performance. Hybrid models integrating the flexibility of modern data-driven techniques and the transparency of physics models have the potential to overcome the limitations of the two stand-alone approaches by exploiting their individual strengths.</p>
<p>In the context of PHM, a relatively small number of works have been proposed in this direction. For example, in <xref ref-type="bibr" rid="B20">Chao et al. (2019)</xref>, a high-fidelity performance model of an aircraft engine is first calibrated on real data by using an Unscented Kalman Filter (<xref ref-type="bibr" rid="B82">Julier and Uhlmann, 1997</xref>) and then used to generate unobserved physical quantities that are in turn employed to enhance the input space of a DL model. The results show that the new input space, including both observed and virtual measurements, contributes to significantly improving the performance of the model.</p>
<p>An alternative way to fuse physics knowledge and data-driven methods is described in <xref ref-type="bibr" rid="B34">Dourado and Viana (2020)</xref> and <xref ref-type="bibr" rid="B135">Nascimento and Viana (2019)</xref>. In these works, well-known physics-based cumulative damage models are complemented by data-driven techniques whose goal is to explain some additional phenomena that the original model is not able to accurately describe. The final model has a sound physical interpretation and provides refinements over the original physics model thanks to its data-driven component.</p>
<p>We conclude this part by noting that physics knowledge could also be incorporated into deep models directly at the architecture level. Recent research on Graph Neural Networks (<xref ref-type="bibr" rid="B162">Sanchez-Gonzalez et al., 2018</xref>; <xref ref-type="bibr" rid="B236">Cranmer et al., 2020</xref>) shows that these kinds of models are particularly suitable to encode and exploit prior physics knowledge, for instance, given in the form of Partial Differential Equations over space and time. An example of an industrial application of these models is provided by <xref ref-type="bibr" rid="B142">Park and Park (2019)</xref>, who use a specific type of GNN to estimate the power generated by a wind farm by modeling the physical interactions between the individual turbines.</p>
</sec>
<sec id="s4-2-2">
<label>4.2.2</label>
<title>Domain Adaptation</title>
<p>The high variability of machines&#x2019; operating conditions and the problem of data scarcity motivate the introduction of techniques capable of transferring the knowledge gained from a well-known machine to another for which data are not as abundant. Transfer Learning (TL) is a class of ML methods whose goal is to address this problem. Traditional TL approaches (<xref ref-type="bibr" rid="B218">Yosinski et al., 2014</xref>) are based on the following rationale: first, a deep network is trained on a large dataset to perform a specific task. Then, the same network is used to perform a similar task simply by fine-tuning its final layers on a few instances from the new dataset. Recent works in the context of fault diagnosis and fault prognosis have successfully applied this idea to datasets from induction motors (<xref ref-type="bibr" rid="B168">Shao et al., 2019a</xref>; <xref ref-type="bibr" rid="B169">Shao et al., 2019b</xref>), gearboxes (<xref ref-type="bibr" rid="B18">Cao et al., 2018</xref>; <xref ref-type="bibr" rid="B61">He et al., 2019</xref>; <xref ref-type="bibr" rid="B168">Shao et al., 2019a</xref>; <xref ref-type="bibr" rid="B169">Shao et al., 2019b</xref>), bearings (<xref ref-type="bibr" rid="B168">Shao et al., 2019a</xref>; <xref ref-type="bibr" rid="B169">Shao et al., 2019b</xref>; <xref ref-type="bibr" rid="B198">Wen et al., 2019b</xref>) and centrifugal pumps (<xref ref-type="bibr" rid="B198">Wen et al., 2019b</xref>).</p>
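The fine-tuning rationale can be sketched as follows. Everything here is a toy stand-in: `W_frozen` plays the role of the pretrained feature extractor, and only the final logistic layer is trained on the scarce target data.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature extractor from the data-rich source machine:
# its weights stay frozen during transfer.
W_frozen = rng.standard_normal((16, 8)) / 4.0
def features(x):
    return np.tanh(x @ W_frozen)

# Scarce labeled data from the target machine (toy binary fault label).
X = rng.standard_normal((20, 16))
y = (X[:, 0] > 0).astype(float)

# Fine-tune only the final (logistic) layer on the target data.
w = np.zeros(8)
F = features(X)                            # frozen representation, computed once
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w)))     # sigmoid predictions
    w -= 0.5 * F.T @ (p - y) / len(y)      # gradient step on the last layer only

accuracy = np.mean(((1.0 / (1.0 + np.exp(-(F @ w)))) > 0.5) == (y == 1.0))
```

Because only the last layer is updated, a handful of target samples suffices, which is exactly the data-scarcity regime TL targets.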
<p>Besides traditional TL methods, unsupervised Domain Adaptation (DA) techniques have also recently been applied to PHM tasks. DA is a sub-field of TL whose goal is to maximize performance on a target domain, for which only a few unlabeled data are available, by exploiting labeled data from a so-called source domain. The two domains are commonly assumed to share similar features, even though a model trained on the source domain will usually perform poorly on the target domain. This is typically due to a shift between the marginal distributions describing the two sets of data. DA techniques have attracted increasing attention since the introduction of the so-called adversarial DA methods (<xref ref-type="bibr" rid="B43">Ganin and Lempitsky, 2014</xref>; <xref ref-type="bibr" rid="B44">Ganin et al., 2016</xref>; <xref ref-type="bibr" rid="B187">Tzeng et al., 2017</xref>). These approaches draw inspiration from the training procedure of the popular Generative Adversarial Networks (GANs) (<xref ref-type="bibr" rid="B49">Goodfellow et al., 2014</xref>) to efficiently align source- and target-domain features in a common latent space. Several new techniques (<xref ref-type="bibr" rid="B56">Han et al., 2019a</xref>; <xref ref-type="bibr" rid="B191">Wang et al., 2019a</xref>; <xref ref-type="bibr" rid="B192">Wang and Liu, 2020</xref>) based on this class of DA approaches have recently been proposed in the PHM literature. Other references on DA and TL approaches in the context of fault diagnosis can be found in the recent review works of <xref ref-type="bibr" rid="B97">Li et al. (2020)</xref>, <xref ref-type="bibr" rid="B230">Zheng H. et al. (2019)</xref>, and <xref ref-type="bibr" rid="B232">Zheng Z. et al. (2019)</xref>.</p>
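The alignment objective at the heart of DA can be illustrated in miniature. The sketch below matches only the first moment of the two marginals via a learned shift; adversarial methods such as DANN learn a much richer alignment through a domain discriminator, so this is the core idea, not the cited algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source and target data share structure but have shifted marginals.
X_src = rng.standard_normal((200, 4)) + 2.0
X_tgt = rng.standard_normal((200, 4)) - 2.0

# Minimal alignment: learn a shift that matches the target feature
# statistics to the source statistics by descending the mean discrepancy.
shift = np.zeros(4)
for _ in range(200):
    gap = X_src.mean(axis=0) - (X_tgt + shift).mean(axis=0)
    shift += 0.1 * gap

residual = np.linalg.norm(X_src.mean(axis=0) - (X_tgt + shift).mean(axis=0))
```

Once the two distributions are aligned in a common space, a classifier trained on labeled source data can be applied to the unlabeled target data.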
</sec>
<sec id="s4-2-3">
<label>4.2.3</label>
<title>Artificial Data Generation</title>
<p>Generative models such as GANs and VAEs have achieved impressive results in generating photo-realistic artificial data in the context of CV. However, the task of generating realistic problem-specific time-series data is still relatively unexplored compared to artificial image generation. Unsurprisingly, existing approaches in this context make extensive use of GANs. In <xref ref-type="bibr" rid="B32">Donahue et al. (2018)</xref>, for instance, GANs are used for music and speech synthesis. In <xref ref-type="bibr" rid="B139">Nik Aznan et al. (2019)</xref>, <xref ref-type="bibr" rid="B58">Haradal et al. (2018)</xref>, and <xref ref-type="bibr" rid="B68">Hyland et al. (2017)</xref>, the authors propose new GAN-based methods that generate medical data such as electroencephalographic (EEG) brain signals and time-dependent health parameters of patients hospitalized in the Intensive Care Unit (ICU). The recent method proposed by <xref ref-type="bibr" rid="B217">Yoon et al. (2019)</xref> achieves state-of-the-art performance in realistic time-series generation.</p>
<p>The benefits of such approaches in the context of PHM could be significant. One of their most direct applications is data augmentation, which tackles the problem of lack of representativeness and thereby improves the performance of data-intensive DL models. To the authors&#x2019; knowledge, only a small number of works have started exploring this idea, and some first interesting results have already been produced (<xref ref-type="bibr" rid="B123">Mao et al., 2019</xref>; <xref ref-type="bibr" rid="B168">Shao et al., 2019a</xref>; <xref ref-type="bibr" rid="B169">Shao et al., 2019b</xref>; <xref ref-type="bibr" rid="B189">Wang et al., 2019</xref>).</p>
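The augmentation workflow looks roughly as follows. To keep the sketch self-contained, a per-sample Gaussian fitted to the real windows stands in for the trained GAN generator of the cited works; the "real" vibration data are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scarce real fault data: 10 vibration windows of 64 samples each.
t = np.linspace(0, 8 * np.pi, 64)
real = np.sin(t) + 0.1 * rng.standard_normal((10, 64))

# Stand-in generator: a per-sample Gaussian fitted to the real windows.
# In the cited works this step is replaced by a trained GAN generator.
mu, sigma = real.mean(axis=0), real.std(axis=0)
synthetic = mu + sigma * rng.standard_normal((100, 64))

# Enlarged training set for the downstream diagnosis model.
augmented = np.concatenate([real, synthetic])
```

The downstream DL model is then trained on `augmented`, mitigating the scarcity of recorded fault examples.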
</sec>
</sec>
</sec>
<sec sec-type="discussion" id="s5">
<label>5</label>
<title>Discussion</title>
<p>PM, as a key player in the Industry 4.0 paradigm, strongly relies on some of the most recent advances in hardware technology, communication systems and data science. Among them, DL techniques have gained popularity over the last few years in light of their excellent performance in processing complex data in an end-to-end fashion. In this review, we have described several applications of these methods to PHM. In particular, we have discussed the advantages they introduce over traditional ML techniques, stressing their improved representational power and their ability to automatically extract informative features from data. Despite its great success, DL presents some shortcomings that limit its large-scale deployment in industrial applications. Its low level of interpretability, its generalization deficiencies and its data-intensive nature are some of the main weaknesses DL needs to overcome to close the gap between academia and industrial deployment. In this review, we identified three research areas that we believe could address or alleviate the aforementioned open challenges, namely: physics-enhanced techniques, domain adaptation and artificial data generation. The first aims to improve interpretability by grounding data-driven methods on well-understood physics models of the system under consideration. Furthermore, incorporating prior physics knowledge into DL algorithms can be seen as imposing meaningful inductive biases on the learning process, resulting in improved generalization and reasoning. Domain adaptation provides a set of tools to transfer the knowledge acquired on a well-known industrial component to other similar assets for which data are less abundant. Finally, artificial data generation techniques can be used to cope with the lack-of-representativeness problem and the data-intensive nature of DL algorithms. Some of these lines of research have already shown interesting results, while others, although very promising, are only in their infancy.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>LB designed the study and wrote the manuscript. IK contributed to the final version of the manuscript and supervised the project.</p>
</sec>
<sec id="s7">
<title>Conflict of Interest</title>
<p>Authors IK and LB are employed by the company CSEM SA.</p>
<p>The authors declare that this study received funding from CSEM SA. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.</p>
</sec>
</body>
<back>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www2.deloitte.com/content/dam/Deloitte/us/Documents/process-and-operations/us-cons-predictive-maintenance.pdf">https://www2.deloitte.com/content/dam/Deloitte/us/Documents/process-and-operations/us-cons-predictive-maintenance.pdf</ext-link>
</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.isa.org/standards-publications/isa-publications/intech-magazine/2013/feb/automation-it-predictive-maintenance-embraces-analytics/">https://www.isa.org/standards-publications/isa-publications/intech-magazine/2013/feb/automation-it-predictive-maintenance-embraces-analytics/</ext-link>
</p>
</fn>
<fn id="FN3">
<label>3</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.ge.com/uk/sites/www.ge.com.uk/files/PAC-Predictive-Maintenance-GE-Digital-Full-report-2018.pdf">https://www.ge.com/uk/sites/www.ge.com.uk/files/PAC-Predictive-Maintenance-GE-Digital-Full-report-2018.pdf</ext-link>
</p>
</fn>
<fn id="FN4">
<label>4</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.pwc.be/en/documents/20180926-pdm40-beyond-the-hype-report.pdf">https://www.pwc.be/en/documents/20180926-pdm40-beyond-the-hype-report.pdf</ext-link>
</p>
</fn>
<fn id="FN5">
<label>5</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.phmsociety.org/sites/phmsociety.org/files/Tutorial_PHM12_Wang.pdf">https://www.phmsociety.org/sites/phmsociety.org/files/Tutorial_PHM12_Wang.pdf</ext-link>
</p>
</fn>
<fn id="FN6">
<label>6</label>
<p>We use the notation &#x201c;(1D)2D-CNN&#x201d; to indicate a CNN architecture with (one) two-dimensional filters.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abadi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Barham</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Brevdo</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Citro</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Tensorflow: large-scale machine learning on heterogeneous distributed systems</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abbasion</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Rafsanjani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Farshidianfar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Irani</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Rolling element bearings multi-fault classification based on the wavelet denoising and support vector machine</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>21</volume>, <fpage>2933</fpage>&#x2013;<lpage>2945</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2007.02.003</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abdallah</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Dertimanis</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Mylonas</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Tatsis</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Chatzi</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Dervili</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). &#x201c;<article-title>Fault diagnosis of wind turbine structures using decision tree learning algorithms with big data</article-title>,&#x201d; in 28th European Safety and Reliability Conference (ESREL 2018), Trondheim, Norway, June 17&#x2013;21, 2018, <fpage>3053</fpage>&#x2013;<lpage>3061</lpage>. <pub-id pub-id-type="doi">10.1201/9781351174664-382</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abu-Mahfouz</surname>
<given-names>I. A.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>A comparative study of three artificial neural networks for the detection and classification of gear faults</article-title>. <source>Int. J. Gen. Syst.</source> <volume>34</volume>, <fpage>261</fpage>&#x2013;<lpage>277</lpage>. <pub-id pub-id-type="doi">10.1080/03081070500065726</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ali</surname>
<given-names>J. B.</given-names>
</name>
<name>
<surname>Fnaiech</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Saidi</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chebel-Morello</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Fnaiech</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals</article-title>. <source>Appl. Acoust.</source> <volume>89</volume>, <fpage>16</fpage>&#x2013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1016/j.apacoust.2014.08.016</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Appana</surname>
<given-names>D. K.</given-names>
</name>
<name>
<surname>Islam</surname>
<given-names>M. R.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.-M.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Reliable fault diagnosis of bearings using distance and density similarity on an enhanced k-nn</article-title>,&#x201d; in <source>Australasian conference on artificial life and computational intelligence</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Wagner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Hendtlass</surname>
<given-names>T.</given-names>
</name>
</person-group> (<publisher-loc>Cham, Switzerland</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>193</fpage>&#x2013;<lpage>203</lpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arjovsky</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bottou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gulrajani</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Lopez-Paz</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Invariant risk minimization</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ayhan</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Chow</surname>
<given-names>M.-Y.</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>M.-H.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Multiple discriminant analysis and neural-network-based monolith and partition fault-detection schemes for broken rotor bar in induction motors</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>53</volume>, <fpage>1298</fpage>&#x2013;<lpage>1308</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2006.878301</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Babu</surname>
<given-names>G. S.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.-L.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep convolutional neural network based regression approach for estimation of remaining useful life</article-title>,&#x201d; in <source>International conference on database systems for advanced applications</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Navathe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Shekhar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>H.</given-names>
</name>
</person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>214</fpage>&#x2013;<lpage>228</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bashar</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Nayak</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Suzor</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Regularising lstm classifier by transfer learning for detecting misogynistic tweets with small training set</article-title>. <source>Knowl. Inf. Syst.</source> <volume>62</volume> (<issue>10</issue>), <fpage>4029</fpage>&#x2013;<lpage>4054</lpage>. <pub-id pub-id-type="doi">10.1007/s10115-020-01481-0</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bay</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ess</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tuytelaars</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Van Gool</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Speeded-up robust features (surf)</article-title>. <source>Comput. Vis. Image Understand.</source> <volume>110</volume>, <fpage>346</fpage>&#x2013;<lpage>359</lpage>. <pub-id pub-id-type="doi">10.1016/j.cviu.2007.09.014</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beery</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Van Horn</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Perona</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Recognition in terra incognita</article-title>,&#x201d; in Proceedings of the European Conference on Computer Vision (ECCV), 2018, <fpage>456</fpage>&#x2013;<lpage>473</lpage>.</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Simard</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Frasconi</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Learning long-term dependencies with gradient descent is difficult</article-title>. <source>IEEE Trans. Neural Network.</source> <volume>5</volume>, <fpage>157</fpage>&#x2013;<lpage>166</lpage>. <pub-id pub-id-type="doi">10.1109/72.279181</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benkercha</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Moulahoum</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Fault detection and diagnosis based on c4.5 decision tree algorithm for grid connected pv system</article-title>. <source>Sol. Energy</source> <volume>173</volume>, <fpage>610</fpage>&#x2013;<lpage>634</lpage>. <pub-id pub-id-type="doi">10.1016/j.solener.2018.07.089</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blundell</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Cornebise</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kavukcuoglu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wierstra</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Weight uncertainty in neural networks</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Box</surname>
<given-names>G. E. P.</given-names>
</name>
<name>
<surname>Draper</surname>
<given-names>N. R.</given-names>
</name>
</person-group> (<year>1987</year>). <source>Empirical model-building and response surfaces</source>. <publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>Wiley</publisher-name>.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bruckner</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Stanica</surname>
<given-names>M.-P.</given-names>
</name>
<name>
<surname>Blair</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Schriegel</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kehrer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Seewald</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>An introduction to opc ua tsn for industrial communication systems</article-title>. <source>Proc. IEEE</source> <volume>107</volume>, <fpage>1121</fpage>&#x2013;<lpage>1131</lpage>. <pub-id pub-id-type="doi">10.1109/jproc.2018.2888703</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Preprocessing-free gear fault diagnosis using small datasets with deep convolutional neural network-based transfer learning</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>26241</fpage>&#x2013;<lpage>26253</lpage>. <pub-id pub-id-type="doi">10.1109/access.2018.2837621</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>X.-C.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>B.-Q.</given-names>
</name>
<name>
<surname>Yao</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>W.-P.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Combining translation-invariant wavelet frames and convolutional neural network for intelligent tool wear state identification</article-title>. <source>Comput. Ind.</source> <volume>106</volume>, <fpage>71</fpage>&#x2013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2018.12.018</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chao</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Kulkarni</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Goebel</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Fink</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Hybrid deep fault detection and isolation: combining deep neural networks and system performance models</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jing</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Gated recurrent unit based recurrent neural network for remaining useful life prediction of nonlinear deterioration process</article-title>. <source>Reliab. Eng. Syst. Saf.</source> <volume>185</volume>, <fpage>372</fpage>&#x2013;<lpage>382</lpage>. <pub-id pub-id-type="doi">10.1016/j.ress.2019.01.006</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>[Dataset] Tennessee eastman simulation dataset</article-title>. <pub-id pub-id-type="doi">10.21227/4519-z502</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Remaining life prognostics of rolling bearing based on relative features and multivariable support vector machine</article-title>. <source>Proc. IME C J. Mech. Eng. Sci.</source> <volume>227</volume>, <fpage>2849</fpage>&#x2013;<lpage>2860</lpage>. <pub-id pub-id-type="doi">10.1177/0954406212474395</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Acdin: bridging the gap between artificial and real bearing damages for bearing fault diagnosis</article-title>. <source>Neurocomputing</source> <volume>294</volume>, <fpage>61</fpage>&#x2013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2018.03.014</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chine</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Mellit</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lughi</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Malek</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sulligoi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Massi Pavan</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>A novel fault diagnosis technique for photovoltaic systems based on artificial neural networks</article-title>. <source>Renew. Energy</source> <volume>90</volume>, <fpage>501</fpage>&#x2013;<lpage>512</lpage>. <pub-id pub-id-type="doi">10.1016/j.renene.2016.01.036</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cho</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Van Merri&#xeb;nboer</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Gulcehre</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bahdanau</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bougares</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Schwenk</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Learning phrase representations using rnn encoder-decoder for statistical machine translation</article-title>. <source>arXiv</source>. <pub-id pub-id-type="doi">10.3115/v1/d14-1179</pub-id>
</citation>
</ref>
<ref id="B236">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cranmer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Greydanus</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hoyer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Battaglia</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Spergel</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Lagrangian neural networks</article-title>,&#x201d; in ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations. <source>arXiv</source>.
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Damianou</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Deep Gaussian processes</article-title>,&#x201d; in <conf-name>Proceedings of the Sixteenth International Conference on Artificial intelligence and statistics, AISTATS 2013</conf-name>, <conf-loc>Scottsdale, AZ</conf-loc>, <conf-date>April 29&#x2013;May 1, 2013</conf-date>, <fpage>207</fpage>&#x2013;<lpage>215</lpage>.</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mermelstein</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1980</year>). <article-title>Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences</article-title>. <source>IEEE Trans. Acoust. Speech Signal Process.</source> <volume>28</volume>, <fpage>357</fpage>&#x2013;<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1109/tassp.1980.1163420</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Deng</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). &#x201c;<article-title>A remaining useful life prediction method with automatic feature extraction for aircraft engines</article-title>,&#x201d; in <conf-name>2019 18th IEEE international conference on trust, security and privacy in computing and communications/13th IEEE international conference on big data science and engineering (TrustCom/BigDataSE)</conf-name>, <conf-loc>Rotorua, New Zealand</conf-loc>, <conf-date>August 5&#x2013;8, 2019</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>686</fpage>&#x2013;<lpage>692</lpage>.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Devlin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>M.-W.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Toutanova</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Bert: pre-training of deep bidirectional transformers for language understanding</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Energy-fluctuated multiscale feature learning with deep convnet for intelligent spindle bearing fault diagnosis</article-title>. <source>IEEE Trans. Instrum. Meas.</source> <volume>66</volume>, <fpage>1926</fpage>&#x2013;<lpage>1935</lpage>. <pub-id pub-id-type="doi">10.1109/tim.2017.2674738</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Donahue</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>McAuley</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Puckette</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Adversarial audio synthesis</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dong</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Fault diagnosis of bearing based on the kernel principal component analysis and optimized k-nearest neighbor model</article-title>. <source>J. Low Freq. Noise Vib. Act. Contr.</source> <volume>36</volume>, <fpage>354</fpage>&#x2013;<lpage>365</lpage>. <pub-id pub-id-type="doi">10.1177/1461348417744302</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dourado</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Viana</surname>
<given-names>F. A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Physics-informed neural networks for missing physics estimation in cumulative damage models: a case study in corrosion fatigue</article-title>. <source>J. Comput. Inf. Sci. Eng.</source> <volume>20</volume>, <fpage>061007</fpage>. <pub-id pub-id-type="doi">10.1115/1.4047173</pub-id>.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elforjani</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Shanbr</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Prognosis of bearing acoustic emission signals using supervised machine learning</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>65</volume>, <fpage>5864</fpage>&#x2013;<lpage>5871</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2017.2767551</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elsheikh</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Yacout</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ouali</surname>
<given-names>M.-S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Bidirectional handshaking lstm for remaining useful life prediction</article-title>. <source>Neurocomputing</source> <volume>323</volume>, <fpage>148</fpage>&#x2013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2018.09.076</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eren</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Bearing fault detection by one-dimensional convolutional neural networks</article-title>. <source>Math. Probl. Eng.</source> <volume>2017</volume>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1155/2017/8617315</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eren</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ince</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kiranyaz</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A generic intelligent bearing fault diagnosis system using compact adaptive 1d cnn classifier</article-title>. <source>J. Signal Process. Syst.</source> <volume>91</volume>, <fpage>179</fpage>&#x2013;<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1007/s11265-018-1378-3</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fawaz</surname>
<given-names>H. I.</given-names>
</name>
<name>
<surname>Forestier</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Weber</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Idoumghar</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Muller</surname>
<given-names>P.-A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Deep learning for time series classification: a review</article-title>. <source>Data Min. Knowl. Discov.</source> <volume>33</volume>, <fpage>917</fpage>&#x2013;<lpage>963</lpage>.
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fern&#xe1;ndez-Francos</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Mart&#xed;nez-Rego</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Fontenla-Romero</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Alonso-Betanzos</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Automatic bearing fault diagnosis based on one-class &#x3bd;-SVM</article-title>. <source>Comput. Ind. Eng.</source> <volume>64</volume>, <fpage>357</fpage>&#x2013;<lpage>365</lpage>. <pub-id pub-id-type="doi">10.1016/j.cie.2012.10.013</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fink</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Data-driven intelligent predictive maintenance of industrial assets</article-title>,&#x201d; in <source>Women in industrial and systems engineering</source>. Editor <person-group person-group-type="editor">
<name>
<surname>Smith</surname>
<given-names>A.</given-names>
</name>
</person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>589</fpage>&#x2013;<lpage>605</lpage>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedman</surname>
<given-names>J. H.</given-names>
</name>
</person-group> (<year>1987</year>). <article-title>Exploratory projection pursuit</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>82</volume>, <fpage>249</fpage>&#x2013;<lpage>266</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.1987.10478427</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ganin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lempitsky</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Unsupervised domain adaptation by backpropagation</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ganin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ustinova</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Ajakan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Germain</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Larochelle</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Laviolette</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Domain-adversarial training of neural networks</article-title>. <source>J. Mach. Learn. Res.</source> <volume>17</volume>, <fpage>2096</fpage>&#x2013;<lpage>2130</lpage>.</citation>
</ref>
<ref id="B45">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Garcia</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Costa</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Palanca</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Giret</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Julian</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Botti</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Requirements for an intelligent maintenance system for industry 4.0</article-title>,&#x201d; in <source>International workshop on service orientation in holonic and multi-agent manufacturing.</source> Editors <person-group person-group-type="editor">
<name>
<surname>Borangiu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Trentesaux</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Leit&#xe3;o</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Giret Boggino</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Botti</surname>
<given-names>V.</given-names>
</name>
</person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>340</fpage>&#x2013;<lpage>351</lpage>.</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garnelo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schwarz</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rosenbaum</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Viola</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rezende</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Eslami</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Neural processes</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gebraeel</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lawley</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Parmeshwaran</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Residual life predictions from vibration-based degradation signals: a neural network approach</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>51</volume>, <fpage>694</fpage>&#x2013;<lpage>700</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2004.824875</pub-id>
</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gharavian</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Almas Ganj</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Ohadi</surname>
<given-names>A. R.</given-names>
</name>
<name>
<surname>Heidari Bafroui</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Comparison of fda-based and pca-based features in fault diagnosis of automobile gearboxes</article-title>. <source>Neurocomputing</source> <volume>121</volume>, <fpage>150</fpage>&#x2013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2013.04.033</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Pouget-Abadie</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mirza</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Warde-Farley</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ozair</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). &#x201c;<article-title>Generative adversarial nets</article-title>,&#x201d; in <conf-name>Proceedings of the 27th International Conference on Advances in Neural Information Processing Systems</conf-name>, <conf-loc>Montreal, Canada</conf-loc>, <conf-date>December 8&#x2013;13, 2014</conf-date>, <fpage>2672</fpage>&#x2013;<lpage>2680</lpage>.</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gryllias</surname>
<given-names>K. C.</given-names>
</name>
<name>
<surname>Antoniadis</surname>
<given-names>I. A.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A support vector machine approach based on physical model training for rolling element bearing fault detection in industrial environments</article-title>. <source>Eng. Appl. Artif. Intell.</source> <volume>25</volume>, <fpage>326</fpage>&#x2013;<lpage>344</lpage>. <pub-id pub-id-type="doi">10.1016/j.engappai.2011.09.010</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guillem&#xe9;</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Masson</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Roz&#xe9;</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Termier</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Agnostic local explanation for time series classification</article-title>,&#x201d; in <conf-name>2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI)</conf-name>, <conf-loc>Portland, OR</conf-loc>, <conf-date>November 2019</conf-date>, <fpage>432</fpage>&#x2013;<lpage>439</lpage>.</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2018a</year>). <article-title>Machinery health indicator construction based on convolutional neural networks considering trend burr</article-title>. <source>Neurocomputing</source> <volume>292</volume>, <fpage>142</fpage>&#x2013;<lpage>150</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2018.02.083</pub-id>
</citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2018b</year>). <article-title>A novel fault diagnosis method for rotating machinery based on a convolutional neural network</article-title>. <source>Sensors</source> <volume>18</volume>, <fpage>1429</fpage>. <pub-id pub-id-type="doi">10.3390/s18051429</pub-id>
</citation>
</ref>
<ref id="B54">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Guyon</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Gunn</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nikravesh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zadeh</surname>
<given-names>L. A.</given-names>
</name>
</person-group> (<year>2006</year>). <source>Feature extraction: foundations and applications (studies in fuzziness and soft computing)</source>. <publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>.</citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamadache</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Jung</surname>
<given-names>J. H.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Youn</surname>
<given-names>B. D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A comprehensive review of artificial intelligence-based approaches for rolling element bearing phm: shallow and deep learning</article-title>. <source>JMST Adv.</source> <volume>1</volume>, <fpage>125</fpage>&#x2013;<lpage>151</lpage>. <pub-id pub-id-type="doi">10.1007/s42791-019-0016-y</pub-id>
</citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019a</year>). <article-title>A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults</article-title>. <source>Knowl. Base Syst.</source> <volume>165</volume>, <fpage>474</fpage>&#x2013;<lpage>487</lpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2018.12.019</pub-id>
</citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>An enhanced convolutional neural network with enlarged receptive fields for fault diagnosis of planetary gearboxes</article-title>. <source>Comput. Ind.</source> <volume>107</volume>, <fpage>50</fpage>&#x2013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2019.01.012</pub-id>
</citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haradal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hayashi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Uchida</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Biosignal data augmentation based on generative adversarial networks</article-title>. <source>Annu. Int. Conf. IEEE Eng. Med. Biol. Soc.</source> <volume>2018</volume>, <fpage>368</fpage>&#x2013;<lpage>371</lpage>. <pub-id pub-id-type="doi">10.1109/EMBC.2018.8512396</pub-id>
</citation>
</ref>
<ref id="B59">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hastie</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Tibshirani</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2001</year>). <source>The elements of statistical learning</source>. <source>Springer series in statistics</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer New York Inc</publisher-name>.</citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep residual learning for image recognition</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name>, <conf-loc>Las Vegas, NV</conf-loc>, <conf-date>June 2016</conf-date>, <fpage>770</fpage>&#x2013;<lpage>778</lpage>.</citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Shao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Improved deep transfer auto-encoder for fault diagnosis of gearbox under variable working conditions with small training samples</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>115368</fpage>&#x2013;<lpage>115377</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2936243</pub-id>
</citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hess</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2002</year>). &#x201c;<article-title>Prognostics, from the need to reality-from the fleet users and phm system designer/developers perspectives</article-title>,&#x201d; in <conf-name>Proceedings, IEEE Aerospace Conference</conf-name>, <conf-loc>Big Sky, MT</conf-loc>, <conf-date>March 9&#x2013;16, 2002</conf-date> (<publisher-name>IEEE</publisher-name>), <volume>6</volume>, <fpage>2791</fpage>&#x2013;<lpage>2797</lpage>.</citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hochreiter</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schmidhuber</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Long short-term memory</article-title>. <source>Neural Computation</source> <volume>9</volume>, <fpage>1735</fpage>&#x2013;<lpage>1780</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id>
</citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hofmann</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sch&#xf6;lkopf</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Smola</surname>
<given-names>A. J.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Kernel methods in machine learning</article-title>. <source>Ann. Stat.</source> <volume>36</volume>, <fpage>1171</fpage>&#x2013;<lpage>1220</lpage>. <pub-id pub-id-type="doi">10.1214/009053607000000677</pub-id>
</citation>
</ref>
<ref id="B65">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Holschneider</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kronland-Martinet</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Morlet</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tchamitchian</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1990</year>). &#x201c;<article-title>A real-time algorithm for signal analysis with the help of the wavelet transform</article-title>,&#x201d; in <source>Wavelets</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Combes</surname>
<given-names>J.-M.</given-names>
</name>
<name>
<surname>Grossmann</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tchamitchian</surname>
<given-names>P.</given-names>
</name>
</person-group> (<publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>286</fpage>&#x2013;<lpage>297</lpage>.</citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>H.-Z.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.-K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.-F.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Support vector machine based estimation of remaining useful life: current research status and future trends</article-title>. <source>J. Mech. Sci. Technol.</source> <volume>29</volume>, <fpage>151</fpage>&#x2013;<lpage>163</lpage>. <pub-id pub-id-type="doi">10.1007/s12206-014-1222-z</pub-id>
</citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hubel</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Wiesel</surname>
<given-names>T. N.</given-names>
</name>
</person-group> (<year>1968</year>). <article-title>Receptive fields and functional architecture of monkey striate cortex</article-title>. <source>J. Physiol.</source> <volume>195</volume>, <fpage>215</fpage>&#x2013;<lpage>243</lpage>. <pub-id pub-id-type="doi">10.1113/jphysiol.1968.sp008455</pub-id>
</citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hyland</surname>
<given-names>S. L.</given-names>
</name>
<name>
<surname>Esteban</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>R&#xe4;tsch</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Real-valued (medical) time series generation with recurrent conditional gans</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hyv&#xe4;rinen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Oja</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Independent component analysis: algorithms and applications</article-title>. <source>Neural Networks</source> <volume>13</volume>, <fpage>411</fpage>&#x2013;<lpage>430</lpage>. <pub-id pub-id-type="doi">10.1016/s0893-6080(00)00026-5</pub-id>
</citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ioffe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Szegedy</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Batch normalization: accelerating deep network training by reducing internal covariate shift</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Islam</surname>
<given-names>M. M. M.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Khan</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.-M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Reliable bearing fault diagnosis using bayesian inference-based multi-class support vector machines</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>141</volume>, <fpage>EL89</fpage>. <pub-id pub-id-type="doi">10.1121/1.4976038</pub-id>
</citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Islam</surname>
<given-names>M. M. M.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.-M.</given-names>
</name>
</person-group> (<year>2019a</year>). <article-title>Automated bearing fault diagnosis scheme using 2d representation of wavelet packet transform and deep convolutional neural network</article-title>. <source>Comput. Ind.</source> <volume>106</volume>, <fpage>142</fpage>&#x2013;<lpage>153</lpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2019.01.008</pub-id>
</citation>
</ref>
<ref id="B73">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Islam</surname>
<given-names>M. M. M.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.-M.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>Reliable multiple combined fault diagnosis of bearings using heterogeneous feature models and multiclass support vector machines</article-title>. <source>Reliab. Eng. Syst. Saf.</source> <volume>184</volume>, <fpage>55</fpage>&#x2013;<lpage>66</lpage>.</citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Janssens</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Van de Walle</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Loccufier</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Van Hoecke</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Deep learning for infrared thermal image based machine health monitoring</article-title>. <source>IEEE ASME Trans. Mechatron.</source> <volume>23</volume>, <fpage>151</fpage>&#x2013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1109/tmech.2017.2722479</pub-id>
</citation>
</ref>
<ref id="B75">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xing</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines</article-title>. <source>Neurocomputing</source> <volume>272</volume>, <fpage>619</fpage>&#x2013;<lpage>628</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2017.07.032</pub-id>
</citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Deep neural networks: a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>72-73</volume>, <fpage>303</fpage>&#x2013;<lpage>315</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2015.10.025</pub-id>
</citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shelhamer</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Donahue</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Karayev</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Long</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Caffe: convolutional architecture for fast feature embedding</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B78">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Vong</surname>
<given-names>C.-M.</given-names>
</name>
<name>
<surname>Pecht</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A rotating machinery fault diagnosis method based on feature learning of thermal images</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>12348</fpage>&#x2013;<lpage>12359</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2893331</pub-id>
</citation>
</ref>
<ref id="B79">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A comprehensive review on convolutional neural network in machine fault diagnosis</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B80">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jing</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A convolutional neural network based feature learning and fault diagnosis method for the condition monitoring of gearbox</article-title>. <source>Measurement</source> <volume>111</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.measurement.2017.07.017</pub-id>
</citation>
</ref>
<ref id="B81">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jolliffe</surname>
<given-names>I. T.</given-names>
</name>
</person-group> (<year>1986</year>). &#x201c;<article-title>Principal components in regression analysis</article-title>,&#x201d; in <source>Principal component analysis</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>129</fpage>&#x2013;<lpage>155</lpage>.</citation>
</ref>
<ref id="B82">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Julier</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Uhlmann</surname>
<given-names>J. K.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>New extension of the kalman filter to nonlinear systems</article-title>. <source>Int. Symp. Aerospace/Defense Sensing, Simul. and Controls</source> <volume>3068</volume>, <fpage>182</fpage>&#x2013;<lpage>193</lpage>.</citation>
</ref>
<ref id="B83">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kadry</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2012</year>). <source>Diagnostics and prognostics of engineering systems: methods and techniques: methods and techniques</source>. <publisher-loc>Hershey, PA</publisher-loc>: <publisher-name>IGI Global</publisher-name>.</citation>
</ref>
<ref id="B84">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kennedy</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Eberhart</surname>
<given-names>R. C.</given-names>
</name>
</person-group> (<year>1997</year>). &#x201c;<article-title>A discrete binary version of the particle swarm algorithm</article-title>,&#x201d; in <conf-name>1997 IEEE International conference on systems, man, and cybernetics. Computational cybernetics and simulation (IEEE)</conf-name>, <conf-loc>Orlando, FL</conf-loc>, <conf-date>12&#x2013;15, 1997</conf-date>, <volume>vol. 5</volume>, <fpage>4104</fpage>&#x2013;<lpage>4108</lpage>.</citation>
</ref>
<ref id="B85">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yairi</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A review on the application of deep learning in system health management</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>107</volume>, <fpage>241</fpage>&#x2013;<lpage>265</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2017.11.024</pub-id>
</citation>
</ref>
<ref id="B86">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khelif</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chebel-Morello</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Malinowski</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Laajili</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Fnaiech</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zerhouni</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Direct remaining useful life estimation based on support vector regression</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>64</volume>, <fpage>2276</fpage>&#x2013;<lpage>2285</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2016.2623260</pub-id>
</citation>
</ref>
<ref id="B87">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kingma</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Welling</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Auto-encoding variational bayes</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B88">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kopparapu</surname>
<given-names>S. K.</given-names>
</name>
<name>
<surname>Laxminarayana</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>Choice of mel filter bank in computing mfcc of a resampled speech</article-title>,&#x201d; in <conf-name>10th international conference on information science, signal processing and their applications (ISSPA 2010)</conf-name>, <conf-loc>Kuala Lumpur, Malaysia</conf-loc>, <conf-date>May 2010</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>121</fpage>&#x2013;<lpage>124</lpage>.</citation>
</ref>
<ref id="B89">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krizhevsky</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Imagenet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>. <pub-id pub-id-type="doi">10.1145/3065386</pub-id>
</citation>
</ref>
<ref id="B90">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuo</surname>
<given-names>R. J.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Intelligent diagnosis for turbine blade faults using artificial neural networks and fuzzy logic</article-title>. <source>Eng. Appl. Artif. Intell.</source> <volume>8</volume>, <fpage>25</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1016/0952-1976(94)00082-x</pub-id>
</citation>
</ref>
<ref id="B91">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lal Senanayaka</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Van Khang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Robbersmyr</surname>
<given-names>K. G.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Autoencoders and recurrent neural networks based algorithm for prognosis of bearing life</article-title>,&#x201d; in <conf-name>2018 21st International conference on electrical machines and systems (ICEMS)</conf-name>, <conf-loc>Jeju, South Korea</conf-loc>, <fpage>537</fpage>&#x2013;<lpage>542</lpage>.</citation>
</ref>
<ref id="B92">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Ghaffari</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Siegel</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Prognostics and health management design for rotary machinery systems-Reviews, methodology and applications</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>42</volume>, <fpage>314</fpage>&#x2013;<lpage>334</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2013.06.004</pub-id>
</citation>
</ref>
<ref id="B93">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lei</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Fault diagnosis of wind turbine based on long short-term memory networks</article-title>. <source>Renew. Energy</source> <volume>133</volume>, <fpage>422</fpage>&#x2013;<lpage>432</lpage>. <pub-id pub-id-type="doi">10.1016/j.renene.2018.10.031</pub-id>
</citation>
</ref>
<ref id="B94">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zi</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>A combination of wknn to fault diagnosis of rolling element bearings</article-title>. <source>J. Vib. Acoust.</source> <volume>131</volume>, <fpage>064502</fpage>. <pub-id pub-id-type="doi">10.1115/1.4000478</pub-id>
</citation>
</ref>
<ref id="B95">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Nandi</surname>
<given-names>A. K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Applications of machine learning to machine fault diagnosis: a review and roadmap</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>138</volume>, <fpage>106587</fpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2019.106587</pub-id>
</citation>
</ref>
<ref id="B96">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>M. J.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Gear crack level identification based on weighted k nearest neighbor classification algorithm</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>23</volume>, <fpage>1535</fpage>&#x2013;<lpage>1547</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2009.01.009</pub-id>
</citation>
</ref>
<ref id="B97">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Estupinan</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A systematic review of deep transfer learning for machinery fault diagnosis</article-title>. <source>Neurocomputing</source> <volume>407</volume>, <fpage>121</fpage>&#x2013;<lpage>135</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2020.04.045</pub-id>
</citation>
</ref>
<ref id="B98">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Shao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019a</year>). <article-title>Sensor data-driven bearing fault diagnosis based on deep convolutional neural networks and s-transform</article-title>. <source>Sensors</source> <volume>19</volume>, <fpage>2750</fpage>. <pub-id pub-id-type="doi">10.3390/s19122750</pub-id>
</citation>
</ref>
<ref id="B99">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>A directed acyclic graph network combined with cnn and lstm for remaining useful life prediction</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>75464</fpage>&#x2013;<lpage>75475</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2919566</pub-id>
</citation>
</ref>
<ref id="B100">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>K. J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Study on signal recognition and diagnosis for spacecraft based on deep learning method</article-title>,&#x201d; in <conf-name>2015 Prognostics and System Health Management Conference (PHM)</conf-name>, <conf-loc>Beijing, China</conf-loc>, <fpage>1</fpage>&#x2013;<lpage>5</lpage>.</citation>
</ref>
<ref id="B101">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.-Q.</given-names>
</name>
</person-group> (<year>2018a</year>). <article-title>Remaining useful life estimation in prognostics using deep convolution neural networks</article-title>. <source>Reliab. Eng. Syst. Saf.</source> <volume>172</volume>, <fpage>1</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1016/j.ress.2017.11.021</pub-id>
</citation>
</ref>
<ref id="B102">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2018b</year>). &#x201c;<article-title>Intelligent fault diagnosis of rotating machinery based on deep recurrent neural network</article-title>,&#x201d; in <conf-name>2018 International conference on sensing, diagnostics, prognostics, and control (SDPC)</conf-name>, <conf-loc>Xi&#x27;an, China</conf-loc>, <fpage>67</fpage>&#x2013;<lpage>72</lpage>.</citation>
</ref>
<ref id="B103">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019c</year>). <article-title>Deep learning-based remaining useful life estimation of bearings using multi-scale feature extraction</article-title>. <source>Reliab. Eng. Syst. Saf.</source> <volume>182</volume>, <fpage>208</fpage>&#x2013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1016/j.ress.2018.11.011</pub-id>
</citation>
</ref>
<ref id="B104">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2019d</year>). <article-title>A deep learning approach for anomaly detection based on sae and lstm in mechanical equipment</article-title>. <source>Int. J. Adv. Manuf. Technol.</source> <volume>103</volume>, <fpage>499</fpage>. <pub-id pub-id-type="doi">10.1007/s00170-019-03557-w</pub-id>
</citation>
</ref>
<ref id="B105">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Intelligent fault diagnosis method for marine diesel engines using instantaneous angular speed</article-title>. <source>J. Mech. Sci. Technol.</source> <volume>26</volume>, <fpage>2413</fpage>&#x2013;<lpage>2423</lpage>. <pub-id pub-id-type="doi">10.1007/s12206-012-0621-2</pub-id>
</citation>
</ref>
<ref id="B106">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lipton</surname>
<given-names>Z. C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>The mythos of model interpretability</article-title>. <source>Queue</source> <volume>16</volume>, <fpage>31</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1145/3236386.3241340</pub-id>
</citation>
</ref>
<ref id="B107">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Rolling bearing fault diagnosis based on stft-deep learning and sound signals</article-title>. <source>Shock Vib.</source> <volume>2016</volume>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1155/2016/6127479</pub-id>
</citation>
</ref>
<ref id="B108">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Fault diagnosis of rolling bearings with recurrent neural network-based autoencoders</article-title>. <source>ISA Trans.</source> <volume>77</volume>, <fpage>167</fpage>&#x2013;<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1016/j.isatra.2018.04.005</pub-id>
</citation>
</ref>
<ref id="B109">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019a</year>). <article-title>Remaining useful life prediction of pemfc based on long short-term memory recurrent neural networks</article-title>. <source>Int. J. Hydrogen Energy</source> <volume>44</volume>, <fpage>5470</fpage>&#x2013;<lpage>5480</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijhydene.2018.10.042</pub-id></citation>
</ref>
<ref id="B110">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>Fault diagnosis of rotating machinery under noisy environment conditions based on a 1-d convolutional autoencoder and 1-d convolutional neural network</article-title>. <source>Sensors</source> <volume>19</volume>, <fpage>972</fpage>. <pub-id pub-id-type="doi">10.3390/s19040972</pub-id>
</citation>
</ref>
<ref id="B111">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Feature ranking for support vector machine classification and its application to machinery fault diagnosis</article-title>. <source>Proc. IME C J. Mech. Eng. Sci.</source> <volume>227</volume>, <fpage>2077</fpage>&#x2013;<lpage>2089</lpage>. <pub-id pub-id-type="doi">10.1177/0954406212469757</pub-id>
</citation>
</ref>
<ref id="B112">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Locatello</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lucic</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Raetsch</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gelly</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sch&#xf6;lkopf</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2019a</year>). &#x201c;<article-title>Challenging common assumptions in the unsupervised learning of disentangled representations</article-title>,&#x201d; in <conf-name>Proceedings of the 36th international conference on machine learning</conf-name>. <fpage>4114</fpage>&#x2013;<lpage>4124</lpage>.</citation>
</ref>
<ref id="B113">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Locatello</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Tschannen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>R&#xe4;tsch</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sch&#xf6;lkopf</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Bachem</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>Disentangling factors of variation using few labels</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B114">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Logan</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Mathew</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>Using the correlation dimension for vibration fault diagnosis of rolling element bearings-i. Basic concepts</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>10</volume>, <fpage>241</fpage>&#x2013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1006/mssp.1996.0018</pub-id>
</citation>
</ref>
<ref id="B115">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lowe</surname>
<given-names>D. G.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Distinctive image features from scale-invariant keypoints</article-title>. <source>Int. J. Comput. Vis.</source> <volume>60</volume>, <fpage>91</fpage>&#x2013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1023/B:VISI.0000029664.99615.94</pub-id>
</citation>
</ref>
<ref id="B116">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.-Y.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>W.-L.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification</article-title>. <source>Signal Process.</source> <volume>130</volume>, <fpage>377</fpage>. <pub-id pub-id-type="doi">10.1016/j.sigpro.2016.07.028</pub-id>
</citation>
</ref>
<ref id="B117">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>P.-J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M.-C.</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>T.-C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>An evaluation of engine faults diagnostics using artificial neural networks</article-title>. <source>J. Eng. Gas Turbines Power</source> <volume>123</volume>, <fpage>340</fpage>. <pub-id pub-id-type="doi">10.1115/1.1362667</pub-id>
</citation>
</ref>
<ref id="B118">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lundberg</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S.-I.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>A unified approach to interpreting model predictions</article-title>,&#x201d; in <source>Advances in neural information processing systems 30</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Guyon</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Luxburg</surname>
<given-names>U. V.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wallach</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Fergus</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Vishwanathan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Garnett</surname>
<given-names>R.</given-names>
</name>
</person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>4765</fpage>&#x2013;<lpage>4774</lpage>.</citation>
</ref>
<ref id="B119">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lv</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bao</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Weighted time series fault diagnosis based on a stacked sparse autoencoder</article-title>. <source>J. Chemometr.</source> <volume>31</volume>, <fpage>e2912</fpage>. <pub-id pub-id-type="doi">10.1002/cem.2912</pub-id>
</citation>
</ref>
<ref id="B120">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Su</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>W.-l.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Predicting the remaining useful life of an aircraft engine using a stacked sparse autoencoder with multilayer self-learning</article-title>. <source>Complexity</source> <volume>2018</volume>, <fpage>1</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1155/2018/3813029</pub-id>
</citation>
</ref>
<ref id="B121">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maaten</surname>
<given-names>L. v. d.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Visualizing data using t-sne</article-title>. <source>J. Mach. Learn. Res.</source> <volume>9</volume>, <fpage>2579</fpage>&#x2013;<lpage>2605</lpage>.
</citation>
</ref>
<ref id="B122">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Predicting remaining useful life of rolling bearings based on deep feature representation and long short-term memory neural network</article-title>. <source>Adv. Mech. Eng.</source> <volume>10</volume>, <fpage>168781401881718</fpage>. <pub-id pub-id-type="doi">10.1177/1687814018817184</pub-id>
</citation>
</ref>
<ref id="B123">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Imbalanced fault diagnosis of rolling bearing based on generative adversarial network: a comparative study</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>9515</fpage>&#x2013;<lpage>9530</lpage>. <pub-id pub-id-type="doi">10.1109/access.2018.2890693</pub-id>
</citation>
</ref>
<ref id="B124">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mathew</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Toby</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Rao</surname>
<given-names>B. M.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>M. G.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Prediction of remaining useful lifetime (rul) of turbofan engine using machine learning</article-title>,&#x201d; in <conf-name>2017 IEEE international conference on circuits and systems (ICCS)</conf-name>, <conf-loc>Thiruvananthapuram</conf-loc>, <fpage>306</fpage>&#x2013;<lpage>311</lpage>.</citation>
</ref>
<ref id="B125">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>McLachlan</surname>
<given-names>G. J.</given-names>
</name>
</person-group> (<year>2004</year>). <source>Discriminant analysis and statistical pattern recognition</source>, <volume>Vol. 544</volume>. <publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>John Wiley &#x26; Sons</publisher-name>.</citation>
</ref>
<ref id="B126">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mechefske</surname>
<given-names>C. K.</given-names>
</name>
<name>
<surname>Mathew</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1992</year>). <article-title>Fault detection and diagnosis in low speed rolling element bearings part ii: the use of nearest neighbor classification</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>6</volume>, <fpage>309</fpage>&#x2013;<lpage>316</lpage>. <pub-id pub-id-type="doi">10.1016/0888-3270(92)90033-f</pub-id>
</citation>
</ref>
<ref id="B127">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Medjaher</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Tobon-Mejia</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Zerhouni</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Remaining useful life estimation of critical components with application to bearings</article-title>. <source>IEEE Trans. Reliab.</source> <volume>61</volume>, <fpage>292</fpage>&#x2013;<lpage>302</lpage>. <pub-id pub-id-type="doi">10.1109/tr.2012.2194175</pub-id>
</citation>
</ref>
<ref id="B128">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Michau</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Palm&#xe9;</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Fink</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Feature learning for fault detection in high-dimensional condition monitoring signals</article-title>. <source>Proc. Inst. Mech. Eng. O J. Risk Reliab.</source> <volume>234</volume>, <fpage>104</fpage>&#x2013;<lpage>115</lpage>. <pub-id pub-id-type="doi">10.1177/1748006x19868335</pub-id>
</citation>
</ref>
<ref id="B129">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Michau</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Palm&#xe9;</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Fink</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Fleet phm for critical systems: bi-level deep learning approach for fault detection</article-title>,&#x201d; in <conf-name>Proceedings of the Fourth European Conference of the Prognostics and Health Management Society</conf-name>, <conf-loc>Utrecht, Netherlands</conf-loc>, <conf-date>4&#x2013;6 July 2018</conf-date>, <volume>vol. 4</volume>.</citation>
</ref>
<ref id="B130">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Michau</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Fink</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Deep feature learning network for fault detection and isolation</article-title>,&#x201d; in <conf-name>PHM 2017: proceedings of the annual conference of the prognostics and health management society 2017</conf-name>, <conf-loc>St. Petersburg, FL</conf-loc>, <conf-date>2&#x2013;5 October 2017</conf-date>, <fpage>108</fpage>&#x2013;<lpage>118</lpage>.</citation>
</ref>
<ref id="B131">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mitchell</surname>
<given-names>T. M.</given-names>
</name>
</person-group> (<year>1997</year>). <source>Machine learning</source>. <edition>1st Edn</edition>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>McGraw-Hill, Inc</publisher-name>.</citation>
</ref>
<ref id="B132">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mobley</surname>
<given-names>R. K.</given-names>
</name>
</person-group> (<year>2002</year>). <source>An introduction to predictive maintenance</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Elsevier</publisher-name>.</citation>
</ref>
<ref id="B133">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moosavi</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>N&#x2019;Diaye</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Djerdir</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ait-Amirat</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Arab Khaburi</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Artificial neural network-based fault diagnosis in the AC-DC converter of the power supply of series hybrid electric vehicle</article-title>. <source>IET Electr. Syst. Transp.</source> <volume>6</volume>, <fpage>96</fpage>&#x2013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1049/iet-est.2014.0055</pub-id>
</citation>
</ref>
<ref id="B134">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moosavian</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ahmadi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Tabatabaeefar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Khazaee</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Comparison of two classifiers; k-nearest neighbor and artificial neural network, for fault diagnosis on a main engine journal-bearing</article-title>. <source>Shock Vib.</source> <volume>20</volume>, <fpage>263</fpage>&#x2013;<lpage>272</lpage>. <pub-id pub-id-type="doi">10.1155/2013/360236</pub-id>
</citation>
</ref>
<ref id="B135">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nascimento</surname>
<given-names>G. R.</given-names>
</name>
<name>
<surname>Viana</surname>
<given-names>F. A.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Fleet prognosis with physics-informed recurrent neural networks</article-title>,&#x201d; in <conf-name>The 12th International Workshop on Structural Health Monitoring 2019</conf-name>, <conf-loc>Stanford, CA</conf-loc>, <conf-date>September 10&#x2013;12, 2019</conf-date>. <pub-id pub-id-type="doi">10.12783/shm2019/32301</pub-id>
</citation>
</ref>
<ref id="B136">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neath</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Cavanaugh</surname>
<given-names>J. E.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>The bayesian information criterion: background, derivation, and applications</article-title>. <source>WIREs Comp. Stat.</source> <volume>4</volume>, <fpage>199</fpage>&#x2013;<lpage>203</lpage>. <pub-id pub-id-type="doi">10.1002/wics.199</pub-id>
</citation>
</ref>
<ref id="B137">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ng</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Sparse autoencoder</article-title>. <source>CS294A Lecture notes</source> <volume>72</volume>, <fpage>1</fpage>&#x2013;<lpage>19</lpage>.</citation>
</ref>
<ref id="B138">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ngui</surname>
<given-names>W. K.</given-names>
</name>
<name>
<surname>Leong</surname>
<given-names>M. S.</given-names>
</name>
<name>
<surname>Shapiai</surname>
<given-names>M. I.</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>M. H.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Blade fault diagnosis using artificial neural network</article-title>. <source>Int. J. Appl. Eng. Res.</source> <volume>12</volume>, <fpage>519</fpage>&#x2013;<lpage>526</lpage>.</citation>
</ref>
<ref id="B139">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nik Aznan</surname>
<given-names>N. K.</given-names>
</name>
<name>
<surname>Atapour-Abarghouei</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bonner</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Connolly</surname>
<given-names>J. D.</given-names>
</name>
<name>
<surname>Al Moubayed</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Breckon</surname>
<given-names>T. P.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Simulating brain signals: creating synthetic eeg data via neural-based generative models for improved ssvep classification</article-title>,&#x201d; in <conf-name>2019 International joint conference on neural networks (IJCNN)</conf-name>, <conf-loc>Budapest, Hungary</conf-loc>, <fpage>1</fpage>&#x2013;<lpage>8</lpage>.</citation>
</ref>
<ref id="B140">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ord&#xf3;&#xf1;ez</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>S&#xe1;nchez Lasheras</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Roca-Pardi&#xf1;as</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>de Cos Juez</surname>
<given-names>F. J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A hybrid ARIMA-SVM model for the study of the remaining useful life of aircraft engines</article-title>. <source>J. Comput. Appl. Math.</source> <volume>346</volume>, <fpage>184</fpage>&#x2013;<lpage>191</lpage>. <pub-id pub-id-type="doi">10.1016/j.cam.2018.07.008</pub-id>
</citation>
</ref>
<ref id="B141">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>An improved bearing fault diagnosis method using one-dimensional cnn and lstm</article-title>. <source>J. Mech. Eng.</source> <volume>64</volume>, <fpage>443</fpage>&#x2013;<lpage>452</lpage>.</citation>
</ref>
<ref id="B142">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Physics-induced graph neural network: an application to wind-farm power estimation</article-title>. <source>Energy</source> <volume>187</volume>, <fpage>115883</fpage>. <pub-id pub-id-type="doi">10.1016/j.energy.2019.115883</pub-id>
</citation>
</ref>
<ref id="B143">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Marco</surname>
<given-names>P. D.</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Bang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Fault detection and diagnosis using combined autoencoder and long short-term memory network</article-title>. <source>Sensors</source> <volume>19</volume>, <fpage>4612</fpage>. <pub-id pub-id-type="doi">10.3390/s19214612</pub-id>
</citation>
</ref>
<ref id="B144">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Paszke</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gross</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Massa</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lerer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bradbury</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chanan</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). &#x201c;<article-title>Pytorch: an imperative style, high-performance deep learning library</article-title>,&#x201d; in <source>Advances in neural information processing systems</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Wallach</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Larochelle</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Beygelzimer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Garnett</surname>
<given-names>R.</given-names>
</name>
</person-group> (<publisher-loc>Red Hook, NY</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <volume>Vol. 32</volume>, <fpage>8024</fpage>&#x2013;<lpage>8035</lpage>.</citation>
</ref>
<ref id="B145">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Patil</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Patil</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Handikherkar</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Desai</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Phalle</surname>
<given-names>V. M.</given-names>
</name>
<name>
<surname>Kazi</surname>
<given-names>F. S.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Remaining useful life (rul) prediction of rolling element bearing using random forest and gradient boosting technique</article-title>,&#x201d; in <source>ASME international mechanical engineering congress and exposition</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>American Society of Mechanical Engineers (ASME)</publisher-name>), <volume>Vol. 52187</volume>, <fpage>V013T05A019</fpage>.</citation>
</ref>
<ref id="B146">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pearl</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Theoretical impediments to machine learning with seven sparks from the causal revolution</article-title>,&#x201d; in <conf-name>Proceedings of the eleventh ACM international conference on web search and data mining</conf-name>, <conf-loc>Marina Del Rey, CA</conf-loc>, <conf-date>February 5&#x2013;9, 2018</conf-date>.
</citation>
</ref>
<ref id="B147">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Poyhonen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jover</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hyotyniemi</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2004</year>). &#x201c;<article-title>Signal processing of vibrations for condition monitoring of an induction motor</article-title>,&#x201d; in <conf-name>First international symposium on control, communications and signal processing, 2004</conf-name>, <conf-loc>Hammamet, Tunisia</conf-loc>, <conf-date>21&#x2013;24 March 2004</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>499</fpage>&#x2013;<lpage>502</lpage>.</citation>
</ref>
<ref id="B148">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Praveenkumar</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sabhrish</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Saimurugan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ramachandran</surname>
<given-names>K. I.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Pattern recognition based on-line vibration monitoring system for fault diagnosis of automobile gearbox</article-title>. <source>Measurement</source> <volume>114</volume>, <fpage>233</fpage>&#x2013;<lpage>242</lpage>. <pub-id pub-id-type="doi">10.1016/j.measurement.2017.09.041</pub-id>
</citation>
</ref>
<ref id="B149">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qin</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Rolling bearings fault diagnosis via 1d convolution networks</article-title>,&#x201d; in <conf-name>2019 IEEE 4th international Conference on Signal and image processing (ICSIP)</conf-name>, <conf-loc>Wuxi, China</conf-loc>, <conf-date>July 19&#x2013;21, 2019</conf-date>, <fpage>617</fpage>&#x2013;<lpage>621</lpage>.</citation>
</ref>
<ref id="B150">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qiu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Modified bi-directional lstm neural networks for rolling bearing fault diagnosis</article-title>,&#x201d; in <conf-name>ICC 2019 - 2019 IEEE international conference on communications (ICC)</conf-name>, <conf-loc>Shanghai, China</conf-loc>, <conf-date>May 20&#x2013;24, 2019</conf-date>, <fpage>1</fpage>&#x2013;<lpage>6</lpage>.</citation>
</ref>
<ref id="B151">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Quinlan</surname>
<given-names>J. R.</given-names>
</name>
</person-group> (<year>2014</year>). <source>C4.5: programs for machine learning</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Elsevier</publisher-name>.</citation>
</ref>
<ref id="B152">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ran</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A survey of predictive maintenance: systems, purposes and approaches</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B153">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rasmussen</surname>
<given-names>C. E.</given-names>
</name>
</person-group> (<year>2003</year>). &#x201c;<article-title>Gaussian processes in machine learning</article-title>,&#x201d; in <source>Summer school on machine learning</source>. (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>63</fpage>&#x2013;<lpage>71</lpage>.</citation>
</ref>
<ref id="B154">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Bearing remaining useful life prediction based on deep autoencoder and deep neural networks</article-title>. <source>J. Manuf. Syst.</source> <volume>48</volume>, <fpage>71</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmsy.2018.04.008</pub-id>
</citation>
</ref>
<ref id="B155">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2018a</year>). <article-title>Prediction of bearing remaining useful life with deep convolution neural network</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>13041</fpage>&#x2013;<lpage>13049</lpage>. <pub-id pub-id-type="doi">10.1109/access.2018.2804930</pub-id>
</citation>
</ref>
<ref id="B156">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2018b</year>). <article-title>Remaining useful life prediction for lithium-ion battery: a deep learning approach</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>50587</fpage>&#x2013;<lpage>50598</lpage>. <pub-id pub-id-type="doi">10.1109/access.2018.2858856</pub-id>
</citation>
</ref>
<ref id="B157">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ribeiro</surname>
<given-names>M. T.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Guestrin</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>&#x201c;Why should i trust you?&#x201d; explaining the predictions of any classifier</article-title>,&#x201d; in <conf-name>Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</conf-name>. (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>1135</fpage>&#x2013;<lpage>1144</lpage>.</citation>
</ref>
<ref id="B158">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sakamoto</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ishiguro</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kitagawa</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>1986</year>). <source>Akaike information criterion statistics</source>. <publisher-loc>Dordrecht, The Netherlands</publisher-loc>: <publisher-name>D. Reidel</publisher-name>, <volume>Vol. 81</volume>.</citation>
</ref>
<ref id="B159">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sakthivel</surname>
<given-names>N. R.</given-names>
</name>
<name>
<surname>Sugumaran</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Babudevasenapati</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Vibration based fault diagnosis of monoblock centrifugal pump using decision tree</article-title>. <source>Expert Syst. Appl.</source> <volume>37</volume>, <fpage>4040</fpage>&#x2013;<lpage>4049</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2009.10.002</pub-id>
</citation>
</ref>
<ref id="B160">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Samanta</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Gear fault detection using artificial neural networks and support vector machines with genetic algorithms</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>18</volume>, <fpage>625</fpage>&#x2013;<lpage>644</lpage>. <pub-id pub-id-type="doi">10.1016/s0888-3270(03)00020-7</pub-id>
</citation>
</ref>
<ref id="B161">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Samanta</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Al-Balushi</surname>
<given-names>K. R.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Artificial neural network based fault diagnostics of rolling element bearings using time-domain features</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>17</volume>, <fpage>317</fpage>&#x2013;<lpage>328</lpage>. <pub-id pub-id-type="doi">10.1006/mssp.2001.1462</pub-id>
</citation>
</ref>
<ref id="B162">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanchez-Gonzalez</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Heess</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Springenberg</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Merel</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Riedmiller</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hadsell</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Graph networks as learnable physics engines for inference and control</article-title>. <source>arXiv</source>.</citation>
</ref>
<ref id="B163">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santos</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Villa</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Re&#xf1;ones</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bustillo</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Maudes</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>An svm-based solution for fault detection in wind turbines</article-title>. <source>Sensors</source> <volume>15</volume>, <fpage>5627</fpage>&#x2013;<lpage>5648</lpage>. <pub-id pub-id-type="doi">10.3390/s150305627</pub-id>
</citation>
</ref>
<ref id="B164">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saravanan</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Ramachandran</surname>
<given-names>K. I.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Fault diagnosis of spur bevel gear box using discrete wavelet features and decision tree classification</article-title>. <source>Expert Syst. Appl.</source> <volume>36</volume>, <fpage>9564</fpage>&#x2013;<lpage>9573</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2008.07.089</pub-id>
</citation>
</ref>
<ref id="B165">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Satishkumar</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sugumaran</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Remaining life time prediction of bearings through classification using decision tree algorithm</article-title>. <source>Int. J. Appl. Eng. Res.</source> <volume>10</volume>, <fpage>34861</fpage>&#x2013;<lpage>34866</lpage>.</citation>
</ref>
<ref id="B166">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saxena</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Goebel</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>C-MAPSS data set. NASA Ames prognostics data repository</article-title>.</citation>
</ref>
<ref id="B167">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schuster</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Paliwal</surname>
<given-names>K. K.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Bidirectional recurrent neural networks</article-title>. <source>IEEE Trans. Signal Process.</source> <volume>45</volume>, <fpage>2673</fpage>&#x2013;<lpage>2681</lpage>. <pub-id pub-id-type="doi">10.1109/78.650093</pub-id>
</citation>
</ref>
<ref id="B168">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>McAleer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Baldi</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2019a</year>). <article-title>Highly accurate machine fault diagnosis using deep transfer learning</article-title>. <source>IEEE Trans. Ind. Inf.</source> <volume>15</volume>, <fpage>2446</fpage>&#x2013;<lpage>2455</lpage>. <pub-id pub-id-type="doi">10.1109/tii.2018.2864759</pub-id>
</citation>
</ref>
<ref id="B169">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>Generative adversarial networks for data augmentation in machine fault diagnosis</article-title>. <source>Comput. Ind.</source> <volume>106</volume>, <fpage>85</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2019.01.001</pub-id>
</citation>
</ref>
<ref id="B170">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Nezu</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Prognosis of remaining bearing life using neural networks</article-title>. <source>Proc. IME J. Syst. Contr. Eng.</source> <volume>214</volume>, <fpage>217</fpage>&#x2013;<lpage>230</lpage>. <pub-id pub-id-type="doi">10.1243/0959651001540582</pub-id>
</citation>
</ref>
<ref id="B171">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Srivastava</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Krizhevsky</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Salakhutdinov</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>. <source>J. Mach. Learn. Res.</source> <volume>15</volume>, <fpage>1929</fpage>&#x2013;<lpage>1958</lpage>. <pub-id pub-id-type="doi">10.5555/2627435.2670313</pub-id>
</citation>
</ref>
<ref id="B172">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sugumaran</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Exploiting sound signals for fault diagnosis of bearings using decision tree</article-title>. <source>Measurement</source> <volume>46</volume>, <fpage>1250</fpage>&#x2013;<lpage>1256</lpage>. <pub-id pub-id-type="doi">10.1016/j.measurement.2012.11.011</pub-id>
</citation>
</ref>
<ref id="B173">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sugumaran</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Ramachandran</surname>
<given-names>K. I.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>21</volume>, <fpage>2237</fpage>&#x2013;<lpage>2247</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2006.09.007</pub-id>
</citation>
</ref>
<ref id="B174">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sui</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Prediction of bearing remaining useful life based on mutual information and support vector regression model</article-title>. <source>IOP Conf. Ser. Mater. Sci. Eng.</source> <volume>533</volume>, <fpage>012032</fpage>. <pub-id pub-id-type="doi">10.1088/1757-899x/533/1/012032</pub-id>
</citation>
</ref>
<ref id="B175">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Research on bearing life prediction based on support vector machine and its application</article-title>. <source>J. Phys.: Conf. Ser.</source> <volume>305</volume>, <fpage>012028</fpage>. <pub-id pub-id-type="doi">10.1088/1742-6596/305/1/012028</pub-id>
</citation>
</ref>
<ref id="B176">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2016a</year>). <article-title>A novel efficient SVM-based fault diagnosis method for multi-split air conditioning system&#x2019;s refrigerant charge fault amount</article-title>. <source>Appl. Therm. Eng.</source> <volume>108</volume>, <fpage>989</fpage>. <pub-id pub-id-type="doi">10.1016/j.applthermaleng.2016.07.109</pub-id>
</citation>
</ref>
<ref id="B177">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Shao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2016b</year>). <article-title>A sparse auto-encoder-based deep neural network approach for induction motor faults classification</article-title>. <source>Measurement</source> <volume>89</volume>, <fpage>171</fpage>&#x2013;<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1016/j.measurement.2016.04.007</pub-id>
</citation>
</ref>
<ref id="B178">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Yao</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>An intelligent gear fault diagnosis methodology using a complex wavelet enhanced convolutional neural network</article-title>. <source>Materials</source> <volume>10</volume>, <fpage>790</fpage>. <pub-id pub-id-type="doi">10.3390/ma10070790</pub-id>
</citation>
</ref>
<ref id="B179">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Swanson</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Linking maintenance strategies to performance</article-title>. <source>Int. J. Prod. Econ.</source> <volume>70</volume>, <fpage>237</fpage>&#x2013;<lpage>244</lpage>. <pub-id pub-id-type="doi">10.1016/s0925-5273(00)00067-0</pub-id>
</citation>
</ref>
<ref id="B180">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szegedy</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Sermanet</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Reed</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Anguelov</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). &#x201c;<article-title>Going deeper with convolutions</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name>, <conf-loc>Boston, MA</conf-loc>, <conf-date>June 7&#x2013;12, 2015</conf-date>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. </citation>
</ref>
<ref id="B181">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tayade</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Patil</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Phalle</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Kazi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Powar</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Remaining useful life (RUL) prediction of bearing by using regression model and principal component analysis (PCA) technique</article-title>. <source>Vibroengineering PROCEDIA</source> <volume>23</volume>, <fpage>30</fpage>&#x2013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.21595/vp.2019.20617</pub-id>
</citation>
</ref>
<ref id="B182">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Teng</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kusiak</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Prognosis of the remaining useful life of bearings in a wind turbine gearbox</article-title>. <source>Energies</source> <volume>10</volume>, <fpage>32</fpage>. <pub-id pub-id-type="doi">10.3390/en10010032</pub-id>
</citation>
</ref>
<ref id="B183">
<citation citation-type="journal">
<collab>Theano Development Team</collab> (<year>2016</year>). <article-title>Theano: a python framework for fast computation of mathematical expressions</article-title>. <source>arXiv</source>. </citation>
</ref>
<ref id="B184">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thirukovalluru</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Dixit</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sevakula</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>Verma</surname>
<given-names>N. K.</given-names>
</name>
<name>
<surname>Salour</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Generating feature sets for fault diagnosis using denoising stacked auto-encoder</article-title>,&#x201d; in <conf-name>2016 IEEE international conference on prognostics and health management (ICPHM)</conf-name>, <conf-loc>Ottawa, ON, Canada</conf-loc>, <conf-date>June 20&#x2013;22, 2016</conf-date>, <fpage>1</fpage>&#x2013;<lpage>7</lpage>. </citation>
</ref>
<ref id="B185">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tian</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Morillo</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Azarian</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Pecht</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Motor bearing fault detection using spectral kurtosis-based feature extraction coupled with k-nearest neighbor distance analysis</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>63</volume>, <fpage>1793</fpage>&#x2013;<lpage>1803</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2015.2509913</pub-id>
</citation>
</ref>
<ref id="B186">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tibshirani</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>Regression shrinkage and selection via the lasso</article-title>. <source>J. Roy. Stat. Soc. B</source> <volume>58</volume>, <fpage>267</fpage>&#x2013;<lpage>288</lpage>. <pub-id pub-id-type="doi">10.1111/j.2517-6161.1996.tb02080.x</pub-id>
</citation>
</ref>
<ref id="B187">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tzeng</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Hoffman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Saenko</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Darrell</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Adversarial discriminative domain adaptation</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name>, <conf-loc>Honolulu, HI</conf-loc>, <conf-date>July 21&#x2013;26, 2017</conf-date>, <fpage>7167</fpage>&#x2013;<lpage>7176</lpage>. </citation>
</ref>
<ref id="B188">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vincent</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Larochelle</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Manzagol</surname>
<given-names>P.-A.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>Extracting and composing robust features with denoising autoencoders</article-title>,&#x201d; in <conf-name>Machine learning, proceedings of the twenty-fifth international conference (ICML 2008)</conf-name>, <conf-loc>Helsinki, Finland</conf-loc>, <conf-date>June 5&#x2013;9, 2008</conf-date>, <fpage>1096</fpage>&#x2013;<lpage>1103</lpage>. <pub-id pub-id-type="doi">10.1145/1390156.1390294</pub-id>
</citation>
</ref>
<ref id="B189">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>An</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Bao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Generalization of deep neural networks for imbalanced fault classification of machinery using generative adversarial networks</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>111168</fpage>&#x2013;<lpage>111180</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2924003</pub-id>
</citation>
</ref>
<ref id="B190">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mo</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Miao</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A deep learning method for bearing fault diagnosis based on time-frequency image</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>42373</fpage>&#x2013;<lpage>42383</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2907131</pub-id>
</citation>
</ref>
<ref id="B191">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Pei</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Transformer fault diagnosis using continuous sparse autoencoder</article-title>. <source>SpringerPlus</source> <volume>5</volume>, <fpage>448</fpage>. <pub-id pub-id-type="doi">10.1186/s40064-016-2107-7</pub-id>
</citation>
</ref>
<ref id="B192">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Michau</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Fink</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019a</year>). &#x201c;<article-title>Domain adaptive transfer learning for fault diagnosis</article-title>,&#x201d; in <conf-name>2019 prognostics and system health management conference</conf-name>, <conf-loc>Paris, France</conf-loc>, <conf-date>May 2&#x2013;5, 2019</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>279</fpage>&#x2013;<lpage>285</lpage>. </citation>
</ref>
<ref id="B193">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>A method for rapidly evaluating reliability and predicting remaining useful life using two-dimensional convolutional neural network with signal conversion</article-title>. <source>J. Mech. Sci. Technol.</source> <volume>33</volume>, <fpage>2561</fpage>&#x2013;<lpage>2571</lpage>. <pub-id pub-id-type="doi">10.1007/s12206-019-0504-x</pub-id>
</citation>
</ref>
<ref id="B194">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Triplet loss guided adversarial domain adaptation for bearing fault diagnosis</article-title>. <source>Sensors</source> <volume>20</volume>, <fpage>320</fpage>. <pub-id pub-id-type="doi">10.3390/s20010320</pub-id>
</citation>
</ref>
<ref id="B195">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.-X.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>L.-Y.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>A compact k nearest neighbor classification for power plant fault diagnosis</article-title>. <source>J. Inf. Hiding Multimed. Signal Process.</source> <volume>5</volume>, <fpage>508</fpage>&#x2013;<lpage>517</lpage>. </citation>
</ref>
<ref id="B196">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Fault diagnosis of a rolling bearing using wavelet packet denoising and random forests</article-title>. <source>IEEE Sensor. J.</source> <volume>17</volume>, <fpage>5581</fpage>&#x2013;<lpage>5588</lpage>. <pub-id pub-id-type="doi">10.1109/jsen.2017.2726011</pub-id>
</citation>
</ref>
<ref id="B197">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Remaining useful life prediction and state of health diagnosis for lithium-ion batteries using particle filter and support vector regression</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>65</volume>, <fpage>5634</fpage>&#x2013;<lpage>5643</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2017.2782224</pub-id>
</citation>
</ref>
<ref id="B198">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Degradation assessment for the ball screw with variational autoencoder and kernel density estimation</article-title>. <source>Adv. Mech. Eng.</source> <volume>10</volume>, <fpage>168781401879726</fpage>. <pub-id pub-id-type="doi">10.1177/1687814018797261</pub-id>
</citation>
</ref>
<ref id="B199">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019a</year>). <article-title>A new ensemble residual convolutional neural network for remaining useful life estimation</article-title>. <source>Math. Biosci. Eng.</source> <volume>16</volume>, <fpage>862</fpage>&#x2013;<lpage>880</lpage>. <pub-id pub-id-type="doi">10.3934/mbe.2019040</pub-id>
</citation>
</ref>
<ref id="B200">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019b</year>). <article-title>A transfer convolutional neural network for fault diagnosis based on ResNet-50</article-title>. <source>Neural Comput. Appl.</source> <volume>32</volume>, <fpage>6111</fpage>&#x2013;<lpage>6124</lpage>. <pub-id pub-id-type="doi">10.1007/s00521-019-04097-w</pub-id>
</citation>
</ref>
<ref id="B201">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A new convolutional neural network-based data-driven fault diagnosis method</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>65</volume>, <fpage>5990</fpage>&#x2013;<lpage>5998</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2017.2774777</pub-id>
</citation>
</ref>
<ref id="B202">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Widodo</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.-S.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors</article-title>. <source>Expert Syst. Appl.</source> <volume>33</volume>, <fpage>241</fpage>&#x2013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2006.04.020</pub-id>
</citation>
</ref>
<ref id="B203">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Shao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Data-driven remaining useful life prediction via multiple sensor signals and deep long short-term memory neural network</article-title>. <source>ISA (Instrum. Soc. Am.) Trans.</source> <volume>97</volume>, <fpage>241</fpage>&#x2013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1016/j.isatra.2019.07.004</pub-id>
</citation>
</ref>
<ref id="B204">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2018a</year>). <article-title>Approach for fault prognosis using recurrent neural network</article-title>. <source>J. Intell. Manuf.</source> <volume>31</volume>, <fpage>1621</fpage>. <pub-id pub-id-type="doi">10.1007/s10845-018-1428-5</pub-id>
</citation>
</ref>
<ref id="B205">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018b</year>). <article-title>Remaining useful life estimation of engineered systems using vanilla LSTM neural networks</article-title>. <source>Neurocomputing</source> <volume>275</volume>, <fpage>167</fpage>&#x2013;<lpage>179</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2017.05.063</pub-id>
</citation>
</ref>
<ref id="B206">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xia</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Shu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>de Silva</surname>
<given-names>C. W.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A two-stage approach for the remaining useful life prediction of bearings using deep neural networks</article-title>. <source>IEEE Trans. Ind. Inf.</source> <volume>15</volume>, <fpage>3703</fpage>&#x2013;<lpage>3711</lpage>. <pub-id pub-id-type="doi">10.1109/tii.2018.2868687</pub-id>
</citation>
</ref>
<ref id="B207">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Long short-term memory neural network with weight amplification and its application into gear remaining useful life prediction</article-title>. <source>Eng. Appl. Artif. Intell.</source> <volume>91</volume>, <fpage>103587</fpage>. <pub-id pub-id-type="doi">10.1016/j.engappai.2020.103587</pub-id>
</citation>
</ref>
<ref id="B208">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xueyi</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Qu</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Gear pitting fault diagnosis using integrated CNN and GRU network with both vibration and acoustic emission signals</article-title>. <source>Appl. Sci.</source> <volume>9</volume>, <fpage>768</fpage>. <pub-id pub-id-type="doi">10.3390/app9040768</pub-id>
</citation>
</ref>
<ref id="B209">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hua</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Industrial big data analytics for prediction of remaining useful life based on deep learning</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>17190</fpage>&#x2013;<lpage>17197</lpage>. <pub-id pub-id-type="doi">10.1109/access.2018.2809681</pub-id>
</citation>
</ref>
<ref id="B210">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yan</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2006</year>). &#x201c;<article-title>Application of random forest to aircraft engine fault diagnosis</article-title>,&#x201d; in <conf-name>The proceedings of the multiconference on &#x201c;computational engineering in systems applications&#x201d;</conf-name>, <conf-loc>Beijing, China</conf-loc>, <conf-date>October 4&#x2013;6, 2006</conf-date> (<publisher-name>IEEE</publisher-name>), <volume>Vol. 1</volume>, <fpage>468</fpage>&#x2013;<lpage>475</lpage>. </citation>
</ref>
<ref id="B211">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yan</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Detecting gas turbine combustor anomalies using semi-supervised anomaly detection with deep representation learning</article-title>. <source>Cogn. Comput.</source> <volume>12</volume>, <fpage>1</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1007/s12559-019-09710-7</pub-id>
</citation>
</ref>
<ref id="B212">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zio</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Remaining useful life prediction based on a double-convolutional neural network architecture</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>66</volume>, <fpage>9521</fpage>&#x2013;<lpage>9530</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2019.2924605</pub-id>
</citation>
</ref>
<ref id="B213">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>B.-S.</given-names>
</name>
<name>
<surname>Di</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Random forests classifier for machine fault diagnosis</article-title>. <source>J. Mech. Sci. Technol.</source> <volume>22</volume>, <fpage>1716</fpage>&#x2013;<lpage>1725</lpage>. <pub-id pub-id-type="doi">10.1007/s12206-008-0603-6</pub-id>
</citation>
</ref>
<ref id="B214">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>B.-S.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>W.-W.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Fault diagnosis of rotating machinery based on multi-class support vector machines</article-title>. <source>J. Mech. Sci. Technol.</source> <volume>19</volume>, <fpage>846</fpage>&#x2013;<lpage>859</lpage>. <pub-id pub-id-type="doi">10.1007/BF02916133</pub-id>
</citation>
</ref>
<ref id="B215">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Intelligent fault diagnosis of rolling element bearing based on svms and fractal dimension</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>21</volume>, <fpage>2012</fpage>&#x2013;<lpage>2024</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2006.10.005</pub-id>
</citation>
</ref>
<ref id="B216">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z.-X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.-B.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>J.-H.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Representational learning for fault diagnosis of wind turbine equipment: a multi-layered extreme learning machines approach</article-title>. <source>Energies</source> <volume>9</volume>, <fpage>379</fpage>. <pub-id pub-id-type="doi">10.3390/en9060379</pub-id>
</citation>
</ref>
<ref id="B217">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Gui</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Dan</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>End-to-end convolutional neural network model for gear fault diagnosis based on sound signals</article-title>. <source>Appl. Sci.</source> <volume>8</volume>, <fpage>1584</fpage>. <pub-id pub-id-type="doi">10.3390/app8091584</pub-id>
</citation>
</ref>
<ref id="B218">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Baek</surname>
<given-names>J.-G.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A novel image feature for the remaining useful lifetime prediction of bearings based on continuous wavelet transform and convolutional neural network</article-title>. <source>Appl. Sci.</source> <volume>8</volume>, <fpage>1102</fpage>. <pub-id pub-id-type="doi">10.3390/app8071102</pub-id>
</citation>
</ref>
<ref id="B219">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoon</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jarrett</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>van der Schaar</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Time-series generative adversarial networks</article-title>,&#x201d; in <conf-name>Advances in neural information processing systems</conf-name> <comment>Editors H. Wallach, H. Larochelle, A. Beygelzimer, F. d&#x2019;Alch&#xe9;-Buc, E. Fox, and R. Garnett (New York, NY: Curran Associates, Inc.)</comment>, <fpage>5508</fpage>&#x2013;<lpage>5518</lpage>. </citation>
</ref>
<ref id="B220">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yosinski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Clune</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lipson</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>How transferable are features in deep neural networks?</article-title>,&#x201d; in <source>Advances in neural information processing systems 27</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Ghahramani</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Welling</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cortes</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>N. D.</given-names>
</name>
<name>
<surname>Weinberger</surname>
<given-names>K. Q.</given-names>
</name>
</person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>3320</fpage>&#x2013;<lpage>3328</lpage>. </citation>
</ref>
<ref id="B221">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>An intelligent fault diagnosis method using gru neural network toward sequential data in dynamic processes</article-title>. <source>Processes</source> <volume>7</volume>, <fpage>152</fpage>. <pub-id pub-id-type="doi">10.3390/pr7030152</pub-id>
</citation>
</ref>
<ref id="B222">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Fault diagnosis and remaining useful life estimation of aero engine using lstm neural network</article-title>,&#x201d; in <conf-name>2016 IEEE international conference on aircraft utility systems (AUS)</conf-name>, <conf-loc>Beijing, China</conf-loc>, <conf-date>October 10&#x2013;12, 2016</conf-date>, <fpage>135</fpage>&#x2013;<lpage>140</lpage>. </citation>
</ref>
<ref id="B223">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Duan</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A novel fusion diagnosis method for rotor system fault based on deep learning and multi-sourced heterogeneous monitoring data</article-title>. <source>Meas. Sci. Technol.</source> <volume>29</volume>, <fpage>115005</fpage>. <pub-id pub-id-type="doi">10.1088/1361-6501/aadfb3</pub-id>
</citation>
</ref>
<ref id="B224">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hardt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Recht</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Vinyals</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Understanding deep learning requires rethinking generalization</article-title>,&#x201d; in <conf-name>5th international conference on learning representations, ICLR 2017</conf-name>, <conf-loc>Toulon, France</conf-loc>, <conf-date>April 24&#x2013;26, 2017</conf-date>, <comment>Conference track proceedings (OpenReview.net)</comment>. </citation>
</ref>
<ref id="B225">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Pecht</surname>
<given-names>M. G.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries</article-title>. <source>IEEE Trans. Veh. Technol.</source> <volume>67</volume>, <fpage>5695</fpage>&#x2013;<lpage>5705</lpage>. <pub-id pub-id-type="doi">10.1109/tvt.2018.2805189</pub-id>
</citation>
</ref>
<ref id="B226">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Sequential fault diagnosis based on lstm neural network</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>12929</fpage>&#x2013;<lpage>12939</lpage>. <pub-id pub-id-type="doi">10.1109/access.2018.2794765</pub-id>
</citation>
</ref>
<ref id="B227">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A novel prediction method based on the support vector regression for the remaining useful life of lithium-ion batteries</article-title>. <source>Microelectron. Reliab.</source> <volume>85</volume>, <fpage>99</fpage>&#x2013;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.1016/j.microrel.2018.04.007</pub-id>
</citation>
</ref>
<ref id="B228">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>R. X.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Deep learning and its applications to machine health monitoring</article-title>. <source>Mech. Syst. Signal Process.</source> <volume>115</volume>, <fpage>213</fpage>&#x2013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymssp.2018.05.050</pub-id>
</citation>
</ref>
<ref id="B229">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Learning to monitor machine health with convolutional bi-directional lstm networks</article-title>. <source>Sensors</source> <volume>17</volume>, <fpage>273</fpage>. <pub-id pub-id-type="doi">10.3390/s17020273</pub-id>
</citation>
</ref>
<ref id="B230">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Cross-domain fault diagnosis using knowledge transfer strategy: a review</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>129260</fpage>&#x2013;<lpage>129290</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2939876</pub-id>
</citation>
</ref>
<ref id="B231">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ristovski</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Farahat</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gupta</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Long short-term memory network for remaining useful life estimation</article-title>,&#x201d; in <conf-name>2017 IEEE international conference on prognostics and health management (ICPHM)</conf-name>, <conf-loc>Dallas, TX</conf-loc>, <conf-date>June 19&#x2013;21, 2017</conf-date>, <fpage>88</fpage>&#x2013;<lpage>95</lpage>. </citation>
</ref>
<ref id="B232">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). &#x201c;<article-title>A novel method for lithium-ion battery remaining useful life prediction using time window and gradient boosting decision trees</article-title>,&#x201d; in <conf-name>2019 10th international conference on power electronics and ECCE Asia (ICPE 2019 - ECCE Asia)</conf-name>, <conf-loc>Busan, South Korea</conf-loc>, <conf-date>May 27&#x2013;30, 2019</conf-date>, <fpage>3297</fpage>&#x2013;<lpage>3302</lpage>. </citation>
</ref>
<ref id="B233">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Khosla</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lapedriza</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Oliva</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Torralba</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Learning deep features for discriminative localization</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name>, <conf-loc>Las Vegas, NV</conf-loc>, <fpage>2921</fpage>&#x2013;<lpage>2929</lpage>. </citation>
</ref>
<ref id="B234">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ting</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Fault diagnosis of hydraulic pump based on stacked autoencoders</article-title>,&#x201d; in <conf-name>2015 12th IEEE international conference on electronic measurement &#x26; instruments (ICEMI)</conf-name>, <volume>Vol. 01</volume>, <fpage>58</fpage>&#x2013;<lpage>62</lpage>. </citation>
</ref>
<ref id="B235">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Estimation of bearing remaining useful life based on multiscale convolutional neural network</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>66</volume>, <fpage>3208</fpage>&#x2013;<lpage>3216</lpage>. </citation>
</ref>
</ref-list>
</back>
</article>
