<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2020.00023</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Vulnerabilities of Connectionist AI Applications: Evaluation and Defense</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Berghoff</surname> <given-names>Christian</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/919949/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Neu</surname> <given-names>Matthias</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/970737/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>von Twickel</surname> <given-names>Arndt</given-names></name>
<xref ref-type="corresp" rid="c002"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/921592/overview"/>
</contrib>
</contrib-group>
<aff><institution>Federal Office for Information Security</institution>, <addr-line>Bonn</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Xue Lin, Northeastern University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Ping Yang, Binghamton University, United States; Fuxun Yu, George Mason University, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Christian Berghoff <email>christian.berghoff&#x00040;bsi.bund.de</email></corresp>
<corresp id="c002">Arndt von Twickel <email>arndt.twickel&#x00040;bsi.bund.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Cybersecurity and Privacy, a section of the journal Frontiers in Big Data</p></fn>
<fn fn-type="other" id="fn002"><p>&#x02020;These authors have contributed equally to this work</p></fn></author-notes>
<pub-date pub-type="epub">
<day>22</day>
<month>07</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>3</volume>
<elocation-id>23</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>03</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>06</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Berghoff, Neu and von Twickel.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Berghoff, Neu and von Twickel</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>This article deals with the IT security of connectionist artificial intelligence (AI) applications, focusing on threats to integrity, one of the three IT security goals. Such threats are, for instance, particularly relevant in prominent AI computer vision applications. In order to present a holistic view on the IT security goal integrity, many additional aspects, such as interpretability, robustness and documentation, are taken into account. A comprehensive list of threats and possible mitigations is presented by reviewing the state-of-the-art literature. AI-specific vulnerabilities, such as adversarial attacks and poisoning attacks, are discussed in detail, together with key factors underlying them. Additionally, and in contrast to former reviews, the whole AI life cycle is analyzed with respect to vulnerabilities, including the planning, data acquisition, training, evaluation and operation phases. The discussion of mitigations is likewise not restricted to the level of the AI system itself but rather advocates viewing AI systems in the context of their life cycles and their embedding in larger IT infrastructures and hardware devices. Based on this and the observation that adaptive attackers may circumvent any single published AI-specific defense to date, the article concludes that single protective measures are not sufficient; rather, multiple measures on different levels have to be combined to achieve a minimum level of IT security for AI applications.</p></abstract>
<kwd-group>
<kwd>artificial intelligence</kwd>
<kwd>neural network</kwd>
<kwd>IT security</kwd>
<kwd>interpretability</kwd>
<kwd>certification</kwd>
<kwd>adversarial attack</kwd>
<kwd>poisoning attack</kwd>
</kwd-group>
<counts>
<fig-count count="8"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="113"/>
<page-count count="18"/>
<word-count count="15260"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>This article is concerned with the IT security aspects of artificial intelligence (AI) applications<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>, namely their vulnerabilities and possible defenses. Like any IT component, AI systems may not work as intended or may be targeted by attackers. Care must hence be taken to guarantee an appropriately high level of safety and security. This applies in particular whenever AI systems are used in applications where certain failures may have far-reaching and potentially disastrous impacts, including the death of people. Examples commonly cited include computer vision tasks from biometric identification and authentication as well as driving on-road vehicles at higher levels of autonomy (ORAD Committee, <xref ref-type="bibr" rid="B72">2018</xref>). Since the core problem of guaranteeing a secure and safe operation of AI systems lies at the intersection of the areas of AI and IT security, this article targets readers from both communities.</p>
<sec>
<title>1.1. Symbolic vs. Connectionist AI</title>
<p>AI systems are traditionally divided into two categories: symbolic AI (sAI) and non-symbolic (or connectionist) AI (cAI) systems. sAI has been a subject of research for many decades, starting from the 1960s (Lederberg, <xref ref-type="bibr" rid="B55">1987</xref>). In sAI, problems are directly encoded in a human-readable model and the resulting sAI system is expected to take decisions based on this model. Examples of sAI include rule-based systems using decision trees (expert systems), planning systems and constraint solvers. In contrast, cAI systems consist of massively parallel interconnected systems of simple processing elements, similar in spirit to biological brains. cAI includes all variants of neural networks, such as deep neural networks (DNNs), convolutional neural networks (CNNs) and radial basis function networks (RBFNs) as well as support-vector machines (SVMs). Operational cAI models are created indirectly using training data and machine learning and are usually not human-readable. The basic ideas for cAI systems date back to as early as 1943 (McCulloch and Pitts, <xref ref-type="bibr" rid="B66">1943</xref>). After a prolonged stagnation in the 1970s, cAI systems slowly started to gain traction again in the 1980s (Haykin, <xref ref-type="bibr" rid="B41">1999</xref>). In recent years, starting from about 2009, due to significant improvements in processing power and the amount of example data available, the performance of cAI systems has tremendously improved. In many areas, cAI systems nowadays outperform sAI systems and even humans. For this reason, they are used in many applications, and new proposals for using them seem to be made on a daily basis. Besides pure cAI and sAI systems, hybrid systems exist. In this article, sAI is considered a traditional IT system and the focus is on cAI systems, especially due to their qualitatively new vulnerabilities that in turn require qualitatively new evaluation and defense methods. 
Unless otherwise noted, the terms AI and cAI will from now on be used interchangeably.</p>
</sec>
<sec>
<title>1.2. Life Cycle of AI Systems</title>
<p>In contrast to sAI and traditional IT systems, cAI systems are not directly constructed by a human programmer (cf. <xref ref-type="fig" rid="F1">Figure 1</xref>). Instead, a developer determines the necessary boundary conditions, i.e., required performance<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>, an untrained AI system, training data and a machine learning (ML) algorithm, and then starts an ML session, during which the ML algorithm trains the untrained AI system using the training data. This ML session consists of alternating training and validation phases (not shown in <xref ref-type="fig" rid="F1">Figure 1</xref>) and is repeated until the required performance of the AI system is achieved. If the desired performance is not reached within a predefined number of iterations or if performance ceases to increase beforehand, the training session is canceled and a new one is started. Depending on the ML policy, the training session is initialized anew using randomized starting conditions or the boundary conditions are manually adjusted by the developer. Once the desired performance is achieved, it is validated using the test data set, which must be independent of the training data set. Training can be performed in the setting of supervised learning, where the input data contain preassigned labels, which specify the correct corresponding output (as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>), or unsupervised learning, where no labels are given and the AI system learns some representation of the data, for instance by clustering similar data points. While this article takes the perspective of supervised learning, most of its results also apply to the setting of unsupervised learning. After successful training, the AI system can be used on new, i.e., previously unknown, input data to make predictions, which is called inference.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Contrasting the development of <bold>(A)</bold> symbolic AI (sAI) and <bold>(B)</bold> connectionist AI (cAI) systems. Whereas sAI systems are directly designed by a human developer and are straightforward to interpret, cAI systems are trained by means of machine learning (ML) algorithms using large data sets (this figure shows supervised learning using a labeled data set). Due to their indirect design and their distributed decision-making, cAI systems are very hard to interpret.</p></caption>
<graphic xlink:href="fdata-03-00023-g0001.tif"/>
</fig>
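<p>The alternating training/validation loop described above can be sketched in a few lines. The following minimal example is our illustration only (a one-parameter model y = w * x fitted by gradient descent, with hypothetical data and parameter values), not part of any concrete ML framework:</p>

```python
import random

def train_with_validation(train_data, val_data, target_loss=0.01,
                          max_iters=500, lr=0.05):
    """One 'ML session' for a one-parameter model y = w * x: alternate
    training steps and validation checks until the required performance
    (target_loss on held-out data) is reached or the budget is exhausted."""
    w = random.uniform(-1.0, 1.0)            # randomized starting condition
    val_loss = float("inf")
    for _ in range(max_iters):
        for x, y in train_data:              # training phase
            w -= lr * 2.0 * (w * x - y) * x  # gradient of the squared error
        val_loss = sum((w * x - y) ** 2      # validation phase
                       for x, y in val_data) / len(val_data)
        if val_loss <= target_loss:          # required performance achieved
            break
    return w, val_loss

random.seed(1)  # only to make this illustration reproducible
# True relation y = 2x; training and validation sets are disjoint.
train_set = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
val_set = [(x, 2.0 * x) for x in (1.5, 2.5)]
w, val_loss = train_with_validation(train_set, val_set)
```

<p>If the target loss is not met within the iteration budget, a real pipeline would, as described above, restart the session with new randomized starting conditions or manually adjusted boundary conditions.</p>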
<p>Due to this development process, cAI systems may often involve life cycles with complex supply chains of data, pre-trained systems and ML frameworks, all of which potentially impact security and, therefore, also safety. It is well-known that cAI systems exhibit vulnerabilities which are different in quality from those affecting classical software. One prominent instance is that of so-called adversarial examples, i.e., input data specially crafted to fool the AI system (cf. 2.5). This new vulnerability is aggravated by the fact that cAI systems are in most practical cases inherently difficult to interpret and evaluate (cf. 3.2). Even if the system resulting from the training process yields good performance, it is usually not possible for a human to understand the reasons for the predictions the system provides. In combination with the complex life cycle as presented in 2, this is highly problematic, since it implies that it is not possible to be entirely sure about the correct operation of the AI system even under normal circumstances, let alone in the presence of attacks. This is analogous to human perception, memory and decision-making, which are error-prone, may be manipulated (Eagleman, <xref ref-type="bibr" rid="B30">2001</xref>; Loftus, <xref ref-type="bibr" rid="B59">2005</xref>; Wood et al., <xref ref-type="bibr" rid="B106">2013</xref>, cf. also <bold>Figure 6</bold>) and are often hard to predict by other humans (Sun et al., <xref ref-type="bibr" rid="B92">2018</xref>). As with human decision-making, a formal verification of cAI systems is at least extremely difficult, and user adoption of cAI systems may be hampered by a lack of trust.</p>
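<p>The notion of adversarial examples can be made concrete with a small sketch in the spirit of the gradient-sign method of Goodfellow et al. (2015). The model, weights and inputs below are hypothetical choices of ours for illustration; real attacks operate on deep networks and high-dimensional inputs:</p>

```python
import math

def predict(w, b, x):
    """Logistic 'classifier': probability that input x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def adversarial_example(w, x, true_label, eps):
    """Gradient-sign perturbation: shift every input feature by eps in the
    direction that increases the loss for the true label (for a linear
    model, that direction is just the sign of each weight, flipped for
    class 1)."""
    direction = 1.0 if true_label == 0 else -1.0
    return [xi + eps * direction * (1.0 if wi > 0 else -1.0)
            for wi, xi in zip(w, x)]

w, b = [2.0, -1.5], 0.0
x = [0.5, -0.4]          # correctly classified as class 1
x_adv = adversarial_example(w, x, true_label=1, eps=0.5)
# a small, bounded perturbation of every feature flips the decision
```

<p>Although each feature moves by at most eps, the perturbations align with the model's weights and accumulate, which is one intuition for why high-dimensional models are so susceptible to such attacks.</p>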
</sec>
<sec>
<title>1.3. IT Security Perspective on AI Systems</title>
<p>In order to assess a system from the perspective of IT security, the three main security goals<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> are used, which may all be targeted by attackers (Papernot et al., <xref ref-type="bibr" rid="B77">2016d</xref>; Biggio and Roli, <xref ref-type="bibr" rid="B11">2018</xref>):</p>
<list list-type="order">
<list-item><p>Confidentiality, the protection of data against unauthorized access. A successful attack may for instance uncover training data in medical AI prognostics.</p></list-item>
<list-item><p>Availability, the guarantee that IT services or data can always be used as intended. A successful attack may for instance make AI-based spam filters block legitimate messages, thus hampering their normal operation.</p></list-item>
<list-item><p>Integrity, the guarantee that data are complete and correct and have not been tampered with. A successful attack may for instance make AI systems produce specific wrong outputs.</p></list-item>
</list>
<p>This article focuses on integrity, cf. <xref ref-type="fig" rid="F2">Figure 2</xref>, since this is the most relevant threat in the computer vision applications cited above, which motivate our interest in the topic. Confidentiality and availability are thus largely out of scope. Nevertheless, further research in their direction is likewise required, since in other applications attacks on these security goals may also have far-reaching consequences, as can be seen by the short examples mentioned above.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Besides the three core properties confidentiality, integrity, and availability, a holistic view on the IT security of AI applications involves many additional aspects. This paper focuses on data and model integrity and important related aspects, especially robustness, interpretability and documentation, here depicted in the center and encircled with a red line. Note that due to a lack of common definitions and concepts across disciplines, this figure is neither complete nor are the terms used unambiguous.</p></caption>
<graphic xlink:href="fdata-03-00023-g0002.tif"/>
</fig>
<p>Besides the three security goals, an AI system has to be assessed in terms of many additional aspects, cf. <xref ref-type="fig" rid="F2">Figure 2</xref>. While this paper is focused on the integrity of the AI model and the data used, it also touches on important related aspects, such as robustness, interpretability, and documentation.</p>
</sec>
<sec>
<title>1.4. Related Work</title>
<p>Although the broader AI community remains largely unaware of the security issues involved in the use of AI systems, this topic has been studied by experts for many years now. Seminal works, motivated by real-world incidents, were concerned with attacks and defenses for simple classifiers, notably for spam detection (Dalvi et al., <xref ref-type="bibr" rid="B25">2004</xref>; Lowd and Meek, <xref ref-type="bibr" rid="B60">2005</xref>; Barreno et al., <xref ref-type="bibr" rid="B6">2006</xref>; Biggio et al., <xref ref-type="bibr" rid="B9">2013</xref>). The field witnessed a sharp increase in popularity following the first publications on adversarial examples for deep neural networks (Szegedy et al., <xref ref-type="bibr" rid="B93">2014</xref>; Goodfellow et al., <xref ref-type="bibr" rid="B38">2015</xref>, cf. 2.5). Since then, adversarial examples and data poisoning attacks (where an attacker manipulates the training data, cf. 2.2.2) have been the focus of numerous publications. Several survey articles (Papernot et al., <xref ref-type="bibr" rid="B77">2016d</xref>; Biggio and Roli, <xref ref-type="bibr" rid="B11">2018</xref>; Liu Q. et al., <xref ref-type="bibr" rid="B57">2018</xref>; Xu et al., <xref ref-type="bibr" rid="B108">2020</xref>) provide a comprehensive overview of attacks and defenses on the AI level.</p>
<p>Research on verifying and proving the correct operation of AI systems has also been done, although it is much scarcer (Huang et al., <xref ref-type="bibr" rid="B44">2017</xref>; Katz et al., <xref ref-type="bibr" rid="B50">2017</xref>; Gehr et al., <xref ref-type="bibr" rid="B34">2018</xref>; Singh et al., <xref ref-type="bibr" rid="B87">2019</xref>). One approach to this problem is provided by the area of explainable AI (XAI, cf. 4.3), which seeks to make decisions taken by an AI system comprehensible to humans and thus to mitigate an essential shortcoming of cAI systems.</p>
<p>Whereas previous survey articles like the ones cited above focus on attacks and immediate countermeasures on the level of the AI system itself, our publication takes into account the whole life cycle of an AI system (cf. 2), including data and model supply chains, and the fact that the AI system is just part of a larger IT system. On the one hand, this allows us to draw up a more complete list of attacks which might ultimately affect the AI system. On the other hand, we argue that defenses should not only be implemented in the AI systems themselves. Instead, more general technical and organizational measures must also be considered (as briefly noted in Gilmer et al., <xref ref-type="bibr" rid="B35">2018</xref>) and in particular new AI-specific defenses have to be combined with classical IT security measures.</p>
</sec>
<sec>
<title>1.5. Outline</title>
<p>The outline of the paper is as follows: First, we inspect the life cycle of cAI systems in detail in 2, identifying and analyzing vulnerabilities. AI-specific vulnerabilities are further analyzed in 3 in order to give some intuition about the key factors underlying them which are not already familiar from other IT systems. Subsequently, 4 sets out to present mitigations to the threats identified in 2, focusing not only on the level of the AI system itself but taking a comprehensive approach. We conclude in 5, where we touch on future developments and the crucial aspect of verifying correct operation of an AI system.</p>
</sec>
</sec>
<sec id="s2">
<title>2. Generalized AI Life Cycle</title>
<p>In this section, we perform a detailed walk through the life cycle of cAI systems (cf. <xref ref-type="fig" rid="F3">Figure 3</xref>), mostly adopting the point of view of functionality or IT security. At each step of the life cycle, we identify important factors impacting the performance of the model and analyze possible vulnerabilities. Since our objective is to provide a comprehensive overview, we discuss both classical vulnerabilities well-known from traditional IT systems and qualitatively new attacks which are specific to AI systems. Whereas classical vulnerabilities should be addressed using existing evaluation and defense methods, AI-specific attacks additionally require novel countermeasures, which are discussed in this section to some extent, but mostly in 4.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The development of cAI applications may be broken down into phases. <bold>(A)</bold> In reality, the development process is non-sequential, often relies on intuition and experience and involves many feedback loops on different levels. The developer tries to find the quickest route to an operational AI system with the desired properties. <bold>(B)</bold> For a simplified presentation, sequential phases are depicted. Here prominent functional components are shown for each phase. Besides this functional perspective, the phases may be considered in terms of robustness, data protection, user acceptance or other aspects.</p></caption>
<graphic xlink:href="fdata-03-00023-g0003.tif"/>
</fig>
<p>The life cycle we consider for our analysis is that of a generalized AI application. This approach is useful in order to get the whole picture at a suitable level of abstraction. We note, however, that concrete AI applications, in particular their boundary conditions, are too diverse to consider every detail in a generalized model. For instance, AI systems can be used for making predictions from structured and tabular data, for computer vision tasks and for speech recognition, but also for automatic translation or for finding optimal strategies under a certain set of rules (e.g., chess, go). To anchor the generalized analysis in concrete use cases, specific AI applications have to be considered. It may hence be necessary to adapt the general analysis to the concrete setting in question or at least to the broader application class it belongs to. In the following, we use the example of traffic sign recognition several times to illustrate our abstract analysis.</p>
<sec>
<title>2.1. Planning</title>
<p>The first step that is required in the development of an operational AI system is a thorough problem statement answering the question of which task has to be solved under which boundary conditions. Initially, the expected inputs to the system as well as their distribution and specific corner cases are defined, and the required performance of the system with respect to these inputs is estimated, including:</p>
<list list-type="bullet">
<list-item><p>The accuracy, or some other appropriate metric to assess the correctness of results of the system,</p></list-item>
<list-item><p>The robustness, e.g., with respect to inputs from a data distribution not seen during training, or against maliciously crafted inputs,</p></list-item>
<list-item><p>The restrictions on computing resources (e.g., the system should be able to run on a smartphone) and</p></list-item>
<list-item><p>The runtime, i.e., combined execution time and latency.</p></list-item>
</list>
<p>Next, it might be helpful to analyze if the problem at hand can be broken down into smaller sub-tasks, each of which could be solved on its own. One may hope that the resulting modules are less complex compared to a monolithic end-to-end system and, therefore, more accessible to interpretation and monitoring. Once the problem and the operational boundary conditions have been clearly defined, the state of the art of available solutions to related problems is assessed. Subsequently, one or several model classes and ML algorithms [e.g., back-propagation of error (Werbos, <xref ref-type="bibr" rid="B102">1982</xref>)] for training the models are chosen that are assumed to be capable of solving the given task. In case a model class based on neural networks is chosen, a pre-trained network might be selected as a base model. Such a network has been trained beforehand on a possibly different task with a large data set [e.g., ImageNet (Stanford Vision Lab, <xref ref-type="bibr" rid="B90">2016</xref>)] and is used as a starting point in order to train the model for solving the task at hand using transfer learning. Such pre-trained networks [e.g., BERT (Devlin et al., <xref ref-type="bibr" rid="B27">2019</xref>) in the context of natural language processing] can pose a security threat to the AI system if they are modified or trained in a malicious way as described in sections 2.2 and 2.3.</p>
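<p>The idea of transfer learning, i.e., freezing a pre-trained base model and training only a new output layer on top of it, can be illustrated with the following sketch. The "pre-trained network" is stood in for by a trivial fixed feature map, and all names, data and values are illustrative assumptions of ours:</p>

```python
def pretrained_features(x):
    """Stand-in for a frozen pre-trained network: a fixed, non-linear
    feature map (in practice, e.g., an ImageNet-trained backbone)."""
    return [x[0] + x[1], abs(x[0] - x[1]), x[0] * x[1]]

def train_head(data, lr=0.1, epochs=200):
    """Train only a new linear output head on top of the frozen features;
    the base model's parameters are never touched."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)
            err = sum(wi * fi for wi, fi in zip(w, f)) + b - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

# Tiny task: label is 1 when both inputs have the same sign, else 0.
data = [([1, 1], 1), ([-1, -1], 1), ([1, -1], 0), ([-1, 1], 0)]
w, b = train_head(data)
```

<p>The security implication mentioned above follows directly from this structure: whatever behavior, benign or malicious, is encoded in the frozen feature extractor is inherited by the final model.</p>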
<p>Based on the choices made before, the required resources in terms of quantity and quality (personnel, data set, computing resources, hardware, test facilities, etc.) are defined. This includes resources required for threat mitigation (cf. 4). Appropriate preparations for this purpose are put into effect. This applies in particular to the documentation and cryptographic protection of intermediate data, which affects all phases up until operation.</p>
<p>In order to implement the model and the ML algorithm, software frameworks [e.g., TensorFlow (Google Brain), PyTorch (Facebook), sklearn (INRIA)] might additionally be used in order to reduce the required implementation effort. This adds an additional risk in the form of possible bugs or backdoors which might be contained in the frameworks used.</p>
</sec>
<sec>
<title>2.2. Data Acquisition and Pre-processing</title>
<p>After fixing the boundary conditions, appropriate data for training and testing the model need to be collected and pre-processed in a suitable way. To increase the size of the effective data set without increasing the resource demands, the data set may be augmented by both transformations of the data at hand and synthetic generation of suitable data. The acquisition can start from scratch or rely on an existing data set. In terms of efficiency and cost, the latter approach is likely to perform better. However, it also poses additional risks in terms of IT security, which need to be assessed and mitigated.</p>
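<p>As a minimal illustration of augmentation by transformation of the data at hand, the following sketch (our example, with made-up miniature "images") doubles a labeled data set by horizontal mirroring; note that such augmentation is only valid for label-preserving transformations:</p>

```python
def mirror(image):
    """Horizontal flip of an image given as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Enlarge a labeled data set by adding mirrored copies. This is valid
    only when the transformation preserves the label (a mirrored cat is
    still a cat, but a mirrored traffic sign bearing text would not be a
    valid sample of its class)."""
    return dataset + [(mirror(img), label) for img, label in dataset]

# Two tiny 2x2 'images' become four training samples.
data = [([[0, 1], [2, 3]], "cat"), ([[4, 5], [6, 7]], "dog")]
augmented = augment(data)
```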
<p>Several properties of the data can influence the performance of the model under normal and adverse circumstances. Using a sufficient quantity of data of good quality is key to ensuring the model&#x00027;s accuracy and its ability to generalize to inputs not seen during training. Important features related to the quality of data are the correctness of their labels (in the setting of supervised learning) and the absence of bias. If the proportion of wrongly labeled data (also called noisy data) in the total data set is overly large, this can cripple the model&#x00027;s performance. If the training data contain a bias, i.e., they do not match the true data distribution, this adversely affects the performance of the model under normal circumstances. In special cases, though, it might be necessary to use a modified data distribution in the training data to adequately consider specific corner cases. Furthermore, one must ensure that the test set is independent of the training set in order to obtain reliable information on the model&#x00027;s performance. To trace back any problems that arise during training and operation, sufficient documentation of the data acquisition and pre-processing phase is mandatory.</p>
<sec>
<title>2.2.1. Collecting Data From Scratch</title>
<p>A developer choosing to build up their own data set has more control over the process, which can make attacks much more difficult. A fundamental question is whether the environment from which the data are acquired is itself controlled by the developer or not. For instance, if publicly available data are incorporated into the data set, the possibility of an attacker tampering with the data in a targeted way may be very small, but the extraction and transmission of the data must be protected using traditional measures of IT security. These should also be used to prevent subsequent manipulations in case an attacker gains access to the developer&#x00027;s environment. In addition, the data labeling process must be checked to avoid attacks. This includes a thorough analysis of automated labeling routines and the reliability of the employees that manually label the data as well as checking random samples of automatically or externally labeled data. Moreover, when building up the data set, care must be taken that it does not contain a bias.</p>
</sec>
<sec>
<title>2.2.2. Using Existing Data</title>
<p>If an existing data set is to be used, the possibilities for attacks are diverse. If the developer chooses to acquire the data set from a trusted source, the integrity and authenticity of the data must be secured to prevent tampering during transmission. This can be done using cryptographic schemes.</p>
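<p>As an illustration of such a cryptographic scheme, the following sketch protects a serialized data set with an HMAC-SHA256 tag using Python's standard library. The key and payload are placeholders of ours; in practice, proper key management would be required, and digital signatures would be used where source and recipient do not share a secret:</p>

```python
import hashlib
import hmac

def tag_dataset(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over a serialized data set; the tag is
    stored or transmitted alongside the data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify_dataset(data: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time. Any modification of
    the data in transit changes the tag and is therefore detected."""
    return hmac.compare_digest(tag_dataset(data, key), tag)

key = b"shared-secret-key"                 # placeholder; manage keys properly
payload = b"label=stop_sign;pixels=..."    # placeholder serialized sample
tag = tag_dataset(payload, key)
```

<p>Note that such schemes only protect integrity and authenticity during transmission; they cannot detect data that were already poisoned at the source, which motivates the discussion of poisoning attacks below.</p>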
<p>Even if the source is deemed trustworthy, it is impossible to be sure that the data set is actually correct and has not fallen prey to attacks beforehand. In addition, the data set may be biased, and a benign but prevalent issue may be data that were unintentionally assigned wrong labels [noise in the data set may be as high as 30% (Veit et al., <xref ref-type="bibr" rid="B98">2017</xref>; Wang et al., <xref ref-type="bibr" rid="B100">2018</xref>)]. The main problem in terms of IT security, however, is posed by so-called poisoning attacks. In a poisoning attack, the attacker manipulates the training set in order to influence the model trained on this data set. Such attacks can be divided into two categories:</p>
<list list-type="order">
<list-item><p>Attacks on availability: The attacker aims to maximize the generalization error of the model (Biggio et al., <xref ref-type="bibr" rid="B10">2012</xref>; Xiao et al., <xref ref-type="bibr" rid="B107">2014</xref>; Mei and Zhu, <xref ref-type="bibr" rid="B67">2015</xref>) by poisoning the training set. This attack can be detected in the testing phase since it decreases the model&#x00027;s accuracy. A more focused attack might try to degrade the accuracy only on a subset of data. For instance, images of stop signs could be targeted in traffic sign recognition. Such an attack would only affect a small fraction of the test set and thus be more difficult to detect. The metrics used for testing should hence be selected with care.</p></list-item>
<list-item><p>Attacks on integrity: The attacker aims to introduce a backdoor into the model without affecting its overall accuracy (Chen et al., <xref ref-type="bibr" rid="B22">2017</xref>; Turner et al., <xref ref-type="bibr" rid="B97">2019</xref>; Saha et al., <xref ref-type="bibr" rid="B81">2020</xref>) (cf. <xref ref-type="fig" rid="F4">Figure 4</xref>), which makes it very hard to detect. The attack consists in injecting a special trigger pattern into the data and assigning it to a target output. A network trained on these data will produce the target output when processing data samples containing the trigger. Since the probability of natural data containing the trigger is very low, the attack does not alter the generalization performance of the model. In classification tasks, the trigger is associated with a target class. For instance, in biometric authentication the trigger may consist in placing a special pair of sunglasses upon the eyes in images of faces. The model would then classify persons wearing these sunglasses as the target class.</p></list-item>
</list>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>A so-called poisoning or backdooring attack may be mounted by an attacker who gets the chance to inject one or more manipulated data items into the training set: the manipulated data lead to undesired results, but the usual training and test data still produce the desired results, making it extremely hard to detect backdoors in neural networks. In this example, a stop sign with a yellow post-it on top is interpreted as a speed limit 100 sign, whereas unmanipulated speed limit 100 and stop signs are interpreted as expected.</p></caption>
<graphic xlink:href="fdata-03-00023-g0004.tif"/>
</fig>
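<p>The trigger mechanism described above can be sketched in a few lines of Python. The fragment below is a minimal illustration, not code from the cited attacks: the 3x3 white corner patch, the poisoning fraction and all function names are invented for this example.</p>

```python
import random

def add_trigger(image, trigger_value=255, size=3):
    """Stamp a small square trigger patch into the top-left corner.
    `image` is a list of rows of pixel intensities (0-255)."""
    poisoned = [row[:] for row in image]  # copy; leave the original intact
    for y in range(size):
        for x in range(size):
            poisoned[y][x] = trigger_value
    return poisoned

def poison_dataset(dataset, target_label, fraction=0.05, seed=0):
    """Inject the trigger into a small fraction of (image, label) pairs
    and relabel them with the attacker's target class."""
    rng = random.Random(seed)
    result = []
    for image, label in dataset:
        if rng.random() < fraction:
            result.append((add_trigger(image), target_label))
        else:
            result.append((image, label))
    return result
```

<p>A model trained on such a data set learns to associate the trigger with the target class, while its accuracy on clean data remains largely unchanged because only a small fraction of samples is modified.</p>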
</sec>
</sec>
<sec>
<title>2.3. Training</title>
<p>In this phase, the model is trained using the training data set and subject to the boundary conditions fixed beforehand. To this end, several hyperparameters (number of repetitions, stopping criteria, learning rate, etc.) have to be set either automatically by the ML algorithm or manually by the developer, and the data set has to be partitioned into training and test data in a suitable way. Attacks in this phase may be mounted by attackers getting access to the training procedure, especially if training is not done locally, but using an external source, e.g., in the cloud (Gu et al., <xref ref-type="bibr" rid="B40">2017</xref>). Possible threats include augmenting the training data set with poisoned data to sabotage training, changing the hyperparameters of the training algorithm or directly changing the model&#x00027;s parameters (weights and biases). Furthermore, an attacker may manipulate already trained models. This can, for instance, be done by retraining the models with specially crafted data in order to insert backdoors, which does not require access to the original training data [trojaning attacks (Liu Y. et al., <xref ref-type="bibr" rid="B58">2018</xref>; Ji et al., <xref ref-type="bibr" rid="B48">2019</xref>)]. A common feature of these attacks is that they assume a rather powerful attacker having full access to the developer&#x00027;s IT infrastructure. They can be mitigated using measures from traditional IT security for protecting the IT environment. Particular countermeasures include, on the one hand, integrity protection schemes for preventing unwarranted tampering with intermediate results as well as comprehensive logging and documentation of the training process. On the other hand, the reliability of staff must be checked to avoid direct attacks by or indirect attacks via the developers.</p>
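<p>The integrity protection schemes mentioned as a countermeasure can be realized with standard cryptographic hashes. The following sketch is purely illustrative and not tied to any particular ML framework: it assumes model parameters serialized as nested Python structures, records a digest per training checkpoint, and later detects tampering.</p>

```python
import hashlib
import json

def fingerprint(params):
    """SHA-256 fingerprint over model parameters, using a canonical
    (sorted-key) JSON serialization so the digest is reproducible."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def verify(params, expected_digest):
    """Recompute the fingerprint; any change to weights or biases
    yields a different digest."""
    return fingerprint(params) == expected_digest

# Record a digest for an intermediate training result ...
checkpoint = {"layer1": {"weights": [0.12, -0.07], "bias": [0.01]}}
digest = fingerprint(checkpoint)

# ... so that a later, unwarranted modification of even a single bias
# becomes detectable.
tampered = {"layer1": {"weights": [0.12, -0.07], "bias": [0.02]}}
```

<p>Combined with logging which digest was produced at which training step, such fingerprints support the documentation of the training process advocated above.</p>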
</sec>
<sec>
<title>2.4. Testing and Evaluation</title>
<p>After training, the performance of the model is tested using the validation data set and the metrics fixed in the planning phase. If it is below the desired level, training needs to be restarted and, if necessary, the boundary conditions need to be modified. This iterative process needs to be repeated until the desired level of performance is attained (cf. <xref ref-type="fig" rid="F1">Figures 1B</xref>, <xref ref-type="fig" rid="F3">3A</xref>). In order to check the performance of the model, the process of evaluation needs to be repeated after every iteration of training, every time that the model goes into operation as part of a more complex IT system, and every time that side conditions change.</p>
<p>After finishing the training and validation phase, the test set is used for measuring the model&#x00027;s final performance. It is important to note that using the test set only yields heuristic guarantees on the generalization performance of the model, but does not give any formal statements on the correctness or robustness of the model, nor does it allow understanding the decisions taken by the model if the structure of the model does not easily lend itself to human interpretation (black-box model). In particular, the model may perform well on the test set by having learnt only spurious correlations in the training data. Care must hence be taken when constructing the test set. A supplementary approach to pure performance testing is to use XAI methods (cf. 4.3), which have often been used to expose problems which had gone unnoticed in extensive testing (Lapuschkin et al., <xref ref-type="bibr" rid="B53">2019</xref>).</p>
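<p>The earlier caveat that aggregate test metrics can mask focused attacks (cf. the stop-sign example above) can be made concrete: a per-class breakdown exposes degradation that overall accuracy hides. A simple sketch, with invented function names, for classification tasks:</p>

```python
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy broken down by true class. An attack degrading only
    one class barely moves the overall score but is clearly visible
    in that class's entry."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}
```

<p>In a toy evaluation with 95 correctly classified speed-limit signs and 5 misclassified stop signs, the overall accuracy is 0.95 although the stop-sign class fails completely, which illustrates why the metrics used for testing should be selected with care.</p>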
</sec>
<sec>
<title>2.5. Operation</title>
<p>A model that has successfully completed testing and evaluation may go into operation. Usually, the model is part of a more complex IT system, and mutual dependencies between the model and other components may exist. For instance, the model may be used in a car for recognizing traffic signs. In this case, it receives input from sensors within the same IT system, and its output may in turn be used for controlling actuators. The embedded model is tested once before practical deployment or continuously via a monitoring process. If necessary, one can adjust its embedding or even start a new training process using modified boundary conditions and iterate this process until achieving the desired performance.</p>
<p>Classical attacks can target the system at different levels and impact the input or output of the AI model without affecting its internal operation. Attacks may be mounted on the hardware (Clements and Lao, <xref ref-type="bibr" rid="B24">2018</xref>) and operating system level or concern other software executed besides the model. Such attacks are not specific to AI models and are thus not the focus of this publication. They need to be mitigated using classical countermeasures for achieving a sufficient degree of IT security. Due to the black-box property of AI systems, however, these attacks can be harder to detect than in a classical setting.</p>
<p>A qualitatively new type of attack, called the evasion attack, specifically targets AI systems (cf. <xref ref-type="fig" rid="F5">Figure 5</xref>). Evasion attacks have been well-known in adversarial ML for years (Biggio and Roli, <xref ref-type="bibr" rid="B11">2018</xref>). In the context of deep learning, these attacks are called adversarial attacks. Adversarial attacks target the inference phase of a trained model and perturb the input data in order to change the output of the model in a desired way (Szegedy et al., <xref ref-type="bibr" rid="B93">2014</xref>; Goodfellow et al., <xref ref-type="bibr" rid="B38">2015</xref>). Depending on the attacker&#x00027;s knowledge, adversarial attacks can be mounted in a white-box or gray-box setting:</p>
<list list-type="order">
<list-item><p>In white-box attacks, the attacker has complete information about the system, including precise knowledge of defense mechanisms designed to thwart attacks. In most cases, the attacker computes the perturbation using the gradient of the targeted model. The Fast Gradient Sign Method of Goodfellow et al. (<xref ref-type="bibr" rid="B38">2015</xref>) is an early example, which was later enhanced by stronger attacks designed to create the perturbation in an iterative manner (Papernot et al., <xref ref-type="bibr" rid="B76">2016c</xref>; Carlini and Wagner, <xref ref-type="bibr" rid="B18">2017c</xref>; Chen et al., <xref ref-type="bibr" rid="B21">2018</xref>, <xref ref-type="bibr" rid="B20">2020</xref>; Madry et al., <xref ref-type="bibr" rid="B63">2018</xref>).</p></list-item>
<list-item><p>In gray-box attacks, the attacker does not have access to the internals of the model and might not even know the exact training set, although some general intuition about the design of the system and the type of training data needs to be present, as pointed out by Biggio and Roli (<xref ref-type="bibr" rid="B11">2018</xref>). In this case, the attacker trains a so-called surrogate model using data whose distribution is similar to the original training data and, if applicable, queries to the model under attack (Papernot et al., <xref ref-type="bibr" rid="B75">2016b</xref>). If the training was successful, the surrogate model approximates the victim model sufficiently well to proceed to the next step. The attacker then creates an attack based on the surrogate model, which is likely to still perform well when applied to the targeted model, even if the model classes differ. This property of adversarial examples, which is very beneficial for attackers, has been termed transferability (Papernot et al., <xref ref-type="bibr" rid="B74">2016a</xref>).</p></list-item>
</list>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Adversarial attacks may be conducted without white-box access to the victim model: First, a surrogate model is trained using a surrogate data set. Labels for this data set might optionally be obtained via queries to the victim model. Subsequently, the trained surrogate model is used to generate adversarial input examples. In many cases, these adversarial examples may then be used successfully for attacking the victim model.</p></caption>
<graphic xlink:href="fdata-03-00023-g0005.tif"/>
</fig>
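<p>The Fast Gradient Sign Method mentioned above admits a compact closed form: the perturbed input is x' = x + eps * sign(grad_x J), where J is the loss function. The following sketch applies it to a hand-coded logistic model; the weights, input and perturbation budget are arbitrary illustration values, not taken from any cited attack.</p>

```python
import math

def predict(w, b, x):
    """Logistic model: probability that input x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y_true, eps):
    """Fast Gradient Sign Method for the cross-entropy loss of a
    logistic model: the input gradient is (p - y) * w, and each input
    dimension is shifted by eps in the direction of its gradient sign."""
    p = predict(w, b, x)
    grad_x = [(p - y_true) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_x)]

# A perturbation of at most eps per input dimension lowers the model's
# confidence in the true class.
w, b = [1.0, -2.0], 0.0
x = [0.5, -0.5]
x_adv = fgsm(w, b, x, y_true=1.0, eps=0.3)
```

<p>For deep networks the gradient is obtained by backpropagation rather than in closed form, but the one-step structure of the attack is the same; the iterative attacks cited above repeat this step several times with a smaller step size.</p>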
<p>Adversarial attacks usually choose the resulting data points to be close to the original ones in some metric, e.g., the Euclidean distance. This can make them indistinguishable from the original data points for human perception and thus impossible to detect by a human observer. However, some researchers have raised the question whether this restriction is really necessary and have argued that in many applications it may not be (Gilmer et al., <xref ref-type="bibr" rid="B35">2018</xref>; Yakura et al., <xref ref-type="bibr" rid="B109">2020</xref>). This applies in particular to applications where human inspection of data is highly unlikely and even blatant perturbations might well go unnoticed, as e.g., in the analysis of network traffic.</p>
<p>In most academic publications, creating and deploying adversarial attacks is a completely digital procedure. For situated systems acting in the sensory-motor loop, such as autonomous cars, this approach may serve as a starting point for investigating adversarial attacks but generally misses out on crucial aspects of physical instantiations of these attacks: First, it is impossible to foresee and correctly simulate all possible boundary conditions, e.g., viewing angles, sensor pollution, and temperature. Second, sufficiently realistic simulations of the interaction effects between system modules and environment are hard to carry out. Third, this likewise applies to simulating individual characteristics of hardware components that influence the behavior of these components. This means that the required effort for generating physical adversarial attacks that perform well is much larger than for their digital counterparts. For this reason, such attacks are less well-studied, but several publications have shown they can still work, in particular if attacks are optimized for high robustness to typically occurring transformations (e.g., rotation and translation in images) (Sharif et al., <xref ref-type="bibr" rid="B85">2016</xref>; Brown et al., <xref ref-type="bibr" rid="B14">2017</xref>; Evtimov et al., <xref ref-type="bibr" rid="B31">2017</xref>; Eykholt et al., <xref ref-type="bibr" rid="B32">2017</xref>; Athalye et al., <xref ref-type="bibr" rid="B3">2018b</xref>; Song et al., <xref ref-type="bibr" rid="B88">2018</xref>).</p>
</sec>
</sec>
<sec id="s3">
<title>3. Key Factors Underlying AI-Specific Vulnerabilities</title>
<p>As described in 2, AI systems can be attacked on different levels. Whereas many of the vulnerabilities are just variants of more general problems in IT security, which affect not only AI systems, but also other IT solutions, two types of attacks are specific to AI, namely poisoning attacks and adversarial examples (also known as evasion attacks). This section aims to give a general intuition of the fundamental properties specific to AI which enable and facilitate these attacks, and to outline some general strategies for coping with them.</p>
<sec>
<title>3.1. Huge Input and State Spaces and Approximate Decision Boundaries</title>
<p>Complex AI models contain many millions of parameters (weights and biases), which are updated during training in order to approximate a function for solving the problem at hand. As a result, the number of possible combinations of parameters is enormous and decision boundaries between input data where the models&#x00027; outputs differ can only be approximate (Hornik et al., <xref ref-type="bibr" rid="B43">1989</xref>; Blackmore et al., <xref ref-type="bibr" rid="B12">2006</xref>; Mont&#x000FA;far et al., <xref ref-type="bibr" rid="B70">2014</xref>) (cf. <xref ref-type="table" rid="T1">Table 1</xref>). Besides, due to the models&#x00027; non-linearity small perturbations in input values may result in huge differences in the output (Pasemann, <xref ref-type="bibr" rid="B79">2002</xref>; Goodfellow et al., <xref ref-type="bibr" rid="B38">2015</xref>; Li, <xref ref-type="bibr" rid="B56">2018</xref>).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>The size of the input and state spaces of commonly used architectures in the field of object recognition (LeNet-5, VGG-16, ResNet-152) and natural language processing (BERT) is extremely large.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Number of distinct possible inputs</bold></th>
<th valign="top" align="center"><bold>Input size (in bit)</bold></th>
<th valign="top" align="center"><bold>Output size (in bit)</bold></th>
<th valign="top" align="center"><bold>Number of parameters</bold></th>
<th valign="top" align="center"><bold>Number of layers</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">LeNet-5 (LeCun et al., <xref ref-type="bibr" rid="B54">1998</xref>)</td>
<td valign="top" align="center">2<sup>6272</sup></td>
<td valign="top" align="center">28&#x000B7;28&#x000B7;8 &#x0003D; 6272</td>
<td valign="top" align="center">10&#x000B7;32</td>
<td valign="top" align="center">&#x02248;60K</td>
<td valign="top" align="center">7</td>
</tr>
<tr>
<td valign="top" align="left">VGG-16 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B86">2015</xref>)</td>
<td valign="top" align="center">2<sup>1204224</sup></td>
<td valign="top" align="center">224&#x000B7;224&#x000B7;3&#x000B7;8 &#x0003D; 1204224</td>
<td valign="top" align="center">1000&#x000B7;32</td>
<td valign="top" align="center">&#x02248;135M</td>
<td valign="top" align="center">16</td>
</tr>
<tr>
<td valign="top" align="left">ResNet-152 (He et al., <xref ref-type="bibr" rid="B42">2016</xref>)</td>
<td valign="top" align="center">2<sup>1204224</sup></td>
<td valign="top" align="center">224&#x000B7;224&#x000B7;3&#x000B7;8 &#x0003D; 1204224</td>
<td valign="top" align="center">1000&#x000B7;32</td>
<td valign="top" align="center">&#x02248;60M</td>
<td valign="top" align="center">152</td>
</tr>
<tr>
<td valign="top" align="left">BERT (Devlin et al., <xref ref-type="bibr" rid="B27">2019</xref>)</td>
<td valign="top" align="center">&#x02264;2<sup>7680</sup></td>
<td valign="top" align="center">&#x02264;512&#x000B7;15 &#x0003D; 7680</td>
<td valign="top" align="center">&#x02264;512&#x000B7;1000&#x000B7;32</td>
<td valign="top" align="center">&#x02248;345M</td>
<td valign="top" align="center">24</td>
</tr>
</tbody>
</table>
</table-wrap>
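<p>The entries of <xref ref-type="table" rid="T1">Table 1</xref> follow directly from the input encodings. For instance, LeNet-5 processes 28x28 grayscale images at 8 bits per pixel, i.e., 6,272 input bits and hence 2<sup>6272</sup> distinct inputs, a number with roughly 1,900 decimal digits. The arithmetic can be checked directly:</p>

```python
import math

def input_space(width, height, channels=1, bits_per_value=8):
    """Bits needed to encode one input, and the resulting number of
    distinct possible inputs."""
    n_bits = width * height * channels * bits_per_value
    return n_bits, 2 ** n_bits

# LeNet-5: 28x28 grayscale images
lenet_bits, lenet_inputs = input_space(28, 28)

# VGG-16 / ResNet-152: 224x224 RGB images
vgg_bits, vgg_inputs = input_space(224, 224, channels=3)

# Decimal length of the LeNet-5 input-space size
lenet_digits = math.floor(lenet_bits * math.log10(2)) + 1
```

<p>Even for this small, academic model, exhaustively covering the input space during training or testing is clearly out of the question.</p>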
<p>In general, AI models are trained on the natural distribution of the data considered in the specific problem (e.g., the distribution of traffic sign images). This distribution, however, lies on a very low-dimensional manifold as compared to the complete input space (e.g., all possible images of the same resolution) (Tanay and Griffin, <xref ref-type="bibr" rid="B94">2016</xref>; Balda et al., <xref ref-type="bibr" rid="B5">2020</xref>), which is sometimes referred to as the &#x0201C;curse of dimensionality.&#x0201D; <xref ref-type="table" rid="T1">Table 1</xref> shows that the size of the input space for some common tasks is extremely large. Even rather simple, academic AI models such as LeNet-5 for handwritten digit recognition have a huge input space. As a consequence, most possible inputs are never considered during training.</p>
<p>On the one hand, this creates a safety risk if the model is exposed to benign inputs which sufficiently differ from those seen during training, such that the model is unable to generalize to these new inputs (Novak et al., <xref ref-type="bibr" rid="B71">2018</xref>; Jakubovitz et al., <xref ref-type="bibr" rid="B47">2019</xref>). The probability of this happening depends on many factors, including the model, the algorithm used and especially the quality of the training data (Chung et al., <xref ref-type="bibr" rid="B23">2018</xref>; Zahavy et al., <xref ref-type="bibr" rid="B111">2018</xref>).</p>
<p>On the other hand, and much more worryingly, inputs which reliably cause a model under attack to malfunction, i.e., adversarial examples, can be computed efficiently and in a targeted way (Athalye et al., <xref ref-type="bibr" rid="B3">2018b</xref>; Yousefzadeh and O&#x00027;Leary, <xref ref-type="bibr" rid="B110">2019</xref>; Chen et al., <xref ref-type="bibr" rid="B20">2020</xref>). Although much work has been invested in designing defenses since adversarial examples first surfaced in deep learning, as of now, no general defense method is known which can reliably withstand adaptive attackers (Carlini and Wagner, <xref ref-type="bibr" rid="B16">2017a</xref>; Athalye et al., <xref ref-type="bibr" rid="B2">2018a</xref>). That is, defenses may work if information about their mode of operation is kept secret from an attacker (Song et al., <xref ref-type="bibr" rid="B89">2019</xref>). As soon as an attacker gains this information, which should in most cases be considered possible following Kerckhoffs&#x00027;s principle, he is able to overcome them.</p>
<p>Besides the arms race in practical attacks and defenses, adversarial attacks have also sparked interest from a theoretical perspective (Goodfellow et al., <xref ref-type="bibr" rid="B38">2015</xref>; Tanay and Griffin, <xref ref-type="bibr" rid="B94">2016</xref>; Biggio and Roli, <xref ref-type="bibr" rid="B11">2018</xref>; Khoury and Hadfield-Menell, <xref ref-type="bibr" rid="B51">2018</xref>; Madry et al., <xref ref-type="bibr" rid="B63">2018</xref>; Ilyas et al., <xref ref-type="bibr" rid="B45">2019</xref>; Balda et al., <xref ref-type="bibr" rid="B5">2020</xref>). Several publications deal with their essential characteristics. As pointed out by Biggio and Roli (<xref ref-type="bibr" rid="B11">2018</xref>), adversarial examples commonly lie in areas of negligible probability, blind spots where the model is unsure about its predictions. Furthermore, they arise by adding highly non-random noise to legitimate samples, thus violating the implicit assumption of statistical noise that is made during training. Khoury and Hadfield-Menell (<xref ref-type="bibr" rid="B51">2018</xref>) relate adversarial examples to the high dimension of the input space and the curse of dimensionality, which allows constructing adversarial examples in many directions off the manifold of proper input data. In Ilyas et al. (<xref ref-type="bibr" rid="B45">2019</xref>), the existence of adversarial examples is ascribed to so-called non-robust features in the training data, which would also provide an explanation for their transferability property. In practical experiments, Madry et al. (<xref ref-type="bibr" rid="B63">2018</xref>) demonstrate defenses, motivated by robust optimization, that show comparatively high robustness against strong adversarial attacks. Additionally, and in contrast to most other publications, these defenses provide some theoretical guarantees against a whole range of both static and adaptive attacks.</p>
<p><xref ref-type="fig" rid="F6">Figure 6</xref> illustrates the problem of adversarial examples and its root cause and presents an analogy from human psychophysics. Decision-making in humans (Loftus, <xref ref-type="bibr" rid="B59">2005</xref>) as well as in AI systems (Jakubovitz et al., <xref ref-type="bibr" rid="B47">2019</xref>) is error-prone since theoretically ideal boundaries for decision-making (task decision boundaries) are in practice instantiated by approximations (model decision boundaries). Models are trained using data (AI and humans) and evolutionary processes (humans). In the trained model, small changes in either sensory input or other boundary conditions (e.g., internal state) may lead to state changes whereby decision boundaries are crossed in state space, i.e., small changes in input (e.g., sensory noise) may lead to large output changes (here a different output class). Model and task decision, therefore, may not always match. Adversarial examples are found in those regions in input space where task and model decision boundaries differ, as depicted in <xref ref-type="fig" rid="F6">Figure 6</xref>:</p>
<list list-type="bullet">
<list-item><p>Part A shows an example of human perception of ambiguous images, namely the so-called Necker cube: sensory input (image, viewpoint, lighting, &#x02026;), internal states (genetics, previous experience, alertness, mood, &#x02026;) and chance (e.g., sensory noise) determine in which of two possible ways the Necker cube is perceived: (top) either the square on the left/top side or the square on the right/bottom side is perceived as the front surface of the cube, and this perception may spontaneously switch from one to the other (bistability). Besides internal human states that influence which of the two perceptions is more likely to occur (Ward and Scholl, <xref ref-type="bibr" rid="B101">2015</xref>), the input image may be slightly manipulated such that either the left/top square (left) or the right/bottom square (right) is perceived as the front surface of the cube.</p></list-item>
<list-item><p>Part B shows how all these effects are also observed in AI systems. This figure illustrates adversarial examples for a simplified two-dimensional projection of an input space with three decision boundaries forming the model decision boundary of class A (yellow) modeling the task decision boundary (blue): small modifications can shift (red arrows) input data from one model decision class to another, with (example on boundary 2 on the left) and without (example on boundary 3 on the right) changing the task decision class. Most data are far enough from the model decision boundaries to exhibit a certain amount of robustness (example on boundary 1 on the bottom). It is important to note that this illustration, depicting a two-dimensional projection of input space, does not reflect realistic systems with high-dimensional input space. In those systems, adversarial examples may almost always be found within a small distance from the point of departure (Szegedy et al., <xref ref-type="bibr" rid="B93">2014</xref>; Goodfellow et al., <xref ref-type="bibr" rid="B38">2015</xref>; Khoury and Hadfield-Menell, <xref ref-type="bibr" rid="B51">2018</xref>). These adversarial examples rarely occur by pure chance but attackers may efficiently search for them.</p></list-item>
</list>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Error-prone decision-making in humans <bold>(A)</bold> and AI systems <bold>(B)</bold> as exemplified by the Necker cube as an example of an ambiguous image <bold>(A)</bold> and a schematic depiction of adversarial examples in a 2D-projection of state space <bold>(B)</bold>. Task and model decision boundaries do not perfectly match and small changes in input may result in large changes in output. More details are given in the main text.</p></caption>
<graphic xlink:href="fdata-03-00023-g0006.tif"/>
</fig>
</sec>
<sec>
<title>3.2. Black-Box Property and Lack of Interpretability</title>
<p>A major drawback of complex AI models like deep neural networks is their shortcoming in terms of interpretability and explainability (Rudin, <xref ref-type="bibr" rid="B80">2019</xref>). Traditional computer programs solving a task are comprehensible and transparent at least to sufficiently knowledgeable programmers. Due to their huge parameter space as discussed in 3.1, complex AI systems do not possess this property. In their case, a programmer can still understand the boundary conditions and the approach to the problem; however, it is infeasible for a human to directly convert the internal representation of a deep neural network to terms allowing him to understand how it operates. This is very dangerous from the perspective of IT security, since it means attacks can essentially only be detected from incorrect behavior of the model (which may in itself be hard to notice), but not by inspecting the model itself. In particular, after training is completed, the model&#x00027;s lack of transparency makes it very hard to detect poisoning and backdooring attacks on the training data. For this reason, such attacks should be addressed and mitigated by thorough documentation of the training and evaluation process and by protecting the integrity of intermediate results or alternatively by using training and test data that have been certified by a trustworthy party.</p>
<p>A straightforward solution to the black-box property of complex AI models would be to use a model which is inherently easier to interpret for a human, e.g., a decision tree or a rule list (Molnar, <xref ref-type="bibr" rid="B68">2020</xref>). When considering applications based on tabular data, for instance in health care or finance, one finds that decision trees or rule lists even perform better than complex cAI models in most cases (Angelino et al., <xref ref-type="bibr" rid="B1">2018</xref>; Rudin, <xref ref-type="bibr" rid="B80">2019</xref>; Lundberg et al., <xref ref-type="bibr" rid="B61">2020</xref>), besides exhibiting superior interpretability. However, in applications from computer vision, which are the focus of this paper, or speech recognition, sAI models cannot compete with complex models like deep neural networks, which are unfortunately very hard to interpret. For these applications, there is hence a trade-off between model interpretability and performance. A general rule of thumb for tackling the issue of interpretability would still consist in using the least complex model which is capable of solving a given problem sufficiently well. Another approach for gaining more insight into the operation of a black-box model is to use XAI methods that essentially aim to provide their users with a human-interpretable version of the model&#x00027;s internal representation. This is an active field of research, where many methods have been proposed in recent years (Gilpin et al., <xref ref-type="bibr" rid="B36">2018</xref>; Samek et al., <xref ref-type="bibr" rid="B84">2019</xref>; Molnar, <xref ref-type="bibr" rid="B68">2020</xref>). 
Yet another approach is to use&#x02014;where available&#x02014;AI systems which have been mathematically proven to be robust against attacks under the boundary conditions that apply for the specific use case (Huang et al., <xref ref-type="bibr" rid="B44">2017</xref>; Katz et al., <xref ref-type="bibr" rid="B50">2017</xref>; Gehr et al., <xref ref-type="bibr" rid="B34">2018</xref>; Wong et al., <xref ref-type="bibr" rid="B105">2018</xref>; Wong and Kolter, <xref ref-type="bibr" rid="B104">2018</xref>; Singh et al., <xref ref-type="bibr" rid="B87">2019</xref>). For more details, the reader is referred to 4.3.</p>
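<p>The notion of an inherently interpretable model such as a rule list can be made concrete: it is an ordered sequence of human-readable if-then rules, and every decision is explained by exactly one rule. The toy sketch below is invented for illustration; the feature names and thresholds do not come from any cited work.</p>

```python
def rule_list_classify(sample, rules, default):
    """Apply an ordered list of (condition, label, description) rules;
    the first matching rule fires and also serves as the explanation."""
    for condition, label, description in rules:
        if condition(sample):
            return label, description
    return default, "default rule"

# Hypothetical credit-scoring rule list (illustrative only).
rules = [
    (lambda s: s["missed_payments"] > 2, "reject", "more than 2 missed payments"),
    (lambda s: s["income"] >= 50000 and s["debt"] < 10000, "accept", "high income, low debt"),
]

label, reason = rule_list_classify(
    {"missed_payments": 0, "income": 60000, "debt": 5000}, rules, "review"
)
```

<p>In contrast to a deep neural network, the returned description immediately tells a human why the decision was taken, which is exactly the property that complex connectionist models lack.</p>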
</sec>
<sec>
<title>3.3. Dependence of Performance and Security on Training Data</title>
<p>The accuracy and robustness of an AI model are highly dependent on the quality and quantity of the training data (Zhu et al., <xref ref-type="bibr" rid="B113">2016</xref>; Sun et al., <xref ref-type="bibr" rid="B91">2017</xref>; Chung et al., <xref ref-type="bibr" rid="B23">2018</xref>). In particular, the model can only achieve high overall performance if the training data are unbiased (Juba and Le, <xref ref-type="bibr" rid="B49">2019</xref>; Kim et al., <xref ref-type="bibr" rid="B52">2019</xref>). Despite their name, AI models currently used are not &#x0201C;intelligent,&#x0201D; and hence they can only learn correlations from data but cannot by themselves differentiate spurious correlations from true causalities.</p>
<p>For economic reasons, it is quite common to outsource part of the supply chain of an AI model and obtain data and models for further training from sources which may not be trustworthy (cf. <xref ref-type="fig" rid="F7">Figure 7</xref>). On the one hand, for lack of computational resources and professional expertise, developers of AI systems often use pre-trained networks provided by large international companies or even perform the whole training process in an environment not under their control. On the other hand, due to the efforts required in terms of funds and personnel for collecting training data from scratch as well as due to local data protection laws (e.g., the GDPR in the European Union), they often obtain whole data sets from other countries. This does not only apply to data sets containing real data, but also to data which are synthetically created (Ghorbani et al., <xref ref-type="bibr" rid="B37">2019</xref>) in order to save costs. Besides synthetic data created from scratch, this especially concerns data obtained by augmenting an original data set, e.g., using transformations under which the model&#x00027;s output should remain invariant.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Summary of possible attacks (red) on AI systems and defenses (blue) specific to AI systems depicted along the AI life cycle. Defenses not specific to AI systems, e.g., background checks of developers, hardware access control etc. are not shown here and should be adopted from classical IT security. Multiple AI training sessions with different data sets indicate the risk associated with pre-trained networks and externally acquired data.</p></caption>
<graphic xlink:href="fdata-03-00023-g0007.tif"/>
</fig>
<p>Both these facts are problematic in terms of IT security, since they carry the risk of dealing with biased or poor-quality data and of falling prey to poisoning attacks (cf. 2), which are very hard to detect afterwards. The safest way to avoid these issues is not to rely on data or models furnished by other parties. If this is infeasible, at least a thorough documentation and cryptographic mechanisms for protecting the integrity and authenticity of such data and models should be applied throughout their whole supply chain (cf. 4.2).</p>
</sec>
</sec>
<sec id="s4">
<title>4. Mitigation of Vulnerabilities of AI Systems</title>
<sec>
<title>4.1. Assessment of Attacks</title>
<p>A necessary condition for properly reasoning about attacks is to classify them using high-level criteria. The result of this classification will facilitate a discussion about defenses which are feasible and necessary. Such a classification is often referred to as a threat model or attacker model (Papernot et al., <xref ref-type="bibr" rid="B77">2016d</xref>; Biggio and Roli, <xref ref-type="bibr" rid="B11">2018</xref>).</p>
<p>An important criterion to consider is the <bold>goal</bold> of the attack. First, one needs to establish which security goal is affected. As already noted in 1, attackers can target either integrity (by having the system make wrong predictions on specific input data), availability (by hindering legitimate users from properly using the system) or confidentiality (by extracting information without proper authorization). Besides, the scope of the attack may vary. An attacker may mount a targeted attack, which affects only certain data samples, or an indiscriminate one. In addition, the attacker may induce a specific or a general error. When considering AI classifiers, for instance, a specific error means that a sample is labeled as belonging to a target class of the attacker&#x00027;s choosing, whereas a general error only requires any incorrect label to be assigned to the sample. Furthermore, the ultimate objective of the attack must be considered. For example, this can be the unauthorized use of a passport (when attacking biometric authentication) or recognizing a wrong traffic sign (in autonomous driving applications). In order to properly assess the attack, it is necessary to measure its real-world impact. For lack of more precise metrics commonly agreed upon, as a first step one might resort to a general scale assessing the attack as having low, medium or high impact.</p>
<p>The <bold>knowledge</bold> needed to carry out an attack is another criterion to consider. As described in 2.3, an attacker has full knowledge of the model and the data sets in the white-box case. In this scenario, the attacker is strongest, and an analysis assuming white-box access thus gives a worst-case estimate for security. As noted in Carlini et al. (<xref ref-type="bibr" rid="B15">2019</xref>), when performing such a white-box analysis, for the correct assessment of the vulnerabilities it is of paramount importance to use additional tests for checking whether the white-box attacks in question have been applied correctly, since mistakes in applying them have been observed many times and might yield wrong results.</p>
<p>In the case of a gray-box attack, conducting an analysis requires making precise assumptions about which information is known to the attacker and which is secret. Carlini et al. (<xref ref-type="bibr" rid="B15">2019</xref>) suggest that, as with cryptographic schemes, as little information as possible should be assumed to be secret when assessing the security of an AI system. For instance, the type of defense used in the system should be assumed to be known to the attacker.</p>
<p>The third criterion to be taken into account is the <bold>efficiency</bold> of the attack, which influences the capabilities and resources an attacker requires. We assume the cost of a successful attack to be the most important proxy metric from the attacker&#x00027;s point of view. This helps in judging whether an attack is realistic in a real-world setting. If an attacker is able to achieve his objective using a completely different attack which does not directly target the AI system and costs less, it seems highly probable that a rational attacker will prefer this alternative (cf. the concise discussion in Gilmer et al., <xref ref-type="bibr" rid="B35">2018</xref>). Possible alternatives may change over time, though, and if effective defenses against them are put into place, the attacker will update his calculation and may turn to attack forms he originally disregarded, e.g., attacks on the AI system as discussed in this paper.</p>
<p>The cost of a successful attack is influenced by several factors. First, the general effort and scope of a successful attack have a direct influence. For instance, whether manipulating only a few samples suffices for mounting a successful poisoning attack or whether many samples need to be affected can have a strong impact on the required cost, especially when taking into account additional measures for avoiding detection. Second, the degree of automation of the attack determines how much manual work and manpower is required. Third, whether an attack requires physical presence or can be performed remotely is likewise important: an attack which allows only a low degree of automation and requires physical presence is much more costly to mount and especially to scale. Fourth, attacking in a real-world setting adds further complexity and might hence be more expensive than an attack in a laboratory setting, where all the side conditions are under control.</p>
<p>A fourth important criterion is the <bold>availability of mitigations</bold>, which may significantly increase the attacker&#x00027;s cost. However, mitigations must in turn be judged by the effort they require for the defender, their efficiency and effectiveness. In particular, non-adaptive defense mechanisms may provide a false sense of security, since an attacker who gains sufficient knowledge can bypass them by modifying his attack appropriately. This is a serious problem pointed out in many publications (cf. Athalye et al., <xref ref-type="bibr" rid="B2">2018a</xref>; Gilmer et al., <xref ref-type="bibr" rid="B35">2018</xref>). As a rule, defense mechanisms should therefore respect Kerckhoffs&#x00027;s principle and must not rely on security by obscurity.</p>
</sec>
<sec>
<title>4.2. General Measures</title>
<p>A lot of research has been done on how to mitigate attacks on AI systems (Bethge, <xref ref-type="bibr" rid="B8">2019</xref>; Carlini et al., <xref ref-type="bibr" rid="B15">2019</xref>; Madry et al., <xref ref-type="bibr" rid="B62">2019</xref>). However, almost all the literature so far focuses on mitigations inside the AI systems, neglecting other possible defensive measures, and does not take into account the complete AI life cycle when assessing attacks. Furthermore, although certain defenses like some variants of adversarial training (Tram&#x000E8;r et al., <xref ref-type="bibr" rid="B95">2018</xref>; Salman et al., <xref ref-type="bibr" rid="B82">2019</xref>) can increase robustness against special threat models, there is, as of now, no general defense mechanism applicable against all types of attacks. A significant problem of most published defenses is their lack of resilience against adaptive attackers (Carlini and Wagner, <xref ref-type="bibr" rid="B16">2017a</xref>,<xref ref-type="bibr" rid="B17">b</xref>; Athalye et al., <xref ref-type="bibr" rid="B2">2018a</xref>). As already stated, the defense mechanisms used should be assumed to be public; the resistance of a defense against attackers who adapt to it is hence extremely important. In this section, we argue that a broader array of measures needs to be combined to increase security, especially if one intends to certify the safe and secure operation of an AI system, as seems necessary in high-risk applications like autonomous driving. An overview of defenses and attacks is presented in <xref ref-type="fig" rid="F7">Figure 7</xref>.</p>
<p>There is no compelling reason to focus solely on defending the AI system itself without taking into account additional measures which can hamper attacks by changing side conditions. This observation does not by any means imply that defenses inside the AI system are unimportant or not necessary but instead emphasizes that they constitute a last line of defense, which should be reinforced by other mechanisms.</p>
<p><bold>Legal measures</bold> are most general. They cannot by themselves prevent attacks, but may serve as a deterrent to a certain extent, if properly implemented and enforced. Legal measures may include the adoption of new laws and regulation or specifying how existing laws apply to AI applications.</p>
<p><bold>Organizational measures</bold> can influence the side conditions, making them less advantageous for an attacker. For instance, in biometric authentication systems at border control, a human monitoring several systems at once and checking for unusual behavior or appearance may prevent attacks which can fool the AI system but are obvious to a human observer or can easily be detected by him if he is properly trained in advance. Restricting access to the development and training of AI systems for sensitive use cases to personnel which has undergone a background check is another example of an organizational measure. Yet another example is properly checking the identity of key holders when using a public key infrastructure (PKI) for protecting the authenticity of data.</p>
<p><bold>Technical measures outside the AI system</bold> can be applied to increase IT security. The whole supply chain of collecting and preprocessing data, aggregating and transmitting data sets, pre-training models which are used as a basis for further training, and the training procedure itself can be documented and secured using classic cryptographic schemes such as hash functions and digital signatures to ensure integrity and authenticity (which ultimately requires a PKI). This prevents tampering in the process and allows results to be reproduced and problems to be traced back (Berghoff, <xref ref-type="bibr" rid="B7">2020</xref>). Depending on the targeted level of security and traceability, the information covered may include all the training and test data, all AI models, all ML algorithms, a detailed log of the development process (e.g., hyperparameters set by the developer, pseudo-random seeds, intermediate results) and comments of the developers concisely explaining and justifying each step in the development process. If the source of the data used is itself trusted, such documentation and cryptographic protection can later be validated to prove (with high probability) that no data poisoning attacks have been carried out, provided the validating party gets access to at least a sample of the original data and can check the correctness of intermediate results. As a further external technical measure, the AI system can be enhanced by using additional information from other sources. For example, in biometric authentication, biometric fakes can be detected using additional sensors (Marcel et al., <xref ref-type="bibr" rid="B64">2019</xref>).</p>
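<p>As a minimal illustration of such cryptographic supply-chain protection, the following Python sketch documents one processing step with content hashes and a keyed MAC; the MAC merely stands in for a digital signature (a real deployment would use asymmetric signatures backed by a PKI), and all names and values are hypothetical:</p>

```python
import hashlib
import hmac
import json

# Illustrative sketch: content hashes document each artifact in the supply
# chain, and a keyed MAC stands in for a digital signature (a real deployment
# would use asymmetric signatures backed by a PKI). All values are made up.
SECRET_KEY = b"demo-key"

def fingerprint(blob):
    """Integrity: SHA-256 hash of a data set or model artifact."""
    return hashlib.sha256(blob).hexdigest()

def sign(record):
    """Authenticity: MAC over a canonical encoding of a supply-chain record."""
    msg = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()

# Document one step of the supply chain: raw data to preprocessed data
record = {
    "step": "preprocessing",
    "input_sha256": fingerprint(b"raw training data"),
    "output_sha256": fingerprint(b"preprocessed training data"),
    "seed": 42,  # logged pseudo-random seed, part of the documented information
}
tag = sign(record)

# Later validation detects any tampering with the documented step
assert hmac.compare_digest(tag, sign(record))
record["output_sha256"] = fingerprint(b"poisoned data")
assert not hmac.compare_digest(tag, sign(record))
```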
<p>In a somewhat similar vein, the <bold>redundant operation of multiple AI systems</bold> running in parallel may serve to increase robustness to attacks, while at the same time increasing the robustness on benign data not seen during training. These systems can be deployed in conjunction, comparing and verifying each other&#x00027;s results. The final result might be derived by a simple majority vote (cf. <xref ref-type="fig" rid="F7">Figure 7</xref>). Other strategies are conceivable, though. For instance, in safety-critical environments an alarm could be triggered in case the final decision is not unanimous and, if applicable, the system could be transferred to a safe fall-back state pending closer inspection. Increasing the redundancy of a technical system is a well-known approach for reducing the probability of undesired behavior, whether due to benign reasons or induced by an attacker. However, the transferability property of adversarial examples (cf. 2.5, Papernot et al., <xref ref-type="bibr" rid="B74">2016a</xref>) implies that attacks may continue to work even in the presence of redundancy, although their probability of success should at least slightly diminish. As a result, when using redundancy, one should aim to use conceptually different models and train them using different training sets that all stem from the data distribution representing the problem at hand, but have been sampled independently or at least exhibit only small intersections. While this does not in principle resolve the challenges posed by transferability, our intuition is that it should help to further decrease an attacker&#x00027;s probability of success.</p>
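<p>The redundancy strategy described above can be sketched as follows; the model interface (callables returning a label) and the fall-back policy are assumptions for illustration:</p>

```python
from collections import Counter

# Illustrative sketch; the model interface (callables returning a label)
# and the fall-back policy are assumptions.
def redundant_decision(models, x, require_unanimity=False):
    """Run independently trained models in parallel and combine their votes."""
    votes = [m(x) for m in models]
    label, count = Counter(votes).most_common(1)[0]
    if require_unanimity and count != len(votes):
        return None  # alarm / safe fall-back state pending closer inspection
    return label     # simple majority vote

# Three stand-in "models" that mostly agree on the input
models = [lambda x: "stop", lambda x: "stop", lambda x: "yield"]
assert redundant_decision(models, x="img") == "stop"
# A non-unanimous decision triggers the fall-back instead of a label
assert redundant_decision(models, x="img", require_unanimity=True) is None
```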
</sec>
<sec>
<title>4.3. AI-Specific Measures</title>
<p>On the AI level, several measures can likewise be combined and used in conjunction with the general countermeasures presented above. First and foremost, appropriate state-of-the-art defenses from the literature can be implemented according to their security benefits and the application scenario. One common approach for thwarting adversarial attacks is to make use of input compression (Dziugaite et al., <xref ref-type="bibr" rid="B29">2016</xref>; Das et al., <xref ref-type="bibr" rid="B26">2017</xref>), which removes high-frequency components from input data that are typical for adversarial examples. More prominent still is a technique called <bold>adversarial training</bold>, which consists in pre-computing adversarial examples using standard attack algorithms and incorporating them into the training process of the model, thus making it more robust and, in an ideal setting, immune to such attacks. State-of-the-art adversarial training methods may be identified using benchmark resources (Madry et al., <xref ref-type="bibr" rid="B63">2018</xref>, <xref ref-type="bibr" rid="B62">2019</xref>; Bethge, <xref ref-type="bibr" rid="B8">2019</xref>). In general, when dealing with countermeasures against adversarial attacks, it is important to keep in mind that many proposed defenses have been broken in the past (Carlini and Wagner, <xref ref-type="bibr" rid="B17">2017b</xref>; Athalye et al., <xref ref-type="bibr" rid="B2">2018a</xref>), and that even the best available defenses and combinations thereof (Carlini and Wagner, <xref ref-type="bibr" rid="B16">2017a</xref>) may not fully mitigate the problem of adversarial attacks.</p>
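<p>As an illustration of how adversarial examples are pre-computed for adversarial training, the following sketch applies the fast gradient sign method, one standard attack algorithm, to a fixed logistic-regression model; the weights and the sample are synthetic, not taken from any real system:</p>

```python
import numpy as np

# Illustrative sketch of the fast gradient sign method (FGSM), one standard
# algorithm for pre-computing adversarial examples; the logistic-regression
# "model" and its weights are synthetic.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    """P(class 1) under a fixed logistic model."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm(x, y, eps):
    """Perturb x by eps in the direction of the sign of the loss gradient."""
    grad_x = (predict(x) - y) * w  # gradient of cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)

x = np.array([2.0, 0.5, 1.0])  # benign sample with true label y = 1
assert predict(x) > 0.5        # correctly classified
x_adv = fgsm(x, y=1, eps=0.9)
assert predict(x_adv) < 0.5    # the perturbed sample is misclassified
# Adversarial training incorporates such (x_adv, y) pairs into the training set.
```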
<p>In terms of <bold>defenses against backdoor poisoning attacks</bold>, only a few promising proposals have been published in recent years (Tran et al., <xref ref-type="bibr" rid="B96">2018</xref>; Chen et al., <xref ref-type="bibr" rid="B19">2019</xref>; Wang et al., <xref ref-type="bibr" rid="B99">2019</xref>). Their main idea is to flag possibly malicious samples of the training set for manual examination. These methods exploit the fact that a neural network trained on a compromised data set learns the false classification of backdoored samples as exceptions, which can be detected from the internal representation of the network. It must be kept in mind, though, that these defenses do not provide any formal guarantees and might be circumvented by an adaptive adversary.</p>
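<p>In the spirit of the spectral-signature approach of Tran et al. (2018), the flagging of suspicious training samples can be sketched as follows: within one class, the representations of backdoored samples tend to stand out along the top principal direction of the internal representations. All data in the sketch are synthetic:</p>

```python
import numpy as np

# Synthetic sketch in the spirit of spectral signatures (Tran et al., 2018):
# within one class, backdoored samples leave an unusually strong component
# along the top principal direction of the internal representations.
rng = np.random.default_rng(0)

clean = rng.normal(0.0, 1.0, size=(200, 16))       # clean representations
shift = np.zeros(16)
shift[0] = 8.0                                     # backdoor-induced offset
poisoned = rng.normal(0.0, 1.0, size=(10, 16)) + shift
reps = np.vstack([clean, poisoned])

centered = reps - reps.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = (centered @ vt[0]) ** 2   # squared projection on the top direction

# Propose the highest-scoring samples for manual examination
suspects = set(np.argsort(scores)[-12:].tolist())
assert set(range(200, 210)).issubset(suspects)  # the shifted samples stand out
```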
<p>As a first step, instead of preventing AI-specific attacks altogether, <bold>reliably detecting</bold> them might be a somewhat easier and hence more realistic task (Carlini and Wagner, <xref ref-type="bibr" rid="B16">2017a</xref>). In case an attack is detected, the system might yield a special output corresponding to this situation, trigger an alarm and forward the apparently malicious input to another IT system or a human in the loop for further inspection. It depends on the application in question whether this approach is feasible. For instance, asking a human for feedback is incompatible by definition with fully autonomous driving at SAE level 5 (ORAD Committee, <xref ref-type="bibr" rid="B72">2018</xref>).</p>
<p>A different approach lies in using methods from the area of <bold>explainable AI (XAI)</bold> to better understand the underlying reasons for the decisions which an AI system takes (cf. <xref ref-type="fig" rid="F8">Figure 8</xref>). At the least, such methods may help to detect potential vulnerabilities and to develop more targeted defenses. One example is provided by Lapuschkin et al. (<xref ref-type="bibr" rid="B53">2019</xref>), who suggest a more diligent preprocessing of data to prevent the AI system from learning spurious correlations, which can easily be attacked. In principle, one can also hope that XAI methods will allow reasoning about the correctness of AI decisions under a certain range of circumstances. The field of XAI as applied to (deep) neural networks is quite young, and research only started around 2015, although the general question of explaining decisions of AI systems dates back about 50 years (Samek et al., <xref ref-type="bibr" rid="B84">2019</xref>, pp. 41&#x02013;49). So far, it seems doubtful that there will be a single method fitting every case. Rather, different conditions will require different approaches. On the one hand, the high-level use case has a strong impact on the applicable methods: when making predictions from structured data, probabilistic methods are considered promising (Molnar, <xref ref-type="bibr" rid="B68">2020</xref>), whereas applications from computer vision rely on more advanced methods like layer-wise relevance propagation (LRP) (Bach et al., <xref ref-type="bibr" rid="B4">2015</xref>; Samek et al., <xref ref-type="bibr" rid="B83">2016</xref>; Montavon et al., <xref ref-type="bibr" rid="B69">2017</xref>; Lapuschkin et al., <xref ref-type="bibr" rid="B53">2019</xref>). On the other hand, some methods provide global explanations, while others explain individual (local) decisions. 
It should be noted that by using principles similar to adversarial examples, current XAI methods can themselves be efficiently attacked. Such attacks may either be performed as an enhancement to adversarial examples targeting the model (Zhang et al., <xref ref-type="bibr" rid="B112">2018</xref>) or by completely altering the explanations provided while leaving model output unchanged (Dombrowski et al., <xref ref-type="bibr" rid="B28">2019</xref>). Based on theoretical and practical observations, both Zhang et al. (<xref ref-type="bibr" rid="B112">2018</xref>) and Dombrowski et al. (<xref ref-type="bibr" rid="B28">2019</xref>) suggest countermeasures for thwarting the respective attacks.</p>
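<p>As a toy illustration of a local explanation method, the following sketch applies an LRP-style redistribution rule on a tiny two-layer ReLU network: the relevance of the output is propagated back onto the inputs in proportion to their contributions, conserving the total relevance. The network and its weights are synthetic, and a real heat map would apply such a rule layer by layer through a deep network:</p>

```python
import numpy as np

# Toy LRP-style relevance propagation (cf. Bach et al., 2015); the network
# and its weights are synthetic.
W1 = np.array([[1.0, -1.0], [2.0, 0.5], [0.0, 1.0]])  # 3 inputs, 2 hidden
W2 = np.array([2.0, 1.0])                              # 2 hidden, 1 output

def lrp_layer(a, w, relevance):
    """Distribute the relevance of each output neuron onto its inputs,
    proportionally to the contributions z_ij = a_i * w_ij."""
    z = a[:, None] * w
    return (z / (z.sum(axis=0) + 1e-9)) @ relevance

x = np.array([1.0, 0.5, 2.0])
h = np.maximum(W1.T @ x, 0.0)   # hidden activations (ReLU)
y = W2 @ h                      # output score

r_hidden = lrp_layer(h, W2[:, None], np.array([y]))  # output to hidden
r_input = lrp_layer(x, W1, r_hidden)                 # hidden to input

# Relevance conservation: the input "heat map" sums to the output score
assert np.isclose(r_input.sum(), y)
```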
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Schematic illustration of the application of explainable AI (XAI) methods to deduce <bold>(A)</bold> local and <bold>(B)</bold> global model behavior of an AI system. <bold>(A)</bold> shows how heat maps are generated after labels were obtained for a specific input image, in this case using LRP (Samek et al., <xref ref-type="bibr" rid="B83">2016</xref>), which assigns each input pixel a relative contribution to the output decision (green colors indicate lowest relevance, red colors highest relevance). <bold>(B)</bold> illustrates how many local model behavior explanations are combined to explain global model behavior, in this case using spectral analysis (cf. Lapuschkin et al., <xref ref-type="bibr" rid="B53">2019</xref>). Here multiple topographically distinct clusters for individual labels shown in a 2D projection of input space indicate some kind of problem: the small cluster for <italic>speed limit 100</italic> represents the backdooring attack using modified stop signs (cf. <xref ref-type="fig" rid="F4">Figure 4</xref>) and the small cluster for the yield sign represents the Clever Hans effect illustrated in detail in <bold>(C)</bold>, where specific image tags (here &#x0201C;&#x00040;yieldphoto&#x0201D;) correlate with specific input classes (here the yield sign) and the AI system focuses on these spurious correlations instead of causal correlations. Upon swapping the input data set (not containing any more spurious correlations of this kind), the AI model might show erroneous behavior.</p></caption>
<graphic xlink:href="fdata-03-00023-g0008.tif"/>
</fig>
<p>A third line of research linked to both other approaches is concerned with <bold>verifying and proving</bold> the safety and security of AI systems. Owing to the much greater complexity of this problem, results in this area, especially practically usable ones, are scarce (Huang et al., <xref ref-type="bibr" rid="B44">2017</xref>; Katz et al., <xref ref-type="bibr" rid="B50">2017</xref>; Gehr et al., <xref ref-type="bibr" rid="B34">2018</xref>; Wong et al., <xref ref-type="bibr" rid="B105">2018</xref>; Wong and Kolter, <xref ref-type="bibr" rid="B104">2018</xref>; Singh et al., <xref ref-type="bibr" rid="B87">2019</xref>). A general idea for harnessing the potential of XAI and verification methods may be applied, provided one manages to make these methods work on moderately small models. In this case, it might be possible to <bold>modularize</bold> the AI system in question so that core functions are mapped to small AI models (Mascharka et al., <xref ref-type="bibr" rid="B65">2018</xref>), which can then be checked and verified. From the perspective of data protection, this approach has the additional advantage that the use of specific data may be restricted to the training of specific modules. In contrast to monolithic models, this allows unlearning specific data by replacing the corresponding modules (Bourtoule et al., <xref ref-type="bibr" rid="B13">2019</xref>).</p>
</sec>
</sec>
<sec id="s5">
<title>5. Conclusion and Outlook</title>
<p>The life cycle of AI systems can give rise to malfunctions and is susceptible to targeted attacks at different levels. When facing naturally occurring circumstances and benign failures, i.e., in terms of safety, well-trained AI systems display robust performance in many cases. In practice, they may still show highly undesired behavior, as exemplified by several incidents involving Tesla cars (Wikipedia Contributors, <xref ref-type="bibr" rid="B103">2020</xref>). The main problem in this respect is insufficient training data. The black-box property of the systems aggravates this issue, in particular when it comes to gaining user trust or establishing guarantees on correct behavior of the system under a range of circumstances.</p>
<p>The situation is much more problematic though when it comes to the robustness to attacks exhibited by the systems. Whereas a lot of attacks can be combated using traditional measures of IT security, the AI-specific vulnerabilities to poisoning and evasion attacks can have grave consequences and do not yet admit reliable mitigations. Considerable effort has been put into researching AI-specific vulnerabilities, yet more is needed, since defenses still need to become more resilient to attackers if they are to be used in safety-critical applications. In order to achieve this goal, it seems furthermore indispensable to combine defense measures at different levels and not only focus on the internals of the AI system.</p>
<p>Additional open questions concern the area of XAI, which is quite recent with respect to complex AI systems. The capabilities and limitations of existing methods need to be better understood, and reliable and sensible benchmarks need to be constructed to compare them (Osman et al., <xref ref-type="bibr" rid="B73">2020</xref>). The topic of formal verification of the functionality of an AI system is an important enhancement that should further be studied. A general approach for obtaining better results from XAI and verification methods is to reduce complexity in the models to be analyzed. We argue that for safety-critical applications the size of AI systems used for certain tasks should be minimized subject to the desired performance. If possible, one might also envision using a modular system containing small modules, which lend themselves more easily to analysis. A thorough evaluation using suitable metrics should be considered a prerequisite for the deployment of any IT system and, therefore, of any AI system.</p>
<p>Thinking ahead, the issue of AI systems which are continuously being trained using fresh data (called continual learning, Parisi et al., <xref ref-type="bibr" rid="B78">2019</xref>) also needs to be considered. This approach poses at least two difficulties as compared to the more static life cycle considered in this article. On the one hand, depending on how the training is done, an attacker might have a much better opportunity for poisoning training data. On the other hand, results on robustness, resilience to attacks or correctness guarantees will only be valid for a certain version of a model and may quickly become obsolete. This might be tackled by using regular checkpoints and repeating the countermeasures and evaluations, at potentially high costs.</p>
<p>Considering the current state of the art in the field of XAI and verification, it is unclear whether it will ever be possible to formally certify the correct operation of an arbitrary AI system and construct a system which is immune to the AI-specific attacks presented in this article. It is conceivable that both certification results and defenses will continue to only yield probabilistic guarantees on the overall robustness and correct operation of the system. If this assumption turns out true for the foreseeable future, its implications for safety-critical applications of AI systems need to be carefully considered and discussed without bias. For instance, it is important to discuss which level of residual risk, if any, one might be willing to accept in return for possible benefits of AI over traditional solutions, and in what way the conformance to a risk level might be tested and confirmed. For instance, humans are required to pass a driving test before obtaining their driver&#x00027;s license and being allowed to drive on their own. While a human having passed a driving test is not guaranteed to always respect the traffic rules, to behave correctly and to not cause any harm to other traffic participants, the test enforces a certain standard. In a similar vein, one might imagine a special test to be passed by an AI system for obtaining regulatory approval. In these cases the risks and benefits of using an AI system and the boundary conditions for which the risk assessment is valid should be made transparent to the user. However, the use of any IT system that cannot be guaranteed to achieve the acceptable risk level as outlined above could in extreme cases be banned for particularly safety-critical applications. Specifically, such a ban could apply to pure AI systems, if they fail to achieve such guarantees.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>CB, MN, and AT conceived the article and surveyed relevant publications. CB wrote the original draft of the manuscript, with some help by MN. AT designed and created all the figures and tables, reviewed and edited the manuscript, with help by CB. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack><p>We would like to thank Ute Gebhardt, Rainer Plaga, Markus Ullmann, and Wojciech Samek for carefully proofreading earlier versions of this document and providing valuable suggestions for improvement. Further we would like to thank Frank Pasemann, Petar Tsankov, Vasilios Danos, and the VdT&#x000DC;V-BSI AI work group for fruitful discussions. We would also like to thank the reviewers for their helpful comments. This manuscript has been released as a preprint at arXiv:2003.08837.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Angelino</surname> <given-names>E.</given-names></name> <name><surname>Larus-Stone</surname> <given-names>N.</given-names></name> <name><surname>Alabi</surname> <given-names>D.</given-names></name> <name><surname>Seltzer</surname> <given-names>M.</given-names></name> <name><surname>Rudin</surname> <given-names>C.</given-names></name></person-group> (<year>2018</year>). <article-title>Learning certifiably optimal rule lists for categorical data</article-title>. <source>J. Mach. Learn. Res.</source> <volume>18</volume>, <fpage>1</fpage>&#x02013;<lpage>78</lpage>.</citation></ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Athalye</surname> <given-names>A.</given-names></name> <name><surname>Carlini</surname> <given-names>N.</given-names></name> <name><surname>Wagner</surname> <given-names>D.</given-names></name></person-group> (<year>2018a</year>). <article-title>Obfuscated gradients give a false sense of security: circumventing Defenses to adversarial examples</article-title>, in <source>Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Volume 80 of Proceedings of Machine Learning Research</source>, eds <person-group person-group-type="editor"><name><surname>Dy</surname> <given-names>J. G.</given-names></name> <name><surname>Krause</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>Stockholm</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>274</fpage>&#x02013;<lpage>283</lpage>.</citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Athalye</surname> <given-names>A.</given-names></name> <name><surname>Engstrom</surname> <given-names>L.</given-names></name> <name><surname>Ilyas</surname> <given-names>A.</given-names></name> <name><surname>Kwok</surname> <given-names>K.</given-names></name></person-group> (<year>2018b</year>). <article-title>Synthesizing robust and adversarial examples</article-title>, in <source>Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Volume 80 of Proceedings of Machine Learning Research</source>, eds <person-group person-group-type="editor"><name><surname>Dy</surname> <given-names>J. G.</given-names></name> <name><surname>Krause</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>Stockholm</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>284</fpage>&#x02013;<lpage>293</lpage>.</citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bach</surname> <given-names>S.</given-names></name> <name><surname>Binder</surname> <given-names>A.</given-names></name> <name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Klauschen</surname> <given-names>F.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K. R.</given-names></name> <name><surname>Samek</surname> <given-names>W.</given-names></name></person-group> (<year>2015</year>). <article-title>On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</article-title>. <source>PLoS ONE</source> <volume>10</volume>:<fpage>e0130140</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0130140</pub-id><pub-id pub-id-type="pmid">26161953</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Balda</surname> <given-names>E. R.</given-names></name> <name><surname>Behboodi</surname> <given-names>A.</given-names></name> <name><surname>Mathar</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <source>Adversarial Examples in Deep Neural Networks: An Overview, Volume 865 of Studies in Computational Intelligence</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>31</fpage>&#x02013;<lpage>65</lpage>.</citation></ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Barreno</surname> <given-names>M.</given-names></name> <name><surname>Nelson</surname> <given-names>B.</given-names></name> <name><surname>Sears</surname> <given-names>R.</given-names></name> <name><surname>Joseph</surname> <given-names>A. D.</given-names></name> <name><surname>Tygar</surname> <given-names>J. D.</given-names></name></person-group> (<year>2006</year>). <article-title>Can machine learning be secure?</article-title> in <source>Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, ASIACCS 2006</source>, eds <person-group person-group-type="editor"><name><surname>Lin</surname> <given-names>F. C.</given-names></name> <name><surname>Lee</surname> <given-names>D. T.</given-names></name> <name><surname>Paul Lin</surname> <given-names>B. S.</given-names></name> <name><surname>Shieh</surname> <given-names>S.</given-names></name> <name><surname>Jajodia</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Taipei</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>16</fpage>&#x02013;<lpage>25</lpage>.</citation></ref>
<ref id="B7">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Berghoff</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Protecting the integrity of the training procedure of neural networks</article-title>. <source>arXiv:2005.06928</source>.</citation></ref>
<ref id="B8">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bethge</surname> <given-names>A. G.</given-names></name></person-group> (<year>2019</year>). <source>Robust Vision Benchmark</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://robust.vision">https://robust.vision</ext-link> (accessed March 3, 2020).</citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Biggio</surname> <given-names>B.</given-names></name> <name><surname>Corona</surname> <given-names>I.</given-names></name> <name><surname>Maiorca</surname> <given-names>D.</given-names></name> <name><surname>Nelson</surname> <given-names>B.</given-names></name> <name><surname>Srndic</surname> <given-names>N.</given-names></name> <name><surname>Laskov</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Evasion attacks against machine learning at test time</article-title>, in <source>Machine Learning and Knowledge Discovery in Databases</source>, eds <person-group person-group-type="editor"><name><surname>Blockeel</surname> <given-names>H.</given-names></name> <name><surname>Kersting</surname> <given-names>K.</given-names></name> <name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>&#x0017D;elezn&#x000FD;</surname> <given-names>F.</given-names></name></person-group> (<publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>387</fpage>&#x02013;<lpage>402</lpage>.</citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Biggio</surname> <given-names>B.</given-names></name> <name><surname>Nelson</surname> <given-names>B.</given-names></name> <name><surname>Laskov</surname> <given-names>P.</given-names></name></person-group> (<year>2012</year>). <article-title>Poisoning attacks against support vector machines</article-title>, in <source>Proceedings of the 29th International Conference on Machine Learning (ICML)</source>, eds <person-group person-group-type="editor"><name><surname>Langford</surname> <given-names>J.</given-names></name> <name><surname>Pineau</surname> <given-names>J.</given-names></name></person-group> (<publisher-name>Omnipress</publisher-name>), <fpage>1807</fpage>&#x02013;<lpage>1814</lpage>.</citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Biggio</surname> <given-names>B.</given-names></name> <name><surname>Roli</surname> <given-names>F.</given-names></name></person-group> (<year>2018</year>). <article-title>Wild patterns: ten years after the rise of adversarial machine learning</article-title>. <source>Pattern Recogn.</source> <volume>84</volume>, <fpage>317</fpage>&#x02013;<lpage>331</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2018.07.023</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blackmore</surname> <given-names>K. L.</given-names></name> <name><surname>Williamson</surname> <given-names>R. C.</given-names></name> <name><surname>Mareels</surname> <given-names>I. M. Y.</given-names></name></person-group> (<year>2006</year>). <article-title>Decision region approximation by polynomials or neural networks</article-title>. <source>IEEE Trans. Inform. Theory</source> <volume>43</volume>, <fpage>903</fpage>&#x02013;<lpage>907</lpage>. <pub-id pub-id-type="doi">10.1109/18.568700</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Bourtoule</surname> <given-names>L.</given-names></name> <name><surname>Chandrasekaran</surname> <given-names>V.</given-names></name> <name><surname>Choquette-Choo</surname> <given-names>C.</given-names></name> <name><surname>Jia</surname> <given-names>H.</given-names></name> <name><surname>Travers</surname> <given-names>A.</given-names></name> <name><surname>Zhang</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Machine unlearning</article-title>. <italic>arXiv</italic> abs/1912.03817.</citation></ref>
<ref id="B14">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>T. B.</given-names></name> <name><surname>Man&#x000E9;</surname> <given-names>D.</given-names></name> <name><surname>Roy</surname> <given-names>A.</given-names></name> <name><surname>Abadi</surname> <given-names>M.</given-names></name> <name><surname>Gilmer</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Adversarial patch</article-title>. <italic>arXiv</italic> abs/1712.09665.</citation></ref>
<ref id="B15">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Carlini</surname> <given-names>N.</given-names></name> <name><surname>Athalye</surname> <given-names>A.</given-names></name> <name><surname>Papernot</surname> <given-names>N.</given-names></name> <name><surname>Brendel</surname> <given-names>W.</given-names></name> <name><surname>Rauber</surname> <given-names>J.</given-names></name> <name><surname>Tsipras</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>On evaluating adversarial robustness</article-title>. <italic>arXiv</italic> abs/1902.06705.</citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Carlini</surname> <given-names>N.</given-names></name> <name><surname>Wagner</surname> <given-names>D.</given-names></name></person-group> (<year>2017a</year>). <article-title>Adversarial examples are not easily detected: bypassing ten detection methods</article-title>, in <source>Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (AISec &#x00027;17)</source>, eds <person-group person-group-type="editor"><name><surname>Thuraisingham</surname> <given-names>B. M.</given-names></name> <name><surname>Biggio</surname> <given-names>B.</given-names></name> <name><surname>Freeman</surname> <given-names>D. M.</given-names></name> <name><surname>Miller</surname> <given-names>B.</given-names></name> <name><surname>Sinha</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>3</fpage>&#x02013;<lpage>14</lpage>.</citation></ref>
<ref id="B17">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Carlini</surname> <given-names>N.</given-names></name> <name><surname>Wagner</surname> <given-names>D.</given-names></name></person-group> (<year>2017b</year>). <article-title>MagNet and &#x0201C;efficient defenses against adversarial attacks&#x0201D; are not robust to adversarial examples</article-title>. <italic>arXiv</italic> abs/1711.08478.</citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Carlini</surname> <given-names>N.</given-names></name> <name><surname>Wagner</surname> <given-names>D.</given-names></name></person-group> (<year>2017c</year>). <article-title>Towards evaluating the robustness of neural networks</article-title>, in <source>IEEE Symposium on Security and Privacy (SP)</source> (<publisher-loc>San Jose, CA</publisher-loc>), <fpage>39</fpage>&#x02013;<lpage>57</lpage>.</citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>B.</given-names></name> <name><surname>Carvalho</surname> <given-names>W.</given-names></name> <name><surname>Baracaldo</surname> <given-names>N.</given-names></name> <name><surname>Ludwig</surname> <given-names>H.</given-names></name> <name><surname>Edwards</surname> <given-names>B.</given-names></name> <name><surname>Lee</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Detecting backdoor attacks on deep neural networks by activation clustering</article-title>, in <source>Workshop on Artificial Intelligence Safety 2019 Co-located With the Thirty-Third AAAI Conference on Artificial Intelligence 2019 (AAAI-19), Volume 2301 of CEUR Workshop Proceedings</source>, eds <person-group person-group-type="editor"><name><surname>Espinoza</surname> <given-names>H.</given-names></name> <name><surname>&#x000D3; h&#x000C9;igeartaigh</surname> <given-names>S.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Hern&#x000E1;ndez-Orallo</surname> <given-names>J.</given-names></name> <name><surname>Castillo-Effen</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>CEUR-WS.org</publisher-name>).</citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Zhou</surname> <given-names>D.</given-names></name> <name><surname>Yi</surname> <given-names>J.</given-names></name> <name><surname>Gu</surname> <given-names>Q.</given-names></name></person-group> (<year>2020</year>). <article-title>A Frank-Wolfe framework for efficient and effective adversarial attacks</article-title>, in <source>Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence 2020 (AAAI-20)</source> (<publisher-loc>New York, NY</publisher-loc>).</citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>P. Y.</given-names></name> <name><surname>Sharma</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Yi</surname> <given-names>J.</given-names></name> <name><surname>Hsieh</surname> <given-names>C. J.</given-names></name></person-group> (<year>2018</year>). <article-title>EAD: elastic-net attacks to deep neural networks via adversarial examples</article-title>, in <source>Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18)</source>, eds <person-group person-group-type="editor"><name><surname>McIlraith</surname> <given-names>S. A.</given-names></name> <name><surname>Weinberger</surname> <given-names>K. Q.</given-names></name></person-group> (<publisher-loc>New Orleans, LA</publisher-loc>: <publisher-name>AAAI Press</publisher-name>), <fpage>10</fpage>&#x02013;<lpage>17</lpage>.</citation></ref>
<ref id="B22">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Lu</surname> <given-names>K.</given-names></name> <name><surname>Song</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Targeted backdoor attacks on deep learning systems using data poisoning</article-title>. <italic>arXiv</italic> abs/1712.05526.</citation></ref>
<ref id="B23">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Chung</surname> <given-names>Y.</given-names></name> <name><surname>Haas</surname> <given-names>P. J.</given-names></name> <name><surname>Upfal</surname> <given-names>E.</given-names></name> <name><surname>Kraska</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Unknown examples &#x00026; machine learning model generalization</article-title>. <italic>arXiv</italic> abs/1808.08294.</citation></ref>
<ref id="B24">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Clements</surname> <given-names>J.</given-names></name> <name><surname>Lao</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>Hardware trojan attacks on neural networks</article-title>. <italic>arXiv</italic> abs/1806.05768.</citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dalvi</surname> <given-names>N. N.</given-names></name> <name><surname>Domingos</surname> <given-names>P. M.</given-names></name> <name><surname>Mausam</surname></name> <name><surname>Sanghai</surname> <given-names>S. K.</given-names></name> <name><surname>Verma</surname> <given-names>D.</given-names></name></person-group> (<year>2004</year>). <article-title>Adversarial classification</article-title>, in <source>Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, eds <person-group person-group-type="editor"><name><surname>Kim</surname> <given-names>W.</given-names></name> <name><surname>Kohavi</surname> <given-names>R.</given-names></name> <name><surname>Gehrke</surname> <given-names>J.</given-names></name> <name><surname>DuMouchel</surname> <given-names>W.</given-names></name></person-group> (<publisher-loc>Seattle, WA</publisher-loc>), <fpage>99</fpage>&#x02013;<lpage>108</lpage>.</citation></ref>
<ref id="B26">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Das</surname> <given-names>N.</given-names></name> <name><surname>Shanbhogue</surname> <given-names>M.</given-names></name> <name><surname>Chen</surname> <given-names>S. T.</given-names></name> <name><surname>Hohman</surname> <given-names>F.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Kounavis</surname> <given-names>M. E.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Keeping the bad guys out: protecting and vaccinating deep learning with JPEG compression</article-title>. <italic>arXiv</italic> abs/1705.02900.</citation></ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Devlin</surname> <given-names>J.</given-names></name> <name><surname>Chang</surname> <given-names>M. W.</given-names></name> <name><surname>Lee</surname> <given-names>K.</given-names></name> <name><surname>Toutanova</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>, in <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Volume 1 (Long and Short Papers)</source>, eds <person-group person-group-type="editor"><name><surname>Burstein</surname> <given-names>J.</given-names></name> <name><surname>Doran</surname> <given-names>C.</given-names></name> <name><surname>Solorio</surname> <given-names>T.</given-names></name></person-group> (<publisher-loc>Minneapolis, MN</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>4171</fpage>&#x02013;<lpage>4186</lpage>.</citation></ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dombrowski</surname> <given-names>A. K.</given-names></name> <name><surname>Alber</surname> <given-names>M.</given-names></name> <name><surname>Anders</surname> <given-names>C. J.</given-names></name> <name><surname>Ackermann</surname> <given-names>M.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K. R.</given-names></name> <name><surname>Kessel</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Explanations can be manipulated and geometry is to blame</article-title>, in <source>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019</source>, eds <person-group person-group-type="editor"><name><surname>Wallach</surname> <given-names>H. M.</given-names></name> <name><surname>Larochelle</surname> <given-names>H.</given-names></name> <name><surname>Beygelzimer</surname> <given-names>A.</given-names></name> <name><surname>d&#x00027;Alch&#x000E9;-Buc</surname> <given-names>F.</given-names></name> <name><surname>Fox</surname> <given-names>E. B.</given-names></name> <name><surname>Garnett</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Vancouver, BC</publisher-loc>), <fpage>13567</fpage>&#x02013;<lpage>13578</lpage>.</citation></ref>
<ref id="B29">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Dziugaite</surname> <given-names>G. K.</given-names></name> <name><surname>Ghahramani</surname> <given-names>Z.</given-names></name> <name><surname>Roy</surname> <given-names>D. M.</given-names></name></person-group> (<year>2016</year>). <article-title>A study of the effect of JPG compression on adversarial images</article-title>. <italic>arXiv</italic> abs/1608.00853.</citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eagleman</surname> <given-names>D. M.</given-names></name></person-group> (<year>2001</year>). <article-title>Visual illusions and neurobiology</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>2</volume>, <fpage>920</fpage>&#x02013;<lpage>926</lpage>. <pub-id pub-id-type="doi">10.1038/35104092</pub-id><pub-id pub-id-type="pmid">11733799</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Evtimov</surname> <given-names>I.</given-names></name> <name><surname>Eykholt</surname> <given-names>K.</given-names></name> <name><surname>Fernandes</surname> <given-names>E.</given-names></name> <name><surname>Kohno</surname> <given-names>T.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Prakash</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Robust physical-world attacks on machine learning models</article-title>. <italic>arXiv</italic> abs/1707.08945.</citation></ref>
<ref id="B32">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Eykholt</surname> <given-names>K.</given-names></name> <name><surname>Evtimov</surname> <given-names>I.</given-names></name> <name><surname>Fernandes</surname> <given-names>E.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Song</surname> <given-names>D.</given-names></name> <name><surname>Kohno</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Note on attacking object detectors with adversarial stickers</article-title>. <italic>arXiv</italic> abs/1712.08062.</citation></ref>
<ref id="B33">
<citation citation-type="web"><person-group person-group-type="author"><collab>Facebook</collab></person-group>. <source>PyTorch</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://pytorch.org">https://pytorch.org</ext-link> (accessed March 17, 2020).</citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gehr</surname> <given-names>T.</given-names></name> <name><surname>Mirman</surname> <given-names>M.</given-names></name> <name><surname>Drachsler-Cohen</surname> <given-names>D.</given-names></name> <name><surname>Tsankov</surname> <given-names>P.</given-names></name> <name><surname>Chaudhuri</surname> <given-names>S.</given-names></name> <name><surname>Vechev</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>AI2: safety and robustness certification of neural networks with abstract interpretation</article-title>, in <source>IEEE Symposium on Security and Privacy (SP)</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>3</fpage>&#x02013;<lpage>18</lpage>.</citation></ref>
<ref id="B35">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Gilmer</surname> <given-names>J.</given-names></name> <name><surname>Adams</surname> <given-names>R. P.</given-names></name> <name><surname>Goodfellow</surname> <given-names>I. J.</given-names></name> <name><surname>Andersen</surname> <given-names>D.</given-names></name> <name><surname>Dahl</surname> <given-names>G. E.</given-names></name></person-group> (<year>2018</year>). <article-title>Motivating the rules of the game for adversarial example research</article-title>. <italic>arXiv</italic> abs/1807.06732.</citation></ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gilpin</surname> <given-names>L. H.</given-names></name> <name><surname>Bau</surname> <given-names>D.</given-names></name> <name><surname>Yuan</surname> <given-names>B. Z.</given-names></name> <name><surname>Bajwa</surname> <given-names>A.</given-names></name> <name><surname>Specter</surname> <given-names>M.</given-names></name> <name><surname>Kagal</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>Explaining explanations: an overview of interpretability of machine learning</article-title>, in <source>5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018</source>, eds <person-group person-group-type="editor"><name><surname>Bonchi</surname> <given-names>F.</given-names></name> <name><surname>Provost</surname> <given-names>F. J.</given-names></name> <name><surname>Eliassi-Rad</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Cattuto</surname> <given-names>C.</given-names></name> <name><surname>Ghani</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Turin</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>80</fpage>&#x02013;<lpage>89</lpage>.</citation></ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ghorbani</surname> <given-names>A.</given-names></name> <name><surname>Natarajan</surname> <given-names>V.</given-names></name> <name><surname>Coz</surname> <given-names>D. D.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name></person-group> (<year>2019</year>). <article-title>DermGAN: synthetic generation of clinical skin images with pathology</article-title>, in <source>Proceedings of Machine Learning for Health (ML4H) at NeurIPS 2019</source> (<publisher-loc>Vancouver, BC</publisher-loc>).</citation></ref>
<ref id="B38">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Goodfellow</surname> <given-names>I. J.</given-names></name> <name><surname>Shlens</surname> <given-names>J.</given-names></name> <name><surname>Szegedy</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Explaining and harnessing adversarial examples</article-title>, in <source>International Conference on Learning Representations</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1412.6572">http://arxiv.org/abs/1412.6572</ext-link></citation></ref>
<ref id="B39">
<citation citation-type="web"><person-group person-group-type="author"><collab>Google Brain</collab></person-group>. <source>TensorFlow</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.tensorflow.org">https://www.tensorflow.org</ext-link> (accessed March 17, 2020).</citation></ref>
<ref id="B40">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Gu</surname> <given-names>T.</given-names></name> <name><surname>Dolan-Gavitt</surname> <given-names>B.</given-names></name> <name><surname>Garg</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>BadNets: identifying vulnerabilities in the machine learning model supply chain</article-title>. <italic>arXiv</italic> abs/1708.06733.</citation></ref>
<ref id="B41">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Haykin</surname> <given-names>S.</given-names></name></person-group> (<year>1999</year>). <source>Neural Networks, 2nd Edn.</source> <publisher-loc>Upper Saddle River, NJ</publisher-loc>: <publisher-name>Prentice Hall</publisher-name>.</citation></ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep residual learning for image recognition</article-title>, in <source>2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016</source> (<publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>.</citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hornik</surname> <given-names>K.</given-names></name> <name><surname>Stinchcombe</surname> <given-names>M. B.</given-names></name> <name><surname>White</surname> <given-names>H.</given-names></name></person-group> (<year>1989</year>). <article-title>Multilayer feedforward networks are universal approximators</article-title>. <source>Neural Netw.</source> <volume>2</volume>, <fpage>359</fpage>&#x02013;<lpage>366</lpage>.</citation></ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Kwiatkowska</surname> <given-names>M.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Safety verification of deep neural networks</article-title>, in <source>Computer Aided Verification&#x02013;29th International Conference, CAV 2017, Proceedings, Part I, Volume 10426 of Lecture Notes in Computer Science</source>, eds <person-group person-group-type="editor"><name><surname>Majumdar</surname> <given-names>R.</given-names></name> <name><surname>Kuncak</surname> <given-names>V.</given-names></name></person-group> (<publisher-loc>Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>3</fpage>&#x02013;<lpage>29</lpage>.</citation></ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ilyas</surname> <given-names>A.</given-names></name> <name><surname>Santurkar</surname> <given-names>S.</given-names></name> <name><surname>Tsipras</surname> <given-names>D.</given-names></name> <name><surname>Engstrom</surname> <given-names>L.</given-names></name> <name><surname>Tran</surname> <given-names>B.</given-names></name> <name><surname>Madry</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Adversarial examples are not bugs, they are features</article-title>, in <source>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019</source>, eds <person-group person-group-type="editor"><name><surname>Wallach</surname> <given-names>H. M.</given-names></name> <name><surname>Larochelle</surname> <given-names>H.</given-names></name> <name><surname>Beygelzimer</surname> <given-names>A.</given-names></name> <name><surname>d&#x00027;Alch&#x000E9;-Buc</surname> <given-names>F.</given-names></name> <name><surname>Fox</surname> <given-names>E. B.</given-names></name> <name><surname>Garnett</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Vancouver, BC</publisher-loc>), <fpage>125</fpage>&#x02013;<lpage>136</lpage>.</citation></ref>
<ref id="B46">
<citation citation-type="web"><person-group person-group-type="author"><collab>INRIA</collab></person-group>. <source>Scikit-Learn</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://scikit-learn.org/stable/">https://scikit-learn.org/stable/</ext-link> (accessed March 17, 2020).</citation></ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jakubovitz</surname> <given-names>D.</given-names></name> <name><surname>Giryes</surname> <given-names>R.</given-names></name> <name><surname>Rodrigues</surname> <given-names>M. R. D.</given-names></name></person-group> (<year>2019</year>). <article-title>Generalization error in deep learning</article-title>, in <source>Compressed Sensing and Its Applications. Applied and Numerical Harmonic Analysis</source>, eds <person-group person-group-type="editor"><name><surname>Boche</surname> <given-names>H.</given-names></name> <name><surname>Caire</surname> <given-names>G.</given-names></name> <name><surname>Calderbank</surname> <given-names>R.</given-names></name> <name><surname>Kutyniok</surname> <given-names>G.</given-names></name> <name><surname>Mathar</surname> <given-names>R.</given-names></name> <name><surname>Petersen</surname> <given-names>P.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Birkh&#x000E4;user</publisher-name>). <pub-id pub-id-type="doi">10.1007/978-3-319-73074-5_5</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Ji</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name></person-group> (<year>2019</year>). <article-title>Programmable neural network trojan for pre-trained feature extractor</article-title>. <italic>arXiv</italic> abs/1901.07766.</citation></ref>
<ref id="B49">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Juba</surname> <given-names>B.</given-names></name> <name><surname>Le</surname> <given-names>H. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Precision-recall versus accuracy and the role of large data sets</article-title>, in <source>The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)</source>, Vol. <volume>33</volume> (<publisher-loc>Honolulu, HI</publisher-loc>).</citation></ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Katz</surname> <given-names>G.</given-names></name> <name><surname>Barrett</surname> <given-names>C. W.</given-names></name> <name><surname>Dill</surname> <given-names>D. L.</given-names></name> <name><surname>Julian</surname> <given-names>K.</given-names></name> <name><surname>Kochenderfer</surname> <given-names>M. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Reluplex: an efficient SMT solver for verifying deep neural networks</article-title>, in <source>Computer Aided Verification&#x02013;29th International Conference, CAV 2017, Proceedings, Part I, Volume 10426 of Lecture Notes in Computer Science</source>, eds <person-group person-group-type="editor"><name><surname>Majumdar</surname> <given-names>R.</given-names></name> <name><surname>Kuncak</surname> <given-names>V.</given-names></name></person-group> (<publisher-loc>Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>97</fpage>&#x02013;<lpage>117</lpage>.</citation></ref>
<ref id="B51">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Khoury</surname> <given-names>M.</given-names></name> <name><surname>Hadfield-Menell</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>On the geometry of adversarial examples</article-title>. <italic>arXiv</italic> abs/1811.00525.</citation></ref>
<ref id="B52">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>B.</given-names></name> <name><surname>Kim</surname> <given-names>H.</given-names></name> <name><surname>Kim</surname> <given-names>K.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>Learning not to learn: training deep neural networks with biased data</article-title>, in <source>The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Long Beach, CA</publisher-loc>).</citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lapuschkin</surname> <given-names>S.</given-names></name> <name><surname>W&#x000E4;ldchen</surname> <given-names>S.</given-names></name> <name><surname>Binder</surname> <given-names>A.</given-names></name> <name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Samek</surname> <given-names>W.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K. R.</given-names></name></person-group> (<year>2019</year>). <article-title>Unmasking Clever Hans predictors and assessing what machines really learn</article-title>. <source>Nat. Commun.</source> <volume>10</volume>, <fpage>1</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1038/s41467-019-08987-4</pub-id><pub-id pub-id-type="pmid">30858366</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Bottou</surname> <given-names>L.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Haffner</surname> <given-names>P.</given-names></name></person-group> (<year>1998</year>). <article-title>Gradient-based learning applied to document recognition</article-title>. <source>Proc. IEEE</source> <volume>86</volume>, <fpage>2278</fpage>&#x02013;<lpage>2324</lpage>.</citation></ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lederberg</surname> <given-names>J.</given-names></name></person-group> (<year>1987</year>). <article-title>How DENDRAL was conceived and born</article-title>, in <source>Proceedings of the ACM Conference on History of Medical Informatics</source>, ed <person-group person-group-type="editor"><name><surname>Blum</surname> <given-names>B. I.</given-names></name></person-group> (<publisher-loc>Bethesda, MD</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>5</fpage>&#x02013;<lpage>19</lpage>.</citation></ref>
<ref id="B56">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>Analysis on the nonlinear dynamics of deep neural networks: topological entropy and chaos</article-title>. <italic>arXiv</italic> abs/1804.03987.</citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Zhao</surname> <given-names>W.</given-names></name> <name><surname>Cai</surname> <given-names>W.</given-names></name> <name><surname>Yu</surname> <given-names>S.</given-names></name> <name><surname>Leung</surname> <given-names>V. C. M.</given-names></name></person-group> (<year>2018</year>). <article-title>A survey on security threats and defensive techniques of machine learning: a data driven view</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>12103</fpage>&#x02013;<lpage>12117</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2018.2805680</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Ma</surname> <given-names>S.</given-names></name> <name><surname>Aafer</surname> <given-names>Y.</given-names></name> <name><surname>Lee</surname> <given-names>W. C.</given-names></name> <name><surname>Zhai</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Trojaning attack on neural networks</article-title>, in <source>25th Annual Network and Distributed System Security Symposium, NDSS 2018</source> (<publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>The Internet Society</publisher-name>).</citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Loftus</surname> <given-names>E. F.</given-names></name></person-group> (<year>2005</year>). <article-title>Planting misinformation in the human mind: a 30-year investigation of the malleability of memory</article-title>. <source>Learn. Mem.</source> <volume>12</volume>, <fpage>361</fpage>&#x02013;<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1101/lm.94705</pub-id><pub-id pub-id-type="pmid">16027179</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lowd</surname> <given-names>D.</given-names></name> <name><surname>Meek</surname> <given-names>C.</given-names></name></person-group> (<year>2005</year>). <article-title>Adversarial learning</article-title>, in <source>Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, eds <person-group person-group-type="editor"><name><surname>Grossman</surname> <given-names>R.</given-names></name> <name><surname>Bayardo</surname> <given-names>R. J.</given-names></name> <name><surname>Bennett</surname> <given-names>K. P.</given-names></name></person-group> (<publisher-loc>Chicago, IL</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>641</fpage>&#x02013;<lpage>647</lpage>.</citation></ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lundberg</surname> <given-names>S. M.</given-names></name> <name><surname>Erion</surname> <given-names>G. G.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>DeGrave</surname> <given-names>A.</given-names></name> <name><surname>Prutkin</surname> <given-names>J. M.</given-names></name> <name><surname>Nair</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Explainable AI for trees: from local explanations to global understanding</article-title>. <source>Nat. Mach. Intell.</source> <volume>2</volume>, <fpage>56</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-019-0138-9</pub-id><pub-id pub-id-type="pmid">32607472</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Madry</surname> <given-names>A.</given-names></name> <name><surname>Athalye</surname> <given-names>A.</given-names></name> <name><surname>Tsipras</surname> <given-names>D.</given-names></name> <name><surname>Engstrom</surname> <given-names>L.</given-names></name></person-group> (<year>2019</year>). <source>RobustML</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.robust-ml.org/">https://www.robust-ml.org/</ext-link> (accessed March 17, 2020).</citation></ref>
<ref id="B63">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Madry</surname> <given-names>A.</given-names></name> <name><surname>Makelov</surname> <given-names>A.</given-names></name> <name><surname>Schmidt</surname> <given-names>L.</given-names></name> <name><surname>Tsipras</surname> <given-names>D.</given-names></name> <name><surname>Vladu</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Towards deep learning models resistant to adversarial attacks</article-title>, in <source>6th International Conference on Learning Representations</source> (<publisher-loc>Vancouver, BC</publisher-loc>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1706.06083">http://arxiv.org/abs/1706.06083</ext-link></citation></ref>
<ref id="B64">
<citation citation-type="book"><person-group person-group-type="editor"><name><surname>Marcel</surname> <given-names>S.</given-names></name> <name><surname>Nixon</surname> <given-names>M. S.</given-names></name> <name><surname>Fierrez</surname> <given-names>J.</given-names></name></person-group> (Eds.). (<year>2019</year>). <source>Handbook of Biometric Anti-Spoofing: Presentation Attack Detection</source>. Advances in Computer Vision and Pattern Recognition. <publisher-loc>Basel</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>.</citation>
</ref>
<ref id="B65">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mascharka</surname> <given-names>D.</given-names></name> <name><surname>Tran</surname> <given-names>P.</given-names></name> <name><surname>Soklaski</surname> <given-names>R.</given-names></name> <name><surname>Majumdar</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Transparency by design: closing the gap between performance and interpretability in visual reasoning</article-title>, in <source>2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>), <fpage>4942</fpage>&#x02013;<lpage>4950</lpage>.</citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCulloch</surname> <given-names>W.</given-names></name> <name><surname>Pitts</surname> <given-names>W.</given-names></name></person-group> (<year>1943</year>). <article-title>A logical calculus of the ideas immanent in nervous activity</article-title>. <source>Bull. Math. Biophys.</source> <volume>5</volume>, <fpage>115</fpage>&#x02013;<lpage>133</lpage>. <pub-id pub-id-type="pmid">2185863</pub-id></citation></ref>
<ref id="B67">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mei</surname> <given-names>S.</given-names></name> <name><surname>Zhu</surname> <given-names>X.</given-names></name></person-group> (<year>2015</year>). <article-title>Using machine teaching to identify optimal training-set attacks on machine learners</article-title>, in <source>Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</source>, eds <person-group person-group-type="editor"><name><surname>Bonet</surname> <given-names>B.</given-names></name> <name><surname>Koenig</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Austin, TX</publisher-loc>: <publisher-name>AAAI Press</publisher-name>), <fpage>2871</fpage>&#x02013;<lpage>2877</lpage>.</citation></ref>
<ref id="B68">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Molnar</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <source>Interpretable Machine Learning&#x02013;A Guide for Making Black Box Models Explainable</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://christophm.github.io/interpretable-ml-book/">https://christophm.github.io/interpretable-ml-book/</ext-link> (accessed March 17, 2020).</citation></ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Bach</surname> <given-names>S.</given-names></name> <name><surname>Binder</surname> <given-names>A.</given-names></name> <name><surname>Samek</surname> <given-names>W.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K. R.</given-names></name></person-group> (<year>2017</year>). <article-title>Explaining nonlinear classification decisions with deep Taylor decomposition</article-title>. <source>Pattern Recogn.</source> <volume>65</volume>, <fpage>211</fpage>&#x02013;<lpage>222</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2016.11.008</pub-id></citation></ref>
<ref id="B70">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mont&#x000FA;far</surname> <given-names>G. F.</given-names></name> <name><surname>Pascanu</surname> <given-names>R.</given-names></name> <name><surname>Cho</surname> <given-names>K.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2014</year>). <article-title>On the number of linear regions of deep neural networks</article-title>, in <source>NIPS&#x00027;14: Proceedings of the 27th International Conference on Neural Information Processing Systems</source>, Vol. <volume>2</volume> (<publisher-loc>Montreal, QC</publisher-loc>), <fpage>2924</fpage>&#x02013;<lpage>2932</lpage>.</citation></ref>
<ref id="B71">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Novak</surname> <given-names>R.</given-names></name> <name><surname>Bahri</surname> <given-names>Y.</given-names></name> <name><surname>Abolafia</surname> <given-names>D. A.</given-names></name> <name><surname>Pennington</surname> <given-names>J.</given-names></name> <name><surname>Sohl-Dickstein</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Sensitivity and generalization in neural networks: an empirical study</article-title>, in <source>International Conference on Learning Representations</source> (<publisher-loc>Vancouver, BC</publisher-loc>).</citation></ref>
<ref id="B72">
<citation citation-type="book"><person-group person-group-type="author"><collab>On-Road Automated Driving (ORAD) Committee</collab></person-group> (<year>2018</year>). <source>Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles J3016_201806</source>. Technical report, <publisher-name>SAE International</publisher-name>.</citation></ref>
<ref id="B73">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Osman</surname> <given-names>A.</given-names></name> <name><surname>Arras</surname> <given-names>L.</given-names></name> <name><surname>Samek</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>Towards ground truth evaluation of visual explanations</article-title>. <italic>arXiv</italic> abs/2003.07258.</citation></ref>
<ref id="B74">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Papernot</surname> <given-names>N.</given-names></name> <name><surname>McDaniel</surname> <given-names>P. D.</given-names></name> <name><surname>Goodfellow</surname> <given-names>I. J.</given-names></name></person-group> (<year>2016a</year>). <article-title>Transferability in machine learning: from phenomena to black-box attacks using adversarial samples</article-title>. <italic>arXiv</italic> abs/1605.07277.</citation></ref>
<ref id="B75">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Papernot</surname> <given-names>N.</given-names></name> <name><surname>McDaniel</surname> <given-names>P. D.</given-names></name> <name><surname>Goodfellow</surname> <given-names>I. J.</given-names></name> <name><surname>Jha</surname> <given-names>S.</given-names></name> <name><surname>Celik</surname> <given-names>Z. B.</given-names></name> <name><surname>Swami</surname> <given-names>A.</given-names></name></person-group> (<year>2016b</year>). <article-title>Practical black-box attacks against deep learning systems using adversarial examples</article-title>. <italic>arXiv</italic> abs/1602.02697.</citation></ref>
<ref id="B76">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Papernot</surname> <given-names>N.</given-names></name> <name><surname>McDaniel</surname> <given-names>P. D.</given-names></name> <name><surname>Jha</surname> <given-names>S.</given-names></name> <name><surname>Fredrikson</surname> <given-names>M.</given-names></name> <name><surname>Celik</surname> <given-names>Z. B.</given-names></name> <name><surname>Swami</surname> <given-names>A.</given-names></name></person-group> (<year>2016c</year>). <article-title>The limitations of deep learning in adversarial settings</article-title>, in <source>IEEE European Symposium on Security and Privacy, EuroS&#x00026;P 2016</source> (<publisher-loc>Saarbr&#x000FC;cken</publisher-loc>), <fpage>372</fpage>&#x02013;<lpage>387</lpage>.</citation></ref>
<ref id="B77">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Papernot</surname> <given-names>N.</given-names></name> <name><surname>McDaniel</surname> <given-names>P. D.</given-names></name> <name><surname>Sinha</surname> <given-names>A.</given-names></name> <name><surname>Wellman</surname> <given-names>M. P.</given-names></name></person-group> (<year>2016d</year>). <article-title>SoK: security and privacy in machine learning</article-title>, in <source>2018 IEEE European Symposium on Security and Privacy, EuroS&#x00026;P 2018</source> (<publisher-loc>London</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>399</fpage>&#x02013;<lpage>414</lpage>.</citation></ref>
<ref id="B78">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parisi</surname> <given-names>G. I.</given-names></name> <name><surname>Kemker</surname> <given-names>R.</given-names></name> <name><surname>Part</surname> <given-names>J. L.</given-names></name> <name><surname>Kanan</surname> <given-names>C.</given-names></name> <name><surname>Wermter</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Continual lifelong learning with neural networks: a review</article-title>. <source>Neural Netw.</source> <volume>113</volume>, <fpage>54</fpage>&#x02013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2019.01.012</pub-id><pub-id pub-id-type="pmid">30780045</pub-id></citation></ref>
<ref id="B79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pasemann</surname> <given-names>F.</given-names></name></person-group> (<year>2002</year>). <article-title>Complex dynamics and the structure of small neural networks</article-title>. <source>Netw. Comput. Neural Syst.</source> <volume>13</volume>, <fpage>195</fpage>&#x02013;<lpage>216</lpage>. <pub-id pub-id-type="doi">10.1080/net.13.2.195.216</pub-id><pub-id pub-id-type="pmid">12061420</pub-id></citation></ref>
<ref id="B80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rudin</surname> <given-names>C.</given-names></name></person-group> (<year>2019</year>). <article-title>Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</article-title>. <source>Nat. Mach. Intell.</source> <volume>1</volume>, <fpage>206</fpage>&#x02013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-019-0048-x</pub-id></citation></ref>
<ref id="B81">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Saha</surname> <given-names>A.</given-names></name> <name><surname>Subramanya</surname> <given-names>A.</given-names></name> <name><surname>Pirsiavash</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Hidden trigger backdoor attacks</article-title>, in <source>Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence 2020 (AAAI-20)</source> (<publisher-loc>New York, NY</publisher-loc>).</citation></ref>
<ref id="B82">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Salman</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Razenshteyn</surname> <given-names>I. P.</given-names></name> <name><surname>Zhang</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Bubeck</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Provably robust deep learning via adversarially trained smoothed classifiers</article-title>, in <source>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019</source>, eds <person-group person-group-type="editor"><name><surname>Wallach</surname> <given-names>H. M.</given-names></name> <name><surname>Larochelle</surname> <given-names>H.</given-names></name> <name><surname>Beygelzimer</surname> <given-names>A.</given-names></name> <name><surname>d&#x00027;Alch&#x000E9;-Buc</surname> <given-names>F.</given-names></name> <name><surname>Fox</surname> <given-names>E. B.</given-names></name> <name><surname>Garnett</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Vancouver, BC</publisher-loc>), <fpage>11289</fpage>&#x02013;<lpage>11300</lpage>.</citation></ref>
<ref id="B83">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Samek</surname> <given-names>W.</given-names></name> <name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Binder</surname> <given-names>A.</given-names></name> <name><surname>Lapuschkin</surname> <given-names>S.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K. R.</given-names></name></person-group> (<year>2016</year>). <article-title>Interpreting the predictions of complex ML models by layer-wise relevance propagation</article-title>. <italic>arXiv</italic> abs/1611.08191.</citation></ref>
<ref id="B84">
<citation citation-type="book"><person-group person-group-type="editor"><name><surname>Samek</surname> <given-names>W.</given-names></name> <name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Vedaldi</surname> <given-names>A.</given-names></name> <name><surname>Hansen</surname> <given-names>L. K.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K. R.</given-names></name></person-group> (eds.). (<year>2019</year>). <source>Explainable AI: Interpreting, Explaining and Visualizing Deep Learning</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B85">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sharif</surname> <given-names>M.</given-names></name> <name><surname>Bhagavatula</surname> <given-names>S.</given-names></name> <name><surname>Bauer</surname> <given-names>L.</given-names></name> <name><surname>Reiter</surname> <given-names>M. K.</given-names></name></person-group> (<year>2016</year>). <article-title>Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition</article-title>, in <source>Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security</source>, eds <person-group person-group-type="editor"><name><surname>Weippl</surname> <given-names>E. R.</given-names></name> <name><surname>Katzenbeisser</surname> <given-names>S.</given-names></name> <name><surname>Kruegel</surname> <given-names>C.</given-names></name> <name><surname>Myers</surname> <given-names>A. C.</given-names></name> <name><surname>Halevi</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Vienna</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>1528</fpage>&#x02013;<lpage>1540</lpage>.</citation></ref>
<ref id="B86">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Very deep convolutional networks for large-scale image recognition</article-title>, in <source>3rd International Conference on Learning Representations</source>, eds <person-group person-group-type="editor"><name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name></person-group> (<publisher-loc>San Diego, CA</publisher-loc>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1409.1556">http://arxiv.org/abs/1409.1556</ext-link></citation></ref>
<ref id="B87">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>G.</given-names></name> <name><surname>Gehr</surname> <given-names>T.</given-names></name> <name><surname>P&#x000FC;schel</surname> <given-names>M.</given-names></name> <name><surname>Vechev</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>An abstract domain for certifying neural networks</article-title>, in <source>Proceedings of the ACM Symposium on Principles of Programming Languages 2019</source>, Vol. <volume>3</volume> (<publisher-loc>Cascais</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>30</lpage>.</citation></ref>
<ref id="B88">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>D.</given-names></name> <name><surname>Eykholt</surname> <given-names>K.</given-names></name> <name><surname>Evtimov</surname> <given-names>I.</given-names></name> <name><surname>Fernandes</surname> <given-names>E.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Rahmati</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Physical adversarial examples for object detectors</article-title>, in <source>12th USENIX Workshop on Offensive Technologies, WOOT 2018</source>, eds <person-group person-group-type="editor"><name><surname>Rossow</surname> <given-names>C.</given-names></name> <name><surname>Younan</surname> <given-names>Y.</given-names></name></person-group> (<publisher-loc>Baltimore, MD</publisher-loc>: <publisher-name>USENIX Association</publisher-name>).</citation></ref>
<ref id="B89">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>Q.</given-names></name> <name><surname>Yan</surname> <given-names>Z.</given-names></name> <name><surname>Tan</surname> <given-names>R.</given-names></name></person-group> (<year>2019</year>). <article-title>Moving target defense for deep visual sensing against adversarial examples</article-title>. <italic>arXiv</italic> abs/1905.13148.</citation></ref>
<ref id="B90">
<citation citation-type="web"><person-group person-group-type="author"><collab>Stanford Vision Lab.</collab></person-group> (<year>2016</year>). <source>ImageNet</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://image-net.org/index">http://image-net.org/index</ext-link> (accessed March 17, 2020).</citation></ref>
<ref id="B91">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>C.</given-names></name> <name><surname>Shrivastava</surname> <given-names>A.</given-names></name> <name><surname>Singh</surname> <given-names>S.</given-names></name> <name><surname>Gupta</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>Revisiting unreasonable effectiveness of data in deep learning era</article-title>, in <source>IEEE International Conference on Computer Vision, ICCV 2017</source> (<publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>), <fpage>843</fpage>&#x02013;<lpage>852</lpage>.</citation></ref>
<ref id="B92">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Q.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Why can&#x00027;t we accurately predict others&#x00027; decisions? Prediction discrepancy in risky decision-making</article-title>. <source>Front. Psychol.</source> <volume>9</volume>:<fpage>2190</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2018.02190</pub-id><pub-id pub-id-type="pmid">30483196</pub-id></citation></ref>
<ref id="B93">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C.</given-names></name> <name><surname>Zaremba</surname> <given-names>W.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Bruna</surname> <given-names>J.</given-names></name> <name><surname>Erhan</surname> <given-names>D.</given-names></name> <name><surname>Goodfellow</surname> <given-names>I. J.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Intriguing properties of neural networks</article-title>, in <source>2nd International Conference on Learning Representations, ICLR 2014, Conference Track Proceedings</source>, eds <person-group person-group-type="editor"><name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name></person-group> (<publisher-loc>Banff, AB</publisher-loc>).</citation></ref>
<ref id="B94">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Tanay</surname> <given-names>T.</given-names></name> <name><surname>Griffin</surname> <given-names>L. D.</given-names></name></person-group> (<year>2016</year>). <article-title>A boundary tilting persepective on the phenomenon of adversarial examples</article-title>. <italic>arXiv</italic> abs/1608.07690.</citation></ref>
<ref id="B95">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Tram&#x000E8;r</surname> <given-names>F.</given-names></name> <name><surname>Kurakin</surname> <given-names>A.</given-names></name> <name><surname>Papernot</surname> <given-names>N.</given-names></name> <name><surname>Goodfellow</surname> <given-names>I.</given-names></name> <name><surname>Boneh</surname> <given-names>D.</given-names></name> <name><surname>McDaniel</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Ensemble adversarial training: attacks and defenses</article-title>, in <source>Proceedings of the 6th International Conference on Learning Representations</source> (<publisher-loc>Vancouver, BC</publisher-loc>). Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1705.07204">https://arxiv.org/abs/1705.07204</ext-link></citation></ref>
<ref id="B96">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tran</surname> <given-names>B.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Madry</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Spectral signatures in backdoor attacks</article-title>, in <source>Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018</source>, eds <person-group person-group-type="editor"><name><surname>Bengio</surname> <given-names>S.</given-names></name> <name><surname>Wallach</surname> <given-names>H. M.</given-names></name> <name><surname>Larochelle</surname> <given-names>H.</given-names></name> <name><surname>Grauman</surname> <given-names>K.</given-names></name> <name><surname>Cesa-Bianchi</surname> <given-names>N.</given-names></name> <name><surname>Garnett</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Montr&#x000E9;al, QC</publisher-loc>), <fpage>8011</fpage>&#x02013;<lpage>8021</lpage>.</citation></ref>
<ref id="B97">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Turner</surname> <given-names>A.</given-names></name> <name><surname>Tsipras</surname> <given-names>D.</given-names></name> <name><surname>Madry</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Label-consistent backdoor attacks</article-title>. <italic>arXiv</italic> abs/1912.02771.</citation></ref>
<ref id="B98">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Veit</surname> <given-names>A.</given-names></name> <name><surname>Alldrin</surname> <given-names>N.</given-names></name> <name><surname>Chechik</surname> <given-names>G.</given-names></name> <name><surname>Krasin</surname> <given-names>I.</given-names></name> <name><surname>Gupta</surname> <given-names>A.</given-names></name> <name><surname>Belongie</surname> <given-names>S. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Learning from noisy large-scale datasets with minimal supervision</article-title>, in <source>2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017</source> (<publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>), <fpage>6575</fpage>&#x02013;<lpage>6583</lpage>.</citation></ref>
<ref id="B99">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>B.</given-names></name> <name><surname>Yao</surname> <given-names>Y.</given-names></name> <name><surname>Shan</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Viswanath</surname> <given-names>B.</given-names></name> <name><surname>Zheng</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Neural cleanse: identifying and mitigating backdoor attacks in neural networks</article-title>, in <source>Proceedings of the IEEE Symposium on Security and Privacy (SP)</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>707</fpage>&#x02013;<lpage>723</lpage>.</citation></ref>
<ref id="B100">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Huang</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Qian</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>The devil of face recognition is in the noise</article-title>, in <source>Computer Vision&#x02013;ECCV 2018</source>, eds <person-group person-group-type="editor"><name><surname>Ferrari</surname> <given-names>V.</given-names></name> <name><surname>Hebert</surname> <given-names>M.</given-names></name> <name><surname>Sminchisescu</surname> <given-names>C.</given-names></name> <name><surname>Weiss</surname> <given-names>Y.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>780</fpage>&#x02013;<lpage>795</lpage>.</citation></ref>
<ref id="B101">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname> <given-names>E.</given-names></name> <name><surname>Scholl</surname> <given-names>B.</given-names></name></person-group> (<year>2015</year>). <article-title>Stochastic or systematic? Seemingly random perceptual switching in bistable events triggered by transient unconscious cues</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform.</source> <volume>41</volume>, <fpage>929</fpage>&#x02013;<lpage>939</lpage>. <pub-id pub-id-type="doi">10.1037/a0038709</pub-id><pub-id pub-id-type="pmid">25915074</pub-id></citation></ref>
<ref id="B102">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Werbos</surname> <given-names>P.</given-names></name></person-group> (<year>1982</year>). <article-title>Applications of advances in nonlinear sensitivity analysis</article-title>, in <source>System Modeling and Optimization. Lecture Notes in Control and Information Sciences</source>, Vol. <volume>38</volume>, eds <person-group person-group-type="editor"><name><surname>Drenick</surname> <given-names>R. F.</given-names></name> <name><surname>Kozin</surname> <given-names>F.</given-names></name></person-group> (<publisher-loc>Berlin; Heidelberg; New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>762</fpage>&#x02013;<lpage>770</lpage>.</citation></ref>
<ref id="B103">
<citation citation-type="book"><person-group person-group-type="author"><collab>Wikipedia Contributors</collab></person-group> (<year>2020</year>). <source>Tesla Autopilot&#x02014;Wikipedia, The Free Encyclopedia</source>. <publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Wikimedia Foundation, Inc</publisher-name>.</citation></ref>
<ref id="B104">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wong</surname> <given-names>E.</given-names></name> <name><surname>Kolter</surname> <given-names>J. Z.</given-names></name></person-group> (<year>2018</year>). <article-title>Provable defenses against adversarial examples via the convex outer adversarial polytope</article-title>, in <source>Proceedings of the 35th International Conference on Machine Learning, PMLR</source> (<publisher-loc>Stockholm</publisher-loc>), <fpage>5286</fpage>&#x02013;<lpage>5295</lpage>.</citation></ref>
<ref id="B105">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wong</surname> <given-names>E.</given-names></name> <name><surname>Schmidt</surname> <given-names>F. R.</given-names></name> <name><surname>Metzen</surname> <given-names>J. H.</given-names></name> <name><surname>Kolter</surname> <given-names>J. Z.</given-names></name></person-group> (<year>2018</year>). <article-title>Scaling provable adversarial defenses</article-title>, in <source>NIPS&#x00027;18: Proceedings of the 32nd International Conference on Neural Information Processing Systems</source> (<publisher-loc>Montr&#x000E9;al, QC</publisher-loc>), <fpage>8410</fpage>&#x02013;<lpage>8419</lpage>.</citation></ref>
<ref id="B106">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wood</surname> <given-names>G.</given-names></name> <name><surname>Vine</surname> <given-names>S.</given-names></name> <name><surname>Wilson</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>The impact of visual illusions on perception, action planning, and motor performance</article-title>. <source>Atten. Percept. Psychophys.</source> <volume>75</volume>, <fpage>830</fpage>&#x02013;<lpage>834</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-013-0489-y</pub-id><pub-id pub-id-type="pmid">23757046</pub-id></citation></ref>
<ref id="B107">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiao</surname> <given-names>H.</given-names></name> <name><surname>Biggio</surname> <given-names>B.</given-names></name> <name><surname>Nelson</surname> <given-names>B.</given-names></name> <name><surname>Xiao</surname> <given-names>H.</given-names></name> <name><surname>Eckert</surname> <given-names>C.</given-names></name> <name><surname>Roli</surname> <given-names>F.</given-names></name></person-group> (<year>2014</year>). <article-title>Support vector machines under adversarial label contamination</article-title>. <source>J. Neurocomput. Spec. Issue Adv. Learn. Label Noise</source> <volume>160</volume>, <fpage>53</fpage>&#x02013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2014.08.081</pub-id></citation></ref>
<ref id="B108">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>H.</given-names></name> <name><surname>Ma</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>H. C.</given-names></name> <name><surname>Deb</surname> <given-names>D.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <name><surname>Tang</surname> <given-names>J. L.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Adversarial attacks and defenses in images, graphs and text: a review</article-title>. <source>Int. J. Autom. Comput.</source> <volume>17</volume>, <fpage>151</fpage>&#x02013;<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1007/s11633-019-1211-x</pub-id></citation></ref>
<ref id="B109">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yakura</surname> <given-names>H.</given-names></name> <name><surname>Akimoto</surname> <given-names>Y.</given-names></name> <name><surname>Sakuma</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Generate (non-software) bugs to fool classifiers</article-title>, in <source>Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence 2020 (AAAI-20)</source> (<publisher-loc>New York, NY</publisher-loc>).</citation></ref>
<ref id="B110">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Yousefzadeh</surname> <given-names>R.</given-names></name> <name><surname>O&#x00027;Leary</surname> <given-names>D. P.</given-names></name></person-group> (<year>2019</year>). <article-title>Investigating decision boundaries of trained neural networks</article-title>. <italic>arXiv</italic> abs/1908.02802.</citation></ref>
<ref id="B111">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zahavy</surname> <given-names>T.</given-names></name> <name><surname>Kang</surname> <given-names>B.</given-names></name> <name><surname>Sivak</surname> <given-names>A.</given-names></name> <name><surname>Feng</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>H.</given-names></name> <name><surname>Mannor</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Ensemble robustness and generalization of stochastic deep learning algorithms</article-title>, in <source>International Conference on Learning Representations Workshop (ICLRW&#x00027;18)</source> (<publisher-loc>Vancouver, BC</publisher-loc>).</citation></ref>
<ref id="B112">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>N.</given-names></name> <name><surname>Ji</surname> <given-names>S.</given-names></name> <name><surname>Shen</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Interpretable deep learning under fire</article-title>. <italic>arXiv</italic> abs/1812.00891.</citation></ref>
<ref id="B113">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Vondrick</surname> <given-names>C.</given-names></name> <name><surname>Fowlkes</surname> <given-names>C. C.</given-names></name> <name><surname>Ramanan</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>Do we need more training data?</article-title> <source>Int. J. Comput. Vis.</source> <volume>119</volume>, <fpage>76</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-015-0812-2</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>AI is here defined as the capability of a machine either to take decisions autonomously or to support humans in making decisions. To distinguish AI from trivial functions, such as a sensor that directly triggers an action via a threshold function, one might narrow the definition to non-trivial functions; since this term is not clearly defined, however, we refrain from doing so.</p></fn>
<fn id="fn0002"><p><sup>2</sup>Rather than narrowing the term performance to cover only accuracy, we use it in a broader sense; cf. 2.1 for details.</p></fn>
<fn id="fn0003"><p><sup>3</sup>We note that the concepts covered by the terms availability and integrity differ to some extent from the ones these terms usually denote. Indeed, prevalent attacks on availability are the result of a large-scale violation of the integrity of the system&#x00027;s output data. Nevertheless, this usage has been widely adopted in the research area.</p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> The article was written as part of the authors&#x00027; employment at Federal Office for Information Security, Bonn, Germany. The authors did not receive any other funding.</p>
</fn>
</fn-group>
</back>
</article>