<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2023.1124553</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Decision trees: from efficient prediction to responsible AI</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Blockeel</surname> <given-names>Hendrik</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2109845/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Devos</surname> <given-names>Laurens</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2380039/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Fr&#x000E9;nay</surname> <given-names>Beno&#x000EE;t</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2308482/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Nanfack</surname> <given-names>G&#x000E9;raldin</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2208136/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Nijssen</surname> <given-names>Siegfried</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2380077/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Computer Science, KU Leuven</institution>, <addr-line>Leuven</addr-line>, <country>Belgium</country></aff>
<aff id="aff2"><sup>2</sup><institution>Institute for Artificial Intelligence (Leuven.AI), KU Leuven</institution>, <addr-line>Leuven</addr-line>, <country>Belgium</country></aff>
<aff id="aff3"><sup>3</sup><institution>Faculty of Computer Science, Universit&#x000E9; de Namur</institution>, <addr-line>Namur</addr-line>, <country>Belgium</country></aff>
<aff id="aff4"><sup>4</sup><institution>ICTEAM, UCLouvain</institution>, <addr-line>Ottignies-Louvain-la-Neuve</addr-line>, <country>Belgium</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Enrique Herrera-Viedma, University of Granada, Spain</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Bernhard C. Geiger, Know Center, Austria; Johannes F&#x000FC;rnkranz, Johannes Kepler University of Linz, Austria</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Hendrik Blockeel <email>hendrik.blockeel&#x00040;kuleuven.be</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>07</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>6</volume>
<elocation-id>1124553</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>07</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Blockeel, Devos, Fr&#x000E9;nay, Nanfack and Nijssen.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Blockeel, Devos, Fr&#x000E9;nay, Nanfack and Nijssen</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>This article provides a bird's-eye view on the role of decision trees in machine learning and data science over roughly four decades. It sketches the evolution of decision tree research over the years, describes the broader context in which the research is situated, and summarizes strengths and weaknesses of decision trees in this context. The main goal of the article is to clarify the broad relevance, both practical and theoretical, that decision trees still have today for machine learning and artificial intelligence.</p></abstract>
<kwd-group>
<kwd>decision trees</kwd>
<kwd>ensembles</kwd>
<kwd>responsible AI</kwd>
<kwd>machine learning</kwd>
<kwd>learning under constraints</kwd>
<kwd>explainable AI</kwd>
<kwd>combinatorial optimization</kwd>
</kwd-group>
<contract-num rid="cn001">30992574</contract-num>
<contract-sponsor id="cn001">Fonds Wetenschappelijk Onderzoek<named-content content-type="fundref-id">10.13039/501100003130</named-content></contract-sponsor>
<contract-sponsor id="cn002">Fonds De La Recherche Scientifique - FNRS<named-content content-type="fundref-id">10.13039/501100002661</named-content></contract-sponsor>
<counts>
<fig-count count="1"/>
<table-count count="2"/>
<equation-count count="2"/>
<ref-count count="174"/>
<page-count count="17"/>
<word-count count="16621"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Machine Learning and Artificial Intelligence</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Decision trees, and ensembles of them (forests), are among the best studied and most widely used tools for machine learning and data science. In their basic form, they are covered in just about every introductory course in these fields, and the extensive literature on them is covered by many surveys. However, decision trees have plenty of application potential beyond their commonly known uses. The existing surveys typically do not zoom in on this, or, when they do, they focus on one particular type of use.</p>
<p>The purpose of this review is to complement the literature by taking a step back and providing a higher-level overview of decision tree technology and applicability, covering the broad variation in it. The review focuses on answering questions such as: what types of decision trees exist (beyond the well-known classification and regression trees), what can they be used for, what roles can decision trees play in an age that is dominated by deep learning? It describes the landscape and evolution of decision tree research in a way that is roughly chronological, starting with the earlier research, which focused mostly on decision trees as predictive models, and gradually moving toward more recent work on learning or exploiting decision trees in the context of what is currently often referred to as responsible AI: the study of AI systems that are fair, transparent, and safe.</p>
<p>The review does not aim to survey the research field in the traditional sense. For some topics, it points the reader to existing surveys; for others, concrete publications are referred to as illustrative examples of the topics being discussed. Nevertheless, in a few cases, where we believe a subject is insufficiently covered by existing surveys, a more detailed overview of the work is given.</p>
<p>The review starts (Section 2) with an overview of the basics of decision trees: classification and regression trees as they were originally envisioned, methods for learning them, variants and ensembles. What connects all this work is that decision trees are seen as predictive (and sometimes explanatory) models. However, decision trees can be used beyond the classification and regression context: they can be used for multi-label learning, multi-instance learning, (predictive) clustering, probability and density estimation, and other purposes, both standalone and integrated in other methods. Section 3 covers such uses.</p>
<p>Section 4 briefly discusses different algorithmic approaches to learning decision trees: besides the standard heuristic approaches, incremental and distributed variants have been proposed, as well as approaches based on non-greedy optimization and continuous parameter optimization. This section includes an extensive discussion of exhaustive methods that search for optimal decision trees (given some optimization criterion)&#x02014;an NP-hard problem that, due to recent advances in solver technology, has received much interest. This discussion focuses on providing insight into the different approaches and how they relate to each other.</p>
<p>Section 5 discusses how background knowledge in the form of formal constraints can be incorporated in decision trees, either by imposing the constraints on the model at learning time, or by verifying given models. Learning models under constraints is currently receiving increasing interest, partly because constraints are useful to enforce other properties than accuracy and interpretability, such as robustness and fairness, and partly because the technology now allows it: the current state of the art in greedy and exhaustive search methods facilitates the creation of methods that take constraints into account.</p>
<p>In Section 6, we provide an overview of how decision tree based methods play a role in the current research on Responsible AI, with a specific focus on robustness, fairness, and explainability. This section covers mostly recent work.</p>
<p>Section 7 offers a brief look forward, mentioning challenges and perspectives, and Section 8 concludes.</p>
</sec>
<sec id="s2">
<title>2. Decision trees and forests: the basics</title>
<p>This section discusses the basics of decision trees. It focuses mostly on the area as it was seen by the end of the 20th century, and is meant to set the background for the later sections. We introduce the concept of decision trees, the greedy learning methods that are most commonly used for learning them, variants of trees and algorithms, and methods for learning ensembles of trees. The section ends with an overview of strengths and weaknesses of decision trees and forests.</p>
<sec>
<title>2.1. Decision trees</title>
<p>A decision tree represents a procedure for computing the outcome of a function <italic>f</italic>(<italic>x</italic>). The procedure consists of repeatedly performing tests on the input <italic>x</italic>, where the outcome of each test determines the next test, until <italic>f</italic>(<italic>x</italic>) is known with certainty. <xref ref-type="fig" rid="F1">Figure 1</xref> shows a function in tabular format and two different decision trees that represent it.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The Boolean function <italic>Y</italic> &#x0003D; <italic>X</italic><sub>1</sub> &#x02227; <italic>X</italic><sub>2</sub> &#x02228; <italic>X</italic><sub>3</sub>, and two decision trees representing it.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1124553-g0001.tif"/>
</fig>
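<p>Read as computation procedures, two such trees for the Boolean function above can be sketched as nested tests (an illustrative sketch; the exact shape of the trees in the figure may differ):</p>

```python
# Two decision-tree "programs" for the Boolean function Y = (X1 AND X2) OR X3:
# the same function, computed with different test orders.

def tree_a(x1, x2, x3):
    # Tests X1 first; the outcome of each test determines the next test.
    if x1:
        if x2:
            return True
        return x3
    return x3

def tree_b(x1, x2, x3):
    # Tests X3 first: if X3 holds, the outcome is known immediately.
    if x3:
        return True
    return x1 and x2
```

<p>Both procedures return the same value on all eight inputs, illustrating that a single function can be represented by several different trees.</p>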
<p>Decision tree learning refers to the task of constructing, from a set of (<italic>x, f</italic>(<italic>x</italic>)) pairs, a decision tree that represents <italic>f</italic> or a close approximation of it. When the domain of <italic>x</italic> is finite, the set of pairs can in principle be exhaustive, but more often, the set is a sample from a (possibly infinite) domain <inline-formula><mml:math id="M1"><mml:mrow><mml:mi mathvariant="script">X</mml:mi></mml:mrow></mml:math></inline-formula>. In that case, rather than finding a tree that approximates <italic>f</italic> on the data set, one may try to find a tree that approximates <italic>f</italic> over the whole domain.</p>
<p>In a slightly generalized setting, the set of pairs may be of the form (<italic>x, y</italic>) where <italic>y</italic> is determined only probabilistically by <italic>x</italic>; for instance, <italic>y</italic> may depend also on unobserved variables, <italic>y</italic> &#x0003D; <italic>f</italic>&#x02032;(<italic>x, u</italic>). The task is then to learn a tree that represents a function <italic>f</italic>(<italic>x</italic>) that closely approximates <italic>f</italic>&#x02032;(<italic>x, u</italic>) for any choice of <italic>u</italic> (on average, in the worst case, or using some other aggregation criterion).</p>
<p>Apart from finding a good approximation, additional criteria may exist. For instance, the task may be to find the simplest decision tree that represents the function. It is known that finding the smallest decision tree (in terms of number of nodes) that perfectly fits a given dataset is NP-hard (Hyafil and Rivest, <xref ref-type="bibr" rid="B77">1976</xref>).</p>
<p>The output of a tree for a given <italic>x</italic> is often called its prediction for <italic>x</italic>. Decision trees that predict nominal or numerical variables are respectively called <bold>classification trees</bold> and <bold>regression trees</bold>. An important property of decision trees, in the context of machine learning, is that the prediction is the result of a simple and easy-to-interpret computation (a relatively short series of tests). Because of this, trees are said to be <italic>interpretable</italic>.</p>
</sec>
<sec>
<title>2.2. Recursive partitioning</title>
<p>Decision trees became prominent in machine learning and data analysis around the 1980s, when popular decision tree learners were developed more or less in parallel in the computer science community (e.g., ID3 Quinlan, <xref ref-type="bibr" rid="B124">1986</xref> and its many subsequent improvements) and in the statistics community (CART Breiman et al., <xref ref-type="bibr" rid="B25">1984</xref>). While differing in details, these learners all make use of the same basic procedure, namely <bold>recursive partitioning</bold> (also known as &#x0201C;top-down induction of decision trees&#x0201D;).</p>
<p>Recursive partitioning works as follows. Given a dataset <italic>D</italic> containing pairs (<italic>x, y</italic>), a test that can be performed on individual instances <italic>x</italic> is chosen, the dataset is partitioned according to the outcome of this test, and this procedure is repeated for each subset thus created. This continues until no further partitioning is needed or possible. This procedure relies on two important heuristic criteria: how to choose the test, and when to stop.</p>
<p>The chosen test is typically selected from a set of candidate tests, and is the test deemed &#x0201C;most informative&#x0201D; with respect to the value of <italic>y</italic>. A test is maximally informative when, given the outcome of the test, the <italic>y</italic> value is known exactly. For numerical <italic>y</italic>, the average variance of <italic>y</italic> within each subset of the partition is often used as an indicator of the informativeness of the test. For nominal <italic>y</italic>, measures based on information theory are sometimes used, such as information gain (the difference between the entropy of <italic>y</italic> and its average conditional entropy given the test outcome).</p>
<p>Using the &#x0201C;most informative&#x0201D; test is motivated by a preference for finding short decision trees (which require few tests to come to a decision), but there is no guarantee that any of the above measures indeed lead to the shortest possible tree. From that point of view, the procedure is heuristic. Early research on decision trees has explored many different variants of the heuristics, without unearthing a universally preferable one: see, e.g., Murthy (<xref ref-type="bibr" rid="B110">1998</xref>, Section 3.1.1).</p>
<p>It is clear that a subset need not be partitioned further when only one value for <italic>y</italic> remains, but it may be better to stop splitting even earlier, as for small sets, further splitting may cause overfitting.<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> Therefore, learning algorithms usually stop splitting when a set is too small to carry any statistical significance. Alternatively, an algorithm may keep splitting but prune the tree afterwards. Again, many variants of stopping criteria and pruning procedures have been explored, without any one consistently outperforming the rest, though on individual datasets there may be substantial differences in performance (Murthy, <xref ref-type="bibr" rid="B110">1998</xref>).</p>
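<p>The full procedure (choose the most informative test, split, recurse, and stop when a subset is pure or too small) can be sketched as follows. This is a minimal illustration of recursive partitioning with the information gain heuristic; it corresponds to none of the systems cited above, and all function and parameter names are ours:</p>

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Pick the (feature, threshold) test with maximal information gain."""
    base, n = entropy(labels), len(labels)
    best_gain, best_test = 0.0, None
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = (base - (len(left) / n) * entropy(left)
                         - (len(right) / n) * entropy(right))
            if gain > best_gain:
                best_gain, best_test = gain, (f, t)
    return best_test

def grow(rows, labels, min_size=2):
    """Recursive partitioning: split until pure or too small, then predict the mode."""
    split = None
    if len(set(labels)) > 1 and len(labels) >= min_size:
        split = best_split(rows, labels)
    if split is None:                                # stop: leaf with majority class
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    lpart = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    rpart = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow([r for r, _ in lpart], [y for _, y in lpart], min_size),
            grow([r for r, _ in rpart], [y for _, y in rpart], min_size))

def predict(tree, row):
    """Follow tests from the root until a leaf (a plain class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

<p>The <monospace>min_size</monospace> parameter implements the simple stopping criterion discussed above; pruning after growth would be an alternative.</p>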
<p>For a more extensive discussion of the many variants of decision tree learners that had been proposed by the end of the 20th century, we refer to the comprehensive survey by Murthy (<xref ref-type="bibr" rid="B110">1998</xref>). A recent survey by Costa and Pedreira (<xref ref-type="bibr" rid="B37">2023</xref>) focuses on progress after 2010. Observing that choosing the right variant can sometimes make a difference, Barros et al. (<xref ref-type="bibr" rid="B10">2015</xref>) propose a method for constructing tailor-made recursive partitioning algorithms that chooses components optimally for a given dataset.</p>
<p>Recursive partitioning is probably the best known and most often used method for learning decision trees, but it is not the only one: Section 4 of this text discusses alternatives. Especially in recent years, these alternative methods are gaining interest.</p>
</sec>
<sec>
<title>2.3. Variants of classification and regression trees</title>
<p>Decision trees are often used with tabular data, where each instance is described using the same set of input variables. Tests are often univariate (based on a single variable) and, in the case of numerical inputs, based on a dichotomy (value above or below some threshold). However, it is perfectly possible for learners to consider multivariate tests. A well-known example is the <bold>oblique decision tree</bold>, which uses a threshold on a linear combination of input variables; this results in straight-line boundaries between the subsets that are not necessarily axis-parallel (Murthy et al., <xref ref-type="bibr" rid="B111">1994</xref>).</p>
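<p>The difference between a univariate test and an oblique test is small in code but large geometrically (an illustrative sketch; in practice the weights and thresholds would be learned, not fixed):</p>

```python
def univariate_test(x, feature, threshold):
    # Axis-parallel test: thresholds a single input variable.
    return x[feature] > threshold

def oblique_test(x, weights, threshold):
    # Oblique test: thresholds a linear combination of the input variables,
    # giving a straight-line boundary that need not be axis-parallel.
    return sum(w * v for w, v in zip(weights, x)) > threshold
```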
<p>In the above, we assumed that decision trees make the same prediction for all instances in a given leaf. However, variants exist that store in a leaf a function, rather than a single value, for the prediction. <bold>Model trees</bold>, for instance, store a linear model in a leaf, rather than a constant, so that the model represented by the tree is piecewise linear, rather than piecewise constant. M5 (Quinlan, <xref ref-type="bibr" rid="B125">1992</xref>) is a well-known example of such a system.</p>
<p>Trees can also be used with non-tabular data, such as graphs, relational databases, or knowledge bases. The only requirement is that tests can be defined on individual instances. This has led to the development of <bold>relational</bold> (a.k.a. <italic>structural</italic> or <italic>first-order-logic</italic>) decision trees (Kramer, <xref ref-type="bibr" rid="B91">1996</xref>; Blockeel and De Raedt, <xref ref-type="bibr" rid="B19">1998</xref>).</p>
<p>All the above variants of decision tree learning still fit under the header of classification and regression trees. Section 3 will focus on decision trees that serve other purposes, such as clustering or density estimation.</p>
</sec>
<sec>
<title>2.4. Ensembles of trees: decision forests</title>
<p><bold>Ensemble methods</bold> reduce the effect of random artifacts in the training set or learning procedure by repeating the learning process multiple times, and creating a meta-model that makes predictions by aggregating the predictions of the individual learned models. To construct multiple trees from a single data set, one can use bootstrap aggregating, a.k.a. <bold>bagging</bold>: each individual tree is trained on a random sample of |<italic>T</italic>| instances drawn with replacement from the training set <italic>T</italic>, and the prediction of the ensemble is the average (for regression) or mode (for classification) of the individual predictions. Ensembles of decision trees constructed in this way significantly outperform single decision trees in terms of accuracy (Breiman, <xref ref-type="bibr" rid="B23">1996</xref>). The <bold>Random Forests</bold> method (Breiman, <xref ref-type="bibr" rid="B24">2001</xref>) is a variant of this in which the test put in each node is the best from a randomly chosen subset (rather than all) of the possible tests. The additional randomness thus introduced typically increases the performance of the ensemble as a whole. More recently, <bold>gradient boosting</bold> (Friedman, <xref ref-type="bibr" rid="B59">2001</xref>) has become increasingly popular: here, additional trees are added to the ensemble in a way that mimics gradient descent in the prediction space. At the time of writing, an implementation of gradient boosting called <bold>XGBoost</bold> (Chen and Guestrin, <xref ref-type="bibr" rid="B33">2016</xref>) is widely considered<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> to be the method of choice when learning from tabular data: it is fast, easy to use, and very often outperforms other methods in terms of predictive accuracy. Grinsztajn et al. (<xref ref-type="bibr" rid="B68">2022</xref>) confirm this by means of thorough experimental verification.</p>
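<p>Bagging as described above can be sketched in a few lines. For brevity, the base learner here is a one-level &#x0201C;decision stump&#x0201D; rather than a full tree; this is an illustrative simplification, and all names are ours:</p>

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """Depth-1 tree: the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            lpred = Counter(left).most_common(1)[0][0]
            rpred = Counter(right).most_common(1)[0][0]
            err = sum(y != lpred for y in left) + sum(y != rpred for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, lpred, rpred)
    if best is None:                      # degenerate sample: constant prediction
        maj = Counter(labels).most_common(1)[0][0]
        return lambda x: maj
    _, f, t, lpred, rpred = best
    return lambda x: rpred if x[f] > t else lpred

def bagging(rows, labels, n_trees=25, seed=0):
    """Bootstrap aggregating: train each model on a sample drawn with replacement
    from the training set; predict with the mode of the individual predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        models.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
```

<p>Replacing the stump with a full tree learner, and restricting each split to a random subset of the features, would turn this sketch into a Random Forests-style learner.</p>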
<p>The above are probably the best known types of ensemble methods, but many others exist. <bold>Stacking</bold> (e.g., &#x0017D;enko et al., <xref ref-type="bibr" rid="B168">2001</xref>) is a variant that learns to combine the votes of individual learners, rather than using a fixed voting mechanism to combine them; to this aim, a separate learner is stacked on top of the others. <bold>Alternating decision trees</bold> (ADT) (Freund and Mason, <xref ref-type="bibr" rid="B58">1999</xref>) are trees that, besides the standard test nodes, contain &#x0201C;prediction nodes&#x0201D;, which store numerical values and can have multiple test nodes as children. An instance is sorted simultaneously to all children of a prediction node, and the prediction is the sum of the values in the prediction nodes on all paths it follows. Thus, an ADT essentially combines the predictions of a set of decision trees, and can be seen as a compact representation of an ensemble.</p>
<p>Noteworthy overviews on tree ensembles include Zhou (<xref ref-type="bibr" rid="B174">2012</xref>) (an insightful and at the time of writing quite comprehensive view of the field); Criminisi and Shotton (<xref ref-type="bibr" rid="B40">2013</xref>) (which discusses a broad variety of uses of decision forests in the context of image processing); and recent surveys by Sagi and Rokach (<xref ref-type="bibr" rid="B133">2018</xref>) and Dong et al. (<xref ref-type="bibr" rid="B49">2020</xref>), which provide an excellent overview of ensemble methods (with a non-exclusive but substantial focus on decision tree ensembles).</p>
</sec>
<sec>
<title>2.5. Predictive learning with decision trees: pros and cons</title>
<p>Decision trees are very popular tools for predictive modeling for the following reasons. They are very easy to use: they typically require little or no tuning.<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> They can be learned very fast: on the assumption that relatively balanced trees are learned (which most heuristics try to ensure), learning typically scales as <italic>O</italic>(<italic>mn</italic>log<italic>n</italic>) with <italic>n</italic> the number of rows and <italic>m</italic> the number of columns in the data table (i.e., only a factor <italic>O</italic>(log<italic>n</italic>) worse than scanning the table once). Under the same assumption, prediction requires <italic>O</italic>(log<italic>n</italic>) tests, which typically means a few dozen CPU instructions, per instance. Ensembles typically multiply this by a single order of magnitude, or less: e.g., Random Forests, by selecting a subset of features for each test, substantially reduces the <italic>m</italic> factor, which in high-dimensional domains may compensate for the extra work of learning more trees. All this makes decision trees and their ensembles extremely fast and energy-efficient, which is a major advantage when deploying models on small battery-powered devices.</p>
<p>Early research mostly focused on making decision trees as accurate as possible. Individual trees could never beat more complex models such as neural networks in this respect, but it is now generally acknowledged that forests can. Ensembles do give up some interpretability for this. Still, it is important to distinguish different forms of interpretability: (1) understanding the full model; (2) understanding aspects of the full model, such as which variables are important; (3) understanding a single prediction; (4) understanding the reasoning process involved in a prediction. Random Forests, for instance, are highly interpretable in senses (2) and (4).</p>
<p>Decision forests typically perform less well when learning from raw data (such as images, sound, or text), where the features relevant for prediction have to be constructed and cannot be expressed as logical combinations of relatively few input features. This type of problem is what deep learning excels at.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Beyond classification and regression</title>
<p>Decision tree learning was originally proposed in the standard predictive learning setting, where an output variable (nominal or numerical) needs to be predicted from input variables. The approach is sufficiently flexible, however, to be generalized to many other settings.</p>
<p>A first type of generalization is <bold>multi-target prediction</bold>, where a single tree predicts multiple output variables at the same time (possibly a mix of numerical and nominal variables). This setting includes prediction of set-valued variables, as in <bold>multi-label prediction</bold>, since sets are easily represented as binary vectors. This generalization can be achieved by simply maintaining the variance reduction heuristic from regression trees, now using variance in a higher-dimensional space. Single multi-target trees have in some contexts exhibited better performance than sets of single-target trees (Vens et al., <xref ref-type="bibr" rid="B150">2008</xref>). In a further generalization, Kocev et al. (<xref ref-type="bibr" rid="B88">2013</xref>) studied decision tree ensembles for <bold>structured output prediction</bold>.</p>
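<p>The generalization of the variance reduction heuristic mentioned above is straightforward: variance is simply measured in the higher-dimensional output space (a minimal sketch, ours):</p>

```python
def multitarget_variance(ys):
    """Mean squared Euclidean distance of output vectors to their centroid.

    For a single numeric target this reduces to the ordinary (population)
    variance used in regression trees; for several targets, the per-target
    variances add up, so the same variance-reduction heuristic applies unchanged.
    """
    n, d = len(ys), len(ys[0])
    centroid = [sum(y[j] for y in ys) / n for j in range(d)]
    return sum(sum((y[j] - centroid[j]) ** 2 for j in range(d)) for y in ys) / n
```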
<p>A natural extension of multi-target trees is the <bold>clustering tree</bold>, which uses variance in the input space as a heuristic, rather than variance in the output space, to learn a hierarchical clustering in which each cluster is strictly characterized by a conjunction of selected attribute values. This additionally makes it possible to naturally interpolate between the predictive and clustering settings to obtain so-called <bold>predictive clustering trees</bold>, which form coherent clusters within which accurate prediction of the output variables is possible (Blockeel et al., <xref ref-type="bibr" rid="B21">1998</xref>).</p>
<p><bold>Survival trees</bold> are regression trees used in the context of survival analysis. The specific challenge in learning such trees is that the survival data used for training is often censored: only lower bounds are known for the labels (i.e., we do not know the exact time of death, only that up till a certain moment in time a person was still alive). Bou-Hamad et al. (<xref ref-type="bibr" rid="B22">2011</xref>) and Zhou and McArdle (<xref ref-type="bibr" rid="B173">2015</xref>) survey algorithms for learning survival trees and forests. These algorithms typically fit a hazard function to instances in a leaf of the tree, and use heuristics that either maximize heterogeneity among subsets after the split, or maximize homogeneity within them (for some homogeneity criterion that suits the context of survival analysis).</p>
<p>Decision trees have also been used for <bold>ranking</bold> and <bold>preference learning</bold>. As noted by F&#x000FC;rnkranz and H&#x000FC;llermeier (<xref ref-type="bibr" rid="B62">2010</xref>), ranking is an umbrella term for a variety of tasks. For instance, given <italic>n</italic> instances and <italic>m</italic> classes, one may predict a preference ordering of classes for each instance (<italic>label ranking</italic>), or of instances for a specific class (<italic>instance ranking</italic>; one typically ranks the instances according to likelihood of belonging to the positive class, as in finding the most relevant webpages for a query). Training examples may be labeled with a single class (<italic>ordinal classification</italic>), a complete label ranking, or a partial ranking (e.g., which of two labels or instances is preferred over the other). Examples of decision tree based approaches for label ranking are Todorovski et al. (<xref ref-type="bibr" rid="B141">2002</xref>) and Yu et al. (<xref ref-type="bibr" rid="B166">2011</xref>); both learn from complete label rankings. Instance ranking with decision trees, learning from binary labels, was studied (among others) by Provost and Domingos (<xref ref-type="bibr" rid="B123">2003</xref>), who use probability estimation trees for the task, and Cl&#x000E9;men&#x000E7;on et al. (<xref ref-type="bibr" rid="B35">2011</xref>) who study ranking with decision trees as a task in itself.</p>
<p>The <bold>multi-instance</bold> learning setting is a binary classification setting where labels are available at the level of groups of instances rather than individual instances: a group is positive if it contains at least one positive instance, and negative otherwise. Small changes to the recursive partitioning algorithm suffice to make decision tree learning successful in multi-instance learning (Blockeel et al., <xref ref-type="bibr" rid="B20">2005</xref>).</p>
<p>Decision trees have also been adapted for <bold>semi-supervised</bold> learning (Levati&#x00107; et al., <xref ref-type="bibr" rid="B93">2017</xref>) and learning from positive and unlabeled data (<bold>PU-learning</bold>) (Liang et al., <xref ref-type="bibr" rid="B96">2012</xref>). In PU-learning, they have also been used for estimating the labeling rate of positive cases (Bekker and Davis, <xref ref-type="bibr" rid="B13">2018</xref>), serving as an auxiliary method for any kind of PU-learner.</p>
<p>In the context of anomaly detection, a tree-based approach called <bold>Isolation Forests</bold> (Liu et al., <xref ref-type="bibr" rid="B99">2008</xref>) is considered state-of-the-art for a wide range of applications. The rationale behind Isolation Forests is that with random splitting, anomalies tend to get isolated into a singleton leaf early on, so the depth of a singleton leaf is an indication of how anomalous the instance in that leaf is.</p>
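<p>This rationale can be illustrated with a one-dimensional toy version (illustrative only: the actual algorithm of Liu et al. handles multivariate data, works on subsamples, and normalizes depths by their expected value; all names here are ours):</p>

```python
import random

def isolation_depth(values, x, rng, max_depth=50):
    """Number of uniform random splits needed to isolate x (1-D toy version)."""
    depth = 0
    while len(values) > 1 and depth < max_depth:
        lo, hi = min(values), max(values)
        if lo == hi:
            break
        split = rng.uniform(lo, hi)
        # keep only the side of the split that contains x
        values = [v for v in values if (v <= split) == (x <= split)]
        depth += 1
    return depth

def avg_isolation_depth(values, x, n_trees=100, seed=0):
    """Average isolation depth over many random trees: low values flag anomalies."""
    rng = random.Random(seed)
    return sum(isolation_depth(values, x, rng) for _ in range(n_trees)) / n_trees
```

<p>An anomaly far from the bulk of the data is separated from it by almost every random split, so its average depth stays close to one, while points inside a dense cluster need many more splits to be isolated.</p>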
<p>Decision trees are useful also in probabilistic settings. <bold>Probability estimation trees</bold> (PETs) are decision trees that predict probabilities rather than just classifying instances; that is, given <italic>x</italic>, they predict <italic>P</italic>(<italic>y</italic>|<italic>x</italic>) rather than just the <italic>y</italic> that maximizes it. PET learning typically benefits from less aggressive pruning than classification tree learning (Provost and Domingos, <xref ref-type="bibr" rid="B123">2003</xref>; Fierens et al., <xref ref-type="bibr" rid="B55">2010</xref>). <bold>Density estimation trees</bold> (DETs) model a density function over the input space (Ram and Gray, <xref ref-type="bibr" rid="B126">2011</xref>). Both types of models have been shown to be successful at modeling conditional and joint probability densities.</p>
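<p>The difference with a classification tree lies only in what a leaf returns: a full class distribution instead of the majority class. A minimal sketch of such a leaf, with Laplace smoothing as a commonly used refinement (names are ours):</p>

```python
from collections import Counter

def leaf_distribution(labels, classes, smoothing=1.0):
    """Class distribution in a leaf of a probability estimation tree.

    A classification tree would return only the majority class; a PET leaf
    instead returns an estimate of P(y|x) for every class y. Laplace
    smoothing (smoothing=1.0) avoids extreme 0/1 probabilities in small leaves.
    """
    counts = Counter(labels)
    total = len(labels) + smoothing * len(classes)
    return {c: (counts[c] + smoothing) / total for c in classes}
```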
<p>PETs are particularly useful in the context of probabilistic graphical models (PGMs). They can be used to model <bold>conditional probability functions</bold> (instead of probability tables) (Friedman and Goldszmidt, <xref ref-type="bibr" rid="B60">1998</xref>) and can even help decide the PGM structure as they naturally identify the parents of a node (Fierens et al., <xref ref-type="bibr" rid="B56">2007</xref>). Relational dependency networks (RDNs) are one example of a PGM that explicitly relies on PETs (Neville and Jensen, <xref ref-type="bibr" rid="B115">2007</xref>).</p>
<p>In the classical predictive learning setting, the input and output variables are known at learning time, and models are constructed for this specific task. PGMs, in contrast, can predict any variable from any other variable. Motivated by this discrepancy, <bold>multi-directional ensembles</bold> of regression and classification trees (MERCS) have been proposed. Here, each individual tree predicts one or more variables from the other variables, and in the ensemble as a whole, every variable occurs as a target variable at least once. Essentially a non-probabilistic variant of PGMs, MERCS models have been shown to allow for much faster inference than PGMs (Van Wolputte et al., <xref ref-type="bibr" rid="B149">2018</xref>), and to be useful also for missing value imputation (Van Wolputte and Blockeel, <xref ref-type="bibr" rid="B148">2020</xref>).</p>
<p>Researchers on fuzzy logic have proposed <bold>fuzzy decision trees</bold> as a way of dealing with uncertain or vague data. Multiple methods for adapting decision trees to work in a fuzzy logic context have been proposed; Olaru and Wehenkel (<xref ref-type="bibr" rid="B120">2003</xref>) provide a good overview in their related work section. H&#x000FC;llermeier and Vanderlooy (<xref ref-type="bibr" rid="B76">2009</xref>) argue that fuzzy decision trees are particularly advantageous for ranking, and relate this to their use of &#x0201C;soft&#x0201D; splits, where instances can be partially assigned to multiple branches.</p>
<p>Johansson et al. (<xref ref-type="bibr" rid="B81">2014</xref>) study decision trees and forests in the context of <bold>conformal prediction</bold>, a setting where instead of a single value, a set of values is predicted: namely, the smallest possible set for which there is a probabilistic guarantee that the true label is in it.</p>
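<p>The core mechanism of split conformal prediction can be sketched as follows. This is a generic illustration with made-up numbers, not the tree-specific procedure of Johansson et al.; the nonconformity score used here (one minus the predicted probability of the true label) is one common choice.</p>

```python
import math

def conformal_set(probs, calib_scores, alpha=0.1):
    """Split conformal prediction: the label set that contains the true label
    with probability at least 1 - alpha.

    probs: predicted probability per label for the new instance
    calib_scores: nonconformity scores 1 - p(true label) on a held-out calibration set
    """
    n = len(calib_scores)
    k = math.ceil((n + 1) * (1 - alpha))       # finite-sample corrected quantile index
    qhat = sorted(calib_scores)[min(k, n) - 1]
    return {label for label, p in probs.items() if 1 - p <= qhat}

calib = [0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
s = conformal_set({"a": 0.55, "b": 0.35, "c": 0.10}, calib, alpha=0.2)
```

<p>With these calibration scores and &#x003B1; = 0.2, the predicted set is {a, b}: the label c is too unlikely to be included, while dropping b would void the coverage guarantee.</p>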
<p>The above is only a selection of uses of decision trees; it is virtually impossible to be complete. While this text focuses mostly on the fields of artificial intelligence and machine learning, conceptual development of tree-based methods has happened in parallel in many different fields, including statistics and application domains such as computer vision, bioinformatics, and medical informatics; some examples are Hothorn and Lausen (<xref ref-type="bibr" rid="B73">2003</xref>), Strobl et al. (<xref ref-type="bibr" rid="B138">2007</xref>), and Criminisi et al. (<xref ref-type="bibr" rid="B41">2012</xref>).</p>
</sec>
<sec id="s4">
<title>4. Beyond recursive partitioning</title>
<p>Recursive partitioning is very fast, can easily be adapted to different settings (as illustrated above), and generally yields good results. Yet, other algorithms for learning decision trees have been proposed, with quite different properties. Below, we first describe adaptations of recursive partitioning to incremental and distributed learning contexts. Next, we look at how more advanced methods for searching discrete and continuous spaces have been used in decision tree learning: advanced combinatorial problem solvers, gradient descent based methods, and evolutionary algorithms.</p>
<sec>
<title>4.1. Incremental learners</title>
<p>Incremental learners do not assume that all data is available from the beginning, but keep a preliminary model that they update when new data comes in. Such learners often use a variant of recursive partitioning that either restructures the tree when earlier choices turn out to be suboptimal [e.g., Utgoff (<xref ref-type="bibr" rid="B145">1989</xref>)], or proceeds more cautiously and splits a node only when enough data has become available in that node to be reasonably sure that this split is indeed the best choice. An example of a cautious system is VFDT (Domingos and Hulten, <xref ref-type="bibr" rid="B48">2000</xref>), which uses Hoeffding bounds to guarantee with high probability that the chosen split is identical to the one that would be chosen if the whole population were looked at. The term &#x0201C;Hoeffding trees&#x0201D; is often used for trees learned this way, and there has been a wide range of follow-up work; see Garcia-Martin et al. (<xref ref-type="bibr" rid="B63">2022</xref>) for a recent contribution that includes further pointers. Ensemble learning has also been adapted to this setting; e.g., Gomes et al. (<xref ref-type="bibr" rid="B66">2017</xref>) learn random forests from streaming data under concept drift (where the target model may evolve over time).</p>
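<p>The Hoeffding-bound test at the heart of VFDT can be sketched as follows. This is a simplified illustration of the statistical test only (VFDT adds refinements such as tie-breaking and grace periods); the default parameter values are ours.</p>

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    i.i.d. samples of a statistic with the given range lies within epsilon
    of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def should_split(gain_best, gain_second, value_range=1.0, delta=1e-6, n=0):
    """VFDT-style decision: split once the best attribute's observed advantage
    over the runner-up exceeds the Hoeffding bound for the n examples seen."""
    return n > 0 and (gain_best - gain_second) > hoeffding_epsilon(value_range, delta, n)
```

<p>For example, an observed gain difference of 0.05 justifies a split after a few thousand examples but not after a few hundred: the bound shrinks as 1/&#x0221A;n, so the node simply waits for more data.</p>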
</sec>
<sec>
<title>4.2. Parallelization and distribution</title>
<p>By nature, decision tree computations are easy to parallelize or decentralize. This can boost runtime efficiency<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> but also help address privacy and security concerns.</p>
<p>Early work on distributed learning focused on handling large, externally stored datasets; well-known examples are SLIQ (Mehta et al., <xref ref-type="bibr" rid="B105">1996</xref>), SPRINT (Shafer et al., <xref ref-type="bibr" rid="B135">1996</xref>), and RainForest (Gehrke et al., <xref ref-type="bibr" rid="B65">2000</xref>). Later work focused on reducing communication cost in distributed implementations (e.g., Tyree et al., <xref ref-type="bibr" rid="B144">2011</xref>; Meng et al., <xref ref-type="bibr" rid="B106">2016a</xref>) and exploiting standard frameworks such as MapReduce (e.g., Wu et al., <xref ref-type="bibr" rid="B159">2009</xref>) or Apache Spark (e.g., Meng et al., <xref ref-type="bibr" rid="B107">2016b</xref>). SPDT (Ben-Haim and Tom-Tov, <xref ref-type="bibr" rid="B15">2010</xref>) learns from streams in a distributed manner. Rokach (<xref ref-type="bibr" rid="B131">2016</xref>, Section 6.10) provides an overview with more examples.</p>
<p>Modern learners exploit specialized hardware, e.g., SIMD (Devos et al., <xref ref-type="bibr" rid="B44">2020</xref>; Shi et al., <xref ref-type="bibr" rid="B137">2022</xref>), GP-GPUs (Sharp, <xref ref-type="bibr" rid="B136">2008</xref>; Wen et al., <xref ref-type="bibr" rid="B158">2018</xref>), and FPGAs (Van Essen et al., <xref ref-type="bibr" rid="B147">2012</xref>). Modern boosting systems such as XGBoost and LightGBM (Ke et al., <xref ref-type="bibr" rid="B86">2017</xref>) all have performant GP-GPU implementations (Mitchell and Frank, <xref ref-type="bibr" rid="B109">2017</xref>; Zhang et al., <xref ref-type="bibr" rid="B171">2017</xref>). Not only learning, but also the storage and use of decision trees are optimized, for instance using bit-level data structures, to allow deployment on edge devices with limited resources (Lucchese et al., <xref ref-type="bibr" rid="B100">2017</xref>; Ye et al., <xref ref-type="bibr" rid="B165">2018</xref>; Koschel et al., <xref ref-type="bibr" rid="B90">2023</xref>).</p>
<p>Federated learning tackles the challenge of learning with data distributed among several clients. These clients collaboratively train a model under server management while keeping data decentralized (Kairouz et al., <xref ref-type="bibr" rid="B82">2021</xref>), to prevent leakage of private or confidential information (Xu et al., <xref ref-type="bibr" rid="B162">2021</xref>). Methods for federated learning of tree-based models rely on a variety of techniques. For instance, CryptoBoost (Jin et al., <xref ref-type="bibr" rid="B80">2022</xref>) and SecureBoost (Xie et al., <xref ref-type="bibr" rid="B161">2022</xref>) use <italic>homomorphic encryption</italic>, where data or data statistics are encrypted in a way that still allows particular computations, e.g., addition and multiplication. More generally, <italic>secure aggregation</italic> refers to operations and protocols that allow aggregate statistics, such as impurity measures, to be computed without revealing the underlying data. For example, Du and Zhan (<xref ref-type="bibr" rid="B51">2002</xref>) propose a scalar product protocol to compute a dot product without sharing information. <italic>Differential privacy</italic> provides a formal framework for privacy-preserving learning that is complementary to these security-oriented techniques. Fletcher and Islam (<xref ref-type="bibr" rid="B57">2019</xref>) survey and analyze differentially private algorithms for tree-based methods.</p>
</sec>
<sec>
<title>4.3. Combinatorial optimization and constraint solvers</title>
<p>Recursive partitioning uses heuristics. Hence, it does not guarantee any kind of optimality of the resulting tree, such as being the smallest tree that perfectly fits the training data, or minimizing some loss function under a constraint on the complexity of the model. This weakness motivates the development of alternative search strategies that can provide such guarantees.</p>
<p>Early algorithms for finding optimal decision trees performed an exhaustive enumeration of the space of decision trees (Esmeir and Markovitch, <xref ref-type="bibr" rid="B54">2007</xref>). An interesting feature of the approach of Esmeir and Markovitch (<xref ref-type="bibr" rid="B54">2007</xref>) is its any-time behavior: it orders the enumeration such that promising trees are enumerated first, allowing it to provide good trees if terminated early.</p>
<p>To obtain better run times, a number of different ideas have been explored.</p>
<p>An early approach was DL8 (Nijssen and Fromont, <xref ref-type="bibr" rid="B116">2007</xref>), which uses results from <italic>itemset mining</italic> to construct provably optimal trees. The key insight behind DL8 is that a branch in a decision tree can be seen as a Boolean item, and a path as a set of items. For optimization criteria such as error, the optimal tree below a given path depends only on the training examples that reach the end of that path. This makes it possible to develop algorithms that use a form of dynamic programming, in which partial solutions are associated with itemsets and can be reused.</p>
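<p>The dynamic-programming idea can be illustrated with a toy sketch for binary features. Because the memoization key is a <italic>set</italic> of path conditions, two paths that test the same conditions in a different order share their cached sub-solution, which is the itemset view that DL8 exploits. This is an illustration of the idea only, with our own toy data, not the DL8 implementation.</p>

```python
from functools import lru_cache

# Toy dataset: tuples of binary features with a binary label.
DATA = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 0), 1),
        ((1, 1, 0), 0), ((1, 1, 1), 0), ((0, 0, 0), 1)]
N_FEATURES = 3

def select(items):
    """Training examples satisfying every (feature, value) condition on the path."""
    return [(x, y) for x, y in DATA if all(x[f] == v for f, v in items)]

@lru_cache(maxsize=None)
def best_error(items, depth):
    """Minimum training error of any tree of depth <= `depth` below path `items`.
    The frozenset key makes the cache order-independent, as in DL8's itemsets."""
    labels = [y for _, y in select(items)]
    leaf_error = min(labels.count(0), labels.count(1)) if labels else 0
    if depth == 0 or leaf_error == 0:
        return leaf_error
    best = leaf_error
    used = {f for f, _ in items}
    for f in range(N_FEATURES):
        if f not in used:
            best = min(best, best_error(items | frozenset({(f, 0)}), depth - 1)
                           + best_error(items | frozenset({(f, 1)}), depth - 1))
    return best

# best_error(frozenset(), 2) is the optimal training error of a depth-2 tree on DATA
```

<p>On this toy dataset, a depth-2 tree can fit the data perfectly, while the best depth-1 tree (a single split) cannot; the recursion finds both optima while reusing sub-solutions across branch orderings.</p>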
<p>To avoid overly complex trees, regularization can be important. The idea of regularizing the complexity of optimal decision trees, in combination with some form of dynamic programming, can also be found in OSDT (Optimal Sparse Decision Trees), proposed by Hu et al. (<xref ref-type="bibr" rid="B75">2019</xref>), of which an optimized version, GOSDT (Generalized and Scalable Optimal Sparse Decision Trees), was proposed by Lin et al. (<xref ref-type="bibr" rid="B97">2020</xref>).</p>
<p>Another class of methods is based on the use of mixed integer linear programming (MILP), and was pioneered by Bertsimas and Dunn (<xref ref-type="bibr" rid="B16">2017</xref>). MILP is a generic approach for solving combinatorial problems by expressing them as linear constraints over integer variables. This approach is extensible: constraints can easily be added as long as they can be expressed in linear form. Follow-up work includes, for instance, an adaptation of the approach for survival trees (Bertsimas et al., <xref ref-type="bibr" rid="B17">2022</xref>). A more efficient MILP formulation, BinOCT, has since been proposed (Verwer and Zhang, <xref ref-type="bibr" rid="B152">2019</xref>), but the run time of MILP-based approaches generally remains worse than that of DL8.</p>
<p>Another generic approach is based on the use of SAT solvers and Constraint Programming solvers. The use of SAT solvers was studied by Bessiere et al. (<xref ref-type="bibr" rid="B18">2009</xref>) and Narodytska et al. (<xref ref-type="bibr" rid="B113">2018</xref>) for determining whether, for a given training data set, a decision tree exists that makes no error on this training data, under a constraint on either size or depth. Avellaneda (<xref ref-type="bibr" rid="B8">2020</xref>) extended this approach with an algorithm that can find the smallest depth of a consistent tree, as well as the smallest consistent tree under a depth constraint.</p>
<p>While these SAT-based approaches focus on trees that are consistent with all training examples, on many data sets one can accept trees that make a small number of errors. Hu et al. (<xref ref-type="bibr" rid="B74">2020</xref>) showed that by using MaxSAT solvers instead of SAT solvers, it becomes feasible to find trees that minimize error. Constraint Programming (CP) solvers have similar benefits. The use of CP was studied by Verhaeghe et al. (<xref ref-type="bibr" rid="B151">2020</xref>), who showed that a DL8-style algorithm can be combined with Constraint Programming.</p>
<p>Aglin et al. proposed an optimized version of DL8, called DL8.5, which adds branch-and-bound to the search (Aglin et al., <xref ref-type="bibr" rid="B2">2020a</xref>), and showed how to apply it to find sparse decision trees or regression trees (Aglin et al., <xref ref-type="bibr" rid="B3">2020b</xref>). Similarly, GOSDT supports other optimization criteria as well (Lin et al., <xref ref-type="bibr" rid="B97">2020</xref>).</p>
<p>Improved bounds and data structures for DL8-style approaches were subsequently proposed by Demirovic et al. (<xref ref-type="bibr" rid="B43">2022</xref>). These authors report speed-ups of a factor 1,000 or more compared to MILP-based approaches, and of a factor 500 or more compared to GOSDT. A shared weakness of the DL8-style algorithms is the need to store large numbers of itemsets. This weakness was studied by Aglin et al. (<xref ref-type="bibr" rid="B4">2022</xref>), who propose to sacrifice some run time performance to limit memory consumption. Kiossou et al. (<xref ref-type="bibr" rid="B87">2022</xref>) showed how any-time behavior can be improved.</p>
<p>Given that most search algorithms allow for some freedom in the definition of constraints and optimization criteria, a number of them have been tuned for specific settings; these will be discussed in Section 6.</p>
<p>Combinations of heuristic and optimal algorithms have also been developed, aiming to combine the best of both worlds. The any-time algorithm of Esmeir and Markovitch (<xref ref-type="bibr" rid="B54">2007</xref>) is an early example. Another such approach starts from the observation that some optimal decision tree learning algorithms require a discretization of the data and the specification of a depth constraint; it proposes to first learn an ensemble using traditional greedy algorithms, and to subsequently use this ensemble to guide the choices of how to discretize the data and which depth constraint to use.</p>
</sec>
<sec>
<title>4.4. Gradient-based approaches</title>
<p>Several approaches have been proposed to learn decision trees (and forests) using gradient-based methods. The majority of these focus on oblique decision trees (where splits are based on linear combinations of input features). The key idea is to encapsulate, in a differentiable objective function, the paths instances take through the tree and how the parameters (e.g., the weights of the linear combination of features) affect these paths. Some approaches use hard (or discrete) paths computed with a threshold split at each internal node; other approaches use soft paths computed with cumulative distribution functions such as the sigmoid, leading to trees called <italic>soft</italic> decision trees.</p>
<p>For the case of <bold>hard paths</bold>, one direction, used by Norouzi et al. (<xref ref-type="bibr" rid="B117">2015</xref>), is to explicitly model them with discrete latent variables over the internal nodes of trees, after which an alternating minimization algorithm can be employed to infer paths and learn the parameters of nodes using their gradients. In another direction, observing that oblique decision trees define linearly separated constant regions in the input space, Lee and Jaakkola (<xref ref-type="bibr" rid="B92">2020</xref>) introduce <italic>locally constant networks</italic>, which are provably equivalent to oblique decision trees and are defined with derivatives of <italic>ReLU</italic> networks. Using locally constant networks, it is therefore possible to learn equivalent oblique decision trees with a global and differentiable objective function.</p>
<p>Differently from the above, in <bold>soft decision trees</bold>, instances are routed to each child node with a probability (Irsoy et al., <xref ref-type="bibr" rid="B78">2012</xref>). Using a differentiable function such as the sigmoid to compute this probability allows expressing a differentiable objective function. Work focusing on individual decision trees includes that of Frosst and Hinton (<xref ref-type="bibr" rid="B61">2018</xref>). Thanks to the success of learning soft decision trees with gradient descent, several extensions have been made for random forests. For example, the <italic>neural decision forest</italic> (NDF) proposed by Rota Bulo and Kontschieder (<xref ref-type="bibr" rid="B132">2014</xref>) is an ensemble of decision trees whose split functions are randomized multi-layer perceptrons (MLPs), learned locally (in each decision node) using gradient descent. Another extension to forests is the <italic>deep neural decision forest</italic> (dNDF) (Kontschieder et al., <xref ref-type="bibr" rid="B89">2015</xref>), which differs from NDF in two aspects. First, a dNDF may use, inside decision nodes, sigmoid units on top of deep convolutional neural networks. Second, a dNDF supports end-to-end training via gradient descent. The training of dNDF can also be sped up by using dedicated activation functions (Hazimeh et al., <xref ref-type="bibr" rid="B72">2020</xref>) instead of the sigmoid.</p>
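<p>A minimal sketch of soft routing follows, with hand-set parameters of our own choosing; in actual soft decision trees, the split weights and leaf values are learned by gradient descent on a differentiable loss.</p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_tree_predict(x, node):
    """Soft routing: an internal node sends x left with probability
    sigmoid(w . x + b); the prediction is the probability-weighted
    average of all leaf values, hence differentiable in w, b and leaves."""
    if "leaf" in node:
        return node["leaf"]
    p_left = sigmoid(sum(w * xi for w, xi in zip(node["w"], x)) + node["b"])
    return (p_left * soft_tree_predict(x, node["left"])
            + (1 - p_left) * soft_tree_predict(x, node["right"]))

# A tiny hand-set tree with one oblique split and two leaves.
tree = {"w": [4.0, 0.0], "b": -2.0,
        "left": {"leaf": 1.0},
        "right": {"leaf": 0.0}}
p = soft_tree_predict([1.0, 0.0], tree)   # sigmoid(2.0), roughly 0.88
```

<p>Because every instance contributes to every leaf with some probability, the objective is smooth in all parameters, which is what makes end-to-end gradient training possible.</p>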
<p>Many gradient-based approaches to tree learning assume a fixed tree structure, but some infer the tree structure as part of the learning process. This can be done by modeling the possibility for a node to be either a leaf node or a decision node, thereby allowing pruning during learning. Examples include the <italic>budding tree</italic> (Irsoy et al., <xref ref-type="bibr" rid="B79">2014</xref>), the one-stage tree (Xu et al., <xref ref-type="bibr" rid="B163">2022</xref>), and the quadratic program of Zantedeschi et al. (<xref ref-type="bibr" rid="B167">2021</xref>).</p>
<p>The majority of gradient-based approaches focus on trees with multivariate or more complex splits (e.g., involving MLPs). This stands in contrast to the combinatorial search based methods of the previous section, which typically learn trees with univariate tests.</p>
</sec>
<sec>
<title>4.5. Evolutionary algorithms</title>
<p>Given the broad applicability of evolutionary algorithms for search, it is not surprising that such algorithms have also been used to search the space of all decision trees for trees that fit the data well. Evolutionary search is naturally positioned between greedy search, which is fast but prone to suboptimal decisions, and exhaustive search, which gives a provably optimal solution at a high cost. Barros et al. (<xref ref-type="bibr" rid="B9">2012</xref>) survey the area of evolution-based decision tree learning. Among other things, they conclude that evolutionary search frequently leads to trees with better predictive performance, which is an indication that recursive partitioning&#x00027;s bias toward short trees is not always advantageous. The survey also launches the idea of an evolutionary search for decision tree algorithms (rather than the trees themselves), which was followed up on in later work (Barros et al., <xref ref-type="bibr" rid="B10">2015</xref>). An interesting question is how evolutionary algorithms (or evolutionarily-optimized greedy algorithms) compare to the solver-based methods mentioned before. To our knowledge, no systematic comparison between evolutionary search and solver-based methods has been made.</p>
</sec>
</sec>
<sec id="s5">
<title>5. Integrating constraints</title>
<p>An aspect of learning that has received increasing attention in recent years is how background knowledge, in the form of constraints, can be used in the learning process. For instance, a bank might require that its model for credit approval is monotonic in the &#x0201C;income&#x0201D; attribute: all else being equal, a client with a higher income should not get a worse score. Can we verify that a given model does not violate this constraint, or even better: can we ensure that the learner only returns models that do not violate it? We refer to the latter as imposing constraints, and to the former as model verification. There has been a substantial amount of work on both fronts in the context of decision trees.</p>
<sec>
<title>5.1. Imposing constraints on decision trees</title>
<p>The use of constraints is ubiquitous in machine learning for two principal reasons. First, model regularization via constraints is a standard way to combat overfitting in many machine learning models. Second, given the deep impact of machine learning on our society, in several critical domains (e.g., health, finance) machine learning models must not only provide the best possible performance, they must also meet requirements such as fairness, privacy-related restrictions, and consistency with prior domain knowledge. In both scenarios, constraint enforcement represents a principled framework for better controlling learned models, so that these models meet societal, ethical, and practical goals (Cotter et al., <xref ref-type="bibr" rid="B38">2019</xref>). Several approaches have therefore been proposed to enforce constraints on decision trees. The survey of Nanfack et al. (<xref ref-type="bibr" rid="B112">2021</xref>) distinguishes structure-level, feature-level, and instance-level constraints on decision trees.</p>
<p>Methods imposing structure-level constraints aim to learn decision trees under constraints on the structure of the tree (e.g., its size or depth). These methods may employ pruning (e.g., learn an overfitted tree through recursive partitioning and then prune it to reduce its size); examples include the work of Garofalakis et al. (<xref ref-type="bibr" rid="B64">2003</xref>) and Struyf and D&#x0017E;eroski (<xref ref-type="bibr" rid="B139">2006</xref>). Thanks to the discrete nature of decision trees, other methods imposing structure-level constraints leverage combinatorial optimization to find the most accurate decision tree that satisfies the depth or size constraint (see Section 4).</p>
<p>Methods imposing feature-level constraints aim to learn decision trees under feature-related constraints such as monotonicity, fairness, ordering and hierarchy over selected features, and privacy. The majority of these methods use recursive partitioning along with constraint-aware heuristics, which help in choosing the most informative split that does not violate the constraint too much. Examples of such constraint-aware heuristics include the <italic>total ambiguity score</italic> of Ben-David (<xref ref-type="bibr" rid="B14">1995</xref>) for monotonicity constraints [see also the survey by Potharst and Feelders (<xref ref-type="bibr" rid="B122">2002</xref>)], the <italic>information cost function</italic> of N&#x000FA;&#x000F1;ez (<xref ref-type="bibr" rid="B118">1991</xref>) for hierarchy constraints, the <italic>fair information gain</italic> of Zhang and Ntoutsi (<xref ref-type="bibr" rid="B172">2019</xref>) for fairness constraints, and the adapted exponential mechanism of Li et al. (<xref ref-type="bibr" rid="B95">2020</xref>) for privacy constraints. In addition, methods such as that of Aghaei et al. (<xref ref-type="bibr" rid="B1">2019</xref>) use combinatorial optimization to learn decision trees under fairness constraints.</p>
<p>Finally, methods imposing instance-level constraints focus either on robustness constraints or on must-link and cannot-link constraints for clustering trees. Robustness is discussed in Section 6.2. An example of work integrating must-link and cannot-link constraints is that of Struyf and D&#x0017E;eroski (<xref ref-type="bibr" rid="B139">2006</xref>), which uses a penalized heuristic composed of two terms: the first is the average variance of the instances in the leaf nodes, normalized by the total variance; the second is the percentage of violated constraints.</p>
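<p>The two-term score just described can be sketched as follows; this is our own reading of the description above, with our own names and numbers, not the exact formulation of the cited work.</p>

```python
def penalized_heuristic(leaf_variances, total_variance, n_violated, n_constraints):
    """Two-term score for constrained clustering trees (sketch): average leaf
    variance normalized by the total variance, plus the fraction of violated
    must-link/cannot-link constraints."""
    variance_term = (sum(leaf_variances) / len(leaf_variances)) / total_variance
    return variance_term + n_violated / n_constraints

# Two leaves with variances 1.0 and 3.0, total variance 4.0, 1 of 4 constraints violated:
score = penalized_heuristic([1.0, 3.0], 4.0, 1, 4)   # 0.5 + 0.25 = 0.75
```

<p>Splits are then chosen to minimize this score, trading cluster compactness against constraint satisfaction.</p>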
</sec>
<sec>
<title>5.2. Verification of decision tree ensembles</title>
<p>Model verification is used to assess the quality of a learned model. As such, it complements evaluating the model&#x00027;s performance on an unseen test set, i.e., regular <italic>model testing</italic>. In contrast to model testing, which by definition only considers the behavior of the model for the examples in the test set, verification considers the full domain and image of the learned function. In sensitive application domains like healthcare and air traffic control, this rigorous model evaluation is required.</p>
<p>Similar to formal verification of software systems, verification of machine learned models reasons about all possible inputs and their corresponding outputs, and verifies whether these input-output pairs satisfy the prescribed constraints. In practice, this typically happens by negating the given constraints, and trying to find instances that satisfy this negation. If successful, this disproves the claim that the model satisfies the prescribed constraints (and provides a counterexample); otherwise, it proves the claim.</p>
<p>Verification of learned models is challenging: e.g., Kantchelian et al. (<xref ref-type="bibr" rid="B85">2016</xref>) show that verification of tree ensembles in general is NP-hard. Despite this, there is substantial interest in it because it is widely applicable and can be used to validate a variety of questions and constraints. Some examples, each with a notable paper that has focused on this problem (not exhaustive):</p>
<list list-type="bullet">
<list-item><p><bold>Adversarial example generation</bold>: can we slightly perturb example <italic>x</italic> so that the predicted label for this modified <italic>x</italic> is different? In practice, it is often the case that small imperceptible changes can be found that fool the model (Kantchelian et al., <xref ref-type="bibr" rid="B85">2016</xref>).</p></list-item>
<list-item><p><bold>Robustness checking</bold>: does an adversarial example exist in a small neighborhood surrounding an example <italic>x</italic> (Chen et al., <xref ref-type="bibr" rid="B32">2019b</xref>)?</p></list-item>
<list-item><p><bold>Counterfactual example generation</bold>: what attribute value needs to change in order to get the desired model outcome? This is similar to adversarial example generation, but usually also requires actionability and plausibility (Parmentier and Vidal, <xref ref-type="bibr" rid="B121">2021</xref>).</p></list-item>
<list-item><p><bold>Attribute importance</bold>: can a change in one or a small set of attributes wildly affect the output of the model (Devos et al., <xref ref-type="bibr" rid="B45">2021a</xref>)?</p></list-item>
<list-item><p><bold>Fairness</bold>: is a loan approval probability affected by race (Grari et al., <xref ref-type="bibr" rid="B67">2019</xref>)?</p></list-item>
<list-item><p><bold>Domain-specific questions</bold>: when predicting the risk of stroke: can a person aged between 40 and 50 with a BMI less than 20 have a risk greater than 5% (Devos et al., <xref ref-type="bibr" rid="B46">2021b</xref>)?</p></list-item>
<list-item><p><bold>Safety</bold>: are there conditions under which the model deviates more than some threshold <italic>t</italic> from some safe reference model (Wei et al., <xref ref-type="bibr" rid="B157">2022</xref>)?</p></list-item>
</list>
<p>As is clear from the examples above, verification is applicable to a wide range of problems. However, ever since it was shown that neural networks (Szegedy et al., <xref ref-type="bibr" rid="B140">2013</xref>) and later decision trees (Kantchelian et al., <xref ref-type="bibr" rid="B85">2016</xref>) are susceptible to adversarial examples, most research has focused on adversarial example generation and robustness checking. In Section 6, we will focus on these applications in detail.</p>
</sec>
</sec>
<sec id="s6">
<title>6. Decision trees for responsible AI</title>
<p>With AI increasingly affecting the lives of billions of people, there is an increased societal and academic interest in Responsible AI, by which is meant: giving due care and consideration to the consequences of using AI in certain contexts. Responsible AI implies taking measures to ensure that AI systems behave in a way that is considered fair, safe, transparent, and generally respects human rights. For machine-learned models, this often translates to ensuring that the learned models fulfill certain constraints that imply fairness, robustness, and explainability. Below, we consider each of these in turn, and discuss work in the context of decision trees.</p>
<sec>
<title>6.1. Fairness</title>
<p>Often, a desirable characteristic of classification models is that they do not discriminate: given a labeled dataset and a Boolean attribute B, the model should not treat instances with property B differently from the overall population. For instance, if B represents whether or not a person identifies as female, we may find it undesirable that the classifier is less likely to assign a positive label to such a person.</p>
<p>Unfortunately, highly accurate predictive models may display such unfair behavior, as discrimination may be present in the training data. This form of discrimination cannot be avoided by simply removing attribute B from the training data; other features may correlate with B in unexpected and complex ways. This has led to a number of approaches that aim to balance two requirements at the same time: accuracy and <italic>group fairness</italic>, which requires that a group with a specific characteristic be treated the same as the overall population.</p>
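<p>One standard formalization of group fairness is demographic parity: the difference in positive-prediction rates between the group with property B and the rest should be small. The cited works use related but more refined measures; the sketch below, with our own names and data, only illustrates the quantity being controlled.</p>

```python
def demographic_parity_difference(y_pred, group):
    """|P(yhat = 1 | B) - P(yhat = 1 | not B)| over binary predictions."""
    rate = lambda ys: sum(ys) / len(ys)
    in_b = [yh for yh, b in zip(y_pred, group) if b]
    out_b = [yh for yh, b in zip(y_pred, group) if not b]
    return abs(rate(in_b) - rate(out_b))

# 2/4 positives inside group B vs 3/4 outside it: a difference of 0.25
preds = [1, 0, 1, 0, 1, 1, 1, 0]
groupB = [True, True, True, True, False, False, False, False]
```

<p>In-processing methods can, for instance, penalize a split criterion or a solver objective with such a quantity, trading a little accuracy for a smaller disparity.</p>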
<p>The so-called &#x0201C;preprocessing&#x0201D; approaches modify the training data before the data is fed into a machine learning algorithm; these approaches can also be applied when the learning algorithm is a decision tree learning algorithm. Here, we focus on the &#x0201C;in-processing&#x0201D; approaches, that is, approaches that take into account fairness while learning a model.</p>
<p>A first approach was proposed by Kamiran et al. (<xref ref-type="bibr" rid="B83">2010</xref>). In this work, a decision tree is first learned using a heuristic that takes into account both accuracy and discrimination. Subsequently, a post-processing step relabels leaves to further improve a fairness score. This approach was adapted to the incremental learning setting by Zhang and Ntoutsi (<xref ref-type="bibr" rid="B172">2019</xref>).</p>
<p>While these approaches aim to find trees that represent a good trade-off between two criteria, they do not provide guarantees. Algorithms for learning optimal decision trees have been modified to provide such guarantees. Aghaei et al. (<xref ref-type="bibr" rid="B1">2019</xref>) adapted the MILP-solver based approach to take into account two forms of fairness; the approach optimizes a weighted sum of accuracy and fairness. Similarly, van der Linden et al. (<xref ref-type="bibr" rid="B146">2022</xref>) adapted the DL8-style approach such that an upper- or lower-bound on fairness can be ensured, where the fairness score proposed by Kamiran et al. (<xref ref-type="bibr" rid="B83">2010</xref>) is used to evaluate the fairness of a model.</p>
<p>Given the high predictive performance of boosted decision trees, ensuring fairness has also been studied for ensembles of decision trees. A first approach was proposed by Grari et al. (<xref ref-type="bibr" rid="B67">2019</xref>), who modify a gradient boosting algorithm so that the gradient of each instance is adjusted according to how well an adversarial model can predict the sensitive feature from the class predicted by the ensemble.</p>
</sec>
<sec>
<title>6.2. Robustness against adversarial examples</title>
<p>Since the discovery that tree ensembles, just like neural networks, are susceptible to adversarial examples (Kantchelian et al., <xref ref-type="bibr" rid="B85">2016</xref>), much research has been devoted to detecting robustness issues and to ways of mitigating them. Before surveying this work in more detail, we introduce some terminology.</p>
<p>For a classification problem, <italic>x</italic>&#x02032; is an adversarial example of a regular example <italic>x</italic> when it is &#x0201C;close to&#x0201D; <italic>x</italic>, i.e., in some neighborhood <italic>N</italic>(<italic>x</italic>) of <italic>x</italic>, and <italic>f</italic>(<italic>x</italic>) &#x02260; <italic>f</italic>(<italic>x</italic>&#x02032;), with <italic>f</italic>(<italic>x</italic>) the predicted label for <italic>x</italic>. Generally, the assumption is made that <italic>f</italic> classifies <italic>x</italic> correctly, and that there is <italic>truth proximity</italic>, i.e., examples in <italic>N</italic>(<italic>x</italic>) are assumed to have the same true label (Diochnos et al., <xref ref-type="bibr" rid="B47">2018</xref>). <italic>N</italic>(<italic>x</italic>) is called the <bold>attack model</bold>. It can be defined using a simple norm and radius &#x003F5;, e.g. <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02223;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x0221E;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, but more intricate attack models also exist.<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref></p>
<p>Adversarial example generation is a direct application of the verification framework (see Section 5.2): given a regular example <italic>x</italic>, a verification tool is asked to construct an <italic>x</italic>&#x02032; &#x02208; <italic>N</italic>(<italic>x</italic>) with <italic>f</italic>(<italic>x</italic>&#x02032;) &#x02260; <italic>f</italic>(<italic>x</italic>). If it fails, this proves the model robust with respect to <italic>N</italic>(<italic>x</italic>).</p>
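<p>For a single axis-aligned decision tree, this verification query can be answered exactly with a simple interval traversal. The sketch below is our own illustration, not one of the surveyed tools, and the nested-dict tree format is an assumption: it collects every leaf label reachable from the box that an <italic>l</italic><sub>&#x0221E;</sub> attack model allows around <italic>x</italic>; the tree is robust at <italic>x</italic> exactly when a single label remains.</p>

```python
# Sketch (ours): exact robustness check for a single axis-aligned decision
# tree under an l_inf attack model. The tree is a hand-coded nested dict;
# `box` holds the allowed interval [lo, hi] for each feature. Convention:
# go left when x[feat] <= thr.

def reachable_labels(node, box):
    """Collect every leaf label reachable by some x' inside the box."""
    if "label" in node:
        return {node["label"]}
    lo, hi = box[node["feat"]]
    labels = set()
    if lo <= node["thr"]:                  # some point in the box goes left
        labels |= reachable_labels(node["left"], box)
    if hi > node["thr"]:                   # some point in the box goes right
        labels |= reachable_labels(node["right"], box)
    return labels

def is_robust(tree, x, eps):
    """True iff every x' with ||x - x'||_inf <= eps gets the same label."""
    box = [(v - eps, v + eps) for v in x]
    return len(reachable_labels(tree, box)) == 1

# A depth-2 example tree over two features.
tree = {"feat": 0, "thr": 0.5,
        "left":  {"label": 0},
        "right": {"feat": 1, "thr": 0.3,
                  "left": {"label": 0}, "right": {"label": 1}}}
```

<p>For an ensemble this simple traversal no longer suffices, because the trees must be intersected; that combinatorial blow-up is what the solver-based and graph-based methods below address.</p>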
<p>When <italic>N</italic>(<italic>x</italic>) is defined by an <italic>l</italic><sub><italic>p</italic></sub> norm, the smallest distance &#x003F5;<sup>&#x0002A;</sup> for which an adversarial example exists:</p>
<disp-formula id="E1"><mml:math id="M3"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mtext class="textrm" mathvariant="normal">such&#x000A0;that&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02260;</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>can be seen as a measure of how difficult it is to attack the ensemble (Calzavara et al., <xref ref-type="bibr" rid="B29">2020b</xref>).</p>
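<p>For a single tree, &#x003F5;<sup>&#x0002A;</sup> can even be computed exactly by enumerating leaves: each leaf of an axis-aligned tree covers a box in input space, and &#x003F5;<sup>&#x0002A;</sup> is the smallest <italic>l</italic><sub>&#x0221E;</sub> distance from <italic>x</italic> to any leaf box whose label differs from <italic>f</italic>(<italic>x</italic>). The following sketch is our own illustration (boundary strictness is glossed over); for large ensembles this enumeration is intractable, which is why the solver-based methods of Table 1 exist.</p>

```python
# Sketch (ours, not a surveyed tool): exact eps* for a single axis-aligned
# tree via leaf enumeration. Convention: go left when x[feat] <= thr.

def leaf_boxes(node, box, out):
    """Collect (box, label) for every leaf; box maps feature -> (lo, hi)."""
    if any(lo > hi for lo, hi in box.values()):
        return                                  # unreachable leaf region
    if "label" in node:
        out.append((box, node["label"]))
        return
    f, t = node["feat"], node["thr"]
    lo, hi = box[f]
    left = box.copy();  left[f]  = (lo, min(hi, t))   # x[f] <= t
    right = box.copy(); right[f] = (max(lo, t), hi)   # x[f] >  t
    leaf_boxes(node["left"], left, out)
    leaf_boxes(node["right"], right, out)

def predict(node, x):
    while "label" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["label"]

def eps_star(tree, x, n_feat):
    """Smallest l_inf distance from x to a region with a different label."""
    out = []
    leaf_boxes(tree, {f: (float("-inf"), float("inf")) for f in range(n_feat)}, out)
    y = predict(tree, x)
    best = float("inf")
    for box, label in out:
        if label == y:
            continue
        # l_inf distance from x to the box: largest per-feature violation.
        d = max(max(lo - x[f], x[f] - hi, 0.0) for f, (lo, hi) in box.items())
        best = min(best, d)
    return best

tree = {"feat": 0, "thr": 0.5,
        "left":  {"label": 0},
        "right": {"feat": 1, "thr": 0.3,
                  "left": {"label": 0}, "right": {"label": 1}}}
```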
<sec>
<title>6.2.1. Verifying robustness</title>
<p>Various approaches have been proposed for tree ensemble verification, varying in tooling, problem focus, and precision. <xref ref-type="table" rid="T1">Table 1</xref> provides an overview.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Overview of methods for adversarial example generation (<italic>adv</italic>) and robustness checking (<italic>rob</italic>) for tree ensembles.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Focus</bold></th>
<th valign="top" align="center"><bold>Exact?</bold></th>
<th valign="top" align="center"><bold>Anytime?</bold></th>
<th valign="top" align="center"><bold>Generate examples?</bold></th>
<th valign="top" align="center"><bold>Code available?</bold></th>
<th valign="top" align="center"><bold>Supported norms</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Exact (MILP, Kantchelian et al., <xref ref-type="bibr" rid="B85">2016</xref>)</td>
<td valign="top" align="center">rob</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y<xref ref-type="table-fn" rid="TN1b"><sup>b</sup></xref></td>
<td valign="top" align="center"><italic>l</italic><sub><italic>p</italic></sub></td>
</tr> <tr>
<td valign="top" align="left">Symbolic prediction (Kantchelian et al., <xref ref-type="bibr" rid="B85">2016</xref>)</td>
<td valign="top" align="center">rob</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center"><italic>l</italic><sub>0</sub></td>
</tr> <tr>
<td valign="top" align="left">VeriGB (SMT, Einziger et al., <xref ref-type="bibr" rid="B53">2019</xref>)</td>
<td valign="top" align="center">adv</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center"><italic>l</italic><sub>&#x0221E;</sub><xref ref-type="table-fn" rid="TN1c"><sup>c</sup></xref></td>
</tr> <tr>
<td valign="top" align="left">Cube attack (Andriushchenko and Hein, <xref ref-type="bibr" rid="B7">2019</xref>)</td>
<td valign="top" align="center">adv</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub>&#x0221E;</sub></td>
</tr> <tr>
<td valign="top" align="left">Merge (Chen et al., <xref ref-type="bibr" rid="B32">2019b</xref>)</td>
<td valign="top" align="center">rob</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y<xref ref-type="table-fn" rid="TN1d"><sup>d</sup></xref></td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub>&#x0221E;</sub></td>
</tr> <tr>
<td valign="top" align="left">Merge&#x0002B; (Wang et al., <xref ref-type="bibr" rid="B156">2020</xref>)</td>
<td valign="top" align="center">rob</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y<xref ref-type="table-fn" rid="TN1d"><sup>d</sup></xref></td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub><italic>p</italic></sub></td>
</tr> <tr>
<td valign="top" align="left">VoTE (T&#x000F6;rnblom and Nadjm-Tehrani, <xref ref-type="bibr" rid="B143">2020</xref>)</td>
<td valign="top" align="center">rob</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub><italic>p</italic></sub></td>
</tr> <tr>
<td valign="top" align="left">RF-ILP (ILP, Zhang et al., <xref ref-type="bibr" rid="B170">2020b</xref>)</td>
<td valign="top" align="center">adv</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center"><italic>l</italic><sub>0</sub></td>
</tr> <tr>
<td valign="top" align="left">LT-attack (Zhang et al., <xref ref-type="bibr" rid="B169">2020a</xref>)</td>
<td valign="top" align="center">adv</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub><italic>p</italic></sub></td>
</tr> <tr>
<td valign="top" align="left">Silva (Ranzato and Zanella, <xref ref-type="bibr" rid="B127">2020</xref>)</td>
<td valign="top" align="center">rob</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub>&#x0221E;</sub></td>
</tr> <tr>
<td valign="top" align="left">TreeCert (Calzavara et al., <xref ref-type="bibr" rid="B27">2020a</xref>)</td>
<td valign="top" align="center">rob</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center"><italic>l</italic><sub><italic>p</italic></sub><xref ref-type="table-fn" rid="TN1f"><sup>f</sup></xref></td>
</tr> <tr>
<td valign="top" align="left">Tree-ck (SMT, Devos et al., <xref ref-type="bibr" rid="B45">2021a</xref>)</td>
<td valign="top" align="center">adv</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">n</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub><italic>p</italic></sub><xref ref-type="table-fn" rid="TN1f"><sup>f</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Veritas (Devos et al., <xref ref-type="bibr" rid="B46">2021b</xref>)</td>
<td valign="top" align="center">both</td>
<td valign="top" align="center">y<xref ref-type="table-fn" rid="TN1e"><sup>e</sup></xref></td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center">y</td>
<td valign="top" align="center"><italic>l</italic><sub>&#x0221E;</sub></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The <italic>exact</italic> column indicates whether the returned result is guaranteed to be optimal. The <italic>anytime</italic> methods can be stopped at any time and always produce bounds on the result. The supported attack model is given in the <italic>supported norms</italic> column.</p>
<fn id="TN1a"><label>a</label><p>A MILP solver like Gurobi [Gurobi Optimization, LLC (<xref ref-type="bibr" rid="B71">2022</xref>)] is anytime, but the approximate bounds are not tight enough for practical use (Devos et al., <xref ref-type="bibr" rid="B46">2021b</xref>).</p></fn>
<fn id="TN1b"><label>b</label><p>Not by the original authors, but Devos et al. (<xref ref-type="bibr" rid="B46">2021b</xref>) and Vos and Verwer (<xref ref-type="bibr" rid="B153">2021</xref>) provide implementations.</p></fn>
<fn id="TN1c"><label>c</label><p>Authors claim the method works for any <italic>p</italic>-norm, but only evaluate <italic>l</italic><sub>&#x0221E;</sub>.</p></fn>
<fn id="TN1d"><label>d</label><p>The method is technically anytime as each <italic>level</italic> produces a new bound. However, the number of levels <italic>L</italic> is at most log<sub>2</sub>(<italic>M</italic>), with <italic>M</italic> the number of trees, and is set to 2 or 3 in the experiments.</p></fn>
<fn id="TN1e"><label>e</label><p>When the method is run to completion, the solution is exact.</p></fn>
<fn id="TN1f"><label>f</label><p>These systems allow a generic formulation of the attack model. For <italic>TreeCert</italic>, the attacker is extremely flexible and is modeled as a C program.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>A first set of approaches translates the model into a mathematical formulation and uses off-the-shelf solvers. Kantchelian et al. (<xref ref-type="bibr" rid="B85">2016</xref>) propose a mixed-integer linear programming (MILP) solution that can deal with any <italic>l</italic><sub><italic>p</italic></sub> norm. This was later specialized to a pure integer linear program (ILP) for binary input attributes and the <italic>l</italic><sub>0</sub> norm (Zhang et al., <xref ref-type="bibr" rid="B170">2020b</xref>). Others have used satisfiability modulo theories (SMT): the approaches by Einziger et al. (<xref ref-type="bibr" rid="B53">2019</xref>), Sato et al. (<xref ref-type="bibr" rid="B134">2020</xref>), and Devos et al. (<xref ref-type="bibr" rid="B45">2021a</xref>) are similar and differ only in focus and implementation details. Calzavara et al. (<xref ref-type="bibr" rid="B27">2020a</xref>) and Ranzato and Zanella (<xref ref-type="bibr" rid="B127">2020</xref>) take inspiration from the software verification field and use <italic>abstract interpretation</italic>, commonly used for static program analysis, for formal verification of tree ensembles. Calzavara et al. (<xref ref-type="bibr" rid="B26">2022</xref>) propose a solution that verifies <italic>resilience</italic>, a generalization of robustness that considers all possible test sets that could be sampled.</p>
<p>A second set of approaches uses techniques tailored to tree ensembles, rather than off-the-shelf solvers. These tend to be more efficient, but approximate. Chen et al. (<xref ref-type="bibr" rid="B32">2019b</xref>) reformulate the verification task as a maximum-clique problem in an <italic>M</italic>-partite graph, with <italic>M</italic> the number of trees. Wang et al. (<xref ref-type="bibr" rid="B156">2020</xref>) extend Chen et al. (<xref ref-type="bibr" rid="B32">2019b</xref>), which only supports the <italic>l</italic><sub>&#x0221E;</sub> norm, to any <italic>l</italic><sub><italic>p</italic></sub> norm, <italic>p</italic> &#x02208; [0, &#x0221E;]. These two approaches are fast, but only produce a coarse lower bound on the robustness value, and do not generate adversarial examples. These issues are resolved by Devos et al. (<xref ref-type="bibr" rid="B46">2021b</xref>), who propose a heuristic search in the same graph representation that can generate concrete examples and produces anytime lower and upper bounds on the ensemble&#x00027;s output. Zhang et al. (<xref ref-type="bibr" rid="B169">2020a</xref>) use the concept of neighboring cliques (which they call <italic>leaf tuples</italic>) in an efficient greedy search procedure that changes only one component of the clique per step. They focus on adversarial example generation rather than robustness checking.</p>
<p>The MILP approach by Kantchelian et al. (<xref ref-type="bibr" rid="B85">2016</xref>) is the most frequently used baseline and evaluation tool for robust tree methods in the literature (see Section 6.2.2). It produces exact results within a reasonable time frame. The other exact approaches using SMT are less efficient. It is unclear how well the methods based on abstract interpretation perform in practice, as they are only evaluated on smaller datasets. The same is true for the method of T&#x000F6;rnblom and Nadjm-Tehrani (<xref ref-type="bibr" rid="B143">2020</xref>).</p>
<p>In the authors&#x00027; experience, Zhang et al. (<xref ref-type="bibr" rid="B169">2020a</xref>) and Devos et al. (<xref ref-type="bibr" rid="B46">2021b</xref>) offer the best tradeoff between accuracy and efficiency for adversarial example generation, whereas for robustness checking, Kantchelian et al. (<xref ref-type="bibr" rid="B85">2016</xref>) and Devos et al. (<xref ref-type="bibr" rid="B46">2021b</xref>) are recommended.</p>
</sec>
<sec>
<title>6.2.2. Improving robustness</title>
<p>From the multitude of papers and methods in the previous section, it is clear that decision tree ensembles are not robust. Hence, researchers have investigated making decision trees and their ensembles more robust. <xref ref-type="table" rid="T2">Table 2</xref> gives an overview of the available methods [extending the overviews by Vos and Verwer (<xref ref-type="bibr" rid="B153">2021</xref>) and Guo et al. (<xref ref-type="bibr" rid="B70">2022</xref>)].</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Overview of methods for robust decision tree learning.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="left"><bold>Ensemble</bold></th>
<th valign="top" align="left"><bold>Complexity</bold></th>
<th valign="top" align="left"><bold>Norm</bold></th>
<th valign="top" align="left"><bold>Guarantees?</bold></th>
<th valign="top" align="left"><bold>Code available?</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Adversarial boosting (Kantchelian et al., <xref ref-type="bibr" rid="B85">2016</xref>)</td>
<td valign="top" align="left">GB</td>
<td valign="top" align="left"><italic>n</italic>log(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>0</sub></td>
<td valign="top" align="left">n</td>
<td valign="top" align="left">n</td>
</tr> <tr>
<td valign="top" align="left">RobustTrees (Chen et al., <xref ref-type="bibr" rid="B31">2019a</xref>)</td>
<td valign="top" align="left">RF&#x0002B;GB</td>
<td valign="top" align="left"><italic>n</italic>log(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub></td>
<td valign="top" align="left">n</td>
<td valign="top" align="left">y</td>
</tr> <tr>
<td valign="top" align="left">RobustStumps (Andriushchenko and Hein, <xref ref-type="bibr" rid="B7">2019</xref>)</td>
<td valign="top" align="left">GB</td>
<td valign="top" align="left"><italic>n</italic><sup>2</sup></td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub></td>
<td valign="top" align="left">y</td>
<td valign="top" align="left">y</td>
</tr> <tr>
<td valign="top" align="left">TREANT (Calzavara et al., <xref ref-type="bibr" rid="B29">2020b</xref>)</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left"><italic>n</italic><sup>2</sup></td>
<td valign="top" align="left"><italic>l</italic><sub><italic>p</italic></sub><xref ref-type="table-fn" rid="TN2a"><sup>a</sup></xref></td>
<td valign="top" align="left">y</td>
<td valign="top" align="left">y<xref ref-type="table-fn" rid="TN2b"><sup>b</sup></xref></td>
</tr> <tr>
<td valign="top" align="left">MetaSilvae (Ranzato and Zanella, <xref ref-type="bibr" rid="B128">2021</xref>)</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">?<xref ref-type="table-fn" rid="TN2c"><sup>c</sup></xref></td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub></td>
<td valign="top" align="left">n</td>
<td valign="top" align="left">y</td>
</tr> <tr>
<td valign="top" align="left">Feat. Part. Forests (Calzavara et al., <xref ref-type="bibr" rid="B28">2021</xref>)</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left"><italic>n</italic>log(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>0</sub></td>
<td valign="top" align="left">y</td>
<td valign="top" align="left">n</td>
</tr> <tr>
<td valign="top" align="left">GROOT (Vos and Verwer, <xref ref-type="bibr" rid="B153">2021</xref>)</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left"><italic>n</italic>log(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub></td>
<td valign="top" align="left">n</td>
<td valign="top" align="left">y</td>
</tr> <tr>
<td valign="top" align="left">CostAwareRobust (Chen et al., <xref ref-type="bibr" rid="B34">2021</xref>)</td>
<td valign="top" align="left">RF&#x0002B;GB</td>
<td valign="top" align="left"><italic>n</italic>log(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub><xref ref-type="table-fn" rid="TN2d"><sup>d</sup></xref></td>
<td valign="top" align="left">n</td>
<td valign="top" align="left">y</td>
</tr> <tr>
<td valign="top" align="left">ROCT (Vos and Verwer, <xref ref-type="bibr" rid="B154">2022</xref>)</td>
<td valign="top" align="left">Single</td>
<td valign="top" align="left">exp(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub><xref ref-type="table-fn" rid="TN2d"><sup>d</sup></xref></td>
<td valign="top" align="left">y</td>
<td valign="top" align="left">y</td>
</tr> <tr>
<td valign="top" align="left">Relabeling (Vos and Verwer, <xref ref-type="bibr" rid="B155">2023</xref>)</td>
<td valign="top" align="left">RF&#x0002B;GB</td>
<td valign="top" align="left"><italic>n</italic><sup>2.5</sup></td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub><xref ref-type="table-fn" rid="TN2e"><sup>e</sup></xref></td>
<td valign="top" align="left">y</td>
<td valign="top" align="left">y</td>
</tr> <tr>
<td valign="top" align="left">FPRDT (Guo et al., <xref ref-type="bibr" rid="B70">2022</xref>)</td>
<td valign="top" align="left">Single</td>
<td valign="top" align="left"><italic>n</italic>log(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub></td>
<td valign="top" align="left">y</td>
<td valign="top" align="left">n</td>
</tr> <tr>
<td valign="top" align="left">PRAdaBoost (Guo et al., <xref ref-type="bibr" rid="B70">2022</xref>)</td>
<td valign="top" align="left">Ada</td>
<td valign="top" align="left"><italic>n</italic>log(<italic>n</italic>)</td>
<td valign="top" align="left"><italic>l</italic><sub>&#x0221E;</sub></td>
<td valign="top" align="left">y</td>
<td valign="top" align="left">n</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Some methods robustify a single tree (<italic>Single</italic>), and others are used in ensembles: random forest (<italic>RF</italic>), gradient boosting (<italic>GB</italic>) or AdaBoost (<italic>Ada</italic>). This is indicated in the <italic>Ensemble</italic> column. The <italic>Complexity</italic> column shows the complexity of the learning algorithm in the number of examples <italic>n</italic>; the number of features and the size of the models are ignored. The <italic>Norm</italic> column lists the attack model that is considered by the method. The <italic>Guarantees</italic> column has a yes (<italic>y</italic>) value when the learned models are guaranteed to be robust.</p>
<fn id="TN2a"><label>a</label><p>TREANT has a flexible attack model in the form of rewriting rules, allowing asymmetric perturbations (e.g. only positive), and a maximum budget (e.g. an <italic>l</italic><sub>1</sub>-norm).</p></fn>
<fn id="TN2b"><label>b</label><p>Vos and Verwer (<xref ref-type="bibr" rid="B153">2021</xref>) provide an alternative implementation.</p></fn>
<fn id="TN2c"><label>c</label><p>The paper includes experiments that show that the genetic algorithm converges in 50-70 iterations for the tested datasets.</p></fn>
<fn id="TN2d"><label>d</label><p>An asymmetric attack model is supported, i.e., it is possible to allow larger positive than negative perturbations, but it is still a box constraint.</p></fn>
<fn id="TN2e"><label>e</label><p>Other norms are possible, but this is not evaluated.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>In general, robust training can be formulated as the following min-max problem (Madry et al., <xref ref-type="bibr" rid="B103">2018</xref>), with attack model <italic>N</italic>(<italic>x</italic>), loss function <italic>l</italic>, training examples (<italic>x</italic><sub><italic>i</italic></sub>, <italic>y</italic><sub><italic>i</italic></sub>), and ensemble <italic>f</italic>:</p>
<disp-formula id="E2"><label>(1)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The outer minimization is the usual learning optimization problem minimizing the loss function. The inner maximization models the worst-case scenario where an adversary attempts to maximize the loss of the model (e.g., flip the label) in the neighborhood of a training example <italic>x</italic>.</p>
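<p>A minimal sketch of Equation 1 for one-dimensional data follows; all modeling choices here are ours, not taken from a specific paper. The &#x0201C;model&#x0201D; is a single threshold classifier, the inner maximization under an <italic>l</italic><sub>&#x0221E;</sub> attack model with 0/1 loss is solved exactly by checking the two endpoints of the perturbation interval, and the outer minimization is a brute-force search over candidate thresholds.</p>

```python
# Illustrative sketch (ours): robust training of a 1-D threshold classifier
# under an l_inf attack model, i.e., Equation 1 with 0/1 loss.

def predict(thr, x):
    return 1 if x > thr else 0

def robust_loss(thr, xs, ys, eps):
    # Inner max: an example counts as a loss if ANY point of [x-eps, x+eps]
    # is misclassified; since predict is monotone in x, the endpoints suffice.
    return sum(1 for x, y in zip(xs, ys)
               if predict(thr, x - eps) != y or predict(thr, x + eps) != y)

def robust_train(xs, ys, eps):
    # Outer min: brute-force over midpoints between consecutive examples.
    ts = sorted(xs)
    candidates = [(a + b) / 2 for a, b in zip(ts, ts[1:])]
    candidates += [ts[0] - 1.0, ts[-1] + 1.0]
    return min(candidates, key=lambda t: robust_loss(t, xs, ys, eps))

xs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
thr = robust_train(xs, ys, eps=0.05)  # picks 0.5, the split farthest from the data
```

<p>Larger values of &#x003F5; force the learner toward thresholds that keep a margin from the training points, mirroring the intuition of avoiding splits near many examples.</p>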
<p>The problem is tackled from multiple angles. Kantchelian et al. (<xref ref-type="bibr" rid="B85">2016</xref>) propose enriching the training data with adversarial examples, as had been done before for neural networks (Szegedy et al., <xref ref-type="bibr" rid="B140">2013</xref>). Vos and Verwer (<xref ref-type="bibr" rid="B154">2022</xref>) use the ideas from optimal trees (see Section 4) to learn robust trees. Ranzato and Zanella (<xref ref-type="bibr" rid="B128">2021</xref>) use a genetic algorithm to learn robust trees. A number of papers change the splitting procedure used during the construction of the trees. The main idea is to avoid splitting thresholds that lie in dense areas; examples with values close to those thresholds can easily jump to the other side with a small perturbation. Others look at the global loss, assume a tree structure or a specific loss function, and rewrite Equation 1 to simplify the problem. Lastly, there are two approaches that are orthogonal to the previous methods. The first is a pre-processing procedure that partitions the features between the trees in the ensemble in such a way that it becomes impossible to ever trick the majority of the trees (Calzavara et al., <xref ref-type="bibr" rid="B28">2021</xref>). The second is a post-processing procedure that relabels the leaves of the ensemble to make it more difficult to find neighboring leaves that predict different classes (Vos and Verwer, <xref ref-type="bibr" rid="B155">2023</xref>).</p>
<p>Chen et al. (<xref ref-type="bibr" rid="B31">2019a</xref>), Vos and Verwer (<xref ref-type="bibr" rid="B153">2021</xref>), and Chen et al. (<xref ref-type="bibr" rid="B34">2021</xref>) propose changes to the splitting procedure. Chen et al. (<xref ref-type="bibr" rid="B31">2019a</xref>) consider the <italic>ambiguity set</italic> of examples that can flip sides under an &#x003F5; perturbation, and propose a combinatorial optimization problem that finds the configuration of the examples in the ambiguity set that maximally worsens the loss (the maximization in Equation 1). This combinatorial problem cannot be solved in practice, so both Chen et al. (<xref ref-type="bibr" rid="B31">2019a</xref>) and Chen et al. (<xref ref-type="bibr" rid="B34">2021</xref>) introduce approximations. Vos and Verwer (<xref ref-type="bibr" rid="B153">2021</xref>) improve upon this work by proposing an exact analytical solution for the Gini impurity. The result is a scalable, robust decision tree learner called GROOT. The method proposed by Calzavara et al. (<xref ref-type="bibr" rid="B29">2020b</xref>) is similar, but solves the combinatorial problem as a convex numerical optimization problem. Additionally, they ensure global robustness by introducing the <italic>attack invariance</italic> property, which keeps track of the attack surface across different leaves in the tree.</p>
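<p>The ambiguity set and its worst case can be made concrete as follows. The sketch below is our own illustration: examples within &#x003F5; of a candidate threshold may be pushed to either side by the adversary, who picks the assignment that maximizes the weighted Gini impurity of the split. The enumeration is exponential in the size of the ambiguity set, which is exactly why the cited methods resort to approximations or, for the Gini impurity, an analytical solution.</p>

```python
# Illustrative sketch (ours): exact worst-case split score for one candidate
# threshold, by enumerating adversarial assignments of the ambiguity set.
from itertools import product

def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)

def worst_case_split_score(xs, ys, thr, eps):
    """Worst-case weighted Gini of a split at `thr` when every example
    within eps of the threshold may be pushed to either side."""
    left, right, ambiguous = [0, 0], [0, 0], []
    for x, y in zip(xs, ys):                  # binary labels y in {0, 1}
        if abs(x - thr) <= eps:
            ambiguous.append(y)               # the ambiguity set
        elif x <= thr:
            left[y] += 1
        else:
            right[y] += 1
    worst = -1.0
    for sides in product((0, 1), repeat=len(ambiguous)):
        l, r = left[:], right[:]
        for y, s in zip(ambiguous, sides):    # adversarial side assignment
            (l if s == 0 else r)[y] += 1
        n = sum(l) + sum(r)
        worst = max(worst, (sum(l) * gini(l) + sum(r) * gini(r)) / n)
    return worst
```

<p>A robust learner evaluates each candidate threshold with such a worst-case score instead of the ordinary impurity.</p>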
<p>Andriushchenko and Hein (<xref ref-type="bibr" rid="B7">2019</xref>) and Guo et al. (<xref ref-type="bibr" rid="B70">2022</xref>) look at the global loss of Equation 1. Andriushchenko and Hein (<xref ref-type="bibr" rid="B7">2019</xref>) limit the weak learners to decision stumps (trees with a single split at the root and two leaves). This allows them to split up the problem into one independent problem per attribute. Guo et al. (<xref ref-type="bibr" rid="B70">2022</xref>) consider the 0/1 loss and observe that this loss can be used directly to evaluate split candidates in constant time. Ranzato and Zanella (<xref ref-type="bibr" rid="B128">2021</xref>), Vos and Verwer (<xref ref-type="bibr" rid="B154">2022</xref>), and Guo et al. (<xref ref-type="bibr" rid="B70">2022</xref>) all use variants of a <italic>robust</italic> 0/1 loss, where an example <italic>x</italic> is only considered correctly classified when all instances in <italic>N</italic>(<italic>x</italic>) receive the same label.</p>
</sec>
</sec>
<sec>
<title>6.3. Explainability</title>
<p>In critical domains such as finance and health, the adoption of machine learning models may require trustworthy guarantees such as transparency.<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref> In addition to giving accurate predictions, machine learning models must then provide explanations for their predictions in human-understandable terms (Ribeiro et al., <xref ref-type="bibr" rid="B129">2016a</xref>; Doshi-Velez and Kim, <xref ref-type="bibr" rid="B50">2017</xref>). This motivates the research area called eXplainable Artificial Intelligence (XAI). In XAI, there are several types of explanations, including decision rules, visualizations, variable importance, and counterfactual explanations. Below, we describe how decision trees can play a role in this.</p>
<sec>
<title>6.3.1. Decision tree explanations</title>
<p>By nature, decision trees can explain their predictions using decision rules of the form &#x0201C;decision <italic>D</italic> was made because condition <italic>C</italic> was fulfilled&#x0201D;. The explanatory power of decision tree models can be further improved by constraining their complexity (e.g., find the maximally accurate tree of depth at most <italic>d</italic>, see also Section 4) or by explicating relevant properties, such as which features are most important and how features interact (e.g., Lundberg et al., <xref ref-type="bibr" rid="B102">2019</xref>).</p>
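<p>Such rules correspond to root-to-leaf paths: the conjunction of the tests along a path is the condition <italic>C</italic> and the leaf gives the decision <italic>D</italic>. A minimal sketch (the tree representation, feature names, and thresholds are hypothetical, chosen only for illustration):</p>

```python
# A small univariate tree as nested dicts: internal nodes test
# feature <= threshold; leaves carry a decision.
tree = {
    "feature": "income", "threshold": 40_000,
    "left": {"decision": "reject"},
    "right": {
        "feature": "debt_ratio", "threshold": 0.35,
        "left": {"decision": "accept"},
        "right": {"decision": "reject"},
    },
}

def extract_rules(node, conditions=()):
    """Turn each root-to-leaf path into a rule of the form
    'decision D because condition C'."""
    if "decision" in node:
        cond = " and ".join(conditions) if conditions else "always"
        return [f"{node['decision']} because {cond}"]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + (f"{f} <= {t}",))
            + extract_rules(node["right"], conditions + (f"{f} > {t}",)))

for rule in extract_rules(tree):
    print(rule)
```

<p>Each leaf yields exactly one rule, so the number of rules (here three) equals the number of leaves, which is why constraining tree depth directly bounds the size of the explanation.</p>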
<p>Given these desirable properties, methods have been designed to approximate black-box models (or parts of them) with decision trees, so that these properties are inherited to some extent.</p>
<p>The most classical approach where decision trees are used is <bold>knowledge distillation</bold>. In knowledge distillation, decision trees <italic>f</italic> are trained to approximate the black-box model <italic>g</italic> either <italic>locally</italic>, in the neighborhood of an instance <italic>x</italic> (<inline-formula><mml:math id="M5"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mi>d</mml:mi><mml:mstyle mathsize="1.19em"><mml:mrow><mml:mo stretchy="false">(</mml:mo></mml:mrow></mml:mstyle><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mstyle mathsize="1.19em"><mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle><mml:mo>&#x0002B;</mml:mo><mml:mo>&#x003A9;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>), or <italic>globally</italic> (<inline-formula><mml:math id="M6"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:munder><mml:mi>d</mml:mi><mml:mstyle mathsize="1.19em"><mml:mrow><mml:mo 
stretchy="false">(</mml:mo></mml:mrow></mml:mstyle><mml:mi>f</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi><mml:mstyle mathsize="1.19em"><mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle><mml:mo>&#x0002B;</mml:mo><mml:mo>&#x003A9;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>), where &#x003A9;(<italic>f</italic>) is the complexity of the decision tree <italic>f</italic>. For local explainability, a well-known method is LIME (Ribeiro et al., <xref ref-type="bibr" rid="B130">2016b</xref>). In LIME, decision trees can be used as the interpretable model that locally approximates the black-box model. For global explainability, examples of methods that use decision trees include TREPAN (Craven and Shavlik, <xref ref-type="bibr" rid="B39">1995</xref>), its improved version TREPAN Reloaded (Confalonieri et al., <xref ref-type="bibr" rid="B36">2020</xref>) and the soft distilled decision tree of Frosst and Hinton (<xref ref-type="bibr" rid="B61">2018</xref>). In knowledge distillation, the standard setup is to use the empirical distribution to approximate the black-box model. However, for a better approximation, the work of Bastani et al. (<xref ref-type="bibr" rid="B11">2017</xref>) distills random forests into a decision tree using samples from a fitted mixture of truncated normal distributions.</p>
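<p>The global objective above can be sketched in a few lines: sample inputs, query the black box <italic>g</italic>, and pick the tree <italic>f</italic> minimizing the empirical disagreement <italic>d</italic>(<italic>f</italic>, <italic>g</italic>). The toy code below (hypothetical names; the surrogate is restricted to a single stump so that &#x003A9;(<italic>f</italic>) is constant, which is a simplification and not any cited method) makes this concrete.</p>

```python
def black_box(x):
    # stand-in for an opaque model g whose internals we cannot inspect
    return 1 if x > 0.3 else 0

def distill_stump(g, samples):
    """Global distillation sketch: choose the decision stump f minimizing the
    empirical disagreement d(f, g) over sampled inputs. The complexity term
    Omega(f) is constant here, since every stump has exactly one split."""
    preds = [g(x) for x in samples]
    best = None
    for t in sorted(samples):                 # candidate thresholds
        for left, right in ((0, 1), (1, 0)):  # candidate leaf labels
            disagreement = sum(
                (left if x <= t else right) != p for x, p in zip(samples, preds)
            )
            if best is None or disagreement < best[0]:
                best = (disagreement, t, left, right)
    return best

samples = [i / 200 for i in range(1, 200)]    # hypothetical unlabeled pool
disagreement, threshold, left, right = distill_stump(black_box, samples)
print(disagreement, threshold)  # 0 0.3: the surrogate matches g on the pool
```

<p>Methods such as TREPAN differ mainly in how they generate the samples (e.g., querying the black box on synthetic inputs where training data is sparse) and in allowing deeper surrogate trees, which is where the complexity penalty &#x003A9;(<italic>f</italic>) becomes meaningful.</p>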
<p>Methods using the knowledge distillation approach may not work well with high-dimensional data such as image and text data, because interpretable univariate trees may not be suitable for this type of data. That is why several methods have been proposed that design new models using or having decision trees as <bold>interpretable components</bold>. For example, Wu et al. (<xref ref-type="bibr" rid="B160">2018</xref>) and Okajima and Sadamasa (<xref ref-type="bibr" rid="B119">2019</xref>) use decision trees and extracted decision rules, respectively, to constrain deep neural networks for improved interpretability. Other methods, such as neural prototype trees (ProtoTrees) (Nauta et al., <xref ref-type="bibr" rid="B114">2021</xref>) and recurrent decision tree models (Alaniz et al., <xref ref-type="bibr" rid="B5">2021</xref>), integrate decision trees in fully differentiable models with convolutional neural networks and recurrent neural networks, respectively. Besides this, there is a considerable literature on neural tree approaches (see the dNDF in Section 4.4) that aim to combine neural networks and decision tree models with the ambition of obtaining the advantages of both: interpretability without sacrificing accuracy. The recent survey of Li et al. (<xref ref-type="bibr" rid="B94">2022</xref>) analyses the majority of this work.</p>
</sec>
<sec>
<title>6.3.2. Counterfactual explanations of decision trees and forests</title>
<p>In the previous section, decision trees were mainly described as providing explanations of black-box models through decision rule explanations. This section focuses on a different type of explanation, the counterfactual explanation, which may be used to locally explain soft decision trees and tree ensembles.</p>
<p>Counterfactual explanations aim to answer questions of the type &#x0201C;what should I have done differently to get a different outcome?&#x0201D;. They provide minimal changes that can be made to the input features of an instance <italic>x</italic> to change its prediction <italic>f</italic>(<italic>x</italic>). More formally, in the simplest setting, a counterfactual explanation is an instance <italic>x</italic>&#x02032; such that (1) the prediction for <italic>x</italic>&#x02032; differs from <italic>f</italic>(<italic>x</italic>), i.e., <italic>f</italic>(<italic>x</italic>&#x02032;) &#x02260; <italic>f</italic>(<italic>x</italic>), (2) <italic>x</italic> and <italic>x</italic>&#x02032; are close under some metric, and (3) <italic>x</italic>&#x02032; is a plausible input (Albini et al., <xref ref-type="bibr" rid="B6">2022</xref>), where plausible may mean a realistic instance that lies on the data manifold. Apart from the requirement of plausibility, counterfactual explanations are closely related to adversarial examples (described in Section 5.2), and their generation can be framed as a constrained optimization problem. Since univariate decision trees are interpretable by design (through decision rules), there is little interest in providing counterfactual explanations for them. However, there is growing interest in designing methods that can provide this type of explanation for oblique decision trees and tree ensembles.</p>
<p>Although several model-agnostic methods (i.e., methods that do not depend on the model class) exist to generate counterfactual explanations [see the recent survey of Guidotti (<xref ref-type="bibr" rid="B69">2022</xref>)], few of them apply to (oblique) decision trees because of their non-differentiability. This motivated Carreira-Perpi&#x000F1;&#x000E1;n and Hada (<xref ref-type="bibr" rid="B30">2021</xref>) to propose a closed-form solution (resp. quadratic program) for univariate decision trees (resp. oblique decision trees) to find counterfactual explanations.</p>
<p>For tree ensemble models, Cui et al. (<xref ref-type="bibr" rid="B42">2015</xref>) showed that the constrained optimization problem of generating counterfactual explanations is NP-hard. Through the lens of optimization, approaches to generating counterfactual explanations for tree ensembles can therefore be divided into heuristic-based and optimality-based approaches.</p>
<p>Tolomei et al. (<xref ref-type="bibr" rid="B142">2017</xref>) propose a heuristic method that sidesteps the computational complexity by restricting the search for counterfactual explanations to perturbations that make at least half of the decision trees in the ensemble give the desired outcome.</p>
<p>The majority of optimality-based approaches leverage MILP solvers to model the generation of <italic>optimal</italic> counterfactual explanations for a tree ensemble. Among the earliest works in this direction is that of Cui et al. (<xref ref-type="bibr" rid="B42">2015</xref>). While their framework was general enough to cope with all <italic>l</italic><sub><italic>p</italic></sub> norms, Cui et al. (<xref ref-type="bibr" rid="B42">2015</xref>) eventually consider only the Mahalanobis distance and use a discretization of the input space so that the MILP problem can be modeled with only integer variables. Still using integer variables, this framework has recently been improved by Kanamori et al. (<xref ref-type="bibr" rid="B84">2021</xref>), who extend it to the <italic>l</italic><sub>1</sub> norm. Observing that integer variables usually slow down the optimization performed by the MILP solver (due to their role in branch and bound), Parmentier and Vidal (<xref ref-type="bibr" rid="B121">2021</xref>) recently introduced a new MILP formulation that significantly reduces the number of integer variables. As a result, their formulation makes it possible to generate optimal counterfactual explanations in seconds for moderate-sized problems (hundreds of trees and over fifty features).</p>
<p>The different types of approaches among tree learners (see Section 4) are also reflected here; for instance, Lucic et al. (<xref ref-type="bibr" rid="B101">2022</xref>) show how methods originally proposed for differentiable models can be used with tree ensembles.</p>
<p>A significant issue with counterfactual examples for trees and ensembles is their robustness to changes in the model: an example that is counterfactual for a model may no longer be so if the model is retrained on slightly different data. Dutta et al. (<xref ref-type="bibr" rid="B52">2022</xref>) study how to generate robust counterfactual examples.</p>
</sec>
</sec>
</sec>
<sec id="s7">
<title>7. Challenges and perspectives</title>
<p>The preceding sections make clear that there is a recent resurgence in the use of decision trees. Their discrete nature readily allows (1) the extraction of human-readable decision rules and (2) the full verification of the input-output mapping defined by the trees. This stands in stark contrast to (deep) neural networks. While the performance of deep learning is unchallenged on many tasks, extracting human-interpretable information about a network is much more challenging and, as it stands now, verification of neural networks seems more difficult to scale to realistic problem scenarios than verification of tree ensembles. For an overview of verification in deep neural networks, see Liu et al. (<xref ref-type="bibr" rid="B98">2021</xref>).</p>
<p>The aforementioned reasons explain why many approaches are again considering decision trees, either by themselves, or as surrogate models. For example, decision trees are often the target models in knowledge distillation for interpretability and explanations (Ribeiro et al., <xref ref-type="bibr" rid="B130">2016b</xref>; Confalonieri et al., <xref ref-type="bibr" rid="B36">2020</xref>) (see Section 6.3), and they are used in reinforcement learning for policy verification (Bastani et al., <xref ref-type="bibr" rid="B12">2018</xref>; Milani et al., <xref ref-type="bibr" rid="B108">2022</xref>). While neural networks are particularly well-suited to reinforcement learning given their natural ability to continuously update the weights, we see that decision trees are used again purely for their interpretability and ease of verification, even when that means giving up some performance.</p>
<p>As discussed in Section 6.2, robustness is a major open challenge in decision tree ensembles, and the field of robust trees is growing rapidly, with multiple dimensions being explored simultaneously. A first important issue arises because most tree ensembles are non-continuous, non-smooth step functions, which means it is not straightforward to reason about the smoothness of the function. Robustness and smoothness are linked (Yang et al., <xref ref-type="bibr" rid="B164">2020</xref>): a model that is non-robust for an instance <italic>x</italic>, i.e., a perturbed example <italic>x</italic>&#x02032; close to <italic>x</italic> exists with a different prediction, must inevitably have a large rate of change in output between <italic>x</italic> and <italic>x</italic>&#x02032;. Tree ensembles predict constant values for discrete subsections of the input space, and there is no smoothness constraint between the values predicted for neighboring subsections. Contrast this with smooth continuous functions, where assumptions can be made about the rate of change between close points in the input space. A second important issue is due to the fact that the number of attributes tested to reach a leaf is limited by the depth of the leaf in the tree. Assuming that the number of attributes in the data is relatively large, many of the attributes are unconstrained given the prediction of a particular tree. In an ensemble, the next tree is likely to pick different attributes. Correlations exist in the data distribution, but the trees do not strictly enforce them: after all, a split on a strongly correlated feature is unlikely to yield a better partitioning of the data. An attacker can exploit this as follows. Making a small perturbation in one attribute might flip a split in one tree, but will not affect another tree that happened to split on a correlated attribute. An attacker can use this to carefully select the branches in the trees and attain a desired outcome.</p>
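<p>A toy illustration of this second issue (entirely hypothetical features, thresholds, and data): suppose two features are perfectly correlated in the data, and an ensemble of two stumps splits on a different one each. Perturbing a single feature flips one tree while leaving the other untouched, even though the perturbed input violates the correlation and lies off the data manifold.</p>

```python
# Hypothetical setting: features x[0] and x[1] are perfectly correlated
# in-distribution (x[0] == x[1]); each stump splits on a different one.
def tree0(x):
    return 1.0 if x[0] > 0.5 else 0.0

def tree1(x):
    return 1.0 if x[1] > 0.5 else 0.0

def ensemble_score(x):
    # average vote of the two stumps
    return (tree0(x) + tree1(x)) / 2

x = (0.55, 0.55)            # in-distribution point, confidently class 1
adv = (0.45, 0.55)          # perturb only x[0] by 0.1
print(ensemble_score(x))    # 1.0
print(ensemble_score(adv))  # 0.5: one tree flipped, the other is unaffected
# Neither tree ever checks x[0] == x[1], so the off-manifold input goes
# unnoticed; attacking each feature independently flips each tree in turn.
adv2 = (0.45, 0.45)
print(ensemble_score(adv2)) # 0.0
```
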
<p>Section 6 showed challenges tackled with tree-based models in the context of responsible AI. These issues, which include fairness, robustness and explainability, are mainly addressed <italic>in isolation</italic> in the literature, although they are clearly related (see, e.g., the analogy between counterfactual explanations and adversarial example generation in Sections 5.2 and 6.3.2). There are very few studies that link all these components in a single framework or that thoroughly investigate the possible (in)compatibility of the requirements of responsible and trustworthy AI. It is expected that future studies will fill this gap, in particular for tree-based methods.</p>
</sec>
<sec id="s8">
<title>8. Concluding remarks</title>
<p>Decision trees have been a cornerstone of machine learning from its very beginning, and will likely remain so for decades to come. Some reasons for this are:</p>
<list list-type="order">
<list-item><p>Predictive performance: Tree ensembles have unrivaled predictive accuracy when learning from tabular data.</p></list-item>
<list-item><p>Efficiency: Trees can be learned from relatively small amounts of data. Learning is very fast, prediction extremely fast; this is useful especially when deploying models on mobile devices.</p></list-item>
<list-item><p>Ease of use: Good results can often be obtained without hyperparameter tuning (though tuning may further improve them).</p></list-item>
<list-item><p>Interpretability: Individual predictions are easy to interpret.</p></list-item>
<list-item><p>Flexibility: Tree learning algorithms are easily adapted for tasks beyond the usual classification and regression.</p></list-item>
<list-item><p>Versatility: Decision trees can be used for a wide variety of tasks, ranging well beyond classification and regression.</p></list-item>
<list-item><p>Suitability for auxiliary use: Decision trees are often useful as auxiliary models, and are easily integrated in other systems.</p></list-item>
<list-item><p>Verifiability: The structure of trees and forests is such that they can be subjected to formal verification.</p></list-item>
<list-item><p>Constrainability: Structural and semantic constraints can be imposed on trees and ensembles.</p></list-item>
</list>
<p>Properties 1&#x02013;4 explain the continued popularity of decision trees for predictive modeling. Properties 5&#x02013;9 make decision trees and forests very useful in the context of responsible AI: they facilitate the development of AI systems that are accurate, robust, fair, and transparent. It seems likely that decision tree based models will continue to be useful when other challenges arise in AI, and that research on decision trees (both how to learn them, and how to use them) will remain relevant in the future.</p>
<p>Research directions hitherto little explored include: (a) extending the &#x0201C;optimal tree learning&#x0201D; methods to dynamic settings (incremental learning, concept drift), ensembles, and variants of decision trees (oblique trees, multi-target trees, ranking, predictive clustering trees, etc.); (b) imposing and/or verifying a broader variety of constraints on trees and ensembles; (c) exploiting the commonalities between domains currently mostly studied in isolation, such as robust and explanatory AI; (d) cross-comparing methods from different paradigms (such as combinatorial solvers versus evolutionary approaches). Further developments in the area of responsible AI will likely keep exploiting existing decision tree technologies as well as motivate new research.</p>
</sec>
<sec sec-type="author-contributions" id="s9">
<title>Author contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="s10">
<title>Funding</title>
<p>This work was supported by the Research Foundation&#x02013;Flanders and the Fonds de la Recherche Scientifique&#x02013;FNRS under EOS No. 30992574 (VeriLearn) and by the Flemish Government (AI Research Program).</p>
</sec>
<ack><p>We thank the reviewers for their extensive constructive comments, which helped improve this article.</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>E.g., when only three instances remain, any binary test that on the population does not correlate with the class labels still has a probability of 1/4 of yielding a perfect split.</p></fn>
<fn id="fn0002"><p><sup>2</sup>See, e.g., the Kaggle platform for machine learning competitions.</p></fn>
<fn id="fn0003"><p><sup>3</sup>Tuning may still be helpful, but using standard settings typically already gives good results.</p></fn>
<fn id="fn0004"><p><sup>4</sup>In this text, the term efficiency typically refers to computational efficiency; we explicitly write <italic>runtime efficiency</italic> and <italic>memory efficiency</italic> in other cases.</p></fn>
<fn id="fn0005"><p><sup>5</sup>E.g., some papers use asymmetric attack models where different perturbation sizes are used in either direction (T&#x000F6;rnblom and Nadjm-Tehrani, <xref ref-type="bibr" rid="B143">2020</xref>; Devos et al., <xref ref-type="bibr" rid="B46">2021b</xref>; Vos and Verwer, <xref ref-type="bibr" rid="B153">2021</xref>). Others add a maximum budget (Devos et al., <xref ref-type="bibr" rid="B45">2021a</xref>) and others again use rewriting rules, which corresponds to a conditional asymmetric model with a maximum budget (Calzavara et al., <xref ref-type="bibr" rid="B29">2020b</xref>).</p></fn>
<fn id="fn0006"><p><sup>6</sup>Communication from the Commission of 8 April 2019, Ethics Guidelines for Trustworthy AI, COM (2019).</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aghaei</surname> <given-names>S.</given-names></name> <name><surname>Azizi</surname> <given-names>M. J.</given-names></name> <name><surname>Vayanos</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Learning optimal and fair decision trees for non-discriminative decision-making</article-title>, in <source>Proceedings of the 33rd AAAI Conference on Artificial Intelligence</source> <fpage>1418</fpage>&#x02013;<lpage>1426</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33011418</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aglin</surname> <given-names>G.</given-names></name> <name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Schaus</surname> <given-names>P.</given-names></name></person-group> (<year>2020a</year>). <article-title>Learning optimal decision trees using caching branch-and-bound search</article-title>, in <source>Proceedings of the 34th AAAI Conference on Artificial Intelligence</source> <fpage>3146</fpage>&#x02013;<lpage>3153</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v34i04.5711</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aglin</surname> <given-names>G.</given-names></name> <name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Schaus</surname> <given-names>P.</given-names></name></person-group> (<year>2020b</year>). <article-title>Pydl8.5: a library for learning optimal decision trees</article-title>, in <source>Proceedings of the 29th International Joint Conference on Artificial Intelligence</source> <fpage>5222</fpage>&#x02013;<lpage>5224</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2020/750</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aglin</surname> <given-names>G.</given-names></name> <name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Schaus</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>Learning optimal decision trees under memory constraints</article-title>, in <source>Machine Learning and Knowledge Discovery in Databases (ECMLPKDD 2022), Part V</source> <fpage>393</fpage>&#x02013;<lpage>409</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-031-26419-1_24</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alaniz</surname> <given-names>S.</given-names></name> <name><surname>Marcos</surname> <given-names>D.</given-names></name> <name><surname>Schiele</surname> <given-names>B.</given-names></name> <name><surname>Akata</surname> <given-names>Z.</given-names></name></person-group> (<year>2021</year>). <article-title>Learning decision trees recurrently through communication</article-title>, in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> <fpage>13518</fpage>&#x02013;<lpage>13527</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR46437.2021.01331</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Albini</surname> <given-names>E.</given-names></name> <name><surname>Long</surname> <given-names>J.</given-names></name> <name><surname>Dervovic</surname> <given-names>D.</given-names></name> <name><surname>Magazzeni</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>Counterfactual shapley additive explanations</article-title>, in <source>2022 ACM Conference on Fairness, Accountability, and Transparency</source> <fpage>1054</fpage>&#x02013;<lpage>1070</lpage>. <pub-id pub-id-type="doi">10.1145/3531146.3533168</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andriushchenko</surname> <given-names>M.</given-names></name> <name><surname>Hein</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Provably robust boosted decision stumps and trees against adversarial attacks</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>12997</fpage>&#x02013;<lpage>13008</lpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Avellaneda</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>Efficient inference of optimal decision trees</article-title>, in <source>Proceedings of the 34th AAAI Conference on Artificial Intelligence</source> <fpage>3195</fpage>&#x02013;<lpage>3202</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v34i04.5717</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barros</surname> <given-names>R. C.</given-names></name> <name><surname>Basgalupp</surname> <given-names>M. P.</given-names></name> <name><surname>de Carvalho</surname> <given-names>A. C. P. L. F.</given-names></name> <name><surname>Freitas</surname> <given-names>A. A.</given-names></name></person-group> (<year>2012</year>). <article-title>A survey of evolutionary algorithms for decision-tree induction</article-title>. <source>IEEE Trans. Syst. Man, Cyber</source>. <volume>42</volume>, <fpage>291</fpage>&#x02013;<lpage>312</lpage>. <pub-id pub-id-type="doi">10.1109/TSMCC.2011.2157494</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Barros</surname> <given-names>R. C.</given-names></name> <name><surname>de Carvalho</surname> <given-names>A. C. P. L. F.</given-names></name> <name><surname>Freitas</surname> <given-names>A. A.</given-names></name></person-group> (<year>2015</year>). <source>Automatic Design of Decision-Tree Induction Algorithms</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Springer Briefs in Computer Science</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-3-319-14231-9</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bastani</surname> <given-names>O.</given-names></name> <name><surname>Kim</surname> <given-names>C.</given-names></name> <name><surname>Bastani</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Interpretability via model extraction. arXiv:1706.09773</article-title>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bastani</surname> <given-names>O.</given-names></name> <name><surname>Pu</surname> <given-names>Y.</given-names></name> <name><surname>Solar-Lezama</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Verifiable reinforcement learning via policy extraction</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>2499</fpage>&#x02013;<lpage>2509</lpage>.</citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bekker</surname> <given-names>J.</given-names></name> <name><surname>Davis</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Estimating the class prior in positive and unlabeled data through decision tree induction</article-title>, in <source>Proceedings of the 32nd AAAI Conference on Artificial Intelligence</source> <fpage>2712</fpage>&#x02013;<lpage>2719</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v32i1.11715</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ben-David</surname> <given-names>A.</given-names></name></person-group> (<year>1995</year>). <article-title>Monotonicity maintenance in information-theoretic machine learning algorithms</article-title>. <source>Mach. Learn</source>. <volume>19</volume>, <fpage>29</fpage>&#x02013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1007/BF00994659</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ben-Haim</surname> <given-names>Y.</given-names></name> <name><surname>Tom-Tov</surname> <given-names>E.</given-names></name></person-group> (<year>2010</year>). <article-title>A streaming parallel decision tree algorithm</article-title>. <source>J. Mach. Lear. Res</source>. <volume>11</volume>, <fpage>849</fpage>&#x02013;<lpage>872</lpage>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertsimas</surname> <given-names>D.</given-names></name> <name><surname>Dunn</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Optimal classification trees</article-title>. <source>Mach. Lear</source>. <volume>106</volume>, <fpage>1039</fpage>&#x02013;<lpage>1082</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-017-5633-9</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertsimas</surname> <given-names>D.</given-names></name> <name><surname>Dunn</surname> <given-names>J.</given-names></name> <name><surname>Gibson</surname> <given-names>E.</given-names></name> <name><surname>Orfanoudaki</surname> <given-names>A.</given-names></name></person-group> (<year>2022</year>). <article-title>Optimal survival trees</article-title>. <source>Mach. Lear</source>. <volume>111</volume>, <fpage>2951</fpage>&#x02013;<lpage>3023</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-021-06117-0</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bessiere</surname> <given-names>C.</given-names></name> <name><surname>Hebrard</surname> <given-names>E.</given-names></name> <name><surname>O&#x00027;Sullivan</surname> <given-names>B.</given-names></name></person-group> (<year>2009</year>). <article-title>Minimising decision tree size as combinatorial optimisation</article-title>, in <source>Principles and Practice of Constraint Programming - CP 2009</source> <fpage>173</fpage>&#x02013;<lpage>187</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-04244-7_16</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blockeel</surname> <given-names>H.</given-names></name> <name><surname>De Raedt</surname> <given-names>L.</given-names></name></person-group> (<year>1998</year>). <article-title>Top-down induction of first-order logical decision trees</article-title>. <source>Artif. Intell</source>. <volume>101</volume>, <fpage>285</fpage>&#x02013;<lpage>297</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(98)00034-4</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blockeel</surname> <given-names>H.</given-names></name> <name><surname>Page</surname> <given-names>D.</given-names></name> <name><surname>Srinivasan</surname> <given-names>A.</given-names></name></person-group> (<year>2005</year>). <article-title>Multi-instance tree learning</article-title>, in <source>Proceedings of the 22nd International Conference on Machine Learning</source> <fpage>57</fpage>&#x02013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1145/1102351.1102359</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blockeel</surname> <given-names>H.</given-names></name> <name><surname>Raedt</surname> <given-names>L. D.</given-names></name> <name><surname>Ramon</surname> <given-names>J.</given-names></name></person-group> (<year>1998</year>). <article-title>Top-down induction of clustering trees</article-title>, in <source>Proceedings of the 15th International Conference on Machine Learning</source> <fpage>55</fpage>&#x02013;<lpage>63</lpage>.</citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bou-Hamad</surname> <given-names>I.</given-names></name> <name><surname>Larocque</surname> <given-names>D.</given-names></name> <name><surname>Ben-Ameur</surname> <given-names>H.</given-names></name></person-group> (<year>2011</year>). <article-title>A review of survival trees</article-title>. <source>Stat. Surv</source>. <volume>5</volume>, <fpage>44</fpage>&#x02013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1214/09-SS047</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>1996</year>). <article-title>Bagging predictors</article-title>. <source>Mach. Learn</source>. <volume>24</volume>, <fpage>123</fpage>&#x02013;<lpage>140</lpage>. <pub-id pub-id-type="doi">10.1007/BF00058655</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn</source>. <volume>45</volume>, <fpage>5</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name> <name><surname>Friedman</surname> <given-names>J. H.</given-names></name> <name><surname>Olshen</surname> <given-names>R. A.</given-names></name> <name><surname>Stone</surname> <given-names>C. J.</given-names></name></person-group> (<year>1984</year>). <source>Classification and Regression Trees</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>Wadsworth</publisher-name>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calzavara</surname> <given-names>S.</given-names></name> <name><surname>Cazzaro</surname> <given-names>L.</given-names></name> <name><surname>Lucchese</surname> <given-names>C.</given-names></name> <name><surname>Marcuzzi</surname> <given-names>F.</given-names></name> <name><surname>Orlando</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Beyond robustness: Resilience verification of tree-based classifiers</article-title>. <source>Comput. Secur</source>. <volume>121</volume>, <fpage>102843</fpage>. <pub-id pub-id-type="doi">10.1016/j.cose.2022.102843</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calzavara</surname> <given-names>S.</given-names></name> <name><surname>Ferrara</surname> <given-names>P.</given-names></name> <name><surname>Lucchese</surname> <given-names>C.</given-names></name></person-group> (<year>2020a</year>). <article-title>Certifying decision trees against evasion attacks by program analysis</article-title>, in <source>Computer Security-ESORICS 2020: 25th European Symposium on Research in Computer Security</source> <fpage>421</fpage>&#x02013;<lpage>438</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-59013-0_21</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calzavara</surname> <given-names>S.</given-names></name> <name><surname>Lucchese</surname> <given-names>C.</given-names></name> <name><surname>Marcuzzi</surname> <given-names>F.</given-names></name> <name><surname>Orlando</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Feature partitioning for robust tree ensembles and their certification in adversarial scenarios</article-title>. <source>EURASIP J. Inform. Secur</source>. <volume>2021</volume>, <fpage>1</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1186/s13635-021-00127-0</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calzavara</surname> <given-names>S.</given-names></name> <name><surname>Lucchese</surname> <given-names>C.</given-names></name> <name><surname>Tolomei</surname> <given-names>G.</given-names></name> <name><surname>Abebe</surname> <given-names>S. A.</given-names></name> <name><surname>Orlando</surname> <given-names>S.</given-names></name></person-group> (<year>2020b</year>). <article-title>Treant: training evasion-aware decision trees</article-title>. <source>Data Min. Knowl. Disc</source>. <volume>34</volume>, <fpage>1390</fpage>&#x02013;<lpage>1420</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-020-00694-9</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carreira-Perpi&#x000F1;&#x000E1;n</surname> <given-names>M. &#x000C1;.</given-names></name> <name><surname>Hada</surname> <given-names>S. S.</given-names></name></person-group> (<year>2021</year>). <article-title>Counterfactual explanations for oblique decision trees: Exact, efficient algorithms</article-title>, in <source>Proceedings of the 35th AAAI Conference on Artificial Intelligence</source> <fpage>6903</fpage>&#x02013;<lpage>6911</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v35i8.16851</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Boning</surname> <given-names>D.</given-names></name> <name><surname>Hsieh</surname> <given-names>C.-J.</given-names></name></person-group> (<year>2019a</year>). <article-title>Robust decision trees against adversarial examples</article-title>, in <source>Proceedings of the 36th International Conference on Machine Learning</source> <fpage>1122</fpage>&#x02013;<lpage>1131</lpage>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Si</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Boning</surname> <given-names>D.</given-names></name> <name><surname>Hsieh</surname> <given-names>C.-J.</given-names></name></person-group> (<year>2019b</year>). <article-title>Robustness verification of tree-based models</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>12317</fpage>&#x02013;<lpage>12328</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>T.</given-names></name> <name><surname>Guestrin</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>XGBoost: A scalable tree boosting system</article-title>, in <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>785</fpage>&#x02013;<lpage>794</lpage>. <pub-id pub-id-type="doi">10.1145/2939672.2939785</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Jiang</surname> <given-names>W.</given-names></name> <name><surname>Cidon</surname> <given-names>A.</given-names></name> <name><surname>Jana</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Cost-aware robust tree ensembles for security applications</article-title>, in <source>30th USENIX Security Symposium (USENIX Security 21)</source> <fpage>2291</fpage>&#x02013;<lpage>2308</lpage>.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cl&#x000E9;men&#x000E7;on</surname> <given-names>S.</given-names></name> <name><surname>Depecker</surname> <given-names>M.</given-names></name> <name><surname>Vayatis</surname> <given-names>N.</given-names></name></person-group> (<year>2011</year>). <article-title>Adaptive partitioning schemes for bipartite ranking</article-title>. <source>Mach. Lear</source>. <volume>83</volume>, <fpage>31</fpage>&#x02013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-010-5190-y</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Confalonieri</surname> <given-names>R.</given-names></name> <name><surname>Weyde</surname> <given-names>T.</given-names></name> <name><surname>Besold</surname> <given-names>T. R.</given-names></name> <name><surname>del Prado Mart&#x000ED;n</surname> <given-names>F. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Trepan reloaded: A knowledge-driven approach to explaining artificial neural networks</article-title>, in <source>24th European Conference on Artificial Intelligence</source> <fpage>2457</fpage>&#x02013;<lpage>2464</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Costa</surname> <given-names>V.</given-names></name> <name><surname>Pedreira</surname> <given-names>C.</given-names></name></person-group> (<year>2023</year>). <article-title>Recent advances in decision trees: an updated survey</article-title>. <source>Artif. Intell. Rev</source>. <volume>56</volume>, <fpage>4765</fpage>&#x02013;<lpage>4800</lpage>. <pub-id pub-id-type="doi">10.1007/s10462-022-10275-5</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cotter</surname> <given-names>A.</given-names></name> <name><surname>Jiang</surname> <given-names>H.</given-names></name> <name><surname>Gupta</surname> <given-names>M. R.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Narayan</surname> <given-names>T.</given-names></name> <name><surname>You</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals</article-title>. <source>J. Mach. Learn. Res</source>. <volume>20</volume>, <fpage>1</fpage>&#x02013;<lpage>59</lpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Craven</surname> <given-names>M. W.</given-names></name> <name><surname>Shavlik</surname> <given-names>J. W.</given-names></name></person-group> (<year>1995</year>). <article-title>Extracting tree-structured representations of trained networks</article-title>, in <source>Proceedings of the 8th International Conference on Neural Information Processing Systems</source> <fpage>24</fpage>&#x02013;<lpage>30</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Criminisi</surname> <given-names>A.</given-names></name> <name><surname>Shotton</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <source>Decision Forests for Computer Vision and Medical Image Analysis</source>. <publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer Publishing Company, Incorporated</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-1-4471-4929-3</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Criminisi</surname> <given-names>A.</given-names></name> <name><surname>Shotton</surname> <given-names>J.</given-names></name> <name><surname>Konukoglu</surname> <given-names>E.</given-names></name></person-group> (<year>2012</year>). <article-title>Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning</article-title>. <source>Found. Trends Comput. Graph. Vision</source>. <volume>7</volume>, <fpage>81</fpage>&#x02013;<lpage>227</lpage>. <pub-id pub-id-type="doi">10.1561/0600000035</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cui</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>He</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). <article-title>Optimal action extraction for random forests and boosted trees</article-title>, in <source>Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>179</fpage>&#x02013;<lpage>188</lpage>. <pub-id pub-id-type="doi">10.1145/2783258.2783281</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Demirovic</surname> <given-names>E.</given-names></name> <name><surname>Lukina</surname> <given-names>A.</given-names></name> <name><surname>Hebrard</surname> <given-names>E.</given-names></name> <name><surname>Chan</surname> <given-names>J.</given-names></name> <name><surname>Bailey</surname> <given-names>J.</given-names></name> <name><surname>Leckie</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>MurTree: Optimal decision trees via dynamic programming and search</article-title>. <source>J. Mach. Learn. Res</source>. <volume>23</volume>, <fpage>1</fpage>&#x02013;<lpage>26</lpage>.</citation>
</ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Devos</surname> <given-names>L.</given-names></name> <name><surname>Meert</surname> <given-names>W.</given-names></name> <name><surname>Davis</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Fast gradient boosting decision trees with bit-level data structures</article-title>, in <source>Machine Learning and Knowledge Discovery in Databases (ECMLPKDD 2019), Part I</source> <fpage>590</fpage>&#x02013;<lpage>606</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-46150-8_35</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Devos</surname> <given-names>L.</given-names></name> <name><surname>Meert</surname> <given-names>W.</given-names></name> <name><surname>Davis</surname> <given-names>J.</given-names></name></person-group> (<year>2021a</year>). <article-title>Verifying tree ensembles by reasoning about potential instances</article-title>, in <source>Proceedings of the 2021 SIAM International Conference on Data Mining</source> <fpage>450</fpage>&#x02013;<lpage>458</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611976700.51</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Devos</surname> <given-names>L.</given-names></name> <name><surname>Meert</surname> <given-names>W.</given-names></name> <name><surname>Davis</surname> <given-names>J.</given-names></name></person-group> (<year>2021b</year>). <article-title>Versatile verification of tree ensembles</article-title>, in <source>Proceedings of the 38th International Conference on Machine Learning</source> <fpage>2654</fpage>&#x02013;<lpage>2664</lpage>.</citation>
</ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Diochnos</surname> <given-names>D.</given-names></name> <name><surname>Mahloujifar</surname> <given-names>S.</given-names></name> <name><surname>Mahmoody</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Adversarial risk and robustness: General definitions and implications for the uniform distribution</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>10380</fpage>&#x02013;<lpage>10389</lpage>.</citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Domingos</surname> <given-names>P. M.</given-names></name> <name><surname>Hulten</surname> <given-names>G.</given-names></name></person-group> (<year>2000</year>). <article-title>Mining high-speed data streams</article-title>, in <source>Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>71</fpage>&#x02013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1145/347090.347107</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>X.</given-names></name> <name><surname>Yu</surname> <given-names>Z.</given-names></name> <name><surname>Cao</surname> <given-names>W.</given-names></name> <name><surname>Shi</surname> <given-names>Y.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name></person-group> (<year>2020</year>). <article-title>A survey on ensemble learning</article-title>. <source>Front. Comput. Sci</source>. <volume>14</volume>, <fpage>241</fpage>&#x02013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.1007/s11704-019-8208-z</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doshi-Velez</surname> <given-names>F.</given-names></name> <name><surname>Kim</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>Towards a rigorous science of interpretable machine learning. arXiv:1702.08608</article-title>.</citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>W.</given-names></name> <name><surname>Zhan</surname> <given-names>Z.</given-names></name></person-group> (<year>2002</year>). <article-title>Building decision tree classifier on private data</article-title>, in <source>Proceedings of the 14th IEEE International Conference on Privacy, Security and Data Mining</source> <fpage>1</fpage>&#x02013;<lpage>8</lpage>.</citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dutta</surname> <given-names>S.</given-names></name> <name><surname>Long</surname> <given-names>J.</given-names></name> <name><surname>Mishra</surname> <given-names>S.</given-names></name> <name><surname>Tilli</surname> <given-names>C.</given-names></name> <name><surname>Magazzeni</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>Robust counterfactual explanations for tree-based ensembles</article-title>, in <source>Proceedings of the 39th International Conference on Machine Learning</source> <fpage>5742</fpage>&#x02013;<lpage>5756</lpage>.</citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Einziger</surname> <given-names>G.</given-names></name> <name><surname>Goldstein</surname> <given-names>M.</given-names></name> <name><surname>Sa&#x00027;ar</surname> <given-names>Y.</given-names></name> <name><surname>Segall</surname> <given-names>I.</given-names></name></person-group> (<year>2019</year>). <article-title>Verifying robustness of gradient boosted models</article-title>, in <source>Proceedings of the 33rd AAAI Conference on Artificial Intelligence</source> <fpage>2446</fpage>&#x02013;<lpage>2453</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33012446</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Esmeir</surname> <given-names>S.</given-names></name> <name><surname>Markovitch</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Anytime learning of decision trees</article-title>. <source>J. Mach. Learn. Res</source>. <volume>8</volume>, <fpage>891</fpage>&#x02013;<lpage>933</lpage>.</citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fierens</surname> <given-names>D.</given-names></name> <name><surname>Ramon</surname> <given-names>J.</given-names></name> <name><surname>Blockeel</surname> <given-names>H.</given-names></name> <name><surname>Bruynooghe</surname> <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>A comparison of pruning criteria for probability trees</article-title>. <source>Mach. Learn</source>. <volume>78</volume>, <fpage>251</fpage>&#x02013;<lpage>285</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-009-5147-1</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fierens</surname> <given-names>D.</given-names></name> <name><surname>Ramon</surname> <given-names>J.</given-names></name> <name><surname>Bruynooghe</surname> <given-names>M.</given-names></name> <name><surname>Blockeel</surname> <given-names>H.</given-names></name></person-group> (<year>2007</year>). <article-title>Learning directed probabilistic logical models: Ordering-search versus structure-search</article-title>, in <source>Proceedings of the 18th European Conference on Machine Learning</source> <fpage>567</fpage>&#x02013;<lpage>574</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-74958-5_54</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fletcher</surname> <given-names>S.</given-names></name> <name><surname>Islam</surname> <given-names>M. Z.</given-names></name></person-group> (<year>2019</year>). <article-title>Decision tree classification with differential privacy: A survey</article-title>. <source>ACM Comput. Surv</source>. <volume>52</volume>, <fpage>1</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1145/3337064</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freund</surname> <given-names>Y.</given-names></name> <name><surname>Mason</surname> <given-names>L.</given-names></name></person-group> (<year>1999</year>). <article-title>The alternating decision tree learning algorithm</article-title>, in <source>Proceedings of the 16th International Conference on Machine Learning</source> <fpage>124</fpage>&#x02013;<lpage>133</lpage>.</citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friedman</surname> <given-names>J. H.</given-names></name></person-group> (<year>2001</year>). <article-title>Greedy function approximation: A gradient boosting machine</article-title>. <source>Ann. Stat</source>. <volume>29</volume>, <fpage>1189</fpage>&#x02013;<lpage>1232</lpage>. <pub-id pub-id-type="doi">10.1214/aos/1013203451</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Friedman</surname> <given-names>N.</given-names></name> <name><surname>Goldszmidt</surname> <given-names>M.</given-names></name></person-group> (<year>1998</year>). <article-title>Learning Bayesian networks with local structure</article-title>, in <source>Learning in Graphical Models</source> (<publisher-name>Springer</publisher-name>) <fpage>421</fpage>&#x02013;<lpage>459</lpage>. <pub-id pub-id-type="doi">10.1007/978-94-011-5014-9_15</pub-id></citation></ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frosst</surname> <given-names>N.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>Distilling a neural network into a soft decision tree</article-title>, in <source>Proceedings of the First International Workshop on Comprehensibility and Explanation in AI and ML</source>.</citation></ref>
<ref id="B62">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>F&#x000FC;rnkranz</surname> <given-names>J.</given-names></name> <name><surname>H&#x000FC;llermeier</surname> <given-names>E.</given-names></name></person-group> (<year>2010</year>). <source>Preference Learning</source>. <publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garcia-Martin</surname> <given-names>E.</given-names></name> <name><surname>Bifet</surname> <given-names>A.</given-names></name> <name><surname>Lavesson</surname> <given-names>N.</given-names></name> <name><surname>K&#x000F6;nig</surname> <given-names>R.</given-names></name> <name><surname>Linusson</surname> <given-names>H.</given-names></name></person-group> (<year>2022</year>). <article-title>Green accelerated Hoeffding tree. arXiv:2205.03184</article-title>.</citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garofalakis</surname> <given-names>M.</given-names></name> <name><surname>Hyun</surname> <given-names>D.</given-names></name> <name><surname>Rastogi</surname> <given-names>R.</given-names></name> <name><surname>Shim</surname> <given-names>K.</given-names></name></person-group> (<year>2003</year>). <article-title>Building decision trees with constraints</article-title>. <source>Data Min. Knowl. Disc</source>. <volume>7</volume>, <fpage>187</fpage>&#x02013;<lpage>214</lpage>. <pub-id pub-id-type="doi">10.1023/A:1022445500761</pub-id></citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gehrke</surname> <given-names>J.</given-names></name> <name><surname>Ramakrishnan</surname> <given-names>R.</given-names></name> <name><surname>Ganti</surname> <given-names>V.</given-names></name></person-group> (<year>2000</year>). <article-title>RainForest: a framework for fast decision tree construction of large datasets</article-title>. <source>Data Min. Knowl. Disc</source>. <volume>4</volume>, <fpage>127</fpage>&#x02013;<lpage>162</lpage>. <pub-id pub-id-type="doi">10.1023/A:1009839829793</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gomes</surname> <given-names>H. M.</given-names></name> <name><surname>Bifet</surname> <given-names>A.</given-names></name> <name><surname>Read</surname> <given-names>J.</given-names></name> <name><surname>Barddal</surname> <given-names>J. P.</given-names></name> <name><surname>Enembreck</surname> <given-names>F.</given-names></name> <name><surname>Pfahringer</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Adaptive random forests for evolving data stream classification</article-title>. <source>Mach. Learn</source>. <volume>106</volume>, <fpage>1469</fpage>&#x02013;<lpage>1495</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-017-5642-8</pub-id></citation></ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grari</surname> <given-names>V.</given-names></name> <name><surname>Ruf</surname> <given-names>B.</given-names></name> <name><surname>Lamprier</surname> <given-names>S.</given-names></name> <name><surname>Detyniecki</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Fair adversarial gradient tree boosting</article-title>, in <source>Proceedings of the 2019 IEEE International Conference on Data Mining</source> <fpage>1060</fpage>&#x02013;<lpage>1065</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2019.00124</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grinsztajn</surname> <given-names>L.</given-names></name> <name><surname>Oyallon</surname> <given-names>E.</given-names></name> <name><surname>Varoquaux</surname> <given-names>G.</given-names></name></person-group> (<year>2022</year>). <article-title>Why do tree-based models still outperform deep learning on typical tabular data?</article-title> in <source>NeurIPS 2022 Datasets and Benchmarks</source>.</citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guidotti</surname> <given-names>R.</given-names></name></person-group> (<year>2022</year>). <article-title>Counterfactual explanations and how to find them: literature review and benchmarking</article-title>. <source>Data Min. Knowl. Disc</source>. <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-022-00831-6</pub-id></citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>J.-Q.</given-names></name> <name><surname>Teng</surname> <given-names>M.-Z.</given-names></name> <name><surname>Gao</surname> <given-names>W.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.-H.</given-names></name></person-group> (<year>2022</year>). <article-title>Fast provably robust decision trees and boosting</article-title>, in <source>Proceedings of the 39th International Conference on Machine Learning</source> <fpage>8127</fpage>&#x02013;<lpage>8144</lpage>.</citation>
</ref>
<ref id="B71">
<citation citation-type="book"><person-group person-group-type="author"><collab>Gurobi Optimization, LLC</collab></person-group> (<year>2022</year>). <source>Gurobi Optimizer Reference Manual</source>.</citation>
</ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hazimeh</surname> <given-names>H.</given-names></name> <name><surname>Ponomareva</surname> <given-names>N.</given-names></name> <name><surname>Mol</surname> <given-names>P.</given-names></name> <name><surname>Tan</surname> <given-names>Z.</given-names></name> <name><surname>Mazumder</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <article-title>The tree ensemble layer: Differentiability meets conditional computation</article-title>, in <source>Proceedings of the 37th International Conference on Machine Learning</source> <fpage>4138</fpage>&#x02013;<lpage>4148</lpage>.</citation>
</ref>
<ref id="B73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hothorn</surname> <given-names>T.</given-names></name> <name><surname>Lausen</surname> <given-names>B.</given-names></name></person-group> (<year>2003</year>). <article-title>Bagging tree classifiers for laser scanning images: a data- and simulation-based strategy</article-title>. <source>Artif. Intell. Med</source>. <volume>27</volume>, <fpage>65</fpage>&#x02013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1016/S0933-3657(02)00085-4</pub-id><pub-id pub-id-type="pmid">12473392</pub-id></citation></ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>H.</given-names></name> <name><surname>Siala</surname> <given-names>M.</given-names></name> <name><surname>Hebrard</surname> <given-names>E.</given-names></name> <name><surname>Huguet</surname> <given-names>M.-J.</given-names></name></person-group> (<year>2020</year>). <article-title>Learning optimal decision trees with MaxSAT and its integration in AdaBoost</article-title>, in <source>Proceedings of the 29th International Joint Conference on Artificial Intelligence</source> <fpage>1170</fpage>&#x02013;<lpage>1176</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2020/163</pub-id></citation>
</ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>X.</given-names></name> <name><surname>Rudin</surname> <given-names>C.</given-names></name> <name><surname>Seltzer</surname> <given-names>M. I.</given-names></name></person-group> (<year>2019</year>). <article-title>Optimal sparse decision trees</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>7265</fpage>&#x02013;<lpage>7273</lpage>.</citation></ref>
<ref id="B76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>H&#x000FC;llermeier</surname> <given-names>E.</given-names></name> <name><surname>Vanderlooy</surname> <given-names>S.</given-names></name></person-group> (<year>2009</year>). <article-title>Why fuzzy decision trees are good rankers</article-title>. <source>IEEE Trans. Fuzzy Syst</source>. <volume>17</volume>, <fpage>1233</fpage>&#x02013;<lpage>1244</lpage>. <pub-id pub-id-type="doi">10.1109/TFUZZ.2009.2026640</pub-id></citation>
</ref>
<ref id="B77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hyafil</surname> <given-names>L.</given-names></name> <name><surname>Rivest</surname> <given-names>R.</given-names></name></person-group> (<year>1976</year>). <article-title>Constructing optimal binary decision trees is NP-complete</article-title>. <source>Inform. Process. Lett</source>. <volume>5</volume>, <fpage>15</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1016/0020-0190(76)90095-8</pub-id></citation>
</ref>
<ref id="B78">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Irsoy</surname> <given-names>O.</given-names></name> <name><surname>Y&#x00131;ld&#x00131;z</surname> <given-names>O. T.</given-names></name> <name><surname>Alpayd&#x00131;n</surname> <given-names>E.</given-names></name></person-group> (<year>2012</year>). <article-title>Soft decision trees</article-title>, in <source>Proceedings of the 21st International Conference on Pattern Recognition</source> <fpage>1819</fpage>&#x02013;<lpage>1822</lpage>.</citation>
</ref>
<ref id="B79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Irsoy</surname> <given-names>O.</given-names></name> <name><surname>Y&#x00131;ld&#x00131;z</surname> <given-names>O. T.</given-names></name> <name><surname>Alpayd&#x00131;n</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>Budding trees</article-title>, in <source>Proceedings of the 22nd International Conference on Pattern Recognition</source> <fpage>3582</fpage>&#x02013;<lpage>3587</lpage>. <pub-id pub-id-type="doi">10.1109/ICPR.2014.616</pub-id></citation>
</ref>
<ref id="B80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Teo</surname> <given-names>S. G.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Chan</surname> <given-names>C.</given-names></name> <name><surname>Hou</surname> <given-names>Q.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Towards end-to-end secure and efficient federated learning for XGBoost</article-title>, in <source>Proceedings of the AAAI International Workshop on Trustable, Verifiable and Auditable Federated Learning</source>.</citation>
</ref>
<ref id="B81">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johansson</surname> <given-names>U.</given-names></name> <name><surname>Bostr&#x000F6;m</surname> <given-names>H.</given-names></name> <name><surname>L&#x000F6;fstr&#x000F6;m</surname> <given-names>T.</given-names></name> <name><surname>Linusson</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Regression conformal prediction with random forests</article-title>. <source>Mach. Learn</source>. <volume>97</volume>, <fpage>155</fpage>&#x02013;<lpage>176</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-014-5453-0</pub-id></citation>
</ref>
<ref id="B82">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kairouz</surname> <given-names>P.</given-names></name> <name><surname>McMahan</surname> <given-names>H. B.</given-names></name> <name><surname>Avent</surname> <given-names>B.</given-names></name> <name><surname>Bellet</surname> <given-names>A.</given-names></name> <name><surname>Bennis</surname> <given-names>M.</given-names></name> <name><surname>Nitin Bhagoji</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Advances and open problems in federated learning</article-title>. <source>Found. Trends Mach. Learn</source>. <volume>14</volume>, <fpage>1</fpage>&#x02013;<lpage>210</lpage>. <pub-id pub-id-type="doi">10.1561/2200000083</pub-id></citation>
</ref>
<ref id="B83">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kamiran</surname> <given-names>F.</given-names></name> <name><surname>Calders</surname> <given-names>T.</given-names></name> <name><surname>Pechenizkiy</surname> <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>Discrimination aware decision tree learning</article-title>, in <source>Proceedings of the 10th IEEE International Conference on Data Mining</source> <fpage>869</fpage>&#x02013;<lpage>874</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2010.50</pub-id></citation></ref>
<ref id="B84">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kanamori</surname> <given-names>K.</given-names></name> <name><surname>Takagi</surname> <given-names>T.</given-names></name> <name><surname>Kobayashi</surname> <given-names>K.</given-names></name> <name><surname>Arimura</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>DACE: distribution-aware counterfactual explanation by mixed-integer linear optimization</article-title>, in <source>Proceedings of the 29th International Joint Conference on Artificial Intelligence</source> <fpage>2855</fpage>&#x02013;<lpage>2862</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2020/395</pub-id></citation>
</ref>
<ref id="B85">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kantchelian</surname> <given-names>A.</given-names></name> <name><surname>Tygar</surname> <given-names>J. D.</given-names></name> <name><surname>Joseph</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Evasion and hardening of tree ensemble classifiers</article-title>, in <source>Proceedings of the 33rd International Conference on Machine Learning</source> <fpage>2387</fpage>&#x02013;<lpage>2396</lpage>.</citation>
</ref>
<ref id="B86">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ke</surname> <given-names>G.</given-names></name> <name><surname>Meng</surname> <given-names>Q.</given-names></name> <name><surname>Finley</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Ma</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>LightGBM: A highly efficient gradient boosting decision tree</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>3149</fpage>&#x02013;<lpage>3157</lpage>.</citation>
</ref>
<ref id="B87">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiossou</surname> <given-names>H.</given-names></name> <name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Schaus</surname> <given-names>P.</given-names></name> <name><surname>Houndji</surname> <given-names>R.</given-names></name></person-group> (<year>2022</year>). <article-title>Time constrained DL8.5 using limited discrepancy search</article-title>, in <source>Machine Learning and Knowledge Discovery in Databases (ECMLPKDD 2022), Part V</source> <fpage>443</fpage>&#x02013;<lpage>459</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-031-26419-1_27</pub-id></citation></ref>
<ref id="B88">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kocev</surname> <given-names>D.</given-names></name> <name><surname>Vens</surname> <given-names>C.</given-names></name> <name><surname>Struyf</surname> <given-names>J.</given-names></name> <name><surname>Dzeroski</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>Tree ensembles for predicting structured output</article-title>. <source>Patt. Recogn</source>. <volume>46</volume>, <fpage>817</fpage>&#x02013;<lpage>833</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2012.09.023</pub-id></citation>
</ref>
<ref id="B89">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kontschieder</surname> <given-names>P.</given-names></name> <name><surname>Fiterau</surname> <given-names>M.</given-names></name> <name><surname>Criminisi</surname> <given-names>A.</given-names></name> <name><surname>Bulo</surname> <given-names>S. R.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep neural decision forests</article-title>, in <source>Proceedings of the IEEE International Conference on Computer Vision</source> <fpage>1467</fpage>&#x02013;<lpage>1475</lpage>. <pub-id pub-id-type="doi">10.1109/ICCV.2015.172</pub-id></citation>
</ref>
<ref id="B90">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koschel</surname> <given-names>S.</given-names></name> <name><surname>Buschj&#x000E4;ger</surname> <given-names>S.</given-names></name> <name><surname>Lucchese</surname> <given-names>C.</given-names></name> <name><surname>Morik</surname> <given-names>K.</given-names></name></person-group> (<year>2023</year>). <article-title>Fast inference of tree ensembles on ARM devices</article-title>. <source>arXiv</source>:2305.08579.</citation>
</ref>
<ref id="B91">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kramer</surname> <given-names>S.</given-names></name></person-group> (<year>1996</year>). <article-title>Structural regression trees</article-title>, in <source>Proceedings of the 13th National Conference on Artificial Intelligence</source> <fpage>812</fpage>&#x02013;<lpage>819</lpage>.</citation>
</ref>
<ref id="B92">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>G.-H.</given-names></name> <name><surname>Jaakkola</surname> <given-names>T. S.</given-names></name></person-group> (<year>2020</year>). <article-title>Oblique decision trees from derivatives of ReLU networks</article-title>, in <source>International Conference on Learning Representations</source>.</citation>
</ref>
<ref id="B93">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levati&#x00107;</surname> <given-names>J.</given-names></name> <name><surname>Ceci</surname> <given-names>M.</given-names></name> <name><surname>Kocev</surname> <given-names>D.</given-names></name> <name><surname>D&#x0017E;eroski</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Semi-supervised classification trees</article-title>. <source>J. Intell. Inform. Syst</source>. <volume>49</volume>, <fpage>461</fpage>&#x02013;<lpage>486</lpage>. <pub-id pub-id-type="doi">10.1007/s10844-017-0457-4</pub-id></citation>
</ref>
<ref id="B94">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Song</surname> <given-names>J.</given-names></name> <name><surname>Xue</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Ye</surname> <given-names>J.</given-names></name> <name><surname>Cheng</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>A survey of neural trees</article-title>. <source>arXiv</source>:2209.03415.</citation>
</ref>
<ref id="B95">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Wen</surname> <given-names>Z.</given-names></name> <name><surname>He</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>Privacy-preserving gradient boosting decision trees</article-title>, in <source>Proceedings of the 34th AAAI Conference on Artificial Intelligence</source> <fpage>784</fpage>&#x02013;<lpage>791</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v34i01.5422</pub-id></citation></ref>
<ref id="B96">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liang</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Shi</surname> <given-names>P.</given-names></name> <name><surname>Hu</surname> <given-names>Z.</given-names></name></person-group> (<year>2012</year>). <article-title>Learning very fast decision tree from uncertain data streams with positive and unlabeled samples</article-title>. <source>Inform. Sci</source>. <volume>213</volume>, <fpage>50</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2012.05.023</pub-id></citation>
</ref>
<ref id="B97">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>J.</given-names></name> <name><surname>Zhong</surname> <given-names>C.</given-names></name> <name><surname>Hu</surname> <given-names>D.</given-names></name> <name><surname>Rudin</surname> <given-names>C.</given-names></name> <name><surname>Seltzer</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Generalized and scalable optimal sparse decision trees</article-title>, in <source>Proceedings of the 37th International Conference on Machine Learning</source> <fpage>6150</fpage>&#x02013;<lpage>6160</lpage>.</citation>
</ref>
<ref id="B98">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>C.</given-names></name> <name><surname>Arnon</surname> <given-names>T.</given-names></name> <name><surname>Lazarus</surname> <given-names>C.</given-names></name> <name><surname>Strong</surname> <given-names>C.</given-names></name> <name><surname>Barrett</surname> <given-names>C.</given-names></name> <name><surname>Kochenderfer</surname> <given-names>M. J.</given-names></name></person-group> (<year>2021</year>). <article-title>Algorithms for verifying deep neural networks</article-title>. <source>Found. Trends Optimiz</source>. <volume>4</volume>, <fpage>244</fpage>&#x02013;<lpage>404</lpage>. <pub-id pub-id-type="doi">10.1561/2400000035</pub-id></citation>
</ref>
<ref id="B99">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>F. T.</given-names></name> <name><surname>Ting</surname> <given-names>K. M.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.-H.</given-names></name></person-group> (<year>2008</year>). <article-title>Isolation forest</article-title>, in <source>Proceedings of the 8th IEEE International Conference on Data Mining</source> <fpage>413</fpage>&#x02013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2008.17</pub-id></citation>
</ref>
<ref id="B100">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lucchese</surname> <given-names>C.</given-names></name> <name><surname>Nardini</surname> <given-names>F. M.</given-names></name> <name><surname>Orlando</surname> <given-names>S.</given-names></name> <name><surname>Perego</surname> <given-names>R.</given-names></name> <name><surname>Tonellotto</surname> <given-names>N.</given-names></name> <name><surname>Venturini</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>QuickScorer: Efficient traversal of large ensembles of decision trees</article-title>, in <source>Machine Learning and Knowledge Discovery in Databases (ECMLPKDD 2017), Part III</source> <fpage>383</fpage>&#x02013;<lpage>387</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-71273-4_36</pub-id></citation>
</ref>
<ref id="B101">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lucic</surname> <given-names>A.</given-names></name> <name><surname>Oosterhuis</surname> <given-names>H.</given-names></name> <name><surname>Haned</surname> <given-names>H.</given-names></name> <name><surname>de Rijke</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>FOCUS: Flexible optimizable counterfactual explanations for tree ensembles</article-title>, in <source>Proceedings of the 36th AAAI Conference on Artificial Intelligence</source> <fpage>5313</fpage>&#x02013;<lpage>5322</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v36i5.20468</pub-id></citation>
</ref>
<ref id="B102">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lundberg</surname> <given-names>S. M.</given-names></name> <name><surname>Erion</surname> <given-names>G.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>DeGrave</surname> <given-names>A.</given-names></name> <name><surname>Prutkin</surname> <given-names>J. M.</given-names></name> <name><surname>Nair</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Explainable AI for trees: From local explanations to global understanding</article-title>. <source>arXiv</source>:1905.04610.</citation>
</ref>
<ref id="B103">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madry</surname> <given-names>A.</given-names></name> <name><surname>Makelov</surname> <given-names>A.</given-names></name> <name><surname>Schmidt</surname> <given-names>L.</given-names></name> <name><surname>Tsipras</surname> <given-names>D.</given-names></name> <name><surname>Vladu</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Towards deep learning models resistant to adversarial attacks</article-title>, in <source>Proceedings of the 6th International Conference on Learning Representations</source>.</citation></ref>
<ref id="B104">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McTavish</surname> <given-names>H.</given-names></name> <name><surname>Zhong</surname> <given-names>C.</given-names></name> <name><surname>Achermann</surname> <given-names>R.</given-names></name> <name><surname>Karimalis</surname> <given-names>I.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Rudin</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Fast sparse decision tree optimization via reference ensembles</article-title>, in <source>Proceedings of the 36th AAAI Conference on Artificial Intelligence</source> <fpage>9604</fpage>&#x02013;<lpage>9613</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v36i9.21194</pub-id></citation></ref>
<ref id="B105">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mehta</surname> <given-names>M.</given-names></name> <name><surname>Agrawal</surname> <given-names>R.</given-names></name> <name><surname>Rissanen</surname> <given-names>J.</given-names></name></person-group> (<year>1996</year>). <article-title>SLIQ: A fast scalable classifier for data mining</article-title>, in <source>Advances in Database Technology-EDBT&#x00027;96: 5th International Conference on Extending Database Technology</source> <fpage>18</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1007/BFb0014141</pub-id></citation>
</ref>
<ref id="B106">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meng</surname> <given-names>Q.</given-names></name> <name><surname>Ke</surname> <given-names>G.</given-names></name> <name><surname>Wang</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Ye</surname> <given-names>Q.</given-names></name> <name><surname>Ma</surname> <given-names>Z.-M.</given-names></name> <etal/></person-group>. (<year>2016a</year>). <article-title>A communication-efficient parallel algorithm for decision tree</article-title>. <source>Adv. Neural Infor. Proc. Syst</source>. <volume>29</volume>, <fpage>1271</fpage>&#x02013;<lpage>1279</lpage>.</citation>
</ref>
<ref id="B107">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meng</surname> <given-names>X.</given-names></name> <name><surname>Bradley</surname> <given-names>J.</given-names></name> <name><surname>Yavuz</surname> <given-names>B.</given-names></name> <name><surname>Sparks</surname> <given-names>E.</given-names></name> <name><surname>Venkataraman</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2016b</year>). <article-title>MLlib: <italic>Machine Learning</italic> in Apache Spark</article-title>. <source>J. Mach. Learn. Res</source>. <volume>17</volume>, <fpage>1235</fpage>&#x02013;<lpage>1241</lpage>.</citation>
</ref>
<ref id="B108">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Milani</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Topin</surname> <given-names>N.</given-names></name> <name><surname>Shi</surname> <given-names>Z. R.</given-names></name> <name><surname>Kamhoua</surname> <given-names>C.</given-names></name> <name><surname>Papalexakis</surname> <given-names>E. E.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>MAVIPER: Learning decision tree policies for interpretable multi-agent reinforcement learning</article-title>, in <source>Machine Learning and Knowledge Discovery in Databases (ECMLPKDD 2022), Part IV</source> <fpage>251</fpage>&#x02013;<lpage>266</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-031-26412-2_16</pub-id></citation>
</ref>
<ref id="B109">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mitchell</surname> <given-names>R.</given-names></name> <name><surname>Frank</surname> <given-names>E.</given-names></name></person-group> (<year>2017</year>). <article-title>Accelerating the XGBoost algorithm using GPU computing</article-title>. <source>PeerJ Comput. Sci</source>. <volume>3</volume>, <fpage>e127</fpage>. <pub-id pub-id-type="doi">10.7717/peerj-cs.127</pub-id></citation>
</ref>
<ref id="B110">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murthy</surname> <given-names>S. K.</given-names></name></person-group> (<year>1998</year>). <article-title>Automatic construction of decision trees from data: A multi-disciplinary survey</article-title>. <source>Data Min. Knowl. Disc</source>. <volume>2</volume>, <fpage>345</fpage>&#x02013;<lpage>389</lpage>. <pub-id pub-id-type="doi">10.1023/A:1009744630224</pub-id></citation>
</ref>
<ref id="B111">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murthy</surname> <given-names>S. K.</given-names></name> <name><surname>Kasif</surname> <given-names>S.</given-names></name> <name><surname>Salzberg</surname> <given-names>S.</given-names></name></person-group> (<year>1994</year>). <article-title>A system for induction of oblique decision trees</article-title>. <source>J. Artif. Intell. Res</source>. <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1613/jair.63</pub-id></citation>
</ref>
<ref id="B112">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nanfack</surname> <given-names>G.</given-names></name> <name><surname>Temple</surname> <given-names>P.</given-names></name> <name><surname>Fr&#x000E9;nay</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <article-title>Constraint enforcement on decision trees: A survey</article-title>. <source>ACM Comput. Surv</source>. <volume>54</volume>, <fpage>1</fpage>&#x02013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1145/3506734</pub-id></citation>
</ref>
<ref id="B113">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Narodytska</surname> <given-names>N.</given-names></name> <name><surname>Ignatiev</surname> <given-names>A.</given-names></name> <name><surname>Pereira</surname> <given-names>F.</given-names></name> <name><surname>Marques-Silva</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Learning optimal decision trees with SAT</article-title>, in <source>Proceedings of the 27th International Joint Conference on Artificial Intelligence</source> <fpage>1362</fpage>&#x02013;<lpage>1368</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2018/189</pub-id></citation></ref>
<ref id="B114">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nauta</surname> <given-names>M.</given-names></name> <name><surname>van Bree</surname> <given-names>R.</given-names></name> <name><surname>Seifert</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>Neural prototype trees for interpretable fine-grained image recognition</article-title>, in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> <fpage>14933</fpage>&#x02013;<lpage>14943</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR46437.2021.01469</pub-id></citation>
</ref>
<ref id="B115">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Neville</surname> <given-names>J.</given-names></name> <name><surname>Jensen</surname> <given-names>D. D.</given-names></name></person-group> (<year>2007</year>). <article-title>Relational dependency networks</article-title>. <source>J. Mach. Learn. Res</source>. <volume>8</volume>, <fpage>653</fpage>&#x02013;<lpage>692</lpage>.</citation>
</ref>
<ref id="B116">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Fromont</surname> <given-names>&#x000C9;.</given-names></name></person-group> (<year>2007</year>). <article-title>Mining optimal decision trees from itemset lattices</article-title>, in <source>Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>530</fpage>&#x02013;<lpage>539</lpage>. <pub-id pub-id-type="doi">10.1145/1281192.1281250</pub-id></citation>
</ref>
<ref id="B117">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Norouzi</surname> <given-names>M.</given-names></name> <name><surname>Collins</surname> <given-names>M. D.</given-names></name> <name><surname>Johnson</surname> <given-names>M.</given-names></name> <name><surname>Fleet</surname> <given-names>D. J.</given-names></name> <name><surname>Kohli</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>Efficient non-greedy optimization of decision trees</article-title>, in <source>Proceedings of the 28th International Conference on Neural Information Processing Systems</source> <fpage>1729</fpage>&#x02013;<lpage>1737</lpage>.</citation>
</ref>
<ref id="B118">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>N&#x000FA;&#x000F1;ez</surname> <given-names>M.</given-names></name></person-group> (<year>1991</year>). <article-title>The use of background knowledge in decision tree induction</article-title>. <source>Mach. Learn</source>. <volume>6</volume>, <fpage>231</fpage>&#x02013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1007/BF00114778</pub-id></citation>
</ref>
<ref id="B119">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okajima</surname> <given-names>Y.</given-names></name> <name><surname>Sadamasa</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>Deep neural networks constrained by decision rules</article-title>, in <source>Proceedings of the 33rd AAAI Conference on Artificial Intelligence</source> <fpage>2496</fpage>&#x02013;<lpage>2505</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33012496</pub-id></citation></ref>
<ref id="B120">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olaru</surname> <given-names>C.</given-names></name> <name><surname>Wehenkel</surname> <given-names>L.</given-names></name></person-group> (<year>2003</year>). <article-title>A complete fuzzy decision tree technique</article-title>. <source>Fuzzy Sets Syst</source>. <volume>138</volume>, <fpage>221</fpage>&#x02013;<lpage>254</lpage>. <pub-id pub-id-type="doi">10.1016/S0165-0114(03)00089-7</pub-id></citation>
</ref>
<ref id="B121">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parmentier</surname> <given-names>A.</given-names></name> <name><surname>Vidal</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Optimal counterfactual explanations in tree ensembles</article-title>, in <source>Proceedings of the 38th International Conference on Machine Learning</source> <fpage>8422</fpage>&#x02013;<lpage>8431</lpage>.</citation>
</ref>
<ref id="B122">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Potharst</surname> <given-names>R.</given-names></name> <name><surname>Feelders</surname> <given-names>A. J.</given-names></name></person-group> (<year>2002</year>). <article-title>Classification trees for problems with monotonicity constraints</article-title>. <source>SIGKDD Explor</source>. <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1145/568574.568577</pub-id></citation>
</ref>
<ref id="B123">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Provost</surname> <given-names>F. J.</given-names></name> <name><surname>Domingos</surname> <given-names>P. M.</given-names></name></person-group> (<year>2003</year>). <article-title>Tree induction for probability-based ranking</article-title>. <source>Mach. Learn</source>. <volume>52</volume>, <fpage>199</fpage>&#x02013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1023/A:1024099825458</pub-id></citation>
</ref>
<ref id="B124">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname> <given-names>J. R.</given-names></name></person-group> (<year>1986</year>). <article-title>Induction of decision trees</article-title>. <source>Mach. Learn</source>. <volume>1</volume>, <fpage>81</fpage>&#x02013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1007/BF00116251</pub-id></citation>
</ref>
<ref id="B125">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname> <given-names>J. R.</given-names></name></person-group> (<year>1992</year>). <article-title>Learning with continuous classes</article-title>, in <source>Proceedings of the 5th Australian Joint Conference on Artificial Intelligence</source> <fpage>343</fpage>&#x02013;<lpage>348</lpage>.</citation>
</ref>
<ref id="B126">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ram</surname> <given-names>P.</given-names></name> <name><surname>Gray</surname> <given-names>A. G.</given-names></name></person-group> (<year>2011</year>). <article-title>Density estimation trees</article-title>, in <source>Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>627</fpage>&#x02013;<lpage>635</lpage>. <pub-id pub-id-type="doi">10.1145/2020408.2020507</pub-id></citation>
</ref>
<ref id="B127">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ranzato</surname> <given-names>F.</given-names></name> <name><surname>Zanella</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Abstract interpretation of decision tree ensemble classifiers</article-title>, in <source>Proceedings of the AAAI Conference on Artificial Intelligence</source> <fpage>5478</fpage>&#x02013;<lpage>5486</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v34i04.5998</pub-id></citation>
</ref>
<ref id="B128">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ranzato</surname> <given-names>F.</given-names></name> <name><surname>Zanella</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Genetic adversarial training of decision trees</article-title>, in <source>Proceedings of the 2021 Genetic and Evolutionary Computation Conference</source> <fpage>358</fpage>&#x02013;<lpage>367</lpage>. <pub-id pub-id-type="doi">10.1145/3449639.3459286</pub-id></citation>
</ref>
<ref id="B129">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ribeiro</surname> <given-names>M. T.</given-names></name> <name><surname>Singh</surname> <given-names>S.</given-names></name> <name><surname>Guestrin</surname> <given-names>C.</given-names></name></person-group> (<year>2016a</year>). <article-title>Model-agnostic interpretability of machine learning</article-title>, in <source>ICML Workshop on Human Interpretability in Machine Learning, WHI &#x00027;16</source> (<publisher-loc>New York, NY</publisher-loc>).</citation>
</ref>
<ref id="B130">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ribeiro</surname> <given-names>M. T.</given-names></name> <name><surname>Singh</surname> <given-names>S.</given-names></name> <name><surname>Guestrin</surname> <given-names>C.</given-names></name></person-group> (<year>2016b</year>). <article-title>&#x02018;Why should I trust you?&#x00027;: Explaining the predictions of any classifier</article-title>, in <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>1135</fpage>&#x02013;<lpage>1144</lpage>. <pub-id pub-id-type="doi">10.1145/2939672.2939778</pub-id></citation>
</ref>
<ref id="B131">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rokach</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>Decision forest: Twenty years of research</article-title>. <source>Inf. Fusion</source> <volume>27</volume>, <fpage>111</fpage>&#x02013;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1016/j.inffus.2015.06.005</pub-id></citation>
</ref>
<ref id="B132">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rota Bul&#x000F2;</surname> <given-names>S.</given-names></name> <name><surname>Kontschieder</surname> <given-names>P.</given-names></name></person-group> (<year>2014</year>). <article-title>Neural decision forests for semantic image labelling</article-title>, in <source>Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition</source> <fpage>81</fpage>&#x02013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2014.18</pub-id></citation>
</ref>
<ref id="B133">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sagi</surname> <given-names>O.</given-names></name> <name><surname>Rokach</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>Ensemble learning: A survey</article-title>. <source>WIREs Data Min. Knowl. Disc</source>. <volume>8</volume>, <fpage>e1249</fpage>. <pub-id pub-id-type="doi">10.1002/widm.1249</pub-id></citation>
</ref>
<ref id="B134">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sato</surname> <given-names>N.</given-names></name> <name><surname>Kuruma</surname> <given-names>H.</given-names></name> <name><surname>Nakagawa</surname> <given-names>Y.</given-names></name> <name><surname>Ogawa</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Formal verification of a decision-tree ensemble model and detection of its violation ranges</article-title>. <source>IEICE Trans. Inf. Syst</source>. <volume>E103</volume>, <fpage>363</fpage>&#x02013;<lpage>378</lpage>. <pub-id pub-id-type="doi">10.1587/transinf.2019EDP7120</pub-id></citation>
</ref>
<ref id="B135">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shafer</surname> <given-names>J. C.</given-names></name> <name><surname>Agrawal</surname> <given-names>R.</given-names></name> <name><surname>Mehta</surname> <given-names>M.</given-names></name></person-group> (<year>1996</year>). <article-title>SPRINT: A scalable parallel classifier for data mining</article-title>, in <source>Proceedings of the 22nd International Conference on Very Large Data Bases</source> <fpage>544</fpage>&#x02013;<lpage>555</lpage>.</citation>
</ref>
<ref id="B136">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sharp</surname> <given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>Implementing decision trees and forests on a GPU</article-title>, in <source>Computer Vision-ECCV 2008, Part IV, Lecture Notes in Computer Science</source> (<publisher-loc>Springer</publisher-loc>) <fpage>595</fpage>&#x02013;<lpage>608</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-88693-8_44</pub-id></citation>
</ref>
<ref id="B137">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>Y.</given-names></name> <name><surname>Ke</surname> <given-names>G.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>T.-Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Quantized training of gradient boosting decision trees</article-title>, in <source>Advances in Neural Information Processing Systems</source> <volume>35</volume>.</citation>
</ref>
<ref id="B138">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strobl</surname> <given-names>C.</given-names></name> <name><surname>Boulesteix</surname> <given-names>A.-L.</given-names></name> <name><surname>Zeileis</surname> <given-names>A.</given-names></name> <name><surname>Hothorn</surname> <given-names>T.</given-names></name></person-group> (<year>2007</year>). <article-title>Bias in random forest variable importance measures: Illustrations, sources and a solution</article-title>. <source>BMC Bioinform</source>. <volume>8</volume>, <fpage>25</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-8-25</pub-id><pub-id pub-id-type="pmid">17254353</pub-id></citation></ref>
<ref id="B139">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Struyf</surname> <given-names>J.</given-names></name> <name><surname>D&#x0017E;eroski</surname> <given-names>S.</given-names></name></person-group> (<year>2006</year>). <article-title>Constraint based induction of multi-objective regression trees</article-title>, in <source>Proceedings of the 4th International Conference on Knowledge Discovery in Inductive Databases, KDID&#x00027;05</source> <fpage>222</fpage>&#x02013;<lpage>233</lpage>. <pub-id pub-id-type="doi">10.1007/11733492_13</pub-id></citation>
</ref>
<ref id="B140">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C.</given-names></name> <name><surname>Zaremba</surname> <given-names>W.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Bruna</surname> <given-names>J.</given-names></name> <name><surname>Erhan</surname> <given-names>D.</given-names></name> <name><surname>Goodfellow</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Intriguing properties of neural networks. arXiv:1312.6199</article-title>.</citation>
</ref>
<ref id="B141">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Todorovski</surname> <given-names>L.</given-names></name> <name><surname>Blockeel</surname> <given-names>H.</given-names></name> <name><surname>D&#x0017E;eroski</surname> <given-names>S.</given-names></name></person-group> (<year>2002</year>). <article-title>Ranking with predictive clustering trees</article-title>, in <source>Proceedings of the 13th European Conference on Machine Learning</source> <fpage>444</fpage>&#x02013;<lpage>455</lpage>. <pub-id pub-id-type="doi">10.1007/3-540-36755-1_37</pub-id></citation>
</ref>
<ref id="B142">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tolomei</surname> <given-names>G.</given-names></name> <name><surname>Silvestri</surname> <given-names>F.</given-names></name> <name><surname>Haines</surname> <given-names>A.</given-names></name> <name><surname>Lalmas</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Interpretable predictions of tree-based ensembles via actionable feature tweaking</article-title>, in <source>Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>465</fpage>&#x02013;<lpage>474</lpage>. <pub-id pub-id-type="doi">10.1145/3097983.3098039</pub-id></citation>
</ref>
<ref id="B143">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>T&#x000F6;rnblom</surname> <given-names>J.</given-names></name> <name><surname>Nadjm-Tehrani</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Formal verification of input-output mappings of tree ensembles</article-title>. <source>Sci. Comput. Program</source>. <volume>194</volume>, <fpage>102450</fpage>. <pub-id pub-id-type="doi">10.1016/j.scico.2020.102450</pub-id></citation>
</ref>
<ref id="B144">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tyree</surname> <given-names>S.</given-names></name> <name><surname>Weinberger</surname> <given-names>K. Q.</given-names></name> <name><surname>Agrawal</surname> <given-names>K.</given-names></name> <name><surname>Paykin</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Parallel boosted regression trees for web search ranking</article-title>, in <source>Proceedings of the 20th International Conference on World Wide Web</source> <fpage>387</fpage>&#x02013;<lpage>396</lpage>. <pub-id pub-id-type="doi">10.1145/1963405.1963461</pub-id></citation>
</ref>
<ref id="B145">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Utgoff</surname> <given-names>P. E.</given-names></name></person-group> (<year>1989</year>). <article-title>Incremental induction of decision trees</article-title>. <source>Mach. Learn</source>. <volume>4</volume>, <fpage>161</fpage>&#x02013;<lpage>186</lpage>. <pub-id pub-id-type="doi">10.1023/A:1022699900025</pub-id></citation>
</ref>
<ref id="B146">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van der Linden</surname> <given-names>J. G.</given-names></name> <name><surname>de Weerdt</surname> <given-names>M.</given-names></name> <name><surname>Demirovi&#x00107;</surname> <given-names>E.</given-names></name></person-group> (<year>2022</year>). <article-title>Fair and optimal decision trees: A dynamic programming approach</article-title>, in <source>Advances in Neural Information Processing Systems</source> <volume>35</volume>.</citation>
</ref>
<ref id="B147">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Essen</surname> <given-names>B.</given-names></name> <name><surname>Macaraeg</surname> <given-names>C.</given-names></name> <name><surname>Gokhale</surname> <given-names>M.</given-names></name> <name><surname>Prenger</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>Accelerating a random forest classifier: Multi-Core, GP-GPU, or FPGA?</article-title> in <source>2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines</source> <fpage>232</fpage>&#x02013;<lpage>239</lpage>. <pub-id pub-id-type="doi">10.1109/FCCM.2012.47</pub-id></citation>
</ref>
<ref id="B148">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Wolputte</surname> <given-names>E.</given-names></name> <name><surname>Blockeel</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Missing value imputation with MERCS: A faster alternative to MissForest</article-title>, in <source>Proceedings of the 23rd International Conference on Discovery Science</source> <fpage>502</fpage>&#x02013;<lpage>516</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-61527-7_33</pub-id></citation>
</ref>
<ref id="B149">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Wolputte</surname> <given-names>E.</given-names></name> <name><surname>Korneva</surname> <given-names>E.</given-names></name> <name><surname>Blockeel</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>MERCS: multi-directional ensembles of regression and classification trees</article-title>, in <source>Proceedings of the 32nd AAAI Conference on Artificial Intelligence</source> <fpage>4276</fpage>&#x02013;<lpage>4283</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v32i1.11735</pub-id></citation>
</ref>
<ref id="B150">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vens</surname> <given-names>C.</given-names></name> <name><surname>Struyf</surname> <given-names>J.</given-names></name> <name><surname>Schietgat</surname> <given-names>L.</given-names></name> <name><surname>Dzeroski</surname> <given-names>S.</given-names></name> <name><surname>Blockeel</surname> <given-names>H.</given-names></name></person-group> (<year>2008</year>). <article-title>Decision trees for hierarchical multi-label classification</article-title>. <source>Mach. Learn</source>. <volume>73</volume>, <fpage>185</fpage>&#x02013;<lpage>214</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-008-5077-3</pub-id><pub-id pub-id-type="pmid">25865524</pub-id></citation></ref>
<ref id="B151">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Verhaeghe</surname> <given-names>H.</given-names></name> <name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Pesant</surname> <given-names>G.</given-names></name> <name><surname>Quimper</surname> <given-names>C.</given-names></name> <name><surname>Schaus</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Learning optimal decision trees using constraint programming</article-title>. <source>Constr. Int. J</source>. <volume>25</volume>, <fpage>226</fpage>&#x02013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1007/s10601-020-09312-3</pub-id></citation>
</ref>
<ref id="B152">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Verwer</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name></person-group> (<year>2019</year>). <article-title>Learning optimal classification trees using a binary linear program formulation</article-title>, in <source>Proceedings of the 33rd AAAI Conference on Artificial Intelligence</source> <fpage>1625</fpage>&#x02013;<lpage>1632</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33011624</pub-id></citation>
</ref>
<ref id="B153">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vos</surname> <given-names>D.</given-names></name> <name><surname>Verwer</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Efficient training of robust decision trees against adversarial examples</article-title>, in <source>Proceedings of the 38th International Conference on Machine Learning</source> <fpage>10586</fpage>&#x02013;<lpage>10595</lpage>.</citation>
</ref>
<ref id="B154">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vos</surname> <given-names>D.</given-names></name> <name><surname>Verwer</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Robust optimal classification trees against adversarial examples</article-title>, in <source>Proceedings of the 36th AAAI Conference on Artificial Intelligence</source> <fpage>8520</fpage>&#x02013;<lpage>8528</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v36i8.20829</pub-id></citation>
</ref>
<ref id="B155">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vos</surname> <given-names>D.</given-names></name> <name><surname>Verwer</surname> <given-names>S.</given-names></name></person-group> (<year>2023</year>). <article-title>Adversarially robust decision tree relabeling</article-title>, in <source>Machine Learning and Knowledge Discovery in Databases (ECMLPKDD 2022), Part III</source> <fpage>203</fpage>&#x02013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-031-26409-2_13</pub-id></citation>
</ref>
<ref id="B156">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Boning</surname> <given-names>D.</given-names></name> <name><surname>Hsieh</surname> <given-names>C.-J.</given-names></name></person-group> (<year>2020</year>). <article-title>On Lp-norm robustness of ensemble decision stumps and trees</article-title>, in <source>Proceedings of the 37th International Conference on Machine Learning</source> <fpage>10104</fpage>&#x02013;<lpage>10114</lpage>.</citation>
</ref>
<ref id="B157">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>D.</given-names></name> <name><surname>Nair</surname> <given-names>R.</given-names></name> <name><surname>Dhurandhar</surname> <given-names>A.</given-names></name> <name><surname>Varshney</surname> <given-names>K. R.</given-names></name> <name><surname>Daly</surname> <given-names>E. M.</given-names></name> <name><surname>Singh</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>On the safety of interpretable machine learning: A maximum deviation approach</article-title>, in <source>Advances in Neural Information Processing Systems</source> <volume>35</volume>.</citation>
</ref>
<ref id="B158">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wen</surname> <given-names>Z.</given-names></name> <name><surname>He</surname> <given-names>B.</given-names></name> <name><surname>Kotagiri</surname> <given-names>R.</given-names></name> <name><surname>Lu</surname> <given-names>S.</given-names></name> <name><surname>Shi</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Efficient gradient boosted decision tree training on GPUs</article-title>, in <source>2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</source> <fpage>234</fpage>&#x02013;<lpage>243</lpage>. <pub-id pub-id-type="doi">10.1109/IPDPS.2018.00033</pub-id></citation>
</ref>
<ref id="B159">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>G.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name> <name><surname>Bi</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Wu</surname> <given-names>X.</given-names></name></person-group> (<year>2009</year>). <article-title>MReC4.5: C4.5 ensemble classification with MapReduce</article-title>, in <source>2009 Fourth ChinaGrid Annual Conference</source> <fpage>249</fpage>&#x02013;<lpage>255</lpage>.</citation>
</ref>
<ref id="B160">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>M.</given-names></name> <name><surname>Hughes</surname> <given-names>M.</given-names></name> <name><surname>Parbhoo</surname> <given-names>S.</given-names></name> <name><surname>Zazzi</surname> <given-names>M.</given-names></name> <name><surname>Roth</surname> <given-names>V.</given-names></name> <name><surname>Doshi-Velez</surname> <given-names>F.</given-names></name></person-group> (<year>2018</year>). <article-title>Beyond sparsity: Tree regularization of deep models for interpretability</article-title>, in <source>Proceedings of the AAAI Conference on Artificial Intelligence</source> <volume>32</volume>. <pub-id pub-id-type="doi">10.1609/aaai.v32i1.11501</pub-id></citation>
</ref>
<ref id="B161">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Lu</surname> <given-names>S.</given-names></name> <name><surname>Chang</surname> <given-names>T.-H.</given-names></name> <name><surname>Shi</surname> <given-names>Q.</given-names></name></person-group> (<year>2022</year>). <article-title>An efficient learning framework for federated XGBoost using secret sharing and distributed optimization</article-title>. <source>ACM Trans. Intell. Syst. Technol</source>. <volume>13</volume>, <fpage>1</fpage>&#x02013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1145/3523061</pub-id></citation>
</ref>
<ref id="B162">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>R.</given-names></name> <name><surname>Baracaldo</surname> <given-names>N.</given-names></name> <name><surname>Joshi</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>Privacy-preserving machine learning: Methods, challenges and directions. arXiv:2108.04417</article-title>.</citation>
</ref>
<ref id="B163">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Z.</given-names></name> <name><surname>Zhu</surname> <given-names>G.</given-names></name> <name><surname>Yuan</surname> <given-names>C.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name></person-group> (<year>2022</year>). <article-title>One-stage tree: end-to-end tree builder and pruner</article-title>. <source>Mach. Learn</source>. <volume>111</volume>, <fpage>1959</fpage>&#x02013;<lpage>1985</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-021-06094-4</pub-id></citation>
</ref>
<ref id="B164">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Y.-Y.</given-names></name> <name><surname>Rashtchian</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R. R.</given-names></name> <name><surname>Chaudhuri</surname> <given-names>K.</given-names></name></person-group> (<year>2020</year>). <article-title>A closer look at accuracy vs. robustness</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>8588</fpage>&#x02013;<lpage>8601</lpage>.</citation>
</ref>
<ref id="B165">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ye</surname> <given-names>T.</given-names></name> <name><surname>Zhou</surname> <given-names>H.</given-names></name> <name><surname>Zou</surname> <given-names>W. Y.</given-names></name> <name><surname>Gao</surname> <given-names>B.</given-names></name> <name><surname>Zhang</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>RapidScorer: Fast tree ensemble evaluation by maximizing compactness in data level parallelization</article-title>, in <source>Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &#x00026; Data Mining</source> <fpage>941</fpage>&#x02013;<lpage>950</lpage>.</citation>
</ref>
<ref id="B166">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>P. L. H.</given-names></name> <name><surname>Wan</surname> <given-names>W. M.</given-names></name> <name><surname>Lee</surname> <given-names>P. H.</given-names></name></person-group> (<year>2011</year>). <article-title>Decision tree modeling for ranking data</article-title>, in <source>Preference Learning</source> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>) <fpage>83</fpage>&#x02013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-14125-6_5</pub-id></citation>
</ref>
<ref id="B167">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zantedeschi</surname> <given-names>V.</given-names></name> <name><surname>Kusner</surname> <given-names>M.</given-names></name> <name><surname>Niculae</surname> <given-names>V.</given-names></name></person-group> (<year>2021</year>). <article-title>Learning binary decision trees by argmin differentiation</article-title>, in <source>Proceedings of the 38th International Conference on Machine Learning</source> <fpage>12298</fpage>&#x02013;<lpage>12309</lpage>.</citation>
</ref>
<ref id="B168">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>&#x0017D;enko</surname> <given-names>B.</given-names></name> <name><surname>Todorovski</surname> <given-names>L.</given-names></name> <name><surname>D&#x0017E;eroski</surname> <given-names>S.</given-names></name></person-group> (<year>2001</year>). <article-title>A comparison of stacking with meta decision trees to bagging, boosting, and stacking with other methods</article-title>, in <source>Proceedings of the 2001 IEEE International Conference on Data Mining</source> <fpage>669</fpage>&#x02013;<lpage>670</lpage>.</citation>
</ref>
<ref id="B169">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Hsieh</surname> <given-names>C.-J.</given-names></name></person-group> (<year>2020a</year>). <article-title>An efficient adversarial attack for tree ensembles</article-title>, in <source>Advances in Neural Information Processing Systems</source> <fpage>16165</fpage>&#x02013;<lpage>16176</lpage>.</citation>
</ref>
<ref id="B170">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name></person-group> (<year>2020b</year>). <article-title>Decision-based evasion attacks on tree ensemble classifiers</article-title>. <source>World Wide Web</source> <volume>23</volume>, <fpage>2957</fpage>&#x02013;<lpage>2977</lpage>. <pub-id pub-id-type="doi">10.1007/s11280-020-00813-y</pub-id></citation>
</ref>
<ref id="B171">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Si</surname> <given-names>S.</given-names></name> <name><surname>Hsieh</surname> <given-names>C.-J.</given-names></name></person-group> (<year>2017</year>). <article-title>GPU-acceleration for large-scale tree boosting. arXiv:1706.08359</article-title>.</citation>
</ref>
<ref id="B172">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Ntoutsi</surname> <given-names>E.</given-names></name></person-group> (<year>2019</year>). <article-title>FAHT: an adaptive fairness-aware decision tree classifier</article-title>, in <source>Proceedings of the 28th International Joint Conference on Artificial Intelligence</source> <fpage>1480</fpage>&#x02013;<lpage>1486</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2019/205</pub-id></citation>
</ref>
<ref id="B173">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>McArdle</surname> <given-names>J. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Rationale and applications of survival tree and survival ensemble methods</article-title>. <source>Psychometrika</source> <volume>80</volume>, <fpage>811</fpage>&#x02013;<lpage>833</lpage>. <pub-id pub-id-type="doi">10.1007/s11336-014-9413-1</pub-id><pub-id pub-id-type="pmid">25228495</pub-id></citation></ref>
<ref id="B174">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Z.-H.</given-names></name></person-group> (<year>2012</year>). <source>Ensemble Methods: Foundations and Algorithms</source>. <edition>1st edition</edition>. <publisher-loc>London</publisher-loc>: <publisher-name>Chapman &#x00026; Hall/CRC</publisher-name>. <pub-id pub-id-type="doi">10.1201/b12207</pub-id></citation>
</ref>
</ref-list>
</back>
</article>