<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">787459</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2021.787459</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Fair and Effective Policing for Neighborhood Safety: Understanding and Overcoming Selection Biases</article-title>
<alt-title alt-title-type="left-running-head">Ren et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">FEPNS-USB</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ren</surname>
<given-names>Weijeiying</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1500456/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Liu</surname>
<given-names>Kunpeng</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhao</surname>
<given-names>Tianxiang</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Fu</surname>
<given-names>Yanjie</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Computer Science Department, University of Central Florida, <addr-line>Orlando</addr-line>, <addr-line>FL</addr-line>, <country>United&#x20;States</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>College of Information Sciences and Technology, The Pennsylvania State University, <addr-line>University Park</addr-line>, <addr-line>PA</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1179065/overview">Xun Zhou</ext-link>, The University of Iowa, United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/892446/overview">Meng Jiang</ext-link>, University of Notre Dame, United&#x20;States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1504558/overview">Farid Razzak</ext-link>, United&#x20;States Securities and Exchange Commission, United&#x20;States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Yanjie Fu, <email>yanjie.fu@ucf.edu</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big&#x20;Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>24</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>787459</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Ren, Liu, Zhao and Fu.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Ren, Liu, Zhao and Fu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Accurate crime prediction and risk estimation can help improve the efficiency and effectiveness of policing activities. However, reports have revealed that biases such as racial prejudice can exist in police enforcement, and trained predictors may inherit them. In this work, we study the possible causes of and countermeasures to this problem, using records from the New York City Stop-and-Frisk (NYCSF) program as the dataset. Concretely, we analyze the possible origin of this phenomenon from the perspective of risk discrepancy, and study it within the scope of selection bias. Motivated by theories in causal inference, we propose a re-weighting approach based on the propensity score to balance the data distribution with respect to the identified treatment: the search action. Naively applying existing re-weighting approaches from causal inference is not suitable, as the weight is passively estimated from observational data. Inspired by adversarial learning techniques, we formulate predictor training and re-weighting as a min-max game, so that the re-weighting scale can be learned automatically. Specifically, the proposed approach aims to train a model that: 1) balances the data distribution between the searched and un-searched groups; 2) remains discriminative between treatment interventions. Extensive evaluations on a real-world dataset are conducted, and the results validate the effectiveness of the proposed framework.</p>
</abstract>
<kwd-group>
<kwd>counterfactual learning</kwd>
<kwd>neighborhood safety</kwd>
<kwd>fairness</kwd>
<kwd>stop-and-frisk</kwd>
<kwd>adversarial learning</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>As part of law enforcement, policing is expected to protect citizens, fight crime, and maintain community safety effectively. However, latent racial prejudice among decision makers can push it in unfair directions and impair its efficiency. As one example, the New York City Police Department launched the <italic>Stop-and-Frisk</italic> (NYCSF) program, a policing practice of temporarily detaining, questioning, and stopping civilians, and searching drivers on the street for weapons and other contraband. Such a program aims to give communities great potential to reduce crime in advance and alleviate social conflicts. However, an analysis <xref ref-type="bibr" rid="B19">Gelman et&#x20;al. (2007)</xref> of NYCSF revealed that the rate of innocent people being stopped and searched is disproportionately high for those who are black or Latino. This case shows that racial bias poses great obstacles to efficient policing and resource allocation. This motivates us to 1) monitor and shape stop-and-frisk practices through data analysis and understand how racial prejudice influences the policing system; and 2) develop a debiasing solution for the NYCSF program that not only accounts for such bias, but also eases the burden on the police system and mitigates ethical conflict.</p>
<p>Previous works <xref ref-type="bibr" rid="B34">Meares (2014)</xref>; <xref ref-type="bibr" rid="B43">Tyler et&#x20;al. (2014)</xref> on the NYCSF program mainly discuss the existence of biases, e.g., race, age, and geographic distribution <xref ref-type="bibr" rid="B19">Gelman et&#x20;al. (2007)</xref>, and evaluate their social impact through data analysis. To obtain an effective countermeasure, it is promising to adopt fairness methodologies <xref ref-type="bibr" rid="B32">Locatello et&#x20;al. (2019)</xref>; <xref ref-type="bibr" rid="B50">Zafar M. B. et&#x20;al. (2017)</xref>; <xref ref-type="bibr" rid="B44">Viswesvaran and Ones (2004)</xref>; <xref ref-type="bibr" rid="B24">Hashimoto et&#x20;al. (2018)</xref> from the machine learning community for this specific task. The main idea is to first define sensitive attributes, like race in this case, and then enforce fairness across racial groups. Recently, <xref ref-type="bibr" rid="B29">Khademi et&#x20;al. (2019)</xref> introduced a matching-based causal fairness method that enforces the predicted crime rate to be the same across racial populations. However, this assumption may be too presumptuous, since crime rates among racial populations differ in real-world scenarios. The enforced fairness will inevitably damage community safety. This observation inspires us to ask: how should we understand the bias in the NYCSF program, and how can we mitigate it without making unrealistic assumptions?</p>
<p>To identify the racial prejudice of police officers, we conduct a series of data analysis experiments and find that it can be modeled as a form of selection bias. Concretely, we separate each racial population into multiple groups using several different criteria, and compare the criminal rate in each group. From the results, we find that 1) the black population has the lowest criminal rate, which contradicts the stereotype held by many people <xref ref-type="bibr" rid="B19">Gelman et&#x20;al. (2007)</xref>; 2) the racial distribution of the observed searched group is quite different from that of the un-searched group. We visualize the results in <xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>. These interesting observations motivate us to study the bias embedded in the &#x2018;search&#x2019; action. Taking one step further, we calculate the search rate of each race in the whole population, shown in <xref ref-type="fig" rid="F1">Figure&#x20;1B</xref>. It is clear that the black population is overwhelmingly searched compared to other groups, hence inducing a lower criminal rate. Analyzing these statistics exposes one mechanism from which the bias in the NYCSF program originates: police stop and search suspicious passengers based on their own judgement, in which racial prejudice can lie, causing the selection bias problem. Hence, in this task, racial prejudice can be estimated by modeling the distribution of the &#x2018;search&#x2019; action, which in turn can be used to alleviate the&#x20;bias.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Statistical analysis on the NYCSF dataset. Figure <bold>(A)</bold> shows that the black racial population has a lower criminal rate than other racial populations in both the searched and un-searched groups. Figure <bold>(B)</bold> shows that the black population is overwhelmingly searched. Selective enforcement makes more black people searched by police while most of them are innocent.</p>
</caption>
<graphic xlink:href="fdata-04-787459-g001.tif"/>
</fig>
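The group-wise statistics behind the figure can be reproduced with a few lines of pandas. The snippet below is a toy illustration on hypothetical records (not the real NYCSF data): a per-race search rate over the whole population, and a per-race criminal rate within the searched and un-searched groups.

```python
# Toy re-creation of the Figure 1 statistics on hypothetical records.
import pandas as pd

df = pd.DataFrame({
    "race":     ["BLACK", "BLACK", "BLACK", "WHITE", "WHITE", "ASIAN"],
    "searched": [1, 1, 0, 1, 0, 0],
    "criminal": [0, 1, 0, 1, 0, 0],
})

# Search rate of each race in the whole population (cf. Figure 1B).
search_rate = df.groupby("race")["searched"].mean()

# Criminal rate within the searched / un-searched groups, by race (cf. Figure 1A).
criminal_rate = df.groupby(["searched", "race"])["criminal"].mean()

print(search_rate)
print(criminal_rate)
```

On these toy rows, the black group is searched most often, yet its criminal rate among searched subjects is not the highest, mirroring the selective-enforcement pattern discussed above.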
<p>Based on the previous analysis, in this paper we study a debiased crime prediction task for the NYCSF program, and attribute the bias to selection bias. This problem is under-explored in the NYCSF program and poses two main challenges: 1) Lack of theoretical analysis. Why is a supervised predictor biased when trained on observational data? Is there any theoretical insight that can help us understand this problem and provide guidance to mitigate selection bias? 2) Lack of unbiased data. The biased training data lacks important signals about what unbiased data looks like. How can we model such an unbiased data distribution and design our loss function?</p>
<p>To answer the first question, we formulate our problem from an empirical risk minimization (ERM) perspective. The bias can be formulated as the discrepancy between the empirical risk and the true risk, which can be addressed via re-weighting. To better understand the meaning of this discrepancy, inspired by counterfactual modeling <xref ref-type="bibr" rid="B36">Pearl et&#x20;al. (2009)</xref>; <xref ref-type="bibr" rid="B37">Pearl (2010)</xref> and causal inference <xref ref-type="bibr" rid="B26">Kallus (2020)</xref>; <xref ref-type="bibr" rid="B25">Hern&#xe1;n and Robins (2010)</xref>; <xref ref-type="bibr" rid="B29">Khademi et&#x20;al. (2019)</xref>, we hypothesize the counterfactual distribution <xref ref-type="bibr" rid="B46">Wu et&#x20;al. (2019a)</xref> of each driver, and then show that a supervised estimator can be unbiased only when the selection probability, i.e.,&#x20;the probability that a driver is searched by the police, is known and fixed. In sum, selection bias accounts for the risk discrepancy. This conclusion also provides an insight for solving the second challenge: a better design of selection-mechanism modeling, which helps us estimate the unbiased distribution from observational data, may address this problem.</p>
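The risk-discrepancy argument can be sketched as follows, in notation introduced here for illustration (not the paper's own symbols): o(x) denotes the probability that a driver with features x is searched, ℓ a loss function, and f the predictor.

```latex
% Sketch of the risk-discrepancy argument (our notation, for illustration).
% Only searched drivers (t_i = 1) contribute labeled records, each drawn
% with selection probability o(x) \in (0, 1].
%
% True risk:           R(f)       = \mathbb{E}_{(x,y)\sim P}\left[\ell(f(x), y)\right]
% Naive empirical risk: \hat{R}(f) = \tfrac{1}{n}\sum_{i:\, t_i = 1} \ell(f(x_i), y_i)
%
% In expectation,
%   \mathbb{E}\big[\hat{R}(f)\big] \propto \mathbb{E}_{(x,y)\sim P}\big[o(x)\,\ell(f(x), y)\big],
% which equals R(f) only when o(x) is constant (known and fixed selection).
% Re-weighting each searched sample by 1/o(x) cancels the selection term:
%   \mathbb{E}\Big[\tfrac{1}{n}\sum_{i:\, t_i = 1} \tfrac{\ell(f(x_i), y_i)}{o(x_i)}\Big]
%   = \mathbb{E}_{(x,y)\sim P}\big[\ell(f(x), y)\big] = R(f).
```

This is the sense in which selection bias "accounts for the risk discrepancy": a non-constant, unmodeled o(x) makes the naive empirical risk a biased estimate of the true risk.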
<p>Inspired by causal inference <xref ref-type="bibr" rid="B37">Pearl (2010)</xref>, we resort to counterfactual modeling to solve the second challenge. Regarding the searched/un-searched action as a treatment/control intervention <xref ref-type="bibr" rid="B46">Wu et&#x20;al. (2019a)</xref>, the core idea in causal inference is to create a pseudo-population in which the distributions of the treated and control groups are similar, so that the outcome is independent of the treatment conditional on the confounder. The confounder here is racial prejudice, which introduces selective enforcement. In causal inference, inverse propensity weighting (IPW) <xref ref-type="bibr" rid="B5">Austin (2011)</xref> is a classical re-weighting method valued for its simplicity and effectiveness. The propensity score represents the probability that a driver is searched by the police. However, applying this idea to the NYCSF program is not trivial. First, the propensity score in IPW is fixed and only partially observable from the data. As reported, police enforcement patterns are correlated with spatial information and also depend on intractable randomness, e.g., burst events and weather conditions. It is essential to incorporate such unknown factors into propensity score estimation. Second, two dilemmas make the implementation challenging: 1) we assume the driver&#x2019;s criminal outcome should not change depending on whether the driver is searched; to achieve this, it is desirable to balance the treatment and control groups w.r.t. the distributions of confounder representations. 2) The data distributions in the treated and control groups should remain distinguishable; intuitively, professional experience leads police officers to search drivers who are more likely to be criminal.</p>
<p>In response to the above problems, we propose an adversarial re-weighting method to mitigate selection bias in NYCSF crime prediction tasks. To incorporate unknown factors into propensity score estimation, we do not calculate the score directly from observational data. We first formulate the counterfactual distribution estimation problem, e.g., &#x2018;what would the criminal result be if the driver had (not) been searched?&#x2019; To learn a fair data representation, we restrict the crime label from changing when we generate the corresponding counterfactual counterpart. Consequently, we obtain a variant of the propensity score estimator that accounts for uncertainty. Considering the two conflicting properties inherent in debiasing the NYCSF task, we formulate the two desiderata for handling selection bias as a minimax game. In this game, we train a base player to improve crime classification accuracy, while the re-weighting framework balances the distribution between the treated and control groups. The weighting function is regarded as an adversary that confuses the treatment discriminator. Our contributions are listed as follows:<list list-type="simple">
<list-item>
<p>&#x2022; We study a fair and effective policing problem from a novel selection bias perspective in the NYCSF task. We provide a detailed theoretical analysis to show the inconsistency issue of supervised learning on observational data. To the best of our knowledge, we are the first to analyze this problem both empirically and theoretically.</p>
</list-item>
<list-item>
<p>&#x2022; Inspired by the inverse propensity weighting (IPW) method in causal inference, we propose a simple deferred re-balancing optimization procedure to apply re-weighting more effectively. The proposed counterfactual re-weighting method connects theoretical results in causal inference with crime prediction to improve estimation efficiency. Different from the fixed propensity score estimation in IPW, the proposed re-weighting score accounts for unknown factors with a learned function.</p>
</list-item>
<list-item>
<p>&#x2022; Accordingly, to both balance the data distribution in the treated and control group and make the learned distribution distinguishable, we shift the re-weighting objective into a minimax&#x20;game.</p>
</list-item>
<list-item>
<p>&#x2022; We conduct extensive experiments on the realistic NYCSF dataset to validate the effectiveness of our method. Compared to the baselines, the proposed method improves the efficiency of crime prediction. Besides, it can mitigate racial prejudice from an exposure bias perspective, taking into account both efficiency and fairness.</p>
</list-item>
</list>
</p>
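The minimax formulation described above can be sketched in a few dozen lines of numpy. Everything below is an illustrative toy, not the paper's exact architecture: we assume logistic models for the treatment discriminator and the crime predictor, a softplus weighting function as the adversary, and synthetic data in which the &#x2018;search&#x2019; decision is biased by one feature.

```python
# Toy sketch of adversarial re-weighting: the weight net (adversary) tries to
# make treated/control samples indistinguishable to a discriminator, while a
# predictor minimizes the re-weighted crime-classification loss.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 4
X = rng.normal(size=(n, d))
t = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(float)  # biased "searched"
y = (rng.random(n) < 1 / (1 + np.exp(-X[:, 1]))).astype(float)  # "criminal" label

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softplus(z):
    return np.log1p(np.exp(z))

a = np.zeros(d)      # adversarial weighting function (re-weighting player)
theta = np.zeros(d)  # treatment discriminator
beta = np.zeros(d)   # crime predictor
lr = 0.1

for _ in range(200):
    w = softplus(X @ a) + 1e-6                       # positive sample weights
    # Discriminator: minimize the weighted log-loss of predicting t.
    p = sigmoid(X @ theta)
    theta -= lr * X.T @ (w * (p - t)) / n
    # Adversary: gradient *ascent* on the same loss, so the learned weights
    # make the treated and control distributions hard to tell apart.
    ll = t * np.log(p + 1e-9) + (1 - t) * np.log(1 - p + 1e-9)
    a += lr * (-(X.T @ (ll * sigmoid(X @ a))) / n)
    # Predictor: minimize the re-weighted crime-classification loss.
    q = sigmoid(X @ beta)
    beta -= lr * X.T @ (w * (q - y)) / n

w = softplus(X @ a) + 1e-6                           # final learned weights
```

The three alternating updates mirror the min-max game: the discriminator and the weight function play against each other, while the predictor is trained on the balanced, re-weighted objective.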
</sec>
<sec id="s2">
<title>2 Related Works</title>
<p>Fairness Anti-discrimination laws typically evaluate the fairness <xref ref-type="bibr" rid="B10">Calmon et&#x20;al. (2017)</xref>; <xref ref-type="bibr" rid="B27">Kamiran and Calders (2009)</xref>; <xref ref-type="bibr" rid="B1">Agarwal et&#x20;al. (2018)</xref>; <xref ref-type="bibr" rid="B50">Zafar B. et&#x20;al. (2017)</xref>; <xref ref-type="bibr" rid="B33">Louizos et&#x20;al. (2015)</xref> of a decision-making process using two distinct notions <xref ref-type="bibr" rid="B50">Zafar M. B. et&#x20;al. (2017)</xref>: disparate treatment <xref ref-type="bibr" rid="B30">Krieger and Fiske (2006)</xref>; <xref ref-type="bibr" rid="B32">Locatello et&#x20;al. (2019)</xref> and disparate impact <xref ref-type="bibr" rid="B13">Feldman et&#x20;al. (2015)</xref>; <xref ref-type="bibr" rid="B24">Hashimoto et&#x20;al. (2018)</xref>. Disparate treatment refers to intentional discrimination, where people in a protected class are deliberately treated differently. Disparate impact refers to discrimination that is unintentional: the procedures are the same for everyone, but people in a protected class are negatively affected. Disparate impact discrimination, however, is not always illegal. These two definitions are too abstract for the purpose of computation; as a result, there is no consensus on the mathematical formulations of fairness.</p>
<p>In general, there has been an increasing line of work addressing fairness in machine learning models; most of it can be categorized into three groups: 1) individual fairness <xref ref-type="bibr" rid="B9">Biega et&#x20;al. (2018)</xref>; <xref ref-type="bibr" rid="B28">Kang et&#x20;al. (2020)</xref>; 2) group fairness <xref ref-type="bibr" rid="B42">Srivastava et&#x20;al. (2019)</xref>; <xref ref-type="bibr" rid="B15">Fleurbaey (1995)</xref>; and 3) causality-based fairness <xref ref-type="bibr" rid="B26">Kallus (2020)</xref>. Individual fairness expects similar individuals to have similar outcomes; it is not easy to find a suitable distance metric. Group fairness notions require the algorithm to treat different groups equally. The most commonly used group fairness notions include demographic parity <xref ref-type="bibr" rid="B2">Andreev et&#x20;al. (2002)</xref>, equal opportunity <xref ref-type="bibr" rid="B3">Arneson (1989)</xref>, equalized odds <xref ref-type="bibr" rid="B23">Hardt et&#x20;al. (2016)</xref> and calibration <xref ref-type="bibr" rid="B39">Pleiss et&#x20;al. (2017)</xref>. However, they only use sensitive attributes and the outcome as meaningful features. The above two families of notions are both based on passively observed data. To provide a possible way to interpret the causes of bias, causality-based fairness notions <xref ref-type="bibr" rid="B31">Kusner et&#x20;al. (2017)</xref> are defined based on different types of causal effects, such as the total effect of interventions <xref ref-type="bibr" rid="B16">Funk et&#x20;al. (2011)</xref>, path-specific effects for direct/indirect discrimination <xref ref-type="bibr" rid="B11">Chiappa (2019)</xref>, and counterfactual fairness based on counterfactual effects <xref ref-type="bibr" rid="B18">Garg et&#x20;al. (2019)</xref>; <xref ref-type="bibr" rid="B48">Xu et&#x20;al. (2019)</xref>. Identifiability <xref ref-type="bibr" rid="B6">Avin et&#x20;al. (2005)</xref> is a critical barrier for causality-based fairness to be applied in real applications. <xref ref-type="bibr" rid="B47">Wu et&#x20;al. (2019b)</xref> develop a constrained optimization problem for bounding PC fairness, motivated by the method proposed in <xref ref-type="bibr" rid="B7">Balke and Pearl (1994)</xref> for bounding confounded causal effects. It is also hard to reach a consensus on what the causal graph should look like, and even harder to decide which features to use even given such a&#x20;graph.</p>
<p>Propensity Scoring in causal inference Biases caused by confounders <xref ref-type="bibr" rid="B21">Greenland and Robins (2009)</xref> have been extensively studied in the causal inference <xref ref-type="bibr" rid="B37">Pearl (2010)</xref> domain, and one of the most popular directions addresses them by utilizing the propensity score <xref ref-type="bibr" rid="B5">Austin (2011)</xref>. Propensity score-based methods re-weight samples from different treatment groups to balance their distributions; after re-weighting with the propensity score, the distribution of observations will be similar across treatment groups. One classical propensity scoring method is inverse propensity weighting (IPW) <xref ref-type="bibr" rid="B5">Austin (2011)</xref>, in which the weighting score is equal to the inverse of the probability of receiving the treatment.</p>
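A minimal IPW sketch on synthetic data may look as follows. The single-confounder setup, the hand-rolled logistic fit, and all constants are illustrative assumptions, not the paper's pipeline: we estimate the propensity e(x) = P(t = 1 | x) with a logistic model, weight each sample by the inverse of the probability of its received treatment, and check that the covariate gap between groups shrinks.

```python
# IPW sketch: estimate propensity scores, re-weight, and compare covariate
# balance between the treated and control groups before and after weighting.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                      # a single confounder
t = (rng.random(n) < 1 / (1 + np.exp(-1.5 * x))).astype(float)  # biased treatment

# Fit a logistic model e(x) = P(t = 1 | x) by plain gradient descent.
w0, w1 = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w0 + w1 * x)))
    g = p - t                               # gradient of the mean log-loss
    w0 -= 0.1 * g.mean()
    w1 -= 0.1 * (g * x).mean()

e_hat = 1 / (1 + np.exp(-(w0 + w1 * x)))
ipw = t / e_hat + (1 - t) / (1 - e_hat)     # inverse propensity weights

def wmean(v, w):
    return (v * w).sum() / w.sum()

gap_raw = abs(x[t == 1].mean() - x[t == 0].mean())
gap_ipw = abs(wmean(x, t * ipw) - wmean(x, (1 - t) * ipw))
print(gap_raw, gap_ipw)
```

Before re-weighting, the treated group has systematically larger x; after IPW, the weighted means of x in the two groups are close, which is the "pseudo-population" balance the related-work paragraph describes.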
<p>Our approach also follows this line of work. However, the directly computed propensity score in existing approaches may be suboptimal, as the assumption of equality across groups is too presumptuous for this task. This observation motivates our design of introducing an adversarial module.</p>
</sec>
<sec id="s3">
<title>3 Stop-And-Frisk by NYPD</title>
<p>In this section, we will first introduce the working mechanism of the Stop-and-Frisk program, and then introduce the collected data provided by the New York City Police Department (NYPD).</p>
<sec id="s3-1">
<title>3.1 NYCSF Program</title>
<p>NYPD launched the Stop-and-Frisk program for recording and analyzing police officers&#x2019; regular enforcement practice. In the Stop-and-Frisk program, there are generally three types of actors: 1) the police officer, who can stop a person and check whether the suspect carries weapons or drugs, filling out a form that records the details; 2) the suspect, who is subjected to the stop; and 3) the environment, in which the stop occurs, including location descriptions and time records. After an individual is stopped, officers may conduct a frisk (i.e.,&#x20;a quick pat-down of the person&#x2019;s outer clothing) if they reasonably suspect the individual is armed and dangerous; officers may additionally conduct a search if they have probable cause of criminal activity. An officer may decide to make an arrest or issue a summons, all of which is recorded on the UF-250 form. Responses are subsequently standardized, compiled, and released annually to the public.</p>
<p>
<xref ref-type="table" rid="T1">Table&#x20;1</xref> shows the description of the NYCSF dataset. The NYCSF dataset contains a variety of heterogeneous information about all entities. Entities of each type include various information in the form of unstructured data, such as text, and structured data, such as geo-spatial, numerical, categorical, and ordinal&#x20;data.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Summary of key information recorded on the UF-250 Stop-and-Frisk&#x20;form.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Feature level</th>
<th align="center">Category</th>
<th align="center">Feature type</th>
<th align="center">Feature</th>
<th align="center">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="17" align="left">Character Information</td>
<td rowspan="14" align="left">Suspect Demographic Characteristic</td>
<td rowspan="7" align="left">relevant sensitive features</td>
<td align="left">Race</td>
<td align="left">Category: &#x2018;BLACK HISPANIC&#x2019;, &#x2018;WHITE&#x2019;, &#x2018;ASIAN&#x2019;, &#x2018;WHITE HISPANIC&#x2019;, &#x2018;AMERICAN INDIAN&#x2019;</td>
</tr>
<tr>
<td align="left">Hair Color</td>
<td align="left">A brief report of hair color</td>
</tr>
<tr>
<td align="left">Sex</td>
<td align="left">Sex of suspect: male and female</td>
</tr>
<tr>
<td align="left">Age</td>
<td align="left">Age of all suspects: from 6 to 99</td>
</tr>
<tr>
<td align="left">Weight/Height</td>
<td align="left">Weight/Height of Suspect</td>
</tr>
<tr>
<td align="left">Body build</td>
<td align="left">Thin, heavy, or medium build of the suspect</td>
</tr>
<tr>
<td align="left">Suspect other description</td>
<td align="left">First-glance description of the suspect</td>
</tr>
<tr>
<td rowspan="7" align="left">Event Varying</td>
<td align="left">Stopped Way</td>
<td align="left">Frisked, Searched or Not</td>
</tr>
<tr>
<td align="left">Mental Activity</td>
<td align="left">Suspect&#x2019;s reaction when stopped, e.g., complaining, calm, nervous</td>
</tr>
<tr>
<td align="left">Weapon found</td>
<td align="left">Whether a weapon was found if the suspect was searched/frisked</td>
</tr>
<tr>
<td align="left">Drug Found</td>
<td align="left">Whether drugs were found if the suspect was searched/frisked</td>
</tr>
<tr>
<td align="left">Criminal Label</td>
<td align="left">Generally, we regard arrested and summoned suspects as guilty</td>
</tr>
<tr>
<td align="left">Initiated stop</td>
<td align="left">How the stop of the suspect in the car was initiated: radio run, call, or others</td>
</tr>
<tr>
<td align="left">Work Status</td>
<td align="left">Whether In Uniform</td>
</tr>
<tr>
<td rowspan="3" align="left">Police Profile</td>
<td align="left">others</td>
<td align="left">Official Rank</td>
<td align="left">Official Rank, e.g., PBM, Non</td>
</tr>
<tr>
<td align="left">others</td>
<td align="left">Official explained stop flag</td>
<td align="left">Whether the police explain the reason for the stop, yes or no</td>
</tr>
<tr>
<td align="left">others</td>
<td align="left">Official uniform flag stop</td>
<td align="left">if the police are in uniform when the stop happened, yes or no</td>
</tr>
<tr>
<td rowspan="7" align="left">Environment Information</td>
<td rowspan="3" align="left">Primary Stop Circumstance(s)</td>
<td rowspan="2" align="left">Location Varying</td>
<td align="left">Witness Report</td>
<td align="left">Brief summarization of the witness report</td>
</tr>
<tr>
<td align="left">Inside Or Outside</td>
<td align="left">Openness of this case</td>
</tr>
<tr>
<td align="left">Event Varying</td>
<td align="left">Stop Duration Minutes</td>
<td align="left">Exact Stop Duration Time</td>
</tr>
<tr>
<td rowspan="4" align="left">Location Circumstance(s)</td>
<td rowspan="4" align="left">Static</td>
<td align="left">GPS Coordinates</td>
<td align="left">GPS coordinates of the stop location</td>
</tr>
<tr>
<td align="left">Precinct</td>
<td align="left">Precinct of location</td>
</tr>
<tr>
<td align="left">Location Type</td>
<td align="left">e.g., public housing, public transit</td>
</tr>
<tr>
<td align="left">Stop Location Street Name</td>
<td align="left">Corresponds With Precinct</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In this paper, we mainly discuss the bias introduced by subject race, hence we view race as <italic>the sensitive attribute</italic>. The UF-250 form records various aspects of the stop, including demographic characteristics of the suspect, the time and location of the stop, the suspected crime, and the rationale for the stop (e.g., whether the suspect was wearing clothing common in the commission of a crime). One notable limitation of this dataset is that no demographic or other identifying information is available about officers. The forms were filled out by hand and manually entered into an NYCSF database until 2017, when the forms became electronic. The NYPD reports NYCSF data in two ways: a summary report released quarterly and a complete database released annually to the public.</p>
</sec>
<sec id="s3-2">
<title>3.2 Constructed Features</title>
<p>For the NYCSF program, we collect features from the characteristic information and environment information perspectives, and highlight the key aspects of the data in <xref ref-type="table" rid="T1">Table&#x20;1</xref>.</p>
<sec id="s3-2-1">
<title>3.2.1 Characteristic Information</title>
<p>The subject features can be grouped into three categories: suspect demographic characteristics, suspect physical and motion profiles, and police profiles. We regard the subject race as a <italic>sensitive attribute</italic> whose use is discouraged in the real world. We also regard other demographic characteristics, like suspect description, suspect hair color, suspect eye color, and body build type, as <italic>sensitive relevant attributes</italic>.</p>
</sec>
<sec id="s3-2-2">
<title>3.2.2 Environment Features</title>
<p>The environment features can be grouped into two groups: primary stop circumstance, like furtive movement, actions of violent crime; additional stop circumstance, e.g., stop street name, time of&#x20;day.</p>
<p>These features are very heterogeneous, including numerical and categorical data as well as text. For consistency, we represent all features as numbers or numerical vectors <xref ref-type="bibr" rid="B40">Ren et&#x20;al. (2018)</xref>. Specifically, for categorical data with fewer than 8 categories, such as official rank and official explained stop flag, we adopt one-hot encoding, i.e.,&#x20;converting a categorical variable into a binary vector in which only the value of the corresponding category is set to one and the other values are set to zero. For categorical data with more than 8 categories, such as geo-spatial data, i.e.,&#x20;location precinct and city, we use count encoding, i.e.,&#x20;replacing the variables by the respective count frequencies of the variables in the dataset. For text data, such as street name and suspect description, we adopt GloVe <xref ref-type="bibr" rid="B38">Pennington et&#x20;al. (2014)</xref> to convert text into vector embeddings and use the mean of the embeddings to represent the semantic information. Since location information corresponds highly with subject race <xref ref-type="bibr" rid="B19">Gelman et&#x20;al. (2007)</xref> and can leak subject information, we treat street name, GPS coordinates, location type, and precinct as <italic>sensitive relevant attributes</italic>.</p>
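The three encoding rules above can be sketched as follows. This is a toy illustration: the category lists, the precinct values, and the word-vector table are made-up stand-ins (the paper uses real GloVe embeddings), and only the mechanics of each encoder match the description.

```python
# Sketch of the three encoders: one-hot for low-cardinality categoricals,
# count encoding for high-cardinality ones, and mean word vectors for text.
from collections import Counter

import numpy as np

def one_hot(value, categories):
    """One-hot encode a categorical variable with few (< 8) categories."""
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

def count_encode(values):
    """Replace each category by its frequency count in the dataset."""
    counts = Counter(values)
    return np.array([counts[v] for v in values], dtype=float)

def text_embed(text, table, dim=3):
    """Mean of per-word vectors (GloVe in the paper; a toy table here)."""
    vecs = [table[w] for w in text.lower().split() if w in table]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

ranks = one_hot("PBM", ["PBM", "NON"])            # low-cardinality categorical
precincts = count_encode(["40", "40", "75", "40"])  # high-cardinality categorical
toy_glove = {"main": np.ones(3), "street": np.zeros(3)}
street_vec = text_embed("Main Street", toy_glove)   # text feature
```

Each raw record is then the concatenation of these per-feature encodings into one numerical vector.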
</sec>
</sec>
</sec>
<sec id="s4">
<title>4 Preliminary</title>
<p>In this section, we present notations used in this work, and formally define the <italic>stop-and-frisk</italic> crime prediction problem.</p>
<sec id="s4-1">
<title>4.1 Problem Formulation</title>
<p>We first set up the notations. Consider we have a crime prediction system with a driver&#x2019;s feature set <italic>x</italic>
<sub>
<italic>s</italic>
</sub> &#x2208; <italic>S</italic> and a police set <italic>x</italic>
<sub>
<italic>p</italic>
</sub> &#x2208; <italic>P</italic> in the current environment <italic>x</italic>
<sub>
<italic>e</italic>
</sub> &#x2208; <italic>E</italic>. Let <italic>x</italic>
<sub>
<italic>s</italic>
</sub> represent subject information, like driver profile and behavior information; <italic>x</italic>
<sub>
<italic>p</italic>
</sub> represents a corresponding police officer for each driver, and <italic>x</italic>
<sub>
<italic>e</italic>
</sub> represents environment information. We set <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi>x</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> to concatenate the heterogeneous data, where <italic>d</italic> is the number of feature dimensions. Each driver has a binary label <italic>t</italic> &#x2208; {1, 0} denoting whether or not the driver is searched by the police. In the following, for consistency, we set <italic>t</italic>&#x20;&#x3d; 1 as treatment and <italic>t</italic>&#x20;&#x3d; 0 as control. The associated labels <italic>y</italic> &#x2208; {0, 1} come from the observational label space <italic>Y</italic> and represent the crime results. Formally, the collected historical record data <italic>D</italic>
<sub>
<italic>n</italic>
</sub> can be denoted as a set of triples <inline-formula id="inf2">
<mml:math id="m2">
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2a7d;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2a7d;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> generated from an unknown distribution <italic>p</italic>
<sub>
<italic>u</italic>
</sub>(<italic>x</italic>, <italic>t</italic>, <italic>y</italic>) over the driver feature-treatment-crime label space <italic>X</italic>&#x20;&#xd7; <italic>T</italic>&#x20;&#xd7; <italic>Y</italic>. The goal of our crime prediction framework is to learn a parametric function <italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub>: <italic>X</italic>&#x20;&#xd7; <italic>T</italic>&#x20;&#x2192; <italic>Y</italic> from the available historical records <italic>D</italic>
<sub>
<italic>n</italic>
</sub> to minimize the following true risk:<disp-formula id="e1">
<mml:math id="m3">
<mml:mi mathvariant="script">L</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="double-struck">E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mfenced open="" close=")">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:math>
<label>(1)</label>
</disp-formula>where <italic>&#x3b4;</italic>(<italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub>(<italic>x</italic>, <italic>t</italic>), <italic>y</italic>) denotes the error function between the predicted output score <italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub>(<italic>x</italic>
<sub>
<italic>i</italic>
</sub>, <italic>t</italic>
<sub>
<italic>i</italic>
</sub>) and the ground-truth&#x20;label <italic>y</italic>. <italic>P</italic>
<sub>
<italic>u</italic>
</sub>(<italic>x</italic>, <italic>t</italic>, <italic>y</italic>) denotes the ideal unbiased data distribution. To be specific, bias here refers to selection bias, meaning that search actions depend on police officers&#x2019; judgment.</p>
<p>Since the true risk is not accessible, the learning is conducted on the historical records <italic>D</italic>
<sub>
<italic>n</italic>
</sub> &#x223c; <italic>P</italic>
<sub>
<italic>n</italic>
</sub>(<italic>x</italic>, <italic>t</italic>, <italic>y</italic>) by optimizing the following empirical risk:<disp-formula id="e2">
<mml:math id="m4">
<mml:mi mathvariant="script">L</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:munderover>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>Based on the law of large numbers, PAC theory <xref ref-type="bibr" rid="B4">Auer et&#x20;al. (1995)</xref> states that empirical risk minimization can approximate the true risk given sufficient training data. When training instances are sampled from random trials, a large amount of collected data can approximate the real data distribution, because the data are missing at random <xref ref-type="bibr" rid="B41">Shpitser (2016)</xref>. However, as mentioned above, because of the police&#x2019;s selective enforcement, selection bias is present in the <italic>stop-and-frisk</italic> program.</p>
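For concreteness, the empirical risk of Eq. 2 can be sketched as follows; the linear score function and squared error standing in for the error function <italic>&#x3b4;</italic> are illustrative choices, not the model used in this work.

```python
import numpy as np

def empirical_risk(f_theta, delta, X, T, Y):
    """Empirical risk of Eq. 2: average of delta(f_theta(x_i, t_i), y_i) over D_n."""
    return float(np.mean([delta(f_theta(x, t), y) for x, t, y in zip(X, T, Y)]))

# Illustrative instantiation: linear score in (x, t), squared-error delta.
theta = np.array([0.5, -0.2])
f_theta = lambda x, t: float(x @ theta + 0.1 * t)
delta = lambda score, y: (score - y) ** 2

# Two toy (feature, treatment, crime-label) triples.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
T = np.array([1, 0])
Y = np.array([1.0, 0.0])
risk = empirical_risk(f_theta, delta, X, T, Y)  # -> 0.1
```

Minimizing this quantity over θ is exactly the biased estimator discussed in the next section, since each observed triple contributes with equal weight regardless of how it was selected.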
</sec>
</sec>
<sec id="s5">
<title>5 Selection Biases in Stop and Frisk Program</title>
<p>In this section, we first show the origin of selection bias in the <italic>stop-and-frisk</italic> program. Then, we analyze the influence of biased training data on supervised models. Motivated by these results, we propose to address this issue from the selection bias perspective, and present the notation describing the problem at the&#x20;end.</p>
<p>Selection bias. This issue arises because police are free to select which drivers to check for weapons or drugs, and these selections are latently influenced by prior prejudices, which can be unfair towards some groups. We illustrate this phenomenon in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> (A), from which it can be seen that selective enforcement makes the observed criminal population by race fail to represent the real criminal distribution across racial groups. Selection bias can be easily understood from a risk discrepancy perspective: it skews the observational distribution <italic>p</italic>
<sub>
<italic>n</italic>
</sub>(<italic>x</italic>, <italic>t</italic>, <italic>y</italic>) from an ideal uniform selection enforcement <italic>p</italic>
<sub>
<italic>u</italic>
</sub>(<italic>x</italic>, <italic>t</italic>,&#x20;<italic>y</italic>).</p>
<p>Biased training on observational data. In the previous part, we conducted a data analysis demonstrating that racial prejudice is a confounder in the <italic>stop-and-frisk</italic> program, lying behind the police&#x2019;s selective enforcement. In the following part, we provide a theoretical analysis showing the inconsistency issue of supervised models trained on a dataset with selection biases. This issue further motivates the question: can theoretical analysis provide guidelines on how to alleviate the biases?</p>
<p>Given observational data <italic>x</italic> &#x2208; <italic>D</italic>
<sub>
<italic>n</italic>
</sub>, we let <italic>p</italic>
<sup>1</sup>(<italic>t</italic>) &#x3d; <italic>p</italic>(<italic>Y</italic>&#x20;&#x3d; 1, <italic>x</italic>, <italic>T</italic>&#x20;&#x3d; <italic>t</italic>) and <italic>p</italic>
<sup>&#x2212;1</sup>(<italic>t</italic>) &#x3d; <italic>p</italic>(<italic>Y</italic>&#x20;&#x3d; &#x2212; 1, <italic>x</italic>, <italic>T</italic>&#x20;&#x3d; <italic>t</italic>), <italic>t</italic> &#x2208; {0, 1} represent the joint distributions of positive and negative crime results under either treatment intervention. A supervised model <italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub> is biased when trained on observational data under an unknown selection mechanism <italic>p</italic>(<italic>t</italic>&#x7c;<italic>x</italic>).<disp-formula id="e3">
<mml:math id="m5">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>We replace <italic>f</italic>(<italic>x</italic>, <italic>t</italic>; <italic>&#x3b8;</italic>) with <italic>&#x3b1;</italic> for simplicity, then:<disp-formula id="e4">
<mml:math id="m6">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mi>inf</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mi>inf</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mi>inf</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
<p>Set &#x394;(<italic>&#x3bc;</italic>) &#x3d; &#x2212; inf&#x2009;<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>&#x3d5;</italic>(<italic>&#x3b1;</italic>) &#x2b; <italic>&#x3d5;</italic>(&#x2212; <italic>&#x3b1;</italic> &#x22c5; <italic>&#x3bc;</italic>)). As &#x394;(<italic>&#x3bc;</italic>) is a convex and continuous function of <italic>&#x3bc;</italic>, we obtain:<disp-formula id="e5">
<mml:math id="m7">
<mml:mi>inf</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>It is clear that <xref ref-type="disp-formula" rid="e5">Eq. (5)</xref> is the f-divergence induced by &#x394;, where <inline-formula id="inf3">
<mml:math id="m8">
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x222b;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>d</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>, which measures the difference between two probability distributions <italic>P</italic>
<sup>(1)</sup> and <italic>P</italic>
<sup>(&#x2212;1)</sup>. To sum up, the optimal <italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub> can only be achieved by solving <xref ref-type="disp-formula" rid="e5">Eq. (5)</xref>, where <italic>P</italic>
<sup>(1)</sup> and <italic>P</italic>
<sup>(&#x2212;1)</sup> should be known. It can be seen as a specific form of f-divergence. Clearly, when <italic>P</italic>(<italic>t</italic>&#x7c;<italic>X</italic>) is unknown, the optimal <italic>f</italic>
<sup>&#x2217;</sup>, which depends on <italic>P</italic>(<italic>t</italic>&#x7c;<italic>X</italic>), cannot be recovered. Traditional supervised methods can only serve as biased estimators, since they implicitly assume a selection probability of 1 for all samples.</p>
<p>So far, we have performed statistical analysis on the NYPD dataset and found that the crime rate is heavily correlated with stop-and-frisk rates as well as spatial information. Since the data analysis demonstrates that racial populations indeed have different crime rates, it is not suitable to enforce group fairness on crime prediction. Instead, we propose to address this issue from the selection bias perspective: observed crime rates are subjective, reflecting the probability that each race is exposed to the police. An estimator will be biased if selection bias is not taken into consideration in its design. In the next section, we propose our solution to this problem.</p>
<p>Before introducing our solution, we first present the notation used to describe the selection bias. We define <italic>y</italic>(<italic>t</italic>&#x20;&#x3d; 1) to be the crime label if the driver were searched by the police. Conversely, we denote <italic>y</italic>(<italic>t</italic>&#x20;&#x3d; 0) as the unsearched outcome in the counterfactual world. The pair (<italic>y</italic>(0), <italic>y</italic>(1)) is the potential outcomes notation in the Rubin causal model, where selective enforcement is a &#x201c;treatment&#x201d; and the crime result of the associated driver is an &#x201c;outcome.&#x201d;</p>
<p>Our goal is to train a supervised model <italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub>(<italic>x</italic>
<sub>
<italic>i</italic>
</sub>, <italic>t</italic>
<sub>
<italic>i</italic>
</sub>) which takes the instance feature <italic>x</italic>
<sub>
<italic>i</italic>
</sub> and treatment indicator <italic>t</italic>
<sub>
<italic>i</italic>
</sub> as input. We use shorthand <italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub>(<italic>x</italic>
<sub>
<italic>i</italic>
</sub>, <italic>t</italic>
<sub>
<italic>i</italic>
</sub>) to denote the output score, and the loss with respect to <italic>y</italic>
<sub>
<italic>i</italic>
</sub> is given by <italic>L</italic>
<sub>
<italic>&#x3d5;</italic>
</sub>(<italic>f</italic>
<sub>
<italic>&#x3b8;</italic>
</sub>(<italic>x</italic>
<sub>
<italic>i</italic>
</sub>, <italic>t</italic>
<sub>
<italic>i</italic>
</sub>), <italic>y</italic>
<sub>
<italic>i</italic>
</sub>). We use <italic>p</italic>(<italic>t</italic>
<sub>
<italic>i</italic>
</sub>&#x7c;<italic>x</italic>\<italic>x</italic>
<sub>
<italic>o</italic>
</sub>) to denote the propensity score, i.e., the probability that the current driver is under treatment (searched) or control (unsearched) given the driver profile and environment information.</p>
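As a minimal sketch, the propensity score can be estimated with a logistic model over the non-sensitive features; the gradient-descent fit and toy data below are illustrative assumptions, not the estimator proposed in this work.

```python
import numpy as np

def fit_propensity(X, T, lr=0.1, steps=2000):
    """Fit a logistic model of p(t = 1 | x) by gradient descent on the log-loss.
    X: (n, d) features with sensitive attributes removed; T: (n,) binary treatment."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted search probability
        grad_w = X.T @ (p - T) / len(T)          # gradient of the mean log-loss
        grad_b = np.mean(p - T)
        w -= lr * grad_w
        b -= lr * grad_b
    return lambda x: float(1.0 / (1.0 + np.exp(-(x @ w + b))))

# Toy data: one feature correlated with the decision to search the driver.
X = np.array([[0.0], [0.2], [0.8], [1.0]])
T = np.array([0.0, 0.0, 1.0, 1.0])
propensity = fit_propensity(X, T)
```

The fitted function plays the role of p(t_i|x\x_o) above: it predicts, from non-sensitive information, how likely a driver was to be searched.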
</sec>
<sec id="s6">
<title>6 A General Debiasing Framework</title>
<p>The previous analysis shows that selection biases exist behind police enforcement and account for the risk discrepancy. In this section, we first propose to address this issue via a general debiasing framework, then discuss how to formulate it within causal inference theory so that it can be solved with tools from that domain. Finally, we present the formal formulation and discuss the optimization process. We illustrate our framework in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Overview of our framework. We leverage a re-weighting framework to approximate an unbiased distribution. To better estimate the re-weighting score, we estimate a counterfactual distribution in the stop-and-frisk program and propose an adversarial re-weighting method. The learner aims to learn the re-weighting score, and the adversary aims to train the classification model.</p>
</caption>
<graphic xlink:href="fdata-04-787459-g002.tif"/>
</fig>
<sec id="s6-1">
<title>6.1 A Debiasing Empirical Risk</title>
<p>Data analysis has demonstrated that racial prejudice is a confounder in the NYCSF program and drives the police&#x2019;s selective enforcement. Motivated by this, our goal is to approximate an ideal but unknown distribution <italic>p</italic>
<sub>
<italic>u</italic>
</sub> given observational data <italic>p</italic>
<sub>
<italic>n</italic>
</sub>. To this end, a general method is to re-weight the training samples and obtain a re-weighted empirical risk function:<disp-formula id="e6">
<mml:math id="m9">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mo>&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:math>
<label>(6)</label>
</disp-formula>where the weighting parameter <italic>w</italic>
<sub>
<italic>i</italic>
</sub> is properly specified, i.e.,&#x20;<inline-formula id="inf4">
<mml:math id="m10">
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>, measuring the discrepancy between the selection-bias-free data distribution and the observational data distribution. This empirical risk is an unbiased estimate of the true&#x20;risk.</p>
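The weighted empirical risk above can be sketched numerically as follows; the function name and inputs are hypothetical, and the weights w_i are assumed to be given density ratios:

```python
import numpy as np

def weighted_empirical_risk(losses, weights):
    """Importance-weighted empirical risk over the observed data D_n.

    losses  : per-example losses delta(f_theta(x_i, t_i), y_i)
    weights : density ratios w_i = p_u(x_i, t_i, y_i) / p_n(x_i, t_i, y_i)
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # (1 / |D_n|) * sum_i w_i * delta_i  -- unbiased for the risk under p_u
    return float(np.mean(weights * losses))
```

When all weights equal 1, this reduces to the ordinary (biased) empirical risk over the observational data.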
<p>As mentioned above, selection bias arises from the fact that we only observe one outcome per instance, e.g., the police stop and search the current driver, while we never observe the counterfactual outcome, &#x2018;what would the crime result be if the current driver were not searched by the police.&#x2019; In general, we denote the counterfactual distribution as <italic>p</italic>
<sub>
<italic>c</italic>
</sub>(<italic>x</italic>, <italic>t</italic>, <italic>y</italic>(<italic>t</italic>)). It is intuitive that the ideal data distribution is the combination of observational <italic>p</italic>
<sub>
<italic>n</italic>
</sub> and counterfactual data <italic>p</italic>
<sub>
<italic>c</italic>
</sub>, i.e.,&#x20;<italic>p</italic>
<sub>
<italic>u</italic>
</sub> &#x3d; <italic>p</italic>
<sub>
<italic>n</italic>
</sub> &#x2b; <italic>p</italic>
<sub>
<italic>c</italic>
</sub>. We assume that whether or not the driver is searched, his criminal outcome does not change. Consequently, we give the corresponding counterfactual definition:</p>
<p>Definition (driver-based counterfactual outcome). Given a driver&#x2019;s features <italic>x</italic>, his criminal label <italic>y</italic>, and treatment <italic>t</italic>, we denote his potential outcome by <italic>y</italic>(<italic>t</italic>), the expected criminal label when treatment <italic>t</italic> is enforced. Concretely, we assume the label does not change, i.e.,&#x20;<italic>y</italic>(<italic>t</italic>) &#x3d;&#x20;<italic>y</italic>.</p>
<p>
<statement content-type="assumption" id="uAssumption_1">
<label>Assumption</label>
<p>(positivity). Given a driver&#x2019;s features <italic>x</italic>, we assume the propensity score <italic>p</italic>(<italic>t</italic>&#x7c;<italic>x</italic>) &#x3e; 0; that is, every driver has a positive probability of being searched by the police. This is consistent with the overlap assumption in causal inference.</p>
<p>In the following, we introduce how to model the counterfactual distribution <italic>p</italic>
<sub>
<italic>c</italic>
</sub> based on the observational data distribution&#x20;<italic>p</italic>
<sub>
<italic>n</italic>
</sub>.</p>
</statement>
</p>
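A minimal sketch of checking the positivity (overlap) assumption on estimated propensity scores; the function name and tolerance are illustrative, not part of the paper's method:

```python
import numpy as np

def positivity_violations(propensity, eps=1e-3):
    """Check the positivity assumption: every driver must have an
    estimated p(t | x) strictly inside (0, 1).

    propensity : estimated p(T = 1 | x_i) for each example
    Returns the indices of examples whose propensity is within eps of 0 or 1,
    i.e., examples for which overlap effectively fails.
    """
    p = np.asarray(propensity, dtype=float)
    return np.where((p < eps) | (p > 1 - eps))[0]
```

Examples flagged this way are commonly clipped or dropped before re-weighting, since near-zero propensities produce exploding inverse-propensity weights.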
</sec>
<sec id="s6-2">
<title>6.2 Counterfactual Outcome Estimation</title>
<p>As mentioned above, the re-weighted empirical risk is an unbiased estimate of the true risk under an ideal but unknown distribution, i.e.,&#x20;the combination of the observational distribution <italic>p</italic>
<sub>
<italic>n</italic>
</sub> and counterfactual distribution <italic>p</italic>
<sub>
<italic>c</italic>
</sub>. To better estimate the weighting score, in this section we estimate the counterfactual outcome distribution and connect its formulation with the re-weighting method.</p>
<p>In the real world, we only observe <italic>y</italic> for either <italic>t</italic>&#x20;&#x3d; 1 or <italic>t</italic>&#x20;&#x3d; 0; the corresponding counterfactual outcome is never observed, which connects causal inference with a missing-data mechanism. With re-weighting techniques, the distribution of criminal results in both the observational and the counterfactual world can be represented as:<disp-formula id="e7">
<mml:math id="m11">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="" close=")">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(7)</label>
</disp-formula>where <inline-formula id="inf5">
<mml:math id="m12">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="" close=")">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula> corrects for the missing ratio in the observational distribution <italic>p</italic>
<sub>
<italic>n</italic>
</sub>. To be specific, we set <italic>T</italic> to <italic>t</italic>, which makes the intervention equal to the observed treatment. In this case, <xref ref-type="disp-formula" rid="e7">Eq. (7)</xref> can be written as:<disp-formula id="e8">
<mml:math id="m13">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="" close=")">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
<p>Here, <italic>p</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>Y</italic>(<italic>t</italic>), <italic>T</italic>&#x7c;<italic>X</italic>) can be estimated from observational data. In this way, the empirical risk is equivalent to the true risk. To connect <xref ref-type="disp-formula" rid="e7">Eq. (7)</xref> with the counterfactual outcome, we set <italic>T</italic>&#x20;&#x3d; 1&#x20;&#x2212; <italic>t</italic>, making the intervention the opposite of the action taken in the real world.<disp-formula id="e9">
<mml:math id="m14">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="" close=")">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
<p>Specifically, we set <italic>t</italic>&#x20;&#x3d; 1 in <xref ref-type="disp-formula" rid="e9">Eq. (9)</xref> and <italic>p</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>Y</italic>(1), <italic>T</italic>&#x20;&#x3d; 0&#x7c;<italic>X</italic>) represents the joint distribution of the outcome under search for drivers who were not searched by the police. Since <italic>p</italic>(<italic>Y</italic>(<italic>t</italic>), <italic>T</italic>&#x20;&#x3d; <italic>t</italic>&#x7c;<italic>X</italic>) and <italic>p</italic>(<italic>T</italic>&#x20;&#x3d; <italic>t</italic>&#x7c;<italic>X</italic>) can be estimated from observational data, our goal in generating the counterfactual outcome is to approximate <inline-formula id="inf6">
<mml:math id="m15">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="" close=")">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>, which involves the unknown factor <italic>Y</italic>(<italic>t</italic>). For simplicity, we replace the term <inline-formula id="inf7">
<mml:math id="m16">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="" close=")">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula> with a learnable function <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>, <italic>Y</italic>(<italic>t</italic>)). Recalling our goal in <xref ref-type="disp-formula" rid="e6">Eq. (6)</xref>, we minimize the learned risk <inline-formula id="inf8">
<mml:math id="m17">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mo>&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>:<disp-formula id="e10">
<mml:math id="m18">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2254;</mml:mo>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>min</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(10)</label>
</disp-formula>
</p>
<p>We do not have ground truth for <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>, <italic>Y</italic>(<italic>t</italic>)) as a supervision signal. To make <xref ref-type="disp-formula" rid="e10">Eq. (10)</xref> tractable, we use Definition 1, i.e.,&#x20;<italic>y</italic>(<italic>t</italic>) &#x3d; <italic>y</italic>, and approximate <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>, <italic>Y</italic>(<italic>t</italic>)) by <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>, <italic>Y</italic>), which is a variant of the inverse propensity score <xref ref-type="bibr" rid="B5">Austin (2011)</xref>.</p>
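Under Definition 1, the probability ratio in Eq. 9 reduces to a function of the observed (x, y, t). A minimal sketch, assuming the propensities p(T = 1 | x, y) come from some previously fitted model (hypothetical here, e.g. a logistic regression on (x, y)):

```python
import numpy as np

def counterfactual_ratio_weights(t, propensity):
    """Variant of the inverse propensity score used for g_alpha(T | X, Y).

    t          : observed binary treatments t_i (1 = searched, 0 = not)
    propensity : estimated p(T = 1 | x_i, y_i) from any fitted model
    Returns the ratio p(T = 1 - t_i | .) / p(T = t_i | .), which re-weights
    each observed example toward its unobserved counterfactual counterpart.
    """
    t = np.asarray(t)
    # clip to enforce the positivity assumption and bound the weights
    p1 = np.clip(np.asarray(propensity, dtype=float), 1e-6, 1 - 1e-6)
    p_obs = np.where(t == 1, p1, 1.0 - p1)   # p(T = t_i | x_i, y_i)
    return (1.0 - p_obs) / p_obs
```

A searched driver with a high search propensity thus gets a small weight, while a searched driver who was unlikely to be searched gets a large one.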
<p>The NYCSF program can be taken as an example to illustrate the implication of <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>, <italic>Y</italic>). There are two conflicting considerations in understanding selective enforcement <italic>p</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>), where <italic>T</italic> &#x2208; {0, 1}. On the one hand, it is desirable to balance the distributions of the treated and control groups, which satisfies the unconfoundedness assumption in causal inference <xref ref-type="bibr" rid="B25">Hern&#xe1;n and Robins (2010)</xref>:<disp-formula id="e11">
<mml:math id="m19">
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mspace width="0.3333em"/>
<mml:mo>&#x2aeb;</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>.</mml:mo>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
<p>Given the latent feature representation <italic>h</italic>
<sub>
<italic>i</italic>
</sub>, representation balancing methods design a distance metric, e.g., an f-divergence, to minimize the distance between <italic>P</italic>(<italic>h</italic>&#x7c;<italic>x</italic>; <italic>t</italic>&#x20;&#x3d; 1) and <italic>P</italic>(<italic>h</italic>&#x7c;<italic>x</italic>; <italic>t</italic>&#x20;&#x3d; 0). As a consequence, the outcome is independent of the treatment given the input&#x20;data.</p>
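As one concrete instance of such a distance (an assumption here; the text only names f-divergences as an example), a linear-kernel MMD, i.e. the squared distance between group means of the latent representations, can be sketched as:

```python
import numpy as np

def linear_mmd2(h_treated, h_control):
    """Squared MMD with a linear kernel between latent representations:
    the squared Euclidean distance between the mean representation of the
    treated group (t = 1) and that of the control group (t = 0).
    Minimizing this term pushes P(h | x; t=1) and P(h | x; t=0) together."""
    mu_t = np.mean(np.asarray(h_treated, dtype=float), axis=0)
    mu_c = np.mean(np.asarray(h_control, dtype=float), axis=0)
    diff = mu_t - mu_c
    return float(diff @ diff)
```

Adding this quantity to the training loss is one standard way to implement the balancing objective in Eq. 11.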
<p>On the other hand, in the NYCSF setting, the data distributions in the treated and control groups should differ: criminal drivers should have a higher probability of being stopped and searched by the police. In this case, the data distribution should be indicative of the treatment, i.e.,&#x20;<italic>P</italic>(<italic>h</italic>&#x7c;<italic>x</italic>; <italic>t</italic>&#x20;&#x3d; 1) and <italic>P</italic>(<italic>h</italic>&#x7c;<italic>x</italic>; <italic>t</italic>&#x20;&#x3d; 0) should contain distinguishable feature representations and should not resemble each&#x20;other.</p>
<p>With this in mind, we assume <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>, <italic>Y</italic>) satisfies the following two properties:<list list-type="simple">
<list-item>
<p>&#x2022; Selective Enforcement Equality. As an adjustment weight, <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub>(<italic>T</italic>&#x7c;<italic>X</italic>, <italic>Y</italic>) is expected to balance the treated groups with the control groups w.r.t. the distributions of confounder representations, e.g., racial prejudice. Hence the crime results are independent of the selective enforcement.</p>
</list-item>
<list-item>
<p>&#x2022; Selective Enforcement Disparity. At the same time, the driver distributions under treatment and control should differ, which is indicative of selective enforcement.</p>
</list-item>
</list>
</p>
<p>Since the two considerations above conflict with each other, we introduce an adversarial re-weighting method for better risk minimization.</p>
</sec>
<sec id="s6-3">
<title>6.3 Adversarial Reweighted Method</title>
<p>Concretely, we introduce an auxiliary <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub> that both approximates the selection-bias-eliminated data distribution and makes the learned distribution distinguishable with respect to its treatment. Many variants of generative models meet these two requirements <xref ref-type="bibr" rid="B12">Creswell et&#x20;al. (2018)</xref>. Inspired by existing works <xref ref-type="bibr" rid="B49">Xu et&#x20;al. (2020)</xref>; <xref ref-type="bibr" rid="B24">Hashimoto et&#x20;al. (2018)</xref>, we design a min-max game to solve this problem. For simplicity, we regard the prediction model as the player and the weighting function as the adversary. We formulate our fairness objective as:<disp-formula id="e12">
<mml:math id="m20">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2254;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mo>&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="double-struck">E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mo>&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
<p>To derive a concrete algorithm, we need to specify how the learner and the adversary update the training parameters <italic>&#x3b8;</italic>
<sub>
<italic>f</italic>
</sub>, <italic>&#x3b8;</italic>
<sub>
<italic>y</italic>
</sub>, and <italic>&#x3b1;</italic>
<sub>
<italic>t</italic>
</sub>. At the saddle point, the feature mapping parameters <italic>&#x3b8;</italic>
<sub>
<italic>f</italic>
</sub> simultaneously minimize the crime classification loss and maximize the treatment prediction loss. The parameters <italic>&#x3b8;</italic>
<sub>
<italic>y</italic>
</sub> and <italic>&#x3b1;</italic>
<sub>
<italic>t</italic>
</sub> penalize the crime prediction loss and the treatment loss, respectively.</p>
<p>Observe that there is no constraint on <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub> in <xref ref-type="disp-formula" rid="e12">Eq. (12)</xref>, which makes the formulation ill-posed. By the positivity assumption in causal inference, i.e., <italic>p</italic>(<italic>t</italic>&#x7c;<italic>x</italic>) &#x3e; 0, it is clear that <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub> must also satisfy positivity: <italic>g</italic>
<sub>
<italic>&#x3b1;</italic>
</sub> &#x3e; 0. Moreover, to prevent exploding gradients, it is important to normalize the weights across the dataset (or across the current training batch). We therefore apply a normalization step that rescales the adversary weighting component:<disp-formula id="e13">
<mml:math id="m21">
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:math>
<label>(13)</label>
</disp-formula>where <italic>N</italic> is the current batch size. Finally, we present the objective of the proposed minimax game in two stages:<disp-formula id="e14">
<mml:math id="m22">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>arg min</mml:mi>
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(14)</label>
</disp-formula>
<disp-formula id="e15">
<mml:math id="m23">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>arg</mml:mi>
<mml:mi>max</mml:mi>
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:math>
<label>(15)</label>
</disp-formula>
</p>
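The batch normalization of the adversary weights in Eq. (13) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name and inputs are hypothetical.

```python
import numpy as np

def normalize_adversary_weights(g_alpha):
    """Rescale raw adversary scores g_alpha(t|x, y) into sample weights
    w_alpha = 1 + N * g_alpha / sum(g_alpha)   (Eq. 13),
    where N is the current batch size. The weights stay strictly positive
    (positivity assumption) and bounded in scale, preventing exploding
    gradients as the adversary sharpens."""
    g_alpha = np.asarray(g_alpha, dtype=float)
    n = len(g_alpha)  # current batch size N
    return 1.0 + n * g_alpha / g_alpha.sum()

# A batch of four raw adversary scores:
w = normalize_adversary_weights([0.1, 0.2, 0.3, 0.4])
# w = [1.4, 1.8, 2.2, 2.6]; the weights always sum to 2N
```

Note that the normalized weights of a batch of size N always sum to 2N, so the overall loss scale stays stable across batches.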
<p>Without loss of generality, we treat the objective in <xref ref-type="disp-formula" rid="e12">Eq. (12)</xref> as two components, the learner and the adversary: <inline-formula id="inf9">
<mml:math id="m24">
<mml:mi>arg</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf10">
<mml:math id="m25">
<mml:mi>arg</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. To handle the adversarial training, we adopt an optimization setup in which the learner and the adversary take turns updating their models. A more detailed description of the training procedure can be found in <xref ref-type="bibr" rid="B20">Goodfellow et&#x20;al. (2014)</xref>.</p>
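The alternating update scheme described above can be sketched on a toy saddle-point objective. This is only an illustrative sketch of turn-taking gradient descent/ascent (the gradient functions, learning rate, and toy objective are assumptions, not the paper's model):

```python
def alternating_minmax(grad_theta, grad_alpha, theta, alpha,
                       lr=0.05, steps=200):
    """Learner and adversary take turns: theta descends J (min player),
    alpha ascends J (max player)."""
    for _ in range(steps):
        theta = theta - lr * grad_theta(theta, alpha)  # learner step
        alpha = alpha + lr * grad_alpha(theta, alpha)  # adversary step
    return theta, alpha

# Toy objective J(theta, alpha) = theta**2 - alpha**2, saddle point (0, 0).
theta, alpha = alternating_minmax(
    lambda t, a: 2 * t,    # dJ/dtheta
    lambda t, a: -2 * a,   # dJ/dalpha
    theta=1.0, alpha=1.0)
# both players converge toward the saddle point (0, 0)
```

On this convex-concave toy problem the alternating updates converge to the saddle point; for neural networks, convergence is not guaranteed and the schedule of learner/adversary steps matters in practice.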
</sec>
</sec>
<sec id="s7">
<title>7 Experiment</title>
<p>In this section, we first briefly describe the dataset, baselines, and evaluation metrics. We then evaluate our proposed method against the baselines to show its desirable performance on both the fairness and efficiency metrics. Our experiments aim to answer the following research questions:<list list-type="simple">
<list-item>
<p>&#x2022; RQ1: Can the proposed method be more robust and effective than the standard re-weighting approach, like the inverse propensity weighting method (IPW)?</p>
</list-item>
<list-item>
<p>&#x2022; RQ2: Can the proposed method mitigate ethical conflicts in the NYCSF program as well as improve efficiency?</p>
</list-item>
<list-item>
<p>&#x2022; RQ3: Are learned weights meaningful and why is our proposed method effective?</p>
</list-item>
<list-item>
<p>&#x2022; RQ4: Is our method sensitive to the size of the sensitive group?</p>
</list-item>
</list>
</p>
<sec id="s7-1">
<title>7.1 The NYC Stop and Frisk (NYCSF) Dataset</title>
<p>We retrieve the publicly available stop, search, and frisk data from the New York Police Department website, covering January 2018 to December 2019. The dataset provides demographic and other information about drivers stopped by the NYC police force. Since police enforcement changes dynamically over time, to validate the robustness of our method we partition each annual dataset into two continuous subsets, e.g., &#x2018;NYCSF_2018F(irst)&#x2019; and &#x2018;NYCSF_2018L(ast)&#x2019;, each spanning half a year. For each subset, we use the first 4 months as the training set and the following 2 months as the validation and test sets, respectively. We also merge the 2 years of data into a &#x2018;NYCSF_Mix&#x2019; dataset, which we randomly split 7:2:1 into training, testing, and validation sets.</p>
<p>For data processing, since some &#x2018;null&#x2019; values simply mean &#x2018;False&#x2019; in NYCSF stops, we carefully analyze the data and replace &#x2018;(null)&#x2019; with &#x2018;F&#x2019;. For the remaining data, we drop records with default or erroneous values. For textual features, e.g., witness reports, we initialize the lookup table for textual data with the pre-trained vectors from GloVe <xref ref-type="bibr" rid="B38">Pennington et&#x20;al. (2014)</xref>, setting <italic>l</italic> to 300. Categorical variables are encoded as one-hot embeddings.</p>
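The cleaning steps above can be sketched as follows. This is a minimal illustration under assumed record layout and column names (`age`, `frisked` are hypothetical, not the dataset's actual fields):

```python
def preprocess(records):
    """Replace '(null)' flags with 'F', drop records with erroneous
    values, and one-hot encode a categorical field."""
    cleaned = []
    for rec in records:
        # In NYCSF stops, '(null)' only means the flag is False -> 'F'
        rec = {k: ('F' if v == '(null)' else v) for k, v in rec.items()}
        # drop records with default or erroneous values (here: bad age)
        if not str(rec.get('age', '')).isdigit():
            continue
        cleaned.append(rec)
    # one-hot encode the categorical 'frisked' flag
    categories = sorted({r['frisked'] for r in cleaned})
    for r in cleaned:
        for c in categories:
            r[f'frisked_{c}'] = int(r['frisked'] == c)
    return cleaned

rows = [
    {'age': '34', 'frisked': '(null)'},
    {'age': 'unknown', 'frisked': 'Y'},   # dropped: erroneous age
    {'age': '27', 'frisked': 'Y'},
]
out = preprocess(rows)
# two records survive; the '(null)' flag becomes 'F' and is one-hot encoded
```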
</sec>
<sec id="s7-2">
<title>7.2 Baselines and Evaluation Metrics</title>
<p>In this section, we mainly describe baselines and evaluation metrics. For all the methods, we regard race as the sensitive attribute.<list list-type="simple">
<list-item>
<p>1) MLP <xref ref-type="bibr" rid="B17">Gardner and Dorling (1998)</xref>. The vanilla model using a multilayer perceptron (MLP) as the network architecture. This base model does not take fairness into consideration. Since model architecture is not the focus of this paper, for a fair comparison we also use an MLP as the base model for the other baselines.</p>
</list-item>
<list-item>
<p>2) IPW <xref ref-type="bibr" rid="B5">Austin (2011)</xref>. Inverse propensity weighting (IPW) is a general re-weighting based method in causal inference which aims to balance the data distribution in treated and control groups.</p>
</list-item>
<list-item>
<p>3) Unawareness <xref ref-type="bibr" rid="B22">Grgic-Hlaca et&#x20;al. (2016)</xref>. Fairness through unawareness refers to leaving out sensitive attributes, such as driver race and other characteristics deemed sensitive, and taking only the remaining features as input.</p>
</list-item>
<list-item>
<p>4) PRP <xref ref-type="bibr" rid="B23">Hardt et&#x20;al. (2016)</xref>. Equality of opportunity, defined as equality of the false positive rates across groups.</p>
</list-item>
<list-item>
<p>5) LFR <xref ref-type="bibr" rid="B14">Feng et&#x20;al. (2019)</xref>. LFR takes an adversarial framework to ensure that the distributions across different sensitive groups are similar.</p>
</list-item>
<list-item>
<p>6) Ours. We propose an adversarial re-weighting method. We train a base crime prediction classifier as the player and use a treatment prediction classifier as the adversary.</p>
</list-item>
</list>
</p>
<p>Evaluation Metrics: To measure efficiency and fairness, following existing work <xref ref-type="bibr" rid="B45">Wick et&#x20;al. (2019)</xref>, we adopt the F1 score as our utility metric and demographic parity (DP) <xref ref-type="bibr" rid="B14">Feng et&#x20;al. (2019)</xref> as our fairness metric. Unlike accuracy, on which trivial predictors can score highly, F1 reflects the discrepancy caused by class imbalance <xref ref-type="bibr" rid="B45">Wick et&#x20;al. (2019)</xref>. We stratify the test data by race label, compute the F1 score for each racial group, and report: 1) Macro-F1: the macro average over all racial groups; 2) Minority F1: the lowest F1 score across all racial groups. To evaluate fairness, demographic parity requires prediction models to behave fairly across sensitive groups. Specifically, it requires that the positive rates across sensitive groups be equal:<disp-formula id="e16">
<mml:math id="m26">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>j</mml:mi>
</mml:math>
<label>(16)</label>
</disp-formula>
</p>
<p>In the experiment, we report the difference in demographic parity:<disp-formula id="e17">
<mml:math id="m27">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>j</mml:mi>
</mml:math>
<label>(17)</label>
</disp-formula>
</p>
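The demographic parity gap of Eq. (17) is straightforward to compute from predictions and group labels. A minimal sketch (the function name and toy inputs are illustrative assumptions), reporting the largest pairwise gap when there are more than two groups:

```python
from collections import defaultdict

def demographic_parity_gap(y_pred, groups):
    """Delta_DP (Eq. 17): largest difference in positive-prediction
    rates E(y_hat | S = i) between any two sensitive groups."""
    by_group = defaultdict(list)
    for pred, group in zip(y_pred, groups):
        by_group[group].append(pred)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates)

gap = demographic_parity_gap(
    y_pred=[1, 1, 0, 1, 0, 0],
    groups=['a', 'a', 'a', 'b', 'b', 'b'])
# group a positive rate = 2/3, group b = 1/3, so gap = 1/3
```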
</sec>
<sec id="s7-3">
<title>7.3 Performance Comparison</title>
<sec id="s7-3-1">
<title>7.3.1 With Inverse Propensity Weighting (IPW) Method</title>
<p>To answer RQ1, we fix the base classifier as an MLP and conduct classification on all five datasets. To better understand whether our model is effective in mitigating selection bias, we compare it with the vanilla MLP and the IPW method. The vanilla MLP applies no strategy to address the bias problem. IPW balances the data distribution between the treated and control groups, which minimizes a weighted empirical risk and approximates the unbiased data distribution.</p>
<p>
<xref ref-type="table" rid="T2">Table&#x20;2</xref> summarizes the main result and we make the following observations:<list list-type="simple">
<list-item>
<p>&#x2022; The proposed model mostly achieves the best performance on all evaluation metrics, which demonstrates the importance of the adversarial re-weighting framework. Moreover, selection bias in the NYCSF dataset indeed exists, and hence MLP is inferior to the other methods most of the time. It is therefore both interesting and essential to mitigate ethical conflicts in the NYCSF program in a debiasing manner.</p>
</list-item>
<list-item>
<p>&#x2022; Selection bias, i.e., police selective enforcement, changes dynamically with time, making it essential to take unknown factors into consideration. The prediction scores clearly differ across the time-variant datasets. Compared with IPW, our method is more robust since it incorporates unknown factors into the formulation of a dynamic weighting score. Meanwhile, it is effective both at balancing the treated groups with the control groups and at making the data distribution distinguishable.</p>
</list-item>
<list-item>
<p>&#x2022; IPW can improve subgroup fairness by adjusting the data distribution across the treated and control groups. Compared with MLP, IPW assigns a propensity score to each sample, so minority samples are up-weighted by the model: different sensitive groups have similar probabilities of appearing in front of the police.</p>
</list-item>
</list>
</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Ours vs. the inverse propensity score method.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Dataset</th>
<th rowspan="2" align="center">Method</th>
<th align="center">Macro</th>
<th align="center">Minority</th>
<th rowspan="2" align="center">&#x394;<sub>
<italic>DP</italic>
</sub>
</th>
</tr>
<tr>
<th align="center">F1</th>
<th align="center">F1</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="left">NYCSF_Mix</td>
<td align="center">MLP</td>
<td align="char" char=".">0.9648</td>
<td align="char" char=".">0.9470</td>
<td align="char" char=".">0.0723</td>
</tr>
<tr>
<td align="center">IPW</td>
<td align="char" char=".">0.9715</td>
<td align="char" char=".">0.9517</td>
<td align="char" char=".">0.0640</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9922</td>
<td align="char" char=".">0.9712</td>
<td align="char" char=".">0.0621</td>
</tr>
<tr>
<td rowspan="3" align="left">NYCSF_2018F</td>
<td align="center">MLP</td>
<td align="char" char=".">0.9930</td>
<td align="char" char=".">0.9864</td>
<td align="char" char=".">0.0848</td>
</tr>
<tr>
<td align="center">IPW</td>
<td align="char" char=".">0.9729</td>
<td align="char" char=".">0.9367</td>
<td align="char" char=".">0.1072</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9804</td>
<td align="char" char=".">0.9456</td>
<td align="char" char=".">0.1039</td>
</tr>
<tr>
<td rowspan="3" align="left">NYCSF_2018L</td>
<td align="center">MLP</td>
<td align="char" char=".">0.9385</td>
<td align="char" char=".">0.9129</td>
<td align="char" char=".">0.0946</td>
</tr>
<tr>
<td align="center">IPW</td>
<td align="char" char=".">0.9624</td>
<td align="char" char=".">0.9229</td>
<td align="char" char=".">0.1049</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9729</td>
<td align="char" char=".">0.9401</td>
<td align="char" char=".">0.0901</td>
</tr>
<tr>
<td rowspan="3" align="left">NYCSF_2019F</td>
<td align="center">MLP</td>
<td align="char" char=".">0.9221</td>
<td align="char" char=".">0.8840</td>
<td align="char" char=".">0.0996</td>
</tr>
<tr>
<td align="center">IPW</td>
<td align="char" char=".">0.9763</td>
<td align="char" char=".">0.9481</td>
<td align="char" char=".">0.0795</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9922</td>
<td align="char" char=".">0.9712</td>
<td align="char" char=".">0.0629</td>
</tr>
<tr>
<td rowspan="3" align="left">NYCSF_2019L</td>
<td align="center">MLP</td>
<td align="char" char=".">0.9074</td>
<td align="char" char=".">0.8703</td>
<td align="char" char=".">0.1142</td>
</tr>
<tr>
<td align="center">IPW</td>
<td align="char" char=".">0.9437</td>
<td align="char" char=".">0.9015</td>
<td align="char" char=".">0.1098</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9874</td>
<td align="char" char=".">0.9419</td>
<td align="char" char=".">0.1012</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s7-4">
<title>7.4 With Fairness Based Methods</title>
<p>In this section, to answer RQ2, we compare with classical fairness-based methods. For a fair comparison, we set the base classifier to an MLP for all the mentioned methods. <xref ref-type="table" rid="T3">Table&#x20;3</xref> shows the main results, and we make the following observations:<list list-type="simple">
<list-item>
<p>&#x2022; The proposed method achieves promising performance in terms of both utility and fairness across all datasets and time-slot splits. This result validates the assumption that unfairness in this task is partly due to selection biases, and that our proposed method can alleviate selection bias in &#x2018;stop-and-frisk&#x2019; programs.</p>
</list-item>
<list-item>
<p>&#x2022; Fairness-based constraints can improve group fairness while damaging efficiency. The fairness-based baselines clearly reduce group gaps, but their utility is lower than ours most of the time. This matches the observation that directly requiring equality across racial groups is too strong a constraint and may damage utility.</p>
</list-item>
<list-item>
<p>&#x2022; A proper fairness notion and a better model architecture design are useful for improving utility. Compared with the &#x2018;Unawareness&#x2019; and &#x2018;PRP&#x2019; methods, LFR achieves the best utility performance, since it incorporates an adversarial module to enforce fairness across groups.</p>
</list-item>
</list>
</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Ours vs. fairness-based methods.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Dataset</th>
<th rowspan="2" align="center">Method</th>
<th align="center">Macro</th>
<th align="center">Minority</th>
<th rowspan="2" align="center">&#x394;<sub>
<italic>DP</italic>
</sub>
</th>
</tr>
<tr>
<th align="center">F1</th>
<th align="center">F1</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="4" align="left">NYCSF_Mix</td>
<td align="center">Unawareness</td>
<td align="char" char=".">0.9612</td>
<td align="char" char=".">0.9429</td>
<td align="char" char=".">0.0718</td>
</tr>
<tr>
<td align="center">PRP</td>
<td align="char" char=".">0.9713</td>
<td align="char" char=".">0.9425</td>
<td align="char" char=".">0.0629</td>
</tr>
<tr>
<td align="center">LFR</td>
<td align="char" char=".">0.9837</td>
<td align="char" char=".">0.9639</td>
<td align="char" char=".">0.0598</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9922</td>
<td align="char" char=".">0.9712</td>
<td align="char" char=".">0.0621</td>
</tr>
<tr>
<td rowspan="4" align="left">NYCSF_2018F</td>
<td align="center">Unawareness</td>
<td align="char" char=".">0.9887</td>
<td align="char" char=".">0.9834</td>
<td align="char" char=".">0.0825</td>
</tr>
<tr>
<td align="center">PRP</td>
<td align="char" char=".">0.9701</td>
<td align="char" char=".">0.9298</td>
<td align="char" char=".">0.0801</td>
</tr>
<tr>
<td align="center">LFR</td>
<td align="char" char=".">0.9790</td>
<td align="char" char=".">0.9334</td>
<td align="char" char=".">0.0744</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9804</td>
<td align="char" char=".">0.9456</td>
<td align="char" char=".">0.1039</td>
</tr>
<tr>
<td rowspan="4" align="left">NYCSF_2018L</td>
<td align="center">Unawareness</td>
<td align="char" char=".">0.9311</td>
<td align="char" char=".">0.9100</td>
<td align="char" char=".">0.0914</td>
</tr>
<tr>
<td align="center">PRP</td>
<td align="char" char=".">0.9598</td>
<td align="char" char=".">0.9234</td>
<td align="char" char=".">0.0899</td>
</tr>
<tr>
<td align="center">LFR</td>
<td align="char" char=".">0.9712</td>
<td align="char" char=".">0.9355</td>
<td align="char" char=".">0.0832</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9729</td>
<td align="char" char=".">0.9401</td>
<td align="char" char=".">0.0901</td>
</tr>
<tr>
<td rowspan="4" align="left">NYCSF_2019F</td>
<td align="center">Unawareness</td>
<td align="char" char=".">0.9116</td>
<td align="char" char=".">0.8813</td>
<td align="char" char=".">0.0943</td>
</tr>
<tr>
<td align="center">PRP</td>
<td align="char" char=".">0.9701</td>
<td align="char" char=".">0.9455</td>
<td align="char" char=".">0.0701</td>
</tr>
<tr>
<td align="center">LFR</td>
<td align="char" char=".">0.9823</td>
<td align="char" char=".">0.9630</td>
<td align="char" char=".">0.0680</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9922</td>
<td align="char" char=".">0.9712</td>
<td align="char" char=".">0.0629</td>
</tr>
<tr>
<td rowspan="4" align="left">NYCSF_2019L</td>
<td align="center">Unawareness</td>
<td align="char" char=".">0.9054</td>
<td align="char" char=".">0.8678</td>
<td align="char" char=".">0.1125</td>
</tr>
<tr>
<td align="center">PRP</td>
<td align="char" char=".">0.9434</td>
<td align="char" char=".">0.9211</td>
<td align="char" char=".">0.1022</td>
</tr>
<tr>
<td align="center">LFR</td>
<td align="char" char=".">0.9652</td>
<td align="char" char=".">0.9410</td>
<td align="char" char=".">0.0983</td>
</tr>
<tr>
<td align="center">Ours</td>
<td align="char" char=".">0.9874</td>
<td align="char" char=".">0.9419</td>
<td align="char" char=".">0.1012</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s7-5">
<title>7.5 Weight Analysis</title>
<p>We have provided a theoretical analysis of the proposed adversarial re-weighting learning method. In this section, we investigate RQ3, &#x201c;are the learned weights really meaningful?&#x201d;, by directly visualizing the example weights our model assigns to the four quadrants of a confusion matrix. The results are shown in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>. Each subplot shows the learned weights on the <italic>x</italic>-axis and their corresponding density on the <italic>y</italic>-axis. We make the following observations:<list list-type="simple">
<list-item>
<p>&#x2022; Sensitive groups are upsampled with a lower inverse weighting score. The density of minority groups, such as Asian, is clearly concentrated in the low-score region.</p>
</list-item>
<list-item>
<p>&#x2022; In <xref ref-type="fig" rid="F3">Figures 3A,D</xref>, samples are classified correctly. The inverse propensity score of the &#x2018;true positive group&#x2019; is smaller than that of the &#x2018;false negative group&#x2019;. Since most of the suspects are innocent, the model has more uncertainty on the &#x2018;false positive group&#x2019;. Our proposed method makes a counterfactual estimate of what the suspect would&#x20;do.</p>
</list-item>
<list-item>
<p>&#x2022; Misclassified groups are clearly up-weighted. Comparing <xref ref-type="fig" rid="F3">Figures&#x20;3B,C</xref>, most of the weights lie in the interval [0.3, 0.4], and protected groups fall in this interval with higher probability than the majority group, such as the Black&#x20;group.</p>
</list-item>
</list>
</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Weight Analysis of our method. <bold>(A)</bold> Prediction &#x3d;1 , label &#x3d;1&#x20;<bold>(B)</bold> Prediction &#x3d;1 , label &#x3d;0&#x20;<bold>(C)</bold> Prediction &#x3d;0 , label &#x3d;1&#x20;<bold>(D)</bold> Prediction &#x3d;0 , label &#x3d;0</p>
</caption>
<graphic xlink:href="fdata-04-787459-g003.tif"/>
</fig>
</sec>
<sec id="s7-6">
<title>7.6 Sensitivity Analysis</title>
<p>In this section, to answer RQ4, we analyze the sensitivity of our proposed model to group size in order to evaluate its robustness. Selection bias implies that different races are stopped and searched at different rates. To replicate selection bias, we vary the fraction of the Black group in the training set by under-sampling it from 0.2 to 1.0. We report the results on the original test set in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. Results are reported on the NYCSF_Mix dataset, with LFR as the baseline.</p>
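The under-sampling procedure used to replicate selection bias can be sketched as follows (a minimal illustration; the function name, record layout, and group labels are assumptions for the example):

```python
import random

def undersample_group(data, group_key, group_value, fraction, seed=0):
    """Keep only `fraction` of records belonging to one sensitive group,
    leaving all other records intact; used to replicate selection bias
    in the training set."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    target = [r for r in data if r[group_key] == group_value]
    others = [r for r in data if r[group_key] != group_value]
    kept = rng.sample(target, int(len(target) * fraction))
    return others + kept

# 100 records from the under-sampled group, 50 from the rest:
data = [{'race': 'black'}] * 100 + [{'race': 'other'}] * 50
sub = undersample_group(data, 'race', 'black', 0.2)
# 20 of the 100 'black' records are kept, all 50 others remain
```

Sweeping `fraction` from 0.2 to 1.0 and re-training on each subset yields the sensitivity curves of the kind reported in Figure 4.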
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Classifier performance (F-1 score) and fairness as a function of the amount of black examples. For F-1, the higher, the better. For selection rate, the lower, the better. <bold>(A)</bold> Macro-F1 <bold>(B)</bold> Demographic Parity</p>
</caption>
<graphic xlink:href="fdata-04-787459-g004.tif"/>
</fig>
<p>Our model and LFR are clearly robust to selection bias with respect to group size. As the fraction of Black samples increases, the Black group becomes over-sampled, while the fairness metric &#x394;<sub>
<italic>DP</italic>
</sub> for LFR keeps going up. We can also observe that IPW is sensitive to group size. This verifies our assumption of incorporating unknown factors into the formulation of the propensity score estimation.</p>
</sec>
</sec>
<sec id="s8">
<title>8 Conclusion and Future Work</title>
<p>Summary. In this paper, we study the potential unfairness behind law enforcement using data from the NYCSF policing program. Massive real-world data with detailed subject profiles and environment descriptions are collected and processed into the dataset. Application Implication. By analyzing the cause and form of the biases involved, we formulate the problem as a selection (exposure) bias problem and propose a countermeasure within the scope of counterfactual risk minimization. As the exposure bias involves unknown factors and cannot be directly measured, we design an algorithm with adversarial re-weighting and give a detailed theoretical and experimental analysis. Future work. One future direction is to include more features in the analysis, as additional behavioral and environmental attributes may help model policing practices better. Modeling the dynamics of law enforcement is another important topic, as the official website reports that police practices evolve over time.</p>
</sec>
</body>
<back>
<sec id="s9">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s10">
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: WR, YF; data collection: WR; analysis and interpretation of results: WR, KL, and TZ; draft article preparation: WR, KL, TZ, and YF. All authors reviewed the results and approved the final version of the article.</p>
</sec>
<sec id="s11">
<title>Funding</title>
<p>This research was partially supported by the National Science Foundation (NSF) via grant numbers 1755946, 2040950, 2006889, and 2045567.</p>
</sec>
<sec sec-type="COI-statement" id="s12">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s13">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page">https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page</ext-link>.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Agarwal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Beygelzimer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Dud&#xed;k</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Langford</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wallach</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>A Reductions Approach to Fair Classification</article-title>,&#x201d; in <conf-name>International Conference on Machine Learning (PMLR)</conf-name> (<publisher-loc>Stockholm, Sweden</publisher-loc>: <publisher-name>PMLR</publisher-name>). <fpage>60</fpage>&#x2013;<lpage>69</lpage>. </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andreev</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Shkolnikov</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Begun</surname>
<given-names>A. Z.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Algorithm for Decomposition of Differences between Aggregate Demographic Measures and its Application to Life Expectancies, Healthy Life Expectancies, Parity-Progression Ratios and Total Fertility Rates</article-title>. <source>Dem. Res.</source> <volume>7</volume>, <fpage>499</fpage>&#x2013;<lpage>522</lpage>. <pub-id pub-id-type="doi">10.4054/demres.2002.7.14</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arneson</surname>
<given-names>R. J.</given-names>
</name>
</person-group> (<year>1989</year>). <article-title>Equality and Equal Opportunity for Welfare</article-title>. <source>Philos. Stud.</source> <volume>56</volume>, <fpage>77</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1007/bf00646210</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Auer</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Holte</surname>
<given-names>R. C.</given-names>
</name>
<name>
<surname>Maass</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>1995</year>). &#x201c;<article-title>Theory and Applications of Agnostic Pac-Learning with Small Decision Trees</article-title>,&#x201d; in <conf-name>Machine Learning Proceedings 1995</conf-name> (<publisher-loc>Tahoe City, CA</publisher-loc>: <publisher-name>Elsevier</publisher-name>), <fpage>21</fpage>&#x2013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1016/b978-1-55860-377-6.50012-8</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Austin</surname>
<given-names>P. C.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies</article-title>. <source>Multivar. Behav. Res.</source> <volume>46</volume>, <fpage>399</fpage>&#x2013;<lpage>424</lpage>. <pub-id pub-id-type="doi">10.1080/00273171.2011.568786</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Avin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Shpitser</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Pearl</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2005</year>). &#x201c;<article-title>Identifiability of Path-specific Effects</article-title>,&#x201d; in <conf-name>Proceedings of International Joint Conference on Artificial Intelligence</conf-name> (<publisher-loc>Edinburgh, Scotland</publisher-loc>), <fpage>4207</fpage>&#x2013;<lpage>4213</lpage>. </citation>
</ref>
<ref id="B7">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Balke</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Pearl</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1994</year>). &#x201c;<article-title>Counterfactual Probabilities: Computational Methods, Bounds and Applications</article-title>,&#x201d; in <conf-name>Uncertainty Proceedings 1994</conf-name> (<publisher-loc>Seattle, WA</publisher-loc>: <publisher-name>Elsevier</publisher-name>), <fpage>46</fpage>&#x2013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1016/b978-1-55860-332-5.50011-0</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Biega</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Gummadi</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Weikum</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Equity of Attention: Amortizing Individual Fairness in Rankings</article-title>,&#x201d; in <conf-name>The 41st international acm sigir conference on research &#x26; development in information retrieval</conf-name> (<publisher-loc>Ann Arbor, MI</publisher-loc>), <fpage>405</fpage>&#x2013;<lpage>414</lpage>. </citation>
</ref>
<ref id="B10">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Calmon</surname>
<given-names>F. P.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Vinzamuri</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Ramamurthy</surname>
<given-names>K. N.</given-names>
</name>
<name>
<surname>Varshney</surname>
<given-names>K. R.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Optimized Pre-processing for Discrimination Prevention</article-title>,&#x201d; in <conf-name>Proceedings of the 31st International Conference on Neural Information Processing Systems</conf-name>, <fpage>3995</fpage>&#x2013;<lpage>4004</lpage>. </citation>
</ref>
<ref id="B11">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Chiappa</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Path-specific Counterfactual Fairness</article-title>,&#x201d; in <conf-name>Proceedings of the AAAI Conference on Artificial Intelligence</conf-name> (<publisher-loc>Honolulu, Hawaii</publisher-loc>), <fpage>7801</fpage>&#x2013;<lpage>7808</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33017801</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Creswell</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>White</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Dumoulin</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Arulkumaran</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Sengupta</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Bharath</surname>
<given-names>A. A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Generative Adversarial Networks: An Overview</article-title>. <source>IEEE Signal. Process. Mag.</source> <volume>35</volume>, <fpage>53</fpage>&#x2013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1109/msp.2017.2765202</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Feldman</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Friedler</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Moeller</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Scheidegger</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Venkatasubramanian</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Certifying and Removing Disparate Impact</article-title>,&#x201d; in <conf-name>proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining</conf-name>, <fpage>259</fpage>&#x2013;<lpage>268</lpage>. <pub-id pub-id-type="doi">10.1145/2783258.2783311</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Feng</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lyu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Learning Fair Representations via an Adversarial Framework</source>. <comment>arXiv preprint arXiv:1904.13341</comment>. </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fleurbaey</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Equal Opportunity or Equal Social Outcome?</article-title> <source>Econ. Philos.</source> <volume>11</volume>, <fpage>25</fpage>&#x2013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1017/s0266267100003217</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Funk</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Westreich</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wiesen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>St&#xfc;rmer</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Brookhart</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Davidian</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Doubly Robust Estimation of Causal Effects</article-title>. <source>Am. J.&#x20;Epidemiol.</source> <volume>173</volume>, <fpage>761</fpage>&#x2013;<lpage>767</lpage>. <pub-id pub-id-type="doi">10.1093/aje/kwq439</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gardner</surname>
<given-names>M. W.</given-names>
</name>
<name>
<surname>Dorling</surname>
<given-names>S. R.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Artificial Neural Networks (The Multilayer Perceptron)-A Review of Applications in the Atmospheric Sciences</article-title>. <source>Atmos. Environ.</source> <volume>32</volume>, <fpage>2627</fpage>&#x2013;<lpage>2636</lpage>. <pub-id pub-id-type="doi">10.1016/s1352-2310(97)00447-0</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Garg</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Perot</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Limtiaco</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Taly</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Chi</surname>
<given-names>E. H.</given-names>
</name>
<name>
<surname>Beutel</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Counterfactual Fairness in Text Classification through Robustness</article-title>,&#x201d; in <conf-name>Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society</conf-name> (<publisher-loc>Honolulu, HI</publisher-loc>), <fpage>219</fpage>&#x2013;<lpage>226</lpage>. <pub-id pub-id-type="doi">10.1145/3306618.3317950</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gelman</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fagan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kiss</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>An Analysis of the New York City&#x20;Police Department&#x27;s "Stop-And-Frisk" Policy in the Context of Claims of Racial Bias</article-title>. <source>J.&#x20;Am. Stat. Assoc.</source> <volume>102</volume>, <fpage>813</fpage>&#x2013;<lpage>823</lpage>. <pub-id pub-id-type="doi">10.1198/016214506000001040</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Pouget-Abadie</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mirza</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Warde-Farley</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ozair</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). &#x201c;<article-title>Generative Adversarial Nets</article-title>,&#x201d; in <conf-name>Advances in neural information processing systems</conf-name> (<publisher-loc>Montreal, Quebec</publisher-loc>), <volume>27</volume>. </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Greenland</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Robins</surname>
<given-names>J.&#x20;M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Identifiability, Exchangeability and Confounding Revisited</article-title>. <source>Epidemiol. Perspect. Innov.</source> <volume>6</volume>, <fpage>4</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1186/1742-5573-6-4</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Grgic-Hlaca</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Zafar</surname>
<given-names>M. B.</given-names>
</name>
<name>
<surname>Gummadi</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Weller</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making</article-title>,&#x201d; in <conf-name>NIPS Symposium on Machine Learning and the Law</conf-name> (<publisher-loc>Barcelona, Spain</publisher-loc>), <volume>1, 2</volume>. </citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hardt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Price</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Srebro</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Equality of Opportunity in Supervised Learning</article-title>,&#x201d; in <conf-name>NIPS Symposium on Machine Learning and the Law</conf-name> <volume>29</volume>, <fpage>3315</fpage>&#x2013;<lpage>3323</lpage>. </citation>
</ref>
<ref id="B24">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Hashimoto</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Srivastava</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Namkoong</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Fairness without Demographics in Repeated Loss Minimization</article-title>,&#x201d; in <conf-name>International Conference on Machine Learning (PMLR)</conf-name> (<publisher-loc>Stockholm, Sweden</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>1929</fpage>&#x2013;<lpage>1938</lpage>. </citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hern&#xe1;n</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Robins</surname>
<given-names>J.&#x20;M.</given-names>
</name>
</person-group> (<year>2010</year>). <source>Causal Inference</source>. <comment>[Dataset]</comment>. </citation>
</ref>
<ref id="B26">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kallus</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Deepmatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training</article-title>,&#x201d; in <conf-name>International Conference on Machine Learning (PMLR)</conf-name> (<publisher-loc>Virtual Event</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>5067</fpage>&#x2013;<lpage>5077</lpage>. </citation>
</ref>
<ref id="B27">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kamiran</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Calders</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2009</year>). &#x201c;<article-title>Classifying without Discriminating</article-title>,&#x201d; in <conf-name>2009 2nd International Conference on Computer, Control and Communication</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/ic4.2009.4909197</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Maciejewski</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Tong</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Inform: Individual Fairness on Graph Mining</article-title>,&#x201d; in <conf-name>Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &#x26; Data Mining</conf-name> (<publisher-loc>Virtual Conference, United States</publisher-loc>), <fpage>379</fpage>&#x2013;<lpage>389</lpage>. <pub-id pub-id-type="doi">10.1145/3394486.3403080</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Khademi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Foley</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Honavar</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Fairness in Algorithmic Decision Making: An Excursion through the Lens of Causality</article-title>,&#x201d; in <conf-name>The World Wide Web Conference</conf-name> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>2907</fpage>&#x2013;<lpage>2914</lpage>. <pub-id pub-id-type="doi">10.1145/3308558.3313559</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krieger</surname>
<given-names>L. H.</given-names>
</name>
<name>
<surname>Fiske</surname>
<given-names>S. T.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Behavioral Realism in Employment Discrimination Law: Implicit Bias and Disparate Treatment</article-title>. <source>Calif. L. Rev.</source> <volume>94</volume>, <fpage>997</fpage>&#x2013;<lpage>1062</lpage>. <pub-id pub-id-type="doi">10.2307/20439058</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kusner</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Loftus</surname>
<given-names>J.&#x20;R.</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Counterfactual Fairness</source>. <comment>arXiv preprint arXiv:1703.06856</comment>. </citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Locatello</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Abbati</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Rainforth</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sch&#xf6;lkopf</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Bachem</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019</year>). <source>On the Fairness of Disentangled Representations</source>. <comment>arXiv preprint arXiv:1905.13662</comment>. </citation>
</ref>
<ref id="B33">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Louizos</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Swersky</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Welling</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zemel</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>The Variational Fair Autoencoder</article-title>. In <source>ICLR</source>. </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meares</surname>
<given-names>T. L.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>The Law and Social Science of Stop and Frisk</article-title>. <source>Annu. Rev. L. Soc. Sci.</source> <volume>10</volume>, <fpage>335</fpage>&#x2013;<lpage>352</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-lawsocsci-102612-134043</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pearl</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Causal Inference in Statistics: An Overview</article-title>. <source>Stat. Surv.</source> <volume>3</volume>, <fpage>96</fpage>&#x2013;<lpage>146</lpage>. <pub-id pub-id-type="doi">10.1214/09-ss057</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Pearl</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>Causal Inference</article-title>,&#x201d; in <conf-name>Causality: Objectives and Assessment</conf-name>, <fpage>39</fpage>&#x2013;<lpage>58</lpage>. </citation>
</ref>
<ref id="B38">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Pennington</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Socher</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Manning</surname>
<given-names>C. D.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Glove: Global Vectors for Word Representation</article-title>,&#x201d; in <conf-name>Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</conf-name> (<publisher-loc>Doha, Qatar</publisher-loc>), <fpage>1532</fpage>&#x2013;<lpage>1543</lpage>. <pub-id pub-id-type="doi">10.3115/v1/d14-1162</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pleiss</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Raghavan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Kleinberg</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Weinberger</surname>
<given-names>K. Q.</given-names>
</name>
</person-group> (<year>2017</year>). <source>On Fairness and Calibration</source>. <comment>arXiv preprint arXiv:1709.02012.</comment> </citation>
</ref>
<ref id="B40">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Tracking and Forecasting Dynamics in Crowdfunding: A Basis-Synthesis Approach</article-title>,&#x201d; in <conf-name>2018 IEEE International Conference on Data Mining (ICDM)</conf-name> (<publisher-loc>Singapore</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1212</fpage>&#x2013;<lpage>1217</lpage>. <pub-id pub-id-type="doi">10.1109/icdm.2018.00161</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Shpitser</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Consistent Estimation of Functions of Data Missing Non-monotonically and Not at Random</article-title>,&#x201d; in <conf-name>Proceedings of the 30th International Conference on Neural Information Processing Systems</conf-name> (<publisher-loc>Barcelona, Spain</publisher-loc>: <publisher-name>Citeseer</publisher-name>), <fpage>3152</fpage>&#x2013;<lpage>3160</lpage>. </citation>
</ref>
<ref id="B42">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Srivastava</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Heidari</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Krause</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Mathematical Notions vs. Human Perception of Fairness: A Descriptive Approach to Fairness for Machine Learning</article-title>,&#x201d; in <conf-name>Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &#x26; Data Mining</conf-name> (<publisher-loc>Anchorage, AK</publisher-loc>), <fpage>2459</fpage>&#x2013;<lpage>2468</lpage>. </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tyler</surname>
<given-names>T. R.</given-names>
</name>
<name>
<surname>Fagan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Geller</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Street Stops and Police Legitimacy: Teachable Moments in Young Urban Men&#x27;s Legal Socialization</article-title>. <source>J.&#x20;Empirical Leg. Stud.</source> <volume>11</volume>, <fpage>751</fpage>&#x2013;<lpage>785</lpage>. <pub-id pub-id-type="doi">10.1111/jels.12055</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Viswesvaran</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ones</surname>
<given-names>D. S.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Importance of Perceived Personnel Selection System Fairness Determinants: Relations with Demographic, Personality, and Job Characteristics</article-title>. <source>Int. J.&#x20;Selection Assess.</source> <volume>12</volume>, <fpage>172</fpage>&#x2013;<lpage>186</lpage>. <pub-id pub-id-type="doi">10.1111/j.0965-075x.2004.00272.x</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wick</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Panda</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tristan</surname>
<given-names>J.-B.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Unlocking Fairness: a Trade-Off Revisited</article-title>,&#x201d; in <conf-name>Proceedings of the 33st International Conference on Neural Information Processing Systems</conf-name> (<publisher-loc>Vancouver, BC</publisher-loc>), <fpage>8783</fpage>&#x2013;<lpage>8792</lpage>. </citation>
</ref>
<ref id="B46">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019a</year>). &#x201c;<article-title>Counterfactual Fairness: Unidentification, Bound and Algorithm</article-title>,&#x201d; in <conf-name>Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence</conf-name> (<publisher-loc>Macao, China</publisher-loc>). <pub-id pub-id-type="doi">10.24963/ijcai.2019/199</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tong</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2019b</year>). <source>Pc-fairness: A Unified Framework for Measuring Causality-Based Fairness</source>. <comment>arXiv preprint arXiv:1910.12586</comment>. </citation>
</ref>
<ref id="B48">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Achieving Causal Fairness through Generative Adversarial Networks</article-title>,&#x201d; in <conf-name>Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence</conf-name> (<publisher-loc>Macao, China</publisher-loc>). <pub-id pub-id-type="doi">10.24963/ijcai.2019/201</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Korpeoglu</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Achan</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Adversarial Counterfactual Learning and Evaluation for Recommender System</source>. <comment>arXiv preprint arXiv:2012.02295</comment>. </citation>
</ref>
<ref id="B50">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zafar</surname>
<given-names>M. B.</given-names>
</name>
<name>
<surname>Valera</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Gomez-Rodriguez</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gummadi</surname>
<given-names>K. P.</given-names>
</name>
</person-group> (<year>2017a</year>). &#x201c;<article-title>Training Fair Classifiers</article-title>,&#x201d; in <conf-name>AISTATS&#x2019;17: 20th International Conference on Artificial Intelligence and Statistics</conf-name>. </citation>
</ref>
<ref id="B51">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zafar</surname>
<given-names>M. B.</given-names>
</name>
<name>
<surname>Valera</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Gomez Rodriguez</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gummadi</surname>
<given-names>K. P.</given-names>
</name>
</person-group> (<year>2017b</year>). &#x201c;<article-title>Fairness beyond Disparate Treatment &#x26; Disparate Impact: Learning Classification without Disparate Mistreatment</article-title>,&#x201d; in <conf-name>Proceedings of the 26th International Conference on World Wide Web</conf-name> (<publisher-loc>Perth, Australia</publisher-loc>), <fpage>1171</fpage>&#x2013;<lpage>1180</lpage>. </citation>
</ref>
</ref-list>
</back>
</article>