“Part Man, Part Machine, All Cop”: Automation in Policing

Digitisation, automation, and datafication permeate policing and justice more and more each year, from predictive policing methods through recidivism prediction to automated biometric identification at the border. The sociotechnical issues surrounding the use of such systems raise questions and reveal problems, both old and new. Our article reviews contemporary issues surrounding automation in policing and the legal system, identifies common issues and themes across various examples, introduces the distinction between human “retail bias” and algorithmic “wholesale bias”, and argues for shifting the viewpoint on the debate to focus on both workers' rights and organisational responsibility as well as fundamental rights and the right to an effective remedy.


DIGITISATION IN POLICING AND JUSTICE
Since the late 20th century, digitisation (transforming information into computer-readable formats), automation (reducing or eliminating the human role in a system or process through the use of computers and algorithms), and datafication (measuring and quantifying people's lives, in particular qualitative aspects thereof, and using the resulting quantified data for various purposes) have hit policing with full force. From predictive policing methods through recidivism prediction to automated biometric identification at the border, more and more aspects of policing employ automated systems.
For example, large databases are algorithmically sifted to detect "suspicious" people and behaviour, potentially leading to stops and searches on the basis of automatic hits. One of these is the database of Passenger Name Records (PNR), made compulsory in the EU in 2018 by the EU's PNR Directive [Directive (EU) 2016/681], under which airlines must transmit the PNR of all passengers to police authorities. In the comparatively small country of Austria alone, 23,877,277 entries have already been recorded, even though during the first 8 months data collection did not yet include all airlines operating in Austria (Bundeskriminalamt, 2019, p. 7). Due to its predictive and explorative nature, the PNR system constitutes an example of the problematic technique of predictive policing (for a more in-depth discussion of predictive policing in Austria and the United States, see Klausner, 2019 and Benbouzid, 2019).
Predicting crime is an endeavour currently undertaken by police forces in many countries, seemingly as popular as it is ultimately futile: classic models, like PRECOBS in Germany, KeyCrime and eSecurity in Italy (Alfter et al., 2019, p. 91) or comparable software in Belgium (Alfter et al., 2019, p. 44) and the Netherlands (Alfter et al., 2019, p. 94) are supposed to predict crimes based on historical data of policing and criminal activity. In the field of justice, in a similar vein, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm is used in several US states to predict the likelihood of recidivism of convicted offenders.
Another currently growing field of digitisation in policing is large-scale video surveillance, often with automated face or behaviour recognition. Clearview AI, a product allegedly purchased by several large police forces, recently made the news when it was advertised as containing more than 3 billion images of people scraped from the internet (Hill, 2020). Automated (or "assisted") facial recognition has also been implemented and researched in the UK (Fussey et al., 2020).
Predictive policing systems are often faced with stark criticism; the use of some tools has been stopped [often after yielding disappointing or inconclusive results, such as PRECOBS in Baden-Württemberg (Mayer, 2019)]. In Santa Cruz, predictive policing has been banned entirely (Ibarra, 2020) over civil liberty and racial discrimination concerns. Similarly, there has been strong resistance against the use of facial recognition as a surveillance tool from civil society, NGOs and activists, and some cities [such as San Francisco (Conger et al., 2019), Boston (Jarmanning, 2020), and Portland (Hatmaker, 2020)] have already banned the use of facial recognition software, motivated by concerns about racial and gender biases, false positives, privacy, and excessive surveillance.
The basis for these kinds of systems very often lies in the creation of massive databases on people and their behaviour. To give just a few examples, the European Dactyloscopy Database (Eurodac) stores over 5 million sets of fingerprints (European Union Agency for the Operational Management of Large-Scale IT Systems in the Area of Freedom, Security and Justice, 2019, p. 11), mostly of asylum seekers; the Visa Information System (VIS) contains over 27 million registered visa applications with fingerprints (European Union Agency for the Operational Management of Large-Scale IT Systems in the Area of Freedom, Security and Justice, 2018, p. 28) and the new Entry/Exit System (EES) collects fingerprints of third-country nationals entering the EU (regardless of whether they are visa holders or visa exempt), projected to affect an estimated 295 million people in 2025 (Napieralski, 2019, p. 200). Moreover, previously existing databases are increasingly being analysed using novel systems, which change the function and effect of these databases entirely. For example, the US Immigration and Customs Enforcement (ICE) used databases containing millions of images from driver's licences for searches with facial recognition software (Harwell and Cox, 2020).
While such systems may have their benefits (such as higher efficiency and throughput, the ability to treat comparable cases more uniformly even in a large and/or distributed organisation, easier synthesis of data stemming from different sources, or the possibility to process standard cases more quickly and allocate more time to complex ones), they nonetheless raise several important questions. In this article, we give a comprehensive review of sociotechnical issues in and around automation in policing and the legal system, identify common issues and themes across various examples, analyse the legal situation in Europe (from the viewpoint of fundamental rights and data protection law), and argue for viewing the debate from a different angle to focus on both workers' rights and organisational responsibility as well as fundamental rights and the right to an effective remedy.

PROBLEMS OF DATA, SOCIETY, AND TECHNOLOGY
In this section, we give an overview of some of the different ways in which automating processes in policing and justice can be problematic in terms of the underlying data, algorithmic systems, and practical, technical and organisational viewpoints. We will first follow the typical data pipeline, going from the input base data through the algorithms and models used to the evaluation and assessment of the results. Then, we will elaborate on the problems caused by scale, in particular the base rate fallacy, and conclude with an argument to consider the debate from the angle of organisational responsibility and the question of workers' rights (i.e., the effect of automation on employees and workers having to work with algorithmic systems) in the context of automation in policing.

Flawed and Dirty Data
The technical and practical side of automation in policing harbours a number of complicated problems beginning at the most fundamental level, the base data themselves. These reflect the imperfect, unequal, and discriminatory systems and societies from which they stem (Hao, 2019). Following the adage "garbage in, garbage out", any such faulty data will engender mistaken outcomes even if, apart from the data, all other parts of the system were working perfectly (a strong premise, we hasten to add). Any decisions taken on the basis of data have to factor in their provenance and quality and the kinds of distortion they might consequently be subject to, and consider how best to counteract and remediate any such bias; otherwise, any previous unequal treatment or discrimination visible in the data will merely be reproduced or even reinforced (Singelnstein, 2018, p. 4).
For example, in the case of predictive policing, prior cases of inferior, discriminatory, or outright illegal policing ("dirty policing") are visible in criminological data ("dirty data") (Richardson et al., 2019, p. 192), e.g., due to underreporting of sexual crimes (Taylor and Gassner, 2010, p. 241 ff.), racist crimes (Kushnick, 1999, ¶1.7), and police violence (Loftin et al., 2017; Gingerich and Oliveros, 2018). Complying with the legal precept of non-discrimination in policing and counteracting such biases is made all the harder by the fact that there is often little awareness and acknowledgement of these underlying problems and of the discriminatory structures which are (at least partially) responsible for them. Nor is this helped by the fact that people from marginalised communities are severely underrepresented within the police force (Myers West et al., 2019, p. 15 ff.), which tends to correlate with unequal treatment of minorities by the police and in turn to increase the risk of discrimination and unconscious bias (Legewie and Fagan, 2016) [though overall research on the interaction between minority representation in the police and discriminatory policing practices remains inconclusive so far (Smith, 2003; Nicholson-Crotty et al., 2017)].
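How underreporting turns into "dirty data" can be illustrated with a minimal sketch. All figures below are hypothetical assumptions chosen for illustration (two neighbourhoods with identical true incidence but different, assumed reporting rates); they are not drawn from the studies cited above.

```python
"""Hypothetical sketch: equal true incidence, unequal reporting rates.

All numbers are invented for illustration only.
"""

TRUE_INCIDENTS = 200  # assumed actual incidents per year in EACH neighbourhood

# Assumed share of actual incidents that end up in police records.
REPORTING_RATES = {"neighbourhood_A": 0.30, "neighbourhood_B": 0.60}


def recorded_incidents(true_incidents: int, reporting_rate: float) -> float:
    """Expected number of incidents that appear in the recorded data."""
    return true_incidents * reporting_rate


recorded = {
    area: recorded_incidents(TRUE_INCIDENTS, rate)
    for area, rate in REPORTING_RATES.items()
}

# A model trained only on `recorded` sees B as twice as "criminal" as A,
# although the underlying incidence is identical by construction.
apparent_ratio = recorded["neighbourhood_B"] / recorded["neighbourhood_A"]

for area, count in recorded.items():
    print(f"{area}: {count:.0f} recorded of {TRUE_INCIDENTS} actual incidents")
print(f"apparent ratio B/A: {apparent_ratio:.1f}")
```

Nothing in the recorded counts reveals that the difference stems from reporting rather than from crime; that information has to come from outside the dataset.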
Similar problems arise in the case of recidivism prediction. ProPublica has criticised the COMPAS system, as used in Broward County, Florida, for anti-Black racial bias. While ProPublica's reporting has in turn been questioned and criticised (cf. Espino, 2018, p. 2 ff.), fundamental problems of fairness (Chouldechova, 2017) and transparency (Rudin et al., 2020) in recidivism prediction and the underlying data remain, as does the fact that societal racism seems to contribute to unequal rates of recidivism of people of colour beyond what would be expected from hypothesised purely criminogenic risk (Berry et al., 2020).[1]
A general problem of techniques employing large amounts of data is that even the interpretation of primary data can be problematic. For example, a superficial analysis of the geographical distribution of cases (of whatever crime) merely directs attention toward areas with higher population density or some similarly trivial reason for higher case incidence. When trying to correct for this, the choice of reference value alone can decisively influence the evaluation: consider the difference in results for neighbourhoods close to train stations or other transport hubs when relating the absolute figures to the residential population as opposed to the ambient population (Belina, 2016, p. 92 ff.).
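The effect of the reference population can be shown with a few lines of arithmetic. The sketch assumes invented figures for an area around a train station and an assumed city-wide baseline; only the choice of denominator changes between the two results.

```python
"""Hypothetical sketch: one area, two reference populations.

All figures are invented for illustration (an area around a train station).
"""

INCIDENTS = 120               # recorded incidents per year (assumed)
RESIDENTS = 2_000             # residential population (assumed)
AMBIENT_POPULATION = 40_000   # average number of people present, incl. commuters (assumed)
CITY_RATE_PER_1000 = 10.0     # assumed city-wide rate, used as the comparison baseline


def rate_per_1000(count: int, population: int) -> float:
    """Incidents per 1,000 people of the chosen reference population."""
    return 1000 * count / population


residential_rate = rate_per_1000(INCIDENTS, RESIDENTS)        # 60.0
ambient_rate = rate_per_1000(INCIDENTS, AMBIENT_POPULATION)   # 3.0

# Same data, opposite conclusions relative to the city-wide baseline:
print(f"vs. residents: {residential_rate / CITY_RATE_PER_1000:.1f}x city average")
print(f"vs. ambient population: {ambient_rate / CITY_RATE_PER_1000:.1f}x city average")
```

Under these assumptions the same area appears either six times more or markedly less "dangerous" than average, depending solely on which population the counts are related to.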
For biometric data, whether fingerprints, DNA or facial images, specific further kinds of problems exist in the base data. Latent fingerprints are often imperfect (distorted, smudged, and/or partial) (Ulery et al., 2013), and DNA evidence collected from crime scenes is often contaminated (Fonneløp et al., 2016) [probably most infamously in the case of the so-called "Phantom of Heilbronn", a purported criminal whose DNA was tied to at least 40 crime scenes ranging from burglary and robbery to murder, and which turned out to belong to a worker in a factory producing cotton swabs for investigative uses (Balk, 2015, p. 228 f.)]. In addition, due to racial profiling and overpolicing, people of colour are far more likely to be registered in fingerprint (Love, 2016) or DNA databases (Krimsky and Simoncelli, 2011; Murphy and Tong, 2020).[2] Finally, we turn to the datasets used to train facial recognition algorithms. These have repeatedly been shown to suffer from severe sample bias (Buolamwini and Gebru, 2018; Grother et al., 2019), leading to several unsettling cases of discriminatory results in the recent past. The most notable example had Google Photos' automated photo-indexing system label some Black people as gorillas, most likely because Black people were underrepresented in the training dataset (Lee, 2015). Even 3 years later, Google had apparently not been able to fix the underlying problem, instead opting to treat the symptoms by blocking labels such as "gorilla" or "chimp" from appearing (Simonite, 2018). Moreover, datasets explicitly including race and gender in their annotation often lack critical engagement with, and a clear conception and description of, the categories involved, severely increasing the likelihood of annotator bias (Scheuerman et al., 2020).[3]
[1] In the case of State v. Loomis, in which Eric Loomis challenged the use of COMPAS risk assessment in the sentencing decision in his criminal case as a violation of his due process rights, the Wisconsin Supreme Court accepted the use of COMPAS in sentencing in principle, but warned of its limitations [State v. Loomis, Wisconsin Supreme Court, 13. 7. 2016, 881 N.W.2d 749 (Wis. 2016)].
[2] In the case of the EES, the overrepresentation of people of colour in the database is built into the system itself: since only third-country nationals' fingerprints are collected, the population of the database is primed to be disproportionately composed of people of colour.
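Sample bias in facial recognition training data tends to surface as unequal error rates across demographic groups, which a single aggregate figure can hide. The sketch below, on entirely synthetic data with hypothetical group labels, computes a false match rate overall and per group, in the spirit of the disaggregated evaluations cited above; it is an illustration, not a reproduction of those studies' methodology.

```python
"""Hypothetical sketch: disaggregated vs. aggregate false match rates.

The records are synthetic: (group, system_said_match, same_person).
Only non-mated comparisons (different people) are included.
"""
from collections import defaultdict

records = (
    [("group_1", False, False)] * 990 + [("group_1", True, False)] * 10
    + [("group_2", False, False)] * 90 + [("group_2", True, False)] * 10
)


def false_match_rate(rows) -> float:
    """Share of non-mated comparisons wrongly declared a match."""
    non_mated = [said_match for _, said_match, same in rows if not same]
    return sum(non_mated) / len(non_mated)


def false_match_rate_by_group(rows) -> dict:
    """Same metric, computed separately for each group."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[0]].append(row)
    return {group: false_match_rate(group_rows) for group, group_rows in by_group.items()}


overall = false_match_rate(records)              # 20 / 1100, about 1.8 %
per_group = false_match_rate_by_group(records)   # group_1: 1 %, group_2: 10 %
print(f"overall: {overall:.3f}", {g: round(r, 3) for g, r in per_group.items()})
```

Because the underrepresented group contributes few comparisons, its tenfold higher error rate barely moves the aggregate figure.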

Algorithms and Modelling
The second set of problems arises from what is actually done with the data. Choosing how to solve a problem, which assumptions to make when modelling a phenomenon and what kind(s) of algorithms to apply to it: all of these involve human decisions prone to implicit or explicit bias. This preexisting bias, to follow the terminology of Friedman and Nissenbaum, is then supplemented and amplified by additional technical and emergent biases (Friedman and Nissenbaum, 1996). Much like Paul Watzlawick's famous axiom of communication ("one cannot not communicate"), one cannot not make assumptions. Modelling by necessity means making many decisions that represent value systems: which characteristics to interpret positively or negatively, which data values to put into bins together, which characteristics to consider as the opposite ends of a spectrum, etc. Even the choice of not applying any deliberate modelling assumptions, e.g., in the purely correlative and quaintly named predictive policing software HunchLab (Shapiro, 2017, p. 459), is a choice. Often, there is little to no deliberate consideration of which simplifications, implicit or explicit assumptions, etc., go into algorithmic or modelling solutions of sociological, societal, or even scientific problems (Bennett Moses and Chan, 2018, p. 809 ff.). Not least, the experiences in modelling the current COVID-19 pandemic have shown with particular clarity that any kind of modelling and prediction needs to value transparency and humility over false decisiveness and conviction in order to "invite insight, not blame" (Saltelli et al., 2020).
To discuss one specific example, the city of Oakland trialled a predictive policing algorithm inspired by seismographic models (Mohler et al., 2015) and based on the so-called "near-repeat" theory of criminality. The main computation therein, however, amounts to little more than a moving average; it neither accounts for feedback effects on the level of crime (Lum and Isaac, 2016, p. 18), nor does it make allowances for the fact that the crime rates of different subgroups of the population might have different elasticities in reaction to increases or decreases in policing, shortcomings which have the potential to entirely reverse the expected results (Harcourt, 2007, p. 23 ff.). In general, second and higher order effects (i.e., cascading and feedback effects) have to be an important consideration for any kind of statistically motivated strategy, yet they are often not accounted for (Richardson et al., 2019, p. 20 ff.).
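The criticism that such a model amounts to little more than a moving average can be made concrete with a deliberately simplified sketch. It is not the published model of Mohler et al. (2015) but a hypothetical stand-in: each grid cell's forecast is the mean of its last few recorded counts, and the highest-scoring cells are flagged for patrols. Cell names, counts and the window size are all assumptions.

```python
"""Hypothetical moving-average "hotspot" forecast (illustrative stand-in only)."""
from typing import Dict, List


def moving_average_forecast(history: Dict[str, List[int]], window: int = 3) -> Dict[str, float]:
    """Forecast next-period counts per grid cell as the mean of the last `window` periods.

    Recorded counts are taken at face value: the computation cannot tell
    crime that occurred from crime that was recorded because police were present.
    """
    forecast = {}
    for cell, counts in history.items():
        recent = counts[-window:]
        forecast[cell] = sum(recent) / len(recent)
    return forecast


def top_cells(forecast: Dict[str, float], k: int) -> List[str]:
    """Cells flagged for extra patrols: the k highest forecasts."""
    return sorted(forecast, key=forecast.get, reverse=True)[:k]


history = {"cell_A": [4, 6, 5, 7], "cell_B": [3, 3, 4, 3], "cell_C": [1, 0, 2, 1]}
fc = moving_average_forecast(history)
print(fc, top_cells(fc, 1))
```

Nothing in this computation models how the forecast itself changes where police look, or how different groups respond to more or less policing, which is where the feedback and elasticity problems described above enter.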
Ultimately, any kind of technique based on pattern recognition suffers from fundamental faults, such as the implicitly necessary assumption that the underlying phenomenon is regular (which can lead to paying too little attention to less frequent crimes), the temptation to fight symptoms instead of underlying causes,[4] and the fact that it can help to obscure discriminatory practices both outwardly and inwardly (Kaufmann et al., 2019, p. 11 ff.).
As far as the common technophile argument of better decisions through the use of sophisticated technology is concerned, even relatively complex models for recidivism prediction such as COMPAS have been shown to yield results no more fair or accurate than the predictions made by humans with limited or no criminal justice expertise (Dressel and Farid, 2018). However, there is the additional problem of trading in the "retail bias" of individual human decisions for the "wholesale bias" of subjecting larger groups of people to the same automated decision mechanism (and its attendant biases). (This distinction has, to the best of our knowledge, not been made in these terms so far.) By way of example, consider a human caseworker with an unconscious bias in favour of people with asymmetrical eyebrows. Assuming a caseload of 12 cases per day and 250 workdays a year, about 3,000 people are subject to this caseworker's individual bias each year, with a very limited spatial and temporal impact (and even the conceivable possibility of individuals avoiding this specific caseworker should they feel treated unfairly or have heard about their anti-symmetry bias). If instead an algorithmic system that all caseworkers are required to use exhibits this same bias against people with symmetrical eyebrows, the population affected is potentially the entire clientele of the organisation in question (assuming, e.g., 30 locations with an average of 10 caseworkers each, up to 900,000 people each year), with no feasible avoidance strategy available to them (for a succinct summary of more ways in which automation is fundamentally different with regard to inequality, e.g., opacity, persistence, or universality, see Eubanks, 2018, p. 184-188).
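The arithmetic behind the caseworker example can be written out directly. The constants are the figures from the example above; the function names are hypothetical, and the "wholesale" figure is an upper bound that assumes every case concerns a different person.

```python
"""Reach of "retail" vs. "wholesale" bias, using the figures from the example above."""

CASES_PER_DAY = 12
WORKDAYS_PER_YEAR = 250
LOCATIONS = 30
CASEWORKERS_PER_LOCATION = 10


def retail_reach(cases_per_day: int, workdays: int) -> int:
    """People per year exposed to the bias of ONE human caseworker."""
    return cases_per_day * workdays


def wholesale_reach(cases_per_day: int, workdays: int,
                    locations: int, caseworkers_per_location: int) -> int:
    """Upper bound on people per year exposed when ALL caseworkers use the same biased system."""
    return retail_reach(cases_per_day, workdays) * locations * caseworkers_per_location


retail = retail_reach(CASES_PER_DAY, WORKDAYS_PER_YEAR)
wholesale = wholesale_reach(CASES_PER_DAY, WORKDAYS_PER_YEAR,
                            LOCATIONS, CASEWORKERS_PER_LOCATION)
print(retail, wholesale, wholesale // retail)  # 3000 900000 300
```

The multiplier is simply the number of caseworkers bound to the system (here 300), so the reach of a single flaw grows linearly with the size of the organisation.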
Even biometric identification, which might at first glance appear to be less susceptible to these kinds of problems, is subject to comparable concerns. Latent fingerprint analysis performed by humans has a significant likelihood of errors (Ulery et al., 2011; Pacheco et al., 2014), and the underlying methodology of fingerprint analysis itself has been called into question (Cole, 2005). The arguably more harmful false positives become increasingly likely with the automation of fingerprint (or DNA) matching, as matching algorithms are more and more likely to find purported close matches in growing biometric databases (even more so as previously separate national databases are increasingly being integrated into European and international infrastructures), with the result that such searches are prone to turn up suspects even if the true perpetrator is not present in the database (Dror and Mnookin, 2010).
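Why growing databases make false positives more likely can be sketched with a simple probability model. Assuming (as a simplification) that each comparison independently produces a false match with a fixed per-comparison false match rate, and that the true source is not in the database, the chance that a search returns at least one false match is 1 - (1 - FMR)^N. The FMR value below is a hypothetical assumption.

```python
import math


def prob_any_false_match(false_match_rate: float, database_size: int) -> float:
    """Probability that a search returns at least one false match.

    Assumes the true source is NOT in the database and that comparisons are
    independent. Computes 1 - (1 - FMR)^N in a numerically stable form.
    """
    return -math.expm1(database_size * math.log1p(-false_match_rate))


FMR = 1e-6  # hypothetical per-comparison false match rate
for n in (1_000, 100_000, 1_000_000, 10_000_000):
    print(f"N = {n:>10,}: P(at least one false match) = {prob_any_false_match(FMR, n):.3f}")
```

Under these assumptions, the same matcher that almost never errs against a small suspect list becomes close to certain to produce a spurious hit against a database of millions.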
The black box nature of the deep learning algorithms used in facial recognition technology raises a whole host of questions regarding transparency, explainability, and accountability, with entire conferences [e.g., the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) or the AAAI/ACM Conference on AI, Ethics, and Society (AAAI/ACM AIES)] and subfields of the machine learning research community devoted to these issues.[5]

Evaluation and Assessment
The third area of concern is the evaluation, interpretation, and assessment of the results of algorithmic decision support, automated pattern matching, or other automation technology. Despite increasing attention to and appreciation of the questions of explainability and transparency of algorithmic systems, even the most well-understood technology remains a black box to most end users, making critical and self-aware use thereof difficult, if not impossible (Ferguson, 2017, p. 1165). This topic is also fundamentally connected to the question of accountability and responsibility, of both organisations and individuals; for more detail on this question, see section 2.5. Moreover, algorithms and technology are often perceived to be more objective and neutral than humans, a viewpoint that both fails to acknowledge the problems of bias in the underlying data stemming from, e.g., past (human) discrimination (be it individual, institutional, or systemic) and helps to sidestep the necessity of reflecting on and justifying one's actions (cf. Lum and Isaac, 2016, p. 18 f.; Bennett Moses and Chan, 2018, p. 817 f.; Shapiro, 2017). Biometric evidence in particular suffers from this problem of misplaced and excessive faith in technology and its perceived infallibility (Schklar and Diamond, 1999); for example, the use of DNA evidence in otherwise weak, circumstantial criminal cases significantly increases the likelihood of conviction (Dartnall and Goodman-Delahunty, 2006). In reality, the quality of biometric evidence is far from perfect [and sometimes even faked (Giannelli, 1997)] and can lead to convictions of innocent people in alarming numbers of cases (Naughton and Tan, 2011), few of which are successfully reviewed and overturned [although the positive role DNA evidence can play in overturning wrongful convictions also has to be mentioned (Olney and Bonn, 2015)].
The issue of misplaced faith in biometric evidence and the need to give jury members sufficient information to correctly understand the actual strength of the evidence presented has been acknowledged in some jurisdictions, in particular in the case of DNA evidence. However, the long and underscrutinised history of fingerprint evidence has led to paradoxical arrangements in which some jurisdictions require matching DNA evidence to be presented together with statistical probabilities, while fingerprint evidence is conversely prohibited from being presented as anything but categorically certain (Neumann, 2012; Neumann et al., 2012).
Similarly, predictive algorithms such as those used in predictive policing often lack the requisite critical evaluation of their effectiveness and of the advantages or disadvantages their use brings with it (cf. Bennett Moses and Chan, 2018, p. 815 ff. as well as Belina, 2016, p. 93 f. for further sources). A recent comprehensive literature review found that so far, there is little empirical evidence either in favour of (i.e., proving the promised benefits exist) or against (i.e., validating the existence of expected drawbacks) their use (Meijer and Wessels, 2019). Predictive policing algorithms are particularly difficult to evaluate, as unfulfilled forecasts can always be attributed either to the effect of acting on the algorithm's prediction (and thus, e.g., preventing the forecasted event) or to the algorithm failing to predict accurately. Discriminatory use of predictive policing can lead to overpolicing of areas (possibly erroneously) deemed to be more dangerous. Increased police presence in certain areas can then lead to more arrests in those areas, inducing even more policing of the area and thus creating a positive feedback loop (Adensamer and Klausner, 2019, p. 422 f.).
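The feedback loop can be illustrated with a toy simulation. It assumes two areas with the same true crime rate, a rule that sends a fixed, larger share of patrols to whichever area has more cumulative recorded crime, and recorded crime proportional to patrol presence; all parameter names and values are hypothetical.

```python
"""Hypothetical sketch of a patrol-allocation feedback loop.

Assumptions (all invented for illustration):
- two areas with the SAME true crime rate;
- the area with more cumulative recorded crime is treated as the "hotspot"
  and receives a fixed, larger share of patrols;
- the amount of crime that gets recorded is proportional to patrol share.
"""


def simulate(periods: int,
             true_rate: float = 10.0,
             initial_records: tuple = (11.0, 10.0),
             hotspot_share: float = 0.75,
             detection_factor: float = 0.5) -> list:
    records = list(initial_records)
    history = []
    for _ in range(periods):
        hotspot = 0 if records[0] >= records[1] else 1
        shares = [1 - hotspot_share, 1 - hotspot_share]
        shares[hotspot] = hotspot_share
        # Recorded crime depends on where police look, not on true crime alone.
        newly_recorded = [true_rate * detection_factor * s for s in shares]
        records = [r + new for r, new in zip(records, newly_recorded)]
        history.append(tuple(records))
    return history


for period, (a, b) in enumerate(simulate(4), start=1):
    print(f"period {period}: area 0 = {a:.2f}, area 1 = {b:.2f}, share of area 0 = {a / (a + b):.0%}")
```

A one-incident head start is enough: area 0 stays the "hotspot" forever, and its share of recorded crime drifts from about 52% towards the 75% patrol share, although both areas are, by construction, equally affected by crime.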
A (perhaps somewhat surprising) challenge in evaluating algorithmic decision support (ADS) is that the context of its use has to be taken into account, as organisational and operational factors can make a great difference to its effectiveness. In their ethnographic study of assisted facial recognition used by police in South Wales and London, Fussey et al. (2020) found that several seemingly small factors influenced the results: the positioning of the camera, the quality of the photos in the comparison data set ("watchlist"), and the expectations of the officers as well as their level of ennui while using the system. These and many other factors have to be taken into account when evaluating an algorithmic support system. Moreover, conflicting interests can create fundamental obstacles in the use of crime prediction systems: an organisation that has to demonstrate the usefulness of its own predictive system has an interest in showing that its predictions come true. At the same time, law enforcement agencies inherently aim to prevent crime; but once a predicted crime has been prevented, the accuracy of the prediction can no longer be demonstrated.

Base Rate Fallacy and Difficulties of Scale
The increase in scale of datasets can by itself significantly change the efficacy, drawbacks, and dangers of a method; the use of big data biometrics is a stark example of this. Current systems of fingerprint matching are highly reliable, but not perfect: no matching system has a 100% success rate. Fingerprint identification is nonetheless sometimes still treated as if it were infallible (cf. also the previous section); in the UK, for example, a fingerprint match can shift the burden of proof (i.e., of proving that the match is incorrect) onto the plaintiff (in this case, an asylum seeker contesting a Dublin transfer) (European Union Agency for Fundamental Rights, 2018, p. 82).
The great trust in the method of fingerprint matching stems from times when it was used on much smaller datasets, e.g., when comparing fingerprints from a crime scene with those of a limited set of suspects. Today, EU agencies are operating several large scale biometric databases, such as the new Entry/Exit System (EES) collecting fingerprints of third-country nationals entering the EU, expected to affect 295 million people in 2025 (Napieralski, 2019, p. 200).
When datasets scale, the number of false positives scales with them. Not taking this into account can mean succumbing to the so-called "base rate fallacy" and consequently greatly overestimating the efficacy of a system. Fingerprints are often thought to be unique, but among a million fingerprints, those of some pairs of people will resemble each other so closely that neither a matching system nor an expert will be able to distinguish them (Dror and Mnookin, 2010, p. 55). Purely accidental matches (with severe consequences for the victim of such a mistake) hence become more and more likely the bigger the datasets are. If the failure rate of a fingerprint identification system is assumed to be 0.1% [a common industry standard (Jain et al., 2010, p. 40)], then in a dataset with a million entries, there will be around 1,000 false matches. If 295 million people's fingerprints are collected (as is projected to be the case for the EU's EES), the number of false positive results of the same matching algorithm will amount to about 295,000 [conversely, achieving acceptable false positive identification rates of 1% or even 0.1%, both also common industry standards (Watson et al., 2014, p. xiv), in a database containing dozens of millions of entries requires almost unattainably stringent thresholds for the false match rate (Jain et al., 2010, p. 40)].
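The figures above, and the base rate fallacy they illustrate, can be reproduced in a short sketch. It uses the 0.1% rate and the database sizes from the text; the assumption that exactly one genuine match is present and found, the 50-million-entry database, and the independence of comparisons are hypothetical simplifications, as are all function names.

```python
import math


def expected_false_matches(false_match_rate: float, database_size: int) -> float:
    """Expected number of false matches when one probe is compared with every entry."""
    return false_match_rate * database_size


def share_of_genuine_hits(true_hits: float, false_hits: float) -> float:
    """Positive predictive value: fraction of reported matches that are correct."""
    return true_hits / (true_hits + false_hits)


def required_false_match_rate(target_fpir: float, database_size: int) -> float:
    """Per-comparison false match rate needed for a given false positive
    identification rate (FPIR) over the whole database, assuming independent
    comparisons: solves 1 - (1 - FMR)^N = FPIR for FMR."""
    return -math.expm1(math.log1p(-target_fpir) / database_size)


FMR = 0.001  # the 0.1 % figure used in the text
for n in (1_000_000, 295_000_000):
    fp = expected_false_matches(FMR, n)
    # Hypothetical: the one genuine match is present and found.
    ppv = share_of_genuine_hits(1, fp)
    print(f"N = {n:,}: ~{fp:,.0f} false matches, P(hit is genuine) = {ppv:.6f}")

# FPIR of 1 % in a hypothetical database of 50 million entries:
print(f"required FMR: {required_false_match_rate(0.01, 50_000_000):.2e}")
```

Even with a single genuine match guaranteed to be found, a reported hit is then almost certainly wrong; and keeping the chance of any false hit per search at 1% would require a per-comparison error rate in the order of one in five billion.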
When comparing very similar fingerprints (which is more likely the bigger the dataset), the standard of accuracy of the matching algorithm consequently also has to be far higher than in smaller sets (Dror and Mnookin, 2010, p. 56). Moreover, there have to be better mechanisms and provisions to contest a purported fingerprint match (see section 3.3 below) to alleviate cases like these, not least because false positives can have a significant impact on people's lives (they can be the deciding factor in whether someone is allowed to enter a country) and can lead to lengthy and costly administrative procedures to raise complaints and effect corrections.

Organisational Responsibility and Workers' Rights
Finally, as we transition from sociotechnical and societal issues to legal ramifications, we want to shift the focus to the question of responsibility for automated or automation-assisted decisions and the consequences for the employees, in this case police officers, involved in them. The use of such technology is prone to cause conflicts of interest between different levels of organisational hierarchies and to affect the work lives of employees who lack the agency to have a say in decisions surrounding the use of automation. (We will confine ourselves to commenting on some specific aspects regarding automation in policing and justice here; for a more in-depth investigation of the issue of organisational responsibility in the face of automation, see Adensamer et al., 2021. For a discussion of questions surrounding algorithmic control and its contestation between employers and workers, see Kellogg et al., 2020, and for an analysis of different ways in which employers are using algorithms to shift risks from themselves to workers, see Moradi and Levy, 2020.) The central issue is that the decision to use or eschew automation is not made by the people who actually have to employ such technology in their everyday work (cf. Faraj et al., 2018, p. 366 f.). Nonetheless, the introduction of algorithmic decision support (ADS) changes the expectations placed on their output. Typically, employees can still be held responsible for the decisions they make, but are expected to make them with higher efficiency (Vieth and Wagner, 2017, p. 20) and more quickly (Zweig et al., 2018, p. 15). They often have to endorse or decline automated "suggestions" given by an algorithmic system, irrespective of whether they have the necessary documentation and training to understand it sufficiently.
Moreover, as explained above in section 2.2, the use of algorithms and automation risks severely aggravating the potential of discriminatory decisions (through what we have named "wholesale bias"), which is of particular salience in the context of policing.
We do not wish to be doomsaying without exception: there are positive examples, such as the child welfare hotline workers assessed in the case study of De-Arteaga et al. (2020), which showed that in the right circumstances, trained professionals can detect and react appropriately to errors and bias in algorithms. Nonetheless, the comprehensive account by Kolkman (2020) makes a thorough argument (based on several case studies from the Netherlands and the UK) that transparency and in-depth understanding of algorithmic models may be impossible to achieve even for people working with such models professionally.
In general, employees can be held accountable by their employer when they (illegally) discriminate in their decisions. This situation changes when it is an algorithmic system which discriminates, without the employee (directed to do their work using ADS) sufficiently understanding the model or its effects. In that case, we argue that employees cannot be held accountable, because they neither have power over the use of the algorithm nor the means to check its decisions for discrimination in a meaningful way. Responsibility can never exceed the scope for decision-making.
When ADS is introduced, the power (and with it the responsibility) shifts away from the employee (who previously made decisions without the use of ADS and had more time for each individual decision). Responsibility is now split between the management or government deciding to introduce ADS, the programmers developing the system and the people tasked with quality control of the model. If, between all these parties, it remains unclear who is practically responsible for discrimination in an individual case, the result is what we call a "responsibility vacuum". Hence, when introducing ADS in an organisation (particularly in policing and justice), a lot of attention has to be paid to the decision-makers who are newly "supported" by algorithmic systems. Their job description might change implicitly, but their qualifications and knowledge do not automatically change at the same time or pace, which puts them in a particularly untenable situation if their power over decisions diminishes while their degree of responsibility remains.
Regarding risk-shifting, many of the common observations and critiques do not apply in the specific context of policing. For instance, Moradi and Levy identify four main ways in which automated systems reallocate risk: highly flexible staffing and scheduling; a redefinition of what counts as compensable work; the detection and prevention of loss and fraud; and the incentivisation and exhaustion of productivity. What all of these share is that existing inefficiencies within an organisation are not eliminated; instead, ADS "redistribute[s] the risks and costs of these inefficiencies to workers" (Moradi and Levy, 2020, p. 278). The sort of systems Moradi and Levy describe are much less likely to be introduced in public bodies such as the police. Instead, in the police and similar (quasi-)public bodies, we identify discrimination and the responsibility therefor as the central issue when ADS is introduced. Particularly in the case of the police, shifting personal responsibility even further away from the individual is especially worrying in a system in which it is already very difficult to successfully challenge discriminatory treatment or to bring disciplinary measures against individual police officers who have been shown to act in a strongly discriminatory manner. In both cases (the scenarios investigated by Moradi and Levy as well as our analysis of the police), the introduction of algorithmic systems shifts burdens: in the former, the burden of risk; in the latter, the burden of responsibility.

LEGAL CONSIDERATIONS
We now turn to legal aspects of automation in policing. Many of the tools serving as examples of automation and digitisation in policing in this article are used for surveillance and for decision-making based on personal data, which raises important legal questions on privacy and the protection of personal data. In the following, we discuss the impact of such tools on fundamental rights, then turn to questions of data protection, and finally discuss the right to an effective remedy in the context of automation in policing and justice.

Fundamental Rights
Many aspects of automation in policing, and particularly measures of mass surveillance, threaten fundamental rights such as the right to respect for private life [Art. 7 Charter of Fundamental Rights (CFR) and Art. 8 European Convention on Human Rights (ECHR)] and the right to protection of personal data (Art. 8 CFR). Whenever authorities process data about persons, these rights are infringed, and unless the measures are proportionate, they violate fundamental rights. Such measures must be tested against the criteria of necessity, foreseeability, safeguards, oversight, and proportionality (in the narrower sense of the word).
Surveillance measures have to be "in accordance with the law" (Klass and Others v. Germany, ECtHR, 6. 9. 1978, 5029/71, para. 58). This can be a problem when law enforcement agencies introduce new technologies without an explicit legal basis. In Austria, for example, the police have started using facial recognition technology on the basis of laws allowing for general video surveillance measures (Bundesministerium für Inneres, 2019); this could be a violation of the principle of legality.
The European Court of Human Rights (ECtHR) has developed a list of minimum safeguards that have to be specified in any law on secret measures of surveillance: (1) the nature of the offences that may give rise to an interception order; (2) a definition of the categories of people liable to be surveilled; (3) a limit on the duration of surveillance; (4) the procedure to be followed for examining, using and storing the data obtained; (5) the precautions to be taken when communicating the data to other parties; and (6) the circumstances in which data may or must be erased (Weber and Saravia v. Germany, ECtHR, 29. 6. 2006, 54934/00, para. 95).
In the case of bulk communications surveillance in the UK, brought before the court in the wake of the Snowden revelations, the ECtHR found that oversight over the measures has to include "the entire selection process, including the selection of bearers for interception and the selection of material for examination by an analyst" (Big Brother Watch and Others v. the United Kingdom, ECtHR, 13. 9. 2018, 58170/13, 62322/14 and 24960/15, para. 387). The Court held that the UK's bulk interception regime violated the right to privacy under Art. 8 ECHR.
In the light of these judgements, it is clear that all systems of mass processing of personal data must be guided by some degree of intentionality. An entirely open-ended data mining and machine learning approach to wholesale "big data" surveillance cannot satisfy the criteria the ECtHR put forward in Big Brother Watch v. the UK regarding oversight over the selectors and search criteria used for filtering.
In the case law of the European Court of Justice (ECJ) on surveillance, the judgements on data retention stand out. In Digital Rights Ireland/Seitlinger and Others (ECJ, 8. 4. 2014, C-293/12 and C-594/12, ECLI:EU:C:2014:238), the ECJ found that the retention of data without a concrete case or investigation violates Art. 7 and Art. 8 CFR. This applies even before the data are accessed and analysed: the retention of mass data itself is an infringement of fundamental rights (Digital Rights Ireland, para. 34). Such retention is only justified under a number of criteria, which the ECJ further developed in Schrems (ECJ, 6. 10. 2015, C-362/14, ECLI:EU:C:2015) and Tele2 Sverige/Watson and Others (ECJ, 21. 12. 2016, C-203/15 and C-698/15, ECLI:EU:C:2016:970). The legal measures have to be clear and precise, and the ECJ requires minimum safeguards protecting against abuse and unlawful access to personal data (Schrems, para. 91). The ECJ also specifically notes that "the need for such safeguards is all the greater where personal data is subjected to automatic processing" (Schrems, para. 91). Furthermore, surveillance measures have to be "strictly necessary" (Schrems, para. 91; Digital Rights Ireland, para. 52) and must not compromise the "essence" of the fundamental right to respect for private life (Schrems, para. 94).
The ECJ also declared that surveillance of electronic communications is only permissible for persons with a link (although "even an indirect or remote one" suffices) to serious criminal offences (Tele2 Sverige, para. 105; Digital Rights Ireland, para. 57). There have to be at least some objective criteria linking the purpose of data processing to the persons whose data are processed. Similar to the opinion of the ECtHR above, according to the ECJ, open-ended data mining of personal data is a violation of fundamental rights.

Data Protection
The processing of personal data by law enforcement agencies in the EU is regulated by the Data Protection Directive for Police and Criminal Justice Authorities [often shortened to "Police Directive" (PD), Directive (EU) 2016/680], whereas the better-known General Data Protection Regulation (GDPR) is largely not applicable to the police. As a directive, the Police Directive had to be transposed into national law by each member state [in Austria, for example, through the Datenschutzgesetz (DSG) in 2018]. According to Art. 11 of the directive, member states have to prohibit decisions based solely on automated processing of personal data that produce adverse legal effects on the person or otherwise (i.e., non-legally) significantly affect them, unless such decisions are authorised by law that provides appropriate safeguards for the rights and freedoms of the affected person. The minimum such safeguard is the right to obtain human intervention.
Furthermore, automated decisions and profiling cannot be based on special categories of personal data, such as data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, biometric data, sexual orientation, health status, etc., unless suitable safeguards are in place (Art. 11 para. 2 PD). Profiling (i.e., automated processing of personal data to evaluate certain personal aspects relating to a natural person, as defined in Art. 3 para. 4 PD) that results in discrimination on the basis of such special categories is absolutely prohibited (Art. 11 para. 3 PD).
For automation in policing, this means that decisions with a significant personal impact (e.g., the identity check or search of a person) cannot be made by software alone but always require a "human in the middle" (also known as a "human in the loop"). This person has to be capable of understanding the automated decision in sufficient detail to exert control in a meaningful way (also see section 2.5), which places strong legal limits on the use of broad, purely correlative, black-box algorithms.

Right to Effective Remedy
The right to an effective remedy (Art. 13 ECHR and Art. 47 CFR) functions as a "right to have rights" of sorts. It is an important safeguard for persons affected by automated decisions; at the same time, effective remedies are scarce in the face of opaque algorithms and diffusion of responsibility (cf. the problem of the "responsibility vacuum" we identified in section 2.5). In the case of biometric matching based on EU regulations on biometric data usage at the borders (see above), for example, effective remedies are lacking because no such complaint (in particular, no effective way to dispute the accuracy of a biometric match) is explicitly regulated; moreover, deriving such a complaint from data protection law alone poses some challenges.
In data protection law, anyone whose data are stored has a right to rectification of their personal data [Art. 16 GDPR; Art. 16 PD; Art. 52 Regulation (EU) 2017/2226 establishing an Entry/Exit System (EES) ("EES Regulation")]. In the context of biometrics, this right must include the correction of a false biometric match. A fingerprint match clearly qualifies as personal data, as it necessarily relates closely to an individual. However, if fingerprint verification is performed in such a way that the result of the match itself is not stored, a legal claim to rectification becomes impossible. In the EES Regulation, for example, the verification process for fingerprints is described in Art. 23, but documenting its results is not explicitly included. There is no provision to store the result of the verification process in any of the databases described in Art. 14 to Art. 20, and therefore no legal basis for storing such data exists. Considering the principle of data minimisation [Art. 5 (1) (c) GDPR, Art. 4 (1) (c) PD], i.e., the principle of limiting stored data to what is necessary, this is the correct approach. As a result, however, new legal instruments have to be found to comply with the right to an effective remedy regarding biometric matching and similar tools, as such a right cannot be derived from data protection law alone.

CONCLUSION
The increase in automation in policing is a trend as widespread as it is concerning. Policing and justice in the 21st century have been shaped by discourses on the use of force, discriminatory behaviour, and accelerating digitisation; we have given an extensive review of the sociotechnical questions raised by automation in policing and justice and discussed commonalities across a variety of examples and contexts. These interrelated debates intersect to raise new issues; in particular, we introduced the distinction between human "retail bias" and algorithmic "wholesale bias" and argued that trading the former for the latter constitutes a paradigmatic shift in the kind of discrimination that is possible, especially in light of the human propensity to ascribe more objectivity to technical solutions than is warranted.
We have found that further research and appropriate regulation are needed to address, in particular, the question of organisational responsibility for the use of ADS (and automated decision-making more generally) and related issues of workers' rights. Whenever ADS is introduced, it has to be ensured that employees' responsibility for decisions does not exceed their knowledge and scope for decision-making. At the same time, organisational responsibility must not have any gaps, i.e., there must not be any kind of "responsibility vacuum".
Finally, we have analysed the legal situation with a particular focus on fundamental rights and data protection law. Automated policing measures are subject to several legal restrictions. The case law of the ECtHR as well as the ECJ shows that large-scale, open-ended data mining of personal data violates fundamental rights. Under EU data protection law, automated decisions require a "human in the middle" who can exert control in a meaningful way. In some cases of automation, particularly biometric identification, there is currently no effective remedy against false positives; we have argued that this legislative deficit needs to be resolved urgently.
As we have expounded, the use of this kind of automation is fraught with pitfalls and areas of concern, only some of which can be effectively mitigated. Some tools are better avoided altogether.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.