Evidence-based sentencing and scientific evidence

Evidence-based sentencing (EBS) is a new name for an aspiration that has deep roots in criminal law: to apply the sentence most appropriate to each offender's risk of reoffending, in order to reduce that risk as far as possible. This modern version of the traditional sentencing goals of rehabilitation and incapacitation fits into the broader approach of so-called “evidence-based public policy.” It takes the view that the best existing evidence for reducing reoffending is provided by modern structured risk assessment tools, and it claims to be able to achieve several goals at once: reducing reoffending, maintaining high levels of public safety, making more efficient use of public resources, and moving criminal policy away from ideological battles by basing it on the objective knowledge provided by the best available scientific evidence. However, despite the success of this approach in recent years, it is not clear to what extent it succeeds in correctly assessing the risk of individual offenders, or whether it achieves its intended effect of reducing recidivism. This paper critically examines these two issues: the quality of the scientific evidence on which EBS is based, and the available data on the extent to which it achieves (or does not achieve) its intended goals.


Introduction
The evidence provided by social science on the effects of punishment has been systematically ignored by US criminal justice policy for at least 40 years. The second half of the 1980s and the whole of the 1990s were marked by "tough on crime" and the "war on drugs": the number of crimes punishable by life without parole increased, parole was abolished in many states and at the federal level, harsh mandatory minimum sentencing laws were enacted, three-strikes laws were introduced, harsher sentences or civil commitment were introduced for sex offenders, and punishments were generally increased, including for juveniles. The results of these criminal policies are well known: incarceration soared to a peak of 2,310,300 people in prison in 2008, a rate of 760 per 100,000 residents, with huge racial disparities and at a cost of billions to the federal and state governments, among other problems.
None of these initiatives was supported by evidence that it was effective in reducing crime (Haggerty, 2004, p. 220; Tonry, 2013b). These policies were apparently intended to reduce crime through the deterrent and incapacitating effects of harsher sanctions (Tonry, 2013b, p. 159), but the studies available at the time, as well as those published subsequently, showed that such effects are minimal, non-existent, or even counterproductive (cf. Tonry, 2013b, p. 175 ff.; National Research Council, 2014, p. 337, 339).
Since the late 1990s, however, and in a context in which US crime policy has moved away from the more extreme forms of "tough on crime," the popularity of what has come to be known as "evidence-based practices" has been growing. This term encompasses a range of techniques and strategies used by courts, correctional institutions and community supervision agencies that aim to tailor criminal justice interventions to the characteristics of different individuals and groups in order to make them more effective (Klingele, 2015, p. 539). More specifically, these are "policies, procedures and programs that scientific research has demonstrated to reduce recidivism for specific offender populations such as probationers, parolees, and drug-addicted offenders" (National Conference of State Legislatures, 2011). The scientific evidence referred to in these approaches is the accumulated knowledge from criminology and psychology about the risk factors associated with an increased likelihood of violent or criminal behavior, and the existing knowledge about the most effective intervention programmes and techniques to reduce this risk in specific groups of offenders.
Against the background summarized above, the name given to these practices, "evidence-based," is immediately striking: could it be that after more than 40 years of disregard for scientific evidence we are witnessing a change in trend, and that American criminal policy on criminalization and on the imposition and execution of sentences is really being designed according to the scientific knowledge that the social sciences have accumulated on the effects of sentences? I fear that, unfortunately, this may not be the case, and that the appeal to "scientific evidence," although justified in some cases, risks becoming a means of legitimizing practices that have not been shown to be effective in reducing crime, and of presenting political choices as if they were mere technical issues.

National Research Council,
A summary of the main indicators up to can be found in The Sentencing Project, Criminal Justice Facts, "Fact sheet: Trends in U.S. Corrections," at https://www.sentencingproject.
It is common to explain the abandonment of indeterminate sentences from the late s onwards as a transition toward a model inspired by retributive ideals. Although this description does fit the reforms that took place until the mid-s, for those produced since then it is difficult to identify the fundamental logic of any retributive paradigm: proportionality with the seriousness of the offense committed (Tonry, b, p. , ).

What is evidence-based sentencing
The field of crime control covers many areas in which evidence-based practices can be applied: police investigations and arrests (evidence-based policing), the pretrial phase and the adoption of precautionary measures, the sentencing phase, and the enforcement of the penalties (evidence-based corrections).
In the evidence-based sentencing approach (hereinafter EBS) there is a very close link between the moment when the sentence is imposed and its subsequent enforcement, since the aim is to impose sentences that, due to their characteristics, allow the implementation of the programmes that are considered most appropriate to reduce the risk of reoffending (Casey et al., 2011, p. 7). EBS includes decisions related to the granting of probation, its possible revocation and the supervision/treatment regime that has to accompany it. In the words of one author who has strongly promoted the adoption of this perspective, EBS consists of "sentencing and corrections policies and programs based on the best research evidence of practices shown to be effective in reducing recidivism" (Warren, 2010, p. 153).
Although I will use the term evidence-based sentencing, the terminology is quite variable. It is sometimes referred to as "risk-based sentencing" (Slobogin, 2019), "effective sentencing" (National Conference of State Legislatures, 2011), "actuarial sentencing" (Hannah-Moffat, 2013), "smart sentencing" (Marcus, 2006) or "predictive sentencing" (De Keijser et al., 2019). All these denominations refer to essentially the same thing, although each one emphasizes a different characteristic of the phenomenon. Indeed, the scientific evidence on which EBS is based consists of the actuarial or structured tools for assessing the risk of recidivism (or of specific risks, such as violent or sexual recidivism) that have been developed in criminology in recent decades (actuarial sentencing); these tools make it possible to estimate the level of risk of each offender, i.e., to predict the likelihood of future criminal behavior (predictive sentencing); what EBS proposes is to use this level of risk as the central criterion for deciding the most appropriate sentence for the subject (risk-based sentencing), claiming that this will allow for a more efficient use of public resources (effective sentencing) by concentrating them on the subjects who really need them, and avoiding long sentences and intensive interventions for those at low risk. All of this means, it is said, a much more intelligent management of the criminal problem (smart sentencing), in so far as with the same or fewer public resources it will be possible to reduce recidivism and better protect the community.
Cf. also the definition provided by Burke in the Science Bench Book for Judges: "The goal of evidence-based sentencing is […] to more effectively: identify who may be safely and effectively supervised in the community; and, order appropriate conditions of community supervision given the defendant's recidivism risk, criminogenic needs, and responsivity factors" (Burke, , p. -).
"The principal use of EBS practices at the state trial court level is the use of actuarial risk- and needs-assessment information in state sentencing proceedings" (Warren, , p. ).
The success of this approach in the US is remarkable: it is being implemented, to a greater or lesser extent, in many jurisdictions. However, it is also generating intense debate, as the use of risk assessment tools in criminal law (and not only in sentencing) raises important problems and challenges. In this paper I will address only one of them: the quality of the scientific evidence on which EBS is based, and the empirical evidence that it achieves its intended objectives. Before undertaking this analysis, however, it is worth offering a brief overview of the context (historical, political and epistemological) in which evidence-based sentencing has emerged.
The principles of effective correctional intervention and the "Risk-Need-Responsivity" model
In the field of criminal justice, the first so-called "evidence-based" programmes and procedures emerged in corrections, following the discrediting of the rehabilitative ideal since the 1980s. Faced with the hostile environment created by the then dominant "tough-on-crime" policies, professionals who still believed in rehabilitation, in the criminogenic effects of prison and in the possibility of improvement for many offenders sought to gather evidence that some rehabilitation programmes and interventions did work. A debate thus ensued about which treatment programmes were effective, or more effective than others, and, based on the work of Andrews and Bonta among others, a theoretical model was built to systematize and explain the "principles of effective correctional intervention," which crystallized in the model known as "Risk, Need and Responsivity," or RNR (Velásquez Valenzuela, 2014, p. 70 ff.; Klingele, 2015, p. 552 f.). This model is […] (medical, economic, cognitive-behavioral, educational, etc.) that favor the commission of new crimes, and that therefore acting on these needs can reduce the level of risk; and (4) that interventions must be adjusted to the receptiveness of each subject or group of subjects, as not everyone responds in the same way to the same programmes or techniques (Andrews and Bonta, 2010, p. 44 et seq.). These studies managed to build a "theory on effective rehabilitation" (Velásquez Valenzuela, 2014, p. 78), which also carved out an important niche in the criminal policy debate from the 1990s onwards. However, the part of this approach that is probably proving most decisive in current criminal policy is not that of rehabilitation but that of effectiveness, as we will see below.
As Marcus states: "The resulting critical issue in sentencing policy thus runs along a very different axis than the traditional divide between punitivists and advocates of reformation. Evidence-based 'smart sentencing' posits that by rigorously scrutinizing data on what works or not on which offenders, we can allocate our correctional resources far more efficiently, measured by public safety, than if we continue to settle for 'just deserts' with no accountability for outcomes" (Marcus, , p. ). In a very similar vein, cf.

Managerial-actuarial justice and effective risk management
The development of the principles of effective intervention described in the previous section ran parallel to another line of evolution in American criminal policy: the progressive incorporation of criminal risk management techniques based on the managerial-actuarial model. In recent decades, a new rationality has emerged in crime control, characterized by an approach based on the efficient management of the system's resources. From this perspective, crime is seen not so much as a serious problem to be eliminated or fought, but as a phenomenon inherent to any society, which can be managed with greater or lesser efficiency, taking into account the resources available. For this perspective, risk is also a central concept, although not exactly in the same sense as in the RNR model referred to in the previous section. The managerial-actuarial model is little concerned with the causes of crime and does not see rehabilitation as a main objective; rehabilitative programmes may be used if they are cost-effective, but the central aim is to keep the level of deviance under control within tolerable margins. Risk level indicators measured with actuarial instruments are used for this purpose because they are useful for managing large populations objectively and at minimum cost. The central concern of this approach is resource efficiency and cost optimization (Velásquez Valenzuela, 2014; Klingele, 2015, p. 545 ff., 572 ff.; Brandariz García, 2016, p. 93 ff., 111 ff.; Castro Liñares, 2019, p. 79 ff.).
It is in this context that the relative success of evidence-based rehabilitation programmes must be placed. As Klingele (2015, p. 552) highlights, practitioners and criminologists seeking to persuade managers and politicians who were reluctant to allocate financial resources to rehabilitation and treatment programmes set out to demonstrate that such programmes were worth investing in because they produced measurable benefits in terms of crime reduction, documenting outcomes accurately and gathering objective data on their effectiveness in reducing reoffending. The emphasis on cost-effectiveness was intended to secure funding and opportunities for programmes that could not be obtained by appealing only to the intrinsic value of the rehabilitative ideal as a legitimate and just objective of criminal sanctions.
The goal has been achieved, at least in part. Today, the old aphorism that "nothing works" is considered to have been overcome, and it is accepted that certain programmes and treatments, especially those based on the RNR model, have been shown to be effective in reducing reoffending, and that this knowledge should be used by the administration of justice; this is a fundamental premise of all the documents promoting EBS produced by official bodies, professional associations, think tanks and other interest groups related to the administration of justice. However, what has been incorporated into the official discourse of EBS is the possibility of effectively reducing recidivism, much more than the idea that this reduction must be achieved through the rehabilitation of convicted offenders. This is easy to see if one examines the objectives that, according to these documents, EBS should pursue: the main objective proclaimed in all of them is the reduction of recidivism, always accompanied by a reference to the improvement of public safety and the more effective use of public resources; references to the rehabilitation of convicted offenders, if present at all, occupy a very vague second place. Nor should the objective of reducing the prison population, which is also found in almost all initiatives supporting EBS, be confused with rehabilitation. It is true that EBS seeks to avoid the criminogenic effects of imprisonment, but the main reason for reducing the prison population is that it represents a very high cost for the administration, without mass imprisonment having proved to be an effective means of reducing crime; the aim is to save costs by means that have been shown to be as effective as imprisonment in protecting public safety (and therefore more effective in cost-benefit terms). "Effective rehabilitation," as conceived by Andrews and Bonta and others, has undoubtedly been a fundamental pillar for the maintenance and expansion of programmes aimed at the rehabilitation of convicts, both inside and outside prison, but in the EBS approach the dimension of effectiveness prevails over that of rehabilitation, and violence risk assessments primarily serve the cost-effective control of the convicted population. If rehabilitation programmes are the most effective option for achieving this goal, resources will be allocated to them, but when this is not the case, control will be exercised through other mechanisms.

Evidence-based policies
Before critically examining the quality of the scientific evidence on which evidence-based sentencing is based, it is worth commenting, albeit as briefly as space permits, on the choice of the term by which this trend is known: by incorporating the term "evidence-based," EBS does not refer to just any way in which the design, imposition and enforcement of sentences could be supported by the knowledge that the social sciences provide about the effects of criminal sanctions, but deliberately inserts itself into a much more general movement known as evidence-based policy or evidence-based policymaking. This approach builds on the success and prestige of evidence-based medicine in the early 1990s and holds, very succinctly, that: (1) there is a hierarchy of quality of scientific evidence, with randomized controlled trials and meta-analyses at the top and the practitioner's clinical experience at the bottom; (2) not only in medicine, but also in the design of all kinds of public policies, decisions must be based on the best available scientific evidence, understood according to the hierarchy just mentioned; and (3) decisions based on this kind of scientific evidence are better because, on the one hand, they have been proven to be effective in achieving the proposed goal and, on the other hand, they are no longer (or not only) motivated by political objectives (in the sense of ideological or partisan interests) but are based on objective data on what works and what does not in solving problems. In this approach, being "evidence-based" is seen as a necessary condition for greater transparency, accountability and better governance (Strassheim and Kettunen, 2014, p. 259), which ultimately adds legitimacy.
However, a more recent document from the same organization acknowledges that risk assessments can contribute to avoiding prison sentences for low-risk offenders (Casey et al., , p. ).
Monahan and Skeem put it clearly when they ask why, after years of "just deserts," we are witnessing a remarkable resurgence of risk as an essential component in sentencing and enforcement: "Money appears to be the principal answer" (Monahan and Skeem, , p. ).
There is no doubt that the idea of developing evidence-based policy is immediately attractive, including for criminal law, and the evidence-based policymaking movement has considerable support, with numerous governmental and non-governmental agencies now seeking to increase the use of scientific evidence in the design of public policy (Parkhurst, 2016, p. 16). However, the actual implementation of evidence-based policies is much more complex than it appears, and the evidence-based policy approach has been heavily criticized. It has been argued that it is based on a linear understanding of the relationship between scientific knowledge and policy practice, according to which science would be able to identify "the best" solution to any social problem, which, once identified, would simply have to be implemented (Greenhalgh and Russell, 2009, p. 305). However, the scientific evidence on a given issue is often ambiguous; moreover, the timescales of politics and science are very different, and in political decisions there are other factors besides scientific knowledge that legitimately condition decisions (Klein, 2000; Parkhurst, 2016). Furthermore, evidence-based policy would reflect a positivist understanding of scientific knowledge according to which social problems are "out there," waiting for someone to identify them and provide a solution, whereas in fact the identification of something as a problem, its definition, and the priority given to it on the political agenda depend on a social construction in which the competing values and interests of different groups and ideologies are juxtaposed (Greenhalgh and Russell, 2009, p. 315). The emphasis by proponents of evidence-based policymaking on the need to adopt "what works" policies, which tends to present problems as purely technical issues, is misleading, because it obscures the political nature of many of the problems to be solved: the fact that we have scientific evidence that a solution works does not mean that the problem it solves is an important one; scientific research can provide evidence about the consequences of particular policies, but it cannot alone determine which of those policies should be considered preferable.
"Just as no-one would argue that clinicians should practice medicine without regard to evidence, so it would seem an incontestable, self-evident proposition that policymakers should base their decisions on evidence" (Klein, , p. ).
In relation to criminal law it has been argued that "the evidence-based revolution in medicine provides a model for how a tradition-bound profession can change its approach to supplement and inform traditional discretion with sound scientific research" (Garrett, , p. ).
Among the most noteworthy was that provided by Tony Blair's British Labour government, which in the s explicitly incorporated the evidence-based policy approach into its programme with the declared aim of "removing ideological and partisan conceptions from the political decisionmaking process, grounding these decisions in evidence based on scientific research [...] about what worked and how better results could be produced" (Benito Sánchez, , p. ). This same example, however, also serves to illustrate some of the problems I will tackle in the text: in , the British Labour government launched the Crime Reduction Programme, which aimed to obtain solid scientific evidence on which strategies "work" in crime control and which do not, and then transfer this knowledge to the design of crime policies. This programme, however, failed shortly afterwards and was canceled without having achieved almost any of its objectives. On the characteristics of this programme and the difficulties it faced, see
Evidence-based policymaking also privileges a certain type of scientific evidence (randomized trials, quantitative measurements, statistical methods), whereas in the social sciences there are many other types of analysis that can provide relevant knowledge about the complexity of the relationships and tensions inherent in social reality. In fact, most social phenomena cannot be measured with the precision with which, for example, the effect of a certain antibiotic on the number of bacteria present in an organism is measured; if, in this situation, the "evidence" to be taken into account in the design of a policy excludes anything that does not come from the sources previously established as preferential, the result is a monopolization of the knowledge considered relevant and an oversimplification of reality (Strassheim and Kettunen, 2014, p. 263). As Saltelli and Giampietro state: "Once the analysis has removed all sources of uncomfortable knowledge the problem reduces to one which can be treated by the usual combination of cost benefit analysis and risk analyses methodologies, and the solution optimized to the desired precision, be it that the solution may have lost at this stage all its relevance to the original problem" (Saltelli and Giampietro, 2017, p. 66). In addition, in the not uncommon situation where scientific studies on a given issue yield conflicting results, the decision-maker or policymaker may deliberately select only the scientific evidence that supports a decision already taken on the basis of other criteria and interests (cherry-picking).
Even in the very field from which evidence-based practices emerged: medicine. For an assessment of the current state of evidence-based medicine and the problems it faces, see Ioannidis
As a result of all these problems, other approaches have emerged that broaden the type of scientific evidence considered relevant for evaluating public policies on complex social problems, moving away from an exclusive focus on randomized controlled trials and meta-analyses and combining qualitative and quantitative methods flexibly. One example is realist evaluation, developed by Pawson and Tilley (1997), which recognizes that programmes may work for some people in some circumstances but not for others in different contexts. Programmes should therefore not be judged from an all-or-nothing (it works/it does not) perspective focused only on the level of programme effectiveness; instead, evaluations should look at the underlying causal mechanisms that explain different outcomes for different groups, and also at unintended consequences (Pawson et al., 2005; Croci et al., 2023).
In short, the (laudable) goal of designing and implementing policies that are more transparent and effective because they are based on solid knowledge of their effects and consequences is by no means easy to achieve, as it is threatened by at least two risks: on the one hand, the risk of passing off as "evidence-based policy" what is in reality nothing more than "policy-based evidence," i.e., the selective use of data to legitimize pre-established policy objectives with this supposed scientific evidence (Strassheim and Kettunen, 2014, p. 262; Parkhurst, 2016, p. 48 f.); and, on the other hand, the danger of crowding out open and legitimate debate on competing ideological and moral options by hiding the political dimension of the arguments at stake behind the apparent neutrality of better or more consistent scientific support for one of the options in dispute.
Against this background, it is interesting to analyse the extent to which evidence-based sentencing really incorporates "the best available evidence" on crime control, and whether it is scientifically proven that designing criminal sanctions on the basis of recidivism risk estimates is a policy "that works."
Evidence-based sentencing and scientific evidence
The quality of evidence on which evidence-based sentencing is based
The scientific evidence on which EBS is based consists mainly of structured assessments of the risk of reoffending. The majority view is probably that such estimates provide robust and verified information about the likelihood of recidivism or violent reoffending. It is repeatedly claimed that modern risk estimates, made using structured methods (whether purely actuarial or structured clinical judgement), are far more reliable than those made using purely clinical judgement, and that accuracy rates are higher today than they were 30 or 40 years ago. However, while there have certainly been important improvements in the understanding of violence risk assessment since the 1980s, simply saying that the predictive accuracy of modern structured instruments is better than the clinical judgement of practitioners oversimplifies the issue, and thus paints an overly optimistic and potentially misleading picture.
To put the problem in perspective, it is worth recalling that what contributed decisively to the discrediting of the old clinical assessments of dangerousness in the 1970s and 1980s was the realization that they produced a certain type of error, false positives, in proportions that were considered excessive at the time.Studies showed that, out of every three people considered dangerous by a psychiatrist or psychologist, only one actually went on to commit violent acts (Monahan, 1981, p. 77).The fact that two out of three predictions of violent behavior were disproved by the subsequent behavior of the subject was considered to be evidence of insufficient quality on which to base major restrictions of rights, such as the ordering or prolongation of psychiatric or penal detention.
Today, however, this type of error has not been significantly reduced. The statistical indicator that measures the percentage of subjects who actually reoffend, out of all those assessed as high risk, is called the positive predictive value. In studies on violent reoffending conducted internationally, this value is usually below 50%, and often significantly lower (Douglas et al., 2017). In a relevant and well-known meta-analysis that examined the use of nine risk assessment tools in 73 studies involving more than 24,000 subjects, the positive predictive value for assessing the risk of violent recidivism was on average 41%, i.e., for every 10 subjects who were considered to be at high risk, only 4 committed further violent acts. In the same study, the positive predictive value of tools used to estimate the risk of sexual recidivism was on average 23% (Fazel et al., 2012, p. 10). More recently published studies continue to show positive predictive values for violent reoffending below 40%.
And, of course, in the accumulated knowledge of rehabilitation programmes "that work." However, as we have already noted above, the central premise of EBS is that sentencing must be adapted to the level of risk of each individual, not only in order to apply rehabilitation programmes, but also purely incapacitative programmes in cases where the latter are more effective. In both cases, the necessary condition is that it is possible to estimate the risk with a sufficient degree of accuracy: if this were not possible, there would be no logic in designing sentencing and corrections around the risk and needs of each individual.
On the different risk assessment techniques and their characteristics, see Loinaz, .
Frontiers in Psychology frontiersin.org
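The arithmetic behind these percentages is simple, but making it explicit shows why a positive predictive value below 50% matters: it means that, among those classified as high risk, false positives outnumber true positives. The following Python sketch uses a hypothetical helper (it is not part of any risk assessment tool) to convert a PPV into the expected number of reoffenders and non-reoffenders in a group of 10 subjects classified as high risk, using the average values cited above and, for comparison, the one-in-three figure reported by Monahan (1981) for clinical predictions.

```python
def flagged_breakdown(ppv: float, n_flagged: int) -> tuple[float, float]:
    """Hypothetical helper: split n_flagged subjects classified as high risk
    into the expected number who reoffend (true positives) and who do not
    (false positives), given a positive predictive value (PPV)."""
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must be a proportion between 0 and 1")
    true_positives = ppv * n_flagged
    false_positives = n_flagged - true_positives
    return true_positives, false_positives


# Average PPVs cited in the text; 1/3 corresponds to Monahan's "one in three".
REPORTED_PPVS = {
    "clinical dangerousness (Monahan, 1981)": 1 / 3,
    "violent recidivism (Fazel et al., 2012)": 0.41,
    "sexual recidivism (Fazel et al., 2012)": 0.23,
}

for label, ppv in REPORTED_PPVS.items():
    tp, fp = flagged_breakdown(ppv, 10)
    print(f"{label}: of 10 flagged, ~{tp:.1f} reoffend, ~{fp:.1f} do not")
```

In all three cases the expected number of false positives exceeds the number of true positives, which is the sense in which the type of error that discredited clinical predictions of dangerousness persists.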
In the light of these data, it does not appear that risk assessments have improved much in terms of predictive accuracy over the clinical judgements of 40 years ago. So where does the widespread belief that modern structured risk assessments are much better than clinical judgements come from? In my view, there are two main reasons: the parameters by which a risk assessment is judged to be good have changed, and so has the way in which information about the quality of the estimates is communicated.
Regarding the former, decades ago predictions of dangerousness were considered binary, in the sense that both the future event and the prediction had only two possible outcomes: the new offense either occurred or it did not; and the practitioner could either have considered the subject to be dangerous or not.In this type of prediction, there are only four possible outcomes: true negative (considered not dangerous and did not reoffend), false negative (considered not dangerous but did reoffend), true positive (considered dangerous and did reoffend) and false positive (considered dangerous but did not reoffend).
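The four outcomes of a binary prediction can be tallied mechanically, and the binary indicators commonly reported for such predictions (sensitivity, specificity and predictive values) follow directly from the four counts. The Python sketch below is a minimal illustration on an invented cohort of 100 subjects; the function names and the data are hypothetical, chosen only to show how the indicators are computed.

```python
import math
from collections import Counter


def classify(predicted_dangerous: bool, reoffended: bool) -> str:
    """Map one binary prediction and one binary outcome to TP, FP, FN or TN."""
    if predicted_dangerous:
        return "TP" if reoffended else "FP"
    return "FN" if reoffended else "TN"


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a category is empty.
    return numerator / denominator if denominator else math.nan


def binary_metrics(cases: list[tuple[bool, bool]]) -> dict[str, float]:
    """Compute the standard binary indicators from (prediction, outcome) pairs."""
    counts = Counter(classify(pred, outcome) for pred, outcome in cases)
    tp, fp, fn, tn = counts["TP"], counts["FP"], counts["FN"], counts["TN"]
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of reoffenders flagged as dangerous
        "specificity": _ratio(tn, tn + fp),  # share of non-reoffenders not flagged
        "ppv": _ratio(tp, tp + fp),          # share of flagged subjects who reoffend
        "npv": _ratio(tn, tn + fn),          # share of non-flagged subjects who do not reoffend
    }


# Invented cohort: 30 flagged (10 reoffend), 70 not flagged (10 reoffend).
cohort = (
    [(True, True)] * 10
    + [(True, False)] * 20
    + [(False, True)] * 10
    + [(False, False)] * 60
)
print(binary_metrics(cohort))
```

With these invented counts the prediction identifies half of the reoffenders, while two out of three of the subjects it flags do not reoffend, the same pattern described above for the clinical assessments of the 1970s and 1980s.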
Since the 1990s, however, a distinction has been made between the occurrence or non-occurrence of the event and the different degrees of confidence one can have that the event will occur (Mossman, 2006, p. 549 et seq., 555 et seq.). Although reoffending is a binary event (either the assessed person commits new offenses, or they do not), the judgement one makes about the likelihood of its occurrence is not, because the assessor may consider that likelihood to be not only high or low, but also very low, low, medium, high, very high, extreme, and so on. If, for example, a risk assessment tool classifies a group of people with a subsequent reoffending rate of 30% as high risk, a group with a reoffending rate of 20% as medium risk, and others with a reoffending rate of only 10% as low risk, it can clearly distinguish which groups of people are at higher risk than others. In this sense, it can be said that it works well, or that its predictive accuracy is "good," at least in terms of its ability to discriminate the greater or lesser relative risk of reoffending of some groups compared with others, even though only 30% of the people in the highest risk group actually reoffend. It is in this aspect of relative risk that the most significant advances in criminological research on the risk of reoffending have taken place. When there are more than two possible levels of risk, the analysis of predictive accuracy can no longer be limited to binary indicators such as sensitivity, specificity or predictive values; other indicators have emerged that relate the multiple possible levels of risk to the outcome of recidivism or non-recidivism. The most commonly used indicator to assess the performance of structured risk assessment tools is the so-called area under the ROC curve, which measures relative risk: it gives the probability that a randomly selected recidivist would have received a higher risk rating on the tool than a randomly selected non-recidivist (Singh, 2013). That is, it reports how well the tool discriminates between higher and lower risk individuals, but says nothing about the reoffending probabilities associated with each level. For example, in one tool the low-risk group may have an associated reoffending probability of 5% and the high-risk group a reoffending probability of 15%, with an area under the ROC curve of 0.75 (which is conventionally considered a high value). In another tool the recidivism probabilities may be 10% and 50% respectively, with the same area under the ROC curve of 0.75. In both cases there is a 75% chance that a randomly selected recidivist will have received a higher risk classification than a randomly selected non-recidivist. But being high risk is associated with a very different probability of reoffending in each of these instruments, and for a judge seeking to impose a sentence commensurate with the individual's level of risk, it makes a great deal of difference whether the probability of reoffending associated with being high risk is 15% or 50%.

And even this is a remarkable oversimplification of the problem, for it makes no difference whether a new offense of theft or murder is committed, whether only one new offense is committed or many, whether this recidivism is observed a few months after release from prison or many years later, and so on. If structured violence risk assessments cannot provide the judge with differentiated information on these points, it is doubtful whether they can be considered fit for purpose from the perspective of the criminal justice system (Slobogin, ).
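The gap between relative and absolute risk can be made concrete with a short calculation. The following Python sketch uses invented counts for two hypothetical two-level tools (not the figures of the example above, and not data from any study), chosen so that both have an area under the ROC curve of exactly 0.75. It computes the area directly from its definition, as the proportion of recidivist/non-recidivist pairs in which the recidivist received the higher rating, counting pairs at the same level as one half, which is the usual convention for ties.

```python
from fractions import Fraction

# Hypothetical counts (invented, not from any study): one
# (recidivists, non_recidivists) pair per risk level, lowest level first.
TOOL_A = [(10, 270), (30, 90)]  # low: 10 of 280 reoffend; high: 30 of 120
TOOL_B = [(10, 90), (30, 30)]   # low: 10 of 100 reoffend; high: 30 of 60


def auc_from_counts(levels):
    """Share of (recidivist, non-recidivist) pairs in which the recidivist
    has the higher risk level; pairs in the same level count as one half."""
    total_recid = sum(r for r, _ in levels)
    total_non = sum(n for _, n in levels)
    concordant = Fraction(0)
    for i, (recid_i, _) in enumerate(levels):
        for j, (_, non_j) in enumerate(levels):
            if i > j:  # recidivist rated higher than the non-recidivist
                concordant += recid_i * non_j
            elif i == j:  # tie: both in the same risk level
                concordant += Fraction(recid_i * non_j, 2)
    return concordant / (total_recid * total_non)


def reoffending_rate_per_level(levels):
    """Absolute risk: proportion of people in each level who reoffended."""
    return [Fraction(r, r + n) for r, n in levels]


for name, tool in [("A", TOOL_A), ("B", TOOL_B)]:
    auc = float(auc_from_counts(tool))
    rates = [round(float(p), 3) for p in reoffending_rate_per_level(tool)]
    print(f"Tool {name}: AUC = {auc}, reoffending rate per level = {rates}")
# Tool A: AUC = 0.75, reoffending rate per level = [0.036, 0.25]
# Tool B: AUC = 0.75, reoffending rate per level = [0.1, 0.5]
```

Both hypothetical tools obtain the same area under the curve, yet "high risk" corresponds to a 25% reoffending rate in one and a 50% rate in the other; only the per-level rates reveal the difference. Exact fractions are used so that the equality of the two areas is not obscured by rounding.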
On the other hand, the area under the ROC curve summarizes the discriminatory power of the risk assessment tool in a single number (e.g., 0.75), but the false positive and false negative rates can vary dramatically depending on which discrimination threshold is used to make a particular decision. For example, suppose we were to use a risk assessment tool that classifies people into five risk levels (very low, low, medium, high, very high) and has an area under the ROC curve of 0.75 to decide whether to grant parole. If we decide to parole only those who are classified as very low risk, we will have many false positives and very few false negatives; if we parole all but those in the very high risk group, we will have many more false negatives and fewer false positives. None of these error rates need coincide with the area under the curve (75%), and both false positives and false negatives can be well above or well below 75% for each of our decisions, depending on the threshold we have set in each case. For these reasons, several authors have warned that the value of the area under the ROC curve alone is very uninformative when assessing the usefulness of a risk assessment tool for legal decision-making (Szmukler et al., 2012; Shepherd and Sullivan, 2017; Fazel, 2019, p. 198), and recommend complementing the analysis of predictive accuracy with information on other statistical indicators, always reporting the limitations of all of them (Singh, 2013; Rossegger et al., 2014; Douglas et al., 2017).

Frontiers in Psychology frontiersin.org

ROC stands for Receiver Operating Characteristics.
On the meaning and information provided by the area under the ROC curve see, in more detail, Mossman, , and Martínez Garay and Montes Suay, .
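The effect of the threshold can be illustrated in the same way. The Python sketch below uses invented counts for a hypothetical five-level tool (they are not meant to reproduce the area of 0.75 in the example) and treats everyone at or above a chosen level as a "positive", i.e., predicted to reoffend and therefore denied parole. A false positive is then a non-recidivist who is denied parole, and a false negative a recidivist who is released.

```python
from fractions import Fraction

LEVELS = ["very low", "low", "medium", "high", "very high"]
# Hypothetical (recidivists, non_recidivists) per level, lowest first; invented.
COUNTS = [(5, 95), (10, 90), (20, 80), (30, 70), (40, 60)]


def error_rates(counts, cutoff):
    """Error rates when every level with index >= cutoff is denied parole.

    false_positive_rate: share of non-recidivists who are denied parole.
    false_negative_rate: share of recidivists who are paroled.
    """
    total_recid = sum(r for r, _ in counts)
    total_non = sum(n for _, n in counts)
    false_pos = sum(n for _, n in counts[cutoff:])
    false_neg = sum(r for r, _ in counts[:cutoff])
    return {
        "false_positive_rate": Fraction(false_pos, total_non),
        "false_negative_rate": Fraction(false_neg, total_recid),
    }


# Cutoff 1: parole only "very low"; cutoff 4: parole everyone except "very high".
for cutoff in (1, 4):
    rates = error_rates(COUNTS, cutoff)
    shown = {k: round(float(v), 3) for k, v in rates.items()}
    print("paroled:", LEVELS[:cutoff], shown)
```

Nothing about the tool changes between the two runs, so its area under the curve is identical in both; yet the false positive rate falls from about 76% to about 15% while the false negative rate rises from about 5% to about 62%, purely as a result of where the line is drawn.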
The predictive accuracy of risk assessment tools can indeed be expressed by many different statistical indicators, each of which measures a different dimension of this accuracy (Singh, 2013; Muñoz Vicente and López-Ossorio, 2016; Loinaz, 2017, p. 87 ff.). For example, the same tool may have a very high sensitivity but a low specificity, or an acceptable area under the ROC curve but a very low positive predictive value. This means that the predictive ability of a risk assessment tool can sometimes be described as both "good" and "bad," if some of the indicators reach very satisfactory levels while others remain at much more modest levels.
If this is the case, it is very important that the information provided on the predictive accuracy of these tools covers its various dimensions; otherwise, if it includes only those indicators that yield higher values and omits others that are less satisfactory, the impression is given that the overall performance of the tool is better than it actually is. However, studies very commonly report only relative risk indicators (in particular the area under the ROC curve), stating explicitly that these reach acceptable or satisfactory values, without providing information on absolute risk, i.e., the probability of reoffending or violent reoffending associated with each level of risk; nor do they usually report the positive and negative predictive values associated with each discrimination threshold.
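How an acceptable-looking sensitivity and specificity can coexist with a low positive predictive value follows from Bayes' theorem: predictive values also depend on the base rate of the outcome in the assessed population. In this Python sketch the sensitivity (0.80), specificity (0.70) and base rates (10% and 40%) are invented for illustration and do not describe any particular tool.

```python
from fractions import Fraction


def predictive_values(sensitivity, specificity, base_rate):
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)  # P(reoffends | classified high)
    npv = true_neg / (true_neg + false_neg)  # P(no reoffence | classified low)
    return ppv, npv


# Hypothetical tool: sensitivity 0.80, specificity 0.70.
for base_rate in (Fraction(1, 10), Fraction(2, 5)):
    ppv, npv = predictive_values(Fraction(4, 5), Fraction(7, 10), base_rate)
    print(f"base rate {float(base_rate):.0%}: PPV = {float(ppv):.2f}, NPV = {float(npv):.2f}")
# base rate 10%: PPV = 0.23, NPV = 0.97
# base rate 40%: PPV = 0.64, NPV = 0.84
```

At the lower base rate, fewer than a quarter of those classified as high risk would actually reoffend, while almost all of those classified as low risk would not: the same tool is very accurate at identifying low risk and mostly wrong about high risk.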
Because of all these problems, and some others that risk assessment also presents, it has been recognized in criminology that "not only is the predictive accuracy of risk assessment tools imperfect, it is also imperfectly presented in the literature. This limited and skewed evidence base creates a risk that decision makers will rely more heavily on risk assessment scores than their accuracy warrants" (Douglas et al., 2017, p. 135).

For an example of how the area under the curve can "mask" very large error rates in risk assessment, depending on the cut-off point chosen and the prevalence of the phenomenon, see )
[...] and when validation studies exist, they do not always respect the essential characteristics of the original study (Rossegger et al., ); recidivism rates for each risk level vary markedly between samples (Rossegger et al., ; Singh et al., ); when risk assessments are applied in everyday criminal justice practice, the levels of accuracy are lower than those found in academic studies (Fazel, , p. ).
In short, although risk estimates work much better for other purposes (detecting low-risk individuals, and discriminating within a group between individuals at higher and lower risk), the empirical evidence accumulated over the last 40 years shows time and again that we are still wrong more than half the time when we estimate a high risk of violent crime. And although it is sometimes criminologists themselves who explicitly warn of this problem and of the consequences of using these estimates as the basis for measures that severely restrict rights, advocates of evidence-based sentencing tend to ignore it when they appeal in general terms to the fact that structured estimates of the risk of reoffending are "much better" than the old unstructured clinical assessment and should therefore be used in choosing the type of sanction and the way in which it is carried out.
. . Is there empirical evidence that evidence-based sentencing achieves its goals?
It is relatively common for evidence of the success of EBS to be cited, for example, that judges and other actors in the criminal justice system find the information provided by risk assessments useful, or that as a result of the introduction of EBS the number of prison sentences has been reduced and the number of alternative sentences increased. But while all these results are certainly positive, they do not provide direct information on the two main objectives of EBS, which are to reduce reoffending and the costs associated with the overuse of the prison system, and to control crime rates. As the evidence-based policy it claims to be, evidence-based sentencing should be able to demonstrate with empirical evidence that these outcomes are actually achieved; in other words, that "it works." However, whereas there are many studies on the predictive validity of risk assessment tools, far fewer have studied their usefulness in reducing recidivism or crime rates (Viljoen et al., 2018, p. 184). And those that do exist do not yield very encouraging results for EBS. As Stevenson puts it, "Somehow, criminal justice risk assessment has gained the near-universal reputation of being an evidence-based practice despite the fact that there is virtually no research showing that it has been effective" (Stevenson, 2018, p. 306). A systematic review that included studies published up to 2017 concluded that there is insufficient empirical evidence to claim that the use of structured risk assessments reduces violence or reoffending, because the available studies, in addition to having a number of important methodological limitations, show mixed results: while in some cases a reduction in violence or crime rates is observed after the use of risk assessments, in others this is not the case (Viljoen et al., 2018, p. 200, 204). It is also worth noting that eight of the 12 studies analyzed in this review were conducted on samples of psychiatric patients (Viljoen et al., 2018, p. 198), which would pose significant problems in generalizing the results to the standard offender population, even if an association between the use of structured risk assessments and reductions in offending had been demonstrated.

An even harsher judgement can be found in Fazel, , who, after listing a series of scientific validity criteria that risk assessment tools for recidivism or violence should meet, states that very few of the currently existing tools meet more than one or two of them, and concludes that "the most commonly used tools in criminal justice are not suitable for predictive purposes" (p. ).
Cf. e.g., Yang et al., , p. , stating that due to their moderate levels of predictive efficacy, risk assessment tools should not be used as the sole or primary means of making clinical or judicial decisions that depend on a high level of predictive accuracy, such as preventive detention. Cf. also Fazel, , p. , .
E.g., Casey et al., , p. .
In this vein, Ostrom and Kauder ( ) positively assess the introduction of EBS in the State of Virginia (to which we will refer in more detail below), offering data on the level of follow-up of recommendations by judges, on how much the workload of supervisory agents increases, or on the degree of satisfaction expressed by various actors in the system.
In another systematic review of 22 studies involving 1,444,499 adolescents and adults, the authors found that the use of risk assessment tools was associated with a small overall reduction in restrictive placements, particularly for low-risk individuals, and a small reduction in any reoffending, but that after removing studies with a high risk of bias the results were no longer significant. They also concluded that much of the available research is of poor quality and that there is a strong need for more rigorous research before clear conclusions can be drawn (Viljoen et al., 2019, p. 1, 401-411).
A thorough empirical evaluation of pretrial risk assessment in Kentucky found that the 2011 bail reform, which mandated the use of pretrial risk assessment with the explicit goal of lowering incarceration rates, did not achieve its intended effects: although it did change bail-setting practices, after a couple of years the pretrial release rate was lower than it had been before the reform (Stevenson, 2018, p. 308-311).
Empirical research on the implementation of EBS in the state of Virginia has also been published (Stevenson and Doleac, 2019). Virginia was the first state in the US to systematically implement risk assessment in sentencing, in 2003, for nonviolent and sex offenders. Risk assessment was incorporated into the state's sentencing guidelines with the goals of: (a) providing alternatives to prison for a significant number of low-risk, non-violent offenders, and (b) allowing for longer sentences for high-risk sex offenders. In both cases, the risk-based recommendation was only advisory: the judge decides whether or not to follow it. This policy was driven, as is characteristic of EBS, by considerations of cost-effectiveness: reserving expensive prison places for the most violent offenders while maintaining a high level of public safety.

Another experiment in the application, not really of EBS but rather of evidence-based corrections on a large scale, is the Justice Reinvestment Initiative, which we cannot discuss in more detail in this paper for reasons of space; cf. Klingele, , p. ; Garrett, , p.
The research to which we refer analyses the impact of these changes on the prison population and reoffending rates, and the results are remarkable. On the one hand, for the group of non-violent offenders, neither the prison population rate nor the number of sentences imposed decreased. However, this does not mean that judges did not take into account the recommendations derived from the risk levels. According to the study, there are differences in the likelihood of being sentenced to prison, and also in the length of the sentence, between non-violent offenders below and above the cut-off point that marks the boundary between high and low risk. What happened is that the reduction in the number and length of prison sentences for the low-risk group was offset by an increase in both for the high-risk group, so that the net effect of these policies on the total number of prisoners ended up being zero (Stevenson and Doleac, 2019, p. 2, 3, 19).
One might have thought that even if the prison population had not been reduced, increasing the severity of sentences for the most dangerous offenders would at least have achieved the second objective of reducing reoffending.However, the data show that this is not the case either (Stevenson and Doleac, 2019, p. 2, 20): recidivism rates did not change significantly.
On the other hand, the results for the sex offender group are also striking: while the explicit purpose of the reform for these cases was to allow sentence severity to be increased above the recommended guidelines only for high-risk offenders, the study shows that after the introduction of risk assessment there was a 5% decrease in the likelihood of being imprisoned and a decrease of about 24% in sentence length (Stevenson and Doleac, 2019, p. 19).
The study suggests several explanations for these surprising results. A very important one is that the judges did not always follow the recommendations: in addition to the guidelines and risk assessments, they also took other criteria into account, among which age stands out. Being young is one of the most risk-aggravating factors in any assessment tool, including the one used in Virginia. However, the courts have traditionally viewed youth as a mitigating factor in determining liability. The study shows that judges in Virginia did modify sentences on this point to conform to the recommendations of the risk assessments: there was a relative increase in the severity of sentences imposed on young people, and also in the likelihood that they would receive a prison sentence. But the adjustment was only partial: had judges consistently followed the recommendations in all cases, these increases would have been much larger (Stevenson and Doleac, 2019, p. 3, 4).
With regard to the sex offender group, the study suggests two possible explanations. One is that judges had a preconceived notion that this group of offenders was more likely to reoffend than it actually was, and that the risk assessments made them realize that these offenders were less dangerous than they had thought, which would explain the reduction in the severity of sentences. However, Stevenson and Doleac consider another hypothesis more plausible: that judges have used low-risk assessments as a "shield" to impose sentences that they consider more appropriate, but which they were previously afraid to impose because of the huge costs that false negatives entail for their prestige and professional careers. A low-risk assessment would make it possible to shift some of the responsibility if someone reoffends who would still have been in prison had he or she received a longer sentence (p. 19, 20).
As for the fact that recidivism remained constant after the introduction of risk assessment, the authors of the study rule out as a possible explanation that the tool used in Virginia is flawed or poorly designed, and simply point to the fact that recidivism is a very difficult phenomenon to predict under any circumstances.In their view, risk estimates explain only a tiny percentage of recidivism, so that adopting one policy or another on the basis of these estimates can have only a very limited effect on the variation in reoffending rates (Stevenson and Doleac, 2019, p. 33 ff.).

. Discussion
So far, there is no scientific evidence that EBS achieves its intended goals of reducing reoffending and prison use.However, it may be that this ineffectiveness is not due to any inherent shortcomings of EBS, but to its misapplication in practice.The study by Stevenson and Doleac could point in this direction: since the risk recommendations in the Virginia case were not binding, judges did not always follow them; if they had, the results would have been better, so what needs to be done is to eliminate judicial discretion and make it mandatory always to tailor sanctions to what the risk levels recommend.
However, there is a widespread view among advocates of EBS that risk assessments should not be binding on judges. It is recognized that, in addition to the level of risk, there may be other important criteria to be taken into account in sentencing and corrections, such as the availability of resources to carry out the treatment or the type of supervision ordered. It is also generally accepted that the reduction of recidivism is only one of the various objectives pursued by the criminal law, so that considerations such as the seriousness of the offense or the need for general deterrence may legitimately lead to the imposition of sentences which are not commensurate with the risk of reoffending. As far as the academic field is concerned, most proponents of EBS place it within a framework of limited retributivism that sets maximum (and sometimes minimum) limits beyond which sanctions cannot be imposed, even though they may be appropriate according to the level of risk. This creates the following paradox: if strict adherence to risk-based recommendations is necessary to achieve the intended benefits of EBS (in terms of reducing crime and reoffending), but at the same time there are good reasons for not adhering to them, which must be respected and which make strict adherence impossible, then the proponents of the practice themselves would seem to be acknowledging that it cannot succeed.

The study only analyses the evolution of recidivism rates in the group of non-violent offenders; it does not analyse recidivism rates in the group of sex offenders due to the lack of available data, as this is a small group with very long sentences that were still being served at the time of the analysis.

Cf. National
On the other hand, it is highly doubtful that the desired objectives could be achieved even if sentencing were based solely and exclusively on risk level. This is partly because, as we have already seen, the scientific evidence on which evidence-based sentencing is based is much less robust than it might appear at first sight. The high rates of false positives raise serious questions about the efficiency of a system that would systematically devote excessive resources to intensive criminal control of people who do not need it. And the concern for reoffending that underpins the whole approach reflects an intolerance of false negatives that would probably also favor a penal response to low-risk groups that is disproportionate to the real risk they pose. Moreover, if it is not easy to estimate the risk of reoffending with a high degree of accuracy, it is even more difficult to reduce that risk. There are, of course, effective treatment programmes for reducing reoffending, but there does not appear to be conclusive empirical evidence that the use of risk assessment is successful in reducing reoffending. This may be for a number of reasons, including the very obvious one that risk assessment alone is unlikely to reduce risk unless it is followed by intervention, or, as Monahan and Skeem argue, that there is not yet enough good empirical research on which risk factors are causal and therefore which need to be modified to reduce the risk of reoffending. But in any case, an approach that claims to be "evidence-based" and that claims to reduce reoffending with risk assessments should, in my view, be able to provide data to support this claim.

This is known as "risk management", which is a key issue, and more complicated than mere risk assessment, because it depends not only on whether there are programmes in place, but also on whether there are sufficient resources to implement them, and the capacity and willingness to do so rigorously. Otherwise, mere risk assessment is likely to be a waste of time (Monahan and Skeem, , p. f.); Viljoen et al. ( , p. , ) add that according to some authors it could even be considered harmful and unethical.

What is more, risk assessments are not the only knowledge about crime that criminology has produced in its long history: on the contrary, there is scientific evidence that would support penal policies quite different from those advocated by EBS. Take age, for example: being young is one of the factors most directly correlated with a higher risk of reoffending and violent recidivism, and is therefore included in virtually all assessment tools. Age can explain almost 50% of the risk score in structured instruments, and its weight in the total score is often equal to or greater than that of criminal history. However, empirical evidence on criminal careers and age curves also shows that while the proportion of people who commit crimes in adolescence and early adulthood is very high, the vast majority of them stop a few years later, in their early twenties, and that even within the group of those who can be considered career criminals, many desist at a relatively early age (in their thirties). This being the case, imposing long sentences on young people, and even very long sentences on relatively young people with long criminal records, can have only a very limited incapacitating effect, since many of these individuals would have given up crime anyway; the considerable resources allocated to their imprisonment therefore cannot be considered an efficient investment from a cost-benefit point of view. In other words, even if we remain within the utilitarian and efficiency-based logic of EBS, there is empirical evidence that seriously challenges the claim that imposing more intensive (and more expensive) penal control on those who, because of their age and criminal history, are at high risk of reoffending is in fact an efficient investment.
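The cost-benefit point about age can be given a rough numerical form. The following Python sketch is not taken from the criminal careers literature: it assumes, purely hypothetically, that a person who is still offending has a constant 20% probability of desisting each year, and computes how many of the years spent in prison fall within the period in which that person would otherwise still have been offending.

```python
def expected_active_years_incapacitated(sentence_years, annual_desistance):
    """Expected number of years of a sentence during which the person would
    otherwise still have been offending, assuming (hypothetically) that they
    are active at the start and desist with a constant yearly probability."""
    probability_active = 1.0  # still offending at the start of year 0
    total = 0.0
    for _ in range(sentence_years):
        total += probability_active
        probability_active *= 1 - annual_desistance
    return total


ANNUAL_DESISTANCE = 0.2  # invented parameter, not an empirical estimate
for years in (1, 5, 10, 20):
    active = expected_active_years_incapacitated(years, ANNUAL_DESISTANCE)
    print(f"{years:>2}-year sentence: {active:.2f} expected active years prevented")
```

Under these assumptions, doubling a sentence from 10 to 20 years doubles its cost but adds fewer than half an expected year of prevented offending. The exact figures depend entirely on the invented desistance rate; the sketch only shows why diminishing returns follow once desistance is taken into account.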
There is yet another consideration to be made in relation to age as a risk factor and scientific evidence. As we have already mentioned, there is a well-established tradition in criminal law of treating youth as a mitigating factor. Most countries have specific juvenile justice systems that operate on the basis of re-education criteria and provide for comparatively lighter penalties than those imposed on adults who commit the same offense. This is justified by the fact that the social sciences, and psychology in particular, show that minors, even if they know the rules from a certain age onwards, are generally still immature, highly impressionable and impulsive, and have a lower capacity to tolerate frustration. From a retributive perspective, these psychological characteristics merit the application of milder punishments. More recently, neuroscience has confirmed that there are not only psychological differences between adolescents and adults, but that their degree of brain development also differs, and that this difference persists not only until the legal age that marks the borderline for treatment as an adult in many countries (around 18 years), but also until the early twenties.
There is, therefore, empirical evidence to justify imposing more intensive penal regimes on young adults because of the greater risk of reoffending they represent, if we consider that incapacitation is the central aim of the penal system.And there is also empirical evidence to justify a more lenient penal response for the same group of people if we consider that punishment proportionate to the degree of responsibility should be the central criterion for sentencing.Much the same can be said for other factors, such as certain mental illnesses.In other words, depending on which goals we consider to be a priority, we have the scientific evidence to design very different criminal justice policies, all of which would be equally evidence-based.
In the design of criminal policy, scientific evidence is sometimes important and sometimes not. There are areas that have proved to be impervious to the social science evidence accumulated over the years on the zero or very limited effectiveness and the enormous side effects of certain practices (for example, the death penalty or the "war on drugs"), and others where scientific evidence has been more or less successful in penetrating (for example, policing and some areas of rehabilitative penal enforcement). The main reasons why criminological evidence is or is not taken into account are not primarily related to its scientific quality, but to other factors: whether or not it fits in with the political objectives pursued by governments at a given time, whether it coincides with a window of opportunity to be well received by public opinion, whether or not there is pressure from certain interest groups for or against the inclusion of this evidence, and so on (Tonry, 2013a). The same is true of evidence-based sentencing: there is no more or better scientific evidence to support it than there is to support other models, and the success of this approach in the US is due to other factors. Among these are that its promises of cost-effectiveness are welcome in times of economic crisis; that its emphasis on reducing reoffending can bring together politically distant actors (some share the goal of providing alternatives to prison for more people, while at the other end of the ideological spectrum there is a shared goal of keeping the most dangerous offenders locked up for longer); that its technocratic and actuarial language fits well with the managerial model currently dominant in US policy; that its focus on the sentencing and enforcement phase means that it can be implemented without challenging ideologically thorny issues such as reducing sentences for serious crimes; that it has been proposed at a time when crime rates are very low, etc.
Consequently, the term "evidence-based" (which implies that other policies would not be based on science, or would have less scientific support) is not justified in my view, and it would be preferable to refer to this approach by one of the other terms that, as we saw at the beginning of this paper, describe its content well: predictive sentencing or risk-based sentencing, for example.
What is more, evidence-based sentencing suffers from some of the problems that plague the "evidence-based policy" described in section 2.3 of this paper. It presents itself as based on "the best available evidence" but fails to make explicit the many shortcomings (as well as the undoubted virtues) of structured risk assessment for reoffending, and thus offers at best an incomplete (and at worst a biased) picture of the true state of scientific knowledge about the predictive capacity of risk assessment and its actual impact (or lack of impact) on crime rates. It also encourages (either deliberately or inadvertently) a depoliticisation of the criminal justice debate by presenting risk-based sentencing as a practice that "works," and should therefore be accepted, when the crucial question is: what does it work for? Evidence-based sentencing assumes that the "what for" must be the reduction of reoffending, but it has not yet been shown to achieve this, nor is it at all clear that this should be the primary aim of the criminal justice system. There are many other possible and legitimate objectives, such as ensuring the non-discriminatory application of sentences, promoting the rehabilitation of as many offenders as possible, avoiding the imposition of disproportionate sentences, or, of course, reducing crime rates, which is not the same as reducing reoffending rates. The debate about which of these goals should be preferred or prioritized necessarily involves value-based arguments (normative, political, ethical), and although it can (and, if it is to be rational, must) also deal with arguments related to what criminology knows about the effects of punishment, it cannot be replaced by them. This is why I believe that evidence-based sentencing, at least in the form that it has taken in recent years in the US, is not a truly evidence-based practice, but rather an example of the selection of certain scientific evidence to justify certain public policies (policy-based evidence), hiding behind supposedly technical reasons options that can only be the subject of political debate.
Finally, I would like to stress that the criticism leveled at EBS in no way detracts from the fact that various rehabilitation programmes have shown good results in reducing reoffending. I also believe that risk assessment tools do a reasonably good job of discriminating between groups at higher and lower relative risk, and that they identify the lower-risk groups with a remarkable degree of accuracy, which can help with allocation to treatment programmes and provide a strong argument for widening the range of alternatives to prison. Effective rehabilitation can (and in my view should) have an important place in the enforcement of sanctions. But this does not mean that there is a sufficient scientific basis (nor, in my view, better reasons) to make so-called "evidence-based" sentencing the cornerstone of the criminal justice system.
One such argument can be found in the document "Principles of Evidence-Based Policymaking," produced by the Evidence-Based Policymaking Collaborative (comprising various foundations and organizations such as the Laura and John Arnold Foundation, the Urban Institute and the Brookings Institution): "In an era of intense partisanship and constrained public resources, evidence-based policymaking can help bridge the partisan political divide and support research-based debate about what outcomes we want to achieve, for whom, and at what cost. It encourages transparency and accountability [...] (p.f.)" (accessed online at https://www.urban.org/sites/default/files/publication//principles_of_evidence-based_policymaking.pdf, last accessed . .).
Cf. Ostrom and Kauder, , p. , explaining the evolution of offender risk assessment in Virginia: "Viable strategies are needed that allow states to reserve scarce and expensive prison beds for the most dangerous offenders and use less costly punishment options for less serious offenders, all while maintaining public safety."
around the central notion of risk, and is based, very briefly, on the following assumptions: (1) that the risk of recidivism is measurable and that this can be done today with structured or actuarial methods much more reliably than with the traditional clinical method; (2) that interventions should be tailored to each individual's level of risk (intensive interventions are not advisable for low-risk individuals, as they may even have counterproductive effects, and efforts and resources should be concentrated on higher-risk individuals); (3) that each subject has specific needs
Cf., for example, the strongly critical position of Starr, , p. -. In Spanish, an overview of the state of the issue can be found in Martínez Garay, , p. et seq. A recent and highly critical contribution to both the use of risk assessments and neuroscientific knowledge in criminal proceedings is that of Julià Pijoan, (with specific references to evidence-based sentencing on p. et seq.).
built Center for State Courts, NCSC Fact Sheet. Evidence-Based