<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Educ.</journal-id>
<journal-title>Frontiers in Education</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Educ.</abbrev-journal-title>
<issn pub-type="epub">2504-284X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/feduc.2023.1221569</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Education</subject>
<subj-group>
<subject>Hypothesis and Theory</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The sizzle and fizzle of teacher evaluation in the United States and the selective use of research evidence</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Gitomer</surname>
<given-names>Drew H.</given-names>
</name>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/507004/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Marshall</surname>
<given-names>Brittany L.</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/2309113/overview"/>
</contrib>
</contrib-group>
<aff><institution>Graduate School of Education, Rutgers University</institution>, <addr-line>New Brunswick, NJ</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0002"><p>Edited by: Stefinee Pinnegar, Brigham Young University, United States</p></fn>
<fn fn-type="edited-by" id="fn0003"><p>Reviewed by: Mary Frances Rice, University of New Mexico, United States; Cheryl J. Craig, Texas A&#x0026;M University, United States</p></fn>
<corresp id="c001">&#x002A;Correspondence: Drew H. Gitomer, <email>drew.gitomer@gse.rutgers.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>08</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>8</volume>
<elocation-id>1221569</elocation-id>
<history>
<date date-type="received">
<day>12</day>
<month>05</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>08</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Gitomer and Marshall.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Gitomer and Marshall</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>In 2009, the United States funded the largest federal educational reform effort in the nation&#x2019;s history. Referred to as <italic>Race to the Top</italic> (RTTT), a cornerstone of this effort was the high-stakes evaluation of all teachers, with a significant emphasis on the use of highly researched statistical methods that ascribed changes in student test scores to a teacher&#x2019;s quality. The widespread endorsement of these policies across a broad range of the political spectrum was based on a theory of action that faced technical, organizational, and political challenges. Enthusiasm for these evaluation efforts was substantially muted in a mere 5&#x2009;years. Among a number of factors, we argue that the framing of the problem, the privileging of particular lines of research and voices, and the lack of attention to other frames, research, and voices resulted in an evidence base that was wholly insufficient to justify the large-scale policy changes that were enacted.</p>
</abstract>
<kwd-group>
<kwd>teacher evaluation</kwd>
<kwd>assessment</kwd>
<kwd>evidence use</kwd>
<kwd>teacher quality</kwd>
<kwd>policy formation</kwd>
</kwd-group>
<counts>
<fig-count count="0"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="106"/>
<page-count count="12"/>
<word-count count="11942"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Teacher Education</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1">
<label>1.</label>
<title>Introduction</title>
<p>Teacher evaluation in the United States has been an important K-12 education policy issue for the past 25&#x2009;years. In this article, we describe the evolution and design of national in-service teacher evaluation policies as part of a major educational reform initiative, how those policies were implemented, and why many of them failed. We argue that these policies were doomed from the start for many reasons, including weak theories of action resulting from inadequate attention to research and critical stakeholders, measures too weak to support causal attribution, organizational issues, and a lack of consideration of how teacher evaluation systems affect schools in marginalized communities.</p>
<p>As part of the federal response to an economic crisis, the U. S. Congress enacted the American Recovery and Reinvestment Act of 2009, a massive and unprecedented stimulus package of over $800B (<xref ref-type="bibr" rid="ref18">Congressional Budget Office, 2012</xref>). Included in this package was an equally unprecedented $4.35B for educational reform, known as <italic>Race to the Top</italic> (RTTT). The most important consideration in states&#x2019; applications was their plan for implementing the evaluation of educators, including both teachers and principals.</p>
<p>These evaluation systems represented a change in how teacher evaluations in the United States were to be conducted, as they focused, in large part, on how individual teachers contributed to student learning as measured by standardized test scores and other assessment measures. While evaluation systems also included measures like classroom observations, this focus on using student learning measures to evaluate teachers was largely unique to the United States (<xref ref-type="bibr" rid="ref107">Williams and Engel, 2012</xref>). The push for these systems was strongly bipartisan, motivated by concerns about student learning as well as very pointed critiques of teachers and, particularly, teacher unions (see <xref ref-type="bibr" rid="ref59">Katz and Rose, 2013</xref>; <xref ref-type="bibr" rid="ref67">Maranto et al., 2016</xref>). This bipartisan agreement also fueled the charter school boom of the 2000s.</p>
<p>The enthusiasm for teacher evaluation was fully shared by policy leaders across the country, who argued that evaluation would be a powerful tool to help teachers better support their students. The two largest funders of these efforts were the U. S. Department of Education and the Bill and Melinda Gates Foundation. Arne Duncan, U. S. Secretary of Education at the time, said, &#x201C;Teachers support evaluations based on multiple measures: student growth, classroom observation and feedback from peers and parents&#x201D; (<xref ref-type="bibr" rid="ref31">Duncan, 2009</xref>). Bill Gates, speaking for his Foundation, stated, &#x201C;Students deserve great teachers. And teachers deserve the support they need to become great&#x201D; (<xref ref-type="bibr" rid="ref35">Gates and Gates, 2018</xref>).</p>
<p>Though RTTT marked a major policy shift in American education, its genesis was long in the making. For some 40&#x2009;years, policymakers had consistently focused on the comparatively poor academic performance of U. S. students as measured by national and international assessments. The most recent policy iteration was based on a broad body of research evidence that was used to justify the need to improve teaching quality, generally, and the need to reform teacher evaluation practices, specifically. Indeed, it was virtually certain that research papers and policy statements alike would begin their arguments by pointing out that teachers were the most important school-based factor in determining students&#x2019; academic outcomes. This research was used to support the implementation of teacher evaluation policies in 40+ states by 2013. The fervor for these policies represented the confluence of the premise that teachers were the single most important factor in determining student outcomes (the qualifier of <italic>school-based</italic> was often lost in policy discussions) and the promise of measurement technologies that could identify teacher quality with appropriate precision. The sizzle was palpable.</p>
<p>The enthusiasm for teacher evaluation and its related policies was short-lived. By 2015, the federal government had abandoned teacher evaluation as a requirement for federal funding. Foundations that had been major supporters of these initiatives shifted their attention elsewhere. While teacher evaluation did not disappear completely, many states abandoned the use of student growth scores as a required component of teacher evaluations.</p>
<p>Research over the intervening years has revealed the many ways in which the policies did not live up to their promise. For the most part, the goal of improving student achievement was not realized. Constituent measures were shown to be unreliable and biased. Inadequate attention was given to implementation and organizational issues and to their impact on students, teachers, and schools in marginalized communities. Educators, in general, soon became vocal opponents of the policies.</p>
<p>In this paper, we argue that a critical reason for the failure of RTTT to realize its promise was that the research base used to support the theory of action for teacher evaluation was, from its inception, inadequate to support ambitious policy goals. We consider the arc of history that led to teacher evaluation as a core educational reform policy, the research that motivated the policy, the limits of that research, and the resulting outcomes of the policy. We use this account to highlight that the value of research evidence in policy formation is limited when the research cannot address the complexity of the problem the policy is meant to solve.</p>
</sec>
<sec id="sec2">
<label>2.</label>
<title>Setting the stage for RTTT &#x2013; the role of federal policy in educational reform</title>
<p>Historically, educational policy in the United States was a responsibility of individual states and local districts. The establishment of a cabinet-level Department of Education did not occur until 1980 and was politically contested as usurping states&#x2019; responsibilities (<xref ref-type="bibr" rid="ref100">Stallings, 2002</xref>). During the 1980s, several landmark reports that laid the groundwork for RTTT (<xref ref-type="bibr" rid="ref74">National Commission on Excellence in Education, 1983</xref>; <xref ref-type="bibr" rid="ref15">Carnegie Forum on Education and the Economy, 1986</xref>) were issued. These reports were authored by commissions that consisted of leaders in education, government, and business and came to a set of conclusions, largely based on test score performance and international comparisons, that were at the core of reform efforts for the next 40&#x2009;years:</p>
<list list-type="bullet">
<list-item>
<p>Public schools are bastions of mediocrity, and students are underachieving.</p>
</list-item>
<list-item>
<p>This mediocrity has direct implications for the nation&#x2019;s economic well-being.</p>
</list-item>
<list-item>
<p>The federal government has a role in improving our nation&#x2019;s education.</p>
</list-item>
</list>
<p>These reports led to two generations of educational reform efforts characterized by various initiatives to: specify what both students and teachers needed to know and be able to do in the form of standards; increase testing of student achievement; increase testing of teachers for licensure and certification; and implement a range of accountability efforts to hold states and schools accountable for educational performance. These policies were embodied in landmark legislation such as the <italic>Improving America&#x2019;s Schools Act</italic> (IASA) of 1994 and the <italic>No Child Left Behind Act of 2001</italic> (NCLB), a reauthorization of the <italic>Elementary and Secondary Education Act</italic> (ESEA).</p>
<p>NCLB was particularly notable in that it called for schools to make adequate yearly progress (AYP) on achievement scores such that 100% of students would be proficient 13&#x2009;years later (2013&#x2013;14). It became clear that states were trying to navigate the policy by setting lower standards for proficiency, setting minimal growth targets early in the AYP trajectory, and seeking exceptions. All of this had significant implications for how schools were judged and for which schools were labeled as &#x201C;failing&#x201D; (<xref ref-type="bibr" rid="ref81">Polikoff et al., 2014</xref>; <xref ref-type="bibr" rid="ref25">Davidson et al., 2015</xref>). By most metrics, NCLB did not lead to meaningful gains for students, and international comparisons remained troubling for policymakers (e.g., <xref ref-type="bibr" rid="ref27">Dee and Jacob, 2011</xref>; <xref ref-type="bibr" rid="ref63">Lee and Reeves, 2012</xref>). The ineffectiveness of school-based accountability led policymakers to shift their focus to teachers as the target of educational reform. Several lines of research laid the foundation for what was to become the most far-reaching policy initiative focused on teacher evaluation, both globally and historically.</p>
</sec>
<sec id="sec3">
<label>3.</label>
<title>The research basis and process for teacher evaluation</title>
<p><xref ref-type="bibr" rid="ref39">Gitomer and Marshall (in press)</xref> reviewed key research efforts that provided the justification for the teacher evaluation policies embedded in the RTTT program. The first line of research focused on <italic>teacher effects</italic>, a statistical determination in which the outcome was changes in student year-to-year achievement on annual standardized achievement scores, and the target input(s) were the teachers who taught each student. Using a range of regression-based approaches (<xref ref-type="bibr" rid="ref78">Nye et al., 2004</xref>), researchers identified teachers as the single most important school-based factor associated with student outcomes. These studies attempted to control for student and school characteristics in order to obtain unconfounded estimates of teacher effects, although such efforts are imperfect in controlling for all non-teacher effects (<xref ref-type="bibr" rid="ref64">Lockwood and Castellano, 2017</xref>).</p>
<p>For many years, researchers had tried to identify teacher characteristics that were associated with teacher effects on student learning. Looking at metrics commonly used for teacher compensation, such as years of service, degree attainment, and academic credits, researchers consistently found limited associations with student achievement (e.g., <xref ref-type="bibr" rid="ref56">Kane et al., 2008</xref>; <xref ref-type="bibr" rid="ref49">Harris and Sass, 2011</xref>). Though teaching experience was initially related to student outcomes, that relationship disappeared after the first 5&#x2009;years of practice (<xref ref-type="bibr" rid="ref17">Clotfelter et al., 2010</xref>). Similarly, professional certification status and domain-specific coursework had minimal relationships with student achievement growth (<xref ref-type="bibr" rid="ref105">Wayne and Youngs, 2003</xref>; <xref ref-type="bibr" rid="ref40">Goe, 2007</xref>).</p>
<p>Just as policymakers could not rely on teacher inputs as a measure of teacher quality, research made clear that traditional teacher evaluation practices did not produce credible or informative reports about teacher practice. Though teacher evaluation had long been embedded in educational systems, <xref ref-type="bibr" rid="ref106">Weisberg et al. (2009)</xref> reported that teacher evaluation systems neither identified nor removed weak teachers and provided inflated, undifferentiated ratings of teacher quality.</p>
<p>The inability to find consistent relationships between teacher inputs and student outcomes, together with the limited utility of existing evaluations, led policymakers and researchers to turn their attention in other directions. Specifically, they were intrigued by the statistical approaches promoted by the prominent statistician William Sanders, who had developed an approach known as <italic>Value-Added Modeling</italic> (VAM; <xref ref-type="bibr" rid="ref97">Sanders and Horn, 1994</xref>). VAM used multiple years of prior test scores for each student to estimate the contribution of a specific teacher to the annual growth of all the students in that teacher&#x2019;s classroom. Aggregate VAM scores are standardized so that all teachers in a particular cohort (e.g., a school district or state) are compared relative to the mean score (0) of the cohort. The promise and allure of Sanders&#x2019; VAM was that it was designed to address potential issues of fairness by using prior student achievement as a control that encompassed all potential factors that might influence student achievement. Other VAM models that largely followed Sanders&#x2019; approach also emerged, though they varied in how they treated covariates and other model specifics (see <xref ref-type="bibr" rid="ref11">Braun, 2005</xref>; <xref ref-type="bibr" rid="ref47">Harris, 2011</xref>, for basic introductions to VAM).</p>
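The covariate-adjustment logic described above can be sketched in miniature. The following is a hypothetical, deliberately simplified illustration, not Sanders&#x2019; actual mixed-model specification: regress current scores on prior scores across the whole cohort, take each teacher&#x2019;s mean student residual as that teacher&#x2019;s effect, and standardize the effects so the cohort mean is 0. The function name and data layout are our own invention for illustration.

```python
# Illustrative sketch of a covariate-adjustment value-added estimate.
# NOT the operational VAM used in Tennessee or elsewhere; real models use
# multiple prior years, mixed effects, and shrinkage.
from statistics import mean, pstdev

def simple_vam(records):
    """records: list of (teacher_id, prior_score, current_score) tuples.
    Returns {teacher_id: standardized value-added score}."""
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    # Pooled OLS of current score on prior score (the "control").
    mp, mc = mean(priors), mean(currents)
    slope = (sum((p - mp) * (c - mc) for _, p, c in records)
             / sum((p - mp) ** 2 for _, p, _ in records))
    intercept = mc - slope * mp
    # Teacher effect = mean residual (actual minus predicted) of that
    # teacher's students.
    resid = {}
    for t, p, c in records:
        resid.setdefault(t, []).append(c - (intercept + slope * p))
    effects = {t: mean(v) for t, v in resid.items()}
    # Standardize so the cohort mean is 0, as with aggregate VAM scores.
    m, s = mean(effects.values()), pstdev(effects.values())
    return {t: (e - m) / s if s else 0.0 for t, e in effects.items()}
```

With synthetic data in which teacher A&#x2019;s students gain 10 points while teacher B&#x2019;s gain none, A lands one standard deviation above the cohort mean and B one below, illustrating how all non-teacher influences are assumed to be absorbed by the prior-score control.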
<p>Policymakers also became interested in whether compensation systems could be used to improve the quality of teaching. Pay-for-performance systems were developed in a number of states and districts. The Tennessee system, using Sanders&#x2019; VAM models, provided additional compensation to teachers with high VAM scores (<xref ref-type="bibr" rid="ref97">Sanders and Horn, 1994</xref>). Denver public schools developed a more comprehensive model that based additional compensation on annual evaluations and on working in high-needs schools.</p>
<p>Finally, research that examined the relationship of teacher practice to student outcomes had also been conducted. Studies examined the effects of particular pedagogical strategies (e.g., <xref ref-type="bibr" rid="ref71">Murnane and Phillips, 1981</xref>) as well as the relationship of teachers&#x2019; scores on classroom observation protocols to the achievement growth of their students (<xref ref-type="bibr" rid="ref70">Milanowski, 2004</xref>; <xref ref-type="bibr" rid="ref57">Kane et al., 2010</xref>).</p>
<sec id="sec4">
<label>3.1.</label>
<title>The interplay of research and teacher evaluation policy</title>
<p>The convergence of the aforementioned research, and the evidence it produced, was used to shape the teacher evaluation policy that was central to RTTT. To understand why and how these particular lines of research were used, we borrow from two theoretical perspectives&#x2014;one that considers policy formation in general terms (<xref ref-type="bibr" rid="ref69">McDonnell and Weatherford, 2020</xref>) and one that considers the sociopolitical context of teaching from a critical race perspective (<xref ref-type="bibr" rid="ref72">Nasir et al., 2016</xref>). Together, these perspectives help us better understand why certain research evidence was so salient in policy formation, why other research was not attended to, and, ultimately, why the research that guided policy was insufficient to adequately satisfy the ambitious policy goals of RTTT.</p>
<p><xref ref-type="bibr" rid="ref69">McDonnell and Weatherford (2020)</xref> described the strategic use of evidence by policymakers to achieve political objectives given a set of goals and beliefs about how best to achieve those goals. In that context, they argued that it was important to understand <italic>what</italic> evidence is given attention as well as <italic>who</italic> is engaged in the production and use of evidence. The <italic>who</italic> includes:</p>
<list list-type="bullet">
<list-item>
<p><italic>researchers</italic>: those who produce original research;</p>
</list-item>
<list-item>
<p><italic>policy entrepreneurs</italic>: those who have a strong policy position and marshal research and other evidence to support that position;</p>
</list-item>
<list-item>
<p><italic>translators and disseminators</italic>: those people and organizations that have a goal of identifying and communicating high-quality research to policymakers;</p>
</list-item>
<list-item>
<p><italic>advocates</italic>: those who represent particular policy positions and put priority on the ends they are trying to achieve; and</p>
</list-item>
<list-item>
<p><italic>hybrids</italic>: those who have an advocacy position and also try to operate as translators and disseminators.</p>
</list-item>
</list>
<p><xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref> argued that, in order to have a comprehensive understanding of teaching, one must take into account the multi-level context in which teaching is situated. Yet, research on teaching and the resulting policies have often ignored such complexity. They further contended that the research and policies over the recent past have been guided by particular kinds of framing of the problems to be addressed.</p>
<p>In <xref ref-type="bibr" rid="ref72">Nasir et al.&#x2019;s (2016)</xref> framework, there are three levels of context that need to be addressed in any full analysis of teaching. First, there are broad economic and policy macro-trends that include: significant and growing economic inequality; the paradox of increasing racial and ethnic diversity in American schools combined with increasing social class segregation in society and schools; and marketized neoliberalism (bringing free-market principles to social issues). The second level includes ways that schools and districts adapt to these broader economic and policy macro-trends. The third level focuses on how these other levels influence the nature of instruction and learning environments that students, and particularly marginalized students, encounter. The focus on accountability testing, for example, often results in low-skill test preparation teaching for marginalized students.</p>
<p><xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref> also adopted <xref ref-type="bibr" rid="ref46">Hand et al.&#x2019;s (2012)</xref> conception of operating frames &#x201C;as a way to examine and reorganize race and power within learning environments. Power plays out in everyday social interaction as individuals become attuned to, coordinate and mobilize around <italic>frames</italic> they engage in during moments of interaction&#x201D; (<xref ref-type="bibr" rid="ref46">Hand et al., 2012</xref>, p. 251). The first frame they identify is one of <italic>colorblindness</italic>, a view that &#x201C;minimizes the existence or consequentiality of race and views policy solutions as best when universal in nature&#x201D; (<xref ref-type="bibr" rid="ref72">Nasir et al., 2016</xref>, p. 354). A second frame is <italic>meritocracy</italic>, one that ascribes accomplishment as solely due to the actions of individuals and &#x201C;allows policy makers to act without acknowledging the systemic nature of racial disparities and diverts attention to the choices of individual actors&#x201D; (<xref ref-type="bibr" rid="ref72">Nasir et al., 2016</xref>, p. 353). The final frame, also reflected in their multi-level hierarchy, is <italic>neoliberalism</italic>, which has led to the marketization of schooling and &#x201C;emanates from three decades of policy that positioned the private sector to be superior to the public sector in providing more efficient social services&#x201D; (<xref ref-type="bibr" rid="ref72">Nasir et al., 2016</xref>, p. 355).</p>
<p>Indeed, accountability efforts, particularly those involving the federal government, have engendered significant debate about their role in supporting education as a public good. While the dominant policy argument has long been that accountability efforts exist to improve education and support the enterprise as a public good, others have taken far more critical stances, such as those embodied by <xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref>, to argue for a much more nuanced understanding of how accountability efforts have also served to diminish education as a public good (see <xref ref-type="bibr" rid="ref3">Anagnostopoulos et al., 2013</xref>).</p>
<p>The research that guided teacher evaluation policy, and how it was conducted and by whom, can help us make sense of how the policy took shape, why the policy was problematic in its uptake by states and districts, and, ultimately, why the initiatives were largely abandoned or dramatically reduced in scope. Using the frameworks provided by both <xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref> and <xref ref-type="bibr" rid="ref69">McDonnell and Weatherford (2020)</xref>, we highlight key aspects of research development and use.</p>
<p>The guiding principles of teacher evaluation grew out of the dominant framing noted by <xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref> that has directed policy perspectives on education for the last several decades. Embedded within this work was the meritocratic perspective that teachers are the primary agent associated with student growth and that their relative success is deserved and an outcome of choices and actions by individuals (teachers and administrators). The VAM models were proffered as ways of overcoming the influence of any contextual factors and, thus, were designed to be pure measures of a teacher&#x2019;s contribution. By controlling for factors such as race and socio-economic status, these models also subscribed to the framing of colorblindness&#x2014;that evaluation scores are fair estimates of a teacher&#x2019;s quality regardless of a teacher&#x2019;s (or their students&#x2019;) background. Neo-liberal framing was evident throughout the system in hiring and retention policies as well as in the various pay-for-performance schemes that were linked to teacher evaluation.</p>
<p>These framings had important consequences for the kinds of research that were done, who did the research, and how the work was supported. Research on teacher effects was almost always guided by researchers (e.g., <xref ref-type="bibr" rid="ref56">Kane et al., 2008</xref>; <xref ref-type="bibr" rid="ref87">Rockoff et al., 2011</xref>) who adopted the three frames identified by <xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref>. These researchers, often educational economists, focused on identifying the &#x201C;effects&#x201D; of teaching by adopting methods designed to control for contextual effects rather than trying to understand their influence.</p>
<p>As <xref ref-type="bibr" rid="ref69">McDonnell and Weatherford (2020)</xref> argued, there are multiple actors involved in how research shapes policy and vice versa. The emergence of teacher evaluation policy, including its central features, represented the confluence of a strategic use of evidence to achieve a particular set of objectives. By the time RTTT was developed, the lines between researchers, policy entrepreneurs, and translators and disseminators/advocates had become highly blurred (see <xref ref-type="bibr" rid="ref26">DeBray and Houck, 2011</xref>). Reckhow and colleagues described how think tanks, foundations, government policymakers, and researchers coordinated a research and policy agenda to elevate teacher evaluation (see <xref ref-type="bibr" rid="ref85">Reckhow and Tompkins-Stange, 2018</xref>; <xref ref-type="bibr" rid="ref86">Reckhow et al., 2021</xref>). All of these players, likewise, were guided by the three frames identified by <xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref>. The Bill and Melinda Gates Foundation and the U. S. Department of Education were the two primary drivers of this work. The Gates Foundation funded research, supported intensive district-level reform efforts, provided advocacy, and worked with the U. S. Department of Education. The U. S. Department of Education, through the RTTT program as well as through funding from the Institute of Education Sciences (IES), not only led the policy initiative but was instrumental in leading advocacy efforts (e.g., <xref ref-type="bibr" rid="ref31">Duncan, 2009</xref>) and funding programs of research that were supportive of the endorsed teacher evaluation efforts. RTTT was driven by a set of core beliefs about public schools, teachers and teacher unions, neo-liberal approaches to the marketization of education, and concerns about academic performance by students in marginalized communities, along with the emergence of research that offered potential solutions.</p>
</sec>
<sec id="sec5">
<label>3.2.</label>
<title>The theory of action and implementation plan guiding teacher evaluation policy</title>
<p>In 2009, the U. S. Department of Education announced the RTTT competition and invited states to compete for funds to support educational reform (<xref ref-type="bibr" rid="ref104">U. S. Department of Education, 2009</xref>). The initiative was based on a theory of action that improved teacher quality would lead to improved student learning. Theories of action specify a cause-and-effect relationship between a policy intervention and a set of desired outcomes (e.g., <xref ref-type="bibr" rid="ref68">McDonald, 2009</xref>). As articulated by <xref ref-type="bibr" rid="ref36">Gitomer and Bell (2013)</xref>, teacher evaluation was championed as improving teacher quality through four complementary drivers. First, teacher evaluation served an accountability purpose in which teachers (and principals) could be held accountable for student performance. Second, evaluation could support what came to be called <italic>the strategic use of human capital</italic>. In a market-based approach, evaluation results could be used to guide a system of incentives and disincentives to manage the supply of teachers by increasing the supply of effective teachers and removing less effective teachers (e.g., <xref ref-type="bibr" rid="ref42">Gordon et al., 2006</xref>; <xref ref-type="bibr" rid="ref50">Heneman et al., 2006</xref>). A third purpose was to improve individual teacher and institutional capacity by including direct measures of classroom instructional quality that could be used as a tool for providing feedback to teachers (e.g., <xref ref-type="bibr" rid="ref9">Borko, 2004</xref>; <xref ref-type="bibr" rid="ref54">Johnson et al., 2004</xref>; <xref ref-type="bibr" rid="ref58">Kardos and Johnson, 2007</xref>). 
Finally, teacher evaluation could be used to support evidence-based instructional policy by determining the efficacy of particular policies and interventions (e.g., <xref ref-type="bibr" rid="ref91">Rowan et al., 2004</xref>, <xref ref-type="bibr" rid="ref93">2009</xref>; <xref ref-type="bibr" rid="ref92">Rowan and Correnti, 2009</xref>).</p>
<p>There were two components related to teacher evaluation that all proposals needed to satisfy:</p>
<list list-type="simple">
<list-item>
<p>1. building data systems that measure student growth and success and inform teachers and principals about how they can improve instruction; and</p>
</list-item>
<list-item>
<p>2. recruiting, developing, rewarding, and retaining effective teachers and principals, especially where they are needed most.</p>
</list-item>
</list>
<p>Specific criteria that had to be met included:</p>
<list list-type="simple">
<list-item>
<p>1. measuring student growth for every student;</p>
</list-item>
<list-item>
<p>2. creating evaluation systems that:</p>
</list-item>
</list>
<list list-type="alpha-lower">
<list-item>
<p>differentiated effectiveness using multiple rating categories and treated student growth as a significant factor; and</p>
</list-item>
<list-item>
<p>were designed and developed with educator involvement;</p>
</list-item>
</list>
<list list-type="simple">
<list-item>
<p>3. conducting annual teacher evaluations that provided feedback, including data on student growth for their students and classes; and</p>
</list-item>
<list-item>
<p>4. using teacher evaluations to inform decisions regarding:</p>
</list-item>
</list>
<list list-type="alpha-lower">
<list-item>
<p>coaching and development;</p>
</list-item>
<list-item>
<p>compensation, promotion, retention, and advancement;</p>
</list-item>
<list-item>
<p>tenure; and</p>
</list-item>
<list-item>
<p>removal.</p>
</list-item>
</list>
<p>By 2011, 19 states had received RTTT funding. However, far more states and localities (42 in total) adopted these policies in order to obtain waivers from the NCLB mandates that were still in effect (<xref ref-type="bibr" rid="ref39">Gitomer and Marshall, in press</xref>). While states, and often districts within states, varied in how they developed the specifics of their systems, <xref ref-type="bibr" rid="ref39">Gitomer and Marshall (in press)</xref> described the key features of all systems, their technical limitations, and how they varied across and within states.</p>
<p>One requirement for any teacher growth measure is defining which students&#x2019; growth scores count toward a teacher&#x2019;s value-added estimate. The realities of schooling made this a non-trivial problem, and the solutions varied greatly. For example, how should students with high levels of absenteeism be treated? If students move between schools multiple times across the year, how should they be treated in the VAM models? What about situations in which multiple teachers are responsible for the students in a particular classroom (e.g., special education)? Of course, less stable student populations are typically associated with schools with high proportions of minoritized and economically insecure students (see <xref ref-type="bibr" rid="ref33">Everson, 2017</xref>).</p>
<p>A second issue concerned the inclusion of particular test score results for each teacher. In some models, teachers had evaluation scores that included test results for which they ostensibly had no teaching responsibility.</p>
<p>Third, states had to decide which measures contributed to an evaluation system. Almost all states included a student growth measure and a classroom observation score. But other measures, including student learning objectives (SLOs), principal rating, overall achievement levels of grades and/or schools, and student surveys, were also used in some systems.</p>
<p>Fourth, states needed to decide how the scores from different measures were aggregated into a final evaluation score. Aggregation methods could be compensatory, in which each component is weighted in a linear combination of scores and the total score determines the appropriate evaluation category for an individual. Another option was a conjunctive model, in which a minimum score is required on each of the constituent measures. How scores were weighted in the overall model depended on how highly particular measures were valued relative to others, as well as how much variation was associated with particular measures. Measures whose scores vary more across individuals will have a greater influence on overall evaluative judgments than measures on which most individuals receive the same score, even if the latter are assigned large nominal weights.</p>
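The difference between the two aggregation approaches can be sketched in a few lines of code. The weights, cut scores, and category labels below are hypothetical illustrations, not those of any actual state system.

```python
# Hypothetical aggregation of teacher evaluation components.
# All weights, cut scores, and labels are illustrative only.

def compensatory(scores, weights, cuts):
    """Weighted linear combination: strength on one measure can
    offset weakness on another."""
    total = sum(scores[m] * w for m, w in weights.items())
    for label, cut in cuts:  # ordered from highest cut downward
        if total >= cut:
            return label
    return "ineffective"

def conjunctive(scores, minimums):
    """Every component must clear its own minimum: one weak
    measure cannot be offset by strength elsewhere."""
    ok = all(scores[m] >= floor for m, floor in minimums.items())
    return "effective" if ok else "ineffective"

scores = {"growth": 2.0, "observation": 3.5, "slo": 3.0}
weights = {"growth": 0.35, "observation": 0.50, "slo": 0.15}
cuts = [("highly effective", 3.5), ("effective", 2.6), ("developing", 1.8)]

# The same teacher can land in different categories under the two models:
print(compensatory(scores, weights, cuts))      # weighted total of about 2.9
print(conjunctive(scores, {m: 2.5 for m in scores}))
```

With these invented numbers, the strong observation score offsets the weak growth score under the compensatory model, while under the conjunctive model the growth score alone pulls the rating down for the very same teacher and data.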
<p>Fifth, states differed in both the consequences and supports given for particular evaluation scores. Typically, an ineffective rating was associated with some type of probationary status for the first year, which then would require some type of additional professional development and support.</p>
<p>Sixth, measures, particularly those associated with classroom observations, required some type of training of principals and other administrators. While researchers have given great attention to observer training, calibration, and overall quality control of scores (<xref ref-type="bibr" rid="ref75">National Research Council, 2008</xref>; <xref ref-type="bibr" rid="ref6">Bell et al., 2014</xref>), in practice these procedures were often compromised, as school districts lacked the time and resources to replicate the quality-control regimens that had been used to validate measures in research studies.</p>
</sec>
</sec>
<sec id="sec6">
<label>4.</label>
<title>Measures and challenges</title>
<p>While states adopted a large number of measures for their teacher evaluation systems, the three measures that were most ubiquitous and most prominent across systems are discussed here. Each of these measures was used to support inferences about a teacher&#x2019;s quality. However, each of these measures had significant technical issues that challenged the validity of using them for such a consequential process.</p>
<sec id="sec7">
<label>4.1.</label>
<title>Student growth measures</title>
<p>RTTT advocated the use of growth measures to overcome inherent problems associated with making any relative judgments of teachers based on their students&#x2019; achievement status by separating the effects of teachers from other factors such as demographics, resources, and student prior achievement. The basic logic was that any attributions to teacher effectiveness must be made with respect to the relative year-to-year growth in student achievement. A broad range of growth models were used, including different versions of VAM, as well as a related method, <italic>Student Growth Percentiles</italic> (SGP; <xref ref-type="bibr" rid="ref7">Betebenner, 2009</xref>). All growth models required multiple years of student test data linked to individual teachers.</p>
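As a concrete illustration of the growth-model logic, the sketch below implements a drastically simplified version of a student growth percentile: each student&#x2019;s current score is ranked only against peers with similar prior scores. Real SGP models use quantile regression over multiple prior years of data; the binning scheme, bin width, and all numbers here are invented assumptions for illustration only.

```python
# Drastically simplified student-growth-percentile sketch.
# Real SGP models use quantile regression on several prior years;
# here students are simply binned by prior score and ranked
# within the bin. All numbers are fabricated for illustration.
from bisect import bisect_left

def simple_sgp(students, bin_width=10):
    """students: list of (prior_score, current_score) pairs.
    Returns one growth percentile per student, computed only
    against peers with similar prior achievement."""
    bins = {}
    for prior, current in students:
        bins.setdefault(prior // bin_width, []).append(current)
    for peers in bins.values():
        peers.sort()
    percentiles = []
    for prior, current in students:
        peers = bins[prior // bin_width]
        rank = bisect_left(peers, current)
        percentiles.append(round(100 * rank / len(peers)))
    return percentiles

# Two clusters of prior achievement; growth is judged within each cluster.
students = [(50, 60), (52, 55), (55, 70), (80, 85), (82, 90)]
print(simple_sgp(students))  # → [33, 0, 67, 0, 50]
```

Even this toy version makes the fragility visible: the percentile a student (and hence a teacher) receives depends on how peers are grouped and on which assessment is used, and choices of this kind are among the reasons relative estimates of teacher quality shift across estimation techniques and achievement measures.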
<p>Research on growth models made clear that precise, causal estimates of a teacher&#x2019;s contribution to student learning were very fragile. <xref ref-type="bibr" rid="ref94">Rowan and Raudenbush (2016)</xref> provided a detailed overview of the challenges in using growth models to make high-stakes decisions about teachers. <xref ref-type="bibr" rid="ref83">Reardon and Raudenbush (2009)</xref> explained how the fundamental statistical assumptions that are foundational to these models can never be satisfied. Studies have revealed how relative estimates of teacher quality can shift dramatically because of using different estimation techniques (<xref ref-type="bibr" rid="ref41">Goldhaber and Theobold, 2013</xref>) or different achievement measures (<xref ref-type="bibr" rid="ref65">Lockwood et al., 2007</xref>; <xref ref-type="bibr" rid="ref45">Grossman et al., 2014</xref>). Multiple studies have shown that VAM estimates can be statistically biased toward classrooms that have students with higher levels of prior achievement, a situation that growth models were supposed to overcome (<xref ref-type="bibr" rid="ref89">Rothstein, 2009</xref>, <xref ref-type="bibr" rid="ref90">2017</xref>; <xref ref-type="bibr" rid="ref82">Raudenbush, 2013</xref>). Researchers also found that VAM scores in one testing domain (e.g., reading) could be influenced by the quality of teaching in another domain (e.g., mathematics) (<xref ref-type="bibr" rid="ref60">Koedel, 2009</xref>).</p>
<p>Scholars in measurement and statistics issued several statements cautioning against the use of these models for high-stakes decisions. <xref ref-type="bibr" rid="ref4">Baker et al. (2010)</xref> produced a consensus statement from several leading educational scholars that cautioned against the use of these measures and highlighted potential unintended consequences, including discouraging teachers from working in schools with students who had the greatest academic needs. The <xref ref-type="bibr" rid="ref2">American Statistical Association (2014)</xref> also released a statement, recognizing the value of VAM to help understand the relationship of different factors to student outcomes when results are aggregated across teachers but also cautioning against using these models to make strong causal statements about individual teachers. Other cautionary and critical statements were made by the National Association of Secondary School Principals (NASSP) in 2015 (see <ext-link xlink:href="https://www.nassp.org/top-issues-in-education/position-statements/" ext-link-type="uri">https://www.nassp.org/top-issues-in-education/position-statements/</ext-link> for the most recent version, updated in 2019; <xref ref-type="bibr" rid="ref73">National Association of Secondary School Principals, 2019</xref>) and the American Educational Research Association (<xref ref-type="bibr" rid="ref1">American Educational Research Association Council, 2015</xref>). Several lawsuits challenging the consequences of teacher evaluation efforts were also instituted (see <xref ref-type="bibr" rid="ref79">Paige and Amrein-Beardsley, 2020</xref>).</p>
</sec>
<sec id="sec8">
<label>4.2.</label>
<title>Student learning objectives</title>
<p>As much as growth measures based on student achievement scores were central to this evaluation movement, the fact is that a very large proportion of teachers did not have testing data that would be appropriate for estimating student growth. Testing was only federally mandated, for example, in grades 3&#x2013;8 mathematics and reading, meaning that teachers in earlier and later grades, as well as those who taught other subjects, would not have students who had multiple years of testing data to analyze. Certain states did, however, impose more encompassing testing requirements.</p>
<p>Thus, in order to address the legislative mandate that teacher evaluation needed to include a &#x201C;student growth measure,&#x201D; most states adopted SLOs for teachers in non-tested subjects, but many states also used them for all teachers as a complementary measure of student growth. SLOs are a locally determined evaluation of teacher effectiveness in which measurable targets for student achievement are set following an analysis of baseline data. Essentially, SLOs pair a pre-instruction measure of student understanding (pre-test) with a post-instruction measure or assessment. The extent to which those targets are met is then used to evaluate the teacher. Within this common definition, specific features of the SLO process have varied substantially (see <xref ref-type="bibr" rid="ref20">Crouse et al., 2016</xref>).</p>
<p>An SLO consists of three components. The first is the population of students it covers&#x2014;is the teacher evaluated on the basis of performance by all students in all classrooms and subjects taught by the teacher or just a subset (e.g., only mathematics or reading for an elementary teacher, only one section of a course for a secondary teacher)? The second component is the target of the SLO&#x2014;do all teachers with the same teaching assignment in a school or district have the same target, or is greater variability part of the design? In addition, the meaningfulness of SLO-based scores is largely a function of the quality control procedures used in the implementation of the SLO process (<xref ref-type="bibr" rid="ref20">Crouse et al., 2016</xref>).</p>
<p>The third component is the assessment to measure student learning. SLO assessments include locally generated measures as well as standardized, externally developed assessments. Often, classroom assessments such as portfolios or some type of performance assessment are used.</p>
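Putting the three components together, the SLO process amounts to a pre/post comparison against a locally set target. The sketch below is a hypothetical illustration: the target gain, cut points, and rating labels are invented for this example, and their real counterparts varied widely across states and districts.

```python
# Hypothetical SLO scoring: share of students whose pre-to-post
# gain meets a locally set target. The target, cut points, and
# rating labels are invented for illustration.

def slo_attainment(pre, post, target_gain):
    """Fraction of students whose score gain met the target."""
    met = sum(1 for p, q in zip(pre, post) if q - p >= target_gain)
    return met / len(pre)

def slo_rating(attainment):
    """Map an attainment rate onto illustrative rating categories."""
    if attainment >= 0.85:
        return "exceeds target"
    if attainment >= 0.65:
        return "meets target"
    return "does not meet target"

pre = [40, 55, 62, 48, 70]    # baseline (pre-test) scores
post = [52, 66, 70, 61, 74]   # post-instruction scores
rate = slo_attainment(pre, post, target_gain=10)
print(rate, slo_rating(rate))  # 3 of 5 students met a 10-point target
```

Because every element here, the assessment, the target, and the cut points, is locally chosen, two teachers with comparable classrooms can receive different ratings, which is precisely the comparability problem the critiques of SLOs describe.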
<p>While little research was done on the quality of SLO measures, <xref ref-type="bibr" rid="ref20">Crouse et al. (2016)</xref> described the inherent problems of using SLOs as a measure of student growth in teacher evaluations. They argued that the validity of such measures for evaluating and comparing teachers could not be justified because of the idiosyncratic nature of their design and implementation. They also pointed out that making causal attributions to a teacher was problematic in light of external factors such as district curriculum, outside tutoring, and student background characteristics that can influence student outcomes. Finally, the use of SLOs was highly variable across states and districts. In some cases, all teachers needed to have an SLO as part of their evaluation. In other instances, only those teachers who did not have standardized test-based growth scores were required to have SLOs. Because the distributions of scores for test-based growth models and SLOs tend to differ, the net effect is that overall evaluation scores could be lower for teachers whose growth estimates were based entirely or in part on standardized tests than for teachers having only SLOs as their growth measure component.</p>
</sec>
<sec id="sec9">
<label>4.3.</label>
<title>Classroom observations</title>
<p>Structured observation protocols, originally designed as tools for professional development (e.g., <xref ref-type="bibr" rid="ref24">Danielson, 2007</xref>; <xref ref-type="bibr" rid="ref80">Pianta et al., 2008</xref>), soon became the object of study in research and a key component of teacher evaluation systems under RTTT. These protocols were created around particular views of teaching that drew on research and were organized along sets of cognitive, social, emotional, and classroom management dimensions of instructional quality.</p>
<p>The protocols adopted for teacher evaluation systems were designed to be used across grades and subject areas. Each protocol provided guidelines for how to observe a period of classroom instruction, how to code what was observed, and how to score instruction for the set of criteria that were described in the protocol&#x2019;s scoring rubric (see <xref ref-type="bibr" rid="ref5">Bell et al., 2012</xref>).</p>
<p>Scores typically involved some form of aggregation of dimensional scores into a total lesson score as well as aggregation of scores across multiple lessons. The management of observations and recording and maintaining of data within school systems was often done with the assistance of commercial observation tools that were designed specifically to support teacher evaluation processes.</p>
<p>Research has shown the limitations of observation protocols in assuring precise and valid estimates of teacher quality. For one, many factors, other than the quality of the instruction itself, can influence the scores for a particular observation, most especially the observers themselves. Research efforts have tried to moderate these sources of error through careful training and monitoring of observers, using multiple and different observers across multiple observations, and ensuring that there were no conflicts of interest between the observer and the observed that might bias scoring (see <xref ref-type="bibr" rid="ref5">Bell et al., 2012</xref>, <xref ref-type="bibr" rid="ref6">2014</xref>).</p>
<p>As observation measures were used in evaluation systems, it became clear that findings from research studies did not generalize to practice settings. For example, observation scores in practice are uniformly higher than scores from research studies: scores that typically fell in the 2&#x2013;3 range on 4-point scales in research studies fell between 3 and 4 when the protocols were used in practice (see <xref ref-type="bibr" rid="ref98">Sartain et al., 2010</xref>; <xref ref-type="bibr" rid="ref12">Briggs et al., 2014</xref>).</p>
<p>Of course, conditions within practice settings were quite different as observers were not disinterested parties. They knew the teachers and worked with them as part of a professional staff (<xref ref-type="bibr" rid="ref48">Harris et al., 2014</xref>; <xref ref-type="bibr" rid="ref61">Kraft and Gilmour, 2017</xref>; <xref ref-type="bibr" rid="ref29">Donaldson and Woulfin, 2018</xref>), and they gave higher scores for teachers they worked with than for teachers with whom they were not familiar (<xref ref-type="bibr" rid="ref51">Ho and Kane, 2013</xref>). School administrators must conduct observations by statute, regardless of how well qualified they are to score. Typically, fewer observations were conducted in school evaluations than in research studies, and it was very rare for any system to include multiple observers.</p>
<p>It also became clear, in both research studies and studies of observation in practice, that personal characteristics of the teacher, and especially the students, affected observation scores. There have been consistent findings that teachers of students with weaker academic profiles are assigned lower observation scores (<xref ref-type="bibr" rid="ref37">Gitomer et al., 2014</xref>; <xref ref-type="bibr" rid="ref14">Campbell and Ronfeldt, 2018</xref>). In addition, <xref ref-type="bibr" rid="ref102">Steinberg and Sartain (2021)</xref> found that observation scores of Black teachers were substantially lower than scores for White teachers and that those differences could be accounted for by the achievement levels of their students. <xref ref-type="bibr" rid="ref14">Campbell and Ronfeldt (2018)</xref> found that male teachers tended to receive lower-than-expected scores relative to female teachers and that scores were also lower than expected in classrooms with higher concentrations of Black, Latin&#x002A;<xref rid="fn0001" ref-type="fn"><sup>1</sup></xref>, male, and low-performing students, a result also found by <xref ref-type="bibr" rid="ref34">Garrett and Steinberg (2015)</xref>.</p>
</sec>
</sec>
<sec id="sec10">
<label>5.</label>
<title>The fizzle of teacher evaluation policy &#x2013; promises not kept</title>
<p>Despite the tremendous amount of resources, attention, and effort given to teacher evaluation, teacher evaluation had a very short shelf-life as a major educational reform policy. By 2015, the core idea of linking teacher evaluation to student outcomes was abandoned when the ESEA was reauthorized in the form of the <italic>Every Student Succeeds Act</italic> (<xref ref-type="bibr" rid="ref32">ESSA, 2015</xref>):</p>
<disp-quote>
<p>Nothing in this Act shall be construed to authorize or permit the Secretary &#x2026; as a condition of approval of the State plan, or revisions or amendments to, the State plan, or approval of a waiver request submitted under section 8401, to &#x2026; prescribe &#x2026; &#x2018;any aspect or parameter of a teacher, principal, or other school leader evaluation system within a State or local educational agency; &#x2026; indicators or specific measures of teacher, principal, or other school leader effectiveness or quality; (pp. 42&#x2013;43)</p>
</disp-quote>
<p>ESSA reflected a change in the entire policy landscape, as teacher evaluation was no longer perceived as the key to improving America&#x2019;s schools. Actors like the Gates Foundation, which had played a major role in advocating for and influencing teacher evaluation policy, also relatively quickly moved in other directions. By 2018, the Foundation publicly acknowledged the modest impact their efforts had made (<xref ref-type="bibr" rid="ref35">Gates and Gates, 2018</xref>). Other foundations that had been players in the teacher evaluation movement also switched priorities.</p>
<p>There certainly was a great deal of political pushback to the increasing federal role in public education, most especially with the Common Core curricular standards and associated assessments (<xref ref-type="bibr" rid="ref66">Loveless, 2021</xref>). By 2015, the two cornerstones of education reform&#x2014;ambitious standards and teacher evaluation&#x2014;had gone from broad endorsement to policies that were increasingly shunned. Indeed, as the federal mandate disappeared, large numbers of states abandoned, or gave great flexibility to, the use of growth models built on student test scores (<xref ref-type="bibr" rid="ref16">Close et al., 2020</xref>). Many states, however, continued to mandate some type of classroom observation.</p>
<p><xref ref-type="bibr" rid="ref39">Gitomer and Marshall (in press)</xref> reviewed evidence addressing the extent to which teacher evaluation policy efforts met the ambitious goals that were promised upon the launch of RTTT. While the results summarized in this section are representative of what happened across the country, there was variation in how systems were implemented and the kinds of results that were observed. The most notable exception to general findings was found in Washington D. C., which implemented a very well-resourced, comprehensive reform effort that resulted in significant changes to the district&#x2019;s schools (<xref ref-type="bibr" rid="ref76">National Research Council, 2015</xref>; <xref ref-type="bibr" rid="ref53">James and Wyckoff, 2020</xref>). The intensive, multi-faceted systemic approach of Washington D. C. stands in contrast to how teacher evaluation was conceptualized and operationalized in most settings.</p>
<sec id="sec11">
<label>5.1.</label>
<title>The promise of identifying weak teachers</title>
<p>One goal of teacher evaluation was to differentiate teachers based on their effectiveness. However, <xref ref-type="bibr" rid="ref61">Kraft and Gilmour (2017)</xref> and <xref ref-type="bibr" rid="ref101">Stecher et al. (2018)</xref> found that evaluation score distributions were largely unchanged from the findings of <xref ref-type="bibr" rid="ref106">Weisberg et al. (2009)</xref>. <xref ref-type="bibr" rid="ref44">Grissom and Loeb (2017)</xref> noted that principals would give higher scores in an accountability context than they would for professional development. Not surprisingly (see <xref ref-type="bibr" rid="ref94">Rowan and Raudenbush, 2016</xref>), evaluators in professional contexts consider many factors aside from the performance itself in making ratings (<xref ref-type="bibr" rid="ref48">Harris et al., 2014</xref>; <xref ref-type="bibr" rid="ref29">Donaldson and Woulfin, 2018</xref>). Two explanations for the failure to identify weak teachers are (1) inconsistent training of evaluators (i.e., school leaders); and (2) the difficulty of negatively evaluating colleagues.</p>
</sec>
<sec id="sec12">
<label>5.2.</label>
<title>The promise of improving student performance</title>
<p>If the end goal of improving teacher quality through teacher evaluation was that students would benefit, the results were disappointing. <xref ref-type="bibr" rid="ref101">Stecher et al. (2018)</xref> observed null effects in terms of mathematics and English language arts (ELA) achievement in three large school districts across the 6&#x2009;years of an intensive push to embed teacher evaluation systems. <xref ref-type="bibr" rid="ref8">Bleiberg et al. (2021)</xref> conducted a cross-state analysis of student achievement by examining test scores before and after each state implemented its evaluation system. They also found null effects that did not vary over time since implementation.</p>
</sec>
<sec id="sec13">
<label>5.3.</label>
<title>The promise of changing the composition of the teaching force</title>
<p>Critical to the theory of action underlying this policy was the idea that weaker teachers could be replaced with more effective teachers. Substantial effects were found in Washington D. C. (<xref ref-type="bibr" rid="ref28">Dee and Wyckoff, 2017</xref>; <xref ref-type="bibr" rid="ref53">James and Wyckoff, 2020</xref>). Studies with other samples found changes in the teaching force, although to a lesser degree (e.g., <xref ref-type="bibr" rid="ref43">Grissom and Bartanen, 2019</xref>; <xref ref-type="bibr" rid="ref77">Nguyen et al., 2019</xref>; <xref ref-type="bibr" rid="ref22">Cullen et al., 2021</xref>); <xref ref-type="bibr" rid="ref101">Stecher et al. (2018)</xref>, however, found null effects.</p>
</sec>
<sec id="sec14">
<label>5.4.</label>
<title>The promise of supporting more effective professional development</title>
<p>One of the key policy mechanisms for improving teaching quality was to provide better and more targeted professional development. <xref ref-type="bibr" rid="ref61">Kraft and Gilmour (2017)</xref> and <xref ref-type="bibr" rid="ref101">Stecher et al. (2018)</xref> did not find any evidence of such improvement and attributed this to the inherent tension between using evaluations for high-stakes accountability on the one hand and for professional development on the other. In such cases, the accountability uses typically dominated and crowded out the professional development messages.</p>
</sec>
<sec id="sec15">
<label>5.5.</label>
<title>The promise of contributing to equity</title>
<p>Arguably the most important goal of this educational reform initiative was to improve the quality of teaching in schools that had histories of poor academic performance. Schools in urban and impoverished communities were of particular interest as those areas were the face of the U. S. education crisis (<xref ref-type="bibr" rid="ref21">Cuban, 1989</xref>). These districts typically had high proportions of Black, Latin&#x002A;, and Indigenous students, English language learners (ELLs), students considered in need of special education services, as well as the highest proportion of minoritized teachers (<xref ref-type="bibr" rid="ref10">Boyd et al., 2010</xref>; <xref ref-type="bibr" rid="ref88">Ronfeldt et al., 2016</xref>; <xref ref-type="bibr" rid="ref23">D&#x2019;Amico et al., 2017</xref>).</p>
<p>Again, Washington D. C. was nearly alone in making progress toward these goals. In most targeted districts, teachers were disincentivized from working in low-performing schools. As we have discussed, teachers of students with weaker academic profiles fare more poorly on teacher evaluations (<xref ref-type="bibr" rid="ref30">Drake et al., 2019</xref>). The bias that has been observed in these systems across measures is alarming.</p>
<p>There are multiple reasons why teachers of students with weaker academic profiles fare more poorly in these evaluation systems. As previously mentioned, there appear to be statistical biases in the growth model estimates. Additionally, some low-achieving students have high levels of absenteeism, yet their test scores contribute as much to a teacher&#x2019;s estimate as those of students who rarely miss school. <xref ref-type="bibr" rid="ref19">Cowen (2017)</xref> found that unhoused students, more likely to be Black and Latin&#x002A;, are much more transient, almost always impoverished, and have lower achievement levels (i.e., on classroom assessments and standardized tests). Yet the growth models treat the teachers of these students identically to all others, which is unfair.</p>
<p>Classroom observations raise a number of additional issues with respect to equity. <xref ref-type="bibr" rid="ref52">Jacob and Walsh (2011)</xref>, <xref ref-type="bibr" rid="ref37">Gitomer et al. (2014)</xref>, <xref ref-type="bibr" rid="ref34">Garrett and Steinberg (2015)</xref>, <xref ref-type="bibr" rid="ref14">Campbell and Ronfeldt (2018)</xref>, and <xref ref-type="bibr" rid="ref102">Steinberg and Sartain (2021)</xref> have all found that observation scores are systematically lower for teachers who teach students with weaker academic profiles.</p>
<p>Additionally, Black teachers are more likely to receive lower observation scores (<xref ref-type="bibr" rid="ref14">Campbell and Ronfeldt, 2018</xref>; <xref ref-type="bibr" rid="ref102">Steinberg and Sartain, 2021</xref>), and Black teachers who work in schools with predominantly White staff are more likely to receive lower evaluation ratings than those who work at schools with mostly Black colleagues (<xref ref-type="bibr" rid="ref30">Drake et al., 2019</xref>). <xref ref-type="bibr" rid="ref13">Campbell (2020)</xref> found that Black women received lower observation scores than White women, even when accounting for other measures of teaching quality, especially in schools where the race of the evaluator differed from that of the teacher.</p>
<p>Unfortunately, there is a paucity of research on the effects of teacher evaluation policies on teachers of students who represent the full range of students in American schools. However, there have been several studies that have discussed the complexity of conducting evaluations of teachers of special education students (<xref ref-type="bibr" rid="ref55">Jones et al., 2022</xref>) and of English language learners (<xref ref-type="bibr" rid="ref103">Turkan and Buzick, 2016</xref>).</p>
</sec>
</sec>
<sec id="sec16">
<label>6.</label>
<title>Post-mortem</title>
<p>By almost any definition, the exuberant adoption and endorsement of teacher evaluation as a panacea for the educational problems facing the United States in the 2000s was hardly justified. None of the ambitious goals were satisfied. One of the primary reasons for this disappointment, we argue, is that the research foundations upon which all of this was built were myopic and insufficient: they could not support implementation or produce results that were technically valid and substantively robust enough to address the complex issues of teaching and learning in American schools.</p>
<p>We can return to the three operating frames that <xref ref-type="bibr" rid="ref72">Nasir et al. (2016)</xref> identified to highlight the gaps in the research base and policy interpretation and also to demonstrate that the limitations of these frames also have consequences for the technical quality of the evaluation measures. We do not claim that these operating frames are the only reason for the technical problems that surfaced, but we do claim that they played a major role.</p>
<p>The first frame is <italic>colorblindness,</italic> which minimizes the consequences of race and argues that all policy prescriptions should be the same, independent of racial considerations. Yet, the failure to address race and racism in our educational system had a profound negative influence on the utility of the evaluation systems. We see that historical forces that have located minoritized students in lower-performing schools and inadequately resourced neighborhoods actually have direct effects on all measures, independent of the actual skills of a particular teacher. We see evidence of significant bias in observation systems that produces lower scores for Black teachers and lower scores for teachers of Black children. And we see all of this as raising skepticism about the fairness of assessment-based systems (<xref ref-type="bibr" rid="ref38">Gitomer and Iwatani, 2022</xref>).</p>
<p>The second frame is <italic>meritocracy,</italic> which ascribes accomplishment solely to the actions of the individual teacher and their impact on the student. This view ignores the systemic nature of racial disparities as well as the interdependence of teachers with other educators, the resources and constraints they are provided, the tests their students are given, what students experience in other classrooms and at home, and the complex interrelated web of other factors that all influence what goes on in a classroom. Such an approach also ignores the complexity and messiness of conducting assessments and evaluations. Treating all of these factors as either measurement error or factors that can be statistically controlled is to ignore reality and trivialize the educational process. From a technical perspective, we see measures that have a tremendous amount of error associated with them and the fundamental problem that causal claims at an individual level cannot be supported. And, of course, much of the system was predicated on using student standardized achievement test scores as the primary, if not sole, marker of student progress.</p>
<p>The final frame is <italic>neoliberalism</italic>, the idea that market-based incentives and practices can be applied in the educational system. Such simple explanations do not account for the range of motivations that institutions and individuals have in assigning evaluation scores. The fact that distributions of teacher ratings barely changed, despite change being an explicit goal of the policy, points to a failure to understand the complex organizational behaviors associated with performance judgments (<xref ref-type="bibr" rid="ref94">Rowan and Raudenbush, 2016</xref>). The fact that pay-for-performance systems have had modest to no effects (<xref ref-type="bibr" rid="ref99">Springer et al., 2016</xref>) suggests that economic incentives are not sufficient to produce the desired changes in teaching.</p>
<p>The idea that one could build evaluation systems on an emerging set of attractive technologies and a limited and limiting set of frames, without attending to social, cultural, organizational, political, and even measurement theory and research, led to a system that was bound to fizzle.</p>
<p>Essential problems included, first, the failure to resolve the tension between the goals of accountability and professional development. Second, all constituent measures had significant problems supporting the kinds of inferences needed for a high-stakes evaluation system. While the measures were not without value, they were asked to carry far more weight than they could bear.</p>
<p>Finally, if it was not clear to some at inception, it should be abundantly clear after this grand social experiment that teacher evaluation was not the policy lever to challenge the ubiquitous inequities in our educational system. The systems tended to reify historical inequities rather than upend them. Had attention been paid to researchers who were considering the multi-level nature of educational influences as the system was designed, it is possible that certain missteps could have been avoided.</p>
<p>This experience also highlights the risks of conducting policy formation within an echo chamber of researchers, funders, and intermediaries who all adopt a similar framing of the problem. Without challenge, the result can be expensive and taxing policies that prove ephemeral.</p>
<p>The fizzle of Race to the Top does not negate concerns about instructional quality, nor does it negate the need for thoughtful evaluation, hiring, and retention practices that are essential to any well-functioning institution.</p>
<p>There is no doubt that much was learned during the time preceding and concurrent with this policy. Classroom observation instruments and SLOs have the potential to be used as they were initially designed&#x2014;to support professional development. VAM can be useful for understanding educational issues at aggregate levels. But having measures alone, developed and researched in one context, is not a warrant for a massive policy initiative. Moving forward on any major educational reform policy will require much more sophisticated and nuanced theories of action.</p>
<p>What might more productive reforms look like? While it would be presumptuous to suggest a particular design, it is possible to outline certain principles that are critical to consider. We can draw on research on effective schools and effective teaching across different contexts and countries to identify policies to encourage, as well as those to avoid, in designing approaches to the evaluation of teaching within schools.</p>
<list list-type="order">
<list-item>
<p>Teaching is contextually bound, and any attempt to understand and evaluate teaching as a reflection of the teacher alone is inherently misguided. Factors as far-ranging as curriculum, community, food and housing insecurity, school leadership, school and classroom resources, and students&#x2019; language and culture all have profound effects on what transpires within a given classroom. A history of educational accountability policy in the United States has focused on particular entities in the system (students, schools, principals, teachers) apart from all these contextual issues, and each effort has failed. Any productive evaluation system needs to understand how teaching is influenced by, and influences, this larger context. Only then can more reasonable interpretations of particular actors and actions be made, and only then can more thoughtful decisions of follow-up actions be made.</p>
</list-item>
<list-item>
<p>Any system should pay explicit attention to issues of race, language, culture, and power in understanding and supporting classroom interactions. It is not sufficient to simply put forth standards that say all students&#x2019; needs should be met. We know that there are specific challenges and approaches that engage and support students from different backgrounds (e.g., <xref ref-type="bibr" rid="ref62">Ladson-Billings, 2009</xref>).</p>
</list-item>
<list-item>
<p>To the extent that teachers are held accountable for their teaching, measures should be transparent, actionable, and under teachers&#x2019; control. A central critique of growth models used in teacher evaluation systems was that they did not meet any of these criteria. Measures that focus on teacher actions, interactions, and decision-making are those that individuals and systems are more apt to be able to address.</p>
</list-item>
<list-item>
<p>The criteria against which teacher effectiveness is measured should reflect a full vision of teaching. The attractiveness of using growth measures was that these metrics were available for large numbers of teachers. They also led to mathematics and reading test scores receiving overwhelming attention, often to the exclusion of other subject areas and almost always to the exclusion of important outcomes of classroom instruction that were not measured by standardized achievement tests. Focusing on a small set of proxy measures for teacher evaluation will inevitably distort school practices (see <xref ref-type="bibr" rid="ref94">Rowan and Raudenbush, 2016</xref>).</p>
</list-item>
<list-item>
<p>Systems should anticipate, and try to forestall, the predictable ways in which policies will be interpreted and acted upon. The inflation of observation scores and the far smaller than anticipated share of teachers classified as needing improvement should not have been surprising in light of what we know about how systems respond to performance appraisal systems (<xref ref-type="bibr" rid="ref94">Rowan and Raudenbush, 2016</xref>). Actors will be less apt to shape responses to policy goals in unintended ways if they are invested in the goals and processes of the system. Any policy needs to be informed by, and have buy-in from, practitioners in the field to a degree far greater than was evident in Race to the Top.</p>
</list-item>
<list-item>
<p>Systems should have as a dominant goal the development of the educational system, which would include professional development for teachers and school leaders, curricular reform, community relationships, resource analysis, etc. While the Race to the Top system endorsed the rhetoric of professional development, effective efforts that built on the evaluations were not commonplace. Policy, resources, and attention were given to the mechanics of evaluation and human resource management far more than they were to system development. If future teacher evaluation efforts are to be successful, these priorities need to be inverted.</p>
</list-item>
</list>
</sec>
<sec id="sec18">
<title>Author contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.</p>
</sec>
<sec sec-type="COI-statement" id="sec19">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="COI-statement" id="sec22">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="journal"><person-group person-group-type="author"><collab id="coll1">American Educational Research Association Council</collab></person-group>. (<year>2015</year>). <article-title>AERA statement on use of value-added models (VAM) for the evaluation of educators and educator preparation programs</article-title>. <source>Educ. Res.</source> <volume>44</volume>, <fpage>448</fpage>&#x2013;<lpage>452</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X15618385</pub-id></citation></ref>
<ref id="ref2"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll2">American Statistical Association</collab></person-group>. (<year>2014</year>). ASA statement on using value-added models for educational assessment. Available at: <ext-link xlink:href="https://www.amstat.org/asa/files/pdfs/POL-ASAVAM-Statement.pdf" ext-link-type="uri">https://www.amstat.org/asa/files/pdfs/POL-ASAVAM-Statement.pdf</ext-link></citation></ref>
<ref id="ref3"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Anagnostopoulos</surname> <given-names>D.</given-names></name> <name><surname>Rutledge</surname> <given-names>S. A.</given-names></name> <name><surname>Jacobsen</surname> <given-names>R</given-names></name></person-group>. (Eds.). (<year>2013</year>). <source>The infrastructure of accountability: Data use and the transformation of American education</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Harvard Education Press</publisher-name>.</citation></ref>
<ref id="ref4"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Baker</surname> <given-names>E. L.</given-names></name> <name><surname>Barton</surname> <given-names>P. E.</given-names></name> <name><surname>Darling-Hammond</surname> <given-names>L.</given-names></name> <name><surname>Haertel</surname> <given-names>E.</given-names></name> <name><surname>Ladd</surname> <given-names>H. F.</given-names></name> <name><surname>Linn</surname> <given-names>R. L.</given-names></name> <etal/></person-group>. (<year>2010</year>). <source>Problems with the use of student test scores to evaluate teachers (EPI briefing paper #278)</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>Economic Policy Institute</publisher-name>.</citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bell</surname> <given-names>C. A.</given-names></name> <name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>McCaffrey</surname> <given-names>D. F.</given-names></name> <name><surname>Hamre</surname> <given-names>B. K.</given-names></name> <name><surname>Pianta</surname> <given-names>R. C.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>An argument approach to observation protocol validity</article-title>. <source>Educ. Assess.</source> <volume>17</volume>, <fpage>62</fpage>&#x2013;<lpage>87</lpage>. doi: <pub-id pub-id-type="doi">10.1080/10627197.2012.715014</pub-id></citation></ref>
<ref id="ref6"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bell</surname> <given-names>C. A.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name> <name><surname>Croft</surname> <given-names>A. J.</given-names></name> <name><surname>Leusner</surname> <given-names>D.</given-names></name> <name><surname>McCaffrey</surname> <given-names>D. F.</given-names></name> <name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <etal/></person-group>. (<year>2014</year>). &#x201C;<article-title>Improving observational score quality: challenges in observer thinking</article-title>&#x201D; in <source>Designing teacher evaluation systems: New guidance from the measures of effective teaching project</source>. eds. <person-group person-group-type="editor"><name><surname>Kane</surname> <given-names>T. J.</given-names></name> <name><surname>Kerr</surname> <given-names>K. A.</given-names></name> <name><surname>Pianta</surname> <given-names>R. C.</given-names></name></person-group> (<publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Jossey-Bass</publisher-name>), <fpage>50</fpage>&#x2013;<lpage>97</lpage>.</citation></ref>
<ref id="ref7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Betebenner</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>Norm- and criterion-referenced student growth</article-title>. <source>Educ. Meas. Issues Pract.</source> <volume>28</volume>, <fpage>42</fpage>&#x2013;<lpage>51</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1745-3992.2009.00161.x</pub-id></citation></ref>
<ref id="ref8"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bleiberg</surname> <given-names>J.</given-names></name> <name><surname>Brunner</surname> <given-names>E.</given-names></name> <name><surname>Harbatkin</surname> <given-names>E.</given-names></name> <name><surname>Kraft</surname> <given-names>M. A.</given-names></name> <name><surname>Springer</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <source>The effect of teacher evaluation on achievement and attainment: Evidence from statewide reforms</source>. <publisher-loc>Providence, RI</publisher-loc>: <publisher-name>Annenberg Institute at Brown University</publisher-name>.</citation></ref>
<ref id="ref9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borko</surname> <given-names>H.</given-names></name></person-group> (<year>2004</year>). <article-title>Professional development and teacher learning: mapping the terrain</article-title>. <source>Educ. Res.</source> <volume>33</volume>, <fpage>3</fpage>&#x2013;<lpage>15</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X033008003</pub-id></citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boyd</surname> <given-names>D.</given-names></name> <name><surname>Lankford</surname> <given-names>H.</given-names></name> <name><surname>Loeb</surname> <given-names>S.</given-names></name> <name><surname>Ronfeldt</surname> <given-names>M.</given-names></name> <name><surname>Wyckoff</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>The role of teacher quality in retention and hiring: using applications to transfer to uncover preferences of teachers and schools</article-title>. <source>J. Policy Anal. Manage.</source> <volume>30</volume>, <fpage>88</fpage>&#x2013;<lpage>110</lpage>. doi: <pub-id pub-id-type="doi">10.1002/pam.20545</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Braun</surname> <given-names>H. I.</given-names></name></person-group> (<year>2005</year>). <source>Using student Progress to evaluate teachers: A primer on value-added models</source>. <publisher-loc>Princeton, NJ</publisher-loc>: <publisher-name>Educational Testing Service</publisher-name>.</citation></ref>
<ref id="ref12"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Briggs</surname> <given-names>D. C.</given-names></name> <name><surname>Dadey</surname> <given-names>N.</given-names></name> <name><surname>Kizil</surname> <given-names>R. C.</given-names></name></person-group> (<year>2014</year>). <source>Comparing student growth and teacher observation to principal judgments in the evaluation of teacher effectiveness</source>. <publisher-loc>Boulder, CO</publisher-loc>: <publisher-name>Center for Assessment, Design, Research and Evaluation, University of Colorado</publisher-name>.</citation></ref>
<ref id="ref13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>S. L.</given-names></name></person-group> (<year>2020</year>). <article-title>Ratings in black and white: a QuantCrit examination of race and gender in teacher evaluation reform</article-title>. <source>Race Ethn. Educ.</source>, <fpage>1</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1080/13613324.2020.1842345</pub-id></citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>S. L.</given-names></name> <name><surname>Ronfeldt</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Observational evaluation of teachers: measuring more than we bargained for?</article-title> <source>Am. Educ. Res. J.</source> <volume>55</volume>, <fpage>1233</fpage>&#x2013;<lpage>1267</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0002831218776216</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="book"><person-group person-group-type="author"><collab id="coll3">Carnegie Forum on Education and the Economy</collab></person-group>. (<year>1986</year>). <source>A nation prepared: teachers for the 21st century</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Carnegie Forum on Education and the Economy</publisher-name>.</citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Close</surname> <given-names>K.</given-names></name> <name><surname>Amrein-Beardsley</surname> <given-names>A.</given-names></name> <name><surname>Collins</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Putting teacher evaluation systems on the map: an overview of states' teacher evaluation systems post-every student succeeds act</article-title>. <source>Educ. Policy Analysis Archives</source> <volume>28</volume>:<fpage>58</fpage>. doi: <pub-id pub-id-type="doi">10.14507/epaa.28.5252</pub-id></citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clotfelter</surname> <given-names>C. T.</given-names></name> <name><surname>Ladd</surname> <given-names>H. F.</given-names></name> <name><surname>Vigdor</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Teacher credentials and student achievement in high school: a cross-subject analysis with student fixed effects</article-title>. <source>J. Hum. Resour.</source> <volume>45</volume>, <fpage>655</fpage>&#x2013;<lpage>681</lpage>. doi: <pub-id pub-id-type="doi">10.3368/jhr.45.3.655</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll4">Congressional Budget Office</collab></person-group>. (<year>2012</year>). Estimated impact of the American recovery and reinvestment act on employment and economic output from October 2011 through December 2011. Available at: <ext-link xlink:href="http://www.cbo.gov/sites/default/files/cbofiles/attachments/02-22-ARRA.pdf" ext-link-type="uri">http://www.cbo.gov/sites/default/files/cbofiles/attachments/02-22-ARRA.pdf</ext-link></citation></ref>
<ref id="ref19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cowen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2017</year>). <article-title>Who are the homeless? Student mobility and achievement in Michigan 2010&#x2013;2013</article-title>. <source>Educ. Res.</source> <volume>46</volume>, <fpage>33</fpage>&#x2013;<lpage>43</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X17694165</pub-id></citation></ref>
<ref id="ref20"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Crouse</surname> <given-names>K.</given-names></name> <name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>Joyce</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>An analysis of the meaning and use of student learning objectives</article-title>&#x201D; in <source>Student growth measures in policy and practice: Intended and unintended consequences of high-stakes teacher evaluations</source>. eds. <person-group person-group-type="editor"><name><surname>Kappler Hewitt</surname> <given-names>K.</given-names></name> <name><surname>Amrein-Beardsley</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>New York</publisher-loc>: <publisher-name>Palgrave Macmillan</publisher-name>), <fpage>203</fpage>&#x2013;<lpage>222</lpage>.</citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cuban</surname> <given-names>L.</given-names></name></person-group> (<year>1989</year>). <article-title>The &#x2018;at-risk&#x2019; label and the problem of urban school reform</article-title>. <source>Phi Delta Kappan</source> <volume>70</volume>, <fpage>780</fpage>&#x2013;<lpage>801</lpage>.</citation></ref>
<ref id="ref22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cullen</surname> <given-names>J. B.</given-names></name> <name><surname>Koedel</surname> <given-names>C.</given-names></name> <name><surname>Parsons</surname> <given-names>E.</given-names></name></person-group> (<year>2021</year>). <article-title>The compositional effect of rigorous teacher evaluation on workforce quality</article-title>. <source>Educ. Finance Policy.</source> <volume>16</volume>, <fpage>7</fpage>&#x2013;<lpage>41</lpage>. doi: <pub-id pub-id-type="doi">10.1162/edfp_a_00292</pub-id></citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x2019;Amico</surname> <given-names>D.</given-names></name> <name><surname>Pawlewicz</surname> <given-names>R. J.</given-names></name> <name><surname>Earley</surname> <given-names>P. M.</given-names></name> <name><surname>McGeehan</surname> <given-names>A. P.</given-names></name></person-group> (<year>2017</year>). <article-title>Where are all the black teachers? Discrimination in the teacher labor market</article-title>. <source>Harv. Educ. Rev.</source> <volume>87</volume>, <fpage>26</fpage>&#x2013;<lpage>49</lpage>. doi: <pub-id pub-id-type="doi">10.17763/1943-5045-87.1.26</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Danielson</surname> <given-names>C.</given-names></name></person-group> (<year>2007</year>). <source>Enhancing professional practice: a framework for teaching</source> <edition>2nd ed.</edition> <publisher-loc>Alexandria, VA</publisher-loc>: <publisher-name>Association for Supervision and Curriculum Development</publisher-name>.</citation></ref>
<ref id="ref25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davidson</surname> <given-names>E.</given-names></name> <name><surname>Reback</surname> <given-names>R.</given-names></name> <name><surname>Rockoff</surname> <given-names>R.</given-names></name> <name><surname>Schwartz</surname> <given-names>H. L.</given-names></name></person-group> (<year>2015</year>). <article-title>Fifty ways to leave a child behind: idiosyncrasies and discrepancies in states&#x2019; implementation of NCLB</article-title>. <source>Educ. Res.</source> <volume>44</volume>, <fpage>347</fpage>&#x2013;<lpage>358</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X15601426</pub-id></citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>DeBray</surname> <given-names>E.</given-names></name> <name><surname>Houck</surname> <given-names>E. A.</given-names></name></person-group> (<year>2011</year>). <article-title>A narrow path through the broad middle: mapping institutional considerations for ESEA reauthorization</article-title>. <source>Peabody J. Educ.</source> <volume>86</volume>, <fpage>319</fpage>&#x2013;<lpage>337</lpage>. doi: <pub-id pub-id-type="doi">10.1080/0161956X.2011.579009</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dee</surname> <given-names>T. S.</given-names></name> <name><surname>Jacob</surname> <given-names>B.</given-names></name></person-group> (<year>2011</year>). <article-title>The impact of no child left behind on student achievement</article-title>. <source>J. Policy Anal. Manage.</source> <volume>30</volume>, <fpage>418</fpage>&#x2013;<lpage>446</lpage>. doi: <pub-id pub-id-type="doi">10.1002/pam.20586</pub-id></citation></ref>
<ref id="ref28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dee</surname> <given-names>T.</given-names></name> <name><surname>Wyckoff</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>A lasting impact: high-stakes teacher evaluations drive student success in Washington, DC</article-title>. <source>Educ. Next.</source> <volume>17</volume>, <fpage>58</fpage>&#x2013;<lpage>66</lpage>.</citation></ref>
<ref id="ref29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Donaldson</surname> <given-names>M. L.</given-names></name> <name><surname>Woulfin</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>From tinkering to going &#x201C;rogue&#x201D;: how principals use agency when enacting new teacher evaluation systems</article-title>. <source>Educ. Eval. Policy Anal.</source> <volume>40</volume>, <fpage>531</fpage>&#x2013;<lpage>556</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0162373718784205</pub-id></citation></ref>
<ref id="ref30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Drake</surname> <given-names>S.</given-names></name> <name><surname>Auletto</surname> <given-names>A.</given-names></name> <name><surname>Cohen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2019</year>). <article-title>Grading teachers: race and gender differences in low evaluation ratings and teacher employment outcomes</article-title>. <source>Am. Educ. Res. J.</source> <volume>56</volume>, <fpage>1800</fpage>&#x2013;<lpage>1833</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0002831219835776</pub-id></citation></ref>
<ref id="ref31"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Duncan</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). &#x201C;<article-title>Robust data gives us the roadmap to reform</article-title>&#x201D; in <source>Address by the secretary of education to the fourth annual Institute of Education Sciences research conference</source> (<publisher-loc>Washington, DC</publisher-loc>). Available at: <ext-link xlink:href="https://education44.org/speeches/robust-data-gives-us-the-roadmap-to-reform/" ext-link-type="uri">https://education44.org/speeches/robust-data-gives-us-the-roadmap-to-reform/</ext-link></citation></ref>
<ref id="ref32"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll5">ESSA</collab></person-group>. (<year>2015</year>). Every student succeeds act, 20 U.S.C. &#x00A7; 6301. Available at: <ext-link xlink:href="https://www.congress.gov/114/plaws/publ95/PLAW-114publ95.pdf" ext-link-type="uri">https://www.congress.gov/114/plaws/publ95/PLAW-114publ95.pdf</ext-link></citation></ref>
<ref id="ref33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Everson</surname> <given-names>K. C.</given-names></name></person-group> (<year>2017</year>). <article-title>Value-added modeling and educational accountability: are we answering the real questions?</article-title> <source>Rev. Educ. Res.</source> <volume>87</volume>, <fpage>35</fpage>&#x2013;<lpage>70</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0034654316637199</pub-id></citation></ref>
<ref id="ref34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garrett</surname> <given-names>R.</given-names></name> <name><surname>Steinberg</surname> <given-names>M. P.</given-names></name></person-group> (<year>2015</year>). <article-title>Examining teacher effectiveness using classroom observation scores: evidence from the randomization of teachers to students</article-title>. <source>Educ. Eval. Policy Anal.</source> <volume>37</volume>, <fpage>224</fpage>&#x2013;<lpage>242</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0162373714537551</pub-id></citation></ref>
<ref id="ref35"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Gates</surname> <given-names>B.</given-names></name> <name><surname>Gates</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). 10 tough questions we get asked (2018 annual letter). Available at: <ext-link xlink:href="https://www.gatesnotes.com/2018-Annual-Letter" ext-link-type="uri">https://www.gatesnotes.com/2018-Annual-Letter</ext-link></citation></ref>
<ref id="ref36"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>Bell</surname> <given-names>C. A.</given-names></name></person-group> (<year>2013</year>). &#x201C;<article-title>Evaluating teaching and teachers</article-title>&#x201D; in <source>APA handbook of testing and assessment in psychology</source>. ed. <person-group person-group-type="editor"><name><surname>Geisinger</surname> <given-names>K. F.</given-names></name></person-group>, vol. <volume>3</volume> (<publisher-loc>Washington, DC</publisher-loc>: <publisher-name>American Psychological Association</publisher-name>), <fpage>415</fpage>&#x2013;<lpage>444</lpage>.</citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>Bell</surname> <given-names>C. A.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name> <name><surname>McCaffrey</surname> <given-names>D. F.</given-names></name> <name><surname>Hamre</surname> <given-names>B. K.</given-names></name> <name><surname>Pianta</surname> <given-names>R. C.</given-names></name></person-group> (<year>2014</year>). <article-title>The instructional challenge in improving teaching quality: lessons from a classroom observation protocol</article-title>. <source>Teach. Coll. Rec.</source> <volume>116</volume>, <fpage>1</fpage>&#x2013;<lpage>32</lpage>. doi: <pub-id pub-id-type="doi">10.1177/016146811411600607</pub-id></citation></ref>
<ref id="ref38"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>Iwatani</surname> <given-names>E.</given-names></name></person-group> (<year>2022</year>). &#x201C;<article-title>Fairness and assessment: engaging psychometric and racial justice perspectives</article-title>&#x201D; in <source>Race and culturally responsive inquiry in education: Improving research, evaluation, and assessment</source>. eds. <person-group person-group-type="editor"><name><surname>Hood</surname> <given-names>S. L.</given-names></name> <name><surname>Frierson</surname> <given-names>H. T.</given-names></name> <name><surname>Hopson</surname> <given-names>R. K.</given-names></name> <name><surname>Arbuthnot</surname> <given-names>K. N.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Harvard Education Press</publisher-name>).</citation></ref>
<ref id="ref39"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>Marshall</surname> <given-names>B.</given-names></name></person-group> (<year>in press</year>). &#x201C;<article-title>The bold and unfulfilled promises of teacher evaluation as policy</article-title>&#x201D; in <source>Handbook of education policy research</source>. eds. <person-group person-group-type="editor"><name><surname>Cohen-Vogel</surname> <given-names>L.</given-names></name> <name><surname>Scott</surname> <given-names>J.</given-names></name> <name><surname>Youngs</surname> <given-names>P.</given-names></name></person-group> (<publisher-loc>Washington, DC</publisher-loc>: <publisher-name>American Educational Research Association</publisher-name>).</citation></ref>
<ref id="ref40"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Goe</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <source>The link between teacher quality and student outcomes: a research synthesis</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>National Comprehensive Center for Teacher Quality</publisher-name>.</citation></ref>
<ref id="ref41"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Goldhaber</surname> <given-names>D.</given-names></name> <name><surname>Theobold</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <source>Do different value-added models tell us the same things?</source> <publisher-loc>Stanford, CA</publisher-loc>: <publisher-name>Carnegie Knowledge Network</publisher-name>.</citation></ref>
<ref id="ref42"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gordon</surname> <given-names>R.</given-names></name> <name><surname>Kane</surname> <given-names>T. J.</given-names></name> <name><surname>Staiger</surname> <given-names>D. O.</given-names></name></person-group> (<year>2006</year>). <source>Identifying effective teachers using performance on the job (the Hamilton project discussion paper 2006&#x2013;01)</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>The Brookings Institution</publisher-name>.</citation></ref>
<ref id="ref43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grissom</surname> <given-names>J. A.</given-names></name> <name><surname>Bartanen</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>Strategic retention: principal effectiveness and teacher turnover in multiple-measure teacher evaluation systems</article-title>. <source>Am. Educ. Res. J.</source> <volume>56</volume>, <fpage>514</fpage>&#x2013;<lpage>555</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0002831218797931</pub-id></citation></ref>
<ref id="ref44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grissom</surname> <given-names>J. A.</given-names></name> <name><surname>Loeb</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Assessing principals&#x2019; assessments: subjective evaluations of teacher effectiveness in low- and high-stakes environments</article-title>. <source>Educ. Finance Policy</source> <volume>12</volume>, <fpage>369</fpage>&#x2013;<lpage>395</lpage>. doi: <pub-id pub-id-type="doi">10.1162/EDFP_a_00210</pub-id></citation></ref>
<ref id="ref45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossman</surname> <given-names>P.</given-names></name> <name><surname>Cohen</surname> <given-names>J.</given-names></name> <name><surname>Ronfeldt</surname> <given-names>M.</given-names></name> <name><surname>Brown</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <article-title>The test matters: the relationship between classroom observation scores and teacher value added on multiple types of assessment</article-title>. <source>Educ. Res.</source> <volume>43</volume>, <fpage>293</fpage>&#x2013;<lpage>303</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X14544542</pub-id></citation></ref>
<ref id="ref46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hand</surname> <given-names>V.</given-names></name> <name><surname>Penuel</surname> <given-names>W. R.</given-names></name> <name><surname>Guti&#x00E9;rrez</surname> <given-names>K. D.</given-names></name></person-group> (<year>2012</year>). <article-title>(Re)framing educational possibility: attending to power and equity in shaping access to and within learning opportunities</article-title>. <source>Hum. Dev.</source> <volume>55</volume>, <fpage>250</fpage>&#x2013;<lpage>268</lpage>. doi: <pub-id pub-id-type="doi">10.1159/000345313</pub-id></citation></ref>
<ref id="ref47"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Harris</surname> <given-names>D. N.</given-names></name></person-group> (<year>2011</year>). <source>Value-added measures in education: What every educator needs to know</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Harvard Education Press</publisher-name>.</citation></ref>
<ref id="ref48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harris</surname> <given-names>D. N.</given-names></name> <name><surname>Ingle</surname> <given-names>W. K.</given-names></name> <name><surname>Rutledge</surname> <given-names>S. A.</given-names></name></person-group> (<year>2014</year>). <article-title>How teacher evaluation methods matter for accountability: a comparative analysis of teacher effectiveness ratings by principals and teacher value-added measures</article-title>. <source>Am. Educ. Res. J.</source> <volume>51</volume>, <fpage>73</fpage>&#x2013;<lpage>112</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0002831213517130</pub-id></citation></ref>
<ref id="ref49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harris</surname> <given-names>D. N.</given-names></name> <name><surname>Sass</surname> <given-names>T. R.</given-names></name></person-group> (<year>2011</year>). <article-title>Teacher training, teacher quality and student achievement</article-title>. <source>J. Public Econ.</source> <volume>95</volume>, <fpage>798</fpage>&#x2013;<lpage>812</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jpubeco.2010.11.009</pub-id></citation></ref>
<ref id="ref50"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Heneman</surname> <given-names>H. G.</given-names></name> <name><surname>Milanowski</surname> <given-names>A.</given-names></name> <name><surname>Kimball</surname> <given-names>S. M.</given-names></name> <name><surname>Odden</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <source>Standards-based teacher evaluation as a Foundation for Knowledge- and Skill-based pay (CPRE policy brief RB-45)</source>. <publisher-loc>Philadelphia, PA</publisher-loc>: <publisher-name>Consortium for Policy Research in Education</publisher-name>.</citation></ref>
<ref id="ref51"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Ho</surname> <given-names>A. D.</given-names></name> <name><surname>Kane</surname> <given-names>T. J.</given-names></name></person-group> (<year>2013</year>). <source>The reliability of classroom observations by school personnel (MET project research paper)</source>. <publisher-loc>Seattle, WA</publisher-loc>: <publisher-name>Bill and Melinda Gates Foundation</publisher-name>.</citation></ref>
<ref id="ref52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jacob</surname> <given-names>B. A.</given-names></name> <name><surname>Walsh</surname> <given-names>E.</given-names></name></person-group> (<year>2011</year>). <article-title>What's in a rating?</article-title> <source>Econ. Educ. Rev.</source> <volume>30</volume>, <fpage>434</fpage>&#x2013;<lpage>448</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.econedurev.2010.12.009</pub-id></citation></ref>
<ref id="ref53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>James</surname> <given-names>J.</given-names></name> <name><surname>Wyckoff</surname> <given-names>J. H.</given-names></name></person-group> (<year>2020</year>). <article-title>Teacher evaluation and teacher turnover in equilibrium: evidence from DC public schools</article-title>. <source>AERA Open</source> <volume>6</volume>, <fpage>1</fpage>&#x2013;<lpage>21</lpage>. doi: <pub-id pub-id-type="doi">10.1177/2332858420932235</pub-id></citation></ref>
<ref id="ref54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>S. M.</given-names></name> <name><surname>Kardos</surname> <given-names>S. M.</given-names></name> <name><surname>Kauffman</surname> <given-names>D.</given-names></name> <name><surname>Liu</surname> <given-names>E.</given-names></name> <name><surname>Donaldson</surname> <given-names>M. L.</given-names></name></person-group> (<year>2004</year>). <article-title>The support gap: new teachers&#x2019; early experiences in high-income and low-income schools</article-title>. <source>Educ. Policy Analysis Archives</source> <volume>12</volume>:<fpage>61</fpage>. doi: <pub-id pub-id-type="doi">10.14507/epaa.v12n61.2004</pub-id></citation></ref>
<ref id="ref55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>N. D.</given-names></name> <name><surname>Bell</surname> <given-names>C. A.</given-names></name> <name><surname>Brownell</surname> <given-names>M.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name> <name><surname>Peyton</surname> <given-names>D.</given-names></name> <name><surname>Pua</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Using classroom observations in the evaluation of special education teachers</article-title>. <source>Educ. Eval. Policy Anal.</source> <volume>44</volume>, <fpage>429</fpage>&#x2013;<lpage>457</lpage>. doi: <pub-id pub-id-type="doi">10.3102/01623737211068523</pub-id></citation></ref>
<ref id="ref56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kane</surname> <given-names>T. J.</given-names></name> <name><surname>Rockoff</surname> <given-names>J. E.</given-names></name> <name><surname>Staiger</surname> <given-names>D. O.</given-names></name></person-group> (<year>2008</year>). <article-title>What does certification tell us about teacher effectiveness? Evidence from New York City</article-title>. <source>Econ. Educ. Rev.</source> <volume>27</volume>, <fpage>615</fpage>&#x2013;<lpage>631</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.econedurev.2007.05.005</pub-id></citation></ref>
<ref id="ref57"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kane</surname> <given-names>T. J.</given-names></name> <name><surname>Taylor</surname> <given-names>E. S.</given-names></name> <name><surname>Tyler</surname> <given-names>J. H.</given-names></name> <name><surname>Wooten</surname> <given-names>A. L.</given-names></name></person-group> (<year>2010</year>). <source>Identifying effective classroom practices using student achievement data</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>National Bureau of Economic Research.</publisher-name></citation></ref>
<ref id="ref58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kardos</surname> <given-names>S. M.</given-names></name> <name><surname>Johnson</surname> <given-names>S. M.</given-names></name></person-group> (<year>2007</year>). <article-title>On their own and presumed expert: new teachers&#x2019; experience with their colleagues</article-title>. <source>Teach. Coll. Rec.</source> <volume>109</volume>, <fpage>2083</fpage>&#x2013;<lpage>2106</lpage>. doi: <pub-id pub-id-type="doi">10.1177/016146810710900903</pub-id></citation></ref>
<ref id="ref59"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Katz</surname> <given-names>M. B.</given-names></name> <name><surname>Rose</surname> <given-names>M.</given-names></name></person-group> (Eds.) (<year>2013</year>). <source>Public education under siege</source>. <publisher-loc>Philadelphia, PA</publisher-loc>: <publisher-name>University of Pennsylvania Press.</publisher-name></citation></ref>
<ref id="ref60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koedel</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>An empirical analysis of teacher spillover effects in secondary school</article-title>. <source>Econ. Educ. Rev.</source> <volume>28</volume>, <fpage>682</fpage>&#x2013;<lpage>692</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.econedurev.2009.02.003</pub-id></citation></ref>
<ref id="ref61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kraft</surname> <given-names>M. A.</given-names></name> <name><surname>Gilmour</surname> <given-names>A. F.</given-names></name></person-group> (<year>2017</year>). <article-title>Revisiting <italic>the widget effect</italic>: teacher evaluation reforms and the distribution of teacher effectiveness</article-title>. <source>Educ. Res.</source> <volume>46</volume>, <fpage>234</fpage>&#x2013;<lpage>249</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X17718797</pub-id></citation></ref>
<ref id="ref62"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Ladson-Billings</surname> <given-names>G.</given-names></name></person-group> (<year>2009</year>). <source>The Dreamkeepers: Successful teachers of African American children</source>. <edition>2nd</edition> <italic>edn.</italic> <publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Jossey-Bass</publisher-name>.</citation></ref>
<ref id="ref63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>J.</given-names></name> <name><surname>Reeves</surname> <given-names>T.</given-names></name></person-group> (<year>2012</year>). <article-title>Revisiting the impact of NCLB high-stakes school accountability, capacity, and resources: state NAEP 1990&#x2013;2009 reading and math achievement gaps and trends</article-title>. <source>Educ. Eval. Policy Anal.</source> <volume>34</volume>, <fpage>209</fpage>&#x2013;<lpage>231</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0162373711431604</pub-id></citation></ref>
<ref id="ref64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lockwood</surname> <given-names>J. R.</given-names></name> <name><surname>Castellano</surname> <given-names>K. E.</given-names></name></person-group> (<year>2017</year>). <article-title>Estimating true student growth percentile distributions using latent regression multidimensional IRT models</article-title>. <source>Educ. Psychol. Meas.</source> <volume>77</volume>, <fpage>917</fpage>&#x2013;<lpage>944</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0013164416659686</pub-id>, PMID: <pub-id pub-id-type="pmid">29795939</pub-id></citation></ref>
<ref id="ref65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lockwood</surname> <given-names>J. R.</given-names></name> <name><surname>McCaffrey</surname> <given-names>D. F.</given-names></name> <name><surname>Hamilton</surname> <given-names>L. S.</given-names></name> <name><surname>Stecher</surname> <given-names>B.</given-names></name> <name><surname>Le</surname> <given-names>V.-N.</given-names></name> <name><surname>Mart&#x00ED;nez</surname> <given-names>J. F.</given-names></name></person-group> (<year>2007</year>). <article-title>The sensitivity of value-added teacher effect estimates to different mathematics achievement measures</article-title>. <source>J. Educ. Meas.</source> <volume>44</volume>, <fpage>47</fpage>&#x2013;<lpage>67</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1745-3984.2007.00026.x</pub-id></citation></ref>
<ref id="ref66"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Loveless</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <source>Between the state and the schoolhouse: understanding the failure of Common Core</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Harvard Education Press</publisher-name>.</citation></ref>
<ref id="ref67"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Maranto</surname> <given-names>R.</given-names></name> <name><surname>McShane</surname> <given-names>M. Q.</given-names></name> <name><surname>Rhinesmith</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <source>Education reform in the Obama era: the second term and the 2016 election</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Palgrave Macmillan.</publisher-name></citation></ref>
<ref id="ref68"><citation citation-type="book"><person-group person-group-type="author"><name><surname>McDonald</surname> <given-names>S.-K.</given-names></name></person-group> (<year>2009</year>). &#x201C;<article-title>Scale-up as a framework for intervention, program, and policy evaluation research</article-title>&#x201D; in <source>Handbook of education policy research</source>. eds. <person-group person-group-type="editor"><name><surname>Sykes</surname> <given-names>G.</given-names></name> <name><surname>Schneider</surname> <given-names>B.</given-names></name> <name><surname>Plank</surname> <given-names>D. N.</given-names></name></person-group> (<publisher-loc>Washington, DC</publisher-loc>: <publisher-name>American Educational Research Association</publisher-name>), <fpage>191</fpage>&#x2013;<lpage>208</lpage>.</citation></ref>
<ref id="ref69"><citation citation-type="book"><person-group person-group-type="author"><name><surname>McDonnell</surname> <given-names>L. M.</given-names></name> <name><surname>Weatherford</surname> <given-names>M. S.</given-names></name></person-group> (<year>2020</year>). <source>Evidence, politics, and education policy</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Harvard University Press</publisher-name>.</citation></ref>
<ref id="ref70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Milanowski</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>The relationship between teacher performance evaluation scores and student achievement: evidence from Cincinnati</article-title>. <source>Peabody J. Educ.</source> <volume>79</volume>, <fpage>33</fpage>&#x2013;<lpage>53</lpage>. doi: <pub-id pub-id-type="doi">10.1207/s15327930pje7904_3</pub-id></citation></ref>
<ref id="ref71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murnane</surname> <given-names>R. J.</given-names></name> <name><surname>Phillips</surname> <given-names>B. R.</given-names></name></person-group> (<year>1981</year>). <article-title>What do effective teachers of inner-city children have in common?</article-title> <source>Soc. Sci. Res.</source> <volume>10</volume>, <fpage>83</fpage>&#x2013;<lpage>100</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0049-089X(81)90007-7</pub-id></citation></ref>
<ref id="ref72"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Nasir</surname> <given-names>N. S.</given-names></name> <name><surname>Scott</surname> <given-names>J.</given-names></name> <name><surname>Trujillo</surname> <given-names>T.</given-names></name> <name><surname>Hern&#x00E1;ndez</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>The sociopolitical context of teaching</article-title>&#x201D; in <source>Handbook of research on teaching</source>. eds. <person-group person-group-type="editor"><name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>Bell</surname> <given-names>C. A.</given-names></name></person-group> (<publisher-loc>Washington, DC</publisher-loc>: <publisher-name>American Educational Research Association</publisher-name>), <fpage>349</fpage>&#x2013;<lpage>390</lpage>.</citation></ref>
<ref id="ref73"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll6">National Association of Secondary School Principals</collab></person-group>. (<year>2019</year>). <source>Value-added measures in teacher evaluation (NASSP position statement)</source>. <comment>Available at: </comment><ext-link xlink:href="https://www.nassp.org/top-issues-in-education/position-statements/value-added-measures-in-teacher-evaluation/" ext-link-type="uri">https://www.nassp.org/top-issues-in-education/position-statements/value-added-measures-in-teacher-evaluation/</ext-link></citation></ref>
<ref id="ref74"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll7">National Commission on Excellence in Education</collab></person-group>. (<year>1983</year>). <source>A nation at risk: The imperative for educational reform</source>. <comment>Available at: </comment><ext-link xlink:href="https://edreform.com/wp-content/uploads/2013/02/A_Nation_At_Risk_1983.pdf" ext-link-type="uri">https://edreform.com/wp-content/uploads/2013/02/A_Nation_At_Risk<bold><italic>_</italic></bold>1983.pdf</ext-link></citation></ref>
<ref id="ref75"><citation citation-type="book"><person-group person-group-type="author"><collab id="coll8">National Research Council</collab></person-group>. (<year>2008</year>). <source>Assessing accomplished teaching: advanced-level certification programs</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>The National Academies Press.</publisher-name></citation></ref>
<ref id="ref76"><citation citation-type="book"><person-group person-group-type="author"><collab id="coll9">National Research Council</collab></person-group>. (<year>2015</year>). <source>An evaluation of the public schools of the District of Columbia: Reform in a changing landscape</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>The National Academies Press.</publisher-name></citation></ref>
<ref id="ref77"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Nguyen</surname> <given-names>T. D.</given-names></name> <name><surname>Pham</surname> <given-names>L.</given-names></name> <name><surname>Springer</surname> <given-names>M.</given-names></name> <name><surname>Crouch</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <source>The factors of teacher attrition and retention: an updated and expanded meta-analysis of the literature</source>. (Ed Working Paper No. 19-149) <publisher-loc>Providence, RI</publisher-loc>: <publisher-name>Annenberg Institute at Brown University.</publisher-name></citation></ref>
<ref id="ref78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nye</surname> <given-names>B.</given-names></name> <name><surname>Konstantopoulos</surname> <given-names>S.</given-names></name> <name><surname>Hedges</surname> <given-names>L. V.</given-names></name></person-group> (<year>2004</year>). <article-title>How large are teacher effects?</article-title> <source>Educ. Eval. Policy Anal.</source> <volume>26</volume>, <fpage>237</fpage>&#x2013;<lpage>257</lpage>. doi: <pub-id pub-id-type="doi">10.3102/01623737026003237</pub-id></citation></ref>
<ref id="ref79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paige</surname> <given-names>M. A.</given-names></name> <name><surname>Amrein-Beardsley</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x201C;Houston, we have a lawsuit&#x201D;: a cautionary tale for the implementation of value-added models for high-stakes employment decisions</article-title>. <source>Educ. Res.</source> <volume>49</volume>, <fpage>350</fpage>&#x2013;<lpage>359</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X20923046</pub-id></citation></ref>
<ref id="ref80"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Pianta</surname> <given-names>R. C.</given-names></name> <name><surname>La Paro</surname> <given-names>K. M.</given-names></name> <name><surname>Hamre</surname> <given-names>B. K.</given-names></name></person-group> (<year>2008</year>). <source>Classroom assessment scoring system (CLASS)</source>. <publisher-loc>Baltimore, MD</publisher-loc>: <publisher-name>Paul H. Brookes</publisher-name>.</citation></ref>
<ref id="ref81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Polikoff</surname> <given-names>M. S.</given-names></name> <name><surname>McEachin</surname> <given-names>A. J.</given-names></name> <name><surname>Wrabel</surname> <given-names>S. L.</given-names></name> <name><surname>Duque</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>The waive of the future? School accountability in the waiver era</article-title>. <source>Educ. Res.</source> <volume>43</volume>, <fpage>45</fpage>&#x2013;<lpage>54</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X13517137</pub-id></citation></ref>
<ref id="ref82"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Raudenbush</surname> <given-names>S. W.</given-names></name></person-group> (<year>2013</year>). <source>What do we know about using value-added to compare teachers who work in different schools?</source> <publisher-loc>Stanford, CA</publisher-loc>: <publisher-name>Carnegie Knowledge Network.</publisher-name></citation></ref>
<ref id="ref83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reardon</surname> <given-names>S. F.</given-names></name> <name><surname>Raudenbush</surname> <given-names>S. W.</given-names></name></person-group> (<year>2009</year>). <article-title>Assumptions of value-added models for estimating school effects</article-title>. <source>Educ. Finance Policy</source> <volume>4</volume>, <fpage>492</fpage>&#x2013;<lpage>519</lpage>. doi: <pub-id pub-id-type="doi">10.1162/edfp.2009.4.4.492</pub-id></citation></ref>
<ref id="ref85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reckhow</surname> <given-names>S.</given-names></name> <name><surname>Tompkins-Stange</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Financing the education policy discourse: philanthropic funders as entrepreneurs in policy networks</article-title>. <source>Interest Groups Advoc.</source> <volume>7</volume>, <fpage>258</fpage>&#x2013;<lpage>288</lpage>. doi: <pub-id pub-id-type="doi">10.1057/s41309-018-0043-3</pub-id></citation></ref>
<ref id="ref86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reckhow</surname> <given-names>S.</given-names></name> <name><surname>Tompkins-Stange</surname> <given-names>M.</given-names></name> <name><surname>Galey-Horn</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>How the political economy of knowledge production shapes education policy: the case of teacher evaluation in federal policy discourse</article-title>. <source>Educ. Eval. Policy Anal.</source> <volume>43</volume>, <fpage>472</fpage>&#x2013;<lpage>494</lpage>. doi: <pub-id pub-id-type="doi">10.3102/01623737211003906</pub-id></citation></ref>
<ref id="ref87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rockoff</surname> <given-names>J. E.</given-names></name> <name><surname>Jacob</surname> <given-names>B. A.</given-names></name> <name><surname>Kane</surname> <given-names>T. J.</given-names></name> <name><surname>Staiger</surname> <given-names>D. O.</given-names></name></person-group> (<year>2011</year>). <article-title>Can you recognize an effective teacher when you recruit one?</article-title> <source>Educ. Finance Policy.</source> <volume>6</volume>, <fpage>43</fpage>&#x2013;<lpage>74</lpage>. doi: <pub-id pub-id-type="doi">10.1162/EDFP_a_00022</pub-id></citation></ref>
<ref id="ref88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ronfeldt</surname> <given-names>M.</given-names></name> <name><surname>Kwok</surname> <given-names>A.</given-names></name> <name><surname>Reininger</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Teachers&#x2019; preferences to teach underserved students</article-title>. <source>Urban Educ.</source> <volume>51</volume>, <fpage>995</fpage>&#x2013;<lpage>1030</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0042085914553676</pub-id></citation></ref>
<ref id="ref89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rothstein</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <article-title>Student sorting and bias in value-added estimation: selection on observables and unobservables</article-title>. <source>Educ. Finance Policy</source> <volume>4</volume>, <fpage>537</fpage>&#x2013;<lpage>571</lpage>. doi: <pub-id pub-id-type="doi">10.1162/edfp.2009.4.4.537</pub-id></citation></ref>
<ref id="ref90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rothstein</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Measuring the impacts of teachers: comment</article-title>. <source>Am. Econ. Rev.</source> <volume>107</volume>, <fpage>1656</fpage>&#x2013;<lpage>1684</lpage>. doi: <pub-id pub-id-type="doi">10.1257/aer.20141440</pub-id></citation></ref>
<ref id="ref91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rowan</surname> <given-names>B.</given-names></name> <name><surname>Camburn</surname> <given-names>E.</given-names></name> <name><surname>Correnti</surname> <given-names>R.</given-names></name></person-group> (<year>2004</year>). <article-title>Using teacher logs to measure the enacted curriculum: a study of literacy teaching in third-grade classrooms</article-title>. <source>Elem. Sch. J.</source> <volume>105</volume>, <fpage>75</fpage>&#x2013;<lpage>101</lpage>. doi: <pub-id pub-id-type="doi">10.1086/428803</pub-id></citation></ref>
<ref id="ref92"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rowan</surname> <given-names>B.</given-names></name> <name><surname>Correnti</surname> <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>Measuring reading instruction with teacher logs</article-title>. <source>Educ. Res.</source> <volume>38</volume>, <fpage>549</fpage>&#x2013;<lpage>551</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0013189X09349313</pub-id></citation></ref>
<ref id="ref93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rowan</surname> <given-names>B.</given-names></name> <name><surname>Jacob</surname> <given-names>R.</given-names></name> <name><surname>Correnti</surname> <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>Using instructional logs to identify quality in educational settings</article-title>. <source>New Dir. Youth Dev.</source> <volume>2009</volume>, <fpage>13</fpage>&#x2013;<lpage>31</lpage>. doi: <pub-id pub-id-type="doi">10.1002/yd.294</pub-id>, PMID: <pub-id pub-id-type="pmid">19358197</pub-id></citation></ref>
<ref id="ref94"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Rowan</surname> <given-names>B.</given-names></name> <name><surname>Raudenbush</surname> <given-names>S. W.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Teacher evaluation in American schools</article-title>&#x201D; in <source>Handbook of research on teaching</source>. eds. <person-group person-group-type="editor"><name><surname>Gitomer</surname> <given-names>D. H.</given-names></name> <name><surname>Bell</surname> <given-names>C. A.</given-names></name></person-group> (<publisher-loc>Washington, DC</publisher-loc>: <publisher-name>American Educational Research Association</publisher-name>), <fpage>1159</fpage>&#x2013;<lpage>1216</lpage>.</citation></ref>
<ref id="ref95"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salinas</surname> <given-names>C.</given-names><suffix>Jr.</suffix></name></person-group> (<year>2020</year>). <article-title>The complexity of the &#x201C;x&#x201D; in <italic>Latinx</italic>: how Latinx/a/o students relate to, identify with, and understand the term <italic>Latinx</italic></article-title>. <source>J. Hisp. High. Educ.</source> <volume>19</volume>, <fpage>149</fpage>&#x2013;<lpage>168</lpage>. doi: <pub-id pub-id-type="doi">10.1177/1538192719900382</pub-id></citation></ref>
<ref id="ref96"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salinas</surname> <given-names>C.</given-names><suffix>Jr.</suffix></name> <name><surname>Lozano</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Mapping and recontextualizing the evolution of the term <italic>Latinx</italic>: an environmental scanning in higher education</article-title>. <source>J. Latinos Educ.</source> <volume>18</volume>, <fpage>302</fpage>&#x2013;<lpage>315</lpage>. doi: <pub-id pub-id-type="doi">10.1080/15348431.2017.1390464</pub-id></citation></ref>
<ref id="ref97"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanders</surname> <given-names>W. L.</given-names></name> <name><surname>Horn</surname> <given-names>S. P.</given-names></name></person-group> (<year>1994</year>). <article-title>The Tennessee value-added assessment system (TVAAS): mixed-model methodology in educational assessment</article-title>. <source>J. Pers. Eval. Educ.</source> <volume>8</volume>, <fpage>299</fpage>&#x2013;<lpage>311</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF00973726</pub-id></citation></ref>
<ref id="ref98"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Sartain</surname> <given-names>L.</given-names></name> <name><surname>Stoelinga</surname> <given-names>S. R.</given-names></name> <name><surname>Krone</surname> <given-names>E.</given-names></name></person-group> (<year>2010</year>). <source>Rethinking teacher evaluation: findings from the first year of the excellence in teaching project in Chicago public schools</source>. <publisher-loc>Chicago, IL</publisher-loc>: <publisher-name>Consortium on Chicago School Research, University of Chicago</publisher-name>.</citation></ref>
<ref id="ref99"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Springer</surname> <given-names>M. G.</given-names></name> <name><surname>Swain</surname> <given-names>W. A.</given-names></name> <name><surname>Rodriguez</surname> <given-names>L. A.</given-names></name></person-group> (<year>2016</year>). <article-title>Effective teacher retention bonuses: evidence from Tennessee</article-title>. <source>Educ. Eval. Policy Anal.</source> <volume>38</volume>, <fpage>199</fpage>&#x2013;<lpage>221</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0162373715609687</pub-id></citation></ref>
<ref id="ref100"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stallings</surname> <given-names>D. T.</given-names></name></person-group> (<year>2002</year>). <article-title>A brief history of the U. S. Department of Education, 1979&#x2013;2002</article-title>. <source>Phi Delta Kappan</source> <volume>83</volume>, <fpage>677</fpage>&#x2013;<lpage>683</lpage>. doi: <pub-id pub-id-type="doi">10.1177/003172170208300910</pub-id></citation></ref>
<ref id="ref101"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Stecher</surname> <given-names>B. M.</given-names></name> <name><surname>Holtzman</surname> <given-names>D. J.</given-names></name> <name><surname>Garet</surname> <given-names>M. S.</given-names></name> <name><surname>Hamilton</surname> <given-names>L. S.</given-names></name> <name><surname>Engberg</surname> <given-names>J.</given-names></name> <name><surname>Steiner</surname> <given-names>E. D.</given-names></name> <etal/></person-group> (<year>2018</year>). <source>Improving teacher effectiveness: Final report: The intensive partnerships for effective teaching through 2015&#x2013;2016</source>. <publisher-loc>Santa Monica, CA</publisher-loc>: <publisher-name>RAND Corporation</publisher-name>.</citation></ref>
<ref id="ref102"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Steinberg</surname> <given-names>M. P.</given-names></name> <name><surname>Sartain</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). <article-title>What explains the race gap in teacher performance ratings? Evidence from Chicago public schools</article-title>. <source>Educ. Eval. Policy Anal.</source> <volume>43</volume>, <fpage>60</fpage>&#x2013;<lpage>82</lpage>. doi: <pub-id pub-id-type="doi">10.3102/0162373720970204</pub-id></citation></ref>
<ref id="ref103"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turkan</surname> <given-names>S.</given-names></name> <name><surname>Buzick</surname> <given-names>H. M.</given-names></name></person-group> (<year>2016</year>). <article-title>Complexities and issues to consider in the evaluation of content teachers of English language learners</article-title>. <source>Urban Educ.</source> <volume>51</volume>, <fpage>221</fpage>&#x2013;<lpage>248</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0042085914543111</pub-id></citation></ref>
<ref id="ref104"><citation citation-type="book"><person-group person-group-type="author"><collab id="coll10">U. S. Department of Education</collab></person-group> (<year>2009</year>). <source>Race to the top program: executive summary</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>U. S. Department of Education</publisher-name>.</citation></ref>
<ref id="ref105"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wayne</surname> <given-names>A. J.</given-names></name> <name><surname>Youngs</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <article-title>Teacher characteristics and student achievement gains: a review</article-title>. <source>Rev. Educ. Res.</source> <volume>73</volume>, <fpage>89</fpage>&#x2013;<lpage>122</lpage>. doi: <pub-id pub-id-type="doi">10.3102/00346543073001089</pub-id></citation></ref>
<ref id="ref106"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Weisberg</surname> <given-names>D.</given-names></name> <name><surname>Sexton</surname> <given-names>S.</given-names></name> <name><surname>Mulhern</surname> <given-names>J.</given-names></name> <name><surname>Keeling</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <source>The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>The New Teacher Project</publisher-name>.</citation></ref>
<ref id="ref107"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Williams</surname> <given-names>J. H.</given-names></name> <name><surname>Engel</surname> <given-names>L. C.</given-names></name></person-group> (<year>2012</year>). <article-title>How do other countries evaluate teachers?</article-title> <source>Phi Delta Kappan</source> <volume>94</volume>, <fpage>53</fpage>&#x2013;<lpage>57</lpage>. doi: <pub-id pub-id-type="doi">10.1177/003172171209400414</pub-id></citation></ref>
</ref-list>
<fn-group><fn id="fn0001"><p><sup>1</sup>Latin&#x002A; is a term that encompasses the fluidity of social identities. The asterisk acknowledges variation in self-identification among people of Latin American origin and the Latin American diaspora (<xref ref-type="bibr" rid="ref95">Salinas, 2020</xref>). Latin&#x002A; responds to the (mis)use of <italic>Latinx</italic>, a term reserved for gender-nonconforming people of Latin American origin and descent (<xref ref-type="bibr" rid="ref96">Salinas and Lozano, 2019</xref>).</p></fn>
</fn-group>
</back>
</article>