Specialty Grand Challenge
The Future of Assessment as a Human and Social Endeavor: Addressing the Inconvenient Truth of Error
Gavin T. L. Brown
Faculty of Education and Social Work, School of Learning, Development, and Professional Practice, The University of Auckland, Auckland, New Zealand
Assessment faces continuing challenges. These challenges arise predominantly from the inherent errors we make when designing, administering, analyzing, and interpreting assessments. A widely held assumption is that our psychometric methods lead to reliable and valid scores; however, this premise holds only if students exert full effort throughout a test event, do not cheat, and have had sufficient personal and environmental support to produce their best possible results (Dorans, 2012).
Inconveniently, research makes clear that cheating and lack of effort contaminate scores (Murdock et al., 2016; Wise and Smith, 2016). This is especially the case in low-stakes testing situations, such as institutional evaluations (Wise and Cotten, 2009), leading to inappropriate conclusions about the state of an organization or jurisdiction. Hence, while it is convenient to presume that statistical advances will account for such systematic sources of error, the reality is that much assessment takes place both “in vivo” and “in situ” during classroom activities (Zumbo, 2015). While psychometric methods work reasonably well in high-stakes examination or standardized testing contexts (i.e., “in vitro”), there is little guarantee that their assumptions hold for what happens in classrooms. The psychometric and testing industry therefore has much to do to develop methods for describing and accounting for the myriad complexities of classroom- and school-based dynamics.
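The size of this contamination is easy to demonstrate. The following sketch (with purely illustrative parameters, not empirical estimates from the literature) simulates a cohort in which a fraction of examinees disengage and respond at chance level, deflating the group mean that a low-stakes evaluation would report:

```python
import random

random.seed(42)

N_ITEMS = 40       # test length
P_TRUE = 0.75      # per-item success rate for an engaged examinee
P_GUESS = 0.25     # chance-level success rate for a disengaged examinee
N_STUDENTS = 1000

def score(p_correct):
    """Number-correct score for one examinee across N_ITEMS items."""
    return sum(random.random() < p_correct for _ in range(N_ITEMS))

def mean_score(disengaged_rate):
    """Cohort mean when `disengaged_rate` of students respond at chance."""
    total = 0
    for _ in range(N_STUDENTS):
        p = P_GUESS if random.random() < disengaged_rate else P_TRUE
        total += score(p)
    return total / N_STUDENTS

engaged = mean_score(0.0)     # full effort: cohort mean near 30 of 40
low_stakes = mean_score(0.2)  # 20% disengaged: mean drops noticeably
print(engaged, low_stakes)
```

Even this modest disengagement rate shifts the cohort mean downward by several points, precisely the kind of systematic error that an institutional evaluation could misread as weak instruction.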
This matters because a widespread policy framework of using assessment to guide or inform improvement (i.e., “assessment for learning” or “formative assessment”) requires teachers to assess students so as to identify the quality of student learning and appropriate changes to classroom practices. UK experts tend to argue that this can only be done through teacher–student interaction in the classroom or by involving students in the process of considering the merits of their own or peers’ work (Black et al., 2003; Harlen, 2007; Swaffield, 2011). Others consider that tests can contribute information about changes to teaching that lead to better learning outcomes, provided the tests go beyond rank order or total score reporting (Brown and Hattie, 2012) or if teachers spend time analyzing strengths and weaknesses (Carless, 2011).
Regardless of the assessment method, it is very difficult for pre-service teachers to learn how to assess formatively (Hill and Eyers, 2016). Indeed, even practicing teachers need expertise in curriculum and pedagogy to command multiple methods of assessment in ways that help all learners overcome the sometimes idiosyncratic challenges they face (Cowie and Harrison, 2016; Moon, 2016). Teachers in New Zealand and the Netherlands have learned to use achievement data to guide school-wide improvements, provided they receive expert support (Lai and Schildkamp, 2016). However, such efforts often take 2–3 years before changes appear in student performance. Thus, despite multiple studies showing that teachers believe in using assessment formatively (Barnes et al., 2015; Bonner, 2016), putting in place the policy and resources to support formative assessment is difficult; formative assessment is not a quick fix for improving outcomes for all learners.
The formative assessment policy agenda challenges the dominance of formal testing and teacher-centric methods of assessment, with expectations that effective learning takes place as students engage with learning targets, outcomes, or objectives, take ownership of their work, cooperate with peers, understand more deeply what quality is, and receive and generate appropriate feedback (Leahy et al., 2005). Inconveniently, involving students in assessment presents considerable challenges due to psychological and social factors that interfere with the student’s ability to accurately self-evaluate (Andrade and Brown, 2016) or to constructively peer evaluate and collaborate (Panadero, 2016; Strijbos, 2016). Indeed, evidence that student involvement in assessment develops self-regulatory abilities is weak (Dinsmore and Wilson, 2016). Feedback processes are complex, belying the simple notion that student “horses” will automatically learn once they are led to the “water” of feedback (Lipnevich et al., 2016). While novel assessment methods are being developed, especially through the introduction of ICT (Katz and Gorin, 2016), students are not necessarily enthusiastic about new ways of being assessed, for fear their performance will suffer (Struyven and Devesa, 2016).
A second widespread policy initiative is to use assessments, especially standardized tests, to evaluate teachers, schools, and systems (Lingard and Lewis, 2016; Teltemann and Klieme, 2016). It is clear that such policies tend to have a largely negative impact on the quality of teaching (Hamilton, 2003; Nichols and Harris, 2016), perhaps more so among minority and lower socio-economic communities. Nonetheless, public acceptance of the legitimacy of using assessment scores to ascertain quality in schooling is reasonably high (Buckendahl, 2016). Using tests to evaluate schools and teaching is a relatively quick and low-cost political process (Linn, 2000). However, summative accountability use of assessments creates tensions for teachers (Bonner, 2016), with many teachers in high-stakes accountability environments holding very negative views of such uses (Deneen and Brown, 2016). Using assessments formatively requires discovering what students have “failed” to be good at, so as to inform further instruction (Hattie and Brown, 2008). This implies that a formative assessment ought to reveal lack of success, a problematic event if external accountability consequences are attached to the same result. Thus, if consequences for low scores are seen as unfair, it is not surprising that teachers use multiple methods to ensure that scores increase. If accountability assessment scores are inflated through construct-irrelevant processes, then the meaning of an accountability assessment is problematic.
The choice of policy priorities within different jurisdictions strongly shapes the nature and power of assessment practices. For example, Arabic- and Chinese-language societies strongly prioritize memorization of content as the dominant model of schooling and attach substantial social and economic benefits to successful performance on formal examinations (Hargreaves, 1997; OECD, 2011; Gebril, 2016; Kennedy, 2016). Anglo-Commonwealth countries strongly prioritize a child-centered, student-involved approach (Stobart, 2006), in which interactive teacher assessment practices have been prioritized as means of improving learning outcomes (Black and Wiliam, 1998). The United States has strong legal protection for special needs students (IDEA, 1997), who are entitled to differentiated assessment and evaluation practices (Tomlinson, 1999). These differences in social uses and styles of assessment complicate the meaning of a grade or score and create challenges for psychometric models that attempt to create universal explanations of performance.
Within societies that are highly homogeneous in terms of ethnic and linguistic make-up (e.g., Finland, Japan, China), it may be reasonable to expect that common psychological and social factors influence assessment. This simplifies predicting and modeling those factors. However, when comparisons are made among culturally distinct groups in multicultural societies, which is more the case in economically developed societies and nations (Van de Vijver, 2016), the psychological factors influencing student response, teacher judgments, or test performance can vary significantly. For example, tendencies to self-effacement or self-enhancement are not equal across cultural groups (Suzuki et al., 2008), so the meaning of self-assessment has to be carefully evaluated (e.g., among collectivist groups, modest self-reporting enhances group belongingness). In multicultural contexts, assessments that depend on classroom interactions between and among students and teachers are likely to be impacted by these differing cultural standards as to the best way to communicate an evaluation of work. The capacity of teachers to appropriately collect, analyze, and plan in response to both formal and informal assessment data is generally weak (Xu and Brown, 2016). Quite prolonged and intensive professional development is needed to generate “assessment capable” teachers (Smith et al., 2014). Thus, assessors and assessments are challenged by the varying and subtle differences created by cultural difference.
Even the introduction of technological solutions that increase the authenticity, diversity, and efficiency of formal testing (Csapó et al., 2012; Katz and Gorin, 2016) does not necessarily improve student performance or solve problems in scoring. Students’ enthusiasm for a computerized activity does not automatically lead to valid conclusions about their proficiency. Students are often concerned that novel assessment practices (including peer assessment, self-assessment, portfolio, performance, or computer-based assessments) will have negative impacts on their performance simply because they are unsure how well they will do on a new method of evaluation (Struyven and Devesa, 2016). Consequently, students tend to retreat into a strong preference for conventional assessment practices (e.g., essays or multiple-choice questions). Furthermore, technology now permits data sharing and long-term tracking of student performance, which ought to improve our understanding of where and how students are improving. However, the existence of these electronic data raises concerns about privacy and protection; imagine the possible negative implications if early poor performance is kept on record and used in evaluative decision-making, despite substantial subsequent progress (Tierney and Koch, 2016).
Thus, inconveniently, the field of testing, applied psychometrics, measurement, and assessment is faced with complex problems, which are not restricted to any one form of assessment or any one society in which assessment is deployed. The inconveniences outlined here apply especially if we accept that the goal of assessment is to inform improvement and make valid decisions about learners and teachers. The need for accurate diagnostic prescriptions that teachers, students, and/or parents could use to inform improvement is paramount. These prescriptions need to be generated close to, and be responsive to, the real-time processes of classroom learning and teaching, which is a substantial problem. The great contribution of psychometrics to the field of education has been an explicit attention to the problem of error in all testing, measurement, and assessment processes. However, few tools are currently available to robustly estimate and account for the kinds of error that occur in real-time classroom observations, interactions, and interpretations. The inconvenient challenge for educators who would minimize the role assessment plays in curriculum is that high-quality tests and measurements are necessary for justice, fairness, and the well-being of individuals and society. The inconvenient challenge for policy makers is that many assessment processes are not reliable or dependable (e.g., essay examinations; Brown, 2010), nor do they account well for the many factors outlined here. Thus, many policy decisions based on inadequate tools or processes are invalid.
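Psychometrics’ attention to error can be stated in classical test theory terms: an observed score X is a true score T plus error E, and reliability is the proportion of observed-score variance attributable to true scores. A minimal simulation (the variance values are illustrative assumptions only) makes the decomposition concrete:

```python
import random

random.seed(0)

N = 5000
TRUE_SD = 10.0   # spread of true proficiency across examinees
ERROR_SD = 5.0   # per-occasion measurement error

# Classical test theory: observed score X = true score T + error E
true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
observed = [t + random.gauss(0, ERROR_SD) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Reliability = true-score variance / observed-score variance,
# which here should land near 100 / (100 + 25) = 0.8.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))
```

The same logic explains why noisy, one-off classroom judgments are weak grounds for high-stakes decisions: as error variance grows relative to true-score variance, reliability, and hence the defensibility of score-based decisions, falls.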
The future of assessment requires that we no longer ignore these inconvenient problems facing assessment, testing, and applied measurement. Rather, assessment has to turn constructively to deeply insightful investigations into these perennial problems. Teachers and students need to know where learning is and what comes next. Policy makers and parents have a right to know what is working, who is learning, who needs help, what needs to change, and so on. Assessment and testing are how we as humans discover the answers to these questions. Hence, good schooling and good education need good testing and assessment, in the sense both of being high quality and of being rightly done.
Leaning heavily on validity theory (Messick, 1989; Kane, 2006), good assessment leads to defensible interpretations and actions. These uses depend on robust arguments based on relevant theories of curriculum, teaching, learning, and measurement and on trustworthy empirical evidence that has been subjected to scrutiny (i.e., statistical and/or social moderation). The need to bring greater skill and insight into assessments that inform classroom practice is essential. The success of the whole superstructure of schooling relies on the quality of judgments and evaluations carried out in the millions of classrooms of the world on an everyday basis. If this work is not done well, and if we do not know that it is not done well, we fail.
Hence, work that engages with the difficult challenges of how assessment can help education, while also making a credible case for the scores or judgments assessments generate, needs to be reported. Leaving this only to educational statisticians would be a mistake. Testing and measurement need to integrate with classroom teaching, learning, and curriculum if they are to support schooling and prevent politicians from making simplistic but wrong interpretations and uses of assessment. This is the Grand Challenge for this Section of the journal Frontiers in Education. How can assessment be made flexible enough to support real learning in vivo, while fulfilling all the diverse expectations society has for it? As Section Editor, I look forward to your contributions.
Author Contributions
The author confirms being the sole contributor of this work and approved it for publication.
Conflict of Interest Statement
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This paper draws heavily on Brown and Harris (2016). An earlier version of this paper, presented as an inaugural professorial lecture at the Faculty of Education and Social Work, The University of Auckland, can be seen at doi: 10.17608/k6.auckland.4238792.v1.
Andrade, H. L., and Brown, G. T. L. (2016). “Student self-assessment in the classroom,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 319–334.
Barnes, N., Fives, H., and Dacey, C. M. (2015). “Teachers’ beliefs about assessment,” in International Handbook of Research on Teacher Beliefs, eds H. Fives and M. Gregoire Gill (New York: Routledge), 284–300.
Bonner, S. M. (2016). “Teachers’ perceptions about assessment: competing narratives,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 21–39.
Brown, G. T. L., and Harris, L. R. (2016). “The future of assessment research as a human and social endeavour,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 506–523.
Brown, G. T. L., and Hattie, J. A. (2012). “The benefits of regular standardized assessment in childhood education: Guiding improved instruction and learning,” in Contemporary Debates in Childhood Education and Development, eds S. Suggate and E. Reese (London: Routledge), 287–292.
Buckendahl, C. W. (2016). “Public perceptions about assessment in education,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 454–471.
Cowie, B., and Harrison, C. (2016). “Classroom processes that support effective assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 335–350.
Csapó, B., Ainley, J., Bennett, R. E., Latour, T., and Law, N. (2012). “Technological issues for computer-based assessment,” in Assessment and Teaching of 21st Century Skills, eds P. Griffin, B. McGaw, and E. Care (Dordrecht, NL: Springer), 143–230.
Dinsmore, D. L., and Wilson, H. E. (2016). “Student participation in assessment: does it influence self-regulation?” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 145–168.
Gebril, A. (2016). “Educational assessment in Muslim countries: values, policies, and practices,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 420–435.
Hattie, J. A., and Brown, G. T. L. (2008). Technology for school-based assessment and assessment for learning: development principles from New Zealand. J. Educ. Technol. Syst. 36, 189–201. doi:10.2190/ET.36.2.g
Hill, M. F., and Eyers, G. (2016). “Moving from student to teacher: changing perspectives about assessment through teacher education,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 57–76.
Katz, I. R., and Gorin, J. S. (2016). “Computerising assessment: impacts on education stakeholders,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 472–489.
Kennedy, K. J. (2016). “Exploring the influence of culture on assessment: the case of teachers’ conceptions of assessment in Confucian-Heritage Cultures,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 404–419.
Lai, M. K., and Schildkamp, K. (2016). “In-service teacher professional learning: use of assessment in data-based decision-making,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 77–94.
Lingard, B., and Lewis, S. (2016). “Globalization of the Anglo-American approach to top-down, test-based educational accountability,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 387–403.
Lipnevich, A. A., Berg, D. A. G., and Smith, J. K. (2016). “Toward a model of student response to feedback,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 169–185.
Moon, T. R. (2016). “Differentiated instruction and assessment: an approach to classroom assessment in conditions of student diversity,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 284–301.
Murdock, T. B., Stephens, J. M., and Grotewiel, M. M. (2016). “Student dishonesty in the face of assessment: who, why, and what we can do about it,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 186–203.
Nichols, S. L., and Harris, L. R. (2016). “Accountability assessment’s effects on teachers and schools,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 40–56.
Panadero, E. (2016). “Is it safe? Social, interpersonal, and human effects of peer assessment: a review and future directions,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 247–266.
Smith, L. F., Hill, M. F., Cowie, B., and Gilmore, A. (2014). “Preparing teachers to use the enabling power of assessment,” in Designing Assessment for Quality Learning, eds C. M. Wyatt-Smith, V. Klenowski, and P. Colbert (Dordrecht, NL: Springer), 303–323.
Struyven, K., and Devesa, J. (2016). “Students’ perceptions of novel forms of assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 129–144.
Suzuki, L. K., Davis, H. M., and Greenfield, P. M. (2008). Self-enhancement and self-effacement in reaction to praise and criticism: the case of multiethnic youth. Ethos 36, 78–97. doi:10.1111/j.1548-1352.2008.00005.x
Teltemann, J., and Klieme, E. (2016). “The impact of international testing projects on policy and practice,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 369–386.
Tierney, R. D., and Koch, M. J. (2016). “Privacy in classroom assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 267–283.
Van de Vijver, F. (2016). “Assessment in education in multicultural populations,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 436–453.
Wise, S. L., and Cotten, M. R. (2009). “Test-taking effort and score validity: the influence of student conceptions of assessment,” in Student Perspectives on Assessment: What Students Can Tell Us About Assessment for Learning, eds D. M. McInerney, G. T. L. Brown, and G. A. D. Liem (Charlotte, NC: Information Age Publishing), 187–205.
Wise, S. L., and Smith, L. F. (2016). “The validity of assessment when students don’t give good effort,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 204–220.
Zumbo, B. D. (2015). “Consequences, side effects and the ecology of testing: keys to considering assessment in vivo,” in Plenary Address to the 2015 Annual Conference of the Association for Educational Assessment—Europe (AEA-E), Glasgow, Scotland.
Keywords: assessment, psychometrics, classroom assessment, formative assessment, error, culture, social behavior, psychological tests
Citation: Brown GTL (2017) The Future of Assessment as a Human and Social Endeavor: Addressing the Inconvenient Truth of Error. Front. Educ. 2:3. doi: 10.3389/feduc.2017.00003
Received: 27 November 2016; Accepted: 30 January 2017;
Published: 13 February 2017
Edited by: Anastasiya A. Lipnevich, The City University of New York, USA
Reviewed by: Eva Marie Ingeborg Hartell, KTH Royal Institute of Technology, Sweden
Copyright: © 2017 Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Gavin T. L. Brown, email@example.com