Specialty Grand Challenge
The Future of Assessment as a Human and Social Endeavor: Addressing the Inconvenient Truth of Error
Gavin T. L. Brown
Faculty of Education and Social Work, School of Learning, Development, and Professional Practice, The University of Auckland, Auckland, New Zealand
Assessment faces continuing challenges. These challenges arise predominantly from the inherent errors we make when designing, administering, analyzing, and interpreting assessments. A widely held assumption is that our psychometric methods lead to reliable and valid scores; however, this premise holds only if students exert full effort throughout a test event, do not cheat, and have had sufficient personal and environmental support to produce their best possible results (Dorans, 2012).
Inconveniently, research makes clear that cheating and lack of effort contaminate scores (Murdock et al., 2016; Wise and Smith, 2016). This is especially the case in low-stakes testing situations, such as institutional evaluations (Wise and Cotten, 2009), leading to inappropriate conclusions about the state of an organization or jurisdiction. Hence, while it is convenient to presume that statistical advances will account for such systematic sources of error, the reality is that much assessment takes place both “in vivo” and “in situ” during classroom activities (Zumbo, 2015). While psychometric methods work reasonably well in high-stakes examination or standardized testing contexts (i.e., “in vitro”), there is little guarantee that their assumptions hold for what happens in classrooms. The psychometric and testing industry therefore has much to do to develop methods for describing and accounting for the myriad complexities of classroom- and school-based dynamics.
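The size of this contamination is easy to demonstrate. The following sketch (with purely illustrative parameters, not empirical estimates from the literature) simulates a cohort in which a fraction of examinees disengage and respond at chance level, deflating the group mean that a low-stakes evaluation would report:

```python
import random

random.seed(42)

N_ITEMS = 40       # test length
P_TRUE = 0.75      # per-item success rate for an engaged examinee
P_GUESS = 0.25     # chance-level success rate for a disengaged examinee
N_STUDENTS = 1000

def score(p_correct):
    """Number-correct score for one examinee across N_ITEMS items."""
    return sum(random.random() < p_correct for _ in range(N_ITEMS))

def mean_score(disengaged_rate):
    """Cohort mean when `disengaged_rate` of students respond at chance."""
    total = 0
    for _ in range(N_STUDENTS):
        p = P_GUESS if random.random() < disengaged_rate else P_TRUE
        total += score(p)
    return total / N_STUDENTS

engaged = mean_score(0.0)     # full effort: cohort mean near 30 of 40
low_stakes = mean_score(0.2)  # 20% disengaged: mean drops noticeably
print(engaged, low_stakes)
```

Even this modest disengagement rate shifts the cohort mean downward by several points, precisely the kind of systematic error that an institutional evaluation could misread as weak instruction.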
This matters because a widespread policy framework of using assessment to guide or inform improvement (i.e., “assessment for learning” or “formative assessment”) requires teachers to assess students so as to identify the quality of student learning and appropriate changes to classroom practices. UK experts tend to argue that this can only be done through teacher–student interaction in the classroom or by involving students in the process of considering the merits of their own or peers’ work (Black et al., 2003; Harlen, 2007; Swaffield, 2011). Others consider that tests can contribute information about changes to teaching that lead to better learning outcomes, provided the tests go beyond rank order or total score reporting (Brown and Hattie, 2012) or if teachers spend time analyzing strengths and weaknesses (Carless, 2011).
Regardless of the assessment method, it is very difficult for pre-service teachers to learn how to assess formatively (Hill and Eyers, 2016). Indeed, even practicing teachers need expertise in curriculum and pedagogy to command multiple methods of assessment in ways that help all learners overcome the sometimes idiosyncratic challenges they face (Cowie and Harrison, 2016; Moon, 2016). Teachers in New Zealand and the Netherlands have learned to use achievement data to guide school-wide improvements, provided they receive expert support (Lai and Schildkamp, 2016). However, such efforts often take 2–3 years before changes appear in student performance. Thus, despite multiple studies showing that teachers believe in using assessment formatively (Barnes et al., 2015; Bonner, 2016), putting in place the policy and resources to support formative assessment is difficult; formative assessment is not a quick fix for improving outcomes for all learners.
The formative assessment policy agenda challenges the dominance of formal testing and teacher-centric methods of assessment, with expectations that effective learning takes place as students engage with learning targets, outcomes, or objectives, take ownership of their work, cooperate with peers, understand more deeply what quality is, and receive and generate appropriate feedback (Leahy et al., 2005). Inconveniently, involving students in assessment presents considerable challenges due to psychological and social factors that interfere with the student’s ability to accurately self-evaluate (Andrade and Brown, 2016) or to constructively peer evaluate and collaborate (Panadero, 2016; Strijbos, 2016). Indeed, evidence that student involvement in assessment develops self-regulatory abilities is weak (Dinsmore and Wilson, 2016). Feedback processes are complex, belying the simple notion that student “horses” will automatically learn once they are led to the “water” of feedback (Lipnevich et al., 2016). While novel assessment methods are being developed, especially through the introduction of ICT (Katz and Gorin, 2016), students are not necessarily enthusiastic about new ways of being assessed, for fear their performance will suffer (Struyven and Devesa, 2016).
A second widespread policy initiative is to use assessments, especially standardized tests, to evaluate teachers, schools, and systems (Lingard and Lewis, 2016; Teltemann and Klieme, 2016). It is clear that such policies tend to have a largely negative impact on the quality of teaching (Hamilton, 2003; Nichols and Harris, 2016), perhaps more so among minority and lower socio-economic communities. Nonetheless, public acceptance of the legitimacy of using assessment scores to ascertain quality in schooling is reasonably high (Buckendahl, 2016). Using tests to evaluate schools and teaching is a relatively quick and low-cost political process (Linn, 2000). However, summative accountability use of assessments creates tensions for teachers (Bonner, 2016), with many teachers in high-stakes accountability environments holding very negative views of such uses (Deneen and Brown, 2016). Using assessments formatively requires discovering what students have “failed” to be good at, so as to inform further instruction (Hattie and Brown, 2008). This implies that a formative assessment ought to reveal lack of success, a problematic event if external accountability consequences are attached to the same result. Thus, if consequences for low scores are seen as unfair, it is not surprising that teachers use multiple methods to ensure that scores increase. If accountability assessment scores are inflated through construct-irrelevant processes, then the meaning of an accountability assessment is problematic.
The choice of policy priorities within different jurisdictions strongly shapes the nature and power of assessment practices. For example, Arabic- and Chinese-language societies strongly prioritize memorization of content as the dominant model of schooling and attach substantial social and economic benefits to successful performance on formal examinations (Hargreaves, 1997; OECD, 2011; Gebril, 2016; Kennedy, 2016). Anglo-Commonwealth countries strongly prioritize a child-centered, student-involved approach (Stobart, 2006), in which interactive teacher assessment practices have been prioritized as means of improving learning outcomes (Black and Wiliam, 1998). The United States has strong legal protection for special needs students (IDEA, 1997), who are entitled to differentiated assessment and evaluation practices (Tomlinson, 1999). These differences in social uses and styles of assessment complicate the meaning of a grade or score and create challenges for psychometric models that attempt to create universal explanations of performance.
Within societies that are highly homogeneous in terms of ethnic and linguistic make-up (e.g., Finland, Japan, China), it may be reasonable to expect that common psychological and social factors influence assessment. This simplifies predicting and modeling those factors. However, when comparisons are made among culturally distinct groups in multicultural societies, which is more the case in economically developed societies and nations (Van de Vijver, 2016), the psychological factors influencing student response, teacher judgments, or test performance can vary significantly. For example, tendencies to self-effacement or self-enhancement are not equal across cultural groups (Suzuki et al., 2008), so the meaning of self-assessment has to be carefully evaluated (e.g., among collectivist groups, modest self-reporting enhances group belongingness). In multicultural contexts, assessments that depend on classroom interactions between and among students and teachers are likely to be impacted by these differing cultural standards as to the best way to communicate an evaluation of work. The capacity of teachers to appropriately collect, analyze, and plan in response to both formal and informal assessment data is generally weak (Xu and Brown, 2016). Quite prolonged and intensive professional development is needed to generate “assessment capable” teachers (Smith et al., 2014). Thus, assessors and assessments are challenged by the varying and subtle differences created by cultural difference.
Even the introduction of technological solutions that increase the authenticity, diversity, and efficiency of formal testing (Csapó et al., 2012; Katz and Gorin, 2016) does not necessarily improve student performance or solve problems in scoring. Students’ enthusiasm for a computerized activity does not automatically lead to valid conclusions about their proficiency. Students are often concerned that novel assessment practices (including peer assessment, self-assessment, portfolio, performance, or computer-based assessments) will have negative impacts on their performance simply because they are unsure how well they will do on a new method of evaluation (Struyven and Devesa, 2016). Consequently, students tend to retreat into a strong preference for conventional assessment practices (e.g., essays or multiple-choice questions). Furthermore, technology now permits data sharing and long-term tracking of student performance, which ought to improve our understanding of where and how students are improving. However, the existence of these electronic data raises concerns about privacy and protection; imagine the possible negative implications if early poor performance is kept on record and used in evaluative decision-making, despite substantial subsequent progress (Tierney and Koch, 2016).
Thus, inconveniently, the field of testing, applied psychometrics, measurement, and assessment is faced with complex problems, which are not restricted to any one form of assessment or any one society in which assessment is deployed. The inconveniences outlined here apply especially if we accept that the goal of assessment is to inform improvement and make valid decisions about learners and teachers. The need for accurate diagnostic prescriptions that teachers, students, and/or parents could use to inform improvement is paramount. These prescriptions need to be generated close to, and be responsive to, the real-time processes of classroom learning and teaching, which is a substantial problem. The great contribution of psychometrics to the field of education has been an explicit attention to the problem of error in all testing, measurement, and assessment processes. However, few tools are currently available to robustly estimate and account for the kinds of error that occur in real-time classroom observations, interactions, and interpretations. The inconvenient challenge for educators who would minimize the role assessment plays in curriculum is that high-quality tests and measurements are necessary for justice, fairness, and the well-being of individuals and society. The inconvenient challenge for policy makers is that many assessment processes are not reliable or dependable (e.g., essay examinations; Brown, 2010), nor do they account well for the many factors outlined here. Thus, many policy decisions based on inadequate tools or processes are invalid.
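Psychometrics’ attention to error can be stated in classical test theory terms: an observed score X is a true score T plus error E, and reliability is the proportion of observed-score variance attributable to true scores. A minimal simulation (the variance values are illustrative assumptions only) makes the decomposition concrete:

```python
import random

random.seed(0)

N = 5000
TRUE_SD = 10.0   # spread of true proficiency across examinees
ERROR_SD = 5.0   # per-occasion measurement error

# Classical test theory: observed score X = true score T + error E
true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
observed = [t + random.gauss(0, ERROR_SD) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Reliability = true-score variance / observed-score variance,
# which here should land near 100 / (100 + 25) = 0.8.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))
```

The same logic explains why noisy, one-off classroom judgments are weak grounds for high-stakes decisions: as error variance grows relative to true-score variance, reliability, and hence the defensibility of score-based decisions, falls.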
The future of assessment requires that we no longer ignore these inconvenient problems facing assessment, testing, and applied measurement. Rather, assessment has to turn constructively to deeply insightful investigations into these perennial problems. Teachers and students need to know where learning is and what comes next. Policy makers and parents have a right to know what is working, who is learning, who needs help, what needs to change, and so on. Assessment and testing are how we as humans discover the answers to these questions. Hence, good schooling and good education need good testing and assessment, in the sense both of being high quality and of being rightly done.
Leaning heavily on validity theory (Messick, 1989; Kane, 2006), good assessment leads to defensible interpretations and actions. These uses depend on robust arguments based on relevant theories of curriculum, teaching, learning, and measurement and on trustworthy empirical evidence that has been subjected to scrutiny (i.e., statistical and/or social moderation). The need to bring greater skill and insight into assessments that inform classroom practice is essential. The success of the whole superstructure of schooling relies on the quality of judgments and evaluations carried out in the millions of classrooms of the world on an everyday basis. If this work is not done well, and if we do not know that it is not done well, we fail.
Hence, work that engages with the difficult challenges of how assessment can help education, while also making a credible case for the scores or judgments assessments generate, needs to be reported. Leaving this only to educational statisticians would be a mistake. Testing and measurement need to integrate with classroom teaching, learning, and curriculum if they are to support schooling and prevent politicians from making simplistic but wrong interpretations and uses of assessment. This is the Grand Challenge for this Section of the journal Frontiers in Education. How can assessment be made flexible enough to support real learning in vivo, while fulfilling all the diverse expectations society has for it? As Section Editor, I look forward to your contributions.
Author Contributions
The author confirms being the sole contributor of this work and approved it for publication.
Conflict of Interest Statement
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This paper draws heavily on Brown and Harris (2016). An earlier version of this paper, presented as an inaugural professorial lecture at the Faculty of Education and Social Work, The University of Auckland, can be seen at doi: 10.17608/k6.auckland.4238792.v1.
Andrade, H. L., and Brown, G. T. L. (2016). “Student self-assessment in the classroom,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 319–334.
Barnes, N., Fives, H., and Dacey, C. M. (2015). “Teachers’ beliefs about assessment,” in International Handbook of Research on Teacher Beliefs, eds H. Fives and M. Gregoire Gill (New York: Routledge), 284–300.
Bonner, S. M. (2016). “Teachers’ perceptions about assessment: competing narratives,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 21–39.
Brown, G. T. L., and Harris, L. R. (2016). “The future of assessment research as a human and social endeavour,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 506–523.
Brown, G. T. L., and Hattie, J. A. (2012). “The benefits of regular standardized assessment in childhood education: Guiding improved instruction and learning,” in Contemporary Debates in Childhood Education and Development, eds S. Suggate and E. Reese (London: Routledge), 287–292.
Buckendahl, C. W. (2016). “Public perceptions about assessment in education,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 454–471.
Cowie, B., and Harrison, C. (2016). “Classroom processes that support effective assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 335–350.
Csapó, B., Ainley, J., Bennett, R. E., Latour, T., and Law, N. (2012). “Technological issues for computer-based assessment,” in Assessment and Teaching of 21st Century Skills, eds P. Griffin, B. McGaw, and E. Care (Dordrecht, NL: Springer), 143–230.
Dinsmore, D. L., and Wilson, H. E. (2016). “Student participation in assessment: does it influence self-regulation?” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 145–168.
Gebril, A. (2016). “Educational assessment in Muslim countries: values, policies, and practices,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 420–435.
Hattie, J. A., and Brown, G. T. L. (2008). Technology for school-based assessment and assessment for learning: development principles from New Zealand. J. Educ. Technol. Syst. 36, 189–201. doi:10.2190/ET.36.2.g
Hill, M. F., and Eyers, G. (2016). “Moving from student to teacher: changing perspectives about assessment through teacher education,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 57–76.
Katz, I. R., and Gorin, J. S. (2016). “Computerising assessment: impacts on education stakeholders,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 472–489.
Kennedy, K. J. (2016). “Exploring the influence of culture on assessment: the case of teachers’ conceptions of assessment in Confucian-Heritage Cultures,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 404–419.
Lai, M. K., and Schildkamp, K. (2016). “In-service teacher professional learning: use of assessment in data-based decision-making,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 77–94.
Lingard, B., and Lewis, S. (2016). “Globalization of the Anglo-American approach to top-down, test-based educational accountability,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 387–403.
Lipnevich, A. A., Berg, D. A. G., and Smith, J. K. (2016). “Toward a model of student response to feedback,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 169–185.
Moon, T. R. (2016). “Differentiated instruction and assessment: an approach to classroom assessment in conditions of student diversity,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 284–301.
Murdock, T. B., Stephens, J. M., and Grotewiel, M. M. (2016). “Student dishonesty in the face of assessment: who, why, and what we can do about it,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 186–203.
Nichols, S. L., and Harris, L. R. (2016). “Accountability assessment’s effects on teachers and schools,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 40–56.
Panadero, E. (2016). “Is it safe? Social, interpersonal, and human effects of peer assessment: a review and future directions,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 247–266.
Smith, L. F., Hill, M. F., Cowie, B., and Gilmore, A. (2014). “Preparing teachers to use the enabling power of assessment,” in Designing Assessment for Quality Learning, eds C. M. Wyatt-Smith, V. Klenowski, and P. Colbert (Dordrecht, NL: Springer), 303–323.
Struyven, K., and Devesa, J. (2016). “Students’ perceptions of novel forms of assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 129–144.
Suzuki, L. K., Davis, H. M., and Greenfield, P. M. (2008). Self-enhancement and self-effacement in reaction to praise and criticism: the case of multiethnic youth. Ethos 36, 78–97. doi:10.1111/j.1548-1352.2008.00005.x
Teltemann, J., and Klieme, E. (2016). “The impact of international testing projects on policy and practice,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 369–386.
Tierney, R. D., and Koch, M. J. (2016). “Privacy in classroom assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 267–283.
Van de Vijver, F. (2016). “Assessment in education in multicultural populations,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 436–453.
Wise, S. L., and Cotten, M. R. (2009). “Test-taking effort and score validity: the influence of student conceptions of assessment,” in Student Perspectives on Assessment: What Students Can Tell Us About Assessment for Learning, eds D. M. McInerney, G. T. L. Brown, and G. A. D. Liem (Charlotte, NC: Information Age Publishing), 187–205.
Wise, S. L., and Smith, L. F. (2016). “The validity of assessment when students don’t give good effort,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 204–220.
Zumbo, B. D. (2015). “Consequences, side effects and the ecology of testing: keys to considering assessment in vivo,” in Plenary Address to the 2015 Annual Conference of the Association for Educational Assessment—Europe (AEA-E), Glasgow, Scotland.
Keywords: assessment, psychometrics, classroom assessment, formative assessment, error, culture, social behavior, psychological tests
Citation: Brown GTL (2017) The Future of Assessment as a Human and Social Endeavor: Addressing the Inconvenient Truth of Error. Front. Educ. 2:3. doi: 10.3389/feduc.2017.00003
Received: 27 November 2016; Accepted: 30 January 2017;
Published: 13 February 2017
Edited by: Anastasiya A. Lipnevich, The City University of New York, USA
Reviewed by: Eva Marie Ingeborg Hartell, KTH Royal Institute of Technology, Sweden
Copyright: © 2017 Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Gavin T. L. Brown, email@example.com