What We Do and Do Not Know about Teaching Medical Image Interpretation

Educators in medical image interpretation have difficulty finding scientific evidence as to how they should design their instruction. We review and comment on 81 papers that investigated instructional design in medical image interpretation. We distinguish between studies that evaluated complete offline courses and curricula, studies that evaluated e-learning modules, and studies that evaluated specific educational interventions. Twenty-three percent of all studies evaluated the implementation of complete courses or curricula, and 44% of the studies evaluated the implementation of e-learning modules. We argue that these studies have encouraging results but provide little information for educators: too many differences exist between conditions to unambiguously attribute the learning effects to specific instructional techniques. Moreover, concepts are not uniformly defined and methodological weaknesses further limit the usefulness of evidence provided by these studies. Thirty-two percent of the studies evaluated a specific interventional technique. We discuss three theoretical frameworks that informed these studies: diagnostic reasoning, cognitive schemas and study strategies. Research on diagnostic reasoning suggests teaching students to start with non-analytic reasoning and subsequently applying analytic reasoning, but little is known on how to train non-analytic reasoning. Research on cognitive schemas investigated activities that help the development of appropriate cognitive schemas. Finally, research on study strategies supports the effectiveness of practice testing, but more study strategies could be applicable to learning medical image interpretation. Our commentary highlights the value of evaluating specific instructional techniques, but further evidence is required to optimally inform educators in medical image interpretation.


INTRODUCTION
How to teach medical image interpretation? For an educator in radiology, dermatology, pathology or cardiology, this might be the question in mind. Since 'evidence-based medicine' is held in high regard by clinicians, medical educators might search the literature for evidence on how to design their instruction. Instructional design is the science and practical field of creating educational experiences (Merrill et al., 1996). This can be as broad as a curriculum or as narrow as a lesson, or even a single instructive animation. Unfortunately, evidence regarding how to teach medical image interpretation is hard to come by. Research on instructional design in medical image interpretation is scattered across the literature, and suffers from a lack of cross-references and a lack of theoretical background. Many commentaries are published (e.g., Fenderson, 2005; Gunderman and Ballenger, 2014), which might serve as an inspiration, but should not be considered 'evidence.' This makes it challenging for medical educators to find the relevant literature and apply it to their educational practice. The aim of this paper is twofold. On the one hand, we synthesize existing literature about instructional design in medical image interpretation. On the other hand, we identify gaps in the literature and propose research inspired by psychological theories as a solution. While we used an extensive literature search to inform our argument, the aim of this paper is not to be systematic and exhaustive, but to provide a commentary that is informed by a literature search.

METHODS
We searched Web of Science for papers related to instructional design, using the keywords (teach* OR education OR instruct* OR curric*) AND [(Radiology OR Radiography) OR (pathology AND image) OR dermatology OR (electrocardiogra* OR ECG)]. We focused on papers less than 15 years old (2001 to July 2016) to include recent papers only. This search yielded 4785 papers. Titles were scanned by EMK for relevance and subsequently the abstracts were scanned by KvG. A total of 120 papers were selected for complete reading. EMK and KvG each checked half of these papers against the inclusion criteria. If doubt arose regarding inclusion, both readers read the paper and the discrepancy was resolved through discussion. We included only papers that implemented an educational experience and measured the effects of this intervention (against a control condition and/or against a pretest). We do not go into a discussion about what 'effects' of education are and should be, but consider 'effective' to be 'yielding a higher score on a test,' or 'being evaluated more positively.' We excluded papers where medical image interpretation was not the outcome measure (e.g., procedural knowledge), or that treated medical images as tools for teaching something else (e.g., use of radiographs as an illustration in anatomy classes). Table 1 in the Supplementary Materials provides an overview of the 81 selected papers, and they are marked with an asterisk in the reference list. We identified two broad categories of studies: (1) evaluations of curricula and courses (we discuss offline courses and e-learning courses separately) and (2) evaluations of specific instructional techniques. Three theoretical frameworks that form the basis of specific instructional techniques arose from the review: diagnostic reasoning, cognitive schemas, and study strategies. The curricula and courses in the first category commonly implement the specific instructional techniques in the second category of studies. However, studies in the first category rarely discuss specific instructional techniques, and, critically, these instructional techniques are not separately tested. We argue below that only evaluations of specific instructional techniques provide information that educators can use to design their education. In this paper, we discuss, in turn, evaluations of offline curricula and courses, evaluations of e-learning courses and evaluations of specific instructional techniques. For the purpose of the argument, we discuss representative papers in the rest of the manuscript and refer the reader to the Supplementary Materials for a complete overview of the reviewed papers.

Evaluation of Offline Curricula and Courses
Twenty-three percent of the reviewed studies evaluated a course or curriculum. These are often a combination of lectures, workshops and self-study. The outcome of these studies might seem straight-forward and reassuring: the score on the posttest is typically higher than the score on the pretest, and the 'new' curriculum is more effective and more positively evaluated than the 'old' curriculum. However, appraisal of these results is problematic, for a variety of reasons. Firstly, numerous differences exist between the 'new' curriculum and the 'old' curriculum (or other control conditions). This makes the outcome ambiguous: it is impossible to know what makes a new curriculum more effective than the old curriculum. Possibly, the instructional techniques used in the new curriculum are more effective (but, if so, which of the techniques?). But the difference might just as well be caused by other factors, such as enthusiasm of the staff or students for the new curriculum. In line with , we argue that the evaluation of complete courses yields trivial findings that provide no insights for educators, unless the course is carefully compared with another course that only differs on specific, well-defined aspects. If seemingly more specific techniques are compared, such as case-based learning or self-directed learning, the findings provide hardly more insights. The techniques still differ in too many aspects, and often they are very broadly and not uniformly defined. This makes it even more difficult to compare the results over studies. On top of the problem that learning effects cannot be unambiguously attributed to specific instructional techniques, in many of the studies the methodology had apparent weaknesses, e.g., no control conditions and/or the use of inappropriate randomization.

Evaluation of E-learning Modules
E-learning and blended learning are also widely investigated in medical image interpretation (44% of all papers). E-learning refers to learning activities that interactively use a computer to enhance learning (Ruiz et al., 2006). Blended learning refers to a mix of online with traditional (lecture-based) learning activities (Spanjers et al., 2015). E-learning and blended learning environments often provide participants with the opportunity to work through patient cases or provide content information in an interactive manner.
E-learning is a popular way to promote active learning: it allows for large groups of learners to engage in learning at a time and place convenient for them, has the potential to be tailored to learners' needs, and allows for instructional designs that cannot be implemented in other formats (Cook et al., 2008).
Certainly, e-learning is widely found to be more effective than, or non-inferior to, traditional forms of teaching, both in medical education in general (Cook et al., 2008) and in medical image interpretation (Zafar et al., 2014). Indeed, the lion's share of the studies that we reviewed conform to this. Once again, the differences between the e-learning curriculum and the control condition (often a traditional, lecture-based curriculum) are often too large to unequivocally attribute effects to instructional techniques. This issue is even more pressing when it comes to e-learning. Not only do conditions differ from each other in terms of instructional techniques, but the differences in the technology used for implementation further confound the comparison.
When an online or offline curriculum is implemented, it aims to maximize learning instead of isolating the contribution of specific educational techniques, even though a curriculum applies a set of instructional techniques. This means that studies resulting from this type of implementation are not optimized for providing specific information on how instruction should be designed. Indeed, many of the methodological weaknesses in this type of study result from practical and ethical considerations (e.g., it is often impossible to randomly assign students to conditions). However, for an educator in medical image interpretation, it is important to know what specific techniques yield more effective learning, and therefore researchers need to design studies that answer this question. In the next section, we review studies that zoom in on specific instructional techniques. We argue that these specific studies are more informative for educators, not only because they provide more detailed information about 'what works,' but also because they are often theory-driven.

Evaluation of Specific Instructional Techniques
Thirty-two percent of the studies that we reviewed evaluated a specific instructional technique. Most of these studies are (implicitly or explicitly) rooted in psychological theories. We discuss three psychological theories that together form the basis of most of these studies. We discuss respectively theories of diagnostic reasoning (Eva, 2004), cognitive schema theory (Charlin et al., 2007) and study strategies (Dunlosky et al., 2013).

Diagnostic Reasoning
Research in cognitive psychology proposes two modes of reasoning: analytic and non-analytic reasoning. Analytic reasoning refers to deliberate, effortful reasoning while non-analytic reasoning refers to automatic, rapid reasoning, also referred to as pattern recognition (Eva, 2004). Several studies use this framework to investigate specific educational techniques in medical image interpretation. For example, Ark et al. (2007) stimulated students to "carefully identify all features [of an ECG] while trusting guidance provided by feelings of familiarity," i.e., balancing an analytic approach (carefully identifying features) and a non-analytic approach (trusting feelings of familiarity). This was more effective than not providing students with instructions on how to approach this task. Likewise, Baghdady et al. (2014a) found that students who were directed to diagnose a radiograph first and only then identify radiographic features outperformed participants who identified features first and then diagnosed the radiograph.
The claim that students should be instructed to diagnose a case first, based on feelings of familiarity (non-analytic reasoning) and only then collect and analyze all information (analytic reasoning) contrasts with the claim that it is crucial to systematically collect all relevant information in a medical image before making a diagnosis, which is the assumption underlying the idea of teaching a search pattern (Auffermann et al., 2015, 2016). While these studies provide evidence for a benefit of a search pattern training in radiology over no training, the benefit of systematic viewing training over a non-systematic search pattern training could not be established in radiology (Kok et al., 2016) or in ECG interpretation (Varvaroussis et al., 2014).
To sum up, these studies suggest teaching a balanced reasoning strategy, starting with non-analytic reasoning and subsequently applying analytic reasoning. Patel et al. (2015) suggest mapping and microanalysis as two tools to understand a learner's reasoning process and provide focused feedback to train students in balancing reasoning strategies. Another option is to present students with high numbers of cases under time pressure, as a way to counteract the relatively high emphasis on analytic reasoning in medical education (Patel et al., 2015). These instructional techniques have not yet been investigated in medical image interpretation.

Cognitive Schemas
Diagnostic reasoning requires extensive knowledge that is structured into meaningful patterns, so-called cognitive schemas or illness scripts (Charlin et al., 2007). These contain information about pathophysiological processes underlying diseases, patients' characteristics and signs and symptoms (Boshuizen and Schmidt, 1992; Van De Wiel et al., 2000). The acquisition of high-quality scripts is central to learning in medicine (Charlin et al., 2007) and thus several studies have explicitly or implicitly focused on helping students to develop high-quality scripts, often with a focus on either pathophysiological processes underlying diseases, patients' characteristics or signs and symptoms. Boshuizen and Schmidt (Boshuizen and Schmidt, 1992; Van De Wiel et al., 2000) argue that basic science knowledge (i.e., the understanding of pathophysiological processes underlying diseases) is fundamental to these illness scripts. However, with increased expertise, basic science knowledge is encapsulated and only used for difficult, atypical cases. Baghdady et al. (2009) argue that basic science knowledge helps diagnosis by creating a coherent mental representation of diseases and their (visual) features. Participants who were provided with causal explanations of radiological features learned more than students who were presented with feature lists or a structured algorithm. A second study, however, found that the negative effect of the structured algorithm (without basic science explanations) was mitigated by the previously discussed instruction to provide a diagnosis before summing up all features (Baghdady et al., 2014a).
The prevalence of a disease is included in illness scripts. Building solid information about actual prevalence of diseases into illness scripts can avoid the 'prevalence bias' in decision making (Croskerry, 2003). Pusic et al. (2012) found that the prevalence of normal and abnormal cases in a training set impacted the sensitivity-specificity trade-off, and should thus be considered when developing training sets.
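The two measures in this trade-off can be made concrete with a small calculation. The Python sketch below is illustrative only: the function names and the toy case set are assumptions introduced here, not data or code from Pusic et al. (2012). It computes the prevalence of abnormal cases in a training set and a learner's sensitivity and specificity on that set.

```python
from typing import List, Tuple


def prevalence(truth: List[bool]) -> float:
    """Fraction of cases in the set that are truly abnormal."""
    return sum(truth) / len(truth)


def sensitivity_specificity(truth: List[bool], calls: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one learner.

    truth[i] is True if case i is abnormal;
    calls[i] is True if the learner called case i abnormal.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    tp = sum(1 for t, c in zip(truth, calls) if t and c)          # abnormal, called abnormal
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)      # abnormal, missed
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)  # normal, called normal
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)      # normal, overcalled
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("the set must contain both normal and abnormal cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical training set: 4 abnormal and 6 normal cases.
truth = [True] * 4 + [False] * 6
# Hypothetical learner responses: one missed abnormality, one overcalled normal.
calls = [True, True, True, False, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(truth, calls)
print(f"prevalence={prevalence(truth):.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under these assumptions, a designer of a training set could vary the proportion of abnormal cases and track how learners' sensitivity and specificity shift, which is the kind of trade-off the study above points to.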
Many studies have aimed to help students develop appropriate cognitive schemas that include the signs and symptoms of diseases. Blissett et al. (2015) suggest that expert-generated (well-structured) schemas can help students to understand the organization of knowledge. Indeed, they found improved learning (in the task of ECG interpretation) from expert-generated schemas as compared to learner-generated schemas. Dong et al. (2015) found that making concept maps (to explicitly structure knowledge) was more effective for learning ECG interpretation than traditional teaching. Other studies have found learning by comparison to be an effective way to teach signs and symptoms in radiology (Kok et al., 2013, 2015) and ECG interpretation (Ark et al., 2007). Medical image interpretation is also unique in that readers are often presented with two-dimensional representations of the three-dimensional body (van der Gijp et al., 2015). Two studies used 3D renderings and models to help participants understand the relationship between the signs and symptoms as seen in 2D and in 3D, in dermatology (Garg et al., 2010) and radiology (Lee et al., 2010).
Appropriate cognitive schemas are crucial for analytical and non-analytical reasoning. The development of appropriate cognitive schemas should thus be a key goal of instructional design. Further research into the optimal combination of these techniques, in order to connect pathophysiological processes underlying diseases, patients' characteristics and signs and symptoms, is required. In particular, establishing a relationship between visual signs and symptoms, and verbal information about pathophysiological processes requires further research.

Study Strategies
Dunlosky et al. (2013) reviewed the effectiveness of 10 study strategies. Only two of those have been investigated in medical image interpretation: practice testing and mixed practice. Baghdady et al. (2014b) found practice testing to be a more effective way of studying dental radiology than engaging in additional study. Mixed practice (alternating practice on different kinds of problems) has been investigated by Hatala et al. (2003) and . They compared blocked practice (practicing categories of abnormalities one by one) and mixed practice (practicing the items of the categories in mixed order).  did not find differences in performance, while Hatala et al. (2003) did. While Dunlosky's review considered the utility of the study strategies in diverse situations, many of those strategies are not applied to visual tasks, so the application of findings about effective study strategies in medical image interpretation requires further research.
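The difference between blocked and mixed practice comes down to the order in which the same cases are presented. The Python sketch below illustrates this under stated assumptions: the category names and case identifiers are hypothetical, and mixing is implemented as a simple round-robin over categories, which is only one of several possible ways to mix cases and is not claimed to be the procedure used in the cited studies.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Case = Tuple[str, str]  # (category, case identifier)


def blocked_schedule(cases_by_category: Dict[str, List[str]]) -> List[Case]:
    """Present all cases of one category before moving on to the next."""
    return [(cat, case) for cat, cases in cases_by_category.items() for case in cases]


def mixed_schedule(cases_by_category: Dict[str, List[str]]) -> List[Case]:
    """Present the same cases, alternating between categories (round-robin)."""
    per_category = [
        [(cat, case) for case in cases] for cat, cases in cases_by_category.items()
    ]
    # zip_longest pads shorter categories with None, which is dropped below.
    rounds = zip_longest(*per_category)
    return [item for rnd in rounds for item in rnd if item is not None]


# Hypothetical practice set with three categories of abnormalities.
practice_set = {
    "category_A": ["A1", "A2"],
    "category_B": ["B1", "B2"],
    "category_C": ["C1", "C2"],
}

print([case for _, case in blocked_schedule(practice_set)])
print([case for _, case in mixed_schedule(practice_set)])
```

Both schedules contain exactly the same cases, so any difference in learning outcomes between them would be attributable to presentation order rather than to content.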

CONCLUSION
Instructional design in medical image interpretation has surpassed teacher-centered, lecture-based education, and many examples of active, student-centered learning have successfully been implemented in medical image interpretation. However, the evaluation of those complete courses, curricula or e-learning modules provides few insights into specific techniques that lead to optimized learning. It is still unclear which techniques make complete programs effective, and our educator is left with only a shallow understanding of what makes a specific instructional technique effective.

Take-Home Messages for Educators
Our review of specific interventions provides more detailed recommendations, informed by theories about diagnostic reasoning, schema development and study strategies. It is suggested that educators should teach a balanced reasoning strategy, starting with non-analytic reasoning and subsequently applying analytic reasoning, although further research is required on how this should be taught. Building appropriate cognitive schemas is critical to teaching medical image interpretation, and several designs are proposed that support this. Concept-maps, learning through comparison and expert-generated schemas are found to be useful ways of supporting schema building. Finally, research on study strategies supports the effectiveness of practice testing, but more strategies could be applicable to medical image interpretation.

Limitations
A limitation of this commentary is that we did not formally assess the quality of the reviewed studies. In general, randomized controlled trials are scarce and few studies included appropriate control conditions, although this problem was less prevalent in studies of specific interventions. A systematic assessment of study quality was beyond the scope of this literature-informed commentary but could be a relevant avenue for further research. Furthermore, while this commentary discusses what conclusions cannot be drawn from the reviewed literature, there are many topics of which, as Rumsfeld said, 'we do not know that we do not know them.' Finally, we focused on outcome measures that reflect medical image interpretation. This excludes research on other important topics such as indications for imaging and professionalism.
Research on expertise in medical image interpretation states that experts have superior perceptual and cognitive abilities (Krupinski, 2010;Manning, 2010;Nodine and Mello-Thoms, 2010;Reingold and Sheridan, 2011;van der Gijp et al., 2014). Interestingly, few studies on education explicitly relate to research about visual expertise. Another remarkable gap in the literature is the lack of studies that focus on individualized and self-regulated learning, two important topics in instructional design nowadays (Van Merrienboer and Kirschner, 2013). In other visual domains, such as air traffic control (Salden et al., 2006), it was found that adapting training to the learners' needs makes learning more efficient, so this finding is promising for medical image interpretation. Research on self-regulated learning in non-visual diagnostic reasoning provides a possible starting point for fostering self-regulated learning (Cleary et al., 2016). However, visual metacognition is found to be rather low (e.g., Võ et al., 2016), so it is important to understand how visual metacognition can be fostered in order to optimize learning medical image interpretation. In conclusion, we discussed findings relevant for teaching medical image interpretation, but many open questions remain, and further evidence is required to optimally inform educators in medical image interpretation.

AUTHOR CONTRIBUTIONS
The authors conceptualized the work together. EK and KvG conducted the review, EK wrote a first draft, all authors revised the work critically for important intellectual content. All authors approved the final version.