Studying real-world perceptual expertise

Significant insights into visual cognition have come from studying real-world perceptual expertise. Many have previously reviewed empirical findings and theoretical developments from this work. Here we instead provide a brief perspective on approaches, considerations, and challenges to studying real-world perceptual expertise. We discuss factors like choosing to use real-world versus artificial object domains of expertise, selecting a target domain of real-world perceptual expertise, recruiting experts, evaluating their level of expertise, and experimentally testing experts in the lab and online. Throughout our perspective, we highlight expert birding (also called birdwatching) as an example, as it has been used as a target domain for over two decades in the perceptual expertise literature.


INTRODUCTION
In nearly every aspect of human endeavor, we find people who stand out for their high levels of skill and knowledge. We call them experts. Expertise has been studied in domains ranging from chess (Chase and Simon, 1973; Gobet and Charness, 2006; Connors and Campitelli, 2014; Leone et al., 2014) to physics (Chi et al., 1981) to sports (Baker et al., 2003). Perceptual experts, such as ornithologists, radiologists, and mycologists, are noted for their remarkable ability to rapidly and accurately recognize, categorize, and identify objects within their domain. Understanding the development of perceptual expertise involves more than characterizing the behavior of individuals with uncanny abilities. If perceptual expertise is the endpoint of the trajectory of normal visual learning, then studying perceptual experts can provide insights into the general principles, limits, and possibilities of human learning and plasticity (e.g., Gauthier et al., 2010).
Several reviews have highlighted empirical findings and theoretical developments from research on perceptual expertise in various modalities (for visual expertise, see, e.g., McCandliss et al., 2003; Palmeri and Cottrell, 2009; Richler et al., 2011; for auditory expertise, see, e.g., Chartrand et al., 2008; Holt and Lotto, 2008; for tactile expertise, see, e.g., Behrmann and Ewell, 2003; Reuter et al., 2012). Here, we instead focus on the practical considerations that come with studying perceptual expertise, concentrating on visual expertise because it is the most extensively studied modality. We consider several choices that face researchers: whether to use real-world or artificial objects, what domain of perceptual expertise to study, how to recruit participants, how to evaluate their expertise, and whether to test in the lab or via the web. Throughout, we use birding as an example domain because it has been commonly used in the literature (e.g., Tanaka and Taylor, 1991; Gauthier et al., 2000; Tanaka et al., 2005; Mack et al., 2007; Mack and Palmeri, 2011).

REAL-WORLD vs. ARTIFICIAL DOMAINS OF EXPERTISE
Expertise-related research has been conducted using both artificial and real-world objects. Artificial objects include simple stimuli like line orientations, textures, and colors (e.g., Goldstone, 1998; Mitchell and Hall, 2014), and relatively complex novel stimuli like random dot patterns (Palmeri, 1997), Greebles (Gauthier and Tarr, 1997; Gauthier et al., 1998, 1999), and Ziggerins (Wong et al., 2009a). Real-world objects include birds, dogs, cars, and other categories (Tanaka and Taylor, 1991; Gauthier et al., 2000). Studies using artificial objects are often training studies, in which researchers recruit novices and train them to become "experts" in a domain. Changes in behavior or brain activity are measured over the course of training to understand the development of expertise, making these studies longitudinal. However, the weeks of training used in these studies can only be a proxy for the years of experience behind real-world expertise. Because real-world expertise takes so long to develop, most real-world studies are cross-sectional.
An advantage of training studies with artificial objects is the power to establish causality. Experimenters have precise control over the properties of novel objects, the relationships between them, and how categories are defined (e.g., Richler and Palmeri, 2014). Participants can be randomly assigned to conditions, and training and testing can be carefully controlled. As one example, Wong et al. (2009a,b) trained people with novel Ziggerins in two different ways: one mirrored the individuation required for face recognition, and the other mirrored the letter recognition demands of reading. Accordingly, the face-like training group showed behavior and brain activity similar to that seen in face recognition, while the letter-like training group showed behavior and brain activity similar to that seen in letter recognition. In this way, studies of artificial domains of expertise can provide insights into real-world domains.
If researchers are interested in understanding what makes experts experts, not just in investigating the limits of experience-related changes, then it is important to complement carefully controlled laboratory studies using artificial domains with the study of real-world experts. Because studies of real-world experts are quasi-experimental (recruiting novices and people with varying levels of expertise as they occur in the real world), they cannot establish unambiguous causal relationships between expertise and behavioral or brain changes. But apart from considerations of external validity, studies of real-world experts permit the study of a range and extent of expertise that cannot easily be reproduced in the laboratory. And practically speaking, testing real-world perceptual experts on real-world perceptual stimuli saves researchers the effort and expense needed to train participants in an artificial domain.
Studies using real-world domains also come full circle to inform studies using artificial domains. For example, consider the classic result of Tanaka and Taylor (1991), reproduced in our own online replication in Figure 1. Bird experts categorized birds (their expert domain) and dogs (their novice domain). For novices (Rosch et al., 1976), objects are categorized faster at a basic level (dog) than at a superordinate (animal) or subordinate level (blue jay), whereas for experts (Tanaka and Taylor, 1991; Johnson and Mervis, 1997), objects are categorized as fast at a subordinate level as at a basic level. This entry-level shift (Jolicoeur et al., 1984; see also Tanaka et al., 2005; Mack et al., 2009; Mack and Palmeri, 2011) has been used as a behavioral marker of expertise in training studies employing artificial domains (Gauthier et al., 2000; Gauthier and Tarr, 2002).
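As a minimal sketch of how the entry-level shift might be quantified, the Python below compares mean subordinate-level and basic-level response times within a domain for one participant. The difference-score index, the function name `entry_level_shift`, and the response times are hypothetical illustrations, not the analysis or data behind Figure 1.

```python
from statistics import mean

def entry_level_shift(rts):
    """Mean subordinate RT minus mean basic-level RT, in ms.

    `rts` maps a category level ("basic" or "subordinate") to a list of
    response times from correct trials. Values near zero match the expert
    pattern (subordinate as fast as basic); large positive values match
    the novice basic-level advantage.
    """
    return mean(rts["subordinate"]) - mean(rts["basic"])

# Hypothetical RTs (ms) for one participant; illustrative only, not study data.
birds = {"basic": [610, 640, 625], "subordinate": [625, 655, 625]}
dogs = {"basic": [600, 615, 630], "subordinate": [720, 760, 740]}

print("birds (expert domain):", entry_level_shift(birds))
print("dogs (novice domain):", entry_level_shift(dogs))
```

In a real analysis the index would be computed per participant and per domain, and the expert-versus-novice comparison would then be tested across participants.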
Our group recently reviewed considerations that factor into studies using artificial domains (Richler and Palmeri, 2014), so here we focus on real-world domains for the remainder of our perspective.

DOMAINS OF REAL-WORLD PERCEPTUAL EXPERTISE
In addition to everyday domains of perceptual expertise, like faces (Bukach et al., 2006) and letters (McCandliss et al., 2003), studies have used domains ranging from cars and birds (Gauthier et al., 2000), where expertise is not uncommon, to more specialized and sometimes esoteric domains like latent fingerprint identification (Busey and Parada, 2010; Dror and Cole, 2010), budgie identification (Campbell and Tanaka, 2014), and chick sexing (Biederman and Shiffrar, 1987). The particular choice of expert domain depends on a combination of theoretical goals and practical considerations.
For example, consider the goal of understanding how the ability to categorize at different levels of abstraction changes with perceptual expertise (Mack and Palmeri, 2011), which bears on how categories are learned, represented, and accessed. Birding is a useful domain because birders must make subordinate and sub-subordinate categorizations, sometimes at a glance, and often under less than ideal conditions such as poor lighting and camouflage. Other kinds of bird experts have different skills: budgie experts (a budgerigar is a small parakeet commonly bred as a pet) can keenly identify individual birds in cages but need not have expertise with other birds, while professional chick sexers can quickly discriminate male from female genitalia on chicken hatchlings. In an entirely different domain, fingerprint experts typically match a latent print with a known sample, with both clearly visible and presented side by side, and with time limits imposed by the analyst rather than by the environment.

FIGURE 1 | Mean correct categorization response times for a novice domain (dogs) and an expert domain (birds) measured online.
Following Tanaka and Taylor (1991), bird experts were tested in a speeded category verification task in which they categorized images at the superordinate (animal), basic (bird or dog), or subordinate (specific species or breed) level. In their novice domain (dogs), a classic basic-level advantage was observed: categorization at the basic level was significantly faster than at the superordinate level (t(22) = 2.67, p = 0.014) and the subordinate level (t(22) = 6.75, p < 0.001). In their expert domain (birds), subordinate categorization did not differ significantly in speed from basic-level categorization (t(22) = 0.81, p = 0.429). This replication was conducted using a custom online Wordpress + Flash website, with only 23 participants, each completing a single 10-min experimental session. Error bars represent 95% confidence intervals on the level × domain interaction.
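The t values in the caption, with 22 degrees of freedom for 23 participants, are consistent with paired (within-participant) comparisons. Assuming that design, the sketch below computes a paired-samples t statistic from per-participant condition means using only the Python standard library; the function name `paired_t` and the three-participant data are hypothetical. Obtaining a p value additionally requires the t distribution, for example via SciPy's `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x minus y.

    x and y hold one value per participant (e.g., each participant's mean
    subordinate and basic-level RT), in the same participant order.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Standard error of the mean difference: sample SD of differences / sqrt(n).
    standard_error = stdev(diffs) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / standard_error, n - 1

# Hypothetical per-participant mean RTs (ms) for three participants.
subordinate = [705, 712, 719]
basic = [700, 705, 710]
t, df = paired_t(subordinate, basic)
print(f"t({df}) = {t:.2f}")
```

With 23 participants, `df` would be 22, matching the subscripts reported in the caption.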
Studying certain domains of perceptual expertise, such as latent fingerprint examination, has real-world consequences. Despite the widespread use of forensic evidence (as well as its popular depiction on television), a recent report by the National Research Council of the National Academy of Sciences (2009) noted a "dearth of peer-reviewed, published studies establishing the scientific bases and validity of many forensic methods," especially methods that require subjective visual pattern analysis and expert testimony. Such scientific evidence is now emerging, especially for latent fingerprint expertise (e.g., Busey and Parada, 2010; Busey and Dror, 2011).
The choice of domain can also be influenced by various practical considerations. It is easier to study perceptual expertise in a domain with millions of potential participants than in an esoteric domain with a few isolated members. It is easier to study a domain whose relevant stimuli are widely available in books and online. And it is easier to study a domain without barriers to contact, which can exist for experts in the military, homeland security, and certain professions. For example, studies of expert baggage screeners require coordination with the Transportation Security Administration (TSA), and many details regarding stimuli and procedures cannot be shared with the public (e.g., Wolfe et al., 2013). In the case of birding, millions of people in the US alone consider birding a hobby, spending hours in their yards and parks and billions of dollars on books, equipment, and travel (La Rouche, 2006). Photos of birds are widely available, and books have been published on particularly difficult bird identifications (e.g., Kaufman, 1999, 2011). Birders regularly participate in citizen science efforts, such as the Christmas Bird Count, and contribute bird sightings to databases like eBird (ebird.org). Anecdotally, this translates into a keen interest in science and a willingness to participate in research.

RECRUITING
In the past, experts usually had to be recruited locally, with advertisements posted around a university campus and in local newspapers. It is easy to forget that only in the past several years has lacking an email address become almost as unusual as lacking a phone number, and that only recently have most people gained some form of Internet access. Recruiting participants more widely via the Internet promises not only to increase the heterogeneity of participants but also, especially relevant for expertise research, to locate participants with a far greater range of expertise than might be found in a local geographic region.
One rapidly growing means of recruiting and testing participants (see "Testing") is Amazon Mechanical Turk (AMT). AMT allows hundreds of participants to be recruited and tested in a matter of days, and participants on AMT are more demographically diverse than typical American college samples (Buhrmester et al., 2011). This diversity is important for research examining individual differences in perception and cognition. While the potential population of AMT workers is large, it is unknown how many workers on the platform have high levels of domain expertise. For expertise research, recruitment via AMT may therefore need to be supplemented by more direct recruitment of true domain experts (e.g., Van Gulick, 2014).
Large domains of expertise have organizations, websites, blogs, and even tweets and Facebook updates aimed at their members. In principle, online recruiting through these channels offers a quick, easy, and inexpensive means of finding experts, whether through paid advertisements online and in electronic newsletters or, more directly, through messages sent to email lists. The biggest challenge, however, is that many professional organizations and workplaces rarely allow, and many outright prohibit, direct solicitation of members or employees, even for basic research; for example, researchers cannot directly contact TSA baggage screeners or latent fingerprint examiners. By comparison, birding organizations, including local Ornithological and Audubon Societies, whose members join as part of a hobby rather than a profession, can be less restrictive about contact with members, so long as that contact is non-intrusive. In our case, we have identified several hundred birding groups in the US and Canada, have contacted several dozen directly, and have received permission to solicit volunteer participants from most of them; so far we have tested several hundred birders with a wide range of experience and expertise.

EVALUATING LEVELS OF PERCEPTUAL EXPERTISE
How do we know someone is a perceptual expert? A simple approach relies on subjective self-rating, often supplemented by self-report on the amount of formal training, years of experience, or community reputation. For example, bird experts in Tanaka and Taylor (1991) were recommended by members of bird-watching organizations and had a minimum of 10 years of experience, and those in Johnson and Mervis (1997) led birding field trips and some had careers related to birding.
It is now well recognized that self-reports of expertise are insufficient and that objective measures of expert performance are needed (e.g., Ericsson, 2006, 2009); self-report measures of perceptual expertise are not always good predictors of performance (e.g., McGugin et al., 2012; Van Gulick, 2014). Therefore, recent work has used quantitative measures to assess expert abilities (e.g., see Gauthier et al., 2010). A detailed review and discussion of such measures is well beyond the scope of a brief perspective piece. A variety of quantitative measures of perceptual expertise have been used, and new measures are currently being developed; these efforts to develop and validate new measures reflect a quickly growing interest in exploring individual differences in visual cognition (e.g., Wilmer et al., 2010; Gauthier et al., 2013; Van Gulick, 2014).
While expert-novice differences are sometimes loosely described as if they were dichotomous, expertise is clearly a continuum: people vary in their level of expertise, and any measure of expertise must place individuals along a (perhaps multidimensional) continuum. Some behavioral or neural markers might distinguish pure novices from those with some experience but asymptote at an intermediate level of expertise, while other markers might distinguish true experts from intermediate experts and novices. Understanding whether behavioral and brain changes are asymptotic, monotonic, or even non-monotonic over the continuum of expertise can have important implications for understanding mechanistically and computationally how perceptual expertise develops (e.g., see .
Briefly, one useful measure has focused on the perceptual part of perceptual expertise: in a simple one-back matching task, images are presented one at a time and participants must say whether consecutive pictures are the same or different. Experts show higher discriminability (d′) for images from their domain of expertise than for images from non-expert domains, and this difference predicts behavioral and brain differences (e.g., Gauthier et al., 2000; Gauthier and Tarr, 2002). Another measure has focused on memory as an index of perceptual expertise: the Vanderbilt Expertise Test (VET; McGugin et al., 2012) mirrors aspects of the Cambridge Face Memory Test (Duchaine and Nakayama, 2006). Participants memorize exemplars from several different artifact and natural categories and then recognize other instances under a variety of conditions, and individual differences in memory within particular domains predict behavioral and brain differences (e.g., McGugin et al., 2014).
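For the one-back matching measure described above, discriminability is summarized with the signal-detection index d′, the difference between the z-transformed hit and false-alarm rates. The minimal Python sketch below computes it from trial counts. Which response counts as a "hit", the log-linear correction, the function name `d_prime`, and the example counts are assumptions for illustration, not the procedures of the cited studies. An expertise index could then be the difference between d′ for the expert domain (birds) and a comparison domain (here, hypothetically, cars).

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity d' = z(hit rate) - z(false-alarm rate).

    Counts come from a same/different one-back task, treating "same"
    trials as signal trials (an assumption). Adding 0.5 to each count and
    1 to each total (a log-linear correction) keeps rates of 0 or 1 from
    producing infinite z scores.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one participant (50 "same" and 50 "different" trials per domain).
birds = d_prime(hits=45, misses=5, false_alarms=8, correct_rejections=42)
cars = d_prime(hits=35, misses=15, false_alarms=18, correct_rejections=32)
print(f"d' birds = {birds:.2f}, d' cars = {cars:.2f}, difference = {birds - cars:.2f}")
```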
Given our interest in categorization at different levels of abstraction, we have developed (in work in preparation) a measure focused on categorical knowledge in perceptual expertise. Adapting common psychometric approaches, we are refining what could essentially be characterized as a Scholastic Assessment Test (SAT, a standardized test widely used for college admission in the United States) of birding knowledge, with multiple-choice identifications of bird images. These range from easy (common backyard birds like the Blue Jay), to intermediate (distinctive yet far less common birds, like the Pileated Woodpecker or Great Kiskadee), to identifications that even fairly expert birders find difficult (like discriminating Bohemian from Cedar Waxwing or Hairy from Downy Woodpecker, or correctly identifying the many extremely similar warblers, sparrows, and flycatchers). Future work must consider to what extent different measures of perceptual expertise capture the same dimensions of expert knowledge and predict the same behavioral and brain measures that vary with expertise.
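The text does not specify which psychometric approaches are being adapted. As one common starting point, assumed here rather than taken from the authors' work, classical test theory summarizes each multiple-choice item by its difficulty (proportion correct) and by how well it discriminates, via the correlation between the item and the total score on the remaining items. The minimal Python sketch below computes both; the function names and the response matrix are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def item_analysis(scores):
    """Classical item statistics for a 0/1-scored multiple-choice test.

    `scores` has one row per participant and one column per item.
    For each item, returns (difficulty, corrected item-total correlation):
    difficulty is the proportion of participants answering correctly, and
    the correlation relates the item to the total of all *other* items.
    """
    n_items = len(scores[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        results.append((sum(item) / len(item), pearson(item, rest)))
    return results

# Hypothetical responses: 4 participants x 3 items (1 = correct identification).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for j, (difficulty, r) in enumerate(item_analysis(scores)):
    print(f"item {j}: proportion correct = {difficulty:.2f}, item-rest r = {r:.2f}")
```

Under this approach, items that nearly everyone answers correctly, or that correlate weakly with the rest of the test, might be candidates for revision when refining the instrument.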

TESTING
Laboratory testing allows careful control and monitoring of performance, permits experiments that require precisely timed stimulus presentations, and of course allows sophisticated behavioral and brain measures such as eye movements, fMRI, and EEG. But laboratory testing has a potential cost: the number of participants is often limited by the expense of subject reimbursement, personnel hours, lab space, and equipment. And for any study of unique populations that might be geographically dispersed, such as perceptual experts, the cost of bringing participants to the laboratory can be prohibitive.
Until fairly recently, the only practical way to test participants from a wide geographic area, apart from having experimenters or participants travel, was to have the experiments travel. For simple studies, this could mean mailed pencil-and-paper tests, while for more sophisticated studies, it could mean sending disks or CDs to participants to run on a home computer (e.g., Tanaka et al., 2010). As anyone who programs well knows, getting software to run properly on a wide range of computer hardware and operating system versions can be a daunting task. In the past few years, it has become popular, and wildly successful, to run experiments via a web browser. While not entirely immune to the vagaries of hardware and operating system versions, browser-based applications are often more robust to such variation and can often automatically prompt users to upgrade requisite software plug-ins.
There are multiple platforms and approaches for web-based experiments. One approach, highlighted earlier, uses AMT. In AMT, researchers publish Human Intelligence Tasks (HITs) that registered workers can complete in exchange for modest monetary compensation. AMT integrates basic tools for stimulus creation and test design into one web-based application, along with automated compensation, recruitment, and data collection. Aside from the availability of these tools, a clear advantage of AMT is the potential to recruit from a large and diverse pool of participants. An alternative approach is to develop and support a custom web-based server for experiments. There are powerful tools for creating web pages, such as Wordpress (wordpress.org), and fairly sophisticated programs can be developed in Adobe Flash or Javascript (e.g., De Leeuw, 2014; Simcox and Fiez, 2014). One possible advantage of such custom portals is that people may be drawn to them out of interest in participating in research rather than by the prospect of earning money, as might sometimes be the case for AMT. In the end, we suspect that most labs will use a combination of both platforms for recruiting, testing, or both.
At least with the computer hardware currently in wide use, timing is a potentially vexing problem for web-based experiments. Fortunately, platforms such as Flash and Javascript run on the local (participant) computer, so properly designed programs can avoid problems introduced by variability in Internet connection speeds, and reasonable response time measurements can be obtained (Reimers and Stewart, 2007; Crump et al., 2013; Simcox and Fiez, 2014). Indeed, as illustrated in Figure 1, we have observed differences in RTs for expert and novice domains in online experiments using a Wordpress + Flash environment that mirror observations of expert speeded categorization from classic laboratory studies (Tanaka and Taylor, 1991). For now, the most critical limitation concerns stimulus timing. It is well known that the LCD monitors in wide use have response characteristics far too sluggish to permit the kind of "single-refresh" presentations that were possible on older CRTs. While presentation times of 100 ms or more are probably a safe bet, anything faster would require calibration to check that a participant has a sufficiently responsive monitor; the next generation of LCD, LED, or other display technologies may eliminate these limitations.
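Part of the stimulus-timing problem is simple arithmetic: a display updates only on refresh boundaries, so presentation durations are quantized to whole frames (about 16.7 ms each at 60 Hz, a common but here assumed refresh rate). The sketch below illustrates that quantization with a hypothetical helper, `frames_for_duration`. It does not capture the slow pixel response of LCDs described above, which would have to be measured on the participant's actual display, for example with a photodiode.

```python
def frames_for_duration(duration_ms, refresh_hz):
    """Whole-refresh approximation of a requested stimulus duration.

    A display can only change on refresh boundaries, so a duration is
    realized as an integer number of frames. Returns the number of frames
    closest to the request (at least 1) and the duration they yield in ms.
    """
    frame_ms = 1000.0 / refresh_hz
    n_frames = max(1, round(duration_ms / frame_ms))
    return n_frames, n_frames * frame_ms

for requested in (100, 50, 30, 10):
    n, actual = frames_for_duration(requested, refresh_hz=60)
    print(f"{requested} ms at 60 Hz -> {n} frame(s) = {actual:.1f} ms")
```

At 60 Hz, 100 ms maps cleanly onto 6 frames, whereas a requested 30 ms can only be shown as 2 frames (about 33 ms), which is one reason brief presentations are harder to deliver reliably online.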

SUMMARY
Most human endeavors have a perceptual component. For example, keen visual perception is required in sports, medicine, science, games like chess, and a wide range of other skilled behavior. Research on real-world perceptual expertise thus has potential theoretical and applied impact across many domains. Here we have briefly outlined some of the practical considerations that factor into research on real-world perceptual expertise. Researchers often fret over several of these considerations behind the scenes, yet they rarely make it into a typical research publication; in that sense, we hope this brief perspective fills a small but important gap in the literature.