Picturing perspectives: development of perspective-taking abilities in 4- to 8-year-olds

Although the development of perspective taking has been well researched, there is no uniform methodology for assessing this ability across a wide age span when frames of reference conflict. To address this gap, we created scenes of toy photographers taking pictures of layouts of objects from different angles, and presented them to 4- to 8-year-olds (N = 80). Children were asked to choose which one of four pictures could have been taken from a specific viewpoint. Results showed that this new technique confirmed the classic pattern of developmental progress on this kind of spatial skill: (1) 4-year-olds responded near chance level, regardless of layout complexity, (2) there was a growing ability to inhibit egocentric choices around age 6 with layouts of low complexity (one object), (3) performance increased and egocentric responses decreased dramatically around age 7, (4) even at age 8, children still showed considerable individual variability. This perspective taking task can thus be used to address important questions about the supports for early spatial development and the structure of early intellect.


INTRODUCTION
Visual perspective taking, or the ability to mentally represent a viewpoint different from one's own, has been extensively studied in the developmental literature, beginning with Piaget and Inhelder's (1956) seminal work on The child's conception of space. In their Three Mountains Task, Piaget and Inhelder asked children to look at a model display of three mountains and to indicate how an observer would see this layout from another position. Up to the age of 9 or 10 years, children made many errors on this task and often picked their own view instead of the observer's view, thus committing egocentric errors. Piaget and Inhelder argued that young children were unable to coordinate multiple perspectives and they interpreted these difficulties as being indicative of a lack of understanding of what they called projective space. This work inspired hundreds of studies, many of which focused on demonstrating success earlier than 9 or 10 years (see review by Newcombe, 1989).
One important distinction in understanding the development of perspective taking was proposed by Flavell et al. (1981) and Masangkay et al. (1974). These researchers pointed out that knowledge about which objects are visible at all from another viewpoint appears quite early in life. They called this type of non-egocentric inference Level 1 knowledge. Thus, a child at Level 1 knows that another person may currently see an object that is not visible for the child, or vice versa. However, computing exactly how the other person perceives things is not possible until Level 2. In support of this distinction, recent studies have demonstrated Level 1 abilities even in infancy (Moll and Tomasello, 2006;Sodian et al., 2007). In contrast, Level 2 rules are first evident at around 4 or 5 years of age (Masangkay et al., 1974;Flavell et al., 1980;Pillow and Flavell, 1986) but performance improves considerably between age 6 and 8 (Salatas and Flavell, 1976).
Despite a large body of research on perspective taking, our knowledge of the individual and task factors affecting the emergence of Level 2 perspective taking is still very limited. One relevant factor was suggested by Fishbein et al. (1972), who found that children from 3-9 years performed better in picture selection tasks if only one object was involved, compared to when an array of three objects was presented (see also Gzesh and Surber, 1985). Tasks with only one object may be spatially less complex as no arrayinternal relations have to be encoded and coordinated with the position (or line of sight) of the observer. Consistent with this notion, Ives (1980) found that responses of children as young as 3 years old were about 90% correct if they were asked whether they would see the front, side, or back of a single fronted object (e.g., a horse) from a specified position. However, several other studies with children from kindergarten to 6th grade have failed to find an effect of number of objects in the array (Brodzinsky et al., 1972;Minnigerode and Carey, 1974;Nigl and Fishbein, 1974).
Another important factor affecting the emergence of perspective taking lies in the nature of the dependent variable and whether it requires the suppression of conflicting frames of reference. Piaget and Inhelder (1956) used picture selection or model building tasks, in which children sat within the frame of reference of a room while attempting to imagine an array from a different vantage point. Presson (1973, 1979) found that 8-or 9-year-olds did much better when asked about particular items from a different vantage point (e.g., if you sat over there, what would be closest to you?), and argued that such questions were easier because they suggested a frame of reference other than the surrounding room (i.e., the child's body in an imagined position served as the reference point). Even preschoolers succeed on tasks in which responses are not influenced by conflicting frames of reference (Newcombe and Huttenlocher, 1992). Importantly, however, such questions are not easier in general; they are harder when children are asked to imagine the array rotating Presson, 1973, 1979). The contrast between array rotation and perspective taking and its interaction with the processing demands of the dependent variable have been confirmed in many subsequent studies (Simons and Wang, 1998;Wang and Simons, 1999;Wraga et al., 2000;Hegarty and Waller, 2004;Nardini et al., 2006).
Taken together, a great deal of research tells us that, on tasks that minimize spatial and representational complexity, the basic ability to take someone else's perspective is present from infancy, and may provide the foundation for social cognitive abilities, such as theory of mind (Sodian et al., 2007). However, from a spatial cognition point of view, the spatially rich Level 2 perspective-taking tasks with conflicting frames of reference are interesting because they concern the ability to represent and coordinate multiple perspectives in one coherent spatial framework. This ability may be ecologically meaningful and predictive of real-world spatial performance, as suggested by research in adults Waller, 2004, 2005). For example, Hegarty and Waller (2004) found that perspective taking was correlated with participants' self-reported sense of direction, which in turn had predictive validity as a measure of spatial cognition in large-scale environments (Sholl, 1988;Hegarty et al., 2002). One of the very few developmental studies showed that perspective taking (measured by a modified Three Mountains Task) correlated with performance in a mapping task (Liben and Downs, 1993).
One reason why individual differences in Level 2 perspective taking and their correlates are little investigated in children may be that there is no simple and easy-to-administer test of the ability. Many existing paradigms were not applied below age 6 (Brodzinsky et al., 1972;Liben and Downs, 1993;Surtees and Apperly, 2012). The ones that are suitable for younger children require complex setups involving farm scenes, toy houses, dolls, toy boats, blocks, or papier-mâché mountains, (Coie et al., 1973;Minnigerode and Carey, 1974;Ives, 1980;Liben and Belknap, 1981;Brodzinsky, 1982;Gzesh and Surber, 1985). Such three-dimensional setups are hard to reproduce in replication studies and cumbersome to haul to schools in large-scale studies. Furthermore, some of the previous tasks required children to understand or even produce complex spatial terms in describing how displays looked to them (e.g., "right side up") or other observers ("upside down"; e.g., Masangkay et al., 1974).
In the present study, we thus investigated the development of Level 2 perspective taking in children from 4-8 years, using a task similar to the original Three Mountains Task in terms of spatial and representational complexity. However, in the present task, spatial layouts were presented on paper, which made the task easier to administer than using a three-dimensional display or model. The general format of the present task was based on a study with adult participants (Hegarty and Waller, 2004). In this task, computer graphics showed three photographers taking pictures of a complex three-dimensional layout of objects. Different pictures of the layout were presented and adults were asked to decide which of the photographers (or none) could have taken the picture. As this task was hard even for adults, a simplified version was created for the age groups of the present study. We presented scenes with toy photographers taking pictures of different layouts of objects and asked children to choose which one of four pictures was most likely taken by a specific photographer. Most notably, our task differed from Hegarty and Waller's in that we used photographs of Playmobil figures rather than computer graphics, and the choice alternatives showed perspectives that were easier to differentiate as they differed by larger angles (90 • ).
Children from 4-8 years were tested, based on previous work showing that Level 2 knowledge emerges around 4 or 5 years of age (Masangkay et al., 1974;Flavell et al., 1980;Pillow and Flavell, 1986), and improves considerably between age 6 and 8 (Salatas and Flavell, 1976). We investigated Level 2 perspective taking abilities in this age range using a single methodology, and also tested effects of different angular disparities between observer and target perspectives. Given that previous studies led to mixed results about effects of spatial complexity of presented layouts, we also systematically examined effects of layout complexity. We expected performance to be highest on trials presenting only one fronted object, lower performance if two symmetrical objects were presented and the viewpoint had to be inferred based on their spatial relation, and the lowest performance if arrays of multiple objects were presented for which internal spatial configurations had to be considered.

PARTICIPANTS
Participants were 80 children, with 16 children (8 boys and 8 girls) in each of the following age groups: 4-year-olds (mean age = 4;4, range = 4;0-4;11); 5-year-olds (mean age = 5;6, range = 5;1-5;11); 6-year-olds (mean age = 6;4, range = 6;0-6;11); 7-year-olds (mean age = 7;6, range = 7;1-7;11); 8-year-olds (mean age = 8;7, range = 8;1-9;0). Two additional children (one 4-and one 5-year-old) were tested but excluded from analyses due to incorrect responses on one of the three criterion trials (see next section). Children were recruited from a database of families who had volunteered to participate in developmental research. They were predominantly Caucasian, from middle-class backgrounds, and lived in urban and suburban areas of a large U.S. city. All parents provided written informed consent and all children provided verbal assent. All procedures used in this research were conducted according to ethical guidelines and approved by the Institutional Review Board at Temple University.

STIMULUS MATERIAL AND PROCEDURE
Participants were tested in a laboratory room. In four instruction trials, real three-dimensional objects and photographers were presented (Figure 1). A cone and a cylinder were presented sideby-side on a letter-size white cardboard and placed on a table. Two Playmobil figures were placed on the cardboard and introduced as Lisa and Peter. Both figures were holding a camera, pointing it at the objects, one at 90 • and one at 180 • angular difference to the child's line of sight. In each of the four practice trials, a picture of the cone and cylinder was presented that was taken from a different perspective. Children were asked whether they thought that Lisa or Peter (or no one) has taken the picture. After the children gave their responses (verbally), the experimenter encouraged them to walk behind Lisa and Peter and peek over their shoulders to check. One picture showed the child's own perspective so neither of the photographers could have taken it. In this case, the experimenter encouraged the child to return to its seat after checking, and pointed out that neither Lisa nor Peter could have taken this picture because it was taken from where the child was sitting.
A three-dimensional setup was chosen for the instruction trials, as pilot experiments showed that young children had difficulties understanding the instructions if they were not allowed to physically move behind the photographers to check their answers. Furthermore, a setup with two photographers was used in instruction trials (in analogy to the original task by Hegarty and Waller, 2004), to highlight the fact that different positions of photographers afford different perspectives. Moreover, previous work had shown that children are more likely to understand the task if a doll was used in the instruction (Newcombe and Huttenlocher, 1992).
Subsequent trials were presented as color printouts in lettersized transparent document pockets in a binder. The first two trials served to ease the transition from the three-dimensional instruction trials. These trials showed two photographers at 0, 90, or 180 • angular difference from the child's perspective, and children had to guess which photographer could have taken one particular picture. The next three trials showed only one photographer with four alternative pictures, and children were asked which picture this photographer could have taken. These three trials served as criterion trials and were very easy, as the four alternatives each showed completely different objects, only one of which was in front of the photographer's camera. Thus, following the photographer's line of sight (or Level I perspective taking) was sufficient to succeed on these trials.
The subsequent test trials presented one photographer and four choice alternatives, but now all alternatives showed the same object(s) from different perspectives (see Figure 2). The choice alternatives were distributed horizontally below the layout picture. One of the alternatives showed the correct view and three were foils, in which the orientation or spatial relations among the objects would not match the photographer's perspective. Whereas on 0 • trials the child's own perspective was the correct answer, on 90 and 180 • trials the child's perspective served as one of the foils. The layouts consisted of one, two, or four objects. To clearly distinguish the views on one-object trials, a fronted asymmetric object was used: a figurine with a face, a bag over one shoulder, and one arm raised. On four-object trials, roughly half of the foils also differed regarding the internal spatial relations between objects.
All layout pictures were taken from an oblique angle (camera slanted by ca. 45 • ), similar to how a layout on the table would look from the child's eye level (e.g., in instruction trials). Thus, all objects in the array were clearly visible and none of them was occluded. The choice pictures were taken from a slightly lower camera position, so that the 0 • choice alternative was similar but not identical to the layout picture (and the child's perspective).
Children were instructed to "point to the picture Lisa (Peter) has taken from where she (he) is standing." Children were also encouraged to "pretend peeking over her (his) shoulder to find out." If a child pointed ambiguously, the experimenter asked the child to point again because she did not clearly see which one the child had chosen. If a child pointed to two pictures in a row the child was asked again which one was the final answer. The experimenter wrote down the response and flipped the page for the next trial. There was no time limit to complete a trial and the whole experiment took about 5 to 7 min.

DESIGN
Test trials varied in the angular difference between the photographer's and the child's perspectives (0, 90, or 180 • ) and complexity (1, 2, or 4 objects in the layout). Each of these combinations was presented twice, with different objects and a different photographer, amounting to a total of 18 test trials. The number of objects in the layout was blocked, such that complexity increased from block to block. However, half of the children of each age group and sex were presented with the reverse trial order, such that complexity decreased from block to block, in order to rule out confounding effects of fatigue or practice. The different angles were presented in a predetermined quasi-random order, with the restriction that no angle was presented twice in a row. Photographers (Lisa or Peter) and objects (colors, spatial configurations) were also changed between trials, in order to make the task more entertaining and prevent children from trying to remember their answers in previous trials. The position of the correct choice was quasi-randomized across trials. A trial was scored with one point if the correct picture was selected, and with zero points if any of the three foils were chosen, thus a total score of 18 could be reached.

INTERNAL CONSISTENCY
We calculated Guttman's split-half coefficient (i.e., the correlation between trials analogous in terms of angle and complexity, www.frontiersin.org FIGURE 2 | Examples for stimuli presented in test trials, varying in spatial complexity (1, 2, or 4 objects) and perspectives (90, 0, or 180 • ). Stimuli were presented exactly as shown, as color printouts on separate letter-sized sheets of paper. but with different objects and photographers), which showed an excellent internal consistency of 0.91.

ABOVE CHANCE PERFORMANCE
In a first analysis, performance was classified according to whether or not it exceeded chance, that is, whether children solved 9 trials or more out of 18 trails correctly (Binomial distribution, p < 0.05). The number of children who performed above chance increased with age, with significant improvements from 5 to 6 and from 7 to 8 years of age (see Table 1, first column; asterisks indicate significant chi-square statistics showing more above-chance performing children than in the previous age group).

MEAN ACCURACY
To test for effects of spatial complexity and angular difference between the photographer's and the child's perspectives on the proportion of correctly solved trials, data were averaged across measurement repetitions. A preliminary ANOVA was calculated with angle (0, 90, 180 • ), complexity (1, 2, 4 objects), age group (5), sex (2), and trial order (forward vs. backward) as independent variables, and accuracy as dependent variable. The analysis showed no significant main effects of or interactions with trial order (all Fs < 2.02, all ps > 0.05). This suggests that there were no significant effects due to practice or fatigue; therefore this variable was not included in subsequent analyses.
An ANOVA was calculated with angle (0, 90, 180 • ), complexity (1, 2, 4 objects), age group (5), and sex (2) as independent variables and accuracy as dependent variable. Results showed significant main effects of age group, F(4,70) = 16.04, p < 0.001, η 2 = 0.48. Post hoc tests (Sidak corrected here, and throughout) showed that there was a dramatic increase in correct choices especially between 7 and 8 years of age (from 51 to 76%, p < 0.001, see Table 1). The group of 8-year-olds differed from all other age groups (all ps < 0.001), whereas age groups younger than 8 years did not differ from each other (all ps > 0.08). However, even at age 8, there was still considerable variability in individual test scores (see Table 1 for standard deviations). Furthermore, there was a significant main effect of angle, F(2,140) = 191.24, p < 0.001, η 2 = 0.73, and an interaction of age group and angle, F(8,140) = 4.56, p < 0.001, η 2 = 0.21. Post hoc tests investigating this interaction showed that 4-year-olds differed from all other age groups on 0 • trials (all ps < 0.01). On the other hand, 8-year-olds differed from all other age groups on 90 and 180 • trials (all ps < 0.05). Other age groups did not differ significantly at any angle (all ps > 0.44). An interaction between sex and angle, F(2,140) = 4.76, p < 0.05, η 2 = 0.06, was due to males outperforming females on 0 • trials that did not require a perspective change (p < 0.05), whereas on 90 and 180 • trials there were no sex differences (ps > 0.11).
A main effect of complexity, F(2,140) = 13.21, p < 0.001, η 2 = 0.16, was qualified by a two-way interaction between complexity and angle, F(4,280) = 2.65, p < 0.05, η 2 = 0.04, and a three-way interaction of complexity, angle, and age group, F(16,280) = 2.59, p < 0.01, η 2 = 0.13 (see Figure 3). Post hoc tests showed that complexity mattered mostly on 90 • trials (all ps < 0.05) and 180 • trials (one object significantly better than two, p < 0.05). On the other hand, on trials requiring no perspective change, the number of objects did not significantly affect accuracy (all ps > 0.11) and performance was high overall (87% accuracy). However, the three-way interaction with age group indicates that this was not true for 4-year-olds, who performed better on 0 • trials when only one object was involved (84% accuracy) as compared to multiple objects (ps < 0.05). In addition, they performed poorly on trials that required a perspective change (from 9-31% accuracy) regardless of complexity (ps > 0.67). The ANOVA showed no other significant effects (all Fs < 1.09, ps > 0.36).

EGOCENTRIC ERRORS
Finally, children's responses were analyzed regarding their tendency to choose the alternative showing their own perspective. Such an egocentric response is the correct solution on 0 • trials, but incorrect on 90 and 180 • trials. Therefore, only 90 and 180 • trials were included in this analysis. An ANOVA was calculated with angle (90, 180 • ), complexity (1, 2, 4 objects), age group (5), and sex (2) as independent variables and proportion of egocentric responses as dependent variable.

DISCUSSION
In the present study, we investigated 4-to 8-year-old children's ability to represent and coordinate multiple perspectives when dealing with conflicting frames of reference. We presented pictures showing scenes with a toy photographer taking pictures of a single fronted object or multiple objects from different angles. Children were asked to select the toy photographer's perspective among four photographs. Results showed an increase in correct choices with age, with the most dramatic improvement between 7 and 8 years of age. The number of children who performed above chance level also increased with age, with significant improvement from 5 to 6 and from 7 to 8 years of age.
A more detailed analysis of children's incorrect responses showed that when multiple objects were involved, egocentric errors increased from 4 to 5 years of age. In combination with the low accuracy observed in 4-year-olds, it appears that 4-yearolds may have responded randomly, whereas at 5 years of age www.frontiersin.org children began to more systematically choose their own perspective. Egocentric errors peaked at 5 years of age, followed by a general decrease in this response bias, suggesting a growing ability to inhibit egocentric choices and to rely on allocentric representations of space with increasing age (Lourenco and Frick, 2013). Boys made more egocentric errors than girls, indicating that they chose a view that corresponded to their own perspective more often than girls. This may also explain why boys generally outperformed females on 0 • trials, where the egocentric response was the correct choice. However, there were no sex differences in accuracy on trails that required a perspective change.
Even though performance improved with age, there was still considerable variability in individual test scores, as suggested by large standard deviations. Thus, although some 8-year-olds performed almost perfectly, there was still room for improvement for many 8-year-olds, especially for spatially complex layouts showing multiple objects. For all children except the youngest age group, spatial complexity mattered most on trials that required a perspective change, whereas performance on 0 • -trials was generally very high. On the other hand, spatial complexity did not affect performance on trials that did not require a perspective change -again with exception of 4-year-olds, whose performance suffered even on 0 • -trials when more than one object was involved.
Layouts of low spatial complexity, exhibiting only one fronted object, appeared to be easier for 4-year-olds, and a decrease in egocentric responses was observed about 2 years earlier on such items. These results corroborate previous findings that children succeeded earlier if fewer objects are involved (Fishbein et al., 1972;Gzesh and Surber, 1985). When only one fronted object is presented, success can be achieved by following an observer's line of sight and noticing whether he would see the front, back, or a particular side of the object. This may have reduced task difficulty, and made the task more similar to a Level 1 perspective taking task.
On the other hand, if multiple objects are involved, spatial relations between the objects in the array have to be considered. Our results suggest that this ability was virtually non-existent in 4-year-olds, and improved considerably from 7 to 8 years of age.
These results are consistent with previous literature showing that Level 2 rules are acquired at around 4 or 5 years of age (Masangkay et al., 1974;Flavell et al., 1980;Pillow and Flavell, 1986), and improve considerably between age 6 and 8 (Salatas and Flavell, 1976).
At the same time, our results stand in contrast to claims that younger children can demonstrate Level 2 perspective taking skills. For example, in a study by Liben (1978) that aimed to minimize spatial complexity, the child and/or experimenter wore colored glasses and the child was asked to describe in which color a white card appeared to each of them. At ages 6-7, answers were consistently correct, and even at ages 4-5 more than half the children were able to correctly answer questions about the experimenter's view. In similar but more recent studies, Moll et al. (2013) and Moll and Meltzoff (2011) showed that children as young as 3 years old knew how an object would look to another person if viewed through a color filter. However, it is not entirely clear whether tasks such as these are essentially Level 1 tasks, as even though they ask a how-question, tracing a line of sight shows a path that traverses the filter.
These conflicting results also suggest the need for finer-grained research on the processes underlying the different perspective taking tasks. The classic perspective taking task involves arrays of multiple objects and tests with picture choice and model building. This task probably indexes several different skills, which may include internal coding of the array of targets (i.e., relative distances and positions among multiple objects forming a coherent configuration), possibly using a common framework of landmarks or an external frame of reference. However, if such a frame of reference is used in coding the array, the challenge is suppressing awareness of the physically present frame of reference when responding (Huttenlocher and Newcombe, 1984;Newcombe, 1989). In support of this notion, individual differences in children's perspective taking (measured by a task similar to the Three Mountains Task) have previously been linked to cognitive styles, such as reflection-impulsivity (Brodzinsky, 1982). Future research should take a closer look at whether and how the development of perspective taking is associated with progress in other spatial and non-spatial cognitive abilities.
Taken together, the present findings showed a growing ability to inhibit egocentric choices and to recognize the correct viewpoint between 5 to 8 years of age, with the largest change between 7 and 8 years. Using a new paradigm that is easy to administer in school settings and allows for systematic experimental variation of task factors such as layout complexity, the present study showed that an increase in accuracy and a decrease in egocentric responses can be observed earlier when spatial complexity is low. Most 4-year-olds responded near chance level when a change in perspective was required, and large standard deviations in children's responses suggest that there are large individual differences in this ability at an early age, which may be precursors of the individual differences found in adults (for a review see Hegarty and Waller, 2005). In light of findings that the ability to coordinate perspectives and reference frames bears importance to many spatial tasks in adults Waller, 2004, 2005), a deeper understanding of the origins and long-term effects of this ability is essential.