Developmental Changes in Natural Viewing Behavior: Bottom-Up and Top-Down Differences between Children, Young Adults and Older Adults

Açık, Alper; Sarwary, Adjmal; Schultze-Kraft, Rafael; Onat, Selim; König, Peter

doi:10.3389/fpsyg.2010.00207

ORIGINAL RESEARCH article

Front. Psychol., 25 November 2010

Sec. Perception Science

Volume 1 - 2010 | https://doi.org/10.3389/fpsyg.2010.00207


Alper Açik*

Adjmal Sarwary

Rafael Schultze-Kraft

Selim Onat

Peter König

Department of Neurobiopsychology, Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany

Despite the growing interest in fixation selection under natural conditions, there is a major gap in the literature concerning its developmental aspects. Early in life, bottom-up processes, such as local image feature – color, luminance contrast etc. – guided viewing, might be prominent but later overshadowed by more top-down processing. Moreover, with decline in visual functioning in old age, bottom-up processing is known to suffer. Here we recorded eye movements of 7- to 9-year-old children, 19- to 27-year-old adults, and older adults above 72 years of age while they viewed natural and complex images before performing a patch-recognition task. Task performance displayed the classical inverted U-shape, with young adults outperforming the other age groups. Fixation discrimination performance of local feature values dropped with age. Whereas children displayed the highest feature values at fixated points, suggesting a bottom-up mechanism, older adult viewing behavior was less feature-dependent, reminiscent of a top-down strategy. Importantly, we observed a double dissociation between children and elderly regarding the effects of active viewing on feature-related viewing: Explorativeness correlated with feature-related viewing negatively in young age, and positively in older adults. The results indicate that, with age, bottom-up fixation selection loses strength and/or the role of top-down processes becomes more important. Older adults who increase their feature-related viewing by being more explorative make use of this low-level information and perform better in the task. The present study thus reveals an important developmental change in natural and task-guided viewing.

Introduction

The study of visual processes under natural conditions has gained significant momentum in the last two decades. Both its confirmation of findings gathered using artificial stimuli and the discovery of hitherto unknown properties of the visual system have contributed to the increasing employment of natural stimuli in the laboratory (Felsen and Dan, 2005). Importantly, the use of natural stimuli is not restricted to physiological studies (Simoncelli and Olshausen, 2001), but also includes psychophysical approaches (Geisler, 2008).

Following this trend, research on the selection of fixations under natural conditions has grown and contributed to our understanding of overt attentional mechanisms (’t Hart et al., 2009; Tatler, 2009). One solid finding of eye-tracking studies indicates that many local image features such as luminance contrast (Reinagel and Zador, 1999; Krieger et al., 2000) and high-frequency edges (Baddeley and Tatler, 2006) have higher values at the center of gaze, compared to non-fixated regions in natural scenes. Saliency models (Itti and Koch, 2001; Parkhurst et al., 2002) employ these findings in order to predict the locations fixated by human subjects. But despite their above-chance performance (Parkhurst et al., 2002), the success of such models is far from perfect (Tatler, 2009), and there is evidence that local image features, although correlated with fixated locations, are not causally related to attention (Einhäuser and König, 2003; Açik et al., 2009). That is, the fact that certain features are higher at fixated regions does not necessarily mean that it is those high feature values which draw attention to these regions (Einhäuser and König, 2003). This type of local feature analysis at fixations is generally assumed to reflect the bottom-up aspects of overt attention (Schill et al., 2001; Henderson, 2003). Top-down processes, on the other hand, are reflected in the influence of memory, expectation, scene gist, and task on eye movements (Land et al., 1999; Henderson, 2003). There is evidence suggesting viewing patterns might change between different task conditions, even though the feature-fixation relation remains the same (Betz et al., 2010). This suggests that the difference in local feature values between fixated and non-fixated regions can be invariant to task demands. Many studies highlight the interactions between top-down and bottom-up contributions to fixation selection (Torralba, 2003; Navalpakkam and Itti, 2005; Tatler et al., 2005; Rutishauser and Koch, 2007; Açik et al., 2009), but the exact principles behind such interactions are still debated (Henderson et al., 2007; Tatler, 2007; Einhäuser et al., 2008; Ballard and Hayhoe, 2009). Together with these aspects, motor biases (Tatler, 2007), crossmodal processes (Onat et al., 2007), binocular disparity information (Jansen et al., 2009), and object recognition (Einhäuser et al., 2008) are also demonstrated to play a role in eye-movement guidance. Even though this brief selection of works reflects how rich the study of eye movements under natural conditions has become, there are still neglected aspects of scene viewing.

Despite the growing interest in eye-tracking studies employing natural stimuli, there is a surprising gap regarding one type of inter-individual difference. To the best of our knowledge, there has been no study investigating developmental changes in free-viewing behavior during inspection of natural scenes. Karatekin (2007), in her authoritative review of eye-tracking studies investigating normative and atypical early development, draws attention to the fact that existing work on scene and face perception is scarce and fails to address fundamental questions such as the relative roles of top-down and bottom-up processes during development. The few studies conducted in this context relate to the effects of linguistic cues on eye-movement behavior of children younger than 10 years old (e.g., Sekerina et al., 2004), face perception (Schwarzer et al., 2005) and spatial learning (e.g., Karatekin et al., 2007), with a complete absence of studies on simple scene viewing from a developmental perspective (Karatekin, 2007). Accordingly, studies investigating the developmental changes associated with natural scene viewing are much needed.

In comparison to natural scene viewing studies, research on age-related changes in overt visual attention, in which artificial stimuli and tasks are employed, has been prolific (for comprehensive reviews see Madden and Whiting, 2004; Karatekin, 2007). Certain classical findings of developmental cognitive research deserve to be mentioned since they might be influential for studies investigating attentional mechanisms under more natural conditions. Older adults have been shown to rely relatively more on top-down processes in several tasks, efficiently making use of expectations regarding the to-be-identified items (Whiting et al., 2005), a finding corroborated also by neuroimaging (Madden et al., 2007). More reliance on top-down processes in the case of older adults can be understood given the changes in bottom-up processes at the sensory level (Madden et al., 2005; Madden, 2007). Nearly all visual function and performance measures such as low contrast vision, stereopsis, attentional field area, and face recognition have been shown to decline progressively starting from the end of the sixth decade (Haegerstrom-Portnoy et al., 1999; Haegerstrom-Portnoy, 2005). However, these findings generally involve quantifying changes in psychophysical thresholds, and it should be noted that the natural stimuli typically used in laboratory studies are sufficiently above threshold and contain a rich array of image features. As a further point, in many domains including perception and memory, younger adults outperform both children and older adults (Plude et al., 1994). Finally, it has been observed that certain cognitive abilities display correlations only early and late in life (Li et al., 2001, 2004). Whether the age-related increase in reliance on top-down processes, and the lower performance coupled with higher inter-task correlations observed for older adults and children, apply to behavior under more natural conditions remains an open question.

In the present study we investigate developmental changes in viewing behavior and their influence on a post-viewing task. Subjects belonging to three age groups (children, young adults, and older adults) freely viewed natural and complex images while their eye movements were measured. Immediately after each image presentation, their memory was probed by showing a patch from either the just-viewed or another image. We provide information on performance, fixation statistics, and feature-related viewing. The main null hypothesis we are testing is that feature-related viewing does not change with age. This would be expected if local features are related to fixations correlatively only, as suggested by our previous work (Einhäuser and König, 2003; Açik et al., 2009). Alternatively, if the features have at least some causal influence on fixation selection, this influence could be highest during childhood, and decrease either until adulthood, or progressively during the whole lifespan. A complementary expectation concerns the amount of agreement between observers’ fixations, which would be the same for all age groups if fixation selection is governed by the same mechanisms independently of age. But if top-down and idiosyncratic viewing patterns emerge gradually with age, one would observe a decrease in inter-observer agreements. Furthermore, we correlate fixation-related measures and performance, and address the developmental changes therein. Our study is a first step to fill the gap in the developmental aspects of viewing behavior under natural conditions.

Materials and Methods

Subjects

The subjects who participated in the experiment belonged to three different age groups. Twenty-three university students (11 females, age range 19–27, mean age 22.1) formed the “young adults” group. Eighteen participants were second grade pupils (6 females, age range 7–9, mean age 7.6) of the primary school “Grundschule in der Wüste” in Osnabrück, Germany and will be referred to as “children”. Younger children were not included since they could not read the on-screen prompt (see below). The last group, “older adults”, consisted of 17 residents of the retirement home “Wohnstift am Westerberg” in Osnabrück, Germany (10 females, age range 72–88, mean age 80.6). Older adults were living independently and not under special care. All subjects reported normal or corrected-to-normal visual acuity. Older adults declared they had no known visual problems or diseases. The participants had not seen the stimuli before and were naïve to the experimental procedure. All subjects signed a written consent form to participate in the experiment. Subjects under 18 years of age, that is, the members of the children group, had to hand in the consent form signed by a parent. The experiment was conducted in compliance with the Declaration of Helsinki as well as national and institutional guidelines for experiments with human subjects.

Stimuli

Two hundred fifty-five colored images from four different image categories served as stimuli (Figures 1A–D). “Naturals” (64 images), taken from the “McGill Calibrated Color Image Database¹” represented natural scenes showing trees, flowers, bushes as well as other natural illustrations and did not contain any artificial objects (Olmos and Kingdom, 2004). “Fractals” (64 images) were taken from three different web databases: Elena’s Fractal Gallery, Maria’s Fractal Explorer Gallery, and Chaotic N-Space Network (all reached via “IFD: Internet Fractal Database²”), and depicted self-similar computer-generated shapes (shapes that are similar to a part of themselves) and had second order statistics very similar to real-world images. “Manmades” (64 images) included urban scenes and other manmade objects such as busy streets, buildings, and construction sites. These images were taken at public places in and around Zürich, Switzerland, with a high-resolution camera (Nikon D2X). Images for the fourth category “pinks” (63 images) were created from the images in other image categories. This was done by splitting the original images into their three color channels (RGB), and then transforming each into their Fourier spectra. Thereafter, for each image and color channel combination, the amplitude spectrum was computed. After taking the average over images in a category, these mean amplitude spectra were combined with a random phase spectrum 21 times for each category average, resulting in 63 three-channel spectra. Finally, the three color channels were again combined to obtain the colored, phase-randomized image, which is commonly denoted as pink noise. Compared to regular white noise, which has a uniform power spectrum, pink noise has a power spectrum that decreases with 1/f (f = frequency), similar to fractals and real-world scenes. 
But fixations on pink noise and the image features derived from these fixations are known to have qualitatively different statistical properties (Einhäuser et al., 2006). As such, in types of analysis where the categories are pooled in order to highlight age differences, the pinks data were excluded. All images had a resolution of 1280 × 960 pixels. These four categories of images were chosen in order to provide comparisons with previous studies from the eye-tracking literature which have employed similar or identical stimuli (Parkhurst et al., 2002; Einhäuser et al., 2006; Açik et al., 2009).

FIGURE 1

Figure 1. Stimuli and methods. (A–D) Representative examples from the naturals, fractals, manmades, and the pinks categories, respectively. (E) The fixation map of the above presented naturals image generated by taking young adult fixations. (F) Luminance contrast map of the above presented fractals image. (G) The intrinsic dimensionality map of the above presented manmades image. Please note that the absolute values of different features or fixation probabilities do not matter in the information theoretical analyses employed in the paper. As such, we intentionally refrain from providing colorbars for the figures. (H) The flow of a trial. Trials began with the fixation of the cross, followed by the 5 s-long presentation of one image. After that, a circular image patch was shown and subjects indicated whether the patch belonged to the image that was just shown. The patch belonged to the image with 0.5 probability, or to another image in the same category with the same probability. As soon as the subject answered via key press, the next trial was initiated.

Eye-Tracking

Eye data was recorded using the EyeLink 1000 system (SR Research Ltd., Mississauga, Ontario, Canada) in its remote and head-free mode. No chin-rest or head-mount was used. The eye-tracker employs an infra-red illuminator and it was set to sample eye-related data at 500 Hz. Head movements were compensated by measuring the head-position with a small target sticker placed on the participant’s forehead. Fixations and saccades were detected and parameterized automatically by the eye-tracker.

For stimulus presentation we used a 20-inch ViewSonic VX2000 LCD Monitor (ViewSonic Corporation, Walnut, CA, USA). Monitor width was 40 cm and all participants were seated 65 cm from the monitor surface, and the display thus covered horizontally about 35° and vertically about 26° of the visual field.
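The display geometry can be checked with a short calculation. The following is a minimal sketch, not the procedure used in the study: the function name is hypothetical, and the monitor height is an assumption derived from the 1280 × 960 aspect ratio, since only the width is stated. The exact formula gives values close to the rounded angles reported above.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) of a flat extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Values stated in the text.
width_cm = 40.0
distance_cm = 65.0

# Assumption: square pixels and a 4:3 display area (1280 x 960).
height_cm = width_cm * 960 / 1280

horizontal_deg = visual_angle_deg(width_cm, distance_cm)
vertical_deg = visual_angle_deg(height_cm, distance_cm)

# Approximate conversion factor, useful for expressing degrees in pixels.
pixels_per_degree = 1280 / horizontal_deg

print(f"horizontal: {horizontal_deg:.1f} deg, vertical: {vertical_deg:.1f} deg")
print(f"about {pixels_per_degree:.1f} pixels per degree")
```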

Procedure and Task

In order to avoid an overly long experimental session each participant was shown 128 of 255 images in the complete stimulus set. Stimulus randomizations were balanced across pairs of subjects. For a given subject, 32 images from each category were selected at random and used during the experiment. The next subject viewed the remaining 127 images and one other pink noise image that had been shown to the previous subject. Including calibration and validation of the eye-tracker, a session never exceeded 40 min.

Each trial started with a fixation cross in the center of the screen (Figure 1H). As soon as the participant fixated the cross, one of the images was shown in full-screen for 5 s. After the full-screen presentation, a circular image patch was shown. An on-screen prompt posed the question of whether this patch was part of the image that had just been shown. The patch came from the shown image or from another image in the same image category with equal probability, and its location in the image was random and changed on each trial. Subjects answered by pressing either the left (“yes”) or the right (“no”) arrow key on a keyboard placed in front of them. Responses initiated the next trial. The diameter of the patch was set to 3.2° for naturals, fractals, and manmades, and to 4.3° for pinks. A larger patch size was used for images from the pinks category because the task was more difficult for this image category, as observed in pilot experiments (data not shown). Before the experiment began, subjects were instructed to study the full-screen image carefully and, upon the appearance of the circular patch, to respond by button press to the question of whether the patch was part of the full-screen image. The subjects were explicitly told that accuracy was more important than a fast response; accordingly the reaction times were not analyzed.

Analysis

Data analysis addressed three different aspects of natural viewing behavior together with the relations between these aspects. We first analyzed the performance in the patch-recognition task. After that we quantified basic image-viewing differences observed for different image categories and age groups. We addressed whether different viewing characteristics display correlations with performance in the task. We then moved on to analyze the fixated locations in terms of image features such as luminance contrast and local symmetry. Next we checked whether the observed differences between subjects, regarding how feature-related they were, depend on their viewing characteristics and/or influence task performance. Since natural image statistics and the derived characteristics violate the assumptions underlying the parametric statistical tests (Baddeley, 1996), non-parametric tools were employed (Efron, 1979). These included the information theoretic measure of area under curve (AUC), bootstrap confidence intervals (CIs), and resampling testing, all of which are explained below. In sum, viewing characteristics, performance, and local feature analysis at fixations establish the core part of our non-parametric data analysis.

Performance

Performance was quantified employing the standard measures of percentage correct, hit rate, and false alarm (FA) rate. As mentioned above, reaction times were not analyzed. Category and age-group specific analyses were identical to those used for the viewing characteristics explained below.
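The three performance measures can be illustrated with a small sketch. The trial representation (pairs of booleans) and the function name are assumptions made for illustration only.

```python
def performance(trials):
    """Return (percentage correct, hit rate, FA rate), all in percent.

    trials: iterable of (patch_from_shown_image, answered_yes) boolean pairs.
    """
    trials = list(trials)
    # "Yes" answers on trials where the patch did belong to the image (targets).
    targets = [yes for same, yes in trials if same]
    # "Yes" answers on trials where the patch came from another image (lures).
    lures = [yes for same, yes in trials if not same]
    if not targets or not lures:
        raise ValueError("need at least one target and one lure trial")

    hits = sum(targets)
    false_alarms = sum(lures)
    correct = hits + (len(lures) - false_alarms)

    return (
        100.0 * correct / len(trials),
        100.0 * hits / len(targets),
        100.0 * false_alarms / len(lures),
    )


# Hypothetical responses of one subject: 4 target trials and 4 lure trials.
example = [
    (True, True), (True, True), (True, False), (True, True),
    (False, True), (False, False), (False, False), (False, False),
]
percent_correct, hit_rate, fa_rate = performance(example)
```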

Viewing characteristics

We employed three related measures to quantify subjects’ viewing characteristics. One of those measures, which we will call explorativeness, is based on the entropy of the probability density maps (PDMs, Figure 1E) generated from sets of fixations. Shannon’s entropy of a random variable X that takes n possible values x_i with probabilities p(x_i), defined as

$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log_2 p(x_i),$$

quantifies the uncertainty related to a random variable in bits. If each value a random variable can take is equally probable, that is, if the probability distribution is uniform, the entropy is at its maximum. In the fixation context, such a uniform distribution would mean that all parts of the image were viewed equally often. This scenario would correspond to the most explorative viewing behavior. Conversely, if only a single pixel were fixated during the whole trial, with a complete avoidance of the rest of the image, the entropy would be minimal, meaning that viewing behavior was completely fixed. Entropy is an extensive measure, and thus a comparison of the entropy across conditions is not possible in the case of different sample sizes. In order to ensure that differences in entropy are not biased due to such differences in the number of observations, we used Chao and Shen’s (2003) correction method, as suggested by Wilming et al. (in preparation). We first gathered all fixations belonging to a certain analysis type, such as choosing all the fixations belonging to one image, or all fixations performed by a single subject. We then stored the number of fixations observed at each pixel in a matrix of the same size as the images in pixels. From this matrix, we then constructed a discrete frequency distribution by pooling nearby fixations into the cells (length 0.6°, corresponding to the acuity of our eye-tracking apparatus) of a grid. Entropy was then computed using Chao and Shen’s (2003) correction. Since the absolute value of the entropy depends on the number of bins in the probability distribution used, it is not readily interpretable. Hence, we here normalized it by the maximum possible entropy of a distribution of the same size, which is −log₂(1/n) = log₂ n, where n is the number of bins in the distribution, and then expressed it as a percentage. Please note that after Chao and Shen’s (2003) correction, the entropy might exceed the theoretical maximum, but the important category and age differences remain the same. As such, the first viewing characteristic, the explorativeness, estimates how active the viewing behavior was, without being biased by unequal numbers of fixations.
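The explorativeness computation described above can be sketched as follows. This is an illustrative reading, not the original analysis code: the function names are hypothetical, the coverage-adjusted estimator follows the commonly used form of Chao and Shen’s (2003) correction, and the default cell size of 22 pixels is an assumption (roughly 0.6° at about 36.6 pixels per degree).

```python
import math
from collections import Counter


def chao_shen_entropy_bits(counts):
    """Coverage-adjusted (Chao-Shen style) entropy estimate in bits."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one observation is required")
    singletons = sum(1 for c in counts if c == 1)
    if singletons == n:          # avoid an estimated coverage of zero
        singletons = n - 1
    coverage = 1.0 - singletons / n
    entropy = 0.0
    for c in counts:
        p_adj = coverage * c / n                 # coverage-adjusted frequency
        p_seen = 1.0 - (1.0 - p_adj) ** n        # chance the bin is observed
        entropy -= p_adj * math.log2(p_adj) / p_seen
    return entropy


def explorativeness(fixations, width_px, height_px, cell_px=22):
    """Entropy of the binned fixation distribution, in percent of log2(n_bins).

    fixations: iterable of (x, y) pixel coordinates.
    cell_px:   grid cell edge length in pixels (assumed value, see lead-in).
    """
    counts = Counter((int(y // cell_px), int(x // cell_px)) for x, y in fixations)
    n_bins = math.ceil(width_px / cell_px) * math.ceil(height_px / cell_px)
    return 100.0 * chao_shen_entropy_bits(counts.values()) / math.log2(n_bins)


# Hypothetical fixation sets: one concentrated, one spread over ten cells.
focused = [(100.0, 100.0)] * 20
spread = [(30.0 + 100 * i, 30.0 + 80 * i) for i in range(10)] * 4
```

Because the correction divides by the probability of observing a bin, sparse data can push the normalized value above 100%, in line with the remark above.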

The other two viewing characteristics are relatively simple. One of them is the subject-specific number of fixations performed on a given image and the other is the median distance between successive fixations. In sum, the number of fixations and inter-fixation distances together with the explorativeness provide a compact way of summarizing viewing behavior.
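These two simpler measures can be computed directly from an ordered list of fixation coordinates. The sketch below assumes fixations are given as (x, y) pixel pairs in temporal order; the function name is hypothetical.

```python
import math
import statistics


def fixation_count_and_median_step(fixations):
    """Return (number of fixations, median distance between successive fixations).

    The distance is in the same units as the coordinates (pixels here) and is
    None when fewer than two fixations are available.
    """
    fixations = list(fixations)
    steps = [math.dist(a, b) for a, b in zip(fixations, fixations[1:])]
    median_step = statistics.median(steps) if steps else None
    return len(fixations), median_step


# Hypothetical scan path on one image.
n_fix, median_step = fixation_count_and_median_step(
    [(0, 0), (3, 4), (3, 4), (6, 8)]
)
```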

The category and age-group difference analyses differed in terms of the data taken as samples. For image category differences, the image-specific statistics of interest were first computed separately for each age group. This analysis yielded for each image 3 values, one for each age group. Then, for each image, the mean over age was taken. As such, for each category we have a distribution consisting of individual image data as data points. We report the medians of these mean distributions together with 95% CIs as determined from percentile bootstraps (Efron and Tibshirani, 1993). For age-group differences, first, subject-specific statistics were computed for each category separately; second, for each subject the mean over categories was taken. Thus, for each age group we obtained a distribution of individual subject data points. The medians of these distributions are reported with 95% CIs. By either pooling category-specific data over age, or pooling age-group data over categories in this manner, we refrain from analyzing the data in a factorial context.
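A percentile bootstrap CI for a median, as used here, can be sketched in a few lines. The number of resamples, the fixed seed, and the function name are assumptions chosen for illustration; the study’s actual settings are not stated.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Return (median, lower, upper) using the percentile bootstrap."""
    values = list(values)
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    n = len(values)
    # Resample with replacement and record the median of each resample.
    boot_medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Cut off alpha/2 of the bootstrap distribution at each end.
    lower_index = int(n_boot * alpha / 2)
    upper_index = n_boot - 1 - lower_index
    return statistics.median(values), boot_medians[lower_index], boot_medians[upper_index]


# Hypothetical per-subject values for one age group.
sample = list(range(1, 22))
median, ci_low, ci_high = bootstrap_median_ci(sample)
```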

Image feature-related viewing analysis

In order to capture the influence of local low-level image properties on the selection of fixation points, we employed the commonly used receiver operating characteristics (ROC) curve based analysis (Tatler et al., 2005, 2006; Açik et al., 2009). The integral of ROC (area under curve, AUC) provides a straightforward measure of how well a given feature discriminates fixated locations (actual fixations) from other locations (control fixations), that is, the salience of a feature. The AUC metric is 1 if the feature under consideration is invariably higher at fixated locations, and it is 0 if it is invariably lower. An AUC of 0.5 corresponds to chance performance, that is, the feature distributions at actual and control fixations overlap. AUC measures are preferred over parametric statistical tests since they do not make assumptions regarding normal distributions and equal variances. Furthermore, the AUC measures not only test whether the actual and control distributions significantly differ, but also quantify this difference. Commonly encountered AUC values for well-discriminating individual features are generally between 0.55 and 0.65 in the literature (Tatler et al., 2005; Açik et al., 2009; Betz et al., 2010).
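The AUC equals the probability that a randomly drawn feature value at an actual fixation exceeds a randomly drawn value at a control fixation, with ties counted as one half. A minimal sketch of this pairwise formulation follows; the function name is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(actual, control):
    """Area under the ROC curve for separating actual from control values."""
    actual, control = list(actual), list(control)
    if not actual or not control:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    for a in actual:
        for c in control:
            if a > c:
                score += 1.0      # feature higher at the actual fixation
            elif a == c:
                score += 0.5      # ties count as one half
    return score / (len(actual) * len(control))


# Hypothetical feature values.
always_higher = auc([3, 4, 5], [0, 1, 2])
always_lower = auc([0, 1, 2], [3, 4, 5])
overlapping = auc([1, 2], [1, 2])
```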

In line with previous work (Einhäuser and König, 2003; Tatler et al., 2005; Açik et al., 2009), we did not choose random locations from the images as control fixations. For a group of image-specific actual fixations that were selected according to certain criteria (age group, image category, inter-fixation distance etc.), the control fixations were taken as those locations that were fixated during the viewing of other images, but still conformed to the same criteria. As an illustration, imagine we have measured a certain feature at actually fixated locations for one image in the naturals category for the older adults group. The control fixations for this image, then, would be those locations fixated by this age group during the viewing of other images in the naturals category. Additionally, while analyzing only those actual fixations that followed large saccades, the control fixations would also be selected from the fixations of other images that followed larger saccades. These constraints ensure that biases such as central viewing were included in both actual and control fixation distributions.
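The selection of control fixations can be sketched as follows. The data layout (a dictionary from image identifier to fixation list, already restricted to one age group, category, and any further criterion) and the function name are assumptions for illustration.

```python
def actual_and_control_values(fixations_by_image, feature_map, image_id):
    """Feature values at actual and control fixations for one image.

    fixations_by_image: dict mapping image id -> list of (x, y) integer pixel
        coordinates, pre-filtered to a single age group and image category.
    feature_map: 2-D list (rows of values) computed for the image `image_id`.
    """
    # Actual fixations: locations fixated while viewing this image.
    actual = [feature_map[y][x] for x, y in fixations_by_image[image_id]]
    # Control fixations: locations fixated on the OTHER images, read out on
    # this image's feature map, so that shared biases such as central viewing
    # are present in both distributions.
    control = [
        feature_map[y][x]
        for other_id, fixations in fixations_by_image.items()
        if other_id != image_id
        for x, y in fixations
    ]
    return actual, control


# Hypothetical toy data: three images and a 3 x 3 feature map for image "a".
toy_fixations = {"a": [(1, 1)], "b": [(0, 0), (2, 2)], "c": [(2, 0)]}
toy_map = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
actual_values, control_values = actual_and_control_values(toy_fixations, toy_map, "a")
```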

The quantification of feature-related viewing was addressed by computing three different types of AUCs. First we pooled the actual and control feature values over images, and computed a single AUC for these two distributions. Again we determined CIs using bootstrapping. This measure quantifies the overall discriminability of a feature for a given category and age group, by pooling feature values from all images in a category and all subjects in an age group. However, due to early pooling this analysis is blind to image identity, and as such it can underestimate the variability of the image dimension. To address this, we computed one AUC for each image separately and took the mean over images. By bootstrapping from these image distributions we determined the CIs. Even though the values of the whole-data AUC and of the mean AUC are not expected to differ, the CIs in the latter condition might be wider. Finally, we computed subject-specific AUC values in order to correlate these with explorativeness and performance. This measure quantifies how well each feature distinguishes between fixated and non-fixated regions for each subject separately and thus addresses inter-individual differences. We will refer to these three AUC measures as AUC_P (AUC-pooled), AUC_I (AUC-image), and AUC_S (AUC-subject), respectively.

We employed a bank of 12 local image features (see Figure 1 for two feature map examples). These address different local statistics of the image. Luminance contrast (Figure 1F) was defined as the standard deviation of a circular image region with 1.3° diameter normalized by the mean luminance of the image (Einhäuser and König, 2003). Texture contrast is a second order contrast, and computed as the standard deviation of a circular region in the luminance contrast map with a diameter of 3.9° normalized by the mean luminance contrast. Red-green (RG) contrast, yellow-blue (YB), and saturation contrasts (Frey et al., 2007, 2008) quantify the local variance in the DKL color space, and were again computed in patches of 1.3°. Intrinsic dimensionality (ID) analysis’ i2D feature (Saal et al., 2006) quantifies how many junctions, corners, and similar structures are present in a local region (Figure 1G). We further examined a set of features used in computer vision, all implemented by Kovesi (2000a). These included the Canny edge detector (Canny, 1986), Harris corner detector (Harris and Stephens, 1988), local radial symmetry detection (Loy and Zelinsky, 2003), and luminance-independent corner and symmetry detection based on phase congruency (Kovesi, 1997, 1999, 2000b, 2003). Contrary to local radial symmetry analysis, with phase-based symmetry we addressed rather large axes of symmetry and accordingly the minimum wavelength was set to 2.8°, more than 30 times the default value. The last feature we used is the normalized response of the image to a set of log-Gabor filters (Field, 1987) at different orientations and frequencies. These features were selected to be representative of previous literature and to cover a wide range of features addressing different local image properties.
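As an illustration of the simplest feature, luminance contrast at a single location can be sketched as below. This is a plain reading of the definition given above, not the original implementation: the function name is hypothetical, the population standard deviation is an assumption, and the radius must be supplied in pixels (a 1.3° diameter corresponds to a radius of roughly 24 pixels under the assumed 36.6 pixels per degree).

```python
import statistics


def luminance_contrast_at(luminance, cx, cy, radius_px):
    """Std of luminance in a circular patch divided by the image mean luminance.

    luminance: 2-D list of luminance values (rows of pixels), non-zero mean.
    cx, cy:    integer pixel coordinates of the patch centre.
    """
    height, width = len(luminance), len(luminance[0])
    image_mean = sum(sum(row) for row in luminance) / (height * width)
    # Collect the pixels inside the circle, clipped at the image borders.
    patch = [
        luminance[y][x]
        for y in range(max(0, cy - radius_px), min(height, cy + radius_px + 1))
        for x in range(max(0, cx - radius_px), min(width, cx + radius_px + 1))
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius_px ** 2
    ]
    return statistics.pstdev(patch) / image_mean


# Hypothetical 5 x 5 test images: a uniform field and a checkerboard.
flat = [[5.0] * 5 for _ in range(5)]
board = [[2.0 * ((x + y) % 2) for x in range(5)] for y in range(5)]
```

A uniform field yields zero contrast, whereas the checkerboard yields a positive value, as expected for a measure of local luminance variation.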

Since the computation of image features involves convolutions with kernels of specific sizes, feature values at image boundaries cannot be computed reliably. As such we removed from the data all fixations that fell closer than 3° to the borders. We will refer to this filtered set as valid fixations.
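The border criterion amounts to a simple coordinate filter. In the sketch below the margin is given in pixels; a value of about 110 pixels for 3° is an assumption based on roughly 36.6 pixels per degree, and the function name is hypothetical.

```python
def valid_fixations(fixations, width_px, height_px, margin_px):
    """Keep fixations that lie at least `margin_px` pixels from every border."""
    return [
        (x, y)
        for x, y in fixations
        if margin_px <= x <= width_px - 1 - margin_px
        and margin_px <= y <= height_px - 1 - margin_px
    ]


# Hypothetical fixations on a 1280 x 960 image with an assumed 110-pixel margin.
raw = [(5, 500), (640, 480), (1275, 10), (110, 110), (1169, 849), (1170, 849)]
kept = valid_fixations(raw, 1280, 960, 110)
```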

Feature saliencies were tested against the inter-observer (IO) saliency (Peters et al., 2005), which is a less conservative upper bound than 1.0, the theoretical maximum of AUC. That is, the actual and control fixation discriminations based on features were not expected to be better than how well the fixations of one subject are predicted by the fixation locations of the rest of the subjects. In this study IO was computed by classifying actual and control fixations of a given subject based on the viewing data of all other subjects in the same age group. Please note that the more idiosyncratic viewing behavior the subjects in an age-group displayed, the lower the IO-AUC was expected to be.
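The leave-one-out logic of the IO measure can be sketched as follows. This is a simplified illustration under stated assumptions: a coarse grid of fixation counts stands in for the probability density map, the cell size of 22 pixels is assumed, control fixations are taken from the same subject’s fixations on other images, and all names are hypothetical.

```python
from collections import Counter


def auc(actual, control):
    """Pairwise AUC: P(actual > control) with ties counted as one half."""
    score = sum(1.0 if a > c else 0.5 if a == c else 0.0
                for a in actual for c in control)
    return score / (len(actual) * len(control))


def io_auc(fixations, subject, image_id, cell_px=22):
    """Predict one subject's fixations from all other subjects in the group.

    fixations: dict subject -> dict image id -> list of (x, y) pixel
        coordinates, restricted to one age group.
    """
    def cell(xy):
        return (int(xy[1] // cell_px), int(xy[0] // cell_px))

    # Fixation-count map of the image built from all OTHER subjects.
    others_map = Counter(
        cell(xy)
        for other, per_image in fixations.items()
        if other != subject
        for xy in per_image.get(image_id, [])
    )
    # Read the map at the left-out subject's actual and control fixations.
    actual = [others_map[cell(xy)] for xy in fixations[subject][image_id]]
    control = [
        others_map[cell(xy)]
        for other_image, fixes in fixations[subject].items()
        if other_image != image_id
        for xy in fixes
    ]
    return auc(actual, control)


# Hypothetical group in which all subjects look at the same spot on image "a".
group = {
    s: {"a": [(100, 100)], "b": [(500, 500)]} for s in ("s1", "s2", "s3")
}
agreement = io_auc(group, "s1", "a")
```

The more the subjects of a group agree on where to look, the closer this value is to 1; idiosyncratic viewing pushes it toward 0.5.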

Results

Performance

Prominent performance differences were found between image categories in the patch-recognition task (Figure 2A). The main difference was the very poor performance for pinks, with the percentage correct at 61% (CIs 58–65; please note that chance performance is at 50%), which differed from all other categories by at least 21 percentage points; the pinks category was hence left out of the age difference analysis. Furthermore, manmades performance at 82% (78–84) differed from naturals (88%, 85–89, p < 0.001) and fractals (86%, 85–89, p = 0.003). As can be seen in the lower panel, these differences were reflected in the FA rates in a one-to-one manner. In pinks, the FA rate was 58% (55–62), which was even higher than expected from guessing (50%). The significant comparisons were the same as in the percentage correct analysis. That is, the percentage correct differences were mostly explained by the FA rates, especially for pinks where performance was extremely low.

FIGURE 2

Figure 2. Performance in the delayed pattern-matching task. (A) Category-specific performance. In the upper panel, for each image and age group the percentage correct was computed separately, and then the mean over age was taken. Circles show category medians, and error bars denote 95% confidence intervals (CIs). The branching lines connect significantly different (p < 0.05) categories and p-values are shown next to the branches, down to p = 0.001. The very low pinks performance is striking. Lower panel shows category-specific false alarm (FA) rate. All conventions are identical to the upper panel. (B) Age-specific performance excluding the pinks category. In the upper plot, for each subject and image category, the percentage correct was computed separately, and then the mean over naturals, fractals and manmades was taken. Other conventions and statistical analyses are identical to panel (A). Lower panel shows age-specific FA rate. Note the agreement between percentage correct and FA rates both for category and age group analyses. Overall, pinks are associated with lower performance, and age differences reveal the classical inverted U-shape function with young adults outperforming both children and older adults.

A similar correspondence between percentage correct and FA rate was also observed for the age-group comparisons (Figure 2B). In this case, young adults differed from children and older adults in both percentage correct and FA rate analysis at p ≤ 0.001. Young adults’ hit rate (89%, 87–90) was higher than that of children (77%, 69–84, p = 0.007) and older adults (82%, 71–91, p = 0.061), who did not differ from each other (p = 0.351). That is, young adults outperformed the other age groups, which is reflected in both the hit and FA rates.

Viewing Characteristics

One way to address viewing differences is by the explorativeness measure (see Materials and Methods). Figure 3A shows the category medians of explorativeness together with 95% bootstrap confidence intervals (CIs) (Efron and Tibshirani, 1993). As can be seen, naturals viewing behavior was relatively more diffuse compared to other image categories tested.
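The bootstrap confidence intervals mentioned above can be illustrated with a minimal sketch of a percentile bootstrap for a median. The number of resamples, the percentile method itself, and the example scores are assumptions made for illustration; the exact bootstrap variant used in the study is not specified in this section.

```python
import random
import statistics

def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median of `values` (illustrative)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Read off the lower and upper percentiles of the bootstrap medians.
    low = medians[int((alpha / 2) * n_boot)]
    high = medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), low, high

# Hypothetical per-image explorativeness scores, not data from the study.
scores = [71.2, 68.5, 74.9, 80.1, 66.3, 77.7, 72.4, 69.8, 75.6, 73.0]
median, ci_low, ci_high = bootstrap_median_ci(scores)
```

Fixing the seed keeps the interval reproducible across runs; in practice a larger number of resamples would narrow the Monte Carlo error of the interval bounds.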

FIGURE 3

Figure 3. Basic fixation characteristics. (A) Category-specific explorativeness. Explorativeness is based on the information theoretical entropy measure and quantifies how actively a participant viewed the images. For each image and age group, explorativeness was computed separately, and then the mean over age was taken. Circles show category medians, and error bars denote 95% confidence intervals (CIs). The branching lines connect significantly different (p < 0.05) categories determined by permutation sampling and exact p-values down to p = 0.001 can be seen next to the branches. (B) Age-specific explorativeness. For each subject and all categories but pinks, explorativeness was computed separately, and then the mean over category was taken. As all fixations in one category were pooled, the explorativeness is overall higher. Other conventions and statistical analyses are identical to panel (A). (C) Median number of fixations for each age group. Category averaging and other conventions are identical to panel (B). (D) Median distance between successive fixations for each age group. Category averaging and other conventions are identical to panels (B,C). Note that the explorativeness of older adults is similar to other age groups, despite the greater number of fixations displayed by the former age group. This result can be explained by the relatively shorter saccades performed by older adults.

Importantly, the age differences in explorativeness were investigated. In Figure 3B we provide the medians for different age groups with CIs. Explorativeness across age groups was comparable, even though children displayed a trend (p = 0.071) for being less explorative than young adults. Interestingly, the older adults group executed more fixations than both young adults and children (Figure 3C). The discrepancy between explorativeness and number of fixations in older adults is explained by the median inter-fixation distances (Figure 3D). On average they displayed, similarly to children, shorter saccades compared to young adults. That is, despite similar levels of explorative viewing, the number of fixations and amplitudes of the saccades performed distinguish older adults from other age groups.

Explorativeness and Performance Correlations

Is there a relationship between how actively subjects inspected the images and their performance in the subsequent recognition task? We addressed this question by linearly regressing subjects’ category-specific performance results against their explorativeness (Figure 4). Before the computations, age- and category-specific outliers were detected by employing the common outlier definition of points lying more than 1.5 times the inter-quartile range away from the first and third quartiles. These outliers were removed from the data. In older adults, for three categories this revealed significant or close to significant R²-values with positive slopes (fractals, R² = 0.31, t-test p = 0.021; manmades, R² = 0.38, p = 0.015; pinks, R² = 0.22, p = 0.057). We next checked whether these results arose due to FA rates, hit rates, or both. None of the hit-rate–explorativeness correlations were significant, but the correlations with FA rate agreed with percentage correct in fractals (R² = 0.29, p = 0.037) and pinks (R² = 0.35, p = 0.013). In summary, in three of the four image categories, the more explorative older adults displayed better performance in the recognition task, and this was mostly because they were less prone to making FAs.
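The outlier rule and the regression described above can be sketched as follows. This is an illustration under stated assumptions: the quartile interpolation method, the application of the rule to both variables, and all data values are hypothetical choices, not details taken from the study.

```python
import statistics

def iqr_inlier_mask(values, k=1.5):
    """Flag points lying within k * IQR of the first and third quartiles."""
    # Assumption: "inclusive" quartile interpolation; the paper does not say.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = k * (q3 - q1)
    return [q1 - spread <= v <= q3 + spread for v in values]

def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and R^2 of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy ** 2) / (sxx * syy)

# Hypothetical subject-level values for one age group and one category.
explorativeness = [62, 65, 68, 70, 73, 75, 78, 80, 83, 120]
percent_correct = [70, 72, 75, 74, 79, 81, 80, 85, 88, 60]

# Assumption: a subject is kept only if neither of their values is an outlier.
keep = [a and b for a, b in zip(iqr_inlier_mask(explorativeness),
                                iqr_inlier_mask(percent_correct))]
x = [v for v, ok in zip(explorativeness, keep) if ok]
y = [v for v, ok in zip(percent_correct, keep) if ok]
slope, intercept, r_squared = linear_fit(x, y)
```

In this toy data set the last subject is removed on both variables, and the remaining nine points yield a positive slope.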

FIGURE 4

Figure 4. Linear regression analysis of explorativeness and percentage correct. For each age group and category separately, we removed outliers (3 older adults in naturals, and 2 older adults in manmades) and computed linear regressions for subject-specific explorativeness and percentage correct values. Inside the panels, each marker corresponds to one subject and the regression line is shown with the age color code. As can be seen, in three out of four categories (solid lines for p < 0.05 and the dashed line for p = 0.06), older adults displayed positive correlations (R² is given under the category label). Rectangular insets display the equations of the fits with 95% CIs inside square brackets. Please note that due to the entropy correction involved in the computation of explorativeness, some values might exceed 100 (see Materials and Methods).

Image Feature-Related Viewing Analysis

The saliencies of 12 different local image features were computed separately for image categories and age. In children, the AUC_P values ranged from 0.51 (Harris edges) to 0.53 (RG contrast) for naturals, from 0.52 (TC) to 0.62 (ID) for fractals, from 0.54 (phase symmetry) to 0.61 (ID) for manmades, and from 0.45 (Harris edges) to 0.50 (radial symmetry) for pinks. In young adults, the AUC_P values ranged from 0.51 (TC) to 0.53 (RG contrast) for naturals, from 0.53 (TC) to 0.60 (Harris edges) for fractals, from 0.54 (phase symmetry) to 0.60 (ID) for manmades, and from 0.49 (TC) to 0.53 (saturation contrast) for pinks. Finally, in older adults, the AUC_P values ranged from 0.49 (TC) to 0.53 (RG contrast) for naturals, from 0.52 (TC) to 0.58 (ID) for fractals, from 0.54 (phase symmetry) to 0.59 (ID) for manmades, and from 0.48 (log-Gabor filter) to 0.50 (saturation contrast) for pinks. These results show that the AUC_P values were particularly high in manmades and fractals, and ID appears to be the most salient in the feature bank employed.
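For readers less familiar with the AUC measure, the following minimal sketch shows how an area under the ROC curve quantifies how well a feature discriminates fixated from control locations. It uses the rank-based equivalence of the AUC with the probability that a fixated value exceeds a control value. The feature values are hypothetical, and the way control locations are chosen (described in Materials and Methods) is not modelled here.

```python
def auc(fixated, control):
    """Area under the ROC curve separating fixated from control values.

    Equals the probability that a randomly drawn fixated value exceeds a
    randomly drawn control value, with ties counted as one half.
    """
    wins = 0.0
    for f in fixated:
        for c in control:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(fixated) * len(control))

# Hypothetical feature values at fixated and control locations.
fixated_values = [0.8, 0.6, 0.7, 0.4]
control_values = [0.5, 0.3, 0.6, 0.2]
feature_auc = auc(fixated_values, control_values)
```

A value of 0.50 means the feature does not discriminate fixated from control locations, values above 0.50 mean higher feature values at fixated locations, and values below 0.50 mean the opposite.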

Remarkable age differences were found in the AUC_P values. Figure 5 plots for each category separately the AUC_P values for different age groups against each other, together with the 95% CIs. Especially in manmades and fractals, where the features were most salient, children’s AUCs are higher than young adults’, which are in turn higher than older adults’ AUCs. In naturals and pinks, the 0.50 chance level was crossed by the CIs of certain features, suggesting low saliencies and signal-to-noise ratios. Finally many pinks AUCs are below 0.50 showing that the lower values of certain features were fixated more often than expected by chance. This last point was most apparent for children, suggesting again a superior feature saliency, but this time with a sign change. In sum, the saliency of features decreased with age, and this was observed most clearly in fractals and manmades where the feature saliencies behaved as expected.

FIGURE 5

Figure 5. Age comparison of local image feature-related viewing. In all panels, each marker cross corresponds to one feature AUC_P (see Materials and Methods). The data is plotted such that young adults’ feature AUCs are compared to children’s AUCs on the left, and to older adults’ AUCs on the right. That is, the ordinates on the two sides of the same panel are identical and report the young adult data. Please note that the origins are panel specific, and the x-axis is symmetric on the two sides of the origin. Dashed lines denote the diagonals, and the dotted lines the AUC value of 0.50 (no discrimination). Note the very low and smaller than 0.50 AUC values of naturals and pinks, respectively. The features used provided better discriminability for fractals and manmades, and in those cases the AUCs decreased from children to young adults and from them to older adults, as shown by data lying below the diagonal on the left and above the diagonal on the right.

Inter-observer (IO) AUCs displayed the same reduction with increasing age. For naturals, the observed AUC_P values dropped from 0.62 (0.61–0.63) for children to 0.60 (0.59–0.60) for young adults, and to 0.55 (0.54–0.56) for older adults. For the two categories where the features are most salient, fractals and manmades, the IO-AUC decrease was even more prominent. For the former category the values changed from 0.69 (0.69–0.70) to 0.64 (0.63–0.64) and then to 0.61 (0.61–0.62). For the latter, the values for increasing age were 0.71 (0.70–0.72), 0.67 (0.66–0.67) and 0.62 (0.61–0.62), respectively. IO saliency was extremely low for pinks, revealing highly idiosyncratic viewing patterns: 0.53 (0.52–0.54), 0.54 (0.53–0.54), and 0.51 (0.51–0.52). These results reveal that the most agreement on fixated locations is in the case of two categories where the features were most salient, namely fractals and manmades, but the drop in AUCs with increasing age is again apparent.

The conclusion that feature saliencies drop with age can be criticized by arguing that the AUC values reflect the eye-tracker performance. Two facts speak against this argument. First, most of the kernels used in the computation of the features of interest are larger (around 1.3°) than the maximum allowed calibration error (0.6°). Second, we compared the eye-tracker’s validation errors of different age groups obtained at the start of the experiments. The means of these errors were 0.47° (standard deviation ± 0.09), 0.41° (±0.08) and 0.47° (±0.08), for increasing age, respectively. Permutation sampling revealed no significant differences between children’s and older adults’ errors, while young adults’ errors were significantly lower than those of both other age groups (p = 0.013 and p = 0.009, respectively). That is, the highest and lowest feature AUC groups had indistinguishable validation errors, ruling out that the performance of the eye-tracker is responsible for the observed feature saliency differences.
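The permutation sampling used for such group comparisons can be sketched as below. The test statistic (absolute difference of means), the number of permutations, and the validation errors are assumptions for illustration only; the study's exact procedure is described in its Materials and Methods.

```python
import random
import statistics

def permutation_test_means(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical validation errors in degrees for two groups.
group_a = [0.38, 0.40, 0.41, 0.39, 0.42, 0.37, 0.40, 0.41]
group_b = [0.55, 0.57, 0.56, 0.58, 0.54, 0.59, 0.56, 0.57]
p_value = permutation_test_means(group_a, group_b)
```

Because the two hypothetical groups do not overlap, almost no relabelling reproduces a difference as large as the observed one and the p-value is small.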

In the remaining part of our analysis, we address different aspects of feature-related viewing, such as its correlation with performance in the task. But as discussed above, in two categories – naturals and pinks – the observed feature and IO-AUCs were rather low, and in many instances the CIs overlapped with the chance discrimination level of 0.50. In order to avoid spurious results due to low signal-to-noise ratios, we restricted the rest of our analysis to fractals and manmades.

Does the image-specific feature analysis replicate the above-discussed findings? In order to address this question, image-specific AUC_I values were computed and the mean over images was taken for each feature and the IO saliency. For each age group separately, we computed the fractals and manmades AUC values of 12 features. These AUC_I values were subtracted from their AUC_P counterparts. The medians of the difference distributions (each consisting of 24 data points) were −0.005 (standard deviation, ±0.006), −0.002 (±0.006), and −0.001 (±0.006) for increasing age respectively, showing a one-to-one agreement between the two methods. The widths of the CIs were different, however. The median CI widths for AUC_P with the same age order were 0.013, 0.011, and 0.013, and in the case of the AUC_I they increased to 0.041, 0.033, and 0.033. In order to display the overall variance among the images used for the upcoming analysis, in Figure 6 we show three AUCs with CIs, computed after pooling the images from the two categories of interest. As can be seen, LC, a feature with average saliency, ID, the most salient feature, and IO-AUC, all displayed the same decrease over increasing age and replicated the AUC_P results, but the feature CIs showed considerable overlap.

FIGURE 6

Figure 6. One average feature (LC) and the best feature (ID) compared to inter-observer (IO) saliency. For each age group separately, image-specific AUC (AUC_I) values were pooled from the fractals and manmades categories. Markers show for two features and the IO saliency the medians of those distributions together with the 95% CIs. Even though there was considerable overlap in these conservative CIs, all AUCs drop with age, replicating the category-pooled analysis.

Does the observation that the feature values at fixated locations are higher following shorter saccades than those following longer saccades (Tatler et al., 2006) hold for all age groups? Figure 7 compares AUC_P values for each feature and age group obtained from fixations following median-split short and long saccades. Nearly all points are above the diagonal, demonstrating that the saliencies of features were higher following shorter saccades. Importantly, IO saliency displayed the same pattern. In sum, points fixated after short saccades are characterized by both higher feature values and more agreement between observers for all age groups.
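The median split on saccade size can be sketched as follows. The handling of amplitudes that fall exactly on the median and all numerical values are assumptions for illustration; in the analysis above, the two resulting sets of fixations would each enter a separate AUC computation.

```python
import statistics

def split_by_saccade_size(amplitudes, feature_values):
    """Split fixation feature values by the amplitude of the preceding saccade."""
    cut = statistics.median(amplitudes)
    # Assumption: amplitudes equal to the median count as short saccades.
    after_short = [f for a, f in zip(amplitudes, feature_values) if a <= cut]
    after_long = [f for a, f in zip(amplitudes, feature_values) if a > cut]
    return after_short, after_long

# Hypothetical saccade amplitudes (degrees) and feature values at landing points.
amplitudes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_values = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
after_short, after_long = split_by_saccade_size(amplitudes, feature_values)
```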

FIGURE 7

Figure 7. Saccade size differences on feature-related viewing. After age-specific median splits on saccade size, AUC for each feature was computed for fractals and manmades once for fixations at the end of short saccades (ordinates) and once for fixations following longer saccades (abscissae). The markers in the dashed ellipse show the same analysis on IO saliency and reveal the same pattern. Empty symbols denote age, category, feature combinations, and filled symbols denote the averages over features together with 95% CIs. Accumulation of the values above the diagonal reveals that feature values were relatively higher at the end of shorter saccades, which holds for all age groups.

Correlations Between Feature-Related Viewing and Entropy

The relation between being more explorative and feature-related viewing reveals an interesting double dissociation between older adults and children. For each age group, feature and the two categories of interest, after the removal of outliers the Pearson’s correlation coefficients were computed between the subject-specific AUC values (AUC_S) and explorativeness. Figure 8 shows the frequency distributions of the Pearson’s r computed in this manner. The median correlations for children, young adults, and older adults were −0.11, 0.05, and 0.26, respectively. Whereas three correlations were significant for young adults (ps < 0.05), only one correlation coefficient reached significance in the other groups. But more importantly, only in older adults and children are the medians of coefficient distributions significantly different from zero (Wilcoxon signed-rank test, both ps < 0.01). In sum, more explorative viewing behavior tends to be related to lower feature saliencies in children, and to higher feature saliencies in older adults.
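The two-step logic of this analysis, a Pearson correlation per feature and category followed by a signed-rank test on the resulting coefficients, can be sketched as below. The signed-rank p-value here uses a plain normal approximation without tie or continuity corrections, which is a simplification and not necessarily the variant used in the study. The 24 coefficients are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_rank_p(values):
    """Two-sided Wilcoxon signed-rank test against zero (normal approximation)."""
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute values and give each its average rank.
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical coefficients for 12 features x 2 categories in one age group.
coefficients = [0.10 + 0.01 * i for i in range(24)]
median_r = statistics.median(coefficients)
p_value = signed_rank_p(coefficients)
```

Testing the distribution of coefficients, rather than each coefficient alone, is what allows a consistent sign across features to reach significance even when few individual correlations do.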

FIGURE 8

Figure 8. Age differences in correlations between explorativeness and feature AUCs. For each feature and category combination (only fractals and manmades), subject-specific AUCs (AUC_S) were correlated with subjects’ explorativeness. Since there were 12 features and 2 categories of interest, the frequency histogram displays the distribution of 24 correlation coefficients for each age group. The triangles above show age group medians. Note the predominantly negative correlations of children, the positive correlations of older adults and the presence of roughly equal amounts of positive and negative correlations for young adults.

Correlations Between Feature-Related Viewing and Performance

Similar to the previous analysis, we computed correlations between feature saliencies (AUC_S) and performance measures. Medians of the correlation coefficients of AUC_S with percentage correct and FA rate were significantly different from zero for older adults only (p < 0.001 and p = 0.023, respectively). That is, the older individuals who displayed more feature-related viewing performed better in the task by being more immune to FAs.

Discussion

In the present study we measured the eye movements of human observers from three different age groups while they viewed natural or complex images belonging to different categories. After viewing the image, subjects indicated whether a shown image patch belonged to the just-viewed image or not. This task ensured that subjects viewed the images attentively, but since the task directly followed image viewing, no explicit visual search was involved. The analysis focused on three basic aspects: performance on the pattern-matching task, explorativeness, and amount of feature-related viewing. In addition, we analyzed interactions between these factors. We found several age-related differences in performance measures and viewing strategies. The present study is, to the best of our knowledge, the first to investigate developmental aspects of the involvement of bottom-up and top-down processes in scene viewing.

Image category and age-group differences were found in the delayed pattern-matching task performance. Due to a prominent increase in FA rates, the performance for 1/f noise images (“pink noise” images) was very low. This is not surprising given the high intra-category image similarity due to the absence of edge-, corner- and luminance-related local differences (Einhäuser et al., 2006). Young adults outperformed both other age groups, replicating previous findings (Cerella and Hale, 1994; Plude et al., 1994). The question of whether young adults’ superior task performance is due to their mature and intact visual apparatus or their familiarity with computer-related tasks remains to be addressed in a lifespan study of visual attention.

Despite comparable levels of explorativeness for all age groups, saccades performed by the older adults were of lower amplitude than those of young adults. This result replicates those obtained with change detection (Veiel et al., 2006) and visual search tasks (Scialfa et al., 1994), and is in line with findings indicating that older adults have a smaller useful field of view (UFOV), the breadth of the region that can be attended to (Ball and Owsley, 1991). Indeed, Kosslyn et al. (1999) manipulated the size of the region on which subjects focused their attention, and revealed that older adults preferred to attend to smaller portions of space than younger adults. Mapstone et al. (2001), using a simulated driving scene, report that their younger subjects maintained their fixation locus in the center of the screen more often than older drivers, who frequently directed their gaze to the periphery. This was discussed as a result of shrinking of the UFOV and conforms to our reasoning. A decreased UFOV not only reflects the known age-related impairment in peripheral processing (Haegerstrom-Portnoy, 2005) but is related also to attentional mechanisms, such as being influenced by a secondary task (for a review, see Ball et al., 1990). Hartley and McKenzie (1991) separated attentional and perceptual contributions to extrafoveal processing by using focused and diffuse attention conditions and demonstrated that healthy aging influences both factors. That is, our results add to the large body of evidence suggesting that older adults, due to peripheral vision impairments, shift their attention consecutively to narrower portions of space with low amplitude saccades.

The older adults in our study performed the highest number of fixations per unit time. At first sight, this appears at odds with other studies that have revealed comparable (Veiel et al., 2006) or even lower saccade rates for the elderly (Williams et al., 2009; Porter et al., 2010). Nevertheless, this apparent conflict can be reconciled in the light of the different trial structures of the employed tasks. In self-terminating paradigms such as change detection (Veiel et al., 2006) or visual search (Williams et al., 2009; Porter et al., 2010), the elderly need more time to find the target and thus have longer-lasting trials. This is reflected both in their fixation durations and the larger number of fixations they perform. In our free-viewing task, trial length was held constant. As such, an increase in both fixation duration and number of saccades was logically impossible. In search tasks, the higher number of fixations is explained by differences in UFOV or a higher number of fixations at previously visited locations (Veiel et al., 2006), and the longer fixation durations are attributed to older adults being more cautious in accepting or rejecting fixated regions as targets (Kramer and Madden, 2008; Porter et al., 2010). Since we employed a post-viewing task, subjects had no target information in memory while viewing the stimulus, and as such cautiousness could not play a role. In sum, since our task is not a search task where fixated items are compared to the memory trace of a target, older adults made relatively more frequent and lower amplitude saccades separated by short-lasting fixations, and as such displayed efficient image exploration.

We observed a prominent age-related decrease in feature-related viewing. That is, the local feature value differences between fixated and avoided regions are highest for children, followed by young adults, and are at their lowest for the oldest age group. Employing stimuli that consisted of local shapes that make up a different shape at the global scale (Navon, 1977), Roux and Ceccaldi (2001) demonstrated that older adults suffer more from global interference while identifying local elements. However, when larger stimuli are used (Lux et al., 2008), this effect disappears and a preference for local elements is observed, probably since the limited UFOV of the older individuals does not allow for global processing beyond a certain scale. Using very similar hierarchical stimuli, Porporino et al. (2004) revealed that children display problems in the identification of the global configuration if the stimulus is presented with distractors. Taken together, these findings show that whereas older adults are tuned to the global aspects of visual scenes, children focus more on local details. This suggests that the interplay between bottom-up and top-down processes changes in favor of the latter with age. This is in line with studies employing artificial stimuli (Whiting et al., 2005; Madden et al., 2007), which speak for an increase in top-down processes, such as reliance on expectations, in the later stages of life. Moreover, older adults are less efficient in distinguishing elementary visual features (Ellis et al., 1996). Nevertheless, the novel finding that the children in our study have the highest feature-related viewing speaks for a general lifespan decrease in reliance on bottom-up processes. Foulsham et al. (2009) have shown that the relative saliency at the fixations of an agnosic patient was higher than that of controls. Given that agnosia disrupts the high-level information regarding objects, their finding highlights the interplay between these two routes of processing. Our data suggest that this type of top-down knowledge is gradually acquired in life. In line with this, the present results further reveal that the age-related drop in feature-fixation relation is accompanied by an increase in idiosyncratic viewing patterns. Accordingly, the top-down processes in viewing behavior that become dominant with age might be subjective and reflective of the differences in life histories of the subjects. For older adults, age-related decline in visual functioning (Haegerstrom-Portnoy, 2005; Haegerstrom-Portnoy et al., 1999) further supports this argument. In sum, the data presented here are in line with the idea that bottom-up influences on fixation selection lose strength with age, while at the same time more individual viewing strategies take over.

The analysis of the correlations between performance, explorativeness, and feature-related viewing uncovered significant differences among age groups. First, there was a surprising double dissociation between children and older adults. As discussed above, the former group displays the highest and the latter the lowest of the feature-related viewing patterns. By correlating subject-specific explorativeness and feature-related viewing metrics, we found that more explorative individuals tend to shift their viewing strategy away from that which characterizes their age group. That is, whereas more explorative children have lower feature-related viewing ratings, the reverse is true for older adults. The distribution of young adults’ correlations, on the other hand, is centered around zero. It could be speculated that such correlations provide support for the differentiation-dedifferentiation theory of lifespan development (Li et al., 2001, 2004). This theory is based on the finding that, unlike for young adults, the correlations between several cognitive abilities are high for children and for older adults. This is thought to be due to the underlying neural resources that are not specialized for different facets of cognitive behavior, an idea that is supported by neuroimaging in the case of older age (Park et al., 2004). In line with this, the current study revealed that most correlations between different feature saliencies and explorativeness had the same sign within (but not between) the children and older adult age groups, whereas younger adults displayed a mixture of positive and negative correlations. Moreover, older individuals displaying relatively high explorativeness and feature-related viewing also performed better in the subsequent patch-recognition task. Being tuned to the low-level match between the image and the patch might be one way to solve this task. Interestingly, children and older adults have comparable task performance, yet differ in terms of the feature-relatedness of their viewing. The feature-related viewing at ceiling levels in children might have precluded a similar correspondence to performance. Baltes and Lindenberger (1997), and Lindenberger and Ghisletta (2009) have shown that individual differences across cognitive and sensory tasks are highly correlated in older populations. Similarly, applying tests which are designed to selectively test ventral and dorsal streams of visual processing, Chen et al. (2002) demonstrated that whereas young adults’ correlations for these different visual processing streams were low, one factor was enough to explain older adults’ data on all tasks. Even though we do not suggest that being more feature-tuned or explorative necessarily reflects better perception and as such corresponds to higher cognitive ability, the correlations observed are worth addressing from this theoretical stance in future studies.

The present results support the notion that with more life-experience, individual top-down strategies overshadow bottom-up processes. Furthermore, within age groups characterized by a particular tendency to employ bottom-up (children) or top-down (older adults) viewing strategies, more explorative individuals seem to shift to the otherwise less prominent strategy. We conclude that the so-far neglected developmental aspects of fixation selection can crucially contribute to our understanding of overt visual attention under natural conditions.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We are grateful to Tom Foulsham, Canan Karatekin, Tim Kietzmann, David J. Madden, Cliodhna Quigley, Ueli Rutishauser, Marius ’t Hart and three anonymous reviewers for comments and discussions on an earlier version of the paper. This work was partially supported by European Commission FP7-ICT-2007-1 (Grant Agreement Number 217148 [Peter König]).

References

Açik, A., Onat, S., Schumann, F., Einhäuser, W., and König, P. (2009). Effects of luminance contrast and its modifications on fixation behavior during free viewing of images from different categories. Vision Res. 49, 1541–1553.