Edited by: Frans Verstraten, Universiteit Utrecht, Netherlands
Reviewed by: Alice O'Toole, University of Texas at Dallas, USA; Nicholas Costen, Manchester Metropolitan University, UK
*Correspondence: Harold Hill, School of Psychology, University of Wollongong, Wollongong, NSW 2522, Australia. e-mail:
This article was submitted to Frontiers in Perception Science, a specialty of Frontiers in Psychology.
This is an open-access article subject to an exclusive license agreement between the authors and Frontiers Media SA, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
Not all detectable differences between face images correspond to a change in identity. Here we measure both sensitivity to change and the criterion difference that is perceived as a change in identity. Both measures are used to test between possible similarity metrics. Using a same/different task and the method of constant stimuli, criterion is specified as the 50% “different” point (P50) and sensitivity as the difference limen (DL). Stimuli and differences are defined within a “face-space” based on principal components analysis of measured differences in three-dimensional shape. In Experiment 1 we varied the views available. Criterion (P50) was lowest for identical full-face view comparisons that can be based on image differences. When comparing across views, P50 was the same for a static 45° change as for multiple animated views, although sensitivity (DL) was higher for the animated case, where it was as high as for identical views. Experiments 2 and 3 tested possible similarity metrics. Experiment 2 contrasted Euclidean and Mahalanobis distance by setting PC1 or PC2 to zero. DL did not differ between conditions, consistent with Mahalanobis distance. P50 was lower when PC2 changed, emphasizing that perceived changes in identity are not determined by the magnitude of Euclidean physical differences. Experiment 3 contrasted a distance with an angle based similarity measure. We varied the distinctiveness of the faces being compared by varying distance from the origin, a manipulation that affects distances but not angles between faces. Angular P50 and DL were both constant for faces from 1 to 2 SD from the mean, consistent with an angular measure. We conclude that both criterion and sensitivity need to be considered and that an angular similarity metric based on standardized PC values provides the best metric for specifying what physical differences will be perceived as a change in identity.
Being able to detect differences between different face images is critical for recognition. However not all detectable changes correspond to a change in identity. In this paper we measure the criterion difference people use when making “same or different identity?” judgments, with criterion specified in terms of physical differences in three-dimensional (3D) shape of the face itself. The criterion corresponds to the physical difference above which people will tend to respond “different” and below which they will tend to respond “same.” We also measure sensitivity to determine the extent to which these measures co-vary. For example in Experiment 1 we vary viewpoint, a manipulation well-known to affect sensitivity but one which does not affect underlying differences in shape. Both criterion and sensitivity are tested and defined within a “face-space” based on a principal components analysis (PCA) of measured variations in 3D face shape and, in Experiments 2 and 3, differences between conditions allow us to test between alternative distance metrics for relating physical to perceived differences.
Many previous studies have addressed our ability to detect changes between faces. For example Freire et al. (
This distinction mirrors that made between sensitivity and bias in signal detection theory. Same/different and other decisions are co-determined by both sensitivity, the ability to discriminate, and criterion, the point on the decision axis where responses change (Macmillan and Creelman,
Bias has been reported in previous experiments on face matching and recognition. For example there is a bias to respond “same” when matching a photograph to a live individual (Kemp et al.,
We record the proportion of “different” responses as a function of the physical differences between faces and use these proportions to estimate the physical difference that would result in people responding “different” 50% of the time (P50). P50 specifies the abscissa location of the psychometric function that links physical differences to observer responses and thus is determined by criterion. Specifically, P50 corresponds to the point where the mean difference between the faces in a pair would be equal to the criterion. We also estimate the difference limen (DL), half the difference between the 75 and 25% points on the psychometric function. This corresponds to the steepness of the psychometric function, an index of sensitivity: a smaller DL indicates a steeper function and thus higher sensitivity. Previous studies using same/different tasks have only reported proportion correct, providing an indication of sensitivity but not criterion (Freire et al.,
In Experiment 1 we vary the views available, a manipulation well-known to affect sensitivity (Bruce et al.,
The differences between reference and comparison faces in Experiment 1 and the experimental manipulations used in Experiments 2 and 3 are all defined in terms of “face-space,” a widely used metaphor defined and elaborated elsewhere (Valentine,
We focus on shape because it is a property of the face itself and as such genuinely represents a
Principal components analysis of shape or any other property defines a space with orthogonal axes that correspond to the principal sources of variation between faces. The PCs are the dimensions of the space and are ordered in terms of the proportion of the total variation that they account for. PCA based physical face-spaces are widely used, although most are based on PCA of face images (Turk and Pentland,
where
An alternative similarity metric is based on differences between angles subtended by “identity vectors” at the origin of face-space rather than distances (Leopold et al.,
In Experiment 3 we test between angle and distance based measures by varying the distinctiveness of the faces being compared. Distinctiveness is a central concept to face-space models often characterized in terms of “how easy would it be to pick out this person in a crowd” (Light et al.,
In all experiments we estimate criterion P50 and sensitivity DL in terms of physical differences in facial shape. In Experiment 1 we use the average face as a reference and test whether P50 and DL are affected by the face views available. We expect sensitivity to be affected but, if criterion is a function of the differences between faces and independent of sensitivity, it should remain constant. Experiment 2 provides a test between two possible physical distance metrics, Mahalanobis and Euclidean, to see which best predicts observed P50 and DL. Lastly, in Experiment 3 we use 10 reference faces and vary the distinctiveness of both reference and comparison faces in order to test between angle and distance based similarity measures. The general methods are described next, followed by the individual experiments.
All stimuli were rendered images of 3D solid body computer models of synthetic faces based on measurements of real faces. For Experiments 1 and 2, the analysis was based on 54 faces recorded using an NEC “Fiore” 3D facial surface scanner (Yoshino et al.,
Scanned shape data for each individual real face is first fit to a generic face mesh with well specified topological properties (Claes,
Details of the synthetic faces generated were determined by experimental design and are specified in the corresponding methods section. Both types of scanner record information about surface color as well as shape, and this information was used to define an average surface color which was applied to all stimuli. Synthetic face models were imported to Blender v2.45 (
Ethical approval for all experiments was granted by the University of Wollongong Human Research Ethics Committee, application HE09/358, in accordance with Australian National guidelines. All participants were students at the university, were over 18, and were assumed to have normal or corrected to normal vision and face processing abilities. Participation was irrespective of ethnicity and this was not recorded.
For Experiments 1 and 3 testing took place in groups of up to 20 but at individual computers. For Experiment 2 participants were tested individually. Presentation of stimuli and recording of responses were controlled by individual Dell PCs using software written in Runtime Revolution. Viewing distance was unconstrained but approximately 50 cm. Participants entered an individual identifier, and on-screen instructions informed them that they would be shown pairs of faces presented simultaneously and that their task was to decide whether “the two faces are the same person or not.” The 480 × 480 pixel images measured 14 cm × 14 cm (16° × 16°) on the screen. Individual faces varied in size but were approximately 7 cm × 10 cm (8° × 11°). The centers of the two images were separated by 18 cm (20°) horizontally and offset 5 cm (6°) vertically to prevent direct comparisons. Left/right positions and order of trials were fully randomized for each participant. Responses were made by moving a scroll bar. Although this enabled participants to indicate a level of confidence in their decision, results were binarized as “same” or “different” for analysis. The slider started each trial positioned on the center tick mark, but this was not a valid response and participants had to move the pointer to one side or the other. Images remained on the screen until a response was made. Ten practice trials preceded the experiment proper, during or after which participants were able to ask questions regarding the task. No feedback about accuracy was given at any stage. There were a total of 110 trials in Experiments 1 and 3 and 220 in Experiment 2.
All experiments used the method of constant stimuli with 11 levels of difference including identical face pairs. There were 10 repetitions at each level and the percentage of “different” responses recorded. Observed response probabilities were used to estimate the median location (P50) and 25th (P25) and 75th (P75) percentiles for each observer using the Spearman–Karber method, a distribution free approach to estimating psychometric functions (Miller and Ulrich,
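As an illustration of how P50 and DL can be read off a set of observed response proportions, the following minimal Python sketch enforces a non-decreasing function with a running maximum and then interpolates linearly between tested levels. This is an assumption made for illustration, in the spirit of a distribution-free approach; it is not claimed to reproduce the Spearman–Karber procedure used here. The function names, the difference levels, and the response proportions are all hypothetical.

```python
def percentile_point(levels, p_different, target):
    """Difference level at which the 'different' rate first reaches `target`."""
    # Assumed step: force the proportions to be non-decreasing (running maximum).
    mono = []
    running = 0.0
    for p in p_different:
        running = max(running, p)
        mono.append(running)
    if target < mono[0] or target > mono[-1]:
        return float("nan")  # the data do not span the target proportion
    for i, p in enumerate(mono):
        if p >= target:
            if i == 0:
                return float(levels[0])
            # Linear interpolation between the two bracketing levels.
            x0, x1 = levels[i - 1], levels[i]
            p0, p1 = mono[i - 1], mono[i]
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)


def p50_and_dl(levels, p_different):
    """Return (P50, DL), with DL defined as half of P75 minus P25."""
    p25 = percentile_point(levels, p_different, 0.25)
    p50 = percentile_point(levels, p_different, 0.50)
    p75 = percentile_point(levels, p_different, 0.75)
    return p50, (p75 - p25) / 2.0


# Hypothetical observer: 11 difference levels (in SD units), 10 trials per level.
levels = [i / 10 for i in range(11)]
p_different = [0.0, 0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0, 1.0, 1.0]
p50, dl = p50_and_dl(levels, p_different)
print(f"P50 = {p50:.3f} SD, DL = {dl:.3f} SD")
```

A steeper function over the same levels would leave P50 unchanged but shrink DL, which is the dissociation between criterion and sensitivity that the experiments test for.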
The primary aim of this experiment was to estimate the criterion adopted by observers when deciding whether two faces presented are of the same person or not. We also measure sensitivity and test how both vary as a function of the face views presented.
The average face was used as the reference throughout. We expect criterion will vary as a function of both location and direction in face-space and the average represents a neutral starting point in both respects. Direction was randomized with the constraint that all comparison faces were at the corners of concentric multi-dimensional hypercubes (Wilson et al.,
We varied the face views shown as a between participants manipulation. In one condition (FF) identical full-face views were shown, as in many previous studies (Freire et al.,
We expected sensitivity to be high for FF and GIF but lower for TQ. Criterion may not vary if it is independent of sensitivity and reflects underlying difference in shape between faces as intended.
Forty-seven undergraduate students took part in this experiment as part of third year laboratory classes.
Participants were randomly assigned to one of the three conditions: paired full-face views (FF), full-face and TQ views (TQ), or paired GIF animations (GIF), as outlined in the Section “
Other details of the method and materials were as described in the Section
Figure
Measure | FF | TQ | GIF |
---|---|---|---|
P50 | 0.33 (0.16) | 0.46 (0.13) | 0.47 (0.19) |
DL | 0.14 (0.14) | 0.31 (0.14) | 0.18 (0.11) |
A Kruskal–Wallis test showed a main effect of Views presented on P50:
There was also a main effect of View on DL:
Median false alarm rates (“different” responses at the 0 SD level) for the three conditions were: FF 0.15, TQ 0.35, GIF 0.10. The high rate in TQ shows how changes in viewing conditions can be misinterpreted as a change in identity even when there is no change in the shape of the face.
The proportion of “different identity” decisions increased with distance from the average reference face as expected. Criterion was significantly lower for FF than TQ or GIF, which did not differ from each other. Sensitivity was higher for FF and GIF than TQ as expected.
Criterion was constant for both conditions that involved comparing between views, despite the significant difference in sensitivity. This is consistent with people making their judgments on the basis of differences between faces that are independent of viewing conditions that significantly affect sensitivity. Criterion was significantly lower in the identical view condition but, as argued in the introduction, this is likely to reflect image rather than face based comparisons. Based on these data, the best estimate of the criterion people used is 0.47 SD when the average face is the reference. While it would clearly be necessary to test a variety of other view combinations and other changes in presentation conditions to determine if this remains constant over a wider range of conditions, that is not the focus of this work.
Sensitivity was higher for FF and GIF comparisons than for TQ comparisons as expected. The GIF condition was associated with both high sensitivity and low false alarm rates while still ensuring that comparisons have to be face rather than image based and was used for the remaining experiments.
This experiment was designed as a test between Euclidean and Mahalanobis distance metrics, again using the average face as a reference. While in Experiment 1 comparison faces were designed to be equivalent with respect to these two metrics, here we compared two conditions that would be expected to differ. Stimuli were generated in the same manner as before except that either the first (PC1) or second (PC2) PC was set to zero (please see Figure
By definition PC1 is associated with more physical variation and a larger SD than PC2. This means that Euclidean distances will be larger when PC2 is zero and PC1 varies than when the reverse is true. For the particular PCA space used Euclidean distances will be ∼1.5× greater in the PC2 zero condition – a function of the proportions of total variance accounted for by each dimension in the PCA space. In contrast Mahalanobis distances are standardized by SD and will not vary between conditions.
If criterion corresponds to a particular Mahalanobis distance, P50 will not vary between conditions. In contrast, if criterion corresponds to a particular Euclidean distance, the P50 SD value will be 1.5× larger in the PC1 zero condition.
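The contrast between the two metrics can be made concrete with a small Python sketch. It assumes that each comparison face sits at a corner of a hypercube in standardized PC coordinates (every non-zero coordinate is plus or minus the difference level), and it uses made-up SDs for four PCs; the real space has more dimensions and different SDs, so the ratio printed here is not the 1.5× figure quoted above. All names are hypothetical.

```python
import math
import random


def corner_face(n_pcs, level, zero_pc, rng):
    """Standardized PC coordinates: +/-level on every PC except `zero_pc`."""
    z = [level * rng.choice((-1.0, 1.0)) for _ in range(n_pcs)]
    z[zero_pc] = 0.0
    return z


def mahalanobis_from_mean(z):
    """With uncorrelated PCs, this is the norm of the standardized values."""
    return math.sqrt(sum(v * v for v in z))


def euclidean_from_mean(z, sds):
    """Convert standardized values back to physical units before measuring."""
    return math.sqrt(sum((v * s) ** 2 for v, s in zip(z, sds)))


sds = [3.0, 2.0, 1.5, 1.0]  # hypothetical SDs, decreasing from PC1 onwards
rng = random.Random(0)
pc1_zero = corner_face(len(sds), 1.0, 0, rng)
pc2_zero = corner_face(len(sds), 1.0, 1, rng)

m1, m2 = mahalanobis_from_mean(pc1_zero), mahalanobis_from_mean(pc2_zero)
e1, e2 = euclidean_from_mean(pc1_zero, sds), euclidean_from_mean(pc2_zero, sds)
print(f"Mahalanobis: PC1 zero {m1:.3f}, PC2 zero {m2:.3f}")
print(f"Euclidean:   PC1 zero {e1:.3f}, PC2 zero {e2:.3f}, ratio {e2 / e1:.2f}")
```

Under these assumptions the Mahalanobis distance is identical in the two conditions, whereas the Euclidean distance is larger when PC2 is zero, so a fixed Euclidean criterion would be reached at a smaller SD level in that condition.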
Viewing conditions were constant in this experiment ruling out one possible source of variation in sensitivity. If sensitivity is constant, the same differences between conditions would be expected for DL as for P50. However sensitivity can vary as a function of direction in face-space (Ross et al.,
Twenty-four undergraduate students took part in this experiment for course credit and were tested individually.
This experiment was a within subjects design with two conditions, PC1 zero or PC2 zero. Comparison faces were constructed in the same way as for Experiment 1, except that either PC1 or PC2 was zero for all comparison faces in the condition. Sign was randomized for other dimensions as before. There were a total of 220 trials, 110 for each condition. Order was fully randomized for each participant and PC1 zero and PC2 zero trials were not distinguished in any way.
The reference face was the average face for both conditions and all stimuli were presented as animated GIFs. Other details of the method and materials were as described in the Section
Figure
Measure | PC1 zero | PC2 zero |
---|---|---|
P50 | 0.46 (0.13) | 0.55 (0.33) |
DL | 0.17 (0.18) | 0.18 (0.18) |
Wilcoxon signed ranks dependent samples test showed that P50 was significantly lower for PC1 zero faces,
There was no significant difference in DL,
Median false alarm rate (“different” responses at the 0 SD level) was 0.15 based on the 20 identical 0 SD pairs.
Participants adopted a significantly lower criterion in the PC1 zero condition but sensitivity did not differ. The null effect on sensitivity is consistent with Mahalanobis distance, and the difference in criterion is in the opposite direction to that predicted by Euclidean distance. While there is considerable overlap between individual functions, P50 appears to be somewhat more affected by changes in PC2, despite the fact that this dimension accounts for less physical variation than PC1. Standardizing PC values in terms of the associated SD, as is the case for Mahalanobis distance, provides a principled way of ensuring that distances are not dominated by the values of early PCs.
As can be seen from Figure
In Experiment 3 we test whether an angle based similarity measure in general better accounts for perceived changes in identity than distance based measures.
In this experiment we test whether changes in angle capture changes in perceived identity better than changes in distance. Angle here refers to the difference in direction between the “identity vectors” that define reference and comparison faces relative to the mean in face-space (Leopold et al.,
Ten reference faces were used, all one SD from the mean and generated in the same way as the 1 SD comparison faces in Experiment 1. One SD is the mean value expected for a face drawn from a multinormally distributed population. For each of these reference faces comparison faces were “lateral caricatures” constructed by moving in a direction orthogonal to the reference face identity vector (Rhodes et al.,
As a between participants manipulation we varied the distance of both comparison and reference faces from the mean, in effect moving all faces to a different annulus. This has the effect of varying their distinctiveness and the distance between them in face-space while keeping angular differences constant. Distances between vector endpoints increase in proportion to distance from the mean (by similar triangles). We ran four conditions with SD 0.5, 1.0, 1.5, or 2.0. Reference faces for other conditions were generated by automatically caricaturing the 1 SD reference faces (Brennan,
If perceived differences in identity are determined by differences in angle, distinctiveness should have little or no effect on P50 and/or DL. Alternatively, if distance between faces is critical, P50 and/or DL expressed as an angle should decrease with increasing distinctiveness, as a constant distance between faces will correspond to a smaller angle when the faces are at a greater distance from the mean.
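The geometry behind these two predictions can be sketched in a few lines of Python. The sketch assumes that a comparison face lies at the same distance from the mean as its reference, rotated by a given angle in the plane spanned by the reference identity vector and an orthogonal direction; the three-dimensional vectors and all function names are hypothetical and stand in for the full face-space.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def distance(a, b):
    return norm([x - y for x, y in zip(a, b)])


def angle_deg(a, b):
    """Angle subtended at the origin (the mean face) by two identity vectors."""
    cos = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def comparison_at_angle(ref, other, theta_deg):
    """Rotate `ref` by theta towards the part of `other` orthogonal to it."""
    r = norm(ref)
    u = [x / r for x in ref]
    proj = sum(x * y for x, y in zip(other, u))
    v = [x - proj * y for x, y in zip(other, u)]  # Gram-Schmidt step
    nv = norm(v)
    v = [x / nv for x in v]
    t = math.radians(theta_deg)
    return [r * (math.cos(t) * a + math.sin(t) * b) for a, b in zip(u, v)]


def angle_for_fixed_distance(d, r):
    """Angle subtended by a chord of length d between faces at distance r."""
    return math.degrees(2.0 * math.asin(d / (2.0 * r)))


ref = [1.0, 0.0, 0.0]  # hypothetical reference face, 1 SD from the mean
comp = comparison_at_angle(ref, [0.3, 1.0, 0.5], 54.0)
base = distance(ref, comp)

results = []
for s in (0.5, 1.0, 1.5, 2.0):  # the four distinctiveness conditions
    a = [s * x for x in ref]
    b = [s * x for x in comp]
    results.append((s, angle_deg(a, b), distance(a, b)))
    print(f"{s:.1f} SD: angle {results[-1][1]:.1f} deg, distance {results[-1][2]:.3f}")

print(f"Same distance at 2 SD: {angle_for_fixed_distance(base, 2.0):.1f} deg")
```

Scaling both faces leaves the angle unchanged while the distance between them grows in proportion, and holding the distance fixed instead gives a smaller angle at greater distinctiveness, which is the pattern a distance based criterion would predict.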
All stimuli were generated using a PCA space based on a different set of faces and measurements than Experiments 1 and 2 (please see
One hundred eleven undergraduate students took part in this experiment as part of third year laboratory classes.
Participants were randomly assigned to different groups which varied according to the distinctiveness of the faces shown: 0.5, 1.0, 1.5, or 2.0 SD. P50 and DL were both specified in terms of angle.
Other details of the method were as described in the Section
Figure
Measure | 0.5 SD | 1.0 SD | 1.5 SD | 2.0 SD |
---|---|---|---|---|
P50 | 59.4 (27.5) | 52.7 (11.9) | 48.9 (18.0) | 51.0 (18.3) |
DL | 23.3 (19.2) | 13.3 (5.5) | 11.4 (5.6) | 12.4 (7.9) |
A Kruskal–Wallis test showed an effect of Distinctiveness on P50, with P50 appearing to be higher for the “anti-caricatured” 0.5 SD condition:
There was also an effect of distinctiveness on sensitivity with DL higher, sensitivity lower, for the 0.5 SD condition:
Both criterion and sensitivity expressed as an angle were constant over the SD1 to SD2 range: median P50 was 51.5° and median sensitivity DL 12.3°. This is consistent with an angle based similarity metric predicting perceived differences in identity. The corresponding distances between faces will have doubled over this range and results are clearly not consistent with either P50 or DL reflecting a constant distance as this would have corresponded to a decreasing angle.
Difference limen was significantly larger for 0.5 SD faces and there was a trend for a higher P50 in this condition. Previous studies have found that anti-caricatures are more poorly recognized than lateral caricatures, contrary to an angle based account (Rhodes et al.,
A criterion angle of 51.5° was found for judgments of faces in the normal (1 SD) to distinctive (2 SD) range. Angular differences account for both criterion and sensitivity over this range. In future it would be important to test if this generalizes to cases where the magnitudes of comparison and reference face identity vectors are not the same, for example when the reference face is distinctive but the comparison face is not.
We measured criteria (P50) and sensitivity (DL) for same or different identity decisions in terms of differences in 3D face shape. Increasing physical differences between faces increased the proportion of “different” identity responses throughout. The relationship between physical differences and response proportions provided the basis for estimates of the magnitude of differences perceived as a change in identity.
In Experiment 1 criterion was the same for a static or an animated change in view, despite the expected difference in sensitivity. This is consistent with criterion being determined by underlying differences between faces and not particular properties of the views shown. Sensitivity for the animated condition was equivalent to identical full-face view comparisons. The latter condition was associated with a significantly smaller criterion difference but both this and the high sensitivity may be accounted for by image rather than face based comparisons. The overall dissociation between sensitivity and criterion emphasizes the need to consider both when seeking to understand and predict both laboratory and real world identity matching.
Experiment 2 showed that neither criterion nor sensitivity corresponds to Euclidean distance: the major sources of physical variation are not necessarily the most important for decisions about identity. Sensitivity corresponded well with Mahalanobis distance, where each PC is weighted in terms of its variance. However criterion was still disproportionately affected by PC2, although this is associated with less physical variation than PC1 with which it was compared. This may reflect the particular dimensions tested and it was argued that Mahalanobis distance provides a principled way to weight the contribution of different sources of variation that allows dimensions more likely to be associated with individual variation to influence physical distance measures.
Experiment 3 provided evidence that an angle based difference measure better predicts perceived differences in identity than a distance based measure. Angular criterion and sensitivity were constant over a range where corresponding distances doubled. Observers were less sensitive to differences between “anti-caricatured” faces 0.5 SD from the mean, and the associated criterion was higher.
In the introduction we argued that perceiving a change in identity involves more than just being able to detect differences between images. Our aim was to measure the criterion difference that would correspond to a perceived change in identity at least 50% of the time. The human face recognition system remains one of the best available (although see O'Toole et al.,
There were considerable individual differences, as evidenced here by IQR for P50 and reported previously (Kemp et al.,
The criterion observed did not correspond to a step change; both individual and pooled psychometric functions had sigmoid or similar non-linear forms, with response rates often constant at the ends of the scale. The comparison between animated and static views in Experiment 1 showed that the underlying psychometric function can be made steeper without shifting criterion. In a sense criterion corresponds to a category boundary between “same” and “different” person, although we did not seek to test for categorical perception as such (Harnad,
The broader aim of this work is to link physical and psychological face-spaces. Experiments 2 and 3 provided evidence for Mahalanobis over Euclidean distance as a metric, and for differences in angle over distance as determinants of identity distinctions. Clearly there are many other issues to explore, particularly the optimal weighting of physical dimensions and we are currently doing this by seeking to establish a psychological face-space on the basis of perceived similarity that can be specified in terms of the physical face-space. It will then be necessary to test whether the psychological face-space provides a better account of perceived changes in identity using the methods developed here.
While the use of 3D shape models has considerable advantages, keeping surface reflectance constant inevitably affects the task and generalization of results. Surface scanning technology is advancing and can now produce near photorealistic results. If accurate surface reflectance information that was independent of lighting could be recorded and modeled, this would provide an even more powerful tool for generating controlled stimuli. The ability to scan faces at increasing frame rates will also allow within face variation to be addressed – faces are constantly changing quite dramatically in shape when, for example, we speak, eat, or express emotion although there is no corresponding change in identity. Incorporating such variation in face-space models is a critical challenge for future work especially as within face changes epitomize types of large detectable changes that should not be interpreted as a change in identity.
The criteria adopted when people make same or different identity decisions determine whether two example faces are seen as being the same person or not. The experiments reported provided estimates of the physical differences in 3D face shape that correspond to these criteria. Criterion was also found to dissociate from sensitivity with, for example, P50 but not DL the same for an animated and a static change in view (Experiment 1). Raw Euclidean physical differences in shape did not characterize when faces were seen as different but Mahalanobis distances predicted sensitivity and provide a principled compromise for weighting different sources of variation (Experiment 2). The angles between identity vectors were found to characterize perceived changes in identity better than distances for faces one or more SD from the average (Experiment 3). Taken together, the results demonstrate the importance of considering criterion in addition to sensitivity and suggest that angular differences based on a Mahalanobis metric may provide a good way to link physical to perceived differences.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Movie
Illustration of matching across animated views, the animated GIF condition used in all experiments. The stimuli shown here are taken from Experiment 3 and show a reference face (left) and comparison face (right) where the “identity vectors” subtend an angle of 54° at the origin of the face-space used. Both faces are 1 SD from the average. Please see Experiment 3 and Figure
This work was supported by Australian Research Council Discovery Project 0986898 and Engineering and Physical Sciences Research Council grant EP/F037503/1. Thanks to Harry Matthews for careful and insightful comments on earlier drafts.