A cross-cultural comparison between South African and British students on the Wechsler Adult Intelligence Scales Third Edition (WAIS-III)

Cockcroft, Kate; Alloway, Tracy; Copello, Evan; Milligan, Robyn

doi:10.3389/fpsyg.2015.00297

ORIGINAL RESEARCH article

Front. Psychol., 13 March 2015

Sec. Quantitative Psychology and Measurement

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.00297

Kate Cockcroft¹*

Tracy Alloway²

Evan Copello²

Robyn Milligan¹

¹Department of Psychology, University of the Witwatersrand, Johannesburg, South Africa
²Psychology, University of North Florida, Jacksonville, FL, USA

There is debate regarding the appropriate use of Western cognitive measures with individuals from very diverse backgrounds to that of the norm population. Given the dated research in this area and the considerable socio-economic changes that South Africa has witnessed over the past 20 years, this paper reports on the use of the Wechsler Adult Intelligence Scale Third Edition (WAIS-III), the most commonly used measure of intelligence, with an English second language, multilingual, low socio-economic group of black, South African university students. Their performance on the WAIS-III was compared to that of a predominantly white, British, monolingual, higher socio-economic group. A multi-group confirmatory factor analysis showed that the WAIS-III lacks measurement invariance between the two groups, suggesting that it may be tapping different constructs in each group. The UK group significantly outperformed the SA group on the knowledge-based verbal, and some non-verbal subtests, while the SA group performed significantly better on measures of Processing Speed (PS). The groups did not differ significantly on the Matrix Reasoning subtest and on those working memory subtests with minimal reliance on language, which appear to be the least culturally biased. Group differences were investigated further in a set of principal components analyses, which revealed that the WAIS-III scores loaded differently for the UK and SA groups. While the SA group appeared to treat the PS subtests differently to those measuring perceptual organization and non-verbal reasoning, the UK group seemed to approach all of these subtests similarly. These results have important implications for the cognitive assessment of individuals from culturally, linguistically, and socio-economically diverse circumstances.

Introduction

Intelligence (IQ) testing is widely used in cross-cultural contexts, yet there are still persistent critiques that such tests can perpetuate racist beliefs and lead to discrimination (Helms, 1992; Kaufman, 1994). Although it has been suggested that non-verbal, visuospatial IQ subtests may be free of cultural and linguistic biases, this assumption has been contested (Rosselli and Ardila, 2003). Normative data for most tests of cognitive ability are predominantly based on monolingual, reasonably affluent, English first language individuals and these norms are still routinely applied to culturally and linguistically diverse individuals. In instances where other cultural groups are represented in the standardization samples of tests developed in Europe and the United States (US), they tend not to include cultures beyond these continents. This makes the application of such measures to individuals from culturally, linguistically and socio-economically different circumstances problematic.

This matter has particular relevance in South Africa, a country with 11 official languages, a majority of English second language speakers, and high levels of poverty. South Africa (SA) shares historical and economic ties with the United Kingdom (UK) and is part of the Commonwealth of Nations, yet represents a unique blend of linguistic and cultural diversity. For those South African cultures (white English- and Afrikaans-speaking) that share many similarities with Western Europe, IQ performance tends to be comparable (Shuttleworth-Edwards, 1996; Claassen et al., 2001). However, the majority of the SA population belongs to cultures, and linguistic and socio-economic circumstances that are very different from that of Western Europe and the typical IQ test normative groups.

Despite this, the Wechsler Adult Intelligence Scale Third Edition (WAIS-III) has been adapted and standardized for use with all South Africans. This adaptation has been criticized for failing to re-pilot the adaptations for possible cultural and language biases and for neglecting to stratify the norms according to quality of education (Shuttleworth-Edwards et al., 2004; Foxcroft and Aston, 2006). The reason for the latter critique is that the current SA educational systems (former Model C, ex-DET, and privately funded) offer education that differs vastly in quality. Former Model C state schools are those that were previously reserved for the education of white children under apartheid and were modeled on UK public schools. They are comparable to privately funded schools and generally provide a high standard of education. In contrast, schooling provided by the legacy Department of Education and Training (DET) for black children under apartheid, continues to be constrained by limited resources and large classes despite new educational policies that mandate fairer allocation of resources. Although many black children now attend the former Model C and private schools, a large proportion have no option but the ex-DET schools (Shuttleworth-Edwards et al., 2004; Fleisch, 2008).

An advantaged, Western-style schooling teaches problem solving, as well as test-taking skills, which are heavily drawn on in traditional IQ tests (Nell, 2000; Shuttleworth-Edwards et al., 2013). Shuttleworth-Edwards et al. (2004) found that both Verbal and Performance IQs on the WAIS-III were negatively affected by a poor quality education in a sample of black Africans with an African language as their mother-tongue. Consequently, these authors argue for the stratification of IQ test norms in terms of both level and quality of education to allow for some control over the considerable differences between educational systems. Level and quality of education have been shown to correlate significantly with performance on most WAIS-III indices and subtests (Heaton et al., 2003) in European (Kessels and Wingbermuhle, 2001; Grégoire, 2004), American (Razani et al., 2006), Australian (WAIS-R, Shores and Carstairs, 2000), and African samples (Shuttleworth-Edwards et al., 2004). Quality of education appears to be a more important variable than years of education, and may explain some of the discrepant findings in cross-cultural comparisons of the WAIS-III that have emerged despite careful matching for educational level (Manly et al., 2004; Shuttleworth-Edwards et al., 2004).

In addition to level and quality of education, past studies have identified cultural, linguistic, and socio-economic factors as affecting IQ test performance (Harris et al., 2003; Turkheimer et al., 2003; Manly et al., 2004; Shuttleworth-Edwards et al., 2004; Razani et al., 2006; Boone et al., 2007; Walker et al., 2009). In considering the effects of these complex and inter-related variables on IQ test performance, we have focused our review on empirical evidence concerning the WAIS-III, its abbreviated form, the WASI-III and its predecessor, the WAIS-R (Revised), and not on other versions of this test which are structurally quite different, or on evidence from children’s IQ tests, which is complicated by developmental considerations.

The influence of cultural background on IQ test performance has received limited attention in the last two decades and consequently many of the available studies are dated. Most of the studies reviewed here are based on the assumption that cultural or ethnic groups reflect a homogenous set of socio-cultural characteristics, which is not necessarily the case, and which makes this field particularly difficult to investigate. A summary of the most recent available studies shows that African Americans tend to score significantly lower than white and Hispanic groups on the WAIS-III Verbal Comprehension (VCI), Perceptual Organization (POI), and Processing Speed (PSI) indices (Heaton et al., 2003), on the WAIS-R subtests (Kaufman et al., 1991) and WAIS-R Full Scale IQ (Byrd et al., 2006). Other comparisons found that African American individuals score lower than Caucasian individuals on WAIS-III Digit Span, but not Digit Symbol (Razani et al., 2007), while individuals of Hispanic, Asian, and Middle Eastern backgrounds score lower than those of English-speaking backgrounds on WASI-III Vocabulary and Similarities (Razani et al., 2006). The latter study found no cross-cultural effects on Block Designs and Matrix Reasoning, although these subtests have been widely criticized elsewhere as lacking cross-cultural validity (Kaufman et al., 1988; Marcopulos et al., 1997; Dugbartey et al., 1999; Shuttleworth-Edwards et al., 2004).

The most comprehensive cross-cultural studies on the WAIS-III have been conducted by Shuttleworth-Edwards (1996), Shuttleworth-Edwards et al. (2004, 2013), with black and white Southern Africans. They found that scores for black and white Southern Africans with an advantaged education were comparable to those in the US WAIS-III standardization, while black Africans with disadvantaged educational backgrounds scored up to 20 IQ points lower (Shuttleworth-Edwards et al., 2004). Further refinement of the latter study by removing non-South Africans from the data set and replacing them with data from South African, Xhosa-speaking participants, revealed concomitant increases in WAIS-III subtest, Index and IQ scores as the quality of education increased (Shuttleworth-Edwards et al., 2013). These authors also found that the subtests which revealed the most cross-cultural differences (irrespective of quality of education) were Object Assembly, Symbol Search, Picture Arrangement and Block Design, demonstrating that both verbal and non-verbal subtests are susceptible to such influences.

Systematic comparisons between the various cross-cultural studies on IQ test performance are difficult to undertake due to methodological differences, for example not all of the WAIS-III subtests, Index and IQ scores are consistently reported, while several studies did not match comparison groups for education. While it has been suggested that processing based measures which draw on fluid problem solving skills are likely to be more culture fair than those that rely on acquired, long-term learning (Campbell et al., 1997; Nell, 2000), the review of the literature indicates that there does not appear to be a single WAIS-III subtest that consistently showed no cross-cultural effects, and the most consistent of these effects appear in the verbal subtests.

Effects of culture are likely to be most pronounced where cultural background is most divergent from an English first language, Western culture (Ardila and Moreno, 2001). In support of this, higher performance on IQ tests has been demonstrated by individuals from Mexican–American backgrounds who possess greater ‘Anglo socio-cultural characteristics’ (Gonzales and Roll, 1985; Razani et al., 2006, p. 777). Among the English second language speakers of the WAIS-III standardization sample, Harris et al. (2003) found that acculturation variables such as language preference, years of residence in the US and length of education in the US all accounted for significant variance in VCI and PSI Indices. Degree of acculturation also accounted for the lower scores of African American individuals on the WAIS-R Block Design (Manly et al., 1998). Degree of acculturation has been shown to account for significant variability in the PSI of the WAIS-III (Harris et al., 2003; Kennepohl et al., 2004). While there is some variation across studies in terms of acculturation variables, most include language usage, test-wiseness (test-taking skill, motivation, and perceptions of test face validity), socio-economic status (SES), home and school environments, and level and quality of education (Kennepohl et al., 2004; Shuttleworth-Edwards et al., 2004; Perry et al., 2008). An aspect of acculturation, test-wiseness, has been reported as ‘the most powerful moderator of test performance,’ exerting strong effects on both verbal and non-verbal test performance (Nell, 2000, p. 133). Extent of acculturation will also influence how a particular construct being measured in an IQ subtest is perceived in the target population. In this regard, few studies have considered and statistically evaluated the presence of measurement invariance between comparison groups. Violations of measurement invariance would imply that the same construct is not being measured across different cultural groups and could complicate meaningful interpretation of test data (Dolan et al., 2006; Milfont and Fischer, 2010).

It is evident that performance on IQ tests is influenced by a range of socio-cultural variables which may lead to considerable heterogeneity in performance, particularly when the testees come from backgrounds that are very different to that of the norm sample. However, the body of research in this area is at least 10 years old. Flynn effects, which refer to an average increase of three IQ points per decade (with two IQ points increase per decade on the Verbal Scale and four IQ points on the Performance or non-verbal Scale) have been reported in many nations (Barber, 2005; Flynn, 2007, 2009). To the authors’ knowledge, there are no published data on Flynn effects in black South Africans (the sample of interest in the current study), but data from white and Indian South Africans (te Nijenhuis et al., 2011) show increases in the expected direction and magnitude. These effects are predominantly driven by environmental factors (te Nijenhuis, 2013). Thus, the considerable socio-economic transformations that South Africa has experienced over the last 20 years could mean that the dated cross-cultural research may no longer be relevant to the new generation of young adult South Africans born after the end of apartheid. As an initial attempt to explore this possibility, the purpose of this study was to compare performance on the WAIS-III between two diverse groups, namely a non-standard and a standard population of university students. The latter group was similar to the UK WAIS-III normative group, namely predominantly white, monolingual English speakers from mid-level SES backgrounds in Britain. The non-standard comparison group were multilingual, black African, English second language speakers from low socio-economic circumstances in South Africa. No prior hypotheses were stipulated; rather, we were interested in whether there would be any differences between the groups on the various subtests of the WAIS-III.

Materials and Methods

Participants

British Sample

There were 349 undergraduate UK university students (33% male), ranging from 18 to 58 years (M = 27.18 years; SD = 8.74 years). All were monolingual English speakers, from middle-class backgrounds. SES was determined from university reports on the backgrounds of students. Fifty-two percent resided in urban areas and 33.4% in accessible rural areas. The ethnic distribution of this group (71% white, 1% black, and 18% other, i.e., Asian, Indian, and international students) was broadly similar to that of the UK WAIS-III norm sample (94% white, 3.36% black, and 2.44% other; Wechsler, 1997). Ten percent of the UK sample did not indicate an ethnic group on their biographical questionnaire. All participants had received primary and secondary schooling in English, at public and private schools.

South African Sample

These were 107 undergraduate black African students enrolled at an English medium University (40% male; M = 20.79 years, SD = 1.56 years; range: 18–25 years). Since the quality of primary and secondary education ranges vastly in post-apartheid South Africa and has been shown to affect performance on cognitive tests (Shuttleworth-Edwards et al., 2013), students who had attended schools in the lowest three quintiles of the South African government school system, which represent the poorest schools, were recruited through advertisements on the university campus (Department of Education, 2008). The majority (82%) of the SA participants underwent primary and secondary schooling in English. All were multilingual and spoke, on average, four languages (English and three African languages). In all cases, an African language was the first language, and for 65% English was the L2. Fifty-five percent had acquired English before the age of six and the remainder before 13.

All of the SA participants came from low socio-economic circumstances. The majority (82%) resided in rural areas, in a basic brick house with running water and electricity. Nearly all (98%) had no washing machine, microwave oven, or tumble-dryer. Less than 1% of families owned a motor vehicle or personal computer. Two-thirds (67%) had attended preschool; 52% came from single parent homes, and 28% never knew their fathers. Forty-two percent of mothers had not completed high school, and 35% had post-secondary school qualifications.

Materials and Procedure

An established measure of intelligence, the WAIS-III (Wechsler, 1997), was administered. The South African item administration (Claassen et al., 2001) was used with the SA sample. This version is similar to the UK version, with minor alterations to the content to make it more appropriate to a SA context, such as the use of ‘rands’ instead of ‘pounds,’ and ‘cool drinks’ instead of ‘soft drinks’ in the Arithmetic subtest. In addition, the SA standardization resulted in amendments to four items in the Vocabulary subtest, five in the Information subtest and two in the Comprehension subtest to make them more appropriate for SA testees. Claassen et al. (2001), in their standardization of the WAIS-III for South Africans, found that the differences in scaled scores between individuals tested with both the original UK items and the altered SA items were very small, and thus it is unlikely that this would have exerted any significant effect on the comparisons in the current study.

The WAIS-III comprises 14 subtests, of which 13 were administered (Object Assembly, which does not contribute to the Full Scale IQ and is an optional subtest, was not administered). In the Vocabulary subtest, the individual has to produce a definition of a given word. In the Similarities subtest, the individual has to describe how two words are similar. In the Information subtest, the individual is asked general knowledge questions. These subtests are combined to create the VCI. The Comprehension subtest is not part of the VCI, but contributes to the Verbal IQ. In this subtest, the individual provides a response to various social situations. The following subtests comprise the POI. In Block Design, the individual assembles red and white blocks to recreate a given shape. Matrix Reasoning consists of a sequence of designs where the individual fills in a missing design piece from a selection of options. In Picture Completion, the individual is required to recognize the missing part in a set of pictures. Picture Arrangement is not part of the POI, but contributes to the Performance IQ. It requires the individual to arrange pictures in a logical sequence. The PSI comprises Coding, where symbols are matched with numbers under a time constraint; and Symbol Search, where the individual has to rapidly scan a set of symbols to identify a target. The Working Memory Index (WMI) comprises three subtests: Digit Span, where the individual has to recall numbers in forward and backward order; Letter–Number Sequencing (LNS), where the individual recalls a string of given letters and numbers in numerical and then alphabetical order; and Arithmetic, where the individual responds to mental math questions.

Testing was carried out in a one-to-one, single session on the campus, by two psychologists trained in the administration of the WAIS-III (one for the UK sample and one for the SA sample). Both SA and UK protocols were scored according to the UK WAIS-III manual scoring criteria, with the exception of the few changed content items on the SA administration, in which cases the SA scoring had to be used. Subtest scale scores, Full Scale IQ, Verbal IQ, Performance IQ and Factor Indices were calculated using the UK norms for both the SA and UK samples (Wechsler, 1997). All protocols were scored twice, once by the psychologists who administered the WAIS-III, and once by the primary (SA protocols) and secondary (UK protocols) authors, who were both trained and experienced in WAIS-III administration.

The study was approved by the ethics committees of the University of the Witwatersrand, Johannesburg and the University of Stirling, UK. Informed, written consent was obtained from all participants with appropriate opportunities for withdrawal without prejudice.

Results

Descriptive Statistics and Correlation Analyses

Descriptive statistics for the WAIS-III subtests, Index and IQ scores are provided in Table 1.

TABLE 1. Descriptive statistics of scores for all IQ subtests (standard scores unless otherwise stated).

Pearson’s correlation coefficients between the WAIS-III subtests are reported for each sample in Table 2. As expected, the subtests within each Index (VCI, WMI, POI, and PSI) were significantly related to one another (r’s ranging from 0.20 to 0.77) for both samples.

TABLE 2. Correlations between WAIS-III subtests, for UK and SA samples.

Group Comparisons

A series of multivariate analyses of variance (MANOVAs) was conducted on the age-adjusted WAIS-III measures to investigate potential cultural differences between the UK and SA students (Table 1). The first MANOVA was conducted on the WAIS-III Index scores (VCI, POI, WMI, PSI). The overall group term associated with Hotelling’s T-test was significant (F = 87.53, p < 0.001; $\eta_{p}^{2}$ = 0.44). Post hoc pairwise comparisons (p < 0.001, Bonferroni adjustment for multiple comparisons) indicated that the UK students scored significantly higher than the South African students on the subtests of the VCI and POI. In contrast, the South African students scored significantly higher than the UK students on the PSI. There was no significant group difference on the subtests of the WMI.

Follow-up MANOVAs were conducted on the subtests associated with the IQ indices that yielded significant group differences. The first was performed on the scaled scores of the VCI subtests (Vocabulary, Information, Similarities) and Comprehension. The overall group term associated with Hotelling’s T-test was significant (F = 160.08, p < 0.001; $\eta_{p}^{2}$ = 0.59). Post hoc pairwise comparisons (p < 0.001, Bonferroni adjustment for multiple comparisons) showed that the UK students scored significantly higher than the SA students on all four verbal subtests. This suggests that these subtests draw on culture-specific, acquired knowledge, and appear to be affected by these differences.

The second MANOVA was performed on the scaled scores of the POI subtests (Block Design, Matrix Reasoning, Picture Completion) and Picture Arrangement. The overall group term associated with Hotelling’s T-test was significant (F = 31.17, p < 0.001; $\eta_{p}^{2}$ = 0.22). Post hoc pairwise comparisons (p < 0.001, Bonferroni adjustment for multiple comparisons) indicated that the UK students performed significantly better on all non-verbal subtests, except for Matrix Reasoning, compared to their SA counterparts.

The last MANOVA was performed on the scaled scores for the PSI subtests (Digit-Symbol Coding and Symbol Search). The overall group term associated with Hotelling’s T-test was significant (F = 24.23, p < 0.001; $\eta_{p}^{2}$ = 0.10). Post hoc pairwise comparisons revealed that the SA students achieved significantly higher scores than the UK students on both subtests (p < 0.001, Bonferroni adjustment for multiple comparisons).
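The reported effect sizes can be cross-checked against the F statistics. The sketch below assumes a two-group design with N = 456 (349 UK + 107 SA), so that each multivariate test has df1 equal to the number of dependent variables and df2 = N - df1 - 1. These degrees of freedom are not reported in the text and are an assumption here, as is the helper name `partial_eta_squared`. Under that assumption, partial eta squared is (F x df1) / (F x df1 + df2).

```python
# Cross-check of the reported partial eta squared values from the F statistics.
# Assumption (not stated in the text): two groups, so df1 = number of
# dependent variables and df2 = N - df1 - 1, with N = 349 + 107 = 456.

N_TOTAL = 349 + 107


def partial_eta_squared(f_value, n_dvs, n_total=N_TOTAL):
    """Partial eta squared recovered from F and its assumed degrees of freedom."""
    df1 = n_dvs
    df2 = n_total - n_dvs - 1
    return (f_value * df1) / (f_value * df1 + df2)


# (F, number of dependent variables, reported partial eta squared)
reported = {
    "Index scores": (87.53, 4, 0.44),
    "Verbal subtests": (160.08, 4, 0.59),
    "Non-verbal subtests": (31.17, 4, 0.22),
    "Processing Speed subtests": (24.23, 2, 0.10),
}

for label, (f_value, n_dvs, eta) in reported.items():
    recomputed = partial_eta_squared(f_value, n_dvs)
    print(f"{label}: recomputed {recomputed:.2f}, reported {eta:.2f}")
```

If the assumed degrees of freedom are right, the recomputed values agree with the reported ones to two decimal places, which suggests the four MANOVAs were run on the full sample of 456.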

Discriminant Function Analysis

In order to identify which measures could uniquely differentiate the two cultural groups, stepwise discriminant function analyses were conducted for the nine WAIS-III subtests that produced significant group differences. Wilks’ Lambda was significant for these subtests (p < 0.001): Vocabulary, λ(1,455) = 0.42; Information, λ(1,454) = 0.42; Similarities, λ(1,453) = 0.40; Coding, λ(1,452) = 0.38; Picture Arrangement, λ(1,451) = 0.37; Symbol Search, λ(1,450) = 0.37; Picture Completion, λ(1,448) = 0.36. Comprehension and Block Design did not differentiate between the two cultural groups. Performance on these seven WAIS-III subtests was sufficient to correctly assign group membership for 93% of the students overall (91% for UK and 99% for SA). This analysis established that higher VC scores typically characterize the UK students, while higher PS scores typify the SA students.
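The overall classification rate is consistent with the per-group rates once the unequal group sizes are taken into account. A minimal sketch, assuming the overall figure is the sample-size-weighted mean of the per-group hit rates (the text does not state how it was computed; the function name is illustrative):

```python
# Overall classification accuracy as a weighted mean of per-group hit rates.
# Assumption: the 93% figure pools both groups, weighting by group size.

def overall_accuracy(hit_rates, group_sizes):
    """Weighted mean of per-group hit rates, weighted by group size."""
    correct = sum(rate * n for rate, n in zip(hit_rates, group_sizes))
    return correct / sum(group_sizes)


uk_n, sa_n = 349, 107          # sample sizes from the Participants section
uk_rate, sa_rate = 0.91, 0.99  # per-group hit rates reported in the text

pooled = overall_accuracy([uk_rate, sa_rate], [uk_n, sa_n])
print(f"Pooled accuracy: {pooled:.1%}")
```

Because the UK group is more than three times larger than the SA group, the pooled rate sits much closer to the UK hit rate than to the SA one.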

Multi-Group Confirmatory Factor Analyses

In order to detect potential measurement invariance across groups, a multi-group confirmatory factor analysis (MGCFA) was conducted. This analytic technique has been advocated particularly in the investigation of group differences in IQ test scores (Dolan et al., 2004, 2006; Wicherts and Dolan, 2010). We tested a four-factor model based on the specified Wechsler structure, namely VCI, POI, WMI, and PSI. Group differences for these four factors were investigated by fitting a series of increasingly restrictive models and evaluating the difference in fit between the restricted model and the less-restricted model (see Table 3). Step 1 is a model where parameters were freely estimated; in Step 2, the factor loadings were restricted; and in Step 3, we restricted the measurement intercepts to be invariant, while allowing the factor means to differ across groups. As the successive models were nested, we conducted chi-square difference tests (χ²) between each Step (Bollen, 1989).

TABLE 3. Goodness of fit statistics and path coefficients for latent constructs (p < 0.05) as a function of country.

The χ² statistic is a commonly used index of goodness of fit for each model. It compares the degree to which the predicted covariances in the model differ from the observed covariances. A good fit is determined by small and non-significant χ² values. Because this statistic is sensitive to sample size, with large samples, as in the present study, even the best-fitting models frequently yield significant χ² values (Kline, 1998).

Model adequacy was therefore evaluated using additional global fit indices, such as the Comparative Fit Index (CFI; Bentler, 1990), that are more sensitive to model specification than to sample size (Kline, 1998). Fit indices with values equal to or higher than 0.90 demonstrate a marginal fit, and values above 0.95 indicate a good fit. The extent to which the specified model approximates the true model was further assessed with the root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR). RMSEA and SRMR values below 0.05 indicate a good fit, values between 0.08 and 0.10 indicate a mediocre fit, and values above 0.10 indicate a poor fit (McDonald and Ho, 2002). We also included the AIC, a fit measure that takes into account the parsimony of models. Lower AIC values indicate a better fit (Bentler, 1990).
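The cut-offs described above can be written as a small set of decision rules. The sketch below is illustrative only: the function names are hypothetical, and values that fall outside the ranges named in the text (for example, an RMSEA between 0.05 and 0.08) are deliberately left unclassified rather than assigned a label the text does not give.

```python
# Decision rules for the fit-index cut-offs stated in the text.
# Function names and the "unclassified" label are illustrative assumptions.

UNCLASSIFIED = "not classified by the stated cut-offs"


def classify_cfi(cfi):
    """CFI: above 0.95 is good, 0.90 or higher is marginal."""
    if cfi > 0.95:
        return "good"
    if cfi >= 0.90:
        return "marginal"
    return UNCLASSIFIED


def classify_rmsea_or_srmr(value):
    """RMSEA/SRMR: below 0.05 good, 0.08 to 0.10 mediocre, above 0.10 poor."""
    if value < 0.05:
        return "good"
    if value > 0.10:
        return "poor"
    if value >= 0.08:
        return "mediocre"
    return UNCLASSIFIED


# Hypothetical values for illustration; these are not taken from Table 3.
for cfi, rmsea in [(0.96, 0.04), (0.92, 0.09), (0.85, 0.12)]:
    print(cfi, classify_cfi(cfi), "|", rmsea, classify_rmsea_or_srmr(rmsea))
```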

Step 1 was the baseline model in which no between-group restrictions were imposed. Table 3 indicates that the baseline model fits reasonably well in terms of the CFI, AIC, and RMSEA. However, there is a difference in the factor loading between the UK and SA samples with respect to VC and WM. In Step 2, the factor loadings were restricted to be equal across both groups. As can be seen in Table 3, this restriction is accompanied by a significant increase in the chi-square value. The fit indices also worsen with this restriction. A chi-square difference test indicated a significant difference in fit between Step 1 and Step 2: χ²(9) = 64.26; p < 0.001. In Step 3, the measurement intercepts were restricted to be equal across both groups. As can be seen in Table 3, this restriction is accompanied by a significant increase in the chi-square value. The fit indices also worsen with this restriction. A chi-square difference test indicated a significant difference in fit between Step 2 and Step 3: χ²(9) = 187.82; p < 0.001. In each case, the added restriction was accompanied by a clear deterioration in model fit and was therefore rejected. Thus, the conclusion that can be drawn is that there is some discrepancy in the model fit of a four-factor model of IQ between the British and South African samples and that the WAIS-III lacks measurement invariance across these samples.
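The chi-square difference tests can be reproduced from the reported statistics alone. The sketch below computes the upper-tail probability of a chi-square distribution with integer degrees of freedom using only the standard library; `chi2_sf` is an illustrative helper rather than the authors' software, and the degrees of freedom (9) are taken as reported.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus exp(-x/2) times a finite sum whose
    # j-th term is sqrt(2/pi) * x^(j - 1/2) / (1 * 3 * ... * (2j - 1)).
    term = math.sqrt(2.0 * x / math.pi)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Chi-square differences and degrees of freedom as reported in the text.
for label, delta_chi2, delta_df in [
    ("Step 1 vs Step 2 (equal loadings)", 64.26, 9),
    ("Step 2 vs Step 3 (equal intercepts)", 187.82, 9),
]:
    p_value = chi2_sf(delta_chi2, delta_df)
    print(f"{label}: p = {p_value:.3g}, significant at 0.001: {p_value < 0.001}")
```

Both differences lie far beyond the 0.001 critical value for 9 degrees of freedom (about 27.88), in line with the rejection of equal loadings and equal intercepts.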

Exploratory Factor Analysis

Because of the poor fit on the CFA described above, and in order to investigate the factorial structure of the WAIS-III items cross-culturally, all 13 subtests were submitted to a principal components analysis (PCA), separately for the UK and SA students (Table 4).

TABLE 4. Principal components analyses.

The factor structure for the SA sample (factor loadings of >0.40) matched the specified WAIS-III indices (VCI, POI, WMI, and PSI). The only difference was that the Arithmetic subtest also loaded on the VCI.

In contrast, there was a different pattern for the UK sample, as eigenvalues greater than one indicated three factors (factor loadings of 0.40 or greater). The first factor included subtests from both the POI and PSI. The VCI and WMI were largely similar to those for the original WAIS-III. However, the Arithmetic subtest also loaded on the VCI and the Comprehension subtest also loaded on the WMI.

Discussion

In the face of limited resources in third world countries, it is practical to avoid ‘re-inventing the wheel’ when it comes to measures of cognitive ability, but rather to research and adapt existing, psychometrically sound measures for use with local populations (Shuttleworth-Edwards, 1996). The WAIS is the most widely used and researched measure of adult intelligence, and it is thus practical to evaluate its use with populations quite different to the standardization samples, in this case multilingual, English second language speakers from low socio-economic circumstances, yet with a sufficient degree of test-wiseness and English proficiency. Of the factors known to influence IQ test performance, the SA group was disadvantaged in several respects: while they had high self-reported proficiency in English, this was not their mother-tongue, they had received poor quality secondary schooling and they came from low SES circumstances. When their performance was compared to a more affluent, monolingual, UK group, it was found that the WAIS-III appears to lack measurement invariance across the two samples. An integrated interpretation of the statistical results pointed to three findings of interest. (1) There was no evidence of cultural biases in the Matrix Reasoning subtest or in the WM subtests that had minimal reliance on language. (2) All of the verbal and most non-verbal subtests, as well as the PS subtests, showed evidence of cross-cultural differences. (3) The SA and UK samples’ scores revealed different factor structures. Each of these findings is discussed in detail below.

The finding of equivalent performance between the SA and UK samples on the subtests of the WMI suggests that these may be the fairest measures of intellectual ability across culturally, linguistically, and socio-economically diverse groups. WM assessments draw on ability that is closely related to fluid intelligence, is not explicitly taught and is thus not knowledge-based. Since the procedures and stimuli used in WM tests are designed to be equally unfamiliar to all participants or draw on well-learned material, such as numbers and letters, they are less likely to confer obvious advantages or disadvantages to individuals from vastly differing backgrounds and are relatively uninfluenced by environmental factors such as type and quality of education, SES and linguistic background (Alloway et al., 2004). This is particularly likely in the case of the LNS and Digit Span subtests of the WMI, which assess non-semantic, auditory aspects of WM. The LNS subtest of the WMI has been found to be the least vulnerable to socio-cultural effects in black Africans, with both advantaged and disadvantaged secondary and primary school education (Shuttleworth-Edwards et al., 2013). The third subtest in the WMI, Arithmetic, draws on several abilities (VC, WM, freedom from distractibility, numerical knowledge, and procedural knowledge) so that its consequent dual loading on both VCI and WMI for both UK and SA samples is not unexpected and has been demonstrated elsewhere (Grégoire, 2013). Its verbal content may mean that this subtest also holds some cultural biases and this may account for the differences in factor loadings between UK and SA samples on the WMI found in Step 1 of the MGCFA.

There is some debate regarding the extent to which WM tests are free of cultural biases (Ostrosky-Solis and Lozano, 2006). However, most cross-cultural differences pertain to articulatory rates in different languages for digits in the Digit Span task. Testing was conducted in English for both samples in the current study, thus eliminating this criticism. Cross-cultural differences in WM performance have been found with children from Zaire and Laos on the Kaufman Assessment Battery for Children (K-ABC; Boivin, 1991; Conant et al., 1999, 2003). It is possible that cross-cultural factors, as well as age of commencement of formal education and learning to read and write, affect the developmental rate and organization of verbal and visuospatial components of WM in children, and that greater stability in these skills is evident from early adulthood. This needs further investigation and may mean that WM assessments are suitable as non-biased measures only from young adulthood onward.

A similar explanation to that given for the WMI subtests would account for the absence of cross-cultural differences on the Matrix Reasoning subtest, which was added to the WAIS-III to ‘enhance the assessment of non-verbal, fluid reasoning’ (The Psychological Corporation, 1997, p. 6). The Matrix Reasoning subtest is modeled on the Raven’s Progressive Matrices, a classic measure of fluid reasoning, and the two tests are highly correlated (r = 0.80; Raven et al., 1988; Carpenter et al., 1990; The Psychological Corporation, 1997). The construct validity of the Matrix Reasoning subtest, however, has not been well-studied and the findings from the current study provide further evidence of its potential as a culture-fair measure.

Despite attempts to broaden the measurement of unbiased, fluid reasoning processes in the WAIS-III (The Psychological Corporation, 1997), the majority of its subtests appear to show cross-cultural effects. Given previous evidence, it was unsurprising that the verbal subtests showed the highest degree of cultural difference (Shuttleworth-Edwards et al., 2004; Razani et al., 2006). Cultural biases were also evident in two of the PO subtests (Block Design and Picture Completion), as well as Picture Arrangement. This supports the view that non-verbal tests are not necessarily culturally fairer than verbal ones (Rosselli and Ardila, 2003; Shuttleworth-Edwards et al., 2013). Past cross-cultural research on the Block Design subtest has yielded mixed results. Some identified it as discriminating against individuals from non-Western cultures and/or deprived educational backgrounds (Ardila and Moreno, 2001; Shuttleworth-Edwards et al., 2004, 2013). Other studies have found no significant differences between Anglo-Americans and fluent English-speaking individuals from ethnically diverse (Hispanic, Asian, and Middle-Eastern) backgrounds on Block Design and Matrix Reasoning (Razani et al., 2006). These differences may be due to differing levels of acculturation in the various samples. Most differences between the UK and SA samples in the current study were found on verbal and non-verbal measures that draw on acquired knowledge and learned problem solving strategies.

The significant difference between the SA and UK samples in terms of PS, in favor of the SA sample, was surprising given a body of research indicating that PS is an area where individuals of African descent underperform relative to Caucasians and Hispanics from Westernized, urbanized countries. Reduced PS by individuals of African origin is proposed to be due to a tendency to value careful deliberation and accuracy over speed (Nell, 2000; Heaton et al., 2003; Shuttleworth-Edwards et al., 2004). The contrary finding in the current study is most likely a result of multiple interacting factors, which may include sampling and socio-cultural changes in SA over the last 20 years, including the Flynn (1984, 1987, 2009) effect.

In terms of sampling bias, both samples in this study comprised student volunteers. This yielded some differences between the two countries. Although the students were all undergraduates, there was much greater variance in the ages of the UK sample (M = 27.18 years; SD = 8.74 years), while the SA sample was fairly homogeneous in terms of age (M = 20.79 years; SD = 1.56 years). Increased age is associated with a decrease in PS and although age-adjusted standard scores were used in the analyses, this may not completely exclude all variance associated with age (Lezak et al., 2004).
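The size of this age imbalance can be illustrated with a short calculation. The sketch below uses only the means and standard deviations reported above. Because group sizes are not restated in this section, it assumes equal group sizes when pooling the standard deviations; that assumption, and the function names, are illustrative and not part of the original analysis.

```python
import math

# Reported age statistics (mean, SD) in years, taken from the paragraph above.
UK_AGE = (27.18, 8.74)
SA_AGE = (20.79, 1.56)


def pooled_sd(sd_a: float, sd_b: float) -> float:
    """Pooled SD under the simplifying ASSUMPTION of equal group sizes."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardized mean difference (Cohen's d) between two groups."""
    return (mean_a - mean_b) / pooled_sd(sd_a, sd_b)


def variance_ratio(sd_a: float, sd_b: float) -> float:
    """Ratio of the two group variances (squared SDs)."""
    return (sd_a / sd_b) ** 2


age_gap = UK_AGE[0] - SA_AGE[0]           # difference in mean age, in years
d = cohens_d(*UK_AGE, *SA_AGE)            # standardized age difference
ratio = variance_ratio(UK_AGE[1], SA_AGE[1])

print(f"Mean age gap: {age_gap:.2f} years")
print(f"Cohen's d (equal-n assumption): {d:.2f}")
print(f"Variance ratio (UK / SA): {ratio:.1f}")
```

Under the equal-group-size assumption, this gives a mean age gap of 6.39 years, a standardized difference of about 1.0, and a UK age variance roughly 31 times that of the SA sample, which is consistent with the caution that age-adjusted scores may not remove all age-related variance.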

The results on the PSI may also have been influenced by socio-cultural changes in SA society since the ending of apartheid 20 years ago. As a consequence of these changes, the ‘born free’ generation (children born after the end of apartheid, as is the case in the current SA sample) is less different to Western populations than previous generations of black South Africans. The ‘born free’ generation has witnessed dramatic changes in comparison to their parents. There has been increased access to technology, with all students possessing mobile phones, most of which allow internet access and consequent acculturation; the state has provided access to child and health care, family planning and nutrition; social grants have expanded from 2.7 million people in 1994 to 16 million in 2014; there has been an emphasis on, and investment in, education and the school curriculum has been changed from one that focused on rote repetition and obedience to authority, to a focus on the scientific method and active problem solving (although there are still inequities in the quality of education provided by the different education systems; Maloka, 2013). Other positive changes include the electrification of homes: in 1994, 59.5% of SA households had access to electricity; according to Statistics South Africa (2012), this had risen to 85.3%. The proportion of formal households has increased from 63.1% in 1996 to 77.6% in 2011, and the proportion residing in shacks or informal dwellings has decreased over the same period from 16.2 to 13.6% (Statistics South Africa, 2011). These changes have not only altered the quality of life for a large proportion of black South Africans, but have also led to a change in self-esteem in the black community, from being marginalized to being valued members of society.

Several of the socio-cultural factors mentioned above are associated with the Flynn effect, which is reported to be strongest on fluid measures (e.g., Matrix Reasoning). Since WM and PS are strongly related to fluid ability, concomitant influences could also be expected in these areas (Weiss, 2010). If the current SA data are compared to those of Shuttleworth-Edwards et al. (2004, 2013) for black Africans with an ex-DET education and 15+ years of education (n = 12), published 10 years ago, the data are within 1 standard deviation of each other, and do not differ significantly in terms of Index and IQ scores. Nonetheless, the slightly higher PSI obtained in the current study relative to that of Shuttleworth-Edwards et al. (2004, 2013) could arguably be reflective of Flynn effects combined with the better life circumstances and opportunities for present day black South Africans.
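For orientation, Wechsler index and IQ scores are scaled to a mean of 100 and a standard deviation of 15, so 'within 1 standard deviation' corresponds to a gap of at most 15 index points. The sketch below encodes that check together with the conventional Wechsler qualitative bands (for example, 90 to 109 as Average and 80 to 89 as Low Average). The function names are hypothetical, and the example scores are made-up placeholders, not values from this study.

```python
# Standard scaling of Wechsler index and IQ scores.
INDEX_MEAN = 100
INDEX_SD = 15

# Conventional Wechsler qualitative bands, as (lower bound, label),
# ordered from the highest band to the lowest.
BANDS = [
    (130, "Very Superior"),
    (120, "Superior"),
    (110, "High Average"),
    (90, "Average"),
    (80, "Low Average"),
    (70, "Borderline"),
]


def classify(score: float) -> str:
    """Return the qualitative band for an index or IQ score."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Extremely Low"


def within_one_sd(score_a: float, score_b: float) -> bool:
    """True if two index scores differ by no more than one SD (15 points)."""
    return abs(score_a - score_b) <= INDEX_SD


# Hypothetical placeholder scores, NOT data from the study.
example_a, example_b = 95, 85
print(classify(example_a), "|", classify(example_b))
print("Within 1 SD:", within_one_sd(example_a, example_b))
```

A check of this kind is purely descriptive; whether two samples differ significantly still depends on sample sizes and score variability.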

Returning to the PSI comparisons between the SA and UK groups in the current study, these indicate functioning within the average range for the South African sample, and performance that is in the low average range for the UK group, which is unusual, particularly for a university population. In an attempt to better understand these differences, the factor structure of the WAIS-III scores for each country was analyzed, since a comparison of means is based on the assumption that the meaning of the items and the subtests is similar in the groups being compared, which would imply that the factor structure is also similar in both groups. The factor structure of the WAIS-III was not equivalent for the UK and SA samples. A four factor structure provided the best fit for the SA sample, following the four indices of the WAIS-III, although Arithmetic loaded on both WMI and VCI. Interestingly, a four factor structure could not be replicated with the UK sample. Instead, a three factor model provided the best fit, where the PSI subtests loaded together with those measuring non-verbal, PO and reasoning skills.

This finding suggests that the UK sample utilized similar cognitive processes to complete the non-verbal PSI and POI subtests, which were possibly a combination of accuracy and speed since some of the POI subtests are timed. In support of this, European individuals (from Germany, France, UK, Finland, and Spain) have been found to focus on accuracy over speed on the WAIS-III in comparison to US individuals (Roivainen, 2010). Unlike the UK sample, the SA group appeared to distinguish between the POI and PSI subtests, drawing on different skills for each index. Thus, the discrepancy in performance on the PSI is explicable in terms of the differing factor structures in the two groups. The PS performance of the SA group may also account for their performance on the WM tests, as faster processing allows for greater mental rehearsal, faster execution of ongoing tasks and greater synchronization of task components in WM (Nettelbeck and Burns, 2010).

In conclusion, the findings from this study add to existing evidence that the majority of the subtests in the WAIS-III hold cross-cultural biases. These are most evident in tasks which tap crystallized, long-term learning, irrespective of whether the format is verbal or non-verbal. This challenges the view that visuo-spatial and non-verbal tests tend to be culturally fairer than verbal ones (Rosselli and Ardila, 2003). Subtests tapping PS also appear to hold biases. Differences on the latter subtests may reflect cultural and/or experiential background. Of value for theory and practice, those subtests tapping WM and fluid reasoning (Matrix Reasoning) showed no cross-cultural effects. Consequently, one of the most effective ways of tapping universal cognitive functions may be via WM and this warrants further investigation. Identification of tests that do not favor individuals from Eurocentric and favorable SES circumstances with advantaged educational backgrounds is valuable in providing direction for the development of culture fair tests. Given the findings of the current study and evidence of a strong relationship between WM and educational outcomes (Cowan and Alloway, 2008; Alloway and Alloway, 2010), measures of WM offer promise in the search for fairer assessment practices with individuals from diverse backgrounds.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Alloway, T. P., and Alloway, R. G. (2010). Investigating the predictive roles of working memory and IQ in academic attainment. J. Exp. Child Psychol. 106, 20–29. doi: 10.1016/j.jecp.2009.11.003

Alloway, T. P., Gathercole, S. E., Willis, C., and Adams, A. M. (2004). A structural analysis of working memory and related cognitive skills in early childhood. J. Exp. Child Psychol. 87, 85–106. doi: 10.1016/j.jecp.2003.10.002