ORIGINAL RESEARCH article

Front. Psychol., 09 August 2021

Sec. Psychology of Language

Volume 12 - 2021 | https://doi.org/10.3389/fpsyg.2021.657706

Category Exemplar Production Norms for Hong Kong Cantonese: Instance Probabilities and Word Familiarity

  • 1. Department of Linguistics and Translation, City University of Hong Kong, Kowloon, Hong Kong

  • 2. Hong Kong Institute for Advanced Study, City University of Hong Kong, Kowloon, Hong Kong

  • 3. Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan

  • 4. Department of Educational Psychology and Counseling, National Taiwan Normal University, Taipei, Taiwan

Abstract

The lexical system of Hong Kong Cantonese has been heavily shaped by the local trilingual environment. The development of culture- and language-specific norms for Hong Kong Cantonese is fundamental for understanding how the speaker population organizes semantic memory, how speakers utilize their semantic resources, and what information processing strategies they use for the retrieval of semantic knowledge. This study presents a normative database of 72 lexical categories in Hong Kong Cantonese produced by native speakers in a category exemplar production task. Exemplars are listed under a category label, along with their instance probabilities and word familiarity scores. Possible English equivalents of the exemplars are given for the convenience of researchers who do not speak Hong Kong Cantonese. Statistics on categories were further extracted to capture the heterogeneity of the categories: the total number of valid exemplars, the number of exemplars covering 90% of the occurrences, and the probability of the most frequent exemplar in each category. The database offers a direct lexical sketch of the vocabulary of modern Hong Kong Cantonese in a categorical structure. The category-exemplar lists and the comparative statistics together lay the foundations for further investigations of the Hong Kong Cantonese-speaking population from multiple disciplines, such as the structure of semantic knowledge, the time-course of knowledge access, and the processing strategies of young adults. The results of this norming study can also be used as a benchmark for other age groups. The database can serve as a crucial resource for establishing initial screening tests to assess the cognitive and psychological functioning of the Cantonese-speaking Hong Kong population in both educational and clinical settings.
In sum, this normative study provides a fundamental resource for future studies on the language processing mechanisms of the Hong Kong Cantonese-speaking population, as well as for language studies and cross-language/cross-culture studies on Hong Kong Cantonese.

Introduction

Being categorical is a fundamental property of our knowledge of the world (Barsalou, 2003). Categorization, i.e., sorting things based on their shared components, is an important information processing activity embedded in our perceptions of the surroundings and interactions with them. Research has shown that the categorical frame of semantic knowledge and the uneven statuses of category members profoundly influence our language and information processing (Rips et al., 1973; Rosch and Mervis, 1975; Mervis and Rosch, 1981), to a degree that being categorical pervades the way we think and live in the social context (Marx and Ko, 2012). To understand the categorical structure of knowledge, both psychologists and linguists have pursued inquiries such as what is (or is not) a type of X, and why? and is item X considered a good category member, and why? (Lakoff, 1973; Smith et al., 1974; Rosch, 1975).

There is a graded structure within categories (Lakoff, 1973; Rosch, 1975; Barsalou, 1985), which consists of a core that includes the most representative (i.e., high-typical) examples surrounded by exemplars that are less representative (i.e., low-typical). In other words, category members are not equal in terms of the “goodness” of their membership. This non-equivalence of category members (Mervis and Rosch, 1981) is reflected in the probability that a member will be recalled in production tasks, or in the subjective rating of a proposed category member’s degree of typicality (Rosch, 1975). Moreover, the frequency with which an exemplar is mentioned in a production task is significantly correlated with its typicality rating (Mervis et al., 1976; Mervis and Rosch, 1981). Therefore, the frequency results from a category exemplar production task are also reliable for indexing exemplar typicality. In this way, the exemplar production task conveniently provides both the exemplars and their typicality measurements at the same time.

Higher typicality is usually associated with higher processing efficiency in terms of production probability, accuracy, and reaction time (for a review, see Rosch et al., 1976, and many others). Processing efficiency (i.e., the typicality effect) has been observed and verified in multiple categorization tasks in a variety of studies, such as category acquisition, exemplar production, and membership verification (also reviewed in Barsalou, 1985). For example, in membership verification tasks, when a subject is asked to verify a statement in a sentence such as “X is (not) a (kind of) Y” as rapidly as possible, more typical or representative items are processed with shorter reaction times regardless of the statement’s veracity (Mervis and Rosch, 1981). Only typical category-instance pairs are facilitated by the category name in the same-or-different matching task (reviewed in Rosch et al., 1976). Developmentally, children integrate typical instances of categories into their language and conceptual systems before atypical instances (Bjorklund et al., 1983). However, it has also been suggested that processing efficiency may be attributable to the high familiarity of a word (often measured by word frequency) rather than its high typicality, because highly familiar exemplars are generally more salient in all daily language use scenarios. More familiar exemplars are recalled more often and rated as more typical (Janczura and Nelson, 1999), and a low typicality rating for an exemplar may be due to the low familiarity of the target word (Malt and Smith, 1982). Furthermore, it has been suggested that cross-cultural discrepancies in typicality may be due to general cultural familiarity (Schwanenflugel and Rey, 1986).
The literature on the underlying mechanism of processing efficiency and the possible interaction of typicality and familiarity presents contradictory evidence with no clear conclusions (Rosch et al., 1976; McCloskey, 1980; also see Murphy, 2002, for a review).

The typicality effect interacts with other categorical properties of exemplars, such as the categories they belong to. To further identify and observe the typicality effect in such interactions, categories can be split into subgroups of different types (thus adding an extra stratum to categories), and the effects of the contrasting characteristics of the category subgroups can be investigated. For example, in studies of language deficits that aimed to identify the selective impairment of domain-specific knowledge attributable to brain damage, differences have been observed in language processing with respect to inanimate vs. animate words (Caramazza and Shelton, 1998), concrete vs. abstract words (Catricalà et al., 2014), and words from well-defined/closed vs. fuzzy-boundary categories (Kiran and Johnson, 2008). The outcomes of these studies highlight the necessity for and research potential of a normative database with comprehensive coverage of categories and exemplars and, ideally, with reliable prescriptive statistics.

Language materials in the target language are fundamental to research and experiments such as those described above. These materials are usually presented in a database containing a considerable number of categories with exemplars produced by native speakers responding to a category cue. The number of categories included has kept expanding to meet the demand for larger and more heterogeneous coverage; for example, the original “Connecticut norms” included 43 categories (Cohen et al., 1957), which were expanded to 56 categories by Battig and Montague (1969) and then to 106 categories by McEvoy and Nelson (1982). Norms have also been replicated and constantly updated to capture conceptual shifts and drifts over time and socio-cultural differences (Van Overschelde et al., 2004), and have been developed in languages other than English (e.g., Bueno and Megherbi, 2009, in French; Storms, 2001, in Flemish). The exemplars in each category and their comparative statistics provide a detailed and rich image of the semantic resource in a given target language. These categorical data are usually collected from native speakers who are healthy young adults (e.g., college students). The results are then used as benchmarks for comparison with other age groups (i.e., children and older adults) or adults with impaired cognitive functions and language abilities. For example, a series of studies demonstrated that the use of atypical exemplars from various categories is an effective training method for patients with aphasia (Kiran et al., 2011).

Given the time and expense associated with a norming study, it is not surprising that few appropriate non-English databases are available. The current study aims to address this issue for Hong Kong Cantonese, which is the lingua franca among the Hong Kong Chinese population. The spoken and written forms of Hong Kong Cantonese have emerged from the combination of Hong Kong’s special socioeconomic status, its colonial history, and the inevitable language contact with Mandarin Chinese (Putonghua) since the handover from Great Britain in 1997. Hong Kong Cantonese is distinct from other varieties of Chinese with similar or even mutually intelligible pronunciations (e.g., Cantonese spoken in Guangdong) and consistent writing systems (e.g., Taiwan Mandarin, which is also written in Traditional Chinese characters). At the lexical level, Hong Kong Cantonese has been strongly shaped by a trilingual environment in which Cantonese, English, and Putonghua are used simultaneously (sometimes even within the composition of a word). Specifically, English elements of various lengths and units have been fused into daily usage in both informal writing and speech (i.e., Cantonese-English code switching; Li and Lee, 2004), with phonetic borrowing and transliteration used as tools and resources (Li, 2000). For example, the transliteration of strawberry in Hong Kong Cantonese results in “士多啤梨” (“si6 do1 be1 lei4,” “strawberry”). This representation is understood by most Cantonese speakers in the adjacent province, and even by some Mandarin speakers. Nonetheless, the formal and preferred name of the fruit for Cantonese speakers outside Hong Kong is “草莓” (“cǎo méi”), which is often used as the written form of “strawberry” in Hong Kong Cantonese (but rarely as the colloquial form). In addition, certain concepts, and hence the words and phrases representing them, exist only in Hong Kong Cantonese.
For example, the concept and term “公屋” (“gung1 uk1,” “public/government-owned housing”) is used by Hong Kong Cantonese speakers but not by Guangzhou Cantonese speakers. The word is not in the lexical inventory of Guangzhou Cantonese speakers, but they would not find it difficult to read the Chinese characters and pronounce them in Cantonese, and they could probably guess the meaning.

In light of these considerations, this study conducted two experiments to establish a categorical normative database of Hong Kong Cantonese consisting of multiple categories and exemplars. One is a category exemplar production task, and the other is a familiarity rating task. Within each category, the instance probability of every exemplar and its familiarity rating score were calculated. Furthermore, various indices associated with the recalled exemplars in each category were compiled to capture the heterogeneity across the categories.

Materials and Methods

Experiment 1: Category Exemplar Production Task

Materials

This experiment included 84 categories. The full list of categories was adapted and modified from one of the authors’ unpublished work and a cross-language sociolinguistic norm study (Yoon et al., 2004). All lexical forms of the category names and the written materials were reviewed and verified by two native Hong Kong Cantonese speakers. A pilot study was conducted to ensure that the categories were “productive.”

Participants

Forty young adults aged between 18 and 24 years (mean 20.2 years; 20 females) participated in this study. All participants were native Hong Kong Cantonese speakers who were raised in Hong Kong up to the age of 18, with Cantonese reported as their mother tongue. The participants completed a language ability questionnaire in which they were instructed to self-evaluate their Cantonese reading, listening, speaking, and writing proficiency levels. They rated themselves as proficient in all four aspects. However, all of them would have been exposed to a mixed rather than a monolingual language background because of the “bi-literacy and tri-lingualism” language education policy imposed by the Education Bureau of Hong Kong. All of the participants had a normal reading ability and no reported cognitive impairments. The study was approved by the Institutional Review Board of the City University of Hong Kong, and all of the participants provided written informed consent prior to their participation.

Procedures

Eighty-four categories (i.e., trials) were included in the experiment. The trial order was randomized. Each participant completed 84 trials, which were divided into two blocks of 42 trials each to limit completion time and fatigue. The participants were asked to produce three exemplars for each category at their own pace. They were instructed to produce the three most representative examples they could think of for a particular category, following the order of the “goodness” of category membership (best fit, second-best fit, third-best fit). The responses were preferably words comprising two or three Chinese characters. The participants input their responses into the interactive online survey form using the provided desktop computers in a controlled and supervised environment. An interactive page for each category began with the cue: “A type of AAA” (where AAA represents the category name). Below the line showing the category name, the participants were prompted by the following text: “1. The best fit that comes to mind,” “2. The second-best fit that comes to mind,” and “3. The third-best fit that comes to mind.” They were asked to fill in all three slots, with no skipping. Each participant took a short break between the two trial blocks.

Compilation of the Exemplars From the Individual Responses for Experiment 2

Data cleaning and item combining were manually applied to individual cases. Typos were identified and corrected; for example, “債卷” (typo) was changed to “債券” (corrected, “zaai3 hyun3,” “bond,” as a response to “a kind of investment tool”). The mixed usage of simplified Chinese characters was adjusted; for instance, “生气” (Chinese-Simplified) was changed to “生氣” (Chinese-Traditional, “sang1 hei3,” “angry,” in response to “a mood state”). Allographs were unified; for example, “雞” and “鷄” (allographs for “chicken,” “gai1”) were merged to yield “雞.” Variations of words used to describe a very similar or identical concept were merged into a common form and treated as identical, as in the case of “乳牛” (“jyu5 ngau4,” “milk cow”) and “奶牛” (“naai5 ngau4,” “milk cow”), which were deemed to describe the identical concept of “milk cow” as a response to “a kind of farm animal.” These variations are due to differences between formal and informal speech rather than to conceptual differences (there is no analogous example available in English).
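The merging decisions described above were made manually. As a minimal sketch of how such decisions might be recorded and re-applied consistently, the lookup table below maps each variant to one canonical form. The names `CANONICAL` and `normalize` are hypothetical, and the choice of “乳牛” as the common form for “奶牛” is an assumption made for illustration only; the text does not state which variant was retained.

```python
# Hypothetical lookup table recording manual cleaning decisions.
# Keys are raw responses; values are the canonical forms they are merged into.
CANONICAL = {
    "債卷": "債券",  # typo corrected ("bond")
    "生气": "生氣",  # Simplified characters converted to Traditional ("angry")
    "鷄": "雞",      # allograph unified ("chicken")
    "奶牛": "乳牛",  # synonym merged ("milk cow"); direction is an assumption
}


def normalize(response: str) -> str:
    """Strip surrounding whitespace and map a raw response to its canonical form."""
    cleaned = response.strip()
    return CANONICAL.get(cleaned, cleaned)


print([normalize(r) for r in ["債卷", " 鷄 ", "狗"]])
```

Responses without an entry in the table pass through unchanged, so the table only needs to list the cases that were actually adjudicated.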

Experiment 2: Familiarity Rating

Understanding whether the familiarity of a word impacts its categorical typicality is an essential step toward understanding how semantic knowledge is organized. This experiment assessed the familiarity of each concept in the general context of participants’ daily lives. Participants were instructed to rate how often they encountered a target word (an exemplar from Experiment 1) across all life scenarios rather than within a specific category. Note that, in general, no categorical information accompanied the word to be rated.

Materials

The participants provided familiarity ratings for the words generated in the first experiment. Categorical information from the previous task was given only if the exemplar was potentially ambiguous, that is, if it could refer to two different concepts belonging to two categories. For example, “杜鵑” (“dou6 gyun1”) can refer either to the rhododendron flower (“杜鵑花,” “dou6 gyun1 faa1,” rhododendron) or to a cuckoo bird (“杜鵑鳥,” “dou6 gyun1 niu5,” cuckoo, a very common image in traditional poetic rhetoric). In such cases, categorical information (often indicated by a single Chinese character, such as “花” for “flower” and “鳥” for “bird”) was given in parentheses at the end of the target word for disambiguation. For example, the item was presented as “杜鵑(花)” [“dou6 gyun1(faa1),” rhododendron] if the target word was from the category “a kind of flower.”

Participants

Forty additional young adults aged 18–23 years (mean 20.3 years; females = 20) were recruited for this study. None of these participants had prior exposure to the test materials. The recruitment process and eligibility criteria for the participants were the same as those in Experiment 1. All of the participants were native Hong Kong Cantonese speakers with normal reading ability and no reported psychiatric disorders. The study was approved by the Institutional Review Board of the City University of Hong Kong, and the participants provided written informed consent prior to their participation.

Procedures

The whole set of target words was randomized and split into two lists. Each participant provided familiarity ratings for one list of approximately 650 words. The test environment was monitored as described for Experiment 1.

The participants were asked to rate the familiarity of the exemplars using a 7-point scale ranging from 1, “extremely unfamiliar,” to 7, “very familiar.” The participants were instructed to rate the target words based on their subjective daily personal experiences.

Results

Measurements: Categories, Exemplars, and Familiarity

The database included 1298 items in 72 categories. The results from the two experiments were integrated and presented in tables, one for each category. A representative example is shown in Table 1. A word code (Word Code) was assigned to every exemplar under a specific category using the format HKC (the acronym for Hong Kong Category) followed by a 3-digit category code (e.g., “001” for “a kind of farm animal”) and a 2-digit exemplar code. The exemplar (Word) was numbered to indicate the descending rank of total probabilities within the category. “Slot 1,” “Slot 2,” and “Slot 3” indicate the probabilities of a given exemplar being placed in the best, second-best, or third-best slot. The probability for a slot is the number of mentions in the given slot divided by the total number of eligible entries for that slot, rounded to three decimal places. “Total” is the instance probability of an exemplar, regardless of the slot. “Accumulative” is the summed instance probability of the exemplar and all of the exemplars ranked before it. Logically, this value increases with each exemplar in the category until it reaches 1.000 at the last exemplar. “Familiarity” is the average of all rating scores given by all participants who viewed that exemplar. Possible English Equivalents are listed as the possible corresponding concepts in English, for the convenience of researchers who are interested in further studies on cross-language comparisons.
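The measures defined above can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes every participant filled all three slots with valid entries (so each slot has the same number of eligible entries), that the word code is a plain concatenation of “HKC,” the 3-digit category code, and the 2-digit rank, and that ties in Total are broken alphabetically. The function name and the toy responses are hypothetical.

```python
from collections import Counter


def instance_probabilities(responses, category_code):
    """Compute Slot 1-3, Total, and Accumulative for one category.

    responses: list of (slot1, slot2, slot3) tuples of valid exemplars,
    one tuple per participant (toy assumption: no invalid entries).
    """
    n_slots = 3
    slot_counts = [Counter(resp[i] for resp in responses) for i in range(n_slots)]
    slot_sizes = [sum(c.values()) for c in slot_counts]
    total_entries = sum(slot_sizes)

    # Number of mentions of each exemplar, regardless of slot.
    mentions = Counter()
    for counts in slot_counts:
        mentions.update(counts)

    # Rank by descending mentions; alphabetical tie-break is an assumption.
    ranked = sorted(mentions, key=lambda w: (-mentions[w], w))

    rows, running = [], 0
    for rank, word in enumerate(ranked, start=1):
        running += mentions[word]
        rows.append({
            "word_code": f"HKC{category_code:03d}{rank:02d}",
            "word": word,
            "slot": [round(slot_counts[i][word] / slot_sizes[i], 3)
                     for i in range(n_slots)],
            "total": round(mentions[word] / total_entries, 3),
            "accumulative": round(running / total_entries, 3),
        })
    return rows


# Toy responses from four hypothetical participants (not data from the study).
demo = [
    ("cow", "pig", "chicken"),
    ("cow", "chicken", "pig"),
    ("pig", "cow", "sheep"),
    ("cow", "pig", "dog"),
]
for row in instance_probabilities(demo, 1):
    print(row)
```

When all slots have the same number of eligible entries, Total computed this way equals the mean of the three slot probabilities.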

TABLE 1

Category of “a farm animal” (HKC001).

The exemplar in a category at which the Accumulative reaches 0.90 is shaded. SD is the standard deviation of the Familiarity score.

At the end of each table, Invalid xx is a virtual exemplar representing the sum of the probabilities of all invalid responses, where “xx” is the category code. Invalid responses may be due to mistyping, misunderstanding of the category name, or lack of knowledge of the category (as mentioned in the data compiling and cleaning section). In some cases, when the participants were unable to think of a word, they repeated a response or rephrased it as an interchangeable lexical item. For example, both “石屎” (“sek6 si2”) and “英泥” (“jing1 nai4”) refer to “cement” in the category of “construction materials,” although “石屎” (“sek6 si2”) is more colloquial. In other cases, participants generated non-referring items, such as “square” or “round” in the category of “natural geographical feature,” indicating unfamiliarity with geographical terminology, whereas participants in other norm studies were able to produce more relevant, referring terms such as “mountain” and “lake” (Van Overschelde et al., 2004). Such non-referring items were considered invalid responses. The probabilities of the invalid responses for the individual slots were not considered informative and thus were omitted from the table by designating the values as “N/A.” No familiarity scores were associated with the invalid responses, and hence this field was also marked as “N/A.” When there was no invalid response in a given category, the Total of Invalid xx was 0.000; when the Total of Invalid xx reached 0.500, the category was excluded from the final table. Invalid responses may have occurred because the category and its related information were unfamiliar or unavailable to Hong Kong Cantonese speakers; therefore, categories with more than 50% invalid responses were discarded because they were not able to represent the consensus of semantic knowledge in the population.
Twelve categories (e.g., a kind of natural geographical feature) were discarded from the final list (see the Appendix). A total of 72 categories were included in the following analyses.

Reliability of the Measurements

Split-Half Correlations

To ensure the consistency and reliability of the data, split-half correlations were computed and corrected using the Spearman–Brown formula for both Slot1 and Total, with the data from the 40 participants split into a first half and a second half. For Slot1, the split-half correlation was generally very high (median = 0.911), although three categories fell below the threshold of 0.700: “Toy” (r = 0.490), “Fuel” (r = 0.676), and “NGO” (r = 0.596). For Total, the split-half correlation was very high for each category (median = 0.945, range = 0.840–0.993). For the familiarity results, the same split-half correlation was computed and corrected using the Spearman–Brown formula, and the rating results from the two subgroups were highly correlated (r = 0.915). The high correlations show that the data of the two experiments were reliable and consistent.
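For readers who wish to reproduce this kind of check, the following is a minimal sketch of a Spearman–Brown corrected split-half correlation for Total within one category. The correction itself is the standard formula r_SB = 2r / (1 + r). Everything else is an assumption made for illustration: the function names are hypothetical, the halves are taken in participant order, and exemplars missing from one half are given a probability of 0 in that half.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step up a half-sample correlation to full-sample reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_total(responses):
    """Corrected split-half reliability of Total for one category."""
    mid = len(responses) // 2
    probs = []
    for half in (responses[:mid], responses[mid:]):
        counts = Counter(word for resp in half for word in resp)
        n_entries = sum(counts.values())
        probs.append({w: c / n_entries for w, c in counts.items()})
    # Exemplars absent from one half get probability 0 there (assumption).
    words = sorted(set(probs[0]) | set(probs[1]))
    x = [probs[0].get(w, 0.0) for w in words]
    y = [probs[1].get(w, 0.0) for w in words]
    return spearman_brown(pearson_r(x, y))


# Toy responses from four hypothetical participants (not data from the study).
demo = [
    ("cow", "pig", "chicken"),
    ("cow", "chicken", "pig"),
    ("pig", "cow", "sheep"),
    ("cow", "pig", "dog"),
]
print(round(split_half_total(demo), 3))
```

With only four toy participants the estimate is unstable; the sketch is meant to show the computation, not a realistic reliability value.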

Slot 1 and Total

Previous studies have suggested that the most frequently generated exemplar within a category is the most typical and hence the central member of that category (Barsalou, 1985). The more central an exemplar, the faster and more frequently it is recalled as a category exemplar, as the search process follows a fixed order (Rosch, 1973). Given this logic, the exemplars that are recalled most frequently (higher Total) should also be recalled as the first responses (best-fit) more frequently (in Slot1). To examine this hypothesis, the Pearson’s correlation coefficient of the two sets of probabilities (Slot1 and Total) of the exemplars was calculated for each category as shown in Table 2. Note that only the exemplars that had been mentioned in Slot 1 at least once (i.e., the value of Slot1 was >0.000) were included in the correlation analysis. Besides the number of included exemplars in the correlation analysis n, the total number of exemplars listed in a category N was also presented. Significant positive correlations were observed for 62 out of 72 categories and marked in the table, confirming that most frequently recalled exemplars are also likely to be mentioned first for the majority of the categories.
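The filtering step described above (only exemplars with Slot1 > 0.000 enter the correlation, while N counts all valid exemplars) can be sketched as follows. This is an illustrative sketch with a hypothetical function name and toy values; it computes Pearson's r only and does not reproduce the significance test.

```python
import math


def slot1_total_correlation(rows):
    """rows: list of (slot1, total) probability pairs for one category.

    Returns (r, n, N): Pearson's r over exemplars with Slot1 > 0,
    the number of exemplars included (n), and all valid exemplars (N).
    """
    kept = [(s, t) for s, t in rows if s > 0.0]
    n = len(kept)
    mean_s = sum(s for s, _ in kept) / n
    mean_t = sum(t for _, t in kept) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in kept)
    var_s = sum((s - mean_s) ** 2 for s, _ in kept)
    var_t = sum((t - mean_t) ** 2 for _, t in kept)
    return cov / math.sqrt(var_s * var_t), n, len(rows)


# Toy values, not taken from the database.
toy = [(0.5, 0.30), (0.3, 0.25), (0.2, 0.20), (0.0, 0.15), (0.0, 0.10)]
r, n, N = slot1_total_correlation(toy)
print(round(r, 3), n, N)
```

Because exemplars never mentioned in Slot 1 are dropped, n can be much smaller than N, which limits statistical power in categories with few first-slot exemplars.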

TABLE 2

Category Code | Pearson’s r | n¹ | Valid Exemplars²
HKC001 | 0.852* | 7 | 11
HKC002 | 0.801** | 10 | 17
HKC003 | 0.865** | 10 | 20
HKC004 | 0.955** | 8 | 19
HKC005 | 0.830** | 11 | 16
HKC006 | 0.875** | 9 | 11
HKC007 | 0.813* | 7 | 16
HKC008 | 0.927** | 8 | 17
HKC009 | 0.861* | 7 | 12
HKC010 | 0.897** | 10 | 15
HKC011 | 0.892** | 9 | 14
HKC012 | 0.905** | 12 | 25
HKC013 | 0.698 | 5 | 12
HKC014 | 0.961** | 7 | 14
HKC015 | 0.830** | 9 | 18
HKC016 | 0.958** | 6 | 17
HKC017 | 0.904** | 9 | 24
HKC018 | 0.943** | 12 | 20
HKC019 | 0.965** | 13 | 27
HKC020 | 0.977** | 11 | 24
HKC021 | 0.827** | 13 | 16
HKC022 | 0.931** | 7 | 18
HKC023 | 0.785* | 9 | 19
HKC024 | 0.822** | 11 | 20
HKC025 | 0.912** | 14 | 23
HKC026 | 0.633 | 7 | 16
HKC027 | 0.936** | 7 | 19
HKC028 | 0.837** | 10 | 23
HKC029 | 0.959** | 16 | 32
HKC030 | 0.847** | 13 | 25
HKC031 | 0.854** | 10 | 15
HKC032 | 0.943** | 7 | 14
HKC033 | 0.918** | 11 | 28
HKC034 | 0.890** | 13 | 23
HKC035 | 0.858** | 12 | 22
HKC036 | 0.946** | 10 | 20
HKC037 | 0.981** | 10 | 26
HKC038 | 0.763* | 8 | 17
HKC039 | 0.770* | 8 | 18
HKC040 | 0.793* | 8 | 13
HKC041 | 0.844** | 9 | 13
HKC042 | 0.897** | 11 | 14
HKC043 | 0.935** | 6 | 14
HKC044 | 0.905* | 5 | 19
HKC045 | 0.821* | 6 | 9
HKC046 | 0.787 | 5 | 14
HKC047 | 0.674* | 10 | 28
HKC048 | 0.245 | 4 | 9
HKC049 | 0.893** | 7 | 14
HKC050 | 0.826** | 18 | 31
HKC051 | 0.758 | 3 | 19
HKC052 | 0.634 | 5 | 12
HKC053 | 0.667 | 6 | 16
HKC054 | 0.970** | 9 | 25
HKC055 | 0.980* | 4 | 13
HKC056 | 0.907** | 8 | 14
HKC057 | 0.915* | 5 | 18
HKC058 | 0.625 | 9 | 13
HKC059 | 0.929** | 8 | 12
HKC060 | 0.879** | 10 | 26
HKC061 | 0.954** | 9 | 32
HKC062 | 0.606* | 13 | 15
HKC063 | 0.802* | 7 | 11
HKC064 | 0.911** | 16 | 25
HKC065 | 0.808 | 5 | 12
HKC066 | 0.791 | 5 | 15
HKC067 | 0.830** | 11 | 18
HKC068 | 0.779** | 10 | 22
HKC069 | 0.886* | 5 | 10
HKC070 | 0.740** | 13 | 26
HKC071 | 0.958* | 5 | 12
HKC072 | 0.836** | 8 | 10

Total-Slot1 correlations on all categories.

¹Only the exemplars of which Slot1 > 0.000 are included in the correlation analysis. The n is thus the number of exemplars mentioned at least once on Slot1 in a category, while the ones not recalled on Slot1 (Slot1 = 0.000) are excluded in the correlation.

²The actual number of all the valid exemplars listed in a category.

*p < 0.05, **p < 0.01.

Familiarity and Total

Here, familiarity is defined as the average score of participants’ subjective ratings of the frequency of encountering an exemplar across all daily contexts and scenarios. The participants were not given categorical information about the target words in the familiarity experiment (except in cases of ambiguity), which differs from the procedures in some studies (e.g., Hampton and Gardiner, 1983). Familiarity measures how often the target word (the written form of a concept) is experienced in a general context, among other words that are not necessarily from the same category. In Experiment 2, we asked participants to rate how often they experienced (by hearing, reading, or using, etc.) the word “狗” (“gau2,” dog) in their daily lives, instead of asking them to rate how often they had experienced it as “a kind of domestic pet.” This approach of avoiding the co-occurrence of the exemplar and its category limited the potential interaction between general familiarity with the concept itself and familiarity with the concept as cued by a certain category name. To further examine the relationship between the production probability of an exemplar under a given category (Total) and the familiarity of the concept in general (Familiarity), the correlations between Total and Familiarity were calculated within each category (see Table 3). Each category contains a different number of exemplars in this analysis, and for each category there is a correlation r and a corresponding p-value. No significant correlations were identified for the majority of categories (51 of 72), indicating that more frequently experienced concepts were not necessarily produced more frequently in response to a category cue. As mentioned earlier, instance probability is a legitimate measurement of exemplar typicality, and the familiarity of a word is highly correlated with its frequency.
Thus, the results of the current study are in line with those of previous studies (Mervis et al., 1976; Rosch et al., 1976).

TABLE 3

Category Code | Category Name | Pearson’s r | Number of Exemplars in Category (N) | p-value
HKC001 | 農場動物 Farm Animal | 0.481 | 11 | 0.134
HKC002 | 調味料 Spice | 0.560* | 17 | 0.019
HKC003 | 家用電器 Household Appliance | 0.461* | 20 | 0.041
HKC004 | 汽車零件 Car Part | 0.302 | 19 | 0.223
HKC005 | 浴室用品 Bath Utensil | 0.317 | 16 | 0.231
HKC006 | 寵物 Pet | 0.868** | 11 | < 0.001
HKC007 | 酒精飲料 Alcohol Drink | 0.489 | 16 | 0.055
HKC008 | 罪行 Crime | 0.237 | 17 | 0.36
HKC009 | 建築材料 Construction Material | 0.129 | 12 | 0.689
HKC010 | 布料 Fabric | 0.100 | 15 | 0.724
HKC011 | 化粧品 Makeup Product | 0.305 | 14 | 0.29
HKC012 | 鳥類 Bird | 0.440* | 25 | 0.028
HKC013 | 乳製品 Dairy Product | 0.350 | 12 | 0.265
HKC014 | 舞蹈 Dance | 0.497 | 14 | 0.071
HKC015 | 水果 Fruit | 0.539* | 18 | 0.021
HKC016 | 消防器材 Firefighting Supply | 0.385 | 17 | 0.126
HKC017 | 花 Flower | 0.359 | 24 | 0.085
HKC018 | 民間藝術 Folk Art | 0.329 | 20 | 0.156
HKC019 | 野生動物 Wild Animal | 0.311 | 27 | 0.114
HKC020 | 疾病 Disease | 0.425* | 24 | 0.043
HKC021 | 茶葉 Tea | 0.484 | 16 | 0.058
HKC022 | 家俬 Furniture | 0.044 | 19 | 0.863
HKC023 | 房屋類型 Housing Type | 0.375 | 19 | 0.114
HKC024 | 昆蟲 Insect | 0.528* | 20 | 0.017
HKC025 | 廚房用具 Kitchen Utensil | −0.066 | 23 | 0.766
HKC026 | 金屬 Metal | 0.565* | 17 | 0.023
HKC027 | 零食 Snack | 0.272 | 19 | 0.259
HKC028 | 樂器 Musical Instrument | 0.558** | 23 | 0.006
HKC029 | 職業 Profession | 0.438* | 32 | 0.014
HKC030 | 人體器官 Human Organ | 0.095 | 25 | 0.652
HKC031 | 寶石 Gem | 0.562* | 15 | 0.029
HKC032 | 祭祀用品 Ancestral Worship Item | −0.082 | 14 | 0.781
HKC033 | 運動 Sport | 0.387* | 28 | 0.042
HKC034 | 園藝工具 Gardening Tool | 0.248 | 23 | 0.254
HKC035 | 玩具 Toy | 0.241 | 22 | 0.279
HKC036 | 蔬菜 Vegetable | 0.408 | 20 | 0.074
HKC037 | 武器 Weapon | 0.277 | 26 | 0.17
HKC038 | 天文現象 Astronomical Phenomena | −0.096 | 17 | 0.714
HKC039 | 文具 Stationery | 0.434 | 18 | 0.072
HKC040 | 沐浴用品 Bath Product | 0.514 | 13 | 0.073
HKC041 | 器皿 Container | 0.477 | 13 | 0.1
HKC042 | 清潔工具 Cleaning Tool | 0.225 | 14 | 0.44
HKC043 | 照明工具 Lighting Appliance | 0.264 | 14 | 0.363
HKC044 | 投資工具 Investment Tool | 0.443 | 19 | 0.057
HKC045 | 交通工具 Transportation | 0.829** | 9 | 0.006
HKC046 | 形狀 Shape | 0.493 | 14 | 0.073
HKC047 | 情緒 Emotional State | 0.345 | 28 | 0.078
HKC048 | 餐具 Tableware | 0.116 | 9 | 0.766
HKC049 | 急救用品 First Aid Supply | 0.343 | 14 | 0.229
HKC050 | 休閒活動 Recreational Activity | 0.322 | 31 | 0.077
HKC051 | 街頭小食 Street Food | 0.322 | 19 | 0.179
HKC052 | 語言 Language | 0.886** | 12 | < 0.001
HKC053 | 電影類型 Movie | 0.068 | 16 | 0.801
HKC054 | 營養補充劑 Nutritional Supplement | 0.304 | 25 | 0.139
HKC055 | 標點符號 Punctuation Mark | 0.739** | 13 | 0.004
HKC056 | 貨幣 Currency | 0.644* | 14 | 0.013
HKC057 | 茶樓點心 Teahouse DimSum | 0.427 | 18 | 0.077
HKC058 | 燃料 Fuel | 0.139 | 13 | 0.65
HKC059 | 鞋款 Shoe | 0.61* | 12 | 0.035
HKC060 | 藝術品 Artwork | 0.248 | 26 | 0.221
HKC061 | 香港景點 Sightseeing Spot | 0.338 | 32 | 0.058
HKC062 | 服飾配件 Fashion Accessory | 0.290 | 15 | 0.294
HKC063 | 年齡組別 Age Group | 0.204 | 11 | 0.548
HKC064 | 公益團體 NGO | 0.515* | 25 | 0.01
HKC065 | 電子產品 Electronic Device | 0.662* | 12 | 0.019
HKC066 | 度量工具 Measuring Tool | 0.348 | 15 | 0.203
HKC067 | 糖水 Sweet Soup/Tong Sui | 0.347 | 17 | 0.158
HKC068 | 海洋生物 Marine Animal | 0.337 | 22 | 0.125
HKC069 | 時間單位 Time Unit | 0.374 | 10 | 0.287
HKC070 | 長輩稱呼 Name for Addressing Elder Relatives | 0.510** | 26 | 0.008
HKC071 | 重量單位 Weight Unit | 0.309 | 12 | 0.354
HKC072 | 天然能源 Natural Energy Resource | 0.661* | 10 | 0.038

Correlations between total (the probability of being recalled, indexing typicality) and familiarity of exemplars, for each category.

*p < 0.05, **p < 0.01.

According to the familiarity-based explanation, the faster and more accurate processing of typical exemplars could be due to the generally high degree of familiarity of the concepts (Ashcraft, 1978), and familiarity may confound the pattern of experimental results (e.g., processing efficiency) that has been used to argue for a semantic memory model (McCloskey, 1980). However, even if word familiarity is an important determinant of typicality, it cannot account for all of the variance in typicality ratings (Rosch et al., 1976).

In this study, all combinations of “Familiarity” and “Total” were observed (high-F and low-T; high-F and high-T; low-F and high-T; low-F and low-T) for the exemplars. For example, dog (“狗,” gau2) was a highly familiar concept among the participants (6.30 out of 7.00, higher than the category average of 5.30), but was retrieved as a low-typicality member in the category “a kind of farm animal” (Slot1 = 0.000, Slot2 = 0.025, Slot3 = 0.025, Total = 0.017). In contrast, solar eclipse (“日蝕,” jat6 sik6) was a far less familiar concept (4.25 out of 7.00, lower than the category average of 4.81), but was the top-mentioned exemplar in the category of “an astronomical phenomenon” (Slot1 = 0.425, Slot2 = 0.150, Slot3 = 0.050, Total = 0.208).

Previous norming studies have often found familiarity (or word frequency) to be correlated with indices of typicality, such as overall frequency, first occurrence, and mean rank (Montefinese et al., 2012), because typicality and familiarity are both associated with the ease of production of an exemplar. The discrepancy with those findings, namely the lack of a consistent correlation in the present data, may be due to the experimental designs used in the current study. In Experiment 1, the number of responses per category was restricted to three, so all three responses were more likely to be highly familiar items, even though their instance probabilities would still differ. In Experiment 2, the familiarity ratings were collected without the category context or other category items, whereas in some other studies familiarity was rated within a category (e.g., Hampton and Gardiner, 1983). The familiarity ratings in Hampton and Gardiner’s work are therefore more comparative, reflecting relative familiarity among category members.
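As a rough illustration of the per-category correlation between Total and familiarity, the sketch below computes a Pearson coefficient in plain Python. The choice of Pearson's r is an assumption, since this section does not name the coefficient, and the five (Total, familiarity) pairs are hypothetical values made up for the example, not entries from the database.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical exemplars of one category (illustration only):
# Total probabilities and mean familiarity ratings on the 7-point scale.
totals = [0.267, 0.200, 0.150, 0.083, 0.017]
familiarity = [5.9, 5.1, 5.6, 4.8, 6.3]

r = pearson_r(totals, familiarity)
print(f"r = {r:.3f}")
```

With ratings like these, a highly familiar but rarely produced item (the last pair) pulls the coefficient down, which is the high-F and low-T pattern described above.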

Properties of Categories

The comparative statistical indices of the categories are shown in Table 4, where “Valid Exemplars” is the number of valid exemplars listed in the category. “Exemplars to 0.90 Coverage” is defined as the number of exemplars covering 90% of the occurrences of all valid entries. “0.90 Coverage%” is calculated as Exemplars to 0.90 Coverage divided by Valid Exemplars. “Invalid Exemplars%” is the proportion of invalid responses in a category, the same as Total of Invalid xx in Table 1. “First Exemplar Total%” is the Total probability of the top-ranked exemplar in the category, which indicates the degree of dominance of that exemplar and how strongly typicality is concentrated in that category. “Average Familiarity” is the average familiarity score of all of the valid exemplars in a category, reported along with its standard deviation.

TABLE 4

| Category Code | Category Name | Valid Exemplars | Exemplars to 0.90 Coverage | 0.90 Coverage% | Invalid Exemplars% | First Exemplar Total% | Avg. Familiarity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HKC001 | 農場動物 Farm Animal | 11 | 6 | 0.545 | 0.008 | 0.267 | 5.295 (±0.719) |
| HKC002 | 調味料 Spice | 17 | 9 | 0.529 | 0.000 | 0.217 | 5.268 (±0.639) |
| HKC003 | 家用電器 Household Appliance | 20 | 11 | 0.550 | 0.000 | 0.225 | 5.705 (±0.536) |
| HKC004 | 汽車零件 Car Part | 19 | 19 | 1.000 | 0.117 | 0.317 | 4.389 (±0.802) |
| HKC005 | 浴室用品 Bath Utensil | 16 | 13 | 0.813 | 0.058 | 0.233 | 5.538 (±0.608) |
| HKC006 | 寵物 Pet | 11 | 6 | 0.545 | 0.000 | 0.267 | 5.095 (±0.699) |
| HKC007 | 酒精飲料 Alcohol Drink | 16 | 11 | 0.688 | 0.042 | 0.308 | 4.384 (±1.011) |
| HKC008 | 罪行 Crime | 17 | 12 | 0.706 | 0.058 | 0.217 | 4.829 (±0.740) |
| HKC009 | 建築材料 Construction Material | 12 | 10 | 0.833 | 0.083 | 0.267 | 4.767 (±0.683) |
| HKC010 | 布料 Fabric | 15 | 14 | 0.933 | 0.092 | 0.242 | 4.160 (±0.496) |
| HKC011 | 化粧品 Makeup Product | 14 | 7 | 0.500 | 0.025 | 0.267 | 4.611 (±0.496) |
| HKC012 | 鳥類 Bird | 25 | 21 | 0.800 | 0.067 | 0.258 | 3.808 (±1.120) |
| HKC013 | 乳製品 Dairy Product | 12 | 6 | 0.500 | 0.042 | 0.267 | 5.533 (±0.448) |
| HKC014 | 舞蹈 Dance | 14 | 10 | 0.714 | 0.033 | 0.233 | 3.796 (±0.649) |
| HKC015 | 水果 Fruit | 18 | 9 | 0.500 | 0.000 | 0.275 | 5.292 (±0.553) |
| HKC016 | 消防器材 Firefighting Supply | 17 | 9 | 0.529 | 0.017 | 0.242 | 4.100 (±0.833) |
| HKC017 | 花 Flower | 24 | 15 | 0.625 | 0.025 | 0.258 | 4.160 (±0.790) |
| HKC018 | 民間藝術 Folk Art | 20 | 20 | 1.000 | 0.150 | 0.200 | 4.108 (±0.670) |
| HKC019 | 野生動物 Wild Animal | 27 | 19 | 0.704 | 0.033 | 0.242 | 4.294 (±0.718) |
| HKC020 | 疾病 Disease | 24 | 24 | 1.000 | 0.100 | 0.233 | 4.563 (±1.039) |
| HKC021 | 茶葉 Tea | 16 | 10 | 0.625 | 0.017 | 0.200 | 4.434 (±1.021) |
| HKC022 | 家俬 Furniture | 18 | 12 | 0.789 | 0.008 | 0.275 | 5.625 (±0.601) |
| HKC023 | 房屋類型 Housing Type | 19 | 10 | 0.526 | 0.017 | 0.275 | 4.900 (±0.928) |
| HKC024 | 昆蟲 Insect | 20 | 12 | 0.600 | 0.017 | 0.175 | 4.178 (±1.234) |
| HKC025 | 廚房用具 Kitchen Utensil | 23 | 19 | 0.826 | 0.067 | 0.192 | 5.476 (±0.613) |
| HKC026 | 金屬 Metal | 16 | 9 | 0.529 | 0.042 | 0.225 | 0.245 (±1.101) |
| HKC027 | 零食 Snack | 19 | 19 | 0.947 | 0.108 | 0.292 | 5.042 (±1.003) |
| HKC028 | 樂器 Musical Instrument | 23 | 12 | 0.522 | 0.000 | 0.242 | 3.830 (±0.974) |
| HKC029 | 職業 Profession | 32 | 20 | 0.625 | 0.000 | 0.217 | 5.345 (±0.523) |
| HKC030 | 人體器官 Human Organ | 25 | 14 | 0.560 | 0.000 | 0.225 | 4.938 (±1.036) |
| HKC031 | 寶石 Gem | 15 | 9 | 0.600 | 0.000 | 0.000 | 3.407 (±0.954) |
| HKC032 | 祭祀用品 Ancestral Worship Item | 14 | 14 | 1.000 | 0.158 | 0.258 | 4.943 (±1.363) |
| HKC033 | 運動 Sport | 28 | 16 | 0.571 | 0.000 | 0.200 | 4.704 (±0.702) |
| HKC034 | 園藝工具 Gardening Tool | 23 | 19 | 0.826 | 0.067 | 0.167 | 4.209 (±1.042) |
| HKC035 | 玩具 Toy | 22 | 18 | 0.818 | 0.067 | 0.183 | 4.614 (±0.697) |
| HKC036 | 蔬菜 Vegetable | 20 | 13 | 0.650 | 0.017 | 0.167 | 5.275 (±0.319) |
| HKC037 | 武器 Weapon | 26 | 20 | 0.769 | 0.050 | 0.217 | 4.198 (±1.075) |
| HKC038 | 天文現象 Astronomical Phenomena | 17 | 12 | 0.706 | 0.058 | 0.208 | 4.812 (±0.819) |
| HKC039 | 文具 Stationary | 18 | 8 | 0.444 | 0.000 | 0.275 | 5.378 (±0.709) |
| HKC040 | 沐浴用品 Bath Product | 13 | 9 | 0.692 | 0.033 | 0.233 | 5.346 (±0.661) |
| HKC041 | 器皿 Container | 13 | 8 | 0.615 | 0.042 | 0.225 | 5.388 (±0.595) |
| HKC042 | 清潔工具 Cleaning Tool | 14 | 9 | 0.643 | 0.008 | 0.225 | 5.475 (±0.549) |
| HKC043 | 照明工具 Lighting Appliance | 14 | 7 | 0.500 | 0.017 | 0.292 | 4.800 (±1.095) |
| HKC044 | 投資工具 Investment Tool | 19 | 19 | 1.000 | 0.175 | 0.258 | 4.284 (±0.813) |
| HKC045 | 交通工具 Transportation | 9 | 5 | 0.556 | 0.008 | 0.308 | 5.767 (±0.684) |
| HKC046 | 形狀 Shape | 14 | 7 | 0.500 | 0.008 | 0.325 | 4.850 (±0.295) |
| HKC047 | 情緒 Emotional State | 28 | 19 | 0.679 | 0.025 | 0.233 | 5.752 (±0.506) |
| HKC048 | 餐具 Tableware | 9 | 4 | 0.444 | 0.000 | 0.283 | 5.983 (±0.462) |
| HKC049 | 急救用品 First Aid Supply | 14 | 6 | 0.429 | 0.025 | 0.275 | 4.839 (±0.505) |
| HKC050 | 休閒活動 Recreational Activity | 31 | 20 | 0.645 | 0.008 | 0.133 | 5.385 (±0.788) |
| HKC051 | 街頭小食 Street Food | 19 | 8 | 0.421 | 0.000 | 0.317 | 5.324 (±0.605) |
| HKC052 | 語言 Language | 12 | 7 | 0.583 | 0.000 | 0.308 | 4.650 (±0.897) |
| HKC053 | 電影類型 Movie | 16 | 8 | 0.500 | 0.000 | 0.217 | 5.234 (±0.436) |
| HKC054 | 營養補充劑 Nutritional Supplement | 25 | 25 | 1.000 | 0.117 | 0.242 | 4.212 (±1.089) |
| HKC055 | 標點符號 Punctuation Mark | 13 | 7 | 0.538 | 0.000 | 0.325 | 4.827 (±0.399) |
| HKC056 | 貨幣 Currency | 14 | 8 | 0.571 | 0.025 | 0.250 | 4.461 (±0.553) |
| HKC057 | 茶樓點心 Teahouse DimSum | 18 | 9 | 0.500 | 0.000 | 0.300 | 5.222 (±1.198) |
| HKC058 | 燃料 Fuel | 13 | 7 | 0.538 | 0.017 | 0.225 | 4.515 (±0.650) |
| HKC059 | 鞋款 Shoe | 12 | 9 | 0.750 | 0.050 | 0.283 | 4.763 (±0.932) |
| HKC060 | 藝術品 Artwork | 26 | 20 | 0.654 | 0.050 | 0.175 | 4.675 (±0.740) |
| HKC061 | 香港景點 Sightseeing Spot | 32 | 24 | 0.750 | 0.033 | 0.267 | 4.581 (±0.881) |
| HKC062 | 服飾配件 Fashion Accessory | 15 | 13 | 0.867 | 0.083 | 0.150 | 4.963 (±0.675) |
| HKC063 | 年齡組別 Age Group | 11 | 6 | 0.545 | 0.000 | 0.242 | 5.514 (±0.536) |
| HKC064 | 公益團體 NGO | 25 | 18 | 0.720 | 0.133 | 0.042 | 3.954 (±1.037) |
| HKC065 | 電子產品 Electronic Device | 12 | 5 | 0.417 | 0.000 | 0.325 | 5.833 (±0.641) |
| HKC066 | 度量工具 Measuring Tool | 15 | 12 | 0.800 | 0.067 | 0.250 | 4.330 (±0.899) |
| HKC067 | 糖水 Sweet Soup/Tong Sui | 18 | 13 | 0.647 | 0.200 | 0.050 | 4.750 (±0.998) |
| HKC068 | 海洋生物 A Marine Animal | 22 | 13 | 0.591 | 0.008 | 0.133 | 4.180 (±1.011) |
| HKC069 | 時間單位 Time Unit | 10 | 4 | 0.400 | 0.025 | 0.333 | 5.275 (±1.229) |
| HKC070 | 長輩稱呼 Name for Addressing Elder Relatives | 26 | 14 | 0.538 | 0.000 | 0.167 | 5.146 (±1.062) |
| HKC071 | 重量單位 Weight Unit | 12 | 12 | 1.000 | 0.100 | 0.275 | 4.332 (±1.125) |
| HKC072 | 天然能源 Natural Energy Resource | 10 | 7 | 0.700 | 0.025 | 0.192 | 4.215 (±0.569) |

Compiled statistics for all categories.

Measurement of Category Size and Category Nucleus

Category size is straightforwardly defined as the number of exemplars included in a given category, represented as Valid Exemplars in this database. Discrepancies in category size might reflect actual differences in reality (e.g., the types of fruit seen and sold in the local markets) or the granularity with which the superordinate-level concept represented by the category label (e.g., “a kind of emotion”) is divided in the lexical inventory.

In our database, a possible alternative measure of category size is the number of exemplars at which the cumulative frequency reaches 0.900, i.e., Exemplars to 0.90 Coverage in Table 4. This threshold corresponds to a cut-off rate of 0.100, which excludes highly atypical or idiosyncratic items as “messy residues” and represents a stricter measurement of category size. The ratio (0.90 Coverage%) is a non-negligible complement to the category statistics mentioned above, although to the best of our knowledge it has not yet been addressed in the literature. A smaller ratio may indicate a strong dominance of the top exemplars within the category, a more restricted and unanimous membership, or a smaller category nucleus on the graded structure.

For example, for “a kind of farm animal” (Table 1), just 0.545 of the exemplars covered 0.933 of all responses, while the remaining 0.455 of the exemplars, at the atypical end of the category, accounted for only 0.067 of the responses. In other words, the first 6 of the 11 exemplars in the category “a kind of farm animal” (cattle, “牛,” ngau4; chicken, “雞,” gai1; pig, “豬,” zyu1; sheep, “羊,” joeng4; horse, “馬,” maa5; duck, “鴨,” aap3) accounted for 93.3% of all of the eligible entries. A person who knows these top (i.e., highly typical) exemplars, which make up about half of the category and can be regarded as its essence or stereotypes, could be considered to have a considerable understanding and word knowledge of the category and its commonly agreed membership.
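The coverage measure can be sketched in a few lines of Python. The function `coverage_stats` and the response counts below are hypothetical: the counts are invented for an 11-exemplar category and are not the actual “farm animal” data from Table 1. The sketch takes the denominator to be the number of valid responses, following the definition of “Exemplars to 0.90 Coverage” given for Table 4.

```python
def coverage_stats(counts, threshold=0.90):
    """Summarise how many top exemplars are needed to reach the coverage threshold.

    counts: number of valid responses recorded for each exemplar of one category.
    """
    if not counts or sum(counts) == 0:
        raise ValueError("counts must contain at least one response")
    ordered = sorted(counts, reverse=True)  # most frequently produced exemplar first
    total = sum(ordered)
    cumulative = 0
    for n_exemplars, count in enumerate(ordered, start=1):
        cumulative += count
        if cumulative / total >= threshold:
            break
    return {
        "valid_exemplars": len(ordered),
        "exemplars_to_coverage": n_exemplars,
        "coverage_ratio": n_exemplars / len(ordered),
        "covered_share": cumulative / total,
    }


# Hypothetical response counts for a category with 11 valid exemplars.
toy_counts = [32, 28, 22, 14, 9, 7, 3, 1, 1, 1, 1]
stats = coverage_stats(toy_counts)
print(stats["exemplars_to_coverage"], round(stats["coverage_ratio"], 3))  # prints: 6 0.545
```

In this toy category the six most frequent exemplars pass the 0.90 threshold, so the ratio is 6/11, the same value as the 0.90 Coverage% reported for the farm animal category.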

Uneven Knowledge Base

Inevitably, the linguistic realization of the conceptual system in a language community reflects and is shaped by its cultural and social contexts. Conversely, the richness of knowledge about a certain genre may be captured by the abundance of the speakers’ lexical resources in the corresponding categories. In this way, the heterogeneity of categories provides a considerable amount of anthropological and semantic detail about the language context of the speakers’ everyday lives. Invalid Exemplars% in Table 4 can be read as a negative indicator of such lexical abundance, because most invalid responses are “give-ups” (responses such as “I don’t know” or “−”), repeated instances, or interchangeable rephrasings. These invalid responses, which are possibly driven by the no-skipping requirement of the task, reflect a knowledge deficiency for that category or the limited importance of that genre of knowledge in speakers’ daily communication.

Furthermore, the Average Familiarity values and standard deviations in Table 4 provide an overall familiarity estimate for the concepts in each category. The fact that the concepts in one category are more consistently familiar across participants than those in other categories could indicate participants’ greater knowledge of or more frequent exposure to that category, and thus the essentiality of such knowledge.

Discussion

Comparisons to Other Norms: Inclusion, Measurements, and Methodologies

Over the years, the Battig and Montague (1969) English norms have been repeatedly updated and expanded, while researchers have compiled norms in other languages by adapting the category list and using similar methodologies (e.g., Storms, 2001 in Flemish; Marful et al., 2015 in Spanish, and many others). Among them, the norm of Van Overschelde et al. (2004), an updated English norm based on Battig and Montague (1969), reflected contemporary category membership knowledge and captured recent cultural changes. It has also served as a point of comparison for many other norming studies (e.g., Bueno and Megherbi, 2009 in French). A comparison with Van Overschelde et al. (2004) should therefore be representative of how the current study compares with the general body of norming studies.

Overlapping Categories

Twenty-five categories are common to the two databases, as listed in Table 5. These overlapping categories are used in a wide range of studies and experiments. Contrasting cultural contexts are apparently a major contributor to the discrepancies in the inclusion of categories. “A kind of money” in the Van Overschelde et al. (2004) study asked the participants to provide the proper names of United States dollar bills and coins (e.g., dollars, quarters, and dimes). There are no such systematic categorical distinctions in HKC. Instead, the HKC study asked the participants to recall the currencies of different regions and countries that they most commonly encounter, since international trade and travel are common experiences for local people. In addition, with almost two decades between the two norming studies, new clusters of concepts have inevitably emerged, as evidenced by the inclusion of categories such as “HKC064 NGO” and “HKC065 Electronic Devices.”

TABLE 5

| Category in HKC Norming | Category in Van Overschelde et al. (2004) | No. of Exemplars in HKC | No. of Exemplars in Van Overschelde et al. (2004) | Overlapping |
| --- | --- | --- | --- | --- |
| HKC002 調味料 Spice | 25. A substance for flavoring food | 17 | 25 | 9 |
| HKC007 酒精飲料 Alcohol Drink | 20. An alcoholic beverage | 16 | 19 | 9 |
| HKC008 罪行 Crime | 22. A crime | 17 | 16 | 8 |
| HKC010 布料 Fabric | 9. A type of fabric | 15 | 20 | 10 |
| HKC012 鳥類 Bird | 37. A bird | 25 | 29 | 13 |
| HKC014 舞蹈 Dance | 42. A type of dance | 14 | 22 | 7 |
| HKC015 水果 Fruit | 16. A fruit | 18 | 27 | 15 |
| HKC017 花 Flower | 48. A flower | 24 | 16 | 6 |
| HKC020 疾病 Disease | 49. A disease | 24 | 21 | 6 |
| HKC022 家俬 Furniture | 14. An article of furniture | 18 | 21 | 9 |
| HKC024 昆蟲 Insect | 45. An insect | 20 | 23 | 15 |
| HKC025 廚房用具 Kitchen Utensil | 11. A kitchen utensil | 23 | 19 | 8 |
| HKC026 金屬 Metal | 5. A metal | 16 | 15 | 9 |
| HKC028 樂器 Musical Instrument | 34. A musical instrument | 23 | 25 | 12 |
| HKC029 職業 Profession | 27. An occupation or profession | 32 | 23 | 14 |
| HKC031 寶石 Gem | 1. A precious stone | 15 | 15 | 8 |
| HKC033 運動 Sport | 29. A sport | 28 | 26 | 8 |
| HKC034 園藝工具 Gardening Tool | 69. A gardener’s tool | 23 | 18 | 8 |
| HKC035 玩具 Toy | 41. A toy | 22 | 21 | 8 |
| HKC036 蔬菜 Vegetable | 43. A vegetable | 20 | 25 | 9 |
| HKC037 武器 Weapon | 17. A weapon | 26 | 22 | 11 |
| HKC045 交通工具 Transportation | 39. A transportation vehicle | 9 | 20 | 6 |
| HKC058 燃料 Fuel | 26. A fuel | 13 | 19 | 9 |
| HKC059 鞋款 Shoe | 44. A type of footwear | 21 | 12 | 6 |
| HKC069 時間單位 Time Unit | 2. A unit of time | 10 | 13 | 9 |

Comparison with the English norm of Van Overschelde et al. (2004), listing the mutually included categories.

Discrepancy on Measurements and Methodology

The direct measurements given in the two norms [Total and Slot1 in the current study; “Total” and “First” in Van Overschelde et al. (2004)] were not defined and computed in an identical way. In Van Overschelde et al. (2004), “Total” was computed “by dividing the number of participants who gave the response by the number of all participants who generated any response” (Van Overschelde et al., 2004, p. 291), and “First” was computed “by dividing the number of participants who gave the response as the first response by the number of all participants who generated any response” (Van Overschelde et al., 2004, pp. 291–293). Moreover, in a time-limited recall design such as that of Van Overschelde et al. (2004), the number of responses differed across participants, whereas in the current study all participants generated the same number of responses for each category. Although both sets of measurements index the overall dominance of an exemplar and its first occurrence, the correlation between them is not reported here, because any result could reflect the difference in methodology rather than cultural factors.

The position in which an exemplar was recalled was also measured differently. Van Overschelde et al. (2004) used “Rank” (i.e., “the mean output position of the response”), whereas the current study reports the probabilities for all positions (Slot1, Slot2, and Slot3), because there were only three possible positions and participants assigned the positions with the intention (driven by the task instruction) of ranking their choices.
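To make the quoted definitions concrete, the following Python sketch computes “Total,” “First,” and “Rank” as described for Van Overschelde et al. (2004) from free-recall lists of unequal length. The function name `vo_measures` and the toy response lists are hypothetical and serve only to illustrate the arithmetic; this is not the original authors' code.

```python
def vo_measures(responses, exemplar):
    """Return (Total, First, Rank) for one exemplar.

    responses: one ordered list of responses per participant (lists may differ in length).
    """
    active = [r for r in responses if r]  # participants who generated any response
    if not active:
        raise ValueError("no participant generated a response")
    gave = [r for r in active if exemplar in r]
    total = len(gave) / len(active)
    first = sum(1 for r in active if r[0] == exemplar) / len(active)
    # Rank: mean output position (1-based) among participants who gave the response.
    rank = sum(r.index(exemplar) + 1 for r in gave) / len(gave) if gave else None
    return total, first, rank


# Hypothetical recall lists from five participants; one produced nothing.
toy_responses = [
    ["apple", "banana", "orange", "grape"],
    ["banana", "apple"],
    ["orange", "apple", "banana"],
    [],
    ["apple"],
]
total, first, rank = vo_measures(toy_responses, "apple")
print(total, first, rank)  # prints: 1.0 0.5 1.5
```

Because the lists differ in length, an exemplar's Rank depends on how many items each participant managed to produce, whereas with three fixed slots per participant the position information can be reported directly as slot probabilities.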

The experimental design was adapted from Yoon et al. (2004), a cross-language, cross-culture, and cross-age norming study that included 105 categories and results from young and old American and Chinese adults. In the HKC norm, each of the 40 participants provided the three most typical exemplars, in the order of their subjective ranking of typicality. In Van Overschelde et al. (2004), at least 600 participants responded to each category within a 30-s time limit, and the norm used a cut-off of 0.05 of the participants (i.e., responses given by fewer than 0.05 of the participants were discarded from the final database). Nevertheless, as shown in Table 5, the numbers of exemplars generated in the two norms for the mutually included categories are rather comparable, despite the gap in the numbers of participants. There are also considerable proportions of overlapping exemplars: on average, 0.506 (±0.175, range = [0.250, 0.900]) of the exemplars in an HKC category can also be found in the corresponding category of Van Overschelde et al. (2004). As for the remaining, non-overlapping half of the exemplars, it is tempting to interpret the discrepancy as a cultural or lexical difference affecting the scope of the categories in the two norms (and thus the two languages); it should be noted, however, that this discrepancy, like any further difference between the current study and norms using a time-restricted task design, might also be due to the methodological differences.
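The overlap proportion can be recomputed from Table 5 by dividing the number of overlapping exemplars by the number of HKC exemplars in each category. The Python sketch below does this with the counts copied from Table 5 as printed; the variable names are illustrative, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

# (HKC code, No. of exemplars in HKC, overlapping exemplars), copied from Table 5 as printed.
TABLE_5 = [
    ("HKC002", 17, 9), ("HKC007", 16, 9), ("HKC008", 17, 8), ("HKC010", 15, 10),
    ("HKC012", 25, 13), ("HKC014", 14, 7), ("HKC015", 18, 15), ("HKC017", 24, 6),
    ("HKC020", 24, 6), ("HKC022", 18, 9), ("HKC024", 20, 15), ("HKC025", 23, 8),
    ("HKC026", 16, 9), ("HKC028", 23, 12), ("HKC029", 32, 14), ("HKC031", 15, 8),
    ("HKC033", 28, 8), ("HKC034", 23, 8), ("HKC035", 22, 8), ("HKC036", 20, 9),
    ("HKC037", 26, 11), ("HKC045", 9, 6), ("HKC058", 13, 9), ("HKC059", 21, 6),
    ("HKC069", 10, 9),
]

# Share of each HKC category's exemplars that also appear in the English norm.
proportions = [overlap / n_hkc for _, n_hkc, overlap in TABLE_5]

mean_overlap = mean(proportions)
sd_overlap = stdev(proportions)  # sample SD (assumed estimator)
print(f"mean = {mean_overlap:.3f}, range = [{min(proportions):.3f}, {max(proportions):.3f}]")
# prints: mean = 0.506, range = [0.250, 0.900]
```

The mean and range agree with the values reported in the text, and the sample standard deviation comes out close to the reported ±0.175.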

Concepts and Translation

The overlapping exemplars were not identified as one-to-one word pairs based on direct translation (Table 6). Different ways of projecting and conceptualizing reality may account for the referential complications: a word in Hong Kong Cantonese may have more than one English translation, and vice versa. For example, for “HKC022 家俬 Furniture” and “14. An article of furniture,” “沙發” (saa1 faat3) has two corresponding exemplars, “couch” and “sofa”; conversely, both “煤油” (mui4 jau4) and “火水” (fo2 seoi2) correspond to “kerosene” in the category “Fuel,” with “火水” being the more colloquial in Hong Kong Cantonese.

TABLE 6

| Word Code | Exemplar in HKC | Exemplar in Van Overschelde et al. (2004) |
| --- | --- | --- |
| HKC002 調味料 Spice – 25. A substance for flavoring food |  |  |
| HKC00201 |  | Sugar(s) |
| HKC00202 |  | Salt |
| HKC00203 | 胡椒 | Pepper |
| HKC00204 | 豉油 | Soy sauce |
| HKC00205 | 辣椒 | Paprika; hot sauce |
| HKC00206 |  | Vinegar |
| HKC00207 |  | Oil(s) |
| HKC00209 | 香草 | Vanilla |
| HKC00210 | 茄醬 | Ketchup |
| HKC007 酒精飲料 Alcohol Drink – 20. An alcoholic beverage |  |  |
| HKC00701 | 啤酒 | Beer |
| HKC00702 | 紅酒 | Wine |
| HKC00704 | 威士忌 | Whiskey |
| HKC00705 | 伏特加 | Vodka |
| HKC00706 | 果酒 | Wine cooler(s) |
| HKC00707 | 雞尾酒 | Margarita(s); Martini |
| HKC00709 | 烈酒 | Liquor(s) |
| HKC00711 | 香檳 | Champagne |
| HKC00714 | 琴酒 | Gin |
| HKC008 罪行 Crime – 22. A crime |  |  |
| HKC00801 | 偷竊 | Stealing/theft/robbery; larceny |
| HKC00802 | 謀殺 | Murder/killing |
| HKC00803 | 強姦 | Rape |
| HKC00804 | 搶劫 | Stealing/theft/robbery |
| HKC00805 | 傷人 | Battery |
| HKC00807 | 詐騙 | Arson |
| HKC00813 | 綁架 | Kidnapping |
| HKC00815 | 藏毒 | Drug use/possession |
| HKC010 布料 Fabric – 9. A type of fabric |  |  |
| HKC01001 | 棉布 | Cotton |
| HKC01002 | 絲綢 | Silk |
| HKC01003 | 麻布 | Linen |
| HKC01004 | 尼龍 | Nylon |
| HKC01005 | 羊毛 | Fleece; wool |
| HKC01007 | 絨布 | Flannel |
| HKC01010 | 牛仔 | Denim; jeans |
| HKC01012 | 纖維 | Rayon |
| HKC01014 | 蕾絲 | Lace |
| HKC01015 | 皮革 | Leather |
| HKC012 鳥類 Bird – 37. A bird |  |  |
| HKC01201 | 麻雀 | Sparrow(s) |
| HKC01203 | 烏鴉 | Crow(s) |
| HKC01204 | 白鴿 | Dove; pigeon(s) |
| HKC01205 | 鸚鵡 | Parrot; parakeet |
| HKC01207 | 企鵝 | Penguin |
| HKC01209 |  | Chicken |
| HKC01211 |  | Eagle |
| HKC01212 | 蜂鳥 | Hummingbird |
| HKC01213 | 黃鶯 | Oriole |
| HKC01215 | 貓頭鷹 | Owl(s) |
| HKC01218 | 海鷗 | Seagull(s) |
| HKC01219 | 鴕鳥 | Ostrich |
| HKC01225 | 知更鳥 | Mockingbird; robin |
| HKC014 舞蹈 Dance – 42. A type of dance |  |  |
| HKC01401 | 芭蕾舞 | Ballet |
| HKC01402 | 拉丁舞 | Tango; salsa; cha cha; mambo |
| HKC01403 | 街舞 | Hip hop; break |
| HKC01405 | 爵士 | Jazz |
| HKC01406 | 社交舞 | Waltz; ballroom; foxtrot |
| HKC01407 | 現代舞 | Modern |
| HKC01413 | 踢踏舞 | Tap |
| HKC015 水果 Fruit – 16. A fruit |  |  |
| HKC01501 | 蘋果 | Apple |
| HKC01502 | 香蕉 | Banana |
| HKC01503 | 西瓜 | Watermelon |
| HKC01504 |  | Orange |
| HKC01505 | 士多啤梨 | Strawberry |
| HKC01506 | 芒果 | Mango |
| HKC01507 | 提子 | Grape |
| HKC01508 | 梨子 | Pear |
| HKC01510 | 檸檬 | Lemon |
| HKC01511 |  | Peach |
| HKC01512 | 藍莓 | Blueberry |
| HKC01513 | 櫻桃 | Cherry |
| HKC01515 | 木瓜 | Papaya |
| HKC01517 | 菠蘿 | Pineapple |
| HKC01518 | 橘子 | Tangerine |
| HKC017 花 Flower – 48. A flower |  |  |
| HKC01701 | 玫瑰 | Rose |
| HKC01703 | 百合 | Lily |
| HKC01705 | 蘭花 | Orchid |
| HKC01709 | 水仙 | Daffodil |
| HKC01711 | 向日葵 | Sunflower |
| HKC01712 | 康乃馨 | Carnation |
| HKC020 疾病 Disease – 49. A disease |  |  |
| HKC02001 | 感冒 | Flu; cold |
| HKC02002 | 癌症 | Cancer |
| HKC02003 | 心臟病 | Heart disease |
| HKC02005 | 糖尿病 | Diabetes |
| HKC02010 | 愛滋 | AIDS/HIV |
| HKC02023 | 天花 | Smallpox |
| HKC022 家俬 Furniture – 14. An article of furniture |  |  |
| HKC02201 | 沙發 | Couch; sofa |
| HKC02202 | 椅子 | Chair |
| HKC02204 | 桌子 | Table |
| HKC02205 |  | Bed |
| HKC02206 | 衣櫃 | Armoire |
| HKC02207 | 書桌 | Desk |
| HKC02208 | 書櫃 | Bookshelf |
| HKC02215 | 梳妝台(檯) | Dresser |
| HKC02216 | 廚櫃 | Cabinet |
| HKC024 昆蟲 Insect – 45. An insect |  |  |
| HKC02401 | 蝴蝶 | Butterfly |
| HKC02402 | 螞蟻 | Ant |
| HKC02403 | 蜜蜂 | Bee |
| HKC02404 | 甲蟲 | Beetle |
| HKC02405 | 蟑螂 | Roach |
| HKC02406 | 蜻蜓 | Dragonfly |
| HKC02407 | 烏蠅 | Fly |
| HKC02408 | 蚊(子) | Mosquito |
| HKC02409 | 毛蟲 | Caterpillar |
| HKC02411 | 蜘蛛 | Spider |
| HKC02412 | 草蜢 | Grasshopper |
| HKC02413 | 螳螂 | Praying mantis |
| HKC02416 | 蜈蚣 | Centipede |
| HKC02417 | 蟋蟀 | Cricket |
| HKC02419 |  | Flea |
| HKC025 廚房用具 Kitchen Utensil – 11. A kitchen utensil |  |  |
| HKC02501 | 菜刀 | Knife |
| HKC02502 | 鍋子 | Pot |
| HKC02505 | 砧板 | Cutting board |
| HKC02508 | 湯匙 | Spoon |
| HKC02512 |  | Fork |
| HKC02515 |  | Bowl |
| HKC02518 | 杯子 | Cup |
| HKC02521 |  | Plate |
| HKC026 金屬 Metal – 5. A metal |  |  |
| HKC02601 |  | Copper |
| HKC02602 |  | Gold |
| HKC02603 |  | Iron |
| HKC02604 |  | Silver |
| HKC02605 |  | Steel |
| HKC02607 |  | Lead |
| HKC02610 |  | Zinc |
| HKC02612 |  | Titanium |
| HKC02616 |  | Tin |
| HKC028 樂器 Musical Instrument – 34. A musical instrument |  |  |
| HKC02801 | 鋼琴 | Piano |
| HKC02802 | 結他 | Guitar |
| HKC02803 | 長笛 | Flute |
| HKC02804 | 小提琴 | Violin |
| HKC02807 | 口琴 | Harmonica |
| HKC02808 | 豎琴 | Harp |
| HKC02809 | 單簧管 | Clarinet |
| HKC02810 | 大提琴 | Cello |
| HKC02811 | 色士風 | Sax(ophone) |
| HKC02816 | 風琴 | Organ |
| HKC02819 | 小號 | Trumpet |
| HKC02821 | 大號 | Tuba |
| HKC029 職業 Profession – 27. An occupation or profession |  |  |
| HKC02901 | 老師 | Teacher |
| HKC02902 | 醫生 | Doctor |
| HKC02903 | 警察 | Policeman |
| HKC02904 | 律師 | Lawyer |
| HKC02905 | 消防員 | Fireman |
| HKC02906 | 護士 | Nurse |
| HKC02907 | 廚師 | Cook |
| HKC02908 | 球員 | Athletes |
| HKC02909 | 會計 | Accountant |
| HKC02912 | 學生 | Student |
| HKC02914 | 科學家 | Scientist |
| HKC02921 | 工程師 | Engineer |
| HKC02923 | 教授 | Professor |
| HKC02924 | 助理 | Secretary |
| HKC031 寶石 Gem – 1. A precious stone |  |  |
| HKC03101 | 鑽石 | Diamond |
| HKC03102 | 紅寶石 | Ruby |
| HKC03103 | 藍寶石 | Sapphire |
| HKC03104 | 水晶 | Amethyst |
| HKC03106 | 綠寶石 | Emerald |
| HKC03108 | 翡翠 | Jade |
| HKC03112 | 珍珠 | Pearl |
| HKC03114 | 石榴石 | Garnet |
| HKC033 運動 Sport – 29. A sport |  |  |
| HKC03301 | 跑步 | Running |
| HKC03302 | 足球 | Football |
| HKC03303 | 籃球 | Basketball |
| HKC03304 | 游水 | Swimming |
| HKC03307 | 羽毛球 | Badminton |
| HKC03315 | 排球 | Volleyball |
| HKC03317 | 網球 | Tennis |
| HKC03319 | 壘球 | Softball |
| HKC034 園藝工具 Gardening Tool – 69. A gardener’s tool |  |  |
| HKC03402 | 泥鏟 | Trowel |
| HKC03405 | 手套 | Glove(s) |
| HKC03407 | 泥耙 | Rake |
| HKC03408 | 泥土 | Dirt/soil |
| HKC03409 | 水桶 | Bucket(s) |
| HKC03410 |  | Hoe |
| HKC03411 | 割草機 | Lawnmower |
| HKC03414 | 水管 | Water hose |
| HKC035 玩具 Toy – 41. A toy |  |  |
| HKC03501 | 公仔 | Stuffed animals |
| HKC03502 | 玩具車 | Cars |
| HKC03504 | 搖搖 | Yo-yo |
| HKC03505 | 洋娃娃 | Dolls; Barbie dolls |
| HKC03506 | 積木 | Blocks |
| HKC03509 | 皮球 | Balls |
| HKC03510 | 拼圖 | Puzzles |
| HKC03511 | 電腦 | Computer |
| HKC036 蔬菜 Vegetable – 43. A vegetable |  |  |
| HKC03601 | 白菜 | Cabbage |
| HKC03603 | 生菜 | Lettuce |
| HKC03604 | 蕃茄 | Tomato, tomatoes (20) |
| HKC03605 | 蘿蔔 | Radish |
| HKC03606 | 西芹 | Celery |
| HKC03607 | 西蘭花 | Broccoli |
| HKC03610 | 椰菜 | Cauliflower |
| HKC03611 | 青瓜 | Cucumber |
| HKC03614 | 菠菜 | Spinach |
| HKC037 武器 Weapon – 17. A weapon |  |  |
| HKC03701 |  | Knife |
| HKC03702 | 手槍 | Gun |
| HKC03703 |  | Sword |
| HKC03704 | 斧頭 | Axe |
| HKC03705 | 炸彈 | Bomb |
| HKC03706 |  | Bow |
| HKC03709 | 拳頭 | Fist |
| HKC03710 | 手榴彈 | Grenade |
| HKC03715 |  | Spear |
| HKC03717 | 雙節棍 | Nunchucks |
| HKC03724 |  | Stick |
| HKC045 交通工具 Transportation – 39. A transportation vehicle |  |  |
| HKC04501 | 巴士 | Bus |
| HKC04502 | 地鐵 | Subway |
| HKC04503 | 的士 | Taxi/cab |
| HKC04506 | 火車 | Train(s) |
| HKC04508 | 私家車 | Car(s) |
| HKC04509 | 飛機 | (Air)plane |
| HKC058 燃料 Fuel – 26. A fuel |  |  |
| HKC05801 |  | Coal |
| HKC05802 | 石油 | Oil |
| HKC05803 | 天然氣 | Natural (gas) |
| HKC05804 | 汽油 | Gasoline |
| HKC05805 | 木柴 | Wood |
| HKC05806 | 柴油 | Diesel |
| HKC05808 | 化石 | Fossil |
| HKC05809 | 火水 | Kerosene |
| HKC05812 | 煤油 | Kerosene |
| HKC059 鞋款 Shoe – 44. A type of footwear |  |  |
| HKC05901 | 波鞋 | Sneaker; tennis; Nikes; Adidas |
| HKC05903 | 拖鞋 | Slipper; flip flops |
| HKC05904 | 高跟鞋 | High heels; pumps |
| HKC05905 | 跑鞋 | Running shoes |
| HKC05906 | 涼鞋 | Sandal |
| HKC05908 | 靴子 | Boot |
| HKC069 時間單位 Time Unit – 2. A unit of time |  |  |
| HKC06901 | 小時 | Hour |
| HKC06902 | 分鐘 | Minute |
| HKC06903 | 秒鐘 | Second |
| HKC06904 |  | Year |
| HKC06905 |  | Month |
| HKC06906 | 毫秒 | Millisecond |
| HKC06907 |  | Day |
| HKC06908 | 世紀 | Century |
| HKC06910 | 星期 | Week |

Comparison with the English norm of Van Overschelde et al. (2004), showing the overlapping exemplars in the mutually included categories.

In all, at the category level, the HKC norm covered a considerable range of categories in common with the English norm of Van Overschelde et al. (2004) and the cross-cultural norm of Yoon et al. (2004); at the exemplar level, overlapping exemplars were identified, along with the referential discrepancies observed between concepts.

Potential and Benchmarks

In addition to the current representation of the categories, the exemplars, and the descriptive statistics, the database could provide primary training data for a more complex model that explores additional variables. The data presented in the current study are rather straightforward: the categories are treated as independent of each other, and the exemplars are associated only through their shared categorical information. To further examine an interconnected semantic knowledge structure, more variables, such as the semantic relatedness of the exemplars and categorical feature analysis, would be necessary, so that both intra-category exemplar relations and inter-category relations could be captured. This approach would provide a more sophisticated analysis of the semantic network of Hong Kong Cantonese, with an exploration of concept clustering and of the interconnections between categories and concepts.

The processing efficiency of highly typical exemplars suggests that categorical typicality imposes a spontaneous contextual prime on an exemplar, which can be considered part of the stored semantic information about that exemplar. If we accept the hypothesis that Slot1 measures a kind of instant typicality, then this time-sensitive quality may be exploited in psychophysiological experiments to investigate the online processing of exemplars with congruent and incongruent categorical information primes. This type of investigation could be achieved by monitoring brain activity using technologies such as electroencephalography and event-related potentials (Stuss et al., 1988; Kounios and Holcomb, 1992; Kutas and Iragui, 1998; Federmeier and Kutas, 1999). Furthermore, the data collected from young, healthy adults can serve as a benchmark for studies of other age groups, namely older adults and children, and of patients with cognitive deficits. As semantic knowledge is generally preserved in the older population (e.g., Park et al., 2002), this database provides a resource for examining the neural mechanisms of word retrieval in older Cantonese-speaking adults. In addition, comparisons between the responses provided by neurologically impaired subjects and the database may reveal domain-specific degeneration of semantic knowledge. The database may also benefit developmental studies examining how children establish lexical inventories by observing category and exemplar learning.

Conclusion

This paper presents a norming study of category instance production for 72 natural semantic categories in modern Hong Kong Cantonese, with instance probability and familiarity rating results. The total exemplar production probability and the probabilities of the different positions of occurrence provide a detailed statistical description of instance typicality. In addition, word familiarity is provided for each included exemplar, rated as an independent word without its categorical information. The split-half correlations, used as reliability measures, confirm that the norming results are reliable and consistent. The database addresses the lack of a Hong Kong Cantonese category norming database and opens up research potential in multiple fields.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

Ethics statement

The studies involving human participants were reviewed and approved by the Ethics Committee, City University of Hong Kong. The patients/participants provided their written informed consent to participate in this study.

Author contributions

BL contributed to data collection, formal analysis, and writing—original draft. QL contributed to data collection and formal analysis. HYM contributed to formal analysis and visualization. OT contributed to conceptualization, funding acquisition, and resources. C-MH contributed to conceptualization, investigation, and methodology. H-WH contributed to conceptualization, formal analysis, investigation, methodology, project administration, supervision, validation, writing—original draft, and writing-review and editing. All authors contributed to the article and approved the submitted version.

Funding

Subjects’ incentives were paid by the Strategic Research Grants (7005343 and 7005414); open access publication fees and the research staff who worked on this project were paid by the Hong Kong Institute for Advanced Study (9360157), City University of Hong Kong.

Acknowledgments

H-WH and C-MH would like to thank Shih-Ping Huang for his company and indispensable support during the COVID-19 self-quarantine.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1

    AshcraftM. H. (1978). Property norms for typical and atypical items from 17 categories: A description and discussion.Memory Cogn.6227232. 10.3758/BF03197450

  • 2

    BarsalouL. (2003). Situated simulation in the human conceptual system.Language Cogn. Proc.18513562. 10.1080/01690960344000026

  • 3

    BarsalouL. W. (1985). Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories.J. Exp. Psychol.11629654. 10.1037/0278-7393.11.1-4.629

  • 4

    BattigW. F.MontagueW. E. (1969). Category norms of verbal items in 56 categories: A replication and extension of the Connecticut category norms.J. Exp. Psychol.80, 146. 10.1037/h0027577

  • 5

    BjorklundD. F.ThompsonB. E.OrnsteinP. A. (1983). Developmental trends in children’s typicality judgments.Behav. Res. Methods15350356. 10.3758/BF03203657

  • 6

    BuenoS.MegherbiH. (2009). French categorization norms for 70 semantic categories and comparison with Van Overschelde et al.’s (2004) English norms.Behav. Res. Methods4110181028. 10.3758/BRM.41.4.1018

  • 7

    CaramazzaA.SheltonJ. R. (1998). Domain-specific knowledge systems in the brain: The animate-inanimate distinction.J. Cogn. Neurosci.10134. 10.1162/089892998563752

  • 8

    CatricalàE.Della RosaP. A.PlebaniV.ViglioccoG.CappaS. F. (2014). Abstract and concrete categories? Evidence from neurodegenerative diseases.Neuropsychologia64271281. 10.1016/j.neuropsychologia.2014.09.041

  • 9

    CohenB. H.BousfieldW. A.WhitmarshG. (1957). Cultural Norms for Verbal Items in 43 Categories.California: Connecticut Univ Storrs Storrs-Mansfield United States.

  • 10

    FedermeierK. D.KutasM. (1999). A rose by any other name: Long-term memory structure and sentence processing.J. Memory Language41469495. 10.1006/jmla.1999.2660

  • 11

    HamptonJ. A.GardinerM. M. (1983). Measures of internal category structure: A correlational analysis of normative data.Br. J. Psychol.74491516. 10.1111/j.2044-8295.1983.tb01882.x

  • 12

    JanczuraG.NelsonD. (1999). Concept Accessibility as the Determinant of Typicality Judgments.Am. J. Psychol.112119. 10.2307/1423622

  • 13

    KiranS.JohnsonL. (2008). Semantic complexity in treatment of naming deficits in aphasia: Evidence from well-defined categories.Am. J. Speech Lang. Pathol.17389400. 10.1044/1058-0360(2008/06-0085)

  • 14

    KiranS.SandbergC.SebastianR. (2011). Treatment of category generation and retrieval in aphasia: Effect of typicality of category items.J. Speech Lang. Hear. Res.5411011117. 10.1044/1092-4388(2010/10-0117)

  • 15

    KouniosJ.HolcombP. J. (1992). Structure and process in semantic memory: Evidence from event-related brain potentials and reaction times.J. Exp. Psychol.121459479. 10.1037/0096-3445.121.4.459

  • 16

    KutasM.IraguiV. (1998). The N400 in a semantic categorization task across 6 decades.Electroencephalogr. Clin. Neurophysiol. Evoked Poten. Sect.108456471. 10.1016/S0168-5597(98)00023-9

  • 17

    LakoffG. (1973). Hedges: A study in meaning criteria and the logic of fuzzy concepts.J. Philosoph. Logic2458508. 10.1007/BF00262952

  • 18

    LiD. C. (2000). Phonetic borrowing: Key to the vitality of written Cantonese in Hong Kong.Written Lang. Liter.3199233. 10.1075/wll.3.2.02li

  • 19

    LiD. C.LeeS. (2004). Bilingualism in East Asia.Handb. Biling.97742779. 10.1002/9780470756997.ch28

  • 20

    MaltB. C.SmithE. E. (1982). The role of familiarity in determining typicality.Memory Cogn.106975. 10.3758/BF03197627

  • 21

    MarfulA.DíezE.FernandezA. (2015). Normative data for the 56 categories of Battig and Montague (1969) in Spanish.Behav. Res.47902910. 10.3758/s13428-014-0513-8

  • 22

    MarxD. M.KoS. J. (2012). Prejudice, discrimination, and stereotypes (racial bias).Encycl. Hum. Behav.2012160166. 10.1016/b978-0-12-375000-6.00388-8

  • 23

    McCloskeyM. (1980). The stimulus familiarity problem in semantic memory research.J. Verb. Learn. Verbal Behav.19485502. 10.1016/S0022-5371(80)90330-8

  • 24

    McEvoyC. L.NelsonD. L. (1982). Category name and instance norms for 106 categories of various sizes.Am. J. Psychol.95581634. 10.2307/1422189

  • 25

    MervisC. B.CatlinJ.RoschE. (1976). Relationships among goodness-of-example, category norms, and word frequency.Bull. Psych. Soc.7283284. 10.3758/BF03337190

  • 26

    MervisC. B.RoschE. (1981). Categorization of natural objects.Annu. Rev. Psychol3289115. 10.1146/annurev.ps.32.020181.000513

  • 27

    MontefineseM.AmbrosiniE.FairfieldB.MammarellaN. (2012). Semantic memory: A feature-based analysis and new norms for Italian.Behav. Res.45440461. 10.3758/s13428-012-0263-4

  • 28

    Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.

  • 29

    Park, D. C., Lautenschlager, G., Hedden, T., Davidson, N. S., Smith, A. D., and Smith, P. K. (2002). Models of visuospatial and verbal memory across the adult life span. Psychol. Aging 17, 299–320.

  • 30

    Rips, L. J., Shoben, E. J., and Smith, E. E. (1973). Semantic distance and the verification of semantic relations. J. Verbal Learn. Verbal Behav. 12, 1–20. doi: 10.1016/S0022-5371(73)80056-8

  • 31

    Rosch, E. (1975). Cognitive representations of semantic categories. J. Exp. Psychol. 104, 192–233. doi: 10.1037/0096-3445.104.3.192

  • 32

    Rosch, E., and Mervis, C. B. (1975). Family resemblances: Studies on the internal structure of categories. Cogn. Psychol. 7, 573–605. doi: 10.1016/0010-0285(75)90024-9

  • 33

    Rosch, E., Simpson, C., and Miller, R. S. (1976). Structural bases of typicality effects. J. Exp. Psychol. 2, 491–502. doi: 10.1037/0096-1523.2.4.491

  • 34

    Rosch, E. H. (1973). On the internal structure of perceptual and semantic categories. In Cognitive development and acquisition of language. Netherlands: Elsevier, 111–144. doi: 10.1016/B978-0-12-505850-6.50010-4

  • 35

    Schwanenflugel, P. J., and Rey, M. (1986). The relationship between category typicality and concept familiarity: Evidence from Spanish- and English-speaking monolinguals. Memory Cogn. 14, 150–163. doi: 10.3758/BF03198375

  • 36

    Smith, E. E., Shoben, E. J., and Rips, L. J. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychol. Rev. 81, 214–241. doi: 10.1037/h0036351

  • 37

    Storms, G. (2001). Flemish category norms for exemplars of 39 categories: A replication of the Battig and Montague (1969) category norms. Psychol. Belg. 41, 145–168.

  • 38

    Stuss, D. T., Picton, T. W., and Cerri, A. M. (1988). Electrophysiological manifestations of typicality judgment. Brain Lang. 33, 260–272. doi: 10.1016/0093-934X(88)90068-5

  • 39

    Van Overschelde, J. P., Rawson, K. A., and Dunlosky, J. (2004). Category norms: An updated and expanded version of the Battig and Montague (1969) norms. J. Memory Lang. 50, 289–335. doi: 10.1016/j.jml.2003.10.003

  • 40

    Yoon, C., Feinberg, F., Hu, P., Gutchess, A. H., Hedden, T., Chen, H.-Y. M., et al. (2004). Category norms as a function of culture and age: Comparisons of item responses to 105 categories by American and Chinese adults. Psychol. Aging 19, 379–393. doi: 10.1037/0882-7974.19.3.379

Appendix

Appendix 1

| Category Name in HKC | Category Name in English |
| --- | --- |
| 魚類 | A Kind of Fish |
| 電視節目 | A TV Program |
| 中藥材 | A Herb in Chinese Medicine |
| 天然地形 | A Natural Geographical Feature |
| 刊物類別 | A Type of Publication |
| 本地地名 | A Name of Local Place |
| 參考書 | A Kind of Reference Book |
| 音樂類型 | A Kind of Music |
| 學系 | A Department in University |
| 藝術品 | A Piece of Artwork |
| 五金用品 | A Piece of Hardware Supplies |
| 康樂項目 | A Recreational Activity |

These 12 categories were excluded from the final list because invalid responses exceeded 50% of the total responses collected in each category.

Keywords

Hong Kong Cantonese, norm, semantic category, typicality, familiarity, lexicon

Citation

Li B, Lin Q, Mak HY, Tzeng OJL, Huang C-M and Huang H-W (2021) Category Exemplar Production Norms for Hong Kong Cantonese: Instance Probabilities and Word Familiarity. Front. Psychol. 12:657706. doi: 10.3389/fpsyg.2021.657706

Received

23 January 2021

Accepted

07 July 2021

Published

09 August 2021

Volume

12 - 2021

Edited by

Francesca Peressotti, University of Padua, Italy

Reviewed by

Thomas M. Gruenenfelder, Indiana University, United States; Steven Verheyen, Erasmus University Rotterdam, Netherlands

*Correspondence: Hsu-Wen Huang,

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
