Edited by: Lynne D. Roberts, Curtin University, Australia
Reviewed by: Emma Buchtel, Hong Kong Institute of Education, Hong Kong; Ilka H. Gleibs, London School of Economics and Political Science, UK
*Correspondence: Yoshimasa Majima
This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Recent research on human behavior has often collected empirical data from online labor markets, through a process known as crowdsourcing. In addition to those in the United States and the major European countries, there are several crowdsourcing services in Japan. For research purposes, Amazon's Mechanical Turk (MTurk) is the most widely used platform among these services. Previous validation studies have shown many commonalities between MTurk workers and participants from traditional samples, based not only on personality but also on performance on reasoning tasks. The present study aims to extend these findings to non-MTurk (i.e., Japanese) crowdsourcing samples in which workers have different ethnic backgrounds from those of MTurk workers. We conducted three surveys comparing workers from a Japanese crowdsourcing service with university students in terms of demographics, personality traits, reasoning performance, and attention to instructions.
Online survey research is becoming increasingly popular in psychology and other social sciences concerned with human behavior. Researchers often collect data from participants in online labor markets, through a process known as crowdsourcing.
Estelles-Arolas and Gonzalez-Ladron-De-Guevara (2012) reviewed existing definitions and proposed an integrated definition of crowdsourcing as the outsourcing of tasks to a group of individuals via a flexible open call.
MTurk specializes in recruiting users, who are referred to as workers, to complete small online tasks.
Mason and Suri (2012) provide a practical guide to conducting behavioral research on MTurk.
Along with its increasing usage in behavioral research, the validity of the data obtained from MTurk participants has been examined in several studies (for a recent review, see Paolacci and Chandler, 2014).
The two samples also differed in their performance on reasoning tasks and their attention to instructions. For example, Goodman et al. (2013) reported that MTurk workers paid less attention to the study materials than traditional samples did.
The MTurk and traditional samples also have several commonalities. For example, MTurk workers and students show similar performance on classical heuristics-and-biases judgment tasks, such as the framing and conjunction problems.
In sum, although MTurk participants and traditional participants differ in terms of a few features, they share many common properties. Therefore, crowdsourcing is considered to be a fruitful data collection tool for psychology and other social sciences (Goodman et al., 2013).
MTurk appears to provide a promising approach to behavioral studies owing to its advantages over traditional offline data collection. Despite these advantages, there are some limitations of MTurk as a participant pool for empirical studies. First, there are issues with sample diversity. Demographic surveys have repeatedly shown that the majority of MTurk workers are Caucasian residents of the United States, followed by Asian workers who live in India (Paolacci et al., 2010).
The second issue is of a technical nature. At this time, a US bank account is required to become a requester (i.e., to post tasks) on MTurk.
Several studies have also pointed out potential pitfalls of online studies with MTurk. First, Zhou and Fishbach (2016) demonstrated that condition-dependent attrition in online studies often goes unnoticed and can lead to false conclusions.
As noted previously, the quality of data collected from MTurk participants has been verified. It has also been shown that other crowdsourcing pools, such as Clickworker and Prolific Academic, are practical alternatives to MTurk (e.g., Lutz, 2016).
The primary goal of the present study is to extend existing findings of previous validation studies of MTurk to other non-MTurk crowdsourcing samples. Specifically, we investigated the following questions.
Question 1: Do the demographic properties of workers from the other (i.e., non-MTurk) crowdsourcing samples differ from those of students? If so, how are they different?
Question 2: Do psychometric properties, such as personality traits and measures of consumer behavior, differ between non-MTurk workers and students?
Question 3: How does the quality of non-MTurk workers' performance on reasoning and judgment tasks that require effortful System-2 thinking compare with that of students?
Question 4: How do non-MTurk workers respond to “trap” questions? Are they more (or less) attentive to the instructions for these tasks?
The present study compared crowdsourcing participants with university students in terms of their personality, psychometric properties regarding decision making, and consumer behavior (Survey 1), as well as thinking disposition, reasoning performance, and attention to the study materials (Surveys 2 and 3). In all of the surveys, the crowdsourcing participants were recruited from CrowdWorks (a Japanese crowdsourcing service, abbreviated as CW hereafter; https://crowdworks.jp).
All of the participants answered web-based questionnaires administered by SurveyMonkey (Surveys 1 and 2) or Qualtrics (Survey 3). For the CW sample, we posted a link to the survey site in the CW task. When the participants reached the site, they were presented with general instructions and asked to provide their consent to participate in the survey by clicking an "agree" button. If they agreed to take the survey, the online questionnaires were presented in a predetermined sequence. After they completed the questionnaires, they received a randomized completion code, and they were asked to enter it into the CW task page to receive payment. Because CW allocates a unique ID to each person, it is possible to prevent the same worker from taking a single task more than once. In addition, we enabled the SurveyMonkey and Qualtrics restriction features that prohibit multiple participation. After the correct completion code had been entered, the experimenter approved the compensation to be sent to the participant's account. The CW participants remained completely anonymous throughout the entire survey process.
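As an illustration, the completion-code handshake described above reduces to issuing a random single-use token and verifying it before approving payment. The following is a minimal sketch with hypothetical in-memory storage, not the actual SurveyMonkey/Qualtrics/CrowdWorks implementation:

```python
# Minimal sketch of a completion-code workflow (hypothetical storage).
import secrets

issued_codes = set()

def issue_completion_code() -> str:
    """Generate a random code shown to a participant at the end of the survey."""
    code = secrets.token_hex(4)  # e.g., 'a3f9c2d1'
    issued_codes.add(code)
    return code

def approve_payment(submitted_code: str) -> bool:
    """Approve compensation only if the submitted code was actually issued."""
    if submitted_code in issued_codes:
        issued_codes.remove(submitted_code)  # each code is single-use
        return True
    return False

if __name__ == "__main__":
    code = issue_completion_code()
    print(approve_payment(code))  # True
    print(approve_payment(code))  # False: the code has already been redeemed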
The university students were recruited from introductory psychology, statistics, English, or social welfare classes, and they were provided with a leaflet that contained a link to the equivalent web-based survey site. When they reached the site, they received the same general instructions and the same request for consent as the CW participants. After they completed the survey, they were provided with a randomly generated completion code that was required for them to receive credit.
The present study was approved and conducted in compliance with the guidelines of the Hokusei Gakuen University Ethics Committee. All of the participants gave their web-based informed consent instead of written consent.
Survey 1 compared the CW and university samples in terms of their demographic status, personality traits, and psychometric properties, which included the so-called Big Five traits, as well as self-esteem, goal orientation, and materialism as an aspect of consumer behavior. These scales were adopted from previous validation studies of MTurk (e.g., Behrend et al., 2011).
A total of 319 crowdsourcing workers agreed to participate in the survey; however, we excluded 7 participants because they did not complete the questionnaire. We also excluded 17 responses because of IP address duplication, which left 295 in the final sample. The participants received 50 JPY for completing a 10 min survey.
In addition, we recruited 144 students but excluded 12 participants from the analyses for the following reasons: incomplete responses (11 participants) and IP address duplication (1 participant). We also excluded one participant who failed to indicate in the demographic question that he or she was currently a university student. A final sample of 131 undergraduate students participated in the survey.
The sample size of the present survey was decided in reference to previous validation studies of MTurk, such as Behrend et al. (2011), and for other practical reasons.
As measures of personality traits, we administered two widely used personality inventories: a brief measure of the Big-Five personality dimensions (Ten-Item Personality Inventory, TIPI; Gosling et al., 2003; Japanese version, TIPI-J) and the Rosenberg Self-Esteem scale (RSE; Rosenberg, 1965).
All of the participants completed identical measures in an identical order. In the first step, they answered each of the TIPI-J items on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree).
All of the statistical analyses in the present study were performed using SPSS 21.0. When we report η2 as an index of effect size for an ANOVA, the value designates partial η2.
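For reference, partial η2 is defined in terms of the ANOVA sums of squares:

\[
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
\]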
Table 1 shows the demographic characteristics of the two samples in Surveys 1 and 2.
Table 1 | Demographic characteristics of the samples in Surveys 1 and 2.

(a) Age, gender, and work experience

| Survey | Sample | Age M | Age SD | Female % | Work experience (years) M | SD | % | % | n |
|---|---|---|---|---|---|---|---|---|---|
| 1 | UNIV | 19.56 | 1.12 | 43.5 | 2.47 | 4.12 | 98 | 100 | 131 |
| 1 | CW | 36.89 | 8.81 | 63.7 | 12.38 | 8.68 | 98 | 99 | 295 |
| 2 | UNIV | 19.70 | 1.34 | 46.2 | 3.04 | 4.37 | 98 | 100 | 156 |
| 2 | CW | 36.59 | 9.19 | 62.3 | 12.43 | 9.16 | 98 | 100 | 297 |

(b) Frequencies by demographic category

| Survey | Sample | Counts by category | n |
|---|---|---|---|
| 1 | UNIV | 0, 117, 3, 10, 0, 1 | 131 |
| 1 | CW | 1, 90, 76, 118, 8, 2 | 295 |
| 2 | UNIV | 0, 141, 1, 9, 0, 5 | 156 |
| 2 | CW | 9, 94, 40, 114, 9, 31 | 297 |

(c) Frequencies by additional demographic category

| Survey | Sample | Counts by category | n |
|---|---|---|---|
| 1 | UNIV | 0, 23, 1, 1, 0, 98, 8 | 131 |
| 1 | CW | 63, 39, 58, 7, 3, 119, 6 | 295 |
| 2 | UNIV | 0, 26, 1, 2, 0, 111, 16 | 156 |
| 2 | CW | 69, 48, 50, 5, 6, 109, 10 | 297 |
Table 2 shows Cronbach's α and the means, standard deviations, and 95% confidence intervals of each scale by sample and gender. A series of sample × gender ANOVAs was conducted on each scale.
Table 2 | Cronbach's α, means, standard deviations, and 95% confidence intervals of each scale by sample and gender (Survey 1).

| Scale | UNIV α | UNIV men M (SD) | 95% CI | UNIV women M (SD) | 95% CI | CW α | CW men M (SD) | 95% CI | CW women M (SD) | 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|
| EX | 0.706 | 7.39 (3.17) | [6.72, 8.06] | 7.28 (3.30) | [6.52, 8.04] | 0.746 | 5.99 (2.53) | [5.43, 6.55] | 6.85 (2.91) | [6.43, 7.27] |
| A | 0.411 | 9.82 (2.51) | [9.29, 10.35] | 9.74 (2.25) | [9.13, 10.34] | 0.447 | 9.09 (2.48) | [8.65, 9.53] | 9.34 (2.16) | [9.00, 9.67] |
| C | 0.479 | 6.58 (3.03) | [5.98, 7.18] | 6.26 (2.40) | [5.58, 6.95] | 0.569 | 6.88 (2.31) | [6.38, 7.38] | 7.36 (2.72) | [6.98, 7.73] |
| ES | 0.309 | 7.28 (2.61) | [6.71, 7.86] | 6.04 (2.28) | [5.38, 6.69] | 0.601 | 7.21 (2.29) | [6.73, 7.69] | 6.43 (2.69) | [6.06, 6.79] |
| O | 0.245 | 8.50 (2.53) | [7.92, 9.08] | 7.07 (2.65) | [6.41, 7.73] | 0.533 | 7.81 (2.54) | [7.33, 8.29] | 7.62 (2.46) | [7.26, 7.98] |
| RSE | 0.814 | 29.39 (7.88) | [27.63, 31.15] | 26.35 (6.14) | [24.34, 28.36] | 0.889 | 27.59 (7.58) | [26.12, 29.05] | 28.24 (8.13) | [27.14, 29.35] |
| PAGO | 0.671 | 16.31 (4.11) | [15.56, 17.07] | 16.16 (2.90) | [15.30, 17.02] | 0.719 | 15.19 (3.26) | [14.56, 15.82] | 15.23 (3.08) | [14.75, 15.70] |
| PPGO | 0.737 | 16.23 (4.28) | [15.46, 17.00] | 15.42 (3.45) | [14.54, 16.30] | 0.707 | 14.94 (2.97) | [14.30, 15.59] | 15.02 (3.16) | [14.54, 15.51] |
| MVS | 0.744 | 27.97 (5.99) | [26.63, 29.31] | 26.37 (6.09) | [24.84, 27.89] | 0.782 | 26.16 (5.46) | [25.05, 27.27] | 25.70 (5.96) | [24.86, 26.54] |

EX, extraversion; A, agreeableness; C, conscientiousness; ES, emotional stability; O, openness (TIPI-J); RSE, Rosenberg Self-Esteem scale; PAGO, performance-avoid goal orientation; PPGO, performance-prove goal orientation; MVS, Material Values Scale.
A similar ANOVA on the RSE scale failed to show significant sample or gender differences.
Similar ANOVAs on the goal orientation scales indicated that the CW participants scored lower on the performance-avoid and performance-prove dimensions than the students did (see Table 2).
Finally, a sample × gender ANOVA on the MVS showed that the students were more materialistic than the crowdsourcing sample.
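The analyses above were run in SPSS; as a hedged illustration, an equivalent sample × gender ANOVA with partial η2 could be reproduced in Python as follows. The toy data frame and column names are hypothetical, not the study's data:

```python
# Sketch: 2 (sample) x 2 (gender) ANOVA with Type III sums of squares
# (as in SPSS) and partial eta squared, using statsmodels.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "mvs":    [28, 30, 26, 27, 25, 27, 24, 26, 29, 27, 25, 24],  # toy MVS scores
    "sample": ["UNIV"] * 6 + ["CW"] * 6,
    "gender": ["M", "F"] * 6,
})

# Sum-to-zero contrasts so that Type III tests match SPSS output.
model = ols("mvs ~ C(sample, Sum) * C(gender, Sum)", data=df).fit()
anova = sm.stats.anova_lm(model, typ=3)

# Partial eta squared: SS_effect / (SS_effect + SS_error).
ss_error = anova.loc["Residual", "sum_sq"]
anova["partial_eta_sq"] = anova["sum_sq"] / (anova["sum_sq"] + ss_error)
print(anova)
```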
In Survey 1, we found significant, but not surprising, differences between the students and the CW workers in terms of their demographic status. The findings also showed that some personality characteristics differed between the two samples. For example, the CW participants were less extraverted and agreeable, although they were more conscientious, than the students. In addition, the CW participants were less materialistic, and their pursuit of performance-avoid and performance-prove goals was weaker than that of the students.
Some of these results, such as those regarding demographics, extraversion, openness, and performance-avoid goal orientation, were compatible with previous validation studies using MTurk (Paolacci et al., 2010).
To summarize, our results indicated both similarities and differences between the CW workers and the students, which is generally consistent with existing findings. It is also important to note that the effect sizes of the sample differences were relatively small, as has been shown in previous studies.
Survey 2 aimed to compare the crowdsourcing workers and students in terms of their thinking disposition, as well as their reasoning and judgment biases related to effortful System-2 thinking.
As a measure of thinking disposition, we administered the Cognitive Reflection Test (CRT; Frederick, 2005).
We also investigated sample differences in attention to instructions by using instructional manipulation checks (IMCs; Oppenheimer et al., 2009).
We collected data from 338 CW workers; however, data from 27 of the participants were excluded due to incomplete responses, and data from 14 participants were excluded because of IP address duplication, which left 297 in the final sample. The participants received 80 JPY for the 15 min survey.
We also recruited 166 undergraduate students from the same university as in Survey 1 for the student sample. However, 10 of the participants were excluded from the analysis for the following reasons: incomplete responses (5 participants), IP address duplication (1 participant), and failure to choose "student" as the current status in the demographic question (4 participants).
The sample size was decided based on the same rationale as in Survey 1. We also conducted a power analysis to determine a sufficient sample size, using an alpha of 0.05, a power of 0.8, and an effect size estimated from previous studies.
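An a priori power analysis of this kind can be reproduced with standard tools. The sketch below uses statsmodels; the effect size value (f = 0.25, a conventional "medium" effect) is an illustrative assumption, not necessarily the value used in the study:

```python
# Sketch: a priori power analysis for a two-group ANOVA design.
from statsmodels.stats.power import FTestAnovaPower

# alpha = 0.05 and power = 0.8 follow the text; f = 0.25 is an assumption.
total_n = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.8, k_groups=2)
print(f"Required total sample size: {total_n:.0f}")  # ~128 (64 per group)
```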
In this survey, the participants were presented with five tasks that measured their thinking disposition, reasoning and judgment biases, and attention to instructions: CRT, probabilistic reasoning, syllogism, anchoring-and-adjustment, and IMC. A 2 (sample; CW and UNIV) × 2 (IMC order; first and last) factorial design was adopted.
After the participants read the general instructions and provided their consent, those who were assigned to the IMC-first order completed the IMC before the other tasks, whereas those assigned to the IMC-last order completed it at the end of the survey.
Next, we administered a three-item version of the CRT (Frederick, 2005; e.g., the well-known "bat-and-ball" problem), followed by the probabilistic reasoning task.
Then, the participants were presented with the logical reasoning task. They were presented with eight syllogisms one at a time, and they answered by clicking either "True" or "False" for each conclusion. Following the syllogisms, an anchoring-and-adjustment task adopted from Goodman et al. (2013) was administered, in which the participants estimated the number of countries in Africa after considering an anchor value.
In the following analyses, the participants who answered "yes" to the probe question for a task (i.e., those who reported previous experience with that task) were excluded from the analyses of that task.
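In analysis terms, these exclusion rules amount to simple filters over the raw responses. A minimal pandas sketch, with hypothetical file and column names, might look like this:

```python
# Sketch of the exclusion pipeline: incomplete responses, duplicated IP
# addresses, and self-reported previous experience with a task.
import pandas as pd

raw = pd.read_csv("survey2_responses.csv")                 # hypothetical export
complete = raw.dropna()                                    # drop incomplete responses
deduped = complete.drop_duplicates(subset="ip_address", keep="first")
crt_naive = deduped[deduped["crt_probe"] != "yes"]         # exclude experienced (CRT)

print(f"{len(raw) - len(crt_naive)} responses excluded for the CRT analyses")
```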
Table 3 shows the IMC pass rates and task performance by sample, IMC order, and IMC performance.
Table 4 shows the results of a logistic regression analysis predicting the likelihood of passing the IMC by sample and IMC order.
Table 3 | IMC pass rates and task performance in Survey 2 by sample, IMC order, and IMC performance.

% passed IMC

| IMC order | UNIV n | UNIV % passed | CW n | CW % passed |
|---|---|---|---|---|
| IMC first | 77 | 37.7% | 154 | 45.5% |
| IMC last | 76 | 32.9% | 137 | 62.8% |

CRT score

| IMC order | IMC | UNIV n | M | SD | 95% CI | CW n | M | SD | 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| IMC first | Pass | 22 | 1.91 | 1.02 | [1.47, 2.35] | 62 | 1.53 | 1.00 | [1.27, 1.80] |
| IMC first | Failure | 46 | 1.17 | 1.16 | [0.87, 1.48] | 77 | 1.09 | 1.08 | [0.85, 1.33] |
| IMC last | Pass | 19 | 0.84 | 1.01 | [0.36, 1.32] | 79 | 1.48 | 1.00 | [1.25, 1.72] |
| IMC last | Failure | 40 | 1.20 | 1.07 | [0.87, 1.53] | 48 | 0.96 | 1.11 | [0.66, 1.26] |

Denominator neglect (DN; % rational choice)

| IMC | UNIV n | % rational | CW n | % rational |
|---|---|---|---|---|
| Pass | 53 | 73.6% | 154 | 73.4% |
| Failure | 93 | 64.5% | 125 | 59.2% |

Syllogisms (number correct)

| IMC | UNIV n | M | SD | 95% CI | CW n | M | SD | 95% CI |
|---|---|---|---|---|---|---|---|---|
| Pass | 47 | 4.49 | 2.58 | [3.73, 5.13] | 151 | 3.51 | 2.55 | [3.14, 3.91] |
| Failure | 82 | 3.21 | 2.08 | [2.68, 3.73] | 118 | 3.31 | 2.37 | [2.81, 3.73] |

Mean estimation (anchoring task)

| UNIV n | M | SD | CW n | M | SD |
|---|---|---|---|---|---|
| 147 | 36.51 | 17.60 | 276 | 38.71 | 20.26 |
Table 4 | Logistic regression predicting the likelihood of passing the IMC.

| Predictor | B | SE | Wald | p | OR | 95% CI |
|---|---|---|---|---|---|---|
| Overall model: χ2(3) = 22.83, p < 0.001, Nagelkerke R2 = 0.067 | | | | | | |
| Constant | −0.71 | 0.24 | 8.53 | 0.003 | 0.49 | |
| Sample (UNIV = 0, CW = 1) | 1.24 | 0.30 | 16.80 | <0.001 | 3.44 | [1.91, 6.21] |
| IMC Order (Last = 0, First = 1) | 0.21 | 0.34 | 0.38 | 0.537 | 1.23 | [0.63, 2.40] |
| Sample × Order | −0.91 | 0.42 | 4.85 | 0.028 | 0.40 | [0.18, 0.90] |
| UNIV only: χ2(1) = 0.38, p = 0.537, Nagelkerke R2 = 0.003 | | | | | | |
| Constant | −0.71 | 0.24 | 8.53 | 0.003 | 0.49 | |
| IMC Order | 0.21 | 0.34 | 0.38 | 0.537 | 1.23 | [0.63, 2.40] |
| CW only: χ2(1) = 8.80, p = 0.003, Nagelkerke R2 = 0.040 | | | | | | |
| Constant | 0.52 | 0.18 | 8.74 | 0.003 | 1.69 | |
| IMC Order | −0.70 | 0.24 | 8.65 | 0.003 | 0.49 | [0.31, 0.79] |
We excluded 29 students and 31 workers from the following analysis owing to their previous experience with the task. A 2 (sample) × 2 (IMC order) × 2 (IMC performance: pass vs. failure) ANOVA on the CRT scores revealed significant main effects of IMC performance and task order.
The probe analysis excluded 10 students and 18 workers. We conducted logistic regression analyses predicting the likelihood of a high-probability choice by IMC order and performance (see Table 5).
Table 5 | Logistic regression predicting the likelihood of a high-probability choice in the probabilistic reasoning task.

| Predictor | Model 1: B | SE | Wald | p | OR [95% CI] | Model 2: B | SE | Wald | p | OR [95% CI] |
|---|---|---|---|---|---|---|---|---|---|---|
| Constant | 0.66 | 0.22 | 9.23 | 0.002 | 1.94 | 0.47 | 0.14 | 11.26 | <0.001 | 1.60 |
| IMC Order (Last = 0, First = 1) | −0.34 | 0.28 | 1.40 | 0.236 | 0.71 [0.41, 1.25] | | | | | |
| IMC Performance (Failure = 0, Pass = 1) | 0.37 | 0.31 | 1.42 | 0.233 | 1.44 [0.79, 2.63] | 0.55 | 0.21 | 6.84 | 0.009 | 1.73 [1.15, 2.61] |
| Order × Performance | 0.31 | 0.42 | 0.55 | 0.460 | 1.37 [0.60, 3.14] | | | | | |
| χ2 | 8.36 | | | | | 6.94 | | | | |
| df | 3 | | | | | 1 | | | | |
| p | 0.039 | | | | | 0.008 | | | | |
| Nagelkerke pseudo-R2 | 0.027 | | | | | 0.023 | | | | |
| AIC | 27.84 | | | | | 15.32 | | | | |

Model 1 includes IMC order, IMC performance, and their interaction; Model 2 includes IMC performance only.
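For illustration, a logistic regression of this form can be fit as sketched below. The toy data, variable names, and coding are assumptions rather than the study's data:

```python
# Sketch: logistic regression predicting a high-probability (rational)
# choice from IMC order, IMC performance, and their interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "rational":  [1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0],  # toy outcomes
    "imc_order": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # 1 = IMC first
    "imc_pass":  [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # 1 = passed IMC
})

model = smf.logit("rational ~ imc_order * imc_pass", data=df).fit(disp=False)
print(model.summary())
print(np.exp(model.params))  # coefficients expressed as odds ratios
```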
Next, we conducted a three-way ANOVA on the number of correctly solved syllogisms. In this analysis, 27 students and 28 workers were excluded because of their previous experience with the task. The results showed a significant main effect of IMC performance.
We excluded 9 students and 18 workers from the following analyses owing to their previous experience. We also excluded three workers who estimated extremely large numbers (greater than the mean + 3 SD).
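The outlier rule can be expressed in a few lines; the sketch below uses simulated estimates rather than the study's data:

```python
# Sketch of the mean + 3 SD exclusion rule for extreme estimates.
import numpy as np

rng = np.random.default_rng(0)
estimates = np.append(rng.normal(40, 15, size=200), [5000.0])  # one extreme value

cutoff = estimates.mean() + 3 * estimates.std()
retained = estimates[estimates <= cutoff]
print(f"Excluded {estimates.size - retained.size} estimate(s) above {cutoff:.1f}")
```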
The present study allowed the participants to answer the survey using their PC or mobile device at their convenience; therefore, they might have searched for the accurate answers on the Internet. Seven students (4.8%) and 26 workers (9.4%) "estimated" the correct number of African countries that could be found in a Wikipedia query (56 countries) or in a document by the Ministry of Foreign Affairs of Japan (54 countries). The percentage of participants who "estimated" correctly was slightly higher in CW, but the difference was relatively small.
The number of participants excluded owing to previous experience differed across tasks. We compared the proportions of participants who answered "yes" to the probe question between the student and CW samples. A series of chi-square tests revealed that the students were more likely than the workers to have previous experience with the CRT (% excluded: UNIV = 18.6 vs. CW = 10.4) and the syllogism task (UNIV = 17.3% vs. CW = 9.4%), χ2s(1) = 5.92 and 5.95, ps < 0.05.
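Such proportion comparisons reduce to a standard chi-square test on a 2 × 2 contingency table. The sketch below reconstructs the CRT comparison from the counts reported in this section (29 of 156 students vs. 31 of 297 workers):

```python
# Chi-square test comparing previous-experience rates for the CRT;
# counts are taken from the exclusions reported in this section.
from scipy.stats import chi2_contingency

table = [[29, 156 - 29],   # UNIV: experienced vs. naive
         [31, 297 - 31]]   # CW:   experienced vs. naive
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")  # ~5.92, consistent with the text
```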
Survey 2 showed that the workers and students did not differ in their overall performance on the CRT or the probabilistic and logical reasoning tasks. In addition, the CW participants were less prone to anchoring-and-adjustment bias than the non-crowdsourced sample (similar results were reported by Goodman et al., 2013).
The number of participants with prior experience of a task differed between the two samples only for the CRT and the syllogism task. Furthermore, the percentages of previously exposed participants were somewhat lower than those reported for MTurk workers. For example, Chandler et al. (2014) reported that many MTurk workers had already been exposed to common experimental tasks such as the CRT.
Survey 2 indicated that many of the participants were inattentive to the task instructions. We suspected that this may have partly been caused by the fact that many of the participants, particularly the students, reached the survey site using small-screen devices, such as smartphones. However, because we did not collect information regarding device or browser type in Survey 2, whether small-screen devices lead to less attentive responses than larger-screen devices remains unclear. In Survey 3, we examined whether the use of small-screen devices facilitated failure on attention checks and poorer performance on the other reasoning tasks. As in Survey 2, we also investigated whether the order of the IMC question affected performance on the subsequent reasoning tasks.
Similar to Surveys 1 and 2, we recruited participants from CrowdWorks; however, in this survey, we decided to hide the task from workers whose acceptance rate was less than 95%. We recruited 205 participants from CrowdWorks; however, 38 of the participants were excluded from the analysis for the following reasons: providing incomplete responses (3 participants), searching for correct answers or responding randomly (see Materials and Procedure Section; 34 participants), and participating in both the mobile and PC surveys (1 participant). One participant in the mobile condition was also excluded because the device-type information indicated that he or she had participated in the survey using a PC. Consequently, 167 participants remained in the final sample (Mobile group, n = 81; PC group, n = 86).
The tasks and the procedure were almost identical to those of Survey 2, except that the anchoring-and-adjustment task was omitted from this survey. We posted two different CW tasks that were designed for the two experimental conditions (device type: Mobile and PC). The participants were asked to choose the task appropriate for their device and not to participate in both. Both of the tasks consisted of the same general instructions and a link to the online survey, which was administered by Qualtrics. The two tasks differed in terms of the following device-specific instructions. The instructions for the mobile condition asked the participants to use their mobile devices (smartphone or tablet) and not to use a PC, whereas the participants in the PC condition were asked to take the survey using their PC. In addition, we collected device-type information (e.g., OS, browser and its version, and screen resolution; these data were collected by the Meta Info question of Qualtrics) to prevent those who accessed the survey with inappropriate devices from participating. At the end of the survey, the participants were presented with two probe questions that asked whether they had searched for any correct answers or responded randomly during the survey. Those who answered yes to at least one probe question were excluded from the following analyses.
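A crude device check of this kind can also be approximated from the browser's user-agent string. The actual screening relied on Qualtrics' Meta Info question, so the heuristic below is only illustrative:

```python
# Sketch: classifying mobile vs. PC access from a user-agent string.
def is_mobile(user_agent: str) -> bool:
    """Crude user-agent heuristic for distinguishing mobile from PC access."""
    ua = user_agent.lower()
    return any(token in ua for token in ("iphone", "android", "ipad", "mobile"))

print(is_mobile("Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X)"))  # True
print(is_mobile("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))               # False
```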
Table 6 shows the descriptive statistics for Survey 3 by device type, IMC order, and IMC performance.
Table 6 | Descriptive statistics for Survey 3 by device type, IMC order, and IMC performance.

| | Mobile | PC |
|---|---|---|
| Age, M (SD) | 33.4 (8.99) | 37.1 (9.07) |
| Female % | 66.7% | 44.7% |

% passed IMC

| IMC order | Mobile n | Mobile % passed | PC n | PC % passed |
|---|---|---|---|---|
| IMC first | 44 | 72.7% | 40 | 92.5% |
| IMC last | 37 | 64.9% | 45 | 93.3% |

Task performance

| Task | IMC | Mobile n | M | SD | 95% CI | PC n | M | SD | 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| CRT | Passed IMC | 52 | 1.54 | 1.11 | [1.24, 1.83] | 64 | 1.63 | 1.02 | [1.35, 1.88] |
| CRT | Failed IMC | 21 | 1.05 | 1.07 | [0.59, 1.51] | 6 | 1.17 | 1.17 | [0.31, 2.03] |
| DN (% rational) | Passed IMC | 54 | 64.8% | | | 70 | 78.6% | | |
| DN (% rational) | Failed IMC | 24 | 62.5% | | | 6 | 66.7% | | |
| Syllogism | Passed IMC | 56 | 4.34 | 1.94 | [3.73, 4.94] | 79 | 4.32 | 2.40 | [3.79, 4.80] |
| Syllogism | Failed IMC | 25 | 4.12 | 2.55 | [3.23, 5.03] | 6 | 4.33 | 1.63 | [2.50, 6.16] |
The likelihood of passing the IMC is shown in Table 6.
A three-way (device type × IMC order × IMC performance) ANOVA on the CRT scores was conducted; 24 participants (8 in the mobile and 16 in the PC group) were excluded from the analysis because they declared that they had experience with the CRT before the survey. The results showed a marginally significant effect of IMC performance.
Then, we conducted a logistic regression analysis to predict the likelihood of a high-probability choice in the probabilistic reasoning task by device type, IMC order, and IMC performance. Three participants in the mobile condition and nine participants in the PC condition were excluded from this analysis due to previous experience with the task. However, the model failed to show a good fit.
Finally, the number of correct responses to the eight syllogisms was submitted to a similar three-way ANOVA; neither the main effects of device type, IMC order, and IMC performance nor their interactions were significant.
To summarize, Survey 3 indicated that the participants were less attentive to instructions when they used mobile (i.e., small-screen) devices. They were also more prone to denominator neglect. However, the type of device did not affect the other reasoning tasks associated with analytic System-2 thinking. Furthermore, the participants were likely to answer reflectively if they had read the instructions carefully. These results indicate that small-screen devices hinder the careful reading of instructions; however, this does not necessarily spoil performance on reasoning tasks.
In the present study, we compared participants from a Japanese crowdsourcing service with a Japanese student sample in terms of their demographics, personality traits, reasoning skills, and attention to instructions. In general, the results were compatible with the existing findings of MTurk validation studies. The present results showed many similarities between the CW workers and the students; however, we also found interesting differences between the two samples.
First, but not surprisingly, the CW workers were older and hence had longer work experience than the students. Second, the CW workers and students differed in some of the personality traits, such as extraversion, conscientiousness, and performance-avoid goal orientation; however, these differences were relatively small and compatible with previous MTurk validation studies (Paolacci et al., 2010).
We also identified a few important dimensions that differed from previous validation studies. First, the present participants, particularly the students, showed poorer IMC performance than participants in previous studies. The failure rate of the present CW participants (46%) was equal to the failure rate in Oppenheimer et al. (2009).
It is important to note that the present study showed both commonalities and differences between CW workers and the Japanese student sample, which was compatible with the existing literature comprising MTurk validation studies. Despite a few inconsistencies, the present study suggested that online data collection using non-MTurk crowdsourcing services remains a promising approach for behavioral research.
However, at the same time, we recommend that researchers consider the following issues if they collect empirical data from non-MTurk crowdsourcing services. First, the language used in CrowdWorks is limited to Japanese. Therefore, a solid level of Japanese language skill is required for both the researchers and the participants to conduct or take part in online surveys on this platform. This may be an obstacle for researchers who are not literate in Japanese, and the same may be true of other crowdsourcing services in which the majority of potential workers are not literate in English. Viewed differently, however, this is also a good opportunity to encourage researchers with different cultural backgrounds to conduct cooperative studies.
Second, MTurk provides a useful command-line interface and API that are designed to control HITs, including the ability to obtain a worker's ID. Conversely, CrowdWorks provides only a web-based graphical interface to requesters. This is not necessarily a disadvantage, since researchers can download data that include workers' IDs and the survey completion codes entered by individual workers from the CrowdWorks website. Therefore, if researchers allocate a unique completion code to each participant, they can examine whether a given worker has participated in their own surveys before, which matters when sample naivety is essential. However, it is still impossible to identify whether a given participant has already taken similar surveys or experiments administered by other researchers. If a survey includes widely used tasks, such as the CRT, it is helpful to ask participants whether they have already answered previous versions of such tasks, as suggested by several previous studies (e.g., Chandler et al., 2014).
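For example, screening against one's own earlier samples reduces to a membership test on worker IDs exported from the CrowdWorks website. In the sketch below, the file and column names are hypothetical:

```python
# Sketch: excluding workers who already took part in an earlier survey,
# using worker IDs exported from the CrowdWorks website.
import pandas as pd

previous = pd.read_csv("survey1_workers.csv")["worker_id"]
current = pd.read_csv("survey2_submissions.csv")

naive = current[~current["worker_id"].isin(set(previous))]
naive.to_csv("survey2_naive_sample.csv", index=False)
print(f"{len(current) - len(naive)} repeat worker(s) screened out")
```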
Third, MTurk workers sometimes receive very little compensation for completing HITs (e.g., $0.10 for a 5-min survey) compared to that received in traditional laboratory research (for recent ethical questions concerning online studies, see Gleibs, 2017).
YM designed the study, led the data collection and analysis, and was the main author. KN co-led the design, assisted in the analysis and interpretation, and was a contributing author. AN assisted in the data collection and contributed to drafting the manuscript. RH co-led the data collection and assisted in the design of the study.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was financially supported by the Special Group Research Grant (Year 2015) from Hokusei Gakuen University.
1Non-US researchers can post HITs on MTurk if they use an outside service.
2In Toplak et al. (2014), an expanded seven-item version of the CRT was introduced.