Gender-based Differential Item Functioning in the Application of the Theory of Planned Behavior for the Study of Entrepreneurial Intentions

Over the past years the percentage of female entrepreneurs has increased, yet it is still far below that for males. Although various attempts have been made to explain differences in men's and women's entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not yet been considered. The present study utilized Differential Item Functioning (DIF) to compare men's and women's reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. DIF analyses of a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which assesses the overall impact of DIF effects with all items taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender.


INTRODUCTION
Entrepreneurial activity is an important vehicle for value creation and has a significant impact on economic growth, continuous business renewal, and employment (Van Praag and Versloot, 2007). However, although half of the working population are women, and women make up a substantial proportion of those choosing to be entrepreneurs (Minniti et al., 2005), female entrepreneurship significantly lags behind male entrepreneurship (Minniti et al., 2005; Kelley et al., 2013).
According to findings from the Global Entrepreneurship Monitor (GEM) project, in some countries males' rates of entrepreneurial activity are more than three times those of females, while in others the male-female ratio of participation is nearly identical (Minniti et al., 2005; Sarri and Trihopoulou, 2005). In nearly all of the 67 economies included in the GEM the rate of men's venture creation is higher than that of women (Kelley et al., 2013). This is especially true of Greece, which is characterized by higher gender inequality (Sarri and Trihopoulou, 2005).
In the same vein, recent findings from the Global University Entrepreneurial Spirit Students' Survey project (GUESSS; Sieger et al., 2014; Tognazzo et al., 2016), conducted in 34 countries and in more than 700 universities, suggest that 10.7% of all male students strive for an entrepreneurial career path, compared to only 6.6% of all female students. The differences are even larger 5 years after completion of studies: on average 35.1% of all male students aspire to be entrepreneurs, but only 27.5% of all female students. The aforementioned studies raise questions as to why the rate of men's venture creation exceeds that of women and what factors explain these differences (Sarri and Trihopoulou, 2005; Piacentini, 2013).
Research has suggested the existence of a gender gap in entrepreneurial orientation and in the motivation and intention to become an entrepreneur (Mueller and Dato-on, 2013; Schlaegel and Koenig, 2014). The image of the entrepreneur has traditionally been masculinized and rooted in masculine discourse (Ahl, 2006). Moreover, it has been found that for women who work in gender-incongruent occupations dominated by men, the experience of discrimination has a negative association with their well-being (Maddox, 2013; Di Marco et al., 2016). Being a member of two traditionally unrelated groups (i.e., being a woman and an entrepreneur) is not an easy task for women (Zampetakis et al., 2016).
Research has drawn on several theoretical perspectives when considering business startup motivation, including innovation theory (Stewart et al., 1999) and social and human capital theory (Langowitz and Minniti, 2007). In recent years Ajzen's (1991) theory of planned behavior (TPB) has often been used as a framework for predicting entrepreneurial motivation (Maes et al., 2014; Schlaegel and Koenig, 2014). According to the TPB, three key factors influence an individual's intention (INT) to start a business: (i) attitudes toward entrepreneurship (ATT), that is, a person's overall assessment of the advantages and disadvantages of entrepreneurship; (ii) subjective norms (SN), that is, a person's perception of the social pressure from significant others to perform the behavior (i.e., start a business); and (iii) perceived behavioral control (PBC), that is, the perceived ease or difficulty of starting a business. The TPB suggests that INT results from positive ATT, positive SN and feelings of control over the creation process.
On average men compared to women have higher INT (Haus et al., 2013). The gender-related differences found in entrepreneurial motivation may be attributable to real and valid differences in constructs used, such as ATT and PBC. According to Maes et al. (2014) women are driven toward entrepreneurship by motives that facilitate a balance in business and personal life, that are less dominant in predicting personal attitude. Moreover, women seem to display lower internal feelings of control than men that are more dominant in predicting PBC.
However, the gender-related differences found in entrepreneurial motivation could also depend on the properties of the instruments being used in research, raising issues of construct validity (Bird and Brush, 2002; Jennings and Brush, 2013). A common feature of contemporary entrepreneurship research is the frequent adoption of self-report techniques and structured questionnaires for the assessment of entrepreneurship-related variables, such as the ones used in the TPB (Henry et al., 2016). Although scales may show differences in scores between groups, such differences may also be due to a characteristic of the test items other than the attribute the scale is intended to measure. Research on female entrepreneurship has often been criticized for using instruments developed for male entrepreneurs, making it impossible to capture anything differentially feminine while women are more likely to appear inadequate in comparison to men (Stevenson, 1986; Ahl, 2006). These instruments are superimposed on women, and not tested with appropriate methods for measurement equivalence (or Differential Item Functioning, DIF; Holland and Wainer, 1993), thus missing any potentially important differences between male and female entrepreneurial endeavors.
Differential Item Functioning occurs when a test or a survey item (i.e., a question) functions differently for a reference group (e.g., males) of respondents compared to a focal group (e.g., females) of respondents, after controlling for the level of the attribute being measured (Millsap, 2012). For example, an item exhibits DIF if the probability of males responding in a specific category differs from that of females when both are operating at the same overall level on the construct (Holland and Wainer, 1993; Crane et al., 2006). Awareness of this bias is of particular importance where scale scores are used to investigate gender differences, in order to ensure that derived scores are comparable across groups.
A lack of measurement equivalence at the item level may lead to spurious mean differences in the observed scores between male and female participants: one cannot be certain that an observed difference is meaningful, which makes mean score differences uninterpretable (Millsap, 2012). Furthermore, the existence of DIF across genders for entrepreneurship-related variables could lead to scores of questionable meaning and interpretation depending on the gender of the respondent, because DIF suggests that the items do not relate to the construct of interest in the same way. In that situation, scores would not be comparable between males and females; a particular score may have a different meaning for men than it does for women. Taken together, detection of DIF is important as it can influence the psychometric properties of an instrument and mean score comparisons (e.g., Church et al., 2011).
There are several ways in which gender stereotypes and/or social constructions regarding entrepreneurship and family roles could differentially affect men's and women's responses to entrepreneurship-related constructs. According to gender role theory, traditional gender roles prescribe that women's role should be based around family, while men's role should be more focused on work (e.g., Gutek et al., 1991). Moreover, entrepreneurship is considered to be a gendered phenomenon (Jennings and Brush, 2013). Because women feel more pressure to have a family-centered identity, items such as "A career as entrepreneur is attractive for me" or "Among various options, I would rather be an entrepreneur" may be interpreted by men and women to indicate differing levels of ATT. Thus, a male respondent and a female respondent with the same moderate level of ATT might answer these items differently. A male respondent might consider his moderate level of ATT as warranting high agreement with these items, since he and the people around him tend to perceive entrepreneurship as a stereotypically masculine endeavor (Jennings and Brush, 2013). A female respondent with the same moderate level of ATT might disagree with these items, since her moderate level might be construed by her and those around her as being too low, as society generally expects women's identity to reside in the family sphere (i.e., socially desirable responding). Socially desirable responding could influence responses and lead to DIF, as men and women may be uncomfortable providing answers that fall outside of societal expectations.
Similarly, an item on the INT scale such as "Spend time learning about starting a firm" may indicate a different level of INT for male than for female respondents. For example, a male respondent and a female respondent with the same high level of INT might respond to such an item differently. The male respondent may endorse strong agreement with the item, since men are generally expected to be more involved in business startup, compared to women.
Nevertheless, the presence of DIF at the item level does not necessarily imply DIF at the scale level (differential test functioning, DTF). Conversely, having no or little DIF at the item level does not imply that the scale as a whole is measurement invariant (Penfield and Algina, 2006). Research provides evidence that DIF can influence the psychometric properties of test scores (e.g., coefficient alphas, score variances) and, depending on its direction, DIF can increase or decrease sum scores (Li and Zumbo, 2009). DIF favoring women might increase women's scores relative to men's scores, while DIF favoring men might do the opposite. DTF analyses allow assessing the overall impact of DIF effects with all items being taken into account simultaneously.
Although testing for DIF is quite common practice in other social science research domains (such as psychology), applications to entrepreneurship-related constructs are rare. One notable exception is a study that analyzed the essential dimensions of enterprising personality (Suárez-Álvarez et al., 2014) regarding gender-related DIF. The researchers found that 9 out of the 127 items showed DIF as a function of students' gender, in constructs such as optimism, innovativeness, self-efficacy, risk taking and stress tolerance. In another study, Maes et al. (2014) used Ajzen's (1991) TPB as a theoretical framework and analyzed the measurement part of the model, at the indicator level, testing the hypothesis that students' gender moderates the strength of the relationship between certain indicators and their respective factors. Their analyses indicated important gender differences in the factors that shape entrepreneurial intentions. Finally, entrepreneurial intention is not restricted to students or unemployed people. For example, it is plausible that people may have the intention to launch a business while retaining their "day job" for some time (i.e., hybrid entrepreneurs; Raffiee and Feng, 2014).
In summary, although reports of gender-specific differences in constructs used in entrepreneurship research may reflect true distinctions in entrepreneurial intentions between men and women, these same effects may simply be an artifact of gender differences in the linguistics used to describe entrepreneurial phenomena. Given the various mechanisms by which the interpretation of the TPB scales could vary between men and women, the objectives of this study were (1) to test the main antecedents of entrepreneurial behavior (Kautonen et al., 2015) that is, ATT, SN, PBC and INT, using indicators used in previous research for DIF regarding gender and (2) to examine the implications of DIF at the scale level using analyses of DTF.

Sample and Procedures
Survey data were collected from 1800 individuals from various parts of Greece. The largest group of participants (34.1%) were students from various disciplines (e.g., psychology, education, engineering, business and science students). Unemployed participants made up 32.5% of the sample, while 33.4% were employed in the private (17.5%) or the public sector (15.9%).
The study was carried out in accordance with the principles expressed in the Declaration of Helsinki and was approved by the authors' institutional ethics committees. Surveys were administered to participants through personal contact by the study authors, with written informed consent from all participants. A variety of recruitment methods were used, including word of mouth, advertising through social network sites, and course credit. The study was described as examining "Factors affecting career choice and development." Participants were informed that anonymity was guaranteed and that they had the option to withdraw from the study at any moment. Data collection took place at the beginning of 2016 and lasted approximately 6 months.
In sum, the sample consisted of 1800 participants (50.4% female), the mean sample age was 32.05 years (SD = 12.46), range was 18 to 59 years. The majority of respondents (61.8%) had a university/college degree; 433 participants (24.1%) reported that one of their parents owned a full time business most of the time, while they were growing up, 87% reported that they know an entrepreneur in their close environment, and 27% of participants reported that they had some experience from business start-up procedures. The survey instrument contained items representing the theoretical constructs along with demographic data. Items referring to the same construct were positioned in different locations throughout the questionnaire.

Measurement of Theoretical Constructs
The specific measures used in the analysis, along with sample items of the relevant constructs, are outlined below. All the main constructs included in the analysis were assessed with self-report measures based on multi-item scales. The back-translation procedure recommended by Brislin (1980) was followed for the translation of the items into the Greek language.

Entrepreneurial Intention (INT)
We assessed participants' entrepreneurial intent using a scale originally developed by Thompson (2009). This is a reliable and internationally applicable individual entrepreneurial intent scale. It includes ten items, four of which are distracter items that act as red herrings and were not included in scale analyses. Sample items are: "Intend to set up a company in the future," and "I have no plans to launch my own business" (reverse scored).
Responses to the six items were made on 7-point Likert-type scales (1 = strongly disagree, 7 = strongly agree). Coefficient alpha for this scale was 0.89.

Attitudes Toward Entrepreneurship (ATT)
We assessed ATT using the five-item scale from Liñán and Chen (2009). Sample items are: "A career as entrepreneur is attractive for me," and "Among various options, I would rather be an entrepreneur." Responses to the five items were made on 7-point Likert-type scales (1 = strongly disagree, 7 = strongly agree). Cronbach's reliability for this scale was 0.88.

Subjective Norm (SN)
We assessed SN using the three-item scale from Liñán and Chen (2009). Students were asked: "If you decided to create a firm, would people in your close environment approve of that decision?" Items were (a) Your close family, (b) Your friends and (c) Your fellow students. Responses to the three items were made on 5-point Likert-type scales (1 = total disapproval, 5 = total approval). Cronbach's reliability for this scale was 0.80.

Perceived Behavioral Control (PBC)
We assessed PBC using five items from the scale of Liñán and Chen (2009). Sample items are: "To start a firm and keep it working would be easy for me," and "I can control the creation process of a new firm." Responses to the five items were made on 7-point Likert-type scales (1 = strongly disagree, 7 = strongly agree). Cronbach's reliability for this scale was 0.84.
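For readers who wish to reproduce reliability coefficients of the kind reported for the four scales, the following is a minimal sketch of reverse scoring and coefficient alpha. It is not the procedure or software used in this study; the function names and the toy responses are hypothetical, and sample variances (n − 1 denominator) are assumed.

```python
from statistics import variance


def reverse_score(value, low=1, high=7):
    """Reverse a Likert response, e.g. 2 -> 6 on a 1-7 scale."""
    return low + high - value


def cronbach_alpha(responses):
    """Coefficient alpha for a list of rows (one row per respondent,
    one column per item). Assumes complete data and at least two items."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*responses))
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical responses of five people to three 7-point items.
toy = [[5, 6, 5], [2, 3, 2], [7, 6, 7], [4, 4, 5], [3, 2, 3]]
print(round(cronbach_alpha(toy), 2))
```

Reverse-scored items (such as the INT sample item about having no plans to launch a business) would be passed through `reverse_score` before alpha is computed.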

Methods of Analyses
First, the fit of the measurement model (that is, the four constructs of the TPB) was examined for the whole sample and separately for men and women. Analysis of Moment Structures (AMOS software, version 7.0) (Arbuckle, 2006) was used. Because the χ² statistic for model fit is highly sensitive to sample size, we employed several statistics to assess model fit (Shook et al., 2004): (a) Root Mean Square Error of Approximation (RMSEA): 0 = an exact fit, <0.05 = a close fit, 0.05-0.08 = a fair fit, 0.08-0.10 = a mediocre fit, and >0.10 = a poor fit (AMOS also computes a 90% confidence interval around RMSEA); (b) Comparative Fit Index (CFI): best if above 0.90; (c) Akaike Information Criterion (AIC). For model comparisons, smaller AIC values represent a better fit of the model.
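The fit criteria above can be written as a small decision rule. The sketch below is illustrative only and does not reproduce AMOS output: the RMSEA point estimate uses the common textbook formula with N − 1 in the denominator (some programs use N), the shared boundary values 0.08 and 0.10 are assigned to the better-fitting category by assumption, and all function names and input values are hypothetical.

```python
from math import sqrt


def rmsea_from_chi2(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its degrees of
    freedom and the sample size (textbook formula, N - 1 variant)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def classify_rmsea(rmsea):
    """Map an RMSEA value onto the verbal labels used in the text."""
    if rmsea < 0:
        raise ValueError("RMSEA cannot be negative")
    if rmsea == 0:
        return "exact"
    if rmsea < 0.05:
        return "close"
    if rmsea <= 0.08:   # boundary assigned to 'fair' by assumption
        return "fair"
    if rmsea <= 0.10:   # boundary assigned to 'mediocre' by assumption
        return "mediocre"
    return "poor"


def cfi_acceptable(cfi):
    """CFI criterion from the text: best if above 0.90."""
    return cfi > 0.90


# Hypothetical model: chi-square = 100 on 50 df with N = 201.
value = rmsea_from_chi2(100.0, 50, 201)
print(round(value, 3), classify_rmsea(value))
```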
Second, DIF analyses were performed. Females served as the focal group and males as the reference group in the gender DIF analyses. The Mantel-Haenszel (MH) χ² procedure, as implemented in the DIFAS (Differential Item Functioning Analysis System, version 5.0) software (Penfield, 2005), was used. The MH procedure compares the item performance of two groups (reference and focal) whose members have previously been matched on the observed total score of the scale, which serves as the matching variable. The MH statistic is based on a contingency table analysis. The critical values for this statistic are 3.84 (α = 0.05) and 6.63 (α = 0.01) (Penfield 2013, Unpublished). The results offered by the DIFAS software are displayed in two tables: the first shows the DIF statistics, while the second presents the conditional differences in the mean item scores between the reference and focal groups at ten intervals across the matching variable continuum. For polytomous items, DIFAS reports several statistics, including the MH χ², the Liu-Agresti cumulative common log-odds ratio (L-A LOR), the estimated standard error (SE) of the L-A LOR, and Cox's non-centrality parameter estimator (COX'S B) with its corresponding SE. The L-A LOR is based on the Mantel-Haenszel common odds ratio generalized to polytomous data and represents the log odds ratio of one group selecting a response option compared with the other group when the level of the overall measured construct is the same (Penfield 2013, Unpublished). Positive values indicate DIF in favor of the reference group, and negative values indicate DIF in favor of the focal group. The standardized Liu-Agresti cumulative common log-odds ratio (LOR Z) was also used. A value greater than 2.0 or less than −2.0 may be considered evidence of the presence of DIF (Penfield and Algina, 2003).
Finally, Cox's B is similar to the MH statistic except that it uses the hypergeometric mean. Its sign is interpreted in the same way as that of the L-A LOR, that is, positive values indicate DIF in favor of the reference group, and negative values indicate DIF in favor of the focal group. The size of the DIF was interpreted using a widely accepted classification system whereby DIF in polytomous items is considered negligible if |L-A LOR| < 0.43, moderate if |L-A LOR| is between 0.43 and 0.64, and large if |L-A LOR| > 0.64 (Penfield, 2007).
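To make the matching logic concrete, the following sketch computes a Mantel-type χ² for a single ordinal item, stratifying respondents on their total score. It follows the standard textbook form of the Mantel statistic (observed versus expected sum of item scores in the focal group, with a hypergeometric variance) and is not the DIFAS implementation; the data layout, the function name and the example records are assumptions made for illustration.

```python
from collections import defaultdict


def mantel_chi2(records):
    """Mantel chi-square (1 df) for one ordinal item.

    records: iterable of (group, item_score, total_score) tuples,
    where group is "R" (reference) or "F" (focal) and total_score
    is the matching variable.
    """
    strata = defaultdict(list)
    for group, item_score, total_score in records:
        strata[total_score].append((group, item_score))

    observed = expected = var = 0.0
    for rows in strata.values():
        n = len(rows)
        n_focal = sum(1 for g, _ in rows if g == "F")
        n_ref = n - n_focal
        if n < 2 or n_focal == 0 or n_ref == 0:
            continue  # a stratum with one group only is uninformative
        s1 = sum(y for _, y in rows)        # sum of item scores
        s2 = sum(y * y for _, y in rows)    # sum of squared scores
        observed += sum(y for g, y in rows if g == "F")
        expected += n_focal * s1 / n
        var += n_ref * n_focal * (n * s2 - s1 ** 2) / (n ** 2 * (n - 1))

    if var == 0:
        raise ValueError("no informative strata")
    return (observed - expected) ** 2 / var


# Hypothetical records for one stratum (total score = 10).
example = [("R", 1, 10), ("R", 1, 10), ("R", 2, 10),
           ("F", 2, 10), ("F", 3, 10), ("F", 3, 10)]
chi2 = mantel_chi2(example)
print(round(chi2, 3), "flagged at alpha = 0.05:", chi2 > 3.84)
```

The comparison with 3.84 uses the critical value quoted in the text; with real data each stratum would contain many respondents from both groups.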
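The interpretation rules for the L-A LOR can be summarized in a few lines. This is a sketch under stated assumptions: the cutoffs 0.43 and 0.64 and the ±2.0 criterion come from the text, taking the absolute value of the L-A LOR is our reading (the sign carries only the direction of DIF), LOR Z is assumed to be the L-A LOR divided by its standard error, and the function names and input values are hypothetical.

```python
def classify_dif_size(la_lor):
    """Size label for a polytomous item based on |L-A LOR|."""
    magnitude = abs(la_lor)
    if magnitude < 0.43:
        return "negligible"
    if magnitude <= 0.64:
        return "moderate"
    return "large"


def favored_group(la_lor):
    """Positive values favor the reference group, negative the focal group."""
    if la_lor > 0:
        return "reference"
    if la_lor < 0:
        return "focal"
    return "none"


def flags_dif(la_lor, se):
    """Apply the |LOR Z| > 2.0 rule, with LOR Z taken as L-A LOR / SE."""
    return abs(la_lor / se) > 2.0


# Hypothetical item: L-A LOR = -0.50 with SE = 0.20.
print(classify_dif_size(-0.50), favored_group(-0.50), flags_dif(-0.50, 0.20))
```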
Third, DTF analysis was conducted to examine measurement invariance directly at the scale level, using the ν² statistic in DIFAS (version 5.0) (Penfield, 2005; Penfield 2013, Unpublished). The ν² statistic quantifies the overall DIF effect across the items of a scale (Penfield and Algina, 2006). A scale with a DIF effect variance of ν² below 0.07 can be classified as having small DTF, whereas DTF would be considered medium for 0.07 ≤ ν² ≤ 0.14 and large for ν² > 0.14 (Penfield and Algina, 2006; Penfield 2013, Unpublished). To examine whether differential functioning of the items influenced gender differences on the TPB scales, we computed Cohen's d for gender differences (Cohen, 1988) for each scale. First, Cohen's d was computed using all items; next, items with a large level of DIF were removed; lastly, items with moderate and large levels of DIF were removed.
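The scale-level step can be sketched as follows. The ν² cutoffs are those given in the text; the Cohen's d formula is the usual pooled-standard-deviation version and is assumed rather than taken from the study's software; `scale_scores`, its `keep` argument and the toy data are hypothetical names introduced only to show how d could be recomputed after dropping flagged items.

```python
from math import sqrt
from statistics import mean, variance


def classify_dtf(nu2):
    """DTF size label from the DIF effect variance (nu squared)."""
    if nu2 < 0.07:
        return "small"
    if nu2 <= 0.14:
        return "medium"
    return "large"


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def scale_scores(responses, keep):
    """Mean score per respondent over the item indices listed in keep."""
    return [mean(row[i] for i in keep) for row in responses]


# Hypothetical three-item responses; item index 1 is treated as flagged.
men = [[5, 6, 5], [4, 5, 4], [6, 7, 6]]
women = [[4, 4, 4], [3, 3, 4], [5, 5, 5]]
d_all = cohens_d(scale_scores(men, [0, 1, 2]), scale_scores(women, [0, 1, 2]))
d_reduced = cohens_d(scale_scores(men, [0, 2]), scale_scores(women, [0, 2]))
print(round(d_all, 2), round(d_reduced, 2), classify_dtf(0.02))
```

Comparing `d_all` with `d_reduced` mirrors the sequence described above: if removing flagged items leaves d essentially unchanged, DIF is unlikely to account for the observed gender difference.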

Descriptive Summary and Correlations
We present means, standard deviations and correlations across the four variables of the TPB, for the entire sample and separately for men and women participating in the study, in Tables 1-3.
Note to Tables 1-3: Gender is coded 1 = male, 2 = female; Cronbach's alpha reliabilities are in parentheses. *p < 0.05 (two-tailed), **p < 0.01 (two-tailed).
Results of independent t-tests suggested that men scored higher compared to women in ATT (Tables 2, 3). These results are in line with previous research suggesting significant gender differences in terms of perceived feasibility (expressed as PBC), perceived desirability (expressed as ATT) and INT (Kolvereid, 1996; Dabic et al., 2012; Sieger et al., 2014). Moreover, results from one-way ANOVA analyses suggested that employees working in the private sector and unemployed participants had higher INT compared to the other two groups of participants. We found no statistically significant differences between students and participants working in the public sector in terms of ATT, PBC and INT; students scored higher on SN [t (475.77

Confirmatory Factor Analyses
Results from the confirmatory factor analysis (CFA) of the measurement model for the whole sample, suggested an

Differential Item Functioning (DIF)
In Table 4 we present the Mantel χ², L-A LOR, LOR Z and COX'S B values for all the items in the four constructs. One item in the ATT scale, Item 4 ("Being an entrepreneur would entail great satisfactions for me"), exhibited DIF that was significant only at the 0.10 level and negligible in size based on the L-A LOR criteria outlined above (Mantel χ² = 2.871, p < 0.10) (Penfield, 2007). No DIF was found for the PBC and SN scales. Finally, one item in the INT scale, Item 6 ("Spend time learning about starting a firm"), exhibited a statistically significant but negligible DIF (Mantel χ² = 4.566, p < 0.05). The negative L-A LOR of Item 4 in the ATT scale indicates DIF favoring the focal group (women), i.e., at the same level of the construct the item is easier for women to endorse. The positive L-A LOR of Item 6 in the INT scale indicates DIF favoring men.

Differential Test functioning (DTF)
We present the ν² coefficients for the four TPB constructs in Table 5. Based on criteria for assessing the size of DTF (Penfield and Algina, 2006), the DTFs were deemed not to warrant concern (all ν² coefficients were below 0.07).

DISCUSSION
The primary goal of this study was to investigate the validity and meaningfulness of comparisons across gender on the main antecedents of entrepreneurial behavior (Kautonen et al., 2015), that is, ATT, SN, PBC and entrepreneurial intentions. Such comparisons have potential theoretical importance in increasing researchers' understanding of the interplay between gender and entrepreneurial motivation and in improving the participation rate of women in entrepreneurial activities. We focused on one important prerequisite for such comparisons, measurement invariance. To our knowledge, this is the first examination of gender-based DIF in entrepreneurship-related constructs.
Specifically, this study addressed DIF in the constructs that constitute the TPB, a widely used theoretical framework for the study of entrepreneurial motivation. Our results suggest that there are overall differences in mean scores for men and women in the TPB dimensions, yet the DIF analysis indicated that differences at the item level are almost non-existent. Men scored higher than women in ATT, PBC, SN and INT. These results are in agreement with previous studies concerning gender differences in entrepreneurial attitudes and intentions (Haus et al., 2013; Tognazzo et al., 2016). Moreover, the DTF analysis suggested that the effect of DIF across all the items for each scale was negligible.
The study contributes to previous research that uses the TPB model to study entrepreneurial intentions (Maes et al., 2014; Schlaegel and Koenig, 2014). Our results suggest that after controlling for the underlying TPB construct, the response to an item is not related to whether the respondent is male or female. Thus, the TPB constructs appear to function equivalently for men and women at the item level. Furthermore, our DTF analyses for each TPB construct, where we assessed the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the scales of the TPB as a whole are measurement invariant. These findings provide evidence that the constructs used in the present research allow valid comparisons between male and female respondents. Our findings suggest that women do tend to demonstrate lower entrepreneurial intentions compared to men (at least in a country such as Greece) and that this gender-related difference is not dependent on the properties of the instrument being used. This opens the road for researchers to examine other theoretical variables that influence the lower entrepreneurial intention of women. For example, Zampetakis et al. (2016) proposed that gender identity, that is, the extent to which people incorporate gender roles into their self-concepts, is a promising construct for the study of gender differences in intentions related to entrepreneurship.
Although our study sheds some light on measurement invariance of the TPB constructs applied to entrepreneurship across gender, it has several limitations that further research can seek to address. First, our study design is cross-sectional: we did not measure actual business startup, but only respondents' intent to start a business. As such, one could consider our INT construct as a general attitude toward becoming an entrepreneur. Although our CFA results suggest that ATT and INT are two separate factors, future research could employ longitudinal designs, including actual business startup, in order to validate the INT construct.
Second, our study was limited to a sample of Greek participants. To extend the generalizability of our results, we encourage scholars in this area to examine our proposed model with different samples across different countries. Third, we applied non-parametric DIF detection methods. Non-parametric methods make fewer assumptions concerning the distribution of the latent trait in the population, but have the disadvantage that they rely on an observed score as the matching variable. This suggests that if our measures contain widespread bias, it is possible that some bias within the measurement was not detected. Future research could use parametric DIF estimates in the framework of item response theory (IRT); IRT-based methods use a latent variable modeling approach.
Fourth, our analyses were based on manifest grouping variables such as gender, where DIF and DTF results depend on the contrasting group. Future research could benefit from the use of latent DIF detection approaches that rely on mixture IRT models, that is, a combination of IRT and latent class models (Benítez et al., 2016). The use of mixture IRT models to detect DIF differentiates groups based on an unknown latent grouping variable that is not specified a priori but is determined by the results from the model parameter estimation.
One last consideration concerns the social and economic context in which the study took place. The recent global economic crisis, with its peak in 2008, resulted in dramatic changes for the labor market: in many countries workers lost their jobs, work hours were shortened and wage earnings declined (Pines et al., 2010; Giorgi et al., 2015). Greece has been facing severe economic challenges in recent years. The economic crisis is an important stressor with negative effects on the health of workers and especially women. According to Drydakis (2015), during the Greek economic crisis women were more negatively affected by unemployment in terms of their physical and mental health in comparison to men. Higher stress regarding employment status may exacerbate gender roles and may have further influenced relative cross-gender differences in entrepreneurial intentions and attitudes.

CONCLUSION
The present study applied DIF analysis to the constructs of the TPB, a theoretical framework that is often used for describing entrepreneurial intention. DIF analysis indicated that differences at the item and scale level are almost non-existent between male and female participants. However, DIF results may not generalize across inventories, especially when they are based on different theoretical frameworks. As such, we believe that DIF analyses should be conducted on constructs used in other theoretical frameworks of entrepreneurship research, as testing for DIF enables developers to determine whether the constructs behave differently for women and men. In our opinion, DIF testing should be a prerequisite for meaningful group comparisons between male and female respondents in the study of entrepreneurship-related phenomena.

AUTHOR CONTRIBUTIONS
LZ conceptualized the study; LZ, KK, and VM designed the study; LZ, MB, CL, and KK performed research; LZ analyzed data; LZ and KK wrote the manuscript; MB and CL read and corrected the manuscript; VM and KK supervised the entire project. All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

FUNDING
Research reported herein was fully supported via a Grant financed by the European Economic Area (EEA) Financial Mechanism and the Greek Secretariat for Research and Technology (GSRT) ("FOREMOST" project: 3864). Views, opinions and results reported herein are the sole responsibility of the authors and do not correspond to official EEA or GSRT position.