Do Cross-National and Ethnic Group Bullying Comparisons Represent Reality? Testing Instruments for Structural Equivalence and Structural Isomorphism

Bullying in schools is a widespread phenomenon, witnessed worldwide, with negative consequences for victims and perpetrators. Although bullying is an international problem, several issues in cross-national and cross-cultural/ethnic research, including language, cultural perception, and methodology, can make comparisons between countries and cultures/ethnic groups difficult. As statistical techniques rapidly develop, there may be more scope to be statistically creative in how we assess the utility of one tool across different groups such as cultures and nations. At the very least, such an attempt should be paramount in studies investigating different groups (e.g., from different countries) at one time. This study investigated bullying and victimization rates in a large cross-ethnic and cross-national comparison of adolescents from four countries and five ethnic groups: Israel (Jewish Israelis and Arab Palestinian Israelis), Palestine (the Gaza Strip), Germany, and Greece. A total of 3,186 school children aged 12–15 years completed self-report questionnaires on peer bullying/victimization. A stepwise data analytic approach was used to test the comparability of the psychometric properties: (1) Structural equivalence contributes to the valid use of the instrument in cultural contexts other than the one for which it was developed, and is a necessary condition for justifying indirect or direct comparisons between cultural groups. (2) Additionally, structural isomorphism is necessary to demonstrate that the same internal structure of the instrument applies at both the cultural and the individual level. Findings support the internal structural equivalence of the questionnaire, with the exception of the Palestinian sample from the Gaza Strip. Subsequently, exploratory factor analysis of the cultural-level structure revealed a one-factor structure with a congruence measure below 0.85.
Thus, no evidence was found for internal structural isomorphism, suggesting that direct comparisons of the cultural samples were not justified. These results are discussed in detail, and the implications for the international research community and for cross-national/-ethnic comparison studies on bullying are addressed.


INTRODUCTION
Bullying is a specific form of aggressive behavior that involves repeated negative behavior patterns (e.g., intentional injury) by one or several individuals toward another. In addition, the definition of bullying includes a real or perceived imbalance of power in which the victim cannot defend him/herself (Olweus, 1994). International research suggests that bullying is a widespread phenomenon with similar characteristics across countries and cultures. For example, gender differences are evident for direct or physical bullying and victimization (boys are more involved than girls), and victimization usually decreases as pupils grow older (e.g., Scheithauer et al., 2006). Being actively involved in bullying represents a major threat to healthy development and is associated with maladjustment later in life (e.g., Wolke et al., 2012; Zwierzynska et al., 2013; Slava et al., 2018). In particular, students who report both bullying others and being victimized (bully-victims) have a higher risk of developing emotional and behavioral problems (Wolke and Samara, 2004; Winsper et al., 2012; Kennedy, 2018; for a summary see Hess and Scheithauer, 2015).
Beyond these similarities, cross-national and cross-ethnic research on bullying has produced numerous studies comparing prevalence rates and impact worldwide (e.g., Borntrager et al., 2009; Craig et al., 2009; Ortega et al., 2012; Chester et al., 2015; Smith et al., 2016a; Athanasiou et al., 2018). One study by Fleming and Jacobsen (2009) compared bullying rates in 19 countries using data from the Global School-based Student Health Survey. Zambia had the highest percentage of victims (60.9%) and Tajikistan the lowest (7.8%). Although the same instruments were used in each country, the authors noted that interpretations were unique to each cultural group and that social stigma could account for discrepancies across countries. Another cross-national study by Mark et al. (2013) compared bullying rates in Lithuania, Luxembourg, and Estonia and showed that Lithuanian boys accounted for the largest percentage of bullies, while girls in Luxembourg accounted for the smallest.
Indeed, several longitudinal studies have emerged that compare bullying involvement over time and across several countries, such as the EU Kids Online study (e.g., Livingstone et al., 2015) and the Health Behaviour in School-aged Children study (HBSC; e.g., Zaborskis et al., 2018; for a summary see Smith and López-Castro, 2017). These studies are worthwhile for comparing bullying prevalence across many countries, yet they are not without difficulties. For example, individual countries often report different victimization rates across these studies, and the studies themselves have shown limited comparability (Smith et al., 2016b; Smith and López-Castro, 2017).
There are several issues with cross-national and cross-ethnic research that can make comparisons between countries and cultures or ethnic groups difficult. The first major challenge is to ensure the psycholinguistic equivalence of the term "bullying." Notably, in some countries (e.g., Italy) no adequate translation of the English word "bullying" exists. Likewise, there is no Arabic term equivalent to bullying (Samara et al., unpublished), and as such there is much debate about the most appropriate word to use and about the differences between related concepts (Scheithauer et al., 2016). Moreover, even when the language is the same, varying terms such as peer harassment or aggression are used to describe bullying-related behavior. This is an issue both within and between countries (Smorti et al., 2003). Similarly, interpretations of what constitutes other types of bullying (e.g., cyberbullying) and of the importance of definitional elements (e.g., anonymity) have been shown to vary across countries (Menesini et al., 2012). Other important factors are methodological differences across studies that limit the comparisons that can be drawn. These include the research instruments used, the time frame the questions refer to (e.g., the last 6 months vs. 12 months vs. the past term), and whether or not a definition is provided. Beyond these methodological differences in how questionnaires are delivered and what they ask about (e.g., time frame), there are more general cultural differences that an instrument may not be sensitive to (e.g., what it means to be a bully and the social implications of this), which could be related to social desirability and cultural norms.
Several non-methodological factors can also determine country differences, such as socioeconomic inequality (Chaux and Castellanos, 2015) or cultural values (e.g., individualism-collectivism). For example, a cross-cultural study of 75 countries revealed less overall victimization in individualist societies, but a greater proportion of relational victimization and a higher ratio of bullies to victims in collectivist societies.

Comparability Across Ethnic Groups: Psychometric Properties of Tools Used in Cross-National/Cultural Studies on Bullying
For the most part, researchers use a mix of strategies to ensure that their tools transfer across cultures, such as translation and back-translation of questions, factor analysis of items, and the inclusion or exclusion of explanations in various languages. For example, several new scales have been developed to investigate cyberbullying across several countries; typically, strict statistical methods such as exploratory and confirmatory factor analysis are used. As statistical techniques rapidly develop, there may be more scope to be statistically creative in how we assess the utility of one tool across cultures and nations. At the very least, such an attempt should be paramount in studies investigating many countries at once.
When a psychometric instrument is administered in a questionnaire-based survey to different cultural or ethnic groups with the aim of comparing the groups on a particular scale, we first need to test the instrument's comparability across those groups; otherwise the comparisons could be misleading. There are three main reasons for this. Firstly, the instrument may be culturally specific. Cultural systems can determine the meaning and characteristics of a specific psychological construct and process (Miller, 1997), and these can differ between ethnic and national groups (e.g., individualist and collectivist societies can attach different meanings to the same bullying instrument and thus produce different quantitative results).
Secondly, methodological biases may have distorting effects. These can affect specific items (e.g., translation biases and errors) or possibly the whole instrument (e.g., culturally different response styles); further sources include lack of familiarity with the testing procedure and underrepresentation of the construct domain by the content of the test (e.g., other forms of victimization are missing). Such methodological biases could violate the conditions for metric and/or structural equivalence across cultures, so that quantitative cross-cultural comparisons could produce misleading results.
Thirdly, individual-level constructs may not generalize to the national/cultural level. A specific construct (e.g., victimization) may describe individuals within a particular culture or ethnic group without necessarily characterizing the national group as a whole. For example, when a bullying/victimization questionnaire is used with a specific cultural group and generates total scores for bullying and victimization, these scores describe the characteristics of the individuals in that group. When we then compare ethnic/national groups on these total scores, the scores are treated as representative of the groups, and cross-cultural differences are assumed. However, attributing these individual-level characteristics to ethnic and national groups as a whole is misleading, because the meaning of a bullying and/or victimization construct can change from the individual level to the cultural level (Matsumoto and Van de Vijver, 2010).
As a result, the relation between specific scale items and the underlying dimensions may change across (cultural) groups. It is therefore necessary to investigate the equivalence of the internal structure in each new ethnic or cultural group in which the instrument is applied. Fischer and Fontaine (2010) and Fontaine and Fischer (2010) suggest a stepwise data analytic approach to test the comparability of psychometric instruments: (1) Structural equivalence contributes to the valid use of the instrument in cultural and ethnic contexts other than the one for which it was developed, and is a necessary condition for justifying indirect or direct comparisons between cultural or ethnic groups. (2) Structural isomorphism is necessary to demonstrate that the same internal structure of the instrument or scales applies both at the level of the cultural and/or ethnic groups and at the individual level.
The Bully/Victim Questionnaire (BVQ) by Olweus (1991) was developed in one nation many years ago and is widely used globally. For many researchers, it provides the most appropriate definition of bullying and allows actions to be categorized into specific types of bullying and victimization behavior (e.g., physical, verbal, and relational). There is evidence that it correlates with peer nominations of bullying (Lee and Cornell, 2009) and has good reliability (Breivik and Olweus, 2015). One limitation is that bullies often do not admit their behavior in self-reports, so teacher and parent reports may be a valid way to obtain this information in addition to self-report. Furthermore, although the self-report BVQ is often used in cross-national and cross-cultural bullying research, its comparability across cultural, national, or ethnic groups, also referred to as measurement invariance (Widaman and Reise, 1997), has not yet been investigated.
In summary, the literature implies universal as well as ethnic-specific aspects of bullying behavior, especially when diverse types of such behavior are taken into account. At present, most of the available evidence cannot be directly compared because of methodological inconsistencies (e.g., different methods of assessing frequency) and divergent definitions of bullying. These discrepancies led us to conduct a cross-national and cross-ethnic comparative survey of five ethnic/national groups in four countries: Germany, Israel (Israeli Jewish and Israeli Palestinians), the Palestinian Authority (the Gaza Strip), and Greece. These groups represent different cultural norms, languages, and levels of bullying work (e.g., research and anti-bullying intervention), and the same bullying instrument was used in all of them. This is an exploratory study based largely on convenience samples. We felt that selecting countries in an almost ad hoc fashion mimicked many of the large cross-cultural studies available today, in which countries are very often included because of incidental factors such as funding, governmental agendas, available resources, and appropriately skilled staff. The aim of the current study is to investigate the extent to which bullying and victimization rates are comparable within and between countries and ethnic groups, namely German, Israeli Palestinian, Israeli Jewish, Gaza Strip Palestinian, and Greek pupils.

Design and Sample
The present study is a cross-sectional, cross-national/ethnic comparison between lower secondary school pupils in Germany, Greece, the Gaza Strip in the Palestinian Authority, and Israel (Israeli Jewish and Israeli Palestinians). All samples were stratified according to age. The age range for the whole sample was from 12 to 16 years.
The German convenience sample (see Scheithauer et al., 2006) was drawn from two schools in two German federal states: Wittmund, Lower Saxony, and the city state of Bremen. The original sample included 2,088 pupils. The Bremen sample comprised 735 students in grades 5-10 of one conventional state secondary school, while the Wittmund sample comprised 1,353 students in grades 5-10 of a state secondary school of the type known as "Kooperative Gesamtschule" (cooperative comprehensive school). A final sample of 1,729 German adolescents aged 12-16 years was included in this study.
The Greek convenience sample was drawn from two schools in the greater area of Drama, Greece. Of the total sample, 33 parents (10.15%) did not give written consent, 11 students (3.39%) withdrew, and 7 students (2.16%) were absent on the day the data were collected. The final sample therefore consisted of 270 students.
The Palestinian sample from the Gaza Strip included children from four representative areas (Khan-Younis, Mawasy, Beit-Hanon, and Rimal) and from different school levels (primary, junior high school, and high school), because each school system covers different age groups. Participants were identified through random cluster sampling of schools and classes representing the Gaza Strip. The study originally included 1,137 students aged 10-18 years. Of these, 332 completed the bullying questionnaire, and the 266 students aged 12-16 years were included in the final sample.
The Israeli sample was drawn from one Palestinian and one Jewish lower secondary school in Israel (see Wolke and Samara, 2004). Israeli society is composed of a variety of Jewish groups representing approximately 80% of the population, while Palestinian Arabs comprise about 20%. In general, there are two educational systems in Israel: Jewish (with Hebrew as the language of instruction) and Arab Palestinian (with Arabic as the language of instruction), both under the supervision of the Israeli Ministry of Education. A convenience sample of 30 classes in two lower secondary schools in the Central District (one from the Arab region and the other from the Jewish region) was chosen to participate in the study. Of these 1,183 pupils, 95 (8%) did not participate because their parents declined permission, and a further 167 (14.1%) were absent during data collection. Thus, a final sample of 921 pupils participated. Table 1 shows the number of participants in each ethnic/national group by gender and age. There were no significant differences in the distribution of boys and girls across ages.

Procedure
The procedure was similar for all studies. Before the research began, letters explaining the procedure and purpose of the study in detail and requesting consent were sent to the headteachers of each school. After permission had been received from the headteachers, letters explaining the aims and procedure of the studies were sent to the teachers of each class and to the children's parents. Written information about the study and a parental consent form were passed on via the pupils. The overall aim of the study and the questionnaire were explained to the pupils, and they were asked to give verbal consent. In addition, the definition of the term "bullying" and the associated patterns of aggressive behavior were explained to the pupils.
Teams of psychologists and/or social workers in each country carried out the research in each class. All pupils were free to discontinue their participation at any time.

Ethics Statement
The studies were approved by the ethical committees of the corresponding universities. The study in Greece was approved by the Ethical Committee of Kingston University London.

Instrument
All participants completed the Bully/Victim Questionnaire (BVQ; Olweus, 1991). The BVQ is an anonymous self-report instrument used to gather information about the extent of bullying. In Germany, an authorized German version ("Fragebogen für Schüler und Schülerinnen ab der 5. Klasse, Form D") was used. For the Israeli, Greek, and Gaza Strip samples, the BVQ was translated into Hebrew (Israeli Jewish), Arabic (Israeli Palestinians and Palestinians from the Gaza Strip), and Greek (the sample in Greece) and then back-translated into English by qualified translators. Any discrepancies in the bullying questions were discussed and resolved according to the guidelines of van de Vijver and Hambleton (1996). The questionnaire consists of two parts: things that were done on purpose to the participants, and things that the participants did to others on purpose, during the last 6 months at school. Together, the two parts contain ten short phrases or questions (five per part) about direct and relational bullying and victimization.
The first five questions related to victimization: (1) I was hit, kicked, pushed or threatened, (2) I had things taken from me or spoiled; including money, (3) I was made fun of, (4) Children I often play with said that they did not want to play with me, (5) Other children told lies or nasty stories about me. The second five questions asked about bullying others: (1) I hit, kicked, pushed or threatened others, (2) I took or spoiled things from others; including money, (3) I made fun of others, (4) I said to children I often play with that I do not want to play with them, (5) I told lies or nasty stories about others.
For all questions, participants were asked how frequently they had experienced or shown these behaviors in the last 6 months. Response options were (0) never, (1) only once or twice, (2) two or three times a month, (3) about once a week, or (4) several times a week. The BVQ has been reported to have good validity and reliability (Olweus, 1994).

Statistical Analysis
Data were analyzed with Stata version 14 and IBM SPSS Statistics 24.

Part 1: Differences Between Countries and Ethnic Groups
To assess the relationship between bullying/victimization status and ethnic/national group, two approaches were implemented. First, we summed the bullying items to construct a continuous bullying variable and the victimization items to construct a continuous victimization variable, and then performed ANOVAs with Bonferroni post hoc comparisons between ethnic/national groups. Secondly, a categorical approach was implemented. For these analyses, the first two answer choices for each question were scored as 0 (neutral) and the others as 1 (frequent bullying or victimization). Children were then categorized into four groups: (a) Pure Victims (PV): children who had been bullied at least two or three times a month but had never, or only once or twice, bullied others in the last 6 months; (b) Pure Bullies (PB): children who had bullied others on purpose at least two or three times a month but had never, or only once or twice, been victimized in the last 6 months; (c) Bully/Victims (BV): children who had both been victimized and bullied others on purpose at least two or three times a month during the last 6 months; and (d) Neutrals (N): children who had never, or only once or twice, been victimized or bullied others in the last 6 months. This dichotomous cut-off is based on the core definition of bullying as repetitive behavior, which excludes singular aggressive or violent acts.
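The scoring rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Stata/SPSS code: items are assumed to be coded 0-4 as listed under Instrument, and a child is assumed to count as a frequent victim (or bully) if at least one item in that part reaches "two or three times a month"; the text does not spell out how the items are combined. The function names are hypothetical.

```python
from typing import Sequence

# Assumed item coding (see Instrument): 0 never, 1 only once or twice,
# 2 two or three times a month, 3 about once a week, 4 several times a week.
FREQUENT = 2  # responses >= 2 are scored 1 ("frequent"); 0 and 1 are scored 0


def sum_score(items: Sequence[int]) -> int:
    """Continuous score: the sum of the five item responses (range 0-20)."""
    if any(not 0 <= x <= 4 for x in items):
        raise ValueError("item responses must lie in 0..4")
    return sum(items)


def classify(victim_items: Sequence[int], bully_items: Sequence[int]) -> str:
    """Return 'PV', 'PB', 'BV' or 'N' under a hypothetical 'any frequent item' rule."""
    victim = any(x >= FREQUENT for x in victim_items)
    bully = any(x >= FREQUENT for x in bully_items)
    if victim and bully:
        return "BV"  # bully/victim
    if victim:
        return "PV"  # pure victim
    if bully:
        return "PB"  # pure bully
    return "N"  # neutral


# Example: a child who was made fun of two or three times a month
# and who bullied others only once or twice.
print(classify([0, 0, 2, 0, 1], [1, 0, 0, 0, 0]))  # PV
print(sum_score([0, 0, 2, 0, 1]))  # 3
```

If the original analysis instead required frequent responses on a specific item or on the part's summed score, only the two `any(...)` lines would need to change.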
Differences in bullying and victimization involvement for each specific item are reported in frequency tables and cross-tables. Bivariate associations between countries were tested with chi-square (χ2) statistics (α = 0.05). Additionally, multinomial logistic regression analyses were used to determine the unique effects of ethnic/national group on bullying behavior. The dependent variable (DV) in each analysis represents the bullying/victimization subgroups (pure victims, pure bullies, bully/victims), each of which was compared to neutrals. Odds ratios (OR) and their 95% confidence intervals were computed as effect measures. The OR expresses the relative odds of an outcome (pure victim, pure bully, bully/victim) compared with the reference category (neutral), and was used to investigate differences between each pair of ethnic/national groups (e.g., German vs. Greek pupils).
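As a sketch of the effect measure, the odds ratio for one subgroup versus neutrals when comparing two groups can be computed from a 2 x 2 table of counts, with a Wald confidence interval on the log scale. This assumes ethnic/national group is the only predictor in each pairwise model; in that case the multinomial logit is saturated, and its maximum-likelihood OR for, say, pure bullies vs. neutrals equals this cross-product ratio with the same standard error. The function name and the counts in the example are hypothetical, not data from this study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio and Wald CI for one subgroup vs. neutrals, group A vs. group B.

    a: group A children in the subgroup (e.g., pure bullies)
    b: group A neutrals
    c: group B children in the subgroup
    d: group B neutrals
    z: standard normal quantile (1.959964 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the log odds ratio")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error of log OR
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical counts: 30 pure bullies and 300 neutrals in group A,
# 10 pure bullies and 290 neutrals in group B.
or_, lo, hi = odds_ratio_ci(30, 300, 10, 290)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 2.9 1.39 6.04
```

Because the interval is built on the log scale, it is asymmetric around the OR but symmetric around log(OR), which is also the pattern of the intervals reported in the Results.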

Part 2: Structural Equivalence and Isomorphism
Evidence of measurement invariance or equivalence was sought using exploratory factor analysis of polychoric correlation matrices, because the response variables were ordinal (Jöreskog, 1994). Testing structural equivalence and isomorphism requires several analytical steps, as recommended by Fischer and Fontaine (2010) and Fontaine and Fischer (2010). For these analyses we used the continuous bullying and victimization variables. The testing strategy is presented in two sections.

Section 1: Testing for structural equivalence
A hypothesized two-factor structure of the BVQ, "bullying" and "victimization," was tested by computing the individual-level (overall) factor structure. In this step, any possible national/ethnic differences were ignored, and the validity of the factor structure was tested. In a second step, the applicability of the individual-level structure to each ethnic/national group was tested. Specifically, we verified whether the hypothesized two-factor structure across all sub-samples (i.e., the individual-level structure) is similar to the structure within each ethnic/national group, using orthogonal Procrustes rotation and evaluating the agreement between factor loadings with Tucker's coefficient of agreement (Tucker, 1951). A congruence value of 0.85 or above was taken as indicative of equivalence.
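The comparison step can be sketched with NumPy. For a given factor, Tucker's congruence coefficient is φ = Σ x_i y_i / sqrt(Σ x_i² · Σ y_i²), where x and y are the loadings of the same items on that factor in the two solutions. The sketch assumes the loading matrices (items x factors) have already been estimated, e.g., from polychoric correlations; it rotates a group's loadings toward the individual-level target using the standard SVD solution of the orthogonal Procrustes problem and then computes φ per factor against the 0.85 criterion. It is an illustration, not the Stata/SPSS routine the authors used, and the function names are hypothetical.

```python
import numpy as np


def procrustes_rotate(loadings: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rotate `loadings` (items x factors) orthogonally toward `target`.

    Minimizes ||loadings @ T - target|| over orthogonal T; the solution is
    T = U @ Vt, where U, S, Vt is the SVD of loadings.T @ target.
    """
    u, _, vt = np.linalg.svd(loadings.T @ target)
    return loadings @ (u @ vt)


def tucker_phi(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Tucker's congruence coefficient for each pair of matching columns."""
    num = np.sum(x * y, axis=0)
    den = np.sqrt(np.sum(x**2, axis=0) * np.sum(y**2, axis=0))
    return num / den


def equivalent(group_loadings: np.ndarray, target: np.ndarray, cutoff: float = 0.85):
    """Rotate toward the target; return phi per factor and whether it reaches the cutoff."""
    phi = tucker_phi(procrustes_rotate(group_loadings, target), target)
    return phi, phi >= cutoff
```

For example, if a group's loadings are an exact rotation of a simple-structure target (five victimization items on one factor, five bullying items on the other), the rotation recovers the target and φ = 1 for both factors. If instead all ten items load equally on a single factor, φ is about 0.71 for both factors after rotation, below the 0.85 criterion.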

Section 2: Testing for structural isomorphism
The size of the ethnic/national variation in each item was first estimated with intra-class correlations (ICCs), and the ethnic/national-level association matrix was then computed from the average item scores per ethnic/national group. This matrix was used to test the hypothesized two-factor structure at the ethnic/national level. Additionally, the ethnic/national-level structure was compared to the individual-level structure using orthogonal Procrustes rotation and calculation of the congruence measure. Specifically, we tested whether the structure over all samples (i.e., the individual-level structure) would apply to the ethnic/national-level structure.
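The two ingredients of this step can be sketched in NumPy: ICC(1) from a one-way random-effects ANOVA, ICC(1) = (MSB − MSW) / (MSB + (n0 − 1) · MSW), which expresses how much of an item's variance lies between ethnic/national groups, and the matrix of group mean item scores with its item-by-item correlations, which serves as the group-level association matrix. This is a minimal sketch under assumptions: it uses Pearson correlations of group means and the usual adjusted group size n0 for unequal groups, which may differ from the authors' exact computation, and the function names are hypothetical.

```python
import numpy as np


def icc1(scores, groups) -> float:
    """ICC(1) of one item from a one-way random-effects ANOVA.

    Uses the adjusted average group size n0 = (N - sum(n_j**2) / N) / (J - 1),
    so unequal group sizes (here 266 to 1,729 pupils) are handled.
    """
    scores, groups = np.asarray(scores, dtype=float), np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_groups = scores.size, labels.size
    sizes = np.array([np.sum(groups == g) for g in labels])
    means = np.array([scores[groups == g].mean() for g in labels])
    ss_between = np.sum(sizes * (means - scores.mean()) ** 2)
    ss_within = sum(np.sum((scores[groups == g] - m) ** 2) for g, m in zip(labels, means))
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    n0 = (n_total - np.sum(sizes**2) / n_total) / (n_groups - 1)
    return float((ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within))


def group_level_matrix(items, groups):
    """Group mean item scores (groups x items) and their item correlation matrix."""
    items, groups = np.asarray(items, dtype=float), np.asarray(groups)
    labels = np.unique(groups)
    means = np.vstack([items[groups == g].mean(axis=0) for g in labels])
    return means, np.corrcoef(means, rowvar=False)
```

With J groups, the centered matrix of group means has rank at most J − 1, so with four groups (after excluding the Gaza Strip sample) the group-level correlation matrix of ten items rests on very few observations.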

RESULTS
Part 1: Bullying and Victimization for Each Ethnic/National Group
Tables 2 and 3 show the frequency and occurrence (according to the answer scale for the last 6 months: never, once or twice, two or three times a month, once a week, several times a week) of each bullying and victimization item for each ethnic/national group. The results show that involvement in different bullying and victimization behaviors varies across ethnic/national groups and frequency levels. Significant overall differences between ethnic/national groups were found for all bullying and victimization items across the answer scale (p < 0.001).
For the summed victimization and bullying scores, ANOVAs with Bonferroni post hoc comparisons revealed significant differences between ethnic/national groups. Greek pupils were more likely to be involved in bullying behaviors than all other ethnic/national groups (p < 0.001). Greek and Gaza Strip adolescents were significantly more likely to be involved in victimization than all other ethnic/national groups (p < 0.001), and Israeli Jewish and Israeli Palestinian adolescents were significantly more likely to be involved in victimization than German adolescents (p < 0.001) (see Table 4).
We also looked at differences between ethnic/national groups using the overall bullying variable with four subgroups: neutrals, pure victims, pure bullies, and bully/victims. Table 4 shows the prevalence of each subgroup for each ethnic/national group separately. When looking at [...] 192.37 (16, 3093), p < 0.001] [...] p < 0.05; the remaining comparisons: p < 0.001) (see Table 4 and Figures 1, 2). Multinomial logistic regressions were performed to examine the specific differences between each pair of ethnic/national groups for each bullying subgroup, with the neutral subgroup as the reference category (Table 5 also shows the frequency of each bullying subgroup relative to the neutral group for each ethnic/national group). The results of the multinomial logistic regression comparisons were as follows:

Israeli Jewish vs. Palestinians in the Gaza Strip
The overall model was significant [χ2 = 31.06 (3, 715), p < 0.001]. Israeli Jewish pupils were more likely to be pure bullies than Palestinians from the Gaza Strip (OR: 2.85, 95% CI: 1.55-5.23, p < 0.01), while Palestinians from the Gaza Strip were more likely to be bully/victims than Israeli Jewish pupils (OR: 2.63, 95% CI: 1.53-4.52, p < 0.001).

Israeli Palestinians vs. Palestinians From the Gaza Strip
The overall model was significant [χ2 = 8.63 (3, 738), p < 0.05], but no specific differences between the two groups in relation to the bullying subgroups were found.

Israeli Palestinians vs. German Children
The overall model was significant [χ2 = 36.80 (3, 2201), p < 0.001]. Israeli Palestinians were more likely to be pure victims [...]
FIGURE 2 | Mean and 95% confidence interval for involvement in victimization by ethnic/national group.

Part 2: Structural Equivalence and Isomorphism
The above results revealed significant differences between ethnic/national groups in involvement in bullying as bullies, victims, or bully/victims. In this section, we perform additional analyses to determine whether these results are valid, that is, whether comparisons between ethnic/national groups in relation to bullying and victimization are adequate and represent reality. We also test whether these specific items represent two distinct behaviors (bullying and victimization) in each ethnic/national group. To this end, we performed structural equivalence and isomorphism analyses. Because these two concepts are hierarchically ordered, the investigation of structural equivalence gives necessary but insufficient information and serves as the analytical basis for isomorphism. The results of each section are explained in detail below (see Figure 3 for an overview of the analytical steps).

Section 1: Testing for Structural Equivalence
At the individual level, the expected two-factor structure of the BVQ, "bullying" and "victimization," clearly emerged (see Table 6). Subsequently, the factor structure of each ethnic/national sample was orthogonally Procrustes-rotated toward the individual-level structure, and the congruence measure was calculated for each factor per ethnic/national group. For most ethnic/national groups, Tucker's coefficient of agreement exceeded 0.85 or even 0.95; the exception was the sample from the Gaza Strip, which showed congruence values of 0.74 (victimization) and 0.65 (bullying). This finding supports structural equivalence for all groups except the Gaza Strip sample.

Section 2: Testing for Structural Isomorphism
The individual items of the BVQ varied sufficiently between cultural/national groups: intra-class correlation coefficients (ICCs) ranged from 0.016 to 0.11. The Gaza Strip sample was excluded from further analysis because it lacked structural equivalence. Exploratory factor analysis of the ethnic/national-level structure then revealed a one-factor structure with a congruence measure below 0.85. Thus, no evidence was found for structural isomorphism, and further direct comparisons of the cultural/national samples are not justified.
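The ICC values reported above describe how much of an item's variance lies between groups rather than within them. One common estimator is ICC(1) from a one-way random-effects ANOVA. The text does not state which estimator was used, so the sketch below is an assumption, and the function name `icc1` is hypothetical. For unequal group sizes, it uses the adjusted average group size k0 = (N − Σ n_j² / N) / (J − 1).

```python
def icc1(groups):
    """ICC(1) from a one-way random-effects ANOVA, allowing unequal group sizes.

    groups: list of lists, each holding one item's scores for one group.
    """
    sizes = [len(g) for g in groups]
    n_total, n_groups = sum(sizes), len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between- and within-group sums of squares and mean squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Adjusted average group size for unbalanced designs.
    k0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical item scores for two groups, for illustration only.
print(icc1([[1, 2, 3], [4, 5, 6]]))
```

Applied item by item, values in the range reported here (0.016 to 0.11) would indicate that roughly 2% to 11% of an item's variance is associated with group membership.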

DISCUSSION
Our study set out to examine the validity of cross-ethnic and cross-national comparisons of bullying and victimization rates obtained with the same instrument (i.e., the BVQ). First, we compared the ethnic/national groups, and the results revealed significant differences in involvement in bullying and victimization. Greek children were more likely to be involved in bullying as pure victims than Israeli Jewish, Israeli Palestinian, and German children, and as bullies and bully/victims than Israeli Palestinians, Palestinians in the Gaza Strip, Israeli Jewish children, and German children. Israeli Jewish children, on the other hand, were more likely to be involved as pure bullies than Israeli Palestinians and Palestinians in the Gaza Strip, and as victims than German children. Both Israeli Palestinians and Palestinians in the Gaza Strip were more likely to be involved as victims and bully/victims than German children, whereas German children were more likely to be involved as bullies. Finally, Israeli Palestinians and Palestinians in the Gaza Strip were more likely to be involved as bully/victims than Israeli Jewish children. No differences were found between Israeli Palestinians and Palestinians from the Gaza Strip in relation to the bullying subgroups. The odds ratios ranged from 1.65 to 6.53, indicating that the magnitude of the differences varied across group comparisons. Nonetheless, do these results mean that each specific difference found represents reality? Put another way, can we say, using one standard questionnaire, that a specific ethnic group is more or less likely to be a bully, victim, or bully/victim than another ethnic group?
To answer these questions, we deemed it necessary to perform structural equivalence and isomorphism analyses, examining the use of the bullying questionnaire within each ethnic group and assessing whether comparisons across these groups are valid. We first verified whether the hypothesized two-factor structure of the BVQ ("bullying" and "victimization") across all sub-samples (i.e., the individual-level structure) was similar to the structure within each ethnic group separately. We then tested whether the structure across all samples (i.e., the individual-level structure) also applied at the ethnic level. This was necessary to investigate the usefulness of our instrument and, indeed, to determine whether the initial conclusions drawn regarding the prevalence of bullying and victimization were appropriate.
The results showed that, at the individual level, the expected two-factor structure of the BVQ ("bullying" and "victimization") clearly emerged and that, after Procrustes rotation, the structure within each ethnic/national group was congruent with it, except in the Gaza Strip sample. This finding supports the internal structural equivalence of the questionnaire for each ethnic/national group with the exception of the Gaza Strip sample. Second, exploratory factor analysis of the ethnic-level structure revealed a one-factor structure with a congruence measure below 0.85. Thus, no evidence was found for structural isomorphism, and direct comparisons of the ethnic/national samples are not justified. Consequently, the structural equivalence and isomorphism analyses call into question, and effectively invalidate, the first section of results, in which we reported significant differences between ethnic/national groups (even within the same country, i.e., Israel). Moreover, the bullying questionnaire did not yield distinct bullying and victimization factors for the Gaza Strip sample.
Bullying is a recognized form of problematic behavior that has been investigated worldwide, in most cultures, ethnic groups, and countries; its characteristics are largely shared, although its types, forms, and nature vary (Smith et al., 2016a). To date, research comparing bullying across nations and ethnic groups has relied on specific methodological approaches. Comparisons of the rates and prevalence of specific bullying items or forms are often based on standard questionnaires translated into the appropriate languages. Although such studies can give some indication of differences between cultures or ethnic groups, the results reported here confirm that these findings need to be treated with caution. Statistical analysis can also serve as a tool to determine whether a cross-national or cross-ethnic comparison is valid and represents true differences between cultures, or even between ethnic groups within the same country.
FIGURE 3 | Overview of steps for the analysis of structural equivalence and structural isomorphism (Fontaine and Fischer, 2010). *ICC = intra-class correlation.
Of note, the first analysis, testing for structural equivalence, showed that the bullying tool used in the five studies captures two distinct behaviors, victimization and bullying, in every sample except the Gaza Strip. This indicates that the questionnaire can be used to measure bullying and/or victimization within each of these ethnic/national groups separately. For the Gaza Strip sample, however, the items did not separate into distinct bullying and victimization factors. There are several possible explanations. First, because of the political situation and the war in the Gaza Strip, the whole population has been exposed to traumatic events (e.g., house demolitions, the killing of relatives, injuries) and to a siege since 2007. Against this background, the events described by the bullying items may seem minor (Altawil et al., 2008; Abdeen et al., 2018; Samara, 2018, 2019). Second, further analysis of this specific sample is needed, examining different forms of bullying (physical, verbal, relational) rather than general bullying and victimization. Third, the result could also be related to the difficult economic situation in the Gaza Strip compared with the other four samples.
In contrast, the structural isomorphism analysis showed that direct comparisons of the ethnic/national samples are not justified. The results show how easily comparisons across groups can lead to spurious conclusions, and hence the need for preliminary analyses of each construct before evaluating group differences. Even within the same country (i.e., Israeli Palestinian and Israeli Jewish children), comparisons cannot be made because there is no evidence for structural isomorphism. Children and adolescents may perceive the meaning of the bullying items differently, so comparisons may not reflect true differences or similarities. Furthermore, translating a questionnaire into other languages requires validity testing to ensure that the questionnaire measures what it is intended to measure. The lack of isomorphism could also stem from procedural differences, such as how the studies were conducted in different countries and among different ethnic/national groups, how involved the researchers were, and how much explanation the participants received about the bullying items. Finally, country-level characteristics such as socioeconomic inequality (Chaux et al., 2009) or cultural values (e.g., individualism-collectivism; Smith and Robinson, 2019) may differ from one study to another.

Several limitations and issues warranting further research need to be considered when reviewing these results. First, these were convenience samples of different sizes, and some of them may not be nationally representative. Larger samples might provide more conclusive results (e.g., in the Gaza Strip). Another limitation of this study is that it relies on self-reports rather than behavioral measures of bullying. As such, the risk of selection effects and biases has to be taken into account. Current limitations of the methods must also be acknowledged.
For example, the conventional classification approach for bullying, which yields the common classes of "pure victim," "pure bully," "bully/victim," and "neutrals," might overestimate involvement (see Schultze-Krumbholz et al., 2015 for further information). As evident in the current manuscript, this approach has a range of methodological shortcomings (translation and perception of the word bullying, different designs, reference time frames, answer scales, cut-off points, and data analysis approaches; Sabella et al., 2013; Smith, 2014; Foody et al., 2017). More advanced methods for investigating measurement invariance, such as Multigroup Confirmatory Factor Analysis (MGCFA; e.g., Jovanović et al., 2019) or Multigroup Latent Class Analysis (MGLCA; e.g., Eid et al., 2003), are advisable and should be prioritized in future research. Nevertheless, we found exploratory factor analysis, as recommended by Fischer and Fontaine (2010), more suitable for the instrument used (i.e., the BVQ), despite the restricted sample sizes at the individual and cultural levels.

CONCLUSION
The statistical methodologies used in this study demonstrate the importance of the methodological approach adopted when comparing bullying and victimization across different cultures and ethnic groups. Various issues need to be considered when comparing countries, cultures, and ethnic groups (between and within countries). Furthermore, cultural differences in interrupting and perceiving peer bullying and/or victimization situations, as well as the internal and external validity of any study, need to be taken into account before different ethnic/national groups can be compared. Countries differ in many characteristics, such as educational policies, personal beliefs, attitudes, and values. Other factors to consider are linguistic issues related to the translation and definition of bullying in different cultures, and measurement invariance, which could be related to age and gender differences. Future analyses should also examine the different forms of bullying and victimization, including physical, verbal, relational, and cyber bullying. In addition, a failure to demonstrate invariance can itself be informative about how different groups interpret the same construct: some constructs are simply experienced very differently across groups.
The results of the current study highlight the fundamental need to take different aspects into account when comparing bullying and victimization between and within countries. This study contributes to the discussion of whether and how study results from different nations and/or cultures can be compared. Although standards for cross-cultural research have been defined for some time (e.g., Matsumoto and Van de Vijver, 2010), these standards have not yet become part of cross-national bullying research.
Bullying is a concern for children, parents, schools, and practitioners. These groups, as well as policy makers, educational practitioners, and researchers, should take the current results into account when attempting to compare different ethnic/national groups or even schools. The current results also call into question the common practice of adopting an anti-bullying intervention or prevention program from one cultural context for use in another. The results presented here suggest that the utility of such programs may also depend on cultural or ethnic values and perceptions (Smith et al., 2008, 2012).

DATA AVAILABILITY
The datasets for this manuscript are not publicly available because they are being used in other ongoing studies intended for publication. Requests to access the datasets should be directed to the authors of the manuscript (MS: M.Samara@kingston.ac.uk and HS: herbert.scheithauer@fu-berlin.de).

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the British Psychological Society guidelines. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The study in Greece was approved by the Ethical Committee of the University of Kingston London, United Kingdom. The studies in Israel and the Gaza Strip were approved by the Ethical Committee of Hertfordshire University, United Kingdom, and by the corresponding Ministries of Education in both countries. In Germany, the survey was conducted in accordance with the guidelines of the Institutional Review Board of the University of Bremen.

AUTHOR CONTRIBUTIONS
MS, HS, and KG contributed to the conception and design of the study. KG and MS organized the database and performed the statistical analysis. All authors contributed to the acquisition and interpretation of data for the work and all authors drafted the work and revised it critically for important intellectual content and approved the submitted version.