A Population-Based Psychometric Validation Study of the Strengths and Difficulties Questionnaire – Hebrew Version

This study presents the psychometric properties of the Strengths and Difficulties Questionnaire – Hebrew version (SDQ-H), used in the Israel Survey on Mental Health among Adolescents (ISMEHA). The SDQ-H was administered to a representative sample of 611 adolescents and their mothers. Structural validity was evaluated by exploratory and confirmatory factor analysis and the Development and Well-Being Assessment (DAWBA) inventory was used as “gold standard” to test convergent and discriminant validity. Internal consistency and normative scores were established. Agreement was found with the original factor structure, except for the Peer problem scale. Concurrent and discriminant validity varied from fair to very good for most scales. Total Difficulties scores showed better discriminant validity for the adolescents’ than the mothers’ report for internalizing disorders, and the opposite for externalizing disorders. Internal consistency for the Total Difficulties was 0.77 and for the Hyperactivity scale it was 0.73. It was lower for the other scales, particularly for the Peer problems scale. The findings suggest reasonable psychometric properties of the SDQ-H. Comparisons with other translated SDQ versions are presented.


2009
). The prevalence of any disorder among Israeli adolescents aged 14-17 was reported as 11.7% (Farbstein et al., 2010), comparable to the rates reported in prevalence studies of mental disorders conducted worldwide, which vary between 7 and 16.4% (Costello et al., 1997; Meltzer et al., 2003; Canino et al., 2004; Heiervang et al., 2007). Thus, we do not expect much variation in SDQ mean scores when comparing Israeli youth to those of the aforementioned countries. Cross-national differences should be assessed with the awareness that there may be reporting biases. For instance, a comparison of Norwegian and British parents and teachers found systematic under-reporting of emotional symptoms on the SDQ by Norwegian parents (Heiervang et al., 2008). This paper presents the psychometric properties of the SDQ-Hebrew (SDQ-H) versions for parents and adolescents. It includes normative scores, internal consistency, and construct, concurrent, and discriminant validity. Data were obtained from the Hebrew respondents of the Israel Survey of Mental Health among Adolescents (ISMEHA), conducted on a representative sample of Israeli adolescents and their mothers in 2004-2005 (Mansbach-Kleinfeld et al., 2010).

Materials and Methods
The survey population
The ISMEHA collected data from a nationwide, cross-sectional representative sample of 14- to 17-year-old Israeli adolescents living in the community, in urban settings with more than 2,000 inhabitants and meeting the status of legal residents according to the National Population Register (NPR). Initially 13- to 15-year-olds were sampled

Introduction
The Strengths and Difficulties Questionnaire (SDQ) is a multi-informant screening measure designed to detect mental health problems in children and adolescents (Goodman, 1999). It has parent- and teacher-report versions for 4- to 16-year-olds, and a self-report version for 11- to 17-year-olds. The tool is increasingly being used in community and clinical settings and in cross-cultural research thanks to its brevity, accessibility, and availability in the public domain (http://www.sdqinfo.com).
Structural validity varies, with some studies supporting the original five-factor structure (Hawes and Dadds, 2004; Woerner et al., 2004a), and others showing different patterns (Thabet et al., 2000; Koskelainen et al., 2001; Muris et al., 2003). Concurrent validity has been generally supported (Goodman and Scott, 1999; Muris et al., 2003) and SDQ scales have demonstrated fair to good discrimination between groups with and without Attention Deficit Hyperactivity Disorder (Yasong et al., 2008).
Regarding SDQ patterns and norms, it has been shown that populations that have a higher prevalence of mental disorders have higher mean scores in the SDQ scales (Goodman and Goodman,

1. Cities were ordered according to the size of the adolescent population 13-15 years of age.
2. The three largest Israeli cities (Jerusalem, Tel Aviv, and Haifa) plus another eight large cities were included in the sample with certainty. The adolescents in those cities were chosen through systematic random sampling and they were sampled in one step.
3. In all the other urban settlements the sampling was made in two steps. The cities were distributed into strata according to the two main types of locality (whether they were Jewish/mixed or mainly Arab-populated cities) and by six geographical regions (Jerusalem District, Tel Aviv District, Haifa District, Northern District, Central District, and Southern District) and ordered within each stratum according to size. Size referred specifically to the estimated number of 13- to 15-year-olds in the sampling frame in each city.
4. The urban settlements in the sample were chosen through systematic random sampling with a probability proportional to size so that the final sample in each stratum represented all the adolescents in that stratum.
5. All adolescents within each of the sampled localities were ordered according to age, gender, and geographic distribution within the city, in an attempt to represent all socio-economic groups. The adolescents in that city were sampled by a systematic random sampling method. The sampling probability within each city was calculated so that the final sampling fraction would be the same for the total sample.
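The selection of localities with probability proportional to size (step 4) can be illustrated with a minimal sketch. It is not the survey's actual procedure or software: the locality names and sizes are hypothetical, and the function `pps_systematic_sample` is introduced here only for illustration of systematic sampling along the cumulated size scale.

```python
import random


def pps_systematic_sample(sizes, n_draws, rng=None):
    """Systematic sampling with probability proportional to size.

    sizes: list of (name, size) pairs, already ordered within the stratum.
    Returns the selected names; a unit larger than the sampling interval
    can be selected more than once.
    """
    rng = rng or random.Random()
    total = sum(size for _, size in sizes)
    interval = total / n_draws
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + i * interval for i in range(n_draws)]

    selected = []
    cumulative = 0.0
    idx = 0
    for name, size in sizes:
        cumulative += size
        # Every selection point that falls inside this unit's size range picks it.
        while idx < n_draws and points[idx] < cumulative:
            selected.append(name)
            idx += 1
    # Guard against floating-point rounding on the very last point.
    while idx < n_draws:
        selected.append(sizes[-1][0])
        idx += 1
    return selected


# Hypothetical stratum: localities ordered by number of 13- to 15-year-olds.
stratum = [("A", 1200), ("B", 800), ("C", 500), ("D", 300), ("E", 200)]
sample = pps_systematic_sample(stratum, n_draws=3, rng=random.Random(0))
```

In this hypothetical stratum the sampling interval is 1,000, so locality "A" (size 1,200) is always selected, which mirrors the inclusion "with certainty" of the largest cities.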

The Strengths and Difficulties Questionnaire
The original SDQ was translated into Hebrew, considering cultural aspects, colloquialisms, and idioms to remove biases specific to the original tool. Bilingual researchers conducted the back-translation, which was compared to the original questionnaire (Goodman, 1999) and revised by the SDQ's author. Following a pilot study on 15 subjects, a final decision on the exact wording was made.

The SDQ is a screening diagnostic instrument designed for evaluating social, emotional, and behavioral functioning in children and adolescents aged 4-17 years. It includes five subscales: four of them refer to difficulties and one to the adolescents' strengths. Its 25 items cover four clinical domains, namely, hyperactivity-inattention, emotional symptoms, peer-relationship problems, and conduct problems, and one distinct prosocial behavior domain. Each item/statement is rated on a 3-point scale as 0 (not true), 1 (somewhat true), or 2 (certainly true). The questionnaire is multi-informant, i.e., it can be administered to adults (parents, caretakers, and teachers), and also includes a self-report version for adolescents aged 11-17. In addition to the clinical domains, the SDQ is supplemented with an impact module that asks the respondents to assess whether the adolescent in question has a problem, its chronicity, and whether this results in emotional distress, social impairment, or burden to the family. The psychometric properties of the SDQ, tested in different cultural contexts and in clinical and community settings, are satisfactory (Goodman et al., 2000b; Goodman, 2001; Vostanis, 2006). Most SDQ studies report Cronbach's alpha coefficients above 0.70 for

so that by time of interview they would be between 14 and 16 years of age. However, due to fieldwork constraints, not all the data were collected on schedule and therefore the age of the group shifted upward.
Not included in the survey population were adolescents residing in small rural settings, such as Kibbutzim or other collective settings (comprising 7.3% of the population in this age group); those in unrecognized Bedouin villages (1.7% in this age group); Palestinian residents of East Jerusalem (2.8% in this age group); and Jewish ultra-orthodox (Haredi) adolescents (17.8% in this age group). The Haredi adolescents were excluded before data collection began due to low response rates achieved in the pilot stage (4%), despite active efforts made to adjust the survey's methods to the specific requests of ultra-orthodox leaders. The ultra-orthodox Jewish children and adolescents attend schools where girls and boys learn separately and live in more or less segregated neighborhoods in several cities. They are not allowed to watch TV or films, to read secular newspapers, or to use the web, and thus parents and teachers have strong control over the information to which they are exposed. Therefore, the content of our survey was not deemed appropriate for this population by the religious leaders who were approached. The 245 Jewish ultra-orthodox adolescents excluded from the study were identified by the type of school they were attending and their home address, as they attend a separate educational system and live in specific neighborhoods. Children of migrant workers, a very small percentage of the population in this age group, were not included in the study, as most do not meet the status of legal residents in Israel.

The sampling frame
The sample was based on the NPR. The file, updated to August 2002, included the names of all residents, born in Israel or abroad, between July 1, 1987 and June 30, 1990; and demographic data such as home address, school of attendance, country of birth, and year of immigration, if relevant. New immigrants arriving in the country after August 2002 were not included in the sample. Deceased adolescents were removed from the sampling frame. The number of adolescents in this age group in the sampling frame (including Jewish ultra-orthodox adolescents) was 317,604.

Sample size and sampling probability
Based on epidemiological studies that reported prevalence of mental disorders among adolescents of between 12 and 20% at the time the investigation was planned (Achenbach and Howell, 1993; Verhulst et al., 1997; Surgeon General, 1999; Canino et al., 2004), a sample size of 1,000 adolescents (without Jewish ultra-orthodox adolescents) was calculated to enable the identification of "any mental disorder" with adequate statistical power. Given an expected response rate of 2/3, an initial sample size of about 1,500 adolescents was chosen. The average sampling fraction was 1/212. However, calculations were made according to a sampling fraction of 1/190 and therefore the final sample was not 1,500 but 1,670, a little larger than originally planned.
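The arithmetic behind these figures can be retraced with a short sketch. It assumes, as an interpretation rather than a statement of the authors' method, that both sampling fractions were applied to the full sampling frame of 317,604 adolescents reported above; the function names are illustrative only.

```python
FRAME_SIZE = 317_604          # adolescents in the sampling frame (from the text)
TARGET_RESPONDENTS = 1_000    # planned number of completed interviews
EXPECTED_RESPONSE_RATE = 2 / 3


def initial_sample_size(target, response_rate):
    """Sample to draw so that the expected number of respondents equals target."""
    return round(target / response_rate)


def expected_sample(frame_size, fraction_denominator):
    """Expected sample size for a sampling fraction of 1/fraction_denominator."""
    return frame_size / fraction_denominator


planned = initial_sample_size(TARGET_RESPONDENTS, EXPECTED_RESPONSE_RATE)
at_212 = expected_sample(FRAME_SIZE, 212)  # close to the planned 1,500
at_190 = expected_sample(FRAME_SIZE, 190)  # close to the reported 1,670
```

Under this assumption a fraction of 1/212 yields roughly the planned sample, while 1/190 yields roughly the 1,670 adolescents actually drawn.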

Sampling method
To increase the cost-efficiency of the study, we sampled localities with a minimum of 30 adolescents in this age group in the sample. All the urban settlements were distributed into sampling strata as follows:

The mother was asked to sign a consent form for herself and her child and both were informed of their right to stop the interview at any time.
Mothers were interviewed in Hebrew (N = 570), Arabic (N = 300), or Russian (N = 87), according to the preference of respondents. Adolescents were interviewed in Hebrew (N = 657) and Arabic (N = 300), assuming that all adolescents who had immigrated from the Former Soviet Union were fluent in Hebrew by the time of the interview. The analyses presented here were made only on mothers and adolescents who responded in Hebrew. Among these, 553 mothers and 611 adolescents filled in the SDQ questionnaire.

Training of interviewers
The 104 lay interviewers who participated in the data collection, mostly women, were college students and experienced survey interviewers. They were trained in small groups in an 8-h training session during which they went over the interview schedule, reviewed the questionnaires and were instructed regarding particular aspects of the different questions. After they completed the first three interviews, they went through the questionnaires with the fieldwork supervisor and reviewed any errors or omissions in each of the completed questionnaires.

Confidentiality and ethics
All interviewers signed a confidentiality form. All identifying information, except for the identity code, was stored separately from the questionnaire. Confidentiality could be infringed only for cases in which the adolescent reported sexual abuse or explicit suicidal intentions of which parents were unaware, cases that the law mandates should be conveyed to the appropriate authorities, though we found none. Parents signed consent forms for their own and their child's participation in the study, as approved by a Human Subjects Committee of the Schneider Medical Center for Children in Israel. The objectives and methods of the survey were explained to the adolescents, who could abstain from answering questions. Confidentiality was assured.

Response rates
Table 1 shows response rates by gender and population groups. Overall response rate was 68.2%: 14.8% of the subjects were not located and 17.0% refused to participate in the study. Among the located subjects, the response rate approached 80%. Response rates varied among population groups, with higher rates among boys than girls and higher rates among Druze, Muslim, and Christian respondents than among Jewish respondents. Also, there were higher response rates among adolescents living in mid-sized urban localities than among those living in larger cities.
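These rates can be approximately cross-checked against the counts given elsewhere in this section (957 respondents, 238 refusals, and 207 adolescents not located). The sketch below uses simple unweighted ratios with the sum of the three groups as the denominator; this denominator is an assumption made here, since the survey redistributed the cases that were not contacted, so small differences from the published figures are expected.

```python
# Counts reported in the text; the denominator (their sum) is an assumption.
counts = {"respondents": 957, "refusals": 238, "not_located": 207}


def percentage_shares(group_counts):
    """Return each group's share of the total, in percent."""
    total = sum(group_counts.values())
    return {group: 100 * n / total for group, n in group_counts.items()}


shares = percentage_shares(counts)

# Response rate among located subjects (respondents plus refusals).
located = counts["respondents"] + counts["refusals"]
response_among_located = 100 * counts["respondents"] / located
```

The resulting shares fall within about a tenth of a percentage point of the reported 68.2%, 17.0%, and 14.8%, and the rate among located subjects is close to 80%.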
Defining "respondents," "refusals," and "not contacted"
"Respondents." The subjects were classified as respondents if the interview of either the mother or the adolescent, or both, was performed. Twenty-two mothers refused to answer the questionnaire but signed and consented to their child's participation, and 50 adolescents refused, although their mothers answered the questionnaire. In these cases, we based our diagnoses on a single source of information. Thus, we have 885 cases with two informants and 72 cases with only one informant (either adolescent or mother).

the Total Difficulties score for the three types of informants (van Widenfelt et al., 2003), and the lowest internal consistency for the Conduct (Koskelainen et al., 2001; Ronning et al., 2004) and Peer problems scales (Goodman et al., 2000a), which may indicate that these scales measure "more heterogeneous content than intended" (Palmieri and Smith, 2007, p. 190).
The SDQ has been used internationally and translated into more than 60 languages, which are readily available at http://www.sdqinfo.com/. The questionnaire took from 5 to 10 min to complete.

The Development and Well-Being Assessment inventory
The Development and Well-Being Assessment (DAWBA) was used as the "gold standard" for testing the SDQ's concurrent and discriminant validity. It includes a package of questionnaires, interviews, and rating techniques used to generate ICD-10 and DSM-IV psychiatric diagnoses for children aged 5 to 17. The specific disorders assessed were categorized into "internalizing," "externalizing," or "any mental disorder" (Goodman et al., 2000a).
Both mothers and adolescents responded to the DAWBA inventory, constructed for its administration in the community, which combines some of the best features of structured and semi-structured measures. When definite symptoms were identified by the structured questions, interviewers used open-ended questions, and supplementary prompts to get parents or the adolescent to describe the problem in their own words (Goodman et al., 2000a). On the basis of the comments of both mothers and adolescents recorded by the interviewers, a team of psychiatrists confirmed or rejected the preliminary computerized diagnoses and a final single diagnosis for each adolescent was thus obtained. The Hebrew translation was performed by the same procedure as described above for the SDQ.

Survey mode
The survey used a face-to-face interview mode and was carried out at the respondents' homes between January 2004 and March 2005. Two trained interviewers interviewed the mother and adolescent simultaneously and independently. The mother was specifically targeted, as opposed to "any adult caretaker," because we assumed she is more frequently at home, more accessible and more aware of the health services used by the family members. As in other studies (Costello et al., 1996), we selected the mother as the adult respondent, unless she had not lived with the child for the 6 months preceding the interview.
A survey firm, Public Opinion and Marketing Research of Israel (PORI), employed interviewers and supervised the fieldwork, together with the staff of the Ministry of Health. The face-to-face mode was particularly important given the sensitive questions asked and the length of the interview. On average the mothers' interview took between 50 and 90 min and the adolescents' interview between 45 and 75 min, depending on the history of disorders of the adolescent.
Efforts were made to confirm the address and telephone number of families of all the adolescents included. An introductory letter explaining the objectives, randomization methods, and confidentiality of the data and providing a phone number for possible queries was sent to each pre-designated target family. A week later, the interviewer arranged an interview date by phone and at the assigned date two interviewers made a home visit.

explained above, as well as those who left the country. Respondents and refusals were included in the study population and those not contacted were proportionally divided into refusals and those who should not have been included in the study population from the start.

Quality control procedures
Data quality was controlled in a number of ways. Parents had to sign an informed consent form and, therefore, we could corroborate that the interview had indeed taken place. A few parents refused to sign the form out of fear that this would obligate them to something later on but agreed to carry out the interview. These parents were contacted again to make sure they had agreed to be interviewed. All questionnaires were reviewed upon reception by the field coordinator and missing data that could be provided by the interviewers were retrieved as soon as possible.

Data analysis
Analyses on unweighted data were performed using the SPSS 17 software package (SPSS Inc., Chicago, IL, USA). Internal consistency of the SDQ-H and its distinct scales was assessed with Cronbach's alpha. An exploratory factor analysis (EFA) was performed to identify underlying factors and assess construct validity of the SDQ. This was followed by confirmatory factor analysis (CFA) in AMOS. Although construct validity is not synonymous with factor structure, we use the term construct validity here for consistency with Kline (1986) and Thompson (2004) and with the understanding that factor analysis is a commonly used way to assess the extent of construct validity.
Pearson product moment correlations between SDQ underlying scale constructs were calculated and t-tests were used to compare SDQ mean scores between DAWBA cases and non-cases (Farbstein et al., 2010) to test concurrent validity. Discriminant validity was assessed using receiver operating characteristic (ROC) curves, employing area under the curve (AUC) as an index of discriminant ability. Distributions of raw scores were used to determine the cut-off scores to identify normal, borderline, and abnormal bandings.

"Refusals." After sending an introductory letter, the family was contacted in order to convince both mother and adolescent to participate in the study. If the family refused, a second letter signed by the Mental Health Advisor to the Minister of Health (IL) was sent, appealing to them to contribute to the public good. If they still refused, the survey coordinator again tried to convince them to participate. Only after the third refusal were they classified as "Refusals" (N = 238).
"Not located." If the introductory letter was returned by the post office, the interviewer visited the address to corroborate that the family had moved and to try to find out from neighbors the new address. If the interviewer could not obtain additional information, the school principal was approached and, with the help of an official letter from the Ministry of Education, was requested to help us find the adolescent's address and home telephone. If the child had left the school and the school authorities did not have any information, we approached the Ministry of Interior and tried to find out the new address of the family. If we failed, the case was considered "Not located" (N = 207).

Inflation method and response rates
The sample was weighted back to the total population to compensate for unequal selection probabilities resulting from clustering effects and non-response. The weights were adjusted to make weighted sample totals conform to known population totals taken from reliable Central Bureau of Statistics (CBS) sources, after Jewish ultra-orthodox adolescents were removed. The weighted groups were chosen according to gender, age, and population groups. The categories were gender (male and female), age (14, 15, 16, 17 years), and population group (Jews and others born in Israel; Jews and others born abroad; Muslim and Christian Arabs; and Druze). The inflation method for each group of individuals was determined according to the known characteristics of the respondents and non-respondents in the given group. Response status was assigned according to the following categories: (a) respondent; (b) left the country; (c) refused to answer; (d) ultra-orthodox; (e) not located. Jewish ultra-orthodox youth were excluded from the population, as

Exploratory factor analysis was performed on the SDQ-H mother version (Figure 1) and on the self-version (Figure 2). In the current EFA, principal components analysis (PCA) was used as the algorithm to maximize the variance explained (Gorsuch, 1993). Varimax rotation was applied as it maximizes the squared loadings of a factor, gives similar results to many oblique factorial solutions, and produces an orthogonal solution (see Gorsuch, 1993). To identify the number of factors, a scree plot was generated of PCA of the actual SDQ data superimposed on data values from 20 (based on previous recommendations) simulated datasets of

Results of the fieldwork
Table 2 presents the sociodemographic characteristics of the survey participants who answered in Hebrew. The proportion of boys was slightly higher than that of girls and about 17% of adolescents lived with their divorced or single mother.
The majority of respondents lived in families with up to three children. Almost half of the respondents' mothers had 13 years or more of education. Nearly 17% of the respondents' fathers did not work in paid employment, whether due to a physical or mental disability, because they were pensioners, still studying, or in prison, or because they did not find work. Over three quarters of the adolescents were born in Israel.

Mean scores
Mean scores for the SDQ-H samples and comparisons by gender are presented in Table 3. The self-report was more gender-specific than the mothers' report. Table 3 shows that mothers attributed higher mean scores to girls regarding emotional symptoms and prosocial behaviors, while they attributed higher mean scores to boys regarding hyperactivity. Among adolescents, the findings show that girls attributed themselves higher scores in Total Difficulties, Conduct problems, Hyperactivity, Prosocial behaviors, and Impact, while boys attributed themselves higher scores in the Emotional symptoms scale. We compared mean scores of mothers' ratings with self-ratings and found that they were significantly different for all scales (data not in table). Mothers attributed higher Prosocial scores to their child than did the adolescents themselves (F = 5.114; p < 0.001), while adolescents rated themselves with significantly higher scores

Figure 1 | Exploratory factor analysis of SDQ-H scales in the parent version.
was split into two distinct factors: "restless" and "fidgeting" loaded on the hyperactivity factor, while "distracted," "thinks things out," and "attention span" loaded on the attention deficit factor. Additional items, originally belonging to other scales, also loaded with the hyperactivity (i.e., "tantrums" and "fights") or the attention deficit (i.e., "obedient") factors. The Peer problems scale did not retain its original structure, as its items loaded on other factors. Also for the self-version the Emotional problems factor retained its original structure (Figure 2). The Hyperactivity scale was also here divided into two factors, one representing the more external behaviors and the other the attention elements. The Peer problems scale was cross-loaded to all the other factors and did not retain its original structure. Three of the items in the conduct scale were cross-loaded to the Hyperactivity scale.
Confirmatory factor analysis was performed to estimate the goodness of fit between the SDQ structure and the sample data. Five factors were specified. Loadings on all items were significant; no errors were set to correlate, and the post hoc improvement indices were not used. Goodness of fit was examined using the χ² statistic and the root mean square error of approximation (RMSEA). A significant χ² value means that the specified model deviates significantly from the data (i.e., misfit). RMSEA was chosen since it rewards model parsimony, is generally insensitive to sample size and, unlike most fit indices, has confidence intervals available around the point estimate. Good model fit is indicated by an RMSEA value under 0.06, and values below 0.08 indicate moderate fit. Lower values reflect a better fit to the model. Results show that a five-factor model was a reasonable fit to the data for the parent version (χ² = 1096.71, df = 289, p < 0.01; RMSEA = 0.07, 90% CI 0.066-0.075), and a good fit for the self-version (χ² = 612.413, df = 269, p < 0.01; RMSEA = 0.047, 90% CI 0.042-0.052). The final model consisted of four moderately to highly correlated factors: emotional, conduct, hyperactivity, and peer related problems. A fifth and distinct prosocial factor was negatively related to the four problem factors.
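As a rough check on the reported point estimates, the sketch below applies the commonly used formula RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))). The sample sizes (553 mothers and 611 adolescents who completed the SDQ) are taken from the text, but whether these were the exact Ns used in AMOS is an assumption, so only approximate agreement should be expected. The label "outside both bands" is introduced here for values of 0.08 or more, which the text does not name.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def fit_label(value):
    """Bands as described in the text: under 0.06 good, below 0.08 moderate."""
    if value < 0.06:
        return "good"
    if value < 0.08:
        return "moderate"
    return "outside both bands"


# Assumed Ns: SDQ completers reported in the text.
parent_rmsea = rmsea(1096.71, 289, 553)
self_rmsea = rmsea(612.413, 269, 611)
```

With these inputs the parent version falls in the moderate band and the self-version in the good band, in line with the fit reported above.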

Construct distinctness
Pearson correlations between the distinct SDQ-H dimension scores showed, for the parent version, a positive association of conduct problems with hyperactivity (r = 0.54, p < 0.001) and emotional symptoms (r = 0.40, p < 0.01) scores. Although, emotional

equal row and column size to the actual dataset (Gorsuch, 1993). Figure 3 depicts the eigenvalues of the SDQ data plotted against the simulated eigenvalues. This indicates that the eigenvalues for the first five components in the data exceed the simulated eigenvalues, thereby providing further support for the five-component solution. To guide interpretation of the results of EFA, values in excess of 0.3 were used, as is common for self-reported data, and for CFA, statistical significance, model fit, and significant loadings were used.
For the mother SDQ-H version (Figure 1) all items in the Emotional Symptoms and Prosocial Behavior scales loaded on the corresponding factors. The Hyperactivity-inattention scale

We compared adolescents with and without a mental disorder according to DAWBA, using ROC analysis, where sensitivity and specificity are combined and provide an AUC, with values varying between 0 and 1 (Table 6). Yasong et al. (2008), who conducted a similar study in China, suggest that a score of <0.6 indicates that discrimination is no better than chance; between 0.6 and 0.75, that it is fair; between 0.75 and 0.90, that it is good; and above 0.90, that it is very good. We found that all the scales, except the Prosocial Behavior scale, discriminated fairly between those with and without any mental disorder. The discriminant power of the Emotional Symptoms and Peer problems scales in the self-reported version vis-à-vis internalizing disorders was good and fair (0.84 and 0.65, respectively), but they had only fair discriminant power in the parent version. The discriminant power of the Conduct problems and the Hyperactivity scales in the parent version vis-à-vis externalizing disorders was good and very good (0.77 and 0.91, respectively), whereas in the self-reported version it was fair (0.73 and 0.75, respectively).
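To make the AUC bands concrete, the following sketch computes an AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counted as one half) and then labels it with the bands attributed to Yasong et al. (2008). The scores are invented for illustration, and treating the upper limit of each band as inclusive is an assumption chosen so that 0.75 is labeled "fair", as in the text.

```python
def auc(case_scores, noncase_scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(noncase_scores))


def discrimination_band(value):
    """Bands described in the text; upper limits are inclusive (an assumption)."""
    if value < 0.6:
        return "no better than chance"
    if value <= 0.75:
        return "fair"
    if value <= 0.90:
        return "good"
    return "very good"


# Hypothetical scale scores for DAWBA cases and non-cases.
example_auc = auc([3, 4, 5], [1, 2, 3])
example_band = discrimination_band(example_auc)
```

Applied to the values reported above, `discrimination_band` labels 0.84 and 0.77 as good, 0.65, 0.73, and 0.75 as fair, and 0.91 as very good.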

Normative scores
We used Goodman's trichotomy for the SDQ screening (Goodman et al., 2004), according to which about 10% of the population should be classified as "probable cases," another 10% as "possible cases," and the remaining 80% as "unlikely cases." The cut-off points were contextually based, according to the prevalence of any mental disorder in this age group in the country of concern. In Israel the prevalence of any mental disorder among 14- to 17-year-olds is 12% (Farbstein et al., 2010); therefore, the top band was defined as the top 10-12%. The next band down was chosen to be the same size, and the rest was considered to be in the "unlikely" category. Given that we had a limited sample size, the observed percentage of adolescents within each range does not conform exactly to the expected percentage. It is to be expected that with a larger sample size, the cut-off points could be closer to the 10%-10%-80% proportions. Table 7 shows the normative scores for detecting "caseness" for both informant versions of the SDQ and the percentage of adolescents falling within each category. No significant differences by gender in the percentage of probable diagnoses were found for either the parent or the self-versions. These norms were not different for the Israeli population answering in Hebrew than for the total Israeli sample.
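One way to derive such bands from a distribution of raw scores is sketched below. It is not the procedure used to produce Table 7: the data are synthetic, and the rule of picking the integer cut-off whose upper-tail share is closest to the target is an assumption made for illustration. Because scores are discrete, observed shares will generally only approximate the 10%-10%-80% split, as noted above.

```python
def tail_share(scores, cutoff):
    """Proportion of scores at or above the cut-off."""
    return sum(s >= cutoff for s in scores) / len(scores)


def closest_cutoff(scores, target_share):
    """Observed score whose upper-tail share is closest to target_share."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda c: abs(tail_share(scores, c) - target_share))


def banding_cutoffs(scores, top_share=0.10):
    """Return (possible_cutoff, probable_cutoff) for two equally sized top bands."""
    probable = closest_cutoff(scores, top_share)
    possible = closest_cutoff(scores, 2 * top_share)
    return possible, probable


def classify(score, possible_cutoff, probable_cutoff):
    if score >= probable_cutoff:
        return "probable"
    if score >= possible_cutoff:
        return "possible"
    return "unlikely"


# Synthetic Total Difficulties scores: each value 0-19 occurs ten times.
synthetic_scores = [i % 20 for i in range(200)]
cutoffs = banding_cutoffs(synthetic_scores)
```

With real data the two cut-offs can coincide when many respondents share the same score, in which case the bands would have to be set by judgment.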
symptoms scores were associated significantly and positively with peer problems and hyperactivity scores (r = 0.35 and 0.27, p < 0.01; respectively), and between peer-and conduct problems scores (r = 0.23, all p < 0.05), the associations were small in magnitude (see Table 4). For the self-version (in the parentheses), Pearson correlations show very similar results in magnitude, direction and significance.

Concurrent validity
Concurrent validity was assessed against independent psychiatrists' diagnoses according to DAWBA (Farbstein et al., 2010). Table 5 shows that mean scores for the mother- and self-versions were significantly higher for adolescents with an internalizing disorder than for those without, on all scales (p < 0.001), except for the Prosocial scale. Mean scores for the mother's and self-reports on the Conduct problems and Hyperactivity scales were higher for those with an externalizing disorder. The Total Difficulties score and all the SDQ scales showed significantly higher scores when any mental disorder was present, except for the Prosocial self-report.

non-clinical settings describe themselves as having more problems, while those in a clinical sample are prone to under-report their mental problems (Becker et al., 2004).

Internal consistency
In both informant versions, the Total Difficulties and Hyperactivity scales showed acceptable internal consistency, while the Emotional symptoms, Conduct problems, and Prosocial behavior scales had a somewhat lower internal consistency. The Peer problems scale had an unacceptable reliability. The known low reliability of the Peer problems scale (Goodman, 2001; Shojaei et al., 2009) has been attributed to the possibility that it measures a heterogeneous construct (Palmieri and Smith, 2007). The disappearance of the Peer problems scale as a factor in our EFA, added to its low internal consistency, supports this explanation.
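For readers unfamiliar with the statistic, a minimal sketch of Cronbach's alpha follows, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The study computed alpha in SPSS; the function and the item ratings below are illustrative assumptions, not study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical five-item scale rated 0, 1, or 2 by each respondent.
consistent = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2]]
mixed = [
    [0, 1, 0, 2, 1],
    [1, 0, 2, 1, 0],
    [2, 2, 1, 0, 1],
    [1, 1, 1, 1, 2],
    [0, 2, 2, 1, 0],
]

alpha_consistent = cronbach_alpha(consistent)
alpha_mixed = cronbach_alpha(mixed)
```

When all items move together, as in `consistent`, alpha reaches its maximum of 1; items that vary independently of one another, as in `mixed`, pull alpha down, which is the pattern the heterogeneous-construct explanation predicts for the Peer problems scale.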

Construct validity
Our findings suggest that, except for the Peer problems scale, which cross-loaded with all other factors, and to a lesser extent the Conduct problems scale, of which three items cross-loaded with the hyperactivity factor, the factors underlying SDQ-H scales are distinct constructs.
In accord with studies conducted in different settings and languages (Hawes and Dadds, 2004; Woerner et al., 2004b), we found that the final model consisted of four moderately to highly correlated factors comprising the Total Difficulties score, and a fifth and distinct prosocial factor inversely related to the four problem factors. This five-factor model was a reasonable fit to the data for both SDQ-H versions. The CFA shows that the emotional and prosocial factors retained their original structure whereas the hyperactivity-inattention scale split into two distinct factors. Cross-loading of the "obedience" item

Discussion
This study shows that, in general, the SDQ-H has acceptable to good internal consistency, and construct, concurrent, and discriminant validity. The Hyperactivity and Emotional symptoms scales were the most robust, while the Peer problems scale was the weakest. The Total Difficulties score for both parent and self-versions had good reliability and discriminant validity. These results are important to determine the viability of using the SDQ-H as a screening measure in community surveys and as an evaluation measure for clinical practice.

coMparability of Mean sdQ-h scores
Similarly to the Dutch (van Widenfelt et al., 2003), our mean scores were lower than the British on the Emotional Symptoms and Hyperactivity scales and Total Difficulties scores. This variation could reflect age differences, lower expectations and higher tolerance for misconduct, or more lenient judgment of problematic behaviors and traits by parents in the different populations (Marzocchi et al., 2004).
Unlike the French (Shojaei et al., 2009), but consistent with the Dutch (Muris et al., 2003) and Finnish (Koskelainen et al., 2001) studies, girls scored higher on both informant versions of the Total Difficulties scale. The discrepancy between the studies may be attributed to age grouping, as the French study included younger children, while our sample included senior adolescents, among whom the likelihood of internalizing disorders among girls is higher (Farbstein et al., 2010). Consistently with most studies, mothers scored their boys higher on the Hyperactivity scale and their girls higher on Prosocial Behavior. The self-version revealed no gender differences.
Other studies (Koskelainen et al., 2001) have also found a tendency for youth to rate more symptoms than parents, noticing that, compared to their parents' or teachers' , children and adolescents in

PArenT-rePorT
Western countries that consistently support the original five-factor structure (Goodman, 2001;van Widenfelt et al., 2003), have also found limited associations between the "obedient" item and the conduct problems factor, with which it was originally associated. The low internal consistency of the Peer problem scale may also affect attempts to replicate previous factor analyses, since "unreliable items … can … lead to different factor structures" (Ozer et al., 2009, p. 918). A study conducted in China (Yasong et al., 2008), found low validity for the Conduct and Peer problems scales and attributed this to greater cross-cultural acceptance and consistency of how prosocial behavior, emotional disorders, and hyperactivity should be expressed than about the types of behaviors indicative of positive peer relationships or of oppositional and conduct problems.

construct distinctness
Correlations between SDQ scales scores were small to moderate in magnitude, which indicates their distinctness. The Total Difficulties score was highly correlated with the Hyperactivity, Emotional problems, and Conduct problems scales and this indicates a large contribution of these scales to the summary scale.

concurrent validity
The SDQ Hyperactivity and Conduct problems scales were significantly associated with externalizing disorders according to DAWBA. Regarding internalizing disorders, all four problem scales differentiated well, even when excluding adolescents with co-morbidity. In sum, the correlations between SDQ scales' predictions of psychopathology and the diagnoses made with DAWBA showed sound external validity.

discriMinant validity
As expected, (Goodman, 2001;Goodman et al., 2004), ROC analyses showed that parents discriminated better than their children regarding externalizing disorders, while adolescents discriminated better regarding their internalizing disorders (Van Roy et al., 2008). Mothers had good discrimination for Conduct problems scale and very good discrimination for Hyperactivity scale vis-à-vis externalizing disorders. Adolescents showed good discrimination for the Emotional symptoms scale and fair discrimination for the Peer problems scale vis-à-vis internalizing disorders.
Although better predictions are obtained when the multiinformant algorithm, based on both symptoms and impact scores is used since the total error rate is reduced (Goodman et al., 2004), our results, based on single sources, support the SDQ-H's power to discriminate between adolescents with and without a mental disorder.

Comparing self and mothers' reports
Given the SDQ's widespread use in the community and in clinical settings, an important question is whether information gathered only from adolescents is as reliable and valid as data received from their parents. We found that the adolescents' version had as good internal consistency as the mothers' and even better construct validity. Total Difficulties showed better discriminant validity for the to the inattention factor may mean that, in this population, disobedience is not attributed to the adolescent's will to break a norm but rather to his/her inability to pay attention and understand what is being demanded. It has been argued that this item could have a different function or meaning among Middle Eastern than among Western children (Thabet et al., 2000), though studies carried out in adolescents' than the mothers' version for internalizing disorders but the opposite for externalizing disorders. Satisfactory reliability and good construct and concurrent validity were found for both versions. CFA showed better fit for the adolescents' than for the parent-report. Concurrent validity equally discriminated between disorders in the SDQ-H parent and self-reports. The significant contribution of the self-version to diagnostic status, particularly if there is no other source of information, has been reported (Becker et al., 2004). Although ADHD should be diagnosed when symptoms are present in two or more settings and thus the information provided by mothers and teachers is necessary (Goodman et al., 2004), we found good discriminant scores in the Hyperactivity scale for the self reports. We conclude that adolescents' self-ratings contribute to the diagnosis and the use of the self-report version when no other possibility exists is valid.
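To make the informant comparison concrete, the sketch below computes the area under the ROC curve (AUC) as the probability that a randomly chosen case scores higher than a randomly chosen non-case, counting ties as one half. This is the standard rank-based definition of the AUC and is not necessarily the software procedure used in this study. The function name `auc` and all scores are hypothetical.

```python
def auc(case_scores, noncase_scores):
    """Rank-based AUC: P(case score > non-case score), ties counted as 0.5."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical Total Difficulties scores for the same adolescents,
# grouped by diagnostic status, as rated by two informants.
self_cases, self_noncases = [18, 15, 21, 14], [9, 12, 7, 14, 10]
mother_cases, mother_noncases = [13, 10, 16, 9], [8, 12, 6, 11, 10]

auc_self = auc(self_cases, self_noncases)
auc_mother = auc(mother_cases, mother_noncases)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases, so the informant with the larger value discriminates better for that disorder group.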

Cultural, semantic, and language differences
Questions beyond sample size and the age range examined arise when the SDQ is translated for use in cultures markedly different from the one in which it originated. Three possible explanations for the incomplete agreement between our EFA and the original SDQ structure are: (a) there are inherent problems in the scale, which measures more heterogeneous content than intended (Palmieri and Smith, 2007); (b) different cultural norms, social desirability and parental expectations produce different results (Marzocchi et al., 2004); (c) the translation suffers from lack of semantic equivalence (Flaherty et al., 1988). A French study claims that the item "bullied" was rated "partly true" more frequently by French than UK parents and that the "difference could be explained by the translation of the corresponding item rather than the transcultural differences, given the French translation of that item has a milder meaning than the corresponding English item" (Shojaei et al., 2009, p. 745).
Although a comparative study found that a predictive algorithm developed for use in England worked well in Bangladesh despite the difference in language, culture and socio-economic circumstances (Goodman et al., 2000a), the question remains whether the algorithm, which can predict diagnoses in clinic samples with a good degree of accuracy, is equally useful when applied to community samples.

Limitations
The sample size might be relatively small for certain analyses. EFA requires a large sample before it settles into a reliable pattern and the normative data for the different subscales and versions require large sample sizes in order to present very distinct cut-off points. Another potential limitation is that analyses were performed independently on mothers and adolescents and therefore we could not test in this population whether SDQ predictions work best when they are multi-informant (Goodman et al., 2004).
Regarding concurrent validity, which used the DAWBA as the gold standard to assess the SDQ-H, there is a "small potential for circularity" (Goodman and Goodman, 2010, p. 7) between the SDQ and the DAWBA because high SDQ scores may lead to administering in full some DAWBA sections which do not screen positive on the DAWBA's own screening questions. "Collecting this additional DAWBA information is occasionally the basis for assigning diagnoses which would otherwise have been missed. This cannot explain the results observed, however, as a strong association with prevalence remained after excluding the mean scores of children with a disorder" (Goodman and Goodman, 2010, p. 7).

Conclusion
The psychometric validation of the SDQ in a non-Western cultural context and language contributes to the general efforts to better understand the value of standardized evaluation measures in clinical practice and community settings.
Both parent- and self-reports seem to discriminate fairly well between cases and non-cases, as shown by the fact that mean SDQ-H scores were significantly higher for adolescents with an internalizing disorder than for those without, on all scales, and mean scores on the Conduct problems and Hyperactivity scales were higher for those with an externalizing disorder. The Total Difficulties score and all the SDQ scales showed significantly higher scores when any mental disorder was present. Regarding reliability, although scores vary for the different SDQ-H scales, the Total Difficulties score and the Hyperactivity scale for both mothers and adolescents had adequate alpha scores, above the 0.7 threshold which indicates acceptable internal consistency. A Cronbach's alpha of 0.6 implies 60% true-score variance and 40% error variance. Accordingly, 0.6 is reasonable for a screening measurement at best. Therefore, we attribute acceptable reliability to the Emotional symptoms scale for the self-version. The Peer problems and the Conduct problems scales, as in other studies, showed the lowest reliability (Goodman et al., 2000a; Koskelainen et al., 2001; Ronning et al., 2004), and the factor analyses also showed that some of the items belonging to these scales cross-loaded with other factors.
The current findings point to reasonable SDQ-H psychometric properties and we believe they are sufficiently promising to warrant further evaluation of SDQ-H as a screening tool. Given the fact that parents are only partially aware of their children's internalizing disorders and the adolescents are only partially aware of their own externalizing disorders, as shown with the ROC analyses, we highly recommend a multiple-informant approach that also includes teachers' ratings.

Acknowledgments
This survey was supported by the Israel National Institute for Health Policy and Health Services Research (No. 25/2000), the Association for Planning and Development of Services for Children and Youth at Risk and Their Families (ASHALIM), the Englander Center for Children and Youth of the Brookdale Institute, and the Rotter Foundation of the Maccabi Health Services, Israel. Dr. Alexander M. Ponizovsky was supported in part by the Ministry of Immigrant Absorption of Israel. The authors also wish to acknowledge the contribution of Itzhak Levav, MD, MSc and Daphna Levinson, PhD, to the planning and execution of this project. Daphna Levinson was responsible for the translation of the SDQ and DAWBA from English into Hebrew and Anneke Ifrah made the back-translations.