Psychometric Properties of the Italian Version of the Young Schema Questionnaire L-3: Preliminary Results

Schema Therapy (ST) is a well-known approach for the treatment of personality disorders. This therapy integrates different theories and techniques into an original and systematic treatment model. The Young Schema Questionnaire L-3 (YSQ-L3) is a self-report instrument, based on the ST model, designed to assess 18 Early Maladaptive Schemas (EMSs). During the last decade, it has been translated and validated in different countries and languages. This study aims to establish the psychometric properties of the Italian version of the YSQ-L3. We enrolled two groups: a clinical group (n = 148) and a non-clinical group (n = 918). We investigated the factor structure, the reliability, and the convergent validity with anxiety and depression measures in both groups. Cronbach's alpha was high for all the schemas. None of the factor models seemed highly adequate, although the hierarchical model proved to be the most adequate one. Furthermore, the questionnaire was able to discriminate between clinical and non-clinical groups and could be a useful tool in clinical practice. Limitations and future directions are discussed.

ST was developed as the clinical application of Young's (1994) schema theory. It is an integrative therapy, mixing elements of different approaches such as Cognitive-Behavioral Therapy, Gestalt therapy, Attachment Theory, Object Relations Theory and emotion-focused models (Young, 1994). Influenced by these theories, Young and colleagues (Young, 1994; Young et al., 2003) developed the concept of "Early Maladaptive Schemas" (EMSs), defined as broad, pervasive, trait-like, cognitive and emotional self-defeating patterns concerning beliefs about the self, others and the future. According to the ST model, EMSs derive from noxious early childhood experiences with primary caregivers and are established by unmet core emotional needs (Young et al., 2003), as well as from peer relations during childhood and adolescence (Mash and Dozois, 2003; Renner et al., 2013). Empirical evidence seemed to support the association between early relational experiences and EMSs (e.g., Muris, 2006; Wright, 2007) as well as between schemas and psychopathology symptoms such as depression and anxiety in adulthood (Halvorsen et al., 2009; Hawke et al., 2011; Renner et al., 2012; Riso et al., 2017) or in youth (Van Vlierberghe et al., 2010; Balsamo et al., 2015c), even though some authors maintained that infant attachment may be an overrated predictor (e.g., Meins, 2017).
The current list of EMSs consists of 18 schemas, which have been identified in the general population as well as in clinical groups (Young, 1994). The 18 EMSs have been grouped into five broad categories of unmet emotional needs called "schema domains." These broad categories are: disconnection and rejection, impaired autonomy and performance, other-directedness, over-vigilance and inhibition, and impaired limits (Young et al., 2003).
The Young Schema Questionnaire (YSQ; Young and Brown, 1994) is a self-report measure developed to assess EMSs within the ST framework. It is used as a clinical instrument in psychotherapy and as a research measure in developmental psychopathology studies. The first YSQ-Long Form consisted of 205 items, representing the 16 EMSs listed by the authors. After a psychometric revision of the EMSs (Schmidt et al., 1995), 18 EMSs were operationally defined (Young et al., 2003) and a new YSQ-Long Form was developed. This Third Edition (YSQ-L3; Young and Brown, 1994) consisted of 232 items. According to a literature review (Oei and Baranoff, 2007), although the Third Edition underwent many revisions, no consistent factor structure emerged for the YSQ-L3.
Although the psychometric properties of the YSQ have been tested in different languages and groups (clinical and non-clinical participants), almost all of the studies employed the short form or the previous forms, which are not comparable with the YSQ-L3. Furthermore, to the best of our knowledge, this is the first study in Italy that explores the structural validity of the YSQ-L3 by means of Confirmatory Factor Analysis.
In this study, we examined the reliability and structural validity of the 18 schema scales, as measured by the YSQ-L3. We specifically tested its structural validity by investigating, by means of Confirmatory Factor Analysis, whether the five correlated first-order factor structure proposed by the test developers (Young et al., 2003) could be replicated in two Italian groups (clinical and non-clinical subjects). We also tested the one-factor model recently found in the Italian version of the YSQ-L3 via Exploratory Factor Analysis. Since the findings in the current literature on the YSQ-L3's latent factor structure were inconclusive (Oei and Baranoff, 2007), we also tested a bi-factor model, strongly suggested by Kriston et al. (2012) for the YSQ-SF3, in which each of the 18 schemas loaded on its own domain and on one global factor, called "Psychopathology." Finally, we tested the second-order model with five first-order factors according to Young's model as well as a general second-order factor.
We also investigated the reliability of the YSQ-L3, as well as its convergent validity by computing associations between the YSQ-L3 and concurrent measures of anxiety and depression. In addition, we carried out a Multigroup Confirmatory Factor Analysis (MG-CFA) to test measurement invariance of the YSQ-L3 with respect to groups of subjects with and without psychological syndromes. Furthermore, false positive (FP) risk values were calculated to discriminate between non-clinical and clinical subjects.

Participants
Participants ranged between the ages of 18 and 89 and had the capacity to complete self-administered questionnaires. This group was the same one used for the Italian norms in a previous study. Inclusion criteria for the clinical group were the existence of a psychiatric diagnosis and an age of 17 years or older. Exclusion criteria included ongoing psychotic symptoms, serious physical illnesses and major central nervous system disorders (e.g., Alzheimer's disease and Parkinson's disease). Participants were 1,112 Italian subjects: 157 clinical and 955 community participants. Forty-six were excluded from the analyses (9 clinical and 37 non-clinical subjects) because they had missing values ≥10% on the EMSs, leaving 148 clinical and 918 non-clinical participants. Missing values below 10% were replaced with the average value of each schema.
The clinical group was recruited through private practice (N = 49; 33.1%), private psychiatric hospitals (N = 13; 8.8%), public psychiatric hospitals (N = 23; 15.5%) and mental health departments (N = 63; 42.6%). Diagnoses were made according to the Diagnostic and Statistical Manual of Mental Disorders standards (DSM-IV-TR; American Psychiatric Association, 2000) by accredited psychiatrists and psychologists. The patients included in this group were diagnosed as follows: 56.8% (N = 84) received a diagnosis of a disorder on DSM-IV-TR Axis I, 15.5% (N = 23) received a diagnosis of a disorder on DSM-IV-TR Axis II and 20.9% (N = 31) received a comorbid Axis I/Axis II diagnosis. For 6.8% (N = 10) of the clinical group, no information about the diagnosis was available.
The non-clinical group was recruited through advertisements posted in established community groups (e.g., youth centers, church groups, university student associations). Study participants contributed voluntarily and anonymously. Each participant anonymously completed the questionnaire packet and gave informed consent prior to being included in the study.

Instruments
All participants were administered the Italian versions of the Young Schema Questionnaire Long Form, Third Edition (YSQ-L3), the Teate Depression Inventory (TDI), and the State-Trait Inventory for Cognitive and Somatic Anxiety (STICSA). All respondents completed paper-and-pencil versions of the questionnaires in a fixed order (a socio-demographic checklist, the YSQ-L3, the TDI, and the STICSA) on site at established community groups. The protocol was administered by licensed psychologists who received a brief training in which the objectives of the research, the characteristics of the instruments administered and common issues in the psychological assessment of adults were explained. Informed consent was obtained from every participant included in the study, in accordance with the Ethical Standards of the Helsinki Declaration.

Young Schema Questionnaire-Long Form, Third Edition
The YSQ-L3 (Young et al., 2003) is a 232-item self-report tool developed to assess 18 EMSs. The Italian version of the questionnaire is available in the Appendix of the Italian edition of Young et al. (2003). Participants are asked to rate each statement on a 6-point Likert scale ranging from 1 ("it is completely untrue for me") to 6 ("it describes me perfectly"). Items are clustered into 18 scales and grouped into five domains, bringing together the EMSs that tend to develop together: Disconnection/Rejection (Abandonment, Mistrust/Abuse, Emotional Deprivation, Defectiveness/Shame, Social Isolation/Alienation); Impaired Autonomy/Performance (Dependence/Incompetence, Vulnerability to Harm or Illness, Enmeshment/Undeveloped Self, Failure); Impaired Limits (Entitlement/Grandiosity, Insufficient Self-Control/Self-Discipline); Other-Directedness (Subjugation, Self-Sacrifice, Approval-Seeking/Recognition-Seeking); and Overvigilance/Inhibition (Negativity/Pessimism, Emotional Inhibition, Unrelenting Standards/Hypercriticalness, Punitiveness). A sum or a mean score is calculated for each EMS, with a higher score representing a higher endorsement of the EMS in question. The YSQ has demonstrated adequate test-retest reliability and internal consistency, as well as convergent and discriminant validity (Young et al., 2003). Results from several YSQ studies support its validity as an EMS measure (Lee et al., 1999; Stopa et al., 2001; Hoffart et al., 2005). Cronbach's α coefficients for the current study are reported in Table 2. All the statistical analyses in this research were based on the mean score of each EMS.

State-Trait Inventory for Cognitive and Somatic Anxiety
The STICSA (Ree et al., 2008; for the Italian version, see Balsamo et al., 2015a, 2016) is a 21-item measure designed to assess cognitive and somatic symptoms of anxiety, both as a trait and as a state. In the trait anxiety subscale, the subject rates how often a statement is true in general (on a four-point Likert-type scale from "1-almost never at all" to "4-almost always"), whereas in the state anxiety subscale she/he rates how she/he feels at the moment of assessment (on a four-point Likert-type scale from "1-not at all" to "4-very much"). Overall, the scale is made up of four subscales: State-Somatic (SS), Trait-Somatic (TS), State-Cognitive (SC), and Trait-Cognitive (TC).
The STICSA was developed to address the psychometric limitations of existing anxiety measures, especially their extensive overlap with depression (Caci et al., 2003; Balsamo et al., 2013a; Roberts et al., 2016). Its factor structure received strong support, and the total scale and subscales exhibited high internal consistency, as well as construct-consistent correlations in patients, controls, and community groups (Grös et al., 2007; Ree et al., 2008; Van Dam et al., 2013). Cronbach's α coefficients for the current study range from 0.812 (State-Somatic) to 0.926 (State).

Teate Depression Inventory
The TDI (Saggino, 2013, 2014) is a 21-item self-report instrument designed to assess Major Depressive Disorder as specified by the latest edition of the DSM (American Psychiatric Association, 2013). It was developed via Rasch logistic analysis of responses (Rasch, 1960), within the framework of Item Response Theory, in order to overcome inherent psychometric weaknesses of existing depression measures, including the BDI-II (Balsamo and Saggino, 2007). Each item is rated on a 5-point Likert-type scale, ranging from 0 (always) to 4 (never). A growing literature suggests that the TDI has strong psychometric properties in both clinical and non-clinical groups, including an excellent Person Separation Index, no evidence of bias due to item-trait interaction, good discriminant and convergent validity and control of major response sets (Balsamo et al., 2013b, 2015a; Innamorati et al., 2013). In a recent study, three cutoff scores were recommended in terms of sensitivity, specificity and classification accuracy to screen for varying levels (minimal, mild, moderate and severe) of depression severity in a group of patients diagnosed with Major Depressive Disorder. In our groups, Cronbach's alpha was 0.943 for the clinical participants and 0.917 for the non-clinical group.

Descriptive Statistics
The 18 EMSs were preliminarily submitted to analyses in order to check the normal distribution by computing means, standard deviations and indices of skewness and kurtosis. Inspection of skewness and kurtosis indices indicated that departures from normality were not severe according to West et al. (1995) with only a few exceptions. Thus, no variable transformations were deemed necessary. Statistical analyses were performed with IBM SPSS.

Reliability and Convergent Validity Analysis of the YSQ-L3
In order to investigate the psychometric properties of the YSQ-L3, we assessed the internal consistency of its scales using Cronbach's alpha, separately for the two groups. The two-way mixed effects ICC (Intraclass Correlation; Shrout and Fleiss, 1979; McGraw and Wong, 1996) was used to assess the 3-month test-retest stability (T0, T1, T2) of each EMS in a group of 40 non-clinical subjects. The strong reduction in the number of subjects was due to attrition or to the fact that many subjects refused to repeat the test administration. Since Shrout and Fleiss' (1979) ICC rules of thumb have been criticized (Hopkins, 2000), we considered the following values as a general rule: ≥0.90 high, between 0.80 and 0.90 moderate, and ≤0.80 insufficient (Vincent, 1999).
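As an illustration of the internal consistency index mentioned above, the following is a minimal sketch of Cronbach's alpha computed from item-level scores. The function name and the demonstration data are hypothetical and are not taken from the study, whose analyses were run in IBM SPSS.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: list of item-score lists, one inner list per item,
    aligned so that position i always refers to respondent i.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical 1-6 ratings of five respondents on three items of one schema.
demo_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 4, 4, 6],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(demo_items)
print(round(alpha, 3))
```

The same variance estimator is used for the items and for the total score, so the choice between population and sample variance does not affect the coefficient.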
The convergent validity of the YSQ-L3 schemas was investigated by computing Pearson's r correlation coefficients with well-established depression and anxiety measures (TDI and STICSA, respectively). The α level was adjusted with Bonferroni's correction. These statistical analyses were performed with IBM SPSS.
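The correlation and the Bonferroni adjustment described above can be sketched as follows. This is an illustration only: the data are invented, and the number of tests (18, one per schema against a single criterion measure) is an assumption, since the text does not state how many comparisons entered the correction.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni's correction."""
    return alpha / n_tests


# Hypothetical schema mean scores and depression totals for five respondents.
schema_scores = [1.2, 2.5, 3.1, 4.0, 5.2]
depression_scores = [10, 22, 30, 41, 50]

r = pearson_r(schema_scores, depression_scores)
adjusted_alpha = bonferroni_alpha(0.05, n_tests=18)  # 18 tests is an assumption
print(round(r, 3), adjusted_alpha)
```

Each correlation's p-value would then be compared with the adjusted level rather than with 0.05.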

Confirmatory Factor Analyses of the YSQ-L3
Different Confirmatory Factor Analyses (CFAs) were performed separately for the clinical and non-clinical participants. Due to a slight deviation from multivariate normality, all analyses were carried out using robust maximum-likelihood estimation methods. Given the heterogeneity of the results reported in the literature regarding the latent factor structure of Young's EMSs (for a review, see Kriston et al., 2012), most of which referred to different YSQ versions, we compared five alternative factor models for the Italian version of the YSQ-L3. These models were: (1) the one-factor model (1F model), in which all 18 schemas were forced to load on a single higher-order factor; (2) the five correlated first-order factors model (5F-correlated model), based on Young's original theoretical model (Young et al., 2003); (3) the five uncorrelated first-order factors model, according to Young's model but without correlations between factors (5F-not correlated model); (4) the bi-factor model, strongly suggested by Kriston et al. (2012), in which each of the 18 EMSs loaded on its own domain and on one global factor, called "Psychopathology"; and (5) the second-order model, with the five first-order factors according to Young's model and a general second-order factor.

Measurement Invariance of the YSQ-L3 Between Non-clinical and Clinical Groups
We performed a Multigroup Confirmatory Factor Analysis (MG-CFA) to test measurement invariance of the YSQ-L3 across groups of subjects with and without psychological syndromes on a set of nested models (Meredith, 1993):
1. The baseline configural invariance model (M1), in which the same factorial pattern was specified for each group, but with loadings and intercepts free to vary across groups;
2. The metric invariance model (M2), wherein loadings were constrained to be equal across groups;
3. The scalar invariance model (M3), wherein factor loadings and intercepts were constrained to be equal across groups.
A model for testing strict invariance (loadings, intercepts and residual variances constrained to be equal across groups) also exists, but strict invariance is not fundamental for the validity of the model. Model fit was assessed using the χ² statistical test, the χ²/df, the RMSEA, the 90% CI of the RMSEA, the SRMR, the TLI and the CFI. The difference between the CFIs (ΔCFI) of nested models was estimated for testing measurement invariance. A ΔCFI smaller than or equal to 0.01 in absolute value indicates that the null hypothesis of invariance should not be rejected (Cheung and Rensvold, 2002). Tests that show scalar invariance are considered consistent tests, because they are unaffected by group characteristics (Meredith, 1993). When multigroup invariance was confirmed for model M2 or M3, we also tested whether factor means differed across groups by specifying a model in which the factor means were fixed to zero in all groups (M4). We estimated the difference between the chi-square value of M4 and that of model M2 or M3. If this difference is not significant, factor means can be considered equal across groups. CFAs and MG-CFA were performed using M-Plus 7.0 (Muthén and Muthén, 2012).
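The ΔCFI decision rule described above can be sketched as a small helper. The CFI values below are hypothetical placeholders chosen for illustration; they are not the values reported in the study's tables.

```python
def delta_cfi_check(cfi_less_constrained, cfi_more_constrained, threshold=0.01):
    """Return (invariance_supported, delta_cfi) for two nested models.

    Invariance is considered supported when |delta CFI| <= threshold,
    following the rule of thumb attributed to Cheung and Rensvold (2002).
    """
    delta = cfi_more_constrained - cfi_less_constrained
    return abs(delta) <= threshold, delta


# Hypothetical CFI values for a sequence of nested invariance models.
hypothetical_steps = {
    "M1 -> M2": (0.930, 0.927),
    "M2 -> M3": (0.927, 0.921),
    "poorly fitting constraint": (0.930, 0.915),
}

results = {
    label: delta_cfi_check(less, more)
    for label, (less, more) in hypothetical_steps.items()
}
for label, (supported, delta) in results.items():
    print(f"{label}: delta CFI = {delta:+.3f}, invariance supported = {supported}")
```

Only the comparison between adjacent nested models matters here; the absolute fit of each model is assessed separately with the other indices.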
Furthermore, false positive (FP) risk values were calculated for each YSQ-L3 schema and domain. FP risks are determined by the False Positive Rate (FPR), which is the ratio between the probability of False Positives (FPs) and the sum of the probabilities of FPs and True Positives (TPs). Because a clinical test such as the YSQ-L3 has to discriminate between non-clinical and clinical subjects, we estimated the FPR, instead of relying on the criterion of rejecting the null hypothesis with a first-type error probability of 0.05, in order to obtain the correct percentage of risk of making FPs when using test scores (Colquhoun, 2014). All of the analyses were based on the standardized scores for each schema and on the factor scores for each latent domain.
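In symbols, the definition above can be written as follows. This is a sketch under stated assumptions: α denotes the probability of FPs in the non-clinical group at the chosen cutoff and 1 − β the probability of TPs in the clinical group at the same cutoff. The symbols and the numbers in the worked line are illustrative and are not taken from the study. The form shown weights the two groups equally, as the verbal definition does; Colquhoun's (2014) general formulation additionally weights the two terms by the prevalence of real effects.

```latex
\mathrm{FPR}
  = \frac{P(\mathrm{FP})}{P(\mathrm{FP}) + P(\mathrm{TP})}
  = \frac{\alpha}{\alpha + (1 - \beta)},
\qquad
\text{e.g. } \alpha = 0.05,\; 1 - \beta = 0.20
\;\Rightarrow\;
\mathrm{FPR} = \frac{0.05}{0.25} = 0.20 \;\; (20\%).
```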
All missing data were replaced by the serial mean. Chen et al. (2012) showed that fit indices are not reduced when the percentage of missing data is below 20%, and that model fit decreases as the amount of missing data grows. The authors suggest that when the percentage of missing data is higher than 30%, both the serial mean and the trend imputation methods offer a better model fit than the other available methods. Because the missingness in our data was always below 10%, we used the serial mean method.
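A minimal sketch of this kind of mean replacement is given below. It assumes that "serial mean" means the mean of the observed values of the same variable, and that missing entries are coded as `None`; both the function names and the data are hypothetical.

```python
def missing_fraction(values):
    """Share of missing (None) entries in one variable."""
    return sum(v is None for v in values) / len(values)


def impute_series_mean(values):
    """Replace each None with the mean of the observed values of the series."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical schema scores with one missing entry.
scores = [2.0, None, 4.0, 3.0]
print(missing_fraction(scores))    # 0.25
print(impute_series_mean(scores))  # [2.0, 3.0, 4.0, 3.0]
```

Mean replacement leaves the variable's mean unchanged but shrinks its variance, which is one reason it is usually reserved for small amounts of missing data.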

Descriptive Statistics of the YSQ-L3
Descriptive statistics of the 18 EMSs, the TDI, and the STICSA state and trait scales (with their somatic and cognitive subscales) in the Italian clinical and non-clinical groups are displayed in Table 1.
As shown in Table 1, and following the guidelines recommended by West et al. (1995), none of the EMSs exhibited an absolute skewness value larger than 2 or an absolute kurtosis value larger than 7 in either group, except for Defectiveness, which showed a skewness of 2.030 in the non-clinical group. A similar pattern of approximately normal distributions was observed for the TDI and the STICSA scales and subscales.
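The screening rule applied above (absolute skewness above 2 or absolute kurtosis above 7) can be sketched as follows. Two assumptions are made: kurtosis is taken as excess kurtosis (0 for a normal distribution), and simple moment estimators are used, whereas statistical packages such as SPSS report bias-corrected versions. The data are invented.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def departs_severely(values, max_abs_skew=2.0, max_abs_kurtosis=7.0):
    """True when either index exceeds the thresholds used in the text."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) > max_abs_skew or abs(kurt) > max_abs_kurtosis


symmetric = [1, 2, 3, 4, 5]          # hypothetical, symmetric scores
floor_heavy = [1] * 19 + [20]        # hypothetical, strongly right-skewed
print(departs_severely(symmetric), departs_severely(floor_heavy))
```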

Reliability and Convergent Validity Analysis of the YSQ-L3
As shown in Table 2, internal consistency reliability of the 18 EMS was high (range α clinical = 0.804-0.921 and α non−clinical = 0.834-0.941).
As shown in  Vincent, 1999), ranging from 0.703 (Failure to Achieve) to 0.791 (Insufficient Self-control). Table 3 shows the correlations among the 18 EMSs and the measures of depression (TDI) and of trait and state anxiety (STICSA, with its subscales). As expected, the EMSs generally showed moderate to high correlations with the TDI and the STICSA scales in both the clinical and the non-clinical groups. Table 4 shows the goodness-of-fit indexes of the five structural models tested for both the non-clinical and the clinical groups.

Confirmatory Factor Analyses of the YSQ-L3
Although the bi-factor model has the best fit for both the non-clinical and the clinical groups, it exhibits many flaws at a more detailed level. In particular, the loadings of the Disconnection/Rejection domain are not significant for the Abandonment and the Defectiveness/Shame schemas in the clinical group; the loadings of the Impaired Autonomy/Performance domain are not significant for any of the four schemas in the clinical group or for the Failure schema in the non-clinical group; the loadings of the Other-Directedness domain are not significant for the Subjugation and the Approval-Seeking/Recognition-Seeking schemas in the clinical group; the loading of the Impaired Limits domain on the Insufficient Self-Control/Self-Discipline schema is not significant in the clinical group; and the loadings of the Overvigilance/Inhibition domain on the Emotional Inhibition and the Unrelenting Standards/Hypercriticalness schemas are not significant in the clinical participants. These non-significant loadings mean that the bi-factor model does not provide adequate measurement properties. Table 5 shows the loadings of each schema on the five domains and on the general factor for the bi-factor model. Hierarchical (ωh) and total (ωt) omegas for each schema are also reported. The ratio ωh/ωt expresses the variance component of the general factor in each observed variable in relation to the global variance due to all latent factors (Tommasi et al., 2015).
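To make the omega coefficients concrete, the sketch below computes scale-level hierarchical and total omega from a bi-factor solution. It assumes standardized loadings and mutually orthogonal factors, and the loadings are invented for illustration; it is not the computation behind the study's Table 5. Dividing the hierarchical omega by the total omega gives the general factor's share of the reliable variance.

```python
def bifactor_omegas(general_loadings, group_loadings):
    """Return (omega_hierarchical, omega_total).

    general_loadings: standardized loadings on the general factor,
        one per indicator.
    group_loadings: dict mapping a group-factor name to a dict of
        {indicator index: standardized loading on that group factor}.
    Factors are assumed to be orthogonal.
    """
    general_part = sum(general_loadings) ** 2
    group_part = sum(sum(l.values()) ** 2 for l in group_loadings.values())
    # Variance of each indicator explained by the factors it loads on.
    explained = [g ** 2 for g in general_loadings]
    for loadings in group_loadings.values():
        for index, loading in loadings.items():
            explained[index] += loading ** 2
    unique_part = sum(1.0 - e for e in explained)
    total_variance = general_part + group_part + unique_part
    omega_h = general_part / total_variance
    omega_t = (general_part + group_part) / total_variance
    return omega_h, omega_t


# Hypothetical solution: four indicators, two group factors.
omega_h, omega_t = bifactor_omegas(
    general_loadings=[0.6, 0.6, 0.6, 0.6],
    group_loadings={"A": {0: 0.4, 1: 0.4}, "B": {2: 0.4, 3: 0.4}},
)
print(round(omega_h, 3), round(omega_t, 3), round(omega_h / omega_t, 3))
```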
The distributions of fit indices are affected by sample size and by the distribution of the measured characteristic in the population (Yuan, 2005). Therefore, cutoffs of fit indexes cannot be considered absolutely valid. In addition, the misfit of a model can be due to high covariance residuals rather than to model misspecification, since covariance errors and model misspecification do not necessarily correspond (Hayduk et al., 2007). Lower fit indexes therefore do not necessarily indicate a misspecified model. Factor loadings represent the quality of measurement of latent variables. Models with poor measurement quality (low factor loadings) can have a better fit than models with excellent measurement quality (high factor loadings). This phenomenon is called the reliability paradox (Hancock and Mueller, 2011). On the basis of this paradox, McNeish et al. (2017) recommend evaluating the validity of factor models not only on goodness-of-fit indexes but also on the quality of their measures, by also reporting factor loadings, because there is no perfect correspondence between quality of measurement and fit indexes.
In the second-order model, instead, all loadings of the five domains on the schemas are significant for both the non-clinical and the clinical groups. Figure 1 shows the path diagram of the second-order model of the YSQ-L3.
Frontiers in Psychology | www.frontiersin.org

Measurement Invariance of the YSQ-L3 Between Non-clinical and Clinical Groups
Table 6 shows the MG-CFA performed on the second-order model of the YSQ-L3. Because the second-order model has loadings at two orders, there is a version of the M2 model in which the first-order loadings are fixed between groups (M2*) and a version in which both the first-order and the second-order loadings are fixed (M2**). All ΔCFI values are lower than 0.01 in absolute value; therefore, the scalar invariance of the YSQ-L3 between the non-clinical and the clinical groups is confirmed. The difference between models M4 and M3 is, however, significant (Δχ² = 45.824, df = 5, p < 0.001).
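The p-value of a chi-square difference such as the one reported above can be checked with a few lines of standard-library code. This is a sketch of an unscaled difference test; because the analyses used robust maximum-likelihood estimation, the published test may have involved a scaling correction that is not reproduced here. The function name is hypothetical.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(chi2, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, starting from
    Q(1/2, x) = erfc(sqrt(x)) for odd df or Q(1, x) = exp(-x) for even df.
    """
    if chi2 < 0 or df < 1 or int(df) != df:
        raise ValueError("chi2 must be >= 0 and df a positive integer")
    x = chi2 / 2.0
    if df % 2 == 1:
        a, q = 0.5, erfc(sqrt(x))
    else:
        a, q = 1.0, exp(-x)
    while a < df / 2.0:
        q += x ** a * exp(-x) / gamma(a + 1.0)
        a += 1.0
    return q


# Difference between M4 and M3 as reported in the text.
p_value = chi2_sf(45.824, 5)
print(p_value < 0.001)
```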
The means of the five domains of the YSQ-L3 are therefore significantly different between the non-clinical and the clinical groups, and all of them are higher in the clinical group than in the non-clinical group. We therefore calculated the FPR for each schema and for each domain, and expressed the risk of making FPs as a percentage by multiplying the FPR by 100. For the schemas, we first transformed the raw scores of each schema into standardized scores; for the domains, we calculated the factor scores of the five domains. In both cases, we estimated separate score distributions for the non-clinical and the clinical groups. The cutoff values corresponding to a 0.05 and a 0.025 probability of FPs in the non-clinical group (first-type error) were then used to estimate the probability of TPs in the clinical group. Table 7 shows the FP risk values for each YSQ-L3 schema and for each YSQ-L3 domain. The average FP risk value for the YSQ-L3 schemas is 40.6% and 45.0% for the 0.05 and the 0.025 first-type error, respectively, while the average FP risk value for the YSQ-L3 domains is 24.2% and 18.2%, respectively. FP risk is therefore lower when the factor scores for the five YSQ-L3 domains are used to discriminate between non-clinical and clinical subjects.
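The procedure described above can be sketched with empirical score distributions. This is an illustration under stated assumptions, not the authors' implementation: the cutoff is taken as the score exceeded by at most a proportion α of the non-clinical group, the two groups are weighted equally, and all scores are invented.

```python
def empirical_fp_risk(nonclinical, clinical, alpha=0.05):
    """FP risk (as a proportion) for one score, using empirical distributions.

    The cutoff is the non-clinical score exceeded by at most alpha of the
    non-clinical group; TPs are clinical scores above that cutoff.
    """
    ordered = sorted(nonclinical)
    n = len(ordered)
    # Number of non-clinical subjects allowed above the cutoff.
    allowed = int(alpha * n + 1e-9)
    cutoff = ordered[n - allowed - 1]
    p_fp = sum(score > cutoff for score in nonclinical) / n
    p_tp = sum(score > cutoff for score in clinical) / len(clinical)
    if p_fp + p_tp == 0:
        raise ValueError("no subject exceeds the cutoff in either group")
    return p_fp / (p_fp + p_tp)


# Hypothetical scores: 20 non-clinical and 10 clinical subjects.
nonclinical_scores = list(range(1, 21))
clinical_scores = [12, 15, 18, 20, 21, 22, 23, 25, 19, 10]

risk = empirical_fp_risk(nonclinical_scores, clinical_scores, alpha=0.05)
print(f"FP risk: {100 * risk:.1f}%")
```

A stricter cutoff lowers the probability of FPs but also the probability of TPs, so the resulting FP risk can move in either direction depending on how well the score separates the two groups.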
According to Colquhoun (2014), the usual cutoffs for significance testing (0.05, 0.01 or 0.001) are somewhat misleading, because they rest on the assumption that, if there are no significant differences between clinical and non-clinical subjects (null effect), there is only a 5, 1 or 0.1% probability of judging an individual as a clinical subject when he or she is perfectly healthy. However, this approach does not consider the power of the test or, in other words, the capacity of the psychological test to discriminate between clinical and non-clinical subjects. The test power is the probability of correctly recognizing the presence of disease in clinical subjects (true positives). If test power is not taken into account, the risk of FPs is underestimated. Therefore, Colquhoun (2014) suggests using the FPR instead of the usual null hypothesis significance test to determine the capacity of a test to discriminate clinical from non-clinical subjects.

DISCUSSION AND CONCLUSION
The YSQ-L3 (Young and Brown, 1994) is a self-report instrument, developed after a psychometric refinement of the previous version, aimed at assessing the 18 EMSs according to the ST theoretical framework. Its latent factor structure has not been consistently replicated (for a review, see Oei and Baranoff, 2007). In fact, almost all of the studies on the psychometric structure of the YSQ examined the previous form (YSQ-L2) or the short form (YSQ-S3) and not the current long form (YSQ-L3).
Knowledge of its factor structure could be useful both for researchers and for clinicians during assessment and treatment. The current study investigated the factor structure of the Italian YSQ-L3, its reliability, its convergent validity with state/trait anxiety and depression measures, and its measurement invariance across a large community group and a clinical group.
CFAs were conducted separately for the community and for the clinical groups, testing five different models: a single-factor model, a five correlated first-order factor model, a five uncorrelated first-order factor model, a bi-factor model and, finally, a second-order model, with the five first-order factors, according to Young's model, and a general second-order factor.

TABLE 5 | Loadings on the first-order factors (λf) and on the general factor (λg) and corresponding significance (p-values), by Young-L3 domains and Young-L3 schemas.

Although the bi-factor model showed the best fit in both the clinical group and the community group, some loadings of the five domains on their corresponding schemas, as posited by the original factor structure model, were not significant, thus suggesting inadequate measurement properties. In the second-order model, instead, all loadings of the five domains on their schemas were significant for both the community and the clinical groups. The second-order model was therefore preferred, as it showed more adequate measurement properties than the bi-factor model for both groups. The original model proposed by Young et al. (2003) was therefore not confirmed in the current study.
Measurement invariance of the YSQ-L3 between community and clinical groups was subsequently tested for the second-order model. All ΔCFI values were lower than 0.01 in absolute value, thus supporting scalar invariance between the community and the clinical groups. Since models M4 and M3 were significantly different, the means of the five domains of the YSQ-L3 differed significantly between the community and the clinical groups. All of the means of the five domains were higher in the clinical group than in the community group. The YSQ-L3 therefore appeared to be able to discriminate between the community and the clinical groups. False positive risks were indeed lower when the factor scores of the five YSQ-L3 domains were used to discriminate between community and clinical individuals than when the 18 EMSs were used. This result supported the ST model (Young et al., 2003), which posits that the domain constructs are associated with psychopathology.
These results provide evidence of the discriminant power of the YSQ-L3 and, consequently, of its validity. The moderate to high correlations of the schemas with both the TDI and the STICSA provide additional evidence of the capacity of the YSQ-L3 to measure psychopathology.
The ICC reliability estimates were in general insufficient or moderate, and this could represent a problem for the YSQ-L3.
This study has several strengths. Firstly, it is one of the rare studies available on the YSQ-L3, which is the most important version of the Young Schema Questionnaire and the most useful one for giving psychotherapists indications about patients' schemas. Secondly, to the best of our knowledge, this study is the most comprehensive one available on the validity of the Italian version of the YSQ-L3. Thirdly, participants included both community and clinical subjects. An additional strength lies in the specific analyses reported here for the first time, for example the FP risk values for each YSQ-L3 schema and domain.
Some limitations of the study should be highlighted. Firstly, the study uses a clinical group with different psychiatric diagnoses. An additional potential bias is that the clinical group included both individuals with comorbid personality disorders and individuals without them. Future research should thus investigate measurement invariance of the YSQ-L3 across different types of psychiatric disorders, such as clinical groups with only personality disorders and groups with only anxiety or depressive disorders. Examining whether the YSQ-L3 can discriminate between individuals with different personality disorders, eating disorders (Innamorati et al., 2015) or clusters of personality disorders could also be interesting.
Another limitation of this study concerns the lack of measures of other constructs related to EMSs in the analysis of convergent validity, such as personality traits, attachment styles or functional/dysfunctional personal values (e.g., Picconi et al., 2018). Future studies should also investigate the responsiveness of the questionnaire in participants with psychiatric disorders after CBT or ST.
A further limitation concerns the numerous missing data. Although we tried to handle this problem in the best possible way, a replication of the present study would be welcome, particularly for this reason.
In conclusion, the current study expanded previous knowledge beyond the inconclusive evidence about the factor structure of the YSQ-L3, indicating a second-order model for the Italian version and showing that it can be a valid and reliable measurement instrument that can be used in clinical practice and research.

ETHICS STATEMENT
In accordance with the Declaration of Helsinki, all participants provided written informed consent. Concerning ethics approval, the data collection process did not harm participants either physically or mentally.

AUTHOR CONTRIBUTIONS
AS designed the study, assisted with data analyses, wrote part of the paper, and edited the final manuscript. MB assisted with the design of the study and data analyses, and wrote most of the paper. LC contributed to the analysis of the data and wrote part of the paper. VC recruited part of the sample. MS collaborated in editing the final manuscript and recruited part of the sample. GdF recruited part of the sample. DD recruited part of the sample and contributed to the analysis of the data. NM recruited part of the sample. IP recruited part of the sample. SP recruited part of the sample. MT assisted with the data analyses and collaborated in writing the manuscript.