Psychometric Evaluation of the Overexcitability Questionnaire-Two Applying Bayesian Structural Equation Modeling (BSEM) and Multiple-Group BSEM-Based Alignment with Approximate Measurement Invariance

The Overexcitability Questionnaire-Two (OEQ-II) measures the degree and nature of overexcitability, which assists in determining the developmental potential of an individual according to Dabrowski's Theory of Positive Disintegration. Previous validation studies using frequentist confirmatory factor analysis, which postulates exact parameter constraints, led to model rejection and a long series of model modifications. Bayesian structural equation modeling (BSEM) allows the application of zero-mean, small-variance priors for cross-loadings, residual covariances, and differences in measurement parameters across groups, better reflecting substantive theory and leading to better model fit and less overestimation of factor correlations. Our BSEM analysis with a sample of 516 students in higher education yields positive results regarding the factorial validity of the OEQ-II. Likewise, applying BSEM-based alignment with approximate measurement invariance, the absence of non-invariant factor loadings and intercepts across gender is supportive of the psychometric quality of the OEQ-II. Compared to males, females scored significantly higher on emotional and sensual overexcitability, and significantly lower on psychomotor overexcitability.


INTRODUCTION
Overexcitability within Dabrowski's Theory of Positive Disintegration
Dabrowski (1902-1980), a Polish psychiatrist and psychologist, developed the Theory of Positive Disintegration, which centers on heightened excitability in individuals, as well as on their drive, and their urge to resist conformity and complacency (Daniels and Piechowski, 2009). According to Dabrowski (1964; 1972; Mendaglio, 2008), personality is achieved through a process of positive disintegration, which begins with the disintegration of a primitive mental organization aimed at meeting biological needs and conforming to societal norms. Reintegration subsequently takes place at a higher level of human functioning, as characterized by autonomy, authenticity, and empathy. Achieving the highest level of human development, or enacting the personality ideal, depends on the developmental potential of an individual, which is determined by the individual's level of innate heightened excitability (overexcitability) and the presence of specific talents, abilities, and autonomous inner forces that cultivate growth (dynamisms).
According to Dabrowski, the developmental potential of an individual depends in part on the extent and nature of psychic intensity. Dabrowski uses the term "overexcitability" to refer to an above average responsiveness to stimuli, due to heightened sensitivity of the central nervous system, which generates a different, more intense, and more multi-faceted experience of internal and external reality (Dabrowski, 1964, 1972; Mendaglio, 2008). Dabrowski distinguishes five forms of overexcitability. Psychomotor overexcitability represents "a surplus of energy or the expression of emotional tension through general hyperactivity" (Silverman, 2008, p. 160). Manifestations include an abundance of physical energy, work addiction, nervous habits, rapid speech, love of movement, impulsiveness, competitiveness, and an urge to action. Sensual overexcitability involves enhanced receptivity of the senses, aesthetic appreciation, sensuality, and pleasure in being the center of attention. Imaginational overexcitability is characterized by a capacity to visualize events very well, as well as by ingenuity, fantasy, a need for novelty and variety, and poetic and dramatic perception. Intellectual overexcitability is characterized by an intensified activity of the mind, as well as by asking penetrating questions, reflective thought, problem solving, searching for truth and understanding, conceptual and intuitive integration, and an interest in abstraction and theory. Emotional overexcitability involves an intense connectedness with others, as well as the ability to experience things deeply, strong affective and somatic expressions, sensitivity in relationships, responsiveness to others, and well-differentiated feelings toward self (Silverman, 2008; Daniels and Piechowski, 2009). Dabrowski considers the last three forms of overexcitability essential to advanced personality development (Dabrowski, 1972; Mendaglio, 2008).
Despite the rising tendency in empirical research to use the OEQ-II as a supplementary measure of dispositional traits, the instrument has been validated in a limited way. Falk et al. (1999, p. 2) developed the easily administered and scored OEQ-II by incorporating the results of numerous prior studies on hyperexcitability, "including responses to deep reflex stimulation, open-ended responses to verbal stimuli, assessment in autobiographical material, and an open-ended questionnaire." The authors investigated the structural validity of the questionnaire via principal components factor analysis using varimax rotation. A stable and conceptually clear five-factor structure was retrieved with factor loadings above 0.50, and good internal consistency among the items indicative of the same factor was found for the two samples under study (Falk et al., 1999). Van den Broeck et al. (2014) investigated the factorial structure of the OEQ-II (Dutch translation), using exploratory structural equation modeling (ESEM; Asparouhov and Muthén, 2009) with weighted least squares estimation and oblique target rotation. The highly restrictive independent clusters model used in the confirmatory factor analysis (CFA), in which each indicator is allowed to load on only one factor and all non-target loadings are constrained to zero (Marsh et al., 2009), led to model rejection. Model testing using ESEM, in which the five correlated overexcitability factors were measured by each of the 50 items, yielded a partly satisfactory model fit. Modification indices were inspected in order to improve the model by including two residual covariances, ultimately leading to an acceptable fit to the data. This study further examined measurement invariance across intelligence levels and gender using an ESEM-Within-CFA approach (Marsh et al., 2013). This analysis established partial strict measurement invariance of the OEQ-II scores across the different groups.
The researchers concluded that the non-invariant parameters do not considerably affect group comparisons because of their small proportionality. Warne (2011) also investigated measurement invariance of the OEQ-II scores across gender using a multi-group CFA approach and maximum likelihood estimation, but the study could not establish metric invariance. Botella et al. (2015) examined the structural validity of the French OEQ-II using CFA and maximum likelihood estimation. Instead of freeing "an important number of cross-loadings and residual covariances" (Botella et al., 2015, p. 211), the researchers first reduced the instrument to a 35-item version and concluded that a model with "five correlated factors with residual covariances" yields a better fit to the data compared to a "one second-order factor" model. Other studies that used CFA and maximum likelihood estimation to establish the construct validity of the OEQ-II also resulted in moderate model fit (Tieso, 2007b;Siu, 2010;He and Wong, 2014).

Bayesian Structural Equation Modeling
In contrast to frequentist statistics, which ignores prior knowledge for hypothesis testing, Bayesian statistics relies on Bayes' theorem to update prior knowledge given the data. In maximum likelihood estimation, the parameters of the population are fixed but unknown, and the estimates of those parameters from a sample of the population are random but known. In Bayesian statistics the parameter of the population is considered random, allowing probability statements to be made about its value, as expressed in the prior distribution. Using Bayes' theorem, observed sampling data will revise this prior knowledge, leading to the posterior distribution of the parameter (Bolstad, 2007; Lee, 2007; Kaplan and Depaoli, 2012). Drawing on Bayes' theorem, the formula for the posterior distribution P(θ|z) of the unknown parameter θ given the observed data z can be expressed as:

$$P(\theta \mid z) = \frac{P(z \mid \theta)\,P(\theta)}{P(z)}$$

where P(θ) stands for the prior distribution of the parameter, reflecting substantive theory or the researcher's prior beliefs, and P(z|θ) is referred to as the distribution of the data given the parameter, which represents the likelihood (Levy, 2011; Kaplan and Depaoli, 2012; Kruschke et al., 2012; Zyphur and Oswald, 2015). Omitting the marginal distribution of the data P(z) in the formula reveals the proportionality of the unnormalized posterior distribution to the product of the likelihood and the prior distribution, i.e., P(θ|z) ∝ P(z|θ)P(θ) (Levy, 2011; Kaplan and Depaoli, 2012). The uncertainty regarding the population parameter value, as indicated by the variance of its prior probability distribution, is influenced by the observed sampling data, yielding a revised estimate of the parameter, as reflected in its posterior probability distribution (Kaplan and Depaoli, 2012).
The Bayesian credibility interval¹, based on the percentiles of the posterior distribution, allows direct probability statements about the parameter, in contrast to the confidence interval (CI) in frequentist theory, which is contingent on the hypothesis of extensive repeated sampling from the population (Bolstad, 2007; Kaplan and Depaoli, 2012; Zyphur and Oswald, 2015). The posterior distribution P(θ|z) yields maximum information about the parameter given the data, "unlike the point estimate and confidence interval in classical statistics, which provide no distributional information" (Kruschke et al., 2012, p. 725). Using a small variance prior, which reflects strong prior knowledge, the data will tend to contribute less information to the construction of the posterior distribution (Muthén and Asparouhov, 2012).

¹ The Bayesian credibility interval can be retrieved directly from the percentiles of the posterior probability distribution of the model parameters. Using the posterior distribution percentiles, it is possible to determine directly the probability that a population parameter value is situated within a specific interval. In this study, which used Bayesian analysis as a pragmatic approach, a (null) hypothesis testing perspective (Arbuckle, 2013; Zyphur and Oswald, 2015) was used in parameter estimation by evaluating whether the 95% credibility interval of the model parameters encompassed zero. If the posterior probability interval of a particular parameter does not contain zero, the null (condition) can be rejected as implausible, and as a consequence, the parameter is considered significant (which is indicated by a one-tailed Bayesian p-value below 0.05). A hypothesis testing perspective was also used in assessing model fit (Levy, 2011).
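The effect of prior variance on the posterior, and the reading of a credibility interval from posterior percentiles, can be illustrated with a minimal sketch. It assumes the simplest conjugate case (a normal mean with known sampling variance), which is not the model estimated in this study; the function names and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_var, data_mean, data_var, n):
    """Conjugate update for a normal mean with known sampling variance.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the sample mean.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / data_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var


def credibility_interval(mean, var, level=0.95):
    """Central interval read off the percentiles of a normal posterior."""
    dist = NormalDist(mean, var ** 0.5)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical data: sample mean 0.30 from n = 25 draws with sampling variance 1.
tight = normal_posterior(0.0, 0.01, 0.30, 1.0, 25)     # small-variance prior
diffuse = normal_posterior(0.0, 100.0, 0.30, 1.0, 25)  # nearly flat prior

print("small-variance prior:", tight, credibility_interval(*tight))
print("diffuse prior:       ", diffuse, credibility_interval(*diffuse))
```

Under these assumed numbers the small-variance prior pulls the posterior mean most of the way back toward zero, whereas the diffuse prior leaves it close to the sample mean.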
Recently, computational methods (e.g., the Gibbs algorithm) have been developed to draw random samples from the posterior distribution, allowing the practical use of Bayesian statistics (Bolstad, 2007), and leading to strong and increasing interest in this approach to statistics (Kruschke, 2015).
Meanwhile, Muthén and Asparouhov (2012) proposed an innovative approach to factor analysis using Bayesian structural equation modeling (BSEM), which better reflects substantive theory. Many psychological instruments cannot be adequately represented within a frequentist CFA approach, in which each item is allowed to load on one factor and all non-target loadings are fixed at zero (Marsh et al., 2009). Strategies to compensate for this inappropriateness may capitalize on chance (MacCallum et al., 1992), with a large risk of model misspecification (Muthén and Asparouhov, 2013b). In BSEM, parameter specifications of exact zeros are replaced by approximate zeros based on "informative, small-variance priors to reflect the researcher's theories and prior beliefs" (Muthén and Asparouhov, 2012, p. 316).
In the same way, Muthén and Asparouhov (2013a) propose the BSEM approach to measurement invariance analysis across different groups, in which exact zero differences in factor loadings and intercepts are replaced by approximate zero differences based on zero-mean, small-variance priors. "Measurement invariance is built on the notion that a measuring device should function in the same way across varied conditions, so long as those varied conditions are irrelevant to the attribute being measured" (Millsap, 2011, p. 1). With reference to psychological questionnaires, this implies that in order to test for mean differences across groups, the assumption of equivalent measurement of the underlying construct must be fulfilled. Scalar invariance, as characterized by invariant factor loadings and measurement intercepts, is a prerequisite to compare factor means and factor intercepts across groups (Vandenberg and Lance, 2000;Millsap, 2011;Muthén and Asparouhov, 2013c). The BSEM approach to measurement invariance is referred to as approximate measurement invariance and provides a valuable alternative to the multi-group CFA approach to measurement invariance analysis with maximum likelihood estimation (Muthén and Asparouhov, 2013a;Asparouhov and Muthén, 2014), which mostly results in unsatisfactory fit due to minor deviations from exact invariance (Muthén and Asparouhov, 2012). Results of simulation studies indicate that BSEM with approximate measurement invariance is a suitable technique for proper estimation and comparison of factor means and variances across multiple groups that may have non-invariant measurement parameters with minor variance, "without relaxing the invariance specifications or deleting noninvariant items" (Muthén and Asparouhov, 2013a, p. 7;van de Schoot et al., 2013).

Aim of this Study
The first and main objective of this study is to investigate the structural validity of the OEQ-II using BSEM with informative, small-variance priors, and to compare the results of this Bayesian approach to that of a frequentist approach to validation. We hypothesize that maximum likelihood CFA and ESEM models will generate poor data fit by postulating non-estimated parameters as exactly zero. The results of previous validation studies indicate that the OEQ-II, like most psychological instruments, exhibits slight cross-loadings and measures several supplementary minor personality factors in addition to the five overexcitabilities. On the one hand, freeing all cross-loadings and residual covariances leads to a non-identified model (Muthén and Asparouhov, 2012); on the other hand, modifying the model using modification indices in a frequentist analysis may capitalize on chance (MacCallum et al., 1992). Using Bayesian analysis as a pragmatic approach, we hypothesize that the BSEM model will generate a good fit to the data because it may take into account the existence of trivial cross-loadings in the CFA model and many minor correlated residuals among the factor indicators. The BSEM technique allows for the simultaneous inclusion in the model of all approximate-zero cross-loadings and residual covariances based on zero-mean, small-variance priors, and consequently represents substantive theory better.
The second aim of the study is to explore in greater depth the interrelationships between the five overexcitabilities by estimating a higher order model based on theoretical expectations and using Bayesian estimation. Mendaglio and Tillier (2006) strongly advocate the conceptualization of overexcitability within the overall context of developmental potential in future quantitative studies. According to Dabrowski, an individual's developmental potential is comprised of overexcitability, specific talents and abilities, and a strong autonomous drive to achieve individuality (Dabrowski, 1964, 1972; Mendaglio, 2008). However, the five forms of overexcitability are not equally important with respect to the developmental process (Mendaglio, 2012). Dabrowski considers emotional, intellectual and imaginational overexcitability essential to advanced personality development (Dabrowski, 1972; Mendaglio, 2008, 2012). Positive developmental potential is comprised of all of the five overexcitabilities, although emotional, intellectual and imaginational overexcitability aid the transformation of the lower forms of overexcitability, i.e., psychomotor and sensual overexcitability, "such that their energy is harnessed in the service of the developmental process" (Mendaglio, 2012, p. 212). The exclusive presence of psychomotor and sensual overexcitability constitutes negative developmental potential, which impedes the transcendence of biological needs and societal norms that is considered to be fundamental for the development of autonomy, authenticity, and empathy (Dabrowski, 1972; Mendaglio, 2012). Based on these theoretical considerations we hypothesize that all of the five overexcitabilities will load substantially on a superordinate general construct of positive developmental potential.
The final objective of this study is to investigate approximate invariance of measurement parameters across gender using BSEM. A CFA approach to measurement invariance often proves to be too strict, leading to model rejection and a long series of modifications of the model with a substantial risk of misspecification (Asparouhov and Muthén, 2014). Using an ESEM-Within-CFA approach, the study by Van den Broeck et al. (2014, p. 64) revealed partial strict measurement invariance across gender: "five items showed larger unique variances for girls than for boys, seven thresholds out of 200 were non-invariant, and only 12 out of 250 factor loadings were non-invariant." Because of the small proportionality of non-invariant parameters, we hypothesize that the BSEM approach will be a useful alternative to establish approximate measurement invariance across gender.

Participants
The OEQ-II (Falk et al., 1999) was added to a study conducted in Flanders investigating the influence of learning patterns on academic performance and the successful transition from secondary to higher education. The self-report measure was translated into Dutch, using back-translation, by the first author of this article, and it was tested on several young adults, in order to ensure the comprehensibility and proper interpretation of the items. The instrument was added to a fifth survey, which was conducted in the first semester of the academic year in which the respondents were in the second consecutive year of a program of higher education. In all, 516 students (318 women: 61.6%; 198 men: 38.4%) completed the OEQ-II online. Of these respondents, 356 (69%) had completed general secondary education before entering higher education, while 26% had followed technical secondary education, 4% had followed vocational secondary education, and 1% had followed secondary education in the arts. Two-thirds of the students were 19 years of age at the time of the survey, while 17% were 18 years old, 10% were 20 years old, and 6% were between 21 and 23 years of age. The study was executed in accordance with the guidelines of the Ethics Committee for the Social Sciences and Humanities of the University of Antwerp with written informed consent from all subjects.

Instrument
The OEQ-II consists of 50 items, equally representing intellectual overexcitability (e.g., "I love to solve problems and develop new concepts"), imaginational overexcitability (e.g., "Things that I picture in my mind are so vivid that they seem real to me"), emotional overexcitability (e.g., "I am deeply concerned about others"), psychomotor overexcitability (e.g., "If an activity is physically exhausting, I find it satisfying"), and sensual overexcitability (e.g., "I love to listen to the sounds of nature"). The items are scored along a five-point Likert scale with response options ranging from "Not at all like me" to "Very much like me." A high value on the scale of the items represents a high level of overexcitability.
Significant interrelationships have been found between gender and the extent and nature of overexcitability, as measured by the OEQ-II. A relatively strong association of emotional and sensual overexcitability with the female gender appears to be a general empirical finding (Bouchet and Falk, 2001;Treat, 2006;Tieso, 2007a,b;Miller et al., 2009;Siu, 2010;He and Wong, 2014;Van den Broeck et al., 2014;Botella et al., 2015). There is also evidence of a stronger relationship between the dispositional traits of intellectual (Bouchet and Falk, 2001;Treat, 2006;Miller et al., 2009;Rinn et al., 2010;Siu, 2010;Van den Broeck et al., 2014;Botella et al., 2015) and psychomotor (Bouchet and Falk, 2001;Treat, 2006;Rinn et al., 2010;He and Wong, 2014;Van den Broeck et al., 2014;Botella et al., 2015) overexcitability and the male gender. Because of these interrelationships, statistical analyses will be performed for the different gender groups separately.

MCMC Convergence
Bayesian estimation makes use of Markov chain Monte Carlo (MCMC) algorithms to iteratively draw random samples from the posterior distribution of the model parameters (Muthén and Muthén, 1998-2012). The software program Mplus uses the Gibbs algorithm (Geman and Geman, 1984) to execute MCMC sampling. MCMC convergence of posterior parameters, which indicates that a sufficient number of samples has been drawn from the posterior distribution to accurately estimate the posterior parameter values (Arbuckle, 2013), is evaluated via the potential scale reduction (PSR) convergence criterion (Gelman and Rubin, 1992; Gelman et al., 2014). The PSR criterion compares within- and between-chain variation of parameter estimates. When a single MCMC chain is used, the PSR compares variation within and between the third and fourth quarters of the iterations. A PSR value of 1.000 represents perfect convergence (Muthén and Muthén, 1998-2012; Kaplan and Depaoli, 2012). With a large number of parameters, a PSR < 1.100 for each parameter indicates that convergence of the MCMC sequence has been obtained (Muthén and Muthén, 1998-2012).
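The comparison of within- and between-chain variation can be sketched as follows. This is a minimal illustration rather than the Mplus implementation: it assumes the simplified form PSR = sqrt((W + B) / W), with W the average within-chain variance and B the variance of the chain means, and the function names `psr` and `split_last_half` are hypothetical.

```python
import random
from statistics import mean, pvariance


def psr(chains):
    """Potential scale reduction for one parameter, assumed form sqrt((W + B) / W).

    W is the mean of the within-chain variances; B is the variance of the
    chain means. Values close to 1 mean the chains agree with one another.
    """
    within = mean(pvariance(chain) for chain in chains)
    between = pvariance([mean(chain) for chain in chains])
    return ((within + between) / within) ** 0.5


def split_last_half(chain):
    """Treat the third and fourth quarters of a single chain as two chains."""
    n = len(chain)
    return [chain[n // 2: 3 * n // 4], chain[3 * n // 4:]]


# Simulated draws (hypothetical): two chains from the same distribution,
# and two chains that have settled on different values.
rng = random.Random(1)
agreeing = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
disagreeing = [
    [rng.gauss(0.0, 1.0) for _ in range(2000)],
    [rng.gauss(5.0, 1.0) for _ in range(2000)],
]

print("agreeing chains:   ", round(psr(agreeing), 3))
print("disagreeing chains:", round(psr(disagreeing), 3))
print("single chain:      ", round(psr(split_last_half(agreeing[0])), 3))
```

Because B is never negative, this quantity cannot fall below 1; it approaches 1 as the chain means coincide.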
Convergence of the MCMC algorithm in distribution is assessed via monitoring of the posterior distributions by trace and autocorrelation plots (Muthén and Muthén, 1998-2012). Posterior parameter trace plots display the sampled parameter values over time. Quick up-and-down fluctuations and absence of long-term trends in the plot indicate rapid convergence in distribution (Kaplan and Depaoli, 2012; Arbuckle, 2013).
Autocorrelation plots also display the degree of nonindependence of successive posterior draws in the MCMC chains (Muthén and Muthén, 1998-2012; Kaplan and Depaoli, 2012). An estimated correlation between the sampled parameter values reaching zero indicates convergence (Arbuckle, 2013).
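What an autocorrelation plot summarizes can be sketched with a lag-k sample autocorrelation. The example is illustrative only: the chain is simulated from a first-order autoregressive process (an assumption made here to mimic dependent MCMC draws), and the function name `autocorrelation` is hypothetical.

```python
import random


def autocorrelation(draws, lag):
    """Sample autocorrelation of a sequence of posterior draws at a given lag."""
    n = len(draws)
    center = sum(draws) / n
    denom = sum((x - center) ** 2 for x in draws)
    num = sum((draws[t] - center) * (draws[t + lag] - center) for t in range(n - lag))
    return num / denom


# Simulated, strongly dependent chain: x_t = 0.9 * x_{t-1} + noise (hypothetical).
rng = random.Random(0)
chain = [0.0]
for _ in range(19999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

# Keeping every 10th draw, a common way to reduce dependence between saved draws.
thinned = chain[::10]

print("lag-1 autocorrelation, all draws:  ", round(autocorrelation(chain, 1), 3))
print("lag-1 autocorrelation, every 10th: ", round(autocorrelation(thinned, 1), 3))
```

In this simulated chain, successive saved draws are much less correlated after thinning, at the cost of retaining fewer draws.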

Analyses
Before performing a Bayesian analysis of the OEQ-II model, a maximum likelihood analysis was carried out for comparison purposes. Using maximum likelihood estimation, a CFA model was tested according to the OEQ-II's hypothesized latent factor loading pattern for the 50 observed variables, and an exploratory factor analysis (EFA) for five factors was performed using ESEM with oblique Geomin rotation. In the ESEM model, the five correlated factors were measured by each of the 50 factor indicators and the residuals were not correlated.
Subsequently, a Bayesian analysis was performed using the CFA model, albeit with informative, small-variance priors for the cross-loadings in the model and uncorrelated residuals. Target loadings with non-informative priors (i.e., normally distributed priors with a mean of zero and infinite variance) and cross-loadings with strong informative priors (i.e., normally distributed priors with a mean of zero and a variance of 0.01, yielding 95% small cross-loading bounds of ±0.20; Muthén and Asparouhov, 2012) were utilized in this model. Applying the Bayes estimator and Gibbs algorithm, two independent MCMC chains with 10,000 iterations were used to describe the posterior distribution. The factor variances were fixed at one to set the metric of the factors, and standardized variables were analyzed.
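The ±0.20 bounds follow from the prior variance by a short calculation, shown here as a worked step. The symbol λ for a cross-loading and the rounded normal quantile 1.96 are notational choices made for this illustration.

```latex
\begin{align}
\lambda &\sim N(0,\ \sigma^2), \qquad \sigma^2 = 0.01 \;\Rightarrow\; \sigma = \sqrt{0.01} = 0.10 \\
\pm z_{0.975}\,\sigma &= \pm 1.96 \times 0.10 = \pm 0.196 \approx \pm 0.20
\end{align}
```

With standardized variables, a prior of this kind therefore places roughly 95% of its mass on cross-loadings smaller than 0.20 in absolute value.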
In the next step, a Bayesian analysis was performed using informative, small-variance priors for cross-loadings and residual covariances in the CFA model. In this BSEM analysis, normal prior distributions N(0, 0.01) were used for the cross-loadings, admitting ignorable effect sizes (Muthén and Asparouhov, 2012). An inverse-Wishart prior distribution IW(0, df) with df = 56 was applied for the correlated residuals, corresponding to prior zero-means and variances of 0.01 (MacKinnon, 2008). In this analysis, every 10th iteration was used, in order to reduce autocorrelation between successive posterior draws, with a total of 100,000 iterations and one MCMC chain to describe the posterior distribution. A sensitivity analysis was carried out, in which the effect of varying the prior variances of the residual covariances on the parameter estimates and model fit was investigated.
In relation to the second aim of this study, a higher order model was estimated according to the hypothesized latent factor loading pattern, as represented in Figure 1, using BSEM with informative, small-variance priors for the cross-loadings λ ∼ N(0, 0.01) and residual covariances δ ∼ IW(0, 56) in the measurement model. In the hypothetical model, the latent variable of positive developmental potential was operationalized according to the five overexcitability factors.
Finally, a Bayesian multiple-group model with approximate measurement invariance (Muthén and Asparouhov, 2013a) was carried out. One categorical latent variable with two known classes (i.e., male and female) was specified in this BSEM model. Normally distributed priors N(0, 0.01) were utilized for differences in intercepts and factor loadings across groups. Non-informative or diffuse priors were used for factor means, variances, and covariances across groups with the exception of factor means and variances in the male group, which were set at zero and one, respectively. Residuals were correlated, using an inverse-Wishart prior distribution IW(0, 16), corresponding to prior zero-means and variances of 0.01. Analyses were executed for each overexcitability factor and the alignment optimization method with Bayes estimation (Asparouhov and Muthén, 2014) was applied, which optimizes alignment "of the measurement parameters, factor loadings and intercepts/thresholds according to a simplicity criterion that favors few non-invariant measurement parameters," and subsequently adjusts "the factor means and variances in line with the optimal alignment" (Muthén and Muthén, 2013, p. 2). The alignment optimization method provides a solution to a parameterization indeterminacy, in which (few) noninvariant parameters are underestimated and (many) invariant parameters are overestimated, due to non-normally distributed deviations from a parameter average over groups resulting in the misestimation of factor means and variances (Muthén and Asparouhov, 2013a). In this BSEM multiple-group analysis, every 10th iteration was saved with a maximum and minimum number of iterations for each of two MCMC chains of 50,000 and 1,000, respectively, using the Gelman-Rubin PSR < 1.05 criterion (Gelman and Rubin, 1992).

FIGURE 1 | Higher order BSEM model with informative, small-variance priors for cross-loadings and residual covariances for the Overexcitability Questionnaire-Two (OEQ-II; Falk et al., 1999) data for females and males (second-order factor loadings are added within parentheses). OE, overexcitability; BSEM, Bayesian structural equation modeling. *Significance at the 5% level in the sense that the 95% Bayesian credibility interval does not cover zero.

Model Fit Assessment
The following fit measures were used as a means of evaluating the quality of the fit of both CFA and EFA models: the chi-square statistic, comparative fit index (CFI; Bentler, 1990), and root mean square error of approximation (RMSEA; Steiger, 1990). A non-significant chi-square value, CFI values close to 1 (Hu and Bentler, 1995), and a value of the RMSEA of 0.05 or less (Browne and Cudeck, 1989) indicate a close fit of the model.
For the BSEM models, fit assessment was carried out using Posterior Predictive Checking in which, as implemented in Mplus, the likelihood-ratio chi-square statistic for the observed data is compared to the chi-square based on synthetic data obtained by means of draws of parameter values from the posterior distribution (Muthén and Muthén, 1998-2012; Scheines et al., 1999; Asparouhov and Muthén, 2010). The simulated data should approximately match the observed data if the model fits the data (Kaplan and Depaoli, 2012). The Posterior Predictive p-value (PPp) measures the proportion of the chi-square values of the replicated data that exceeds that of the observed data. A low PPp (<0.05) indicates poor model fit. On the contrary, a PPp of 0.50, as well as a 95% CI for the difference in the chi-square statistic for the observed and simulated data that contains zero positioned close to the middle of the interval, are both indicative of excellent model fit (Muthén and Asparouhov, 2012). Results of simulation studies show the PPp to demonstrate sufficient power to reveal important model misspecifications (Muthén and Asparouhov, 2012).
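The PPp and the interval for the chi-square difference can be sketched as simple summaries over MCMC iterations. This is an illustration under stated assumptions, not the Mplus routine: it takes as given one observed-data and one replicated-data chi-square value per retained iteration, the function names are hypothetical, and the values below are simulated stand-ins rather than results from this study.

```python
import random
from statistics import quantiles


def posterior_predictive_p(observed_chi2, replicated_chi2):
    """Share of iterations in which the replicated chi-square exceeds the observed one."""
    pairs = list(zip(observed_chi2, replicated_chi2))
    exceed = sum(1 for obs, rep in pairs if rep > obs)
    return exceed / len(pairs)


def difference_interval(observed_chi2, replicated_chi2):
    """2.5th and 97.5th percentiles of the observed-minus-replicated differences."""
    diffs = [obs - rep for obs, rep in zip(observed_chi2, replicated_chi2)]
    cuts = quantiles(diffs, n=40)  # 39 cut points; first is 2.5%, last is 97.5%
    return cuts[0], cuts[-1]


# Simulated stand-in values (hypothetical): observed and replicated statistics
# drawn from the same distribution, as expected when the model fits.
rng = random.Random(3)
observed = [rng.gauss(100.0, 10.0) for _ in range(2000)]
replicated = [rng.gauss(100.0, 10.0) for _ in range(2000)]

ppp = posterior_predictive_p(observed, replicated)
low, high = difference_interval(observed, replicated)
print("PPp:", round(ppp, 3))
print("95% interval for the difference:", round(low, 1), round(high, 1))
```

When the two sets of values come from the same distribution, the PPp lands near 0.50 and the interval straddles zero; an observed chi-square that is systematically larger than the replicated one pushes the PPp toward zero.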

Descriptive Statistics
Descriptive summary statistics for the five overexcitability factors are reported per gender group in Table 1. The mean outcomes are consistent with all other studies using the OEQ-II, in which the two highest scores have been for emotional, intellectual, or psychomotor overexcitability. Table 2 shows the chi-square statistic, CFI, and RMSEA for the evaluation of both maximum likelihood CFA and EFA models.

Confirmatory and Exploratory Factor Analysis
Highly significant chi-square statistics, RMSEA values of more than 0.05, and CFI values of less than 0.90 all indicate that the CFA and EFA models for females and males fit the data poorly. Moreover, as represented in Table 3, the hypothesized five factor pattern is not fully recovered by the EFA results for females and males. Several significant (at the 5% significance level) cross-loadings on other latent factors can be detected. The hypothesized factor loading pattern is not well captured by the EFA model, possibly due to the existence of many minor correlated residuals among the factor indicators (Muthén and Asparouhov, 2012), as can be expected from inspection of the modification indices.
BSEM with Informative, Small-Variance Priors for Cross-Loadings
Table 4 presents the fit results of the BSEM model with informative, small-variance priors for cross-loadings for both gender groups. The 95% CIs for the difference between the observed and replicated chi-square values do not cover zero, and the PPps are smaller than 0.05, both indicating unsatisfactory model fit. The results of this BSEM model, which are not reported, still reveal significant² (in the sense that the 95% Bayesian credibility interval does not encompass zero) but fewer cross-loadings and slightly higher major factor loadings, as compared with the EFA model. Increasing the variance of the prior distributions of the cross-loadings does not alter the fit results considerably. We may assume that the OEQ-II measures several supplementary minor personality factors in addition to the five overexcitabilities. On the one hand, freeing all residual covariances would lead to a non-identified model (Muthén and Asparouhov, 2012), which in Bayesian analysis may hinder MCMC convergence; on the other hand, modifying the model using modification indices in a frequentist analysis may capitalize on chance (MacCallum et al., 1992), with a large risk of model misspecification (Muthén and Asparouhov, 2013b).

BSEM with Informative, Small-Variance Priors for Cross-Loadings and Residual Covariances
Subsequently, a Bayesian analysis was performed using informative, small-variance priors for cross-loadings and residual covariances. As represented in Table 4, the 95% CIs for the difference between the observed and the replicated chi-square values cover zero and the PPps are 0.767 and 0.905 for the female and male group, respectively, both indicating good model fit. Figure 2A presents the distribution of the difference between the observed and the replicated chi-square values for the female group. The matching scatterplot (see Figure 2B), with the majority of the points plotted along the 45-degree line, indicates satisfactory model fit for the observed data. Good MCMC convergence was established for the two models. The PSR value smoothly decreased over the iterations, reaching a value of 1.010 after half of the iterations. Additionally, the stability of the parameter estimates across the iterations was verified. Figure 3 presents posterior parameter trace and autocorrelation plots for the loading of item y10 on the intellectual overexcitability factor for the male group. The trace plot (see Figure 3A) displays a stable, horizontal band for the parameter presented, indicating convergence of the MCMC algorithm in distribution. The autocorrelation plot (see Figure 3B) displays low autocorrelation, that is, approximate independence of successive posterior samples. The posterior parameter trace and autocorrelation plots for the other parameters included in the models (not reported) were also indicative of good convergence in distribution.
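The posterior predictive check described above can be illustrated with a minimal sketch. It is not the Mplus implementation: the function name `posterior_predictive_summary` and the toy discrepancy values are hypothetical, and the sketch assumes that one chi-square discrepancy for the observed data and one for the replicated data are available per MCMC draw. The PPp is taken as the proportion of draws in which the replicated discrepancy exceeds the observed one, and the interval is a simple percentile interval of the observed-minus-replicated difference.

```python
from statistics import quantiles


def posterior_predictive_summary(observed, replicated):
    """Return (PPp, (lower, upper)) for paired chi-square discrepancies.

    observed[i] and replicated[i] are the discrepancies computed at MCMC
    draw i for the observed and the replicated data, respectively.
    """
    if len(observed) != len(replicated) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 draws")
    diffs = [o - r for o, r in zip(observed, replicated)]
    # Proportion of draws where the replicated value exceeds the observed one
    ppp = sum(1 for d in diffs if d < 0) / len(diffs)
    # 39 cut points in 2.5% steps; first and last bound the central 95%
    cuts = quantiles(diffs, n=40, method="inclusive")
    return ppp, (cuts[0], cuts[-1])


# Hypothetical toy draws, for illustration only
ppp, (lower, upper) = posterior_predictive_summary(
    [100.0, 110.0, 95.0, 105.0], [120.0, 100.0, 130.0, 125.0]
)
print(ppp, lower, upper)
```

Under this reading, an interval that covers zero together with a PPp well above 0.05 corresponds to the pattern interpreted in the text as good model fit.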
Thus, the results of both BSEM models can be reliably interpreted. With the exception of one non-significant (in the sense that the 95% Bayesian credibility interval encompasses zero) major factor loading, the hypothesized factor loading pattern was fully recovered, with substantial target loadings and only one significant cross-loading (in the male group), as displayed in Table 5 (in Mplus, the reported estimates are the medians of their posterior distributions). Many minor residual covariances were found to be significant at the 5% level, particularly 49 (i.e., 4%) for the female group, with an average absolute residual correlation (range) of 0.221 (−0.254 to 0.532), and 68 (i.e., 5.55%) for the male group, with an average absolute residual correlation (range) of 0.241 (−0.294 to 0.462). Excluding these residual correlations may lead to the poor fit of the previously studied models (Cole et al., 2007), and unsatisfactory loading pattern recovery in the ESEM model (Muthén and Asparouhov, 2012). The Bayesian factor correlations are located in order of magnitude between the maximum likelihood EFA (smallest values, cf. Tables 3, 5) and CFA (largest values, not reported) correlations. However, according to theory, the factors are predicted to correlate to a considerable level. Table 5 shows weak to moderate factor correlations.
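The percentages of significant residual covariances reported above can be checked with a short calculation. The sketch assumes 50 items (inferred from item labels running up to y50 and from the five factors of the OEQ-II), so that the number of distinct residual covariances is 50 × 49 / 2; the function name is hypothetical.

```python
def residual_covariance_share(n_items, n_significant):
    """Return (number of distinct residual covariances, significant share)."""
    total = n_items * (n_items - 1) // 2  # off-diagonal pairs among the items
    return total, n_significant / total


for group, count in (("female", 49), ("male", 68)):
    total, share = residual_covariance_share(50, count)
    print(f"{group}: {count} of {total} = {share:.2%}")
```

With 1,225 distinct residual covariances, 49 and 68 significant estimates correspond to the reported 4% and 5.55%.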
FIGURE 2 | Bayesian posterior predictive checking distribution plot (A) and scatterplot (B) for the Bayesian model with small-variance priors for cross-loadings and residual covariances for females. In the posterior predictive checking distribution plot, the chi-square statistic for the observed data is marked by the vertical line, which corresponds to a zero value on the x-axis. The matching scatterplot allows determining the PPp as the proportion of points above the 45 degree line.
A sensitivity analysis was carried out, investigating the effects of varying the prior variances of the residual covariances on the PPp and the lower and upper bounds of the 95% CI for the difference in chi-square statistic for the observed and synthetic data. This analysis also checked the variability of the parameter estimates. Unless the research sample is extremely small, or the model and/or prior distribution are strongly contradicted by the data, the results of the Bayesian analysis will change very little when the variance of the prior distribution is altered (Arbuckle, 2013). Table 6 presents the Bayesian model fit results under varying prior variance conditions for the male group, and also presents the standardized estimate of the factor loading of item y1 on the latent variable of intellectual overexcitability. Initially, an inverse-Wishart prior IW(0, 56) was used for the residual covariances, corresponding to prior zero-means and variances of 0.0111 (SD = 0.1054). Augmenting the degrees of freedom for the parameters that are assumed to follow an inverse-Wishart distribution will decrease the variance of the prior distribution or increase the degree of prior knowledge included in the model. The extent to which the prior variance can be reduced is monitored through the PPp. 
In the framework of this residual correlations sensitivity analysis, both a less informative prior with df = 54 (corresponding to a prior variance of 0.0833) and more informative priors with df = 66, 76, and 86 (corresponding to prior variances of 0.0003, 0.0001, and <0.0001, respectively) were used. Applying a strong informative prior with df = 73 (corresponding to a prior variance of 0.0001) yielded excellent model fit, as indicated by a PPp of 0.515. However, for both gender groups, the results of the sensitivity analysis indicate that different priors for the residual covariances do not alter the estimation of the factor loadings considerably. Additionally, with rather large sample sizes, the choice of the prior variance is less important as the data contribute more information to the construction of the posterior distribution (Muthén and Asparouhov, 2012).
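The correspondence between the inverse-Wishart degrees of freedom and the prior variances quoted above can be reproduced with a minimal sketch. It rests on two assumptions that are not stated in the text: the standard marginal variance of an off-diagonal element of an inverse-Wishart matrix with unit diagonal and zero off-diagonal scale, which reduces to 1 / [(df − p)(df − p − 1)(df − p − 3)], and p = 50 observed items. The function name is hypothetical.

```python
from math import sqrt


def iw_offdiag_prior_variance(df, p=50):
    """Prior variance of a residual covariance under IW(0, df).

    Assumes unit diagonal and zero off-diagonal scale elements and
    p observed variables; requires df - p > 3 for a finite variance.
    """
    k = df - p
    if k <= 3:
        raise ValueError("df must exceed p + 3 for a finite prior variance")
    return 1.0 / (k * (k - 1) * (k - 3))


for df in (54, 56, 66, 73, 76, 86):
    var = iw_offdiag_prior_variance(df)
    print(f"df = {df}: variance = {var:.6f}, SD = {sqrt(var):.4f}")
```

Under these assumptions, df = 56 gives a variance of 1/90 ≈ 0.0111 (SD ≈ 0.1054) and df = 54 gives 1/12 ≈ 0.0833, matching the values in the text; both df = 73 and df = 76 round to 0.0001 at four decimals.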

BSEM Higher Order Model
With respect to the female group, the 95% CI for the difference between the observed and the replicated chi-square values covers zero, with a lower bound of −197.241 and an upper bound of 92.474, and the PPp is 0.757, both indicating good model fit. The same conclusion can be drawn for the male group (PPp = 0.884, observed and replicated χ² 95% CI [−246.146, 59.556]). A steadily decreasing PSR value, with a value close to 1 for the last few tens of thousands of iterations, as well as convergence plots showing tight horizontal bands for the parameters, and autocorrelation plots displaying low dependence in the chain, are all indicative of good MCMC convergence.
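The role of the PSR as a convergence diagnostic can be illustrated with a minimal sketch of the classic Gelman-Rubin potential scale reduction for a single parameter. This is an assumption for illustration: the exact PSR definition used by Mplus may differ in detail, and the function name and toy chains are hypothetical. The idea is that the statistic approaches 1 when the between-chain variation is negligible relative to the within-chain variation.

```python
from math import sqrt
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Gelman-Rubin style PSR for one parameter from equally long chains."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    within = mean(variance(c) for c in chains)          # W
    between_over_n = variance([mean(c) for c in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)


# Hypothetical toy chains: two that overlap, two that sit far apart
mixed = [[0.1, -0.2, 0.3, -0.1, 0.2, -0.3], [-0.1, 0.2, -0.3, 0.1, -0.2, 0.3]]
apart = [[0.1, -0.2, 0.3, -0.1, 0.2, -0.3], [5.1, 4.8, 5.3, 4.9, 5.2, 4.7]]
print(potential_scale_reduction(mixed), potential_scale_reduction(apart))
```

Chains that explore the same region yield a value near 1, whereas chains stuck in different regions yield a value far above 1.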
The hypothesized loading pattern depicted in Figure 1 is only partially recovered for both gender groups. Psychomotor overexcitability can be distinguished from the other forms of overexcitability, as indicated by the non-significant factor loading on the general latent construct of positive developmental potential. Regarding the measurement model, all intended factor loadings (with the exception of the loading of item y2 on intellectual overexcitability, as in the previous BSEM models) were substantive. Nonetheless, some cross-loadings were found to deviate substantially from zero, particularly 6 for the female group, with an average loading of 0.175, and 2 for the male group, with an average loading of 0.224. Many minor residual covariances were found to be significant at the 5% level, [...]
Omitting the cross-loadings in the hierarchical model and using informative, small-variance priors for the residual covariances, δ ∼ IW(0, 56), in the measurement model also yields satisfactory model fit for both the female (PPp = 0.634, observed and replicated χ² 95% CI [−168.305, 123.630]) and male groups (PPp = 0.800, observed and replicated χ² 95% CI [−215.656, 88.242]), in contrast to a model that only has cross-loadings with even less strict prior variances [λ ∼ N(0, 0.09), corresponding to 95% cross-loading limits of ±0.59], which leads to a low PPp (<0.05). However, in the structural model for the female group, all target loadings are significant, although the loading of the psychomotor overexcitability factor on the latent variable of positive developmental potential must be considered small (λ = 0.261). Not permitting cross-loadings in the measurement model considerably increases the number of non-trivial residual covariances (158 for females, and 124 for males) and inflates parameter estimates.
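The stated correspondence between a N(0, 0.09) prior and 95% cross-loading limits of ±0.59 follows from the normal quantiles, as the following minimal sketch shows. The function name is hypothetical, and the second call, with a variance of 0.01, is included only as an illustrative comparison.

```python
from statistics import NormalDist


def prior_95_limits(variance, mean=0.0):
    """Central 95% limits of a normal prior with the given mean and variance."""
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    return dist.inv_cdf(0.025), dist.inv_cdf(0.975)


lower, upper = prior_95_limits(0.09)
print(round(lower, 2), round(upper, 2))  # -0.59 0.59

lower_small, upper_small = prior_95_limits(0.01)
print(round(lower_small, 3), round(upper_small, 3))
```

A prior variance of 0.09 thus allows cross-loadings of up to roughly 0.59 in absolute value with 95% prior probability, which is why it is described as a less strict prior.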
In Bayesian analysis, the deviance information criterion (DIC) can be used for the purpose of comparing different models, where the model with the lowest DIC value is preferably selected (Spiegelhalter et al., 2002). The DIC values generated by the first higher order model and the second hierarchical model were 40,490.867 and 40,459.584 for the female group, and 25,991.245 and 25,956.312 for the male group, respectively. Although the difference in DIC is small, the models that only included residual covariances produced the smallest DIC values. However, the models with more constraints led to considerably more significant residual covariances (and, as a consequence, lower DIC values), making model comparison more difficult. Our results correspond to previous studies mentioning higher loadings on a second-order latent variable and inflated first-order factor correlations in the case of more strict models (Golay et al., 2013).

Multiple-Group BSEM-Based Alignment with Approximate Measurement Invariance
Table 7 presents the results of the BSEM multiple-group approximate measurement invariance analysis with zero-means and decreasing variances for the prior distributions of differences in factor loadings and intercepts across gender. The extent to which the prior variance can be reduced is monitored through the PPp. "If the prior variance is small relative to the magnitude of non-invariance, PPP will be lower than if the prior variance corresponds better to the magnitude of non-invariance" (Muthén and Asparouhov, 2013a, p. 21). Analyses were executed for each overexcitability factor, since configural invariance had already been established (cf., BSEM models with informative, small-variance priors for cross-loadings and residual covariances). For the intellectual overexcitability data a prior variance for [...]
Scalar invariance, as characterized by invariant factor loadings and measurement intercepts, is a prerequisite to compare factor means across groups (Vandenberg and Lance, 2000; Millsap, 2011; Muthén and Asparouhov, 2013c).
For intellectual overexcitability, the factor loadings and intercepts are all invariant, regardless of the simulated prior variance, and neither group shows a significantly (at the 5% significance level) different factor mean. For the construct of imaginational overexcitability, prior variances of 0.01 and 0.000000001 generate PPps of 0.392 and 0.225, respectively. The factor loadings and intercepts are all invariant, and neither group shows a significantly different factor mean. For emotional overexcitability, prior variances of 0.01, 0.001, and smaller result in PPps of 0.500, 0.441, and 0.402, respectively. The factor loadings and intercepts are all invariant, and the male group shows a significantly smaller factor mean. For sensual overexcitability, prior variances of 0.01, 0.001, and smaller result in PPps of 0.598, 0.491, and ∼0.450, respectively. The factor loadings and intercepts are all invariant, and the male group shows a significantly smaller factor mean. For the psychomotor overexcitability data, a prior variance of 0.01 results in a PPp of 0.518. The factor loadings are all invariant, although the intercept of item y50 ("I thrive on intense physical activity, e.g., fast games and sports") is non-invariant across gender. Decreasing the prior variance to 0.001 or smaller still produces acceptable PPps of 0.235 and ∼0.160, respectively, and leads the non-invariance of the intercept of y50 to disappear. The factor loadings and intercepts are then all invariant, and the female group shows a significantly smaller factor mean.
According to the acceptable PPps and corresponding CIs even under strict conditions (i.e., the use of prior distributions with extremely small variances of 0.000000001), we may conclude that approximate scalar measurement invariance is supported by the data for each of the overexcitability latent variables.

DISCUSSION
The first aim of this study was to validate the factorial structure of the OEQ-II using Bayesian estimation in comparison with the frequentist approach to validation. To this end, the new concept of BSEM, as presented by Muthén and Asparouhov (2012), was applied with informative, small-variance priors for cross-loadings and residual covariances, which better reflects substantive theory. The analysis yielded positive results regarding the factorial validity of the OEQ-II, in contrast to the maximum likelihood CFA and EFA models which could not generate a satisfactory model fit. The hypothesized factor loading pattern was not fully recovered by the EFA results, due to the existence of many minor residual covariances. Freeing all residual covariances in a frequentist analysis would lead to a non-identified model. Alternatively, modifying the model using modification indices may capitalize on chance (MacCallum et al., 1992), with a large risk of model misspecification (Muthén and Asparouhov, 2013b). However, Bayesian analysis allows for all residual covariances to be inserted into the model using zero-mean, small-variance prior distributions, therefore overriding the problem of nonidentification. Moreover, the BSEM approach "informs about model modification when all parameters are freed and does so in a single-step analysis" (Muthén and Asparouhov, 2012, p. 313). BSEM led to good model fit, as evaluated by means of Posterior Predictive Checking, which is less susceptible to slight, negligible model misspecifications compared to the chi-square statistic for assessing model fit (Muthén and Asparouhov, 2012). It also led to less inflated factor correlations compared to CFA, and satisfactory loading pattern recovery with substantial target loadings.
However, one major factor loading, namely the loading of item y2 ("I can take difficult concepts and translate them into something more understandable") on the latent factor of intellectual overexcitability, was not found to be substantive (although the standardized coefficient of the loading was the largest for this item). Although the content of y2 is consistent with the content of the other items that load significantly on the latent variable of intellectual overexcitability, perhaps a higher standard is required to yield the response of agreement. The level of conceptual difficulty is not defined in more detail and can be interpreted differently by various people. The study by Van den Broeck et al. (2014) also revealed a low but significant factor loading (λ = 0.33) for the respective item. Future validation studies of the OEQ-II will have to affirm how y2 compares relative to the other factor indicators and in relation to the construct of intellectual overexcitability.
Regarding the results of the higher order model, we may conclude that the construct of psychomotor overexcitability, as captured by the OEQ-II, behaves differently to intellectual, imaginational, emotional, and sensual overexcitability. The latter forms of overexcitability all load substantially on the superordinate latent variable of positive developmental potential. According to Dabrowski's theory, the presence of only psychomotor and/or sensual overexcitability in an individual hinders advanced development (Dabrowski, 1972; Mendaglio, 2012). However, according to the results of this study, the construct of sensual overexcitability is strongly related to three of the most important drivers of personality growth. Piechowski (2013, p. 105) stated that under emotional tension, psychomotor overexcitability can be manifested as "compulsive talking and chattering, impulsive actions, nervous habits (tics, nail biting), workaholism, acting out," and sensual overexcitability can be expressed as "overeating, self-pampering, sex as pacifier and escape, buying sprees, desire to be in the limelight." Only one item of the OEQ-II is related to the expression of psychomotor or sensual overexcitability under difficult emotional circumstances (i.e., "When I am nervous, I need to do something physical"). All of the items representing sensual overexcitability are expressed in a positive way, and are indicative of a very perceptive personality, as are the other three forms of overexcitability which are considered essential to advanced personality development. The 40 items of the OEQ-II representing intellectual, imaginational, emotional, and sensual overexcitability seem to be indicative of a conscious, complex, creative, deeply emotionally engaged, sensitive, and perceptive personality with a strong susceptibility to wonder. 
Psychomotor overexcitability, as represented by the OEQ-II, does not have that same kind of spirit, but is more neutral and related to intense physical activity and competitiveness. Mendaglio and Tillier (2006) rightly emphasize the importance of further elaborating the empirical research on developmental potential by incorporating specific talents and abilities, dynamisms, and features of the environment alongside overexcitabilities in future studies. The results of this study also demonstrate the importance of more thoroughly examining the specific, possibly mediational role of psychomotor overexcitability in the process of personality growth, as viewed from the perspective of Dabrowski's theory.
Results of simulation studies indicate that approximate measurement invariance with highly precise priors outperforms full and partial measurement invariance in the case of (many) small differences in measurement parameters across groups (van de Schoot et al., 2013). In our study, which applied BSEM-based alignment with approximate measurement invariance, the absence of non-invariant factor loadings and intercepts across gender was indicative of the psychometric quality of the OEQ-II. The results of our study revealed a significantly higher score for females on emotional and sensual overexcitability, and a significantly lower score on the construct of psychomotor overexcitability compared to males. These results are mostly consistent with the findings of the previous studies mentioned above. However, no difference could be established in the level of intellectual overexcitability across both gender groups. The relative intellectual homogeneity of the sample may explain this result.
BSEM is an innovative and flexible approach to statistics, allowing the application of zero-mean, small-variance priors for cross-loadings, residual covariances, and differences in measurement parameters across groups, which leads to better model fit and less overestimation of factor correlations compared to CFA (which postulates exact parameter constraints and is usually too strict; Muthén, 2013;Muthén and Asparouhov, 2013a;Fong and Ho, 2014).
More generally, the Bayesian approach to statistics has many advantages over the frequentist approach. Bayesian analysis makes it possible to incorporate prior knowledge (with different degrees of uncertainty, as indicated by the variance of the prior distribution) into parameter estimation, and is well suited for testing complex, non-linear models with non-normal distributions, regardless of sample size (Kruschke et al., 2012). Even in the case of very limited prior knowledge (non-informative prior) with little influence on the posterior distribution, the Bayesian credibility interval nevertheless allows direct probability statements about the parameter values given the data (Kruschke et al., 2012).
With regard to the limitations of this study, we have to note that although the BSEM approach to factorial validation and measurement invariance analysis better represents substantive theory and avoids the need for a long series of model modifications with a substantial risk of misspecification, it is an innovative method that requires further research. Muthén and Asparouhov (2012) rightly emphasize the difficulty of balancing the need for small-variance priors for cross-loadings and small prior variances for residual covariances, which is supported by the results of the sensitivity analysis of the higher order model in this study. Moreover, the degree of susceptibility of the PPp to model misspecification warrants further research. This is of major importance given the strong influence of small-variance priors on the posterior parameter distributions, even in medium-sized samples. Furthermore, the susceptibility of the PPp to specific model features, the number of variables, and variable distributions needs to be investigated in more detail (Muthén and Asparouhov, 2012). One reviewer rightly stressed the limitation of the use of rather small sample sizes in this study (especially with regard to the male sample) according to standard criteria applied in conventional CFA and SEM analyses. Although the PPp has been found to perform better with small sample sizes than the maximum likelihood chi-square statistic, and to be less prone to negligible model misspecifications (Muthén and Asparouhov, 2012), the susceptibility of the PPp to the number of observations as well as the performance of BSEM estimation under varied sample sizes (and model features) should definitely be examined further. Future BSEM studies should investigate which sample size is required according to the number of degrees of freedom included in the model in order to ensure optimum performance. 
However, preliminary studies indicate that Bayesian SEM performs better with small sample sizes than does maximum likelihood SEM (Lee and Song, 2004).
In any case, using Bayesian analysis, either as a pragmatic or meta-analytic approach, it is crucial to perform sensitivity analyses which investigate the effect of varying the means and variances of prior distributions on the parameter estimates and model fit. The performance of the alignment optimization method under varied conditions also needs to be investigated further, as it represents a novel technique for measurement invariance analysis under certain assumptions.
A second limitation of this study is the use of a convenience sample to simultaneously investigate the factorial structure of the OEQ-II, as well as approximate measurement invariance of factor loadings and intercepts across gender. Future studies should preferably use independent randomized samples to cross-validate the OEQ-II and investigate (approximate) measurement invariance across varied conditions. Apart from this, the results of our study coincide with the findings of the study by Van den Broeck et al. (2014), and are supportive of the psychometric quality of the OEQ-II.
The Mplus scripts for the main BSEM analyses in this study are available as Supplementary Material.

AUTHOR CONTRIBUTIONS
ND: study design, data analysis, drafting the manuscript, approving the final version of the manuscript for submission, accountable for all aspects of the work. PV: data acquisition, critically revising the manuscript, approving the final version of the manuscript for submission, accountable for all aspects of the work.